LeetCode 3303 - Find the Occurrence of First Almost Equal Substring
We are given two strings, s and pattern. The task is to find the smallest starting index in s such that the substring of length len(pattern) is "almost equal" to pattern.
Difficulty: 🔴 Hard
Topics: String, String Matching
Solution
Problem Understanding
We are given two strings, s and pattern. The task is to find the smallest starting index in s such that the substring of length len(pattern) is "almost equal" to pattern.
Two strings are considered almost equal if we can modify at most one character in one string to make the two strings identical. That means:
- The substring may already be exactly equal to
pattern - The substring may differ in exactly one position
- If it differs in two or more positions, it is invalid
Suppose the pattern length is m. For every valid starting position i in s, we examine the substring:
s[i : i + m]
and determine whether its Hamming distance from pattern is at most 1.
The goal is to return the smallest valid index. If no such substring exists, we return -1.
The constraints are extremely important:
1 <= pattern.length < s.length <= 10^5
A naive comparison of every substring against the pattern would require comparing up to m characters for each starting position. Since there are roughly n starting positions, that leads to O(n * m) time complexity, which becomes too slow when both strings are close to 10^5.
This means we need a near linear time string matching solution.
Several edge cases are important:
- The substring may already exactly match the pattern
- The mismatch may occur at the beginning, middle, or end
- The pattern may have length
1 - There may be no valid substring at all
- Multiple valid substrings may exist, but we must return the smallest index
The key challenge is efficiently determining whether two strings differ in at most one position.
Approaches
Brute Force Approach
The simplest solution is to check every possible substring of length m in s.
For each starting index i:
- Compare
s[i + j]withpattern[j] - Count mismatches
- If mismatches exceed
1, stop early - Otherwise, return
i
This approach is correct because it explicitly checks the exact condition required by the problem.
However, its worst case complexity is too large. There are O(n) substrings, and each comparison may require O(m) work. Therefore the total complexity becomes O(n * m).
With input sizes up to 10^5, this can reach 10^10 operations, which is far beyond acceptable limits.
Optimal Approach
The main insight is that two strings differ in at most one position if and only if:
- Their matching prefix before the mismatch is identical
- Their matching suffix after the mismatch is identical
Suppose a mismatch occurs at position k.
Then:
substring[0:k] == pattern[0:k]
substring[k+1:] == pattern[k+1:]
So instead of directly comparing every substring character by character, we can precompute:
- The longest prefix match length at every alignment
- The longest suffix match length at every alignment
Using the Z algorithm, we can compute these efficiently in linear time.
For each starting position i:
- Let
leftMatchbe the number of matching characters from the front - Let
rightMatchbe the number of matching characters from the back
If:
leftMatch + rightMatch >= m - 1
then the substring differs in at most one position.
This works because all positions except possibly one are matched.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m) | O(1) | Compares every substring directly |
| Optimal | O(n + m) | O(n + m) | Uses Z algorithm for prefix and suffix matching |
Algorithm Walkthrough
Step 1: Compute prefix matches using the Z algorithm
We build the string:
pattern + "#" + s
Then we compute its Z array.
The Z value at a position tells us how many characters match the prefix of the entire string.
Since the prefix is pattern, the Z values corresponding to positions inside s tell us how many characters match between:
pattern
and
s[i:]
This gives the longest matching prefix length for every alignment.
Step 2: Compute suffix matches
To compute matching suffixes efficiently, we reverse both strings:
reverse(pattern)
reverse(s)
Then we repeat the same Z algorithm process.
This lets us determine how many characters match from the end.
Step 3: Evaluate every possible starting position
For every valid index i:
prefix[i]tells us how many characters match from the frontsuffix[i]tells us how many characters match from the back
If:
prefix[i] + suffix[i] >= m - 1
then all characters except possibly one match.
We immediately return the first valid index.
Step 4: Return -1 if no valid substring exists
If we finish scanning all positions without finding a valid substring, then no answer exists.
Why it works
The algorithm relies on the fact that a substring and the pattern can differ in at most one position only if all remaining positions match.
Suppose the mismatch occurs at index k.
Then:
- The prefix before
kmust match - The suffix after
kmust match
The total number of matched positions becomes:
k + (m - k - 1) = m - 1
Therefore checking whether:
prefixMatch + suffixMatch >= m - 1
is sufficient and necessary.
The Z algorithm allows these prefix and suffix matches to be computed in linear time.
Python Solution
class Solution:
def minStartingIndex(self, s: str, pattern: str) -> int:
n = len(s)
m = len(pattern)
def z_function(text: str) -> list[int]:
length = len(text)
z = [0] * length
left = 0
right = 0
for i in range(1, length):
if i <= right:
z[i] = min(right - i + 1, z[i - left])
while (
i + z[i] < length
and text[z[i]] == text[i + z[i]]
):
z[i] += 1
if i + z[i] - 1 > right:
left = i
right = i + z[i] - 1
return z
forward = pattern + "#" + s
z_forward = z_function(forward)
prefix_match = [0] * (n - m + 1)
for i in range(n - m + 1):
prefix_match[i] = min(m, z_forward[m + 1 + i])
reversed_pattern = pattern[::-1]
reversed_s = s[::-1]
backward = reversed_pattern + "#" + reversed_s
z_backward = z_function(backward)
suffix_match = [0] * (n - m + 1)
for i in range(n - m + 1):
reversed_index = n - (i + m)
suffix_match[i] = min(
m,
z_backward[m + 1 + reversed_index]
)
for i in range(n - m + 1):
if prefix_match[i] + suffix_match[i] >= m - 1:
return i
return -1
The implementation begins with a standard Z algorithm helper function. The Z function computes, for every position, the longest substring starting there that matches the prefix of the string.
Next, we concatenate:
pattern + "#" + s
This structure ensures that any match length computed inside the s portion corresponds directly to a prefix match against pattern.
The same process is repeated with reversed strings to compute suffix matches.
The arrays prefix_match and suffix_match store, for each starting position in s, how many characters match from the front and back respectively.
Finally, we scan every possible substring position. If the combined matched characters cover at least m - 1 positions, then the substring differs in at most one place, so we return that index.
Go Solution
func minStartingIndex(s string, pattern string) int {
n := len(s)
m := len(pattern)
zFunction := func(text string) []int {
length := len(text)
z := make([]int, length)
left, right := 0, 0
for i := 1; i < length; i++ {
if i <= right {
if z[i-left] < right-i+1 {
z[i] = z[i-left]
} else {
z[i] = right - i + 1
}
}
for i+z[i] < length &&
text[z[i]] == text[i+z[i]] {
z[i]++
}
if i+z[i]-1 > right {
left = i
right = i + z[i] - 1
}
}
return z
}
forward := pattern + "#" + s
zForward := zFunction(forward)
prefixMatch := make([]int, n-m+1)
for i := 0; i <= n-m; i++ {
value := zForward[m+1+i]
if value > m {
value = m
}
prefixMatch[i] = value
}
reversedPattern := reverseString(pattern)
reversedS := reverseString(s)
backward := reversedPattern + "#" + reversedS
zBackward := zFunction(backward)
suffixMatch := make([]int, n-m+1)
for i := 0; i <= n-m; i++ {
reversedIndex := n - (i + m)
value := zBackward[m+1+reversedIndex]
if value > m {
value = m
}
suffixMatch[i] = value
}
for i := 0; i <= n-m; i++ {
if prefixMatch[i]+suffixMatch[i] >= m-1 {
return i
}
}
return -1
}
func reverseString(s string) string {
bytes := []byte(s)
left := 0
right := len(bytes) - 1
for left < right {
bytes[left], bytes[right] = bytes[right], bytes[left]
left++
right--
}
return string(bytes)
}
The Go implementation closely mirrors the Python version. Since Go strings are immutable, reversing requires converting the string into a byte slice.
The Z algorithm implementation uses slices instead of Python lists, but the logic remains identical.
Because the input contains only lowercase English letters, using byte indexing is completely safe and avoids Unicode complications.
Worked Examples
Example 1
s = "abcdefg"
pattern = "bcdffg"
Pattern length:
m = 6
Possible substrings:
| Index | Substring | Differences |
|---|---|---|
| 0 | abcdef | many |
| 1 | bcdefg | 1 |
| 2 | cdefg | invalid length |
At index 1:
bcdefg
bcdffg
^
Only one mismatch exists.
Prefix matches:
| Position | Match Length |
|---|---|
| 1 | 3 |
Suffix matches:
| Position | Match Length |
|---|---|
| 1 | 2 |
Total:
3 + 2 = 5
m - 1 = 5
Valid answer:
1
Example 2
s = "ababbababa"
pattern = "bacaba"
Substring at index 4:
bababa
bacaba
^
Exactly one mismatch.
Prefix match length:
2
Suffix match length:
3
Total:
5 >= 5
Return:
4
Example 3
s = "abcd"
pattern = "dba"
Substrings:
| Index | Substring | Differences |
|---|---|---|
| 0 | abc | 3 |
| 1 | bcd | 3 |
No valid substring exists.
Return:
-1
Example 4
s = "dde"
pattern = "d"
Substring at index 0:
d
d
Exact match.
Return:
0
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m) | Two Z algorithm runs plus one linear scan |
| Space | O(n + m) | Storage for Z arrays and match arrays |
The Z algorithm runs in linear time because each character is processed at most a constant number of times while maintaining the current matching interval.
We execute the Z algorithm twice:
- Once for prefix matches
- Once for suffix matches
All remaining work is linear, so the total complexity remains O(n + m).
Test Cases
sol = Solution()
assert sol.minStartingIndex("abcdefg", "bcdffg") == 1 # one mismatch
assert sol.minStartingIndex("ababbababa", "bacaba") == 4 # mismatch in middle
assert sol.minStartingIndex("abcd", "dba") == -1 # no valid substring
assert sol.minStartingIndex("dde", "d") == 0 # single character exact match
assert sol.minStartingIndex("aaaaa", "aaa") == 0 # many exact matches
assert sol.minStartingIndex("aaaaa", "aab") == 0 # one mismatch at end
assert sol.minStartingIndex("xyzabc", "abc") == 3 # exact match at end
assert sol.minStartingIndex("xyzabc", "abd") == 3 # one mismatch at end
assert sol.minStartingIndex("xyzabc", "zzz") == -1 # too many mismatches
assert sol.minStartingIndex("abcde", "xbcde") == 0 # mismatch at beginning
assert sol.minStartingIndex("abcde", "abxde") == 0 # mismatch in middle
assert sol.minStartingIndex("abcde", "abcdx") == 0 # mismatch at end
assert sol.minStartingIndex("bbbbbbbb", "bbbb") == 0 # repeated characters
assert sol.minStartingIndex("abababab", "abac") == 0 # periodic structure
assert sol.minStartingIndex("qwerty", "asdfg") == -1 # completely different
assert sol.minStartingIndex("aaab", "aabb") == 0 # exactly one mismatch
assert sol.minStartingIndex("aaab", "bbbb") == -1 # too many mismatches
| Test | Why |
|---|---|
"abcdefg", "bcdffg" |
Standard one mismatch case |
"ababbababa", "bacaba" |
Mismatch in middle |
"abcd", "dba" |
No valid substring |
"dde", "d" |
Single character pattern |
"aaaaa", "aaa" |
Multiple exact matches |
"aaaaa", "aab" |
One mismatch near end |
"xyzabc", "abc" |
Match at final valid index |
"xyzabc", "abd" |
One mismatch at final valid index |
"xyzabc", "zzz" |
No overlap at all |
"abcde", "xbcde" |
Mismatch at first character |
"abcde", "abxde" |
Mismatch in center |
"abcde", "abcdx" |
Mismatch at last character |
"bbbbbbbb", "bbbb" |
Repeated characters |
"abababab", "abac" |
Alternating pattern |
"qwerty", "asdfg" |
Completely different strings |
"aaab", "aabb" |
Exactly one mismatch |
"aaab", "bbbb" |
More than one mismatch |
Edge Cases
One important edge case occurs when the substring already exactly matches the pattern. Since the problem allows changing at most one character, zero mismatches is completely valid. The implementation handles this naturally because exact matches produce:
prefix_match + suffix_match >= m
which certainly satisfies the required condition.
Another important edge case is when the mismatch occurs at the very beginning or very end of the pattern. Some implementations incorrectly assume both prefix and suffix lengths must be positive. In this solution, either side may contribute zero characters, and the combined total still correctly determines validity.
A particularly tricky case involves patterns of length 1. In this situation, every substring of length 1 differs in at most one position, because a single replacement can transform any character into any other character. The condition:
prefix_match + suffix_match >= m - 1
becomes:
>= 0
which is always true, correctly returning index 0.
Repeated character strings are another common source of bugs for string matching algorithms. The Z algorithm handles repeated prefixes correctly because it carefully maintains the current matching interval and reuses previously computed information without missing overlaps.