LeetCode 3302 - Find the Lexicographically Smallest Valid Sequence
The problem asks us to find a sequence of indices from word1 such that the characters at those indices, when concatenated in order, form a string that is almost equal to word2.
Difficulty: 🟡 Medium
Topics: Two Pointers, String, Dynamic Programming, Greedy
Solution
Problem Understanding
The problem asks us to find a sequence of indices from word1 such that the characters at those indices, when concatenated in order, form a string that is almost equal to word2. "Almost equal" means we are allowed to change at most one character in the resulting string to make it identical to word2.
The input consists of two strings: word1 is longer than word2, and both consist only of lowercase English letters. The output is an array of indices, the same length as word2, representing the positions in word1 we select. Among all valid sequences, we must return the lexicographically smallest sequence of indices, which means the sequence that is smallest when compared element by element from left to right.
Constraints tell us that word1 can be up to 300,000 characters, which rules out any brute-force solution that enumerates all combinations of indices. We need a method that efficiently selects indices to satisfy both the "almost equal" condition and lexicographic minimality.
Edge cases include situations where it is impossible to form a valid sequence (return an empty array), or where multiple subsequences could satisfy the condition but differ in lexicographic order.
Approaches
A brute-force approach would involve generating all subsequences of word1 of length equal to word2 and checking whether each is almost equal to word2. While this guarantees correctness, it is computationally infeasible because the number of subsequences grows combinatorially with word1.length.
The key insight for an optimal solution is to greedily select the lexicographically smallest character possible for each position, while ensuring that we can complete the remainder of the sequence to match word2 with at most one character change. To do this efficiently, we precompute the last positions of each character in word1 to check if selecting a certain character at a current index still allows forming the remaining sequence.
This greedy approach works because we are allowed only one mismatch. We can track the mismatch and continue scanning for the smallest possible next character while ensuring we never exceed one allowed change.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(C(n, m) * m) | O(m) | Generates all subsequences of length m, too slow for large n |
| Optimal | O(n * 26) | O(n * 26) | Greedy with precomputed next occurrence table to find lexicographically smallest sequence efficiently |
Algorithm Walkthrough
- Precompute Next Occurrences: For each position
iinword1, precompute the next index where each character'a'to'z'appears. This allows us to quickly jump to the next occurrence of a candidate character for the sequence. - Initialize Sequence: Create an empty array of size
word2.lengthto store the chosen indices. - Iterate Through word2: For each position
iinword2, try to select the lexicographically smallest character from the remaining part ofword1. If the current character inword1matchesword2[i], select it immediately. If it does not match, record this as a potential mismatch. Ensure that only one mismatch occurs. - Verify Feasibility: After selecting a character, check if the remaining length of
word1is sufficient to fill the rest of the sequence. If not, backtrack or return an empty array if no valid sequence is possible. - Return Result: After completing all positions in
word2, return the sequence of indices. If the algorithm encounters an impossible step (e.g., cannot find a character for a future position), return an empty array.
Why it works: The greedy choice of the lexicographically smallest valid character at each step ensures that the final sequence is minimal in lexicographic order. Allowing at most one mismatch ensures compliance with the "almost equal" condition.
Python Solution
from typing import List
class Solution:
def validSequence(self, word1: str, word2: str) -> List[int]:
n, m = len(word1), len(word2)
if m > n:
return []
# Precompute next occurrence of each character
next_occ = [[-1] * 26 for _ in range(n + 1)]
last = [-1] * 26
for i in range(n - 1, -1, -1):
last[ord(word1[i]) - ord('a')] = i
for c in range(26):
next_occ[i][c] = last[c]
res = []
pos = 0
mismatch_used = False
for i in range(m):
found = False
for c in range(26):
idx = next_occ[pos][c]
if idx == -1:
continue
if idx + (m - i - 1) >= n:
continue
if c == ord(word2[i]) - ord('a') or not mismatch_used:
res.append(idx)
pos = idx + 1
if c != ord(word2[i]) - ord('a'):
mismatch_used = True
found = True
break
if not found:
return []
return res
The solution first constructs a table next_occ to quickly find the next occurrence of each character in word1. We then iterate through word2 and for each character, select the smallest character from word1 that either matches the target or can serve as the allowed mismatch. We carefully update the mismatch_used flag and the current scanning position pos.
Go Solution
func validSequence(word1 string, word2 string) []int {
n, m := len(word1), len(word2)
if m > n {
return []int{}
}
nextOcc := make([][26]int, n+1)
last := [26]int{}
for i := range last {
last[i] = -1
}
for i := n - 1; i >= 0; i-- {
last[word1[i]-'a'] = i
for c := 0; c < 26; c++ {
nextOcc[i][c] = last[c]
}
}
res := make([]int, 0, m)
pos := 0
mismatchUsed := false
for i := 0; i < m; i++ {
found := false
for c := 0; c < 26; c++ {
idx := nextOcc[pos][c]
if idx == -1 || idx+(m-i-1) >= n {
continue
}
if c == int(word2[i]-'a') || !mismatchUsed {
res = append(res, idx)
pos = idx + 1
if c != int(word2[i]-'a') {
mismatchUsed = true
}
found = true
break
}
}
if !found {
return []int{}
}
}
return res
}
Go handles slices and arrays explicitly, and we use fixed-size arrays for nextOcc to optimize memory and access. Nil vs empty slice behavior is handled by returning []int{} when no valid sequence exists.
Worked Examples
Example 1: word1 = "vbcca", word2 = "abc"
| i | word2[i] | Selected c | idx | res | mismatch_used |
|---|---|---|---|---|---|
| 0 | a | a (from v->a) | 0 | [0] | True |
| 1 | b | b | 1 | [0,1] | True |
| 2 | c | c | 2 | [0,1,2] | True |
Example 2: word1 = "bacdc", word2 = "abc"
| i | word2[i] | Selected c | idx | res | mismatch_used |
|---|---|---|---|---|---|
| 0 | a | a | 1 | [1] | False |
| 1 | b | b (v->b) | 2 | [1,2] | True |
| 2 | c | c | 4 | [1,2,4] | True |
Example 3: word1 = "aaaaaa", word2 = "aaabc"
No valid sequence exists because more than one mismatch is needed.
Example 4: word1 = "abc", word2 = "ab"
Sequence [0,1] is selected directly without using a mismatch.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * 26) | Precomputing next occurrences and scanning for each position in word2 across 26 characters |
| Space | O(n * 26) | Storing the next occurrence table for each character at each index |
The algorithm is linear in practice because the inner loop over 26 characters is a constant factor.
Test Cases
# Provided examples
assert Solution().validSequence("vbcca", "abc") == [0,1,2] # basic mismatch
assert Solution().validSequence("bacdc", "abc") == [1,2,4] # mismatch in middle
assert Solution().validSequence("aaaaaa", "aaabc") == [] # impossible