LeetCode 3302 - Find the Lexicographically Smallest Valid Sequence

The problem asks us to find a sequence of indices from word1 such that the characters at those indices, when concatenated in order, form a string that is almost equal to word2.

LeetCode Problem 3302

Difficulty: 🟡 Medium
Topics: Two Pointers, String, Dynamic Programming, Greedy

Solution

Problem Understanding

The problem asks us to find a sequence of indices from word1 such that the characters at those indices, when concatenated in order, form a string that is almost equal to word2. "Almost equal" means we are allowed to change at most one character in the resulting string to make it identical to word2.

The input consists of two strings: word1 is longer than word2, and both consist only of lowercase English letters. The output is an array of indices, the same length as word2, representing the positions in word1 we select. Among all valid sequences, we must return the lexicographically smallest sequence of indices, which means the sequence that is smallest when compared element by element from left to right.

Constraints tell us that word1 can be up to 300,000 characters, which rules out any brute-force solution that enumerates all combinations of indices. We need a method that efficiently selects indices to satisfy both the "almost equal" condition and lexicographic minimality.

Edge cases include situations where it is impossible to form a valid sequence (return an empty array), or where multiple subsequences could satisfy the condition but differ in lexicographic order.

Approaches

A brute-force approach would involve generating all subsequences of word1 of length equal to word2 and checking whether each is almost equal to word2. While this guarantees correctness, it is computationally infeasible because the number of subsequences grows combinatorially with word1.length.

The key insight for an optimal solution is to greedily select the lexicographically smallest character possible for each position, while ensuring that we can complete the remainder of the sequence to match word2 with at most one character change. To do this efficiently, we precompute the last positions of each character in word1 to check if selecting a certain character at a current index still allows forming the remaining sequence.

This greedy approach works because we are allowed only one mismatch. We can track the mismatch and continue scanning for the smallest possible next character while ensuring we never exceed one allowed change.

Approach Time Complexity Space Complexity Notes
Brute Force O(C(n, m) * m) O(m) Generates all subsequences of length m, too slow for large n
Optimal O(n * 26) O(n * 26) Greedy with precomputed next occurrence table to find lexicographically smallest sequence efficiently

Algorithm Walkthrough

  1. Precompute Next Occurrences: For each position i in word1, precompute the next index where each character 'a' to 'z' appears. This allows us to quickly jump to the next occurrence of a candidate character for the sequence.
  2. Initialize Sequence: Create an empty array of size word2.length to store the chosen indices.
  3. Iterate Through word2: For each position i in word2, try to select the lexicographically smallest character from the remaining part of word1. If the current character in word1 matches word2[i], select it immediately. If it does not match, record this as a potential mismatch. Ensure that only one mismatch occurs.
  4. Verify Feasibility: After selecting a character, check if the remaining length of word1 is sufficient to fill the rest of the sequence. If not, backtrack or return an empty array if no valid sequence is possible.
  5. Return Result: After completing all positions in word2, return the sequence of indices. If the algorithm encounters an impossible step (e.g., cannot find a character for a future position), return an empty array.

Why it works: The greedy choice of the lexicographically smallest valid character at each step ensures that the final sequence is minimal in lexicographic order. Allowing at most one mismatch ensures compliance with the "almost equal" condition.

Python Solution

from typing import List

class Solution:
    def validSequence(self, word1: str, word2: str) -> List[int]:
        n, m = len(word1), len(word2)
        if m > n:
            return []

        # Precompute next occurrence of each character
        next_occ = [[-1] * 26 for _ in range(n + 1)]
        last = [-1] * 26
        for i in range(n - 1, -1, -1):
            last[ord(word1[i]) - ord('a')] = i
            for c in range(26):
                next_occ[i][c] = last[c]

        res = []
        pos = 0
        mismatch_used = False
        for i in range(m):
            found = False
            for c in range(26):
                idx = next_occ[pos][c]
                if idx == -1:
                    continue
                if idx + (m - i - 1) >= n:
                    continue
                if c == ord(word2[i]) - ord('a') or not mismatch_used:
                    res.append(idx)
                    pos = idx + 1
                    if c != ord(word2[i]) - ord('a'):
                        mismatch_used = True
                    found = True
                    break
            if not found:
                return []
        return res

The solution first constructs a table next_occ to quickly find the next occurrence of each character in word1. We then iterate through word2 and for each character, select the smallest character from word1 that either matches the target or can serve as the allowed mismatch. We carefully update the mismatch_used flag and the current scanning position pos.

Go Solution

func validSequence(word1 string, word2 string) []int {
    n, m := len(word1), len(word2)
    if m > n {
        return []int{}
    }

    nextOcc := make([][26]int, n+1)
    last := [26]int{}
    for i := range last {
        last[i] = -1
    }

    for i := n - 1; i >= 0; i-- {
        last[word1[i]-'a'] = i
        for c := 0; c < 26; c++ {
            nextOcc[i][c] = last[c]
        }
    }

    res := make([]int, 0, m)
    pos := 0
    mismatchUsed := false

    for i := 0; i < m; i++ {
        found := false
        for c := 0; c < 26; c++ {
            idx := nextOcc[pos][c]
            if idx == -1 || idx+(m-i-1) >= n {
                continue
            }
            if c == int(word2[i]-'a') || !mismatchUsed {
                res = append(res, idx)
                pos = idx + 1
                if c != int(word2[i]-'a') {
                    mismatchUsed = true
                }
                found = true
                break
            }
        }
        if !found {
            return []int{}
        }
    }

    return res
}

Go handles slices and arrays explicitly, and we use fixed-size arrays for nextOcc to optimize memory and access. Nil vs empty slice behavior is handled by returning []int{} when no valid sequence exists.

Worked Examples

Example 1: word1 = "vbcca", word2 = "abc"

i word2[i] Selected c idx res mismatch_used
0 a a (from v->a) 0 [0] True
1 b b 1 [0,1] True
2 c c 2 [0,1,2] True

Example 2: word1 = "bacdc", word2 = "abc"

i word2[i] Selected c idx res mismatch_used
0 a a 1 [1] False
1 b b (v->b) 2 [1,2] True
2 c c 4 [1,2,4] True

Example 3: word1 = "aaaaaa", word2 = "aaabc"

No valid sequence exists because more than one mismatch is needed.

Example 4: word1 = "abc", word2 = "ab"

Sequence [0,1] is selected directly without using a mismatch.

Complexity Analysis

Measure Complexity Explanation
Time O(n * 26) Precomputing next occurrences and scanning for each position in word2 across 26 characters
Space O(n * 26) Storing the next occurrence table for each character at each index

The algorithm is linear in practice because the inner loop over 26 characters is a constant factor.

Test Cases

# Provided examples
assert Solution().validSequence("vbcca", "abc") == [0,1,2]  # basic mismatch
assert Solution().validSequence("bacdc", "abc") == [1,2,4]  # mismatch in middle
assert Solution().validSequence("aaaaaa", "aaabc") == []    # impossible