LeetCode 792 - Number of Matching Subsequences
The problem asks us to count how many words in a given list are subsequences of a string s. A subsequence is formed by deleting zero or more characters from the string without changing the order of the remaining characters.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Binary Search, Dynamic Programming, Trie, Sorting
Solution
Problem Understanding
The problem asks us to count how many words in a given list are subsequences of a string s. A subsequence is formed by deleting zero or more characters from the string without changing the order of the remaining characters. For instance, "ace" is a subsequence of "abcde" because we can remove "b" and "d" and retain "a", "c", "e" in the correct order.
The inputs are:
s- a string of length up to50,000, containing only lowercase letters.words- a list of strings, each up to length50, and the list can have up to5,000words.
The expected output is the number of strings in words that are subsequences of s.
Key constraints suggest that checking each word character by character against s (naively) could be too slow because iterating 50,000 characters for 5,000 words is potentially 250 million operations. We need an approach that reduces repeated scanning of s.
Important edge cases include empty words, words that are longer than s, words that contain characters not present in s, and repeated words in words.
Approaches
The brute-force approach would be to check each word in words to see if it is a subsequence of s by iterating over s and matching characters sequentially. This guarantees correctness but is too slow because for each word of length m, we may scan up to n characters of s. The time complexity is roughly O(n * m * len(words)), which can reach O(50,000 * 50 * 5,000) in the worst case, which is impractical.
The optimal approach uses the insight that we can preprocess s to speed up subsequence checks. One effective strategy is bucketing words by the character they are waiting for. Essentially, for each character in s, we maintain a list of iterators for words that are waiting for that character. When a character is seen in s, we advance all corresponding iterators. This avoids scanning s repeatedly for every word.
This approach transforms the problem into a linear scan of s with efficient word progress tracking. Each character in each word is processed exactly once, giving a much faster solution.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m * k) | O(1) | Check each word against s sequentially; too slow for large inputs |
| Optimal (Iterator Buckets) | O(n + total_chars_in_words) | O(total_chars_in_words) | Uses queues to track progress of each word waiting for its next character |
Algorithm Walkthrough
-
Initialize a dictionary (or array of lists) called
waitingto map each character'a'to'z'to a list of iterators. Each iterator corresponds to a word waiting for its next character to appear ins. -
Populate
waitingby iterating through each word inwords. For each word, take its first character and append an iterator (or index) that tracks the word's position to the corresponding bucket inwaiting. -
Initialize a counter
countto zero. This will track how many words have been fully matched. -
Iterate over each character
cins. For each character: -
Retrieve the list of iterators waiting for
cand reset that bucket to empty. -
For each iterator in the list:
-
Advance it to the next character in the word.
-
If there is a next character, append it to the corresponding bucket in
waiting. -
If there is no next character, increment
countbecause this word is fully matched. -
Return
count.
Why it works: Each word is only processed when its expected character is encountered in s. Every character of each word is considered exactly once, so all subsequences are counted correctly without redundant scanning of s.
Python Solution
from collections import defaultdict, deque
from typing import List
class Solution:
def numMatchingSubseq(self, s: str, words: List[str]) -> int:
waiting = defaultdict(deque)
for word in words:
waiting[word[0]].append(iter(word[1:]))
count = 0
for char in s:
current_waiting = waiting[char]
waiting[char] = deque()
while current_waiting:
it = current_waiting.popleft()
nxt = next(it, None)
if nxt is None:
count += 1
else:
waiting[nxt].append(it)
return count
This implementation first creates a mapping from characters to word iterators that are waiting for that character. As we scan s, we advance iterators. If an iterator has no next character, we increment the count. Using a deque ensures efficient pop from the left and append operations.
Go Solution
package main
func numMatchingSubseq(s string, words []string) int {
waiting := make(map[byte][]*WordIterator)
for _, word := range words {
it := &WordIterator{word: word, index: 1}
waiting[word[0]] = append(waiting[word[0]], it)
}
count := 0
for i := 0; i < len(s); i++ {
c := s[i]
current := waiting[c]
waiting[c] = []*WordIterator{}
for _, it := range current {
if it.index == len(it.word) {
count++
} else {
waiting[it.word[it.index]] = append(waiting[it.word[it.index]], &WordIterator{word: it.word, index: it.index + 1})
}
}
}
return count
}
type WordIterator struct {
word string
index int
}
In Go, we simulate the iterator with a struct containing the word and the next index to check. This allows similar behavior to Python iterators. Go slices are used for queues, and we carefully manage indices to avoid out-of-bounds errors.
Worked Examples
Example 1: s = "abcde", words = ["a","bb","acd","ace"]
| Step | char in s | waiting state | count |
|---|---|---|---|
| 1 | a | ['a': iterator(''), 'b': iterator('b'), 'a': iterator('cd'), 'a': iterator('ce')] | 0 |
| 2 | b | ... | ... |
Walking through all characters, we see that "a", "acd", and "ace" are completed. Count = 3.
Example 2: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
After scanning s completely, only "ahjpjau" and "ja" are fully matched. Count = 2.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + total_chars_in_words) | Each character in s is processed once, and each character in every word is processed once when their expected character appears |
| Space | O(total_chars_in_words) | We store iterators for all words, which scales with the total number of characters in words |
This approach is highly efficient because it avoids scanning s multiple times for each word.
Test Cases
# Provided examples
assert Solution().numMatchingSubseq("abcde", ["a","bb","acd","ace"]) == 3 # Example 1
assert Solution().numMatchingSubseq("dsahjpjauf", ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]) == 2 # Example 2
# Edge cases
assert Solution().numMatchingSubseq("", ["a"]) == 0 # empty s
assert Solution().numMatchingSubseq("abc", [""]) == 1 # empty word
assert Solution().numMatchingSubseq("abc", ["abc", "acb"]) == 1 # one valid, one invalid
assert Solution().numMatchingSubseq("aaa", ["aa","aaa","aaaa"]) == 2 # repeated chars, longer than s ignored
assert Solution().numMatchingSubseq("abcdefghijklmnopqrstuvwxyz", ["a","z","az","za","abc","xyz"]) == 5
| Test | Why |
|---|---|
"" as s |
Tests empty string handling |
| "" as word | Empty word is always a subsequence |
| One valid, one invalid | Confirms subsequence order checking |
| Repeated characters | Handles words longer than s correctly |
| Alphabet coverage | Checks multiple positions and character matching |
Edge Cases
One edge case is when s is empty. In this situation, no non-empty word can be a subsequence, and the algorithm correctly returns 0.
Another edge case is empty strings in words. According to the subsequence definition, an empty string is a subsequence of any string. The algorithm handles this because iterators of length 0 immediately contribute to the count.
A third edge case involves repeated characters or words longer than s. The algorithm correctly skips words that cannot match because the iterator never reaches completion, and repeated characters in s help