LeetCode 792 - Number of Matching Subsequences

The problem asks us to count how many words in a given list are subsequences of a string s. A subsequence is formed by deleting zero or more characters from the string without changing the order of the remaining characters.

LeetCode Problem 792

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Binary Search, Dynamic Programming, Trie, Sorting

Solution

Problem Understanding

The problem asks us to count how many words in a given list are subsequences of a string s. A subsequence is formed by deleting zero or more characters from the string without changing the order of the remaining characters. For instance, "ace" is a subsequence of "abcde" because we can remove "b" and "d" and retain "a", "c", "e" in the correct order.

The inputs are:

  1. s - a string of length up to 50,000, containing only lowercase letters.
  2. words - a list of strings, each up to length 50, and the list can have up to 5,000 words.

The expected output is the number of strings in words that are subsequences of s.

Key constraints suggest that checking each word character by character against s (naively) could be too slow because iterating 50,000 characters for 5,000 words is potentially 250 million operations. We need an approach that reduces repeated scanning of s.

Important edge cases include empty words, words that are longer than s, words that contain characters not present in s, and repeated words in words.

Approaches

The brute-force approach would be to check each word in words to see if it is a subsequence of s by iterating over s and matching characters sequentially. This guarantees correctness but is too slow because for each word of length m, we may scan up to n characters of s. The time complexity is roughly O(n * m * len(words)), which can reach O(50,000 * 50 * 5,000) in the worst case, which is impractical.

The optimal approach uses the insight that we can preprocess s to speed up subsequence checks. One effective strategy is bucketing words by the character they are waiting for. Essentially, for each character in s, we maintain a list of iterators for words that are waiting for that character. When a character is seen in s, we advance all corresponding iterators. This avoids scanning s repeatedly for every word.

This approach transforms the problem into a linear scan of s with efficient word progress tracking. Each character in each word is processed exactly once, giving a much faster solution.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * m * k) O(1) Check each word against s sequentially; too slow for large inputs
Optimal (Iterator Buckets) O(n + total_chars_in_words) O(total_chars_in_words) Uses queues to track progress of each word waiting for its next character

Algorithm Walkthrough

  1. Initialize a dictionary (or array of lists) called waiting to map each character 'a' to 'z' to a list of iterators. Each iterator corresponds to a word waiting for its next character to appear in s.

  2. Populate waiting by iterating through each word in words. For each word, take its first character and append an iterator (or index) that tracks the word's position to the corresponding bucket in waiting.

  3. Initialize a counter count to zero. This will track how many words have been fully matched.

  4. Iterate over each character c in s. For each character:

  5. Retrieve the list of iterators waiting for c and reset that bucket to empty.

  6. For each iterator in the list:

  7. Advance it to the next character in the word.

  8. If there is a next character, append it to the corresponding bucket in waiting.

  9. If there is no next character, increment count because this word is fully matched.

  10. Return count.

Why it works: Each word is only processed when its expected character is encountered in s. Every character of each word is considered exactly once, so all subsequences are counted correctly without redundant scanning of s.

Python Solution

from collections import defaultdict, deque
from typing import List

class Solution:
    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
        waiting = defaultdict(deque)
        for word in words:
            waiting[word[0]].append(iter(word[1:]))
        
        count = 0
        for char in s:
            current_waiting = waiting[char]
            waiting[char] = deque()
            while current_waiting:
                it = current_waiting.popleft()
                nxt = next(it, None)
                if nxt is None:
                    count += 1
                else:
                    waiting[nxt].append(it)
        return count

This implementation first creates a mapping from characters to word iterators that are waiting for that character. As we scan s, we advance iterators. If an iterator has no next character, we increment the count. Using a deque ensures efficient pop from the left and append operations.

Go Solution

package main

func numMatchingSubseq(s string, words []string) int {
	waiting := make(map[byte][]*WordIterator)
	for _, word := range words {
		it := &WordIterator{word: word, index: 1}
		waiting[word[0]] = append(waiting[word[0]], it)
	}

	count := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		current := waiting[c]
		waiting[c] = []*WordIterator{}
		for _, it := range current {
			if it.index == len(it.word) {
				count++
			} else {
				waiting[it.word[it.index]] = append(waiting[it.word[it.index]], &WordIterator{word: it.word, index: it.index + 1})
			}
		}
	}
	return count
}

type WordIterator struct {
	word  string
	index int
}

In Go, we simulate the iterator with a struct containing the word and the next index to check. This allows similar behavior to Python iterators. Go slices are used for queues, and we carefully manage indices to avoid out-of-bounds errors.

Worked Examples

Example 1: s = "abcde", words = ["a","bb","acd","ace"]

Step char in s waiting state count
1 a ['a': iterator(''), 'b': iterator('b'), 'a': iterator('cd'), 'a': iterator('ce')] 0
2 b ... ...

Walking through all characters, we see that "a", "acd", and "ace" are completed. Count = 3.

Example 2: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]

After scanning s completely, only "ahjpjau" and "ja" are fully matched. Count = 2.

Complexity Analysis

Measure Complexity Explanation
Time O(n + total_chars_in_words) Each character in s is processed once, and each character in every word is processed once when their expected character appears
Space O(total_chars_in_words) We store iterators for all words, which scales with the total number of characters in words

This approach is highly efficient because it avoids scanning s multiple times for each word.

Test Cases

# Provided examples
assert Solution().numMatchingSubseq("abcde", ["a","bb","acd","ace"]) == 3  # Example 1
assert Solution().numMatchingSubseq("dsahjpjauf", ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]) == 2  # Example 2

# Edge cases
assert Solution().numMatchingSubseq("", ["a"]) == 0  # empty s
assert Solution().numMatchingSubseq("abc", [""]) == 1  # empty word
assert Solution().numMatchingSubseq("abc", ["abc", "acb"]) == 1  # one valid, one invalid
assert Solution().numMatchingSubseq("aaa", ["aa","aaa","aaaa"]) == 2  # repeated chars, longer than s ignored
assert Solution().numMatchingSubseq("abcdefghijklmnopqrstuvwxyz", ["a","z","az","za","abc","xyz"]) == 5
Test Why
"" as s Tests empty string handling
"" as word Empty word is always a subsequence
One valid, one invalid Confirms subsequence order checking
Repeated characters Handles words longer than s correctly
Alphabet coverage Checks multiple positions and character matching

Edge Cases

One edge case is when s is empty. In this situation, no non-empty word can be a subsequence, and the algorithm correctly returns 0.

Another edge case is empty strings in words. According to the subsequence definition, an empty string is a subsequence of any string. The algorithm handles this because iterators of length 0 immediately contribute to the count.

A third edge case involves repeated characters or words longer than s. The algorithm correctly skips words that cannot match because the iterator never reaches completion, and repeated characters in s help