LeetCode 1032 - Stream of Characters
The problem asks us to design a data structure that processes a stream of characters one at a time. After each newly added character, we must determine whether any suffix of the stream matches one of the words from a predefined dictionary.
Difficulty: 🔴 Hard
Topics: Array, String, Design, Trie, Data Stream
Solution
Problem Understanding
The problem asks us to design a data structure that processes a stream of characters one at a time. After each newly added character, we must determine whether any suffix of the stream matches one of the words from a predefined dictionary.
A suffix means the ending portion of the stream. For example, if the stream is "axyz", then its suffixes are:
"z""yz""xyz""axyz"
If our dictionary contains "xyz", then after receiving the final 'z', the query should return true.
The important detail is that we are not checking whether the entire stream matches a word. We only care whether the stream ends with a valid dictionary word.
The input consists of:
- An array
words, containing all valid words - A sequence of
query(letter)calls, each adding one character to the stream
For every query, we return:
trueif at least one suffix of the current stream exists inwordsfalseotherwise
The constraints are large enough that naive solutions become inefficient:
- Up to 2000 words
- Each word can have length up to 200
- Up to 40,000 queries
A straightforward implementation that repeatedly checks every word against the current stream would perform too much repeated work.
One extremely important observation is that the maximum word length is only 200. That means any suffix longer than 200 characters can never match a word. This allows us to limit how much stream history we keep.
Several edge cases are important:
- Single-character words such as
"f"must match immediately after the corresponding query. - Multiple words can share suffixes or prefixes.
- The stream may become extremely long, but only the most recent 200 characters matter.
- Words can overlap in matching patterns.
- A query can produce
truemultiple times in a row for different suffixes.
Approaches
Brute Force Approach
The most direct solution is to maintain the full stream as a string. After each query, we iterate through every word in words and check whether the stream ends with that word.
For example:
words = ["cd", "f", "kl"]
stream = "abcd"
We would test:
- Does
"abcd"end with"cd"? - Does
"abcd"end with"f"? - Does
"abcd"end with"kl"?
If any check succeeds, we return true.
This approach is correct because it explicitly checks every possible dictionary word against the current stream suffix.
However, it is too slow.
Suppose:
- 2000 words
- each word length up to 200
- 40,000 queries
In the worst case, each query may compare against all words, producing roughly:
40,000 × 2,000 × 200
This becomes far too expensive.
Key Insight for the Optimal Solution
The critical observation is that we only care about suffixes of the stream.
Normally, tries are built forward to support prefix matching. Here we need suffix matching, so we reverse the idea:
- Reverse every word before inserting into the trie
- Traverse the stream backward during queries
For example:
word = "cd"
reversed = "dc"
If the stream becomes:
"...cd"
then traversing backward from the latest character gives:
d -> c
which matches the reversed trie path.
This transforms a suffix problem into a prefix traversal problem inside the trie.
The trie allows us to stop early whenever a path no longer exists, avoiding unnecessary comparisons.
We also only need to keep the most recent maxWordLength characters from the stream, because longer suffixes cannot match anything.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(Q × W × L) | O(S) | Checks every word against the stream after each query |
| Optimal Trie Solution | O(Q × L) | O(total word characters + L) | Uses reversed trie and bounded stream history |
Where:
Q= number of queriesW= number of wordsL= maximum word lengthS= total stream length
Algorithm Walkthrough
1. Build a Reversed Trie
We create a trie where every word is inserted in reverse order.
For example:
"cd" -> "d" -> "c"
"kl" -> "l" -> "k"
Each trie node stores:
- child pointers
- a boolean indicating whether a complete word ends there
Reversing the words is essential because we want to match stream suffixes efficiently.
2. Track the Maximum Word Length
While inserting words, we compute:
max_word_length
This tells us the longest possible suffix that could ever matter.
Any characters older than this limit can be discarded safely.
3. Maintain the Stream
For every query:
- append the new character to the stream
- if the stream exceeds
max_word_length, remove the oldest character
This keeps memory bounded.
4. Traverse the Stream Backward
After adding a character, we start from the trie root and scan the stream in reverse order.
For example:
stream = "abcd"
We examine:
d -> c -> b -> a
At each step:
- if the current character does not exist in the trie, stop immediately and return
false - otherwise move deeper into the trie
- if we reach a node marked as a word ending, return
true
5. Return the Result
If traversal finishes without finding a completed word, return false.
Why it works
The trie contains every dictionary word in reversed form. Traversing the stream backward means we are effectively checking every possible suffix simultaneously.
Whenever traversal reaches a node marked as a complete word, it means the corresponding suffix exists in the stream.
Because the trie only follows valid dictionary paths, unnecessary comparisons are avoided.
Python Solution
from typing import List
from collections import deque
class TrieNode:
def __init__(self):
self.children = {}
self.is_word = False
class StreamChecker:
def __init__(self, words: List[str]):
self.root = TrieNode()
self.stream = deque()
self.max_length = 0
for word in words:
self.max_length = max(self.max_length, len(word))
node = self.root
for char in reversed(word):
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_word = True
def query(self, letter: str) -> bool:
self.stream.append(letter)
if len(self.stream) > self.max_length:
self.stream.popleft()
node = self.root
for char in reversed(self.stream):
if char not in node.children:
return False
node = node.children[char]
if node.is_word:
return True
return False
# Your StreamChecker object will be instantiated and called as such:
# obj = StreamChecker(words)
# param_1 = obj.query(letter)
The implementation begins with a trie node class containing a dictionary of children and a boolean flag indicating whether a word terminates at that node.
During initialization, every word is inserted in reversed order. Reversing the words is what enables efficient suffix checking during queries.
The stream itself is stored using a deque, which allows efficient insertion at the end and removal from the front.
The query method appends the new character, trims the stream if necessary, and then walks backward through the stream while traversing the trie.
If traversal reaches a trie node marked as is_word, we immediately return True because a suffix match has been found.
If traversal encounters a missing trie edge, we can stop early because no longer suffix can possibly match.
Go Solution
package main
type TrieNode struct {
children [26]*TrieNode
isWord bool
}
type StreamChecker struct {
root *TrieNode
stream []byte
maxLength int
}
func Constructor(words []string) StreamChecker {
root := &TrieNode{}
maxLength := 0
for _, word := range words {
if len(word) > maxLength {
maxLength = len(word)
}
node := root
for i := len(word) - 1; i >= 0; i-- {
index := word[i] - 'a'
if node.children[index] == nil {
node.children[index] = &TrieNode{}
}
node = node.children[index]
}
node.isWord = true
}
return StreamChecker{
root: root,
stream: []byte{},
maxLength: maxLength,
}
}
func (this *StreamChecker) Query(letter byte) bool {
this.stream = append(this.stream, letter)
if len(this.stream) > this.maxLength {
this.stream = this.stream[1:]
}
node := this.root
for i := len(this.stream) - 1; i >= 0; i-- {
index := this.stream[i] - 'a'
if node.children[index] == nil {
return false
}
node = node.children[index]
if node.isWord {
return true
}
}
return false
}
/**
* Your StreamChecker object will be instantiated and called as such:
* obj := Constructor(words);
* param_1 := obj.Query(letter);
*/
The Go implementation follows the same algorithmic structure as the Python version, but uses fixed-size arrays of length 26 for trie children instead of hash maps. This improves performance because the alphabet contains only lowercase English letters.
The stream is maintained as a byte slice. When the stream exceeds the maximum relevant length, the oldest character is removed by slicing.
Go does not require special handling for integer overflow here because all values remain comfortably within standard integer limits.
Worked Examples
Example 1
words = ["cd", "f", "kl"]
Reversed Trie Structure
root
├── d
│ └── c (word)
├── f (word)
└── l
└── k (word)
Query Trace
| Query | Stream | Backward Traversal | Match | Result |
|---|---|---|---|---|
| a | a | a not found | None | false |
| b | ab | b not found | None | false |
| c | abc | c not found | None | false |
| d | abcd | d -> c | "cd" | true |
| e | abcde | e not found | None | false |
| f | abcdef | f | "f" | true |
| g | abcdefg | g not found | None | false |
| h | abcdefgh | h not found | None | false |
| i | abcdefghi | i not found | None | false |
| j | abcdefghij | j not found | None | false |
| k | abcdefghijk | k not found | None | false |
| l | abcdefghijkl | l -> k | "kl" | true |
Detailed Walkthrough for Query('d')
Current stream:
abcd
Traverse backward:
d
Trie contains child 'd'.
Move deeper:
d -> c
Next stream character is 'c'.
Trie node 'c' is marked as a complete word.
Therefore:
suffix = "cd"
Return true.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(L) per query | At most max_word_length characters are traversed |
| Space | O(total word characters + L) | Trie storage plus bounded stream history |
The trie contains one node per inserted character in the worst case, so total trie size is proportional to the combined lengths of all words.
Each query examines at most L characters, where L is the maximum word length. Since L <= 200, queries remain efficient even for 40,000 operations.
Test Cases
# Provided example
sc = StreamChecker(["cd", "f", "kl"])
assert sc.query("a") == False # no match
assert sc.query("b") == False # no match
assert sc.query("c") == False # partial only
assert sc.query("d") == True # matches "cd"
assert sc.query("e") == False # reset path
assert sc.query("f") == True # single character word
assert sc.query("g") == False
assert sc.query("h") == False
assert sc.query("i") == False
assert sc.query("j") == False
assert sc.query("k") == False # partial for "kl"
assert sc.query("l") == True # matches "kl"
# Single word
sc = StreamChecker(["abc"])
assert sc.query("a") == False
assert sc.query("b") == False
assert sc.query("c") == True # exact suffix match
# Multiple overlapping words
sc = StreamChecker(["a", "ab", "abc"])
assert sc.query("a") == True # matches "a"
assert sc.query("b") == True # matches "ab"
assert sc.query("c") == True # matches "abc"
# Long stream with trimming
sc = StreamChecker(["xyz"])
for ch in "abcdefghijklmnop":
assert sc.query(ch) == False
assert sc.query("x") == False
assert sc.query("y") == False
assert sc.query("z") == True # matches after long irrelevant prefix
# Repeated characters
sc = StreamChecker(["aaa"])
assert sc.query("a") == False
assert sc.query("a") == False
assert sc.query("a") == True
# Multiple possible suffixes
sc = StreamChecker(["bc", "abc"])
assert sc.query("a") == False
assert sc.query("b") == False
assert sc.query("c") == True # both "bc" and "abc" exist
# Single character dictionary
sc = StreamChecker(["z"])
assert sc.query("z") == True
assert sc.query("a") == False
Test Case Summary
| Test | Why |
|---|---|
| Problem example | Validates standard functionality |
| Single word | Tests exact suffix formation |
| Overlapping words | Ensures trie handles shared paths |
| Long stream | Verifies stream trimming logic |
| Repeated characters | Tests repeated traversal paths |
| Multiple suffix matches | Ensures earliest valid suffix detection |
| Single character word | Confirms immediate matching works |
Edge Cases
One important edge case occurs when the dictionary contains single-character words. A buggy implementation might only check longer suffixes or delay validation until multiple characters exist. In this solution, every query immediately checks whether the first traversed trie node is marked as a word ending, so single-character matches work correctly.
Another tricky case is extremely long streams. Since up to 40,000 queries may occur, storing the entire stream indefinitely would waste memory and slow traversal. The implementation avoids this by retaining only the last max_word_length characters. Any older characters cannot contribute to a valid suffix match.
Overlapping words are another common source of bugs. Consider words like "a", "ab", and "abc". A naive traversal might stop too early or fail to recognize shorter valid matches while traversing toward longer ones. This implementation checks is_word at every trie node during traversal, ensuring all valid suffixes are detected correctly.
A final subtle case occurs when traversal partially matches a trie path but then fails. For example, if the trie contains "abcd" and the stream ends with "abce", traversal succeeds for several characters before failing at 'e'. The implementation immediately returns false once a trie edge is missing, preventing unnecessary work while preserving correctness.