LeetCode 1032 - Stream of Characters

The problem asks us to design a data structure that processes a stream of characters one at a time. After each newly added character, we must determine whether any suffix of the stream matches one of the words from a predefined dictionary.

LeetCode Problem 1032

Difficulty: 🔴 Hard
Topics: Array, String, Design, Trie, Data Stream

Solution

Problem Understanding

The problem asks us to design a data structure that processes a stream of characters one at a time. After each newly added character, we must determine whether any suffix of the stream matches one of the words from a predefined dictionary.

A suffix means the ending portion of the stream. For example, if the stream is "axyz", then its suffixes are:

  • "z"
  • "yz"
  • "xyz"
  • "axyz"

If our dictionary contains "xyz", then after receiving the final 'z', the query should return true.

The important detail is that we are not checking whether the entire stream matches a word. We only care whether the stream ends with a valid dictionary word.

The input consists of:

  • An array words, containing all valid words
  • A sequence of query(letter) calls, each adding one character to the stream

For every query, we return:

  • true if at least one suffix of the current stream exists in words
  • false otherwise

The constraints are large enough that naive solutions become inefficient:

  • Up to 2000 words
  • Each word can have length up to 200
  • Up to 40,000 queries

A straightforward implementation that repeatedly checks every word against the current stream would perform too much repeated work.

One extremely important observation is that the maximum word length is only 200. That means any suffix longer than 200 characters can never match a word. This allows us to limit how much stream history we keep.

Several edge cases are important:

  • Single-character words such as "f" must match immediately after the corresponding query.
  • Multiple words can share suffixes or prefixes.
  • The stream may become extremely long, but only the most recent 200 characters matter.
  • Words can overlap in matching patterns.
  • A query can produce true multiple times in a row for different suffixes.

Approaches

Brute Force Approach

The most direct solution is to maintain the full stream as a string. After each query, we iterate through every word in words and check whether the stream ends with that word.

For example:

words = ["cd", "f", "kl"]
stream = "abcd"

We would test:

  • Does "abcd" end with "cd"?
  • Does "abcd" end with "f"?
  • Does "abcd" end with "kl"?

If any check succeeds, we return true.

This approach is correct because it explicitly checks every possible dictionary word against the current stream suffix.

However, it is too slow.

Suppose:

  • 2000 words
  • each word length up to 200
  • 40,000 queries

In the worst case, each query may compare against all words, producing roughly:

40,000 × 2,000 × 200

This becomes far too expensive.

Key Insight for the Optimal Solution

The critical observation is that we only care about suffixes of the stream.

Normally, tries are built forward to support prefix matching. Here we need suffix matching, so we reverse the idea:

  • Reverse every word before inserting into the trie
  • Traverse the stream backward during queries

For example:

word = "cd"
reversed = "dc"

If the stream becomes:

"...cd"

then traversing backward from the latest character gives:

d -> c

which matches the reversed trie path.

This transforms a suffix problem into a prefix traversal problem inside the trie.

The trie allows us to stop early whenever a path no longer exists, avoiding unnecessary comparisons.

We also only need to keep the most recent maxWordLength characters from the stream, because longer suffixes cannot match anything.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(Q × W × L) O(S) Checks every word against the stream after each query
Optimal Trie Solution O(Q × L) O(total word characters + L) Uses reversed trie and bounded stream history

Where:

  • Q = number of queries
  • W = number of words
  • L = maximum word length
  • S = total stream length

Algorithm Walkthrough

1. Build a Reversed Trie

We create a trie where every word is inserted in reverse order.

For example:

"cd" -> "d" -> "c"
"kl" -> "l" -> "k"

Each trie node stores:

  • child pointers
  • a boolean indicating whether a complete word ends there

Reversing the words is essential because we want to match stream suffixes efficiently.

2. Track the Maximum Word Length

While inserting words, we compute:

max_word_length

This tells us the longest possible suffix that could ever matter.

Any characters older than this limit can be discarded safely.

3. Maintain the Stream

For every query:

  • append the new character to the stream
  • if the stream exceeds max_word_length, remove the oldest character

This keeps memory bounded.

4. Traverse the Stream Backward

After adding a character, we start from the trie root and scan the stream in reverse order.

For example:

stream = "abcd"

We examine:

d -> c -> b -> a

At each step:

  • if the current character does not exist in the trie, stop immediately and return false
  • otherwise move deeper into the trie
  • if we reach a node marked as a word ending, return true

5. Return the Result

If traversal finishes without finding a completed word, return false.

Why it works

The trie contains every dictionary word in reversed form. Traversing the stream backward means we are effectively checking every possible suffix simultaneously.

Whenever traversal reaches a node marked as a complete word, it means the corresponding suffix exists in the stream.

Because the trie only follows valid dictionary paths, unnecessary comparisons are avoided.

Python Solution

from typing import List
from collections import deque

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class StreamChecker:

    def __init__(self, words: List[str]):
        self.root = TrieNode()
        self.stream = deque()
        self.max_length = 0

        for word in words:
            self.max_length = max(self.max_length, len(word))

            node = self.root

            for char in reversed(word):
                if char not in node.children:
                    node.children[char] = TrieNode()

                node = node.children[char]

            node.is_word = True

    def query(self, letter: str) -> bool:
        self.stream.append(letter)

        if len(self.stream) > self.max_length:
            self.stream.popleft()

        node = self.root

        for char in reversed(self.stream):
            if char not in node.children:
                return False

            node = node.children[char]

            if node.is_word:
                return True

        return False

# Your StreamChecker object will be instantiated and called as such:
# obj = StreamChecker(words)
# param_1 = obj.query(letter)

The implementation begins with a trie node class containing a dictionary of children and a boolean flag indicating whether a word terminates at that node.

During initialization, every word is inserted in reversed order. Reversing the words is what enables efficient suffix checking during queries.

The stream itself is stored using a deque, which allows efficient insertion at the end and removal from the front.

The query method appends the new character, trims the stream if necessary, and then walks backward through the stream while traversing the trie.

If traversal reaches a trie node marked as is_word, we immediately return True because a suffix match has been found.

If traversal encounters a missing trie edge, we can stop early because no longer suffix can possibly match.

Go Solution

package main

type TrieNode struct {
	children [26]*TrieNode
	isWord   bool
}

type StreamChecker struct {
	root      *TrieNode
	stream    []byte
	maxLength int
}

func Constructor(words []string) StreamChecker {
	root := &TrieNode{}
	maxLength := 0

	for _, word := range words {
		if len(word) > maxLength {
			maxLength = len(word)
		}

		node := root

		for i := len(word) - 1; i >= 0; i-- {
			index := word[i] - 'a'

			if node.children[index] == nil {
				node.children[index] = &TrieNode{}
			}

			node = node.children[index]
		}

		node.isWord = true
	}

	return StreamChecker{
		root:      root,
		stream:    []byte{},
		maxLength: maxLength,
	}
}

func (this *StreamChecker) Query(letter byte) bool {
	this.stream = append(this.stream, letter)

	if len(this.stream) > this.maxLength {
		this.stream = this.stream[1:]
	}

	node := this.root

	for i := len(this.stream) - 1; i >= 0; i-- {
		index := this.stream[i] - 'a'

		if node.children[index] == nil {
			return false
		}

		node = node.children[index]

		if node.isWord {
			return true
		}
	}

	return false
}

/**
 * Your StreamChecker object will be instantiated and called as such:
 * obj := Constructor(words);
 * param_1 := obj.Query(letter);
 */

The Go implementation follows the same algorithmic structure as the Python version, but uses fixed-size arrays of length 26 for trie children instead of hash maps. This improves performance because the alphabet contains only lowercase English letters.

The stream is maintained as a byte slice. When the stream exceeds the maximum relevant length, the oldest character is removed by slicing.

Go does not require special handling for integer overflow here because all values remain comfortably within standard integer limits.

Worked Examples

Example 1

words = ["cd", "f", "kl"]

Reversed Trie Structure

root
 ├── d
 │    └── c (word)
 ├── f (word)
 └── l
      └── k (word)

Query Trace

Query Stream Backward Traversal Match Result
a a a not found None false
b ab b not found None false
c abc c not found None false
d abcd d -> c "cd" true
e abcde e not found None false
f abcdef f "f" true
g abcdefg g not found None false
h abcdefgh h not found None false
i abcdefghi i not found None false
j abcdefghij j not found None false
k abcdefghijk k not found None false
l abcdefghijkl l -> k "kl" true

Detailed Walkthrough for Query('d')

Current stream:

abcd

Traverse backward:

d

Trie contains child 'd'.

Move deeper:

d -> c

Next stream character is 'c'.

Trie node 'c' is marked as a complete word.

Therefore:

suffix = "cd"

Return true.

Complexity Analysis

Measure Complexity Explanation
Time O(L) per query At most max_word_length characters are traversed
Space O(total word characters + L) Trie storage plus bounded stream history

The trie contains one node per inserted character in the worst case, so total trie size is proportional to the combined lengths of all words.

Each query examines at most L characters, where L is the maximum word length. Since L <= 200, queries remain efficient even for 40,000 operations.

Test Cases

# Provided example
sc = StreamChecker(["cd", "f", "kl"])

assert sc.query("a") == False  # no match
assert sc.query("b") == False  # no match
assert sc.query("c") == False  # partial only
assert sc.query("d") == True   # matches "cd"
assert sc.query("e") == False  # reset path
assert sc.query("f") == True   # single character word
assert sc.query("g") == False
assert sc.query("h") == False
assert sc.query("i") == False
assert sc.query("j") == False
assert sc.query("k") == False  # partial for "kl"
assert sc.query("l") == True   # matches "kl"

# Single word
sc = StreamChecker(["abc"])

assert sc.query("a") == False
assert sc.query("b") == False
assert sc.query("c") == True   # exact suffix match

# Multiple overlapping words
sc = StreamChecker(["a", "ab", "abc"])

assert sc.query("a") == True   # matches "a"
assert sc.query("b") == True   # matches "ab"
assert sc.query("c") == True   # matches "abc"

# Long stream with trimming
sc = StreamChecker(["xyz"])

for ch in "abcdefghijklmnop":
    assert sc.query(ch) == False

assert sc.query("x") == False
assert sc.query("y") == False
assert sc.query("z") == True   # matches after long irrelevant prefix

# Repeated characters
sc = StreamChecker(["aaa"])

assert sc.query("a") == False
assert sc.query("a") == False
assert sc.query("a") == True

# Multiple possible suffixes
sc = StreamChecker(["bc", "abc"])

assert sc.query("a") == False
assert sc.query("b") == False
assert sc.query("c") == True   # both "bc" and "abc" exist

# Single character dictionary
sc = StreamChecker(["z"])

assert sc.query("z") == True
assert sc.query("a") == False

Test Case Summary

Test Why
Problem example Validates standard functionality
Single word Tests exact suffix formation
Overlapping words Ensures trie handles shared paths
Long stream Verifies stream trimming logic
Repeated characters Tests repeated traversal paths
Multiple suffix matches Ensures earliest valid suffix detection
Single character word Confirms immediate matching works

Edge Cases

One important edge case occurs when the dictionary contains single-character words. A buggy implementation might only check longer suffixes or delay validation until multiple characters exist. In this solution, every query immediately checks whether the first traversed trie node is marked as a word ending, so single-character matches work correctly.

Another tricky case is extremely long streams. Since up to 40,000 queries may occur, storing the entire stream indefinitely would waste memory and slow traversal. The implementation avoids this by retaining only the last max_word_length characters. Any older characters cannot contribute to a valid suffix match.

Overlapping words are another common source of bugs. Consider words like "a", "ab", and "abc". A naive traversal might stop too early or fail to recognize shorter valid matches while traversing toward longer ones. This implementation checks is_word at every trie node during traversal, ensuring all valid suffixes are detected correctly.

A final subtle case occurs when traversal partially matches a trie path but then fails. For example, if the trie contains "abcd" and the stream ends with "abce", traversal succeeds for several characters before failing at 'e'. The implementation immediately returns false once a trie edge is missing, preventing unnecessary work while preserving correctness.