LeetCode 642 - Design Search Autocomplete System
The problem asks us to design an autocomplete system that behaves similarly to a search engine suggestion feature. The system starts with a collection of historical sentences and the number of times each sentence has been typed before.
Difficulty: 🔴 Hard
Topics: String, Depth-First Search, Design, Trie, Sorting, Heap (Priority Queue), Data Stream
Solution
Problem Understanding
The problem asks us to design an autocomplete system that behaves similarly to a search engine suggestion feature. The system starts with a collection of historical sentences and the number of times each sentence has been typed before. As the user types characters one by one, the system must return up to three matching sentences that begin with the current input prefix.
The ranking rules are very important:
- Sentences with higher frequency should appear first.
- If two sentences have the same frequency, the sentence with the smaller ASCII order should come first.
- Only the top three results are returned.
The special character '#' signals the end of the current sentence. When this happens, the current input string becomes part of the historical data and its frequency increases by one. The system then resets and starts accepting a new sentence.
The input consists of:
sentences, a list of historical search stringstimes, wheretimes[i]is the frequency ofsentences[i]- A sequence of character inputs through repeated calls to
input(c)
The output for each character input is:
- A list of up to three autocomplete suggestions if
c != '#' - An empty list if
c == '#'
The constraints reveal several important details about the expected solution design.
The number of initial sentences is relatively small, at most 100, but each sentence can be fairly long. More importantly, the input() function can be called up to 5000 times. This means repeatedly scanning every sentence for every keystroke could become inefficient. The problem is fundamentally about efficient prefix lookup combined with ranking.
Several edge cases are important:
- Multiple sentences may share the same frequency, requiring ASCII ordering.
- The user may type a completely new sentence that does not currently exist.
- Prefixes may return fewer than three matches.
- Spaces are valid characters and must be handled correctly.
- The system must dynamically update itself after every completed sentence.
A naive implementation can easily make mistakes in sorting, prefix handling, or updating frequencies after '#'.
Approaches
Brute Force Approach
The simplest approach is to store all sentences in a hash map where the key is the sentence and the value is its frequency.
For every character input:
- Append the character to the current prefix.
- Scan every stored sentence.
- Check whether the sentence starts with the current prefix.
- Collect matching sentences.
- Sort them by:
- descending frequency
- ascending lexicographical order
- Return the top three.
When '#' is encountered:
- Add the completed sentence to the frequency map.
- Reset the current input buffer.
This approach is correct because it directly evaluates every possible candidate and applies the ranking rules exactly as stated.
However, it becomes inefficient because every keystroke requires scanning all stored sentences. As the dataset grows, repeated prefix matching and sorting become expensive.
Optimal Approach, Trie with Frequency Map
The key observation is that this problem is fundamentally a prefix search problem. Prefix problems are naturally suited for a Trie.
A Trie allows us to:
- Traverse characters one by one
- Quickly locate all sentences sharing a prefix
- Avoid scanning unrelated sentences
Each Trie node represents a prefix. We store:
- Child pointers
- A mapping of sentences to frequencies for that prefix
As sentences are inserted, every node along the path updates its frequency map.
When the user types a character:
- Traverse the Trie to the node representing the current prefix.
- Retrieve all candidate sentences stored at that node.
- Sort candidates by the required ranking rules.
- Return the top three.
This significantly reduces unnecessary scanning because only sentences sharing the prefix are considered.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N × L + N log N) per query | O(N × L) | Scans every sentence for each character |
| Optimal | O(P + K log K) per query | O(Total characters in Trie) | Trie narrows search to matching prefixes |
Where:
N= number of sentencesL= average sentence lengthP= prefix lengthK= number of matching candidates
Algorithm Walkthrough
Optimal Trie-Based Algorithm
- Create a Trie data structure where each node contains:
- A dictionary of child nodes
- A dictionary mapping sentences to frequencies
The child dictionary allows traversal character by character. The frequency map allows fast retrieval of candidate sentences for any prefix. 2. Insert every historical sentence into the Trie.
For each character in the sentence:
- Move to the corresponding child node, creating it if necessary.
- Update the node's sentence-frequency map with the sentence's count.
This ensures that every prefix node knows which sentences belong to that prefix. 3. Maintain two pieces of runtime state:
- The current input string
- A pointer to the current Trie node
These allow efficient incremental traversal as the user types.
4. When input(c) is called with a normal character:
- Append the character to the current input buffer.
- Move to the corresponding child node in the Trie.
- If the path does not exist, return an empty list because no sentence matches this prefix.
- Otherwise, retrieve all candidate sentences stored in the node.
- Sort the candidate sentences using:
- Higher frequency first
- Lexicographically smaller sentence first when frequencies tie
In Python, this is achieved with:
sorted(candidates, key=lambda x: (-freq[x], x))
- Return the first three sentences after sorting.
- When
input('#')is called:
- Insert the completed sentence into the Trie with frequency incremented by one.
- Reset the current input buffer.
- Reset the Trie traversal state.
- Return an empty list.
Why it works
The core invariant is that every Trie node stores all sentences sharing that node's prefix. Therefore, once traversal reaches the node representing the current input prefix, every valid autocomplete candidate is immediately available at that node. Sorting those candidates according to the problem rules guarantees the correct top three suggestions.
Python Solution
from typing import List
from collections import defaultdict
class TrieNode:
def __init__(self):
self.children = {}
self.sentences = defaultdict(int)
class AutocompleteSystem:
def __init__(self, sentences: List[str], times: List[int]):
self.root = TrieNode()
self.current_input = ""
self.current_node = self.root
for sentence, time in zip(sentences, times):
self._insert(sentence, time)
def _insert(self, sentence: str, frequency: int) -> None:
node = self.root
for char in sentence:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.sentences[sentence] += frequency
def input(self, c: str) -> List[str]:
if c == '#':
self._insert(self.current_input, 1)
self.current_input = ""
self.current_node = self.root
return []
self.current_input += c
if self.current_node and c in self.current_node.children:
self.current_node = self.current_node.children[c]
candidates = list(self.current_node.sentences.keys())
candidates.sort(
key=lambda sentence: (
-self.current_node.sentences[sentence],
sentence
)
)
return candidates[:3]
self.current_node = None
return []
The implementation begins with the TrieNode class. Each node stores:
children, which represents outgoing character edgessentences, which tracks sentence frequencies for the node's prefix
The constructor initializes the Trie and inserts all historical data.
The _insert() helper method walks through each character in the sentence. As it traverses the Trie, it updates the sentence frequency at every node along the path. This guarantees that each prefix node contains all matching sentences.
The input() method handles two cases.
When the user types '#', the current sentence is finalized and inserted into the Trie with frequency incremented by one. The system state is reset afterward.
For normal characters, the method advances through the Trie. If the prefix exists, it retrieves candidate sentences from the current node, sorts them using the ranking rules, and returns the top three.
If traversal fails because no matching prefix exists, the current node becomes None, and future characters continue returning empty lists until '#' resets the system.
Go Solution
package main
import (
"sort"
)
type TrieNode struct {
children map[byte]*TrieNode
sentences map[string]int
}
func NewTrieNode() *TrieNode {
return &TrieNode{
children: make(map[byte]*TrieNode),
sentences: make(map[string]int),
}
}
type AutocompleteSystem struct {
root *TrieNode
currentNode *TrieNode
currentInput string
}
func Constructor(sentences []string, times []int) AutocompleteSystem {
system := AutocompleteSystem{
root: NewTrieNode(),
}
system.currentNode = system.root
for i := 0; i < len(sentences); i++ {
system.insert(sentences[i], times[i])
}
return system
}
func (this *AutocompleteSystem) insert(sentence string, frequency int) {
node := this.root
for i := 0; i < len(sentence); i++ {
ch := sentence[i]
if _, exists := node.children[ch]; !exists {
node.children[ch] = NewTrieNode()
}
node = node.children[ch]
node.sentences[sentence] += frequency
}
}
func (this *AutocompleteSystem) Input(c byte) []string {
if c == '#' {
this.insert(this.currentInput, 1)
this.currentInput = ""
this.currentNode = this.root
return []string{}
}
this.currentInput += string(c)
if this.currentNode != nil {
if nextNode, exists := this.currentNode.children[c]; exists {
this.currentNode = nextNode
candidates := make([]string, 0)
for sentence := range this.currentNode.sentences {
candidates = append(candidates, sentence)
}
sort.Slice(candidates, func(i, j int) bool {
freqI := this.currentNode.sentences[candidates[i]]
freqJ := this.currentNode.sentences[candidates[j]]
if freqI == freqJ {
return candidates[i] < candidates[j]
}
return freqI > freqJ
})
if len(candidates) > 3 {
return candidates[:3]
}
return candidates
}
}
this.currentNode = nil
return []string{}
}
The Go implementation follows the same overall design but differs in a few language-specific details.
Go does not have built-in default dictionaries, so maps must be explicitly initialized.
Sorting is implemented using sort.Slice() with a custom comparator.
The Trie uses byte keys because the problem only contains lowercase letters, spaces, and '#'.
Unlike Python, Go distinguishes between nil slices and empty slices. Returning []string{} ensures the correct empty result format.
Worked Examples
Example 1
Input:
sentences = ["i love you", "island", "iroman", "i love leetcode"]
times = [5, 3, 2, 2]
Initial Trie contents:
| Sentence | Frequency |
|---|---|
| i love you | 5 |
| island | 3 |
| iroman | 2 |
| i love leetcode | 2 |
Step 1: input('i')
Current prefix:
"i"
Matching sentences:
| Sentence | Frequency |
|---|---|
| i love you | 5 |
| island | 3 |
| iroman | 2 |
| i love leetcode | 2 |
Sorting rules:
- Higher frequency first
- Lexicographically smaller first on ties
Between:
"iroman"
"i love leetcode"
ASCII comparison places "i love leetcode" first because space comes before 'r'.
Top three:
["i love you", "island", "i love leetcode"]
Step 2: input(' ')
Current prefix:
"i "
Matching sentences:
| Sentence | Frequency |
|---|---|
| i love you | 5 |
| i love leetcode | 2 |
Result:
["i love you", "i love leetcode"]
Step 3: input('a')
Current prefix:
"i a"
No matching sentences exist.
Result:
[]
Step 4: input('#')
The sentence:
"i a"
is inserted into the Trie with frequency 1.
Trie update:
| Sentence | Frequency |
|---|---|
| i a | 1 |
System resets current input.
Result:
[]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(P + K log K) | Trie traversal plus sorting matching candidates |
| Space | O(S) | Trie stores all sentence characters and prefix mappings |
Where:
Pis the prefix lengthKis the number of matching sentencesSis the total number of characters across all stored sentences
The Trie traversal itself is efficient because it only follows characters in the current prefix. The main remaining cost is sorting the candidate sentences stored at the prefix node.
In practice, this solution performs very well because autocomplete prefixes usually narrow the candidate set significantly.
Test Cases
from typing import List
# Example case from problem statement
system = AutocompleteSystem(
["i love you", "island", "iroman", "i love leetcode"],
[5, 3, 2, 2]
)
assert system.input("i") == [
"i love you",
"island",
"i love leetcode"
] # basic autocomplete ordering
assert system.input(" ") == [
"i love you",
"i love leetcode"
] # prefix with space
assert system.input("a") == [] # no matching prefix
assert system.input("#") == [] # finalize new sentence
# Test insertion of new sentence
assert system.input("i") == [
"i love you",
"island",
"i love leetcode"
]
assert system.input(" ") == [
"i love you",
"i love leetcode",
"i a"
] # newly inserted sentence appears
# Single sentence test
system2 = AutocompleteSystem(["hello"], [1])
assert system2.input("h") == ["hello"] # only one candidate
assert system2.input("#") == []
# Tie-breaking by ASCII order
system3 = AutocompleteSystem(
["abc", "abd"],
[2, 2]
)
assert system3.input("a") == ["abc", "abd"] # lexicographical ordering
# Multiple insertions increase frequency
system4 = AutocompleteSystem([], [])
assert system4.input("h") == []
assert system4.input("i") == []
assert system4.input("#") == []
assert system4.input("h") == ["hi"]
assert system4.input("i") == ["hi"]
assert system4.input("#") == []
assert system4.input("h") == ["hi"] # frequency updated correctly
# Prefix completely missing
system5 = AutocompleteSystem(["apple"], [3])
assert system5.input("z") == [] # nonexistent prefix
assert system5.input("#") == []
| Test | Why |
|---|---|
| Problem example | Verifies standard functionality |
| New sentence insertion | Ensures dynamic updates work |
| Single sentence | Tests minimal candidate set |
| ASCII tie-breaking | Verifies lexicographical ordering |
| Repeated insertion | Ensures frequencies increment properly |
| Missing prefix | Confirms empty result handling |
Edge Cases
Prefix Does Not Exist
A very common bug occurs when traversal reaches a character path that does not exist in the Trie. A naive implementation may continue searching or accidentally crash with a null reference.
This implementation handles the situation by setting current_node to None in Python or nil in Go. Once that happens, all future inputs for the current sentence immediately return empty results until '#' resets the system.
Frequency Tie Between Sentences
Two sentences may have identical frequencies. The problem requires ASCII ordering as the tie-breaker, which is easy to overlook.
For example:
"i love leetcode"
"iroman"
Both may have frequency 2, but the space character has smaller ASCII value than 'r', so "i love leetcode" must come first.
The implementation handles this using tuple-based sorting:
(-frequency, sentence)
This guarantees correct ordering.
Adding Completely New Sentences
The user may type a sentence that never existed before. A buggy implementation may fail to insert it correctly because intermediate Trie nodes are missing.
The _insert() method dynamically creates missing nodes while traversing characters. This ensures new sentences become searchable immediately after the user enters '#'.
Prefix With Fewer Than Three Matches
The problem requires returning as many results as exist, not exactly three.
The implementation safely uses slicing:
candidates[:3]
If fewer than three candidates exist, Python automatically returns the available subset without errors. The Go version similarly checks the slice length before truncation.