LeetCode 642 - Design Search Autocomplete System

The problem asks us to design an autocomplete system that behaves similarly to a search engine suggestion feature. The system starts with a collection of historical sentences and the number of times each sentence has been typed before.

LeetCode Problem 642

Difficulty: 🔴 Hard
Topics: String, Depth-First Search, Design, Trie, Sorting, Heap (Priority Queue), Data Stream

Solution

Problem Understanding

The problem asks us to design an autocomplete system that behaves similarly to a search engine suggestion feature. The system starts with a collection of historical sentences and the number of times each sentence has been typed before. As the user types characters one by one, the system must return up to three matching sentences that begin with the current input prefix.

The ranking rules are very important:

  • Sentences with higher frequency should appear first.
  • If two sentences have the same frequency, the sentence with the smaller ASCII order should come first.
  • Only the top three results are returned.

The special character '#' signals the end of the current sentence. When this happens, the current input string becomes part of the historical data and its frequency increases by one. The system then resets and starts accepting a new sentence.

The input consists of:

  • sentences, a list of historical search strings
  • times, where times[i] is the frequency of sentences[i]
  • A sequence of character inputs through repeated calls to input(c)

The output for each character input is:

  • A list of up to three autocomplete suggestions if c != '#'
  • An empty list if c == '#'

The constraints reveal several important details about the expected solution design.

The number of initial sentences is relatively small, at most 100, but each sentence can be fairly long. More importantly, the input() function can be called up to 5000 times. This means repeatedly scanning every sentence for every keystroke could become inefficient. The problem is fundamentally about efficient prefix lookup combined with ranking.

Several edge cases are important:

  • Multiple sentences may share the same frequency, requiring ASCII ordering.
  • The user may type a completely new sentence that does not currently exist.
  • Prefixes may return fewer than three matches.
  • Spaces are valid characters and must be handled correctly.
  • The system must dynamically update itself after every completed sentence.

A naive implementation can easily make mistakes in sorting, prefix handling, or updating frequencies after '#'.

Approaches

Brute Force Approach

The simplest approach is to store all sentences in a hash map where the key is the sentence and the value is its frequency.

For every character input:

  1. Append the character to the current prefix.
  2. Scan every stored sentence.
  3. Check whether the sentence starts with the current prefix.
  4. Collect matching sentences.
  5. Sort them by:
  • descending frequency
  • ascending lexicographical order
  1. Return the top three.

When '#' is encountered:

  1. Add the completed sentence to the frequency map.
  2. Reset the current input buffer.

This approach is correct because it directly evaluates every possible candidate and applies the ranking rules exactly as stated.

However, it becomes inefficient because every keystroke requires scanning all stored sentences. As the dataset grows, repeated prefix matching and sorting become expensive.

Optimal Approach, Trie with Frequency Map

The key observation is that this problem is fundamentally a prefix search problem. Prefix problems are naturally suited for a Trie.

A Trie allows us to:

  • Traverse characters one by one
  • Quickly locate all sentences sharing a prefix
  • Avoid scanning unrelated sentences

Each Trie node represents a prefix. We store:

  • Child pointers
  • A mapping of sentences to frequencies for that prefix

As sentences are inserted, every node along the path updates its frequency map.

When the user types a character:

  1. Traverse the Trie to the node representing the current prefix.
  2. Retrieve all candidate sentences stored at that node.
  3. Sort candidates by the required ranking rules.
  4. Return the top three.

This significantly reduces unnecessary scanning because only sentences sharing the prefix are considered.

Approach Time Complexity Space Complexity Notes
Brute Force O(N × L + N log N) per query O(N × L) Scans every sentence for each character
Optimal O(P + K log K) per query O(Total characters in Trie) Trie narrows search to matching prefixes

Where:

  • N = number of sentences
  • L = average sentence length
  • P = prefix length
  • K = number of matching candidates

Algorithm Walkthrough

Optimal Trie-Based Algorithm

  1. Create a Trie data structure where each node contains:
  • A dictionary of child nodes
  • A dictionary mapping sentences to frequencies

The child dictionary allows traversal character by character. The frequency map allows fast retrieval of candidate sentences for any prefix. 2. Insert every historical sentence into the Trie.

For each character in the sentence:

  • Move to the corresponding child node, creating it if necessary.
  • Update the node's sentence-frequency map with the sentence's count.

This ensures that every prefix node knows which sentences belong to that prefix. 3. Maintain two pieces of runtime state:

  • The current input string
  • A pointer to the current Trie node

These allow efficient incremental traversal as the user types. 4. When input(c) is called with a normal character:

  • Append the character to the current input buffer.
  • Move to the corresponding child node in the Trie.
  • If the path does not exist, return an empty list because no sentence matches this prefix.
  • Otherwise, retrieve all candidate sentences stored in the node.
  1. Sort the candidate sentences using:
  • Higher frequency first
  • Lexicographically smaller sentence first when frequencies tie

In Python, this is achieved with:

sorted(candidates, key=lambda x: (-freq[x], x))
  1. Return the first three sentences after sorting.
  2. When input('#') is called:
  • Insert the completed sentence into the Trie with frequency incremented by one.
  • Reset the current input buffer.
  • Reset the Trie traversal state.
  • Return an empty list.

Why it works

The core invariant is that every Trie node stores all sentences sharing that node's prefix. Therefore, once traversal reaches the node representing the current input prefix, every valid autocomplete candidate is immediately available at that node. Sorting those candidates according to the problem rules guarantees the correct top three suggestions.

Python Solution

from typing import List
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = {}
        self.sentences = defaultdict(int)

class AutocompleteSystem:

    def __init__(self, sentences: List[str], times: List[int]):
        self.root = TrieNode()
        self.current_input = ""
        self.current_node = self.root

        for sentence, time in zip(sentences, times):
            self._insert(sentence, time)

    def _insert(self, sentence: str, frequency: int) -> None:
        node = self.root

        for char in sentence:
            if char not in node.children:
                node.children[char] = TrieNode()

            node = node.children[char]
            node.sentences[sentence] += frequency

    def input(self, c: str) -> List[str]:
        if c == '#':
            self._insert(self.current_input, 1)

            self.current_input = ""
            self.current_node = self.root

            return []

        self.current_input += c

        if self.current_node and c in self.current_node.children:
            self.current_node = self.current_node.children[c]

            candidates = list(self.current_node.sentences.keys())

            candidates.sort(
                key=lambda sentence: (
                    -self.current_node.sentences[sentence],
                    sentence
                )
            )

            return candidates[:3]

        self.current_node = None
        return []

The implementation begins with the TrieNode class. Each node stores:

  • children, which represents outgoing character edges
  • sentences, which tracks sentence frequencies for the node's prefix

The constructor initializes the Trie and inserts all historical data.

The _insert() helper method walks through each character in the sentence. As it traverses the Trie, it updates the sentence frequency at every node along the path. This guarantees that each prefix node contains all matching sentences.

The input() method handles two cases.

When the user types '#', the current sentence is finalized and inserted into the Trie with frequency incremented by one. The system state is reset afterward.

For normal characters, the method advances through the Trie. If the prefix exists, it retrieves candidate sentences from the current node, sorts them using the ranking rules, and returns the top three.

If traversal fails because no matching prefix exists, the current node becomes None, and future characters continue returning empty lists until '#' resets the system.

Go Solution

package main

import (
	"sort"
)

type TrieNode struct {
	children  map[byte]*TrieNode
	sentences map[string]int
}

func NewTrieNode() *TrieNode {
	return &TrieNode{
		children:  make(map[byte]*TrieNode),
		sentences: make(map[string]int),
	}
}

type AutocompleteSystem struct {
	root         *TrieNode
	currentNode  *TrieNode
	currentInput string
}

func Constructor(sentences []string, times []int) AutocompleteSystem {
	system := AutocompleteSystem{
		root: NewTrieNode(),
	}

	system.currentNode = system.root

	for i := 0; i < len(sentences); i++ {
		system.insert(sentences[i], times[i])
	}

	return system
}

func (this *AutocompleteSystem) insert(sentence string, frequency int) {
	node := this.root

	for i := 0; i < len(sentence); i++ {
		ch := sentence[i]

		if _, exists := node.children[ch]; !exists {
			node.children[ch] = NewTrieNode()
		}

		node = node.children[ch]
		node.sentences[sentence] += frequency
	}
}

func (this *AutocompleteSystem) Input(c byte) []string {
	if c == '#' {
		this.insert(this.currentInput, 1)

		this.currentInput = ""
		this.currentNode = this.root

		return []string{}
	}

	this.currentInput += string(c)

	if this.currentNode != nil {
		if nextNode, exists := this.currentNode.children[c]; exists {
			this.currentNode = nextNode

			candidates := make([]string, 0)

			for sentence := range this.currentNode.sentences {
				candidates = append(candidates, sentence)
			}

			sort.Slice(candidates, func(i, j int) bool {
				freqI := this.currentNode.sentences[candidates[i]]
				freqJ := this.currentNode.sentences[candidates[j]]

				if freqI == freqJ {
					return candidates[i] < candidates[j]
				}

				return freqI > freqJ
			})

			if len(candidates) > 3 {
				return candidates[:3]
			}

			return candidates
		}
	}

	this.currentNode = nil
	return []string{}
}

The Go implementation follows the same overall design but differs in a few language-specific details.

Go does not have built-in default dictionaries, so maps must be explicitly initialized.

Sorting is implemented using sort.Slice() with a custom comparator.

The Trie uses byte keys because the problem only contains lowercase letters, spaces, and '#'.

Unlike Python, Go distinguishes between nil slices and empty slices. Returning []string{} ensures the correct empty result format.

Worked Examples

Example 1

Input:

sentences = ["i love you", "island", "iroman", "i love leetcode"]
times = [5, 3, 2, 2]

Initial Trie contents:

Sentence Frequency
i love you 5
island 3
iroman 2
i love leetcode 2

Step 1: input('i')

Current prefix:

"i"

Matching sentences:

Sentence Frequency
i love you 5
island 3
iroman 2
i love leetcode 2

Sorting rules:

  1. Higher frequency first
  2. Lexicographically smaller first on ties

Between:

"iroman"
"i love leetcode"

ASCII comparison places "i love leetcode" first because space comes before 'r'.

Top three:

["i love you", "island", "i love leetcode"]

Step 2: input(' ')

Current prefix:

"i "

Matching sentences:

Sentence Frequency
i love you 5
i love leetcode 2

Result:

["i love you", "i love leetcode"]

Step 3: input('a')

Current prefix:

"i a"

No matching sentences exist.

Result:

[]

Step 4: input('#')

The sentence:

"i a"

is inserted into the Trie with frequency 1.

Trie update:

Sentence Frequency
i a 1

System resets current input.

Result:

[]

Complexity Analysis

Measure Complexity Explanation
Time O(P + K log K) Trie traversal plus sorting matching candidates
Space O(S) Trie stores all sentence characters and prefix mappings

Where:

  • P is the prefix length
  • K is the number of matching sentences
  • S is the total number of characters across all stored sentences

The Trie traversal itself is efficient because it only follows characters in the current prefix. The main remaining cost is sorting the candidate sentences stored at the prefix node.

In practice, this solution performs very well because autocomplete prefixes usually narrow the candidate set significantly.

Test Cases

from typing import List

# Example case from problem statement
system = AutocompleteSystem(
    ["i love you", "island", "iroman", "i love leetcode"],
    [5, 3, 2, 2]
)

assert system.input("i") == [
    "i love you",
    "island",
    "i love leetcode"
]  # basic autocomplete ordering

assert system.input(" ") == [
    "i love you",
    "i love leetcode"
]  # prefix with space

assert system.input("a") == []  # no matching prefix

assert system.input("#") == []  # finalize new sentence

# Test insertion of new sentence
assert system.input("i") == [
    "i love you",
    "island",
    "i love leetcode"
]

assert system.input(" ") == [
    "i love you",
    "i love leetcode",
    "i a"
]  # newly inserted sentence appears

# Single sentence test
system2 = AutocompleteSystem(["hello"], [1])

assert system2.input("h") == ["hello"]  # only one candidate
assert system2.input("#") == []

# Tie-breaking by ASCII order
system3 = AutocompleteSystem(
    ["abc", "abd"],
    [2, 2]
)

assert system3.input("a") == ["abc", "abd"]  # lexicographical ordering

# Multiple insertions increase frequency
system4 = AutocompleteSystem([], [])

assert system4.input("h") == []
assert system4.input("i") == []
assert system4.input("#") == []

assert system4.input("h") == ["hi"]
assert system4.input("i") == ["hi"]
assert system4.input("#") == []

assert system4.input("h") == ["hi"]  # frequency updated correctly

# Prefix completely missing
system5 = AutocompleteSystem(["apple"], [3])

assert system5.input("z") == []  # nonexistent prefix
assert system5.input("#") == []
Test Why
Problem example Verifies standard functionality
New sentence insertion Ensures dynamic updates work
Single sentence Tests minimal candidate set
ASCII tie-breaking Verifies lexicographical ordering
Repeated insertion Ensures frequencies increment properly
Missing prefix Confirms empty result handling

Edge Cases

Prefix Does Not Exist

A very common bug occurs when traversal reaches a character path that does not exist in the Trie. A naive implementation may continue searching or accidentally crash with a null reference.

This implementation handles the situation by setting current_node to None in Python or nil in Go. Once that happens, all future inputs for the current sentence immediately return empty results until '#' resets the system.

Frequency Tie Between Sentences

Two sentences may have identical frequencies. The problem requires ASCII ordering as the tie-breaker, which is easy to overlook.

For example:

"i love leetcode"
"iroman"

Both may have frequency 2, but the space character has smaller ASCII value than 'r', so "i love leetcode" must come first.

The implementation handles this using tuple-based sorting:

(-frequency, sentence)

This guarantees correct ordering.

Adding Completely New Sentences

The user may type a sentence that never existed before. A buggy implementation may fail to insert it correctly because intermediate Trie nodes are missing.

The _insert() method dynamically creates missing nodes while traversing characters. This ensures new sentences become searchable immediately after the user enters '#'.

Prefix With Fewer Than Three Matches

The problem requires returning as many results as exist, not exactly three.

The implementation safely uses slicing:

candidates[:3]

If fewer than three candidates exist, Python automatically returns the available subset without errors. The Go version similarly checks the slice length before truncation.