LeetCode 1804 - Implement Trie II (Prefix Tree)

This problem asks us to implement a Trie data structure, also known as a prefix tree, which is designed to store strings efficiently in a way that allows fast lookups and prefix queries. We are required to implement the following operations: 1.

LeetCode Problem 1804

Difficulty: 🟡 Medium
Topics: Hash Table, String, Design, Trie

Solution

Problem Understanding

This problem asks us to implement a Trie data structure, also known as a prefix tree, which is designed to store strings efficiently in a way that allows fast lookups and prefix queries. We are required to implement the following operations:

  1. insert(word): Add a word to the trie.
  2. countWordsEqualTo(word): Count how many times a word appears in the trie.
  3. countWordsStartingWith(prefix): Count how many words in the trie start with a given prefix.
  4. erase(word): Remove one occurrence of a word from the trie.

The input consists of multiple operations applied in sequence, and the output is the result of count operations. The constraints tell us that words and prefixes can be up to 2000 characters long, and we may receive up to 30,000 operations. This implies that brute-force string scanning would be too slow, especially for prefix counts. We are guaranteed that any word passed to erase exists in the trie, so we do not need to handle erasing non-existent words.

Edge cases that can trip up naive implementations include inserting the same word multiple times, erasing words that are prefixes of other words, and counting words with long prefixes.

Approaches

A brute-force approach would store all words in a list or hash map. For insert, we append the word. For countWordsEqualTo, we scan the entire collection to count exact matches. For countWordsStartingWith, we scan the entire collection and check the prefix. Erasing a word requires scanning for the first occurrence and removing it. While simple, this approach is inefficient: counting operations require O(N * L) time where N is the number of words and L is the word length.

The optimal approach uses a trie node with a dictionary of children. Each node stores two counters: countEnd for words ending at this node and countPrefix for words passing through this node. This allows all operations-insert, count, erase-to run in O(L) time, where L is the word length. This is efficient even for large N and long words because operations only depend on word length, not the total number of stored words.

Approach Time Complexity Space Complexity Notes
Brute Force O(N * L) O(N * L) Scans entire collection for counts; too slow for large N
Optimal O(L) O(N * L) Uses trie nodes with counters; fast insert, erase, and prefix queries

Algorithm Walkthrough

  1. Trie Node Structure: Each node contains a dictionary mapping characters to child nodes, an integer countEnd representing how many words end at this node, and an integer countPrefix representing how many words pass through this node.
  2. Insert Operation: Start at the root node. For each character in the word, create a child node if it does not exist, then increment countPrefix. After processing all characters, increment countEnd at the final node.
  3. Count Words Equal To: Start at the root node and traverse according to each character. If a character does not exist in children, return 0. After traversal, return countEnd at the last node.
  4. Count Words Starting With: Similar to counting exact words, traverse characters of the prefix. If any character is missing, return 0. Otherwise, return countPrefix at the last node.
  5. Erase Operation: Traverse the trie as in insert, decrementing countPrefix at each node. At the last node, decrement countEnd. This ensures only one instance of the word is removed and maintains correct counts for prefixes.

Why it works: The trie maintains the invariant that countPrefix at each node equals the total number of words that pass through that node, and countEnd equals the number of words ending at that node. This invariant guarantees that counting and erasing operations work correctly.

Python Solution

class TrieNode:
    def __init__(self):
        self.children = {}
        self.countEnd = 0
        self.countPrefix = 0

class Trie:

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.countPrefix += 1
        node.countEnd += 1

    def countWordsEqualTo(self, word: str) -> int:
        node = self.root
        for char in word:
            if char not in node.children:
                return 0
            node = node.children[char]
        return node.countEnd

    def countWordsStartingWith(self, prefix: str) -> int:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return 0
            node = node.children[char]
        return node.countPrefix

    def erase(self, word: str) -> None:
        node = self.root
        nodes_on_path = [node]
        for char in word:
            node = node.children[char]
            nodes_on_path.append(node)
        nodes_on_path[-1].countEnd -= 1
        for i, char in enumerate(word, 1):
            nodes_on_path[i].countPrefix -= 1

Explanation: The TrieNode class encapsulates children and counters. The insert function ensures all characters are added and prefix counts updated. Counting operations traverse the trie and return the appropriate counter. Erasing follows the word path, decrementing counts without removing nodes entirely, which avoids breaking other words that share the prefix.

Go Solution

type TrieNode struct {
	children    map[rune]*TrieNode
	countEnd    int
	countPrefix int
}

type Trie struct {
	root *TrieNode
}

func Constructor() Trie {
	return Trie{root: &TrieNode{children: make(map[rune]*TrieNode)}}
}

func (this *Trie) Insert(word string) {
	node := this.root
	for _, char := range word {
		if node.children[char] == nil {
			node.children[char] = &TrieNode{children: make(map[rune]*TrieNode)}
		}
		node = node.children[char]
		node.countPrefix++
	}
	node.countEnd++
}

func (this *Trie) CountWordsEqualTo(word string) int {
	node := this.root
	for _, char := range word {
		if node.children[char] == nil {
			return 0
		}
		node = node.children[char]
	}
	return node.countEnd
}

func (this *Trie) CountWordsStartingWith(prefix string) int {
	node := this.root
	for _, char := range prefix {
		if node.children[char] == nil {
			return 0
		}
		node = node.children[char]
	}
	return node.countPrefix
}

func (this *Trie) Erase(word string) {
	node := this.root
	nodesOnPath := []*TrieNode{node}
	for _, char := range word {
		node = node.children[char]
		nodesOnPath = append(nodesOnPath, node)
	}
	nodesOnPath[len(nodesOnPath)-1].countEnd--
	for i, char := range word {
		nodesOnPath[i+1].countPrefix--
	}
}

Explanation: The Go implementation mirrors the Python version, with map[rune]*TrieNode for children. Slice nodesOnPath keeps track of the nodes for correct prefix decrementing. Initialization requires explicit make(map[rune]*TrieNode) to avoid nil map assignments.

Worked Examples

For the input sequence:

insert("apple")
insert("apple")
countWordsEqualTo("apple")
countWordsStartingWith("app")
erase("apple")
countWordsEqualTo("apple")
countWordsStartingWith("app")
erase("apple")
countWordsStartingWith("app")

The trie evolves as follows:

Operation Trie State countEnd countPrefix Output
insert("apple") a->p->p->l->e 1 at 'e' 1 on each node null
insert("apple") a->p->p->l->e 2 at 'e' 2 on each node null
countWordsEqualTo("apple") - 2 - 2
countWordsStartingWith("app") - - 2 at 'p' 2
erase("apple") a->p->p->l->e 1 at 'e' 1 on each node null
countWordsEqualTo("apple") - 1 - 1
countWordsStartingWith("app") - - 1 at 'p' 1
erase("apple") trie empty 0 0 null
countWordsStartingWith("app") - - 0 0

Complexity Analysis

Measure Complexity Explanation
Time O(L) per operation Traverses each character in the word/prefix only
Space O(N * L) Stores each character in trie nodes for all words

The algorithm is highly efficient since operations depend on word length rather than total number of words. Space usage scales with total characters stored.

Test Cases

trie = Trie