LeetCode 692 - Top K Frequent Words
The problem gives us an array of strings called words and an integer k. Each string represents a word, and words may appear multiple times in the array. Our task is to return the k most frequent words. The result must follow two ordering rules.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Trie, Sorting, Heap (Priority Queue), Bucket Sort, Counting
Solution
Problem Understanding
The problem gives us an array of strings called words and an integer k. Each string represents a word, and words may appear multiple times in the array. Our task is to return the k most frequent words.
The result must follow two ordering rules. First, words with higher frequency must appear earlier in the result. Second, if two words have the same frequency, the word with smaller lexicographical order, meaning alphabetical order, must come first.
In other words, we are ranking words using two criteria:
- Higher frequency is better.
- If frequencies are equal, alphabetically smaller words are better.
For example, if both "apple" and "banana" appear three times, then "apple" must come before "banana" because "apple" is lexicographically smaller.
The input constraints are relatively small. The array length is at most 500, so even an O(n log n) solution is perfectly acceptable. However, the follow-up asks whether we can achieve O(n log k) time, which suggests using a heap to avoid sorting all unique words when we only need the top k.
The constraints also guarantee several important things:
- Every word contains only lowercase English letters.
kis always valid.- There is at least one word.
knever exceeds the number of unique words.
These guarantees simplify implementation because we never need to handle invalid input or empty results.
Several edge cases are important to consider:
- Multiple words may have exactly the same frequency.
kmay equal the number of unique words.- All words may be identical.
- Every word may be unique.
- Lexicographical ordering becomes critical when frequencies tie.
A naive implementation often fails on tie-breaking rules, especially because the problem requires alphabetical order only when frequencies are equal.
Approaches
Brute Force Approach
The most straightforward solution is to count the frequency of every word using a hash map, then sort all unique words using a custom sorting rule.
The sorting rule is:
- Higher frequency first.
- Lexicographically smaller word first when frequencies are equal.
After sorting, we simply take the first k words.
This approach is correct because sorting globally guarantees the final ordering satisfies both requirements. However, it may do unnecessary work because we sort all unique words even though we only need the top k.
Key Insight for the Optimal Solution
The key observation is that we do not actually need all words fully sorted if k is much smaller than the number of unique words.
A heap allows us to maintain only the best k candidates seen so far. Instead of sorting everything, we continuously keep the heap size limited to k.
The tricky part is tie-breaking. Since we want:
- Higher frequency to rank higher
- Lexicographically smaller words to rank higher on ties
We construct a min-heap where the "worst" candidate stays at the top. That way, when the heap grows beyond size k, we remove the least desirable element.
This reduces the complexity from sorting all unique words to maintaining only k elements at a time.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n + m log m) | O(m) | Count frequencies, then sort all unique words |
| Optimal | O(n log k) | O(m) | Use frequency map and min-heap of size k |
Here, n is the total number of words and m is the number of unique words.
Algorithm Walkthrough
Optimal Heap-Based Algorithm
- Count the frequency of every word using a hash map.
We iterate through the input array once and store counts in a dictionary. This gives constant-time updates and allows efficient frequency lookup later.
2. Create a min-heap to store at most k elements.
The heap will contain pairs representing each word and its frequency. The heap ordering is designed so the least desirable candidate appears at the top. 3. Iterate through all unique words in the frequency map.
For each word:
- Push it into the heap.
- If the heap size exceeds
k, remove the smallest element.
This ensures the heap always contains the current best k words.
4. Extract all words from the heap.
Since the heap is a min-heap, extraction gives words from worst to best. We therefore reverse the final result. 5. Return the reversed list.
After reversal, the result satisfies:
- Higher frequencies first
- Lexicographically smaller words first for ties
Why it works
The heap invariant is that the heap always contains the best k candidates seen so far. Whenever a new candidate is better than the worst element currently in the heap, it survives while the worse element gets removed.
Because the heap ordering exactly mirrors the ranking rules in reverse priority, removing the smallest heap element always discards the least desirable candidate. After processing all words, the heap therefore contains exactly the top k frequent words.
Python Solution
from typing import List
from collections import Counter
import heapq
class Solution:
def topKFrequent(self, words: List[str], k: int) -> List[str]:
frequency = Counter(words)
heap = []
for word, count in frequency.items():
heapq.heappush(heap, (-count, word))
result = []
for _ in range(k):
count, word = heapq.heappop(heap)
result.append(word)
return result
The implementation begins by using Counter from Python's collections module to build the frequency map. This is both concise and efficient.
The heap stores tuples of (-count, word). Python's heap implementation is a min-heap, so negating the count effectively simulates max-heap behavior. This allows words with larger frequencies to come out first.
The second tuple element, word, automatically handles lexicographical ordering when frequencies are equal. Python compares tuples element by element, so:
- Larger frequency becomes smaller negative number.
- Lexicographically smaller word comes first naturally.
We then pop exactly k elements from the heap and append the words to the result list.
Although the earlier walkthrough discussed a heap of size k, Python's tuple ordering allows a simpler implementation that pushes all elements and extracts the top k. This is still efficient for the given constraints and very clean to read.
Go Solution
package main
import (
"container/heap"
)
type WordFreq struct {
word string
count int
}
type MinHeap []WordFreq
func (h MinHeap) Len() int {
return len(h)
}
func (h MinHeap) Less(i, j int) bool {
if h[i].count == h[j].count {
return h[i].word < h[j].word
}
return h[i].count > h[j].count
}
func (h MinHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *MinHeap) Push(x interface{}) {
*h = append(*h, x.(WordFreq))
}
func (h *MinHeap) Pop() interface{} {
old := *h
n := len(old)
item := old[n-1]
*h = old[:n-1]
return item
}
func topKFrequent(words []string, k int) []string {
frequency := make(map[string]int)
for _, word := range words {
frequency[word]++
}
h := &MinHeap{}
heap.Init(h)
for word, count := range frequency {
heap.Push(h, WordFreq{
word: word,
count: count,
})
}
result := make([]string, 0, k)
for i := 0; i < k; i++ {
item := heap.Pop(h).(WordFreq)
result = append(result, item.word)
}
return result
}
The Go implementation follows the same overall strategy as the Python version but requires explicit heap implementation because Go's container/heap package only provides heap mechanics, not built-in priority queue behavior.
The Less function defines ordering rules. Higher frequency should appear first, so larger counts are treated as higher priority. When counts are equal, lexicographically smaller words rank higher.
Unlike Python, Go does not provide tuple comparison automatically, so the comparison logic must be written explicitly.
Slices are dynamically resized during insertion, and no special handling for nil is required because the constraints guarantee valid input.
Worked Examples
Example 1
Input:
words = ["i","love","leetcode","i","love","coding"]
k = 2
Step 1: Count Frequencies
| Word | Frequency |
|---|---|
| i | 2 |
| love | 2 |
| leetcode | 1 |
| coding | 1 |
Step 2: Build Heap
Heap contents conceptually become:
| Priority | Word |
|---|---|
| 2 | i |
| 2 | love |
| 1 | coding |
| 1 | leetcode |
Because "i" is lexicographically smaller than "love", it ranks higher when frequencies tie.
Step 3: Extract Top K
First pop:
("i", 2)
Second pop:
("love", 2)
Final Result
["i", "love"]
Example 2
Input:
words = ["the","day","is","sunny","the","the","the","sunny","is","is"]
k = 4
Step 1: Count Frequencies
| Word | Frequency |
|---|---|
| the | 4 |
| is | 3 |
| sunny | 2 |
| day | 1 |
Step 2: Build Heap
Conceptual ordering:
| Priority | Word |
|---|---|
| 4 | the |
| 3 | is |
| 2 | sunny |
| 1 | day |
Step 3: Extract Top K
Pop sequence:
| Extraction | Word |
|---|---|
| 1 | the |
| 2 | is |
| 3 | sunny |
| 4 | day |
Final Result
["the", "is", "sunny", "day"]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m log m) | Counting takes O(n), heap insertions take O(m log m) |
| Space | O(m) | Frequency map and heap store unique words |
Here:
nis the total number of words.mis the number of unique words.
The frequency counting phase is linear because each word is processed exactly once. Heap insertion occurs once per unique word. Since the heap may contain all unique words, insertion complexity becomes O(m log m).
If we instead maintain a heap of size k, the complexity can be improved to O(n log k).
Test Cases
solution = Solution()
# Example 1
assert solution.topKFrequent(
["i","love","leetcode","i","love","coding"], 2
) == ["i","love"]
# Example 2
assert solution.topKFrequent(
["the","day","is","sunny","the","the","the","sunny","is","is"], 4
) == ["the","is","sunny","day"]
# Single word input
assert solution.topKFrequent(
["hello"], 1
) == ["hello"]
# All words identical
assert solution.topKFrequent(
["a","a","a","a"], 1
) == ["a"]
# All words unique
assert solution.topKFrequent(
["c","b","a"], 3
) == ["a","b","c"]
# Tie frequency with lexicographical ordering
assert solution.topKFrequent(
["apple","banana","apple","banana","cat"], 2
) == ["apple","banana"]
# k equals number of unique words
assert solution.topKFrequent(
["x","y","z","x"], 3
) == ["x","y","z"]
# Larger frequency dominance
assert solution.topKFrequent(
["a","a","a","b","b","c"], 2
) == ["a","b"]
# Lexicographical tie-breaking
assert solution.topKFrequent(
["aa","ab","ac","aa","ab","ac"], 3
) == ["aa","ab","ac"]
| Test | Why |
|---|---|
| Example 1 | Validates standard frequency counting |
| Example 2 | Validates multiple frequencies |
| Single word input | Smallest valid input |
| All words identical | Ensures repeated counting works |
| All words unique | Tests lexicographical ordering only |
| Tie frequency case | Validates alphabetical tie-breaking |
| k equals unique count | Ensures full output handling |
| Larger frequency dominance | Confirms highest counts come first |
| Equal frequency ordering | Verifies lexicographical comparison |
Edge Cases
One important edge case occurs when all words have the same frequency. In this situation, the output ordering depends entirely on lexicographical order. Many incorrect solutions forget this secondary sorting rule and produce arbitrary ordering. The implementation handles this correctly because tuple comparison in Python automatically compares the word when frequencies tie.
Another important case is when there is only one unique word repeated many times. A buggy implementation may incorrectly assume multiple distinct elements exist or mishandle heap extraction. Since the frequency map stores unique words naturally, the solution correctly returns the single repeated word.
A third edge case happens when k equals the number of unique words. In this scenario, the algorithm must return every unique word in properly sorted order. Some heap-based implementations accidentally discard elements or reverse ordering incorrectly. Because the implementation extracts exactly k elements after building the heap, every unique word appears exactly once in the final result.
A final subtle case involves words that differ only slightly lexicographically, such as "aa", "ab", and "ac". Incorrect custom comparators often mishandle alphabetical ordering. Using tuple ordering in Python guarantees lexicographical comparisons follow the language's built-in string comparison semantics.