LeetCode 347 - Top K Frequent Elements
The problem asks us to return the k most frequently occurring elements from an integer array. We are given an array nums, which may contain duplicates, and an integer k. Our task is to identify which values appear most often and return exactly k of them.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, Divide and Conquer, Sorting, Heap (Priority Queue), Bucket Sort, Counting, Quickselect
Solution
Problem Understanding
The problem asks us to return the k most frequently occurring elements from an integer array. We are given an array nums, which may contain duplicates, and an integer k. Our task is to identify which values appear most often and return exactly k of them.
For example, if the array is [1,1,1,2,2,3], the number 1 appears three times, 2 appears twice, and 3 appears once. Since k = 2, we return the two most frequent elements, [1,2].
The important detail is that the output order does not matter. We only care that the returned values are the correct top k frequent elements.
The constraints are important because they guide us toward the intended solution. The array length can be as large as 10^5, which means inefficient approaches may time out. The follow up explicitly asks for a solution better than O(n log n), which rules out fully sorting all elements by frequency in the worst case.
The problem also guarantees that the answer is unique. This means there will never be ambiguity about which elements belong in the top k. For example, we will not encounter a situation where multiple elements tie for the final position in a way that creates multiple valid answers.
Several edge cases are worth considering upfront. The array may contain only one element, so the algorithm must handle minimal input sizes correctly. All elements could be identical, meaning there is only one unique value regardless of array size. Another important case is when k equals the number of unique elements, in which case we must return every distinct number. Negative integers are also allowed, so the solution cannot assume values are positive.
Approaches
Brute Force Approach
A straightforward approach is to first count the frequency of every number using a hash map. Once we know the frequencies, we can sort all unique elements by their frequency in descending order and return the first k elements.
This works because sorting guarantees that the most frequent elements appear first. After sorting, selecting the top k elements is trivial.
However, this approach is too slow for the follow up requirement. If there are m unique elements, sorting requires O(m log m) time. In the worst case, every element is unique, so m = n, giving a total complexity of O(n log n).
The problem specifically asks for something better than this complexity.
Optimal Approach, Bucket Sort
The key observation is that frequencies are bounded by the array length. No element can appear more than n times.
Instead of sorting elements by frequency, we can group elements into buckets based on how often they appear. Each bucket index represents a frequency, and the bucket stores all numbers with that frequency.
For example:
- Bucket
1contains numbers appearing once - Bucket
2contains numbers appearing twice - Bucket
3contains numbers appearing three times
After building these buckets, we iterate from the highest possible frequency down to the lowest and collect elements until we have k results.
This avoids sorting entirely and achieves linear time complexity.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) | O(n) | Count frequencies, then sort by frequency |
| Optimal | O(n) | O(n) | Use bucket sort based on frequency counts |
Algorithm Walkthrough
- Create a frequency map using a hash table.
We iterate through the array and count how many times each number appears. A hash map is ideal because it provides average O(1) insertion and lookup time.
Example:
nums = [1,1,1,2,2,3]
frequency_map = {
1: 3,
2: 2,
3: 1
}
- Create buckets indexed by frequency.
Since the maximum possible frequency is len(nums), we create an array of buckets with size n + 1.
Each index represents a frequency.
Example:
buckets[1] = [3]
buckets[2] = [2]
buckets[3] = [1]
- Populate the buckets.
For every (number, frequency) pair in the hash map, append the number into the bucket corresponding to its frequency.
This groups numbers by occurrence count without sorting. 4. Traverse buckets in reverse order.
We iterate from the highest frequency down to 1.
The first buckets encountered contain the most frequent elements.
5. Collect elements until we have k results.
As we traverse the buckets, we append numbers into the result list. Once the result size reaches k, we return it immediately.
Why it works
The algorithm works because every number is placed into the bucket corresponding to its exact frequency. By traversing buckets from highest frequency to lowest, we encounter elements in descending order of occurrence count. Since we stop after collecting k elements, the returned list necessarily contains the k most frequent elements.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
# Step 1: Count frequencies
frequency_map = {}
for num in nums:
frequency_map[num] = frequency_map.get(num, 0) + 1
# Step 2: Create buckets
buckets = [[] for _ in range(len(nums) + 1)]
# Step 3: Place numbers into frequency buckets
for num, frequency in frequency_map.items():
buckets[frequency].append(num)
# Step 4: Collect top k frequent elements
result = []
for frequency in range(len(buckets) - 1, 0, -1):
for num in buckets[frequency]:
result.append(num)
if len(result) == k:
return result
return result
The implementation begins by constructing a frequency map. Each number in the array is counted using a dictionary, where the key is the number and the value is its occurrence count.
Next, the code creates a bucket array with len(nums) + 1 slots. The extra slot is necessary because frequencies can range from 1 to n.
The algorithm then iterates through the frequency map and inserts each number into the bucket corresponding to its frequency. This effectively groups elements by occurrence count.
Finally, the code traverses the buckets in reverse order. Since higher indices correspond to higher frequencies, this guarantees that the most frequent elements are processed first. The algorithm stops as soon as k elements have been collected.
Go Solution
func topKFrequent(nums []int, k int) []int {
// Step 1: Count frequencies
frequencyMap := make(map[int]int)
for _, num := range nums {
frequencyMap[num]++
}
// Step 2: Create buckets
buckets := make([][]int, len(nums)+1)
// Step 3: Place numbers into buckets
for num, frequency := range frequencyMap {
buckets[frequency] = append(buckets[frequency], num)
}
// Step 4: Collect top k frequent elements
result := []int{}
for frequency := len(buckets) - 1; frequency >= 1; frequency-- {
for _, num := range buckets[frequency] {
result = append(result, num)
if len(result) == k {
return result
}
}
}
return result
}
The Go implementation follows the same logic as the Python version. A map[int]int is used for frequency counting, and a slice of integer slices represents the buckets.
One Go-specific detail is that slices are dynamically resized when using append, which makes bucket insertion straightforward. Unlike Python lists, Go slices must be explicitly initialized with make.
There are no integer overflow concerns because the frequency counts are bounded by 10^5, which easily fits within Go's integer range.
Worked Examples
Example 1
Input:
nums = [1,1,1,2,2,3]
k = 2
Step 1: Build Frequency Map
| Number | Frequency |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
Step 2: Build Buckets
| Bucket Index (Frequency) | Values |
|---|---|
| 1 | [3] |
| 2 | [2] |
| 3 | [1] |
Step 3: Traverse Buckets Backward
| Current Frequency | Numbers Added | Result |
|---|---|---|
| 3 | 1 | [1] |
| 2 | 2 | [1,2] |
We now have k = 2 elements, so we return [1,2].
Example 2
Input:
nums = [1]
k = 1
Step 1: Frequency Map
| Number | Frequency |
|---|---|
| 1 | 1 |
Step 2: Buckets
| Bucket Index | Values |
|---|---|
| 1 | [1] |
Step 3: Traverse Backward
| Current Frequency | Numbers Added | Result |
|---|---|---|
| 1 | 1 | [1] |
Return [1].
Example 3
Input:
nums = [1,2,1,2,1,2,3,1,3,2]
k = 2
Step 1: Frequency Map
| Number | Frequency |
|---|---|
| 1 | 4 |
| 2 | 4 |
| 3 | 2 |
Step 2: Buckets
| Bucket Index | Values |
|---|---|
| 2 | [3] |
| 4 | [1,2] |
Step 3: Traverse Backward
| Current Frequency | Numbers Added | Result |
|---|---|---|
| 4 | 1 | [1] |
| 4 | 2 | [1,2] |
Return [1,2].
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Frequency counting, bucket insertion, and bucket traversal are all linear |
| Space | O(n) | Hash map and buckets may collectively store up to n elements |
The time complexity is linear because each element is processed a constant number of times. We avoid sorting entirely, which is the key optimization that satisfies the follow up requirement.
The space complexity is also linear because the frequency map stores all unique elements, and the buckets collectively store every distinct number once.
Test Cases
from typing import List
from collections import Counter
class Solution:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
frequency_map = {}
for num in nums:
frequency_map[num] = frequency_map.get(num, 0) + 1
buckets = [[] for _ in range(len(nums) + 1)]
for num, frequency in frequency_map.items():
buckets[frequency].append(num)
result = []
for frequency in range(len(buckets) - 1, 0, -1):
for num in buckets[frequency]:
result.append(num)
if len(result) == k:
return result
return result
solution = Solution()
# Provided example
assert set(solution.topKFrequent([1,1,1,2,2,3], 2)) == {1,2} # standard case
# Single element
assert solution.topKFrequent([1], 1) == [1] # minimum input size
# Multiple frequent values
assert set(solution.topKFrequent([1,2,1,2,1,2,3,1,3,2], 2)) == {1,2} # tied high frequencies
# All unique elements
assert set(solution.topKFrequent([1,2,3,4], 2)).issubset({1,2,3,4}) # every frequency is 1
# Negative numbers
assert set(solution.topKFrequent([-1,-1,-2,-2,-2], 1)) == {-2} # negative integers
# k equals number of unique elements
assert set(solution.topKFrequent([4,4,5,5,6], 3)) == {4,5,6} # return all unique values
# Large repetition
assert solution.topKFrequent([7] * 1000, 1) == [7] # heavily repeated value
# Mixed frequencies
assert set(solution.topKFrequent([1,1,2,3,3,3,4,4,4,4], 2)) == {3,4} # descending frequencies
| Test | Why |
|---|---|
[1,1,1,2,2,3], k=2 |
Validates the standard example |
[1], k=1 |
Tests minimum input size |
[1,2,1,2,1,2,3,1,3,2], k=2 |
Tests multiple high-frequency elements |
[1,2,3,4], k=2 |
Ensures algorithm works when all frequencies are equal |
[-1,-1,-2,-2,-2], k=1 |
Verifies handling of negative numbers |
[4,4,5,5,6], k=3 |
Tests when k equals unique element count |
[7] * 1000, k=1 |
Tests extreme repetition |
[1,1,2,3,3,3,4,4,4,4], k=2 |
Verifies correct ordering by frequency |
Edge Cases
One important edge case occurs when the array contains only a single element. A naive implementation might assume multiple buckets or multiple unique values exist, but here there is exactly one valid answer. The implementation handles this naturally because the frequency map contains one entry and the bucket traversal immediately returns that value.
Another tricky case is when every element appears exactly once. In this situation, all numbers end up in the same bucket. Some implementations incorrectly rely on frequency uniqueness, but this problem only guarantees uniqueness of the final answer, not uniqueness of every frequency. The bucket-based solution still works because all elements with frequency 1 are grouped together and returned as needed.
Negative integers are another common source of bugs. Some counting approaches incorrectly attempt to use array indices directly from the values themselves, which breaks for negative numbers. This implementation uses a hash map instead, so both positive and negative integers are handled uniformly and correctly.
A final edge case occurs when k equals the total number of unique elements. In this case, the algorithm must return every distinct value. The reverse bucket traversal naturally continues until all unique elements have been collected, ensuring correctness without requiring special handling.