LeetCode 451 - Sort Characters By Frequency
The problem asks us to rearrange the characters of a string so that characters with higher frequency appear before characters with lower frequency. The output string must group identical characters together, and those groups must appear in descending order of occurrence count.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Sorting, Heap (Priority Queue), Bucket Sort, Counting
Solution
Problem Understanding
The problem asks us to rearrange the characters of a string so that characters with higher frequency appear before characters with lower frequency. The output string must group identical characters together, and those groups must appear in descending order of occurrence count.
For example, if the input is "tree", the character 'e' appears twice, while 't' and 'r' appear once each. Since 'e' has the highest frequency, it must come first in the output. Valid outputs include "eert" and "eetr".
The input is a single string s, consisting of uppercase letters, lowercase letters, and digits. The output is another string containing exactly the same characters, but reordered according to frequency.
An important detail is that uppercase and lowercase letters are considered different characters. For instance, 'A' and 'a' are treated separately.
The constraints are significant:
1 <= s.length <= 5 * 10^5
This means the string can contain up to 500,000 characters. Any solution with quadratic complexity, such as repeatedly scanning the string to count frequencies, will be too slow. We need an approach that is close to linear or O(n log n) at worst.
Several edge cases are important:
- A string containing only one character, such as
"a" - Multiple characters with the same frequency
- Strings containing both uppercase and lowercase versions of the same letter
- Very large inputs where efficiency matters
- Strings where every character is unique
The problem guarantees that the input string is non-empty, so we never need to handle a completely empty input.
Approaches
Brute Force Approach
A brute-force solution would repeatedly search for the most frequent remaining character in the string.
One way to do this would be:
- Count how many times each character appears.
- Find the character with the maximum frequency.
- Append it to the result the required number of times.
- Remove it from consideration.
- Repeat until all characters are processed.
This approach is correct because each step greedily places the currently most frequent character next in the result.
However, if we repeatedly scan the frequency map to find the maximum remaining frequency, the algorithm becomes inefficient. In the worst case, when many unique characters exist, repeated searches increase the overall cost.
A more naive brute-force method could even count frequencies from scratch for each character, leading to quadratic behavior.
Given the large constraint size, we need a more efficient strategy.
Optimal Approach
The key observation is that we only need two operations:
- Count the frequency of each character
- Output characters ordered by frequency
A hash map is ideal for counting frequencies because it allows constant time insertion and lookup on average.
After counting frequencies, we can sort the characters by their counts in descending order. Since the number of distinct characters is limited compared to the total string length, this becomes efficient.
The process is:
- Build a frequency map
- Sort characters by frequency
- Construct the final string
This avoids repeated scans and processes the input efficiently.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(k) | Repeatedly searches for highest frequency character |
| Optimal | O(n + k log k) | O(k) | Uses hash map and sorting by frequency |
Here, n is the string length and k is the number of distinct characters.
Algorithm Walkthrough
- Create a hash map to store character frequencies.
We iterate through the string once and count how many times each character appears. A hash map is ideal because it provides efficient updates and lookups. 2. Convert the frequency map into a sortable structure.
After counting, each entry contains a character and its frequency. We need to sort these entries by frequency in descending order. 3. Sort the characters by frequency.
We sort all (character, frequency) pairs so the most frequent characters appear first.
4. Build the result string.
For each sorted pair:
- Repeat the character
frequencytimes - Append it to the result
This guarantees that all identical characters stay together. 5. Return the final string.
The constructed string satisfies the required ordering by frequency.
Why it works
The algorithm works because every character's exact frequency is counted first, and the sorting step guarantees that higher-frequency characters are processed before lower-frequency ones. Since each character is appended exactly as many times as it appeared in the original string, the final result contains all original characters in valid frequency order.
Python Solution
class Solution:
def frequencySort(self, s: str) -> str:
frequency = {}
# Count character frequencies
for char in s:
frequency[char] = frequency.get(char, 0) + 1
# Sort characters by descending frequency
sorted_chars = sorted(
frequency.items(),
key=lambda item: item[1],
reverse=True
)
# Build result string
result = []
for char, count in sorted_chars:
result.append(char * count)
return "".join(result)
The implementation begins by creating a dictionary called frequency. Each character in the string is processed once, and its count is incremented.
The sorted() function is then used on the dictionary items. Each item is a (character, count) pair. The sorting key is the frequency value, and reverse=True ensures descending order.
A list named result is used to efficiently build the output string. String concatenation inside loops is inefficient in Python because strings are immutable. Using a list and joining at the end avoids repeated allocations.
For each (character, count) pair, the character is repeated count times and appended to the result list.
Finally, "".join(result) combines all pieces into the final string.
Go Solution
package main
import (
"sort"
"strings"
)
func frequencySort(s string) string {
frequency := make(map[rune]int)
// Count frequencies
for _, ch := range s {
frequency[ch]++
}
// Convert map to slice
type pair struct {
char rune
count int
}
pairs := make([]pair, 0, len(frequency))
for ch, count := range frequency {
pairs = append(pairs, pair{ch, count})
}
// Sort by descending frequency
sort.Slice(pairs, func(i, j int) bool {
return pairs[i].count > pairs[j].count
})
// Build result
var builder strings.Builder
for _, p := range pairs {
for i := 0; i < p.count; i++ {
builder.WriteRune(p.char)
}
}
return builder.String()
}
The Go implementation follows the same logic as the Python solution, but there are a few language-specific differences.
Go maps are used for frequency counting. Since strings in Go are UTF-8 encoded, iterating with range produces runes rather than bytes.
Because Go maps cannot be directly sorted, the frequency map is converted into a slice of structs containing character-count pairs.
The sort.Slice function sorts the slice by descending frequency.
A strings.Builder is used to efficiently construct the final string. This avoids repeated string allocations during concatenation.
Worked Examples
Example 1
Input:
s = "tree"
Step 1: Count Frequencies
| Character | Frequency |
|---|---|
| t | 1 |
| r | 1 |
| e | 2 |
Step 2: Sort by Frequency
Sorted pairs:
| Character | Frequency |
|---|---|
| e | 2 |
| t | 1 |
| r | 1 |
Step 3: Build Result
| Step | Result |
|---|---|
Add "ee" |
"ee" |
Add "t" |
"eet" |
Add "r" |
"eetr" |
Output:
"eetr"
"eert" would also be valid.
Example 2
Input:
s = "cccaaa"
Step 1: Count Frequencies
| Character | Frequency |
|---|---|
| c | 3 |
| a | 3 |
Step 2: Sort by Frequency
Both characters have equal frequency, so either order is acceptable.
Possible sorted order:
| Character | Frequency |
|---|---|
| c | 3 |
| a | 3 |
Step 3: Build Result
| Step | Result |
|---|---|
Add "ccc" |
"ccc" |
Add "aaa" |
"cccaaa" |
Output:
"cccaaa"
"aaaccc" is also valid.
Example 3
Input:
s = "Aabb"
Step 1: Count Frequencies
| Character | Frequency |
|---|---|
| A | 1 |
| a | 1 |
| b | 2 |
Step 2: Sort by Frequency
| Character | Frequency |
|---|---|
| b | 2 |
| A | 1 |
| a | 1 |
Step 3: Build Result
| Step | Result |
|---|---|
Add "bb" |
"bb" |
Add "A" |
"bbA" |
Add "a" |
"bbAa" |
Output:
"bbAa"
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + k log k) | Counting takes O(n), sorting distinct characters takes O(k log k) |
| Space | O(k) | Frequency map and sorting structures store distinct characters |
The counting step processes every character exactly once, giving O(n) time complexity.
If there are k distinct characters, sorting them requires O(k log k) time.
In practice, k is usually much smaller than n, especially because the problem limits characters to letters and digits.
The space complexity is O(k) because the frequency map stores one entry per distinct character.
Test Cases
sol = Solution()
# Provided examples
assert sol.frequencySort("tree") in ["eert", "eetr"] # basic mixed frequencies
assert sol.frequencySort("cccaaa") in ["cccaaa", "aaaccc"] # equal frequencies
assert sol.frequencySort("Aabb") in ["bbAa", "bbaA"] # case sensitivity
# Single character
assert sol.frequencySort("a") == "a" # minimum input size
# All unique characters
result = sol.frequencySort("abcd")
assert sorted(result) == sorted("abcd") # all frequencies equal
# All identical characters
assert sol.frequencySort("aaaaaa") == "aaaaaa" # single repeated character
# Digits included
result = sol.frequencySort("1122333")
assert result.startswith("333") # highest frequency digit first
# Mixed uppercase and lowercase
result = sol.frequencySort("AaAaBB")
assert result[:2] == "AA" or result[:2] == "aa" or result[:2] == "BB"
# Long repeated input
large_input = "z" * 100000
assert sol.frequencySort(large_input) == large_input # stress test
# Multiple equal groups
result = sol.frequencySort("bbccaa")
assert sorted(result) == sorted("bbccaa") # preserves all characters
# Frequency ordering validation
result = sol.frequencySort("mississippi")
assert result.count('i') == 4
assert result.count('s') == 4
assert result.count('p') == 2
assert result.count('m') == 1
| Test | Why |
|---|---|
"tree" |
Validates normal frequency sorting |
"cccaaa" |
Tests equal frequencies |
"Aabb" |
Verifies case sensitivity |
"a" |
Tests smallest valid input |
"abcd" |
Tests all unique characters |
"aaaaaa" |
Tests single repeated character |
"1122333" |
Ensures digits are handled correctly |
"AaAaBB" |
Tests mixed case behavior |
| Very large repeated string | Stress test for performance |
"bbccaa" |
Tests equal frequency groups |
"mississippi" |
Tests complex frequency distribution |
Edge Cases
One important edge case is when all characters appear exactly once, such as "abcd". In this situation, every character has equal frequency, so any ordering is valid. A buggy implementation might incorrectly assume stable ordering requirements. Our implementation handles this naturally because the problem explicitly allows multiple valid outputs.
Another important case is when the string contains only one distinct character, such as "aaaaaa". Some implementations accidentally introduce unnecessary sorting logic or incorrect joins. Our solution correctly counts the character once and outputs it repeated the required number of times.
Case sensitivity is also a common source of bugs. In "Aabb", uppercase 'A' and lowercase 'a' must be treated separately. Since the frequency map uses the exact character as the key, the implementation naturally distinguishes them correctly.
A final important edge case is very large input sizes. Since the string length can reach 500,000 characters, inefficient string concatenation inside loops could become extremely slow. Both implementations avoid this problem by using efficient string-building techniques, specifically list joining in Python and strings.Builder in Go.