LeetCode 451 - Sort Characters By Frequency

The problem asks us to rearrange the characters of a string so that characters with higher frequency appear before characters with lower frequency. The output string must group identical characters together, and those groups must appear in descending order of occurrence count.

LeetCode Problem 451

Difficulty: 🟡 Medium
Topics: Hash Table, String, Sorting, Heap (Priority Queue), Bucket Sort, Counting

Solution

Problem Understanding

The problem asks us to rearrange the characters of a string so that characters with higher frequency appear before characters with lower frequency. The output string must group identical characters together, and those groups must appear in descending order of occurrence count.

For example, if the input is "tree", the character 'e' appears twice, while 't' and 'r' appear once each. Since 'e' has the highest frequency, it must come first in the output. Valid outputs include "eert" and "eetr".

The input is a single string s, consisting of uppercase letters, lowercase letters, and digits. The output is another string containing exactly the same characters, but reordered according to frequency.

An important detail is that uppercase and lowercase letters are considered different characters. For instance, 'A' and 'a' are treated separately.

The constraints are significant:

  • 1 <= s.length <= 5 * 10^5

This means the string can contain up to 500,000 characters. Any solution with quadratic complexity, such as repeatedly scanning the string to count frequencies, will be too slow. We need an approach that is close to linear or O(n log n) at worst.

Several edge cases are important:

  • A string containing only one character, such as "a"
  • Multiple characters with the same frequency
  • Strings containing both uppercase and lowercase versions of the same letter
  • Very large inputs where efficiency matters
  • Strings where every character is unique

The problem guarantees that the input string is non-empty, so we never need to handle a completely empty input.

Approaches

Brute Force Approach

A brute-force solution would repeatedly search for the most frequent remaining character in the string.

One way to do this would be:

  1. Count how many times each character appears.
  2. Find the character with the maximum frequency.
  3. Append it to the result the required number of times.
  4. Remove it from consideration.
  5. Repeat until all characters are processed.

This approach is correct because each step greedily places the currently most frequent character next in the result.

However, if we repeatedly scan the frequency map to find the maximum remaining frequency, the algorithm becomes inefficient. In the worst case, when many unique characters exist, repeated searches increase the overall cost.

A more naive brute-force method could even count frequencies from scratch for each character, leading to quadratic behavior.

Given the large constraint size, we need a more efficient strategy.

Optimal Approach

The key observation is that we only need two operations:

  1. Count the frequency of each character
  2. Output characters ordered by frequency

A hash map is ideal for counting frequencies because it allows constant time insertion and lookup on average.

After counting frequencies, we can sort the characters by their counts in descending order. Since the number of distinct characters is limited compared to the total string length, this becomes efficient.

The process is:

  • Build a frequency map
  • Sort characters by frequency
  • Construct the final string

This avoids repeated scans and processes the input efficiently.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(k) Repeatedly searches for highest frequency character
Optimal O(n + k log k) O(k) Uses hash map and sorting by frequency

Here, n is the string length and k is the number of distinct characters.

Algorithm Walkthrough

  1. Create a hash map to store character frequencies.

We iterate through the string once and count how many times each character appears. A hash map is ideal because it provides efficient updates and lookups. 2. Convert the frequency map into a sortable structure.

After counting, each entry contains a character and its frequency. We need to sort these entries by frequency in descending order. 3. Sort the characters by frequency.

We sort all (character, frequency) pairs so the most frequent characters appear first. 4. Build the result string.

For each sorted pair:

  • Repeat the character frequency times
  • Append it to the result

This guarantees that all identical characters stay together. 5. Return the final string.

The constructed string satisfies the required ordering by frequency.

Why it works

The algorithm works because every character's exact frequency is counted first, and the sorting step guarantees that higher-frequency characters are processed before lower-frequency ones. Since each character is appended exactly as many times as it appeared in the original string, the final result contains all original characters in valid frequency order.

Python Solution

class Solution:
    def frequencySort(self, s: str) -> str:
        frequency = {}

        # Count character frequencies
        for char in s:
            frequency[char] = frequency.get(char, 0) + 1

        # Sort characters by descending frequency
        sorted_chars = sorted(
            frequency.items(),
            key=lambda item: item[1],
            reverse=True
        )

        # Build result string
        result = []

        for char, count in sorted_chars:
            result.append(char * count)

        return "".join(result)

The implementation begins by creating a dictionary called frequency. Each character in the string is processed once, and its count is incremented.

The sorted() function is then used on the dictionary items. Each item is a (character, count) pair. The sorting key is the frequency value, and reverse=True ensures descending order.

A list named result is used to efficiently build the output string. String concatenation inside loops is inefficient in Python because strings are immutable. Using a list and joining at the end avoids repeated allocations.

For each (character, count) pair, the character is repeated count times and appended to the result list.

Finally, "".join(result) combines all pieces into the final string.

Go Solution

package main

import (
	"sort"
	"strings"
)

func frequencySort(s string) string {
	frequency := make(map[rune]int)

	// Count frequencies
	for _, ch := range s {
		frequency[ch]++
	}

	// Convert map to slice
	type pair struct {
		char  rune
		count int
	}

	pairs := make([]pair, 0, len(frequency))

	for ch, count := range frequency {
		pairs = append(pairs, pair{ch, count})
	}

	// Sort by descending frequency
	sort.Slice(pairs, func(i, j int) bool {
		return pairs[i].count > pairs[j].count
	})

	// Build result
	var builder strings.Builder

	for _, p := range pairs {
		for i := 0; i < p.count; i++ {
			builder.WriteRune(p.char)
		}
	}

	return builder.String()
}

The Go implementation follows the same logic as the Python solution, but there are a few language-specific differences.

Go maps are used for frequency counting. Since strings in Go are UTF-8 encoded, iterating with range produces runes rather than bytes.

Because Go maps cannot be directly sorted, the frequency map is converted into a slice of structs containing character-count pairs.

The sort.Slice function sorts the slice by descending frequency.

A strings.Builder is used to efficiently construct the final string. This avoids repeated string allocations during concatenation.

Worked Examples

Example 1

Input:

s = "tree"

Step 1: Count Frequencies

Character Frequency
t 1
r 1
e 2

Step 2: Sort by Frequency

Sorted pairs:

Character Frequency
e 2
t 1
r 1

Step 3: Build Result

Step Result
Add "ee" "ee"
Add "t" "eet"
Add "r" "eetr"

Output:

"eetr"

"eert" would also be valid.

Example 2

Input:

s = "cccaaa"

Step 1: Count Frequencies

Character Frequency
c 3
a 3

Step 2: Sort by Frequency

Both characters have equal frequency, so either order is acceptable.

Possible sorted order:

Character Frequency
c 3
a 3

Step 3: Build Result

Step Result
Add "ccc" "ccc"
Add "aaa" "cccaaa"

Output:

"cccaaa"

"aaaccc" is also valid.

Example 3

Input:

s = "Aabb"

Step 1: Count Frequencies

Character Frequency
A 1
a 1
b 2

Step 2: Sort by Frequency

Character Frequency
b 2
A 1
a 1

Step 3: Build Result

Step Result
Add "bb" "bb"
Add "A" "bbA"
Add "a" "bbAa"

Output:

"bbAa"

Complexity Analysis

Measure Complexity Explanation
Time O(n + k log k) Counting takes O(n), sorting distinct characters takes O(k log k)
Space O(k) Frequency map and sorting structures store distinct characters

The counting step processes every character exactly once, giving O(n) time complexity.

If there are k distinct characters, sorting them requires O(k log k) time.

In practice, k is usually much smaller than n, especially because the problem limits characters to letters and digits.

The space complexity is O(k) because the frequency map stores one entry per distinct character.

Test Cases

sol = Solution()

# Provided examples
assert sol.frequencySort("tree") in ["eert", "eetr"]  # basic mixed frequencies
assert sol.frequencySort("cccaaa") in ["cccaaa", "aaaccc"]  # equal frequencies
assert sol.frequencySort("Aabb") in ["bbAa", "bbaA"]  # case sensitivity

# Single character
assert sol.frequencySort("a") == "a"  # minimum input size

# All unique characters
result = sol.frequencySort("abcd")
assert sorted(result) == sorted("abcd")  # all frequencies equal

# All identical characters
assert sol.frequencySort("aaaaaa") == "aaaaaa"  # single repeated character

# Digits included
result = sol.frequencySort("1122333")
assert result.startswith("333")  # highest frequency digit first

# Mixed uppercase and lowercase
result = sol.frequencySort("AaAaBB")
assert result[:2] == "AA" or result[:2] == "aa" or result[:2] == "BB"

# Long repeated input
large_input = "z" * 100000
assert sol.frequencySort(large_input) == large_input  # stress test

# Multiple equal groups
result = sol.frequencySort("bbccaa")
assert sorted(result) == sorted("bbccaa")  # preserves all characters

# Frequency ordering validation
result = sol.frequencySort("mississippi")
assert result.count('i') == 4
assert result.count('s') == 4
assert result.count('p') == 2
assert result.count('m') == 1
Test Why
"tree" Validates normal frequency sorting
"cccaaa" Tests equal frequencies
"Aabb" Verifies case sensitivity
"a" Tests smallest valid input
"abcd" Tests all unique characters
"aaaaaa" Tests single repeated character
"1122333" Ensures digits are handled correctly
"AaAaBB" Tests mixed case behavior
Very large repeated string Stress test for performance
"bbccaa" Tests equal frequency groups
"mississippi" Tests complex frequency distribution

Edge Cases

One important edge case is when all characters appear exactly once, such as "abcd". In this situation, every character has equal frequency, so any ordering is valid. A buggy implementation might incorrectly assume stable ordering requirements. Our implementation handles this naturally because the problem explicitly allows multiple valid outputs.

Another important case is when the string contains only one distinct character, such as "aaaaaa". Some implementations accidentally introduce unnecessary sorting logic or incorrect joins. Our solution correctly counts the character once and outputs it repeated the required number of times.

Case sensitivity is also a common source of bugs. In "Aabb", uppercase 'A' and lowercase 'a' must be treated separately. Since the frequency map uses the exact character as the key, the implementation naturally distinguishes them correctly.

A final important edge case is very large input sizes. Since the string length can reach 500,000 characters, inefficient string concatenation inside loops could become extremely slow. Both implementations avoid this problem by using efficient string-building techniques, specifically list joining in Python and strings.Builder in Go.