LeetCode 1647 - Minimum Deletions to Make Character Frequencies Unique

Difficulty: 🟡 Medium
Topics: Hash Table, String, Greedy, Sorting

Solution

Problem Understanding

The problem asks us to make a string good, which means no two distinct characters in the string have the same frequency. The input is a string s consisting of lowercase English letters, and the output is the minimum number of character deletions required to achieve a good string. The frequency of a character is simply the number of times it appears in the string.

The constraints indicate that s can be up to length 10^5, so solutions with quadratic time complexity would be too slow. Inputs can include strings where all characters are the same, all characters are unique, or where multiple characters share the same frequency. The problem explicitly allows characters to be deleted until the string is good, including potentially deleting all instances of a character, and frequencies of zero do not conflict with other characters. Edge cases that could trip up naive implementations include strings with many repeated characters forming large frequency clusters or strings where all characters have frequency 1.

Approaches

A brute-force approach would involve repeatedly checking all pairs of character frequencies and decrementing a frequency whenever a duplicate is found, then repeating until all frequencies are unique. This method is correct because it ensures uniqueness by iterative deletion, but it is inefficient, potentially requiring O(n^2) operations in the worst case if many characters share the same frequency.

The key insight for an optimal approach is that we only care about the set of frequencies. If multiple characters share the same frequency, we can greedily reduce the frequency of one of them until it becomes unique. By sorting the frequencies in descending order, we can iteratively adjust them while keeping track of the maximum allowable frequency to avoid duplicates. This works efficiently because we never increase frequencies, and we always resolve conflicts greedily from largest to smallest, which minimizes deletions.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n^2)	O(26)	Repeatedly scan and decrement conflicting frequencies
Optimal	O(n + k log k)	O(26)	Count frequencies, sort, greedily reduce duplicates; k <= 26 letters

Algorithm Walkthrough

Count the frequency of each character in the string using a hash map or fixed-size array of size 26.
Store all non-zero frequencies in a list.
Sort the list of frequencies in descending order so we handle higher frequencies first, which allows us to make minimal deletions.
Initialize a variable deletions to 0 to track the total number of deletions.
Set a variable max_allowed to the first frequency in the sorted list. This represents the maximum frequency the next character can have to maintain uniqueness.
Iterate through the sorted frequencies starting from the second element:
If the current frequency is greater than or equal to max_allowed, reduce it to max(0, max_allowed - 1) and add the difference to deletions.
Update max_allowed to the new frequency for the next iteration.
Return the total deletions.

Why it works: By processing frequencies in descending order and always limiting the next frequency to be less than the previous one, we maintain the invariant that no two frequencies are the same while minimizing deletions. Frequencies cannot increase, so greedily reducing duplicates ensures minimal total deletions.

Python Solution

class Solution:
    def minDeletions(self, s: str) -> int:
        from collections import Counter

        freq_counter = Counter(s)
        frequencies = sorted(freq_counter.values(), reverse=True)
        deletions = 0
        max_allowed = frequencies[0]

        for freq in frequencies:
            if freq > max_allowed:
                deletions += freq - max_allowed
                freq = max_allowed
            max_allowed = max(0, freq - 1)

        return deletions

This implementation first counts character frequencies using Counter, sorts them in descending order, and then iteratively adjusts them using the greedy approach described. The max_allowed ensures each frequency is strictly less than the previous, and max(0, ...) ensures we never have negative frequencies.

Go Solution

import "sort"

func minDeletions(s string) int {
    freq := make([]int, 26)
    for _, ch := range s {
        freq[ch-'a']++
    }

    freqs := []int{}
    for _, f := range freq {
        if f > 0 {
            freqs = append(freqs, f)
        }
    }

    sort.Sort(sort.Reverse(sort.IntSlice(freqs)))

    deletions := 0
    maxAllowed := freqs[0]

    for _, f := range freqs {
        if f > maxAllowed {
            deletions += f - maxAllowed
            f = maxAllowed
        }
        if maxAllowed > 0 {
            maxAllowed = f - 1
        }
    }

    return deletions
}

The Go version uses a fixed-size array to count character frequencies and slices to hold non-zero frequencies. Sorting and greedy reduction follow the same logic as Python. Go requires careful handling of slices and integer indices, but the logic mirrors the Python approach.

Worked Examples

Example 1: s = "aab"

Frequencies: [2,1] → Sorted [2,1] → max_allowed = 2

'a': freq 2 ≤ 2 → OK
'b': freq 1 < 2 → OK

Deletions = 0

Example 2: s = "aaabbbcc"

Frequencies: [3,3,2] → Sorted [3,3,2] → max_allowed = 3

First 3 → OK
Second 3 → reduce to 2 → deletions += 1, max_allowed = 1
Third 2 → reduce to 1 → deletions += 1, max_allowed = 0

Total deletions = 2

Example 3: s = "ceabaacb"

Frequencies: [3,2,2,1] → Sorted [3,2,2,1] → max_allowed = 3

3 → OK
2 → OK, max_allowed = 1
2 → reduce to 1 → deletions +=1, max_allowed = 0
1 → reduce to 0 → deletions +=1

Total deletions = 2

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n + k log k)	Counting frequencies is O(n), sorting ≤ 26 elements is O(k log k) where k ≤ 26, iterating frequencies is O(k)
Space	O(k)	Frequency array of size 26 plus temporary list of non-zero frequencies

The algorithm is efficient because k is bounded by 26, so sorting and iteration are effectively constant, making this linear in the length of the string.

Test Cases

# Provided examples
assert Solution().minDeletions("aab") == 0  # already good
assert Solution().minDeletions("aaabbbcc") == 2  # reduce duplicates
assert Solution().minDeletions("ceabaacb") == 2  # deletions needed for uniqueness

# Edge cases
assert Solution().minDeletions("a") == 0  # single character
assert Solution().minDeletions("abc") == 0  # all unique, frequency 1
assert Solution().minDeletions("aaaa") == 0  # single repeated character
assert Solution().minDeletions("aaaabbbbcccc") == 3  # need to adjust duplicates
assert Solution().minDeletions("abcdefghijklmnopqrstuvwxyza") == 1  # only one duplicate frequency

Test	Why
`"aab"`	Already good, no deletions required
`"aaabbbcc"`	Multiple characters with same frequency require deletions
`"ceabaacb"`	Multiple adjustments needed for uniqueness
`"a"`	Single character edge case
`"abc"`	All characters unique
`"aaaabbbbcccc"`	Multiple duplicates in larger clusters
`"abcdefghijklmnopqrstuvwxyza"`	Single duplicate at the end

Edge Cases

Case 1: Single character string

Strings like "a" have only one character, frequency 1. No deletions are needed, and the algorithm correctly handles this because max_allowed starts at 1 and no other frequencies exist.

Case 2: All characters with frequency 1

A string like "abcdef" has frequencies [1,1,1,1,1,1]. The algorithm reduces each subsequent frequency to strictly less than the previous, eventually deleting characters to reach uniqueness. The first frequency stays 1, the next becomes 0, which is allowed because frequency 0 is ignored.

Case 3: Large repeated clusters

Strings with many repeated characters like "aaaabbbbcccc" have frequencies [4,4,4]. The algorithm sorts [4,4,4] and greedily reduces them to [4,3,2] for minimal deletions, correctly computing deletions as 1 + 2 = 3. This ensures minimal deletions while maintaining uniqueness.