LeetCode 3545 - Minimum Deletions for At Most K Distinct Characters

The problem gives us a string s containing lowercase English letters and an integer k. We are allowed to delete any characters from the string, including multiple occurrences of the same character.

LeetCode Problem 3545

Difficulty: 🟢 Easy
Topics: Hash Table, String, Greedy, Sorting, Counting

Solution

LeetCode 3545 - Minimum Deletions for At Most K Distinct Characters

Problem Understanding

The problem gives us a string s containing lowercase English letters and an integer k. We are allowed to delete any characters from the string, including multiple occurrences of the same character. After performing the deletions, the resulting string must contain at most k distinct characters.

Our goal is to minimize the total number of deleted characters.

A key observation is that if we decide to remove a character type entirely, we must delete all occurrences of that character. Deleting only some occurrences of a character does not reduce the number of distinct characters, because the character still remains present in the string.

For example, if the string contains three distinct characters and we need only two, we must completely eliminate one character type. The cost of eliminating a character type is exactly its frequency in the original string.

The input consists of:

  • s, the original string.
  • k, the maximum number of distinct characters allowed after deletions.

The output is a single integer representing the minimum number of deletions required.

The constraints are very small:

  • 1 <= s.length <= 16
  • 1 <= k <= 16

Because the string length is at most 16, even less efficient approaches could potentially work. However, there is a very simple greedy solution that is both optimal and efficient.

Important edge cases include:

  • The string already contains at most k distinct characters.
  • k is larger than the number of distinct characters.
  • Every character appears exactly once.
  • One character appears many times while others appear very few times.
  • k = 1, meaning all but one distinct character must be removed.

Approaches

Brute Force

A brute force approach would consider every subset of character types to keep.

Suppose the string contains d distinct characters. We could enumerate all subsets of these characters, determine whether the subset contains at most k distinct characters, and compute how many deletions are required to keep only those characters.

Since there are 2^d subsets, this approach becomes exponential in the number of distinct characters. Although the constraints are small enough that it could work, it is more complicated than necessary.

The approach is correct because it explicitly checks every possible set of characters that could remain and chooses the one with the smallest deletion cost.

Key Insight

Each distinct character contributes its frequency to the string.

If there are d distinct characters and d <= k, no deletions are needed.

Otherwise, we must remove exactly d - k character types.

To minimize deletions, we should remove the character types with the smallest frequencies. Every removed character type costs its entire frequency in deletions, so choosing the cheapest character types is always optimal.

Therefore:

  1. Count the frequency of each distinct character.
  2. Sort the frequencies in ascending order.
  3. Remove the smallest d - k frequencies.
  4. Sum those frequencies.

This greedy strategy directly produces the minimum deletion count.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(2^d × d) O(d) Enumerates every subset of distinct characters
Optimal O(d log d) O(d) Sort frequencies and remove the smallest ones

Here, d is the number of distinct characters.

Algorithm Walkthrough

  1. Create a frequency map that counts how many times each character appears in the string.
  2. Compute the number of distinct characters, which is the size of the frequency map.
  3. If the number of distinct characters is already less than or equal to k, return 0 because no deletions are necessary.
  4. Extract all frequencies from the frequency map into a list.
  5. Sort the frequency list in ascending order.
  6. Let remove_count = distinct_count - k. This is the number of character types that must be eliminated completely.
  7. Sum the first remove_count frequencies in the sorted list. These correspond to the cheapest character types to remove.
  8. Return the computed sum.

Why it works

Each character type is independent. Removing a character type costs exactly its frequency. Since we must eliminate exactly distinct_count - k character types, the minimum possible cost is obtained by selecting the character types with the smallest frequencies. Any solution that removes a larger frequency instead of a smaller one would require at least as many deletions, so the greedy choice is always optimal.

Problem Understanding

The problem asks us to modify a given string s by deleting characters so that the resulting string contains at most k distinct characters. A “distinct character” means a unique lowercase letter that appears at least once in the final string. Our goal is not to minimize the number of distinct characters arbitrarily, but to ensure it is at most k while performing the fewest possible deletions.

The input consists of a string s of lowercase English letters and an integer k. The output is a single integer representing the minimum number of character deletions required.

A key observation is that deletions remove individual characters, not entire character types unless we choose to remove all occurrences of a character. Therefore, the cost of eliminating a character type is equal to its frequency in the string.

The constraints are small, with s.length <= 16, which suggests that even exponential approaches might pass, but the structure of the problem strongly hints at a greedy frequency-based solution.

Important edge cases include situations where:

  • The string already has <= k distinct characters, requiring zero deletions.
  • All characters are distinct and k = 1, forcing removal of all but one character type.
  • Highly skewed frequency distributions where keeping high-frequency characters minimizes deletions.

The problem guarantees only lowercase English letters, which bounds the character set to 26 possible values, simplifying frequency analysis.

Approaches

Brute Force Approach

A brute-force method would consider all possible subsets of characters to keep, ensuring that the subset size is at most k. For each subset, we compute how many deletions are required by removing all characters not in the subset. This ensures correctness because we explore every valid configuration.

However, this approach is inefficient because even though the alphabet size is limited to 26, the number of subsets grows exponentially as $2^{26}$. Even if we prune to only subsets of size k, we still face combinatorial explosion. The implementation complexity also increases significantly.

Key Insight (Optimal Approach)

The critical observation is that we do not care which characters remain, only how many remain. To minimize deletions, we should preserve the k most frequent characters in the string, because keeping high-frequency characters reduces the number of deletions needed.

If we sort character frequencies in ascending order, then removing the smallest frequencies until only k distinct characters remain yields the optimal solution. Every removed character type must be fully deleted, so the cost is the sum of their frequencies.

This transforms the problem into:

Compute frequencies → sort → remove smallest (distinct - k) frequencies.

Comparison of Approaches

Approach Time Complexity Space Complexity Notes
Brute Force O(2^26) O(1) Try all subsets of characters to keep
Optimal Greedy O(26 log 26) O(26) Keep k most frequent characters

Algorithm Walkthrough

The optimal algorithm relies on frequency counting and greedy selection of characters to remove.

  1. First, scan the string and compute the frequency of each character using a hash map or fixed-size array of length 26. This step is necessary because we need to know how costly it is to remove each character type.
  2. Extract all non-zero frequencies into a list. Each frequency represents the cost of completely removing that character from the string.
  3. If the number of distinct characters is already less than or equal to k, return 0 immediately since no deletions are required.
  4. Sort the frequency list in ascending order. Sorting ensures that the cheapest characters to remove are considered first.
  5. Compute how many character types must be removed: remove_count = distinct_count - k.
  6. Sum the smallest remove_count frequencies. This sum represents the minimum number of deletions required.
  7. Return the computed sum.

Why it works

The correctness comes from the greedy property that removing a character entirely is independent of others, and the cost is fixed per character type. Since we must reduce the number of distinct characters, the optimal strategy is always to eliminate the least costly character types first. This ensures that we preserve the most frequent characters, minimizing total deletions.

Python Solution

from collections import Counter

class Solution:
    def minDeletion(self, s: str, k: int) -> int:
        frequencies = list(Counter(s).values())

        distinct_count = len(frequencies)

        if distinct_count <= k:
            return 0

        frequencies.sort()

        remove_count = distinct_count - k

        return sum(frequencies[:remove_count])

The implementation begins by using Counter to count occurrences of every character.

The frequencies are extracted into a list, and the number of distinct characters is simply the length of that list.

If the number of distinct characters is already within the allowed limit, the function immediately returns 0.

Otherwise, the frequencies are sorted in ascending order. The number of character types that must be removed is distinct_count - k. Since removing a character type costs its entire frequency, we sum the smallest frequencies and return that value.

This directly implements the greedy strategy described earlier. class Solution: def minDeletion(self, s: str, k: int) -> int: freq = [0] * 26

    for ch in s:
        freq[ord(ch) - ord('a')] += 1
    
    counts = [f for f in freq if f > 0]
    
    if len(counts) <= k:
        return 0
    
    counts.sort()
    
    remove_count = len(counts) - k
    return sum(counts[:remove_count])

The implementation begins by building a fixed-size frequency array for all 26 lowercase letters. We then extract only the characters that appear in the string. If the number of distinct characters is already within the allowed limit, we return zero immediately.

Otherwise, we sort the frequencies so that we can identify the least expensive characters to remove. We then sum the smallest frequencies corresponding to the number of character types we must eliminate. This directly computes the minimum deletions.

## Go Solution

```go
package main

import "sort"

func minDeletion(s string, k int) int {
	freq := make(map[byte]int)

	for i := 0; i < len(s); i++ {
		freq[s[i]]++
	}

	frequencies := make([]int, 0, len(freq))
	for _, count := range freq {
		frequencies = append(frequencies, count)
	}

	distinctCount := len(frequencies)

	if distinctCount <= k {
		return 0
	}

	sort.Ints(frequencies)

	removeCount := distinctCount - k

	answer := 0
	for i := 0; i < removeCount; i++ {
		answer += frequencies[i]
	}

	return answer
}

The Go solution follows the same logic as the Python version.

A map[byte]int is used to count character frequencies. The frequencies are copied into a slice and sorted using sort.Ints.

The smallest distinctCount - k frequencies are summed to obtain the answer.

There are no integer overflow concerns because the string length is at most 16, so the maximum possible answer is also at most 16.

Worked Examples

Example 1

Input: s = "abc", k = 2

Frequency map:

Character Frequency
a 1
b 1
c 1

Distinct characters = 3

Since 3 > 2, we must remove:

Variable Value
distinct_count 3
k 2
remove_count 1

Sorted frequencies:

Index Frequency
0 1
1 1
2 1

Sum of first 1 frequency:

1

Answer = 1

Example 2

Input: s = "aabb", k = 2

Frequency map:

Character Frequency
a 2
b 2

Distinct characters = 2

Since 2 <= 2, no deletions are required.

Answer = 0

Example 3

Input: s = "yyyzz", k = 1

Frequency map:

Character Frequency
y 3
z 2

Distinct characters = 2

Required removals:

Variable Value
distinct_count 2
k 1
remove_count 1

Sorted frequencies:

Index Frequency
0 2
1 3

Sum of first 1 frequency:

2

Answer = 2 func minDeletion(s string, k int) int { freq := make([]int, 26)

for i := 0; i < len(s); i++ {
    freq[s[i]-'a']++
}

counts := make([]int, 0)
for _, f := range freq {
    if f > 0 {
        counts = append(counts, f)
    }
}

if len(counts) <= k {
    return 0
}

// simple insertion sort is fine due to tiny size (<=26)
for i := 0; i < len(counts); i++ {
    for j := i + 1; j < len(counts); j++ {
        if counts[j] < counts[i] {
            counts[i], counts[j] = counts[j], counts[i]
        }
    }
}

removeCount := len(counts) - k
result := 0
for i := 0; i < removeCount; i++ {
    result += counts[i]
}

return result

}


In Go, we use a fixed-size integer array for frequency counting. Since Go does not have a built-in small-array sort optimized for such tiny sizes in a guaranteed way, we use a simple nested-loop sort, which is efficient enough given the maximum size of 26.

We then compute the result exactly as in Python by summing the smallest frequencies needed to reduce the distinct character count.

## Worked Examples

### Example 1: s = "abc", k = 2

Frequency table:

- a: 1
- b: 1
- c: 1

Counts list: `[1, 1, 1]`

Distinct = 3, need to remove 1 character type.

Sorted counts: `[1, 1, 1]`

Remove smallest 1:

- Remove `'a'` → cost = 1

Result = 1

### Example 2: s = "aabb", k = 2

Frequency table:

- a: 2
- b: 2

Counts: `[2, 2]`

Distinct = 2, which is already ≤ k.

Result = 0

No sorting or removal needed.

### Example 3: s = "yyyzz", k = 1

Frequency table:

- y: 3
- z: 2

Counts: `[3, 2]`

Distinct = 2, need to remove 1 character type.

Sorted counts: `[2, 3]`

Remove smallest:

- Remove `z` → cost = 2

Result = 2

## Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(d log d) | Sorting the frequency list dominates the runtime |
| Space | O(d) | Frequency map and frequency list store one entry per distinct character |

Here, `d` is the number of distinct characters in the string.

Since the English alphabet contains only 26 lowercase letters and the string length is at most 16, the actual runtime is extremely small. Nevertheless, the asymptotic complexity is determined by sorting the distinct character frequencies.
| Time | O(n + 26 log 26) | Counting frequencies takes O(n), sorting at most 26 elements is constant bounded |
| Space | O(26) | Fixed array and list of at most 26 frequencies |

The dominant factor is the initial scan of the string. Sorting is negligible due to the constant alphabet size.

## Test Cases

solution = Solution()

assert solution.minDeletion("abc", 2) == 1 # Example 1 assert solution.minDeletion("aabb", 2) == 0 # Example 2 assert solution.minDeletion("yyyzz", 1) == 2 # Example 3

assert solution.minDeletion("a", 1) == 0 # Single character assert solution.minDeletion("aaaa", 1) == 0 # One distinct character assert solution.minDeletion("abcd", 4) == 0 # Already satisfies limit assert solution.minDeletion("abcd", 5) == 0 # k larger than distinct count

assert solution.minDeletion("abcd", 1) == 3 # Keep one character assert solution.minDeletion("abcde", 2) == 3 # Remove three character types

assert solution.minDeletion("aaabbc", 2) == 1 # Remove frequency 1 assert solution.minDeletion("aaabbc", 1) == 3 # Remove frequencies 1 and 2

assert solution.minDeletion("aabbbbcccc", 2) == 2 # Remove smallest frequency

assert solution.minDeletion("abcdefghijklmnop", 1) == 15 # Maximum length, all unique assert solution.minDeletion("abcdefghijklmnop", 16) == 0 # Maximum k


## Test Case Summary

| Test | Why |
| --- | --- |
| `"abc", 2` | Basic example with one removal |
| `"aabb", 2` | No deletions needed |
| `"yyyzz", 1` | Remove lower frequency character |
| `"a", 1` | Smallest valid input |
| `"aaaa", 1` | Single distinct character |
| `"abcd", 4` | Distinct count equals k |
| `"abcd", 5` | k exceeds distinct count |
| `"abcd", 1` | Must remove multiple character types |
| `"abcde", 2` | Several removals required |
| `"aaabbc", 2` | Greedy removes smallest frequency |
| `"aaabbc", 1` | Multiple smallest frequencies removed |
| `"aabbbbcccc", 2` | Uneven frequency distribution |
| `"abcdefghijklmnop", 1` | Maximum removals |
| `"abcdefghijklmnop", 16` | Maximum allowed distinct count |

## Edge Cases

### The String Already Satisfies the Requirement

If the number of distinct characters is already less than or equal to `k`, the correct answer is zero. A common mistake is to continue processing and accidentally remove characters unnecessarily. The implementation explicitly checks `distinct_count <= k` and returns immediately.

### All Characters Have Frequency One

Consider `s = "abcdef"` and `k = 2`. Every character has the same frequency, so there is no preference among them. We simply need to remove four character types, costing `1 + 1 + 1 + 1 = 4` deletions. Sorting still works correctly because all frequencies are equal.

### Highly Uneven Frequencies

Consider `s = "aaaaaaaabc"` and `k = 1`. The frequencies are `[8, 1, 1]`. The optimal solution removes the two rare characters for a cost of only `2`. A naive strategy that removes the most frequent character would require `8` deletions. Sorting and removing the smallest frequencies guarantees the minimum cost.

### k Greater Than the Number of Distinct Characters

For example, `s = "abc"` and `k = 10`. The requirement is already satisfied because the string has only three distinct characters. The implementation correctly returns `0` without performing any deletions.

### k Equal to One

When `k = 1`, only a single character type may remain. The algorithm naturally handles this by removing all character types except the one with the largest frequency, which is equivalent to summing all smaller frequencies after sorting. This minimizes the deletion count.
assert Solution().minDeletion("abc", 2) == 1  # basic removal case
assert Solution().minDeletion("aabb", 2) == 0  # already valid
assert Solution().minDeletion("yyyzz", 1) == 2  # remove one full character type
assert Solution().minDeletion("a", 1) == 0  # single character
assert Solution().minDeletion("a", 0) == 1  # edge: must remove all (even though k>=1 constraint usually)
assert Solution().minDeletion("aaabbbccc", 2) == 3  # keep two most frequent groups
assert Solution().minDeletion("abcdabcd", 3) == 2  # remove one character type
Test Why
"abc", 2 minimal single deletion
"aabb", 2 no operation needed
"yyyzz", 1 must eliminate one full character group
"a", 1 smallest valid input
"aaabbbccc", 2 frequency-based selection matters
"abcdabcd", 3 balanced frequencies, pruning needed

Edge Cases

One important edge case is when the number of distinct characters is already less than or equal to k. In this case, no deletions are required, and the algorithm must return zero immediately. Failing to handle this can lead to unnecessary sorting and incorrect summation logic.

Another edge case is when all characters are distinct and k = 1. Here, the algorithm must effectively choose the single most frequent character and delete all others entirely. This case validates that the greedy frequency-sorting approach is necessary and optimal.

A third edge case involves highly skewed distributions, such as one character appearing many times while others appear once. The algorithm correctly handles this by ensuring that low-frequency characters are removed first, even if they are numerous, because each contributes more efficiently to reducing distinct count per deletion cost ratio.