LeetCode 1531 - String Compression II
The problem asks us to minimize the length of a run-length encoded string after deleting at most k characters from the o
Difficulty: 🔴 Hard
Topics: String, Dynamic Programming
Solution
Problem Understanding
The problem asks us to minimize the length of a run-length encoded string after deleting at most k characters from the original string.
Run-length encoding compresses consecutive repeated characters by writing the character followed by its count, but only when the count is greater than 1.
For example:
"a"becomes"a"with length1"aa"becomes"a2"with length2"aaa"becomes"a3"with length2"aaaaaaaaaa"becomes"a10"with length3
The important detail is that the compressed length changes only at specific count thresholds:
| Count | Encoded Form | Length |
|---|---|---|
| 1 | "a" |
1 |
| 2-9 | "a2" to "a9" |
2 |
| 10-99 | "a10" to "a99" |
3 |
| 100 | "a100" |
4 |
We are allowed to delete up to k characters anywhere in the string. After deletions, the remaining characters stay in their original order. The goal is to produce the shortest possible compressed representation.
The input consists of:
- A string
sof lowercase English letters - An integer
krepresenting the maximum deletions allowed
The output is a single integer representing the minimum possible compressed length.
The constraints are relatively small:
s.length <= 100k <= 100
These constraints strongly suggest that dynamic programming is appropriate. A brute-force solution that tries every subset of deletions would be exponential and infeasible.
Several edge cases are important:
- If
k >= len(s), we can delete the entire string, producing length0 - Single-character runs do not include a numeric suffix
- Compression length increases only when counts cross
1,9, or99 - Deleting characters between matching runs can merge them into a larger run
- Strings with many alternating characters behave differently from strings with long repeated blocks
Understanding when the encoded length changes is the key insight behind the optimal solution.
Approaches
Brute Force Approach
A brute-force approach would try every possible combination of up to k deletions.
For each subset of deleted positions:
- Construct the remaining string
- Run-length encode it
- Compute the compressed length
- Track the minimum
This approach is correct because it explores every valid deletion configuration.
However, it is far too slow. A string of length 100 has 2^100 possible subsets. Even restricting deletions to at most k characters still produces a combinatorial explosion.
The brute-force solution is therefore impractical.
Optimal Dynamic Programming Approach
The key observation is that the compressed length depends only on:
- The current character run
- The count of that run
- The remaining deletions
We can process the string from left to right and decide:
- Keep characters to extend a run
- Delete characters to avoid increasing compressed length
- Delete intervening characters to merge runs
The important insight is that when building a run starting at position i, we only care about:
- How many matching characters we keep
- How many non-matching characters we delete
This naturally leads to dynamic programming.
Define:
dp(i, k)= minimum compressed length for substrings[i:]using at mostkdeletions
At each position:
- We can delete
s[i] - Or build a run starting from
s[i]
While expanding the run:
- Matching characters increase the run count
- Non-matching characters consume deletions
We compute the compressed contribution of the current run and recursively solve the remaining suffix.
Because the string length is only 100, this solution is efficient enough.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n * n) | O(n) | Tries every deletion subset |
| Optimal Dynamic Programming | O(n² * k) | O(n * k) | Memoized DP over substring positions and deletions |
Algorithm Walkthrough
Step 1: Define the DP State
We define:
dp(index, remaining_deletions)
This represents the minimum compressed length obtainable from s[index:] using at most remaining_deletions deletions.
This state is sufficient because:
- The remaining substring determines future choices
- The remaining deletion budget constrains operations
Step 2: Handle Base Cases
If the remaining substring length is less than or equal to the number of deletions left, we can delete everything.
So:
if len(s) - index <= remaining_deletions:
return 0
This is an important optimization.
Step 3: Option 1, Delete Current Character
We can delete s[index].
That gives:
dp(index + 1, remaining_deletions - 1)
This option is valid only if we still have deletions available.
Step 4: Option 2, Build a Run
We try to create a compressed group beginning with s[index].
We iterate from index forward:
- If the character matches
s[index], increase the run count - Otherwise, count it as a deletion
We maintain:
same_countdeleted_count
Step 5: Compute Compression Cost
The compressed contribution depends on the run length.
| Run Count | Added Length |
|---|---|
| 1 | 1 |
| 2-9 | 2 |
| 10-99 | 3 |
| 100 | 4 |
We define a helper function:
encoded_length(count)
Step 6: Recursively Solve Remaining Suffix
After forming a run ending at position j, recursively solve:
encoded_length(same_count) + dp(j + 1, remaining_deletions - deleted_count)
We take the minimum over all possible endpoints.
Step 7: Memoize Results
Many states repeat.
We store computed results in a memoization table keyed by:
(index, remaining_deletions)
This reduces the exponential recursion into polynomial time.
Why it works
The algorithm works because every optimal compressed string must begin with one of two choices:
- Delete the current character
- Keep it as part of a run
When building a run, every possible valid endpoint is explored. Since the recursive call optimally solves the remaining suffix, the entire solution is optimal by dynamic programming optimal substructure.
Memoization guarantees each state is solved only once.
Python Solution
from functools import lru_cache
class Solution:
def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
n = len(s)
def encoded_length(count: int) -> int:
if count == 1:
return 1
if count < 10:
return 2
if count < 100:
return 3
return 4
@lru_cache(None)
def dp(index: int, remaining_k: int) -> int:
if remaining_k < 0:
return float("inf")
if index >= n or n - index <= remaining_k:
return 0
result = float("inf")
# Option 1: delete current character
result = dp(index + 1, remaining_k - 1)
# Option 2: keep current character and build a run
same_count = 0
deleted_count = 0
for j in range(index, n):
if s[j] == s[index]:
same_count += 1
else:
deleted_count += 1
if deleted_count > remaining_k:
break
current_length = encoded_length(same_count)
result = min(
result,
current_length + dp(j + 1, remaining_k - deleted_count)
)
return result
return dp(0, k)
The implementation closely follows the algorithm described earlier.
The encoded_length() helper computes how many characters a run contributes to the compressed string. This is important because the encoded size changes only when the count crosses digit boundaries.
The recursive dp() function solves the minimum compression problem for a suffix of the string.
The first optimization checks whether the remaining substring can be entirely deleted. If so, the compressed length becomes zero immediately.
Inside the recursion, we first consider deleting the current character. This consumes one deletion budget and moves to the next position.
Next, we try forming a run starting at the current index. As we expand the run:
- Matching characters increase
same_count - Non-matching characters are treated as deleted
If deletions exceed the available budget, we stop expanding.
For every valid run endpoint, we compute:
compressed_length_of_current_run + optimal_solution_for_remaining_suffix
Memoization ensures that repeated subproblems are solved only once.
Go Solution
package main
import "math"
func getLengthOfOptimalCompression(s string, k int) int {
n := len(s)
encodedLength := func(count int) int {
if count == 1 {
return 1
}
if count < 10 {
return 2
}
if count < 100 {
return 3
}
return 4
}
memo := make(map[[2]int]int)
var dp func(int, int) int
dp = func(index int, remainingK int) int {
if remainingK < 0 {
return math.MaxInt32
}
if index >= n || n-index <= remainingK {
return 0
}
key := [2]int{index, remainingK}
if value, exists := memo[key]; exists {
return value
}
result := dp(index+1, remainingK-1)
sameCount := 0
deletedCount := 0
for j := index; j < n; j++ {
if s[j] == s[index] {
sameCount++
} else {
deletedCount++
}
if deletedCount > remainingK {
break
}
currentLength := encodedLength(sameCount)
candidate := currentLength + dp(j+1, remainingK-deletedCount)
if candidate < result {
result = candidate
}
}
memo[key] = result
return result
}
return dp(0, k)
}
The Go implementation mirrors the Python solution closely.
Instead of Python's lru_cache, Go uses a map with a two-element array key:
map[[2]int]int
This stores memoized DP states.
Go does not have built-in infinity values for integers, so math.MaxInt32 is used as a large sentinel value.
Strings in Go are indexed as bytes, which is safe here because the input contains only lowercase English letters.
Worked Examples
Example 1
s = "aaabcccd"
k = 2
Goal:
minimum compressed length
Initial call:
dp(0, 2)
We explore runs starting at index 0.
| j | Character | sameCount | deletedCount | Encoded | Recursive Call |
|---|---|---|---|---|---|
| 0 | a | 1 | 0 | 1 | dp(1,2) |
| 1 | a | 2 | 0 | 2 | dp(2,2) |
| 2 | a | 3 | 0 | 2 | dp(3,2) |
| 3 | b | 3 | 1 | 2 | dp(4,1) |
| 4 | c | 3 | 2 | 2 | dp(5,0) |
The optimal strategy deletes:
'b' and 'd'
Remaining string:
"aaaccc"
Compressed form:
"a3c3"
Length:
4
Example 2
s = "aabbaa"
k = 2
Optimal deletion:
delete both 'b'
Remaining string:
"aaaa"
Compressed:
"a4"
Length:
2
Key insight:
Deleting characters between identical runs can merge them into one larger run.
Example 3
s = "aaaaaaaaaaa"
k = 0
No deletions allowed.
Run length:
11
Compressed:
"a11"
Length:
3
The algorithm simply builds the single run and computes its encoded size.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n² * k) | Each DP state scans forward through the string |
| Space | O(n * k) | Memoization table stores states |
There are at most n * k DP states.
For each state, we may scan up to n characters forward while building a run.
Therefore, the total complexity becomes:
O(n² * k)
Since n <= 100, this is efficient enough.
The memoization table stores one result per (index, remaining_k) pair, requiring:
O(n * k)
space.
Test Cases
solution = Solution()
# Provided examples
assert solution.getLengthOfOptimalCompression("aaabcccd", 2) == 4 # delete b and d
assert solution.getLengthOfOptimalCompression("aabbaa", 2) == 2 # merge into a4
assert solution.getLengthOfOptimalCompression("aaaaaaaaaaa", 0) == 3 # a11
# Single character
assert solution.getLengthOfOptimalCompression("a", 0) == 1 # no compression suffix
# Delete entire string
assert solution.getLengthOfOptimalCompression("abc", 3) == 0 # everything removed
# No deletions
assert solution.getLengthOfOptimalCompression("abcdef", 0) == 6 # all unique chars
# Large repeated block
assert solution.getLengthOfOptimalCompression("aaaaaaaaaa", 0) == 3 # a10
# Compression threshold crossing
assert solution.getLengthOfOptimalCompression("aaaaaaaaa", 0) == 2 # a9
assert solution.getLengthOfOptimalCompression("aaaaaaaaaa", 1) == 2 # delete one -> a9
# Merge separated runs
assert solution.getLengthOfOptimalCompression("aaabbaaa", 2) == 2 # remove bb -> a6
# Alternating characters
assert solution.getLengthOfOptimalCompression("abababab", 2) == 5 # difficult merging
# All same characters with deletions
assert solution.getLengthOfOptimalCompression("aaaaaaaaaa", 5) == 2 # a5
# Boundary size
assert solution.getLengthOfOptimalCompression("a" * 100, 0) == 4 # a100
| Test | Why |
|---|---|
"aaabcccd", 2 |
Standard mixed example |
"aabbaa", 2 |
Tests merging separated runs |
"aaaaaaaaaaa", 0 |
Multi-digit count handling |
"a", 0 |
Smallest valid input |
"abc", 3 |
Entire string deletion |
"abcdef", 0 |
No compression benefit |
"aaaaaaaaaa", 0 |
Threshold from 9 to 10 |
"aaaaaaaaaa", 1 |
Strategic deletion reduces encoded size |
"aaabbaaa", 2 |
Merge runs after deletions |
"abababab", 2 |
Alternating characters |
"aaaaaaaaaa", 5 |
Large deletion budget |
"a" * 100, 0 |
Maximum count boundary |
Edge Cases
One important edge case occurs when the deletion budget is large enough to remove the entire remaining substring. A naive implementation might continue recursing unnecessarily or even produce negative lengths. This implementation explicitly checks:
if n - index <= remaining_k:
return 0
This guarantees correct handling and significantly improves efficiency.
Another tricky edge case involves count thresholds such as 1, 9, and 99. The compressed length only increases when the number of digits changes. For example:
"a9"has length2"a10"has length3
An incorrect implementation might assume every additional character increases the encoded length. The helper function correctly handles these transitions.
A third important edge case involves merging separated runs by deleting intermediate characters. For example:
"aabbaa"
Deleting the two 'b' characters merges two separate 'aa' groups into one 'aaaa' group. A greedy approach that only considers local compression would miss this optimization. The dynamic programming solution explores all possible run expansions and therefore correctly identifies such merges.