LeetCode 1531 - String Compression II

Difficulty: 🔴 Hard
Topics: String, Dynamic Programming

Solution

Problem Understanding

The problem asks us to minimize the length of a run-length encoded string after deleting at most k characters from the original string.

Run-length encoding compresses consecutive repeated characters by writing the character followed by its count, but only when the count is greater than 1.

For example:

"a" becomes "a" with length 1
"aa" becomes "a2" with length 2
"aaa" becomes "a3" with length 2
"aaaaaaaaaa" becomes "a10" with length 3

The important detail is that the compressed length changes only at specific count thresholds:

Count	Encoded Form	Length
1	`"a"`	1
2-9	`"a2"` to `"a9"`	2
10-99	`"a10"` to `"a99"`	3
100	`"a100"`	4

We are allowed to delete up to k characters anywhere in the string. After deletions, the remaining characters stay in their original order. The goal is to produce the shortest possible compressed representation.

The input consists of:

A string s of lowercase English letters
An integer k representing the maximum deletions allowed

The output is a single integer representing the minimum possible compressed length.

The constraints are relatively small:

s.length <= 100
k <= 100

These constraints strongly suggest that dynamic programming is appropriate. A brute-force solution that tries every subset of deletions would be exponential and infeasible.

Several edge cases are important:

If k >= len(s), we can delete the entire string, producing length 0
Single-character runs do not include a numeric suffix
Compression length increases only when counts cross 1, 9, or 99
Deleting characters between matching runs can merge them into a larger run
Strings with many alternating characters behave differently from strings with long repeated blocks

Understanding when the encoded length changes is the key insight behind the optimal solution.

Approaches

Brute Force Approach

A brute-force approach would try every possible combination of up to k deletions.

For each subset of deleted positions:

Construct the remaining string
Run-length encode it
Compute the compressed length
Track the minimum

This approach is correct because it explores every valid deletion configuration.

However, it is far too slow. A string of length 100 has 2^100 possible subsets. Even restricting deletions to at most k characters still produces a combinatorial explosion.

The brute-force solution is therefore impractical.

Optimal Dynamic Programming Approach

The key observation is that the compressed length depends only on:

The current character run
The count of that run
The remaining deletions

We can process the string from left to right and decide:

Keep characters to extend a run
Delete characters to avoid increasing compressed length
Delete intervening characters to merge runs

The important insight is that when building a run starting at position i, we only care about:

How many matching characters we keep
How many non-matching characters we delete

This naturally leads to dynamic programming.

Define:

dp(i, k) = minimum compressed length for substring s[i:] using at most k deletions

At each position:

We can delete s[i]
Or build a run starting from s[i]

While expanding the run:

Matching characters increase the run count
Non-matching characters consume deletions

We compute the compressed contribution of the current run and recursively solve the remaining suffix.

Because the string length is only 100, this solution is efficient enough.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(2^n * n)	O(n)	Tries every deletion subset
Optimal Dynamic Programming	O(n² * k)	O(n * k)	Memoized DP over substring positions and deletions

Algorithm Walkthrough

Step 1: Define the DP State

We define:

dp(index, remaining_deletions)

This represents the minimum compressed length obtainable from s[index:] using at most remaining_deletions deletions.

This state is sufficient because:

The remaining substring determines future choices
The remaining deletion budget constrains operations

Step 2: Handle Base Cases

If the remaining substring length is less than or equal to the number of deletions left, we can delete everything.

So:

if len(s) - index <= remaining_deletions:
    return 0

This is an important optimization.

Step 3: Option 1, Delete Current Character

We can delete s[index].

That gives:

dp(index + 1, remaining_deletions - 1)

This option is valid only if we still have deletions available.

Step 4: Option 2, Build a Run

We try to create a compressed group beginning with s[index].

We iterate from index forward:

If the character matches s[index], increase the run count
Otherwise, count it as a deletion

We maintain:

same_count
deleted_count

Step 5: Compute Compression Cost

The compressed contribution depends on the run length.

Run Count	Added Length
1	1
2-9	2
10-99	3
100	4

We define a helper function:

encoded_length(count)

Step 6: Recursively Solve Remaining Suffix

After forming a run ending at position j, recursively solve:

encoded_length(same_count) + dp(j + 1, remaining_deletions - deleted_count)

We take the minimum over all possible endpoints.

Step 7: Memoize Results

Many states repeat.

We store computed results in a memoization table keyed by:

(index, remaining_deletions)

This reduces the exponential recursion into polynomial time.

Why it works

The algorithm works because every optimal compressed string must begin with one of two choices:

Delete the current character
Keep it as part of a run

When building a run, every possible valid endpoint is explored. Since the recursive call optimally solves the remaining suffix, the entire solution is optimal by dynamic programming optimal substructure.

Memoization guarantees each state is solved only once.

Python Solution

from functools import lru_cache

class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        n = len(s)

        def encoded_length(count: int) -> int:
            if count == 1:
                return 1
            if count < 10:
                return 2
            if count < 100:
                return 3
            return 4

        @lru_cache(None)
        def dp(index: int, remaining_k: int) -> int:
            if remaining_k < 0:
                return float("inf")

            if index >= n or n - index <= remaining_k:
                return 0

            result = float("inf")

            # Option 1: delete current character
            result = dp(index + 1, remaining_k - 1)

            # Option 2: keep current character and build a run
            same_count = 0
            deleted_count = 0

            for j in range(index, n):
                if s[j] == s[index]:
                    same_count += 1
                else:
                    deleted_count += 1

                if deleted_count > remaining_k:
                    break

                current_length = encoded_length(same_count)

                result = min(
                    result,
                    current_length + dp(j + 1, remaining_k - deleted_count)
                )

            return result

        return dp(0, k)

The implementation closely follows the algorithm described earlier.

The encoded_length() helper computes how many characters a run contributes to the compressed string. This is important because the encoded size changes only when the count crosses digit boundaries.

The recursive dp() function solves the minimum compression problem for a suffix of the string.

The first optimization checks whether the remaining substring can be entirely deleted. If so, the compressed length becomes zero immediately.

Inside the recursion, we first consider deleting the current character. This consumes one deletion budget and moves to the next position.

Next, we try forming a run starting at the current index. As we expand the run:

Matching characters increase same_count
Non-matching characters are treated as deleted

If deletions exceed the available budget, we stop expanding.

For every valid run endpoint, we compute:

compressed_length_of_current_run + optimal_solution_for_remaining_suffix

Memoization ensures that repeated subproblems are solved only once.

Go Solution

package main

import "math"

func getLengthOfOptimalCompression(s string, k int) int {
	n := len(s)

	encodedLength := func(count int) int {
		if count == 1 {
			return 1
		}
		if count < 10 {
			return 2
		}
		if count < 100 {
			return 3
		}
		return 4
	}

	memo := make(map[[2]int]int)

	var dp func(int, int) int

	dp = func(index int, remainingK int) int {
		if remainingK < 0 {
			return math.MaxInt32
		}

		if index >= n || n-index <= remainingK {
			return 0
		}

		key := [2]int{index, remainingK}

		if value, exists := memo[key]; exists {
			return value
		}

		result := dp(index+1, remainingK-1)

		sameCount := 0
		deletedCount := 0

		for j := index; j < n; j++ {
			if s[j] == s[index] {
				sameCount++
			} else {
				deletedCount++
			}

			if deletedCount > remainingK {
				break
			}

			currentLength := encodedLength(sameCount)

			candidate := currentLength + dp(j+1, remainingK-deletedCount)

			if candidate < result {
				result = candidate
			}
		}

		memo[key] = result

		return result
	}

	return dp(0, k)
}

The Go implementation mirrors the Python solution closely.

Instead of Python's lru_cache, Go uses a map with a two-element array key:

map[[2]int]int

This stores memoized DP states.

Go does not have built-in infinity values for integers, so math.MaxInt32 is used as a large sentinel value.

Strings in Go are indexed as bytes, which is safe here because the input contains only lowercase English letters.

Worked Examples

Example 1

s = "aaabcccd"
k = 2

Goal:

minimum compressed length

Initial call:

dp(0, 2)

We explore runs starting at index 0.

j	Character	sameCount	deletedCount	Encoded	Recursive Call
0	a	1	0	1	dp(1,2)
1	a	2	0	2	dp(2,2)
2	a	3	0	2	dp(3,2)
3	b	3	1	2	dp(4,1)
4	c	3	2	2	dp(5,0)

The optimal strategy deletes:

'b' and 'd'

Remaining string:

"aaaccc"

Compressed form:

"a3c3"

Length:

Example 2

s = "aabbaa"
k = 2

Optimal deletion:

delete both 'b'

Remaining string:

"aaaa"

Compressed:

"a4"

Length:

Key insight:

Deleting characters between identical runs can merge them into one larger run.

Example 3

s = "aaaaaaaaaaa"
k = 0

No deletions allowed.

Run length:

Compressed:

"a11"

Length:

The algorithm simply builds the single run and computes its encoded size.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n² * k)	Each DP state scans forward through the string
Space	O(n * k)	Memoization table stores states

There are at most n * k DP states.

For each state, we may scan up to n characters forward while building a run.

Therefore, the total complexity becomes:

O(n² * k)

Since n <= 100, this is efficient enough.

The memoization table stores one result per (index, remaining_k) pair, requiring:

O(n * k)

space.

Test Cases

solution = Solution()

# Provided examples
assert solution.getLengthOfOptimalCompression("aaabcccd", 2) == 4  # delete b and d
assert solution.getLengthOfOptimalCompression("aabbaa", 2) == 2    # merge into a4
assert solution.getLengthOfOptimalCompression("aaaaaaaaaaa", 0) == 3  # a11

# Single character
assert solution.getLengthOfOptimalCompression("a", 0) == 1  # no compression suffix

# Delete entire string
assert solution.getLengthOfOptimalCompression("abc", 3) == 0  # everything removed

# No deletions
assert solution.getLengthOfOptimalCompression("abcdef", 0) == 6  # all unique chars

# Large repeated block
assert solution.getLengthOfOptimalCompression("aaaaaaaaaa", 0) == 3  # a10

# Compression threshold crossing
assert solution.getLengthOfOptimalCompression("aaaaaaaaa", 0) == 2  # a9
assert solution.getLengthOfOptimalCompression("aaaaaaaaaa", 1) == 2  # delete one -> a9

# Merge separated runs
assert solution.getLengthOfOptimalCompression("aaabbaaa", 2) == 2  # remove bb -> a6

# Alternating characters
assert solution.getLengthOfOptimalCompression("abababab", 2) == 5  # difficult merging

# All same characters with deletions
assert solution.getLengthOfOptimalCompression("aaaaaaaaaa", 5) == 2  # a5

# Boundary size
assert solution.getLengthOfOptimalCompression("a" * 100, 0) == 4  # a100

Test	Why
`"aaabcccd", 2`	Standard mixed example
`"aabbaa", 2`	Tests merging separated runs
`"aaaaaaaaaaa", 0`	Multi-digit count handling
`"a", 0`	Smallest valid input
`"abc", 3`	Entire string deletion
`"abcdef", 0`	No compression benefit
`"aaaaaaaaaa", 0`	Threshold from 9 to 10
`"aaaaaaaaaa", 1`	Strategic deletion reduces encoded size
`"aaabbaaa", 2`	Merge runs after deletions
`"abababab", 2`	Alternating characters
`"aaaaaaaaaa", 5`	Large deletion budget
`"a" * 100, 0`	Maximum count boundary

Edge Cases

One important edge case occurs when the deletion budget is large enough to remove the entire remaining substring. A naive implementation might continue recursing unnecessarily or even produce negative lengths. This implementation explicitly checks:

if n - index <= remaining_k:
    return 0

This guarantees correct handling and significantly improves efficiency.

Another tricky edge case involves count thresholds such as 1, 9, and 99. The compressed length only increases when the number of digits changes. For example:

"a9" has length 2
"a10" has length 3

An incorrect implementation might assume every additional character increases the encoded length. The helper function correctly handles these transitions.

A third important edge case involves merging separated runs by deleting intermediate characters. For example:

"aabbaa"

Deleting the two 'b' characters merges two separate 'aa' groups into one 'aaaa' group. A greedy approach that only considers local compression would miss this optimization. The dynamic programming solution explores all possible run expansions and therefore correctly identifies such merges.