LeetCode 3167 - Better Compression of String
The problem gives us a compressed string where every character is immediately followed by its frequency. For example, the string "a3b2" represents the original expanded string "aaabb". However, the input compression is not guaranteed to be optimal.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Sorting, Counting
Solution
Problem Understanding
The problem gives us a compressed string where every character is immediately followed by its frequency. For example, the string "a3b2" represents the original expanded string "aaabb".
However, the input compression is not guaranteed to be optimal. The same character may appear multiple times in different parts of the compressed string. For example:
"a3c9b2c1"
contains the character 'c' twice, once with frequency 9 and once with frequency 1.
The task is to produce a better compression with two strict rules:
- Every character appears exactly once.
- Characters must appear in alphabetical order.
This means we must:
- Parse the compressed input correctly.
- Accumulate the total frequency for each character.
- Output the characters sorted from
'a'to'z'.
For the previous example:
"a3c9b2c1"
the total count of 'c' becomes:
9 + 1 = 10
so the answer is:
"a3b2c10"
An important detail is that frequencies may contain multiple digits. For example:
"a12"
means character 'a' appears 12 times. A naive parser that assumes frequencies are single digits would fail.
The constraints are also important:
compressed.lengthcan be as large as6 * 10^4- Frequencies can be as large as
10^4
These constraints strongly suggest that we should process the string in a single pass rather than repeatedly reconstructing expanded strings.
The problem guarantees that:
- The input format is always valid.
- Every character is followed by a valid positive integer.
- Frequencies have no leading zeroes.
This simplifies parsing because we never need to validate malformed input.
Several edge cases are important:
- A character may appear many times throughout the string.
- Frequencies may contain multiple digits.
- The input may already be optimally compressed.
- The input may contain only one distinct character.
- Characters may appear in completely unsorted order.
A correct solution must handle all of these cases efficiently.
Approaches
Brute Force Approach
A straightforward but inefficient approach is to fully decompress the string first.
For example:
"a3b2"
would become:
"aaabb"
After constructing the expanded string, we could count character frequencies using a hash map, sort the characters alphabetically, and rebuild the compressed result.
This approach is correct because reconstructing the original string preserves all frequency information exactly. Once expanded, counting occurrences is trivial.
However, this method is inefficient because the decompressed string can become extremely large. Consider an input like:
"a10000b10000c10000"
The expanded string would contain 30,000 characters. In larger theoretical scenarios, decompression wastes both time and memory because we only need frequency totals, not the actual expanded sequence.
The problem can be solved without ever constructing the original string.
Optimal Approach
The key observation is that we never need the decompressed string itself.
We only care about:
- which character appears,
- and the total frequency of that character.
Therefore, we can parse the compressed string directly.
The algorithm works as follows:
- Read one character.
- Read all consecutive digits after it to form the full frequency.
- Add that frequency to a hash map entry for the character.
- Continue until the entire string is processed.
- Finally, iterate from
'a'to'z'and build the output string for characters that appeared.
This avoids unnecessary decompression and processes the input in linear time.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N + K) | O(K) | Expands the entire string, where K is decompressed size |
| Optimal | O(N) | O(1) | Parses frequencies directly without decompression |
Here:
Nis the length of the compressed input string.Kis the size of the decompressed string, which may be much larger thanN.
Algorithm Walkthrough
Step 1: Create a Frequency Map
We maintain a hash map that stores the total frequency for each character.
For example:
frequency['c'] = 10
means the character 'c' appears 10 times overall.
A hash map is ideal because:
- updates are constant time,
- repeated characters are easy to accumulate,
- and there are only 26 lowercase letters.
Step 2: Traverse the String
We iterate through the compressed string using an index i.
At every step:
compressed[i]is guaranteed to be a character.- The following positions contain digits representing the frequency.
Step 3: Read the Character
We store the current character:
char = compressed[i]
Then move to the next position.
Step 4: Parse the Full Frequency
Frequencies may contain multiple digits.
For example:
"a123"
must be interpreted as frequency 123, not separate digits.
We keep reading digits until we reach another character or the end of the string.
Example:
num = 0
while i < n and compressed[i].isdigit():
num = num * 10 + int(compressed[i])
i += 1
This standard parsing pattern converts consecutive digits into an integer.
Step 5: Accumulate the Frequency
Once the full number is parsed:
frequency[char] += num
This correctly combines repeated occurrences of the same character.
Step 6: Build the Result
After processing the entire input:
- iterate through
'a'to'z', - append characters that exist in the map,
- append their total frequency.
This guarantees alphabetical ordering.
Step 7: Return the Final String
Join all pieces together into the final compressed result.
Why it works
The algorithm processes every character-frequency pair exactly once. Each parsed frequency is added to the correct character total, so the hash map always contains the cumulative frequency of every letter seen so far.
Because the final output iterates alphabetically from 'a' to 'z', the result automatically satisfies the sorting requirement. Since each character is emitted exactly once with its total accumulated count, the produced compression is both correct and optimal.
Python Solution
class Solution:
def betterCompression(self, compressed: str) -> str:
frequency = {}
n = len(compressed)
i = 0
while i < n:
char = compressed[i]
i += 1
count = 0
while i < n and compressed[i].isdigit():
count = count * 10 + int(compressed[i])
i += 1
frequency[char] = frequency.get(char, 0) + count
result = []
for ascii_code in range(ord('a'), ord('z') + 1):
char = chr(ascii_code)
if char in frequency:
result.append(char)
result.append(str(frequency[char]))
return "".join(result)
The implementation follows the algorithm directly.
The frequency dictionary stores cumulative counts for every character. The index i walks through the compressed string exactly once.
For every iteration:
- we read the current character,
- then parse the full numeric frequency using a digit-reading loop.
The expression:
count = count * 10 + int(compressed[i])
builds multi-digit numbers correctly.
After parsing a complete frequency, we update the dictionary entry for that character.
Finally, we iterate alphabetically from 'a' to 'z' and append existing characters and their counts to the result list. Using a list and "".join() is efficient because repeated string concatenation in Python can be costly.
Go Solution
func betterCompression(compressed string) string {
frequency := make(map[byte]int)
n := len(compressed)
i := 0
for i < n {
char := compressed[i]
i++
count := 0
for i < n && compressed[i] >= '0' && compressed[i] <= '9' {
count = count*10 + int(compressed[i]-'0')
i++
}
frequency[char] += count
}
result := make([]byte, 0)
for char := byte('a'); char <= byte('z'); char++ {
if count, exists := frequency[char]; exists {
result = append(result, char...)
result = append(result, []byte(fmt.Sprintf("%d", count))...)
}
}
return string(result)
}
The Go solution uses the same logic as the Python version but adapts it to Go idioms.
The frequency table is implemented with:
map[byte]int
since characters are lowercase English letters.
Digit parsing is done manually using ASCII arithmetic:
compressed[i] - '0'
Unlike Python, Go strings are immutable byte sequences, so we build the result using a byte slice for efficiency.
One important Go-specific detail is converting integers to strings using fmt.Sprintf. Another valid option would be strconv.Itoa.
A corrected version of the append line is:
result = append(result, char)
Here is the fully corrected solution:
import (
"fmt"
)
func betterCompression(compressed string) string {
frequency := make(map[byte]int)
n := len(compressed)
i := 0
for i < n {
char := compressed[i]
i++
count := 0
for i < n && compressed[i] >= '0' && compressed[i] <= '9' {
count = count*10 + int(compressed[i]-'0')
i++
}
frequency[char] += count
}
result := make([]byte, 0)
for char := byte('a'); char <= byte('z'); char++ {
if count, exists := frequency[char]; exists {
result = append(result, char)
result = append(result, []byte(fmt.Sprintf("%d", count))...)
}
}
return string(result)
}
Worked Examples
Example 1
Input:
"a3c9b2c1"
Parsing Phase
| Step | Character | Parsed Frequency | Frequency Map |
|---|---|---|---|
| 1 | a | 3 | {a: 3} |
| 2 | c | 9 | {a: 3, c: 9} |
| 3 | b | 2 | {a: 3, c: 9, b: 2} |
| 4 | c | 1 | {a: 3, c: 10, b: 2} |
Construction Phase
We iterate alphabetically:
| Character | Exists? | Output So Far |
|---|---|---|
| a | Yes | a3 |
| b | Yes | a3b2 |
| c | Yes | a3b2c10 |
Final answer:
"a3b2c10"
Example 2
Input:
"c2b3a1"
Parsing Phase
| Step | Character | Parsed Frequency | Frequency Map |
|---|---|---|---|
| 1 | c | 2 | {c: 2} |
| 2 | b | 3 | {c: 2, b: 3} |
| 3 | a | 1 | {c: 2, b: 3, a: 1} |
Construction Phase
Alphabetical iteration produces:
a1b3c2
Final answer:
"a1b3c2"
Example 3
Input:
"a2b4c1"
Parsing Phase
| Step | Character | Parsed Frequency | Frequency Map |
|---|---|---|---|
| 1 | a | 2 | {a: 2} |
| 2 | b | 4 | {a: 2, b: 4} |
| 3 | c | 1 | {a: 2, b: 4, c: 1} |
The input is already optimally compressed.
Final answer:
"a2b4c1"
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | Each character in the input string is processed once |
| Space | O(1) | At most 26 lowercase letters are stored |
The algorithm performs a single linear scan through the compressed string. Every index is visited once while parsing characters and digits.
Although we use a hash map, the number of possible keys is bounded by 26 lowercase English letters, so the auxiliary space is constant.
Test Cases
solution = Solution()
# Provided examples
assert solution.betterCompression("a3c9b2c1") == "a3b2c10" # repeated character
assert solution.betterCompression("c2b3a1") == "a1b3c2" # sorting required
assert solution.betterCompression("a2b4c1") == "a2b4c1" # already optimal
# Single character
assert solution.betterCompression("z5") == "z5" # smallest valid structure
# Multi-digit frequency
assert solution.betterCompression("a12") == "a12" # multi-digit parsing
# Multiple repeated appearances
assert solution.betterCompression("a1a2a3") == "a6" # cumulative frequency
# Unordered characters
assert solution.betterCompression("z1y2x3") == "x3y2z1" # alphabetical output
# Large frequencies
assert solution.betterCompression("a9999a1") == "a10000" # large accumulation
# All letters
assert solution.betterCompression(
"z1y1x1w1v1u1t1s1r1q1p1o1n1m1l1k1j1i1h1g1f1e1d1c1b1a1"
) == "a1b1c1d1e1f1g1h1i1j1k1l1m1n1o1p1q1r1s1t1u1v1w1x1y1z1"
# Mixed repeated and unique characters
assert solution.betterCompression("b10a5b5c2a5") == "a10b15c2"
| Test | Why |
|---|---|
"a3c9b2c1" |
Validates repeated character merging |
"c2b3a1" |
Validates alphabetical sorting |
"a2b4c1" |
Validates already optimal input |
"z5" |
Smallest non-trivial case |
"a12" |
Validates multi-digit parsing |
"a1a2a3" |
Validates cumulative addition |
"z1y2x3" |
Validates reverse-order handling |
"a9999a1" |
Validates large frequency totals |
| all letters reversed | Validates full alphabet ordering |
"b10a5b5c2a5" |
Validates multiple repeated groups |
Edge Cases
Multi-Digit Frequencies
One of the most common mistakes is assuming frequencies are single digits.
For example:
"a123"
should produce:
"a123"
not:
"a1"
with leftover digits incorrectly interpreted.
The implementation avoids this bug by continuously reading digits until a non-digit character is reached.
Repeated Characters in Multiple Locations
A character may appear many times throughout the input:
"a1b2a3b4"
A naive implementation might overwrite previous values instead of accumulating them.
The solution correctly handles this using:
frequency[char] += count
which preserves all occurrences.
Already Sorted and Optimized Input
Some inputs already satisfy all conditions:
"a2b3c4"
The algorithm should not alter the meaning or structure unnecessarily.
Because the implementation simply recomputes totals and outputs characters alphabetically, already-optimal inputs naturally remain unchanged.