LeetCode 3167 - Better Compression of String

Difficulty: 🟡 Medium
Topics: Hash Table, String, Sorting, Counting

Solution

Problem Understanding

The problem gives us a compressed string where every character is immediately followed by its frequency. For example, the string "a3b2" represents the original expanded string "aaabb".

However, the input compression is not guaranteed to be optimal. The same character may appear multiple times in different parts of the compressed string. For example:

"a3c9b2c1"

contains the character 'c' twice, once with frequency 9 and once with frequency 1.

The task is to produce a better compression with two strict rules:

Every character appears exactly once.
Characters must appear in alphabetical order.

This means we must:

Parse the compressed input correctly.
Accumulate the total frequency for each character.
Output the characters sorted from 'a' to 'z'.

For the previous example:

"a3c9b2c1"

the total count of 'c' becomes:

9 + 1 = 10

so the answer is:

"a3b2c10"

An important detail is that frequencies may contain multiple digits. For example:

"a12"

means character 'a' appears 12 times. A naive parser that assumes frequencies are single digits would fail.

The constraints are also important:

compressed.length can be as large as 6 * 10^4
Frequencies can be as large as 10^4

These constraints strongly suggest that we should process the string in a single pass rather than repeatedly reconstructing expanded strings.

The problem guarantees that:

The input format is always valid.
Every character is followed by a valid positive integer.
Frequencies have no leading zeroes.

This simplifies parsing because we never need to validate malformed input.

Several edge cases are important:

A character may appear many times throughout the string.
Frequencies may contain multiple digits.
The input may already be optimally compressed.
The input may contain only one distinct character.
Characters may appear in completely unsorted order.

A correct solution must handle all of these cases efficiently.

Approaches

Brute Force Approach

A straightforward but inefficient approach is to fully decompress the string first.

For example:

"a3b2"

would become:

"aaabb"

After constructing the expanded string, we could count character frequencies using a hash map, sort the characters alphabetically, and rebuild the compressed result.

This approach is correct because reconstructing the original string preserves all frequency information exactly. Once expanded, counting occurrences is trivial.

However, this method is inefficient because the decompressed string can become extremely large. Consider an input like:

"a10000b10000c10000"

The expanded string would contain 30,000 characters. In larger theoretical scenarios, decompression wastes both time and memory because we only need frequency totals, not the actual expanded sequence.

The problem can be solved without ever constructing the original string.

Optimal Approach

The key observation is that we never need the decompressed string itself.

We only care about:

which character appears,
and the total frequency of that character.

Therefore, we can parse the compressed string directly.

The algorithm works as follows:

Read one character.
Read all consecutive digits after it to form the full frequency.
Add that frequency to a hash map entry for the character.
Continue until the entire string is processed.
Finally, iterate from 'a' to 'z' and build the output string for characters that appeared.

This avoids unnecessary decompression and processes the input in linear time.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(N + K)	O(K)	Expands the entire string, where K is decompressed size
Optimal	O(N)	O(1)	Parses frequencies directly without decompression

Here:

N is the length of the compressed input string.
K is the size of the decompressed string, which may be much larger than N.

Algorithm Walkthrough

Step 1: Create a Frequency Map

We maintain a hash map that stores the total frequency for each character.

For example:

frequency['c'] = 10

means the character 'c' appears 10 times overall.

A hash map is ideal because:

updates are constant time,
repeated characters are easy to accumulate,
and there are only 26 lowercase letters.

Step 2: Traverse the String

We iterate through the compressed string using an index i.

At every step:

compressed[i] is guaranteed to be a character.
The following positions contain digits representing the frequency.

Step 3: Read the Character

We store the current character:

char = compressed[i]

Then move to the next position.

Step 4: Parse the Full Frequency

Frequencies may contain multiple digits.

For example:

"a123"

must be interpreted as frequency 123, not separate digits.

We keep reading digits until we reach another character or the end of the string.

Example:

num = 0
while i < n and compressed[i].isdigit():
    num = num * 10 + int(compressed[i])
    i += 1

This standard parsing pattern converts consecutive digits into an integer.

Step 5: Accumulate the Frequency

Once the full number is parsed:

frequency[char] += num

This correctly combines repeated occurrences of the same character.

Step 6: Build the Result

After processing the entire input:

iterate through 'a' to 'z',
append characters that exist in the map,
append their total frequency.

This guarantees alphabetical ordering.

Step 7: Return the Final String

Join all pieces together into the final compressed result.

Why it works

The algorithm processes every character-frequency pair exactly once. Each parsed frequency is added to the correct character total, so the hash map always contains the cumulative frequency of every letter seen so far.

Because the final output iterates alphabetically from 'a' to 'z', the result automatically satisfies the sorting requirement. Since each character is emitted exactly once with its total accumulated count, the produced compression is both correct and optimal.

Python Solution

class Solution:
    def betterCompression(self, compressed: str) -> str:
        frequency = {}
        n = len(compressed)
        i = 0

        while i < n:
            char = compressed[i]
            i += 1

            count = 0
            while i < n and compressed[i].isdigit():
                count = count * 10 + int(compressed[i])
                i += 1

            frequency[char] = frequency.get(char, 0) + count

        result = []

        for ascii_code in range(ord('a'), ord('z') + 1):
            char = chr(ascii_code)

            if char in frequency:
                result.append(char)
                result.append(str(frequency[char]))

        return "".join(result)

The implementation follows the algorithm directly.

The frequency dictionary stores cumulative counts for every character. The index i walks through the compressed string exactly once.

For every iteration:

we read the current character,
then parse the full numeric frequency using a digit-reading loop.

The expression:

count = count * 10 + int(compressed[i])

builds multi-digit numbers correctly.

After parsing a complete frequency, we update the dictionary entry for that character.

Finally, we iterate alphabetically from 'a' to 'z' and append existing characters and their counts to the result list. Using a list and "".join() is efficient because repeated string concatenation in Python can be costly.

Go Solution

func betterCompression(compressed string) string {
    frequency := make(map[byte]int)
    n := len(compressed)

    i := 0

    for i < n {
        char := compressed[i]
        i++

        count := 0

        for i < n && compressed[i] >= '0' && compressed[i] <= '9' {
            count = count*10 + int(compressed[i]-'0')
            i++
        }

        frequency[char] += count
    }

    result := make([]byte, 0)

    for char := byte('a'); char <= byte('z'); char++ {
        if count, exists := frequency[char]; exists {
            result = append(result, char...)
            result = append(result, []byte(fmt.Sprintf("%d", count))...)
        }
    }

    return string(result)
}

The Go solution uses the same logic as the Python version but adapts it to Go idioms.

The frequency table is implemented with:

map[byte]int

since characters are lowercase English letters.

Digit parsing is done manually using ASCII arithmetic:

compressed[i] - '0'

Unlike Python, Go strings are immutable byte sequences, so we build the result using a byte slice for efficiency.

One important Go-specific detail is converting integers to strings using fmt.Sprintf. Another valid option would be strconv.Itoa.

A corrected version of the append line is:

result = append(result, char)

Here is the fully corrected solution:

import (
    "fmt"
)

func betterCompression(compressed string) string {
    frequency := make(map[byte]int)
    n := len(compressed)

    i := 0

    for i < n {
        char := compressed[i]
        i++

        count := 0

        for i < n && compressed[i] >= '0' && compressed[i] <= '9' {
            count = count*10 + int(compressed[i]-'0')
            i++
        }

        frequency[char] += count
    }

    result := make([]byte, 0)

    for char := byte('a'); char <= byte('z'); char++ {
        if count, exists := frequency[char]; exists {
            result = append(result, char)
            result = append(result, []byte(fmt.Sprintf("%d", count))...)
        }
    }

    return string(result)
}

Worked Examples

Example 1

Input:

"a3c9b2c1"

Parsing Phase

Step	Character	Parsed Frequency	Frequency Map
1	a	3	{a: 3}
2	c	9	{a: 3, c: 9}
3	b	2	{a: 3, c: 9, b: 2}
4	c	1	{a: 3, c: 10, b: 2}

Construction Phase

We iterate alphabetically:

Character	Exists?	Output So Far
a	Yes	a3
b	Yes	a3b2
c	Yes	a3b2c10

Final answer:

"a3b2c10"

Example 2

Input:

"c2b3a1"

Parsing Phase

Step	Character	Parsed Frequency	Frequency Map
1	c	2	{c: 2}
2	b	3	{c: 2, b: 3}
3	a	1	{c: 2, b: 3, a: 1}

Construction Phase

Alphabetical iteration produces:

a1b3c2

Final answer:

"a1b3c2"

Example 3

Input:

"a2b4c1"

Parsing Phase

Step	Character	Parsed Frequency	Frequency Map
1	a	2	{a: 2}
2	b	4	{a: 2, b: 4}
3	c	1	{a: 2, b: 4, c: 1}

The input is already optimally compressed.

Final answer:

"a2b4c1"

Complexity Analysis

Measure	Complexity	Explanation
Time	O(N)	Each character in the input string is processed once
Space	O(1)	At most 26 lowercase letters are stored

The algorithm performs a single linear scan through the compressed string. Every index is visited once while parsing characters and digits.

Although we use a hash map, the number of possible keys is bounded by 26 lowercase English letters, so the auxiliary space is constant.

Test Cases

solution = Solution()

# Provided examples
assert solution.betterCompression("a3c9b2c1") == "a3b2c10"  # repeated character
assert solution.betterCompression("c2b3a1") == "a1b3c2"  # sorting required
assert solution.betterCompression("a2b4c1") == "a2b4c1"  # already optimal

# Single character
assert solution.betterCompression("z5") == "z5"  # smallest valid structure

# Multi-digit frequency
assert solution.betterCompression("a12") == "a12"  # multi-digit parsing

# Multiple repeated appearances
assert solution.betterCompression("a1a2a3") == "a6"  # cumulative frequency

# Unordered characters
assert solution.betterCompression("z1y2x3") == "x3y2z1"  # alphabetical output

# Large frequencies
assert solution.betterCompression("a9999a1") == "a10000"  # large accumulation

# All letters
assert solution.betterCompression(
    "z1y1x1w1v1u1t1s1r1q1p1o1n1m1l1k1j1i1h1g1f1e1d1c1b1a1"
) == "a1b1c1d1e1f1g1h1i1j1k1l1m1n1o1p1q1r1s1t1u1v1w1x1y1z1"

# Mixed repeated and unique characters
assert solution.betterCompression("b10a5b5c2a5") == "a10b15c2"

Test	Why
`"a3c9b2c1"`	Validates repeated character merging
`"c2b3a1"`	Validates alphabetical sorting
`"a2b4c1"`	Validates already optimal input
`"z5"`	Smallest non-trivial case
`"a12"`	Validates multi-digit parsing
`"a1a2a3"`	Validates cumulative addition
`"z1y2x3"`	Validates reverse-order handling
`"a9999a1"`	Validates large frequency totals
all letters reversed	Validates full alphabet ordering
`"b10a5b5c2a5"`	Validates multiple repeated groups

Edge Cases

Multi-Digit Frequencies

One of the most common mistakes is assuming frequencies are single digits.

For example:

"a123"

should produce:

"a123"

not:

"a1"

with leftover digits incorrectly interpreted.

The implementation avoids this bug by continuously reading digits until a non-digit character is reached.

Repeated Characters in Multiple Locations

A character may appear many times throughout the input:

"a1b2a3b4"

A naive implementation might overwrite previous values instead of accumulating them.

The solution correctly handles this using:

frequency[char] += count

which preserves all occurrences.

Already Sorted and Optimized Input

Some inputs already satisfy all conditions:

"a2b3c4"

The algorithm should not alter the meaning or structure unnecessarily.

Because the implementation simply recomputes totals and outputs characters alphabetically, already-optimal inputs naturally remain unchanged.