LeetCode 443 - String Compression

The problem asks us to perform in place string compression on an array of characters. The input is not a string object, but a mutable array named chars, where each element is a single character. The compression rule is based on groups of consecutive repeated characters.

LeetCode Problem 443

Difficulty: 🟡 Medium
Topics: Two Pointers, String

Solution

Problem Understanding

The problem asks us to perform in place string compression on an array of characters. The input is not a string object, but a mutable array named chars, where each element is a single character.

The compression rule is based on groups of consecutive repeated characters. For every contiguous block of identical characters:

  • If the character appears exactly once, we keep only the character itself.
  • If the character appears multiple times, we write the character followed by the decimal representation of its frequency.

For example, the sequence:

["a","a","a","b","c","c"]

becomes:

["a","3","b","c","2"]

The important detail is that the compressed result must be written back into the same input array. We are not allowed to allocate another array proportional to the input size. The function should return only the new valid length of the compressed array.

Anything beyond the returned length is irrelevant and may contain leftover data.

The constraints are relatively small, with at most 2000 characters, so time complexity is not extremely restrictive. However, the constant extra space requirement is the real challenge. We must modify the array in place while still correctly handling multi digit counts such as "12" or "123".

A few important observations and edge cases emerge immediately:

  • A single character should remain unchanged.
  • Runs of length 1 should not have a count appended.
  • Counts greater than 9 must be split into multiple characters.
  • The array may already be incompressible, such as alternating characters.
  • The entire array could consist of one repeated character.
  • Since compression must happen in place, careless overwriting could destroy unread data.

The problem naturally suggests a two pointer strategy where one pointer reads the original data and another pointer writes the compressed form.

Approaches

Brute Force Approach

A straightforward solution is to scan through the array, build the compressed result in a separate string or list, and then copy the compressed content back into the original array.

The algorithm works like this:

  • Traverse the array from left to right.
  • Count consecutive identical characters.
  • Append the character to a temporary result.
  • If the count is greater than one, append the string form of the count.
  • After processing all groups, overwrite the beginning of chars with the temporary result.

This approach is easy to implement and clearly correct because it directly follows the compression rules. However, it violates the constant extra space requirement because the temporary structure may grow proportional to the input size.

Optimal In Place Two Pointer Approach

The key insight is that we can process the array group by group while writing compressed output directly back into the same array.

We use:

  • A read pointer to scan the input
  • A write pointer to place compressed characters

For each group of repeated characters:

  • Write the character once
  • If the count is greater than one, write each digit of the count separately

Because the write position never moves ahead of unread data in a problematic way, we can safely overwrite the array in place.

This satisfies both the correctness requirement and the constant extra space constraint.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) O(n) Uses an additional buffer to build the compressed result
Optimal O(n) O(1) Uses two pointers and writes directly into the input array

Algorithm Walkthrough

  1. Initialize two pointers, read_index and write_index.

The read_index scans the original array to identify groups of repeated characters. The write_index tracks where the compressed output should be written. 2. Start processing a new character group.

At every iteration, store the current character and begin counting how many consecutive times it appears. 3. Advance the read pointer while characters match.

Continue moving read_index forward as long as the current character remains the same. This determines the size of the group. 4. Write the character into the compressed section.

Regardless of group size, the character itself always appears once in the compressed output. 5. If the group length is greater than one, write the count digit by digit.

Convert the count to a string and write each digit separately into the array.

This is important because counts such as 12 must become:

['1', '2']

rather than a single integer value. 6. Continue until the entire array is processed.

Once all groups have been compressed, the write_index represents the length of the valid compressed portion. 7. Return the write pointer.

The problem only cares about the compressed prefix of the array.

Why it works

The algorithm maintains an important invariant:

  • Everything before write_index is already correctly compressed.
  • Everything at or after read_index has not yet been processed.

Each group is read completely before any compressed data for that group is written. This guarantees we never lose unread information. Since every character is processed exactly once, the algorithm correctly constructs the compressed representation in place.

Python Solution

from typing import List

class Solution:
    def compress(self, chars: List[str]) -> int:
        read_index = 0
        write_index = 0
        n = len(chars)

        while read_index < n:
            current_char = chars[read_index]
            group_start = read_index

            while (
                read_index < n
                and chars[read_index] == current_char
            ):
                read_index += 1

            group_length = read_index - group_start

            chars[write_index] = current_char
            write_index += 1

            if group_length > 1:
                for digit in str(group_length):
                    chars[write_index] = digit
                    write_index += 1

        return write_index

The implementation follows the exact two pointer strategy described earlier.

The variable read_index scans through the array and identifies contiguous groups of identical characters. For every group, we store the current character and remember where the group started.

The inner while loop advances until the character changes or the array ends. The difference between the new read_index and group_start gives the group size.

Next, we write the character into the compressed section using write_index.

If the group length exceeds one, we convert the count to a string and write each digit individually. This correctly handles multi digit counts such as "12" or "100".

Finally, the function returns write_index, which represents the compressed length.

Go Solution

package main

import "strconv"

func compress(chars []byte) int {
	readIndex := 0
	writeIndex := 0
	n := len(chars)

	for readIndex < n {
		currentChar := chars[readIndex]
		groupStart := readIndex

		for readIndex < n && chars[readIndex] == currentChar {
			readIndex++
		}

		groupLength := readIndex - groupStart

		chars[writeIndex] = currentChar
		writeIndex++

		if groupLength > 1 {
			countString := strconv.Itoa(groupLength)

			for i := 0; i < len(countString); i++ {
				chars[writeIndex] = countString[i]
				writeIndex++
			}
		}
	}

	return writeIndex
}

The Go implementation is nearly identical in logic to the Python version.

One important difference is that Go uses []byte rather than a list of strings. Characters are stored as bytes, so digits from the count string can be written directly into the array.

The strconv.Itoa function converts the integer count into its decimal string representation.

No special handling for nil slices is required because the constraints guarantee at least one character.

Worked Examples

Example 1

Input:

["a","a","b","b","c","c","c"]

Step by Step Trace

Step Current Group Count Array State write_index
Start - - ["a","a","b","b","c","c","c"] 0
Process "aa" a 2 ["a","2", ...] 2
Process "bb" b 2 ["a","2","b","2", ...] 4
Process "ccc" c 3 ["a","2","b","2","c","3", ...] 6

Final compressed array prefix:

["a","2","b","2","c","3"]

Returned length:

6

Example 2

Input:

["a"]

Step by Step Trace

Step Current Group Count Array State write_index
Process "a" a 1 ["a"] 1

Since the count is one, no number is written.

Returned length:

1

Example 3

Input:

["a","b","b","b","b","b","b","b","b","b","b","b","b"]

Step by Step Trace

Step Current Group Count Array State write_index
Process "a" a 1 ["a", ...] 1
Process "bbbbbbbbbbbb" b 12 ["a","b","1","2", ...] 4

Final compressed prefix:

["a","b","1","2"]

Returned length:

4

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is visited once by the read pointer
Space O(1) Compression is performed directly inside the input array

The algorithm is linear because every character belongs to exactly one group and is processed once during scanning. Writing the compressed representation is also linear overall because each count digit is written only once.

The space complexity is constant because no auxiliary data structure proportional to input size is used. Only a few variables are maintained regardless of input length.

Test Cases

from typing import List

class Solution:
    def compress(self, chars: List[str]) -> int:
        read_index = 0
        write_index = 0
        n = len(chars)

        while read_index < n:
            current_char = chars[read_index]
            group_start = read_index

            while (
                read_index < n
                and chars[read_index] == current_char
            ):
                read_index += 1

            group_length = read_index - group_start

            chars[write_index] = current_char
            write_index += 1

            if group_length > 1:
                for digit in str(group_length):
                    chars[write_index] = digit
                    write_index += 1

        return write_index

solution = Solution()

# Example 1
chars = ["a", "a", "b", "b", "c", "c", "c"]
length = solution.compress(chars)
assert length == 6  # standard compression
assert chars[:length] == ["a", "2", "b", "2", "c", "3"]

# Example 2
chars = ["a"]
length = solution.compress(chars)
assert length == 1  # single character remains unchanged
assert chars[:length] == ["a"]

# Example 3
chars = ["a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"]
length = solution.compress(chars)
assert length == 4  # multi digit frequency
assert chars[:length] == ["a", "b", "1", "2"]

# All unique characters
chars = ["a", "b", "c", "d"]
length = solution.compress(chars)
assert length == 4  # no compression possible
assert chars[:length] == ["a", "b", "c", "d"]

# Entire array same character
chars = ["z"] * 9
length = solution.compress(chars)
assert length == 2  # z9
assert chars[:length] == ["z", "9"]

# Count larger than 9
chars = ["x"] * 15
length = solution.compress(chars)
assert length == 3  # x15
assert chars[:length] == ["x", "1", "5"]

# Mixed groups
chars = ["a", "a", "a", "b", "c", "c", "d", "d", "d", "d"]
length = solution.compress(chars)
assert length == 7
assert chars[:length] == ["a", "3", "b", "c", "2", "d", "4"]

# Digits as characters
chars = ["1", "1", "1", "2"]
length = solution.compress(chars)
assert length == 3
assert chars[:length] == ["1", "3", "2"]

# Symbols
chars = ["!", "!", "@", "@", "@"] 
length = solution.compress(chars)
assert length == 4
assert chars[:length] == ["!", "2", "@", "3"]
Test Why
["a","a","b","b","c","c","c"] Standard multi group compression
["a"] Smallest valid input
Long run of "b" characters Verifies multi digit counts
All unique characters Ensures no unnecessary counts are added
Entire array identical Tests single large group
Count greater than 9 Verifies digit splitting
Mixed group sizes Tests transitions between compressed and uncompressed groups
Digits as input characters Ensures digits are treated as normal characters
Symbol characters Confirms non alphabetic characters work correctly

Edge Cases

Single Character Input

A one element array such as:

["a"]

is easy to mishandle if the implementation assumes every group has multiple characters. The correct behavior is to keep the character unchanged and return length 1.

The implementation handles this naturally because the group count becomes 1, and counts are only written when the frequency exceeds one.

Multi Digit Frequencies

Inputs like:

["a","a","a","a","a","a","a","a","a","a","a","a"]

produce a count of 12.

A common mistake is writing the integer directly into the array rather than splitting it into separate characters. The correct compressed form is:

["a","1","2"]

The implementation solves this by converting the count to a string and writing each digit individually.

Alternating Characters

An input such as:

["a","b","c","d"]

does not compress at all.

A buggy implementation might incorrectly append "1" after every character, producing:

["a","1","b","1","c","1","d","1"]

The algorithm avoids this by writing counts only when the group length is greater than one.

In Place Overwriting

Because the array must be modified directly, overwriting unread data is a potential source of bugs.

The two pointer structure guarantees safety because the algorithm always finishes reading an entire group before writing its compressed representation. This preserves all necessary information throughout execution.