LeetCode 443 - String Compression
The problem asks us to perform in place string compression on an array of characters. The input is not a string object, but a mutable array named chars, where each element is a single character. The compression rule is based on groups of consecutive repeated characters.
Difficulty: 🟡 Medium
Topics: Two Pointers, String
Solution
Problem Understanding
The problem asks us to perform in place string compression on an array of characters. The input is not a string object, but a mutable array named chars, where each element is a single character.
The compression rule is based on groups of consecutive repeated characters. For every contiguous block of identical characters:
- If the character appears exactly once, we keep only the character itself.
- If the character appears multiple times, we write the character followed by the decimal representation of its frequency.
For example, the sequence:
["a","a","a","b","c","c"]
becomes:
["a","3","b","c","2"]
The important detail is that the compressed result must be written back into the same input array. We are not allowed to allocate another array proportional to the input size. The function should return only the new valid length of the compressed array.
Anything beyond the returned length is irrelevant and may contain leftover data.
The constraints are relatively small, with at most 2000 characters, so time complexity is not extremely restrictive. However, the constant extra space requirement is the real challenge. We must modify the array in place while still correctly handling multi digit counts such as "12" or "123".
A few important observations and edge cases emerge immediately:
- A single character should remain unchanged.
- Runs of length 1 should not have a count appended.
- Counts greater than 9 must be split into multiple characters.
- The array may already be incompressible, such as alternating characters.
- The entire array could consist of one repeated character.
- Since compression must happen in place, careless overwriting could destroy unread data.
The problem naturally suggests a two pointer strategy where one pointer reads the original data and another pointer writes the compressed form.
Approaches
Brute Force Approach
A straightforward solution is to scan through the array, build the compressed result in a separate string or list, and then copy the compressed content back into the original array.
The algorithm works like this:
- Traverse the array from left to right.
- Count consecutive identical characters.
- Append the character to a temporary result.
- If the count is greater than one, append the string form of the count.
- After processing all groups, overwrite the beginning of
charswith the temporary result.
This approach is easy to implement and clearly correct because it directly follows the compression rules. However, it violates the constant extra space requirement because the temporary structure may grow proportional to the input size.
Optimal In Place Two Pointer Approach
The key insight is that we can process the array group by group while writing compressed output directly back into the same array.
We use:
- A read pointer to scan the input
- A write pointer to place compressed characters
For each group of repeated characters:
- Write the character once
- If the count is greater than one, write each digit of the count separately
Because the write position never moves ahead of unread data in a problematic way, we can safely overwrite the array in place.
This satisfies both the correctness requirement and the constant extra space constraint.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Uses an additional buffer to build the compressed result |
| Optimal | O(n) | O(1) | Uses two pointers and writes directly into the input array |
Algorithm Walkthrough
- Initialize two pointers,
read_indexandwrite_index.
The read_index scans the original array to identify groups of repeated characters. The write_index tracks where the compressed output should be written.
2. Start processing a new character group.
At every iteration, store the current character and begin counting how many consecutive times it appears. 3. Advance the read pointer while characters match.
Continue moving read_index forward as long as the current character remains the same. This determines the size of the group.
4. Write the character into the compressed section.
Regardless of group size, the character itself always appears once in the compressed output. 5. If the group length is greater than one, write the count digit by digit.
Convert the count to a string and write each digit separately into the array.
This is important because counts such as 12 must become:
['1', '2']
rather than a single integer value. 6. Continue until the entire array is processed.
Once all groups have been compressed, the write_index represents the length of the valid compressed portion.
7. Return the write pointer.
The problem only cares about the compressed prefix of the array.
Why it works
The algorithm maintains an important invariant:
- Everything before
write_indexis already correctly compressed. - Everything at or after
read_indexhas not yet been processed.
Each group is read completely before any compressed data for that group is written. This guarantees we never lose unread information. Since every character is processed exactly once, the algorithm correctly constructs the compressed representation in place.
Python Solution
from typing import List
class Solution:
def compress(self, chars: List[str]) -> int:
read_index = 0
write_index = 0
n = len(chars)
while read_index < n:
current_char = chars[read_index]
group_start = read_index
while (
read_index < n
and chars[read_index] == current_char
):
read_index += 1
group_length = read_index - group_start
chars[write_index] = current_char
write_index += 1
if group_length > 1:
for digit in str(group_length):
chars[write_index] = digit
write_index += 1
return write_index
The implementation follows the exact two pointer strategy described earlier.
The variable read_index scans through the array and identifies contiguous groups of identical characters. For every group, we store the current character and remember where the group started.
The inner while loop advances until the character changes or the array ends. The difference between the new read_index and group_start gives the group size.
Next, we write the character into the compressed section using write_index.
If the group length exceeds one, we convert the count to a string and write each digit individually. This correctly handles multi digit counts such as "12" or "100".
Finally, the function returns write_index, which represents the compressed length.
Go Solution
package main
import "strconv"
func compress(chars []byte) int {
readIndex := 0
writeIndex := 0
n := len(chars)
for readIndex < n {
currentChar := chars[readIndex]
groupStart := readIndex
for readIndex < n && chars[readIndex] == currentChar {
readIndex++
}
groupLength := readIndex - groupStart
chars[writeIndex] = currentChar
writeIndex++
if groupLength > 1 {
countString := strconv.Itoa(groupLength)
for i := 0; i < len(countString); i++ {
chars[writeIndex] = countString[i]
writeIndex++
}
}
}
return writeIndex
}
The Go implementation is nearly identical in logic to the Python version.
One important difference is that Go uses []byte rather than a list of strings. Characters are stored as bytes, so digits from the count string can be written directly into the array.
The strconv.Itoa function converts the integer count into its decimal string representation.
No special handling for nil slices is required because the constraints guarantee at least one character.
Worked Examples
Example 1
Input:
["a","a","b","b","c","c","c"]
Step by Step Trace
| Step | Current Group | Count | Array State | write_index |
|---|---|---|---|---|
| Start | - | - | ["a","a","b","b","c","c","c"] |
0 |
Process "aa" |
a | 2 | ["a","2", ...] |
2 |
Process "bb" |
b | 2 | ["a","2","b","2", ...] |
4 |
Process "ccc" |
c | 3 | ["a","2","b","2","c","3", ...] |
6 |
Final compressed array prefix:
["a","2","b","2","c","3"]
Returned length:
6
Example 2
Input:
["a"]
Step by Step Trace
| Step | Current Group | Count | Array State | write_index |
|---|---|---|---|---|
Process "a" |
a | 1 | ["a"] |
1 |
Since the count is one, no number is written.
Returned length:
1
Example 3
Input:
["a","b","b","b","b","b","b","b","b","b","b","b","b"]
Step by Step Trace
| Step | Current Group | Count | Array State | write_index |
|---|---|---|---|---|
Process "a" |
a | 1 | ["a", ...] |
1 |
Process "bbbbbbbbbbbb" |
b | 12 | ["a","b","1","2", ...] |
4 |
Final compressed prefix:
["a","b","1","2"]
Returned length:
4
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is visited once by the read pointer |
| Space | O(1) | Compression is performed directly inside the input array |
The algorithm is linear because every character belongs to exactly one group and is processed once during scanning. Writing the compressed representation is also linear overall because each count digit is written only once.
The space complexity is constant because no auxiliary data structure proportional to input size is used. Only a few variables are maintained regardless of input length.
Test Cases
from typing import List
class Solution:
def compress(self, chars: List[str]) -> int:
read_index = 0
write_index = 0
n = len(chars)
while read_index < n:
current_char = chars[read_index]
group_start = read_index
while (
read_index < n
and chars[read_index] == current_char
):
read_index += 1
group_length = read_index - group_start
chars[write_index] = current_char
write_index += 1
if group_length > 1:
for digit in str(group_length):
chars[write_index] = digit
write_index += 1
return write_index
solution = Solution()
# Example 1
chars = ["a", "a", "b", "b", "c", "c", "c"]
length = solution.compress(chars)
assert length == 6 # standard compression
assert chars[:length] == ["a", "2", "b", "2", "c", "3"]
# Example 2
chars = ["a"]
length = solution.compress(chars)
assert length == 1 # single character remains unchanged
assert chars[:length] == ["a"]
# Example 3
chars = ["a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"]
length = solution.compress(chars)
assert length == 4 # multi digit frequency
assert chars[:length] == ["a", "b", "1", "2"]
# All unique characters
chars = ["a", "b", "c", "d"]
length = solution.compress(chars)
assert length == 4 # no compression possible
assert chars[:length] == ["a", "b", "c", "d"]
# Entire array same character
chars = ["z"] * 9
length = solution.compress(chars)
assert length == 2 # z9
assert chars[:length] == ["z", "9"]
# Count larger than 9
chars = ["x"] * 15
length = solution.compress(chars)
assert length == 3 # x15
assert chars[:length] == ["x", "1", "5"]
# Mixed groups
chars = ["a", "a", "a", "b", "c", "c", "d", "d", "d", "d"]
length = solution.compress(chars)
assert length == 7
assert chars[:length] == ["a", "3", "b", "c", "2", "d", "4"]
# Digits as characters
chars = ["1", "1", "1", "2"]
length = solution.compress(chars)
assert length == 3
assert chars[:length] == ["1", "3", "2"]
# Symbols
chars = ["!", "!", "@", "@", "@"]
length = solution.compress(chars)
assert length == 4
assert chars[:length] == ["!", "2", "@", "3"]
| Test | Why |
|---|---|
["a","a","b","b","c","c","c"] |
Standard multi group compression |
["a"] |
Smallest valid input |
Long run of "b" characters |
Verifies multi digit counts |
| All unique characters | Ensures no unnecessary counts are added |
| Entire array identical | Tests single large group |
| Count greater than 9 | Verifies digit splitting |
| Mixed group sizes | Tests transitions between compressed and uncompressed groups |
| Digits as input characters | Ensures digits are treated as normal characters |
| Symbol characters | Confirms non alphabetic characters work correctly |
Edge Cases
Single Character Input
A one element array such as:
["a"]
is easy to mishandle if the implementation assumes every group has multiple characters. The correct behavior is to keep the character unchanged and return length 1.
The implementation handles this naturally because the group count becomes 1, and counts are only written when the frequency exceeds one.
Multi Digit Frequencies
Inputs like:
["a","a","a","a","a","a","a","a","a","a","a","a"]
produce a count of 12.
A common mistake is writing the integer directly into the array rather than splitting it into separate characters. The correct compressed form is:
["a","1","2"]
The implementation solves this by converting the count to a string and writing each digit individually.
Alternating Characters
An input such as:
["a","b","c","d"]
does not compress at all.
A buggy implementation might incorrectly append "1" after every character, producing:
["a","1","b","1","c","1","d","1"]
The algorithm avoids this by writing counts only when the group length is greater than one.
In Place Overwriting
Because the array must be modified directly, overwriting unread data is a potential source of bugs.
The two pointer structure guarantees safety because the algorithm always finishes reading an entire group before writing its compressed representation. This preserves all necessary information throughout execution.