LeetCode 3163 - String Compression III

The problem asks us to implement a specialized string compression algorithm. Given an input string word, we are required to build a compressed version by repeatedly taking prefixes consisting of consecutive repeating characters, limited to a maximum length of 9, and appending…

LeetCode Problem 3163

Difficulty: 🟡 Medium
Topics: String

Solution

Problem Understanding

The problem asks us to implement a specialized string compression algorithm. Given an input string word, we are required to build a compressed version by repeatedly taking prefixes consisting of consecutive repeating characters, limited to a maximum length of 9, and appending their length followed by the character to the result string. The process continues until the input string is fully processed.

For example, the string "aaaaaaaaaaaaaabb" contains a long sequence of as. Since the prefix length cannot exceed 9, the first 9 as are taken, resulting in "9a", then the next 5 as form "5a", and the final "bb" becomes "2b".

Key points about the input and expected output:

  • The input is a non-empty string with a maximum length of 200,000 characters, which implies we need an algorithm that runs in linear time.
  • The input only contains lowercase English letters, so we do not need to handle digits or other characters.
  • The output compresses consecutive repeated characters but respects the maximum prefix length of 9.

Important edge cases to note:

  • Strings where no character repeats more than once, e.g., "abcde": each character should be represented as "1<character>".
  • Strings with very long sequences of a single character, e.g., "aaaaaaaaaa": the compression should split sequences into chunks of at most 9.
  • The input string can be very long, so any solution must avoid repeated string concatenation that scales poorly. Using a list or buffer for efficient accumulation is critical.

Approaches

The most straightforward way to solve the problem is a brute-force approach: scan the string, find the maximal prefix of the current character, append its length and character to the result, and repeat. This works correctly but may be inefficient if implemented naively using string slicing and repeated concatenation in Python, which could cause O(n^2) performance for very long strings.

The optimal approach is to iterate through the string once using a pointer and a counter to track consecutive characters. Whenever the count reaches 9 or the current character changes, append the current run length and character to the result. This approach is linear in time and uses a list or buffer to efficiently build the output string.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n) Uses repeated slicing and string concatenation; correct but inefficient for long strings
Optimal O(n) O(n) Single-pass iteration with a counter; uses list buffer to build result efficiently

Algorithm Walkthrough

  1. Initialize an empty list result to accumulate compressed segments.
  2. Initialize a counter count to 1 to track consecutive characters and start from the first character of word.
  3. Iterate through the string from the second character to the end.
  4. For each character, check if it is the same as the previous one. If it is, increment count.
  5. If count reaches 9 or the current character is different from the previous one, append the string "{count}{previous_char}" to result and reset count to 1.
  6. After finishing the iteration, append the final run of characters to result.
  7. Join the result list into a single string and return it.

Why it works: The algorithm maintains the invariant that each run of consecutive characters is correctly segmented into chunks of at most 9 characters. The linear scan ensures all characters are processed exactly once, guaranteeing correctness and efficiency.

Python Solution

class Solution:
    def compressedString(self, word: str) -> str:
        if not word:
            return ""
        
        result = []
        count = 1
        
        for i in range(1, len(word)):
            if word[i] == word[i-1] and count < 9:
                count += 1
            else:
                result.append(f"{count}{word[i-1]}")
                count = 1
        
        # Append the last run
        result.append(f"{count}{word[-1]}")
        
        return "".join(result)

The Python implementation uses a list result to efficiently accumulate compressed segments. The count variable tracks the length of the current run, and we append to result whenever the run ends or reaches the maximum allowed length of 9. Finally, join produces the compressed string.

Go Solution

func compressedString(word string) string {
    if len(word) == 0 {
        return ""
    }
    
    result := make([]byte, 0, len(word)*2)
    count := 1
    
    for i := 1; i < len(word); i++ {
        if word[i] == word[i-1] && count < 9 {
            count++
        } else {
            result = append(result, byte('0'+count))
            result = append(result, word[i-1])
            count = 1
        }
    }
    
    result = append(result, byte('0'+count))
    result = append(result, word[len(word)-1])
    
    return string(result)
}

The Go implementation is similar, but uses a byte slice for efficient accumulation. Conversion from integer count to character uses '0'+count. The slice is preallocated to roughly twice the length of the input for efficiency.

Worked Examples

Example 1: word = "abcde"

Step Current Char Count Result
1 a 1 []
2 b 1 (append 1a) ["1a"]
3 c 1 (append 1b) ["1a", "1b"]
4 d 1 (append 1c) ["1a", "1b", "1c"]
5 e 1 (append 1d) ["1a", "1b", "1c", "1d"]
End e append 1e ["1a", "1b", "1c", "1d", "1e"]

Output: "1a1b1c1d1e"

Example 2: word = "aaaaaaaaaaaaaabb"

Step Current Char Count Result
1-9 a 9 []
Append 9a - - ["9a"]
10-14 a 5 ["9a"]
Append 5a - - ["9a", "5a"]
15-16 b 2 ["9a", "5a"]
Append 2b - - ["9a", "5a", "2b"]

Output: "9a5a2b"

Complexity Analysis

Measure Complexity Explanation
Time O(n) Single pass through the string; each character is visited once
Space O(n) The result string can be at most twice the length of the input in worst case

The linear time complexity is achieved because we never perform nested loops, and the space complexity is proportional to the output, which is required.

Test Cases

# Provided examples
assert Solution().compressedString("abcde") == "1a1b1c1d1e"  # No repeated characters
assert Solution().compressedString("aaaaaaaaaaaaaabb") == "9a5a2b"  # Long repeated sequence

# Edge cases
assert Solution().compressedString("a") == "1a"  # Single character
assert Solution().compressedString("aa") == "2a"  # Two same characters
assert Solution().compressedString("aaaaaaaaaa") == "9a1a"  # Sequence exactly 10, split by 9 limit
assert Solution().compressedString("abababab") == "1a1b1a1b1a1b1a1b"  # Alternating characters
assert Solution().compressedString("zzzzzzzzzzzzzz") == "9z5z"  # Long repeated sequence
Test Why
"abcde" Validates handling of non-repeating characters
"aaaaaaaaaaaaaabb" Validates splitting long sequences into 9-length chunks
"a" Validates minimal input
"aa" Validates small repeated sequences
"aaaaaaaaaa" Validates exact split over 9 limit
"abababab" Validates alternating characters
"zzzzzzzzzzzzzz" Validates long repeated sequence handling

Edge Cases

  1. Single Character Input: For word = "a", the result should be "1a". This tests the algorithm’s handling of minimal input without accessing out-of-bounds indices. The implementation correctly appends the final run after iteration.
  2. Sequence Longer Than 9: For word = "aaaaaaaaaa", we must split the sequence into "9a1a". This tests that the maximum prefix rule is correctly enforced. The counter resets after reaching 9, which the implementation handles in the loop.
  3. Alternating Characters: For word = "abababab", no character repeats consecutively beyond 1.