LeetCode 471 - Encode String with Shortest Length

Here’s a fully detailed technical solution guide following your requested format for LeetCode 471 - Encode String with Shortest Length.

LeetCode Problem 471

Difficulty: 🔴 Hard
Topics: String, Dynamic Programming

Solution

Here’s a fully detailed technical solution guide following your requested format for LeetCode 471 - Encode String with Shortest Length.

Problem Understanding

The problem asks us to compress a string s into its shortest possible encoded form using the rule k[encoded_string], where encoded_string is repeated exactly k times. The encoded form should only be used if it reduces the string length; otherwise, the original substring is retained. The function must return any shortest encoding if multiple encodings exist.

The input is a string s of lowercase English letters, with length ranging from 1 to 150. This length is relatively small, so an algorithm with cubic or near-cubic complexity might be acceptable. The output is a string representing the encoded version of s.

Edge cases to consider include:

  1. Strings with no repeating patterns, e.g., "abcdef", which should remain unchanged.
  2. Strings that are a single character repeated, e.g., "aaaa", which compress fully to 4[a].
  3. Strings with partially repeating patterns, e.g., "aabbaabbaab", where only some substrings are compressed.

The problem guarantees that s is non-empty and consists of lowercase letters, so we do not need to handle empty strings or non-alphabet characters.

Approaches

Brute Force Approach

A naive approach is to try every possible substring split and attempt to encode it, then recursively encode the left and right parts. For every substring, we also check if it can be represented in the k[...] form by finding repeating patterns. Although this guarantees correctness, it is extremely slow because it involves generating all substring combinations and checking repeated patterns recursively. The time complexity can easily become exponential due to overlapping recursive calls.

Optimal Approach

The key insight is that this problem can be efficiently solved with dynamic programming. If we define dp[i][j] as the shortest encoded form of s[i:j+1], we can build solutions for longer substrings using smaller substrings.

For each substring s[i:j], we consider two strategies:

  1. Splitting it at every possible index k and combining dp[i][k] and dp[k+1][j].
  2. Checking if the substring itself can be represented as k[pattern], by finding a repeating unit pattern using string concatenation tricks.

By filling the DP table for increasing substring lengths, we ensure that all subproblems are solved before they are needed.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(n^2) Recursively tries all splits and repeated encodings
Optimal O(n^4) O(n^3) DP over all substrings and pattern matching, practical for n ≤ 150

Algorithm Walkthrough

  1. Initialize a 2D DP table dp where dp[i][j] stores the shortest encoded form of substring s[i:j+1].
  2. Fill dp[i][i] for all i with the character s[i], as a single character cannot be compressed.
  3. Iterate over all substring lengths length from 2 to n.
  4. For each starting index i, calculate ending index j = i + length - 1.
  5. Initialize dp[i][j] as the substring s[i:j+1] itself.
  6. Split the substring at every index k from i to j-1 and update dp[i][j] as the minimum between its current value and dp[i][k] + dp[k+1][j].
  7. Check if the substring s[i:j+1] can be represented as k[pattern]. To find pattern, use (s[i:j+1] + s[i:j+1]).find(s[i:j+1], 1) trick to detect repeating units.
  8. If a valid repeated pattern exists, encode it as repeat_count[pattern] and update dp[i][j] if it produces a shorter string.
  9. Return dp[0][n-1] as the final shortest encoded string.

Why it works: Each dp[i][j] contains the optimal encoding for its substring. By solving smaller substrings first and combining them, we guarantee the global optimum. Checking repeated patterns ensures we exploit compression opportunities.

Python Solution

class Solution:
    def encode(self, s: str) -> str:
        n = len(s)
        dp = [["" for _ in range(n)] for _ in range(n)]
        
        for length in range(1, n+1):
            for i in range(n-length+1):
                j = i + length - 1
                substr = s[i:j+1]
                dp[i][j] = substr
                
                # check all splits
                for k in range(i, j):
                    left = dp[i][k]
                    right = dp[k+1][j]
                    if len(left) + len(right) < len(dp[i][j]):
                        dp[i][j] = left + right
                
                # check repeated pattern
                for l in range(1, length):
                    if length % l == 0:
                        times = length // l
                        pattern = substr[:l]
                        if pattern * times == substr:
                            encoded = f"{times}[{dp[i][i+l-1]}]"
                            if len(encoded) < len(dp[i][j]):
                                dp[i][j] = encoded
        
        return dp[0][n-1]

Implementation Explanation: The code initializes a DP table and fills it by substring lengths. For each substring, it considers all splits and repeated patterns to find the minimal encoding. The repeated pattern check ensures that nested compressions are handled by referring to previously computed dp[i][i+l-1].

Go Solution

import (
    "strconv"
    "strings"
)

func encode(s string) string {
    n := len(s)
    dp := make([][]string, n)
    for i := range dp {
        dp[i] = make([]string, n)
    }
    
    for length := 1; length <= n; length++ {
        for i := 0; i <= n-length; i++ {
            j := i + length - 1
            substr := s[i : j+1]
            dp[i][j] = substr
            
            // check splits
            for k := i; k < j; k++ {
                left := dp[i][k]
                right := dp[k+1][j]
                if len(left)+len(right) < len(dp[i][j]) {
                    dp[i][j] = left + right
                }
            }
            
            // check repeated pattern
            for l := 1; l < length; l++ {
                if length%l == 0 {
                    times := length / l
                    pattern := substr[:l]
                    repeated := strings.Repeat(pattern, times)
                    if repeated == substr {
                        encoded := strconv.Itoa(times) + "[" + dp[i][i+l-1] + "]"
                        if len(encoded) < len(dp[i][j]) {
                            dp[i][j] = encoded
                        }
                    }
                }
            }
        }
    }
    
    return dp[0][n-1]
}

Go-specific notes: We use slices to implement the DP table. The strings.Repeat function handles repeated patterns, and strconv.Itoa converts integers to strings. Indexing and slicing are carefully handled to avoid off-by-one errors.

Worked Examples

Example: "aaaaa"

Step through the DP table:

i j substring splits repeated pattern dp[i][j]
0 0 a - - a
0 1 aa aa 2[a] 2[a]
0 2 aaa 2[a]+a 3[a] 3[a]
0 4 aaaaa 3[a]+aa 5[a] 5[a]

The final answer is "5[a]".

Example: "aaa"

The substring "aaa" cannot be compressed further, so dp[0][2] = "aaa".

Example: "aaaaaaaaaa"

The substring "aaaaaaaaaa" has multiple compression options, but the shortest encoding is "10[a]".

Complexity Analysis

Measure Complexity Explanation
Time O(n^4) We iterate over all substring lengths O(n^2) and for each substring check all splits and possible repeated patterns O(n^2)
Space O(n^3) DP table stores substrings, each of length up to n

The algorithm is feasible for n <= 150 because n^4 operations is roughly 500 million in the worst case, which is acceptable with short-circuit optimizations.

Test Cases

# provided examples
assert Solution().encode("aaa") == "aaa"  # cannot compress
assert Solution().encode("aaaaa") == "5[a]"  # simple compression
assert Solution().encode("aaaaaaaaaa") == "10[a]"  # multiple compressions possible

# edge and boundary cases
assert Solution().encode("