LeetCode 471 - Encode String with Shortest Length
Here’s a fully detailed technical solution guide following your requested format for LeetCode 471 - Encode String with Shortest Length.
Difficulty: 🔴 Hard
Topics: String, Dynamic Programming
Solution
Here’s a fully detailed technical solution guide following your requested format for LeetCode 471 - Encode String with Shortest Length.
Problem Understanding
The problem asks us to compress a string s into its shortest possible encoded form using the rule k[encoded_string], where encoded_string is repeated exactly k times. The encoded form should only be used if it reduces the string length; otherwise, the original substring is retained. The function must return any shortest encoding if multiple encodings exist.
The input is a string s of lowercase English letters, with length ranging from 1 to 150. This length is relatively small, so an algorithm with cubic or near-cubic complexity might be acceptable. The output is a string representing the encoded version of s.
Edge cases to consider include:
- Strings with no repeating patterns, e.g.,
"abcdef", which should remain unchanged. - Strings that are a single character repeated, e.g.,
"aaaa", which compress fully to4[a]. - Strings with partially repeating patterns, e.g.,
"aabbaabbaab", where only some substrings are compressed.
The problem guarantees that s is non-empty and consists of lowercase letters, so we do not need to handle empty strings or non-alphabet characters.
Approaches
Brute Force Approach
A naive approach is to try every possible substring split and attempt to encode it, then recursively encode the left and right parts. For every substring, we also check if it can be represented in the k[...] form by finding repeating patterns. Although this guarantees correctness, it is extremely slow because it involves generating all substring combinations and checking repeated patterns recursively. The time complexity can easily become exponential due to overlapping recursive calls.
Optimal Approach
The key insight is that this problem can be efficiently solved with dynamic programming. If we define dp[i][j] as the shortest encoded form of s[i:j+1], we can build solutions for longer substrings using smaller substrings.
For each substring s[i:j], we consider two strategies:
- Splitting it at every possible index
kand combiningdp[i][k]anddp[k+1][j]. - Checking if the substring itself can be represented as
k[pattern], by finding a repeating unit pattern using string concatenation tricks.
By filling the DP table for increasing substring lengths, we ensure that all subproblems are solved before they are needed.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(n^2) | Recursively tries all splits and repeated encodings |
| Optimal | O(n^4) | O(n^3) | DP over all substrings and pattern matching, practical for n ≤ 150 |
Algorithm Walkthrough
- Initialize a 2D DP table
dpwheredp[i][j]stores the shortest encoded form of substrings[i:j+1]. - Fill
dp[i][i]for alliwith the characters[i], as a single character cannot be compressed. - Iterate over all substring lengths
lengthfrom 2 ton. - For each starting index
i, calculate ending indexj = i + length - 1. - Initialize
dp[i][j]as the substrings[i:j+1]itself. - Split the substring at every index
kfromitoj-1and updatedp[i][j]as the minimum between its current value anddp[i][k] + dp[k+1][j]. - Check if the substring
s[i:j+1]can be represented ask[pattern]. To findpattern, use(s[i:j+1] + s[i:j+1]).find(s[i:j+1], 1)trick to detect repeating units. - If a valid repeated pattern exists, encode it as
repeat_count[pattern]and updatedp[i][j]if it produces a shorter string. - Return
dp[0][n-1]as the final shortest encoded string.
Why it works: Each dp[i][j] contains the optimal encoding for its substring. By solving smaller substrings first and combining them, we guarantee the global optimum. Checking repeated patterns ensures we exploit compression opportunities.
Python Solution
class Solution:
def encode(self, s: str) -> str:
n = len(s)
dp = [["" for _ in range(n)] for _ in range(n)]
for length in range(1, n+1):
for i in range(n-length+1):
j = i + length - 1
substr = s[i:j+1]
dp[i][j] = substr
# check all splits
for k in range(i, j):
left = dp[i][k]
right = dp[k+1][j]
if len(left) + len(right) < len(dp[i][j]):
dp[i][j] = left + right
# check repeated pattern
for l in range(1, length):
if length % l == 0:
times = length // l
pattern = substr[:l]
if pattern * times == substr:
encoded = f"{times}[{dp[i][i+l-1]}]"
if len(encoded) < len(dp[i][j]):
dp[i][j] = encoded
return dp[0][n-1]
Implementation Explanation: The code initializes a DP table and fills it by substring lengths. For each substring, it considers all splits and repeated patterns to find the minimal encoding. The repeated pattern check ensures that nested compressions are handled by referring to previously computed dp[i][i+l-1].
Go Solution
import (
"strconv"
"strings"
)
func encode(s string) string {
n := len(s)
dp := make([][]string, n)
for i := range dp {
dp[i] = make([]string, n)
}
for length := 1; length <= n; length++ {
for i := 0; i <= n-length; i++ {
j := i + length - 1
substr := s[i : j+1]
dp[i][j] = substr
// check splits
for k := i; k < j; k++ {
left := dp[i][k]
right := dp[k+1][j]
if len(left)+len(right) < len(dp[i][j]) {
dp[i][j] = left + right
}
}
// check repeated pattern
for l := 1; l < length; l++ {
if length%l == 0 {
times := length / l
pattern := substr[:l]
repeated := strings.Repeat(pattern, times)
if repeated == substr {
encoded := strconv.Itoa(times) + "[" + dp[i][i+l-1] + "]"
if len(encoded) < len(dp[i][j]) {
dp[i][j] = encoded
}
}
}
}
}
}
return dp[0][n-1]
}
Go-specific notes: We use slices to implement the DP table. The strings.Repeat function handles repeated patterns, and strconv.Itoa converts integers to strings. Indexing and slicing are carefully handled to avoid off-by-one errors.
Worked Examples
Example: "aaaaa"
Step through the DP table:
| i | j | substring | splits | repeated pattern | dp[i][j] |
|---|---|---|---|---|---|
| 0 | 0 | a | - | - | a |
| 0 | 1 | aa | aa | 2[a] | 2[a] |
| 0 | 2 | aaa | 2[a]+a | 3[a] | 3[a] |
| 0 | 4 | aaaaa | 3[a]+aa | 5[a] | 5[a] |
The final answer is "5[a]".
Example: "aaa"
The substring "aaa" cannot be compressed further, so dp[0][2] = "aaa".
Example: "aaaaaaaaaa"
The substring "aaaaaaaaaa" has multiple compression options, but the shortest encoding is "10[a]".
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n^4) | We iterate over all substring lengths O(n^2) and for each substring check all splits and possible repeated patterns O(n^2) |
| Space | O(n^3) | DP table stores substrings, each of length up to n |
The algorithm is feasible for n <= 150 because n^4 operations is roughly 500 million in the worst case, which is acceptable with short-circuit optimizations.
Test Cases
# provided examples
assert Solution().encode("aaa") == "aaa" # cannot compress
assert Solution().encode("aaaaa") == "5[a]" # simple compression
assert Solution().encode("aaaaaaaaaa") == "10[a]" # multiple compressions possible
# edge and boundary cases
assert Solution().encode("