LeetCode 271 - Encode and Decode Strings
The problem asks us to design a reversible encoding system for a list of strings. We need two functions: - encode, which converts a list of strings into a single string - decode, which reconstructs the original list from that encoded string The important requirement is that…
Difficulty: 🟡 Medium
Topics: Array, String, Design
Solution
Problem Understanding
The problem asks us to design a reversible encoding system for a list of strings. We need two functions:
encode, which converts a list of strings into a single stringdecode, which reconstructs the original list from that encoded string
The important requirement is that the transformation must be lossless. After encoding and then decoding, we must recover the exact original list, including empty strings and strings containing special characters.
The input is a list of strings where:
- The number of strings ranges from 1 to 200
- Each string can have length from 0 to 200
- Strings may contain any of the 256 ASCII characters
The last constraint is extremely important because it means we cannot safely assume that any character is reserved. For example, a string may contain commas, pipes, colons, hashes, spaces, or even newline characters. Any naive delimiter-based approach can therefore fail.
The expected output from decode is the exact original list. The ordering must remain unchanged, and every character must be preserved exactly.
A major challenge is ambiguity during decoding. Suppose we simply join strings with a delimiter like "#":
["ab", "c#d"] -> "ab#c#d"
When decoding, we cannot determine whether this represents:
["ab", "c", "d"]
or
["ab", "c#d"]
The constraints are small enough that an O(total_characters) solution is more than sufficient, but correctness is the real challenge.
Several edge cases are especially important:
- Empty strings such as
[""] - Multiple empty strings such as
["", "", ""] - Strings containing delimiter-like characters
- Strings containing digits
- Very long strings
- Lists containing only one string
A correct solution must handle all of these without ambiguity.
Approaches
Brute Force Approach
A naive solution is to concatenate all strings using a special separator character.
For example:
["hello", "world"] -> "hello#world"
Decoding would split the encoded string on "#".
This works only if the separator never appears inside the original strings. Unfortunately, the problem explicitly states that strings may contain any ASCII character, so no separator is guaranteed to be safe.
One possible brute-force improvement is escaping special characters before concatenation. For example, every "#" inside a string could become "##". While this can work, it becomes messy and error-prone because we must carefully escape and unescape every reserved character.
The brute-force approach is therefore unreliable and unnecessarily complicated.
Optimal Approach
The key insight is that instead of relying on delimiters between strings, we should explicitly store the length of each string before the string itself.
The encoding format becomes:
<length>#<string>
For example:
["Hello", "World"]
becomes:
5#Hello5#World
Now decoding is unambiguous:
- Read characters until
"#"to determine the length - Convert that length to an integer
- Read exactly that many characters as the next string
- Repeat until the encoded string is exhausted
This works because the decoder always knows exactly how many characters belong to the next string, regardless of the contents of the string itself.
Even if the string contains "#" or digits, decoding remains correct because the parser uses the length prefix rather than searching for separators inside the string data.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Uses delimiters, fails when strings contain separator characters |
| Optimal | O(n) | O(n) | Uses length-prefixed encoding, fully unambiguous |
Here, n represents the total number of characters across all strings.
Algorithm Walkthrough
Encoding
- Initialize an empty result string.
We will progressively append encoded representations of each string. 2. For every string in the input list, compute its length.
The length tells the decoder exactly how many characters belong to this string.
3. Append the length, followed by a separator character such as "#".
Example:
"Hello" -> "5#"
The separator is safe here because it only separates the numeric length from the string payload itself. 4. Append the original string immediately after the separator.
Example:
"5#" + "Hello" -> "5#Hello"
- Repeat this process for all strings.
- Return the final concatenated encoded string.
Decoding
- Initialize an empty result list and a pointer index starting at 0.
- Read characters from the current position until reaching
"#".
These characters represent the length of the next string.
3. Convert the extracted length substring into an integer.
4. Move the pointer past the separator character.
5. Read exactly length characters starting from the current position.
This slice is the original string.
6. Append the extracted string to the result list.
7. Advance the pointer by length.
8. Repeat until the pointer reaches the end of the encoded string.
9. Return the reconstructed list.
Why it works
The algorithm works because every encoded string contains explicit boundary information. The decoder never guesses where a string ends. Instead, it reads the stored length and extracts exactly that many characters. This guarantees that every string can be reconstructed correctly, even if it contains special characters, digits, separators, or empty content.
Python Solution
from typing import List
class Codec:
def encode(self, strs: List[str]) -> str:
"""Encodes a list of strings to a single string."""
encoded_parts = []
for string in strs:
encoded_parts.append(f"{len(string)}#{string}")
return "".join(encoded_parts)
def decode(self, s: str) -> List[str]:
"""Decodes a single string to a list of strings."""
decoded_strings = []
index = 0
while index < len(s):
separator_index = index
while s[separator_index] != "#":
separator_index += 1
string_length = int(s[index:separator_index])
start_of_string = separator_index + 1
end_of_string = start_of_string + string_length
decoded_strings.append(s[start_of_string:end_of_string])
index = end_of_string
return decoded_strings
# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))
The encode method iterates through every string and constructs a length-prefixed representation. The format is:
length#content
All encoded pieces are stored in a list and joined at the end for efficiency.
The decode method uses a pointer-based parsing strategy. It first scans until "#" to extract the numeric length. Once the length is known, it slices exactly that many characters from the encoded string.
This approach avoids ambiguity entirely because decoding depends on explicit lengths rather than separators embedded in the content.
Go Solution
package main
import (
"strconv"
"strings"
)
type Codec struct {
}
// Encodes a list of strings to a single string.
func (codec *Codec) Encode(strs []string) string {
var builder strings.Builder
for _, str := range strs {
builder.WriteString(strconv.Itoa(len(str)))
builder.WriteString("#")
builder.WriteString(str)
}
return builder.String()
}
// Decodes a single string to a list of strings.
func (codec *Codec) Decode(s string) []string {
var result []string
index := 0
for index < len(s) {
separatorIndex := index
for s[separatorIndex] != '#' {
separatorIndex++
}
length, _ := strconv.Atoi(s[index:separatorIndex])
startOfString := separatorIndex + 1
endOfString := startOfString + length
result = append(result, s[startOfString:endOfString])
index = endOfString
}
return result
}
// Your Codec object will be instantiated and called as such:
// var codec Codec
// codec.Decode(codec.Encode(strs));
The Go solution follows the same logic as the Python implementation, but uses strings.Builder for efficient string concatenation. Since Go strings are immutable, repeatedly concatenating with + would create unnecessary allocations.
The decoder uses index arithmetic exactly like the Python version. Go slices make extracting substrings straightforward.
No special handling is required for empty strings because a length of 0 naturally works with the same logic.
Worked Examples
Example 1
Input:
["Hello", "World"]
Encoding Process
| String | Length | Encoded Piece |
|---|---|---|
| Hello | 5 | 5#Hello |
| World | 5 | 5#World |
Final encoded string:
5#Hello5#World
Decoding Process
| Step | Index | Length Parsed | Extracted String | Result |
|---|---|---|---|---|
| 1 | 0 | 5 | Hello | ["Hello"] |
| 2 | 7 | 5 | World | ["Hello", "World"] |
Final decoded output:
["Hello", "World"]
Example 2
Input:
[""]
Encoding Process
| String | Length | Encoded Piece |
|---|---|---|
| "" | 0 | 0# |
Final encoded string:
0#
Decoding Process
| Step | Index | Length Parsed | Extracted String | Result |
|---|---|---|---|---|
| 1 | 0 | 0 | "" | [""] |
Final decoded output:
[""]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Every character is processed a constant number of times |
| Space | O(n) | The encoded string and decoded output both scale with input size |
Here, n represents the total number of characters across all input strings.
The encoder visits every character once while constructing the output. The decoder also processes each character once while parsing lengths and extracting substrings. Since no nested rescanning occurs, the total runtime is linear.
The space complexity is linear because the encoded representation and decoded result both require storage proportional to the input size.
Test Cases
from typing import List
class Codec:
def encode(self, strs: List[str]) -> str:
encoded_parts = []
for string in strs:
encoded_parts.append(f"{len(string)}#{string}")
return "".join(encoded_parts)
def decode(self, s: str) -> List[str]:
decoded_strings = []
index = 0
while index < len(s):
separator_index = index
while s[separator_index] != "#":
separator_index += 1
string_length = int(s[index:separator_index])
start = separator_index + 1
end = start + string_length
decoded_strings.append(s[start:end])
index = end
return decoded_strings
codec = Codec()
# Basic example
assert codec.decode(codec.encode(["Hello", "World"])) == ["Hello", "World"]
# Single empty string
assert codec.decode(codec.encode([""])) == [""]
# Multiple empty strings
assert codec.decode(codec.encode(["", "", ""])) == ["", "", ""]
# Strings containing separator characters
assert codec.decode(codec.encode(["a#b", "###", "#"])) == ["a#b", "###", "#"]
# Strings containing digits
assert codec.decode(codec.encode(["123", "45#67"])) == ["123", "45#67"]
# Mixed empty and non-empty strings
assert codec.decode(codec.encode(["abc", "", "def"])) == ["abc", "", "def"]
# Long strings
assert codec.decode(codec.encode(["a" * 200])) == ["a" * 200]
# Strings with spaces and punctuation
assert codec.decode(codec.encode(["hello world", "!@#$%^&*()"])) == [
"hello world",
"!@#$%^&*()"
]
# Single character strings
assert codec.decode(codec.encode(["a", "b", "c"])) == ["a", "b", "c"]
# Complex mixed case
assert codec.decode(codec.encode([
"leet",
"code",
"",
"123#456",
"###",
"end"
])) == [
"leet",
"code",
"",
"123#456",
"###",
"end"
]
Test Case Summary
| Test | Why |
|---|---|
["Hello", "World"] |
Validates normal functionality |
[""] |
Tests empty string handling |
["", "", ""] |
Tests multiple consecutive empty strings |
["a#b", "###"] |
Ensures separators inside strings do not break decoding |
["123", "45#67"] |
Verifies digits are handled correctly |
["abc", "", "def"] |
Tests mixed empty and non-empty values |
["a" * 200] |
Validates maximum string length |
["hello world", "!@#$%^&*()"] |
Tests punctuation and whitespace |
["a", "b", "c"] |
Tests many small strings |
| Complex mixed case | Stress test for combined edge conditions |
Edge Cases
Empty Strings
An empty string is tricky because it contributes no characters to the encoded payload. A naive delimiter approach could lose track of whether an empty value existed at all.
This implementation handles empty strings naturally by encoding them as:
0#
During decoding, the parsed length is 0, so the decoder extracts zero characters and correctly reconstructs the empty string.
Strings Containing Separator Characters
A major source of bugs is strings containing the delimiter character itself. For example:
["abc#def"]
A delimiter-based solution would incorrectly split this string.
The length-prefixed approach avoids this entirely because decoding never searches for delimiters inside the payload. Once the length is known, the decoder reads exactly that many characters regardless of their content.
Consecutive Empty and Non-Empty Strings
Cases like:
["", "abc", "", "def"]
can expose pointer-management bugs during decoding. Incorrect index updates may skip strings or create off-by-one errors.
This implementation updates the pointer using exact substring boundaries:
index = end_of_string
As a result, every encoded segment is processed precisely once, ensuring correct reconstruction even with consecutive empty entries.