LeetCode 271 - Encode and Decode Strings

Difficulty: 🟡 Medium
Topics: Array, String, Design

Solution

Problem Understanding

The problem asks us to design a reversible encoding system for a list of strings. We need two functions:

encode, which converts a list of strings into a single string
decode, which reconstructs the original list from that encoded string

The important requirement is that the transformation must be lossless. After encoding and then decoding, we must recover the exact original list, including empty strings and strings containing special characters.

The input is a list of strings where:

The number of strings ranges from 1 to 200
Each string can have length from 0 to 200
Strings may contain any of the 256 ASCII characters

The last constraint is extremely important because it means we cannot safely assume that any character is reserved. For example, a string may contain commas, pipes, colons, hashes, spaces, or even newline characters. Any naive delimiter-based approach can therefore fail.

The expected output from decode is the exact original list. The ordering must remain unchanged, and every character must be preserved exactly.

A major challenge is ambiguity during decoding. Suppose we simply join strings with a delimiter like "#":

["ab", "c#d"] -> "ab#c#d"

When decoding, we cannot determine whether this represents:

["ab", "c", "d"]

["ab", "c#d"]

The constraints are small enough that an O(total_characters) solution is more than sufficient, but correctness is the real challenge.

Several edge cases are especially important:

Empty strings such as [""]
Multiple empty strings such as ["", "", ""]
Strings containing delimiter-like characters
Strings containing digits
Very long strings
Lists containing only one string

A correct solution must handle all of these without ambiguity.

Approaches

Brute Force Approach

A naive solution is to concatenate all strings using a special separator character.

For example:

["hello", "world"] -> "hello#world"

Decoding would split the encoded string on "#".

This works only if the separator never appears inside the original strings. Unfortunately, the problem explicitly states that strings may contain any ASCII character, so no separator is guaranteed to be safe.

One possible brute-force improvement is escaping special characters before concatenation. For example, every "#" inside a string could become "##". While this can work, it becomes messy and error-prone because we must carefully escape and unescape every reserved character.

The brute-force approach is therefore unreliable and unnecessarily complicated.

Optimal Approach

The key insight is that instead of relying on delimiters between strings, we should explicitly store the length of each string before the string itself.

The encoding format becomes:

<length>#<string>

For example:

["Hello", "World"]

becomes:

5#Hello5#World

Now decoding is unambiguous:

Read characters until "#" to determine the length
Convert that length to an integer
Read exactly that many characters as the next string
Repeat until the encoded string is exhausted

This works because the decoder always knows exactly how many characters belong to the next string, regardless of the contents of the string itself.

Even if the string contains "#" or digits, decoding remains correct because the parser uses the length prefix rather than searching for separators inside the string data.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n)	O(n)	Uses delimiters, fails when strings contain separator characters
Optimal	O(n)	O(n)	Uses length-prefixed encoding, fully unambiguous

Here, n represents the total number of characters across all strings.

Algorithm Walkthrough

Encoding

Initialize an empty result string.

We will progressively append encoded representations of each string. 2. For every string in the input list, compute its length.

The length tells the decoder exactly how many characters belong to this string. 3. Append the length, followed by a separator character such as "#".

Example:

"Hello" -> "5#"

The separator is safe here because it only separates the numeric length from the string payload itself. 4. Append the original string immediately after the separator.

Example:

"5#" + "Hello" -> "5#Hello"

Repeat this process for all strings.
Return the final concatenated encoded string.

Decoding

Initialize an empty result list and a pointer index starting at 0.
Read characters from the current position until reaching "#".

These characters represent the length of the next string. 3. Convert the extracted length substring into an integer. 4. Move the pointer past the separator character. 5. Read exactly length characters starting from the current position.

This slice is the original string. 6. Append the extracted string to the result list. 7. Advance the pointer by length. 8. Repeat until the pointer reaches the end of the encoded string. 9. Return the reconstructed list.

Why it works

The algorithm works because every encoded string contains explicit boundary information. The decoder never guesses where a string ends. Instead, it reads the stored length and extracts exactly that many characters. This guarantees that every string can be reconstructed correctly, even if it contains special characters, digits, separators, or empty content.

Python Solution

from typing import List

class Codec:
    def encode(self, strs: List[str]) -> str:
        """Encodes a list of strings to a single string."""
        
        encoded_parts = []

        for string in strs:
            encoded_parts.append(f"{len(string)}#{string}")

        return "".join(encoded_parts)

    def decode(self, s: str) -> List[str]:
        """Decodes a single string to a list of strings."""
        
        decoded_strings = []
        index = 0

        while index < len(s):
            separator_index = index

            while s[separator_index] != "#":
                separator_index += 1

            string_length = int(s[index:separator_index])

            start_of_string = separator_index + 1
            end_of_string = start_of_string + string_length

            decoded_strings.append(s[start_of_string:end_of_string])

            index = end_of_string

        return decoded_strings

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))

The encode method iterates through every string and constructs a length-prefixed representation. The format is:

length#content

All encoded pieces are stored in a list and joined at the end for efficiency.

The decode method uses a pointer-based parsing strategy. It first scans until "#" to extract the numeric length. Once the length is known, it slices exactly that many characters from the encoded string.

This approach avoids ambiguity entirely because decoding depends on explicit lengths rather than separators embedded in the content.

Go Solution

package main

import (
	"strconv"
	"strings"
)

type Codec struct {
    
}

// Encodes a list of strings to a single string.
func (codec *Codec) Encode(strs []string) string {
	var builder strings.Builder

	for _, str := range strs {
		builder.WriteString(strconv.Itoa(len(str)))
		builder.WriteString("#")
		builder.WriteString(str)
	}

	return builder.String()
}

// Decodes a single string to a list of strings.
func (codec *Codec) Decode(s string) []string {
	var result []string

	index := 0

	for index < len(s) {
		separatorIndex := index

		for s[separatorIndex] != '#' {
			separatorIndex++
		}

		length, _ := strconv.Atoi(s[index:separatorIndex])

		startOfString := separatorIndex + 1
		endOfString := startOfString + length

		result = append(result, s[startOfString:endOfString])

		index = endOfString
	}

	return result
}

// Your Codec object will be instantiated and called as such:
// var codec Codec
// codec.Decode(codec.Encode(strs));

The Go solution follows the same logic as the Python implementation, but uses strings.Builder for efficient string concatenation. Since Go strings are immutable, repeatedly concatenating with + would create unnecessary allocations.

The decoder uses index arithmetic exactly like the Python version. Go slices make extracting substrings straightforward.

No special handling is required for empty strings because a length of 0 naturally works with the same logic.

Worked Examples

Example 1

Input:

["Hello", "World"]

Encoding Process

String	Length	Encoded Piece
Hello	5	`5#Hello`
World	5	`5#World`

Final encoded string:

5#Hello5#World

Decoding Process

Step	Index	Length Parsed	Extracted String	Result
1	0	5	Hello	`["Hello"]`
2	7	5	World	`["Hello", "World"]`

Final decoded output:

["Hello", "World"]

Example 2

Input:

[""]

Encoding Process

String	Length	Encoded Piece
""	0	`0#`

Final encoded string:

0#

Decoding Process

Step	Index	Length Parsed	Extracted String	Result
1	0	0	""	`[""]`

Final decoded output:

[""]

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Every character is processed a constant number of times
Space	O(n)	The encoded string and decoded output both scale with input size

Here, n represents the total number of characters across all input strings.

The encoder visits every character once while constructing the output. The decoder also processes each character once while parsing lengths and extracting substrings. Since no nested rescanning occurs, the total runtime is linear.

The space complexity is linear because the encoded representation and decoded result both require storage proportional to the input size.

Test Cases

from typing import List

class Codec:
    def encode(self, strs: List[str]) -> str:
        encoded_parts = []

        for string in strs:
            encoded_parts.append(f"{len(string)}#{string}")

        return "".join(encoded_parts)

    def decode(self, s: str) -> List[str]:
        decoded_strings = []
        index = 0

        while index < len(s):
            separator_index = index

            while s[separator_index] != "#":
                separator_index += 1

            string_length = int(s[index:separator_index])

            start = separator_index + 1
            end = start + string_length

            decoded_strings.append(s[start:end])

            index = end

        return decoded_strings

codec = Codec()

# Basic example
assert codec.decode(codec.encode(["Hello", "World"])) == ["Hello", "World"]

# Single empty string
assert codec.decode(codec.encode([""])) == [""]

# Multiple empty strings
assert codec.decode(codec.encode(["", "", ""])) == ["", "", ""]

# Strings containing separator characters
assert codec.decode(codec.encode(["a#b", "###", "#"])) == ["a#b", "###", "#"]

# Strings containing digits
assert codec.decode(codec.encode(["123", "45#67"])) == ["123", "45#67"]

# Mixed empty and non-empty strings
assert codec.decode(codec.encode(["abc", "", "def"])) == ["abc", "", "def"]

# Long strings
assert codec.decode(codec.encode(["a" * 200])) == ["a" * 200]

# Strings with spaces and punctuation
assert codec.decode(codec.encode(["hello world", "!@#$%^&*()"])) == [
    "hello world",
    "!@#$%^&*()"
]

# Single character strings
assert codec.decode(codec.encode(["a", "b", "c"])) == ["a", "b", "c"]

# Complex mixed case
assert codec.decode(codec.encode([
    "leet",
    "code",
    "",
    "123#456",
    "###",
    "end"
])) == [
    "leet",
    "code",
    "",
    "123#456",
    "###",
    "end"
]

Test Case Summary

Test	Why
`["Hello", "World"]`	Validates normal functionality
`[""]`	Tests empty string handling
`["", "", ""]`	Tests multiple consecutive empty strings
`["a#b", "###"]`	Ensures separators inside strings do not break decoding
`["123", "45#67"]`	Verifies digits are handled correctly
`["abc", "", "def"]`	Tests mixed empty and non-empty values
`["a" * 200]`	Validates maximum string length
`["hello world", "!@#$%^&*()"]`	Tests punctuation and whitespace
`["a", "b", "c"]`	Tests many small strings
Complex mixed case	Stress test for combined edge conditions

Edge Cases

Empty Strings

An empty string is tricky because it contributes no characters to the encoded payload. A naive delimiter approach could lose track of whether an empty value existed at all.

This implementation handles empty strings naturally by encoding them as:

0#

During decoding, the parsed length is 0, so the decoder extracts zero characters and correctly reconstructs the empty string.

Strings Containing Separator Characters

A major source of bugs is strings containing the delimiter character itself. For example:

["abc#def"]

A delimiter-based solution would incorrectly split this string.

The length-prefixed approach avoids this entirely because decoding never searches for delimiters inside the payload. Once the length is known, the decoder reads exactly that many characters regardless of their content.

Consecutive Empty and Non-Empty Strings

Cases like:

["", "abc", "", "def"]

can expose pointer-management bugs during decoding. Incorrect index updates may skip strings or create off-by-one errors.

This implementation updates the pointer using exact substring boundaries:

index = end_of_string

As a result, every encoded segment is processed precisely once, ensuring correct reconstruction even with consecutive empty entries.