LeetCode 297 - Serialize and Deserialize Binary Tree

The problem asks us to design two complementary operations for a binary tree: serialization and deserialization. Serialization converts a binary tree into a string representation that can be stored or transmitted.

LeetCode Problem 297

Difficulty: 🔴 Hard
Topics: String, Tree, Depth-First Search, Breadth-First Search, Design, Binary Tree

Solution

Problem Understanding

The problem asks us to design two complementary operations for a binary tree: serialization and deserialization.

Serialization converts a binary tree into a string representation that can be stored or transmitted. Deserialization performs the reverse operation, reconstructing the exact same binary tree structure from the serialized string.

The important detail is that the reconstructed tree must match the original tree exactly, including both node values and structure. Two trees with the same values but different arrangements are considered different trees.

The input to serialize is the root node of a binary tree. The output must be a string encoding of that tree.

The input to deserialize is the string previously produced by serialize. The output must be the reconstructed root node.

The problem does not force us to use LeetCode’s array-like format. Any encoding scheme is valid as long as:

  • Every valid tree can be serialized
  • The tree can later be reconstructed perfectly
  • Serialization and deserialization are inverses of each other

The constraints are important:

  • The tree may contain up to 10^4 nodes
  • Node values range from -1000 to 1000
  • The tree may be empty

These constraints tell us several things:

  • The algorithm must be efficient, ideally linear time
  • Recursive depth may become large for skewed trees
  • The serialization format must preserve null children, otherwise structural information is lost

A major challenge is handling missing children correctly. Consider these two trees:

    1          1
   /            \
  2              2

If we serialize only node values in preorder as 1,2, both trees become identical, which means deserialization becomes impossible. Therefore, null markers are essential.

Several edge cases can easily break naive implementations:

  • An empty tree
  • A completely skewed tree
  • Trees with repeated values
  • Trees containing negative numbers
  • Trees where many children are missing

The problem guarantees the input string for deserialization was generated by the serializer, so we do not need to validate malformed data.

Approaches

Brute Force Approach

A brute-force idea is to serialize the tree level by level into an array-like structure and store every possible position in a complete binary tree layout.

For example, we could assign indices exactly like a heap:

  • Root at index 0
  • Left child at 2i + 1
  • Right child at 2i + 2

We would then store values into a very large array, filling unused positions with null markers.

This approach works because each node position uniquely identifies the structure of the tree. During deserialization, children can be reconstructed from index relationships.

However, this method becomes extremely wasteful for skewed trees. A tree with height h may require space proportional to 2^h, even if only h nodes actually exist.

For example:

1
 \
  2
   \
    3

The array representation grows exponentially because many unused positions appear between nodes.

This makes the brute-force approach impractical for large sparse trees.

Optimal Approach

The key insight is that we do not need positional indexing if we explicitly record null children during traversal.

A preorder traversal naturally captures tree structure:

  1. Visit current node
  2. Serialize left subtree
  3. Serialize right subtree

If we insert a special marker such as "N" whenever a child is missing, the traversal becomes uniquely decodable.

For example:

    1
   / \
  2   3

Preorder serialization becomes:

1,2,N,N,3,N,N

This works because every node contributes:

  • Its value
  • Its entire left subtree
  • Its entire right subtree

During deserialization, we read values in exactly the same order. Whenever we encounter "N", we return None. Otherwise, we create a node and recursively build its left and right children.

This gives us a compact linear representation with linear reconstruction time.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^h) O(2^h) Complete-tree array representation wastes space for sparse trees
Optimal O(n) O(n) Preorder traversal with null markers preserves structure efficiently

Algorithm Walkthrough

Serialization

  1. Start a preorder depth-first traversal from the root.
  2. If the current node is None, append a special marker such as "N" to the serialized output. This is necessary because missing children are part of the tree structure.
  3. Otherwise, append the current node’s value to the output.
  4. Recursively serialize the left subtree.
  5. Recursively serialize the right subtree.
  6. Join all recorded values using commas to produce the final string.

Deserialization

  1. Split the serialized string by commas to recover the traversal sequence.
  2. Maintain a pointer that tracks which token is currently being processed.
  3. Read the next token.
  4. If the token is "N", return None. This reconstructs missing children correctly.
  5. Otherwise, create a new tree node using the integer value.
  6. Recursively construct the left subtree.
  7. Recursively construct the right subtree.
  8. Return the reconstructed node.

Why it works

The algorithm works because preorder traversal defines a unique recursive structure:

Node -> value + left subtree + right subtree

Every null child is explicitly encoded, so no structural information is lost. During deserialization, tokens are consumed in exactly the same order they were produced. Each recursive call reconstructs precisely the subtree that was originally serialized at that position.

This establishes a one-to-one mapping between trees and serialized strings.

Python Solution

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

from typing import Optional

class Codec:

    def serialize(self, root: Optional['TreeNode']) -> str:
        """Encodes a tree to a single string."""

        values = []

        def dfs(node: Optional['TreeNode']) -> None:
            if node is None:
                values.append("N")
                return

            values.append(str(node.val))
            dfs(node.left)
            dfs(node.right)

        dfs(root)

        return ",".join(values)

    def deserialize(self, data: str) -> Optional['TreeNode']:
        """Decodes your encoded data to tree."""

        values = data.split(",")
        index = 0

        def dfs() -> Optional['TreeNode']:
            nonlocal index

            if values[index] == "N":
                index += 1
                return None

            node = TreeNode(int(values[index]))
            index += 1

            node.left = dfs()
            node.right = dfs()

            return node

        return dfs()

# Your Codec object will be instantiated and called as such:
# ser = Codec()
# deser = Codec()
# ans = deser.deserialize(ser.serialize(root))

The implementation follows the preorder traversal strategy directly.

The serialize method uses a helper DFS function. Every real node contributes its value, and every missing child contributes "N". The list values accumulates tokens efficiently, and ",".join() creates the final serialized string.

The deserialize method reconstructs the tree recursively. The index variable acts as a shared pointer into the token array. Each recursive call consumes exactly the tokens needed for one subtree.

When the token is "N", the subtree is empty, so we return None. Otherwise, we create a node and recursively rebuild its left and right subtrees in preorder order.

Because serialization and deserialization use the same traversal structure, reconstruction is exact.

Go Solution

/**
 * Definition for a binary tree node.
 * type TreeNode struct {
 *     Val int
 *     Left *TreeNode
 *     Right *TreeNode
 * }
 */

import (
	"strconv"
	"strings"
)

type Codec struct {
    
}

func Constructor() Codec {
	return Codec{}
}

// Serializes a tree to a single string.
func (this *Codec) serialize(root *TreeNode) string {
	values := []string{}

	var dfs func(node *TreeNode)

	dfs = func(node *TreeNode) {
		if node == nil {
			values = append(values, "N")
			return
		}

		values = append(values, strconv.Itoa(node.Val))

		dfs(node.Left)
		dfs(node.Right)
	}

	dfs(root)

	return strings.Join(values, ",")
}

// Deserializes your encoded data to tree.
func (this *Codec) deserialize(data string) *TreeNode {
	values := strings.Split(data, ",")
	index := 0

	var dfs func() *TreeNode

	dfs = func() *TreeNode {
		if values[index] == "N" {
			index++
			return nil
		}

		val, _ := strconv.Atoi(values[index])
		index++

		node := &TreeNode{Val: val}

		node.Left = dfs()
		node.Right = dfs()

		return node
	}

	return dfs()
}

/**
 * Your Codec object will be instantiated and called as such:
 * ser := Constructor();
 * deser := Constructor();
 * data := ser.serialize(root);
 * ans := deser.deserialize(data);
 */

The Go implementation mirrors the Python solution closely.

Instead of Python lists, Go uses slices of strings. Integer conversion is handled using strconv.Itoa and strconv.Atoi.

Go distinguishes nil pointers explicitly, which naturally maps to missing tree children.

The recursive DFS functions are implemented as closures so they can access shared variables like values and index.

Worked Examples

Example 1

Input:

    1
   / \
  2   3
     / \
    4   5

Serialization Trace

Step Current Node Output
1 1 1
2 2 1,2
3 null 1,2,N
4 null 1,2,N,N
5 3 1,2,N,N,3
6 4 1,2,N,N,3,4
7 null 1,2,N,N,3,4,N
8 null 1,2,N,N,3,4,N,N
9 5 1,2,N,N,3,4,N,N,5
10 null 1,2,N,N,3,4,N,N,5,N
11 null 1,2,N,N,3,4,N,N,5,N,N

Final serialized string:

1,2,N,N,3,4,N,N,5,N,N

Deserialization Trace

Index Token Action
0 1 Create node 1
1 2 Create node 2
2 N Left child is null
3 N Right child is null
4 3 Create node 3
5 4 Create node 4
6 N Left child is null
7 N Right child is null
8 5 Create node 5
9 N Left child is null
10 N Right child is null

The reconstructed tree matches the original exactly.

Example 2

Input:

[]

Serialization

The root is None.

Step Current Node Output
1 null N

Serialized string:

N

Deserialization

Index Token Action
0 N Return None

The reconstructed tree is empty.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each node and null marker is processed exactly once
Space O(n) Serialized output and recursion stack together require linear space

The serializer visits every node once and appends a constant amount of information per node. The deserializer also processes each token once while rebuilding the tree.

The recursion stack depth depends on tree height. In the worst case of a completely skewed tree, recursion depth becomes O(n).

Test Cases

# Definition for testing
class TreeNode:
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None

def is_same_tree(a, b):
    if not a and not b:
        return True

    if not a or not b:
        return False

    return (
        a.val == b.val and
        is_same_tree(a.left, b.left) and
        is_same_tree(a.right, b.right)
    )

codec = Codec()

# Test 1: Empty tree
root = None
data = codec.serialize(root)
restored = codec.deserialize(data)
assert restored is None  # verifies empty tree handling

# Test 2: Single node
root = TreeNode(1)
data = codec.serialize(root)
restored = codec.deserialize(data)
assert is_same_tree(root, restored)  # verifies minimal non-empty tree

# Test 3: Balanced tree
root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.right.left = TreeNode(4)
root.right.right = TreeNode(5)

data = codec.serialize(root)
restored = codec.deserialize(data)

assert is_same_tree(root, restored)  # verifies standard example

# Test 4: Left-skewed tree
root = TreeNode(1)
root.left = TreeNode(2)
root.left.left = TreeNode(3)

data = codec.serialize(root)
restored = codec.deserialize(data)

assert is_same_tree(root, restored)  # verifies deep left recursion

# Test 5: Right-skewed tree
root = TreeNode(1)
root.right = TreeNode(2)
root.right.right = TreeNode(3)

data = codec.serialize(root)
restored = codec.deserialize(data)

assert is_same_tree(root, restored)  # verifies deep right recursion

# Test 6: Negative values
root = TreeNode(-1)
root.left = TreeNode(-2)
root.right = TreeNode(-3)

data = codec.serialize(root)
restored = codec.deserialize(data)

assert is_same_tree(root, restored)  # verifies negative numbers

# Test 7: Duplicate values
root = TreeNode(1)
root.left = TreeNode(1)
root.right = TreeNode(1)

data = codec.serialize(root)
restored = codec.deserialize(data)

assert is_same_tree(root, restored)  # verifies duplicates handled correctly

# Test 8: Sparse tree
root = TreeNode(1)
root.left = TreeNode(2)
root.left.right = TreeNode(3)

data = codec.serialize(root)
restored = codec.deserialize(data)

assert is_same_tree(root, restored)  # verifies null markers preserve structure
Test Why
Empty tree Verifies handling of None root
Single node Tests smallest non-empty input
Balanced tree Verifies standard serialization/deserialization
Left-skewed tree Tests recursive depth on left-heavy trees
Right-skewed tree Tests recursive depth on right-heavy trees
Negative values Ensures integer parsing works correctly
Duplicate values Verifies structure does not rely on uniqueness
Sparse tree Confirms null markers preserve missing children

Edge Cases

Empty Tree

An empty tree is represented by None. A naive serializer might return an empty string, which can create ambiguity during deserialization.

This implementation explicitly serializes an empty tree as "N". During deserialization, encountering "N" immediately returns None. This guarantees that empty trees are handled consistently and unambiguously.

Highly Skewed Tree

A tree may contain nodes only on one side:

1
 \
  2
   \
    3

Naive array-based representations waste enormous amounts of space for such inputs because many empty positions appear in the conceptual complete tree.

The preorder-with-null-markers approach stores only necessary information, so serialization size remains linear in the number of nodes.

Trees With Duplicate Values

Some solutions incorrectly assume node values are unique and attempt reconstruction using value-based lookup logic.

This problem does not guarantee uniqueness. For example:

    1
   / \
  1   1

Our implementation never relies on uniqueness. Reconstruction depends entirely on traversal order plus null markers, so duplicate values are handled naturally and correctly.

Sparse Trees With Missing Internal Children

Consider:

    1
   /
  2
   \
    3

If null children are omitted during serialization, the structure becomes ambiguous.

This implementation records every missing child using "N", ensuring the exact original structure can always be reconstructed.