LeetCode 428 - Serialize and Deserialize N-ary Tree

Difficulty: 🔴 Hard
Topics: String, Tree, Depth-First Search, Breadth-First Search

Solution

Problem Understanding

LeetCode 428, LeetCode Serialize and Deserialize N-ary Tree, asks us to design a reversible encoding system for an N-ary tree. The problem is not asking for a specific serialization format. Instead, it asks us to guarantee that after serializing a tree into a string and then deserializing that string, we reconstruct the exact same tree structure.

An N-ary tree differs from a binary tree because each node can have any number of children, not just two. Every node contains a value and a list of child nodes. The challenge is that when we flatten the tree into a string, we must preserve both the node values and the parent-child relationships.

The input to serialize is the root node of an N-ary tree. The output must be a string representation of that tree. The input to deserialize is that serialized string, and the output must be the reconstructed root node.

The constraints are important because they shape what kinds of approaches are practical. The tree can contain up to 10^4 nodes, which means the algorithm must be at least linear or close to linear in time complexity. The height can be as large as 1000, which means recursive depth-first traversal is acceptable in Python if recursion limits are managed carefully, but iterative designs are also worth considering. The requirement that the codec be stateless means we cannot rely on global variables or persistent class state between method calls.

The biggest conceptual challenge is that a serialized sequence must preserve enough structural information to reconstruct arbitrary branching. If we only store node values in traversal order, we lose information about how many children each node has. Therefore, the serialization format must explicitly encode child boundaries.

Several edge cases are especially important. An empty tree must serialize and deserialize correctly. Nodes may have zero children, which means leaf nodes need to be represented unambiguously. Trees may be highly unbalanced with large depth, and nodes may have many children. Duplicate values are also allowed, so reconstruction cannot rely on node values being unique.

Approaches

A brute-force approach would attempt to serialize the tree using only traversal order, such as preorder or level-order traversal, while inserting explicit separators or null markers everywhere. For example, we could repeatedly insert delimiters between sibling groups and null markers after every node. While this works, the representation becomes unnecessarily verbose and harder to parse correctly. In the worst case, the serialized string contains excessive structural markers that inflate memory usage and complicate deserialization logic.

The key insight for an optimal solution is that every node only needs two pieces of information during serialization: its value and the number of children it has. Once we know how many children belong to a node, deserialization becomes deterministic. We can recursively rebuild exactly that many subtrees before returning to the parent.

This observation leads naturally to a preorder depth-first traversal. During serialization, for each node we store:

The node value
The number of children
The serialized representation of each child

For example, a node with value 3 and two children becomes:

3 2 ...

The deserializer reads the value and child count, creates the node, then recursively reconstructs exactly that many children. Because preorder traversal preserves hierarchy naturally, reconstruction is straightforward and efficient.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n)	O(n)	Uses excessive null markers or separators, verbose representation
Optimal	O(n)	O(n)	Stores node value and child count using preorder DFS

Algorithm Walkthrough

Serialization

Start with a preorder depth-first traversal from the root node. Preorder is ideal because the parent node is processed before its children, which allows deserialization to reconstruct the hierarchy in the same order.
For each node, append two values to the serialized output:

The node's value
The number of children the node has

This child count is the critical structural information that makes reconstruction possible. 3. After recording the current node, recursively serialize each child in order. Because the number of children is already known, the deserializer will later know exactly how many recursive subtree constructions to perform. 4. Store all values in a list of strings and join them with spaces at the end. Using a list avoids inefficient repeated string concatenation. 5. If the tree is empty, return an empty string.

Deserialization

If the serialized string is empty, return None.
Split the string into tokens. Every node contributes exactly two immediate tokens:

Node value
Child count

Maintain an index pointer that tracks the current reading position in the token list.
Recursively reconstruct the tree:

Read the node value
Read the child count
Create the node
Recursively deserialize exactly that many children

Append each reconstructed child to the node's children list.
Return the reconstructed root node after all recursive calls finish.

Why it works

The algorithm works because every node explicitly stores the number of children it owns. During deserialization, this child count acts as a boundary that tells us exactly how many recursive subtree reads belong to the current node. Since preorder traversal processes parents before children, the token stream naturally preserves hierarchical structure. Every serialized node corresponds to exactly one deserialized node, and every subtree is reconstructed in the same order it was traversed.

Python Solution

"""
# Definition for a Node.
class Node(object):
    def __init__(self, val: Optional[int] = None, children: Optional[List['Node']] = None):
        if children is None:
            children = []
        self.val = val
        self.children = children
"""

from typing import List, Optional

class Codec:
    def serialize(self, root: 'Node') -> str:
        """Encodes a tree to a single string.
        
        :type root: Node
        :rtype: str
        """
        
        if not root:
            return ""
        
        serialized = []
        
        def dfs(node: 'Node') -> None:
            serialized.append(str(node.val))
            serialized.append(str(len(node.children)))
            
            for child in node.children:
                dfs(child)
        
        dfs(root)
        return " ".join(serialized)

	
    def deserialize(self, data: str) -> 'Node':
        """Decodes your encoded data to tree.
        
        :type data: str
        :rtype: Node
        """
        
        if not data:
            return None
        
        tokens = data.split()
        index = 0
        
        def dfs() -> 'Node':
            nonlocal index
            
            value = int(tokens[index])
            index += 1
            
            child_count = int(tokens[index])
            index += 1
            
            node = Node(value, [])
            
            for _ in range(child_count):
                node.children.append(dfs())
            
            return node
        
        return dfs()

The implementation follows the preorder DFS strategy directly. During serialization, the helper function records the node value followed by the number of children. This immediately captures both node identity and structural information.

The recursive traversal naturally handles arbitrary branching. Every child subtree is serialized immediately after its parent, preserving hierarchical ordering.

During deserialization, the index variable tracks the current position in the token list. Each recursive call consumes exactly two tokens for the current node and then recursively consumes all tokens belonging to its children.

Using nonlocal index allows recursive calls to share the same traversal state without relying on global variables, which satisfies the stateless requirement.

Go Solution

/**
 * Definition for a Node.
 * type Node struct {
 *     Val int
 *     Children []*Node
 * }
 */

import (
    "strconv"
    "strings"
)

type Codec struct {
    
}

func Constructor() *Codec {
    return &Codec{}
}

func (this *Codec) serialize(root *Node) string {
    if root == nil {
        return ""
    }
    
    var serialized []string
    
    var dfs func(node *Node)
    
    dfs = func(node *Node) {
        serialized = append(serialized, strconv.Itoa(node.Val))
        serialized = append(serialized, strconv.Itoa(len(node.Children)))
        
        for _, child := range node.Children {
            dfs(child)
        }
    }
    
    dfs(root)
    
    return strings.Join(serialized, " ")
}

func (this *Codec) deserialize(data string) *Node {
    if data == "" {
        return nil
    }
    
    tokens := strings.Split(data, " ")
    index := 0
    
    var dfs func() *Node
    
    dfs = func() *Node {
        value, _ := strconv.Atoi(tokens[index])
        index++
        
        childCount, _ := strconv.Atoi(tokens[index])
        index++
        
        node := &Node{
            Val: value,
            Children: []*Node{},
        }
        
        for i := 0; i < childCount; i++ {
            node.Children = append(node.Children, dfs())
        }
        
        return node
    }
    
    return dfs()
}

/**
 * Your Codec object will be instantiated and called as such:
 * obj := Constructor();
 * data := obj.serialize(root);
 * ans := obj.deserialize(data);
 */

The Go implementation mirrors the Python logic closely. The main difference is explicit string conversion using strconv.Itoa and strconv.Atoi. Go slices are used for dynamic token storage and child lists. Nil handling is explicit because Go distinguishes between nil pointers and initialized structs.

Unlike Python, Go requires closure variables like index to be captured explicitly inside nested functions. Otherwise, the recursive reconstruction logic is identical.

Worked Examples

Example 1

Input:

[1,null,3,2,4,null,5,6]

The tree structure is:

Serialization Trace

Step	Node	Children Count	Serialized Output
1	1	3	`1 3`
2	3	2	`1 3 3 2`
3	5	0	`1 3 3 2 5 0`
4	6	0	`1 3 3 2 5 0 6 0`
5	2	0	`1 3 3 2 5 0 6 0 2 0`
6	4	0	`1 3 3 2 5 0 6 0 2 0 4 0`

Final serialized string:

"1 3 3 2 5 0 6 0 2 0 4 0"

Deserialization Trace

Index	Tokens Read	Action
0-1	`1 3`	Create node 1 with 3 children
2-3	`3 2`	Create node 3 with 2 children
4-5	`5 0`	Create leaf node 5
6-7	`6 0`	Create leaf node 6
8-9	`2 0`	Create leaf node 2
10-11	`4 0`	Create leaf node 4

The original tree is reconstructed exactly.

Example 2

Input:

[1,null,2,3,4,5]

Tree:

      1
    / | | \
   2  3 4  5

Serialized output:

"1 4 2 0 3 0 4 0 5 0"

Deserialization reconstructs node 1 with exactly four leaf children.

Example 3

Input:

[]

Serialization immediately returns:

""

Deserialization sees the empty string and returns None.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Every node is visited exactly once during serialization and deserialization
Space	O(n)	Serialized token storage plus recursion stack

The time complexity is linear because each node contributes a constant amount of work during both serialization and deserialization. The recursion stack may grow up to the height of the tree. In the worst case of a skewed tree, this can become O(n). The serialized representation itself also requires linear space because every node stores two integers.

Test Cases

from typing import List, Optional

class Node:
    def __init__(self, val=None, children=None):
        self.val = val
        self.children = children if children is not None else []

def trees_equal(a, b):
    if not a and not b:
        return True
    
    if not a or not b:
        return False
    
    if a.val != b.val:
        return False
    
    if len(a.children) != len(b.children):
        return False
    
    for c1, c2 in zip(a.children, b.children):
        if not trees_equal(c1, c2):
            return False
    
    return True

codec = Codec()

# Empty tree
root = None
serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert deserialized is None  # tests empty input

# Single node
root = Node(1)
serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert trees_equal(root, deserialized)  # tests single-node tree

# Example tree
root = Node(1, [
    Node(3, [
        Node(5),
        Node(6)
    ]),
    Node(2),
    Node(4)
])

serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert trees_equal(root, deserialized)  # tests standard branching

# Deep skewed tree
root = Node(1, [
    Node(2, [
        Node(3, [
            Node(4)
        ])
    ])
])

serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert trees_equal(root, deserialized)  # tests deep recursion

# Wide tree
root = Node(1, [Node(i) for i in range(2, 100)])

serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert trees_equal(root, deserialized)  # tests many children

# Duplicate values
root = Node(1, [
    Node(1),
    Node(1, [
        Node(1)
    ])
])

serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert trees_equal(root, deserialized)  # tests duplicate values

# Mixed structure
root = Node(10, [
    Node(20),
    Node(30, [
        Node(40),
        Node(50)
    ]),
    Node(60)
])

serialized = codec.serialize(root)
deserialized = codec.deserialize(serialized)
assert trees_equal(root, deserialized)  # tests irregular tree

Test	Why
Empty tree	Validates correct handling of `None`
Single node	Ensures leaf serialization works
Standard branching tree	Verifies normal N-ary structure reconstruction
Deep skewed tree	Tests recursion depth handling
Wide tree	Ensures large child lists deserialize correctly
Duplicate values	Confirms reconstruction does not rely on uniqueness
Mixed structure	Tests varied subtree sizes and shapes

Edge Cases

An empty tree is the most important boundary condition. A naive implementation might attempt to split an empty string into tokens and fail during parsing. The implementation handles this explicitly by returning an empty string during serialization and returning None immediately during deserialization.

Leaf nodes are another subtle case. Since leaf nodes have no children, the serializer records a child count of 0. Without this explicit count, the deserializer would not know where a subtree ends. The implementation guarantees correctness because every node always records its exact number of children.

Duplicate node values can easily break incorrect reconstruction strategies. Some flawed approaches attempt to infer structure based on unique values or markers. This implementation never relies on uniqueness. Instead, the structure is reconstructed entirely from preorder ordering and child counts, so repeated values work naturally.

Highly unbalanced trees are also important because recursive depth can become large. A chain-like tree of height 1000 still works correctly because each recursive call processes exactly one node and then continues downward. The algorithm's correctness does not depend on balanced branching.