LeetCode 652 - Find Duplicate Subtrees

The problem gives us the root of a binary tree and asks us to identify all duplicate subtrees inside it. A subtree is defined as any node together with all of its descendants.

LeetCode Problem 652

Difficulty: 🟡 Medium
Topics: Hash Table, Tree, Depth-First Search, Binary Tree

Solution

Problem Understanding

The problem gives us the root of a binary tree and asks us to identify all duplicate subtrees inside it. A subtree is defined as any node together with all of its descendants. Two subtrees are considered duplicates if they have exactly the same structure and every corresponding node contains the same value.

The output should contain one representative root node for each type of duplicate subtree. If a subtree appears multiple times, we only return one of those roots. The order of the returned nodes does not matter.

For example, consider this tree:

        1
       / \
      2   3
     /   / \
    4   2   4
       /
      4

The subtree rooted at value 4 appears multiple times. The subtree:

  2
 /
4

also appears twice. Therefore, both subtree roots should appear in the result.

The constraints tell us that the tree can contain up to 5000 nodes. This is large enough that naive subtree comparisons can become expensive if implemented poorly. Since binary trees can have many overlapping subtrees, repeatedly comparing subtree structures directly would lead to excessive repeated work.

Several edge cases are important:

  • A tree with only one node cannot contain duplicates.
  • Multiple leaf nodes with the same value are duplicate subtrees.
  • Deep skewed trees may stress recursion depth and serialization size.
  • Different structures with the same values are not duplicates.
  • Identical structures with different values are not duplicates.
  • A duplicate subtree should only be added once to the answer, even if it appears many times.

The problem guarantees that the input is a valid binary tree and that node values remain within a small integer range, but the structure itself can vary significantly.

Approaches

Brute Force Approach

A brute force solution would compare every subtree against every other subtree.

For each node in the tree, we could generate its subtree and compare it structurally against the subtree rooted at every other node. To determine whether two subtrees are equal, we would recursively compare node values and recursively compare left and right children.

This approach is correct because every possible pair of subtrees is examined directly. If two subtrees match structurally and by value, they are duplicates.

However, this method is very inefficient. A tree with n nodes contains O(n) subtrees. Comparing two subtrees may take O(n) time in the worst case. Since we may compare every pair of subtrees, the total complexity can approach O(n^3).

With up to 5000 nodes, this becomes too slow.

Optimal Approach

The key insight is that duplicate subtrees can be detected by giving every subtree a unique representation.

If two subtrees serialize to the same representation, then they are duplicates. We can perform a postorder traversal and build a serialization string for every subtree:

(value,left_subtree,right_subtree)

For example:

4,#,#
2,4,#,#,#

During traversal:

  • Every subtree is serialized exactly once.
  • A hash map tracks how many times each serialization has appeared.
  • When a serialization appears for the second time, we add that subtree root to the answer.

Postorder traversal is important because we must fully process children before we can serialize the parent subtree.

This reduces repeated comparisons dramatically. Instead of repeatedly comparing structures, we compute a canonical representation once per subtree.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(n) Compares every subtree pair directly
Optimal O(n^2) O(n^2) Uses subtree serialization with hashing

Algorithm Walkthrough

  1. Create a hash map that stores how many times each subtree serialization has appeared.

The key is the serialization string, and the value is the frequency count. This allows us to quickly determine whether we have seen an identical subtree before. 2. Create an empty result list.

Whenever we discover a duplicate subtree for the first time, we append its root node to this list. 3. Perform a postorder DFS traversal.

We process the left subtree first, then the right subtree, and finally the current node. This order ensures that child serializations are already computed before building the parent's serialization. 4. Define the serialization format.

For each node:

value,left_serial,right_serial

Null children are represented using a special marker such as #.

For example:

4,#,#

uniquely represents a leaf node with value 4. 5. Recursively serialize the left and right children.

The recursive calls return serialization strings representing the entire child subtrees. 6. Build the current subtree serialization.

Combine the current node value with the left and right serializations into one string. 7. Update the frequency map.

Increment the count for the current serialization. 8. Detect duplicates.

If the frequency becomes exactly 2, append the current node to the result list.

We only add the node once. If the count later becomes 3 or 4, we do not add additional copies. 9. Return the final result list after traversal completes.

Why it works

The algorithm works because the serialization uniquely captures both subtree structure and node values. Two subtrees produce identical serializations if and only if they are structurally identical and contain the same values in corresponding positions.

Since every subtree is serialized exactly once during DFS traversal, the hash map correctly counts occurrences of every subtree type. Whenever a serialization appears more than once, we have found duplicate subtrees.

Python Solution

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right

from typing import Optional, List
from collections import defaultdict

class Solution:
    def findDuplicateSubtrees(
        self,
        root: Optional[TreeNode]
    ) -> List[Optional[TreeNode]]:

        subtree_count = defaultdict(int)
        duplicates = []

        def dfs(node: Optional[TreeNode]) -> str:
            if not node:
                return "#"

            left_serial = dfs(node.left)
            right_serial = dfs(node.right)

            serial = (
                f"{node.val},{left_serial},{right_serial}"
            )

            subtree_count[serial] += 1

            if subtree_count[serial] == 2:
                duplicates.append(node)

            return serial

        dfs(root)

        return duplicates

The implementation follows the postorder serialization strategy described earlier.

The subtree_count dictionary stores how many times each serialized subtree has appeared. The duplicates list stores the roots of duplicate subtree types.

The recursive dfs function handles subtree serialization. If the node is None, it returns the null marker "#".

For non-null nodes, the function recursively serializes the left and right subtrees first. This is why the traversal is postorder. Once both child serializations are available, the current subtree serialization is constructed.

The serialization string is then inserted into the frequency map. If its frequency becomes exactly 2, the subtree has just become a duplicate, so the current node is appended to the answer list.

Finally, the traversal begins at the root, and the collected duplicates are returned.

Go Solution

/**
 * Definition for a binary tree node.
 * type TreeNode struct {
 *     Val int
 *     Left *TreeNode
 *     Right *TreeNode
 * }
 */

import "strconv"

func findDuplicateSubtrees(root *TreeNode) []*TreeNode {
    subtreeCount := make(map[string]int)
    duplicates := []*TreeNode{}

    var dfs func(node *TreeNode) string

    dfs = func(node *TreeNode) string {
        if node == nil {
            return "#"
        }

        leftSerial := dfs(node.Left)
        rightSerial := dfs(node.Right)

        serial := strconv.Itoa(node.Val) +
            "," +
            leftSerial +
            "," +
            rightSerial

        subtreeCount[serial]++

        if subtreeCount[serial] == 2 {
            duplicates = append(duplicates, node)
        }

        return serial
    }

    dfs(root)

    return duplicates
}

The Go solution mirrors the Python implementation closely.

Go uses map[string]int instead of Python's defaultdict. Missing keys default to zero automatically when accessed.

The recursive DFS function is declared as a function variable because recursive anonymous functions in Go require prior declaration.

Null nodes are represented using "#" exactly as in the Python version.

String concatenation is handled explicitly using +, and integer node values are converted to strings using strconv.Itoa.

Worked Examples

Example 1

Input:
        1
       / \
      2   3
     /   / \
    4   2   4
       /
      4

Traversal Order

The traversal is postorder:

4 -> 2 -> 4 -> 4 -> 2 -> 3 -> 1

Serialization Process

Node Left Serialization Right Serialization Current Serialization Count
4 # # 4,#,# 1
2 4,#,# # 2,4,#,#,# 1
4 # # 4,#,# 2
4 # # 4,#,# 3
2 4,#,# # 2,4,#,#,# 2
3 2,4,#,#,# 4,#,# 3,2,4,#,#,#,4,#,# 1
1 2,4,#,#,# 3,2,4,#,#,#,4,#,# 1,2,4,#,#,#,3,2,4,#,#,#,4,#,# 1

Duplicate Detection

When "4,#,#" reaches count 2, the leaf node 4 is added to the result.

When "2,4,#,#,#" reaches count 2, the subtree rooted at 2 is added.

Final result:

[[4], [2,4]]

Example 2

    2
   / \
  1   1

Serialization Table

Node Serialization Count
Left 1 1,#,# 1
Right 1 1,#,# 2
2 2,1,#,#,1,#,# 1

The subtree "1,#,#" appears twice, so one of the 1 nodes is returned.

Final result:

[[1]]

Example 3

        2
       / \
      2   2
     /   /
    3   3

Serialization Table

Node Serialization Count
Left 3 3,#,# 1
Left 2 2,3,#,#,# 1
Right 3 3,#,# 2
Right 2 2,3,#,#,# 2
Root 2 2,2,3,#,#,#,2,3,#,#,# 1

Duplicate subtrees:

[3]
[2,3]

Complexity Analysis

Measure Complexity Explanation
Time O(n^2) Serialization strings can grow proportional to subtree size
Space O(n^2) Stored serialization strings may collectively require quadratic space

Although each node is visited once, serialization construction is not constant time. A subtree serialization may contain many characters, and concatenating these strings repeatedly can lead to quadratic total cost in the worst case.

For balanced trees, practical performance is usually much better, but the formal worst case remains O(n^2).

An advanced optimization would assign integer IDs to subtree structures instead of full strings, reducing complexity closer to O(n).

Test Cases

from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def values(nodes):
    return sorted(node.val for node in nodes)

solution = Solution()

# Example 1
root1 = TreeNode(1)
root1.left = TreeNode(2)
root1.right = TreeNode(3)
root1.left.left = TreeNode(4)
root1.right.left = TreeNode(2)
root1.right.right = TreeNode(4)
root1.right.left.left = TreeNode(4)

result1 = solution.findDuplicateSubtrees(root1)
assert values(result1) == [2, 4]  # Multiple duplicate subtree types

# Example 2
root2 = TreeNode(2)
root2.left = TreeNode(1)
root2.right = TreeNode(1)

result2 = solution.findDuplicateSubtrees(root2)
assert values(result2) == [1]  # Duplicate leaf nodes

# Example 3
root3 = TreeNode(2)
root3.left = TreeNode(2)
root3.right = TreeNode(2)
root3.left.left = TreeNode(3)
root3.right.left = TreeNode(3)

result3 = solution.findDuplicateSubtrees(root3)
assert values(result3) == [2, 3]  # Nested duplicate subtrees

# Single node tree
root4 = TreeNode(1)

result4 = solution.findDuplicateSubtrees(root4)
assert values(result4) == []  # No duplicates possible

# All nodes identical
root5 = TreeNode(1)
root5.left = TreeNode(1)
root5.right = TreeNode(1)
root5.left.left = TreeNode(1)
root5.right.right = TreeNode(1)

result5 = solution.findDuplicateSubtrees(root5)
assert values(result5) == [1, 1]  # Duplicate leaves and larger subtree

# Different structures, same values
root6 = TreeNode(1)
root6.left = TreeNode(2)
root6.right = TreeNode(2)
root6.left.left = TreeNode(3)
root6.right.right = TreeNode(3)

result6 = solution.findDuplicateSubtrees(root6)
assert values(result6) == [3]  # Structure must also match

# Deep skewed tree
root7 = TreeNode(1)
root7.left = TreeNode(1)
root7.left.left = TreeNode(1)
root7.left.left.left = TreeNode(1)

result7 = solution.findDuplicateSubtrees(root7)
assert values(result7) == [1]  # Duplicate leaf subtree only
Test Why
Example 1 Validates multiple duplicate subtree patterns
Example 2 Confirms duplicate leaves are detected
Example 3 Tests nested duplicate subtrees
Single node tree Ensures no false positives
All nodes identical Verifies repeated subtree counting
Different structures, same values Confirms structure matters
Deep skewed tree Tests recursion on unbalanced trees

Edge Cases

Single Node Tree

A tree containing only one node cannot possibly contain duplicate subtrees because there is only one subtree in total.

A buggy implementation might incorrectly count the single subtree as duplicated if serialization tracking is mishandled. This implementation only adds a subtree when its count reaches exactly 2, so a single occurrence is never reported.

Duplicate Leaf Nodes

Leaf nodes are the smallest possible subtrees. Multiple leaves with the same value are considered duplicate subtrees because both their structure and values match.

This case is important because some implementations focus only on larger subtrees and forget that leaves are valid subtrees too. The serialization format naturally handles this using:

value,#,#

which uniquely represents leaf nodes.

Same Values but Different Structures

Two subtrees may contain the same values but have different shapes.

For example:

    2          2
   /            \
  3              3

These are not duplicates because their structures differ.

The serialization explicitly includes null markers, so these produce different serializations:

2,3,#,#,#
2,#,3,#,#

This guarantees that structural differences are preserved.

Large Skewed Trees

A highly unbalanced tree behaves similarly to a linked list. Recursive implementations may risk deep recursion depth.

The algorithm still works correctly because each node is processed exactly once, and null markers preserve subtree structure accurately even in degenerate trees.