LeetCode 508 - Most Frequent Subtree Sum

Difficulty: 🟡 Medium
Topics: Hash Table, Tree, Depth-First Search, Binary Tree

Solution

Problem Understanding

The problem asks us to compute the subtree sum for every node in a binary tree, then determine which subtree sum appears most frequently.

A subtree rooted at a node includes that node and all of its descendants. The subtree sum is simply the sum of every value contained in that subtree.

For example, consider the tree:

    5
   / \
  2  -3

The subtree sums are:

Node 2 → sum = 2
Node -3 → sum = -3
Node 5 → sum = 5 + 2 + (-3) = 4

Each sum appears exactly once, so all three values are returned.

The input is the root node of a binary tree. The output is a list of integers representing all subtree sums that occur with the highest frequency.

The constraints tell us several important things:

The tree can contain up to 10^4 nodes.
Node values may be negative.
The tree is guaranteed to contain at least one node.

Because the tree may be large, inefficient repeated computations will become expensive. An algorithm worse than linear or near-linear time may struggle at the upper bound.

Several edge cases are important:

A tree with only one node.
Multiple subtree sums occurring with the same frequency.
Negative subtree sums.
Repeated subtree sums caused by symmetric or repeated structures.
Highly skewed trees that resemble linked lists.

A naive implementation may repeatedly recompute subtree sums for overlapping subtrees, causing unnecessary work.

Approaches

Brute Force Approach

A straightforward approach is to compute the subtree sum for every node independently.

For each node:

Recursively calculate the sum of its entire subtree.
Store that sum in a frequency map.
Repeat the process for every node in the tree.

The issue is that subtree calculations overlap heavily. Suppose we compute the subtree sum for a parent node. Its calculation already includes the sums of all descendants. If we later compute the subtree sums for those descendants again from scratch, the same nodes are revisited repeatedly.

In the worst case, such as a skewed tree, this leads to quadratic complexity.

This approach is correct because every subtree sum is eventually computed and counted, but it is inefficient due to redundant traversal work.

Optimal Approach

The key observation is that subtree sums can naturally be computed using postorder depth-first traversal.

A subtree sum depends on:

the left subtree sum,
the right subtree sum,
the current node value.

This means we should process children before processing the parent.

Using a single DFS traversal:

Recursively compute the left subtree sum.
Recursively compute the right subtree sum.
Combine them with the current node value.
Record the resulting sum in a hash map.

Because each node is processed exactly once, we avoid redundant calculations entirely.

A hash map is ideal for tracking frequencies because subtree sums may be negative, large, or repeated arbitrarily.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(n)	Recomputes subtree sums repeatedly
Optimal	O(n)	O(n)	Single postorder DFS traversal

Algorithm Walkthrough

Create a hash map to store subtree sum frequencies.

The key will be the subtree sum, and the value will be the number of times that sum appears. We need fast insertion and lookup, which makes a hash map the natural choice. 2. Initialize a variable to track the maximum frequency seen so far.

As subtree sums are computed, we continuously update this value so we know which sums are currently the most frequent. 3. Perform a postorder DFS traversal.

Postorder traversal is essential because the subtree sum of a node depends on the sums of its children. We must compute child sums before computing the parent's sum. 4. For each node, recursively compute the left subtree sum.

If the left child is None, its contribution is 0. 5. Recursively compute the right subtree sum.

Similarly, a missing right child contributes 0. 6. Compute the current subtree sum.

The formula is:

$\text{sum} = \text{node.val} + \text{leftSum} + \text{rightSum}$ 7. Update the frequency map.

Increase the count for the computed subtree sum. 8. Update the maximum frequency if needed.

If this subtree sum now appears more frequently than previous sums, update the maximum frequency tracker. 9. Return the subtree sum upward to the parent call.

This allows parent nodes to build their own subtree sums efficiently. 10. After DFS completes, iterate through the frequency map.

Collect every subtree sum whose frequency equals the maximum frequency. 11. Return the resulting list.

Why it works

The algorithm works because every subtree sum is computed exactly once during postorder traversal. When processing a node, the algorithm already knows the exact sums of its left and right subtrees, so the current subtree sum is computed correctly.

Since every subtree sum is inserted into the frequency map exactly once, the frequency counts are accurate. Finally, selecting all sums with the maximum frequency guarantees the correct answer.

Python Solution

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right

from typing import Optional, List
from collections import defaultdict

class Solution:
    def findFrequentTreeSum(self, root: Optional[TreeNode]) -> List[int]:
        frequency = defaultdict(int)
        max_frequency = 0

        def dfs(node: Optional[TreeNode]) -> int:
            nonlocal max_frequency

            if not node:
                return 0

            left_sum = dfs(node.left)
            right_sum = dfs(node.right)

            current_sum = node.val + left_sum + right_sum

            frequency[current_sum] += 1
            max_frequency = max(max_frequency, frequency[current_sum])

            return current_sum

        dfs(root)

        result = []

        for subtree_sum, count in frequency.items():
            if count == max_frequency:
                result.append(subtree_sum)

        return result

The implementation follows the postorder DFS strategy directly.

The frequency dictionary stores how many times each subtree sum appears. The max_frequency variable keeps track of the highest occurrence count encountered during traversal.

The nested dfs function computes subtree sums recursively. The base case returns 0 for a null node, which simplifies the recursive formula and avoids special handling for missing children.

For each node, the function:

Computes the left subtree sum.
Computes the right subtree sum.
Combines them with the current node value.
Updates the frequency map.
Updates the maximum frequency.
Returns the subtree sum upward.

After traversal finishes, the final loop collects all subtree sums whose frequency matches the maximum frequency.

The solution is concise because DFS naturally models subtree computation.

Go Solution

/**
 * Definition for a binary tree node.
 * type TreeNode struct {
 *     Val int
 *     Left *TreeNode
 *     Right *TreeNode
 * }
 */
func findFrequentTreeSum(root *TreeNode) []int {
    frequency := make(map[int]int)
    maxFrequency := 0

    var dfs func(node *TreeNode) int

    dfs = func(node *TreeNode) int {
        if node == nil {
            return 0
        }

        leftSum := dfs(node.Left)
        rightSum := dfs(node.Right)

        currentSum := node.Val + leftSum + rightSum

        frequency[currentSum]++

        if frequency[currentSum] > maxFrequency {
            maxFrequency = frequency[currentSum]
        }

        return currentSum
    }

    dfs(root)

    result := []int{}

    for subtreeSum, count := range frequency {
        if count == maxFrequency {
            result = append(result, subtreeSum)
        }
    }

    return result
}

The Go implementation mirrors the Python solution closely.

Go uses a map[int]int for frequency tracking instead of Python's defaultdict.

The recursive DFS is implemented as a closure assigned to a variable. Null nodes are represented by nil.

The result is accumulated using a slice. Since Go maps return keys in arbitrary order, the returned slice order is unspecified, which is acceptable because the problem allows any order.

Integer overflow is not an issue under the given constraints because the maximum possible subtree sum remains within Go's int range.

Worked Examples

Example 1

Input:

    5
   / \
  2  -3

DFS Traversal Steps

Step	Node	Left Sum	Right Sum	Current Sum	Frequency Map
1	2	0	0	2	{2: 1}
2	-3	0	0	-3	{2: 1, -3: 1}
3	5	2	-3	4	{2: 1, -3: 1, 4: 1}

Maximum frequency is 1.

Result:

[2, -3, 4]

All sums appear equally often.

Example 2

Input:

    5
   / \
  2  -5

DFS Traversal Steps

Step	Node	Left Sum	Right Sum	Current Sum	Frequency Map
1	2	0	0	2	{2: 1}
2	-5	0	0	-5	{2: 1, -5: 1}
3	5	2	-5	2	{2: 2, -5: 1}

Maximum frequency is 2.

Result:

[2]

The subtree sum 2 appears twice.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each node is visited exactly once
Space	O(n)	Frequency map and recursion stack may store up to n elements

The DFS traversal processes every node one time, and each operation inside the traversal is constant time. Therefore, the total runtime is linear in the number of nodes.

The frequency map may contain up to n unique subtree sums. Additionally, the recursion stack can grow to O(n) in the worst case for a highly skewed tree.

Test Cases

# Definition for testing
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

# Example 1
root1 = TreeNode(5, TreeNode(2), TreeNode(-3))
assert sorted(Solution().findFrequentTreeSum(root1)) == sorted([2, -3, 4])  # all sums unique

# Example 2
root2 = TreeNode(5, TreeNode(2), TreeNode(-5))
assert Solution().findFrequentTreeSum(root2) == [2]  # repeated subtree sum

# Single node tree
root3 = TreeNode(7)
assert Solution().findFrequentTreeSum(root3) == [7]  # minimal tree

# All zeros
root4 = TreeNode(0, TreeNode(0), TreeNode(0))
assert Solution().findFrequentTreeSum(root4) == [0]  # same sum everywhere

# Skewed tree
root5 = TreeNode(1, TreeNode(2, TreeNode(3)))
assert sorted(Solution().findFrequentTreeSum(root5)) == sorted([3, 5, 6])  # linked-list shape

# Negative values
root6 = TreeNode(-1, TreeNode(-2), TreeNode(-3))
assert sorted(Solution().findFrequentTreeSum(root6)) == sorted([-2, -3, -6])  # negative sums

# Multiple repeated sums
root7 = TreeNode(1,
                 TreeNode(2),
                 TreeNode(2))
assert Solution().findFrequentTreeSum(root7) == [2]  # repeated leaf sum

# Larger balanced tree
root8 = TreeNode(10,
                 TreeNode(5,
                          TreeNode(3),
                          TreeNode(2)),
                 TreeNode(-3,
                          None,
                          TreeNode(11)))
result8 = Solution().findFrequentTreeSum(root8)
assert isinstance(result8, list)  # general correctness check

Test	Why
`[5,2,-3]`	Verifies multiple equally frequent sums
`[5,2,-5]`	Verifies repeated subtree sums
Single node tree	Validates minimum input size
All zeros	Tests identical subtree sums everywhere
Skewed tree	Tests deep recursion structure
Negative values	Ensures negative sums work correctly
Repeated leaf values	Verifies counting logic
Larger balanced tree	Tests general recursive correctness

Edge Cases

Single Node Tree

A tree containing only one node is the smallest valid input. A buggy implementation might incorrectly assume the existence of children or fail to initialize frequency tracking properly.

The implementation handles this naturally because null children contribute 0, so the subtree sum becomes simply the node's value.

Multiple Sums With Equal Frequency

The problem allows ties, meaning several subtree sums may occur equally often. A common mistake is returning only one value instead of all maximum-frequency sums.

The implementation avoids this by scanning the frequency map after traversal and collecting every sum whose count equals the maximum frequency.

Highly Skewed Trees

A tree may degenerate into a linked list shape, causing recursion depth to grow linearly with the number of nodes.

The algorithm still works correctly because each node is visited once, though recursion stack usage becomes O(n) in the worst case.

Negative Values

Subtree sums may become negative because node values themselves may be negative. Some incorrect implementations assume sums are non-negative and use array indexing instead of hash maps.

Using a hash map allows arbitrary integer sums, including negative values, without any special handling.