LeetCode 1519 - Number of Nodes in the Sub-Tree With the Same Label

Difficulty: 🟡 Medium
Topics: Hash Table, Tree, Depth-First Search, Breadth-First Search, Counting

Solution

Problem Understanding

The problem gives us a tree with n nodes numbered from 0 to n - 1. The tree is represented by an array of edges, where each edge connects two nodes ai and bi. Each node also has a label, represented as a string labels, where labels[i] is the label of node i. The root of the tree is always node 0.

We are asked to return an array ans of size n, where ans[i] is the number of nodes in the subtree rooted at node i that have the same label as node i. A subtree includes the node itself and all its descendants.

The input guarantees that the graph is a tree: it is connected and has no cycles, and edges.length is exactly n - 1. The labels consist of lowercase English letters. The constraints allow n to be up to 10^5, which means any solution with worse than linear or near-linear complexity will likely be too slow.

Key points to note:

Every node counts itself in its subtree.
We need to compare labels for counting, which requires some form of frequency tracking.
Edge cases include nodes with unique labels, all nodes having the same label, or the tree being a straight line (degenerate tree).

Approaches

Brute Force

A naive approach is to perform a depth-first search (DFS) or breadth-first search (BFS) for each node separately to traverse its subtree and count the number of nodes with the same label. For each node, this requires traversing potentially all n nodes in the worst case, leading to a total time complexity of O(n^2). This is correct but infeasible for large trees with n up to 10^5.

Optimal Approach

The key observation is that we do not need to traverse the subtree of each node independently. We can perform a single DFS from the root and propagate counts of labels from the children up to their parents. Using a frequency array (or hash map) to count occurrences of each label in the subtree allows us to efficiently compute ans[i] for every node in one traversal. This reduces the complexity to linear time relative to the number of nodes.

We can store counts of all 26 lowercase letters for each subtree and update the parent’s count by adding the child’s counts. The count for the node's own label gives the answer for that node.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n^2)	O(n)	DFS/BFS on each node separately
Optimal	O(n)	O(n)	Single DFS, propagate label counts from children to parent

Algorithm Walkthrough

Build an adjacency list from edges to represent the tree. This allows quick access to each node’s children.
Initialize an answer array ans of size n with zeros.
Define a recursive DFS function dfs(node, parent):
Create a frequency array count of size 26 initialized to 0.
Iterate over each child of node in the adjacency list:
Skip the parent to avoid revisiting the previous node.
Recursively call dfs(child, node) to get the child’s frequency array.
Add the child’s counts into the current node’s count.
Increment the count corresponding to labels[node] by 1.
Set ans[node] to the count of labels[node].
Return the count array.
Call dfs(0, -1) to start DFS from the root.
Return the ans array.

Why it works: Each DFS call aggregates the label counts from all descendant nodes, including itself. This ensures that for each node, we correctly count how many nodes in its subtree share its label. By propagating counts upward, we compute the answer for all nodes efficiently in a single traversal.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def countSubTrees(self, n: int, edges: List[List[int]], labels: str) -> List[int]:
        adj = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        
        ans = [0] * n
        
        def dfs(node: int, parent: int) -> List[int]:
            count = [0] * 26
            for child in adj[node]:
                if child == parent:
                    continue
                child_count = dfs(child, node)
                for i in range(26):
                    count[i] += child_count[i]
            label_index = ord(labels[node]) - ord('a')
            count[label_index] += 1
            ans[node] = count[label_index]
            return count
        
        dfs(0, -1)
        return ans

Explanation: The adjacency list allows fast access to children. The DFS function aggregates counts of each label in the subtree. For each node, after processing all children, we increment the count of the node's label and store it in ans. This method ensures all counts are correct with a single DFS traversal.

Go Solution

func countSubTrees(n int, edges [][]int, labels string) []int {
    adj := make([][]int, n)
    for _, e := range edges {
        u, v := e[0], e[1]
        adj[u] = append(adj[u], v)
        adj[v] = append(adj[v], u)
    }
    
    ans := make([]int, n)
    
    var dfs func(node, parent int) [26]int
    dfs = func(node, parent int) [26]int {
        var count [26]int
        for _, child := range adj[node] {
            if child == parent {
                continue
            }
            childCount := dfs(child, node)
            for i := 0; i < 26; i++ {
                count[i] += childCount[i]
            }
        }
        idx := labels[node] - 'a'
        count[idx]++
        ans[node] = count[idx]
        return count
    }
    
    dfs(0, -1)
    return ans
}

Explanation: The Go version mirrors the Python logic. We use a fixed-size array [26]int for counts instead of dynamic lists. Slices are used for the adjacency list. The recursive DFS function updates counts and fills the ans array. Go does not have a default dictionary, so an adjacency slice is initialized upfront.

Worked Examples

Example 1: n = 7, edges = [[0,1],[0,2],[1,4],[1,5],[2,3],[2,6]], labels = "abaedcd"

During DFS:

Node	Subtree Labels	Count of label[node]	ans[node]
4	d	1	1
5	c	1	1
1	b, d, c	b=1	1
3	e	1	1
6	d	1	1
2	a, e, d	a=1	1
0	a, b, a, e, c, d, d	a=2	2

Output: [2,1,1,1,1,1,1]

Example 2: n = 4, edges = [[0,1],[1,2],[0,3]], labels = "bbbb"

DFS accumulates counts of 'b' in each subtree, giving [4,2,1,1].

Example 3: n = 5, edges = [[0,1],[0,2],[1,3],[0,4]], labels = "aabab"

DFS yields [3,2,1,1,1].

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each node is visited once in DFS, and each edge is traversed twice.
Space	O(n)	Adjacency list uses O(n), recursion stack up to O(n) in worst-case (skewed tree), and count arrays use O(26) per node but reused.

This linear complexity makes the solution efficient for n up to 10^5.

Test Cases

# Provided examples
assert Solution().countSubTrees(7, [[0,1],[0,2],[1,4],[1,5],[2,3],[2,6]], "abaedcd") == [2,1,1,1,1,1,1]
assert Solution().countSubTrees(4, [[0,1],[1,2],[0,3]], "bbbb") == [4,2,1,1]
assert Solution().countSubTrees(5, [[0,1],[0,2],[1,3],[0,4]], "aabab") == [3,2,1,1,1]

# Single node
assert Solution().countSubTrees(1, [], "a") == [1]

# All unique labels
assert Solution().countSubTrees(3, [[0,1],[0,2]], "abc") == [1,1,1]

# Linear tree
assert Solution