LeetCode 1519 - Number of Nodes in the Sub-Tree With the Same Label
The problem gives us a tree with n nodes numbered from 0 to n - 1. The tree is represented by an array of edges, where each edge connects two nodes ai and bi. Each node also has a label, represented as a string labels, where labels[i] is the label of node i.
Difficulty: 🟡 Medium
Topics: Hash Table, Tree, Depth-First Search, Breadth-First Search, Counting
Solution
Problem Understanding
The problem gives us a tree with n nodes numbered from 0 to n - 1. The tree is represented by an array of edges, where each edge connects two nodes ai and bi. Each node also has a label, represented as a string labels, where labels[i] is the label of node i. The root of the tree is always node 0.
We are asked to return an array ans of size n, where ans[i] is the number of nodes in the subtree rooted at node i that have the same label as node i. A subtree includes the node itself and all its descendants.
The input guarantees that the graph is a tree: it is connected and has no cycles, and edges.length is exactly n - 1. The labels consist of lowercase English letters. The constraints allow n to be up to 10^5, which means any solution with worse than linear or near-linear complexity will likely be too slow.
Key points to note:
- Every node counts itself in its subtree.
- We need to compare labels for counting, which requires some form of frequency tracking.
- Edge cases include nodes with unique labels, all nodes having the same label, or the tree being a straight line (degenerate tree).
Approaches
Brute Force
A naive approach is to perform a depth-first search (DFS) or breadth-first search (BFS) for each node separately to traverse its subtree and count the number of nodes with the same label. For each node, this requires traversing potentially all n nodes in the worst case, leading to a total time complexity of O(n^2). This is correct but infeasible for large trees with n up to 10^5.
Optimal Approach
The key observation is that we do not need to traverse the subtree of each node independently. We can perform a single DFS from the root and propagate counts of labels from the children up to their parents. Using a frequency array (or hash map) to count occurrences of each label in the subtree allows us to efficiently compute ans[i] for every node in one traversal. This reduces the complexity to linear time relative to the number of nodes.
We can store counts of all 26 lowercase letters for each subtree and update the parent’s count by adding the child’s counts. The count for the node's own label gives the answer for that node.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2) | O(n) | DFS/BFS on each node separately |
| Optimal | O(n) | O(n) | Single DFS, propagate label counts from children to parent |
Algorithm Walkthrough
-
Build an adjacency list from
edgesto represent the tree. This allows quick access to each node’s children. -
Initialize an answer array
ansof sizenwith zeros. -
Define a recursive DFS function
dfs(node, parent): -
Create a frequency array
countof size 26 initialized to 0. -
Iterate over each child of
nodein the adjacency list: -
Skip the parent to avoid revisiting the previous node.
-
Recursively call
dfs(child, node)to get the child’s frequency array. -
Add the child’s counts into the current node’s
count. -
Increment the count corresponding to
labels[node]by 1. -
Set
ans[node]to the count oflabels[node]. -
Return the
countarray. -
Call
dfs(0, -1)to start DFS from the root. -
Return the
ansarray.
Why it works: Each DFS call aggregates the label counts from all descendant nodes, including itself. This ensures that for each node, we correctly count how many nodes in its subtree share its label. By propagating counts upward, we compute the answer for all nodes efficiently in a single traversal.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def countSubTrees(self, n: int, edges: List[List[int]], labels: str) -> List[int]:
adj = defaultdict(list)
for u, v in edges:
adj[u].append(v)
adj[v].append(u)
ans = [0] * n
def dfs(node: int, parent: int) -> List[int]:
count = [0] * 26
for child in adj[node]:
if child == parent:
continue
child_count = dfs(child, node)
for i in range(26):
count[i] += child_count[i]
label_index = ord(labels[node]) - ord('a')
count[label_index] += 1
ans[node] = count[label_index]
return count
dfs(0, -1)
return ans
Explanation: The adjacency list allows fast access to children. The DFS function aggregates counts of each label in the subtree. For each node, after processing all children, we increment the count of the node's label and store it in ans. This method ensures all counts are correct with a single DFS traversal.
Go Solution
func countSubTrees(n int, edges [][]int, labels string) []int {
adj := make([][]int, n)
for _, e := range edges {
u, v := e[0], e[1]
adj[u] = append(adj[u], v)
adj[v] = append(adj[v], u)
}
ans := make([]int, n)
var dfs func(node, parent int) [26]int
dfs = func(node, parent int) [26]int {
var count [26]int
for _, child := range adj[node] {
if child == parent {
continue
}
childCount := dfs(child, node)
for i := 0; i < 26; i++ {
count[i] += childCount[i]
}
}
idx := labels[node] - 'a'
count[idx]++
ans[node] = count[idx]
return count
}
dfs(0, -1)
return ans
}
Explanation: The Go version mirrors the Python logic. We use a fixed-size array [26]int for counts instead of dynamic lists. Slices are used for the adjacency list. The recursive DFS function updates counts and fills the ans array. Go does not have a default dictionary, so an adjacency slice is initialized upfront.
Worked Examples
Example 1: n = 7, edges = [[0,1],[0,2],[1,4],[1,5],[2,3],[2,6]], labels = "abaedcd"
During DFS:
| Node | Subtree Labels | Count of label[node] | ans[node] |
|---|---|---|---|
| 4 | d | 1 | 1 |
| 5 | c | 1 | 1 |
| 1 | b, d, c | b=1 | 1 |
| 3 | e | 1 | 1 |
| 6 | d | 1 | 1 |
| 2 | a, e, d | a=1 | 1 |
| 0 | a, b, a, e, c, d, d | a=2 | 2 |
Output: [2,1,1,1,1,1,1]
Example 2: n = 4, edges = [[0,1],[1,2],[0,3]], labels = "bbbb"
DFS accumulates counts of 'b' in each subtree, giving [4,2,1,1].
Example 3: n = 5, edges = [[0,1],[0,2],[1,3],[0,4]], labels = "aabab"
DFS yields [3,2,1,1,1].
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each node is visited once in DFS, and each edge is traversed twice. |
| Space | O(n) | Adjacency list uses O(n), recursion stack up to O(n) in worst-case (skewed tree), and count arrays use O(26) per node but reused. |
This linear complexity makes the solution efficient for n up to 10^5.
Test Cases
# Provided examples
assert Solution().countSubTrees(7, [[0,1],[0,2],[1,4],[1,5],[2,3],[2,6]], "abaedcd") == [2,1,1,1,1,1,1]
assert Solution().countSubTrees(4, [[0,1],[1,2],[0,3]], "bbbb") == [4,2,1,1]
assert Solution().countSubTrees(5, [[0,1],[0,2],[1,3],[0,4]], "aabab") == [3,2,1,1,1]
# Single node
assert Solution().countSubTrees(1, [], "a") == [1]
# All unique labels
assert Solution().countSubTrees(3, [[0,1],[0,2]], "abc") == [1,1,1]
# Linear tree
assert Solution