LeetCode 2003 - Smallest Missing Genetic Value in Each Subtree

This problem asks us to compute, for every node in a rooted tree, the smallest positive integer that does not appear in the subtree rooted at that node. We are given two arrays: parents describes the tree structure, and nums contains distinct genetic values assigned to each node.

LeetCode Problem 2003

Difficulty: 🔴 Hard
Topics: Array, Dynamic Programming, Tree, Depth-First Search, Union-Find

Solution

Problem Understanding

This problem asks us to compute, for every node in a rooted tree, the smallest positive integer that does not appear in the subtree rooted at that node. We are given two arrays: parents describes the tree structure, and nums contains distinct genetic values assigned to each node. Node 0 is the root, and all nodes are numbered from 0 to n - 1.

The input parents tells us, for each node, who its parent is. The input nums assigns each node a unique integer in the range [1, 10^5]. The output is an array of length n where ans[i] is the smallest missing number in the subtree rooted at node i.

The key constraints to note are the size of the tree (n can be up to 10^5) and the value range of nums (1 to 10^5). These constraints make brute-force solutions that scan every subtree too slow. Additionally, the uniqueness of nums[i] simplifies checking for missing values because we do not need to handle duplicates.

Important edge cases include trees where 1 is not present at all, trees where the maximum number is at the root, and linear chains where the missing value propagates through ancestors.

Approaches

A naive approach would be to, for each node, traverse its subtree and collect all genetic values into a set. Then, for each node, we would scan from 1 upwards to find the first missing number. While this approach is correct, it requires traversing all descendants for every node, leading to a worst-case complexity of O(n^2) for large trees, which is infeasible given n <= 10^5.

The optimal approach leverages the observation that the missing values propagate from the leaf where nums[i] = 1 exists. Only the path from the node containing 1 to the root will have missing values greater than 1. Therefore, we can first detect if 1 exists; if it does not, every node has 1 as the smallest missing value. Otherwise, we can incrementally mark the numbers present in the subtree as we move from the node containing 1 up to the root, keeping track of the smallest missing number dynamically. Using a set or boolean array for presence allows this propagation efficiently, reducing redundant subtree traversals.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n) Collect all values in subtree for every node, scan for smallest missing
Optimal O(n + maxVal) O(n + maxVal) Track presence along the path from node containing 1 up to root, dynamically update missing values

Algorithm Walkthrough

  1. Build an adjacency list tree from the parents array to represent the tree structure. This allows quick access to children for DFS traversal.

  2. Initialize the answer array ans with all values as 1 because 1 is the minimum possible missing value.

  3. If 1 does not exist in nums, return ans immediately since 1 is missing for all subtrees.

  4. If 1 exists, find its index start. This is the node where the propagation of missing values begins.

  5. Initialize a boolean array seen to mark which genetic values have been observed. Initially, all values are unmarked.

  6. Start from node start and traverse up the tree to the root:

  7. For the current node, mark its nums[node] as seen.

  8. Recursively mark all numbers in its children subtrees (excluding already processed ancestors) as seen.

  9. Incrementally update the smallest missing number mex until you find a number that has not been seen.

  10. Set ans[node] = mex.

  11. Continue this process until reaching the root.

  12. Return ans array.

Why it works: By only processing the path from the node containing 1 to the root and marking seen numbers in all subtrees of that path, we guarantee that all relevant numbers are considered exactly once. Nodes outside this path always have 1 as the missing value, which satisfies the correctness.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def smallestMissingValueSubtree(self, parents: List[int], nums: List[int]) -> List[int]:
        n = len(parents)
        tree = defaultdict(list)
        for i, p in enumerate(parents):
            if p != -1:
                tree[p].append(i)
        
        ans = [1] * n
        num_to_node = {num: i for i, num in enumerate(nums)}
        if 1 not in num_to_node:
            return ans
        
        seen = [False] * (10**5 + 2)
        start = num_to_node[1]
        mex = 1
        
        def dfs(node: int):
            if seen[nums[node]]:
                return
            seen[nums[node]] = True
            for child in tree[node]:
                dfs(child)
        
        node = start
        while node != -1:
            dfs(node)
            while seen[mex]:
                mex += 1
            ans[node] = mex
            node = parents[node]
        
        return ans

The code first constructs the tree, maps values to nodes, and checks if 1 exists. If 1 is absent, the answer is trivial. Otherwise, we use a DFS to mark seen values in subtrees along the path from the node containing 1 to the root. The mex variable tracks the smallest missing value. For each ancestor, after processing the subtree, we update mex and record it in ans.

Go Solution

package main

func smallestMissingValueSubtree(parents []int, nums []int) []int {
    n := len(parents)
    tree := make([][]int, n)
    for i := 0; i < n; i++ {
        if parents[i] != -1 {
            tree[parents[i]] = append(tree[parents[i]], i)
        }
    }
    
    ans := make([]int, n)
    for i := range ans {
        ans[i] = 1
    }
    
    numToNode := make(map[int]int)
    for i, num := range nums {
        numToNode[num] = i
    }
    
    if _, ok := numToNode[1]; !ok {
        return ans
    }
    
    seen := make([]bool, 100002)
    mex := 1
    start := numToNode[1]
    
    var dfs func(int)
    dfs = func(node int) {
        if seen[nums[node]] {
            return
        }
        seen[nums[node]] = true
        for _, child := range tree[node] {
            dfs(child)
        }
    }
    
    node := start
    for node != -1 {
        dfs(node)
        for seen[mex] {
            mex++
        }
        ans[node] = mex
        node = parents[node]
    }
    
    return ans
}

The Go solution mirrors the Python approach. Slice initialization, map handling, and recursion are adapted to Go syntax. Slices are used for the adjacency list and seen array. The same logic for DFS and mex updating applies.

Worked Examples

Example 1: parents = [-1,0,0,2], nums = [1,2,3,4]

Step Node DFS Marks Seen Mex ans
Start at 1 0 [1,2,3,4] 5 ans[0]=5
Node 1 1 [1,2,3,4] 1 ans[1]=1
Node 2 2 [1,2,3,4] 1 ans[2]=1
Node 3 3 [1,2,3,4] 1 ans[3]=1

Example 2: parents = [-1,0,1,0,3,3], nums = [5,4,6,2,1,3]

Similar DFS propagation updates mex incrementally along the path from node containing 1 to root, producing [7,1,1,4,2,1].

Complexity Analysis

Measure Complexity Explanation
Time O(n + maxVal) Each node is visited once along the path from node 1 to root, and each number up to 10^5 is processed once in seen.
Space O(n + maxVal) Tree adjacency list, answer array, and seen array.

The complexity is dominated by the seen array of size 10^5 and traversal along the path from node 1 to root.

Test Cases

# Basic examples
assert Solution().smallestMissingValueSubtree([-1,0,0,2], [1,2,3,4]) == [5,1,1,1]
assert Solution().smallestMissingValueSubtree([-1,0,1,0,3,3], [5,4,6,2,1,3]) == [7,1,1,4,2,1]
assert Solution().smallestMissingValueSubtree([-1,2,3,