LeetCode 1557 - Minimum Number of Vertices to Reach All Nodes

The problem is asking us to identify the minimum set of vertices in a directed acyclic graph (DAG) such that starting from any of these vertices, all other nodes in the graph are reachable.

LeetCode Problem 1557

Difficulty: 🟑 Medium
Topics: Graph Theory

Solution

Problem Understanding

The problem is asking us to identify the minimum set of vertices in a directed acyclic graph (DAG) such that starting from any of these vertices, all other nodes in the graph are reachable. In other words, we want the smallest collection of nodes that can β€œcover” the entire graph via directed paths.

The input consists of an integer n representing the number of vertices labeled from 0 to n-1, and an array edges where each element [fromi, toi] denotes a directed edge from fromi to toi. The output is a list of vertices that forms the minimum set from which all other nodes are reachable.

The constraints indicate that the graph can have up to 10^5 nodes and up to 10^5 edges. This means any solution must be linear or near-linear in time, since quadratic solutions would be too slow.

Important observations and edge cases include:

  1. Nodes with no incoming edges must be included in the result because no other node can reach them.
  2. Nodes with incoming edges do not necessarily need to be included since they might be reachable from other nodes.
  3. The graph is guaranteed to be a DAG, so there are no cycles, simplifying reachability considerations.
  4. There is a guarantee that a unique solution exists.

Naive approaches that try to check reachability from every subset of vertices will fail due to the large input size.

Approaches

A brute-force approach would involve considering every subset of vertices and performing a reachability check (via BFS or DFS) to see if all nodes are covered. While this would be correct, it is exponentially slow and impractical for n up to 10^5.

The key observation for an optimal solution is that a node only needs to be included in the set if it has no incoming edges. If a node has at least one incoming edge, it is already reachable from some other node, so it does not need to be part of the minimum set. Thus, the problem reduces to finding all nodes with in-degree 0.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n * (n + m)) O(n + m) Check all subsets of vertices and perform reachability checks; impractical
Optimal O(n + m) O(n) Find all nodes with in-degree 0; linear scan over edges suffices

Algorithm Walkthrough

  1. Initialize an array or set has_incoming of size n to keep track of which nodes have incoming edges.
  2. Iterate over the list of edges. For each edge [fromi, toi], mark toi as having an incoming edge by setting has_incoming[toi] = True.
  3. After processing all edges, iterate over all nodes 0 to n-1. Collect all nodes i where has_incoming[i] is False.
  4. Return this collection as the result. These nodes have no incoming edges and are the minimum set from which all nodes are reachable.

Why it works: In a DAG, any node with an incoming edge is reachable from some other node. Nodes without incoming edges are the only candidates that must be included in the starting set. By collecting all such nodes, we ensure the minimum set that guarantees reachability for all nodes.

Python Solution

from typing import List

class Solution:
    def findSmallestSetOfVertices(self, n: int, edges: List[List[int]]) -> List[int]:
        has_incoming = [False] * n
        for from_node, to_node in edges:
            has_incoming[to_node] = True
        
        result = [node for node in range(n) if not has_incoming[node]]
        return result

The Python implementation initializes a boolean list to mark nodes with incoming edges. It then loops over all edges and sets the corresponding entries. Finally, a list comprehension collects all nodes with no incoming edges and returns them.

Go Solution

func findSmallestSetOfVertices(n int, edges [][]int) []int {
    hasIncoming := make([]bool, n)
    for _, edge := range edges {
        from, to := edge[0], edge[1]
        hasIncoming[to] = true
    }
    
    result := []int{}
    for node := 0; node < n; node++ {
        if !hasIncoming[node] {
            result = append(result, node)
        }
    }
    return result
}

The Go version mirrors the Python logic, using a boolean slice to track incoming edges. A standard slice is used to store the result, appending nodes that have no incoming edges.

Worked Examples

Example 1: n = 6, edges = [[0,1],[0,2],[2,5],[3,4],[4,2]]

  1. Initialize has_incoming = [False, False, False, False, False, False]
  2. Process edges:
  • [0,1] β†’ has_incoming[1] = True
  • [0,2] β†’ has_incoming[2] = True
  • [2,5] β†’ has_incoming[5] = True
  • [3,4] β†’ has_incoming[4] = True
  • [4,2] β†’ has_incoming[2] = True (already True)
  1. Final has_incoming = [False, True, True, False, True, True]
  2. Nodes with no incoming edges: [0,3] β†’ Result [0,3]

Example 2: n = 5, edges = [[0,1],[2,1],[3,1],[1,4],[2,4]]

  1. Initialize has_incoming = [False, False, False, False, False]
  2. Process edges:
  • [0,1] β†’ has_incoming[1] = True
  • [2,1] β†’ has_incoming[1] = True
  • [3,1] β†’ has_incoming[1] = True
  • [1,4] β†’ has_incoming[4] = True
  • [2,4] β†’ has_incoming[4] = True
  1. Final has_incoming = [False, True, False, False, True]
  2. Nodes with no incoming edges: [0,2,3] β†’ Result [0,2,3]

Complexity Analysis

Measure Complexity Explanation
Time O(n + m) Loop over all edges to mark in-degrees, then loop over all nodes to collect those with no incoming edges
Space O(n) Boolean array of size n to track nodes with incoming edges

The algorithm is linear with respect to the number of nodes plus edges, which is efficient for the problem constraints.

Test Cases

# Example cases
assert Solution().findSmallestSetOfVertices(6, [[0,1],[0,2],[2,5],[3,4],[4,2]]) == [0,3]  # example 1
assert Solution().findSmallestSetOfVertices(5, [[0,1],[2,1],[3,1],[1,4],[2,4]]) == [0,2,3]  # example 2

# Edge cases
assert Solution().findSmallestSetOfVertices(2, [[0,1]]) == [0]  # minimal graph
assert Solution().findSmallestSetOfVertices(3, [[0,1],[1,2]]) == [0]  # chain graph
assert Solution().findSmallestSetOfVertices(3, [[0,2],[1,2]]) == [0,1]  # multiple roots
assert Solution().findSmallestSetOfVertices(4, [[0,1],[1,2],[2,3]]) == [0]  # linear DAG
assert Solution().findSmallestSetOfVertices(4, [[0,1],[1,2],[2,3],[3,0]]) == [0,1,2,3]  # would be cycle, but DAG guarantee prevents this
Test Why
Minimal graph 2 nodes Validates handling smallest size and direct reachability
Linear DAG Ensures only first node is selected for a chain
Multiple roots Checks correct detection of multiple nodes with no incoming edges
Large chain Validates proper scaling with simple linear DAG

Edge Cases

First, the minimal graph with only two nodes ensures that the algorithm correctly identifies the single source node with no incoming edges. This prevents an off-by-one bug in the array indexing.

Second, a linear chain tests that the solution does not over-select nodes. Only the first node should be in the result even if all nodes are technically reachable sequentially.

Third, multiple independent roots check that the algorithm correctly identifies multiple nodes with no incoming edges. It ensures the collection step handles multiple sources instead of stopping after finding one. Each of these nodes is essential to cover all reachable nodes. The implementation handles these by iterating through the has_incoming array and collecting all False entries.