LeetCode 928 - Minimize Malware Spread II

LeetCode Problem 928

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Depth-First Search, Breadth-First Search, Union-Find, Graph Theory

Solution

Problem Understanding

The problem is about a network of computers represented as an undirected graph using an adjacency matrix. Each node represents a computer, and an edge between two nodes indicates a direct connection. Some nodes are initially infected with malware, and the infection spreads along the network: if a node is infected, all nodes directly connected to it will also become infected, recursively, until no more nodes can be infected.

The task is to remove exactly one node from the initially infected list in order to minimize the final number of infected nodes. If there are multiple nodes that achieve the same minimal infection, we must return the node with the smallest index.

The input consists of the adjacency matrix graph (size n x n, n ≤ 300) and a list initial of infected nodes. The adjacency matrix is symmetric (graph[i][j] == graph[j][i]) and each node is connected to itself (graph[i][i] == 1). The initial list contains unique indices representing initially infected nodes.

Key observations include the fact that malware spreads to all nodes in a connected component that contains at least one infected node. Removing a node can prevent the infection from spreading in components where that node is the only infected node, but if a component contains multiple infected nodes, removing a single node does not prevent the infection in that component.

Important edge cases include networks where all infected nodes belong to the same connected component, where multiple nodes can equally reduce the infection, or where some infected nodes are isolated.

Approaches

The brute-force approach would simulate the infection process for each node in initial removed. For each candidate removal, we would mark nodes as infected starting from the remaining infected nodes, performing a BFS or DFS for every candidate. The node removal that results in the smallest final infection would be returned. While correct, this approach is too slow because for each node removal we may traverse the entire graph, resulting in time complexity O(n^3) in the worst case for n ≤ 300.

The key insight for the optimal solution is that the final number of infected nodes can be reasoned about component by component. We can first identify all connected components in the graph. For each component, if it contains exactly one initially infected node, removing that node prevents the whole component from being infected. If it contains multiple initially infected nodes, the component will be infected regardless. This insight allows us to avoid simulating the infection for each candidate removal and instead compute the effect directly using component sizes and counts of initially infected nodes.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n^3)	O(n^2)	Simulate infection for each node removal using DFS/BFS
Optimal	O(n^2)	O(n)	Identify connected components and track infection contributions per component

Algorithm Walkthrough

Sort the initial list to satisfy the tie-breaking requirement (smallest index node is preferred).
Compute all connected components of the graph. This can be done using DFS or BFS starting from unvisited nodes. For each node, store which component it belongs to and the size of each component.
Count, for each component, how many initially infected nodes are present.
For each initially infected node, check the size of its component. If its component contains exactly one infected node (itself), the contribution to reducing the infection by removing it is the size of that component.
Keep track of the node that maximizes the reduction in infected nodes. If multiple nodes yield the same reduction, the earlier node in sorted initial is chosen.
Return the node identified in step 5.

Why it works: The invariant is that any component with multiple initially infected nodes will always be fully infected regardless of which single node is removed. Components with exactly one infected node can be fully saved by removing that node. Therefore, the algorithm correctly identifies the node whose removal reduces the most infections.

Python Solution

from typing import List

class Solution:
    def minMalwareSpread(self, graph: List[List[int]], initial: List[int]) -> int:
        n = len(graph)
        
        # Step 1: Sort initial for tie-breaking
        initial.sort()
        
        # Step 2: Find connected components
        visited = [False] * n
        component_id = [-1] * n
        component_size = []
        
        def dfs(node: int, cid: int) -> int:
            visited[node] = True
            component_id[node] = cid
            size = 1
            for nei, connected in enumerate(graph[node]):
                if connected and not visited[nei]:
                    size += dfs(nei, cid)
            return size
        
        cid = 0
        for node in range(n):
            if not visited[node]:
                size = dfs(node, cid)
                component_size.append(size)
                cid += 1
        
        # Step 3: Count initially infected nodes per component
        infected_count = [0] * cid
        for node in initial:
            infected_count[component_id[node]] += 1
        
        # Step 4: Find the best node to remove
        result = initial[0]
        max_saved = -1
        for node in initial:
            c = component_id[node]
            if infected_count[c] == 1:
                if component_size[c] > max_saved:
                    max_saved = component_size[c]
                    result = node
        
        return result

Explanation:

We first sort initial for tie-breaking. We compute connected components using DFS, tracking the size and component ID for each node. Then, we count how many initially infected nodes are in each component. Only nodes that are the sole infected node in a component can reduce the infection. We select the node that saves the largest component.

Go Solution

func minMalwareSpread(graph [][]int, initial []int) int {
    n := len(graph)
    
    // Step 1: Sort initial for tie-breaking
    sort.Ints(initial)
    
    // Step 2: Find connected components
    visited := make([]bool, n)
    componentID := make([]int, n)
    componentSize := []int{}
    
    var dfs func(node, cid int) int
    dfs = func(node, cid int) int {
        visited[node] = true
        componentID[node] = cid
        size := 1
        for nei, connected := range graph[node] {
            if connected == 1 && !visited[nei] {
                size += dfs(nei, cid)
            }
        }
        return size
    }
    
    cid := 0
    for i := 0; i < n; i++ {
        if !visited[i] {
            size := dfs(i, cid)
            componentSize = append(componentSize, size)
            cid++
        }
    }
    
    // Step 3: Count initially infected nodes per component
    infectedCount := make([]int, cid)
    for _, node := range initial {
        infectedCount[componentID[node]]++
    }
    
    // Step 4: Find the best node to remove
    result := initial[0]
    maxSaved := -1
    for _, node := range initial {
        c := componentID[node]
        if infectedCount[c] == 1 && componentSize[c] > maxSaved {
            maxSaved = componentSize[c]
            result = node
        }
    }
    
    return result
}

Explanation: Go implementation mirrors Python closely. We use slices for visited tracking, component sizes, and IDs. The DFS is implemented as a closure to allow recursion while accessing outer variables. Sorting the initial slice ensures tie-breaking by smallest index.

Worked Examples

Example 1

Input: graph = [[1,1,0],[1,1,0],[0,0,1]], initial = [0,1]

Connected components:

Component 0: nodes {0,1}, size 2
Component 1: node {2}, size 1

Initially infected counts per component: {2, 0} (nodes 0 and 1 in the same component)
No component has exactly one infected node, so remove smallest index in initial → 0

Output: 0

Example 2

Input: graph = [[1,1,0],[1,1,1],[0,1,1]], initial = [0,1]

Connected component: nodes {0,1,2}, size 3
Infected count = 2 → cannot save any nodes
Remove smallest index → 1

Output: 1

Example 3

Input: graph = [[1,1,0,0],[1,1,1,0],[0,1,1,1],[0,0,1,1]], initial = [0,1]

Connected component: nodes {0,1,2,3}, size 4
Infected count = 2 → cannot save any nodes
Remove smallest index → 1

Output: 1

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n^2)	DFS over adjacency matrix requires visiting each edge in the n x n graph
Space	O(n)	Arrays for visited, component ID, infected counts, and recursion stack

Since the adjacency matrix has O(n^2) entries, DFS will traverse up to all n^2 edges in the worst case.