LeetCode 928 - Minimize Malware Spread II
The problem is about a network of computers represented as an undirected graph using an adjacency matrix. Each node represents a computer, and an edge between two nodes indicates a direct connection.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, Depth-First Search, Breadth-First Search, Union-Find, Graph Theory
Solution
Problem Understanding
The problem is about a network of computers represented as an undirected graph using an adjacency matrix. Each node represents a computer, and an edge between two nodes indicates a direct connection. Some nodes are initially infected with malware, and the infection spreads along the network: if a node is infected, all nodes directly connected to it will also become infected, recursively, until no more nodes can be infected.
The task is to remove exactly one node from the initially infected list in order to minimize the final number of infected nodes. If there are multiple nodes that achieve the same minimal infection, we must return the node with the smallest index.
The input consists of the adjacency matrix graph (size n x n, n ≤ 300) and a list initial of infected nodes. The adjacency matrix is symmetric (graph[i][j] == graph[j][i]) and each node is connected to itself (graph[i][i] == 1). The initial list contains unique indices representing initially infected nodes.
Key observations include the fact that malware spreads to all nodes in a connected component that contains at least one infected node. Removing a node can prevent the infection from spreading in components where that node is the only infected node, but if a component contains multiple infected nodes, removing a single node does not prevent the infection in that component.
Important edge cases include networks where all infected nodes belong to the same connected component, where multiple nodes can equally reduce the infection, or where some infected nodes are isolated.
Approaches
The brute-force approach would simulate the infection process for each node in initial removed. For each candidate removal, we would mark nodes as infected starting from the remaining infected nodes, performing a BFS or DFS for every candidate. The node removal that results in the smallest final infection would be returned. While correct, this approach is too slow because for each node removal we may traverse the entire graph, resulting in time complexity O(n^3) in the worst case for n ≤ 300.
The key insight for the optimal solution is that the final number of infected nodes can be reasoned about component by component. We can first identify all connected components in the graph. For each component, if it contains exactly one initially infected node, removing that node prevents the whole component from being infected. If it contains multiple initially infected nodes, the component will be infected regardless. This insight allows us to avoid simulating the infection for each candidate removal and instead compute the effect directly using component sizes and counts of initially infected nodes.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^3) | O(n^2) | Simulate infection for each node removal using DFS/BFS |
| Optimal | O(n^2) | O(n) | Identify connected components and track infection contributions per component |
Algorithm Walkthrough
- Sort the
initiallist to satisfy the tie-breaking requirement (smallest index node is preferred). - Compute all connected components of the graph. This can be done using DFS or BFS starting from unvisited nodes. For each node, store which component it belongs to and the size of each component.
- Count, for each component, how many initially infected nodes are present.
- For each initially infected node, check the size of its component. If its component contains exactly one infected node (itself), the contribution to reducing the infection by removing it is the size of that component.
- Keep track of the node that maximizes the reduction in infected nodes. If multiple nodes yield the same reduction, the earlier node in sorted
initialis chosen. - Return the node identified in step 5.
Why it works: The invariant is that any component with multiple initially infected nodes will always be fully infected regardless of which single node is removed. Components with exactly one infected node can be fully saved by removing that node. Therefore, the algorithm correctly identifies the node whose removal reduces the most infections.
Python Solution
from typing import List
class Solution:
def minMalwareSpread(self, graph: List[List[int]], initial: List[int]) -> int:
n = len(graph)
# Step 1: Sort initial for tie-breaking
initial.sort()
# Step 2: Find connected components
visited = [False] * n
component_id = [-1] * n
component_size = []
def dfs(node: int, cid: int) -> int:
visited[node] = True
component_id[node] = cid
size = 1
for nei, connected in enumerate(graph[node]):
if connected and not visited[nei]:
size += dfs(nei, cid)
return size
cid = 0
for node in range(n):
if not visited[node]:
size = dfs(node, cid)
component_size.append(size)
cid += 1
# Step 3: Count initially infected nodes per component
infected_count = [0] * cid
for node in initial:
infected_count[component_id[node]] += 1
# Step 4: Find the best node to remove
result = initial[0]
max_saved = -1
for node in initial:
c = component_id[node]
if infected_count[c] == 1:
if component_size[c] > max_saved:
max_saved = component_size[c]
result = node
return result
Explanation:
We first sort initial for tie-breaking. We compute connected components using DFS, tracking the size and component ID for each node. Then, we count how many initially infected nodes are in each component. Only nodes that are the sole infected node in a component can reduce the infection. We select the node that saves the largest component.
Go Solution
func minMalwareSpread(graph [][]int, initial []int) int {
n := len(graph)
// Step 1: Sort initial for tie-breaking
sort.Ints(initial)
// Step 2: Find connected components
visited := make([]bool, n)
componentID := make([]int, n)
componentSize := []int{}
var dfs func(node, cid int) int
dfs = func(node, cid int) int {
visited[node] = true
componentID[node] = cid
size := 1
for nei, connected := range graph[node] {
if connected == 1 && !visited[nei] {
size += dfs(nei, cid)
}
}
return size
}
cid := 0
for i := 0; i < n; i++ {
if !visited[i] {
size := dfs(i, cid)
componentSize = append(componentSize, size)
cid++
}
}
// Step 3: Count initially infected nodes per component
infectedCount := make([]int, cid)
for _, node := range initial {
infectedCount[componentID[node]]++
}
// Step 4: Find the best node to remove
result := initial[0]
maxSaved := -1
for _, node := range initial {
c := componentID[node]
if infectedCount[c] == 1 && componentSize[c] > maxSaved {
maxSaved = componentSize[c]
result = node
}
}
return result
}
Explanation: Go implementation mirrors Python closely. We use slices for visited tracking, component sizes, and IDs. The DFS is implemented as a closure to allow recursion while accessing outer variables. Sorting the initial slice ensures tie-breaking by smallest index.
Worked Examples
Example 1
Input: graph = [[1,1,0],[1,1,0],[0,0,1]], initial = [0,1]
- Connected components:
- Component 0: nodes {0,1}, size 2
- Component 1: node {2}, size 1
- Initially infected counts per component: {2, 0} (nodes 0 and 1 in the same component)
- No component has exactly one infected node, so remove smallest index in
initial→ 0
Output: 0
Example 2
Input: graph = [[1,1,0],[1,1,1],[0,1,1]], initial = [0,1]
- Connected component: nodes {0,1,2}, size 3
- Infected count = 2 → cannot save any nodes
- Remove smallest index → 1
Output: 1
Example 3
Input: graph = [[1,1,0,0],[1,1,1,0],[0,1,1,1],[0,0,1,1]], initial = [0,1]
- Connected component: nodes {0,1,2,3}, size 4
- Infected count = 2 → cannot save any nodes
- Remove smallest index → 1
Output: 1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n^2) | DFS over adjacency matrix requires visiting each edge in the n x n graph |
| Space | O(n) | Arrays for visited, component ID, infected counts, and recursion stack |
Since the adjacency matrix has O(n^2) entries, DFS will traverse up to all n^2 edges in the worst case.