LeetCode 1767 - Find the Subtasks That Did Not Execute
This problem revolves around identifying which subtasks of a task were not executed. We are given two tables: Tasks and Executed.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem revolves around identifying which subtasks of a task were not executed. We are given two tables: Tasks and Executed. The Tasks table lists all tasks and the number of subtasks each task is divided into, while the Executed table lists subtasks that successfully ran for a given task. The goal is to produce a table containing all (task_id, subtask_id) pairs that were supposed to run according to Tasks but are missing in Executed.
In other words, for every task, we must generate all subtasks from 1 to subtasks_count and subtract the ones already executed. The remaining subtasks are the missing ones that need to be reported. Constraints tell us that each task has at least two subtasks and at most 20, so the number of subtasks per task is small and manageable. The Executed table guarantees that no subtask ID exceeds the number of subtasks for its task, so we do not need to validate IDs.
Important edge cases include tasks with no executed subtasks (all are missing), tasks where all subtasks have been executed (none are missing), and tasks with only some subtasks executed. The solution should correctly handle all three scenarios.
Approaches
The brute-force approach would involve generating all possible (task_id, subtask_id) pairs for every task and then filtering out the ones present in the Executed table using a join or in-memory check. While simple, this can be inefficient because it generates up to 20 subtasks per task and then checks existence for each pair, leading to unnecessary computation and large intermediate results if the number of tasks is big.
The optimal approach relies on generating all potential subtask IDs for each task and performing a left join with the Executed table. Then, we filter only the NULL matches, which represents missing executions. This leverages SQL set operations efficiently without generating redundant checks or doing complex filtering.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(T * S) | O(T * S) | Generate all subtasks and check existence in Executed |
| Optimal | O(T * S) | O(T * S) | Generate all subtasks and use LEFT JOIN with filter on NULL |
Here T is the number of tasks and S is the maximum subtasks count (≤ 20).
Algorithm Walkthrough
- For each task in the
Taskstable, generate all subtask IDs from 1 up tosubtasks_count. This can be done using a numbers table or SQLgenerate_seriesin databases that support it. - Left join this generated list of subtasks with the
Executedtable on bothtask_idandsubtask_id. - Filter the rows where
subtask_idfrom theExecutedtable isNULL, meaning that this subtask has not been executed. - Return the resulting
(task_id, subtask_id)pairs as the final result.
Why it works: Each task's full list of potential subtasks is created, so any missing execution can be detected as a NULL after the left join. The invariant is that the left join always preserves all possible subtask IDs for a task, and filtering for NULL precisely captures what is missing.
Python Solution
import sqlite3
from typing import List, Tuple
def find_missing_subtasks(tasks: List[Tuple[int, int]], executed: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Tasks(task_id INT, subtasks_count INT)")
cur.execute("CREATE TABLE Executed(task_id INT, subtask_id INT)")
cur.executemany("INSERT INTO Tasks VALUES (?, ?)", tasks)
cur.executemany("INSERT INTO Executed VALUES (?, ?)", executed)
query = """
WITH numbers(n) AS (
SELECT 1
UNION ALL
SELECT n + 1 FROM numbers WHERE n < 20
),
all_subtasks AS (
SELECT t.task_id, n.n AS subtask_id
FROM Tasks t
JOIN numbers n ON n.n <= t.subtasks_count
)
SELECT a.task_id, a.subtask_id
FROM all_subtasks a
LEFT JOIN Executed e
ON a.task_id = e.task_id AND a.subtask_id = e.subtask_id
WHERE e.subtask_id IS NULL
"""
cur.execute("PRAGMA recursive_triggers = ON")
cur.execute(query)
result = cur.fetchall()
conn.close()
return result
This Python solution simulates SQL execution using SQLite. A recursive CTE numbers generates integers from 1 to 20, enough to cover all subtasks. Then all_subtasks computes all task-subtask combinations, and a left join with Executed identifies missing subtasks.
Go Solution
package main
import (
"database/sql"
_ "github.com/mattn/go-sqlite3"
)
func findMissingSubtasks(tasks [][2]int, executed [][2]int) [][2]int {
db, _ := sql.Open("sqlite3", ":memory:")
defer db.Close()
db.Exec("CREATE TABLE Tasks(task_id INT, subtasks_count INT)")
db.Exec("CREATE TABLE Executed(task_id INT, subtask_id INT)")
for _, t := range tasks {
db.Exec("INSERT INTO Tasks VALUES (?, ?)", t[0], t[1])
}
for _, e := range executed {
db.Exec("INSERT INTO Executed VALUES (?, ?)", e[0], e[1])
}
query := `
WITH RECURSIVE numbers(n) AS (
SELECT 1
UNION ALL
SELECT n + 1 FROM numbers WHERE n < 20
),
all_subtasks AS (
SELECT t.task_id, n.n AS subtask_id
FROM Tasks t
JOIN numbers n ON n.n <= t.subtasks_count
)
SELECT a.task_id, a.subtask_id
FROM all_subtasks a
LEFT JOIN Executed e
ON a.task_id = e.task_id AND a.subtask_id = e.subtask_id
WHERE e.subtask_id IS NULL
`
rows, _ := db.Query(query)
defer rows.Close()
var result [][2]int
for rows.Next() {
var task_id, subtask_id int
rows.Scan(&task_id, &subtask_id)
result = append(result, [2]int{task_id, subtask_id})
}
return result
}
The Go version mirrors the Python implementation. Go requires explicit slice handling and the use of database/sql to execute queries. Recursion in CTEs works similarly to Python/SQLite.
Worked Examples
Example 1 Input
Tasks table:
| task_id | subtasks_count |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 4 |
Executed table:
| task_id | subtask_id |
|---|---|
| 1 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
Step by Step Execution
- Generate numbers 1 to 20.
- Join with Tasks to get all subtasks:
| task_id | subtask_id |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
- Left join with Executed and filter
NULL:
| task_id | subtask_id |
|---|---|
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
Output matches expected.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(T * S) | For each task, generate up to 20 subtasks and perform a join |
| Space | O(T * S) | Store generated subtask pairs in memory for joining/filtering |
The approach scales well because S is bounded (≤20). The left join ensures that each task is handled independently and efficiently.
Test Cases
# test cases
assert sorted(find_missing_subtasks([(1,3),(2,2),(3,4)], [(1,2),(3,1),(3,2),(3,3),(3,4)])) == sorted([(1,1),(1,3),(2,1),(2,2)]) # example case
assert sorted(find_missing_subtasks([(1,2)], [])) == sorted([(1,1),(1,2)]) # no executed subtasks
assert sorted(find_missing_subtasks([(1,2)], [(1,1),(1,2)])) == [] # all executed
assert sorted(find_missing_subtasks([(1,3)], [(1,2)])) == sorted([(1,1),(1,3)]) # partial execution
assert sorted(find_missing_subtasks([(1,1)], [])) == [(1,1)]