LeetCode 1767 - Find the Subtasks That Did Not Execute

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem revolves around identifying which subtasks of a task were not executed. We are given two tables: Tasks and Executed. The Tasks table lists all tasks and the number of subtasks each task is divided into, while the Executed table lists subtasks that successfully ran for a given task. The goal is to produce a table containing all (task_id, subtask_id) pairs that were supposed to run according to Tasks but are missing in Executed.

In other words, for every task, we must generate all subtasks from 1 to subtasks_count and subtract the ones already executed. The remaining subtasks are the missing ones that need to be reported. Constraints tell us that each task has at least two subtasks and at most 20, so the number of subtasks per task is small and manageable. The Executed table guarantees that no subtask ID exceeds the number of subtasks for its task, so we do not need to validate IDs.

Important edge cases include tasks with no executed subtasks (all are missing), tasks where all subtasks have been executed (none are missing), and tasks with only some subtasks executed. The solution should correctly handle all three scenarios.

Approaches

The brute-force approach would involve generating all possible (task_id, subtask_id) pairs for every task and then filtering out the ones present in the Executed table using a join or in-memory check. While simple, this can be inefficient because it generates up to 20 subtasks per task and then checks existence for each pair, leading to unnecessary computation and large intermediate results if the number of tasks is big.

The optimal approach relies on generating all potential subtask IDs for each task and performing a left join with the Executed table. Then, we filter only the NULL matches, which represents missing executions. This leverages SQL set operations efficiently without generating redundant checks or doing complex filtering.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(T * S)	O(T * S)	Generate all subtasks and check existence in `Executed`
Optimal	O(T * S)	O(T * S)	Generate all subtasks and use LEFT JOIN with filter on NULL

Here T is the number of tasks and S is the maximum subtasks count (≤ 20).

Algorithm Walkthrough

For each task in the Tasks table, generate all subtask IDs from 1 up to subtasks_count. This can be done using a numbers table or SQL generate_series in databases that support it.
Left join this generated list of subtasks with the Executed table on both task_id and subtask_id.
Filter the rows where subtask_id from the Executed table is NULL, meaning that this subtask has not been executed.
Return the resulting (task_id, subtask_id) pairs as the final result.

Why it works: Each task's full list of potential subtasks is created, so any missing execution can be detected as a NULL after the left join. The invariant is that the left join always preserves all possible subtask IDs for a task, and filtering for NULL precisely captures what is missing.

Python Solution

import sqlite3
from typing import List, Tuple

def find_missing_subtasks(tasks: List[Tuple[int, int]], executed: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    
    cur.execute("CREATE TABLE Tasks(task_id INT, subtasks_count INT)")
    cur.execute("CREATE TABLE Executed(task_id INT, subtask_id INT)")
    
    cur.executemany("INSERT INTO Tasks VALUES (?, ?)", tasks)
    cur.executemany("INSERT INTO Executed VALUES (?, ?)", executed)
    
    query = """
    WITH numbers(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 20
    ),
    all_subtasks AS (
        SELECT t.task_id, n.n AS subtask_id
        FROM Tasks t
        JOIN numbers n ON n.n <= t.subtasks_count
    )
    SELECT a.task_id, a.subtask_id
    FROM all_subtasks a
    LEFT JOIN Executed e
    ON a.task_id = e.task_id AND a.subtask_id = e.subtask_id
    WHERE e.subtask_id IS NULL
    """
    
    cur.execute("PRAGMA recursive_triggers = ON")
    cur.execute(query)
    result = cur.fetchall()
    conn.close()
    return result

This Python solution simulates SQL execution using SQLite. A recursive CTE numbers generates integers from 1 to 20, enough to cover all subtasks. Then all_subtasks computes all task-subtask combinations, and a left join with Executed identifies missing subtasks.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/mattn/go-sqlite3"
)

func findMissingSubtasks(tasks [][2]int, executed [][2]int) [][2]int {
    db, _ := sql.Open("sqlite3", ":memory:")
    defer db.Close()

    db.Exec("CREATE TABLE Tasks(task_id INT, subtasks_count INT)")
    db.Exec("CREATE TABLE Executed(task_id INT, subtask_id INT)")

    for _, t := range tasks {
        db.Exec("INSERT INTO Tasks VALUES (?, ?)", t[0], t[1])
    }
    for _, e := range executed {
        db.Exec("INSERT INTO Executed VALUES (?, ?)", e[0], e[1])
    }

    query := `
    WITH RECURSIVE numbers(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 20
    ),
    all_subtasks AS (
        SELECT t.task_id, n.n AS subtask_id
        FROM Tasks t
        JOIN numbers n ON n.n <= t.subtasks_count
    )
    SELECT a.task_id, a.subtask_id
    FROM all_subtasks a
    LEFT JOIN Executed e
    ON a.task_id = e.task_id AND a.subtask_id = e.subtask_id
    WHERE e.subtask_id IS NULL
    `

    rows, _ := db.Query(query)
    defer rows.Close()

    var result [][2]int
    for rows.Next() {
        var task_id, subtask_id int
        rows.Scan(&task_id, &subtask_id)
        result = append(result, [2]int{task_id, subtask_id})
    }
    return result
}

The Go version mirrors the Python implementation. Go requires explicit slice handling and the use of database/sql to execute queries. Recursion in CTEs works similarly to Python/SQLite.

Worked Examples

Example 1 Input

Tasks table:

task_id	subtasks_count
1	3
2	2
3	4

Executed table:

task_id	subtask_id
1	2
3	1
3	2
3	3
3	4

Step by Step Execution

Generate numbers 1 to 20.
Join with Tasks to get all subtasks:

task_id	subtask_id
1	1
1	2
1	3
2	1
2	2
3	1
3	2
3	3
3	4

Left join with Executed and filter NULL:

task_id	subtask_id
1	1
1	3
2	1
2	2

Output matches expected.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(T * S)	For each task, generate up to 20 subtasks and perform a join
Space	O(T * S)	Store generated subtask pairs in memory for joining/filtering

The approach scales well because S is bounded (≤20). The left join ensures that each task is handled independently and efficiently.

Test Cases

# test cases
assert sorted(find_missing_subtasks([(1,3),(2,2),(3,4)], [(1,2),(3,1),(3,2),(3,3),(3,4)])) == sorted([(1,1),(1,3),(2,1),(2,2)])  # example case
assert sorted(find_missing_subtasks([(1,2)], [])) == sorted([(1,1),(1,2)])  # no executed subtasks
assert sorted(find_missing_subtasks([(1,2)], [(1,1),(1,2)])) == []  # all executed
assert sorted(find_missing_subtasks([(1,3)], [(1,2)])) == sorted([(1,1),(1,3)])  # partial execution
assert sorted(find_missing_subtasks([(1,1)], [])) == [(1,1)]