LeetCode 1965 - Employees With Missing Information

LeetCode Problem 1965

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem provides two relational tables, one named Employees and another named Salaries, both keyed by employee_id. Each employee may or may not appear in both tables. The Employees table contains the employee’s name, while the Salaries table contains the employee’s salary.

The task is to identify all employee_id values for which information is incomplete. An employee is considered to have missing information if either their name is absent from the Employees table or their salary is absent from the Salaries table. In other words, we are looking for IDs that are not fully present in both datasets.

The expected output is a single-column result containing these problematic employee_id values sorted in ascending order.

Although the problem is framed in a database context, it fundamentally reduces to a set reconciliation problem between two collections of IDs. The constraints are typical of LeetCode easy-level database problems, implying that the tables are small to moderately sized and can be processed using straightforward joins or in-memory set operations.

Edge cases include scenarios where one table is empty, where both tables are identical, or where there is no overlap at all between the two tables. The problem guarantees uniqueness of employee_id within each table, which simplifies reasoning because there are no duplicate keys to handle.

Approaches

The brute-force approach conceptually considers every possible employee ID that appears in either table and checks, one by one, whether it exists in both tables. This can be simulated by forming the union of all IDs and then scanning each ID against both tables. While this is correct, it is inefficient because each membership check may require scanning an entire table, leading to quadratic behavior in the worst case.

The optimal insight is that we do not actually need to inspect names or salaries at all. We only need to reason about the presence or absence of employee_id across the two tables. This reduces the problem to set operations. Specifically, we compute the union of IDs from both tables and then remove those that appear in both, which is equivalent to computing the symmetric difference between the two ID sets. This is efficient because set operations are typically linear in time.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n × m)	O(n + m)	For each ID, scan both tables to check existence
Optimal	O(n + m)	O(n + m)	Use set operations or hash-based membership

Algorithm Walkthrough

First, extract all employee_id values from the Employees table and store them in a set. This allows constant-time membership checks and avoids repeated scanning of the table.
Next, extract all employee_id values from the Salaries table and store them in another set for the same reason, enabling efficient comparison.
Compute the union of both sets. This represents every employee ID that appears in at least one of the tables.
Compute the intersection of both sets. This represents employees who have complete information, meaning they appear in both tables.
Subtract the intersection set from the union set. The resulting set contains only those employee IDs that are missing from one of the tables, which is exactly the definition of missing information.
Convert the resulting set into a sorted list because the problem requires the output to be ordered by employee_id in ascending order.

Why it works

The correctness relies on a simple invariant: an employee is valid if and only if their ID appears in both sets. Any ID outside this intersection must be missing either a name or a salary. By computing the symmetric difference, we precisely isolate those IDs that fail this completeness condition.

Python Solution

from typing import List

class Solution:
    def findEmployeesWithMissingInfo(self, employees: List[List[int]], salaries: List[List[int]]) -> List[int]:
        employee_ids = {emp_id for emp_id, _ in employees}
        salary_ids = {emp_id for emp_id, _ in salaries}
        
        all_ids = employee_ids | salary_ids
        complete_ids = employee_ids & salary_ids
        
        missing_ids = sorted(all_ids - complete_ids)
        return missing_ids

The Python implementation directly mirrors the algorithmic idea. We first construct two sets: one for employee IDs from the Employees table and one for IDs from the Salaries table. We then compute the union and intersection of these sets. The final result is obtained by subtracting the intersection from the union, which yields the symmetric difference. Sorting ensures the output matches the required ordering constraint.

Go Solution

func findEmployeesWithMissingInfo(employees [][]int, salaries [][]int) []int {
    empSet := make(map[int]bool)
    salSet := make(map[int]bool)

    for _, row := range employees {
        empSet[row[0]] = true
    }

    for _, row := range salaries {
        salSet[row[0]] = true
    }

    result := make([]int, 0)

    for id := range empSet {
        if !salSet[id] {
            result = append(result, id)
        }
    }

    for id := range salSet {
        if !empSet[id] {
            result = append(result, id)
        }
    }

    // simple insertion sort or rely on built-in sort
    for i := 0; i < len(result); i++ {
        for j := i + 1; j < len(result); j++ {
            if result[j] < result[i] {
                result[i], result[j] = result[j], result[i]
            }
        }
    }

    return result
}

In the Go implementation, we use hash maps (map[int]bool) to simulate sets, since Go does not have a built-in set type. We iterate through both input tables to populate these maps. We then collect IDs that are missing from either side by checking membership across the two maps. Finally, we sort the result slice to satisfy the ordering requirement. The sorting is implemented manually here for completeness, although in production Go code we would typically use sort.Ints.

Worked Examples

Using the sample input, we first build the two sets.

The Employees table yields the set {2, 4, 5}.

The Salaries table yields the set {1, 4, 5}.

We compute the union, which is {1, 2, 4, 5}.

We compute the intersection, which is {4, 5}.

Subtracting gives {1, 2}.

After sorting, the final output is [1, 2].

At each step, the sets remain small and easy to track, and no intermediate transformation changes the correctness of the symmetric difference logic.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n + m)	Each table is scanned once to build sets, and set operations are linear
Space	O(n + m)	Two hash sets store all unique employee IDs

The time complexity is linear because each row in both tables is processed a constant number of times. The space complexity is also linear since we store all unique IDs from both tables.

Test Cases

# Provided example
assert Solution().findEmployeesWithMissingInfo(
    [[2, "Crew"], [4, "Haven"], [5, "Kristian"]],
    [[5, 76071], [1, 22517], [4, 63539]]
) == [1, 2]

# Both tables identical, no missing info
assert Solution().findEmployeesWithMissingInfo(
    [[1, "A"], [2, "B"]],
    [[1, 100], [2, 200]]
) == []

# Employees table empty
assert Solution().findEmployeesWithMissingInfo(
    [],
    [[1, 100], [2, 200]]
) == [1, 2]

# Salaries table empty
assert Solution().findEmployeesWithMissingInfo(
    [[1, "A"], [2, "B"]],
    []
) == [1, 2]

# No overlap at all
assert Solution().findEmployeesWithMissingInfo(
    [[1, "A"], [2, "B"]],
    [[3, 100], [4, 200]]
) == [1, 2, 3, 4]

Test	Why
provided example input	verifies standard mixed missing cases
identical tables	verifies empty output case
empty employees table	verifies full salary IDs returned
empty salaries table	verifies full employee IDs returned
disjoint tables	verifies full symmetric difference behavior

Edge Cases

One important edge case is when one of the tables is empty. In that situation, every ID in the non-empty table should be returned because all of them are missing corresponding information. The set-based implementation naturally handles this because the empty set contributes nothing to the intersection.

Another edge case occurs when both tables contain exactly the same employee_id values. In this case, no output should be returned because every employee has both a name and a salary. The intersection equals the union, so the symmetric difference becomes empty.

A third edge case is when there is no overlap between the two tables at all. Here, every employee is missing either a name or a salary, so the result should include all IDs from both tables. The union becomes the final output since the intersection is empty, and the subtraction step leaves all elements intact.