LeetCode 1045 - Customers Who Bought All Products

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to identify all customers who have purchased every product listed in the Product table. In other words, a customer is eligible for the output if, for each product_key in the Product table, there exists a corresponding row in the Customer table where that customer bought that product.

The input consists of two relational tables. The Customer table records individual purchases with possible duplicates, meaning a customer may have purchased the same product multiple times. The Product table lists all products available, each with a unique product_key.

The expected output is a list of customer_id values representing customers who have purchased all products. The order of the output does not matter.

Key constraints and observations include: the Customer table may contain duplicates, so counting unique purchases per customer is essential. Every product_key in Customer is guaranteed to reference a valid product in the Product table. There may be edge cases where no customer has purchased all products, or where the Product table has only one item, which makes the problem trivially satisfied by any customer who purchased that item.

Approaches

Brute-Force Approach

The brute-force approach involves iterating through each customer and checking, one by one, whether they have purchased every product in the Product table. This can be done by collecting the set of products each customer bought and comparing it against the set of all products.

While correct, this method is inefficient because it requires multiple scans of the data. For n customers and m products, the set comparison step can lead to O(n * m) operations, and building the sets for all customers requires additional memory overhead.

Optimal Approach

The key insight is to leverage aggregation. We can count the number of distinct products each customer has purchased and compare it to the total number of products in the Product table. If the counts match, it guarantees that the customer bought all products. This approach reduces the need for repeated set comparisons and allows the database engine to efficiently compute the result using GROUP BY and HAVING clauses.

This method efficiently handles duplicates, as counting distinct product_key ensures multiple purchases of the same product do not inflate the count.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n * m)	O(n * m)	Compare each customer's purchased products to the full product list. Inefficient for large tables.
Optimal	O(c + p)	O(c)	Count distinct products per customer and compare to total products. Efficient aggregation.

Here, c is the number of rows in Customer and p is the number of rows in Product.

Algorithm Walkthrough

Determine the total number of products by counting all rows in the Product table. This gives the target number of products each customer must have purchased.
Query the Customer table, grouping by customer_id, and count the distinct product_key values for each customer. This ensures duplicates do not inflate the count.
Use a HAVING clause to filter only those customers whose count of distinct products matches the total number of products.
Return the customer_id values of these customers.

Why it works: The invariant here is that any customer whose count of distinct products equals the total number of products must have purchased every product. Duplicates do not affect the count because we only consider distinct products per customer.

Python Solution

import sqlite3
from typing import List, Tuple

def customers_who_bought_all_products(conn: sqlite3.Connection) -> List[Tuple[int]]:
    query = """
    SELECT customer_id
    FROM Customer
    GROUP BY customer_id
    HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)
    """
    cursor = conn.cursor()
    cursor.execute(query)
    return cursor.fetchall()

In this implementation, we use a single SQL query to perform the aggregation. The GROUP BY clause groups the purchases by customer_id, COUNT(DISTINCT product_key) ensures duplicates are ignored, and the HAVING clause filters only those customers who bought all products. The subquery (SELECT COUNT(*) FROM Product) calculates the total number of products.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/mattn/go-sqlite3"
)

func CustomersWhoBoughtAllProducts(db *sql.DB) ([]int, error) {
    query := `
        SELECT customer_id
        FROM Customer
        GROUP BY customer_id
        HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)
    `
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var result []int
    for rows.Next() {
        var customerID int
        if err := rows.Scan(&customerID); err != nil {
            return nil, err
        }
        result = append(result, customerID)
    }
    return result, nil
}

In Go, we execute the same SQL query. The differences include iterating through rows.Next() to collect results and explicitly handling errors from Query and Scan. Slices are used to store the results dynamically.

Worked Examples

Example Input:

Customer table:

customer_id	product_key
1	5
2	6
3	5
3	6
1	6

Product table:

product_key
5
6

Step-by-step execution:

Count total products: 2 (5 and 6).
Count distinct products per customer:

Customer 1: {5, 6} → 2
Customer 2: {6} → 1
Customer 3: {5, 6} → 2

Compare to total products:

Customer 1 → 2 == 2 → include
Customer 2 → 1 != 2 → exclude
Customer 3 → 2 == 2 → include

Output:

customer_id
1
3

Complexity Analysis

Measure	Complexity	Explanation
Time	O(c + p)	Single pass through `Customer` for grouping plus a count of products in `Product`.
Space	O(c)	Store intermediate aggregation per customer.

The time complexity is linear in the number of rows in the two tables. Aggregation reduces the need for repeated set comparisons.

Test Cases

# test cases
assert customers_who_bought_all_products(conn) == [(1,), (3,)]  # Example case
assert customers_who_bought_all_products(conn_empty_customer) == []  # No customers
assert customers_who_bought_all_products(conn_single_product) == [(1,), (2,)]  # One product, multiple customers bought it
assert customers_who_bought_all_products(conn_duplicate_rows) == [(1,)]  # Customer bought all but duplicates present
assert customers_who_bought_all_products(conn_no_complete_customer) == []  # No one bought all products

Test	Why
Example case	Validates correct aggregation and filtering.
No customers	Handles empty Customer table.
Single product	Checks trivial full coverage.
Duplicate rows	Ensures DISTINCT counting handles duplicates.
No complete customer	Ensures no false positives when no one bought all products.

Edge Cases

One important edge case is when the Customer table is empty. In this scenario, the query must return an empty result, which our approach handles correctly because no customer meets the required product count.

Another edge case is when the Product table contains only one product. Every customer who bought that product should be included, and the algorithm handles this by comparing against a total count of one.

A third edge case is the presence of duplicate rows in the Customer table. If a customer bought the same product multiple times, a naive count could overcount. Using COUNT(DISTINCT product_key) ensures duplicates do not affect the outcome.

These cases demonstrate that the aggregation-based solution is robust and handles variations in input naturally.