LeetCode 1045 - Customers Who Bought All Products
This problem asks us to identify all customers who have purchased every product listed in the Product table. In other words, a customer is eligible for the output if, for each productkey in the Product table, there exists a corresponding row in the Customer table where that…
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to identify all customers who have purchased every product listed in the Product table. In other words, a customer is eligible for the output if, for each product_key in the Product table, there exists a corresponding row in the Customer table where that customer bought that product.
The input consists of two relational tables. The Customer table records individual purchases with possible duplicates, meaning a customer may have purchased the same product multiple times. The Product table lists all products available, each with a unique product_key.
The expected output is a list of customer_id values representing customers who have purchased all products. The order of the output does not matter.
Key constraints and observations include: the Customer table may contain duplicates, so counting unique purchases per customer is essential. Every product_key in Customer is guaranteed to reference a valid product in the Product table. There may be edge cases where no customer has purchased all products, or where the Product table has only one item, which makes the problem trivially satisfied by any customer who purchased that item.
Approaches
Brute-Force Approach
The brute-force approach involves iterating through each customer and checking, one by one, whether they have purchased every product in the Product table. This can be done by collecting the set of products each customer bought and comparing it against the set of all products.
While correct, this method is inefficient because it requires multiple scans of the data. For n customers and m products, the set comparison step can lead to O(n * m) operations, and building the sets for all customers requires additional memory overhead.
Optimal Approach
The key insight is to leverage aggregation. We can count the number of distinct products each customer has purchased and compare it to the total number of products in the Product table. If the counts match, it guarantees that the customer bought all products. This approach reduces the need for repeated set comparisons and allows the database engine to efficiently compute the result using GROUP BY and HAVING clauses.
This method efficiently handles duplicates, as counting distinct product_key ensures multiple purchases of the same product do not inflate the count.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m) | O(n * m) | Compare each customer's purchased products to the full product list. Inefficient for large tables. |
| Optimal | O(c + p) | O(c) | Count distinct products per customer and compare to total products. Efficient aggregation. |
Here, c is the number of rows in Customer and p is the number of rows in Product.
Algorithm Walkthrough
- Determine the total number of products by counting all rows in the
Producttable. This gives the target number of products each customer must have purchased. - Query the
Customertable, grouping bycustomer_id, and count the distinctproduct_keyvalues for each customer. This ensures duplicates do not inflate the count. - Use a
HAVINGclause to filter only those customers whose count of distinct products matches the total number of products. - Return the
customer_idvalues of these customers.
Why it works: The invariant here is that any customer whose count of distinct products equals the total number of products must have purchased every product. Duplicates do not affect the count because we only consider distinct products per customer.
Python Solution
import sqlite3
from typing import List, Tuple
def customers_who_bought_all_products(conn: sqlite3.Connection) -> List[Tuple[int]]:
query = """
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)
"""
cursor = conn.cursor()
cursor.execute(query)
return cursor.fetchall()
In this implementation, we use a single SQL query to perform the aggregation. The GROUP BY clause groups the purchases by customer_id, COUNT(DISTINCT product_key) ensures duplicates are ignored, and the HAVING clause filters only those customers who bought all products. The subquery (SELECT COUNT(*) FROM Product) calculates the total number of products.
Go Solution
package main
import (
"database/sql"
_ "github.com/mattn/go-sqlite3"
)
func CustomersWhoBoughtAllProducts(db *sql.DB) ([]int, error) {
query := `
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)
`
rows, err := db.Query(query)
if err != nil {
return nil, err
}
defer rows.Close()
var result []int
for rows.Next() {
var customerID int
if err := rows.Scan(&customerID); err != nil {
return nil, err
}
result = append(result, customerID)
}
return result, nil
}
In Go, we execute the same SQL query. The differences include iterating through rows.Next() to collect results and explicitly handling errors from Query and Scan. Slices are used to store the results dynamically.
Worked Examples
Example Input:
Customer table:
| customer_id | product_key |
|---|---|
| 1 | 5 |
| 2 | 6 |
| 3 | 5 |
| 3 | 6 |
| 1 | 6 |
Product table:
| product_key |
|---|
| 5 |
| 6 |
Step-by-step execution:
- Count total products: 2 (
5and6). - Count distinct products per customer:
- Customer 1: {5, 6} → 2
- Customer 2: {6} → 1
- Customer 3: {5, 6} → 2
- Compare to total products:
- Customer 1 → 2 == 2 → include
- Customer 2 → 1 != 2 → exclude
- Customer 3 → 2 == 2 → include
Output:
| customer_id |
|---|
| 1 |
| 3 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(c + p) | Single pass through Customer for grouping plus a count of products in Product. |
| Space | O(c) | Store intermediate aggregation per customer. |
The time complexity is linear in the number of rows in the two tables. Aggregation reduces the need for repeated set comparisons.
Test Cases
# test cases
assert customers_who_bought_all_products(conn) == [(1,), (3,)] # Example case
assert customers_who_bought_all_products(conn_empty_customer) == [] # No customers
assert customers_who_bought_all_products(conn_single_product) == [(1,), (2,)] # One product, multiple customers bought it
assert customers_who_bought_all_products(conn_duplicate_rows) == [(1,)] # Customer bought all but duplicates present
assert customers_who_bought_all_products(conn_no_complete_customer) == [] # No one bought all products
| Test | Why |
|---|---|
| Example case | Validates correct aggregation and filtering. |
| No customers | Handles empty Customer table. |
| Single product | Checks trivial full coverage. |
| Duplicate rows | Ensures DISTINCT counting handles duplicates. |
| No complete customer | Ensures no false positives when no one bought all products. |
Edge Cases
One important edge case is when the Customer table is empty. In this scenario, the query must return an empty result, which our approach handles correctly because no customer meets the required product count.
Another edge case is when the Product table contains only one product. Every customer who bought that product should be included, and the algorithm handles this by comparing against a total count of one.
A third edge case is the presence of duplicate rows in the Customer table. If a customer bought the same product multiple times, a naive count could overcount. Using COUNT(DISTINCT product_key) ensures duplicates do not affect the outcome.
These cases demonstrate that the aggregation-based solution is robust and handles variations in input naturally.