LeetCode 1596 - The Most Frequently Ordered Products for Each Customer
The problem requires us to find the most frequently ordered product or products for each customer from a set of three tables: Customers, Orders, and Products. Each customerid may have multiple orders, and each order contains a productid.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem requires us to find the most frequently ordered product or products for each customer from a set of three tables: Customers, Orders, and Products. Each customer_id may have multiple orders, and each order contains a product_id. The goal is to output, for each customer who has made at least one order, the product_id and product_name of the product(s) they ordered the most.
The input represents a relational database structure where customers, their orders, and product details are stored separately. The expected output is a list of tuples (or rows) containing customer_id, product_id, and product_name for each customer's top product(s). If a customer has multiple products tied for the highest number of orders, all such products must be included. Customers with no orders are excluded from the output.
Key constraints and observations include:
- Each
order_idis unique. - A customer may order the same product multiple times, but not more than once per day.
- There may be multiple products tied for the most frequently ordered product for a customer.
- The dataset can have many customers and orders, so efficiency matters.
Important edge cases are customers with no orders, customers with multiple products tied for the maximum orders, and products that are ordered only once.
Approaches
The brute-force approach is to iterate over each customer and count the number of times they ordered each product. After counting, select the maximum and output all products with that frequency. While correct, this requires nested loops over customers and products, leading to high time complexity if the dataset is large.
The optimal approach leverages SQL window functions. First, we count the number of times each customer ordered each product. Then, we use a window function to compute the maximum order count per customer. Finally, we filter only the products that match this maximum count. This avoids repeated scanning and aggregation for each customer and ensures a single-pass computation of maximum frequency.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(C * P) | O(C * P) | Count orders for each customer-product combination manually |
| Optimal | O(N) | O(N) | Use SQL window functions to compute frequencies and max per customer efficiently |
Algorithm Walkthrough
- Join the
Orderstable with theProductstable onproduct_idto getproduct_namealong with orders. - Group the results by
customer_idandproduct_idto count how many times each customer ordered each product. This gives a frequency table per customer-product pair. - Use a window function (like
MAXover partition bycustomer_id) to determine the maximum order count for each customer. - Filter the frequency table to include only rows where the product count equals the maximum count for that customer.
- Select the required columns
customer_id,product_id, andproduct_namefor the output.
This works because the window function ensures we calculate the maximum frequency for each customer without extra queries, and filtering by this maximum directly gives the top products.
Python Solution
import pandas as pd
def most_frequent_products(customers: pd.DataFrame, orders: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
# Merge Orders with Products to get product_name
order_products = orders.merge(products, on="product_id")
# Count how many times each customer ordered each product
freq = order_products.groupby(["customer_id", "product_id", "product_name"]).size().reset_index(name="count")
# Compute maximum frequency per customer
freq["max_count"] = freq.groupby("customer_id")["count"].transform("max")
# Filter rows where count matches max_count
result = freq[freq["count"] == freq["max_count"]][["customer_id", "product_id", "product_name"]]
return result
This implementation first merges Orders with Products to access product names, then groups by customer and product to count orders. Using the transform("max") function, we compute the max order count per customer in a vectorized manner. Filtering ensures only the most frequent products are returned.
Go Solution
package main
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
"fmt"
)
func MostFrequentProducts(db *sql.DB) (*sql.Rows, error) {
query := `
SELECT o.customer_id, o.product_id, p.product_name
FROM Orders o
JOIN Products p ON o.product_id = p.product_id
JOIN (
SELECT customer_id, product_id, COUNT(*) AS cnt,
MAX(COUNT(*)) OVER (PARTITION BY customer_id) AS max_cnt
FROM Orders
GROUP BY customer_id, product_id
) AS t ON o.customer_id = t.customer_id AND o.product_id = t.product_id
WHERE t.cnt = t.max_cnt
GROUP BY o.customer_id, o.product_id, p.product_name
`
return db.Query(query)
}
The Go implementation uses SQL directly with window functions. The SQL query counts orders per customer-product pair, computes the maximum count per customer, and filters for the products that match the maximum count. Go code executes this query and returns the results. Differences from Python include reliance on SQL instead of in-memory DataFrame manipulation and handling results through sql.Rows.
Worked Examples
Example: Customer 1 (Alice)
- Orders: product 1 (keyboard) once, product 2 (mouse) three times.
- Count table: product 1 → 1, product 2 → 3.
- Maximum count: 3.
- Filtered result: product 2 (mouse).
Example: Customer 2 (Bob)
- Orders: product 1 → 1, product 2 → 1, product 3 → 1.
- Maximum count: 1.
- Filtered result: product 1, 2, 3.
This process is applied for each customer who has made at least one order.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | N is the number of orders. Merging and grouping operations scale linearly with the number of rows. |
| Space | O(N) | Storing counts and intermediate merged tables requires space proportional to the number of orders. |
The complexity is efficient because we avoid nested loops over customers and products by leveraging group-by and window functions.
Test Cases
# Provided example
import pandas as pd
customers = pd.DataFrame({
"customer_id": [1, 2, 3, 4, 5],
"name": ["Alice", "Bob", "Tom", "Jerry", "John"]
})
orders = pd.DataFrame({
"order_id": [1,2,3,4,5,6,7,8,9,10],
"order_date": ["2020-07-31","2020-07-30","2020-08-29","2020-07-29",
"2020-06-10","2020-08-01","2020-08-01","2020-08-03",
"2020-08-07","2020-07-15"],
"customer_id": [1,2,3,4,1,2,3,1,2,1],
"product_id": [1,2,3,1,2,1,3,2,3,2]
})
products = pd.DataFrame({
"product_id": [1,2,3,4],
"product_name": ["keyboard", "mouse", "screen", "hard disk"],
"price": [120, 80, 600, 450]
})
result = most_frequent_products(customers, orders, products)
assert result[(result.customer_id==1) & (result.product_id==2)].shape[0] == 1 # Alice: mouse
assert result[(result.customer_id==2) & (result.product_id==1)].shape[0] == 1 # Bob: keyboard
assert result[(result.customer_id==2) & (result.product_id==2)].shape[0] == 1 # Bob: mouse
assert result[(result.customer_id==2) & (result.product_id==3)].shape[0] == 1 # Bob: screen
assert result[(result.customer_id==3) & (result.product_id==3)].shape[0] == 1 # Tom: screen
assert result[(result.customer_id==4) & (result.product_id==1)].shape[0] == 1 # Jerry: keyboard
assert result[result.customer_id==5].shape[0] == 0 # John: no orders
| Test | Why |
|---|---|
| Alice: multiple orders, one top product | Validates normal case with one maximum |
| Bob: multiple products tied | Ensures tie handling works |
| Tom: only one product | Simple single product case |
| Jerry: single order | Edge case of only one order |
| John: no orders | Validates customers with no orders are excluded |
Edge Cases
One edge case is a customer with no orders, like John. The implementation correctly excludes such customers by grouping and counting only existing orders.
Another edge case is a tie between multiple products for the maximum frequency. Bob’s case illustrates this. The transform("max") function in Python or MAX window function in SQL ensures all products with the