LeetCode 1596 - The Most Frequently Ordered Products for Each Customer

The problem requires us to find the most frequently ordered product or products for each customer from a set of three tables: Customers, Orders, and Products. Each customerid may have multiple orders, and each order contains a productid.

LeetCode Problem 1596

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem requires us to find the most frequently ordered product or products for each customer from a set of three tables: Customers, Orders, and Products. Each customer_id may have multiple orders, and each order contains a product_id. The goal is to output, for each customer who has made at least one order, the product_id and product_name of the product(s) they ordered the most.

The input represents a relational database structure where customers, their orders, and product details are stored separately. The expected output is a list of tuples (or rows) containing customer_id, product_id, and product_name for each customer's top product(s). If a customer has multiple products tied for the highest number of orders, all such products must be included. Customers with no orders are excluded from the output.

Key constraints and observations include:

  • Each order_id is unique.
  • A customer may order the same product multiple times, but not more than once per day.
  • There may be multiple products tied for the most frequently ordered product for a customer.
  • The dataset can have many customers and orders, so efficiency matters.

Important edge cases are customers with no orders, customers with multiple products tied for the maximum orders, and products that are ordered only once.

Approaches

The brute-force approach is to iterate over each customer and count the number of times they ordered each product. After counting, select the maximum and output all products with that frequency. While correct, this requires nested loops over customers and products, leading to high time complexity if the dataset is large.

The optimal approach leverages SQL window functions. First, we count the number of times each customer ordered each product. Then, we use a window function to compute the maximum order count per customer. Finally, we filter only the products that match this maximum count. This avoids repeated scanning and aggregation for each customer and ensures a single-pass computation of maximum frequency.

Approach Time Complexity Space Complexity Notes
Brute Force O(C * P) O(C * P) Count orders for each customer-product combination manually
Optimal O(N) O(N) Use SQL window functions to compute frequencies and max per customer efficiently

Algorithm Walkthrough

  1. Join the Orders table with the Products table on product_id to get product_name along with orders.
  2. Group the results by customer_id and product_id to count how many times each customer ordered each product. This gives a frequency table per customer-product pair.
  3. Use a window function (like MAX over partition by customer_id) to determine the maximum order count for each customer.
  4. Filter the frequency table to include only rows where the product count equals the maximum count for that customer.
  5. Select the required columns customer_id, product_id, and product_name for the output.

This works because the window function ensures we calculate the maximum frequency for each customer without extra queries, and filtering by this maximum directly gives the top products.

Python Solution

import pandas as pd

def most_frequent_products(customers: pd.DataFrame, orders: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
    # Merge Orders with Products to get product_name
    order_products = orders.merge(products, on="product_id")
    
    # Count how many times each customer ordered each product
    freq = order_products.groupby(["customer_id", "product_id", "product_name"]).size().reset_index(name="count")
    
    # Compute maximum frequency per customer
    freq["max_count"] = freq.groupby("customer_id")["count"].transform("max")
    
    # Filter rows where count matches max_count
    result = freq[freq["count"] == freq["max_count"]][["customer_id", "product_id", "product_name"]]
    
    return result

This implementation first merges Orders with Products to access product names, then groups by customer and product to count orders. Using the transform("max") function, we compute the max order count per customer in a vectorized manner. Filtering ensures only the most frequent products are returned.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
    "fmt"
)

func MostFrequentProducts(db *sql.DB) (*sql.Rows, error) {
    query := `
        SELECT o.customer_id, o.product_id, p.product_name
        FROM Orders o
        JOIN Products p ON o.product_id = p.product_id
        JOIN (
            SELECT customer_id, product_id, COUNT(*) AS cnt,
                   MAX(COUNT(*)) OVER (PARTITION BY customer_id) AS max_cnt
            FROM Orders
            GROUP BY customer_id, product_id
        ) AS t ON o.customer_id = t.customer_id AND o.product_id = t.product_id
        WHERE t.cnt = t.max_cnt
        GROUP BY o.customer_id, o.product_id, p.product_name
    `
    return db.Query(query)
}

The Go implementation uses SQL directly with window functions. The SQL query counts orders per customer-product pair, computes the maximum count per customer, and filters for the products that match the maximum count. Go code executes this query and returns the results. Differences from Python include reliance on SQL instead of in-memory DataFrame manipulation and handling results through sql.Rows.

Worked Examples

Example: Customer 1 (Alice)

  1. Orders: product 1 (keyboard) once, product 2 (mouse) three times.
  2. Count table: product 1 → 1, product 2 → 3.
  3. Maximum count: 3.
  4. Filtered result: product 2 (mouse).

Example: Customer 2 (Bob)

  1. Orders: product 1 → 1, product 2 → 1, product 3 → 1.
  2. Maximum count: 1.
  3. Filtered result: product 1, 2, 3.

This process is applied for each customer who has made at least one order.

Complexity Analysis

Measure Complexity Explanation
Time O(N) N is the number of orders. Merging and grouping operations scale linearly with the number of rows.
Space O(N) Storing counts and intermediate merged tables requires space proportional to the number of orders.

The complexity is efficient because we avoid nested loops over customers and products by leveraging group-by and window functions.

Test Cases

# Provided example
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Tom", "Jerry", "John"]
})

orders = pd.DataFrame({
    "order_id": [1,2,3,4,5,6,7,8,9,10],
    "order_date": ["2020-07-31","2020-07-30","2020-08-29","2020-07-29",
                   "2020-06-10","2020-08-01","2020-08-01","2020-08-03",
                   "2020-08-07","2020-07-15"],
    "customer_id": [1,2,3,4,1,2,3,1,2,1],
    "product_id": [1,2,3,1,2,1,3,2,3,2]
})

products = pd.DataFrame({
    "product_id": [1,2,3,4],
    "product_name": ["keyboard", "mouse", "screen", "hard disk"],
    "price": [120, 80, 600, 450]
})

result = most_frequent_products(customers, orders, products)
assert result[(result.customer_id==1) & (result.product_id==2)].shape[0] == 1 # Alice: mouse
assert result[(result.customer_id==2) & (result.product_id==1)].shape[0] == 1 # Bob: keyboard
assert result[(result.customer_id==2) & (result.product_id==2)].shape[0] == 1 # Bob: mouse
assert result[(result.customer_id==2) & (result.product_id==3)].shape[0] == 1 # Bob: screen
assert result[(result.customer_id==3) & (result.product_id==3)].shape[0] == 1 # Tom: screen
assert result[(result.customer_id==4) & (result.product_id==1)].shape[0] == 1 # Jerry: keyboard
assert result[result.customer_id==5].shape[0] == 0 # John: no orders
Test Why
Alice: multiple orders, one top product Validates normal case with one maximum
Bob: multiple products tied Ensures tie handling works
Tom: only one product Simple single product case
Jerry: single order Edge case of only one order
John: no orders Validates customers with no orders are excluded

Edge Cases

One edge case is a customer with no orders, like John. The implementation correctly excludes such customers by grouping and counting only existing orders.

Another edge case is a tie between multiple products for the maximum frequency. Bob’s case illustrates this. The transform("max") function in Python or MAX window function in SQL ensures all products with the