LeetCode 2324 - Product Sales Analysis IV
This problem asks us to analyze sales data to determine which products each user spent the most money on. We are given two tables: Sales and Product. The Sales table contains individual transactions, showing which user bought which product and in what quantity.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze sales data to determine which products each user spent the most money on. We are given two tables: Sales and Product. The Sales table contains individual transactions, showing which user bought which product and in what quantity. The Product table contains the price of each product.
The task is to calculate the total money each user spent per product, identify the maximum amount each user spent, and then report all products on which the user spent that maximum amount. The output should include each user_id paired with one or more product_ids corresponding to the products they spent the most money on.
Constraints imply that sale_id and product_id are unique identifiers and the tables are of manageable size for database operations, but we must handle cases where a user has multiple products tied for maximum spending. Edge cases include users who bought only one product, users with multiple purchases that sum to the same maximum spending, and users with zero purchases (though the latter is not explicitly listed in the sample).
Approaches
The brute-force approach would calculate total spending for each user-product pair by iterating through all sales, then for each user, scan all product totals to find the maximum. This approach is correct but inefficient, especially if there are many users and products, as it requires multiple nested loops.
The optimal approach leverages SQL aggregation and joins. First, we join Sales with Product to compute spending per transaction, then aggregate spending per user-product pair using SUM(quantity * price). Finally, we use a window function or a subquery to determine the maximum spending per user and filter the results to include only the products where spending equals the maximum. This avoids nested loops and takes advantage of database optimizations for joins, aggregation, and filtering.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(U * P * S) | O(U * P) | Iterate over each user and product; S is number of sales |
| Optimal | O(S + U * P) | O(U * P) | Join + aggregate + filter with subquery or window function |
Algorithm Walkthrough
- Join Tables: Perform an inner join between
SalesandProductonproduct_idto get the price alongside quantity for each sale. This allows calculation of spending per transaction. - Aggregate Spending: Group the resulting table by
user_idandproduct_id, summingquantity * priceto get the total money each user spent on each product. - Determine Maximum Spending: For each user, compute the maximum total spending across all products. This can be done using a window function
MAX()overuser_idor a subquery per user. - Filter Results: Select only those user-product pairs where the total spending equals the maximum for that user.
- Return Output: Produce the final result containing
user_idandproduct_id. Ordering is not required.
Why it works: This method works because aggregation correctly computes total spending, and filtering on the computed maximum ensures that only the products with the highest spending per user are included. Using SQL set operations ensures efficiency and correctness even when multiple products have the same spending.
Python Solution
import sqlite3
from typing import List, Tuple
def product_sales_analysis(sales: List[Tuple[int, int, int, int]],
products: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
# Create in-memory SQLite database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Create tables
cursor.execute("CREATE TABLE Sales(sale_id INT, product_id INT, user_id INT, quantity INT)")
cursor.execute("CREATE TABLE Product(product_id INT, price INT)")
# Insert data
cursor.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", sales)
cursor.executemany("INSERT INTO Product VALUES (?, ?)", products)
# SQL query to find products with max spending per user
query = """
WITH UserProductTotal AS (
SELECT s.user_id, s.product_id, SUM(s.quantity * p.price) AS total_spent
FROM Sales s
JOIN Product p ON s.product_id = p.product_id
GROUP BY s.user_id, s.product_id
),
MaxSpent AS (
SELECT user_id, MAX(total_spent) AS max_spent
FROM UserProductTotal
GROUP BY user_id
)
SELECT u.user_id, u.product_id
FROM UserProductTotal u
JOIN MaxSpent m ON u.user_id = m.user_id AND u.total_spent = m.max_spent
"""
cursor.execute(query)
result = cursor.fetchall()
conn.close()
return result
The Python implementation uses SQLite to mimic a SQL environment. First, it creates Sales and Product tables and inserts the given data. The SQL query uses a common table expression (CTE) to calculate total spending per user-product pair (UserProductTotal) and then finds the maximum spending per user (MaxSpent). The final join selects only products with spending equal to the maximum.
Go Solution
package main
import (
"database/sql"
_ "github.com/mattn/go-sqlite3"
)
func ProductSalesAnalysis(sales [][4]int, products [][2]int) [][2]int {
db, _ := sql.Open("sqlite3", ":memory:")
defer db.Close()
db.Exec("CREATE TABLE Sales(sale_id INT, product_id INT, user_id INT, quantity INT)")
db.Exec("CREATE TABLE Product(product_id INT, price INT)")
for _, s := range sales {
db.Exec("INSERT INTO Sales VALUES (?, ?, ?, ?)", s[0], s[1], s[2], s[3])
}
for _, p := range products {
db.Exec("INSERT INTO Product VALUES (?, ?)", p[0], p[1])
}
query := `
WITH UserProductTotal AS (
SELECT s.user_id, s.product_id, SUM(s.quantity * p.price) AS total_spent
FROM Sales s
JOIN Product p ON s.product_id = p.product_id
GROUP BY s.user_id, s.product_id
),
MaxSpent AS (
SELECT user_id, MAX(total_spent) AS max_spent
FROM UserProductTotal
GROUP BY user_id
)
SELECT u.user_id, u.product_id
FROM UserProductTotal u
JOIN MaxSpent m ON u.user_id = m.user_id AND u.total_spent = m.max_spent
`
rows, _ := db.Query(query)
defer rows.Close()
var result [][2]int
for rows.Next() {
var userID, productID int
rows.Scan(&userID, &productID)
result = append(result, [2]int{userID, productID})
}
return result
}
In Go, the implementation is similar, but with database/sql handling and slice manipulation. Go requires explicit iteration over query results to construct the output slice. Error handling is omitted for brevity but should be included in production.
Worked Examples
For the input:
Sales table:
sale_id | product_id | user_id | quantity
1 | 1 | 101 | 10
2 | 3 | 101 | 7
3 | 1 | 102 | 9
4 | 2 | 102 | 6
5 | 3 | 102 | 10
6 | 1 | 102 | 6
Product table:
product_id | price
1 | 10
2 | 25
3 | 15
Step 1: Join and calculate spending:
user_id | product_id | total_spent
101 | 1 | 100
101 | 3 | 105
102 | 1 | 150
102 | 2 | 150
102 | 3 | 150
Step 2: Determine maximum per user:
user_id | max_spent
101 | 105
102 | 150
Step 3: Filter products equal to max:
user_id | product_id
101 | 3
102 | 1
102 | 2
102 | 3
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(S + U * P) | S is the number of sales; aggregation and filtering per user-product pair dominates |
| Space | O(U * P) | Space to store total spending per user-product pair |
The SQL engine optimizes joins and groupings efficiently. Using CTEs avoids repeated scans and reduces redundant computation.
Test Cases
# Provided example
assert sorted(product_sales_analysis(
[(1,1,101,10),(2,3,101,7),(3,1,102,9),(4,2,102,6),(5,3,102,10),(6,1,102,6)],
[(1,10),(2,25),(3,15)]
)) == sorted([(101,3),(102,1),(102,2),(102,3)]) # max spent tie
# Single user single product
assert product_sales_analysis(
[(1,1,101,5)],
[(1,20)]
)