LeetCode 2324 - Product Sales Analysis IV

This problem asks us to analyze sales data to determine which products each user spent the most money on. We are given two tables: Sales and Product. The Sales table contains individual transactions, showing which user bought which product and in what quantity.

LeetCode Problem 2324

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze sales data to determine which products each user spent the most money on. We are given two tables: Sales and Product. The Sales table contains individual transactions, showing which user bought which product and in what quantity. The Product table contains the price of each product.

The task is to calculate the total money each user spent per product, identify the maximum amount each user spent, and then report all products on which the user spent that maximum amount. The output should include each user_id paired with one or more product_ids corresponding to the products they spent the most money on.

Constraints imply that sale_id and product_id are unique identifiers and the tables are of manageable size for database operations, but we must handle cases where a user has multiple products tied for maximum spending. Edge cases include users who bought only one product, users with multiple purchases that sum to the same maximum spending, and users with zero purchases (though the latter is not explicitly listed in the sample).

Approaches

The brute-force approach would calculate total spending for each user-product pair by iterating through all sales, then for each user, scan all product totals to find the maximum. This approach is correct but inefficient, especially if there are many users and products, as it requires multiple nested loops.

The optimal approach leverages SQL aggregation and joins. First, we join Sales with Product to compute spending per transaction, then aggregate spending per user-product pair using SUM(quantity * price). Finally, we use a window function or a subquery to determine the maximum spending per user and filter the results to include only the products where spending equals the maximum. This avoids nested loops and takes advantage of database optimizations for joins, aggregation, and filtering.

Approach Time Complexity Space Complexity Notes
Brute Force O(U * P * S) O(U * P) Iterate over each user and product; S is number of sales
Optimal O(S + U * P) O(U * P) Join + aggregate + filter with subquery or window function

Algorithm Walkthrough

  1. Join Tables: Perform an inner join between Sales and Product on product_id to get the price alongside quantity for each sale. This allows calculation of spending per transaction.
  2. Aggregate Spending: Group the resulting table by user_id and product_id, summing quantity * price to get the total money each user spent on each product.
  3. Determine Maximum Spending: For each user, compute the maximum total spending across all products. This can be done using a window function MAX() over user_id or a subquery per user.
  4. Filter Results: Select only those user-product pairs where the total spending equals the maximum for that user.
  5. Return Output: Produce the final result containing user_id and product_id. Ordering is not required.

Why it works: This method works because aggregation correctly computes total spending, and filtering on the computed maximum ensures that only the products with the highest spending per user are included. Using SQL set operations ensures efficiency and correctness even when multiple products have the same spending.

Python Solution

import sqlite3
from typing import List, Tuple

def product_sales_analysis(sales: List[Tuple[int, int, int, int]],
                           products: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    # Create in-memory SQLite database
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()

    # Create tables
    cursor.execute("CREATE TABLE Sales(sale_id INT, product_id INT, user_id INT, quantity INT)")
    cursor.execute("CREATE TABLE Product(product_id INT, price INT)")

    # Insert data
    cursor.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", sales)
    cursor.executemany("INSERT INTO Product VALUES (?, ?)", products)

    # SQL query to find products with max spending per user
    query = """
    WITH UserProductTotal AS (
        SELECT s.user_id, s.product_id, SUM(s.quantity * p.price) AS total_spent
        FROM Sales s
        JOIN Product p ON s.product_id = p.product_id
        GROUP BY s.user_id, s.product_id
    ),
    MaxSpent AS (
        SELECT user_id, MAX(total_spent) AS max_spent
        FROM UserProductTotal
        GROUP BY user_id
    )
    SELECT u.user_id, u.product_id
    FROM UserProductTotal u
    JOIN MaxSpent m ON u.user_id = m.user_id AND u.total_spent = m.max_spent
    """
    
    cursor.execute(query)
    result = cursor.fetchall()
    conn.close()
    return result

The Python implementation uses SQLite to mimic a SQL environment. First, it creates Sales and Product tables and inserts the given data. The SQL query uses a common table expression (CTE) to calculate total spending per user-product pair (UserProductTotal) and then finds the maximum spending per user (MaxSpent). The final join selects only products with spending equal to the maximum.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/mattn/go-sqlite3"
)

func ProductSalesAnalysis(sales [][4]int, products [][2]int) [][2]int {
    db, _ := sql.Open("sqlite3", ":memory:")
    defer db.Close()

    db.Exec("CREATE TABLE Sales(sale_id INT, product_id INT, user_id INT, quantity INT)")
    db.Exec("CREATE TABLE Product(product_id INT, price INT)")

    for _, s := range sales {
        db.Exec("INSERT INTO Sales VALUES (?, ?, ?, ?)", s[0], s[1], s[2], s[3])
    }
    for _, p := range products {
        db.Exec("INSERT INTO Product VALUES (?, ?)", p[0], p[1])
    }

    query := `
    WITH UserProductTotal AS (
        SELECT s.user_id, s.product_id, SUM(s.quantity * p.price) AS total_spent
        FROM Sales s
        JOIN Product p ON s.product_id = p.product_id
        GROUP BY s.user_id, s.product_id
    ),
    MaxSpent AS (
        SELECT user_id, MAX(total_spent) AS max_spent
        FROM UserProductTotal
        GROUP BY user_id
    )
    SELECT u.user_id, u.product_id
    FROM UserProductTotal u
    JOIN MaxSpent m ON u.user_id = m.user_id AND u.total_spent = m.max_spent
    `

    rows, _ := db.Query(query)
    defer rows.Close()

    var result [][2]int
    for rows.Next() {
        var userID, productID int
        rows.Scan(&userID, &productID)
        result = append(result, [2]int{userID, productID})
    }

    return result
}

In Go, the implementation is similar, but with database/sql handling and slice manipulation. Go requires explicit iteration over query results to construct the output slice. Error handling is omitted for brevity but should be included in production.

Worked Examples

For the input:

Sales table:

sale_id | product_id | user_id | quantity
1       | 1          | 101     | 10
2       | 3          | 101     | 7
3       | 1          | 102     | 9
4       | 2          | 102     | 6
5       | 3          | 102     | 10
6       | 1          | 102     | 6

Product table:

product_id | price
1          | 10
2          | 25
3          | 15

Step 1: Join and calculate spending:

user_id | product_id | total_spent
101     | 1          | 100
101     | 3          | 105
102     | 1          | 150
102     | 2          | 150
102     | 3          | 150

Step 2: Determine maximum per user:

user_id | max_spent
101     | 105
102     | 150

Step 3: Filter products equal to max:

user_id | product_id
101     | 3
102     | 1
102     | 2
102     | 3

Complexity Analysis

Measure Complexity Explanation
Time O(S + U * P) S is the number of sales; aggregation and filtering per user-product pair dominates
Space O(U * P) Space to store total spending per user-product pair

The SQL engine optimizes joins and groupings efficiently. Using CTEs avoids repeated scans and reduces redundant computation.

Test Cases

# Provided example
assert sorted(product_sales_analysis(
    [(1,1,101,10),(2,3,101,7),(3,1,102,9),(4,2,102,6),(5,3,102,10),(6,1,102,6)],
    [(1,10),(2,25),(3,15)]
)) == sorted([(101,3),(102,1),(102,2),(102,3)])  # max spent tie

# Single user single product
assert product_sales_analysis(
    [(1,1,101,5)],
    [(1,20)]
)