LeetCode 3230 - Customer Purchasing Behavior Analysis
The problem asks us to analyze customer purchasing behavior using two relational tables: Transactions and Products. The Transactions table contains every purchase made by customers, including the transaction ID, customer ID, product ID, transaction date, and amount spent.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem asks us to analyze customer purchasing behavior using two relational tables: Transactions and Products. The Transactions table contains every purchase made by customers, including the transaction ID, customer ID, product ID, transaction date, and amount spent. The Products table contains the product catalog with product IDs, their categories, and prices.
We are tasked with producing a summary per customer that includes the total money spent, number of transactions, number of unique categories purchased, average transaction amount, the most frequently purchased category (with tie-breaking by most recent purchase), and a calculated loyalty score. The output must be rounded appropriately and sorted by loyalty score descending, then customer ID ascending.
Important details include the rounding of numeric outputs to two decimal places, tie-breaking for top categories using transaction recency, and that loyalty score depends both on the number of transactions and the total amount spent.
Key edge cases include customers with only one transaction, multiple categories with identical purchase counts, transactions with identical timestamps, and customers who purchased only a single category.
Approaches
A brute-force approach would start by iterating through each customer, computing aggregates like total amount, transaction count, and unique categories by scanning all transactions multiple times. To find the top category, one would iterate through each customer's transaction history and count category occurrences, then identify the most recent category in the event of ties. While conceptually simple, this approach requires multiple scans per customer, leading to poor performance on large datasets.
The optimal approach leverages SQL aggregation and window functions. By joining the Transactions and Products tables, we can calculate sums, counts, and distinct category counts using standard GROUP BY aggregations. To find the most frequent category with recency tie-breaking, we can use ROW_NUMBER() over a partition of customer and category, ordering first by count descending and then by transaction date descending. This approach uses a single scan of the joined table and efficient grouping/window operations.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N*M) | O(N*M) | Iterates per customer over all transactions, multiple times |
| Optimal | O(N log N) | O(N) | Uses SQL aggregations and window functions to compute all metrics efficiently |
Algorithm Walkthrough
- Join the
Transactionstable with theProductstable onproduct_idto access category information for each transaction. - Compute the total amount spent, transaction count, and unique category count per customer using
GROUP BY customer_id. - Compute the average transaction amount as total amount divided by transaction count and round to 2 decimal places.
- For each customer-category pair, count the number of transactions and identify the latest transaction date.
- Use a window function with
ROW_NUMBER()partitioned by customer and ordered by transaction count descending, then by latest transaction date descending to assign a rank to categories per customer. - Select the top-ranked category per customer as the most frequently purchased category with tie-breaking.
- Calculate the loyalty score as
(transaction_count * 10) + (total_amount / 100)and round to 2 decimal places. - Combine all metrics into a single table and order by loyalty score descending, then by customer ID ascending.
The algorithm works because aggregation correctly summarizes numeric data, and the window function ensures that ties in category frequency are resolved by recency.
Python Solution
import pandas as pd
def customer_behavior_analysis(transactions: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
# Join transactions with products to get categories
df = transactions.merge(products, on='product_id', how='left')
# Aggregate per customer
agg = df.groupby('customer_id').agg(
total_amount=pd.NamedAgg(column='amount', aggfunc='sum'),
transaction_count=pd.NamedAgg(column='transaction_id', aggfunc='count'),
unique_categories=pd.NamedAgg(column='category', aggfunc=lambda x: x.nunique())
).reset_index()
# Average transaction amount
agg['avg_transaction_amount'] = (agg['total_amount'] / agg['transaction_count']).round(2)
agg['total_amount'] = agg['total_amount'].round(2)
# Loyalty score
agg['loyalty_score'] = ((agg['transaction_count'] * 10) + (agg['total_amount'] / 100)).round(2)
# Top category calculation
category_counts = df.groupby(['customer_id', 'category']).agg(
category_count=('transaction_id', 'count'),
last_transaction=('transaction_date', 'max')
).reset_index()
category_counts['rank'] = category_counts.groupby('customer_id')\
.apply(lambda x: x.sort_values(['category_count', 'last_transaction'], ascending=[False, False])
.assign(rank=lambda y: range(1, len(y)+1)))['rank'].values
top_category = category_counts[category_counts['rank'] == 1][['customer_id', 'category']]
top_category = top_category.rename(columns={'category': 'top_category'})
# Merge top category with aggregated metrics
result = agg.merge(top_category, on='customer_id', how='left')
# Sort by loyalty score descending, then customer_id ascending
result = result.sort_values(['loyalty_score', 'customer_id'], ascending=[False, True])
return result
This implementation first merges the transactions with product information to access category data. Then, customer-level aggregates are computed, including total amount, transaction count, unique categories, and average transaction amount. Loyalty score is calculated using the formula given. Top category per customer is determined by counting transactions per category and applying a window function with tie-breaking based on the most recent transaction. Finally, all metrics are combined and sorted as required.
Go Solution
package main
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
)
func CustomerBehaviorAnalysis(db *sql.DB) (*sql.Rows, error) {
query := `
WITH category_rank AS (
SELECT
t.customer_id,
p.category,
COUNT(*) AS category_count,
MAX(t.transaction_date) AS last_transaction,
ROW_NUMBER() OVER(PARTITION BY t.customer_id
ORDER BY COUNT(*) DESC, MAX(t.transaction_date) DESC) AS rn
FROM Transactions t
JOIN Products p ON t.product_id = p.product_id
GROUP BY t.customer_id, p.category
)
SELECT
a.customer_id,
ROUND(SUM(t.amount), 2) AS total_amount,
COUNT(t.transaction_id) AS transaction_count,
COUNT(DISTINCT p.category) AS unique_categories,
ROUND(SUM(t.amount)/COUNT(t.transaction_id), 2) AS avg_transaction_amount,
cr.category AS top_category,
ROUND(COUNT(t.transaction_id)*10 + SUM(t.amount)/100, 2) AS loyalty_score
FROM Transactions t
JOIN Products p ON t.product_id = p.product_id
JOIN category_rank cr ON cr.customer_id = t.customer_id AND cr.rn = 1
GROUP BY a.customer_id, cr.category
ORDER BY loyalty_score DESC, customer_id ASC;
`
return db.Query(query)
}
In Go, the approach is similar. SQL is used to perform aggregation and ranking. ROW_NUMBER() is used to determine the top category per customer, and the final selection aggregates transactions per customer while joining the top category. We rely on the SQL engine for rounding and sorting.
Worked Examples
For customer 101:
| Step | Calculation |
|---|---|
| Total amount | 100 + 150 + 200 = 450.00 |
| Transaction count | 3 |
| Unique categories | A, B, C → 3 |
| Avg transaction | 450 / 3 = 150.00 |
| Top category | A, B, C all count 1 → C is most recent (2023-02-10) |
| Loyalty score | (3*10) + (450/100) = 34.50 |
For customer 102:
| Step | Calculation |
|---|---|
| Total amount | 100 + 200 = 300.00 |
| Transaction count | 2 |
| Unique categories | A, C → 2 |
| Avg transaction | 300 / 2 = 150.00 |
| Top category | A, C both count 1 → C is most recent (2023-01-22) |
| Loyalty score | (2*10) + (300/100) = 23.00 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N log N) | Aggregation and ranking over N transactions require sorting for ROW_NUMBER tie-breaking |
| Space | O(N) | Storing aggregated metrics and category counts per customer |
The main cost is sorting when computing the rank of top categories. Aggregation functions are linear over the dataset.
Test Cases
import pandas as pd
# Example test case
transactions = pd.DataFrame([
[1, 101, 1, '2023-01-01', 100.00],
[2, 101, 2, '2023-01-15', 150.00],
[3, 102, 1, '2023-01-01', 100.00],
[4, 102, 3, '2023-01-22', 200.00],
[5, 101, 3, '2023-02-10', 200.00]
], columns=['transaction_id', 'customer_id', '