LeetCode 2922 - Market Analysis III
The problem asks us to analyze three relational tables: Users, Items, and Orders. Each user (seller) has a favorite brand, each item has a brand, and orders record which seller sold which item on which date.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem asks us to analyze three relational tables: Users, Items, and Orders. Each user (seller) has a favorite brand, each item has a brand, and orders record which seller sold which item on which date. We are asked to determine the top seller(s) based on the number of unique items sold that have a different brand from the seller's favorite brand. If multiple sellers tie for the highest count, all of them should be returned, sorted by seller_id ascending.
In simpler terms, the task is:
- Identify items sold by each seller.
- Exclude items that match the seller's favorite brand.
- Count the unique items remaining for each seller.
- Determine which seller(s) have the highest count of such unique items.
- Return their
seller_idand the count.
Important points and constraints:
- Each table has unique identifiers:
seller_id,item_id, andorder_id. - Sellers may have sold multiple orders of the same item, but only distinct items count.
- There can be ties in the highest count.
- Output must be sorted by
seller_id.
Edge cases to consider upfront:
- Sellers who have never sold an item with a brand different from their favorite.
- Sellers with multiple orders for the same item (duplicates must be filtered).
- Multiple sellers tying for the maximum count.
- Empty tables, which should return an empty result.
Approaches
The brute-force approach would join all three tables (Orders, Items, Users) first, filter out items that match the seller's favorite brand, deduplicate by seller_id and item_id, and then count the number of unique items per seller. Finally, we would compute the maximum count and return sellers that match it. This approach works correctly but may involve heavy joins and repeated deduplication, which can be expensive for large datasets.
The optimal approach leverages SQL aggregation efficiently:
- Join the three tables directly using
Orders.item_id = Items.item_idandOrders.seller_id = Users.seller_id. - Filter rows where
item_branddoes not matchfavorite_brand. - Use
COUNT(DISTINCT item_id)to count unique items per seller. - Compute the maximum of these counts.
- Select only sellers whose count equals the maximum.
This approach avoids unnecessary intermediate deduplication steps and uses SQL aggregation efficiently.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m) | O(n) | Joins all rows and deduplicates manually, slower for large tables |
| Optimal | O(n + m + k) | O(n) | Uses direct joins, filtering, and aggregation; n, m, k are sizes of Users, Items, Orders |
Algorithm Walkthrough
- Perform an inner join between
OrdersandItemsonitem_idto attachitem_brandto each order. - Join the result with
Usersonseller_idto attachfavorite_brandfor each seller. - Filter rows where
item_brandis different fromfavorite_brand. This ensures we only count items not matching the seller's favorite brand. - Group the filtered result by
seller_idand computeCOUNT(DISTINCT item_id)to get the number of unique items sold per seller. - Compute the maximum count from the grouped results.
- Filter the grouped result to include only sellers whose count equals the maximum.
- Order the result by
seller_idascending.
Why it works: The join operations ensure that each order is associated with the correct item brand and seller favorite brand. Filtering before aggregation ensures we count only valid items. Using COUNT(DISTINCT item_id) ensures duplicates are not counted multiple times. The final filtering step guarantees that only top sellers are returned.
Python Solution
import pandas as pd
def top_seller(users: pd.DataFrame, items: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
# Join Orders with Items to get item brands
orders_items = orders.merge(items, on="item_id", how="inner")
# Join the result with Users to get favorite brands
full_data = orders_items.merge(users, on="seller_id", how="inner")
# Filter out items that match seller's favorite brand
filtered = full_data[full_data['item_brand'] != full_data['favorite_brand']]
# Count unique items per seller
seller_counts = filtered.groupby('seller_id')['item_id'].nunique().reset_index()
seller_counts.rename(columns={'item_id': 'num_items'}, inplace=True)
if seller_counts.empty:
return pd.DataFrame(columns=['seller_id', 'num_items'])
# Find the maximum count
max_count = seller_counts['num_items'].max()
# Return sellers with the maximum count, sorted by seller_id
result = seller_counts[seller_counts['num_items'] == max_count].sort_values('seller_id')
return result
Explanation: The code follows the algorithm precisely. First, it joins orders with items and users to get all relevant columns. Filtering ensures only items different from the favorite brand remain. Grouping and nunique() count distinct items. Finally, it filters by maximum and sorts the result.
Go Solution
package main
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
"fmt"
)
func topSeller(db *sql.DB) (*sql.Rows, error) {
query := `
WITH seller_item_counts AS (
SELECT
u.seller_id,
COUNT(DISTINCT o.item_id) AS num_items
FROM Orders o
JOIN Items i ON o.item_id = i.item_id
JOIN Users u ON o.seller_id = u.seller_id
WHERE i.item_brand <> u.favorite_brand
GROUP BY u.seller_id
),
max_count AS (
SELECT MAX(num_items) AS max_items FROM seller_item_counts
)
SELECT sic.seller_id, sic.num_items
FROM seller_item_counts sic, max_count mc
WHERE sic.num_items = mc.max_items
ORDER BY sic.seller_id ASC;
`
return db.Query(query)
}
Explanation: The Go version uses SQL directly, relying on a CTE to compute seller counts and maximum count. The query ensures only top sellers are returned. Go handles the result set with sql.Rows.
Worked Examples
Using the provided example:
Users Table
| seller_id | join_date | favorite_brand |
|---|---|---|
| 1 | 2019-01-01 | Lenovo |
| 2 | 2019-02-09 | Samsung |
| 3 | 2019-01-19 | LG |
Orders Table
| order_id | order_date | item_id | seller_id |
|---|---|---|---|
| 1 | 2019-08-01 | 4 | 2 |
| 2 | 2019-08-02 | 2 | 3 |
| 3 | 2019-08-03 | 3 | 3 |
| 4 | 2019-08-04 | 1 | 2 |
| 5 | 2019-08-04 | 4 | 2 |
Items Table
| item_id | item_brand |
|---|---|
| 1 | Samsung |
| 2 | Lenovo |
| 3 | LG |
| 4 | HP |
Step-by-step
- Join
OrderswithItemsto attachitem_brand. - Join with
Usersto attachfavorite_brand. - Filter where
item_brand!=favorite_brand:
| seller_id | item_id | item_brand | favorite_brand |
|---|---|---|---|
| 2 | 4 | HP | Samsung |
| 3 | 2 | Lenovo | LG |
| 3 | 3 | LG | LG |
| 2 | 1 | Samsung | Samsung |
| 2 | 4 | HP | Samsung |
- Count unique items per seller:
| seller_id | num_items |
|---|---|
| 2 | 1 |
| 3 | 1 |
- Maximum count is 1, both sellers match. Sorted by
seller_id.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m + k) | n, m, k are sizes of Users, Items, Orders; each table is scanned once during joins and aggregation. |
| Space | O(n + k) | Storing joined tables and group-by results requires space proportional to orders and sellers. |
The join and group-by operations dominate time complexity. Filtering and counting distinct items are linear in the size of the joined data.
Test Cases
# Test cases
import pandas as pd
# Example 1
users = pd.DataFrame({
'seller_id': [1,2,3],
'join_date': ['2019-01-01','2019-02-09','2019-01-19'],
'favorite_brand': ['Lenovo','Samsung','LG']
})