LeetCode 2922 - Market Analysis III

The problem asks us to analyze three relational tables: Users, Items, and Orders. Each user (seller) has a favorite brand, each item has a brand, and orders record which seller sold which item on which date.

LeetCode Problem 2922

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem asks us to analyze three relational tables: Users, Items, and Orders. Each user (seller) has a favorite brand, each item has a brand, and orders record which seller sold which item on which date. We are asked to determine the top seller(s) based on the number of unique items sold that have a different brand from the seller's favorite brand. If multiple sellers tie for the highest count, all of them should be returned, sorted by seller_id ascending.

In simpler terms, the task is:

  1. Identify items sold by each seller.
  2. Exclude items that match the seller's favorite brand.
  3. Count the unique items remaining for each seller.
  4. Determine which seller(s) have the highest count of such unique items.
  5. Return their seller_id and the count.

Important points and constraints:

  • Each table has unique identifiers: seller_id, item_id, and order_id.
  • Sellers may have sold multiple orders of the same item, but only distinct items count.
  • There can be ties in the highest count.
  • Output must be sorted by seller_id.

Edge cases to consider upfront:

  • Sellers who have never sold an item with a brand different from their favorite.
  • Sellers with multiple orders for the same item (duplicates must be filtered).
  • Multiple sellers tying for the maximum count.
  • Empty tables, which should return an empty result.

Approaches

The brute-force approach would join all three tables (Orders, Items, Users) first, filter out items that match the seller's favorite brand, deduplicate by seller_id and item_id, and then count the number of unique items per seller. Finally, we would compute the maximum count and return sellers that match it. This approach works correctly but may involve heavy joins and repeated deduplication, which can be expensive for large datasets.

The optimal approach leverages SQL aggregation efficiently:

  1. Join the three tables directly using Orders.item_id = Items.item_id and Orders.seller_id = Users.seller_id.
  2. Filter rows where item_brand does not match favorite_brand.
  3. Use COUNT(DISTINCT item_id) to count unique items per seller.
  4. Compute the maximum of these counts.
  5. Select only sellers whose count equals the maximum.

This approach avoids unnecessary intermediate deduplication steps and uses SQL aggregation efficiently.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * m) O(n) Joins all rows and deduplicates manually, slower for large tables
Optimal O(n + m + k) O(n) Uses direct joins, filtering, and aggregation; n, m, k are sizes of Users, Items, Orders

Algorithm Walkthrough

  1. Perform an inner join between Orders and Items on item_id to attach item_brand to each order.
  2. Join the result with Users on seller_id to attach favorite_brand for each seller.
  3. Filter rows where item_brand is different from favorite_brand. This ensures we only count items not matching the seller's favorite brand.
  4. Group the filtered result by seller_id and compute COUNT(DISTINCT item_id) to get the number of unique items sold per seller.
  5. Compute the maximum count from the grouped results.
  6. Filter the grouped result to include only sellers whose count equals the maximum.
  7. Order the result by seller_id ascending.

Why it works: The join operations ensure that each order is associated with the correct item brand and seller favorite brand. Filtering before aggregation ensures we count only valid items. Using COUNT(DISTINCT item_id) ensures duplicates are not counted multiple times. The final filtering step guarantees that only top sellers are returned.

Python Solution

import pandas as pd

def top_seller(users: pd.DataFrame, items: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Join Orders with Items to get item brands
    orders_items = orders.merge(items, on="item_id", how="inner")
    
    # Join the result with Users to get favorite brands
    full_data = orders_items.merge(users, on="seller_id", how="inner")
    
    # Filter out items that match seller's favorite brand
    filtered = full_data[full_data['item_brand'] != full_data['favorite_brand']]
    
    # Count unique items per seller
    seller_counts = filtered.groupby('seller_id')['item_id'].nunique().reset_index()
    seller_counts.rename(columns={'item_id': 'num_items'}, inplace=True)
    
    if seller_counts.empty:
        return pd.DataFrame(columns=['seller_id', 'num_items'])
    
    # Find the maximum count
    max_count = seller_counts['num_items'].max()
    
    # Return sellers with the maximum count, sorted by seller_id
    result = seller_counts[seller_counts['num_items'] == max_count].sort_values('seller_id')
    return result

Explanation: The code follows the algorithm precisely. First, it joins orders with items and users to get all relevant columns. Filtering ensures only items different from the favorite brand remain. Grouping and nunique() count distinct items. Finally, it filters by maximum and sorts the result.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
    "fmt"
)

func topSeller(db *sql.DB) (*sql.Rows, error) {
    query := `
    WITH seller_item_counts AS (
        SELECT
            u.seller_id,
            COUNT(DISTINCT o.item_id) AS num_items
        FROM Orders o
        JOIN Items i ON o.item_id = i.item_id
        JOIN Users u ON o.seller_id = u.seller_id
        WHERE i.item_brand <> u.favorite_brand
        GROUP BY u.seller_id
    ),
    max_count AS (
        SELECT MAX(num_items) AS max_items FROM seller_item_counts
    )
    SELECT sic.seller_id, sic.num_items
    FROM seller_item_counts sic, max_count mc
    WHERE sic.num_items = mc.max_items
    ORDER BY sic.seller_id ASC;
    `
    return db.Query(query)
}

Explanation: The Go version uses SQL directly, relying on a CTE to compute seller counts and maximum count. The query ensures only top sellers are returned. Go handles the result set with sql.Rows.

Worked Examples

Using the provided example:

Users Table

seller_id join_date favorite_brand
1 2019-01-01 Lenovo
2 2019-02-09 Samsung
3 2019-01-19 LG

Orders Table

order_id order_date item_id seller_id
1 2019-08-01 4 2
2 2019-08-02 2 3
3 2019-08-03 3 3
4 2019-08-04 1 2
5 2019-08-04 4 2

Items Table

item_id item_brand
1 Samsung
2 Lenovo
3 LG
4 HP

Step-by-step

  1. Join Orders with Items to attach item_brand.
  2. Join with Users to attach favorite_brand.
  3. Filter where item_brand != favorite_brand:
seller_id item_id item_brand favorite_brand
2 4 HP Samsung
3 2 Lenovo LG
3 3 LG LG
2 1 Samsung Samsung
2 4 HP Samsung
  1. Count unique items per seller:
seller_id num_items
2 1
3 1
  1. Maximum count is 1, both sellers match. Sorted by seller_id.

Complexity Analysis

Measure Complexity Explanation
Time O(n + m + k) n, m, k are sizes of Users, Items, Orders; each table is scanned once during joins and aggregation.
Space O(n + k) Storing joined tables and group-by results requires space proportional to orders and sellers.

The join and group-by operations dominate time complexity. Filtering and counting distinct items are linear in the size of the joined data.

Test Cases

# Test cases

import pandas as pd

# Example 1
users = pd.DataFrame({
    'seller_id': [1,2,3],
    'join_date': ['2019-01-01','2019-02-09','2019-01-19'],
    'favorite_brand': ['Lenovo','Samsung','LG']
})