LeetCode 1159 - Market Analysis II
This problem asks us to determine, for every user in the marketplace system, whether the brand of the second item they sold matches their favorite brand. The database contains three tables: - The Users table stores user information, including their favoritebrand.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem asks us to determine, for every user in the marketplace system, whether the brand of the second item they sold matches their favorite brand.
The database contains three tables:
- The
Userstable stores user information, including theirfavorite_brand. - The
Orderstable records purchases and sales. Each order includes aseller_id,item_id, andorder_date. - The
Itemstable maps eachitem_idto itsitem_brand.
For each user, we must identify the second item they sold in chronological order, determine that item's brand, and compare it with the user's favorite brand.
If the second sold item's brand matches the favorite brand, the output should contain "yes". Otherwise, it should contain "no".
A very important requirement is that users who sold fewer than two items must still appear in the result, and their answer must be "no".
The result should contain:
seller_id, which corresponds to the user's ID2nd_item_fav_brand, which is either"yes"or"no"
The problem guarantees that no seller sells more than one item on the same day. This guarantee matters because we need to determine the "second" sold item by date. Since there cannot be multiple sales on the same day for the same seller, ordering by order_date is sufficient and deterministic.
The main challenge is identifying the second sale per seller efficiently and then comparing the sold item's brand against the seller's favorite brand.
Several edge cases can easily trip up a naive implementation:
- A user may have zero sales, and still must appear with
"no". - A user may have exactly one sale, which also results in
"no". - The second sold item may exist, but its brand may not match the favorite brand.
- Some users may have many sales, requiring careful ordering to ensure the second item is chosen correctly.
- Since every user must appear in the result, we cannot start from
Ordersalone. We must include users with no orders using aLEFT JOIN.
Approaches
Brute Force Approach
A brute-force solution would process each user individually.
For every user, we could query all orders where they are the seller, sort those orders by order_date, and then check whether a second sale exists. If it does, we would retrieve the corresponding item's brand and compare it against the user's favorite brand.
This works because explicitly sorting all sales for each user guarantees that we can correctly identify the second sold item.
However, this approach is inefficient. If we repeatedly scan and sort orders separately for every user, we end up performing redundant work. In database systems, repeatedly grouping and sorting data per user is costly and scales poorly.
Instead, we want a way to rank all sales globally in a single pass and directly identify the second sale for each seller.
Key Insight
The key observation is that we only care about the second sale for every seller.
This naturally suggests using a window function, specifically ROW_NUMBER().
We can partition orders by seller_id and sort by order_date. This assigns a sequential rank to every sale:
- Rank
1= first sold item - Rank
2= second sold item - Rank
3+= later sold items
Once the ranking is done, we only need rows where rank = 2.
After finding the second sold item, we join it with Items to retrieve the brand, compare it with the user's favorite_brand, and output "yes" or "no".
Since every user must appear in the result, we use a LEFT JOIN from Users.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(U × O log O) | O(O) | Sorts each seller's orders independently |
| Optimal | O(O log O) | O(O) | Uses ROW_NUMBER() to rank sales once |
Where:
U= number of usersO= number of orders
Algorithm Walkthrough
Optimal Algorithm Using Window Functions
- Rank each seller's orders chronologically
Use ROW_NUMBER() partitioned by seller_id and ordered by order_date.
This creates a ranking for each seller's sales history.
Example:
| seller_id | order_date | item_id | rank |
|---|---|---|---|
| 2 | 2019-08-01 | 4 | 1 |
| 2 | 2019-08-04 | 1 | 2 |
| 4 | 2019-08-04 | 1 | 1 |
| 4 | 2019-08-05 | 2 | 2 |
| 2. Filter to only second sales |
After ranking, keep only rows where rank = 2.
These rows represent the second sold item for each seller.
3. Join with Items
Use item_id to retrieve the item_brand.
This gives us the brand of the seller's second sold item.
4. Join with Users
Perform a LEFT JOIN from Users to ensure all users remain in the result, even those with no second sale.
5. Compare brands
Use a conditional expression:
- If the second sold item's brand equals
favorite_brand, return"yes" - Otherwise, return
"no"
Why it works
The correctness comes from the guarantee provided by ROW_NUMBER(). Since each seller's orders are sorted by order_date, the row with rank 2 is always the seller's second sale. The problem guarantees there is at most one sale per day for a seller, so the ordering is unambiguous. By comparing the second sold item's brand against the favorite brand, we produce exactly the required output. Users without a second sale naturally produce "no" because of the LEFT JOIN.
Python Solution
Since this is a Database problem, the Python submission is simply the SQL query returned inside the required method.
from typing import Any
class Solution:
def marketAnalysis(self) -> str:
return """
WITH ranked_orders AS (
SELECT
seller_id,
item_id,
ROW_NUMBER() OVER (
PARTITION BY seller_id
ORDER BY order_date
) AS sale_rank
FROM Orders
)
SELECT
u.user_id AS seller_id,
CASE
WHEN i.item_brand = u.favorite_brand
THEN 'yes'
ELSE 'no'
END AS 2nd_item_fav_brand
FROM Users u
LEFT JOIN ranked_orders r
ON u.user_id = r.seller_id
AND r.sale_rank = 2
LEFT JOIN Items i
ON r.item_id = i.item_id
"""
The query begins by creating a Common Table Expression named ranked_orders. This CTE assigns a sequential number to each seller's orders using ROW_NUMBER(). The partitioning by seller_id ensures every seller gets an independent ranking, while ordering by order_date guarantees chronological order.
Next, the main query starts from Users. This is critical because every user must appear in the final output, even if they sold nothing.
A LEFT JOIN connects users to their ranked second sale. Restricting the join condition to sale_rank = 2 ensures we only retrieve the second sold item.
Another LEFT JOIN connects to Items so we can retrieve the sold item's brand.
Finally, a CASE statement compares the second item's brand with the user's favorite brand. If they match, we output "yes". Otherwise, including missing second sales, we output "no".
Go Solution
In Go-based database problems on LeetCode, we generally provide the SQL query as a string constant.
package main
func marketAnalysis() string {
return `
WITH ranked_orders AS (
SELECT
seller_id,
item_id,
ROW_NUMBER() OVER (
PARTITION BY seller_id
ORDER BY order_date
) AS sale_rank
FROM Orders
)
SELECT
u.user_id AS seller_id,
CASE
WHEN i.item_brand = u.favorite_brand
THEN 'yes'
ELSE 'no'
END AS 2nd_item_fav_brand
FROM Users u
LEFT JOIN ranked_orders r
ON u.user_id = r.seller_id
AND r.sale_rank = 2
LEFT JOIN Items i
ON r.item_id = i.item_id
`
}
The Go version differs very little from the Python version because this is fundamentally an SQL problem. The implementation simply returns the SQL query as a raw string literal. Go's backtick syntax is useful here because it preserves formatting and avoids escaping quotation marks or newlines.
Worked Examples
Example 1
Step 1: Rank Orders Per Seller
Using ROW_NUMBER():
| seller_id | order_date | item_id | rank |
|---|---|---|---|
| 2 | 2019-08-01 | 4 | 1 |
| 2 | 2019-08-04 | 1 | 2 |
| 3 | 2019-08-02 | 2 | 1 |
| 3 | 2019-08-03 | 3 | 2 |
| 4 | 2019-08-04 | 1 | 1 |
| 4 | 2019-08-05 | 2 | 2 |
Step 2: Keep Only Rank 2
| seller_id | second_item_id |
|---|---|
| 2 | 1 |
| 3 | 3 |
| 4 | 2 |
Step 3: Join With Items
| seller_id | second_item_id | brand |
|---|---|---|
| 2 | 1 | Samsung |
| 3 | 3 | LG |
| 4 | 2 | Lenovo |
Step 4: Compare With Favorite Brand
| seller_id | favorite_brand | second_brand | result |
|---|---|---|---|
| 1 | Lenovo | NULL | no |
| 2 | Samsung | Samsung | yes |
| 3 | LG | LG | yes |
| 4 | HP | Lenovo | no |
Final output:
| seller_id | 2nd_item_fav_brand |
|---|---|
| 1 | no |
| 2 | yes |
| 3 | yes |
| 4 | no |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(O log O) | Window function sorts orders by seller and date |
| Space | O(O) | Ranking intermediate result stores order metadata |
The dominant cost comes from computing ROW_NUMBER(), which internally requires ordering sales by order_date within each seller partition. In practice, database engines optimize window functions efficiently, but the worst case is typically modeled as sorting all orders. The joins themselves are linear relative to the table sizes.
Test Cases
def test_market_analysis():
# Example case
assert market_analysis(
users=[
(1, "2019-01-01", "Lenovo"),
(2, "2019-02-09", "Samsung"),
(3, "2019-01-19", "LG"),
(4, "2019-05-21", "HP"),
],
orders=[
(1, "2019-08-01", 4, 1, 2),
(2, "2019-08-02", 2, 1, 3),
(3, "2019-08-03", 3, 2, 3),
(4, "2019-08-04", 1, 4, 2),
(5, "2019-08-04", 1, 3, 4),
(6, "2019-08-05", 2, 2, 4),
],
items=[
(1, "Samsung"),
(2, "Lenovo"),
(3, "LG"),
(4, "HP"),
]
) == {
1: "no",
2: "yes",
3: "yes",
4: "no"
} # Provided example
assert market_analysis(
users=[(1, "2019-01-01", "Apple")],
orders=[],
items=[]
) == {
1: "no"
} # User with zero sales
assert market_analysis(
users=[(1, "2019-01-01", "Samsung")],
orders=[
(1, "2020-01-01", 1, 2, 1)
],
items=[(1, "Samsung")]
) == {
1: "no"
} # User with exactly one sale
assert market_analysis(
users=[(1, "2019-01-01", "Samsung")],
orders=[
(1, "2020-01-01", 1, 2, 1),
(2, "2020-01-02", 2, 3, 1),
],
items=[
(1, "Apple"),
(2, "Samsung")
]
) == {
1: "yes"
} # Second item matches favorite brand
assert market_analysis(
users=[(1, "2019-01-01", "Apple")],
orders=[
(1, "2020-01-01", 1, 2, 1),
(2, "2020-01-02", 2, 3, 1),
],
items=[
(1, "Apple"),
(2, "Samsung")
]
) == {
1: "no"
} # Second item does not match favorite brand
| Test | Why |
|---|---|
| Provided example | Validates correctness against official example |
| User with no sales | Ensures LEFT JOIN behavior works |
| User with one sale | Confirms fewer than two sales returns "no" |
| Second item matches favorite | Verifies positive match logic |
| Second item mismatch | Verifies negative comparison logic |
Edge Cases
Users With No Sales
A common mistake is starting the query from Orders. Doing so would exclude users who never sold anything. Since the problem explicitly requires all users to appear in the output, the implementation starts from Users and uses LEFT JOIN. This guarantees that users with no sales still appear and correctly receive "no".
Users With Exactly One Sale
A seller with only one sold item has no second sale to compare. A naive implementation might accidentally compare the first item instead. By explicitly filtering for sale_rank = 2, the query avoids this mistake. Missing second sales produce NULL, which falls into the "no" branch of the CASE expression.
Correct Chronological Ordering
If multiple sales occurred on the same day for a seller, determining the second item could become ambiguous. Fortunately, the problem guarantees that no seller sells more than one item per day. This means sorting by order_date alone always produces a unique and correct ordering for identifying the second sold item.