LeetCode 1159 - Market Analysis II

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to determine, for every user in the marketplace system, whether the brand of the second item they sold matches their favorite brand.

The database contains three tables:

The Users table stores user information, including their favorite_brand.
The Orders table records purchases and sales. Each order includes a seller_id, item_id, and order_date.
The Items table maps each item_id to its item_brand.

For each user, we must identify the second item they sold in chronological order, determine that item's brand, and compare it with the user's favorite brand.

If the second sold item's brand matches the favorite brand, the output should contain "yes". Otherwise, it should contain "no".

A very important requirement is that users who sold fewer than two items must still appear in the result, and their answer must be "no".

The result should contain:

seller_id, which corresponds to the user's ID
2nd_item_fav_brand, which is either "yes" or "no"

The problem guarantees that no seller sells more than one item on the same day. This guarantee matters because we need to determine the "second" sold item by date. Since there cannot be multiple sales on the same day for the same seller, ordering by order_date is sufficient and deterministic.

The main challenge is identifying the second sale per seller efficiently and then comparing the sold item's brand against the seller's favorite brand.

Several edge cases can easily trip up a naive implementation:

A user may have zero sales, and still must appear with "no".
A user may have exactly one sale, which also results in "no".
The second sold item may exist, but its brand may not match the favorite brand.
Some users may have many sales, requiring careful ordering to ensure the second item is chosen correctly.
Since every user must appear in the result, we cannot start from Orders alone. We must include users with no orders using a LEFT JOIN.

Approaches

Brute Force Approach

A brute-force solution would process each user individually.

For every user, we could query all orders where they are the seller, sort those orders by order_date, and then check whether a second sale exists. If it does, we would retrieve the corresponding item's brand and compare it against the user's favorite brand.

This works because explicitly sorting all sales for each user guarantees that we can correctly identify the second sold item.

However, this approach is inefficient. If we repeatedly scan and sort orders separately for every user, we end up performing redundant work. In database systems, repeatedly grouping and sorting data per user is costly and scales poorly.

Instead, we want a way to rank all sales globally in a single pass and directly identify the second sale for each seller.

Key Insight

The key observation is that we only care about the second sale for every seller.

This naturally suggests using a window function, specifically ROW_NUMBER().

We can partition orders by seller_id and sort by order_date. This assigns a sequential rank to every sale:

Rank 1 = first sold item
Rank 2 = second sold item
Rank 3+ = later sold items

Once the ranking is done, we only need rows where rank = 2.

After finding the second sold item, we join it with Items to retrieve the brand, compare it with the user's favorite_brand, and output "yes" or "no".

Since every user must appear in the result, we use a LEFT JOIN from Users.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(U × O log O)	O(O)	Sorts each seller's orders independently
Optimal	O(O log O)	O(O)	Uses `ROW_NUMBER()` to rank sales once

Where:

U = number of users
O = number of orders

Algorithm Walkthrough

Optimal Algorithm Using Window Functions

Rank each seller's orders chronologically

Use ROW_NUMBER() partitioned by seller_id and ordered by order_date.

This creates a ranking for each seller's sales history.

Example:

seller_id	order_date	item_id	rank
2	2019-08-01	4	1
2	2019-08-04	1	2
4	2019-08-04	1	1
4	2019-08-05	2	2
2. Filter to only second sales

After ranking, keep only rows where rank = 2.

These rows represent the second sold item for each seller. 3. Join with Items

Use item_id to retrieve the item_brand.

This gives us the brand of the seller's second sold item. 4. Join with Users

Perform a LEFT JOIN from Users to ensure all users remain in the result, even those with no second sale. 5. Compare brands

Use a conditional expression:

If the second sold item's brand equals favorite_brand, return "yes"
Otherwise, return "no"

Why it works

The correctness comes from the guarantee provided by ROW_NUMBER(). Since each seller's orders are sorted by order_date, the row with rank 2 is always the seller's second sale. The problem guarantees there is at most one sale per day for a seller, so the ordering is unambiguous. By comparing the second sold item's brand against the favorite brand, we produce exactly the required output. Users without a second sale naturally produce "no" because of the LEFT JOIN.

Python Solution

Since this is a Database problem, the Python submission is simply the SQL query returned inside the required method.

from typing import Any

class Solution:
    def marketAnalysis(self) -> str:
        return """
        WITH ranked_orders AS (
            SELECT
                seller_id,
                item_id,
                ROW_NUMBER() OVER (
                    PARTITION BY seller_id
                    ORDER BY order_date
                ) AS sale_rank
            FROM Orders
        )
        SELECT
            u.user_id AS seller_id,
            CASE
                WHEN i.item_brand = u.favorite_brand
                THEN 'yes'
                ELSE 'no'
            END AS 2nd_item_fav_brand
        FROM Users u
        LEFT JOIN ranked_orders r
            ON u.user_id = r.seller_id
            AND r.sale_rank = 2
        LEFT JOIN Items i
            ON r.item_id = i.item_id
        """

The query begins by creating a Common Table Expression named ranked_orders. This CTE assigns a sequential number to each seller's orders using ROW_NUMBER(). The partitioning by seller_id ensures every seller gets an independent ranking, while ordering by order_date guarantees chronological order.

Next, the main query starts from Users. This is critical because every user must appear in the final output, even if they sold nothing.

A LEFT JOIN connects users to their ranked second sale. Restricting the join condition to sale_rank = 2 ensures we only retrieve the second sold item.

Another LEFT JOIN connects to Items so we can retrieve the sold item's brand.

Finally, a CASE statement compares the second item's brand with the user's favorite brand. If they match, we output "yes". Otherwise, including missing second sales, we output "no".

Go Solution

In Go-based database problems on LeetCode, we generally provide the SQL query as a string constant.

package main

func marketAnalysis() string {
	return `
WITH ranked_orders AS (
    SELECT
        seller_id,
        item_id,
        ROW_NUMBER() OVER (
            PARTITION BY seller_id
            ORDER BY order_date
        ) AS sale_rank
    FROM Orders
)
SELECT
    u.user_id AS seller_id,
    CASE
        WHEN i.item_brand = u.favorite_brand
        THEN 'yes'
        ELSE 'no'
    END AS 2nd_item_fav_brand
FROM Users u
LEFT JOIN ranked_orders r
    ON u.user_id = r.seller_id
    AND r.sale_rank = 2
LEFT JOIN Items i
    ON r.item_id = i.item_id
`
}

The Go version differs very little from the Python version because this is fundamentally an SQL problem. The implementation simply returns the SQL query as a raw string literal. Go's backtick syntax is useful here because it preserves formatting and avoids escaping quotation marks or newlines.

Worked Examples

Example 1

Step 1: Rank Orders Per Seller

Using ROW_NUMBER():

seller_id	order_date	item_id	rank
2	2019-08-01	4	1
2	2019-08-04	1	2
3	2019-08-02	2	1
3	2019-08-03	3	2
4	2019-08-04	1	1
4	2019-08-05	2	2

Step 2: Keep Only Rank 2

seller_id	second_item_id
2	1
3	3
4	2

Step 3: Join With Items

seller_id	second_item_id	brand
2	1	Samsung
3	3	LG
4	2	Lenovo

Step 4: Compare With Favorite Brand

seller_id	favorite_brand	second_brand	result
1	Lenovo	NULL	no
2	Samsung	Samsung	yes
3	LG	LG	yes
4	HP	Lenovo	no

Final output:

seller_id	2nd_item_fav_brand
1	no
2	yes
3	yes
4	no

Complexity Analysis

Measure	Complexity	Explanation
Time	O(O log O)	Window function sorts orders by seller and date
Space	O(O)	Ranking intermediate result stores order metadata

The dominant cost comes from computing ROW_NUMBER(), which internally requires ordering sales by order_date within each seller partition. In practice, database engines optimize window functions efficiently, but the worst case is typically modeled as sorting all orders. The joins themselves are linear relative to the table sizes.

Test Cases

def test_market_analysis():
    # Example case
    assert market_analysis(
        users=[
            (1, "2019-01-01", "Lenovo"),
            (2, "2019-02-09", "Samsung"),
            (3, "2019-01-19", "LG"),
            (4, "2019-05-21", "HP"),
        ],
        orders=[
            (1, "2019-08-01", 4, 1, 2),
            (2, "2019-08-02", 2, 1, 3),
            (3, "2019-08-03", 3, 2, 3),
            (4, "2019-08-04", 1, 4, 2),
            (5, "2019-08-04", 1, 3, 4),
            (6, "2019-08-05", 2, 2, 4),
        ],
        items=[
            (1, "Samsung"),
            (2, "Lenovo"),
            (3, "LG"),
            (4, "HP"),
        ]
    ) == {
        1: "no",
        2: "yes",
        3: "yes",
        4: "no"
    }  # Provided example

    assert market_analysis(
        users=[(1, "2019-01-01", "Apple")],
        orders=[],
        items=[]
    ) == {
        1: "no"
    }  # User with zero sales

    assert market_analysis(
        users=[(1, "2019-01-01", "Samsung")],
        orders=[
            (1, "2020-01-01", 1, 2, 1)
        ],
        items=[(1, "Samsung")]
    ) == {
        1: "no"
    }  # User with exactly one sale

    assert market_analysis(
        users=[(1, "2019-01-01", "Samsung")],
        orders=[
            (1, "2020-01-01", 1, 2, 1),
            (2, "2020-01-02", 2, 3, 1),
        ],
        items=[
            (1, "Apple"),
            (2, "Samsung")
        ]
    ) == {
        1: "yes"
    }  # Second item matches favorite brand

    assert market_analysis(
        users=[(1, "2019-01-01", "Apple")],
        orders=[
            (1, "2020-01-01", 1, 2, 1),
            (2, "2020-01-02", 2, 3, 1),
        ],
        items=[
            (1, "Apple"),
            (2, "Samsung")
        ]
    ) == {
        1: "no"
    }  # Second item does not match favorite brand

Test	Why
Provided example	Validates correctness against official example
User with no sales	Ensures `LEFT JOIN` behavior works
User with one sale	Confirms fewer than two sales returns `"no"`
Second item matches favorite	Verifies positive match logic
Second item mismatch	Verifies negative comparison logic

Edge Cases

Users With No Sales

A common mistake is starting the query from Orders. Doing so would exclude users who never sold anything. Since the problem explicitly requires all users to appear in the output, the implementation starts from Users and uses LEFT JOIN. This guarantees that users with no sales still appear and correctly receive "no".

Users With Exactly One Sale

A seller with only one sold item has no second sale to compare. A naive implementation might accidentally compare the first item instead. By explicitly filtering for sale_rank = 2, the query avoids this mistake. Missing second sales produce NULL, which falls into the "no" branch of the CASE expression.

Correct Chronological Ordering

If multiple sales occurred on the same day for a seller, determining the second item could become ambiguous. Fortunately, the problem guarantees that no seller sells more than one item per day. This means sorting by order_date alone always produces a unique and correct ordering for identifying the second sold item.