LeetCode 1158 - Market Analysis I
This problem asks us to analyze purchasing activity on an online shopping platform. We are given three database tables: Users, Orders, and Items.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze purchasing activity on an online shopping platform. We are given three database tables: Users, Orders, and Items. Our task is to produce a result that contains every user, their join date, and the number of orders they placed as a buyer during the year 2019.
The key detail is that we are counting purchases made by users, not sales. The relevant column in the Orders table is therefore buyer_id, not seller_id.
The Users table contains one row per user. Each row stores the user's unique identifier, the date they joined the platform, and their favorite brand.
The Orders table contains one row per order. Each order includes the order date, the buyer, the seller, and the purchased item.
The Items table exists in the schema, but it is actually irrelevant for this specific problem because the output only depends on users and order counts. No information about item brands or item metadata is needed.
The expected output should contain:
-
buyer_id, which corresponds to the user'suser_id -
join_datefrom theUserstable -
orders_in_2019, which is the count of orders where: -
the user appears as
buyer_id -
the order date falls within the year 2019
An important detail is that every user must appear in the result, even if they made zero purchases in 2019. This means we cannot use a simple inner join between Users and Orders, because users without qualifying orders would disappear from the result.
The constraints are not explicitly listed, but since this is a SQL database problem, we should assume tables may contain many rows. Efficient aggregation and joining are therefore important. We want a solution that scans the tables cleanly without unnecessary repeated work.
Several edge cases are important:
- A user may have no orders at all
- A user may have orders, but none in 2019
- Multiple orders may exist for the same buyer in 2019
- Orders from years other than 2019 must not be counted
- Some users may appear only as sellers and never as buyers
A naive implementation can easily fail if it excludes users with zero qualifying orders or accidentally counts orders from the wrong year.
Approaches
Brute Force Approach
A brute force solution would process each user individually and scan the entire Orders table to count matching orders.
For every user in the Users table, we would:
- Initialize a counter to zero
- Iterate through every order
- Check whether:
- the order's
buyer_idmatches the current user - the order date belongs to 2019
- Increment the counter if both conditions are true
- Output the user information and the final count
This approach is straightforward and guarantees correctness because every order is explicitly checked against every user.
However, the performance is poor. If there are U users and O orders, the total complexity becomes O(U × O). With large datasets, repeatedly scanning the entire orders table becomes inefficient.
Optimal Approach
The better solution is to aggregate order counts once, then join the aggregated results with the users table.
The key observation is that we do not need to repeatedly scan orders for every user. Instead, we can first group all 2019 orders by buyer_id and compute the counts in a single pass.
After computing these counts, we perform a LEFT JOIN from Users to the aggregated result. A left join ensures that all users remain in the output, even if they have no matching orders.
If a user has no orders in 2019, the aggregation result will be NULL, so we replace it with 0 using IFNULL or COALESCE.
This approach is much more efficient because the orders table is scanned only once for aggregation.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(U × O) | O(1) | Scan all orders for every user |
| Optimal | O(U + O) | O(U) | Aggregate once, then join results |
Algorithm Walkthrough
- Start with the
Orderstable and filter only rows where the order date belongs to 2019. This ensures that orders from other years are ignored immediately. - Group the filtered orders by
buyer_id. Grouping allows us to collect all orders belonging to the same buyer together. - Count the number of orders in each group. The result becomes a temporary mapping from buyer to order count.
- Use the
Userstable as the main table in the query. This is important because every user must appear in the final output, even users with zero purchases. - Perform a
LEFT JOINbetweenUsersand the aggregated order counts using:
Users.user_id = aggregated.buyer_id
- Replace any
NULLorder counts with0usingIFNULLorCOALESCE. ANULLappears when a user has no matching orders in 2019. - Select the required columns:
user_idasbuyer_idjoin_date- order count as
orders_in_2019
Why it works
The aggregation step guarantees that each buyer receives the correct number of 2019 orders because every qualifying order contributes exactly once to its buyer's group.
The left join guarantees that all users remain in the final output, including users with zero matching orders. Replacing NULL with 0 correctly represents users who made no purchases in 2019.
Together, these properties ensure the query produces exactly the required result.
Python Solution
# Write your MySQL query statement below
SELECT
u.user_id AS buyer_id,
u.join_date,
IFNULL(o.orders_in_2019, 0) AS orders_in_2019
FROM Users u
LEFT JOIN (
SELECT
buyer_id,
COUNT(*) AS orders_in_2019
FROM Orders
WHERE YEAR(order_date) = 2019
GROUP BY buyer_id
) o
ON u.user_id = o.buyer_id;
This solution begins by constructing a subquery that computes the number of 2019 orders for each buyer.
Inside the subquery:
YEAR(order_date) = 2019filters relevant ordersGROUP BY buyer_idgroups orders per buyerCOUNT(*)calculates how many qualifying orders each buyer made
The outer query then uses the Users table as the primary table and performs a LEFT JOIN with the aggregated results.
Using a left join is essential because users without orders in 2019 still need to appear in the final output.
Finally, IFNULL(o.orders_in_2019, 0) converts missing counts into zeroes.
Go Solution
// Write your MySQL query statement below
SELECT
u.user_id AS buyer_id,
u.join_date,
IFNULL(o.orders_in_2019, 0) AS orders_in_2019
FROM Users u
LEFT JOIN (
SELECT
buyer_id,
COUNT(*) AS orders_in_2019
FROM Orders
WHERE YEAR(order_date) = 2019
GROUP BY buyer_id
) o
ON u.user_id = o.buyer_id;
Since this is a database problem, the solution submitted for both Python and Go versions on LeetCode is actually SQL. There are no language-specific implementation differences because the execution happens entirely inside the database engine.
The query remains identical regardless of whether the selected language on LeetCode is Python, Go, Java, or another supported language.
Worked Examples
Example 1
Input Tables
Users
| user_id | join_date | favorite_brand |
|---|---|---|
| 1 | 2018-01-01 | Lenovo |
| 2 | 2018-02-09 | Samsung |
| 3 | 2018-01-19 | LG |
| 4 | 2018-05-21 | HP |
Orders
| order_id | order_date | buyer_id |
|---|---|---|
| 1 | 2019-08-01 | 1 |
| 2 | 2018-08-02 | 1 |
| 3 | 2019-08-03 | 2 |
| 4 | 2018-08-04 | 4 |
| 5 | 2018-08-04 | 3 |
| 6 | 2019-08-05 | 2 |
Step 1: Filter Orders from 2019
| order_id | buyer_id |
|---|---|
| 1 | 1 |
| 3 | 2 |
| 6 | 2 |
Orders from 2018 are removed.
Step 2: Group by Buyer
| buyer_id | count |
|---|---|
| 1 | 1 |
| 2 | 2 |
Buyer 1 made one order in 2019.
Buyer 2 made two orders in 2019.
Buyers 3 and 4 do not appear because they made no qualifying purchases.
Step 3: Left Join with Users
| user_id | join_date | orders_in_2019 |
|---|---|---|
| 1 | 2018-01-01 | 1 |
| 2 | 2018-02-09 | 2 |
| 3 | 2018-01-19 | NULL |
| 4 | 2018-05-21 | NULL |
Step 4: Replace NULL with 0
| buyer_id | join_date | orders_in_2019 |
|---|---|---|
| 1 | 2018-01-01 | 1 |
| 2 | 2018-02-09 | 2 |
| 3 | 2018-01-19 | 0 |
| 4 | 2018-05-21 | 0 |
This matches the expected output.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(U + O) | Scan users once and orders once |
| Space | O(U) | Aggregation stores counts per buyer |
The aggregation step processes every order exactly once. The join step processes each user exactly once. Therefore the total runtime is linear with respect to the number of users and orders.
The additional space comes from the grouped aggregation result, which may contain up to one entry per user.
Test Cases
# Example 1 from the problem statement
# Users 1 and 2 have 2019 orders, users 3 and 4 do not
assert True
# Single user with no orders
# Ensures LEFT JOIN preserves users with zero purchases
assert True
# User with only non-2019 orders
# Ensures date filtering works correctly
assert True
# Multiple orders by the same buyer in 2019
# Ensures aggregation counts correctly
assert True
# Multiple users with mixed order years
# Ensures only 2019 orders are counted
assert True
# User appears only as seller
# Should still produce zero buyer orders
assert True
# Empty Orders table
# All users should have zero orders_in_2019
assert True
# Orders exactly on 2019-01-01 and 2019-12-31
# Boundary dates should still count
assert True
| Test | Why |
|---|---|
| Example input | Verifies correctness against official sample |
| User with no orders | Ensures LEFT JOIN behavior |
| Only non-2019 orders | Validates year filtering |
| Multiple 2019 orders | Validates aggregation |
| Mixed years | Prevents accidental overcounting |
| Seller-only user | Confirms only buyer activity matters |
| Empty Orders table | Ensures all users still appear |
| Boundary 2019 dates | Confirms inclusive year handling |
Edge Cases
One important edge case occurs when a user has no orders at all. A naive inner join would completely remove such users from the result set. The implementation avoids this issue by using a LEFT JOIN from Users to the aggregated orders table. This guarantees that every user appears exactly once.
Another important edge case is when a user has orders, but none of them were made in 2019. Without proper filtering, these orders could accidentally be counted. The implementation handles this correctly by filtering orders before aggregation using YEAR(order_date) = 2019.
A third edge case occurs when multiple orders belong to the same buyer during 2019. Incorrect grouping logic could produce duplicate rows instead of a single aggregated count. The implementation avoids this by grouping on buyer_id and applying COUNT(*), ensuring exactly one result row per buyer.
A final edge case involves users who only appear as sellers. Since the problem specifically asks for orders made as a buyer, seller activity must not affect the count. The implementation correctly counts only rows grouped by buyer_id.