LeetCode 2292 - Products With Three or More Orders in Two Consecutive Years

This problem provides a table named Orders, where each row represents a single purchase event. Every order contains an orderid, a productid, the purchased quantity, and the purchasedate. The goal is to identify all products that satisfy two conditions simultaneously: 1.

LeetCode Problem 2292

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem provides a table named Orders, where each row represents a single purchase event. Every order contains an order_id, a product_id, the purchased quantity, and the purchase_date.

The goal is to identify all products that satisfy two conditions simultaneously:

  1. The product was ordered at least three times in one year.
  2. The same product was also ordered at least three times in the immediately following or preceding year.

The phrase "two consecutive years" is the key requirement. It means we are not looking for products that simply have three orders in multiple unrelated years. The years must differ by exactly one.

It is important to notice that the problem counts orders, not quantities. Even if a single order has a large quantity, it still counts as only one order occurrence. Therefore, the quantity column is irrelevant for solving the problem.

For example, if a product appears:

  • 3 times in 2020
  • 3 times in 2021

then it qualifies because 2020 and 2021 are consecutive years.

However, if a product appears:

  • 3 times in 2020
  • 3 times in 2022

then it does not qualify because the years are not consecutive.

The expected output is a table containing only the qualifying product_id values. The order of the output rows does not matter.

The input guarantees that order_id values are unique, so each row represents a distinct order event. Since the problem only asks whether yearly order counts reach at least three, we can aggregate orders by (product_id, year) and ignore the individual order details afterward.

Several edge cases are important to consider:

  • A product may have many orders in one year but fewer than three in adjacent years.
  • A product may qualify across multiple pairs of consecutive years.
  • A product may have exactly three orders in consecutive years, which should still qualify.
  • A product may have many orders spread across non-consecutive years, which should not qualify.
  • Multiple products may satisfy the condition simultaneously.

Approaches

Brute Force Approach

A brute force solution would examine every product independently and compare every pair of years for that product.

First, we would gather all orders for each product and count how many orders occurred in each year. Then, for every product, we would iterate through all year pairs and check whether:

  • both years have at least three orders
  • the years differ by exactly one

This approach is correct because it explicitly verifies the condition required by the problem. However, it performs unnecessary comparisons. If a product has many distinct years, checking every possible pair becomes inefficient.

For example, if a product appears across k years, comparing every pair requires O(k²) time for that product.

Optimal Approach

The key observation is that we only care about adjacent years. There is no need to compare every pair of years.

Instead, we can:

  1. Count the number of orders for each (product_id, year) pair.
  2. Keep only those pairs where the count is at least three.
  3. For each qualifying (product_id, year), check whether (product_id, year + 1) also exists.

This transforms the problem into a simple lookup problem using a hash set or grouped aggregation.

In SQL, this is naturally implemented using:

  • GROUP BY to count yearly orders
  • a self join to compare consecutive years

This approach avoids quadratic comparisons and efficiently checks only the necessary adjacent years.

Approach Time Complexity Space Complexity Notes
Brute Force O(n + k²) O(k) Compares all year pairs for each product
Optimal O(n) O(k) Uses grouped counts and direct consecutive-year lookup

Here:

  • n is the number of orders
  • k is the number of unique (product_id, year) pairs

Algorithm Walkthrough

  1. Extract the year from each purchase_date.

Since the condition is based on yearly order counts, we first convert each order date into its corresponding year. 2. Group records by (product_id, year).

For every product and year combination, count how many orders occurred during that year. 3. Filter groups with at least three orders.

We only care about years where a product was ordered three or more times. Any smaller count can never contribute to a valid answer. 4. Create a derived table containing only qualifying (product_id, year) pairs.

After filtering, each row represents a year in which a product satisfied the minimum order requirement. 5. Self join the derived table.

Join the table with itself using:

  • same product_id
  • consecutive years condition:

year1 + 1 = year2

This directly checks whether the product satisfied the requirement in adjacent years. 6. Return distinct product IDs.

A product may satisfy the condition across multiple consecutive year pairs, but the output should contain each product only once.

Why it works

The algorithm works because it first identifies every year in which a product independently satisfies the "three or more orders" condition. After that, it only needs to determine whether any two qualifying years are consecutive. The self join guarantees that only products with adjacent qualifying years are returned.

Python Solution

# complete solution here

import pandas as pd

def find_products(orders: pd.DataFrame) -> pd.DataFrame:
    # Extract year from purchase_date
    orders["year"] = pd.to_datetime(orders["purchase_date"]).dt.year

    # Count orders per product per year
    yearly_counts = (
        orders.groupby(["product_id", "year"])
        .size()
        .reset_index(name="order_count")
    )

    # Keep only years with at least 3 orders
    qualified_years = yearly_counts[yearly_counts["order_count"] >= 3]

    # Self join on consecutive years
    merged = qualified_years.merge(
        qualified_years,
        left_on=["product_id", "year"],
        right_on=["product_id", "year"],
        suffixes=("_left", "_right")
    )

    # Filter consecutive years
    result = merged[
        merged["year_right"] == merged["year_left"] + 1
    ]

    # Return distinct product IDs
    return result[["product_id"]].drop_duplicates()

The implementation begins by extracting the year component from purchase_date. This allows the solution to aggregate data at the yearly level.

Next, the code groups records by both product_id and year. The .size() operation counts how many orders belong to each group.

After computing yearly counts, the solution filters out any groups with fewer than three orders because those years cannot contribute to a valid answer.

The self join is the core step. It matches qualifying years belonging to the same product. The final filter checks whether the second year is exactly one greater than the first year.

Finally, duplicate product IDs are removed before returning the result.

Go Solution

// complete solution here

SELECT DISTINCT t1.product_id
FROM (
    SELECT
        product_id,
        YEAR(purchase_date) AS year,
        COUNT(*) AS order_count
    FROM Orders
    GROUP BY product_id, YEAR(purchase_date)
    HAVING COUNT(*) >= 3
) t1
JOIN (
    SELECT
        product_id,
        YEAR(purchase_date) AS year,
        COUNT(*) AS order_count
    FROM Orders
    GROUP BY product_id, YEAR(purchase_date)
    HAVING COUNT(*) >= 3
) t2
ON t1.product_id = t2.product_id
AND t1.year + 1 = t2.year;

This problem is a database problem, so the "Go solution" section is actually implemented using SQL. The query uses two identical derived tables and joins them on:

  • the same product
  • consecutive years

The DISTINCT keyword ensures that each qualifying product appears only once in the final result.

Worked Examples

Example 1

Input:

order_id product_id purchase_date
1 1 2020-03-16
2 1 2020-12-02
3 1 2020-05-10
4 1 2021-12-23
5 1 2021-05-21
6 1 2021-10-11
7 2 2022-10-11

Step 1, Extract Years

product_id year
1 2020
1 2020
1 2020
1 2021
1 2021
1 2021
2 2022

Step 2, Count Orders Per Product Per Year

product_id year order_count
1 2020 3
1 2021 3
2 2022 1

Step 3, Keep Only Counts >= 3

product_id year
1 2020
1 2021

Step 4, Check Consecutive Years

We compare:

  • (1, 2020) with (1, 2021)

Since 2021 = 2020 + 1, product 1 qualifies.

Final Result

product_id
1

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each order is processed a constant number of times
Space O(k) Stores grouped yearly counts

Here:

  • n is the number of rows in Orders
  • k is the number of unique (product_id, year) pairs

The grouping operation processes every row once. The self join only operates on aggregated yearly records, which is significantly smaller than the original dataset in practice.

Test Cases

# test cases

import pandas as pd

def run_tests():
    # Example case
    orders = pd.DataFrame({
        "order_id": [1,2,3,4,5,6,7],
        "product_id": [1,1,1,1,1,1,2],
        "quantity": [7,4,7,6,5,6,6],
        "purchase_date": [
            "2020-03-16",
            "2020-12-02",
            "2020-05-10",
            "2021-12-23",
            "2021-05-21",
            "2021-10-11",
            "2022-10-11"
        ]
    })

    result = find_products(orders)
    assert set(result["product_id"]) == {1}  # provided example

    # Exactly three orders in consecutive years
    orders = pd.DataFrame({
        "order_id": [1,2,3,4,5,6],
        "product_id": [1,1,1,1,1,1],
        "quantity": [1,1,1,1,1,1],
        "purchase_date": [
            "2020-01-01",
            "2020-02-01",
            "2020-03-01",
            "2021-01-01",
            "2021-02-01",
            "2021-03-01"
        ]
    })

    result = find_products(orders)
    assert set(result["product_id"]) == {1}  # minimum valid case

    # Non-consecutive years
    orders = pd.DataFrame({
        "order_id": [1,2,3,4,5,6],
        "product_id": [1,1,1,1,1,1],
        "quantity": [1,1,1,1,1,1],
        "purchase_date": [
            "2020-01-01",
            "2020-02-01",
            "2020-03-01",
            "2022-01-01",
            "2022-02-01",
            "2022-03-01"
        ]
    })

    result = find_products(orders)
    assert result.empty  # years are not consecutive

    # Multiple valid products
    orders = pd.DataFrame({
        "order_id": list(range(1, 13)),
        "product_id": [1]*6 + [2]*6,
        "quantity": [1]*12,
        "purchase_date": [
            "2020-01-01","2020-02-01","2020-03-01",
            "2021-01-01","2021-02-01","2021-03-01",
            "2021-01-01","2021-02-01","2021-03-01",
            "2022-01-01","2022-02-01","2022-03-01"
        ]
    })

    result = find_products(orders)
    assert set(result["product_id"]) == {1, 2}  # multiple answers

    # Large counts but one year below threshold
    orders = pd.DataFrame({
        "order_id": [1,2,3,4,5],
        "product_id": [1,1,1,1,1],
        "quantity": [1,1,1,1,1],
        "purchase_date": [
            "2020-01-01",
            "2020-02-01",
            "2020-03-01",
            "2021-01-01",
            "2021-02-01"
        ]
    })

    result = find_products(orders)
    assert result.empty  # second year has only two orders

run_tests()
Test Why
Provided example Validates the core problem requirement
Exactly three orders Tests the minimum qualifying threshold
Non-consecutive years Ensures adjacency is required
Multiple valid products Confirms multiple answers are handled
One year below threshold Verifies filtering logic

Edge Cases

One important edge case occurs when a product has many orders across several years, but none of the qualifying years are consecutive. For example, a product might have three orders in 2019 and three orders in 2021. A naive implementation that only checks for multiple qualifying years could incorrectly include it. The self join condition year + 1 prevents this mistake.

Another edge case involves products with exactly three orders in each consecutive year. Some implementations mistakenly use > instead of >= during filtering. This would incorrectly reject valid products. The solution explicitly uses HAVING COUNT(*) >= 3, ensuring that the threshold is inclusive.

A third important case occurs when a product qualifies across multiple consecutive year pairs. For example, a product may have:

  • 3 orders in 2020
  • 3 orders in 2021
  • 3 orders in 2022

This produces two valid pairs:

  • (2020, 2021)
  • (2021, 2022)

Without DISTINCT, the product could appear multiple times in the result. The solution handles this correctly by removing duplicates before returning the final output.