LeetCode 1070 - Product Sales Analysis III

The problem asks us to analyze sales data stored in a Sales table and extract the records corresponding to the first year each product was sold.

LeetCode Problem 1070

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem asks us to analyze sales data stored in a Sales table and extract the records corresponding to the first year each product was sold. Specifically, for each product_id, we need to determine the earliest year in which it appears in the table and return all sales records for that product in that year.

The input is a relational table with columns: sale_id, product_id, year, quantity, and price. Each row represents a specific sale of a product in a given year. Note that multiple sales of the same product can exist in the same year. The primary key is (sale_id, year), ensuring uniqueness of each sale record.

The expected output is a table with columns: product_id, first_year, quantity, and price, showing all sales of each product in the first year it was sold. There are no constraints on the order of rows in the output.

Important edge cases include: a product with multiple sales in its first year, a product appearing in non-consecutive years, or multiple products sold in the same earliest year. The input guarantees non-null values and a valid primary key combination, which simplifies deduplication concerns.

Approaches

The brute-force approach would involve scanning the Sales table for each product to identify the minimum year and then retrieving all sales for that product in that year. This approach is correct because it directly follows the problem definition, but it is inefficient because it may require multiple full table scans, resulting in potentially O(n^2) time complexity for n rows.

The optimal approach leverages SQL aggregation and joins. First, we group the Sales table by product_id and compute the minimum year for each product. Then, we join this result back to the original Sales table to select all records where the year matches the first year. This method reduces the number of table scans and uses set-based operations that databases optimize efficiently.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n) For each product, scan all rows to find the minimum year, then scan again to fetch matching rows
Optimal O(n) O(n) Use GROUP BY to compute first year per product, then join/filter to get matching rows

Algorithm Walkthrough

  1. Identify the first year each product was sold by grouping the Sales table by product_id and computing the minimum year.
  2. Store this result as a temporary table or derived table, containing product_id and first_year.
  3. Join the temporary table back with the Sales table on product_id and year = first_year.
  4. Select the required columns: product_id, first_year, quantity, and price.
  5. Return the final table.

Why it works: By computing the minimum year for each product and joining it back, we ensure that we select all sales entries corresponding to the earliest sale year for each product. The join ensures no other years are included, and the aggregation guarantees correctness.

Python Solution

import sqlite3
from typing import List, Tuple

def product_sales_first_year(cursor: sqlite3.Cursor) -> List[Tuple[int, int, int, int]]:
    query = """
    WITH FirstYear AS (
        SELECT product_id, MIN(year) AS first_year
        FROM Sales
        GROUP BY product_id
    )
    SELECT s.product_id, f.first_year, s.quantity, s.price
    FROM Sales s
    JOIN FirstYear f
    ON s.product_id = f.product_id AND s.year = f.first_year;
    """
    cursor.execute(query)
    return cursor.fetchall()

This implementation first computes the first year each product was sold using a common table expression (WITH FirstYear). Then it joins back with the original Sales table to filter rows matching the first year. The fetchall() returns all required columns as tuples.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
)

func ProductSalesFirstYear(db *sql.DB) (*sql.Rows, error) {
    query := `
    WITH FirstYear AS (
        SELECT product_id, MIN(year) AS first_year
        FROM Sales
        GROUP BY product_id
    )
    SELECT s.product_id, f.first_year, s.quantity, s.price
    FROM Sales s
    JOIN FirstYear f
    ON s.product_id = f.product_id AND s.year = f.first_year;
    `
    return db.Query(query)
}

The Go implementation follows the same logic as Python but uses db.Query to execute the SQL. The *sql.Rows type allows iteration over results. Go requires an explicit database driver and connection, unlike Python’s in-memory sqlite3.

Worked Examples

Consider the input:

+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+ 
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |

Step 1: Compute first year per product.

+------------+------------+
| product_id | first_year |
+------------+------------+
| 100        | 2008       |
| 200        | 2011       |

Step 2: Join with Sales table on product_id and year = first_year.

+------------+------------+----------+-------+
| product_id | first_year | quantity | price |
+------------+------------+----------+-------+
| 100        | 2008       | 10       | 5000  |
| 200        | 2011       | 15       | 9000  |

Complexity Analysis

Measure Complexity Explanation
Time O(n) Scans the table once to compute min year and once to join/filter results
Space O(n) Stores first year per product temporarily, proportional to number of unique products

The complexity is efficient because we avoid repeated scans of the table for each product and leverage SQL set-based operations.

Test Cases

# Test case 1: provided example
assert product_sales_first_year(cursor) == [(100, 2008, 10, 5000), (200, 2011, 15, 9000)]

# Test case 2: multiple sales in the first year
cursor.execute("""
INSERT INTO Sales VALUES (3, 100, 2008, 5, 5000)
""")
assert sorted(product_sales_first_year(cursor)) == sorted([(100, 2008, 10, 5000), (100, 2008, 5, 5000), (200, 2011, 15, 9000)])

# Test case 3: product sold only in one year
cursor.execute("""
INSERT INTO Sales VALUES (4, 300, 2012, 20, 7000)
""")
assert (300, 2012, 20, 7000) in product_sales_first_year(cursor)

# Test case 4: multiple products same first year
cursor.execute("""
INSERT INTO Sales VALUES (5, 400, 2008, 8, 6000)
""")
assert (400, 2008, 8, 6000) in product_sales_first_year(cursor)
Test Why
Single product first year Validates baseline case
Multiple sales in first year Ensures all entries are included
Product sold in one year Edge case for single-year product
Multiple products same first year Checks handling multiple products concurrently

Edge Cases

One important edge case is when a product has multiple entries in its first year. A naive query using LIMIT 1 or similar could miss some rows. Our solution handles this by joining on year = first_year and not limiting results.

Another edge case is when multiple products share the same earliest year. Some solutions may accidentally group by year instead of product, leading to incorrect filtering. By computing MIN(year) per product_id, we correctly handle each product independently.

A third edge case is when a product only appears once in the table. Our algorithm naturally includes it because the MIN(year) is the year of that single entry, and the join returns it without special handling.