LeetCode 1069 - Product Sales Analysis II

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem asks us to calculate the total quantity sold for each product based on records in the Sales table.

The Sales table contains transaction level information. Each row represents a sale of a specific product in a given year, along with the quantity sold and the unit price. Since a product can appear in multiple rows, potentially across different years, we need to combine all those rows and compute the total quantity sold per product_id.

The Product table contains product names mapped to product_id, but the problem only asks us to report:

product_id
total_quantity

Since the output does not require product_name, we do not actually need to use the Product table in our solution.

In other words, the task is to group all rows in the Sales table by product_id and sum the quantity column for each group.

For example, if product 100 appears in multiple sales records with quantities 10 and 12, then its total quantity sold becomes:

$$10 + 12 = 22$$

The problem guarantees that (sale_id, year) is unique, meaning each sales record is distinct. Also, product_id in Sales is a valid foreign key into Product, so every referenced product exists.

An important detail is that only products with sales should appear in the result. Even if a product exists in the Product table, such as 300 in the example, if there are no matching sales records, it should not be included in the output.

A naive implementation could mistakenly include all products from the Product table or fail to aggregate multiple sales entries correctly. Another common mistake would be grouping by both product_id and year, which would incorrectly produce multiple rows per product instead of a single total.

Since this is a database aggregation problem, the expected solution is straightforward and efficient using SQL aggregation.

Approaches

Brute Force Approach

A brute force way to solve this problem would be to manually process every sales record and repeatedly scan the table to accumulate totals for each product.

For example, for every unique product_id, we could iterate through the entire Sales table and sum all matching quantities. While this produces the correct answer, it is inefficient because we repeatedly revisit the same rows.

If there are n sales records and m unique products, this could take O(n × m) time in the worst case.

The approach works because every sale record contributes exactly once to its corresponding product total, but the repeated scanning makes it unnecessarily expensive.

Optimal Approach

The key observation is that this is a classic grouping and aggregation problem.

We can process the Sales table in a single pass by grouping rows with the same product_id and summing the quantity values. SQL provides this directly through GROUP BY combined with the SUM() aggregate function.

This works because all rows for the same product belong to the same logical group, and summing quantities within each group gives the desired total quantity sold.

Since the Product table is not needed for the requested output, we avoid an unnecessary join and keep the query simple and efficient.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n × m)	O(m)	Repeatedly scans sales records for each product
Optimal	O(n)	O(m)	Single aggregation using `GROUP BY`

Where:

n = number of rows in Sales
m = number of distinct product_id values

Algorithm Walkthrough

Optimal Algorithm

Read all rows from the Sales table.

Each row contains a product_id and a quantity. Since multiple rows may correspond to the same product, we must combine them. 2. Group rows by product_id.

The purpose of grouping is to collect all sales records belonging to the same product into one logical bucket. 3. Compute the sum of quantity for each group.

Once rows are grouped, we calculate the total quantity sold for each product by adding together all quantities in that group. 4. Return the result.

The output should contain one row per product_id with the computed total_quantity.

The SQL implementation becomes very concise because relational databases are optimized for aggregation operations.

Why it works

The algorithm works because every sale record contributes exactly one quantity value to exactly one product_id. By grouping all rows belonging to the same product and summing their quantities, we ensure every sale is counted once and only once. Since no product outside the Sales table contributes any quantity, the output naturally contains only products that were actually sold.

Python Solution

LeetCode Database problems use SQL rather than Python, but if we model the logic programmatically, the equivalent implementation would look like this:

from collections import defaultdict
from typing import List

class Solution:
    def totalQuantity(self, sales: List[List[int]]) -> List[List[int]]:
        quantity_by_product = defaultdict(int)

        for _, product_id, _, quantity, _ in sales:
            quantity_by_product[product_id] += quantity

        return [
            [product_id, total_quantity]
            for product_id, total_quantity in quantity_by_product.items()
        ]

The implementation uses a hash map, specifically a defaultdict(int), to accumulate total quantities per product. As we iterate through the sales records, we extract the product_id and quantity, then add the quantity to that product's running total.

Because dictionary lookup and insertion are constant time on average, the aggregation runs efficiently in a single pass.

For the actual LeetCode submission, the expected answer is SQL:

SELECT
    product_id,
    SUM(quantity) AS total_quantity
FROM Sales
GROUP BY product_id;

The query groups rows sharing the same product_id and computes the total quantity sold using SUM(quantity). The alias total_quantity matches the required output format.

Go Solution

Since this is a database problem, Go is not used in the actual LeetCode submission. However, the equivalent aggregation logic in Go would look like this:

package main

func totalQuantity(sales [][]int) [][]int {
	productTotals := make(map[int]int)

	for _, sale := range sales {
		productID := sale[1]
		quantity := sale[3]

		productTotals[productID] += quantity
	}

	result := [][]int{}

	for productID, totalQuantity := range productTotals {
		result = append(result, []int{
			productID,
			totalQuantity,
		})
	}

	return result
}

The Go version uses a map[int]int to store the running total for each product_id. Maps in Go provide average O(1) insertion and lookup, making them a good fit for aggregation problems.

Unlike Python's defaultdict, Go maps automatically return the zero value for missing keys, so incrementing a nonexistent key works naturally. Since the problem allows returning results in any order, there is no need to sort the output.

For the actual LeetCode problem, the SQL solution remains:

SELECT
    product_id,
    SUM(quantity) AS total_quantity
FROM Sales
GROUP BY product_id;

Worked Examples

Example 1

Input:

Sales

sale_id	product_id	year	quantity	price
1	100	2008	10	5000
2	100	2009	12	5000
7	200	2011	15	9000

We maintain a running total for each product.

Step	Current Sale	Running Totals
1	`(100, 10)`	`{100: 10}`
2	`(100, 12)`	`{100: 22}`
3	`(200, 15)`	`{100: 22, 200: 15}`

Final result:

product_id	total_quantity
100	22
200	15

Notice that product 300 does not appear because it has no sales records.

Using SQL, the grouping works as follows:

product_id	Quantities Found	Sum
100	`10, 12`	22
200	`15`	15

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each sales row is processed once
Space	O(m)	Stores totals for distinct products

The algorithm scans the sales data one time and updates a running total for each product. Since every row contributes exactly once to a hash map entry, the time complexity is linear in the number of sales records.

The space complexity depends on how many distinct products appear in the sales table. In the worst case, every sale could involve a different product, requiring O(m) additional storage.

For the SQL solution, database engines typically implement GROUP BY using hashing or sorting internally, leading to near linear performance for aggregation workloads.

Test Cases

from collections import defaultdict

def total_quantity(sales):
    quantity_by_product = defaultdict(int)

    for _, product_id, _, quantity, _ in sales:
        quantity_by_product[product_id] += quantity

    return sorted(quantity_by_product.items())

# Example from problem statement
assert total_quantity([
    [1, 100, 2008, 10, 5000],
    [2, 100, 2009, 12, 5000],
    [7, 200, 2011, 15, 9000]
]) == [(100, 22), (200, 15)]  # multiple years per product

# Single sale record
assert total_quantity([
    [1, 500, 2020, 8, 100]
]) == [(500, 8)]  # minimum meaningful input

# Multiple products with one sale each
assert total_quantity([
    [1, 100, 2020, 5, 100],
    [2, 200, 2020, 10, 200],
    [3, 300, 2020, 15, 300]
]) == [(100, 5), (200, 10), (300, 15)]  # independent aggregation

# Same product across many years
assert total_quantity([
    [1, 100, 2018, 3, 50],
    [2, 100, 2019, 4, 50],
    [3, 100, 2020, 5, 50]
]) == [(100, 12)]  # aggregation across years

# Multiple products with repeated entries
assert total_quantity([
    [1, 10, 2020, 2, 100],
    [2, 20, 2021, 3, 200],
    [3, 10, 2022, 4, 100],
    [4, 20, 2023, 5, 200]
]) == [(10, 6), (20, 8)]  # repeated aggregation

Test	Why
Example input	Validates correctness against the official sample
Single sale record	Ensures smallest valid input works
Multiple unique products	Verifies separate grouping
Same product across years	Confirms aggregation ignores year
Repeated product entries	Ensures totals accumulate correctly

Edge Cases

Products with no sales

A product may exist in the Product table but never appear in Sales. This can easily lead to bugs if someone mistakenly performs a full join or iterates over every product. The problem only wants products that were actually sold. Since the implementation aggregates directly from the Sales table, products without sales are automatically excluded.

Multiple sales across different years

The same product can appear in several years. A common mistake would be grouping by both product_id and year, which would create multiple rows for the same product. Our implementation groups only by product_id, ensuring quantities across all years are combined correctly.

Multiple sales in the same year

A product might have several separate sales records within the same year. A naive solution could accidentally overwrite earlier quantities instead of summing them. Since we continuously add to a running total using SUM(quantity) in SQL or a hash map in code, every sale contributes correctly to the final total.