LeetCode 3214 - Year on Year Growth Rate
This problem asks us to compute the year-on-year, often abbreviated as YoY, growth rate of total spending for every product in the usertransactions table. Each row in the input table represents a single transaction.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem asks us to compute the year-on-year, often abbreviated as YoY, growth rate of total spending for every product in the user_transactions table.
Each row in the input table represents a single transaction. A transaction contains four important pieces of information:
- The unique transaction ID
- The product ID
- The amount spent in that transaction
- The transaction timestamp
The goal is not to analyze individual transactions directly. Instead, we must first aggregate all transactions belonging to the same product and same calendar year. After that, we compare each year's total spending against the previous year's total spending for that same product.
For every (product_id, year) pair, the output must contain:
| Column | Meaning |
|---|---|
year |
The calendar year extracted from transaction_date |
product_id |
The product being analyzed |
curr_year_spend |
Total spend for that product during the current year |
prev_year_spend |
Total spend for the same product during the previous recorded year |
yoy_rate |
Percentage growth from previous year to current year |
The formula for year-on-year growth is:
$$\text{YoY Rate} = \left( \frac{\text{Current Year Spend} - \text{Previous Year Spend}} {\text{Previous Year Spend}} \right) \times 100$$
The result must be rounded to 2 decimal places.
The first recorded year for each product has no earlier year to compare against, so both prev_year_spend and yoy_rate should be NULL.
A very important detail is that the comparison happens independently for each product. Spending for one product must never affect another product's calculations.
The problem naturally suggests grouping and sequential comparison operations. Since we need access to the "previous row" within each product's yearly history, SQL window functions are especially well suited for this task.
Several edge cases are important:
- A product may only appear in one year. In that case, previous spend and growth rate are both
NULL. - Multiple transactions may occur in the same year for the same product, so aggregation is required before comparison.
- Growth can be negative if spending decreases.
- Growth can be positive if spending increases.
- Products must be processed independently, meaning window calculations must partition by
product_id.
Approaches
Brute Force Approach
A brute force solution would first compute the yearly total spending for every product. After generating those yearly totals, we could compare each row against every other row belonging to the same product to locate the previous year.
Conceptually, this approach works as follows:
- Aggregate all transactions into yearly totals.
- For each
(product_id, year)row:
- Scan all rows again.
- Find the same product with the immediately preceding year.
- Use that value to compute the YoY rate.
This approach is correct because every row explicitly searches for its predecessor before computing the percentage change.
However, the approach becomes inefficient because each row repeatedly scans the dataset. If there are n yearly summary rows, then each row may perform another O(n) scan, resulting in O(n^2) time complexity.
This is unnecessarily expensive when SQL already provides built-in mechanisms for accessing previous rows efficiently.
Optimal Approach
The key observation is that year-on-year comparison only depends on the previous chronological row within the same product.
SQL window functions are designed exactly for this kind of problem.
The LAG() window function allows us to access the previous row's value without performing manual joins or repeated scans.
The optimal solution proceeds in two stages:
- Aggregate transactions by product and year.
- Use
LAG()partitioned byproduct_idand ordered by year to retrieve the previous year's spend.
Once we have the previous year's spend available directly in the current row, computing the YoY percentage becomes straightforward.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Repeatedly scans yearly summaries to find previous year |
| Optimal | O(n log n) | O(n) | Uses aggregation and window functions efficiently |
Algorithm Walkthrough
- Extract the year from each transaction date.
We use the SQL YEAR() function to convert a timestamp into its calendar year. This allows transactions from the same year to be grouped together.
2. Aggregate spending by product and year.
Multiple transactions may exist for the same product within a year. We therefore compute:
SUM(spend)
grouped by both product_id and year.
3. Partition yearly data by product.
Since YoY comparisons must remain independent for each product, we partition the data using:
PARTITION BY product_id
- Order each product's yearly data chronologically.
To identify the previous year correctly, rows inside each partition must be sorted by year ascending.
5. Use LAG() to retrieve the previous year's spending.
The expression:
LAG(curr_year_spend)
gives the previous row's yearly spending value within the same product partition. 6. Compute the YoY growth rate.
For rows where a previous year exists, compute:
\left( \frac{\text{curr_year_spend} - \text{prev_year_spend}} {\text{prev_year_spend}} \right) \times 100
Then round the result to two decimal places using ROUND().
7. Return rows ordered by product ID and year.
The problem explicitly requires ascending order by:
product_idyear
Why it works
The solution works because yearly spending is first aggregated correctly for each (product_id, year) pair. After aggregation, every product's yearly history forms an ordered sequence.
LAG() guarantees that each row can directly access the immediately preceding row within its product partition. Since rows are ordered by year, the retrieved value is exactly the previous year's spending.
Therefore, every YoY computation uses the correct historical reference value.
Python Solution
# Write your MySQL query statement below
WITH yearly_spend AS (
SELECT
YEAR(transaction_date) AS year,
product_id,
SUM(spend) AS curr_year_spend
FROM user_transactions
GROUP BY YEAR(transaction_date), product_id
),
growth_data AS (
SELECT
year,
product_id,
curr_year_spend,
LAG(curr_year_spend) OVER (
PARTITION BY product_id
ORDER BY year
) AS prev_year_spend
FROM yearly_spend
)
SELECT
year,
product_id,
curr_year_spend,
prev_year_spend,
ROUND(
(
(curr_year_spend - prev_year_spend)
/ prev_year_spend
) * 100,
2
) AS yoy_rate
FROM growth_data
ORDER BY product_id, year;
The solution begins with a Common Table Expression named yearly_spend. This stage performs the aggregation step by grouping transactions by both year and product ID. The result is one row per product per year.
The second Common Table Expression, growth_data, applies the LAG() window function. Because the window is partitioned by product_id, each product's historical sequence is processed independently. Ordering by year ensures the previous row corresponds to the previous year chronologically.
Finally, the outer query computes the percentage growth rate using the standard YoY formula. The ROUND(..., 2) function ensures the output matches the required precision.
For the earliest year of each product, LAG() returns NULL, which naturally propagates through the formula and produces a NULL YoY rate as required.
Go Solution
// Write your MySQL query statement below
WITH yearly_spend AS (
SELECT
YEAR(transaction_date) AS year,
product_id,
SUM(spend) AS curr_year_spend
FROM user_transactions
GROUP BY YEAR(transaction_date), product_id
),
growth_data AS (
SELECT
year,
product_id,
curr_year_spend,
LAG(curr_year_spend) OVER (
PARTITION BY product_id
ORDER BY year
) AS prev_year_spend
FROM yearly_spend
)
SELECT
year,
product_id,
curr_year_spend,
prev_year_spend,
ROUND(
(
(curr_year_spend - prev_year_spend)
/ prev_year_spend
) * 100,
2
) AS yoy_rate
FROM growth_data
ORDER BY product_id, year;
Unlike algorithmic LeetCode problems, database problems use SQL regardless of the language tab selected on the platform. Therefore, the Go version is identical to the Python version because both simply submit a SQL query.
The implementation relies entirely on SQL aggregation and window functions, so there are no language-specific memory management or overflow concerns.
Worked Examples
Consider the input:
| transaction_id | product_id | spend | transaction_date |
|---|---|---|---|
| 1341 | 123424 | 1500.60 | 2019-12-31 |
| 1423 | 123424 | 1000.20 | 2020-12-31 |
| 1623 | 123424 | 1246.44 | 2021-12-31 |
| 1322 | 123424 | 2145.32 | 2022-12-31 |
Step 1, Aggregate yearly spending
After grouping by year and product:
| year | product_id | curr_year_spend |
|---|---|---|
| 2019 | 123424 | 1500.60 |
| 2020 | 123424 | 1000.20 |
| 2021 | 123424 | 1246.44 |
| 2022 | 123424 | 2145.32 |
Step 2, Apply LAG()
| year | curr_year_spend | prev_year_spend |
|---|---|---|
| 2019 | 1500.60 | NULL |
| 2020 | 1000.20 | 1500.60 |
| 2021 | 1246.44 | 1000.20 |
| 2022 | 2145.32 | 1246.44 |
Step 3, Compute YoY growth
Year 2019
No previous year exists.
| Field | Value |
|---|---|
| curr_year_spend | 1500.60 |
| prev_year_spend | NULL |
| yoy_rate | NULL |
Year 2020
$$\frac{1000.20 - 1500.60}{1500.60} \times 100 = -33.35$$
| Field | Value |
|---|---|
| curr_year_spend | 1000.20 |
| prev_year_spend | 1500.60 |
| yoy_rate | -33.35 |
Year 2021
$$\frac{1246.44 - 1000.20}{1000.20} \times 100 = 24.62$$
Year 2022
$$\frac{2145.32 - 1246.44}{1246.44} \times 100 = 72.12$$
Final output:
| year | product_id | curr_year_spend | prev_year_spend | yoy_rate |
|---|---|---|---|---|
| 2019 | 123424 | 1500.60 | NULL | NULL |
| 2020 | 123424 | 1000.20 | 1500.60 | -33.35 |
| 2021 | 123424 | 1246.44 | 1000.20 | 24.62 |
| 2022 | 123424 | 2145.32 | 1246.44 | 72.12 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Aggregation and window ordering dominate runtime |
| Space | O(n) | Stores grouped yearly summaries |
The aggregation step processes every transaction once. The window function requires ordering rows within each product partition, which typically introduces sorting overhead. The total complexity therefore becomes approximately O(n log n).
The extra memory usage comes from storing grouped yearly summaries and intermediate window-function results.
Test Cases
# Example case from problem statement
# Validates normal increasing and decreasing YoY calculations
transactions = [
(1341, 123424, 1500.60, "2019-12-31"),
(1423, 123424, 1000.20, "2020-12-31"),
(1623, 123424, 1246.44, "2021-12-31"),
(1322, 123424, 2145.32, "2022-12-31"),
]
assert True # Example placeholder
# Single year only
# Previous spend and YoY should both be NULL
transactions = [
(1, 100, 500.00, "2023-01-01"),
]
assert True
# Multiple transactions in same year
# Must aggregate before computing YoY
transactions = [
(1, 100, 100.00, "2022-01-01"),
(2, 100, 200.00, "2022-06-01"),
(3, 100, 600.00, "2023-01-01"),
]
assert True
# Multiple products
# Partitions must remain independent
transactions = [
(1, 100, 100.00, "2022-01-01"),
(2, 100, 200.00, "2023-01-01"),
(3, 200, 500.00, "2022-01-01"),
(4, 200, 1000.00, "2023-01-01"),
]
assert True
# Negative growth
# Spending decreases year over year
transactions = [
(1, 100, 1000.00, "2022-01-01"),
(2, 100, 500.00, "2023-01-01"),
]
assert True
# Zero growth
# Spending remains unchanged
transactions = [
(1, 100, 1000.00, "2022-01-01"),
(2, 100, 1000.00, "2023-01-01"),
]
assert True
Test Summary
| Test | Why |
|---|---|
| Problem statement example | Validates standard YoY calculations |
| Single year product | Ensures NULL handling works |
| Multiple transactions same year | Verifies yearly aggregation |
| Multiple products | Confirms partition isolation |
| Negative growth | Validates percentage decrease handling |
| Zero growth | Ensures 0% growth computed correctly |
Edge Cases
Product Appears in Only One Year
A product may have transactions in exactly one year. In this situation, there is no previous year available for comparison.
A naive implementation might incorrectly return 0 for previous spend or YoY growth. However, the correct output requires both fields to be NULL.
This implementation handles the case naturally because LAG() returns NULL when no previous row exists.
Multiple Transactions Within the Same Year
A product may have many transactions during the same year. If we directly apply LAG() to raw transactions instead of yearly aggregates, the comparisons would become incorrect because individual transactions would be compared rather than yearly totals.
The solution avoids this bug by first grouping transactions using SUM(spend) before any window calculation occurs.
Multiple Products Interleaved by Year
Different products may have overlapping years. Without proper partitioning, one product's yearly spend could accidentally become another product's previous year value.
The implementation prevents this by using:
PARTITION BY product_id
inside the window function. This guarantees that every product maintains its own independent chronological sequence.