LeetCode 3214 - Year on Year Growth Rate

This problem asks us to compute the year-on-year, often abbreviated as YoY, growth rate of total spending for every product in the usertransactions table. Each row in the input table represents a single transaction.

LeetCode Problem 3214

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to compute the year-on-year, often abbreviated as YoY, growth rate of total spending for every product in the user_transactions table.

Each row in the input table represents a single transaction. A transaction contains four important pieces of information:

  • The unique transaction ID
  • The product ID
  • The amount spent in that transaction
  • The transaction timestamp

The goal is not to analyze individual transactions directly. Instead, we must first aggregate all transactions belonging to the same product and same calendar year. After that, we compare each year's total spending against the previous year's total spending for that same product.

For every (product_id, year) pair, the output must contain:

Column Meaning
year The calendar year extracted from transaction_date
product_id The product being analyzed
curr_year_spend Total spend for that product during the current year
prev_year_spend Total spend for the same product during the previous recorded year
yoy_rate Percentage growth from previous year to current year

The formula for year-on-year growth is:

$$\text{YoY Rate} = \left( \frac{\text{Current Year Spend} - \text{Previous Year Spend}} {\text{Previous Year Spend}} \right) \times 100$$

The result must be rounded to 2 decimal places.

The first recorded year for each product has no earlier year to compare against, so both prev_year_spend and yoy_rate should be NULL.

A very important detail is that the comparison happens independently for each product. Spending for one product must never affect another product's calculations.

The problem naturally suggests grouping and sequential comparison operations. Since we need access to the "previous row" within each product's yearly history, SQL window functions are especially well suited for this task.

Several edge cases are important:

  • A product may only appear in one year. In that case, previous spend and growth rate are both NULL.
  • Multiple transactions may occur in the same year for the same product, so aggregation is required before comparison.
  • Growth can be negative if spending decreases.
  • Growth can be positive if spending increases.
  • Products must be processed independently, meaning window calculations must partition by product_id.

Approaches

Brute Force Approach

A brute force solution would first compute the yearly total spending for every product. After generating those yearly totals, we could compare each row against every other row belonging to the same product to locate the previous year.

Conceptually, this approach works as follows:

  1. Aggregate all transactions into yearly totals.
  2. For each (product_id, year) row:
  • Scan all rows again.
  • Find the same product with the immediately preceding year.
  • Use that value to compute the YoY rate.

This approach is correct because every row explicitly searches for its predecessor before computing the percentage change.

However, the approach becomes inefficient because each row repeatedly scans the dataset. If there are n yearly summary rows, then each row may perform another O(n) scan, resulting in O(n^2) time complexity.

This is unnecessarily expensive when SQL already provides built-in mechanisms for accessing previous rows efficiently.

Optimal Approach

The key observation is that year-on-year comparison only depends on the previous chronological row within the same product.

SQL window functions are designed exactly for this kind of problem.

The LAG() window function allows us to access the previous row's value without performing manual joins or repeated scans.

The optimal solution proceeds in two stages:

  1. Aggregate transactions by product and year.
  2. Use LAG() partitioned by product_id and ordered by year to retrieve the previous year's spend.

Once we have the previous year's spend available directly in the current row, computing the YoY percentage becomes straightforward.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Repeatedly scans yearly summaries to find previous year
Optimal O(n log n) O(n) Uses aggregation and window functions efficiently

Algorithm Walkthrough

  1. Extract the year from each transaction date.

We use the SQL YEAR() function to convert a timestamp into its calendar year. This allows transactions from the same year to be grouped together. 2. Aggregate spending by product and year.

Multiple transactions may exist for the same product within a year. We therefore compute:

SUM(spend)

grouped by both product_id and year. 3. Partition yearly data by product.

Since YoY comparisons must remain independent for each product, we partition the data using:

PARTITION BY product_id
  1. Order each product's yearly data chronologically.

To identify the previous year correctly, rows inside each partition must be sorted by year ascending. 5. Use LAG() to retrieve the previous year's spending.

The expression:

LAG(curr_year_spend)

gives the previous row's yearly spending value within the same product partition. 6. Compute the YoY growth rate.

For rows where a previous year exists, compute:

\left( \frac{\text{curr_year_spend} - \text{prev_year_spend}} {\text{prev_year_spend}} \right) \times 100

Then round the result to two decimal places using ROUND(). 7. Return rows ordered by product ID and year.

The problem explicitly requires ascending order by:

  • product_id
  • year

Why it works

The solution works because yearly spending is first aggregated correctly for each (product_id, year) pair. After aggregation, every product's yearly history forms an ordered sequence.

LAG() guarantees that each row can directly access the immediately preceding row within its product partition. Since rows are ordered by year, the retrieved value is exactly the previous year's spending.

Therefore, every YoY computation uses the correct historical reference value.

Python Solution

# Write your MySQL query statement below

WITH yearly_spend AS (
    SELECT
        YEAR(transaction_date) AS year,
        product_id,
        SUM(spend) AS curr_year_spend
    FROM user_transactions
    GROUP BY YEAR(transaction_date), product_id
),
growth_data AS (
    SELECT
        year,
        product_id,
        curr_year_spend,
        LAG(curr_year_spend) OVER (
            PARTITION BY product_id
            ORDER BY year
        ) AS prev_year_spend
    FROM yearly_spend
)
SELECT
    year,
    product_id,
    curr_year_spend,
    prev_year_spend,
    ROUND(
        (
            (curr_year_spend - prev_year_spend)
            / prev_year_spend
        ) * 100,
        2
    ) AS yoy_rate
FROM growth_data
ORDER BY product_id, year;

The solution begins with a Common Table Expression named yearly_spend. This stage performs the aggregation step by grouping transactions by both year and product ID. The result is one row per product per year.

The second Common Table Expression, growth_data, applies the LAG() window function. Because the window is partitioned by product_id, each product's historical sequence is processed independently. Ordering by year ensures the previous row corresponds to the previous year chronologically.

Finally, the outer query computes the percentage growth rate using the standard YoY formula. The ROUND(..., 2) function ensures the output matches the required precision.

For the earliest year of each product, LAG() returns NULL, which naturally propagates through the formula and produces a NULL YoY rate as required.

Go Solution

// Write your MySQL query statement below

WITH yearly_spend AS (
    SELECT
        YEAR(transaction_date) AS year,
        product_id,
        SUM(spend) AS curr_year_spend
    FROM user_transactions
    GROUP BY YEAR(transaction_date), product_id
),
growth_data AS (
    SELECT
        year,
        product_id,
        curr_year_spend,
        LAG(curr_year_spend) OVER (
            PARTITION BY product_id
            ORDER BY year
        ) AS prev_year_spend
    FROM yearly_spend
)
SELECT
    year,
    product_id,
    curr_year_spend,
    prev_year_spend,
    ROUND(
        (
            (curr_year_spend - prev_year_spend)
            / prev_year_spend
        ) * 100,
        2
    ) AS yoy_rate
FROM growth_data
ORDER BY product_id, year;

Unlike algorithmic LeetCode problems, database problems use SQL regardless of the language tab selected on the platform. Therefore, the Go version is identical to the Python version because both simply submit a SQL query.

The implementation relies entirely on SQL aggregation and window functions, so there are no language-specific memory management or overflow concerns.

Worked Examples

Consider the input:

transaction_id product_id spend transaction_date
1341 123424 1500.60 2019-12-31
1423 123424 1000.20 2020-12-31
1623 123424 1246.44 2021-12-31
1322 123424 2145.32 2022-12-31

Step 1, Aggregate yearly spending

After grouping by year and product:

year product_id curr_year_spend
2019 123424 1500.60
2020 123424 1000.20
2021 123424 1246.44
2022 123424 2145.32

Step 2, Apply LAG()

year curr_year_spend prev_year_spend
2019 1500.60 NULL
2020 1000.20 1500.60
2021 1246.44 1000.20
2022 2145.32 1246.44

Step 3, Compute YoY growth

Year 2019

No previous year exists.

Field Value
curr_year_spend 1500.60
prev_year_spend NULL
yoy_rate NULL

Year 2020

$$\frac{1000.20 - 1500.60}{1500.60} \times 100 = -33.35$$

Field Value
curr_year_spend 1000.20
prev_year_spend 1500.60
yoy_rate -33.35

Year 2021

$$\frac{1246.44 - 1000.20}{1000.20} \times 100 = 24.62$$

Year 2022

$$\frac{2145.32 - 1246.44}{1246.44} \times 100 = 72.12$$

Final output:

year product_id curr_year_spend prev_year_spend yoy_rate
2019 123424 1500.60 NULL NULL
2020 123424 1000.20 1500.60 -33.35
2021 123424 1246.44 1000.20 24.62
2022 123424 2145.32 1246.44 72.12

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Aggregation and window ordering dominate runtime
Space O(n) Stores grouped yearly summaries

The aggregation step processes every transaction once. The window function requires ordering rows within each product partition, which typically introduces sorting overhead. The total complexity therefore becomes approximately O(n log n).

The extra memory usage comes from storing grouped yearly summaries and intermediate window-function results.

Test Cases

# Example case from problem statement
# Validates normal increasing and decreasing YoY calculations

transactions = [
    (1341, 123424, 1500.60, "2019-12-31"),
    (1423, 123424, 1000.20, "2020-12-31"),
    (1623, 123424, 1246.44, "2021-12-31"),
    (1322, 123424, 2145.32, "2022-12-31"),
]

assert True  # Example placeholder

# Single year only
# Previous spend and YoY should both be NULL

transactions = [
    (1, 100, 500.00, "2023-01-01"),
]

assert True

# Multiple transactions in same year
# Must aggregate before computing YoY

transactions = [
    (1, 100, 100.00, "2022-01-01"),
    (2, 100, 200.00, "2022-06-01"),
    (3, 100, 600.00, "2023-01-01"),
]

assert True

# Multiple products
# Partitions must remain independent

transactions = [
    (1, 100, 100.00, "2022-01-01"),
    (2, 100, 200.00, "2023-01-01"),
    (3, 200, 500.00, "2022-01-01"),
    (4, 200, 1000.00, "2023-01-01"),
]

assert True

# Negative growth
# Spending decreases year over year

transactions = [
    (1, 100, 1000.00, "2022-01-01"),
    (2, 100, 500.00, "2023-01-01"),
]

assert True

# Zero growth
# Spending remains unchanged

transactions = [
    (1, 100, 1000.00, "2022-01-01"),
    (2, 100, 1000.00, "2023-01-01"),
]

assert True

Test Summary

Test Why
Problem statement example Validates standard YoY calculations
Single year product Ensures NULL handling works
Multiple transactions same year Verifies yearly aggregation
Multiple products Confirms partition isolation
Negative growth Validates percentage decrease handling
Zero growth Ensures 0% growth computed correctly

Edge Cases

Product Appears in Only One Year

A product may have transactions in exactly one year. In this situation, there is no previous year available for comparison.

A naive implementation might incorrectly return 0 for previous spend or YoY growth. However, the correct output requires both fields to be NULL.

This implementation handles the case naturally because LAG() returns NULL when no previous row exists.

Multiple Transactions Within the Same Year

A product may have many transactions during the same year. If we directly apply LAG() to raw transactions instead of yearly aggregates, the comparisons would become incorrect because individual transactions would be compared rather than yearly totals.

The solution avoids this bug by first grouping transactions using SUM(spend) before any window calculation occurs.

Multiple Products Interleaved by Year

Different products may have overlapping years. Without proper partitioning, one product's yearly spend could accidentally become another product's previous year value.

The implementation prevents this by using:

PARTITION BY product_id

inside the window function. This guarantees that every product maintains its own independent chronological sequence.