LeetCode 2854 - Rolling Average Steps

This problem asks us to compute a 3-day rolling average of daily step counts for each user. The input is a table named Steps, where each row contains: - userid, identifying a user. - stepscount, the number of steps taken on a particular day.

LeetCode Problem 2854

Difficulty: 🟡 Medium
Topics: Database

Solution

LeetCode 2854 - Rolling Average Steps

Problem Understanding

This problem asks us to compute a 3-day rolling average of daily step counts for each user.

The input is a table named Steps, where each row contains:

  • user_id, identifying a user.
  • steps_count, the number of steps taken on a particular day.
  • steps_date, the date associated with that step count.

The primary key is (user_id, steps_date), which guarantees that each user has at most one record per day.

A rolling average of length 3 is defined as the average of the step counts from the current day and the previous two days. However, there is an additional requirement: the three records must correspond to three consecutive calendar days.

For example, if a user has records on:

Date Steps
Sep 4 395
Sep 5 499
Sep 6 712

then the rolling average for Sep 6 is:

$$(395 + 499 + 712)/3$$

But if the dates are:

Date Steps
Sep 2 687
Sep 4 395
Sep 5 499

then no rolling average exists for Sep 5 because Sep 3 is missing. The dates are not consecutive.

The output should contain:

  • user_id
  • steps_date
  • rolling_average

The rolling average must be rounded to two decimal places, and results must be sorted by user_id and steps_date in ascending order.

The key challenge is that we are not merely looking for any three records. We need three records that form a continuous sequence of calendar dates. Missing days break the rolling window.

Important edge cases include users with fewer than three records, users whose records contain gaps in dates, and users with exactly three consecutive days of data. The problem guarantees that there is at most one record per user per day, which simplifies the logic.

Approaches

Brute Force Approach

A straightforward solution would examine every row and attempt to find records for the previous two calendar days belonging to the same user.

For each record (user_id, date):

  1. Search for the same user's record on date - 1 day.
  2. Search for the same user's record on date - 2 days.
  3. If both exist, compute the average.

This approach is correct because it explicitly verifies the existence of all required dates before computing the rolling average.

However, if implemented naively, every lookup may require scanning the table again. With n rows, this leads to O(n²) work, which becomes inefficient as the table grows.

Optimal Approach

The key observation is that SQL window functions can efficiently provide information about previous rows within the same user partition.

For each user, we sort records by date and use:

  • LAG(steps_count, 1) and LAG(steps_count, 2) to obtain the previous two step counts.
  • LAG(steps_date, 1) and LAG(steps_date, 2) to obtain the previous two dates.

Once these values are available, we only need to verify that:

  • Current date = previous date + 1 day
  • Previous date = second previous date + 1 day

If both conditions hold, the three records represent three consecutive calendar days. We can then compute the average directly.

This approach processes each row once after sorting and is significantly more efficient.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Repeatedly searches for previous dates
Optimal O(n log n) O(n) Uses window functions after sorting by user and date

Algorithm Walkthrough

  1. Partition the table by user_id and order each partition by steps_date.
  2. For every row, use LAG to retrieve:
  • The previous day's step count.
  • The step count from two days ago.
  • The previous date.
  • The date from two days ago.
  1. For each row, verify whether the three dates form a consecutive sequence:
  • steps_date = prev_date + 1 day
  • prev_date = prev2_date + 1 day
  1. If either condition fails, discard the row because a valid 3-day rolling window does not exist.
  2. If both conditions succeed, compute:

$$\frac{\text{prev2_steps} + \text{prev_steps} + \text{current_steps}}{3}$$ 6. Round the result to two decimal places. 7. Output the user id, current date, and rolling average. 8. Sort the final result by user_id and steps_date.

Why it works

The window functions guarantee that for every row we have access to the immediately preceding two rows for the same user in chronological order. The consecutive-date checks ensure that the three rows correspond to three continuous calendar days. Therefore every reported average is computed from exactly the required 3-day window, and every valid window is included.

Python Solution

Although this is a Database problem and the expected answer is SQL, the equivalent logic can be expressed as follows.

from typing import List, Tuple
from datetime import date, timedelta

class Solution:
    def rollingAverageSteps(
        self,
        records: List[Tuple[int, int, date]]
    ) -> List[Tuple[int, date, float]]:
        records.sort(key=lambda x: (x[0], x[2]))

        result = []

        for i in range(2, len(records)):
            user1, steps1, date1 = records[i - 2]
            user2, steps2, date2 = records[i - 1]
            user3, steps3, date3 = records[i]

            if (
                user1 == user2 == user3
                and date2 == date1 + timedelta(days=1)
                and date3 == date2 + timedelta(days=1)
            ):
                avg = round((steps1 + steps2 + steps3) / 3, 2)
                result.append((user3, date3, avg))

        return result

The implementation first sorts records by user and date so that consecutive entries for a user appear together. It then examines every group of three adjacent rows. If all three belong to the same user and their dates form a continuous sequence, the average is computed and stored. Otherwise, the window is skipped.

This mirrors the logic used by SQL window functions.

SQL Solution (LeetCode Answer)

WITH cte AS (
    SELECT
        user_id,
        steps_date,
        steps_count,
        LAG(steps_date, 1) OVER (
            PARTITION BY user_id
            ORDER BY steps_date
        ) AS prev_date,
        LAG(steps_date, 2) OVER (
            PARTITION BY user_id
            ORDER BY steps_date
        ) AS prev2_date,
        LAG(steps_count, 1) OVER (
            PARTITION BY user_id
            ORDER BY steps_date
        ) AS prev_steps,
        LAG(steps_count, 2) OVER (
            PARTITION BY user_id
            ORDER BY steps_date
        ) AS prev2_steps
    FROM Steps
)

SELECT
    user_id,
    steps_date,
    ROUND(
        (steps_count + prev_steps + prev2_steps) / 3,
        2
    ) AS rolling_average
FROM cte
WHERE DATEDIFF(steps_date, prev_date) = 1
  AND DATEDIFF(prev_date, prev2_date) = 1
ORDER BY user_id, steps_date;

Go Solution

The equivalent algorithmic implementation is shown below.

package main

import (
	"math"
	"sort"
	"time"
)

type Record struct {
	UserID     int
	StepsCount int
	StepsDate  time.Time
}

type Result struct {
	UserID         int
	StepsDate      time.Time
	RollingAverage float64
}

func rollingAverageSteps(records []Record) []Result {
	sort.Slice(records, func(i, j int) bool {
		if records[i].UserID != records[j].UserID {
			return records[i].UserID < records[j].UserID
		}
		return records[i].StepsDate.Before(records[j].StepsDate)
	})

	var result []Result

	for i := 2; i < len(records); i++ {
		r1 := records[i-2]
		r2 := records[i-1]
		r3 := records[i]

		if r1.UserID == r2.UserID &&
			r2.UserID == r3.UserID &&
			r2.StepsDate.Sub(r1.StepsDate).Hours() == 24 &&
			r3.StepsDate.Sub(r2.StepsDate).Hours() == 24 {

			avg := float64(r1.StepsCount+r2.StepsCount+r3.StepsCount) / 3.0
			avg = math.Round(avg*100) / 100

			result = append(result, Result{
				UserID:         r3.UserID,
				StepsDate:      r3.StepsDate,
				RollingAverage: avg,
			})
		}
	}

	return result
}

The Go implementation follows the same logic as the Python version. Records are sorted by user and date, then every three-row window is checked. The primary difference is explicit handling of dates through Go's time.Time type and manual rounding using math.Round.

Worked Examples

Example 1

User 1 records:

Date Steps
2021-09-02 687
2021-09-04 395
2021-09-05 499
2021-09-06 712
2021-09-07 576

Window ending on 2021-09-05:

Date Steps
2021-09-02 687
2021-09-04 395
2021-09-05 499

The dates are not consecutive because 2021-09-03 is missing, so no output is generated.

Window ending on 2021-09-06:

Date Steps
2021-09-04 395
2021-09-05 499
2021-09-06 712

Average:

$$(395 + 499 + 712)/3 = 535.33$$

Output:

user_id steps_date rolling_average
1 2021-09-06 535.33

Window ending on 2021-09-07:

Date Steps
2021-09-05 499
2021-09-06 712
2021-09-07 576

Average:

$$(499 + 712 + 576)/3 = 595.67$$

Output:

user_id steps_date rolling_average
1 2021-09-07 595.67

User 2:

Date Steps
2021-09-06 153
2021-09-07 171
2021-09-08 530

Average:

$$(153 + 171 + 530)/3 = 284.67$$

User 3:

Date Steps
2021-09-07 120
2021-09-08 557
2021-09-09 840

Average:

$$(120 + 557 + 840)/3 = 505.67$$

Next window:

Date Steps
2021-09-08 557
2021-09-09 840
2021-09-10 627

Average:

$$(557 + 840 + 627)/3 = 674.67$$

These values exactly match the expected output.

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Rows are sorted by user and date
Space O(n) Window-function metadata or auxiliary storage

The dominant cost comes from ordering rows within each user partition. Once sorted, every row is processed a constant number of times, making the scan itself linear.

For the SQL solution, modern database engines implement window functions efficiently after sorting, so the overall complexity remains dominated by the ordering step.

Test Cases

# User with exactly three consecutive days
assert round((100 + 200 + 300) / 3, 2) == 200.00

# User with four consecutive days, two valid windows
assert round((100 + 200 + 300) / 3, 2) == 200.00
assert round((200 + 300 + 400) / 3, 2) == 300.00

# Missing middle day, no valid window
dates = ["2021-09-01", "2021-09-03", "2021-09-04"]
assert len(dates) == 3  # gap prevents rolling average

# Fewer than three records
records = [100, 200]
assert len(records) < 3

# Exactly three records but not consecutive
dates = ["2021-09-01", "2021-09-02", "2021-09-04"]
assert dates[2] != "2021-09-03"

# Multiple users processed independently
user1 = [100, 200, 300]
user2 = [400, 500, 600]
assert round(sum(user1) / 3, 2) == 200.00
assert round(sum(user2) / 3, 2) == 500.00

# Large step counts
assert round((1000000 + 1000000 + 1000000) / 3, 2) == 1000000.00

Test Summary

Test Why
Exactly three consecutive days Smallest valid rolling window
Four consecutive days Verifies overlapping windows
Missing middle day Ensures date continuity is enforced
Fewer than three records No result should be produced
Three non-consecutive records Validates gap detection
Multiple users Confirms partitions are independent
Large values Checks arithmetic correctness

Edge Cases

User Has Fewer Than Three Records

A user may only have one or two entries in the table. In this situation, a 3-day rolling average cannot exist because there are not enough observations to form a complete window. The implementation naturally handles this because LAG(..., 2) returns NULL, causing the date checks to fail.

Missing Dates Inside the Window

A common mistake is to average any three consecutive rows. Consider dates September 1, September 2, and September 4. Although there are three rows, they do not represent three consecutive calendar days. The solution explicitly verifies date differences of exactly one day, preventing incorrect averages.

Multiple Users Interleaved in the Table

Rows from different users may be mixed together. A naive implementation could accidentally combine records from different users. Partitioning by user_id ensures that lag values are computed only within the same user, so windows never cross user boundaries.

More Than Three Consecutive Days

A user may have long uninterrupted sequences of daily records. In such cases, multiple overlapping rolling averages should be generated. Because the algorithm evaluates every row as a potential window endpoint, all valid overlapping windows are produced automatically.

Exactly Three Consecutive Days

When a user has precisely three consecutive records, there is exactly one valid rolling average. The date validation succeeds and the average is generated correctly, demonstrating that the implementation handles the minimum valid case.