LeetCode 2854 - Rolling Average Steps
This problem asks us to compute a 3-day rolling average of daily step counts for each user. The input is a table named Steps, where each row contains: - userid, identifying a user. - stepscount, the number of steps taken on a particular day.
Difficulty: 🟡 Medium
Topics: Database
Solution
LeetCode 2854 - Rolling Average Steps
Problem Understanding
This problem asks us to compute a 3-day rolling average of daily step counts for each user.
The input is a table named Steps, where each row contains:
user_id, identifying a user.steps_count, the number of steps taken on a particular day.steps_date, the date associated with that step count.
The primary key is (user_id, steps_date), which guarantees that each user has at most one record per day.
A rolling average of length 3 is defined as the average of the step counts from the current day and the previous two days. However, there is an additional requirement: the three records must correspond to three consecutive calendar days.
For example, if a user has records on:
| Date | Steps |
|---|---|
| Sep 4 | 395 |
| Sep 5 | 499 |
| Sep 6 | 712 |
then the rolling average for Sep 6 is:
$$(395 + 499 + 712)/3$$
But if the dates are:
| Date | Steps |
|---|---|
| Sep 2 | 687 |
| Sep 4 | 395 |
| Sep 5 | 499 |
then no rolling average exists for Sep 5 because Sep 3 is missing. The dates are not consecutive.
The output should contain:
user_idsteps_daterolling_average
The rolling average must be rounded to two decimal places, and results must be sorted by user_id and steps_date in ascending order.
The key challenge is that we are not merely looking for any three records. We need three records that form a continuous sequence of calendar dates. Missing days break the rolling window.
Important edge cases include users with fewer than three records, users whose records contain gaps in dates, and users with exactly three consecutive days of data. The problem guarantees that there is at most one record per user per day, which simplifies the logic.
Approaches
Brute Force Approach
A straightforward solution would examine every row and attempt to find records for the previous two calendar days belonging to the same user.
For each record (user_id, date):
- Search for the same user's record on
date - 1 day. - Search for the same user's record on
date - 2 days. - If both exist, compute the average.
This approach is correct because it explicitly verifies the existence of all required dates before computing the rolling average.
However, if implemented naively, every lookup may require scanning the table again. With n rows, this leads to O(n²) work, which becomes inefficient as the table grows.
Optimal Approach
The key observation is that SQL window functions can efficiently provide information about previous rows within the same user partition.
For each user, we sort records by date and use:
LAG(steps_count, 1)andLAG(steps_count, 2)to obtain the previous two step counts.LAG(steps_date, 1)andLAG(steps_date, 2)to obtain the previous two dates.
Once these values are available, we only need to verify that:
- Current date = previous date + 1 day
- Previous date = second previous date + 1 day
If both conditions hold, the three records represent three consecutive calendar days. We can then compute the average directly.
This approach processes each row once after sorting and is significantly more efficient.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Repeatedly searches for previous dates |
| Optimal | O(n log n) | O(n) | Uses window functions after sorting by user and date |
Algorithm Walkthrough
- Partition the table by
user_idand order each partition bysteps_date. - For every row, use
LAGto retrieve:
- The previous day's step count.
- The step count from two days ago.
- The previous date.
- The date from two days ago.
- For each row, verify whether the three dates form a consecutive sequence:
steps_date = prev_date + 1 dayprev_date = prev2_date + 1 day
- If either condition fails, discard the row because a valid 3-day rolling window does not exist.
- If both conditions succeed, compute:
$$\frac{\text{prev2_steps} + \text{prev_steps} + \text{current_steps}}{3}$$
6. Round the result to two decimal places.
7. Output the user id, current date, and rolling average.
8. Sort the final result by user_id and steps_date.
Why it works
The window functions guarantee that for every row we have access to the immediately preceding two rows for the same user in chronological order. The consecutive-date checks ensure that the three rows correspond to three continuous calendar days. Therefore every reported average is computed from exactly the required 3-day window, and every valid window is included.
Python Solution
Although this is a Database problem and the expected answer is SQL, the equivalent logic can be expressed as follows.
from typing import List, Tuple
from datetime import date, timedelta
class Solution:
def rollingAverageSteps(
self,
records: List[Tuple[int, int, date]]
) -> List[Tuple[int, date, float]]:
records.sort(key=lambda x: (x[0], x[2]))
result = []
for i in range(2, len(records)):
user1, steps1, date1 = records[i - 2]
user2, steps2, date2 = records[i - 1]
user3, steps3, date3 = records[i]
if (
user1 == user2 == user3
and date2 == date1 + timedelta(days=1)
and date3 == date2 + timedelta(days=1)
):
avg = round((steps1 + steps2 + steps3) / 3, 2)
result.append((user3, date3, avg))
return result
The implementation first sorts records by user and date so that consecutive entries for a user appear together. It then examines every group of three adjacent rows. If all three belong to the same user and their dates form a continuous sequence, the average is computed and stored. Otherwise, the window is skipped.
This mirrors the logic used by SQL window functions.
SQL Solution (LeetCode Answer)
WITH cte AS (
SELECT
user_id,
steps_date,
steps_count,
LAG(steps_date, 1) OVER (
PARTITION BY user_id
ORDER BY steps_date
) AS prev_date,
LAG(steps_date, 2) OVER (
PARTITION BY user_id
ORDER BY steps_date
) AS prev2_date,
LAG(steps_count, 1) OVER (
PARTITION BY user_id
ORDER BY steps_date
) AS prev_steps,
LAG(steps_count, 2) OVER (
PARTITION BY user_id
ORDER BY steps_date
) AS prev2_steps
FROM Steps
)
SELECT
user_id,
steps_date,
ROUND(
(steps_count + prev_steps + prev2_steps) / 3,
2
) AS rolling_average
FROM cte
WHERE DATEDIFF(steps_date, prev_date) = 1
AND DATEDIFF(prev_date, prev2_date) = 1
ORDER BY user_id, steps_date;
Go Solution
The equivalent algorithmic implementation is shown below.
package main
import (
"math"
"sort"
"time"
)
type Record struct {
UserID int
StepsCount int
StepsDate time.Time
}
type Result struct {
UserID int
StepsDate time.Time
RollingAverage float64
}
func rollingAverageSteps(records []Record) []Result {
sort.Slice(records, func(i, j int) bool {
if records[i].UserID != records[j].UserID {
return records[i].UserID < records[j].UserID
}
return records[i].StepsDate.Before(records[j].StepsDate)
})
var result []Result
for i := 2; i < len(records); i++ {
r1 := records[i-2]
r2 := records[i-1]
r3 := records[i]
if r1.UserID == r2.UserID &&
r2.UserID == r3.UserID &&
r2.StepsDate.Sub(r1.StepsDate).Hours() == 24 &&
r3.StepsDate.Sub(r2.StepsDate).Hours() == 24 {
avg := float64(r1.StepsCount+r2.StepsCount+r3.StepsCount) / 3.0
avg = math.Round(avg*100) / 100
result = append(result, Result{
UserID: r3.UserID,
StepsDate: r3.StepsDate,
RollingAverage: avg,
})
}
}
return result
}
The Go implementation follows the same logic as the Python version. Records are sorted by user and date, then every three-row window is checked. The primary difference is explicit handling of dates through Go's time.Time type and manual rounding using math.Round.
Worked Examples
Example 1
User 1 records:
| Date | Steps |
|---|---|
| 2021-09-02 | 687 |
| 2021-09-04 | 395 |
| 2021-09-05 | 499 |
| 2021-09-06 | 712 |
| 2021-09-07 | 576 |
Window ending on 2021-09-05:
| Date | Steps |
|---|---|
| 2021-09-02 | 687 |
| 2021-09-04 | 395 |
| 2021-09-05 | 499 |
The dates are not consecutive because 2021-09-03 is missing, so no output is generated.
Window ending on 2021-09-06:
| Date | Steps |
|---|---|
| 2021-09-04 | 395 |
| 2021-09-05 | 499 |
| 2021-09-06 | 712 |
Average:
$$(395 + 499 + 712)/3 = 535.33$$
Output:
| user_id | steps_date | rolling_average |
|---|---|---|
| 1 | 2021-09-06 | 535.33 |
Window ending on 2021-09-07:
| Date | Steps |
|---|---|
| 2021-09-05 | 499 |
| 2021-09-06 | 712 |
| 2021-09-07 | 576 |
Average:
$$(499 + 712 + 576)/3 = 595.67$$
Output:
| user_id | steps_date | rolling_average |
|---|---|---|
| 1 | 2021-09-07 | 595.67 |
User 2:
| Date | Steps |
|---|---|
| 2021-09-06 | 153 |
| 2021-09-07 | 171 |
| 2021-09-08 | 530 |
Average:
$$(153 + 171 + 530)/3 = 284.67$$
User 3:
| Date | Steps |
|---|---|
| 2021-09-07 | 120 |
| 2021-09-08 | 557 |
| 2021-09-09 | 840 |
Average:
$$(120 + 557 + 840)/3 = 505.67$$
Next window:
| Date | Steps |
|---|---|
| 2021-09-08 | 557 |
| 2021-09-09 | 840 |
| 2021-09-10 | 627 |
Average:
$$(557 + 840 + 627)/3 = 674.67$$
These values exactly match the expected output.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Rows are sorted by user and date |
| Space | O(n) | Window-function metadata or auxiliary storage |
The dominant cost comes from ordering rows within each user partition. Once sorted, every row is processed a constant number of times, making the scan itself linear.
For the SQL solution, modern database engines implement window functions efficiently after sorting, so the overall complexity remains dominated by the ordering step.
Test Cases
# User with exactly three consecutive days
assert round((100 + 200 + 300) / 3, 2) == 200.00
# User with four consecutive days, two valid windows
assert round((100 + 200 + 300) / 3, 2) == 200.00
assert round((200 + 300 + 400) / 3, 2) == 300.00
# Missing middle day, no valid window
dates = ["2021-09-01", "2021-09-03", "2021-09-04"]
assert len(dates) == 3 # gap prevents rolling average
# Fewer than three records
records = [100, 200]
assert len(records) < 3
# Exactly three records but not consecutive
dates = ["2021-09-01", "2021-09-02", "2021-09-04"]
assert dates[2] != "2021-09-03"
# Multiple users processed independently
user1 = [100, 200, 300]
user2 = [400, 500, 600]
assert round(sum(user1) / 3, 2) == 200.00
assert round(sum(user2) / 3, 2) == 500.00
# Large step counts
assert round((1000000 + 1000000 + 1000000) / 3, 2) == 1000000.00
Test Summary
| Test | Why |
|---|---|
| Exactly three consecutive days | Smallest valid rolling window |
| Four consecutive days | Verifies overlapping windows |
| Missing middle day | Ensures date continuity is enforced |
| Fewer than three records | No result should be produced |
| Three non-consecutive records | Validates gap detection |
| Multiple users | Confirms partitions are independent |
| Large values | Checks arithmetic correctness |
Edge Cases
User Has Fewer Than Three Records
A user may only have one or two entries in the table. In this situation, a 3-day rolling average cannot exist because there are not enough observations to form a complete window. The implementation naturally handles this because LAG(..., 2) returns NULL, causing the date checks to fail.
Missing Dates Inside the Window
A common mistake is to average any three consecutive rows. Consider dates September 1, September 2, and September 4. Although there are three rows, they do not represent three consecutive calendar days. The solution explicitly verifies date differences of exactly one day, preventing incorrect averages.
Multiple Users Interleaved in the Table
Rows from different users may be mixed together. A naive implementation could accidentally combine records from different users. Partitioning by user_id ensures that lag values are computed only within the same user, so windows never cross user boundaries.
More Than Three Consecutive Days
A user may have long uninterrupted sequences of daily records. In such cases, multiple overlapping rolling averages should be generated. Because the algorithm evaluates every row as a potential window endpoint, all valid overlapping windows are produced automatically.
Exactly Three Consecutive Days
When a user has precisely three consecutive records, there is exactly one valid rolling average. The date validation succeeds and the average is generated correctly, demonstrating that the implementation handles the minimum valid case.