LeetCode 1651 - Hopper Company Queries III
The problem requires computing a rolling 3-month average of ride distance and ride duration from ride data in a ride-hailing company database. We are given three tables: Drivers, Rides, and AcceptedRides.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
The problem requires computing a rolling 3-month average of ride distance and ride duration from ride data in a ride-hailing company database. We are given three tables: Drivers, Rides, and AcceptedRides. Drivers provides driver information and join dates, Rides contains requested rides with dates, and AcceptedRides contains rides that were actually accepted by drivers, along with the ride distance and duration.
The goal is to calculate the average ride distance and duration for every 3-month window starting from January-March 2020 through October-December 2020. Each window consists of three consecutive months, and the average for a month is calculated by summing the ride distances (or durations) for that month and the two preceding months, divided by 3. If there are months without rides, the sum is considered zero. The averages are rounded to two decimal places. The output should list the starting month of the window and the corresponding averages in ascending order.
Key constraints include the focus on 2020, the requirement to compute 3-month rolling averages, and handling missing data correctly by treating absent months as zeros. Edge cases include months with no accepted rides, rides that occur outside 2020, or months with partial data in the first 3-month windows.
Approaches
The brute-force approach would involve iterating over each month from January to December, extracting rides for that month, then summing distances and durations for that month and the two preceding months. This would involve multiple nested queries or loops, joining all three tables repeatedly, which is inefficient for large datasets.
The optimal approach uses a single aggregation of rides per month, followed by a self-join or window function to sum the values over the 3-month window. By precomputing monthly totals and then summing over rolling windows, we avoid repeated joins and can efficiently handle missing months with COALESCE to treat absent data as zeros.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N*M) | O(M) | Join rides and accepted rides per month repeatedly for each 3-month window; inefficient for large datasets |
| Optimal | O(N) | O(M) | Aggregate monthly totals first, then compute rolling sums with window functions; handles missing months efficiently |
Algorithm Walkthrough
- Extract monthly data: Join
RidesandAcceptedRidesto get all accepted rides withrequested_atdates. Convert each date to a month number (1-12) and year. Filter for the year 2020 only. - Aggregate per month: Sum the
ride_distanceandride_durationper month to get total distance and duration for each month. If a month has no rides, it will have a total of zero. - Create a month series: Construct a table or series containing months 1 to 12, ensuring that all months are represented even if no rides exist.
- Compute 3-month rolling sums: For each month, sum the total distance and duration of that month and the two preceding months. Use
COALESCEfor months with missing data. - Divide by 3: For each rolling sum, divide the sum by 3 to compute the average. Round the results to two decimal places.
- Order the result: Sort the final output by
monthin ascending order.
Why it works: By pre-aggregating monthly totals and using a rolling sum over three months, we correctly capture the average ride distance and duration for every window. Treating missing months as zeros ensures the average is accurate even when no rides occurred.
Python Solution
# SQL-oriented solution using Python style (for LeetCode's database execution)
# Here we write it as a string to execute directly on LeetCode
sql = """
WITH monthly_totals AS (
SELECT
EXTRACT(MONTH FROM r.requested_at) AS month,
SUM(a.ride_distance) AS total_distance,
SUM(a.ride_duration) AS total_duration
FROM Rides r
JOIN AcceptedRides a ON r.ride_id = a.ride_id
WHERE EXTRACT(YEAR FROM r.requested_at) = 2020
GROUP BY EXTRACT(MONTH FROM r.requested_at)
),
months AS (
SELECT 1 AS month UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
SELECT 10
),
combined AS (
SELECT
m.month,
COALESCE(mt1.total_distance, 0) AS d1,
COALESCE(mt2.total_distance, 0) AS d2,
COALESCE(mt3.total_distance, 0) AS d3,
COALESCE(mt1.total_duration, 0) AS du1,
COALESCE(mt2.total_duration, 0) AS du2,
COALESCE(mt3.total_duration, 0) AS du3
FROM months m
LEFT JOIN monthly_totals mt1 ON mt1.month = m.month
LEFT JOIN monthly_totals mt2 ON mt2.month = m.month - 1
LEFT JOIN monthly_totals mt3 ON mt3.month = m.month - 2
)
SELECT
month,
ROUND((d1 + d2 + d3)/3, 2) AS average_ride_distance,
ROUND((du1 + du2 + du3)/3, 2) AS average_ride_duration
FROM combined
ORDER BY month;
"""
The Python-oriented solution constructs three CTEs: monthly_totals aggregates rides per month, months ensures all months exist, and combined joins the current month and two previous months for rolling sums. COALESCE handles missing data.
Go Solution
// SQL solution can be executed in Go using database/sql package
query := `
WITH monthly_totals AS (
SELECT
EXTRACT(MONTH FROM r.requested_at) AS month,
SUM(a.ride_distance) AS total_distance,
SUM(a.ride_duration) AS total_duration
FROM Rides r
JOIN AcceptedRides a ON r.ride_id = a.ride_id
WHERE EXTRACT(YEAR FROM r.requested_at) = 2020
GROUP BY EXTRACT(MONTH FROM r.requested_at)
),
months AS (
SELECT 1 AS month UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
SELECT 10
),
combined AS (
SELECT
m.month,
COALESCE(mt1.total_distance, 0) AS d1,
COALESCE(mt2.total_distance, 0) AS d2,
COALESCE(mt3.total_distance, 0) AS d3,
COALESCE(mt1.total_duration, 0) AS du1,
COALESCE(mt2.total_duration, 0) AS du2,
COALESCE(mt3.total_duration, 0) AS du3
FROM months m
LEFT JOIN monthly_totals mt1 ON mt1.month = m.month
LEFT JOIN monthly_totals mt2 ON mt2.month = m.month - 1
LEFT JOIN monthly_totals mt3 ON mt3.month = m.month - 2
)
SELECT
month,
ROUND((d1 + d2 + d3)/3, 2) AS average_ride_distance,
ROUND((du1 + du2 + du3)/3, 2) AS average_ride_duration
FROM combined
ORDER BY month;
`
In Go, the logic is identical. The key difference is that you would execute this query using the database driver, and the results would be scanned into Go structs or variables. There is no native SQL processing in Go itself; it is passed to the DB engine.
Worked Examples
For January (month = 1), the three-month window is November-December 2019 and January 2020. Only rides in January 2020 exist, giving totals ride_distance=63 and ride_duration=38. Dividing by 3 gives 21.00 and 12.67.
For February (month = 2), the window is December 2019 to February 2020. Only February ride contributes, so the averages remain 21.00 and 12.67.
Continuing this way, each month aggregates totals from itself and two prior months, ensuring rolling averages match the output in the example.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | N is the number of accepted rides. Aggregation per month and rolling sum are linear operations. |
| Space | O(M) | M is the number of months (constant, here <=12). Temporary tables store month totals and combined data. |
The algorithm is efficient because we compute per-month aggregates first, reducing repeated computation, and the number of months is fixed, keeping space usage minimal.
Test Cases
# provided example
assert execute_query(sql) == [
(1, 21.00, 12.67),
(2, 21.00, 12.67),
(3, 21.00, 12.67),
(4, 24.