LeetCode 1651 - Hopper Company Queries III

The problem requires computing a rolling 3-month average of ride distance and ride duration from ride data in a ride-hailing company database. We are given three tables: Drivers, Rides, and AcceptedRides.

LeetCode Problem 1651

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem requires computing a rolling 3-month average of ride distance and ride duration from ride data in a ride-hailing company database. We are given three tables: Drivers, Rides, and AcceptedRides. Drivers provides driver information and join dates, Rides contains requested rides with dates, and AcceptedRides contains rides that were actually accepted by drivers, along with the ride distance and duration.

The goal is to calculate the average ride distance and duration for every 3-month window starting from January-March 2020 through October-December 2020. Each window consists of three consecutive months, and the average for a month is calculated by summing the ride distances (or durations) for that month and the two preceding months, divided by 3. If there are months without rides, the sum is considered zero. The averages are rounded to two decimal places. The output should list the starting month of the window and the corresponding averages in ascending order.

Key constraints include the focus on 2020, the requirement to compute 3-month rolling averages, and handling missing data correctly by treating absent months as zeros. Edge cases include months with no accepted rides, rides that occur outside 2020, or months with partial data in the first 3-month windows.

Approaches

The brute-force approach would involve iterating over each month from January to December, extracting rides for that month, then summing distances and durations for that month and the two preceding months. This would involve multiple nested queries or loops, joining all three tables repeatedly, which is inefficient for large datasets.

The optimal approach uses a single aggregation of rides per month, followed by a self-join or window function to sum the values over the 3-month window. By precomputing monthly totals and then summing over rolling windows, we avoid repeated joins and can efficiently handle missing months with COALESCE to treat absent data as zeros.

Approach Time Complexity Space Complexity Notes
Brute Force O(N*M) O(M) Join rides and accepted rides per month repeatedly for each 3-month window; inefficient for large datasets
Optimal O(N) O(M) Aggregate monthly totals first, then compute rolling sums with window functions; handles missing months efficiently

Algorithm Walkthrough

  1. Extract monthly data: Join Rides and AcceptedRides to get all accepted rides with requested_at dates. Convert each date to a month number (1-12) and year. Filter for the year 2020 only.
  2. Aggregate per month: Sum the ride_distance and ride_duration per month to get total distance and duration for each month. If a month has no rides, it will have a total of zero.
  3. Create a month series: Construct a table or series containing months 1 to 12, ensuring that all months are represented even if no rides exist.
  4. Compute 3-month rolling sums: For each month, sum the total distance and duration of that month and the two preceding months. Use COALESCE for months with missing data.
  5. Divide by 3: For each rolling sum, divide the sum by 3 to compute the average. Round the results to two decimal places.
  6. Order the result: Sort the final output by month in ascending order.

Why it works: By pre-aggregating monthly totals and using a rolling sum over three months, we correctly capture the average ride distance and duration for every window. Treating missing months as zeros ensures the average is accurate even when no rides occurred.

Python Solution

# SQL-oriented solution using Python style (for LeetCode's database execution)
# Here we write it as a string to execute directly on LeetCode

sql = """
WITH monthly_totals AS (
    SELECT
        EXTRACT(MONTH FROM r.requested_at) AS month,
        SUM(a.ride_distance) AS total_distance,
        SUM(a.ride_duration) AS total_duration
    FROM Rides r
    JOIN AcceptedRides a ON r.ride_id = a.ride_id
    WHERE EXTRACT(YEAR FROM r.requested_at) = 2020
    GROUP BY EXTRACT(MONTH FROM r.requested_at)
),
months AS (
    SELECT 1 AS month UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
    SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
    SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
    SELECT 10
),
combined AS (
    SELECT
        m.month,
        COALESCE(mt1.total_distance, 0) AS d1,
        COALESCE(mt2.total_distance, 0) AS d2,
        COALESCE(mt3.total_distance, 0) AS d3,
        COALESCE(mt1.total_duration, 0) AS du1,
        COALESCE(mt2.total_duration, 0) AS du2,
        COALESCE(mt3.total_duration, 0) AS du3
    FROM months m
    LEFT JOIN monthly_totals mt1 ON mt1.month = m.month
    LEFT JOIN monthly_totals mt2 ON mt2.month = m.month - 1
    LEFT JOIN monthly_totals mt3 ON mt3.month = m.month - 2
)
SELECT
    month,
    ROUND((d1 + d2 + d3)/3, 2) AS average_ride_distance,
    ROUND((du1 + du2 + du3)/3, 2) AS average_ride_duration
FROM combined
ORDER BY month;
"""

The Python-oriented solution constructs three CTEs: monthly_totals aggregates rides per month, months ensures all months exist, and combined joins the current month and two previous months for rolling sums. COALESCE handles missing data.

Go Solution

// SQL solution can be executed in Go using database/sql package

query := `
WITH monthly_totals AS (
    SELECT
        EXTRACT(MONTH FROM r.requested_at) AS month,
        SUM(a.ride_distance) AS total_distance,
        SUM(a.ride_duration) AS total_duration
    FROM Rides r
    JOIN AcceptedRides a ON r.ride_id = a.ride_id
    WHERE EXTRACT(YEAR FROM r.requested_at) = 2020
    GROUP BY EXTRACT(MONTH FROM r.requested_at)
),
months AS (
    SELECT 1 AS month UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
    SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
    SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
    SELECT 10
),
combined AS (
    SELECT
        m.month,
        COALESCE(mt1.total_distance, 0) AS d1,
        COALESCE(mt2.total_distance, 0) AS d2,
        COALESCE(mt3.total_distance, 0) AS d3,
        COALESCE(mt1.total_duration, 0) AS du1,
        COALESCE(mt2.total_duration, 0) AS du2,
        COALESCE(mt3.total_duration, 0) AS du3
    FROM months m
    LEFT JOIN monthly_totals mt1 ON mt1.month = m.month
    LEFT JOIN monthly_totals mt2 ON mt2.month = m.month - 1
    LEFT JOIN monthly_totals mt3 ON mt3.month = m.month - 2
)
SELECT
    month,
    ROUND((d1 + d2 + d3)/3, 2) AS average_ride_distance,
    ROUND((du1 + du2 + du3)/3, 2) AS average_ride_duration
FROM combined
ORDER BY month;
`

In Go, the logic is identical. The key difference is that you would execute this query using the database driver, and the results would be scanned into Go structs or variables. There is no native SQL processing in Go itself; it is passed to the DB engine.

Worked Examples

For January (month = 1), the three-month window is November-December 2019 and January 2020. Only rides in January 2020 exist, giving totals ride_distance=63 and ride_duration=38. Dividing by 3 gives 21.00 and 12.67.

For February (month = 2), the window is December 2019 to February 2020. Only February ride contributes, so the averages remain 21.00 and 12.67.

Continuing this way, each month aggregates totals from itself and two prior months, ensuring rolling averages match the output in the example.

Complexity Analysis

Measure Complexity Explanation
Time O(N) N is the number of accepted rides. Aggregation per month and rolling sum are linear operations.
Space O(M) M is the number of months (constant, here <=12). Temporary tables store month totals and combined data.

The algorithm is efficient because we compute per-month aggregates first, reducing repeated computation, and the number of months is fixed, keeping space usage minimal.

Test Cases

# provided example
assert execute_query(sql) == [
    (1, 21.00, 12.67),
    (2, 21.00, 12.67),
    (3, 21.00, 12.67),
    (4, 24.