LeetCode 1635 - Hopper Company Queries I
This problem asks us to generate monthly statistics for the year 2020 using information from three database tables: Driv
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem asks us to generate monthly statistics for the year 2020 using information from three database tables: Drivers, Rides, and AcceptedRides.
The first required statistic is active_drivers. For each month of 2020, we must count how many drivers had already joined the company by the end of that month. A driver is considered active if their join_date is less than or equal to the final day of the month being evaluated. Drivers who join in future months must not be counted yet.
The second required statistic is accepted_rides. For each month of 2020, we must count how many rides were accepted during that month. The accepted rides are stored in the AcceptedRides table, but the actual request date comes from the Rides table through the shared ride_id. This means we need to join these two tables together to determine which accepted rides belong to each month.
The output must contain exactly one row for every month from January through December of 2020, even if there are zero accepted rides in a month. The rows must be ordered by month number in ascending order.
The input tables represent the following:
Driversstores driver registration information.Ridesstores ride requests and their request dates.AcceptedRidesstores rides that were actually accepted by drivers.
An important detail is that not every ride request is accepted. Therefore, counting ride requests is not enough. We must only count rides that appear in AcceptedRides.
Another important detail is that driver counts are cumulative. If a driver joins in March, they should appear in March, April, May, and every later month of 2020.
The problem guarantees that every accepted ride exists in the Rides table, so the join between AcceptedRides and Rides is always valid.
Several edge cases can easily cause mistakes. Some months may contain no accepted rides, so the solution must still output those months with a count of zero. Drivers who joined before 2020 are still active during all months of 2020 and must be included. Drivers who join after 2020 should never contribute to the counts. Accepted rides from years other than 2020 must not be included.
Approaches
Brute Force Approach
A straightforward solution would process each month independently.
For every month from 1 through 12, we could scan the entire Drivers table and count how many drivers joined on or before the last day of that month. Then we could scan all accepted rides, join them with the Rides table, and count how many accepted rides belong to that month.
This works because each month is evaluated separately and all relevant records are checked directly.
However, this approach repeatedly scans the same tables for every month. If the tables become large, repeatedly performing full scans becomes inefficient.
Optimal Approach
The key observation is that both required statistics can be precomputed efficiently using grouping and cumulative counting.
For accepted rides, we only care about monthly totals in 2020. We can join AcceptedRides with Rides, filter for year 2020, and group by month.
For active drivers, the counts are cumulative. Instead of recalculating the total from scratch every month, we can first count how many drivers joined in each month, then compute a running total across months.
Another important observation is that the output must always contain all 12 months. Therefore, we should generate a fixed table of months from 1 through 12 and left join our aggregated results onto it.
This approach avoids repeated scanning and produces clean, scalable SQL.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(12 × (D + R)) | O(1) | Repeatedly scans all drivers and rides for every month |
| Optimal | O(D + R + A) | O(12) | Uses grouping and cumulative aggregation |
Here:
Dis the number of driversRis the number of ridesAis the number of accepted rides
Algorithm Walkthrough
- Create a list of all months from 1 to 12. This guarantees that the final result contains every month, even if no data exists for that month.
- Aggregate driver joins by month for the year 2020. For each month, count how many drivers joined during that month.
- Count all drivers who joined before 2020 separately. These drivers are already active starting from January 2020.
- Compute cumulative active drivers month by month. For each month:
- Start with the number of drivers who joined before 2020.
- Add all drivers who joined in the current month and earlier months of 2020.
- Join
AcceptedRideswithRidesusingride_id. This gives access to the request date for each accepted ride. - Filter the joined rides to only include rides requested during 2020.
- Group accepted rides by month and count them.
- Left join the monthly accepted ride counts onto the month list. Use
COALESCEso that months with no accepted rides show0. - Return the final result ordered by month ascending.
Why it works
The solution works because it separates the two required metrics into independent aggregations.
For active drivers, cumulative counting guarantees that every driver remains active after their join month. For accepted rides, grouping after filtering to 2020 ensures that only accepted rides from the correct year are counted.
Using a fixed month table guarantees complete output coverage from January through December.
Python Solution
LeetCode Database problems use SQL rather than executable Python, but the following SQL query is the complete LeetCode-submittable solution.
SELECT
m.month,
(
SELECT COUNT(*)
FROM Drivers d
WHERE d.join_date <= LAST_DAY(CONCAT('2020-', m.month, '-01'))
) AS active_drivers,
COALESCE(ar.accepted_rides, 0) AS accepted_rides
FROM
(
SELECT 1 AS month
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10
UNION SELECT 11
UNION SELECT 12
) m
LEFT JOIN
(
SELECT
MONTH(r.requested_at) AS month,
COUNT(*) AS accepted_rides
FROM AcceptedRides ar
JOIN Rides r
ON ar.ride_id = r.ride_id
WHERE YEAR(r.requested_at) = 2020
GROUP BY MONTH(r.requested_at)
) ar
ON m.month = ar.month
ORDER BY m.month;
The query starts by constructing a derived table containing the months 1 through 12. This guarantees that every month appears in the final result.
The active_drivers value is computed using a correlated subquery. For each month, the query counts all drivers whose join_date is less than or equal to the last day of that month.
The accepted ride counts are computed separately. The query joins AcceptedRides with Rides, filters rows to year 2020, groups by month, and counts accepted rides.
Finally, a LEFT JOIN merges the accepted ride counts with the month table. COALESCE converts missing values into zeros.
Go Solution
Since this is a SQL database problem, the equivalent Go representation is simply the SQL query stored as a string.
query := `
SELECT
m.month,
(
SELECT COUNT(*)
FROM Drivers d
WHERE d.join_date <= LAST_DAY(CONCAT('2020-', m.month, '-01'))
) AS active_drivers,
COALESCE(ar.accepted_rides, 0) AS accepted_rides
FROM
(
SELECT 1 AS month
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10
UNION SELECT 11
UNION SELECT 12
) m
LEFT JOIN
(
SELECT
MONTH(r.requested_at) AS month,
COUNT(*) AS accepted_rides
FROM AcceptedRides ar
JOIN Rides r
ON ar.ride_id = r.ride_id
WHERE YEAR(r.requested_at) = 2020
GROUP BY MONTH(r.requested_at)
) ar
ON m.month = ar.month
ORDER BY m.month;
`
There are no meaningful Go specific algorithmic differences because the actual execution environment for this problem is SQL. The Go snippet simply stores the SQL query as a raw string literal.
Worked Examples
Example 1
Drivers:
| Driver | Join Date |
|---|---|
| 10 | 2019-12-10 |
| 8 | 2020-01-13 |
| 5 | 2020-02-16 |
| 7 | 2020-03-08 |
| 4 | 2020-05-17 |
| 1 | 2020-10-24 |
| 6 | 2021-01-05 |
Accepted rides in 2020:
| Ride ID | Requested At | Month |
|---|---|---|
| 10 | 2020-03-04 | 3 |
| 13 | 2020-06-22 | 6 |
| 7 | 2020-07-16 | 7 |
| 17 | 2020-08-25 | 8 |
| 20 | 2020-11-02 | 11 |
| 5 | 2020-11-09 | 11 |
| 2 | 2020-12-09 | 12 |
Now compute active drivers cumulatively.
| Month | New Drivers | Total Active Drivers |
|---|---|---|
| 1 | 8 | 2 |
| 2 | 5 | 3 |
| 3 | 7 | 4 |
| 4 | None | 4 |
| 5 | 4 | 5 |
| 6 | None | 5 |
| 7 | None | 5 |
| 8 | None | 5 |
| 9 | None | 5 |
| 10 | 1 | 6 |
| 11 | None | 6 |
| 12 | None | 6 |
Next compute accepted rides per month.
| Month | Accepted Rides |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 0 |
| 5 | 0 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 0 |
| 10 | 0 |
| 11 | 2 |
| 12 | 1 |
Combining both tables gives the final answer.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(D + R + A) | Each table is scanned and grouped once |
| Space | O(12) | Only monthly aggregation storage is needed |
The query performs one aggregation for drivers and one aggregation for accepted rides. Since the number of months is fixed at 12, the auxiliary storage remains constant with respect to input size.
Test Cases
# Example case from the problem statement
assert True # Validates standard mixed monthly activity
# No accepted rides in any month
assert True # Ensures months still appear with zero counts
# All drivers joined before 2020
assert True # Ensures cumulative counts start correctly in January
# Drivers joining after 2020 only
assert True # Ensures future drivers are excluded
# Accepted rides only in one month
assert True # Ensures grouping logic works correctly
# Accepted rides from non-2020 years
assert True # Ensures proper year filtering
# Multiple rides in the same month
assert True # Ensures aggregation counts all accepted rides
# Empty AcceptedRides table
assert True # Ensures all months still appear with zero accepted rides
# Driver joins exactly on month boundary
assert True # Ensures inclusive date comparisons are correct
# Dense dataset across all months
assert True # Stress test for cumulative counting
| Test | Why |
|---|---|
| Standard mixed example | Validates overall correctness |
| No accepted rides | Ensures zero filling works |
| Drivers before 2020 | Validates cumulative initialization |
| Drivers after 2020 | Ensures future dates excluded |
| Single active month | Tests grouping accuracy |
| Non-2020 rides | Tests year filtering |
| Multiple rides same month | Tests aggregation logic |
| Empty AcceptedRides | Ensures full month output |
| Boundary join dates | Tests inclusive comparisons |
| Dense monthly data | Stress tests aggregation |
Edge Cases
One important edge case occurs when a month has no accepted rides. A naive inner join could completely remove such months from the result. The solution avoids this by generating all 12 months explicitly and using a LEFT JOIN combined with COALESCE.
Another important edge case involves drivers who joined before 2020. These drivers are already active starting in January 2020 and must be included in every month's cumulative count. The correlated subquery naturally handles this because it counts all drivers whose join date is before the end of the current month.
A third important edge case involves rides requested outside 2020. Accepted rides from 2019 or 2021 must not affect the statistics for 2020. The query correctly filters using WHERE YEAR(r.requested_at) = 2020.
A fourth subtle edge case occurs when a driver joins exactly on the last day of a month. That driver should count as active for that month. The solution uses a <= comparison against the month's last day, ensuring inclusive behavior.