LeetCode 1645 - Hopper Company Queries II
This problem asks us to calculate, for every month in the year 2020, the percentage of drivers who actually worked durin
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem asks us to calculate, for every month in the year 2020, the percentage of drivers who actually worked during that month.
The database contains three tables:
Driversstores when each driver joined Hopper.Ridesstores all ride requests.AcceptedRidesstores only rides that were accepted and completed by a driver.
The key observation is that a driver is considered "available" during a month if they joined on or before the end of that month. A driver is considered "working" during a month if they completed at least one accepted ride in that month.
For every month from January through December 2020, we must compute:
$$\text{working_percentage} = \frac{\text{number of distinct working drivers in that month}} {\text{number of available drivers by end of that month}} \times 100$$
The result must:
- include all 12 months,
- be ordered by month ascending,
- round the percentage to 2 decimal places,
- return
0if there are no available drivers.
The difficulty comes from the fact that driver availability is cumulative. Once a driver joins, they remain available for all future months. Meanwhile, working drivers are counted independently for each month.
Another subtle detail is that we count distinct drivers who worked during the month, not total rides. If one driver completed multiple rides in the same month, they still count only once.
The input sizes are not explicitly given in the statement, but since this is a database problem, efficiency matters. A naive solution that repeatedly scans the tables for each month can become unnecessarily expensive. The optimal solution aggregates data once and reuses it.
Important edge cases include:
- Months with zero accepted rides.
- Months with zero available drivers.
- Drivers who joined before 2020.
- Drivers who joined after 2020, these should not affect 2020 calculations.
- Drivers completing multiple rides in the same month.
- Ride requests that exist in
Ridesbut were never accepted.
Approaches
Brute Force Approach
A straightforward solution is to process each month independently.
For every month from 1 to 12:
- Scan the
Driverstable and count how many drivers joined on or before the end of that month. - Join
AcceptedRideswithRidesto determine which rides occurred during the current month. - Count the number of distinct drivers who completed accepted rides during that month.
- Compute the percentage.
This works because each month is computed directly from the raw data.
However, the problem is inefficiency. For every one of the 12 months, we repeatedly scan the same tables. If the tables are large, this becomes wasteful. The repeated joins and repeated counting of cumulative drivers introduce unnecessary overhead.
Optimal Approach
The key insight is that the calculations can be pre-aggregated.
We notice two important patterns:
- Available drivers are cumulative by month.
- Working drivers only require monthly distinct counts.
Instead of recomputing everything for every month, we can:
- First compute how many drivers joined in each month.
- Build a cumulative running total of available drivers.
- Separately compute distinct working drivers for each month.
- Join the two monthly summaries together.
This avoids repeated table scans and transforms the problem into simple grouped aggregations.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(12 × (D + R)) | O(1) | Repeatedly scans tables for each month |
| Optimal | O(D + R) | O(12) | Pre-aggregates monthly statistics once |
Here:
D= number of rows inDriversR= number of rows inRidesandAcceptedRides
Algorithm Walkthrough
Step 1: Generate All Months
We first create a list of months from 1 through 12.
This guarantees that even months with no drivers or no rides still appear in the final output.
Step 2: Count Driver Joins Per Month
We examine the Drivers table and count how many drivers joined in each month of 2020 or earlier.
For example:
| join_month | new_drivers |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
Drivers who joined before 2020 are considered already active starting in January.
Step 3: Build Cumulative Active Driver Counts
Available drivers are cumulative.
If:
- January adds 2 drivers,
- February adds 1 driver,
- March adds 1 driver,
then:
| month | active_drivers |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
We use a running total to compute this efficiently.
Step 4: Compute Working Drivers Per Month
We join AcceptedRides with Rides because the month information exists in Rides.requested_at.
For each month in 2020:
- extract the ride month,
- count distinct
driver_id.
Using COUNT(DISTINCT driver_id) is critical because multiple rides by the same driver should count only once.
Step 5: Compute the Percentage
For each month:
$$\text{working_percentage} = \frac{\text{working drivers}} {\text{active drivers}} \times 100$$
If active drivers are zero, return 0.
Finally, round the value to 2 decimal places.
Why it works
The algorithm works because it separates the problem into two independent monthly aggregates:
- cumulative available drivers,
- distinct working drivers.
The cumulative count correctly models driver availability because once a driver joins, they remain active in all future months. The monthly distinct count correctly models working participation because a driver either worked during the month or did not, regardless of how many rides they completed.
Since every month is computed from these correct aggregates, the final percentages are correct.
Python Solution
# Write your MySQL query statement below
WITH RECURSIVE months AS (
SELECT 1 AS month
UNION ALL
SELECT month + 1
FROM months
WHERE month < 12
),
driver_counts AS (
SELECT
MONTH(join_date) AS month,
COUNT(*) AS new_drivers
FROM Drivers
WHERE YEAR(join_date) = 2020
GROUP BY MONTH(join_date)
),
active_drivers AS (
SELECT
m.month,
(
SELECT COUNT(*)
FROM Drivers d
WHERE d.join_date <= LAST_DAY(CONCAT('2020-', m.month, '-01'))
) AS active_count
FROM months m
),
working_drivers AS (
SELECT
MONTH(r.requested_at) AS month,
COUNT(DISTINCT ar.driver_id) AS working_count
FROM AcceptedRides ar
JOIN Rides r
ON ar.ride_id = r.ride_id
WHERE YEAR(r.requested_at) = 2020
GROUP BY MONTH(r.requested_at)
)
SELECT
m.month,
ROUND(
IFNULL(wd.working_count, 0) * 100.0 /
IF(ad.active_count, ad.active_count, NULL),
2
) AS working_percentage
FROM months m
LEFT JOIN active_drivers ad
ON m.month = ad.month
LEFT JOIN working_drivers wd
ON m.month = wd.month
ORDER BY m.month;
The solution begins by generating all months from 1 through 12 using a recursive common table expression.
Next, the active_drivers CTE computes how many drivers had joined by the end of each month. The condition:
d.join_date <= LAST_DAY(CONCAT('2020-', m.month, '-01'))
ensures cumulative inclusion.
The working_drivers CTE joins accepted rides with rides to retrieve the ride date. It groups by month and counts distinct drivers.
Finally, the main query joins everything together and computes the percentage. The IFNULL handles months without working drivers, while the IF(..., NULL) prevents division by zero.
Go Solution
// There is no Go implementation for SQL database problems on LeetCode.
// The expected submission is a SQL query.
Unlike algorithmic LeetCode problems, database problems are solved directly in SQL. Therefore, there is no executable Go implementation required for submission.
Worked Examples
Example 1
Drivers:
| driver_id | join_date |
|---|---|
| 10 | 2019-12-10 |
| 8 | 2020-01-13 |
| 5 | 2020-02-16 |
| 7 | 2020-03-08 |
| 4 | 2020-05-17 |
| 1 | 2020-10-24 |
Step 1: Active Drivers Per Month
| Month | Newly Joined | Cumulative Active |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
| 4 | 0 | 4 |
| 5 | 1 | 5 |
| 6 | 0 | 5 |
| 7 | 0 | 5 |
| 8 | 0 | 5 |
| 9 | 0 | 5 |
| 10 | 1 | 6 |
| 11 | 0 | 6 |
| 12 | 0 | 6 |
Driver 10 joined in 2019, so January already starts with one active driver.
Step 2: Working Drivers Per Month
Accepted rides in 2020:
| Month | Drivers Working | Distinct Count |
|---|---|---|
| 3 | {10} | 1 |
| 6 | {10} | 1 |
| 7 | {8} | 1 |
| 8 | {7} | 1 |
| 11 | {1, 7} | 2 |
| 12 | {4} | 1 |
Step 3: Percentage Calculation
| Month | Active | Working | Percentage |
|---|---|---|---|
| 1 | 2 | 0 | 0.00 |
| 2 | 3 | 0 | 0.00 |
| 3 | 4 | 1 | 25.00 |
| 4 | 4 | 0 | 0.00 |
| 5 | 5 | 0 | 0.00 |
| 6 | 5 | 1 | 20.00 |
| 7 | 5 | 1 | 20.00 |
| 8 | 5 | 1 | 20.00 |
| 9 | 5 | 0 | 0.00 |
| 10 | 6 | 0 | 0.00 |
| 11 | 6 | 2 | 33.33 |
| 12 | 6 | 1 | 16.67 |
These values match the expected output.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(D + R) | Each table is scanned a constant number of times |
| Space | O(12) | Only monthly aggregate tables are stored |
The solution is efficient because it avoids recomputing monthly statistics repeatedly. All heavy operations are grouped aggregations over the original tables, and the intermediate results contain at most 12 rows.
Test Cases
# These are conceptual validation cases for the SQL logic.
# Case 1: Example from problem statement
# Validates normal cumulative counting and monthly distinct drivers.
# Case 2: No drivers at all
# Expected: every month has 0.00 percentage.
# Case 3: Drivers exist but no accepted rides
# Expected: every month has 0.00 percentage.
# Case 4: One driver completes multiple rides in same month
# Must count driver only once.
# Case 5: Drivers join after 2020
# They should not affect 2020 calculations.
# Case 6: Driver joined before 2020
# Must be counted active starting January.
# Case 7: Accepted rides exist in 2021
# They must not affect 2020 output.
# Case 8: Sparse months
# Months without rides must still appear.
# Case 9: All rides concentrated in one month
# Distinct driver counting must still work correctly.
# Case 10: Division by zero
# If no active drivers exist during a month, percentage must be 0.
Test Summary
| Test | Why |
|---|---|
| Example input | Validates standard behavior |
| No drivers | Ensures division-by-zero handling |
| No accepted rides | Ensures working count defaults to 0 |
| Multiple rides same driver | Validates DISTINCT counting |
| Drivers joining after 2020 | Ensures future joins ignored |
| Driver joined before 2020 | Validates cumulative availability |
| 2021 rides | Ensures year filtering correctness |
| Sparse months | Ensures all 12 months appear |
| Single busy month | Validates monthly grouping |
| Zero active drivers | Prevents invalid percentage calculation |
Edge Cases
Months With No Active Drivers
A particularly important edge case occurs when no drivers have joined yet. In this situation, the denominator becomes zero. A naive implementation may produce a division-by-zero error or return NULL.
This solution handles the case explicitly by using conditional division logic. If the active driver count is zero, the percentage evaluates safely to 0.
Multiple Accepted Rides By the Same Driver
Another common source of bugs is counting rides instead of drivers.
Suppose a single driver completes five rides during a month. The problem asks for the percentage of working drivers, not the number of rides completed. Therefore, that driver must contribute only one count.
The implementation correctly uses:
COUNT(DISTINCT ar.driver_id)
which guarantees uniqueness.
Drivers Joining Before 2020
Drivers who joined before January 2020 are already active when the year begins. A flawed implementation might only count drivers whose join date occurs during 2020.
This solution instead compares join dates against the end of each month, ensuring that all previously joined drivers remain active throughout 2020.
Months Without Accepted Rides
Some months may contain no accepted rides at all. If inner joins are used incorrectly, these months could disappear from the result.
The dedicated months table guarantees that every month from 1 to 12 appears in the final output. Left joins then fill missing values with zeros.