LeetCode 1645 - Hopper Company Queries II

This problem asks us to calculate, for every month in the year 2020, the percentage of drivers who actually worked durin

LeetCode Problem 1645

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to calculate, for every month in the year 2020, the percentage of drivers who actually worked during that month.

The database contains three tables:

  • Drivers stores when each driver joined Hopper.
  • Rides stores all ride requests.
  • AcceptedRides stores only rides that were accepted and completed by a driver.

The key observation is that a driver is considered "available" during a month if they joined on or before the end of that month. A driver is considered "working" during a month if they completed at least one accepted ride in that month.

For every month from January through December 2020, we must compute:

$$\text{working_percentage} = \frac{\text{number of distinct working drivers in that month}} {\text{number of available drivers by end of that month}} \times 100$$

The result must:

  • include all 12 months,
  • be ordered by month ascending,
  • round the percentage to 2 decimal places,
  • return 0 if there are no available drivers.

The difficulty comes from the fact that driver availability is cumulative. Once a driver joins, they remain available for all future months. Meanwhile, working drivers are counted independently for each month.

Another subtle detail is that we count distinct drivers who worked during the month, not total rides. If one driver completed multiple rides in the same month, they still count only once.

The input sizes are not explicitly given in the statement, but since this is a database problem, efficiency matters. A naive solution that repeatedly scans the tables for each month can become unnecessarily expensive. The optimal solution aggregates data once and reuses it.

Important edge cases include:

  • Months with zero accepted rides.
  • Months with zero available drivers.
  • Drivers who joined before 2020.
  • Drivers who joined after 2020, these should not affect 2020 calculations.
  • Drivers completing multiple rides in the same month.
  • Ride requests that exist in Rides but were never accepted.

Approaches

Brute Force Approach

A straightforward solution is to process each month independently.

For every month from 1 to 12:

  1. Scan the Drivers table and count how many drivers joined on or before the end of that month.
  2. Join AcceptedRides with Rides to determine which rides occurred during the current month.
  3. Count the number of distinct drivers who completed accepted rides during that month.
  4. Compute the percentage.

This works because each month is computed directly from the raw data.

However, the problem is inefficiency. For every one of the 12 months, we repeatedly scan the same tables. If the tables are large, this becomes wasteful. The repeated joins and repeated counting of cumulative drivers introduce unnecessary overhead.

Optimal Approach

The key insight is that the calculations can be pre-aggregated.

We notice two important patterns:

  1. Available drivers are cumulative by month.
  2. Working drivers only require monthly distinct counts.

Instead of recomputing everything for every month, we can:

  • First compute how many drivers joined in each month.
  • Build a cumulative running total of available drivers.
  • Separately compute distinct working drivers for each month.
  • Join the two monthly summaries together.

This avoids repeated table scans and transforms the problem into simple grouped aggregations.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(12 × (D + R)) O(1) Repeatedly scans tables for each month
Optimal O(D + R) O(12) Pre-aggregates monthly statistics once

Here:

  • D = number of rows in Drivers
  • R = number of rows in Rides and AcceptedRides

Algorithm Walkthrough

Step 1: Generate All Months

We first create a list of months from 1 through 12.

This guarantees that even months with no drivers or no rides still appear in the final output.

Step 2: Count Driver Joins Per Month

We examine the Drivers table and count how many drivers joined in each month of 2020 or earlier.

For example:

join_month new_drivers
1 1
2 1
3 1

Drivers who joined before 2020 are considered already active starting in January.

Step 3: Build Cumulative Active Driver Counts

Available drivers are cumulative.

If:

  • January adds 2 drivers,
  • February adds 1 driver,
  • March adds 1 driver,

then:

month active_drivers
1 2
2 3
3 4

We use a running total to compute this efficiently.

Step 4: Compute Working Drivers Per Month

We join AcceptedRides with Rides because the month information exists in Rides.requested_at.

For each month in 2020:

  • extract the ride month,
  • count distinct driver_id.

Using COUNT(DISTINCT driver_id) is critical because multiple rides by the same driver should count only once.

Step 5: Compute the Percentage

For each month:

$$\text{working_percentage} = \frac{\text{working drivers}} {\text{active drivers}} \times 100$$

If active drivers are zero, return 0.

Finally, round the value to 2 decimal places.

Why it works

The algorithm works because it separates the problem into two independent monthly aggregates:

  • cumulative available drivers,
  • distinct working drivers.

The cumulative count correctly models driver availability because once a driver joins, they remain active in all future months. The monthly distinct count correctly models working participation because a driver either worked during the month or did not, regardless of how many rides they completed.

Since every month is computed from these correct aggregates, the final percentages are correct.

Python Solution

# Write your MySQL query statement below

WITH RECURSIVE months AS (
    SELECT 1 AS month
    UNION ALL
    SELECT month + 1
    FROM months
    WHERE month < 12
),

driver_counts AS (
    SELECT
        MONTH(join_date) AS month,
        COUNT(*) AS new_drivers
    FROM Drivers
    WHERE YEAR(join_date) = 2020
    GROUP BY MONTH(join_date)
),

active_drivers AS (
    SELECT
        m.month,
        (
            SELECT COUNT(*)
            FROM Drivers d
            WHERE d.join_date <= LAST_DAY(CONCAT('2020-', m.month, '-01'))
        ) AS active_count
    FROM months m
),

working_drivers AS (
    SELECT
        MONTH(r.requested_at) AS month,
        COUNT(DISTINCT ar.driver_id) AS working_count
    FROM AcceptedRides ar
    JOIN Rides r
        ON ar.ride_id = r.ride_id
    WHERE YEAR(r.requested_at) = 2020
    GROUP BY MONTH(r.requested_at)
)

SELECT
    m.month,
    ROUND(
        IFNULL(wd.working_count, 0) * 100.0 /
        IF(ad.active_count, ad.active_count, NULL),
        2
    ) AS working_percentage
FROM months m
LEFT JOIN active_drivers ad
    ON m.month = ad.month
LEFT JOIN working_drivers wd
    ON m.month = wd.month
ORDER BY m.month;

The solution begins by generating all months from 1 through 12 using a recursive common table expression.

Next, the active_drivers CTE computes how many drivers had joined by the end of each month. The condition:

d.join_date <= LAST_DAY(CONCAT('2020-', m.month, '-01'))

ensures cumulative inclusion.

The working_drivers CTE joins accepted rides with rides to retrieve the ride date. It groups by month and counts distinct drivers.

Finally, the main query joins everything together and computes the percentage. The IFNULL handles months without working drivers, while the IF(..., NULL) prevents division by zero.

Go Solution

// There is no Go implementation for SQL database problems on LeetCode.
// The expected submission is a SQL query.

Unlike algorithmic LeetCode problems, database problems are solved directly in SQL. Therefore, there is no executable Go implementation required for submission.

Worked Examples

Example 1

Drivers:

driver_id join_date
10 2019-12-10
8 2020-01-13
5 2020-02-16
7 2020-03-08
4 2020-05-17
1 2020-10-24

Step 1: Active Drivers Per Month

Month Newly Joined Cumulative Active
1 1 2
2 1 3
3 1 4
4 0 4
5 1 5
6 0 5
7 0 5
8 0 5
9 0 5
10 1 6
11 0 6
12 0 6

Driver 10 joined in 2019, so January already starts with one active driver.

Step 2: Working Drivers Per Month

Accepted rides in 2020:

Month Drivers Working Distinct Count
3 {10} 1
6 {10} 1
7 {8} 1
8 {7} 1
11 {1, 7} 2
12 {4} 1

Step 3: Percentage Calculation

Month Active Working Percentage
1 2 0 0.00
2 3 0 0.00
3 4 1 25.00
4 4 0 0.00
5 5 0 0.00
6 5 1 20.00
7 5 1 20.00
8 5 1 20.00
9 5 0 0.00
10 6 0 0.00
11 6 2 33.33
12 6 1 16.67

These values match the expected output.

Complexity Analysis

Measure Complexity Explanation
Time O(D + R) Each table is scanned a constant number of times
Space O(12) Only monthly aggregate tables are stored

The solution is efficient because it avoids recomputing monthly statistics repeatedly. All heavy operations are grouped aggregations over the original tables, and the intermediate results contain at most 12 rows.

Test Cases

# These are conceptual validation cases for the SQL logic.

# Case 1: Example from problem statement
# Validates normal cumulative counting and monthly distinct drivers.

# Case 2: No drivers at all
# Expected: every month has 0.00 percentage.

# Case 3: Drivers exist but no accepted rides
# Expected: every month has 0.00 percentage.

# Case 4: One driver completes multiple rides in same month
# Must count driver only once.

# Case 5: Drivers join after 2020
# They should not affect 2020 calculations.

# Case 6: Driver joined before 2020
# Must be counted active starting January.

# Case 7: Accepted rides exist in 2021
# They must not affect 2020 output.

# Case 8: Sparse months
# Months without rides must still appear.

# Case 9: All rides concentrated in one month
# Distinct driver counting must still work correctly.

# Case 10: Division by zero
# If no active drivers exist during a month, percentage must be 0.

Test Summary

Test Why
Example input Validates standard behavior
No drivers Ensures division-by-zero handling
No accepted rides Ensures working count defaults to 0
Multiple rides same driver Validates DISTINCT counting
Drivers joining after 2020 Ensures future joins ignored
Driver joined before 2020 Validates cumulative availability
2021 rides Ensures year filtering correctness
Sparse months Ensures all 12 months appear
Single busy month Validates monthly grouping
Zero active drivers Prevents invalid percentage calculation

Edge Cases

Months With No Active Drivers

A particularly important edge case occurs when no drivers have joined yet. In this situation, the denominator becomes zero. A naive implementation may produce a division-by-zero error or return NULL.

This solution handles the case explicitly by using conditional division logic. If the active driver count is zero, the percentage evaluates safely to 0.

Multiple Accepted Rides By the Same Driver

Another common source of bugs is counting rides instead of drivers.

Suppose a single driver completes five rides during a month. The problem asks for the percentage of working drivers, not the number of rides completed. Therefore, that driver must contribute only one count.

The implementation correctly uses:

COUNT(DISTINCT ar.driver_id)

which guarantees uniqueness.

Drivers Joining Before 2020

Drivers who joined before January 2020 are already active when the year begins. A flawed implementation might only count drivers whose join date occurs during 2020.

This solution instead compares join dates against the end of each month, ensuring that all previously joined drivers remain active throughout 2020.

Months Without Accepted Rides

Some months may contain no accepted rides at all. If inner joins are used incorrectly, these months could disappear from the result.

The dedicated months table guarantees that every month from 1 to 12 appears in the final output. Left joins then fill missing values with zeros.