LeetCode 2238 - Number of Times a Driver Was a Passenger

The problem provides a table Rides that records rides between drivers and passengers. Each row has a unique rideid, a driverid, and a passengerid. The task is to determine, for each driver, how many times that driver has appeared as a passenger in the table.

LeetCode Problem 2238

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem provides a table Rides that records rides between drivers and passengers. Each row has a unique ride_id, a driver_id, and a passenger_id. The task is to determine, for each driver, how many times that driver has appeared as a passenger in the table. In other words, for each unique driver_id, we count the occurrences where their ID appears in the passenger_id column.

The input table can have multiple rides for the same driver or passenger, and a driver can never drive themselves (i.e., driver_id != passenger_id). The expected output is a table with two columns: driver_id and cnt, where cnt represents the number of times this driver was a passenger. The result can be returned in any order.

Important constraints and observations include: each ride_id is unique, the table can contain many rides, and the main challenge is efficiently counting passenger occurrences for all drivers without missing any driver or double counting.

Edge cases that could trip up a naive solution include drivers who never appear as passengers (the count should be zero), multiple rides with the same passenger, and situations where the Rides table is empty. The solution must account for drivers with zero passenger occurrences.

Approaches

A brute-force approach would iterate over all unique drivers and, for each driver, scan the entire passenger_id column to count matches. This approach works because it directly follows the problem description, but it is inefficient, requiring a nested scan over the table, which results in O(n*m) time complexity, where n is the number of unique drivers and m is the total number of rides.

The optimal approach relies on using aggregation. First, count how many times each passenger ID appears using a GROUP BY query. Then, join this aggregation with the list of unique drivers to get the count of times each driver appears as a passenger. This method is efficient because it leverages database aggregation and join operations instead of nested loops, reducing the computational complexity significantly.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * m) O(1) Iterate over each driver, count in passenger column
Optimal O(m) O(n) Aggregate passenger counts, join with unique drivers

Algorithm Walkthrough

  1. Identify all unique drivers in the Rides table using SELECT DISTINCT driver_id.
  2. Aggregate counts of each passenger ID across all rides with GROUP BY passenger_id to get how many times each person was a passenger.
  3. Perform a left join between the unique drivers and the passenger counts. This ensures that all drivers appear in the result, even those who were never passengers.
  4. Replace null counts with 0 for drivers who never appeared as a passenger.
  5. Return the resulting table with columns driver_id and cnt.

Why it works: The left join guarantees that every driver is included in the result. The aggregation step ensures that passenger occurrences are counted correctly. Replacing null with 0 maintains correctness for drivers who were never passengers.

Python Solution

import pandas as pd

def count_passenger_occurrences(rides: pd.DataFrame) -> pd.DataFrame:
    # Step 1: Find all unique drivers
    drivers = rides[['driver_id']].drop_duplicates()
    
    # Step 2: Count how many times each driver appears as a passenger
    passenger_counts = rides.groupby('passenger_id').size().reset_index(name='cnt')
    
    # Step 3: Join the counts with the drivers
    result = drivers.merge(passenger_counts, how='left', left_on='driver_id', right_on='passenger_id')
    
    # Step 4: Replace null counts with 0
    result['cnt'] = result['cnt'].fillna(0).astype(int)
    
    # Step 5: Select only required columns
    return result[['driver_id', 'cnt']]

The implementation mirrors the algorithm steps. First, it extracts all unique drivers. Then, it uses groupby to count passenger occurrences efficiently. The left join ensures all drivers are included, and fillna handles drivers with zero occurrences.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
    "fmt"
)

func CountPassengerOccurrences(db *sql.DB) ([]struct {
    DriverID int
    Cnt      int
}, error) {
    query := `
        SELECT d.driver_id, IFNULL(p.cnt, 0) AS cnt
        FROM (SELECT DISTINCT driver_id FROM Rides) d
        LEFT JOIN (
            SELECT passenger_id, COUNT(*) AS cnt
            FROM Rides
            GROUP BY passenger_id
        ) p
        ON d.driver_id = p.passenger_id
    `
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []struct {
        DriverID int
        Cnt      int
    }

    for rows.Next() {
        var r struct {
            DriverID int
            Cnt      int
        }
        if err := rows.Scan(&r.DriverID, &r.Cnt); err != nil {
            return nil, err
        }
        results = append(results, r)
    }

    return results, nil
}

In Go, the solution uses a SQL query to leverage aggregation and joining, similar to the Python logic. Handling null values with IFNULL ensures drivers with zero counts are correctly processed.

Worked Examples

For the input:

Rides table:
+---------+-----------+--------------+
| ride_id | driver_id | passenger_id |
+---------+-----------+--------------+
| 1       | 7         | 1            |
| 2       | 7         | 2            |
| 3       | 11        | 1            |
| 4       | 11        | 7            |
| 5       | 11        | 7            |
| 6       | 11        | 3            |

Step 1: Unique drivers → {7, 11}

Step 2: Passenger counts → {1:2, 2:1, 3:1, 7:2}

Step 3: Left join → {7:2, 11:null}

Step 4: Replace null → {7:2, 11:0}

Final output:

+-----------+-----+
| driver_id | cnt |
+-----------+-----+
| 7         | 2   |
| 11        | 0   |
+-----------+-----+

Complexity Analysis

Measure Complexity Explanation
Time O(m) Counting passengers requires scanning all rides once, and joining is proportional to the number of unique drivers
Space O(n) Storing passenger counts and driver IDs requires space proportional to the number of unique drivers and passengers

The time complexity is linear in the number of rides since aggregation and joins are optimized by the database engine. Space complexity is proportional to the number of unique IDs stored for aggregation and joining.

Test Cases

import pandas as pd

# Example case
rides = pd.DataFrame({
    'ride_id': [1,2,3,4,5,6],
    'driver_id': [7,7,11,11,11,11],
    'passenger_id': [1,2,1,7,7,3]
})
result = count_passenger_occurrences(rides)
assert result[result['driver_id']==7]['cnt'].iloc[0] == 2  # 7 as passenger twice
assert result[result['driver_id']==11]['cnt'].iloc[0] == 0 # 11 never a passenger

# Edge: no rides
rides_empty = pd.DataFrame(columns=['ride_id','driver_id','passenger_id'])
result_empty = count_passenger_occurrences(rides_empty)
assert result_empty.empty  # No drivers, output empty

# Edge: driver never a passenger
rides_single = pd.DataFrame({
    'ride_id': [1],
    'driver_id': [5],
    'passenger_id': [10]
})
result_single = count_passenger_occurrences(rides_single)
assert result_single['cnt'].iloc[0] == 0  # 5 never as passenger

# Multiple appearances as passenger
rides_multi = pd.DataFrame({
    'ride_id': [1,2,3],
    'driver_id': [1,2,3],
    'passenger_id': [2,2,2]
})
result_multi = count_passenger_occurrences(rides_multi)
assert result_multi[result_multi['driver_id']==2]['cnt'].iloc[0] == 3  # 2 as passenger three times
Test Why
Example case Validates standard scenario
No rides Handles empty input table
Driver never a passenger Ensures zero counts are correct
Multiple appearances as passenger Checks aggregation correctness

Edge Cases

A driver never being a passenger is a common source of bugs. If null values after joining are not replaced with 0, the output will be incorrect. The implementation explicitly fills null with 0.

An empty Rides table is another edge case. The algorithm handles this gracefully by returning an empty result set.

Drivers who appear multiple times as passengers must have their counts summed correctly. The aggregation with GROUP BY ensures that multiple appearances are counted correctly without duplication.

This comprehensive approach ensures correctness across typical, empty, and extreme input scenarios