LeetCode 2142 - The Number of Passengers in Each Bus I

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem involves simulating passengers arriving at a bus station and boarding buses as they arrive. We are given two tables: Buses and Passengers. Each bus has a unique bus_id and an arrival_time, and each passenger has a unique passenger_id and an arrival_time. Passengers board the first bus that arrives after their arrival time, and each passenger can only board one bus. Our goal is to calculate the number of passengers that board each bus and return the results sorted by bus_id in ascending order.

The input represents the arrival times of buses and passengers, and the output is a count of passengers for each bus. Constraints include that no two buses arrive at the same time, and the number of buses and passengers can be large enough that naive approaches could be inefficient. Edge cases include situations where buses arrive before any passengers, after all passengers, or where multiple passengers arrive at the same time as a bus.

Approaches

The brute-force approach is straightforward: for each bus, iterate through all passengers and count those who arrived at or before the bus's arrival time and have not already boarded a previous bus. While this is correct, it is inefficient, requiring nested loops and resulting in a time complexity of $O(B \times P)$, where $B$ is the number of buses and $P$ is the number of passengers.

A more optimal approach relies on sorting both buses and passengers by arrival time and using a running pointer to track which passengers have boarded. For each bus, we iterate through passengers in order, boarding all passengers whose arrival time is less than or equal to the current bus's arrival. This approach eliminates the need to repeatedly scan passengers for every bus, reducing time complexity to $O(B \log B + P \log P + B + P)$, dominated by the sorting step.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(B * P)	O(1)	Nested loops over buses and passengers; correct but inefficient
Optimal	O(B log B + P log P)	O(P)	Sort both tables and use a pointer to assign passengers efficiently

Algorithm Walkthrough

Sort the Buses table by arrival_time while keeping track of bus_id.
Sort the Passengers table by arrival_time.
Initialize a result list to store bus_id and passengers_cnt.
Initialize a pointer p to 0, representing the index of the next passenger who has not boarded a bus.
Iterate over each bus in order of arrival time:
Initialize a counter cnt to 0 for this bus.
While p is less than the total number of passengers and the passenger at index p has arrival_time ≤ current bus arrival time:
Increment cnt by 1.
Increment p to mark the passenger as boarded.
Append the tuple (bus_id, cnt) to the result list.
Finally, sort the result list by bus_id to match the output requirements.

Why it works: This algorithm works because by sorting both buses and passengers by time, we maintain the invariant that every passenger is assigned to the earliest bus possible. The pointer ensures no passenger is counted twice, and sorting guarantees that buses are processed in temporal order.

Python Solution

import pandas as pd

class Solution:
    def numberOfPassengers(self, Buses: pd.DataFrame, Passengers: pd.DataFrame) -> pd.DataFrame:
        # Sort buses and passengers by arrival_time
        buses_sorted = Buses.sort_values(by='arrival_time').reset_index(drop=True)
        passengers_sorted = Passengers.sort_values(by='arrival_time').reset_index(drop=True)

        result = []
        p = 0
        num_passengers = len(passengers_sorted)

        # Iterate over buses in chronological order
        for _, bus in buses_sorted.iterrows():
            cnt = 0
            while p < num_passengers and passengers_sorted.at[p, 'arrival_time'] <= bus['arrival_time']:
                cnt += 1
                p += 1
            result.append({'bus_id': bus['bus_id'], 'passengers_cnt': cnt})

        # Return the result sorted by bus_id
        return pd.DataFrame(result).sort_values(by='bus_id').reset_index(drop=True)

The Python implementation first sorts the buses and passengers by arrival_time so that we can process them chronologically. The pointer p ensures that each passenger is only assigned once, and the result is built incrementally. Finally, sorting by bus_id ensures the output matches the problem requirement.

Go Solution

package main

import (
    "sort"
)

type Bus struct {
    BusID       int
    ArrivalTime int
}

type Passenger struct {
    PassengerID int
    ArrivalTime int
}

type BusCount struct {
    BusID         int
    PassengersCnt int
}

func NumberOfPassengers(buses []Bus, passengers []Passenger) []BusCount {
    sort.Slice(buses, func(i, j int) bool { return buses[i].ArrivalTime < buses[j].ArrivalTime })
    sort.Slice(passengers, func(i, j int) bool { return passengers[i].ArrivalTime < passengers[j].ArrivalTime })

    result := make([]BusCount, 0, len(buses))
    p := 0
    for _, bus := range buses {
        cnt := 0
        for p < len(passengers) && passengers[p].ArrivalTime <= bus.ArrivalTime {
            cnt++
            p++
        }
        result = append(result, BusCount{BusID: bus.BusID, PassengersCnt: cnt})
    }

    sort.Slice(result, func(i, j int) bool { return result[i].BusID < result[j].BusID })
    return result
}

The Go implementation mirrors the Python solution, using slices and sort functions. The main difference is explicit slice allocation and struct types for the tables, as Go does not have a built-in DataFrame type.

Worked Examples

For the example provided:

Buses sorted by arrival time: [(1,2), (2,4), (3,7)]
Passengers sorted by arrival time: [(11,1), (12,5), (13,6), (14,7)]

Iteration:

Bus ID	Arrival Time	Passenger Index p	Passengers Boarded	Count
1	2	0	11	1
2	4	1	none	0
3	7	1	12, 13, 14	3

Final result sorted by bus_id: [(1,1), (2,0), (3,3)]

Complexity Analysis

Measure	Complexity	Explanation
Time	O(B log B + P log P)	Sorting buses and passengers dominates; the assignment step is O(B + P)
Space	O(P)	Storing the result and using a pointer over passengers

Sorting ensures we can assign passengers in a single pass per bus.

Test Cases

import pandas as pd

# Example from problem
Buses = pd.DataFrame({'bus_id': [1,2,3], 'arrival_time': [2,4,7]})
Passengers = pd.DataFrame({'passenger_id': [11,12,13,14], 'arrival_time': [1,5,6,7]})
sol = Solution()
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2,3],'passengers_cnt':[1,0,3]}))  # Example test

# No passengers
Buses = pd.DataFrame({'bus_id':[1,2],'arrival_time':[1,2]})
Passengers = pd.DataFrame({'passenger_id':[],'arrival_time':[]})
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2],'passengers_cnt':[0,0]}))  # No passengers

# Passengers all before first bus
Buses = pd.DataFrame({'bus_id':[1,2],'arrival_time':[5,10]})
Passengers = pd.DataFrame({'passenger_id':[1,2,3],'arrival_time':[1,2,3]})
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2],'passengers_cnt':[3,0]}))  # All board first bus

# Passengers after all buses
Buses = pd.DataFrame({'bus_id':[1,2],'arrival_time':[1,2]})
Passengers = pd.DataFrame({'passenger_id':[1,2],'arrival_time':[3,4]})
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2],'passengers_cnt':[0,0]}))  # No passengers can board

Test	Why
Example	Validates normal passenger distribution
No passengers	Ensures algorithm handles empty passenger table
Passengers all before first bus	Checks that all early passengers are assigned to the first bus
Passengers after all buses	Ensures passengers after the last bus are not counted

Edge Cases

One edge case is when no passengers are present