LeetCode 2142 - The Number of Passengers in Each Bus I
This problem involves simulating passengers arriving at a bus station and boarding buses as they arrive. We are given two tables: Buses and Passengers. Each bus has a unique busid and an arrivaltime, and each passenger has a unique passengerid and an arrivaltime.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem involves simulating passengers arriving at a bus station and boarding buses as they arrive. We are given two tables: Buses and Passengers. Each bus has a unique bus_id and an arrival_time, and each passenger has a unique passenger_id and an arrival_time. Passengers board the first bus that arrives after their arrival time, and each passenger can only board one bus. Our goal is to calculate the number of passengers that board each bus and return the results sorted by bus_id in ascending order.
The input represents the arrival times of buses and passengers, and the output is a count of passengers for each bus. Constraints include that no two buses arrive at the same time, and the number of buses and passengers can be large enough that naive approaches could be inefficient. Edge cases include situations where buses arrive before any passengers, after all passengers, or where multiple passengers arrive at the same time as a bus.
Approaches
The brute-force approach is straightforward: for each bus, iterate through all passengers and count those who arrived at or before the bus's arrival time and have not already boarded a previous bus. While this is correct, it is inefficient, requiring nested loops and resulting in a time complexity of $O(B \times P)$, where $B$ is the number of buses and $P$ is the number of passengers.
A more optimal approach relies on sorting both buses and passengers by arrival time and using a running pointer to track which passengers have boarded. For each bus, we iterate through passengers in order, boarding all passengers whose arrival time is less than or equal to the current bus's arrival. This approach eliminates the need to repeatedly scan passengers for every bus, reducing time complexity to $O(B \log B + P \log P + B + P)$, dominated by the sorting step.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(B * P) | O(1) | Nested loops over buses and passengers; correct but inefficient |
| Optimal | O(B log B + P log P) | O(P) | Sort both tables and use a pointer to assign passengers efficiently |
Algorithm Walkthrough
-
Sort the
Busestable byarrival_timewhile keeping track ofbus_id. -
Sort the
Passengerstable byarrival_time. -
Initialize a result list to store
bus_idandpassengers_cnt. -
Initialize a pointer
pto 0, representing the index of the next passenger who has not boarded a bus. -
Iterate over each bus in order of arrival time:
-
Initialize a counter
cntto 0 for this bus. -
While
pis less than the total number of passengers and the passenger at indexphasarrival_time≤ current bus arrival time: -
Increment
cntby 1. -
Increment
pto mark the passenger as boarded. -
Append the tuple
(bus_id, cnt)to the result list. -
Finally, sort the result list by
bus_idto match the output requirements.
Why it works: This algorithm works because by sorting both buses and passengers by time, we maintain the invariant that every passenger is assigned to the earliest bus possible. The pointer ensures no passenger is counted twice, and sorting guarantees that buses are processed in temporal order.
Python Solution
import pandas as pd
class Solution:
def numberOfPassengers(self, Buses: pd.DataFrame, Passengers: pd.DataFrame) -> pd.DataFrame:
# Sort buses and passengers by arrival_time
buses_sorted = Buses.sort_values(by='arrival_time').reset_index(drop=True)
passengers_sorted = Passengers.sort_values(by='arrival_time').reset_index(drop=True)
result = []
p = 0
num_passengers = len(passengers_sorted)
# Iterate over buses in chronological order
for _, bus in buses_sorted.iterrows():
cnt = 0
while p < num_passengers and passengers_sorted.at[p, 'arrival_time'] <= bus['arrival_time']:
cnt += 1
p += 1
result.append({'bus_id': bus['bus_id'], 'passengers_cnt': cnt})
# Return the result sorted by bus_id
return pd.DataFrame(result).sort_values(by='bus_id').reset_index(drop=True)
The Python implementation first sorts the buses and passengers by arrival_time so that we can process them chronologically. The pointer p ensures that each passenger is only assigned once, and the result is built incrementally. Finally, sorting by bus_id ensures the output matches the problem requirement.
Go Solution
package main
import (
"sort"
)
type Bus struct {
BusID int
ArrivalTime int
}
type Passenger struct {
PassengerID int
ArrivalTime int
}
type BusCount struct {
BusID int
PassengersCnt int
}
func NumberOfPassengers(buses []Bus, passengers []Passenger) []BusCount {
sort.Slice(buses, func(i, j int) bool { return buses[i].ArrivalTime < buses[j].ArrivalTime })
sort.Slice(passengers, func(i, j int) bool { return passengers[i].ArrivalTime < passengers[j].ArrivalTime })
result := make([]BusCount, 0, len(buses))
p := 0
for _, bus := range buses {
cnt := 0
for p < len(passengers) && passengers[p].ArrivalTime <= bus.ArrivalTime {
cnt++
p++
}
result = append(result, BusCount{BusID: bus.BusID, PassengersCnt: cnt})
}
sort.Slice(result, func(i, j int) bool { return result[i].BusID < result[j].BusID })
return result
}
The Go implementation mirrors the Python solution, using slices and sort functions. The main difference is explicit slice allocation and struct types for the tables, as Go does not have a built-in DataFrame type.
Worked Examples
For the example provided:
- Buses sorted by arrival time: [(1,2), (2,4), (3,7)]
- Passengers sorted by arrival time: [(11,1), (12,5), (13,6), (14,7)]
Iteration:
| Bus ID | Arrival Time | Passenger Index p | Passengers Boarded | Count |
|---|---|---|---|---|
| 1 | 2 | 0 | 11 | 1 |
| 2 | 4 | 1 | none | 0 |
| 3 | 7 | 1 | 12, 13, 14 | 3 |
Final result sorted by bus_id: [(1,1), (2,0), (3,3)]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(B log B + P log P) | Sorting buses and passengers dominates; the assignment step is O(B + P) |
| Space | O(P) | Storing the result and using a pointer over passengers |
Sorting ensures we can assign passengers in a single pass per bus.
Test Cases
import pandas as pd
# Example from problem
Buses = pd.DataFrame({'bus_id': [1,2,3], 'arrival_time': [2,4,7]})
Passengers = pd.DataFrame({'passenger_id': [11,12,13,14], 'arrival_time': [1,5,6,7]})
sol = Solution()
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2,3],'passengers_cnt':[1,0,3]})) # Example test
# No passengers
Buses = pd.DataFrame({'bus_id':[1,2],'arrival_time':[1,2]})
Passengers = pd.DataFrame({'passenger_id':[],'arrival_time':[]})
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2],'passengers_cnt':[0,0]})) # No passengers
# Passengers all before first bus
Buses = pd.DataFrame({'bus_id':[1,2],'arrival_time':[5,10]})
Passengers = pd.DataFrame({'passenger_id':[1,2,3],'arrival_time':[1,2,3]})
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2],'passengers_cnt':[3,0]})) # All board first bus
# Passengers after all buses
Buses = pd.DataFrame({'bus_id':[1,2],'arrival_time':[1,2]})
Passengers = pd.DataFrame({'passenger_id':[1,2],'arrival_time':[3,4]})
df = sol.numberOfPassengers(Buses, Passengers)
assert df.equals(pd.DataFrame({'bus_id':[1,2],'passengers_cnt':[0,0]})) # No passengers can board
| Test | Why |
|---|---|
| Example | Validates normal passenger distribution |
| No passengers | Ensures algorithm handles empty passenger table |
| Passengers all before first bus | Checks that all early passengers are assigned to the first bus |
| Passengers after all buses | Ensures passengers after the last bus are not counted |
Edge Cases
One edge case is when no passengers are present