LeetCode 262 - Trips and Users

This problem asks us to compute the daily cancellation rate for taxi trips over a fixed date range, specifically from "2013-10-01" to "2013-10-03".

LeetCode Problem 262

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to compute the daily cancellation rate for taxi trips over a fixed date range, specifically from "2013-10-01" to "2013-10-03".

We are given two database tables:

  • Trips, which stores information about every taxi trip request
  • Users, which stores user metadata such as whether a user is banned

Each trip contains a client, a driver, a trip status, and the request date. The important detail is that both the client and the driver are represented by IDs that reference rows in the Users table.

The cancellation rate for a day is defined as:

$$\text{Cancellation Rate} = \frac{\text{Number of canceled trips with unbanned users}} {\text{Total number of trips with unbanned users}}$$

A trip counts only if:

  • the client is not banned
  • the driver is not banned

A trip is considered canceled if its status is either:

  • cancelled_by_driver
  • cancelled_by_client

Trips with status completed are not cancellations.

The output should contain:

  • one row per day
  • the day itself
  • the cancellation rate rounded to two decimal places

The problem guarantees that we only need to consider days that have at least one valid trip entry in the given range.

A subtle but important detail is that banned users invalidate the entire trip for the calculation. Even if only the client is banned, or only the driver is banned, the trip must be excluded from both the numerator and denominator.

The input scale in LeetCode database problems is usually moderate, but the goal is not raw performance through indexing tricks. Instead, the challenge is correctly expressing filtering, aggregation, and conditional counting in SQL.

Several edge cases can easily cause mistakes:

  • Trips involving banned users must be ignored entirely
  • Only canceled trips contribute to the numerator
  • Completed trips still contribute to the denominator
  • Cancellation rates must be rounded to exactly two decimal places
  • Both client and driver must be checked independently
  • Days with no valid trips should not appear

Approaches

Brute Force Approach

A brute force solution would process each trip individually and repeatedly query the Users table to determine whether the client and driver are banned.

Conceptually, the algorithm would work like this:

  1. Iterate through every trip in the date range.
  2. For each trip:
  • look up the client in Users
  • look up the driver in Users
  1. Skip the trip if either user is banned.
  2. Otherwise:
  • increment the total trip count for that day
  • increment the canceled count if the status indicates cancellation
  1. After processing all trips, compute:

$$\text{canceled} / \text{total}$$

for each day.

This produces the correct answer because every trip is individually validated and categorized correctly.

However, this approach is inefficient because it repeatedly scans or searches the Users table for every trip. Without proper joins or indexing, this can become expensive as the dataset grows.

Optimal Approach

The key insight is that SQL is designed specifically for relational filtering and aggregation.

Instead of manually validating users one trip at a time, we can:

  1. Join the Trips table with the Users table twice:
  • once for the client
  • once for the driver
  1. Filter out banned users directly in the query
  2. Group the remaining trips by day
  3. Use conditional aggregation to count canceled trips

The important observation is that cancellation counting is naturally expressible as a conditional sum:

SUM(status != 'completed')

or equivalently:

SUM(CASE WHEN status != 'completed' THEN 1 ELSE 0 END)

Since SQL treats grouped aggregation very efficiently, this approach avoids repeated scans and computes everything in a single pass over the filtered dataset.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(T × U) O(D) Repeatedly searches users for each trip
Optimal O(T + U) O(D) Uses joins and grouped aggregation efficiently

Here:

  • T is the number of trips
  • U is the number of users
  • D is the number of distinct days

Algorithm Walkthrough

Optimal SQL Algorithm

  1. Start with the Trips table because every calculation is based on trip records.
  2. Join the Users table for the client relationship. This allows us to access whether the client is banned. We join using:
Trips.client_id = Users.users_id
  1. Join the Users table a second time for the driver relationship. Since the same table is used twice, aliases are necessary:
  • one alias for clients
  • one alias for drivers
  1. Filter the dataset so only trips with unbanned users remain:
client.banned = 'No'
AND driver.banned = 'No'
  1. Restrict the date range to:
BETWEEN '2013-10-01' AND '2013-10-03'
  1. Group all remaining rows by request_at. This allows us to compute daily statistics independently.
  2. For each group:
  • count total valid trips using COUNT(*)
  • count canceled trips using a conditional aggregation
  1. Divide canceled trips by total trips to compute the cancellation rate.
  2. Round the result to two decimal places using ROUND().
  3. Return the day and computed cancellation rate.

Why it works

The algorithm works because every trip is filtered before aggregation. This guarantees that only trips involving unbanned users contribute to the statistics.

Grouping by day partitions the dataset correctly, and conditional aggregation ensures that canceled trips are counted separately from completed trips. Since every valid trip contributes exactly once to the denominator, and every canceled valid trip contributes exactly once to the numerator, the computed ratio is mathematically correct.

Python Solution

Although this is a database problem normally solved with SQL, the following demonstrates the equivalent logic in Python.

from collections import defaultdict
from typing import List, Dict

class Solution:
    def cancellationRate(
        self,
        trips: List[Dict],
        users: List[Dict]
    ) -> List[Dict]:

        user_banned = {}

        for user in users:
            user_banned[user["users_id"]] = user["banned"]

        total_trips = defaultdict(int)
        cancelled_trips = defaultdict(int)

        for trip in trips:
            day = trip["request_at"]

            if not ("2013-10-01" <= day <= "2013-10-03"):
                continue

            client_ok = user_banned.get(trip["client_id"]) == "No"
            driver_ok = user_banned.get(trip["driver_id"]) == "No"

            if not (client_ok and driver_ok):
                continue

            total_trips[day] += 1

            if trip["status"] != "completed":
                cancelled_trips[day] += 1

        result = []

        for day in sorted(total_trips.keys()):
            rate = cancelled_trips[day] / total_trips[day]

            result.append({
                "Day": day,
                "Cancellation Rate": round(rate, 2)
            })

        return result

The implementation begins by building a lookup table from users_id to banned status. This allows constant time access when validating trips.

Two hash maps are then maintained:

  • total_trips tracks all valid trips per day
  • cancelled_trips tracks canceled valid trips per day

Each trip is processed once. The algorithm first checks whether the date falls inside the required range. It then verifies that both the client and driver are unbanned.

If the trip is valid, the total counter for that day increases. If the trip status is not completed, the canceled counter also increases.

Finally, the cancellation rate is computed for every day and rounded to two decimal places before returning the result.

Go Solution

package main

import (
	"sort"
)

type Trip struct {
	ID        int
	ClientID  int
	DriverID  int
	CityID    int
	Status    string
	RequestAt string
}

type User struct {
	UsersID int
	Banned  string
	Role    string
}

type Result struct {
	Day              string
	CancellationRate float64
}

func cancellationRate(trips []Trip, users []User) []Result {
	userBanned := make(map[int]string)

	for _, user := range users {
		userBanned[user.UsersID] = user.Banned
	}

	totalTrips := make(map[string]int)
	cancelledTrips := make(map[string]int)

	for _, trip := range trips {
		day := trip.RequestAt

		if day < "2013-10-01" || day > "2013-10-03" {
			continue
		}

		clientOK := userBanned[trip.ClientID] == "No"
		driverOK := userBanned[trip.DriverID] == "No"

		if !(clientOK && driverOK) {
			continue
		}

		totalTrips[day]++

		if trip.Status != "completed" {
			cancelledTrips[day]++
		}
	}

	var days []string

	for day := range totalTrips {
		days = append(days, day)
	}

	sort.Strings(days)

	results := []Result{}

	for _, day := range days {
		rate := float64(cancelledTrips[day]) / float64(totalTrips[day])

		results = append(results, Result{
			Day:              day,
			CancellationRate: float64(int(rate*100+0.5)) / 100,
		})
	}

	return results
}

The Go implementation follows the same logic as the Python version but uses maps explicitly for hash table behavior.

One notable difference is rounding. Go does not have a direct equivalent of Python’s round(..., 2) for decimal formatting in raw float calculations, so manual rounding is used:

float64(int(rate*100+0.5)) / 100

Go also requires explicit type conversion when dividing integers to avoid integer division.

SQL Solution

SELECT
    t.request_at AS Day,
    ROUND(
        AVG(
            CASE
                WHEN t.status = 'completed' THEN 0
                ELSE 1
            END
        ),
        2
    ) AS `Cancellation Rate`
FROM Trips t
JOIN Users c
    ON t.client_id = c.users_id
JOIN Users d
    ON t.driver_id = d.users_id
WHERE c.banned = 'No'
    AND d.banned = 'No'
    AND t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY t.request_at;

This solution uses an elegant SQL trick:

  • completed becomes 0
  • canceled statuses become 1

Taking the average of these values directly computes:

$$\frac{\text{canceled trips}}{\text{total trips}}$$

This avoids writing separate numerator and denominator calculations.

Worked Examples

Example 1

Initial Validity Check

We first determine which trips are eligible.

Trip ID Client Banned Driver Banned Valid?
1 No No Yes
2 Yes No No
3 No No Yes
4 No No Yes
5 No No Yes
6 Yes No No
7 No No Yes
8 Yes No No
9 No No Yes
10 No No Yes

Day: 2013-10-01

Valid trips:

Trip ID Status Cancelled?
1 completed No
3 completed No
4 cancelled_by_client Yes

State after processing:

Variable Value
total_trips["2013-10-01"] 3
cancelled_trips["2013-10-01"] 1

Cancellation rate:

$$\frac{1}{3} = 0.33$$

Day: 2013-10-02

Valid trips:

Trip ID Status Cancelled?
5 completed No
7 completed No

State after processing:

Variable Value
total_trips["2013-10-02"] 2
cancelled_trips["2013-10-02"] 0

Cancellation rate:

$$\frac{0}{2} = 0.00$$

Day: 2013-10-03

Valid trips:

Trip ID Status Cancelled?
9 completed No
10 cancelled_by_driver Yes

State after processing:

Variable Value
total_trips["2013-10-03"] 2
cancelled_trips["2013-10-03"] 1

Cancellation rate:

$$\frac{1}{2} = 0.50$$

Complexity Analysis

Measure Complexity Explanation
Time O(T + U) Each user and trip is processed once
Space O(D + U) Stores user lookup table and per-day counters

The algorithm performs a single pass over the users table to build the banned lookup map and a single pass over the trips table to compute statistics.

Hash map lookups for banned status are constant time on average, making the overall runtime linear in the input size.

The additional space comes from:

  • storing user statuses
  • storing aggregated counts per day

Since the number of days is very small in this problem, the dominant factor is the user lookup table.

Test Cases

from collections import defaultdict

def cancellation_rate(trips, users):
    user_banned = {
        user["users_id"]: user["banned"]
        for user in users
    }

    total = defaultdict(int)
    cancelled = defaultdict(int)

    for trip in trips:
        day = trip["request_at"]

        if not ("2013-10-01" <= day <= "2013-10-03"):
            continue

        if (
            user_banned[trip["client_id"]] != "No"
            or user_banned[trip["driver_id"]] != "No"
        ):
            continue

        total[day] += 1

        if trip["status"] != "completed":
            cancelled[day] += 1

    result = {}

    for day in total:
        result[day] = round(cancelled[day] / total[day], 2)

    return result

users = [
    {"users_id": 1, "banned": "No"},
    {"users_id": 2, "banned": "Yes"},
    {"users_id": 10, "banned": "No"},
]

# Basic completed trip
assert cancellation_rate(
    [
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "completed",
            "request_at": "2013-10-01"
        }
    ],
    users
) == {
    "2013-10-01": 0.0
}  # no cancellations

# Cancelled valid trip
assert cancellation_rate(
    [
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "cancelled_by_driver",
            "request_at": "2013-10-01"
        }
    ],
    users
) == {
    "2013-10-01": 1.0
}  # all trips cancelled

# Banned client ignored
assert cancellation_rate(
    [
        {
            "client_id": 2,
            "driver_id": 10,
            "status": "cancelled_by_driver",
            "request_at": "2013-10-01"
        }
    ],
    users
) == {}  # invalid trip excluded

# Mixed trips
assert cancellation_rate(
    [
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "completed",
            "request_at": "2013-10-01"
        },
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "cancelled_by_client",
            "request_at": "2013-10-01"
        }
    ],
    users
) == {
    "2013-10-01": 0.5
}  # one out of two cancelled

# Outside date range ignored
assert cancellation_rate(
    [
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "completed",
            "request_at": "2013-09-30"
        }
    ],
    users
) == {}  # date outside allowed range

# Multiple days
assert cancellation_rate(
    [
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "completed",
            "request_at": "2013-10-01"
        },
        {
            "client_id": 1,
            "driver_id": 10,
            "status": "cancelled_by_driver",
            "request_at": "2013-10-02"
        }
    ],
    users
) == {
    "2013-10-01": 0.0,
    "2013-10-02": 1.0
}  # independent daily aggregation

Test Summary

Test Why
Single completed trip Verifies zero cancellation rate
Single cancelled trip Verifies full cancellation rate
Banned client Ensures invalid trips are excluded
Mixed completed and cancelled trips Verifies fractional rates
Outside date range Ensures filtering is correct
Multiple days Verifies independent daily grouping

Edge Cases

Trips Involving Banned Users

One of the most common mistakes is counting trips where either the client or driver is banned. The problem explicitly states that both users must be unbanned.

A buggy implementation might only check the client or only check the driver. The correct solution validates both independently before counting the trip.

Days With No Valid Trips

A day may contain trips, but all of them could involve banned users. In that case, the day should not appear in the output at all.

The implementation naturally handles this because a day is only added to the aggregation maps after a valid trip is processed.

All Trips Cancelled

If every valid trip on a day is canceled, the cancellation rate becomes 1.00.

Some implementations accidentally perform integer division and produce 1 or 0. The provided implementations explicitly use floating point division to avoid this issue.

No Cancelled Trips

If all valid trips are completed, the cancellation rate should be 0.00.

The conditional aggregation correctly produces zero canceled trips while still counting completed trips in the denominator.

Mixed Cancellation Types

Both:

  • cancelled_by_driver
  • cancelled_by_client

must count as cancellations.

The implementation handles this by treating every non-completed status as canceled, which automatically includes both cases.