LeetCode 601: Human Traffic of Stadium

A SQL guide for finding stadium records that belong to runs of at least three consecutive ids where each row has at least 100 people.

Problem Restatement

We are given a table called Stadium.

Each row records one stadium visit day. The table has three columns:

Column Type Meaning
id int Visit id
visit_date date Date of the visit
people int Number of people who visited

We need to return all rows that are part of a group of at least three consecutive id values where every row has people >= 100.

The result should be ordered by visit_date.

The official statement asks for records with three or more consecutive ids, and each record in that consecutive group must have people >= 100.

Input and Output

Input table:

Stadium

Output columns:

id, visit_date, people

The output should include only rows that belong to a valid consecutive group.

Example

Input:

id visit_date people
1 2017-01-01 10
2 2017-01-02 109
3 2017-01-03 150
4 2017-01-04 99
5 2017-01-05 145
6 2017-01-06 1455
7 2017-01-07 199
8 2017-01-08 188

Rows with people >= 100 are:

id visit_date people
2 2017-01-02 109
3 2017-01-03 150
5 2017-01-05 145
6 2017-01-06 1455
7 2017-01-07 199
8 2017-01-08 188

The consecutive high-traffic groups are:

Group ids Length
First 2, 3 2
Second 5, 6, 7, 8 4

Only the second group has length at least 3.

So the output is:

id visit_date people
5 2017-01-05 145
6 2017-01-06 1455
7 2017-01-07 199
8 2017-01-08 188

First Thought: Self Join

One direct way is to check whether each row is part of a valid window of three rows.

A row can be valid in three ways:

Case Meaning
It is the first row of a valid triple id, id + 1, id + 2
It is the middle row of a valid triple id - 1, id, id + 1
It is the last row of a valid triple id - 2, id - 1, id

This can be solved with several self joins, but the query becomes verbose. It also repeats the same condition many times.

A cleaner solution is to first keep only high-traffic rows, then group consecutive ids.

Key Insight

After filtering to rows where:

people >= 100

we need to find runs of consecutive id values.

For a consecutive sequence, this expression stays constant:

id - ROW_NUMBER() OVER (ORDER BY id)

Example:

id row_number id - row_number
5 1 4
6 2 4
7 3 4
8 4 4

All four rows have the same value, so they belong to the same consecutive group.

If there is a gap, the value changes.

Example:

id row_number id - row_number
2 1 1
3 2 1
5 3 2
6 4 2

The gap between 3 and 5 creates a new group.

Algorithm

First, filter the table to rows where people >= 100.

Then assign each remaining row a group key:

id - ROW_NUMBER() OVER (ORDER BY id)

Rows with the same group key form one consecutive run.

Next, count how many rows are in each group.

Finally, return only rows whose group has at least three rows.

SQL Solution

WITH high_traffic AS (
    SELECT
        id,
        visit_date,
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
    FROM Stadium
    WHERE people >= 100
),
valid_groups AS (
    SELECT group_id
    FROM high_traffic
    GROUP BY group_id
    HAVING COUNT(*) >= 3
)
SELECT
    id,
    visit_date,
    people
FROM high_traffic
WHERE group_id IN (
    SELECT group_id
    FROM valid_groups
)
ORDER BY visit_date;

Code Explanation

The first CTE keeps only rows that can possibly be part of the answer:

WITH high_traffic AS (
    SELECT
        id,
        visit_date,
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
    FROM Stadium
    WHERE people >= 100
)

Rows with fewer than 100 people cannot be in any valid group, so we remove them before grouping.

The expression:

id - ROW_NUMBER() OVER (ORDER BY id)

creates the same group_id for rows whose ids are consecutive.

The second CTE keeps only groups with at least three rows:

valid_groups AS (
    SELECT group_id
    FROM high_traffic
    GROUP BY group_id
    HAVING COUNT(*) >= 3
)

The final query returns all rows from those valid groups:

SELECT
    id,
    visit_date,
    people
FROM high_traffic
WHERE group_id IN (
    SELECT group_id
    FROM valid_groups
)
ORDER BY visit_date;

The result is ordered by visit_date, as required.

Correctness

After filtering, every row in high_traffic satisfies people >= 100.

For any consecutive run of ids, the value id - ROW_NUMBER() is constant because both id and ROW_NUMBER() increase by 1 at each step.

When a gap appears in the ids, id increases by more than 1, but ROW_NUMBER() still increases by exactly 1. Therefore, id - ROW_NUMBER() changes, creating a new group.

So each group_id represents exactly one consecutive run among rows where people >= 100.

The valid_groups CTE keeps only groups whose size is at least 3.

Therefore, the final query returns exactly the rows that belong to a consecutive high-traffic group of length at least 3.

Complexity

Let n be the number of rows in Stadium.

Metric Value Why
Time O(n log n) The window function orders rows by id
Space O(n) The CTE may store filtered rows and group ids

In practice, if id is indexed, the database can often process the ordering more efficiently.