LeetCode 601 - Human Traffic of Stadium
The problem asks us to extract all consecutive records from a Stadium table where the people count is at least 100, and the consecutive streak has a length of three or more. Each row has a unique id and a corresponding visitdate.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
The problem asks us to extract all consecutive records from a Stadium table where the people count is at least 100, and the consecutive streak has a length of three or more. Each row has a unique id and a corresponding visit_date. Consecutive here refers to consecutive id values, not consecutive dates. The output should be all records that belong to these streaks, ordered by visit_date.
The input is a SQL table with three columns: id (an incrementing integer), visit_date (a unique date), and people (an integer representing attendance). The expected output is a subset of this table where every row is part of a sequence of three or more consecutive rows (by id) with people >= 100.
Important constraints include that visit_date is unique and id is strictly increasing. Edge cases could involve sequences at the start or end of the table, sequences that are exactly three long, or sequences that are broken by a row with people < 100. Also, gaps in dates do not matter-consecutiveness is defined solely by id.
Approaches
Brute-Force Approach
A naive approach would scan the table and, for each row, check if the next two rows also have people >= 100. For longer sequences, this would require iterating through all possible subarrays of length three or more. This approach is correct but inefficient because it repeatedly checks overlapping sequences, leading to potentially O(n^2) comparisons for n rows.
Optimal Approach
The key insight is to use a windowing technique with a cumulative count. First, we mark rows with people >= 100. Then, by using SQL window functions like LEAD and LAG, we can determine if a row is part of a streak of three or more consecutive rows meeting the criteria. Another method is to assign a "group" identifier to each streak of consecutive qualifying rows and filter for groups of size at least three. This reduces repeated scanning and leverages SQL set-based operations for efficiency.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2) | O(1) | Checks every possible consecutive triplet manually, redundant work |
| Optimal | O(n) | O(n) | Uses window functions or grouping to identify qualifying streaks efficiently |
Algorithm Walkthrough
- Filter rows by attendance: Mark all rows where
people >= 100to identify potential streaks. - Identify consecutive groups: Compute the difference between
idand the row number of the filtered set to assign each streak a unique group number. Consecutive rows will have the same difference value. - Count streak size per group: Use a
COUNT(*) OVER (PARTITION BY group_id)to find the size of each group. - Filter for groups of size >= 3: Only include rows where the group size is three or more.
- Order by
visit_date: Ensure the output matches the required order.
Why it works: The difference between id and row number creates a stable group identifier for consecutive qualifying rows because for a consecutive sequence, id - row_number is constant. Filtering by group size ensures we only return sequences of length three or more.
Python Solution
# SQL-based solution represented in Python-friendly syntax for LeetCode
SELECT id, visit_date, people
FROM (
SELECT *,
id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM Stadium
WHERE people >= 100
) AS t
WHERE grp IN (
SELECT grp
FROM (
SELECT id - ROW_NUMBER() OVER (ORDER BY id) AS grp, COUNT(*) AS cnt
FROM Stadium
WHERE people >= 100
GROUP BY grp
) AS sub
WHERE cnt >= 3
)
ORDER BY visit_date;
Explanation: We first filter rows with people >= 100. The ROW_NUMBER() assigns consecutive numbers based on id. Subtracting this from id gives a constant grp value for consecutive sequences. The inner query counts group sizes, and the outer query selects only groups of length three or more, ordering by visit_date.
Go Solution
// Go solution uses raw SQL execution, as the problem is SQL-focused
query := `
SELECT id, visit_date, people
FROM (
SELECT *,
id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM Stadium
WHERE people >= 100
) AS t
WHERE grp IN (
SELECT grp
FROM (
SELECT id - ROW_NUMBER() OVER (ORDER BY id) AS grp, COUNT(*) AS cnt
FROM Stadium
WHERE people >= 100
GROUP BY grp
) AS sub
WHERE cnt >= 3
)
ORDER BY visit_date;
`
Go-specific note: The Go solution directly embeds SQL in a string to execute with database/sql or similar package. Logic mirrors the Python/SQL approach, and there is no need to manage nil or slice-specific details since the database handles filtering and ordering.
Worked Examples
Example 1:
id | visit_date | people
1 | 2017-01-01 | 10
2 | 2017-01-02 | 109
3 | 2017-01-03 | 150
4 | 2017-01-04 | 99
5 | 2017-01-05 | 145
6 | 2017-01-06 | 1455
7 | 2017-01-07 | 199
8 | 2017-01-09 | 188
Step-by-step:
- Filter rows with
people >= 100: rows 2, 3, 5, 6, 7, 8. - Compute
grp = id - row_number():
- Row 2: 2 - 1 = 1
- Row 3: 3 - 2 = 1
- Row 5: 5 - 3 = 2
- Row 6: 6 - 4 = 2
- Row 7: 7 - 5 = 2
- Row 8: 8 - 6 = 2
- Count group sizes:
- Group 1: size 2 (rows 2, 3) → too small, discarded
- Group 2: size 4 (rows 5, 6, 7, 8) → included
- Output rows ordered by
visit_date: rows 5, 6, 7, 8.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | We scan the table once to compute row numbers and group identifiers. SQL engine handles filtering and counting efficiently. |
| Space | O(n) | Space for window function computation and temporary group identifiers. |
The optimal algorithm avoids repeated scanning by leveraging SQL window functions and grouping, ensuring linear complexity relative to the number of rows.
Test Cases
# test cases represented as SQL queries/asserts
# Test case 1: example provided
# Expected rows: ids 5,6,7,8
# Test case 2: no qualifying streak
# Test case 3: exactly three in a streak
# Test case 4: all rows qualify
# Test case 5: streak at the start
| Test | Why |
|---|---|
| Rows 5-8 in example | Tests normal consecutive sequence handling |
| No rows >= 100 | Ensures function handles empty result |
| Exactly three consecutive >=100 | Validates boundary streak length |
| All rows >=100 | Checks handling of full table streak |
| Streak at start | Validates start-of-table edge case |
Edge Cases
One important edge case is when streaks are exactly three rows long. This is the minimum valid streak and must be included. If miscalculated, the implementation might omit them. Another edge case is streaks at the very beginning or end of the table; SQL window functions handle these correctly but a naive loop could miss them. Finally, rows with people < 100 breaking otherwise long sequences must correctly terminate a group; our group ID approach naturally separates these sequences without additional logic.