LeetCode 601 - Human Traffic of Stadium

The problem asks us to extract all consecutive records from a Stadium table where the people count is at least 100, and the consecutive streak has a length of three or more. Each row has a unique id and a corresponding visitdate.

LeetCode Problem 601

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem asks us to extract all consecutive records from a Stadium table where the people count is at least 100, and the consecutive streak has a length of three or more. Each row has a unique id and a corresponding visit_date. Consecutive here refers to consecutive id values, not consecutive dates. The output should be all records that belong to these streaks, ordered by visit_date.

The input is a SQL table with three columns: id (an incrementing integer), visit_date (a unique date), and people (an integer representing attendance). The expected output is a subset of this table where every row is part of a sequence of three or more consecutive rows (by id) with people >= 100.

Important constraints include that visit_date is unique and id is strictly increasing. Edge cases could involve sequences at the start or end of the table, sequences that are exactly three long, or sequences that are broken by a row with people < 100. Also, gaps in dates do not matter-consecutiveness is defined solely by id.

Approaches

Brute-Force Approach

A naive approach would scan the table and, for each row, check if the next two rows also have people >= 100. For longer sequences, this would require iterating through all possible subarrays of length three or more. This approach is correct but inefficient because it repeatedly checks overlapping sequences, leading to potentially O(n^2) comparisons for n rows.

Optimal Approach

The key insight is to use a windowing technique with a cumulative count. First, we mark rows with people >= 100. Then, by using SQL window functions like LEAD and LAG, we can determine if a row is part of a streak of three or more consecutive rows meeting the criteria. Another method is to assign a "group" identifier to each streak of consecutive qualifying rows and filter for groups of size at least three. This reduces repeated scanning and leverages SQL set-based operations for efficiency.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(1) Checks every possible consecutive triplet manually, redundant work
Optimal O(n) O(n) Uses window functions or grouping to identify qualifying streaks efficiently

Algorithm Walkthrough

  1. Filter rows by attendance: Mark all rows where people >= 100 to identify potential streaks.
  2. Identify consecutive groups: Compute the difference between id and the row number of the filtered set to assign each streak a unique group number. Consecutive rows will have the same difference value.
  3. Count streak size per group: Use a COUNT(*) OVER (PARTITION BY group_id) to find the size of each group.
  4. Filter for groups of size >= 3: Only include rows where the group size is three or more.
  5. Order by visit_date: Ensure the output matches the required order.

Why it works: The difference between id and row number creates a stable group identifier for consecutive qualifying rows because for a consecutive sequence, id - row_number is constant. Filtering by group size ensures we only return sequences of length three or more.

Python Solution

# SQL-based solution represented in Python-friendly syntax for LeetCode
SELECT id, visit_date, people
FROM (
    SELECT *,
           id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM Stadium
    WHERE people >= 100
) AS t
WHERE grp IN (
    SELECT grp
    FROM (
        SELECT id - ROW_NUMBER() OVER (ORDER BY id) AS grp, COUNT(*) AS cnt
        FROM Stadium
        WHERE people >= 100
        GROUP BY grp
    ) AS sub
    WHERE cnt >= 3
)
ORDER BY visit_date;

Explanation: We first filter rows with people >= 100. The ROW_NUMBER() assigns consecutive numbers based on id. Subtracting this from id gives a constant grp value for consecutive sequences. The inner query counts group sizes, and the outer query selects only groups of length three or more, ordering by visit_date.

Go Solution

// Go solution uses raw SQL execution, as the problem is SQL-focused
query := `
SELECT id, visit_date, people
FROM (
    SELECT *,
           id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM Stadium
    WHERE people >= 100
) AS t
WHERE grp IN (
    SELECT grp
    FROM (
        SELECT id - ROW_NUMBER() OVER (ORDER BY id) AS grp, COUNT(*) AS cnt
        FROM Stadium
        WHERE people >= 100
        GROUP BY grp
    ) AS sub
    WHERE cnt >= 3
)
ORDER BY visit_date;
`

Go-specific note: The Go solution directly embeds SQL in a string to execute with database/sql or similar package. Logic mirrors the Python/SQL approach, and there is no need to manage nil or slice-specific details since the database handles filtering and ordering.

Worked Examples

Example 1:

id | visit_date | people
1  | 2017-01-01 | 10
2  | 2017-01-02 | 109
3  | 2017-01-03 | 150
4  | 2017-01-04 | 99
5  | 2017-01-05 | 145
6  | 2017-01-06 | 1455
7  | 2017-01-07 | 199
8  | 2017-01-09 | 188

Step-by-step:

  1. Filter rows with people >= 100: rows 2, 3, 5, 6, 7, 8.
  2. Compute grp = id - row_number():
  • Row 2: 2 - 1 = 1
  • Row 3: 3 - 2 = 1
  • Row 5: 5 - 3 = 2
  • Row 6: 6 - 4 = 2
  • Row 7: 7 - 5 = 2
  • Row 8: 8 - 6 = 2
  1. Count group sizes:
  • Group 1: size 2 (rows 2, 3) → too small, discarded
  • Group 2: size 4 (rows 5, 6, 7, 8) → included
  1. Output rows ordered by visit_date: rows 5, 6, 7, 8.

Complexity Analysis

Measure Complexity Explanation
Time O(n) We scan the table once to compute row numbers and group identifiers. SQL engine handles filtering and counting efficiently.
Space O(n) Space for window function computation and temporary group identifiers.

The optimal algorithm avoids repeated scanning by leveraging SQL window functions and grouping, ensuring linear complexity relative to the number of rows.

Test Cases

# test cases represented as SQL queries/asserts
# Test case 1: example provided
# Expected rows: ids 5,6,7,8
# Test case 2: no qualifying streak
# Test case 3: exactly three in a streak
# Test case 4: all rows qualify
# Test case 5: streak at the start
Test Why
Rows 5-8 in example Tests normal consecutive sequence handling
No rows >= 100 Ensures function handles empty result
Exactly three consecutive >=100 Validates boundary streak length
All rows >=100 Checks handling of full table streak
Streak at start Validates start-of-table edge case

Edge Cases

One important edge case is when streaks are exactly three rows long. This is the minimum valid streak and must be included. If miscalculated, the implementation might omit them. Another edge case is streaks at the very beginning or end of the table; SQL window functions handle these correctly but a naive loop could miss them. Finally, rows with people < 100 breaking otherwise long sequences must correctly terminate a group; our group ID approach naturally separates these sequences without additional logic.