LeetCode 1225 - Report Contiguous Dates

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem asks us to analyze a system that runs one task per day over a fixed period, from 2019-01-01 to 2019-12-31. Each day, the task either succeeds or fails. We are given two tables: Failed and Succeeded, each containing dates when the tasks failed or succeeded, respectively. The dates are unique within each table.

The goal is to generate a report of continuous intervals of consecutive days where the system state is the same (failed or succeeded). Each interval should be represented by its start_date, end_date, and the period_state. Intervals must be merged properly: if multiple consecutive days have the same state, they should appear as a single row.

Key points:

The range is strictly limited to 2019. Any date outside this range must be ignored.
Each day has exactly one outcome (either failed or succeeded).
The tables may include dates outside 2019, which must be filtered out.
We need to identify continuous sequences of the same outcome and report them as a single interval.

Important edge cases:

A day with no record in either table: according to the problem statement, every task is either in Failed or Succeeded tables, so there are no missing days.
Intervals that start or end at the boundaries 2019-01-01 or 2019-12-31.
Single-day intervals.

Approaches

Brute Force

A naive solution would generate every date from 2019-01-01 to 2019-12-31 and check each date against the Failed and Succeeded tables. For each date, we would assign a state and then iterate over the sequence to merge consecutive days with the same state into intervals.

While correct, this approach has two main issues. First, generating and checking each date individually can be inefficient if implemented naively in SQL with subqueries. Second, merging consecutive dates manually in SQL requires either a correlated subquery or procedural logic, which is cumbersome and slow.

Optimal Approach

The key insight is to treat consecutive dates as a grouped sequence where the difference between the date and a sequential row number is constant. This technique, often called the "gaps and islands" problem, allows us to identify continuous sequences efficiently.

Steps for the optimal approach:

Filter both Failed and Succeeded tables to only include dates in 2019.
Union the tables into a single table with columns date and period_state.
Assign a row number to each row ordered by date.
Compute a "group identifier" by subtracting the row number (or using date arithmetic) from the date. Consecutive dates of the same state will share the same identifier.
Group by the state and the group identifier to find the start and end date of each interval.
Order the result by start_date.

This approach avoids iterating over each date explicitly and uses SQL set operations efficiently.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(N)	O(N)	Generate all dates and merge sequences manually
Optimal	O(N)	O(N)	Use "gaps and islands" with row numbers to group consecutive dates

Algorithm Walkthrough

Filter 2019 Dates: Select only rows from Failed and Succeeded where the date is between '2019-01-01' and '2019-12-31'. This ensures irrelevant data from other years is ignored.
Union Tables: Combine Failed and Succeeded into a single table with two columns: date and period_state. This simplifies processing by having all dates in one place.
Assign Row Numbers: Use SQL's ROW_NUMBER() function over ORDER BY date for each period_state. This will allow us to calculate a "group key" to identify consecutive sequences.
Compute Group Key: Subtract the row number from the date (converted to an integer or ordinal format) to create a key. Consecutive dates of the same state will have the same key because both the date and row number increase by 1 each day.
Aggregate Intervals: Group by period_state and the computed key. For each group, compute MIN(date) as start_date and MAX(date) as end_date.
Order the Result: Sort by start_date to produce the final report.

Why it works: Subtracting the row number from the date produces the same difference for consecutive dates, forming a constant group identifier. This guarantees that only consecutive sequences of the same state are merged, which exactly models the required intervals.

Python Solution

import pandas as pd

def report_contiguous_dates(failed: pd.DataFrame, succeeded: pd.DataFrame) -> pd.DataFrame:
    # Filter 2019 dates
    failed_2019 = failed[(failed['fail_date'] >= '2019-01-01') & (failed['fail_date'] <= '2019-12-31')]
    succeeded_2019 = succeeded[(succeeded['success_date'] >= '2019-01-01') & (succeeded['success_date'] <= '2019-12-31')]

    # Union tables
    failed_2019 = failed_2019.rename(columns={'fail_date': 'date'})
    failed_2019['period_state'] = 'failed'
    succeeded_2019 = succeeded_2019.rename(columns={'success_date': 'date'})
    succeeded_2019['period_state'] = 'succeeded'

    all_dates = pd.concat([failed_2019, succeeded_2019])
    all_dates = all_dates.sort_values('date').reset_index(drop=True)

    # Compute group identifier using cumulative difference
    all_dates['date'] = pd.to_datetime(all_dates['date'])
    all_dates['rn'] = all_dates.groupby('period_state').cumcount()
    all_dates['group'] = all_dates['date'] - pd.to_timedelta(all_dates['rn'], unit='D')

    # Aggregate intervals
    result = all_dates.groupby(['period_state', 'group'], as_index=False).agg(
        start_date=('date', 'min'),
        end_date=('date', 'max')
    ).sort_values('start_date')

    result['start_date'] = result['start_date'].dt.date
    result['end_date'] = result['end_date'].dt.date

    return result[['period_state', 'start_date', 'end_date']]

Explanation: The code first filters dates to 2019, then combines both tables. Sorting ensures correct ordering. The cumulative count per period_state creates a running index. Subtracting this index from the date produces a consistent group key for consecutive sequences. Finally, grouping by this key aggregates the intervals.

Go Solution

package main

import (
    "database/sql"
    "fmt"
)

func reportContiguousDates(db *sql.DB) (*sql.Rows, error) {
    query := `
    WITH combined AS (
        SELECT fail_date AS date, 'failed' AS period_state
        FROM Failed
        WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
        UNION ALL
        SELECT success_date AS date, 'succeeded' AS period_state
        FROM Succeeded
        WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
    ),
    numbered AS (
        SELECT
            date,
            period_state,
            ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY date) AS rn
        FROM combined
    ),
    grouped AS (
        SELECT
            period_state,
            date - INTERVAL (rn) DAY AS grp,
            MIN(date) OVER (PARTITION BY period_state, date - INTERVAL (rn) DAY) AS start_date,
            MAX(date) OVER (PARTITION BY period_state, date - INTERVAL (rn) DAY) AS end_date
        FROM numbered
    )
    SELECT DISTINCT period_state, start_date, end_date
    FROM grouped
    ORDER BY start_date;
    `
    return db.Query(query)
}

Explanation: The Go version uses SQL with CTEs to handle the problem in the database, which is natural for LeetCode SQL problems. The logic mirrors the Python approach: combine tables, assign row numbers, compute a group key using date arithmetic, aggregate min and max, and order by start date.

Worked Examples

Example Input:

Failed: 2018-12-28, 2018-12-29, 2019-01-04, 2019-01-05
Succeeded: 2018-12-30, 2018-12-31, 2019-01-01, 2019-01-02, 2019-01-03, 2019-01-06

Filter 2019 dates:

Failed: 2019-01-04, 2019-01-05
Succeeded: 2019-01-01, 2019-01-02, 2019-01-03, 2019-01-06

Union and sort:

date	period_state
2019-01-01	succeeded
2019-01-02	succeeded
2019-01-03	succeeded