LeetCode 1225 - Report Contiguous Dates
The problem asks us to analyze a system that runs one task per day over a fixed period, from 2019-01-01 to 2019-12-31. Each day, the task either succeeds or fails.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
The problem asks us to analyze a system that runs one task per day over a fixed period, from 2019-01-01 to 2019-12-31. Each day, the task either succeeds or fails. We are given two tables: Failed and Succeeded, each containing dates when the tasks failed or succeeded, respectively. The dates are unique within each table.
The goal is to generate a report of continuous intervals of consecutive days where the system state is the same (failed or succeeded). Each interval should be represented by its start_date, end_date, and the period_state. Intervals must be merged properly: if multiple consecutive days have the same state, they should appear as a single row.
Key points:
- The range is strictly limited to 2019. Any date outside this range must be ignored.
- Each day has exactly one outcome (either failed or succeeded).
- The tables may include dates outside 2019, which must be filtered out.
- We need to identify continuous sequences of the same outcome and report them as a single interval.
Important edge cases:
- A day with no record in either table: according to the problem statement, every task is either in
FailedorSucceededtables, so there are no missing days. - Intervals that start or end at the boundaries
2019-01-01or2019-12-31. - Single-day intervals.
Approaches
Brute Force
A naive solution would generate every date from 2019-01-01 to 2019-12-31 and check each date against the Failed and Succeeded tables. For each date, we would assign a state and then iterate over the sequence to merge consecutive days with the same state into intervals.
While correct, this approach has two main issues. First, generating and checking each date individually can be inefficient if implemented naively in SQL with subqueries. Second, merging consecutive dates manually in SQL requires either a correlated subquery or procedural logic, which is cumbersome and slow.
Optimal Approach
The key insight is to treat consecutive dates as a grouped sequence where the difference between the date and a sequential row number is constant. This technique, often called the "gaps and islands" problem, allows us to identify continuous sequences efficiently.
Steps for the optimal approach:
- Filter both
FailedandSucceededtables to only include dates in 2019. - Union the tables into a single table with columns
dateandperiod_state. - Assign a row number to each row ordered by date.
- Compute a "group identifier" by subtracting the row number (or using date arithmetic) from the date. Consecutive dates of the same state will share the same identifier.
- Group by the state and the group identifier to find the start and end date of each interval.
- Order the result by
start_date.
This approach avoids iterating over each date explicitly and uses SQL set operations efficiently.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N) | O(N) | Generate all dates and merge sequences manually |
| Optimal | O(N) | O(N) | Use "gaps and islands" with row numbers to group consecutive dates |
Algorithm Walkthrough
- Filter 2019 Dates: Select only rows from
FailedandSucceededwhere the date is between'2019-01-01'and'2019-12-31'. This ensures irrelevant data from other years is ignored. - Union Tables: Combine
FailedandSucceededinto a single table with two columns:dateandperiod_state. This simplifies processing by having all dates in one place. - Assign Row Numbers: Use SQL's
ROW_NUMBER()function overORDER BY datefor eachperiod_state. This will allow us to calculate a "group key" to identify consecutive sequences. - Compute Group Key: Subtract the row number from the date (converted to an integer or ordinal format) to create a key. Consecutive dates of the same state will have the same key because both the date and row number increase by 1 each day.
- Aggregate Intervals: Group by
period_stateand the computed key. For each group, computeMIN(date)asstart_dateandMAX(date)asend_date. - Order the Result: Sort by
start_dateto produce the final report.
Why it works: Subtracting the row number from the date produces the same difference for consecutive dates, forming a constant group identifier. This guarantees that only consecutive sequences of the same state are merged, which exactly models the required intervals.
Python Solution
import pandas as pd
def report_contiguous_dates(failed: pd.DataFrame, succeeded: pd.DataFrame) -> pd.DataFrame:
# Filter 2019 dates
failed_2019 = failed[(failed['fail_date'] >= '2019-01-01') & (failed['fail_date'] <= '2019-12-31')]
succeeded_2019 = succeeded[(succeeded['success_date'] >= '2019-01-01') & (succeeded['success_date'] <= '2019-12-31')]
# Union tables
failed_2019 = failed_2019.rename(columns={'fail_date': 'date'})
failed_2019['period_state'] = 'failed'
succeeded_2019 = succeeded_2019.rename(columns={'success_date': 'date'})
succeeded_2019['period_state'] = 'succeeded'
all_dates = pd.concat([failed_2019, succeeded_2019])
all_dates = all_dates.sort_values('date').reset_index(drop=True)
# Compute group identifier using cumulative difference
all_dates['date'] = pd.to_datetime(all_dates['date'])
all_dates['rn'] = all_dates.groupby('period_state').cumcount()
all_dates['group'] = all_dates['date'] - pd.to_timedelta(all_dates['rn'], unit='D')
# Aggregate intervals
result = all_dates.groupby(['period_state', 'group'], as_index=False).agg(
start_date=('date', 'min'),
end_date=('date', 'max')
).sort_values('start_date')
result['start_date'] = result['start_date'].dt.date
result['end_date'] = result['end_date'].dt.date
return result[['period_state', 'start_date', 'end_date']]
Explanation: The code first filters dates to 2019, then combines both tables. Sorting ensures correct ordering. The cumulative count per period_state creates a running index. Subtracting this index from the date produces a consistent group key for consecutive sequences. Finally, grouping by this key aggregates the intervals.
Go Solution
package main
import (
"database/sql"
"fmt"
)
func reportContiguousDates(db *sql.DB) (*sql.Rows, error) {
query := `
WITH combined AS (
SELECT fail_date AS date, 'failed' AS period_state
FROM Failed
WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
UNION ALL
SELECT success_date AS date, 'succeeded' AS period_state
FROM Succeeded
WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
),
numbered AS (
SELECT
date,
period_state,
ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY date) AS rn
FROM combined
),
grouped AS (
SELECT
period_state,
date - INTERVAL (rn) DAY AS grp,
MIN(date) OVER (PARTITION BY period_state, date - INTERVAL (rn) DAY) AS start_date,
MAX(date) OVER (PARTITION BY period_state, date - INTERVAL (rn) DAY) AS end_date
FROM numbered
)
SELECT DISTINCT period_state, start_date, end_date
FROM grouped
ORDER BY start_date;
`
return db.Query(query)
}
Explanation: The Go version uses SQL with CTEs to handle the problem in the database, which is natural for LeetCode SQL problems. The logic mirrors the Python approach: combine tables, assign row numbers, compute a group key using date arithmetic, aggregate min and max, and order by start date.
Worked Examples
Example Input:
Failed: 2018-12-28, 2018-12-29, 2019-01-04, 2019-01-05
Succeeded: 2018-12-30, 2018-12-31, 2019-01-01, 2019-01-02, 2019-01-03, 2019-01-06
- Filter 2019 dates:
- Failed: 2019-01-04, 2019-01-05
- Succeeded: 2019-01-01, 2019-01-02, 2019-01-03, 2019-01-06
- Union and sort:
| date | period_state |
|---|---|
| 2019-01-01 | succeeded |
| 2019-01-02 | succeeded |
| 2019-01-03 | succeeded |