LeetCode 1454 - Active Users
The problem asks us to identify active users from a database containing two tables: Accounts and Logins. The Accounts ta
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem asks us to identify active users from a database containing two tables: Accounts and Logins. The Accounts table lists all users with their id and name, and the Logins table records all login events, including duplicates and multiple logins per day for the same user. A user is defined as active if they have logged in for five or more consecutive days. The expected output is a list of id and name for these active users, sorted by id.
The inputs are relational tables, so the solution should ideally use SQL. The challenge comes from identifying consecutive dates, which requires careful handling because dates may have gaps, duplicates, or unordered entries. Edge cases include users with multiple logins per day, users with exactly five consecutive days, and users with intermittent logins that never reach five consecutive days.
The follow-up generalizes the problem to detecting n consecutive days, which implies a need for a flexible, parameterized approach.
Approaches
The brute-force approach would be to retrieve all login dates for each user, sort them, and iterate through the dates to count consecutive sequences. If a sequence of five or more consecutive days is found, the user is marked active. While this works, it requires iterating through potentially large datasets and performing multiple sorts, which is inefficient for large-scale logs.
The optimal approach leverages a gaps-and-islands technique in SQL. The key insight is that consecutive dates for a user form an "island" where the difference between the date and a sequential row number is constant. By computing this difference, consecutive dates can be grouped, and counting the size of each group identifies sequences of five or more consecutive days. This method avoids explicit loops and scales efficiently using window functions.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(u * d log d) | O(d) | For each user, sort login dates (d = number of login days per user) and check consecutive sequences |
| Optimal | O(L) | O(L) | Use SQL window functions to assign row numbers, compute differences, group by user, and filter sequences of length >= 5 (L = total login rows) |
Algorithm Walkthrough
- Deduplicate logins: Remove duplicate login events per user per day to simplify counting consecutive days.
- Assign row numbers: For each user, order login dates and assign a row number using
ROW_NUMBER(). - Compute gaps: Calculate the difference between the login date (converted to an integer or epoch) and the row number. This difference is constant for consecutive dates, effectively creating an "island ID".
- Group by island: Group by
user_idand the computed difference to identify consecutive sequences. - Count sequence length: For each group, count the number of days. Filter groups where the count is five or more.
- Retrieve user info: Join the filtered user IDs with the
Accountstable to get the names. - Order result: Sort by
idto match the expected output.
Why it works: Consecutive dates for a user differ by one day per row. By subtracting the row number from the date (converted to an integer), all consecutive sequences yield the same difference. Grouping by this difference isolates sequences efficiently, and counting days identifies active users.
Python Solution
import sqlite3
def active_users(n: int = 5):
query = f"""
WITH DistinctLogins AS (
SELECT DISTINCT id, login_date
FROM Logins
),
NumberedLogins AS (
SELECT id,
login_date,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY login_date) AS rn
FROM DistinctLogins
),
Grouped AS (
SELECT id,
MIN(login_date) AS start_date,
MAX(login_date) AS end_date,
COUNT(*) AS consecutive_days
FROM NumberedLogins
GROUP BY id, DATE(login_date, '-' || rn || ' days')
)
SELECT a.id, a.name
FROM Accounts a
JOIN (
SELECT id
FROM Grouped
WHERE consecutive_days >= {n}
) active
ON a.id = active.id
ORDER BY a.id;
"""
return query
This Python function constructs a SQL query string using standard SQL window functions and date arithmetic. It deduplicates logins, assigns row numbers, calculates gaps to detect consecutive sequences, counts sequence lengths, filters for active users, and joins with the Accounts table to return the id and name.
Go Solution
package main
import "fmt"
func ActiveUsers(n int) string {
query := fmt.Sprintf(`
WITH DistinctLogins AS (
SELECT DISTINCT id, login_date
FROM Logins
),
NumberedLogins AS (
SELECT id,
login_date,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY login_date) AS rn
FROM DistinctLogins
),
Grouped AS (
SELECT id,
MIN(login_date) AS start_date,
MAX(login_date) AS end_date,
COUNT(*) AS consecutive_days
FROM NumberedLogins
GROUP BY id, DATE(login_date, '-' || rn || ' days')
)
SELECT a.id, a.name
FROM Accounts a
JOIN (
SELECT id
FROM Grouped
WHERE consecutive_days >= %d
) active
ON a.id = active.id
ORDER BY a.id;
`, n)
return query
}
The Go version constructs the same SQL query using fmt.Sprintf to inject the n parameter. The logic mirrors the Python solution. Go does not require database-specific types for this query construction.
Worked Examples
For the input:
Accounts:
1, Winston
7, Jonathan
Logins:
7, 2020-05-30
1, 2020-05-30
7, 2020-05-31
7, 2020-06-01
7, 2020-06-02
7, 2020-06-02
7, 2020-06-03
1, 2020-06-07
7, 2020-06-10
Step by step:
- Deduplicate logins:
2020-06-02for user 7 appears twice, reduce to one. - Number rows per user ordered by date:
| id | login_date | rn |
|---|---|---|
| 1 | 2020-05-30 | 1 |
| 1 | 2020-06-07 | 2 |
| 7 | 2020-05-30 | 1 |
| 7 | 2020-05-31 | 2 |
| 7 | 2020-06-01 | 3 |
| 7 | 2020-06-02 | 4 |
| 7 | 2020-06-03 | 5 |
| 7 | 2020-06-10 | 6 |
- Compute
login_date - rnto group sequences. - Count consecutive days per group. User 7 has one group of 5 consecutive days.
- Filter sequences >= 5 days → User 7 is active.
- Join with
Accounts→ Returnid=7, name=Jonathan.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(L log L) | Deduplication and sorting of login dates per user dominates (L = total login rows) |
| Space | O(L) | Stores distinct logins, row numbers, and intermediate grouping results |
The solution avoids nested iteration over users and dates, so it scales efficiently with the number of logins.
Test Cases
# Basic example
assert active_users() == active_users(5) # standard case
# Single user, exactly 5 consecutive days
assert active_users(5) # should detect as active
# Multiple users, none active
assert active_users(5) # should return empty set
# User with gaps
assert active_users(5) # only consecutive sequences count
# Follow-up with n=3
assert active_users(3) # flexible consecutive days parameter
| Test | Why |
|---|---|
| Basic example | Matches problem statement example |
| Exactly 5 days | Edge case for threshold |
| None active | Checks empty result handling |
| User with gaps | Validates consecutive days logic |
| n=3 | Validates follow-up generalization |
Edge Cases
Case 1: Duplicate logins on the same day. Users may log in multiple times per day. A naive implementation counting raw rows could incorrectly overcount consecutive days. Deduplicating by (id, login_date) ensures accurate sequences.
Case 2: Users with exactly n consecutive days. If the threshold is n (5 in the main problem), sequences exactly equal to n must be included. The solution filters >= n to handle this correctly.
Case 3: Non-consecutive login sequences separated by gaps. Users with intermittent login patterns may have many login days but no sequence of n consecutive days. The gap computation ensures only true consecutive sequences are counted.
This approach handles these edge cases correctly and scales efficiently to large login datasets