LeetCode 1107 - New Users Daily Count
We are given a database table called Traffic that records different user activities on different dates. Each row contains a user_id, an activity type, and an activity_date.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
We are given a database table called Traffic that records different user activities on different dates. Each row contains a user_id, an activity type, and an activity_date. A user may appear multiple times on the same day because they can perform different activities such as login, logout, homepage, jobs, or groups. The table may also contain duplicate rows.
The task is to determine, for every date within the last 90 days from 2019-06-30, how many users logged in for the first time ever on that date.
The key detail is that we are not counting all login events. We only care about a user's earliest login date. After finding each user's first login date, we only include those dates that fall within the allowed 90-day window ending on 2019-06-30. Finally, for each valid date, we count how many users had their first login on that day.
To better understand the requirement, consider user 5 from the example:
- User 5 logged in on 2019-03-01
- User 5 logged in again on 2019-06-21
Even though 2019-06-21 is inside the last 90 days, we do not count this user on that day because their first login already happened on 2019-03-01.
The input represents an activity log where:
- user_id identifies the user.
- activity tells us what action happened.
- activity_date tells us when it happened.
The expected output contains:
| Column | Meaning |
|---|---|
| login_date | A date where at least one user logged in for the first time |
| user_count | Number of users whose first login happened on that date |
Only dates with a non-zero count should appear.
The problem statement guarantees that activity is one of a fixed set of enum values, and duplicate rows may exist. Since this is a database problem, efficiency matters because the table could be large. We want to avoid repeatedly scanning the table for each user or date.
Several edge cases are important to recognize upfront. A user may have multiple login records on the same day, duplicate rows may exist, or a user may log in many times across different dates. We must always use the minimum login date for each user. Another subtle case is a user whose first login occurred outside the 90-day window: even if later logins happen inside the window, that user must not be counted.
Approaches
Brute Force Approach
A straightforward approach would be to first identify every distinct user, then for each user scan all rows to find their earliest login date. After that, we would check whether the earliest login falls within the last 90 days and increment a counter for that date.
This works because for every user we explicitly compute their first login date, which guarantees correctness. However, it is inefficient because for each user we repeatedly scan the entire table. If there are n rows and u users, this could require O(u × n) work.
In database systems, repeated full table scans are expensive and unnecessary.
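To make the cost of the brute-force idea concrete, here is a minimal Python sketch of it. It is only an illustration of the nested scanning described above, not a solution LeetCode would accept for this SQL problem. The function name `brute_force_daily_count` and the representation of rows as `[user_id, activity, activity_date]` lists are assumptions made for this example.

```python
from collections import Counter
from datetime import date, timedelta

TODAY = date(2019, 6, 30)            # "today" as fixed by the problem
CUTOFF = TODAY - timedelta(days=90)  # earliest first-login date we keep


def brute_force_daily_count(traffic):
    """Count first logins per day by rescanning every row for each user."""
    users = {user_id for user_id, _, _ in traffic}
    counts = Counter()
    for user in users:
        first_login = None
        # Full scan of the table for this one user: O(n) work per user,
        # so O(u * n) overall.
        for user_id, activity, activity_date in traffic:
            if user_id != user or activity != "login":
                continue
            day = date.fromisoformat(activity_date)
            if first_login is None or day < first_login:
                first_login = day
        # Only users whose very first login is inside the window count.
        if first_login is not None and first_login >= CUTOFF:
            counts[first_login.isoformat()] += 1
    return dict(counts)
```

The inner loop is what makes this approach expensive: it touches every row once per distinct user.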
Key Insight
The important observation is that we only care about the earliest login date per user. SQL already provides aggregation functions that solve this efficiently.
We can:
- Filter rows where activity = 'login'.
- Group by user_id.
- Compute MIN(activity_date) for each user to get the first login date.
- Filter first login dates to only those within 90 days of 2019-06-30.
- Group again by login date and count users.
This avoids repeated scanning and lets the database engine perform efficient grouping operations.
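The steps above can be sketched in Python as a single pass over the rows followed by a count. This is a rough analogue of what the grouping does, under the assumption that rows are `[user_id, activity, activity_date]` lists with ISO-formatted date strings; the function name `first_login_counts` is hypothetical, and the default cutoff of 2019-04-01 is the date 90 days before 2019-06-30.

```python
from collections import Counter


def first_login_counts(traffic, cutoff="2019-04-01"):
    """Single-pass analogue of GROUP BY user_id with MIN(activity_date)."""
    first_login = {}
    for user_id, activity, activity_date in traffic:
        if activity != "login":
            continue  # step 1: keep only login rows
        # Steps 2-3: track the minimum date per user.
        # Zero-padded YYYY-MM-DD strings sort in chronological order.
        if user_id not in first_login or activity_date < first_login[user_id]:
            first_login[user_id] = activity_date
    # Steps 4-5: drop first logins before the cutoff, then count per date.
    counts = Counter(day for day in first_login.values() if day >= cutoff)
    return dict(counts)


example = [
    [1, "login", "2019-05-01"],
    [2, "login", "2019-06-21"],
    [3, "login", "2019-01-01"],
    [4, "login", "2019-06-21"],
    [5, "login", "2019-03-01"],
    [5, "login", "2019-06-21"],
]
print(first_login_counts(example))  # {'2019-05-01': 1, '2019-06-21': 2}
```

Each row is visited once, and the dictionary holds one entry per user, which mirrors the O(u) space noted for the optimal approach.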
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(u × n) | O(u) | Repeatedly scans the table for each user |
| Optimal | O(n log n) | O(u) | Uses grouping and aggregation to compute first login dates efficiently |
Algorithm Walkthrough
- First, filter the Traffic table to only include rows where activity = 'login'. This is necessary because only login events matter when determining a user's first login date.
- Group the filtered rows by user_id. For each user, compute the minimum activity_date. This gives us the earliest date on which each user logged in.
- From these first login dates, keep only those that fall within the allowed 90-day range from 2019-06-30. Since today is fixed, the cutoff date becomes 2019-04-01.
- Group the remaining results by the first login date.
- Count how many users belong to each date.
- Return the resulting table with columns:
  - login_date
  - user_count
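The 2019-04-01 cutoff quoted in the walkthrough can be verified with a short date calculation. The sketch below uses Python's standard datetime module; the helper name `is_in_window` is an assumption for illustration and applies only the lower bound, matching the comparison the query uses.

```python
from datetime import date, timedelta

today = date(2019, 6, 30)
cutoff = today - timedelta(days=90)
print(cutoff)  # 2019-04-01


def is_in_window(day):
    """True when a first-login date is on or after the cutoff."""
    return day >= cutoff


print(is_in_window(date(2019, 4, 1)))   # True: the cutoff itself is included
print(is_in_window(date(2019, 3, 31)))  # False: one day too early
```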
Why it works
The correctness comes from the fact that for every user we compute exactly one value, their earliest login date, using MIN(activity_date). Since every user contributes only once, grouping by this first login date guarantees accurate counts. Filtering after computing the minimum ensures we correctly exclude users whose first login happened before the 90-day window, even if they logged in again later.
Python Solution
```sql
# Write your MySQL query statement below
SELECT
    first_login AS login_date,
    COUNT(*) AS user_count
FROM (
    SELECT
        user_id,
        MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS first_logins
WHERE first_login >= DATE_SUB('2019-06-30', INTERVAL 90 DAY)
GROUP BY first_login;
```
The query starts by filtering the table to only login activities. We then group by user_id and compute MIN(activity_date) to determine each user's first login date.
This intermediate result is stored in a derived table called first_logins. Using a subquery makes the logic easier to understand because we separate the task of finding first login dates from counting them.
Next, we apply the 90-day filter. Since today is fixed as 2019-06-30, we use DATE_SUB('2019-06-30', INTERVAL 90 DAY) to compute the cutoff date (2019-04-01) instead of hard-coding it.
Finally, we group by first_login and count users for each date.
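One way to try the query locally without a MySQL server is to run an adapted version against an in-memory SQLite database from Python. This is a sketch under stated assumptions, not the submitted solution: MySQL's DATE_SUB is replaced by SQLite's `date('2019-06-30', '-90 days')`, dates are stored as ISO text, an ORDER BY is added only to make the output order deterministic, and the helper name `run_query` is hypothetical.

```python
import sqlite3

# SQLite adaptation of the MySQL query (assumption: dates stored as ISO text).
QUERY = """
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
    SELECT user_id, MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS first_logins
WHERE first_login >= date('2019-06-30', '-90 days')
GROUP BY first_login
ORDER BY first_login
"""


def run_query(rows):
    """Load rows into an in-memory Traffic table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Traffic (user_id INTEGER, activity TEXT, activity_date TEXT)"
    )
    conn.executemany("INSERT INTO Traffic VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


example_rows = [
    (1, "login", "2019-05-01"),
    (1, "homepage", "2019-05-01"),
    (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"),
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"),
    (5, "login", "2019-06-21"),
]
print(run_query(example_rows))  # [('2019-05-01', 1), ('2019-06-21', 2)]
```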
Go Solution
```sql
# Write your MySQL query statement below
SELECT
    first_login AS login_date,
    COUNT(*) AS user_count
FROM (
    SELECT
        user_id,
        MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS first_logins
WHERE first_login >= DATE_SUB('2019-06-30', INTERVAL 90 DAY)
GROUP BY first_login;
```
Since this is a database problem, LeetCode expects a SQL query rather than an implementation in a programming language. As a result, both the Python and Go sections contain the same SQL solution because there is no language-specific function signature to implement.
There are no Go-specific concerns such as slice handling, nil values, or integer overflow because the logic executes entirely inside the SQL engine.
Worked Examples
Example 1
Input:
| user_id | activity | activity_date |
|---|---|---|
| 1 | login | 2019-05-01 |
| 1 | homepage | 2019-05-01 |
| 1 | logout | 2019-05-01 |
| 2 | login | 2019-06-21 |
| 3 | login | 2019-01-01 |
| 4 | login | 2019-06-21 |
| 5 | login | 2019-03-01 |
| 5 | login | 2019-06-21 |
Step 1: Filter Login Rows
| user_id | activity_date |
|---|---|
| 1 | 2019-05-01 |
| 2 | 2019-06-21 |
| 3 | 2019-01-01 |
| 4 | 2019-06-21 |
| 5 | 2019-03-01 |
| 5 | 2019-06-21 |
Step 2: Find First Login Per User
| user_id | first_login |
|---|---|
| 1 | 2019-05-01 |
| 2 | 2019-06-21 |
| 3 | 2019-01-01 |
| 4 | 2019-06-21 |
| 5 | 2019-03-01 |
Step 3: Filter to Last 90 Days
The cutoff date is:
2019-06-30 - 90 days = 2019-04-01
Remaining rows:
| user_id | first_login |
|---|---|
| 1 | 2019-05-01 |
| 2 | 2019-06-21 |
| 4 | 2019-06-21 |
Step 4: Group and Count
| login_date | user_count |
|---|---|
| 2019-05-01 | 1 |
| 2019-06-21 | 2 |
This matches the expected output.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Grouping and aggregation over login rows |
| Space | O(u) | Stores one first-login entry per user |
The query scans the table once to filter login events and uses grouping to compute the earliest login date for each user. The O(n log n) bound assumes sort-based grouping; a hash-based aggregation would be closer to O(n) on average. The exact cost depends on the database engine, indexes, and query optimizer, but conceptually we process each row once and maintain grouped results by user.
Test Cases
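```python
from collections import Counter


def new_users_daily_count(traffic, cutoff="2019-04-01"):
    """Hypothetical Python stand-in for the SQL query, used only to exercise the cases below."""
    first_login = {}
    for user_id, activity, activity_date in traffic:
        if activity == "login":
            # ISO date strings compare chronologically, so min() finds the earliest.
            first_login[user_id] = min(activity_date, first_login.get(user_id, activity_date))
    return dict(Counter(day for day in first_login.values() if day >= cutoff))


# Example 1
traffic = [
    [1, "login", "2019-05-01"],
    [2, "login", "2019-06-21"],
    [3, "login", "2019-01-01"],
    [4, "login", "2019-06-21"],
    [5, "login", "2019-03-01"],
    [5, "login", "2019-06-21"],
]
expected = {
    "2019-05-01": 1,
    "2019-06-21": 2,
}
assert new_users_daily_count(traffic) == expected  # provided example

# User logs in multiple times, only first counts
traffic = [
    [1, "login", "2019-05-01"],
    [1, "login", "2019-06-01"],
]
expected = {"2019-05-01": 1}
assert new_users_daily_count(traffic) == expected  # repeated logins

# First login outside window
traffic = [
    [1, "login", "2019-01-01"],
    [1, "login", "2019-06-21"],
]
expected = {}
assert new_users_daily_count(traffic) == expected  # excluded due to old first login

# Multiple users same first login day
traffic = [
    [1, "login", "2019-06-21"],
    [2, "login", "2019-06-21"],
    [3, "login", "2019-06-21"],
]
expected = {"2019-06-21": 3}
assert new_users_daily_count(traffic) == expected  # aggregation test

# Duplicate rows
traffic = [
    [1, "login", "2019-05-01"],
    [1, "login", "2019-05-01"],
]
expected = {"2019-05-01": 1}
assert new_users_daily_count(traffic) == expected  # duplicate handling
```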
Test Summary
| Test | Why |
|---|---|
| Provided example | Verifies baseline correctness |
| Multiple logins for same user | Ensures only first login counts |
| First login outside 90 days | Confirms filtering logic |
| Multiple users same day | Verifies aggregation |
| Duplicate rows | Ensures duplicates do not affect correctness |
Edge Cases
A User Logs In Multiple Times
A common mistake is counting every login event instead of the first one. A user may log in repeatedly over many months. If we grouped directly by activity_date, later logins would incorrectly increase the count. Using MIN(activity_date) guarantees each user contributes exactly once.
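The difference between the naive grouping and the first-login grouping shows up even with a single user. The following Python sketch uses made-up `(user_id, activity_date)` pairs for one user with three logins; the variable names are illustrative only.

```python
from collections import Counter

# One user, three login days (illustrative data).
logins = [(1, "2019-05-01"), (1, "2019-06-01"), (1, "2019-06-15")]

# Naive: group directly by activity_date, so every login day gets a count.
naive = Counter(day for _, day in logins)

# Correct: reduce to the earliest date per user first, then count per date.
first = {}
for user_id, day in logins:
    first[user_id] = min(day, first.get(user_id, day))
correct = Counter(first.values())

print(dict(naive))    # {'2019-05-01': 1, '2019-06-01': 1, '2019-06-15': 1}
print(dict(correct))  # {'2019-05-01': 1}
```

The naive result reports three "new user" days for a single user, while the per-user minimum leaves exactly one.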
Duplicate Rows in the Table
The problem explicitly states that duplicate rows may exist. This could lead to overcounting if we simply counted login records. Since we first group by user_id and compute the earliest login date, duplicates do not change the minimum value and therefore do not affect correctness.
First Login Outside the 90-Day Window
A subtle bug occurs when a user first logs in before the cutoff but logs in again during the valid period. A naive implementation might incorrectly count the later login. By computing the earliest login first and filtering afterward, we correctly exclude such users.
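The order of the two operations is what matters here. As a small illustration in Python, using user 5 from the example and a hypothetical helper `first_logins`, applying the date filter before taking the minimum produces a wrong "first" login, while taking the minimum first excludes the user as required.

```python
CUTOFF = "2019-04-01"  # 90 days before 2019-06-30

# User 5 from the example: (user_id, activity_date) for login rows only.
logins = [(5, "2019-03-01"), (5, "2019-06-21")]


def first_logins(rows):
    """Earliest date per user; ISO date strings compare chronologically."""
    first = {}
    for user_id, day in rows:
        first[user_id] = min(day, first.get(user_id, day))
    return first


# Wrong order: filter by date first, then take the minimum.
wrong = first_logins([row for row in logins if row[1] >= CUTOFF])

# Right order: take the minimum over all logins, then filter.
right = {u: d for u, d in first_logins(logins).items() if d >= CUTOFF}

print(wrong)  # {5: '2019-06-21'}
print(right)  # {}
```

In SQL terms, the wrong order corresponds to putting the date condition in the inner WHERE clause next to activity = 'login', rather than on the outer query.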
No Valid First Logins
It is possible that every user's first login occurred more than 90 days before 2019-06-30. In this situation, the filtered dataset becomes empty, and the query correctly returns no rows instead of producing invalid counts.