LeetCode 1107 - New Users Daily Count


LeetCode Problem 1107

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

We are given a database table called Traffic that records different user activities on different dates. Each row contains a user_id, an activity type, and an activity_date. A user may appear multiple times on the same day because they can perform different activities such as login, logout, homepage, jobs, or groups. The table may also contain duplicate rows.

The task is to report, for every date within at most 90 days before 2019-06-30 (which the problem treats as today), how many users logged in for the first time ever on that date.

The key detail is that we are not counting all login events. We only care about a user's earliest login date. After finding each user's first login date, we only include those dates that fall within the allowed 90-day window ending on 2019-06-30. Finally, for each valid date, we count how many users had their first login on that day.

To better understand the requirement, consider user 5 from the example:

  • User 5 logged in on 2019-03-01
  • User 5 logged in again on 2019-06-21

Even though 2019-06-21 is inside the last 90 days, we do not count this user on that day because their first login already happened on 2019-03-01.

The input represents an activity log where:

  • user_id identifies the user.
  • activity tells us what action happened.
  • activity_date tells us when it happened.

The expected output contains:

Column       Meaning
login_date   A date on which at least one user logged in for the first time
user_count   Number of users whose first login happened on that date

Only dates with a non-zero count should appear.

The problem statement guarantees that activity is one of a fixed set of enum values and notes that duplicate rows may exist. Since this is a database problem, efficiency matters because the table could be large. We want to avoid repeatedly scanning the table for each user or date.

Several edge cases are important to recognize upfront. A user may have multiple login records on the same day, duplicate rows may exist, or a user may log in many times across different dates. We must always use the minimum login date for each user. Another subtle case is a user whose first login occurred before the 90-day window: even if later logins fall inside the window, that user must not be counted.

Approaches

Brute Force Approach

A straightforward approach would be to first identify every distinct user, then for each user scan all rows to find their earliest login date. After that, we would check whether the earliest login falls within the last 90 days and increment a counter for that date.

This works because for every user we explicitly compute their first login date, which guarantees correctness. However, it is inefficient because for each user we repeatedly scan the entire table. If there are n rows and u users, this could require O(u × n) work.
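To make the O(u × n) cost concrete, here is a minimal Python sketch of the brute-force idea. It assumes the Traffic table is held in memory as (user_id, activity, activity_date) tuples with ISO date strings; the function name brute_force_counts is hypothetical and not part of any LeetCode API.

```python
from datetime import date, timedelta

# Window start: 90 days before the problem's fixed "today" (2019-06-30).
CUTOFF = date(2019, 6, 30) - timedelta(days=90)


def brute_force_counts(rows):
    """Hypothetical helper: rows are (user_id, activity, 'YYYY-MM-DD') tuples."""
    counts = {}
    users = {user_id for user_id, _, _ in rows}  # every distinct user
    for user in users:
        # Full scan of all n rows for each of the u users: O(u * n) overall.
        first = None
        for user_id, activity, activity_date in rows:
            if user_id == user and activity == "login":
                d = date.fromisoformat(activity_date)
                if first is None or d < first:
                    first = d
        # Users with no login rows have no first login and are skipped.
        if first is not None and first >= CUTOFF:
            key = first.isoformat()
            counts[key] = counts.get(key, 0) + 1
    return counts


rows = [
    (1, "login", "2019-05-01"),
    (1, "homepage", "2019-05-01"),
    (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"),
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"),
    (5, "login", "2019-06-21"),
]
print(sorted(brute_force_counts(rows).items()))
# [('2019-05-01', 1), ('2019-06-21', 2)]
```

The inner loop is what makes this expensive: the whole list is rescanned once per user, even though a single pass could track every user's minimum at the same time.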

In database systems, repeated full table scans are expensive and unnecessary.

Key Insight

The important observation is that we only care about the earliest login date per user. SQL already provides aggregation functions that solve this efficiently.

We can:

  1. Filter rows where activity = 'login'.
  2. Group by user_id.
  3. Compute MIN(activity_date) for each user to get the first login date.
  4. Filter first login dates to only those within 90 days of 2019-06-30.
  5. Group again by login date and count users.

This avoids repeated scanning and lets the database engine perform efficient grouping operations.
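The five steps above can be mirrored in plain Python as a sketch, assuming the table is an in-memory list of (user_id, activity, activity_date) tuples with ISO date strings. Keeping each user's running minimum in a dictionary plays the role of the first GROUP BY, so every row is touched once.

```python
from collections import Counter
from datetime import date, timedelta

# Window start: 90 days before the problem's fixed "today".
CUTOFF = date(2019, 6, 30) - timedelta(days=90)  # 2019-04-01

rows = [
    (1, "login", "2019-05-01"),
    (1, "homepage", "2019-05-01"),
    (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"),
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"),
    (5, "login", "2019-06-21"),
]

# Step 1: keep only login rows.
logins = [(u, date.fromisoformat(d)) for u, a, d in rows if a == "login"]

# Steps 2-3: group by user_id and keep MIN(activity_date) per user.
first_login = {}
for user_id, d in logins:
    first_login[user_id] = min(d, first_login.get(user_id, d))

# Step 4: apply the 90-day window to the first-login dates only.
in_window = [d for d in first_login.values() if d >= CUTOFF]

# Step 5: group again by first-login date and count users.
user_count = Counter(d.isoformat() for d in in_window)

print(sorted(user_count.items()))  # [('2019-05-01', 1), ('2019-06-21', 2)]
```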

Approach Comparison

Approach      Time Complexity   Space Complexity   Notes
Brute Force   O(u × n)          O(u)               Repeatedly scans the table for each user
Optimal       O(n log n)        O(u)               Uses grouping and aggregation to compute first login dates efficiently

Algorithm Walkthrough

  1. First, filter the Traffic table to only include rows where activity = 'login'. This is necessary because only login events matter when determining a user's first login date.
  2. Group the filtered rows by user_id. For each user, compute the minimum activity_date. This gives us the earliest date on which each user logged in.
  3. From these first login dates, keep only those that fall within the 90-day window ending on 2019-06-30. Because today is fixed at 2019-06-30, the cutoff date is 2019-04-01, and a first login on the cutoff date itself is kept.
  4. Group the remaining results by the first login date.
  5. Count how many users belong to each date.
  6. Return the resulting table with columns:
     • login_date
     • user_count

Why it works

The correctness comes from the fact that for every user we compute exactly one value, their earliest login date, using MIN(activity_date). Since every user contributes exactly once, grouping by this first login date yields accurate counts. Filtering after computing the minimum ensures we correctly exclude users whose first login happened before the 90-day window, even if they logged in again later.

Python Solution

# Write your MySQL query statement below

SELECT
    first_login AS login_date,
    COUNT(*) AS user_count
FROM (
    SELECT
        user_id,
        MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS first_logins
WHERE first_login >= DATE_SUB('2019-06-30', INTERVAL 90 DAY)
GROUP BY first_login;

The query starts by filtering the table to only login activities. We then group by user_id and compute MIN(activity_date) to determine each user's first login date.

This intermediate result is stored in a derived table called first_logins. Using a subquery makes the logic easier to understand because we separate the task of finding first login dates from counting them.

Next, we apply the 90-day filter. Since today is fixed as 2019-06-30, DATE_SUB('2019-06-30', INTERVAL 90 DAY) computes the cutoff, 2019-04-01, instead of hard-coding it. Because the comparison is >=, a first login on the cutoff date itself is included.
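As a cross-check outside MySQL, the same subtraction in Python's datetime module confirms which date the cutoff lands on and that the boundary day is exactly 90 days back:

```python
from datetime import date, timedelta

today = date(2019, 6, 30)            # the problem's fixed "today"
cutoff = today - timedelta(days=90)  # same arithmetic as DATE_SUB(..., INTERVAL 90 DAY)

print(cutoff)                # 2019-04-01
print((today - cutoff).days) # 90

# A >= comparison keeps the cutoff day and rejects the day before it.
print(cutoff >= cutoff)                      # True
print(cutoff - timedelta(days=1) >= cutoff)  # False
```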

Finally, we group by first_login and count users for each date.
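To try the query locally without a MySQL server, one option is Python's built-in sqlite3 module. This is a port under stated assumptions, not the submitted MySQL: SQLite has no DATE_SUB, so the sketch swaps in SQLite's date('2019-06-30', '-90 days'); dates are stored as ISO text, which compares chronologically; and an ORDER BY is added only so the printed output is deterministic.

```python
import sqlite3

rows = [
    (1, "login", "2019-05-01"),
    (1, "homepage", "2019-05-01"),
    (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"),
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"),
    (5, "login", "2019-06-21"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Traffic (user_id INTEGER, activity TEXT, activity_date TEXT)")
conn.executemany("INSERT INTO Traffic VALUES (?, ?, ?)", rows)

# Same shape as the MySQL query; only the cutoff expression differs (SQLite date()).
QUERY = """
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
    SELECT user_id, MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS first_logins
WHERE first_login >= date('2019-06-30', '-90 days')
GROUP BY first_login
ORDER BY first_login
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('2019-05-01', 1), ('2019-06-21', 2)]
```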

Go Solution

-- Write your MySQL query statement below

SELECT
    first_login AS login_date,
    COUNT(*) AS user_count
FROM (
    SELECT
        user_id,
        MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS first_logins
WHERE first_login >= DATE_SUB('2019-06-30', INTERVAL 90 DAY)
GROUP BY first_login;

Since this is a database problem, LeetCode expects a SQL query rather than an implementation in a programming language. As a result, both the Python and Go sections contain the same SQL solution because there is no language-specific function signature to implement.

There are no Go-specific concerns such as slice handling, nil values, or integer overflow because the logic executes entirely inside the SQL engine.

Worked Examples

Example 1

Input:

user_id   activity   activity_date
1         login      2019-05-01
1         homepage   2019-05-01
1         logout     2019-05-01
2         login      2019-06-21
3         login      2019-01-01
4         login      2019-06-21
5         login      2019-03-01
5         login      2019-06-21

Step 1: Filter Login Rows

user_id   activity_date
1         2019-05-01
2         2019-06-21
3         2019-01-01
4         2019-06-21
5         2019-03-01
5         2019-06-21

Step 2: Find First Login Per User

user_id   first_login
1         2019-05-01
2         2019-06-21
3         2019-01-01
4         2019-06-21
5         2019-03-01

Step 3: Filter to Last 90 Days

The cutoff date is:

2019-06-30 - 90 days = 2019-04-01

Remaining rows:

user_id   first_login
1         2019-05-01
2         2019-06-21
4         2019-06-21

Step 4: Group and Count

login_date   user_count
2019-05-01   1
2019-06-21   2

This matches the expected output.

Complexity Analysis

Measure   Complexity   Explanation
Time      O(n log n)   Grouping and aggregation over login rows
Space     O(u)         Stores one first-login entry per user

The query scans the table once to filter login events and uses grouping to compute the earliest login date for each user. The O(n log n) bound assumes sort-based grouping; an engine that uses hash aggregation does the same work in expected O(n) time. The exact cost depends on the database engine, indexes, and query optimizer, but conceptually each row is processed once and the grouped result holds one entry per user.
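One way to see where an O(n log n) time bound comes from is sort-based grouping. The sketch below is a hypothetical illustration in plain Python, not how any particular engine executes the query: it sorts login rows by (user_id, activity_date), takes the first row of each user as the minimum, then sorts and groups those first-login dates. ISO date strings are compared as text, which is assumed to be safe because the "YYYY-MM-DD" format sorts chronologically.

```python
from datetime import date, timedelta
from itertools import groupby

# ISO "YYYY-MM-DD" strings sort chronologically, so plain string comparison works.
CUTOFF = (date(2019, 6, 30) - timedelta(days=90)).isoformat()  # "2019-04-01"


def first_login_counts_sorted(rows):
    """Hypothetical helper: sort-based version of the two GROUP BY steps."""
    # Sort login rows by (user_id, activity_date): O(n log n).
    logins = sorted((user_id, d) for user_id, activity, d in rows if activity == "login")
    # Within each user_id run, the first row carries MIN(activity_date).
    firsts = [next(group)[1] for _, group in groupby(logins, key=lambda r: r[0])]
    # Sort the in-window first logins so equal dates are adjacent, then count each run.
    in_window = sorted(d for d in firsts if d >= CUTOFF)
    return {d: len(list(run)) for d, run in groupby(in_window)}


rows = [
    (1, "login", "2019-05-01"),
    (1, "homepage", "2019-05-01"),
    (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"),
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"),
    (5, "login", "2019-06-21"),
]
print(first_login_counts_sorted(rows))  # {'2019-05-01': 1, '2019-06-21': 2}
```

A hash-based version replaces both sorts with dictionary lookups, which is why the cost depends on the strategy the engine picks.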

Test Cases

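# Hypothetical reference helper (not part of the SQL submission): it mirrors
# the query's logic in Python so each case below checks a computed result
# instead of comparing a literal with itself.
from datetime import date, timedelta

CUTOFF = date(2019, 6, 30) - timedelta(days=90)  # 2019-04-01


def first_login_counts(traffic):
    """Map login_date -> user_count for first logins on or after CUTOFF."""
    first_login = {}
    for user_id, activity, activity_date in traffic:
        if activity == "login":
            d = date.fromisoformat(activity_date)
            first_login[user_id] = min(d, first_login.get(user_id, d))
    counts = {}
    for d in first_login.values():
        if d >= CUTOFF:
            counts[d.isoformat()] = counts.get(d.isoformat(), 0) + 1
    return counts


# Example 1
traffic = [
    [1, "login", "2019-05-01"],
    [1, "homepage", "2019-05-01"],
    [1, "logout", "2019-05-01"],
    [2, "login", "2019-06-21"],
    [3, "login", "2019-01-01"],
    [4, "login", "2019-06-21"],
    [5, "login", "2019-03-01"],
    [5, "login", "2019-06-21"],
]
expected = {
    "2019-05-01": 1,
    "2019-06-21": 2,
}
assert first_login_counts(traffic) == expected  # provided example

# User logs in multiple times, only first counts
traffic = [
    [1, "login", "2019-05-01"],
    [1, "login", "2019-06-01"],
]
expected = {"2019-05-01": 1}
assert first_login_counts(traffic) == expected  # repeated logins

# First login outside window
traffic = [
    [1, "login", "2019-01-01"],
    [1, "login", "2019-06-21"],
]
expected = {}
assert first_login_counts(traffic) == expected  # excluded due to old first login

# Multiple users same first login day
traffic = [
    [1, "login", "2019-06-21"],
    [2, "login", "2019-06-21"],
    [3, "login", "2019-06-21"],
]
expected = {"2019-06-21": 3}
assert first_login_counts(traffic) == expected  # aggregation test

# Duplicate rows
traffic = [
    [1, "login", "2019-05-01"],
    [1, "login", "2019-05-01"],
]
expected = {"2019-05-01": 1}
assert first_login_counts(traffic) == expected  # duplicate handling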

Test Summary

Test                            Why
Provided example                Verifies baseline correctness
Multiple logins for same user   Ensures only first login counts
First login outside 90 days     Confirms filtering logic
Multiple users same day         Verifies aggregation
Duplicate rows                  Ensures duplicates do not affect correctness

Edge Cases

A User Logs In Multiple Times

A common mistake is counting every login event instead of the first one. A user may log in repeatedly over many months. If we grouped directly by activity_date, later logins would incorrectly increase the count. Using MIN(activity_date) guarantees each user contributes exactly once.
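The difference is easy to see on the example data. This Python sketch (in-memory tuples and ISO date strings are assumptions for illustration) counts every in-window login row by date, then repeats the count after collapsing each user to MIN(activity_date):

```python
from datetime import date, timedelta

CUTOFF = (date(2019, 6, 30) - timedelta(days=90)).isoformat()  # "2019-04-01"

rows = [
    (1, "login", "2019-05-01"),
    (1, "homepage", "2019-05-01"),
    (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"),
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"),
    (5, "login", "2019-06-21"),
]


def count_in_window(dates):
    """Count occurrences of each ISO date string on or after CUTOFF."""
    counts = {}
    for d in dates:
        if d >= CUTOFF:
            counts[d] = counts.get(d, 0) + 1
    return counts


# Mistake: every in-window login row counts toward its own date.
naive = count_in_window(d for _, a, d in rows if a == "login")

# Fix: collapse each user to MIN(activity_date) before counting.
first_login = {}
for user_id, activity, d in rows:
    if activity == "login":
        first_login[user_id] = min(d, first_login.get(user_id, d))
correct = count_in_window(first_login.values())

print(naive)    # {'2019-05-01': 1, '2019-06-21': 3}
print(correct)  # {'2019-05-01': 1, '2019-06-21': 2}
```

The naive count credits user 5 to 2019-06-21, which is exactly the overcount described above.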

Duplicate Rows in the Table

The problem explicitly states that duplicate rows may exist. This could lead to overcounting if we simply counted login records. Since we first group by user_id and compute the earliest login date, duplicates do not change the minimum value and therefore do not affect correctness.
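A tiny Python sketch makes the point with one exact duplicate row (the in-memory tuple layout is an assumption for illustration): counting raw rows doubles the user, while the per-user minimum is unchanged.

```python
rows = [
    (1, "login", "2019-05-01"),
    (1, "login", "2019-05-01"),  # exact duplicate row
]

# Counting raw login rows per date double-counts the duplicate.
raw_counts = {}
for _, activity, d in rows:
    if activity == "login":
        raw_counts[d] = raw_counts.get(d, 0) + 1

# Collapsing to one MIN(activity_date) per user first makes the duplicate harmless.
first_login = {}
for user_id, activity, d in rows:
    if activity == "login":
        first_login[user_id] = min(d, first_login.get(user_id, d))
user_counts = {}
for d in first_login.values():
    user_counts[d] = user_counts.get(d, 0) + 1

print(raw_counts)   # {'2019-05-01': 2}
print(user_counts)  # {'2019-05-01': 1}
```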

First Login Outside the 90-Day Window

A subtle bug occurs when a user first logs in before the cutoff but logs in again during the valid period. A naive implementation might incorrectly count the later login. By computing the earliest login first and filtering afterward, we correctly exclude such users.
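The ordering bug can be reproduced in SQL with Python's sqlite3 module. In this sketch SQLite's date('2019-06-30', '-90 days') stands in for MySQL's DATE_SUB (an assumption of the port), and the only data is user 5 from the example. Moving the window filter into the inner WHERE hides the 2019-03-01 login, so MIN returns the later date.

```python
import sqlite3

# User 5 from the example: first login before the window, a later one inside it.
rows = [(5, "login", "2019-03-01"), (5, "login", "2019-06-21")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Traffic (user_id INTEGER, activity TEXT, activity_date TEXT)")
conn.executemany("INSERT INTO Traffic VALUES (?, ?, ?)", rows)

# Buggy order: the window filter runs before MIN, hiding the 2019-03-01 login.
WRONG = """
SELECT first_login, COUNT(*) FROM (
    SELECT user_id, MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login' AND activity_date >= date('2019-06-30', '-90 days')
    GROUP BY user_id
) AS t
GROUP BY first_login
"""

# Correct order: MIN runs over all logins, then the window filter applies.
RIGHT = """
SELECT first_login, COUNT(*) FROM (
    SELECT user_id, MIN(activity_date) AS first_login
    FROM Traffic
    WHERE activity = 'login'
    GROUP BY user_id
) AS t
WHERE first_login >= date('2019-06-30', '-90 days')
GROUP BY first_login
"""

wrong = conn.execute(WRONG).fetchall()
right = conn.execute(RIGHT).fetchall()
print(wrong)  # [('2019-06-21', 1)]
print(right)  # []
```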

No Valid First Logins

It is possible that every user's first login occurred more than 90 days before 2019-06-30. In this situation, the filtered dataset becomes empty, and the query correctly returns no rows instead of producing invalid counts.