LeetCode 1141 - User Activity for the Past 30 Days I

This problem asks us to compute the number of distinct active users for each day within a fixed 30 day window. The window ends on 2019-07-27, inclusive, which means we only consider activity dates from 2019-06-28 through 2019-07-27.

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The Activity table records user interactions on a social media platform. Each row represents a single action performed by a user during a session. The table contains four columns:

Column | Meaning
user_id | The user who performed the action
session_id | The session in which the action occurred
activity_date | The date of the activity
activity_type | The type of action performed

The important detail is that any activity counts as valid activity. Whether the action is opening a session, scrolling, sending a message, or ending a session, the user should be considered active on that day.

The output should contain one row per day that has at least one active user. For each such day, we must return:

Column | Meaning
day | The activity date
active_users | The number of distinct users active on that date

The phrase "distinct users" is critical. A user may generate multiple rows on the same day, but they should only be counted once for that date.

The table may contain duplicate rows, which means a naive counting strategy using COUNT(*) would overcount users. We must instead use COUNT(DISTINCT user_id) grouped by date.

The constraints are small enough that a straightforward grouping solution is sufficient. The main challenge is correctly filtering the date range and handling duplicate activities from the same user.

Several edge cases are important:

  1. A user may perform many activities on the same day. They must still count as only one active user.
  2. Duplicate rows may exist in the table. These duplicates should not inflate the user count.
  3. Activities outside the 30 day window must be ignored completely.
  4. Some days inside the window may have zero activity. Those days should not appear in the output.
  5. Multiple sessions from the same user on the same day still count as one active user.

Approaches

Brute Force Approach

A brute force solution would iterate through every date in the 30 day range and, for each date, scan the entire Activity table to determine which users were active on that day.

For each date:

  1. Traverse all rows in the table.
  2. Check whether the row belongs to the current date.
  3. Add the user_id to a set for deduplication.
  4. Count the size of the set after scanning all rows.

This approach is correct because the set guarantees distinct users are counted only once. However, it repeatedly scans the entire table for every date in the range.

If there are N rows and D = 30 days, the total complexity becomes O(N * D). While acceptable for small datasets, it is inefficient because the same rows are revisited many times.
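The brute force idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name brute_force_active_users and the (user_id, activity_date) row shape are hypothetical and not part of the problem, which expects a SQL query.

```python
from datetime import date, timedelta


def brute_force_active_users(rows, end=date(2019, 7, 27), days=30):
    """rows: iterable of (user_id, activity_date) pairs, activity_date a datetime.date."""
    rows = list(rows)
    result = []
    for offset in range(days - 1, -1, -1):      # oldest day in the window first
        day = end - timedelta(days=offset)
        seen = set()
        for user_id, activity_date in rows:     # full scan for every day: O(N * D)
            if activity_date == day:
                seen.add(user_id)               # the set deduplicates users
        if seen:                                # days with no activity are omitted
            result.append((day.isoformat(), len(seen)))
    return result


rows = [
    (1, date(2019, 7, 20)), (1, date(2019, 7, 20)), (2, date(2019, 7, 20)),
    (2, date(2019, 7, 21)), (3, date(2019, 7, 21)), (4, date(2019, 6, 25)),
]
print(brute_force_active_users(rows))
```

The inner loop runs over all rows for each of the 30 dates, which is where the O(N * D) cost comes from.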

Optimal Approach

The key insight is that SQL databases are designed for grouping and aggregation operations. Instead of processing each date separately, we can filter the relevant rows once and group them by activity_date.

For each date, we count distinct users directly using:

COUNT(DISTINCT user_id)

This automatically handles:

  • Multiple activities from the same user
  • Duplicate rows
  • Multiple sessions on the same day

The database can then compute every daily count from one scan of the filtered rows, instead of rescanning the table once per date.
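As a rough model of what the grouping does, the sketch below makes one pass over the rows and keeps a set of users per date. It assumes dates are ISO formatted strings (so string comparison matches chronological order); the function name count_active_users is hypothetical, and a real database engine may implement the aggregation differently.

```python
from collections import defaultdict


def count_active_users(rows, start="2019-06-28", end="2019-07-27"):
    """rows: iterable of (user_id, activity_date) with ISO date strings."""
    users_by_day = defaultdict(set)
    for user_id, activity_date in rows:          # single pass over the rows
        if start <= activity_date <= end:        # inclusive date filter
            users_by_day[activity_date].add(user_id)
    # One entry per day that had activity, sorted by date
    return {day: len(users) for day, users in sorted(users_by_day.items())}


rows = [
    (1, "2019-07-20"), (1, "2019-07-20"), (2, "2019-07-20"),
    (2, "2019-07-21"), (3, "2019-07-21"), (4, "2019-06-25"),
]
print(count_active_users(rows))
```

Each row is touched once, and repeated rows for the same user and date collapse inside the set, mirroring COUNT(DISTINCT user_id) with GROUP BY activity_date.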

Approach Comparison

Approach | Time Complexity | Space Complexity | Notes
Brute Force | O(N * D) | O(U) | Repeatedly scans the table for each day
Optimal | O(N) | O(U) | Single filtering and grouping pass

Here:

  • N is the number of rows in the table
  • D is the number of days in the range, fixed at 30
  • U is the number of distinct users for a day

Algorithm Walkthrough

  1. Filter the rows to include only dates within the required 30 day window.

The window is inclusive and ends on 2019-07-27. Therefore, the valid range is:

2019-06-28 <= activity_date <= 2019-07-27

  2. Group the filtered rows by activity_date.

This allows us to process each day independently and compute the active user count for that specific date.

  3. Count distinct users within each group.

We use COUNT(DISTINCT user_id) because a user may have multiple activity rows on the same day.

  4. Return the resulting date and active user count.

Days with no activity naturally disappear because no group exists for them.

Why it works

The algorithm works because grouping by activity_date partitions all valid rows into separate daily buckets. Within each bucket, counting distinct user_id values ensures each user contributes exactly once to the count for that day, regardless of duplicates or multiple activities.

Python Solution

# Write your MySQL query statement below

SELECT
    activity_date AS day,
    COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
GROUP BY activity_date;

The solution begins by filtering rows using the WHERE clause. This removes all activities outside the required 30 day interval.

Next, the query groups rows by activity_date. Each group represents all activities performed on a specific day.

Finally, COUNT(DISTINCT user_id) counts unique active users for each day. The DISTINCT keyword is essential because users may have multiple activity records on the same date.

The alias day matches the required output format.

Go Solution

-- Write your MySQL query statement below

SELECT
    activity_date AS day,
    COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
GROUP BY activity_date;

Since this is a database problem, the SQL solution is identical regardless of the programming language section. There are no Go specific implementation concerns because the logic is executed entirely within the database engine.

Worked Examples

Example 1

Input table:

user_id session_id activity_date activity_type
1 1 2019-07-20 open_session
1 1 2019-07-20 scroll_down
1 1 2019-07-20 end_session
2 4 2019-07-20 open_session
2 4 2019-07-21 send_message
2 4 2019-07-21 end_session
3 2 2019-07-21 open_session
3 2 2019-07-21 send_message
3 2 2019-07-21 end_session
4 3 2019-06-25 open_session
4 3 2019-06-25 end_session

Step 1, Filter by Date Range

Valid range:

2019-06-28 to 2019-07-27

Rows from 2019-06-25 are removed.

Remaining rows:

user_id activity_date
1 2019-07-20
1 2019-07-20
1 2019-07-20
2 2019-07-20
2 2019-07-21
2 2019-07-21
3 2019-07-21
3 2019-07-21
3 2019-07-21

Step 2, Group by Date

Date | Users Seen
2019-07-20 | 1, 1, 1, 2
2019-07-21 | 2, 2, 3, 3, 3

Step 3, Count Distinct Users

Date | Distinct Users | Count
2019-07-20 | {1, 2} | 2
2019-07-21 | {2, 3} | 2

Final Output

day active_users
2019-07-20 2
2019-07-21 2
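
The walkthrough above can be checked mechanically. The sketch below loads the example rows into an in-memory SQLite database through Python's sqlite3 module and runs the same query. SQLite standing in for MySQL is an assumption made only so the example is self-contained, and the ORDER BY clause is added here purely to make the printed order deterministic.

```python
import sqlite3

example_rows = [
    (1, 1, "2019-07-20", "open_session"),
    (1, 1, "2019-07-20", "scroll_down"),
    (1, 1, "2019-07-20", "end_session"),
    (2, 4, "2019-07-20", "open_session"),
    (2, 4, "2019-07-21", "send_message"),
    (2, 4, "2019-07-21", "end_session"),
    (3, 2, "2019-07-21", "open_session"),
    (3, 2, "2019-07-21", "send_message"),
    (3, 2, "2019-07-21", "end_session"),
    (4, 3, "2019-06-25", "open_session"),
    (4, 3, "2019-06-25", "end_session"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "user_id INTEGER, session_id INTEGER, activity_date TEXT, activity_type TEXT)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", example_rows)

example_result = conn.execute(
    """
    SELECT activity_date AS day, COUNT(DISTINCT user_id) AS active_users
    FROM Activity
    WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
    GROUP BY activity_date
    ORDER BY day
    """
).fetchall()
conn.close()

for day, active_users in example_result:
    print(day, active_users)
```

The rows for user 4 fall on 2019-06-25, before the window starts, so they never reach the grouping step.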

Complexity Analysis

Measure | Complexity | Explanation
Time | O(N) | Each relevant row is processed once during filtering and grouping
Space | O(U) | The database may maintain distinct user sets during aggregation

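Conceptually, the query scans the table once, applies the date filter, groups rows by date, and counts distinct users within each group. With hash-based aggregation the dominant cost is linear in the number of rows processed; an engine that sorts the rows to form the groups adds a logarithmic factor.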

Test Cases

# Example case
assert solution == [
    ["2019-07-20", 2],
    ["2019-07-21", 2]
]  # standard example from problem statement

# Single user with multiple activities same day
assert solution == [
    ["2019-07-20", 1]
]  # user counted once despite multiple actions

# Duplicate rows
assert solution == [
    ["2019-07-20", 1]
]  # duplicate activity rows should not increase count

# Activities outside valid range
assert solution == []  # no rows inside the 30 day window

# Multiple users on same day
assert solution == [
    ["2019-07-25", 5]
]  # verifies distinct counting for many users

# Same user across multiple sessions same day
assert solution == [
    ["2019-07-26", 1]
]  # user still counts once

# Multiple valid dates
assert solution == [
    ["2019-07-20", 2],
    ["2019-07-21", 3],
    ["2019-07-22", 1]
]  # grouping by day works correctly

# Boundary date inclusion
assert solution == [
    ["2019-06-28", 1],
    ["2019-07-27", 1]
]  # both endpoints are included
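
The assertions above leave both the input rows and the meaning of solution implicit. One possible way to make a few of them executable is sketched below. It assumes an in-memory SQLite table in place of MySQL, a hypothetical helper named run_query, and input rows invented here to match each scenario; an ORDER BY is added so results compare deterministically.

```python
import sqlite3

QUERY = """
SELECT activity_date AS day, COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
GROUP BY activity_date
ORDER BY day
"""


def run_query(rows):
    """Load rows into a throwaway in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity ("
        "user_id INTEGER, session_id INTEGER, activity_date TEXT, activity_type TEXT)"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    result = [list(row) for row in conn.execute(QUERY)]
    conn.close()
    return result


# Duplicate rows: the same row inserted twice still yields one user
assert run_query([(1, 1, "2019-07-20", "open_session")] * 2) == [["2019-07-20", 1]]

# Activities outside the valid range produce no output rows
assert run_query([(4, 3, "2019-06-25", "open_session")]) == []

# Same user across multiple sessions on the same day counts once
assert run_query([
    (1, 1, "2019-07-26", "open_session"),
    (1, 2, "2019-07-26", "open_session"),
]) == [["2019-07-26", 1]]

# Boundary dates are included; the days just outside are not
assert run_query([
    (1, 1, "2019-06-28", "open_session"),
    (2, 2, "2019-07-27", "open_session"),
    (3, 3, "2019-06-27", "open_session"),
    (3, 3, "2019-07-28", "open_session"),
]) == [["2019-06-28", 1], ["2019-07-27", 1]]
```

Building a fresh table per case keeps the scenarios independent of each other.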

Test Case Summary

Test | Why
Standard example | Verifies basic grouping and counting
Multiple activities same day | Ensures distinct user counting
Duplicate rows | Confirms duplicates do not inflate counts
Outside range only | Verifies date filtering
Many users same day | Tests aggregation correctness
Multiple sessions same user | Ensures user counted once
Multiple dates | Confirms grouping by date
Boundary dates | Verifies inclusive range handling

Edge Cases

Multiple Activities by the Same User

A user may perform many actions on the same day, such as opening a session, scrolling, and sending messages. A naive query using COUNT(*) would count every row and overestimate the number of active users.

The implementation avoids this bug by using:

COUNT(DISTINCT user_id)

This guarantees each user contributes only once per day.
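
To make the difference concrete, the sketch below computes both aggregates over the 2019-07-20 rows from the worked example. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, which is an assumption made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "user_id INTEGER, session_id INTEGER, activity_date TEXT, activity_type TEXT)"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2019-07-20", "open_session"),
        (1, 1, "2019-07-20", "scroll_down"),
        (1, 1, "2019-07-20", "end_session"),
        (2, 4, "2019-07-20", "open_session"),
    ],
)

# COUNT(*) counts rows; COUNT(DISTINCT user_id) counts users
row_count, user_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT user_id) "
    "FROM Activity WHERE activity_date = '2019-07-20'"
).fetchone()
conn.close()

print("rows:", row_count, "distinct users:", user_count)
```

Here four rows belong to only two users, so counting rows would double the reported activity for that day.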

Duplicate Rows in the Table

The problem explicitly states that duplicate rows may exist. Without deduplication, repeated rows would inflate the active user count.

Because the query counts distinct user IDs instead of rows, duplicates do not affect the result.

Activities Outside the 30 Day Window

Rows outside the required interval must be ignored completely. A common mistake is incorrectly computing the start date or using an exclusive range.

The implementation correctly includes both endpoints using:

WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'

The BETWEEN operator is inclusive in SQL, which matches the problem statement exactly.
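
The start date itself is easy to get wrong by one day. As a small check, the Python sketch below derives it from the end date: because both endpoints are included, a 30 day window spans 29 day-to-day steps, so the offset is 29 rather than 30. The variable names are illustrative only.

```python
from datetime import date, timedelta

end = date(2019, 7, 27)
start = end - timedelta(days=29)    # 30 days inclusive means stepping back 29 days

# Enumerate every day in the window to confirm its length and endpoints
window = [start + timedelta(days=i) for i in range(30)]

print(start.isoformat(), window[-1].isoformat(), len(window))
```

Subtracting 30 instead would yield 2019-06-27 and silently admit a 31st day into the results.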