LeetCode 1142 - User Activity for the Past 30 Days II
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
This problem asks us to compute the average number of sessions per user over a specific 30-day period ending on 2019-07-27. Each row in the Activity table represents an activity a user performed in a session on a particular date. A single session can have multiple activities, and the table may contain duplicate rows. A session counts for the average only if it has at least one activity in the 30-day window.
The input table has four columns: user_id, session_id, activity_date, and activity_type. The activity_type is an ENUM representing actions like opening or ending a session, scrolling, or sending a message. The output is a single number, rounded to two decimal places, representing the average number of distinct sessions per user during the 30-day period.
The constraints to note are that each session belongs to exactly one user, and activities may be duplicated. A naive solution that simply counts rows would be incorrect because a single session may have multiple activities and duplicates, so we must count distinct sessions per user.
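To make the difference concrete, here is a small sketch that contrasts row counting with distinct-session counting. The sample rows and both helper names are made up for illustration and are not part of the problem statement.

```python
# Hypothetical sample rows: (user_id, session_id, activity_date, activity_type)
rows = [
    (1, 1, "2019-07-20", "open_session"),
    (1, 1, "2019-07-20", "scroll_down"),
    (1, 1, "2019-07-20", "scroll_down"),   # exact duplicate row
    (1, 1, "2019-07-20", "end_session"),
    (2, 4, "2019-07-20", "open_session"),
]


def naive_row_counts(rows):
    """Count rows per user. This inflates the session count."""
    counts = {}
    for user_id, _session_id, _day, _kind in rows:
        counts[user_id] = counts.get(user_id, 0) + 1
    return counts


def distinct_session_counts(rows):
    """Count distinct session IDs per user, which is what the problem asks for."""
    sessions = {}
    for user_id, session_id, _day, _kind in rows:
        sessions.setdefault(user_id, set()).add(session_id)
    return {user: len(ids) for user, ids in sessions.items()}


print(naive_row_counts(rows))         # {1: 4, 2: 1}
print(distinct_session_counts(rows))  # {1: 1, 2: 1}
```

User 1 has a single session, yet the naive count reports four because every activity row is treated as a session.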
Important edge cases include users with no sessions in the period, users with multiple sessions, and duplicate rows that should not inflate counts.
Approaches
The brute-force approach iterates over each user and, for each one, scans the activities, checks whether each falls within the 30-day window, collects the session IDs, removes duplicates, and finally averages the per-user counts. This works, but if the whole table is rescanned for every user, the cost grows with the number of users times the number of rows, which is expensive for a large table.
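A rough sketch of that brute-force idea in plain Python, assuming the rows have already been loaded as (user_id, session_id, activity_date) tuples with ISO date strings. The function name and the list-based deduplication are illustrative choices, kept deliberately naive.

```python
from datetime import date

WINDOW_START = date(2019, 6, 28)
WINDOW_END = date(2019, 7, 27)


def brute_force_average(rows):
    """rows: list of (user_id, session_id, activity_date) tuples."""
    users = []
    for user_id, _, _ in rows:               # first scan: collect the users
        if user_id not in users:
            users.append(user_id)

    session_counts = []
    for user in users:                        # one full rescan per user
        seen = []
        for user_id, session_id, day in rows:
            in_window = WINDOW_START <= date.fromisoformat(day) <= WINDOW_END
            if user_id == user and in_window and session_id not in seen:
                seen.append(session_id)
        if seen:                              # skip users with nothing in the window
            session_counts.append(len(seen))

    if not session_counts:
        return 0.0
    return round(sum(session_counts) / len(session_counts), 2)


example = [
    (1, 1, "2019-07-20"), (2, 4, "2019-07-20"), (3, 2, "2019-07-21"),
    (3, 5, "2019-07-21"), (4, 3, "2019-06-25"),
]
print(brute_force_average(example))  # 1.33
```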
The key observation for a better solution is that we do not need to examine individual activity types or handle duplicates manually. A single SQL query that filters by date, counts distinct session_id values grouped by user_id, and then averages these counts produces the desired result, leaving the deduplication and aggregation to the database engine.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N^2) | O(N) | Rescans the table for every user (U·N work, N^2 in the worst case); slow for large datasets |
| Optimal | O(N) | O(U) | Single query using COUNT(DISTINCT), GROUP BY, and AVG; the bounds assume hash-based aggregation |
Algorithm Walkthrough
- Filter the Activity table to include only rows where activity_date is between 2019-06-28 and 2019-07-27 inclusive. This ensures we only consider the last 30 days.
- Select distinct combinations of user_id and session_id to remove duplicates and multiple activity rows from the same session.
- Group the resulting distinct sessions by user_id to count the number of sessions per user.
- Compute the average of these session counts across all users.
- Round the final average to 2 decimal places as required.
This works because after filtering and deduplicating, each session is counted exactly once per user. Aggregating by user and taking the average then gives the correct average sessions per user.
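The same steps can be mirrored outside SQL. The sketch below is one possible set-based rendering in Python; the function name, its parameters, and the choice to return 0.0 when no session falls in the window are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def average_sessions(rows, end=date(2019, 7, 27), days=30):
    """rows: iterable of (user_id, session_id, activity_date) with ISO date strings."""
    start = end - timedelta(days=days - 1)   # 2019-06-28 for the defaults

    # Filter by date and deduplicate (user_id, session_id) pairs in one pass.
    pairs = {
        (user_id, session_id)
        for user_id, session_id, day in rows
        if start <= date.fromisoformat(day) <= end
    }

    # Group by user: number of distinct sessions per user.
    per_user = Counter(user_id for user_id, _ in pairs)
    if not per_user:
        return 0.0

    # Average the per-user counts and round to two decimals.
    return round(sum(per_user.values()) / len(per_user), 2)


sample = [
    (1, 1, "2019-07-20"), (1, 1, "2019-07-20"),   # duplicate row
    (1, 2, "2019-07-21"),
    (2, 3, "2019-06-27"),                          # one day before the window
]
print(average_sessions(sample))  # 2.0
```

Using a set of pairs makes duplicate rows harmless, and users whose rows are all filtered out never appear in the counter.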
Python Solution
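# This is a LeetCode database problem, so the solution is SQL-based.
# For completeness, here is how it would look in a Python DB API context (SQLite dialect):
import sqlite3


def average_sessions_per_user(conn: sqlite3.Connection) -> float:
    # Inner query: one row per active user with their distinct session count.
    # Outer query: average of those counts, rounded to two decimals.
    query = """
        SELECT ROUND(AVG(session_count), 2) AS average_sessions_per_user
        FROM (
            SELECT user_id, COUNT(DISTINCT session_id) AS session_count
            FROM Activity
            WHERE activity_date BETWEEN DATE('2019-07-27', '-29 days') AND '2019-07-27'
            GROUP BY user_id
        ) AS user_sessions;
    """
    cursor = conn.cursor()
    cursor.execute(query)
    result = cursor.fetchone()
    # AVG over zero rows yields NULL (None in Python); report 0.0 in that case.
    return result[0] if result[0] is not None else 0.0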
The implementation filters activities within the last 30 days, deduplicates sessions per user, counts them, and then averages the counts. The inner subquery produces a table with one row per user showing their session count, and the outer query computes the average rounded to two decimal places.
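Because each session belongs to exactly one user, the average can also be written as the number of distinct sessions divided by the number of distinct active users, without a subquery. The sketch below is an alternative formulation, not the query used above; it runs against an in-memory SQLite database, and the constant name, function name, and sample rows are illustrative.

```python
import sqlite3

# Distinct sessions / distinct users within the window. The "* 1.0" forces
# floating-point division in SQLite, and COALESCE turns the NULL produced by
# dividing by zero (no rows in the window) into 0.0.
RATIO_QUERY = """
    SELECT COALESCE(
        ROUND(COUNT(DISTINCT session_id) * 1.0 / COUNT(DISTINCT user_id), 2),
        0.0
    ) AS average_sessions_per_user
    FROM Activity
    WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27';
"""


def average_sessions_ratio(conn):
    return conn.execute(RATIO_QUERY).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (user_id INT, session_id INT, activity_date TEXT, activity_type TEXT)"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2019-07-20", "open_session"),
        (2, 4, "2019-07-20", "open_session"),
        (3, 2, "2019-07-21", "open_session"),
        (3, 5, "2019-07-21", "open_session"),
        (4, 3, "2019-06-25", "open_session"),
    ],
)
print(average_sessions_ratio(conn))  # 1.33
```

This shortcut relies on session IDs never being shared between users; if that assumption did not hold, the grouped version would be the safer choice.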
Go Solution
// In Go, this would typically be executed using a database/sql connection
package solution

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // registers the MySQL driver
)

// AverageSessionsPerUser returns the average number of distinct sessions
// per active user for the 30 days ending on 2019-07-27.
func AverageSessionsPerUser(db *sql.DB) (float64, error) {
	query := `
		SELECT ROUND(AVG(session_count), 2) AS average_sessions_per_user
		FROM (
			SELECT user_id, COUNT(DISTINCT session_id) AS session_count
			FROM Activity
			WHERE activity_date BETWEEN DATE_SUB('2019-07-27', INTERVAL 29 DAY) AND '2019-07-27'
			GROUP BY user_id
		) AS user_sessions;
	`
	var avg float64
	// Scan returns an error if AVG is NULL (no activity in the window).
	err := db.QueryRow(query).Scan(&avg)
	if err != nil {
		return 0, err
	}
	return avg, nil
}
In Go, the SQL logic remains the same. The only differences are Go-specific handling of database connections, error checking, and scanning the result into a variable. The rounding is handled by SQL.
Worked Examples
Using the example from the problem statement, condensed here to one row per session:
| user_id | session_id | activity_date |
|---|---|---|
| 1 | 1 | 2019-07-20 |
| 2 | 4 | 2019-07-20 |
| 3 | 2 | 2019-07-21 |
| 3 | 5 | 2019-07-21 |
| 4 | 3 | 2019-06-25 |
- Filter activities in the 30-day window: exclude user 4, whose only activity (2019-06-25) falls before 2019-06-28.
- Select distinct user_id and session_id: user 1 → 1 session, user 2 → 1 session, user 3 → 2 sessions.
- Compute the average: (1 + 1 + 2)/3 = 1.33.
The final output is 1.33.
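To check the intermediate per-user counts, the condensed example can be loaded into an in-memory SQLite database. This is only a verification sketch; the three-column table omits activity_type for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Activity (user_id INT, session_id INT, activity_date TEXT)")
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?)",
    [
        (1, 1, "2019-07-20"),
        (2, 4, "2019-07-20"),
        (3, 2, "2019-07-21"),
        (3, 5, "2019-07-21"),
        (4, 3, "2019-06-25"),
    ],
)

# Inner query only: one row per user who was active in the window.
per_user = conn.execute(
    """
    SELECT user_id, COUNT(DISTINCT session_id) AS session_count
    FROM Activity
    WHERE activity_date BETWEEN '2019-06-28' AND '2019-07-27'
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(per_user)  # [(1, 1), (2, 1), (3, 2)]

average = round(sum(count for _, count in per_user) / len(per_user), 2)
print(average)   # 1.33
```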
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
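| Time | O(N) | Each row is scanned once to filter by date; assumes hash-based deduplication (a sort-based plan adds a log factor) |
| Space | O(U) | The grouped result has one row per user, where U is the number of unique users; the engine also tracks the distinct sessions seen per user while aggregating |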
Without an index on activity_date, a linear scan is the best we can do, because every row must be checked against the time window.
Test Cases
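# Test cases (illustrative fixtures; assumes average_sessions_per_user from the Python solution)
import sqlite3

def make_conn(rows):
    """Build an in-memory Activity table from (user_id, session_id, activity_date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Activity (user_id INT, session_id INT, activity_date TEXT, activity_type TEXT)")
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, 'open_session')", rows)
    return conn

# Provided example: users 1, 2, 3 have 1, 1, 2 sessions; user 4 is outside the window
example = [(1, 1, "2019-07-20"), (2, 4, "2019-07-20"), (3, 2, "2019-07-21"),
           (3, 5, "2019-07-21"), (4, 3, "2019-06-25")]
assert average_sessions_per_user(make_conn(example)) == 1.33

# User with no activity in the period: user 5 is excluded, average = (1+1)/2 = 1.00
assert average_sessions_per_user(make_conn(
    [(1, 1, "2019-07-20"), (2, 2, "2019-07-20"), (5, 9, "2019-05-01")])) == 1.00

# Single user, multiple sessions: user 6 has 3 sessions, average = 3.00
assert average_sessions_per_user(make_conn(
    [(6, 1, "2019-07-01"), (6, 2, "2019-07-02"), (6, 3, "2019-07-03")])) == 3.00

# Duplicate rows: user 1 has 2 distinct sessions, user 2 has 3, average = 2.50
dupes = [(1, 1, "2019-07-20"), (1, 1, "2019-07-20"), (1, 2, "2019-07-21"),
         (2, 3, "2019-07-20"), (2, 4, "2019-07-20"), (2, 4, "2019-07-20"), (2, 5, "2019-07-22")]
assert average_sessions_per_user(make_conn(dupes)) == 2.50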
| Test | Why |
|---|---|
| Provided example | Validates normal scenario with multiple users and sessions |
| User with no activity | Ensures users without activity are excluded |
| Single user multiple sessions | Checks that multiple sessions per user are counted correctly |
| Duplicate rows | Ensures duplicates do not inflate session count |
Edge Cases
One important edge case is a user with no sessions in the last 30 days. The algorithm must exclude such users from the average, which is handled naturally by grouping after filtering by date.
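The sketch below shows why the denominator matters, using made-up rows in which user 5 was last active well before the window. The helper name and variable names are illustrative.

```python
from datetime import date


def in_window(day):
    """True if an ISO date string falls within 2019-06-28..2019-07-27 inclusive."""
    return date(2019, 6, 28) <= date.fromisoformat(day) <= date(2019, 7, 27)


# Hypothetical rows: (user_id, session_id, activity_date)
rows = [(1, 1, "2019-07-20"), (2, 2, "2019-07-21"), (5, 9, "2019-05-01")]

all_users = {user for user, _, _ in rows}
active_pairs = {(user, session) for user, session, day in rows if in_window(day)}
active_users = {user for user, _ in active_pairs}

correct = round(len(active_pairs) / len(active_users), 2)  # divide by active users
wrong = round(len(active_pairs) / len(all_users), 2)       # divide by every user
print(correct, wrong)  # 1.0 0.67
```

Filtering before grouping is what keeps user 5 out of the denominator in the SQL version.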
Another edge case is duplicate activity rows for the same session. Naive counting could overcount sessions. Using COUNT(DISTINCT session_id) ensures each session is counted only once per user.
A third edge case is a session spanning the boundary of the 30-day period, for example starting before the period and ending within it. Only sessions with at least one activity in the period should be counted, which is correctly handled by filtering activities by date before deduplication.
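A small boundary check, run against in-memory SQLite with invented rows: session 7 starts on 2019-06-27 and ends on 2019-06-28, while session 8 happens entirely on 2019-06-27.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (user_id INT, session_id INT, activity_date TEXT, activity_type TEXT)"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 7, "2019-06-27", "open_session"),   # before the window
        (1, 7, "2019-06-28", "end_session"),    # first day of the window
        (2, 8, "2019-06-27", "open_session"),
        (2, 8, "2019-06-27", "end_session"),
    ],
)

# Lower bound of the window as computed by SQLite's DATE() modifier.
lower = conn.execute("SELECT DATE('2019-07-27', '-29 days')").fetchone()[0]
print(lower)    # 2019-06-28

counted = conn.execute(
    """
    SELECT DISTINCT user_id, session_id
    FROM Activity
    WHERE activity_date BETWEEN DATE('2019-07-27', '-29 days') AND '2019-07-27'
    ORDER BY user_id
    """
).fetchall()
print(counted)  # [(1, 7)]
```

Session 7 is counted because one of its activities lands on the first day of the window; session 8 is not, since all of its rows fall a day earlier.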