LeetCode 2688 - Find Active Users
The problem gives us a database table named Users. Each row represents a purchase made by a user. The columns include: - userid, the identifier of the user - item, the purchased product - createdat, the purchase timestamp - amount, the purchase value The table may contain…
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem gives us a database table named Users. Each row represents a purchase made by a user. The columns include:
user_id, the identifier of the useritem, the purchased productcreated_at, the purchase timestampamount, the purchase value
The table may contain duplicate rows, which means we cannot assume uniqueness of purchases.
We need to identify all "active users". A user is considered active if they made at least two purchases where the second purchase happened within 7 days of another purchase made by the same user.
The key detail is that we are comparing purchase dates belonging to the same user. If any pair of purchases for a user differs by at most 7 days, that user must appear in the result.
The output should contain only the user_id values of active users, and the order does not matter.
For example, suppose a user made purchases on:
- 2021-09-10
- 2021-09-14
The difference is 4 days, which is within the allowed 7-day window, so the user is active.
However, if purchases happened on:
- 2021-09-02
- 2021-09-13
The difference is 11 days, so the user is not active.
An important observation is that we do not need to compare every possible pair of dates once the purchases are sorted chronologically. If there exists any valid pair within 7 days, then at least one adjacent pair in sorted order will also satisfy the condition. This insight allows for a much more efficient solution.
There are several edge cases worth considering:
- Users with only one purchase can never be active.
- Multiple purchases on the same day should count because the difference is 0 days.
- Duplicate rows should still work correctly because duplicate timestamps naturally satisfy the condition.
- Purchases may not be stored in chronological order, so sorting is necessary.
- A user may have many purchases spread across long time periods, where only some nearby purchases satisfy the condition.
Approaches
Brute Force Approach
The brute force solution groups purchases by user and compares every pair of purchases for that user.
For each user:
- Collect all purchase dates.
- Compare every pair of dates.
- Compute the absolute difference in days.
- If any pair differs by at most 7 days, mark the user as active.
This approach is correct because it explicitly checks all possible purchase pairs. If a valid pair exists, the algorithm will eventually find it.
However, this method becomes inefficient when users have many purchases. If a user has k purchases, we must perform O(k²) comparisons. Across the entire table, this can become expensive.
Optimal Approach
The better solution relies on sorting each user's purchases by date.
After sorting, we only need to compare adjacent purchases. This works because the closest dates in a sorted sequence must appear next to each other.
Suppose a valid pair exists somewhere in the sorted list. Then there must also exist at least one neighboring pair whose difference is less than or equal to that same value. Therefore, checking adjacent pairs is sufficient.
The algorithm becomes:
- Group purchases by user.
- Sort each user's dates.
- Scan adjacent dates.
- If any neighboring pair differs by at most 7 days, the user is active.
This significantly reduces the number of comparisons.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) in the worst case | O(n) | Compares every pair of purchases for each user |
| Optimal | O(n log n) | O(n) | Sorts dates and checks only adjacent pairs |
Algorithm Walkthrough
- Group all purchase dates by
user_id.
We first organize the data so that every user has a list of their purchase timestamps. This allows us to process each user's purchases independently. 2. Sort the purchase dates for each user.
Purchases may appear in arbitrary order in the database table. Sorting ensures that nearby dates become adjacent in the list. 3. Scan adjacent purchase dates.
For a sorted list of dates:
d1 <= d2 <= d3 <= ...
we compare:
d2 - d1d3 - d2- and so on
If any difference is less than or equal to 7 days, the user is active. 4. Add active users to the result.
Once a valid adjacent pair is found, we include the user in the answer and stop checking further purchases for that user. 5. Return all active user IDs.
Why it works
Sorting guarantees that the minimum distance between any two purchases appears among adjacent elements. If some pair of purchases is within 7 days, then at least one neighboring pair in sorted order must also be within 7 days. Therefore, checking adjacent pairs is sufficient to correctly identify all active users.
Python Solution
from collections import defaultdict
from datetime import datetime
from typing import List
class Solution:
def findActiveUsers(self, users: List[List[str]]) -> List[int]:
purchases_by_user = defaultdict(list)
# Group purchase dates by user
for user_id, item, created_at, amount in users:
purchase_date = datetime.strptime(created_at, "%Y-%m-%d")
purchases_by_user[user_id].append(purchase_date)
active_users = []
# Process each user's purchases
for user_id, purchase_dates in purchases_by_user.items():
purchase_dates.sort()
# Check adjacent purchases
for i in range(1, len(purchase_dates)):
days_difference = (
purchase_dates[i] - purchase_dates[i - 1]
).days
if days_difference <= 7:
active_users.append(user_id)
break
return active_users
The implementation begins by using a defaultdict to group purchase dates by user. Every row is parsed, and the created_at string is converted into a datetime object so that date arithmetic becomes straightforward.
After grouping, each user's purchase dates are sorted chronologically. This is the critical optimization step because it allows the algorithm to check only neighboring purchases.
The loop over adjacent purchases computes the day difference using datetime subtraction. If the difference is at most 7, the user is immediately added to the result list, and we stop processing that user because we already know they are active.
Finally, the list of active users is returned.
Go Solution
package main
import (
"sort"
"time"
)
type User struct {
user_id int
item string
created_at string
amount int
}
func findActiveUsers(users []User) []int {
purchasesByUser := make(map[int][]time.Time)
// Group purchase dates by user
for _, user := range users {
purchaseDate, _ := time.Parse("2006-01-02", user.created_at)
purchasesByUser[user.user_id] = append(
purchasesByUser[user.user_id],
purchaseDate,
)
}
activeUsers := []int{}
// Process each user's purchases
for userID, purchaseDates := range purchasesByUser {
sort.Slice(purchaseDates, func(i, j int) bool {
return purchaseDates[i].Before(purchaseDates[j])
})
// Check adjacent purchases
for i := 1; i < len(purchaseDates); i++ {
diff := purchaseDates[i].Sub(purchaseDates[i-1]).Hours() / 24
if diff <= 7 {
activeUsers = append(activeUsers, userID)
break
}
}
}
return activeUsers
}
The Go implementation follows the same overall strategy as the Python solution. The main difference is that Go uses the time package for date parsing and arithmetic.
Dates are parsed using the layout string "2006-01-02", which is Go's standard formatting convention. Sorting is performed with sort.Slice, and date differences are computed using Sub() followed by conversion from hours to days.
Go slices naturally handle dynamic growth, so appending purchase dates and active user IDs works efficiently.
Worked Examples
Example 1
Input:
+---------+-------------------+------------+--------+
| user_id | item | created_at | amount |
+---------+-------------------+------------+--------+
| 5 | Smart Crock Pot | 2021-09-18 | 698882 |
| 6 | Smart Lock | 2021-09-14 | 11487 |
| 6 | Smart Thermostat | 2021-09-10 | 674762 |
| 8 | Smart Light Strip | 2021-09-29 | 630773 |
| 4 | Smart Cat Feeder | 2021-09-02 | 693545 |
| 4 | Smart Bed | 2021-09-13 | 170249 |
+---------+-------------------+------------+--------+
Step 1, Group Purchases
| User | Purchase Dates |
|---|---|
| 5 | [2021-09-18] |
| 6 | [2021-09-14, 2021-09-10] |
| 8 | [2021-09-29] |
| 4 | [2021-09-02, 2021-09-13] |
Step 2, Sort Purchases
| User | Sorted Dates |
|---|---|
| 5 | [2021-09-18] |
| 6 | [2021-09-10, 2021-09-14] |
| 8 | [2021-09-29] |
| 4 | [2021-09-02, 2021-09-13] |
Step 3, Compare Adjacent Dates
User 5
Only one purchase exists, so the user cannot be active.
User 6
| Pair | Difference |
|---|---|
| 2021-09-10 -> 2021-09-14 | 4 days |
Since 4 <= 7, user 6 is active.
User 8
Only one purchase exists, so the user cannot be active.
User 4
| Pair | Difference |
|---|---|
| 2021-09-02 -> 2021-09-13 | 11 days |
Since 11 > 7, user 4 is not active.
Final Result
[6]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Sorting purchase dates dominates the runtime |
| Space | O(n) | Additional storage is used for grouped purchase lists |
The grouping step processes each purchase exactly once, which costs O(n) time. The expensive part is sorting the purchase lists. In the worst case, all purchases belong to one user, so sorting costs O(n log n).
The algorithm stores all purchase dates in hash map lists, requiring O(n) additional memory.
Test Cases
from collections import defaultdict
from datetime import datetime
class Solution:
def findActiveUsers(self, users):
purchases_by_user = defaultdict(list)
for user_id, item, created_at, amount in users:
purchase_date = datetime.strptime(created_at, "%Y-%m-%d")
purchases_by_user[user_id].append(purchase_date)
active_users = []
for user_id, purchase_dates in purchases_by_user.items():
purchase_dates.sort()
for i in range(1, len(purchase_dates)):
if (purchase_dates[i] - purchase_dates[i - 1]).days <= 7:
active_users.append(user_id)
break
return sorted(active_users)
solution = Solution()
# Provided example
assert solution.findActiveUsers([
[5, "Smart Crock Pot", "2021-09-18", 698882],
[6, "Smart Lock", "2021-09-14", 11487],
[6, "Smart Thermostat", "2021-09-10", 674762],
[8, "Smart Light Strip", "2021-09-29", 630773],
[4, "Smart Cat Feeder", "2021-09-02", 693545],
[4, "Smart Bed", "2021-09-13", 170249],
]) == [6] # basic example
# Same day purchases
assert solution.findActiveUsers([
[1, "A", "2021-01-01", 10],
[1, "B", "2021-01-01", 20],
]) == [1] # difference is 0 days
# Exactly 7 days apart
assert solution.findActiveUsers([
[2, "A", "2021-01-01", 10],
[2, "B", "2021-01-08", 20],
]) == [2] # inclusive boundary
# More than 7 days apart
assert solution.findActiveUsers([
[3, "A", "2021-01-01", 10],
[3, "B", "2021-01-09", 20],
]) == [] # outside allowed window
# Multiple purchases where only one pair qualifies
assert solution.findActiveUsers([
[4, "A", "2021-01-01", 10],
[4, "B", "2021-02-01", 20],
[4, "C", "2021-02-05", 30],
]) == [4] # later pair qualifies
# Single purchase user
assert solution.findActiveUsers([
[5, "A", "2021-01-01", 10],
]) == [] # cannot be active
# Duplicate records
assert solution.findActiveUsers([
[6, "A", "2021-01-01", 10],
[6, "A", "2021-01-01", 10],
]) == [6] # duplicates still count
# Multiple users mixed together
assert solution.findActiveUsers([
[1, "A", "2021-01-01", 10],
[1, "B", "2021-01-20", 20],
[2, "C", "2021-02-01", 30],
[2, "D", "2021-02-05", 40],
[3, "E", "2021-03-01", 50],
]) == [2] # only one active user
| Test | Why |
|---|---|
| Provided example | Verifies correctness on the official sample |
| Same day purchases | Ensures 0-day difference counts |
| Exactly 7 days apart | Confirms inclusive boundary handling |
| More than 7 days apart | Ensures invalid pairs are rejected |
| Multiple purchases with one valid pair | Verifies partial matches work |
| Single purchase user | Ensures at least two purchases are required |
| Duplicate records | Confirms duplicates are handled correctly |
| Multiple users mixed | Verifies grouping logic works properly |
Edge Cases
Users With Only One Purchase
A user who appears only once in the table can never satisfy the condition because there is no second purchase to compare against. A naive implementation might accidentally treat a single transaction as valid if it does not properly check pair counts. This implementation naturally handles the case because the adjacent comparison loop never executes for lists of length 1.
Purchases Exactly 7 Days Apart
The problem explicitly states that "within 7 days" is inclusive. This means a difference of exactly 7 days should count as active. Off-by-one errors are very common here. The implementation correctly uses <= 7 instead of < 7.
Duplicate Purchase Records
The table may contain duplicate rows. Two identical timestamps produce a difference of 0 days, which should qualify the user as active. Some implementations may incorrectly remove duplicates before processing, potentially changing the intended behavior. This solution keeps duplicates intact, ensuring correctness.
Purchases Stored Out of Order
The input data is not guaranteed to be sorted chronologically. Without sorting, checking only consecutive rows from the input could miss valid pairs. The implementation explicitly sorts each user's purchase dates before comparing adjacent entries, guaranteeing correct results regardless of input order.