LeetCode 3056 - Snaps Analysis
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem requires calculating the percentage of time each age group spends on two types of snap activities: sending and opening. We are given two tables: Activities and Age. The Activities table tracks individual user activity with the time spent per action, and the Age table maps each user to an age group bucket. The goal is to aggregate the total time spent per activity type for all users in each age group, compute percentages for each type relative to the total time in that group, and round the results to two decimal places.
The input is relational and normalized. Each user may have multiple activity records, and the solution must sum every activity of every user who falls in an age bucket. The output is a table with age_bucket, send_perc, and open_perc. Edge cases include users with no activities, age groups with a single user, and cases where all activity is of one type, which yields 100% for one activity and 0% for the other. The problem guarantees unique user_id in the Age table and unique activity_id in the Activities table.
Approaches
The brute-force approach involves joining the tables, then iterating through each age group and manually summing the times for each activity type and computing percentages. While correct, this approach requires multiple nested loops and manual aggregation, making it inefficient for large datasets.
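Here is a minimal pure-Python sketch of that brute-force idea. It assumes each table arrives as a list of tuples in the problem's column order. The tuple layout and the name `brute_force_snaps` are illustrative and do not come from the problem. The nested loop re-scans every Age row for every activity, and that re-scan is the source of the O(n * m) cost.

```python
def brute_force_snaps(activities, ages):
    """activities: (activity_id, user_id, activity_type, time_spent) tuples;
    ages: (user_id, age_bucket) tuples. Tuple layout assumed for illustration."""
    # Nested-loop join: for every activity, scan every Age row (O(n * m)).
    joined = []
    for _, user_id, activity_type, time_spent in activities:
        for age_user_id, age_bucket in ages:
            if age_user_id == user_id:
                joined.append((age_bucket, activity_type, time_spent))
    results = []
    # For each distinct bucket, scan all joined rows again to sum each type.
    for bucket in sorted({row[0] for row in joined}):
        send = sum(t for b, a, t in joined if b == bucket and a == "send")
        open_ = sum(t for b, a, t in joined if b == bucket and a == "open")
        total = send + open_
        results.append((bucket, round(send / total * 100, 2), round(open_ / total * 100, 2)))
    return results
```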
The optimal solution leverages SQL-style aggregation: join the tables, then GROUP BY age_bucket, or reproduce the same logic with hash maps in Python/Go. Grouping directly by age_bucket after the join sums time_spent across all of a bucket's users in one pass, so no intermediate per-user grouping is needed. Conditional sums, which add time_spent only when activity_type matches, give the total time for sending and opening snaps. The percentages are derived from those totals.
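You can try the GROUP BY with conditional sums locally using Python's standard `sqlite3` module. This sketch assumes that SQLite's `CASE WHEN` and `ROUND` behave closely enough to the judge's SQL dialect for these inputs. The in-memory tables and the `run_query` helper are illustrative.

```python
import sqlite3

# Conditional sums: each CASE keeps time_spent only for the matching type.
QUERY = """
SELECT g.age_bucket,
       ROUND(100.0 * SUM(CASE WHEN a.activity_type = 'send' THEN a.time_spent ELSE 0 END)
             / SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END), 2) AS send_perc,
       ROUND(100.0 * SUM(CASE WHEN a.activity_type = 'open' THEN a.time_spent ELSE 0 END)
             / SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END), 2) AS open_perc
FROM Activities AS a
JOIN Age AS g ON a.user_id = g.user_id
GROUP BY g.age_bucket
ORDER BY g.age_bucket
"""


def run_query(activities, ages):
    """Load tuples into throwaway in-memory tables and run QUERY (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Activities (activity_id INTEGER, user_id INTEGER, "
                 "activity_type TEXT, time_spent REAL)")
    conn.execute("CREATE TABLE Age (user_id INTEGER, age_bucket TEXT)")
    conn.executemany("INSERT INTO Activities VALUES (?, ?, ?, ?)", activities)
    conn.executemany("INSERT INTO Age VALUES (?, ?)", ages)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

The join plus one GROUP BY does all the work. The outer `ROUND(..., 2)` is applied only after both sums are complete.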
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m) | O(n + m) | Nested-loop join re-scans all m Age rows for each of the n activity rows; inefficient for large tables |
| Optimal | O(n + m) | O(m + k) | Hash map from user_id to age_bucket, then a single aggregation pass; k is the number of age buckets |
Algorithm Walkthrough
- Join the `Activities` and `Age` tables on `user_id` to associate each activity with an age bucket. This ensures every activity has its corresponding age group.
- Initialize a hash map (or dictionary) keyed by `age_bucket` to store the cumulative time for `send` and `open` activities.
- Iterate over the joined table, adding the `time_spent` to the corresponding activity type in the age bucket aggregate.
- After summing, compute the total time per age bucket as the sum of `send` and `open` times.
- Calculate the percentage for each activity type by dividing the time spent by the total time and multiplying by 100.
- Round the percentages to two decimal places.
- Construct the final result table with `age_bucket`, `send_perc`, and `open_perc`.
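The walkthrough steps translate into a short single-pass sketch in plain Python with no pandas dependency. It assumes rows arrive as tuples in table column order. `snaps_one_pass` is a hypothetical name, and the step numbers in the comments refer to the list above.

```python
from collections import defaultdict


def snaps_one_pass(activities, ages):
    # Step 1: index Age rows so the join becomes a dictionary lookup.
    bucket_of = {user_id: bucket for user_id, bucket in ages}
    # Step 2: per-bucket accumulators for send and open time.
    totals = defaultdict(lambda: {"send": 0.0, "open": 0.0})
    # Step 3: one pass over activities; users with no Age row are skipped (inner join).
    for _, user_id, activity_type, time_spent in activities:
        bucket = bucket_of.get(user_id)
        if bucket is not None and activity_type in ("send", "open"):
            totals[bucket][activity_type] += time_spent
    result = []
    for bucket in sorted(totals):
        send, open_ = totals[bucket]["send"], totals[bucket]["open"]
        total = send + open_  # Step 4
        # Steps 5-6: share of the bucket total, rounded to two decimals.
        result.append((bucket, round(send / total * 100, 2), round(open_ / total * 100, 2)))
    return result  # Step 7
```

The dictionary lookup replaces the join. The cost is O(m) to build `bucket_of` plus O(n) for the pass over activities.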
Why it works: Aggregating first per age bucket ensures that all activities are correctly accounted for by group. Conditional sums preserve the activity type distinction. Computing percentages after total aggregation guarantees correct results even with multiple users per age bucket.
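A quick check of the last point shows that averaging each user's own send percentage is not the same as pooling the bucket's times first. The two users below are hypothetical. They are chosen with different total times so that the difference shows.

```python
def pooled_send_perc(users):
    # Sum all users' times first, then take one ratio (what the solution does).
    send = sum(s for s, _ in users)
    total = sum(s + o for s, o in users)
    return round(send / total * 100, 2)


def averaged_send_perc(users):
    # Per-user ratios averaged together (the incorrect alternative).
    per_user = [s / (s + o) * 100 for s, o in users]
    return round(sum(per_user) / len(per_user), 2)


# Hypothetical bucket: (send_time, open_time) for each of two users.
users = [(9.0, 1.0), (1.0, 29.0)]
print(pooled_send_perc(users))    # 25.0
print(averaged_send_perc(users))  # 46.67
```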
Python Solution
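```python
import pandas as pd


def snaps_analysis(activities: pd.DataFrame, age: pd.DataFrame) -> pd.DataFrame:
    # Attach an age bucket to every activity; the inner join drops users missing from Age.
    merged = activities.merge(age, on='user_id', how='inner')
    # Conditional columns: keep time_spent where the type matches, 0 otherwise.
    merged = merged.assign(
        send_time=merged['time_spent'].where(merged['activity_type'] == 'send', 0.0),
        open_time=merged['time_spent'].where(merged['activity_type'] == 'open', 0.0),
    )
    # One vectorized group-by sums both columns per bucket (bucket keys come back sorted).
    agg = merged.groupby('age_bucket', as_index=False)[['send_time', 'open_time']].sum()
    agg['total_time'] = agg['send_time'] + agg['open_time']
    # Percentages relative to the bucket's send + open total, rounded to 2 decimals.
    agg['send_perc'] = (agg['send_time'] / agg['total_time'] * 100).round(2)
    agg['open_perc'] = (agg['open_time'] / agg['total_time'] * 100).round(2)
    return agg[['age_bucket', 'send_perc', 'open_perc']]
```
The code merges the tables to associate activities with age buckets. It then builds two conditional columns (`send_time`, `open_time`) that keep `time_spent` only for the matching activity type, and a single vectorized `groupby(...).sum()` totals both per bucket. Finally, it computes the total time per bucket and derives the percentages, rounded to two decimal places.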
Go Solution
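```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Activity struct {
	ActivityID   int
	UserID       int
	ActivityType string
	TimeSpent    float64
}

type Age struct {
	UserID    int
	AgeBucket string
}

type Result struct {
	AgeBucket string
	SendPerc  float64
	OpenPerc  float64
}

func snapsAnalysis(activities []Activity, ages []Age) []Result {
	// Index Age rows by user ID so each activity's bucket is an O(1) lookup.
	ageMap := make(map[int]string)
	for _, a := range ages {
		ageMap[a.UserID] = a.AgeBucket
	}
	// Sum time per activity type within each age bucket.
	agg := make(map[string]map[string]float64)
	for _, act := range activities {
		ageBucket, ok := ageMap[act.UserID]
		if !ok {
			continue // no Age row for this user, as with an inner join
		}
		if _, exists := agg[ageBucket]; !exists {
			agg[ageBucket] = map[string]float64{"send": 0, "open": 0}
		}
		agg[ageBucket][act.ActivityType] += act.TimeSpent
	}
	var results []Result
	for ageBucket, times := range agg {
		total := times["send"] + times["open"]
		// math.Round rounds to an integer, so scale by 100 before and divide after.
		sendPerc := math.Round(times["send"]/total*100*100) / 100
		openPerc := math.Round(times["open"]/total*100*100) / 100
		results = append(results, Result{AgeBucket: ageBucket, SendPerc: sendPerc, OpenPerc: openPerc})
	}
	// Go map iteration order is randomized; sort for deterministic output.
	sort.Slice(results, func(i, j int) bool {
		return results[i].AgeBucket < results[j].AgeBucket
	})
	return results
}

func main() {
	// Rows from the example data used in the test cases.
	activities := []Activity{
		{7274, 123, "open", 4.50},
		{2425, 123, "send", 3.50},
		{1413, 456, "send", 5.67},
		{2536, 456, "open", 3.00},
		{8564, 456, "send", 8.24},
		{5235, 789, "send", 6.24},
		{4251, 123, "open", 1.25},
		{1435, 789, "open", 5.25},
	}
	ages := []Age{{123, "31-35"}, {789, "21-25"}, {456, "26-30"}}
	for _, r := range snapsAnalysis(activities, ages) {
		fmt.Printf("%s %.2f %.2f\n", r.AgeBucket, r.SendPerc, r.OpenPerc)
	}
}
```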
The Go implementation mirrors the Python logic: map user IDs to age buckets, aggregate activity times by age bucket, then compute and round percentages. Go's `math.Round` only rounds to the nearest integer, so two-decimal rounding is done by scaling: `math.Round(x*100)/100`.
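The Go code rounds with `math.Round`, which rounds exact halves away from zero. Python's built-in `round`, and pandas' `.round` (which goes through NumPy), round exact halves to even. The sketch below mimics the Go expression in Python using the `decimal` module so the two can be compared. `go_style_round` is a hypothetical helper name. The tie value 12.125 is a contrived input and does not come from the problem's data; for typical percentages the two methods agree.

```python
from decimal import Decimal, ROUND_HALF_UP


def go_style_round(x):
    """Mimic Go's math.Round(x*100)/100: scale, round half away from zero, unscale."""
    scaled = Decimal(x * 100)  # exact decimal value of the float product
    return float(scaled.to_integral_value(rounding=ROUND_HALF_UP)) / 100


print(go_style_round(12.125))  # 12.13: the exact tie 1212.5 rounds away from zero
print(round(12.125, 2))        # 12.12: Python rounds the exact tie to even
```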
Worked Examples
Example 1:
Age bucket 31-35 has user 123: send=3.50, open=4.50+1.25=5.75. Total=9.25. Percentages: send=37.84, open=62.16.
Age bucket 26-30 has user 456: send=5.67+8.24=13.91, open=3.00. Total=16.91. Percentages: send=82.26, open=17.74.
Age bucket 21-25 has user 789: send=6.24, open=5.25. Total=11.49. Percentages: send=54.31, open=45.69.
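You can replay the arithmetic above directly from the stated per-bucket sums. The `examples` dict simply restates them.

```python
# (send_time, open_time) per bucket, as summed in the worked examples.
examples = {
    "31-35": (3.50, 4.50 + 1.25),
    "26-30": (5.67 + 8.24, 3.00),
    "21-25": (6.24, 5.25),
}
for bucket, (send, open_) in examples.items():
    total = send + open_
    print(bucket, round(send / total * 100, 2), round(open_ / total * 100, 2))
# 31-35 37.84 62.16
# 26-30 82.26 17.74
# 21-25 54.31 45.69
```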
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m) | One pass over the m Age rows to build the user lookup, then one pass over the n activity rows |
| Space | O(m + k) | The user lookup holds m entries and the aggregate holds k age buckets; the pandas version also materializes an O(n) merged frame |
The algorithm is efficient because it avoids nested loops and performs aggregation in linear time relative to input size.
Test Cases
```python
import pandas as pd

# Assumes snaps_analysis from the Python solution is already defined.
activities = pd.DataFrame({
    'activity_id': [7274, 2425, 1413, 2536, 8564, 5235, 4251, 1435],
    'user_id': [123, 123, 456, 456, 456, 789, 123, 789],
    'activity_type': ['open', 'send', 'send', 'open', 'send', 'send', 'open', 'open'],
    'time_spent': [4.50, 3.50, 5.67, 3.00, 8.24, 6.24, 1.25, 5.25]
})
ages = pd.DataFrame({
    'user_id': [123, 789, 456],
    'age_bucket': ['31-35', '21-25', '26-30']
})
result = snaps_analysis(activities, ages)
# groupby sorts its keys, so rows come back in ascending age_bucket order.
expected = pd.DataFrame({
    'age_bucket': ['21-25', '26-30', '31-35'],
    'send_perc': [54.31, 82.26, 37.84],
    'open_perc': [45.69, 17.74, 62.16]
})
pd.testing.assert_frame_equal(result, expected)
```
| Test | Why |
|---|---|
| Provided example | Validates aggregation and percentage calculation |
| Single user per age group | Ensures sums work when group size=1 |
| All activities same type | Validates 100%/0% handling |
Edge Cases
- Age bucket with no activities: If an age group has users but no recorded activities, it never appears in the joined data. Both implementations therefore omit it from the output instead of dividing by zero. A bucket whose recorded times sum to zero would still divide by zero and produce NaN in both pandas and Go.
- All activities of one type: When an age group has only `send` or only `open` activities, one percentage is 100 and the other 0. Both values are exact, so rounding to two decimals leaves them unchanged.
- Multiple users per age bucket: Aggregation sums across all users within the same age group, not just per user, so the bucket's total time is not underestimated.
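The edge cases can be exercised with a tiny helper that takes a bucket's send and open totals. Returning `None` for a zero total is a hypothetical policy chosen here for illustration. Neither the pandas nor the Go solution includes such a guard.

```python
def bucket_percentages(send, open_):
    """Return (send_perc, open_perc) for one bucket, or None when the total is zero.

    The None result is an assumed policy, not part of the problem statement.
    """
    total = send + open_
    if total == 0:
        return None
    return round(send / total * 100, 2), round(open_ / total * 100, 2)


print(bucket_percentages(7.5, 0.0))   # (100.0, 0.0): only send activity
print(bucket_percentages(0.0, 2.25))  # (0.0, 100.0): only open activity
print(bucket_percentages(0.0, 0.0))   # None: guarded instead of dividing by zero
```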