LeetCode 3056 - Snaps Analysis

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem requires calculating the percentage of time each age group spends on two types of snap activities: sending and opening. We are given two tables: Activities and Age. The Activities table tracks individual user activity with the time spent per action, and the Age table maps each user to an age group bucket. The goal is to aggregate the total time spent per activity type for all users in each age group, compute percentages for each type relative to the total time in that group, and round the results to two decimal places.

The input is relational and normalized. Each user may have multiple activity records, so the solution must sum time_spent across every activity of every user in an age bucket, not just per user. The output is a table with age_bucket, send_perc, and open_perc. Edge cases include users with no activities, age groups with a single user, and groups where all activity is of one type, which yields 100% for that type and 0% for the other. The problem guarantees unique user_id in the Age table and unique activity_id in the Activities table.

Approaches

The brute-force approach involves joining the tables, then iterating through each age group and manually summing the times for each activity type and computing percentages. While correct, this approach requires multiple nested loops and manual aggregation, making it inefficient for large datasets.

The optimal solution uses SQL-style aggregation: a GROUP BY on age_bucket after joining the tables, or the equivalent logic in Python/Go. Because the join already attaches an age bucket to every activity row, there is no need for an intermediate per-user grouping. Conditional sums give the total time for sending and for opening snaps in each bucket, and the percentages are derived from those two totals.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n * m)	O(n + m)	Iterates all users per age group; inefficient for large tables
Optimal	O(n + m)	O(m + k)	Single-pass aggregation using hash maps; n = activity rows, m = users, k = age buckets
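
The GROUP BY formulation described above can be sketched as a single query. The version below is a minimal illustration, not the reference solution: it runs the query through Python's built-in sqlite3 module, so the SQL is in SQLite's dialect (an assumption; the judge's dialect may differ slightly), and the helper name run_query and the column types are hypothetical. The table and column names come from the problem description.

```python
import sqlite3

# Conditional sums per age bucket, divided by the bucket's total time.
# The WHERE clause keeps only the two activity types the percentages cover.
QUERY = """
SELECT ag.age_bucket,
       ROUND(100.0 * SUM(CASE WHEN ac.activity_type = 'send'
                              THEN ac.time_spent ELSE 0 END)
             / SUM(ac.time_spent), 2) AS send_perc,
       ROUND(100.0 * SUM(CASE WHEN ac.activity_type = 'open'
                              THEN ac.time_spent ELSE 0 END)
             / SUM(ac.time_spent), 2) AS open_perc
FROM Activities AS ac
JOIN Age AS ag ON ag.user_id = ac.user_id
WHERE ac.activity_type IN ('send', 'open')
GROUP BY ag.age_bucket
ORDER BY ag.age_bucket
"""


def run_query(activity_rows, age_rows):
    """Hypothetical helper: load the rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activities (activity_id INTEGER PRIMARY KEY, "
        "user_id INTEGER, activity_type TEXT, time_spent REAL)"
    )
    conn.execute("CREATE TABLE Age (user_id INTEGER PRIMARY KEY, age_bucket TEXT)")
    conn.executemany("INSERT INTO Activities VALUES (?, ?, ?, ?)", activity_rows)
    conn.executemany("INSERT INTO Age VALUES (?, ?)", age_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample_activities = [(1, 123, "open", 4.50), (2, 123, "send", 3.50)]
    sample_ages = [(123, "31-35")]
    for row in run_query(sample_activities, sample_ages):
        print(row)
```

The ORDER BY is only there to make the output deterministic; it is not part of the aggregation itself.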

Algorithm Walkthrough

Join the Activities and Age tables on user_id to associate each activity with an age bucket. This ensures every activity has its corresponding age group.
Initialize a hash map (or dictionary) keyed by age_bucket to store the cumulative time for send and open activities.
Iterate over the joined table, adding the time_spent to the corresponding activity type in the age bucket aggregate.
After summing, compute the total time per age bucket as the sum of send and open times.
Calculate the percentage for each activity type by dividing the time spent by the total time and multiplying by 100.
Round the percentages to two decimal places.
Construct the final result table with age_bucket, send_perc, and open_perc.

Why it works: Aggregating first per age bucket ensures that all activities are correctly accounted for by group. Conditional sums preserve the activity type distinction. Computing percentages after total aggregation guarantees correct results even with multiple users per age bucket.
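
The steps above can be sketched without any library, using plain dictionaries as the hash maps. This is a minimal illustration under stated assumptions: the function name snaps_analysis_rows and the tuple shapes of the inputs are hypothetical, chosen only to keep the example self-contained.

```python
from collections import defaultdict


def snaps_analysis_rows(activities, ages):
    """Sketch of the walkthrough.

    activities: iterable of (user_id, activity_type, time_spent) tuples (assumed shape)
    ages: iterable of (user_id, age_bucket) tuples (assumed shape)
    Returns {age_bucket: (send_perc, open_perc)}.
    """
    # Step 1: the "join" is a lookup from user_id to age_bucket.
    bucket_of = dict(ages)

    # Step 2: one accumulator per age bucket, split by activity type.
    totals = defaultdict(lambda: {"send": 0.0, "open": 0.0})

    # Step 3: single pass over the activity rows.
    for user_id, activity_type, time_spent in activities:
        bucket = bucket_of.get(user_id)
        if bucket is None or activity_type not in ("send", "open"):
            continue  # inner-join semantics: rows without a bucket are dropped
        totals[bucket][activity_type] += time_spent

    # Steps 4-7: total per bucket, percentages, rounding, result.
    result = {}
    for bucket, times in totals.items():
        total = times["send"] + times["open"]
        if total == 0:
            continue  # nothing to divide by
        result[bucket] = (
            round(times["send"] / total * 100, 2),
            round(times["open"] / total * 100, 2),
        )
    return result


if __name__ == "__main__":
    rows = [(123, "open", 4.50), (123, "send", 3.50), (123, "open", 1.25)]
    print(snaps_analysis_rows(rows, [(123, "31-35")]))
```

Returning a dictionary rather than a table keeps the sketch short; converting it to rows of (age_bucket, send_perc, open_perc) is a one-line loop.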

Python Solution

import pandas as pd

def snaps_analysis(activities: pd.DataFrame, age: pd.DataFrame) -> pd.DataFrame:
    # Attach an age bucket to every activity row; users missing from Age drop out.
    merged = activities.merge(age, on='user_id', how='inner')
    # Split time_spent into one column per activity type so a plain sum aggregates each.
    merged['send_time'] = merged['time_spent'].where(merged['activity_type'] == 'send', 0.0)
    merged['open_time'] = merged['time_spent'].where(merged['activity_type'] == 'open', 0.0)
    agg = merged.groupby('age_bucket', as_index=False)[['send_time', 'open_time']].sum()
    # Percentages are taken relative to the combined send + open time of the bucket.
    total = agg['send_time'] + agg['open_time']
    agg['send_perc'] = (agg['send_time'] / total * 100).round(2)
    agg['open_perc'] = (agg['open_time'] / total * 100).round(2)
    return agg[['age_bucket', 'send_perc', 'open_perc']]

The code merges the tables to associate each activity with an age bucket, splits time_spent into send_time and open_time columns with Series.where, and sums both per age_bucket. It then derives each percentage from the bucket total and rounds to two decimal places before returning the final table. Note that groupby returns the rows sorted by age_bucket.

Go Solution

package main

import "math"

type Activity struct {
    ActivityID   int
    UserID       int
    ActivityType string
    TimeSpent    float64
}

type Age struct {
    UserID    int
    AgeBucket string
}

type Result struct {
    AgeBucket string
    SendPerc  float64
    OpenPerc  float64
}

func snapsAnalysis(activities []Activity, ages []Age) []Result {
    ageMap := make(map[int]string)
    for _, a := range ages {
        ageMap[a.UserID] = a.AgeBucket
    }

    agg := make(map[string]map[string]float64)
    for _, act := range activities {
        ageBucket, ok := ageMap[act.UserID]
        if !ok {
            continue
        }
        if _, exists := agg[ageBucket]; !exists {
            agg[ageBucket] = map[string]float64{"send": 0, "open": 0}
        }
        agg[ageBucket][act.ActivityType] += act.TimeSpent
    }

    var results []Result
    for ageBucket, times := range agg {
        total := times["send"] + times["open"]
        sendPerc := math.Round(times["send"]/total*100*100) / 100
        openPerc := math.Round(times["open"]/total*100*100) / 100
        results = append(results, Result{AgeBucket: ageBucket, SendPerc: sendPerc, OpenPerc: openPerc})
    }
    return results
}

func main() {
    // Example usage omitted for brevity
}

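The Go implementation mirrors the Python logic: map user IDs to age buckets, aggregate activity times by age bucket, then compute and round percentages. Go has no built-in rounding to two decimals, so the code scales the percentage by 100, applies math.Round, and scales back. Because the results are built by ranging over a map, their order is unspecified; sort the slice by AgeBucket if a stable order is needed.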

Worked Examples

Example 1:

Age bucket 31-35 has user 123: send=3.50, open=4.50+1.25=5.75. Total=9.25. Percentages: send=37.84, open=62.16.

Age bucket 26-30 has user 456: send=5.67+8.24=13.91, open=3.00. Total=16.91. Percentages: send=82.26, open=17.74.

Age bucket 21-25 has user 789: send=6.24, open=5.25. Total=11.49. Percentages: send=54.31, open=45.69.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n + m)	One pass over the m Age rows to build the user lookup, then one pass over the n activity rows
Space	O(m + k)	User-to-bucket lookup holds m entries; aggregated times hold one entry per age bucket (k buckets)

The algorithm is efficient because it avoids nested loops and performs aggregation in linear time relative to input size.

Test Cases

import pandas as pd

activities = pd.DataFrame({
    'activity_id': [7274,2425,1413,2536,8564,5235,4251,1435],
    'user_id': [123,123,456,456,456,789,123,789],
    'activity_type': ['open','send','send','open','send','send','open','open'],
    'time_spent': [4.50,3.50,5.67,3.00,8.24,6.24,1.25,5.25]
})

ages = pd.DataFrame({
    'user_id': [123,789,456],
    'age_bucket': ['31-35','21-25','26-30']
})

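result = snaps_analysis(activities, ages)
expected = pd.DataFrame({
    'age_bucket': ['31-35','26-30','21-25'],
    'send_perc': [37.84,82.26,54.31],
    'open_perc': [62.16,17.74,45.69]
})
# groupby returns rows sorted by age_bucket, so a positional comparison against
# the order above would fail; sort both frames the same way before comparing.
result = result.sort_values('age_bucket').reset_index(drop=True)
expected = expected.sort_values('age_bucket').reset_index(drop=True)
pd.testing.assert_frame_equal(result, expected)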

Test	Why
Provided example	Validates aggregation and percentage calculation
Single user per age group	Ensures sums work when group size=1
All activities same type	Validates 100%/0% handling

Edge Cases

Age bucket with no activities: If an age group has users but no recorded activities, its total time would be zero. The implementation avoids division by zero by only processing age buckets present in the activity data, so such buckets are absent from the output.
All activities of one type: When an age group has only send or only open activities, one percentage should be 100 and the other 0. The missing type simply contributes a sum of 0, so no special-casing is needed.
Multiple users per age bucket: Ensures that aggregation sums across multiple users within the same age group, not just per user. This avoids underestimating total times.
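
The three cases above can be illustrated with a small helper that turns a bucket's two totals into percentages. This is a sketch only: to_percentages is a hypothetical name, and returning None for an empty bucket is one possible convention, not something the problem prescribes. The pooled example reuses the times of users 123 and 789 from the provided data as if they shared a bucket, purely for illustration.

```python
def to_percentages(send_time, open_time):
    """Hypothetical helper: (send_perc, open_perc) for one age bucket,
    or None when the bucket has no recorded time at all."""
    total = send_time + open_time
    if total == 0:
        return None  # guard against division by zero
    return (
        round(send_time / total * 100, 2),
        round(open_time / total * 100, 2),
    )


if __name__ == "__main__":
    # Age bucket with no activities: nothing to report.
    print(to_percentages(0.0, 0.0))
    # All activities of one type: 100 for that type, 0 for the other.
    print(to_percentages(7.5, 0.0))
    # Multiple users per bucket: pool the per-user sums before dividing.
    print(to_percentages(3.50 + 6.24, 5.75 + 5.25))
```

Pooling the sums first matters: averaging each user's own percentages would weight a light user the same as a heavy one and give a different answer.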