LeetCode 3497 - Analyze Subscription Conversion
This is a SQL aggregation problem where we need to analyze user subscription behavior and return statistics only for users who successfully converted from a free trial to a paid subscription. The UserActivity table contains one row per user, per day, per activity type.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The UserActivity table contains one row per user, per day, per activity type. Each row records:
- The user identifier (user_id)
- The date of the activity (activity_date)
- The subscription status for that activity (activity_type)
- The number of minutes the user spent on the platform that day (activity_duration)

A user can have activities marked as:
- free_trial
- paid
- cancelled

The goal is to identify users who have participated in both the free trial and paid phases, which indicates that they converted from a trial user into a paying customer.
For every converted user, we must calculate two separate averages:
- The average daily activity duration during all free_trial records.
- The average daily activity duration during all paid records.

Both averages must be rounded to two decimal places.
The output should contain only users who converted to paid subscriptions. Users who only participated in a free trial and later cancelled should not appear in the result. The final result must be ordered by user_id in ascending order.
The important observation is that the problem does not require reconstructing an exact 7-day trial window. Instead, it simply asks us to compute averages over rows labeled free_trial and rows labeled paid. A user is considered converted if they have at least one row of each type.
Important edge cases include users who never converted, users who converted and later cancelled, users with only one paid record, and users with varying numbers of trial and paid activity records. Since averages are computed only within each activity type, the number of records in each phase can differ.
The output should contain:
| Column | Meaning |
|---|---|
| user_id | User who converted |
| trial_avg_duration | Average duration during free trial |
| paid_avg_duration | Average duration during paid period |
The result must be ordered by user_id in ascending order.
Conversion is defined purely by the presence of both free_trial and paid rows; we are not asked to analyze dates or verify that the paid subscription began immediately after the trial. The table guarantees uniqueness of (user_id, activity_date, activity_type), so duplicate rows do not need special handling.
Approaches
Brute Force Approach
A brute force solution would process each user independently. First, gather all distinct users. Then, for every user, scan the entire table to collect all trial records and paid records belonging to that user. Once collected, compute the averages and determine whether the user converted.
This approach is correct because every user's records are explicitly examined and classified. However, it repeatedly scans the entire table for every user. If there are n rows and u users, the complexity becomes O(u × n), which is unnecessarily expensive.
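To make the O(u × n) cost concrete, here is a minimal Python sketch of the brute-force idea. It is only an illustration: the in-memory tuple layout, the function name `brute_force_conversion`, and the sample rows are assumptions, not part of the LeetCode problem. Python's `round` may also differ from a database's ROUND on exact ties.

```python
from typing import List, Tuple

# Hypothetical in-memory row layout: (user_id, activity_type, activity_duration).
Row = Tuple[int, str, int]


def brute_force_conversion(rows: List[Row]) -> List[Tuple[int, float, float]]:
    """For every distinct user, rescan all rows: O(u * n) work overall."""
    result = []
    for user in sorted({r[0] for r in rows}):  # u distinct users, ascending
        # Each comprehension walks the full row list again for this user.
        trial = [d for (uid, kind, d) in rows if uid == user and kind == "free_trial"]
        paid = [d for (uid, kind, d) in rows if uid == user and kind == "paid"]
        if trial and paid:  # converted: at least one row of each type
            result.append((user,
                           round(sum(trial) / len(trial), 2),
                           round(sum(paid) / len(paid), 2)))
    return result


# Illustrative rows only: user 1 converts, user 2 never pays.
sample_rows: List[Row] = [
    (1, "free_trial", 45), (1, "free_trial", 30), (1, "paid", 75),
    (2, "free_trial", 55), (2, "cancelled", 0),
]
print(brute_force_conversion(sample_rows))
```

The nested scan is what the grouping approach removes: the inner comprehensions touch every row once per user.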
Key Insight
The table already contains all information needed for aggregation. Instead of repeatedly scanning the data, we can group rows by user_id and compute all required statistics in a single aggregation pass.
For each user, we need:
- Average duration of free_trial rows.
- Average duration of paid rows.
- Verification that at least one free_trial row exists.
- Verification that at least one paid row exists.

SQL aggregation functions such as AVG() combined with conditional expressions make this straightforward. By grouping by user_id, we compute both averages simultaneously. Then, a HAVING clause filters out users who do not have both activity types.
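The single-pass idea can be sketched outside SQL as well. The Python below imitates what a grouping operator might do internally, keeping a running sum and count per user. The function name `single_pass_conversion`, the tuple layout, and the sample rows are assumptions made for illustration; a real database engine may aggregate differently.

```python
from collections import defaultdict


def single_pass_conversion(rows):
    """One pass over the rows, keeping sums and counts per user."""
    # Per-user state: [trial_sum, trial_count, paid_sum, paid_count]
    acc = defaultdict(lambda: [0, 0, 0, 0])
    for user_id, activity_type, duration in rows:
        state = acc[user_id]
        if activity_type == "free_trial":
            state[0] += duration
            state[1] += 1
        elif activity_type == "paid":
            state[2] += duration
            state[3] += 1
        # Any other type (such as cancelled) changes neither average.
    return [
        (user_id, round(ts / tc, 2), round(ps / pc, 2))
        for user_id, (ts, tc, ps, pc) in sorted(acc.items())
        if tc > 0 and pc > 0  # plays the role of the HAVING clause
    ]


# Illustrative rows: user 1 converts and later cancels, user 2 never pays.
rows = [
    (1, "free_trial", 40), (1, "free_trial", 35), (1, "paid", 45), (1, "cancelled", 0),
    (2, "free_trial", 55), (2, "cancelled", 0),
]
print(single_pass_conversion(rows))
```

Each row is touched once, and the state kept is proportional to the number of distinct users, which matches the O(n) time and O(u) space claimed for the grouped query.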
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(u × n) | O(u) | Repeatedly scans the table for each user |
| Optimal | O(n) | O(u) | Single grouping pass using SQL aggregation |
Algorithm Walkthrough
Optimal SQL Algorithm
1. Group all rows by user_id. This allows us to compute statistics independently for every user.
2. Calculate the trial average. Use a conditional expression that includes activity_duration only when activity_type = 'free_trial'. Applying AVG() over these values produces the average free-trial duration.
3. Calculate the paid average. Similarly, include activity_duration only when activity_type = 'paid', then compute the average.
4. Determine whether the user converted. Count how many rows belong to the free_trial category and how many belong to the paid category.
5. Keep only converted users. A user qualifies only if both counts are greater than zero.
6. Round both averages to two decimal places. Use ROUND(..., 2).
7. Order the final result by user_id. This satisfies the output requirement.

Why it works
The grouping operation ensures all records for a user are processed together. Conditional aggregation isolates free-trial and paid records while ignoring unrelated activity types. The HAVING clause guarantees that only users with both trial and paid activity are retained. Therefore, every returned row corresponds exactly to a converted user, and the averages are computed over the correct subsets of records.
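The reliance on NULL is easy to get wrong, so here is a small sketch of the difference between a CASE with no ELSE branch and a CASE with ELSE 0. It uses Python's built-in sqlite3 module purely as a convenient local engine (LeetCode itself runs the query on its own database). The table `t` and its columns are hypothetical names for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table used only for this demonstration.
conn.execute("CREATE TABLE t (kind TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("free_trial", 40), ("free_trial", 35), ("paid", 45), ("cancelled", 0)],
)

avg_null, avg_zero = conn.execute(
    """
    SELECT
        AVG(CASE WHEN kind = 'free_trial' THEN val END),        -- other rows become NULL
        AVG(CASE WHEN kind = 'free_trial' THEN val ELSE 0 END)  -- other rows become 0
    FROM t
    """
).fetchone()
conn.close()

# avg_null averages only the two trial rows; avg_zero averages all four rows.
print(avg_null, avg_zero)
```

With no ELSE branch the non-matching rows are skipped by AVG(), giving (40 + 35) / 2. With ELSE 0 they are counted as zeros, giving (40 + 35 + 0 + 0) / 4, which is the kind of silently wrong average the conditional form avoids.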
Python Solution
For LeetCode Database problems, the expected answer is an SQL query rather than executable Python code. Only a placeholder is shown here for completeness.
# Database problem: submit SQL instead of Python
class Solution:
    pass
The actual LeetCode submission should be SQL. The core idea is to group by user, compute conditional averages, and filter users that have both free-trial and paid records.
SQL Solution
SELECT
    user_id,
    ROUND(
        AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END),
        2
    ) AS trial_avg_duration,
    ROUND(
        AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END),
        2
    ) AS paid_avg_duration
FROM UserActivity
GROUP BY user_id
-- In MySQL a comparison evaluates to 1 or 0, so SUM(...) counts matching rows.
HAVING
    SUM(activity_type = 'free_trial') > 0
    AND SUM(activity_type = 'paid') > 0
ORDER BY user_id;
The query first groups records by user. The conditional AVG() expressions calculate averages for the desired activity types while ignoring all other rows through NULL values. The HAVING clause ensures that both free-trial and paid records exist for the user. Finally, the averages are rounded and the result is sorted by user ID.
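For readers who want to try the query locally, the sketch below loads the example durations into an in-memory SQLite database and runs the same statement. SQLite is an assumption made for convenience, not the engine LeetCode uses; it also treats a comparison as 1 or 0, so the HAVING clause carries over unchanged. The activity_date column is left out because the example table in this write-up does not list dates, and `run_query` is a hypothetical helper name.

```python
import sqlite3

QUERY = """
SELECT
    user_id,
    ROUND(AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END), 2)
        AS trial_avg_duration,
    ROUND(AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END), 2)
        AS paid_avg_duration
FROM UserActivity
GROUP BY user_id
HAVING SUM(activity_type = 'free_trial') > 0
   AND SUM(activity_type = 'paid') > 0
ORDER BY user_id;
"""

# Example rows from this write-up: (user_id, activity_type, activity_duration).
ROWS = [
    (1, "free_trial", 45), (1, "free_trial", 30), (1, "free_trial", 60),
    (1, "paid", 75), (1, "paid", 90), (1, "paid", 65),
    (2, "free_trial", 55), (2, "free_trial", 25), (2, "free_trial", 50),
    (2, "cancelled", 0),
    (3, "free_trial", 70), (3, "free_trial", 60), (3, "free_trial", 80),
    (3, "paid", 50), (3, "paid", 55), (3, "paid", 85),
    (4, "free_trial", 40), (4, "free_trial", 35), (4, "paid", 45),
    (4, "cancelled", 0),
]


def run_query(rows):
    """Load rows into a throwaway SQLite table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserActivity ("
        "user_id INTEGER, activity_type TEXT, activity_duration INTEGER)"
    )
    conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_query(ROWS):
    print(row)
```

Only users 1, 3, and 4 are returned; user 2 has no paid rows and is removed by the HAVING clause.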
Go Solution
Since this is a database problem, LeetCode expects SQL rather than Go code. A Go implementation is not applicable to the actual submission format.
// Database problem: submit SQL instead of Go code.
package main
func main() {
}
The solution logic remains identical regardless of language. Aggregate records by user, compute conditional averages for each subscription phase, filter converted users, and sort by user ID.
Worked Examples
Example Walkthrough
Input:
| user_id | activity_type | activity_duration |
|---|---|---|
| 1 | free_trial | 45 |
| 1 | free_trial | 30 |
| 1 | free_trial | 60 |
| 1 | paid | 75 |
| 1 | paid | 90 |
| 1 | paid | 65 |
| 2 | free_trial | 55 |
| 2 | free_trial | 25 |
| 2 | free_trial | 50 |
| 2 | cancelled | 0 |
| 3 | free_trial | 70 |
| 3 | free_trial | 60 |
| 3 | free_trial | 80 |
| 3 | paid | 50 |
| 3 | paid | 55 |
| 3 | paid | 85 |
| 4 | free_trial | 40 |
| 4 | free_trial | 35 |
| 4 | paid | 45 |
| 4 | cancelled | 0 |

User 1 Aggregation State
| Metric | Value |
|---|---|
| Trial durations | 45, 30, 60 |
| Trial sum | 135 |
| Trial count | 3 |
| Trial average | 45.00 |
| Paid durations | 75, 90, 65 |
| Paid sum | 230 |
| Paid count | 3 |
| Paid average | 76.67 |

Since both trial and paid records exist, user 1 is included.

User 2
| Metric | Value |
|---|---|
| Trial durations | 55, 25, 50 |
| Paid records | None |
| Cancelled records | Present |

The HAVING condition fails because there are no paid rows. User 2 is excluded.

User 3
| Metric | Value |
|---|---|
| Trial durations | 70, 60, 80 |
| Trial average | 70.00 |
| Paid durations | 50, 55, 85 |
| Paid average | 63.33 |

The user has both trial and paid records, so they are included.

User 4
| Metric | Value |
|---|---|
| Trial durations | 40, 35 |
| Trial average | 37.50 |
| Paid durations | 45 |
| Paid average | 45.00 |

Even though the user later cancelled, they successfully converted to paid first. Therefore they remain in the result.
After grouping by user:
| User | Trial Durations | Paid Durations |
|---|---|---|
| 1 | 45, 30, 60 | 75, 90, 65 |
| 2 | 55, 25, 50 | None |
| 3 | 70, 60, 80 | 50, 55, 85 |
| 4 | 40, 35 | 45 |
Average calculations:
| User | Trial Average | Paid Average | Included? |
|---|---|---|---|
| 1 | (45+30+60)/3 = 45.00 | (75+90+65)/3 = 76.67 | Yes |
| 2 | 43.33 | NULL | No |
| 3 | 70.00 | 63.33 | Yes |
| 4 | 37.50 | 45.00 | Yes |
Final result:
| user_id | trial_avg_duration | paid_avg_duration |
|---|---|---|
| 1 | 45.00 | 76.67 |
| 3 | 70.00 | 63.33 |
| 4 | 37.50 | 45.00 |
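The arithmetic in the tables above can be re-checked with a few lines of plain Python. This is only a verification sketch: the dictionary layout and the helper name `fmt_avg` are assumptions, and `None` stands in for the SQL NULL that user 2 gets for the paid average.

```python
# Durations per user and activity type, copied from the grouped table above.
durations = {
    1: {"free_trial": [45, 30, 60], "paid": [75, 90, 65]},
    2: {"free_trial": [55, 25, 50], "paid": []},
    3: {"free_trial": [70, 60, 80], "paid": [50, 55, 85]},
    4: {"free_trial": [40, 35], "paid": [45]},
}


def fmt_avg(values):
    """Average formatted to two decimals; None plays the role of SQL NULL."""
    return format(sum(values) / len(values), ".2f") if values else None


# Averages for every user, then only those with both averages present.
table = {u: (fmt_avg(d["free_trial"]), fmt_avg(d["paid"])) for u, d in durations.items()}
final = {u: avgs for u, avgs in table.items() if None not in avgs}

for user_id, (trial_avg, paid_avg) in final.items():
    print(user_id, trial_avg, paid_avg)
```

User 2 still gets a trial average of 43.33 in `table`, but the missing paid average keeps that user out of `final`, mirroring the "Included?" column.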
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Every row is processed once during grouping |
| Space | O(u) | Aggregation state is maintained per user |
| Sort Output | O(u log u) | Ordering by user_id may require sorting grouped results |
The dominant work comes from scanning the table and maintaining aggregate values for each user. Database engines typically perform the grouping in a single pass or through optimized aggregation strategies, so the grouping cost is linear in the number of rows, and the grouping structure stores information proportional to the u distinct users. The final ordering step depends on the number of grouped users rather than the total number of activity records.
Test Cases
# Conceptual test cases for the SQL logic (placeholders; the real check is the query output)
# Example case from the statement
assert True  # users 1, 3, 4 should appear
# User has free_trial only
assert True  # should not appear
# User has free_trial and cancelled but never pays
assert True  # should not appear
# User has paid only
assert True  # should not appear
# User has exactly one trial row and one paid row
assert True  # should appear; averages equal those values
# User has one paid record after several trial records
assert True  # paid average equals that single duration
# User has multiple trial rows and multiple paid rows
assert True  # each average is computed over its own rows
# User converts and later cancels
assert True  # still appears
# User has a zero-duration paid activity
assert True  # zero is included in the average
# Large number of activities
assert True  # aggregation remains correct
# Multiple users with mixed conversion states
assert True  # only converted users are returned
# Output ordering by user_id
assert True  # ascending order
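The placeholder assertions above can be turned into executable checks. The sketch below is one possible harness, assuming an in-memory SQLite database as a stand-in for the judge's engine, a hypothetical helper named `convert`, and a three-column table without activity_date. The rows in each case are invented for the test and are not taken from the problem statement.

```python
import sqlite3

QUERY = """
SELECT
    user_id,
    ROUND(AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END), 2),
    ROUND(AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END), 2)
FROM UserActivity
GROUP BY user_id
HAVING SUM(activity_type = 'free_trial') > 0
   AND SUM(activity_type = 'paid') > 0
ORDER BY user_id
"""


def convert(rows):
    """Run the conversion query against a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserActivity ("
        "user_id INTEGER, activity_type TEXT, activity_duration INTEGER)"
    )
    conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


# Trial only, trial then cancelled, and paid only: all excluded.
assert convert([(1, "free_trial", 30)]) == []
assert convert([(1, "free_trial", 30), (1, "cancelled", 0)]) == []
assert convert([(1, "paid", 30)]) == []

# Minimum valid conversion: one trial row and one paid row.
assert convert([(1, "free_trial", 30), (1, "paid", 50)]) == [(1, 30.0, 50.0)]

# Converting and later cancelling keeps the user in the result.
assert convert(
    [(1, "free_trial", 30), (1, "paid", 50), (1, "cancelled", 0)]
) == [(1, 30.0, 50.0)]

# A zero-duration paid row still counts toward the paid average: (0 + 50) / 2.
assert convert(
    [(1, "free_trial", 30), (1, "paid", 0), (1, "paid", 50)]
) == [(1, 30.0, 25.0)]

# Output is ordered by user_id regardless of insertion order.
mixed = [(3, "free_trial", 10), (3, "paid", 10), (1, "free_trial", 10), (1, "paid", 10)]
assert [r[0] for r in convert(mixed)] == [1, 3]
```

The durations are chosen so every average is exactly representable, which keeps the equality checks free of floating-point noise.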
Test Summary
| Test | Why |
|---|---|
| Example from statement | Validates expected output |
| Trial then paid | Basic conversion case |
| Trial only | Ensures non-converters are excluded |
| Trial then cancelled | Should be excluded |
| Paid only | Ensures users without a trial are excluded |
| One trial and one paid | Minimum valid conversion |
| Single paid record | Valid average calculation |
| Multiple trial records | Verifies trial average calculation |
| Multiple paid records | Verifies paid average calculation |
| Paid then cancelled | Confirms cancellation does not remove converted users |
| Zero-duration activity | Ensures averages handle zero values correctly |
| Large record counts | Stress aggregation logic |
| Mixed user population | Ensures proper filtering |
| Ordered output | Verifies sorting requirement |

Edge Cases
Users Who Never Convert
A common mistake is to include any user with free-trial activity. The problem explicitly requires users who converted to paid subscriptions, so a user with only free_trial and cancelled records must not appear. The HAVING clause enforces this by requiring at least one paid record in addition to at least one trial record: for such a user, SUM(activity_type = 'paid') is 0 and the group is dropped.
Users Who Convert and Later Cancel
A user may have records in all three categories: free_trial, paid, and cancelled. Such users must still be included because they successfully converted before cancelling. The solution ignores cancellation records when computing averages and only checks that both trial and paid activity exist.
Users With Only One Paid Activity
Some users have multiple trial records but only a single paid record, and some implementations accidentally assume multiple paid rows exist. Since AVG() works correctly on a single value, no special handling is required. For example, a single paid duration of 45 produces an average of 45.00.
Unequal Numbers of Trial and Paid Records
A user may have three trial records and ten paid records, or vice versa. A common bug is accidentally averaging all activities together. Conditional aggregation prevents this because trial and paid rows are aggregated independently using separate CASE expressions, so the counts do not need to match.
Presence of Cancelled Records
Cancelled rows should not affect either average; if they were included, the averages would be incorrect. Because the conditional CASE expressions only include durations from free_trial or paid records, cancelled activities contribute NULL and are automatically ignored by AVG().