LeetCode 3497 - Analyze Subscription Conversion

This is a SQL aggregation problem where we need to analyze user subscription behavior and return statistics only for users who successfully converted from a free trial to a paid subscription. The UserActivity table contains one row per user, per day, per activity type.

LeetCode Problem 3497

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The UserActivity table contains one row per user, per day, per activity type. Each row records:

  • The user identifier (user_id)
  • The date of the activity (activity_date)
  • The subscription status for that activity (activity_type)
  • The number of minutes the user spent on the platform that day (activity_duration)

A user can have activities marked as:

  • free_trial
  • paid
  • cancelled

The goal is to identify users who have participated in both the free trial and paid phases, which indicates that they converted from a trial user into a paying customer.

For every converted user, we must calculate two separate averages:

  • The average daily activity duration during all free_trial records.
  • The average daily activity duration during all paid records.

Both averages must be rounded to two decimal places.

The output should contain only users who converted to paid subscriptions. Users who only participated in a free trial and later cancelled should not appear in the result. The final result must be ordered by user_id in ascending order.

An important observation is that the problem does not require reconstructing an exact 7-day trial window, analyzing dates, or verifying that the paid subscription immediately followed the trial. It only asks for averages over rows labeled free_trial and rows labeled paid, and a user counts as converted if they have at least one row of each type.

Important edge cases include users who never convert, users who convert and later cancel, users with only a single trial or paid record, and users with different numbers of trial and paid records. Because each average is computed only within its own activity type, the two phases may have different row counts. The table guarantees that (user_id, activity_date, activity_type) is unique, so duplicate rows need no special handling.

The output should contain the following columns:

| Column | Meaning |
|---|---|
| user_id | User who converted |
| trial_avg_duration | Average duration during the free trial |
| paid_avg_duration | Average duration during the paid period |

Approaches

Brute Force Approach

A brute force solution would process each user independently. First, gather all distinct users. Then, for every user, scan the entire table to collect all trial records and paid records belonging to that user. Once collected, compute the averages and determine whether the user converted.

This approach is correct because every user's records are explicitly examined and classified. However, it repeatedly scans the entire table for every user. If there are n rows and u users, the complexity becomes O(u × n), which is unnecessarily expensive.
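
A minimal Python sketch of the brute force idea, assuming the table is modeled as a list of (user_id, activity_type, activity_duration) tuples. SAMPLE holds the worked-example rows with activity_date omitted, and brute_force is a hypothetical name. The two list comprehensions rescan every row for each user, which is where the O(u × n) cost comes from.

```python
# Hypothetical in-memory stand-in for UserActivity: (user_id, activity_type, activity_duration).
SAMPLE = [
    (1, "free_trial", 45), (1, "free_trial", 30), (1, "free_trial", 60),
    (1, "paid", 75), (1, "paid", 90), (1, "paid", 65),
    (2, "free_trial", 55), (2, "free_trial", 25), (2, "free_trial", 50), (2, "cancelled", 0),
    (3, "free_trial", 70), (3, "free_trial", 60), (3, "free_trial", 80),
    (3, "paid", 50), (3, "paid", 55), (3, "paid", 85),
    (4, "free_trial", 40), (4, "free_trial", 35), (4, "paid", 45), (4, "cancelled", 0),
]


def brute_force(rows):
    """Rescan every row for each distinct user: O(u * n) time."""
    result = []
    for user in sorted({user_id for user_id, _, _ in rows}):  # distinct users, ascending
        # Full table scans, filtered to this user and one activity type.
        trial = [d for u, t, d in rows if u == user and t == "free_trial"]
        paid = [d for u, t, d in rows if u == user and t == "paid"]
        if trial and paid:  # converted: at least one row of each type
            result.append((user,
                           round(sum(trial) / len(trial), 2),
                           round(sum(paid) / len(paid), 2)))
    return result


print(brute_force(SAMPLE))  # → [(1, 45.0, 76.67), (3, 70.0, 63.33), (4, 37.5, 45.0)]
```

Python's round() works on binary floats with round-half-to-even, so on exact ties it may not match SQL ROUND; none of these sample averages lands on a tie.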

Key Insight

The table already contains all information needed for aggregation. Instead of repeatedly scanning the data, we can group rows by user_id and compute all required statistics in a single aggregation pass.

For each user, we need:

  • Average duration of free_trial rows.
  • Average duration of paid rows.
  • Verification that at least one free_trial row exists.
  • Verification that at least one paid row exists.

SQL aggregate functions such as AVG() combined with conditional expressions make this straightforward. Grouping by user_id lets us compute both averages in the same pass, and a HAVING clause then filters out users who do not have both activity types.
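
The single grouping pass can be inspected before any filtering is applied. This sketch uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the judge's MySQL engine (both evaluate activity_type = 'paid' to 1 or 0). The rows are the worked example's, with activity_date omitted.

```python
import sqlite3

ROWS = [
    (1, "free_trial", 45), (1, "free_trial", 30), (1, "free_trial", 60),
    (1, "paid", 75), (1, "paid", 90), (1, "paid", 65),
    (2, "free_trial", 55), (2, "free_trial", 25), (2, "free_trial", 50), (2, "cancelled", 0),
    (3, "free_trial", 70), (3, "free_trial", 60), (3, "free_trial", 80),
    (3, "paid", 50), (3, "paid", 55), (3, "paid", 85),
    (4, "free_trial", 40), (4, "free_trial", 35), (4, "paid", 45), (4, "cancelled", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserActivity "
             "(user_id INTEGER, activity_type TEXT, activity_duration INTEGER)")
conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", ROWS)

# One GROUP BY pass: conditional averages and conditional counts, no HAVING yet.
state = conn.execute("""
    SELECT user_id,
           AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END) AS trial_avg,
           AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END)       AS paid_avg,
           SUM(activity_type = 'free_trial')                                      AS trial_rows,
           SUM(activity_type = 'paid')                                            AS paid_rows
    FROM UserActivity
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

for row in state:
    print(row)
```

User 2 comes back with a None (NULL) paid average and zero paid rows, which is exactly the row the HAVING clause removes.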

Approach Comparison

| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(u × n) | O(n) | Rescans the whole table for each user; holds the distinct-user list plus one user's collected rows |
| Optimal | O(n) | O(u) | Single grouping pass with conditional aggregation |

Algorithm Walkthrough

Optimal SQL Algorithm

  1. Group all rows by user_id, so statistics are computed independently for every user.
  2. Calculate the trial average with a conditional expression that yields activity_duration only when activity_type = 'free_trial': AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END). The CASE has no ELSE branch, so every other row yields NULL, and AVG() ignores NULL values; only free-trial rows contribute.
  3. Calculate the paid average the same way: AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END).
  4. Determine whether the user converted by counting the user's free_trial rows and paid rows, for example with SUM(activity_type = 'free_trial') and SUM(activity_type = 'paid').
  5. Keep only converted users: in the HAVING clause, require both counts to be greater than zero.
  6. Round both averages to two decimal places with ROUND(..., 2).
  7. Order the final result by user_id in ascending order, as the output requires.

Why it works

The grouping operation ensures all records for a user are processed together. Conditional aggregation isolates free-trial and paid records while ignoring unrelated activity types. The HAVING clause guarantees that only users with both trial and paid activity are retained. Therefore, every returned row corresponds exactly to a converted user, and the averages are computed over the correct subsets of records.
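
A small SQLite check of the NULL-skipping behavior this argument relies on, using user 4 from the worked example (activity_date omitted) and Python's sqlite3 module as a stand-in engine. It contrasts the CASE expression without ELSE against a hypothetical variant with ELSE 0, which silently turns every non-trial row into a zero that AVG() then counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserActivity "
             "(user_id INTEGER, activity_type TEXT, activity_duration INTEGER)")
# User 4 from the example: two trial rows, one paid row, one cancelled row.
conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)",
                 [(4, "free_trial", 40), (4, "free_trial", 35),
                  (4, "paid", 45), (4, "cancelled", 0)])

correct, wrong = conn.execute("""
    SELECT
        -- No ELSE: non-trial rows become NULL, and AVG skips them.
        AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END),
        -- ELSE 0: non-trial rows become zeros, and AVG counts them.
        AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration ELSE 0 END)
    FROM UserActivity
""").fetchone()
print(correct, wrong)  # → 37.5 18.75
```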

Python Solution

For LeetCode Database problems, the expected answer is an SQL query rather than executable Python code, so there is nothing to submit in Python; the stub below only marks the section.

```python
# Database problem: submit SQL instead of Python

class Solution:
    pass
```

The actual LeetCode submission is the SQL query. The core idea is to group by user, compute conditional averages, and keep only users that have both free-trial and paid records.
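
For readers who want the logic outside a database, here is a hypothetical Python mirror of the query (analyze_conversion is an illustrative name, not a LeetCode API, and rows are assumed to be (user_id, activity_type, activity_duration) tuples). One pass keeps running sums and counts per user, and the HAVING, ROUND, and ORDER BY steps are then applied to those per-user totals. It is not a valid LeetCode submission.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, activity_type, activity_duration); activity_date omitted.
SAMPLE = [
    (1, "free_trial", 45), (1, "free_trial", 30), (1, "free_trial", 60),
    (1, "paid", 75), (1, "paid", 90), (1, "paid", 65),
    (2, "free_trial", 55), (2, "free_trial", 25), (2, "free_trial", 50), (2, "cancelled", 0),
    (3, "free_trial", 70), (3, "free_trial", 60), (3, "free_trial", 80),
    (3, "paid", 50), (3, "paid", 55), (3, "paid", 85),
    (4, "free_trial", 40), (4, "free_trial", 35), (4, "paid", 45), (4, "cancelled", 0),
]


def analyze_conversion(rows):
    """Python mirror of the SQL query: one pass over rows, then filter, round, sort."""
    # user_id -> [trial_sum, trial_count, paid_sum, paid_count]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for user_id, activity_type, duration in rows:
        s = stats[user_id]
        if activity_type == "free_trial":
            s[0] += duration
            s[1] += 1
        elif activity_type == "paid":
            s[2] += duration
            s[3] += 1
        # Any other type (e.g. 'cancelled') only creates the user's entry,
        # much like the NULLs that CASE produces for AVG in the SQL version.

    result = []
    for user_id in sorted(stats):                   # ORDER BY user_id
        trial_sum, trial_cnt, paid_sum, paid_cnt = stats[user_id]
        if trial_cnt > 0 and paid_cnt > 0:          # HAVING: both types present
            result.append((user_id,
                           round(trial_sum / trial_cnt, 2),  # ROUND(AVG(...), 2)
                           round(paid_sum / paid_cnt, 2)))
    return result


print(analyze_conversion(SAMPLE))  # → [(1, 45.0, 76.67), (3, 70.0, 63.33), (4, 37.5, 45.0)]
```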

SQL Solution

```sql
SELECT
    user_id,
    ROUND(
        AVG(
            CASE
                WHEN activity_type = 'free_trial'
                THEN activity_duration
            END
        ),
        2
    ) AS trial_avg_duration,
    ROUND(
        AVG(
            CASE
                WHEN activity_type = 'paid'
                THEN activity_duration
            END
        ),
        2
    ) AS paid_avg_duration
FROM UserActivity
GROUP BY user_id
HAVING
    SUM(activity_type = 'free_trial') > 0
    AND SUM(activity_type = 'paid') > 0
ORDER BY user_id;
```

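The query groups records by user. Each conditional AVG() sees NULL for rows of other activity types and ignores them, so it averages only the intended phase. The HAVING clause keeps a user only if both free-trial and paid records exist. Finally, the averages are rounded and the result is sorted by user_id. Note that SUM(activity_type = 'paid') relies on MySQL evaluating a comparison to 1 or 0; on engines without that behavior, such as PostgreSQL or SQL Server, COUNT(CASE WHEN activity_type = 'paid' THEN 1 END) > 0 expresses the same check.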
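
To check the query end to end, this sketch runs it in an in-memory SQLite database through Python's sqlite3 module, assuming SQLite is an acceptable stand-in for MySQL here and omitting activity_date. It also runs a COUNT(CASE ...) version of the HAVING clause, an illustrative portable alternative rather than part of the original solution, and checks that both return the same rows.

```python
import sqlite3

# Worked-example rows: (user_id, activity_type, activity_duration); activity_date omitted.
ROWS = [
    (1, "free_trial", 45), (1, "free_trial", 30), (1, "free_trial", 60),
    (1, "paid", 75), (1, "paid", 90), (1, "paid", 65),
    (2, "free_trial", 55), (2, "free_trial", 25), (2, "free_trial", 50), (2, "cancelled", 0),
    (3, "free_trial", 70), (3, "free_trial", 60), (3, "free_trial", 80),
    (3, "paid", 50), (3, "paid", 55), (3, "paid", 85),
    (4, "free_trial", 40), (4, "free_trial", 35), (4, "paid", 45), (4, "cancelled", 0),
]

# Query from the SQL Solution section with a placeholder for its HAVING clause.
BASE = """
SELECT user_id,
       ROUND(AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END), 2) AS trial_avg_duration,
       ROUND(AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END), 2) AS paid_avg_duration
FROM UserActivity
GROUP BY user_id
HAVING {having}
ORDER BY user_id
"""
# Original filter: relies on a comparison evaluating to 1 or 0 (true in MySQL and SQLite).
SUM_HAVING = "SUM(activity_type = 'free_trial') > 0 AND SUM(activity_type = 'paid') > 0"
# Illustrative portable filter: COUNT skips the NULLs from a CASE without ELSE.
COUNT_HAVING = ("COUNT(CASE WHEN activity_type = 'free_trial' THEN 1 END) > 0 "
                "AND COUNT(CASE WHEN activity_type = 'paid' THEN 1 END) > 0")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserActivity "
             "(user_id INTEGER, activity_type TEXT, activity_duration INTEGER)")
conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", ROWS)

original = conn.execute(BASE.format(having=SUM_HAVING)).fetchall()
portable = conn.execute(BASE.format(having=COUNT_HAVING)).fetchall()
print(original)
assert original == portable  # both filters keep the same users with the same averages
```

On the worked-example data this prints three rows, for users 1, 3, and 4.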

Go Solution

Since this is a database problem, LeetCode expects SQL rather than Go code. A Go implementation is not applicable to the actual submission format.

```go
// Database problem: submit SQL instead of Go code.

package main

func main() {
}
```

The solution logic remains identical regardless of language. Aggregate records by user, compute conditional averages for each subscription phase, filter converted users, and sort by user ID.

Worked Examples

Example Walkthrough

Input:

| user_id | activity_type | activity_duration |
|---|---|---|
| 1 | free_trial | 45 |
| 1 | free_trial | 30 |
| 1 | free_trial | 60 |
| 1 | paid | 75 |
| 1 | paid | 90 |
| 1 | paid | 65 |
| 2 | free_trial | 55 |
| 2 | free_trial | 25 |
| 2 | free_trial | 50 |
| 2 | cancelled | 0 |
| 3 | free_trial | 70 |
| 3 | free_trial | 60 |
| 3 | free_trial | 80 |
| 3 | paid | 50 |
| 3 | paid | 55 |
| 3 | paid | 85 |
| 4 | free_trial | 40 |
| 4 | free_trial | 35 |
| 4 | paid | 45 |
| 4 | cancelled | 0 |

User 1 Aggregation State

| Metric | Value |
|---|---|
| Trial durations | 45, 30, 60 |
| Trial sum | 135 |
| Trial count | 3 |
| Trial average | 45.00 |
| Paid durations | 75, 90, 65 |
| Paid sum | 230 |
| Paid count | 3 |
| Paid average | 76.67 |

Since both trial and paid records exist, user 1 is included.

User 2

| Metric | Value |
|---|---|
| Trial durations | 55, 25, 50 |
| Paid records | None |
| Cancelled records | Present |

The HAVING condition fails because there are no paid rows, so user 2 is excluded.

User 3

| Metric | Value |
|---|---|
| Trial durations | 70, 60, 80 |
| Trial average | 70.00 |
| Paid durations | 50, 55, 85 |
| Paid average | 63.33 |

The user has both trial and paid records, so they are included.

User 4

| Metric | Value |
|---|---|
| Trial durations | 40, 35 |
| Trial average | 37.50 |
| Paid durations | 45 |
| Paid average | 45.00 |

Even though user 4 later cancelled, they converted to paid first, so they remain in the result.

After grouping by user:

| User | Trial durations | Paid durations |
|---|---|---|
| 1 | 45, 30, 60 | 75, 90, 65 |
| 2 | 55, 25, 50 | None |
| 3 | 70, 60, 80 | 50, 55, 85 |
| 4 | 40, 35 | 45 |

Average calculations:

| User | Trial average | Paid average | Included? |
|---|---|---|---|
| 1 | (45+30+60)/3 = 45.00 | (75+90+65)/3 = 76.67 | Yes |
| 2 | 43.33 | NULL | No |
| 3 | 70.00 | 63.33 | Yes |
| 4 | 37.50 | 45.00 | Yes |

Final result:

| user_id | trial_avg_duration | paid_avg_duration |
|---|---|---|
| 1 | 45.00 | 76.67 |
| 3 | 70.00 | 63.33 |
| 4 | 37.50 | 45.00 |

Complexity Analysis

| Measure | Complexity | Explanation |
|---|---|---|
| Time (grouping) | O(n) | Every row is processed once during grouping |
| Space | O(u) | Aggregation state is maintained per user |
| Sorting output | O(u log u) | Ordering by user_id may require sorting the grouped results |

The dominant work is scanning the table and updating per-user aggregates. With hash-based aggregation this is a single linear pass; an engine that groups by sorting instead spends O(n log n). The final ordering step depends on the number of grouped users, not the number of activity records, so the total is O(n + u log u) under hash aggregation.

Test Cases

Conceptual test cases for the SQL logic, with the expected outcome of each:

  • Example from the statement: users 1, 3, and 4 appear.
  • User converts from free_trial to paid: appears.
  • User has free_trial rows only: excluded.
  • User has free_trial and cancelled rows but never pays: excluded.
  • User has paid rows only, with no free_trial row: excluded.
  • User converts and later cancels: still appears.
  • Exactly one free_trial row and one paid row: each average equals that single value.
  • Several free_trial rows but only one paid row: the paid average equals that one duration.
  • Multiple paid rows or multiple trial rows: each average uses only its own rows.
  • Many trial rows and many paid rows: aggregation remains correct.
  • Zero-duration paid activity: the zero counts toward the paid average.
  • Multiple users with mixed conversion states: only converters appear.
  • Output ordering: rows are sorted by user_id in ascending order.
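
The conceptual cases can be turned into executable checks. This is a sketch that assumes SQLite (via Python's sqlite3) behaves like MySQL for this query; run is a hypothetical helper that loads a fresh table per case, and activity_date is omitted because the query never reads it.

```python
import sqlite3

# Query from the SQL Solution section; SQLite stands in for MySQL here.
QUERY = """
SELECT user_id,
       ROUND(AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END), 2),
       ROUND(AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END), 2)
FROM UserActivity
GROUP BY user_id
HAVING SUM(activity_type = 'free_trial') > 0 AND SUM(activity_type = 'paid') > 0
ORDER BY user_id
"""


def run(rows):
    """Load (user_id, activity_type, activity_duration) rows into a fresh table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserActivity "
                 "(user_id INTEGER, activity_type TEXT, activity_duration INTEGER)")
    conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Trial only, and trial followed by cancellation: both excluded.
assert run([(7, "free_trial", 10), (8, "free_trial", 20), (8, "cancelled", 0)]) == []
# Paid only, with no trial row: excluded.
assert run([(9, "paid", 30)]) == []
# One trial row and one paid row: each average equals that single value.
assert run([(5, "free_trial", 12), (5, "paid", 34)]) == [(5, 12.0, 34.0)]
# Zero-duration paid row still counts: (0 + 50) / 2 = 25.
assert run([(6, "free_trial", 10), (6, "paid", 0), (6, "paid", 50)]) == [(6, 10.0, 25.0)]
# Converts, then cancels: included, and the cancelled row is ignored.
assert run([(4, "free_trial", 40), (4, "free_trial", 35),
            (4, "paid", 45), (4, "cancelled", 0)]) == [(4, 37.5, 45.0)]
# Rows inserted out of order come back sorted by user_id.
assert [r[0] for r in run([(3, "free_trial", 1), (3, "paid", 1),
                           (1, "free_trial", 1), (1, "paid", 1)])] == [1, 3]
print("all edge-case checks passed")
```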

Test Summary

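| Test | Why |
|---|---|
| Example from statement | Validates expected output |
| Trial then paid | Basic conversion case |
| Trial only | Should be excluded |
| Trial then cancelled | Should be excluded |
| Paid only | Users without a trial record are excluded |
| Single paid record | Valid average calculation |
| Multiple paid records | Verifies paid aggregation |
| Multiple trial records | Verifies trial aggregation |
| Paid then cancelled | Converted users remain included |
| Large record counts | Stress aggregation logic |
| One trial and one paid | Minimum valid conversion |
| Zero-duration activity | Zero values still count toward the average |
| Mixed user population | Ensures proper filtering |
| Ordered output | Verifies sorting requirement |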

Edge Cases

Users Who Never Convert

A common mistake is to include any user with free-trial activity. The problem explicitly requires users who converted to paid subscriptions. The HAVING clause prevents this by requiring at least one paid record in addition to at least one trial record.

Users Who Convert and Later Cancel

A user may have records in all three categories: free_trial, paid, and cancelled. Such users must still be included because they successfully converted before cancelling. The solution ignores cancellation records when computing averages and only checks that paid activity exists.

Users With Only One Paid Activity

Some implementations accidentally assume multiple paid rows exist. Since AVG() works correctly even on a single value, the solution naturally handles this scenario. For example, a single paid duration of 45 correctly produces an average of 45.00.

Unequal Numbers of Trial and Paid Records

A user may have many trial entries and only a few paid entries, or vice versa. A common bug is averaging all of a user's activities together. Since conditional aggregation computes each average independently, the counts do not need to match, and each average uses only the relevant subset of rows.
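
A quick SQLite illustration of the averaging-everything bug, run through Python's sqlite3 module with a hypothetical user 10 who has one trial row and three paid rows (activity_date omitted). A plain AVG(activity_duration) blends both phases, while the conditional averages keep them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserActivity "
             "(user_id INTEGER, activity_type TEXT, activity_duration INTEGER)")
# Hypothetical user 10: one trial day, three paid days.
conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)",
                 [(10, "free_trial", 60), (10, "paid", 10), (10, "paid", 20), (10, "paid", 30)])

mixed, trial_avg, paid_avg = conn.execute("""
    SELECT AVG(activity_duration),                                                 -- bug: mixes both phases
           AVG(CASE WHEN activity_type = 'free_trial' THEN activity_duration END), -- trial rows only
           AVG(CASE WHEN activity_type = 'paid' THEN activity_duration END)        -- paid rows only
    FROM UserActivity
    GROUP BY user_id
""").fetchone()
print(mixed, trial_avg, paid_avg)  # → 30.0 60.0 20.0
```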

Presence of Cancelled Records

Cancelled rows should not affect either average. Because the conditional CASE expressions only include durations from free_trial or paid records, cancelled activities contribute NULL and are automatically ignored by AVG().
