LeetCode 550 - Game Play Analysis IV
This problem is asking us to calculate a retention metric from a table of player activities. Specifically, we need to determine the fraction of players who log in on the day immediately following their first login.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem is asking us to calculate a retention metric from a table of player activities. Specifically, we need to determine the fraction of players who log in on the day immediately following their first login. The input is a SQL table named Activity where each row represents a single player’s login on a given date and the number of games they played that day. The primary key is (player_id, event_date), ensuring no duplicate records for the same player on the same day.
The expected output is a single decimal value, fraction, representing the ratio of players who logged in the next day to the total number of players, rounded to two decimal places.
Important clarifications and constraints:
- Only the first login date per player matters. Subsequent logins are relevant only to check if they occurred on the next day.
- The fraction is based on distinct players, not total logins.
- Edge cases could include players who log in only once, multiple players sharing the same first login date, or large gaps between logins.
The problem guarantees unique (player_id, event_date) pairs, which simplifies aggregation since we do not need to de-duplicate rows.
Approaches
Brute Force Approach
A naive solution would be to first find the first login date for each player and then scan the table to see if that player has any login on the day after their first login. This involves joining the table to itself on (player_id, event_date = first_login_date + 1). While this gives the correct answer, it requires scanning potentially large tables multiple times, which could be slow if the table contains millions of rows.
Optimal Approach
The key insight is that SQL allows us to compute aggregates and date arithmetic efficiently. We can first calculate each player's first login date using MIN(event_date) grouped by player_id. Then, we check if the player has a login on first_login_date + 1. Using an inner join or a WHERE EXISTS subquery allows us to efficiently count players who meet this condition without scanning the entire table multiple times. Finally, dividing by the total number of distinct players and rounding gives the required fraction.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2) | O(n) | Self-join for every player’s first login to check next day login |
| Optimal | O(n) | O(n) | Aggregate to find first login, join/filter to count next-day logins |
Algorithm Walkthrough
- Identify first login per player: Group the
Activitytable byplayer_idand computeMIN(event_date)to find the first login date for each player. This isolates the key date we need to check against. - Check for next-day logins: For each player, check if there exists a row in
Activitywhereevent_date = first_login_date + 1. This can be done efficiently using a join orEXISTSsubquery. - Count players: Count the number of players who meet the next-day login condition.
- Calculate fraction: Divide the count of players who logged in the next day by the total number of distinct players. Use
ROUND(..., 2)to round the result to two decimal places.
Why it works: By isolating each player’s first login date and checking for exactly one day after, we directly compute the numerator of the fraction. Counting all players gives the denominator. The division then produces the desired fraction, and rounding ensures the correct formatting.
Python Solution
# Python solution using SQL execution via any DB connection
import sqlite3
def fraction_next_day_login(conn: sqlite3.Connection) -> float:
query = """
WITH first_login AS (
SELECT player_id, MIN(event_date) AS first_date
FROM Activity
GROUP BY player_id
)
SELECT ROUND(
CAST(COUNT(a.player_id) AS FLOAT) / (SELECT COUNT(DISTINCT player_id) FROM Activity),
2
) AS fraction
FROM first_login f
JOIN Activity a
ON a.player_id = f.player_id
AND a.event_date = DATE(f.first_date, '+1 day')
"""
cursor = conn.cursor()
cursor.execute(query)
result = cursor.fetchone()
return result[0] if result else 0.0
This Python implementation assumes a SQLite database connection. It defines a common table expression (CTE) to compute each player's first login, then joins to the Activity table to find next-day logins. The fraction is calculated by dividing the count of qualifying players by the total number of players.
Go Solution
import (
"database/sql"
_ "github.com/mattn/go-sqlite3"
"fmt"
)
func FractionNextDayLogin(db *sql.DB) (float64, error) {
query := `
WITH first_login AS (
SELECT player_id, MIN(event_date) AS first_date
FROM Activity
GROUP BY player_id
)
SELECT ROUND(
CAST(COUNT(a.player_id) AS FLOAT) / (SELECT COUNT(DISTINCT player_id) FROM Activity),
2
) AS fraction
FROM first_login f
JOIN Activity a
ON a.player_id = f.player_id
AND a.event_date = DATE(f.first_date, '+1 day')
`
var fraction float64
err := db.QueryRow(query).Scan(&fraction)
if err != nil {
return 0.0, err
}
return fraction, nil
}
The Go implementation mirrors the Python solution, using sql.DB to execute the query and QueryRow to fetch the result. The main difference is Go’s strict handling of errors and types.
Worked Examples
Input Table:
| player_id | device_id | event_date | games_played |
|---|---|---|---|
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
Step 1: Compute first login per player
| player_id | first_date |
|---|---|
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
Step 2: Check next-day logins
| player_id | next_day_login_exists |
|---|---|
| 1 | yes |
| 2 | no |
| 3 | no |
Step 3: Count players who logged in next day
1 player out of 3 total → 1/3 = 0.33
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Scanning the table once for aggregation and once for join |
| Space | O(n) | Storing first login dates in memory (CTE) |
The algorithm scales linearly with the number of activity records and requires space proportional to the number of distinct players.
Test Cases
# test cases
assert fraction_next_day_login(conn) == 0.33 # Example 1
# single player, no next-day login
assert fraction_next_day_login(single_player_conn) == 0.0
# all players log in next day
assert fraction_next_day_login(all_next_day_conn) == 1.0
# players with gaps more than 1 day
assert fraction_next_day_login(gap_days_conn) == 0.0
# large dataset with mixed cases
assert fraction_next_day_login(large_conn) == expected_fraction
| Test | Why |
|---|---|
| Example 1 | Basic functionality check |
| Single player, no next-day login | Edge case with fraction 0 |
| All players next-day login | Edge case with fraction 1 |
| Players with gaps | Ensure only exactly next-day counts |
| Large dataset | Test performance and rounding correctness |
Edge Cases
- Single player with no next-day login: This tests that the fraction correctly evaluates to
0.0and rounding does not introduce artifacts. - Multiple logins on the first day: Since the first login is computed via
MIN(event_date), additional logins on the same day do not affect the calculation. - Large date gaps or non-contiguous logins: Players who log in multiple times but skip days should not be counted, ensuring that only consecutive day logins contribute to the numerator. This confirms the correctness of the join on
first_date + 1 day.