LeetCode 550 - Game Play Analysis IV

This problem is asking us to calculate a retention metric from a table of player activities. Specifically, we need to determine the fraction of players who log in on the day immediately following their first login.

LeetCode Problem 550

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem is asking us to calculate a retention metric from a table of player activities. Specifically, we need to determine the fraction of players who log in on the day immediately following their first login. The input is a SQL table named Activity where each row represents a single player’s login on a given date and the number of games they played that day. The primary key is (player_id, event_date), ensuring no duplicate records for the same player on the same day.

The expected output is a single decimal value, fraction, representing the ratio of players who logged in the next day to the total number of players, rounded to two decimal places.

Important clarifications and constraints:

  1. Only the first login date per player matters. Subsequent logins are relevant only to check if they occurred on the next day.
  2. The fraction is based on distinct players, not total logins.
  3. Edge cases could include players who log in only once, multiple players sharing the same first login date, or large gaps between logins.

The problem guarantees unique (player_id, event_date) pairs, which simplifies aggregation since we do not need to de-duplicate rows.

Approaches

Brute Force Approach

A naive solution would be to first find the first login date for each player and then scan the table to see if that player has any login on the day after their first login. This involves joining the table to itself on (player_id, event_date = first_login_date + 1). While this gives the correct answer, it requires scanning potentially large tables multiple times, which could be slow if the table contains millions of rows.

Optimal Approach

The key insight is that SQL allows us to compute aggregates and date arithmetic efficiently. We can first calculate each player's first login date using MIN(event_date) grouped by player_id. Then, we check if the player has a login on first_login_date + 1. Using an inner join or a WHERE EXISTS subquery allows us to efficiently count players who meet this condition without scanning the entire table multiple times. Finally, dividing by the total number of distinct players and rounding gives the required fraction.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n) Self-join for every player’s first login to check next day login
Optimal O(n) O(n) Aggregate to find first login, join/filter to count next-day logins

Algorithm Walkthrough

  1. Identify first login per player: Group the Activity table by player_id and compute MIN(event_date) to find the first login date for each player. This isolates the key date we need to check against.
  2. Check for next-day logins: For each player, check if there exists a row in Activity where event_date = first_login_date + 1. This can be done efficiently using a join or EXISTS subquery.
  3. Count players: Count the number of players who meet the next-day login condition.
  4. Calculate fraction: Divide the count of players who logged in the next day by the total number of distinct players. Use ROUND(..., 2) to round the result to two decimal places.

Why it works: By isolating each player’s first login date and checking for exactly one day after, we directly compute the numerator of the fraction. Counting all players gives the denominator. The division then produces the desired fraction, and rounding ensures the correct formatting.

Python Solution

# Python solution using SQL execution via any DB connection

import sqlite3

def fraction_next_day_login(conn: sqlite3.Connection) -> float:
    query = """
    WITH first_login AS (
        SELECT player_id, MIN(event_date) AS first_date
        FROM Activity
        GROUP BY player_id
    )
    SELECT ROUND(
        CAST(COUNT(a.player_id) AS FLOAT) / (SELECT COUNT(DISTINCT player_id) FROM Activity),
        2
    ) AS fraction
    FROM first_login f
    JOIN Activity a
      ON a.player_id = f.player_id
     AND a.event_date = DATE(f.first_date, '+1 day')
    """
    cursor = conn.cursor()
    cursor.execute(query)
    result = cursor.fetchone()
    return result[0] if result else 0.0

This Python implementation assumes a SQLite database connection. It defines a common table expression (CTE) to compute each player's first login, then joins to the Activity table to find next-day logins. The fraction is calculated by dividing the count of qualifying players by the total number of players.

Go Solution

import (
    "database/sql"
    _ "github.com/mattn/go-sqlite3"
    "fmt"
)

func FractionNextDayLogin(db *sql.DB) (float64, error) {
    query := `
    WITH first_login AS (
        SELECT player_id, MIN(event_date) AS first_date
        FROM Activity
        GROUP BY player_id
    )
    SELECT ROUND(
        CAST(COUNT(a.player_id) AS FLOAT) / (SELECT COUNT(DISTINCT player_id) FROM Activity),
        2
    ) AS fraction
    FROM first_login f
    JOIN Activity a
      ON a.player_id = f.player_id
     AND a.event_date = DATE(f.first_date, '+1 day')
    `
    var fraction float64
    err := db.QueryRow(query).Scan(&fraction)
    if err != nil {
        return 0.0, err
    }
    return fraction, nil
}

The Go implementation mirrors the Python solution, using sql.DB to execute the query and QueryRow to fetch the result. The main difference is Go’s strict handling of errors and types.

Worked Examples

Input Table:

player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-03-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5

Step 1: Compute first login per player

player_id first_date
1 2016-03-01
2 2017-06-25
3 2016-03-02

Step 2: Check next-day logins

player_id next_day_login_exists
1 yes
2 no
3 no

Step 3: Count players who logged in next day

1 player out of 3 total → 1/3 = 0.33

Complexity Analysis

Measure Complexity Explanation
Time O(n) Scanning the table once for aggregation and once for join
Space O(n) Storing first login dates in memory (CTE)

The algorithm scales linearly with the number of activity records and requires space proportional to the number of distinct players.

Test Cases

# test cases
assert fraction_next_day_login(conn) == 0.33  # Example 1
# single player, no next-day login
assert fraction_next_day_login(single_player_conn) == 0.0
# all players log in next day
assert fraction_next_day_login(all_next_day_conn) == 1.0
# players with gaps more than 1 day
assert fraction_next_day_login(gap_days_conn) == 0.0
# large dataset with mixed cases
assert fraction_next_day_login(large_conn) == expected_fraction
Test Why
Example 1 Basic functionality check
Single player, no next-day login Edge case with fraction 0
All players next-day login Edge case with fraction 1
Players with gaps Ensure only exactly next-day counts
Large dataset Test performance and rounding correctness

Edge Cases

  1. Single player with no next-day login: This tests that the fraction correctly evaluates to 0.0 and rounding does not introduce artifacts.
  2. Multiple logins on the first day: Since the first login is computed via MIN(event_date), additional logins on the same day do not affect the calculation.
  3. Large date gaps or non-contiguous logins: Players who log in multiple times but skip days should not be counted, ensuring that only consecutive day logins contribute to the numerator. This confirms the correctness of the join on first_date + 1 day.