LeetCode 512: Game Play Analysis II

A clear explanation of finding the first device used by each player using SQL aggregation and a join.

Problem Restatement

We are given an Activity table.

Each row records one player login on one date, including the device used and the number of games played.

We need to report the device that each player used on their first login date.

The result can be returned in any order. The table has columns player_id, device_id, event_date, and games_played; (player_id, event_date) is the primary key. The task is to report the first logged-in device for each player.

Table

Column Type Meaning
player_id int Player identifier
device_id int Device used by the player
event_date date Login date
games_played int Number of games played before logout

The primary key is:

(player_id, event_date)

This means each player has at most one row on a given date.

Output

Column Meaning
player_id Player identifier
device_id Device used on the player's first login date

Example

Input:

player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-05-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5

For player 1, the first login date is:

2016-03-01

The device used on that date is:

2

For player 2, the only login date is:

2017-06-25

The device is:

3

For player 3, the first login date is:

2016-03-02

The device is:

1

Output:

player_id device_id
1 2
2 3
3 1

First Thought: Find the First Date

This problem is close to LeetCode 511.

In LeetCode 511, we only needed:

player_id, MIN(event_date)

But here we need the device_id from the same row as the first login date.

So this query is not enough:

SELECT
    player_id,
    MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id;

It gives the first date, but it does not give the device used on that date.

Key Insight

We should solve this in two steps:

  1. Find each player's first login date.
  2. Join that result back to Activity to get the device from that exact row.

The grouped subquery gives:

player_id, first_login

Then the join matches:

Activity.player_id = first_login.player_id
Activity.event_date = first_login.first_login

Since (player_id, event_date) is the primary key, this match identifies exactly one row for each player.

Algorithm

Create a derived table:

SELECT
    player_id,
    MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id

Then join it with Activity.

Return:

Activity.player_id,
Activity.device_id

from the row where the event date equals the first login date.

Correctness

The subquery groups rows by player_id.

For each player group, MIN(event_date) returns that player's earliest login date.

Joining this result back to Activity on both player_id and event_date selects the row that represents the player's first login.

Because (player_id, event_date) is the primary key, there is only one such row per player.

The selected device_id therefore belongs to the device used on that player's first login date.

So the query returns exactly the required result.

Complexity

Metric Value Why
Time O(n) or O(n log n) Depends on how the database executes grouping and joining
Space O(p) The grouped result stores one row per player

Here:

Symbol Meaning
n Number of rows in Activity
p Number of distinct players

An index on (player_id, event_date) helps because the query groups and joins using those columns.

Implementation

SELECT
    a.player_id,
    a.device_id
FROM Activity AS a
JOIN (
    SELECT
        player_id,
        MIN(event_date) AS first_login
    FROM Activity
    GROUP BY player_id
) AS first_activity
    ON a.player_id = first_activity.player_id
   AND a.event_date = first_activity.first_login;

Code Explanation

The subquery finds the first login date per player:

SELECT
    player_id,
    MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id

The outer query reads from the original table:

FROM Activity AS a

Then joins the first-login table:

JOIN (...) AS first_activity

The join condition matches the same player:

a.player_id = first_activity.player_id

and the first login date:

a.event_date = first_activity.first_login

Finally, the query returns the player and the first-login device:

SELECT
    a.player_id,
    a.device_id

Alternative: Window Function

Some SQL engines support FIRST_VALUE.

SELECT DISTINCT
    player_id,
    FIRST_VALUE(device_id) OVER (
        PARTITION BY player_id
        ORDER BY event_date
    ) AS device_id
FROM Activity;

This partitions rows by player and chooses the device from the earliest event_date. A public solution also uses this FIRST_VALUE pattern.

The join version is usually easier to explain and works well for this problem.

Testing

Test table:

CREATE TABLE Activity (
    player_id INT,
    device_id INT,
    event_date DATE,
    games_played INT,
    PRIMARY KEY (player_id, event_date)
);

INSERT INTO Activity (player_id, device_id, event_date, games_played) VALUES
    (1, 2, '2016-03-01', 5),
    (1, 2, '2016-05-02', 6),
    (2, 3, '2017-06-25', 1),
    (3, 1, '2016-03-02', 0),
    (3, 4, '2018-07-03', 5);

Expected result:

player_id device_id
1 2
2 3
3 1

Additional test case:

INSERT INTO Activity (player_id, device_id, event_date, games_played) VALUES
    (4, 9, '2020-01-10', 0),
    (4, 8, '2019-12-31', 3);

For player 4, the first login date is:

2019-12-31

So the returned device should be:

8

Test meaning:

Test Why
Player with multiple rows Checks earliest date selection
Player with one row The only device is returned
Earlier row inserted later Confirms insertion order does not matter
games_played = 0 Confirms games played does not affect the result