LeetCode 1919 - Leetcodify Similar Friends

LeetCode Problem 1919

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem involves analyzing two database tables, Listens and Friendship, to determine pairs of "similar friends." In plain terms, we are asked to find pairs of users who are friends and who listened to at least three different songs on the same day.

The Listens table contains the listening history of users, with each row representing a single user listening to a song on a specific day. It is important to note that there may be duplicate rows, meaning a user could appear multiple times for the same song and day. The Friendship table contains all friend pairs in the system, with the guarantee that user1_id < user2_id for all rows.

The expected output is a list of all friend pairs satisfying the "similar friends" condition. Each pair must maintain the original ordering (user1_id < user2_id) and can be returned in any order.

Key points to note:

Two users must be friends. Listening similarity alone is not enough.
They must have listened to at least three different songs on the same day. Different days do not count together.
Duplicate listening records do not increase the song count.
Only pairs present in the Friendship table are eligible.

Edge cases to consider include friends who listen to fewer than three songs together, multiple days of overlap, or duplicates in the Listens table. The input is relatively small in terms of tables (likely up to tens of thousands of rows) but must be handled efficiently using SQL aggregation.

Approaches

Brute-Force Approach

The brute-force approach would involve checking every possible pair of friends and iterating through all days to count common songs. For each friend pair (u1, u2), we would:

Retrieve all songs for u1 grouped by day.
Retrieve all songs for u2 grouped by day.
Compare each day to see if at least three songs overlap.

While this approach is correct, it is very slow because it requires nested iteration over both friends and days, leading to a time complexity on the order of O(F × D × S), where F is the number of friends, D is the number of distinct days, and S is the average number of songs per day. This is impractical for large datasets.

Optimal Approach

The key insight is that we can leverage SQL aggregation and joins to avoid iterating manually. Specifically, we can:

Join the Listens table to itself on the same day and song for two users who are friends. This automatically pairs users listening to the same song on the same day.
Group by the friend pair and day, counting distinct songs.
Filter groups with at least 3 distinct songs to find similar friends.

This approach uses SQL joins and COUNT(DISTINCT ...) to efficiently compute the answer without manual iteration.

Comparison:

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(F × D × S)	O(S)	Iterate through all friends and days, compare song lists
Optimal	O(N log N)	O(N)	Use SQL joins and aggregation; N is number of `Listens` rows

Algorithm Walkthrough

Start by joining the Friendship table with the Listens table twice: once for user1_id and once for user2_id. Ensure both listens occur on the same day and are for the same song.
After the join, each row represents a common song for a friend pair on a particular day.
Group the results by user1_id, user2_id, and day.
Count distinct song_id in each group.
Filter groups where the count is greater than or equal to 3. These are valid "similar friends."
Select only the friend IDs (user1_id, user2_id) for the output, maintaining the original ordering.

Why it works: Each join ensures that only songs listened to by both friends on the same day are counted. Grouping and counting distinct songs guarantees that we respect the "three or more different songs on the same day" requirement.

Python Solution

Since this is a database problem, a Python solution involves using SQL execution through a connection. Here is a submittable solution assuming a database cursor:

def similarFriends():
    query = """
    SELECT f.user1_id, f.user2_id
    FROM Friendship f
    JOIN Listens l1 ON f.user1_id = l1.user_id
    JOIN Listens l2 ON f.user2_id = l2.user_id 
        AND l1.song_id = l2.song_id 
        AND l1.day = l2.day
    GROUP BY f.user1_id, f.user2_id, l1.day
    HAVING COUNT(DISTINCT l1.song_id) >= 3
    """
    return query

Explanation: We perform a self-join on the Listens table to align users listening to the same song on the same day. Grouping by friend pair and day allows counting distinct songs. The HAVING clause filters out pairs with fewer than three common songs.

Go Solution

In Go, we would typically execute the same SQL query using a database driver. Here is a conceptual solution using database/sql:

func SimilarFriends(db *sql.DB) (*sql.Rows, error) {
    query := `
    SELECT f.user1_id, f.user2_id
    FROM Friendship f
    JOIN Listens l1 ON f.user1_id = l1.user_id
    JOIN Listens l2 ON f.user2_id = l2.user_id
        AND l1.song_id = l2.song_id
        AND l1.day = l2.day
    GROUP BY f.user1_id, f.user2_id, l1.day
    HAVING COUNT(DISTINCT l1.song_id) >= 3
    `
    return db.Query(query)
}

Go-specific notes: Go requires explicit database connection handling and returns rows or an error. The SQL logic remains identical to the Python version.

Worked Examples

Input Tables

Listens:

user_id	song_id	day
1	10	2021-03-15
1	11	2021-03-15
1	12	2021-03-15
2	10	2021-03-15
2	11	2021-03-15
2	12	2021-03-15
4	10	2021-03-15
4	11	2021-03-15
4	13	2021-03-15

Friendship:

user1_id	user2_id
1	2
2	4

Step 1: Join Friendship with Listens for both users.

Step 2: Find matching songs on the same day:

user1_id	user2_id	day	song_id
1	2	2021-03-15	10
1	2	2021-03-15	11
1	2	2021-03-15	12

Step 3: Group by user1_id, user2_id, day and count distinct songs. Count = 3 ≥ 3.

Step 4: Output pair (1, 2).

Complexity Analysis

Measure	Complexity	Explanation
Time	O(N log N)	Join and group operations dominate; N is number of `Listens` rows
Space	O(N)	Storage for intermediate join results and grouping

The SQL aggregation efficiently reduces the problem size by eliminating unnecessary iteration over non-overlapping friends or days.

Test Cases

# Test with given example
assert similarFriends() == """
SELECT f.user1_id, f.user2_id
FROM Friendship f
JOIN Listens l1 ON f.user1_id = l1.user_id
JOIN Listens l2 ON f.user2_id = l2.user_id 
    AND l1.song_id = l2.song_id 
    AND l1.day = l2.day
GROUP BY f.user1_id, f.user2_id, l1.day
HAVING COUNT(DISTINCT l1.song_id) >= 3
"""  # returns query string

# Edge case: no friends
# Edge case: friends with only 2 common songs
# Edge case: friends with 3+ songs but on different days
# Edge case: multiple valid pairs

Test	Why
Example input	Validates base case from problem
No friends	Checks that empty output is handled
Only 2 songs shared	Ensures HAVING COUNT >=3 works
Songs on different days	Ensures day alignment is required
Multiple valid pairs	Ensures query returns all qualifying pairs

Edge Cases

One edge case is friends who have common songs but fewer than three. The `HAVING COUNT(DISTINCT song_id) >=