LeetCode 1919 - Leetcodify Similar Friends
This problem involves analyzing two database tables, Listens and Friendship, to determine pairs of "similar friends." In plain terms, we are asked to find pairs of users who are friends and who listened to at least three different songs on the same day.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem involves analyzing two database tables, Listens and Friendship, to determine pairs of "similar friends." In plain terms, we are asked to find pairs of users who are friends and who listened to at least three different songs on the same day.
The Listens table contains the listening history of users, with each row representing a single user listening to a song on a specific day. It is important to note that there may be duplicate rows, meaning a user could appear multiple times for the same song and day. The Friendship table contains all friend pairs in the system, with the guarantee that user1_id < user2_id for all rows.
The expected output is a list of all friend pairs satisfying the "similar friends" condition. Each pair must maintain the original ordering (user1_id < user2_id) and can be returned in any order.
Key points to note:
- Two users must be friends. Listening similarity alone is not enough.
- They must have listened to at least three different songs on the same day. Different days do not count together.
- Duplicate listening records do not increase the song count.
- Only pairs present in the
Friendshiptable are eligible.
Edge cases to consider include friends who listen to fewer than three songs together, multiple days of overlap, or duplicates in the Listens table. The input is relatively small in terms of tables (likely up to tens of thousands of rows) but must be handled efficiently using SQL aggregation.
Approaches
Brute-Force Approach
The brute-force approach would involve checking every possible pair of friends and iterating through all days to count common songs. For each friend pair (u1, u2), we would:
- Retrieve all songs for
u1grouped by day. - Retrieve all songs for
u2grouped by day. - Compare each day to see if at least three songs overlap.
While this approach is correct, it is very slow because it requires nested iteration over both friends and days, leading to a time complexity on the order of O(F × D × S), where F is the number of friends, D is the number of distinct days, and S is the average number of songs per day. This is impractical for large datasets.
Optimal Approach
The key insight is that we can leverage SQL aggregation and joins to avoid iterating manually. Specifically, we can:
- Join the
Listenstable to itself on the same day and song for two users who are friends. This automatically pairs users listening to the same song on the same day. - Group by the friend pair and day, counting distinct songs.
- Filter groups with at least 3 distinct songs to find similar friends.
This approach uses SQL joins and COUNT(DISTINCT ...) to efficiently compute the answer without manual iteration.
Comparison:
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(F × D × S) | O(S) | Iterate through all friends and days, compare song lists |
| Optimal | O(N log N) | O(N) | Use SQL joins and aggregation; N is number of Listens rows |
Algorithm Walkthrough
- Start by joining the
Friendshiptable with theListenstable twice: once foruser1_idand once foruser2_id. Ensure both listens occur on the same day and are for the same song. - After the join, each row represents a common song for a friend pair on a particular day.
- Group the results by
user1_id,user2_id, andday. - Count distinct
song_idin each group. - Filter groups where the count is greater than or equal to 3. These are valid "similar friends."
- Select only the friend IDs (
user1_id,user2_id) for the output, maintaining the original ordering.
Why it works: Each join ensures that only songs listened to by both friends on the same day are counted. Grouping and counting distinct songs guarantees that we respect the "three or more different songs on the same day" requirement.
Python Solution
Since this is a database problem, a Python solution involves using SQL execution through a connection. Here is a submittable solution assuming a database cursor:
def similarFriends():
query = """
SELECT f.user1_id, f.user2_id
FROM Friendship f
JOIN Listens l1 ON f.user1_id = l1.user_id
JOIN Listens l2 ON f.user2_id = l2.user_id
AND l1.song_id = l2.song_id
AND l1.day = l2.day
GROUP BY f.user1_id, f.user2_id, l1.day
HAVING COUNT(DISTINCT l1.song_id) >= 3
"""
return query
Explanation: We perform a self-join on the Listens table to align users listening to the same song on the same day. Grouping by friend pair and day allows counting distinct songs. The HAVING clause filters out pairs with fewer than three common songs.
Go Solution
In Go, we would typically execute the same SQL query using a database driver. Here is a conceptual solution using database/sql:
func SimilarFriends(db *sql.DB) (*sql.Rows, error) {
query := `
SELECT f.user1_id, f.user2_id
FROM Friendship f
JOIN Listens l1 ON f.user1_id = l1.user_id
JOIN Listens l2 ON f.user2_id = l2.user_id
AND l1.song_id = l2.song_id
AND l1.day = l2.day
GROUP BY f.user1_id, f.user2_id, l1.day
HAVING COUNT(DISTINCT l1.song_id) >= 3
`
return db.Query(query)
}
Go-specific notes: Go requires explicit database connection handling and returns rows or an error. The SQL logic remains identical to the Python version.
Worked Examples
Input Tables
Listens:
| user_id | song_id | day |
|---|---|---|
| 1 | 10 | 2021-03-15 |
| 1 | 11 | 2021-03-15 |
| 1 | 12 | 2021-03-15 |
| 2 | 10 | 2021-03-15 |
| 2 | 11 | 2021-03-15 |
| 2 | 12 | 2021-03-15 |
| 4 | 10 | 2021-03-15 |
| 4 | 11 | 2021-03-15 |
| 4 | 13 | 2021-03-15 |
Friendship:
| user1_id | user2_id |
|---|---|
| 1 | 2 |
| 2 | 4 |
Step 1: Join Friendship with Listens for both users.
Step 2: Find matching songs on the same day:
| user1_id | user2_id | day | song_id |
|---|---|---|---|
| 1 | 2 | 2021-03-15 | 10 |
| 1 | 2 | 2021-03-15 | 11 |
| 1 | 2 | 2021-03-15 | 12 |
Step 3: Group by user1_id, user2_id, day and count distinct songs. Count = 3 ≥ 3.
Step 4: Output pair (1, 2).
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N log N) | Join and group operations dominate; N is number of Listens rows |
| Space | O(N) | Storage for intermediate join results and grouping |
The SQL aggregation efficiently reduces the problem size by eliminating unnecessary iteration over non-overlapping friends or days.
Test Cases
# Test with given example
assert similarFriends() == """
SELECT f.user1_id, f.user2_id
FROM Friendship f
JOIN Listens l1 ON f.user1_id = l1.user_id
JOIN Listens l2 ON f.user2_id = l2.user_id
AND l1.song_id = l2.song_id
AND l1.day = l2.day
GROUP BY f.user1_id, f.user2_id, l1.day
HAVING COUNT(DISTINCT l1.song_id) >= 3
""" # returns query string
# Edge case: no friends
# Edge case: friends with only 2 common songs
# Edge case: friends with 3+ songs but on different days
# Edge case: multiple valid pairs
| Test | Why |
|---|---|
| Example input | Validates base case from problem |
| No friends | Checks that empty output is handled |
| Only 2 songs shared | Ensures HAVING COUNT >=3 works |
| Songs on different days | Ensures day alignment is required |
| Multiple valid pairs | Ensures query returns all qualifying pairs |
Edge Cases
One edge case is friends who have common songs but fewer than three. The `HAVING COUNT(DISTINCT song_id) >=