LeetCode 1917 - Leetcodify Friends Recommendations

This problem requires generating friend recommendations for users on the Leetcodify platform based on their listening habits. We are given two tables: Listens and Friendship.

LeetCode Problem 1917

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem requires generating friend recommendations for users on the Leetcodify platform based on their listening habits. We are given two tables: Listens and Friendship. The Listens table logs when a user listens to a particular song on a specific day, and it may contain duplicates. The Friendship table lists all pairs of users who are already friends.

The goal is to recommend user x to user y if they are not already friends and they have listened to three or more identical songs on the same day. Recommendations are unidirectional, meaning that if user x is recommended to user y, it does not automatically imply the reverse unless the condition also holds in the other direction. The output should be a list of recommendations with no duplicates.

Key points include: comparisons are day-specific, only unique songs on that day count, and friendship pairs are symmetric but stored such that user1_id < user2_id. Edge cases include users who listen to the same songs on different days, duplicate entries in Listens, and users with no overlapping songs.

The scale is moderate to large given typical LeetCode database problems, so a naive solution that compares every pair of users for every day is likely too slow.

Approaches

A brute-force approach would involve joining the Listens table with itself on day and song_id to find all pairs of users who listened to the same song on the same day. Then, we would group by user pairs and count the distinct songs to see if they meet the threshold of three. Finally, we would filter out pairs who are already friends. While correct, this approach involves many self-joins and can become extremely slow with large datasets because it examines every combination of users and songs.

The optimal approach relies on the key insight that we can aggregate first, counting the number of shared songs per user pair per day. By grouping by day and user pair, we reduce the problem from a quadratic number of combinations to only those that actually listened to the same songs. After identifying pairs with three or more shared songs, we perform a left join with the Friendship table to exclude existing friends. Finally, we generate the unidirectional recommendations.

Approach Time Complexity Space Complexity Notes
Brute Force O(N^2 * M) O(N^2) Join Listens with itself for every user pair; N = users, M = songs per day
Optimal O(N * M + F) O(N^2) Aggregate shared songs per day, filter ≥3, then exclude friends; F = number of friendships

Algorithm Walkthrough

  1. Normalize friendships: Convert Friendship table into pairs where both directions exist (for easier exclusion). This ensures that user1_id -> user2_id and user2_id -> user1_id can be filtered.
  2. Generate user pairs who listened to the same song on the same day: Join the Listens table with itself on day and song_id, ensuring user_id from the first table is different from the second (user_id_1 != user_id_2). This gives all user pairs who share at least one song per day.
  3. Count shared songs per user pair per day: Group by user1_id, user2_id, and day, then count distinct song_ids. Keep only pairs with count >= 3.
  4. Exclude existing friendships: Left join the result with the normalized friendship table. Keep only pairs that do not exist in the friendship table. This gives non-friend user pairs with sufficient song overlap.
  5. Generate unidirectional recommendations: Output each valid pair in both directions as separate rows. This ensures that recommendations are unidirectional.
  6. Return results: Remove any duplicates if necessary and return the user_id and recommended_id.

This works because the grouping ensures only users with at least three shared songs on the same day are considered. Filtering out existing friendships guarantees we recommend only non-friends.

Python Solution

import pandas as pd

def recommend_friends(listens: pd.DataFrame, friendship: pd.DataFrame) -> pd.DataFrame:
    # Step 1: Self-join listens to get user pairs listening to the same song on the same day
    merged = listens.merge(
        listens,
        on=['day', 'song_id'],
        suffixes=('_1', '_2')
    )
    merged = merged[merged['user_id_1'] < merged['user_id_2']]

    # Step 2: Count distinct shared songs per day per user pair
    shared_counts = merged.groupby(['user_id_1', 'user_id_2', 'day'])['song_id'].nunique().reset_index()
    shared_counts = shared_counts[shared_counts['song_id'] >= 3]

    # Step 3: Remove existing friendships
    # Normalize friendship for easy lookup
    friendships = pd.concat([
        friendship.rename(columns={'user1_id':'user_id_1','user2_id':'user_id_2'}),
        friendship.rename(columns={'user1_id':'user_id_2','user2_id':'user_id_1'})
    ])
    shared_counts = shared_counts.merge(
        friendships,
        on=['user_id_1', 'user_id_2'],
        how='left',
        indicator=True
    )
    shared_counts = shared_counts[shared_counts['_merge'] == 'left_only']

    # Step 4: Generate unidirectional recommendations
    recommendations = pd.concat([
        shared_counts[['user_id_1','user_id_2']].rename(columns={'user_id_1':'user_id','user_id_2':'recommended_id'}),
        shared_counts[['user_id_2','user_id_1']].rename(columns={'user_id_2':'user_id','user_id_1':'recommended_id'})
    ]).drop_duplicates().reset_index(drop=True)

    return recommendations

The code merges the Listens table with itself to find pairs sharing songs on the same day, counts distinct songs per day, filters out existing friends, and then produces unidirectional recommendations by concatenating both directions of each pair.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
)

func recommendFriends(db *sql.DB) (*sql.Rows, error) {
    query := `
    WITH user_pairs AS (
        SELECT l1.user_id AS user_id_1, l2.user_id AS user_id_2, l1.day
        FROM Listens l1
        JOIN Listens l2 ON l1.day = l2.day AND l1.song_id = l2.song_id
        WHERE l1.user_id < l2.user_id
    ),
    shared_songs AS (
        SELECT user_id_1, user_id_2
        FROM user_pairs
        GROUP BY user_id_1, user_id_2, day
        HAVING COUNT(DISTINCT song_id) >= 3
    ),
    filtered_pairs AS (
        SELECT ss.user_id_1, ss.user_id_2
        FROM shared_songs ss
        LEFT JOIN (
            SELECT user1_id AS u1, user2_id AS u2 FROM Friendship
            UNION
            SELECT user2_id AS u1, user1_id AS u2 FROM Friendship
        ) f ON ss.user_id_1 = f.u1 AND ss.user_id_2 = f.u2
        WHERE f.u1 IS NULL
    )
    SELECT user_id_1 AS user_id, user_id_2 AS recommended_id FROM filtered_pairs
    UNION
    SELECT user_id_2 AS user_id, user_id_1 AS recommended_id FROM filtered_pairs;
    `
    return db.Query(query)
}

The Go solution uses a single SQL query to handle everything efficiently. Differences from Python include relying entirely on SQL and using UNION to produce unidirectional recommendations. The Go function returns a *sql.Rows which can be iterated to retrieve the results.

Worked Examples

Using the example from the problem:

Step Data Notes
Self-join listens Pairs like (1,2), (1,3), (2,3) on 2021-03-15 with each song Filters out same user
Count shared songs (1,2,2021-03-15):3, (1,3,2021-03-15):3, (2,3,2021-03-15):3, (1,4,2021-03-15):2 Only >=3 retained
Exclude friends (1,2) removed because they are friends Remaining: (1,3), (2,3)
Generate recommendations Output both directions: 1→3, 3→1, 2→3, 3→2 Matches expected output

Complexity Analysis

Measure Complexity Explanation
Time O(N*M + F) Merge and group operations scale with number of listens N*M; F for filtering friendships
Space O(N^2) Storing user pairs and intermediate data for aggregations

The algorithm avoids full pairwise comparisons by aggregating per day and song, significantly reducing unnecessary computations.

Test Cases

# Example from the problem
assert recommend_friends(
    pd.DataFrame([
        [1,10,'2021-03-15'], [1,11,'2021-03-15'], [