LeetCode 3521 - Find Product Recommendation Pairs

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to identify product pairs that are frequently purchased by the same customers. The goal is to support a recommendation feature similar to "Customers who bought this also bought...", where products that commonly appear together in customer purchase histories can be suggested to future buyers.

The ProductPurchases table contains one row for each (user_id, product_id) combination. Because (user_id, product_id) is unique, a customer can appear at most once for a given product. The quantity column tells us how many units were purchased, but for this problem the quantity is irrelevant. We only care whether a customer purchased a product or not.

The ProductInfo table stores metadata about each product, specifically its category and price. The price is not used in the computation, but the category must be included in the final output.

For every customer, we need to find all pairs of products that customer purchased. If multiple customers purchased the same pair, we count how many distinct customers purchased both products.

A pair is represented as:

(product1_id, product2_id)
product1_id < product2_id

This ordering prevents duplicates such as both (101, 102) and (102, 101).

After counting customers for every pair, we keep only those pairs purchased by at least three different customers.

The output must contain:

product1_id
product2_id
product1_category
product2_category
customer_count

The result must be sorted by:

customer_count descending
product1_id ascending
product2_id ascending

An important observation is that quantity does not affect the answer. A customer purchasing 100 units of a product contributes exactly the same as a customer purchasing 1 unit. The only thing that matters is whether the customer purchased the product.

Some important edge cases include customers who purchased only one product, which generate no pairs, product pairs purchased by fewer than three customers, which must be excluded, and customers with many purchased products, which generate many pair combinations.

Since this is a database problem, the expected solution is an SQL query rather than an algorithm implemented in a traditional programming language. This problem asks us to identify product pairs that are frequently co-purchased by the same customers. We are given a transactional table ProductPurchases, where each row represents a user purchasing a product in some quantity, and a ProductInfo table that maps each product to its category and price. The quantity is irrelevant for determining co-purchase relationships because the problem is concerned with whether a user bought a product, not how many units they bought.

The goal is to find all unique pairs of products (product1_id, product2_id) such that the same customer has purchased both products. A pair is considered valid only if at least 3 distinct users have bought both products. Additionally, we must ensure a consistent ordering constraint product1_id < product2_id to avoid duplicate symmetric pairs.

For each valid pair, we must return the number of distinct customers who bought both products, along with the categories of each product from ProductInfo. The final output must be sorted by customer_count descending, then product1_id ascending, and then product2_id ascending.

The input size is not explicitly stated, but since this is a typical database co-occurrence problem, we should assume potentially large tables. This implies that naive quadratic grouping across all products per user would likely be too slow.

Edge cases include users purchasing only one product (cannot form pairs), products purchased by fewer than 3 overlapping users (must be excluded), and repeated purchases of the same product by the same user (must not inflate counts).

Approaches

Brute Force Approach

A brute force approach would examine every possible pair of products in the catalog.

For each pair of products, we could scan all customers and check whether each customer purchased both products. If so, we increment the count for that pair.

This approach is correct because every product pair is explicitly checked against every customer. However, it is inefficient because most product pairs may never have been purchased together.

If there are P products and U users, the approach requires examining roughly P² pairs and checking each against all users.

Key Insight

Instead of starting from products, we should start from customers.

Each customer already tells us which products were purchased together. If a customer bought products {A, B, C}, then that customer contributes exactly one count to:

(A, B)
(A, C)
(B, C)

This can be generated naturally through a self join on ProductPurchases.

By joining the table to itself on the same user_id, we directly generate all product pairs purchased by each customer. Then we group by the product pair and count distinct customers.

After obtaining the pair counts, we join with ProductInfo twice to retrieve the categories of both products.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(P² × U)	O(1)	Check every product pair against every customer
Optimal	O(K²) per customer group, implemented via self join	O(number of generated pairs)	Generate only actual co-purchased pairs

Here:

P = number of products
U = number of users
K = products purchased by a particular user

The database optimizer typically executes the self join efficiently using indexes and grouping.

Algorithm Walkthrough

Step 1

Self join ProductPurchases with itself using the same user_id.

This finds products purchased by the same customer.

p1.user_id = p2.user_id

Step 2

Enforce:

p1.product_id < p2.product_id

This guarantees that every pair appears exactly once and prevents duplicate pairs such as (101,102) and (102,101).

Step 3

Group by the product pair.

GROUP BY p1.product_id, p2.product_id

Each group now represents one unique co-purchase relationship.

Step 4

Count the number of distinct customers in each group.

COUNT(DISTINCT p1.user_id)

This produces the required customer_count.

Step 5

Keep only pairs purchased by at least three customers.

HAVING COUNT(DISTINCT p1.user_id) >= 3

Step 6

Join ProductInfo twice.

The first join retrieves the category of product1.

The second join retrieves the category of product2.

Step 7

Sort the result according to the required ordering.

ORDER BY customer_count DESC,
         product1_id ASC,
         product2_id ASC

Why it works

The self join generates every pair of products purchased by the same customer. Because the join condition enforces product1_id < product2_id, every valid pair appears exactly once per customer. Grouping by the pair aggregates all customers who purchased both products. Counting distinct users therefore gives exactly the number of customers who bought both products. Filtering by the threshold of three customers leaves only recommendation-worthy pairs.

SQL Solution

SELECT
    p1.product_id AS product1_id,
    p2.product_id AS product2_id,
    pi1.category AS product1_category,
    pi2.category AS product2_category,
    COUNT(DISTINCT p1.user_id) AS customer_count
FROM ProductPurchases p1
JOIN ProductPurchases p2
    ON p1.user_id = p2.user_id
   AND p1.product_id < p2.product_id
JOIN ProductInfo pi1
    ON p1.product_id = pi1.product_id
JOIN ProductInfo pi2
    ON p2.product_id = pi2.product_id
GROUP BY
    p1.product_id,
    p2.product_id,
    pi1.category,
    pi2.category
HAVING COUNT(DISTINCT p1.user_id) >= 3
ORDER BY
    customer_count DESC,
    product1_id ASC,
    product2_id ASC;

Implementation Explanation

The solution begins with a self join of ProductPurchases. Matching rows with the same user_id identifies products purchased by the same customer. The condition p1.product_id < p2.product_id ensures that only unique product pairs are generated.

After generating all customer-product pair combinations, the query groups by the pair of products. Within each group, COUNT(DISTINCT p1.user_id) computes how many unique customers purchased both products.

The HAVING clause removes pairs that do not meet the recommendation threshold of three customers.

Finally, the query joins ProductInfo twice to obtain category information for both products and sorts the results according to the required ordering.

Worked Example

Consider the sample data.

User 1

Purchased:

Product
101
102
103

Generated pairs:

Pair
(101,102)
(101,103)
(102,103)

User 2

Purchased:

Product
101
102
104

Generated pairs:

Pair
(101,102)
(101,104)
(102,104)

User 3

Purchased:

Product
101
103
105

Generated pairs:

Pair
(101,103)
(101,105)
(103,105)

User 4

Purchased:

Product
101
102
103
104

Generated pairs:

Pair
(101,102)
(101,103)
(101,104)
(102,103)
(102,104)
(103,104)

User 5

Purchased:

Product
102
104

Generated pairs:

Pair
(102,104)

Aggregated Counts

Pair	Customers
(101,102)	3
(101,103)	3
(102,104)	3
(101,104)	2
(102,103)	2
(101,105)	1
(103,104)	1
(103,105)	1

After applying:

HAVING COUNT(DISTINCT user_id) >= 3

we keep:

Pair	Customer Count
(101,102)	3
(101,103)	3
(102,104)	3

Joining category information produces the final answer. The brute force idea is to explicitly compute, for every user, the set of products they purchased, and then generate all possible product pairs for that user. For each pair, we increment a global counter. After processing all users, we filter pairs that have at least 3 distinct users.

This approach is correct because it directly models the definition of co-purchase: if two products appear in the same user's purchase set, they form a candidate pair. However, it is inefficient because a user who purchases k products contributes O(k²) pairs, and across many users this can become very expensive.

The inefficiency comes from recomputing and materializing all pairs explicitly without any aggregation optimization.

Optimal Approach

The key insight is that we only care about whether a user has purchased a product, not how many times. Therefore, we can first deduplicate purchases at the (user_id, product_id) level. Then, we group by user and collect their product sets.

Once we have each user's product set, we generate combinations of product pairs within that set and aggregate counts across users using a hash map keyed by (product1_id, product2_id).

Finally, we join with ProductInfo to fetch categories and filter pairs with at least 3 distinct users.

This reduces redundant computation and ensures we only process meaningful unique relationships.

Complexity Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(U * k²)	O(P²)	Generate all pairs per user without preprocessing
Optimal	O(U * k²)	O(P²)	Deduplicate first, aggregate pair counts efficiently

Here U is number of users, k is average number of products per user, and P is number of unique products.

Algorithm Walkthrough

First, we remove duplicate product purchases per user by converting each user's purchased products into a set. This ensures that multiple quantities of the same product do not inflate pair counts, since co-purchase is binary in nature.
Next, for each user, we sort their product list to ensure consistent ordering. Sorting is necessary so that we always form pairs (p1, p2) where p1 < p2, avoiding duplicate symmetric entries like (101, 102) and (102, 101).
For each user, we generate all possible unique pairs of products using a nested iteration over the sorted list. Each pair represents a co-purchase event contributed by that user.
We maintain a hash map where the key is (product1_id, product2_id) and the value is the number of distinct users who have contributed this pair. Each time a user generates a pair, we increment its count by 1.
After processing all users, we filter the map to retain only those pairs whose count is at least 3, since the problem requires a minimum threshold of 3 distinct customers.
We then join each valid product pair with the ProductInfo table twice to retrieve category values for both products.
Finally, we sort the result by customer_count descending, then product1_id ascending, then product2_id ascending.

Why it works

The correctness comes from the invariant that each user contributes at most one increment per product pair, because we deduplicate purchases per user. This ensures the count stored in the hash map is exactly the number of distinct users who purchased both products. Since we generate all combinations of each user's product set, we guarantee no valid co-purchase relationship is missed.

Python Solution

from typing import List, Tuple
from collections import defaultdict

class Solution:
    def findProductRecommendationPairs(
        self,
        ProductPurchases: List[List[int]],
        ProductInfo: List[List[object]]
    ) -> List[List[object]]:

        # Build product -> category mapping
        product_to_category = {}
        for pid, category, price in ProductInfo:
            product_to_category[pid] = category

        # Deduplicate purchases per user
        user_products = defaultdict(set)
        for user_id, product_id, quantity in ProductPurchases:
            user_products[user_id].add(product_id)

        # Count co-purchase pairs
        pair_count = defaultdict(int)

        for user_id, products in user_products.items():
            products = sorted(products)
            n = len(products)

            for i in range(n):
                for j in range(i + 1, n):
                    p1, p2 = products[i], products[j]
                    pair_count[(p1, p2)] += 1

        # Build result
        result = []
        for (p1, p2), cnt in pair_count.items():
            if cnt >= 3:
                result.append([
                    p1,
                    p2,
                    product_to_category.get(p1, ""),
                    product_to_category.get(p2, ""),
                    cnt
                ])

        # Sort by required order
        result.sort(key=lambda x: (-x[4], x[0], x[1]))
        return result

Explanation of Python Implementation

We first construct a mapping from product IDs to categories for fast lookup. Then we aggregate each user's purchases into a set to remove duplicates caused by quantity.

For each user, we sort their product list so that pair generation is consistent and ordered. We then generate all combinations of product pairs using a nested loop and increment counts in a dictionary keyed by tuples.

Finally, we filter pairs based on the threshold and attach category metadata before sorting the result.

Go Solution

package main

import (
	"sort"
)

type Pair struct {
	p1, p2 int
}

func findProductRecommendationPairs(productPurchases [][]int, productInfo [][]interface{}) [][]interface{} {
	productToCategory := make(map[int]string)

	for _, row := range productInfo {
		pid := row[0].(int)
		category := row[1].(string)
		productToCategory[pid] = category
	}

	userProducts := make(map[int]map[int]bool)

	for _, row := range productPurchases {
		userID := row[0].(int)
		productID := row[1].(int)

		if userProducts[userID] == nil {
			userProducts[userID] = make(map[int]bool)
		}
		userProducts[userID][productID] = true
	}

	pairCount := make(map[Pair]int)

	for _, productsMap := range userProducts {
		products := make([]int, 0, len(productsMap))

		for p := range productsMap {
			products = append(products, p)
		}

		sort.Ints(products)

		for i := 0; i < len(products); i++ {
			for j := i + 1; j < len(products); j++ {
				pair := Pair{products[i], products[j]}
				pairCount[pair]++
			}
		}
	}

	result := make([][]interface{}, 0)

	for pair, cnt := range pairCount {
		if cnt >= 3 {
			result = append(result, []interface{}{
				pair.p1,
				pair.p2,
				productToCategory[pair.p1],
				productToCategory[pair.p2],
				cnt,
			})
		}
	}

	sort.Slice(result, func(i, j int) bool {
		if result[i][4].(int) != result[j][4].(int) {
			return result[i][4].(int) > result[j][4].(int)
		}
		if result[i][0].(int) != result[j][0].(int) {
			return result[i][0].(int) < result[j][0].(int)
		}
		return result[i][1].(int) < result[j][1].(int)
	})

	return result
}

Go-specific Notes

Go requires explicit type handling when working with interface-based input like this. We use a struct Pair to represent product pairs as map keys since Go does not allow slice keys in maps. We also use maps of maps (map[int]map[int]bool) to simulate sets for user product deduplication.

Sorting is handled via sort.Ints, and final ordering is implemented using sort.Slice with explicit type assertions.

Worked Examples

Using the provided input, we first build user product sets:

User 1: {101, 102, 103}

User 2: {101, 102, 104}

User 3: {101, 103, 105}

User 4: {101, 102, 103, 104}

User 5: {102, 104}

Now we generate pairs per user.

User 1 pairs:

(101,102), (101,103), (102,103)

User 2 pairs:

(101,102), (101,104), (102,104)

User 3 pairs:

(101,103), (101,105), (103,105)

User 4 pairs:

(101,102), (101,103), (101,104), (102,103), (102,104), (103,104)

User 5 pairs:

(102,104)

Now we aggregate counts:

(101,102) -> users 1,2,4 = 3

(101,103) -> users 1,3,4 = 3

(102,104) -> users 2,4,5 = 3

others < 3

Final output includes only these three pairs, sorted accordingly.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(total generated customer pairs)	Self join creates co-purchase pairs for each customer
Space	O(number of distinct product pairs)	Required for grouping and aggregation

The dominant work comes from generating product pairs for each customer. If a customer purchased K products, that customer contributes K(K-1)/2 generated rows. The overall complexity is therefore proportional to the total number of co-purchase pairs generated across all customers.

Test Cases

Since this is an SQL problem, the following datasets are useful for validation.

# Sample case from the problem statement
# Expected pairs:
# (101,102), (101,103), (102,104)

# No pair reaches threshold 3
# Expected: empty result

# User 1: {1,2}
# User 2: {1,2}
# Only 2 customers purchased both
# Result should be empty

# Exactly 3 customers purchase same pair
# Users:
# {1,2}
# {1,2}
# {1,2}
# Expected:
# (1,2) with count 3

# Customer purchases only one product
# Generates no pairs
# Should not affect result

# Multiple qualifying pairs
# Verify ordering by customer_count DESC

# Tie on customer_count
# Verify ordering by product1_id ASC,
# then product2_id ASC

# Large quantity values
# Quantity should not affect answer

# Product pair purchased by many customers
# Verify DISTINCT customer counting

# Product catalog contains unused products
# Unused products should never appear
| Time | O(U * k²) | Each user generates all product pairs from their purchase set |
| Space | O(P + C) | P for product mapping, C for pair counts |

The dominant factor is pair generation per user. Although quadratic in per-user product count, this is necessary because all pair relationships must be explicitly considered.

## Test Cases

assert Solution().findProductRecommendationPairs( [[1,101,2],[1,102,1],[2,101,1],[2,102,1],[3,101,1],[3,102,1],[4,101,1],[4,102,1],[5,101,1],[5,102,1]], [[101,"A",10],[102,"B",20]] ) == [[101,102,"A","B",5]] # all users buy same pair

assert Solution().findProductRecommendationPairs( [[1,101,1],[1,102,1],[2,103,1],[2,104,1]], [[101,"A",10],[102,"B",20],[103,"C",30],[104,"D",40]] ) == [] # no pair meets threshold

assert Solution().findProductRecommendationPairs( [[1,101,1],[1,102,1],[1,103,1],[2,101,1],[2,102,1],[2,103,1],[3,101,1],[3,102,1],[3,103,1]], [[101,"A",10],[102,"B",20],[103,"C",30]] ) == [ [101,102,"A","B",3], [101,103,"A","C",3], [102,103,"B","C",3] ] # full triangle


### Test Summary

| Test | Why |
| --- | --- |
| Sample input | Verifies overall correctness |
| No qualifying pair | Tests HAVING filter |
| Exactly three customers | Tests threshold boundary |
| Single product customer | Ensures no invalid pair generation |
| Multiple qualifying pairs | Verifies aggregation logic |
| Tie counts | Verifies sorting rules |
| Large quantities | Confirms quantity is ignored |
| Repeated qualifying pair | Validates customer counting |
| Unused products | Ensures joins do not introduce extra rows |

## Edge Cases

### Customers With Only One Purchased Product

A customer who purchased only one product cannot contribute any product pair. A common mistake is to accidentally include such customers in pair calculations. The self join naturally avoids this because no second product exists to satisfy `product1_id < product2_id`.

### Product Pairs Below the Recommendation Threshold

Many product pairs may be purchased together by only one or two customers. These pairs should not appear in the final result. The `HAVING COUNT(DISTINCT p1.user_id) >= 3` clause correctly filters them after aggregation.

### Duplicate Pair Ordering

Without enforcing `product1_id < product2_id`, both `(101,102)` and `(102,101)` would be generated, causing double counting and duplicate output rows. The strict inequality guarantees a unique canonical representation for every pair.

### Quantity Values Greater Than One

A customer may purchase multiple units of a product. Since the recommendation logic is based on co-purchase relationships rather than purchase volume, quantities must not influence pair counts. The solution never references the `quantity` column, ensuring each customer contributes at most one count per product pair.

### Products That Never Participate in a Qualifying Pair

Some products may appear in purchases but never reach the threshold of three common customers with another product. These products naturally disappear during aggregation because no qualifying pair survives the `HAVING` filter.
| all users buy same pair | verifies max aggregation |
| no overlap | verifies empty output |
| full triangle overlap | verifies multi-pair correctness |

## Edge Cases

One important edge case is when users purchase only a single product. In this case, no pairs can be generated, and the implementation correctly skips such users because the nested loop requires at least two products.

Another edge case is repeated purchases of the same product by the same user. Since we convert each user's product list into a set, duplicates are removed, ensuring we do not artificially inflate pair counts.

A third edge case is when many users share partial overlaps of products, but no pair reaches the threshold of 3. The algorithm still computes counts correctly but filters everything out at the final stage, returning an empty result.