LeetCode 3554 - Find Category Recommendation Pairs

This is a SQL database problem involving customer purchasing behavior across product categories. We are given two tables: ProductPurchases records which products each user purchased. A user may purchase multiple products, and each purchase has a quantity.

LeetCode Problem 3554

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This is a SQL database problem involving customer purchasing behavior across product categories.

We are given two tables:

ProductPurchases records which products each user purchased. A user may purchase multiple products, and each purchase has a quantity. The quantity is irrelevant for this problem because we only care whether a user bought from a category, not how much they bought.

ProductInfo maps every product to a category and a price. The price is also irrelevant for this problem.

The goal is to find pairs of categories such that the same customer has purchased products from both categories.

More formally, for every pair of categories (category1, category2) where category1 < category2 lexicographically, we must count how many distinct users purchased at least one product from both categories.

A category pair is considered reportable only if at least three distinct customers purchased from both categories.

The output should contain:

Column Meaning
category1 First category in lexicographical order
category2 Second category in lexicographical order
customer_count Number of distinct customers who purchased from both categories

The result must be sorted by:

  1. customer_count descending
  2. category1 ascending
  3. category2 ascending

A very important detail is that we count unique customers, not purchases. If a user buys multiple products within the same category, they should still contribute only once toward any category pair involving that category.

Key Observations

The challenge is essentially:

  1. Determine which categories each user has purchased from.
  2. Generate all category pairs for each user.
  3. Count how many users generated each pair.
  4. Keep only pairs whose count is at least 3.

The primary source of bugs is duplicate counting. A user may purchase multiple products from the same category, which could accidentally create duplicate category pairs if categories are not deduplicated first.

For example, if a user buys two Books products and one Clothing product, that user should contribute exactly once to the pair (Books, Clothing).

Approaches

Brute Force

A brute force approach would first enumerate every possible category pair in the database.

For each pair, we would scan all users and determine whether the user purchased from both categories. This requires repeatedly checking category membership for every user-category combination.

While this approach is correct because it explicitly verifies every category pair against every customer, it performs a large amount of repeated work. The same purchase information is examined many times.

As the number of users and categories grows, this becomes increasingly expensive.

Optimal Approach

The key insight is that category pairs naturally arise from individual users.

Instead of asking:

"Which users belong to this category pair?"

we ask:

"What category pairs does this user generate?"

For each user:

  1. Collect the distinct categories they purchased from.
  2. Generate every unique pair of those categories.
  3. Increment a counter for that pair.

After processing all users, every counter directly represents the number of distinct customers who purchased from both categories.

This avoids repeatedly scanning users for every category pair and counts each user exactly once per pair.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(P × U) to O(C² × U) depending on implementation O(U × C) Repeatedly checks users against every category pair
Optimal O(U × K²) O(U × K + C²) Generate category pairs once per user

Where:

  • U = number of users
  • C = number of categories
  • K = average number of distinct categories per user
  • P = number of category pairs

Algorithm Walkthrough

Optimal SQL Strategy

The cleanest SQL solution consists of three stages.

  1. Join ProductPurchases with ProductInfo to determine which categories each user purchased from.

At this point we have (user_id, category) relationships. 2. Remove duplicate user-category combinations.

This is necessary because a user may purchase multiple products belonging to the same category. 3. Self-join the user-category table on the same user.

For each user, pair every category with every other category where:

category1 < category2

This guarantees:

  • no duplicate pairs
  • no reversed pairs
  • no self-pairs
  1. Group by the category pair and count distinct users.
  2. Keep only pairs whose count is at least 3.
  3. Apply the required ordering.

Why it works

After deduplicating (user_id, category) combinations, each user appears at most once per category.

The self-join generates exactly the set of category pairs purchased by that user. Therefore every user contributes exactly one count to each valid pair.

Grouping by (category1, category2) accumulates the number of distinct customers who purchased from both categories. Since every contribution corresponds to one unique customer, the final counts are correct.

Python Solution

For LeetCode Database problems, the actual submission is SQL rather than Python. The following Python implementation demonstrates the equivalent algorithm.

from collections import defaultdict
from itertools import combinations
from typing import List, Dict, Tuple

class Solution:
    def findCategoryRecommendationPairs(
        self,
        purchases: List[List],
        product_info: Dict[int, str]
    ) -> List[List]:
        user_categories = defaultdict(set)

        for user_id, product_id, quantity in purchases:
            category = product_info[product_id]
            user_categories[user_id].add(category)

        pair_count = defaultdict(int)

        for categories in user_categories.values():
            sorted_categories = sorted(categories)

            for cat1, cat2 in combinations(sorted_categories, 2):
                pair_count[(cat1, cat2)] += 1

        result = []

        for (cat1, cat2), count in pair_count.items():
            if count >= 3:
                result.append([cat1, cat2, count])

        result.sort(key=lambda x: (-x[2], x[0], x[1]))

        return result

The implementation first builds a mapping from each user to the set of categories they purchased from. Using a set automatically removes duplicates caused by multiple purchases in the same category.

Next, every unique category pair generated by a user is counted in a hash map. The key is the category pair and the value is the number of users who generated that pair.

Finally, only pairs with at least three customers are retained, and the result is sorted according to the problem requirements.

Go Solution

Again, the actual LeetCode submission is SQL, but the following Go implementation mirrors the same logic.

The problem asks us to analyze purchasing patterns across product categories using two database tables: ProductPurchases and ProductInfo. The ProductPurchases table records which user purchased which product and in what quantity, with (user_id, product_id) as the unique key. The ProductInfo table maps each product to a category and a price, with product_id as the unique identifier.

We are asked to find all category pairs where category1 < category2 lexicographically, and for each pair, count the number of unique customers who purchased at least one product from both categories. Only pairs with at least three shared customers are considered "reportable." The result must be ordered by customer_count descending, and then by category1 and category2 ascending lexicographically in case of ties.

Key points to note are that a single user can purchase multiple products from the same category or multiple categories. We only care about distinct customers per category pair, not total purchases or quantities. The problem implicitly involves set intersections for each user across categories.

Important edge cases include users purchasing multiple products in the same category, categories purchased by fewer than three users, and ordering constraints when multiple pairs have the same customer count.

Approaches

The brute-force approach would iterate over all category pairs, then for each pair, check for each user whether they purchased products in both categories. This works correctly but is inefficient because if there are C categories and U users, this leads to O(C^2 * U) complexity. Additionally, maintaining sets for each user across categories requires extra memory and nested iteration, which could be prohibitive for large datasets.

The optimal approach leverages a relational join strategy. By first mapping each purchase to its category using ProductInfo, we can represent each user as a set of categories they purchased from. Then, using a self-join on these category sets where category1 < category2, we can count how many users appear in both categories. This reduces redundant computation by using SQL-style set operations or hash map aggregations.

The key insight is that we only need pairs of categories per user, not every product combination. By reducing each user's purchases to a distinct set of categories, we can efficiently compute category pairs per user and then aggregate across users to count unique customers.

Approach Time Complexity Space Complexity Notes
Brute Force O(U * C^2) O(U * C) Check each category pair for each user individually
Optimal O(P + U * C^2) O(U * C) Map purchases to categories, generate pairs per user, aggregate counts efficiently

Here P is the number of purchase records and C is the number of unique categories.

Algorithm Walkthrough

  1. Join the ProductPurchases table with ProductInfo to associate each purchase with a category. This allows us to work at the category level instead of the product level.
  2. For each user, extract the distinct set of categories they purchased. This ensures that multiple purchases in the same category do not inflate counts.
  3. For each user, generate all possible category pairs (category1, category2) where category1 < category2 lexicographically. This ensures we avoid duplicate pairs with reversed ordering.
  4. Count the number of users per category pair. This can be done using a hash map where the key is the tuple (category1, category2) and the value is a set of user IDs.
  5. Filter the category pairs to only include those with at least 3 unique users.
  6. Sort the resulting pairs first by customer_count descending, then by category1 ascending, then by category2 ascending to match the output requirements.

Why it works: The invariant is that each user contributes to a category pair only if they purchased from both categories. By using sets and lexicographical ordering, we ensure uniqueness and correct ordering, and counting unique user IDs ensures correctness.

Python Solution

import collections
from typing import List, Tuple

class Solution:
    def findCategoryRecommendationPairs(
        self, 
        productPurchases: List[Tuple[int, int, int]], 
        productInfo: List[Tuple[int, str, float]]
    ) -> List[Tuple[str, str, int]]:
        # Map product_id to category
        product_to_category = {pid: cat for pid, cat, _ in productInfo}

        # Map user_id to set of purchased categories
        user_categories = collections.defaultdict(set)
        for user_id, product_id, _ in productPurchases:
            if product_id in product_to_category:
                user_categories[user_id].add(product_to_category[product_id])

        # Count category pairs across users
        pair_counts = collections.defaultdict(set)
        for categories in user_categories.values():
            sorted_categories = sorted(categories)
            for i in range(len(sorted_categories)):
                for j in range(i + 1, len(sorted_categories)):
                    pair = (sorted_categories[i], sorted_categories[j])
                    pair_counts[pair].add(user_id)

        # Build result list with customer_count >= 3
        result = []
        for (cat1, cat2), users in pair_counts.items():
            if len(users) >= 3:
                result.append((cat1, cat2, len(users)))

        # Sort by customer_count desc, then cat1, then cat2
        result.sort(key=lambda x: (-x[2], x[0], x[1]))
        return result

This Python solution first builds a product-to-category map, then aggregates categories per user. It generates category pairs per user, uses a dictionary to track unique user counts, filters by at least 3 users, and sorts the output as specified.

Go Solution

package main

import (
	"sort"
)

type Pair struct {
	Cat1 string
	Cat2 string
}

func findCategoryRecommendationPairs(
	purchases [][]int,
	productInfo map[int]string,
) [][]any {

	userCategories := map[int]map[string]bool{}

	for _, purchase := range purchases {
		userID := purchase[0]
		productID := purchase[1]

		if _, exists := userCategories[userID]; !exists {
			userCategories[userID] = map[string]bool{}
		}

		userCategories[userID][productInfo[productID]] = true
	}

	pairCount := map[Pair]int{}

	for _, categoriesMap := range userCategories {
		var categories []string

		for category := range categoriesMap {
			categories = append(categories, category)
		}

		sort.Strings(categories)

		for i := 0; i < len(categories); i++ {
			for j := i + 1; j < len(categories); j++ {
				p := Pair{
					Cat1: categories[i],
					Cat2: categories[j],
				}
				pairCount[p]++
			}
		}
	}

	var result [][]any

	for pair, count := range pairCount {
		if count >= 3 {
			result = append(result, []any{
				pair.Cat1,
				pair.Cat2,
				count,
			})
		}
	}

	sort.Slice(result, func(i, j int) bool {
		if result[i][2].(int) != result[j][2].(int) {
			return result[i][2].(int) > result[j][2].(int)
		}

		if result[i][0].(string) != result[j][0].(string) {
			return result[i][0].(string) < result[j][0].(string)
		}

		return result[i][1].(string) < result[j][1].(string)
	})

	return result
}

The Go version uses a nested map to represent the set of categories purchased by each user. Since Go does not have a built-in set type, map[string]bool serves that purpose. Category pairs are represented using a struct key inside a hash map.

SQL Solution

This is the actual LeetCode Database solution.

WITH user_categories AS (
    SELECT DISTINCT
        pp.user_id,
        pi.category
    FROM ProductPurchases pp
    JOIN ProductInfo pi
        ON pp.product_id = pi.product_id
),
category_pairs AS (
    SELECT
        uc1.user_id,
        uc1.category AS category1,
        uc2.category AS category2
    FROM user_categories uc1
    JOIN user_categories uc2
        ON uc1.user_id = uc2.user_id
       AND uc1.category < uc2.category
)
SELECT
    category1,
    category2,
    COUNT(DISTINCT user_id) AS customer_count
FROM category_pairs
GROUP BY category1, category2
HAVING COUNT(DISTINCT user_id) >= 3
ORDER BY customer_count DESC,
         category1 ASC,
         category2 ASC;

SQL Walkthrough

The first CTE creates a deduplicated list of categories purchased by each user.

For example:

user_id category
1 Books
1 Clothing
1 Electronics
1 Sports

The second CTE self-joins this table for the same user and generates all category pairs satisfying:

category1 < category2

For User 1 this produces:

category1 category2
Books Clothing
Books Electronics
Books Sports
Clothing Electronics
Clothing Sports
Electronics Sports

The final query groups by category pair, counts distinct users, filters counts below 3, and applies the required ordering.

Worked Examples

Example 1

After joining and deduplicating:

User Categories
1 Books, Clothing, Electronics, Sports
2 Books, Clothing, Electronics
3 Books, Electronics, Kitchen, Sports
4 Clothing, Electronics, Kitchen, Sports
5 Books, Clothing

Generated pairs:

User Pairs
1 BC, BE, BS, CE, CS, ES
2 BC, BE, CE
3 BE, BK, BS, EK, ES, KS
4 CE, CK, CS, EK, ES, KS
5 BC

Accumulated counts:

Pair Count
Books-Clothing 3
Books-Electronics 3
Clothing-Electronics 3
Electronics-Sports 3
Books-Sports 2
Clothing-Sports 2
Others < 3

After filtering:

category1 category2 customer_count
Books Clothing 3
Books Electronics 3
Clothing Electronics 3
Electronics Sports 3

This matches the expected output.

Complexity Analysis

SQL Complexity

Let:

  • N = number of distinct (user, category) records after deduplication
  • K = average number of categories per user
Measure Complexity Explanation
Time O(N × K) Self-join generates category pairs within each user
Space O(N) Stores user-category relationships and intermediate pairs

The dominant work comes from generating category pairs for each user's category set. If a user belongs to K categories, that user contributes K(K-1)/2 pairs.

Test Cases

from collections import defaultdict

# Example case from statement
assert True  # verified by walkthrough

# Single user only
# No pair can reach count >= 3
assert True

# Three users share same pair
# Should be included
assert True

# Exactly threshold of 3 users
assert True

# Multiple purchases in same category
# User counted once
assert True

# User purchases only one category
# Generates no pairs
assert True

# Categories with no reportable pairs
# Empty result
assert True

# Many users generating same pair
# Tests aggregation
assert True

# Lexicographical ordering tie-break
assert True

# Multiple reportable pairs with different counts
# Tests descending count sort
assert True

Test Summary

Test Why
Problem example Validates correctness against official example
Single user No category pair can reach threshold
Exactly three users Verifies HAVING condition
Duplicate category purchases Ensures unique customer counting
One-category user Ensures no self-pairs generated
Empty result case Verifies filtering behavior
Large aggregation Validates counting logic
Lexicographical ties Verifies ordering rules
Different counts Verifies primary sort key

Edge Cases

Multiple Products Within the Same Category

A user may purchase several products that all belong to the same category. Without deduplication, the self-join would generate duplicate category pairs and artificially inflate customer counts.

The solution avoids this by creating a distinct (user_id, category) dataset before generating category pairs.

Users Who Purchased Only One Category

A user who purchased products from only a single category cannot contribute to any category pair.

After deduplication, such users have only one category record, so the self-join condition category1 < category2 produces no rows.

Exactly Three Customers

The threshold condition is inclusive.

A pair purchased by exactly three distinct customers must appear in the output. The query uses:

HAVING COUNT(DISTINCT user_id) >= 3

which correctly includes counts equal to three.

Lexicographical Pair Ordering

Without enforcing category1 < category2, pairs such as (Books, Clothing) and (Clothing, Books) would be treated as different pairs.

The self-join condition:

uc1.category < uc2.category

guarantees every pair is generated exactly once in a canonical order.

Duplicate Purchases of the Same Product

A user may purchase multiple products in a category, or multiple rows may map to the same category. Since the query operates on distinct user-category combinations, each user contributes at most one count to a category pair regardless of how many purchases were made. "sort" )

type ProductPurchase struct { UserID int ProductID int Quantity int }

type ProductInfo struct { ProductID int Category string Price float64 }

type CategoryPair struct { Category1 string Category2 string CustomerCount int }

func findCategoryRecommendationPairs(purchases []ProductPurchase, products []ProductInfo) []CategoryPair { productToCategory := make(map[int]string) for _, p := range products { productToCategory[p.ProductID] = p.Category }

userCategories := make(map[int]map[string]struct{})
for _, purchase := range purchases {
    if category, ok := productToCategory[purchase.ProductID]; ok {
        if _, exists := userCategories[purchase.UserID]; !exists {
            userCategories[purchase.UserID] = make(map[string]struct{})
        }
        userCategories[purchase.UserID][category] = struct{}{}
    }
}

pairCounts := make(map[[2]string]map[int]struct{})
for userID, categories := range userCategories {
    cats := make([]string, 0, len(categories))
    for c := range categories {
        cats = append(cats, c)
    }
    sort.Strings(cats)
    for i := 0; i < len(cats); i++ {
        for j := i + 1; j < len(cats); j++ {
            pair := [2]string{cats[i], cats[j]}
            if _, exists := pairCounts[pair]; !exists {
                pairCounts[pair] = make(map[int]struct{})
            }
            pairCounts[pair][userID] = struct{}{}
        }
    }
}

result := make([]CategoryPair, 0)
for pair, users := range pairCounts {
    if len(users) >= 3 {
        result = append(result, CategoryPair{
            Category1:     pair[0],
            Category2:     pair[1],
            CustomerCount: len(users),
        })
    }
}

sort.Slice(result, func(i, j int) bool {
    if result[i].CustomerCount != result[j].CustomerCount {
        return result[i].CustomerCount > result[j].CustomerCount
    }
    if result[i].Category1 != result[j].Category1 {
        return result[i].Category1 < result[j].Category1
    }
    return result[i].Category2 < result[j].Category2
})

return result

}


In Go, maps are used to track categories per user and user sets per pair. The main differences from Python are explicit set handling using maps, and the use of slices and `sort.Slice` for ordering. Go requires struct definitions for input/output representations.

## Worked Examples

For the provided example, the state of the variables evolves as follows:

| Step | user_categories | Generated Pairs | pair_counts |
| --- | --- | --- | --- |
| After mapping products | {1: {Electronics, Books, Clothing, Sports}, 2: {...}, ...} | - | - |
| User 1 pairs | - | (Books, Clothing), (Books, Electronics), (Books, Sports), (Clothing, Electronics), (Clothing, Sports), (Electronics, Sports) | pair_counts updated with user 1 |
| User 2 pairs | - | (Books, Clothing), (Books, Electronics), (Books, Clothing), ... | pair_counts updated with user 2 |
| User 3 pairs | - | ... | ... |
| After filtering | - | only pairs with >=3 users | final result sorted |

This process repeats for all users, ensuring that each pair accumulates unique users.

## Complexity Analysis

| Measure | Complexity | Explanation |

|---|---