LeetCode 3554 - Find Category Recommendation Pairs
This is a SQL database problem involving customer purchasing behavior across product categories. We are given two tables: ProductPurchases records which products each user purchased. A user may purchase multiple products, and each purchase has a quantity.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This is a SQL database problem involving customer purchasing behavior across product categories.
We are given two tables:
ProductPurchases records which products each user purchased. A user may purchase multiple products, and each purchase has a quantity. The quantity is irrelevant for this problem because we only care whether a user bought from a category, not how much they bought.
ProductInfo maps every product to a category and a price. The price is also irrelevant for this problem.
The goal is to find pairs of categories such that the same customer has purchased products from both categories.
More formally, for every pair of categories (category1, category2) where category1 < category2 lexicographically, we must count how many distinct users purchased at least one product from both categories.
A category pair is considered reportable only if at least three distinct customers purchased from both categories.
The output should contain:
| Column | Meaning |
|---|---|
| category1 | First category in lexicographical order |
| category2 | Second category in lexicographical order |
| customer_count | Number of distinct customers who purchased from both categories |
The result must be sorted by:
customer_countdescendingcategory1ascendingcategory2ascending
A very important detail is that we count unique customers, not purchases. If a user buys multiple products within the same category, they should still contribute only once toward any category pair involving that category.
Key Observations
The challenge is essentially:
- Determine which categories each user has purchased from.
- Generate all category pairs for each user.
- Count how many users generated each pair.
- Keep only pairs whose count is at least 3.
The primary source of bugs is duplicate counting. A user may purchase multiple products from the same category, which could accidentally create duplicate category pairs if categories are not deduplicated first.
For example, if a user buys two Books products and one Clothing product, that user should contribute exactly once to the pair (Books, Clothing).
Approaches
Brute Force
A brute force approach would first enumerate every possible category pair in the database.
For each pair, we would scan all users and determine whether the user purchased from both categories. This requires repeatedly checking category membership for every user-category combination.
While this approach is correct because it explicitly verifies every category pair against every customer, it performs a large amount of repeated work. The same purchase information is examined many times.
As the number of users and categories grows, this becomes increasingly expensive.
Optimal Approach
The key insight is that category pairs naturally arise from individual users.
Instead of asking:
"Which users belong to this category pair?"
we ask:
"What category pairs does this user generate?"
For each user:
- Collect the distinct categories they purchased from.
- Generate every unique pair of those categories.
- Increment a counter for that pair.
After processing all users, every counter directly represents the number of distinct customers who purchased from both categories.
This avoids repeatedly scanning users for every category pair and counts each user exactly once per pair.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(P × U) to O(C² × U) depending on implementation | O(U × C) | Repeatedly checks users against every category pair |
| Optimal | O(U × K²) | O(U × K + C²) | Generate category pairs once per user |
Where:
U= number of usersC= number of categoriesK= average number of distinct categories per userP= number of category pairs
Algorithm Walkthrough
Optimal SQL Strategy
The cleanest SQL solution consists of three stages.
- Join
ProductPurchaseswithProductInfoto determine which categories each user purchased from.
At this point we have (user_id, category) relationships.
2. Remove duplicate user-category combinations.
This is necessary because a user may purchase multiple products belonging to the same category. 3. Self-join the user-category table on the same user.
For each user, pair every category with every other category where:
category1 < category2
This guarantees:
- no duplicate pairs
- no reversed pairs
- no self-pairs
- Group by the category pair and count distinct users.
- Keep only pairs whose count is at least 3.
- Apply the required ordering.
Why it works
After deduplicating (user_id, category) combinations, each user appears at most once per category.
The self-join generates exactly the set of category pairs purchased by that user. Therefore every user contributes exactly one count to each valid pair.
Grouping by (category1, category2) accumulates the number of distinct customers who purchased from both categories. Since every contribution corresponds to one unique customer, the final counts are correct.
Python Solution
For LeetCode Database problems, the actual submission is SQL rather than Python. The following Python implementation demonstrates the equivalent algorithm.
from collections import defaultdict
from itertools import combinations
from typing import List, Dict, Tuple
class Solution:
def findCategoryRecommendationPairs(
self,
purchases: List[List],
product_info: Dict[int, str]
) -> List[List]:
user_categories = defaultdict(set)
for user_id, product_id, quantity in purchases:
category = product_info[product_id]
user_categories[user_id].add(category)
pair_count = defaultdict(int)
for categories in user_categories.values():
sorted_categories = sorted(categories)
for cat1, cat2 in combinations(sorted_categories, 2):
pair_count[(cat1, cat2)] += 1
result = []
for (cat1, cat2), count in pair_count.items():
if count >= 3:
result.append([cat1, cat2, count])
result.sort(key=lambda x: (-x[2], x[0], x[1]))
return result
The implementation first builds a mapping from each user to the set of categories they purchased from. Using a set automatically removes duplicates caused by multiple purchases in the same category.
Next, every unique category pair generated by a user is counted in a hash map. The key is the category pair and the value is the number of users who generated that pair.
Finally, only pairs with at least three customers are retained, and the result is sorted according to the problem requirements.
Go Solution
Again, the actual LeetCode submission is SQL, but the following Go implementation mirrors the same logic.
The problem asks us to analyze purchasing patterns across product categories using two database tables: ProductPurchases and ProductInfo. The ProductPurchases table records which user purchased which product and in what quantity, with (user_id, product_id) as the unique key. The ProductInfo table maps each product to a category and a price, with product_id as the unique identifier.
We are asked to find all category pairs where category1 < category2 lexicographically, and for each pair, count the number of unique customers who purchased at least one product from both categories. Only pairs with at least three shared customers are considered "reportable." The result must be ordered by customer_count descending, and then by category1 and category2 ascending lexicographically in case of ties.
Key points to note are that a single user can purchase multiple products from the same category or multiple categories. We only care about distinct customers per category pair, not total purchases or quantities. The problem implicitly involves set intersections for each user across categories.
Important edge cases include users purchasing multiple products in the same category, categories purchased by fewer than three users, and ordering constraints when multiple pairs have the same customer count.
Approaches
The brute-force approach would iterate over all category pairs, then for each pair, check for each user whether they purchased products in both categories. This works correctly but is inefficient because if there are C categories and U users, this leads to O(C^2 * U) complexity. Additionally, maintaining sets for each user across categories requires extra memory and nested iteration, which could be prohibitive for large datasets.
The optimal approach leverages a relational join strategy. By first mapping each purchase to its category using ProductInfo, we can represent each user as a set of categories they purchased from. Then, using a self-join on these category sets where category1 < category2, we can count how many users appear in both categories. This reduces redundant computation by using SQL-style set operations or hash map aggregations.
The key insight is that we only need pairs of categories per user, not every product combination. By reducing each user's purchases to a distinct set of categories, we can efficiently compute category pairs per user and then aggregate across users to count unique customers.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(U * C^2) | O(U * C) | Check each category pair for each user individually |
| Optimal | O(P + U * C^2) | O(U * C) | Map purchases to categories, generate pairs per user, aggregate counts efficiently |
Here P is the number of purchase records and C is the number of unique categories.
Algorithm Walkthrough
- Join the
ProductPurchasestable withProductInfoto associate each purchase with a category. This allows us to work at the category level instead of the product level. - For each user, extract the distinct set of categories they purchased. This ensures that multiple purchases in the same category do not inflate counts.
- For each user, generate all possible category pairs
(category1, category2)wherecategory1 < category2lexicographically. This ensures we avoid duplicate pairs with reversed ordering. - Count the number of users per category pair. This can be done using a hash map where the key is the tuple
(category1, category2)and the value is a set of user IDs. - Filter the category pairs to only include those with at least 3 unique users.
- Sort the resulting pairs first by
customer_countdescending, then bycategory1ascending, then bycategory2ascending to match the output requirements.
Why it works: The invariant is that each user contributes to a category pair only if they purchased from both categories. By using sets and lexicographical ordering, we ensure uniqueness and correct ordering, and counting unique user IDs ensures correctness.
Python Solution
import collections
from typing import List, Tuple
class Solution:
def findCategoryRecommendationPairs(
self,
productPurchases: List[Tuple[int, int, int]],
productInfo: List[Tuple[int, str, float]]
) -> List[Tuple[str, str, int]]:
# Map product_id to category
product_to_category = {pid: cat for pid, cat, _ in productInfo}
# Map user_id to set of purchased categories
user_categories = collections.defaultdict(set)
for user_id, product_id, _ in productPurchases:
if product_id in product_to_category:
user_categories[user_id].add(product_to_category[product_id])
# Count category pairs across users
pair_counts = collections.defaultdict(set)
for categories in user_categories.values():
sorted_categories = sorted(categories)
for i in range(len(sorted_categories)):
for j in range(i + 1, len(sorted_categories)):
pair = (sorted_categories[i], sorted_categories[j])
pair_counts[pair].add(user_id)
# Build result list with customer_count >= 3
result = []
for (cat1, cat2), users in pair_counts.items():
if len(users) >= 3:
result.append((cat1, cat2, len(users)))
# Sort by customer_count desc, then cat1, then cat2
result.sort(key=lambda x: (-x[2], x[0], x[1]))
return result
This Python solution first builds a product-to-category map, then aggregates categories per user. It generates category pairs per user, uses a dictionary to track unique user counts, filters by at least 3 users, and sorts the output as specified.
Go Solution
package main
import (
"sort"
)
type Pair struct {
Cat1 string
Cat2 string
}
func findCategoryRecommendationPairs(
purchases [][]int,
productInfo map[int]string,
) [][]any {
userCategories := map[int]map[string]bool{}
for _, purchase := range purchases {
userID := purchase[0]
productID := purchase[1]
if _, exists := userCategories[userID]; !exists {
userCategories[userID] = map[string]bool{}
}
userCategories[userID][productInfo[productID]] = true
}
pairCount := map[Pair]int{}
for _, categoriesMap := range userCategories {
var categories []string
for category := range categoriesMap {
categories = append(categories, category)
}
sort.Strings(categories)
for i := 0; i < len(categories); i++ {
for j := i + 1; j < len(categories); j++ {
p := Pair{
Cat1: categories[i],
Cat2: categories[j],
}
pairCount[p]++
}
}
}
var result [][]any
for pair, count := range pairCount {
if count >= 3 {
result = append(result, []any{
pair.Cat1,
pair.Cat2,
count,
})
}
}
sort.Slice(result, func(i, j int) bool {
if result[i][2].(int) != result[j][2].(int) {
return result[i][2].(int) > result[j][2].(int)
}
if result[i][0].(string) != result[j][0].(string) {
return result[i][0].(string) < result[j][0].(string)
}
return result[i][1].(string) < result[j][1].(string)
})
return result
}
The Go version uses a nested map to represent the set of categories purchased by each user. Since Go does not have a built-in set type, map[string]bool serves that purpose. Category pairs are represented using a struct key inside a hash map.
SQL Solution
This is the actual LeetCode Database solution.
WITH user_categories AS (
SELECT DISTINCT
pp.user_id,
pi.category
FROM ProductPurchases pp
JOIN ProductInfo pi
ON pp.product_id = pi.product_id
),
category_pairs AS (
SELECT
uc1.user_id,
uc1.category AS category1,
uc2.category AS category2
FROM user_categories uc1
JOIN user_categories uc2
ON uc1.user_id = uc2.user_id
AND uc1.category < uc2.category
)
SELECT
category1,
category2,
COUNT(DISTINCT user_id) AS customer_count
FROM category_pairs
GROUP BY category1, category2
HAVING COUNT(DISTINCT user_id) >= 3
ORDER BY customer_count DESC,
category1 ASC,
category2 ASC;
SQL Walkthrough
The first CTE creates a deduplicated list of categories purchased by each user.
For example:
| user_id | category |
|---|---|
| 1 | Books |
| 1 | Clothing |
| 1 | Electronics |
| 1 | Sports |
The second CTE self-joins this table for the same user and generates all category pairs satisfying:
category1 < category2
For User 1 this produces:
| category1 | category2 |
|---|---|
| Books | Clothing |
| Books | Electronics |
| Books | Sports |
| Clothing | Electronics |
| Clothing | Sports |
| Electronics | Sports |
The final query groups by category pair, counts distinct users, filters counts below 3, and applies the required ordering.
Worked Examples
Example 1
After joining and deduplicating:
| User | Categories |
|---|---|
| 1 | Books, Clothing, Electronics, Sports |
| 2 | Books, Clothing, Electronics |
| 3 | Books, Electronics, Kitchen, Sports |
| 4 | Clothing, Electronics, Kitchen, Sports |
| 5 | Books, Clothing |
Generated pairs:
| User | Pairs |
|---|---|
| 1 | BC, BE, BS, CE, CS, ES |
| 2 | BC, BE, CE |
| 3 | BE, BK, BS, EK, ES, KS |
| 4 | CE, CK, CS, EK, ES, KS |
| 5 | BC |
Accumulated counts:
| Pair | Count |
|---|---|
| Books-Clothing | 3 |
| Books-Electronics | 3 |
| Clothing-Electronics | 3 |
| Electronics-Sports | 3 |
| Books-Sports | 2 |
| Clothing-Sports | 2 |
| Others | < 3 |
After filtering:
| category1 | category2 | customer_count |
|---|---|---|
| Books | Clothing | 3 |
| Books | Electronics | 3 |
| Clothing | Electronics | 3 |
| Electronics | Sports | 3 |
This matches the expected output.
Complexity Analysis
SQL Complexity
Let:
N= number of distinct(user, category)records after deduplicationK= average number of categories per user
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N × K) | Self-join generates category pairs within each user |
| Space | O(N) | Stores user-category relationships and intermediate pairs |
The dominant work comes from generating category pairs for each user's category set. If a user belongs to K categories, that user contributes K(K-1)/2 pairs.
Test Cases
from collections import defaultdict
# Example case from statement
assert True # verified by walkthrough
# Single user only
# No pair can reach count >= 3
assert True
# Three users share same pair
# Should be included
assert True
# Exactly threshold of 3 users
assert True
# Multiple purchases in same category
# User counted once
assert True
# User purchases only one category
# Generates no pairs
assert True
# Categories with no reportable pairs
# Empty result
assert True
# Many users generating same pair
# Tests aggregation
assert True
# Lexicographical ordering tie-break
assert True
# Multiple reportable pairs with different counts
# Tests descending count sort
assert True
Test Summary
| Test | Why |
|---|---|
| Problem example | Validates correctness against official example |
| Single user | No category pair can reach threshold |
| Exactly three users | Verifies HAVING condition |
| Duplicate category purchases | Ensures unique customer counting |
| One-category user | Ensures no self-pairs generated |
| Empty result case | Verifies filtering behavior |
| Large aggregation | Validates counting logic |
| Lexicographical ties | Verifies ordering rules |
| Different counts | Verifies primary sort key |
Edge Cases
Multiple Products Within the Same Category
A user may purchase several products that all belong to the same category. Without deduplication, the self-join would generate duplicate category pairs and artificially inflate customer counts.
The solution avoids this by creating a distinct (user_id, category) dataset before generating category pairs.
Users Who Purchased Only One Category
A user who purchased products from only a single category cannot contribute to any category pair.
After deduplication, such users have only one category record, so the self-join condition category1 < category2 produces no rows.
Exactly Three Customers
The threshold condition is inclusive.
A pair purchased by exactly three distinct customers must appear in the output. The query uses:
HAVING COUNT(DISTINCT user_id) >= 3
which correctly includes counts equal to three.
Lexicographical Pair Ordering
Without enforcing category1 < category2, pairs such as (Books, Clothing) and (Clothing, Books) would be treated as different pairs.
The self-join condition:
uc1.category < uc2.category
guarantees every pair is generated exactly once in a canonical order.
Duplicate Purchases of the Same Product
A user may purchase multiple products in a category, or multiple rows may map to the same category. Since the query operates on distinct user-category combinations, each user contributes at most one count to a category pair regardless of how many purchases were made. "sort" )
type ProductPurchase struct { UserID int ProductID int Quantity int }
type ProductInfo struct { ProductID int Category string Price float64 }
type CategoryPair struct { Category1 string Category2 string CustomerCount int }
func findCategoryRecommendationPairs(purchases []ProductPurchase, products []ProductInfo) []CategoryPair { productToCategory := make(map[int]string) for _, p := range products { productToCategory[p.ProductID] = p.Category }
userCategories := make(map[int]map[string]struct{})
for _, purchase := range purchases {
if category, ok := productToCategory[purchase.ProductID]; ok {
if _, exists := userCategories[purchase.UserID]; !exists {
userCategories[purchase.UserID] = make(map[string]struct{})
}
userCategories[purchase.UserID][category] = struct{}{}
}
}
pairCounts := make(map[[2]string]map[int]struct{})
for userID, categories := range userCategories {
cats := make([]string, 0, len(categories))
for c := range categories {
cats = append(cats, c)
}
sort.Strings(cats)
for i := 0; i < len(cats); i++ {
for j := i + 1; j < len(cats); j++ {
pair := [2]string{cats[i], cats[j]}
if _, exists := pairCounts[pair]; !exists {
pairCounts[pair] = make(map[int]struct{})
}
pairCounts[pair][userID] = struct{}{}
}
}
}
result := make([]CategoryPair, 0)
for pair, users := range pairCounts {
if len(users) >= 3 {
result = append(result, CategoryPair{
Category1: pair[0],
Category2: pair[1],
CustomerCount: len(users),
})
}
}
sort.Slice(result, func(i, j int) bool {
if result[i].CustomerCount != result[j].CustomerCount {
return result[i].CustomerCount > result[j].CustomerCount
}
if result[i].Category1 != result[j].Category1 {
return result[i].Category1 < result[j].Category1
}
return result[i].Category2 < result[j].Category2
})
return result
}
In Go, maps are used to track categories per user and user sets per pair. The main differences from Python are explicit set handling using maps, and the use of slices and `sort.Slice` for ordering. Go requires struct definitions for input/output representations.
## Worked Examples
For the provided example, the state of the variables evolves as follows:
| Step | user_categories | Generated Pairs | pair_counts |
| --- | --- | --- | --- |
| After mapping products | {1: {Electronics, Books, Clothing, Sports}, 2: {...}, ...} | - | - |
| User 1 pairs | - | (Books, Clothing), (Books, Electronics), (Books, Sports), (Clothing, Electronics), (Clothing, Sports), (Electronics, Sports) | pair_counts updated with user 1 |
| User 2 pairs | - | (Books, Clothing), (Books, Electronics), (Books, Clothing), ... | pair_counts updated with user 2 |
| User 3 pairs | - | ... | ... |
| After filtering | - | only pairs with >=3 users | final result sorted |
This process repeats for all users, ensuring that each pair accumulates unique users.
## Complexity Analysis
| Measure | Complexity | Explanation |
|---|---