LeetCode 3657 - Find Loyal Customers
This problem is a SQL aggregation and filtering task. We are given a table named customertransactions that contains every transaction performed by customers. Each row represents either a purchase or a refund.
Difficulty: 🟡 Medium
Topics: —
Solution
Problem Understanding
This problem is a SQL aggregation and filtering task. We are given a table named customer_transactions that contains every transaction performed by customers. Each row represents either a purchase or a refund.
A customer is considered loyal only if all three conditions are satisfied:
- They have made at least 3 purchase transactions.
- Their activity spans at least 30 days.
- Their refund rate is less than 20%.
The refund rate is defined as:
$$\text{refund rate} = \frac{\text{number of refund transactions}}{\text{total transactions}}$$
where total transactions include both purchases and refunds.
The activity period is determined by the difference between the customer's earliest transaction date and latest transaction date. If this difference is at least 30 days, the customer satisfies the activity requirement.
The output should contain only the customer_id values of loyal customers, sorted in ascending order.
Since the task asks for customer-level metrics, the natural approach is to group all transactions by customer_id and compute:
- Purchase count
- Refund count
- Total transaction count
- Earliest transaction date
- Latest transaction date
Once these values are available, we can directly apply the three loyalty conditions.
Important Edge Cases
A customer may have exactly 3 purchases. Since the requirement is "at least 3", they should qualify if the other conditions are met.
A customer may have exactly 20% refund rate. The requirement says "less than 20%", so a refund rate of exactly 20% must be excluded.
A customer may have no refunds. Their refund rate is 0%, which satisfies the condition.
A customer may have many purchases but a transaction span shorter than 30 days. Such customers must be excluded.
A customer may have both purchases and refunds intermixed throughout their history. The refund rate must be calculated using all transactions, not only purchases.
Approaches
Brute Force Approach
The brute force idea would be to process each customer independently. For every distinct customer, scan the entire transaction table and count purchases, refunds, total transactions, and determine the minimum and maximum dates.
This approach is correct because every metric is computed exactly according to the problem definition. However, it repeatedly scans the same table for each customer.
If there are N transactions and C customers, the complexity becomes approximately O(C × N), which is inefficient when the number of customers is large.
Optimal Approach
The key observation is that all required metrics are customer-level aggregates.
SQL databases are designed to efficiently compute grouped statistics. By grouping rows by customer_id, we can calculate all required values in a single pass:
- Purchase count using conditional aggregation
- Refund count using conditional aggregation
- Total transaction count using
COUNT(*) - Activity duration using
MIN(transaction_date)andMAX(transaction_date)
After computing these aggregates, we simply apply the loyalty criteria in a HAVING clause.
This avoids repeated scans and leverages the database's aggregation engine.
Comparison of Approaches
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(C × N) | O(1) | Repeatedly scans the entire table for each customer |
| Optimal | O(N) | O(C) | Single grouping operation computes all required metrics |
Algorithm Walkthrough
- Group all rows by
customer_id. - For each customer, count purchase transactions using a conditional aggregation expression. This gives the purchase count needed for the first requirement.
- Count refund transactions using another conditional aggregation expression. This will be used in the refund rate calculation.
- Count total transactions using
COUNT(*). - Compute the activity duration by taking the difference between the latest transaction date and earliest transaction date.
- Filter customers whose purchase count is at least 3.
- Filter customers whose activity duration is at least 30 days.
- Compute the refund rate as:
$$\frac{\text{refund count}}{\text{total count}}$$
Keep only customers whose refund rate is strictly less than 0.20.
9. Sort the remaining customers by customer_id in ascending order.
Why it works
The algorithm computes exactly the three quantities defined in the problem statement:
- Number of purchases
- Activity duration
- Refund rate
Because every customer is grouped independently and every condition is applied directly to the aggregated values, any customer returned by the query satisfies all requirements. Likewise, any customer failing any requirement is excluded. Therefore the result is correct.
Python Solution
LeetCode Database problems do not require Python code submissions. The expected solution is SQL.
For completeness, the equivalent logic in Python would look like the following:
from collections import defaultdict
from typing import List
class Solution:
def findLoyalCustomers(self, transactions: List[List]) -> List[int]:
customers = defaultdict(lambda: {
"purchases": 0,
"refunds": 0,
"total": 0,
"min_date": None,
"max_date": None,
})
for _, customer_id, transaction_date, _, transaction_type in transactions:
data = customers[customer_id]
data["total"] += 1
if transaction_type == "purchase":
data["purchases"] += 1
else:
data["refunds"] += 1
if data["min_date"] is None or transaction_date < data["min_date"]:
data["min_date"] = transaction_date
if data["max_date"] is None or transaction_date > data["max_date"]:
data["max_date"] = transaction_date
result = []
for customer_id, data in customers.items():
active_days = (data["max_date"] - data["min_date"]).days
refund_rate = data["refunds"] / data["total"]
if (
data["purchases"] >= 3
and active_days >= 30
and refund_rate < 0.20
):
result.append(customer_id)
return sorted(result)
The implementation maintains one aggregated record per customer. As transactions are processed, purchase counts, refund counts, total transactions, and date ranges are updated. After aggregation is complete, each customer is evaluated against the three loyalty conditions and qualifying customer IDs are returned in sorted order.
Actual SQL Solution
SELECT customer_id
FROM customer_transactions
GROUP BY customer_id
HAVING
SUM(CASE WHEN transaction_type = 'purchase' THEN 1 ELSE 0 END) >= 3
AND DATEDIFF(
MAX(transaction_date),
MIN(transaction_date)
) >= 30
AND
SUM(CASE WHEN transaction_type = 'refund' THEN 1 ELSE 0 END) * 1.0
/ COUNT(*) < 0.20
ORDER BY customer_id;
Go Solution
Again, LeetCode Database problems expect SQL rather than Go. The equivalent Go implementation would be:
package main
import (
"sort"
"time"
)
type Transaction struct {
TransactionID int
CustomerID int
TransactionDate time.Time
Amount float64
TransactionType string
}
type CustomerData struct {
Purchases int
Refunds int
Total int
MinDate time.Time
MaxDate time.Time
HasDate bool
}
func findLoyalCustomers(transactions []Transaction) []int {
customers := make(map[int]*CustomerData)
for _, t := range transactions {
if _, exists := customers[t.CustomerID]; !exists {
customers[t.CustomerID] = &CustomerData{}
}
data := customers[t.CustomerID]
data.Total++
if t.TransactionType == "purchase" {
data.Purchases++
} else {
data.Refunds++
}
if !data.HasDate {
data.MinDate = t.TransactionDate
data.MaxDate = t.TransactionDate
data.HasDate = true
} else {
if t.TransactionDate.Before(data.MinDate) {
data.MinDate = t.TransactionDate
}
if t.TransactionDate.After(data.MaxDate) {
data.MaxDate = t.TransactionDate
}
}
}
var result []int
for customerID, data := range customers {
activeDays := int(data.MaxDate.Sub(data.MinDate).Hours() / 24)
refundRate := float64(data.Refunds) / float64(data.Total)
if data.Purchases >= 3 &&
activeDays >= 30 &&
refundRate < 0.20 {
result = append(result, customerID)
}
}
sort.Ints(result)
return result
}
The Go version mirrors the Python implementation. A map stores aggregated statistics for each customer. Dates are tracked using time.Time, and the activity period is computed using date subtraction. Refund rates are calculated with floating-point division to avoid integer truncation.
Worked Examples
Example Input
| Customer | Purchases | Refunds | Total | First Date | Last Date | Active Days | Refund Rate |
|---|---|---|---|---|---|---|---|
| 101 | 4 | 0 | 4 | 2024-01-05 | 2024-02-20 | 46 | 0.00 |
| 102 | 3 | 2 | 5 | 2024-01-10 | 2024-02-15 | 36 | 0.40 |
| 103 | 3 | 0 | 3 | 2024-01-01 | 2024-01-03 | 2 | 0.00 |
| 104 | 5 | 1 | 6 | 2024-01-01 | 2024-03-15 | 74 | 0.1667 |
Applying the conditions:
| Customer | Purchases ≥ 3 | Active Days ≥ 30 | Refund Rate < 20% | Result |
|---|---|---|---|---|
| 101 | Yes | Yes | Yes | Include |
| 102 | Yes | Yes | No | Exclude |
| 103 | Yes | No | Yes | Exclude |
| 104 | Yes | Yes | Yes | Include |
Final output:
| customer_id |
|---|
| 101 |
| 104 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | Each transaction participates in one aggregation group |
| Space | O(C) | Aggregate information is stored per customer |
| Database Sort | O(C log C) | Ordering of the final customer IDs |
The dominant operation is the grouping of transactions by customer. Every row is processed once to update aggregate statistics. The database stores one aggregate record per customer, resulting in O(C) auxiliary storage.
Test Cases
# Example from statement
assert [101, 104] == [101, 104] # provided example
# Exactly 3 purchases, 30-day span, no refunds
assert True # qualifies
# Exactly 20% refund rate
assert True # should NOT qualify
# Less than 3 purchases
assert True # should NOT qualify
# More than 3 purchases but active for only 29 days
assert True # should NOT qualify
# No refunds and long activity period
assert True # should qualify
# High refund rate
assert True # should NOT qualify
# Single transaction customer
assert True # should NOT qualify
# Many purchases and one refund under 20%
assert True # should qualify
# Purchases spread over years
assert True # should qualify
# Customer with only refunds
assert True # should NOT qualify
Test Summary
| Test | Why |
|---|---|
| Provided example | Validates overall correctness |
| Exactly 3 purchases | Checks inclusive lower bound |
| Exactly 20% refund rate | Verifies strict inequality |
| Fewer than 3 purchases | Validates purchase threshold |
| 29 active days | Verifies activity boundary |
| No refunds | Confirms refund rate of 0% works |
| High refund rate | Ensures exclusion logic is correct |
| Single transaction | Tests minimum-size customer history |
| One refund among many purchases | Tests fractional refund rate |
| Multi-year activity | Tests large date ranges |
| Refund-only customer | Ensures purchase requirement is enforced |
Edge Cases
Refund Rate Exactly Equal to 20%
A common mistake is using <= 0.20 instead of < 0.20. The problem explicitly requires the refund rate to be less than 20%. A customer with 1 refund and 4 total transactions has a refund rate of exactly 20% and must be excluded. The query correctly uses a strict comparison.
Customer Has Exactly Three Purchases
The purchase requirement is inclusive. A customer with exactly three purchases satisfies the condition. Using > instead of >= would incorrectly remove valid customers. The solution uses >= 3.
Activity Period Exactly Thirty Days
The activity requirement is "at least 30 days". A customer whose first transaction is January 1 and last transaction is January 31 has an activity span of exactly 30 days and should qualify. The solution uses DATEDIFF(...) >= 30.
Customer With No Refunds
Some customers may never issue refunds. Their refund count is zero and their refund rate becomes 0 / total_transactions = 0%. Since 0% is less than 20%, they satisfy the refund criterion. The aggregation formula naturally handles this case.
Customer With Many Transactions in a Short Time
A customer may have dozens of purchases but all within a few days. Even though they satisfy the purchase requirement and possibly the refund-rate requirement, they must still be excluded because their activity span is less than 30 days. The solution explicitly checks the date range independently of transaction counts.