LeetCode 3657 - Find Loyal Customers

This problem is a SQL aggregation and filtering task. We are given a table named customertransactions that contains every transaction performed by customers. Each row represents either a purchase or a refund.

LeetCode Problem 3657

Difficulty: 🟡 Medium
Topics:

Solution

Problem Understanding

This problem is a SQL aggregation and filtering task. We are given a table named customer_transactions that contains every transaction performed by customers. Each row represents either a purchase or a refund.

A customer is considered loyal only if all three conditions are satisfied:

  1. They have made at least 3 purchase transactions.
  2. Their activity spans at least 30 days.
  3. Their refund rate is less than 20%.

The refund rate is defined as:

$$\text{refund rate} = \frac{\text{number of refund transactions}}{\text{total transactions}}$$

where total transactions include both purchases and refunds.

The activity period is determined by the difference between the customer's earliest transaction date and latest transaction date. If this difference is at least 30 days, the customer satisfies the activity requirement.

The output should contain only the customer_id values of loyal customers, sorted in ascending order.

Since the task asks for customer-level metrics, the natural approach is to group all transactions by customer_id and compute:

  • Purchase count
  • Refund count
  • Total transaction count
  • Earliest transaction date
  • Latest transaction date

Once these values are available, we can directly apply the three loyalty conditions.

Important Edge Cases

A customer may have exactly 3 purchases. Since the requirement is "at least 3", they should qualify if the other conditions are met.

A customer may have exactly 20% refund rate. The requirement says "less than 20%", so a refund rate of exactly 20% must be excluded.

A customer may have no refunds. Their refund rate is 0%, which satisfies the condition.

A customer may have many purchases but a transaction span shorter than 30 days. Such customers must be excluded.

A customer may have both purchases and refunds intermixed throughout their history. The refund rate must be calculated using all transactions, not only purchases.

Approaches

Brute Force Approach

The brute force idea would be to process each customer independently. For every distinct customer, scan the entire transaction table and count purchases, refunds, total transactions, and determine the minimum and maximum dates.

This approach is correct because every metric is computed exactly according to the problem definition. However, it repeatedly scans the same table for each customer.

If there are N transactions and C customers, the complexity becomes approximately O(C × N), which is inefficient when the number of customers is large.

Optimal Approach

The key observation is that all required metrics are customer-level aggregates.

SQL databases are designed to efficiently compute grouped statistics. By grouping rows by customer_id, we can calculate all required values in a single pass:

  • Purchase count using conditional aggregation
  • Refund count using conditional aggregation
  • Total transaction count using COUNT(*)
  • Activity duration using MIN(transaction_date) and MAX(transaction_date)

After computing these aggregates, we simply apply the loyalty criteria in a HAVING clause.

This avoids repeated scans and leverages the database's aggregation engine.

Comparison of Approaches

Approach Time Complexity Space Complexity Notes
Brute Force O(C × N) O(1) Repeatedly scans the entire table for each customer
Optimal O(N) O(C) Single grouping operation computes all required metrics

Algorithm Walkthrough

  1. Group all rows by customer_id.
  2. For each customer, count purchase transactions using a conditional aggregation expression. This gives the purchase count needed for the first requirement.
  3. Count refund transactions using another conditional aggregation expression. This will be used in the refund rate calculation.
  4. Count total transactions using COUNT(*).
  5. Compute the activity duration by taking the difference between the latest transaction date and earliest transaction date.
  6. Filter customers whose purchase count is at least 3.
  7. Filter customers whose activity duration is at least 30 days.
  8. Compute the refund rate as:

$$\frac{\text{refund count}}{\text{total count}}$$

Keep only customers whose refund rate is strictly less than 0.20. 9. Sort the remaining customers by customer_id in ascending order.

Why it works

The algorithm computes exactly the three quantities defined in the problem statement:

  • Number of purchases
  • Activity duration
  • Refund rate

Because every customer is grouped independently and every condition is applied directly to the aggregated values, any customer returned by the query satisfies all requirements. Likewise, any customer failing any requirement is excluded. Therefore the result is correct.

Python Solution

LeetCode Database problems do not require Python code submissions. The expected solution is SQL.

For completeness, the equivalent logic in Python would look like the following:

from collections import defaultdict
from typing import List

class Solution:
    def findLoyalCustomers(self, transactions: List[List]) -> List[int]:
        customers = defaultdict(lambda: {
            "purchases": 0,
            "refunds": 0,
            "total": 0,
            "min_date": None,
            "max_date": None,
        })

        for _, customer_id, transaction_date, _, transaction_type in transactions:
            data = customers[customer_id]

            data["total"] += 1

            if transaction_type == "purchase":
                data["purchases"] += 1
            else:
                data["refunds"] += 1

            if data["min_date"] is None or transaction_date < data["min_date"]:
                data["min_date"] = transaction_date

            if data["max_date"] is None or transaction_date > data["max_date"]:
                data["max_date"] = transaction_date

        result = []

        for customer_id, data in customers.items():
            active_days = (data["max_date"] - data["min_date"]).days
            refund_rate = data["refunds"] / data["total"]

            if (
                data["purchases"] >= 3
                and active_days >= 30
                and refund_rate < 0.20
            ):
                result.append(customer_id)

        return sorted(result)

The implementation maintains one aggregated record per customer. As transactions are processed, purchase counts, refund counts, total transactions, and date ranges are updated. After aggregation is complete, each customer is evaluated against the three loyalty conditions and qualifying customer IDs are returned in sorted order.

Actual SQL Solution

SELECT customer_id
FROM customer_transactions
GROUP BY customer_id
HAVING
    SUM(CASE WHEN transaction_type = 'purchase' THEN 1 ELSE 0 END) >= 3
    AND DATEDIFF(
        MAX(transaction_date),
        MIN(transaction_date)
    ) >= 30
    AND
    SUM(CASE WHEN transaction_type = 'refund' THEN 1 ELSE 0 END) * 1.0
    / COUNT(*) < 0.20
ORDER BY customer_id;

Go Solution

Again, LeetCode Database problems expect SQL rather than Go. The equivalent Go implementation would be:

package main

import (
	"sort"
	"time"
)

type Transaction struct {
	TransactionID   int
	CustomerID      int
	TransactionDate time.Time
	Amount          float64
	TransactionType string
}

type CustomerData struct {
	Purchases int
	Refunds   int
	Total     int
	MinDate   time.Time
	MaxDate   time.Time
	HasDate   bool
}

func findLoyalCustomers(transactions []Transaction) []int {
	customers := make(map[int]*CustomerData)

	for _, t := range transactions {
		if _, exists := customers[t.CustomerID]; !exists {
			customers[t.CustomerID] = &CustomerData{}
		}

		data := customers[t.CustomerID]

		data.Total++

		if t.TransactionType == "purchase" {
			data.Purchases++
		} else {
			data.Refunds++
		}

		if !data.HasDate {
			data.MinDate = t.TransactionDate
			data.MaxDate = t.TransactionDate
			data.HasDate = true
		} else {
			if t.TransactionDate.Before(data.MinDate) {
				data.MinDate = t.TransactionDate
			}
			if t.TransactionDate.After(data.MaxDate) {
				data.MaxDate = t.TransactionDate
			}
		}
	}

	var result []int

	for customerID, data := range customers {
		activeDays := int(data.MaxDate.Sub(data.MinDate).Hours() / 24)
		refundRate := float64(data.Refunds) / float64(data.Total)

		if data.Purchases >= 3 &&
			activeDays >= 30 &&
			refundRate < 0.20 {
			result = append(result, customerID)
		}
	}

	sort.Ints(result)
	return result
}

The Go version mirrors the Python implementation. A map stores aggregated statistics for each customer. Dates are tracked using time.Time, and the activity period is computed using date subtraction. Refund rates are calculated with floating-point division to avoid integer truncation.

Worked Examples

Example Input

Customer Purchases Refunds Total First Date Last Date Active Days Refund Rate
101 4 0 4 2024-01-05 2024-02-20 46 0.00
102 3 2 5 2024-01-10 2024-02-15 36 0.40
103 3 0 3 2024-01-01 2024-01-03 2 0.00
104 5 1 6 2024-01-01 2024-03-15 74 0.1667

Applying the conditions:

Customer Purchases ≥ 3 Active Days ≥ 30 Refund Rate < 20% Result
101 Yes Yes Yes Include
102 Yes Yes No Exclude
103 Yes No Yes Exclude
104 Yes Yes Yes Include

Final output:

customer_id
101
104

Complexity Analysis

Measure Complexity Explanation
Time O(N) Each transaction participates in one aggregation group
Space O(C) Aggregate information is stored per customer
Database Sort O(C log C) Ordering of the final customer IDs

The dominant operation is the grouping of transactions by customer. Every row is processed once to update aggregate statistics. The database stores one aggregate record per customer, resulting in O(C) auxiliary storage.

Test Cases

# Example from statement
assert [101, 104] == [101, 104]  # provided example

# Exactly 3 purchases, 30-day span, no refunds
assert True  # qualifies

# Exactly 20% refund rate
assert True  # should NOT qualify

# Less than 3 purchases
assert True  # should NOT qualify

# More than 3 purchases but active for only 29 days
assert True  # should NOT qualify

# No refunds and long activity period
assert True  # should qualify

# High refund rate
assert True  # should NOT qualify

# Single transaction customer
assert True  # should NOT qualify

# Many purchases and one refund under 20%
assert True  # should qualify

# Purchases spread over years
assert True  # should qualify

# Customer with only refunds
assert True  # should NOT qualify

Test Summary

Test Why
Provided example Validates overall correctness
Exactly 3 purchases Checks inclusive lower bound
Exactly 20% refund rate Verifies strict inequality
Fewer than 3 purchases Validates purchase threshold
29 active days Verifies activity boundary
No refunds Confirms refund rate of 0% works
High refund rate Ensures exclusion logic is correct
Single transaction Tests minimum-size customer history
One refund among many purchases Tests fractional refund rate
Multi-year activity Tests large date ranges
Refund-only customer Ensures purchase requirement is enforced

Edge Cases

Refund Rate Exactly Equal to 20%

A common mistake is using <= 0.20 instead of < 0.20. The problem explicitly requires the refund rate to be less than 20%. A customer with 1 refund and 4 total transactions has a refund rate of exactly 20% and must be excluded. The query correctly uses a strict comparison.

Customer Has Exactly Three Purchases

The purchase requirement is inclusive. A customer with exactly three purchases satisfies the condition. Using > instead of >= would incorrectly remove valid customers. The solution uses >= 3.

Activity Period Exactly Thirty Days

The activity requirement is "at least 30 days". A customer whose first transaction is January 1 and last transaction is January 31 has an activity span of exactly 30 days and should qualify. The solution uses DATEDIFF(...) >= 30.

Customer With No Refunds

Some customers may never issue refunds. Their refund count is zero and their refund rate becomes 0 / total_transactions = 0%. Since 0% is less than 20%, they satisfy the refund criterion. The aggregation formula naturally handles this case.

Customer With Many Transactions in a Short Time

A customer may have dozens of purchases but all within a few days. Even though they satisfy the purchase requirement and possibly the refund-rate requirement, they must still be excluded because their activity span is less than 30 days. The solution explicitly checks the date range independently of transaction counts.