LeetCode 1565 - Unique Orders and Customers Per Month
This problem requires analyzing a table of orders to calculate monthly statistics. Specifically, for each unique month present in the Orders table, we need to determine two metrics: the number of unique orders and the number of unique customers whose orders have an invoice…
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
This problem requires analyzing a table of orders to calculate monthly statistics. Specifically, for each unique month present in the Orders table, we need to determine two metrics: the number of unique orders and the number of unique customers whose orders have an invoice greater than 20. The input consists of a table with four columns: order_id (unique identifier), order_date (date of the order), customer_id (who made the order), and invoice (amount paid). The expected output is a table that aggregates by month (in YYYY-MM format), listing the count of qualifying orders and count of unique customers for that month.
The key constraints and details are: all orders have unique order_id, invoice amounts are numeric, and months with no orders above 20 should be excluded from the result. Edge cases include months where all invoices are ≤ 20, months with multiple orders from the same customer, and months with orders that straddle year boundaries.
Approaches
The brute-force approach would involve iterating through every row in the table, parsing the month from order_date, and maintaining separate lists or sets for qualifying orders and customers for each month. While this works, it becomes cumbersome when the dataset is large because it requires iterating multiple times and manually managing aggregation.
The optimal approach uses SQL aggregation functions. By first filtering out orders with invoice <= 20, we can then group by the extracted month and count distinct order_id and customer_id in a single query. The key insight is leveraging SQL's COUNT(DISTINCT ...) and DATE_FORMAT (or equivalent) to perform the aggregation directly in the database, avoiding any manual iteration or intermediate storage.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Iterate over all orders, manually aggregate monthly stats in a hash map |
| Optimal | O(n) | O(m) | SQL aggregation: filter, group by month, count distinct orders and customers, where n = total rows, m = number of months |
Algorithm Walkthrough
- Filter the
Orderstable to retain only rows whereinvoice > 20. This ensures only qualifying orders contribute to the counts. - Extract the month from the
order_datecolumn. In SQL, this can be done usingDATE_FORMAT(order_date, '%Y-%m'). This produces a string in theYYYY-MMformat. - Group the filtered table by the extracted month. This creates one group per unique month.
- Within each group, count the number of distinct
order_idvalues to getorder_count. - Within each group, count the number of distinct
customer_idvalues to getcustomer_count. - Return the aggregated table with columns
month,order_count, andcustomer_count. Sorting is not required.
Why it works: By filtering first, we ensure only orders with invoices greater than 20 are considered. Aggregating by month ensures that all orders and customers are correctly counted for the right month. Using COUNT(DISTINCT ...) prevents double-counting orders or customers, even if multiple orders exist for a customer in the same month.
Python Solution
import pandas as pd
def unique_orders_customers_per_month(orders: pd.DataFrame) -> pd.DataFrame:
# Filter orders with invoice > 20
filtered = orders[orders['invoice'] > 20].copy()
# Extract month in YYYY-MM format
filtered['month'] = filtered['order_date'].dt.to_period('M').astype(str)
# Group by month and compute distinct counts
result = filtered.groupby('month').agg(
order_count=('order_id', 'nunique'),
customer_count=('customer_id', 'nunique')
).reset_index()
return result
The Python implementation mirrors the SQL approach. First, it filters the DataFrame for invoices over 20. Then it creates a new column with the month extracted from the order date. Finally, it groups by month and calculates the number of unique orders and unique customers using nunique.
Go Solution
package main
import (
"database/sql"
"fmt"
_ "github.com/go-sql-driver/mysql"
)
type Result struct {
Month string
OrderCount int
CustomerCount int
}
func UniqueOrdersCustomersPerMonth(db *sql.DB) ([]Result, error) {
query := `
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
COUNT(DISTINCT order_id) AS order_count,
COUNT(DISTINCT customer_id) AS customer_count
FROM Orders
WHERE invoice > 20
GROUP BY month
`
rows, err := db.Query(query)
if err != nil {
return nil, err
}
defer rows.Close()
var results []Result
for rows.Next() {
var r Result
if err := rows.Scan(&r.Month, &r.OrderCount, &r.CustomerCount); err != nil {
return nil, err
}
results = append(results, r)
}
return results, nil
}
The Go solution executes a SQL query similar to the Python Pandas approach. The main difference is the explicit connection to the database and row scanning into a Go struct. COUNT(DISTINCT ...) handles unique orders and customers automatically. Error handling is essential for database operations.
Worked Examples
Using the example table:
- Filter orders with
invoice > 20. Rows remaining: 1, 2, 4, 7, 8, 9. - Extract months: 2020-09, 2020-09, 2020-10, 2020-12, 2020-12, 2021-01.
- Group by month:
- 2020-09: orders = [1, 2], customers = [1, 2]
- 2020-10: orders = [4], customers = [3]
- 2020-12: orders = [7, 8], customers = [4]
- 2021-01: orders = [9], customers = [3]
- Count distinct orders and customers for each month, producing the output table.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | We iterate over each row once to filter and extract months, then aggregate, which is linear in the number of orders n |
| Space | O(m) | Space required for the grouped result and intermediate structures, where m is the number of unique months |
The complexity is efficient because filtering and grouping operations are handled by vectorized operations in Python or SQL engine optimizations in Go.
Test Cases
import pandas as pd
orders = pd.DataFrame({
"order_id": [1,2,3,4,5,6,7,8,9,10],
"order_date": pd.to_datetime(["2020-09-15","2020-09-17","2020-10-06","2020-10-20","2020-11-10","2020-11-21","2020-12-01","2020-12-03","2021-01-07","2021-01-15"]),
"customer_id": [1,2,3,3,1,2,4,4,3,2],
"invoice": [30,90,20,21,10,15,55,77,31,20]
})
result = unique_orders_customers_per_month(orders)
assert result.loc[result['month']=='2020-09', 'order_count'].iloc[0] == 2 # September 2020
assert result.loc[result['month']=='2020-09', 'customer_count'].iloc[0] == 2
assert result.loc[result['month']=='2020-10', 'order_count'].iloc[0] == 1 # October 2020
assert result.loc[result['month']=='2020-10', 'customer_count'].iloc[0] == 1
assert result.loc[result['month']=='2020-12', 'order_count'].iloc[0] == 2 # December 2020
assert result.loc[result['month']=='2020-12', 'customer_count'].iloc[0] == 1
assert result.loc[result['month']=='2021-01', 'order_count'].iloc[0] == 1 # January 2021
assert result.loc[result['month']=='2021-01', 'customer_count'].iloc[0] == 1
| Test | Why |
|---|---|
| Provided example | Validates basic filtering and monthly aggregation |
| Months with invoices ≤ 20 | Confirms these months are excluded |
| Multiple orders per customer in the same month | Ensures COUNT(DISTINCT customer_id) works |
Edge Cases
One edge case is a month where all orders have invoices ≤ 20. The algorithm correctly excludes such months by filtering first. Another edge case is a customer making multiple qualifying orders in the same month; using COUNT(DISTINCT customer_id) ensures they are only counted once. A third edge case involves year boundaries, e.g., orders on December 31 and January 1; extracting months as YYYY-MM ensures these are treated as separate months and counted correctly. These cases are all handled naturally by the filter and grouping steps.