LeetCode 1565 - Unique Orders and Customers Per Month

This problem requires analyzing a table of orders to calculate monthly statistics. Specifically, for each unique month present in the Orders table, we need to determine two metrics: the number of unique orders and the number of unique customers whose orders have an invoice…

LeetCode Problem 1565

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem requires analyzing a table of orders to calculate monthly statistics. Specifically, for each unique month present in the Orders table, we need to determine two metrics: the number of unique orders and the number of unique customers whose orders have an invoice greater than 20. The input consists of a table with four columns: order_id (unique identifier), order_date (date of the order), customer_id (who made the order), and invoice (amount paid). The expected output is a table that aggregates by month (in YYYY-MM format), listing the count of qualifying orders and count of unique customers for that month.

The key constraints and details are: all orders have unique order_id, invoice amounts are numeric, and months with no orders above 20 should be excluded from the result. Edge cases include months where all invoices are ≤ 20, months with multiple orders from the same customer, and months with orders that straddle year boundaries.

Approaches

The brute-force approach would involve iterating through every row in the table, parsing the month from order_date, and maintaining separate lists or sets for qualifying orders and customers for each month. While this works, it becomes cumbersome when the dataset is large because it requires iterating multiple times and manually managing aggregation.

The optimal approach uses SQL aggregation functions. By first filtering out orders with invoice <= 20, we can then group by the extracted month and count distinct order_id and customer_id in a single query. The key insight is leveraging SQL's COUNT(DISTINCT ...) and DATE_FORMAT (or equivalent) to perform the aggregation directly in the database, avoiding any manual iteration or intermediate storage.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) O(n) Iterate over all orders, manually aggregate monthly stats in a hash map
Optimal O(n) O(m) SQL aggregation: filter, group by month, count distinct orders and customers, where n = total rows, m = number of months

Algorithm Walkthrough

  1. Filter the Orders table to retain only rows where invoice > 20. This ensures only qualifying orders contribute to the counts.
  2. Extract the month from the order_date column. In SQL, this can be done using DATE_FORMAT(order_date, '%Y-%m'). This produces a string in the YYYY-MM format.
  3. Group the filtered table by the extracted month. This creates one group per unique month.
  4. Within each group, count the number of distinct order_id values to get order_count.
  5. Within each group, count the number of distinct customer_id values to get customer_count.
  6. Return the aggregated table with columns month, order_count, and customer_count. Sorting is not required.

Why it works: By filtering first, we ensure only orders with invoices greater than 20 are considered. Aggregating by month ensures that all orders and customers are correctly counted for the right month. Using COUNT(DISTINCT ...) prevents double-counting orders or customers, even if multiple orders exist for a customer in the same month.

Python Solution

import pandas as pd

def unique_orders_customers_per_month(orders: pd.DataFrame) -> pd.DataFrame:
    # Filter orders with invoice > 20
    filtered = orders[orders['invoice'] > 20].copy()
    
    # Extract month in YYYY-MM format
    filtered['month'] = filtered['order_date'].dt.to_period('M').astype(str)
    
    # Group by month and compute distinct counts
    result = filtered.groupby('month').agg(
        order_count=('order_id', 'nunique'),
        customer_count=('customer_id', 'nunique')
    ).reset_index()
    
    return result

The Python implementation mirrors the SQL approach. First, it filters the DataFrame for invoices over 20. Then it creates a new column with the month extracted from the order date. Finally, it groups by month and calculates the number of unique orders and unique customers using nunique.

Go Solution

package main

import (
    "database/sql"
    "fmt"
    _ "github.com/go-sql-driver/mysql"
)

type Result struct {
    Month         string
    OrderCount    int
    CustomerCount int
}

func UniqueOrdersCustomersPerMonth(db *sql.DB) ([]Result, error) {
    query := `
        SELECT 
            DATE_FORMAT(order_date, '%Y-%m') AS month,
            COUNT(DISTINCT order_id) AS order_count,
            COUNT(DISTINCT customer_id) AS customer_count
        FROM Orders
        WHERE invoice > 20
        GROUP BY month
    `
    
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []Result
    for rows.Next() {
        var r Result
        if err := rows.Scan(&r.Month, &r.OrderCount, &r.CustomerCount); err != nil {
            return nil, err
        }
        results = append(results, r)
    }
    
    return results, nil
}

The Go solution executes a SQL query similar to the Python Pandas approach. The main difference is the explicit connection to the database and row scanning into a Go struct. COUNT(DISTINCT ...) handles unique orders and customers automatically. Error handling is essential for database operations.

Worked Examples

Using the example table:

  1. Filter orders with invoice > 20. Rows remaining: 1, 2, 4, 7, 8, 9.
  2. Extract months: 2020-09, 2020-09, 2020-10, 2020-12, 2020-12, 2021-01.
  3. Group by month:
  • 2020-09: orders = [1, 2], customers = [1, 2]
  • 2020-10: orders = [4], customers = [3]
  • 2020-12: orders = [7, 8], customers = [4]
  • 2021-01: orders = [9], customers = [3]
  1. Count distinct orders and customers for each month, producing the output table.

Complexity Analysis

Measure Complexity Explanation
Time O(n) We iterate over each row once to filter and extract months, then aggregate, which is linear in the number of orders n
Space O(m) Space required for the grouped result and intermediate structures, where m is the number of unique months

The complexity is efficient because filtering and grouping operations are handled by vectorized operations in Python or SQL engine optimizations in Go.

Test Cases

import pandas as pd

orders = pd.DataFrame({
    "order_id": [1,2,3,4,5,6,7,8,9,10],
    "order_date": pd.to_datetime(["2020-09-15","2020-09-17","2020-10-06","2020-10-20","2020-11-10","2020-11-21","2020-12-01","2020-12-03","2021-01-07","2021-01-15"]),
    "customer_id": [1,2,3,3,1,2,4,4,3,2],
    "invoice": [30,90,20,21,10,15,55,77,31,20]
})

result = unique_orders_customers_per_month(orders)

assert result.loc[result['month']=='2020-09', 'order_count'].iloc[0] == 2  # September 2020
assert result.loc[result['month']=='2020-09', 'customer_count'].iloc[0] == 2
assert result.loc[result['month']=='2020-10', 'order_count'].iloc[0] == 1  # October 2020
assert result.loc[result['month']=='2020-10', 'customer_count'].iloc[0] == 1
assert result.loc[result['month']=='2020-12', 'order_count'].iloc[0] == 2  # December 2020
assert result.loc[result['month']=='2020-12', 'customer_count'].iloc[0] == 1
assert result.loc[result['month']=='2021-01', 'order_count'].iloc[0] == 1  # January 2021
assert result.loc[result['month']=='2021-01', 'customer_count'].iloc[0] == 1
Test Why
Provided example Validates basic filtering and monthly aggregation
Months with invoices ≤ 20 Confirms these months are excluded
Multiple orders per customer in the same month Ensures COUNT(DISTINCT customer_id) works

Edge Cases

One edge case is a month where all orders have invoices ≤ 20. The algorithm correctly excludes such months by filtering first. Another edge case is a customer making multiple qualifying orders in the same month; using COUNT(DISTINCT customer_id) ensures they are only counted once. A third edge case involves year boundaries, e.g., orders on December 31 and January 1; extracting months as YYYY-MM ensures these are treated as separate months and counted correctly. These cases are all handled naturally by the filter and grouping steps.