LeetCode 1867 - Orders With Maximum Quantity Above Average
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem asks us to identify "imbalanced orders" from an OrdersDetails table. Each row represents a product within an order, containing the order_id, product_id, and quantity ordered. An order can have multiple products, meaning multiple rows can share the same order_id.
An imbalanced order is one whose maximum quantity is strictly greater than the average quantity of every order (including itself). The average quantity of an order is the sum of all quantities in that order divided by the number of distinct products in it. The maximum quantity is the largest quantity of any single product in that order.
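As a small illustration of the two per-order metrics, here is a sketch in plain Python. The helper name `order_metrics` is hypothetical, and the quantities are those of order 1 in the sample input.

```python
def order_metrics(quantities):
    """Return (average, maximum) for the quantities of a single order."""
    # True division keeps the fractional part of the average.
    average = sum(quantities) / len(quantities)
    maximum = max(quantities)
    return average, maximum


avg, mx = order_metrics([12, 10, 15])
print(round(avg, 2), mx)  # 12.33 15
```

Whether this order is imbalanced cannot be decided from these two numbers alone, because the maximum must still be compared against the averages of all other orders.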
We are given a table of arbitrary size and must return the order_id values for all imbalanced orders. There are no explicit constraints on the number of orders or products, but the use of aggregate calculations implies that a naive solution that compares each order to every other order individually may be inefficient.
Key edge cases include:
- Orders with only one product: the maximum equals the order's own average, so the order can never be imbalanced.
- Orders where all quantities are equal: the maximum again equals the order's own average, so the order can never be imbalanced.
- Ties: an order whose maximum equals the largest average does not qualify, because the comparison is strict.
- Negative or zero quantities are not expected since this is an e-commerce order context, simplifying calculations.
The problem guarantees that (order_id, product_id) is unique, so no duplicates exist for the same product within an order.
The problem provides a relational table OrdersDetails where each row represents a product within an order. Each order_id can appear multiple times because an order may contain multiple products, and each row stores the quantity of a specific product in that order.
For every order, we define two metrics. The first is the average quantity, computed as the sum of all quantities in that order divided by the number of distinct products in that order. The second is the maximum quantity, which is simply the largest single product quantity within that order.
An order is considered imbalanced if its maximum product quantity is strictly greater than the average quantity of every order in the table, including itself. In other words, if we compute the average for all orders, an order qualifies only if its maximum is greater than the global maximum of those averages.
The task is to return all order_id values that satisfy this condition.
The input size is not explicitly given, but since this is a database problem, we assume potentially large datasets where efficiency matters. A naive approach that repeatedly scans the table per order would be too slow.
Important edge cases include orders with a single product (where max equals average), orders with identical averages across all orders, and orders where floating point division could lead to precision issues if not handled carefully.
Approaches
Brute Force Approach
A brute-force approach would involve calculating the maximum and average quantity for every order and then, for each order, comparing its maximum quantity against the average quantities of all orders. If the maximum quantity is greater than every average, we mark it as imbalanced.
While this approach works logically, it is inefficient. For n orders, if each order is compared to all other averages, the complexity is O(n²), which becomes impractical for large datasets. In SQL, this could translate to self-joins, which are expensive.
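The pairwise comparison described above could be sketched as follows, assuming the table is available as a list of `(order_id, product_id, quantity)` tuples. The function name `imbalanced_brute_force` and the `SAMPLE_ROWS` constant are hypothetical; the rows are the sample input.

```python
from collections import defaultdict

SAMPLE_ROWS = [
    (1, 1, 12), (1, 2, 10), (1, 3, 15),
    (2, 1, 8), (2, 4, 4), (2, 5, 6), (2, 9, 4),
    (3, 3, 5), (3, 4, 18), (3, 9, 20),
    (4, 5, 2), (4, 6, 8),
    (5, 7, 9), (5, 8, 9),
]


def imbalanced_brute_force(rows):
    """Compare every order's maximum against every order's average."""
    by_order = defaultdict(list)
    for order_id, _product_id, quantity in rows:
        by_order[order_id].append(quantity)

    result = []
    for order_id, quantities in by_order.items():
        order_max = max(quantities)
        # Each average is recomputed inside the loop: this is the quadratic part.
        if all(order_max > sum(q) / len(q) for q in by_order.values()):
            result.append(order_id)
    return sorted(result)


print(imbalanced_brute_force(SAMPLE_ROWS))  # [1, 3]
```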
Optimal Approach
The key insight is that an order is imbalanced if its maximum quantity exceeds the maximum of all average quantities across orders. This is because, for it to be greater than every average, it must at least exceed the highest average. Therefore, we do not need to compare each order to every other order individually.
The optimal approach involves three steps:
- Compute the average and maximum quantity per order.
- Find the maximum of all average quantities.
- Select the orders whose maximum quantity exceeds this global maximum average.
This reduces comparisons significantly and is efficient in both SQL and programming implementations.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Compare each order's max to all order averages |
| Optimal | O(n) | O(n) | Compute per-order aggregates, then compare to global max average |
Algorithm Walkthrough
- Aggregate per order: Iterate through each order and calculate the total quantity, the number of products, and the maximum quantity. Store these as `(order_id, avg_quantity, max_quantity)` for each order.
- Compute global maximum average: Iterate through all orders and determine the maximum average quantity across all orders.
- Filter imbalanced orders: For each order, compare its maximum quantity to the global maximum average. If `max_quantity > global_max_average`, include the order in the result.
- Return results: Output the list of `order_id`s that meet the condition.
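The steps above can be sketched without any library, assuming rows arrive as `(order_id, product_id, quantity)` tuples. The function name `find_imbalanced` is hypothetical, and this is one possible arrangement rather than a reference implementation.

```python
def find_imbalanced(rows):
    """Return the sorted order_ids of imbalanced orders."""
    # Step 1: aggregate per order as order_id -> [total, count, maximum].
    stats = {}
    for order_id, _product_id, quantity in rows:
        s = stats.setdefault(order_id, [0, 0, quantity])
        s[0] += quantity
        s[1] += 1
        s[2] = max(s[2], quantity)

    if not stats:
        return []

    # Step 2: the largest per-order average is the single threshold.
    global_max_average = max(total / count for total, count, _ in stats.values())

    # Step 3: keep orders whose maximum strictly exceeds the threshold.
    return sorted(
        order_id
        for order_id, (_, _, maximum) in stats.items()
        if maximum > global_max_average
    )
```

Seeding the maximum with the first quantity seen, rather than with zero, keeps the sketch correct even if the assumption of positive quantities does not hold.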
Why it works:
By the problem definition, an order is imbalanced if its maximum quantity exceeds all averages. The global maximum average is the largest among all order averages. Therefore, comparing an order's maximum against this global maximum is both necessary and sufficient to identify imbalanced orders.
Brute Force Approach
The brute-force solution computes, for each order, its average and maximum, and then compares the order's maximum against the average of every order. This requires recomputing aggregates repeatedly or performing nested scans across orders. While correct, it is inefficient because each comparison may require scanning the entire dataset multiple times.
Optimal Approach
The key observation is that the condition "maximum quantity is strictly greater than the average quantity of every order" is equivalent to "maximum quantity is strictly greater than the largest per-order average".
We only need the global maximum average across all orders, and then filter orders whose maximum exceeds that single value.
Thus, instead of comparing each order to all others, we first compute per-order aggregates (sum, count, max), derive per-order averages, and then compute the maximum of those averages globally. Finally, we select orders whose max quantity exceeds this global threshold.
This reduces the problem to two aggregation passes, which SQL handles efficiently using GROUP BY.
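A sketch of the two `GROUP BY` aggregations as a single query, run here through Python's built-in `sqlite3` module so it can be executed locally. SQLite is only a stand-in for the judge's database, and the names `QUERY`, `imbalanced_via_sql`, `avg_quantity`, and `per_order` are hypothetical.

```python
import sqlite3

QUERY = """
SELECT order_id
FROM OrdersDetails
GROUP BY order_id
HAVING MAX(quantity) > (
    SELECT MAX(avg_quantity)
    FROM (
        SELECT AVG(quantity) AS avg_quantity
        FROM OrdersDetails
        GROUP BY order_id
    ) AS per_order
)
ORDER BY order_id
"""


def imbalanced_via_sql(rows):
    """Load (order_id, product_id, quantity) tuples into SQLite and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE OrdersDetails ("
            "order_id INTEGER, product_id INTEGER, quantity INTEGER, "
            "PRIMARY KEY (order_id, product_id))"
        )
        conn.executemany("INSERT INTO OrdersDetails VALUES (?, ?, ?)", rows)
        return [order_id for (order_id,) in conn.execute(QUERY)]
    finally:
        conn.close()
```

The inner subquery computes one average per order, the middle level takes the largest of them, and the outer `HAVING` clause keeps the orders whose maximum exceeds it.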
Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Repeated scanning per order comparison |
| Optimal | O(n) | O(k) | Two aggregation passes using GROUP BY |
Algorithm Walkthrough
- First, group the table by `order_id` to compute three values for each order: total quantity, number of products, and maximum single-product quantity. This step is necessary because all required metrics are order-level aggregates derived from row-level data.
- Compute the average quantity per order as total quantity divided by product count. We keep this as a derived column because we need to compare averages across all orders.
- Compute the maximum of these averages across all orders. This becomes the global threshold that determines whether an order is imbalanced.
- Filter the previously computed order aggregates to retain only those orders where `max_quantity > global_max_average`.
- Return the resulting `order_id` values.
Why it works
The key invariant is that the condition “maximum is greater than the average of every order” reduces to “maximum is greater than the maximum average across all orders.” This transformation eliminates the need for pairwise comparisons and ensures correctness through global aggregation.
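The reduction can be checked empirically. The sketch below compares both formulations on random inputs; the helper names `exceeds_every_average` and `exceeds_max_average` are hypothetical, and the random ranges are arbitrary choices for illustration.

```python
import random


def exceeds_every_average(order_max, averages):
    """Literal reading of the definition: beat each average one by one."""
    return all(order_max > a for a in averages)


def exceeds_max_average(order_max, averages):
    """Reduced form: beat only the largest average."""
    return order_max > max(averages)


random.seed(0)
for _ in range(1000):
    averages = [
        random.randint(1, 50) / random.randint(1, 5)
        for _ in range(random.randint(1, 8))
    ]
    order_max = random.randint(1, 50)
    assert exceeds_every_average(order_max, averages) == exceeds_max_average(order_max, averages)
```

The two functions agree because a value exceeds every element of a non-empty collection exactly when it exceeds the largest element.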
Python Solution
```python
import pandas as pd

def find_imbalanced_orders(OrdersDetails: pd.DataFrame) -> pd.DataFrame:
    # Step 1: Calculate per-order aggregates
    order_stats = OrdersDetails.groupby('order_id').agg(
        total_quantity=('quantity', 'sum'),
        product_count=('product_id', 'count'),
        max_quantity=('quantity', 'max')
    ).reset_index()

    # Calculate average quantity per order
    order_stats['avg_quantity'] = order_stats['total_quantity'] / order_stats['product_count']

    # Step 2: Determine the maximum of all average quantities
    global_max_avg = order_stats['avg_quantity'].max()

    # Step 3: Filter orders whose max quantity exceeds global max average
    imbalanced_orders = order_stats[order_stats['max_quantity'] > global_max_avg][['order_id']]
    return imbalanced_orders
```
The implementation first aggregates orders to compute their total, count, and maximum quantity. Then it derives the average quantity for each order. By finding the maximum average across all orders, it can efficiently filter the imbalanced orders using a single comparison per order.
The same logic, written with a snake_case parameter name:
```python
import pandas as pd

def find_imbalanced_orders(order_details: pd.DataFrame) -> pd.DataFrame:
    # Step 1: compute per-order aggregates
    grouped = order_details.groupby("order_id").agg(
        total_quantity=("quantity", "sum"),
        product_count=("product_id", "count"),
        max_quantity=("quantity", "max")
    ).reset_index()

    # Step 2: compute average quantity per order
    grouped["avg_quantity"] = grouped["total_quantity"] / grouped["product_count"]

    # Step 3: compute global maximum average across all orders
    max_avg = grouped["avg_quantity"].max()

    # Step 4: filter imbalanced orders
    result = grouped[grouped["max_quantity"] > max_avg][["order_id"]]
    return result
```
### Code Explanation
We first aggregate the dataset using `groupby`, which allows us to compute per-order statistics in a single pass. We calculate total quantity, number of products, and maximum quantity. Then we derive the average per order.
After computing all averages, we extract the global maximum average. This is the critical threshold that replaces all pairwise comparisons.
Finally, we filter orders whose maximum product quantity exceeds this threshold and return only their identifiers.
## Go Solution
```go
package main

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // registers the MySQL driver
)

// OrderStats holds the per-order aggregates needed for the comparison.
type OrderStats struct {
	OrderID     int
	AvgQuantity float64
	MaxQuantity int
}

func findImbalancedOrders(db *sql.DB) ([]int, error) {
	// Step 1: Calculate per-order aggregates using SQL
	rows, err := db.Query(`
		SELECT
			order_id,
			SUM(quantity) AS total_quantity,
			COUNT(product_id) AS product_count,
			MAX(quantity) AS max_quantity
		FROM OrdersDetails
		GROUP BY order_id
	`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var orders []OrderStats
	var globalMaxAvg float64 // starts at 0; quantities are assumed positive
	for rows.Next() {
		var orderID, totalQty, prodCount, maxQty int
		if err := rows.Scan(&orderID, &totalQty, &prodCount, &maxQty); err != nil {
			return nil, err
		}
		// Convert before dividing so the average is not truncated.
		avg := float64(totalQty) / float64(prodCount)
		orders = append(orders, OrderStats{OrderID: orderID, AvgQuantity: avg, MaxQuantity: maxQty})
		if avg > globalMaxAvg {
			globalMaxAvg = avg
		}
	}
	// Surface any error that ended the iteration early.
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// Step 2: Filter orders whose max quantity exceeds global max average
	var result []int
	for _, o := range orders {
		if float64(o.MaxQuantity) > globalMaxAvg {
			result = append(result, o.OrderID)
		}
	}
	return result, nil
}
```
The Go version pushes the per-order aggregation into SQL and filters in memory. It converts the integer aggregates to `float64` before dividing, so the averages are not truncated, and it tracks the global maximum average while reading the rows, so no second query is needed.
Worked Examples
Using the sample input:
| order_id | quantities | avg | max |
|---|---|---|---|
| 1 | 12, 10, 15 | 12.33 | 15 |
| 2 | 8, 4, 6, 4 | 5.5 | 8 |
| 3 | 5, 18, 20 | 14.33 | 20 |
| 4 | 2, 8 | 5 | 8 |
| 5 | 9, 9 | 9 | 9 |
Global maximum average: 14.33 (order 3). Compare each order's max:
- Order 1: 15 > 14.33 → imbalanced
- Order 2: 8 > 14.33 → false
- Order 3: 20 > 14.33 → imbalanced
- Order 4: 8 > 14.33 → false
- Order 5: 9 > 14.33 → false
Result: [1, 3]
An in-memory Go version that works on a slice of rows instead of a database connection:
```go
package main

import "sort"

type OrderRow struct {
	orderID   int
	productID int
	quantity  int
}

func findImbalancedOrders(rows []OrderRow) []int {
	type agg struct {
		sum   int
		count int
		max   int
	}
	stats := make(map[int]*agg)

	// Step 1: aggregate per order
	for _, r := range rows {
		a, ok := stats[r.orderID]
		if !ok {
			a = &agg{}
			stats[r.orderID] = a
		}
		a.sum += r.quantity
		a.count++
		if r.quantity > a.max {
			a.max = r.quantity
		}
	}

	// Step 2: compute the global max average
	maxAvg := 0.0
	for _, a := range stats {
		if avg := float64(a.sum) / float64(a.count); avg > maxAvg {
			maxAvg = avg
		}
	}

	// Step 3: collect results
	result := make([]int, 0)
	for id, a := range stats {
		if float64(a.max) > maxAvg {
			result = append(result, id)
		}
	}
	sort.Ints(result)
	return result
}
```
### Go-Specific Notes
Go requires explicit type conversions for floating-point division, so we cast integers to `float64`. We also use maps for grouping since Go lacks built-in groupby functionality. Sorting is optional depending on output requirements, but included here for deterministic results.
## Worked Examples
Using the sample input, we first compute per-order aggregates:
| order_id | sum | count | avg | max |
| --- | --- | --- | --- | --- |
| 1 | 37 | 3 | 12.33 | 15 |
| 2 | 22 | 4 | 5.5 | 8 |
| 3 | 43 | 3 | 14.33 | 20 |
| 4 | 10 | 2 | 5 | 8 |
| 5 | 18 | 2 | 9 | 9 |
Next we compute the global maximum average:
| order_id | avg |
| --- | --- |
| 1 | 12.33 |
| 2 | 5.5 |
| 3 | 14.33 |
| 4 | 5 |
| 5 | 9 |
Global max average = 14.33.
Now we compare each order’s max:
- Order 1: 15 > 14.33 → included
- Order 2: 8 > 14.33 → no
- Order 3: 20 > 14.33 → included
- Order 4: 8 > 14.33 → no
- Order 5: 9 > 14.33 → no
Final result: [1, 3]
## Complexity Analysis
| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n) | Each row is processed once for aggregation, followed by one comparison per order |
| Space | O(k) | Storage for per-order aggregates, where k is the number of distinct orders (at most n) |
The algorithm is linear in the number of rows n because each record contributes only once to the aggregation structures. Additional passes over grouped data are proportional to the number of distinct orders, which is typically much smaller than the total number of rows. This avoids the quadratic comparisons of the naive method.
## Test Cases
```python
import pandas as pd

# Example 1
df1 = pd.DataFrame([
    [1, 1, 12], [1, 2, 10], [1, 3, 15],
    [2, 1, 8], [2, 4, 4], [2, 5, 6], [2, 9, 4],
    [3, 3, 5], [3, 4, 18], [3, 9, 20],
    [4, 5, 2], [4, 6, 8],
    [5, 7, 9], [5, 8, 9]
], columns=['order_id', 'product_id', 'quantity'])
assert set(find_imbalanced_orders(df1)['order_id']) == {1, 3}

# Single product order
df2 = pd.DataFrame([[1, 1, 10]], columns=['order_id', 'product_id', 'quantity'])
assert set(find_imbalanced_orders(df2)['order_id']) == set()
```
```python
import pandas as pd

COLUMNS = ["order_id", "product_id", "quantity"]

def test_basic():
    df = pd.DataFrame([
        [1, 1, 12], [1, 2, 10], [1, 3, 15],
        [2, 1, 8], [2, 4, 4], [2, 5, 6], [2, 9, 4],
        [3, 3, 5], [3, 4, 18], [3, 9, 20],
        [4, 5, 2], [4, 6, 8],
        [5, 7, 9], [5, 8, 9]
    ], columns=COLUMNS)
    res = find_imbalanced_orders(df)
    assert set(res["order_id"]) == {1, 3}  # sample case

def test_single_product_orders():
    df = pd.DataFrame([[1, 1, 10], [2, 1, 9], [3, 1, 8]], columns=COLUMNS)
    res = find_imbalanced_orders(df)
    assert set(res["order_id"]) == set()  # max equals avg always

def test_all_equal():
    df = pd.DataFrame([[1, 1, 5], [1, 2, 5], [2, 1, 5], [2, 2, 5]], columns=COLUMNS)
    res = find_imbalanced_orders(df)
    assert set(res["order_id"]) == set()  # no imbalance

def test_one_large_outlier():
    # Order 1 needs a second, small product: a single-product order has
    # max == avg and could never exceed the global max average.
    df = pd.DataFrame([[1, 1, 100], [1, 2, 1], [2, 1, 1], [2, 2, 1]], columns=COLUMNS)
    res = find_imbalanced_orders(df)
    assert set(res["order_id"]) == {1}  # only outlier order

def test_empty_result_safety():
    df = pd.DataFrame([[1, 1, 5], [1, 2, 5]], columns=COLUMNS)
    res = find_imbalanced_orders(df)
    assert len(res) == 0  # no order exceeds global max avg
```
| Test | Why |
| --- | --- |
| sample case | validates correctness on given example |
| single product | tests max equals average edge case |
| all equal | ensures no false positives |
| outlier order | tests dominance of one order |
| small dataset | ensures safe handling of minimal input |
## Edge Cases
One important edge case is when an order contains only a single product. In this situation, the average and maximum are identical, and the global maximum average is at least as large as the order's own average, so such an order can never be imbalanced.
Another edge case occurs when all orders have identical averages. In this case, the global maximum average equals every individual average, so no order can satisfy the strict inequality condition. The implementation correctly handles this by requiring `max_quantity > max_avg`.
A final edge case involves floating-point precision. Since averages are computed via division, using integer division would silently produce incorrect results. The solution ensures proper floating-point arithmetic in both Python and Go to preserve correctness.
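If rounding is a concern, the averages can be kept exact. The sketch below uses Python's `fractions.Fraction` instead of floating-point division; the function name `find_imbalanced_exact` is hypothetical, and integer quantities are assumed.

```python
from fractions import Fraction


def find_imbalanced_exact(rows):
    """rows: iterable of (order_id, product_id, quantity) with integer quantities."""
    totals, counts, maxima = {}, {}, {}
    for order_id, _product_id, quantity in rows:
        totals[order_id] = totals.get(order_id, 0) + quantity
        counts[order_id] = counts.get(order_id, 0) + 1
        maxima[order_id] = max(maxima.get(order_id, quantity), quantity)

    if not totals:
        return []

    # Fraction stores each average as an exact ratio, so the strict
    # comparison below is not affected by rounding.
    max_avg = max(Fraction(totals[o], counts[o]) for o in totals)
    return sorted(o for o in totals if maxima[o] > max_avg)
```

For typical inputs the floating-point version gives the same answer; exact ratios only matter when a maximum is extremely close to the largest average.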