LeetCode 1084 - Sales Analysis III

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem asks us to identify products that were sold only during the first quarter of 2019, meaning between 2019-01-01 and 2019-03-31, inclusive. We are provided with two tables: Product and Sales. The Product table lists all products along with their IDs and unit prices, while the Sales table lists individual sales transactions with the product_id, sale_date, and other sales-related information.

The output should be a table of product_id and product_name for all products that appear in sales only in the first quarter of 2019. Products sold outside this period should be excluded. If a product is never sold, it should not appear in the result.

Important edge cases include products that were never sold, products sold exactly on the boundaries (2019-01-01 or 2019-03-31), and products sold multiple times in the quarter. The solution should handle duplicate sales entries correctly.

Approaches

A brute-force approach would take each product in turn and scan the Sales table to check whether all of its sales fall within the first quarter of 2019, for example with nested loops or a separate lookup per product. While this works, it rescans the sales rows once for every product, which is inefficient for large datasets.
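
To make the brute-force idea concrete, here is a minimal sketch in plain Python rather than SQL. The function name `brute_force_q1_only` and the in-memory tuple layout are assumptions made for illustration only; they are not part of the problem statement.

```python
from typing import List, Tuple

Q1_START, Q1_END = "2019-01-01", "2019-03-31"


def brute_force_q1_only(
    products: List[Tuple[int, str]],
    sales: List[Tuple[int, str]],
) -> List[Tuple[int, str]]:
    """products holds (product_id, product_name); sales holds (product_id, sale_date)."""
    result = []
    for product_id, product_name in products:
        # Rescan the whole sales list for every product: O(n * m) overall.
        dates = [sale_date for pid, sale_date in sales if pid == product_id]
        # ISO 8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
        if dates and all(Q1_START <= d <= Q1_END for d in dates):
            result.append((product_id, product_name))
    return result


products = [(1, "S8"), (2, "G4"), (3, "iPhone")]
sales = [(1, "2019-01-21"), (2, "2019-02-17"), (2, "2019-06-02"), (3, "2019-05-13")]
print(brute_force_q1_only(products, sales))  # [(1, 'S8')]
```

The `dates and ...` check is what excludes never-sold products, since `all()` on an empty list would otherwise return True.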

The key insight for an optimal solution is to group sales by product_id and check two things per product: whether it has any sales outside the first quarter, and whether it has at least one sale within the first quarter. SQL provides tools like GROUP BY and HAVING to efficiently perform this aggregation without scanning the data multiple times per product.

The optimal approach uses aggregation to filter products using conditions on the minimum and maximum sale dates or by counting sales inside versus outside the quarter.
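
The minimum/maximum variant mentioned above can be sketched as follows. This is one possible formulation, not the solution used later in this article: if a product's earliest sale is on or after 2019-01-01 and its latest sale is on or before 2019-03-31, then every sale lies inside the quarter. The helper name `q1_only_min_max` is hypothetical, and the sketch assumes sale_date is stored as an ISO `YYYY-MM-DD` string, as in the SQLite test setup further down.

```python
import sqlite3

MIN_MAX_QUERY = """
SELECT p.product_id, p.product_name
FROM Product p
JOIN Sales s ON s.product_id = p.product_id
GROUP BY p.product_id, p.product_name
HAVING MIN(s.sale_date) >= '2019-01-01'
   AND MAX(s.sale_date) <= '2019-03-31'
"""


def q1_only_min_max(connection: sqlite3.Connection):
    # The inner join already drops products that have no sales at all.
    return connection.execute(MIN_MAX_QUERY).fetchall()


demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Product(product_id INT, product_name TEXT, unit_price INT)")
demo_conn.execute(
    "CREATE TABLE Sales(seller_id INT, product_id INT, buyer_id INT, "
    "sale_date TEXT, quantity INT, price INT)"
)
demo_conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "S8", 1000), (2, "G4", 800), (3, "iPhone", 1400)],
)
demo_conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2019-01-21", 2, 2000),
        (1, 2, 2, "2019-02-17", 1, 800),
        (2, 2, 3, "2019-06-02", 1, 800),
        (3, 3, 4, "2019-05-13", 2, 2800),
    ],
)
print(q1_only_min_max(demo_conn))  # [(1, 'S8')]
```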

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Brute Force | O(n * m) | O(n) | Scan all sales for each product to check dates; n = #products, m = #sales |
| Optimal | O(m) | O(n) | Group by product_id, filter using conditional aggregation on sale_date |

Algorithm Walkthrough

  1. Classify each sale: Mark every row in Sales as either inside the first quarter (2019-01-01 through 2019-03-31) or outside it.
  2. Group by product_id: Use SQL GROUP BY to aggregate all sales rows for each product.
  3. Use conditional aggregation: Per product, count the sales inside the quarter and the sales outside it. Products with zero sales outside the quarter and at least one sale inside it satisfy our criteria.
  4. Join with Product table: Once we have the qualifying product IDs, join them with the Product table to get the product_name.
  5. Return the result: Select product_id and product_name for all products satisfying the above conditions. Order is not important.

Why it works: By grouping sales by product_id and counting the sales outside the quarter, we ensure that only products sold exclusively within the first quarter are selected. Aggregation guarantees that all sales are considered exactly once per product.
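
The same steps can be mirrored in plain Python to show what the grouped query computes. This is only an illustrative sketch: the function name `q1_only_single_pass` and the tuple layout are assumptions, and a real database engine chooses its own grouping strategy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def q1_only_single_pass(
    products: List[Tuple[int, str]],
    sales: List[Tuple[int, str]],
) -> List[Tuple[int, str]]:
    # product_id -> [sales inside Q1, sales outside Q1]
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for product_id, sale_date in sales:  # each sale is visited exactly once
        inside = "2019-01-01" <= sale_date <= "2019-03-31"
        counts[product_id][0 if inside else 1] += 1

    names = dict(products)  # plays the role of the join with Product
    return [
        (pid, names[pid])
        for pid, (inside_count, outside_count) in counts.items()
        if pid in names and inside_count > 0 and outside_count == 0
    ]


example_products = [(1, "S8"), (2, "G4"), (3, "iPhone")]
example_sales = [(1, "2019-01-21"), (2, "2019-02-17"), (2, "2019-06-02"), (3, "2019-05-13")]
print(q1_only_single_pass(example_products, example_sales))  # [(1, 'S8')]
```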

Python Solution

# LeetCode 1084 is a SQL problem; here the query is run from Python via the standard-library sqlite3 module
import sqlite3
from typing import List, Tuple

def sales_analysis_iii(connection: sqlite3.Connection) -> List[Tuple[int, str]]:
    query = """
    SELECT p.product_id, p.product_name
    FROM Product p
    JOIN (
        SELECT product_id
        FROM Sales
        GROUP BY product_id
        HAVING SUM(CASE WHEN sale_date < '2019-01-01' OR sale_date > '2019-03-31' THEN 1 ELSE 0 END) = 0
           AND SUM(CASE WHEN sale_date >= '2019-01-01' AND sale_date <= '2019-03-31' THEN 1 ELSE 0 END) > 0
    ) AS sq
    ON p.product_id = sq.product_id
    """
    cursor = connection.cursor()
    cursor.execute(query)
    return cursor.fetchall()

This implementation uses a subquery to group Sales by product_id and applies conditional sums to filter products. Products with any sale outside the first quarter are excluded, and only those with at least one sale inside the quarter are kept. Finally, we join with the Product table to retrieve the product_name.

Go Solution

package main

import (
    "database/sql"

    _ "github.com/go-sql-driver/mysql" // registers the MySQL driver with database/sql
)

func SalesAnalysisIII(db *sql.DB) ([]struct{ ProductID int; ProductName string }, error) {
    query := `
    SELECT p.product_id, p.product_name
    FROM Product p
    JOIN (
        SELECT product_id
        FROM Sales
        GROUP BY product_id
        HAVING SUM(CASE WHEN sale_date < '2019-01-01' OR sale_date > '2019-03-31' THEN 1 ELSE 0 END) = 0
           AND SUM(CASE WHEN sale_date >= '2019-01-01' AND sale_date <= '2019-03-31' THEN 1 ELSE 0 END) > 0
    ) AS sq
    ON p.product_id = sq.product_id
    `
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []struct {
        ProductID   int
        ProductName string
    }

    for rows.Next() {
        var r struct {
            ProductID   int
            ProductName string
        }
        if err := rows.Scan(&r.ProductID, &r.ProductName); err != nil {
            return nil, err
        }
        results = append(results, r)
    }
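    // Report any error that ended the iteration early.
    if err := rows.Err(); err != nil {
        return nil, err
    }
    return results, nil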
}

In Go, we handle the query results by scanning each row into a struct. Go requires explicit handling of rows and errors, unlike Python, which allows a more direct fetch. The logic of the SQL query remains identical.

Worked Examples

Using the provided example:

| product_id | sale_date | inside_Q1 | outside_Q1 |
| --- | --- | --- | --- |
| 1 | 2019-01-21 | 1 | 0 |
| 2 | 2019-02-17 | 1 | 0 |
| 2 | 2019-06-02 | 0 | 1 |
| 3 | 2019-05-13 | 0 | 1 |

Conditional sums per product:

| product_id | sum_outside | sum_inside |
| --- | --- | --- |
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 0 |

Only product 1 satisfies sum_outside = 0 and sum_inside > 0, so it is returned.
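
The per-product sums can be reproduced by selecting the two conditional sums instead of filtering on them. The sketch below assumes the same SQLite schema as the test setup in this article; the variable name `per_product_sums` is illustrative.

```python
import sqlite3

sums_conn = sqlite3.connect(":memory:")
sums_conn.execute(
    "CREATE TABLE Sales(seller_id INT, product_id INT, buyer_id INT, "
    "sale_date TEXT, quantity INT, price INT)"
)
sums_conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2019-01-21", 2, 2000),
        (1, 2, 2, "2019-02-17", 1, 800),
        (2, 2, 3, "2019-06-02", 1, 800),
        (3, 3, 4, "2019-05-13", 2, 2800),
    ],
)

# Same CASE expressions as in the HAVING clause, exposed as columns.
per_product_sums = sums_conn.execute(
    """
    SELECT product_id,
           SUM(CASE WHEN sale_date < '2019-01-01' OR sale_date > '2019-03-31'
                    THEN 1 ELSE 0 END) AS sum_outside,
           SUM(CASE WHEN sale_date >= '2019-01-01' AND sale_date <= '2019-03-31'
                    THEN 1 ELSE 0 END) AS sum_inside
    FROM Sales
    GROUP BY product_id
    ORDER BY product_id
    """
).fetchall()
print(per_product_sums)  # [(1, 0, 1), (2, 1, 1), (3, 1, 0)]
```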

Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(m) | Each sales row is scanned once during aggregation; m = number of sales |
| Space | O(n) | Space for grouping by product_id; n = number of products |

The SQL aggregation avoids nested loops, so the result can be computed in a single pass over the Sales table rather than one scan per product.

Test Cases

# Using SQLite in-memory for illustration
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE Product(product_id INT, product_name TEXT, unit_price INT)")
cursor.execute("CREATE TABLE Sales(seller_id INT, product_id INT, buyer_id INT, sale_date TEXT, quantity INT, price INT)")

# Example data
cursor.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    (1, 'S8', 1000),
    (2, 'G4', 800),
    (3, 'iPhone', 1400)
])
cursor.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, '2019-01-21', 2, 2000),
    (1, 2, 2, '2019-02-17', 1, 800),
    (2, 2, 3, '2019-06-02', 1, 800),
    (3, 3, 4, '2019-05-13', 2, 2800)
])

assert sales_analysis_iii(conn) == [(1, 'S8')]  # only Q1 product

| Test | Why |
| --- | --- |
| Q1 only sale | Ensures correct inclusion |
| Sale outside Q1 | Ensures exclusion of products sold later |
| Multiple sales per product | Ensures aggregation works |

Edge Cases

One edge case is a product that was never sold. Such a product has no rows in Sales, so GROUP BY produces no group for it at all, and the inner join with Product leaves it out of the result.
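
A quick way to check the never-sold case is to add a product with no sales and run the query. In this sketch, product 4 and its name "Unsold" are made-up test data, and the schema is assumed to match the SQLite setup used in the test cases.

```python
import sqlite3

unsold_conn = sqlite3.connect(":memory:")
unsold_conn.execute("CREATE TABLE Product(product_id INT, product_name TEXT, unit_price INT)")
unsold_conn.execute(
    "CREATE TABLE Sales(seller_id INT, product_id INT, buyer_id INT, "
    "sale_date TEXT, quantity INT, price INT)"
)
unsold_conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "S8", 1000), (4, "Unsold", 500)],  # product 4 is hypothetical and has no sales
)
unsold_conn.execute("INSERT INTO Sales VALUES (1, 1, 1, '2019-01-21', 2, 2000)")

# Grouping Sales yields a row only for product IDs that actually appear in Sales.
grouped_ids = unsold_conn.execute(
    "SELECT product_id FROM Sales GROUP BY product_id"
).fetchall()

unsold_result = unsold_conn.execute(
    """
    SELECT p.product_id, p.product_name
    FROM Product p
    JOIN (
        SELECT product_id
        FROM Sales
        GROUP BY product_id
        HAVING SUM(CASE WHEN sale_date < '2019-01-01' OR sale_date > '2019-03-31' THEN 1 ELSE 0 END) = 0
           AND SUM(CASE WHEN sale_date >= '2019-01-01' AND sale_date <= '2019-03-31' THEN 1 ELSE 0 END) > 0
    ) AS sq
    ON p.product_id = sq.product_id
    """
).fetchall()

print(grouped_ids)    # [(1,)]
print(unsold_result)  # [(1, 'S8')]
```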

Another edge case is sales exactly on 2019-01-01 or 2019-03-31. The solution uses inclusive comparison operators (>= and <=), so these boundary sales are counted as inside the quarter.
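
The boundary behavior can be checked with four made-up products, one sold on each side of each boundary. The product names A through D are hypothetical test data. The sketch also assumes sale_date holds a plain `YYYY-MM-DD` value; if a TEXT column carried a time component, string comparison against '2019-03-31' would behave differently.

```python
import sqlite3

edge_conn = sqlite3.connect(":memory:")
edge_conn.execute("CREATE TABLE Product(product_id INT, product_name TEXT, unit_price INT)")
edge_conn.execute(
    "CREATE TABLE Sales(seller_id INT, product_id INT, buyer_id INT, "
    "sale_date TEXT, quantity INT, price INT)"
)
edge_conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "B", 10), (3, "C", 10), (4, "D", 10)],
)
edge_conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2019-01-01", 1, 10),  # first day of Q1: inside
        (1, 2, 1, "2019-03-31", 1, 10),  # last day of Q1: inside
        (1, 3, 1, "2018-12-31", 1, 10),  # day before Q1: outside
        (1, 4, 1, "2019-04-01", 1, 10),  # day after Q1: outside
    ],
)

boundary_result = edge_conn.execute(
    """
    SELECT p.product_id, p.product_name
    FROM Product p
    JOIN (
        SELECT product_id
        FROM Sales
        GROUP BY product_id
        HAVING SUM(CASE WHEN sale_date < '2019-01-01' OR sale_date > '2019-03-31' THEN 1 ELSE 0 END) = 0
           AND SUM(CASE WHEN sale_date >= '2019-01-01' AND sale_date <= '2019-03-31' THEN 1 ELSE 0 END) > 0
    ) AS sq
    ON p.product_id = sq.product_id
    ORDER BY p.product_id
    """
).fetchall()
print(boundary_result)  # [(1, 'A'), (2, 'B')]
```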

A third edge case is products sold multiple times inside the quarter but never outside. The aggregation correctly sums all inside-quarter sales and zero outside-quarter