LeetCode 1543 - Fix Product Name Format

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem asks us to clean and aggregate sales data from a Sales table. Each row contains a unique sale_id, a product_name, and a sale_date. Due to manual data entry errors, product_name may contain extra leading or trailing spaces and may use inconsistent letter casing. Our task is to produce a report that normalizes the product names (lowercase, trimmed) and aggregates the total number of sales per product per month. The sale_date in the output should be truncated to the year and month in 'YYYY-MM' format.

The input represents raw sales records with potential inconsistencies in text formatting. The output must be a table with three columns: product_name, sale_date, and total representing the count of sales for that product in the corresponding month. The output should be ordered alphabetically by product_name, and if multiple rows have the same product_name, they should be sorted by sale_date.

Key constraints are minimal because the table is relatively small (daily sales since 2000). However, we must handle text inconsistencies and ensure correct aggregation. Edge cases include product names with only spaces, varying capitalization, months with no sales for a product, and ensuring date formatting truncates to year and month correctly.

Approaches

Brute Force Approach

A naive approach would involve iterating over all sales, cleaning each product name by trimming spaces and converting to lowercase, then grouping by product name and month manually in code. We could store counts in a dictionary keyed by (product_name, year-month). While this works for correctness, it is unnecessary in SQL because SQL provides aggregation functions like COUNT() and string functions like TRIM() and LOWER(). Using a programming approach would also introduce overhead in memory management and sorting.

Optimal Approach

The key observation is that SQL already has functions to clean and aggregate data. We can use TRIM() to remove spaces, LOWER() to normalize case, and DATE_FORMAT() (MySQL) or TO_CHAR() (PostgreSQL) to extract the YYYY-MM from sale_date. Then, we can GROUP BY the normalized product name and month, and ORDER BY accordingly. This leverages the database engine for aggregation and sorting efficiently, minimizing memory usage and computation in application code.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n)	O(n)	Iterate and aggregate manually in code using dictionaries or maps
Optimal	O(n log n)	O(n)	Use SQL aggregation, sorting handled by database engine

Algorithm Walkthrough

Select product_name from the Sales table.
Use TRIM() to remove leading and trailing spaces.
Use LOWER() to convert the product name to lowercase.
Extract the year and month from sale_date using DATE_FORMAT(sale_date, '%Y-%m').
Count the number of rows for each (product_name, year-month) combination using COUNT(*).
Group the results by normalized product_name and truncated sale_date.
Order the final result first by product_name ascending, then by sale_date ascending.

Why it works: SQL aggregation guarantees that all rows with the same normalized product name and month are counted exactly once. Using TRIM() and LOWER() ensures consistent naming, and DATE_FORMAT() standardizes the date. Sorting ensures the output meets the required order.

Python Solution

# Python does not directly run SQL in LeetCode; this is a demonstration using sqlite3
import sqlite3

def fix_product_name_format(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    cursor = conn.cursor()
    query = """
    SELECT LOWER(TRIM(product_name)) AS product_name,
           STRFTIME('%Y-%m', sale_date) AS sale_date,
           COUNT(*) AS total
    FROM Sales
    GROUP BY product_name, sale_date
    ORDER BY product_name ASC, sale_date ASC
    """
    cursor.execute(query)
    return cursor.fetchall()

In this Python version, we assume an active database connection conn. The SQL query normalizes product_name, formats sale_date to YYYY-MM, counts sales per product per month, groups by both fields, and orders results correctly.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
    "fmt"
)

type ProductSale struct {
    ProductName string
    SaleDate    string
    Total       int
}

func FixProductNameFormat(db *sql.DB) ([]ProductSale, error) {
    query := `
    SELECT LOWER(TRIM(product_name)) AS product_name,
           DATE_FORMAT(sale_date, '%Y-%m') AS sale_date,
           COUNT(*) AS total
    FROM Sales
    GROUP BY product_name, sale_date
    ORDER BY product_name ASC, sale_date ASC
    `
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []ProductSale
    for rows.Next() {
        var ps ProductSale
        if err := rows.Scan(&ps.ProductName, &ps.SaleDate, &ps.Total); err != nil {
            return nil, err
        }
        results = append(results, ps)
    }
    return results, nil
}

func main() {
    db, err := sql.Open("mysql", "user:password@/dbname")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    results, err := FixProductNameFormat(db)
    if err != nil {
        panic(err)
    }

    for _, r := range results {
        fmt.Println(r)
    }
}

In Go, the main differences include using sql.DB for database operations, scanning rows into a struct, and explicitly handling errors. The SQL query remains the same.

Worked Examples

For the input:

+---------+--------------+------------+
| sale_id | product_name | sale_date  |
+---------+--------------+------------+
| 1       | LCPHONE      | 2000-01-16 |
| 2       | LCPhone      | 2000-01-17 |
| 3       | LcPhOnE      | 2000-02-18 |
| 4       | LCKeyCHAiN   | 2000-02-19 |
| 5       | LCKeyChain   | 2000-02-28 |
| 6       | Matryoshka   | 2000-03-31 |
+---------+--------------+------------+

Step-by-step:

Normalize product_name:

"LCPHONE" -> "lcphone"
"LCKeyCHAiN" -> "lckeychain"
"Matryoshka" -> "matryoshka"

Extract YYYY-MM:

"2000-01-16" -> "2000-01"
"2000-02-19" -> "2000-02"
"2000-03-31" -> "2000-03"

Count per (product_name, month):

lcphone, 2000-01 -> 2
lcphone, 2000-02 -> 1
lckeychain, 2000-02 -> 2
matryoshka, 2000-03 -> 1

Sort alphabetically by product name and then by date.

Final output:

+--------------+-----------+-------+
| product_name | sale_date | total |
+--------------+-----------+-------+
| lckeychain   | 2000-02   | 2     |
| lcphone      | 2000-01   | 2     |
| lcphone      | 2000-02   | 1     |
| matryoshka   | 2000-03   | 1     |
+--------------+-----------+-------+

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n log n)	SQL must scan all n rows and sort the grouped results
Space	O(n)	Storing intermediate groups and final results

The complexity is dominated by the grouping and sorting operations in SQL. Database engines handle these efficiently using optimized internal algorithms.

Test Cases

# Assuming SQLite connection
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""
CREATE TABLE Sales (
    sale_id INTEGER PRIMARY KEY,
    product_name TEXT,
    sale_date DATE
)
""")
c.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
    (1, 'LCPHONE', '2000-01-16'),
    (2, 'LCPhone', '2000-01-17'),
    (3, 'LcPhOnE', '2000-02-18'),
    (4, 'LCKeyCHAiN', '2000-02-19'),
    (5, 'LCKeyChain', '2000-02-28'),
    (6, 'Matryoshka', '2000-03-31')
])

result = fix_product_name_format(conn)
assert result == [
    ('lckeychain', '2000-02', 2),
    ('lcphone', '2000-01', 2),
    ('lcphone', '2000-02', 1),
    ('matryosh