LeetCode 1543 - Fix Product Name Format
The problem asks us to clean and aggregate sales data from a Sales table. Each row contains a unique saleid, a productna
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
The problem asks us to clean and aggregate sales data from a Sales table. Each row contains a unique sale_id, a product_name, and a sale_date. Due to manual data entry errors, product_name may contain extra leading or trailing spaces and may use inconsistent letter casing. Our task is to produce a report that normalizes the product names (lowercase, trimmed) and aggregates the total number of sales per product per month. The sale_date in the output should be truncated to the year and month in 'YYYY-MM' format.
The input represents raw sales records with potential inconsistencies in text formatting. The output must be a table with three columns: product_name, sale_date, and total representing the count of sales for that product in the corresponding month. The output should be ordered alphabetically by product_name, and if multiple rows have the same product_name, they should be sorted by sale_date.
Key constraints are minimal because the table is relatively small (daily sales since 2000). However, we must handle text inconsistencies and ensure correct aggregation. Edge cases include product names with only spaces, varying capitalization, months with no sales for a product, and ensuring date formatting truncates to year and month correctly.
Approaches
Brute Force Approach
A naive approach would involve iterating over all sales, cleaning each product name by trimming spaces and converting to lowercase, then grouping by product name and month manually in code. We could store counts in a dictionary keyed by (product_name, year-month). While this works for correctness, it is unnecessary in SQL because SQL provides aggregation functions like COUNT() and string functions like TRIM() and LOWER(). Using a programming approach would also introduce overhead in memory management and sorting.
Optimal Approach
The key observation is that SQL already has functions to clean and aggregate data. We can use TRIM() to remove spaces, LOWER() to normalize case, and DATE_FORMAT() (MySQL) or TO_CHAR() (PostgreSQL) to extract the YYYY-MM from sale_date. Then, we can GROUP BY the normalized product name and month, and ORDER BY accordingly. This leverages the database engine for aggregation and sorting efficiently, minimizing memory usage and computation in application code.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Iterate and aggregate manually in code using dictionaries or maps |
| Optimal | O(n log n) | O(n) | Use SQL aggregation, sorting handled by database engine |
Algorithm Walkthrough
- Select
product_namefrom theSalestable. - Use
TRIM()to remove leading and trailing spaces. - Use
LOWER()to convert the product name to lowercase. - Extract the year and month from
sale_dateusingDATE_FORMAT(sale_date, '%Y-%m'). - Count the number of rows for each
(product_name, year-month)combination usingCOUNT(*). - Group the results by normalized
product_nameand truncatedsale_date. - Order the final result first by
product_nameascending, then bysale_dateascending.
Why it works: SQL aggregation guarantees that all rows with the same normalized product name and month are counted exactly once. Using TRIM() and LOWER() ensures consistent naming, and DATE_FORMAT() standardizes the date. Sorting ensures the output meets the required order.
Python Solution
# Python does not directly run SQL in LeetCode; this is a demonstration using sqlite3
import sqlite3
def fix_product_name_format(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
cursor = conn.cursor()
query = """
SELECT LOWER(TRIM(product_name)) AS product_name,
STRFTIME('%Y-%m', sale_date) AS sale_date,
COUNT(*) AS total
FROM Sales
GROUP BY product_name, sale_date
ORDER BY product_name ASC, sale_date ASC
"""
cursor.execute(query)
return cursor.fetchall()
In this Python version, we assume an active database connection conn. The SQL query normalizes product_name, formats sale_date to YYYY-MM, counts sales per product per month, groups by both fields, and orders results correctly.
Go Solution
package main
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
"fmt"
)
type ProductSale struct {
ProductName string
SaleDate string
Total int
}
func FixProductNameFormat(db *sql.DB) ([]ProductSale, error) {
query := `
SELECT LOWER(TRIM(product_name)) AS product_name,
DATE_FORMAT(sale_date, '%Y-%m') AS sale_date,
COUNT(*) AS total
FROM Sales
GROUP BY product_name, sale_date
ORDER BY product_name ASC, sale_date ASC
`
rows, err := db.Query(query)
if err != nil {
return nil, err
}
defer rows.Close()
var results []ProductSale
for rows.Next() {
var ps ProductSale
if err := rows.Scan(&ps.ProductName, &ps.SaleDate, &ps.Total); err != nil {
return nil, err
}
results = append(results, ps)
}
return results, nil
}
func main() {
db, err := sql.Open("mysql", "user:password@/dbname")
if err != nil {
panic(err)
}
defer db.Close()
results, err := FixProductNameFormat(db)
if err != nil {
panic(err)
}
for _, r := range results {
fmt.Println(r)
}
}
In Go, the main differences include using sql.DB for database operations, scanning rows into a struct, and explicitly handling errors. The SQL query remains the same.
Worked Examples
For the input:
+---------+--------------+------------+
| sale_id | product_name | sale_date |
+---------+--------------+------------+
| 1 | LCPHONE | 2000-01-16 |
| 2 | LCPhone | 2000-01-17 |
| 3 | LcPhOnE | 2000-02-18 |
| 4 | LCKeyCHAiN | 2000-02-19 |
| 5 | LCKeyChain | 2000-02-28 |
| 6 | Matryoshka | 2000-03-31 |
+---------+--------------+------------+
Step-by-step:
- Normalize
product_name:
"LCPHONE" -> "lcphone"
"LCKeyCHAiN" -> "lckeychain"
"Matryoshka" -> "matryoshka"
- Extract
YYYY-MM:
"2000-01-16" -> "2000-01"
"2000-02-19" -> "2000-02"
"2000-03-31" -> "2000-03"
- Count per
(product_name, month):
lcphone, 2000-01 -> 2
lcphone, 2000-02 -> 1
lckeychain, 2000-02 -> 2
matryoshka, 2000-03 -> 1
- Sort alphabetically by product name and then by date.
Final output:
+--------------+-----------+-------+
| product_name | sale_date | total |
+--------------+-----------+-------+
| lckeychain | 2000-02 | 2 |
| lcphone | 2000-01 | 2 |
| lcphone | 2000-02 | 1 |
| matryoshka | 2000-03 | 1 |
+--------------+-----------+-------+
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | SQL must scan all n rows and sort the grouped results |
| Space | O(n) | Storing intermediate groups and final results |
The complexity is dominated by the grouping and sorting operations in SQL. Database engines handle these efficiently using optimized internal algorithms.
Test Cases
# Assuming SQLite connection
import sqlite3
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""
CREATE TABLE Sales (
sale_id INTEGER PRIMARY KEY,
product_name TEXT,
sale_date DATE
)
""")
c.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
(1, 'LCPHONE', '2000-01-16'),
(2, 'LCPhone', '2000-01-17'),
(3, 'LcPhOnE', '2000-02-18'),
(4, 'LCKeyCHAiN', '2000-02-19'),
(5, 'LCKeyChain', '2000-02-28'),
(6, 'Matryoshka', '2000-03-31')
])
result = fix_product_name_format(conn)
assert result == [
('lckeychain', '2000-02', 2),
('lcphone', '2000-01', 2),
('lcphone', '2000-02', 1),
('matryosh