LeetCode 2408 - Design SQL
This problem asks us to design a lightweight in memory database system that supports four core operations on multiple tables: 1. Insert a row into a table 2. Remove a row by id 3. Select a single cell value 4.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Design
Solution
Problem Understanding
This problem asks us to design a lightweight in memory database system that supports four core operations on multiple tables:
- Insert a row into a table
- Remove a row by id
- Select a single cell value
- Export all rows in CSV style format
Each table has a fixed number of columns. The constructor receives two arrays:
names[i]is the table namecolumns[i]is the number of columns in that table
Every inserted row automatically receives an integer id. These ids are table specific and auto increment independently for each table.
One subtle but important rule is that deleted ids are never reused. Suppose a table inserts rows with ids 1, 2, and 3, then row 2 is deleted. The next inserted row must still receive id 4, not 2.
The operations behave as follows:
ins(name, row)inserts a row if the table exists and the column count matches exactly.rmv(name, rowId)removes a row if it exists.sel(name, rowId, columnId)retrieves a specific cell value.exp(name)exports all existing rows in insertion order, formatted as comma separated strings beginning with the row id.
The constraints are relatively small:
- At most
2000insertions and removals - At most
10^4select operations - At most
500export operations
These limits tell us we do not need highly advanced database indexing structures. A carefully designed hash map based solution is more than sufficient.
There are several important edge cases:
- Inserting into a non existent table
- Inserting a row with the wrong number of columns
- Selecting from a deleted row
- Selecting an invalid column index
- Exporting a table after deletions
- Ensuring deleted ids are never reused
A naive implementation can easily fail the auto increment requirement if it computes the next id from the current row count instead of tracking the last assigned id separately.
Approaches
Brute Force Approach
A straightforward implementation would store each table as a simple list of rows. Every inserted row would append to the list along with its generated id.
For removal, we could linearly search through the rows until we find the matching id and then remove it.
For selection, we could again linearly scan the list to locate the row id, then retrieve the desired column.
For export, we iterate through all rows and format them into CSV strings.
This approach is correct because every operation directly searches through the actual stored rows. However, selection and removal become inefficient because they require scanning the entire table.
In the worst case:
rmvbecomesO(n)selbecomesO(n)
With many queries, repeated linear scans are unnecessary and inefficient.
Optimal Approach
The key observation is that row ids are unique within each table. This makes hash maps ideal for row lookup.
Instead of storing rows only in a list, we maintain:
-
A hash map from table name to table metadata
-
For each table:
-
Expected column count
-
Next available auto increment id
-
A hash map from row id to row data
This immediately improves:
seltoO(1)rmvtoO(1)instoO(1)
The only operation that still requires iterating through rows is exp, because it must return every existing row.
The follow up about sparse tables strongly suggests using a hash map rather than a dense array indexed by row id. If many deletions occur, a dense array wastes memory storing empty slots, while a hash map stores only active rows.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) per select/remove |
O(n) |
Uses linear scans through rows |
| Optimal | O(1) average for insert/select/remove |
O(n) |
Uses hash maps for direct lookup |
Algorithm Walkthrough
- Create a table structure for each table name.
Each table stores:
- The expected number of columns
- The next available row id
- A hash map from row id to row contents
We initialize the next id as 1.
2. For insertion, first verify the table exists.
If the table name is invalid, return False.
3. Verify the inserted row has the correct number of columns.
If the size does not match the table definition, return False.
4. Assign the current auto increment id to the row.
Store the row in the table's row map using this id. 5. Increment the next available id.
This guarantees ids are never reused, even after deletions. 6. For removal, verify the table exists.
If it does, remove the row id from the hash map if present. 7. For selection, validate:
- The table exists
- The row exists
- The column index is within bounds
If any check fails, return "<null>".
8. If all validations pass, return the correct column value.
Note that columnId is 1 indexed, so we access row[columnId - 1].
9. For export, validate the table exists.
If not, return an empty list. 10. Iterate through all stored rows in ascending row id order.
For each row:
- Convert the id to string
- Concatenate it with all column values using commas
- Return the resulting list of CSV formatted rows.
Why it works
The correctness relies on maintaining a direct mapping between row ids and stored rows. Since every inserted row receives a unique incrementing id, lookups are unambiguous. Deletions only remove entries from the hash map and never affect the next id counter, which guarantees the auto increment rule is preserved.
Python Solution
from typing import List, Dict
class Table:
def __init__(self, column_count: int):
self.column_count = column_count
self.next_id = 1
self.rows: Dict[int, List[str]] = {}
class SQL:
def __init__(self, names: List[str], columns: List[int]):
self.tables: Dict[str, Table] = {}
for name, column_count in zip(names, columns):
self.tables[name] = Table(column_count)
def ins(self, name: str, row: List[str]) -> bool:
if name not in self.tables:
return False
table = self.tables[name]
if len(row) != table.column_count:
return False
row_id = table.next_id
table.rows[row_id] = row
table.next_id += 1
return True
def rmv(self, name: str, rowId: int) -> None:
if name not in self.tables:
return
table = self.tables[name]
if rowId in table.rows:
del table.rows[rowId]
def sel(self, name: str, rowId: int, columnId: int) -> str:
if name not in self.tables:
return "<null>"
table = self.tables[name]
if rowId not in table.rows:
return "<null>"
row = table.rows[rowId]
if columnId < 1 or columnId > table.column_count:
return "<null>"
return row[columnId - 1]
def exp(self, name: str) -> List[str]:
if name not in self.tables:
return []
table = self.tables[name]
exported_rows = []
for row_id in sorted(table.rows.keys()):
row = table.rows[row_id]
exported_rows.append(",".join([str(row_id)] + row))
return exported_rows
The implementation separates table metadata into a dedicated Table class. This improves readability because each table naturally owns its:
- Column count
- Auto increment counter
- Stored rows
The constructor builds a dictionary from table name to Table object.
Insertion validates both table existence and column count before storing the row. The current next_id becomes the row id, then the counter increments immediately afterward.
Removal simply deletes the row id from the dictionary if present.
Selection performs all required validity checks before accessing the requested column.
Export iterates through existing row ids in sorted order and formats each row as a CSV string beginning with the row id.
Go Solution
package main
import (
"strconv"
"strings"
)
type Table struct {
columnCount int
nextID int
rows map[int][]string
}
type SQL struct {
tables map[string]*Table
}
func Constructor(names []string, columns []int) SQL {
sql := SQL{
tables: make(map[string]*Table),
}
for i, name := range names {
sql.tables[name] = &Table{
columnCount: columns[i],
nextID: 1,
rows: make(map[int][]string),
}
}
return sql
}
func (this *SQL) Ins(name string, row []string) bool {
table, exists := this.tables[name]
if !exists {
return false
}
if len(row) != table.columnCount {
return false
}
rowID := table.nextID
table.rows[rowID] = row
table.nextID++
return true
}
func (this *SQL) Rmv(name string, rowId int) {
table, exists := this.tables[name]
if !exists {
return
}
delete(table.rows, rowId)
}
func (this *SQL) Sel(name string, rowId int, columnId int) string {
table, exists := this.tables[name]
if !exists {
return "<null>"
}
row, exists := table.rows[rowId]
if !exists {
return "<null>"
}
if columnId < 1 || columnId > table.columnCount {
return "<null>"
}
return row[columnId-1]
}
func (this *SQL) Exp(name string) []string {
table, exists := this.tables[name]
if !exists {
return []string{}
}
result := []string{}
for rowID := 1; rowID < table.nextID; rowID++ {
row, exists := table.rows[rowID]
if !exists {
continue
}
values := append([]string{strconv.Itoa(rowID)}, row...)
result = append(result, strings.Join(values, ","))
}
return result
}
The Go implementation mirrors the Python logic closely but uses structs and maps instead of classes and dictionaries.
One notable difference is the export implementation. Since Go maps are unordered, we iterate from 1 to nextID - 1 and check whether each row still exists. This naturally preserves insertion order while skipping deleted rows.
Go slices are used for row storage, and the built in delete function removes rows efficiently from the map.
Worked Examples
Example 1
Input:
["SQL","ins","sel","ins","exp","rmv","sel","exp"]
Initial tables:
| Table | Columns | Next ID | Rows |
|---|---|---|---|
| one | 2 | 1 | {} |
| two | 3 | 1 | {} |
| three | 1 | 1 | {} |
Step 1
sql.ins("two", ["first", "second", "third"])
Table state:
| Row ID | Row |
|---|---|
| 1 | ["first", "second", "third"] |
Next ID becomes 2.
Step 2
sql.sel("two", 1, 3)
Row 1 exists.
Column 3 corresponds to "third".
Return:
"third"
Step 3
sql.ins("two", ["fourth", "fifth", "sixth"])
Table state:
| Row ID | Row |
|---|---|
| 1 | ["first", "second", "third"] |
| 2 | ["fourth", "fifth", "sixth"] |
Next ID becomes 3.
Step 4
sql.exp("two")
Exported rows:
[
"1,first,second,third",
"2,fourth,fifth,sixth"
]
Step 5
sql.rmv("two", 1)
Table state:
| Row ID | Row |
|---|---|
| 2 | ["fourth", "fifth", "sixth"] |
Step 6
sql.sel("two", 2, 2)
Return:
"fifth"
Step 7
sql.exp("two")
Return:
[
"2,fourth,fifth,sixth"
]
Example 2
After removing row 1, selecting from it returns:
"<null>"
When inserting a row with only two columns into a table expecting three columns:
sql.ins("two", ["fourth", "fifth"])
The operation returns:
False
because the column count does not match.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(1) average for insert/remove/select, O(n) for export |
Hash map operations are constant average time |
| Space | O(n) |
Stores all active rows |
The dominant storage cost comes from keeping all rows in memory. Hash maps allow efficient direct access by row id, which makes insertions, deletions, and lookups fast. Export must still iterate through all rows because every row must appear in the output.
Test Cases
# Example 1
sql = SQL(["one", "two", "three"], [2, 3, 1])
assert sql.ins("two", ["first", "second", "third"]) is True
assert sql.sel("two", 1, 3) == "third"
assert sql.ins("two", ["fourth", "fifth", "sixth"]) is True
assert sql.exp("two") == [
"1,first,second,third",
"2,fourth,fifth,sixth"
]
sql.rmv("two", 1)
assert sql.sel("two", 2, 2) == "fifth"
assert sql.exp("two") == [
"2,fourth,fifth,sixth"
]
# Example 2
sql = SQL(["one", "two", "three"], [2, 3, 1])
assert sql.ins("two", ["first", "second", "third"]) is True
assert sql.sel("two", 1, 3) == "third"
sql.rmv("two", 1)
assert sql.sel("two", 1, 2) == "<null>" # deleted row
assert sql.ins("two", ["fourth", "fifth"]) is False # wrong column count
assert sql.ins("two", ["fourth", "fifth", "sixth"]) is True
# Invalid table name
sql = SQL(["users"], [2])
assert sql.ins("invalid", ["a", "b"]) is False
assert sql.sel("invalid", 1, 1) == "<null>"
assert sql.exp("invalid") == []
# Invalid column index
sql = SQL(["users"], [2])
sql.ins("users", ["alice", "bob"])
assert sql.sel("users", 1, 0) == "<null>" # too small
assert sql.sel("users", 1, 3) == "<null>" # too large
# Auto increment after deletion
sql = SQL(["users"], [1])
sql.ins("users", ["a"])
sql.ins("users", ["b"])
sql.rmv("users", 1)
sql.ins("users", ["c"])
assert sql.exp("users") == [
"2,b",
"3,c"
]
# Empty export
sql = SQL(["users"], [2])
assert sql.exp("users") == []
# Remove nonexistent row
sql = SQL(["users"], [1])
sql.rmv("users", 99)
assert sql.exp("users") == []
# Multiple tables
sql = SQL(["a", "b"], [1, 2])
sql.ins("a", ["x"])
sql.ins("b", ["y", "z"])
assert sql.exp("a") == ["1,x"]
assert sql.exp("b") == ["1,y,z"]
| Test | Why |
|---|---|
| Example 1 | Validates all core operations together |
| Example 2 | Verifies deletion and invalid insertion |
| Invalid table | Ensures graceful handling of bad table names |
| Invalid column index | Tests bounds checking |
| Auto increment after deletion | Confirms ids are never reused |
| Empty export | Verifies export on empty table |
| Remove nonexistent row | Ensures safe deletion |
| Multiple tables | Confirms independent table state |
Edge Cases
Deleting Rows Without Reusing IDs
One easy mistake is generating the next row id using the current row count. That fails after deletions because ids would be reused. For example, after inserting rows 1 and 2, then deleting 2, the next insertion must still receive id 3.
The implementation solves this by maintaining a dedicated next_id counter that only increases and never decreases.
Selecting Invalid Columns
The problem uses 1 indexed column ids. This can easily cause off by one bugs when accessing arrays.
The implementation validates:
1 <= columnId <= column_count
before accessing:
row[columnId - 1]
This guarantees safe indexing.
Sparse Tables After Many Deletions
If many rows are deleted, using an array indexed directly by row id wastes memory because deleted positions remain empty.
The hash map based design stores only active rows. Deleted rows disappear entirely from memory, making the implementation efficient even when tables become sparse. This directly addresses the follow up question and is the preferred approach for real database like workloads.