LeetCode 2408 - Design SQL

This problem asks us to design a lightweight in memory database system that supports four core operations on multiple tables: 1. Insert a row into a table 2. Remove a row by id 3. Select a single cell value 4.

LeetCode Problem 2408

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Design

Solution

Problem Understanding

This problem asks us to design a lightweight in memory database system that supports four core operations on multiple tables:

  1. Insert a row into a table
  2. Remove a row by id
  3. Select a single cell value
  4. Export all rows in CSV style format

Each table has a fixed number of columns. The constructor receives two arrays:

  • names[i] is the table name
  • columns[i] is the number of columns in that table

Every inserted row automatically receives an integer id. These ids are table specific and auto increment independently for each table.

One subtle but important rule is that deleted ids are never reused. Suppose a table inserts rows with ids 1, 2, and 3, then row 2 is deleted. The next inserted row must still receive id 4, not 2.

The operations behave as follows:

  • ins(name, row) inserts a row if the table exists and the column count matches exactly.
  • rmv(name, rowId) removes a row if it exists.
  • sel(name, rowId, columnId) retrieves a specific cell value.
  • exp(name) exports all existing rows in insertion order, formatted as comma separated strings beginning with the row id.

The constraints are relatively small:

  • At most 2000 insertions and removals
  • At most 10^4 select operations
  • At most 500 export operations

These limits tell us we do not need highly advanced database indexing structures. A carefully designed hash map based solution is more than sufficient.

There are several important edge cases:

  • Inserting into a non existent table
  • Inserting a row with the wrong number of columns
  • Selecting from a deleted row
  • Selecting an invalid column index
  • Exporting a table after deletions
  • Ensuring deleted ids are never reused

A naive implementation can easily fail the auto increment requirement if it computes the next id from the current row count instead of tracking the last assigned id separately.

Approaches

Brute Force Approach

A straightforward implementation would store each table as a simple list of rows. Every inserted row would append to the list along with its generated id.

For removal, we could linearly search through the rows until we find the matching id and then remove it.

For selection, we could again linearly scan the list to locate the row id, then retrieve the desired column.

For export, we iterate through all rows and format them into CSV strings.

This approach is correct because every operation directly searches through the actual stored rows. However, selection and removal become inefficient because they require scanning the entire table.

In the worst case:

  • rmv becomes O(n)
  • sel becomes O(n)

With many queries, repeated linear scans are unnecessary and inefficient.

Optimal Approach

The key observation is that row ids are unique within each table. This makes hash maps ideal for row lookup.

Instead of storing rows only in a list, we maintain:

  • A hash map from table name to table metadata

  • For each table:

  • Expected column count

  • Next available auto increment id

  • A hash map from row id to row data

This immediately improves:

  • sel to O(1)
  • rmv to O(1)
  • ins to O(1)

The only operation that still requires iterating through rows is exp, because it must return every existing row.

The follow up about sparse tables strongly suggests using a hash map rather than a dense array indexed by row id. If many deletions occur, a dense array wastes memory storing empty slots, while a hash map stores only active rows.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) per select/remove O(n) Uses linear scans through rows
Optimal O(1) average for insert/select/remove O(n) Uses hash maps for direct lookup

Algorithm Walkthrough

  1. Create a table structure for each table name.

Each table stores:

  • The expected number of columns
  • The next available row id
  • A hash map from row id to row contents

We initialize the next id as 1. 2. For insertion, first verify the table exists.

If the table name is invalid, return False. 3. Verify the inserted row has the correct number of columns.

If the size does not match the table definition, return False. 4. Assign the current auto increment id to the row.

Store the row in the table's row map using this id. 5. Increment the next available id.

This guarantees ids are never reused, even after deletions. 6. For removal, verify the table exists.

If it does, remove the row id from the hash map if present. 7. For selection, validate:

  • The table exists
  • The row exists
  • The column index is within bounds

If any check fails, return "<null>". 8. If all validations pass, return the correct column value.

Note that columnId is 1 indexed, so we access row[columnId - 1]. 9. For export, validate the table exists.

If not, return an empty list. 10. Iterate through all stored rows in ascending row id order.

For each row:

  • Convert the id to string
  • Concatenate it with all column values using commas
  1. Return the resulting list of CSV formatted rows.

Why it works

The correctness relies on maintaining a direct mapping between row ids and stored rows. Since every inserted row receives a unique incrementing id, lookups are unambiguous. Deletions only remove entries from the hash map and never affect the next id counter, which guarantees the auto increment rule is preserved.

Python Solution

from typing import List, Dict

class Table:
    def __init__(self, column_count: int):
        self.column_count = column_count
        self.next_id = 1
        self.rows: Dict[int, List[str]] = {}

class SQL:

    def __init__(self, names: List[str], columns: List[int]):
        self.tables: Dict[str, Table] = {}

        for name, column_count in zip(names, columns):
            self.tables[name] = Table(column_count)

    def ins(self, name: str, row: List[str]) -> bool:
        if name not in self.tables:
            return False

        table = self.tables[name]

        if len(row) != table.column_count:
            return False

        row_id = table.next_id
        table.rows[row_id] = row
        table.next_id += 1

        return True

    def rmv(self, name: str, rowId: int) -> None:
        if name not in self.tables:
            return

        table = self.tables[name]

        if rowId in table.rows:
            del table.rows[rowId]

    def sel(self, name: str, rowId: int, columnId: int) -> str:
        if name not in self.tables:
            return "<null>"

        table = self.tables[name]

        if rowId not in table.rows:
            return "<null>"

        row = table.rows[rowId]

        if columnId < 1 or columnId > table.column_count:
            return "<null>"

        return row[columnId - 1]

    def exp(self, name: str) -> List[str]:
        if name not in self.tables:
            return []

        table = self.tables[name]
        exported_rows = []

        for row_id in sorted(table.rows.keys()):
            row = table.rows[row_id]
            exported_rows.append(",".join([str(row_id)] + row))

        return exported_rows

The implementation separates table metadata into a dedicated Table class. This improves readability because each table naturally owns its:

  • Column count
  • Auto increment counter
  • Stored rows

The constructor builds a dictionary from table name to Table object.

Insertion validates both table existence and column count before storing the row. The current next_id becomes the row id, then the counter increments immediately afterward.

Removal simply deletes the row id from the dictionary if present.

Selection performs all required validity checks before accessing the requested column.

Export iterates through existing row ids in sorted order and formats each row as a CSV string beginning with the row id.

Go Solution

package main

import (
	"strconv"
	"strings"
)

type Table struct {
	columnCount int
	nextID      int
	rows        map[int][]string
}

type SQL struct {
	tables map[string]*Table
}

func Constructor(names []string, columns []int) SQL {
	sql := SQL{
		tables: make(map[string]*Table),
	}

	for i, name := range names {
		sql.tables[name] = &Table{
			columnCount: columns[i],
			nextID:      1,
			rows:        make(map[int][]string),
		}
	}

	return sql
}

func (this *SQL) Ins(name string, row []string) bool {
	table, exists := this.tables[name]
	if !exists {
		return false
	}

	if len(row) != table.columnCount {
		return false
	}

	rowID := table.nextID
	table.rows[rowID] = row
	table.nextID++

	return true
}

func (this *SQL) Rmv(name string, rowId int) {
	table, exists := this.tables[name]
	if !exists {
		return
	}

	delete(table.rows, rowId)
}

func (this *SQL) Sel(name string, rowId int, columnId int) string {
	table, exists := this.tables[name]
	if !exists {
		return "<null>"
	}

	row, exists := table.rows[rowId]
	if !exists {
		return "<null>"
	}

	if columnId < 1 || columnId > table.columnCount {
		return "<null>"
	}

	return row[columnId-1]
}

func (this *SQL) Exp(name string) []string {
	table, exists := this.tables[name]
	if !exists {
		return []string{}
	}

	result := []string{}

	for rowID := 1; rowID < table.nextID; rowID++ {
		row, exists := table.rows[rowID]
		if !exists {
			continue
		}

		values := append([]string{strconv.Itoa(rowID)}, row...)
		result = append(result, strings.Join(values, ","))
	}

	return result
}

The Go implementation mirrors the Python logic closely but uses structs and maps instead of classes and dictionaries.

One notable difference is the export implementation. Since Go maps are unordered, we iterate from 1 to nextID - 1 and check whether each row still exists. This naturally preserves insertion order while skipping deleted rows.

Go slices are used for row storage, and the built in delete function removes rows efficiently from the map.

Worked Examples

Example 1

Input:

["SQL","ins","sel","ins","exp","rmv","sel","exp"]

Initial tables:

Table Columns Next ID Rows
one 2 1 {}
two 3 1 {}
three 1 1 {}

Step 1

sql.ins("two", ["first", "second", "third"])

Table state:

Row ID Row
1 ["first", "second", "third"]

Next ID becomes 2.

Step 2

sql.sel("two", 1, 3)

Row 1 exists.

Column 3 corresponds to "third".

Return:

"third"

Step 3

sql.ins("two", ["fourth", "fifth", "sixth"])

Table state:

Row ID Row
1 ["first", "second", "third"]
2 ["fourth", "fifth", "sixth"]

Next ID becomes 3.

Step 4

sql.exp("two")

Exported rows:

[
  "1,first,second,third",
  "2,fourth,fifth,sixth"
]

Step 5

sql.rmv("two", 1)

Table state:

Row ID Row
2 ["fourth", "fifth", "sixth"]

Step 6

sql.sel("two", 2, 2)

Return:

"fifth"

Step 7

sql.exp("two")

Return:

[
  "2,fourth,fifth,sixth"
]

Example 2

After removing row 1, selecting from it returns:

"<null>"

When inserting a row with only two columns into a table expecting three columns:

sql.ins("two", ["fourth", "fifth"])

The operation returns:

False

because the column count does not match.

Complexity Analysis

Measure Complexity Explanation
Time O(1) average for insert/remove/select, O(n) for export Hash map operations are constant average time
Space O(n) Stores all active rows

The dominant storage cost comes from keeping all rows in memory. Hash maps allow efficient direct access by row id, which makes insertions, deletions, and lookups fast. Export must still iterate through all rows because every row must appear in the output.

Test Cases

# Example 1
sql = SQL(["one", "two", "three"], [2, 3, 1])

assert sql.ins("two", ["first", "second", "third"]) is True
assert sql.sel("two", 1, 3) == "third"
assert sql.ins("two", ["fourth", "fifth", "sixth"]) is True
assert sql.exp("two") == [
    "1,first,second,third",
    "2,fourth,fifth,sixth"
]
sql.rmv("two", 1)
assert sql.sel("two", 2, 2) == "fifth"
assert sql.exp("two") == [
    "2,fourth,fifth,sixth"
]

# Example 2
sql = SQL(["one", "two", "three"], [2, 3, 1])

assert sql.ins("two", ["first", "second", "third"]) is True
assert sql.sel("two", 1, 3) == "third"

sql.rmv("two", 1)

assert sql.sel("two", 1, 2) == "<null>"  # deleted row
assert sql.ins("two", ["fourth", "fifth"]) is False  # wrong column count
assert sql.ins("two", ["fourth", "fifth", "sixth"]) is True

# Invalid table name
sql = SQL(["users"], [2])

assert sql.ins("invalid", ["a", "b"]) is False
assert sql.sel("invalid", 1, 1) == "<null>"
assert sql.exp("invalid") == []

# Invalid column index
sql = SQL(["users"], [2])

sql.ins("users", ["alice", "bob"])

assert sql.sel("users", 1, 0) == "<null>"  # too small
assert sql.sel("users", 1, 3) == "<null>"  # too large

# Auto increment after deletion
sql = SQL(["users"], [1])

sql.ins("users", ["a"])
sql.ins("users", ["b"])

sql.rmv("users", 1)

sql.ins("users", ["c"])

assert sql.exp("users") == [
    "2,b",
    "3,c"
]

# Empty export
sql = SQL(["users"], [2])

assert sql.exp("users") == []

# Remove nonexistent row
sql = SQL(["users"], [1])

sql.rmv("users", 99)

assert sql.exp("users") == []

# Multiple tables
sql = SQL(["a", "b"], [1, 2])

sql.ins("a", ["x"])
sql.ins("b", ["y", "z"])

assert sql.exp("a") == ["1,x"]
assert sql.exp("b") == ["1,y,z"]
Test Why
Example 1 Validates all core operations together
Example 2 Verifies deletion and invalid insertion
Invalid table Ensures graceful handling of bad table names
Invalid column index Tests bounds checking
Auto increment after deletion Confirms ids are never reused
Empty export Verifies export on empty table
Remove nonexistent row Ensures safe deletion
Multiple tables Confirms independent table state

Edge Cases

Deleting Rows Without Reusing IDs

One easy mistake is generating the next row id using the current row count. That fails after deletions because ids would be reused. For example, after inserting rows 1 and 2, then deleting 2, the next insertion must still receive id 3.

The implementation solves this by maintaining a dedicated next_id counter that only increases and never decreases.

Selecting Invalid Columns

The problem uses 1 indexed column ids. This can easily cause off by one bugs when accessing arrays.

The implementation validates:

1 <= columnId <= column_count

before accessing:

row[columnId - 1]

This guarantees safe indexing.

Sparse Tables After Many Deletions

If many rows are deleted, using an array indexed directly by row id wastes memory because deleted positions remain empty.

The hash map based design stores only active rows. Deleted rows disappear entirely from memory, making the implementation efficient even when tables become sparse. This directly addresses the follow up question and is the preferred approach for real database like workloads.