LeetCode 631 - Design Excel Sum Formula
The problem asks us to design a miniature spreadsheet system similar to Microsoft Excel. The spreadsheet contains cells arranged in rows and columns, where rows are numbered starting from 1 and columns are labeled using uppercase letters such as A, B, C, and so on.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Graph Theory, Design, Topological Sort, Matrix
Solution
Problem Understanding
The problem asks us to design a miniature spreadsheet system similar to Microsoft Excel. The spreadsheet contains cells arranged in rows and columns, where rows are numbered starting from 1 and columns are labeled using uppercase letters such as A, B, C, and so on.
Each cell can store either:
- A direct integer value
- A formula representing the sum of other cells or ranges of cells
The key challenge is that formulas are dynamic. If a formula cell depends on other cells, and one of those source cells changes later, the formula cell must automatically reflect the updated value.
For example, suppose cell C3 stores the formula:
=A1 + A1:B2
If any cell inside the referenced range changes later, C3 must also change automatically.
The class must support four operations:
Initialization
We create a spreadsheet with a fixed height and width. Every cell starts with value 0.
set(row, column, val)
This operation assigns a direct value to a cell.
If the cell previously contained a formula, the formula must be removed and replaced by the explicit value.
get(row, column)
This returns the current value of the cell.
If the cell contains a formula, its value must reflect all current dependencies.
sum(row, column, numbers)
This operation defines a formula for a cell.
The numbers list may contain:
- Single cells such as
"A1" - Rectangular ranges such as
"B2:D5"
The formula persists until another set or sum overwrites the cell.
An important guarantee is that circular references never exist. Therefore, we never need to handle infinite dependency loops.
The constraints are intentionally small:
- Maximum grid size is
26 x 26 - At most
100operations occur
This means we do not need extremely advanced optimization techniques. A clean dependency graph solution is sufficient.
Several edge cases are important:
- A formula may reference the same cell multiple times
- A range may overlap with another range
- Updating a source cell must update all dependent formula cells
- Overwriting a formula with
set()must completely remove old dependencies - Formulas may depend on other formula cells recursively
These cases make dependency tracking the core difficulty of the problem.
Approaches
Brute Force Approach
A naive solution would recompute every formula cell from scratch after every update.
The idea is simple:
- Store raw formulas for cells
- Whenever
set()changes a value, scan the entire spreadsheet - Recalculate every formula recursively
This works because formulas form a directed acyclic graph due to the no-cycle guarantee.
However, this approach is inefficient because every update may trigger recomputation of many unrelated cells. Even though the constraints are small, repeatedly scanning the whole spreadsheet is unnecessary and inefficient.
The brute force solution also becomes harder to manage because recursive recomputation may revisit the same formula many times.
Optimal Approach
The better solution treats the spreadsheet as a dependency graph.
Each formula cell stores:
- Which cells it depends on
- How many times each dependency contributes
For example:
=A1 + A1:B2
means:
A1 contributes twice
B1 contributes once
A2 contributes once
B2 contributes once
Instead of eagerly updating all cells, we compute formula values lazily using depth first search.
The key observation is that a formula cell's value is fully determined by the current values of its dependencies.
So:
set()stores a raw value and removes any old formulasum()stores dependency mappingsget()recursively evaluates formulas when needed
Because there are no cycles, recursion is safe.
This creates a clean and maintainable design.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(H × W × D) per update | O(H × W) | Recomputes many unrelated formulas repeatedly |
| Optimal | O(D) per evaluation | O(H × W + E) | Uses dependency graph with recursive evaluation |
Here:
H × Wis spreadsheet sizeDis total referenced dependenciesEis number of dependency edges
Algorithm Walkthrough
1. Represent each cell uniquely
We represent each cell using a tuple:
(row, column)
For example:
(1, 'A')
This makes cells easy to use as dictionary keys.
2. Store direct values separately from formulas
We maintain two structures:
values[cell]stores direct integer valuesformulas[cell]stores dependency mappings
A formula mapping looks like:
{
dependency_cell: frequency
}
For example:
{
(1, 'A'): 2,
(1, 'B'): 1,
(2, 'A'): 1,
(2, 'B'): 1
}
This correctly handles repeated references.
3. Parse ranges into individual cells
When processing strings like:
"B2:D4"
we expand the rectangle into all individual cells.
This simplifies later evaluation because formulas only need explicit dependency mappings.
4. Handle set()
When assigning a direct value:
- Remove any existing formula
- Store the new value
This is important because formulas persist until overwritten.
5. Handle sum()
When creating a formula:
- Parse every range or single cell
- Count how many times each referenced cell appears
- Store the dependency mapping
- Return the evaluated value
6. Evaluate cells recursively
get() works recursively.
If the cell contains a direct value:
- Return it immediately
If the cell contains a formula:
- Recursively evaluate every dependency
- Multiply by its frequency
- Sum the results
Because the graph is acyclic, recursion always terminates safely.
7. Support nested formulas naturally
A formula may depend on another formula cell.
Since get() recursively evaluates dependencies, nested formulas work automatically without special handling.
Why it works
The algorithm maintains the invariant that every formula cell stores an exact mapping of its dependencies.
Whenever get() evaluates a cell, it recursively computes the current values of all dependencies. Since formulas never form cycles, recursive evaluation always terminates and always reflects the latest spreadsheet state.
Therefore every returned value is correct.
Python Solution
from typing import List
from collections import defaultdict
class Excel:
def __init__(self, height: int, width: str):
self.values = {}
self.formulas = {}
self.height = height
self.width = width
def set(self, row: int, column: str, val: int) -> None:
cell = (row, column)
if cell in self.formulas:
del self.formulas[cell]
self.values[cell] = val
def get(self, row: int, column: str) -> int:
cell = (row, column)
if cell not in self.formulas:
return self.values.get(cell, 0)
total = 0
for dep_cell, count in self.formulas[cell].items():
total += self.get(dep_cell[0], dep_cell[1]) * count
return total
def sum(self, row: int, column: str, numbers: List[str]) -> int:
cell = (row, column)
formula = defaultdict(int)
for item in numbers:
if ':' not in item:
r, c = self.parse_cell(item)
formula[(r, c)] += 1
else:
start, end = item.split(':')
r1, c1 = self.parse_cell(start)
r2, c2 = self.parse_cell(end)
for r in range(r1, r2 + 1):
for c in range(ord(c1), ord(c2) + 1):
formula[(r, chr(c))] += 1
self.formulas[cell] = formula
if cell in self.values:
del self.values[cell]
return self.get(row, column)
def parse_cell(self, s: str):
column = s[0]
row = int(s[1:])
return row, column
The implementation separates direct values and formulas into different dictionaries. This cleanly distinguishes between cells storing raw integers and cells storing computed expressions.
The set() method removes any existing formula before storing the new value. This matches the problem statement that formulas disappear once overwritten.
The sum() method parses each reference string. Single cells are added directly, while ranges are expanded into all included cells. The frequency map is important because the same cell may appear multiple times in a formula.
The recursive get() method is the core of the solution. If a cell contains a formula, it recursively evaluates all dependencies and combines them according to their frequencies.
Because formulas never contain cycles, recursion remains safe.
Go Solution
package main
import (
"strconv"
"strings"
)
type Cell struct {
row int
col byte
}
type Excel struct {
values map[Cell]int
formulas map[Cell]map[Cell]int
height int
width byte
}
func Constructor(height int, width byte) Excel {
return Excel{
values: make(map[Cell]int),
formulas: make(map[Cell]map[Cell]int),
height: height,
width: width,
}
}
func (this *Excel) Set(row int, column byte, val int) {
cell := Cell{row, column}
delete(this.formulas, cell)
this.values[cell] = val
}
func (this *Excel) Get(row int, column byte) int {
cell := Cell{row, column}
formula, exists := this.formulas[cell]
if !exists {
val, ok := this.values[cell]
if !ok {
return 0
}
return val
}
total := 0
for depCell, count := range formula {
total += this.Get(depCell.row, depCell.col) * count
}
return total
}
func (this *Excel) Sum(row int, column byte, numbers []string) int {
cell := Cell{row, column}
formula := make(map[Cell]int)
for _, item := range numbers {
if !strings.Contains(item, ":") {
r, c := parseCell(item)
formula[Cell{r, c}]++
} else {
parts := strings.Split(item, ":")
r1, c1 := parseCell(parts[0])
r2, c2 := parseCell(parts[1])
for r := r1; r <= r2; r++ {
for c := c1; c <= c2; c++ {
formula[Cell{r, c}]++
}
}
}
}
this.formulas[cell] = formula
delete(this.values, cell)
return this.Get(row, column)
}
func parseCell(s string) (int, byte) {
col := s[0]
row, _ := strconv.Atoi(s[1:])
return row, col
}
The Go implementation follows the same design as the Python version but uses explicit struct types for cells because Go does not support tuples as map keys.
Go maps return zero values automatically, which simplifies some lookups. The recursive evaluation logic remains identical.
Integer overflow is not a concern because the constraints are very small.
Worked Examples
Example 1
Input:
["Excel", "set", "sum", "set", "get"]
[[3, "C"], [1, "A", 2], [3, "C", ["A1", "A1:B2"]], [2, "B", 2], [3, "C"]]
Step 1
Initialize spreadsheet:
| Cell | Value |
|---|---|
| A1 | 0 |
| B1 | 0 |
| C1 | 0 |
| A2 | 0 |
| B2 | 0 |
| C2 | 0 |
| A3 | 0 |
| B3 | 0 |
| C3 | 0 |
Step 2
Call:
set(1, "A", 2)
State:
| Cell | Value |
|---|---|
| A1 | 2 |
Step 3
Call:
sum(3, "C", ["A1", "A1:B2"])
Dependencies become:
| Dependency | Frequency |
|---|---|
| A1 | 2 |
| B1 | 1 |
| A2 | 1 |
| B2 | 1 |
Current values:
A1 = 2
B1 = 0
A2 = 0
B2 = 0
Computed result:
2 + 2 + 0 + 0 + 0 = 4
So:
C3 = 4
Step 4
Call:
set(2, "B", 2)
Now:
B2 = 2
Since C3 depends on B2, recursive evaluation updates automatically.
New computation:
A1 + A1 + B1 + A2 + B2
= 2 + 2 + 0 + 0 + 2
= 6
Step 5
Call:
get(3, "C")
Returns:
6
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(D) | Evaluating a formula visits each dependency once |
| Space | O(H × W + E) | Stores cell values and dependency graph |
Here:
Dis the number of referenced cells in a formulaEis the total number of dependency edges
The recursive evaluation may revisit shared dependencies multiple times, but the constraints are small enough that memoization is unnecessary.
Test Cases
# Provided example
excel = Excel(3, "C")
excel.set(1, "A", 2)
assert excel.sum(3, "C", ["A1", "A1:B2"]) == 4 # formula with range
excel.set(2, "B", 2)
assert excel.get(3, "C") == 6 # dependency update
# Single direct value
excel = Excel(2, "B")
excel.set(1, "A", 5)
assert excel.get(1, "A") == 5 # direct retrieval
# Formula overwrite with set
excel = Excel(2, "B")
excel.set(1, "A", 3)
excel.sum(2, "B", ["A1"])
assert excel.get(2, "B") == 3
excel.set(2, "B", 10)
assert excel.get(2, "B") == 10 # formula removed
# Nested formulas
excel = Excel(3, "C")
excel.set(1, "A", 2)
excel.sum(1, "B", ["A1"])
excel.sum(1, "C", ["B1"])
assert excel.get(1, "C") == 2 # recursive formula evaluation
# Range summation
excel = Excel(3, "C")
excel.set(1, "A", 1)
excel.set(1, "B", 2)
excel.set(2, "A", 3)
excel.set(2, "B", 4)
assert excel.sum(3, "C", ["A1:B2"]) == 10 # rectangular range
# Multiple references to same cell
excel = Excel(2, "B")
excel.set(1, "A", 7)
assert excel.sum(2, "B", ["A1", "A1"]) == 14 # duplicate references
# Empty untouched cells
excel = Excel(2, "B")
assert excel.get(2, "B") == 0 # default value
# Updating dependency after formula creation
excel = Excel(2, "B")
excel.set(1, "A", 1)
excel.sum(2, "B", ["A1"])
assert excel.get(2, "B") == 1
excel.set(1, "A", 9)
assert excel.get(2, "B") == 9 # dynamic update
| Test | Why |
|---|---|
| Provided example | Validates core formula behavior |
| Single direct value | Ensures plain storage works |
| Formula overwrite with set | Confirms formulas are removed correctly |
| Nested formulas | Tests recursive dependency evaluation |
| Range summation | Verifies rectangle expansion |
| Multiple references to same cell | Ensures frequency counting works |
| Empty untouched cells | Confirms default zero behavior |
| Updating dependency after formula creation | Verifies dynamic recalculation |
Edge Cases
Overwriting a Formula Cell
One subtle issue occurs when a formula cell later receives a direct value through set().
For example:
C1 = SUM(A1:B1)
followed by:
set(C1, 10)
If the old formula is not removed, future updates to A1 or B1 would incorrectly continue affecting C1.
The implementation handles this by deleting the formula entry whenever set() is called.
Duplicate References in Formulas
A formula may reference the same cell multiple times:
["A1", "A1"]
A naive implementation using a set would incorrectly count A1 only once.
The solution uses a frequency map so each dependency stores how many times it contributes.
Nested Formula Dependencies
A formula may depend on another formula:
B1 = SUM(A1)
C1 = SUM(B1)
When A1 changes, both B1 and C1 must reflect the update.
The recursive get() implementation naturally handles this because evaluating C1 recursively evaluates B1, which recursively evaluates A1.
Empty Cells Inside Ranges
Ranges may include untouched cells that were never explicitly assigned.
Those cells must contribute 0.
The implementation handles this safely using:
self.values.get(cell, 0)
so missing cells default to zero automatically.