LeetCode 631 - Design Excel Sum Formula

Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Graph Theory, Design, Topological Sort, Matrix

Solution

Problem Understanding

The problem asks us to design a miniature spreadsheet system similar to Microsoft Excel. The spreadsheet contains cells arranged in rows and columns, where rows are numbered starting from 1 and columns are labeled using uppercase letters such as A, B, C, and so on.

Each cell can store either:

A direct integer value
A formula representing the sum of other cells or ranges of cells

The key challenge is that formulas are dynamic. If a formula cell depends on other cells, and one of those source cells changes later, the formula cell must automatically reflect the updated value.

For example, suppose cell C3 stores the formula:

=A1 + A1:B2

If any cell inside the referenced range changes later, C3 must also change automatically.

The class must support four operations:

Initialization

We create a spreadsheet with a fixed height and width. Every cell starts with value 0.

set(row, column, val)

This operation assigns a direct value to a cell.

If the cell previously contained a formula, the formula must be removed and replaced by the explicit value.

get(row, column)

This returns the current value of the cell.

If the cell contains a formula, its value must reflect all current dependencies.

sum(row, column, numbers)

This operation defines a formula for a cell.

The numbers list may contain:

Single cells such as "A1"
Rectangular ranges such as "B2:D5"

The formula persists until another set or sum overwrites the cell.

An important guarantee is that circular references never exist. Therefore, we never need to handle infinite dependency loops.

The constraints are intentionally small:

Maximum grid size is 26 x 26
At most 100 operations occur

This means we do not need extremely advanced optimization techniques. A clean dependency graph solution is sufficient.

Several edge cases are important:

A formula may reference the same cell multiple times
A range may overlap with another range
Updating a source cell must update all dependent formula cells
Overwriting a formula with set() must completely remove old dependencies
Formulas may depend on other formula cells recursively

These cases make dependency tracking the core difficulty of the problem.

Approaches

Brute Force Approach

A naive solution would recompute every formula cell from scratch after every update.

The idea is simple:

Store raw formulas for cells
Whenever set() changes a value, scan the entire spreadsheet
Recalculate every formula recursively

This works because formulas form a directed acyclic graph due to the no-cycle guarantee.

However, this approach is inefficient because every update may trigger recomputation of many unrelated cells. Even though the constraints are small, repeatedly scanning the whole spreadsheet is unnecessary and inefficient.

The brute force solution also becomes harder to manage because recursive recomputation may revisit the same formula many times.

Optimal Approach

The better solution treats the spreadsheet as a dependency graph.

Each formula cell stores:

Which cells it depends on
How many times each dependency contributes

For example:

=A1 + A1:B2

means:

A1 contributes twice
B1 contributes once
A2 contributes once
B2 contributes once

Instead of eagerly updating all cells, we compute formula values lazily using depth first search.

The key observation is that a formula cell's value is fully determined by the current values of its dependencies.

So:

set() stores a raw value and removes any old formula
sum() stores dependency mappings
get() recursively evaluates formulas when needed

Because there are no cycles, recursion is safe.

This creates a clean and maintainable design.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(H × W × D) per update	O(H × W)	Recomputes many unrelated formulas repeatedly
Optimal	O(D) per evaluation	O(H × W + E)	Uses dependency graph with recursive evaluation

Here:

H × W is spreadsheet size
D is total referenced dependencies
E is number of dependency edges

Algorithm Walkthrough

1. Represent each cell uniquely

We represent each cell using a tuple:

(row, column)

For example:

(1, 'A')

This makes cells easy to use as dictionary keys.

2. Store direct values separately from formulas

We maintain two structures:

values[cell] stores direct integer values
formulas[cell] stores dependency mappings

A formula mapping looks like:

{
    dependency_cell: frequency
}

For example:

{
    (1, 'A'): 2,
    (1, 'B'): 1,
    (2, 'A'): 1,
    (2, 'B'): 1
}

This correctly handles repeated references.

3. Parse ranges into individual cells

When processing strings like:

"B2:D4"

we expand the rectangle into all individual cells.

This simplifies later evaluation because formulas only need explicit dependency mappings.

4. Handle set()

When assigning a direct value:

Remove any existing formula
Store the new value

This is important because formulas persist until overwritten.

5. Handle sum()

When creating a formula:

Parse every range or single cell
Count how many times each referenced cell appears
Store the dependency mapping
Return the evaluated value

6. Evaluate cells recursively

get() works recursively.

If the cell contains a direct value:

Return it immediately

If the cell contains a formula:

Recursively evaluate every dependency
Multiply by its frequency
Sum the results

Because the graph is acyclic, recursion always terminates safely.

7. Support nested formulas naturally

A formula may depend on another formula cell.

Since get() recursively evaluates dependencies, nested formulas work automatically without special handling.

Why it works

The algorithm maintains the invariant that every formula cell stores an exact mapping of its dependencies.

Whenever get() evaluates a cell, it recursively computes the current values of all dependencies. Since formulas never form cycles, recursive evaluation always terminates and always reflects the latest spreadsheet state.

Therefore every returned value is correct.

Python Solution

from typing import List
from collections import defaultdict

class Excel:

    def __init__(self, height: int, width: str):
        self.values = {}
        self.formulas = {}
        self.height = height
        self.width = width

    def set(self, row: int, column: str, val: int) -> None:
        cell = (row, column)

        if cell in self.formulas:
            del self.formulas[cell]

        self.values[cell] = val

    def get(self, row: int, column: str) -> int:
        cell = (row, column)

        if cell not in self.formulas:
            return self.values.get(cell, 0)

        total = 0

        for dep_cell, count in self.formulas[cell].items():
            total += self.get(dep_cell[0], dep_cell[1]) * count

        return total

    def sum(self, row: int, column: str, numbers: List[str]) -> int:
        cell = (row, column)

        formula = defaultdict(int)

        for item in numbers:
            if ':' not in item:
                r, c = self.parse_cell(item)
                formula[(r, c)] += 1
            else:
                start, end = item.split(':')

                r1, c1 = self.parse_cell(start)
                r2, c2 = self.parse_cell(end)

                for r in range(r1, r2 + 1):
                    for c in range(ord(c1), ord(c2) + 1):
                        formula[(r, chr(c))] += 1

        self.formulas[cell] = formula

        if cell in self.values:
            del self.values[cell]

        return self.get(row, column)

    def parse_cell(self, s: str):
        column = s[0]
        row = int(s[1:])
        return row, column

The implementation separates direct values and formulas into different dictionaries. This cleanly distinguishes between cells storing raw integers and cells storing computed expressions.

The set() method removes any existing formula before storing the new value. This matches the problem statement that formulas disappear once overwritten.

The sum() method parses each reference string. Single cells are added directly, while ranges are expanded into all included cells. The frequency map is important because the same cell may appear multiple times in a formula.

The recursive get() method is the core of the solution. If a cell contains a formula, it recursively evaluates all dependencies and combines them according to their frequencies.

Because formulas never contain cycles, recursion remains safe.

Go Solution

package main

import (
	"strconv"
	"strings"
)

type Cell struct {
	row int
	col byte
}

type Excel struct {
	values   map[Cell]int
	formulas map[Cell]map[Cell]int
	height   int
	width    byte
}

func Constructor(height int, width byte) Excel {
	return Excel{
		values:   make(map[Cell]int),
		formulas: make(map[Cell]map[Cell]int),
		height:   height,
		width:    width,
	}
}

func (this *Excel) Set(row int, column byte, val int) {
	cell := Cell{row, column}

	delete(this.formulas, cell)
	this.values[cell] = val
}

func (this *Excel) Get(row int, column byte) int {
	cell := Cell{row, column}

	formula, exists := this.formulas[cell]

	if !exists {
		val, ok := this.values[cell]
		if !ok {
			return 0
		}
		return val
	}

	total := 0

	for depCell, count := range formula {
		total += this.Get(depCell.row, depCell.col) * count
	}

	return total
}

func (this *Excel) Sum(row int, column byte, numbers []string) int {
	cell := Cell{row, column}

	formula := make(map[Cell]int)

	for _, item := range numbers {
		if !strings.Contains(item, ":") {
			r, c := parseCell(item)
			formula[Cell{r, c}]++
		} else {
			parts := strings.Split(item, ":")

			r1, c1 := parseCell(parts[0])
			r2, c2 := parseCell(parts[1])

			for r := r1; r <= r2; r++ {
				for c := c1; c <= c2; c++ {
					formula[Cell{r, c}]++
				}
			}
		}
	}

	this.formulas[cell] = formula
	delete(this.values, cell)

	return this.Get(row, column)
}

func parseCell(s string) (int, byte) {
	col := s[0]
	row, _ := strconv.Atoi(s[1:])
	return row, col
}

The Go implementation follows the same design as the Python version but uses explicit struct types for cells because Go does not support tuples as map keys.

Go maps return zero values automatically, which simplifies some lookups. The recursive evaluation logic remains identical.

Integer overflow is not a concern because the constraints are very small.

Worked Examples

Example 1

Input:

["Excel", "set", "sum", "set", "get"]
[[3, "C"], [1, "A", 2], [3, "C", ["A1", "A1:B2"]], [2, "B", 2], [3, "C"]]

Step 1

Initialize spreadsheet:

Cell	Value
A1	0
B1	0
C1	0
A2	0
B2	0
C2	0
A3	0
B3	0
C3	0

Step 2

Call:

set(1, "A", 2)

State:

Cell	Value
A1	2

Step 3

Call:

sum(3, "C", ["A1", "A1:B2"])

Dependencies become:

Dependency	Frequency
A1	2
B1	1
A2	1
B2	1

Current values:

A1 = 2
B1 = 0
A2 = 0
B2 = 0

Computed result:

2 + 2 + 0 + 0 + 0 = 4

So:

C3 = 4

Step 4

Call:

set(2, "B", 2)

Now:

B2 = 2

Since C3 depends on B2, recursive evaluation updates automatically.

New computation:

A1 + A1 + B1 + A2 + B2
= 2 + 2 + 0 + 0 + 2
= 6

Step 5

Call:

get(3, "C")

Returns:

Complexity Analysis

Measure	Complexity	Explanation
Time	O(D)	Evaluating a formula visits each dependency once
Space	O(H × W + E)	Stores cell values and dependency graph

Here:

D is the number of referenced cells in a formula
E is the total number of dependency edges

The recursive evaluation may revisit shared dependencies multiple times, but the constraints are small enough that memoization is unnecessary.

Test Cases

# Provided example
excel = Excel(3, "C")
excel.set(1, "A", 2)
assert excel.sum(3, "C", ["A1", "A1:B2"]) == 4  # formula with range
excel.set(2, "B", 2)
assert excel.get(3, "C") == 6  # dependency update

# Single direct value
excel = Excel(2, "B")
excel.set(1, "A", 5)
assert excel.get(1, "A") == 5  # direct retrieval

# Formula overwrite with set
excel = Excel(2, "B")
excel.set(1, "A", 3)
excel.sum(2, "B", ["A1"])
assert excel.get(2, "B") == 3
excel.set(2, "B", 10)
assert excel.get(2, "B") == 10  # formula removed

# Nested formulas
excel = Excel(3, "C")
excel.set(1, "A", 2)
excel.sum(1, "B", ["A1"])
excel.sum(1, "C", ["B1"])
assert excel.get(1, "C") == 2  # recursive formula evaluation

# Range summation
excel = Excel(3, "C")
excel.set(1, "A", 1)
excel.set(1, "B", 2)
excel.set(2, "A", 3)
excel.set(2, "B", 4)
assert excel.sum(3, "C", ["A1:B2"]) == 10  # rectangular range

# Multiple references to same cell
excel = Excel(2, "B")
excel.set(1, "A", 7)
assert excel.sum(2, "B", ["A1", "A1"]) == 14  # duplicate references

# Empty untouched cells
excel = Excel(2, "B")
assert excel.get(2, "B") == 0  # default value

# Updating dependency after formula creation
excel = Excel(2, "B")
excel.set(1, "A", 1)
excel.sum(2, "B", ["A1"])
assert excel.get(2, "B") == 1
excel.set(1, "A", 9)
assert excel.get(2, "B") == 9  # dynamic update

Test	Why
Provided example	Validates core formula behavior
Single direct value	Ensures plain storage works
Formula overwrite with set	Confirms formulas are removed correctly
Nested formulas	Tests recursive dependency evaluation
Range summation	Verifies rectangle expansion
Multiple references to same cell	Ensures frequency counting works
Empty untouched cells	Confirms default zero behavior
Updating dependency after formula creation	Verifies dynamic recalculation

Edge Cases

Overwriting a Formula Cell

One subtle issue occurs when a formula cell later receives a direct value through set().

For example:

C1 = SUM(A1:B1)

followed by:

set(C1, 10)

If the old formula is not removed, future updates to A1 or B1 would incorrectly continue affecting C1.

The implementation handles this by deleting the formula entry whenever set() is called.

Duplicate References in Formulas

A formula may reference the same cell multiple times:

["A1", "A1"]

A naive implementation using a set would incorrectly count A1 only once.

The solution uses a frequency map so each dependency stores how many times it contributes.

Nested Formula Dependencies

A formula may depend on another formula:

B1 = SUM(A1)
C1 = SUM(B1)

When A1 changes, both B1 and C1 must reflect the update.

The recursive get() implementation naturally handles this because evaluating C1 recursively evaluates B1, which recursively evaluates A1.

Empty Cells Inside Ranges

Ranges may include untouched cells that were never explicitly assigned.

Those cells must contribute 0.

The implementation handles this safely using:

self.values.get(cell, 0)

so missing cells default to zero automatically.