LeetCode 3369 - Design an Array Statistics Tracker
The problem asks us to design a data structure that supports a stream of operations on a dynamic collection of integers. Numbers are added over time, and the oldest inserted number can also be removed.
Difficulty: 🔴 Hard
Topics: Hash Table, Binary Search, Design, Queue, Heap (Priority Queue), Data Stream, Ordered Set
Solution
Problem Understanding
The problem asks us to design a data structure that supports a stream of operations on a dynamic collection of integers. Numbers are added over time, and the oldest inserted number can also be removed. At any point, we must efficiently answer three statistical queries:
- The floored mean of all current numbers
- The median of the current numbers
- The mode of the current numbers
The important detail is that removals always happen in insertion order. This means the structure behaves partially like a queue, because removeFirstAddedNumber() always removes the earliest inserted value that still exists.
The challenge comes from the fact that we must support all operations efficiently under up to 10^5 total calls. A naive implementation that recomputes everything after every update would become too slow.
The mean is straightforward because it only depends on the total sum and count.
The median is more difficult because the collection changes dynamically. We need to efficiently maintain the middle element of a sorted version of the current numbers. The problem specifies that when the size is even, we take the larger middle element. For example:
[1, 2, 3, 4]- Middle candidates are
2and3 - We return
3
The mode is also tricky because frequencies change dynamically as elements are added and removed. If multiple values have the same highest frequency, we must return the smallest such value.
The constraints tell us that:
- Values can be as large as
10^9 - There are up to
10^5operations total - All query operations are guaranteed to happen only when the structure is non-empty
These constraints strongly suggest that:
- We need logarithmic time updates
- Full sorting or scanning on every operation is too expensive
- We need specialized data structures for median and mode maintenance
Several edge cases are important:
- Repeated values
- Removing values that currently define the median
- Multiple valid modes with tie-breaking by smallest value
- Even-sized arrays where the larger middle value must be chosen
- Frequent add/remove alternation
A correct solution must handle all of these efficiently.
Approaches
Brute Force Approach
The brute force solution stores all active numbers in a queue or list.
When a number is added, we append it.
When removing, we pop the oldest element.
For each query:
- Mean is computed by summing all elements
- Median is computed by sorting the current elements
- Mode is computed by counting frequencies
This approach is correct because every query directly computes the desired statistic from the current state of the collection.
However, it is too slow:
- Sorting for every median query costs
O(n log n) - Rebuilding frequency counts for every mode query costs
O(n) - Recomputing sums repeatedly is wasteful
With up to 10^5 operations, this approach can easily time out.
Optimal Approach
The key insight is that each statistic can be maintained incrementally.
For the mean:
- Maintain a running sum
- Maintain the current size
Then:
$$\text{mean} = \left\lfloor \frac{\text{sum}}{\text{count}} \right\rfloor$$
For the median:
- Use two heaps
- A max heap stores the smaller half
- A min heap stores the larger half
- Maintain balancing rules so the min heap always contains the median
Since removals are arbitrary relative to heap positions, we use lazy deletion.
For the mode:
-
Maintain a frequency map
-
Maintain a heap ordered by:
-
Highest frequency first
-
Smallest value first for ties
Again, lazy deletion is necessary because frequencies change dynamically.
The queue structure tracks insertion order so we know which value to remove.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) per query | O(n) | Repeated sorting and counting |
| Optimal | O(log n) per update/query | O(n) | Uses heaps, frequency maps, and lazy deletion |
Algorithm Walkthrough
Data Structures
We maintain several structures simultaneously:
- A queue storing insertion order
- A running sum
- Two heaps for median maintenance
- Frequency maps for lazy deletion
- A frequency counter for mode tracking
- A heap for mode queries
Median Maintenance
We use:
small, a max heap implemented with negated valueslarge, a min heap
Invariant:
largecontains the larger halfsmallcontains the smaller halflen(large)is either equal tolen(small)or exactly one larger
This ensures the median is always at the top of large.
Lazy Deletion
Python heaps do not support arbitrary removal efficiently.
When removing a number:
- Mark it as deleted in a map
- Physically remove it only when it reaches the top of a heap
This technique keeps all operations logarithmic.
Step-by-Step Algorithm
- When adding a number, append it to the queue and update the running sum and count.
- Insert the number into one of the median heaps.
- If it belongs to the smaller half, insert into
small - Otherwise insert into
large
- Rebalance the heaps so:
largehas either equal size or one extra element- All elements in
smallare less than or equal to elements inlarge
- Update the frequency count for the number.
- Push the updated
(frequency, value)pair into the mode heap.
-
We store
(-frequency, value)so the heap behaves like: -
Highest frequency first
-
Smallest value first
- When removing the oldest number:
- Pop from the queue
- Update sum and count
- Decrease frequency
- Mark the number for lazy deletion in the correct median heap
- Rebalance the heaps again after removal.
- For
getMean(), returnsum // count. - For
getMedian(), clean invalid heap tops and return the top oflarge. - For
getMode(), repeatedly discard stale heap entries until the heap top matches the current frequency map.
Why it works
The correctness comes from maintaining strict invariants.
For the median:
- The two heaps always partition the numbers into lower and upper halves
- The size invariant guarantees the top of
largeis exactly the required median
For the mode:
- The frequency map always stores true frequencies
- Lazy deletion guarantees stale heap entries are ignored
- Heap ordering guarantees the highest frequency and smallest value are selected
For the mean:
- The running sum and count always represent the active elements
Together, these invariants ensure every query returns the correct answer.
Python Solution
from collections import deque, defaultdict
import heapq
from typing import Deque, Dict, List
class StatisticsTracker:
def __init__(self):
self.queue: Deque[int] = deque()
self.total_sum = 0
self.count = 0
# Median structures
self.small: List[int] = [] # max heap via negatives
self.large: List[int] = [] # min heap
self.del_small: Dict[int, int] = defaultdict(int)
self.del_large: Dict[int, int] = defaultdict(int)
self.small_size = 0
self.large_size = 0
# Mode structures
self.freq: Dict[int, int] = defaultdict(int)
self.mode_heap: List[tuple[int, int]] = []
def _prune_small(self) -> None:
while self.small:
value = -self.small[0]
if self.del_small[value] > 0:
self.del_small[value] -= 1
heapq.heappop(self.small)
else:
break
def _prune_large(self) -> None:
while self.large:
value = self.large[0]
if self.del_large[value] > 0:
self.del_large[value] -= 1
heapq.heappop(self.large)
else:
break
def _rebalance(self) -> None:
if self.large_size > self.small_size + 1:
value = heapq.heappop(self.large)
heapq.heappush(self.small, -value)
self.large_size -= 1
self.small_size += 1
self._prune_large()
elif self.small_size > self.large_size:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
self.small_size -= 1
self.large_size += 1
self._prune_small()
def addNumber(self, number: int) -> None:
self.queue.append(number)
self.total_sum += number
self.count += 1
if not self.large or number >= self.large[0]:
heapq.heappush(self.large, number)
self.large_size += 1
else:
heapq.heappush(self.small, -number)
self.small_size += 1
self._rebalance()
self.freq[number] += 1
heapq.heappush(
self.mode_heap,
(-self.freq[number], number)
)
def removeFirstAddedNumber(self) -> None:
number = self.queue.popleft()
self.total_sum -= number
self.count -= 1
self.freq[number] -= 1
if self.large and number >= self.large[0]:
self.del_large[number] += 1
self.large_size -= 1
if self.large and self.large[0] == number:
self._prune_large()
else:
self.del_small[number] += 1
self.small_size -= 1
if self.small and -self.small[0] == number:
self._prune_small()
self._rebalance()
def getMean(self) -> int:
return self.total_sum // self.count
def getMedian(self) -> int:
self._prune_large()
return self.large[0]
def getMode(self) -> int:
while True:
freq, value = self.mode_heap[0]
if -freq == self.freq[value]:
return value
heapq.heappop(self.mode_heap)
The implementation separates the three statistical responsibilities cleanly.
The queue preserves insertion order so removals are correct.
The running sum and count make mean queries constant time.
The median logic uses two heaps with balancing rules. The large heap always contains the median at its top because we intentionally maintain one extra element there when the total count is odd.
The lazy deletion maps are essential because heap removal is not efficient otherwise. Instead of removing elements immediately, we mark them invalid and discard them later when they appear at the heap top.
The mode heap stores many historical frequency states. Most entries eventually become stale. The getMode() method repeatedly removes outdated entries until it finds one matching the current frequency map.
Go Solution
package main
import (
"container/heap"
)
type IntHeap []int
func (h IntHeap) Len() int {
return len(h)
}
func (h IntHeap) Less(i, j int) bool {
return h[i] < h[j]
}
func (h IntHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *IntHeap) Push(x interface{}) {
*h = append(*h, x.(int))
}
func (h *IntHeap) Pop() interface{} {
old := *h
n := len(old)
val := old[n-1]
*h = old[:n-1]
return val
}
type Pair struct {
freq int
value int
}
type PairHeap []Pair
func (h PairHeap) Len() int {
return len(h)
}
func (h PairHeap) Less(i, j int) bool {
if h[i].freq == h[j].freq {
return h[i].value < h[j].value
}
return h[i].freq < h[j].freq
}
func (h PairHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *PairHeap) Push(x interface{}) {
*h = append(*h, x.(Pair))
}
func (h *PairHeap) Pop() interface{} {
old := *h
n := len(old)
val := old[n-1]
*h = old[:n-1]
return val
}
type StatisticsTracker struct {
queue []int
head int
totalSum int64
count int
small IntHeap
large IntHeap
delSmall map[int]int
delLarge map[int]int
smallSize int
largeSize int
freq map[int]int
modeHeap PairHeap
}
func Constructor() StatisticsTracker {
s := StatisticsTracker{
queue: []int{},
delSmall: map[int]int{},
delLarge: map[int]int{},
freq: map[int]int{},
}
heap.Init(&s.small)
heap.Init(&s.large)
heap.Init(&s.modeHeap)
return s
}
func (this *StatisticsTracker) pruneSmall() {
for this.small.Len() > 0 {
value := -this.small[0]
if this.delSmall[value] > 0 {
this.delSmall[value]--
heap.Pop(&this.small)
} else {
break
}
}
}
func (this *StatisticsTracker) pruneLarge() {
for this.large.Len() > 0 {
value := this.large[0]
if this.delLarge[value] > 0 {
this.delLarge[value]--
heap.Pop(&this.large)
} else {
break
}
}
}
func (this *StatisticsTracker) rebalance() {
if this.largeSize > this.smallSize+1 {
value := heap.Pop(&this.large).(int)
heap.Push(&this.small, -value)
this.largeSize--
this.smallSize++
this.pruneLarge()
} else if this.smallSize > this.largeSize {
value := -heap.Pop(&this.small).(int)
heap.Push(&this.large, value)
this.smallSize--
this.largeSize++
this.pruneSmall()
}
}
func (this *StatisticsTracker) AddNumber(number int) {
this.queue = append(this.queue, number)
this.totalSum += int64(number)
this.count++
if this.large.Len() == 0 || number >= this.large[0] {
heap.Push(&this.large, number)
this.largeSize++
} else {
heap.Push(&this.small, -number)
this.smallSize++
}
this.rebalance()
this.freq[number]++
heap.Push(&this.modeHeap, Pair{
freq: -this.freq[number],
value: number,
})
}
func (this *StatisticsTracker) RemoveFirstAddedNumber() {
number := this.queue[this.head]
this.head++
this.totalSum -= int64(number)
this.count--
this.freq[number]--
if this.large.Len() > 0 && number >= this.large[0] {
this.delLarge[number]++
this.largeSize--
if this.large.Len() > 0 && this.large[0] == number {
this.pruneLarge()
}
} else {
this.delSmall[number]++
this.smallSize--
if this.small.Len() > 0 && -this.small[0] == number {
this.pruneSmall()
}
}
this.rebalance()
}
func (this *StatisticsTracker) GetMean() int {
return int(this.totalSum / int64(this.count))
}
func (this *StatisticsTracker) GetMedian() int {
this.pruneLarge()
return this.large[0]
}
func (this *StatisticsTracker) GetMode() int {
for {
top := this.modeHeap[0]
if -top.freq == this.freq[top.value] {
return top.value
}
heap.Pop(&this.modeHeap)
}
}
/**
* Your StatisticsTracker object will be instantiated and called as such:
* obj := Constructor();
* obj.AddNumber(number);
* obj.RemoveFirstAddedNumber();
* param_3 := obj.GetMean();
* param_4 := obj.GetMedian();
* param_5 := obj.GetMode();
*/
The Go version follows the same algorithmic structure as the Python solution.
A few implementation details differ:
- Go requires explicit heap implementations using
container/heap - The max heap is simulated using negated values
int64is used for the running sum to avoid overflow- The queue is implemented with a slice and head pointer instead of popping from the front
The lazy deletion logic and balancing rules are identical to the Python implementation.
Worked Examples
Example 1
Input:
add 4
add 4
add 2
add 3
State Evolution
| Operation | Queue | Sorted Values | Mean | Median | Mode |
|---|---|---|---|---|---|
| add 4 | [4] | [4] | 4 | 4 | 4 |
| add 4 | [4,4] | [4,4] | 4 | 4 | 4 |
| add 2 | [4,4,2] | [2,4,4] | 3 | 4 | 4 |
| add 3 | [4,4,2,3] | [2,3,4,4] | 3 | 4 | 4 |
Median explanation:
- Even size
- Middle candidates are
3and4 - Larger one is
4
Now remove first added number:
remove 4
Current values:
[4,2,3]
Sorted:
[2,3,4]
Frequencies:
- 2 → 1
- 3 → 1
- 4 → 1
All frequencies tie, so choose the smallest value:
mode = 2
Example 2
State Evolution
| Operation | Queue | Sorted Values | Median | Mode |
|---|---|---|---|---|
| add 9 | [9] | [9] | 9 | 9 |
| add 5 | [9,5] | [5,9] | 9 | 5 |
| remove 9 | [5] | [5] | 5 | 5 |
| add 5 | [5,5] | [5,5] | 5 | 5 |
| add 6 | [5,5,6] | [5,5,6] | 5 | 5 |
| remove 5 | [5,6] | [5,6] | 6 | 5 |
| add 8 | [5,6,8] | [5,6,8] | 6 | 5 |
The median after [5,6] is 6 because the larger middle element is chosen.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(log n) per update/query | Heap operations dominate |
| Space | O(n) | Heaps, queue, and maps store all active elements |
Each insertion and removal performs a constant number of heap operations, each costing O(log n).
The lazy deletion mechanism ensures that each stale heap entry is removed at most once overall, so amortized complexity remains logarithmic.
The data structures collectively store at most O(n) elements.
Test Cases
# Example 1
obj = StatisticsTracker()
obj.addNumber(4)
obj.addNumber(4)
obj.addNumber(2)
obj.addNumber(3)
assert obj.getMean() == 3 # floored mean
assert obj.getMedian() == 4 # larger middle value
assert obj.getMode() == 4 # highest frequency
obj.removeFirstAddedNumber()
assert obj.getMode() == 2 # all freq 1, smallest wins
# Example 2
obj = StatisticsTracker()
obj.addNumber(9)
obj.addNumber(5)
assert obj.getMean() == 7 # floor(14 / 2)
obj.removeFirstAddedNumber()
obj.addNumber(5)
obj.addNumber(6)
obj.removeFirstAddedNumber()
assert obj.getMedian() == 6 # larger middle
obj.addNumber(8)
assert obj.getMode() == 5 # freq tie handling
# Single element
obj = StatisticsTracker()
obj.addNumber(10)
assert obj.getMean() == 10
assert obj.getMedian() == 10
assert obj.getMode() == 10
# All duplicates
obj = StatisticsTracker()
for _ in range(5):
obj.addNumber(7)
assert obj.getMean() == 7
assert obj.getMedian() == 7
assert obj.getMode() == 7
# Increasing sequence
obj = StatisticsTracker()
for x in [1, 2, 3, 4, 5]:
obj.addNumber(x)
assert obj.getMedian() == 3
# Even-sized median
obj = StatisticsTracker()
for x in [1, 2, 3, 4]:
obj.addNumber(x)
assert obj.getMedian() == 3 # larger middle
# Mode tie
obj = StatisticsTracker()
obj.addNumber(5)
obj.addNumber(2)
obj.addNumber(5)
obj.addNumber(2)
assert obj.getMode() == 2 # same freq, smaller wins
# Frequent removals
obj = StatisticsTracker()
for x in [1, 2, 3, 4, 5]:
obj.addNumber(x)
obj.removeFirstAddedNumber()
obj.removeFirstAddedNumber()
assert obj.getMedian() == 4
assert obj.getMode() == 3 # all freq 1, smallest remaining
# Large values
obj = StatisticsTracker()
obj.addNumber(10**9)
obj.addNumber(10**9)
assert obj.getMean() == 10**9
assert obj.getMedian() == 10**9
assert obj.getMode() == 10**9
| Test | Why |
|---|---|
| Example 1 | Validates all core operations |
| Example 2 | Validates removals and even median |
| Single element | Smallest valid structure |
| All duplicates | Heavy frequency concentration |
| Increasing sequence | Normal ordered data |
| Even-sized median | Confirms larger-middle rule |
| Mode tie | Confirms smallest-value tie breaker |
| Frequent removals | Validates lazy deletion correctness |
| Large values | Ensures no overflow issues |
Edge Cases
One important edge case is when all remaining elements have the same frequency. In this situation, the mode is determined entirely by the tie-breaking rule, which requires returning the smallest value. A careless implementation might return whichever value appears first in a heap or hash map iteration. The solution avoids this issue by storing (−frequency, value) in the mode heap, ensuring smaller values win ties automatically.
Another tricky case occurs with even-sized collections. Many median implementations return the smaller middle value or the average of the two middle values. This problem specifically requires the larger middle value. The implementation guarantees this by maintaining the invariant that the large heap always contains the extra element whenever the size is odd or balanced toward the upper half when even.
Lazy deletion is another major source of bugs. Removed elements may still physically exist inside heaps long after deletion. If stale entries are not cleaned before queries or balancing, the returned median or mode can become incorrect. The implementation carefully prunes invalid heap tops whenever necessary, ensuring only active values participate in calculations.
A final edge case involves repeated add and remove operations on the same values. Without correct frequency bookkeeping, stale mode entries could incorrectly dominate the heap. The solution handles this safely because every mode query validates heap entries against the current frequency map before returning a result.