LeetCode 3369 - Design an Array Statistics Tracker

The problem asks us to design a data structure that supports a stream of operations on a dynamic collection of integers. Numbers are added over time, and the oldest inserted number can also be removed.

LeetCode Problem 3369

Difficulty: 🔴 Hard
Topics: Hash Table, Binary Search, Design, Queue, Heap (Priority Queue), Data Stream, Ordered Set

Solution

Problem Understanding

The problem asks us to design a data structure that supports a stream of operations on a dynamic collection of integers. Numbers are added over time, and the oldest inserted number can also be removed. At any point, we must efficiently answer three statistical queries:

  • The floored mean of all current numbers
  • The median of the current numbers
  • The mode of the current numbers

The important detail is that removals always happen in insertion order. This means the structure behaves partially like a queue, because removeFirstAddedNumber() always removes the earliest inserted value that still exists.

The challenge comes from the fact that we must support all operations efficiently under up to 10^5 total calls. A naive implementation that recomputes everything after every update would become too slow.

The mean is straightforward because it only depends on the total sum and count.

The median is more difficult because the collection changes dynamically. We need to efficiently maintain the middle element of a sorted version of the current numbers. The problem specifies that when the size is even, we take the larger middle element. For example:

  • [1, 2, 3, 4]
  • Middle candidates are 2 and 3
  • We return 3

The mode is also tricky because frequencies change dynamically as elements are added and removed. If multiple values have the same highest frequency, we must return the smallest such value.

The constraints tell us that:

  • Values can be as large as 10^9
  • There are up to 10^5 operations total
  • All query operations are guaranteed to happen only when the structure is non-empty

These constraints strongly suggest that:

  • We need logarithmic time updates
  • Full sorting or scanning on every operation is too expensive
  • We need specialized data structures for median and mode maintenance

Several edge cases are important:

  • Repeated values
  • Removing values that currently define the median
  • Multiple valid modes with tie-breaking by smallest value
  • Even-sized arrays where the larger middle value must be chosen
  • Frequent add/remove alternation

A correct solution must handle all of these efficiently.

Approaches

Brute Force Approach

The brute force solution stores all active numbers in a queue or list.

When a number is added, we append it.

When removing, we pop the oldest element.

For each query:

  • Mean is computed by summing all elements
  • Median is computed by sorting the current elements
  • Mode is computed by counting frequencies

This approach is correct because every query directly computes the desired statistic from the current state of the collection.

However, it is too slow:

  • Sorting for every median query costs O(n log n)
  • Rebuilding frequency counts for every mode query costs O(n)
  • Recomputing sums repeatedly is wasteful

With up to 10^5 operations, this approach can easily time out.

Optimal Approach

The key insight is that each statistic can be maintained incrementally.

For the mean:

  • Maintain a running sum
  • Maintain the current size

Then:

$$\text{mean} = \left\lfloor \frac{\text{sum}}{\text{count}} \right\rfloor$$

For the median:

  • Use two heaps
  • A max heap stores the smaller half
  • A min heap stores the larger half
  • Maintain balancing rules so the min heap always contains the median

Since removals are arbitrary relative to heap positions, we use lazy deletion.

For the mode:

  • Maintain a frequency map

  • Maintain a heap ordered by:

  • Highest frequency first

  • Smallest value first for ties

Again, lazy deletion is necessary because frequencies change dynamically.

The queue structure tracks insertion order so we know which value to remove.

Approach Time Complexity Space Complexity Notes
Brute Force O(n log n) per query O(n) Repeated sorting and counting
Optimal O(log n) per update/query O(n) Uses heaps, frequency maps, and lazy deletion

Algorithm Walkthrough

Data Structures

We maintain several structures simultaneously:

  • A queue storing insertion order
  • A running sum
  • Two heaps for median maintenance
  • Frequency maps for lazy deletion
  • A frequency counter for mode tracking
  • A heap for mode queries

Median Maintenance

We use:

  • small, a max heap implemented with negated values
  • large, a min heap

Invariant:

  • large contains the larger half
  • small contains the smaller half
  • len(large) is either equal to len(small) or exactly one larger

This ensures the median is always at the top of large.

Lazy Deletion

Python heaps do not support arbitrary removal efficiently.

When removing a number:

  • Mark it as deleted in a map
  • Physically remove it only when it reaches the top of a heap

This technique keeps all operations logarithmic.

Step-by-Step Algorithm

  1. When adding a number, append it to the queue and update the running sum and count.
  2. Insert the number into one of the median heaps.
  • If it belongs to the smaller half, insert into small
  • Otherwise insert into large
  1. Rebalance the heaps so:
  • large has either equal size or one extra element
  • All elements in small are less than or equal to elements in large
  1. Update the frequency count for the number.
  2. Push the updated (frequency, value) pair into the mode heap.
  • We store (-frequency, value) so the heap behaves like:

  • Highest frequency first

  • Smallest value first

  1. When removing the oldest number:
  • Pop from the queue
  • Update sum and count
  • Decrease frequency
  • Mark the number for lazy deletion in the correct median heap
  1. Rebalance the heaps again after removal.
  2. For getMean(), return sum // count.
  3. For getMedian(), clean invalid heap tops and return the top of large.
  4. For getMode(), repeatedly discard stale heap entries until the heap top matches the current frequency map.

Why it works

The correctness comes from maintaining strict invariants.

For the median:

  • The two heaps always partition the numbers into lower and upper halves
  • The size invariant guarantees the top of large is exactly the required median

For the mode:

  • The frequency map always stores true frequencies
  • Lazy deletion guarantees stale heap entries are ignored
  • Heap ordering guarantees the highest frequency and smallest value are selected

For the mean:

  • The running sum and count always represent the active elements

Together, these invariants ensure every query returns the correct answer.

Python Solution

from collections import deque, defaultdict
import heapq
from typing import Deque, Dict, List

class StatisticsTracker:

    def __init__(self):
        self.queue: Deque[int] = deque()

        self.total_sum = 0
        self.count = 0

        # Median structures
        self.small: List[int] = []   # max heap via negatives
        self.large: List[int] = []   # min heap

        self.del_small: Dict[int, int] = defaultdict(int)
        self.del_large: Dict[int, int] = defaultdict(int)

        self.small_size = 0
        self.large_size = 0

        # Mode structures
        self.freq: Dict[int, int] = defaultdict(int)
        self.mode_heap: List[tuple[int, int]] = []

    def _prune_small(self) -> None:
        while self.small:
            value = -self.small[0]
            if self.del_small[value] > 0:
                self.del_small[value] -= 1
                heapq.heappop(self.small)
            else:
                break

    def _prune_large(self) -> None:
        while self.large:
            value = self.large[0]
            if self.del_large[value] > 0:
                self.del_large[value] -= 1
                heapq.heappop(self.large)
            else:
                break

    def _rebalance(self) -> None:
        if self.large_size > self.small_size + 1:
            value = heapq.heappop(self.large)
            heapq.heappush(self.small, -value)

            self.large_size -= 1
            self.small_size += 1

            self._prune_large()

        elif self.small_size > self.large_size:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

            self.small_size -= 1
            self.large_size += 1

            self._prune_small()

    def addNumber(self, number: int) -> None:
        self.queue.append(number)

        self.total_sum += number
        self.count += 1

        if not self.large or number >= self.large[0]:
            heapq.heappush(self.large, number)
            self.large_size += 1
        else:
            heapq.heappush(self.small, -number)
            self.small_size += 1

        self._rebalance()

        self.freq[number] += 1
        heapq.heappush(
            self.mode_heap,
            (-self.freq[number], number)
        )

    def removeFirstAddedNumber(self) -> None:
        number = self.queue.popleft()

        self.total_sum -= number
        self.count -= 1

        self.freq[number] -= 1

        if self.large and number >= self.large[0]:
            self.del_large[number] += 1
            self.large_size -= 1

            if self.large and self.large[0] == number:
                self._prune_large()
        else:
            self.del_small[number] += 1
            self.small_size -= 1

            if self.small and -self.small[0] == number:
                self._prune_small()

        self._rebalance()

    def getMean(self) -> int:
        return self.total_sum // self.count

    def getMedian(self) -> int:
        self._prune_large()
        return self.large[0]

    def getMode(self) -> int:
        while True:
            freq, value = self.mode_heap[0]

            if -freq == self.freq[value]:
                return value

            heapq.heappop(self.mode_heap)

The implementation separates the three statistical responsibilities cleanly.

The queue preserves insertion order so removals are correct.

The running sum and count make mean queries constant time.

The median logic uses two heaps with balancing rules. The large heap always contains the median at its top because we intentionally maintain one extra element there when the total count is odd.

The lazy deletion maps are essential because heap removal is not efficient otherwise. Instead of removing elements immediately, we mark them invalid and discard them later when they appear at the heap top.

The mode heap stores many historical frequency states. Most entries eventually become stale. The getMode() method repeatedly removes outdated entries until it finds one matching the current frequency map.

Go Solution

package main

import (
	"container/heap"
)

type IntHeap []int

func (h IntHeap) Len() int {
	return len(h)
}

func (h IntHeap) Less(i, j int) bool {
	return h[i] < h[j]
}

func (h IntHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *IntHeap) Push(x interface{}) {
	*h = append(*h, x.(int))
}

func (h *IntHeap) Pop() interface{} {
	old := *h
	n := len(old)
	val := old[n-1]
	*h = old[:n-1]
	return val
}

type Pair struct {
	freq  int
	value int
}

type PairHeap []Pair

func (h PairHeap) Len() int {
	return len(h)
}

func (h PairHeap) Less(i, j int) bool {
	if h[i].freq == h[j].freq {
		return h[i].value < h[j].value
	}
	return h[i].freq < h[j].freq
}

func (h PairHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *PairHeap) Push(x interface{}) {
	*h = append(*h, x.(Pair))
}

func (h *PairHeap) Pop() interface{} {
	old := *h
	n := len(old)
	val := old[n-1]
	*h = old[:n-1]
	return val
}

type StatisticsTracker struct {
	queue []int
	head  int

	totalSum int64
	count    int

	small IntHeap
	large IntHeap

	delSmall map[int]int
	delLarge map[int]int

	smallSize int
	largeSize int

	freq     map[int]int
	modeHeap PairHeap
}

func Constructor() StatisticsTracker {
	s := StatisticsTracker{
		queue:    []int{},
		delSmall: map[int]int{},
		delLarge: map[int]int{},
		freq:     map[int]int{},
	}

	heap.Init(&s.small)
	heap.Init(&s.large)
	heap.Init(&s.modeHeap)

	return s
}

func (this *StatisticsTracker) pruneSmall() {
	for this.small.Len() > 0 {
		value := -this.small[0]

		if this.delSmall[value] > 0 {
			this.delSmall[value]--
			heap.Pop(&this.small)
		} else {
			break
		}
	}
}

func (this *StatisticsTracker) pruneLarge() {
	for this.large.Len() > 0 {
		value := this.large[0]

		if this.delLarge[value] > 0 {
			this.delLarge[value]--
			heap.Pop(&this.large)
		} else {
			break
		}
	}
}

func (this *StatisticsTracker) rebalance() {
	if this.largeSize > this.smallSize+1 {
		value := heap.Pop(&this.large).(int)
		heap.Push(&this.small, -value)

		this.largeSize--
		this.smallSize++

		this.pruneLarge()
	} else if this.smallSize > this.largeSize {
		value := -heap.Pop(&this.small).(int)
		heap.Push(&this.large, value)

		this.smallSize--
		this.largeSize++

		this.pruneSmall()
	}
}

func (this *StatisticsTracker) AddNumber(number int) {
	this.queue = append(this.queue, number)

	this.totalSum += int64(number)
	this.count++

	if this.large.Len() == 0 || number >= this.large[0] {
		heap.Push(&this.large, number)
		this.largeSize++
	} else {
		heap.Push(&this.small, -number)
		this.smallSize++
	}

	this.rebalance()

	this.freq[number]++

	heap.Push(&this.modeHeap, Pair{
		freq:  -this.freq[number],
		value: number,
	})
}

func (this *StatisticsTracker) RemoveFirstAddedNumber() {
	number := this.queue[this.head]
	this.head++

	this.totalSum -= int64(number)
	this.count--

	this.freq[number]--

	if this.large.Len() > 0 && number >= this.large[0] {
		this.delLarge[number]++
		this.largeSize--

		if this.large.Len() > 0 && this.large[0] == number {
			this.pruneLarge()
		}
	} else {
		this.delSmall[number]++
		this.smallSize--

		if this.small.Len() > 0 && -this.small[0] == number {
			this.pruneSmall()
		}
	}

	this.rebalance()
}

func (this *StatisticsTracker) GetMean() int {
	return int(this.totalSum / int64(this.count))
}

func (this *StatisticsTracker) GetMedian() int {
	this.pruneLarge()
	return this.large[0]
}

func (this *StatisticsTracker) GetMode() int {
	for {
		top := this.modeHeap[0]

		if -top.freq == this.freq[top.value] {
			return top.value
		}

		heap.Pop(&this.modeHeap)
	}
}

/**
 * Your StatisticsTracker object will be instantiated and called as such:
 * obj := Constructor();
 * obj.AddNumber(number);
 * obj.RemoveFirstAddedNumber();
 * param_3 := obj.GetMean();
 * param_4 := obj.GetMedian();
 * param_5 := obj.GetMode();
 */

The Go version follows the same algorithmic structure as the Python solution.

A few implementation details differ:

  • Go requires explicit heap implementations using container/heap
  • The max heap is simulated using negated values
  • int64 is used for the running sum to avoid overflow
  • The queue is implemented with a slice and head pointer instead of popping from the front

The lazy deletion logic and balancing rules are identical to the Python implementation.

Worked Examples

Example 1

Input:

add 4
add 4
add 2
add 3

State Evolution

Operation Queue Sorted Values Mean Median Mode
add 4 [4] [4] 4 4 4
add 4 [4,4] [4,4] 4 4 4
add 2 [4,4,2] [2,4,4] 3 4 4
add 3 [4,4,2,3] [2,3,4,4] 3 4 4

Median explanation:

  • Even size
  • Middle candidates are 3 and 4
  • Larger one is 4

Now remove first added number:

remove 4

Current values:

[4,2,3]

Sorted:

[2,3,4]

Frequencies:

  • 2 → 1
  • 3 → 1
  • 4 → 1

All frequencies tie, so choose the smallest value:

mode = 2

Example 2

State Evolution

Operation Queue Sorted Values Median Mode
add 9 [9] [9] 9 9
add 5 [9,5] [5,9] 9 5
remove 9 [5] [5] 5 5
add 5 [5,5] [5,5] 5 5
add 6 [5,5,6] [5,5,6] 5 5
remove 5 [5,6] [5,6] 6 5
add 8 [5,6,8] [5,6,8] 6 5

The median after [5,6] is 6 because the larger middle element is chosen.

Complexity Analysis

Measure Complexity Explanation
Time O(log n) per update/query Heap operations dominate
Space O(n) Heaps, queue, and maps store all active elements

Each insertion and removal performs a constant number of heap operations, each costing O(log n).

The lazy deletion mechanism ensures that each stale heap entry is removed at most once overall, so amortized complexity remains logarithmic.

The data structures collectively store at most O(n) elements.

Test Cases

# Example 1
obj = StatisticsTracker()
obj.addNumber(4)
obj.addNumber(4)
obj.addNumber(2)
obj.addNumber(3)

assert obj.getMean() == 3       # floored mean
assert obj.getMedian() == 4     # larger middle value
assert obj.getMode() == 4       # highest frequency

obj.removeFirstAddedNumber()

assert obj.getMode() == 2       # all freq 1, smallest wins

# Example 2
obj = StatisticsTracker()

obj.addNumber(9)
obj.addNumber(5)

assert obj.getMean() == 7       # floor(14 / 2)

obj.removeFirstAddedNumber()

obj.addNumber(5)
obj.addNumber(6)

obj.removeFirstAddedNumber()

assert obj.getMedian() == 6     # larger middle

obj.addNumber(8)

assert obj.getMode() == 5       # freq tie handling

# Single element
obj = StatisticsTracker()

obj.addNumber(10)

assert obj.getMean() == 10
assert obj.getMedian() == 10
assert obj.getMode() == 10

# All duplicates
obj = StatisticsTracker()

for _ in range(5):
    obj.addNumber(7)

assert obj.getMean() == 7
assert obj.getMedian() == 7
assert obj.getMode() == 7

# Increasing sequence
obj = StatisticsTracker()

for x in [1, 2, 3, 4, 5]:
    obj.addNumber(x)

assert obj.getMedian() == 3

# Even-sized median
obj = StatisticsTracker()

for x in [1, 2, 3, 4]:
    obj.addNumber(x)

assert obj.getMedian() == 3     # larger middle

# Mode tie
obj = StatisticsTracker()

obj.addNumber(5)
obj.addNumber(2)
obj.addNumber(5)
obj.addNumber(2)

assert obj.getMode() == 2       # same freq, smaller wins

# Frequent removals
obj = StatisticsTracker()

for x in [1, 2, 3, 4, 5]:
    obj.addNumber(x)

obj.removeFirstAddedNumber()
obj.removeFirstAddedNumber()

assert obj.getMedian() == 4
assert obj.getMode() == 3       # all freq 1, smallest remaining

# Large values
obj = StatisticsTracker()

obj.addNumber(10**9)
obj.addNumber(10**9)

assert obj.getMean() == 10**9
assert obj.getMedian() == 10**9
assert obj.getMode() == 10**9
Test Why
Example 1 Validates all core operations
Example 2 Validates removals and even median
Single element Smallest valid structure
All duplicates Heavy frequency concentration
Increasing sequence Normal ordered data
Even-sized median Confirms larger-middle rule
Mode tie Confirms smallest-value tie breaker
Frequent removals Validates lazy deletion correctness
Large values Ensures no overflow issues

Edge Cases

One important edge case is when all remaining elements have the same frequency. In this situation, the mode is determined entirely by the tie-breaking rule, which requires returning the smallest value. A careless implementation might return whichever value appears first in a heap or hash map iteration. The solution avoids this issue by storing (−frequency, value) in the mode heap, ensuring smaller values win ties automatically.

Another tricky case occurs with even-sized collections. Many median implementations return the smaller middle value or the average of the two middle values. This problem specifically requires the larger middle value. The implementation guarantees this by maintaining the invariant that the large heap always contains the extra element whenever the size is odd or balanced toward the upper half when even.

Lazy deletion is another major source of bugs. Removed elements may still physically exist inside heaps long after deletion. If stale entries are not cleaned before queries or balancing, the returned median or mode can become incorrect. The implementation carefully prunes invalid heap tops whenever necessary, ensuring only active values participate in calculations.

A final edge case involves repeated add and remove operations on the same values. Without correct frequency bookkeeping, stale mode entries could incorrectly dominate the heap. The solution handles this safely because every mode query validates heap entries against the current frequency map before returning a result.