LeetCode 295 - Find Median from Data Stream

The problem asks us to design a data structure that continuously receives integers from a stream and can efficiently return the median of all numbers seen so far. The median is defined as the middle element in a sorted list.

LeetCode Problem 295

Difficulty: 🔴 Hard
Topics: Two Pointers, Design, Sorting, Heap (Priority Queue), Data Stream

Solution

Problem Understanding

The problem asks us to design a data structure that continuously receives integers from a stream and can efficiently return the median of all numbers seen so far.

The median is defined as the middle element in a sorted list. If the number of elements is odd, the median is the single middle value. If the number of elements is even, the median is the average of the two middle values.

For example:

  • In [2, 3, 4], the median is 3
  • In [2, 3], the median is (2 + 3) / 2 = 2.5

The key challenge is that numbers arrive dynamically. We are not given the entire array upfront. Instead, we must support two operations repeatedly:

  • addNum(num) inserts a new number into the data structure
  • findMedian() returns the current median at any moment

The constraints are important:

  • Up to 5 * 10^4 operations may occur
  • Numbers range from -10^5 to 10^5

These limits immediately rule out inefficient approaches that repeatedly sort the entire collection after every insertion. Even though 50,000 operations is not enormous, repeatedly sorting would create unnecessary overhead.

The problem guarantees that findMedian() is never called on an empty structure, so we do not need to handle that invalid case.

Several edge cases are important:

  • The stream may contain negative numbers
  • Duplicate values may appear frequently
  • The stream may alternate between large and small numbers
  • The number of elements may constantly switch between odd and even
  • The median may be a floating-point value

A correct implementation must efficiently maintain ordering information as numbers are inserted.

Approaches

Brute Force Approach

The most straightforward solution is to store all inserted numbers in a list. Every time findMedian() is called, we sort the list and compute the middle value.

This works because sorting guarantees that the middle elements are positioned correctly. Once the list is sorted:

  • If the length is odd, the median is the center element
  • If the length is even, the median is the average of the two center elements

However, this approach is inefficient. Sorting requires O(n log n) time every time we request the median. Since findMedian() may be called many times, repeated sorting becomes expensive.

An alternative brute-force variation inserts numbers into a sorted list using binary insertion. This improves median lookup but insertion still costs O(n) because shifting elements in an array is expensive.

The main issue is that we need efficient insertion and efficient median retrieval simultaneously.

Optimal Approach, Two Heaps

The key observation is that the median only depends on the middle portion of the data, not the full sorted order.

We can divide the numbers into two halves:

  • A max heap stores the smaller half
  • A min heap stores the larger half

This structure gives direct access to the largest value in the lower half and the smallest value in the upper half.

The heaps maintain these properties:

  • Every value in the max heap is less than or equal to every value in the min heap
  • The heap sizes differ by at most one
  • The max heap may contain one extra element when the total count is odd

With these invariants:

  • If the total count is odd, the median is the top of the max heap
  • If the total count is even, the median is the average of both heap tops

This approach is efficient because heap insertion and balancing each take O(log n) time, while median lookup becomes O(1).

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n log n) per median query O(n) Sorts the full collection repeatedly
Optimal O(log n) insertion, O(1) median lookup O(n) Uses two heaps to maintain balanced halves

Algorithm Walkthrough

  1. Create two heaps.

The first heap represents the smaller half of the numbers. Since Python only provides a min heap, we simulate a max heap by storing negative values.

The second heap represents the larger half and uses a normal min heap. 2. Insert the new number into the max heap first.

We initially assume the new number belongs to the lower half. In Python, this means pushing -num into the max heap. 3. Restore ordering between the heaps.

The largest number in the lower half must never exceed the smallest number in the upper half.

After inserting into the max heap, move its largest element into the min heap. This guarantees ordering correctness. 4. Rebalance heap sizes if necessary.

The max heap is allowed to contain at most one more element than the min heap.

If the min heap becomes larger, move its smallest element back into the max heap. 5. Compute the median.

If the heaps have equal size, the median is the average of both heap tops.

If the max heap has one extra element, its top is the median.

Why it works

The algorithm maintains two critical invariants:

  • All values in the max heap are less than or equal to all values in the min heap
  • The heap sizes differ by at most one

Because of these properties, the middle element or middle pair is always located at the heap tops. This guarantees that median computation is always correct without sorting the full dataset.

Python Solution

import heapq
from typing import List

class MedianFinder:

    def __init__(self):
        self.small = []  # max heap simulated with negative values
        self.large = []  # min heap

    def addNum(self, num: int) -> None:
        # Step 1: push into max heap
        heapq.heappush(self.small, -num)

        # Step 2: move the largest from small into large
        value = -heapq.heappop(self.small)
        heapq.heappush(self.large, value)

        # Step 3: rebalance sizes if needed
        if len(self.large) > len(self.small):
            value = heapq.heappop(self.large)
            heapq.heappush(self.small, -value)

    def findMedian(self) -> float:
        # Odd number of elements
        if len(self.small) > len(self.large):
            return float(-self.small[0])

        # Even number of elements
        return (-self.small[0] + self.large[0]) / 2.0

# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()

The implementation uses two heaps named small and large.

The small heap stores the lower half of the numbers. Since Python's heapq only supports min heaps, negative values are stored to simulate max heap behavior. This allows the largest value in the lower half to remain accessible at index 0.

The large heap stores the upper half using normal min heap behavior.

Inside addNum, every new value is first pushed into small. Immediately afterward, the largest value from small is moved into large. This ensures that all elements in small remain less than or equal to those in large.

The balancing step guarantees that small either has the same number of elements as large or exactly one more.

The findMedian method directly uses the heap tops:

  • If the total count is odd, the median comes from small
  • If the total count is even, the median is the average of both heap tops

This design avoids repeated sorting while maintaining efficient insertion.

Go Solution

package main

import (
	"container/heap"
)

type MinHeap []int

func (h MinHeap) Len() int {
	return len(h)
}

func (h MinHeap) Less(i, j int) bool {
	return h[i] < h[j]
}

func (h MinHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MinHeap) Push(x interface{}) {
	*h = append(*h, x.(int))
}

func (h *MinHeap) Pop() interface{} {
	old := *h
	n := len(old)
	value := old[n-1]
	*h = old[:n-1]
	return value
}

type MaxHeap []int

func (h MaxHeap) Len() int {
	return len(h)
}

func (h MaxHeap) Less(i, j int) bool {
	return h[i] > h[j]
}

func (h MaxHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MaxHeap) Push(x interface{}) {
	*h = append(*h, x.(int))
}

func (h *MaxHeap) Pop() interface{} {
	old := *h
	n := len(old)
	value := old[n-1]
	*h = old[:n-1]
	return value
}

type MedianFinder struct {
	small *MaxHeap
	large *MinHeap
}

func Constructor() MedianFinder {
	small := &MaxHeap{}
	large := &MinHeap{}

	heap.Init(small)
	heap.Init(large)

	return MedianFinder{
		small: small,
		large: large,
	}
}

func (this *MedianFinder) AddNum(num int) {
	heap.Push(this.small, num)

	value := heap.Pop(this.small).(int)
	heap.Push(this.large, value)

	if this.large.Len() > this.small.Len() {
		value := heap.Pop(this.large).(int)
		heap.Push(this.small, value)
	}
}

func (this *MedianFinder) FindMedian() float64 {
	if this.small.Len() > this.large.Len() {
		return float64((*this.small)[0])
	}

	left := (*this.small)[0]
	right := (*this.large)[0]

	return float64(left+right) / 2.0
}

/**
 * Your MedianFinder object will be instantiated and called as such:
 * obj := Constructor();
 * obj.AddNum(num);
 * param_2 := obj.FindMedian();
 */

The Go solution follows the same logical structure as the Python version but requires explicit heap implementations because Go's container/heap package relies on interfaces.

Separate MinHeap and MaxHeap types are defined. The only meaningful difference is the Less method:

  • MinHeap uses standard ascending comparison
  • MaxHeap reverses comparison to simulate max heap behavior

Unlike Python, Go does not support direct heap operations on primitive slices without implementing the required interface methods.

The median calculation uses float64 conversion to avoid integer division.

Worked Examples

Example 1

Input sequence:

addNum(1)
addNum(2)
findMedian()
addNum(3)
findMedian()
Operation Max Heap (small) Min Heap (large) Median
addNum(1) [1] [] 1
addNum(2) [1] [2] 1.5
findMedian() [1] [2] 1.5
addNum(3) [2, 1] [3] 2
findMedian() [2, 1] [3] 2

Detailed walkthrough:

  1. Insert 1
  • small = [1]
  • large = []
  • Odd count, median is 1
  1. Insert 2
  • Lower half contains 1
  • Upper half contains 2
  • Even count, median is (1 + 2) / 2 = 1.5
  1. Insert 3
  • Rebalancing moves values appropriately
  • small contains [2, 1]
  • large contains [3]
  • Odd count, median is 2

Complexity Analysis

Measure Complexity Explanation
Time O(log n) for addNum, O(1) for findMedian Heap insertion and balancing dominate insertion cost
Space O(n) All inserted elements are stored in the heaps

Each insertion performs a constant number of heap operations. Since heap push and pop each require O(log n) time, insertion complexity is logarithmic.

Median retrieval simply reads heap tops, which is constant time.

The space complexity is linear because every inserted number remains stored.

Test Cases

# Example case
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
assert mf.findMedian() == 1.5  # even number of elements
mf.addNum(3)
assert mf.findMedian() == 2.0  # odd number of elements

# Single element
mf = MedianFinder()
mf.addNum(10)
assert mf.findMedian() == 10.0  # single value median

# Negative numbers
mf = MedianFinder()
mf.addNum(-1)
mf.addNum(-2)
mf.addNum(-3)
assert mf.findMedian() == -2.0  # negative values

# Duplicate numbers
mf = MedianFinder()
mf.addNum(5)
mf.addNum(5)
mf.addNum(5)
assert mf.findMedian() == 5.0  # duplicates

# Even count
mf = MedianFinder()
mf.addNum(1)
mf.addNum(4)
mf.addNum(7)
mf.addNum(10)
assert mf.findMedian() == 5.5  # average of middle pair

# Alternating high and low values
mf = MedianFinder()
mf.addNum(1000)
mf.addNum(-1000)
mf.addNum(999)
mf.addNum(-999)
assert mf.findMedian() == 0.0  # balanced extremes

# Increasing sequence
mf = MedianFinder()
for i in range(1, 6):
    mf.addNum(i)
assert mf.findMedian() == 3.0  # sorted ascending input

# Decreasing sequence
mf = MedianFinder()
for i in range(5, 0, -1):
    mf.addNum(i)
assert mf.findMedian() == 3.0  # sorted descending input

# Large range values
mf = MedianFinder()
mf.addNum(-100000)
mf.addNum(100000)
assert mf.findMedian() == 0.0  # boundary constraints

Test Summary

Test Why
Basic example Verifies standard functionality
Single element Checks smallest valid input
Negative numbers Ensures ordering works with negatives
Duplicate numbers Verifies repeated values
Even count Tests averaging behavior
Alternating extremes Stresses heap balancing
Increasing sequence Tests monotonic insertion
Decreasing sequence Tests reverse-order insertion
Boundary values Validates constraint extremes

Edge Cases

One important edge case occurs when there is only a single element in the data structure. A buggy implementation might incorrectly attempt to average two values or access an empty heap. This solution handles the case naturally because the max heap contains one element while the min heap remains empty, so the median is simply the top of the max heap.

Another important case involves duplicate values. Some balancing strategies accidentally rely on strict inequalities and can misplace equal values between heaps. This implementation avoids that issue because it only enforces that all elements in the lower half are less than or equal to those in the upper half. Equal values work correctly without special handling.

A third critical edge case is alternating large and small numbers, such as 1000, -1000, 999, -999. Poor implementations can become unbalanced or violate heap ordering when values fluctuate dramatically. The rebalance step guarantees that heap sizes remain valid after every insertion, while the transfer between heaps preserves ordering correctness.

A final edge case involves even-sized collections where the median is fractional. Integer division bugs are common in languages like Go and Java. Both implementations explicitly return floating-point results, ensuring correct values such as 1.5 instead of truncated integers like 1.