LeetCode 295 - Find Median from Data Stream
The problem asks us to design a data structure that continuously receives integers from a stream and can efficiently return the median of all numbers seen so far. The median is defined as the middle element in a sorted list.
Difficulty: 🔴 Hard
Topics: Two Pointers, Design, Sorting, Heap (Priority Queue), Data Stream
Solution
Problem Understanding
The problem asks us to design a data structure that continuously receives integers from a stream and can efficiently return the median of all numbers seen so far.
The median is defined as the middle element in a sorted list. If the number of elements is odd, the median is the single middle value. If the number of elements is even, the median is the average of the two middle values.
For example:
- In
[2, 3, 4], the median is3 - In
[2, 3], the median is(2 + 3) / 2 = 2.5
The key challenge is that numbers arrive dynamically. We are not given the entire array upfront. Instead, we must support two operations repeatedly:
addNum(num)inserts a new number into the data structurefindMedian()returns the current median at any moment
The constraints are important:
- Up to
5 * 10^4operations may occur - Numbers range from
-10^5to10^5
These limits immediately rule out inefficient approaches that repeatedly sort the entire collection after every insertion. Even though 50,000 operations is not enormous, repeatedly sorting would create unnecessary overhead.
The problem guarantees that findMedian() is never called on an empty structure, so we do not need to handle that invalid case.
Several edge cases are important:
- The stream may contain negative numbers
- Duplicate values may appear frequently
- The stream may alternate between large and small numbers
- The number of elements may constantly switch between odd and even
- The median may be a floating-point value
A correct implementation must efficiently maintain ordering information as numbers are inserted.
Approaches
Brute Force Approach
The most straightforward solution is to store all inserted numbers in a list. Every time findMedian() is called, we sort the list and compute the middle value.
This works because sorting guarantees that the middle elements are positioned correctly. Once the list is sorted:
- If the length is odd, the median is the center element
- If the length is even, the median is the average of the two center elements
However, this approach is inefficient. Sorting requires O(n log n) time every time we request the median. Since findMedian() may be called many times, repeated sorting becomes expensive.
An alternative brute-force variation inserts numbers into a sorted list using binary insertion. This improves median lookup but insertion still costs O(n) because shifting elements in an array is expensive.
The main issue is that we need efficient insertion and efficient median retrieval simultaneously.
Optimal Approach, Two Heaps
The key observation is that the median only depends on the middle portion of the data, not the full sorted order.
We can divide the numbers into two halves:
- A max heap stores the smaller half
- A min heap stores the larger half
This structure gives direct access to the largest value in the lower half and the smallest value in the upper half.
The heaps maintain these properties:
- Every value in the max heap is less than or equal to every value in the min heap
- The heap sizes differ by at most one
- The max heap may contain one extra element when the total count is odd
With these invariants:
- If the total count is odd, the median is the top of the max heap
- If the total count is even, the median is the average of both heap tops
This approach is efficient because heap insertion and balancing each take O(log n) time, while median lookup becomes O(1).
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) per median query |
O(n) |
Sorts the full collection repeatedly |
| Optimal | O(log n) insertion, O(1) median lookup |
O(n) |
Uses two heaps to maintain balanced halves |
Algorithm Walkthrough
- Create two heaps.
The first heap represents the smaller half of the numbers. Since Python only provides a min heap, we simulate a max heap by storing negative values.
The second heap represents the larger half and uses a normal min heap. 2. Insert the new number into the max heap first.
We initially assume the new number belongs to the lower half. In Python, this means pushing -num into the max heap.
3. Restore ordering between the heaps.
The largest number in the lower half must never exceed the smallest number in the upper half.
After inserting into the max heap, move its largest element into the min heap. This guarantees ordering correctness. 4. Rebalance heap sizes if necessary.
The max heap is allowed to contain at most one more element than the min heap.
If the min heap becomes larger, move its smallest element back into the max heap. 5. Compute the median.
If the heaps have equal size, the median is the average of both heap tops.
If the max heap has one extra element, its top is the median.
Why it works
The algorithm maintains two critical invariants:
- All values in the max heap are less than or equal to all values in the min heap
- The heap sizes differ by at most one
Because of these properties, the middle element or middle pair is always located at the heap tops. This guarantees that median computation is always correct without sorting the full dataset.
Python Solution
import heapq
from typing import List
class MedianFinder:
def __init__(self):
self.small = [] # max heap simulated with negative values
self.large = [] # min heap
def addNum(self, num: int) -> None:
# Step 1: push into max heap
heapq.heappush(self.small, -num)
# Step 2: move the largest from small into large
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
# Step 3: rebalance sizes if needed
if len(self.large) > len(self.small):
value = heapq.heappop(self.large)
heapq.heappush(self.small, -value)
def findMedian(self) -> float:
# Odd number of elements
if len(self.small) > len(self.large):
return float(-self.small[0])
# Even number of elements
return (-self.small[0] + self.large[0]) / 2.0
# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()
The implementation uses two heaps named small and large.
The small heap stores the lower half of the numbers. Since Python's heapq only supports min heaps, negative values are stored to simulate max heap behavior. This allows the largest value in the lower half to remain accessible at index 0.
The large heap stores the upper half using normal min heap behavior.
Inside addNum, every new value is first pushed into small. Immediately afterward, the largest value from small is moved into large. This ensures that all elements in small remain less than or equal to those in large.
The balancing step guarantees that small either has the same number of elements as large or exactly one more.
The findMedian method directly uses the heap tops:
- If the total count is odd, the median comes from
small - If the total count is even, the median is the average of both heap tops
This design avoids repeated sorting while maintaining efficient insertion.
Go Solution
package main
import (
"container/heap"
)
type MinHeap []int
func (h MinHeap) Len() int {
return len(h)
}
func (h MinHeap) Less(i, j int) bool {
return h[i] < h[j]
}
func (h MinHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *MinHeap) Push(x interface{}) {
*h = append(*h, x.(int))
}
func (h *MinHeap) Pop() interface{} {
old := *h
n := len(old)
value := old[n-1]
*h = old[:n-1]
return value
}
type MaxHeap []int
func (h MaxHeap) Len() int {
return len(h)
}
func (h MaxHeap) Less(i, j int) bool {
return h[i] > h[j]
}
func (h MaxHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *MaxHeap) Push(x interface{}) {
*h = append(*h, x.(int))
}
func (h *MaxHeap) Pop() interface{} {
old := *h
n := len(old)
value := old[n-1]
*h = old[:n-1]
return value
}
type MedianFinder struct {
small *MaxHeap
large *MinHeap
}
func Constructor() MedianFinder {
small := &MaxHeap{}
large := &MinHeap{}
heap.Init(small)
heap.Init(large)
return MedianFinder{
small: small,
large: large,
}
}
func (this *MedianFinder) AddNum(num int) {
heap.Push(this.small, num)
value := heap.Pop(this.small).(int)
heap.Push(this.large, value)
if this.large.Len() > this.small.Len() {
value := heap.Pop(this.large).(int)
heap.Push(this.small, value)
}
}
func (this *MedianFinder) FindMedian() float64 {
if this.small.Len() > this.large.Len() {
return float64((*this.small)[0])
}
left := (*this.small)[0]
right := (*this.large)[0]
return float64(left+right) / 2.0
}
/**
* Your MedianFinder object will be instantiated and called as such:
* obj := Constructor();
* obj.AddNum(num);
* param_2 := obj.FindMedian();
*/
The Go solution follows the same logical structure as the Python version but requires explicit heap implementations because Go's container/heap package relies on interfaces.
Separate MinHeap and MaxHeap types are defined. The only meaningful difference is the Less method:
MinHeapuses standard ascending comparisonMaxHeapreverses comparison to simulate max heap behavior
Unlike Python, Go does not support direct heap operations on primitive slices without implementing the required interface methods.
The median calculation uses float64 conversion to avoid integer division.
Worked Examples
Example 1
Input sequence:
addNum(1)
addNum(2)
findMedian()
addNum(3)
findMedian()
| Operation | Max Heap (small) | Min Heap (large) | Median |
|---|---|---|---|
| addNum(1) | [1] | [] | 1 |
| addNum(2) | [1] | [2] | 1.5 |
| findMedian() | [1] | [2] | 1.5 |
| addNum(3) | [2, 1] | [3] | 2 |
| findMedian() | [2, 1] | [3] | 2 |
Detailed walkthrough:
- Insert
1
small = [1]large = []- Odd count, median is
1
- Insert
2
- Lower half contains
1 - Upper half contains
2 - Even count, median is
(1 + 2) / 2 = 1.5
- Insert
3
- Rebalancing moves values appropriately
smallcontains[2, 1]largecontains[3]- Odd count, median is
2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(log n) for addNum, O(1) for findMedian |
Heap insertion and balancing dominate insertion cost |
| Space | O(n) |
All inserted elements are stored in the heaps |
Each insertion performs a constant number of heap operations. Since heap push and pop each require O(log n) time, insertion complexity is logarithmic.
Median retrieval simply reads heap tops, which is constant time.
The space complexity is linear because every inserted number remains stored.
Test Cases
# Example case
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
assert mf.findMedian() == 1.5 # even number of elements
mf.addNum(3)
assert mf.findMedian() == 2.0 # odd number of elements
# Single element
mf = MedianFinder()
mf.addNum(10)
assert mf.findMedian() == 10.0 # single value median
# Negative numbers
mf = MedianFinder()
mf.addNum(-1)
mf.addNum(-2)
mf.addNum(-3)
assert mf.findMedian() == -2.0 # negative values
# Duplicate numbers
mf = MedianFinder()
mf.addNum(5)
mf.addNum(5)
mf.addNum(5)
assert mf.findMedian() == 5.0 # duplicates
# Even count
mf = MedianFinder()
mf.addNum(1)
mf.addNum(4)
mf.addNum(7)
mf.addNum(10)
assert mf.findMedian() == 5.5 # average of middle pair
# Alternating high and low values
mf = MedianFinder()
mf.addNum(1000)
mf.addNum(-1000)
mf.addNum(999)
mf.addNum(-999)
assert mf.findMedian() == 0.0 # balanced extremes
# Increasing sequence
mf = MedianFinder()
for i in range(1, 6):
mf.addNum(i)
assert mf.findMedian() == 3.0 # sorted ascending input
# Decreasing sequence
mf = MedianFinder()
for i in range(5, 0, -1):
mf.addNum(i)
assert mf.findMedian() == 3.0 # sorted descending input
# Large range values
mf = MedianFinder()
mf.addNum(-100000)
mf.addNum(100000)
assert mf.findMedian() == 0.0 # boundary constraints
Test Summary
| Test | Why |
|---|---|
| Basic example | Verifies standard functionality |
| Single element | Checks smallest valid input |
| Negative numbers | Ensures ordering works with negatives |
| Duplicate numbers | Verifies repeated values |
| Even count | Tests averaging behavior |
| Alternating extremes | Stresses heap balancing |
| Increasing sequence | Tests monotonic insertion |
| Decreasing sequence | Tests reverse-order insertion |
| Boundary values | Validates constraint extremes |
Edge Cases
One important edge case occurs when there is only a single element in the data structure. A buggy implementation might incorrectly attempt to average two values or access an empty heap. This solution handles the case naturally because the max heap contains one element while the min heap remains empty, so the median is simply the top of the max heap.
Another important case involves duplicate values. Some balancing strategies accidentally rely on strict inequalities and can misplace equal values between heaps. This implementation avoids that issue because it only enforces that all elements in the lower half are less than or equal to those in the upper half. Equal values work correctly without special handling.
A third critical edge case is alternating large and small numbers, such as 1000, -1000, 999, -999. Poor implementations can become unbalanced or violate heap ordering when values fluctuate dramatically. The rebalance step guarantees that heap sizes remain valid after every insertion, while the transfer between heaps preserves ordering correctness.
A final edge case involves even-sized collections where the median is fractional. Integer division bugs are common in languages like Go and Java. Both implementations explicitly return floating-point results, ensuring correct values such as 1.5 instead of truncated integers like 1.