LeetCode 4 - Median of Two Sorted Arrays
The problem gives two individually sorted integer arrays, nums1 and nums2, with lengths m and n. The task is to compute the median of the combined sorted sequence formed by merging both arrays. The median is the middle value of a sorted sequence.
Difficulty: 🔴 Hard
Topics: Array, Binary Search, Divide and Conquer
Solution
Median of Two Sorted Arrays
Problem Understanding
The problem gives two individually sorted integer arrays, nums1 and nums2, with lengths m and n. The task is to compute the median of the combined sorted sequence formed by merging both arrays.
The median is the middle value of a sorted sequence. If the total number of elements is odd, the median is the single middle element. If the total number of elements is even, the median is the average of the two middle elements.
For example, if the merged array is [1, 2, 3], the median is 2. If the merged array is [1, 2, 3, 4], the median is (2 + 3) / 2 = 2.5.
A very important detail is the required time complexity: O(log(m + n)). This immediately rules out the straightforward merge-based approach, because fully merging the arrays takes linear time. The problem is specifically testing whether we can exploit the fact that both arrays are already sorted.
The constraints tell us several useful things:
- Either array may be empty.
- The combined length is always at least 1.
- Both arrays are already sorted in non-decreasing order.
- Values may be negative or positive.
- The total input size is relatively small here, but the required asymptotic complexity still matters because this is fundamentally an algorithm design problem.
Several edge cases are especially important:
- One array may be completely empty.
- One array may contain all smaller values than the other.
- Arrays may have very different lengths.
- Arrays may contain duplicate values.
- The total number of elements may be odd or even.
These situations often break incorrect partition logic or boundary handling in binary search solutions.
Approaches
Brute Force Approach
The most direct solution is to merge both sorted arrays into a single sorted array, exactly like the merge step in merge sort.
Because both arrays are already sorted, we can maintain two pointers and repeatedly append the smaller current value into a merged array. Once merging is complete, we compute the median from the merged result.
This approach is easy to understand and guaranteed to work correctly because the merged array is fully sorted.
However, the time complexity is O(m + n), which violates the required logarithmic runtime constraint. Even though the constraints are not extremely large, the problem explicitly asks for a more efficient algorithm.
Optimal Approach
The key insight is that we do not actually need to fully merge the arrays. We only need enough information to determine the middle element or elements.
Instead of merging, we can think about partitioning both arrays into left and right halves such that:
- Every value on the left side is less than or equal to every value on the right side.
- The left half contains exactly half of the total elements.
If we can find such a partition, then:
- For an odd total length, the median is the maximum value from the left side.
- For an even total length, the median is the average of the maximum left value and minimum right value.
Because the arrays are sorted, we can binary search the partition position in the smaller array. This reduces the runtime to O(log(min(m, n))).
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(m + n) | O(m + n) | Merge both arrays completely, then compute median |
| Optimal | O(log(min(m, n))) | O(1) | Binary search partition position without merging |
Algorithm Walkthrough
Optimal Binary Search Partition Algorithm
- Always perform binary search on the smaller array.
This is important because binary search complexity depends on the search space size. Swapping the arrays when necessary guarantees the logarithmic complexity remains minimal.
2. Let the smaller array be nums1 and the larger array be nums2.
Suppose we choose a partition index partition1 in nums1. Then we can derive the corresponding partition in nums2 as:
$$partition2 = \frac{m + n + 1}{2} - partition1$$
This guarantees that the total number of elements on the left side equals half of the combined length. 3. Determine the four boundary values around the partitions.
We need:
- Left value from
nums1 - Right value from
nums1 - Left value from
nums2 - Right value from
nums2
Conceptually:
nums1: [ left1 | right1 ]
nums2: [ left2 | right2 ]
If a partition is at the edge of an array, we use negative infinity or positive infinity so comparisons still work cleanly. 4. Check whether the partition is valid.
The partition is correct if:
$$left1 \le right2$$
and
$$left2 \le right1$$
This condition guarantees every element on the left side is less than or equal to every element on the right side. 5. If the partition is valid, compute the median.
If the total length is odd:
$$median = \max(left1, left2)$$
If the total length is even:
$$median = \frac{\max(left1, left2) + \min(right1, right2)}{2}$$ 6. If the partition is not valid, adjust the binary search.
- If
left1 > right2, then we moved too far right innums1, so move left. - Otherwise, move right.
Why it works
The algorithm maintains the invariant that the left partition always contains exactly half of the total elements. Binary search adjusts the partition until all left-side elements are less than or equal to all right-side elements.
Because both arrays are sorted, once this partition condition is satisfied, the median must lie at the boundary between the left and right partitions. The maximum left value and minimum right value completely determine the median.
Python Solution
from typing import List
class Solution:
def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
# Ensure nums1 is the smaller array
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1
m, n = len(nums1), len(nums2)
left = 0
right = m
while left <= right:
partition1 = (left + right) // 2
partition2 = (m + n + 1) // 2 - partition1
max_left1 = float("-inf") if partition1 == 0 else nums1[partition1 - 1]
min_right1 = float("inf") if partition1 == m else nums1[partition1]
max_left2 = float("-inf") if partition2 == 0 else nums2[partition2 - 1]
min_right2 = float("inf") if partition2 == n else nums2[partition2]
if max_left1 <= min_right2 and max_left2 <= min_right1:
if (m + n) % 2 == 1:
return float(max(max_left1, max_left2))
return (
max(max_left1, max_left2)
+ min(min_right1, min_right2)
) / 2.0
elif max_left1 > min_right2:
right = partition1 - 1
else:
left = partition1 + 1
return 0.0
The implementation begins by ensuring that nums1 is the smaller array. This guarantees the binary search operates on the smaller search space.
The binary search range spans all possible partition positions in nums1, from 0 to m.
At each iteration, the algorithm computes the partition indices for both arrays. The partition in nums2 is derived automatically so that the left side always contains half of the total elements.
The four boundary variables represent the values directly adjacent to the partitions. Using infinities for out-of-bounds situations avoids complicated special-case logic.
The main condition checks whether the partition is valid. If it is, the algorithm computes the median based on whether the total length is odd or even.
If the partition is invalid, binary search narrows the search space appropriately.
The final return 0.0 is technically unreachable because the problem guarantees valid input.
Go Solution
import "math"
func findMedianSortedArrays(nums1 []int, nums2 []int) float64 {
if len(nums1) > len(nums2) {
return findMedianSortedArrays(nums2, nums1)
}
m := len(nums1)
n := len(nums2)
left := 0
right := m
for left <= right {
partition1 := (left + right) / 2
partition2 := (m + n + 1) / 2 - partition1
maxLeft1 := math.Inf(-1)
if partition1 > 0 {
maxLeft1 = float64(nums1[partition1-1])
}
minRight1 := math.Inf(1)
if partition1 < m {
minRight1 = float64(nums1[partition1])
}
maxLeft2 := math.Inf(-1)
if partition2 > 0 {
maxLeft2 = float64(nums2[partition2-1])
}
minRight2 := math.Inf(1)
if partition2 < n {
minRight2 = float64(nums2[partition2])
}
if maxLeft1 <= minRight2 && maxLeft2 <= minRight1 {
if (m+n)%2 == 1 {
if maxLeft1 > maxLeft2 {
return maxLeft1
}
return maxLeft2
}
leftMax := math.Max(maxLeft1, maxLeft2)
rightMin := math.Min(minRight1, minRight2)
return (leftMax + rightMin) / 2.0
}
if maxLeft1 > minRight2 {
right = partition1 - 1
} else {
left = partition1 + 1
}
}
return 0.0
}
The Go implementation follows exactly the same algorithmic structure as the Python version.
One important difference is handling infinities. Go uses math.Inf(-1) and math.Inf(1) because Go does not have Python-style float("inf").
Another difference is explicit conversion from int to float64 before performing comparisons and median calculations.
Slices in Go naturally handle empty arrays, so no additional nil handling is required.
Worked Examples
Example 1
nums1 = [1, 3]
nums2 = [2]
Because binary search should operate on the smaller array, we swap them:
nums1 = [2]
nums2 = [1, 3]
Total length:
m = 1
n = 2
total = 3
We binary search on nums1.
| Step | partition1 | partition2 | maxLeft1 | minRight1 | maxLeft2 | minRight2 | Valid? |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 2 | -inf | 2 | 3 | inf | No |
| 2 | 1 | 1 | 2 | inf | 1 | 3 | Yes |
Now:
max(2, 1) = 2
Since the total length is odd:
median = 2
Example 2
nums1 = [1, 2]
nums2 = [3, 4]
m = 2
n = 2
total = 4
| Step | partition1 | partition2 | maxLeft1 | minRight1 | maxLeft2 | minRight2 | Valid? |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 2 | 3 | 4 | No |
| 2 | 2 | 0 | 2 | inf | -inf | 3 | Yes |
Now:
left_max = max(2, -inf) = 2
right_min = min(inf, 3) = 3
Since the total length is even:
median = (2 + 3) / 2 = 2.5
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(log(min(m, n))) | Binary search runs only on the smaller array |
| Space | O(1) | Only a constant number of variables are used |
The binary search repeatedly halves the search space in the smaller array, which gives logarithmic runtime. No auxiliary arrays or additional data structures are created, so the algorithm uses constant extra memory.
Test Cases
sol = Solution()
assert sol.findMedianSortedArrays([1, 3], [2]) == 2.0
# Odd total length
assert sol.findMedianSortedArrays([1, 2], [3, 4]) == 2.5
# Even total length
assert sol.findMedianSortedArrays([], [1]) == 1.0
# First array empty
assert sol.findMedianSortedArrays([2], []) == 2.0
# Second array empty
assert sol.findMedianSortedArrays([0, 0], [0, 0]) == 0.0
# Duplicate values
assert sol.findMedianSortedArrays([1], [2, 3, 4, 5, 6]) == 3.5
# Very different array sizes
assert sol.findMedianSortedArrays([1, 2, 3], [4, 5, 6, 7]) == 4.0
# Odd total after merge
assert sol.findMedianSortedArrays([-5, -3, -1], [-2, 0, 2]) == -1.5
# Negative numbers
assert sol.findMedianSortedArrays([1, 1, 1], [1, 1, 1]) == 1.0
# All identical values
assert sol.findMedianSortedArrays([1000000], [-1000000]) == 0.0
# Extreme constraint values
Test Case Summary
| Test | Why |
|---|---|
[1,3] and [2] |
Validates odd total length |
[1,2] and [3,4] |
Validates even total length |
[] and [1] |
Tests empty first array |
[2] and [] |
Tests empty second array |
[0,0] and [0,0] |
Tests duplicates |
[1] and [2,3,4,5,6] |
Tests highly unbalanced sizes |
[1,2,3] and [4,5,6,7] |
Tests ordered non-overlapping arrays |
[-5,-3,-1] and [-2,0,2] |
Tests negative values |
[1,1,1] and [1,1,1] |
Tests repeated identical values |
[1000000] and [-1000000] |
Tests extreme values |
Edge Cases
One important edge case occurs when one array is empty. Many implementations accidentally access invalid indices during partition calculations. This solution handles the situation cleanly using negative infinity and positive infinity as boundary values. If a partition falls entirely outside an array, comparisons still behave correctly.
Another tricky case occurs when all elements of one array are smaller than all elements of the other array. For example:
nums1 = [1, 2]
nums2 = [100, 101]
Incorrect partition logic can fail here because the valid partition lies at the extreme edge of one array. The implementation explicitly supports partitions at index 0 and index m, which correctly handles these boundary configurations.
Arrays containing many duplicate values are also important. Some implementations mistakenly use strict inequalities instead of non-strict comparisons. This algorithm uses <=, which allows equal values across partition boundaries and correctly handles duplicates.
A final subtle edge case is when the arrays have dramatically different sizes. For example:
nums1 = [1]
nums2 = [2,3,4,5,6,7,8]
Binary searching the smaller array guarantees both correctness and efficiency. The partition formulas continue to work regardless of size imbalance.