LeetCode 528 - Random Pick with Weight
The problem asks us to design a data structure that supports weighted random sampling. You are given an array w, where each element represents the weight of an index. Instead of choosing every index with equal probability, the selection probability depends on its weight.
Difficulty: 🟡 Medium
Topics: Array, Math, Binary Search, Prefix Sum, Randomized
Solution
Problem Understanding
The problem asks us to design a data structure that supports weighted random sampling.
You are given an array w, where each element represents the weight of an index. Instead of choosing every index with equal probability, the selection probability depends on its weight.
For example:
- If
w = [1, 3] - Total weight =
1 + 3 = 4 - Index
0should be chosen with probability1/4 - Index
1should be chosen with probability3/4
The challenge is implementing the method pickIndex() such that repeated calls statistically follow these probabilities.
In other words, we need to simulate a probability distribution where larger weights make an index more likely to be chosen.
The input consists of:
- A constructor
Solution(w)that initializes the data structure. - A method
pickIndex()that returns one valid index from[0, len(w) - 1].
The key requirement is that selection must be random but weighted.
The constraints give us important hints about efficiency:
w.length <= 10^4pickIndex()is called at most10^4timesw[i] <= 10^5
Since pickIndex() may be called many times, recomputing probabilities from scratch every call would be inefficient. We should preprocess information once during initialization and make each query efficient.
An important observation is that all weights are positive integers, meaning every index always has a non-zero probability of being selected. This removes corner cases involving zero probability intervals.
Some edge cases that could trip up a naive implementation include:
- A single-element array like
[1], where the only valid answer is0. - Very uneven distributions such as
[1, 100000], where one index should dominate the probability space. - Large arrays with repeated calls to
pickIndex(), where poor performance would become noticeable.
The problem guarantees valid positive weights, so we do not need to handle negative values or empty arrays.
Approaches
Brute Force Approach
A straightforward approach is to explicitly build a weighted pool.
For each index i, insert it into an auxiliary list w[i] times.
For example:
w = [1, 3]
Expanded array:
[0, 1, 1, 1]
Then pickIndex() simply selects a random position in this expanded array.
This works because frequency directly corresponds to probability:
- Index
0appears once - Index
1appears three times
Thus, index 1 is selected three times more often.
Although correct, this approach becomes impractical when weights are large.
For example:
w = [100000, 100000]
The expanded array would require 200000 elements.
Since:
sum(w) can be as large as 10^9
this solution is far too memory-intensive.
Key Insight
Instead of physically expanding the weighted array, we can think of weights as ranges on a number line.
Suppose:
w = [1, 3, 2]
We create prefix sums:
[1, 4, 6]
This represents intervals:
| Index | Range |
|---|---|
| 0 | [1] |
| 1 | [2,4] |
| 2 | [5,6] |
Now imagine generating a random integer:
target ∈ [1, totalWeight]
If:
target = 3
then it lies inside index 1’s range.
The problem becomes:
Find the first prefix sum greater than or equal to
target.
Since prefix sums are sorted, we can use binary search.
This gives:
- Preprocessing:
O(n) - Each query:
O(log n)
which is much more efficient.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(sum(w)) preprocessing, O(1) pick |
O(sum(w)) |
Expands the weighted array explicitly, impractical for large weights |
| Optimal | O(n) preprocessing, O(log n) pick |
O(n) |
Uses prefix sums and binary search |
Algorithm Walkthrough
Optimal Algorithm: Prefix Sum + Binary Search
- During initialization, compute a prefix sum array.
Each position stores the cumulative sum of weights up to that index.
Example:
w = [1, 3, 2]
prefix = [1, 4, 6]
This transforms weights into searchable intervals. 2. Store the total weight.
The last prefix sum equals the total probability mass:
total = 6
- When
pickIndex()is called, generate a random integer in the inclusive range:
[1, total]
Every number in this range represents one probability slot. 4. Use binary search to locate the first prefix sum greater than or equal to the random number.
Example:
target = 5
prefix = [1, 4, 6]
Binary search finds:
6 >= 5
which corresponds to index 2.
5. Return that index.
Because intervals are proportional to weight size, larger weights occupy more random-number slots and therefore get selected more frequently.
Why it works
The prefix sum array partitions the range [1, totalWeight] into contiguous intervals whose lengths equal their weights.
If index i has weight w[i], then it owns exactly w[i] integers in the probability space. Since every random number is equally likely, the probability of landing inside index i’s interval is:
$$\frac{w[i]}{\text{sum}(w)}$$
Binary search correctly identifies which interval contains the random number, guaranteeing the required weighted probability distribution.
Python Solution
from typing import List
import random
import bisect
class Solution:
def __init__(self, w: List[int]):
self.prefix_sums = []
running_sum = 0
for weight in w:
running_sum += weight
self.prefix_sums.append(running_sum)
self.total_weight = running_sum
def pickIndex(self) -> int:
target = random.randint(1, self.total_weight)
return bisect.bisect_left(self.prefix_sums, target)
# Your Solution object will be instantiated and called as such:
# obj = Solution(w)
# param_1 = obj.pickIndex()
The constructor preprocesses the input array by computing prefix sums. A running cumulative total is maintained, and each cumulative value is appended to self.prefix_sums.
For example:
w = [1, 3, 2]
becomes:
prefix_sums = [1, 4, 6]
The variable self.total_weight stores the maximum possible random number.
Inside pickIndex(), we generate a uniformly random integer between 1 and total_weight. This random value represents a location in the weighted probability space.
We then use bisect_left() to find the first prefix sum greater than or equal to the target. Because the prefix sum array is sorted, binary search runs in logarithmic time.
For instance:
prefix = [1, 4, 6]
target = 4
Binary search returns index 1, which corresponds to the interval [2,4].
Go Solution
package main
import (
"math/rand"
"sort"
)
type Solution struct {
prefixSums []int
totalWeight int
}
func Constructor(w []int) Solution {
prefixSums := make([]int, len(w))
runningSum := 0
for i, weight := range w {
runningSum += weight
prefixSums[i] = runningSum
}
return Solution{
prefixSums: prefixSums,
totalWeight: runningSum,
}
}
func (this *Solution) PickIndex() int {
target := rand.Intn(this.totalWeight) + 1
index := sort.Search(len(this.prefixSums), func(i int) bool {
return this.prefixSums[i] >= target
})
return index
}
/**
* Your Solution object will be instantiated and called as such:
* obj := Constructor(w);
* param_1 := obj.PickIndex();
*/
The Go implementation follows the same idea but uses Go standard library utilities.
Instead of Python's bisect_left, Go uses sort.Search, which performs binary search over indices. The callback function checks whether the prefix sum at position i is large enough to contain the random target.
Random number generation differs slightly:
rand.Intn(totalWeight) + 1
This produces a number in the inclusive range:
[1, totalWeight]
which matches the Python implementation.
Go slices are used instead of Python lists, and integers are safe because the maximum total weight:
10^4 × 10^5 = 10^9
fits comfortably within Go's integer range.
Worked Examples
Example 1
Input:
w = [1]
Initialization
| Step | Weight | Running Sum | Prefix Sums |
|---|---|---|---|
| 1 | 1 | 1 | [1] |
Final state:
prefix_sums = [1]
total_weight = 1
pickIndex()
Random target:
target = 1
Binary search:
| Index | Prefix Sum | Condition |
|---|---|---|
| 0 | 1 | 1 ≥ 1 |
Result:
return 0
Since only one index exists, it is always selected.
Example 2
Input:
w = [1, 3]
Initialization
| Step | Weight | Running Sum | Prefix Sums |
|---|---|---|---|
| 1 | 1 | 1 | [1] |
| 2 | 3 | 4 | [1, 4] |
Final state:
prefix_sums = [1, 4]
total_weight = 4
Probability intervals:
| Index | Weight | Interval |
|---|---|---|
| 0 | 1 | [1] |
| 1 | 3 | [2,4] |
Suppose:
target = 3
Binary search proceeds:
| Left | Right | Mid | Prefix[mid] | Decision |
|---|---|---|---|---|
| 0 | 1 | 0 | 1 | Move right |
| 1 | 1 | 1 | 4 | Found |
Result:
return 1
If:
target = 1
then index 0 is selected.
Because index 1 owns three probability slots while index 0 owns only one, repeated calls naturally approximate the expected probabilities:
P(0) = 1/4
P(1) = 3/4
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) initialization, O(log n) per pickIndex() |
Prefix sums are built once, binary search is logarithmic |
| Space | O(n) |
Stores the prefix sum array |
The preprocessing step scans the input array once to compute cumulative sums, requiring linear time. Each call to pickIndex() performs a binary search over the prefix sums, reducing lookup time to logarithmic complexity. Since we store one prefix sum per element, the extra memory usage is linear.
Test Cases
from collections import Counter
# Example 1
solution = Solution([1])
assert solution.pickIndex() == 0 # single element always returns 0
# Example 2
solution = Solution([1, 3])
for _ in range(100):
result = solution.pickIndex()
assert result in [0, 1] # valid index only
# Boundary case: minimum input size
solution = Solution([100000])
assert solution.pickIndex() == 0 # only one possible choice
# Uneven probability distribution
solution = Solution([1, 100000])
results = [solution.pickIndex() for _ in range(1000)]
count = Counter(results)
assert count[1] > count[0] # larger weight dominates
# Equal weights
solution = Solution([5, 5, 5])
for _ in range(100):
result = solution.pickIndex()
assert result in [0, 1, 2] # all indices valid
# Larger input
weights = [1] * 10000
solution = Solution(weights)
for _ in range(100):
result = solution.pickIndex()
assert 0 <= result < 10000 # handles max array size
Test Summary
| Test | Why |
|---|---|
[1] |
Validates single-element edge case |
[1,3] |
Confirms normal weighted behavior |
[100000] |
Tests large weight with one index |
[1,100000] |
Ensures dominant probability works |
[5,5,5] |
Validates equal probability setup |
10000 identical weights |
Stress test for maximum input size |
Edge Cases
Single Element Array
A case like:
w = [1]
can expose indexing bugs if binary search assumes multiple elements exist. Since there is only one valid interval, every random number must map to index 0.
Our implementation handles this naturally because the prefix sum array becomes:
[1]
and binary search always returns 0.
Extremely Skewed Weights
Consider:
w = [1, 100000]
A naive implementation might suffer from overflow or probability mistakes.
Our interval representation avoids this issue entirely:
prefix = [1, 100001]
The second index simply owns most of the probability space, so it is selected more frequently as intended.
Large Number of Calls
Since pickIndex() may be called up to 10^4 times, repeatedly recomputing cumulative weights would waste time.
By preprocessing prefix sums once in the constructor, every query becomes an efficient binary search:
O(log n)
This ensures strong performance even at maximum constraints.