LeetCode 528 - Random Pick with Weight

The problem asks us to design a data structure that supports weighted random sampling. You are given an array w, where each element represents the weight of an index. Instead of choosing every index with equal probability, the selection probability depends on its weight.

LeetCode Problem 528

Difficulty: 🟡 Medium
Topics: Array, Math, Binary Search, Prefix Sum, Randomized

Solution

Problem Understanding

The problem asks us to design a data structure that supports weighted random sampling.

You are given an array w, where each element represents the weight of an index. Instead of choosing every index with equal probability, the selection probability depends on its weight.

For example:

  • If w = [1, 3]
  • Total weight = 1 + 3 = 4
  • Index 0 should be chosen with probability 1/4
  • Index 1 should be chosen with probability 3/4

The challenge is implementing the method pickIndex() such that repeated calls statistically follow these probabilities.

In other words, we need to simulate a probability distribution where larger weights make an index more likely to be chosen.

The input consists of:

  • A constructor Solution(w) that initializes the data structure.
  • A method pickIndex() that returns one valid index from [0, len(w) - 1].

The key requirement is that selection must be random but weighted.

The constraints give us important hints about efficiency:

  • w.length <= 10^4
  • pickIndex() is called at most 10^4 times
  • w[i] <= 10^5

Since pickIndex() may be called many times, recomputing probabilities from scratch every call would be inefficient. We should preprocess information once during initialization and make each query efficient.

An important observation is that all weights are positive integers, meaning every index always has a non-zero probability of being selected. This removes corner cases involving zero probability intervals.

Some edge cases that could trip up a naive implementation include:

  • A single-element array like [1], where the only valid answer is 0.
  • Very uneven distributions such as [1, 100000], where one index should dominate the probability space.
  • Large arrays with repeated calls to pickIndex(), where poor performance would become noticeable.

The problem guarantees valid positive weights, so we do not need to handle negative values or empty arrays.

Approaches

Brute Force Approach

A straightforward approach is to explicitly build a weighted pool.

For each index i, insert it into an auxiliary list w[i] times.

For example:

w = [1, 3]

Expanded array:
[0, 1, 1, 1]

Then pickIndex() simply selects a random position in this expanded array.

This works because frequency directly corresponds to probability:

  • Index 0 appears once
  • Index 1 appears three times

Thus, index 1 is selected three times more often.

Although correct, this approach becomes impractical when weights are large.

For example:

w = [100000, 100000]

The expanded array would require 200000 elements.

Since:

sum(w) can be as large as 10^9

this solution is far too memory-intensive.

Key Insight

Instead of physically expanding the weighted array, we can think of weights as ranges on a number line.

Suppose:

w = [1, 3, 2]

We create prefix sums:

[1, 4, 6]

This represents intervals:

Index Range
0 [1]
1 [2,4]
2 [5,6]

Now imagine generating a random integer:

target ∈ [1, totalWeight]

If:

target = 3

then it lies inside index 1’s range.

The problem becomes:

Find the first prefix sum greater than or equal to target.

Since prefix sums are sorted, we can use binary search.

This gives:

  • Preprocessing: O(n)
  • Each query: O(log n)

which is much more efficient.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(sum(w)) preprocessing, O(1) pick O(sum(w)) Expands the weighted array explicitly, impractical for large weights
Optimal O(n) preprocessing, O(log n) pick O(n) Uses prefix sums and binary search

Algorithm Walkthrough

  1. During initialization, compute a prefix sum array.

Each position stores the cumulative sum of weights up to that index.

Example:

w = [1, 3, 2]

prefix = [1, 4, 6]

This transforms weights into searchable intervals. 2. Store the total weight.

The last prefix sum equals the total probability mass:

total = 6
  1. When pickIndex() is called, generate a random integer in the inclusive range:
[1, total]

Every number in this range represents one probability slot. 4. Use binary search to locate the first prefix sum greater than or equal to the random number.

Example:

target = 5
prefix = [1, 4, 6]

Binary search finds:

6 >= 5

which corresponds to index 2. 5. Return that index.

Because intervals are proportional to weight size, larger weights occupy more random-number slots and therefore get selected more frequently.

Why it works

The prefix sum array partitions the range [1, totalWeight] into contiguous intervals whose lengths equal their weights.

If index i has weight w[i], then it owns exactly w[i] integers in the probability space. Since every random number is equally likely, the probability of landing inside index i’s interval is:

$$\frac{w[i]}{\text{sum}(w)}$$

Binary search correctly identifies which interval contains the random number, guaranteeing the required weighted probability distribution.

Python Solution

from typing import List
import random
import bisect

class Solution:

    def __init__(self, w: List[int]):
        self.prefix_sums = []
        running_sum = 0

        for weight in w:
            running_sum += weight
            self.prefix_sums.append(running_sum)

        self.total_weight = running_sum

    def pickIndex(self) -> int:
        target = random.randint(1, self.total_weight)

        return bisect.bisect_left(self.prefix_sums, target)

# Your Solution object will be instantiated and called as such:
# obj = Solution(w)
# param_1 = obj.pickIndex()

The constructor preprocesses the input array by computing prefix sums. A running cumulative total is maintained, and each cumulative value is appended to self.prefix_sums.

For example:

w = [1, 3, 2]

becomes:

prefix_sums = [1, 4, 6]

The variable self.total_weight stores the maximum possible random number.

Inside pickIndex(), we generate a uniformly random integer between 1 and total_weight. This random value represents a location in the weighted probability space.

We then use bisect_left() to find the first prefix sum greater than or equal to the target. Because the prefix sum array is sorted, binary search runs in logarithmic time.

For instance:

prefix = [1, 4, 6]
target = 4

Binary search returns index 1, which corresponds to the interval [2,4].

Go Solution

package main

import (
	"math/rand"
	"sort"
)

type Solution struct {
	prefixSums []int
	totalWeight int
}

func Constructor(w []int) Solution {
	prefixSums := make([]int, len(w))
	runningSum := 0

	for i, weight := range w {
		runningSum += weight
		prefixSums[i] = runningSum
	}

	return Solution{
		prefixSums: prefixSums,
		totalWeight: runningSum,
	}
}

func (this *Solution) PickIndex() int {
	target := rand.Intn(this.totalWeight) + 1

	index := sort.Search(len(this.prefixSums), func(i int) bool {
		return this.prefixSums[i] >= target
	})

	return index
}

/**
 * Your Solution object will be instantiated and called as such:
 * obj := Constructor(w);
 * param_1 := obj.PickIndex();
 */

The Go implementation follows the same idea but uses Go standard library utilities.

Instead of Python's bisect_left, Go uses sort.Search, which performs binary search over indices. The callback function checks whether the prefix sum at position i is large enough to contain the random target.

Random number generation differs slightly:

rand.Intn(totalWeight) + 1

This produces a number in the inclusive range:

[1, totalWeight]

which matches the Python implementation.

Go slices are used instead of Python lists, and integers are safe because the maximum total weight:

10^4 × 10^5 = 10^9

fits comfortably within Go's integer range.

Worked Examples

Example 1

Input:

w = [1]

Initialization

Step Weight Running Sum Prefix Sums
1 1 1 [1]

Final state:

prefix_sums = [1]
total_weight = 1

pickIndex()

Random target:

target = 1

Binary search:

Index Prefix Sum Condition
0 1 1 ≥ 1

Result:

return 0

Since only one index exists, it is always selected.

Example 2

Input:

w = [1, 3]

Initialization

Step Weight Running Sum Prefix Sums
1 1 1 [1]
2 3 4 [1, 4]

Final state:

prefix_sums = [1, 4]
total_weight = 4

Probability intervals:

Index Weight Interval
0 1 [1]
1 3 [2,4]

Suppose:

target = 3

Binary search proceeds:

Left Right Mid Prefix[mid] Decision
0 1 0 1 Move right
1 1 1 4 Found

Result:

return 1

If:

target = 1

then index 0 is selected.

Because index 1 owns three probability slots while index 0 owns only one, repeated calls naturally approximate the expected probabilities:

P(0) = 1/4
P(1) = 3/4

Complexity Analysis

Measure Complexity Explanation
Time O(n) initialization, O(log n) per pickIndex() Prefix sums are built once, binary search is logarithmic
Space O(n) Stores the prefix sum array

The preprocessing step scans the input array once to compute cumulative sums, requiring linear time. Each call to pickIndex() performs a binary search over the prefix sums, reducing lookup time to logarithmic complexity. Since we store one prefix sum per element, the extra memory usage is linear.

Test Cases

from collections import Counter

# Example 1
solution = Solution([1])
assert solution.pickIndex() == 0  # single element always returns 0

# Example 2
solution = Solution([1, 3])

for _ in range(100):
    result = solution.pickIndex()
    assert result in [0, 1]  # valid index only

# Boundary case: minimum input size
solution = Solution([100000])
assert solution.pickIndex() == 0  # only one possible choice

# Uneven probability distribution
solution = Solution([1, 100000])

results = [solution.pickIndex() for _ in range(1000)]
count = Counter(results)

assert count[1] > count[0]  # larger weight dominates

# Equal weights
solution = Solution([5, 5, 5])

for _ in range(100):
    result = solution.pickIndex()
    assert result in [0, 1, 2]  # all indices valid

# Larger input
weights = [1] * 10000
solution = Solution(weights)

for _ in range(100):
    result = solution.pickIndex()
    assert 0 <= result < 10000  # handles max array size

Test Summary

Test Why
[1] Validates single-element edge case
[1,3] Confirms normal weighted behavior
[100000] Tests large weight with one index
[1,100000] Ensures dominant probability works
[5,5,5] Validates equal probability setup
10000 identical weights Stress test for maximum input size

Edge Cases

Single Element Array

A case like:

w = [1]

can expose indexing bugs if binary search assumes multiple elements exist. Since there is only one valid interval, every random number must map to index 0.

Our implementation handles this naturally because the prefix sum array becomes:

[1]

and binary search always returns 0.

Extremely Skewed Weights

Consider:

w = [1, 100000]

A naive implementation might suffer from overflow or probability mistakes.

Our interval representation avoids this issue entirely:

prefix = [1, 100001]

The second index simply owns most of the probability space, so it is selected more frequently as intended.

Large Number of Calls

Since pickIndex() may be called up to 10^4 times, repeatedly recomputing cumulative weights would waste time.

By preprocessing prefix sums once in the constructor, every query becomes an efficient binary search:

O(log n)

This ensures strong performance even at maximum constraints.