LeetCode 382 - Linked List Random Node

The problem gives us the head of a singly linked list and asks us to return a random node value such that every node in the list has exactly the same probability of being selected. The class has two operations: 1. Solution(head) initializes the object with the linked list. 2.

LeetCode Problem 382

Difficulty: 🟡 Medium
Topics: Linked List, Math, Reservoir Sampling, Randomized

Solution

Problem Understanding

The problem gives us the head of a singly linked list and asks us to return a random node value such that every node in the list has exactly the same probability of being selected.

The class has two operations:

  1. Solution(head) initializes the object with the linked list.
  2. getRandom() returns one node value chosen uniformly at random.

The important requirement is that every node must be equally likely to be returned. If the list contains n nodes, then each node must have probability 1 / n.

The linked list is singly linked, which means we can only move forward from one node to the next. We cannot index directly into the list like we can with an array. This detail matters because random access becomes expensive unless we preprocess the list.

The constraints tell us that:

  • The list contains between 1 and 10^4 nodes.
  • Up to 10^4 calls to getRandom() may occur.
  • Node values can be negative or positive.

The follow up question is the most important part of the problem. It asks what happens if:

  • The linked list is extremely large.
  • The length is unknown.
  • We want to avoid extra space.

These hints strongly suggest that the intended optimal solution is Reservoir Sampling.

Several edge cases are important:

  • A linked list containing only one node. The algorithm must always return that value.
  • Very large linked lists where storing all elements in an array may be undesirable.
  • Repeated values. Selection must be based on nodes, not distinct values.
  • Unknown list length. We cannot rely on counting nodes beforehand.

Approaches

Brute Force Approach

A straightforward solution is to traverse the linked list once during initialization and store all node values in an array.

Then, every call to getRandom() simply picks a random index from the array and returns the corresponding value.

This works because arrays support constant time indexing, and choosing a uniformly random index guarantees each node has equal probability.

For example:

  • Linked list: 1 -> 2 -> 3
  • Store as array: [1, 2, 3]
  • Generate random index between 0 and 2

Each value is returned with probability 1/3.

This approach is easy to implement and efficient for repeated queries. However, it uses O(n) extra space because all node values must be stored. The follow up explicitly asks whether we can avoid extra space, so we need something better.

Optimal Approach, Reservoir Sampling

The key observation is that we do not actually need to know the total length of the linked list in advance.

Reservoir Sampling allows us to process elements one at a time while maintaining a uniformly random selection among all elements seen so far.

The idea works like this:

  • Start with the first node as the current answer.
  • As we visit the i-th node, replace the current answer with probability 1 / i.
  • Continue until the end of the list.

This guarantees that after processing all nodes, every node has equal probability of being selected.

The beauty of Reservoir Sampling is that:

  • We only traverse the list once per call.
  • We use constant extra space.
  • We do not need to know the list length beforehand.

This is exactly what the follow up asks for.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) initialization, O(1) per query O(n) Store all node values in an array
Optimal O(n) per query O(1) Reservoir Sampling, no extra storage needed

Algorithm Walkthrough

Reservoir Sampling Algorithm

  1. Store the head of the linked list during initialization.

We keep a reference to the linked list because we will traverse it every time getRandom() is called. 2. Initialize a variable to hold the chosen value.

Initially, we select the first node's value as the tentative answer. 3. Traverse the linked list node by node.

We maintain a counter representing how many nodes have been seen so far. 4. For the i-th node, generate a random integer in the range [1, i].

If the random number equals 1, replace the current answer with the current node's value. 5. Continue until the traversal finishes.

At the end, return the selected value.

Why it works

The correctness comes from probability preservation.

Consider node k.

  • When node k is visited, it is selected with probability 1/k.
  • For it to remain selected, it must survive replacement by all later nodes.
  • Node k+1 replaces it with probability 1/(k+1), so it survives with probability k/(k+1).
  • Node k+2 replaces it with probability 1/(k+2), so it survives with probability (k+1)/(k+2).

Multiplying all probabilities:

$$\frac{1}{k} \times \frac{k}{k+1} \times \frac{k+1}{k+2} \times \cdots \times \frac{n-1}{n} = \frac{1}{n}$$

Therefore, every node ends with equal probability 1/n.

Python Solution

import random
from typing import Optional

# Definition for singly-linked list.
# class ListNode:
#     def __init__(self, val=0, next=None):
#         self.val = val
#         self.next = next

class Solution:

    def __init__(self, head: Optional[ListNode]):
        self.head = head

    def getRandom(self) -> int:
        current = self.head
        chosen_value = current.val

        node_index = 1

        while current:
            if random.randint(1, node_index) == 1:
                chosen_value = current.val

            current = current.next
            node_index += 1

        return chosen_value

# Your Solution object will be instantiated and called as such:
# obj = Solution(head)
# param_1 = obj.getRandom()

The implementation stores only the head pointer during initialization. No preprocessing or array conversion occurs.

Inside getRandom(), we traverse the linked list from beginning to end.

The variable chosen_value stores the current reservoir sample. Initially, it contains the first node's value.

The variable node_index tracks how many nodes have been visited so far.

For each node:

  • Generate a random integer from 1 to node_index
  • Replace the chosen value only if the random number equals 1

This replacement probability is exactly 1 / node_index, which is the core rule of Reservoir Sampling.

After traversal completes, the final selected value is returned.

Go Solution

import (
	"math/rand"
)

/**
 * Definition for singly-linked list.
 * type ListNode struct {
 *     Val int
 *     Next *ListNode
 * }
 */
type Solution struct {
	head *ListNode
}

func Constructor(head *ListNode) Solution {
	return Solution{
		head: head,
	}
}

func (this *Solution) GetRandom() int {
	current := this.head
	chosenValue := current.Val

	nodeIndex := 1

	for current != nil {
		if rand.Intn(nodeIndex) == 0 {
			chosenValue = current.Val
		}

		current = current.Next
		nodeIndex++
	}

	return chosenValue
}

/**
 * Your Solution object will be instantiated and called as such:
 * obj := Constructor(head);
 * param_1 := obj.GetRandom();
 */

The Go implementation follows the same Reservoir Sampling logic as the Python version.

One small difference is the random number generation API:

  • Python uses random.randint(1, node_index) == 1
  • Go uses rand.Intn(nodeIndex) == 0

Since rand.Intn(nodeIndex) generates a number in [0, nodeIndex - 1], checking equality with 0 gives probability 1 / nodeIndex.

Go also requires explicit struct fields, so the linked list head is stored inside the Solution struct.

Worked Examples

Example 1

Input list:

1 -> 2 -> 3

Suppose getRandom() is called once.

Step Current Node node_index Random Result Selected Value
Start 1 1 always select 1
Next 2 2 select with probability 1/2 maybe 2
Next 3 3 select with probability 1/3 maybe 3

Let us trace one possible execution.

Step Current Node Random Choice Selected Value
1 1 selected 1
2 2 not selected 1
3 3 selected 3

Final returned value:

3

Another execution could produce:

Step Current Node Random Choice Selected Value
1 1 selected 1
2 2 selected 2
3 3 not selected 2

Final returned value:

2

Over many calls, each node appears approximately one third of the time.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each call traverses the entire linked list once
Space O(1) Only a few variables are stored

The algorithm processes every node exactly once during each call to getRandom().

No additional data structures proportional to input size are allocated. The algorithm stores only:

  • A traversal pointer
  • A counter
  • The currently selected value

Therefore, the extra space usage remains constant.

Test Cases

import random
from collections import Counter

# Definition for testing
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def build_list(values):
    dummy = ListNode(0)
    current = dummy

    for value in values:
        current.next = ListNode(value)
        current = current.next

    return dummy.next

# Test 1: Single node list
head = build_list([5])
solution = Solution(head)

for _ in range(100):
    assert solution.getRandom() == 5  # only possible value

# Test 2: Two node list
head = build_list([1, 2])
solution = Solution(head)

results = [solution.getRandom() for _ in range(1000)]

assert all(value in [1, 2] for value in results)  # valid outputs only

# Test 3: Multiple node list
head = build_list([1, 2, 3, 4, 5])
solution = Solution(head)

results = [solution.getRandom() for _ in range(1000)]

assert all(value in [1, 2, 3, 4, 5] for value in results)  # all values valid

# Test 4: Negative values
head = build_list([-10, -20, -30])
solution = Solution(head)

results = [solution.getRandom() for _ in range(500)]

assert all(value in [-10, -20, -30] for value in results)  # negatives handled correctly

# Test 5: Duplicate values
head = build_list([7, 7, 7])
solution = Solution(head)

for _ in range(100):
    assert solution.getRandom() == 7  # duplicates handled correctly

# Test 6: Approximate distribution check
head = build_list([1, 2, 3])
solution = Solution(head)

counter = Counter()

for _ in range(6000):
    counter[solution.getRandom()] += 1

for count in counter.values():
    assert 1500 <= count <= 2500  # approximate uniform distribution
Test Why
Single node list Ensures the only node is always returned
Two node list Validates random selection between two nodes
Multiple node list Confirms traversal works for larger lists
Negative values Ensures value sign does not affect logic
Duplicate values Confirms selection is node based, not value based
Distribution check Verifies approximate uniform randomness

Edge Cases

Single Node Linked List

A list containing only one node is the smallest valid input. This case can expose bugs in algorithms that assume at least two nodes exist.

For example:

5

The correct behavior is to always return 5.

The implementation handles this naturally because the traversal loop runs once, and the first node is selected with probability 1.

Duplicate Node Values

The linked list may contain repeated values.

For example:

7 -> 7 -> 7

A common mistake is to think in terms of unique values rather than nodes. The problem requires equal probability per node, not per distinct value.

Reservoir Sampling operates on nodes individually, so each node still has equal probability even if values repeat.

Extremely Large Linked List

The follow up specifically mentions very large linked lists with unknown size.

A brute force array conversion could consume substantial memory. It also requires knowing the full list contents in advance.

Reservoir Sampling avoids this issue completely because it processes one node at a time and uses only constant extra memory.

This makes the algorithm scalable and suitable for streaming or unknown length data.

Unknown Length of the List

Some randomized algorithms rely on knowing the total number of elements beforehand. That is impossible in a streaming scenario.

Reservoir Sampling does not require prior knowledge of the list length. The replacement probability adjusts dynamically as nodes are encountered.

This property is exactly why the algorithm is ideal for this problem.