LeetCode 1467 - Probability of a Two Boxes Having The Same Number of Distinct Balls
The problem gives us several colors of balls, where balls[i] represents how many balls exist for color i. The total numb
Difficulty: 🔴 Hard
Topics: Array, Math, Dynamic Programming, Backtracking, Combinatorics, Probability and Statistics
Solution
Problem Understanding
The problem gives us several colors of balls, where balls[i] represents how many balls exist for color i. The total number of balls is always even, say 2n. We randomly shuffle all balls and place the first n balls into box 1 and the remaining n balls into box 2.
We must compute the probability that the two boxes end up containing the same number of distinct colors.
A distinct color means that at least one ball of that color appears in the box. For example, if a box contains [red, red, blue], then it contains 2 distinct colors.
The important detail is that all valid shuffles are equally likely. Since the boxes are ordered, swapping the contents of the two boxes produces a different outcome.
The constraints are small:
balls.length <= 8balls[i] <= 6
This immediately suggests that exponential or combinatorial enumeration may be feasible if carefully optimized. The maximum total number of balls is:
8 * 6 = 48
Since the number of colors is at most 8, a backtracking solution over colors becomes practical.
The main challenge is not generating all physical permutations of balls, because that would be astronomically large. Instead, we must count distributions combinatorially.
Several edge cases are important:
- All colors appear exactly once, such as
[1,1,1,1]. Every balanced split automatically gives equal distinct counts. - One color dominates heavily, such as
[6,1,1]. Many splits become impossible or unequal. - Colors may all go into one box during recursion, which affects distinct color counts.
- We must correctly count multiplicities using combinations, because multiple physical arrangements correspond to the same color allocation.
Approaches
Brute Force Approach
A brute force solution would explicitly generate every possible way to choose n balls out of the 2n total balls for the first box. The remaining balls automatically go into the second box.
For every distribution:
- Count how many distinct colors appear in box 1.
- Count how many distinct colors appear in box 2.
- If the counts are equal, increment the favorable count.
- Divide favorable outcomes by total outcomes.
This works because every equally sized split corresponds to a valid shuffled outcome.
However, the total number of combinations grows extremely quickly:
C(48, 24)
This is far too large to enumerate directly.
Additionally, constructing explicit ball lists and checking each distribution repeatedly would be extremely inefficient.
Key Insight
The crucial observation is that we do not care about the order of balls inside the boxes. We only care about:
- how many balls of each color go into box 1
- how many go into box 2
- how many distinct colors each box contains
Instead of enumerating individual balls, we can enumerate how each color is split between the two boxes.
For each color with balls[i] balls, we decide:
x balls -> box 1
balls[i] - x balls -> box 2
This creates a much smaller state space.
For every split, we compute how many actual arrangements it represents using combinations:
C(balls[i], x)
Multiplying these across colors gives the number of physical configurations corresponding to that split.
This transforms the problem into a combinatorial backtracking problem.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(C(2n, n)) | O(n) | Enumerates every possible balanced distribution of balls |
| Optimal | O(product(balls[i] + 1)) | O(k) | Enumerates color splits instead of physical permutations |
Algorithm Walkthrough
Step 1: Compute Total Balls Per Box
The total number of balls is:
total = sum(balls)
Each box must contain:
half = total // 2
This becomes a hard constraint during recursion.
Step 2: Precompute Combination Values
We repeatedly need values like:
C(n, r)
These count how many ways we can choose which balls of a color go into box 1.
We use Python's math.comb() or a precomputed Pascal triangle.
Step 3: Backtracking Over Colors
We process colors one by one.
For color i, suppose it has balls[i] = c.
We try every possible split:
0 -> box1, c -> box2
1 -> box1, c-1 -> box2
2 -> box1, c-2 -> box2
...
c -> box1, 0 -> box2
For each split we track:
- current number of balls in box 1
- current number of balls in box 2
- distinct colors in box 1
- distinct colors in box 2
- number of combinatorial ways
Step 4: Track Distinct Colors
If x > 0, then box 1 gains this color.
If c - x > 0, then box 2 gains this color.
We increment distinct counts accordingly.
Step 5: Track Number of Ways
For a split choosing x balls into box 1:
ways *= C(c, x)
This counts how many physical assignments correspond to this color split.
Step 6: Prune Invalid States
If either box exceeds half balls, the state is invalid.
We stop recursion early.
Step 7: Evaluate Complete Distributions
When all colors are processed:
- both boxes must contain exactly
halfballs - if distinct counts match, add the number of ways to favorable
- always add the number of ways to total
Step 8: Return Probability
Finally:
probability = favorable / total
Why it works
Every valid physical arrangement corresponds uniquely to a particular split of each color between the two boxes. The recursion enumerates all such splits exactly once.
The combinatorial multiplier:
C(c, x)
correctly counts how many physical assignments produce that split.
Because we count both favorable and total weighted by their exact multiplicities, the resulting ratio is the true probability.
Python Solution
from typing import List
from math import comb
class Solution:
def getProbability(self, balls: List[int]) -> float:
total_balls = sum(balls)
half = total_balls // 2
favorable = 0
total = 0
def backtrack(
index: int,
box1_count: int,
box2_count: int,
distinct1: int,
distinct2: int,
ways: int
) -> None:
nonlocal favorable, total
if box1_count > half or box2_count > half:
return
if index == len(balls):
if box1_count == half and box2_count == half:
total += ways
if distinct1 == distinct2:
favorable += ways
return
current = balls[index]
for take in range(current + 1):
new_box1 = box1_count + take
new_box2 = box2_count + (current - take)
new_distinct1 = distinct1 + (1 if take > 0 else 0)
new_distinct2 = distinct2 + (1 if current - take > 0 else 0)
new_ways = ways * comb(current, take)
backtrack(
index + 1,
new_box1,
new_box2,
new_distinct1,
new_distinct2,
new_ways
)
backtrack(0, 0, 0, 0, 0, 1)
return favorable / total
The implementation follows the recursive combinatorial enumeration described earlier.
The recursive function processes one color at a time. For each color, it tries every possible number of balls that can go into box 1. The remainder automatically goes into box 2.
The variables distinct1 and distinct2 track how many unique colors currently appear in each box.
The variable ways stores how many physical arrangements correspond to the current partial distribution. Every split multiplies this value by:
comb(current, take)
which counts how many ways that specific split can occur.
When recursion reaches the end, the code checks:
- both boxes contain exactly half the balls
- whether distinct color counts are equal
If both conditions hold, the configuration contributes to the favorable count.
Finally, the probability is computed as:
favorable / total
Go Solution
package main
import "math"
func comb(n, r int) float64 {
if r > n-r {
r = n - r
}
result := 1.0
for i := 1; i <= r; i++ {
result *= float64(n-r+i)
result /= float64(i)
}
return result
}
func getProbability(balls []int) float64 {
totalBalls := 0
for _, x := range balls {
totalBalls += x
}
half := totalBalls / 2
var favorable float64
var total float64
var backtrack func(
index int,
box1Count int,
box2Count int,
distinct1 int,
distinct2 int,
ways float64,
)
backtrack = func(
index int,
box1Count int,
box2Count int,
distinct1 int,
distinct2 int,
ways float64,
) {
if box1Count > half || box2Count > half {
return
}
if index == len(balls) {
if box1Count == half && box2Count == half {
total += ways
if distinct1 == distinct2 {
favorable += ways
}
}
return
}
current := balls[index]
for take := 0; take <= current; take++ {
newBox1 := box1Count + take
newBox2 := box2Count + (current - take)
newDistinct1 := distinct1
newDistinct2 := distinct2
if take > 0 {
newDistinct1++
}
if current-take > 0 {
newDistinct2++
}
newWays := ways * comb(current, take)
backtrack(
index+1,
newBox1,
newBox2,
newDistinct1,
newDistinct2,
newWays,
)
}
}
backtrack(0, 0, 0, 0, 0, 1.0)
return favorable / total
}
The Go implementation mirrors the Python logic closely.
Since Go does not provide a built in combinatorial function like Python's math.comb, we implement a helper function manually using multiplicative computation.
The recursion uses float64 for counting ways to avoid integer overflow and simplify division at the end.
Go slices are passed efficiently by reference semantics, so no special optimization is needed for the balls array.
Worked Examples
Example 1
balls = [1,1]
Total balls:
2
Each box must contain:
1
Possible splits:
| Color 1 | Color 2 | Box 1 | Box 2 | Distinct Counts | Ways |
|---|---|---|---|---|---|
| 1/0 | 0/1 | [1] | [2] | 1 vs 1 | 1 |
| 0/1 | 1/0 | [2] | [1] | 1 vs 1 | 1 |
Favorable:
2
Total:
2
Probability:
1.0
Example 2
balls = [2,1,1]
Each box must contain:
2
Consider the split:
| Color | Box 1 | Box 2 |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 0 |
| 3 | 0 | 1 |
Distinct counts:
2 vs 2
Ways contributed:
C(2,1) * C(1,1) * C(1,0)
= 2 * 1 * 1
= 2
The recursion evaluates every possible split similarly.
Final counts:
favorable = 8
total = 12
Probability:
0.66667
Example 3
balls = [1,2,1,2]
Total balls:
6
Each box must contain:
3
The recursion explores all valid distributions while tracking:
- box sizes
- distinct color counts
- combinatorial multiplicities
Final result:
108 / 180 = 0.6
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(product(balls[i] + 1)) | Every color tries all possible split counts |
| Space | O(k) | Recursion depth is at most number of colors |
The recursion tree size depends on how many choices each color has. If a color has c balls, it contributes c + 1 branching options.
Since:
balls.length <= 8
balls[i] <= 6
the worst case is manageable:
7^8
which is acceptable for this problem.
The recursion stack depth never exceeds the number of colors.
Test Cases
sol = Solution()
assert abs(sol.getProbability([1,1]) - 1.0) < 1e-5
# simplest balanced case
assert abs(sol.getProbability([2,1,1]) - 0.6666666667) < 1e-5
# sample case with unequal color frequencies
assert abs(sol.getProbability([1,2,1,2]) - 0.6) < 1e-5
# larger sample case
assert abs(sol.getProbability([1,1,1,1]) - 1.0) < 1e-5
# every split keeps distinct counts equal
assert abs(sol.getProbability([2,2]) - 1.0) < 1e-5
# symmetric duplicate colors
assert abs(sol.getProbability([6,6]) - 1.0) < 1e-5
# only two colors, always balanced distinct counts
assert abs(sol.getProbability([3,2,1]) - 0.3) > 0
# sanity check on mixed counts
assert abs(sol.getProbability([4,4,4,4])) <= 1.0
# probability must remain valid
assert 0.0 <= sol.getProbability([6,1,1]) <= 1.0
# dominant color edge case
assert 0.0 <= sol.getProbability([2,2,2,2,2,2]) <= 1.0
# stress test with many colors
Test Summary
| Test | Why |
|---|---|
[1,1] |
Smallest nontrivial input |
[2,1,1] |
Validates combinatorial counting |
[1,2,1,2] |
Tests multiple branching choices |
[1,1,1,1] |
Every split succeeds |
[2,2] |
Symmetric repeated colors |
[6,6] |
Large counts with simple structure |
[3,2,1] |
Mixed frequencies |
[4,4,4,4] |
Larger combinatorial state space |
[6,1,1] |
Dominant color edge case |
[2,2,2,2,2,2] |
Stress test with many colors |
Edge Cases
One important edge case occurs when every color appears exactly once, such as [1,1,1,1]. In this situation, every valid balanced split automatically produces the same number of distinct colors in both boxes. A naive implementation might overcomplicate counting or mishandle symmetry. This solution handles it naturally because every color contributes either one distinct count to one box or the other.
Another important case is when one color dominates heavily, such as [6,1,1]. Many distributions place most balls of the dominant color into one box, which can easily create unequal distinct color counts. Incorrect pruning logic could accidentally discard valid states. The implementation only prunes when box sizes exceed the required half size, so all legitimate distributions are preserved.
A third tricky case involves colors entirely assigned to one box. For example, a split may place all balls of a color into box 1 and none into box 2. Some implementations incorrectly increment distinct counts in both boxes. This solution carefully checks:
take > 0
and
current - take > 0
before increasing distinct color counts, ensuring accuracy.
A final subtle issue is combinatorial multiplicity. Different physical shuffles may correspond to the same color split. If we counted each split equally, the probability would be incorrect. Multiplying by:
comb(current, take)
correctly weights each split according to how many physical arrangements produce it.