LeetCode 3416 - Subsequences with a Unique Middle Mode II
We are given an array nums, and we must count how many subsequences of length exactly 5 have a very specific property. For any chosen subsequence the middle element is c, because it is the third element of a length-5 sequence. The subsequence is valid if: 1.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, Math, Combinatorics
Solution
Problem Understanding
We are given an array nums, and we must count how many subsequences of length exactly 5 have a very specific property.
For any chosen subsequence
$$[a,b,c,d,e]$$
the middle element is c, because it is the third element of a length-5 sequence.
The subsequence is valid if:
cis a mode of the subsequence, meaning it appears the maximum number of times.cis the unique mode, meaning no other value appears as many times asc.
The important detail is that we are counting subsequences, not contiguous subarrays. We choose five indices
$$i_1 < i_2 < i_3 < i_4 < i_5$$
and the third chosen index determines the middle element.
The array length can be as large as
$$10^5$$
which immediately rules out any approach that explicitly enumerates subsequences. The total number of length-5 subsequences is
$$\binom{10^5}{5}$$
which is astronomically large.
The values themselves may be as large as 10^9 or as small as -10^9, so value compression or hash maps are required. We cannot use frequency arrays indexed by value.
Several edge cases are important:
- All elements are equal. Every length-5 subsequence is valid.
- All elements are distinct. No value can become a mode with frequency greater than one, therefore the answer is zero.
- Multiple values can tie for highest frequency. Those subsequences must be excluded because the middle value would not be a unique mode.
- The middle value may appear only once. Such subsequences are invalid because some other value may match or exceed its frequency.
Approaches
Brute Force
The most direct solution is to generate every subsequence of length five.
For each chosen subsequence:
- Count frequencies of its five values.
- Determine the maximum frequency.
- Check whether the middle element is the only value having that maximum frequency.
If so, increment the answer.
This is correct because it explicitly checks every valid subsequence definition.
However, the number of length-5 subsequences is
$$\binom{n}{5}$$
which is approximately $O(n^5)$. With $n = 10^5$, this is completely infeasible.
Optimal Observation
Instead of constructing subsequences, we fix the middle position.
Suppose index i is chosen as the middle element.
Let:
a = nums[i]pa= number of occurrences ofaon the left sidesa= number of occurrences ofaon the right side
We must choose:
- two positions from the left side
- two positions from the right side
Initially, there are
$$\binom{l}{2}\binom{r}{2}$$
ways, where:
l = ir = n - i - 1
These count every length-5 subsequence centered at i.
Then we subtract all invalid configurations:
ais not a mode.aties with another value.- another value becomes a mode.
The difficult part is counting those bad configurations efficiently.
A naive implementation would iterate through every distinct value for every index, producing $O(n^2)$ complexity.
The key optimization is to maintain several aggregated sums over value frequencies:
- $\sum p[b]s[b]$
- $\sum p[b]^2$
- $\sum s[b]^2$
- $\sum p[b]s[b]^2$
- $\sum s[b]p[b]^2$
These running totals allow all required combinatorial counts to be computed in constant time per index.
As we sweep from left to right, we update these aggregates whenever one occurrence moves from the suffix side to the prefix side.
This produces an overall $O(n)$ solution.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | $O(n^5)$ | $O(1)$ | Enumerates every length-5 subsequence |
| Optimal | $O(n)$ | $O(n)$ | Uses prefix/suffix frequency counts and combinatorial aggregates |
Algorithm Walkthrough
- Count the total frequency of every value and store it in a suffix map
s. - Create an empty prefix map
p. - Maintain the following running sums:
$$ps=\sum p[x]s[x]$$
$$pp=\sum p[x]^2$$
$$ss=\sum s[x]^2$$
$$pss=\sum p[x]s[x]^2$$
$$spp=\sum s[x]p[x]^2$$
These aggregates allow us to evaluate all invalid patterns without iterating through every distinct value.
- Sweep through the array. Treat position
ias the middle position. - Let:
a = nums[i]pa = p[a]sa = s[a]
Move the current occurrence from suffix to middle by decrementing sa.
- Compute:
l = ir = n - i - 1
The total number of subsequences centered at i equals
$$\binom{l}{2}\binom{r}{2}$$
- Subtract cases where the middle value appears only once in the subsequence. In those cases it cannot be a mode.
- Using the maintained aggregates, compute the number of configurations where another value ties or beats the frequency of
a. - Subtract all those invalid counts.
- Add the remaining valid count into the answer.
- Move the current value into the prefix structure and update all aggregates.
- Continue until every position has been processed.
Why it works
For every index i, the algorithm starts from the set of all length-5 subsequences whose middle element is nums[i]. Every invalid frequency pattern is counted and removed exactly once through combinatorial formulas derived from prefix and suffix occurrence counts.
The maintained aggregates represent sums over all distinct values, allowing the invalid configurations to be counted without iterating through every value. Therefore each position contributes its valid subsequences exactly once, and the final sum equals the number of subsequences whose middle element is the unique mode.
Python Solution
from collections import Counter, defaultdict
from typing import List
class Solution:
def subsequencesWithMiddleMode(self, nums: List[int]) -> int:
MOD = 1_000_000_007
def c2(x: int) -> int:
if x < 2:
return 0
return x * (x - 1) // 2
suffix = Counter(nums)
prefix = defaultdict(int)
pss = 0
spp = 0
pp = 0
ss = 0
ps = 0
for freq in suffix.values():
ss = (ss + freq * freq) % MOD
ans = 0
n = len(nums)
for i, a in enumerate(nums):
sa = suffix[a]
pa = prefix[a]
pss = (pss + pa * (-(sa * sa) + (sa - 1) * (sa - 1))) % MOD
spp = (spp - pa * pa) % MOD
ss = (ss - sa * sa + (sa - 1) * (sa - 1)) % MOD
ps = (ps - pa) % MOD
suffix[a] -= 1
sa = suffix[a]
left = i
right = n - i - 1
ans = (
ans
+ c2(left) * c2(right)
) % MOD
ans = (
ans
- c2(left - pa) * c2(right - sa)
) % MOD
pss_ = (pss - pa * sa * sa) % MOD
spp_ = (spp - sa * pa * pa) % MOD
pp_ = (pp - pa * pa) % MOD
ss_ = (ss - sa * sa) % MOD
ps_ = (ps - pa * sa) % MOD
p_ = left - pa
s_ = right - sa
subtract = 0
subtract = (
subtract
+ ps_ * (pa * (right - sa))
) % MOD
subtract = (
subtract
- pss_ * pa
) % MOD
subtract = (
subtract
+ ps_ * (sa * (left - pa))
) % MOD
subtract = (
subtract
- spp_ * sa
) % MOD
subtract = (
subtract
+ ((pp_ - p_) * sa * (right - sa)) // 2
) % MOD
subtract = (
subtract
+ ((ss_ - s_) * pa * (left - pa)) // 2
) % MOD
ans = (ans - subtract) % MOD
pss = (pss + sa * sa) % MOD
spp = (
spp
+ sa * (-(pa * pa) + (pa + 1) * (pa + 1))
) % MOD
pp = (
pp
- pa * pa
+ (pa + 1) * (pa + 1)
) % MOD
ps = (ps + sa) % MOD
prefix[a] += 1
return ans % MOD
The implementation follows the sweep-line strategy directly.
prefix stores occurrences already processed, while suffix stores occurrences that remain to the right of the current position. Every iteration treats the current index as the potential middle element.
The aggregate variables pss, spp, pp, ss, and ps summarize information across all values. They are updated in constant time whenever one occurrence moves from the suffix side to the prefix side.
The main loop first counts all possible centered subsequences, then subtracts invalid frequency patterns, and finally updates the prefix-side statistics for the next iteration.
Go Solution
package main
func subsequencesWithMiddleMode(nums []int) int {
const MOD int64 = 1_000_000_007
c2 := func(x int64) int64 {
if x < 2 {
return 0
}
return x * (x - 1) / 2
}
suffix := make(map[int]int64)
for _, v := range nums {
suffix[v]++
}
prefix := make(map[int]int64)
var pss int64
var spp int64
var pp int64
var ss int64
var ps int64
for _, freq := range suffix {
ss = (ss + freq*freq) % MOD
}
var ans int64
n := len(nums)
for i, a := range nums {
sa := suffix[a]
pa := prefix[a]
pss = (pss + pa*(-sa*sa+(sa-1)*(sa-1))) % MOD
spp = (spp - pa*pa) % MOD
ss = (ss - sa*sa + (sa-1)*(sa-1)) % MOD
ps = (ps - pa) % MOD
suffix[a]--
sa = suffix[a]
left := int64(i)
right := int64(n - i - 1)
ans = (ans + c2(left)*c2(right)) % MOD
ans = (ans - c2(left-pa)*c2(right-sa)) % MOD
pss2 := (pss - pa*sa*sa) % MOD
spp2 := (spp - sa*pa*pa) % MOD
pp2 := (pp - pa*pa) % MOD
ss2 := (ss - sa*sa) % MOD
ps2 := (ps - pa*sa) % MOD
pOther := left - pa
sOther := right - sa
var subtract int64
subtract = (subtract + ps2*(pa*(right-sa))) % MOD
subtract = (subtract - pss2*pa) % MOD
subtract = (subtract + ps2*(sa*(left-pa))) % MOD
subtract = (subtract - spp2*sa) % MOD
subtract = (subtract + ((pp2-pOther)*sa*(right-sa))/2) % MOD
subtract = (subtract + ((ss2-sOther)*pa*(left-pa))/2) % MOD
ans = (ans - subtract) % MOD
pss = (pss + sa*sa) % MOD
spp = (spp + sa*(-pa*pa+(pa+1)*(pa+1))) % MOD
pp = (pp - pa*pa + (pa+1)*(pa+1)) % MOD
ps = (ps + sa) % MOD
prefix[a]++
}
ans %= MOD
if ans < 0 {
ans += MOD
}
return int(ans)
}
The Go implementation mirrors the Python version closely. The primary difference is that all combinatorial calculations use int64 to avoid overflow. Go maps replace Python's Counter and defaultdict, and explicit modulo normalization is required because negative values may appear during intermediate subtraction steps.
Worked Examples
Example 1
Input:
[1,1,1,1,1,1]
Every length-5 subsequence equals:
[1,1,1,1,1]
There are:
$$\binom{6}{5}=6$$
such subsequences.
| Value | Frequency |
|---|---|
| 1 | 5 |
The middle element is 1, and it is the unique mode.
Answer:
6
Example 2
Input:
[1,2,2,3,3,4]
Possible length-5 subsequences:
| Chosen Subsequence | Valid? | Reason |
|---|---|---|
| [1,2,2,3,3] | No | 2 and 3 both appear twice |
| [1,2,2,3,4] | Yes | 2 is unique mode |
| [1,2,2,3,4] | Yes | Different indices |
| [1,2,3,3,4] | Yes | 3 is unique mode |
| [1,2,3,3,4] | Yes | Different indices |
| [2,2,3,3,4] | No | Tie between 2 and 3 |
Total valid subsequences:
4
Example 3
Input:
[0,1,2,3,4,5,6,7,8]
Every length-5 subsequence contains five distinct values.
Each frequency equals:
1
No value is a unique mode.
Answer:
0
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | $O(n)$ | Each index is processed once, and all updates are constant time |
| Space | $O(n)$ | Hash maps store frequencies of distinct values |
The crucial observation is that the aggregate statistics eliminate the need to iterate over all distinct values for every position. Every update only changes counts associated with the current value, allowing the sweep to remain linear.
Test Cases
sol = Solution()
assert sol.subsequencesWithMiddleMode([1, 1, 1, 1, 1, 1]) == 6 # all equal
assert sol.subsequencesWithMiddleMode([1, 2, 2, 3, 3, 4]) == 4 # sample 2
assert sol.subsequencesWithMiddleMode([0, 1, 2, 3, 4, 5, 6, 7, 8]) == 0 # all distinct
assert sol.subsequencesWithMiddleMode([1, 1, 1, 1, 1]) == 1 # minimum valid size
assert sol.subsequencesWithMiddleMode([1, 2, 3, 4, 5]) == 0 # no duplicates
assert sol.subsequencesWithMiddleMode([2, 2, 2, 2, 2, 2, 2]) == 21 # C(7,5)
assert sol.subsequencesWithMiddleMode([1, 1, 2, 1, 1]) == 1 # middle is dominant
assert sol.subsequencesWithMiddleMode([1, 2, 1, 2, 1]) == 1 # unique mode frequency 3
assert sol.subsequencesWithMiddleMode([1, 2, 2, 2, 3]) == 1 # one obvious valid subsequence
assert sol.subsequencesWithMiddleMode([1, 1, 2, 2, 3, 3, 4, 4]) >= 0 # tie-heavy case
Test Summary
| Test | Why |
|---|---|
[1,1,1,1,1,1] |
Every subsequence is valid |
[1,2,2,3,3,4] |
Official example with ties |
| Distinct values | No mode exists |
| Length exactly 5 | Smallest valid input |
| All equal length 7 | Large combinatorial count |
[1,1,2,1,1] |
Middle value appears most often |
[1,2,1,2,1] |
Frequency 3 versus 2 |
[1,2,2,2,3] |
Single guaranteed valid subsequence |
| Many paired duplicates | Stress test for tie handling |
Edge Cases
One important edge case occurs when all values are distinct. Every chosen subsequence has frequency distribution {1,1,1,1,1}. Since no value appears more often than the others, there is no unique mode. The implementation correctly removes these cases through the invalid-count formulas, producing zero.
Another important case occurs when all values are identical. Every length-5 subsequence contains the same value five times, making the middle value trivially the unique mode. The combinatorial counting naturally reduces to counting all length-5 subsequences, which is exactly $\binom{n}{5}$.
A third subtle case occurs when another value ties the middle value. For example, in [1,2,2,3,3], both 2 and 3 appear twice. Although the middle value is a mode, it is not a unique mode. The aggregate subtraction formulas explicitly count and remove these tie configurations, preventing overcounting.
A final edge case involves very large arrays with many distinct values. Any solution that loops over all distinct values for every index becomes quadratic and fails. The maintained aggregate sums allow the implementation to update only the current value while still accounting for every other value collectively, preserving linear complexity.