LeetCode 3416 - Subsequences with a Unique Middle Mode II

We are given an array nums, and we must count how many subsequences of length exactly 5 have a very specific property. For any chosen subsequence the middle element is c, because it is the third element of a length-5 sequence. The subsequence is valid if: 1.

LeetCode Problem 3416

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Math, Combinatorics

Solution

Problem Understanding

We are given an array nums, and we must count how many subsequences of length exactly 5 have a very specific property.

For any chosen subsequence

$$[a,b,c,d,e]$$

the middle element is c, because it is the third element of a length-5 sequence.

The subsequence is valid if:

  1. c is a mode of the subsequence, meaning it appears the maximum number of times.
  2. c is the unique mode, meaning no other value appears as many times as c.

The important detail is that we are counting subsequences, not contiguous subarrays. We choose five indices

$$i_1 < i_2 < i_3 < i_4 < i_5$$

and the third chosen index determines the middle element.

The array length can be as large as

$$10^5$$

which immediately rules out any approach that explicitly enumerates subsequences. The total number of length-5 subsequences is

$$\binom{10^5}{5}$$

which is astronomically large.

The values themselves may be as large as 10^9 or as small as -10^9, so value compression or hash maps are required. We cannot use frequency arrays indexed by value.

Several edge cases are important:

  • All elements are equal. Every length-5 subsequence is valid.
  • All elements are distinct. No value can become a mode with frequency greater than one, therefore the answer is zero.
  • Multiple values can tie for highest frequency. Those subsequences must be excluded because the middle value would not be a unique mode.
  • The middle value may appear only once. Such subsequences are invalid because some other value may match or exceed its frequency.

Approaches

Brute Force

The most direct solution is to generate every subsequence of length five.

For each chosen subsequence:

  1. Count frequencies of its five values.
  2. Determine the maximum frequency.
  3. Check whether the middle element is the only value having that maximum frequency.

If so, increment the answer.

This is correct because it explicitly checks every valid subsequence definition.

However, the number of length-5 subsequences is

$$\binom{n}{5}$$

which is approximately $O(n^5)$. With $n = 10^5$, this is completely infeasible.

Optimal Observation

Instead of constructing subsequences, we fix the middle position.

Suppose index i is chosen as the middle element.

Let:

  • a = nums[i]
  • pa = number of occurrences of a on the left side
  • sa = number of occurrences of a on the right side

We must choose:

  • two positions from the left side
  • two positions from the right side

Initially, there are

$$\binom{l}{2}\binom{r}{2}$$

ways, where:

  • l = i
  • r = n - i - 1

These count every length-5 subsequence centered at i.

Then we subtract all invalid configurations:

  • a is not a mode.
  • a ties with another value.
  • another value becomes a mode.

The difficult part is counting those bad configurations efficiently.

A naive implementation would iterate through every distinct value for every index, producing $O(n^2)$ complexity.

The key optimization is to maintain several aggregated sums over value frequencies:

  • $\sum p[b]s[b]$
  • $\sum p[b]^2$
  • $\sum s[b]^2$
  • $\sum p[b]s[b]^2$
  • $\sum s[b]p[b]^2$

These running totals allow all required combinatorial counts to be computed in constant time per index.

As we sweep from left to right, we update these aggregates whenever one occurrence moves from the suffix side to the prefix side.

This produces an overall $O(n)$ solution.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force $O(n^5)$ $O(1)$ Enumerates every length-5 subsequence
Optimal $O(n)$ $O(n)$ Uses prefix/suffix frequency counts and combinatorial aggregates

Algorithm Walkthrough

  1. Count the total frequency of every value and store it in a suffix map s.
  2. Create an empty prefix map p.
  3. Maintain the following running sums:

$$ps=\sum p[x]s[x]$$

$$pp=\sum p[x]^2$$

$$ss=\sum s[x]^2$$

$$pss=\sum p[x]s[x]^2$$

$$spp=\sum s[x]p[x]^2$$

These aggregates allow us to evaluate all invalid patterns without iterating through every distinct value.

  1. Sweep through the array. Treat position i as the middle position.
  2. Let:
  • a = nums[i]
  • pa = p[a]
  • sa = s[a]

Move the current occurrence from suffix to middle by decrementing sa.

  1. Compute:
  • l = i
  • r = n - i - 1

The total number of subsequences centered at i equals

$$\binom{l}{2}\binom{r}{2}$$

  1. Subtract cases where the middle value appears only once in the subsequence. In those cases it cannot be a mode.
  2. Using the maintained aggregates, compute the number of configurations where another value ties or beats the frequency of a.
  3. Subtract all those invalid counts.
  4. Add the remaining valid count into the answer.
  5. Move the current value into the prefix structure and update all aggregates.
  6. Continue until every position has been processed.

Why it works

For every index i, the algorithm starts from the set of all length-5 subsequences whose middle element is nums[i]. Every invalid frequency pattern is counted and removed exactly once through combinatorial formulas derived from prefix and suffix occurrence counts.

The maintained aggregates represent sums over all distinct values, allowing the invalid configurations to be counted without iterating through every value. Therefore each position contributes its valid subsequences exactly once, and the final sum equals the number of subsequences whose middle element is the unique mode.

Python Solution

from collections import Counter, defaultdict
from typing import List

class Solution:
    def subsequencesWithMiddleMode(self, nums: List[int]) -> int:
        MOD = 1_000_000_007

        def c2(x: int) -> int:
            if x < 2:
                return 0
            return x * (x - 1) // 2

        suffix = Counter(nums)
        prefix = defaultdict(int)

        pss = 0
        spp = 0
        pp = 0
        ss = 0
        ps = 0

        for freq in suffix.values():
            ss = (ss + freq * freq) % MOD

        ans = 0
        n = len(nums)

        for i, a in enumerate(nums):
            sa = suffix[a]
            pa = prefix[a]

            pss = (pss + pa * (-(sa * sa) + (sa - 1) * (sa - 1))) % MOD
            spp = (spp - pa * pa) % MOD
            ss = (ss - sa * sa + (sa - 1) * (sa - 1)) % MOD
            ps = (ps - pa) % MOD

            suffix[a] -= 1
            sa = suffix[a]

            left = i
            right = n - i - 1

            ans = (
                ans
                + c2(left) * c2(right)
            ) % MOD

            ans = (
                ans
                - c2(left - pa) * c2(right - sa)
            ) % MOD

            pss_ = (pss - pa * sa * sa) % MOD
            spp_ = (spp - sa * pa * pa) % MOD
            pp_ = (pp - pa * pa) % MOD
            ss_ = (ss - sa * sa) % MOD
            ps_ = (ps - pa * sa) % MOD

            p_ = left - pa
            s_ = right - sa

            subtract = 0

            subtract = (
                subtract
                + ps_ * (pa * (right - sa))
            ) % MOD

            subtract = (
                subtract
                - pss_ * pa
            ) % MOD

            subtract = (
                subtract
                + ps_ * (sa * (left - pa))
            ) % MOD

            subtract = (
                subtract
                - spp_ * sa
            ) % MOD

            subtract = (
                subtract
                + ((pp_ - p_) * sa * (right - sa)) // 2
            ) % MOD

            subtract = (
                subtract
                + ((ss_ - s_) * pa * (left - pa)) // 2
            ) % MOD

            ans = (ans - subtract) % MOD

            pss = (pss + sa * sa) % MOD
            spp = (
                spp
                + sa * (-(pa * pa) + (pa + 1) * (pa + 1))
            ) % MOD

            pp = (
                pp
                - pa * pa
                + (pa + 1) * (pa + 1)
            ) % MOD

            ps = (ps + sa) % MOD

            prefix[a] += 1

        return ans % MOD

The implementation follows the sweep-line strategy directly.

prefix stores occurrences already processed, while suffix stores occurrences that remain to the right of the current position. Every iteration treats the current index as the potential middle element.

The aggregate variables pss, spp, pp, ss, and ps summarize information across all values. They are updated in constant time whenever one occurrence moves from the suffix side to the prefix side.

The main loop first counts all possible centered subsequences, then subtracts invalid frequency patterns, and finally updates the prefix-side statistics for the next iteration.

Go Solution

package main

func subsequencesWithMiddleMode(nums []int) int {
	const MOD int64 = 1_000_000_007

	c2 := func(x int64) int64 {
		if x < 2 {
			return 0
		}
		return x * (x - 1) / 2
	}

	suffix := make(map[int]int64)
	for _, v := range nums {
		suffix[v]++
	}

	prefix := make(map[int]int64)

	var pss int64
	var spp int64
	var pp int64
	var ss int64
	var ps int64

	for _, freq := range suffix {
		ss = (ss + freq*freq) % MOD
	}

	var ans int64
	n := len(nums)

	for i, a := range nums {
		sa := suffix[a]
		pa := prefix[a]

		pss = (pss + pa*(-sa*sa+(sa-1)*(sa-1))) % MOD
		spp = (spp - pa*pa) % MOD
		ss = (ss - sa*sa + (sa-1)*(sa-1)) % MOD
		ps = (ps - pa) % MOD

		suffix[a]--
		sa = suffix[a]

		left := int64(i)
		right := int64(n - i - 1)

		ans = (ans + c2(left)*c2(right)) % MOD
		ans = (ans - c2(left-pa)*c2(right-sa)) % MOD

		pss2 := (pss - pa*sa*sa) % MOD
		spp2 := (spp - sa*pa*pa) % MOD
		pp2 := (pp - pa*pa) % MOD
		ss2 := (ss - sa*sa) % MOD
		ps2 := (ps - pa*sa) % MOD

		pOther := left - pa
		sOther := right - sa

		var subtract int64

		subtract = (subtract + ps2*(pa*(right-sa))) % MOD
		subtract = (subtract - pss2*pa) % MOD
		subtract = (subtract + ps2*(sa*(left-pa))) % MOD
		subtract = (subtract - spp2*sa) % MOD
		subtract = (subtract + ((pp2-pOther)*sa*(right-sa))/2) % MOD
		subtract = (subtract + ((ss2-sOther)*pa*(left-pa))/2) % MOD

		ans = (ans - subtract) % MOD

		pss = (pss + sa*sa) % MOD
		spp = (spp + sa*(-pa*pa+(pa+1)*(pa+1))) % MOD
		pp = (pp - pa*pa + (pa+1)*(pa+1)) % MOD
		ps = (ps + sa) % MOD

		prefix[a]++
	}

	ans %= MOD
	if ans < 0 {
		ans += MOD
	}

	return int(ans)
}

The Go implementation mirrors the Python version closely. The primary difference is that all combinatorial calculations use int64 to avoid overflow. Go maps replace Python's Counter and defaultdict, and explicit modulo normalization is required because negative values may appear during intermediate subtraction steps.

Worked Examples

Example 1

Input:

[1,1,1,1,1,1]

Every length-5 subsequence equals:

[1,1,1,1,1]

There are:

$$\binom{6}{5}=6$$

such subsequences.

Value Frequency
1 5

The middle element is 1, and it is the unique mode.

Answer:

6

Example 2

Input:

[1,2,2,3,3,4]

Possible length-5 subsequences:

Chosen Subsequence Valid? Reason
[1,2,2,3,3] No 2 and 3 both appear twice
[1,2,2,3,4] Yes 2 is unique mode
[1,2,2,3,4] Yes Different indices
[1,2,3,3,4] Yes 3 is unique mode
[1,2,3,3,4] Yes Different indices
[2,2,3,3,4] No Tie between 2 and 3

Total valid subsequences:

4

Example 3

Input:

[0,1,2,3,4,5,6,7,8]

Every length-5 subsequence contains five distinct values.

Each frequency equals:

1

No value is a unique mode.

Answer:

0

Complexity Analysis

Measure Complexity Explanation
Time $O(n)$ Each index is processed once, and all updates are constant time
Space $O(n)$ Hash maps store frequencies of distinct values

The crucial observation is that the aggregate statistics eliminate the need to iterate over all distinct values for every position. Every update only changes counts associated with the current value, allowing the sweep to remain linear.

Test Cases

sol = Solution()

assert sol.subsequencesWithMiddleMode([1, 1, 1, 1, 1, 1]) == 6  # all equal
assert sol.subsequencesWithMiddleMode([1, 2, 2, 3, 3, 4]) == 4  # sample 2
assert sol.subsequencesWithMiddleMode([0, 1, 2, 3, 4, 5, 6, 7, 8]) == 0  # all distinct

assert sol.subsequencesWithMiddleMode([1, 1, 1, 1, 1]) == 1  # minimum valid size
assert sol.subsequencesWithMiddleMode([1, 2, 3, 4, 5]) == 0  # no duplicates

assert sol.subsequencesWithMiddleMode([2, 2, 2, 2, 2, 2, 2]) == 21  # C(7,5)

assert sol.subsequencesWithMiddleMode([1, 1, 2, 1, 1]) == 1  # middle is dominant

assert sol.subsequencesWithMiddleMode([1, 2, 1, 2, 1]) == 1  # unique mode frequency 3

assert sol.subsequencesWithMiddleMode([1, 2, 2, 2, 3]) == 1  # one obvious valid subsequence

assert sol.subsequencesWithMiddleMode([1, 1, 2, 2, 3, 3, 4, 4]) >= 0  # tie-heavy case

Test Summary

Test Why
[1,1,1,1,1,1] Every subsequence is valid
[1,2,2,3,3,4] Official example with ties
Distinct values No mode exists
Length exactly 5 Smallest valid input
All equal length 7 Large combinatorial count
[1,1,2,1,1] Middle value appears most often
[1,2,1,2,1] Frequency 3 versus 2
[1,2,2,2,3] Single guaranteed valid subsequence
Many paired duplicates Stress test for tie handling

Edge Cases

One important edge case occurs when all values are distinct. Every chosen subsequence has frequency distribution {1,1,1,1,1}. Since no value appears more often than the others, there is no unique mode. The implementation correctly removes these cases through the invalid-count formulas, producing zero.

Another important case occurs when all values are identical. Every length-5 subsequence contains the same value five times, making the middle value trivially the unique mode. The combinatorial counting naturally reduces to counting all length-5 subsequences, which is exactly $\binom{n}{5}$.

A third subtle case occurs when another value ties the middle value. For example, in [1,2,2,3,3], both 2 and 3 appear twice. Although the middle value is a mode, it is not a unique mode. The aggregate subtraction formulas explicitly count and remove these tie configurations, preventing overcounting.

A final edge case involves very large arrays with many distinct values. Any solution that loops over all distinct values for every index becomes quadratic and fails. The maintained aggregate sums allow the implementation to update only the current value while still accounting for every other value collectively, preserving linear complexity.