LeetCode 3934 - Smallest Unique Subarray

The problem asks us to find the smallest possible length of a contiguous subarray such that there exists at least one subarray of that length that appears exactly once in the entire array.

LeetCode Problem 3934

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Binary Search, Rolling Hash, Suffix Array, Hash Function

Solution

Problem Understanding

The problem asks us to find the smallest possible length of a contiguous subarray such that there exists at least one subarray of that length that appears exactly once in the entire array.

In other words, for a fixed length L, we look at every subarray of size L and count how many times each distinct sequence occurs. If at least one of those sequences occurs exactly once, then L is considered valid. Our goal is to find the minimum such valid L.

The input is a single integer array nums of size up to 10^5, and each element is an integer up to 10^5. The output is a single integer representing the smallest length satisfying the uniqueness condition.

The constraints imply that we cannot afford naive substring enumeration with deep comparisons. A naive O(n^3) approach would be far too slow, and even O(n^2) per length would be too large. This strongly suggests that we need hashing, prefix preprocessing, or a binary search optimization.

Edge cases include arrays of size 1, arrays where all elements are identical, and arrays where all elements are distinct. These cases test whether the solution correctly handles maximal repetition and maximal uniqueness scenarios.

Approaches

Brute Force Approach

The brute force approach considers every possible subarray length L from 1 to n. For each L, it generates all subarrays of that length, compares them pairwise, and counts occurrences. If any subarray appears exactly once, we record L as a candidate answer.

This works because it directly enforces the definition of the problem, but it is too slow. There are O(n) lengths, O(n) subarrays per length, and comparing subarrays costs O(L), leading to a cubic worst-case complexity.

Optimal Approach: Binary Search + Rolling Hash

The key insight is that we can efficiently test whether a given length L is valid by checking if any subarray of length L appears exactly once. We can compute hashes for all subarrays of length L in O(n) using a rolling hash and store their frequencies in a hash map.

We then use binary search over L because the property is monotonic in practice for this problem: if there exists a unique subarray of length L, then increasing the length generally reduces repetition, making uniqueness easier to achieve. Thus, we can find the minimum valid L efficiently.

To avoid recomputing subarray values, we use prefix rolling hashes so each subarray hash can be computed in O(1).

Complexity Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(1) Compare all subarrays directly
Optimal (Binary Search + Rolling Hash) O(n log n) O(n) Efficient frequency counting using hashing

Algorithm Walkthrough

  1. We define a helper function exists_unique(L) that determines whether there is at least one subarray of length L that appears exactly once. This function is the core check used by binary search.
  2. Inside exists_unique(L), we compute prefix hashes of the array using a polynomial rolling hash. This allows us to compute any subarray hash in constant time, which is critical for efficiency.
  3. We iterate over all subarrays of length L using a sliding window. For each window, we compute its hash and store its frequency in a hash map.
  4. After processing all subarrays of length L, we scan the frequency map to check if any hash has frequency exactly equal to 1. If such a hash exists, we return true.
  5. We perform binary search on L from 1 to n. For each midpoint mid, we check exists_unique(mid). If true, we attempt smaller lengths; otherwise, we increase the length.
  6. The final answer is the smallest L for which exists_unique(L) returns true.

Why it works

The correctness relies on the fact that subarray uniqueness becomes easier to achieve as length increases because longer sequences have fewer opportunities to repeat identically. Combined with exact frequency counting using hashing, this ensures that once a valid length is found, we can safely search for smaller valid lengths without missing the true minimum.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def smallestUniqueSubarray(self, nums: List[int]) -> int:
        n = len(nums)
        base = 91138233
        mod = 10**9 + 7

        prefix = [0] * (n + 1)
        power = [1] * (n + 1)

        for i in range(n):
            prefix[i + 1] = (prefix[i] * base + nums[i]) % mod
            power[i + 1] = (power[i] * base) % mod

        def get_hash(l: int, r: int) -> int:
            return (prefix[r] - prefix[l] * power[r - l]) % mod

        def exists_unique(L: int) -> bool:
            freq = defaultdict(int)
            for i in range(n - L + 1):
                h = get_hash(i, i + L)
                freq[h] += 1
            for v in freq.values():
                if v == 1:
                    return True
            return False

        left, right = 1, n
        answer = n

        while left <= right:
            mid = (left + right) // 2
            if exists_unique(mid):
                answer = mid
                right = mid - 1
            else:
                left = mid + 1

        return answer

Implementation Explanation

We first build prefix hashes and power arrays to allow O(1) subarray hash computation. The get_hash function extracts a normalized hash for any subarray.

The exists_unique function computes frequencies of all subarrays of a fixed length L. It uses a hash map to count occurrences and then checks if any frequency equals 1.

Finally, binary search is used to minimize L, repeatedly refining the search space based on whether a valid unique subarray exists.

Go Solution

package main

func smallestUniqueSubarray(nums []int) int {
	n := len(nums)
	base := int64(91138233)
	mod := int64(1000000007)

	prefix := make([]int64, n+1)
	power := make([]int64, n+1)
	power[0] = 1

	for i := 0; i < n; i++ {
		prefix[i+1] = (prefix[i]*base + int64(nums[i])) % mod
		power[i+1] = (power[i] * base) % mod
	}

	getHash := func(l, r int) int64 {
		val := (prefix[r] - prefix[l]*power[r-l]) % mod
		if val < 0 {
			val += mod
		}
		return val
	}

	existsUnique := func(L int) bool {
		freq := make(map[int64]int)
		for i := 0; i+L <= n; i++ {
			h := getHash(i, i+L)
			freq[h]++
		}
		for _, v := range freq {
			if v == 1 {
				return true
			}
		}
		return false
	}

	left, right := 1, n
	ans := n

	for left <= right {
		mid := (left + right) / 2
		if existsUnique(mid) {
			ans = mid
			right = mid - 1
		} else {
			left = mid + 1
		}
	}

	return ans
}

Go-Specific Notes

Go requires explicit handling of modular arithmetic underflow, so we normalize negative hash values. We also use map[int64]int for frequency counting instead of a Python dictionary. Slice initialization for prefix and power arrays is explicit, and integer types are fixed-width int64 to avoid overflow during multiplication.

Worked Examples

Example 1: nums = [3,3,3]

For L = 1, subarrays are [3] occurring 3 times, so none are unique. For L = 2, [3,3] occurs twice, so none are unique. For L = 3, [3,3,3] occurs once, so L = 3 is valid and returned.

L Subarrays Frequencies Unique Exists
1 [3],[3],[3] 3 No
2 [3,3],[3,3] 2 No
3 [3,3,3] 1 Yes

Example 2: nums = [2,1,2,3,3]

For L = 1, subarrays are [2],[1],[2],[3],[3]. The value [1] occurs once, so L = 1 is immediately valid.

Example 3: nums = [1,1,2,2,1]

For L = 1, no value occurs exactly once. For L = 2, all subarrays [1,1],[1,2],[2,2],[2,1] occur once each, so L = 2 is valid.

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Binary search over L, each check scans all subarrays using O(1) hashing
Space O(n) Prefix arrays and hash map for frequency counting

The logarithmic factor comes from binary search over possible lengths, and each feasibility check runs in linear time over the array using precomputed rolling hashes.

Test Cases

assert Solution().smallestUniqueSubarray([3,3,3]) == 3  # all duplicates until full length
assert Solution().smallestUniqueSubarray([2,1,2,3,3]) == 1  # single unique element exists
assert Solution().smallestUniqueSubarray([1,1,2,2,1]) == 2  # first unique subarrays appear at length 2

assert Solution().smallestUniqueSubarray([1]) == 1  # single element
assert Solution().smallestUniqueSubarray([1,2,3,4]) == 1  # all elements unique
assert Solution().smallestUniqueSubarray([5,5,5,5]) == 4  # only full array unique
assert Solution().smallestUniqueSubarray([1,2,1,2,1]) == 2  # mixed repetition
Test Why
[3,3,3] tests full repetition
[2,1,2,3,3] early unique element
[1,1,2,2,1] uniqueness emerges at L=2
[1] minimal size array
[1,2,3,4] all singletons
[5,5,5,5] uniqueness only at max length
[1,2,1,2,1] overlapping patterns

Edge Cases

One important edge case is when the array has length 1. In this case, the only subarray is the array itself, which is trivially unique, so the answer is always 1. The implementation correctly handles this because the binary search range starts at 1 and immediately succeeds.

Another edge case is when all elements are identical. In this scenario, no subarray of length less than n will be unique because every subarray of a given length is identical to others. The algorithm correctly increases L until it reaches n.

A third edge case is when all elements are distinct. Here, every subarray of length 1 is unique, so the answer should be 1. The frequency check correctly identifies singleton occurrences at the smallest length.

A final subtle case involves repeated patterns such as alternating sequences. These can produce multiple overlapping identical subarrays, and correctness depends entirely on accurate hashing and frequency counting, which the rolling hash ensures.