LeetCode 3329 - Count Substrings With K-Frequency Characters II

The problem asks us to count how many substrings of a given string s contain at least one character that appears at least k times within that substring. A substring is a contiguous portion of the string.

LeetCode Problem 3329

Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window

Solution

LeetCode 3329 - Count Substrings With K-Frequency Characters II

Problem Understanding

The problem asks us to count how many substrings of a given string s contain at least one character that appears at least k times within that substring.

A substring is a contiguous portion of the string. For every possible substring, we need to determine whether there exists some character whose frequency inside that substring is greater than or equal to k.

For example, consider:

s = "abacb", k = 2

The substring "aba" is valid because 'a' appears twice. The substring "bac" is not valid because no character appears at least twice.

The input consists of:

  • A string s containing only lowercase English letters.
  • An integer k.

The output is the total number of valid substrings.

The constraints are extremely important:

  • 1 <= s.length <= 3 * 10^5
  • The string can be very large.
  • An O(n^2) solution will be far too slow because there are roughly 9 * 10^10 substrings when n = 3 * 10^5.

This immediately tells us we need a near linear-time solution, typically O(n) or O(n log n).

There are several important edge cases:

  • k = 1, every substring is automatically valid because every substring contains at least one character appearing once.
  • k > frequency of every character in the entire string, no substring is valid.
  • Strings with all identical characters, where almost every sufficiently long substring becomes valid.
  • Very short strings such as length 1.

Approaches

Brute Force Approach

A straightforward solution is to generate every possible substring and count character frequencies for each substring.

For every starting index i, we extend the substring one character at a time toward the right. We maintain character counts and check whether any character frequency has reached k.

This works because every substring is explicitly examined, so no valid substring can be missed.

However, the complexity is far too high.

There are O(n^2) substrings. Even if frequency updates are efficient, checking all substrings for n = 3 * 10^5 is impossible within time limits.

Key Insight for the Optimal Solution

Instead of counting valid substrings directly, we can use a sliding window to efficiently determine how many substrings ending at a certain position are valid.

The crucial observation is:

Once a window contains a character with frequency at least k, every larger window extending further right is also valid.

This monotonic property makes sliding window applicable.

We maintain:

  • A left pointer
  • A right pointer
  • Frequency counts inside the current window

As we expand the right boundary, we track whether any character frequency has reached k.

When the condition becomes true, we know:

  • Every substring starting at the current left position and ending at or beyond the current right position is valid.

We then shrink the left side while maintaining validity.

This gives a linear-time solution because each pointer moves at most n times.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(1) Examines every substring individually
Optimal Sliding Window O(n) O(1) Uses two pointers and frequency counting

Algorithm Walkthrough

Optimal Sliding Window Algorithm

  1. Initialize two pointers, left and right, both starting at 0.
  2. Maintain a frequency array of size 26 because the string contains only lowercase English letters.
  3. Expand the window by moving right forward one character at a time.
  4. For each new character added to the window:
  • Increase its frequency count.
  • Check whether its frequency becomes equal to k.
  1. Once some character frequency reaches at least k, the current window becomes valid.
  2. At this point:
  • Every substring starting at left and ending anywhere from right to n - 1 is valid.
  • There are exactly n - right such substrings.
  • Add this amount to the answer.
  1. Then shrink the window from the left:
  • Remove s[left] from the frequency table.
  • Increment left.
  • Continue shrinking while the window still contains a character with frequency at least k.
  1. Continue expanding right until the entire string has been processed.

Why it works

The sliding window maintains the smallest valid window for the current right boundary.

Whenever the window becomes valid, every extension of that window to the right also remains valid because adding more characters cannot reduce frequencies.

Thus, when the window first satisfies the condition at a particular right, all substrings extending further right are guaranteed valid and can be counted immediately.

Each character enters and leaves the window at most once, which guarantees linear complexity.

Python Solution

class Solution:
    def numberOfSubstrings(self, s: str, k: int) -> int:
        n = len(s)

        if k == 1:
            return n * (n + 1) // 2

        freq = [0] * 26

        left = 0
        answer = 0

        for right in range(n):
            index = ord(s[right]) - ord('a')
            freq[index] += 1

            while freq[index] >= k:
                answer += n - right

                left_index = ord(s[left]) - ord('a')
                freq[left_index] -= 1
                left += 1

        return answer

The implementation uses a fixed-size frequency array because the string contains only lowercase English letters. This is more efficient than a hash map.

The variable right expands the sliding window. Each iteration adds one new character into the current window.

When the newly added character reaches frequency k, the window becomes valid. At that moment, every substring extending further right is also valid, so we add n - right to the answer.

The inner while loop shrinks the window from the left while the condition remains true. This ensures we count every valid starting position exactly once.

The algorithm relies on the fact that frequencies only increase when expanding and only decrease when shrinking, which preserves the sliding window invariant.

Go Solution

func numberOfSubstrings(s string, k int) int64 {
	n := len(s)

	if k == 1 {
		return int64(n * (n + 1) / 2)
	}

	freq := make([]int, 26)

	left := 0
	var answer int64 = 0

	for right := 0; right < n; right++ {
		index := int(s[right] - 'a')
		freq[index]++

		for freq[index] >= k {
			answer += int64(n - right)

			leftIndex := int(s[left] - 'a')
			freq[leftIndex]--
			left++
		}
	}

	return answer
}

The Go implementation follows exactly the same logic as the Python version.

The main difference is integer handling. Since the number of substrings can be as large as approximately 4.5 * 10^10, the answer must use int64.

The frequency array is implemented using a slice of length 26.

Go strings are byte indexed, so character conversion uses:

int(s[right] - 'a')

which directly maps lowercase letters to indices 0 through 25.

Worked Examples

Example 1

s = "abacb"
k = 2

Step-by-step Trace

Right Character Window Frequencies Valid? Action Answer
0 a "a" a=1 No Expand 0
1 b "ab" a=1,b=1 No Expand 0
2 a "aba" a=2,b=1 Yes Add 5-2=3 3
"ba" a=1,b=1 No Shrink 3
3 c "bac" a=1,b=1,c=1 No Expand 3
4 b "bacb" a=1,b=2,c=1 Yes Add 5-4=1 4
"acb" a=1,b=1,c=1 No Shrink 4

Final answer:

4

The valid substrings are:

"aba"
"abac"
"abacb"
"bacb"

Example 2

s = "abcde"
k = 1

Since every character already appears once, every substring is valid.

Total substrings:

$\frac{n(n+1)}{2}$

For n = 5:

5 * 6 / 2 = 15

Answer:

15

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each pointer moves at most n times
Space O(1) Frequency array size is fixed at 26

The sliding window guarantees linear processing because each character is inserted into the window once and removed once.

The frequency structure never grows beyond 26 entries because the alphabet is fixed.

Test Cases

sol = Solution()

assert sol.numberOfSubstrings("abacb", 2) == 4  # provided example
assert sol.numberOfSubstrings("abcde", 1) == 15  # every substring valid

assert sol.numberOfSubstrings("a", 1) == 1  # single character valid
assert sol.numberOfSubstrings("a", 2) == 0  # impossible frequency

assert sol.numberOfSubstrings("aaaa", 2) == 6  # many overlapping valid substrings
assert sol.numberOfSubstrings("aaaa", 4) == 1  # only full string valid
assert sol.numberOfSubstrings("aaaa", 5) == 0  # impossible requirement

assert sol.numberOfSubstrings("abcabc", 2) == 6  # repeated characters

assert sol.numberOfSubstrings("abab", 2) == 3  # multiple valid windows

assert sol.numberOfSubstrings("zzzzzz", 1) == 21  # all substrings valid
assert sol.numberOfSubstrings("zzzzzz", 6) == 1  # only entire string valid

assert sol.numberOfSubstrings("abcdefghijklmnopqrstuvwxyz", 2) == 0  # no repeats

assert sol.numberOfSubstrings("aaabbb", 3) == 7  # multiple qualifying groups

Test Case Summary

Test Why
"abacb", 2 Basic mixed example
"abcde", 1 Every substring valid
"a", 1 Smallest valid input
"a", 2 Smallest impossible case
"aaaa", 2 Heavy repetition
"aaaa", 4 Only one valid substring
"aaaa", 5 Requirement exceeds length
"abcabc", 2 Repeated pattern
"abab", 2 Multiple overlapping valid windows
"zzzzzz", 1 All substrings counted
"zzzzzz", 6 Entire string only
"abcdefghijklmnopqrstuvwxyz", 2 No valid substrings
"aaabbb", 3 Multiple qualifying regions

Edge Cases

Case 1: k = 1

This is an important special case because every substring automatically satisfies the condition. A naive sliding window still works, but it performs unnecessary processing.

The implementation handles this immediately using the substring count formula:

$\frac{n(n+1)}{2}$

This avoids extra computation and simplifies execution.

Case 2: No Character Can Ever Reach k

Consider:

s = "abcde"
k = 2

No character repeats, so no valid substring exists.

A buggy implementation might incorrectly count windows during expansion. Our implementation only counts substrings when some frequency explicitly reaches k, so the answer correctly remains 0.

Case 3: All Characters Are Identical

Consider:

s = "aaaaaa"
k = 3

Many overlapping substrings are valid. This stresses whether the shrinking logic correctly counts all starting positions.

The sliding window works because each time the window satisfies the condition, it counts all future extensions immediately and then shrinks carefully to avoid duplicate counting.

Case 4: Very Large Input Size

The maximum string length is 3 * 10^5, which makes quadratic algorithms impossible.

The implementation guarantees linear performance because each pointer moves monotonically from left to right and never backtracks.