LeetCode 2953 - Count Complete Substrings

Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window

Solution

Problem Understanding

The problem asks us to count how many substrings of a given string are considered "complete".

A substring is complete if it satisfies two conditions simultaneously:

Every distinct character in the substring appears exactly k times.
For every pair of adjacent characters, the absolute difference between their alphabetical positions is at most 2.

The second condition means that for every adjacent pair:

abs(ord(s[i]) - ord(s[i - 1])) <= 2

For example:

'a' and 'c' are valid neighbors because the difference is 2
'a' and 'd' are invalid neighbors because the difference is 3

The input consists of:

word, a lowercase English string
k, the exact frequency every character must have inside a valid substring

The output is the total number of substrings satisfying both requirements.

The constraints are important:

1 <= word.length <= 100000

This immediately rules out expensive substring enumeration approaches such as checking every substring independently in quadratic or cubic time. We need something close to linear time.

There are two major observations hidden inside the problem:

First, the adjacency condition effectively partitions the string into independent segments. Whenever two adjacent characters differ by more than 2, no valid substring can cross that boundary.

Second, if a substring contains u distinct characters, then its length must be exactly:

u * k

because each of the u characters appears exactly k times.

Since there are only 26 lowercase letters, the number of possible distinct character counts is very small, which makes sliding window enumeration feasible.

Important edge cases include:

Very short strings
k = 1, where every distinct character only needs to appear once
Strings where every adjacent pair violates the adjacency rule
Strings with all identical characters
Windows containing too many copies of one character
Multiple overlapping valid substrings

The problem guarantees only lowercase English letters, which allows us to use fixed-size frequency arrays of length 26.

Approaches

Brute Force Approach

A straightforward solution is to generate every possible substring and verify whether it is complete.

For each substring:

Count the frequency of every character.
Verify that every occurring character appears exactly k times.
Verify that every adjacent pair differs by at most 2.

If the substring satisfies both conditions, increment the answer.

This approach is correct because it explicitly checks the definition of a complete substring.

However, the complexity is far too large. There are O(n^2) substrings, and validating each one may take O(n) time, leading to O(n^3) complexity in the worst case.

Even with prefix optimizations, checking all substrings remains too expensive for n = 100000.

Key Insight

The crucial optimization comes from combining two observations.

The first observation is that the adjacency condition creates natural segment boundaries. If:

abs(word[i] - word[i - 1]) > 2

then no valid substring can include both positions. Therefore, we can split the string into maximal valid segments and process each segment independently.

The second observation is that a valid substring with u distinct characters must have length:

u * k

Since there are only 26 lowercase letters, we only need to try window sizes:

k, 2k, 3k, ..., 26k

For each fixed window size, we use a sliding window with frequency counting.

A window is valid if:

Every present character appears exactly k times
The number of characters with frequency k equals the number of distinct characters expected

This reduces the complexity dramatically.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n³)	O(26)	Enumerates and validates every substring
Optimal	O(26 × n)	O(26)	Sliding window on valid segments

Algorithm Walkthrough

Step 1: Split the string into valid segments

Traverse the string and split it whenever adjacent characters differ by more than 2.

For example:

word = "abcfgh"

Between 'c' and 'f':

abs('f' - 'c') = 3

So the string becomes:

["abc", "fgh"]

No valid substring can cross this boundary.

Step 2: Process each segment independently

Suppose a segment has length m.

Inside this segment, try every possible number of distinct characters:

distinct_count = 1 to 26

The corresponding window length is:

window_size = distinct_count * k

If the window size exceeds the segment length, stop.

Step 3: Use a sliding window

Maintain:

A frequency array of size 26
The number of characters currently appearing exactly k times

Expand the window one character at a time.

When adding a character:

If its old frequency was k, decrement the matched counter
Increment frequency
If its new frequency becomes k, increment the matched counter

Do the same carefully when removing a character.

Step 4: Check window validity

A window is complete if:

Its length is exactly distinct_count * k
Exactly distinct_count characters appear k times

Since the total window size already equals distinct_count * k, this automatically guarantees no extra frequencies exist.

Step 5: Accumulate the answer

Count all valid windows across all segments and return the total.

Why it works

The algorithm works because every complete substring must lie entirely inside one valid adjacency segment.

For each segment, every valid substring must have a length equal to:

(number of distinct characters) * k

The sliding window enumerates all such candidate windows exactly once.

The frequency bookkeeping guarantees that a window is counted only when every occurring character appears exactly k times.

Therefore, every valid substring is counted exactly once, and no invalid substring is included.

Python Solution

class Solution:
    def countCompleteSubstrings(self, word: str, k: int) -> int:
        n = len(word)

        def count_in_segment(segment: str) -> int:
            m = len(segment)
            total = 0

            for distinct_count in range(1, 27):
                window_size = distinct_count * k

                if window_size > m:
                    break

                freq = [0] * 26
                matched = 0
                left = 0

                for right in range(m):
                    idx = ord(segment[right]) - ord('a')

                    if freq[idx] == k:
                        matched -= 1

                    freq[idx] += 1

                    if freq[idx] == k:
                        matched += 1

                    if right - left + 1 > window_size:
                        remove_idx = ord(segment[left]) - ord('a')

                        if freq[remove_idx] == k:
                            matched -= 1

                        freq[remove_idx] -= 1

                        if freq[remove_idx] == k:
                            matched += 1

                        left += 1

                    if (
                        right - left + 1 == window_size
                        and matched == distinct_count
                    ):
                        total += 1

            return total

        answer = 0
        start = 0

        for i in range(1, n):
            if abs(ord(word[i]) - ord(word[i - 1])) > 2:
                answer += count_in_segment(word[start:i])
                start = i

        answer += count_in_segment(word[start:])

        return answer

The implementation begins by splitting the original string into maximal segments where every adjacent pair satisfies the difference constraint.

The helper function count_in_segment handles one independent segment. Since every valid substring inside the segment must have length distinct_count * k, the function iterates through all possible distinct character counts from 1 to 26.

For each possible window size, a sliding window is maintained using a fixed-size frequency array.

The variable matched tracks how many characters currently appear exactly k times. This allows constant-time validity checks.

When the window grows beyond the target size, the leftmost character is removed and the bookkeeping is updated carefully.

Whenever the window size matches the required size and exactly distinct_count characters have frequency k, the substring is complete and contributes to the answer.

Finally, the counts from all segments are summed together.

Go Solution

func countCompleteSubstrings(word string, k int) int {
	n := len(word)

	countInSegment := func(segment string) int {
		m := len(segment)
		total := 0

		for distinctCount := 1; distinctCount <= 26; distinctCount++ {
			windowSize := distinctCount * k

			if windowSize > m {
				break
			}

			freq := make([]int, 26)
			matched := 0
			left := 0

			for right := 0; right < m; right++ {
				idx := int(segment[right] - 'a')

				if freq[idx] == k {
					matched--
				}

				freq[idx]++

				if freq[idx] == k {
					matched++
				}

				if right-left+1 > windowSize {
					removeIdx := int(segment[left] - 'a')

					if freq[removeIdx] == k {
						matched--
					}

					freq[removeIdx]--

					if freq[removeIdx] == k {
						matched++
					}

					left++
				}

				if right-left+1 == windowSize && matched == distinctCount {
					total++
				}
			}
		}

		return total
	}

	answer := 0
	start := 0

	for i := 1; i < n; i++ {
		diff := int(word[i]) - int(word[i-1])

		if diff < 0 {
			diff = -diff
		}

		if diff > 2 {
			answer += countInSegment(word[start:i])
			start = i
		}
	}

	answer += countInSegment(word[start:])

	return answer
}

The Go implementation follows the same logic as the Python version.

The frequency table is implemented using a fixed integer slice of length 26.

Since Go does not provide a built-in absolute value function for integers, the adjacent character difference is computed manually.

The sliding window logic and frequency bookkeeping are otherwise identical.

Because the maximum substring length is at most 100000, standard Go integers are sufficient and overflow is not a concern.

Worked Examples

Example 1

word = "igigee"
k = 2

First, check adjacency differences:

i-g = 2
g-i = 2
i-g = 2
g-e = 2
e-e = 0

All are valid, so the entire string is one segment:

"igigee"

Now try different distinct counts.

Distinct Count = 1

Window size:

1 * 2 = 2

Sliding windows:

Window	Frequencies	Valid
"ig"	i:1 g:1	No
"gi"	g:1 i:1	No
"ig"	i:1 g:1	No
"ge"	g:1 e:1	No
"ee"	e:2	Yes

Count = 1

Distinct Count = 2

Window size:

2 * 2 = 4

Window	Frequencies	Valid
"igig"	i:2 g:2	Yes
"gige"	g:2 i:1 e:1	No
"igee"	i:1 g:1 e:2	No

Count = 1

Distinct Count = 3

Window size:

3 * 2 = 6

Window	Frequencies	Valid
"igigee"	i:2 g:2 e:2	Yes

Count = 1

Final answer:

1 + 1 + 1 = 3

Example 2

word = "aaabbbccc"
k = 3

All adjacent differences are valid.

Distinct Count = 1

Window size:

Valid windows:

Window	Valid
"aaa"	Yes
"aab"	No
"abb"	No
"bbb"	Yes
"bbc"	No
"bcc"	No
"ccc"	Yes

Count = 3

Distinct Count = 2

Window size:

Window	Frequencies	Valid
"aaabbb"	a:3 b:3	Yes
"aabbbc"	a:2 b:3 c:1	No
"abbbcc"	a:1 b:3 c:2	No
"bbbccc"	b:3 c:3	Yes

Count = 2

Distinct Count = 3

Window size:

Window	Frequencies	Valid
"aaabbbccc"	a:3 b:3 c:3	Yes

Count = 1

Final answer:

3 + 2 + 1 = 6

Complexity Analysis

Measure	Complexity	Explanation
Time	O(26 × n)	Each segment is processed for at most 26 window sizes
Space	O(26)	Frequency array uses constant space

The outer loop iterates over at most 26 distinct character counts. For each count, the sliding window scans the segment linearly.

Since all segments together contain exactly n characters, the total complexity becomes:

O(26 * n)

which simplifies to linear time.

The frequency array always contains 26 integers, so the space usage is constant.

Test Cases

sol = Solution()

assert sol.countCompleteSubstrings("igigee", 2) == 3
# Basic example with overlapping valid substrings

assert sol.countCompleteSubstrings("aaabbbccc", 3) == 6
# Multiple valid window sizes

assert sol.countCompleteSubstrings("aaaa", 2) == 3
# Repeated identical character

assert sol.countCompleteSubstrings("abcd", 1) == 10
# Every substring valid because all frequencies are 1

assert sol.countCompleteSubstrings("az", 1) == 2
# Adjacent difference too large, only single letters valid

assert sol.countCompleteSubstrings("a", 1) == 1
# Smallest possible input

assert sol.countCompleteSubstrings("abc", 2) == 0
# Window cannot satisfy exact frequency requirement

assert sol.countCompleteSubstrings("aabbcc", 2) == 6
# Multiple distinct-count configurations

assert sol.countCompleteSubstrings("zzz", 1) == 3
# Every single character valid, larger windows invalid

assert sol.countCompleteSubstrings("abac", 1) == 8
# Distinct single-occurrence windows

Test Summary

Test	Why
`"igigee", 2`	Verifies overlapping complete substrings
`"aaabbbccc", 3`	Verifies multiple valid window sizes
`"aaaa", 2`	Tests repeated single-character windows
`"abcd", 1`	Tests all substrings becoming valid
`"az", 1`	Tests adjacency splitting
`"a", 1`	Tests minimum input size
`"abc", 2`	Tests impossible frequency requirement
`"aabbcc", 2`	Tests multiple valid groupings
`"zzz", 1`	Tests identical-character edge case
`"abac", 1`	Tests windows with all unique characters

Edge Cases

One important edge case occurs when adjacent characters differ by more than 2. For example:

word = "az"

The substring "az" is invalid because:

abs('a' - 'z') = 25

A naive implementation might still consider windows crossing this boundary. The segmentation step prevents this by splitting the string into independent regions.

Another tricky edge case is when the string contains only one repeated character, such as:

word = "aaaaaa"

Here, multiple overlapping windows may satisfy the exact frequency requirement. The sliding window correctly counts every valid occurrence independently.

A third important case occurs when k = 1. In this scenario, every character inside a valid substring must appear exactly once. The algorithm still works because the window size becomes equal to the number of distinct characters, and the frequency tracking naturally enforces uniqueness.

A subtle bug source is handling frequency transitions around k. When a character count changes from k to k + 1, it must stop contributing to the matched counter. Similarly, when removing a character changes its count back to k, it must contribute again. The implementation carefully updates matched before and after every frequency modification to maintain correctness.