LeetCode 3297 - Count Substrings That Can Be Rearranged to Contain a String I

We are given two lowercase strings, word1 and word2. A substring of word1 is considered valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string. To understand what this means, suppose word2 = "abc".

LeetCode Problem 3297

Difficulty: 🟡 Medium
Topics: Hash Table, String, Sliding Window

Solution

LeetCode 3297 - Count Substrings That Can Be Rearranged to Contain a String I

Problem Understanding

We are given two lowercase strings, word1 and word2.

A substring of word1 is considered valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string.

To understand what this means, suppose word2 = "abc". If a substring contains at least one 'a', one 'b', and one 'c', then we can rearrange the substring so that those three characters appear first, forming the prefix "abc". Any extra characters can appear afterward.

Therefore, the ordering inside the substring does not matter. Only the character frequencies matter.

More generally, if word2 contains repeated characters, then the substring must contain at least the same frequency of every character. For example:

  • If word2 = "abc", a valid substring must contain at least:

  • 1 'a'

  • 1 'b'

  • 1 'c'

  • If word2 = "aaabc", a valid substring must contain at least:

  • 3 'a'

  • 1 'b'

  • 1 'c'

The task is to count how many substrings of word1 satisfy these frequency requirements.

The constraints are important:

  • |word1| ≤ 100000
  • |word2| ≤ 10000

A quadratic solution that examines every substring individually would be far too slow because word1 can contain one hundred thousand characters. We need a solution that runs approximately in linear time.

Important edge cases include:

  • word2 contains repeated characters.
  • word1 is shorter than word2.
  • Required characters never appear enough times.
  • Every sufficiently long substring is valid.
  • A valid window may contain many extra characters that are irrelevant to the requirements.

Approaches

Brute Force

The most direct approach is to enumerate every substring of word1.

For each starting position, we expand the ending position and maintain character frequencies. For every substring, we check whether its frequency counts satisfy all requirements from word2.

This approach is correct because every substring is examined exactly once and tested against the validity condition.

However, there are O(n²) substrings. Even if frequency checking is optimized to O(26), the overall complexity becomes O(n²), which is far too slow for n = 100000.

Key Observation

A substring is valid if and only if its frequency count contains at least the required frequency of every character in word2.

This transforms the problem into a frequency coverage problem.

Suppose we use a sliding window [left, right].

When the current window satisfies all frequency requirements, every larger window that starts at the same left and ends at or after right is also valid. Adding more characters cannot invalidate the frequency requirements.

Therefore, once a valid window is found:

  • Current window [left, right] is valid.
  • [left, right+1], [left, right+2], ..., [left, n-1] are also valid.

If n is the length of word1, then this contributes:

n - right

valid substrings.

This monotonic property allows us to use a sliding window and count many substrings at once.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(26) Checks all substrings individually
Optimal Sliding Window O(n) O(26) Counts groups of valid substrings using frequency coverage

Algorithm Walkthrough

  1. Create a frequency array required storing how many times each character appears in word2.
  2. Compute how many distinct characters actually have a positive requirement. Call this value needed.
  3. Maintain another frequency array window for the current sliding window.
  4. Maintain a variable formed representing how many required characters currently meet their required frequency exactly or beyond.
  5. Initialize:
  • left = 0
  • answer = 0
  1. Expand the window by moving right from left to right across word1.
  2. When a new character enters the window:
  • Increment its frequency in window.
  • If its frequency becomes equal to the required frequency, increment formed.
  1. Whenever formed == needed, the current window satisfies all requirements.
  2. At this point:
  • Every substring ending at positions right, right+1, ..., n-1 and starting at left is valid.
  • Add n - right to the answer.
  1. Try to shrink the window from the left:
  • Remove word1[left].
  • If removing it causes a previously satisfied requirement to become unsatisfied, decrement formed.
  • Increment left.
  1. Continue shrinking while the window remains valid. This ensures each valid starting position contributes exactly once.
  2. After processing all positions, return the accumulated answer.

Why it works

The sliding window maintains the exact frequency counts of the current substring. The variable formed tracks whether every required character frequency has been satisfied.

Whenever the window becomes valid, any extension of that window remains valid because adding characters cannot reduce frequencies. Therefore all suffix extensions ending after right are guaranteed to be valid. Counting n - right substrings at once is correct.

The shrinking phase ensures that every starting position is processed exactly once, preventing double counting. Since both pointers only move forward, the algorithm runs in linear time.

Python Solution

class Solution:
    def validSubstringCount(self, word1: str, word2: str) -> int:
        required = [0] * 26
        for ch in word2:
            required[ord(ch) - ord('a')] += 1

        needed = sum(1 for count in required if count > 0)

        window = [0] * 26
        formed = 0
        left = 0
        n = len(word1)
        answer = 0

        for right, ch in enumerate(word1):
            idx = ord(ch) - ord('a')
            window[idx] += 1

            if required[idx] > 0 and window[idx] == required[idx]:
                formed += 1

            while formed == needed:
                answer += n - right

                left_idx = ord(word1[left]) - ord('a')

                if required[left_idx] > 0 and window[left_idx] == required[left_idx]:
                    formed -= 1

                window[left_idx] -= 1
                left += 1

        return answer

The implementation begins by constructing the required frequency table from word2.

The variable needed stores the number of distinct characters that must be satisfied. For example, if word2 = "aaabc", then the required characters are 'a', 'b', and 'c', so needed = 3.

As the right pointer expands the window, frequencies are updated. Whenever a character reaches its required frequency, formed increases.

Once all requirements are satisfied (formed == needed), the current window and every extension of it are valid. Therefore n - right is added to the answer.

The left pointer then shrinks the window as much as possible while maintaining correctness. If removing a character breaks a requirement, formed is decreased and the shrinking stops.

Because each character enters and leaves the window at most once, the solution runs in linear time.

Go Solution

func validSubstringCount(word1 string, word2 string) int64 {
	required := [26]int{}

	for _, ch := range word2 {
		required[ch-'a']++
	}

	needed := 0
	for _, count := range required {
		if count > 0 {
			needed++
		}
	}

	window := [26]int{}
	formed := 0
	left := 0
	n := len(word1)

	var answer int64 = 0

	for right := 0; right < n; right++ {
		idx := word1[right] - 'a'
		window[idx]++

		if required[idx] > 0 && window[idx] == required[idx] {
			formed++
		}

		for formed == needed {
			answer += int64(n - right)

			leftIdx := word1[left] - 'a'

			if required[leftIdx] > 0 && window[leftIdx] == required[leftIdx] {
				formed--
			}

			window[leftIdx]--
			left++
		}
	}

	return answer
}

The Go implementation follows exactly the same logic as the Python version.

One important difference is the return type. The number of valid substrings can be as large as roughly n * (n + 1) / 2, which exceeds the range of a 32 bit integer when n = 100000. Therefore the answer is stored in an int64.

The fixed-size frequency arrays use Go arrays of length 26, which avoids hash map overhead and provides constant-time access.

Worked Examples

Example 1

word1 = "bcca"
word2 = "abc"

Required frequencies:

Character Required
a 1
b 1
c 1

n = 4

right char Window formed
0 b b 1
1 c bc 2
2 c bcc 2
3 a bcca 3

Window becomes valid at right = 3.

Add:

n - right = 4 - 3 = 1

Answer becomes 1.

After removing 'b', the requirement for 'b' is no longer satisfied.

Final answer:

1

Example 2

word1 = "abcabc"
word2 = "abc"

n = 6

right Window Valid? Contribution
0 a No 0
1 ab No 0
2 abc Yes 4
3 abca Yes 3
4 abcab Yes 2
5 abcabc Yes 1

The shrinking process counts all valid starting positions.

Total:

4 + 3 + 2 + 1 = 10

Answer:

10

Example 3

word1 = "abcabc"
word2 = "aaabc"

Required:

Character Required
a 3
b 1
c 1

The entire string contains only two 'a' characters.

No window can ever satisfy the requirement of three 'a' characters.

Answer:

0

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each pointer moves forward at most n times
Space O(1) Two fixed-size arrays of length 26

The left and right pointers each traverse word1 at most once. Every character enters the window once and leaves the window once. Since the alphabet contains only 26 lowercase letters, all frequency operations are constant time and the extra memory usage is constant.

Test Cases

sol = Solution()

assert sol.validSubstringCount("bcca", "abc") == 1      # problem example 1
assert sol.validSubstringCount("abcabc", "abc") == 10  # problem example 2
assert sol.validSubstringCount("abcabc", "aaabc") == 0 # problem example 3

assert sol.validSubstringCount("a", "a") == 1          # smallest valid case
assert sol.validSubstringCount("a", "b") == 0          # impossible requirement

assert sol.validSubstringCount("aaaa", "aa") == 6      # repeated character requirement
assert sol.validSubstringCount("aaaa", "aaaa") == 1    # whole string only

assert sol.validSubstringCount("abcd", "abcd") == 1    # exact match
assert sol.validSubstringCount("abcd", "abcde") == 0   # word2 longer than word1

assert sol.validSubstringCount("zzzzzz", "z") == 21    # every substring valid

assert sol.validSubstringCount("abca", "aa") == 1      # need two copies of a

assert sol.validSubstringCount("ababab", "aab") == 8   # repeated requirements

assert sol.validSubstringCount("xyz", "abc") == 0      # no required letters present

assert sol.validSubstringCount("abcde", "e") == 5      # all substrings ending at e

Test Summary

Test Why
"bcca", "abc" Official example
"abcabc", "abc" Official example with many valid windows
"abcabc", "aaabc" Official impossible case
"a", "a" Minimum valid input
"a", "b" Minimum invalid input
"aaaa", "aa" Repeated character requirement
"aaaa", "aaaa" Entire string needed
"abcd", "abcd" Exact frequency match
"abcd", "abcde" Required length exceeds available length
"zzzzzz", "z" Every substring valid
"abca", "aa" Need multiple copies of one character
"ababab", "aab" Mixed repeated requirements
"xyz", "abc" Missing required characters
"abcde", "e" Single-character target

Edge Cases

word2 Contains Repeated Characters

A common mistake is checking only whether a character appears, rather than whether it appears enough times.

For example:

word1 = "abca"
word2 = "aa"

The substring must contain two copies of 'a'. A window containing only one 'a' is not valid. The implementation stores exact frequency requirements and only marks a character as satisfied when the required count is reached.

word1 Is Shorter Than word2

Consider:

word1 = "abc"
word2 = "abcd"

No substring can possibly contain all required characters because even the entire string is too short. The sliding window never reaches a valid state, so the answer remains zero.

Required Characters Never Reach Their Target Frequency

Consider:

word1 = "abcabc"
word2 = "aaabc"

The string contains only two 'a' characters, but three are required. The window never satisfies all requirements, so formed never reaches needed. The algorithm naturally returns zero without any special-case logic.

Every Extension Remains Valid

Consider:

word1 = "zzzzzz"
word2 = "z"

As soon as a window contains one 'z', every larger window starting at the same position is also valid. The algorithm exploits this property by adding n - right at once, avoiding the need to enumerate all larger windows individually. This observation is the key to achieving linear time complexity.