LeetCode 1371 - Find the Longest Substring Containing Vowels in Even Counts
This problem asks us to find the length of the longest contiguous substring in which every vowel appears an even number
Difficulty: 🟡 Medium
Topics: Hash Table, String, Bit Manipulation, Prefix Sum
Solution
LeetCode 1371 - Find the Longest Substring Containing Vowels in Even Counts
Problem Understanding
This problem asks us to find the length of the longest contiguous substring in which every vowel appears an even number of times. The vowels we care about are:
aeiou
A vowel may appear zero times, and zero is considered even. Therefore, a substring with no vowels at all is valid.
The input is a string s consisting only of lowercase English letters. We must return the maximum possible length of a substring satisfying the condition.
The important detail is that we only care whether each vowel appears an even or odd number of times. The exact count does not matter beyond its parity.
For example, if a substring contains:
etwiceifour timesozero times
then those vowels are valid because all counts are even.
The constraints are large:
1 <= s.length <= 5 * 10^5
This immediately tells us that quadratic algorithms will likely time out. An O(n^2) solution would require checking billions of substrings in the worst case. We need a linear or near-linear solution.
Several edge cases are important:
- A string with no vowels at all, such as
"bcbcbc", should return the entire string length. - A string where no non-empty valid substring exists beyond single consonants.
- Repeated toggling of vowel parity, where vowels appear many times.
- Very large inputs, where efficiency is critical.
- Prefixes themselves may already satisfy the condition.
The problem guarantees only lowercase English letters, so we do not need to handle uppercase characters or non-alphabetic symbols.
Approaches
Brute Force Approach
The most straightforward solution is to examine every possible substring.
For each substring:
- Count the occurrences of each vowel.
- Check whether every vowel count is even.
- If valid, update the maximum length.
A naive implementation would generate all O(n^2) substrings, and counting vowels inside each substring would take O(n) time, resulting in O(n^3) complexity.
We can improve this slightly using prefix frequency arrays so each substring check becomes O(1), reducing the total complexity to O(n^2). However, with n = 5 * 10^5, even O(n^2) is far too slow.
The brute force method is correct because it explicitly checks every possible substring, but it is not practical for the input size.
Optimal Approach
The key insight is that we only care about whether each vowel count is even or odd.
Each vowel can have two states:
- even
- odd
There are five vowels, so the total number of possible parity states is:
$$2^5 = 32$$
We can represent the parity state using a 5-bit bitmask:
- bit 0 for
a - bit 1 for
e - bit 2 for
i - bit 3 for
o - bit 4 for
u
As we scan the string:
- encountering a vowel toggles its corresponding bit
- consonants leave the state unchanged
If the same bitmask appears at two different indices, then the substring between them has even counts for all vowels.
Why?
Because the parity changes cancel out. If the prefix parity state at index i equals the prefix parity state at index j, then the substring (i+1 ... j) must contain an even number of each vowel.
This transforms the problem into finding the largest distance between two identical prefix states.
We store the earliest index where each state first appears. Whenever we see the same state again, we compute the substring length.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) or O(n) | Checks every substring explicitly |
| Optimal | O(n) | O(1) | Uses bitmask parity states and prefix tracking |
Algorithm Walkthrough
- Create a bitmask variable called
state, initially set to0.
A state of 0 means all vowels currently have even counts.
2. Create a hash map called first_seen.
This map stores the earliest index where each parity state appeared.
Initialize it with:
{0: -1}
This handles cases where a valid substring starts from index 0.
3. Define a mapping from vowels to bit positions.
For example:
a -> 0e -> 1i -> 2o -> 3u -> 4
- Traverse the string character by character.
- For each character:
- If it is a vowel, toggle its corresponding bit using XOR.
- If it is a consonant, do nothing.
Example:
state ^= (1 << vowel_index)
- After updating the state:
- If this state has appeared before, compute the substring length from the first occurrence to the current index.
- Otherwise, store the current index as the first occurrence of this state.
- Keep track of the maximum substring length found.
- Return the maximum length after processing the entire string.
Why it works
The algorithm relies on prefix parity states.
Suppose two prefixes have the same parity state. This means every vowel has changed parity an even number of times between those two positions. Therefore, the substring between them contains even counts of all vowels.
By storing the earliest occurrence of each state, we maximize the substring length whenever the same state reappears.
Since there are only 32 possible states, the algorithm remains extremely efficient.
Python Solution
class Solution:
def findTheLongestSubstring(self, s: str) -> int:
vowel_to_bit = {
'a': 0,
'e': 1,
'i': 2,
'o': 3,
'u': 4
}
first_seen = {0: -1}
state = 0
max_length = 0
for index, char in enumerate(s):
if char in vowel_to_bit:
bit_position = vowel_to_bit[char]
state ^= (1 << bit_position)
if state in first_seen:
current_length = index - first_seen[state]
max_length = max(max_length, current_length)
else:
first_seen[state] = index
return max_length
The implementation starts by creating a mapping between vowels and their corresponding bit positions. This allows us to efficiently toggle parity information using bit operations.
The first_seen dictionary records the earliest index where each parity state appeared. We initialize state 0 at index -1 because an empty prefix has all vowels appearing an even number of times.
As we iterate through the string, we update the parity state whenever we encounter a vowel. XOR is used because it naturally toggles a bit between 0 and 1.
If the current state has appeared before, then the substring between the previous occurrence and the current position has even counts for every vowel. We compute its length and update the answer.
If the state has never appeared before, we store the current index. We only store the first occurrence because earlier positions produce longer substrings.
Go Solution
func findTheLongestSubstring(s string) int {
vowelToBit := map[byte]int{
'a': 0,
'e': 1,
'i': 2,
'o': 3,
'u': 4,
}
firstSeen := map[int]int{
0: -1,
}
state := 0
maxLength := 0
for i := 0; i < len(s); i++ {
ch := s[i]
if bit, exists := vowelToBit[ch]; exists {
state ^= (1 << bit)
}
if firstIndex, exists := firstSeen[state]; exists {
currentLength := i - firstIndex
if currentLength > maxLength {
maxLength = currentLength
}
} else {
firstSeen[state] = i
}
}
return maxLength
}
The Go implementation follows the same logic as the Python version. A map[int]int is used to store the first occurrence of each parity state.
Since Go strings are byte arrays for ASCII characters, we index directly using s[i]. This is efficient and safe because the input contains only lowercase English letters.
The parity state is stored as an integer bitmask, and XOR toggles vowel parity exactly as in the Python solution.
Worked Examples
Example 1
Input:
s = "eleetminicoworoep"
We track the parity state as a 5-bit mask:
| Index | Char | State (binary) | Meaning | First Seen | Max Length |
|---|---|---|---|---|---|
| -1 | - | 00000 | all even | {00000: -1} | 0 |
| 0 | e | 00010 | e odd | store 0 | 0 |
| 1 | l | 00010 | unchanged | seen at 0 | 1 |
| 2 | e | 00000 | e even again | seen at -1 | 3 |
| 3 | e | 00010 | e odd | seen at 0 | 3 |
| 4 | t | 00010 | unchanged | seen at 0 | 4 |
| 5 | m | 00010 | unchanged | seen at 0 | 5 |
| 6 | i | 00110 | e odd, i odd | store 6 | 5 |
| 7 | n | 00110 | unchanged | seen at 6 | 5 |
| 8 | i | 00010 | i even again | seen at 0 | 8 |
| 9 | c | 00010 | unchanged | seen at 0 | 9 |
| 10 | o | 01010 | e odd, o odd | store 10 | 9 |
| 11 | w | 01010 | unchanged | seen at 10 | 9 |
| 12 | o | 00010 | o even again | seen at 0 | 12 |
| 13 | r | 00010 | unchanged | seen at 0 | 13 |
The longest valid substring length becomes 13.
Example 2
Input:
s = "leetcodeisgreat"
| Index | Char | State | Action | Max Length |
|---|---|---|---|---|
| -1 | - | 00000 | initialize | 0 |
| 0 | l | 00000 | seen before | 1 |
| 1 | e | 00010 | first time | 1 |
| 2 | e | 00000 | seen at -1 | 3 |
| 3 | t | 00000 | seen at -1 | 4 |
| 4 | c | 00000 | seen at -1 | 5 |
The maximum valid substring is "leetc" with length 5.
Example 3
Input:
s = "bcbcbc"
Since no vowels appear, the state always remains 00000.
| Index | Char | State | Max Length |
|---|---|---|---|
| -1 | - | 00000 | 0 |
| 0 | b | 00000 | 1 |
| 1 | c | 00000 | 2 |
| 2 | b | 00000 | 3 |
| 3 | c | 00000 | 4 |
| 4 | b | 00000 | 5 |
| 5 | c | 00000 | 6 |
The entire string is valid.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is processed exactly once |
| Space | O(1) | At most 32 parity states are stored |
The time complexity is linear because we scan the string once and each operation inside the loop is constant time.
The space complexity is technically O(1) because there are only 32 possible parity states for the five vowels. Regardless of input size, the hash map can never grow beyond 32 entries.
Test Cases
solution = Solution()
assert solution.findTheLongestSubstring("eleetminicoworoep") == 13 # provided example
assert solution.findTheLongestSubstring("leetcodeisgreat") == 5 # provided example
assert solution.findTheLongestSubstring("bcbcbc") == 6 # no vowels
assert solution.findTheLongestSubstring("a") == 0 # single odd vowel
assert solution.findTheLongestSubstring("aa") == 2 # single even vowel pair
assert solution.findTheLongestSubstring("aeiou") == 0 # all vowels odd
assert solution.findTheLongestSubstring("aeiouaeiou") == 10 # all vowels even
assert solution.findTheLongestSubstring("abcde") == 3 # substring "bcd"
assert solution.findTheLongestSubstring("aaaa") == 4 # repeated toggling
assert solution.findTheLongestSubstring("abba") == 4 # no vowels except balanced a
assert solution.findTheLongestSubstring("x") == 1 # single consonant
assert solution.findTheLongestSubstring("aebcdiouf") == 1 # only consonant substrings valid
assert solution.findTheLongestSubstring("bbbbbbbb") == 8 # all consonants
assert solution.findTheLongestSubstring("aeae") == 4 # multiple balanced vowels
long_input = "a" * 100000
assert solution.findTheLongestSubstring(long_input) == 100000 # large even count input
long_input_odd = "a" * 99999
assert solution.findTheLongestSubstring(long_input_odd) == 99998 # large odd count input
| Test | Why |
|---|---|
"eleetminicoworoep" |
Validates the main example |
"leetcodeisgreat" |
Tests mixed vowels and consonants |
"bcbcbc" |
Confirms zero vowel counts are valid |
"a" |
Smallest invalid vowel case |
"aa" |
Smallest balanced vowel case |
"aeiou" |
Every vowel appears once, invalid |
"aeiouaeiou" |
All vowels balanced |
"abcde" |
Valid substring exists in middle |
"aaaa" |
Repeated parity toggling |
"abba" |
Balanced single vowel type |
"x" |
Single consonant |
"aebcdiouf" |
Mostly invalid vowel states |
"bbbbbbbb" |
Entire consonant string valid |
"aeae" |
Multiple balanced vowel occurrences |
"a" * 100000 |
Large-scale performance test |
"a" * 99999 |
Large odd-length edge case |
Edge Cases
Strings With No Vowels
A string like "bcbcbc" contains no vowels at all. Since zero is even, the entire string is valid.
This case can cause bugs if an implementation assumes vowels must appear at least once. Our solution handles it naturally because the parity state never changes from 0, so the entire string becomes the longest valid substring.
Substrings Starting at Index 0
Consider "aa".
The valid substring starts from the beginning of the string. Without initializing:
first_seen = {0: -1}
the algorithm would fail to detect prefixes that are themselves valid.
Using -1 represents the empty prefix before the string begins, allowing correct length calculations.
Repeated State Occurrences
A state may appear many times throughout the string. For correctness, we must store only the first occurrence.
For example, if a state first appears at index 2 and later at index 10, then reappears at index 15, using the earliest index produces the longest possible substring.
If we overwrote earlier indices, we would incorrectly compute shorter substrings and potentially miss the optimal answer.