LeetCode 1371 - Find the Longest Substring Containing Vowels in Even Counts

This problem asks us to find the length of the longest contiguous substring in which every vowel appears an even number

LeetCode Problem 1371

Difficulty: 🟡 Medium
Topics: Hash Table, String, Bit Manipulation, Prefix Sum

Solution

LeetCode 1371 - Find the Longest Substring Containing Vowels in Even Counts

Problem Understanding

This problem asks us to find the length of the longest contiguous substring in which every vowel appears an even number of times. The vowels we care about are:

  • a
  • e
  • i
  • o
  • u

A vowel may appear zero times, and zero is considered even. Therefore, a substring with no vowels at all is valid.

The input is a string s consisting only of lowercase English letters. We must return the maximum possible length of a substring satisfying the condition.

The important detail is that we only care whether each vowel appears an even or odd number of times. The exact count does not matter beyond its parity.

For example, if a substring contains:

  • e twice
  • i four times
  • o zero times

then those vowels are valid because all counts are even.

The constraints are large:

  • 1 <= s.length <= 5 * 10^5

This immediately tells us that quadratic algorithms will likely time out. An O(n^2) solution would require checking billions of substrings in the worst case. We need a linear or near-linear solution.

Several edge cases are important:

  • A string with no vowels at all, such as "bcbcbc", should return the entire string length.
  • A string where no non-empty valid substring exists beyond single consonants.
  • Repeated toggling of vowel parity, where vowels appear many times.
  • Very large inputs, where efficiency is critical.
  • Prefixes themselves may already satisfy the condition.

The problem guarantees only lowercase English letters, so we do not need to handle uppercase characters or non-alphabetic symbols.

Approaches

Brute Force Approach

The most straightforward solution is to examine every possible substring.

For each substring:

  1. Count the occurrences of each vowel.
  2. Check whether every vowel count is even.
  3. If valid, update the maximum length.

A naive implementation would generate all O(n^2) substrings, and counting vowels inside each substring would take O(n) time, resulting in O(n^3) complexity.

We can improve this slightly using prefix frequency arrays so each substring check becomes O(1), reducing the total complexity to O(n^2). However, with n = 5 * 10^5, even O(n^2) is far too slow.

The brute force method is correct because it explicitly checks every possible substring, but it is not practical for the input size.

Optimal Approach

The key insight is that we only care about whether each vowel count is even or odd.

Each vowel can have two states:

  • even
  • odd

There are five vowels, so the total number of possible parity states is:

$$2^5 = 32$$

We can represent the parity state using a 5-bit bitmask:

  • bit 0 for a
  • bit 1 for e
  • bit 2 for i
  • bit 3 for o
  • bit 4 for u

As we scan the string:

  • encountering a vowel toggles its corresponding bit
  • consonants leave the state unchanged

If the same bitmask appears at two different indices, then the substring between them has even counts for all vowels.

Why?

Because the parity changes cancel out. If the prefix parity state at index i equals the prefix parity state at index j, then the substring (i+1 ... j) must contain an even number of each vowel.

This transforms the problem into finding the largest distance between two identical prefix states.

We store the earliest index where each state first appears. Whenever we see the same state again, we compute the substring length.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) or O(n) Checks every substring explicitly
Optimal O(n) O(1) Uses bitmask parity states and prefix tracking

Algorithm Walkthrough

  1. Create a bitmask variable called state, initially set to 0.

A state of 0 means all vowels currently have even counts. 2. Create a hash map called first_seen.

This map stores the earliest index where each parity state appeared.

Initialize it with:

{0: -1}

This handles cases where a valid substring starts from index 0. 3. Define a mapping from vowels to bit positions.

For example:

  • a -> 0
  • e -> 1
  • i -> 2
  • o -> 3
  • u -> 4
  1. Traverse the string character by character.
  2. For each character:
  • If it is a vowel, toggle its corresponding bit using XOR.
  • If it is a consonant, do nothing.

Example:

state ^= (1 << vowel_index)
  1. After updating the state:
  • If this state has appeared before, compute the substring length from the first occurrence to the current index.
  • Otherwise, store the current index as the first occurrence of this state.
  1. Keep track of the maximum substring length found.
  2. Return the maximum length after processing the entire string.

Why it works

The algorithm relies on prefix parity states.

Suppose two prefixes have the same parity state. This means every vowel has changed parity an even number of times between those two positions. Therefore, the substring between them contains even counts of all vowels.

By storing the earliest occurrence of each state, we maximize the substring length whenever the same state reappears.

Since there are only 32 possible states, the algorithm remains extremely efficient.

Python Solution

class Solution:
    def findTheLongestSubstring(self, s: str) -> int:
        vowel_to_bit = {
            'a': 0,
            'e': 1,
            'i': 2,
            'o': 3,
            'u': 4
        }

        first_seen = {0: -1}

        state = 0
        max_length = 0

        for index, char in enumerate(s):
            if char in vowel_to_bit:
                bit_position = vowel_to_bit[char]
                state ^= (1 << bit_position)

            if state in first_seen:
                current_length = index - first_seen[state]
                max_length = max(max_length, current_length)
            else:
                first_seen[state] = index

        return max_length

The implementation starts by creating a mapping between vowels and their corresponding bit positions. This allows us to efficiently toggle parity information using bit operations.

The first_seen dictionary records the earliest index where each parity state appeared. We initialize state 0 at index -1 because an empty prefix has all vowels appearing an even number of times.

As we iterate through the string, we update the parity state whenever we encounter a vowel. XOR is used because it naturally toggles a bit between 0 and 1.

If the current state has appeared before, then the substring between the previous occurrence and the current position has even counts for every vowel. We compute its length and update the answer.

If the state has never appeared before, we store the current index. We only store the first occurrence because earlier positions produce longer substrings.

Go Solution

func findTheLongestSubstring(s string) int {
    vowelToBit := map[byte]int{
        'a': 0,
        'e': 1,
        'i': 2,
        'o': 3,
        'u': 4,
    }

    firstSeen := map[int]int{
        0: -1,
    }

    state := 0
    maxLength := 0

    for i := 0; i < len(s); i++ {
        ch := s[i]

        if bit, exists := vowelToBit[ch]; exists {
            state ^= (1 << bit)
        }

        if firstIndex, exists := firstSeen[state]; exists {
            currentLength := i - firstIndex

            if currentLength > maxLength {
                maxLength = currentLength
            }
        } else {
            firstSeen[state] = i
        }
    }

    return maxLength
}

The Go implementation follows the same logic as the Python version. A map[int]int is used to store the first occurrence of each parity state.

Since Go strings are byte arrays for ASCII characters, we index directly using s[i]. This is efficient and safe because the input contains only lowercase English letters.

The parity state is stored as an integer bitmask, and XOR toggles vowel parity exactly as in the Python solution.

Worked Examples

Example 1

Input:

s = "eleetminicoworoep"

We track the parity state as a 5-bit mask:

Index Char State (binary) Meaning First Seen Max Length
-1 - 00000 all even {00000: -1} 0
0 e 00010 e odd store 0 0
1 l 00010 unchanged seen at 0 1
2 e 00000 e even again seen at -1 3
3 e 00010 e odd seen at 0 3
4 t 00010 unchanged seen at 0 4
5 m 00010 unchanged seen at 0 5
6 i 00110 e odd, i odd store 6 5
7 n 00110 unchanged seen at 6 5
8 i 00010 i even again seen at 0 8
9 c 00010 unchanged seen at 0 9
10 o 01010 e odd, o odd store 10 9
11 w 01010 unchanged seen at 10 9
12 o 00010 o even again seen at 0 12
13 r 00010 unchanged seen at 0 13

The longest valid substring length becomes 13.

Example 2

Input:

s = "leetcodeisgreat"
Index Char State Action Max Length
-1 - 00000 initialize 0
0 l 00000 seen before 1
1 e 00010 first time 1
2 e 00000 seen at -1 3
3 t 00000 seen at -1 4
4 c 00000 seen at -1 5

The maximum valid substring is "leetc" with length 5.

Example 3

Input:

s = "bcbcbc"

Since no vowels appear, the state always remains 00000.

Index Char State Max Length
-1 - 00000 0
0 b 00000 1
1 c 00000 2
2 b 00000 3
3 c 00000 4
4 b 00000 5
5 c 00000 6

The entire string is valid.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed exactly once
Space O(1) At most 32 parity states are stored

The time complexity is linear because we scan the string once and each operation inside the loop is constant time.

The space complexity is technically O(1) because there are only 32 possible parity states for the five vowels. Regardless of input size, the hash map can never grow beyond 32 entries.

Test Cases

solution = Solution()

assert solution.findTheLongestSubstring("eleetminicoworoep") == 13  # provided example
assert solution.findTheLongestSubstring("leetcodeisgreat") == 5     # provided example
assert solution.findTheLongestSubstring("bcbcbc") == 6              # no vowels

assert solution.findTheLongestSubstring("a") == 0                   # single odd vowel
assert solution.findTheLongestSubstring("aa") == 2                  # single even vowel pair
assert solution.findTheLongestSubstring("aeiou") == 0               # all vowels odd
assert solution.findTheLongestSubstring("aeiouaeiou") == 10         # all vowels even

assert solution.findTheLongestSubstring("abcde") == 3               # substring "bcd"
assert solution.findTheLongestSubstring("aaaa") == 4                # repeated toggling
assert solution.findTheLongestSubstring("abba") == 4                # no vowels except balanced a
assert solution.findTheLongestSubstring("x") == 1                   # single consonant

assert solution.findTheLongestSubstring("aebcdiouf") == 1           # only consonant substrings valid
assert solution.findTheLongestSubstring("bbbbbbbb") == 8            # all consonants
assert solution.findTheLongestSubstring("aeae") == 4                # multiple balanced vowels

long_input = "a" * 100000
assert solution.findTheLongestSubstring(long_input) == 100000       # large even count input

long_input_odd = "a" * 99999
assert solution.findTheLongestSubstring(long_input_odd) == 99998    # large odd count input
Test Why
"eleetminicoworoep" Validates the main example
"leetcodeisgreat" Tests mixed vowels and consonants
"bcbcbc" Confirms zero vowel counts are valid
"a" Smallest invalid vowel case
"aa" Smallest balanced vowel case
"aeiou" Every vowel appears once, invalid
"aeiouaeiou" All vowels balanced
"abcde" Valid substring exists in middle
"aaaa" Repeated parity toggling
"abba" Balanced single vowel type
"x" Single consonant
"aebcdiouf" Mostly invalid vowel states
"bbbbbbbb" Entire consonant string valid
"aeae" Multiple balanced vowel occurrences
"a" * 100000 Large-scale performance test
"a" * 99999 Large odd-length edge case

Edge Cases

Strings With No Vowels

A string like "bcbcbc" contains no vowels at all. Since zero is even, the entire string is valid.

This case can cause bugs if an implementation assumes vowels must appear at least once. Our solution handles it naturally because the parity state never changes from 0, so the entire string becomes the longest valid substring.

Substrings Starting at Index 0

Consider "aa".

The valid substring starts from the beginning of the string. Without initializing:

first_seen = {0: -1}

the algorithm would fail to detect prefixes that are themselves valid.

Using -1 represents the empty prefix before the string begins, allowing correct length calculations.

Repeated State Occurrences

A state may appear many times throughout the string. For correctness, we must store only the first occurrence.

For example, if a state first appears at index 2 and later at index 10, then reappears at index 15, using the earliest index produces the longest possible substring.

If we overwrote earlier indices, we would incorrectly compute shorter substrings and potentially miss the optimal answer.