Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Sliding Window

Solution

LeetCode 2781 - Length of the Longest Valid Substring

Problem Understanding

The problem gives us a string word and a list of forbidden strings forbidden.

A substring is considered valid if none of its internal substrings appear in the forbidden list. In other words, for a candidate substring word[l:r], every contiguous segment inside it must avoid matching any forbidden string.

Our goal is to find the length of the longest valid substring of word.

A straightforward interpretation is:

We want to choose a contiguous segment of word.
Inside that segment, no forbidden string may appear anywhere.
Among all such valid segments, return the maximum possible length.

The constraints are the most important clue:

word.length <= 100000
forbidden.length <= 100000
Each forbidden string has length at most 10

The input size is large enough that any algorithm examining all substrings is impossible. A string of length 100000 contains roughly 5 × 10^9 substrings, which is far too many to process explicitly.

However, there is a very useful observation hidden in the constraints: every forbidden string has length at most 10.

This means that whenever we need to determine whether a forbidden pattern ends at some position, we only need to inspect at most the previous 10 characters.

Important edge cases include:

A single-character forbidden string, which immediately invalidates every substring containing that character.
Many overlapping forbidden strings.
A forbidden string appearing multiple times.
A word with no forbidden occurrences at all.
A word where every character belongs to some forbidden pattern.
Very large inputs where efficiency is critical.

The problem guarantees that all strings consist only of lowercase English letters and every forbidden string has length between 1 and 10.

Approaches

Brute Force

A naive solution would generate every substring of word.

For each substring, we would need to determine whether any forbidden string appears inside it. One possibility is to scan through the forbidden set and search for each forbidden pattern within the candidate substring.

This approach is correct because it explicitly checks the definition of validity. If a substring contains any forbidden pattern, it is rejected. Otherwise it is counted as valid.

Unfortunately, the number of substrings is O(n²). With n = 100000, this is already far too large. Even before checking forbidden patterns, simply enumerating all substrings is impossible within the constraints.

Key Insight

Instead of examining every substring independently, we can use a sliding window.

Suppose we maintain a window [left, right] that represents the longest valid substring ending at position right.

When we extend the window by moving right one step to the right, the only newly created forbidden occurrences are those that end at right.

Since every forbidden string has length at most 10, we only need to check substrings ending at right with lengths from 1 to 10.

If we find a forbidden substring starting at position start and ending at right, then every valid window ending at right must begin after start. Therefore:

left = max(left, start + 1)

This removes the forbidden occurrence from the current window.

Because we only inspect at most 10 substrings per position, the entire algorithm becomes linear.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n³) or worse	O(1)	Enumerates substrings and validates each one
Optimal	O(n × 10) = O(n)	O(m)	Sliding window with forbidden set lookups

Where:

n = len(word)
m = len(forbidden)

Algorithm Walkthrough

Store all forbidden strings in a hash set.

A hash set provides average O(1) membership checks, which is essential because we perform many lookups. 2. Initialize:

left = 0
answer = 0

The interval [left, right] will always represent a valid window. 3. Iterate right from 0 to n - 1.

At each position, we attempt to extend the current valid window. 4. Check all substrings ending at right whose lengths are between 1 and 10.

Specifically, iterate:

start = right
start = right - 1
...
start = max(left, right - 9)

For each candidate substring:

word[start:right+1]

test whether it belongs to the forbidden set. 6. If a forbidden substring is found:

left = max(left, start + 1)

Any valid window ending at right must start strictly after the forbidden substring's beginning. 7. Continue checking shorter starting positions.

We do not stop after finding one forbidden occurrence because another forbidden substring might begin even farther to the right and require a larger adjustment of left. 8. After processing all possible forbidden endings at right, the window [left, right] is guaranteed to be valid. 9. Update:

answer = max(answer, right - left + 1)

After processing every position, return answer.

Why it works

The invariant is that after processing position right, the window [left, right] contains no forbidden substring.

Every forbidden substring that could invalidate the window and end at right has length at most 10, so checking those candidates is sufficient. Whenever such a substring is found, moving left past its starting position removes it from the window. Since all forbidden occurrences ending at right are examined, the resulting window is valid. Taking the maximum window length over all positions therefore yields the longest valid substring.

Python Solution

from typing import List

class Solution:
    def longestValidSubstring(self, word: str, forbidden: List[str]) -> int:
        forbidden_set = set(forbidden)

        left = 0
        answer = 0
        n = len(word)

        for right in range(n):
            for start in range(right, max(left - 1, right - 10), -1):
                if word[start:right + 1] in forbidden_set:
                    left = max(left, start + 1)

            answer = max(answer, right - left + 1)

        return answer

The implementation begins by converting the forbidden list into a hash set. This allows constant-time membership checks.

The variable left stores the smallest valid starting index for the current window. As we extend the window with each new right position, we inspect all substrings ending at right whose lengths are at most 10.

Whenever one of these substrings belongs to the forbidden set, the current window can no longer include its starting position. Therefore we move left to at least start + 1.

After all candidate forbidden substrings ending at right have been examined, the window is valid. We then update the maximum length found so far.

Because each position examines at most 10 substrings, the algorithm remains linear.

Go Solution

func longestValidSubstring(word string, forbidden []string) int {
	forbiddenSet := make(map[string]struct{})

	for _, s := range forbidden {
		forbiddenSet[s] = struct{}{}
	}

	left := 0
	ans := 0
	n := len(word)

	for right := 0; right < n; right++ {
		limit := right - 10
		if limit < left-1 {
			limit = left - 1
		}

		for start := right; start > limit; start-- {
			if _, exists := forbiddenSet[word[start:right+1]]; exists {
				if start+1 > left {
					left = start + 1
				}
			}
		}

		length := right - left + 1
		if length > ans {
			ans = length
		}
	}

	return ans
}

The Go implementation follows the same logic as the Python version.

The forbidden strings are stored in a map[string]struct{} to achieve constant-time lookups. Go string slicing creates substrings using index ranges, making the implementation very similar to Python.

No special handling for integer overflow is required because all indices and lengths are bounded by 100000, well within Go's int range.

Worked Examples

Example 1

word = "cbaaaabc"
forbidden = ["aaa", "cb"]

Forbidden set:

{"aaa", "cb"}

right	char	left after processing	valid window	length	answer
0	c	0	"c"	1	1
1	b	1 (found "cb")	"b"	1	1
2	a	1	"ba"	2	2
3	a	1	"baa"	3	3
4	a	3 (found "aaa")	"aa"	2	3
5	a	4 (found "aaa")	"aa"	2	3
6	b	4	"aab"	3	3
7	c	4	"aabc"	4	4

Final answer:

Example 2

word = "leetcode"
forbidden = ["de", "le", "e"]

Forbidden set:

{"de", "le", "e"}

right	char	left after processing	valid window	length	answer
0	l	0	"l"	1	1
1	e	2	empty window ending at 1	0	1
2	e	3	empty window ending at 2	0	1
3	t	3	"t"	1	1
4	c	3	"tc"	2	2
5	o	3	"tco"	3	3
6	d	3	"tcod"	4	4
7	e	8	empty window ending at 7	0	4

Final answer:

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	At each position we check at most 10 substrings
Space	O(m)	Hash set storing forbidden strings

Let n be the length of word and m be the number of forbidden strings.

For every position in word, we inspect at most 10 candidate substrings because forbidden strings have maximum length 10. Therefore the total amount of work is bounded by 10n, which simplifies to O(n).

The extra memory usage comes from storing all forbidden strings in a hash set, requiring O(m) space.

Test Cases

from typing import List

s = Solution()

assert s.longestValidSubstring("cbaaaabc", ["aaa", "cb"]) == 4  # example 1
assert s.longestValidSubstring("leetcode", ["de", "le", "e"]) == 4  # example 2

assert s.longestValidSubstring("a", ["b"]) == 1  # single valid character
assert s.longestValidSubstring("a", ["a"]) == 0  # single forbidden character

assert s.longestValidSubstring("abcdef", ["xyz"]) == 6  # no forbidden occurrence

assert s.longestValidSubstring("aaaaa", ["a"]) == 0  # every character forbidden

assert s.longestValidSubstring("aaaaa", ["aa"]) == 1  # longest segment avoids length-2 pattern

assert s.longestValidSubstring("ababab", ["aba"]) == 2  # overlapping forbidden patterns

assert s.longestValidSubstring("abcde", ["bc", "cd"]) == 2  # multiple forbidden strings

assert s.longestValidSubstring("abcdefghij", ["abcdefghij"]) == 9  # forbidden length 10

assert s.longestValidSubstring("zzzzzz", ["zzz"]) == 2  # repeated overlapping occurrences

assert s.longestValidSubstring("abacaba", ["aca"]) == 4  # forbidden occurrence in middle

assert s.longestValidSubstring("abc", ["a", "b", "c"]) == 0  # all characters forbidden

Test Summary

Test	Why
`"cbaaaabc"`	Official example 1
`"leetcode"`	Official example 2
Single valid character	Smallest valid input
Single forbidden character	Smallest invalid input
No forbidden occurrence	Entire string should be valid
Forbidden `"a"`	Every window becomes invalid
Forbidden `"aa"`	Repeated pattern handling
Overlapping `"aba"`	Tests overlapping matches
Multiple forbidden patterns	Tests left boundary updates
Length-10 forbidden string	Maximum forbidden length
Repeated `"zzz"` occurrences	Dense overlapping violations
Forbidden in middle	Window adjustment correctness
All characters forbidden	Answer becomes 0

Edge Cases

Single-Character Forbidden Strings

When a forbidden string has length one, every occurrence of that character immediately invalidates any substring containing it. This can be a source of bugs because the valid window may collapse completely. The implementation handles this naturally by detecting the forbidden substring ending at the current position and moving left to right + 1.

Overlapping Forbidden Occurrences

Consider a string like "aaaaa" with forbidden string "aaa". Multiple forbidden occurrences overlap heavily. A naive implementation might only react to the first occurrence and leave another forbidden occurrence inside the window. By checking all candidate substrings ending at each position and updating left whenever necessary, the algorithm correctly removes every violation.

Maximum Forbidden Length

The constraint that forbidden strings have length at most 10 is the key optimization. It would be easy to accidentally scan arbitrarily long substrings ending at each position, leading to quadratic behavior. The implementation deliberately limits inspection to the last 10 characters, ensuring linear performance even when word contains 100,000 characters.

Multiple Forbidden Matches Ending at the Same Position

Several forbidden substrings can end at the same index. For example, both "bc" and "abc" might end at a given position. The algorithm continues checking all candidate starting positions and uses left = max(left, start + 1), ensuring the final window excludes every forbidden occurrence ending at that position.