LeetCode 2781 - Length of the Longest Valid Substring
The problem gives us a string word and a list of forbidden strings forbidden. A substring is considered valid if none of its internal substrings appear in the forbidden list.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Sliding Window
Solution
LeetCode 2781 - Length of the Longest Valid Substring
Problem Understanding
The problem gives us a string word and a list of forbidden strings forbidden.
A substring is considered valid if none of its internal substrings appear in the forbidden list. In other words, for a candidate substring word[l:r], every contiguous segment inside it must avoid matching any forbidden string.
Our goal is to find the length of the longest valid substring of word.
A straightforward interpretation is:
- We want to choose a contiguous segment of
word. - Inside that segment, no forbidden string may appear anywhere.
- Among all such valid segments, return the maximum possible length.
The constraints are the most important clue:
word.length <= 100000forbidden.length <= 100000- Each forbidden string has length at most
10
The input size is large enough that any algorithm examining all substrings is impossible. A string of length 100000 contains roughly 5 × 10^9 substrings, which is far too many to process explicitly.
However, there is a very useful observation hidden in the constraints: every forbidden string has length at most 10.
This means that whenever we need to determine whether a forbidden pattern ends at some position, we only need to inspect at most the previous 10 characters.
Important edge cases include:
- A single-character forbidden string, which immediately invalidates every substring containing that character.
- Many overlapping forbidden strings.
- A forbidden string appearing multiple times.
- A word with no forbidden occurrences at all.
- A word where every character belongs to some forbidden pattern.
- Very large inputs where efficiency is critical.
The problem guarantees that all strings consist only of lowercase English letters and every forbidden string has length between 1 and 10.
Approaches
Brute Force
A naive solution would generate every substring of word.
For each substring, we would need to determine whether any forbidden string appears inside it. One possibility is to scan through the forbidden set and search for each forbidden pattern within the candidate substring.
This approach is correct because it explicitly checks the definition of validity. If a substring contains any forbidden pattern, it is rejected. Otherwise it is counted as valid.
Unfortunately, the number of substrings is O(n²). With n = 100000, this is already far too large. Even before checking forbidden patterns, simply enumerating all substrings is impossible within the constraints.
Key Insight
Instead of examining every substring independently, we can use a sliding window.
Suppose we maintain a window [left, right] that represents the longest valid substring ending at position right.
When we extend the window by moving right one step to the right, the only newly created forbidden occurrences are those that end at right.
Since every forbidden string has length at most 10, we only need to check substrings ending at right with lengths from 1 to 10.
If we find a forbidden substring starting at position start and ending at right, then every valid window ending at right must begin after start. Therefore:
left = max(left, start + 1)
This removes the forbidden occurrence from the current window.
Because we only inspect at most 10 substrings per position, the entire algorithm becomes linear.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) or worse | O(1) | Enumerates substrings and validates each one |
| Optimal | O(n × 10) = O(n) | O(m) | Sliding window with forbidden set lookups |
Where:
n = len(word)m = len(forbidden)
Algorithm Walkthrough
- Store all forbidden strings in a hash set.
A hash set provides average O(1) membership checks, which is essential because we perform many lookups.
2. Initialize:
left = 0answer = 0
The interval [left, right] will always represent a valid window.
3. Iterate right from 0 to n - 1.
At each position, we attempt to extend the current valid window.
4. Check all substrings ending at right whose lengths are between 1 and 10.
Specifically, iterate:
start = right
start = right - 1
...
start = max(left, right - 9)
- For each candidate substring:
word[start:right+1]
test whether it belongs to the forbidden set. 6. If a forbidden substring is found:
left = max(left, start + 1)
Any valid window ending at right must start strictly after the forbidden substring's beginning.
7. Continue checking shorter starting positions.
We do not stop after finding one forbidden occurrence because another forbidden substring might begin even farther to the right and require a larger adjustment of left.
8. After processing all possible forbidden endings at right, the window [left, right] is guaranteed to be valid.
9. Update:
answer = max(answer, right - left + 1)
- After processing every position, return
answer.
Why it works
The invariant is that after processing position right, the window [left, right] contains no forbidden substring.
Every forbidden substring that could invalidate the window and end at right has length at most 10, so checking those candidates is sufficient. Whenever such a substring is found, moving left past its starting position removes it from the window. Since all forbidden occurrences ending at right are examined, the resulting window is valid. Taking the maximum window length over all positions therefore yields the longest valid substring.
Python Solution
from typing import List
class Solution:
def longestValidSubstring(self, word: str, forbidden: List[str]) -> int:
forbidden_set = set(forbidden)
left = 0
answer = 0
n = len(word)
for right in range(n):
for start in range(right, max(left - 1, right - 10), -1):
if word[start:right + 1] in forbidden_set:
left = max(left, start + 1)
answer = max(answer, right - left + 1)
return answer
The implementation begins by converting the forbidden list into a hash set. This allows constant-time membership checks.
The variable left stores the smallest valid starting index for the current window. As we extend the window with each new right position, we inspect all substrings ending at right whose lengths are at most 10.
Whenever one of these substrings belongs to the forbidden set, the current window can no longer include its starting position. Therefore we move left to at least start + 1.
After all candidate forbidden substrings ending at right have been examined, the window is valid. We then update the maximum length found so far.
Because each position examines at most 10 substrings, the algorithm remains linear.
Go Solution
func longestValidSubstring(word string, forbidden []string) int {
forbiddenSet := make(map[string]struct{})
for _, s := range forbidden {
forbiddenSet[s] = struct{}{}
}
left := 0
ans := 0
n := len(word)
for right := 0; right < n; right++ {
limit := right - 10
if limit < left-1 {
limit = left - 1
}
for start := right; start > limit; start-- {
if _, exists := forbiddenSet[word[start:right+1]]; exists {
if start+1 > left {
left = start + 1
}
}
}
length := right - left + 1
if length > ans {
ans = length
}
}
return ans
}
The Go implementation follows the same logic as the Python version.
The forbidden strings are stored in a map[string]struct{} to achieve constant-time lookups. Go string slicing creates substrings using index ranges, making the implementation very similar to Python.
No special handling for integer overflow is required because all indices and lengths are bounded by 100000, well within Go's int range.
Worked Examples
Example 1
word = "cbaaaabc"
forbidden = ["aaa", "cb"]
Forbidden set:
{"aaa", "cb"}
| right | char | left after processing | valid window | length | answer |
|---|---|---|---|---|---|
| 0 | c | 0 | "c" | 1 | 1 |
| 1 | b | 1 (found "cb") | "b" | 1 | 1 |
| 2 | a | 1 | "ba" | 2 | 2 |
| 3 | a | 1 | "baa" | 3 | 3 |
| 4 | a | 3 (found "aaa") | "aa" | 2 | 3 |
| 5 | a | 4 (found "aaa") | "aa" | 2 | 3 |
| 6 | b | 4 | "aab" | 3 | 3 |
| 7 | c | 4 | "aabc" | 4 | 4 |
Final answer:
4
Example 2
word = "leetcode"
forbidden = ["de", "le", "e"]
Forbidden set:
{"de", "le", "e"}
| right | char | left after processing | valid window | length | answer |
|---|---|---|---|---|---|
| 0 | l | 0 | "l" | 1 | 1 |
| 1 | e | 2 | empty window ending at 1 | 0 | 1 |
| 2 | e | 3 | empty window ending at 2 | 0 | 1 |
| 3 | t | 3 | "t" | 1 | 1 |
| 4 | c | 3 | "tc" | 2 | 2 |
| 5 | o | 3 | "tco" | 3 | 3 |
| 6 | d | 3 | "tcod" | 4 | 4 |
| 7 | e | 8 | empty window ending at 7 | 0 | 4 |
Final answer:
4
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | At each position we check at most 10 substrings |
| Space | O(m) | Hash set storing forbidden strings |
Let n be the length of word and m be the number of forbidden strings.
For every position in word, we inspect at most 10 candidate substrings because forbidden strings have maximum length 10. Therefore the total amount of work is bounded by 10n, which simplifies to O(n).
The extra memory usage comes from storing all forbidden strings in a hash set, requiring O(m) space.
Test Cases
from typing import List
s = Solution()
assert s.longestValidSubstring("cbaaaabc", ["aaa", "cb"]) == 4 # example 1
assert s.longestValidSubstring("leetcode", ["de", "le", "e"]) == 4 # example 2
assert s.longestValidSubstring("a", ["b"]) == 1 # single valid character
assert s.longestValidSubstring("a", ["a"]) == 0 # single forbidden character
assert s.longestValidSubstring("abcdef", ["xyz"]) == 6 # no forbidden occurrence
assert s.longestValidSubstring("aaaaa", ["a"]) == 0 # every character forbidden
assert s.longestValidSubstring("aaaaa", ["aa"]) == 1 # longest segment avoids length-2 pattern
assert s.longestValidSubstring("ababab", ["aba"]) == 2 # overlapping forbidden patterns
assert s.longestValidSubstring("abcde", ["bc", "cd"]) == 2 # multiple forbidden strings
assert s.longestValidSubstring("abcdefghij", ["abcdefghij"]) == 9 # forbidden length 10
assert s.longestValidSubstring("zzzzzz", ["zzz"]) == 2 # repeated overlapping occurrences
assert s.longestValidSubstring("abacaba", ["aca"]) == 4 # forbidden occurrence in middle
assert s.longestValidSubstring("abc", ["a", "b", "c"]) == 0 # all characters forbidden
Test Summary
| Test | Why |
|---|---|
"cbaaaabc" |
Official example 1 |
"leetcode" |
Official example 2 |
| Single valid character | Smallest valid input |
| Single forbidden character | Smallest invalid input |
| No forbidden occurrence | Entire string should be valid |
Forbidden "a" |
Every window becomes invalid |
Forbidden "aa" |
Repeated pattern handling |
Overlapping "aba" |
Tests overlapping matches |
| Multiple forbidden patterns | Tests left boundary updates |
| Length-10 forbidden string | Maximum forbidden length |
Repeated "zzz" occurrences |
Dense overlapping violations |
| Forbidden in middle | Window adjustment correctness |
| All characters forbidden | Answer becomes 0 |
Edge Cases
Single-Character Forbidden Strings
When a forbidden string has length one, every occurrence of that character immediately invalidates any substring containing it. This can be a source of bugs because the valid window may collapse completely. The implementation handles this naturally by detecting the forbidden substring ending at the current position and moving left to right + 1.
Overlapping Forbidden Occurrences
Consider a string like "aaaaa" with forbidden string "aaa". Multiple forbidden occurrences overlap heavily. A naive implementation might only react to the first occurrence and leave another forbidden occurrence inside the window. By checking all candidate substrings ending at each position and updating left whenever necessary, the algorithm correctly removes every violation.
Maximum Forbidden Length
The constraint that forbidden strings have length at most 10 is the key optimization. It would be easy to accidentally scan arbitrarily long substrings ending at each position, leading to quadratic behavior. The implementation deliberately limits inspection to the last 10 characters, ensuring linear performance even when word contains 100,000 characters.
Multiple Forbidden Matches Ending at the Same Position
Several forbidden substrings can end at the same index. For example, both "bc" and "abc" might end at a given position. The algorithm continues checking all candidate starting positions and uses left = max(left, start + 1), ensuring the final window excludes every forbidden occurrence ending at that position.