LeetCode 301 - Remove Invalid Parentheses
The problem asks us to remove the minimum number of invalid parentheses from a string so that the remaining string becomes valid. The input string may contain lowercase English letters in addition to parentheses.
Difficulty: 🔴 Hard
Topics: String, Backtracking, Breadth-First Search
Solution
Problem Understanding
The problem asks us to remove the minimum number of invalid parentheses from a string so that the remaining string becomes valid. The input string may contain lowercase English letters in addition to parentheses. Letters do not affect validity and should always remain in the resulting strings unless they are removed as part of a broader deletion process. The goal is to return every unique valid string that can be produced with the fewest possible removals.
A valid parentheses string follows the standard matching rules. Every opening parenthesis ( must eventually be matched by a closing parenthesis ), and at no point during a left-to-right scan can the number of closing parentheses exceed the number of opening parentheses. Letters can appear anywhere and are ignored when checking validity.
The key detail in this problem is that we are not asked to return just one valid string. We must return all unique valid strings that can be formed after removing the minimum number of parentheses. This means two things matter simultaneously:
- The resulting strings must be valid.
- The number of removals must be as small as possible.
The constraints are relatively small. The string length is at most 25, and there are at most 20 parentheses. These limits strongly suggest that exponential exploration is acceptable if pruning is done carefully. A brute-force solution that generates every possible subsequence would still be too large in the worst case, but a carefully controlled search using backtracking or breadth-first search becomes practical.
Several edge cases are important:
- Strings that are already valid, such as
"()()", should return the original string unchanged. - Strings containing only invalid parentheses, such as
")(", should return[""]. - Duplicate results must be avoided because different removal paths can produce the same string.
- Strings containing letters mixed with parentheses require preserving letters correctly while validating only parentheses.
- Inputs with many repeated parentheses can create huge duplicate search trees if pruning is not handled carefully.
Approaches
Brute Force Approach
The brute-force strategy is to generate every possible subsequence of the string by deciding for each character whether to keep it or remove it. After generating a candidate string, we check whether it forms a valid parentheses sequence. Among all valid candidates, we keep only those with the maximum length, because maximizing length is equivalent to minimizing removals.
This approach is correct because it explores every possible way to remove characters. Since every valid result appears somewhere in the search space, the algorithm eventually finds all minimum-removal solutions.
The problem is that the number of subsequences grows exponentially. For a string of length n, there are 2^n possible subsequences. Even with n = 25, this becomes too expensive. In addition, validating every generated string requires another linear scan, making the total complexity extremely high.
Optimal Approach
A much better solution uses Breadth-First Search, abbreviated as BFS.
The key observation is that BFS naturally explores states level by level. Each level corresponds to removing exactly one more character than the previous level. This property is extremely valuable because the first time we encounter valid strings, we know they were produced using the minimum number of removals.
Instead of generating every subsequence, we:
- Start with the original string.
- Generate neighboring states by removing one parenthesis at each position.
- Use BFS to explore strings with fewer and fewer parentheses.
- Stop exploring deeper levels once valid strings are found.
This guarantees minimal removals because BFS processes all states with k removals before considering states with k + 1 removals.
A hash set is used to avoid revisiting duplicate strings.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n × n) | O(2^n × n) | Generates every subsequence and validates each one |
| Optimal BFS | O(2^n × n) worst case | O(2^n × n) | Much faster in practice because BFS stops at minimum-removal level |
Although the worst-case complexity remains exponential, BFS dramatically reduces unnecessary exploration because it terminates as soon as the minimum-removal solutions are discovered.
Algorithm Walkthrough
BFS Strategy
- Start by placing the original string into a queue.
The queue represents states waiting to be processed. Each state is a candidate string formed after some removals.
2. Create a hash set called visited.
Different removal paths can produce the same string. Without deduplication, the search tree would explode exponentially. The visited set ensures every string is processed only once.
3. Define a helper function to validate a string.
The validation logic scans left to right while maintaining a counter:
- Increment for
(. - Decrement for
). - If the counter ever becomes negative, the string is invalid.
- At the end, the counter must be zero.
- Process strings level by level using BFS.
Every level corresponds to removing one additional character. This is the critical property that guarantees minimal removals. 5. For each string removed from the queue, check whether it is valid.
If it is valid:
- Add it to the answer list.
- Mark that a valid level has been found.
- Once a valid string is found at the current level, do not generate deeper states.
Since BFS explores by removal count, any deeper level would require more removals and therefore cannot be optimal. 7. If the current string is not valid and no valid string has yet been found at this level, generate neighbors.
Neighbors are created by removing one parenthesis character at every possible position. 8. Skip non-parenthesis characters during removal generation.
Removing letters is never beneficial because letters do not affect validity. 9. Add unseen neighbor strings into the queue and mark them visited.
This continues the BFS exploration efficiently without redundant processing. 10. Return all collected valid strings.
Because BFS stops at the first valid level, every returned string uses the minimum number of removals.
Why it works
The correctness comes from the level-order nature of BFS. Every BFS level corresponds to strings produced after the same number of removals. Therefore, the first level containing valid strings must represent the minimum number of removals required. Since every possible removal combination at that level is explored, all valid minimum-removal strings are found.
Python Solution
from collections import deque
from typing import List
class Solution:
def removeInvalidParentheses(self, s: str) -> List[str]:
def is_valid(string: str) -> bool:
balance = 0
for char in string:
if char == '(':
balance += 1
elif char == ')':
balance -= 1
if balance < 0:
return False
return balance == 0
result = []
visited = set([s])
queue = deque([s])
found = False
while queue:
current = queue.popleft()
if is_valid(current):
result.append(current)
found = True
if found:
continue
for i in range(len(current)):
if current[i] not in '()':
continue
next_string = current[:i] + current[i + 1:]
if next_string not in visited:
visited.add(next_string)
queue.append(next_string)
return result
The implementation begins with the is_valid helper function. This function checks whether a string has balanced parentheses using a simple counter. Every opening parenthesis increases the counter, and every closing parenthesis decreases it. If the counter ever becomes negative, a closing parenthesis appeared before a matching opening one, making the string invalid immediately.
The BFS portion uses a queue initialized with the original input string. The visited set prevents duplicate processing.
During each BFS iteration, the current string is validated. If it is valid, it is added to the result list and the found flag becomes True. Once this happens, no deeper levels are explored because deeper levels would involve additional removals.
If no valid string has been found yet, the algorithm generates all neighboring states by removing one parenthesis at each possible position. Letters are skipped because removing them never helps create valid parentheses.
The final result contains every unique valid string requiring the minimum number of removals.
Go Solution
package main
func removeInvalidParentheses(s string) []string {
isValid := func(str string) bool {
balance := 0
for _, ch := range str {
if ch == '(' {
balance++
} else if ch == ')' {
balance--
if balance < 0 {
return false
}
}
}
return balance == 0
}
result := []string{}
visited := make(map[string]bool)
queue := []string{s}
visited[s] = true
found := false
for len(queue) > 0 {
current := queue[0]
queue = queue[1:]
if isValid(current) {
result = append(result, current)
found = true
}
if found {
continue
}
for i := 0; i < len(current); i++ {
if current[i] != '(' && current[i] != ')' {
continue
}
nextString := current[:i] + current[i+1:]
if !visited[nextString] {
visited[nextString] = true
queue = append(queue, nextString)
}
}
}
return result
}
The Go implementation follows the same BFS structure as the Python version. Since Go does not have a built-in deque structure like Python's collections.deque, a slice is used as the queue.
The visited structure is implemented using a map from string to boolean. String slicing is used to generate neighbor states after removing one character.
Go strings are immutable, so creating nextString produces a new string safely without modifying the original.
Worked Examples
Example 1
Input:
s = "()())()"
BFS Level 0
| Current String | Valid? |
|---|---|
()())() |
No |
Generate all states with one removal:
| Removed Index | New String |
|---|---|
| 0 | )())() |
| 1 | (())() |
| 2 | ()))() |
| 3 | ()()() |
| 4 | ()()() |
| 5 | ()())) |
| 6 | ()())( |
BFS Level 1
Check each generated string:
| String | Valid? |
|---|---|
)())() |
No |
(())() |
Yes |
()))() |
No |
()()() |
Yes |
()())) |
No |
()())( |
No |
The first valid strings appear at this level, so BFS stops.
Output:
["(())()", "()()()"]
Example 2
Input:
s = "(a)())()"
BFS Level 0
| Current String | Valid? |
|---|---|
(a)())() |
No |
Generate neighbors by removing one parenthesis:
| New String |
|---|
a)())() |
(a())() |
(a))() |
(a)()() |
(a)())) |
(a)())( |
BFS Level 1
| String | Valid? |
|---|---|
a)())() |
No |
(a())() |
Yes |
(a))() |
No |
(a)()() |
Yes |
(a)())) |
No |
(a)())( |
No |
Output:
["(a())()", "(a)()()"]
Example 3
Input:
s = ")("
BFS Level 0
| Current String | Valid? |
|---|---|
)( |
No |
Generate neighbors:
| New String |
|---|
( |
) |
BFS Level 1
Both are invalid.
Generate next level:
| New String |
|---|
"" |
BFS Level 2
| String | Valid? |
|---|---|
"" |
Yes |
Output:
[""]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(2^n × n) | In the worst case, many substrings are explored and each validity check is O(n) |
| Space | O(2^n × n) | Queue and visited set may store exponentially many strings |
The algorithm is exponential in the worst case because the number of possible removal combinations grows exponentially. However, BFS significantly improves practical performance because it terminates immediately after discovering the minimum-removal level, avoiding unnecessary deeper exploration.
Test Cases
sol = Solution()
# Basic example with two valid answers
assert sorted(sol.removeInvalidParentheses("()())()")) == sorted(["(())()", "()()()"])
# Example with letters included
assert sorted(sol.removeInvalidParentheses("(a)())()")) == sorted(["(a())()", "(a)()()"])
# Completely invalid parentheses
assert sorted(sol.removeInvalidParentheses(")(")) == sorted([""])
# Already valid string
assert sorted(sol.removeInvalidParentheses("()()")) == sorted(["()()"])
# Empty result after removals
assert sorted(sol.removeInvalidParentheses("(((")) == sorted([""])
# No parentheses at all
assert sorted(sol.removeInvalidParentheses("abc")) == sorted(["abc"])
# Single valid pair
assert sorted(sol.removeInvalidParentheses("()")) == sorted(["()"])
# Mixed valid and invalid structure
assert sorted(sol.removeInvalidParentheses("(()")) == sorted(["()"])
# Duplicate removal paths
assert sorted(sol.removeInvalidParentheses("())")) == sorted(["()"])
# Multiple nested possibilities
assert sorted(sol.removeInvalidParentheses("((())")) == sorted(["(())"])
# Complex mixed case
assert sorted(sol.removeInvalidParentheses("(a(b)c)d)")) == sorted(["(a(b)c)d"])
# All closing parentheses
assert sorted(sol.removeInvalidParentheses(")))")) == sorted([""])
Test Summary
| Test | Why |
|---|---|
"()())()" |
Verifies multiple minimum-removal solutions |
"(a)())()" |
Ensures letters are preserved correctly |
")(" |
Tests completely invalid ordering |
"()()" |
Ensures already valid strings remain unchanged |
"(((" |
Tests removal of all parentheses |
"abc" |
Verifies handling of strings without parentheses |
"()" |
Smallest non-trivial valid case |
"(()" |
Tests unmatched opening parenthesis |
"())" |
Tests unmatched closing parenthesis |
"((())" |
Tests nested imbalance |
"(a(b)c)d)" |
Complex mixed structure with letters |
")))" |
All closing parentheses edge case |
Edge Cases
One important edge case is a string that is already valid. A naive implementation might continue removing parentheses unnecessarily and produce shorter valid strings. The BFS solution avoids this because it checks the original string first. If the original string is valid, BFS immediately returns it and never explores deeper levels.
Another important edge case is a string containing only invalid parentheses, such as ")(" or "(((". These cases often expose bugs in validation logic because there may be no valid non-empty solution. The implementation correctly handles this by continuing BFS until the empty string is reached, which is always valid.
A third critical edge case involves duplicate generation paths. For example, the string "())" can produce the same result "()" by removing either of the two closing parentheses. Without a visited set, the algorithm would process duplicate states repeatedly, causing unnecessary work and potentially duplicate outputs. The hash set guarantees that each generated string is explored exactly once.
Another subtle edge case is strings containing letters intermixed with parentheses. Since letters do not affect validity, removing them is unnecessary and wasteful. The implementation skips non-parenthesis characters when generating neighbors, which improves efficiency while preserving correctness.