LeetCode 10 - Regular Expression Matching
LeetCode 10, Regular Expression Matching, asks us to determine whether an entire input string s matches a pattern p. The pattern supports two special regular expression characters: - .
Difficulty: 🔴 Hard
Topics: String, Dynamic Programming, Recursion
Solution
Problem Understanding
LeetCode 10, Regular Expression Matching, asks us to determine whether an entire input string s matches a pattern p. The pattern supports two special regular expression characters:
.matches any single character*matches zero or more occurrences of the character immediately before it
The important detail is that the match must cover the entire string, not just a substring. This means we cannot stop early when part of the string matches. Every character in s must be consumed by the pattern, and every meaningful part of the pattern must participate in the match.
For example:
"aa"does not match"a"because the pattern only matches one character"aa"matches"a*"because*allows repeating'a'"ab"matches".*"because.matches any character and*allows unlimited repetitions
The input consists of two strings:
s, the text we are trying to matchp, the regex-style pattern
The constraints are small:
1 <= s.length <= 201 <= p.length <= 20
Although the input size is small, the recursive branching introduced by * can still produce exponential behavior if implemented naively. This makes dynamic programming or memoization the preferred approach.
Several edge cases are especially important:
- Patterns like
"a*"can match an empty sequence ".*"can match almost anything- Multiple
*operators may appear consecutively in logical terms, such as"a*b*c*" - The algorithm must distinguish between consuming a character and skipping a
*group entirely - Greedy matching alone is insufficient because sometimes taking fewer repetitions is necessary for the full match to succeed
The problem guarantees that every * always has a valid preceding character, so invalid patterns like "*a" never appear.
Approaches
Brute Force Recursive Backtracking
The most direct solution is recursive backtracking. At every position, we compare the current character in s and p.
If the next pattern character is not *, then the current characters must match directly, either because they are equal or because the pattern contains .. We then recursively move to the next positions in both strings.
If the next pattern character is *, we have two possible choices:
- Skip the entire
x*pattern and move forward in the pattern - Use one occurrence of
x*if the current characters match, then continue matching the same pattern position
This brute force approach is correct because it explores every valid interpretation of *. However, many states repeat. The same (i, j) position pair can be recomputed many times, leading to exponential time complexity.
Dynamic Programming with Memoization
The key insight is that the recursive state is fully determined by two indices:
i, current position insj, current position inp
Once we know whether s[i:] matches p[j:], we never need to recompute it again.
This observation makes memoization possible. We store previously computed results for each (i, j) pair. Since there are at most len(s) * len(p) distinct states, the complexity becomes polynomial.
This transforms an exponential recursive search into an efficient dynamic programming solution.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^(m+n)) | O(m+n) | Explores all recursive possibilities repeatedly |
| Optimal | O(m × n) | O(m × n) | Memoized recursion avoids recomputation |
Here, m = len(s) and n = len(p).
Algorithm Walkthrough
Optimal Dynamic Programming Algorithm
- Define a recursive function
dp(i, j).
This function returns whether the substring s[i:] matches the pattern substring p[j:].
Using indices instead of slicing strings avoids unnecessary memory allocations and keeps the solution efficient. 2. Handle the base case.
If j reaches the end of the pattern, the match succeeds only if i also reaches the end of the string.
This ensures the entire string is matched, not just a prefix. 3. Determine whether the current characters match.
A match exists if:
i < len(s), and- either
s[i] == p[j]orp[j] == '.'
This logic handles both exact character matching and wildcard matching.
4. Check whether the next pattern character is *.
This is done by verifying:
j + 1 < len(p) and p[j + 1] == '*'
If a * exists, there are two possibilities:
- Skip the
x*entirely:
dp(i, j + 2)
- Use one occurrence of
x*if the current characters match:
first_match and dp(i + 1, j)
Notice that the second option keeps j unchanged because * may continue consuming characters.
5. If there is no *, require a direct character match.
In this case, both pointers move forward:
first_match and dp(i + 1, j + 1)
- Store results in a memoization cache.
Before computing a state, check whether it has already been solved.
This prevents repeated recursive exploration and guarantees polynomial complexity.
Why it works
The algorithm works because every recursive state precisely represents the question:
"Does s[i:] match p[j:]?"
For every pattern position, the algorithm explores all legally valid interpretations of *, either consuming characters or skipping the pattern segment entirely. Since every state is computed once and cached, the recursion eventually resolves every possible matching configuration without redundant work. This guarantees correctness and efficiency.
Python Solution
class Solution:
def isMatch(self, s: str, p: str) -> bool:
memo = {}
def dp(i: int, j: int) -> bool:
if (i, j) in memo:
return memo[(i, j)]
if j == len(p):
return i == len(s)
first_match = (
i < len(s) and
(s[i] == p[j] or p[j] == '.')
)
if j + 1 < len(p) and p[j + 1] == '*':
result = (
dp(i, j + 2) or
(first_match and dp(i + 1, j))
)
else:
result = (
first_match and
dp(i + 1, j + 1)
)
memo[(i, j)] = result
return result
return dp(0, 0)
The implementation begins by creating a memoization dictionary named memo. Each key is a tuple (i, j) representing positions in the string and pattern.
The nested dp function performs the recursive matching logic. The first operation checks whether the state has already been computed. If so, the cached result is returned immediately.
The base case occurs when the pattern pointer reaches the end of the pattern. At that point, the string pointer must also be at the end for the match to succeed.
The first_match variable determines whether the current characters are compatible. This includes support for the wildcard ..
The most important part of the implementation is handling *. Since * applies to the preceding character, the code checks p[j + 1].
Two recursive possibilities are explored:
- Skip the entire
x* - Consume one character and continue matching against the same pattern position
If no * exists, the solution performs ordinary character matching and advances both pointers.
Finally, the computed result is stored in the memoization dictionary before returning.
Go Solution
func isMatch(s string, p string) bool {
type State struct {
i int
j int
}
memo := make(map[State]bool)
visited := make(map[State]bool)
var dp func(i int, j int) bool
dp = func(i int, j int) bool {
state := State{i, j}
if visited[state] {
return memo[state]
}
visited[state] = true
if j == len(p) {
memo[state] = (i == len(s))
return memo[state]
}
firstMatch := (
i < len(s) &&
(s[i] == p[j] || p[j] == '.')
)
var result bool
if j+1 < len(p) && p[j+1] == '*' {
result =
dp(i, j+2) ||
(firstMatch && dp(i+1, j))
} else {
result =
firstMatch &&
dp(i+1, j+1)
}
memo[state] = result
return result
}
return dp(0, 0)
}
The Go implementation follows the same recursive memoized structure as the Python solution. Since Go does not allow tuples as map keys directly in the same way Python does, a custom State struct is used to represent (i, j).
Go maps cannot distinguish between an unset key and a stored false value, so a separate visited map is used to track whether a state has already been computed.
String indexing in Go operates on bytes, which is safe here because the problem guarantees lowercase English letters and special ASCII characters only.
Worked Examples
Example 1
Input:
s = "aa"
p = "a"
Expected output:
false
Trace
| Step | i | j | s[i:] | p[j:] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 0 | "aa" | "a" | Characters match |
| 2 | 1 | 1 | "a" | "" | Pattern ended |
| 3 | 1 | 1 | "a" | "" | String not finished, return false |
Final result:
false
Example 2
Input:
s = "aa"
p = "a*"
Expected output:
true
Trace
| Step | i | j | s[i:] | p[j:] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 0 | "aa" | "a*" | * detected |
| 2 | 0 | 2 | "aa" | "" | Skip a*, fails |
| 3 | 1 | 0 | "a" | "a*" | Consume one a |
| 4 | 2 | 0 | "" | "a*" | Consume second a |
| 5 | 2 | 2 | "" | "" | Full match |
Final result:
true
Example 3
Input:
s = "ab"
p = ".*"
Expected output:
true
Trace
| Step | i | j | s[i:] | p[j:] | Action |
|---|---|---|---|---|---|
| 1 | 0 | 0 | "ab" | ".*" | . matches a |
| 2 | 1 | 0 | "b" | ".*" | . matches b |
| 3 | 2 | 0 | "" | ".*" | Cannot consume more |
| 4 | 2 | 2 | "" | "" | Skip .*, success |
Final result:
true
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(m × n) | Each (i, j) state is computed once |
| Space | O(m × n) | Memoization cache stores all states |
The recursive function depends only on the pair of indices (i, j). Since i ranges from 0 to m and j ranges from 0 to n, there are at most m × n unique states. Each state performs constant work outside recursive calls, resulting in polynomial complexity.
The recursion stack depth is at most O(m + n), but the memoization table dominates the total space usage.
Test Cases
solution = Solution()
assert solution.isMatch("aa", "a") == False # exact mismatch
assert solution.isMatch("aa", "a*") == True # star repetition
assert solution.isMatch("ab", ".*") == True # wildcard any string
assert solution.isMatch("aab", "c*a*b") == True # multiple star groups
assert solution.isMatch("mississippi", "mis*is*p*.") == False # complex failure
assert solution.isMatch("abc", "abc") == True # exact full match
assert solution.isMatch("abc", "abd") == False # exact mismatch
assert solution.isMatch("aaa", "a*a") == True # star with trailing char
assert solution.isMatch("aaa", "ab*a*c*a") == True # nested star usage
assert solution.isMatch("a", ".") == True # single wildcard
assert solution.isMatch("abcd", "d*") == False # pattern cannot cover string
assert solution.isMatch("", "a*") == True # empty string matched by star
assert solution.isMatch("", ".*") == True # empty string via wildcard star
assert solution.isMatch("", "") == True # both empty
assert solution.isMatch("bbbba", ".*a*a") == True # greedy trap case
assert solution.isMatch("aaaaaaaaaaaaab", "a*a*a*a*a*a*a*a*a*a*c") == False # stress branching
| Test | Why |
|---|---|
"aa", "a" |
Verifies exact mismatch |
"aa", "a*" |
Verifies repetition handling |
"ab", ".*" |
Verifies wildcard matching |
"aab", "c*a*b" |
Tests multiple star segments |
"mississippi", "mis*is*p*." |
Tests complex failing pattern |
"abc", "abc" |
Tests exact matching |
"abc", "abd" |
Tests ordinary mismatch |
"aaa", "a*a" |
Tests star plus trailing character |
"aaa", "ab*a*c*a" |
Tests skipping star groups |
"a", "." |
Tests single wildcard |
"abcd", "d*" |
Tests unmatched prefix |
"", "a*" |
Tests empty string with repetition |
"", ".*" |
Tests empty string with wildcard |
"", "" |
Tests fully empty inputs |
"bbbba", ".*a*a" |
Tests backtracking correctness |
| Long repeated pattern case | Tests memoization efficiency |
Edge Cases
One important edge case is when the pattern can match an empty string. Patterns like "a*", "a*b*", or ".*" may legally consume zero characters. A naive implementation often forgets to allow skipping these sections. This solution handles the case correctly through the recursive branch dp(i, j + 2), which skips the entire x* segment.
Another critical edge case involves greedy matching traps. For example, in "bbbba" matched against ".*a*a", consuming too many characters initially can prevent the remaining pattern from matching correctly. A purely greedy algorithm would fail here. The recursive exploration ensures both possibilities are considered, consuming characters or stopping consumption when needed.
A third important edge case is reaching the end of the pattern before the string is fully consumed. For example, "aa" versus "a" must return false. The base case explicitly checks that both indices reach their respective ends simultaneously, ensuring partial matches are rejected.
A final subtle case occurs with repeated recursive states. Patterns with many * operators can cause exponential recomputation in naive recursion. The memoization cache prevents this by storing every solved (i, j) state, guaranteeing efficient execution even for highly branching inputs.