LeetCode 10 - Regular Expression Matching

Difficulty: 🔴 Hard
Topics: String, Dynamic Programming, Recursion

Solution

Problem Understanding

LeetCode 10, Regular Expression Matching, asks us to determine whether an entire input string s matches a pattern p. The pattern supports two special regular expression characters:

. matches any single character
* matches zero or more occurrences of the character immediately before it

The important detail is that the match must cover the entire string, not just a substring. This means we cannot stop early when part of the string matches. Every character in s must be consumed by the pattern, and every meaningful part of the pattern must participate in the match.

For example:

"aa" does not match "a" because the pattern only matches one character
"aa" matches "a*" because * allows repeating 'a'
"ab" matches ".*" because . matches any character and * allows unlimited repetitions

The input consists of two strings:

s, the text we are trying to match
p, the regex-style pattern

The constraints are small:

1 <= s.length <= 20
1 <= p.length <= 20

Although the input size is small, the recursive branching introduced by * can still produce exponential behavior if implemented naively. This makes dynamic programming or memoization the preferred approach.

Several edge cases are especially important:

Patterns like "a*" can match an empty sequence
".*" can match almost anything
Multiple * operators may appear consecutively in logical terms, such as "a*b*c*"
The algorithm must distinguish between consuming a character and skipping a * group entirely
Greedy matching alone is insufficient because sometimes taking fewer repetitions is necessary for the full match to succeed

The problem guarantees that every * always has a valid preceding character, so invalid patterns like "*a" never appear.

Approaches

Brute Force Recursive Backtracking

The most direct solution is recursive backtracking. At every position, we compare the current character in s and p.

If the next pattern character is not *, then the current characters must match directly, either because they are equal or because the pattern contains .. We then recursively move to the next positions in both strings.

If the next pattern character is *, we have two possible choices:

Skip the entire x* pattern and move forward in the pattern
Use one occurrence of x* if the current characters match, then continue matching the same pattern position

This brute force approach is correct because it explores every valid interpretation of *. However, many states repeat. The same (i, j) position pair can be recomputed many times, leading to exponential time complexity.

Dynamic Programming with Memoization

The key insight is that the recursive state is fully determined by two indices:

i, current position in s
j, current position in p

Once we know whether s[i:] matches p[j:], we never need to recompute it again.

This observation makes memoization possible. We store previously computed results for each (i, j) pair. Since there are at most len(s) * len(p) distinct states, the complexity becomes polynomial.

This transforms an exponential recursive search into an efficient dynamic programming solution.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(2^(m+n))	O(m+n)	Explores all recursive possibilities repeatedly
Optimal	O(m × n)	O(m × n)	Memoized recursion avoids recomputation

Here, m = len(s) and n = len(p).

Algorithm Walkthrough

Optimal Dynamic Programming Algorithm

Define a recursive function dp(i, j).

This function returns whether the substring s[i:] matches the pattern substring p[j:].

Using indices instead of slicing strings avoids unnecessary memory allocations and keeps the solution efficient. 2. Handle the base case.

If j reaches the end of the pattern, the match succeeds only if i also reaches the end of the string.

This ensures the entire string is matched, not just a prefix. 3. Determine whether the current characters match.

A match exists if:

i < len(s), and
either s[i] == p[j] or p[j] == '.'

This logic handles both exact character matching and wildcard matching. 4. Check whether the next pattern character is *.

This is done by verifying:

j + 1 < len(p) and p[j + 1] == '*'

If a * exists, there are two possibilities:

Skip the x* entirely:

dp(i, j + 2)

Use one occurrence of x* if the current characters match:

first_match and dp(i + 1, j)

Notice that the second option keeps j unchanged because * may continue consuming characters. 5. If there is no *, require a direct character match.

In this case, both pointers move forward:

first_match and dp(i + 1, j + 1)

Store results in a memoization cache.

Before computing a state, check whether it has already been solved.

This prevents repeated recursive exploration and guarantees polynomial complexity.

Why it works

The algorithm works because every recursive state precisely represents the question:

"Does s[i:] match p[j:]?"

For every pattern position, the algorithm explores all legally valid interpretations of *, either consuming characters or skipping the pattern segment entirely. Since every state is computed once and cached, the recursion eventually resolves every possible matching configuration without redundant work. This guarantees correctness and efficiency.

Python Solution

class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        memo = {}

        def dp(i: int, j: int) -> bool:
            if (i, j) in memo:
                return memo[(i, j)]

            if j == len(p):
                return i == len(s)

            first_match = (
                i < len(s) and
                (s[i] == p[j] or p[j] == '.')
            )

            if j + 1 < len(p) and p[j + 1] == '*':
                result = (
                    dp(i, j + 2) or
                    (first_match and dp(i + 1, j))
                )
            else:
                result = (
                    first_match and
                    dp(i + 1, j + 1)
                )

            memo[(i, j)] = result
            return result

        return dp(0, 0)

The implementation begins by creating a memoization dictionary named memo. Each key is a tuple (i, j) representing positions in the string and pattern.

The nested dp function performs the recursive matching logic. The first operation checks whether the state has already been computed. If so, the cached result is returned immediately.

The base case occurs when the pattern pointer reaches the end of the pattern. At that point, the string pointer must also be at the end for the match to succeed.

The first_match variable determines whether the current characters are compatible. This includes support for the wildcard ..

The most important part of the implementation is handling *. Since * applies to the preceding character, the code checks p[j + 1].

Two recursive possibilities are explored:

Skip the entire x*
Consume one character and continue matching against the same pattern position

If no * exists, the solution performs ordinary character matching and advances both pointers.

Finally, the computed result is stored in the memoization dictionary before returning.

Go Solution

func isMatch(s string, p string) bool {
    type State struct {
        i int
        j int
    }

    memo := make(map[State]bool)
    visited := make(map[State]bool)

    var dp func(i int, j int) bool

    dp = func(i int, j int) bool {
        state := State{i, j}

        if visited[state] {
            return memo[state]
        }

        visited[state] = true

        if j == len(p) {
            memo[state] = (i == len(s))
            return memo[state]
        }

        firstMatch := (
            i < len(s) &&
                (s[i] == p[j] || p[j] == '.')
        )

        var result bool

        if j+1 < len(p) && p[j+1] == '*' {
            result =
                dp(i, j+2) ||
                    (firstMatch && dp(i+1, j))
        } else {
            result =
                firstMatch &&
                    dp(i+1, j+1)
        }

        memo[state] = result
        return result
    }

    return dp(0, 0)
}

The Go implementation follows the same recursive memoized structure as the Python solution. Since Go does not allow tuples as map keys directly in the same way Python does, a custom State struct is used to represent (i, j).

Go maps cannot distinguish between an unset key and a stored false value, so a separate visited map is used to track whether a state has already been computed.

String indexing in Go operates on bytes, which is safe here because the problem guarantees lowercase English letters and special ASCII characters only.

Worked Examples

Example 1

Input:

s = "aa"
p = "a"

Expected output:

false

Trace

Step	i	j	s[i:]	p[j:]	Action
1	0	0	"aa"	"a"	Characters match
2	1	1	"a"	""	Pattern ended
3	1	1	"a"	""	String not finished, return false

Final result:

false

Example 2

Input:

s = "aa"
p = "a*"

Expected output:

true

Trace

Step	i	j	s[i:]	p[j:]	Action
1	0	0	"aa"	"a*"	`*` detected
2	0	2	"aa"	""	Skip `a*`, fails
3	1	0	"a"	"a*"	Consume one `a`
4	2	0	""	"a*"	Consume second `a`
5	2	2	""	""	Full match

Final result:

true

Example 3

Input:

s = "ab"
p = ".*"

Expected output:

true

Trace

Step	i	j	s[i:]	p[j:]	Action
1	0	0	"ab"	".*"	`.` matches `a`
2	1	0	"b"	".*"	`.` matches `b`
3	2	0	""	".*"	Cannot consume more
4	2	2	""	""	Skip `.*`, success

Final result:

true

Complexity Analysis

Measure	Complexity	Explanation
Time	O(m × n)	Each `(i, j)` state is computed once
Space	O(m × n)	Memoization cache stores all states

The recursive function depends only on the pair of indices (i, j). Since i ranges from 0 to m and j ranges from 0 to n, there are at most m × n unique states. Each state performs constant work outside recursive calls, resulting in polynomial complexity.

The recursion stack depth is at most O(m + n), but the memoization table dominates the total space usage.

Test Cases

solution = Solution()

assert solution.isMatch("aa", "a") == False      # exact mismatch
assert solution.isMatch("aa", "a*") == True      # star repetition
assert solution.isMatch("ab", ".*") == True      # wildcard any string

assert solution.isMatch("aab", "c*a*b") == True  # multiple star groups
assert solution.isMatch("mississippi", "mis*is*p*.") == False  # complex failure

assert solution.isMatch("abc", "abc") == True    # exact full match
assert solution.isMatch("abc", "abd") == False   # exact mismatch

assert solution.isMatch("aaa", "a*a") == True    # star with trailing char
assert solution.isMatch("aaa", "ab*a*c*a") == True  # nested star usage

assert solution.isMatch("a", ".") == True        # single wildcard
assert solution.isMatch("abcd", "d*") == False   # pattern cannot cover string

assert solution.isMatch("", "a*") == True        # empty string matched by star
assert solution.isMatch("", ".*") == True        # empty string via wildcard star
assert solution.isMatch("", "") == True          # both empty

assert solution.isMatch("bbbba", ".*a*a") == True  # greedy trap case
assert solution.isMatch("aaaaaaaaaaaaab", "a*a*a*a*a*a*a*a*a*a*c") == False  # stress branching

Test	Why
`"aa", "a"`	Verifies exact mismatch
`"aa", "a*"`	Verifies repetition handling
`"ab", ".*"`	Verifies wildcard matching
`"aab", "cab"`	Tests multiple star segments
`"mississippi", "misisp*."`	Tests complex failing pattern
`"abc", "abc"`	Tests exact matching
`"abc", "abd"`	Tests ordinary mismatch
`"aaa", "a*a"`	Tests star plus trailing character
`"aaa", "abac*a"`	Tests skipping star groups
`"a", "."`	Tests single wildcard
`"abcd", "d*"`	Tests unmatched prefix
`"", "a*"`	Tests empty string with repetition
`"", ".*"`	Tests empty string with wildcard
`"", ""`	Tests fully empty inputs
`"bbbba", ".aa"`	Tests backtracking correctness
Long repeated pattern case	Tests memoization efficiency

Edge Cases

One important edge case is when the pattern can match an empty string. Patterns like "a*", "a*b*", or ".*" may legally consume zero characters. A naive implementation often forgets to allow skipping these sections. This solution handles the case correctly through the recursive branch dp(i, j + 2), which skips the entire x* segment.

Another critical edge case involves greedy matching traps. For example, in "bbbba" matched against ".*a*a", consuming too many characters initially can prevent the remaining pattern from matching correctly. A purely greedy algorithm would fail here. The recursive exploration ensures both possibilities are considered, consuming characters or stopping consumption when needed.

A third important edge case is reaching the end of the pattern before the string is fully consumed. For example, "aa" versus "a" must return false. The base case explicitly checks that both indices reach their respective ends simultaneously, ensuring partial matches are rejected.

A final subtle case occurs with repeated recursive states. Patterns with many * operators can cause exponential recomputation in naive recursion. The memoization cache prevents this by storing every solved (i, j) state, guaranteeing efficient execution even for highly branching inputs.