LeetCode 1397 - Find All Good Strings

LeetCode Problem 1397

Difficulty: 🔴 Hard
Topics: String, Dynamic Programming, String Matching

Solution

Problem Understanding

The problem asks us to count how many strings of length n satisfy three conditions simultaneously:

The string must be lexicographically greater than or equal to s1
The string must be lexicographically less than or equal to s2
The string must not contain evil as a substring anywhere

The result can become extremely large, so we return it modulo 10^9 + 7.

Lexicographical comparison works the same way dictionary ordering works. For example:

"abc" < "abd"
"baa" > "azz"

The input consists of:

n, the required length of every generated string
s1, the lower lexicographical bound
s2, the upper lexicographical bound
evil, a forbidden substring

We must count every valid string between s1 and s2 inclusive that never contains evil.

The constraints are the most important clue for choosing the correct algorithm:

n can be as large as 500
evil.length can be up to 50

A brute-force enumeration of all possible strings is impossible because there are 26^500 possible strings in the worst case. Even generating all strings between s1 and s2 is astronomically large.

The problem combines several advanced ideas:

Digit DP style boundary constraints
String matching
Dynamic programming with memoization
KMP prefix function for efficient substring tracking

Several edge cases are important:

evil may appear immediately at the beginning
evil may overlap with itself, requiring careful prefix handling
s1 and s2 may be extremely close
Every possible string in the range may be invalid
evil may have length 1, making filtering especially restrictive

The problem guarantees:

s1 <= s2
All strings contain lowercase English letters only
Both bounds already have length n

This allows us to focus entirely on constrained string generation.

Approaches

Brute Force Approach

The brute-force solution would generate every string between s1 and s2, then check whether each string contains evil.

To check containment, we could use normal substring search. Since every generated string has length n, checking one string costs O(n).

However, the number of candidate strings is enormous. In the worst case there are 26^n possibilities.

For n = 500, brute force is completely infeasible.

Even if we optimized substring checking with KMP, the generation itself remains impossible.

Key Insight

The key observation is that we never need to construct all strings explicitly.

Instead, we can build strings character by character using dynamic programming.

At every position, the future possibilities depend only on:

The current index
Whether we are still tied to the lower bound
Whether we are still tied to the upper bound
How much of evil has already been matched

The last component is critical.

Suppose evil = "abab".

If the current suffix of our generated string is "aba", then adding 'b' would complete the forbidden substring. We therefore need efficient tracking of partial matches.

This is exactly what the KMP prefix function provides.

We combine:

Digit DP for lexicographical constraints
KMP automaton for forbidden substring tracking

This creates a compact state space that is efficient enough for the constraints.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(26^n × n)	O(n)	Generates all candidate strings and checks each for `evil`
Optimal	O(n × m × 26 × 4)	O(n × m × 4)	DP with KMP automaton and lexicographical bounds

Here:

n = len(s1)
m = len(evil)

Algorithm Walkthrough

Step 1: Build the KMP Prefix Table

We first compute the KMP longest prefix suffix array for evil.

For every position i, lps[i] stores the length of the longest proper prefix of evil[:i+1] that is also a suffix.

This allows us to efficiently transition between partial matches when adding new characters.

For example, if:

evil = "abab"

Then:

lps = [0, 0, 1, 2]

This table prevents us from restarting matching from scratch every time a mismatch occurs.

Step 2: Define DP State

We define a memoized DFS:

dfs(position, matched, tightLow, tightHigh)

Where:

position is the current index in the string
matched is how many characters of evil currently match
tightLow indicates whether we still match the prefix of s1
tightHigh indicates whether we still match the prefix of s2

This state fully determines all future choices.

Step 3: Handle Forbidden State

If:

matched == len(evil)

then we already formed evil.

This path is invalid and contributes 0.

Step 4: Handle Completed String

If:

position == n

then we successfully built a valid string without matching evil.

Return 1.

Step 5: Determine Allowed Character Range

At each position, the valid character range depends on boundary constraints.

If tightLow is true, the minimum allowed character is:

s1[position]

Otherwise it is 'a'.

Similarly:

if tightHigh is true, maximum is s2[position]
otherwise maximum is 'z'

Step 6: Try Every Possible Character

For each candidate character:

Update KMP matching state
Update lower-bound tightness
Update upper-bound tightness
Recurse

The KMP transition works like this:

While mismatch exists, fall back using lps
If current character matches, increase matched length

This efficiently maintains substring matching information.

Step 7: Memoize Results

Many states repeat.

Memoization ensures each state is computed once.

Without memoization, recursion would still explode exponentially.

Why it works

The algorithm works because the DP state captures every piece of information needed to determine future valid strings.

The KMP state guarantees we always know whether adding future characters can complete evil. The tight boundary flags guarantee every generated string remains within [s1, s2].

Since every recursive call explores all valid next characters exactly once, and invalid states are discarded immediately, the algorithm counts precisely all good strings.

Python Solution

from functools import lru_cache
from typing import List

class Solution:
    def findGoodStrings(self, n: int, s1: str, s2: str, evil: str) -> int:
        MOD = 10**9 + 7
        m = len(evil)

        # Build KMP LPS array
        lps = [0] * m
        length = 0

        for i in range(1, m):
            while length > 0 and evil[i] != evil[length]:
                length = lps[length - 1]

            if evil[i] == evil[length]:
                length += 1
                lps[i] = length

        @lru_cache(None)
        def dfs(pos: int, matched: int, tight_low: bool, tight_high: bool) -> int:
            # Found forbidden substring
            if matched == m:
                return 0

            # Successfully built valid string
            if pos == n:
                return 1

            low_char = s1[pos] if tight_low else 'a'
            high_char = s2[pos] if tight_high else 'z'

            total = 0

            for code in range(ord(low_char), ord(high_char) + 1):
                ch = chr(code)

                # KMP transition
                next_match = matched

                while next_match > 0 and evil[next_match] != ch:
                    next_match = lps[next_match - 1]

                if evil[next_match] == ch:
                    next_match += 1

                next_tight_low = tight_low and (ch == low_char)
                next_tight_high = tight_high and (ch == high_char)

                total += dfs(
                    pos + 1,
                    next_match,
                    next_tight_low,
                    next_tight_high
                )

                total %= MOD

            return total

        return dfs(0, 0, True, True)

The implementation starts by constructing the KMP prefix table for evil. This enables efficient substring tracking during DP transitions.

The recursive function dfs represents the main dynamic programming logic.

The parameters capture:

current index
current partial match length
lower bound status
upper bound status

At each position, the code computes the allowed character range according to the boundary constraints.

For every candidate character, the KMP automaton updates the matched prefix length. If a full match of evil is reached, that recursive path immediately becomes invalid.

Memoization via lru_cache ensures repeated states are computed only once.

Modulo arithmetic is applied after every addition to prevent overflow.

Go Solution

package main

func findGoodStrings(n int, s1 string, s2 string, evil string) int {
	const MOD int = 1_000_000_007

	m := len(evil)

	// Build KMP LPS array
	lps := make([]int, m)

	length := 0
	for i := 1; i < m; i++ {
		for length > 0 && evil[i] != evil[length] {
			length = lps[length-1]
		}

		if evil[i] == evil[length] {
			length++
			lps[i] = length
		}
	}

	type State struct {
		pos        int
		matched    int
		tightLow   bool
		tightHigh  bool
	}

	memo := make(map[State]int)

	var dfs func(int, int, bool, bool) int

	dfs = func(pos int, matched int, tightLow bool, tightHigh bool) int {
		if matched == m {
			return 0
		}

		if pos == n {
			return 1
		}

		state := State{pos, matched, tightLow, tightHigh}

		if val, ok := memo[state]; ok {
			return val
		}

		lowChar := byte('a')
		highChar := byte('z')

		if tightLow {
			lowChar = s1[pos]
		}

		if tightHigh {
			highChar = s2[pos]
		}

		total := 0

		for ch := lowChar; ch <= highChar; ch++ {
			nextMatch := matched

			for nextMatch > 0 && evil[nextMatch] != ch {
				nextMatch = lps[nextMatch-1]
			}

			if evil[nextMatch] == ch {
				nextMatch++
			}

			nextTightLow := tightLow && (ch == lowChar)
			nextTightHigh := tightHigh && (ch == highChar)

			total += dfs(
				pos+1,
				nextMatch,
				nextTightLow,
				nextTightHigh,
			)

			total %= MOD
		}

		memo[state] = total
		return total
	}

	return dfs(0, 0, true, true)
}

The Go implementation follows the same logic as the Python version but uses an explicit map[State]int for memoization.

Since Go does not have built-in memoized recursion decorators like Python's lru_cache, we define a custom State struct as the map key.

Boolean values are included directly in the memoization state.

Modulo arithmetic is especially important in Go because integer overflow is easier to encounter during accumulation.

Worked Examples

Example 1

Input:
n = 2
s1 = "aa"
s2 = "da"
evil = "b"

Since evil = "b", any string containing 'b' is invalid.

KMP Table

Index	Character	LPS
0	b	0

Initial State

dfs(0, 0, True, True)

At position 0:

lower bound = 'a'
upper bound = 'd'

Possible choices:

Character	Valid?	Reason
a	Yes	does not match evil
b	No	immediately forms evil
c	Yes	safe
d	Yes	safe

Branch: "a"

Next state:

dfs(1, 0, True, False)

At position 1:

lower bound = 'a'
upper bound = 'z'

All letters except 'b' are valid.

Count:

Branch: "c"

Again 25 valid second characters.

Branch: "d"

Because upper bound remains tight:

dfs(1, 0, False, True)

Allowed range is only 'a'.

Count:

Total:

25 + 25 + 1 = 51

Example 2

Input:
n = 8
s1 = "leetcode"
s2 = "leetgoes"
evil = "leet"

Every valid string between the bounds must begin with "leet".

As soon as the DP matches all 4 characters of evil, it returns 0.

Therefore no valid string exists.

Output:

Example 3

Input:
n = 2
s1 = "gx"
s2 = "gz"
evil = "x"

Candidate strings:

String	Contains x?	Valid
gx	Yes	No
gy	No	Yes
gz	No	Yes

Answer:

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n × m × 26 × 4)	DP states multiplied by alphabet transitions
Space	O(n × m × 4)	Memoization table plus recursion stack

The DP state consists of:

n positions
m KMP match states
2 possibilities for lower tightness
2 possibilities for upper tightness

Each state tries at most 26 characters.

Since m <= 50, the solution is efficient enough even for n = 500.

Test Cases

sol = Solution()

# Provided examples
assert sol.findGoodStrings(2, "aa", "da", "b") == 51  # sample case
assert sol.findGoodStrings(8, "leetcode", "leetgoes", "leet") == 0  # all invalid
assert sol.findGoodStrings(2, "gx", "gz", "x") == 2  # small range

# Single character bounds
assert sol.findGoodStrings(1, "a", "z", "a") == 25  # exclude one letter
assert sol.findGoodStrings(1, "a", "a", "b") == 1  # exact valid string
assert sol.findGoodStrings(1, "a", "a", "a") == 0  # exact invalid string

# Evil longer than possible appearance
assert sol.findGoodStrings(2, "aa", "zz", "abc") == 26 * 26  # impossible to contain evil

# Tight range
assert sol.findGoodStrings(2, "ab", "ab", "c") == 1  # exact valid
assert sol.findGoodStrings(2, "ab", "ab", "ab") == 0  # exact invalid

# Overlapping evil pattern
assert sol.findGoodStrings(4, "aaaa", "zzzz", "aa") >= 0  # overlapping forbidden substring

# Evil length 1
assert sol.findGoodStrings(3, "aaa", "zzz", "z") >= 0  # many exclusions

# Entire alphabet range
assert sol.findGoodStrings(2, "aa", "zz", "zz") >= 0  # forbidden suffix case

print("All tests passed.")

Test	Why
`(2, "aa", "da", "b")`	Validates standard DP counting
`(8, "leetcode", "leetgoes", "leet")`	Tests immediate forbidden prefix
`(2, "gx", "gz", "x")`	Tests narrow range filtering
`(1, "a", "z", "a")`	Tests single-character evil
`(1, "a", "a", "b")`	Tests exact valid boundary
`(1, "a", "a", "a")`	Tests exact invalid boundary
`evil longer than n`	Ensures impossible matches are handled
`tight equal bounds`	Tests fully constrained generation
`overlapping evil`	Validates KMP fallback correctness
`evil = "z"`	Tests many forbidden states
`evil = "zz"`	Tests forbidden suffix handling

Edge Cases

Evil Appears as a Prefix

Consider:

s1 = "leetcode"
s2 = "leetgoes"
evil = "leet"

Every valid candidate starts with "leet".

A naive solution might continue exploring deeper recursion before detecting invalidity. The KMP state immediately reaches a full match after processing the fourth character, causing the DP to terminate that branch early.

This prevents wasted computation.

Overlapping Forbidden Patterns

Consider:

evil = "abab"

Suppose the current generated suffix is:

"aba"

If the next character is 'b', the forbidden substring appears.

However, after partial mismatches, we cannot simply reset matching to zero because overlaps matter.

The KMP prefix table correctly transitions between partial matches, ensuring overlapping structures are handled efficiently and correctly.

Very Tight Bounds

Consider:

s1 = "abc"
s2 = "abc"

There is only one possible string.

A common bug is incorrectly relaxing boundary constraints too early.

The tightLow and tightHigh flags guarantee that if the current prefix still matches the bounds, future characters remain constrained appropriately.

This ensures exact-range cases work correctly.