LeetCode 3455 - Shortest Matching Substring
This problem asks us to find the length of the shortest substring in a string s that matches a pattern p, where p contains exactly two '' wildcards. Each '' can match zero or more characters.
Difficulty: 🔴 Hard
Topics: Two Pointers, String, Binary Search, String Matching
Solution
Problem Understanding
This problem asks us to find the length of the shortest substring in a string s that matches a pattern p, where p contains exactly two '*' wildcards. Each '*' can match zero or more characters. The substring of s must contain all the letters in p in the correct order, with the wildcards allowing any sequence of characters in between. The output is the length of that substring, or -1 if no substring matches.
The inputs are:
s, a string of lowercase English letters with length up to10^5.p, a pattern string of lowercase English letters and exactly two'*', with length up to10^5.
The output is a single integer, the minimal length of a substring in s that matches p. If the pattern can match the empty substring (e.g., "**"), the output should be 0. The constraints imply that brute-force substring checking will be too slow, so an efficient approach is necessary.
Edge cases include:
pis just"**"- the shortest match is the empty substring.- Characters around
'*'might not exist ins, leading to no match. - Patterns with consecutive characters that never appear consecutively in
s. - Minimal-length matches could be at the start, end, or middle of
s.
Approaches
The brute-force approach involves generating all possible substrings of s and checking whether they match p. For each substring, we would simulate the pattern matching, including wildcards. This guarantees correctness because it exhaustively checks all substrings, but it is extremely inefficient. With O(n^2) substrings and O(m) matching for each, the total complexity would be roughly O(n^2 * m), which is infeasible for n, m ~ 10^5.
The optimal approach leverages the fact that p has exactly two wildcards, which allows us to break p into three fixed segments: the prefix before the first '*', the middle between the two '*', and the suffix after the second '*'. Then we can use a two-pointer technique to find valid positions of these segments in s efficiently. Essentially, we precompute the earliest end positions of each segment starting from each index, and then combine them to find the minimal-length substring. This reduces the problem to O(n) or O(n*m_segment) depending on implementation.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2 * m) | O(1) | Check all substrings of s against p using standard wildcard matching |
| Optimal | O(n) | O(n) | Use two-pointer scan from left and right to locate minimal-length match using the three segments of p |
Algorithm Walkthrough
- Split the pattern
pinto three segments. Identify the positions of the two'*'characters and extract the fixed substrings:part1before the first'*',part2between the two'*', andpart3after the second'*'. - Handle the trivial
"**"pattern. If bothpart1andpart2andpart3are empty, return0immediately because the empty substring matches. - Scan
sfrom left to right to find the earliest positions wherepart1andpart2can appear in order. For each index, use a pointer to see ifpart1matches starting at that index; then continue scanning forpart2afterpart1. - Scan
sfrom right to left to find the latest positions wherepart3can appear. This ensures that the substring can accommodate the suffix. - Combine positions. For each possible start index of
part1, find the earliest end index ofpart3that occurs after the start ofpart1and the occurrence ofpart2in between. Compute the length of this substring and update the minimum length. - Return the minimum length found. If no valid substring exists, return
-1.
Why it works: The invariant is that each candidate substring contains all three fixed segments of p in the correct order. Since wildcards can match any number of characters, only the order of segments matters. Scanning left-to-right for prefix and middle and right-to-left for suffix guarantees we find the minimal-length substring that contains all segments in order.
Python Solution
class Solution:
def shortestMatchingSubstring(self, s: str, p: str) -> int:
first_star = p.find('*')
last_star = p.rfind('*')
part1 = p[:first_star]
part2 = p[first_star+1:last_star]
part3 = p[last_star+1:]
if not part1 and not part2 and not part3:
return 0 # "**" matches empty substring
n = len(s)
min_len = float('inf')
# Left to right pass to find earliest positions for part1 and part2
left_pos = [-1] * 3
i = 0
for idx, part in enumerate([part1, part2, part3]):
if not part:
left_pos[idx] = i
continue
while i <= n - len(part):
if s[i:i+len(part)] == part:
left_pos[idx] = i
i += len(part)
break
i += 1
else:
left_pos[idx] = -1
# Right to left pass to find latest positions for part1, part2, part3
right_pos = [-1] * 3
j = n
for idx, part in reversed(list(enumerate([part1, part2, part3]))):
if not part:
right_pos[idx] = j
continue
while j >= len(part):
if s[j-len(part):j] == part:
right_pos[idx] = j
j -= len(part)
break
j -= 1
else:
right_pos[idx] = -1
# If any part is not found, return -1
if -1 in left_pos or -1 in right_pos:
return -1
# Minimal substring length from first part start to last part end
min_len = right_pos[2] - left_pos[0]
return min_len
Implementation Explanation: The code first splits the pattern into three segments and handles the trivial "**" case. It then performs a left-to-right scan to locate the earliest positions of the prefix and middle segments. A right-to-left scan locates the latest positions of the suffix. If all parts exist in s, the difference between the earliest prefix start and latest suffix end gives the minimal substring length.
Go Solution
func shortestMatchingSubstring(s string, p string) int {
firstStar := -1
lastStar := -1
for i, c := range p {
if c == '*' {
if firstStar == -1 {
firstStar = i
}
lastStar = i
}
}
part1 := p[:firstStar]
part2 := p[firstStar+1:lastStar]
part3 := p[lastStar+1:]
if part1 == "" && part2 == "" && part3 == "" {
return 0
}
n := len(s)
leftPos := [3]int{-1, -1, -1}
i := 0
parts := []string{part1, part2, part3}
for idx, part := range parts {
if part == "" {
leftPos[idx] = i
continue
}
for i <= n-len(part) {
if s[i:i+len(part)] == part {
leftPos[idx] = i
i += len(part)
break
}
i++
}
if leftPos[idx] == -1 {
return -1
}
}
rightPos := [3]int{-1, -1, -1}
j := n
for idx := 2; idx >= 0; idx-- {
part := parts[idx]
if part == "" {
rightPos[idx] = j
continue
}
for j >= len(part) {
if s[j-len(part):j] == part {
rightPos[idx] = j
j -= len(part)
break
}
j--
}
if rightPos[idx] == -1 {
return -1
}
}
return rightPos[2] - leftPos[0]
}
Go Notes: Similar logic applies. Slices are used for substring extraction. Care is taken with slice bounds (j-len(part):j) when scanning from right. Go does not have float('inf'), so we do not need it; we directly compute the substring length after confirming all parts exist.
Worked Examples
Example 1: s = "abaacbaecebce", p = "ba_c_ce"
Segments: part1="ba", part2="c", part3="ce"
Left-to-right scan:
| Part | Start