LeetCode 3455 - Shortest Matching Substring

This problem asks us to find the length of the shortest substring in a string s that matches a pattern p, where p contains exactly two '' wildcards. Each '' can match zero or more characters.

LeetCode Problem 3455

Difficulty: 🔴 Hard
Topics: Two Pointers, String, Binary Search, String Matching

Solution

Problem Understanding

This problem asks us to find the length of the shortest substring in a string s that matches a pattern p, where p contains exactly two '*' wildcards. Each '*' can match zero or more characters. The substring of s must contain all the letters in p in the correct order, with the wildcards allowing any sequence of characters in between. The output is the length of that substring, or -1 if no substring matches.

The inputs are:

  • s, a string of lowercase English letters with length up to 10^5.
  • p, a pattern string of lowercase English letters and exactly two '*', with length up to 10^5.

The output is a single integer, the minimal length of a substring in s that matches p. If the pattern can match the empty substring (e.g., "**"), the output should be 0. The constraints imply that brute-force substring checking will be too slow, so an efficient approach is necessary.

Edge cases include:

  1. p is just "**" - the shortest match is the empty substring.
  2. Characters around '*' might not exist in s, leading to no match.
  3. Patterns with consecutive characters that never appear consecutively in s.
  4. Minimal-length matches could be at the start, end, or middle of s.

Approaches

The brute-force approach involves generating all possible substrings of s and checking whether they match p. For each substring, we would simulate the pattern matching, including wildcards. This guarantees correctness because it exhaustively checks all substrings, but it is extremely inefficient. With O(n^2) substrings and O(m) matching for each, the total complexity would be roughly O(n^2 * m), which is infeasible for n, m ~ 10^5.

The optimal approach leverages the fact that p has exactly two wildcards, which allows us to break p into three fixed segments: the prefix before the first '*', the middle between the two '*', and the suffix after the second '*'. Then we can use a two-pointer technique to find valid positions of these segments in s efficiently. Essentially, we precompute the earliest end positions of each segment starting from each index, and then combine them to find the minimal-length substring. This reduces the problem to O(n) or O(n*m_segment) depending on implementation.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2 * m) O(1) Check all substrings of s against p using standard wildcard matching
Optimal O(n) O(n) Use two-pointer scan from left and right to locate minimal-length match using the three segments of p

Algorithm Walkthrough

  1. Split the pattern p into three segments. Identify the positions of the two '*' characters and extract the fixed substrings: part1 before the first '*', part2 between the two '*', and part3 after the second '*'.
  2. Handle the trivial "**" pattern. If both part1 and part2 and part3 are empty, return 0 immediately because the empty substring matches.
  3. Scan s from left to right to find the earliest positions where part1 and part2 can appear in order. For each index, use a pointer to see if part1 matches starting at that index; then continue scanning for part2 after part1.
  4. Scan s from right to left to find the latest positions where part3 can appear. This ensures that the substring can accommodate the suffix.
  5. Combine positions. For each possible start index of part1, find the earliest end index of part3 that occurs after the start of part1 and the occurrence of part2 in between. Compute the length of this substring and update the minimum length.
  6. Return the minimum length found. If no valid substring exists, return -1.

Why it works: The invariant is that each candidate substring contains all three fixed segments of p in the correct order. Since wildcards can match any number of characters, only the order of segments matters. Scanning left-to-right for prefix and middle and right-to-left for suffix guarantees we find the minimal-length substring that contains all segments in order.

Python Solution

class Solution:
    def shortestMatchingSubstring(self, s: str, p: str) -> int:
        first_star = p.find('*')
        last_star = p.rfind('*')
        part1 = p[:first_star]
        part2 = p[first_star+1:last_star]
        part3 = p[last_star+1:]

        if not part1 and not part2 and not part3:
            return 0  # "**" matches empty substring

        n = len(s)
        min_len = float('inf')

        # Left to right pass to find earliest positions for part1 and part2
        left_pos = [-1] * 3
        i = 0
        for idx, part in enumerate([part1, part2, part3]):
            if not part:
                left_pos[idx] = i
                continue
            while i <= n - len(part):
                if s[i:i+len(part)] == part:
                    left_pos[idx] = i
                    i += len(part)
                    break
                i += 1
            else:
                left_pos[idx] = -1

        # Right to left pass to find latest positions for part1, part2, part3
        right_pos = [-1] * 3
        j = n
        for idx, part in reversed(list(enumerate([part1, part2, part3]))):
            if not part:
                right_pos[idx] = j
                continue
            while j >= len(part):
                if s[j-len(part):j] == part:
                    right_pos[idx] = j
                    j -= len(part)
                    break
                j -= 1
            else:
                right_pos[idx] = -1

        # If any part is not found, return -1
        if -1 in left_pos or -1 in right_pos:
            return -1

        # Minimal substring length from first part start to last part end
        min_len = right_pos[2] - left_pos[0]
        return min_len

Implementation Explanation: The code first splits the pattern into three segments and handles the trivial "**" case. It then performs a left-to-right scan to locate the earliest positions of the prefix and middle segments. A right-to-left scan locates the latest positions of the suffix. If all parts exist in s, the difference between the earliest prefix start and latest suffix end gives the minimal substring length.

Go Solution

func shortestMatchingSubstring(s string, p string) int {
    firstStar := -1
    lastStar := -1
    for i, c := range p {
        if c == '*' {
            if firstStar == -1 {
                firstStar = i
            }
            lastStar = i
        }
    }

    part1 := p[:firstStar]
    part2 := p[firstStar+1:lastStar]
    part3 := p[lastStar+1:]

    if part1 == "" && part2 == "" && part3 == "" {
        return 0
    }

    n := len(s)
    leftPos := [3]int{-1, -1, -1}
    i := 0
    parts := []string{part1, part2, part3}
    for idx, part := range parts {
        if part == "" {
            leftPos[idx] = i
            continue
        }
        for i <= n-len(part) {
            if s[i:i+len(part)] == part {
                leftPos[idx] = i
                i += len(part)
                break
            }
            i++
        }
        if leftPos[idx] == -1 {
            return -1
        }
    }

    rightPos := [3]int{-1, -1, -1}
    j := n
    for idx := 2; idx >= 0; idx-- {
        part := parts[idx]
        if part == "" {
            rightPos[idx] = j
            continue
        }
        for j >= len(part) {
            if s[j-len(part):j] == part {
                rightPos[idx] = j
                j -= len(part)
                break
            }
            j--
        }
        if rightPos[idx] == -1 {
            return -1
        }
    }

    return rightPos[2] - leftPos[0]
}

Go Notes: Similar logic applies. Slices are used for substring extraction. Care is taken with slice bounds (j-len(part):j) when scanning from right. Go does not have float('inf'), so we do not need it; we directly compute the substring length after confirming all parts exist.

Worked Examples

Example 1: s = "abaacbaecebce", p = "ba_c_ce"

Segments: part1="ba", part2="c", part3="ce"

Left-to-right scan:

| Part | Start