LeetCode 3008 - Find Beautiful Indices in the Given Array II

This problem asks us to find every starting index where the substring a appears inside the string s, subject to an additional proximity condition involving another substring b. More specifically, an index i is considered beautiful if two conditions hold: 1.

LeetCode Problem 3008

Difficulty: 🔴 Hard
Topics: Two Pointers, String, Binary Search, Rolling Hash, String Matching, Hash Function

Solution

LeetCode 3008 - Find Beautiful Indices in the Given Array II

Problem Understanding

This problem asks us to find every starting index where the substring a appears inside the string s, subject to an additional proximity condition involving another substring b.

More specifically, an index i is considered beautiful if two conditions hold:

  1. The substring of s starting at index i matches a.
  2. There exists some index j where the substring of s starting at j matches b, and the distance between i and j is at most k.

The output must contain all such beautiful indices in increasing order.

The input consists of:

  • A large string s
  • Two pattern strings, a and b
  • An integer k

The constraints are extremely important here:

  • s.length can be as large as 5 * 10^5
  • a.length and b.length can also each be as large as 5 * 10^5

These constraints immediately rule out many naive substring comparison approaches. Any solution with quadratic behavior will time out.

The core challenge has two separate parts:

  1. Efficiently finding all occurrences of a and b
  2. Efficiently checking whether each occurrence of a has a nearby occurrence of b

A naive implementation might repeatedly scan the entire string or compare substrings character by character, which becomes far too expensive for large inputs.

Several edge cases are important:

  • a and b may overlap
  • a and b may be identical
  • There may be many repeated matches
  • k may be very large, allowing almost every occurrence
  • k may be very small, requiring exact or near exact positioning
  • The string may contain no occurrences of a or b

The problem guarantees that all strings contain only lowercase English letters, which simplifies hashing and string processing.

Approaches

Brute Force Approach

The brute force solution would first examine every possible starting index in s and check whether a occurs there. For every such occurrence, it would then scan every possible position again to determine whether b appears within distance k.

To verify substring equality, the brute force approach would compare characters one by one.

Suppose:

  • n = len(s)
  • m = len(a)
  • p = len(b)

For each position in s, substring comparison costs up to O(m) or O(p). Then, for each occurrence of a, we may scan all occurrences of b.

This creates extremely poor worst case complexity, especially for large inputs near 5 * 10^5.

Although this method is logically correct, it is far too slow for the given constraints.

Optimal Approach

The key observation is that the problem naturally separates into two efficient subproblems:

  1. Find all occurrences of a
  2. Find all occurrences of b

Once we have two sorted lists of indices, the remaining task becomes a range query problem.

We can efficiently solve substring matching using the Knuth-Morris-Pratt algorithm, commonly called KMP. KMP finds all occurrences of a pattern inside a text in linear time.

After obtaining:

  • positionsA
  • positionsB

we can use binary search to determine whether each index in positionsA has some nearby index in positionsB.

Since both lists are naturally sorted, binary search allows us to efficiently locate the closest candidate.

This reduces the overall complexity to nearly linear time.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Repeated substring checks and nested scans
Optimal O(n + m + p) O(n) KMP for pattern matching and binary search for proximity checks

Algorithm Walkthrough

Step 1, Find all occurrences of pattern a

We use KMP string matching to scan s and locate every index where a begins.

KMP avoids rechecking characters by preprocessing the pattern into an LPS array, short for Longest Prefix Suffix.

The LPS array tells us how far we can shift the pattern after a mismatch while preserving previously matched information.

This allows pattern matching in linear time.

Step 2, Find all occurrences of pattern b

We repeat the same KMP process for pattern b.

At the end of this step we have:

  • positionsA, sorted list of all indices where a occurs
  • positionsB, sorted list of all indices where b occurs

Step 3, Process each occurrence of a

For every index i in positionsA, we need to determine whether there exists some index j in positionsB such that:

$$|i - j| \le k$$

Since positionsB is sorted, binary search becomes very useful.

We search for the leftmost position in positionsB that is greater than or equal to:

$$i - k$$

If such an index exists and is also less than or equal to:

$$i + k$$

then i is beautiful.

Step 4, Build the answer list

Whenever the proximity condition succeeds, we append i to the result list.

Because positionsA is already sorted, the result automatically remains sorted.

Why it works

KMP guarantees that every occurrence of a pattern is found exactly once in linear time. After collecting all valid occurrences of a and b, the binary search step correctly determines whether some occurrence of b lies within the allowed distance range.

Since every candidate index from positionsA is checked independently and accurately, the algorithm returns exactly all beautiful indices.

Python Solution

from typing import List
import bisect

class Solution:
    def beautifulIndices(self, s: str, a: str, b: str, k: int) -> List[int]:
        
        def build_lps(pattern: str) -> List[int]:
            lps = [0] * len(pattern)
            length = 0
            i = 1

            while i < len(pattern):
                if pattern[i] == pattern[length]:
                    length += 1
                    lps[i] = length
                    i += 1
                else:
                    if length != 0:
                        length = lps[length - 1]
                    else:
                        lps[i] = 0
                        i += 1

            return lps

        def kmp_search(text: str, pattern: str) -> List[int]:
            lps = build_lps(pattern)

            matches = []

            text_index = 0
            pattern_index = 0

            while text_index < len(text):
                if text[text_index] == pattern[pattern_index]:
                    text_index += 1
                    pattern_index += 1

                if pattern_index == len(pattern):
                    matches.append(text_index - pattern_index)
                    pattern_index = lps[pattern_index - 1]

                elif (
                    text_index < len(text)
                    and text[text_index] != pattern[pattern_index]
                ):
                    if pattern_index != 0:
                        pattern_index = lps[pattern_index - 1]
                    else:
                        text_index += 1

            return matches

        positions_a = kmp_search(s, a)
        positions_b = kmp_search(s, b)

        result = []

        for index_a in positions_a:
            left_bound = index_a - k
            right_bound = index_a + k

            position = bisect.bisect_left(positions_b, left_bound)

            if (
                position < len(positions_b)
                and positions_b[position] <= right_bound
            ):
                result.append(index_a)

        return result

The implementation begins by defining a helper function to construct the LPS array for KMP. The LPS array stores the length of the longest proper prefix that is also a suffix for every prefix of the pattern.

The kmp_search function uses this LPS array to efficiently scan the text. Instead of restarting comparisons after mismatches, it reuses previously matched information, which guarantees linear time complexity.

After computing all occurrences of a and b, the algorithm iterates through every occurrence of a.

For each index, it computes the valid range:

  • Minimum acceptable index: index_a - k
  • Maximum acceptable index: index_a + k

Binary search then finds the first occurrence of b that is not smaller than the left boundary. If that occurrence also lies within the right boundary, the index is beautiful.

The final result list is returned in sorted order automatically because occurrences were processed in increasing order.

Go Solution

package main

import "sort"

func makeLPS(pattern string) []int {
	lps := make([]int, len(pattern))

	length := 0
	i := 1

	for i < len(pattern) {
		if pattern[i] == pattern[length] {
			length++
			lps[i] = length
			i++
		} else {
			if length != 0 {
				length = lps[length-1]
			} else {
				lps[i] = 0
				i++
			}
		}
	}

	return lps
}

func kmpSearch(text string, pattern string) []int {
	lps := makeLPS(pattern)

	var matches []int

	textIndex := 0
	patternIndex := 0

	for textIndex < len(text) {
		if text[textIndex] == pattern[patternIndex] {
			textIndex++
			patternIndex++
		}

		if patternIndex == len(pattern) {
			matches = append(matches, textIndex-patternIndex)
			patternIndex = lps[patternIndex-1]
		} else if textIndex < len(text) &&
			text[textIndex] != pattern[patternIndex] {

			if patternIndex != 0 {
				patternIndex = lps[patternIndex-1]
			} else {
				textIndex++
			}
		}
	}

	return matches
}

func beautifulIndices(s string, a string, b string, k int) []int {

	positionsA := kmpSearch(s, a)
	positionsB := kmpSearch(s, b)

	var result []int

	for _, indexA := range positionsA {

		leftBound := indexA - k
		rightBound := indexA + k

		position := sort.SearchInts(positionsB, leftBound)

		if position < len(positionsB) &&
			positionsB[position] <= rightBound {

			result = append(result, indexA)
		}
	}

	return result
}

The Go implementation follows exactly the same algorithmic structure as the Python solution.

One important Go specific detail is the use of sort.SearchInts for binary search. This function efficiently locates the insertion position of the left boundary inside the sorted list of occurrences.

Go strings are byte indexed, which works perfectly here because the problem guarantees lowercase English letters only. If Unicode characters were involved, rune handling would be necessary.

Slices are dynamically sized, making them a natural replacement for Python lists.

Worked Examples

Example 1

s = "isawsquirrelnearmysquirrelhouseohmy"
a = "my"
b = "squirrel"
k = 15

Step 1, Find occurrences of a

Occurrences of "my":

Index Substring
16 "my"
33 "my"

So:

positionsA = [16, 33]

Step 2, Find occurrences of b

Occurrences of "squirrel":

Index Substring
4 "squirrel"
18 "squirrel"

So:

positionsB = [4, 18]

Step 3, Check each occurrence of a

indexA Valid Range Nearby b Exists? Beautiful?
16 [1, 31] 4 and 18 both valid Yes
33 [18, 48] 18 valid Yes

Final result:

[16, 33]

Example 2

s = "abcd"
a = "a"
b = "a"
k = 4

Occurrences:

positionsA = [0]
positionsB = [0]

Check:

indexA Valid Range Nearby b Exists?
0 [-4, 4] Yes

Final answer:

[0]

Complexity Analysis

Measure Complexity Explanation
Time O(n + m + p) KMP searches are linear, binary searches are logarithmic
Space O(n) Storage for occurrence lists and LPS arrays

The KMP preprocessing for each pattern takes linear time in the pattern length. The text scan also takes linear time because each character is processed a constant number of times.

The binary searches contribute:

$$O(\text{occurrences of } a \times \log(\text{occurrences of } b))$$

which is still efficient and dominated by the linear scanning phase in practice.

Test Cases

solution = Solution()

assert solution.beautifulIndices(
    "isawsquirrelnearmysquirrelhouseohmy",
    "my",
    "squirrel",
    15
) == [16, 33]  # provided example

assert solution.beautifulIndices(
    "abcd",
    "a",
    "a",
    4
) == [0]  # identical patterns

assert solution.beautifulIndices(
    "aaaaa",
    "aa",
    "aa",
    0
) == [0, 1, 2, 3]  # overlapping matches

assert solution.beautifulIndices(
    "abcdef",
    "gh",
    "ab",
    2
) == []  # no occurrence of a

assert solution.beautifulIndices(
    "abcdef",
    "ab",
    "gh",
    2
) == []  # no occurrence of b

assert solution.beautifulIndices(
    "abababab",
    "ab",
    "ba",
    1
) == [0, 2, 4, 6]  # alternating overlapping patterns

assert solution.beautifulIndices(
    "aaaaaa",
    "aaa",
    "aaa",
    2
) == [0, 1, 2, 3]  # dense overlapping matches

assert solution.beautifulIndices(
    "xyz",
    "x",
    "z",
    1
) == []  # distance too large

assert solution.beautifulIndices(
    "xyz",
    "x",
    "z",
    2
) == [0]  # exact boundary distance

assert solution.beautifulIndices(
    "a",
    "a",
    "a",
    1
) == [0]  # minimum input size
Test Why
"isawsquirrelnearmysquirrelhouseohmy" Validates standard multi match behavior
"abcd" Tests identical patterns
"aaaaa" Validates overlapping matches
No occurrence of a Ensures empty result handling
No occurrence of b Ensures proximity check correctness
"abababab" Tests alternating overlaps
"aaaaaa" Stress test for repeated characters
"xyz" with k=1 Tests failing distance constraint
"xyz" with k=2 Tests exact boundary condition
Single character input Validates smallest valid input

Edge Cases

Overlapping pattern matches

One subtle issue is overlapping occurrences. For example, in "aaaaa" the pattern "aa" appears at indices 0, 1, 2, and 3.

A naive implementation might accidentally skip overlaps after finding a match. KMP handles overlaps naturally because after each successful match it reuses the LPS array to continue searching efficiently without restarting completely.

No occurrences of one pattern

If either a or b never appears in the string, the result must be empty.

The implementation handles this cleanly because the occurrence lists become empty arrays. Binary search on an empty array simply fails the bounds check and no indices are added to the result.

Exact boundary distances

The condition is:

$$|i - j| \le k$$

The equality case matters. If the distance is exactly k, the index is still beautiful.

The implementation correctly includes boundary values by checking:

positions_b[position] <= right_bound

which preserves the inclusive comparison required by the problem statement.