LeetCode 3008 - Find Beautiful Indices in the Given Array II
This problem asks us to find every starting index where the substring a appears inside the string s, subject to an additional proximity condition involving another substring b. More specifically, an index i is considered beautiful if two conditions hold: 1.
Difficulty: 🔴 Hard
Topics: Two Pointers, String, Binary Search, Rolling Hash, String Matching, Hash Function
Solution
LeetCode 3008 - Find Beautiful Indices in the Given Array II
Problem Understanding
This problem asks us to find every starting index where the substring a appears inside the string s, subject to an additional proximity condition involving another substring b.
More specifically, an index i is considered beautiful if two conditions hold:
- The substring of
sstarting at indeximatchesa. - There exists some index
jwhere the substring ofsstarting atjmatchesb, and the distance betweeniandjis at mostk.
The output must contain all such beautiful indices in increasing order.
The input consists of:
- A large string
s - Two pattern strings,
aandb - An integer
k
The constraints are extremely important here:
s.lengthcan be as large as5 * 10^5a.lengthandb.lengthcan also each be as large as5 * 10^5
These constraints immediately rule out many naive substring comparison approaches. Any solution with quadratic behavior will time out.
The core challenge has two separate parts:
- Efficiently finding all occurrences of
aandb - Efficiently checking whether each occurrence of
ahas a nearby occurrence ofb
A naive implementation might repeatedly scan the entire string or compare substrings character by character, which becomes far too expensive for large inputs.
Several edge cases are important:
aandbmay overlapaandbmay be identical- There may be many repeated matches
kmay be very large, allowing almost every occurrencekmay be very small, requiring exact or near exact positioning- The string may contain no occurrences of
aorb
The problem guarantees that all strings contain only lowercase English letters, which simplifies hashing and string processing.
Approaches
Brute Force Approach
The brute force solution would first examine every possible starting index in s and check whether a occurs there. For every such occurrence, it would then scan every possible position again to determine whether b appears within distance k.
To verify substring equality, the brute force approach would compare characters one by one.
Suppose:
n = len(s)m = len(a)p = len(b)
For each position in s, substring comparison costs up to O(m) or O(p). Then, for each occurrence of a, we may scan all occurrences of b.
This creates extremely poor worst case complexity, especially for large inputs near 5 * 10^5.
Although this method is logically correct, it is far too slow for the given constraints.
Optimal Approach
The key observation is that the problem naturally separates into two efficient subproblems:
- Find all occurrences of
a - Find all occurrences of
b
Once we have two sorted lists of indices, the remaining task becomes a range query problem.
We can efficiently solve substring matching using the Knuth-Morris-Pratt algorithm, commonly called KMP. KMP finds all occurrences of a pattern inside a text in linear time.
After obtaining:
positionsApositionsB
we can use binary search to determine whether each index in positionsA has some nearby index in positionsB.
Since both lists are naturally sorted, binary search allows us to efficiently locate the closest candidate.
This reduces the overall complexity to nearly linear time.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Repeated substring checks and nested scans |
| Optimal | O(n + m + p) | O(n) | KMP for pattern matching and binary search for proximity checks |
Algorithm Walkthrough
Step 1, Find all occurrences of pattern a
We use KMP string matching to scan s and locate every index where a begins.
KMP avoids rechecking characters by preprocessing the pattern into an LPS array, short for Longest Prefix Suffix.
The LPS array tells us how far we can shift the pattern after a mismatch while preserving previously matched information.
This allows pattern matching in linear time.
Step 2, Find all occurrences of pattern b
We repeat the same KMP process for pattern b.
At the end of this step we have:
positionsA, sorted list of all indices whereaoccurspositionsB, sorted list of all indices whereboccurs
Step 3, Process each occurrence of a
For every index i in positionsA, we need to determine whether there exists some index j in positionsB such that:
$$|i - j| \le k$$
Since positionsB is sorted, binary search becomes very useful.
We search for the leftmost position in positionsB that is greater than or equal to:
$$i - k$$
If such an index exists and is also less than or equal to:
$$i + k$$
then i is beautiful.
Step 4, Build the answer list
Whenever the proximity condition succeeds, we append i to the result list.
Because positionsA is already sorted, the result automatically remains sorted.
Why it works
KMP guarantees that every occurrence of a pattern is found exactly once in linear time. After collecting all valid occurrences of a and b, the binary search step correctly determines whether some occurrence of b lies within the allowed distance range.
Since every candidate index from positionsA is checked independently and accurately, the algorithm returns exactly all beautiful indices.
Python Solution
from typing import List
import bisect
class Solution:
def beautifulIndices(self, s: str, a: str, b: str, k: int) -> List[int]:
def build_lps(pattern: str) -> List[int]:
lps = [0] * len(pattern)
length = 0
i = 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def kmp_search(text: str, pattern: str) -> List[int]:
lps = build_lps(pattern)
matches = []
text_index = 0
pattern_index = 0
while text_index < len(text):
if text[text_index] == pattern[pattern_index]:
text_index += 1
pattern_index += 1
if pattern_index == len(pattern):
matches.append(text_index - pattern_index)
pattern_index = lps[pattern_index - 1]
elif (
text_index < len(text)
and text[text_index] != pattern[pattern_index]
):
if pattern_index != 0:
pattern_index = lps[pattern_index - 1]
else:
text_index += 1
return matches
positions_a = kmp_search(s, a)
positions_b = kmp_search(s, b)
result = []
for index_a in positions_a:
left_bound = index_a - k
right_bound = index_a + k
position = bisect.bisect_left(positions_b, left_bound)
if (
position < len(positions_b)
and positions_b[position] <= right_bound
):
result.append(index_a)
return result
The implementation begins by defining a helper function to construct the LPS array for KMP. The LPS array stores the length of the longest proper prefix that is also a suffix for every prefix of the pattern.
The kmp_search function uses this LPS array to efficiently scan the text. Instead of restarting comparisons after mismatches, it reuses previously matched information, which guarantees linear time complexity.
After computing all occurrences of a and b, the algorithm iterates through every occurrence of a.
For each index, it computes the valid range:
- Minimum acceptable index:
index_a - k - Maximum acceptable index:
index_a + k
Binary search then finds the first occurrence of b that is not smaller than the left boundary. If that occurrence also lies within the right boundary, the index is beautiful.
The final result list is returned in sorted order automatically because occurrences were processed in increasing order.
Go Solution
package main
import "sort"
func makeLPS(pattern string) []int {
lps := make([]int, len(pattern))
length := 0
i := 1
for i < len(pattern) {
if pattern[i] == pattern[length] {
length++
lps[i] = length
i++
} else {
if length != 0 {
length = lps[length-1]
} else {
lps[i] = 0
i++
}
}
}
return lps
}
func kmpSearch(text string, pattern string) []int {
lps := makeLPS(pattern)
var matches []int
textIndex := 0
patternIndex := 0
for textIndex < len(text) {
if text[textIndex] == pattern[patternIndex] {
textIndex++
patternIndex++
}
if patternIndex == len(pattern) {
matches = append(matches, textIndex-patternIndex)
patternIndex = lps[patternIndex-1]
} else if textIndex < len(text) &&
text[textIndex] != pattern[patternIndex] {
if patternIndex != 0 {
patternIndex = lps[patternIndex-1]
} else {
textIndex++
}
}
}
return matches
}
func beautifulIndices(s string, a string, b string, k int) []int {
positionsA := kmpSearch(s, a)
positionsB := kmpSearch(s, b)
var result []int
for _, indexA := range positionsA {
leftBound := indexA - k
rightBound := indexA + k
position := sort.SearchInts(positionsB, leftBound)
if position < len(positionsB) &&
positionsB[position] <= rightBound {
result = append(result, indexA)
}
}
return result
}
The Go implementation follows exactly the same algorithmic structure as the Python solution.
One important Go specific detail is the use of sort.SearchInts for binary search. This function efficiently locates the insertion position of the left boundary inside the sorted list of occurrences.
Go strings are byte indexed, which works perfectly here because the problem guarantees lowercase English letters only. If Unicode characters were involved, rune handling would be necessary.
Slices are dynamically sized, making them a natural replacement for Python lists.
Worked Examples
Example 1
s = "isawsquirrelnearmysquirrelhouseohmy"
a = "my"
b = "squirrel"
k = 15
Step 1, Find occurrences of a
Occurrences of "my":
| Index | Substring |
|---|---|
| 16 | "my" |
| 33 | "my" |
So:
positionsA = [16, 33]
Step 2, Find occurrences of b
Occurrences of "squirrel":
| Index | Substring |
|---|---|
| 4 | "squirrel" |
| 18 | "squirrel" |
So:
positionsB = [4, 18]
Step 3, Check each occurrence of a
| indexA | Valid Range | Nearby b Exists? |
Beautiful? |
|---|---|---|---|
| 16 | [1, 31] | 4 and 18 both valid | Yes |
| 33 | [18, 48] | 18 valid | Yes |
Final result:
[16, 33]
Example 2
s = "abcd"
a = "a"
b = "a"
k = 4
Occurrences:
positionsA = [0]
positionsB = [0]
Check:
| indexA | Valid Range | Nearby b Exists? |
|---|---|---|
| 0 | [-4, 4] | Yes |
Final answer:
[0]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m + p) | KMP searches are linear, binary searches are logarithmic |
| Space | O(n) | Storage for occurrence lists and LPS arrays |
The KMP preprocessing for each pattern takes linear time in the pattern length. The text scan also takes linear time because each character is processed a constant number of times.
The binary searches contribute:
$$O(\text{occurrences of } a \times \log(\text{occurrences of } b))$$
which is still efficient and dominated by the linear scanning phase in practice.
Test Cases
solution = Solution()
assert solution.beautifulIndices(
"isawsquirrelnearmysquirrelhouseohmy",
"my",
"squirrel",
15
) == [16, 33] # provided example
assert solution.beautifulIndices(
"abcd",
"a",
"a",
4
) == [0] # identical patterns
assert solution.beautifulIndices(
"aaaaa",
"aa",
"aa",
0
) == [0, 1, 2, 3] # overlapping matches
assert solution.beautifulIndices(
"abcdef",
"gh",
"ab",
2
) == [] # no occurrence of a
assert solution.beautifulIndices(
"abcdef",
"ab",
"gh",
2
) == [] # no occurrence of b
assert solution.beautifulIndices(
"abababab",
"ab",
"ba",
1
) == [0, 2, 4, 6] # alternating overlapping patterns
assert solution.beautifulIndices(
"aaaaaa",
"aaa",
"aaa",
2
) == [0, 1, 2, 3] # dense overlapping matches
assert solution.beautifulIndices(
"xyz",
"x",
"z",
1
) == [] # distance too large
assert solution.beautifulIndices(
"xyz",
"x",
"z",
2
) == [0] # exact boundary distance
assert solution.beautifulIndices(
"a",
"a",
"a",
1
) == [0] # minimum input size
| Test | Why |
|---|---|
"isawsquirrelnearmysquirrelhouseohmy" |
Validates standard multi match behavior |
"abcd" |
Tests identical patterns |
"aaaaa" |
Validates overlapping matches |
No occurrence of a |
Ensures empty result handling |
No occurrence of b |
Ensures proximity check correctness |
"abababab" |
Tests alternating overlaps |
"aaaaaa" |
Stress test for repeated characters |
"xyz" with k=1 |
Tests failing distance constraint |
"xyz" with k=2 |
Tests exact boundary condition |
| Single character input | Validates smallest valid input |
Edge Cases
Overlapping pattern matches
One subtle issue is overlapping occurrences. For example, in "aaaaa" the pattern "aa" appears at indices 0, 1, 2, and 3.
A naive implementation might accidentally skip overlaps after finding a match. KMP handles overlaps naturally because after each successful match it reuses the LPS array to continue searching efficiently without restarting completely.
No occurrences of one pattern
If either a or b never appears in the string, the result must be empty.
The implementation handles this cleanly because the occurrence lists become empty arrays. Binary search on an empty array simply fails the bounds check and no indices are added to the result.
Exact boundary distances
The condition is:
$$|i - j| \le k$$
The equality case matters. If the distance is exactly k, the index is still beautiful.
The implementation correctly includes boundary values by checking:
positions_b[position] <= right_bound
which preserves the inclusive comparison required by the problem statement.