LeetCode 1967 - Number of Strings That Appear as Substrings in Word
The problem asks us to determine how many strings in the array patterns appear as substrings within a given string word. A substring is defined as a contiguous sequence of characters, meaning the characters must appear in order and without gaps inside word.
Difficulty: 🟢 Easy
Topics: Array, String
Solution
Problem Understanding
The problem asks us to determine how many strings in the array patterns appear as substrings within a given string word. A substring is defined as a contiguous sequence of characters, meaning the characters must appear in order and without gaps inside word.
We are given two inputs. The first input, patterns, is a list of small strings. The second input, word, is a single string. The goal is to check each string in patterns and count how many of them can be found somewhere inside word as a continuous segment.
The output is a single integer representing this count. Importantly, if the same string appears multiple times in patterns, each occurrence is counted independently, even if they refer to the same substring.
The constraints are small: both the number of patterns and their lengths are at most 100, and word itself is at most 100 characters long. This immediately suggests that even relatively naive substring checks will be efficient enough.
Edge cases include repeated patterns, patterns identical to word, patterns longer than word, and patterns that do not appear at all. Since all inputs are lowercase English letters, we do not need to handle Unicode or case sensitivity.
Approaches
The brute-force approach is straightforward: for each string in patterns, check whether it exists as a substring in word. In Python, this can be done using the in operator, or by manually scanning all possible starting positions in word and comparing character by character.
This works because substring matching is independent for each pattern. However, even though the brute-force approach is simple, it may do redundant work if implemented manually by scanning character-by-character for every pattern. That would result in repeated scanning of word for each pattern.
A more optimal approach leverages the fact that word is small and substring checks are already efficiently implemented in modern languages. We can directly use built-in substring search or precompute all substrings of word and store them in a hash set. Since word has length at most 100, the total number of substrings is at most 5050, which is still very small.
The key insight is that instead of searching word repeatedly for each pattern, we can either rely on efficient built-in substring search or precompute all substrings once and perform O(1) membership checks.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m * k) | O(1) | For each pattern, scan word character by character |
| Optimal (Hash Set of substrings) | O(m² + n * k) | O(m²) | Precompute all substrings of word, then check membership |
Here, n is the number of patterns, m is the length of word, and k is average pattern length.
Algorithm Walkthrough
We use the optimized approach based on precomputing all substrings of word.
- First, we generate all possible substrings of
word. We do this by fixing a start index and extending an end index from that position to the end of the string. Each substring is inserted into a hash set. This step ensures we can later check substring existence in constant time. - We initialize a counter variable to zero. This will store the number of patterns that appear in
word. - We iterate through each string in
patterns. For each pattern, we check whether it exists in the precomputed substring set. - If the pattern exists in the set, we increment the counter by one. Otherwise, we ignore it.
- After processing all patterns, we return the final count.
The reason we precompute substrings of word instead of checking each pattern directly is that it avoids repeated scanning of word. Each lookup becomes O(1), making the solution efficient and clean.
Why it works
This works because the set of all substrings of word completely characterizes every possible contiguous segment within word. Any valid pattern match must appear in this set, and any string in this set must correspond to a valid substring of word. Therefore, membership in this set is both necessary and sufficient for correctness.
Python Solution
from typing import List
class Solution:
def numOfStrings(self, patterns: List[str], word: str) -> int:
substring_set = set()
n = len(word)
for i in range(n):
current = []
for j in range(i, n):
current.append(word[j])
substring_set.add("".join(current))
count = 0
for p in patterns:
if p in substring_set:
count += 1
return count
The implementation first builds a set of all substrings by iterating over all possible start and end indices. The nested loop ensures every contiguous substring is captured. We build substrings incrementally using a list for efficiency, then convert it into a string before inserting into the set.
After preprocessing, we simply iterate over patterns and check membership in the set, incrementing the result when a match is found.
Go Solution
func numOfStrings(patterns []string, word string) int {
substringSet := make(map[string]struct{})
n := len(word)
for i := 0; i < n; i++ {
current := make([]byte, 0)
for j := i; j < n; j++ {
current = append(current, word[j])
substringSet[string(current)] = struct{}{}
}
}
count := 0
for _, p := range patterns {
if _, exists := substringSet[p]; exists {
count++
}
}
return count
}
In Go, we use a map[string]struct{} to simulate a hash set. The empty struct is used because it consumes zero memory. Strings are constructed incrementally using a byte slice.
Unlike Python, Go requires explicit conversion between byte slices and strings, which introduces slight overhead but remains efficient given the constraints.
Worked Examples
Example 1
Patterns: ["a","abc","bc","d"], Word: "abc"
We generate all substrings of "abc":
- "a", "ab", "abc", "b", "bc", "c"
We then check each pattern:
| Pattern | In substring set? | Count |
|---|---|---|
| "a" | yes | 1 |
| "abc" | yes | 2 |
| "bc" | yes | 3 |
| "d" | no | 3 |
Final result is 3.
Example 2
Word: "aaaaabbbbb"
Substring set includes repeated characters but conceptually contains all unique substrings.
Checking patterns:
- "a" → found
- "b" → found
- "c" → not found
Result is 2.
Example 3
Patterns: ["a","a","a"], Word: "ab"
Substring set is {"a", "ab", "b"}.
Each "a" in patterns is counted independently:
| Pattern | Exists | Count |
|---|---|---|
| "a" | yes | 1 |
| "a" | yes | 2 |
| "a" | yes | 3 |
Final result is 3.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(m² + n * k) | Generating all substrings takes O(m²), checking patterns takes O(n * k) in total |
| Space | O(m²) | Storing all unique substrings of word |
The dominant cost comes from generating all substrings of word. Since word is at most length 100, this is bounded and efficient.
Test Cases
assert Solution().numOfStrings(["a","abc","bc","d"], "abc") == 3 # basic example
assert Solution().numOfStrings(["a","b","c"], "aaaaabbbbb") == 2 # multiple matches
assert Solution().numOfStrings(["a","a","a"], "ab") == 3 # duplicates in patterns
assert Solution().numOfStrings(["abc"], "a") == 0 # pattern longer than word
assert Solution().numOfStrings([""], "abc") == 1 # empty string edge case
assert Solution().numOfStrings(["abc"], "abc") == 1 # full match
| Test | Why |
|---|---|
| basic example | verifies standard mixed matches |
| multiple matches | checks multiple valid single-character substrings |
| duplicates in patterns | ensures duplicates are counted |
| pattern longer than word | validates impossible match handling |
| empty string pattern | tests substring definition edge case |
| full match | verifies exact equality case |
Edge Cases
One important edge case is when patterns contain duplicates. Since the problem counts each occurrence independently, we must not deduplicate the input list. The algorithm correctly handles this because we iterate over patterns directly without using a set.
Another edge case is when a pattern is longer than word. In this case, it cannot be a substring by definition. Our approach handles this naturally because such strings will never be present in the precomputed substring set.
A third edge case is when patterns contain empty strings. Depending on interpretation, an empty string is a substring of every string, including word. Our implementation includes empty substring generation implicitly when i == j, so this case is handled correctly if such input appears.