LeetCode 1967 - Number of Strings That Appear as Substrings in Word

LeetCode Problem 1967

Difficulty: 🟢 Easy
Topics: Array, String

Solution

Problem Understanding

The problem asks us to determine how many strings in the array patterns appear as substrings within a given string word. A substring is defined as a contiguous sequence of characters, meaning the characters must appear in order and without gaps inside word.

We are given two inputs. The first input, patterns, is a list of small strings. The second input, word, is a single string. The goal is to check each string in patterns and count how many of them can be found somewhere inside word as a continuous segment.

The output is a single integer representing this count. Importantly, if the same string appears multiple times in patterns, each occurrence is counted independently, even if they refer to the same substring.

The constraints are small: both the number of patterns and their lengths are at most 100, and word itself is at most 100 characters long. This immediately suggests that even relatively naive substring checks will be efficient enough.

Edge cases include repeated patterns, patterns identical to word, patterns longer than word, and patterns that do not appear at all. Since all inputs are lowercase English letters, we do not need to handle Unicode or case sensitivity.

Approaches

The brute-force approach is straightforward: for each string in patterns, check whether it exists as a substring in word. In Python, this can be done using the in operator, or by manually scanning all possible starting positions in word and comparing character by character.

This works because substring matching is independent for each pattern. However, even though the brute-force approach is simple, it may do redundant work if implemented manually by scanning character-by-character for every pattern. That would result in repeated scanning of word for each pattern.

A more optimal approach leverages the fact that word is small and substring checks are already efficiently implemented in modern languages. We can directly use built-in substring search or precompute all substrings of word and store them in a hash set. Since word has length at most 100, the total number of substrings is at most 5050, which is still very small.

The key insight is that instead of searching word repeatedly for each pattern, we can either rely on efficient built-in substring search or precompute all substrings once and perform O(1) membership checks.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n * m * k)	O(1)	For each pattern, scan word character by character
Optimal (Hash Set of substrings)	O(m² + n * k)	O(m²)	Precompute all substrings of word, then check membership

Here, n is the number of patterns, m is the length of word, and k is average pattern length.

Algorithm Walkthrough

We use the optimized approach based on precomputing all substrings of word.

First, we generate all possible substrings of word. We do this by fixing a start index and extending an end index from that position to the end of the string. Each substring is inserted into a hash set. This step ensures we can later check substring existence in constant time.
We initialize a counter variable to zero. This will store the number of patterns that appear in word.
We iterate through each string in patterns. For each pattern, we check whether it exists in the precomputed substring set.
If the pattern exists in the set, we increment the counter by one. Otherwise, we ignore it.
After processing all patterns, we return the final count.

The reason we precompute substrings of word instead of checking each pattern directly is that it avoids repeated scanning of word. Each lookup becomes O(1), making the solution efficient and clean.

Why it works

This works because the set of all substrings of word completely characterizes every possible contiguous segment within word. Any valid pattern match must appear in this set, and any string in this set must correspond to a valid substring of word. Therefore, membership in this set is both necessary and sufficient for correctness.

Python Solution

from typing import List

class Solution:
    def numOfStrings(self, patterns: List[str], word: str) -> int:
        substring_set = set()
        n = len(word)

        for i in range(n):
            current = []
            for j in range(i, n):
                current.append(word[j])
                substring_set.add("".join(current))

        count = 0
        for p in patterns:
            if p in substring_set:
                count += 1

        return count

The implementation first builds a set of all substrings by iterating over all possible start and end indices. The nested loop ensures every contiguous substring is captured. We build substrings incrementally using a list for efficiency, then convert it into a string before inserting into the set.

After preprocessing, we simply iterate over patterns and check membership in the set, incrementing the result when a match is found.

Go Solution

func numOfStrings(patterns []string, word string) int {
	substringSet := make(map[string]struct{})
	n := len(word)

	for i := 0; i < n; i++ {
		current := make([]byte, 0)
		for j := i; j < n; j++ {
			current = append(current, word[j])
			substringSet[string(current)] = struct{}{}
		}
	}

	count := 0
	for _, p := range patterns {
		if _, exists := substringSet[p]; exists {
			count++
		}
	}

	return count
}

In Go, we use a map[string]struct{} to simulate a hash set. The empty struct is used because it consumes zero memory. Strings are constructed incrementally using a byte slice.

Unlike Python, Go requires explicit conversion between byte slices and strings, which introduces slight overhead but remains efficient given the constraints.

Worked Examples

Example 1

Patterns: ["a","abc","bc","d"], Word: "abc"

We generate all substrings of "abc":

"a", "ab", "abc", "b", "bc", "c"

We then check each pattern:

Pattern	In substring set?	Count
"a"	yes	1
"abc"	yes	2
"bc"	yes	3
"d"	no	3

Final result is 3.

Example 2

Word: "aaaaabbbbb"

Substring set includes repeated characters but conceptually contains all unique substrings.

Checking patterns:

"a" → found
"b" → found
"c" → not found

Result is 2.

Example 3

Patterns: ["a","a","a"], Word: "ab"

Substring set is {"a", "ab", "b"}.

Each "a" in patterns is counted independently:

Pattern	Exists	Count
"a"	yes	1
"a"	yes	2
"a"	yes	3

Final result is 3.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(m² + n * k)	Generating all substrings takes O(m²), checking patterns takes O(n * k) in total
Space	O(m²)	Storing all unique substrings of `word`

The dominant cost comes from generating all substrings of word. Since word is at most length 100, this is bounded and efficient.

Test Cases

assert Solution().numOfStrings(["a","abc","bc","d"], "abc") == 3  # basic example
assert Solution().numOfStrings(["a","b","c"], "aaaaabbbbb") == 2  # multiple matches
assert Solution().numOfStrings(["a","a","a"], "ab") == 3  # duplicates in patterns
assert Solution().numOfStrings(["abc"], "a") == 0  # pattern longer than word
assert Solution().numOfStrings([""], "abc") == 1  # empty string edge case
assert Solution().numOfStrings(["abc"], "abc") == 1  # full match

Test	Why
basic example	verifies standard mixed matches
multiple matches	checks multiple valid single-character substrings
duplicates in patterns	ensures duplicates are counted
pattern longer than word	validates impossible match handling
empty string pattern	tests substring definition edge case
full match	verifies exact equality case

Edge Cases

One important edge case is when patterns contain duplicates. Since the problem counts each occurrence independently, we must not deduplicate the input list. The algorithm correctly handles this because we iterate over patterns directly without using a set.

Another edge case is when a pattern is longer than word. In this case, it cannot be a substring by definition. Our approach handles this naturally because such strings will never be present in the precomputed substring set.

A third edge case is when patterns contain empty strings. Depending on interpretation, an empty string is a substring of every string, including word. Our implementation includes empty substring generation implicitly when i == j, so this case is handled correctly if such input appears.