LeetCode 758 - Bold Words in String
The problem gives us two inputs: an array of strings called words, and a target string s. Every occurrence of every word from words inside s must become bold by surrounding that substring with <b and </b tags.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Trie, String Matching
Solution
Problem Understanding
The problem gives us two inputs: an array of strings called words, and a target string s. Every occurrence of every word from words inside s must become bold by surrounding that substring with <b> and </b> tags.
The important requirement is that we must use the minimum number of tags possible. This means that overlapping or adjacent bold regions should be merged into a single bold segment instead of creating multiple nested or separate tags.
For example, if words = ["ab", "bc"] and s = "aabcd", then:
"ab"appears starting at index1"bc"appears starting at index2
These matches overlap on the character 'b', so instead of generating separate bold regions, we combine them into one continuous region covering "abc".
The final result becomes:
a<b>abc</b>d
The constraints are relatively small:
s.length <= 500words.length <= 50words[i].length <= 10
These limits tell us that even moderately inefficient string matching solutions will still work comfortably within time limits. We do not need advanced algorithms like Aho-Corasick or suffix automata. A carefully implemented scanning solution is sufficient.
Several edge cases are important:
- Multiple words may overlap.
- Multiple words may be directly adjacent and should still merge into one bold section.
- Some words may appear many times.
- Some words may never appear.
wordscan be empty.- The entire string may become bold.
- Single-character matches can create many small intervals that need merging.
A naive implementation often fails when handling overlaps and adjacency correctly, especially when deciding where to place opening and closing tags.
Approaches
Brute Force Approach
The brute force idea is to generate every matching substring interval and then repeatedly merge overlapping intervals afterward.
For every word in words, we scan every possible starting position in s. Whenever we find a match, we store its interval [start, end].
After collecting all intervals, we sort them and merge overlapping or adjacent intervals. Finally, we rebuild the string by inserting bold tags around merged ranges.
This approach is correct because every occurrence is explicitly discovered and represented as an interval. Merging intervals guarantees that the minimum number of bold tags is used.
However, this method performs unnecessary interval management and repeated substring checks. While still acceptable for the given constraints, it is less elegant and slightly more complicated than necessary.
Optimal Approach
The key observation is that we do not actually need to store intervals explicitly. Instead, we only need to know whether each character position should be bold.
We create a boolean array bold where:
bold[i] = True
means character s[i] belongs inside a bold region.
We scan every position in s and check whether any word starts there. Whenever a match is found, we mark all covered positions as bold.
Once all matches are processed, constructing the final string becomes straightforward:
- Start a
<b>tag whenever we enter a bold region. - End with
</b>whenever we leave a bold region.
This naturally merges overlapping and adjacent matches because consecutive True positions form one continuous bold segment.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(W × N × L + K log K) | O(K) | Stores and merges intervals |
| Optimal | O(W × N × L) | O(N) | Uses boolean marking array |
Where:
N= length ofsW= number of wordsL= maximum word lengthK= number of matched intervals
Algorithm Walkthrough
Step 1: Create a Bold Marker Array
We create a boolean array called bold with length equal to len(s).
Initially, every value is False because no characters are marked bold yet.
bold = [False] * len(s)
This array will eventually tell us exactly which characters belong inside bold tags.
Step 2: Find Every Matching Word
For every starting index i in the string:
- Check every word in
words - Determine whether the word matches starting at position
i - If it matches, mark all covered character positions as bold
For example:
s = "aabcd"
word = "ab"
If "ab" starts at index 1, then:
bold[1] = True
bold[2] = True
This converts substring matching into simple positional marking.
Step 3: Build the Result String
Now we traverse the string character by character.
At each position:
- If current position is bold and previous position was not bold, insert
<b> - Append the current character
- If current position is bold and next position is not bold, insert
</b>
This ensures:
- Overlapping regions become one segment
- Adjacent regions also merge naturally
- Tags are minimal
Step 4: Return the Final String
Join all collected pieces together and return the final formatted string.
Why it works
The algorithm works because every matched substring marks all of its covered characters as bold. After processing all words, the bold array exactly represents the union of all matching intervals.
When reconstructing the string, contiguous True regions correspond precisely to maximal bold segments. Since overlapping and adjacent intervals produce one continuous sequence of True values, the algorithm automatically generates the minimum number of bold tags.
Python Solution
from typing import List
class Solution:
def boldWords(self, words: List[str], s: str) -> str:
n = len(s)
bold = [False] * n
# Mark bold positions
for i in range(n):
for word in words:
if s.startswith(word, i):
for j in range(i, i + len(word)):
bold[j] = True
result = []
for i in range(n):
# Start bold section
if bold[i] and (i == 0 or not bold[i - 1]):
result.append("<b>")
result.append(s[i])
# End bold section
if bold[i] and (i == n - 1 or not bold[i + 1]):
result.append("</b>")
return "".join(result)
The implementation begins by allocating the bold array. Each position corresponds directly to one character in the input string.
The nested loops perform substring matching. The method s.startswith(word, i) checks whether word begins at index i. If it does, every covered position becomes bold.
After all matches are processed, the second traversal reconstructs the final answer. The logic for opening and closing tags depends entirely on transitions between bold and non-bold states.
An opening tag appears when entering a bold region:
bold[i] and not bold[i - 1]
A closing tag appears when leaving one:
bold[i] and not bold[i + 1]
This guarantees that contiguous bold ranges receive exactly one pair of tags.
Go Solution
func boldWords(words []string, s string) string {
n := len(s)
bold := make([]bool, n)
// Mark bold positions
for i := 0; i < n; i++ {
for _, word := range words {
wordLen := len(word)
if i+wordLen <= n && s[i:i+wordLen] == word {
for j := i; j < i+wordLen; j++ {
bold[j] = true
}
}
}
}
result := make([]byte, 0)
for i := 0; i < n; i++ {
// Start bold section
if bold[i] && (i == 0 || !bold[i-1]) {
result = append(result, []byte("<b>")...)
}
result = append(result, s[i])
// End bold section
if bold[i] && (i == n-1 || !bold[i+1]) {
result = append(result, []byte("</b>")...)
}
}
return string(result)
}
The Go implementation follows the same overall strategy as the Python version.
One notable difference is substring handling. In Go, substring extraction uses slicing:
s[i:i+wordLen]
Before slicing, we must verify:
i + wordLen <= n
to avoid out-of-bounds panics.
The result string is built using a byte slice for efficiency, since repeated string concatenation in Go can become expensive.
Worked Examples
Example 1
words = ["ab", "bc"]
s = "aabcd"
Step 1: Initialize
| Index | Character | Bold |
|---|---|---|
| 0 | a | False |
| 1 | a | False |
| 2 | b | False |
| 3 | c | False |
| 4 | d | False |
Step 2: Process "ab"
"ab" matches starting at index 1.
Mark positions 1 and 2.
| Index | Character | Bold |
|---|---|---|
| 0 | a | False |
| 1 | a | True |
| 2 | b | True |
| 3 | c | False |
| 4 | d | False |
Step 3: Process "bc"
"bc" matches starting at index 2.
Mark positions 2 and 3.
| Index | Character | Bold |
|---|---|---|
| 0 | a | False |
| 1 | a | True |
| 2 | b | True |
| 3 | c | True |
| 4 | d | False |
Step 4: Build Result
| Index | Action | Result |
|---|---|---|
| 0 | append a |
a |
| 1 | open <b> |
a<b> |
| 1 | append a |
a<b>a |
| 2 | append b |
a<b>ab |
| 3 | append c |
a<b>abc |
| 3 | close </b> |
a<b>abc</b> |
| 4 | append d |
a<b>abc</b>d |
Final answer:
a<b>abc</b>d
Example 2
words = ["ab", "cb"]
s = "aabcd"
Step 1: Process "ab"
"ab" matches at index 1.
| Index | Character | Bold |
|---|---|---|
| 0 | a | False |
| 1 | a | True |
| 2 | b | True |
| 3 | c | False |
| 4 | d | False |
Step 2: Process "cb"
No match exists.
Step 3: Build Result
Only indices 1 and 2 are bold.
Result:
a<b>ab</b>cd
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(W × N × L) | For every position, we may compare every word |
| Space | O(N) | Boolean array for bold markers |
Here:
Nis the length ofsWis the number of wordsLis the maximum word length
The dominant work comes from substring matching. Since the constraints are small, this complexity is easily fast enough.
Test Cases
from typing import List
class Solution:
def boldWords(self, words: List[str], s: str) -> str:
n = len(s)
bold = [False] * n
for i in range(n):
for word in words:
if s.startswith(word, i):
for j in range(i, i + len(word)):
bold[j] = True
result = []
for i in range(n):
if bold[i] and (i == 0 or not bold[i - 1]):
result.append("<b>")
result.append(s[i])
if bold[i] and (i == n - 1 or not bold[i + 1]):
result.append("</b>")
return "".join(result)
sol = Solution()
assert sol.boldWords(["ab", "bc"], "aabcd") == "a<b>abc</b>d" # overlapping matches
assert sol.boldWords(["ab", "cb"], "aabcd") == "a<b>ab</b>cd" # single match
assert sol.boldWords([], "abc") == "abc" # no words
assert sol.boldWords(["abc"], "abc") == "<b>abc</b>" # entire string bold
assert sol.boldWords(["a"], "aaaa") == "<b>aaaa</b>" # adjacent matches merge
assert sol.boldWords(["aa"], "aaaa") == "<b>aaaa</b>" # overlapping intervals merge
assert sol.boldWords(["x"], "abc") == "abc" # no occurrence
assert sol.boldWords(["a", "b", "c"], "abc") == "<b>abc</b>" # consecutive regions merge
assert sol.boldWords(["abc", "bc"], "zabcx") == "z<b>abc</b>x" # nested match
assert sol.boldWords(["ab", "cd"], "xabcdz") == "x<b>abcd</b>z" # adjacent intervals merge
| Test | Why |
|---|---|
["ab","bc"], "aabcd" |
Validates overlapping matches |
["ab","cb"], "aabcd" |
Validates isolated match |
[], "abc" |
Validates empty words array |
["abc"], "abc" |
Validates full-string bolding |
["a"], "aaaa" |
Validates adjacent region merging |
["aa"], "aaaa" |
Validates overlapping repeated matches |
["x"], "abc" |
Validates no matching substrings |
["a","b","c"], "abc" |
Validates consecutive single-character merging |
["abc","bc"], "zabcx" |
Validates nested intervals |
["ab","cd"], "xabcdz" |
Validates adjacent interval merging |
Edge Cases
One important edge case occurs when multiple matches overlap. For example:
words = ["ab", "bc"]
s = "abc"
The intervals overlap on character 'b'. A buggy implementation might create nested tags like:
<b>ab<b>bc</b>
which is invalid. The boolean marking approach avoids this completely because overlapping intervals simply produce one continuous bold region.
Another important case is adjacent intervals. Consider:
words = ["ab", "cd"]
s = "abcd"
Although the matches do not overlap, they touch directly. The problem requires minimal tags, so the correct result is:
<b>abcd</b>
instead of:
<b>ab</b><b>cd</b>
Because the algorithm merges contiguous True values naturally, adjacent regions automatically become one segment.
A third edge case happens when no words match at all. For example:
words = ["x"]
s = "abc"
In this situation, every entry in bold remains False, and the reconstruction phase simply returns the original string unchanged. No unnecessary tags are added.
A final subtle case is when the entire string becomes bold. In that situation, the algorithm correctly inserts exactly one opening tag at the beginning and one closing tag at the end, because the first and last positions are handled explicitly in the boundary checks.