LeetCode 758 - Bold Words in String

The problem gives us two inputs: an array of strings called words, and a target string s. Every occurrence of every word from words inside s must become bold by surrounding that substring with <b and </b tags.

LeetCode Problem 758

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Trie, String Matching

Solution

Problem Understanding

The problem gives us two inputs: an array of strings called words, and a target string s. Every occurrence of every word from words inside s must become bold by surrounding that substring with <b> and </b> tags.

The important requirement is that we must use the minimum number of tags possible. This means that overlapping or adjacent bold regions should be merged into a single bold segment instead of creating multiple nested or separate tags.

For example, if words = ["ab", "bc"] and s = "aabcd", then:

  • "ab" appears starting at index 1
  • "bc" appears starting at index 2

These matches overlap on the character 'b', so instead of generating separate bold regions, we combine them into one continuous region covering "abc".

The final result becomes:

a<b>abc</b>d

The constraints are relatively small:

  • s.length <= 500
  • words.length <= 50
  • words[i].length <= 10

These limits tell us that even moderately inefficient string matching solutions will still work comfortably within time limits. We do not need advanced algorithms like Aho-Corasick or suffix automata. A carefully implemented scanning solution is sufficient.

Several edge cases are important:

  • Multiple words may overlap.
  • Multiple words may be directly adjacent and should still merge into one bold section.
  • Some words may appear many times.
  • Some words may never appear.
  • words can be empty.
  • The entire string may become bold.
  • Single-character matches can create many small intervals that need merging.

A naive implementation often fails when handling overlaps and adjacency correctly, especially when deciding where to place opening and closing tags.

Approaches

Brute Force Approach

The brute force idea is to generate every matching substring interval and then repeatedly merge overlapping intervals afterward.

For every word in words, we scan every possible starting position in s. Whenever we find a match, we store its interval [start, end].

After collecting all intervals, we sort them and merge overlapping or adjacent intervals. Finally, we rebuild the string by inserting bold tags around merged ranges.

This approach is correct because every occurrence is explicitly discovered and represented as an interval. Merging intervals guarantees that the minimum number of bold tags is used.

However, this method performs unnecessary interval management and repeated substring checks. While still acceptable for the given constraints, it is less elegant and slightly more complicated than necessary.

Optimal Approach

The key observation is that we do not actually need to store intervals explicitly. Instead, we only need to know whether each character position should be bold.

We create a boolean array bold where:

bold[i] = True

means character s[i] belongs inside a bold region.

We scan every position in s and check whether any word starts there. Whenever a match is found, we mark all covered positions as bold.

Once all matches are processed, constructing the final string becomes straightforward:

  • Start a <b> tag whenever we enter a bold region.
  • End with </b> whenever we leave a bold region.

This naturally merges overlapping and adjacent matches because consecutive True positions form one continuous bold segment.

Approach Time Complexity Space Complexity Notes
Brute Force O(W × N × L + K log K) O(K) Stores and merges intervals
Optimal O(W × N × L) O(N) Uses boolean marking array

Where:

  • N = length of s
  • W = number of words
  • L = maximum word length
  • K = number of matched intervals

Algorithm Walkthrough

Step 1: Create a Bold Marker Array

We create a boolean array called bold with length equal to len(s).

Initially, every value is False because no characters are marked bold yet.

bold = [False] * len(s)

This array will eventually tell us exactly which characters belong inside bold tags.

Step 2: Find Every Matching Word

For every starting index i in the string:

  1. Check every word in words
  2. Determine whether the word matches starting at position i
  3. If it matches, mark all covered character positions as bold

For example:

s = "aabcd"
word = "ab"

If "ab" starts at index 1, then:

bold[1] = True
bold[2] = True

This converts substring matching into simple positional marking.

Step 3: Build the Result String

Now we traverse the string character by character.

At each position:

  • If current position is bold and previous position was not bold, insert <b>
  • Append the current character
  • If current position is bold and next position is not bold, insert </b>

This ensures:

  • Overlapping regions become one segment
  • Adjacent regions also merge naturally
  • Tags are minimal

Step 4: Return the Final String

Join all collected pieces together and return the final formatted string.

Why it works

The algorithm works because every matched substring marks all of its covered characters as bold. After processing all words, the bold array exactly represents the union of all matching intervals.

When reconstructing the string, contiguous True regions correspond precisely to maximal bold segments. Since overlapping and adjacent intervals produce one continuous sequence of True values, the algorithm automatically generates the minimum number of bold tags.

Python Solution

from typing import List

class Solution:
    def boldWords(self, words: List[str], s: str) -> str:
        n = len(s)
        bold = [False] * n

        # Mark bold positions
        for i in range(n):
            for word in words:
                if s.startswith(word, i):
                    for j in range(i, i + len(word)):
                        bold[j] = True

        result = []

        for i in range(n):
            # Start bold section
            if bold[i] and (i == 0 or not bold[i - 1]):
                result.append("<b>")

            result.append(s[i])

            # End bold section
            if bold[i] and (i == n - 1 or not bold[i + 1]):
                result.append("</b>")

        return "".join(result)

The implementation begins by allocating the bold array. Each position corresponds directly to one character in the input string.

The nested loops perform substring matching. The method s.startswith(word, i) checks whether word begins at index i. If it does, every covered position becomes bold.

After all matches are processed, the second traversal reconstructs the final answer. The logic for opening and closing tags depends entirely on transitions between bold and non-bold states.

An opening tag appears when entering a bold region:

bold[i] and not bold[i - 1]

A closing tag appears when leaving one:

bold[i] and not bold[i + 1]

This guarantees that contiguous bold ranges receive exactly one pair of tags.

Go Solution

func boldWords(words []string, s string) string {
	n := len(s)
	bold := make([]bool, n)

	// Mark bold positions
	for i := 0; i < n; i++ {
		for _, word := range words {
			wordLen := len(word)

			if i+wordLen <= n && s[i:i+wordLen] == word {
				for j := i; j < i+wordLen; j++ {
					bold[j] = true
				}
			}
		}
	}

	result := make([]byte, 0)

	for i := 0; i < n; i++ {
		// Start bold section
		if bold[i] && (i == 0 || !bold[i-1]) {
			result = append(result, []byte("<b>")...)
		}

		result = append(result, s[i])

		// End bold section
		if bold[i] && (i == n-1 || !bold[i+1]) {
			result = append(result, []byte("</b>")...)
		}
	}

	return string(result)
}

The Go implementation follows the same overall strategy as the Python version.

One notable difference is substring handling. In Go, substring extraction uses slicing:

s[i:i+wordLen]

Before slicing, we must verify:

i + wordLen <= n

to avoid out-of-bounds panics.

The result string is built using a byte slice for efficiency, since repeated string concatenation in Go can become expensive.

Worked Examples

Example 1

words = ["ab", "bc"]
s = "aabcd"

Step 1: Initialize

Index Character Bold
0 a False
1 a False
2 b False
3 c False
4 d False

Step 2: Process "ab"

"ab" matches starting at index 1.

Mark positions 1 and 2.

Index Character Bold
0 a False
1 a True
2 b True
3 c False
4 d False

Step 3: Process "bc"

"bc" matches starting at index 2.

Mark positions 2 and 3.

Index Character Bold
0 a False
1 a True
2 b True
3 c True
4 d False

Step 4: Build Result

Index Action Result
0 append a a
1 open <b> a<b>
1 append a a<b>a
2 append b a<b>ab
3 append c a<b>abc
3 close </b> a<b>abc</b>
4 append d a<b>abc</b>d

Final answer:

a<b>abc</b>d

Example 2

words = ["ab", "cb"]
s = "aabcd"

Step 1: Process "ab"

"ab" matches at index 1.

Index Character Bold
0 a False
1 a True
2 b True
3 c False
4 d False

Step 2: Process "cb"

No match exists.

Step 3: Build Result

Only indices 1 and 2 are bold.

Result:

a<b>ab</b>cd

Complexity Analysis

Measure Complexity Explanation
Time O(W × N × L) For every position, we may compare every word
Space O(N) Boolean array for bold markers

Here:

  • N is the length of s
  • W is the number of words
  • L is the maximum word length

The dominant work comes from substring matching. Since the constraints are small, this complexity is easily fast enough.

Test Cases

from typing import List

class Solution:
    def boldWords(self, words: List[str], s: str) -> str:
        n = len(s)
        bold = [False] * n

        for i in range(n):
            for word in words:
                if s.startswith(word, i):
                    for j in range(i, i + len(word)):
                        bold[j] = True

        result = []

        for i in range(n):
            if bold[i] and (i == 0 or not bold[i - 1]):
                result.append("<b>")

            result.append(s[i])

            if bold[i] and (i == n - 1 or not bold[i + 1]):
                result.append("</b>")

        return "".join(result)

sol = Solution()

assert sol.boldWords(["ab", "bc"], "aabcd") == "a<b>abc</b>d"  # overlapping matches

assert sol.boldWords(["ab", "cb"], "aabcd") == "a<b>ab</b>cd"  # single match

assert sol.boldWords([], "abc") == "abc"  # no words

assert sol.boldWords(["abc"], "abc") == "<b>abc</b>"  # entire string bold

assert sol.boldWords(["a"], "aaaa") == "<b>aaaa</b>"  # adjacent matches merge

assert sol.boldWords(["aa"], "aaaa") == "<b>aaaa</b>"  # overlapping intervals merge

assert sol.boldWords(["x"], "abc") == "abc"  # no occurrence

assert sol.boldWords(["a", "b", "c"], "abc") == "<b>abc</b>"  # consecutive regions merge

assert sol.boldWords(["abc", "bc"], "zabcx") == "z<b>abc</b>x"  # nested match

assert sol.boldWords(["ab", "cd"], "xabcdz") == "x<b>abcd</b>z"  # adjacent intervals merge
Test Why
["ab","bc"], "aabcd" Validates overlapping matches
["ab","cb"], "aabcd" Validates isolated match
[], "abc" Validates empty words array
["abc"], "abc" Validates full-string bolding
["a"], "aaaa" Validates adjacent region merging
["aa"], "aaaa" Validates overlapping repeated matches
["x"], "abc" Validates no matching substrings
["a","b","c"], "abc" Validates consecutive single-character merging
["abc","bc"], "zabcx" Validates nested intervals
["ab","cd"], "xabcdz" Validates adjacent interval merging

Edge Cases

One important edge case occurs when multiple matches overlap. For example:

words = ["ab", "bc"]
s = "abc"

The intervals overlap on character 'b'. A buggy implementation might create nested tags like:

<b>ab<b>bc</b>

which is invalid. The boolean marking approach avoids this completely because overlapping intervals simply produce one continuous bold region.

Another important case is adjacent intervals. Consider:

words = ["ab", "cd"]
s = "abcd"

Although the matches do not overlap, they touch directly. The problem requires minimal tags, so the correct result is:

<b>abcd</b>

instead of:

<b>ab</b><b>cd</b>

Because the algorithm merges contiguous True values naturally, adjacent regions automatically become one segment.

A third edge case happens when no words match at all. For example:

words = ["x"]
s = "abc"

In this situation, every entry in bold remains False, and the reconstruction phase simply returns the original string unchanged. No unnecessary tags are added.

A final subtle case is when the entire string becomes bold. In that situation, the algorithm correctly inserts exactly one opening tag at the beginning and one closing tag at the end, because the first and last positions are handled explicitly in the boundary checks.