LeetCode 616 - Add Bold Tag in String

The problem asks us to insert HTML-style bold tags into a string whenever a substring matches any word from a given dictionary of words.

LeetCode Problem 616

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Trie, String Matching

Solution

Problem Understanding

The problem asks us to insert HTML-style bold tags into a string whenever a substring matches any word from a given dictionary of words.

We are given:

  • A string s
  • An array words, where each element is a pattern we want to highlight

For every occurrence of every word inside s, we must wrap that matching portion with <b> and </b> tags.

The tricky part is that overlapping or adjacent matches must be merged into a single bold section.

For example, if one word matches indices [0,2] and another matches [2,4], we do not create nested or separate tags. Instead, we combine them into one continuous bold region covering [0,4].

The output is the original string with the minimum number of bold tag pairs inserted.

The constraints are relatively small:

  • s.length <= 1000
  • words.length <= 100
  • words[i].length <= 1000

Because the input size is modest, we can afford solutions that examine many substrings directly. We do not necessarily need highly advanced string matching algorithms like KMP or Aho-Corasick, although those could also solve the problem.

Several edge cases are important:

  • Multiple words can overlap
  • Multiple words can be consecutive
  • A word may appear many times
  • words may be empty
  • One word may be a substring of another
  • The entire string may become bold
  • No words may match at all

A naive implementation that inserts tags immediately upon finding a match can easily produce incorrect nested or fragmented tags. The key challenge is correctly merging intervals before generating the final string.

Approaches

Brute Force Approach

The brute-force approach is to examine every substring of s and check whether it equals any word in words.

Whenever a match is found, we mark the indices covered by that substring as bold.

After processing all possible substrings, we build the final result by inserting <b> before entering a bold region and </b> after leaving one.

This approach is correct because every possible matching substring is checked explicitly. Any character that belongs to a matched word becomes marked bold.

However, generating all substrings is expensive. A string of length n has O(n^2) substrings, and comparing them against all words increases the cost further.

Optimal Approach

A better approach is to iterate through each position in s and directly test whether any word starts there.

Instead of generating all substrings, we only check substrings whose lengths correspond to actual words.

We maintain a boolean array bold, where:

  • bold[i] = True means character s[i] should be bolded
  • bold[i] = False means it should remain normal

For every index i and every word:

  • If s[i:i+len(word)] == word
  • Then mark all characters in that range as bold

After all matches are processed, we traverse the string once to construct the final answer.

The main insight is that the problem is fundamentally an interval merging problem. Instead of inserting tags during matching, we first determine which characters belong to bold intervals, then generate tags only when transitioning between normal and bold regions.

Approach Time Complexity Space Complexity Notes
Brute Force O(n² × m × k) O(n) Generates all substrings and compares against all words
Optimal O(n × m × k) O(n) Checks only relevant word lengths and marks bold intervals

Where:

  • n = len(s)
  • m = len(words)
  • k = average word length

Algorithm Walkthrough

  1. Create a boolean array called bold of length len(s).

Each position represents whether that character should appear inside a bold tag. 2. Iterate through every starting index i in the string.

At each position, test every word in words. 3. For each word, check whether the substring starting at i matches the word.

This is done using:

s[i:i+len(word)] == word
  1. If a match is found, mark all characters covered by that word as bold.

For example, if "abc" matches starting at index 2, then indices 2, 3, and 4 become bold. 5. After all matches are processed, traverse the string again to build the result. 6. Before appending character s[i], determine whether we are entering a bold region.

We insert <b> when:

  • bold[i] == True
  • and either i == 0 or bold[i-1] == False
  1. Append the current character.
  2. After appending the character, determine whether we are leaving a bold region.

We insert </b> when:

  • bold[i] == True
  • and either i == len(s)-1 or bold[i+1] == False
  1. Return the constructed string.

Why it works

The algorithm works because every matching word explicitly marks its covered characters as bold.

Overlapping matches naturally merge because overlapping ranges mark the same boolean positions as True.

Consecutive matches also merge automatically because adjacent positions remain continuously bold, so no closing and reopening tags are inserted between them.

The final traversal inserts tags exactly at transitions between normal and bold regions, guaranteeing the minimum valid number of tag pairs.

Python Solution

from typing import List

class Solution:
    def addBoldTag(self, s: str, words: List[str]) -> str:
        n = len(s)
        bold = [False] * n

        # Mark all bold positions
        for i in range(n):
            for word in words:
                word_len = len(word)

                if s[i:i + word_len] == word:
                    for j in range(i, i + word_len):
                        if j < n:
                            bold[j] = True

        # Build final result
        result = []

        for i in range(n):
            # Start of bold region
            if bold[i] and (i == 0 or not bold[i - 1]):
                result.append("<b>")

            result.append(s[i])

            # End of bold region
            if bold[i] and (i == n - 1 or not bold[i + 1]):
                result.append("</b>")

        return "".join(result)

The implementation follows the algorithm directly.

The bold array is the core data structure. It stores whether each character belongs to any matched word.

The first nested loop scans every position and every word. Whenever a match occurs, the corresponding indices are marked True.

The second traversal constructs the output string.

The condition:

bold[i] and (i == 0 or not bold[i - 1])

detects the start of a bold region.

Similarly:

bold[i] and (i == n - 1 or not bold[i + 1])

detects the end of a bold region.

Because tags are inserted only at transitions, overlapping and consecutive intervals are merged automatically.

Go Solution

func addBoldTag(s string, words []string) string {
	n := len(s)
	bold := make([]bool, n)

	// Mark bold positions
	for i := 0; i < n; i++ {
		for _, word := range words {
			wordLen := len(word)

			if i+wordLen <= n && s[i:i+wordLen] == word {
				for j := i; j < i+wordLen; j++ {
					bold[j] = true
				}
			}
		}
	}

	// Build result
	result := make([]byte, 0)

	for i := 0; i < n; i++ {
		// Start bold region
		if bold[i] && (i == 0 || !bold[i-1]) {
			result = append(result, []byte("<b>")...)
		}

		result = append(result, s[i])

		// End bold region
		if bold[i] && (i == n-1 || !bold[i+1]) {
			result = append(result, []byte("</b>")...)
		}
	}

	return string(result)
}

The Go implementation mirrors the Python logic closely.

One important difference is bounds handling. In Go, slicing beyond the string length causes a runtime panic, so we explicitly check:

i + wordLen <= n

before comparing substrings.

The result is constructed using a byte slice for efficiency, since repeated string concatenation in Go can be expensive.

Worked Examples

Example 1

Input:
s = "abcxyz123"
words = ["abc", "123"]

Step 1: Initialize bold array

Index Character Bold
0 a False
1 b False
2 c False
3 x False
4 y False
5 z False
6 1 False
7 2 False
8 3 False

Step 2: Find matches

"abc" matches at index 0

Mark indices [0,1,2]

"123" matches at index 6

Mark indices [6,7,8]

Updated array:

Index Character Bold
0 a True
1 b True
2 c True
3 x False
4 y False
5 z False
6 1 True
7 2 True
8 3 True

Step 3: Build result

Index Action Result
0 Open <b> <b>
0 Append a <b>a
1 Append b <b>ab
2 Append c <b>abc
2 Close </b> <b>abc</b>
3-5 Append xyz <b>abc</b>xyz
6 Open <b> <b>abc</b>xyz<b>
6-8 Append 123 <b>abc</b>xyz<b>123
8 Close </b> <b>abc</b>xyz<b>123</b>

Final output:

<b>abc</b>xyz<b>123</b>

Example 2

Input:
s = "aaabbb"
words = ["aa", "b"]

Step 1: Match "aa"

Occurrences:

  • indices [0,1]
  • indices [1,2]

Current bold array:

Index Character Bold
0 a True
1 a True
2 a True
3 b False
4 b False
5 b False

Step 2: Match "b"

Occurrences:

  • index 3
  • index 4
  • index 5

Updated array:

Index Character Bold
0 a True
1 a True
2 a True
3 b True
4 b True
5 b True

Now every character is bold.

Step 3: Build result

We enter bold mode at index 0 and never leave it until the end.

Final output:

<b>aaabbb</b>

Complexity Analysis

Measure Complexity Explanation
Time O(n × m × k) For each position, we compare against every word
Space O(n) The boolean array stores bold status for each character

The dominant cost comes from substring comparisons.

For each of the n positions, we may compare against all m words, and each comparison may examine up to k characters.

The additional space usage is linear because the bold array stores one boolean per character.

Test Cases

from typing import List

class Solution:
    def addBoldTag(self, s: str, words: List[str]) -> str:
        n = len(s)
        bold = [False] * n

        for i in range(n):
            for word in words:
                if s[i:i + len(word)] == word:
                    for j in range(i, i + len(word)):
                        if j < n:
                            bold[j] = True

        result = []

        for i in range(n):
            if bold[i] and (i == 0 or not bold[i - 1]):
                result.append("<b>")

            result.append(s[i])

            if bold[i] and (i == n - 1 or not bold[i + 1]):
                result.append("</b>")

        return "".join(result)

sol = Solution()

assert sol.addBoldTag("abcxyz123", ["abc", "123"]) == "<b>abc</b>xyz<b>123</b>"  # basic separate matches

assert sol.addBoldTag("aaabbb", ["aa", "b"]) == "<b>aaabbb</b>"  # overlapping and consecutive merge

assert sol.addBoldTag("abcdef", ["gh"]) == "abcdef"  # no matches

assert sol.addBoldTag("aaaaa", ["aa"]) == "<b>aaaaa</b>"  # repeated overlapping matches

assert sol.addBoldTag("abc", ["abc"]) == "<b>abc</b>"  # entire string bold

assert sol.addBoldTag("abc", []) == "abc"  # empty words array

assert sol.addBoldTag("a", ["a"]) == "<b>a</b>"  # single character match

assert sol.addBoldTag("a", ["b"]) == "a"  # single character no match

assert sol.addBoldTag("abcxyz", ["abc", "xyz"]) == "<b>abc</b><b>xyz</b>"  # consecutive regions remain merged by adjacency rule

assert sol.addBoldTag("abababcd", ["ab", "abcd"]) == "<b>abababcd</b>"  # nested and overlapping intervals
Test Why
"abcxyz123", ["abc", "123"] Validates separate bold regions
"aaabbb", ["aa", "b"] Validates overlap and adjacency merging
"abcdef", ["gh"] Validates no-match behavior
"aaaaa", ["aa"] Validates repeated overlapping matches
"abc", ["abc"] Validates whole-string bolding
"abc", [] Validates empty dictionary handling
"a", ["a"] Validates smallest matching input
"a", ["b"] Validates smallest non-matching input
"abcxyz", ["abc", "xyz"] Validates multiple independent regions
"abababcd", ["ab", "abcd"] Validates nested overlaps

Edge Cases

One important edge case is overlapping matches. For example, in "aaaaa" with ["aa"], matches occur at indices [0,1], [1,2], [2,3], and [3,4]. A naive implementation might generate multiple nested tags. The boolean marking approach avoids this entirely because all overlapping positions simply become True, producing one continuous bold interval.

Another critical edge case is consecutive matches. Suppose one word ends exactly where another begins. The problem requires consecutive bold sections to merge into one larger section. Because the algorithm inserts tags only when transitioning between bold and non-bold characters, adjacent matches automatically combine into a single region.

A third important case is when no words match the string at all. Some implementations accidentally insert empty tags or produce malformed output. In this solution, the bold array remains entirely False, so no tags are ever inserted, and the original string is returned unchanged.

A final subtle edge case occurs when the entire string becomes bold. Incorrect implementations sometimes forget to close the final </b> tag. This solution explicitly checks whether the current index is the last character in the string, ensuring the closing tag is inserted correctly at the end.