LeetCode 3374 - First Letter Capitalization II

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to transform text stored in a database table while preserving the original formatting structure. The table user_content contains two columns: a unique content_id and a content_text string. For every row, we must generate a new version of the text where each word follows title case formatting rules.

More specifically, every word must be converted so that its first letter becomes uppercase and all remaining letters become lowercase. For example, "hello" becomes "Hello" and "SQL" becomes "Sql".

The tricky part comes from special characters, especially hyphens. If a word contains a hyphen, both sides of the hyphen must be capitalized independently. For example:

"top-rated" becomes "Top-Rated"
"FRONT-end" becomes "Front-End"
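For a single word, the rule can be sketched by capitalizing each hyphen-separated segment on its own. The helper name `capitalize_word` is hypothetical. It relies on `str.capitalize()`, which uppercases the first character and lowercases the rest:

```python
def capitalize_word(word: str) -> str:
    """Title-case one word, treating each hyphen-separated segment as its own word."""
    # str.capitalize() uppercases the first character and lowercases the rest
    return "-".join(segment.capitalize() for segment in word.split("-"))


print(capitalize_word("top-rated"))  # Top-Rated
print(capitalize_word("FRONT-end"))  # Front-End
print(capitalize_word("SQL"))        # Sql
```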

At the same time, the problem explicitly states that all original spacing and formatting must remain unchanged. This means we cannot simply split the string on spaces and rebuild it, because multiple spaces or special separators could be lost or modified.

The input represents raw text data that may contain uppercase letters, lowercase letters, spaces, and several special characters such as:

'
' '
'@'
'-'
'/'
'^'
','

The expected output must contain three columns:

content_id
original_text (the original text)
converted_text (the transformed text)

The constraints tell us that the input size is manageable, but correctness of string transformation is critical. The main challenge is handling delimiters correctly while preserving the exact structure of the original string.

Several edge cases are important:

Words already fully uppercase, such as "DATA"
Mixed-case words, such as "qUiCk"
Hyphenated words with multiple segments
Consecutive separators or multiple spaces
Strings containing special characters that should remain untouched

A naive implementation that splits only on spaces would fail for cases like "web-based" because both parts need independent capitalization.
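The failure is easy to reproduce. The snippet below (the name `naive_capitalize` is hypothetical) capitalizes only the first letter of each space-separated token, so the segment after a hyphen stays lowercase:

```python
def naive_capitalize(text: str) -> str:
    # Splits on single spaces only, so a hyphen is never treated as a word boundary
    return " ".join(token.capitalize() for token in text.split(" "))


print(naive_capitalize("web-based FRONT-end"))  # Web-based Front-end
```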

Approaches

Brute Force Approach

A brute-force solution would manually scan every character in the string while tracking whether the current character starts a new word segment. Every time a separator is encountered, the next alphabetical character is treated as the beginning of a new word.

This approach works because capitalization rules depend entirely on character position within a word. By examining characters one by one, we can decide whether to uppercase or lowercase each character.

However, implementing this logic manually becomes complicated because we must carefully handle multiple separator types and preserve formatting exactly. The code becomes error-prone and difficult to maintain.

Optimal Approach

The key observation is that SQL already provides built-in string transformation functions such as:

UPPER()
LOWER()
REGEXP_REPLACE()

Using regular expressions allows us to identify word boundaries cleanly and transform only the appropriate characters.

The optimal solution uses regex pattern matching to locate:

the first character of each word
the first character after hyphens

Then we apply uppercase transformation to those positions while lowercasing the remaining letters.

This approach is cleaner, easier to reason about, and naturally preserves formatting because replacements occur directly inside the original string.
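A minimal sketch of the regex idea, written in Python rather than SQL because support for applying UPPER() to a regex match varies between SQL dialects. It lowercases the text, then uppercases each letter that is at the start of the string or directly after a space or hyphen, using `re.sub` with a callable replacement. The name `regex_capitalize` is hypothetical, and the pattern assumes ASCII letters:

```python
import re

# A letter starts a word if it is the first character or follows a space or hyphen.
WORD_START = re.compile(r"(?:^|(?<=[ \-]))[a-z]")


def regex_capitalize(text: str) -> str:
    lowered = text.lower()
    # re.sub calls the lambda once per match and replaces only the matched letter,
    # so every other character (including repeated spaces) stays in place.
    return WORD_START.sub(lambda m: m.group(0).upper(), lowered)


print(regex_capitalize("web-based FRONT-end development"))  # Web-Based Front-End Development
```

One behavioral difference from the character scanner in the Python Solution: the scanner keeps `capitalize_next` set across symbols such as '@', so "@hello" becomes "@Hello" there, while this pattern leaves it as "@hello". The rules above do not settle which of the two is expected.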

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n)	O(n)	Manually scans characters and rebuilds the string
Optimal	O(n)	O(n)	Uses regex and built-in SQL string functions

Algorithm Walkthrough

1. Start with the original content_text value from each row.
2. Convert the entire string to lowercase first. This simplifies the logic because every non-leading character is then guaranteed to be lowercase already.
3. Identify every character that should become uppercase. A character must be capitalized if:

it is the first character of the string
it appears immediately after a space
it appears immediately after a hyphen

4. Use a regular expression to locate these positions. The regex effectively detects word starts while preserving all original delimiters and spacing.
5. Replace each matched character with its uppercase version.
6. Return:

content_id
the original text
the transformed text
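The walkthrough can also be followed literally with an explicit list of positions instead of a regex. In this sketch the function names `uppercase_positions` and `walkthrough_row` are hypothetical, and a row is modeled as a plain `(content_id, content_text)` pair rather than a table row:

```python
def uppercase_positions(text: str) -> list[int]:
    """Indices of letters at the start of the string or right after ' ' or '-'."""
    return [
        i for i, ch in enumerate(text)
        if ch.isalpha() and (i == 0 or text[i - 1] in (" ", "-"))
    ]


def walkthrough_row(content_id: int, content_text: str) -> tuple[int, str, str]:
    lowered = content_text.lower()        # lowercase everything first
    chars = list(lowered)
    for i in uppercase_positions(lowered):
        chars[i] = chars[i].upper()       # uppercase only the word starts
    # content_id, original text, transformed text
    return content_id, content_text, "".join(chars)


print(walkthrough_row(1, "modern-day DATA science"))
```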

Why it works

The algorithm works because every valid word segment has exactly one leading character that should be uppercase. By first converting the entire string to lowercase, we eliminate inconsistent casing. Then, by uppercasing only characters at valid word boundaries, we guarantee the final formatting matches the problem requirements.
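The argument can be spot-checked on sample inputs. The sketch below re-declares the character scanner used in the Python Solution under the hypothetical name `scan_capitalize`, and the hypothetical `check_properties` tests what the argument relies on, assuming ASCII input: the output has the same length, every non-letter is unchanged at its position, only letter case differs, and a second pass changes nothing.

```python
def scan_capitalize(text: str) -> str:
    # Same logic as the Python Solution: lowercase, then uppercase word starts
    result = []
    capitalize_next = True
    for ch in text.lower():
        if ch.isalpha():
            result.append(ch.upper() if capitalize_next else ch)
            capitalize_next = False
        else:
            result.append(ch)
            if ch in (" ", "-"):
                capitalize_next = True
    return "".join(result)


def check_properties(text: str) -> bool:
    out = scan_capitalize(text)
    same_length = len(out) == len(text)
    # Non-letters (spaces, hyphens, '@', '/', ...) must be untouched
    separators_kept = all(a == b for a, b in zip(text, out) if not a.isalpha())
    # Only letter case may change
    same_letters = out.lower() == text.lower()
    # Applying the transform again is a no-op
    idempotent = scan_capitalize(out) == out
    return same_length and separators_kept and same_letters and idempotent


for sample in ["hello world of SQL", "the QUICK-brown fox", "multiple   spaces", "test@EMAIL"]:
    print(repr(sample), check_properties(sample))
```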

Python Solution

import pandas as pd


class Solution:
    def capitalizeContent(self, user_content: pd.DataFrame) -> pd.DataFrame:
        def transform(text: str) -> str:
            # Lowercase first so every non-leading letter is already correct
            text = text.lower()

            result = []
            capitalize_next = True  # the first letter of the string starts a word

            for char in text:
                if char.isalpha():
                    # Uppercase only the first letter of each word segment
                    result.append(char.upper() if capitalize_next else char)
                    capitalize_next = False
                else:
                    # Keep every non-letter (spaces, '@', '/', ...) unchanged
                    result.append(char)

                    # A space or hyphen starts a new word segment
                    if char == ' ' or char == '-':
                        capitalize_next = True

            return ''.join(result)

        return pd.DataFrame({
            'content_id': user_content['content_id'],
            'original_text': user_content['content_text'],
            'converted_text': user_content['content_text'].apply(transform)
        })

The implementation begins by converting the entire string to lowercase. This guarantees that all non-leading characters already satisfy the formatting requirement.

The algorithm then iterates through the string character by character. A boolean flag named capitalize_next determines whether the next alphabetical character should be uppercase.

Whenever a space or hyphen is encountered, the flag becomes True, meaning the next letter starts a new word segment.

All other characters are appended unchanged, which preserves formatting exactly as required.

Finally, the transformed strings are assembled into the output DataFrame with the required column names.

Go Solution

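package main

import (
	"fmt"
	"strings"
	"unicode"
)

// capitalizeContent lowercases the text, then uppercases the first letter of
// the string and the first letter that follows each space or hyphen.
func capitalizeContent(contentText string) string {
	text := strings.ToLower(contentText)

	result := make([]rune, 0, len(text))
	capitalizeNext := true

	for _, ch := range text {
		if unicode.IsLetter(ch) {
			if capitalizeNext {
				result = append(result, unicode.ToUpper(ch))
			} else {
				result = append(result, ch)
			}

			capitalizeNext = false
		} else {
			// Non-letters are copied unchanged; only ' ' and '-' start a new segment.
			result = append(result, ch)

			if ch == ' ' || ch == '-' {
				capitalizeNext = true
			}
		}
	}

	return string(result)
}

func main() {
	fmt.Println(capitalizeContent("web-based FRONT-end development")) // Web-Based Front-End Development
}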

The Go implementation follows the same logic as the Python version but uses rune processing because strings in Go are UTF-8 encoded.

A []rune slice is used to efficiently build the output string character by character.

The unicode package provides the rune-level helpers the loop needs:

unicode.IsLetter, which decides whether a rune is a letter
unicode.ToUpper, which converts a rune to uppercase

Unlike Python, Go does not have a built-in DataFrame structure, so the function focuses purely on the string transformation logic.

Worked Examples

Example 1

Input:

hello world of SQL

After lowercase conversion:

hello world of sql

Character	capitalize_next (when read)	Output so far
h	True	H
e	False	He
l	False	Hel
l	False	Hell
o	False	Hello
space	False	Hello
w	True	Hello W
o	False	Hello Wo

Final result:

Hello World Of Sql
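A trace like the table above can be generated mechanically. This hypothetical helper `trace` records, for each character of the lowercased input, the value of `capitalize_next` when the character is read and the output built so far (printing with `repr` makes the trailing space after "Hello" visible):

```python
def trace(text: str) -> list[tuple[str, bool, str]]:
    rows = []
    output = ""
    capitalize_next = True
    for ch in text.lower():
        flag_when_read = capitalize_next
        if ch.isalpha():
            output += ch.upper() if capitalize_next else ch
            capitalize_next = False
        else:
            output += ch
            if ch in (" ", "-"):
                capitalize_next = True
        rows.append((ch, flag_when_read, output))
    return rows


for ch, flag, out in trace("hello world of SQL")[:8]:
    print(repr(ch), flag, repr(out))
```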

Example 2

Input:

the QUICK-brown fox

After lowercase conversion:

the quick-brown fox

Character	capitalize_next (when read)	Output so far
t	True	T
h	False	Th
e	False	The
space	False	The
q	True	The Q
u	False	The Qu
i	False	The Qui
c	False	The Quic
k	False	The Quick
-	False	The Quick-
b	True	The Quick-B

Final result:

The Quick-Brown Fox

Example 3

Input:

modern-day DATA science

Lowercase conversion:

modern-day data science

The hyphen causes the next character to become uppercase.

Final result:

Modern-Day Data Science

Example 4

Input:

web-based FRONT-end development

Lowercase conversion:

web-based front-end development

Both "web-based" and "front-end" are treated as two-part hyphenated words.

Final result:

Web-Based Front-End Development

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each character is processed exactly once
Space	O(n)	The transformed output string requires additional storage

The algorithm performs a single linear scan through the text. Every character is visited once, and all operations inside the loop are constant time. Therefore, the runtime complexity is linear relative to the input length.

The output string must be stored separately, so the space complexity is also linear.

Test Cases

def transform(text: str) -> str:
    text = text.lower()

    result = []
    capitalize_next = True

    for char in text:
        if char.isalpha():
            if capitalize_next:
                result.append(char.upper())
            else:
                result.append(char.lower())

            capitalize_next = False
        else:
            result.append(char)

            if char == ' ' or char == '-':
                capitalize_next = True

    return ''.join(result)

assert transform("hello world of SQL") == "Hello World Of Sql"  # basic example
assert transform("the QUICK-brown fox") == "The Quick-Brown Fox"  # hyphenated word
assert transform("modern-day DATA science") == "Modern-Day Data Science"  # mixed casing
assert transform("web-based FRONT-end development") == "Web-Based Front-End Development"  # multiple hyphens
assert transform("SINGLE") == "Single"  # fully uppercase word
assert transform("already Correct") == "Already Correct"  # partially formatted input
assert transform("multiple   spaces") == "Multiple   Spaces"  # preserve spacing
assert transform("a-b-c") == "A-B-C"  # repeated hyphen separators
assert transform("hello/world") == "Hello/world"  # slash does not trigger capitalization
assert transform("test@EMAIL") == "Test@email"  # special character handling

Test	Why
`"hello world of SQL"`	Validates standard capitalization
`"the QUICK-brown fox"`	Tests hyphen handling
`"modern-day DATA science"`	Tests lowercase normalization
`"web-based FRONT-end development"`	Tests multiple hyphenated words
`"SINGLE"`	Tests all-uppercase input
`"already Correct"`	Tests mixed-case normalization
`"multiple   spaces"`	Ensures repeated spaces are preserved
`"a-b-c"`	Tests repeated hyphen capitalization
`"hello/world"`	Ensures a slash is not treated as a word boundary
`"test@EMAIL"`	Tests special character preservation

Edge Cases

One important edge case is fully uppercase input such as "SQL SERVER". A naive implementation that only uppercases the first character without lowercasing the remainder would incorrectly produce "SQL SERVER" instead of "Sql Server". The implementation avoids this by converting the entire string to lowercase first.
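The difference is visible with a deliberately naive helper (the name `first_letter_only` is hypothetical) that uppercases word starts but never lowercases the rest. Lowercasing the input first is what turns the same helper into a correct one for this input:

```python
def first_letter_only(text: str) -> str:
    # Naive: uppercases the first character of each space-separated word,
    # but leaves the remaining characters exactly as they were
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))


print(first_letter_only("SQL SERVER"))          # SQL SERVER
print(first_letter_only("SQL SERVER".lower()))  # Sql Server
```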

Another tricky case involves multiple consecutive spaces, such as "hello   world". Solutions that split on whitespace and rejoin the words with a single space would collapse the repeated spaces into one. The character-by-character approach preserves every original character exactly, including repeated spaces.
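The collapse is easy to demonstrate: `str.split()` with no argument splits on runs of whitespace, so rejoining with a single space shortens the gap.

```python
text = "hello   world"
# split() treats the run of three spaces as one separator
rebuilt = " ".join(word.capitalize() for word in text.split())
print(repr(rebuilt))  # 'Hello World'
```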

Hyphenated words are another common source of bugs. For example, "front-END" must become "Front-End". A naive title-case function would often produce "Front-end" because it only capitalizes after spaces. The implementation explicitly treats hyphens as capitalization boundaries, ensuring every segment is handled independently.
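Python's built-in `str.title()` is not a drop-in fix either, but it fails in the opposite direction: it starts a new word after any non-letter character, so it handles hyphens yet also capitalizes after '/' and '@', which the test cases above expect to stay lowercase.

```python
samples = ["front-END", "hello/world", "test@EMAIL"]
titled = [s.title() for s in samples]
for original, result in zip(samples, titled):
    print(original, "->", result)
# front-END -> Front-End
# hello/world -> Hello/World
# test@EMAIL -> Test@Email
```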