LeetCode 3374 - First Letter Capitalization II
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem asks us to transform text stored in a database table while preserving the original formatting structure. The table user_content contains two columns: a unique content_id and a content_text string. For every row, we must generate a new version of the text where each word follows title case formatting rules.
More specifically, every word must be converted so that its first letter becomes uppercase and all remaining letters become lowercase. For example, "hello" becomes "Hello" and "SQL" becomes "Sql".
The tricky part comes from special characters, especially hyphens. If a word contains a hyphen, both sides of the hyphen must be capitalized independently. For example:
"top-rated"becomes"Top-Rated""FRONT-end"becomes"Front-End"
At the same time, the problem explicitly states that all original spacing and formatting must remain unchanged. This means we cannot simply split the string on spaces and rebuild it, because multiple spaces or special separators could be lost or modified.
The input represents raw text data that may contain uppercase letters, lowercase letters, spaces, and several special characters such as:
' ', '@', '-', '/', '^', ','
The expected output must contain:
- content_id
- original_text: the original text
- converted_text: the transformed text
The constraints tell us that the input size is manageable, but correctness of string transformation is critical. The main challenge is handling delimiters correctly while preserving the exact structure of the original string.
Several edge cases are important:
- Words already fully uppercase, such as "DATA"
- Mixed-case words, such as "qUiCk"
- Hyphenated words with multiple segments
- Consecutive separators or multiple spaces
- Strings containing special characters that should remain untouched
A naive implementation that splits only on spaces would fail for cases like "web-based" because both parts need independent capitalization.
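As a minimal sketch of why the split-and-rejoin idea falls short, the hypothetical helper below (naive_title is an illustrative name, not part of the solution) splits on whitespace, capitalizes each piece, and joins with single spaces. It both collapses repeated spaces and leaves the second half of a hyphenated word in lowercase.

```python
def naive_title(text: str) -> str:
    # Hypothetical naive helper: split on whitespace, capitalize, rejoin.
    # str.split() with no argument discards runs of spaces, and
    # str.capitalize() only uppercases the first character of each piece.
    return " ".join(word.capitalize() for word in text.split())


# The three spaces in the input shrink to one, and "based"/"end" stay lowercase.
print(naive_title("web-based   FRONT-end"))  # Web-based Front-end
```

The desired result for that input would be "Web-Based   Front-End", with the original spacing intact.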
Approaches
Brute Force Approach
A brute-force solution would manually scan every character in the string while tracking whether the current character starts a new word segment. Every time a separator is encountered, the next alphabetical character is treated as the beginning of a new word.
This approach works because capitalization rules depend entirely on character position within a word. By examining characters one by one, we can decide whether to uppercase or lowercase each character.
However, implementing this logic manually becomes complicated because we must carefully handle multiple separator types and preserve formatting exactly. The code becomes error-prone and difficult to maintain.
Optimal Approach
The key observation is that SQL already provides built-in string transformation functions such as:
- UPPER()
- LOWER()
- REGEXP_REPLACE()
Using regular expressions allows us to identify word boundaries cleanly and transform only the appropriate characters.
The optimal solution uses regex pattern matching to locate:
- the first character of each word
- the first character after hyphens
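Then we uppercase the characters at those positions and lowercase the remaining letters. How the uppercasing step is expressed depends on the SQL dialect, since not every REGEXP_REPLACE implementation can change the case of a match.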
This approach is cleaner, easier to reason about, and naturally preserves formatting because replacements occur directly inside the original string.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Manually scans characters and rebuilds the string |
| Optimal | O(n) | O(n) | Uses regex and built-in SQL string functions |
Algorithm Walkthrough
- Start with the original content_text value from each row.
- Convert the entire string to lowercase first. This simplifies the logic because every non-leading character is guaranteed to already be lowercase.
- Identify every character that should become uppercase. A character must be capitalized if:
  - it is the first character of the string
  - it appears immediately after a space
  - it appears immediately after a hyphen
- Use a regular expression to locate these positions. The regex effectively detects word starts while preserving all original delimiters and spacing.
- Replace each matched character with its uppercase version.
- Return:
  - content_id
  - original_text: the original text
  - converted_text: the transformed text
Why it works
The algorithm works because every valid word segment has exactly one leading character that should be uppercase. By first converting the entire string to lowercase, we eliminate inconsistent casing. Then, by uppercasing only characters at valid word boundaries, we guarantee the final formatting matches the problem requirements.
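The walkthrough can be sketched with Python's re module. This is an illustration under assumptions rather than the reference solution: the name transform_regex and the pattern are hypothetical, and the character class [a-z] assumes ASCII text.

```python
import re

# A lowercase ASCII letter at the start of the string, or directly after a
# space or a hyphen. The lookbehind keeps the separator out of the match.
WORD_START = re.compile(r"(?:^|(?<=[ -]))[a-z]")


def transform_regex(text: str) -> str:
    # Step 1: lowercase everything so only word starts need to change.
    lowered = text.lower()
    # Step 2: uppercase each matched word-start character in place.
    return WORD_START.sub(lambda m: m.group(0).upper(), lowered)


print(transform_regex("the QUICK-brown fox"))
```

Because the replacement touches only the matched letters, separators and repeated spaces pass through untouched. Note that this pattern capitalizes a letter only when it sits immediately after a space or hyphen, so it can differ from a flag-based scan when punctuation comes between the separator and the letter.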
Python Solution
import pandas as pd


class Solution:
    def capitalizeContent(self, user_content: pd.DataFrame) -> pd.DataFrame:
        def transform(text: str) -> str:
            # Lowercase everything so only word starts need to change.
            text = text.lower()
            result = []
            # The first letter of the string always starts a word.
            capitalize_next = True
            for char in text:
                if char.isalpha():
                    if capitalize_next:
                        result.append(char.upper())
                    else:
                        result.append(char)
                    capitalize_next = False
                else:
                    # Non-letters are copied through unchanged.
                    result.append(char)
                    # Only spaces and hyphens start a new word segment.
                    if char == ' ' or char == '-':
                        capitalize_next = True
            return ''.join(result)

        return pd.DataFrame({
            'content_id': user_content['content_id'],
            'original_text': user_content['content_text'],
            'converted_text': user_content['content_text'].apply(transform)
        })
The implementation begins by converting the entire string to lowercase. This guarantees that all non-leading characters already satisfy the formatting requirement.
The algorithm then iterates through the string character by character. A boolean flag named capitalize_next determines whether the next alphabetical character should be uppercase.
Whenever a space or hyphen is encountered, the flag becomes True, meaning the next letter starts a new word segment.
All other characters are appended unchanged, which preserves formatting exactly as required.
Finally, the transformed strings are assembled into the output DataFrame with the required column names.
Go Solution
package main

import (
	"strings"
	"unicode"
)

func capitalizeContent(contentText string) string {
	text := strings.ToLower(contentText)
	var result []rune
	capitalizeNext := true
	for _, ch := range text {
		if unicode.IsLetter(ch) {
			if capitalizeNext {
				result = append(result, unicode.ToUpper(ch))
			} else {
				result = append(result, ch)
			}
			capitalizeNext = false
		} else {
			result = append(result, ch)
			if ch == ' ' || ch == '-' {
				capitalizeNext = true
			}
		}
	}
	return string(result)
}
The Go implementation follows the same logic as the Python version but uses rune processing because strings in Go are UTF-8 encoded.
A []rune slice is used to efficiently build the output string character by character.
The unicode package provides the Unicode-aware helpers used here:
- unicode.IsLetter
- unicode.ToUpper
Unlike Python, Go does not have a built-in DataFrame structure, so the function focuses purely on the string transformation logic.
Worked Examples
Example 1
Input:
hello world of SQL
After lowercase conversion:
hello world of sql
| Character | capitalize_next (before) | Output |
|---|---|---|
| h | True | H |
| e | False | He |
| l | False | Hel |
| l | False | Hell |
| o | False | Hello |
| space | False | Hello |
| w | True | Hello W |
| o | False | Hello Wo |
Final result:
Hello World Of Sql
Example 2
Input:
the QUICK-brown fox
After lowercase conversion:
the quick-brown fox
| Character | capitalize_next (before) | Output |
|---|---|---|
| t | True | T |
| h | False | Th |
| e | False | The |
| space | False | The |
| q | True | The Q |
| u | False | The Qu |
| i | False | The Qui |
| c | False | The Quic |
| k | False | The Quick |
| - | False | The Quick- |
| b | True | The Quick-B |
Final result:
The Quick-Brown Fox
Example 3
Input:
modern-day DATA science
Lowercase conversion:
modern-day data science
The hyphen causes the next character to become uppercase.
Final result:
Modern-Day Data Science
Example 4
Input:
web-based FRONT-end development
Lowercase conversion:
web-based front-end development
Both "web-based" and "front-end" are treated as two-part hyphenated words.
Final result:
Web-Based Front-End Development
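To reproduce trace tables like the ones in the examples above for any input, a small helper can record the flag value seen by each character along with the output built so far. The function name trace is hypothetical, and the sketch favors readability over speed, since it rebuilds the partial output at every step.

```python
def trace(text: str):
    rows = []
    out = []
    capitalize_next = True
    for ch in text.lower():
        flag_seen = capitalize_next  # value of the flag when ch is read
        if ch.isalpha():
            out.append(ch.upper() if capitalize_next else ch)
            capitalize_next = False
        else:
            out.append(ch)
            if ch == " " or ch == "-":
                capitalize_next = True
        rows.append((ch, flag_seen, "".join(out)))
    return rows


for ch, flag, output in trace("web-based"):
    print(repr(ch), flag, output)
```

For "web-based", the row for the second "b" shows the flag as True because the hyphen just before it reset the flag.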
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is processed exactly once |
| Space | O(n) | The transformed output string requires additional storage |
The algorithm performs a single linear scan through the text. Every character is visited once, and all operations inside the loop are constant time. Therefore, the runtime complexity is linear relative to the input length.
The output string must be stored separately, so the space complexity is also linear.
Test Cases
def transform(text: str) -> str:
    text = text.lower()
    result = []
    capitalize_next = True
    for char in text:
        if char.isalpha():
            if capitalize_next:
                result.append(char.upper())
            else:
                result.append(char.lower())
            capitalize_next = False
        else:
            result.append(char)
            if char == ' ' or char == '-':
                capitalize_next = True
    return ''.join(result)

assert transform("hello world of SQL") == "Hello World Of Sql" # basic example
assert transform("the QUICK-brown fox") == "The Quick-Brown Fox" # hyphenated word
assert transform("modern-day DATA science") == "Modern-Day Data Science" # mixed casing
assert transform("web-based FRONT-end development") == "Web-Based Front-End Development" # multiple hyphens
assert transform("SINGLE") == "Single" # fully uppercase word
assert transform("already Correct") == "Already Correct" # partially formatted input
assert transform("multiple   spaces") == "Multiple   Spaces" # preserve repeated spaces
assert transform("a-b-c") == "A-B-C" # repeated hyphen separators
assert transform("hello/world") == "Hello/world" # slash does not trigger capitalization
assert transform("test@EMAIL") == "Test@email" # special character handling
| Test | Why |
|---|---|
| "hello world of SQL" | Validates standard capitalization |
| "the QUICK-brown fox" | Tests hyphen handling |
| "modern-day DATA science" | Tests lowercase normalization |
| "web-based FRONT-end development" | Tests multiple hyphenated words |
| "SINGLE" | Tests all-uppercase input |
| "already Correct" | Tests mixed-case normalization |
| "multiple   spaces" | Ensures spaces are preserved |
| "a-b-c" | Tests repeated hyphen capitalization |
| "hello/world" | Ensures slash does not reset capitalization |
| "test@EMAIL" | Tests special character preservation |
Edge Cases
One important edge case is fully uppercase input such as "SQL SERVER". A naive implementation that only uppercases the first character without lowercasing the remainder would incorrectly produce "SQL SERVER" instead of "Sql Server". The implementation avoids this by converting the entire string to lowercase first.
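The failure mode described here is easy to reproduce. The sketch below is a deliberately buggy, hypothetical variant (first_letter_only) that uppercases word starts but skips the lowercase pass, so letters that were already uppercase stay that way.

```python
def first_letter_only(text: str) -> str:
    # Hypothetical buggy variant: no text.lower() before the scan.
    result = []
    capitalize_next = True
    for ch in text:
        if ch.isalpha() and capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            # Every other character, including uppercase letters, is kept as-is.
            result.append(ch)
            if ch in (" ", "-"):
                capitalize_next = True
    return "".join(result)


print(first_letter_only("SQL SERVER"))  # SQL SERVER, not Sql Server
```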
Another tricky case involves multiple consecutive spaces, such as "hello   world". Solutions that split on spaces and rejoin the pieces with a single space would collapse the run into one space. The character-by-character approach preserves every original character exactly, including repeated spaces.
Hyphenated words are another common source of bugs. For example, "front-END" must become "Front-End". A title-case routine that capitalizes only after spaces would produce "Front-end". The implementation explicitly treats hyphens as capitalization boundaries, ensuring every segment is handled independently.
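It can be tempting to reach for Python's built-in str.title() here. As a quick check, assuming the expectations listed in the Test Cases section, the sketch below shows that it handles the hyphen case but also capitalizes after every other non-letter, which does not match the "Hello/world" and "Test@email" expectations.

```python
samples = ["front-END", "hello/world", "test@EMAIL"]

# str.title() starts a new word after every non-letter character,
# not only after spaces and hyphens.
titled = {s: s.title() for s in samples}

for original, converted in titled.items():
    print(f"{original!r} -> {converted!r}")
```

This is why the solution tracks spaces and hyphens explicitly instead of relying on a general-purpose title-case function.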