LeetCode 3374 - First Letter Capitalization II

LeetCode Problem 3374

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to transform text stored in a database table while preserving the original formatting structure. The table user_content contains two columns: a unique content_id and a content_text string. For every row, we must generate a new version of the text where each word follows title case formatting rules.

More specifically, every word must be converted so that its first letter becomes uppercase and all remaining letters become lowercase. For example, "hello" becomes "Hello" and "SQL" becomes "Sql".

The tricky part comes from special characters, especially hyphens. If a word contains a hyphen, both sides of the hyphen must be capitalized independently. For example:

  • "top-rated" becomes "Top-Rated"
  • "FRONT-end" becomes "Front-End"

At the same time, the problem explicitly states that all original spacing and formatting must remain unchanged. This means we cannot simply split the string on spaces and rebuild it, because multiple spaces or special separators could be lost or modified.

The input represents raw text data that may contain uppercase letters, lowercase letters, spaces, and several special characters such as:

  • '
  • ' '
  • '@'
  • '-'
  • '/'
  • '^'
  • ','

The expected output must contain:

  • content_id
  • original_text: the text exactly as stored
  • converted_text: the transformed text

The input size is manageable, so performance is not the concern; correctness of the string transformation is. The main challenge is handling delimiters correctly while preserving the exact structure of the original string.

Several edge cases are important:

  • Words already fully uppercase, such as "DATA"
  • Mixed-case words, such as "qUiCk"
  • Hyphenated words with multiple segments
  • Consecutive separators or multiple spaces
  • Strings containing special characters that should remain untouched

A naive implementation that splits only on spaces would fail for cases like "web-based" because both parts need independent capitalization.
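To make the failure concrete, here is a minimal Python sketch of such a naive approach (the name naive_capitalize is hypothetical). Splitting on a single space keeps repeated spaces intact, but str.capitalize() lowercases everything after the first character of each space-separated chunk, including the letter after a hyphen.

```python
def naive_capitalize(text: str) -> str:
    # Hypothetical naive approach: capitalize each space-separated chunk as a whole.
    # split(' ') keeps empty strings for repeated spaces, so spacing survives,
    # but str.capitalize() lowercases everything after the chunk's first character.
    return ' '.join(chunk.capitalize() for chunk in text.split(' '))


print(naive_capitalize("web-based FRONT-end development"))
# Web-based Front-end Development  (expected: Web-Based Front-End Development)
```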

Approaches

Brute Force Approach

A brute-force solution would manually scan every character in the string while tracking whether the current character starts a new word segment. Every time a separator is encountered, the next alphabetical character is treated as the beginning of a new word.

This approach works because capitalization rules depend entirely on character position within a word. By examining characters one by one, we can decide whether to uppercase or lowercase each character.

However, implementing this logic manually becomes complicated because we must carefully handle multiple separator types and preserve formatting exactly. The code becomes error-prone and difficult to maintain.

Optimal Approach

The key observation is that SQL already provides built-in string transformation functions such as:

  • UPPER()
  • LOWER()
  • REGEXP_REPLACE()

Using regular expressions allows us to identify word boundaries cleanly and transform only the appropriate characters.

The optimal solution uses regex pattern matching to locate:

  • the first character of each word
  • the first character after hyphens

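Then we uppercase those positions while the remaining letters stay lowercase. Whether REGEXP_REPLACE itself can change the case of a match depends on the SQL dialect; where its replacement string cannot, the matched characters must be uppercased separately with UPPER().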

This approach is cleaner, easier to reason about, and naturally preserves formatting because replacements occur directly inside the original string.
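In Python, re.sub can play the role of REGEXP_REPLACE, with str.lower and str.upper standing in for LOWER() and UPPER(). The sketch below assumes ASCII letters, and transform_regex is a hypothetical name. Because the replacement is a function, the matched letter can be uppercased in place while the separator captured in group 1 is written back unchanged. It capitalizes a letter only when it immediately follows the start of the string, a space, or a hyphen, so "@abc" stays "@abc"; the character-scan solutions later in this article would return "@Abc" for that input.

```python
import re

# A letter at the start of the string, or right after a space or hyphen.
# Assumes ASCII letters; lower() runs first, so only [a-z] needs matching.
WORD_START = re.compile(r'(^|[ -])([a-z])')


def transform_regex(text: str) -> str:
    lowered = text.lower()
    # Write the separator (group 1) back as-is and uppercase the letter (group 2).
    return WORD_START.sub(lambda m: m.group(1) + m.group(2).upper(), lowered)


print(transform_regex("the QUICK-brown fox"))  # The Quick-Brown Fox
```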

Approach      Time Complexity   Space Complexity   Notes
Brute Force   O(n)              O(n)               Manually scans characters and rebuilds the string
Optimal       O(n)              O(n)               Uses regex and built-in SQL string functions

Algorithm Walkthrough

  1. Start with the original content_text value from each row.
  2. Convert the entire string to lowercase first. This simplifies the logic because every non-leading character is then guaranteed to be lowercase already.
  3. Identify every character that should become uppercase. A character must be capitalized if:
       • it is the first character of the string
       • it appears immediately after a space
       • it appears immediately after a hyphen
  4. Use a regular expression to locate these positions. The regex detects word starts without touching the surrounding delimiters or spacing.
  5. Replace each matched character with its uppercase version.
  6. Return:
       • content_id
       • the original text
       • the transformed text
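Steps 2 through 5 can be traced in a small Python sketch that first collects the indices to capitalize and then rewrites only those characters, leaving every other character where it was. The helper names and the pattern are hypothetical, and the sketch assumes ASCII text.

```python
import re

# Hypothetical pattern for steps 3-4: a letter at the start or right after a space/hyphen.
# Assumes ASCII text that has already been lowercased (step 2).
WORD_START = re.compile(r'(^|[ -])([a-z])')


def capitalization_positions(text: str) -> list[int]:
    # Steps 3-4: index of every letter that begins a word segment.
    return [m.start(2) for m in WORD_START.finditer(text)]


def apply_positions(text: str, positions: list[int]) -> str:
    # Step 5: uppercase exactly the characters at the collected indices.
    chars = list(text)
    for i in positions:
        chars[i] = chars[i].upper()
    return ''.join(chars)


lowered = "the QUICK-brown fox".lower()      # step 2
positions = capitalization_positions(lowered)
print(positions)                             # [0, 4, 10, 16]
print(apply_positions(lowered, positions))   # The Quick-Brown Fox
```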

Why it works

The algorithm works because every valid word segment has exactly one leading character that should be uppercase. By first converting the entire string to lowercase, we eliminate inconsistent casing. Then, by uppercasing only characters at valid word boundaries, we guarantee the final formatting matches the problem requirements.
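These properties can be checked mechanically. The sketch restates the Python character scan compactly and pairs it with a hypothetical satisfies_invariants checker, assuming ASCII text: the output has the same length as the input, differs from it only in letter case, and within each space- or hyphen-delimited segment the first letter is uppercase and every later letter is lowercase.

```python
def transform(text: str) -> str:
    # Compact restatement of the article's character scan.
    result = []
    capitalize_next = True
    for char in text.lower():
        if char.isalpha():
            result.append(char.upper() if capitalize_next else char)
            capitalize_next = False
        else:
            result.append(char)
            if char in ' -':
                capitalize_next = True
    return ''.join(result)


def satisfies_invariants(original: str, converted: str) -> bool:
    # Hypothetical checker; assumes ASCII so that case changes never alter length.
    # 1. Formatting is preserved: same length, and only letter case may differ.
    if len(original) != len(converted) or original.lower() != converted.lower():
        return False
    # 2. In each space/hyphen segment, the first letter is uppercase and the rest lowercase.
    for segment in converted.replace('-', ' ').split(' '):
        letters = [c for c in segment if c.isalpha()]
        if letters and not letters[0].isupper():
            return False
        if any(c.isupper() for c in letters[1:]):
            return False
    return True


for sample in ["hello world of SQL", "the QUICK-brown fox", "multiple   spaces"]:
    assert satisfies_invariants(sample, transform(sample))
```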

Python Solution

import pandas as pd


class Solution:
    def capitalizeContent(self, user_content: pd.DataFrame) -> pd.DataFrame:

        def transform(text: str) -> str:
            # Lowercase everything first; only segment-initial letters are raised later.
            text = text.lower()

            result = []
            capitalize_next = True  # the first letter of the string starts a segment

            for char in text:
                if char.isalpha():
                    if capitalize_next:
                        result.append(char.upper())
                    else:
                        result.append(char)  # already lowercase

                    capitalize_next = False
                else:
                    # Non-letters are copied unchanged; only space and hyphen
                    # start a new segment, other characters leave the flag as is.
                    result.append(char)

                    if char == ' ' or char == '-':
                        capitalize_next = True

            return ''.join(result)

        return pd.DataFrame({
            'content_id': user_content['content_id'],
            'original_text': user_content['content_text'],
            'converted_text': user_content['content_text'].apply(transform)
        })

The implementation begins by converting the entire string to lowercase. This guarantees that all non-leading characters already satisfy the formatting requirement.

The algorithm then iterates through the string character by character. A boolean flag named capitalize_next determines whether the next alphabetical character should be uppercase.

Whenever a space or hyphen is encountered, the flag becomes True, meaning the next letter starts a new word segment.

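All other non-letter characters are appended unchanged and leave the flag as it is, so "hello/world" keeps "world" lowercase while "@abc" still becomes "@Abc". Copying them verbatim preserves formatting exactly as required.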

Finally, the transformed strings are assembled into the output DataFrame with the required column names.

Go Solution

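package main

import (
	"fmt"
	"strings"
	"unicode"
)

// capitalizeContent lowercases the text, then uppercases the first letter of
// each segment; segments begin at the start of the string and after every
// space or hyphen.
func capitalizeContent(contentText string) string {
	text := strings.ToLower(contentText)

	var result []rune
	capitalizeNext := true

	for _, ch := range text {
		if unicode.IsLetter(ch) {
			if capitalizeNext {
				result = append(result, unicode.ToUpper(ch))
			} else {
				result = append(result, ch)
			}

			capitalizeNext = false
		} else {
			// Non-letters are copied unchanged; only space and hyphen start a new segment.
			result = append(result, ch)

			if ch == ' ' || ch == '-' {
				capitalizeNext = true
			}
		}
	}

	return string(result)
}

// main prints a few sample conversions so the file builds and runs as a program.
func main() {
	for _, s := range []string{"hello world of SQL", "the QUICK-brown fox"} {
		fmt.Println(capitalizeContent(s))
	}
}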

The Go implementation follows the same logic as the Python version but uses rune processing because strings in Go are UTF-8 encoded.

A []rune slice is used to efficiently build the output string character by character.

The unicode package provides the rune-level helpers used here:

  • unicode.IsLetter, to decide whether a rune is a letter
  • unicode.ToUpper, to convert a segment's first letter to uppercase

Unlike Python, Go does not have a built-in DataFrame structure, so the function focuses purely on the string transformation logic.

Worked Examples

Example 1

Input:

hello world of SQL

After lowercase conversion:

hello world of sql

Character   capitalize_next   Output
h           True              H
e           False             He
l           False             Hel
l           False             Hell
o           False             Hello
space       False             Hello
w           True              Hello W
o           False             Hello Wo

Final result:

Hello World Of Sql
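The flag-and-output tables in these examples can be regenerated with a small tracing variant of the Python scan. The helper trace is hypothetical; its flag column records the value of capitalize_next at the moment each character is read, which matches how the tables here are laid out.

```python
def trace(text: str) -> list[tuple[str, bool, str]]:
    # Hypothetical tracer: (character, flag when read, output so far) for each character,
    # following the same rules as the Python solution's character scan.
    rows = []
    result = []
    capitalize_next = True
    for char in text.lower():
        flag_when_read = capitalize_next
        if char.isalpha():
            result.append(char.upper() if capitalize_next else char)
            capitalize_next = False
        else:
            result.append(char)
            if char in ' -':
                capitalize_next = True
        rows.append((char, flag_when_read, ''.join(result)))
    return rows


for char, flag, output in trace("hello world of SQL")[:8]:
    label = 'space' if char == ' ' else char
    print(label, flag, repr(output))
# h True 'H'
# e False 'He'
# l False 'Hel'
# l False 'Hell'
# o False 'Hello'
# space False 'Hello '
# w True 'Hello W'
# o False 'Hello Wo'
```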

Example 2

Input:

the QUICK-brown fox

After lowercase conversion:

the quick-brown fox

Character   capitalize_next   Output
t           True              T
h           False             Th
e           False             The
space       False             The
q           True              The Q
u           False             The Qu
i           False             The Qui
c           False             The Quic
k           False             The Quick
-           False             The Quick-
b           True              The Quick-B

Final result:

The Quick-Brown Fox

Example 3

Input:

modern-day DATA science

Lowercase conversion:

modern-day data science

The hyphen causes the next character to become uppercase.

Final result:

Modern-Day Data Science

Example 4

Input:

web-based FRONT-end development

Lowercase conversion:

web-based front-end development

Both "web-based" and "front-end" are treated as two-part hyphenated words.

Final result:

Web-Based Front-End Development

Complexity Analysis

Measure   Complexity   Explanation
Time      O(n)         Each character is processed a constant number of times
Space     O(n)         The transformed output string requires additional storage

The algorithm makes a constant number of linear passes over the text: lower() builds a lowercase copy, the loop visits each character once with constant work per character (list appends are amortized O(1)), and ''.join() assembles the result. Therefore, the runtime is linear in the input length n, and over the whole table it is linear in the total length of all content_text values.

The lowercase copy, the list of characters, and the output string each take O(n) space, so the space complexity is also linear.

Test Cases

def transform(text: str) -> str:
    text = text.lower()

    result = []
    capitalize_next = True

    for char in text:
        if char.isalpha():
            if capitalize_next:
                result.append(char.upper())
            else:
                result.append(char.lower())

            capitalize_next = False
        else:
            result.append(char)

            if char == ' ' or char == '-':
                capitalize_next = True

    return ''.join(result)

assert transform("hello world of SQL") == "Hello World Of Sql"  # basic example
assert transform("the QUICK-brown fox") == "The Quick-Brown Fox"  # hyphenated word
assert transform("modern-day DATA science") == "Modern-Day Data Science"  # mixed casing
assert transform("web-based FRONT-end development") == "Web-Based Front-End Development"  # multiple hyphens
assert transform("SINGLE") == "Single"  # fully uppercase word
assert transform("already Correct") == "Already Correct"  # partially formatted input
assert transform("multiple   spaces") == "Multiple   Spaces"  # preserve spacing
assert transform("a-b-c") == "A-B-C"  # repeated hyphen separators
assert transform("hello/world") == "Hello/world"  # slash does not trigger capitalization
assert transform("test@EMAIL") == "Test@email"  # special character handling

Test                                 Why
"hello world of SQL"                 Validates standard capitalization
"the QUICK-brown fox"                Tests hyphen handling
"modern-day DATA science"            Tests lowercase normalization
"web-based FRONT-end development"    Tests multiple hyphenated words
"SINGLE"                             Tests all-uppercase input
"already Correct"                    Tests mixed-case normalization
"multiple   spaces"                  Ensures repeated spaces are preserved
"a-b-c"                              Tests repeated hyphen capitalization
"hello/world"                        Ensures a slash does not start a new word
"test@EMAIL"                         Tests special character preservation
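The hand-written cases can be complemented with a randomized differential check. This is a sketch, not part of the original test suite: it compares the character scan with a regex-based variant (transform_regex, a hypothetical name) on random strings built only from letters, spaces, and hyphens, where both follow the same rule. It also shows one input where they disagree, because the scan keeps capitalize_next set across characters such as '@'.

```python
import random
import re


def transform_scan(text: str) -> str:
    # Character scan as in the article's Python solution.
    result = []
    capitalize_next = True
    for char in text.lower():
        if char.isalpha():
            result.append(char.upper() if capitalize_next else char)
            capitalize_next = False
        else:
            result.append(char)
            if char in ' -':
                capitalize_next = True
    return ''.join(result)


def transform_regex(text: str) -> str:
    # Capitalize a letter only when it directly follows the start, a space, or a hyphen.
    return re.sub(r'(^|[ -])([a-z])',
                  lambda m: m.group(1) + m.group(2).upper(),
                  text.lower())


def random_text(rng: random.Random, length: int) -> str:
    # Letters, spaces, and hyphens only: on this alphabet the two versions should agree.
    return ''.join(rng.choice('abcXYZ -') for _ in range(length))


rng = random.Random(3374)  # fixed seed for a reproducible run
for _ in range(1000):
    sample = random_text(rng, rng.randint(0, 12))
    assert transform_scan(sample) == transform_regex(sample), repr(sample)

# Outside that alphabet the rules diverge: the scan keeps the flag across '@'.
print(transform_scan("a @b"), "|", transform_regex("a @b"))  # A @B | A @b
```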

Edge Cases

One important edge case is fully uppercase input such as "SQL SERVER". A naive implementation that only uppercases the first character without lowercasing the remainder would incorrectly produce "SQL SERVER" instead of "Sql Server". The implementation avoids this by converting the entire string to lowercase first.

Another tricky case involves multiple consecutive spaces, such as "hello   world". Solutions that split on runs of whitespace (for example, Python's str.split() with no argument) and rejoin with single spaces would collapse multiple spaces into one. The character-by-character approach preserves every original character exactly, including repeated spaces.

Hyphenated words are another common source of bugs. For example, "front-END" must become "Front-End". A naive title-case function that capitalizes only after spaces would produce "Front-end". The implementation explicitly treats hyphens as capitalization boundaries, ensuring every segment is handled independently.
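These pitfalls are easy to reproduce. The two helper functions are hypothetical naive versions written only to show the failure modes. Python's built-in str.title() is included as a third comparison: it does handle hyphens, but it capitalizes after every non-letter, which conflicts with the expected "Hello/world" in the test table.

```python
def upper_first_only(text: str) -> str:
    # Pitfall 1 (hypothetical): raise each word's first letter but never lowercase the rest.
    return ' '.join(word[:1].upper() + word[1:] for word in text.split(' '))


def split_and_rejoin(text: str) -> str:
    # Pitfall 2 (hypothetical): str.split() with no argument collapses runs of whitespace.
    return ' '.join(word.capitalize() for word in text.split())


print(upper_first_only("SQL SERVER"))      # SQL SERVER
print(split_and_rejoin("hello   world"))   # Hello World
# Pitfall 3: str.title() handles hyphens but also capitalizes after '/' or '@'.
print("front-END".title())                 # Front-End
print("hello/world".title())               # Hello/World
```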