LeetCode 3582 - Generate Tag for Video Caption

The problem asks us to transform a given caption string into a valid video tag following a strict sequence of rules. The input is a single string caption consisting only of English letters and spaces. These words must be transformed into a single hashtag-style identifier.

LeetCode Problem 3582

Difficulty: 🟢 Easy
Topics: String, Simulation

Solution

Problem Understanding

The problem asks us to transform a given caption string into a valid video tag following a strict sequence of rules. The input is a single string caption consisting only of English letters and spaces. These words must be transformed into a single hashtag-style identifier.

The transformation has three ordered steps. First, we must merge all words into a single camelCase string prefixed with the character '#'. CamelCase here means the first word is fully lowercase, while every subsequent word starts with an uppercase letter followed by lowercase letters for the rest of the word. Second, we must remove any characters that are not English letters, except for the leading '#'. Third, we truncate the resulting string so that its length does not exceed 100 characters.

The output is a single string representing the final tag.

The constraints are small, with input length up to 150 characters, meaning a linear scan solution is sufficient. Since the input contains only letters and spaces, we do not need to handle punctuation or digits, but we still implement sanitization to respect the rules.

Important edge cases include a single-word caption, captions where the resulting tag exceeds 100 characters, and cases with multiple words where capitalization must be applied precisely. Another subtle case is ensuring that filtering does not accidentally remove the leading '#'.

Approaches

The brute-force approach would first generate all possible transformations step by step, possibly building intermediate strings repeatedly and then cleaning them in separate passes. For example, one might concatenate words, then repeatedly scan the string to enforce casing rules, then scan again to remove invalid characters, and finally truncate. While correct, this approach can involve multiple full passes over the string and repeated string concatenations, which is inefficient in languages where strings are immutable.

The optimal approach performs all transformations in a single linear pass over the words, constructing the result incrementally. We split the caption into words, normalize each word once, apply camelCase rules immediately, and append directly to a list of characters. Finally, we filter and truncate in a single controlled step. This avoids repeated rescanning and ensures O(n) behavior.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Multiple intermediate string rebuilds and repeated scans
Optimal O(n) O(n) Single pass word processing with incremental construction

Algorithm Walkthrough

The optimal algorithm proceeds as follows.

  1. First, split the input string into individual words using whitespace separation. This isolates each word so that camelCase transformation can be applied correctly at word boundaries.
  2. Initialize an empty result container, typically a list of characters for efficiency, and start by appending the prefix character '#'. This prefix must always remain untouched during later filtering steps.
  3. Process the first word separately. Convert the entire word to lowercase and append it directly to the result. This ensures the first word satisfies camelCase rules, which require it to remain uncapitalized except for the prefix context.
  4. Iterate through the remaining words one by one. For each word, convert it to lowercase, then capitalize only the first character. Append this transformed word to the result container. This enforces the camelCase rule where only word-initial letters (except the first word) are uppercase.
  5. After constructing the raw tag, perform filtering to remove any non-alphabetic characters except the leading '#'. This is done by scanning the constructed character list and retaining characters that are either the prefix or alphabetic.
  6. Finally, truncate the filtered result to a maximum length of 100 characters. This ensures compliance with the output constraint.
  7. Return the resulting string.

Why it works: The algorithm maintains the invariant that each word is transformed exactly once into its correct camelCase representation before being appended. Since filtering and truncation happen after construction, we preserve correctness while ensuring that no invalid characters remain and the length constraint is satisfied.

Python Solution

class Solution:
    def generateTag(self, caption: str) -> str:
        words = caption.split()
        if not words:
            return "#"

        result = ["#"]

        first = words[0].lower()
        result.extend(first)

        for word in words[1:]:
            w = word.lower()
            if w:
                result.append(w[0].upper())
                result.extend(w[1:])

        filtered = []
        for i, ch in enumerate(result):
            if i == 0:
                filtered.append(ch)
            else:
                if ch.isalpha():
                    filtered.append(ch)

        tag = "".join(filtered)[:100]
        return tag

The implementation begins by splitting the caption into words. The result list starts with '#'. The first word is fully lowercased and appended. Each subsequent word is lowercased and then transformed so that only its first character is capitalized before being appended. After constructing the raw string, a filtering pass ensures only alphabetic characters remain, preserving only the leading '#'. Finally, the string is truncated to 100 characters.

Go Solution

func generateTag(caption string) string {
	words := make([]string, 0)
	start := 0

	for i := 0; i <= len(caption); i++ {
		if i == len(caption) || caption[i] == ' ' {
			if start < i {
				words = append(words, caption[start:i])
			}
			start = i + 1
		}
	}

	if len(words) == 0 {
		return "#"
	}

	result := make([]byte, 0, len(caption)+1)
	result = append(result, '#')

	first := []byte(words[0])
	for i := 0; i < len(first); i++ {
		if first[i] >= 'A' && first[i] <= 'Z' {
			first[i] = first[i] + 32
		}
	}
	result = append(result, first...)

	for i := 1; i < len(words); i++ {
		w := []byte(words[i])
		for j := 0; j < len(w); j++ {
			if w[j] >= 'A' && w[j] <= 'Z' {
				w[j] = w[j] + 32
			}
		}
		if len(w) > 0 {
			if w[0] >= 'a' && w[0] <= 'z' {
				w[0] = w[0] - 32
			}
			result = append(result, w...)
		}
	}

	final := make([]byte, 0, len(result))
	for i := 0; i < len(result); i++ {
		if i == 0 || (result[i] >= 'a' && result[i] <= 'z') || (result[i] >= 'A' && result[i] <= 'Z') {
			final = append(final, result[i])
		}
	}

	if len(final) > 100 {
		final = final[:100]
	}

	return string(final)
}

In Go, we manually split the string since Go does not automatically provide Python-style split semantics without importing strings utilities. We construct words by scanning the string. We then build the result using a byte slice for efficiency. Case conversion is handled manually using ASCII arithmetic. Filtering is performed in a second pass, and finally we truncate the slice to 100 bytes.

Worked Examples

For the input caption = "Leetcode daily streak achieved", the words are ["Leetcode", "daily", "streak", "achieved"].

We start with #.

After processing the first word, we append "leetcode", giving #leetcode.

Processing "daily" adds "Daily", producing #leetcodeDaily.

Processing "streak" adds "Streak", producing #leetcodeDailyStreak.

Processing "achieved" adds "Achieved", producing #leetcodeDailyStreakAchieved.

No invalid characters exist, and the length is under 100, so the final result remains unchanged.

For caption = "can I Go There", the process is similar.

Start with #.

First word "can" becomes can, so #can.

Then "I" becomes "I" but capitalized first letter results in "I" appended as "I" giving #canI.

Next "Go" becomes "go" then "Go", giving #canIGo.

Finally "There" becomes "there" then "There", giving #canIGoThere.

For the long single-word input of repeated h characters, the entire word becomes lowercase and prefixed, then truncated at 100 characters, ensuring only the first 100 characters of the final tag remain.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed a constant number of times during splitting, transformation, filtering, and truncation
Space O(n) We store the intermediate result and final output string

The solution is linear because every step involves a single pass over the input or constructed output, with no nested scans or repeated rebuilding of strings.

Test Cases

assert Solution().generateTag("Leetcode daily streak achieved") == "#leetcodeDailyStreakAchieved"  # basic multi-word case
assert Solution().generateTag("can I Go There") == "#canIGoThere"  # mixed capitalization input
assert Solution().generateTag("a") == "#a"  # single word minimal input
assert Solution().generateTag("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh")[:101] == "#hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"  # truncation case
assert Solution().generateTag("HELLO WORLD") == "#helloWorld"  # uppercase normalization
assert Solution().generateTag("   multiple   spaces here   ") == "#multipleSpacesHere"  # irregular spacing
Test Why
normal sentence verifies standard camelCase construction
mixed casing ensures correct normalization
single word verifies minimal structure handling
long repeated characters validates truncation logic
all caps input ensures lowercase conversion
irregular spaces ensures split robustness

Edge Cases

A single-word caption is an important edge case because the camelCase rules still apply, but no internal capitalization is needed. The implementation must ensure the word is simply lowercased and prefixed without attempting to access non-existent subsequent words.

Another edge case involves very long input strings where the resulting tag exceeds 100 characters. If truncation is applied before filtering or construction is not carefully managed, it can lead to incorrect outputs. The implementation ensures truncation happens only at the final stage so the structure remains valid before cutting.

Finally, irregular spacing between words can cause empty tokens if not handled properly. By using a robust split operation, we ensure only valid words are processed, preventing accidental insertion of empty or malformed segments into the camelCase result. The is longer. This guarantees correct behavior at the boundary.