LeetCode 591 - Tag Validator

Difficulty: 🔴 Hard
Topics: String, Stack

Solution

Problem Understanding

This problem asks us to validate whether a given string represents a correctly structured code snippet according to a custom XML-like syntax. At first glance, it resembles parsing HTML or XML tags, but the validation rules are stricter and more specialized.

The input is a single string code, which may contain opening tags, closing tags, CDATA sections, and plain text. Our goal is to return true if the entire string forms a valid snippet according to all rules, otherwise return false.

The most important requirement is that the entire string must be enclosed within exactly one valid root tag. This means a string like <A></A> is valid, but plain text outside the root tag or multiple top-level tags make the input invalid.

A valid tag has the form:

<TAG_NAME>content</TAG_NAME>

where:

The opening and closing tag names must match.
The tag name must contain only uppercase English letters.
The tag name length must be between 1 and 9.

The content inside a tag can contain:

Nested valid tags
CDATA blocks
Arbitrary characters

However, the content cannot contain malformed tag syntax, unmatched <, invalid tag names, or improperly nested tags.

CDATA sections deserve special attention. A CDATA block has the form:

<![CDATA[anything]]>

Inside CDATA, absolutely nothing is parsed as tags. Even invalid XML-like content must be treated as plain text. The parser must stop at the first occurrence of ]]>.

The constraints are relatively small:

1 <= code.length <= 500

Since the input size is capped at only 500 characters, even moderately expensive parsing approaches would work. However, correctness is far more challenging than efficiency because of the many edge cases around nesting, malformed syntax, and CDATA handling.

Several tricky cases can break naive implementations:

Text outside the root tag, such as <A></A>text, must be rejected.
Improper nesting like <A><B></A></B> is invalid.
Invalid tag names such as <abc> or <ABCDEFGHIJK> must fail.
CDATA sections must only appear inside an open tag.
A stray < without a matching > invalidates the input.
Empty stack situations, such as a closing tag appearing before any opening tag, must be handled carefully.

The problem guarantees that the input only contains a limited character set, but it does not guarantee structural correctness. That means the parser must explicitly validate every rule.

Approaches

Brute Force Approach

A brute-force solution would attempt recursive parsing by repeatedly scanning substrings and trying to match valid opening and closing tags. For every opening tag, we could search forward for possible matching closing tags, recursively validating nested contents.

This works conceptually because the structure resembles recursive grammar parsing. Whenever we encounter a tag, we recursively validate its contents until we find the corresponding closing tag.

However, the difficulty comes from ambiguity and repeated rescanning. Nested tags and CDATA blocks force us to repeatedly search through overlapping regions of the string. In pathological cases, this can lead to repeated work and inefficient substring processing.

More importantly, implementing recursive backtracking correctly becomes extremely error-prone due to nesting rules and malformed inputs.

Key Insight for an Optimal Solution

The key observation is that the input naturally behaves like a structured language with nested scopes. Whenever we encounter an opening tag, we enter a new nested context. Whenever we encounter a closing tag, we leave the current context.

This pattern strongly suggests using a stack.

The stack tracks currently open tags. When we encounter:

A start tag, we push it onto the stack.
An end tag, we verify it matches the stack top and pop it.
A CDATA section, we skip it entirely because it must not be parsed.
Plain characters, we simply move forward.

An additional critical rule is that the entire snippet must be wrapped in one valid root tag. We enforce this by ensuring:

Parsing starts with an opening tag.
The stack never becomes empty before reaching the end of the string.
The stack is empty exactly at the end.

This gives us a clean single-pass parser.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(n)	Recursive rescanning of nested substrings
Optimal	O(n)	O(n)	Single pass parsing using a stack

Algorithm Walkthrough

Initialize an empty stack and an index pointer i = 0.
Traverse the string while i < len(code).
Before processing, check whether the stack is empty while we are not at the beginning of parsing. If the stack becomes empty before reaching the end, it means content exists outside the root tag, which is invalid.
If we encounter <![CDATA[, validate that:

We are inside an open tag, meaning the stack is not empty.
We can find the terminating ]]>.

If either condition fails, return false.

Otherwise, skip the entire CDATA block without parsing its contents. 5. If we encounter an end tag </TAG>:

Find the next >.
Extract the tag name.
Validate the tag name format.
Ensure the stack is not empty.
Verify the tag matches the current stack top.
Pop the tag.

If any condition fails, return false. 6. If we encounter a start tag <TAG>:

Find the next >.
Extract the tag name.
Validate that:
Length is between 1 and 9
Contains only uppercase letters
Push the tag onto the stack.

If we encounter normal characters, simply move forward by one character.
After processing the entire string, return true only if the stack is empty.

Why it works

The correctness comes from the stack invariant:

The stack always contains exactly the currently open tags in nesting order.

Whenever we open a tag, we push it. Whenever we close a tag, we must close the most recently opened unmatched tag, which is exactly the stack top. This naturally enforces proper nesting.

CDATA handling works because we intentionally skip parsing inside it, matching the problem requirement that CDATA content must be treated as raw text.

The root tag constraint is guaranteed because the stack cannot become empty before reaching the end of the string.

Python Solution

class Solution:
    def isValid(self, code: str) -> bool:
        def is_valid_tag(tag_name: str) -> bool:
            return (
                1 <= len(tag_name) <= 9
                and all("A" <= ch <= "Z" for ch in tag_name)
            )

        stack: list[str] = []
        index = 0
        n = len(code)

        while index < n:
            # No content allowed outside root tag
            if index > 0 and not stack:
                return False

            # CDATA section
            if code.startswith("<![CDATA[", index):
                if not stack:
                    return False

                cdata_end = code.find("]]>", index)
                if cdata_end == -1:
                    return False

                index = cdata_end + 3

            # End tag
            elif code.startswith("</", index):
                tag_end = code.find(">", index)
                if tag_end == -1:
                    return False

                tag_name = code[index + 2 : tag_end]

                if not is_valid_tag(tag_name):
                    return False

                if not stack or stack[-1] != tag_name:
                    return False

                stack.pop()
                index = tag_end + 1

            # Start tag
            elif code[index] == "<":
                tag_end = code.find(">", index)
                if tag_end == -1:
                    return False

                tag_name = code[index + 1 : tag_end]

                if not is_valid_tag(tag_name):
                    return False

                stack.append(tag_name)
                index = tag_end + 1

            # Regular text
            else:
                index += 1

        return len(stack) == 0

The implementation begins with a helper function is_valid_tag(), which ensures tag names follow the problem rules. A valid tag must contain only uppercase letters and have length between 1 and 9.

The stack keeps track of currently open tags. As we scan the string, the parser checks for CDATA first because it begins with <, and mistakenly treating it as a normal tag would break parsing.

For closing tags, we verify the tag name matches the stack top. This enforces proper nesting and prevents invalid structures such as:

<A><B></A></B>

The root-tag restriction is handled by checking whether the stack becomes empty before reaching the end. If it does, additional content exists outside the valid root tag.

Finally, we return true only if every opened tag has been properly closed.

Go Solution

func isValid(code string) bool {
	isValidTag := func(tag string) bool {
		if len(tag) < 1 || len(tag) > 9 {
			return false
		}

		for _, ch := range tag {
			if ch < 'A' || ch > 'Z' {
				return false
			}
		}

		return true
	}

	stack := []string{}
	n := len(code)
	index := 0

	for index < n {
		if index > 0 && len(stack) == 0 {
			return false
		}

		// CDATA
		if index+9 <= n && code[index:index+9] == "<![CDATA[" {
			if len(stack) == 0 {
				return false
			}

			cdataEnd := indexOf(code, "]]>", index)
			if cdataEnd == -1 {
				return false
			}

			index = cdataEnd + 3

		} else if index+2 <= n && code[index:index+2] == "</" {
			// End tag
			tagEnd := indexOf(code, ">", index)
			if tagEnd == -1 {
				return false
			}

			tagName := code[index+2 : tagEnd]

			if !isValidTag(tagName) {
				return false
			}

			if len(stack) == 0 || stack[len(stack)-1] != tagName {
				return false
			}

			stack = stack[:len(stack)-1]
			index = tagEnd + 1

		} else if code[index] == '<' {
			// Start tag
			tagEnd := indexOf(code, ">", index)
			if tagEnd == -1 {
				return false
			}

			tagName := code[index+1 : tagEnd]

			if !isValidTag(tagName) {
				return false
			}

			stack = append(stack, tagName)
			index = tagEnd + 1

		} else {
			index++
		}
	}

	return len(stack) == 0
}

func indexOf(s, target string, start int) int {
	for i := start; i <= len(s)-len(target); i++ {
		if s[i:i+len(target)] == target {
			return i
		}
	}
	return -1
}

The Go implementation follows the same logic as Python, but uses slices to simulate a stack. Since Go does not provide a built-in string search with start index semantics like Python's find(), we implement a helper indexOf() function.

Go strings are byte indexed, which is safe here because the problem guarantees ASCII characters only. There are no integer overflow concerns because the maximum input length is only 500.

Worked Examples

Example 1

code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"

Step	Current Token	Stack	Action
1	`<DIV>`	`[]`	Push `DIV`
2	Text	`[DIV]`	Skip
3	`<![CDATA[<div>]]>`	`[DIV]`	Skip entire CDATA
4	`</DIV>`	`[DIV]`	Pop `DIV`

Final stack:

[]

Since the stack is empty and parsing completed successfully, return true.

Example 2

code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"

Step	Current Token	Stack	Action
1	`<DIV>`	`[]`	Push `DIV`
2	Text	`[DIV]`	Skip
3	CDATA block	`[DIV]`	Ignore contents
4	Remaining text	`[DIV]`	Skip
5	`</DIV>`	`[DIV]`	Pop

Final stack is empty, therefore return true.

Example 3

code = "<A>  <B> </A>   </B>"

Step	Current Token	Stack	Action
1	`<A>`	`[]`	Push `A`
2	`<B>`	`[A]`	Push `B`
3	`</A>`	`[A, B]`	Mismatch, expected `B`

Since nesting is invalid, return false.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each character is processed at most a constant number of times
Space	O(n)	Stack may store all nested tags in worst case

Although we use string searching for > and ]]>, every character is effectively visited a constant number of times during parsing. Since n <= 500, performance is easily sufficient. The stack depth is bounded by the nesting level of tags, which in the worst case can be proportional to the input length.

Test Cases

solution = Solution()

assert solution.isValid("<DIV>This is the first line <![CDATA[<div>]]></DIV>") is True  # example 1
assert solution.isValid("<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>") is True  # example 2
assert solution.isValid("<A>  <B> </A>   </B>") is False  # invalid nesting

assert solution.isValid("<A></A>") is True  # simplest valid tag
assert solution.isValid("<A>text</A>") is True  # plain text content
assert solution.isValid("<A><B></B></A>") is True  # nested tags
assert solution.isValid("<A><![CDATA[test]]></A>") is True  # valid CDATA

assert solution.isValid("text<A></A>") is False  # text outside root
assert solution.isValid("<A></A>text") is False  # trailing text outside root
assert solution.isValid("<A><B></A></B>") is False  # mismatched nesting

assert solution.isValid("<a></a>") is False  # lowercase tag invalid
assert solution.isValid("<ABCDEFGHIJ></ABCDEFGHIJ>") is False  # tag too long
assert solution.isValid("<A></B>") is False  # mismatched tag names

assert solution.isValid("<A><![CDATA[test]</A>") is False  # unclosed CDATA
assert solution.isValid("<![CDATA[test]]>") is False  # CDATA outside tag
assert solution.isValid("<A><</A>") is False  # malformed tag syntax

Test	Why
Example 1	Valid nested structure with CDATA
Example 2	Complex parsing with text and CDATA
Example 3	Detects unbalanced nesting
`<A></A>`	Smallest valid input
`<A>text</A>`	Valid plain text
Nested tags	Confirms stack behavior
CDATA inside tag	Ensures raw text parsing
Text outside root	Enforces root-tag rule
Trailing text	Rejects extra content
Mismatched nesting	Detects improper order
Lowercase tag	Validates uppercase requirement
Long tag	Enforces length limit
Wrong closing tag	Ensures matching names
Broken CDATA	Detects malformed section
CDATA outside tag	Enforces placement rule
Malformed `<`	Handles invalid syntax

Edge Cases

Content Outside the Root Tag

A subtle but important edge case is when valid content exists outside the root tag, such as:

<A></A>extra

A naive parser may incorrectly accept this because the tag itself is valid. However, the problem requires the entire snippet to be wrapped in a single closed tag. Our implementation handles this by immediately rejecting cases where the stack becomes empty before reaching the end of the string.

Invalid Tag Names

Another tricky case involves malformed tag names:

<abc></abc>

<ABCDEFGHIJ></ABCDEFGHIJ>

The first violates the uppercase-only rule, while the second exceeds the maximum length of 9. Our helper function validates both conditions before allowing a tag onto the stack.

Improper Nesting

Improper nesting often causes bugs in recursive implementations:

<A><B></A></B>

Even though both tags exist, the closing order is wrong. Since stacks naturally enforce last-opened-first-closed behavior, our implementation correctly detects this mismatch by checking the closing tag against the stack top.

CDATA Misinterpretation

CDATA content must never be parsed as tags. For example:

<A><![CDATA[<invalid></invalid>]]></A>

The invalid tags inside CDATA should not matter. Our implementation explicitly skips everything until the first ]]>, preventing accidental parsing of internal content.