LeetCode 591 - Tag Validator
This problem asks us to validate whether a given string represents a correctly structured code snippet according to a custom XML-like syntax. At first glance, it resembles parsing HTML or XML tags, but the validation rules are stricter and more specialized.
Difficulty: 🔴 Hard
Topics: String, Stack
Solution
Problem Understanding
This problem asks us to validate whether a given string represents a correctly structured code snippet according to a custom XML-like syntax. At first glance, it resembles parsing HTML or XML tags, but the validation rules are stricter and more specialized.
The input is a single string code, which may contain opening tags, closing tags, CDATA sections, and plain text. Our goal is to return true if the entire string forms a valid snippet according to all rules, otherwise return false.
The most important requirement is that the entire string must be enclosed within exactly one valid root tag. This means a string like <A></A> is valid, but plain text outside the root tag or multiple top-level tags make the input invalid.
A valid tag has the form:
<TAG_NAME>content</TAG_NAME>
where:
- The opening and closing tag names must match.
- The tag name must contain only uppercase English letters.
- The tag name length must be between
1and9.
The content inside a tag can contain:
- Nested valid tags
- CDATA blocks
- Arbitrary characters
However, the content cannot contain malformed tag syntax, unmatched <, invalid tag names, or improperly nested tags.
CDATA sections deserve special attention. A CDATA block has the form:
<![CDATA[anything]]>
Inside CDATA, absolutely nothing is parsed as tags. Even invalid XML-like content must be treated as plain text. The parser must stop at the first occurrence of ]]>.
The constraints are relatively small:
1 <= code.length <= 500
Since the input size is capped at only 500 characters, even moderately expensive parsing approaches would work. However, correctness is far more challenging than efficiency because of the many edge cases around nesting, malformed syntax, and CDATA handling.
Several tricky cases can break naive implementations:
- Text outside the root tag, such as
<A></A>text, must be rejected. - Improper nesting like
<A><B></A></B>is invalid. - Invalid tag names such as
<abc>or<ABCDEFGHIJK>must fail. - CDATA sections must only appear inside an open tag.
- A stray
<without a matching>invalidates the input. - Empty stack situations, such as a closing tag appearing before any opening tag, must be handled carefully.
The problem guarantees that the input only contains a limited character set, but it does not guarantee structural correctness. That means the parser must explicitly validate every rule.
Approaches
Brute Force Approach
A brute-force solution would attempt recursive parsing by repeatedly scanning substrings and trying to match valid opening and closing tags. For every opening tag, we could search forward for possible matching closing tags, recursively validating nested contents.
This works conceptually because the structure resembles recursive grammar parsing. Whenever we encounter a tag, we recursively validate its contents until we find the corresponding closing tag.
However, the difficulty comes from ambiguity and repeated rescanning. Nested tags and CDATA blocks force us to repeatedly search through overlapping regions of the string. In pathological cases, this can lead to repeated work and inefficient substring processing.
More importantly, implementing recursive backtracking correctly becomes extremely error-prone due to nesting rules and malformed inputs.
Key Insight for an Optimal Solution
The key observation is that the input naturally behaves like a structured language with nested scopes. Whenever we encounter an opening tag, we enter a new nested context. Whenever we encounter a closing tag, we leave the current context.
This pattern strongly suggests using a stack.
The stack tracks currently open tags. When we encounter:
- A start tag, we push it onto the stack.
- An end tag, we verify it matches the stack top and pop it.
- A CDATA section, we skip it entirely because it must not be parsed.
- Plain characters, we simply move forward.
An additional critical rule is that the entire snippet must be wrapped in one valid root tag. We enforce this by ensuring:
- Parsing starts with an opening tag.
- The stack never becomes empty before reaching the end of the string.
- The stack is empty exactly at the end.
This gives us a clean single-pass parser.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Recursive rescanning of nested substrings |
| Optimal | O(n) | O(n) | Single pass parsing using a stack |
Algorithm Walkthrough
- Initialize an empty stack and an index pointer
i = 0. - Traverse the string while
i < len(code). - Before processing, check whether the stack is empty while we are not at the beginning of parsing. If the stack becomes empty before reaching the end, it means content exists outside the root tag, which is invalid.
- If we encounter
<![CDATA[, validate that:
- We are inside an open tag, meaning the stack is not empty.
- We can find the terminating
]]>.
If either condition fails, return false.
Otherwise, skip the entire CDATA block without parsing its contents.
5. If we encounter an end tag </TAG>:
- Find the next
>. - Extract the tag name.
- Validate the tag name format.
- Ensure the stack is not empty.
- Verify the tag matches the current stack top.
- Pop the tag.
If any condition fails, return false.
6. If we encounter a start tag <TAG>:
-
Find the next
>. -
Extract the tag name.
-
Validate that:
-
Length is between
1and9 -
Contains only uppercase letters
-
Push the tag onto the stack.
- If we encounter normal characters, simply move forward by one character.
- After processing the entire string, return
trueonly if the stack is empty.
Why it works
The correctness comes from the stack invariant:
The stack always contains exactly the currently open tags in nesting order.
Whenever we open a tag, we push it. Whenever we close a tag, we must close the most recently opened unmatched tag, which is exactly the stack top. This naturally enforces proper nesting.
CDATA handling works because we intentionally skip parsing inside it, matching the problem requirement that CDATA content must be treated as raw text.
The root tag constraint is guaranteed because the stack cannot become empty before reaching the end of the string.
Python Solution
class Solution:
def isValid(self, code: str) -> bool:
def is_valid_tag(tag_name: str) -> bool:
return (
1 <= len(tag_name) <= 9
and all("A" <= ch <= "Z" for ch in tag_name)
)
stack: list[str] = []
index = 0
n = len(code)
while index < n:
# No content allowed outside root tag
if index > 0 and not stack:
return False
# CDATA section
if code.startswith("<![CDATA[", index):
if not stack:
return False
cdata_end = code.find("]]>", index)
if cdata_end == -1:
return False
index = cdata_end + 3
# End tag
elif code.startswith("</", index):
tag_end = code.find(">", index)
if tag_end == -1:
return False
tag_name = code[index + 2 : tag_end]
if not is_valid_tag(tag_name):
return False
if not stack or stack[-1] != tag_name:
return False
stack.pop()
index = tag_end + 1
# Start tag
elif code[index] == "<":
tag_end = code.find(">", index)
if tag_end == -1:
return False
tag_name = code[index + 1 : tag_end]
if not is_valid_tag(tag_name):
return False
stack.append(tag_name)
index = tag_end + 1
# Regular text
else:
index += 1
return len(stack) == 0
The implementation begins with a helper function is_valid_tag(), which ensures tag names follow the problem rules. A valid tag must contain only uppercase letters and have length between 1 and 9.
The stack keeps track of currently open tags. As we scan the string, the parser checks for CDATA first because it begins with <, and mistakenly treating it as a normal tag would break parsing.
For closing tags, we verify the tag name matches the stack top. This enforces proper nesting and prevents invalid structures such as:
<A><B></A></B>
The root-tag restriction is handled by checking whether the stack becomes empty before reaching the end. If it does, additional content exists outside the valid root tag.
Finally, we return true only if every opened tag has been properly closed.
Go Solution
func isValid(code string) bool {
isValidTag := func(tag string) bool {
if len(tag) < 1 || len(tag) > 9 {
return false
}
for _, ch := range tag {
if ch < 'A' || ch > 'Z' {
return false
}
}
return true
}
stack := []string{}
n := len(code)
index := 0
for index < n {
if index > 0 && len(stack) == 0 {
return false
}
// CDATA
if index+9 <= n && code[index:index+9] == "<![CDATA[" {
if len(stack) == 0 {
return false
}
cdataEnd := indexOf(code, "]]>", index)
if cdataEnd == -1 {
return false
}
index = cdataEnd + 3
} else if index+2 <= n && code[index:index+2] == "</" {
// End tag
tagEnd := indexOf(code, ">", index)
if tagEnd == -1 {
return false
}
tagName := code[index+2 : tagEnd]
if !isValidTag(tagName) {
return false
}
if len(stack) == 0 || stack[len(stack)-1] != tagName {
return false
}
stack = stack[:len(stack)-1]
index = tagEnd + 1
} else if code[index] == '<' {
// Start tag
tagEnd := indexOf(code, ">", index)
if tagEnd == -1 {
return false
}
tagName := code[index+1 : tagEnd]
if !isValidTag(tagName) {
return false
}
stack = append(stack, tagName)
index = tagEnd + 1
} else {
index++
}
}
return len(stack) == 0
}
func indexOf(s, target string, start int) int {
for i := start; i <= len(s)-len(target); i++ {
if s[i:i+len(target)] == target {
return i
}
}
return -1
}
The Go implementation follows the same logic as Python, but uses slices to simulate a stack. Since Go does not provide a built-in string search with start index semantics like Python's find(), we implement a helper indexOf() function.
Go strings are byte indexed, which is safe here because the problem guarantees ASCII characters only. There are no integer overflow concerns because the maximum input length is only 500.
Worked Examples
Example 1
code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
| Step | Current Token | Stack | Action |
|---|---|---|---|
| 1 | <DIV> |
[] |
Push DIV |
| 2 | Text | [DIV] |
Skip |
| 3 | <![CDATA[<div>]]> |
[DIV] |
Skip entire CDATA |
| 4 | </DIV> |
[DIV] |
Pop DIV |
Final stack:
[]
Since the stack is empty and parsing completed successfully, return true.
Example 2
code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
| Step | Current Token | Stack | Action |
|---|---|---|---|
| 1 | <DIV> |
[] |
Push DIV |
| 2 | Text | [DIV] |
Skip |
| 3 | CDATA block | [DIV] |
Ignore contents |
| 4 | Remaining text | [DIV] |
Skip |
| 5 | </DIV> |
[DIV] |
Pop |
Final stack is empty, therefore return true.
Example 3
code = "<A> <B> </A> </B>"
| Step | Current Token | Stack | Action |
|---|---|---|---|
| 1 | <A> |
[] |
Push A |
| 2 | <B> |
[A] |
Push B |
| 3 | </A> |
[A, B] |
Mismatch, expected B |
Since nesting is invalid, return false.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is processed at most a constant number of times |
| Space | O(n) | Stack may store all nested tags in worst case |
Although we use string searching for > and ]]>, every character is effectively visited a constant number of times during parsing. Since n <= 500, performance is easily sufficient. The stack depth is bounded by the nesting level of tags, which in the worst case can be proportional to the input length.
Test Cases
solution = Solution()
assert solution.isValid("<DIV>This is the first line <![CDATA[<div>]]></DIV>") is True # example 1
assert solution.isValid("<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>") is True # example 2
assert solution.isValid("<A> <B> </A> </B>") is False # invalid nesting
assert solution.isValid("<A></A>") is True # simplest valid tag
assert solution.isValid("<A>text</A>") is True # plain text content
assert solution.isValid("<A><B></B></A>") is True # nested tags
assert solution.isValid("<A><![CDATA[test]]></A>") is True # valid CDATA
assert solution.isValid("text<A></A>") is False # text outside root
assert solution.isValid("<A></A>text") is False # trailing text outside root
assert solution.isValid("<A><B></A></B>") is False # mismatched nesting
assert solution.isValid("<a></a>") is False # lowercase tag invalid
assert solution.isValid("<ABCDEFGHIJ></ABCDEFGHIJ>") is False # tag too long
assert solution.isValid("<A></B>") is False # mismatched tag names
assert solution.isValid("<A><![CDATA[test]</A>") is False # unclosed CDATA
assert solution.isValid("<![CDATA[test]]>") is False # CDATA outside tag
assert solution.isValid("<A><</A>") is False # malformed tag syntax
| Test | Why |
|---|---|
| Example 1 | Valid nested structure with CDATA |
| Example 2 | Complex parsing with text and CDATA |
| Example 3 | Detects unbalanced nesting |
<A></A> |
Smallest valid input |
<A>text</A> |
Valid plain text |
| Nested tags | Confirms stack behavior |
| CDATA inside tag | Ensures raw text parsing |
| Text outside root | Enforces root-tag rule |
| Trailing text | Rejects extra content |
| Mismatched nesting | Detects improper order |
| Lowercase tag | Validates uppercase requirement |
| Long tag | Enforces length limit |
| Wrong closing tag | Ensures matching names |
| Broken CDATA | Detects malformed section |
| CDATA outside tag | Enforces placement rule |
Malformed < |
Handles invalid syntax |
Edge Cases
Content Outside the Root Tag
A subtle but important edge case is when valid content exists outside the root tag, such as:
<A></A>extra
A naive parser may incorrectly accept this because the tag itself is valid. However, the problem requires the entire snippet to be wrapped in a single closed tag. Our implementation handles this by immediately rejecting cases where the stack becomes empty before reaching the end of the string.
Invalid Tag Names
Another tricky case involves malformed tag names:
<abc></abc>
or
<ABCDEFGHIJ></ABCDEFGHIJ>
The first violates the uppercase-only rule, while the second exceeds the maximum length of 9. Our helper function validates both conditions before allowing a tag onto the stack.
Improper Nesting
Improper nesting often causes bugs in recursive implementations:
<A><B></A></B>
Even though both tags exist, the closing order is wrong. Since stacks naturally enforce last-opened-first-closed behavior, our implementation correctly detects this mismatch by checking the closing tag against the stack top.
CDATA Misinterpretation
CDATA content must never be parsed as tags. For example:
<A><![CDATA[<invalid></invalid>]]></A>
The invalid tags inside CDATA should not matter. Our implementation explicitly skips everything until the first ]]>, preventing accidental parsing of internal content.