LeetCode 722 - Remove Comments

The problem asks us to remove comments from a C++ program represented as an array of strings. Each string represents a line of the program, split by newline characters.

LeetCode Problem 722

Difficulty: 🟡 Medium
Topics: Array, String

Solution

Problem Understanding

The problem asks us to remove comments from a C++ program represented as an array of strings. Each string represents a line of the program, split by newline characters. In C++, there are two types of comments: line comments starting with "//" and block comments enclosed between "/*" and "*/". Line comments terminate at the end of the line, while block comments may span multiple lines.

Our task is to process the input line by line and remove all comments. After removing comments, lines that are empty should not be included in the output. The output must retain the relative order of the remaining code lines.

The input constraints are manageable: the source code has at most 100 lines, each up to 80 characters. Every block comment is guaranteed to be closed, and there are no quotes in the input. These constraints imply that we do not need to handle malformed comments or embedded strings, which simplifies parsing. Key edge cases include comments that start and end in the middle of lines, multiple comments in a single line, and block comments spanning multiple lines. We also need to correctly handle implicit newline removals caused by block comments.

Approaches

A brute-force approach would be to convert the source code into a single string, search for all occurrences of "//" and "/*...*/", and remove them. After removing comments, we would split the string back into lines. This approach works, but it is cumbersome because handling multiline block comments and line breaks manually is error-prone. Additionally, string concatenation and repeated searches could make the solution less efficient, though the input size is small.

The optimal approach uses a line-by-line scan with a flag to track whether we are inside a block comment. For each line, we iterate through characters while checking for "//" and "/*" or "*/" sequences. If we encounter a line comment outside a block comment, we ignore the rest of the line. If we encounter a block comment start, we enter block comment mode, skipping characters until we find the end of the block comment. This approach preserves line structure and avoids unnecessary string operations.

This approach is linear in the total number of characters in the input, making it efficient for the given constraints.

Approach Time Complexity Space Complexity Notes
Brute Force O(N * M) O(N * M) Concatenate lines, remove all comments via string replacement, then split back
Optimal O(N * M) O(N * M) Single pass through characters, tracking block comments, preserving lines

Here, N is the number of lines, and M is the average line length.

Algorithm Walkthrough

  1. Initialize an empty list result to store cleaned lines and a boolean flag in_block_comment set to False.
  2. Iterate through each line in the input source.
  3. For each line, initialize an empty list new_line to accumulate non-comment characters and a pointer i to track the current character index.
  4. While i is within the bounds of the line:
  • If in_block_comment is True, check if the current and next character are "*/". If yes, exit block comment mode and advance i by 2. Otherwise, skip the current character.
  • If in_block_comment is False, check if the current and next character are "//". If yes, break the loop since the rest of the line is a line comment.
  • If the current and next character are "/*", enter block comment mode and advance i by 2.
  • Otherwise, append the current character to new_line and advance i by 1.
  1. After processing the line, if new_line is non-empty and not entirely whitespace, join it into a string and append to result.
  2. Return the result list.

Why it works: The invariant is that all characters outside block comments are accumulated, and all comments are skipped. Line and block comment rules are followed exactly, ensuring that only valid code is retained.

Python Solution

from typing import List

class Solution:
    def removeComments(self, source: List[str]) -> List[str]:
        result = []
        in_block_comment = False
        new_line = []

        for line in source:
            i = 0
            if not in_block_comment:
                new_line = []
            while i < len(line):
                if in_block_comment:
                    if i + 1 < len(line) and line[i] == '*' and line[i+1] == '/':
                        in_block_comment = False
                        i += 2
                    else:
                        i += 1
                else:
                    if i + 1 < len(line) and line[i] == '/' and line[i+1] == '/':
                        break
                    elif i + 1 < len(line) and line[i] == '/' and line[i+1] == '*':
                        in_block_comment = True
                        i += 2
                    else:
                        new_line.append(line[i])
                        i += 1
            if not in_block_comment and new_line:
                result.append(''.join(new_line))

        return result

This Python implementation directly mirrors the algorithm. The main loop handles line traversal, while in_block_comment tracks block comment state. Characters are appended only if outside a comment, ensuring correct line construction.

Go Solution

func removeComments(source []string) []string {
    result := []string{}
    inBlock := false
    var newLine []byte

    for _, line := range source {
        i := 0
        if !inBlock {
            newLine = []byte{}
        }
        for i < len(line) {
            if inBlock {
                if i+1 < len(line) && line[i] == '*' && line[i+1] == '/' {
                    inBlock = false
                    i += 2
                } else {
                    i++
                }
            } else {
                if i+1 < len(line) && line[i] == '/' && line[i+1] == '/' {
                    break
                } else if i+1 < len(line) && line[i] == '/' && line[i+1] == '*' {
                    inBlock = true
                    i += 2
                } else {
                    newLine = append(newLine, line[i])
                    i++
                }
            }
        }
        if !inBlock && len(newLine) > 0 {
            result = append(result, string(newLine))
        }
    }
    return result
}

The Go solution is almost identical to Python. Slices replace Python lists for character accumulation. newLine is a byte slice to efficiently build strings. The main logic remains unchanged.

Worked Examples

Example 1

Input: ["/*Test program */", "int main()", "{ ", " // variable declaration ", "int a, b, c;", "/* This is a test", " multiline ", " comment for ", " testing */", "a = b + c;", "}"]

Step-by-step:

Line in_block_comment new_line Action
"/*Test program */" False→True→False [] Block comment removed
"int main()" False ['i','n','t',' ','m','a','i','n','(',')'] Appended to result
"{ " False ['{',' '] Appended
" // variable declaration " False [' ',' '] Stops at "//", append [' ',' ']
"int a, b, c;" False ['i','n','t',' ','a',',',' ','b',',',' ','c',';'] Append
"/* This is a test" False→True [] Enter block comment
" multiline " True [] Inside block comment, skip
" comment for " True [] Skip
" testing */" True→False [] End block comment
"a = b + c;" False ['a',' ','=',' ','b',' ','+',' ','c',';'] Append
"}" False ['}'] Append

Output: ["int main()","{ "," ","int a, b, c;","a = b + c;","}"]

Example 2

Input: ["a/*comment", "line", "more_comment*/b"]

Step-by-step:

Line in_block_comment new_line Action
"a/*comment" False→True ['a'] Enter block comment
"line" True [] Skip
"more_comment*/b" True→False ['b'] Exit block comment and append 'b'

Output: ["ab"]

Complexity Analysis

Measure Complexity Explanation
Time O(N * M) Each character of each line is visited exactly once
Space O(N * M) Accumulate non-comment characters into the result list

The solution is linear with respect to the total number of characters, which is optimal for this problem. Using a single pass ensures we never re-process lines or characters unnecessarily.

Test Cases

# Provided examples
assert Solution().removeComments(["/*Test program */", "int main()", "{ ", "  // variable declaration ", "int a, b, c;", "/* This is a test", "   multiline  ", "   comment