LeetCode 726 - Number of Atoms

This problem asks us to parse a chemical formula represented as a string and return a canonical string that counts the number of each atom in the formula. Each atom starts with an uppercase letter and may have lowercase letters following it.

LeetCode Problem 726

Difficulty: 🔴 Hard
Topics: Hash Table, String, Stack, Sorting

Solution

Problem Understanding

This problem asks us to parse a chemical formula represented as a string and return a canonical string that counts the number of each atom in the formula. Each atom starts with an uppercase letter and may have lowercase letters following it. A number following an atom represents how many of that atom are present, with the convention that no number implies a count of 1. Parentheses are used to group atoms, and a number following a closing parenthesis multiplies the counts of all atoms inside the parentheses.

For example, "Mg(OH)2" means 1 magnesium atom, 2 oxygen atoms, and 2 hydrogen atoms. Nested parentheses are possible, as in "K4(ON(SO3)2)2", which requires correctly applying multipliers from multiple levels of parentheses.

The input is always valid and contains only letters, digits, and parentheses. Its length can be up to 1000 characters, so any solution must handle moderately large strings efficiently. Edge cases include formulas with no numbers, deeply nested parentheses, or multiple concatenated formulas without separators. The output must be in sorted order by atom name, with the count omitted if it is 1.

Approaches

A naive brute-force approach would attempt to split the formula at each atom and parenthesis, recursively count atoms, and manually multiply counts. This approach is prone to errors, especially with nested parentheses, and requires complex bookkeeping. It would likely involve repeated scanning of the string and building new maps for every nested group, which can be inefficient.

The key insight for an optimal solution is to use a stack-based approach. By scanning the formula left-to-right, we can push a new counter onto the stack whenever we encounter an open parenthesis and pop it when we encounter a closing parenthesis. Multipliers are applied immediately when closing parentheses are encountered. Using a stack ensures that nested structures are handled correctly, and combining counts from different levels of nesting becomes straightforward. Sorting and formatting the output is simple once all counts are accumulated.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n) Recursive parsing and repeated merging of dictionaries; inefficient for nested formulas.
Optimal O(n log n) O(n) Stack-based single pass; sorting at the end contributes log n factor.

Algorithm Walkthrough

  1. Initialize a stack that will store Counter objects to track atom counts at each level of nesting. Push an empty Counter for the base level.
  2. Iterate over the formula string character by character using an index i.
  3. When an uppercase letter is encountered, read the full atom name by appending any lowercase letters following it.
  4. Immediately after the atom, read any digits representing the count. If no digits exist, the count is 1.
  5. Update the top Counter on the stack with the atom and its count.
  6. When an opening parenthesis '(' is encountered, push a new empty Counter onto the stack to start a new nested scope.
  7. When a closing parenthesis ')' is encountered, read the digits following it to determine the multiplier (default 1 if missing). Pop the top Counter, multiply all its counts by the multiplier, and merge it into the previous Counter on the stack.
  8. After processing the entire formula, the remaining Counter on the stack contains the total counts for each atom.
  9. Sort the atoms alphabetically and construct the output string. Append the count only if it is greater than 1.

Why it works: The stack ensures that each nested level is counted separately, and the counts are correctly multiplied when parentheses close. The top of the stack always represents the current active scope, so atoms are accumulated in the right context. This guarantees that the final merged counter is correct.

Python Solution

from collections import Counter

class Solution:
    def countOfAtoms(self, formula: str) -> str:
        stack = [Counter()]
        i, n = 0, len(formula)

        while i < n:
            if formula[i] == '(':
                stack.append(Counter())
                i += 1
            elif formula[i] == ')':
                i += 1
                start = i
                while i < n and formula[i].isdigit():
                    i += 1
                multiplier = int(formula[start:i] or 1)
                top = stack.pop()
                for atom in top:
                    stack[-1][atom] += top[atom] * multiplier
            elif formula[i].isupper():
                start = i
                i += 1
                while i < n and formula[i].islower():
                    i += 1
                atom = formula[start:i]
                start_count = i
                while i < n and formula[i].isdigit():
                    i += 1
                count = int(formula[start_count:i] or 1)
                stack[-1][atom] += count
            else:
                i += 1

        result = []
        for atom in sorted(stack[-1]):
            cnt = stack[-1][atom]
            result.append(f"{atom}{cnt if cnt > 1 else ''}")
        return ''.join(result)

The code initializes a stack with a base Counter and processes the formula character by character. Parentheses push and pop new counters, applying multipliers. Atoms and their counts are added to the current top counter. Finally, the atoms are sorted and concatenated into the output string.

Go Solution

import (
    "strconv"
    "strings"
    "unicode"
)

func countOfAtoms(formula string) string {
    type Counter map[string]int
    stack := []Counter{{}}
    n := len(formula)
    i := 0

    for i < n {
        c := formula[i]
        if c == '(' {
            stack = append(stack, Counter{})
            i++
        } else if c == ')' {
            i++
            start := i
            for i < n && unicode.IsDigit(rune(formula[i])) {
                i++
            }
            multiplier := 1
            if start != i {
                multiplier, _ = strconv.Atoi(formula[start:i])
            }
            top := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            for atom, cnt := range top {
                stack[len(stack)-1][atom] += cnt * multiplier
            }
        } else if unicode.IsUpper(rune(c)) {
            start := i
            i++
            for i < n && unicode.IsLower(rune(formula[i])) {
                i++
            }
            atom := formula[start:i]
            startCount := i
            for i < n && unicode.IsDigit(rune(formula[i])) {
                i++
            }
            count := 1
            if startCount != i {
                count, _ = strconv.Atoi(formula[startCount:i])
            }
            stack[len(stack)-1][atom] += count
        } else {
            i++
        }
    }

    atoms := make([]string, 0, len(stack[0]))
    for atom := range stack[0] {
        atoms = append(atoms, atom)
    }
    sort.Strings(atoms)
    var sb strings.Builder
    for _, atom := range atoms {
        sb.WriteString(atom)
        if stack[0][atom] > 1 {
            sb.WriteString(strconv.Itoa(stack[0][atom]))
        }
    }
    return sb.String()
}

The Go solution mirrors the Python solution, with Go-specific handling of slices for the stack, map[string]int for counters, and strconv for converting numbers. Sorting and string concatenation use Go idioms.

Worked Examples

Example: "Mg(OH)2"

Step Character Stack State Notes
0 M [{}] Start reading atom
1 g [{}] Atom = Mg
2 ( [{Mg:1}, {}] Push new counter for '('
3 O [{Mg:1}, {}] Add O:1 to top
4 H [{Mg:1}, {O:1, H:1}] Add H:1 to top
5 ) [{Mg:1}] Pop top, multiply by 2, merge → {Mg:1, O:2, H:2}

Final sorted string: "H2MgO2"

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Single pass over formula O(n), sorting atoms at the end O(k log k) where k <= n
Space O(n) Stack and counters store at most all atoms; nested parentheses add extra counters

The algorithm is efficient since it avoids repeated string scanning and dictionary merging. Sorting dominates the log factor.

Test Cases

# Provided examples
assert Solution().countOfAtoms("H2O") == "H2O" # simple formula
assert Solution().countOfAtoms("Mg(OH)2") == "H2MgO2" # single-level parentheses
assert Solution().countOfAtoms("K4(ON(SO3)2)2") == "K4N2O14S4" # nested parentheses

# Edge cases
assert Solution().countOfAtoms("H") == "H" # single atom, no number
assert Solution().countOfAtoms("O2") == "O2" # single atom with number
assert Solution().countOfAtoms("(H)") == "H" # parentheses with no multiplier
assert Solution().countOfAtoms("(H)