LeetCode 316 - Remove Duplicate Letters

This problem asks us to construct a new string from the input string s such that every distinct character appears exactly once, while also ensuring that the final result is the smallest possible in lexicographical order.

LeetCode Problem 316

Difficulty: 🟡 Medium
Topics: String, Stack, Greedy, Monotonic Stack

Solution

LeetCode 316, Remove Duplicate Letters

Problem Understanding

This problem asks us to construct a new string from the input string s such that every distinct character appears exactly once, while also ensuring that the final result is the smallest possible in lexicographical order.

The input is a string consisting only of lowercase English letters. Characters may appear multiple times. We are allowed to remove characters, but we cannot reorder the remaining characters arbitrarily. The relative order of characters that remain must still respect the original string order because we are effectively choosing a subsequence.

The challenge comes from balancing two competing goals:

  1. Every unique character must appear exactly once.
  2. Among all valid answers, we must choose the lexicographically smallest one.

For example, consider:

s = "bcabc"

The distinct characters are a, b, and c. Some valid subsequences containing all three exactly once are:

  • "bca"
  • "cab"
  • "abc"

Among these, "abc" is lexicographically smallest, so that is the correct answer.

The constraints tell us that:

  • 1 <= s.length <= 10^4
  • Only lowercase English letters are used

Since the input can contain up to ten thousand characters, exponential or factorial solutions are not feasible. We need an algorithm close to linear time.

Several edge cases are important:

  • A string with all identical characters, such as "aaaaa", should return "a".
  • A string already in optimal form, such as "abc", should remain unchanged.
  • A string where a smaller character appears later and should replace an earlier larger character, such as "cbacdcbc", requires careful greedy decisions.
  • Characters that appear only once cannot be removed once chosen.

The core difficulty is determining when it is safe to remove a previously chosen character in favor of a smaller one that appears later.

Approaches

Brute Force Approach

A brute force solution would generate every possible subsequence of the string, then filter only those subsequences that:

  1. Contain every distinct character exactly once.
  2. Have length equal to the number of distinct characters.

Among all valid candidates, we would return the lexicographically smallest one.

This approach is correct because it exhaustively checks all possibilities. However, a string of length n has 2^n subsequences. With n up to 10^4, this is computationally impossible.

Even if we attempted optimizations, the search space grows exponentially and quickly becomes unmanageable.

Optimal Greedy + Monotonic Stack Approach

The key insight is that we want the smallest lexicographical result, which means we should prefer smaller characters earlier in the answer whenever possible.

Suppose we already placed a character in our result stack. If we later encounter a smaller character, we would like to remove the larger one and place the smaller one earlier. However, we can only do this safely if the removed character appears again later in the string. Otherwise, we would lose it completely.

This leads to a greedy strategy:

  • Build the answer incrementally using a stack.

  • Keep the stack lexicographically increasing when possible.

  • Before adding a new character:

  • Remove larger characters from the top of the stack if they appear again later.

  • Avoid duplicates by ensuring each character appears only once in the stack.

This works because each greedy removal improves lexicographical order without sacrificing the ability to include all required characters later.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(2^n) Generates all subsequences and checks validity
Optimal O(n) O(1) or O(n) Uses greedy strategy with monotonic stack

Algorithm Walkthrough

  1. Count how many times each character appears in the string.

We need to know whether a character will appear again later. A frequency map allows us to determine whether it is safe to remove a character from the stack. 2. Create an empty stack to build the answer.

The stack represents the current best subsequence. 3. Create a set called in_stack.

This tracks which characters are already present in the stack so we do not insert duplicates. 4. Iterate through each character in the string.

For every character:

  • Decrease its remaining frequency count because we are currently processing it.
  1. If the character is already in the stack, skip it.

Since each character must appear exactly once, we do not want duplicates. 6. While the stack is not empty, check whether we should remove the top character.

We remove the top character if all three conditions are true:

  • The current character is lexicographically smaller than the top character.
  • The top character appears again later in the string.
  • Removing it would improve lexicographical order.

This is the greedy optimization step. 7. Push the current character onto the stack.

After removing any unnecessary larger characters, the current character is inserted. 8. Mark the character as present in the stack.

Add it to in_stack. 9. After processing all characters, join the stack into a string.

This final stack is the smallest valid subsequence.

Why it works

The algorithm maintains an important invariant:

  • The stack always represents the smallest lexicographical subsequence possible using the characters processed so far.

Whenever we remove a character from the stack, we know it appears again later, so we are not losing the ability to include it in the final answer. By always removing larger characters when a smaller one can replace them safely, we greedily improve lexicographical order while preserving validity.

Because each character is pushed and popped at most once, the algorithm remains efficient.

Python Solution

class Solution:
    def removeDuplicateLetters(self, s: str) -> str:
        remaining = {}

        for char in s:
            remaining[char] = remaining.get(char, 0) + 1

        stack = []
        in_stack = set()

        for char in s:
            remaining[char] -= 1

            if char in in_stack:
                continue

            while (
                stack
                and char < stack[-1]
                and remaining[stack[-1]] > 0
            ):
                removed = stack.pop()
                in_stack.remove(removed)

            stack.append(char)
            in_stack.add(char)

        return "".join(stack)

The implementation begins by counting the remaining occurrences of every character. This allows the algorithm to determine whether a character can safely be removed from the stack.

The stack stores the current answer being built. The in_stack set prevents duplicate insertions.

For every character in the string, we first decrement its remaining count because we are currently visiting that occurrence.

If the character is already in the stack, we skip it immediately since every character should appear only once.

Otherwise, we compare the current character with the top of the stack. If the current character is smaller and the top character appears later again, we remove the larger character. This process continues until no more beneficial removals are possible.

Finally, the current character is appended to the stack and marked as present.

At the end, joining the stack produces the lexicographically smallest valid subsequence.

Go Solution

func removeDuplicateLetters(s string) string {
    remaining := make(map[byte]int)

    for i := 0; i < len(s); i++ {
        remaining[s[i]]++
    }

    stack := make([]byte, 0)
    inStack := make(map[byte]bool)

    for i := 0; i < len(s); i++ {
        char := s[i]
        remaining[char]--

        if inStack[char] {
            continue
        }

        for len(stack) > 0 &&
            char < stack[len(stack)-1] &&
            remaining[stack[len(stack)-1]] > 0 {

            removed := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            delete(inStack, removed)
        }

        stack = append(stack, char)
        inStack[char] = true
    }

    return string(stack)
}

The Go implementation follows the same logic as the Python solution. Since Go strings are byte slices for ASCII characters, we use byte for character storage.

The stack is implemented using a slice of bytes. Removing the top element is done using slice truncation:

stack = stack[:len(stack)-1]

The inStack map tracks whether a character is already present.

Because the input contains only lowercase English letters, there are no Unicode complications in this solution.

Worked Examples

Example 1

s = "bcabc"

Initial frequencies:

Character Count
a 1
b 2
c 2

Processing steps:

Current Char Stack Before Action Stack After
b [] push b [b]
c [b] push c [b, c]
a [b, c] pop c, pop b, push a [a]
b [a] push b [a, b]
c [a, b] push c [a, b, c]

Final result:

"abc"

Example 2

s = "cbacdcbc"

Initial frequencies:

Character Count
a 1
b 2
c 4
d 1

Processing steps:

Current Char Stack Before Action Stack After
c [] push c [c]
b [c] pop c, push b [b]
a [b] pop b, push a [a]
c [a] push c [a, c]
d [a, c] push d [a, c, d]
c [a, c, d] skip duplicate [a, c, d]
b [a, c, d] cannot pop d, push b [a, c, d, b]
c [a, c, d, b] skip duplicate [a, c, d, b]

Final result:

"acdb"

Notice the important moment when processing "b" near the end:

  • d is larger than b
  • However, d does not appear again later
  • Therefore, removing d would make the result invalid

So d must remain.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is pushed and popped at most once
Space O(n) Stack, set, and frequency map store characters

The algorithm is linear because every character can enter the stack once and leave the stack once. Even though there is a nested while loop, the total number of pops across the entire execution is bounded by n.

The auxiliary space includes:

  • The stack
  • The set of used characters
  • The frequency map

Since the alphabet contains only lowercase English letters, the practical space usage is very small, though asymptotically it is still considered O(n) or O(1) relative to alphabet size.

Test Cases

solution = Solution()

assert solution.removeDuplicateLetters("bcabc") == "abc"  # basic example
assert solution.removeDuplicateLetters("cbacdcbc") == "acdb"  # complex greedy case

assert solution.removeDuplicateLetters("a") == "a"  # single character
assert solution.removeDuplicateLetters("aaaaa") == "a"  # all duplicates
assert solution.removeDuplicateLetters("abc") == "abc"  # already optimal
assert solution.removeDuplicateLetters("cba") == "cba"  # cannot reorder arbitrarily

assert solution.removeDuplicateLetters("abacb") == "abc"  # repeated middle characters
assert solution.removeDuplicateLetters("bbcaac") == "bac"  # requires selective popping

assert solution.removeDuplicateLetters("ecbacba") == "eacb"  # classic tricky case
assert solution.removeDuplicateLetters("leetcode") == "letcod"  # mixed duplicates

assert solution.removeDuplicateLetters("zxyxz") == "xyz"  # lexicographical improvement
assert solution.removeDuplicateLetters("abcdabcd") == "abcd"  # repeated sequence
Test Why
"bcabc" Validates standard example
"cbacdcbc" Tests complex stack decisions
"a" Smallest possible input
"aaaaa" Ensures duplicates collapse correctly
"abc" Already optimal ordering
"cba" Confirms characters cannot be arbitrarily reordered
"abacb" Tests duplicate skipping
"bbcaac" Tests repeated popping behavior
"ecbacba" Common tricky greedy scenario
"leetcode" Mixed repeated characters
"zxyxz" Ensures lexicographical minimization
"abcdabcd" Repeated ordered pattern

Edge Cases

All Characters Are the Same

Consider:

"aaaaaa"

A buggy implementation might repeatedly add the same character or fail to remove duplicates properly. The correct result is simply "a".

The implementation handles this using the in_stack set. Once "a" is added to the stack, every later occurrence is skipped immediately.

A Larger Character Cannot Be Removed

Consider:

"cbad"

When processing "a", the stack contains "b". Although "a" is smaller, "b" does not appear later again. Removing "b" would permanently lose it, making the final result invalid.

The condition:

remaining[stack[-1]] > 0

prevents unsafe removals.

Characters Already in Optimal Order

Consider:

"abc"

The stack already forms the smallest lexicographical sequence, so no popping occurs. The algorithm simply appends each character.

This verifies that the greedy logic does not over-optimize or accidentally remove characters unnecessarily.

Repeated Characters Interleaved With Smaller Ones

Consider:

"bbcaac"

The algorithm must carefully decide when to remove characters and when to keep them. Multiple pops may occur in sequence before inserting a smaller character.

The monotonic stack structure naturally handles this by repeatedly checking whether the current character should replace larger previous ones.