LeetCode 1081: Smallest Subsequence of Distinct Characters

A clear explanation of finding the lexicographically smallest subsequence with all distinct characters using a greedy stack approach.

Problem Restatement

We are given a string s.

We need to return the lexicographically smallest subsequence of s that contains all distinct characters of s exactly once.

The official constraints state that 1 <= s.length <= 1000 and s consists of lowercase English letters.

Note: This problem is identical to LeetCode 316 (Remove Duplicate Letters).

Input and Output

Item Meaning
Input String s
Output Lexicographically smallest subsequence with each character appearing exactly once

Function shape:

def smallestSubsequence(s: str) -> str:
    ...

Examples

Example 1:

s = "bcabc"

Answer:

"abc"

Example 2:

s = "cbacdcbc"

Answer:

"acdb"

Key Insight

Use a monotonic stack. For each character:

  1. If it is already in the stack, skip it.
  2. Otherwise, pop characters from the top of the stack that are greater than the current character and appear later in the string (so they can be added again).
  3. Push the current character.

Algorithm

  1. Count last occurrence of each character.
  2. Maintain a stack and an in_stack boolean set.
  3. For each character:
    • Skip if already in stack.
    • Pop from stack while top > current char and top's last occurrence is later.
    • Push current char.

Correctness

By popping larger characters that will appear again, we defer them to a later (possibly better) position. The resulting stack is always lexicographically smallest while including all characters exactly once.

Edge Cases

  • The stack should store exactly the unresolved items needed by the invariant.
  • Process equal values deliberately; many monotonic-stack problems differ only in < versus <=.
  • Flush or inspect the remaining stack after the scan if unresolved items still contribute to the answer.

Complexity

Metric Value Why
Time O(n) Each character is pushed and popped at most once
Space O(1) At most 26 characters in stack

Common Pitfalls

  • Do not optimize away the invariant; the code should still make it clear what condition is being maintained.
  • Prefer problem-specific names over one-letter variables once the logic becomes stateful.

Implementation

class Solution:
    def smallestSubsequence(self, s: str) -> str:
        last = {c: i for i, c in enumerate(s)}
        stack = []
        in_stack = set()

        for i, c in enumerate(s):
            if c in in_stack:
                continue
            while stack and stack[-1] > c and last[stack[-1]] > i:
                in_stack.discard(stack.pop())
            stack.append(c)
            in_stack.add(c)

        return "".join(stack)

Testing

def run_tests():
    s = Solution()

    assert s.smallestSubsequence("bcabc") == "abc"
    assert s.smallestSubsequence("cbacdcbc") == "acdb"
    assert s.smallestSubsequence("a") == "a"
    assert s.smallestSubsequence("abcd") == "abcd"

    print("all tests passed")

run_tests()
Test Expected Why
"bcabc" "abc" All three chars, lexicographic minimum
"cbacdcbc" "acdb" Complex greedy choices
"a" "a" Single character
Already distinct "abcd" Already optimal