LeetCode 1081 - Smallest Subsequence of Distinct Characters
The problem asks us to construct the lexicographically smallest subsequence of a string that contains every distinct character exactly once. A subsequence is formed by deleting zero or more characters without changing the relative order of the remaining characters.
Difficulty: 🟡 Medium
Topics: String, Stack, Greedy, Monotonic Stack
Solution
Problem Understanding
The problem asks us to construct the lexicographically smallest subsequence of a string that contains every distinct character exactly once.
A subsequence is formed by deleting zero or more characters without changing the relative order of the remaining characters. This is important because we are not allowed to rearrange characters arbitrarily. The final answer must preserve the order in which characters appear in the original string.
We are given a string s consisting only of lowercase English letters. Some characters may appear multiple times. Our goal is to remove duplicates while keeping exactly one occurrence of each distinct character, but we must do so in a way that produces the smallest possible lexicographical result.
Lexicographical order is the same ordering used in dictionaries. For example:
"abc"is smaller than"acb""acdb"is smaller than"adbc"
The constraints are relatively small:
1 <= s.length <= 1000- Only lowercase English letters appear
Even though the input size is not extremely large, brute force approaches quickly become infeasible because the number of subsequences grows exponentially.
There are several important observations and edge cases:
- A character may appear many times, but the final answer can only contain it once.
- We cannot simply sort the distinct characters because the relative order from the original string must be preserved.
- Sometimes we should remove an earlier character occurrence if it appears again later and replacing it creates a smaller lexicographical result.
- Strings with already increasing order, such as
"abc", should remain unchanged. - Strings with repeated decreasing patterns, such as
"cbacdcbc", require careful stack manipulation to achieve the optimal answer.
The key challenge is balancing two requirements simultaneously:
- Every distinct character must appear exactly once.
- The resulting subsequence must be lexicographically smallest.
Approaches
Brute Force Approach
A brute force solution would generate all possible subsequences of the string and filter those that:
- Contain every distinct character exactly once.
- Preserve subsequence ordering.
- Have the correct length equal to the number of distinct characters.
Among all valid subsequences, we would choose the lexicographically smallest one.
This approach is correct because it exhaustively checks every possible candidate. However, a string of length n has 2^n subsequences. Even for moderate input sizes, this becomes computationally impossible.
For example, if n = 1000, the number of subsequences is astronomically large.
Optimal Greedy + Monotonic Stack Approach
The efficient solution relies on a greedy strategy combined with a stack.
The main insight is:
When building the answer from left to right, if the current character is smaller than the character at the top of the stack, and the top character appears again later, then removing the larger character now leads to a lexicographically smaller result without losing that character permanently.
This observation allows us to greedily maintain the smallest possible subsequence at every step.
We use:
- A stack to build the result
- A frequency counter to know whether a character appears later
- A set to track which characters are already included
The stack behaves like a monotonic structure where we remove larger characters whenever it is safe and beneficial to do so.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(2^n) | Generates all subsequences and filters valid ones |
| Optimal | O(n) | O(1) | Greedy monotonic stack with character tracking |
Algorithm Walkthrough
- Count the frequency of every character in the string.
We need to know whether a character will appear again later. This information determines whether it is safe to remove a character from the stack. 2. Initialize an empty stack.
The stack stores the characters of our current best subsequence.
3. Initialize a set called in_stack.
This helps us avoid adding duplicate characters into the result. 4. Iterate through each character in the string.
For every character:
- Decrease its remaining frequency because we are currently processing one occurrence.
- If the character is already in the stack, skip it because we only want one occurrence.
- While the stack is not empty, check whether the current character should replace the top character.
We remove the top character if all three conditions hold:
- The stack is not empty.
- The current character is lexicographically smaller than the top character.
- The top character appears again later.
If all conditions are true, removing the top character improves the lexicographical order and is still safe because we can reinsert that character later. 6. Push the current character onto the stack.
After removing any larger unnecessary characters, we add the current character into the subsequence. 7. Mark the character as present in the stack. 8. After processing all characters, join the stack into a string.
This produces the final lexicographically smallest subsequence.
Why it works
The algorithm maintains the invariant that the stack always represents the smallest lexicographical subsequence that can still be completed into a valid final answer.
Whenever a larger character appears before a smaller one, and the larger character can still be used later, removing it immediately improves the answer. Since we never remove a character unless it appears again later, we never lose the ability to include all distinct characters exactly once.
This greedy choice is always optimal because lexicographical order is determined from left to right. Making earlier characters as small as possible guarantees the globally smallest answer.
Python Solution
class Solution:
def smallestSubsequence(self, s: str) -> str:
remaining_count = {}
for char in s:
remaining_count[char] = remaining_count.get(char, 0) + 1
stack = []
in_stack = set()
for char in s:
remaining_count[char] -= 1
if char in in_stack:
continue
while (
stack
and char < stack[-1]
and remaining_count[stack[-1]] > 0
):
removed_char = stack.pop()
in_stack.remove(removed_char)
stack.append(char)
in_stack.add(char)
return "".join(stack)
The implementation begins by counting how many times each character appears in the string. This allows the algorithm to determine whether a character can safely be removed from the stack and reinserted later.
The stack stores the current subsequence being built. The in_stack set ensures that each character appears at most once in the stack.
For every character in the input string, the algorithm first decreases its remaining count because one occurrence has now been processed.
If the character already exists in the stack, it is skipped immediately. This enforces the requirement that every distinct character appears exactly once.
Otherwise, the algorithm checks whether the current character should replace larger characters near the end of the current subsequence. The while loop repeatedly removes stack elements as long as:
- The stack is not empty.
- The current character is smaller.
- The removed character appears again later.
This greedy improvement ensures lexicographical minimality.
Finally, the character is pushed into the stack and marked as present.
At the end, joining the stack produces the final answer.
Go Solution
func smallestSubsequence(s string) string {
remainingCount := make(map[byte]int)
for i := 0; i < len(s); i++ {
remainingCount[s[i]]++
}
stack := make([]byte, 0)
inStack := make(map[byte]bool)
for i := 0; i < len(s); i++ {
char := s[i]
remainingCount[char]--
if inStack[char] {
continue
}
for len(stack) > 0 &&
char < stack[len(stack)-1] &&
remainingCount[stack[len(stack)-1]] > 0 {
removedChar := stack[len(stack)-1]
stack = stack[:len(stack)-1]
delete(inStack, removedChar)
}
stack = append(stack, char)
inStack[char] = true
}
return string(stack)
}
The Go implementation follows the same algorithmic structure as the Python version.
A map[byte]int stores remaining frequencies because the string contains only lowercase English letters. The stack is implemented using a byte slice, which provides efficient append and pop operations.
The inStack map tracks whether a character already exists in the stack.
One implementation detail in Go is stack popping. Since slices do not have a built in pop operation, the code removes the last element using:
stack = stack[:len(stack)-1]
The final byte slice is converted back into a string before returning.
Worked Examples
Example 1
Input:
s = "bcabc"
Initial frequency map:
| Character | Count |
|---|---|
| b | 2 |
| c | 2 |
| a | 1 |
Processing steps:
| Current Char | Stack Before | Action | Stack After |
|---|---|---|---|
| b | [] | push b | [b] |
| c | [b] | push c | [b, c] |
| a | [b, c] | pop c, pop b, push a | [a] |
| b | [a] | push b | [a, b] |
| c | [a, b] | push c | [a, b, c] |
Final answer:
"abc"
Example 2
Input:
s = "cbacdcbc"
Initial frequency map:
| Character | Count |
|---|---|
| c | 4 |
| b | 2 |
| a | 1 |
| d | 1 |
Processing steps:
| Current Char | Stack Before | Action | Stack After |
|---|---|---|---|
| c | [] | push c | [c] |
| b | [c] | pop c, push b | [b] |
| a | [b] | pop b, push a | [a] |
| c | [a] | push c | [a, c] |
| d | [a, c] | push d | [a, c, d] |
| c | [a, c, d] | skip duplicate | [a, c, d] |
| b | [a, c, d] | cannot pop d, pop c, push b | [a, d, b] |
| c | [a, d, b] | push c | [a, d, b, c] |
Final answer:
"acdb"
Notice the subtle detail:
dcannot be removed because it does not appear later.ccan be removed because anothercexists later.
This distinction is what makes the algorithm correct.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is pushed and popped at most once |
| Space | O(1) | At most 26 lowercase English letters are stored |
The algorithm processes every character exactly once. Although the inner while loop may appear expensive, each character can only be pushed and popped one time throughout the entire execution. Therefore, the total work remains linear.
The space complexity is technically bounded by the alphabet size. Since only lowercase English letters are allowed, the maximum number of distinct characters is 26, which is constant space.
Test Cases
solution = Solution()
assert solution.smallestSubsequence("bcabc") == "abc" # basic example
assert solution.smallestSubsequence("cbacdcbc") == "acdb" # complex stack popping
assert solution.smallestSubsequence("a") == "a" # single character
assert solution.smallestSubsequence("aaaa") == "a" # all duplicates
assert solution.smallestSubsequence("abc") == "abc" # already smallest
assert solution.smallestSubsequence("cba") == "cba" # cannot reorder arbitrarily
assert solution.smallestSubsequence("ecbacba") == "eacb" # multiple removals
assert solution.smallestSubsequence("leetcode") == "letcod" # mixed duplicates
assert solution.smallestSubsequence("bbcaac") == "bac" # delayed replacement
assert solution.smallestSubsequence("cdadabcc") == "adbc" # frequent popping
assert solution.smallestSubsequence("zxyzzxy") == "xyz" # repeated pattern
assert solution.smallestSubsequence("abcdefghijklmnopqrstuvwxyza") == \
"abcdefghijklmnopqrstuvwxyz" # full alphabet with duplicate
| Test | Why |
|---|---|
"bcabc" |
Validates the standard example |
"cbacdcbc" |
Tests complex greedy popping logic |
"a" |
Smallest possible input |
"aaaa" |
Ensures duplicates are removed correctly |
"abc" |
Already optimal ordering |
"cba" |
Confirms subsequence ordering cannot be rearranged |
"ecbacba" |
Tests multiple stack removals |
"leetcode" |
Handles scattered duplicates |
"bbcaac" |
Validates delayed lexicographical improvement |
"cdadabcc" |
Tests repeated popping conditions |
"zxyzzxy" |
Repeated descending and ascending patterns |
| Full alphabet test | Ensures scalability across many distinct letters |
Edge Cases
One important edge case occurs when all characters are identical, such as "aaaa". A naive implementation might accidentally include multiple copies of the same character. The algorithm handles this correctly using the in_stack set, which prevents duplicates from being inserted into the stack.
Another tricky case is a strictly decreasing sequence such as "cba". It may seem tempting to rearrange characters into "abc", but that would violate subsequence ordering rules. Since characters must preserve their original relative positions, the correct answer remains "cba". The algorithm respects this naturally because it only removes characters when they appear again later.
A more subtle edge case occurs when a larger character should not be removed because it never appears again later. For example, in "cbacdcbc", when processing 'b', the stack contains 'd'. Even though 'b' is lexicographically smaller than 'd', removing 'd' would permanently lose it because no future 'd' exists. The condition:
remaining_count[stack[-1]] > 0
prevents this mistake and guarantees correctness.
Another important scenario involves already optimal strings such as "abc". The algorithm should not unnecessarily modify the sequence. Since no smaller character appears later that justifies popping, the stack simply grows in order and returns the original string unchanged.