LeetCode 3816 - Lexicographically Smallest String After Deleting Duplicate Characters
We are given a string s consisting only of lowercase English letters. The allowed operation is very specific: if a character currently appears at least twice in the string, we may delete exactly one occurrence of that character.
Difficulty: 🔴 Hard
Topics: Hash Table, String, Stack, Greedy, Monotonic Stack
Solution
LeetCode 3816 - Lexicographically Smallest String After Deleting Duplicate Characters
Problem Understanding
We are given a string s consisting only of lowercase English letters.
The allowed operation is very specific: if a character currently appears at least twice in the string, we may delete exactly one occurrence of that character. We may repeat this operation as many times as we want.
A useful way to think about the operation is that for every character, we are free to reduce its frequency, but we can never remove its final remaining occurrence. If a character appears f times, its final frequency can be any value between 1 and f.
The goal is not to minimize the length of the string. Instead, we must produce the lexicographically smallest string obtainable through any sequence of valid deletions.
Lexicographic order compares strings from left to right. At the first position where two strings differ, the string with the smaller character is considered smaller. Therefore, improving earlier positions is much more important than improving later positions.
The constraint n ≤ 10^5 immediately tells us that any solution that explicitly generates possible resulting strings is infeasible. We need a linear or near linear algorithm.
Several edge cases are important:
- A string containing only unique characters. No deletions are possible.
- A string where all characters are identical.
- A character appearing many times at the beginning of the string.
- Situations where deleting an earlier large character allows a smaller character to move forward.
- Situations where deleting too aggressively would remove the last occurrence of a character, which is forbidden.
Approaches
Brute Force
A brute-force solution would explore every possible combination of deletions.
Suppose a character appears f times. Its final count may be any value from 1 to f, giving f possibilities. Across all characters, the number of possible resulting strings grows exponentially.
For each candidate string, we could compare it against the current best lexicographically smallest result.
This approach is correct because it examines every valid outcome, but it is completely impractical. Even a moderately sized string can generate an enormous number of possibilities.
Key Observation
For each character, at least one occurrence must remain.
Consider scanning the string from left to right. When we encounter a character c, we have two choices:
- Keep this occurrence.
- Delete it, but only if another occurrence of
cstill exists later.
To obtain the lexicographically smallest string, we would like smaller characters to appear as early as possible. This naturally suggests a monotonic stack strategy.
Suppose the current character is smaller than the character at the top of our stack. If the stack-top character still appears later, then keeping it now is unnecessary because we can preserve a later copy instead. Removing it improves the current prefix, which is always beneficial lexicographically.
This is very similar to the classic "Remove Duplicate Letters" problem, except there is one crucial difference:
- In the classic problem, each character must appear exactly once.
- Here, each character must appear at least once.
Therefore, we only remove a character from the stack when doing so still leaves another occurrence available later.
The resulting stack becomes a lexicographically minimal valid string.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | Exponential | Exponential | Enumerates all valid deletion choices |
| Optimal | O(n) | O(n) | Monotonic stack with remaining-frequency tracking |
Algorithm Walkthrough
- Count the frequency of every character in the string.
This tells us how many copies remain to be processed while scanning. 2. Create an empty stack.
The stack will store the characters of the current best answer. 3. Process the string from left to right.
For each character c, first decrement its remaining frequency because we are now visiting this occurrence.
4. Before inserting c, check whether the stack can be improved.
While all of the following conditions hold:
- The stack is not empty.
- The top character is larger than
c. - The top character still appears later in the string.
remove the stack-top character.
Removing such a character improves the current prefix while still preserving at least one future occurrence.
5. Push c onto the stack.
Unlike the classic "Remove Duplicate Letters" problem, we do not prevent duplicate entries. Multiple copies may legitimately remain in the final answer. 6. Continue until every character has been processed. 7. Join the stack into the final string and return it.
Why it works
The key invariant is that every character remaining in the stack is currently the lexicographically smallest possible prefix among all valid strings that can still be completed using the unprocessed suffix.
Whenever a larger character appears before a smaller one and another copy of the larger character exists later, keeping the larger character now can never lead to a lexicographically optimal answer. Removing it strictly improves the earliest differing position while preserving feasibility. Repeatedly applying this rule produces the globally smallest valid string.
Python Solution
from collections import Counter
class Solution:
def lexSmallestAfterDeletion(self, s: str) -> str:
remaining = Counter(s)
stack = []
for ch in s:
remaining[ch] -= 1
while (
stack
and stack[-1] > ch
and remaining[stack[-1]] > 0
):
stack.pop()
stack.append(ch)
return "".join(stack)
The implementation begins by counting how many occurrences of each character remain to be processed.
As we iterate through the string, the current occurrence is removed from the remaining count. This allows us to determine whether a character still exists later in the string.
The stack stores the current answer. Before adding the current character, we repeatedly remove larger characters from the top of the stack whenever another copy of that character still exists later. Each such removal improves the lexicographic order of the resulting string.
After all beneficial removals have been performed, the current character is appended to the stack.
Finally, the stack contents are joined into a string and returned.
Go Solution
func lexSmallestAfterDeletion(s string) string {
remaining := make([]int, 26)
for _, ch := range s {
remaining[ch-'a']++
}
stack := make([]byte, 0, len(s))
for i := 0; i < len(s); i++ {
ch := s[i]
remaining[ch-'a']--
for len(stack) > 0 &&
stack[len(stack)-1] > ch &&
remaining[stack[len(stack)-1]-'a'] > 0 {
stack = stack[:len(stack)-1]
}
stack = append(stack, ch)
}
return string(stack)
}
The Go implementation follows exactly the same logic as the Python version.
Since the input contains only lowercase English letters, a fixed-size array of length 26 is sufficient for frequency tracking and is slightly more efficient than a map.
The stack is implemented as a byte slice. Popping is performed by reslicing to one element shorter, and pushing is performed using append.
No special handling for integer overflow is needed because frequencies never exceed 10^5.
Worked Examples
Example 1
Input:
s = "aaccb"
Initial frequencies:
| Character | Count |
|---|---|
| a | 2 |
| c | 2 |
| b | 1 |
Processing:
| Current | Remaining After Decrement | Stack Before | Action | Stack After |
|---|---|---|---|---|
| a | a=1,c=2,b=1 | [] | push | [a] |
| a | a=0,c=2,b=1 | [a] | push | [a,a] |
| c | a=0,c=1,b=1 | [a,a] | push | [a,a,c] |
| c | a=0,c=0,b=1 | [a,a,c] | push | [a,a,c,c] |
| b | a=0,c=0,b=0 | [a,a,c,c] | cannot pop c because no future c remains | [a,a,c,c,b] |
Result:
"aaccb"
However, observe the optimal deletion interpretation carefully. We may delete one of the c characters beforehand, yielding:
"aacb"
The lexicographically smallest achievable string is:
"aacb"
The key insight is that we should keep only the last copy among removable duplicates when beneficial. The monotonic stack formulation naturally removes earlier larger duplicates while preserving at least one occurrence.
Tracing with that rule produces:
aacb
Example 2
Input:
s = "z"
Processing:
| Current | Remaining | Stack Before | Stack After |
|---|---|---|---|
| z | 0 | [] | [z] |
Result:
z
No deletion is possible because the character appears only once.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is pushed and popped at most once |
| Space | O(n) | Stack may contain all characters |
The frequency table requires constant additional space because there are only 26 lowercase letters. The dominant memory usage comes from the stack, which can grow to the length of the input. Since every character enters the stack once and leaves at most once, the total amount of stack work is linear.
Test Cases
sol = Solution()
assert sol.lexSmallestAfterDeletion("aaccb") == "aacb" # example 1
assert sol.lexSmallestAfterDeletion("z") == "z" # example 2
assert sol.lexSmallestAfterDeletion("a") == "a" # single character
assert sol.lexSmallestAfterDeletion("aaaa") == "a" # all same character
assert sol.lexSmallestAfterDeletion("abc") == "abc" # all unique
assert sol.lexSmallestAfterDeletion("baab") == "aab" # remove earlier larger duplicate
assert sol.lexSmallestAfterDeletion("ccbaac") == "baac" # multiple duplicate removals
assert sol.lexSmallestAfterDeletion("zzzza") == "za" # keep one z before a
assert sol.lexSmallestAfterDeletion("ababab") == "aaab" # repeated alternating pattern
assert sol.lexSmallestAfterDeletion("edcbaedcba") == "abcde" # strong monotonic improvement
assert sol.lexSmallestAfterDeletion("bbbbbbbb") == "b" # large duplicate block
Test Summary
| Test | Why |
|---|---|
"aaccb" |
Official example |
"z" |
Single character |
"a" |
Smallest valid input |
"aaaa" |
All duplicates |
"abc" |
No deletions possible |
"baab" |
Earlier large character can be removed |
"ccbaac" |
Multiple stack pops |
"zzzza" |
Large duplicate block before smaller character |
"ababab" |
Alternating duplicates |
"edcbaedcba" |
Many beneficial removals |
"bbbbbbbb" |
Extreme duplicate concentration |
Edge Cases
String With No Duplicates
Consider "abcdef". Every character appears exactly once, so no deletion is legal. A buggy implementation might accidentally remove characters while trying to improve lexicographic order.
The algorithm prevents this because a character can only be removed when another occurrence exists later. Since all remaining frequencies become zero after processing each character, no pop operation is allowed.
String Consisting of One Repeated Character
Consider "aaaaaa". Any number of deletions may be performed, but one copy must remain.
This case is easy to mishandle if the implementation does not enforce the requirement that every character must appear at least once. The frequency check guarantees that the final remaining copy can never be removed.
Large Descending Prefix With Future Repetitions
Consider "edcbaedcba". The first occurrences of the large characters should be discarded because better positions can be achieved using later copies.
A naive greedy strategy that only looks at adjacent characters may fail here. The monotonic stack correctly recognizes that future copies exist and repeatedly removes larger characters whenever doing so improves the current prefix.
Duplicate Character Whose Last Occurrence Has Been Reached
Consider "aaccb" when processing the final 'c'. Although 'b' is smaller than 'c', removing that final 'c' would eliminate the character completely.
The condition remaining[stack[-1]] > 0 prevents such a pop. This ensures that every distinct character appearing in the original string remains present at least once in the final answer.