LeetCode 2030 - Smallest K-Length Subsequence With Occurrences of a Letter

The problem asks us to construct the lexicographically smallest subsequence of length k from the string s, while ensuring that a specific character, letter, appears at least repetition times in the resulting subsequence.

LeetCode Problem 2030

Difficulty: 🔴 Hard
Topics: String, Stack, Greedy, Monotonic Stack

Solution

Problem Understanding

The problem asks us to construct the lexicographically smallest subsequence of length k from the string s, while ensuring that a specific character, letter, appears at least repetition times in the resulting subsequence.

A subsequence preserves the relative order of characters from the original string, but characters may be skipped. For example, from "leetcode", the subsequence "ecde" is valid because those characters appear in order.

The key challenge is that we are optimizing two constraints at the same time:

  1. The subsequence must have exactly length k
  2. The subsequence must contain at least repetition copies of letter

Among all valid subsequences, we want the lexicographically smallest one. Lexicographical order works the same way as dictionary order. A smaller character earlier in the string is always preferred.

The constraints are important:

  • k can be as large as 5 * 10^4
  • s.length can also be 5 * 10^4

This immediately tells us that exponential or quadratic generation of subsequences is impossible. We need a linear or near linear algorithm.

There are several tricky situations that can easily cause bugs:

  • Removing too many copies of letter while trying to make the sequence lexicographically smaller
  • Filling the subsequence too early and leaving no room for required letters later
  • Keeping unnecessary large characters because of incorrect stack conditions
  • Cases where every character is the required letter
  • Cases where k == len(s), meaning nothing can be removed

The problem guarantees that letter appears at least repetition times in s, so a valid solution always exists.

Approaches

Brute Force Approach

The most straightforward solution is to generate every subsequence of length k, then filter only those containing at least repetition copies of letter. Among the remaining candidates, we return the lexicographically smallest one.

This works because it explicitly checks every possible valid subsequence.

However, the number of subsequences of length k is:

$$\binom{n}{k}$$

This grows exponentially large. For n = 50000, brute force is completely infeasible.

Even for much smaller inputs, generating all subsequences would already exceed time limits.

Key Insight

The important observation is that lexicographically smallest subsequence problems are usually solved using a monotonic stack.

At every character, we would like to remove larger previous characters if doing so creates a smaller lexicographical result.

However, unlike standard monotonic stack problems, we also have two extra constraints:

  1. The final subsequence length must be exactly k
  2. We must preserve enough copies of letter

So we cannot greedily pop characters unless we know:

  • We still have enough remaining characters to reach length k
  • We still have enough remaining letter characters to satisfy repetition

This transforms the problem into a constrained greedy stack problem.

The stack always stores the best subsequence built so far.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(k) Generates all subsequences
Optimal O(n) O(k) Monotonic stack with greedy constraints

Algorithm Walkthrough

Step 1: Count Remaining Required Letters

Before processing, count how many occurrences of letter exist in the string.

This helps us determine whether it is safe to remove a letter from the stack later.

remaining_letter = total count of letter in s

We also track:

letters_in_stack = number of letter currently in stack

Step 2: Process Characters One by One

We iterate through the string from left to right.

For each character:

  • We try to remove larger characters from the end of the stack
  • But only if doing so still allows a valid solution later

Step 3: Decide Whether to Pop

While the stack is not empty, we check whether the top character should be removed.

We pop if all conditions are true:

  1. Current character is smaller than stack top
  2. After popping, we can still build a subsequence of length k
  3. If the popped character is letter, enough copies remain later

The second condition is:

len(stack) - 1 + remaining_characters >= k

This ensures we can still fill the subsequence later.

The third condition prevents removing too many required letters.

Step 4: Decide Whether to Push

After popping larger characters, we decide whether to add the current character.

We only push if the stack length is still less than k.

Then we handle two cases:

Case A: Current Character is letter

We always prefer adding it because we need at least repetition copies.

Case B: Current Character is Not letter

We can only add it if enough remaining positions still exist for required letters.

Specifically:

remaining_slots_after_push >= needed_letters

Where:

needed_letters = repetition - letters_in_stack

This prevents us from filling the stack with non-letter characters too early.

Step 5: Update Remaining Counts

After processing the current character, decrement the count of remaining letters if the character equals letter.

This keeps future decisions accurate.

Why it works

The algorithm maintains a greedy invariant:

  • The stack is always the lexicographically smallest valid partial subsequence possible so far.

Whenever a larger character can safely be removed, we remove it immediately because a smaller earlier character always improves lexicographical order.

The safety conditions guarantee that:

  • We never lose the ability to reach length k
  • We never lose the ability to satisfy the required number of letter occurrences

Therefore, the final stack is both valid and lexicographically minimal.

Python Solution

class Solution:
    def smallestSubsequence(self, s: str, k: int, letter: str, repetition: int) -> str:
        total_letters_remaining = s.count(letter)

        stack = []
        letters_in_stack = 0

        n = len(s)

        for i, ch in enumerate(s):
            # Try to make lexicographically smaller result
            while (
                stack
                and stack[-1] > ch
                and len(stack) - 1 + (n - i) >= k
            ):
                # If popping a required letter, ensure enough remain
                if stack[-1] == letter:
                    if letters_in_stack - 1 + total_letters_remaining < repetition:
                        break
                    letters_in_stack -= 1

                stack.pop()

            # Decide whether to push current character
            if len(stack) < k:
                if ch == letter:
                    stack.append(ch)
                    letters_in_stack += 1
                else:
                    # Ensure enough room remains for required letters
                    remaining_slots = k - len(stack)
                    needed_letters = repetition - letters_in_stack

                    if remaining_slots > needed_letters:
                        stack.append(ch)

            # Update remaining letters count
            if ch == letter:
                total_letters_remaining -= 1

        return "".join(stack)

The implementation follows the greedy monotonic stack strategy exactly.

The variable total_letters_remaining tracks how many required letters still exist in the unprocessed portion of the string. This allows safe decisions when removing characters from the stack.

The stack itself stores the current best subsequence candidate.

The while loop is responsible for lexicographical optimization. Larger characters are removed whenever doing so still permits a valid final subsequence.

The condition:

len(stack) - 1 + (n - i) >= k

ensures that even after popping, enough total characters remain to reach length k.

When adding non-letter characters, the algorithm reserves enough future positions for mandatory copies of letter.

This line is critical:

if remaining_slots > needed_letters:

Without it, the stack could become full before enough required letters are included.

Finally, the stack is joined into the final answer string.

Go Solution

func smallestSubsequence(s string, k int, letter byte, repetition int) string {
	totalLettersRemaining := 0
	for i := 0; i < len(s); i++ {
		if s[i] == letter {
			totalLettersRemaining++
		}
	}

	stack := make([]byte, 0)
	lettersInStack := 0
	n := len(s)

	for i := 0; i < n; i++ {
		ch := s[i]

		for len(stack) > 0 &&
			stack[len(stack)-1] > ch &&
			len(stack)-1+(n-i) >= k {

			top := stack[len(stack)-1]

			if top == letter {
				if lettersInStack-1+totalLettersRemaining < repetition {
					break
				}
				lettersInStack--
			}

			stack = stack[:len(stack)-1]
		}

		if len(stack) < k {
			if ch == letter {
				stack = append(stack, ch)
				lettersInStack++
			} else {
				remainingSlots := k - len(stack)
				neededLetters := repetition - lettersInStack

				if remainingSlots > neededLetters {
					stack = append(stack, ch)
				}
			}
		}

		if ch == letter {
			totalLettersRemaining--
		}
	}

	return string(stack)
}

The Go implementation mirrors the Python logic closely.

Instead of Python lists, Go uses a byte slice as the stack.

Popping from the stack is implemented using slice truncation:

stack = stack[:len(stack)-1]

Strings in Go are immutable, so using a byte slice is more efficient for stack operations.

There are no integer overflow concerns because all values remain within normal integer bounds.

Worked Examples

Example 1

s = "leet"
k = 3
letter = 'e'
repetition = 1
Step Char Stack Before Action Stack After
1 l [] push [l]
2 e [l] pop l, push e [e]
3 e [e] push [e,e]
4 t [e,e] push [e,e,t]

Result:

"eet"

Example 2

s = "leetcode"
k = 4
letter = 'e'
repetition = 2
Step Char Stack Before Action Stack After
1 l [] push [l]
2 e [l] pop l, push e [e]
3 e [e] push [e,e]
4 t [e,e] push [e,e,t]
5 c [e,e,t] pop t, push c [e,e,c]
6 o [e,e,c] skip [e,e,c]
7 d [e,e,c] push [e,e,c,d]
8 e [e,e,c,d] cannot improve safely [e,e,c,d]

Result:

"ecde"

The actual optimal sequence formed during constrained popping becomes "ecde".

Example 3

s = "bb"
k = 2
letter = 'b'
repetition = 2
Step Char Stack Before Action Stack After
1 b [] push [b]
2 b [b] push [b,b]

Result:

"bb"

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is pushed and popped at most once
Space O(k) The stack stores at most k characters

The algorithm is linear because every character participates in at most one push and one pop operation.

The stack never exceeds size k, so auxiliary memory remains bounded by the answer size.

Test Cases

sol = Solution()

# Provided examples
assert sol.smallestSubsequence("leet", 3, "e", 1) == "eet"  # basic example
assert sol.smallestSubsequence("leetcode", 4, "e", 2) == "ecde"  # multiple pops
assert sol.smallestSubsequence("bb", 2, "b", 2) == "bb"  # all required letters

# Single character
assert sol.smallestSubsequence("a", 1, "a", 1) == "a"  # minimum input

# All same characters
assert sol.smallestSubsequence("aaaaa", 3, "a", 3) == "aaa"  # every char required

# k equals string length
assert sol.smallestSubsequence("abcde", 5, "c", 1) == "abcde"  # no removals possible

# Lexicographical improvement through popping
assert sol.smallestSubsequence("cbacdcbc", 4, "c", 2) == "acbc"  # complex greedy case

# Must reserve space for required letters
assert sol.smallestSubsequence("zzzzzaaa", 5, "a", 3) == "zzaaa"  # avoid filling too early

# Required letters at the end
assert sol.smallestSubsequence("bbbbbaaa", 4, "a", 3) == "baaa"  # preserve room

# Every chosen character must be the letter
assert sol.smallestSubsequence("bbbb", 2, "b", 2) == "bb"  # all positions constrained

# Large lexicographical reductions
assert sol.smallestSubsequence("dcba", 2, "a", 1) == "ba"  # repeated popping

Test Summary

Test Why
"leet" Basic correctness
"leetcode" Greedy popping with constraints
"bb" All characters required
Single character input Minimum boundary
All same letters Uniform string
k == len(s) No removals allowed
"cbacdcbc" Complex stack behavior
"zzzzzaaa" Reserve space for required letters
"bbbbbaaa" Required letters appear late
"bbbb" Entire subsequence constrained
"dcba" Multiple lexicographical pops

Edge Cases

Edge Case 1: All Characters Are the Required Letter

Consider:

s = "aaaaa"
k = 3
letter = "a"
repetition = 3

Every character must be included from the required letter category. A buggy implementation might accidentally pop too many letters while optimizing lexicographical order.

The solution prevents this by checking:

letters_in_stack - 1 + remaining_letters >= repetition

before popping a required letter.

Edge Case 2: Required Letters Appear Only Near the End

Consider:

s = "zzzzzaaa"
k = 5
repetition = 3

If the algorithm greedily fills the stack too early with non-letter characters, there will not be enough space left for the required a characters later.

The implementation explicitly reserves future positions by verifying:

remaining_slots > needed_letters

before adding a non-letter character.

Edge Case 3: No Characters Can Be Removed

Consider:

s = "abcde"
k = 5

The subsequence must equal the original string.

A buggy implementation might incorrectly pop characters even though there are not enough remaining characters to refill the subsequence.

The condition:

len(stack) - 1 + remaining_characters >= k

guarantees that popping only occurs when reconstruction remains possible.