LeetCode 3662 - Filter Characters by Frequency

This problem asks us to filter a string based on character frequency while preserving the original order of characters. We are given a string s consisting only of lowercase English letters and an integer k.

LeetCode Problem 3662

Difficulty: 🟢 Easy
Topics: Hash Table, String, Counting

Solution

Problem Understanding

This problem asks us to filter a string based on character frequency while preserving the original order of characters. We are given a string s consisting only of lowercase English letters and an integer k. The goal is to construct a new string by keeping only those characters whose total frequency in the entire string is strictly less than k. Importantly, if a character qualifies, every occurrence of that character in the original string must be included in the output, and the relative ordering of all kept characters must remain unchanged.

The input string represents a sequence of characters where repetition matters, since decisions depend on global frequency rather than local structure. The integer k acts as a threshold that determines whether a character is “rare enough” to be retained.

The output is another string derived from filtering s without rearranging characters. If no characters satisfy the condition, the result must be an empty string.

The constraints are small, with 1 <= s.length <= 100, meaning even quadratic solutions would technically pass. However, the problem naturally invites an optimal linear-time solution using frequency counting.

Important edge cases include when all characters are frequent enough to be excluded, when all characters are unique, when k equals 1 (which would exclude everything), and when all characters are identical. These cases test whether frequency counting and filtering logic are applied globally rather than locally.

Approaches

The brute-force approach evaluates each character independently by scanning the entire string to compute its frequency on demand. For every character position, we count how many times that character appears in s, and if the count is less than k, we include it in the result. While this approach is correct, it repeatedly recomputes frequencies, leading to unnecessary work.

The optimal approach recognizes that character frequencies are independent of position and can be precomputed in a single pass. Once we build a frequency map for all characters, we can iterate through the string once more and include only those characters whose frequency is below the threshold k. This reduces redundant computation and ensures linear time complexity.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Recompute frequency for each character by scanning the string repeatedly
Optimal O(n) O(1) Precompute frequency once, then filter in a single pass

Algorithm Walkthrough

The optimal algorithm relies on separating the problem into two independent phases: frequency computation and filtering.

  1. First, we traverse the string s and compute the frequency of each character using a hash map (or fixed-size array since only lowercase letters are allowed). This step is necessary because we need global knowledge of character occurrences before making any filtering decisions.
  2. We store these frequencies so that we can access them in constant time for each character during the filtering stage.
  3. Next, we iterate through the string again from left to right. For each character, we check its precomputed frequency.
  4. If the frequency of the current character is strictly less than k, we append it to the output string. Otherwise, we skip it entirely.
  5. Because we process characters in original order and only append valid ones, the relative ordering of retained characters is preserved automatically.
  6. Finally, we return the constructed result string.

Why it works

The correctness relies on the invariant that frequency information is global and independent of position. Once frequencies are computed, each character can be evaluated in isolation without affecting others. Since we preserve iteration order during the filtering pass, the output maintains the required relative ordering while ensuring only valid characters are included.

Python Solution

class Solution:
    def filterCharacters(self, s: str, k: int) -> str:
        freq = {}

        for ch in s:
            freq[ch] = freq.get(ch, 0) + 1

        result = []

        for ch in s:
            if freq[ch] < k:
                result.append(ch)

        return "".join(result)

The implementation first builds a dictionary freq that maps each character to its total occurrence count in the string. This corresponds to the preprocessing step of the algorithm. Then it iterates through s again, and for each character checks whether its frequency is less than k. If so, it appends the character to a list result. Finally, the list is joined into a string, ensuring efficient concatenation.

Go Solution

func filterCharacters(s string, k int) string {
    freq := make(map[rune]int)

    for _, ch := range s {
        freq[ch]++
    }

    result := make([]rune, 0, len(s))

    for _, ch := range s {
        if freq[ch] < k {
            result = append(result, ch)
        }
    }

    return string(result)
}

The Go implementation mirrors the Python solution. A map from rune to int is used to store frequencies. We iterate over the string using range, which properly handles Unicode code points. A slice of runes is used as a dynamic buffer for the result, which is more efficient than repeated string concatenation. Finally, the rune slice is converted back into a string.

The main Go-specific difference is the use of rune instead of byte, ensuring correctness for Unicode-safe iteration, even though the problem guarantees lowercase English letters.

Worked Examples

Example 1

Input: s = "aadbbcccca", k = 3

First pass (frequency table):

Character Count
a 3
d 1
b 2
c 4

Second pass (filtering step-by-step):

Index Char Frequency Keep? Result so far
0 a 3 No ""
1 a 3 No ""
2 d 1 Yes "d"
3 b 2 Yes "db"
4 b 2 Yes "dbb"
5 c 4 No "dbb"
6 c 4 No "dbb"
7 c 4 No "dbb"
8 c 4 No "dbb"
9 a 3 No "dbb"

Output: "dbb"

Example 2

Input: s = "xyz", k = 2

Frequency table:

Character Count
x 1
y 1
z 1

Filtering step:

Each character has frequency 1, which is less than 2, so all are kept.

Output: "xyz"

Complexity Analysis

Measure Complexity Explanation
Time O(n) Two linear passes over the string, one for frequency counting and one for filtering
Space O(1) Frequency map is bounded by at most 26 lowercase letters

The algorithm runs in linear time because each character is visited a constant number of times. Space usage is constant because the alphabet size does not grow with input size.

Test Cases

assert Solution().filterCharacters("aadbbcccca", 3) == "dbb"  # example 1
assert Solution().filterCharacters("xyz", 2) == "xyz"        # example 2
assert Solution().filterCharacters("aaaa", 2) == ""          # all excluded (freq >= k)
assert Solution().filterCharacters("a", 1) == ""             # single char, k=1 excludes all
assert Solution().filterCharacters("aabbcc", 3) == "aabbcc"  # all freq < k
assert Solution().filterCharacters("abcabc", 2) == ""       # all freq equal k
assert Solution().filterCharacters("abacabad", 3) == "dddd"  # mixed frequencies case
Test Why
"aadbbcccca", 3 verifies standard mixed frequency filtering
"xyz", 2 verifies all characters are kept
"aaaa", 2 verifies full exclusion when frequency threshold is exceeded
"a", 1 verifies k=1 edge case
"aabbcc", 3 verifies all characters pass
"abcabc", 2 verifies boundary where freq equals k
"abacabad", 3 stresses mixed distribution behavior

Edge Cases

One important edge case is when k = 1. Since every character appears at least once, no character can have frequency strictly less than 1, meaning the correct output is always an empty string. This case is easy to mishandle if the condition is written incorrectly as <= k instead of < k.

Another edge case occurs when all characters in the string are identical. In this scenario, the frequency of that single character equals the string length. If k is greater than 1, the character may be excluded entirely depending on the threshold, and this tests whether the solution correctly uses global frequency rather than per-position logic.

A third edge case is when all characters are unique. Here, every character has frequency 1, so the result depends entirely on whether k is greater than 1. This case verifies that the solution properly handles uniform low-frequency distributions and preserves order without alteration.

A final subtle edge case is when frequencies are mixed but the output becomes empty due to filtering. This ensures the implementation correctly handles returning an empty string without errors from joining or appending logic.