LeetCode 1170 - Compare Strings by Frequency of the Smallest Character
The problem defines a function f(s) for a non-empty string s. This function returns the frequency of the lexicographically smallest character in the string.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Binary Search, Sorting
Solution
Problem Understanding
The problem defines a function f(s) for a non-empty string s. This function returns the frequency of the lexicographically smallest character in the string.
For example:
f("dcce") = 2- The smallest character is
'c' 'c'appears 2 times
We are given two arrays:
queries, containing query stringswords, containing reference strings
For every query string queries[i], we must compute how many strings W in words satisfy:
$$f(queries[i]) < f(W)$$
The result should be an integer array where each position contains the count for the corresponding query.
The important part of the problem is that we are not comparing strings directly. We only compare the frequency values produced by the function f.
The constraints are relatively small:
- Up to 2000 queries
- Up to 2000 words
- Maximum string length is only 10
These constraints allow several approaches, including brute force. However, a more optimized approach using sorting and binary search is significantly cleaner and more scalable.
A key observation is that the maximum possible frequency is only 10, because strings have length at most 10. Even so, the standard optimal solution uses sorting plus binary search, which generalizes well and demonstrates an important algorithmic pattern.
Several edge cases are important:
- Strings where every character is identical, such as
"aaaa" - Strings where the smallest character appears only once
- Multiple words producing the same frequency
- Queries larger than all word frequencies
- Queries smaller than all word frequencies
The problem guarantees that all strings are non-empty and contain only lowercase English letters, so we never need to handle empty strings or invalid characters.
Approaches
Brute Force Approach
The most direct solution is:
- Compute
f(s)for every query. - Compute
f(w)for every word. - For each query frequency, scan all word frequencies and count how many are larger.
This approach is correct because it explicitly checks every valid comparison required by the problem statement.
To compute f(s):
- Find the smallest character in the string.
- Count how many times it appears.
If there are Q queries and W words, the brute-force comparison phase takes O(Q × W) time.
Given the constraints, this solution is actually acceptable in practice:
$$2000 \times 2000 = 4,000,000$$
However, we can do better conceptually.
Optimal Approach
The key insight is that we repeatedly ask the same type of question:
How many word frequencies are greater than a given query frequency?
Instead of scanning all word frequencies every time, we can:
- Precompute all word frequencies.
- Sort them.
- Use binary search for each query.
After sorting, binary search can quickly find the first position where the word frequency becomes strictly greater than the query frequency.
If that index is idx, then:
$$\text{answer} = \text{len(words)} - idx$$
This transforms repeated linear scans into logarithmic searches.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(Q × W + QL + WL) | O(W) | Directly compares every query against every word |
| Optimal | O(W log W + Q log W + QL + WL) | O(W) | Uses sorting and binary search for efficient lookup |
Here:
Q= number of queriesW= number of wordsL= maximum string length, at most 10
Algorithm Walkthrough
- Define a helper function
frequency(s)that computesf(s).
This function finds the lexicographically smallest character in the string and counts how many times it appears. Since strings are very short, a simple scan is sufficient. 2. Compute frequencies for all words.
For every string in words, calculate its frequency value and store it in a list.
Example:
words = ["a", "aa", "aaa", "aaaa"]
frequencies = [1, 2, 3, 4]
- Sort the word frequencies.
Sorting enables binary search. Once sorted, all larger frequencies appear to the right of any given position. 4. Process each query independently.
For each query:
- Compute its frequency.
- Use binary search to find the first word frequency strictly greater than the query frequency.
- Compute the number of larger frequencies.
If the binary search returns index idx, then every element from idx onward is greater.
Therefore:
count = len(word_frequencies) - idx
- Store the result for each query.
Append the computed count to the answer array.
Why it works
The sorted word frequency array maintains the invariant that all frequencies to the right of a position are greater than or equal to the current value.
Binary search finds the first position where:
$$f(word) > f(query)$$
Every frequency after that index also satisfies the condition. Therefore, subtracting the index from the total number of words gives exactly the required count.
Python Solution
from typing import List
from bisect import bisect_right
class Solution:
def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
def frequency(s: str) -> int:
smallest_char = min(s)
return s.count(smallest_char)
word_frequencies = [frequency(word) for word in words]
word_frequencies.sort()
result = []
for query in queries:
query_frequency = frequency(query)
index = bisect_right(word_frequencies, query_frequency)
result.append(len(word_frequencies) - index)
return result
The helper function frequency directly implements the definition of f(s) from the problem statement. It first determines the smallest character using min(s) and then counts how many times it appears.
Next, the solution precomputes all frequencies for words and sorts them. This preprocessing step is essential because it allows efficient binary searches later.
For every query, the code computes its frequency and uses bisect_right to find the insertion point after all values less than or equal to the query frequency. This gives the first index containing a strictly larger frequency.
Finally, the number of valid words is computed as:
len(word_frequencies) - index
This works because every value to the right of the insertion point is strictly greater.
Go Solution
package main
import (
"sort"
)
func numSmallerByFrequency(queries []string, words []string) []int {
frequency := func(s string) int {
smallest := byte('z')
for i := 0; i < len(s); i++ {
if s[i] < smallest {
smallest = s[i]
}
}
count := 0
for i := 0; i < len(s); i++ {
if s[i] == smallest {
count++
}
}
return count
}
wordFrequencies := make([]int, len(words))
for i, word := range words {
wordFrequencies[i] = frequency(word)
}
sort.Ints(wordFrequencies)
result := make([]int, len(queries))
for i, query := range queries {
queryFrequency := frequency(query)
index := sort.Search(
len(wordFrequencies),
func(j int) bool {
return wordFrequencies[j] > queryFrequency
},
)
result[i] = len(wordFrequencies) - index
}
return result
}
The Go implementation follows the same algorithmic structure as the Python solution.
Instead of Python's bisect_right, Go uses sort.Search. The search condition finds the first index where the word frequency becomes strictly greater than the query frequency.
Go strings are byte indexed, which works perfectly here because the input only contains lowercase English letters.
The implementation uses slices throughout, and no special handling for nil is required because empty slices behave correctly with len.
Worked Examples
Example 1
queries = ["cbd"]
words = ["zaaaz"]
Step 1: Compute word frequencies
| Word | Smallest Character | Frequency |
|---|---|---|
| "zaaaz" | 'a' | 3 |
word_frequencies = [3]
After sorting:
[3]
Step 2: Process query
For "cbd":
| Query | Smallest Character | Frequency |
|---|---|---|
| "cbd" | 'b' | 1 |
We now need:
count of frequencies > 1
Binary search:
| Array | Query Frequency | First Greater Index |
|---|---|---|
| [3] | 1 | 0 |
Result:
1 - 0 = 1
Final answer:
[1]
Example 2
queries = ["bbb", "cc"]
words = ["a", "aa", "aaa", "aaaa"]
Step 1: Compute word frequencies
| Word | Frequency |
|---|---|
| "a" | 1 |
| "aa" | 2 |
| "aaa" | 3 |
| "aaaa" | 4 |
Sorted frequencies:
[1, 2, 3, 4]
Step 2: Process query "bbb"
| Query | Smallest Character | Frequency |
|---|---|---|
| "bbb" | 'b' | 3 |
We need:
count of frequencies > 3
Binary search result:
| Array | Query Frequency | First Greater Index |
|---|---|---|
| [1,2,3,4] | 3 | 3 |
Count:
4 - 3 = 1
Step 3: Process query "cc"
| Query | Smallest Character | Frequency |
|---|---|---|
| "cc" | 'c' | 2 |
We need:
count of frequencies > 2
Binary search result:
| Array | Query Frequency | First Greater Index |
|---|---|---|
| [1,2,3,4] | 2 | 2 |
Count:
4 - 2 = 2
Final answer:
[1, 2]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(W log W + Q log W) | Sorting dominates preprocessing, binary search handles each query efficiently |
| Space | O(W) | Stores the word frequency array |
The frequency computation itself is effectively constant time because strings have maximum length 10. Therefore, the dominant cost comes from sorting the word frequencies and performing binary searches.
Sorting takes:
$$O(W \log W)$$
Each query performs one binary search:
$$O(\log W)$$
So processing all queries takes:
$$O(Q \log W)$$
The extra space is primarily the frequency array for the words.
Test Cases
from typing import List
class Solution:
def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
from bisect import bisect_right
def frequency(s: str) -> int:
smallest_char = min(s)
return s.count(smallest_char)
word_frequencies = sorted(frequency(word) for word in words)
result = []
for query in queries:
query_frequency = frequency(query)
index = bisect_right(word_frequencies, query_frequency)
result.append(len(word_frequencies) - index)
return result
solution = Solution()
assert solution.numSmallerByFrequency(
["cbd"],
["zaaaz"]
) == [1] # basic example
assert solution.numSmallerByFrequency(
["bbb", "cc"],
["a", "aa", "aaa", "aaaa"]
) == [1, 2] # multiple queries
assert solution.numSmallerByFrequency(
["a"],
["b", "cc", "ddd"]
) == [2] # one frequency equal, two larger
assert solution.numSmallerByFrequency(
["aaaa"],
["a", "aa", "aaa"]
) == [0] # query larger than all word frequencies
assert solution.numSmallerByFrequency(
["b"],
["aaaa"]
) == [1] # query smaller than all word frequencies
assert solution.numSmallerByFrequency(
["abcd"],
["efgh"]
) == [0] # equal frequencies only
assert solution.numSmallerByFrequency(
["zzzz"],
["a", "aa", "aaa", "aaaa"]
) == [0] # same frequency as largest
assert solution.numSmallerByFrequency(
["a", "aa", "aaa"],
["aaaa"]
) == [1, 1, 1] # repeated valid comparisons
assert solution.numSmallerByFrequency(
["dcce"],
["aaaa", "bb", "ccc"]
) == [1] # smallest character appears multiple times
Test Summary
| Test | Why |
|---|---|
["cbd"], ["zaaaz"] |
Validates the simplest example |
["bbb","cc"] |
Tests multiple queries with different answers |
| Query with equal frequencies | Ensures strict greater-than comparison |
| Query larger than all words | Verifies zero result handling |
| Query smaller than all words | Verifies full-count handling |
| Equal frequencies only | Ensures equality is excluded |
| Maximum repeated character style input | Tests repeated-character logic |
| Multiple queries against one word | Verifies repeated binary searches |
| Mixed frequency patterns | Tests smallest-character counting correctness |
Edge Cases
One important edge case occurs when the query frequency is larger than every word frequency. In this situation, the binary search returns the length of the array because no larger value exists. The implementation correctly computes:
len(word_frequencies) - len(word_frequencies) = 0
which produces the expected answer.
Another important edge case is when many word frequencies are equal to the query frequency. The problem requires strictly greater frequencies, not greater-than-or-equal. Using bisect_right in Python and the > condition in Go ensures that all equal values are skipped correctly before counting larger ones.
A third edge case involves strings where every character is identical, such as "aaaa" or "bbbb". In these cases, the smallest character frequency equals the entire string length. The helper function handles this naturally because min(s) returns the repeated character and count() returns the full length.
A subtle edge case appears when the smallest character occurs multiple times but is not the first character in the string. For example:
"dcce"
The smallest character is 'c', not 'd'. A buggy implementation might incorrectly count the first character or assume sorted order. The helper function explicitly computes the minimum character before counting, which guarantees correctness regardless of character arrangement.