LeetCode 1170 - Compare Strings by Frequency of the Smallest Character

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Binary Search, Sorting

Solution

Problem Understanding

The problem defines a function f(s) for a non-empty string s. This function returns the frequency of the lexicographically smallest character in the string.

For example:

f("dcce") = 2
The smallest character is 'c'
'c' appears 2 times

We are given two arrays:

queries, containing query strings
words, containing reference strings

For every query string queries[i], we must compute how many strings W in words satisfy:

$$f(queries[i]) < f(W)$$

The result should be an integer array where each position contains the count for the corresponding query.

The important part of the problem is that we are not comparing strings directly. We only compare the frequency values produced by the function f.

The constraints are relatively small:

Up to 2000 queries
Up to 2000 words
Maximum string length is only 10

These constraints allow several approaches, including brute force. However, a more optimized approach using sorting and binary search is significantly cleaner and more scalable.

A key observation is that the maximum possible frequency is only 10, because strings have length at most 10. Even so, the standard optimal solution uses sorting plus binary search, which generalizes well and demonstrates an important algorithmic pattern.

Several edge cases are important:

Strings where every character is identical, such as "aaaa"
Strings where the smallest character appears only once
Multiple words producing the same frequency
Queries larger than all word frequencies
Queries smaller than all word frequencies

The problem guarantees that all strings are non-empty and contain only lowercase English letters, so we never need to handle empty strings or invalid characters.

Approaches

Brute Force Approach

The most direct solution is:

Compute f(s) for every query.
Compute f(w) for every word.
For each query frequency, scan all word frequencies and count how many are larger.

This approach is correct because it explicitly checks every valid comparison required by the problem statement.

To compute f(s):

Find the smallest character in the string.
Count how many times it appears.

If there are Q queries and W words, the brute-force comparison phase takes O(Q × W) time.

Given the constraints, this solution is actually acceptable in practice:

$$2000 \times 2000 = 4,000,000$$

However, we can do better conceptually.

Optimal Approach

The key insight is that we repeatedly ask the same type of question:

How many word frequencies are greater than a given query frequency?

Instead of scanning all word frequencies every time, we can:

Precompute all word frequencies.
Sort them.
Use binary search for each query.

After sorting, binary search can quickly find the first position where the word frequency becomes strictly greater than the query frequency.

If that index is idx, then:

$$\text{answer} = \text{len(words)} - idx$$

This transforms repeated linear scans into logarithmic searches.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(Q × W + QL + WL)	O(W)	Directly compares every query against every word
Optimal	O(W log W + Q log W + QL + WL)	O(W)	Uses sorting and binary search for efficient lookup

Here:

Q = number of queries
W = number of words
L = maximum string length, at most 10

Algorithm Walkthrough

Define a helper function frequency(s) that computes f(s).

This function finds the lexicographically smallest character in the string and counts how many times it appears. Since strings are very short, a simple scan is sufficient. 2. Compute frequencies for all words.

For every string in words, calculate its frequency value and store it in a list.

Example:

words = ["a", "aa", "aaa", "aaaa"]
frequencies = [1, 2, 3, 4]

Sort the word frequencies.

Sorting enables binary search. Once sorted, all larger frequencies appear to the right of any given position. 4. Process each query independently.

For each query:

Compute its frequency.
Use binary search to find the first word frequency strictly greater than the query frequency.

Compute the number of larger frequencies.

If the binary search returns index idx, then every element from idx onward is greater.

Therefore:

count = len(word_frequencies) - idx

Store the result for each query.

Append the computed count to the answer array.

Why it works

The sorted word frequency array maintains the invariant that all frequencies to the right of a position are greater than or equal to the current value.

Binary search finds the first position where:

$$f(word) > f(query)$$

Every frequency after that index also satisfies the condition. Therefore, subtracting the index from the total number of words gives exactly the required count.

Python Solution

from typing import List
from bisect import bisect_right

class Solution:
    def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
        
        def frequency(s: str) -> int:
            smallest_char = min(s)
            return s.count(smallest_char)

        word_frequencies = [frequency(word) for word in words]
        word_frequencies.sort()

        result = []

        for query in queries:
            query_frequency = frequency(query)

            index = bisect_right(word_frequencies, query_frequency)

            result.append(len(word_frequencies) - index)

        return result

The helper function frequency directly implements the definition of f(s) from the problem statement. It first determines the smallest character using min(s) and then counts how many times it appears.

Next, the solution precomputes all frequencies for words and sorts them. This preprocessing step is essential because it allows efficient binary searches later.

For every query, the code computes its frequency and uses bisect_right to find the insertion point after all values less than or equal to the query frequency. This gives the first index containing a strictly larger frequency.

Finally, the number of valid words is computed as:

len(word_frequencies) - index

This works because every value to the right of the insertion point is strictly greater.

Go Solution

package main

import (
	"sort"
)

func numSmallerByFrequency(queries []string, words []string) []int {

	frequency := func(s string) int {
		smallest := byte('z')

		for i := 0; i < len(s); i++ {
			if s[i] < smallest {
				smallest = s[i]
			}
		}

		count := 0

		for i := 0; i < len(s); i++ {
			if s[i] == smallest {
				count++
			}
		}

		return count
	}

	wordFrequencies := make([]int, len(words))

	for i, word := range words {
		wordFrequencies[i] = frequency(word)
	}

	sort.Ints(wordFrequencies)

	result := make([]int, len(queries))

	for i, query := range queries {

		queryFrequency := frequency(query)

		index := sort.Search(
			len(wordFrequencies),
			func(j int) bool {
				return wordFrequencies[j] > queryFrequency
			},
		)

		result[i] = len(wordFrequencies) - index
	}

	return result
}

The Go implementation follows the same algorithmic structure as the Python solution.

Instead of Python's bisect_right, Go uses sort.Search. The search condition finds the first index where the word frequency becomes strictly greater than the query frequency.

Go strings are byte indexed, which works perfectly here because the input only contains lowercase English letters.

The implementation uses slices throughout, and no special handling for nil is required because empty slices behave correctly with len.

Worked Examples

Example 1

queries = ["cbd"]
words = ["zaaaz"]

Step 1: Compute word frequencies

Word	Smallest Character	Frequency
"zaaaz"	'a'	3

word_frequencies = [3]

After sorting:

[3]

Step 2: Process query

For "cbd":

Query	Smallest Character	Frequency
"cbd"	'b'	1

We now need:

count of frequencies > 1

Binary search:

Array	Query Frequency	First Greater Index
[3]	1	0

Result:

1 - 0 = 1

Final answer:

[1]

Example 2

queries = ["bbb", "cc"]
words = ["a", "aa", "aaa", "aaaa"]

Step 1: Compute word frequencies

Word	Frequency
"a"	1
"aa"	2
"aaa"	3
"aaaa"	4

Sorted frequencies:

[1, 2, 3, 4]

Step 2: Process query `"bbb"`

Query	Smallest Character	Frequency
"bbb"	'b'	3

We need:

count of frequencies > 3

Binary search result:

Array	Query Frequency	First Greater Index
[1,2,3,4]	3	3

Count:

4 - 3 = 1

Step 3: Process query `"cc"`

Query	Smallest Character	Frequency
"cc"	'c'	2

We need:

count of frequencies > 2

Binary search result:

Array	Query Frequency	First Greater Index
[1,2,3,4]	2	2

Count:

4 - 2 = 2

Final answer:

[1, 2]

Complexity Analysis

Measure	Complexity	Explanation
Time	O(W log W + Q log W)	Sorting dominates preprocessing, binary search handles each query efficiently
Space	O(W)	Stores the word frequency array

The frequency computation itself is effectively constant time because strings have maximum length 10. Therefore, the dominant cost comes from sorting the word frequencies and performing binary searches.

Sorting takes:

$$O(W \log W)$$

Each query performs one binary search:

$$O(\log W)$$

So processing all queries takes:

$$O(Q \log W)$$

The extra space is primarily the frequency array for the words.

Test Cases

from typing import List

class Solution:
    def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
        
        from bisect import bisect_right

        def frequency(s: str) -> int:
            smallest_char = min(s)
            return s.count(smallest_char)

        word_frequencies = sorted(frequency(word) for word in words)

        result = []

        for query in queries:
            query_frequency = frequency(query)
            index = bisect_right(word_frequencies, query_frequency)
            result.append(len(word_frequencies) - index)

        return result

solution = Solution()

assert solution.numSmallerByFrequency(
    ["cbd"],
    ["zaaaz"]
) == [1]  # basic example

assert solution.numSmallerByFrequency(
    ["bbb", "cc"],
    ["a", "aa", "aaa", "aaaa"]
) == [1, 2]  # multiple queries

assert solution.numSmallerByFrequency(
    ["a"],
    ["b", "cc", "ddd"]
) == [2]  # one frequency equal, two larger

assert solution.numSmallerByFrequency(
    ["aaaa"],
    ["a", "aa", "aaa"]
) == [0]  # query larger than all word frequencies

assert solution.numSmallerByFrequency(
    ["b"],
    ["aaaa"]
) == [1]  # query smaller than all word frequencies

assert solution.numSmallerByFrequency(
    ["abcd"],
    ["efgh"]
) == [0]  # equal frequencies only

assert solution.numSmallerByFrequency(
    ["zzzz"],
    ["a", "aa", "aaa", "aaaa"]
) == [0]  # same frequency as largest

assert solution.numSmallerByFrequency(
    ["a", "aa", "aaa"],
    ["aaaa"]
) == [1, 1, 1]  # repeated valid comparisons

assert solution.numSmallerByFrequency(
    ["dcce"],
    ["aaaa", "bb", "ccc"]
) == [1]  # smallest character appears multiple times

Test Summary

Test	Why
`["cbd"], ["zaaaz"]`	Validates the simplest example
`["bbb","cc"]`	Tests multiple queries with different answers
Query with equal frequencies	Ensures strict greater-than comparison
Query larger than all words	Verifies zero result handling
Query smaller than all words	Verifies full-count handling
Equal frequencies only	Ensures equality is excluded
Maximum repeated character style input	Tests repeated-character logic
Multiple queries against one word	Verifies repeated binary searches
Mixed frequency patterns	Tests smallest-character counting correctness

Edge Cases

One important edge case occurs when the query frequency is larger than every word frequency. In this situation, the binary search returns the length of the array because no larger value exists. The implementation correctly computes:

len(word_frequencies) - len(word_frequencies) = 0

which produces the expected answer.

Another important edge case is when many word frequencies are equal to the query frequency. The problem requires strictly greater frequencies, not greater-than-or-equal. Using bisect_right in Python and the > condition in Go ensures that all equal values are skipped correctly before counting larger ones.

A third edge case involves strings where every character is identical, such as "aaaa" or "bbbb". In these cases, the smallest character frequency equals the entire string length. The helper function handles this naturally because min(s) returns the repeated character and count() returns the full length.

A subtle edge case appears when the smallest character occurs multiple times but is not the first character in the string. For example:

"dcce"

The smallest character is 'c', not 'd'. A buggy implementation might incorrectly count the first character or assume sorted order. The helper function explicitly computes the minimum character before counting, which guarantees correctness regardless of character arrangement.

LeetCode 1170 - Compare Strings by Frequency of the Smallest Character

Solution

Problem Understanding

Approaches

Brute Force Approach

Optimal Approach

Approach Comparison

Algorithm Walkthrough

Why it works

Python Solution

Go Solution

Worked Examples

Example 1

Step 1: Compute word frequencies

Step 2: Process query

Example 2

Step 1: Compute word frequencies

Step 2: Process query "bbb"

Step 3: Process query "cc"

Complexity Analysis

Test Cases

Test Summary

Edge Cases

Step 2: Process query `"bbb"`

Step 3: Process query `"cc"`