LeetCode 1772 - Sort Features by Popularity
This problem asks us to determine the popularity of product features based on user survey responses. We are given a list of features where each element is a single-word feature name, and a list of responses where each element is a string of space-separated words that users…
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Sorting
Solution
Problem Understanding
This problem asks us to determine the popularity of product features based on user survey responses. We are given a list of features where each element is a single-word feature name, and a list of responses where each element is a string of space-separated words that users reported liking. The popularity of a feature is defined as the number of responses that mention that feature at least once. If a response mentions the same feature multiple times, it is counted only once for that response.
The goal is to return the features sorted in non-increasing order of popularity. If two features have the same popularity, the feature appearing earlier in the original features list should come first.
The constraints indicate that features can have up to 10,000 items, and each response can have up to 1,000 characters, but the number of responses is limited to 102. This hints that iterating through responses is relatively cheap, while iterating through the features list for every word in responses could become costly. Important edge cases include responses that contain repeated mentions of a feature, responses that mention no features, or features that are not mentioned at all.
Approaches
Brute Force Approach
A brute-force solution would iterate over every feature for every response, counting how many responses contain each feature. For each response, we could check if the feature appears as a substring in the response string. While this approach works, it is inefficient because it requires scanning each response string for every feature, leading to a time complexity of O(F * R * L) where F is the number of features, R is the number of responses, and L is the average length of a response. This approach could be slow for the upper bound of 10,000 features.
Optimal Approach
The key insight is that we do not need to repeatedly search the same response for each feature. Instead, we can tokenize each response into a set of unique words and check if any feature exists in that set. Using a hash map (dictionary) to keep track of feature popularity allows us to count efficiently. After counting, we can sort the features by popularity using a stable sort that respects the original feature order in case of ties.
This approach improves efficiency because converting a response to a set of words and checking for membership is much faster than repeated substring searches.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(F * R * L) | O(F) | Checks each feature against every response string directly |
| Optimal | O(R * W + F * log F) | O(F + R * W) | Uses sets for unique words in responses, hash map for counts, and stable sort |
Here, W is the average number of words per response.
Algorithm Walkthrough
-
Initialize a dictionary
feature_countwith keys as features and values as zero to store popularity counts. -
For each response in
responses: -
Split the response string into individual words.
-
Convert the list of words into a set
words_in_responseto remove duplicates. -
For each feature in
features, check if it exists inwords_in_response. If yes, increment its count infeature_count. -
After processing all responses, sort the
featureslist based on two criteria: -
Popularity in descending order.
-
Original index in the
featureslist for tie-breaking. -
Return the sorted list.
Why it works: Each response contributes at most once to the count of each feature due to the conversion to a set. Using a dictionary ensures efficient counting. Sorting with popularity as the primary key and original index as secondary preserves the required order and guarantees correctness.
Python Solution
from typing import List
class Solution:
def sortFeatures(self, features: List[str], responses: List[str]) -> List[str]:
feature_count = {feature: 0 for feature in features}
feature_index = {feature: idx for idx, feature in enumerate(features)}
for response in responses:
words_in_response = set(response.split())
for feature in features:
if feature in words_in_response:
feature_count[feature] += 1
sorted_features = sorted(
features,
key=lambda f: (-feature_count[f], feature_index[f])
)
return sorted_features
The code initializes a count dictionary and an index dictionary for tie-breaking. Each response is split into unique words, and counts are updated efficiently. Finally, a stable sort orders features by popularity and original order.
Go Solution
package main
import (
"sort"
"strings"
)
func sortFeatures(features []string, responses []string) []string {
featureCount := make(map[string]int)
featureIndex := make(map[string]int)
for i, feature := range features {
featureCount[feature] = 0
featureIndex[feature] = i
}
for _, response := range responses {
words := strings.Fields(response)
wordsSet := make(map[string]struct{})
for _, word := range words {
wordsSet[word] = struct{}{}
}
for _, feature := range features {
if _, exists := wordsSet[feature]; exists {
featureCount[feature]++
}
}
}
sort.SliceStable(features, func(i, j int) bool {
if featureCount[features[i]] == featureCount[features[j]] {
return featureIndex[features[i]] < featureIndex[features[j]]
}
return featureCount[features[i]] > featureCount[features[j]]
})
return features
}
In Go, we use maps for counting and storing original indices. strings.Fields conveniently splits the response into words, and sort.SliceStable guarantees that the original order is preserved for ties.
Worked Examples
Example 1:
features = ["cooler","lock","touch"]
responses = ["i like cooler cooler","lock touch cool","locker like touch"]
Step 1: Initialize counts:
feature_count = {"cooler":0, "lock":0, "touch":0}
Step 2: Process first response "i like cooler cooler":
words_in_response = {"i","like","cooler"}
feature_count["cooler"] += 1
Step 3: Process second response "lock touch cool":
words_in_response = {"lock","touch","cool"}
feature_count["lock"] += 1
feature_count["touch"] += 1
Step 4: Process third response "locker like touch":
words_in_response = {"locker","like","touch"}
feature_count["touch"] += 1
Final counts:
{"cooler":1,"lock":1,"touch":2}
Sorted by popularity: ["touch","cooler","lock"]
Example 2:
features = ["a","aa","b","c"]
responses = ["a","a aa","a a a a a","b a"]
Final counts:
{"a":4,"aa":1,"b":1,"c":0}
Sorted: ["a","aa","b","c"]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(R * W + F * log F) | R is number of responses, W is avg words per response; sorting features takes F * log F |
| Space | O(F + R * W) | O(F) for count and index maps, O(W) for words set in each response |
The algorithm scales well because R is small compared to F, and checking membership in a set is O(1).
Test Cases
solution = Solution()
# Example 1
assert solution.sortFeatures(["cooler","lock","touch"], ["i like cooler cooler","lock touch cool","locker like touch"]) == ["touch","cooler","lock"]
# Example 2
assert solution.sortFeatures(["a","aa","b","c"], ["a","a aa","a a a a a","b a"]) == ["a","aa","b","c"]
# Feature never mentioned
assert solution.sortFeatures(["x","y","z"], ["x y","y","x x"]) == ["x","y","z"]
# All features mentioned once
assert solution.sortFeatures(["m","n","o"], ["m","n","o"]) == ["m","n","o"]
# Multiple mentions in same response
assert solution.sortFeatures(["p","q"], ["p p p q q","p q"]) == ["p","q"]
# No responses
assert solution.sortFeatures(["alpha","beta"], []) == ["alpha","beta"]
# Single feature multiple responses
assert solution.sortFeatures(["feat"], ["feat feat","feat"]) == ["feat"]
| Test | Why |
|---|---|
| Features appear multiple times in responses | Ensures duplicate mentions in a single response are counted once |
| Feature never mentioned | Validates correct handling of zero-popularity features |
| No responses | Checks handling of empty responses list |
| All features mentioned once | Confirms stable ordering when counts are equal |
| Single feature multiple responses | Ensures counts are aggregated across responses |
Edge Cases
1. Feature never appears in responses: Features with zero counts should appear at the end of the sorted list. The implementation handles this because their count remains zero, and the stable sort respects original order.
2. Responses contain repeated words: Multiple occurrences of a feature in a single response should only count once. Converting each response to a set ensures duplicates