LeetCode 1772 - Sort Features by Popularity

This problem asks us to determine the popularity of product features based on user survey responses. We are given a list of features where each element is a single-word feature name, and a list of responses where each element is a string of space-separated words that users…

LeetCode Problem 1772

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Sorting

Solution

Problem Understanding

This problem asks us to determine the popularity of product features based on user survey responses. We are given a list of features where each element is a single-word feature name, and a list of responses where each element is a string of space-separated words that users reported liking. The popularity of a feature is defined as the number of responses that mention that feature at least once. If a response mentions the same feature multiple times, it is counted only once for that response.

The goal is to return the features sorted in non-increasing order of popularity. If two features have the same popularity, the feature appearing earlier in the original features list should come first.

The constraints indicate that features can have up to 10,000 items, and each response can have up to 1,000 characters, but the number of responses is limited to 102. This hints that iterating through responses is relatively cheap, while iterating through the features list for every word in responses could become costly. Important edge cases include responses that contain repeated mentions of a feature, responses that mention no features, or features that are not mentioned at all.

Approaches

Brute Force Approach

A brute-force solution would iterate over every feature for every response, counting how many responses contain each feature. For each response, we could check if the feature appears as a substring in the response string. While this approach works, it is inefficient because it requires scanning each response string for every feature, leading to a time complexity of O(F * R * L) where F is the number of features, R is the number of responses, and L is the average length of a response. This approach could be slow for the upper bound of 10,000 features.

Optimal Approach

The key insight is that we do not need to repeatedly search the same response for each feature. Instead, we can tokenize each response into a set of unique words and check if any feature exists in that set. Using a hash map (dictionary) to keep track of feature popularity allows us to count efficiently. After counting, we can sort the features by popularity using a stable sort that respects the original feature order in case of ties.

This approach improves efficiency because converting a response to a set of words and checking for membership is much faster than repeated substring searches.

Approach Time Complexity Space Complexity Notes
Brute Force O(F * R * L) O(F) Checks each feature against every response string directly
Optimal O(R * W + F * log F) O(F + R * W) Uses sets for unique words in responses, hash map for counts, and stable sort

Here, W is the average number of words per response.

Algorithm Walkthrough

  1. Initialize a dictionary feature_count with keys as features and values as zero to store popularity counts.

  2. For each response in responses:

  3. Split the response string into individual words.

  4. Convert the list of words into a set words_in_response to remove duplicates.

  5. For each feature in features, check if it exists in words_in_response. If yes, increment its count in feature_count.

  6. After processing all responses, sort the features list based on two criteria:

  7. Popularity in descending order.

  8. Original index in the features list for tie-breaking.

  9. Return the sorted list.

Why it works: Each response contributes at most once to the count of each feature due to the conversion to a set. Using a dictionary ensures efficient counting. Sorting with popularity as the primary key and original index as secondary preserves the required order and guarantees correctness.

Python Solution

from typing import List

class Solution:
    def sortFeatures(self, features: List[str], responses: List[str]) -> List[str]:
        feature_count = {feature: 0 for feature in features}
        feature_index = {feature: idx for idx, feature in enumerate(features)}
        
        for response in responses:
            words_in_response = set(response.split())
            for feature in features:
                if feature in words_in_response:
                    feature_count[feature] += 1
        
        sorted_features = sorted(
            features,
            key=lambda f: (-feature_count[f], feature_index[f])
        )
        
        return sorted_features

The code initializes a count dictionary and an index dictionary for tie-breaking. Each response is split into unique words, and counts are updated efficiently. Finally, a stable sort orders features by popularity and original order.

Go Solution

package main

import (
	"sort"
	"strings"
)

func sortFeatures(features []string, responses []string) []string {
	featureCount := make(map[string]int)
	featureIndex := make(map[string]int)
	for i, feature := range features {
		featureCount[feature] = 0
		featureIndex[feature] = i
	}
	
	for _, response := range responses {
		words := strings.Fields(response)
		wordsSet := make(map[string]struct{})
		for _, word := range words {
			wordsSet[word] = struct{}{}
		}
		for _, feature := range features {
			if _, exists := wordsSet[feature]; exists {
				featureCount[feature]++
			}
		}
	}
	
	sort.SliceStable(features, func(i, j int) bool {
		if featureCount[features[i]] == featureCount[features[j]] {
			return featureIndex[features[i]] < featureIndex[features[j]]
		}
		return featureCount[features[i]] > featureCount[features[j]]
	})
	
	return features
}

In Go, we use maps for counting and storing original indices. strings.Fields conveniently splits the response into words, and sort.SliceStable guarantees that the original order is preserved for ties.

Worked Examples

Example 1:

features = ["cooler","lock","touch"]
responses = ["i like cooler cooler","lock touch cool","locker like touch"]

Step 1: Initialize counts:

feature_count = {"cooler":0, "lock":0, "touch":0}

Step 2: Process first response "i like cooler cooler":

words_in_response = {"i","like","cooler"}
feature_count["cooler"] += 1

Step 3: Process second response "lock touch cool":

words_in_response = {"lock","touch","cool"}
feature_count["lock"] += 1
feature_count["touch"] += 1

Step 4: Process third response "locker like touch":

words_in_response = {"locker","like","touch"}
feature_count["touch"] += 1

Final counts:

{"cooler":1,"lock":1,"touch":2}

Sorted by popularity: ["touch","cooler","lock"]

Example 2:

features = ["a","aa","b","c"]
responses = ["a","a aa","a a a a a","b a"]

Final counts:

{"a":4,"aa":1,"b":1,"c":0}

Sorted: ["a","aa","b","c"]

Complexity Analysis

Measure Complexity Explanation
Time O(R * W + F * log F) R is number of responses, W is avg words per response; sorting features takes F * log F
Space O(F + R * W) O(F) for count and index maps, O(W) for words set in each response

The algorithm scales well because R is small compared to F, and checking membership in a set is O(1).

Test Cases

solution = Solution()

# Example 1
assert solution.sortFeatures(["cooler","lock","touch"], ["i like cooler cooler","lock touch cool","locker like touch"]) == ["touch","cooler","lock"]

# Example 2
assert solution.sortFeatures(["a","aa","b","c"], ["a","a aa","a a a a a","b a"]) == ["a","aa","b","c"]

# Feature never mentioned
assert solution.sortFeatures(["x","y","z"], ["x y","y","x x"]) == ["x","y","z"]

# All features mentioned once
assert solution.sortFeatures(["m","n","o"], ["m","n","o"]) == ["m","n","o"]

# Multiple mentions in same response
assert solution.sortFeatures(["p","q"], ["p p p q q","p q"]) == ["p","q"]

# No responses
assert solution.sortFeatures(["alpha","beta"], []) == ["alpha","beta"]

# Single feature multiple responses
assert solution.sortFeatures(["feat"], ["feat feat","feat"]) == ["feat"]
Test Why
Features appear multiple times in responses Ensures duplicate mentions in a single response are counted once
Feature never mentioned Validates correct handling of zero-popularity features
No responses Checks handling of empty responses list
All features mentioned once Confirms stable ordering when counts are equal
Single feature multiple responses Ensures counts are aggregated across responses

Edge Cases

1. Feature never appears in responses: Features with zero counts should appear at the end of the sorted list. The implementation handles this because their count remains zero, and the stable sort respects original order.

2. Responses contain repeated words: Multiple occurrences of a feature in a single response should only count once. Converting each response to a set ensures duplicates