LeetCode 916 - Word Subsets

The problem asks us to determine which strings in words1 satisfy all character requirements imposed by every string in words2. A string b is considered a subset of another string a if every character in b appears in a at least as many times as it appears in b.

LeetCode Problem 916

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String

Solution

Problem Understanding

The problem asks us to determine which strings in words1 satisfy all character requirements imposed by every string in words2.

A string b is considered a subset of another string a if every character in b appears in a at least as many times as it appears in b. This includes repeated characters. For example, "eoo" requires one 'e' and two 'o' characters.

For each word in words1, we must check whether it satisfies every constraint from every word in words2. If it does, it is called a universal word and should be included in the result.

The input consists of:

  • words1, the list of candidate words we want to test
  • words2, the list of required character patterns

The output is a list containing all universal words from words1.

The constraints are important:

  • Both arrays can contain up to 10^4 words
  • Each word length is at most 10
  • Only lowercase English letters are used

The short word length is significant because it means character frequency counting is cheap. However, the number of words can be large, so repeatedly comparing every pair of words would become expensive.

A naive implementation could easily become too slow if it checks every word in words1 against every word in words2 independently.

There are several important edge cases:

  • words2 may contain overlapping requirements such as "e" and "ee"
  • Multiple words in words2 may require different characters
  • A required word may contain repeated letters such as "ccc"
  • A candidate word may contain extra characters, which should not matter
  • Very short words, including single-character strings, must still work correctly

The problem guarantees that all strings in words1 are unique, which simplifies result handling because duplicates do not need special treatment.

Approaches

Brute Force Approach

The most direct solution is to test every word in words1 against every word in words2.

For each candidate word:

  1. Build a frequency count of its characters
  2. For every word in words2:
  • Build another frequency count
  • Verify that the candidate contains each required character with sufficient multiplicity

If the candidate passes every check, add it to the result.

This approach is correct because it explicitly validates every requirement independently. However, it repeats a large amount of work. If several words in words2 require the same characters, those requirements are recalculated over and over again.

With up to 10^4 words in each array, this repeated checking becomes inefficient.

Key Insight for the Optimal Solution

The critical observation is that all words in words2 can be merged into a single combined requirement.

Suppose words2 = ["e", "oo"].

A universal word only needs:

  • at least one 'e'
  • at least two 'o'

There is no need to separately check both words once we know the maximum required count for every character.

Similarly:

  • "e" requires e:1
  • "ee" requires e:2

The true combined requirement is simply e:2.

Therefore, instead of checking every word in words2 independently, we can compute one global frequency requirement array where each character stores the maximum frequency required across all words in words2.

Then every candidate word in words1 only needs to be checked once against this combined requirement.

This dramatically reduces redundant work.

Approach Time Complexity Space Complexity Notes
Brute Force O(n × m × 26) O(26) Rechecks every constraint repeatedly
Optimal O((n + m) × 26) O(26) Merges all requirements into one frequency profile

Here:

  • n = total characters across words1
  • m = total characters across words2

Since each word length is at most 10, the complexity is effectively linear in the number of words.

Algorithm Walkthrough

  1. Create a frequency array representing the maximum required count for each character.

We use an array of size 26 because the input only contains lowercase English letters. Index 0 corresponds to 'a', index 1 to 'b', and so on. 2. Iterate through every word in words2.

For each word, compute its character frequency count. 3. Merge the current word's frequencies into the global requirement array.

For every character:

  • keep the maximum count seen so far
  • this represents the strongest requirement among all words

For example:

  • "e" gives e:1
  • "ee" gives e:2

The merged requirement becomes e:2. 4. Iterate through every word in words1.

For each candidate word:

  • compute its character frequency count
  • compare it against the global requirement array
  1. Verify whether the candidate satisfies every requirement.

For every character:

  • if the candidate frequency is smaller than the required frequency, reject the word immediately
  1. If all requirements are satisfied, append the word to the result list.
  2. Return the final result list.

Why it works

The algorithm works because the merged requirement array captures the strongest requirement for every character across all words in words2.

If a candidate word satisfies these maximum requirements, then it automatically satisfies every individual word in words2.

This is sufficient because subset relationships are determined independently for each character frequency.

Python Solution

from typing import List

class Solution:
    def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
        def count_chars(word: str) -> List[int]:
            counts = [0] * 26
            
            for ch in word:
                counts[ord(ch) - ord('a')] += 1
            
            return counts

        required = [0] * 26

        # Build the maximum requirement from words2
        for word in words2:
            word_counts = count_chars(word)

            for i in range(26):
                required[i] = max(required[i], word_counts[i])

        result = []

        # Check each word in words1
        for word in words1:
            word_counts = count_chars(word)

            is_universal = True

            for i in range(26):
                if word_counts[i] < required[i]:
                    is_universal = False
                    break

            if is_universal:
                result.append(word)

        return result

The implementation begins with a helper function named count_chars, which converts a word into a frequency array of size 26. This representation allows constant-time access to character counts.

The required array stores the merged requirements from all words in words2. For every word in words2, we compute its frequency count and update each character position with the maximum value seen so far.

After preprocessing the requirements, we iterate through each candidate word in words1. Its frequency array is compared against the global requirement array.

If the candidate lacks enough occurrences of any character, it is rejected immediately using an early break. Otherwise, it is added to the result list.

The final result contains all universal words.

Go Solution

func wordSubsets(words1 []string, words2 []string) []string {
	countChars := func(word string) [26]int {
		var counts [26]int

		for _, ch := range word {
			counts[ch-'a']++
		}

		return counts
	}

	var required [26]int

	// Build maximum requirement from words2
	for _, word := range words2 {
		wordCounts := countChars(word)

		for i := 0; i < 26; i++ {
			if wordCounts[i] > required[i] {
				required[i] = wordCounts[i]
			}
		}
	}

	result := []string{}

	// Check each word in words1
	for _, word := range words1 {
		wordCounts := countChars(word)

		isUniversal := true

		for i := 0; i < 26; i++ {
			if wordCounts[i] < required[i] {
				isUniversal = false
				break
			}
		}

		if isUniversal {
			result = append(result, word)
		}
	}

	return result
}

The Go solution closely mirrors the Python implementation. One notable difference is the use of fixed-size arrays [26]int instead of slices for character frequencies. Since the alphabet size is constant, fixed arrays are efficient and avoid additional allocations.

Go strings are iterated using range, which produces runes. Because the input is guaranteed to contain only lowercase English letters, subtracting 'a' safely maps characters into indices from 0 to 25.

The result slice starts empty and grows dynamically with append.

Worked Examples

Example 1

Input:

words1 = ["amazon","apple","facebook","google","leetcode"]
words2 = ["e","o"]

First, build the combined requirement.

Word Frequency Requirement
"e" e:1
"o" o:1

Merged requirement:

Character Required Count
e 1
o 1

Now evaluate each word.

Word Contains e? Contains o? Universal?
amazon No Yes No
apple Yes No No
facebook Yes Yes Yes
google Yes Yes Yes
leetcode Yes Yes Yes

Final result:

["facebook","google","leetcode"]

Example 2

Input:

words1 = ["amazon","apple","facebook","google","leetcode"]
words2 = ["lc","eo"]

Frequency requirements:

Word Requirement
"lc" l:1, c:1
"eo" e:1, o:1

Merged requirement:

Character Required Count
l 1
c 1
e 1
o 1

Check each candidate.

Word Meets Requirement?
amazon Missing l and c
apple Missing c and o
facebook Missing l
google Missing c
leetcode Yes

Final result:

["leetcode"]

Example 3

Input:

words1 = ["acaac","cccbb","aacbb","caacc","bcbbb"]
words2 = ["c","cc","b"]

Requirements:

Word Requirement
"c" c:1
"cc" c:2
"b" b:1

Merged requirement:

Character Required Count
c 2
b 1

Check candidates.

Word c Count b Count Universal?
acaac 2 0 No
cccbb 3 2 Yes
aacbb 1 2 No
caacc 3 0 No
bcbbb 1 4 No

Final result:

["cccbb"]

Complexity Analysis

Measure Complexity Explanation
Time O((N + M) × 26) Each word is processed once with fixed-size frequency comparisons
Space O(26) Only constant-sized frequency arrays are used

Here:

  • N is the total number of characters across words1
  • M is the total number of characters across words2

Because the alphabet size is fixed at 26 lowercase letters, all frequency operations are effectively constant time. The algorithm scales linearly with the total input size.

Test Cases

from typing import List

class Solution:
    def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
        def count_chars(word: str) -> List[int]:
            counts = [0] * 26

            for ch in word:
                counts[ord(ch) - ord('a')] += 1

            return counts

        required = [0] * 26

        for word in words2:
            word_counts = count_chars(word)

            for i in range(26):
                required[i] = max(required[i], word_counts[i])

        result = []

        for word in words1:
            word_counts = count_chars(word)

            valid = True

            for i in range(26):
                if word_counts[i] < required[i]:
                    valid = False
                    break

            if valid:
                result.append(word)

        return result

solution = Solution()

# Example 1
assert solution.wordSubsets(
    ["amazon","apple","facebook","google","leetcode"],
    ["e","o"]
) == ["facebook","google","leetcode"]  # basic multiple requirements

# Example 2
assert solution.wordSubsets(
    ["amazon","apple","facebook","google","leetcode"],
    ["lc","eo"]
) == ["leetcode"]  # combined character requirements

# Example 3
assert solution.wordSubsets(
    ["acaac","cccbb","aacbb","caacc","bcbbb"],
    ["c","cc","b"]
) == ["cccbb"]  # repeated character requirement

# Single character exact match
assert solution.wordSubsets(
    ["a", "b", "c"],
    ["a"]
) == ["a"]  # smallest valid input

# Multiple repeated letters
assert solution.wordSubsets(
    ["aabbcc", "abc", "abcc"],
    ["aabc"]
) == ["aabbcc"]  # multiplicity matters

# No valid universal words
assert solution.wordSubsets(
    ["abc", "def"],
    ["zzz"]
) == []  # impossible requirement

# All words valid
assert solution.wordSubsets(
    ["abc", "abcd", "abcde"],
    ["a"]
) == ["abc", "abcd", "abcde"]  # every word satisfies condition

# Requirement merging test
assert solution.wordSubsets(
    ["eeeeoo", "eeo", "eo"],
    ["e", "ee", "ooo"]
) == []  # merged max counts eliminate all

# Complex overlap
assert solution.wordSubsets(
    ["leetcode", "google", "facebook"],
    ["ec", "oc", "ceo"]
) == ["leetcode"]  # overlapping requirements
Test Why
Basic multiple requirements Verifies independent character requirements
Combined character requirements Ensures merged constraints work
Repeated character requirement Confirms multiplicity handling
Smallest valid input Tests minimum boundary
Multiplicity matters Ensures repeated letters are counted
Impossible requirement Verifies empty result handling
Every word valid Confirms broad acceptance
Requirement merging Tests maximum-frequency merging logic
Complex overlap Validates overlapping constraints

Edge Cases

One important edge case occurs when words2 contains repeated requirements for the same character, such as ["e", "ee", "eee"]. A naive implementation might incorrectly sum these counts and require six 'e' characters. The correct behavior is to require only the maximum count, which is three. The implementation handles this correctly by storing the maximum frequency for each character.

Another important case involves repeated letters inside a single word, such as "ccc" or "aabb". A buggy implementation might only check whether a character exists rather than whether it appears enough times. This solution uses full frequency arrays, so multiplicity is handled precisely.

A third edge case occurs when no words satisfy the constraints. For example, if words2 = ["zzz"] and no candidate contains three 'z' characters, the result should be an empty list. The implementation naturally handles this because every candidate fails the frequency comparison and nothing is appended to the result.

A subtle case involves words that contain many extra characters. Extra characters should not invalidate a candidate. For example, "facebook" is still valid for the requirement "eo" even though it contains many unrelated letters. The algorithm only checks lower bounds on required frequencies, so extra characters are ignored correctly.

Finally, single-character inputs are important boundary cases. Arrays containing words like "a" or "b" verify that indexing logic and frequency counting work correctly even at minimal sizes.