LeetCode 2135 - Count Words Obtained After Adding a Letter
The problem requires us to determine how many strings in targetWords can be formed from strings in startWords through a specific transformation.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Bit Manipulation, Sorting
Solution
Problem Understanding
The problem requires us to determine how many strings in targetWords can be formed from strings in startWords through a specific transformation. The transformation involves two operations: appending a single letter not already present in the string, and then rearranging all letters arbitrarily. Essentially, we are checking whether, by adding one new character and permuting the result, a target word can be produced from some starting word.
The input consists of two arrays of strings: startWords and targetWords, each containing lowercase English letters only, with no duplicates within a string. The output is a single integer representing the number of target words that can be obtained from any start word.
Constraints are important: there can be up to 50,000 words in each array, and each string has at most 26 letters. This suggests that a brute-force solution that checks every permutation is infeasible due to combinatorial explosion. Also, the unique-letter guarantee simplifies the problem because we do not need to consider repeated letters when performing operations.
Key edge cases include empty strings (although the constraints guarantee length ≥1), single-letter strings, and cases where the transformation is impossible due to overlapping letters.
Approaches
The brute-force approach would attempt to simulate the operation for each startWord by generating all possible strings obtained by adding one new letter and permuting the result, then checking if any match a targetWord. This approach is correct but inefficient, as generating all permutations of a string is factorial in length, which is prohibitively expensive for strings of length up to 26 and arrays of size 50,000.
The optimal approach leverages the fact that each string contains unique letters and uses bit masking to represent sets of letters efficiently. Each string can be encoded as a 26-bit integer where each bit represents the presence of a letter. For startWords, we store all bitmasks in a set. For each targetWord, we generate bitmasks by removing one letter at a time (since the operation involves adding a letter to get the target), and then check if the resulting mask exists in the start set. This is efficient because checking membership in a set is O(1), and generating up to 26 masks per target word is feasible.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(S * T * 26!) | O(S * 26!) | Generate all possible permutations after adding a letter |
| Optimal | O(S + 26 * T) | O(S) | Use bitmask representation for efficient checking |
Algorithm Walkthrough
- Initialize an empty set
start_masksto store bitmask representations of allstartWords. - For each word in
startWords, convert it into a bitmask where each bit represents the presence of a letter. Add this mask tostart_masks. - Initialize a counter
countto zero. This will track the number oftargetWordsthat can be obtained. - For each word in
targetWords, convert it into a bitmask representation. - For each letter in the
targetWord, generate a new mask by removing that letter (i.e., subtract the bit corresponding to that letter from the target mask). This represents a possiblestartWordthat could generate the target. - If any of these generated masks exists in
start_masks, incrementcountand break to the next target word. - After processing all target words, return
count.
Why it works: Each target word can only be obtained by adding one letter to a start word. By reversing the operation (removing one letter), we efficiently check all possible start word candidates using bitmasking. Since the mapping is unique and checking is O(1), this guarantees correctness.
Python Solution
from typing import List
class Solution:
def wordCount(self, startWords: List[str], targetWords: List[str]) -> int:
def word_to_mask(word: str) -> int:
mask = 0
for ch in word:
mask |= 1 << (ord(ch) - ord('a'))
return mask
start_masks = {word_to_mask(word) for word in startWords}
count = 0
for word in targetWords:
target_mask = word_to_mask(word)
for ch in word:
possible_start = target_mask & ~(1 << (ord(ch) - ord('a')))
if possible_start in start_masks:
count += 1
break
return count
The function word_to_mask converts a word into a bitmask, efficiently encoding all letters. start_masks stores all starting word masks for quick lookup. For each target word, removing each letter and checking membership ensures we only count valid transformations, avoiding unnecessary computation.
Go Solution
func wordCount(startWords []string, targetWords []string) int {
wordToMask := func(word string) int {
mask := 0
for i := 0; i < len(word); i++ {
mask |= 1 << (word[i] - 'a')
}
return mask
}
startMasks := make(map[int]struct{})
for _, word := range startWords {
startMasks[wordToMask(word)] = struct{}{}
}
count := 0
for _, word := range targetWords {
targetMask := wordToMask(word)
for i := 0; i < len(word); i++ {
possibleStart := targetMask & ^(1 << (word[i] - 'a'))
if _, exists := startMasks[possibleStart]; exists {
count++
break
}
}
}
return count
}
In Go, the bitmask conversion and map usage are slightly different syntactically. We use an empty struct as the map value to minimize memory. Logic remains identical to the Python version.
Worked Examples
Example 1: startWords = ["ant","act","tack"], targetWords = ["tack","act","acti"]
- Convert
startWordsto masks:ant -> 0b111001,act -> 0b1011,tack -> 0b11101 - Target "tack" mask: 0b11101. Removing letters:
- Remove 't' -> 0b101 -> not in start
- Remove 'a' -> 0b1101 -> not in start
- Remove 'c' -> 0b10101 -> not in start
- Remove 'k' -> 0b1011 -> matches "act" → count += 1
- Target "act" mask: cannot remove a letter to match any start → skip
- Target "acti" mask: 0b11011. Removing letters:
- Remove 'i' -> 0b1011 → matches "act" → count += 1
Result = 2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(S + 26 * T) | Convert each start word to mask O(S) and for each target word, generate up to 26 masks and check O(1) membership |
| Space | O(S) | Store all start word masks |
This complexity is feasible given constraints, as S and T can be up to 50,000.
Test Cases
# Provided examples
assert Solution().wordCount(["ant","act","tack"], ["tack","act","acti"]) == 2
assert Solution().wordCount(["ab","a"], ["abc","abcd"]) == 1
# Edge cases
assert Solution().wordCount(["a"], ["ab"]) == 1 # smallest strings
assert Solution().wordCount(["a"], ["a"]) == 0 # cannot add a letter to itself
assert Solution().wordCount(["xyz"], ["wxyz"]) == 1 # adding a missing letter
assert Solution().wordCount(["abc"], ["def"]) == 0 # impossible conversion
| Test | Why |
|---|---|
| ["ant","act","tack"], ["tack","act","acti"] | Standard example with multiple matches |
| ["ab","a"], ["abc","abcd"] | Only first target is achievable |
| ["a"], ["ab"] | Smallest strings, single addition |
| ["a"], ["a"] | Cannot add a letter that already exists |
| ["xyz"], ["wxyz"] | Adding a missing letter to form target |
| ["abc"], ["def"] | Impossible transformation |
Edge Cases
Single-letter start and target: When startWords or targetWords have only one letter, adding any letter is trivial, but we must ensure we do not attempt to "add" the same letter, which is forbidden. The implementation correctly uses bitmasking and avoids this.
Target identical to start: If a target word is already in startWords, it cannot be counted, as the operation requires adding a letter. The algorithm handles this by generating masks with one letter removed; since removal of a non-existent letter does not create the same mask, the target is correctly skipped.
Maximum size inputs: With 50,000 words and up to 26 letters, brute-force permutation approaches fail. The bitmask approach remains efficient because it reduces string comparison to integer comparison and limits iterations to 26 per target. This ensures scalability.
This solution is robust, efficient, and handles all edge cases dictated by the problem constraints.