LeetCode 472 - Concatenated Words
The problem asks us to identify all words in a given list that can be formed by concatenating at least two other words from the same list. In other words, each concatenated word must be fully composed of smaller words that exist in the input array.
Difficulty: 🔴 Hard
Topics: Array, String, Dynamic Programming, Depth-First Search, Trie, Sorting
Solution
Problem Understanding
The problem asks us to identify all words in a given list that can be formed by concatenating at least two other words from the same list. In other words, each concatenated word must be fully composed of smaller words that exist in the input array. The input is a list of unique lowercase strings, and the output is a list of strings that meet the concatenation criterion.
The constraints tell us that the total number of words can be up to 10,000 and the total combined length of all words can be up to 100,000 characters. Each individual word is relatively short (up to 30 characters). This suggests that an efficient solution must avoid excessive repeated substring checks and leverage fast lookups of words.
Edge cases include words that are themselves very short, words that cannot possibly be concatenated because no smaller words exist in the list, and words that are substrings of themselves. The guarantee that words are unique prevents duplicates but does not prevent words from being prefixes of other words.
Approaches
The brute-force approach involves checking, for each word, whether it can be formed by concatenating two or more other words in the list. This can be done by trying every possible split of the word into two parts and recursively checking if each part is a valid concatenated word. While correct, this approach quickly becomes prohibitively slow because the number of substring splits grows exponentially with the length of the word.
The optimal approach uses a combination of a hash set for fast word lookups and dynamic programming to avoid redundant computations. For each word, we maintain a boolean array where dp[i] indicates whether the substring up to index i can be formed by concatenating words in the set. By checking all possible splits in a systematic way and reusing results from smaller substrings, we reduce redundant calculations and achieve much better performance. Additionally, sorting words by length allows us to only attempt forming longer words from smaller words, which naturally enforces the “at least two words” requirement.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * 2^L) | O(L) | Check all splits recursively for each word of length L, too slow for constraints |
| Optimal | O(N * L^2) | O(N + L) | Use hash set for fast lookups and DP for substring concatenation |
Algorithm Walkthrough
- Sort the words by length. This ensures that when evaluating a word, all potential smaller components have already been considered.
- Initialize a hash set to store all words that we have seen so far. This allows constant-time lookup to check if a substring is a valid word.
- Iterate through each word in the sorted list.
- For each word, use a dynamic programming array
dpof lengthlen(word) + 1, wheredp[i]represents whether the substringword[0:i]can be formed by concatenating smaller words. - Set
dp[0] = Truebecause the empty string is trivially valid. - For each index
ifrom 1 tolen(word), check every indexjfrom 0 toi. Ifdp[j]is true andword[j:i]exists in the set, setdp[i] = Trueand break. - If
dp[len(word)]is True and the word length is nonzero, add the word to the result list. - Add the current word to the hash set so it can be used for forming longer words.
Why it works: The DP array ensures that every possible substring combination is checked exactly once. By iterating through smaller words first, we guarantee that a word is only considered a concatenated word if it can be composed entirely of previously seen words. This method avoids counting the word itself as a component and satisfies the “at least two words” constraint.
Python Solution
from typing import List
class Solution:
def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
words.sort(key=len)
word_set = set()
result = []
for word in words:
if not word:
continue
dp = [False] * (len(word) + 1)
dp[0] = True
for i in range(1, len(word) + 1):
for j in range(i):
if dp[j] and word[j:i] in word_set:
dp[i] = True
break
if dp[len(word)]:
result.append(word)
word_set.add(word)
return result
This implementation first sorts the words to ensure smaller components are processed first. The DP array is initialized for each word, and the nested loop efficiently determines whether a substring can be formed from previously seen words. Adding the current word to the set after processing ensures it can contribute to forming longer words but does not mistakenly form itself.
Go Solution
func findAllConcatenatedWordsInADict(words []string) []string {
sort.Slice(words, func(i, j int) bool {
return len(words[i]) < len(words[j])
})
wordSet := make(map[string]bool)
result := []string{}
for _, word := range words {
if len(word) == 0 {
continue
}
dp := make([]bool, len(word)+1)
dp[0] = true
for i := 1; i <= len(word); i++ {
for j := 0; j < i; j++ {
if dp[j] && wordSet[word[j:i]] {
dp[i] = true
break
}
}
}
if dp[len(word)] {
result = append(result, word)
}
wordSet[word] = true
}
return result
}
In Go, slices are used for the DP array and maps for word lookup. Sorting is handled with sort.Slice. The logic mirrors the Python version, with careful handling of slice indexing and map lookups. Adding a word to the map after processing ensures it is available for concatenation in subsequent iterations.
Worked Examples
Example 1: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]
Sorting by length: ["cat","dog","rat","cats","catsdogcats","dogcatsdog","ratcatdogcat","hippopotamuses"]
| Word | DP array after processing | Result added? |
|---|---|---|
| cat | [T,F,F,F] | No |
| dog | [T,F,F,F] | No |
| rat | [T,F,F,F] | No |
| cats | [T,F,F,F,F] | No |
| catsdogcats | [T,F,F,F,T,F,F,F,T,F,F,F,T,F,F,F] | Yes |
| dogcatsdog | similar DP | Yes |
| ratcatdogcat | similar DP | Yes |
| hippopotamuses | DP never reaches True at end | No |
Example 2: ["cat","dog","catdog"]
| Word | DP array | Result added? |
|---|---|---|
| cat | [T,F,F,F] | No |
| dog | [T,F,F,F] | No |
| catdog | [T,F,F,F,T,F,F,F,F,F,F,F] | Yes |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N * L^2) | Sorting takes O(N log N), DP nested loops take O(L^2) for each word of max length L, across N words |
| Space | O(N + L) | O(N) for word set, O(L) for DP array |
Sorting ensures we process smaller words first. DP ensures we check all substring splits efficiently. The hash set allows constant-time substring lookups.
Test Cases
# Provided examples
assert Solution().findAllConcatenatedWordsInADict(["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]) == ["catsdogcats","dogcatsdog","ratcatdogcat"]
assert Solution().findAllConcatenatedWordsInADict(["cat","dog","catdog"]) == ["catdog"]
# Edge cases
assert Solution().findAllConcatenatedWordsInADict([""]) == []
assert Solution().findAllConcatenatedWordsInADict(["a","b","ab","abc"]) == ["ab"]
assert Solution().findAllConcatenatedWordsInADict(["a","aa","aaa"]) == ["aa","aaa"]
assert Solution().findAllConcatenatedWordsInADict(["abcd","abc","def","abcdef"]) == ["abcdef"]
| Test | Why |
|---|---|
| empty string | validates handling of zero-length words |
| single letter words forming a concatenation | checks DP logic with minimal length |
| repeated letters forming multiple concatenations | tests multiple valid splits |
| word composed of two smaller words | confirms proper concatenation detection |
Edge Cases
One edge case is an empty string in the input. Our implementation skips empty words, preventing false positives or out-of-bound errors.
Another important case is when a word can be formed by repeating a single smaller word multiple times, like ["a","aa","aaa"]. The DP approach correctly identifies concatenations even if the same word is used more than once, since the algorithm allows repeated use of words in forming longer words.
A third edge case is words that are prefixes