LeetCode 472 - Concatenated Words

The problem asks us to identify all words in a given list that can be formed by concatenating at least two other words from the same list. In other words, each concatenated word must be fully composed of smaller words that exist in the input array.

LeetCode Problem 472

Difficulty: 🔴 Hard
Topics: Array, String, Dynamic Programming, Depth-First Search, Trie, Sorting

Solution

Problem Understanding

The problem asks us to identify all words in a given list that can be formed by concatenating at least two other words from the same list. In other words, each concatenated word must be fully composed of smaller words that exist in the input array. The input is a list of unique lowercase strings, and the output is a list of strings that meet the concatenation criterion.

The constraints tell us that the total number of words can be up to 10,000 and the total combined length of all words can be up to 100,000 characters. Each individual word is relatively short (up to 30 characters). This suggests that an efficient solution must avoid excessive repeated substring checks and leverage fast lookups of words.

Edge cases include words that are themselves very short, words that cannot possibly be concatenated because no smaller words exist in the list, and words that are substrings of themselves. The guarantee that words are unique prevents duplicates but does not prevent words from being prefixes of other words.

Approaches

The brute-force approach involves checking, for each word, whether it can be formed by concatenating two or more other words in the list. This can be done by trying every possible split of the word into two parts and recursively checking if each part is a valid concatenated word. While correct, this approach quickly becomes prohibitively slow because the number of substring splits grows exponentially with the length of the word.

The optimal approach uses a combination of a hash set for fast word lookups and dynamic programming to avoid redundant computations. For each word, we maintain a boolean array where dp[i] indicates whether the substring up to index i can be formed by concatenating words in the set. By checking all possible splits in a systematic way and reusing results from smaller substrings, we reduce redundant calculations and achieve much better performance. Additionally, sorting words by length allows us to only attempt forming longer words from smaller words, which naturally enforces the “at least two words” requirement.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * 2^L) O(L) Check all splits recursively for each word of length L, too slow for constraints
Optimal O(N * L^2) O(N + L) Use hash set for fast lookups and DP for substring concatenation

Algorithm Walkthrough

  1. Sort the words by length. This ensures that when evaluating a word, all potential smaller components have already been considered.
  2. Initialize a hash set to store all words that we have seen so far. This allows constant-time lookup to check if a substring is a valid word.
  3. Iterate through each word in the sorted list.
  4. For each word, use a dynamic programming array dp of length len(word) + 1, where dp[i] represents whether the substring word[0:i] can be formed by concatenating smaller words.
  5. Set dp[0] = True because the empty string is trivially valid.
  6. For each index i from 1 to len(word), check every index j from 0 to i. If dp[j] is true and word[j:i] exists in the set, set dp[i] = True and break.
  7. If dp[len(word)] is True and the word length is nonzero, add the word to the result list.
  8. Add the current word to the hash set so it can be used for forming longer words.

Why it works: The DP array ensures that every possible substring combination is checked exactly once. By iterating through smaller words first, we guarantee that a word is only considered a concatenated word if it can be composed entirely of previously seen words. This method avoids counting the word itself as a component and satisfies the “at least two words” constraint.

Python Solution

from typing import List

class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        words.sort(key=len)
        word_set = set()
        result = []
        
        for word in words:
            if not word:
                continue
            dp = [False] * (len(word) + 1)
            dp[0] = True
            for i in range(1, len(word) + 1):
                for j in range(i):
                    if dp[j] and word[j:i] in word_set:
                        dp[i] = True
                        break
            if dp[len(word)]:
                result.append(word)
            word_set.add(word)
        
        return result

This implementation first sorts the words to ensure smaller components are processed first. The DP array is initialized for each word, and the nested loop efficiently determines whether a substring can be formed from previously seen words. Adding the current word to the set after processing ensures it can contribute to forming longer words but does not mistakenly form itself.

Go Solution

func findAllConcatenatedWordsInADict(words []string) []string {
    sort.Slice(words, func(i, j int) bool {
        return len(words[i]) < len(words[j])
    })
    
    wordSet := make(map[string]bool)
    result := []string{}
    
    for _, word := range words {
        if len(word) == 0 {
            continue
        }
        dp := make([]bool, len(word)+1)
        dp[0] = true
        for i := 1; i <= len(word); i++ {
            for j := 0; j < i; j++ {
                if dp[j] && wordSet[word[j:i]] {
                    dp[i] = true
                    break
                }
            }
        }
        if dp[len(word)] {
            result = append(result, word)
        }
        wordSet[word] = true
    }
    
    return result
}

In Go, slices are used for the DP array and maps for word lookup. Sorting is handled with sort.Slice. The logic mirrors the Python version, with careful handling of slice indexing and map lookups. Adding a word to the map after processing ensures it is available for concatenation in subsequent iterations.

Worked Examples

Example 1: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Sorting by length: ["cat","dog","rat","cats","catsdogcats","dogcatsdog","ratcatdogcat","hippopotamuses"]

Word DP array after processing Result added?
cat [T,F,F,F] No
dog [T,F,F,F] No
rat [T,F,F,F] No
cats [T,F,F,F,F] No
catsdogcats [T,F,F,F,T,F,F,F,T,F,F,F,T,F,F,F] Yes
dogcatsdog similar DP Yes
ratcatdogcat similar DP Yes
hippopotamuses DP never reaches True at end No

Example 2: ["cat","dog","catdog"]

Word DP array Result added?
cat [T,F,F,F] No
dog [T,F,F,F] No
catdog [T,F,F,F,T,F,F,F,F,F,F,F] Yes

Complexity Analysis

Measure Complexity Explanation
Time O(N * L^2) Sorting takes O(N log N), DP nested loops take O(L^2) for each word of max length L, across N words
Space O(N + L) O(N) for word set, O(L) for DP array

Sorting ensures we process smaller words first. DP ensures we check all substring splits efficiently. The hash set allows constant-time substring lookups.

Test Cases

# Provided examples
assert Solution().findAllConcatenatedWordsInADict(["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]) == ["catsdogcats","dogcatsdog","ratcatdogcat"]
assert Solution().findAllConcatenatedWordsInADict(["cat","dog","catdog"]) == ["catdog"]

# Edge cases
assert Solution().findAllConcatenatedWordsInADict([""]) == []
assert Solution().findAllConcatenatedWordsInADict(["a","b","ab","abc"]) == ["ab"]
assert Solution().findAllConcatenatedWordsInADict(["a","aa","aaa"]) == ["aa","aaa"]
assert Solution().findAllConcatenatedWordsInADict(["abcd","abc","def","abcdef"]) == ["abcdef"]
Test Why
empty string validates handling of zero-length words
single letter words forming a concatenation checks DP logic with minimal length
repeated letters forming multiple concatenations tests multiple valid splits
word composed of two smaller words confirms proper concatenation detection

Edge Cases

One edge case is an empty string in the input. Our implementation skips empty words, preventing false positives or out-of-bound errors.

Another important case is when a word can be formed by repeating a single smaller word multiple times, like ["a","aa","aaa"]. The DP approach correctly identifies concatenations even if the same word is used more than once, since the algorithm allows repeated use of words in forming longer words.

A third edge case is words that are prefixes