LeetCode 140 - Word Break II
The problem asks us to segment a given string s into all possible sentences where each word in the sentence exists in a given dictionary wordDict.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Dynamic Programming, Backtracking, Trie, Memoization
Solution
Problem Understanding
The problem asks us to segment a given string s into all possible sentences where each word in the sentence exists in a given dictionary wordDict. Essentially, we need to find all ways to insert spaces into the string such that every resulting substring is a valid dictionary word. The output should be a list of strings, each representing a valid sentence.
The input s is a string of lowercase English letters. The wordDict is a list of unique lowercase words, which can be reused multiple times. The problem ensures that the output will not exceed a total length of 10^5 characters, which is a hint that even though multiple combinations exist, the total output is manageable. The length of s is at most 20, so we can afford recursion or backtracking but need to be careful to avoid redundant calculations.
Important edge cases include:
- A string that cannot be segmented at all (e.g., "catsandog").
- Words in the dictionary that overlap in the string (e.g., "cat" and "cats" in "catsanddog").
- Empty dictionary or empty string (though constraints guarantee at least one character).
The challenge lies in finding all combinations efficiently, avoiding repeated work when exploring overlapping subproblems.
Approaches
Brute Force
A brute-force solution tries to split the string at every possible index and recursively checks if the left substring is a valid word and continues on the right substring. For each valid split, it combines the results recursively. While correct, this approach generates many overlapping subproblems because the same substring can be evaluated multiple times. Its time complexity grows exponentially with the length of s, making it impractical even for relatively small strings of length 20.
Optimized Approach: Backtracking with Memoization
The key insight is that many subproblems repeat: for example, if "sanddog" needs to be broken down, we may evaluate it multiple times in different recursion paths. By storing previously computed results in a hash map (memoization), we avoid recomputation. This reduces the complexity from exponential to a manageable level.
Additionally, using a set for the dictionary allows O(1) lookup, which is crucial when checking substring validity repeatedly.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(n) | Tries all possible splits recursively, exponential in string length |
| Optimal | O(n * 2^n) | O(n * 2^n) | Memoized backtracking reduces repeated work; dictionary set allows fast lookups |
Algorithm Walkthrough
- Convert
wordDictto a set for O(1) lookups. - Define a recursive helper function
dfs(start)that returns all valid sentences for the substrings[start:]. - If
startreaches the end of the string, return a list containing the empty string as a base case. - Check if the current
starthas already been computed in the memo dictionary. If so, return the cached result. - Initialize an empty list
sentencesto store results for this starting index. - Loop through
endfromstart + 1tolen(s) + 1. Extract the substringword = s[start:end]. - If
wordis in the dictionary, recursively calldfs(end)to get all sentences for the remaining string. - For each sentence from the recursion, concatenate
wordand the sentence (with a space if the sentence is not empty) and append tosentences. - Store
sentencesin the memo dictionary keyed bystart. - Return
sentences.
Why it works: Each recursive call constructs all valid sentences for the substring starting at start. Memoization guarantees that each substring is processed at most once. By combining valid words with the results from subsequent substrings, we generate all valid sentences without missing any.
Python Solution
from typing import List
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> List[str]:
word_set = set(wordDict)
memo = {}
def dfs(start: int) -> List[str]:
if start in memo:
return memo[start]
if start == len(s):
return [""]
sentences = []
for end in range(start + 1, len(s) + 1):
word = s[start:end]
if word in word_set:
for subsentence in dfs(end):
if subsentence:
sentences.append(word + " " + subsentence)
else:
sentences.append(word)
memo[start] = sentences
return sentences
return dfs(0)
The implementation uses a recursive dfs function to explore every valid segmentation starting at each index. By storing results in memo, repeated substrings are computed only once. The nested loop checks each substring against the dictionary, and valid words are combined with recursive results to form complete sentences.
Go Solution
package main
func wordBreak(s string, wordDict []string) []string {
wordSet := make(map[string]bool)
for _, word := range wordDict {
wordSet[word] = true
}
memo := make(map[int][]string)
var dfs func(start int) []string
dfs = func(start int) []string {
if val, exists := memo[start]; exists {
return val
}
if start == len(s) {
return []string{""}
}
sentences := []string{}
for end := start + 1; end <= len(s); end++ {
word := s[start:end]
if wordSet[word] {
for _, subsentence := range dfs(end) {
if subsentence != "" {
sentences = append(sentences, word + " " + subsentence)
} else {
sentences = append(sentences, word)
}
}
}
}
memo[start] = sentences
return sentences
}
return dfs(0)
}
The Go implementation mirrors the Python version closely. A map is used for memoization, and another map stores dictionary words for fast lookup. Go handles slices efficiently, so we append sentences to a slice instead of a list. Empty string handling ensures that spaces are inserted correctly.
Worked Examples
Example 1: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
| start | end | word | Recursive result | Sentences formed |
|---|---|---|---|---|
| 0 | 3 | "cat" | dfs(3) -> ["sand dog"] | "cat sand dog" |
| 0 | 4 | "cats" | dfs(4) -> ["and dog"] | "cats and dog" |
Output: ["cats and dog", "cat sand dog"]
Example 2: s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"]
| start | end | word | Recursive result | Sentences formed |
|---|---|---|---|---|
| 0 | 4 | "pine" | dfs(4) -> ["apple pen apple","applepen apple"] | "pine apple pen apple", "pine applepen apple" |
| 0 | 9 | "pineapple" | dfs(9) -> ["pen apple"] | "pineapple pen apple" |
Output: ["pine apple pen apple","pineapple pen apple","pine applepen apple"]
Example 3: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
No valid splits lead to a complete segmentation, so output is [].
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * 2^n) | Each substring can branch into multiple valid words. Memoization ensures each start index is processed once. |
| Space | O(n * 2^n) | Memo stores lists of sentences for each start index; recursion stack can go up to n. |
Although worst-case exponential in the number of valid splits, memoization significantly reduces recomputation, making it feasible for s.length <= 20.
Test Cases
# Provided examples
assert sorted(Solution().wordBreak("catsanddog", ["cat","cats","and","sand","dog"])) == sorted(["cats and dog","cat sand dog"]) # overlapping words
assert sorted(Solution().wordBreak("pineapplepenapple", ["apple","pen","applepen","pine","pineapple"])) == sorted(["pine apple pen apple","pineapple pen apple","pine applepen apple"]) # reuse of words
assert Solution().wordBreak("catsandog", ["cats","dog","sand","and","cat"]) == [] # impossible segmentation
# Edge cases
assert Solution().wordBreak("a", ["a"]) == ["a"] # single character
assert Solution().wordBreak("aaaa", ["a","aa"]) == ["a a a a","aa a a","a aa a","a a aa","aa aa"]) # multiple splits with repeated words
assert Solution().wordBreak("leetcode", ["leet","code"]) == ["leet code"] # simple valid segmentation
assert Solution().wordBreak("leetcode", ["le","etc","ode"]) == [] # segmentation impossible due to missing word
| Test | Why |
|---|---|
| "catsanddog" | Tests overlapping dictionary words and multiple |