LeetCode 139 - Word Break
The problem asks us to determine whether a given string s can be broken into a sequence of valid words from a given dictionary wordDict. Essentially, we need to check if we can insert spaces into s such that every substring separated by these spaces exists in the dictionary.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Dynamic Programming, Trie, Memoization
Solution
Problem Understanding
The problem asks us to determine whether a given string s can be broken into a sequence of valid words from a given dictionary wordDict. Essentially, we need to check if we can insert spaces into s such that every substring separated by these spaces exists in the dictionary. The input consists of a string s with lowercase English letters and a list of unique lowercase words wordDict. The output is a boolean indicating whether such a segmentation exists.
The constraints tell us that the length of s is up to 300 characters and the dictionary can contain up to 1000 words, each up to 20 characters long. These constraints suggest that naive solutions that explore all possible segmentations recursively could be too slow, as the number of combinations can grow exponentially. Important edge cases include an empty string, a string where no words match, and cases where a valid segmentation requires reusing the same word multiple times.
Approaches
The brute-force approach is to try every possible segmentation of the string. For each index in the string, we can recursively check if the substring from the current index to the end can be segmented using words in the dictionary. While this guarantees correctness, it is extremely slow because it explores all combinations of segmentations. Its time complexity is exponential, O(2^n), where n is the length of the string, due to the combinatorial explosion of possible splits.
The key insight to improve the solution is to use dynamic programming. We can maintain a boolean array dp of size len(s) + 1, where dp[i] indicates whether the substring s[0:i] can be segmented using the dictionary. For each position i, we check all previous positions j where dp[j] is true and s[j:i] exists in the dictionary. This avoids redundant computation by building solutions for smaller substrings and reusing them for larger substrings.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(n) | Recursively checks all substring combinations |
| Dynamic Programming | O(n^2 * m) | O(n + k) | Checks all substring ends with dictionary lookup; m = max word length, k = number of words |
Algorithm Walkthrough
- Convert
wordDictinto a set for O(1) lookups. This is important because we will repeatedly check if a substring exists in the dictionary. - Initialize a boolean array
dpof sizelen(s) + 1. Setdp[0] = Trueto represent the empty string being segmentable. - Iterate over the string using index
ifrom 1 tolen(s). For eachi, iterate backwards with indexjfrom 0 toi. - For each
j, check ifdp[j]is True (meanings[0:j]is segmentable) and ifs[j:i]is in the dictionary. - If both conditions are met, set
dp[i] = Trueand break the inner loop because we found a valid segmentation ending ati. - After filling the array,
dp[len(s)]contains the answer for the full string.
Why it works: The DP array maintains the invariant that dp[i] is True if and only if the substring s[0:i] can be segmented using words from the dictionary. By checking all possible previous positions j, we ensure that every valid segmentation is considered, but without repeating calculations for overlapping substrings.
Python Solution
from typing import List
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
word_set = set(wordDict)
n = len(s)
dp = [False] * (n + 1)
dp[0] = True
for i in range(1, n + 1):
for j in range(max(0, i - 20), i): # Optimization using max word length
if dp[j] and s[j:i] in word_set:
dp[i] = True
break
return dp[n]
In this Python implementation, we first convert the dictionary to a set for faster lookup. The dp array tracks which prefixes can be segmented. The nested loop checks for each end index i if there exists a valid split at some index j. The inner loop is optimized by considering only substrings of length up to 20, the maximum word length.
Go Solution
func wordBreak(s string, wordDict []string) bool {
wordSet := make(map[string]bool)
for _, word := range wordDict {
wordSet[word] = true
}
n := len(s)
dp := make([]bool, n+1)
dp[0] = true
for i := 1; i <= n; i++ {
for j := max(0, i-20); j < i; j++ {
if dp[j] && wordSet[s[j:i]] {
dp[i] = true
break
}
}
}
return dp[n]
}
func max(a, b int) int {
if a > b {
return a
}
return b
}
The Go implementation mirrors the Python approach. The dictionary is converted into a map for O(1) lookups. The DP array is a slice of booleans. We use a helper max function to avoid negative indexing when optimizing by maximum word length.
Worked Examples
Example 1: s = "leetcode", wordDict = ["leet","code"]
| i | j | dp[j] | s[j:i] | dp[i] after check |
|---|---|---|---|---|
| 4 | 0 | True | "leet" | True |
| 8 | 4 | True | "code" | True |
Result: dp[8] = True.
Example 2: s = "applepenapple", wordDict = ["apple","pen"]
| i | j | dp[j] | s[j:i] | dp[i] |
|---|---|---|---|---|
| 5 | 0 | True | "apple" | True |
| 8 | 5 | True | "pen" | True |
| 13 | 8 | True | "apple" | True |
Result: dp[13] = True.
Example 3: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
No combination of j and i satisfies the conditions for the final dp[9]. Result: False.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * m) | We iterate over all end indices i and up to maximum word length m for each substring check. |
| Space | O(n + k) | dp array of size n+1 and the dictionary set/map of size k. |
The time complexity is efficient because we avoid redundant recursive calls and only check substrings up to the maximum word length. Space complexity is linear in the string length and dictionary size.
Test Cases
# provided examples
assert Solution().wordBreak("leetcode", ["leet","code"]) == True # "leet code"
assert Solution().wordBreak("applepenapple", ["apple","pen"]) == True # "apple pen apple"
assert Solution().wordBreak("catsandog", ["cats","dog","sand","and","cat"]) == False # no valid segmentation
# additional cases
assert Solution().wordBreak("", ["a"]) == True # empty string
assert Solution().wordBreak("a", ["b"]) == False # single character no match
assert Solution().wordBreak("aaaaaaa", ["aaaa","aaa"]) == True # reuse words multiple times
assert Solution().wordBreak("abcd", ["a","abc","b","cd"]) == True # multiple segmentation options
| Test | Why |
|---|---|
| "leetcode", ["leet","code"] | Basic positive case with two words |
| "applepenapple", ["apple","pen"] | Reusing dictionary words |
| "catsandog", ["cats","dog","sand","and","cat"] | No valid segmentation |
| "" | Edge case: empty string |
| "a", ["b"] | Single character mismatch |
| "aaaaaaa", ["aaaa","aaa"] | Overlapping word reuse |
| "abcd", ["a","abc","b","cd"] | Multiple segmentation paths |
Edge Cases
One important edge case is an empty string. According to the DP setup, dp[0] = True represents an empty string, so the algorithm correctly returns True without processing further. Another edge case is a string containing characters not present in the dictionary at all. The DP array will remain False for all indices beyond the empty string, returning False correctly. A third edge case involves words in the dictionary that are reused multiple times in the string, such as "aaaaaaa" segmented as ["aaaa","aaa"]. The algorithm correctly handles this because the DP array allows multiple valid splits to propagate True values forward, ensuring all possible combinations are considered.