LeetCode 139 - Word Break

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Dynamic Programming, Trie, Memoization

Solution

Problem Understanding

The problem asks us to determine whether a given string s can be broken into a sequence of valid words from a given dictionary wordDict. Essentially, we need to check if we can insert spaces into s such that every substring separated by these spaces exists in the dictionary. The input consists of a string s with lowercase English letters and a list of unique lowercase words wordDict. The output is a boolean indicating whether such a segmentation exists.

The constraints tell us that the length of s is up to 300 characters and the dictionary can contain up to 1000 words, each up to 20 characters long. These constraints suggest that naive solutions that explore all possible segmentations recursively could be too slow, as the number of combinations can grow exponentially. Important edge cases include an empty string, a string where no words match, and cases where a valid segmentation requires reusing the same word multiple times.

Approaches

The brute-force approach is to try every possible segmentation of the string. For each index in the string, we can recursively check if the substring from the current index to the end can be segmented using words in the dictionary. While this guarantees correctness, it is extremely slow because it explores all combinations of segmentations. Its time complexity is exponential, O(2^n), where n is the length of the string, due to the combinatorial explosion of possible splits.

The key insight to improve the solution is to use dynamic programming. We can maintain a boolean array dp of size len(s) + 1, where dp[i] indicates whether the substring s[0:i] can be segmented using the dictionary. For each position i, we check all previous positions j where dp[j] is true and s[j:i] exists in the dictionary. This avoids redundant computation by building solutions for smaller substrings and reusing them for larger substrings.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(2^n)	O(n)	Recursively checks all substring combinations
Dynamic Programming	O(n^2 * m)	O(n + k)	Checks all substring ends with dictionary lookup; m = max word length, k = number of words

Algorithm Walkthrough

Convert wordDict into a set for O(1) lookups. This is important because we will repeatedly check if a substring exists in the dictionary.
Initialize a boolean array dp of size len(s) + 1. Set dp[0] = True to represent the empty string being segmentable.
Iterate over the string using index i from 1 to len(s). For each i, iterate backwards with index j from 0 to i.
For each j, check if dp[j] is True (meaning s[0:j] is segmentable) and if s[j:i] is in the dictionary.
If both conditions are met, set dp[i] = True and break the inner loop because we found a valid segmentation ending at i.
After filling the array, dp[len(s)] contains the answer for the full string.

Why it works: The DP array maintains the invariant that dp[i] is True if and only if the substring s[0:i] can be segmented using words from the dictionary. By checking all possible previous positions j, we ensure that every valid segmentation is considered, but without repeating calculations for overlapping substrings.

Python Solution

from typing import List

class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        word_set = set(wordDict)
        n = len(s)
        dp = [False] * (n + 1)
        dp[0] = True
        
        for i in range(1, n + 1):
            for j in range(max(0, i - 20), i):  # Optimization using max word length
                if dp[j] and s[j:i] in word_set:
                    dp[i] = True
                    break
        
        return dp[n]

In this Python implementation, we first convert the dictionary to a set for faster lookup. The dp array tracks which prefixes can be segmented. The nested loop checks for each end index i if there exists a valid split at some index j. The inner loop is optimized by considering only substrings of length up to 20, the maximum word length.

Go Solution

func wordBreak(s string, wordDict []string) bool {
    wordSet := make(map[string]bool)
    for _, word := range wordDict {
        wordSet[word] = true
    }
    
    n := len(s)
    dp := make([]bool, n+1)
    dp[0] = true
    
    for i := 1; i <= n; i++ {
        for j := max(0, i-20); j < i; j++ {
            if dp[j] && wordSet[s[j:i]] {
                dp[i] = true
                break
            }
        }
    }
    
    return dp[n]
}

func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

The Go implementation mirrors the Python approach. The dictionary is converted into a map for O(1) lookups. The DP array is a slice of booleans. We use a helper max function to avoid negative indexing when optimizing by maximum word length.

Worked Examples

Example 1: s = "leetcode", wordDict = ["leet","code"]

i	j	dp[j]	s[j:i]	dp[i] after check
4	0	True	"leet"	True
8	4	True	"code"	True

Result: dp[8] = True.

Example 2: s = "applepenapple", wordDict = ["apple","pen"]

i	j	dp[j]	s[j:i]	dp[i]
5	0	True	"apple"	True
8	5	True	"pen"	True
13	8	True	"apple"	True

Result: dp[13] = True.

Example 3: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]

No combination of j and i satisfies the conditions for the final dp[9]. Result: False.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n * m)	We iterate over all end indices `i` and up to maximum word length `m` for each substring check.
Space	O(n + k)	`dp` array of size n+1 and the dictionary set/map of size k.

The time complexity is efficient because we avoid redundant recursive calls and only check substrings up to the maximum word length. Space complexity is linear in the string length and dictionary size.

Test Cases

# provided examples
assert Solution().wordBreak("leetcode", ["leet","code"]) == True  # "leet code"
assert Solution().wordBreak("applepenapple", ["apple","pen"]) == True  # "apple pen apple"
assert Solution().wordBreak("catsandog", ["cats","dog","sand","and","cat"]) == False  # no valid segmentation

# additional cases
assert Solution().wordBreak("", ["a"]) == True  # empty string
assert Solution().wordBreak("a", ["b"]) == False  # single character no match
assert Solution().wordBreak("aaaaaaa", ["aaaa","aaa"]) == True  # reuse words multiple times
assert Solution().wordBreak("abcd", ["a","abc","b","cd"]) == True  # multiple segmentation options

Test	Why
"leetcode", ["leet","code"]	Basic positive case with two words
"applepenapple", ["apple","pen"]	Reusing dictionary words
"catsandog", ["cats","dog","sand","and","cat"]	No valid segmentation
""	Edge case: empty string
"a", ["b"]	Single character mismatch
"aaaaaaa", ["aaaa","aaa"]	Overlapping word reuse
"abcd", ["a","abc","b","cd"]	Multiple segmentation paths

Edge Cases

One important edge case is an empty string. According to the DP setup, dp[0] = True represents an empty string, so the algorithm correctly returns True without processing further. Another edge case is a string containing characters not present in the dictionary at all. The DP array will remain False for all indices beyond the empty string, returning False correctly. A third edge case involves words in the dictionary that are reused multiple times in the string, such as "aaaaaaa" segmented as ["aaaa","aaa"]. The algorithm correctly handles this because the DP array allows multiple valid splits to propagate True values forward, ensuring all possible combinations are considered.