LeetCode 140 - Word Break II

The problem asks us to segment a given string s into all possible sentences where each word in the sentence exists in a given dictionary wordDict.

LeetCode Problem 140

Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Dynamic Programming, Backtracking, Trie, Memoization

Solution

Problem Understanding

The problem asks us to segment a given string s into all possible sentences where each word in the sentence exists in a given dictionary wordDict. Essentially, we need to find all ways to insert spaces into the string such that every resulting substring is a valid dictionary word. The output should be a list of strings, each representing a valid sentence.

The input s is a string of lowercase English letters. The wordDict is a list of unique lowercase words, which can be reused multiple times. The problem ensures that the output will not exceed a total length of 10^5 characters, which is a hint that even though multiple combinations exist, the total output is manageable. The length of s is at most 20, so we can afford recursion or backtracking but need to be careful to avoid redundant calculations.

Important edge cases include:

  1. A string that cannot be segmented at all (e.g., "catsandog").
  2. Words in the dictionary that overlap in the string (e.g., "cat" and "cats" in "catsanddog").
  3. Empty dictionary or empty string (though constraints guarantee at least one character).

The challenge lies in finding all combinations efficiently, avoiding repeated work when exploring overlapping subproblems.

Approaches

Brute Force

A brute-force solution tries to split the string at every possible index and recursively checks if the left substring is a valid word and continues on the right substring. For each valid split, it combines the results recursively. While correct, this approach generates many overlapping subproblems because the same substring can be evaluated multiple times. Its time complexity grows exponentially with the length of s, making it impractical even for relatively small strings of length 20.

Optimized Approach: Backtracking with Memoization

The key insight is that many subproblems repeat: for example, if "sanddog" needs to be broken down, we may evaluate it multiple times in different recursion paths. By storing previously computed results in a hash map (memoization), we avoid recomputation. This reduces the complexity from exponential to a manageable level.

Additionally, using a set for the dictionary allows O(1) lookup, which is crucial when checking substring validity repeatedly.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(n) Tries all possible splits recursively, exponential in string length
Optimal O(n * 2^n) O(n * 2^n) Memoized backtracking reduces repeated work; dictionary set allows fast lookups

Algorithm Walkthrough

  1. Convert wordDict to a set for O(1) lookups.
  2. Define a recursive helper function dfs(start) that returns all valid sentences for the substring s[start:].
  3. If start reaches the end of the string, return a list containing the empty string as a base case.
  4. Check if the current start has already been computed in the memo dictionary. If so, return the cached result.
  5. Initialize an empty list sentences to store results for this starting index.
  6. Loop through end from start + 1 to len(s) + 1. Extract the substring word = s[start:end].
  7. If word is in the dictionary, recursively call dfs(end) to get all sentences for the remaining string.
  8. For each sentence from the recursion, concatenate word and the sentence (with a space if the sentence is not empty) and append to sentences.
  9. Store sentences in the memo dictionary keyed by start.
  10. Return sentences.

Why it works: Each recursive call constructs all valid sentences for the substring starting at start. Memoization guarantees that each substring is processed at most once. By combining valid words with the results from subsequent substrings, we generate all valid sentences without missing any.

Python Solution

from typing import List

class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> List[str]:
        word_set = set(wordDict)
        memo = {}
        
        def dfs(start: int) -> List[str]:
            if start in memo:
                return memo[start]
            
            if start == len(s):
                return [""]
            
            sentences = []
            for end in range(start + 1, len(s) + 1):
                word = s[start:end]
                if word in word_set:
                    for subsentence in dfs(end):
                        if subsentence:
                            sentences.append(word + " " + subsentence)
                        else:
                            sentences.append(word)
            
            memo[start] = sentences
            return sentences
        
        return dfs(0)

The implementation uses a recursive dfs function to explore every valid segmentation starting at each index. By storing results in memo, repeated substrings are computed only once. The nested loop checks each substring against the dictionary, and valid words are combined with recursive results to form complete sentences.

Go Solution

package main

func wordBreak(s string, wordDict []string) []string {
    wordSet := make(map[string]bool)
    for _, word := range wordDict {
        wordSet[word] = true
    }
    
    memo := make(map[int][]string)
    
    var dfs func(start int) []string
    dfs = func(start int) []string {
        if val, exists := memo[start]; exists {
            return val
        }
        if start == len(s) {
            return []string{""}
        }
        
        sentences := []string{}
        for end := start + 1; end <= len(s); end++ {
            word := s[start:end]
            if wordSet[word] {
                for _, subsentence := range dfs(end) {
                    if subsentence != "" {
                        sentences = append(sentences, word + " " + subsentence)
                    } else {
                        sentences = append(sentences, word)
                    }
                }
            }
        }
        memo[start] = sentences
        return sentences
    }
    
    return dfs(0)
}

The Go implementation mirrors the Python version closely. A map is used for memoization, and another map stores dictionary words for fast lookup. Go handles slices efficiently, so we append sentences to a slice instead of a list. Empty string handling ensures that spaces are inserted correctly.

Worked Examples

Example 1: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]

start end word Recursive result Sentences formed
0 3 "cat" dfs(3) -> ["sand dog"] "cat sand dog"
0 4 "cats" dfs(4) -> ["and dog"] "cats and dog"

Output: ["cats and dog", "cat sand dog"]

Example 2: s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"]

start end word Recursive result Sentences formed
0 4 "pine" dfs(4) -> ["apple pen apple","applepen apple"] "pine apple pen apple", "pine applepen apple"
0 9 "pineapple" dfs(9) -> ["pen apple"] "pineapple pen apple"

Output: ["pine apple pen apple","pineapple pen apple","pine applepen apple"]

Example 3: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]

No valid splits lead to a complete segmentation, so output is [].

Complexity Analysis

Measure Complexity Explanation
Time O(n * 2^n) Each substring can branch into multiple valid words. Memoization ensures each start index is processed once.
Space O(n * 2^n) Memo stores lists of sentences for each start index; recursion stack can go up to n.

Although worst-case exponential in the number of valid splits, memoization significantly reduces recomputation, making it feasible for s.length <= 20.

Test Cases

# Provided examples
assert sorted(Solution().wordBreak("catsanddog", ["cat","cats","and","sand","dog"])) == sorted(["cats and dog","cat sand dog"])  # overlapping words
assert sorted(Solution().wordBreak("pineapplepenapple", ["apple","pen","applepen","pine","pineapple"])) == sorted(["pine apple pen apple","pineapple pen apple","pine applepen apple"])  # reuse of words
assert Solution().wordBreak("catsandog", ["cats","dog","sand","and","cat"]) == []  # impossible segmentation

# Edge cases
assert Solution().wordBreak("a", ["a"]) == ["a"]  # single character
assert Solution().wordBreak("aaaa", ["a","aa"]) == ["a a a a","aa a a","a aa a","a a aa","aa aa"])  # multiple splits with repeated words
assert Solution().wordBreak("leetcode", ["leet","code"]) == ["leet code"]  # simple valid segmentation
assert Solution().wordBreak("leetcode", ["le","etc","ode"]) == []  # segmentation impossible due to missing word
Test Why
"catsanddog" Tests overlapping dictionary words and multiple