LeetCode 2707 - Extra Characters in a String
The problem asks us to break a string s into non-overlapping substrings such that each substring exists in a given dictionary. Characters in s that cannot be matched with any dictionary word are considered extra characters.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Dynamic Programming, Trie
Solution
Problem Understanding
The problem asks us to break a string s into non-overlapping substrings such that each substring exists in a given dictionary. Characters in s that cannot be matched with any dictionary word are considered extra characters. The goal is to find the minimum number of extra characters when splitting s optimally.
The input s is a lowercase English string of length up to 50, and dictionary is a list of up to 50 distinct lowercase words, each of length up to 50. The constraints are small, which allows us to consider dynamic programming or backtracking approaches without hitting performance issues typical for larger inputs.
Edge cases include strings where no words from the dictionary match any part of s, strings fully contained in the dictionary, or overlapping possibilities where careful selection minimizes extra characters. The problem guarantees distinct words, avoiding duplicate handling complications.
Approaches
Brute Force
A brute-force approach would consider all possible ways to split the string and calculate extra characters for each. For each index, we could try every substring starting at that index, check if it exists in the dictionary, and recursively continue. This is correct because it explores every possible combination. However, the time complexity is exponential, O(2^n), because each character could either start a substring in the dictionary or be counted as an extra character. This is clearly too slow for practical use, even though the input size is small.
Optimal Approach
The optimal approach uses dynamic programming with a hash set for dictionary lookup. The key insight is to define dp[i] as the minimum number of extra characters in the substring s[i:]. For each index i, we explore all possible substrings s[i:j] and check if they exist in the dictionary. If a substring exists, it contributes zero extra characters; otherwise, we count the first character as extra and recurse. Using a DP array avoids recomputation, ensuring each substring is only considered once.
To speed up substring checks, we store the dictionary in a hash set, making lookups O(1).
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(n) | Recursive exploration of all substring partitions |
| Optimal | O(n^2) | O(n + m) | DP array + hash set lookup, where n = |
Algorithm Walkthrough
- Convert the
dictionarylist into a hash set for O(1) lookup. This allows fast checking if a substring exists. - Initialize a DP array
dpof sizelen(s) + 1, wheredp[i]represents the minimum extra characters ins[i:]. Setdp[len(s)] = 0since the empty suffix has zero extra characters. - Iterate backwards from
i = len(s)-1to 0. For eachi, initializedp[i] = dp[i+1] + 1, which corresponds to treatings[i]as an extra character. - For each
jfromitolen(s)-1, check ifs[i:j+1]is in the dictionary. If it is, updatedp[i] = min(dp[i], dp[j+1]). - After filling the DP array,
dp[0]contains the minimum number of extra characters for the entire string.
Why it works: The DP maintains the invariant that dp[i] always represents the minimal extra characters for the suffix starting at i. By checking all substrings and leveraging previous computed results, the algorithm ensures all optimal partitions are considered without redundancy.
Python Solution
from typing import List
class Solution:
def minExtraChar(self, s: str, dictionary: List[str]) -> int:
word_set = set(dictionary)
n = len(s)
dp = [0] * (n + 1)
for i in range(n - 1, -1, -1):
dp[i] = dp[i + 1] + 1 # Assume s[i] is extra
for j in range(i, n):
if s[i:j + 1] in word_set:
dp[i] = min(dp[i], dp[j + 1])
return dp[0]
The code first converts the dictionary to a set for fast lookups. The DP array is filled from the end of the string backwards. At each index, we either count the character as extra or use a matching dictionary word to minimize extra characters. Finally, dp[0] gives the optimal result.
Go Solution
func minExtraChar(s string, dictionary []string) int {
wordSet := make(map[string]bool)
for _, word := range dictionary {
wordSet[word] = true
}
n := len(s)
dp := make([]int, n+1)
for i := n - 1; i >= 0; i-- {
dp[i] = dp[i+1] + 1 // treat s[i] as extra
for j := i; j < n; j++ {
if wordSet[s[i:j+1]] {
if dp[j+1] < dp[i] {
dp[i] = dp[j+1]
}
}
}
}
return dp[0]
}
The Go solution mirrors the Python approach. A map is used instead of a set. String slicing in Go s[i:j+1] behaves similarly to Python. Care is taken to initialize the DP array and iterate backwards, updating minimum extra characters at each step.
Worked Examples
Example 1:
Input: s = "leetscode", dictionary = ["leet","code","leetcode"]
| i | Substring | In dict? | dp[i] calculation |
|---|---|---|---|
| 8 | "e" | No | dp[8] = dp[9]+1 = 1 |
| 7 | "de" | No | dp[7] = dp[8]+1 = 2 |
| ... | ... | ... | ... |
| 0 | "leet" | Yes | dp[0] = min(dp[0], dp[4]) = 1 |
Result: 1 extra character at index 4.
Example 2:
Input: s = "sayhelloworld", dictionary = ["hello","world"]
| i | Substring | In dict? | dp[i] calculation |
|---|---|---|---|
| 0 | "say" | No | dp[0] = dp[1]+1 = 3 |
| 3 | "hello" | Yes | dp[3] = dp[8] = 0 |
| 8 | "world" | Yes | dp[8] = dp[13] = 0 |
Result: 3 extra characters at indices 0,1,2.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n^2) | For each starting index, we check all possible substrings |
| Space | O(n + m) | DP array of size n and hash set storing dictionary words (m = sum of lengths) |
The time complexity is quadratic due to nested iteration over substring indices. Space is linear in the string size plus dictionary storage.
Test Cases
# Provided examples
assert Solution().minExtraChar("leetscode", ["leet","code","leetcode"]) == 1
assert Solution().minExtraChar("sayhelloworld", ["hello","world"]) == 3
# Edge cases
assert Solution().minExtraChar("abc", ["x","y","z"]) == 3 # no words match
assert Solution().minExtraChar("apple", ["apple"]) == 0 # entire string matches
assert Solution().minExtraChar("aaaaa", ["aa","aaa"]) == 0 # overlapping matches
assert Solution().minExtraChar("abcdef", ["ab","de"]) == 2 # mix of matched and extra
| Test | Why |
|---|---|
| "leetscode" | Basic case with partial match and one extra character |
| "sayhelloworld" | Extra characters at the beginning |
| "abc" | No dictionary match |
| "apple" | Full match |
| "aaaaa" | Overlapping substrings in dictionary |
| "abcdef" | Some matched, some extra characters |
Edge Cases
One important edge case is when no substring matches. Here, each character counts as extra, and the DP correctly accumulates the length of s.
Another edge case is when the entire string matches a dictionary word, which should return zero. The DP ensures that if a substring from i to end matches a word, dp[i] becomes zero.
Finally, overlapping dictionary entries, like ["aa","aaa"] for s="aaaaa", could lead to multiple valid partitions. The DP naturally chooses the partition that minimizes extra characters because it always takes the minimum over all possible substrings starting at i.
This careful DP design ensures correctness across these tricky scenarios.