LeetCode 2707 - Extra Characters in a String

The problem asks us to break a string s into non-overlapping substrings such that each substring exists in a given dictionary. Characters in s that cannot be matched with any dictionary word are considered extra characters.

LeetCode Problem 2707

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Dynamic Programming, Trie

Solution

Problem Understanding

The problem asks us to break a string s into non-overlapping substrings such that each substring exists in a given dictionary. Characters in s that cannot be matched with any dictionary word are considered extra characters. The goal is to find the minimum number of extra characters when splitting s optimally.

The input s is a lowercase English string of length up to 50, and dictionary is a list of up to 50 distinct lowercase words, each of length up to 50. The constraints are small, which allows us to consider dynamic programming or backtracking approaches without hitting performance issues typical for larger inputs.

Edge cases include strings where no words from the dictionary match any part of s, strings fully contained in the dictionary, or overlapping possibilities where careful selection minimizes extra characters. The problem guarantees distinct words, avoiding duplicate handling complications.

Approaches

Brute Force

A brute-force approach would consider all possible ways to split the string and calculate extra characters for each. For each index, we could try every substring starting at that index, check if it exists in the dictionary, and recursively continue. This is correct because it explores every possible combination. However, the time complexity is exponential, O(2^n), because each character could either start a substring in the dictionary or be counted as an extra character. This is clearly too slow for practical use, even though the input size is small.

Optimal Approach

The optimal approach uses dynamic programming with a hash set for dictionary lookup. The key insight is to define dp[i] as the minimum number of extra characters in the substring s[i:]. For each index i, we explore all possible substrings s[i:j] and check if they exist in the dictionary. If a substring exists, it contributes zero extra characters; otherwise, we count the first character as extra and recurse. Using a DP array avoids recomputation, ensuring each substring is only considered once.

To speed up substring checks, we store the dictionary in a hash set, making lookups O(1).

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(n) Recursive exploration of all substring partitions
Optimal O(n^2) O(n + m) DP array + hash set lookup, where n =

Algorithm Walkthrough

  1. Convert the dictionary list into a hash set for O(1) lookup. This allows fast checking if a substring exists.
  2. Initialize a DP array dp of size len(s) + 1, where dp[i] represents the minimum extra characters in s[i:]. Set dp[len(s)] = 0 since the empty suffix has zero extra characters.
  3. Iterate backwards from i = len(s)-1 to 0. For each i, initialize dp[i] = dp[i+1] + 1, which corresponds to treating s[i] as an extra character.
  4. For each j from i to len(s)-1, check if s[i:j+1] is in the dictionary. If it is, update dp[i] = min(dp[i], dp[j+1]).
  5. After filling the DP array, dp[0] contains the minimum number of extra characters for the entire string.

Why it works: The DP maintains the invariant that dp[i] always represents the minimal extra characters for the suffix starting at i. By checking all substrings and leveraging previous computed results, the algorithm ensures all optimal partitions are considered without redundancy.

Python Solution

from typing import List

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        word_set = set(dictionary)
        n = len(s)
        dp = [0] * (n + 1)
        
        for i in range(n - 1, -1, -1):
            dp[i] = dp[i + 1] + 1  # Assume s[i] is extra
            for j in range(i, n):
                if s[i:j + 1] in word_set:
                    dp[i] = min(dp[i], dp[j + 1])
        return dp[0]

The code first converts the dictionary to a set for fast lookups. The DP array is filled from the end of the string backwards. At each index, we either count the character as extra or use a matching dictionary word to minimize extra characters. Finally, dp[0] gives the optimal result.

Go Solution

func minExtraChar(s string, dictionary []string) int {
    wordSet := make(map[string]bool)
    for _, word := range dictionary {
        wordSet[word] = true
    }

    n := len(s)
    dp := make([]int, n+1)

    for i := n - 1; i >= 0; i-- {
        dp[i] = dp[i+1] + 1 // treat s[i] as extra
        for j := i; j < n; j++ {
            if wordSet[s[i:j+1]] {
                if dp[j+1] < dp[i] {
                    dp[i] = dp[j+1]
                }
            }
        }
    }
    return dp[0]
}

The Go solution mirrors the Python approach. A map is used instead of a set. String slicing in Go s[i:j+1] behaves similarly to Python. Care is taken to initialize the DP array and iterate backwards, updating minimum extra characters at each step.

Worked Examples

Example 1:

Input: s = "leetscode", dictionary = ["leet","code","leetcode"]

i Substring In dict? dp[i] calculation
8 "e" No dp[8] = dp[9]+1 = 1
7 "de" No dp[7] = dp[8]+1 = 2
... ... ... ...
0 "leet" Yes dp[0] = min(dp[0], dp[4]) = 1

Result: 1 extra character at index 4.

Example 2:

Input: s = "sayhelloworld", dictionary = ["hello","world"]

i Substring In dict? dp[i] calculation
0 "say" No dp[0] = dp[1]+1 = 3
3 "hello" Yes dp[3] = dp[8] = 0
8 "world" Yes dp[8] = dp[13] = 0

Result: 3 extra characters at indices 0,1,2.

Complexity Analysis

Measure Complexity Explanation
Time O(n^2) For each starting index, we check all possible substrings
Space O(n + m) DP array of size n and hash set storing dictionary words (m = sum of lengths)

The time complexity is quadratic due to nested iteration over substring indices. Space is linear in the string size plus dictionary storage.

Test Cases

# Provided examples
assert Solution().minExtraChar("leetscode", ["leet","code","leetcode"]) == 1
assert Solution().minExtraChar("sayhelloworld", ["hello","world"]) == 3

# Edge cases
assert Solution().minExtraChar("abc", ["x","y","z"]) == 3  # no words match
assert Solution().minExtraChar("apple", ["apple"]) == 0   # entire string matches
assert Solution().minExtraChar("aaaaa", ["aa","aaa"]) == 0 # overlapping matches
assert Solution().minExtraChar("abcdef", ["ab","de"]) == 2 # mix of matched and extra
Test Why
"leetscode" Basic case with partial match and one extra character
"sayhelloworld" Extra characters at the beginning
"abc" No dictionary match
"apple" Full match
"aaaaa" Overlapping substrings in dictionary
"abcdef" Some matched, some extra characters

Edge Cases

One important edge case is when no substring matches. Here, each character counts as extra, and the DP correctly accumulates the length of s.

Another edge case is when the entire string matches a dictionary word, which should return zero. The DP ensures that if a substring from i to end matches a word, dp[i] becomes zero.

Finally, overlapping dictionary entries, like ["aa","aaa"] for s="aaaaa", could lead to multiple valid partitions. The DP naturally chooses the partition that minimizes extra characters because it always takes the minimum over all possible substrings starting at i.

This careful DP design ensures correctness across these tricky scenarios.