LeetCode 943 - Find the Shortest Superstring
The problem asks us to find the shortest superstring that contains all the given strings in the array words as substrings.
Difficulty: 🔴 Hard
Topics: Array, String, Dynamic Programming, Bit Manipulation, Bitmask
Solution
Problem Understanding
The problem asks us to find the shortest superstring that contains all the given strings in the array words as substrings. In other words, we need a single string that includes every string in words at least once, concatenated in some order, but we want this superstring to have minimal length. The problem guarantees that no string in words is a substring of another, which simplifies overlap handling.
The input words is an array of unique strings of lowercase English letters. The array length is between 1 and 12, and each string can be up to 20 characters long. The output is a single string, which is the shortest possible combination containing all words. Multiple outputs can be valid if multiple minimal-length superstrings exist.
Edge cases include: a single string in the array, strings that do not overlap at all, and strings that overlap completely except for a few characters. These situations test how overlaps are handled and whether the algorithm efficiently finds the shortest concatenation.
Approaches
Brute Force
The brute-force approach is to generate all permutations of the words and compute the resulting superstring for each permutation by greedily merging overlapping parts. For each pair of consecutive words in a permutation, you find the maximum overlap and append the non-overlapping suffix. After checking all permutations, you return the shortest superstring found.
This approach is correct because it considers every possible order of the strings, so it cannot miss the optimal arrangement. However, the factorial growth of permutations makes it infeasible for the upper bound of words.length = 12 because 12! is over 479 million possibilities. Even with a fast overlap calculation, this is too slow.
Optimal Approach
The key insight for an optimal approach is to use dynamic programming with bitmasking. Each subset of words can be represented by a bitmask, and we can compute the shortest superstring ending with each word for each subset. Specifically, define dp[mask][i] as the length of the shortest superstring that uses all words in mask and ends with word i. We can fill dp using previously computed smaller subsets and the maximum overlaps between words. Finally, we reconstruct the path by backtracking from the optimal endpoint.
This approach works efficiently because the number of states is 2^n * n (where n is the number of words), which is manageable for n <= 12.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n! * n^2 * L) | O(n * L) | Generates all permutations and merges words based on overlap. Too slow for n=12. |
| Optimal | O(n^2 * 2^n + n^2 * L) | O(n * 2^n + n^2) | Uses DP with bitmasking, precomputes overlaps, and reconstructs the shortest superstring. |
Algorithm Walkthrough
- Compute Overlaps: For each pair of words
(i, j), compute the maximum suffix ofwords[i]that matches a prefix ofwords[j]. Store this in a 2D arrayoverlap[i][j]. This allows quick calculation of how much extra string is needed when appendingwords[j]afterwords[i]. - Initialize DP Table: Create a DP table
dp[mask][i]wheremaskrepresents a subset of words andiindicates the last word used. Initializedp[1 << i][i] = len(words[i])since a single-word superstring is just that word. - Fill DP Table: Iterate over all masks in increasing order of bits set. For each
maskand eachiinmask, try to extend the superstring by appending another wordjnot inmask. Updatedp[mask | (1 << j)][j]if appendingjafterigives a shorter length. - Track Parent Pointers: Alongside the DP table, maintain a
parentarray to store which previous wordiled to the optimal length fordp[mask][j]. This allows reconstruction of the word order. - Reconstruct Superstring: Identify the endpoint
iindp[(1 << n) - 1][i]with minimal length. Backtrack throughparentto find the optimal order of words. Concatenate the words using the precomputed overlaps.
Why it works: Each DP state considers all subsets and ends with a specific word, ensuring that all combinations are covered without repetition. By using overlaps, we minimize the added length at each step. The parent pointers ensure we can reconstruct a valid superstring from the DP solution.
Python Solution
from typing import List
class Solution:
def shortestSuperstring(self, words: List[str]) -> str:
n = len(words)
overlap = [[0] * n for _ in range(n)]
# Compute maximum overlap between words[i] suffix and words[j] prefix
for i in range(n):
for j in range(n):
if i == j:
continue
m = min(len(words[i]), len(words[j]))
for k in range(m, 0, -1):
if words[i][-k:] == words[j][:k]:
overlap[i][j] = k
break
dp = [[0] * n for _ in range(1 << n)]
parent = [[-1] * n for _ in range(1 << n)]
# Fill DP
for mask in range(1, 1 << n):
for j in range(n):
if not (mask & (1 << j)):
continue
prev_mask = mask ^ (1 << j)
if prev_mask == 0:
dp[mask][j] = len(words[j])
else:
for i in range(n):
if not (prev_mask & (1 << i)):
continue
val = dp[prev_mask][i] + len(words[j]) - overlap[i][j]
if dp[mask][j] == 0 or val < dp[mask][j]:
dp[mask][j] = val
parent[mask][j] = i
# Reconstruct path
mask = (1 << n) - 1
j = min(range(n), key=lambda x: dp[mask][x])
path = []
while j != -1:
path.append(j)
j, mask = parent[mask][j], mask ^ (1 << path[-1])
path = path[::-1]
# Build superstring
res = words[path[0]]
for k in range(1, len(path)):
i, j = path[k - 1], path[k]
res += words[j][overlap[i][j]:]
return res
The Python solution first precomputes overlaps for efficient merging, fills a DP table using bitmasking to track minimal superstrings for each subset ending at each word, then reconstructs the optimal order to produce the final string.
Go Solution
func shortestSuperstring(words []string) string {
n := len(words)
overlap := make([][]int, n)
for i := range overlap {
overlap[i] = make([]int, n)
}
// Compute overlaps
for i := 0; i < n; i++ {
for j := 0; j < n; j++ {
if i == j {
continue
}
m := min(len(words[i]), len(words[j]))
for k := m; k > 0; k-- {
if words[i][len(words[i])-k:] == words[j][:k] {
overlap[i][j] = k
break
}
}
}
}
dp := make([][]int, 1<<n)
parent := make([][]int, 1<<n)
for i := range dp {
dp[i] = make([]int, n)
parent[i] = make([]int, n)
for j := range parent[i] {
parent[i][j] = -1
}
}
// Fill DP
for mask := 1; mask < 1<<n; mask++ {
for j := 0; j < n; j++ {
if mask&(1<<j) == 0 {
continue
}
prevMask := mask ^ (1 << j)
if prevMask == 0 {
dp[mask][j] = len(words[j])
} else {
for i := 0; i < n; i++ {
if prevMask&(1<<i) == 0 {
continue
}
val := dp[prevMask][i] + len(words[j]) - overlap[i][j]
if dp[mask][j] == 0 || val < dp[mask][j] {
dp[mask][j] = val
parent[mask][j] = i
}
}
}
}
}
// Reconstruct path
mask := (1 << n) - 1
j := 0
for i := 1; i < n; i++ {
if dp[mask][i] < dp[mask][j] {
j = i
}
}
path := []int{}
for j != -1 {
path = append(path,