LeetCode 943 - Find the Shortest Superstring

The problem asks us to find the shortest superstring that contains all the given strings in the array words as substrings.

LeetCode Problem 943

Difficulty: 🔴 Hard
Topics: Array, String, Dynamic Programming, Bit Manipulation, Bitmask

Solution

Problem Understanding

The problem asks us to find the shortest superstring that contains all the given strings in the array words as substrings. In other words, we need a single string that includes every string in words at least once, concatenated in some order, but we want this superstring to have minimal length. The problem guarantees that no string in words is a substring of another, which simplifies overlap handling.

The input words is an array of unique strings of lowercase English letters. The array length is between 1 and 12, and each string can be up to 20 characters long. The output is a single string, which is the shortest possible combination containing all words. Multiple outputs can be valid if multiple minimal-length superstrings exist.

Edge cases include: a single string in the array, strings that do not overlap at all, and strings that overlap completely except for a few characters. These situations test how overlaps are handled and whether the algorithm efficiently finds the shortest concatenation.

Approaches

Brute Force

The brute-force approach is to generate all permutations of the words and compute the resulting superstring for each permutation by greedily merging overlapping parts. For each pair of consecutive words in a permutation, you find the maximum overlap and append the non-overlapping suffix. After checking all permutations, you return the shortest superstring found.

This approach is correct because it considers every possible order of the strings, so it cannot miss the optimal arrangement. However, the factorial growth of permutations makes it infeasible for the upper bound of words.length = 12 because 12! is over 479 million possibilities. Even with a fast overlap calculation, this is too slow.

Optimal Approach

The key insight for an optimal approach is to use dynamic programming with bitmasking. Each subset of words can be represented by a bitmask, and we can compute the shortest superstring ending with each word for each subset. Specifically, define dp[mask][i] as the length of the shortest superstring that uses all words in mask and ends with word i. We can fill dp using previously computed smaller subsets and the maximum overlaps between words. Finally, we reconstruct the path by backtracking from the optimal endpoint.

This approach works efficiently because the number of states is 2^n * n (where n is the number of words), which is manageable for n <= 12.

Approach Time Complexity Space Complexity Notes
Brute Force O(n! * n^2 * L) O(n * L) Generates all permutations and merges words based on overlap. Too slow for n=12.
Optimal O(n^2 * 2^n + n^2 * L) O(n * 2^n + n^2) Uses DP with bitmasking, precomputes overlaps, and reconstructs the shortest superstring.

Algorithm Walkthrough

  1. Compute Overlaps: For each pair of words (i, j), compute the maximum suffix of words[i] that matches a prefix of words[j]. Store this in a 2D array overlap[i][j]. This allows quick calculation of how much extra string is needed when appending words[j] after words[i].
  2. Initialize DP Table: Create a DP table dp[mask][i] where mask represents a subset of words and i indicates the last word used. Initialize dp[1 << i][i] = len(words[i]) since a single-word superstring is just that word.
  3. Fill DP Table: Iterate over all masks in increasing order of bits set. For each mask and each i in mask, try to extend the superstring by appending another word j not in mask. Update dp[mask | (1 << j)][j] if appending j after i gives a shorter length.
  4. Track Parent Pointers: Alongside the DP table, maintain a parent array to store which previous word i led to the optimal length for dp[mask][j]. This allows reconstruction of the word order.
  5. Reconstruct Superstring: Identify the endpoint i in dp[(1 << n) - 1][i] with minimal length. Backtrack through parent to find the optimal order of words. Concatenate the words using the precomputed overlaps.

Why it works: Each DP state considers all subsets and ends with a specific word, ensuring that all combinations are covered without repetition. By using overlaps, we minimize the added length at each step. The parent pointers ensure we can reconstruct a valid superstring from the DP solution.

Python Solution

from typing import List

class Solution:
    def shortestSuperstring(self, words: List[str]) -> str:
        n = len(words)
        overlap = [[0] * n for _ in range(n)]
        
        # Compute maximum overlap between words[i] suffix and words[j] prefix
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                m = min(len(words[i]), len(words[j]))
                for k in range(m, 0, -1):
                    if words[i][-k:] == words[j][:k]:
                        overlap[i][j] = k
                        break
        
        dp = [[0] * n for _ in range(1 << n)]
        parent = [[-1] * n for _ in range(1 << n)]
        
        # Fill DP
        for mask in range(1, 1 << n):
            for j in range(n):
                if not (mask & (1 << j)):
                    continue
                prev_mask = mask ^ (1 << j)
                if prev_mask == 0:
                    dp[mask][j] = len(words[j])
                else:
                    for i in range(n):
                        if not (prev_mask & (1 << i)):
                            continue
                        val = dp[prev_mask][i] + len(words[j]) - overlap[i][j]
                        if dp[mask][j] == 0 or val < dp[mask][j]:
                            dp[mask][j] = val
                            parent[mask][j] = i
        
        # Reconstruct path
        mask = (1 << n) - 1
        j = min(range(n), key=lambda x: dp[mask][x])
        path = []
        while j != -1:
            path.append(j)
            j, mask = parent[mask][j], mask ^ (1 << path[-1])
        path = path[::-1]
        
        # Build superstring
        res = words[path[0]]
        for k in range(1, len(path)):
            i, j = path[k - 1], path[k]
            res += words[j][overlap[i][j]:]
        return res

The Python solution first precomputes overlaps for efficient merging, fills a DP table using bitmasking to track minimal superstrings for each subset ending at each word, then reconstructs the optimal order to produce the final string.

Go Solution

func shortestSuperstring(words []string) string {
    n := len(words)
    overlap := make([][]int, n)
    for i := range overlap {
        overlap[i] = make([]int, n)
    }

    // Compute overlaps
    for i := 0; i < n; i++ {
        for j := 0; j < n; j++ {
            if i == j {
                continue
            }
            m := min(len(words[i]), len(words[j]))
            for k := m; k > 0; k-- {
                if words[i][len(words[i])-k:] == words[j][:k] {
                    overlap[i][j] = k
                    break
                }
            }
        }
    }

    dp := make([][]int, 1<<n)
    parent := make([][]int, 1<<n)
    for i := range dp {
        dp[i] = make([]int, n)
        parent[i] = make([]int, n)
        for j := range parent[i] {
            parent[i][j] = -1
        }
    }

    // Fill DP
    for mask := 1; mask < 1<<n; mask++ {
        for j := 0; j < n; j++ {
            if mask&(1<<j) == 0 {
                continue
            }
            prevMask := mask ^ (1 << j)
            if prevMask == 0 {
                dp[mask][j] = len(words[j])
            } else {
                for i := 0; i < n; i++ {
                    if prevMask&(1<<i) == 0 {
                        continue
                    }
                    val := dp[prevMask][i] + len(words[j]) - overlap[i][j]
                    if dp[mask][j] == 0 || val < dp[mask][j] {
                        dp[mask][j] = val
                        parent[mask][j] = i
                    }
                }
            }
        }
    }

    // Reconstruct path
    mask := (1 << n) - 1
    j := 0
    for i := 1; i < n; i++ {
        if dp[mask][i] < dp[mask][j] {
            j = i
        }
    }

    path := []int{}
    for j != -1 {
        path = append(path,