Difficulty: 🔴 Hard
Topics: String, Dynamic Programming

Solution

Shortest Common Supersequence

Problem Understanding

The problem asks us to construct the shortest possible string that contains both str1 and str2 as subsequences.

A subsequence does not require characters to appear contiguously. Instead, the characters only need to appear in the same relative order. For example, "abc" is a subsequence of "axbxc" because we can delete the extra characters.

We are given two input strings:

str1
str2

We must return a single string such that:

str1 is a subsequence of the result.
str2 is also a subsequence of the result.
Among all valid possibilities, the returned string has minimum length.

If multiple shortest answers exist, returning any one of them is acceptable.

The key challenge is avoiding unnecessary duplication of characters. If both strings share common subsequences, we should reuse those shared characters instead of inserting duplicates.

The constraints are important:

1 <= str1.length, str2.length <= 1000
Both strings contain lowercase English letters.

Since each string can have length up to 1000, brute-force generation of all supersequences is completely infeasible. An efficient dynamic programming solution is required.

Several edge cases are important:

The strings may already be identical.
One string may already be a subsequence of the other.
The strings may share no common characters at all.
Large repeated-character inputs may expose inefficient implementations.

The problem strongly suggests a relationship with the Longest Common Subsequence problem, because shared subsequences allow us to minimize duplication.

Approaches

Brute Force Approach

A naive solution would attempt to generate all possible supersequences of the two strings and then select the shortest valid one.

One way to think about this is recursively deciding whether to take the next character from str1, the next character from str2, or both if they match.

This eventually explores an exponential number of possibilities because every mismatch creates branching recursion. Even memoization on the resulting strings would still be far too expensive for strings of length 1000.

The brute-force approach is correct because it explores every valid supersequence and selects the minimum-length one. However, its runtime grows exponentially and becomes impossible for the given constraints.

Key Insight

The important observation is that we only need to avoid duplicating the characters that both strings already share.

That immediately connects this problem to the Longest Common Subsequence, commonly abbreviated as LCS.

Suppose:

len(str1) = n
len(str2) = m
LCS length = k

Then the shortest common supersequence length is:

$|SCS| = n + m - k$

Why?

Because the LCS characters can be shared between both strings instead of duplicated.

So the optimal strategy becomes:

Compute the LCS.
Merge the two strings around the LCS characters.
Include non-common characters from both strings in order.

This transforms the problem into a standard dynamic programming solution with polynomial complexity.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(2^(n+m))	O(n+m) recursion	Explores all possible supersequences
Optimal	O(n × m)	O(n × m)	Uses LCS dynamic programming

Algorithm Walkthrough

Step 1: Build the LCS DP Table

Create a 2D DP table where:

dp[i][j] represents the length of the LCS between:
str1[:i]
str2[:j]

The recurrence is:

If characters match:
dp[i][j] = dp[i-1][j-1] + 1
Otherwise:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])

This table tells us the maximum reusable subsequence between the two strings.

Step 2: Reconstruct the LCS

Start from dp[n][m] and move backward through the table.

If characters match:

That character belongs to the LCS.
Move diagonally.

Otherwise:

Move toward the larger DP value.
This follows the optimal subsequence path.

Reverse the collected characters at the end because reconstruction happens backward.

Step 3: Merge the Strings Around the LCS

Now we construct the shortest common supersequence.

Maintain two pointers:

i for str1
j for str2

For each character c in the LCS:

Add all characters from str1 before c.
Add all characters from str2 before c.
Add c once.
Advance both pointers.

This preserves the relative order of both strings while avoiding duplication of shared characters.

Step 4: Append Remaining Characters

After processing the full LCS:

Append remaining characters from str1
Append remaining characters from str2

These characters are not shared and therefore must appear in the final answer.

Why it works

The algorithm works because the LCS represents the maximum number of characters that can be shared between the two strings.

Every character outside the LCS must appear separately in the supersequence. By merging both strings around the shared LCS characters, we guarantee:

Both strings remain subsequences.
Shared characters are included only once.
No shorter valid supersequence can exist.

Therefore, the constructed string is optimal.

Python Solution

class Solution:
    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
        n = len(str1)
        m = len(str2)

        # Step 1: Build LCS DP table
        dp = [[0] * (m + 1) for _ in range(n + 1)]

        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if str1[i - 1] == str2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

        # Step 2: Reconstruct LCS
        i = n
        j = m
        lcs_chars = []

        while i > 0 and j > 0:
            if str1[i - 1] == str2[j - 1]:
                lcs_chars.append(str1[i - 1])
                i -= 1
                j -= 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1

        lcs = "".join(reversed(lcs_chars))

        # Step 3: Build shortest common supersequence
        result = []

        i = 0
        j = 0

        for char in lcs:
            while str1[i] != char:
                result.append(str1[i])
                i += 1

            while str2[j] != char:
                result.append(str2[j])
                j += 1

            result.append(char)
            i += 1
            j += 1

        # Step 4: Append remaining characters
        result.append(str1[i:])
        result.append(str2[j:])

        return "".join(result)

The implementation begins by constructing the standard LCS dynamic programming table. Each cell stores the length of the best common subsequence up to that point.

After the table is complete, the code reconstructs the actual LCS by walking backward through the table. Whenever matching characters are found, they are added to the LCS.

Once the LCS is known, the algorithm merges the two strings. Characters that appear before the next LCS character are copied directly into the answer. When an LCS character is reached, it is added only once.

Finally, any leftover suffixes are appended.

This guarantees the resulting string contains both original strings as subsequences while remaining as short as possible.

Go Solution

func shortestCommonSupersequence(str1 string, str2 string) string {
	n := len(str1)
	m := len(str2)

	// Step 1: Build LCS DP table
	dp := make([][]int, n+1)
	for i := range dp {
		dp[i] = make([]int, m+1)
	}

	for i := 1; i <= n; i++ {
		for j := 1; j <= m; j++ {
			if str1[i-1] == str2[j-1] {
				dp[i][j] = dp[i-1][j-1] + 1
			} else {
				if dp[i-1][j] > dp[i][j-1] {
					dp[i][j] = dp[i-1][j]
				} else {
					dp[i][j] = dp[i][j-1]
				}
			}
		}
	}

	// Step 2: Reconstruct LCS
	i := n
	j := m
	lcsChars := []byte{}

	for i > 0 && j > 0 {
		if str1[i-1] == str2[j-1] {
			lcsChars = append(lcsChars, str1[i-1])
			i--
			j--
		} else if dp[i-1][j] >= dp[i][j-1] {
			i--
		} else {
			j--
		}
	}

	// Reverse LCS
	for left, right := 0, len(lcsChars)-1; left < right; {
		lcsChars[left], lcsChars[right] = lcsChars[right], lcsChars[left]
		left++
		right--
	}

	// Step 3: Build SCS
	result := []byte{}

	i = 0
	j = 0

	for _, ch := range lcsChars {
		for str1[i] != ch {
			result = append(result, str1[i])
			i++
		}

		for str2[j] != ch {
			result = append(result, str2[j])
			j++
		}

		result = append(result, byte(ch))
		i++
		j++
	}

	// Step 4: Append remaining characters
	result = append(result, str1[i:]...)
	result = append(result, str2[j:]...)

	return string(result)
}

The Go implementation follows exactly the same algorithmic structure as the Python solution.

The main difference is manual handling of slices and byte arrays. Since strings in Go are immutable, a []byte slice is used to efficiently build both the LCS and the final answer.

No integer overflow concerns exist because the maximum DP value is at most 1000.

Worked Examples

Example 1

str1 = "abac"
str2 = "cab"

Step 1: Build LCS Table

The LCS is "ab".

	c	a	b
a	0	1	1
b	0	1	2
a	0	1	2
c	1	1	2

Step 2: Merge Around the LCS

LCS = "ab"

Initial pointers:

Variable	Value
i	0
j	0
result	""

Process 'a':

str2[0] = 'c', append 'c'
Both reach 'a'
Append 'a'

Result becomes:

"ca"

Pointers:

Variable	Value
i	1
j	2

Process 'b':

Both already at 'b'
Append 'b'

Result:

"cab"

Pointers:

Variable	Value
i	2
j	3

Append remaining suffix from str1:

"ac"

Final answer:

"cabac"

Example 2

str1 = "aaaaaaaa"
str2 = "aaaaaaaa"

The LCS is the entire string:

"aaaaaaaa"

Since both strings are identical, every character is shared.

No extra characters need to be inserted.

Final answer:

"aaaaaaaa"

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n × m)	Building the LCS DP table dominates
Space	O(n × m)	The DP table stores all subproblem results

The DP table contains (n + 1) × (m + 1) cells, and each cell is computed in constant time.

Reconstructing the LCS and building the final supersequence each take linear time relative to the string lengths.

Therefore, the total complexity remains O(n × m).

Test Cases

class Solution:
    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
        n = len(str1)
        m = len(str2)

        dp = [[0] * (m + 1) for _ in range(n + 1)]

        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if str1[i - 1] == str2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

        i = n
        j = m
        lcs_chars = []

        while i > 0 and j > 0:
            if str1[i - 1] == str2[j - 1]:
                lcs_chars.append(str1[i - 1])
                i -= 1
                j -= 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1

        lcs = "".join(reversed(lcs_chars))

        result = []

        i = 0
        j = 0

        for char in lcs:
            while str1[i] != char:
                result.append(str1[i])
                i += 1

            while str2[j] != char:
                result.append(str2[j])
                j += 1

            result.append(char)
            i += 1
            j += 1

        result.append(str1[i:])
        result.append(str2[j:])

        return "".join(result)

def is_subsequence(sub, s):
    it = iter(s)
    return all(c in it for c in sub)

sol = Solution()

# Example 1
ans = sol.shortestCommonSupersequence("abac", "cab")
assert ans == "cabac"  # standard example

# Example 2
ans = sol.shortestCommonSupersequence("aaaaaaaa", "aaaaaaaa")
assert ans == "aaaaaaaa"  # identical strings

# No common characters
ans = sol.shortestCommonSupersequence("abc", "def")
assert len(ans) == 6  # no overlap possible
assert is_subsequence("abc", ans)
assert is_subsequence("def", ans)

# One string is subsequence of the other
ans = sol.shortestCommonSupersequence("abc", "abdbc")
assert ans == "abdbc"  # shorter string already embedded

# Single-character strings
ans = sol.shortestCommonSupersequence("a", "b")
assert len(ans) == 2
assert is_subsequence("a", ans)
assert is_subsequence("b", ans)

# Partial overlap
ans = sol.shortestCommonSupersequence("geek", "eke")
assert len(ans) == 5
assert is_subsequence("geek", ans)
assert is_subsequence("eke", ans)

# Repeated characters
ans = sol.shortestCommonSupersequence("aaaab", "baaab")
assert is_subsequence("aaaab", ans)
assert is_subsequence("baaab", ans)

# Long overlap
ans = sol.shortestCommonSupersequence("abcdef", "zabcyf")
assert is_subsequence("abcdef", ans)
assert is_subsequence("zabcyf", ans)

Test	Why
`"abac", "cab"`	Validates the main example
`"aaaaaaaa", "aaaaaaaa"`	Tests identical strings
`"abc", "def"`	Tests no common subsequence
`"abc", "abdbc"`	Tests subsequence containment
`"a", "b"`	Smallest possible input
`"geek", "eke"`	Tests partial overlap
`"aaaab", "baaab"`	Tests repeated characters
`"abcdef", "zabcyf"`	Tests nontrivial shared subsequences

Edge Cases

Identical Strings

If both strings are exactly the same, the shortest common supersequence is simply the original string itself.

This case can expose bugs where implementations accidentally duplicate shared characters during merging. Our implementation handles this naturally because the LCS becomes the entire string, so every character is shared exactly once.

No Common Characters

Consider:

str1 = "abc"
str2 = "def"

The LCS is empty.

In this situation, every character from both strings must appear independently in the result. The implementation correctly appends all characters from both strings because no shared merge points exist.

One String Already Contains the Other

Consider:

str1 = "abc"
str2 = "abdbc"

Here, "abc" is already a subsequence of "abdbc".

A buggy implementation might unnecessarily duplicate characters while merging. Our approach avoids this because the LCS reconstruction correctly identifies all reusable characters, allowing the longer string to remain unchanged.

Repeated Characters

Repeated characters often create ambiguity during LCS reconstruction.

For example:

str1 = "aaaab"
str2 = "baaab"

There are multiple valid LCS choices. The DP reconstruction ensures a valid LCS is selected consistently, and the merge process still produces an optimal shortest supersequence regardless of which valid LCS path is chosen.

LeetCode 1092 - Shortest Common Supersequence

Solution

Shortest Common Supersequence

Problem Understanding

Approaches

Brute Force Approach

Key Insight

Algorithm Walkthrough

Step 1: Build the LCS DP Table

Step 2: Reconstruct the LCS

Step 3: Merge the Strings Around the LCS

Step 4: Append Remaining Characters

Why it works

Python Solution

Go Solution

Worked Examples

Example 1

Step 1: Build LCS Table

Step 2: Merge Around the LCS

Example 2

Complexity Analysis

Test Cases

Edge Cases

Identical Strings

No Common Characters

One String Already Contains the Other

Repeated Characters