LeetCode 1092 - Shortest Common Supersequence
The problem asks us to construct the shortest possible string that contains both str1 and str2 as subsequences. A subsequence does not require characters to appear contiguously. Instead, the characters only need to appear in the same relative order.
Difficulty: 🔴 Hard
Topics: String, Dynamic Programming
Solution
Shortest Common Supersequence
Problem Understanding
The problem asks us to construct the shortest possible string that contains both str1 and str2 as subsequences.
A subsequence does not require characters to appear contiguously. Instead, the characters only need to appear in the same relative order. For example, "abc" is a subsequence of "axbxc" because we can delete the extra characters.
We are given two input strings:
str1str2
We must return a single string such that:
str1is a subsequence of the result.str2is also a subsequence of the result.- Among all valid possibilities, the returned string has minimum length.
If multiple shortest answers exist, returning any one of them is acceptable.
The key challenge is avoiding unnecessary duplication of characters. If both strings share common subsequences, we should reuse those shared characters instead of inserting duplicates.
The constraints are important:
1 <= str1.length, str2.length <= 1000- Both strings contain lowercase English letters.
Since each string can have length up to 1000, brute-force generation of all supersequences is completely infeasible. An efficient dynamic programming solution is required.
Several edge cases are important:
- The strings may already be identical.
- One string may already be a subsequence of the other.
- The strings may share no common characters at all.
- Large repeated-character inputs may expose inefficient implementations.
The problem strongly suggests a relationship with the Longest Common Subsequence problem, because shared subsequences allow us to minimize duplication.
Approaches
Brute Force Approach
A naive solution would attempt to generate all possible supersequences of the two strings and then select the shortest valid one.
One way to think about this is recursively deciding whether to take the next character from str1, the next character from str2, or both if they match.
This eventually explores an exponential number of possibilities because every mismatch creates branching recursion. Even memoization on the resulting strings would still be far too expensive for strings of length 1000.
The brute-force approach is correct because it explores every valid supersequence and selects the minimum-length one. However, its runtime grows exponentially and becomes impossible for the given constraints.
Key Insight
The important observation is that we only need to avoid duplicating the characters that both strings already share.
That immediately connects this problem to the Longest Common Subsequence, commonly abbreviated as LCS.
Suppose:
len(str1) = nlen(str2) = mLCS length = k
Then the shortest common supersequence length is:
$|SCS| = n + m - k$
Why?
Because the LCS characters can be shared between both strings instead of duplicated.
So the optimal strategy becomes:
- Compute the LCS.
- Merge the two strings around the LCS characters.
- Include non-common characters from both strings in order.
This transforms the problem into a standard dynamic programming solution with polynomial complexity.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^(n+m)) | O(n+m) recursion | Explores all possible supersequences |
| Optimal | O(n × m) | O(n × m) | Uses LCS dynamic programming |
Algorithm Walkthrough
Step 1: Build the LCS DP Table
Create a 2D DP table where:
-
dp[i][j]represents the length of the LCS between: -
str1[:i] -
str2[:j]
The recurrence is:
-
If characters match:
-
dp[i][j] = dp[i-1][j-1] + 1 -
Otherwise:
-
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
This table tells us the maximum reusable subsequence between the two strings.
Step 2: Reconstruct the LCS
Start from dp[n][m] and move backward through the table.
If characters match:
- That character belongs to the LCS.
- Move diagonally.
Otherwise:
- Move toward the larger DP value.
- This follows the optimal subsequence path.
Reverse the collected characters at the end because reconstruction happens backward.
Step 3: Merge the Strings Around the LCS
Now we construct the shortest common supersequence.
Maintain two pointers:
iforstr1jforstr2
For each character c in the LCS:
- Add all characters from
str1beforec. - Add all characters from
str2beforec. - Add
conce. - Advance both pointers.
This preserves the relative order of both strings while avoiding duplication of shared characters.
Step 4: Append Remaining Characters
After processing the full LCS:
- Append remaining characters from
str1 - Append remaining characters from
str2
These characters are not shared and therefore must appear in the final answer.
Why it works
The algorithm works because the LCS represents the maximum number of characters that can be shared between the two strings.
Every character outside the LCS must appear separately in the supersequence. By merging both strings around the shared LCS characters, we guarantee:
- Both strings remain subsequences.
- Shared characters are included only once.
- No shorter valid supersequence can exist.
Therefore, the constructed string is optimal.
Python Solution
class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
n = len(str1)
m = len(str2)
# Step 1: Build LCS DP table
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for j in range(1, m + 1):
if str1[i - 1] == str2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
# Step 2: Reconstruct LCS
i = n
j = m
lcs_chars = []
while i > 0 and j > 0:
if str1[i - 1] == str2[j - 1]:
lcs_chars.append(str1[i - 1])
i -= 1
j -= 1
elif dp[i - 1][j] >= dp[i][j - 1]:
i -= 1
else:
j -= 1
lcs = "".join(reversed(lcs_chars))
# Step 3: Build shortest common supersequence
result = []
i = 0
j = 0
for char in lcs:
while str1[i] != char:
result.append(str1[i])
i += 1
while str2[j] != char:
result.append(str2[j])
j += 1
result.append(char)
i += 1
j += 1
# Step 4: Append remaining characters
result.append(str1[i:])
result.append(str2[j:])
return "".join(result)
The implementation begins by constructing the standard LCS dynamic programming table. Each cell stores the length of the best common subsequence up to that point.
After the table is complete, the code reconstructs the actual LCS by walking backward through the table. Whenever matching characters are found, they are added to the LCS.
Once the LCS is known, the algorithm merges the two strings. Characters that appear before the next LCS character are copied directly into the answer. When an LCS character is reached, it is added only once.
Finally, any leftover suffixes are appended.
This guarantees the resulting string contains both original strings as subsequences while remaining as short as possible.
Go Solution
func shortestCommonSupersequence(str1 string, str2 string) string {
n := len(str1)
m := len(str2)
// Step 1: Build LCS DP table
dp := make([][]int, n+1)
for i := range dp {
dp[i] = make([]int, m+1)
}
for i := 1; i <= n; i++ {
for j := 1; j <= m; j++ {
if str1[i-1] == str2[j-1] {
dp[i][j] = dp[i-1][j-1] + 1
} else {
if dp[i-1][j] > dp[i][j-1] {
dp[i][j] = dp[i-1][j]
} else {
dp[i][j] = dp[i][j-1]
}
}
}
}
// Step 2: Reconstruct LCS
i := n
j := m
lcsChars := []byte{}
for i > 0 && j > 0 {
if str1[i-1] == str2[j-1] {
lcsChars = append(lcsChars, str1[i-1])
i--
j--
} else if dp[i-1][j] >= dp[i][j-1] {
i--
} else {
j--
}
}
// Reverse LCS
for left, right := 0, len(lcsChars)-1; left < right; {
lcsChars[left], lcsChars[right] = lcsChars[right], lcsChars[left]
left++
right--
}
// Step 3: Build SCS
result := []byte{}
i = 0
j = 0
for _, ch := range lcsChars {
for str1[i] != ch {
result = append(result, str1[i])
i++
}
for str2[j] != ch {
result = append(result, str2[j])
j++
}
result = append(result, byte(ch))
i++
j++
}
// Step 4: Append remaining characters
result = append(result, str1[i:]...)
result = append(result, str2[j:]...)
return string(result)
}
The Go implementation follows exactly the same algorithmic structure as the Python solution.
The main difference is manual handling of slices and byte arrays. Since strings in Go are immutable, a []byte slice is used to efficiently build both the LCS and the final answer.
No integer overflow concerns exist because the maximum DP value is at most 1000.
Worked Examples
Example 1
str1 = "abac"
str2 = "cab"
Step 1: Build LCS Table
The LCS is "ab".
| c | a | b | |
|---|---|---|---|
| a | 0 | 1 | 1 |
| b | 0 | 1 | 2 |
| a | 0 | 1 | 2 |
| c | 1 | 1 | 2 |
Step 2: Merge Around the LCS
LCS = "ab"
Initial pointers:
| Variable | Value |
|---|---|
| i | 0 |
| j | 0 |
| result | "" |
Process 'a':
str2[0] = 'c', append'c'- Both reach
'a' - Append
'a'
Result becomes:
"ca"
Pointers:
| Variable | Value |
|---|---|
| i | 1 |
| j | 2 |
Process 'b':
- Both already at
'b' - Append
'b'
Result:
"cab"
Pointers:
| Variable | Value |
|---|---|
| i | 2 |
| j | 3 |
Append remaining suffix from str1:
"ac"
Final answer:
"cabac"
Example 2
str1 = "aaaaaaaa"
str2 = "aaaaaaaa"
The LCS is the entire string:
"aaaaaaaa"
Since both strings are identical, every character is shared.
No extra characters need to be inserted.
Final answer:
"aaaaaaaa"
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n × m) | Building the LCS DP table dominates |
| Space | O(n × m) | The DP table stores all subproblem results |
The DP table contains (n + 1) × (m + 1) cells, and each cell is computed in constant time.
Reconstructing the LCS and building the final supersequence each take linear time relative to the string lengths.
Therefore, the total complexity remains O(n × m).
Test Cases
class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
n = len(str1)
m = len(str2)
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for j in range(1, m + 1):
if str1[i - 1] == str2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
i = n
j = m
lcs_chars = []
while i > 0 and j > 0:
if str1[i - 1] == str2[j - 1]:
lcs_chars.append(str1[i - 1])
i -= 1
j -= 1
elif dp[i - 1][j] >= dp[i][j - 1]:
i -= 1
else:
j -= 1
lcs = "".join(reversed(lcs_chars))
result = []
i = 0
j = 0
for char in lcs:
while str1[i] != char:
result.append(str1[i])
i += 1
while str2[j] != char:
result.append(str2[j])
j += 1
result.append(char)
i += 1
j += 1
result.append(str1[i:])
result.append(str2[j:])
return "".join(result)
def is_subsequence(sub, s):
it = iter(s)
return all(c in it for c in sub)
sol = Solution()
# Example 1
ans = sol.shortestCommonSupersequence("abac", "cab")
assert ans == "cabac" # standard example
# Example 2
ans = sol.shortestCommonSupersequence("aaaaaaaa", "aaaaaaaa")
assert ans == "aaaaaaaa" # identical strings
# No common characters
ans = sol.shortestCommonSupersequence("abc", "def")
assert len(ans) == 6 # no overlap possible
assert is_subsequence("abc", ans)
assert is_subsequence("def", ans)
# One string is subsequence of the other
ans = sol.shortestCommonSupersequence("abc", "abdbc")
assert ans == "abdbc" # shorter string already embedded
# Single-character strings
ans = sol.shortestCommonSupersequence("a", "b")
assert len(ans) == 2
assert is_subsequence("a", ans)
assert is_subsequence("b", ans)
# Partial overlap
ans = sol.shortestCommonSupersequence("geek", "eke")
assert len(ans) == 5
assert is_subsequence("geek", ans)
assert is_subsequence("eke", ans)
# Repeated characters
ans = sol.shortestCommonSupersequence("aaaab", "baaab")
assert is_subsequence("aaaab", ans)
assert is_subsequence("baaab", ans)
# Long overlap
ans = sol.shortestCommonSupersequence("abcdef", "zabcyf")
assert is_subsequence("abcdef", ans)
assert is_subsequence("zabcyf", ans)
| Test | Why |
|---|---|
"abac", "cab" |
Validates the main example |
"aaaaaaaa", "aaaaaaaa" |
Tests identical strings |
"abc", "def" |
Tests no common subsequence |
"abc", "abdbc" |
Tests subsequence containment |
"a", "b" |
Smallest possible input |
"geek", "eke" |
Tests partial overlap |
"aaaab", "baaab" |
Tests repeated characters |
"abcdef", "zabcyf" |
Tests nontrivial shared subsequences |
Edge Cases
Identical Strings
If both strings are exactly the same, the shortest common supersequence is simply the original string itself.
This case can expose bugs where implementations accidentally duplicate shared characters during merging. Our implementation handles this naturally because the LCS becomes the entire string, so every character is shared exactly once.
No Common Characters
Consider:
str1 = "abc"
str2 = "def"
The LCS is empty.
In this situation, every character from both strings must appear independently in the result. The implementation correctly appends all characters from both strings because no shared merge points exist.
One String Already Contains the Other
Consider:
str1 = "abc"
str2 = "abdbc"
Here, "abc" is already a subsequence of "abdbc".
A buggy implementation might unnecessarily duplicate characters while merging. Our approach avoids this because the LCS reconstruction correctly identifies all reusable characters, allowing the longer string to remain unchanged.
Repeated Characters
Repeated characters often create ambiguity during LCS reconstruction.
For example:
str1 = "aaaab"
str2 = "baaab"
There are multiple valid LCS choices. The DP reconstruction ensures a valid LCS is selected consistently, and the merge process still produces an optimal shortest supersequence regardless of which valid LCS path is chosen.