LeetCode 115 - Distinct Subsequences
This problem asks us to count how many distinct subsequences of a string s are equal to another string t. A subsequence is formed by deleting zero or more characters from a string without changing the relative order of the remaining characters.
Difficulty: 🔴 Hard
Topics: String, Dynamic Programming
Solution
Problem Understanding
This problem asks us to count how many distinct subsequences of a string s are equal to another string t.
A subsequence is formed by deleting zero or more characters from a string without changing the relative order of the remaining characters. For example, "rabbit" is a subsequence of "rabbbit" because we can remove one of the three 'b' characters while preserving order.
The key detail is that we are counting distinct ways to form t, not merely checking whether t can be formed. Different choices of indices in s count as separate subsequences, even if the resulting string looks identical.
The input consists of:
s, the source string from which subsequences are selectedt, the target string we want to match
The output is a single integer representing the total number of distinct subsequences of s that equal t.
For example:
s = "rabbbit"andt = "rabbit"returns3because there are three different ways to choose which'b'to remove.s = "babgbag"andt = "bag"returns5because multiple character combinations preserve the required order.
The constraints are important:
1 <= s.length, t.length <= 1000- Both strings contain only English letters.
A string length of 1000 immediately rules out exponential brute-force solutions. Since every character can either be included or excluded, a naive approach would explore up to 2^1000 possibilities, which is computationally infeasible.
Several edge cases are worth considering upfront:
- If
tis longer thans, formingtis impossible, so the answer must be0. - If
s == t, there is exactly one valid subsequence, the entire string itself. - Repeated characters create overlapping possibilities and are easy to double count incorrectly.
- Strings with no matching characters should correctly return
0.
The challenge is therefore to count combinations efficiently while avoiding recomputation.
Approaches
Brute Force Approach
The brute-force idea is to recursively explore every possible subsequence of s.
At each character in s, we make a binary choice:
- Skip the character.
- Include the character if it matches the next needed character in
t.
Whenever we finish matching all characters in t, we count one valid subsequence. If we exhaust s before matching all of t, that path contributes zero.
This approach is correct because it systematically explores every valid subsequence and counts exactly those that match t.
However, it is far too slow. Since each character in s creates two branching choices, the worst-case complexity becomes exponential, roughly O(2^n), where n is the length of s. With strings of length 1000, this is impossible to compute within time limits.
The main inefficiency is repeated work. The same substring states are solved again and again. For example, the question:
"How many ways can
s[i:]formt[j:]?"
may appear repeatedly through different recursive paths.
Key Insight
The key observation is that this problem has overlapping subproblems, making it a perfect candidate for dynamic programming.
Instead of recomputing the same state repeatedly, we define:
dp[i][j] = number of ways to form t[:j] using s[:i]
This allows us to reuse previously computed results.
For each character pair:
-
If
s[i-1] == t[j-1], we have two choices: -
Use this character match
-
Skip it
So:
dp[i][j] = dp[i-1][j-1] + dp[i-1][j]
- If the characters do not match, we can only skip the current character:
dp[i][j] = dp[i-1][j]
This transforms an exponential problem into a polynomial one.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(n) | Explores every subsequence recursively |
| Optimal Dynamic Programming | O(n × m) | O(m) | Uses rolling DP to avoid recomputation |
Where n = len(s) and m = len(t).
Algorithm Walkthrough
We use a 1D dynamic programming optimization to reduce memory usage.
Instead of storing an entire 2D table, we only keep one row because each state depends only on the previous row.
- Create a DP array of length
m + 1, wherem = len(t).
dp[j] represents the number of ways to form the first j characters of t.
2. Initialize the base case.
There is exactly one way to form an empty target string: choose nothing.
Therefore:
dp[0] = 1
All other positions begin at 0.
3. Iterate through characters in s.
For every character in s, we attempt to update matches against t.
4. Traverse t backwards.
This step is crucial.
We iterate from right to left because each update depends on the previous row's values. If we move left to right, we would overwrite information too early. 5. Update matching characters.
If:
s[i] == t[j]
then:
dp[j + 1] += dp[j]
This means:
dp[j]counts ways to form the prefix before the current character- We extend those ways using the matching character
- Continue until all characters are processed.
- Return
dp[m].
This stores the total number of ways to form the full target string.
Why it works
The invariant is that after processing the first i characters of s, dp[j] correctly stores the number of ways to form the first j characters of t.
Each matching character gives two possibilities: use it or skip it. Each non-matching character can only be skipped. Because every valid subsequence is counted exactly once, and because DP preserves counts for all prefixes, the final answer is correct.
Python Solution
class Solution:
def numDistinct(self, s: str, t: str) -> int:
n = len(s)
m = len(t)
if m > n:
return 0
dp = [0] * (m + 1)
dp[0] = 1
for char_s in s:
for j in range(m - 1, -1, -1):
if char_s == t[j]:
dp[j + 1] += dp[j]
return dp[m]
The implementation begins by checking whether t is longer than s. If it is, forming t is impossible.
The dp array is initialized with m + 1 entries. dp[0] is set to 1 because there is always one way to create an empty string.
The outer loop iterates through every character in s. For each character, we iterate backward through t.
Backward traversal is essential because it prevents overwriting values needed later in the same iteration. When characters match, we extend all previously valid subsequences:
dp[j + 1] += dp[j]
At the end, dp[m] contains the total number of distinct subsequences matching t.
Go Solution
func numDistinct(s string, t string) int {
n := len(s)
m := len(t)
if m > n {
return 0
}
dp := make([]int, m+1)
dp[0] = 1
for i := 0; i < n; i++ {
for j := m - 1; j >= 0; j-- {
if s[i] == t[j] {
dp[j+1] += dp[j]
}
}
}
return dp[m]
}
The Go implementation mirrors the Python logic closely.
The DP slice is initialized using make([]int, m+1). Since Go initializes integers to 0 automatically, only dp[0] must be explicitly set to 1.
Backward iteration is equally important in Go for avoiding accidental overwrites.
Although Go uses int, the problem guarantees the answer fits within a 32-bit signed integer, so overflow is not a concern.
Worked Examples
Example 1
s = "rabbbit"
t = "rabbit"
Target length is 6.
Initial DP:
| Step | Character | DP State |
|---|---|---|
| Initial | - | [1,0,0,0,0,0,0] |
Process 'r':
| Character | DP |
|---|---|
| r | [1,1,0,0,0,0,0] |
Process 'a':
| Character | DP |
|---|---|
| a | [1,1,1,0,0,0,0] |
Process first 'b':
| Character | DP |
|---|---|
| b | [1,1,1,1,0,0,0] |
Process second 'b':
| Character | DP |
|---|---|
| b | [1,1,1,2,1,0,0] |
Process third 'b':
| Character | DP |
|---|---|
| b | [1,1,1,3,3,0,0] |
Process 'i':
| Character | DP |
|---|---|
| i | [1,1,1,3,3,3,0] |
Process 't':
| Character | DP |
|---|---|
| t | [1,1,1,3,3,3,3] |
Final answer:
3
Example 2
s = "babgbag"
t = "bag"
Initial:
| Step | DP |
|---|---|
| Initial | [1,0,0,0] |
After 'b':
[1,1,0,0]
After 'a':
[1,1,1,0]
After second 'b':
[1,2,1,0]
After 'g':
[1,2,1,1]
After third 'b':
[1,3,1,1]
After second 'a':
[1,3,4,1]
After second 'g':
[1,3,4,5]
Final answer:
5
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n × m) | Every character in s is compared against every character in t |
| Space | O(m) | Only a single DP array of size m + 1 is maintained |
The time complexity comes from the nested loops. For every character in s, we scan through t once.
The space complexity is optimized to O(m) because we only keep one DP row rather than a full n × m table.
Test Cases
solution = Solution()
assert solution.numDistinct("rabbbit", "rabbit") == 3 # provided example
assert solution.numDistinct("babgbag", "bag") == 5 # provided example
assert solution.numDistinct("abc", "abc") == 1 # identical strings
assert solution.numDistinct("abc", "abcd") == 0 # target longer than source
assert solution.numDistinct("abc", "d") == 0 # no matching characters
assert solution.numDistinct("aaaaa", "aa") == 10 # repeated characters
assert solution.numDistinct("aaa", "aaaa") == 0 # impossible due to length
assert solution.numDistinct("a", "a") == 1 # smallest valid match
assert solution.numDistinct("a", "b") == 0 # smallest invalid match
assert solution.numDistinct("banana", "ban") == 3 # multiple combinations
assert solution.numDistinct("ABCDE", "ACE") == 1 # ordered subsequence
assert solution.numDistinct("bbb", "bb") == 3 # repeated character counting
| Test | Why |
|---|---|
"rabbbit", "rabbit" |
Verifies repeated-character counting |
"babgbag", "bag" |
Confirms overlapping subsequences |
"abc", "abc" |
Exact match case |
"abc", "abcd" |
Target longer than source |
"abc", "d" |
No valid subsequence |
"aaaaa", "aa" |
Large combinatorial counting |
"aaa", "aaaa" |
Impossible construction |
"a", "a" |
Smallest valid input |
"a", "b" |
Smallest invalid input |
"banana", "ban" |
Multiple branching paths |
"ABCDE", "ACE" |
Standard subsequence behavior |
"bbb", "bb" |
Repeated characters and combinations |
Edge Cases
Target Longer Than Source
When t is longer than s, it is impossible to form the target because subsequences can only remove characters, not add them.
For example:
s = "abc"
t = "abcd"
A naive implementation may still run unnecessary DP work. Our solution handles this immediately with:
if m > n:
return 0
This avoids wasted computation.
Repeated Characters
Repeated characters are the hardest part of this problem because they create overlapping combinations.
For example:
s = "aaaaa"
t = "aa"
The answer is 10, not 1, because every distinct pair of positions counts separately.
A naive recursive solution can easily double count or miss combinations. The DP transition correctly accumulates all possibilities:
dp[j + 1] += dp[j]
This guarantees every valid combination is counted exactly once.
No Matching Characters
If s contains none of the characters required for t, the answer should be zero.
For example:
s = "abc"
t = "xyz"
Since no matches occur, the DP array never updates beyond:
[1,0,0,0]
The algorithm naturally returns 0 without any special-case logic.
Exact Match
When s and t are identical:
s = "abc"
t = "abc"
There is exactly one valid subsequence, choosing every character.
The DP formulation handles this naturally because each matching character extends exactly one existing path, leading to a final count of 1.