LeetCode 115 - Distinct Subsequences

This problem asks us to count how many distinct subsequences of a string s are equal to another string t. A subsequence is formed by deleting zero or more characters from a string without changing the relative order of the remaining characters.

LeetCode Problem 115

Difficulty: 🔴 Hard
Topics: String, Dynamic Programming

Solution

Problem Understanding

This problem asks us to count how many distinct subsequences of a string s are equal to another string t.

A subsequence is formed by deleting zero or more characters from a string without changing the relative order of the remaining characters. For example, "rabbit" is a subsequence of "rabbbit" because we can remove one of the three 'b' characters while preserving order.

The key detail is that we are counting distinct ways to form t, not merely checking whether t can be formed. Different choices of indices in s count as separate subsequences, even if the resulting string looks identical.

The input consists of:

  • s, the source string from which subsequences are selected
  • t, the target string we want to match

The output is a single integer representing the total number of distinct subsequences of s that equal t.

For example:

  • s = "rabbbit" and t = "rabbit" returns 3 because there are three different ways to choose which 'b' to remove.
  • s = "babgbag" and t = "bag" returns 5 because multiple character combinations preserve the required order.

The constraints are important:

  • 1 <= s.length, t.length <= 1000
  • Both strings contain only English letters.

A string length of 1000 immediately rules out exponential brute-force solutions. Since every character can either be included or excluded, a naive approach would explore up to 2^1000 possibilities, which is computationally infeasible.

Several edge cases are worth considering upfront:

  • If t is longer than s, forming t is impossible, so the answer must be 0.
  • If s == t, there is exactly one valid subsequence, the entire string itself.
  • Repeated characters create overlapping possibilities and are easy to double count incorrectly.
  • Strings with no matching characters should correctly return 0.

The challenge is therefore to count combinations efficiently while avoiding recomputation.

Approaches

Brute Force Approach

The brute-force idea is to recursively explore every possible subsequence of s.

At each character in s, we make a binary choice:

  1. Skip the character.
  2. Include the character if it matches the next needed character in t.

Whenever we finish matching all characters in t, we count one valid subsequence. If we exhaust s before matching all of t, that path contributes zero.

This approach is correct because it systematically explores every valid subsequence and counts exactly those that match t.

However, it is far too slow. Since each character in s creates two branching choices, the worst-case complexity becomes exponential, roughly O(2^n), where n is the length of s. With strings of length 1000, this is impossible to compute within time limits.

The main inefficiency is repeated work. The same substring states are solved again and again. For example, the question:

"How many ways can s[i:] form t[j:]?"

may appear repeatedly through different recursive paths.

Key Insight

The key observation is that this problem has overlapping subproblems, making it a perfect candidate for dynamic programming.

Instead of recomputing the same state repeatedly, we define:

dp[i][j] = number of ways to form t[:j] using s[:i]

This allows us to reuse previously computed results.

For each character pair:

  • If s[i-1] == t[j-1], we have two choices:

  • Use this character match

  • Skip it

So:

dp[i][j] = dp[i-1][j-1] + dp[i-1][j]
  • If the characters do not match, we can only skip the current character:
dp[i][j] = dp[i-1][j]

This transforms an exponential problem into a polynomial one.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(n) Explores every subsequence recursively
Optimal Dynamic Programming O(n × m) O(m) Uses rolling DP to avoid recomputation

Where n = len(s) and m = len(t).

Algorithm Walkthrough

We use a 1D dynamic programming optimization to reduce memory usage.

Instead of storing an entire 2D table, we only keep one row because each state depends only on the previous row.

  1. Create a DP array of length m + 1, where m = len(t).

dp[j] represents the number of ways to form the first j characters of t. 2. Initialize the base case.

There is exactly one way to form an empty target string: choose nothing.

Therefore:

dp[0] = 1

All other positions begin at 0. 3. Iterate through characters in s.

For every character in s, we attempt to update matches against t. 4. Traverse t backwards.

This step is crucial.

We iterate from right to left because each update depends on the previous row's values. If we move left to right, we would overwrite information too early. 5. Update matching characters.

If:

s[i] == t[j]

then:

dp[j + 1] += dp[j]

This means:

  • dp[j] counts ways to form the prefix before the current character
  • We extend those ways using the matching character
  1. Continue until all characters are processed.
  2. Return dp[m].

This stores the total number of ways to form the full target string.

Why it works

The invariant is that after processing the first i characters of s, dp[j] correctly stores the number of ways to form the first j characters of t.

Each matching character gives two possibilities: use it or skip it. Each non-matching character can only be skipped. Because every valid subsequence is counted exactly once, and because DP preserves counts for all prefixes, the final answer is correct.

Python Solution

class Solution:
    def numDistinct(self, s: str, t: str) -> int:
        n = len(s)
        m = len(t)

        if m > n:
            return 0

        dp = [0] * (m + 1)
        dp[0] = 1

        for char_s in s:
            for j in range(m - 1, -1, -1):
                if char_s == t[j]:
                    dp[j + 1] += dp[j]

        return dp[m]

The implementation begins by checking whether t is longer than s. If it is, forming t is impossible.

The dp array is initialized with m + 1 entries. dp[0] is set to 1 because there is always one way to create an empty string.

The outer loop iterates through every character in s. For each character, we iterate backward through t.

Backward traversal is essential because it prevents overwriting values needed later in the same iteration. When characters match, we extend all previously valid subsequences:

dp[j + 1] += dp[j]

At the end, dp[m] contains the total number of distinct subsequences matching t.

Go Solution

func numDistinct(s string, t string) int {
	n := len(s)
	m := len(t)

	if m > n {
		return 0
	}

	dp := make([]int, m+1)
	dp[0] = 1

	for i := 0; i < n; i++ {
		for j := m - 1; j >= 0; j-- {
			if s[i] == t[j] {
				dp[j+1] += dp[j]
			}
		}
	}

	return dp[m]
}

The Go implementation mirrors the Python logic closely.

The DP slice is initialized using make([]int, m+1). Since Go initializes integers to 0 automatically, only dp[0] must be explicitly set to 1.

Backward iteration is equally important in Go for avoiding accidental overwrites.

Although Go uses int, the problem guarantees the answer fits within a 32-bit signed integer, so overflow is not a concern.

Worked Examples

Example 1

s = "rabbbit"
t = "rabbit"

Target length is 6.

Initial DP:

Step Character DP State
Initial - [1,0,0,0,0,0,0]

Process 'r':

Character DP
r [1,1,0,0,0,0,0]

Process 'a':

Character DP
a [1,1,1,0,0,0,0]

Process first 'b':

Character DP
b [1,1,1,1,0,0,0]

Process second 'b':

Character DP
b [1,1,1,2,1,0,0]

Process third 'b':

Character DP
b [1,1,1,3,3,0,0]

Process 'i':

Character DP
i [1,1,1,3,3,3,0]

Process 't':

Character DP
t [1,1,1,3,3,3,3]

Final answer:

3

Example 2

s = "babgbag"
t = "bag"

Initial:

Step DP
Initial [1,0,0,0]

After 'b':

[1,1,0,0]

After 'a':

[1,1,1,0]

After second 'b':

[1,2,1,0]

After 'g':

[1,2,1,1]

After third 'b':

[1,3,1,1]

After second 'a':

[1,3,4,1]

After second 'g':

[1,3,4,5]

Final answer:

5

Complexity Analysis

Measure Complexity Explanation
Time O(n × m) Every character in s is compared against every character in t
Space O(m) Only a single DP array of size m + 1 is maintained

The time complexity comes from the nested loops. For every character in s, we scan through t once.

The space complexity is optimized to O(m) because we only keep one DP row rather than a full n × m table.

Test Cases

solution = Solution()

assert solution.numDistinct("rabbbit", "rabbit") == 3  # provided example
assert solution.numDistinct("babgbag", "bag") == 5  # provided example

assert solution.numDistinct("abc", "abc") == 1  # identical strings
assert solution.numDistinct("abc", "abcd") == 0  # target longer than source
assert solution.numDistinct("abc", "d") == 0  # no matching characters
assert solution.numDistinct("aaaaa", "aa") == 10  # repeated characters
assert solution.numDistinct("aaa", "aaaa") == 0  # impossible due to length
assert solution.numDistinct("a", "a") == 1  # smallest valid match
assert solution.numDistinct("a", "b") == 0  # smallest invalid match
assert solution.numDistinct("banana", "ban") == 3  # multiple combinations
assert solution.numDistinct("ABCDE", "ACE") == 1  # ordered subsequence
assert solution.numDistinct("bbb", "bb") == 3  # repeated character counting
Test Why
"rabbbit", "rabbit" Verifies repeated-character counting
"babgbag", "bag" Confirms overlapping subsequences
"abc", "abc" Exact match case
"abc", "abcd" Target longer than source
"abc", "d" No valid subsequence
"aaaaa", "aa" Large combinatorial counting
"aaa", "aaaa" Impossible construction
"a", "a" Smallest valid input
"a", "b" Smallest invalid input
"banana", "ban" Multiple branching paths
"ABCDE", "ACE" Standard subsequence behavior
"bbb", "bb" Repeated characters and combinations

Edge Cases

Target Longer Than Source

When t is longer than s, it is impossible to form the target because subsequences can only remove characters, not add them.

For example:

s = "abc"
t = "abcd"

A naive implementation may still run unnecessary DP work. Our solution handles this immediately with:

if m > n:
    return 0

This avoids wasted computation.

Repeated Characters

Repeated characters are the hardest part of this problem because they create overlapping combinations.

For example:

s = "aaaaa"
t = "aa"

The answer is 10, not 1, because every distinct pair of positions counts separately.

A naive recursive solution can easily double count or miss combinations. The DP transition correctly accumulates all possibilities:

dp[j + 1] += dp[j]

This guarantees every valid combination is counted exactly once.

No Matching Characters

If s contains none of the characters required for t, the answer should be zero.

For example:

s = "abc"
t = "xyz"

Since no matches occur, the DP array never updates beyond:

[1,0,0,0]

The algorithm naturally returns 0 without any special-case logic.

Exact Match

When s and t are identical:

s = "abc"
t = "abc"

There is exactly one valid subsequence, choosing every character.

The DP formulation handles this naturally because each matching character extends exactly one existing path, leading to a final count of 1.