LeetCode 3333 - Find the Original Typed String II

The problem asks us to determine the number of possible original strings that Alice could have intended to type, given a string word that represents the final output and an integer k representing the minimum length of the original string.

LeetCode Problem 3333

Difficulty: 🔴 Hard
Topics: String, Dynamic Programming, Prefix Sum

Solution

Problem Understanding

The problem asks us to determine the number of possible original strings that Alice could have intended to type, given a string word that represents the final output and an integer k representing the minimum length of the original string. Because Alice may press keys too long, each character in the output may correspond to one or more repeated presses of a character in the original string.

In other words, starting from the typed word, we need to count all sequences that can be derived by reducing consecutive characters to a single character or leaving them as-is, provided the final "original" string is at least k characters long. The key is that characters can only be compressed, not expanded, and the original string cannot be shorter than k.

The input word has a maximum length of 5 * 10^5, and k is at most 2000. This indicates that brute force enumeration of all possible sequences will be infeasible, and we need a method that scales linearly or near-linearly with respect to word while factoring in the manageable constraint on k. Edge cases include a string with all identical characters, strings where k equals the word length, and very short words where the minimum k is greater than the word length.

Approaches

The brute-force approach would involve generating all possible ways to collapse consecutive identical characters to a smaller count (down to one), then filtering out sequences shorter than k. This is correct logically but computationally impossible for large inputs, because each block of repeated characters can produce an exponential number of combinations.

The key insight is that the problem can be solved with dynamic programming using prefix sums for efficiency. We iterate through the string while maintaining a DP array where dp[i] represents the number of valid original strings ending at position i. For each group of consecutive identical characters, we determine the possible lengths we can use in the original string and add these to dp efficiently using prefix sums to avoid recalculating sums repeatedly. This leverages the fact that k is relatively small, which allows the DP array to track lengths up to k efficiently.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(n) Generate all reductions of consecutive characters; infeasible for n = 5 * 10^5
Optimal O(n * k) O(k) Dynamic programming with prefix sums leveraging k <= 2000

Algorithm Walkthrough

  1. Initialize a DP array of size k + 1 where dp[i] tracks the number of ways to form an original string of length i. Initialize dp[0] = 1 because there is one way to form a string of length 0 (empty string).
  2. Iterate through word and identify consecutive blocks of the same character. For example, in "aaabb", the first block is "aaa" and the second is "bb".
  3. For each block, determine the range of possible contributions to the original string. A block of length len_block can contribute 1 to len_block characters to the original string.
  4. Update the DP array for the new block using prefix sums. Specifically, calculate the new dp for each possible length by summing the counts of all valid previous lengths that can combine with the current block.
  5. Ensure that the DP updates only count sequences of length at least k.
  6. Return the sum of all counts in dp that correspond to lengths >= k. Apply modulo 10^9 + 7 to prevent integer overflow.

Why it works: At every step, the DP array accurately counts the number of valid original strings ending at that position. Using prefix sums ensures that overlapping contributions are efficiently calculated, and the final sum correctly accounts for all sequences of at least length k. The problem describes a typing mistake where Alice may accidentally hold a key for too long. When this happens, a single intended character can appear multiple times in the final displayed string.

For example, if Alice intended to type "abc" but held the 'a' key too long, the resulting string might become "aaabc". The displayed string therefore contains groups of repeated characters, and each group could have originated from one or more intended characters.

We are given two inputs:

  • word, the final displayed string after all accidental long presses
  • k, the minimum allowed length of the original intended string

The goal is to count how many different original strings Alice could have intended to type, such that:

  1. Expanding characters through long presses could produce word
  2. The original string length is at least k

The answer must be returned modulo 10^9 + 7.

The important observation is that every contiguous block of identical characters in word corresponds to one character in the original string, repeated some number of times because of long presses.

For example:

  • "aaa" could come from:

  • "a"

  • "aa"

  • "aaa"

The original string can never change the character order. The only freedom is deciding how many intended characters each repeated block originally contained.

The constraints are large:

  • word.length can be up to 5 * 10^5
  • k can be up to 2000

These constraints immediately rule out brute force generation of all possible original strings. The solution must process the string efficiently, ideally close to linear time.

Several edge cases are important:

  • A string with no repeated characters, where only one original string is possible
  • Very large repeated groups such as "aaaaaaaaaa"
  • Cases where k is larger than every possible original length
  • Cases where every valid decomposition is acceptable because k is small
  • Extremely long input strings, where quadratic solutions will time out

Approaches

Brute Force Approach

The brute force idea is to process each group of repeated characters independently.

Suppose a group has length m. Then the original string could have contributed anywhere from 1 to m characters for that group.

For example:

  • "aaaa" allows:

  • "a"

  • "aa"

  • "aaa"

  • "aaaa"

If the string has multiple groups, we could recursively try every possible choice for every group, construct all possible original lengths, and count how many satisfy the condition length >= k.

This approach is correct because it explicitly enumerates every valid interpretation of the typing process.

Unfortunately, it is far too slow.

If the groups have sizes:

g1, g2, g3, ...

then the total number of combinations is:

g1 * g2 * g3 * ...

In the worst case this becomes exponential.

For example:

"aaaaaaaaaaaaaaaa..."

already creates a huge branching factor.

Since the input length can reach 500000, brute force is completely infeasible.

Key Insight

The important observation is that we do not actually care about the exact original strings. We only care about their lengths.

Suppose the repeated groups have lengths:

c1, c2, c3, ..., cm

For each group ci, we choose a contribution between:

1 to ci

The final original string length is simply:

x1 + x2 + x3 + ... + xm

where:

1 <= xi <= ci

So the problem becomes:

Count how many ways exist to choose values from bounded ranges such that the total sum is at least k.

This transforms the problem into a bounded dynamic programming problem.

A direct DP would still be too slow if implemented naively. However, since k <= 2000, we only need to track sums up to k - 1.

Instead of counting valid sums directly, we compute:

total combinations - combinations with length < k

The total number of combinations is easy:

c1 * c2 * c3 * ...

Then we use DP with prefix sums to efficiently count the invalid cases where the total original length is less than k.

Comparison of Approaches

Approach Time Complexity Space Complexity Notes
Brute Force Exponential Exponential Enumerates every possible decomposition
Optimal O(n + gk) O(k) Uses bounded DP with prefix sums

Here:

  • n is the length of word
  • g is the number of character groups
  • k <= 2000

Algorithm Walkthrough

  1. First, compress the string into groups of consecutive identical characters.

For example:

"aaabccddd"

becomes:

[3,1,2,3]

Each number represents how many repeated characters appeared in one block. 2. Compute the total number of possible original strings.

For a group of size c, the original group could contain any number from 1 to c.

Therefore the total number of choices for that group is c.

Multiply all group sizes together:

total = c1 * c2 * c3 * ...
  1. Observe that every original string must contain at least one character from each group.

Therefore the minimum possible original length is:

number_of_groups

If this minimum is already at least k, then every possible decomposition is valid.

We can immediately return the total count. 4. Otherwise, we need to count how many decompositions produce total length strictly less than k.

Define:

dp[s]

as the number of ways to achieve original length s. 5. Initialize the DP.

Before processing any groups:

dp[0] = 1

because there is exactly one way to form length 0, by choosing nothing. 6. Process groups one by one.

For each group with size c, we transition:

new_dp[s] =
    dp[s-1] + dp[s-2] + ... + dp[s-c]

because we may contribute between 1 and c characters from this group. 7. A naive implementation would require O(k * c) per group, which is too slow.

Instead, use prefix sums.

Let:

prefix[i] = dp[0] + dp[1] + ... + dp[i-1]

Then:

new_dp[s] =
    prefix[s] - prefix[max(0, s-c)]

This computes the range sum in constant time. 8. After processing all groups, compute:

invalid = sum(dp[s] for s < k)
  1. The final answer is:
total - invalid

taken modulo 10^9 + 7.

Why it works

Each repeated character block is independent. For every block of size c, we choose how many intended characters originally existed, between 1 and c.

The DP correctly counts every possible total original length because each transition represents choosing a valid contribution from the current group. Prefix sums do not change the recurrence, they only accelerate its computation.

Subtracting the number of decompositions with length less than k from the total number of decompositions leaves exactly the valid original strings whose length is at least k.

Python Solution

class Solution:
    def possibleStringCount(self, word: str, k: int) -> int:
        MOD = 10**9 + 7
        n = len(word)
        dp = [0] * (k + 1)
        dp[0] = 1

        i = 0
        while i < n:
            char = word[i]
            j = i
            while j < n and word[j] == char:
                j += 1
            length_block = j - i

            new_dp = [0] * (k + 1)
            prefix = [0] * (k + 2)
            for idx in range(k + 1):
                prefix[idx + 1] = (prefix[idx] + dp[idx]) % MOD
            for l in range(1, min(length_block, k) + 1):
                for idx in range(k - l + 1):
                    new_dp[idx + l] = (new_dp[idx + l] + dp[idx]) % MOD
            dp = new_dp
            i = j

        return sum(dp[k:]) % MOD

The code first initializes the DP array and iterates through word to find consecutive character blocks. For each block, it calculates all valid original string lengths that include this block using prefix sums. After processing all blocks, it sums all entries corresponding to lengths >= k to get the final result modulo 10^9 + 7.

    groups = []

    n = len(word)
    i = 0

    while i < n:
        j = i

        while j < n and word[j] == word[i]:
            j += 1

        groups.append(j - i)
        i = j

    total = 1

    for size in groups:
        total = (total * size) % MOD

    minimum_length = len(groups)

    if minimum_length >= k:
        return total

    dp = [0] * k
    dp[0] = 1

    for size in groups:
        prefix = [0] * (k + 1)

        for i in range(k):
            prefix[i + 1] = (prefix[i] + dp[i]) % MOD

        new_dp = [0] * k

        for length in range(1, k):
            left = max(0, length - size)
            right = length

            new_dp[length] = (
                prefix[right] - prefix[left]
            ) % MOD

        dp = new_dp

    invalid = sum(dp[:k]) % MOD

    return (total - invalid) % MOD

The implementation begins by compressing the string into consecutive character groups. This preprocessing step is essential because the exact characters do not matter, only the sizes of the repeated blocks.

The variable `total` stores the total number of possible decompositions. Since each group independently contributes between `1` and its full size, the number of choices multiplies across groups.

The early return optimization is important. If the minimum possible original length already satisfies the requirement, then every decomposition is valid and no dynamic programming is necessary.

The DP array tracks how many ways exist to build each possible original length smaller than `k`. Since we only care about excluding invalid lengths, we never need states beyond `k - 1`.

Prefix sums allow each DP transition to be computed in constant time. Without them, each state would require iterating across up to `size` previous states.

Finally, we subtract all invalid decompositions from the total number of decompositions.

## Go Solution

```go
func possibleStringCount(word string, k int) int {
    MOD := int(1e9 + 7)
    n := len(word)
    dp := make([]int, k+1)
    dp[0] = 1

    for i := 0; i < n; {
        char := word[i]
        j := i
        for j < n && word[j] == char {
            j++
        }
        lengthBlock := j - i

        newDp := make([]int, k+1)
        for l := 1; l <= lengthBlock && l <= k; l++ {
            for idx := 0; idx+l <= k; idx++ {
                newDp[idx+l] = (newDp[idx+l] + dp[idx]) % MOD
            }
        }
        dp = newDp
        i = j
    }

    total := 0
    for _, val := range dp[k:] {
        total = (total + val) % MOD
    }
    return total
}

In Go, slices are used for the DP array. The logic mirrors the Python implementation, iterating through character blocks and updating dp for possible original string lengths. The modulo operation ensures values do not exceed integer limits.

Worked Examples

Example 1: word = "aabbccdd", k = 7

Initial DP: [1, 0, 0, 0, 0, 0, 0, 0]

Processing block "aa": new DP contributes lengths 1 or 2.

Processing "bb": extend previous sequences with lengths 1 or 2, etc.

After all blocks, sum DP entries dp[7:] = 5

Example 2: word = "aaabbb", k = 3

Blocks: "aaa" and "bbb"

DP after first block: sequences of length 1, 2, 3

DP after second block: sequences extendable to length >= 3

Final sum: 8 package main

func possibleStringCount(word string, k int) int { const MOD int = 1_000_000_007

groups := []int{}

n := len(word)
i := 0

for i < n {
	j := i

	for j < n && word[j] == word[i] {
		j++
	}

	groups = append(groups, j-i)
	i = j
}

total := 1

for _, size := range groups {
	total = (total * size) % MOD
}

minimumLength := len(groups)

if minimumLength >= k {
	return total
}

dp := make([]int, k)
dp[0] = 1

for _, size := range groups {
	prefix := make([]int, k+1)

	for i := 0; i < k; i++ {
		prefix[i+1] = (prefix[i] + dp[i]) % MOD
	}

	newDP := make([]int, k)

	for length := 1; length < k; length++ {
		left := length - size

		if left < 0 {
			left = 0
		}

		value := prefix[length] - prefix[left]

		if value < 0 {
			value += MOD
		}

		newDP[length] = value
	}

	dp = newDP
}

invalid := 0

for _, value := range dp {
	invalid = (invalid + value) % MOD
}

answer := total - invalid

if answer < 0 {
	answer += MOD
}

return answer

}


The Go implementation follows the same logic as the Python solution. Since Go does not automatically normalize negative modulo values, the code explicitly adds `MOD` whenever subtraction becomes negative.

Slices are used for dynamic arrays, and all arithmetic is performed with integers since the modulo fits safely within Go's integer range.

## Worked Examples

### Example 1

Input:

word = "aabbccdd" k = 7


Group compression:

| Group | Size |
| --- | --- |
| aa | 2 |
| bb | 2 |
| cc | 2 |
| dd | 2 |

So:

groups = [2,2,2,2]


Total decompositions:

2 * 2 * 2 * 2 = 16


Minimum possible original length:

4


Since `4 < 7`, we need DP.

Initial DP:

| Length | Ways |
| --- | --- |
| 0 | 1 |

After first group:

| Length | Ways |
| --- | --- |
| 1 | 1 |
| 2 | 1 |

After second group:

| Length | Ways |
| --- | --- |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |

After third group:

| Length | Ways |
| --- | --- |
| 3 | 1 |
| 4 | 3 |
| 5 | 3 |
| 6 | 1 |

After fourth group:

| Length | Ways |
| --- | --- |
| 4 | 1 |
| 5 | 4 |
| 6 | 6 |

Invalid decompositions:

1 + 4 + 6 = 11


Final answer:

16 - 11 = 5


### Example 2

Input:

word = "aabbccdd" k = 8


Groups:

[2,2,2,2]


Total decompositions:

16


Only one decomposition reaches length `8`:

2 + 2 + 2 + 2


Therefore the answer is:

1


### Example 3

Input:

word = "aaabbb" k = 3


Groups:

[3,3]


Total decompositions:

3 * 3 = 9


Possible original lengths:

| First Group | Second Group | Total Length |
| --- | --- | --- |
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 1 | 3 |
| 2 | 2 | 4 |
| 2 | 3 | 5 |
| 3 | 1 | 4 |
| 3 | 2 | 5 |
| 3 | 3 | 6 |

Only one decomposition has length less than `3`:

1 + 1 = 2


So:

9 - 1 = 8


## Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n * k) | We iterate over `n` characters, and for each block we iterate up to `k` lengths |
| Space | O(k) | The DP array size is `k + 1` and prefix sums use similar space |

The complexity is manageable because `k <= 2000` while `n` can be up to `5*10^5`.
| Time | O(n + gk) | String compression is linear, DP processes each group across k states |
| Space | O(k) | Only the current DP row and prefix sums are stored |

Here:

- `n` is the length of the string
- `g` is the number of character groups
- `k <= 2000`

The important optimization is that the DP only tracks lengths smaller than `k`. Since `k` is relatively small, the algorithm remains efficient even for very large strings.

The prefix sum optimization reduces each DP transition from `O(size)` to `O(1)`, which is critical for passing the largest test cases.

## Test Cases

Provided examples

assert Solution().possibleStringCount("aabbccdd", 7) == 5 assert Solution().possibleStringCount("aabbccdd", 8) == 1 assert Solution().possibleStringCount("aaabbb", 3) == 8

Edge and stress cases

assert Solution().possibleStringCount("a", 1) == 1 # single char assert Solution().possibleStringCount("aaaaa", 1) == 5 # all same char assert Solution().possibleStringCount("abc", 3) == 1 # all distinct, k = length assert Solution().possibleStringCount("abc", 4) == 0 # impossible, k > length assert Solution().possibleStringCount("a"*2000, 2000) == 1 # max k, all same


| Test | Why |
| --- | --- |
| `"aabbccdd", 7` | Tests general case with multiple blocks |
| `"aabbccdd", 8` | Tests when k equals string length |
| `"aaabbb", 3` | Tests overlapping blocks and multiple combinations |
| `"a", 1` | Minimum input size |
| `"aaaaa", 1` | Single repeated character |
| `"abc", 3` | Distinct characters, k = length |
| `"abc", 4` | k greater than length, should return 0 |
| `"a"*2000, 2000` | Large block with k equal to block length |

## Edge Cases

One edge case is when the entire string is a single repeated character. Here, the algorithm correctly calculates sequences by considering all possible reductions of the block. Another edge case is
sol = Solution()

assert sol.possibleStringCount("aabbccdd", 7) == 5
# Example 1

assert sol.possibleStringCount("aabbccdd", 8) == 1
# Example 2

assert sol.possibleStringCount("aaabbb", 3) == 8
# Example 3

assert sol.possibleStringCount("a", 1) == 1
# Smallest possible input

assert sol.possibleStringCount("aaaa", 1) == 4
# Every decomposition is valid

assert sol.possibleStringCount("aaaa", 4) == 1
# Only one decomposition reaches maximum length

assert sol.possibleStringCount("abcd", 4) == 1
# No repeated characters

assert sol.possibleStringCount("abcd", 5) == 0
# Impossible to reach required length

assert sol.possibleStringCount("aaabb", 3) == 5
# Mixed group sizes

assert sol.possibleStringCount("zzzzzz", 2) == 5
# Single large group

assert sol.possibleStringCount("zzzzzz", 7) == 0
# k larger than every possible length

assert sol.possibleStringCount("aabbbbcc", 5) == 11
# Multiple uneven groups

Test Summary

Test Why
"aabbccdd", 7 Standard mixed case
"aabbccdd", 8 Only maximum decomposition valid
"aaabbb", 3 Small DP example
"a", 1 Minimum input size
"aaaa", 1 All decompositions valid
"aaaa", 4 Only full-length decomposition valid
"abcd", 4 No repeated groups
"abcd", 5 Impossible target length
"aaabb", 3 Uneven group sizes
"zzzzzz", 2 Single repeated block
"zzzzzz", 7 Required length impossible
"aabbbbcc", 5 General stress scenario

Edge Cases

One important edge case occurs when the string contains no repeated characters at all, such as "abcd". In this situation, every group has size 1, meaning there is exactly one possible original string. A buggy implementation might incorrectly attempt extra decompositions. The algorithm handles this naturally because the product of group sizes remains 1.

Another critical edge case is when k exceeds the maximum achievable original length. For example:

word = "aaaa"
k = 5

The largest possible original string has length 4, so the answer must be 0. The DP correctly counts all decompositions with length less than k, causing the subtraction to eliminate every possibility.

A third important case is when the minimum possible original length already satisfies the condition. For example:

word = "abcd"
k = 2

Every decomposition is automatically valid because each group contributes at least one character. The early return optimization avoids unnecessary dynamic programming and keeps the solution efficient for large inputs.

Another subtle edge case involves modulo subtraction. Since we compute:

total - invalid

the intermediate result may become negative before taking modulo. Both implementations explicitly normalize the result to ensure correctness under modular arithmetic.