LeetCode 1759 - Count Number of Homogenous Substrings

The problem asks us to count all substrings in a string where every character inside the substring is identical. These are called homogenous substrings. A substring must be contiguous, meaning the characters must appear next to each other in the original string.

LeetCode Problem 1759

Difficulty: 🟡 Medium
Topics: Math, String

Solution

Problem Understanding

The problem asks us to count all substrings in a string where every character inside the substring is identical. These are called homogenous substrings.

A substring must be contiguous, meaning the characters must appear next to each other in the original string. For example, in the string "abbcccaa", the substring "ccc" is valid because all characters are 'c', while "bcc" is not homogenous because it contains different characters.

The input is a single string s consisting only of lowercase English letters. The output should be the total number of homogenous substrings, modulo 10^9 + 7.

The constraint 1 <= s.length <= 10^5 is important. It tells us that the solution must be efficient. Any algorithm with quadratic complexity, such as checking every possible substring individually, will likely be too slow for the upper limit.

The key observation is that homogenous substrings only come from consecutive runs of the same character. For example:

  • "aaa" contains:

  • "a" three times

  • "aa" two times

  • "aaa" one time

This gives a total of:

3 + 2 + 1 = 6

More generally, if a run has length k, then the number of homogenous substrings contributed by that run is:

k * (k + 1) / 2

Some important edge cases include:

  • A string with only one character
  • A string where all characters are different
  • A string where all characters are the same
  • Very long repeated runs, which can produce very large counts requiring modulo arithmetic

The problem guarantees that the string is non-empty and only contains lowercase letters, so we do not need to handle invalid input cases.

Approaches

Brute Force Approach

A straightforward solution is to generate every possible substring and check whether all characters inside it are the same.

There are O(n^2) substrings in a string of length n. For each substring, we may need up to O(n) work to verify that every character matches. Even with optimization, the brute force solution still requires at least O(n^2) time.

For example:

s = "abbcccaa"

We would examine:

  • "a"
  • "ab"
  • "abb"
  • "abbc"
  • and so on

For each substring, we would check if every character equals the first character.

This approach is correct because it explicitly evaluates every substring, but it is too slow for n = 100000.

Optimal Approach

The important insight is that homogenous substrings only depend on consecutive runs of identical characters.

Suppose we have a run:

"cccc"

Its length is 4.

The valid homogenous substrings are:

"c"    -> 4
"cc"   -> 3
"ccc"  -> 2
"cccc" -> 1

Total:

4 + 3 + 2 + 1 = 10

Instead of generating substrings explicitly, we can process the string once and track the current run length.

Every time we extend a run by one character, we gain exactly current_run_length new homogenous substrings ending at the current position.

For example:

s = "ccc"

Processing step by step:

  • first 'c' adds 1
  • second 'c' adds 2
  • third 'c' adds 3

Total:

1 + 2 + 3 = 6

This allows us to solve the problem in linear time.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) to O(n³) O(1) Checks every substring individually
Optimal O(n) O(1) Tracks consecutive character runs

Algorithm Walkthrough

  1. Initialize a variable result to store the total number of homogenous substrings.
  2. Initialize a variable current_run_length to 0. This variable tracks how many consecutive identical characters we have seen so far.
  3. Iterate through the string character by character.
  4. For each character:
  • If it matches the previous character, increment current_run_length.
  • Otherwise, reset current_run_length to 1 because a new run begins.
  1. Add current_run_length to result.
  • This works because every extension of a run creates new homogenous substrings ending at the current position.
  • If the run length is k, then there are exactly k homogenous substrings ending at this position.
  1. Apply modulo 10^9 + 7 after each addition to avoid integer overflow.
  2. Return the final result.

Why it works

At every index, the algorithm counts all homogenous substrings ending at that position.

If the current run length is k, then the valid substrings ending here are:

length 1
length 2
...
length k

There are exactly k such substrings. Since every homogenous substring ends somewhere in the string, summing these contributions over all positions counts every valid substring exactly once.

Python Solution

class Solution:
    def countHomogenous(self, s: str) -> int:
        MOD = 10**9 + 7

        result = 0
        current_run_length = 0

        for i, char in enumerate(s):
            if i > 0 and char == s[i - 1]:
                current_run_length += 1
            else:
                current_run_length = 1

            result = (result + current_run_length) % MOD

        return result

The implementation follows the exact logic from the algorithm walkthrough.

result stores the cumulative count of homogenous substrings.

current_run_length tracks the length of the current sequence of identical characters.

During iteration:

  • If the current character matches the previous one, the run continues and the length increases.
  • Otherwise, a new run begins, so the length resets to 1.

At each step, we add the current run length to the answer because that many homogenous substrings end at the current index.

Modulo arithmetic is applied during accumulation to keep the value within bounds.

Go Solution

func countHomogenous(s string) int {
    const MOD int = 1_000_000_007

    result := 0
    currentRunLength := 0

    for i := 0; i < len(s); i++ {
        if i > 0 && s[i] == s[i-1] {
            currentRunLength++
        } else {
            currentRunLength = 1
        }

        result = (result + currentRunLength) % MOD
    }

    return result
}

The Go implementation is nearly identical to the Python version.

Since Go strings are byte-indexable and the input only contains lowercase English letters, accessing characters with s[i] is safe and efficient.

The modulo operation is performed during accumulation to prevent integer overflow for large inputs.

Go does not require special handling for empty strings because the constraints guarantee at least one character.

Worked Examples

Example 1

Input: s = "abbcccaa"
Index Character Current Run Length Added to Result Total Result
0 a 1 1 1
1 b 1 1 2
2 b 2 2 4
3 c 1 1 5
4 c 2 2 7
5 c 3 3 10
6 a 1 1 11
7 a 2 2 13

Final answer:

13

Example 2

Input: s = "xy"
Index Character Current Run Length Added to Result Total Result
0 x 1 1 1
1 y 1 1 2

Final answer:

2

Example 3

Input: s = "zzzzz"
Index Character Current Run Length Added to Result Total Result
0 z 1 1 1
1 z 2 2 3
2 z 3 3 6
3 z 4 4 10
4 z 5 5 15

Final answer:

15

Complexity Analysis

Measure Complexity Explanation
Time O(n) The string is traversed exactly once
Space O(1) Only a few variables are used

The algorithm performs a single pass through the string. Each character is processed once, and all operations inside the loop are constant time operations.

No additional data structures proportional to the input size are required, so the space usage remains constant.

Test Cases

solution = Solution()

assert solution.countHomogenous("abbcccaa") == 13  # mixed runs
assert solution.countHomogenous("xy") == 2  # all distinct characters
assert solution.countHomogenous("zzzzz") == 15  # single large run
assert solution.countHomogenous("a") == 1  # minimum length input
assert solution.countHomogenous("aa") == 3  # simple repeated run
assert solution.countHomogenous("ababab") == 6  # alternating characters
assert solution.countHomogenous("cccccc") == 21  # longer repeated run
assert solution.countHomogenous("abcdddeeeeaabbbcd") == 30  # multiple varied runs
assert solution.countHomogenous("qwerty") == 6  # all unique characters
assert solution.countHomogenous("pppppppppp") == 55  # arithmetic series behavior

# stress style case
large_input = "a" * 100000
expected = (100000 * 100001 // 2) % (10**9 + 7)
assert solution.countHomogenous(large_input) == expected  # maximum constraint
Test Why
"abbcccaa" Standard mixed example
"xy" No repeated characters
"zzzzz" Entire string is one run
"a" Smallest possible input
"aa" Small repeated sequence
"ababab" Constantly resetting runs
"cccccc" Larger continuous run
"abcdddeeeeaabbbcd" Multiple run transitions
"qwerty" All characters unique
"pppppppppp" Verifies arithmetic growth
"a" * 100000 Maximum input size stress test

Edge Cases

Single Character String

A string containing only one character is the smallest valid input. A buggy implementation might incorrectly initialize counters and return 0 instead of 1.

Example:

s = "a"

The algorithm handles this correctly because the first character always starts a new run of length 1, contributing exactly one homogenous substring.

All Characters Different

When every character differs from the previous one, every run length is exactly 1.

Example:

s = "abcdef"

The only valid homogenous substrings are the individual characters themselves.

The implementation correctly resets current_run_length to 1 whenever the character changes.

Entire String Identical

A long run of identical characters can produce very large counts.

Example:

s = "aaaaa"

The number of homogenous substrings becomes:

1 + 2 + 3 + 4 + 5 = 15

For the maximum constraint size, this value becomes extremely large, so modulo arithmetic is necessary. The implementation applies modulo during accumulation, preventing overflow and ensuring correctness.