LeetCode 1759 - Count Number of Homogenous Substrings
The problem asks us to count all substrings in a string where every character inside the substring is identical. These are called homogenous substrings. A substring must be contiguous, meaning the characters must appear next to each other in the original string.
Difficulty: 🟡 Medium
Topics: Math, String
Solution
Problem Understanding
The problem asks us to count all substrings in a string where every character inside the substring is identical. These are called homogenous substrings.
A substring must be contiguous, meaning the characters must appear next to each other in the original string. For example, in the string "abbcccaa", the substring "ccc" is valid because all characters are 'c', while "bcc" is not homogenous because it contains different characters.
The input is a single string s consisting only of lowercase English letters. The output should be the total number of homogenous substrings, modulo 10^9 + 7.
The constraint 1 <= s.length <= 10^5 is important. It tells us that the solution must be efficient. Any algorithm with quadratic complexity, such as checking every possible substring individually, will likely be too slow for the upper limit.
The key observation is that homogenous substrings only come from consecutive runs of the same character. For example:
-
"aaa"contains: -
"a"three times -
"aa"two times -
"aaa"one time
This gives a total of:
3 + 2 + 1 = 6
More generally, if a run has length k, then the number of homogenous substrings contributed by that run is:
k * (k + 1) / 2
Some important edge cases include:
- A string with only one character
- A string where all characters are different
- A string where all characters are the same
- Very long repeated runs, which can produce very large counts requiring modulo arithmetic
The problem guarantees that the string is non-empty and only contains lowercase letters, so we do not need to handle invalid input cases.
Approaches
Brute Force Approach
A straightforward solution is to generate every possible substring and check whether all characters inside it are the same.
There are O(n^2) substrings in a string of length n. For each substring, we may need up to O(n) work to verify that every character matches. Even with optimization, the brute force solution still requires at least O(n^2) time.
For example:
s = "abbcccaa"
We would examine:
"a""ab""abb""abbc"- and so on
For each substring, we would check if every character equals the first character.
This approach is correct because it explicitly evaluates every substring, but it is too slow for n = 100000.
Optimal Approach
The important insight is that homogenous substrings only depend on consecutive runs of identical characters.
Suppose we have a run:
"cccc"
Its length is 4.
The valid homogenous substrings are:
"c" -> 4
"cc" -> 3
"ccc" -> 2
"cccc" -> 1
Total:
4 + 3 + 2 + 1 = 10
Instead of generating substrings explicitly, we can process the string once and track the current run length.
Every time we extend a run by one character, we gain exactly current_run_length new homogenous substrings ending at the current position.
For example:
s = "ccc"
Processing step by step:
- first
'c'adds1 - second
'c'adds2 - third
'c'adds3
Total:
1 + 2 + 3 = 6
This allows us to solve the problem in linear time.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) to O(n³) | O(1) | Checks every substring individually |
| Optimal | O(n) | O(1) | Tracks consecutive character runs |
Algorithm Walkthrough
- Initialize a variable
resultto store the total number of homogenous substrings. - Initialize a variable
current_run_lengthto0. This variable tracks how many consecutive identical characters we have seen so far. - Iterate through the string character by character.
- For each character:
- If it matches the previous character, increment
current_run_length. - Otherwise, reset
current_run_lengthto1because a new run begins.
- Add
current_run_lengthtoresult.
- This works because every extension of a run creates new homogenous substrings ending at the current position.
- If the run length is
k, then there are exactlykhomogenous substrings ending at this position.
- Apply modulo
10^9 + 7after each addition to avoid integer overflow. - Return the final result.
Why it works
At every index, the algorithm counts all homogenous substrings ending at that position.
If the current run length is k, then the valid substrings ending here are:
length 1
length 2
...
length k
There are exactly k such substrings. Since every homogenous substring ends somewhere in the string, summing these contributions over all positions counts every valid substring exactly once.
Python Solution
class Solution:
def countHomogenous(self, s: str) -> int:
MOD = 10**9 + 7
result = 0
current_run_length = 0
for i, char in enumerate(s):
if i > 0 and char == s[i - 1]:
current_run_length += 1
else:
current_run_length = 1
result = (result + current_run_length) % MOD
return result
The implementation follows the exact logic from the algorithm walkthrough.
result stores the cumulative count of homogenous substrings.
current_run_length tracks the length of the current sequence of identical characters.
During iteration:
- If the current character matches the previous one, the run continues and the length increases.
- Otherwise, a new run begins, so the length resets to
1.
At each step, we add the current run length to the answer because that many homogenous substrings end at the current index.
Modulo arithmetic is applied during accumulation to keep the value within bounds.
Go Solution
func countHomogenous(s string) int {
const MOD int = 1_000_000_007
result := 0
currentRunLength := 0
for i := 0; i < len(s); i++ {
if i > 0 && s[i] == s[i-1] {
currentRunLength++
} else {
currentRunLength = 1
}
result = (result + currentRunLength) % MOD
}
return result
}
The Go implementation is nearly identical to the Python version.
Since Go strings are byte-indexable and the input only contains lowercase English letters, accessing characters with s[i] is safe and efficient.
The modulo operation is performed during accumulation to prevent integer overflow for large inputs.
Go does not require special handling for empty strings because the constraints guarantee at least one character.
Worked Examples
Example 1
Input: s = "abbcccaa"
| Index | Character | Current Run Length | Added to Result | Total Result |
|---|---|---|---|---|
| 0 | a | 1 | 1 | 1 |
| 1 | b | 1 | 1 | 2 |
| 2 | b | 2 | 2 | 4 |
| 3 | c | 1 | 1 | 5 |
| 4 | c | 2 | 2 | 7 |
| 5 | c | 3 | 3 | 10 |
| 6 | a | 1 | 1 | 11 |
| 7 | a | 2 | 2 | 13 |
Final answer:
13
Example 2
Input: s = "xy"
| Index | Character | Current Run Length | Added to Result | Total Result |
|---|---|---|---|---|
| 0 | x | 1 | 1 | 1 |
| 1 | y | 1 | 1 | 2 |
Final answer:
2
Example 3
Input: s = "zzzzz"
| Index | Character | Current Run Length | Added to Result | Total Result |
|---|---|---|---|---|
| 0 | z | 1 | 1 | 1 |
| 1 | z | 2 | 2 | 3 |
| 2 | z | 3 | 3 | 6 |
| 3 | z | 4 | 4 | 10 |
| 4 | z | 5 | 5 | 15 |
Final answer:
15
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | The string is traversed exactly once |
| Space | O(1) | Only a few variables are used |
The algorithm performs a single pass through the string. Each character is processed once, and all operations inside the loop are constant time operations.
No additional data structures proportional to the input size are required, so the space usage remains constant.
Test Cases
solution = Solution()
assert solution.countHomogenous("abbcccaa") == 13 # mixed runs
assert solution.countHomogenous("xy") == 2 # all distinct characters
assert solution.countHomogenous("zzzzz") == 15 # single large run
assert solution.countHomogenous("a") == 1 # minimum length input
assert solution.countHomogenous("aa") == 3 # simple repeated run
assert solution.countHomogenous("ababab") == 6 # alternating characters
assert solution.countHomogenous("cccccc") == 21 # longer repeated run
assert solution.countHomogenous("abcdddeeeeaabbbcd") == 30 # multiple varied runs
assert solution.countHomogenous("qwerty") == 6 # all unique characters
assert solution.countHomogenous("pppppppppp") == 55 # arithmetic series behavior
# stress style case
large_input = "a" * 100000
expected = (100000 * 100001 // 2) % (10**9 + 7)
assert solution.countHomogenous(large_input) == expected # maximum constraint
| Test | Why |
|---|---|
"abbcccaa" |
Standard mixed example |
"xy" |
No repeated characters |
"zzzzz" |
Entire string is one run |
"a" |
Smallest possible input |
"aa" |
Small repeated sequence |
"ababab" |
Constantly resetting runs |
"cccccc" |
Larger continuous run |
"abcdddeeeeaabbbcd" |
Multiple run transitions |
"qwerty" |
All characters unique |
"pppppppppp" |
Verifies arithmetic growth |
"a" * 100000 |
Maximum input size stress test |
Edge Cases
Single Character String
A string containing only one character is the smallest valid input. A buggy implementation might incorrectly initialize counters and return 0 instead of 1.
Example:
s = "a"
The algorithm handles this correctly because the first character always starts a new run of length 1, contributing exactly one homogenous substring.
All Characters Different
When every character differs from the previous one, every run length is exactly 1.
Example:
s = "abcdef"
The only valid homogenous substrings are the individual characters themselves.
The implementation correctly resets current_run_length to 1 whenever the character changes.
Entire String Identical
A long run of identical characters can produce very large counts.
Example:
s = "aaaaa"
The number of homogenous substrings becomes:
1 + 2 + 3 + 4 + 5 = 15
For the maximum constraint size, this value becomes extremely large, so modulo arithmetic is necessary. The implementation applies modulo during accumulation, preventing overflow and ensuring correctness.