LeetCode 2156 - Find Substring With Given Hash Value
Difficulty: 🔴 Hard
Topics: String, Sliding Window, Rolling Hash, Hash Function
Solution
Problem Understanding
This problem asks us to find the earliest substring of length k whose polynomial rolling hash equals a given target value.
The hash function is defined as:
$$hash(s, p, m) = \left( \sum_{i=0}^{k-1} val(s[i]) \cdot p^i \right) \bmod m$$
where:
- val('a') = 1
- val('b') = 2
- ...
- val('z') = 26
For a substring of length k, the first character contributes with multiplier p^0, the second with p^1, and so on.
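As a minimal sketch, the definition can be evaluated term by term. The helper names `val` and `direct_hash` are hypothetical and chosen here only for illustration; they are not part of the required interface.

```python
def val(ch: str) -> int:
    """Map 'a'..'z' to 1..26, as in the problem statement."""
    return ord(ch) - ord("a") + 1


def direct_hash(sub: str, power: int, modulo: int) -> int:
    """Evaluate the hash definition directly in O(len(sub)) time."""
    total = 0
    weight = 1  # power**i mod modulo for the current position i
    for ch in sub:
        total = (total + val(ch) * weight) % modulo
        weight = (weight * power) % modulo
    return total


print(direct_hash("ee", 7, 20))     # 5*1 + 5*7 = 40, and 40 % 20 = 0
print(direct_hash("fbx", 31, 100))  # 6 + 2*31 + 24*961 = 23132, and 23132 % 100 = 32
```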
The input consists of:
- s, the full string
- power, the base of the polynomial hash
- modulo, the modulus used to keep values bounded
- k, the substring length
- hashValue, the target hash we want to match
The output must be the first substring of length k whose hash equals hashValue.
The important detail is the word "first". Multiple substrings may produce the same hash value, but we must return the one with the smallest starting index.
The constraints are large enough that recomputing the hash for every substring independently would be too slow:
- s.length can be up to 2 * 10^4
- power and modulo can be as large as 10^9
This strongly suggests the use of a rolling hash technique, where we efficiently update the hash value while sliding across the string.
There are several edge cases worth noting:
- k may equal the entire string length, meaning there is only one candidate substring.
- modulo may be very small, producing many collisions.
- Multiple valid substrings may exist, so we must carefully preserve the earliest one.
- Large values of power and modulo require modular arithmetic throughout to avoid overflow and maintain correctness.
- Because powers increase from left to right in the formula, a standard left-to-right rolling hash update becomes awkward. Reversing the traversal direction simplifies the implementation considerably.
Approaches
Brute Force Approach
The most direct solution is to examine every substring of length k.
For each substring:
- Compute its hash from scratch.
- Compare the result with
hashValue. - Return the first matching substring.
To compute the hash of one substring, we evaluate:
$$val(s[0]) \cdot p^0 + val(s[1]) \cdot p^1 + \dots + val(s[k-1]) \cdot p^{k-1}$$
This takes O(k) time per substring.
Since there are exactly n - k + 1 substrings of length k, the total complexity becomes:
$$O((n-k+1) \cdot k)$$
In the worst case, this becomes O(nk), which is too slow when both n and k are large. For example, n = 2 * 10^4 with k = 10^4 requires roughly 10^8 character operations.
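The brute force idea can be sketched as follows. The function name `brute_force_sub_str_hash` is hypothetical, and the empty-string fallback assumes (as the original problem statement promises) that a matching substring always exists, so it should never be reached on valid input.

```python
def brute_force_sub_str_hash(
    s: str, power: int, modulo: int, k: int, hash_value: int
) -> str:
    """Scan windows left to right and rehash each one from scratch: O(n*k)."""
    for start in range(len(s) - k + 1):
        total = 0
        weight = 1  # power**j mod modulo for offset j inside the window
        for ch in s[start:start + k]:
            total = (total + (ord(ch) - ord("a") + 1) * weight) % modulo
            weight = (weight * power) % modulo
        if total == hash_value:
            # Scanning left to right means the first hit is the earliest one.
            return s[start:start + k]
    return ""  # only reached if no window matches


print(brute_force_sub_str_hash("leetcode", 7, 20, 2, 0))  # ee
```

It is too slow for the largest inputs, but it is a convenient reference when checking a faster implementation on small strings.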
Optimal Approach, Reverse Rolling Hash
The key insight is that consecutive substring hashes are highly related.
Instead of recomputing each hash independently, we can update the previous hash in constant time using a rolling hash.
However, the hash formula uses increasing powers from left to right:
$$val(s[i]) \cdot p^0, val(s[i+1]) \cdot p^1, \dots$$
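This makes a normal left-to-right sliding window inconvenient. Sliding the window one step to the right would lower every exponent by one, which means dividing the hash by power. Under a modulus, division requires a modular inverse, and that inverse does not exist when power and modulo share a common factor.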
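A small sketch of the obstacle: a left-to-right update needs the inverse of power modulo modulo, which exists only when the two are coprime. The helper name `has_modular_inverse` is hypothetical, and the three-argument `pow` call with exponent -1 assumes Python 3.8 or newer.

```python
import math


def has_modular_inverse(power: int, modulo: int) -> bool:
    """True when power can be 'divided out' modulo `modulo`."""
    return math.gcd(power, modulo) == 1


print(has_modular_inverse(7, 20))  # True
print(has_modular_inverse(2, 4))   # False

try:
    pow(2, -1, 4)  # asks for the inverse of 2 modulo 4
except ValueError as exc:
    print("no inverse:", exc)
```

Since the constraints allow any pair of power and modulo, a solution cannot rely on the inverse being available.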
The trick is to process the string from right to left.
Suppose we maintain the hash of the current window while moving backward. Then:
- Adding a new character to the front becomes a multiplication of the running hash by power, followed by adding the new character's value.
- Removing the trailing character becomes subtraction of its weighted contribution.
This transforms the problem into a clean rolling hash update.
As we move from right to left:
- Every window hash can be computed in O(1).
- The total runtime becomes linear.
Because we traverse from right to left, every time we find a valid substring we overwrite the answer index. The final stored index will automatically be the leftmost valid substring.
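To see why one multiplication and one subtraction suffice, here is a short derivation. Write p for power and m for modulo as in the hash definition, and let H_i denote the hash of the window starting at index i (the subscripted H_i is notation introduced here for illustration only).

$$H_{i+1} = \sum_{j=0}^{k-1} val(s[i+1+j]) \cdot p^{j}$$

$$p \cdot H_{i+1} = \sum_{j=1}^{k} val(s[i+j]) \cdot p^{j}$$

$$H_i \equiv val(s[i]) + p \cdot H_{i+1} - val(s[i+k]) \cdot p^{k} \pmod{m}$$

The second line contains every term of H_i except the one for j = 0, plus one extra term for j = k, which is exactly the character that leaves the window.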
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(nk) | O(1) | Recomputes every substring hash independently |
| Optimal | O(n) | O(1) | Uses reverse rolling hash for constant time updates |
Algorithm Walkthrough
Optimal Reverse Rolling Hash Algorithm
1. Initialize variables for the rolling hash, the power multiplier, and the answer index.
The rolling hash stores the hash of the current window while traversing from right to left. The multiplier stores power^k mod modulo, which is needed when removing characters from the window.
2. Precompute power^k mod modulo.
This value represents the weight of the character leaving the sliding window. We will use it during hash updates.
3. Traverse the string from right to left.
We process characters starting from the end because the hash definition aligns naturally with backward traversal.
4. Add the current character into the rolling hash.
If the current hash is:
$$H$$
then after prepending character c, the new hash becomes:
$$(H \cdot power + val(c)) \bmod modulo$$
This shifts all existing characters one power higher and inserts the new character with exponent 0.
5. Remove the character that exits the window.
Once the window size exceeds k, we remove the trailing character contribution.
The outgoing character contributed:
$$val(outgoing) \cdot power^k$$
so we subtract it modulo modulo.
6. When the window size reaches exactly k, compare the rolling hash with hashValue.
If they match, record the current starting index.
Since we are traversing backward, later matches correspond to earlier substrings in the original string.
7. After the traversal finishes, return the substring starting at the recorded index with length k.
Why it works
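The rolling hash invariant is that, once at least k characters have been processed, the maintained hash equals the hash of the current length-k window according to the problem definition. Before that point it equals the hash of the shorter suffix seen so far.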
Processing from right to left ensures that each newly added character naturally receives exponent 0, while existing characters shift to higher exponents through multiplication by power.
Because each update exactly mirrors the polynomial hash structure, every computed rolling hash is correct. Since we overwrite the answer while moving leftward, the final recorded substring is guaranteed to be the first valid substring.
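The invariant can be checked mechanically on small inputs. The sketch below is an illustration rather than part of the solution: the function name `check_invariant` is hypothetical, and it recomputes each full window directly from the definition to compare against the rolling value.

```python
def check_invariant(s: str, power: int, modulo: int, k: int) -> bool:
    """Assert that the rolling value matches the definition for every full window."""

    def val(ch: str) -> int:
        return ord(ch) - ord("a") + 1

    n = len(s)
    power_k = pow(power, k, modulo)
    rolling = 0
    for i in range(n - 1, -1, -1):
        rolling = (rolling * power + val(s[i])) % modulo
        if i + k < n:
            rolling = (rolling - val(s[i + k]) * power_k) % modulo
        if n - i >= k:
            # Direct evaluation of the definition for the window s[i:i+k].
            expected = sum(val(s[i + j]) * power**j for j in range(k)) % modulo
            assert rolling == expected, (i, rolling, expected)
    return True


print(check_invariant("leetcode", 7, 20, 2))  # True
```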
Python Solution
class Solution:
    def subStrHash(
        self,
        s: str,
        power: int,
        modulo: int,
        k: int,
        hashValue: int
    ) -> str:
        n = len(s)
        current_hash = 0
        # Weight of the character that leaves the window after a prepend.
        power_k = pow(power, k, modulo)
        answer_index = 0

        for i in range(n - 1, -1, -1):
            # Prepend s[i]: existing terms move up one power, s[i] gets exponent 0.
            char_value = ord(s[i]) - ord('a') + 1
            current_hash = (
                current_hash * power + char_value
            ) % modulo

            # Drop s[i + k], which now sits at exponent k.
            if i + k < n:
                outgoing_value = ord(s[i + k]) - ord('a') + 1
                current_hash = (
                    current_hash
                    - outgoing_value * power_k
                ) % modulo

            # Overwrite on every full-window match; the last write is the leftmost.
            if n - i >= k and current_hash == hashValue:
                answer_index = i

        return s[answer_index:answer_index + k]
The implementation begins by initializing the rolling hash and precomputing power^k mod modulo.
The loop traverses the string from right to left. Each new character is inserted into the rolling hash using:
$$newHash = (oldHash \cdot power + val(c)) \bmod modulo$$
This correctly shifts all existing powers upward.
Once the window grows beyond size k, the outgoing character is removed by subtracting:
$$val(outgoing) \cdot power^k$$
The modulo operation ensures values remain within bounds and also handles negative intermediate results correctly.
Whenever the window size becomes exactly k, the algorithm compares the rolling hash against hashValue. If they match, the current index becomes the candidate answer.
Because traversal proceeds backward, the final stored index corresponds to the leftmost valid substring.
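The reliance on Python's `%` can be illustrated with the numbers from the first removal step in Example 1 (s = "leetcode", power = 7, modulo = 20, k = 2, at i = 5). The variable names are hypothetical; `math.fmod` is used only to show what a C-style remainder would return.

```python
import math

# After prepending 'o': (19 * 7 + 15) % 20 = 8. The outgoing 'e' has weight 7**2 = 49.
partial = 8 - 5 * 49  # -237, a negative intermediate value

python_style = partial % 20       # result takes the sign of the divisor
c_style = math.fmod(partial, 20)  # result takes the sign of the dividend

print(python_style)  # 3
print(c_style)       # -17.0
```

Because the divisor modulo is always positive, the Python result stays in the range 0 to modulo - 1 without any extra correction.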
Go Solution
func subStrHash(s string, power int, modulo int, k int, hashValue int) string {
	n := len(s)
	currentHash := 0
	answerIndex := 0

	// powerK = power^k mod modulo, the weight of the outgoing character.
	powerK := 1
	for i := 0; i < k; i++ {
		powerK = (powerK * power) % modulo
	}

	for i := n - 1; i >= 0; i-- {
		// Prepend s[i]: shift existing terms up one power.
		charValue := int(s[i]-'a') + 1
		currentHash = (currentHash*power + charValue) % modulo

		if i+k < n {
			// Remove s[i+k]; add modulo so the result cannot go negative.
			outgoingValue := int(s[i+k]-'a') + 1
			currentHash = (currentHash - (outgoingValue*powerK)%modulo + modulo) % modulo
		}

		if n-i >= k && currentHash == hashValue {
			answerIndex = i
		}
	}

	return s[answerIndex : answerIndex+k]
}
The Go implementation follows the same logic as the Python solution but requires a few language-specific considerations.
Go has no built-in modular exponentiation for plain int values comparable to Python's pow(base, exp, mod), so power^k mod modulo is computed manually using iterative multiplication. The loop takes O(k) steps, which does not change the overall linear bound.
Go's % operator keeps the sign of the dividend, so the subtraction can leave a negative remainder. We therefore add modulo before applying % modulo again:
(currentHash - value + modulo) % modulo
This guarantees the hash remains non-negative.
String slicing in Go uses byte indices, which works correctly here because the input contains only lowercase English letters.
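If the O(k) loop for power^k ever became a concern, square-and-multiply exponentiation is a common alternative. The sketch below is written in Python so it can be compared against the built-in `pow`; the name `mod_pow` is hypothetical. Porting it to Go is left as an assumption: with modulo up to 10^9, intermediate products stay below 10^18, which fits in a 64-bit int.

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent % modulus in O(log exponent) multiplications."""
    result = 1 % modulus  # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # The current bit is set, so this power of base contributes.
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1
    return result


print(mod_pow(7, 2, 20) == pow(7, 2, 20))  # True
```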
Worked Examples
Example 1
Input:
s = "leetcode"
power = 7
modulo = 20
k = 2
hashValue = 0
We traverse from right to left.
| i | Character | Current Window | Rolling Hash |
|---|---|---|---|
| 7 | e | "e" | 5 |
| 6 | d | "de" | (5*7 + 4) % 20 = 19 |
| 5 | o | "od" | (19*7 + 15 - 5*49) % 20 = 3 |
| 4 | c | "co" | (3*7 + 3 - 4*49) % 20 = 8 |
| 3 | t | "tc" | (8*7 + 20 - 15*49) % 20 = 1 |
| 2 | e | "et" | (1*7 + 5 - 3*49) % 20 = 5 |
| 1 | e | "ee" | (5*7 + 5 - 20*49) % 20 = 0 |
| 0 | l | "le" | (0*7 + 12 - 5*49) % 20 = 7 |
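A table like the one above can be regenerated with a small tracing helper. This is a sketch for checking hand calculations; the generator name `trace_windows` is hypothetical.

```python
def trace_windows(s: str, power: int, modulo: int, k: int):
    """Yield (i, window, rolling_hash) for each step of the right-to-left scan."""
    n = len(s)
    power_k = pow(power, k, modulo)
    rolling = 0
    for i in range(n - 1, -1, -1):
        rolling = (rolling * power + ord(s[i]) - ord("a") + 1) % modulo
        if i + k < n:
            outgoing = ord(s[i + k]) - ord("a") + 1
            rolling = (rolling - outgoing * power_k) % modulo
        # s[i:i + k] is shorter than k while the window is still filling up.
        yield i, s[i:i + k], rolling


for i, window, value in trace_windows("leetcode", 7, 20, 2):
    print(i, repr(window), value)
```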
At index 1, the rolling hash equals 0.
So the answer is:
"ee"
Example 2
Input:
s = "fbxzaad"
power = 31
modulo = 100
k = 3
hashValue = 32
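| i | Window | Hash |
|---|---|---|
| 4 | "aad" | (1 + 1*31 + 4*961) % 100 = 76 |
| 3 | "zaa" | (26 + 1*31 + 1*961) % 100 = 18 |
| 2 | "xza" | (24 + 26*31 + 1*961) % 100 = 91 |
| 1 | "bxz" | (2 + 24*31 + 26*961) % 100 = 32 |
| 0 | "fbx" | (6 + 2*31 + 24*961) % 100 = 32 |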
Both "bxz" and "fbx" match the target hash.
Since "fbx" appears earlier, it is returned.
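To confirm that more than one window matches, one can list every matching start index. The helper name `matching_starts` is hypothetical, and it hashes each window directly, so it is meant for small examples only.

```python
def matching_starts(
    s: str, power: int, modulo: int, k: int, hash_value: int
) -> list:
    """Return every start index whose length-k window hashes to hash_value."""
    starts = []
    for start in range(len(s) - k + 1):
        total = sum(
            (ord(ch) - ord("a") + 1) * pow(power, j, modulo)
            for j, ch in enumerate(s[start:start + k])
        ) % modulo
        if total == hash_value:
            starts.append(start)
    return starts


print(matching_starts("fbxzaad", 31, 100, 3, 32))  # [0, 1]
```

The smallest index in the list is the one the right-to-left scan ends up recording.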
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character enters and leaves the rolling hash exactly once |
| Space | O(1) | Only a few integer variables are maintained |
The algorithm performs a single pass through the string. Every rolling hash update is constant time, so the total runtime scales linearly with the input size.
No auxiliary arrays or hash maps are required, which keeps the extra space usage constant.
Test Cases
sol = Solution()
# Example 1
assert sol.subStrHash("leetcode", 7, 20, 2, 0) == "ee"
# Example 2
assert sol.subStrHash("fbxzaad", 31, 100, 3, 32) == "fbx"
# Entire string is the answer
assert sol.subStrHash("abc", 3, 100, 3, 34) == "abc"
# Single character substring
assert sol.subStrHash("z", 5, 7, 1, 5) == "z"
# Multiple valid substrings, must return first
assert sol.subStrHash("aaaaa", 3, 100, 2, 4) == "aa"
# Small modulo causing many collisions
assert sol.subStrHash("abcdef", 2, 3, 2, 2) == "ab"
# Window size 1
assert sol.subStrHash("xyz", 10, 100, 1, 24) == "x"
# Large power value: 10^9 % 97 = 34, so hash("ab") = (1 + 2*34) % 97 = 69
assert sol.subStrHash("abcde", 1000000000, 97, 2, 69) == "ab"
# Repeated characters
assert sol.subStrHash("zzzzzz", 5, 13, 3, 0) == "zzz"
# Valid substring near end: hash("gh") = 7 + 8*3 = 31
assert sol.subStrHash("abcdefgh", 3, 1000, 2, 31) == "gh"
Test Summary
| Test | Why |
|---|---|
| "leetcode" example | Verifies standard behavior |
| "fbxzaad" example | Verifies earliest matching substring |
| Entire string | Ensures full-length window works |
| Single character | Smallest possible k |
| Repeated characters | Checks overlapping windows |
| Small modulo | Tests collision-heavy scenarios |
| k = 1 | Simplest rolling hash case |
| Large power | Verifies modular arithmetic correctness |
| All same letters | Tests repeated-value handling |
| Match near boundary | Verifies edge indexing correctness |
Edge Cases
One important edge case occurs when k equals the entire string length. In this situation there is only one possible substring, and the rolling hash must correctly handle a single window without attempting to remove outgoing characters. The implementation naturally handles this because the removal condition only activates when i + k < n.
Another subtle case is when multiple substrings produce the same hash value. A naive implementation might return the last match encountered instead of the first one. Since this solution traverses from right to left and overwrites the answer index whenever a match is found, the final recorded index always corresponds to the earliest valid substring.
A third important case involves very small modulo values. Small moduli create many hash collisions and can also produce negative intermediate values during subtraction. The implementation carefully applies modular arithmetic after every operation, ensuring correctness even when subtraction temporarily produces negative numbers.
A fourth edge case involves k = 1. With single-character windows, the rolling hash effectively reduces to the character value modulo modulo. Some rolling hash implementations accidentally assume larger windows and mishandle removal logic. Here, the same unified logic works correctly for all window sizes, including one-character substrings.
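One more case matters mainly when writing tests by hand. The solution initializes the answer index to 0, so it returns the first k characters even if no window matches. The problem guarantees a match, but a mistyped hashValue in a test would pass silently whenever the expected answer happens to be the prefix. The sketch below is a hypothetical variant (the name `sub_str_hash_or_none` and the `None` return are assumptions made for illustration) that uses a sentinel so a missing match is visible.

```python
from typing import Optional


def sub_str_hash_or_none(
    s: str, power: int, modulo: int, k: int, hash_value: int
) -> Optional[str]:
    """Same right-to-left scan, but report None when no window matches."""
    n = len(s)
    power_k = pow(power, k, modulo)
    rolling = 0
    answer_index = -1  # sentinel: no matching window seen yet
    for i in range(n - 1, -1, -1):
        rolling = (rolling * power + ord(s[i]) - ord("a") + 1) % modulo
        if i + k < n:
            outgoing = ord(s[i + k]) - ord("a") + 1
            rolling = (rolling - outgoing * power_k) % modulo
        if n - i >= k and rolling == hash_value:
            answer_index = i
    if answer_index < 0:
        return None
    return s[answer_index:answer_index + k]


print(sub_str_hash_or_none("abc", 3, 100, 3, 34))  # abc
print(sub_str_hash_or_none("abc", 3, 100, 3, 35))  # None
```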