LeetCode 2156 - Find Substring With Given Hash Value


LeetCode Problem 2156

Difficulty: 🔴 Hard
Topics: String, Sliding Window, Rolling Hash, Hash Function

Solution

Problem Understanding

This problem asks us to find the earliest substring of length k whose polynomial rolling hash equals a given target value.

The hash function is defined as:

$$hash(s, p, m) = \left( \sum_{i=0}^{k-1} val(s[i]) \cdot p^i \right) \bmod m$$

where:

  • val('a') = 1
  • val('b') = 2
  • ...
  • val('z') = 26

For a substring of length k, the first character contributes with multiplier p^0, the second with p^1, and so on.
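The following is a direct evaluation of the formula in Python, as a minimal sketch of the definition. `direct_hash` is a hypothetical helper name, not part of the problem's API. It sums val(c) * p^i over the substring and then reduces modulo m.

```python
def direct_hash(sub: str, p: int, m: int) -> int:
    """Evaluate hash(sub, p, m) term by term, exactly as the formula is written."""
    total = 0
    for i, ch in enumerate(sub):
        val = ord(ch) - ord('a') + 1   # val('a') = 1, ..., val('z') = 26
        total += val * pow(p, i, m)    # the i-th character is weighted by p^i
    return total % m


print(direct_hash("ee", 7, 20))     # 5*1 + 5*7 = 40, and 40 mod 20 = 0
print(direct_hash("fbx", 31, 100))  # 6 + 2*31 + 24*961 = 23132, and 23132 mod 100 = 32
```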

The input consists of:

  • s, the full string
  • power, the base of the polynomial hash
  • modulo, the modulus used to keep values bounded
  • k, the substring length
  • hashValue, the target hash we want to match

The output must be the first substring of length k whose hash equals hashValue. The problem guarantees that at least one such substring exists.

The important detail is the word "first". Multiple substrings may produce the same hash value, but we must return the one with the smallest starting index.

The constraints are large enough that recomputing the hash for every substring independently would be too slow:

  • s.length can be up to 2 * 10^4
  • power and modulo can be as large as 10^9

This strongly suggests the use of a rolling hash technique, where we efficiently update the hash value while sliding across the string.

There are several edge cases worth noting:

  • k may equal the entire string length, meaning there is only one candidate substring.
  • modulo may be very small, producing many collisions.
  • Multiple valid substrings may exist, so we must carefully preserve the earliest one.
  • Because power and modulo can reach 10^9, the product of two reduced values can approach 10^18, so we reduce modulo modulo after every multiplication to stay within 64-bit integers.
  • Because powers increase from left to right in the formula, a standard left-to-right rolling hash update would require dividing by power, which is not always possible under a modulus. Reversing the traversal direction avoids division entirely.

Approaches

Brute Force Approach

The most direct solution is to examine every substring of length k.

For each substring:

  1. Compute its hash from scratch.
  2. Compare the result with hashValue.
  3. Return the first matching substring.

To compute the hash of one substring, we evaluate:

$$val(s[0]) \cdot p^0 + val(s[1]) \cdot p^1 + \dots + val(s[k-1]) \cdot p^{k-1}$$

This takes O(k) time per substring.

Since there are exactly n - k + 1 substrings of length k, the total complexity becomes:

$$O((n-k+1) \cdot k)$$

In the worst case this is O(nk). With n = 2 * 10^4 and k around n/2, that is roughly 10^8 multiply-and-reduce steps, which is too slow in practice.
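The brute-force procedure can be sketched as follows. This is an illustrative reference implementation that assumes lowercase input, and the function name `sub_str_hash_brute` is hypothetical. Because it scans start positions from left to right, the first match it finds is already the answer.

```python
def sub_str_hash_brute(s: str, power: int, modulo: int, k: int, hash_value: int) -> str:
    """O((n - k + 1) * k): recompute every window's hash from scratch."""
    n = len(s)
    for start in range(n - k + 1):       # left to right, so the first hit is the answer
        h = 0
        weight = 1                       # power^j mod modulo for j = 0, 1, ...
        for j in range(k):
            h = (h + (ord(s[start + j]) - ord('a') + 1) * weight) % modulo
            weight = weight * power % modulo
        if h == hash_value:
            return s[start:start + k]
    return ""                            # not reached when an answer is guaranteed


print(sub_str_hash_brute("leetcode", 7, 20, 2, 0))  # ee
```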

Optimal Approach, Reverse Rolling Hash

The key insight is that consecutive substring hashes are highly related.

Instead of recomputing each hash independently, we can update the previous hash in constant time using a rolling hash.

However, the hash formula uses increasing powers from left to right:

$$val(s[i]) \cdot p^0, val(s[i+1]) \cdot p^1, \dots$$

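This makes a normal left-to-right sliding window inconvenient. After dropping the leftmost term, every remaining exponent must decrease by one, which means dividing by power modulo modulo. That division needs a modular inverse of power, and the inverse exists only when gcd(power, modulo) = 1, which the problem does not guarantee.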
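The obstacle is easy to demonstrate. This sketch uses Python 3.8+'s three-argument `pow` with exponent -1, which computes a modular inverse and raises `ValueError` when none exists. The helper name `modular_inverse_or_none` and the sample pairs are illustrative assumptions.

```python
def modular_inverse_or_none(power: int, modulo: int):
    """Return the inverse of power modulo `modulo`, or None if it does not exist."""
    try:
        # Python 3.8+: pow with exponent -1 computes a modular inverse
        # and raises ValueError when gcd(power, modulo) != 1.
        return pow(power, -1, modulo)
    except ValueError:
        return None


print(modular_inverse_or_none(7, 20))    # 3, since 7 * 3 = 21 and 21 mod 20 = 1
print(modular_inverse_or_none(10, 100))  # None, since gcd(10, 100) = 10
```

When the inverse is missing, a left-to-right update cannot lower the exponents, while the right-to-left update only ever multiplies.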

The trick is to process the string from right to left.

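Suppose we maintain the hash of the current window while moving backward. Then:

  • Adding a new character to the front means multiplying the current hash by power, which raises every existing exponent by one, and then adding the new character's value with exponent 0
  • Removing the trailing character means subtracting its weighted contribution, which after that multiplication is val(outgoing) · power^k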

This transforms the problem into a clean rolling hash update.
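One backward step can be written as a small function. This is a sketch: `roll_left` and its parameter names are hypothetical, and `power_k` is assumed to be power^k mod modulo, precomputed by the caller. The usage lines check it against the "leetcode" example (power = 7, modulo = 20, k = 2), where hash("od") = (15 + 4*7) mod 20 = 3 and hash("co") = (3 + 15*7) mod 20 = 8.

```python
def roll_left(h: int, incoming: str, outgoing: str,
              power: int, power_k: int, modulo: int) -> int:
    """Turn hash(s[i+1 : i+k+1]) into hash(s[i : i+k]) in O(1)."""
    # Prepend s[i]: every existing exponent goes up by one, s[i] gets p^0.
    h = (h * power + (ord(incoming) - ord('a') + 1)) % modulo
    # s[i+k] now carries weight p^k; Python's % keeps the result non-negative.
    return (h - (ord(outgoing) - ord('a') + 1) * power_k) % modulo


power, modulo, k = 7, 20, 2
power_k = pow(power, k, modulo)                          # 49 mod 20 = 9
print(roll_left(3, "c", "d", power, power_k, modulo))   # 8, which is hash("co")
```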

As we move from right to left:

  • Every window hash can be computed in O(1)
  • The total runtime becomes linear

Because we traverse from right to left, every time we find a valid substring we overwrite the answer index. The final stored index will automatically be the leftmost valid substring.

Approach Comparison

| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(nk) | O(1) | Recomputes every substring hash independently |
| Optimal | O(n) | O(1) | Uses reverse rolling hash for constant time updates |

Algorithm Walkthrough

Optimal Reverse Rolling Hash Algorithm

  1. Initialize the rolling hash, the power multiplier, and the answer index.

The rolling hash stores the hash of the current window while we traverse from right to left. The multiplier will hold power^k mod modulo, which is needed when removing characters from the window.

  2. Precompute power^k mod modulo.

This is the weight the outgoing character carries at the moment it is removed, so every removal reuses it.

  3. Traverse the string from right to left.

Going backward makes each newly added character the leftmost character of the window, which the hash definition weights by p^0. The existing characters only need their exponents raised by one, which a single multiplication by power achieves, so no division is ever required.

  4. Add the current character into the rolling hash.

If the current hash is H, then after prepending character c, the new hash becomes:

$$(H \cdot power + val(c)) \bmod modulo$$

This shifts all existing characters one power higher and inserts the new character with exponent 0.

  5. Remove the character that exits the window.

Once the window holds k + 1 characters, we remove the trailing (rightmost) one. After the multiplication in step 4, the outgoing character contributes:

$$val(outgoing) \cdot power^k$$

so we subtract it modulo modulo.

  6. When the window size reaches exactly k, compare the rolling hash with hashValue.

If they match, record the current starting index. Since we are traversing backward, later matches correspond to earlier substrings in the original string, so each match overwrites the previous one.

  7. After the traversal finishes, return the substring of length k starting at the recorded index.

Why it works

The rolling hash invariant is that after processing index i, the maintained hash equals hash(s[i : min(i + k, n)]) according to the problem definition. In particular, once n - i >= k, it is exactly the hash of the length-k window starting at i.

Processing from right to left ensures that each newly added character naturally receives exponent 0, while existing characters shift to higher exponents through multiplication by power.

Because each update exactly mirrors the polynomial hash structure, every computed rolling hash is correct. Since we overwrite the answer while moving leftward, the final recorded substring is guaranteed to be the first valid substring.
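The invariant can also be checked empirically. The sketch below is test scaffolding, not part of the original solution. It runs the reverse rolling hash on small random inputs, asserts at every full window that the maintained hash equals a direct evaluation of the formula, and compares the final answer with a left-to-right brute force. The function names, the alphabet "abcz", the size limits, and the seed are arbitrary illustrative choices. Each target is taken from a real window, so a match always exists.

```python
import random


def window_hash(w: str, power: int, modulo: int) -> int:
    # Direct evaluation of the formula: sum of val(w[j]) * power^j, reduced mod modulo.
    return sum((ord(c) - ord('a') + 1) * pow(power, j, modulo) for j, c in enumerate(w)) % modulo


def first_match_brute(s: str, power: int, modulo: int, k: int, target: int):
    # Reference answer: scan left to right and stop at the first matching window.
    for start in range(len(s) - k + 1):
        if window_hash(s[start:start + k], power, modulo) == target:
            return s[start:start + k]
    return None


def first_match_rolling(s: str, power: int, modulo: int, k: int, target: int) -> str:
    # Reverse rolling hash, with the invariant asserted on every full window.
    n, h, power_k, answer = len(s), 0, pow(power, k, modulo), 0
    for i in range(n - 1, -1, -1):
        h = (h * power + ord(s[i]) - ord('a') + 1) % modulo
        if i + k < n:
            h = (h - (ord(s[i + k]) - ord('a') + 1) * power_k) % modulo
        if n - i >= k:
            assert h == window_hash(s[i:i + k], power, modulo)
            if h == target:
                answer = i
    return s[answer:answer + k]


rng = random.Random(2156)
for _ in range(2000):
    n = rng.randint(1, 12)
    s = "".join(rng.choice("abcz") for _ in range(n))
    k = rng.randint(1, n)
    power, modulo = rng.randint(1, 60), rng.randint(1, 60)
    start = rng.randrange(n - k + 1)
    target = window_hash(s[start:start + k], power, modulo)  # a match is guaranteed
    assert first_match_rolling(s, power, modulo, k, target) == first_match_brute(s, power, modulo, k, target)
print("all random cases agree")
```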

Python Solution

class Solution:
    def subStrHash(
        self,
        s: str,
        power: int,
        modulo: int,
        k: int,
        hashValue: int
    ) -> str:

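        n = len(s)

        current_hash = 0
        # Weight of the character that leaves the window: power^k mod modulo.
        power_k = pow(power, k, modulo)

        # Safe default because the problem guarantees that some window matches.
        answer_index = 0

        # Walk right to left so the newly added character always gets exponent 0.
        for i in range(n - 1, -1, -1):

            char_value = ord(s[i]) - ord('a') + 1

            # Prepend s[i]: raise every existing exponent by one, add val(s[i]) * p^0.
            current_hash = (
                current_hash * power + char_value
            ) % modulo

            # The window now holds k + 1 characters; drop s[i + k], weighted p^k.
            if i + k < n:
                outgoing_value = ord(s[i + k]) - ord('a') + 1

                # Python's % returns a non-negative result for a positive modulo.
                current_hash = (
                    current_hash
                    - outgoing_value * power_k
                ) % modulo

            # s[i:i + k] is a full window; leftward matches overwrite rightward ones.
            if n - i >= k and current_hash == hashValue:
                answer_index = i

        return s[answer_index:answer_index + k]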

The implementation begins by initializing the rolling hash and precomputing power^k mod modulo.

The loop traverses the string from right to left. Each new character is inserted into the rolling hash using:

$$newHash = (oldHash \cdot power + val(c)) \bmod modulo$$

This correctly shifts all existing powers upward.

Once the window grows beyond size k, the outgoing character is removed by subtracting:

$$val(outgoing) \cdot power^k$$

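The modulo operation keeps values within bounds. Python's % operator also returns a non-negative result whenever modulo is positive, so a negative intermediate value after the subtraction is mapped back into [0, modulo) automatically.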

Whenever the window size becomes exactly k, the algorithm compares the rolling hash against hashValue. If they match, the current index becomes the candidate answer.

Because traversal proceeds backward, the final stored index corresponds to the leftmost valid substring.

Go Solution

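func subStrHash(s string, power int, modulo int, k int, hashValue int) string {
	n := len(s)

	currentHash := 0
	answerIndex := 0

	// powerK = power^k mod modulo, the weight of the character leaving the window.
	powerK := 1
	for i := 0; i < k; i++ {
		powerK = (powerK * power) % modulo
	}

	// Walk right to left so the prepended character always gets exponent 0.
	for i := n - 1; i >= 0; i-- {
		charValue := int(s[i]-'a') + 1

		// Shift existing characters up one power and prepend s[i].
		currentHash = (currentHash*power + charValue) % modulo

		// Drop s[i+k], which now carries weight power^k.
		if i+k < n {
			outgoingValue := int(s[i+k]-'a') + 1
			// Adding modulo keeps the dividend non-negative before the final %.
			currentHash = (currentHash - (outgoingValue*powerK)%modulo + modulo) % modulo
		}

		// s[i:i+k] is a full window; a later (more leftward) match overwrites this one.
		if n-i >= k && currentHash == hashValue {
			answerIndex = i
		}
	}

	return s[answerIndex : answerIndex+k]
}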

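The Go implementation follows the same logic as the Python solution but requires a few language-specific considerations. Products such as currentHash*power can approach 10^18, so the code relies on Go's int being 64 bits wide, as it is on 64-bit platforms.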

Go's standard library has no modular exponentiation for plain int values like Python's pow(base, exp, mod) (math/big offers big.Int.Exp, which is unnecessary here), so power^k mod modulo is computed manually using iterative multiplication.
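That loop performs k multiplications, which is acceptable because the main pass is already O(n). If O(log k) is preferred, square-and-multiply is the standard technique. It is sketched here in Python to keep the added examples in one language. `mod_pow` is a hypothetical name, and the same loop translates directly to Go's int arithmetic, where each product stays below 10^18 when mod <= 10^9.

```python
def mod_pow(base: int, exp: int, mod: int) -> int:
    """Square-and-multiply: base^exp mod mod in O(log exp) multiplications."""
    result = 1 % mod          # handles mod == 1, where every residue is 0
    base %= mod
    while exp > 0:
        if exp & 1:           # the lowest remaining bit of exp is set
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result


print(mod_pow(7, 2, 20))                        # 9, the power_k of the "leetcode" example
print(mod_pow(31, 3, 100) == pow(31, 3, 100))   # True
```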

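Go's % operator truncates toward zero, so its result takes the sign of the dividend and can be negative after a subtraction. Both currentHash and (outgoingValue*powerK)%modulo lie in [0, modulo), so we add modulo before applying % modulo again: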

(currentHash - value + modulo) % modulo

This guarantees the hash remains non-negative.
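The two remainder conventions are easy to compare with concrete numbers. This sketch emulates Go's truncated % in Python through a hypothetical helper `go_rem` (valid for a positive divisor). It uses the values the Go loop sees at i = 4 in Example 1 ("leetcode", power = 7, modulo = 20, k = 2): currentHash = 4 after prepending 'c', and (outgoingValue*powerK)%modulo = (4*9)%20 = 16.

```python
def go_rem(a: int, b: int) -> int:
    """Emulate Go's a % b for b > 0: the result takes the sign of the dividend a."""
    r = abs(a) % b
    return -r if a < 0 else r


modulo = 20
current_hash, removed = 4, (4 * 9) % modulo               # 4 and 16
print(go_rem(current_hash - removed, modulo))             # -12: a negative "hash" in Go
print(go_rem(current_hash - removed + modulo, modulo))    # 8: adding modulo first fixes it
print((current_hash - removed) % modulo)                  # 8: Python's % is already non-negative
```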

String slicing in Go uses byte indices, which works correctly here because the input contains only lowercase English letters.

Worked Examples

Example 1

Input:

s = "leetcode"
power = 7
modulo = 20
k = 2
hashValue = 0

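We traverse from right to left. Here power^k mod modulo = 7^2 mod 20 = 9, and the first row is still a partial window of length 1.

| i | Character | Current Window | Rolling Hash |
|---|---|---|---|
| 7 | e | "e" | 5 |
| 6 | d | "de" | (5*7 + 4) % 20 = 19 |
| 5 | o | "od" | (19*7 + 15 - 5*9) % 20 = 3 |
| 4 | c | "co" | 8 |
| 3 | t | "tc" | 1 |
| 2 | e | "et" | 5 |
| 1 | e | "ee" | 0 |
| 0 | l | "le" | 7 |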

At index 1, the rolling hash equals 0.

So the answer is:

"ee"
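The rolling-hash column for Example 1 can be reproduced mechanically. The following sketch, built around a hypothetical `trace` generator, performs the same right-to-left updates as the Python solution. For each step it yields the index, the (possibly partial) window s[i:i + k], and the rolling hash.

```python
def trace(s: str, power: int, modulo: int, k: int):
    """Yield (i, window, hash) for every step of the right-to-left pass."""
    n, h, power_k = len(s), 0, pow(power, k, modulo)
    for i in range(n - 1, -1, -1):
        h = (h * power + ord(s[i]) - ord('a') + 1) % modulo               # prepend s[i]
        if i + k < n:
            h = (h - (ord(s[i + k]) - ord('a') + 1) * power_k) % modulo   # drop s[i+k]
        yield i, s[i:i + k], h    # the window is shorter than k until n - i >= k


for row in trace("leetcode", 7, 20, 2):
    print(*row)
# 7 e 5
# 6 de 19
# 5 od 3
# 4 co 8
# 3 tc 1
# 2 et 5
# 1 ee 0
# 0 le 7
```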

Example 2

Input:

s = "fbxzaad"
power = 31
modulo = 100
k = 3
hashValue = 32

| i | Window | Hash |
|---|---|---|
| 4 | "aad" | 76 |
| 3 | "zaa" | 18 |
| 2 | "xza" | 91 |
| 1 | "bxz" | 32 |
| 0 | "fbx" | 32 |

Both "bxz" and "fbx" match the target hash.

Since "fbx" appears earlier, it is returned.
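To confirm the Example 2 hashes without any rolling updates, the sketch below hashes every length-3 window directly from the definition. `window_hashes` is a hypothetical helper. Listing the windows left to right shows both matches, with "fbx" first.

```python
def window_hashes(s: str, power: int, modulo: int, k: int):
    """Hash every length-k window directly, in left-to-right order."""
    return [
        (start, s[start:start + k],
         sum((ord(c) - ord('a') + 1) * power ** j
             for j, c in enumerate(s[start:start + k])) % modulo)
        for start in range(len(s) - k + 1)
    ]


rows = window_hashes("fbxzaad", 31, 100, 3)
matches = [w for _, w, h in rows if h == 32]
print(rows)        # [(0, 'fbx', 32), (1, 'bxz', 32), (2, 'xza', 91), (3, 'zaa', 18), (4, 'aad', 76)]
print(matches[0])  # fbx, the earliest of the two matches
```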

Complexity Analysis

| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character enters the rolling hash once and leaves it at most once |
| Space | O(1) | Only a few integer variables are maintained |

The algorithm performs a single pass through the string. Every rolling hash update is constant time, so the total runtime scales linearly with the input size.

No auxiliary arrays or hash maps are required, which keeps the extra space usage constant.

Test Cases

sol = Solution()

# Example 1
assert sol.subStrHash("leetcode", 7, 20, 2, 0) == "ee"

# Example 2
assert sol.subStrHash("fbxzaad", 31, 100, 3, 32) == "fbx"

# Entire string is the answer
assert sol.subStrHash("abc", 3, 100, 3, 34) == "abc"

# Single character substring
assert sol.subStrHash("z", 5, 7, 1, 5) == "z"

# Multiple valid substrings, must return first
assert sol.subStrHash("aaaaa", 3, 100, 2, 4) == "aa"

# Small modulo causing many collisions
assert sol.subStrHash("abcdef", 2, 3, 2, 2) == "ab"

# Window size 1
assert sol.subStrHash("xyz", 10, 100, 1, 24) == "x"

# Large power value
assert sol.subStrHash("abcde", 1000000000, 97, 2, 69) == "ab"

# Repeated characters
assert sol.subStrHash("zzzzzz", 5, 13, 3, 0) == "zzz"

# Valid substring near end
assert sol.subStrHash("abcdefgh", 3, 1000, 2, 31) == "gh"

Test Summary

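| Test | Why |
|---|---|
| "leetcode" example | Verifies standard behavior |
| "fbxzaad" example | Verifies that the earliest of two matches is returned |
| Entire string | Ensures a full-length window works |
| Single character | Smallest possible input (n = k = 1) |
| Multiple valid substrings ("aaaaa") | Every window matches, so the first must be chosen |
| Small modulo | Tests a collision-heavy scenario (every window hashes to 2) |
| k = 1 | Simplest rolling hash case |
| Large power | Verifies modular arithmetic correctness with power = 10^9 |
| Repeated characters ("zzzzzz") | Tests repeated-value handling (every window hashes to 0) |
| Match near boundary | Verifies edge indexing correctness |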
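Because the solution initializes answer_index to 0, an assertion whose hashValue never occurs would still pass whenever the expected answer happens to be s[:k]. The sketch below is test scaffolding, not part of the solution. It confirms that each case's hashValue is actually produced and that the expected string is the first window producing it. The list restates the test inputs, using hashValue values that the expected substrings actually produce. `window_hash` is a hypothetical helper.

```python
def window_hash(w: str, p: int, m: int) -> int:
    # Direct evaluation of the problem's hash formula.
    return sum((ord(c) - ord('a') + 1) * pow(p, j, m) for j, c in enumerate(w)) % m


# (s, power, modulo, k, hashValue, expected answer)
cases = [
    ("leetcode", 7, 20, 2, 0, "ee"),
    ("fbxzaad", 31, 100, 3, 32, "fbx"),
    ("abc", 3, 100, 3, 34, "abc"),
    ("z", 5, 7, 1, 5, "z"),
    ("aaaaa", 3, 100, 2, 4, "aa"),
    ("abcdef", 2, 3, 2, 2, "ab"),
    ("xyz", 10, 100, 1, 24, "x"),
    ("abcde", 1000000000, 97, 2, 69, "ab"),
    ("zzzzzz", 5, 13, 3, 0, "zzz"),
    ("abcdefgh", 3, 1000, 2, 31, "gh"),
]

for s, p, m, k, target, expected in cases:
    hashes = [window_hash(s[i:i + k], p, m) for i in range(len(s) - k + 1)]
    assert target in hashes, f"hashValue {target} never occurs in {s!r}"
    first = hashes.index(target)
    assert s[first:first + k] == expected, (s, expected)
print("every expected answer is the first window with the target hash")
```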

Edge Cases

One important edge case occurs when k equals the entire string length. In this situation there is only one possible substring, and the rolling hash must correctly handle a single window without attempting to remove outgoing characters. The implementation naturally handles this because the removal condition only activates when i + k < n.

Another subtle case is when multiple substrings produce the same hash value. A right-to-left scan that stopped at its first match would return the rightmost matching substring rather than the leftmost one. This solution instead overwrites the answer index on every match, so the final recorded index always corresponds to the earliest valid substring.

A third important case involves very small modulo values, which create many hash collisions. Independently of the modulus size, the subtraction step can produce negative intermediate values. Python's % maps these back into [0, modulo), and the Go version adds modulo before reducing, so the hash stays correct in both languages.

A fourth edge case involves k = 1. With single-character windows, the rolling hash effectively reduces to the character value modulo modulo. Some rolling hash implementations accidentally assume larger windows and mishandle removal logic. Here, the same unified logic works correctly for all window sizes, including one-character substrings.