LeetCode 2564 - Substring XOR Queries

LeetCode Problem 2564

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Bit Manipulation

Solution

Problem Understanding

The problem gives us a binary string s and a list of queries. Each query contains two integers, first and second.

For every query, we need to find a substring of s whose decimal value satisfies:

$$val \oplus first = second$$

where ⊕ represents the bitwise XOR operation.

We can rearrange the equation using the XOR property:

$$val = first \oplus second$$

This transformation is the key insight of the entire problem. Instead of searching for substrings that satisfy an equation, we can directly compute the exact decimal value that the substring must represent.

The task becomes:

Compute target = first ^ second
Find the shortest substring in s whose binary value equals target
If multiple substrings have the same shortest length, choose the one with the smallest left index
Return its [left, right] indices
If no such substring exists, return [-1, -1]

The binary string length is at most 10^4, while the number of queries can reach 10^5. This immediately tells us that processing every query independently with substring generation would be far too slow.

The substring value is interpreted as a binary number. For example:

"101" → 5
"11" → 3
"0" → 0

An important observation is that query values can be as large as 10^9. Since:

$$2^{30} > 10^9$$

any relevant binary substring only needs at most 30 bits. Longer substrings would exceed the possible query targets.

Several edge cases are important:

The target value may be 0, which requires handling substrings like "0"
Leading zeros are allowed in substrings, but they create longer representations of the same value, so shorter representations are always preferred
Multiple substrings may produce the same value, so we must preserve the shortest one and then the leftmost one
Queries may ask for values that never appear in the string

Approaches

Brute Force Approach

The brute-force method would generate every possible substring of s, convert each substring into its decimal value, and then compare it against every query target.

There are O(n^2) substrings in a string of length n. Converting each substring into a binary number costs additional time, and matching against up to 10^5 queries makes the approach completely impractical.

Even if we optimize conversion incrementally, the total work remains enormous because:

n can be 10^4
Number of substrings is about 5 × 10^7
Queries can be 10^5

This approach would time out easily.

Optimal Approach

The crucial observation is that every query can be reduced to a target value:

$$target = first \oplus second$$

Instead of processing queries individually, we preprocess the string once.

We enumerate every binary substring whose length is at most 30, compute its decimal value, and store the shortest occurrence of each value in a hash map.

Why only 30 characters?

Because query values are at most 10^9, and binary representations larger than 30 bits exceed that range.

This reduces preprocessing to:

$$O(n \times 30)$$

After preprocessing, each query becomes a simple hash map lookup.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n² × q)	O(1)	Enumerates all substrings for every query
Optimal	O(n × 30 + q)	O(number of unique values)	Preprocesses all relevant substring values once

Algorithm Walkthrough

Create a hash map called value_to_indices.

This map will store:

decimal_value -> [left, right]

For every value, we only keep the shortest substring that produces it. If lengths tie, we keep the leftmost occurrence. 2. Iterate through every starting index left in the string.

At each position, we attempt to build binary numbers incrementally. 3. Extend the substring up to 30 characters.

We use a running integer value:

current_value = (current_value << 1) | bit

This efficiently builds the binary number without repeatedly converting strings. 4. Store the substring if it is the best representation for that value.

If the value has never appeared before, store it immediately.

Otherwise, compare substring lengths:

Keep the shorter substring
If lengths are equal, keep the earlier starting index

Handle the value 0 naturally.

A substring "0" produces value 0. Longer strings like "00" are worse because they are longer. 6. Process queries.

For each query:

target = first ^ second

Then check whether target exists in the map. 7. Return stored indices or [-1, -1].

Why it works

Every valid query answer corresponds to a substring whose decimal value equals:

$$first \oplus second$$

The preprocessing step enumerates every possible relevant substring value that can ever appear in a query. Since we always store the shortest substring for each value, and break ties by smaller left index, the stored answer exactly matches the problem requirements.

Because all queries are answered from this precomputed map, correctness follows directly from exhaustive preprocessing over all possible substring values up to 30 bits.

Python Solution

from typing import List

class Solution:
    def substringXorQueries(self, s: str, queries: List[List[int]]) -> List[List[int]]:
        value_to_indices = {}

        n = len(s)

        for left in range(n):
            value = 0

            for right in range(left, min(n, left + 30)):
                value = (value << 1) | int(s[right])

                current_length = right - left + 1

                if value not in value_to_indices:
                    value_to_indices[value] = [left, right]
                else:
                    prev_left, prev_right = value_to_indices[value]
                    prev_length = prev_right - prev_left + 1

                    if current_length < prev_length:
                        value_to_indices[value] = [left, right]

                if value == 0:
                    break

        answer = []

        for first, second in queries:
            target = first ^ second

            if target in value_to_indices:
                answer.append(value_to_indices[target])
            else:
                answer.append([-1, -1])

        return answer

The implementation begins by preprocessing the string. The nested loop generates every substring starting at each index, but only up to length 30.

The variable value is updated incrementally using bit operations. Shifting left multiplies the current number by 2, and OR-ing with the new bit appends the next binary digit.

The hash map stores the best substring for every value encountered.

The condition:

if value == 0:
    break

is an optimization. Once the substring value becomes zero, extending it only produces longer zero representations, which can never be optimal.

After preprocessing, query handling becomes extremely efficient. Each query computes:

target = first ^ second

and performs a constant-time hash map lookup.

Go Solution

func substringXorQueries(s string, queries [][]int) [][]int {
	valueToIndices := make(map[int][]int)

	n := len(s)

	for left := 0; left < n; left++ {
		value := 0

		limit := left + 30
		if limit > n {
			limit = n
		}

		for right := left; right < limit; right++ {
			value = (value << 1) | int(s[right]-'0')

			currentLength := right - left + 1

			if prev, exists := valueToIndices[value]; !exists {
				valueToIndices[value] = []int{left, right}
			} else {
				prevLength := prev[1] - prev[0] + 1

				if currentLength < prevLength {
					valueToIndices[value] = []int{left, right}
				}
			}

			if value == 0 {
				break
			}
		}
	}

	answer := make([][]int, 0, len(queries))

	for _, query := range queries {
		target := query[0] ^ query[1]

		if indices, exists := valueToIndices[target]; exists {
			answer = append(answer, indices)
		} else {
			answer = append(answer, []int{-1, -1})
		}
	}

	return answer
}

The Go implementation follows the same logic as the Python version.

A map[int][]int is used instead of a Python dictionary. Binary value construction uses integer bit operations exactly the same way.

Go strings are byte arrays, so individual bits are extracted with:

s[right] - '0'

The solution safely fits within Go's integer range because substring lengths never exceed 30 bits.

Worked Examples

Example 1

s = "101101"
queries = [[0,5],[1,2]]

Preprocessing Phase

Substring	Value	Indices
"1"	1	[0,0]
"10"	2	[0,1]
"101"	5	[0,2]
"11"	3	[2,3]
"110"	6	[2,4]
"1011"	11	[0,3]

Only the shortest occurrence for each value is stored.

Query 1

first = 0
second = 5
target = 0 ^ 5 = 5

Lookup:

5 -> [0,2]

Answer:

[0,2]

Query 2

first = 1
second = 2
target = 1 ^ 2 = 3

Lookup:

3 -> [2,3]

Answer:

[2,3]

Final result:

[[0,2],[2,3]]

Example 2

s = "0101"
queries = [[12,8]]

Query

target = 12 ^ 8 = 4

Binary value 4 corresponds to "100".

No substring in "0101" has value 4.

Answer:

[-1,-1]

Example 3

s = "1"
queries = [[4,5]]

Query

target = 4 ^ 5 = 1

Substring "1" at [0,0] has value 1.

Answer:

[0,0]

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n × 30 + q)	Each position explores at most 30 substrings, then each query is O(1)
Space	O(number of unique substring values)	Hash map stores one entry per distinct value

The preprocessing phase dominates the runtime. Since each starting position generates at most 30 substrings, total substring processing is bounded by:

$$30n$$

which is effectively linear.

Query processing is constant time because every answer is a direct hash map lookup.

Test Cases

from typing import List

class Solution:
    def substringXorQueries(self, s: str, queries: List[List[int]]) -> List[List[int]]:
        value_to_indices = {}

        n = len(s)

        for left in range(n):
            value = 0

            for right in range(left, min(n, left + 30)):
                value = (value << 1) | int(s[right])

                current_length = right - left + 1

                if value not in value_to_indices:
                    value_to_indices[value] = [left, right]
                else:
                    prev_left, prev_right = value_to_indices[value]
                    prev_length = prev_right - prev_left + 1

                    if current_length < prev_length:
                        value_to_indices[value] = [left, right]

                if value == 0:
                    break

        answer = []

        for first, second in queries:
            target = first ^ second

            if target in value_to_indices:
                answer.append(value_to_indices[target])
            else:
                answer.append([-1, -1])

        return answer

sol = Solution()

# Provided example 1
assert sol.substringXorQueries(
    "101101",
    [[0, 5], [1, 2]]
) == [[0, 2], [2, 3]]

# Provided example 2
assert sol.substringXorQueries(
    "0101",
    [[12, 8]]
) == [[-1, -1]]

# Provided example 3
assert sol.substringXorQueries(
    "1",
    [[4, 5]]
) == [[0, 0]]

# Single zero character
assert sol.substringXorQueries(
    "0",
    [[1, 1]]
) == [[0, 0]]

# Multiple zeros, shortest substring should be chosen
assert sol.substringXorQueries(
    "0000",
    [[5, 5]]
) == [[0, 0]]

# Multiple representations of same value, shortest wins
assert sol.substringXorQueries(
    "0011",
    [[0, 3]]
) == [[2, 3]]

# Query target absent
assert sol.substringXorQueries(
    "1010",
    [[7, 0]]
) == [[-1, -1]]

# Tie in length, smaller left index wins
assert sol.substringXorQueries(
    "10101",
    [[0, 5]]
) == [[0, 2]]

# Large binary values
assert sol.substringXorQueries(
    "111111111111111111111111111111",
    [[0, 1073741823]]
) == [[0, 29]]

# Mixed queries
assert sol.substringXorQueries(
    "1101001",
    [[3, 6], [4, 5], [1, 0]]
) == [[0, 0], [4, 4], [1, 1]]

Test Summary

Test	Why
`"101101"` basic example	Verifies standard functionality
`"0101"` missing target	Confirms `[-1,-1]` handling
`"1"` single character	Smallest non-empty input
`"0"` single zero	Validates zero handling
`"0000"` repeated zeros	Ensures shortest zero substring
`"0011"` repeated values	Confirms shortest representation selection
`"1010"` absent value	Tests failed lookup
`"10101"` tie handling	Ensures leftmost tie-breaking
30-bit binary string	Verifies maximum relevant substring length
Mixed queries	Tests multiple simultaneous lookups

Edge Cases

Target Value Equals Zero

The value 0 is special because many substrings with leading zeros also evaluate to zero.

For example:

"0"
"00"
"000"

all represent decimal value 0.

The correct answer must always be the shortest substring, which is simply "0".

The implementation handles this naturally because once value == 0, the inner loop breaks immediately. Longer zero substrings are never considered.

Multiple Substrings Produce the Same Value

Binary strings with leading zeros can generate duplicate values.

For example:

"11"   -> 3
"011"  -> 3
"0011" -> 3

The problem requires choosing the shortest substring. If lengths tie, choose the smallest left index.

The hash map update logic explicitly compares substring lengths before replacing an existing answer.

Large Query Values

Query integers may be as large as 10^9.

A naive implementation might attempt to generate arbitrarily long substrings, causing unnecessary work and possible overflow concerns.

The implementation avoids this by limiting substring exploration to 30 bits, because:

$$2^{30} > 10^9$$

This keeps preprocessing efficient and guarantees all relevant values are still covered correctly.