LeetCode 2564 - Substring XOR Queries
The problem gives us a binary string s and a list of queries. Each query contains two integers, first and second. For every query, we need to find a substring of s whose decimal value satisfies: where ⊕ represents the bitwise XOR operation.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Bit Manipulation
Solution
Problem Understanding
The problem gives us a binary string s and a list of queries. Each query contains two integers, first and second.
For every query, we need to find a substring of s whose decimal value satisfies:
$$val \oplus first = second$$
where ⊕ represents the bitwise XOR operation.
We can rearrange the equation using the XOR property:
$$val = first \oplus second$$
This transformation is the key insight of the entire problem. Instead of searching for substrings that satisfy an equation, we can directly compute the exact decimal value that the substring must represent.
The task becomes:
- Compute
target = first ^ second - Find the shortest substring in
swhose binary value equalstarget - If multiple substrings have the same shortest length, choose the one with the smallest left index
- Return its
[left, right]indices - If no such substring exists, return
[-1, -1]
The binary string length is at most 10^4, while the number of queries can reach 10^5. This immediately tells us that processing every query independently with substring generation would be far too slow.
The substring value is interpreted as a binary number. For example:
"101"→5"11"→3"0"→0
An important observation is that query values can be as large as 10^9. Since:
$$2^{30} > 10^9$$
any relevant binary substring only needs at most 30 bits. Longer substrings would exceed the possible query targets.
Several edge cases are important:
- The target value may be
0, which requires handling substrings like"0" - Leading zeros are allowed in substrings, but they create longer representations of the same value, so shorter representations are always preferred
- Multiple substrings may produce the same value, so we must preserve the shortest one and then the leftmost one
- Queries may ask for values that never appear in the string
Approaches
Brute Force Approach
The brute-force method would generate every possible substring of s, convert each substring into its decimal value, and then compare it against every query target.
There are O(n^2) substrings in a string of length n. Converting each substring into a binary number costs additional time, and matching against up to 10^5 queries makes the approach completely impractical.
Even if we optimize conversion incrementally, the total work remains enormous because:
ncan be10^4- Number of substrings is about
5 × 10^7 - Queries can be
10^5
This approach would time out easily.
Optimal Approach
The crucial observation is that every query can be reduced to a target value:
$$target = first \oplus second$$
Instead of processing queries individually, we preprocess the string once.
We enumerate every binary substring whose length is at most 30, compute its decimal value, and store the shortest occurrence of each value in a hash map.
Why only 30 characters?
Because query values are at most 10^9, and binary representations larger than 30 bits exceed that range.
This reduces preprocessing to:
$$O(n \times 30)$$
After preprocessing, each query becomes a simple hash map lookup.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n² × q) | O(1) | Enumerates all substrings for every query |
| Optimal | O(n × 30 + q) | O(number of unique values) | Preprocesses all relevant substring values once |
Algorithm Walkthrough
- Create a hash map called
value_to_indices.
This map will store:
decimal_value -> [left, right]
For every value, we only keep the shortest substring that produces it. If lengths tie, we keep the leftmost occurrence.
2. Iterate through every starting index left in the string.
At each position, we attempt to build binary numbers incrementally. 3. Extend the substring up to 30 characters.
We use a running integer value:
current_value = (current_value << 1) | bit
This efficiently builds the binary number without repeatedly converting strings. 4. Store the substring if it is the best representation for that value.
If the value has never appeared before, store it immediately.
Otherwise, compare substring lengths:
- Keep the shorter substring
- If lengths are equal, keep the earlier starting index
- Handle the value
0naturally.
A substring "0" produces value 0. Longer strings like "00" are worse because they are longer.
6. Process queries.
For each query:
target = first ^ second
Then check whether target exists in the map.
7. Return stored indices or [-1, -1].
Why it works
Every valid query answer corresponds to a substring whose decimal value equals:
$$first \oplus second$$
The preprocessing step enumerates every possible relevant substring value that can ever appear in a query. Since we always store the shortest substring for each value, and break ties by smaller left index, the stored answer exactly matches the problem requirements.
Because all queries are answered from this precomputed map, correctness follows directly from exhaustive preprocessing over all possible substring values up to 30 bits.
Python Solution
from typing import List
class Solution:
def substringXorQueries(self, s: str, queries: List[List[int]]) -> List[List[int]]:
value_to_indices = {}
n = len(s)
for left in range(n):
value = 0
for right in range(left, min(n, left + 30)):
value = (value << 1) | int(s[right])
current_length = right - left + 1
if value not in value_to_indices:
value_to_indices[value] = [left, right]
else:
prev_left, prev_right = value_to_indices[value]
prev_length = prev_right - prev_left + 1
if current_length < prev_length:
value_to_indices[value] = [left, right]
if value == 0:
break
answer = []
for first, second in queries:
target = first ^ second
if target in value_to_indices:
answer.append(value_to_indices[target])
else:
answer.append([-1, -1])
return answer
The implementation begins by preprocessing the string. The nested loop generates every substring starting at each index, but only up to length 30.
The variable value is updated incrementally using bit operations. Shifting left multiplies the current number by 2, and OR-ing with the new bit appends the next binary digit.
The hash map stores the best substring for every value encountered.
The condition:
if value == 0:
break
is an optimization. Once the substring value becomes zero, extending it only produces longer zero representations, which can never be optimal.
After preprocessing, query handling becomes extremely efficient. Each query computes:
target = first ^ second
and performs a constant-time hash map lookup.
Go Solution
func substringXorQueries(s string, queries [][]int) [][]int {
valueToIndices := make(map[int][]int)
n := len(s)
for left := 0; left < n; left++ {
value := 0
limit := left + 30
if limit > n {
limit = n
}
for right := left; right < limit; right++ {
value = (value << 1) | int(s[right]-'0')
currentLength := right - left + 1
if prev, exists := valueToIndices[value]; !exists {
valueToIndices[value] = []int{left, right}
} else {
prevLength := prev[1] - prev[0] + 1
if currentLength < prevLength {
valueToIndices[value] = []int{left, right}
}
}
if value == 0 {
break
}
}
}
answer := make([][]int, 0, len(queries))
for _, query := range queries {
target := query[0] ^ query[1]
if indices, exists := valueToIndices[target]; exists {
answer = append(answer, indices)
} else {
answer = append(answer, []int{-1, -1})
}
}
return answer
}
The Go implementation follows the same logic as the Python version.
A map[int][]int is used instead of a Python dictionary. Binary value construction uses integer bit operations exactly the same way.
Go strings are byte arrays, so individual bits are extracted with:
s[right] - '0'
The solution safely fits within Go's integer range because substring lengths never exceed 30 bits.
Worked Examples
Example 1
s = "101101"
queries = [[0,5],[1,2]]
Preprocessing Phase
| Substring | Value | Indices |
|---|---|---|
| "1" | 1 | [0,0] |
| "10" | 2 | [0,1] |
| "101" | 5 | [0,2] |
| "11" | 3 | [2,3] |
| "110" | 6 | [2,4] |
| "1011" | 11 | [0,3] |
Only the shortest occurrence for each value is stored.
Query 1
first = 0
second = 5
target = 0 ^ 5 = 5
Lookup:
5 -> [0,2]
Answer:
[0,2]
Query 2
first = 1
second = 2
target = 1 ^ 2 = 3
Lookup:
3 -> [2,3]
Answer:
[2,3]
Final result:
[[0,2],[2,3]]
Example 2
s = "0101"
queries = [[12,8]]
Query
target = 12 ^ 8 = 4
Binary value 4 corresponds to "100".
No substring in "0101" has value 4.
Answer:
[-1,-1]
Example 3
s = "1"
queries = [[4,5]]
Query
target = 4 ^ 5 = 1
Substring "1" at [0,0] has value 1.
Answer:
[0,0]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n × 30 + q) | Each position explores at most 30 substrings, then each query is O(1) |
| Space | O(number of unique substring values) | Hash map stores one entry per distinct value |
The preprocessing phase dominates the runtime. Since each starting position generates at most 30 substrings, total substring processing is bounded by:
$$30n$$
which is effectively linear.
Query processing is constant time because every answer is a direct hash map lookup.
Test Cases
from typing import List
class Solution:
def substringXorQueries(self, s: str, queries: List[List[int]]) -> List[List[int]]:
value_to_indices = {}
n = len(s)
for left in range(n):
value = 0
for right in range(left, min(n, left + 30)):
value = (value << 1) | int(s[right])
current_length = right - left + 1
if value not in value_to_indices:
value_to_indices[value] = [left, right]
else:
prev_left, prev_right = value_to_indices[value]
prev_length = prev_right - prev_left + 1
if current_length < prev_length:
value_to_indices[value] = [left, right]
if value == 0:
break
answer = []
for first, second in queries:
target = first ^ second
if target in value_to_indices:
answer.append(value_to_indices[target])
else:
answer.append([-1, -1])
return answer
sol = Solution()
# Provided example 1
assert sol.substringXorQueries(
"101101",
[[0, 5], [1, 2]]
) == [[0, 2], [2, 3]]
# Provided example 2
assert sol.substringXorQueries(
"0101",
[[12, 8]]
) == [[-1, -1]]
# Provided example 3
assert sol.substringXorQueries(
"1",
[[4, 5]]
) == [[0, 0]]
# Single zero character
assert sol.substringXorQueries(
"0",
[[1, 1]]
) == [[0, 0]]
# Multiple zeros, shortest substring should be chosen
assert sol.substringXorQueries(
"0000",
[[5, 5]]
) == [[0, 0]]
# Multiple representations of same value, shortest wins
assert sol.substringXorQueries(
"0011",
[[0, 3]]
) == [[2, 3]]
# Query target absent
assert sol.substringXorQueries(
"1010",
[[7, 0]]
) == [[-1, -1]]
# Tie in length, smaller left index wins
assert sol.substringXorQueries(
"10101",
[[0, 5]]
) == [[0, 2]]
# Large binary values
assert sol.substringXorQueries(
"111111111111111111111111111111",
[[0, 1073741823]]
) == [[0, 29]]
# Mixed queries
assert sol.substringXorQueries(
"1101001",
[[3, 6], [4, 5], [1, 0]]
) == [[0, 0], [4, 4], [1, 1]]
Test Summary
| Test | Why |
|---|---|
"101101" basic example |
Verifies standard functionality |
"0101" missing target |
Confirms [-1,-1] handling |
"1" single character |
Smallest non-empty input |
"0" single zero |
Validates zero handling |
"0000" repeated zeros |
Ensures shortest zero substring |
"0011" repeated values |
Confirms shortest representation selection |
"1010" absent value |
Tests failed lookup |
"10101" tie handling |
Ensures leftmost tie-breaking |
| 30-bit binary string | Verifies maximum relevant substring length |
| Mixed queries | Tests multiple simultaneous lookups |
Edge Cases
Target Value Equals Zero
The value 0 is special because many substrings with leading zeros also evaluate to zero.
For example:
"0"
"00"
"000"
all represent decimal value 0.
The correct answer must always be the shortest substring, which is simply "0".
The implementation handles this naturally because once value == 0, the inner loop breaks immediately. Longer zero substrings are never considered.
Multiple Substrings Produce the Same Value
Binary strings with leading zeros can generate duplicate values.
For example:
"11" -> 3
"011" -> 3
"0011" -> 3
The problem requires choosing the shortest substring. If lengths tie, choose the smallest left index.
The hash map update logic explicitly compares substring lengths before replacing an existing answer.
Large Query Values
Query integers may be as large as 10^9.
A naive implementation might attempt to generate arbitrarily long substrings, causing unnecessary work and possible overflow concerns.
The implementation avoids this by limiting substring exploration to 30 bits, because:
$$2^{30} > 10^9$$
This keeps preprocessing efficient and guarantees all relevant values are still covered correctly.