LeetCode 1794 - Count Pairs of Equal Substrings With Minimum Difference

This problem asks us to count quadruples of indices (i, j, a, b) where substrings from two given strings are equal and the difference j - a is minimized. Specifically, i and j define a substring in firstString, while a and b define a substring in secondString.

LeetCode Problem 1794

Difficulty: 🟡 Medium
Topics: Hash Table, String, Greedy

Solution

Problem Understanding

This problem asks us to count quadruples of indices (i, j, a, b) where substrings from two given strings are equal and the difference j - a is minimized. Specifically, i and j define a substring in firstString, while a and b define a substring in secondString. The substrings themselves must be identical, and among all valid quadruples, we only consider those that achieve the minimum possible j - a. The output is the count of such quadruples.

The inputs are two non-empty lowercase English strings, each up to 2 * 10^5 characters. This constraint immediately indicates that a naive brute-force solution that examines all substring pairs would be too slow, since it would involve O(n^2 * m^2) operations for strings of lengths n and m.

Important observations include the need to carefully manage substring comparison efficiently, possibly using rolling hashes or a direct character-by-character sliding technique, and to track the minimal j - a encountered.

Edge cases that could trip up a naive approach include strings that share no common characters, strings that are identical, or strings where multiple matches exist at different offsets producing different j - a values.

Approaches

The brute-force approach would iterate over all substrings of firstString and compare them with all substrings of secondString. For each match, compute j - a and update a count for the minimum found. This is correct in principle because it examines all possibilities, but it is extremely inefficient due to the nested loops over all possible substring pairs.

The optimal approach relies on the observation that the problem is essentially about finding equal-length substrings starting at certain indices and minimizing the offset between them. We can use a sliding window or rolling hash technique to compare substrings efficiently. Instead of generating all substrings, we generate all substrings starting at each index, and keep track of matches in a hash map or dictionary, simultaneously updating the minimum j - a and counting quadruples that achieve it. This reduces the time complexity significantly to something feasible for n, m <= 2 * 10^5.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2 * m^2) O(1) Check all substring pairs, too slow
Optimal O(n * m) O(min(n, m)) Sliding window or rolling hash to compare substrings efficiently and track minimum j - a

Algorithm Walkthrough

  1. Initialize min_diff to a large value and count to 0. min_diff will store the minimum j - a seen so far, and count will track the number of quadruples achieving this minimum.
  2. Iterate over all starting indices i in firstString.
  3. For each i, iterate over starting indices a in secondString.
  4. For each pair (i, a), extend the substring while characters match (firstString[i+k] == secondString[a+k]) and track the end indices j = i + k and b = a + k.
  5. Compute diff = j - a.
  6. If diff < min_diff, update min_diff = diff and reset count = 1.
  7. If diff == min_diff, increment count.
  8. Return count.

Why it works: The algorithm ensures we examine all pairs of substrings but only count those that achieve the minimum j - a. By comparing character by character, we efficiently avoid generating all substrings explicitly, and tracking min_diff guarantees correctness.

Python Solution

class Solution:
    def countQuadruples(self, firstString: str, secondString: str) -> int:
        n, m = len(firstString), len(secondString)
        min_diff = float('inf')
        count = 0

        for i in range(n):
            for a in range(m):
                k = 0
                while i + k < n and a + k < m and firstString[i + k] == secondString[a + k]:
                    j = i + k
                    b = a + k
                    diff = j - a
                    if diff < min_diff:
                        min_diff = diff
                        count = 1
                    elif diff == min_diff:
                        count += 1
                    k += 1

        return count

The Python implementation directly translates the algorithm steps. The outer loops iterate over all possible starting indices. The inner while loop extends the substring until a mismatch is found. We track diff = j - a and update the min_diff and count accordingly. This avoids explicit substring construction and ensures efficiency.

Go Solution

func countQuadruples(firstString string, secondString string) int {
    n, m := len(firstString), len(secondString)
    minDiff := int(1e9)
    count := 0

    for i := 0; i < n; i++ {
        for a := 0; a < m; a++ {
            k := 0
            for i+k < n && a+k < m && firstString[i+k] == secondString[a+k] {
                j := i + k
                diff := j - a
                if diff < minDiff {
                    minDiff = diff
                    count = 1
                } else if diff == minDiff {
                    count++
                }
                k++
            }
        }
    }

    return count
}

The Go implementation mirrors the Python solution. Differences include explicit integer handling and using a large number to initialize minDiff. The substring comparison uses index arithmetic since Go strings cannot be indexed like slices of runes without conversion, but for ASCII lowercase letters, direct byte indexing is valid.

Worked Examples

Example 1: firstString = "abcd", secondString = "bccda"

i a k Matching substring j b j-a min_diff count
0 0 0 a vs b - - - inf 0
0 4 0 a vs a 0 4 0-4=-4 -4 1
1 1 ... b vs c ... ... ... ... ...

The only quadruple minimizing j-a is (0,0,4,4) with j-a=-4. Output is 1.

Example 2: firstString = "ab", secondString = "cd"

No characters match, so the loop never finds a valid substring. Output is 0.

Complexity Analysis

Measure Complexity Explanation
Time O(n * m * l) Outer loops over all starting indices, inner loop extends matching substring of length l on average
Space O(1) Only a few integers are tracked; no additional data structures needed

The algorithm avoids storing all substrings and only tracks the current matching sequence, keeping space minimal.

Test Cases

# Provided examples
assert Solution().countQuadruples("abcd", "bccda") == 1  # minimal j-a
assert Solution().countQuadruples("ab", "cd") == 0      # no matches

# Edge cases
assert Solution().countQuadruples("a", "a") == 1         # single character match
assert Solution().countQuadruples("abc", "abc") == 6    # all substrings match
assert Solution().countQuadruples("abc", "def") == 0    # no match
assert Solution().countQuadruples("aaaa", "aa") == 6    # repeated characters
assert Solution().countQuadruples("abcd", "dabc") == 1  # wrap-around minimal j-a
Test Why
"abcd", "bccda" Example demonstrating single minimal quadruple
"ab", "cd" Example with no matches
"a", "a" Smallest input size
"abc", "abc" Multiple overlapping matches
"abc", "def" No matching substrings
"aaaa", "aa" Repeated character substrings
"abcd", "dabc" Matches at different offsets, checking minimal j-a

Edge Cases

One important edge case is completely non-overlapping strings like "ab" and "cd". The algorithm handles this naturally since the inner while loop never runs, and the count remains zero.

Another case is single-character strings, which tests that the algorithm correctly identifies matches when i == j and a == b. Both Python and Go solutions handle this seamlessly since the loop conditions include equality checks and array bounds.

A third case is strings with repeated characters, such as "aaaa" and "aa", which can produce multiple valid quadruples. The algorithm correctly counts all quadruples that achieve the minimal j-a by extending the match for each starting index and updating the count only when diff == min_diff. This prevents overcounting while ensuring correctness.