LeetCode 1316 - Distinct Echo Substrings

The problem asks us to find the number of distinct non-empty substrings of a given string text that can be expressed as

LeetCode Problem 1316

Difficulty: 🔴 Hard
Topics: String, Trie, Rolling Hash, Hash Function

Solution

Problem Understanding

The problem asks us to find the number of distinct non-empty substrings of a given string text that can be expressed as the concatenation of some string with itself. In other words, we need to count all substrings of the form a + a, where a is any non-empty string. For example, "abcabc" can be represented as "abc" + "abc", making it an echo substring. Substrings like "aa" and "abab" also qualify if they satisfy this repetition property.

The input is a string text of length between 1 and 2000, containing only lowercase English letters. The output is an integer representing the count of distinct echo substrings.

Important points include recognizing that substrings may overlap and that only distinct substrings should be counted. Edge cases involve very short strings (length 1, which cannot have echo substrings) or strings with repeated single characters (like "aaaaa") where multiple overlapping echo substrings exist.

The constraints tell us that a brute-force solution with O(n^3) complexity is likely too slow, and we need a more optimized method, potentially using hashing or tries to detect repeated patterns efficiently.

Approaches

The brute-force approach involves generating all possible substrings of text and checking for each whether it can be split into two equal halves. This works by iterating over every possible starting index i and every even length l (since only even-length strings can form a + a). For each candidate substring, we check if the first half equals the second half. While correct, this approach has a time complexity of O(n^3) because generating all substrings is O(n^2) and comparing halves is O(n), making it infeasible for n = 2000.

The key insight for an optimal approach is to use rolling hash or trie-based techniques to compare substrings efficiently. Rolling hash allows constant-time substring comparison after preprocessing the hash values of all prefixes. By iterating over possible lengths and using hashing, we can check whether the two halves of a substring are equal in O(1) time per substring. This reduces the time complexity significantly.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(n^2) Generate all substrings and compare halves directly
Rolling Hash / Trie O(n^2) O(n^2) Use hashing to compare substrings efficiently

Algorithm Walkthrough

  1. Precompute the rolling hash of all prefixes of text. Choose a base (e.g., 26) and a large prime modulus to avoid collisions.
  2. Initialize an empty set seen to store distinct echo substrings.
  3. Iterate over all possible lengths length of the echo substring. Since an echo substring is a + a, only consider even lengths.
  4. For each starting index i, calculate the hash of the first half (i to i + length // 2) and the second half (i + length // 2 to i + length).
  5. If the two halves have equal hash values, add the substring text[i:i + length] to seen.
  6. Continue until all even lengths and starting indices have been processed.
  7. Return the size of the seen set, which gives the number of distinct echo substrings.

Why it works: The rolling hash ensures that substring equality checks are efficient and collision-resistant with a proper choice of base and modulus. By iterating over all starting indices and lengths, we guarantee that all potential echo substrings are considered. Using a set ensures that duplicates are counted only once.

Python Solution

class Solution:
    def distinctEchoSubstrings(self, text: str) -> int:
        n = len(text)
        base = 26
        mod = 10**9 + 7
        
        # Precompute prefix hashes and powers of base
        prefix_hash = [0] * (n + 1)
        power = [1] * (n + 1)
        
        for i in range(n):
            prefix_hash[i + 1] = (prefix_hash[i] * base + ord(text[i]) - ord('a')) % mod
            power[i + 1] = (power[i] * base) % mod
        
        def get_hash(l, r):
            return (prefix_hash[r] - prefix_hash[l] * power[r - l]) % mod
        
        seen = set()
        
        for length in range(2, n + 1, 2):
            for i in range(n - length + 1):
                mid = i + length // 2
                if get_hash(i, mid) == get_hash(mid, i + length):
                    seen.add(text[i:i + length])
        
        return len(seen)

Implementation Explanation: First, the code computes rolling hash values for all prefixes and precomputes powers of the base to enable O(1) substring hash queries. For each possible even length, it iterates over all starting positions, calculates the hashes of the two halves, and if they match, adds the substring to the set seen. Finally, it returns the number of distinct echo substrings.

Go Solution

func distinctEchoSubstrings(text string) int {
    n := len(text)
    base := 26
    mod := int(1e9 + 7)
    
    prefixHash := make([]int, n+1)
    power := make([]int, n+1)
    power[0] = 1
    
    for i := 0; i < n; i++ {
        prefixHash[i+1] = (prefixHash[i]*base + int(text[i]-'a')) % mod
        power[i+1] = (power[i] * base) % mod
    }
    
    getHash := func(l, r int) int {
        return (prefixHash[r] - prefixHash[l]*power[r-l]%mod + mod) % mod
    }
    
    seen := make(map[string]struct{})
    
    for length := 2; length <= n; length += 2 {
        for i := 0; i <= n-length; i++ {
            mid := i + length/2
            if getHash(i, mid) == getHash(mid, i+length) {
                seen[text[i:i+length]] = struct{}{}
            }
        }
    }
    
    return len(seen)
}

Go Notes: The Go version mirrors the Python solution. Maps are used instead of sets, and modulo arithmetic ensures positive values. Slices are used to extract substrings when storing in the map.

Worked Examples

Example 1: "abcabcabc"

i length first half second half Equal? seen set
0 6 "abc" "abc" Yes {"abcabc"}
1 6 "bca" "bca" Yes {"abcabc","bcabca"}
2 6 "cab" "cab" Yes {"abcabc","bcabca","cabcab"}
3 6 "abc" "abc" Yes already in set
...

Output: 3

Example 2: "leetcodeleetcode"

i length first half second half Equal? seen set
1 2 "ee" "ee" Yes {"ee"}
0 16 "leetcode" "leetcode" Yes {"ee","leetcodeleetcode"}

Output: 2

Complexity Analysis

Measure Complexity Explanation
Time O(n^2) Iterate over O(n) starting positions for O(n) lengths, with O(1) hash comparisons
Space O(n^2) Storing prefix hashes O(n), set/map may contain up to O(n^2) substrings

The O(n^2) time is feasible for n <= 2000. Rolling hash ensures constant-time substring equality checks, avoiding O(n) comparisons.

Test Cases

# Provided examples
assert Solution().distinctEchoSubstrings("abcabcabc") == 3  # multiple overlapping substrings
assert Solution().distinctEchoSubstrings("leetcodeleetcode") == 2  # includes long substring

# Edge cases
assert Solution().distinctEchoSubstrings("a") == 0  # length 1, no echo substrings
assert Solution().distinctEchoSubstrings("aa") == 1  # simplest echo substring
assert Solution().distinctEchoSubstrings("aaaa") == 3  # "aa", "aaaa" overlapping

# Random cases
assert Solution().distinctEchoSubstrings("abcd") == 0  # no repeated halves
assert Solution().distinctEchoSubstrings("ababab") == 2  # "abab", "ababab"
Test Why
"abcabcabc" Multiple overlapping echo substrings
"leetcodeleetcode" Includes long substring
"a" Too short, should return 0
"aa" Minimal valid echo substring
"aaaa" Multiple overlapping repeats
"abcd" No echo substring present
"ababab" Repeating pattern of odd length