LeetCode 1698 - Number of Distinct Substrings in a String

The problem asks for the total number of distinct substrings of a given string s. A substring is any contiguous sequence of characters from the string, including single-character substrings and the string itself.

LeetCode Problem 1698

Difficulty: 🟡 Medium
Topics: String, Trie, Rolling Hash, Suffix Array, Hash Function

Solution

Problem Understanding

The problem asks for the total number of distinct substrings of a given string s. A substring is any contiguous sequence of characters from the string, including single-character substrings and the string itself. The output is an integer representing how many unique substrings can be formed from s.

For example, in the string "aabbaba", substrings like "a", "aa", "ab", "aba" and so on are counted, but duplicates are ignored. The result should include all possible lengths, from 1 up to the full length of the string.

The constraints indicate that s has at most 500 characters and only lowercase letters. This makes brute-force solutions that enumerate all possible substrings feasible but not optimal. Edge cases to consider include strings where all characters are the same, strings with all unique characters, and very short strings (length 1).

Approaches

The simplest approach is brute-force: generate all possible substrings, store them in a set to remove duplicates, and then return the size of the set. While correct, this approach is slow because it requires generating roughly n*(n+1)/2 substrings for a string of length n and each substring comparison may take linear time. The overall complexity is about O(n^3).

A more efficient approach leverages Trie, Rolling Hash, or Suffix Array. The key insight is that we do not need to repeatedly compare substrings character by character. A Trie can store all substrings efficiently: each path from the root represents a substring, and inserting all substrings guarantees that duplicates are automatically handled. Rolling hash allows for constant-time substring comparisons using hash values, enabling an O(n^2) approach. For O(n) time, a Suffix Automaton is ideal, but that is more advanced and involves building a structure representing all substrings.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(n^2) Generate all substrings and store in a set
Trie / Rolling Hash O(n^2) O(n^2) Insert all substrings into a Trie or compute hashes to track uniqueness efficiently
Suffix Automaton O(n) O(n) Advanced technique to store all substrings implicitly

Algorithm Walkthrough

We will describe the Rolling Hash / Set approach, which is simple and efficient for the input size:

  1. Initialize an empty set unique_substrings to store hash values of substrings.
  2. Iterate over all starting indices i of the string s.
  3. For each i, iterate over all ending indices j >= i to extract substrings s[i:j+1].
  4. Compute the rolling hash for s[i:j+1] to avoid comparing substrings directly.
  5. Insert the hash value into unique_substrings.
  6. After processing all substrings, return the size of unique_substrings.

Why it works: The rolling hash ensures that identical substrings map to the same hash value, so duplicates are automatically filtered when stored in a set. Since we consider all possible start and end positions, all substrings are accounted for.

Python Solution

class Solution:
    def countDistinct(self, s: str) -> int:
        n = len(s)
        mod = 10**9 + 7
        base = 31
        unique_hashes = set()
        
        for i in range(n):
            current_hash = 0
            for j in range(i, n):
                current_hash = (current_hash * base + (ord(s[j]) - ord('a') + 1)) % mod
                unique_hashes.add(current_hash)
        
        return len(unique_hashes)

Explanation: We iterate through all substrings starting at i. The inner loop extends the substring to j, computing the hash incrementally. ord(s[j]) - ord('a') + 1 converts a character into a numeric value suitable for hashing. All unique hashes are stored in a set, and its length is returned as the answer.

Go Solution

func countDistinct(s string) int {
    n := len(s)
    mod := int64(1e9 + 7)
    base := int64(31)
    unique := make(map[int64]struct{})
    
    for i := 0; i < n; i++ {
        var currentHash int64 = 0
        for j := i; j < n; j++ {
            currentHash = (currentHash*base + int64(s[j]-'a'+1)) % mod
            unique[currentHash] = struct{}{}
        }
    }
    
    return len(unique)
}

Go differences: We use int64 for hashing to avoid overflow, and map[int64]struct{} as a set. Empty struct uses minimal memory.

Worked Examples

Example 1: s = "aabbaba"

i j Substring Hash Added
0 0 "a" hash("a")
0 1 "aa" hash("aa")
0 2 "aab" hash("aab")
0 3 "aabb" hash("aabb")
... ... ... ...

This continues for all (i, j) pairs. After all iterations, the set contains 21 unique hashes, corresponding to 21 distinct substrings.

Example 2: s = "abcdefg"

All substrings are unique because all characters are different. Total distinct substrings = n*(n+1)/2 = 28.

Complexity Analysis

Measure Complexity Explanation
Time O(n^2) Outer loop over i and inner loop over j produces roughly n*(n+1)/2 iterations
Space O(n^2) In the worst case, all substrings are distinct and stored in the set

Even though the worst-case space is quadratic, it is acceptable for n <= 500.

Test Cases

# Provided examples
assert Solution().countDistinct("aabbaba") == 21  # multiple duplicates
assert Solution().countDistinct("abcdefg") == 28  # all unique characters

# Edge cases
assert Solution().countDistinct("a") == 1  # single character
assert Solution().countDistinct("aaaa") == 4  # repeated single character, substrings: "a","aa","aaa","aaaa"
assert Solution().countDistinct("ab") == 3  # "a","b","ab"
Test Why
"aabbaba" Checks multiple duplicates and overlapping substrings
"abcdefg" All substrings are unique
"a" Minimal length
"aaaa" All characters same, tests substring duplication handling
"ab" Short string with distinct and combined substrings

Edge Cases

  1. Single character string: For s = "a", the algorithm correctly returns 1, as the only substring is "a". This tests handling of minimal input.
  2. All identical characters: For s = "aaaa", naive approaches that compare substrings directly might count duplicates multiple times. Our hashing approach handles this automatically because all identical substrings map to the same hash.
  3. All unique characters: For s = "abcdef", each substring is unique. This tests whether the algorithm correctly handles the case where the number of distinct substrings grows quadratically, and ensures the set or map properly counts all entries without collisions.