LeetCode 828 - Count Unique Characters of All Substrings of a Given String

The problem asks us to compute the total number of unique characters across every possible substring of a given string. A character is considered unique inside a substring if it appears exactly once within that substring.

LeetCode Problem 828

Difficulty: šŸ”“ Hard
Topics: Hash Table, String, Dynamic Programming

Solution

Problem Understanding

The problem asks us to compute the total number of unique characters across every possible substring of a given string.

A character is considered unique inside a substring if it appears exactly once within that substring. For example, in the substring "ABA", only "B" is unique because "A" appears twice.

We are given a string s consisting only of uppercase English letters, and we must consider every substring of s. For each substring, we compute how many characters appear exactly once, then sum all of those counts together.

For example, if s = "ABC", the substrings are:

  • "A" → 1 unique character
  • "B" → 1 unique character
  • "C" → 1 unique character
  • "AB" → 2 unique characters
  • "BC" → 2 unique characters
  • "ABC" → 3 unique characters

The total is 1 + 1 + 1 + 2 + 2 + 3 = 10.

The string length can be as large as 10^5, which immediately tells us that generating all substrings explicitly is too expensive. A string of length n has O(n^2) substrings, and examining each substring independently would lead to at least quadratic or cubic complexity.

The important challenge is to avoid repeatedly recomputing character frequencies for overlapping substrings.

Several edge cases are important:

  • A string with all identical characters such as "AAAA" produces very few unique characters.
  • A string with all distinct characters such as "ABCDE" maximizes unique counts.
  • Characters appearing multiple times at irregular intervals can easily cause counting mistakes.
  • Very large inputs require an algorithm close to linear time.

The problem guarantees:

  • The string length is at least 1.
  • Only uppercase English letters are used.
  • The final answer fits inside a 32-bit integer.

Approaches

Brute Force Approach

The most direct solution is to generate every substring and compute how many characters are unique within it.

For each starting index i, we extend the substring one character at a time toward the right. We maintain a frequency map of characters inside the current substring. After each extension, we count how many characters currently have frequency exactly one and add that to the answer.

This approach is correct because it explicitly evaluates every substring exactly as defined in the problem statement.

However, it is far too slow for the constraints. There are O(n^2) substrings, and determining the number of unique characters can take up to O(26) or even O(n) depending on implementation. With n = 10^5, quadratic processing is infeasible.

Optimal Approach

The key insight is to reverse the perspective.

Instead of asking:

For each substring, how many unique characters does it contain?

we ask:

For each occurrence of each character, in how many substrings is this occurrence counted as unique?

This transformation is the core optimization.

Suppose a character appears at index i. This occurrence contributes to all substrings where:

  • it is included,
  • no other occurrence of the same character is included.

If we know:

  • the previous occurrence of the same character,
  • the next occurrence of the same character,

then we can count exactly how many substrings treat this occurrence as unique.

Assume:

  • prev is the previous occurrence index,
  • next is the next occurrence index.

Then:

  • the left boundary of the substring can be chosen from (prev + 1) through i,
  • the right boundary can be chosen from i through (next - 1).

Therefore, the number of substrings where this occurrence is uniquely counted is:

$$(i - prev) \times (next - i)$$

We compute this contribution for every character occurrence and sum them all.

This avoids enumerating substrings entirely and reduces the complexity to linear time.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) to O(n³) O(1) to O(26) Explicitly evaluates every substring
Optimal O(n) O(26) Counts contribution of each character occurrence directly

Algorithm Walkthrough

  1. Create a data structure to store all occurrence indices for each character.

Since the string contains only uppercase English letters, we can maintain a dictionary or array where each character maps to a list of indices where it appears. 2. Add sentinel boundaries to simplify calculations.

For each character's index list, prepend -1 and append n, where n is the string length.

These sentinel values represent virtual boundaries before the string starts and after it ends. 3. Process each occurrence independently.

For a character occurrence at position curr, let:

  • prev be the previous occurrence index,
  • next be the next occurrence index.
  1. Compute how many substrings use this occurrence as unique.

The left boundary may start anywhere between prev + 1 and curr.

Number of left choices:

$$curr - prev$$

The right boundary may end anywhere between curr and next - 1.

Number of right choices:

$$next - curr$$ 5. Multiply the two counts.

Every valid left choice can pair with every valid right choice.

Contribution:

$$(curr - prev) \times (next - curr)$$ 6. Add the contribution to the global answer. 7. Repeat for all character occurrences. 8. Return the final accumulated sum.

Why it works

Each substring contributes one count for every character that appears exactly once inside it.

Instead of iterating through substrings, the algorithm counts how many substrings uniquely include each occurrence of each character.

Every substring-character uniqueness relationship is counted exactly once because each unique occurrence has a precisely defined valid range determined by neighboring identical characters.

Therefore, the total contribution sum equals the required answer.

Python Solution

class Solution:
    def uniqueLetterString(self, s: str) -> int:
        positions = {}

        for index, char in enumerate(s):
            if char not in positions:
                positions[char] = []

            positions[char].append(index)

        total = 0
        n = len(s)

        for char_positions in positions.values():
            indices = [-1] + char_positions + [n]

            for i in range(1, len(indices) - 1):
                prev_index = indices[i - 1]
                current_index = indices[i]
                next_index = indices[i + 1]

                total += (
                    (current_index - prev_index)
                    * (next_index - current_index)
                )

        return total

The implementation begins by recording every index where each character appears.

For example, for "ABA":

{
    'A': [0, 2],
    'B': [1]
}

Next, for each character list, sentinel values are added. This eliminates special edge handling for first and last occurrences.

For "A":

[-1, 0, 2, 3]

Now every actual occurrence has both a previous and next boundary.

The loop computes the contribution of each occurrence independently using:

(current_index - prev_index) * (next_index - current_index)

Finally, all contributions are accumulated into total.

The algorithm never generates substrings explicitly, which is why it remains efficient even for very large inputs.

Go Solution

func uniqueLetterString(s string) int {
    positions := make(map[byte][]int)

    for i := 0; i < len(s); i++ {
        positions[s[i]] = append(positions[s[i]], i)
    }

    total := 0
    n := len(s)

    for _, charPositions := range positions {
        indices := make([]int, 0, len(charPositions)+2)

        indices = append(indices, -1)

        for _, index := range charPositions {
            indices = append(indices, index)
        }

        indices = append(indices, n)

        for i := 1; i < len(indices)-1; i++ {
            prevIndex := indices[i-1]
            currentIndex := indices[i]
            nextIndex := indices[i+1]

            total += (currentIndex - prevIndex) *
                (nextIndex - currentIndex)
        }
    }

    return total
}

The Go solution follows exactly the same logic as the Python version.

A map[byte][]int stores occurrence indices for each character. Since the input contains only uppercase English letters, using byte is sufficient and efficient.

Slices are used to construct the augmented index arrays containing sentinel values.

Integer overflow is not a concern because the problem guarantees the final answer fits within a 32-bit integer, and Go's int type comfortably handles this range on modern systems.

Worked Examples

Example 1

Input:

s = "ABC"

Character positions:

Character Indices
A [0]
B [1]
C [2]

Process Character A

Augmented indices:

[-1, 0, 3]
prev curr next Contribution
-1 0 3 (0 - (-1)) Ɨ (3 - 0) = 3

Contribution = 3

Substrings:

  • "A"
  • "AB"
  • "ABC"

Process Character B

Augmented indices:

[-1, 1, 3]
prev curr next Contribution
-1 1 3 (1 - (-1)) Ɨ (3 - 1) = 4

Contribution = 4

Substrings:

  • "B"
  • "AB"
  • "BC"
  • "ABC"

Process Character C

Augmented indices:

[-1, 2, 3]
prev curr next Contribution
-1 2 3 (2 - (-1)) Ɨ (3 - 2) = 3

Contribution = 3

Total:

3 + 4 + 3 = 10

Example 2

Input:

s = "ABA"

Character positions:

Character Indices
A [0, 2]
B [1]

Process Character A

Augmented indices:

[-1, 0, 2, 3]

First occurrence:

prev curr next Contribution
-1 0 2 1 Ɨ 2 = 2

Second occurrence:

prev curr next Contribution
0 2 3 2 Ɨ 1 = 2

Total contribution from A = 4

Process Character B

Augmented indices:

[-1, 1, 3]
prev curr next Contribution
-1 1 3 2 Ɨ 2 = 4

Total contribution from B = 4

Final answer:

4 + 4 = 8

Example 3

Input:

s = "LEETCODE"

The algorithm computes the contribution of every occurrence independently and sums them.

Final result:

92

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character occurrence is processed exactly once
Space O(n) Occurrence lists store all indices

The total number of stored indices equals the string length, so the auxiliary space is linear.

Each index participates in exactly one contribution calculation, which makes the runtime linear as well.

Test Cases

solution = Solution()

assert solution.uniqueLetterString("ABC") == 10  # all characters unique
assert solution.uniqueLetterString("ABA") == 8  # repeated character
assert solution.uniqueLetterString("LEETCODE") == 92  # official example

assert solution.uniqueLetterString("A") == 1  # single character
assert solution.uniqueLetterString("AA") == 2  # no length-2 unique chars
assert solution.uniqueLetterString("AAA") == 3  # only single-char substrings contribute

assert solution.uniqueLetterString("AB") == 4  # all substrings unique
assert solution.uniqueLetterString("ABB") == 5  # duplicate at end
assert solution.uniqueLetterString("AAB") == 5  # duplicate at beginning

assert solution.uniqueLetterString("ABCDE") == 35  # all distinct
assert solution.uniqueLetterString("ABCA") == 18  # repeated boundary char
assert solution.uniqueLetterString("BACAD") == 34  # mixed repetitions

assert solution.uniqueLetterString("ZZZZZ") == 5  # all identical
assert solution.uniqueLetterString("ABCDEFGHIJKLMNOPQRSTUVWXYZ") == 3276  # large unique alphabet
Test Why
"ABC" Basic all-unique example
"ABA" Repeated character in middle
"LEETCODE" Official hard example
"A" Minimum input size
"AA" Simplest duplicate case
"AAA" All characters identical
"AB" Small all-unique string
"ABB" Duplicate at end
"AAB" Duplicate at beginning
"ABCDE" Fully unique larger input
"ABCA" Character repeated across boundaries
"BACAD" Irregular repetition pattern
"ZZZZZ" Stress repeated-character logic
"ABCDEFGHIJKLMNOPQRSTUVWXYZ" Maximum unique uppercase set

Edge Cases

Single Character String

A string like "A" is the smallest possible input.

A buggy implementation might mishandle sentinel boundaries or fail when there is only one occurrence of a character.

This implementation handles it naturally because the augmented index list becomes:

[-1, 0, 1]

The contribution formula still works correctly:

(0 - (-1)) Ɨ (1 - 0) = 1

All Characters Identical

For "AAAAA", most substrings contain repeated characters, so only single-character substrings contribute.

Naive logic can easily overcount because many overlapping substrings contain duplicates.

The contribution method avoids this entirely because each occurrence only contributes within the range bounded by neighboring identical characters.

Characters Repeating Far Apart

Consider "ABCA".

The first and last "A" are separated by several characters. Incorrect implementations sometimes count substrings that include both occurrences, even though the character would no longer be unique.

Using previous and next occurrence boundaries guarantees that any substring counted for an occurrence excludes all other copies of the same character.