LeetCode 3076 - Shortest Uncommon Substring in an Array

The problem asks us to find, for each string in a given array, the shortest substring that does not appear in any other string in the array. If multiple shortest substrings exist, we must choose the lexicographically smallest one.

LeetCode Problem 3076

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Trie

Solution

Problem Understanding

The problem asks us to find, for each string in a given array, the shortest substring that does not appear in any other string in the array. If multiple shortest substrings exist, we must choose the lexicographically smallest one. If no such substring exists, the answer for that string is an empty string.

In other words, for an array arr of strings:

  • Input: arr = ["cab", "ad", "bad", "c"]
  • Output: ["ab", "", "ba", ""]

The output represents, for each string, the minimal unique substring that differentiates it from all other strings.

The constraints tell us that:

  • n can be up to 100, so the number of strings is relatively small.
  • Each string has a maximum length of 20, so enumerating all substrings of a single string is feasible (O(L^2) substrings per string).
  • Strings consist only of lowercase letters, so lexicographic comparison is straightforward.

Important edge cases include:

  • Strings with complete overlap, e.g., ["abc","bcd"] where short substrings may appear in multiple strings.
  • Strings of length 1.
  • Strings where all possible substrings occur in other strings.

The problem guarantees non-empty strings, which simplifies substring generation.

Approaches

The brute-force approach is to generate all substrings for each string and check whether each substring appears in any other string. For each string, we would:

  1. Generate all substrings.
  2. Check if the substring occurs in any other string.
  3. Track the shortest substrings that are unique and pick the lexicographically smallest.

This approach is correct but inefficient because the substring search across other strings is O(n * L^3) in the worst case (since checking a substring against a string takes O(L)).

The optimal approach is to use a Trie data structure to track the occurrence of substrings across all strings. By inserting every substring into a Trie along with the set of string indices in which it occurs, we can efficiently query the shortest unique substring for each string:

  1. Generate all substrings for each string.
  2. Insert each substring into a Trie while recording the indices of the strings where it appears.
  3. For each string, traverse the Trie to find the shortest substring that occurs only in this string's index set.
  4. Among substrings of the same length, choose the lexicographically smallest.

This reduces repeated substring searches and allows us to efficiently find unique substrings.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2 * L^3) O(L^2) per string Generate all substrings and check against other strings
Trie-Based Optimal O(n * L^3) O(n * L^3) Insert all substrings into a Trie and track string indices

Algorithm Walkthrough

  1. Initialize a Trie where each node stores a dictionary of children and a set of string indices.
  2. For each string arr[i] in the array, generate all possible substrings.
  3. Insert each substring into the Trie. While inserting, update the set of indices at the node to include i.
  4. After building the Trie, for each string arr[i], traverse all paths from the root and identify nodes where the index set contains only i. This identifies substrings unique to this string.
  5. Keep track of the shortest unique substring for each string. If multiple exist of the same length, select the lexicographically smallest.
  6. If no unique substring exists for a string, return an empty string for that position.

Why it works: By storing the set of string indices at each Trie node, we can immediately identify which substrings are unique. Since all substrings are generated and inserted, the search is exhaustive, and using the shortest-length-first traversal ensures minimal length and lexicographic order.

Python Solution

from typing import List

class TrieNode:
    def __init__(self):
        self.children = {}
        self.indices = set()

class Solution:
    def shortestSubstrings(self, arr: List[str]) -> List[str]:
        root = TrieNode()
        
        # Insert all substrings into the Trie
        for idx, word in enumerate(arr):
            L = len(word)
            for start in range(L):
                node = root
                for end in range(start, L):
                    char = word[end]
                    if char not in node.children:
                        node.children[char] = TrieNode()
                    node = node.children[char]
                    node.indices.add(idx)
        
        result = []
        
        # Find shortest unique substring for each word
        for idx, word in enumerate(arr):
            shortest = None
            L = len(word)
            for start in range(L):
                node = root
                substring = ""
                for end in range(start, L):
                    char = word[end]
                    node = node.children[char]
                    substring += char
                    if node.indices == {idx}:  # unique substring
                        if shortest is None or len(substring) < len(shortest) or (len(substring) == len(shortest) and substring < shortest):
                            shortest = substring
                        break
            result.append(shortest if shortest else "")
        
        return result

Explanation: The TrieNode stores children and the indices of strings passing through. Substrings are inserted into the Trie, and each node collects the indices of strings containing it. While retrieving the shortest unique substring, we traverse the Trie per string and find the first node where the set of indices contains only that string's index.

Go Solution

package main

func shortestSubstrings(arr []string) []string {
    type TrieNode struct {
        children map[rune]*TrieNode
        indices  map[int]struct{}
    }

    root := &TrieNode{children: make(map[rune]*TrieNode), indices: make(map[int]struct{})}

    // Insert all substrings
    for idx, word := range arr {
        for start := 0; start < len(word); start++ {
            node := root
            for _, char := range word[start:] {
                if node.children[char] == nil {
                    node.children[char] = &TrieNode{children: make(map[rune]*TrieNode), indices: make(map[int]struct{})}
                }
                node = node.children[char]
                node.indices[idx] = struct{}{}
            }
        }
    }

    result := make([]string, len(arr))

    for idx, word := range arr {
        shortest := ""
        L := len(word)
        for start := 0; start < L; start++ {
            node := root
            substring := ""
            for _, char := range word[start:] {
                node = node.children[char]
                substring += string(char)
                if len(node.indices) == 1 {
                    for k := range node.indices {
                        if k == idx {
                            if shortest == "" || len(substring) < len(shortest) || (len(substring) == len(shortest) && substring < shortest) {
                                shortest = substring
                            }
                            break
                        }
                    }
                    break
                }
            }
        }
        result[idx] = shortest
    }

    return result
}

Explanation: Go uses map[rune]*TrieNode for children and map[int]struct{} for indices. The logic mirrors Python: build the Trie with substring indices and search for the shortest unique substring.

Worked Examples

Example 1: ["cab","ad","bad","c"]

String Substrings Checked Unique Substring Selected
"cab" "c","ca","cab","a","ab","b" "ab","ca" "ab"
"ad" "a","ad","d" none ""
"bad" "b","ba","bad","a","ad","d" "ba" "ba"
"c" "c" none ""

Example 2: ["abc","bcd","abcd"]

String Substrings Checked Unique Substring Selected
"abc" all none ""
"bcd" all none ""
"abcd" all "abcd" "abcd"

Complexity Analysis

Measure Complexity Explanation
Time O(n * L^3) Generating all substrings (L^2 per string) and inserting/traversing Trie (up to L per substring)
Space O(n * L^3) Trie stores all substrings with index sets

The cubic factor in time comes from generating substrings of length up to L and processing them per string. Space is proportional to the total number of substrings across all strings.

Test Cases

# Provided examples
assert Solution().shortestSubstrings(["cab","ad","bad","c"]) == ["ab","","ba",""]  # Example 1
assert Solution().shortestSubstrings(["abc","bcd","abcd"]) == ["","","abcd"]     # Example 2

# Edge cases
assert Solution().shortestSubstrings(["a","b"]) == ["a","b"]                        # Single-letter strings
assert Solution().shortestSubstrings(["aa","aa","aa"]) == ["","",""]               # All identical strings
assert Solution