LeetCode 527 - Word Abbreviation

The problem asks us to compute minimal unique abbreviations for an array of distinct strings. An abbreviation replaces the middle characters of a word with a count, keeping the first and last characters.

LeetCode Problem 527

Difficulty: 🔴 Hard
Topics: Array, String, Greedy, Trie, Sorting

Solution

Problem Understanding

The problem asks us to compute minimal unique abbreviations for an array of distinct strings. An abbreviation replaces the middle characters of a word with a count, keeping the first and last characters. For example, "internal" becomes "i6l" because there are six letters between i and l.

The challenge arises when multiple words share the same abbreviation. In that case, we must increase the prefix length (the number of starting characters included before the count) until all abbreviations are unique. For example, "abcdef" and "abndef" initially abbreviate to "a4f". Incrementing the prefix gives "ab3f" and then "abc2f" / "abn2f" until uniqueness is achieved.

The input is an array of unique strings, each of length 2 to 400, with up to 400 strings. The output is an array of abbreviations such that every abbreviation is minimal, unique, and shorter than the original word (otherwise we retain the original word).

Edge cases to be aware of include very short words (like "aa"), words that differ only at the last character, and words whose abbreviation would not reduce their length.

Approaches

Brute-Force Approach

A naive approach is to start with the standard abbreviation for each word. Then, repeatedly check all pairs of words for collisions. Whenever two words share an abbreviation, increment the prefix of the conflicting words and recompute the abbreviation. Continue until all abbreviations are unique.

This method works because it directly follows the problem rules, but it is inefficient. Checking all pairs and repeatedly updating abbreviations could take O(n² * L) time, where n is the number of words and L is the maximum word length, which is too slow for n up to 400 and L up to 400.

Optimal Approach

The key observation is that words can be grouped by length and by current abbreviation, and then we can resolve conflicts within each group rather than checking all pairs globally. Using a trie-like prefix resolution or a hash map of conflicts, we can efficiently determine the minimum prefix length required for each word to ensure uniqueness.

For each group of words that initially share the same abbreviation, we increment their prefix lengths only as much as needed to disambiguate them. This avoids unnecessary repeated work and ensures that every word has the minimal abbreviation.

Approach Time Complexity Space Complexity Notes
Brute Force O(n² * L) O(n) Compare all pairs, increment prefixes iteratively
Optimal O(n * L) O(n * L) Group words, increment prefix intelligently until conflicts are resolved

Algorithm Walkthrough

  1. Initial Abbreviations: For each word, generate the first abbreviation using the formula: first letter + count of middle letters + last letter.
  2. Grouping: Group words by (length, last character) and current abbreviation. This ensures we only check conflicts among words that could actually collide.
  3. Prefix Resolution: For each group with conflicts, determine the minimum prefix length for each word that makes the abbreviation unique. Increase prefixes iteratively but only within conflicting sets.
  4. Abbreviation Finalization: For each word, compute the final abbreviation using its resolved prefix length. If the abbreviation is not shorter than the original word, keep the original.
  5. Return: Assemble the list of abbreviations in the same order as the input.

Why it works: By focusing on conflicting groups and only incrementing the prefix as needed, we guarantee both minimality and uniqueness. Since conflicts are resolved locally within groups, we avoid unnecessary comparisons. The invariant is that after processing, no two words share the same abbreviation.

Python Solution

from typing import List, Dict, Tuple
from collections import defaultdict

class Solution:
    def wordsAbbreviation(self, words: List[str]) -> List[str]:
        n = len(words)
        prefix_lengths = [1] * n
        
        def abbreviate(word: str, prefix_len: int) -> str:
            if prefix_len >= len(word) - 2:
                return word
            return word[:prefix_len] + str(len(word) - prefix_len - 1) + word[-1]
        
        # Initialize abbreviations
        abbr_map: Dict[str, List[int]] = defaultdict(list)
        for i, word in enumerate(words):
            abbr = abbreviate(word, 1)
            abbr_map[abbr].append(i)
        
        # Resolve conflicts
        while True:
            new_map = defaultdict(list)
            resolved = True
            for abbr, indices in abbr_map.items():
                if len(indices) == 1:
                    new_map[abbr].append(indices[0])
                else:
                    resolved = False
                    for idx in indices:
                        prefix_lengths[idx] += 1
                        new_abbr = abbreviate(words[idx], prefix_lengths[idx])
                        new_map[new_abbr].append(idx)
            abbr_map = new_map
            if resolved:
                break
        
        # Build final abbreviations
        result = []
        for i, word in enumerate(words):
            abbr = abbreviate(word, prefix_lengths[i])
            result.append(abbr)
        return result

Implementation Walkthrough:

We start by initializing each word's prefix length to 1. The abbreviate function generates the abbreviation based on a given prefix length. Then, we map abbreviations to their corresponding word indices. In the conflict resolution loop, we detect any abbreviations used by multiple words and increment their prefix lengths until all abbreviations are unique. Finally, we compute and return the final abbreviations.

Go Solution

package main

func wordsAbbreviation(words []string) []string {
    n := len(words)
    prefixLens := make([]int, n)
    for i := 0; i < n; i++ {
        prefixLens[i] = 1
    }
    
    abbreviate := func(word string, prefixLen int) string {
        if prefixLen >= len(word)-2 {
            return word
        }
        return word[:prefixLen] + string(rune(len(word)-prefixLen-1+'0')) + word[len(word)-1:]
    }
    
    abbrMap := map[string][]int{}
    for i, word := range words {
        abbr := abbreviate(word, 1)
        abbrMap[abbr] = append(abbrMap[abbr], i)
    }
    
    for {
        newMap := map[string][]int{}
        resolved := true
        for abbr, indices := range abbrMap {
            if len(indices) == 1 {
                newMap[abbr] = append(newMap[abbr], indices[0])
            } else {
                resolved = false
                for _, idx := range indices {
                    prefixLens[idx]++
                    newAbbr := abbreviate(words[idx], prefixLens[idx])
                    newMap[newAbbr] = append(newMap[newAbbr], idx)
                }
            }
        }
        abbrMap = newMap
        if resolved {
            break
        }
    }
    
    result := make([]string, n)
    for i, word := range words {
        result[i] = abbreviate(word, prefixLens[i])
    }
    return result
}

Go-specific Notes:

The main differences include using slices and maps, handling string slicing, and converting integer counts to strings. The logic mirrors Python but uses Go syntax and data structures. Edge cases such as very short words are handled by returning the original word if the abbreviation would not shorten it.

Worked Examples

Example 1

Input: ["like","god","internal","me","internet","interval","intension","face","intrusion"]

Word Initial Abbr Conflict? Final Abbr
like l2e No l2e
god g1d No god
internal i6l No internal
me m0e No me
internet i6t Yes i6t
interval i6l No interval
intension i7n Yes inte4n
face f2e No f2e
intrusion i7n Yes intr4n

Example 2

Input: ["aa","aaa"]

Word Initial Abbr Conflict? Final Abbr
aa a0a Yes aa
aaa a1a No aaa

Complexity Analysis

Measure Complexity Explanation
Time O(n * L) Each word's prefix increases at most L times, and we process n words, L is max word length
Space O(n * L) Store abbreviations and prefix lengths for all words

The main work is incrementing prefixes to resolve conflicts, which is bounded by word length. Grouping by abbreviation ensures we do not repeatedly check all pairs.

Test Cases

# test cases
assert Solution().wordsAbbreviation(["like","god","internal","me","internet","interval","intension","face","intrusion"]) == ["l2e","god","internal","me","i6t","interval","inte4n","f2e