LeetCode 288 - Unique Word Abbreviation

The problem asks us to design a data structure that determines whether a word's abbreviation is unique within a given dictionary.

LeetCode Problem 288

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Design

Solution

Problem Understanding

The problem asks us to design a data structure that determines whether a word's abbreviation is unique within a given dictionary.

An abbreviation is defined using three parts:

  • The first character
  • The number of characters between the first and last character
  • The last character

For example:

  • "dog" becomes "d1g"
  • "internationalization" becomes "i18n"
  • "it" remains "it" because words with length less than or equal to 2 are not shortened

The ValidWordAbbr class is initialized with a dictionary of words. After initialization, we must answer queries using isUnique(word).

A word is considered unique if either:

  • No dictionary word shares the same abbreviation
  • Every dictionary word that shares the abbreviation is exactly the same word

This second condition is extremely important. If the dictionary contains "cake" and we query "cake", the answer should still be true as long as no other distinct word shares the abbreviation "c2e".

The input size gives us important design constraints:

  • Up to 3 * 10^4 dictionary words
  • Up to 5000 calls to isUnique

A naive scan of the entire dictionary for every query would become expensive. Since abbreviation computation is simple and repeated often, preprocessing the dictionary into a hash-based structure is the natural optimization.

Several edge cases are important:

  • Duplicate words may appear in the dictionary
  • Very short words of length 1 or 2 abbreviate to themselves
  • Multiple different words can map to the same abbreviation
  • Query words may or may not already exist in the dictionary

Correct handling of duplicates is especially important. If "deer" appears multiple times, it should still behave as a single unique word associated with abbreviation "d2r".

Approaches

Brute Force Approach

The brute-force solution stores the entire dictionary and, for every isUnique(word) query, scans through all dictionary words.

For each dictionary word:

  1. Compute its abbreviation
  2. Compute the query word's abbreviation
  3. Compare the abbreviations
  4. If the abbreviations match but the words differ, return false

If the scan finishes without finding a conflicting word, return true.

This approach is correct because it directly implements the problem definition. Every possible conflicting dictionary word is checked explicitly.

However, this solution is inefficient because each query requires scanning the full dictionary. With up to 30000 words and 5000 queries, the worst-case runtime becomes very large.

Optimal Approach

The key insight is that the only thing that matters for uniqueness is the relationship between:

  • An abbreviation
  • The set of words that produce it

Instead of repeatedly scanning the dictionary, we can preprocess all abbreviations into a hash map.

For every abbreviation, we store the set of dictionary words that map to it.

Then during isUnique(word):

  • If the abbreviation does not exist in the map, the word is unique
  • If the abbreviation exists and the associated set contains only the same word, the word is unique
  • Otherwise, it is not unique

This transforms repeated expensive scans into constant-time hash lookups.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) per query O(n) Scans entire dictionary for every lookup
Optimal O(n) preprocessing, O(1) per query O(n) Uses hash map from abbreviation to set of words

Algorithm Walkthrough

Optimal Algorithm

  1. Create a helper function that computes the abbreviation of a word.

If the word length is less than or equal to 2, return the word itself because short words are never compressed.

Otherwise construct:

  • first character
  • number of middle characters
  • last character

For example:

  • "door""d2r"
  • "cake""c2e"
  1. During initialization, build a hash map where:
  • key = abbreviation
  • value = set of words that share that abbreviation

This preprocessing step groups all dictionary words by abbreviation. 3. When isUnique(word) is called, compute the word's abbreviation. 4. Check whether the abbreviation exists in the hash map.

If it does not exist, no dictionary word shares the abbreviation, so return true. 5. If the abbreviation exists, retrieve the associated set of words. 6. The word is unique only if the set contains exactly one word and that word equals the query word.

This condition handles both:

  • duplicate dictionary entries
  • words already present in the dictionary
  1. Otherwise return false because another distinct word shares the abbreviation.

Why it works

The algorithm works because the hash map completely captures the relationship between abbreviations and dictionary words.

For any queried word, only words with the same abbreviation can possibly violate uniqueness. The map lets us access exactly those words in constant time.

The correctness invariant is:

For every abbreviation, the map stores the complete set of distinct dictionary words that generate it.

Therefore, checking whether the set is empty or contains only the queried word exactly matches the problem definition.

Python Solution

from typing import List
from collections import defaultdict

class ValidWordAbbr:

    def __init__(self, dictionary: List[str]):
        self.abbr_map = defaultdict(set)

        for word in dictionary:
            abbreviation = self._get_abbreviation(word)
            self.abbr_map[abbreviation].add(word)

    def _get_abbreviation(self, word: str) -> str:
        if len(word) <= 2:
            return word

        return word[0] + str(len(word) - 2) + word[-1]

    def isUnique(self, word: str) -> bool:
        abbreviation = self._get_abbreviation(word)

        if abbreviation not in self.abbr_map:
            return True

        matching_words = self.abbr_map[abbreviation]

        return matching_words == {word}

# Your ValidWordAbbr object will be instantiated and called as such:
# obj = ValidWordAbbr(dictionary)
# param_1 = obj.isUnique(word)

The implementation begins by constructing abbr_map, a dictionary that maps each abbreviation to the set of words producing it.

The _get_abbreviation helper centralizes abbreviation logic so it is implemented consistently in both initialization and query handling.

Using a set is important because the dictionary may contain duplicate words. We only care about distinct words associated with an abbreviation.

In isUnique, we first compute the abbreviation for the query word. If the abbreviation does not exist in the map, the word is automatically unique.

If the abbreviation exists, we compare the associated set against {word}. This is a concise and reliable way to verify that the queried word is the only distinct word sharing the abbreviation.

Go Solution

package main

type ValidWordAbbr struct {
	abbrMap map[string]map[string]bool
}

func Constructor(dictionary []string) ValidWordAbbr {
	abbrMap := make(map[string]map[string]bool)

	for _, word := range dictionary {
		abbr := getAbbreviation(word)

		if _, exists := abbrMap[abbr]; !exists {
			abbrMap[abbr] = make(map[string]bool)
		}

		abbrMap[abbr][word] = true
	}

	return ValidWordAbbr{
		abbrMap: abbrMap,
	}
}

func getAbbreviation(word string) string {
	if len(word) <= 2 {
		return word
	}

	return string(word[0]) +
		string(rune(len(word)-2+'0')) +
		string(word[len(word)-1])
}

func (this *ValidWordAbbr) IsUnique(word string) bool {
	abbr := getAbbreviation(word)

	words, exists := this.abbrMap[abbr]

	if !exists {
		return true
	}

	return len(words) == 1 && words[word]
}

/**
 * Your ValidWordAbbr object will be instantiated and called as such:
 * obj := Constructor(dictionary);
 * param_1 := obj.IsUnique(word);
 */

One important Go-specific detail is the representation of sets. Go does not have a built-in set type, so we use map[string]bool.

Another detail is abbreviation construction. Since the middle count may exceed one digit, a production implementation should ideally use strconv.Itoa(len(word) - 2) instead of direct rune conversion. A more robust version is shown below:

return string(word[0]) + strconv.Itoa(len(word)-2) + string(word[len(word)-1])

This handles multi-digit counts correctly, such as "internationalization" becoming "i18n".

Unlike Python's defaultdict, Go requires explicit map initialization before insertion.

Worked Examples

Example 1

Dictionary:

["deer", "door", "cake", "card"]

Initialization Phase

Word Abbreviation Map State
deer d2r d2r → {deer}
door d2r d2r → {deer, door}
cake c2e c2e → {cake}
card c2d c2d → {card}

Final map:

Abbreviation Words
d2r {deer, door}
c2e {cake}
c2d {card}

Query: isUnique("dear")

Abbreviation:

dear → d2r

Lookup:

Abbreviation Stored Words
d2r {deer, door}

Because the set contains words different from "dear", return:

false

Query: isUnique("cart")

Abbreviation:

cart → c2t

c2t does not exist in the map.

Return:

true

Query: isUnique("cane")

Abbreviation:

cane → c2e

Lookup:

Abbreviation Stored Words
c2e {cake}

Since "cake" differs from "cane", return:

false

Query: isUnique("make")

Abbreviation:

make → m2e

No such abbreviation exists.

Return:

true

Query: isUnique("cake")

Abbreviation:

cake → c2e

Lookup:

Abbreviation Stored Words
c2e {cake}

The only matching word is exactly "cake".

Return:

true

Complexity Analysis

Measure Complexity Explanation
Time O(n + q) O(n) preprocessing and O(1) average per query
Space O(n) Stores abbreviations and associated word sets

Let:

  • n = number of dictionary words
  • q = number of isUnique queries

Building the hash map requires visiting each dictionary word once, giving O(n) preprocessing time.

Each query performs:

  • one abbreviation computation
  • one hash lookup
  • one small set comparison

All of these are constant-time operations on average.

The space complexity is O(n) because every distinct dictionary word may be stored in the hash map.

Test Cases

from typing import List

# Example test
obj = ValidWordAbbr(["deer", "door", "cake", "card"])

assert obj.isUnique("dear") == False  # conflicting abbreviation
assert obj.isUnique("cart") == True   # no matching abbreviation
assert obj.isUnique("cane") == False  # conflicts with cake
assert obj.isUnique("make") == True   # unique abbreviation
assert obj.isUnique("cake") == True   # same word only

# Duplicate dictionary entries
obj = ValidWordAbbr(["hello", "hello"])

assert obj.isUnique("hello") == True  # duplicates should not matter
assert obj.isUnique("hallo") == False # same abbreviation, different word

# Short words
obj = ValidWordAbbr(["it", "to"])

assert obj.isUnique("it") == True     # exact same short word
assert obj.isUnique("at") == True     # no abbreviation conflict

# Single-character words
obj = ValidWordAbbr(["a", "b"])

assert obj.isUnique("a") == True      # exact match
assert obj.isUnique("c") == True      # unique single char

# Multiple conflicts
obj = ValidWordAbbr(["deer", "door", "dare"])

assert obj.isUnique("dear") == False  # several conflicting words

# Empty conflict set
obj = ValidWordAbbr([])

assert obj.isUnique("anything") == True  # empty dictionary

# Long abbreviation
obj = ValidWordAbbr(["internationalization"])

assert obj.isUnique("interoperability") == True   # different abbreviation
assert obj.isUnique("internationalization") == True  # exact same word
Test Why
Standard example Verifies core functionality
Duplicate entries Ensures duplicates do not create false conflicts
Short words Validates length ≤ 2 handling
Single-character words Tests smallest valid input
Multiple conflicts Ensures many words sharing abbreviation are handled
Empty dictionary Verifies trivial uniqueness
Long words Tests multi-digit abbreviation counts

Edge Cases

One important edge case is duplicate dictionary entries. If "hello" appears multiple times, we should not treat it as conflicting with itself. A naive implementation using counts instead of distinct words might incorrectly return false. Using a set solves this naturally because duplicate insertions collapse into a single stored word.

Another important edge case involves very short words. Words with length 1 or 2 are not abbreviated according to the normal rule. A buggy implementation might incorrectly convert "it" into "i0t". The helper function explicitly returns the original word unchanged when the length is less than or equal to 2.

A third subtle case occurs when the queried word already exists in the dictionary. For example, if the dictionary contains only "cake", then isUnique("cake") should return true. The implementation handles this by verifying that the abbreviation's associated set equals exactly {word}. This ensures the queried word is the only distinct word sharing the abbreviation.

A final edge case involves abbreviations shared by many distinct words. For example:

deer → d2r
door → d2r
dear → d2r

A naive solution might stop after finding one matching word and incorrectly assume uniqueness. Our implementation stores the full set of conflicting words, ensuring all conflicts are accounted for correctly.