LeetCode 127 - Word Ladder

This problem asks us to determine the length of the shortest transformation sequence between two words, beginWord and endWord, under a strict transformation rule.

LeetCode Problem 127

Difficulty: 🔴 Hard
Topics: Hash Table, String, Breadth-First Search

Solution

Problem Understanding

This problem asks us to determine the length of the shortest transformation sequence between two words, beginWord and endWord, under a strict transformation rule.

A valid transformation changes exactly one character at a time, and every intermediate word must exist inside the provided wordList. The beginWord itself does not need to appear in the dictionary, but every word after it in the transformation sequence must be valid according to the dictionary.

The goal is not to generate all possible sequences. Instead, we only care about the minimum number of words in the shortest valid path from beginWord to endWord.

For example, if:

beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log","cog"]

One valid transformation is:

hit → hot → dot → dog → cog

This sequence contains 5 words, so the answer is 5.

The problem becomes impossible if endWord is not present in wordList, because every transformed word must belong to the dictionary. In that case, the answer is immediately 0.

The constraints reveal important information about the scale of the problem:

  • Word length is at most 10
  • wordList contains up to 5000 unique words
  • All words have equal length
  • Only lowercase English letters are used

A naive graph construction that compares every pair of words would become expensive at this scale. Since every word can potentially connect to many others, we need a more efficient strategy for finding neighboring words.

The key observation is that this problem is fundamentally a shortest path problem in an unweighted graph. Each word is a node, and two words share an edge if they differ by exactly one letter.

Several edge cases can trip up a naive implementation. If endWord is missing from the dictionary, the answer must be 0. If there is no possible chain connecting the two words, we also return 0. Another subtle case is when many words differ by only one letter, which can create a large branching factor and make inefficient neighbor generation prohibitively expensive.

Approaches

Brute Force Approach

The brute force idea is to explicitly model the problem as a graph.

We treat every word as a node and compare every pair of words to determine whether they differ by exactly one character. If they do, we connect them with an edge. We must also include beginWord in this graph.

After building the graph, we run Breadth-First Search (BFS) starting from beginWord. Since BFS explores nodes level by level, the first time we encounter endWord, we know we have found the shortest transformation sequence.

This approach is correct because every edge represents a single valid transformation, and BFS guarantees the shortest path in an unweighted graph.

However, graph construction is expensive. If there are N words and each word has length L, checking every pair requires:

O(N² × L)

For 5000 words, this becomes too slow.

Optimal Approach, BFS with Pattern Mapping

Instead of comparing every pair of words, we can generate neighbors efficiently.

The key insight is that two words differ by one letter if they share a common intermediate pattern.

For example:

hot → *ot, h*t, ho*
dot → *ot, d*t, do*
lot → *ot, l*t, lo*

Notice that "hot", "dot", and "lot" all share the pattern *ot, meaning they are one transformation apart.

We preprocess all words and map each wildcard pattern to matching words.

Example:

*ot → [hot, dot, lot]
h*t → [hot]
do* → [dot, dog]

Now, instead of checking every word during BFS, we can efficiently retrieve neighbors using these patterns.

At each BFS step:

  1. Generate wildcard patterns for the current word.
  2. Use the pattern map to retrieve neighbors.
  3. Visit unprocessed words.
  4. Continue level-by-level until reaching endWord.

This reduces neighbor lookup dramatically and allows BFS to remain efficient even for larger dictionaries.

Approach Time Complexity Space Complexity Notes
Brute Force O(N² × L) O(N²) Explicitly builds the entire graph
Optimal O(N × L²) O(N × L) Uses wildcard pattern mapping with BFS

Where:

  • N = number of words
  • L = word length

Algorithm Walkthrough

Optimal Algorithm, BFS with Wildcard Pattern Mapping

  1. Convert wordList into a set and verify endWord exists

Before doing any work, check whether endWord is inside the dictionary. If it is missing, return 0 immediately because a valid transformation is impossible. 2. Build the wildcard pattern map

For every word in the dictionary, generate all possible wildcard versions by replacing one character at a time with '*'.

Example:

hot → *ot, h*t, ho*

Store these patterns inside a hash map:

pattern → list of words

This preprocessing step allows us to efficiently find neighboring words later. 3. Initialize BFS

Create a queue containing:

(beginWord, 1)

The number 1 represents the transformation sequence length, because beginWord itself counts as part of the sequence.

Also create a visited set to avoid revisiting words and creating cycles. 4. Process the queue level by level

While the queue is not empty:

  • Remove the front word.
  • If it matches endWord, return the current sequence length.
  • Generate wildcard patterns for the current word.
  • Retrieve neighboring words from the pattern map.
  • Add unseen neighbors to the queue with distance +1.
  1. Mark words as visited immediately

As soon as a word enters the queue, mark it visited.

This prevents duplicate processing and guarantees BFS explores the shortest route first. 6. Return 0 if BFS ends

If the queue becomes empty without reaching endWord, no valid transformation exists.

Why it works

The correctness relies on the core property of Breadth-First Search.

BFS explores nodes level by level. Every BFS level represents words reachable using the same number of transformations. Since every valid transformation has equal cost, the first time we encounter endWord, it must be through the shortest possible sequence.

The wildcard pattern map guarantees we efficiently discover all valid neighboring words without missing any possibilities.

Python Solution

from collections import defaultdict, deque
from typing import List

class Solution:
    def ladderLength(
        self,
        beginWord: str,
        endWord: str,
        wordList: List[str]
    ) -> int:
        if endWord not in wordList:
            return 0

        word_length = len(beginWord)

        pattern_map = defaultdict(list)

        for word in wordList:
            for i in range(word_length):
                pattern = word[:i] + "*" + word[i + 1:]
                pattern_map[pattern].append(word)

        queue = deque([(beginWord, 1)])
        visited = {beginWord}

        while queue:
            current_word, steps = queue.popleft()

            if current_word == endWord:
                return steps

            for i in range(word_length):
                pattern = (
                    current_word[:i]
                    + "*"
                    + current_word[i + 1:]
                )

                for neighbor in pattern_map[pattern]:
                    if neighbor not in visited:
                        visited.add(neighbor)
                        queue.append((neighbor, steps + 1))

                pattern_map[pattern] = []

        return 0

The implementation starts with an early exit condition. If endWord is missing from the dictionary, we immediately return 0.

Next, we preprocess the dictionary into a wildcard pattern map. For every word, we create all possible single-character wildcard transformations and map them to the original word.

The BFS queue stores tuples of:

(current_word, sequence_length)

This allows us to track how many transformations have been made so far.

The visited set prevents cycles and duplicate work. We mark words as visited immediately after enqueueing them, ensuring each word is processed only once.

A useful optimization appears here:

pattern_map[pattern] = []

Once a wildcard pattern has been processed, we clear its neighbors. This avoids repeatedly traversing the same adjacency list during future BFS steps and improves efficiency.

When endWord is encountered, we immediately return the transformation length. If BFS finishes without finding it, the answer is 0.

Go Solution

func ladderLength(beginWord string, endWord string, wordList []string) int {
	wordExists := false

	for _, word := range wordList {
		if word == endWord {
			wordExists = true
			break
		}
	}

	if !wordExists {
		return 0
	}

	wordLength := len(beginWord)

	patternMap := make(map[string][]string)

	for _, word := range wordList {
		for i := 0; i < wordLength; i++ {
			pattern := word[:i] + "*" + word[i+1:]
			patternMap[pattern] = append(
				patternMap[pattern],
				word,
			)
		}
	}

	type Pair struct {
		word  string
		steps int
	}

	queue := []Pair{
		{beginWord, 1},
	}

	visited := map[string]bool{
		beginWord: true,
	}

	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:]

		if current.word == endWord {
			return current.steps
		}

		for i := 0; i < wordLength; i++ {
			pattern := current.word[:i] +
				"*" +
				current.word[i+1:]

			neighbors := patternMap[pattern]

			for _, neighbor := range neighbors {
				if !visited[neighbor] {
					visited[neighbor] = true
					queue = append(queue, Pair{
						word:  neighbor,
						steps: current.steps + 1,
					})
				}
			}

			patternMap[pattern] = []string{}
		}
	}

	return 0
}

The Go solution follows the same algorithmic structure as Python but uses Go-native data structures.

Since Go does not have a built-in queue, a slice is used to simulate BFS queue behavior. We remove elements using:

queue = queue[1:]

The visited structure is implemented using:

map[string]bool

Go strings are immutable, so substring operations remain safe and efficient for this problem size.

No integer overflow concerns exist here because the maximum path length is bounded by the number of words, which is at most 5000.

Worked Examples

Example 1

Input:

beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log","cog"]

Wildcard map:

Pattern Words
*ot hot, dot, lot
h*t hot
ho* hot
d*t dot
do* dot, dog
*og dog, log, cog

Initial queue:

Queue Visited
[(hit,1)] {hit}

Step 1

Process:

hit

Patterns:

*it
h*t
hi*

Neighbor found:

hot

Queue becomes:

Queue Visited
[(hot,2)] {hit, hot}

Step 2

Process:

hot

Neighbors:

dot
lot

Queue becomes:

Queue Visited
[(dot,3), (lot,3)] {hit, hot, dot, lot}

Step 3

Process:

dot

Neighbor:

dog

Queue:

Queue Visited
[(lot,3), (dog,4)] {hit, hot, dot, lot, dog}

Step 4

Process:

lot

Neighbor:

log

Queue:

Queue Visited
[(dog,4), (log,4)] {hit, hot, dot, lot, dog, log}

Step 5

Process:

dog

Neighbor:

cog

Queue:

Queue Visited
[(log,4), (cog,5)] {hit, hot, dot, lot, dog, log, cog}

Step 6

Process:

cog

Reached target.

Answer:

5

Example 2

Input:

beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log"]

Since:

"cog" not in wordList

We immediately return:

0

Complexity Analysis

Measure Complexity Explanation
Time O(N × L²) Each word generates L patterns, each pattern creation costs O(L)
Space O(N × L) Pattern map, queue, and visited set

Let N be the number of words and L be the word length.

Building the wildcard map requires generating L patterns for each word, and each substring operation costs O(L), resulting in O(N × L²) preprocessing time.

During BFS, each word is visited at most once, and pattern lists are cleared after use, preventing repeated work. This keeps traversal efficient and preserves the same asymptotic complexity.

Test Cases

solution = Solution()

# Example 1
assert solution.ladderLength(
    "hit",
    "cog",
    ["hot", "dot", "dog", "lot", "log", "cog"]
) == 5  # Standard valid transformation

# Example 2
assert solution.ladderLength(
    "hit",
    "cog",
    ["hot", "dot", "dog", "lot", "log"]
) == 0  # endWord missing

# Direct transformation
assert solution.ladderLength(
    "hit",
    "hot",
    ["hot"]
) == 2  # Single transformation

# No valid path despite endWord existing
assert solution.ladderLength(
    "hit",
    "cog",
    ["hot", "dot", "tod", "hog", "cog"]
) == 0  # Disconnected graph

# Multiple shortest paths
assert solution.ladderLength(
    "hit",
    "cog",
    ["hot", "dot", "dog", "lot", "log", "cog"]
) == 5  # Several shortest routes

# Smallest valid dictionary
assert solution.ladderLength(
    "a",
    "c",
    ["a", "b", "c"]
) == 2  # Single-character words

# Larger branching factor
assert solution.ladderLength(
    "red",
    "tax",
    ["ted", "tex", "red", "tax", "tad", "den", "rex", "pee"]
) == 4  # Multiple branching choices
Test Why
Example 1 Verifies standard shortest transformation
Example 2 Verifies immediate failure when endWord is absent
Direct transformation Tests one-step conversion
No valid path Ensures disconnected graph returns 0
Multiple shortest paths Confirms BFS finds minimum distance
Smallest valid dictionary Tests minimal word length
Larger branching factor Verifies BFS correctness under many choices

Edge Cases

endWord Does Not Exist in the Dictionary

This is the most important edge case. Since every transformed word must belong to wordList, reaching an endWord outside the dictionary is impossible.

A naive implementation might still run BFS unnecessarily, wasting time. Our implementation explicitly checks this condition at the start:

if endWord not in wordList:
    return 0

This guarantees immediate termination.

No Valid Transformation Path Exists

Even if endWord exists, there may be no chain connecting beginWord to it.

For example:

beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","tod","hog","cog"]

The graph is disconnected, so BFS eventually exhausts all possibilities and returns 0.

Our implementation naturally handles this by finishing traversal and returning:

0

when the queue becomes empty.

Many Similar Words Causing Excessive Reprocessing

A dictionary with many overlapping patterns can cause repeated neighbor scanning.

For example:

hot, dot, lot, pot, cot, got

all share:

*ot

Without optimization, the same adjacency list might be scanned repeatedly for every BFS node.

Our implementation avoids this inefficiency by clearing processed pattern entries:

pattern_map[pattern] = []

This ensures each wildcard group is traversed at most once, significantly improving performance for dense graphs.