LeetCode 269 - Alien Dictionary
This problem asks us to reconstruct the ordering of characters in an unknown alien language. We are given a list of words that are already sorted according to the alien language's lexicographical order.
Difficulty: 🔴 Hard
Topics: Array, String, Depth-First Search, Breadth-First Search, Graph Theory, Topological Sort
Solution
LeetCode 269, Alien Dictionary
Problem Understanding
This problem asks us to reconstruct the ordering of characters in an unknown alien language.
We are given a list of words that are already sorted according to the alien language's lexicographical order. The alphabet itself still uses lowercase English letters, but the order between letters is unknown. Our task is to infer a valid ordering of those letters.
The important observation is that lexicographical ordering works similarly to dictionary ordering in English. When comparing two words, the first position where the characters differ determines the ordering between those two characters.
For example:
"wrt"
"wrf"
The first differing characters are t and f, therefore we know:
t comes before f
These relationships form a directed graph:
t -> f
The final answer is a topological ordering of this graph.
The input is an array of strings:
words[i]
Each word contains lowercase English letters. The words are guaranteed to contain only valid characters, but the ordering itself may be invalid.
The output should be:
- A string representing one valid character ordering, if possible
- An empty string
""if the dictionary ordering is contradictory
The constraints are relatively small:
- At most 100 words
- Each word length at most 100
- Only lowercase English letters
This means we do not need extremely advanced optimizations, but we do need a correct graph-based solution.
Several edge cases are especially important:
- Cycles in the ordering graph
If we derive relationships like:
a -> b
b -> c
c -> a
then no valid ordering exists.
- Prefix invalidation
This is one of the trickiest cases.
Example:
["abc", "ab"]
This is invalid because a longer word appears before its own prefix. In any lexicographical ordering, "ab" must come before "abc".
- Characters with no edges
Some letters may appear in words but never participate in ordering constraints. They still must appear in the final answer.
- Multiple valid orderings
The problem allows returning any valid topological ordering.
Approaches
Brute Force Approach
A brute-force approach would attempt to generate all possible permutations of the unique characters and test whether each permutation satisfies the dictionary ordering.
Suppose there are k unique characters. We could:
- Generate every permutation of those characters
- Build a ranking map for the permutation
- Verify whether all adjacent words are correctly ordered
This approach is theoretically correct because it eventually checks every possible alphabet ordering.
However, it is far too slow.
The number of permutations is:
k!
Even with only 10 unique characters:
10! = 3,628,800
The search space becomes enormous very quickly.
Optimal Approach, Graph + Topological Sort
The key insight is that each adjacent pair of words gives us direct ordering information.
When two adjacent words differ at some position:
word1[i] != word2[i]
then we immediately know:
word1[i] comes before word2[i]
These relationships naturally form a directed graph.
For example:
w -> e
e -> r
r -> t
t -> f
Finding a valid alphabet ordering becomes equivalent to finding a valid topological ordering of the graph.
A topological sort works because:
- Directed edges represent ordering constraints
- A valid topological order respects all constraints
- Cycles indicate contradictory constraints
We can use Kahn's Algorithm, BFS-based topological sorting, to efficiently compute the ordering.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(k! * n * m) | O(k) | Generates every possible character ordering |
| Optimal | O(C + E) | O(C + E) | Uses graph construction and topological sort |
Where:
C= total number of unique charactersE= number of ordering relationships
Algorithm Walkthrough
- Initialize the graph and indegree map
We first collect every unique character from all words.
Even if a character has no incoming or outgoing edges, it still belongs in the final ordering, so every character must exist in the graph structures.
We maintain:
graph[char], adjacency listindegree[char], number of incoming edges
- Compare adjacent words
We compare:
words[i]
words[i + 1]
because only adjacent words provide direct lexicographical constraints.
- Detect invalid prefix cases
Before comparing characters, we check:
if len(word1) > len(word2) and word1.startswith(word2)
Example:
["abc", "ab"]
This is impossible in lexicographical ordering, so we immediately return "".
- Find the first differing character
We compare characters position by position.
The first mismatch determines the ordering relationship.
Example:
"wrt"
"wrf"
The first difference is:
t != f
Therefore:
t -> f
We add a directed edge.
We only use the first differing character because later characters provide no information once lexicographical order is already determined.
- Build indegrees carefully
When adding an edge:
u -> v
we increment:
indegree[v]
We also avoid duplicate edges using a set check.
- Initialize BFS queue
All characters with indegree 0 have no prerequisites.
These can appear first in the ordering.
We place them into a queue.
- Perform topological sort
We repeatedly:
- Remove a node from the queue
- Add it to the answer
- Reduce indegree of neighbors
- Push neighbors whose indegree becomes
0
This ensures characters only appear after all prerequisites are satisfied.
- Detect cycles
If the resulting ordering length is smaller than the number of unique characters, then some nodes were never processed.
That means a cycle exists.
In that case we return "".
Why it works
The graph precisely represents all ordering constraints derived from the dictionary.
A topological ordering guarantees that every directed edge:
u -> v
appears with u before v.
If a cycle exists, no such ordering is possible because some characters would depend on themselves indirectly.
Therefore:
- A successful topological sort gives a valid alien alphabet
- Failure to process all nodes correctly identifies contradictions
Python Solution
from typing import List
from collections import defaultdict, deque
class Solution:
def alienOrder(self, words: List[str]) -> str:
# Initialize graph and indegree for all unique characters
graph = defaultdict(set)
indegree = {}
for word in words:
for char in word:
indegree[char] = 0
# Build graph
for i in range(len(words) - 1):
word1 = words[i]
word2 = words[i + 1]
# Invalid prefix case
if len(word1) > len(word2) and word1.startswith(word2):
return ""
min_length = min(len(word1), len(word2))
for j in range(min_length):
if word1[j] != word2[j]:
parent = word1[j]
child = word2[j]
# Avoid duplicate edges
if child not in graph[parent]:
graph[parent].add(child)
indegree[child] += 1
break
# Start with all nodes having indegree 0
queue = deque()
for char in indegree:
if indegree[char] == 0:
queue.append(char)
result = []
# Kahn's topological sort
while queue:
current = queue.popleft()
result.append(current)
for neighbor in graph[current]:
indegree[neighbor] -= 1
if indegree[neighbor] == 0:
queue.append(neighbor)
# Cycle detection
if len(result) != len(indegree):
return ""
return "".join(result)
The implementation begins by initializing all unique characters in the indegree map. This guarantees isolated characters are preserved in the final answer.
The graph uses a defaultdict(set) so duplicate edges are avoided naturally. Duplicate edges are important to prevent because multiple identical constraints should not artificially increase indegree counts.
The adjacent word comparison phase extracts ordering relationships. Only the first differing character matters because lexicographical ordering is decided immediately at the first mismatch.
The prefix validation is handled before character comparison. This catches invalid inputs like:
["abc", "ab"]
The BFS phase implements Kahn's Algorithm. Characters with indegree 0 are processed first because they have no prerequisites.
Finally, cycle detection compares the number of processed nodes with the number of unique characters. If they differ, a cycle prevented completion of the topological ordering.
Go Solution
package main
import (
"strings"
)
func alienOrder(words []string) string {
graph := make(map[byte]map[byte]bool)
indegree := make(map[byte]int)
// Initialize all characters
for _, word := range words {
for i := 0; i < len(word); i++ {
indegree[word[i]] = 0
}
}
// Build graph
for i := 0; i < len(words)-1; i++ {
word1 := words[i]
word2 := words[i+1]
// Invalid prefix case
if len(word1) > len(word2) &&
strings.HasPrefix(word1, word2) {
return ""
}
minLen := len(word1)
if len(word2) < minLen {
minLen = len(word2)
}
for j := 0; j < minLen; j++ {
if word1[j] != word2[j] {
parent := word1[j]
child := word2[j]
if graph[parent] == nil {
graph[parent] = make(map[byte]bool)
}
// Avoid duplicate edges
if !graph[parent][child] {
graph[parent][child] = true
indegree[child]++
}
break
}
}
}
// Queue for BFS
queue := []byte{}
for char, degree := range indegree {
if degree == 0 {
queue = append(queue, char)
}
}
result := []byte{}
// Kahn's algorithm
for len(queue) > 0 {
current := queue[0]
queue = queue[1:]
result = append(result, current)
for neighbor := range graph[current] {
indegree[neighbor]--
if indegree[neighbor] == 0 {
queue = append(queue, neighbor)
}
}
}
// Cycle detection
if len(result) != len(indegree) {
return ""
}
return string(result)
}
The Go implementation follows the same algorithmic structure as the Python solution.
One difference is the graph representation. Go does not have a built-in set type, so the adjacency list uses:
map[byte]bool
to simulate a set.
The BFS queue is implemented using slices. Removing the first element:
queue = queue[1:]
works efficiently enough for the given constraints.
Go also requires explicit handling of missing maps, so adjacency maps are initialized lazily before inserting edges.
Worked Examples
Example 1
words = ["wrt","wrf","er","ett","rftt"]
Step 1, Initialize Characters
| Character | Indegree |
|---|---|
| w | 0 |
| r | 0 |
| t | 0 |
| f | 0 |
| e | 0 |
Step 2, Build Graph
Compare "wrt" and "wrf"
First difference:
t != f
Add:
t -> f
| Edge | Result |
|---|---|
| t -> f | indegree[f] = 1 |
Compare "wrf" and "er"
First difference:
w != e
Add:
w -> e
| Edge | Result |
|---|---|
| w -> e | indegree[e] = 1 |
Compare "er" and "ett"
First difference:
r != t
Add:
r -> t
| Edge | Result |
|---|---|
| r -> t | indegree[t] = 1 |
Compare "ett" and "rftt"
First difference:
e != r
Add:
e -> r
| Edge | Result |
|---|---|
| e -> r | indegree[r] = 1 |
Final Graph
| Node | Neighbors |
|---|---|
| w | e |
| e | r |
| r | t |
| t | f |
Indegrees
| Character | Indegree |
|---|---|
| w | 0 |
| e | 1 |
| r | 1 |
| t | 1 |
| f | 1 |
BFS Process
| Queue | Output |
|---|---|
| [w] | "" |
| [e] | "w" |
| [r] | "we" |
| [t] | "wer" |
| [f] | "wert" |
| [] | "wertf" |
Final answer:
"wertf"
Example 2
words = ["z","x"]
Graph Construction
Compare:
"z"
"x"
First difference:
z != x
Add:
z -> x
Indegrees
| Character | Indegree |
|---|---|
| z | 0 |
| x | 1 |
BFS
| Queue | Output |
|---|---|
| [z] | "" |
| [x] | "z" |
| [] | "zx" |
Final answer:
"zx"
Example 3
words = ["z","x","z"]
Graph Construction
Compare "z" and "x":
z -> x
Compare "x" and "z":
x -> z
Final Graph
z -> x
x -> z
This creates a cycle.
BFS Initialization
Every node has indegree 1.
Queue starts empty.
Result
No nodes can be processed.
Since:
processed nodes != total nodes
we detect a cycle and return:
""
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(C + E) | Each character and edge is processed a constant number of times |
| Space | O(C + E) | Graph, indegree map, queue, and result storage |
Where:
Cis the number of unique charactersEis the number of ordering constraints
The graph construction phase examines adjacent word pairs and processes each character comparison once. The BFS topological sort processes every node and edge exactly once.
Because the alphabet is limited to lowercase English letters, the practical runtime is very small, but the asymptotic analysis still follows standard graph complexity.
Test Cases
from typing import List
class Solution:
def alienOrder(self, words: List[str]) -> str:
from collections import defaultdict, deque
graph = defaultdict(set)
indegree = {}
for word in words:
for char in word:
indegree[char] = 0
for i in range(len(words) - 1):
word1 = words[i]
word2 = words[i + 1]
if len(word1) > len(word2) and word1.startswith(word2):
return ""
for j in range(min(len(word1), len(word2))):
if word1[j] != word2[j]:
if word2[j] not in graph[word1[j]]:
graph[word1[j]].add(word2[j])
indegree[word2[j]] += 1
break
queue = deque(
[c for c in indegree if indegree[c] == 0]
)
result = []
while queue:
current = queue.popleft()
result.append(current)
for neighbor in graph[current]:
indegree[neighbor] -= 1
if indegree[neighbor] == 0:
queue.append(neighbor)
if len(result) != len(indegree):
return ""
return "".join(result)
solution = Solution()
# Basic valid ordering
assert solution.alienOrder(
["wrt","wrf","er","ett","rftt"]
) == "wertf"
# Simple two-character ordering
assert solution.alienOrder(["z","x"]) == "zx"
# Cycle detection
assert solution.alienOrder(["z","x","z"]) == ""
# Invalid prefix case
assert solution.alienOrder(["abc","ab"]) == ""
# Single word
assert set(solution.alienOrder(["abc"])) == {"a", "b", "c"}
# Multiple disconnected characters
result = solution.alienOrder(["ab","ac"])
assert result.index("b") < result.index("c")
# Duplicate constraints should not overcount indegree
result = solution.alienOrder(["za","zb","ca","cb"])
assert result != ""
# All same character
assert solution.alienOrder(["a","a"]) == "a"
# Independent letters
result = solution.alienOrder(["a","b","c"])
assert result == "abc"
# Larger cycle
assert solution.alienOrder(["abc","bca","cab","aac"]) == ""
Test Summary
| Test | Why |
|---|---|
["wrt","wrf","er","ett","rftt"] |
Standard topological ordering |
["z","x"] |
Simple direct dependency |
["z","x","z"] |
Detects graph cycle |
["abc","ab"] |
Validates prefix invalidation |
["abc"] |
Handles single-word input |
["ab","ac"] |
Tests partial ordering constraints |
["za","zb","ca","cb"] |
Ensures duplicate edges are ignored |
["a","a"] |
Handles repeated identical words |
["a","b","c"] |
Simple linear ordering |
["abc","bca","cab","aac"] |
Detects more complex cycles |
Edge Cases
Invalid Prefix Relationships
One of the most common bugs in this problem comes from failing to handle prefix violations.
Example:
["abc", "ab"]
A naive implementation might compare characters, find no mismatch, and incorrectly conclude the ordering is valid.
However, lexicographically, a shorter prefix must always come before the longer word. Since "abc" appears before "ab", the input is invalid.
The implementation explicitly checks:
if len(word1) > len(word2) and word1.startswith(word2):
and immediately returns "".
Cycles in the Dependency Graph
Another critical edge case is cyclic dependencies.
Example:
["z","x","z"]
This creates:
z -> x
x -> z
No valid alphabet can satisfy both constraints simultaneously.
The implementation detects this naturally through topological sorting. If some nodes never reach indegree 0, they cannot be processed, meaning a cycle exists.
Characters Without Constraints
Some characters may never participate in any ordering relationship.
Example:
["abc"]
There are no edges because there is only one word.
A buggy implementation might omit such characters entirely.
This solution avoids that issue by initializing every unique character before graph construction:
for word in words:
for char in word:
indegree[char] = 0
As a result, isolated characters still appear in the final ordering.