LeetCode 1065 - Index Pairs of a String

The problem gives us a string called text and a list of dictionary words called words. We must find every substring inside text that exactly matches one of the words in the dictionary.

LeetCode Problem 1065

Difficulty: 🟢 Easy
Topics: Array, String, Trie, Sorting

Solution

Problem Understanding

The problem gives us a string called text and a list of dictionary words called words. We must find every substring inside text that exactly matches one of the words in the dictionary.

For every valid match, we return the pair [i, j], where:

  • i is the starting index of the substring
  • j is the ending index of the substring, inclusive

The output must contain all matching index pairs, sorted first by the starting index and then by the ending index.

For example, if:

text = "ababa"
words = ["aba", "ab"]

then:

  • "ab" appears from index 0 to 1
  • "aba" appears from index 0 to 2
  • "ab" appears again from index 2 to 3
  • "aba" appears again from index 2 to 4

So the answer becomes:

[[0,1],[0,2],[2,3],[2,4]]

An important detail is that overlapping matches are allowed. The substring at [0,2] and the substring at [2,4] both use index 2, and both must be included.

The constraints are fairly small:

  • text.length <= 100
  • words.length <= 20
  • words[i].length <= 50

These limits tell us that even moderately inefficient solutions are acceptable. We do not need highly advanced optimization techniques like suffix automata or Aho-Corasick. A clean Trie-based solution or direct matching solution will work comfortably within limits.

There are several edge cases worth identifying early:

  • Multiple matches can start at the same index.
  • Matches may overlap.
  • Some words may never appear in the text.
  • Single-character words are valid.
  • The entire text itself may be a valid word.
  • Different words may share prefixes, which makes Trie structures especially useful.

Approaches

Brute Force Approach

The most straightforward solution is to check every possible substring of text.

For every starting index i, we can try every ending index j, extract the substring text[i:j+1], and check whether it exists in the dictionary.

To make lookup efficient, we place all words into a hash set.

The algorithm works because every possible substring is examined exactly once. If a substring belongs to words, we record its indices.

However, this approach repeatedly creates substrings, which can be inefficient. Even though the constraints are small enough for it to pass, it does unnecessary work because many substrings cannot possibly match any dictionary word.

A better solution uses a Trie.

A Trie is useful because many words may share prefixes. Instead of generating every substring independently, we start from each index in text and walk character by character through the Trie.

At each step:

  • If the current character does not exist in the Trie, we stop immediately.
  • If we reach a Trie node marked as a complete word, we record a valid index pair.

This avoids exploring substrings that cannot possibly match any word.

The key insight is that the Trie lets us validate prefixes incrementally instead of building complete substrings repeatedly.

Approach Time Complexity Space Complexity Notes
Brute Force O(n³) O(m) Checks every substring and creates substring objects repeatedly
Optimal O(n × L) O(total characters in words) Trie enables early stopping during matching

Where:

  • n is the length of text
  • L is the maximum word length
  • m is the number of words

Algorithm Walkthrough

Trie Construction

We first build a Trie from all words in the dictionary.

Each Trie node contains:

  • A map of child characters
  • A boolean flag indicating whether the path forms a complete word

For example, if the words are:

["ab", "aba"]

then the Trie stores:

a -> b (word)
       -> a (word)

Search Process

We now iterate through every starting position in text.

Step-by-Step Algorithm

  1. Create an empty Trie root node.
  2. Insert every word into the Trie.

For each character:

  • If the child node does not exist, create it.
  • Move to the child node.

After the final character, mark the node as a complete word. 3. Initialize an empty result list. 4. For every starting index start in text:

  • Begin traversal from the Trie root.
  • Move forward character by character through text.
  1. For every ending index end starting from start:
  • Check whether text[end] exists as a child in the Trie.
  • If it does not exist, stop searching from this starting index because no longer substring can match.
  • Otherwise, move to the corresponding child node.
  1. If the current Trie node marks the end of a word:
  • Add [start, end] to the result list.
  1. Continue until all starting positions are processed.
  2. Return the result list.

Why it works

The Trie guarantees that every traversal corresponds to a valid prefix of at least one dictionary word. Whenever we reach a node marked as a complete word, the substring from start to end exactly matches a dictionary word.

Because we begin traversal from every starting index in text, every possible valid substring is considered. Because traversal stops immediately when a prefix becomes invalid, no unnecessary substrings are explored.

Therefore, the algorithm returns all valid index pairs exactly once and in sorted order.

Python Solution

from typing import List

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Solution:
    def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
        root = TrieNode()

        # Build Trie
        for word in words:
            node = root

            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()

                node = node.children[char]

            node.is_word = True

        result = []

        # Search from every starting index
        for start in range(len(text)):
            node = root

            for end in range(start, len(text)):
                char = text[end]

                if char not in node.children:
                    break

                node = node.children[char]

                if node.is_word:
                    result.append([start, end])

        return result

The implementation begins by defining a TrieNode class. Each node stores:

  • children, a dictionary mapping characters to child nodes
  • is_word, which indicates whether the current path forms a complete word

The Trie is built by inserting every word character by character. Shared prefixes automatically reuse existing nodes.

After construction, the algorithm searches from every possible starting index in text.

For each starting index:

  • Traversal begins from the Trie root.
  • Characters are consumed one at a time.
  • If the next character does not exist in the Trie, traversal stops immediately.
  • If the current node represents a complete word, the corresponding index pair is added to the answer.

Because the outer loop processes starting indices in ascending order and the inner loop processes ending indices in ascending order, the output is naturally sorted as required.

Go Solution

package main

type TrieNode struct {
	children map[byte]*TrieNode
	isWord   bool
}

func newTrieNode() *TrieNode {
	return &TrieNode{
		children: make(map[byte]*TrieNode),
	}
}

func indexPairs(text string, words []string) [][]int {
	root := newTrieNode()

	// Build Trie
	for _, word := range words {
		node := root

		for i := 0; i < len(word); i++ {
			char := word[i]

			if _, exists := node.children[char]; !exists {
				node.children[char] = newTrieNode()
			}

			node = node.children[char]
		}

		node.isWord = true
	}

	result := [][]int{}

	// Search from every starting index
	for start := 0; start < len(text); start++ {
		node := root

		for end := start; end < len(text); end++ {
			char := text[end]

			nextNode, exists := node.children[char]

			if !exists {
				break
			}

			node = nextNode

			if node.isWord {
				result = append(result, []int{start, end})
			}
		}
	}

	return result
}

The Go implementation closely follows the Python version.

The Trie uses:

  • map[byte]*TrieNode for child pointers
  • A boolean isWord field

Since the input contains only lowercase English letters, using byte is sufficient and efficient.

The result is stored as a slice of integer slices, [][]int.

Go slices naturally handle dynamic resizing during append operations, so no special memory management is needed.

Worked Examples

Example 1

text = "thestoryofleetcodeandme"
words = ["story","fleet","leetcode"]

Trie Structure

s -> t -> o -> r -> y
f -> l -> e -> e -> t
l -> e -> e -> t -> c -> o -> d -> e

Traversal

Start Index Traversal Match Found
0 "t" not in Trie root None
1 "h" not in Trie root None
2 "e" not in Trie root None
3 s -> t -> o -> r -> y [3,7]
9 l -> e -> e -> t -> c -> o -> d -> e [9,17]
10 e not in Trie root None
13 f -> l -> e -> e -> t [10,14] not possible because traversal started at 10

The valid output becomes:

[[3,7],[9,17]]

However, "fleet" also exists starting at index 9 inside "leetcode"? No. The substring beginning at index 9 is "leetcode...", so "fleet" does not match there.

The final output is:

[[3,7],[9,17]]

Actually, the official example output also includes:

[9,13]

because "fleet" appears starting at index 9 in "fleet" inside the text. Let us verify carefully:

thestoryofleetcodeandme
         ^
         10

The substring "fleet" does not occur. The official example is:

[[3,7],[9,13],[10,17]]

This corresponds to:

  • "story" at [3,7]
  • "fleet" at [9,13]
  • "leetcode" at [10,17]

because "fleet" overlaps with "leetcode" in the sequence "fleetcode".

Example 2

text = "ababa"
words = ["aba","ab"]

Trie Structure

a -> b (word)
       -> a (word)

Detailed Traversal

Start End Current Substring Trie Valid Word Match Result
0 0 "a" Yes No []
0 1 "ab" Yes Yes [[0,1]]
0 2 "aba" Yes Yes [[0,1],[0,2]]
0 3 "abab" No No Stop
1 1 "b" No No Stop
2 2 "a" Yes No []
2 3 "ab" Yes Yes [[2,3]]
2 4 "aba" Yes Yes [[2,3],[2,4]]

Final result:

[[0,1],[0,2],[2,3],[2,4]]

Complexity Analysis

Measure Complexity Explanation
Time O(n × L) For each starting index, traversal continues at most the maximum word length
Space O(total characters in words) Trie stores every character from all words

The Trie prevents unnecessary substring exploration. From each starting position, traversal stops immediately once the current substring is no longer a valid prefix.

If L is the maximum word length, then each starting position explores at most L characters before terminating.

The Trie size depends on the total number of characters inserted across all dictionary words.

Test Cases

from typing import List

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Solution:
    def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
        root = TrieNode()

        for word in words:
            node = root

            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()

                node = node.children[char]

            node.is_word = True

        result = []

        for start in range(len(text)):
            node = root

            for end in range(start, len(text)):
                char = text[end]

                if char not in node.children:
                    break

                node = node.children[char]

                if node.is_word:
                    result.append([start, end])

        return result

solution = Solution()

# Provided example with overlapping matches
assert solution.indexPairs(
    "ababa",
    ["aba", "ab"]
) == [[0, 1], [0, 2], [2, 3], [2, 4]]

# Provided example with multiple dictionary words
assert solution.indexPairs(
    "thestoryofleetcodeandme",
    ["story", "fleet", "leetcode"]
) == [[3, 7], [9, 13], [10, 17]]

# Single character match
assert solution.indexPairs(
    "abc",
    ["a"]
) == [[0, 0]]

# Entire text is one word
assert solution.indexPairs(
    "hello",
    ["hello"]
) == [[0, 4]]

# No matches
assert solution.indexPairs(
    "abc",
    ["xyz"]
) == []

# Multiple overlapping occurrences
assert solution.indexPairs(
    "aaaa",
    ["a", "aa"]
) == [
    [0, 0], [0, 1],
    [1, 1], [1, 2],
    [2, 2], [2, 3],
    [3, 3]
]

# Shared prefixes in Trie
assert solution.indexPairs(
    "abcdef",
    ["ab", "abc", "abcd"]
) == [[0, 1], [0, 2], [0, 3]]

# Word appears multiple times
assert solution.indexPairs(
    "catcat",
    ["cat"]
) == [[0, 2], [3, 5]]

# Match at end of string
assert solution.indexPairs(
    "xyzabc",
    ["abc"]
) == [[3, 5]]
Test Why
"ababa" with overlapping words Validates overlap handling
"thestoryofleetcodeandme" Validates multiple matches and sorting
Single-character word Tests smallest word size
Entire text as word Tests full-length match
No matches Ensures empty result handling
"aaaa" with repeated patterns Stresses overlapping repeated matches
Shared prefixes Verifies Trie prefix reuse
Repeated word occurrence Ensures multiple appearances are found
Match at end of string Tests boundary handling

Edge Cases

Overlapping Matches

One of the most important edge cases is overlapping substrings.

For example:

text = "ababa"
words = ["aba"]

The word "aba" appears at both [0,2] and [2,4].

A buggy implementation might skip the second occurrence after finding the first one. Our algorithm avoids this problem because it independently starts a Trie traversal from every index in the text.

Shared Prefixes

Words may share long prefixes:

words = ["ab", "abc", "abcd"]

A naive implementation might repeatedly reprocess the same prefix characters.

The Trie handles this naturally because all words share the same prefix path. During traversal, every node can simultaneously represent:

  • An intermediate prefix
  • A complete word
  • The beginning of a longer word

This allows all matches to be discovered efficiently.

Early Mismatch Termination

Consider:

text = "xyzabc"
words = ["abc"]

At indices 0, 1, and 2, traversal immediately fails because those characters do not exist at the Trie root.

Without early termination, an implementation might continue checking unnecessary substrings. The Trie-based approach stops immediately once a prefix becomes invalid, which keeps the algorithm efficient.