LeetCode 2199 - Finding the Topic of Each Post

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem gives us two database tables, Keywords and Posts.

The Keywords table maps words to topic IDs. A single topic can have multiple words associated with it, and a single word may belong to multiple topics. For example, the word "football" might represent topic 1, while another word such as "war" might represent topic 3.

The Posts table contains social media posts. Each post has a unique post_id and a content string.

Our goal is to determine which topics apply to each post.

A topic applies to a post if at least one keyword associated with that topic appears in the post content. The matching must satisfy two important conditions:

Matching is case insensitive.
Matching must be done using complete words, not substrings.

That second rule is extremely important. For example:

"war" matches "war"
"war" does not match "warning"
"vaccine" does not match "vaccinated"

After determining all matching topics for a post:

If no topics match, return "Ambiguous!"
Otherwise, return the matching topic IDs sorted in ascending order and joined with commas

The output contains one row per post:

post_id	topic
post ID	matching topics or `"Ambiguous!"`

The problem is fundamentally a text matching problem inside SQL. We need to compare every post against every keyword while respecting word boundaries and case insensitivity.

The important edge cases are:

Different capitalization such as "WAR" vs "war"
Multiple keywords mapping to the same topic
One keyword belonging to multiple topics
Preventing substring matches such as "war" inside "warning"
Posts with no matching keywords
Duplicate topic IDs caused by multiple matching keywords from the same topic

These constraints strongly suggest that careful string normalization and exact word matching are required.

Approaches

Brute Force Approach

The most direct approach is to compare every post against every keyword.

For each post:

Convert the post content to lowercase.
Split the content into words.
For every keyword:

Convert the keyword to lowercase.
Check whether the keyword exists in the set of words from the post.

Collect all matching topic IDs.
Remove duplicates.
Sort the topic IDs.
Produce the final string.

This approach is correct because it explicitly checks every possible keyword against every post. Since we only accept complete words from the tokenized post, substring issues are avoided.

However, this approach becomes expensive when the number of posts and keywords grows large. If there are:

P posts
K keywords
L average words per post

then the algorithm performs roughly P * K comparisons.

Optimal Approach

The key insight is that word lookup should be constant time.

Instead of repeatedly scanning keywords, we can preprocess the Keywords table into a hash map:

word -> set of topic IDs

Example:

football -> {1}
handball -> {1}
war -> {3}
vaccine -> {2}

Then for each post:

Normalize the content to lowercase.
Split it into words.
For each word in the post:

Check whether it exists in the hash map.
If yes, add all associated topic IDs to the result set.

Convert the set into the required output format.

This avoids repeatedly scanning the entire keyword list for every post.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(P × K × L)	O(L)	Compare every keyword against every post
Optimal	O(K + total_words_in_posts)	O(K)	Hash map enables constant time keyword lookup

Algorithm Walkthrough

Step 1, Normalize the Keywords

Create a hash map where:

The key is the lowercase keyword
The value is a set of topic IDs associated with that keyword

Example:

"football" -> {1}
"war" -> {3}

We use lowercase keys so matching becomes case insensitive.

Step 2, Process Each Post

For every post:

Convert the entire content to lowercase.
Split the content by spaces into individual words.

This guarantees that only complete words are checked.

Step 3, Find Matching Topics

Initialize an empty set called matched_topics.

For every word in the post:

Check whether the word exists in the keyword map.
If it exists:
Add all corresponding topic IDs into matched_topics.

A set is used because:

Multiple keywords may map to the same topic
The same keyword may appear multiple times
We must avoid duplicate topic IDs

Step 4, Build the Output String

If matched_topics is empty:

"Ambiguous!"

Otherwise:

Sort the topic IDs numerically
Convert them to strings
Join them with commas

Example:

{3, 1} -> "1,3"

Step 5, Return Results

Return one output row for every post.

Why it works

The algorithm works because every post word is checked against the complete set of valid keywords. Since the post is tokenized into exact words, substring matches cannot occur. The hash map guarantees that every keyword lookup is accurate and efficient, while the set guarantees unique topic IDs. Sorting ensures the final output format matches the problem requirements.

Python Solution

from collections import defaultdict
from typing import List

class Solution:
    def findTopic(self, keywords: List[List[str]], posts: List[List[str]]) -> List[List[str]]:
        word_to_topics = defaultdict(set)

        # Build keyword mapping
        for topic_id, word in keywords:
            word_to_topics[word.lower()].add(topic_id)

        result = []

        # Process each post
        for post_id, content in posts:
            matched_topics = set()

            words = content.lower().split()

            for word in words:
                if word in word_to_topics:
                    matched_topics.update(word_to_topics[word])

            if matched_topics:
                topic_string = ",".join(
                    str(topic_id)
                    for topic_id in sorted(matched_topics)
                )
            else:
                topic_string = "Ambiguous!"

            result.append([post_id, topic_string])

        return result

The implementation begins by constructing a dictionary called word_to_topics. This dictionary maps each lowercase keyword to the set of topic IDs associated with it.

Using a defaultdict(set) simplifies insertion because we can directly add topic IDs without checking whether the key already exists.

Next, we process each post independently.

The content is converted to lowercase and split into words using .split(). Since the problem guarantees that the content only contains English letters and spaces, simple whitespace splitting is sufficient.

For every word in the post, we check whether the word exists in the keyword dictionary. If it does, we merge all associated topic IDs into matched_topics.

After processing the entire post:

If the set is empty, the result is "Ambiguous!"
Otherwise, we sort the topic IDs and join them into the required comma-separated string

Finally, we append the [post_id, topic_string] pair into the answer list.

Go Solution

package main

import (
	"sort"
	"strconv"
	"strings"
)

type Solution struct{}

func (s Solution) findTopic(keywords [][]string, posts [][]string) [][]string {
	wordToTopics := make(map[string]map[int]bool)

	// Build keyword mapping
	for _, row := range keywords {
		topicID, _ := strconv.Atoi(row[0])
		word := strings.ToLower(row[1])

		if _, exists := wordToTopics[word]; !exists {
			wordToTopics[word] = make(map[int]bool)
		}

		wordToTopics[word][topicID] = true
	}

	result := [][]string{}

	// Process posts
	for _, row := range posts {
		postID := row[0]
		content := strings.ToLower(row[1])

		words := strings.Fields(content)

		matchedTopics := make(map[int]bool)

		for _, word := range words {
			if topics, exists := wordToTopics[word]; exists {
				for topicID := range topics {
					matchedTopics[topicID] = true
				}
			}
		}

		if len(matchedTopics) == 0 {
			result = append(result, []string{postID, "Ambiguous!"})
			continue
		}

		topicList := []int{}

		for topicID := range matchedTopics {
			topicList = append(topicList, topicID)
		}

		sort.Ints(topicList)

		topicStrings := []string{}

		for _, topicID := range topicList {
			topicStrings = append(topicStrings, strconv.Itoa(topicID))
		}

		result = append(result, []string{
			postID,
			strings.Join(topicStrings, ","),
		})
	}

	return result
}

The Go implementation follows the same overall structure as the Python solution, but there are several language-specific details.

Go does not have a built in set type, so maps with boolean values are used to simulate sets:

map[int]bool

Similarly, the keyword mapping uses nested maps:

map[string]map[int]bool

The strings.Fields() function is used instead of split() because it automatically handles whitespace tokenization cleanly.

Topic IDs are gathered into a slice before sorting because Go cannot sort map keys directly.

Unlike Python, Go requires explicit string and integer conversions using the strconv package.

Worked Examples

Example 1

Input

Keywords:

topic_id	word
1	handball
1	football
3	WAR
2	Vaccine

Posts:

post_id	content
1	We call it soccer They call it football hahaha
2	Americans prefer basketball while Europeans love handball and football
3	stop the war and play handball
4	warning I planted some flowers this morning and then got vaccinated

Step 1, Build Keyword Map

After lowercasing:

word	topics
handball	{1}
football	{1}
war	{3}
vaccine	{2}

Process Post 1

Content:

we call it soccer they call it football hahaha

Words:

Word	Match	Topics Added
we	No	{}
call	No	{}
it	No	{}
soccer	No	{}
football	Yes	{1}
hahaha	No	{}

Final topic set:

{1}

Output:

"1"

Process Post 2

Words:

Word	Match	Topics Added
basketball	No	{}
handball	Yes	{1}
football	Yes	{1}

Final topic set:

{1}

Output:

"1"

Process Post 3

Words:

Word	Match	Topics Added
war	Yes	{3}
handball	Yes	{1}

Final topic set:

{1, 3}

Sorted output:

"1,3"

Process Post 4

Words include:

warning
vaccinated

Neither matches exact keywords:

"warning" is not "war"
"vaccinated" is not "vaccine"

Final topic set:

{}

Output:

"Ambiguous!"

Complexity Analysis

Measure	Complexity	Explanation
Time	O(K + W)	`K` keywords are inserted once, `W` total post words are processed once
Space	O(K)	Hash map stores all keywords and associated topic IDs

The preprocessing step inserts every keyword into the dictionary exactly once.

During post processing, each word is examined once, and each lookup in the hash map is average constant time.

The dominant factor is therefore the total number of words across all posts.

The space usage comes primarily from storing the keyword mapping and topic sets.

Test Cases

solution = Solution()

# Provided example
assert solution.findTopic(
    [
        [1, "handball"],
        [1, "football"],
        [3, "WAR"],
        [2, "Vaccine"]
    ],
    [
        [1, "We call it soccer They call it football hahaha"],
        [2, "Americans prefer basketball while Europeans love handball and football"],
        [3, "stop the war and play handball"],
        [4, "warning I planted some flowers this morning and then got vaccinated"]
    ]
) == [
    [1, "1"],
    [2, "1"],
    [3, "1,3"],
    [4, "Ambiguous!"]
]  # basic example from prompt

# Case insensitive matching
assert solution.findTopic(
    [
        [1, "War"]
    ],
    [
        [1, "WAR"],
        [2, "war"],
        [3, "WaR"]
    ]
) == [
    [1, "1"],
    [2, "1"],
    [3, "1"]
]  # verifies lowercase normalization

# Substring should not match
assert solution.findTopic(
    [
        [1, "war"]
    ],
    [
        [1, "warning"],
        [2, "war"]
    ]
) == [
    [1, "Ambiguous!"],
    [2, "1"]
]  # ensures exact word matching

# Multiple topics for same word
assert solution.findTopic(
    [
        [1, "football"],
        [2, "football"]
    ],
    [
        [1, "football"]
    ]
) == [
    [1, "1,2"]
]  # one keyword maps to multiple topics

# Duplicate keywords for same topic
assert solution.findTopic(
    [
        [1, "football"],
        [1, "soccer"]
    ],
    [
        [1, "football soccer football"]
    ]
) == [
    [1, "1"]
]  # duplicate topic IDs should be removed

# No matching keywords
assert solution.findTopic(
    [
        [1, "apple"]
    ],
    [
        [1, "banana orange"]
    ]
) == [
    [1, "Ambiguous!"]
]  # no topics found

# Multiple topics sorted
assert solution.findTopic(
    [
        [3, "war"],
        [1, "football"],
        [2, "vaccine"]
    ],
    [
        [1, "football vaccine war"]
    ]
) == [
    [1, "1,2,3"]
]  # verifies sorting of topic IDs

Test	Why
Provided example	Validates the complete workflow
Case insensitive matching	Ensures lowercase normalization works
Substring should not match	Prevents incorrect partial matches
Multiple topics for same word	Verifies one keyword can map to many topics
Duplicate keywords for same topic	Ensures duplicate topic IDs are removed
No matching keywords	Validates `"Ambiguous!"` behavior
Multiple topics sorted	Confirms ascending order formatting

Edge Cases

Substring Collisions

One of the most dangerous mistakes is using substring matching instead of exact word matching.

For example:

keyword = "war"
post = "warning"

A naive implementation using substring search would incorrectly match "war" inside "warning".

This solution avoids the issue by splitting posts into complete words and performing exact hash map lookups.

Multiple Keywords Mapping to the Same Topic

A post may contain several keywords belonging to the same topic.

Example:

Topic 1:
- football
- handball

A post containing both words should still produce:

"1"

not:

"1,1"

The implementation uses a set to guarantee uniqueness.

One Keyword Mapping to Multiple Topics

The same keyword may belong to multiple topics.

Example:

football -> topic 1
football -> topic 2

When "football" appears in a post, both topics must be included.

The hash map stores a set of topic IDs for each keyword, so all associated topics are collected correctly.

Posts With No Matching Keywords

Some posts may contain no keywords at all.

In this situation, the result must be exactly:

"Ambiguous!"

The implementation handles this by checking whether the topic set is empty after processing all words.

Mixed Capitalization

Keywords and post contents may use arbitrary capitalization:

"WAR"
"War"
"war"

All comparisons are performed on lowercase strings, guaranteeing consistent matching regardless of case.