LeetCode 819: Most Common Word

Problem Restatement

We are given a string paragraph and a list of banned words.

We need to return the most frequent word in paragraph that is not banned.

Words are case-insensitive, so:

"Bob"
"bob"
"BOB"

all count as the same word.

The answer must be returned in lowercase.

Punctuation should be ignored. The paragraph may contain spaces and symbols such as:

!?',;.

The problem guarantees that at least one non-banned word exists, and that the answer is unique. (leetcode.com, doocs)

Input and Output

Item	Meaning
Input	A paragraph string and an array `banned`
Output	The most frequent word that is not banned
Case rule	Words are compared case-insensitively
Punctuation rule	Punctuation separates words
Guarantee	The answer is unique

Examples

Example 1:

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

After converting to lowercase and removing punctuation, the words are:

["bob", "hit", "a", "ball", "the", "hit", "ball", "flew", "far", "after", "it", "was", "hit"]

The word "hit" appears three times, but it is banned.

The word "ball" appears two times and is not banned.

So the answer is:

"ball"

Example 2:

paragraph = "a."
banned = []

The only word is:

"a"

So the answer is:

"a"

First Thought: Split by Spaces

A first idea is to split the paragraph by spaces:

paragraph.split()

But this does not handle punctuation correctly.

For example:

"ball,"

would stay as "ball,", not "ball".

So we need to normalize the paragraph first.

Key Insight

Punctuation acts like a word separator.

We can scan the paragraph character by character.

If the character is a letter, keep its lowercase form.

Otherwise, replace it with a space.

Then the paragraph becomes easy to split into clean lowercase words.

For example:

"Bob hit a ball,"

becomes:

"bob hit a ball "

Then we count every word that is not banned.

Algorithm

Convert banned to a set:

banned_set = set(banned)

Normalize the paragraph:

Convert letters to lowercase.
Replace non-letters with spaces.

Then split into words:

words = normalized.split()

Use a hash map to count non-banned words.

For each word:

If the word is banned, skip it.
Otherwise, increment its count.
Track the word with the largest count seen so far.

Return the best word.

Correctness

The normalization step preserves every letter from the paragraph and converts it to lowercase. It also replaces every punctuation symbol or space with a space. Therefore, splitting the normalized string produces exactly the lowercase words from the paragraph, with punctuation ignored.

The banned set contains exactly the words that must not be considered. The algorithm skips every word in this set, so no banned word can become the answer.

For every non-banned word, the algorithm increments its frequency count once for each occurrence. Therefore, the hash map stores the correct frequency of every allowed word.

Whenever a word’s count becomes larger than the current maximum, the algorithm updates the answer. Since the problem guarantees a unique most frequent non-banned word, the final stored answer is exactly the required word.

Complexity

Let n = len(paragraph) and m = len(banned).

Metric	Value	Why
Time	`O(n + m)`	Build the banned set, scan the paragraph, and count words
Space	`O(n + m)`	Store normalized characters, banned words, and word counts

Implementation

from collections import defaultdict

class Solution:
    def mostCommonWord(self, paragraph: str, banned: list[str]) -> str:
        banned_set = set(banned)

        chars = []

        for ch in paragraph:
            if ch.isalpha():
                chars.append(ch.lower())
            else:
                chars.append(" ")

        normalized = "".join(chars)

        counts = defaultdict(int)
        best_word = ""
        best_count = 0

        for word in normalized.split():
            if word in banned_set:
                continue

            counts[word] += 1

            if counts[word] > best_count:
                best_count = counts[word]
                best_word = word

        return best_word

Code Explanation

We first make banned-word lookup fast:

banned_set = set(banned)

Then we build a normalized version of the paragraph:

chars = []

Letters are kept and converted to lowercase:

if ch.isalpha():
    chars.append(ch.lower())

Everything else becomes a space:

else:
    chars.append(" ")

This handles commas, periods, exclamation marks, and other allowed punctuation.

After joining the characters:

normalized = "".join(chars)

we can safely split by whitespace:

for word in normalized.split():

Banned words are ignored:

if word in banned_set:
    continue

Every allowed word is counted:

counts[word] += 1

When a word reaches a new largest count, we update the answer:

if counts[word] > best_count:
    best_count = counts[word]
    best_word = word

Finally, return the most frequent allowed word.

Testing

def run_tests():
    s = Solution()

    assert s.mostCommonWord(
        "Bob hit a ball, the hit BALL flew far after it was hit.",
        ["hit"],
    ) == "ball"

    assert s.mostCommonWord(
        "a.",
        [],
    ) == "a"

    assert s.mostCommonWord(
        "Bob. hIt, baLl",
        ["bob", "hit"],
    ) == "ball"

    assert s.mostCommonWord(
        "a, a, a, b,b,b,c, c",
        ["a"],
    ) == "b"

    assert s.mostCommonWord(
        "Hello! hello? world.",
        ["world"],
    ) == "hello"

    print("all tests passed")

run_tests()

Test	Why
Official paragraph example	Checks punctuation, casing, and banned word
`"a."`	Minimum simple case
Mixed casing	Confirms case-insensitive counting
Repeated punctuation-separated words	Checks punctuation normalization
Banned lower-frequency alternative	Ensures banned words are skipped