LeetCode 819 - Most Common Word
The problem asks us to process a paragraph of text and determine which word appears most frequently, while ignoring a given list of banned words. The final answer must be returned in lowercase.
Difficulty: 🟢 Easy
Topics: Array, Hash Table, String, Counting
Solution
Problem Understanding
The problem asks us to process a paragraph of text and determine which word appears most frequently, while ignoring a given list of banned words. The final answer must be returned in lowercase.
Several important details make this problem slightly more involved than simple word counting. First, the paragraph may contain punctuation such as commas, periods, exclamation marks, apostrophes, semicolons, and question marks. These punctuation symbols are not considered part of a word and must be ignored during processing. Second, words are case-insensitive, meaning "Ball", "BALL", and "ball" should all be treated as the same word.
The input consists of two values:
paragraph, a string containing words, spaces, and punctuationbanned, an array of lowercase words that are not allowed to be considered for the answer
The output is a single string representing the most frequent word that is not banned.
The constraints are fairly small. The paragraph length is at most 1000 characters, and the banned list contains at most 100 words. Because of these limits, we do not need highly specialized optimizations. A straightforward linear scan with hash-based counting is more than sufficient.
There are several important edge cases to consider:
- Words may appear with different capitalization, so normalization to lowercase is necessary.
- Punctuation may appear adjacent to words, such as
"ball,"or"hit.", so punctuation must be removed or treated as separators. - The banned list may be empty.
- The paragraph may contain only a single word.
- Multiple punctuation marks may appear consecutively.
- The problem guarantees that there is at least one non-banned word and that the answer is unique, so we never need to handle ties.
Approaches
Brute Force Approach
A brute-force solution would first extract every word from the paragraph, normalize them to lowercase, and then, for each distinct word, repeatedly scan the entire list of words to count how many times it appears.
The algorithm would work like this:
- Convert the paragraph into lowercase.
- Remove punctuation and split the paragraph into words.
- For every word:
- Skip it if it is banned.
- Count how many times it appears by scanning the entire word list again.
- Track the word with the maximum frequency.
This approach is correct because every word frequency is computed exactly. However, it is inefficient because counting each word requires another pass through the list.
If there are n words, and we count frequencies by rescanning the list for every word, the total complexity becomes O(n²).
Optimal Approach
The key observation is that frequency counting is exactly what hash maps are designed for. Instead of rescanning the entire list repeatedly, we can count occurrences in a single pass.
The improved approach works as follows:
- Convert the paragraph to lowercase.
- Replace punctuation characters with spaces.
- Split the cleaned string into words.
- Store banned words in a hash set for
O(1)lookup. - Use a hash map to count occurrences of every non-banned word.
- Track the word with the highest frequency while counting.
This approach is efficient because every word is processed exactly once, and hash map operations are constant time on average.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Repeatedly scans all words to count frequencies |
| Optimal | O(n) | O(n) | Uses hash map frequency counting in a single pass |
Algorithm Walkthrough
- Convert the entire paragraph to lowercase.
This ensures that words differing only by capitalization are treated as identical. For example, "BALL" and "ball" become the same word.
2. Replace punctuation characters with spaces.
Since punctuation is not part of words, characters such as ., ,, !, ?, ;, and ' should act as separators. Replacing them with spaces simplifies splitting.
3. Split the cleaned paragraph into individual words.
After punctuation removal, calling split() gives a clean list of words.
4. Store the banned words in a hash set.
A hash set allows constant-time membership checks. This is important because we may check every word against the banned list. 5. Create a hash map to count frequencies.
As we iterate through the words:
- Skip the word if it is banned.
- Otherwise increment its count in the frequency map.
- Track the most frequent valid word while counting.
After updating a word's frequency:
- Compare it with the current maximum frequency.
- If it exceeds the maximum, update the answer.
- Return the most frequent non-banned word.
Why it works
The algorithm processes every valid word exactly once and maintains an accurate frequency count in the hash map. Since banned words are skipped entirely, only eligible words contribute to the counts. By continuously tracking the maximum frequency encountered, the algorithm guarantees that the returned word is the most common non-banned word. The problem guarantees uniqueness, so there is no ambiguity in the final result.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
banned_set = set(banned)
normalized = paragraph.lower()
for char in "!?',;.":
normalized = normalized.replace(char, " ")
words = normalized.split()
frequency = defaultdict(int)
most_common = ""
max_count = 0
for word in words:
if word in banned_set:
continue
frequency[word] += 1
if frequency[word] > max_count:
max_count = frequency[word]
most_common = word
return most_common
The implementation begins by converting the banned list into a set. This allows efficient membership checks while processing words.
Next, the paragraph is converted to lowercase so that comparisons become case-insensitive. The punctuation characters specified in the problem are replaced with spaces, which ensures that punctuation does not become part of a word.
The cleaned paragraph is then split into words using split(). Python automatically handles multiple spaces correctly.
A defaultdict(int) is used for frequency counting. This avoids manually checking whether a word already exists in the map.
As each word is processed:
- Banned words are skipped immediately.
- Valid words have their frequency incremented.
- If the new frequency exceeds the current maximum, the answer is updated.
Finally, the stored most common word is returned.
Go Solution
package main
import (
"strings"
)
func mostCommonWord(paragraph string, banned []string) string {
bannedSet := make(map[string]bool)
for _, word := range banned {
bannedSet[word] = true
}
normalized := strings.ToLower(paragraph)
punctuation := "!?',;."
for _, ch := range punctuation {
normalized = strings.ReplaceAll(normalized, string(ch), " ")
}
words := strings.Fields(normalized)
frequency := make(map[string]int)
mostCommon := ""
maxCount := 0
for _, word := range words {
if bannedSet[word] {
continue
}
frequency[word]++
if frequency[word] > maxCount {
maxCount = frequency[word]
mostCommon = word
}
}
return mostCommon
}
The Go implementation follows the same logic as