LeetCode 2047 - Number of Valid Words in a Sentence
This problem asks us to count the number of valid words in a sentence. The sentence consists of lowercase letters, digits, hyphens, punctuation (!, ., ,), and spaces. A word is defined as a token separated by spaces, and it is considered valid if it satisfies three conditions: 1.
Difficulty: 🟢 Easy
Topics: String
Solution
Problem Understanding
This problem asks us to count the number of valid words in a sentence. The sentence consists of lowercase letters, digits, hyphens, punctuation (!, ., ,), and spaces. A word is defined as a token separated by spaces, and it is considered valid if it satisfies three conditions:
- It contains only lowercase letters, at most one hyphen, and at most one punctuation mark. It cannot contain digits.
- If it contains a hyphen, the hyphen must be surrounded by lowercase letters. Tokens like
"a-b"are valid, but"-ab"and"ab-"are not. - If it contains a punctuation mark, it must appear at the end of the token. Tokens like
"ab,"and"c!"are valid, but"a!b"is invalid.
The input is a string sentence with at least one token, up to length 1000. We need to return an integer representing the number of valid words in the sentence.
Important edge cases include multiple consecutive spaces, tokens containing digits, punctuation in the middle of a token, and tokens with multiple hyphens.
Approaches
The brute-force approach would split the sentence into tokens by spaces, then check each token character by character against all three rules. This works correctly because we can verify each rule sequentially. However, it requires careful handling of multiple hyphens and punctuation placement.
The optimal approach uses the same idea but is cleaner: iterate through each token, check for digits, count hyphens and punctuation marks, and enforce their placement rules. Since each token has at most length 1000, this approach is efficient.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Split the sentence and check each token sequentially |
| Optimal | O(n) | O(1) | Same as brute force but minimal overhead; checks rules efficiently while iterating |
Algorithm Walkthrough
-
Split the sentence into tokens by spaces. Ignore empty tokens resulting from multiple spaces.
-
Initialize a counter for valid words.
-
For each token:
-
Skip the token if it contains any digits.
-
Count the number of hyphens. If there is more than one, the token is invalid.
-
Ensure any hyphen is surrounded by lowercase letters. If not, the token is invalid.
-
Count punctuation marks. If there is more than one, the token is invalid.
-
Ensure any punctuation appears only at the end of the token. If not, the token is invalid.
-
If the token passes all checks, increment the valid word counter.
-
Return the counter.
Why it works: By checking each token sequentially and validating it against all three rules, we guarantee that only valid tokens contribute to the count. Each token is independently checked, which aligns with the problem definition.
Python Solution
class Solution:
def countValidWords(self, sentence: str) -> int:
def is_valid(token: str) -> bool:
if not token:
return False
hyphen_count = 0
for i, char in enumerate(token):
if char.isdigit():
return False
if char == '-':
hyphen_count += 1
if hyphen_count > 1:
return False
if i == 0 or i == len(token) - 1:
return False
if not (token[i-1].isalpha() and token[i+1].isalpha()):
return False
if char in "!.,":
if i != len(token) - 1:
return False
return True
tokens = sentence.split()
count = 0
for token in tokens:
if is_valid(token):
count += 1
return count
The Python implementation splits the sentence, then uses a helper is_valid function to check each token against the rules. The hyphen and punctuation rules are explicitly enforced, and digits immediately invalidate a token. Empty tokens from multiple spaces are ignored.
Go Solution
import "strings"
import "unicode"
func countValidWords(sentence string) int {
isValid := func(token string) bool {
if len(token) == 0 {
return false
}
hyphenCount := 0
for i, ch := range token {
if unicode.IsDigit(ch) {
return false
}
if ch == '-' {
hyphenCount++
if hyphenCount > 1 {
return false
}
if i == 0 || i == len(token)-1 {
return false
}
if !unicode.IsLetter(rune(token[i-1])) || !unicode.IsLetter(rune(token[i+1])) {
return false
}
}
if ch == '!' || ch == '.' || ch == ',' {
if i != len(token)-1 {
return false
}
}
}
return true
}
tokens := strings.Fields(sentence)
count := 0
for _, token := range tokens {
if isValid(token) {
count++
}
}
return count
}
In Go, we use strings.Fields to split by whitespace and iterate each token. The unicode package ensures proper checking of letters and digits. Index checks enforce hyphen and punctuation rules.
Worked Examples
Example 1: "cat and dog"
| Token | Valid? | Reason |
|---|---|---|
"cat" |
Yes | Only letters |
"and" |
Yes | Only letters |
"dog" |
Yes | Only letters |
Output: 3
Example 2: "!this 1-s b8d!"
| Token | Valid? | Reason |
|---|---|---|
"!this" |
No | Punctuation not at the end |
"1-s" |
No | Contains digit |
"b8d!" |
No | Contains digit |
Output: 0
Example 3: "alice and bob are playing stone-game10"
| Token | Valid? | Reason |
|---|---|---|
"alice" |
Yes | Only letters |
"and" |
Yes | Only letters |
"bob" |
Yes | Only letters |
"are" |
Yes | Only letters |
"playing" |
Yes | Only letters |
"stone-game10" |
No | Contains digit |
Output: 5
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | We iterate over each character of the sentence exactly once in total |
| Space | O(n) | Splitting the sentence creates an array of tokens |
Since the input length is up to 1000, both time and space are efficient.
Test Cases
# Provided examples
assert Solution().countValidWords("cat and dog") == 3 # simple words only
assert Solution().countValidWords("!this 1-s b8d!") == 0 # invalid tokens
assert Solution().countValidWords("alice and bob are playing stone-game10") == 5 # mix of valid/invalid
# Edge cases
assert Solution().countValidWords("a-b.") == 1 # valid hyphen
assert Solution().countValidWords("-ab ab-") == 0 # invalid hyphen positions
assert Solution().countValidWords("hello, world!") == 2 # punctuation at the end
assert Solution().countValidWords("hello! world!") == 2 # multiple punctuation at end of different tokens
assert Solution().countValidWords("1 2 3") == 0 # only digits
assert Solution().countValidWords("! . ,") == 3 # single punctuation tokens
| Test | Why |
|---|---|
"cat and dog" |
Normal sentence, all words valid |
"!this 1-s b8d!" |
Invalid words due to digits or punctuation placement |
"alice and bob are playing stone-game10" |
Mixed valid and invalid tokens |
"a-b." |
Tests valid hyphen |
"-ab ab-" |
Tests hyphen at start/end invalid |
"hello, world!" |
Punctuation at the end of valid words |
"1 2 3" |
Tokens with only digits |
"! . ," |
Tokens with only punctuation are valid |
Edge Cases
Multiple consecutive spaces: The sentence may have multiple spaces, creating empty tokens when splitting. Using split() in Python or strings.Fields() in Go automatically handles this by ignoring empty strings.
Hyphen at boundary: Tokens like "-ab" or "ab-" are invalid because the hyphen must be between letters. The implementation explicitly checks the character before and after the hyphen.
Single-character punctuation tokens: Tokens like "!" or "." are valid. The code allows punctuation to appear alone at the end of the token, which covers this case.
These edge cases cover the primary pitfalls that can cause naive solutions to fail.