LeetCode 2047 - Number of Valid Words in a Sentence

This problem asks us to count the number of valid words in a sentence. The sentence consists of lowercase letters, digits, hyphens, punctuation (!, ., ,), and spaces. A word is defined as a token separated by spaces, and it is considered valid if it satisfies three conditions: 1.

LeetCode Problem 2047

Difficulty: 🟢 Easy
Topics: String

Solution

Problem Understanding

This problem asks us to count the number of valid words in a sentence. The sentence consists of lowercase letters, digits, hyphens, punctuation (!, ., ,), and spaces. A word is defined as a token separated by spaces, and it is considered valid if it satisfies three conditions:

  1. It contains only lowercase letters, at most one hyphen, and at most one punctuation mark. It cannot contain digits.
  2. If it contains a hyphen, the hyphen must be surrounded by lowercase letters. Tokens like "a-b" are valid, but "-ab" and "ab-" are not.
  3. If it contains a punctuation mark, it must appear at the end of the token. Tokens like "ab," and "c!" are valid, but "a!b" is invalid.

The input is a string sentence with at least one token, up to length 1000. We need to return an integer representing the number of valid words in the sentence.

Important edge cases include multiple consecutive spaces, tokens containing digits, punctuation in the middle of a token, and tokens with multiple hyphens.

Approaches

The brute-force approach would split the sentence into tokens by spaces, then check each token character by character against all three rules. This works correctly because we can verify each rule sequentially. However, it requires careful handling of multiple hyphens and punctuation placement.

The optimal approach uses the same idea but is cleaner: iterate through each token, check for digits, count hyphens and punctuation marks, and enforce their placement rules. Since each token has at most length 1000, this approach is efficient.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) O(n) Split the sentence and check each token sequentially
Optimal O(n) O(1) Same as brute force but minimal overhead; checks rules efficiently while iterating

Algorithm Walkthrough

  1. Split the sentence into tokens by spaces. Ignore empty tokens resulting from multiple spaces.

  2. Initialize a counter for valid words.

  3. For each token:

  4. Skip the token if it contains any digits.

  5. Count the number of hyphens. If there is more than one, the token is invalid.

  6. Ensure any hyphen is surrounded by lowercase letters. If not, the token is invalid.

  7. Count punctuation marks. If there is more than one, the token is invalid.

  8. Ensure any punctuation appears only at the end of the token. If not, the token is invalid.

  9. If the token passes all checks, increment the valid word counter.

  10. Return the counter.

Why it works: By checking each token sequentially and validating it against all three rules, we guarantee that only valid tokens contribute to the count. Each token is independently checked, which aligns with the problem definition.

Python Solution

class Solution:
    def countValidWords(self, sentence: str) -> int:
        def is_valid(token: str) -> bool:
            if not token:
                return False
            hyphen_count = 0
            for i, char in enumerate(token):
                if char.isdigit():
                    return False
                if char == '-':
                    hyphen_count += 1
                    if hyphen_count > 1:
                        return False
                    if i == 0 or i == len(token) - 1:
                        return False
                    if not (token[i-1].isalpha() and token[i+1].isalpha()):
                        return False
                if char in "!.,": 
                    if i != len(token) - 1:
                        return False
            return True
        
        tokens = sentence.split()
        count = 0
        for token in tokens:
            if is_valid(token):
                count += 1
        return count

The Python implementation splits the sentence, then uses a helper is_valid function to check each token against the rules. The hyphen and punctuation rules are explicitly enforced, and digits immediately invalidate a token. Empty tokens from multiple spaces are ignored.

Go Solution

import "strings"
import "unicode"

func countValidWords(sentence string) int {
    isValid := func(token string) bool {
        if len(token) == 0 {
            return false
        }
        hyphenCount := 0
        for i, ch := range token {
            if unicode.IsDigit(ch) {
                return false
            }
            if ch == '-' {
                hyphenCount++
                if hyphenCount > 1 {
                    return false
                }
                if i == 0 || i == len(token)-1 {
                    return false
                }
                if !unicode.IsLetter(rune(token[i-1])) || !unicode.IsLetter(rune(token[i+1])) {
                    return false
                }
            }
            if ch == '!' || ch == '.' || ch == ',' {
                if i != len(token)-1 {
                    return false
                }
            }
        }
        return true
    }

    tokens := strings.Fields(sentence)
    count := 0
    for _, token := range tokens {
        if isValid(token) {
            count++
        }
    }
    return count
}

In Go, we use strings.Fields to split by whitespace and iterate each token. The unicode package ensures proper checking of letters and digits. Index checks enforce hyphen and punctuation rules.

Worked Examples

Example 1: "cat and dog"

Token Valid? Reason
"cat" Yes Only letters
"and" Yes Only letters
"dog" Yes Only letters

Output: 3

Example 2: "!this 1-s b8d!"

Token Valid? Reason
"!this" No Punctuation not at the end
"1-s" No Contains digit
"b8d!" No Contains digit

Output: 0

Example 3: "alice and bob are playing stone-game10"

Token Valid? Reason
"alice" Yes Only letters
"and" Yes Only letters
"bob" Yes Only letters
"are" Yes Only letters
"playing" Yes Only letters
"stone-game10" No Contains digit

Output: 5

Complexity Analysis

Measure Complexity Explanation
Time O(n) We iterate over each character of the sentence exactly once in total
Space O(n) Splitting the sentence creates an array of tokens

Since the input length is up to 1000, both time and space are efficient.

Test Cases

# Provided examples
assert Solution().countValidWords("cat and  dog") == 3  # simple words only
assert Solution().countValidWords("!this  1-s b8d!") == 0  # invalid tokens
assert Solution().countValidWords("alice and  bob are playing stone-game10") == 5  # mix of valid/invalid

# Edge cases
assert Solution().countValidWords("a-b.") == 1  # valid hyphen
assert Solution().countValidWords("-ab ab-") == 0  # invalid hyphen positions
assert Solution().countValidWords("hello, world!") == 2  # punctuation at the end
assert Solution().countValidWords("hello! world!") == 2  # multiple punctuation at end of different tokens
assert Solution().countValidWords("1 2 3") == 0  # only digits
assert Solution().countValidWords("! . ,") == 3  # single punctuation tokens
Test Why
"cat and dog" Normal sentence, all words valid
"!this 1-s b8d!" Invalid words due to digits or punctuation placement
"alice and bob are playing stone-game10" Mixed valid and invalid tokens
"a-b." Tests valid hyphen
"-ab ab-" Tests hyphen at start/end invalid
"hello, world!" Punctuation at the end of valid words
"1 2 3" Tokens with only digits
"! . ," Tokens with only punctuation are valid

Edge Cases

Multiple consecutive spaces: The sentence may have multiple spaces, creating empty tokens when splitting. Using split() in Python or strings.Fields() in Go automatically handles this by ignoring empty strings.

Hyphen at boundary: Tokens like "-ab" or "ab-" are invalid because the hyphen must be between letters. The implementation explicitly checks the character before and after the hyphen.

Single-character punctuation tokens: Tokens like "!" or "." are valid. The code allows punctuation to appear alone at the end of the token, which covers this case.

These edge cases cover the primary pitfalls that can cause naive solutions to fail.