LeetCode 3926 - Count Valid Word Occurrences

The problem asks us to count occurrences of specific words in a string that is formed by concatenating an array of smaller string "chunks".

LeetCode Problem 3926

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Counting

Solution

Problem Understanding

The problem asks us to count occurrences of specific words in a string that is formed by concatenating an array of smaller string "chunks". The twist is that words are defined in a non-standard way: they can include joiner hyphens, which are hyphens that appear between two lowercase English letters. Any other character, including spaces or non-joiner hyphens, acts as a separator.

The inputs are chunks, a list of strings that may contain letters, spaces, and hyphens, and queries, a list of words whose occurrences we need to count. Each query is guaranteed to be a valid word: it starts and ends with a lowercase letter and does not contain consecutive hyphens. The output is an array of integers corresponding to the count of each query word in the fully concatenated string s.

Key constraints indicate that the combined size of chunks and queries can be up to $10^5$ characters, so any solution that iterates inefficiently over all substrings will be too slow. Important edge cases include hyphens at the edges of strings, consecutive hyphens, and spaces.

Approaches

A brute-force approach would concatenate all chunks into a single string, split it using spaces or hyphens as separators, and then count exact matches for each query. The challenge here is correctly identifying which hyphens are joiner hyphens. This approach works correctly but may be inefficient because it requires parsing the entire string multiple times and storing all words.

The optimal approach is to scan the concatenated string once, building words on the fly and checking for joiner hyphens. We maintain a dictionary (hash map) that counts the frequency of each word. Once all words are counted, we simply look up each query in the dictionary to produce the result. This method avoids repeatedly scanning the string and ensures an efficient solution.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * m) O(n) Concatenate and split string, then count matches per query
Optimal O(n + q) O(n) Single scan to build word frequencies, then query in O(1) per query

Where $n$ is the total length of all chunks and $q$ is the number of queries.

Algorithm Walkthrough

  1. Concatenate all strings in chunks to form s.

  2. Initialize an empty dictionary word_count to keep track of word frequencies.

  3. Initialize an empty string current_word to accumulate letters and joiner hyphens.

  4. Iterate through each character in s:

  5. If the character is a lowercase letter, append it to current_word.

  6. If the character is a hyphen -, check if it is a joiner hyphen by verifying that both the previous and next characters are lowercase letters. If so, append it to current_word. Otherwise, finalize current_word as a complete word and reset it.

  7. For any other separator (space, invalid hyphen), finalize current_word as a complete word and reset it.

  8. After the iteration, ensure the last current_word (if non-empty) is added to word_count.

  9. For each query, retrieve its count from word_count (defaulting to 0 if not present) and append to the result array.

  10. Return the result array.

Why it works: The algorithm guarantees that all words are identified according to the problem definition. Each character is examined exactly once, and each word is counted correctly in a hash map, allowing efficient lookups for queries.

Python Solution

class Solution:
    def countWordOccurrences(self, chunks: list[str], queries: list[str]) -> list[int]:
        s = ''.join(chunks)
        word_count = {}
        n = len(s)
        i = 0
        
        while i < n:
            if s[i].islower():
                start = i
                while i < n:
                    if s[i].islower():
                        i += 1
                    elif s[i] == '-' and i > start and i + 1 < n and s[i-1].islower() and s[i+1].islower():
                        i += 1
                    else:
                        break
                word = s[start:i]
                word_count[word] = word_count.get(word, 0) + 1
            else:
                i += 1
        
        return [word_count.get(q, 0) for q in queries]

The implementation starts by concatenating chunks to simplify processing. A while loop scans each character, carefully handling joiner hyphens by checking adjacent characters. Each word found is counted in word_count. Finally, we generate the result array by looking up each query's frequency.

Go Solution

func countWordOccurrences(chunks []string, queries []string) []int {
    s := ""
    for _, chunk := range chunks {
        s += chunk
    }
    
    wordCount := make(map[string]int)
    n := len(s)
    i := 0
    
    for i < n {
        if s[i] >= 'a' && s[i] <= 'z' {
            start := i
            for i < n {
                if s[i] >= 'a' && s[i] <= 'z' {
                    i++
                } else if s[i] == '-' && i > start && i+1 < n && s[i-1] >= 'a' && s[i-1] <= 'z' && s[i+1] >= 'a' && s[i+1] <= 'z' {
                    i++
                } else {
                    break
                }
            }
            word := s[start:i]
            wordCount[word]++
        } else {
            i++
        }
    }
    
    result := make([]int, len(queries))
    for idx, q := range queries {
        result[idx] = wordCount[q]
    }
    
    return result
}

In Go, we handle strings as byte slices implicitly. The logic mirrors Python: a single pass counts all valid words, respecting joiner hyphens, and the final result array is built by looking up each query in the map.

Worked Examples

Example 1:

chunks = ["hello wor","ld hello"]s = "hello world hello"

Scanning produces words: "hello", "world", "hello"

Word count dictionary: {"hello": 2, "world": 1}

Queries: ["hello","world","wor"] → result [2, 1, 0]

Example 2:

chunks = ["a-b a--b ","a-","b"]s = "a-b a--b a-b"

Scanning produces words: "a-b", "a", "b", "a-b"

Word count dictionary: {"a-b": 2, "a": 1, "b": 1}

Queries: ["a-b","a","b"] → result [2, 1, 1]

Example 3:

chunks = ["-cat dog- mouse"]s = "-cat dog- mouse"

Scanning produces words: "cat", "dog", "mouse"

Word count dictionary: {"cat": 1, "dog": 1, "mouse": 1}

Queries: ["cat","dog","mouse","cat-dog"] → result [1, 1, 1, 0]

Complexity Analysis

Measure Complexity Explanation
Time O(n + q) Single pass through concatenated string of length n, plus O(1) lookup per query for q queries
Space O(n) Storing all words in a hash map may require space proportional to n

The algorithm efficiently handles large input sizes because each character is examined exactly once, and dictionary lookups are amortized O(1).

Test Cases

# Provided examples
assert Solution().countWordOccurrences(["hello wor","ld hello"], ["hello","world","wor"]) == [2,1,0]
assert Solution().countWordOccurrences(["a-b a--b ","a-","b"], ["a-b","a","b"]) == [2,1,1]
assert Solution().countWordOccurrences(["-cat dog- mouse"], ["cat","dog","mouse","cat-dog"]) == [1,1,1,0]

# Edge case: single chunk, single word
assert Solution().countWordOccurrences(["abc"], ["abc"]) == [1]

# Edge case: hyphens at edges
assert Solution().countWordOccurrences(["-a-b-"], ["a-b","a","b"]) == [1,0,0]

# Multiple joiner hyphens
assert Solution().countWordOccurrences(["x-y-z"], ["x-y-z","x-y","y-z","x","y","z"]) == [1,0,0,0,0,0]

# Spaces and multiple separators
assert Solution().countWordOccurrences(["a  b--c d-e "], ["a","b","c","d-e"]) == [1,1,1,1]
Test Why
Single chunk, single word Minimal input
Hyphens at edges Ensures non-joiner hyphens are correctly ignored
Multiple joiner hyphens