LeetCode 811 - Subdomain Visit Count

In this problem, we are given a list of strings where each string represents a visit count paired with a domain name. Each input entry has the form: For example: means the domain discuss.leetcode.com was visited 9001 times.

LeetCode Problem 811

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Counting

Solution

Problem Understanding

In this problem, we are given a list of strings where each string represents a visit count paired with a domain name. Each input entry has the form:

"count domain"

For example:

"9001 discuss.leetcode.com"

means the domain discuss.leetcode.com was visited 9001 times.

The important detail is that visiting a domain also counts as visiting all of its parent subdomains. For the domain:

discuss.leetcode.com

the following domains are considered visited:

discuss.leetcode.com
leetcode.com
com

Therefore, the visit count must be added to every subdomain generated from the original domain.

The task is to compute the total number of visits for every possible subdomain and return them in the same "count domain" format. The output order does not matter.

The input size is relatively small:

  • At most 100 domain entries
  • Each string length is at most 100 characters

These constraints tell us that performance requirements are modest. Even moderately inefficient string operations will still pass comfortably. However, we still want a clean and scalable solution.

A key observation is that each domain contains at most three parts because the constraints specify formats like:

d1.d2
d1.d2.d3

This means each input generates only a small number of subdomains.

There are several important edge cases to consider:

  • Multiple domains may contribute to the same subdomain, such as google.mail.com and intel.mail.com both contributing to mail.com
  • A domain may already be a top level domain like com
  • Counts can accumulate across many entries
  • Output order is arbitrary, so we should not rely on sorting

The problem guarantees valid formatting, so we do not need to handle malformed input.

Approaches

Brute Force Approach

A brute force solution would explicitly generate every possible subdomain for every input domain and then repeatedly scan the entire result set to merge counts.

For example, given:

"900 google.mail.com"

we generate:

google.mail.com
mail.com
com

Then, for each generated subdomain, we search through previously stored results to determine whether it already exists. If it does, we update its count. Otherwise, we insert a new entry.

This approach is correct because every valid subdomain is generated and every count is accumulated properly. However, repeated linear searches make it inefficient.

If there are N input domains and K total generated subdomains, repeatedly searching through existing results can lead to quadratic behavior.

Optimal Approach

The key insight is that this problem is fundamentally a counting problem.

Whenever we need to accumulate frequencies by key, a hash map is the natural data structure. Instead of repeatedly scanning existing results, we can directly update counts in constant average time.

For each input string:

  1. Split it into the count and domain.
  2. Generate every suffix subdomain.
  3. Add the count to that subdomain in a hash map.

For example:

google.mail.com

produces:

google.mail.com
mail.com
com

Each of these becomes a key in the hash map.

After processing all entries, we convert the hash map back into the required output format.

Because each domain has at most three components, the amount of work per input is very small.

Approach Time Complexity Space Complexity Notes
Brute Force O(N²) O(K) Repeated searches through accumulated results
Optimal O(N × L) O(K) Uses a hash map for direct frequency accumulation

Here:

  • N is the number of input domains
  • L is the number of domain segments
  • K is the number of unique subdomains

Since L is very small in this problem, the optimal solution is effectively linear.

Algorithm Walkthrough

  1. Create a hash map to store visit counts for each subdomain.

The key will be the subdomain string, and the value will be the accumulated visit count. 2. Iterate through every entry in cpdomains.

Each string contains a count and a domain separated by a space. 3. Split the string into two parts.

For example:

"900 google.mail.com"

becomes:

count = 900
domain = "google.mail.com"
  1. Split the domain by ".".

Example:

["google", "mail", "com"]
  1. Generate every valid subdomain.

Starting from each position, join the remaining components:

  • "google.mail.com"
  • "mail.com"
  • "com"
  1. Add the visit count to each generated subdomain in the hash map.

This automatically merges counts from different domains that share the same parent domain. 7. After processing all inputs, iterate through the hash map.

Convert each key-value pair into the required format:

"count domain"
  1. Return the resulting list.

Why it works

The algorithm works because every visit to a domain necessarily implies a visit to all suffix subdomains. By explicitly generating all suffixes and adding the visit count to each one, we ensure every implied visit is counted exactly once.

The hash map guarantees that counts from multiple domains accumulate correctly for shared subdomains.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def subdomainVisits(self, cpdomains: List[str]) -> List[str]:
        visit_counts = defaultdict(int)

        for entry in cpdomains:
            count_str, domain = entry.split()
            count = int(count_str)

            parts = domain.split(".")

            for i in range(len(parts)):
                subdomain = ".".join(parts[i:])
                visit_counts[subdomain] += count

        result = []

        for domain, count in visit_counts.items():
            result.append(f"{count} {domain}")

        return result

The implementation begins by creating a defaultdict(int) so that missing keys automatically start at zero. This simplifies the counting logic because we can directly increment counts without checking whether the key already exists.

For each input string, we split it into the numeric count and the domain name. The count is converted into an integer for arithmetic operations.

Next, we split the domain into its components using ".". By iterating over every starting index, we generate all suffix subdomains using ".".join(parts[i:]).

Each generated subdomain receives the current visit count in the hash map.

Finally, we iterate through the hash map and construct the required output strings in "count domain" format.

Go Solution

package main

import (
	"fmt"
	"strconv"
	"strings"
)

func subdomainVisits(cpdomains []string) []string {
	visitCounts := make(map[string]int)

	for _, entry := range cpdomains {
		parts := strings.Split(entry, " ")

		count, _ := strconv.Atoi(parts[0])
		domain := parts[1]

		domainParts := strings.Split(domain, ".")

		for i := 0; i < len(domainParts); i++ {
			subdomain := strings.Join(domainParts[i:], ".")
			visitCounts[subdomain] += count
		}
	}

	result := []string{}

	for domain, count := range visitCounts {
		result = append(result, fmt.Sprintf("%d %s", count, domain))
	}

	return result
}

The Go implementation follows the same logic as the Python version.

A map[string]int is used instead of a Python dictionary. String splitting is handled with strings.Split, and integer conversion uses strconv.Atoi.

Unlike Python's defaultdict, Go maps return the zero value for missing keys, so incrementing nonexistent entries works naturally.

The output slice is dynamically built using append.

Worked Examples

Example 1

Input:

["9001 discuss.leetcode.com"]

Step 1

Process:

"9001 discuss.leetcode.com"

Extract:

Variable Value
count 9001
domain discuss.leetcode.com

Step 2

Split domain:

Index Part
0 discuss
1 leetcode
2 com

Step 3

Generate subdomains and update counts.

Generated Subdomain Added Count Hash Map State
discuss.leetcode.com 9001 {"discuss.leetcode.com": 9001}
leetcode.com 9001 {"discuss.leetcode.com": 9001, "leetcode.com": 9001}
com 9001 {"discuss.leetcode.com": 9001, "leetcode.com": 9001, "com": 9001}

Final Output

[
    "9001 discuss.leetcode.com",
    "9001 leetcode.com",
    "9001 com"
]

Example 2

Input:

["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]

Processing "900 google.mail.com"

Subdomain Updated Count
google.mail.com 900
mail.com 900
com 900

Processing "50 yahoo.com"

Subdomain Updated Count
yahoo.com 50
com 950

Processing "1 intel.mail.com"

Subdomain Updated Count
intel.mail.com 1
mail.com 901
com 951

Processing "5 wiki.org"

Subdomain Updated Count
wiki.org 5
org 5

Final Hash Map

Domain Count
google.mail.com 900
mail.com 901
com 951
yahoo.com 50
intel.mail.com 1
wiki.org 5
org 5

Final Output

[
    "901 mail.com",
    "50 yahoo.com",
    "900 google.mail.com",
    "5 wiki.org",
    "5 org",
    "1 intel.mail.com",
    "951 com"
]

Complexity Analysis

Measure Complexity Explanation
Time O(N × L) Each domain generates at most L subdomains
Space O(K) Hash map stores all unique subdomains

Here:

  • N is the number of input domains
  • L is the number of domain segments
  • K is the number of unique subdomains

Because each domain contains at most three segments, L is extremely small. This makes the solution effectively linear in practice.

Test Cases

from collections import Counter

solution = Solution()

# Example 1
result = solution.subdomainVisits(["9001 discuss.leetcode.com"])
assert Counter(result) == Counter([
    "9001 discuss.leetcode.com",
    "9001 leetcode.com",
    "9001 com"
])

# Example 2
result = solution.subdomainVisits([
    "900 google.mail.com",
    "50 yahoo.com",
    "1 intel.mail.com",
    "5 wiki.org"
])
assert Counter(result) == Counter([
    "901 mail.com",
    "50 yahoo.com",
    "900 google.mail.com",
    "5 wiki.org",
    "5 org",
    "1 intel.mail.com",
    "951 com"
])

# Single top-level domain
result = solution.subdomainVisits(["10 com"])
assert Counter(result) == Counter([
    "10 com"
])  # Tests already minimal domain

# Multiple entries contributing to same subdomain
result = solution.subdomainVisits([
    "5 a.b.com",
    "10 b.com"
])
assert Counter(result) == Counter([
    "5 a.b.com",
    "15 b.com",
    "15 com"
])  # Tests count aggregation

# Duplicate exact domains
result = solution.subdomainVisits([
    "3 x.y.com",
    "7 x.y.com"
])
assert Counter(result) == Counter([
    "10 x.y.com",
    "10 y.com",
    "10 com"
])  # Tests repeated identical domains

# Different top-level domains
result = solution.subdomainVisits([
    "2 test.org",
    "3 test.com"
])
assert Counter(result) == Counter([
    "2 test.org",
    "2 org",
    "3 test.com",
    "3 com"
])  # Tests separation of independent domains

# Minimum count value
result = solution.subdomainVisits(["1 a.com"])
assert Counter(result) == Counter([
    "1 a.com",
    "1 com"
])  # Tests smallest valid count
Test Why
Single example domain Verifies basic subdomain generation
Multiple mixed domains Verifies accumulation across shared parents
Top-level domain only Ensures minimal domains work correctly
Shared parent domains Confirms counts merge properly
Duplicate exact domains Validates repeated inputs accumulate
Different TLDs Ensures independent domains stay separate
Minimum count Tests lower constraint boundary

Edge Cases

Multiple domains sharing the same parent

One common source of bugs is failing to correctly accumulate counts from different domains that share a parent subdomain.

For example:

"900 google.mail.com"
"1 intel.mail.com"

Both contribute to mail.com and com.

A naive implementation might overwrite counts instead of adding them. Using a hash map with incremental updates ensures counts are accumulated correctly.

Domains with only one segment

A domain like:

"10 com"

already represents the top level domain.

Some implementations incorrectly assume every domain contains dots and may attempt invalid slicing operations. This implementation handles the case naturally because splitting "com" produces a single-element list, and the loop generates exactly one subdomain.

Duplicate exact domain entries

The same domain may appear multiple times in the input:

"3 x.y.com"
"7 x.y.com"

The algorithm must combine both occurrences into a single total.

Because every generated subdomain is stored in a hash map and updated with +=, repeated domains accumulate correctly without any additional handling.