LeetCode 1241 - Number of Comments per Post
This problem asks us to analyze a database table named Submissions and determine how many unique comments belong to each post.
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze a database table named Submissions and determine how many unique comments belong to each post.
The table contains two columns:
| Column | Meaning |
|---|---|
sub_id |
The ID of the current submission |
parent_id |
The parent submission ID |
A row represents either a post or a comment:
- If
parent_idisNULL, the row is a post. - If
parent_idis notNULL, the row is a comment whose parent is another submission.
The goal is to produce a result table containing:
| Column | Meaning |
|---|---|
post_id |
The ID of the post |
number_of_comments |
Number of unique comments belonging to that post |
Several details make this problem slightly trickier than a simple count.
First, the table may contain duplicate rows. If the same post appears multiple times, we still treat it as a single post. Similarly, if the same comment appears multiple times, it should only be counted once.
Second, not every comment should be counted. A comment only counts if its parent_id refers to an actual post that exists in the table. If a comment references a deleted or nonexistent post, we ignore it completely.
The output must be sorted by post_id in ascending order.
The key observations are:
- Posts are rows where
parent_id IS NULL - Comments are rows where
parent_id IS NOT NULL - Duplicate comments must be removed before counting
- Duplicate posts must also be treated as one post
- Comments pointing to nonexistent posts must be ignored
The input size is not explicitly given, but since this is a database problem categorized as Easy, we should still aim for an efficient query using grouping and distinct counting rather than expensive repeated scans.
Several edge cases are important:
- A post may have zero comments
- The same comment may appear multiple times
- The same post may appear multiple times
- Comments may reference deleted posts
- Multiple comments may belong to the same post
- The table may contain only posts or only comments
A naive implementation that simply counts rows grouped by parent_id would incorrectly count duplicates and deleted-post references.
Approaches
Brute Force Approach
A brute force solution would first identify every distinct post. Then, for each post, we would scan the entire table and count how many comments reference that post.
Conceptually, the algorithm works like this:
- Extract all unique posts.
- For every post:
- Iterate through the entire table.
- Check whether each row is a comment belonging to the current post.
- Store unique comment IDs in a temporary set.
- Output the final counts.
This approach is correct because every post explicitly checks all possible comments. Duplicate comments can be removed using a temporary set.
However, this method is inefficient because the table is scanned repeatedly for every post. If there are P posts and N total rows, the complexity becomes approximately O(P × N).
Optimal Approach
The better solution uses SQL grouping and aggregation.
The key insight is that SQL is designed to efficiently aggregate related rows. Instead of scanning the table once per post, we can:
- Extract all distinct posts.
- Join comments to those posts using
parent_id. - Count distinct comment IDs for each post.
Using COUNT(DISTINCT sub_id) automatically removes duplicate comments.
Using a LEFT JOIN ensures that posts with zero comments still appear in the result.
This transforms the problem into a single grouped aggregation query, which is much more efficient and idiomatic in SQL.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(P × N) | O(C) | Repeatedly scans the table for each post |
| Optimal | O(N log N) approximately | O(N) | Uses grouping and distinct aggregation efficiently |
Algorithm Walkthrough
- First, identify all unique posts by selecting rows where
parent_id IS NULL.
These rows represent valid posts. Since duplicate posts may exist, we use DISTINCT to ensure each post appears only once.
2. Next, identify all comments.
Comments are rows where parent_id IS NOT NULL. The parent_id tells us which post the comment belongs to.
3. Perform a LEFT JOIN between posts and comments.
We match:
posts.sub_idcomments.parent_id
This associates each post with all comments that reference it.
4. Use COUNT(DISTINCT comments.sub_id).
Duplicate comments may appear multiple times in the table. Using DISTINCT ensures each comment ID is counted only once.
5. Group by post ID.
This produces one row per post with its total unique comment count.
6. Sort the result by post_id in ascending order.
Why it works
The solution works because every valid post is explicitly extracted first, and comments are only counted if they successfully join to one of those posts. Duplicate posts are removed using DISTINCT, duplicate comments are removed using COUNT(DISTINCT ...), and posts without comments are preserved through the LEFT JOIN.
Python Solution
Although this is fundamentally a SQL problem, LeetCode database problems are often explained algorithmically. The following Python implementation simulates the same logic.
from typing import List, Optional
from collections import defaultdict
class Solution:
def numberOfComments(self, submissions: List[List[Optional[int]]]) -> List[List[int]]:
posts = set()
for sub_id, parent_id in submissions:
if parent_id is None:
posts.add(sub_id)
comments_per_post = defaultdict(set)
for sub_id, parent_id in submissions:
if parent_id is not None and parent_id in posts:
comments_per_post[parent_id].add(sub_id)
result = []
for post_id in sorted(posts):
result.append([post_id, len(comments_per_post[post_id])])
return result
The implementation begins by collecting all valid posts into a set. Using a set automatically removes duplicate posts.
Next, we create a dictionary where each post maps to a set of unique comment IDs. A set is important because duplicate comments may appear multiple times in the input.
We then iterate through all rows again. If a row represents a comment and its parent exists as a valid post, we insert the comment ID into that post's comment set.
Finally, we iterate through the sorted posts and compute the size of each comment set.
This directly mirrors the SQL logic:
postscorresponds to distinct postscomments_per_postcorresponds to grouped commentsset()corresponds toDISTINCT- sorting corresponds to
ORDER BY
Go Solution
package main
import (
"sort"
)
func numberOfComments(submissions [][]*int) [][]int {
posts := make(map[int]bool)
for _, row := range submissions {
subID := *row[0]
parentID := row[1]
if parentID == nil {
posts[subID] = true
}
}
commentsPerPost := make(map[int]map[int]bool)
for _, row := range submissions {
subID := *row[0]
parentID := row[1]
if parentID != nil {
parent := *parentID
if posts[parent] {
if commentsPerPost[parent] == nil {
commentsPerPost[parent] = make(map[int]bool)
}
commentsPerPost[parent][subID] = true
}
}
}
postIDs := make([]int, 0, len(posts))
for postID := range posts {
postIDs = append(postIDs, postID)
}
sort.Ints(postIDs)
result := [][]int{}
for _, postID := range postIDs {
count := 0
if commentsPerPost[postID] != nil {
count = len(commentsPerPost[postID])
}
result = append(result, []int{postID, count})
}
return result
}
The Go version follows the same logic as the Python solution, but there are a few language-specific differences.
Since Go does not have a built-in set type, maps with boolean values are used to simulate sets.
The input uses *int pointers so that nil can represent SQL NULL.
Go also requires explicit sorting using sort.Ints() before generating the final result.
Worked Examples
Example 1
Input:
| sub_id | parent_id |
|---|---|
| 1 | NULL |
| 2 | NULL |
| 1 | NULL |
| 12 | NULL |
| 3 | 1 |
| 5 | 2 |
| 3 | 1 |
| 4 | 1 |
| 9 | 1 |
| 10 | 2 |
| 6 | 7 |
Step 1, Extract Posts
We collect all rows where parent_id IS NULL.
| Row | Action | Posts Set |
|---|---|---|
| (1, NULL) | Add post 1 | {1} |
| (2, NULL) | Add post 2 | {1,2} |
| (1, NULL) | Duplicate | {1,2} |
| (12, NULL) | Add post 12 | {1,2,12} |
Final posts:
{1, 2, 12}
Step 2, Process Comments
We now process comment rows.
| Comment | Valid Parent? | Action |
|---|---|---|
| (3,1) | Yes | Add comment 3 to post 1 |
| (5,2) | Yes | Add comment 5 to post 2 |
| (3,1) | Yes | Duplicate, ignored by set |
| (4,1) | Yes | Add comment 4 to post 1 |
| (9,1) | Yes | Add comment 9 to post 1 |
| (10,2) | Yes | Add comment 10 to post 2 |
| (6,7) | No | Ignore |
State of comment sets:
| Post | Comment Set |
|---|---|
| 1 | {3,4,9} |
| 2 | {5,10} |
| 12 | {} |
Step 3, Generate Result
| post_id | number_of_comments |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 12 | 0 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N log N) | We scan the table twice and sort the posts |
| Space | O(N) | Sets and maps may store all posts and comments |
The algorithm performs two linear scans over the input data. The only non-linear operation is sorting the distinct post IDs before returning the result. If the database already guarantees ordered output through indexing, the effective runtime may approach linear time.
Test Cases
from collections import defaultdict
class Solution:
def numberOfComments(self, submissions):
posts = set()
for sub_id, parent_id in submissions:
if parent_id is None:
posts.add(sub_id)
comments_per_post = defaultdict(set)
for sub_id, parent_id in submissions:
if parent_id is not None and parent_id in posts:
comments_per_post[parent_id].add(sub_id)
result = []
for post_id in sorted(posts):
result.append([post_id, len(comments_per_post[post_id])])
return result
sol = Solution()
# Example from problem statement
assert sol.numberOfComments([
[1, None],
[2, None],
[1, None],
[12, None],
[3, 1],
[5, 2],
[3, 1],
[4, 1],
[9, 1],
[10, 2],
[6, 7]
]) == [
[1, 3],
[2, 2],
[12, 0]
] # duplicate posts and comments
# Single post with no comments
assert sol.numberOfComments([
[1, None]
]) == [
[1, 0]
] # zero-comment post
# Duplicate comments should count once
assert sol.numberOfComments([
[1, None],
[2, 1],
[2, 1],
[2, 1]
]) == [
[1, 1]
] # duplicate comments
# Comments referencing deleted posts ignored
assert sol.numberOfComments([
[5, 10],
[6, 10]
]) == [] # no valid posts
# Multiple posts with mixed comments
assert sol.numberOfComments([
[1, None],
[2, None],
[3, 1],
[4, 1],
[5, 2]
]) == [
[1, 2],
[2, 1]
] # normal multi-post case
# Duplicate posts should appear once
assert sol.numberOfComments([
[1, None],
[1, None],
[1, None]
]) == [
[1, 0]
] # duplicate post rows
# Comments on comments should not count unless parent is a post
assert sol.numberOfComments([
[1, None],
[2, 1],
[3, 2]
]) == [
[1, 1]
] # nested comment ignored
| Test | Why |
|---|---|
| Problem statement example | Verifies duplicates and deleted posts |
| Single post with no comments | Ensures zero counts are included |
| Duplicate comments | Confirms uniqueness handling |
| Deleted-post references | Ensures invalid comments ignored |
| Multiple posts | Validates grouping behavior |
| Duplicate posts | Ensures posts are deduplicated |
| Nested comments | Ensures only direct post comments counted |
Edge Cases
One important edge case occurs when the table contains duplicate comments. A naive counting solution using COUNT(*) would incorrectly inflate the number of comments. This implementation avoids that problem by storing comment IDs inside a set, which automatically removes duplicates.
Another important edge case is comments referencing deleted or nonexistent posts. For example, a row like (6, 7) should not count if post 7 does not exist. The implementation explicitly checks whether the parent exists in the set of valid posts before counting the comment.
A third edge case is posts with zero comments. Many incorrect solutions accidentally exclude such posts because they use an inner join instead of a left join. This implementation correctly includes every valid post and assigns a count of zero when no comments exist.
A fourth subtle case involves duplicate post rows. Since the table may contain the same post multiple times, failing to deduplicate posts would produce repeated output rows. Using a set for posts guarantees each post appears exactly once in the final result.