LeetCode 1241 - Number of Comments per Post

This problem asks us to analyze a database table named Submissions and determine how many unique comments belong to each post.

LeetCode Problem 1241

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze a database table named Submissions and determine how many unique comments belong to each post.

The table contains two columns:

Column Meaning
sub_id The ID of the current submission
parent_id The parent submission ID

A row represents either a post or a comment:

  • If parent_id is NULL, the row is a post.
  • If parent_id is not NULL, the row is a comment whose parent is another submission.

The goal is to produce a result table containing:

Column Meaning
post_id The ID of the post
number_of_comments Number of unique comments belonging to that post

Several details make this problem slightly trickier than a simple count.

First, the table may contain duplicate rows. If the same post appears multiple times, we still treat it as a single post. Similarly, if the same comment appears multiple times, it should only be counted once.

Second, not every comment should be counted. A comment only counts if its parent_id refers to an actual post that exists in the table. If a comment references a deleted or nonexistent post, we ignore it completely.

The output must be sorted by post_id in ascending order.

The key observations are:

  • Posts are rows where parent_id IS NULL
  • Comments are rows where parent_id IS NOT NULL
  • Duplicate comments must be removed before counting
  • Duplicate posts must also be treated as one post
  • Comments pointing to nonexistent posts must be ignored

The input size is not explicitly given, but since this is a database problem categorized as Easy, we should still aim for an efficient query using grouping and distinct counting rather than expensive repeated scans.

Several edge cases are important:

  • A post may have zero comments
  • The same comment may appear multiple times
  • The same post may appear multiple times
  • Comments may reference deleted posts
  • Multiple comments may belong to the same post
  • The table may contain only posts or only comments

A naive implementation that simply counts rows grouped by parent_id would incorrectly count duplicates and deleted-post references.

Approaches

Brute Force Approach

A brute force solution would first identify every distinct post. Then, for each post, we would scan the entire table and count how many comments reference that post.

Conceptually, the algorithm works like this:

  1. Extract all unique posts.
  2. For every post:
  • Iterate through the entire table.
  • Check whether each row is a comment belonging to the current post.
  • Store unique comment IDs in a temporary set.
  1. Output the final counts.

This approach is correct because every post explicitly checks all possible comments. Duplicate comments can be removed using a temporary set.

However, this method is inefficient because the table is scanned repeatedly for every post. If there are P posts and N total rows, the complexity becomes approximately O(P × N).

Optimal Approach

The better solution uses SQL grouping and aggregation.

The key insight is that SQL is designed to efficiently aggregate related rows. Instead of scanning the table once per post, we can:

  1. Extract all distinct posts.
  2. Join comments to those posts using parent_id.
  3. Count distinct comment IDs for each post.

Using COUNT(DISTINCT sub_id) automatically removes duplicate comments.

Using a LEFT JOIN ensures that posts with zero comments still appear in the result.

This transforms the problem into a single grouped aggregation query, which is much more efficient and idiomatic in SQL.

Approach Time Complexity Space Complexity Notes
Brute Force O(P × N) O(C) Repeatedly scans the table for each post
Optimal O(N log N) approximately O(N) Uses grouping and distinct aggregation efficiently

Algorithm Walkthrough

  1. First, identify all unique posts by selecting rows where parent_id IS NULL.

These rows represent valid posts. Since duplicate posts may exist, we use DISTINCT to ensure each post appears only once. 2. Next, identify all comments.

Comments are rows where parent_id IS NOT NULL. The parent_id tells us which post the comment belongs to. 3. Perform a LEFT JOIN between posts and comments.

We match:

  • posts.sub_id
  • comments.parent_id

This associates each post with all comments that reference it. 4. Use COUNT(DISTINCT comments.sub_id).

Duplicate comments may appear multiple times in the table. Using DISTINCT ensures each comment ID is counted only once. 5. Group by post ID.

This produces one row per post with its total unique comment count. 6. Sort the result by post_id in ascending order.

Why it works

The solution works because every valid post is explicitly extracted first, and comments are only counted if they successfully join to one of those posts. Duplicate posts are removed using DISTINCT, duplicate comments are removed using COUNT(DISTINCT ...), and posts without comments are preserved through the LEFT JOIN.

Python Solution

Although this is fundamentally a SQL problem, LeetCode database problems are often explained algorithmically. The following Python implementation simulates the same logic.

from typing import List, Optional
from collections import defaultdict

class Solution:
    def numberOfComments(self, submissions: List[List[Optional[int]]]) -> List[List[int]]:
        posts = set()
        
        for sub_id, parent_id in submissions:
            if parent_id is None:
                posts.add(sub_id)

        comments_per_post = defaultdict(set)

        for sub_id, parent_id in submissions:
            if parent_id is not None and parent_id in posts:
                comments_per_post[parent_id].add(sub_id)

        result = []

        for post_id in sorted(posts):
            result.append([post_id, len(comments_per_post[post_id])])

        return result

The implementation begins by collecting all valid posts into a set. Using a set automatically removes duplicate posts.

Next, we create a dictionary where each post maps to a set of unique comment IDs. A set is important because duplicate comments may appear multiple times in the input.

We then iterate through all rows again. If a row represents a comment and its parent exists as a valid post, we insert the comment ID into that post's comment set.

Finally, we iterate through the sorted posts and compute the size of each comment set.

This directly mirrors the SQL logic:

  • posts corresponds to distinct posts
  • comments_per_post corresponds to grouped comments
  • set() corresponds to DISTINCT
  • sorting corresponds to ORDER BY

Go Solution

package main

import (
	"sort"
)

func numberOfComments(submissions [][]*int) [][]int {
	posts := make(map[int]bool)

	for _, row := range submissions {
		subID := *row[0]
		parentID := row[1]

		if parentID == nil {
			posts[subID] = true
		}
	}

	commentsPerPost := make(map[int]map[int]bool)

	for _, row := range submissions {
		subID := *row[0]
		parentID := row[1]

		if parentID != nil {
			parent := *parentID

			if posts[parent] {
				if commentsPerPost[parent] == nil {
					commentsPerPost[parent] = make(map[int]bool)
				}

				commentsPerPost[parent][subID] = true
			}
		}
	}

	postIDs := make([]int, 0, len(posts))

	for postID := range posts {
		postIDs = append(postIDs, postID)
	}

	sort.Ints(postIDs)

	result := [][]int{}

	for _, postID := range postIDs {
		count := 0

		if commentsPerPost[postID] != nil {
			count = len(commentsPerPost[postID])
		}

		result = append(result, []int{postID, count})
	}

	return result
}

The Go version follows the same logic as the Python solution, but there are a few language-specific differences.

Since Go does not have a built-in set type, maps with boolean values are used to simulate sets.

The input uses *int pointers so that nil can represent SQL NULL.

Go also requires explicit sorting using sort.Ints() before generating the final result.

Worked Examples

Example 1

Input:

sub_id parent_id
1 NULL
2 NULL
1 NULL
12 NULL
3 1
5 2
3 1
4 1
9 1
10 2
6 7

Step 1, Extract Posts

We collect all rows where parent_id IS NULL.

Row Action Posts Set
(1, NULL) Add post 1 {1}
(2, NULL) Add post 2 {1,2}
(1, NULL) Duplicate {1,2}
(12, NULL) Add post 12 {1,2,12}

Final posts:

{1, 2, 12}

Step 2, Process Comments

We now process comment rows.

Comment Valid Parent? Action
(3,1) Yes Add comment 3 to post 1
(5,2) Yes Add comment 5 to post 2
(3,1) Yes Duplicate, ignored by set
(4,1) Yes Add comment 4 to post 1
(9,1) Yes Add comment 9 to post 1
(10,2) Yes Add comment 10 to post 2
(6,7) No Ignore

State of comment sets:

Post Comment Set
1 {3,4,9}
2 {5,10}
12 {}

Step 3, Generate Result

post_id number_of_comments
1 3
2 2
12 0

Complexity Analysis

Measure Complexity Explanation
Time O(N log N) We scan the table twice and sort the posts
Space O(N) Sets and maps may store all posts and comments

The algorithm performs two linear scans over the input data. The only non-linear operation is sorting the distinct post IDs before returning the result. If the database already guarantees ordered output through indexing, the effective runtime may approach linear time.

Test Cases

from collections import defaultdict

class Solution:
    def numberOfComments(self, submissions):
        posts = set()

        for sub_id, parent_id in submissions:
            if parent_id is None:
                posts.add(sub_id)

        comments_per_post = defaultdict(set)

        for sub_id, parent_id in submissions:
            if parent_id is not None and parent_id in posts:
                comments_per_post[parent_id].add(sub_id)

        result = []

        for post_id in sorted(posts):
            result.append([post_id, len(comments_per_post[post_id])])

        return result

sol = Solution()

# Example from problem statement
assert sol.numberOfComments([
    [1, None],
    [2, None],
    [1, None],
    [12, None],
    [3, 1],
    [5, 2],
    [3, 1],
    [4, 1],
    [9, 1],
    [10, 2],
    [6, 7]
]) == [
    [1, 3],
    [2, 2],
    [12, 0]
]  # duplicate posts and comments

# Single post with no comments
assert sol.numberOfComments([
    [1, None]
]) == [
    [1, 0]
]  # zero-comment post

# Duplicate comments should count once
assert sol.numberOfComments([
    [1, None],
    [2, 1],
    [2, 1],
    [2, 1]
]) == [
    [1, 1]
]  # duplicate comments

# Comments referencing deleted posts ignored
assert sol.numberOfComments([
    [5, 10],
    [6, 10]
]) == []  # no valid posts

# Multiple posts with mixed comments
assert sol.numberOfComments([
    [1, None],
    [2, None],
    [3, 1],
    [4, 1],
    [5, 2]
]) == [
    [1, 2],
    [2, 1]
]  # normal multi-post case

# Duplicate posts should appear once
assert sol.numberOfComments([
    [1, None],
    [1, None],
    [1, None]
]) == [
    [1, 0]
]  # duplicate post rows

# Comments on comments should not count unless parent is a post
assert sol.numberOfComments([
    [1, None],
    [2, 1],
    [3, 2]
]) == [
    [1, 1]
]  # nested comment ignored
Test Why
Problem statement example Verifies duplicates and deleted posts
Single post with no comments Ensures zero counts are included
Duplicate comments Confirms uniqueness handling
Deleted-post references Ensures invalid comments ignored
Multiple posts Validates grouping behavior
Duplicate posts Ensures posts are deduplicated
Nested comments Ensures only direct post comments counted

Edge Cases

One important edge case occurs when the table contains duplicate comments. A naive counting solution using COUNT(*) would incorrectly inflate the number of comments. This implementation avoids that problem by storing comment IDs inside a set, which automatically removes duplicates.

Another important edge case is comments referencing deleted or nonexistent posts. For example, a row like (6, 7) should not count if post 7 does not exist. The implementation explicitly checks whether the parent exists in the set of valid posts before counting the comment.

A third edge case is posts with zero comments. Many incorrect solutions accidentally exclude such posts because they use an inner join instead of a left join. This implementation correctly includes every valid post and assigns a count of zero when no comments exist.

A fourth subtle case involves duplicate post rows. Since the table may contain the same post multiple times, failing to deduplicate posts would produce repeated output rows. Using a set for posts guarantees each post appears exactly once in the final result.