LeetCode 1050 - Actors and Directors Who Cooperated At Least Three Times

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem asks us to identify every (actor_id, director_id) pair where an actor and a director have collaborated at least three times.

The input is a database table named ActorDirector. Each row represents one collaboration event between an actor and a director. The table contains three columns:

actor_id, which uniquely identifies an actor
director_id, which uniquely identifies a director
timestamp, which uniquely identifies each collaboration record

The important detail is that timestamp is the primary key, meaning every row is unique. Even if the same actor and director work together multiple times, each collaboration is stored as a separate row with a distinct timestamp.

The goal is to return all actor and director pairs that appear together at least three times in the table. In other words, we need to count how many records exist for every (actor_id, director_id) combination and keep only those whose count is greater than or equal to 3.

For example, if actor 1 worked with director 1 three times, that pair should appear in the output. If actor 1 worked with director 2 only twice, that pair should not appear.

Since this is a database problem, the intended solution is SQL based. The scale implied by the problem is straightforward because counting grouped records is an operation databases are optimized for. The key observation is that we do not care about the individual timestamps beyond the fact that each row represents one cooperation. We only care about the frequency of each actor and director combination.

Several edge cases are worth considering upfront. Some actor and director pairs may appear exactly three times, and these must be included. Some may appear fewer than three times, and these must be excluded. Multiple qualifying pairs may exist, and the result can be returned in any order. The problem also guarantees that timestamps are unique, so duplicate records caused by repeated primary keys are impossible.

Approaches

Brute Force Approach

A brute force solution would manually count collaborations for every possible actor and director pair.

One way to think about this is to first collect all distinct (actor_id, director_id) combinations. Then, for each pair, scan the entire table again and count how many rows match that actor and director. If the count reaches at least three, add the pair to the result.

This approach works because every pair is explicitly checked against every collaboration record, guaranteeing an accurate count. However, it is inefficient because the table may be scanned repeatedly. If there are n rows and many unique pairs, this repeated scanning becomes unnecessarily expensive.

In database systems, repeatedly iterating over the same table is inefficient compared to aggregation operations that databases are specifically designed to optimize.

Optimal Approach

The key insight is that this problem is fundamentally a grouping and counting problem.

We want to know how many times each (actor_id, director_id) pair appears. SQL provides a direct mechanism for this using GROUP BY. Once the rows are grouped by actor and director, we can count how many rows belong to each group using COUNT(*).

After computing these counts, we only keep groups whose count is at least 3. SQL provides the HAVING clause specifically for filtering grouped results.

This approach is efficient because the database performs aggregation in a single pass over the relevant grouped data, avoiding repeated scans.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(1)	Repeatedly scans the table for every pair
Optimal	O(n)	O(n)	Uses `GROUP BY` and `HAVING` to count collaborations efficiently

Algorithm Walkthrough

Group the table rows by (actor_id, director_id).

We treat each actor and director combination as a unique group because we want to count collaborations for every specific pair independently. 2. Count the number of rows in each group.

Since every row represents one collaboration, COUNT(*) gives the total number of times an actor and director worked together. 3. Filter groups whose count is at least 3.

The HAVING clause is used instead of WHERE because filtering happens after aggregation. We only want groups where the collaboration count is greater than or equal to three. 4. Return the actor_id and director_id columns.

The problem only asks for qualifying pairs, so the collaboration count itself does not need to be included in the output.

Why it works

The correctness comes from the fact that grouping by (actor_id, director_id) ensures all collaborations for a specific pair are collected together. Since each row corresponds to exactly one cooperation event, counting rows inside each group gives the exact number of collaborations for that pair. Filtering with HAVING COUNT(*) >= 3 guarantees that only valid pairs remain in the result.

Python Solution

Although this is a database problem and does not use Python on LeetCode, the following demonstrates the equivalent logic programmatically.

from collections import defaultdict
from typing import List

class Solution:
    def actorsAndDirectors(self, actorDirector: List[List[int]]) -> List[List[int]]:
        collaboration_count = defaultdict(int)

        for actor_id, director_id, _ in actorDirector:
            collaboration_count[(actor_id, director_id)] += 1

        result = []

        for (actor_id, director_id), count in collaboration_count.items():
            if count >= 3:
                result.append([actor_id, director_id])

        return result

The implementation uses a hash map, implemented through defaultdict(int), to count collaborations efficiently. Each key is a tuple (actor_id, director_id) and the corresponding value stores how many times that pair appears.

We iterate through every collaboration record once and increment the count for the relevant pair. The timestamp value is ignored because it is not needed for counting.

After all records are processed, we iterate through the dictionary and collect every pair whose count is at least 3. The final result contains exactly the qualifying actor and director combinations.

SQL Solution

This is the actual LeetCode submission solution:

SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(*) >= 3;

The query groups rows by actor and director, counts collaborations for each pair, and filters to keep only those with at least three cooperations.

Go Solution

Although LeetCode expects SQL for this problem, the following Go implementation mirrors the same counting logic.

package main

type pair struct {
	actorID    int
	directorID int
}

func actorsAndDirectors(actorDirector [][]int) [][]int {
	collaborationCount := make(map[pair]int)

	for _, record := range actorDirector {
		actorID := record[0]
		directorID := record[1]

		key := pair{
			actorID:    actorID,
			directorID: directorID,
		}

		collaborationCount[key]++
	}

	result := [][]int{}

	for key, count := range collaborationCount {
		if count >= 3 {
			result = append(result, []int{key.actorID, key.directorID})
		}
	}

	return result
}

The Go implementation uses a custom pair struct as the map key because Go maps cannot directly use slices as keys. The map tracks how many collaborations each actor and director pair has.

Unlike Python, Go requires explicit struct definitions and typed map declarations. Since the problem only involves counting integers, there are no concerns about integer overflow under normal constraints.

Worked Examples

Example 1

Input table:

actor_id	director_id	timestamp
1	1	0
1	1	1
1	1	2
1	2	3
1	2	4
2	1	5
2	1	6

We process each row and maintain collaboration counts.

Step	Pair	Count State
1	(1,1)	{(1,1): 1}
2	(1,1)	{(1,1): 2}
3	(1,1)	{(1,1): 3}
4	(1,2)	{(1,1): 3, (1,2): 1}
5	(1,2)	{(1,1): 3, (1,2): 2}
6	(2,1)	{(1,1): 3, (1,2): 2, (2,1): 1}
7	(2,1)	{(1,1): 3, (1,2): 2, (2,1): 2}

After counting completes:

(1,1) has count 3, include it
(1,2) has count 2, exclude it
(2,1) has count 2, exclude it

Final output:

actor_id	director_id
1	1

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each row is processed exactly once
Space	O(n)	A hash map stores counts for distinct actor and director pairs

The time complexity is linear because we only traverse the input once to build collaboration counts, followed by a pass over unique pairs. The space complexity is linear in the worst case, where every row belongs to a unique (actor_id, director_id) pair.

For the SQL solution, the exact complexity depends on the database implementation and indexing strategy, but conceptually it behaves like a grouping operation over all rows.

Test Cases

def test_solution():
    solution = Solution()

    # Provided example
    assert sorted(solution.actorsAndDirectors([
        [1, 1, 0],
        [1, 1, 1],
        [1, 1, 2],
        [1, 2, 3],
        [1, 2, 4],
        [2, 1, 5],
        [2, 1, 6]
    ])) == [[1, 1]]  # exactly three collaborations

    # No pair reaches three collaborations
    assert solution.actorsAndDirectors([
        [1, 1, 0],
        [1, 1, 1],
        [2, 2, 2]
    ]) == []  # no valid pair

    # More than three collaborations
    assert sorted(solution.actorsAndDirectors([
        [1, 1, 0],
        [1, 1, 1],
        [1, 1, 2],
        [1, 1, 3]
    ])) == [[1, 1]]  # count greater than three

    # Multiple valid pairs
    assert sorted(solution.actorsAndDirectors([
        [1, 1, 0],
        [1, 1, 1],
        [1, 1, 2],
        [2, 2, 3],
        [2, 2, 4],
        [2, 2, 5]
    ])) == [[1, 1], [2, 2]]  # multiple qualifying pairs

    # Empty input
    assert solution.actorsAndDirectors([]) == []  # no records

test_solution()

Test	Why
Provided example	Verifies standard functionality
No pair reaches three	Ensures invalid pairs are excluded
More than three collaborations	Confirms threshold is inclusive
Multiple valid pairs	Checks handling of multiple outputs
Empty input	Validates behavior with no records

Edge Cases

One important edge case is when a pair cooperates exactly three times. A common mistake is using a strict comparison such as COUNT(*) > 3, which would incorrectly exclude valid results. This implementation correctly uses >= 3, ensuring exact threshold cases are included.

Another edge case occurs when multiple actor and director pairs qualify simultaneously. A naive implementation might accidentally overwrite counts or return only one result. Since the algorithm stores counts independently for every (actor_id, director_id) key, all valid pairs are preserved and returned.

A third edge case is when no actor and director pair reaches the required threshold. Some implementations might incorrectly return partial results or fail due to empty collections. Here, if no count reaches 3, the result list remains empty and is returned correctly.

Finally, cases with many unique collaborations are handled safely because the counting mechanism isolates each pair independently. Even if every row contains a different actor and director combination, the algorithm still processes the data correctly without collisions or logical errors.