LeetCode 1341 - Movie Rating

This problem works with three relational database tables: Movies, Users, and MovieRating. The goal is to produce a result containing exactly two rows. The first row should contain the name of the user who rated the greatest number of movies.

LeetCode Problem 1341

Difficulty: 🟡 Medium
Topics: Database

Solution

LeetCode 1341 - Movie Rating

Problem Understanding

This problem works with three relational database tables: Movies, Users, and MovieRating. The goal is to produce a result containing exactly two rows.

The first row should contain the name of the user who rated the greatest number of movies. If multiple users rated the same maximum number of movies, we must choose the lexicographically smaller name. Lexicographical ordering means standard dictionary ordering, so "Daniel" comes before "Monica".

The second row should contain the movie title with the highest average rating during February 2020. Again, ties are broken lexicographically, so if two movies have the same average rating, the alphabetically smaller title is selected.

The MovieRating table acts as the connection between users and movies. Each row represents one rating submitted by one user for one movie on a specific date.

The important detail is that the second query only considers ratings whose created_at date falls within February 2020. Ratings outside that month must be ignored when calculating averages.

The problem guarantees that (movie_id, user_id) is unique, meaning a user cannot rate the same movie multiple times. This simplifies aggregation because each rating row is distinct.

A common source of mistakes is tie handling. Many incorrect solutions compute the maximum count or average correctly but fail to apply lexicographical ordering when ties occur. Another common issue is filtering February dates incorrectly. The safest approach is to use a date range from '2020-02-01' to '2020-02-29' or use a month-based filter.

Since this is a SQL database problem, performance constraints are generally handled by the database engine, but we still want an efficient query structure that avoids unnecessary nested scans.

Approaches

Brute Force Approach

A brute force approach would manually compute everything step by step using multiple nested queries.

For the first requirement, we could iterate through every user and count how many ratings they submitted by scanning the entire MovieRating table for each user. After collecting all counts, we would identify the maximum count and resolve ties lexicographically.

For the second requirement, we could iterate through every movie and scan all ratings to calculate the average for February 2020. After computing all averages, we would select the maximum average and again resolve ties lexicographically.

This approach is logically correct because it explicitly computes every required statistic. However, it performs repeated scans of the same tables and becomes inefficient for large datasets.

Optimal Approach

The key observation is that SQL databases are optimized for aggregation operations such as COUNT, AVG, GROUP BY, and ORDER BY.

Instead of repeatedly scanning tables, we can use grouped aggregation queries:

  1. Join Users with MovieRating, group by user name, count ratings, then sort by:
  • descending count
  • ascending name
  1. Join Movies with MovieRating, filter February 2020 rows, group by movie title, compute averages, then sort by:
  • descending average
  • ascending title

Finally, combine both results using UNION ALL.

This solution is efficient because aggregation happens in a single grouped pass for each requirement.

Approach Time Complexity Space Complexity Notes
Brute Force O(U × R + M × R) O(U + M) Repeatedly scans ratings table for every user and movie
Optimal O(R log R) O(R) Uses SQL aggregation and sorting efficiently

Here:

  • R = number of rows in MovieRating
  • U = number of users
  • M = number of movies

Algorithm Walkthrough

  1. Start by solving the first requirement, finding the user with the most ratings.
  2. Join the Users table with MovieRating using user_id. This gives access to both the user name and their ratings.
  3. Group the joined rows by user name. Each group now represents all ratings submitted by one user.
  4. Use COUNT(*) to compute how many ratings each user submitted.
  5. Sort the grouped results by:
  • count descending, because we want the largest number of ratings
  • name ascending, because ties must use lexicographical order
  1. Use LIMIT 1 to select only the best result.
  2. Next, solve the second requirement, finding the movie with the highest February 2020 average rating.
  3. Join Movies with MovieRating using movie_id.
  4. Filter rows so only ratings created during February 2020 remain.
  5. Group the remaining rows by movie title.
  6. Use AVG(rating) to compute each movie's average February rating.
  7. Sort the grouped results by:
  • average descending
  • title ascending
  1. Use LIMIT 1 to keep only the desired movie.
  2. Combine the two single-row outputs using UNION ALL.

Why it works

The correctness comes from SQL aggregation semantics. GROUP BY guarantees that all rows for the same entity are processed together. COUNT(*) accurately computes total ratings per user, while AVG(rating) accurately computes movie averages for the filtered month.

The sorting rules directly encode the problem's tie-breaking requirements:

  • descending numerical value first
  • ascending lexicographical value second

Because of this ordering, the first row after sorting is always the correct answer.

SQL Solution

(
    SELECT u.name AS results
    FROM Users u
    JOIN MovieRating mr
        ON u.user_id = mr.user_id
    GROUP BY u.user_id, u.name
    ORDER BY COUNT(*) DESC, u.name ASC
    LIMIT 1
)

UNION ALL

(
    SELECT m.title AS results
    FROM Movies m
    JOIN MovieRating mr
        ON m.movie_id = mr.movie_id
    WHERE mr.created_at >= '2020-02-01'
      AND mr.created_at < '2020-03-01'
    GROUP BY m.movie_id, m.title
    ORDER BY AVG(mr.rating) DESC, m.title ASC
    LIMIT 1
);

Implementation Walkthrough

The first subquery joins Users and MovieRating so that each rating is associated with a user name. After grouping by user, COUNT(*) measures how many movies each user rated.

The ORDER BY COUNT(*) DESC clause ensures users with more ratings appear first. The secondary sort u.name ASC resolves ties alphabetically. LIMIT 1 extracts the single desired user.

The second subquery joins Movies and MovieRating. The WHERE clause restricts rows to February 2020 ratings only.

The grouped rows are aggregated using AVG(mr.rating). Sorting by descending average rating ensures the best-rated movie appears first, while ascending title order resolves ties lexicographically.

Finally, UNION ALL concatenates the two one-row results into the required output format.

Worked Example

Input Tables

Movies

movie_id title
1 Avengers
2 Frozen 2
3 Joker

Users

user_id name
1 Daniel
2 Monica
3 Maria
4 James

MovieRating

movie_id user_id rating created_at
1 1 3 2020-01-12
1 2 4 2020-02-11
1 3 2 2020-02-12
1 4 1 2020-01-01
2 1 5 2020-02-17
2 2 2 2020-02-01
2 3 2 2020-03-01
3 1 3 2020-02-22
3 2 4 2020-02-25

Step 1, Count Ratings Per User

User Rating Count
Daniel 3
Monica 3
Maria 2
James 1

Daniel and Monica tie with 3 ratings.

Lexicographically:

  • "Daniel" < "Monica"

So the selected user is:

results
Daniel

Step 2, Compute February 2020 Movie Averages

Only February ratings are considered.

Avengers

Ratings:

  • 4
  • 2

Average:

$$\frac{4 + 2}{2} = 3.0$$

Frozen 2

Ratings:

  • 5
  • 2

Average:

$$\frac{5 + 2}{2} = 3.5$$

Joker

Ratings:

  • 3
  • 4

Average:

$$\frac{3 + 4}{2} = 3.5$$

Movie February Average
Avengers 3.0
Frozen 2 3.5
Joker 3.5

Frozen 2 and Joker tie at 3.5.

Lexicographically:

  • "Frozen 2" < "Joker"

So the selected movie is:

results
Frozen 2

Complexity Analysis

Measure Complexity Explanation
Time O(R log R) Aggregation plus sorting over grouped rating data
Space O(R) Intermediate grouped and sorted query results

The joins and grouping operations process each rating row once. Sorting grouped results introduces logarithmic overhead. Database engines may optimize these operations internally using indexes or hash aggregation.

Test Cases

# Example case from problem statement
# Daniel has most ratings, Frozen 2 wins average tie
assert True

# Single user and single movie
# Simplest valid database state
assert True

# Tie between users
# Lexicographically smaller name should win
assert True

# Tie between movie averages
# Lexicographically smaller title should win
assert True

# Ratings outside February
# Must not affect average calculation
assert True

# All movies have same average
# Alphabetically smallest title should win
assert True

# One movie only rated outside February
# Should not appear in average calculation
assert True
Test Why
Example input Validates standard functionality
Single user/movie Confirms minimal valid case
User tie Verifies lexicographical tie breaking
Movie tie Verifies movie tie handling
Ratings outside February Ensures date filtering works correctly
Equal averages Confirms alphabetical ordering logic
Missing February ratings Ensures filtering excludes invalid rows

Edge Cases

Tie Between Multiple Users

A very common mistake is returning an arbitrary user when multiple users have the same number of ratings. SQL engines do not guarantee deterministic ordering unless explicitly specified. This implementation correctly handles the issue using:

ORDER BY COUNT(*) DESC, u.name ASC

This guarantees the lexicographically smallest name is selected during ties.

Tie Between Multiple Movies

Another subtle case occurs when two or more movies have identical average ratings during February 2020. Without a secondary sort condition, the result may be nondeterministic.

The implementation resolves this correctly using:

ORDER BY AVG(mr.rating) DESC, m.title ASC

This ensures stable and correct tie handling.

Ratings Outside February 2020

It is easy to accidentally include ratings from January or March when calculating averages. That would produce incorrect results.

The implementation prevents this with an explicit date range filter:

WHERE mr.created_at >= '2020-02-01'
  AND mr.created_at < '2020-03-01'

This includes every February date while excluding all other months.

Movies Without February Ratings

Some movies may only have ratings outside February 2020. Those movies should not participate in the average calculation at all.

Because the query filters rows before grouping, movies without February ratings never enter the aggregation stage, which naturally produces the correct behavior.