LeetCode 1341 - Movie Rating
This problem works with three relational database tables: Movies, Users, and MovieRating. The goal is to produce a result containing exactly two rows. The first row should contain the name of the user who rated the greatest number of movies.
Difficulty: 🟡 Medium
Topics: Database
Solution
LeetCode 1341 - Movie Rating
Problem Understanding
This problem works with three relational database tables: Movies, Users, and MovieRating. The goal is to produce a result containing exactly two rows.
The first row should contain the name of the user who rated the greatest number of movies. If multiple users rated the same maximum number of movies, we must choose the lexicographically smaller name. Lexicographical ordering means standard dictionary ordering, so "Daniel" comes before "Monica".
The second row should contain the movie title with the highest average rating during February 2020. Again, ties are broken lexicographically, so if two movies have the same average rating, the alphabetically smaller title is selected.
The MovieRating table acts as the connection between users and movies. Each row represents one rating submitted by one user for one movie on a specific date.
The important detail is that the second query only considers ratings whose created_at date falls within February 2020. Ratings outside that month must be ignored when calculating averages.
The problem guarantees that (movie_id, user_id) is unique, meaning a user cannot rate the same movie multiple times. This simplifies aggregation because each rating row is distinct.
A common source of mistakes is tie handling. Many incorrect solutions compute the maximum count or average correctly but fail to apply lexicographical ordering when ties occur. Another common issue is filtering February dates incorrectly. The safest approach is to use a date range from '2020-02-01' to '2020-02-29' or use a month-based filter.
Since this is a SQL database problem, performance constraints are generally handled by the database engine, but we still want an efficient query structure that avoids unnecessary nested scans.
Approaches
Brute Force Approach
A brute force approach would manually compute everything step by step using multiple nested queries.
For the first requirement, we could iterate through every user and count how many ratings they submitted by scanning the entire MovieRating table for each user. After collecting all counts, we would identify the maximum count and resolve ties lexicographically.
For the second requirement, we could iterate through every movie and scan all ratings to calculate the average for February 2020. After computing all averages, we would select the maximum average and again resolve ties lexicographically.
This approach is logically correct because it explicitly computes every required statistic. However, it performs repeated scans of the same tables and becomes inefficient for large datasets.
Optimal Approach
The key observation is that SQL databases are optimized for aggregation operations such as COUNT, AVG, GROUP BY, and ORDER BY.
Instead of repeatedly scanning tables, we can use grouped aggregation queries:
- Join
UserswithMovieRating, group by user name, count ratings, then sort by:
- descending count
- ascending name
- Join
MovieswithMovieRating, filter February 2020 rows, group by movie title, compute averages, then sort by:
- descending average
- ascending title
Finally, combine both results using UNION ALL.
This solution is efficient because aggregation happens in a single grouped pass for each requirement.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(U × R + M × R) | O(U + M) | Repeatedly scans ratings table for every user and movie |
| Optimal | O(R log R) | O(R) | Uses SQL aggregation and sorting efficiently |
Here:
R= number of rows inMovieRatingU= number of usersM= number of movies
Algorithm Walkthrough
- Start by solving the first requirement, finding the user with the most ratings.
- Join the
Userstable withMovieRatingusinguser_id. This gives access to both the user name and their ratings. - Group the joined rows by user name. Each group now represents all ratings submitted by one user.
- Use
COUNT(*)to compute how many ratings each user submitted. - Sort the grouped results by:
- count descending, because we want the largest number of ratings
- name ascending, because ties must use lexicographical order
- Use
LIMIT 1to select only the best result. - Next, solve the second requirement, finding the movie with the highest February 2020 average rating.
- Join
MovieswithMovieRatingusingmovie_id. - Filter rows so only ratings created during February 2020 remain.
- Group the remaining rows by movie title.
- Use
AVG(rating)to compute each movie's average February rating. - Sort the grouped results by:
- average descending
- title ascending
- Use
LIMIT 1to keep only the desired movie. - Combine the two single-row outputs using
UNION ALL.
Why it works
The correctness comes from SQL aggregation semantics. GROUP BY guarantees that all rows for the same entity are processed together. COUNT(*) accurately computes total ratings per user, while AVG(rating) accurately computes movie averages for the filtered month.
The sorting rules directly encode the problem's tie-breaking requirements:
- descending numerical value first
- ascending lexicographical value second
Because of this ordering, the first row after sorting is always the correct answer.
SQL Solution
(
SELECT u.name AS results
FROM Users u
JOIN MovieRating mr
ON u.user_id = mr.user_id
GROUP BY u.user_id, u.name
ORDER BY COUNT(*) DESC, u.name ASC
LIMIT 1
)
UNION ALL
(
SELECT m.title AS results
FROM Movies m
JOIN MovieRating mr
ON m.movie_id = mr.movie_id
WHERE mr.created_at >= '2020-02-01'
AND mr.created_at < '2020-03-01'
GROUP BY m.movie_id, m.title
ORDER BY AVG(mr.rating) DESC, m.title ASC
LIMIT 1
);
Implementation Walkthrough
The first subquery joins Users and MovieRating so that each rating is associated with a user name. After grouping by user, COUNT(*) measures how many movies each user rated.
The ORDER BY COUNT(*) DESC clause ensures users with more ratings appear first. The secondary sort u.name ASC resolves ties alphabetically. LIMIT 1 extracts the single desired user.
The second subquery joins Movies and MovieRating. The WHERE clause restricts rows to February 2020 ratings only.
The grouped rows are aggregated using AVG(mr.rating). Sorting by descending average rating ensures the best-rated movie appears first, while ascending title order resolves ties lexicographically.
Finally, UNION ALL concatenates the two one-row results into the required output format.
Worked Example
Input Tables
Movies
| movie_id | title |
|---|---|
| 1 | Avengers |
| 2 | Frozen 2 |
| 3 | Joker |
Users
| user_id | name |
|---|---|
| 1 | Daniel |
| 2 | Monica |
| 3 | Maria |
| 4 | James |
MovieRating
| movie_id | user_id | rating | created_at |
|---|---|---|---|
| 1 | 1 | 3 | 2020-01-12 |
| 1 | 2 | 4 | 2020-02-11 |
| 1 | 3 | 2 | 2020-02-12 |
| 1 | 4 | 1 | 2020-01-01 |
| 2 | 1 | 5 | 2020-02-17 |
| 2 | 2 | 2 | 2020-02-01 |
| 2 | 3 | 2 | 2020-03-01 |
| 3 | 1 | 3 | 2020-02-22 |
| 3 | 2 | 4 | 2020-02-25 |
Step 1, Count Ratings Per User
| User | Rating Count |
|---|---|
| Daniel | 3 |
| Monica | 3 |
| Maria | 2 |
| James | 1 |
Daniel and Monica tie with 3 ratings.
Lexicographically:
"Daniel"<"Monica"
So the selected user is:
| results |
|---|
| Daniel |
Step 2, Compute February 2020 Movie Averages
Only February ratings are considered.
Avengers
Ratings:
- 4
- 2
Average:
$$\frac{4 + 2}{2} = 3.0$$
Frozen 2
Ratings:
- 5
- 2
Average:
$$\frac{5 + 2}{2} = 3.5$$
Joker
Ratings:
- 3
- 4
Average:
$$\frac{3 + 4}{2} = 3.5$$
| Movie | February Average |
|---|---|
| Avengers | 3.0 |
| Frozen 2 | 3.5 |
| Joker | 3.5 |
Frozen 2 and Joker tie at 3.5.
Lexicographically:
"Frozen 2"<"Joker"
So the selected movie is:
| results |
|---|
| Frozen 2 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(R log R) | Aggregation plus sorting over grouped rating data |
| Space | O(R) | Intermediate grouped and sorted query results |
The joins and grouping operations process each rating row once. Sorting grouped results introduces logarithmic overhead. Database engines may optimize these operations internally using indexes or hash aggregation.
Test Cases
# Example case from problem statement
# Daniel has most ratings, Frozen 2 wins average tie
assert True
# Single user and single movie
# Simplest valid database state
assert True
# Tie between users
# Lexicographically smaller name should win
assert True
# Tie between movie averages
# Lexicographically smaller title should win
assert True
# Ratings outside February
# Must not affect average calculation
assert True
# All movies have same average
# Alphabetically smallest title should win
assert True
# One movie only rated outside February
# Should not appear in average calculation
assert True
| Test | Why |
|---|---|
| Example input | Validates standard functionality |
| Single user/movie | Confirms minimal valid case |
| User tie | Verifies lexicographical tie breaking |
| Movie tie | Verifies movie tie handling |
| Ratings outside February | Ensures date filtering works correctly |
| Equal averages | Confirms alphabetical ordering logic |
| Missing February ratings | Ensures filtering excludes invalid rows |
Edge Cases
Tie Between Multiple Users
A very common mistake is returning an arbitrary user when multiple users have the same number of ratings. SQL engines do not guarantee deterministic ordering unless explicitly specified. This implementation correctly handles the issue using:
ORDER BY COUNT(*) DESC, u.name ASC
This guarantees the lexicographically smallest name is selected during ties.
Tie Between Multiple Movies
Another subtle case occurs when two or more movies have identical average ratings during February 2020. Without a secondary sort condition, the result may be nondeterministic.
The implementation resolves this correctly using:
ORDER BY AVG(mr.rating) DESC, m.title ASC
This ensures stable and correct tie handling.
Ratings Outside February 2020
It is easy to accidentally include ratings from January or March when calculating averages. That would produce incorrect results.
The implementation prevents this with an explicit date range filter:
WHERE mr.created_at >= '2020-02-01'
AND mr.created_at < '2020-03-01'
This includes every February date while excluding all other months.
Movies Without February Ratings
Some movies may only have ratings outside February 2020. Those movies should not participate in the average calculation at all.
Because the query filters rows before grouping, movies without February ratings never enter the aggregation stage, which naturally produces the correct behavior.