LeetCode 3642 - Find Books with Polarized Opinions
The problem asks us to identify books that exhibit polarized opinions based on reading session ratings. We have two tables: books, which contains metadata about each book including its title, author, genre, and number of pages, and readingsessions, which records individual…
Difficulty: 🟡 Medium
Topics: —
Solution
Problem Understanding
The problem asks us to identify books that exhibit polarized opinions based on reading session ratings. We have two tables: books, which contains metadata about each book including its title, author, genre, and number of pages, and reading_sessions, which records individual reading sessions with a session_rating from 1 to 5.
A book qualifies as having polarized opinions if it satisfies the following criteria: it has at least five reading sessions, it has at least one high rating (≥ 4) and at least one low rating (≤ 2), and its polarization score-defined as the fraction of extreme ratings (≤2 or ≥4) to total sessions-is at least 0.6. Additionally, we must compute the rating spread (highest rating minus lowest rating) and return the results ordered by polarization score descending and title descending.
Key edge considerations include books that have exactly 5 sessions, books where all ratings are high or low, and books where the extreme ratings are below 60% of total sessions. Another subtle point is ensuring the polarization score is rounded to two decimal places.
Approaches
The brute-force approach would involve iterating over each book, collecting all its sessions, counting extreme ratings, checking if high and low ratings exist, and computing the polarization score. While this works, it requires multiple passes over the sessions for each book, which can be inefficient if the dataset is large.
The optimal approach leverages SQL aggregation and conditional expressions. We can join books and reading_sessions, group by book_id, and compute aggregate values such as the highest rating, lowest rating, total sessions, extreme rating counts, and the polarization score in a single query. Conditional aggregation allows us to count extreme ratings efficiently, and filtering can be applied after aggregation. This approach is both simpler and more performant because it reduces the number of passes over the data.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(B * S) | O(S) | Iterate each book and each session, compute counts manually |
| Optimal | O(S) | O(B) | Use SQL aggregation, one pass per session, group by book |
Algorithm Walkthrough
- Join the
bookstable with thereading_sessionstable onbook_idto bring together metadata and ratings. - Group the combined table by
book_idto compute per-book aggregates:countof sessions,maxrating,minrating, andsumof extreme ratings where ratings are≤2 or ≥4. - Filter groups to only include books with at least 5 reading sessions. This ensures we consider only books with sufficient data.
- Within each group, check that there exists at least one high rating (
max ≥ 4) and at least one low rating (min ≤ 2). This identifies books with polarized ratings. - Compute the rating spread as
max_rating - min_ratingto measure the intensity of polarization. - Compute the polarization score as the ratio of extreme ratings to total sessions, rounded to two decimal places. Apply a filter to retain only books with a polarization score ≥ 0.6.
- Select required columns:
book_id,title,author,genre,pages,rating_spread,polarization_score. - Order the final results first by polarization score descending, then by title descending.
This algorithm works because the aggregation guarantees that all conditions (high and low ratings, sufficient sessions, extreme ratings fraction) are checked correctly, and SQL handles the grouping efficiently. Conditional sums accurately capture the number of extreme ratings, and the final filter ensures only highly polarized books remain.
Python Solution
import pandas as pd
def find_books_with_polarized_opinions(books: pd.DataFrame, reading_sessions: pd.DataFrame) -> pd.DataFrame:
# Merge books and reading_sessions
merged = pd.merge(books, reading_sessions, on='book_id')
# Aggregate statistics per book
agg = merged.groupby(['book_id', 'title', 'author', 'genre', 'pages']).agg(
total_sessions=('session_rating', 'count'),
max_rating=('session_rating', 'max'),
min_rating=('session_rating', 'min'),
extreme_ratings=('session_rating', lambda x: ((x <= 2) | (x >= 4)).sum())
).reset_index()
# Compute rating spread and polarization score
agg['rating_spread'] = agg['max_rating'] - agg['min_rating']
agg['polarization_score'] = (agg['extreme_ratings'] / agg['total_sessions']).round(2)
# Apply filters
result = agg[
(agg['total_sessions'] >= 5) &
(agg['max_rating'] >= 4) &
(agg['min_rating'] <= 2) &
(agg['polarization_score'] >= 0.6)
][['book_id', 'title', 'author', 'genre', 'pages', 'rating_spread', 'polarization_score']]
# Order by polarization_score desc, title desc
result = result.sort_values(by=['polarization_score', 'title'], ascending=[False, False])
return result
This Python implementation mirrors the algorithm steps. We merge the two tables to bring ratings and book details together, then aggregate per book using pandas. Conditional aggregation identifies extreme ratings efficiently. Filters are applied sequentially to enforce session count, rating bounds, and polarization score. Finally, sorting ensures correct ordering.
Go Solution
package main
import (
"database/sql"
"fmt"
_ "github.com/go-sql-driver/mysql"
)
type Book struct {
BookID int
Title string
Author string
Genre string
Pages int
RatingSpread int
PolarizationScore float64
}
func FindBooksWithPolarizedOpinions(db *sql.DB) ([]Book, error) {
query := `
SELECT
b.book_id,
b.title,
b.author,
b.genre,
b.pages,
MAX(rs.session_rating) - MIN(rs.session_rating) AS rating_spread,
ROUND(SUM(CASE WHEN rs.session_rating <= 2 OR rs.session_rating >= 4 THEN 1 ELSE 0 END) / COUNT(*), 2) AS polarization_score
FROM books b
JOIN reading_sessions rs ON b.book_id = rs.book_id
GROUP BY b.book_id, b.title, b.author, b.genre, b.pages
HAVING COUNT(*) >= 5
AND MAX(rs.session_rating) >= 4
AND MIN(rs.session_rating) <= 2
AND ROUND(SUM(CASE WHEN rs.session_rating <= 2 OR rs.session_rating >= 4 THEN 1 ELSE 0 END) / COUNT(*), 2) >= 0.6
ORDER BY polarization_score DESC, title DESC;
`
rows, err := db.Query(query)
if err != nil {
return nil, err
}
defer rows.Close()
var books []Book
for rows.Next() {
var b Book
if err := rows.Scan(&b.BookID, &b.Title, &b.Author, &b.Genre, &b.Pages, &b.RatingSpread, &b.PolarizationScore); err != nil {
return nil, err
}
books = append(books, b)
}
return books, nil
}
In Go, we execute a single SQL query that performs all aggregation and filtering on the database side. We use ROUND for the polarization score and CASE WHEN for counting extreme ratings. The Go code then scans the rows into a struct slice. This approach avoids manual iteration and leverages SQL efficiency.
Worked Examples
Example: The Great Gatsby
- Merge with reading sessions gives ratings
[5, 1, 4, 2, 5]. total_sessions = 5,max_rating = 5,min_rating = 1.- Count extreme ratings:
5, 1, 4, 2, 5→ 5 extreme ratings. - Polarization score = 5 / 5 = 1.00, rating spread = 5 - 1 = 4.
- Meets all filters → included.
Example: 1984
- Ratings
[2, 1, 2, 1, 4, 5]. total_sessions = 6,max_rating = 5,min_rating = 1.- Extreme ratings count = 6.
- Polarization score = 6 / 6 = 1.00, rating spread = 4.
- Meets all filters → included.
Books like To Kill a Mockingbird are excluded because all ratings are high, failing the low rating condition.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(S) | Each reading session is processed once during aggregation. |
| Space | O(B) | We store per-book aggregates, where B is the number of books. |
Since we rely on SQL/pandas aggregation, the complexity is linear in the number of sessions. Memory usage scales with the number of books, not sessions.
Test Cases
# Provided example
assert find_books_with_polarized_opinions(books_df, sessions_df).shape[0] == 2 # The Great Gatsby and 1984
# Boundary: exactly 5 sessions with polarization
books_test = pd.DataFrame({'book_id':[