LeetCode 3642 - Find Books with Polarized Opinions

The problem asks us to identify books that exhibit polarized opinions based on reading session ratings. We have two tables: books, which contains metadata about each book including its title, author, genre, and number of pages, and readingsessions, which records individual…

LeetCode Problem 3642

Difficulty: 🟡 Medium
Topics:

Solution

Problem Understanding

The problem asks us to identify books that exhibit polarized opinions based on reading session ratings. We have two tables: books, which contains metadata about each book including its title, author, genre, and number of pages, and reading_sessions, which records individual reading sessions with a session_rating from 1 to 5.

A book qualifies as having polarized opinions if it satisfies the following criteria: it has at least five reading sessions, it has at least one high rating (≥ 4) and at least one low rating (≤ 2), and its polarization score-defined as the fraction of extreme ratings (≤2 or ≥4) to total sessions-is at least 0.6. Additionally, we must compute the rating spread (highest rating minus lowest rating) and return the results ordered by polarization score descending and title descending.

Key edge considerations include books that have exactly 5 sessions, books where all ratings are high or low, and books where the extreme ratings are below 60% of total sessions. Another subtle point is ensuring the polarization score is rounded to two decimal places.

Approaches

The brute-force approach would involve iterating over each book, collecting all its sessions, counting extreme ratings, checking if high and low ratings exist, and computing the polarization score. While this works, it requires multiple passes over the sessions for each book, which can be inefficient if the dataset is large.

The optimal approach leverages SQL aggregation and conditional expressions. We can join books and reading_sessions, group by book_id, and compute aggregate values such as the highest rating, lowest rating, total sessions, extreme rating counts, and the polarization score in a single query. Conditional aggregation allows us to count extreme ratings efficiently, and filtering can be applied after aggregation. This approach is both simpler and more performant because it reduces the number of passes over the data.

Approach Time Complexity Space Complexity Notes
Brute Force O(B * S) O(S) Iterate each book and each session, compute counts manually
Optimal O(S) O(B) Use SQL aggregation, one pass per session, group by book

Algorithm Walkthrough

  1. Join the books table with the reading_sessions table on book_id to bring together metadata and ratings.
  2. Group the combined table by book_id to compute per-book aggregates: count of sessions, max rating, min rating, and sum of extreme ratings where ratings are ≤2 or ≥4.
  3. Filter groups to only include books with at least 5 reading sessions. This ensures we consider only books with sufficient data.
  4. Within each group, check that there exists at least one high rating (max ≥ 4) and at least one low rating (min ≤ 2). This identifies books with polarized ratings.
  5. Compute the rating spread as max_rating - min_rating to measure the intensity of polarization.
  6. Compute the polarization score as the ratio of extreme ratings to total sessions, rounded to two decimal places. Apply a filter to retain only books with a polarization score ≥ 0.6.
  7. Select required columns: book_id, title, author, genre, pages, rating_spread, polarization_score.
  8. Order the final results first by polarization score descending, then by title descending.

This algorithm works because the aggregation guarantees that all conditions (high and low ratings, sufficient sessions, extreme ratings fraction) are checked correctly, and SQL handles the grouping efficiently. Conditional sums accurately capture the number of extreme ratings, and the final filter ensures only highly polarized books remain.

Python Solution

import pandas as pd

def find_books_with_polarized_opinions(books: pd.DataFrame, reading_sessions: pd.DataFrame) -> pd.DataFrame:
    # Merge books and reading_sessions
    merged = pd.merge(books, reading_sessions, on='book_id')
    
    # Aggregate statistics per book
    agg = merged.groupby(['book_id', 'title', 'author', 'genre', 'pages']).agg(
        total_sessions=('session_rating', 'count'),
        max_rating=('session_rating', 'max'),
        min_rating=('session_rating', 'min'),
        extreme_ratings=('session_rating', lambda x: ((x <= 2) | (x >= 4)).sum())
    ).reset_index()
    
    # Compute rating spread and polarization score
    agg['rating_spread'] = agg['max_rating'] - agg['min_rating']
    agg['polarization_score'] = (agg['extreme_ratings'] / agg['total_sessions']).round(2)
    
    # Apply filters
    result = agg[
        (agg['total_sessions'] >= 5) &
        (agg['max_rating'] >= 4) &
        (agg['min_rating'] <= 2) &
        (agg['polarization_score'] >= 0.6)
    ][['book_id', 'title', 'author', 'genre', 'pages', 'rating_spread', 'polarization_score']]
    
    # Order by polarization_score desc, title desc
    result = result.sort_values(by=['polarization_score', 'title'], ascending=[False, False])
    
    return result

This Python implementation mirrors the algorithm steps. We merge the two tables to bring ratings and book details together, then aggregate per book using pandas. Conditional aggregation identifies extreme ratings efficiently. Filters are applied sequentially to enforce session count, rating bounds, and polarization score. Finally, sorting ensures correct ordering.

Go Solution

package main

import (
    "database/sql"
    "fmt"
    _ "github.com/go-sql-driver/mysql"
)

type Book struct {
    BookID            int
    Title             string
    Author            string
    Genre             string
    Pages             int
    RatingSpread      int
    PolarizationScore float64
}

func FindBooksWithPolarizedOpinions(db *sql.DB) ([]Book, error) {
    query := `
        SELECT 
            b.book_id,
            b.title,
            b.author,
            b.genre,
            b.pages,
            MAX(rs.session_rating) - MIN(rs.session_rating) AS rating_spread,
            ROUND(SUM(CASE WHEN rs.session_rating <= 2 OR rs.session_rating >= 4 THEN 1 ELSE 0 END) / COUNT(*), 2) AS polarization_score
        FROM books b
        JOIN reading_sessions rs ON b.book_id = rs.book_id
        GROUP BY b.book_id, b.title, b.author, b.genre, b.pages
        HAVING COUNT(*) >= 5
           AND MAX(rs.session_rating) >= 4
           AND MIN(rs.session_rating) <= 2
           AND ROUND(SUM(CASE WHEN rs.session_rating <= 2 OR rs.session_rating >= 4 THEN 1 ELSE 0 END) / COUNT(*), 2) >= 0.6
        ORDER BY polarization_score DESC, title DESC;
    `
    
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    
    var books []Book
    for rows.Next() {
        var b Book
        if err := rows.Scan(&b.BookID, &b.Title, &b.Author, &b.Genre, &b.Pages, &b.RatingSpread, &b.PolarizationScore); err != nil {
            return nil, err
        }
        books = append(books, b)
    }
    
    return books, nil
}

In Go, we execute a single SQL query that performs all aggregation and filtering on the database side. We use ROUND for the polarization score and CASE WHEN for counting extreme ratings. The Go code then scans the rows into a struct slice. This approach avoids manual iteration and leverages SQL efficiency.

Worked Examples

Example: The Great Gatsby

  • Merge with reading sessions gives ratings [5, 1, 4, 2, 5].
  • total_sessions = 5, max_rating = 5, min_rating = 1.
  • Count extreme ratings: 5, 1, 4, 2, 5 → 5 extreme ratings.
  • Polarization score = 5 / 5 = 1.00, rating spread = 5 - 1 = 4.
  • Meets all filters → included.

Example: 1984

  • Ratings [2, 1, 2, 1, 4, 5].
  • total_sessions = 6, max_rating = 5, min_rating = 1.
  • Extreme ratings count = 6.
  • Polarization score = 6 / 6 = 1.00, rating spread = 4.
  • Meets all filters → included.

Books like To Kill a Mockingbird are excluded because all ratings are high, failing the low rating condition.

Complexity Analysis

Measure Complexity Explanation
Time O(S) Each reading session is processed once during aggregation.
Space O(B) We store per-book aggregates, where B is the number of books.

Since we rely on SQL/pandas aggregation, the complexity is linear in the number of sessions. Memory usage scales with the number of books, not sessions.

Test Cases

# Provided example
assert find_books_with_polarized_opinions(books_df, sessions_df).shape[0] == 2  # The Great Gatsby and 1984

# Boundary: exactly 5 sessions with polarization
books_test = pd.DataFrame({'book_id':[