LeetCode 578 - Get Highest Answer Rate Question

This problem asks us to analyze user interaction data stored in the SurveyLog table and determine which question has the highest answer rate. The table records actions performed by users during survey sessions. Every row represents a single interaction with a question.

LeetCode Problem 578

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze user interaction data stored in the SurveyLog table and determine which question has the highest answer rate.

The table records actions performed by users during survey sessions. Every row represents a single interaction with a question. The important columns for this problem are:

Column Meaning
question_id The question being interacted with
action The type of interaction, either "show", "answer", or "skip"

The answer rate for a question is defined as:

$\text{Answer Rate} = \frac{\text{Number of answer actions}}{\text{Number of show actions}}$

The task is to compute this rate for every question and return the question_id with the highest value.

If multiple questions share the same maximum answer rate, we must return the smallest question_id.

The input consists of a database table that may contain duplicate rows. This detail matters because duplicates should still be counted. If the same interaction appears multiple times, each occurrence contributes to the counts.

The expected output is a single row containing the question with the best answer rate.

A few important observations immediately stand out:

A question may be shown many times but answered only a few times. In that case, its answer rate will be less than 1.

A question could theoretically have answer actions without corresponding show actions if the dataset is malformed, but the problem implicitly assumes valid survey logs. In practice, every answer should correspond to a shown question.

Questions with equal answer rates require careful tie-breaking. A naive implementation that simply sorts by rate descending may forget to sort by question_id ascending as the secondary condition.

Since this is a database aggregation problem, the core challenge is efficiently grouping rows by question_id and computing conditional counts.

Approaches

Brute Force Approach

A brute-force solution would process each distinct question_id independently.

For every question, we would scan the entire table and count:

  • how many rows have action = 'show'
  • how many rows have action = 'answer'

We would then compute the ratio and track the best question.

This approach is correct because every question's answer rate is computed directly from the complete dataset. However, it is inefficient because the table is repeatedly rescanned for each question.

If there are n rows and k unique questions, this becomes an O(n * k) solution. In the worst case where every row belongs to a different question, the complexity approaches O(n²).

This repeated scanning makes the brute-force approach unnecessarily expensive.

Optimal Approach

The key insight is that all required statistics can be computed in a single grouped aggregation.

Instead of scanning the table multiple times, we group rows by question_id and compute:

  • total number of "answer" actions
  • total number of "show" actions

Using SQL aggregation functions with conditional counting allows us to calculate both values simultaneously during one pass through the grouped data.

After computing the answer rate for every question, we sort:

  1. by answer rate descending
  2. by question_id ascending

The first row after sorting is the correct answer.

This works efficiently because grouping consolidates all relevant information for each question in one operation.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Repeatedly scans the table for every question
Optimal O(n log n) O(k) Uses grouping and sorting to compute all rates efficiently

The sorting step dominates the complexity in most SQL implementations. The aggregation itself is linear.

Algorithm Walkthrough

  1. Group all rows by question_id.

This ensures that all interactions belonging to the same question are processed together. Since the answer rate is calculated per question, grouping is the natural structure for the computation. 2. Count the number of "answer" actions for each question.

We use a conditional aggregation expression such as:

SUM(action = 'answer')

or an equivalent CASE WHEN expression depending on SQL dialect. 3. Count the number of "show" actions for each question.

This provides the denominator of the answer rate formula. 4. Compute the answer rate.

The rate is:

$\frac{\text{answer count}}{\text{show count}}$

SQL automatically handles floating-point division if at least one operand is treated as non-integer. 5. Sort all grouped questions.

The ordering criteria are:

  • highest answer rate first
  • smallest question_id first when rates tie
  1. Return only the first row.

Since the rows are sorted according to the required conditions, the top row is the correct result.

Why it works

The algorithm works because every question's answer rate depends only on two aggregated values: the total answer count and total show count.

Grouping by question_id guarantees that all relevant rows are considered together. Conditional aggregation correctly counts only the desired action types. Sorting by the computed rate ensures the maximum rate appears first, while the secondary ordering by question_id correctly resolves ties.

Therefore, the algorithm always returns the required question.

Python Solution

Even though this is fundamentally a SQL problem, LeetCode database problems are solved by writing SQL queries. Below is the correct SQL solution formatted inside a Python block as requested.

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(action = 'answer') / SUM(action = 'show') DESC,
    question_id ASC
LIMIT 1;

The query starts by grouping all records according to question_id. This allows the database engine to aggregate statistics separately for each question.

The expression:

SUM(action = 'answer')

counts how many answer actions occurred for a question because boolean expressions evaluate to 1 for true and 0 for false in MySQL.

Similarly:

SUM(action = 'show')

counts the number of show actions.

Dividing these two values produces the answer rate.

The ORDER BY clause first sorts by answer rate in descending order so the highest rate appears first. It then sorts by question_id ascending to satisfy the tie-breaking requirement.

Finally, LIMIT 1 ensures that only the best question is returned.

Go Solution

Again, since this is a database problem, the actual accepted submission is SQL. The same query is shown below in a Go block as requested.

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(action = 'answer') / SUM(action = 'show') DESC,
    question_id ASC
LIMIT 1;

There are no language-specific implementation differences here because the solution is purely SQL-based. The same query works regardless of whether the surrounding environment is Python, Go, Java, or another language supported by LeetCode database problems.

One subtle implementation detail is that MySQL treats boolean expressions as integers during arithmetic operations. This allows concise expressions like:

SUM(action = 'answer')

instead of a longer CASE WHEN statement.

Worked Examples

Example 1

Input table:

id action question_id answer_id q_num timestamp
5 show 285 null 1 123
5 answer 285 124124 1 124
5 show 369 null 2 125
5 skip 369 null 2 126

Step 1: Group by question_id

We form two groups:

question_id Rows
285 show, answer
369 show, skip

Step 2: Count answers and shows

question_id answer count show count
285 1 1
369 0 1

Step 3: Compute answer rate

question_id answer rate
285 1 / 1 = 1.0
369 0 / 1 = 0.0

Step 4: Sort results

Sorted by:

  1. answer rate descending
  2. question_id ascending
question_id answer rate
285 1.0
369 0.0

Final Answer

survey_log
285

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Aggregation is linear, sorting grouped results dominates
Space O(k) Stores grouped aggregates for k unique questions

The database scans all rows once during grouping, which requires linear time relative to the number of records.

After aggregation, the grouped results are sorted. If there are k unique questions, sorting costs O(k log k) time.

The database engine also stores aggregate information for each distinct question, which requires O(k) additional space.

Test Cases

# Example case from the problem
assert solve([
    ("show", 285),
    ("answer", 285),
    ("show", 369),
    ("skip", 369)
]) == 285  # higher answer rate wins

# Single question fully answered
assert solve([
    ("show", 1),
    ("answer", 1)
]) == 1  # rate is 1.0

# Multiple questions with same rate
assert solve([
    ("show", 1),
    ("answer", 1),
    ("show", 2),
    ("answer", 2)
]) == 1  # smaller question_id wins tie

# Question shown but never answered
assert solve([
    ("show", 10),
    ("skip", 10)
]) == 10  # rate is 0.0

# Different answer rates
assert solve([
    ("show", 1),
    ("answer", 1),
    ("show", 1),
    ("show", 2),
    ("answer", 2),
    ("show", 2)
]) == 1  # question 1 has rate 1/2, question 2 has rate 1/2, smaller id wins

# Duplicate rows
assert solve([
    ("show", 5),
    ("show", 5),
    ("answer", 5),
    ("answer", 5)
]) == 5  # duplicates count toward totals

# Large imbalance
assert solve([
    ("show", 1),
    ("show", 1),
    ("show", 1),
    ("answer", 1),
    ("show", 2),
    ("answer", 2)
]) == 2  # question 2 has better rate

Test Case Summary

Test Why
Basic example Validates standard functionality
Single question Smallest valid dataset
Equal rates Ensures tie-breaking logic works
Never answered Confirms zero-rate handling
Same fractional rates Tests secondary sorting condition
Duplicate rows Verifies duplicates are counted
Large imbalance Ensures maximum ratio selection

Edge Cases

Multiple Questions With the Same Answer Rate

This is one of the most important edge cases because many incorrect solutions only sort by answer rate and forget the tie-breaking rule.

For example, if questions 1 and 2 both have an answer rate of 1.0, the correct answer must be 1 because it has the smaller question_id.

The implementation handles this correctly by adding:

question_id ASC

as the secondary sorting condition.

Questions That Are Never Answered

A question may be shown many times but never answered. In that case, the answer count is zero and the answer rate becomes 0.

A buggy implementation might accidentally divide by the wrong count or exclude such questions entirely.

The aggregation logic correctly counts zero answers and still computes a valid rate.

Duplicate Rows

The problem explicitly states that duplicate rows may exist.

A common mistake would be attempting to remove duplicates before processing. That would produce incorrect counts because duplicates are intended to contribute to the totals.

The solution avoids deduplication entirely and counts every row exactly as it appears in the table.