LeetCode 578: Get Highest Answer Rate Question

A clear SQL guide for finding the question with the highest answer rate from survey logs.

Problem Restatement

We are given a table called SurveyLog.

Each row records one user action on one question. The action can be:

'show'
'answer'
'skip'

The answer rate of a question is:

number of answer actions / number of show actions

We need to return the question_id with the highest answer rate. If multiple questions have the same highest answer rate, return the smallest question_id. The returned column should be named survey_log.

Table

SurveyLog

Column Type Meaning
id int User ID
action varchar One of show, answer, or skip
question_id int Question ID
answer_id int Answer ID, not null only when action is answer
q_num int Order of the question in the current session
timestamp int Time of the action

Example

Input:

id action question_id answer_id q_num timestamp
5 show 285 null 1 123
5 answer 285 124124 1 124
5 show 369 null 2 125
5 skip 369 null 2 126

For question 285:

answer count = 1
show count = 1
answer rate = 1 / 1 = 1.0

For question 369:

answer count = 0
show count = 1
answer rate = 0 / 1 = 0.0

Output:

survey_log
285

First Thought: Count Answers Only

A first attempt might be to count only answer rows:

SELECT question_id
FROM SurveyLog
WHERE action = 'answer'
GROUP BY question_id
ORDER BY COUNT(*) DESC
LIMIT 1;

This is not enough.

The problem asks for answer rate, not answer count.

A question shown 100 times and answered 50 times has rate 0.5.

A question shown 2 times and answered 2 times has rate 1.0.

The second question should rank higher even though it has fewer total answers.

Key Insight

For each question, we need two counts:

answer_count = number of rows where action = 'answer'
show_count = number of rows where action = 'show'

Then compute:

answer_rate = answer_count / show_count

In SQL, we can compute conditional counts using SUM(CASE WHEN ... THEN 1 ELSE 0 END).

Algorithm

  1. Group rows by question_id.
  2. Count how many times each question was answered.
  3. Count how many times each question was shown.
  4. Sort by answer rate from highest to lowest.
  5. If rates tie, sort by question_id from smallest to largest.
  6. Return the first row.

Correctness

The query groups all rows with the same question_id, so every aggregate is computed per question.

For each group, the answer count includes exactly the rows whose action is answer. The show count includes exactly the rows whose action is show. Therefore, their ratio is exactly the answer rate defined by the problem.

Sorting by this ratio in descending order places the question with the highest answer rate first.

If multiple questions have the same answer rate, sorting by question_id in ascending order places the smallest question_id first.

Therefore, selecting the first row returns exactly the required question.

Complexity

Let:

N = number of rows in SurveyLog
Q = number of distinct questions
Metric Value Why
Time O(N + Q log Q) Scan rows, group by question, then sort grouped results
Space O(Q) Store one aggregate row per question

The exact execution plan depends on the database engine.

Implementation

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END)
    / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
    question_id ASC
LIMIT 1;

Code Explanation

This groups all actions for the same question:

GROUP BY question_id

This counts answer actions:

SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END)

This counts show actions:

SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END)

The ratio is used for sorting:

ORDER BY
    answer_count / show_count DESC

The query writes the expression directly inside ORDER BY.

The tie-breaker is:

question_id ASC

So if two questions have the same answer rate, the smaller question ID is returned.

The output column is renamed here:

question_id AS survey_log

MySQL Integer Division Note

In MySQL, division with / returns a decimal result, so the implementation above works.

A more explicit version is:

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
    / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
    question_id ASC
LIMIT 1;

Multiplying by 1.0 makes the ratio clearly numeric.

Testing

Sample data:

CREATE TABLE SurveyLog (
    id INT,
    action VARCHAR(20),
    question_id INT,
    answer_id INT,
    q_num INT,
    timestamp INT
);

INSERT INTO SurveyLog (id, action, question_id, answer_id, q_num, timestamp) VALUES
(5, 'show', 285, NULL, 1, 123),
(5, 'answer', 285, 124124, 1, 124),
(5, 'show', 369, NULL, 2, 125),
(5, 'skip', 369, NULL, 2, 126);

Query:

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
    / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
    question_id ASC
LIMIT 1;

Expected result:

survey_log
285

Additional test cases:

Case Expected behavior
One question has rate 1.0 That question should rank first
One question has many answers but lower rate It should not win by count alone
Two questions have the same rate Smaller question_id should win
Question has shows but no answers Its answer rate is 0
Several users see the same question All rows for that question are grouped together