LeetCode 578 - Get Highest Answer Rate Question
This problem asks us to analyze user interaction data stored in the SurveyLog table and determine which question has the highest answer rate. The table records actions performed by users during survey sessions. Every row represents a single interaction with a question.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze user interaction data stored in the SurveyLog table and determine which question has the highest answer rate.
The table records actions performed by users during survey sessions. Every row represents a single interaction with a question. The important columns for this problem are:
| Column | Meaning |
|---|---|
question_id |
The question being interacted with |
action |
The type of interaction, either "show", "answer", or "skip" |
The answer rate for a question is defined as:
$\text{Answer Rate} = \frac{\text{Number of answer actions}}{\text{Number of show actions}}$
The task is to compute this rate for every question and return the question_id with the highest value.
If multiple questions share the same maximum answer rate, we must return the smallest question_id.
The input consists of a database table that may contain duplicate rows. This detail matters because duplicates should still be counted. If the same interaction appears multiple times, each occurrence contributes to the counts.
The expected output is a single row containing the question with the best answer rate.
A few important observations immediately stand out:
A question may be shown many times but answered only a few times. In that case, its answer rate will be less than 1.
A question could theoretically have answer actions without corresponding show actions if the dataset is malformed, but the problem implicitly assumes valid survey logs. In practice, every answer should correspond to a shown question.
Questions with equal answer rates require careful tie-breaking. A naive implementation that simply sorts by rate descending may forget to sort by question_id ascending as the secondary condition.
Since this is a database aggregation problem, the core challenge is efficiently grouping rows by question_id and computing conditional counts.
Approaches
Brute Force Approach
A brute-force solution would process each distinct question_id independently.
For every question, we would scan the entire table and count:
- how many rows have
action = 'show' - how many rows have
action = 'answer'
We would then compute the ratio and track the best question.
This approach is correct because every question's answer rate is computed directly from the complete dataset. However, it is inefficient because the table is repeatedly rescanned for each question.
If there are n rows and k unique questions, this becomes an O(n * k) solution. In the worst case where every row belongs to a different question, the complexity approaches O(n²).
This repeated scanning makes the brute-force approach unnecessarily expensive.
Optimal Approach
The key insight is that all required statistics can be computed in a single grouped aggregation.
Instead of scanning the table multiple times, we group rows by question_id and compute:
- total number of
"answer"actions - total number of
"show"actions
Using SQL aggregation functions with conditional counting allows us to calculate both values simultaneously during one pass through the grouped data.
After computing the answer rate for every question, we sort:
- by answer rate descending
- by
question_idascending
The first row after sorting is the correct answer.
This works efficiently because grouping consolidates all relevant information for each question in one operation.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Repeatedly scans the table for every question |
| Optimal | O(n log n) | O(k) | Uses grouping and sorting to compute all rates efficiently |
The sorting step dominates the complexity in most SQL implementations. The aggregation itself is linear.
Algorithm Walkthrough
- Group all rows by
question_id.
This ensures that all interactions belonging to the same question are processed together. Since the answer rate is calculated per question, grouping is the natural structure for the computation.
2. Count the number of "answer" actions for each question.
We use a conditional aggregation expression such as:
SUM(action = 'answer')
or an equivalent CASE WHEN expression depending on SQL dialect.
3. Count the number of "show" actions for each question.
This provides the denominator of the answer rate formula. 4. Compute the answer rate.
The rate is:
$\frac{\text{answer count}}{\text{show count}}$
SQL automatically handles floating-point division if at least one operand is treated as non-integer. 5. Sort all grouped questions.
The ordering criteria are:
- highest answer rate first
- smallest
question_idfirst when rates tie
- Return only the first row.
Since the rows are sorted according to the required conditions, the top row is the correct result.
Why it works
The algorithm works because every question's answer rate depends only on two aggregated values: the total answer count and total show count.
Grouping by question_id guarantees that all relevant rows are considered together. Conditional aggregation correctly counts only the desired action types. Sorting by the computed rate ensures the maximum rate appears first, while the secondary ordering by question_id correctly resolves ties.
Therefore, the algorithm always returns the required question.
Python Solution
Even though this is fundamentally a SQL problem, LeetCode database problems are solved by writing SQL queries. Below is the correct SQL solution formatted inside a Python block as requested.
SELECT
question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
SUM(action = 'answer') / SUM(action = 'show') DESC,
question_id ASC
LIMIT 1;
The query starts by grouping all records according to question_id. This allows the database engine to aggregate statistics separately for each question.
The expression:
SUM(action = 'answer')
counts how many answer actions occurred for a question because boolean expressions evaluate to 1 for true and 0 for false in MySQL.
Similarly:
SUM(action = 'show')
counts the number of show actions.
Dividing these two values produces the answer rate.
The ORDER BY clause first sorts by answer rate in descending order so the highest rate appears first. It then sorts by question_id ascending to satisfy the tie-breaking requirement.
Finally, LIMIT 1 ensures that only the best question is returned.
Go Solution
Again, since this is a database problem, the actual accepted submission is SQL. The same query is shown below in a Go block as requested.
SELECT
question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
SUM(action = 'answer') / SUM(action = 'show') DESC,
question_id ASC
LIMIT 1;
There are no language-specific implementation differences here because the solution is purely SQL-based. The same query works regardless of whether the surrounding environment is Python, Go, Java, or another language supported by LeetCode database problems.
One subtle implementation detail is that MySQL treats boolean expressions as integers during arithmetic operations. This allows concise expressions like:
SUM(action = 'answer')
instead of a longer CASE WHEN statement.
Worked Examples
Example 1
Input table:
| id | action | question_id | answer_id | q_num | timestamp |
|---|---|---|---|---|---|
| 5 | show | 285 | null | 1 | 123 |
| 5 | answer | 285 | 124124 | 1 | 124 |
| 5 | show | 369 | null | 2 | 125 |
| 5 | skip | 369 | null | 2 | 126 |
Step 1: Group by question_id
We form two groups:
| question_id | Rows |
|---|---|
| 285 | show, answer |
| 369 | show, skip |
Step 2: Count answers and shows
| question_id | answer count | show count |
|---|---|---|
| 285 | 1 | 1 |
| 369 | 0 | 1 |
Step 3: Compute answer rate
| question_id | answer rate |
|---|---|
| 285 | 1 / 1 = 1.0 |
| 369 | 0 / 1 = 0.0 |
Step 4: Sort results
Sorted by:
- answer rate descending
- question_id ascending
| question_id | answer rate |
|---|---|
| 285 | 1.0 |
| 369 | 0.0 |
Final Answer
| survey_log |
|---|
| 285 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Aggregation is linear, sorting grouped results dominates |
| Space | O(k) | Stores grouped aggregates for k unique questions |
The database scans all rows once during grouping, which requires linear time relative to the number of records.
After aggregation, the grouped results are sorted. If there are k unique questions, sorting costs O(k log k) time.
The database engine also stores aggregate information for each distinct question, which requires O(k) additional space.
Test Cases
# Example case from the problem
assert solve([
("show", 285),
("answer", 285),
("show", 369),
("skip", 369)
]) == 285 # higher answer rate wins
# Single question fully answered
assert solve([
("show", 1),
("answer", 1)
]) == 1 # rate is 1.0
# Multiple questions with same rate
assert solve([
("show", 1),
("answer", 1),
("show", 2),
("answer", 2)
]) == 1 # smaller question_id wins tie
# Question shown but never answered
assert solve([
("show", 10),
("skip", 10)
]) == 10 # rate is 0.0
# Different answer rates
assert solve([
("show", 1),
("answer", 1),
("show", 1),
("show", 2),
("answer", 2),
("show", 2)
]) == 1 # question 1 has rate 1/2, question 2 has rate 1/2, smaller id wins
# Duplicate rows
assert solve([
("show", 5),
("show", 5),
("answer", 5),
("answer", 5)
]) == 5 # duplicates count toward totals
# Large imbalance
assert solve([
("show", 1),
("show", 1),
("show", 1),
("answer", 1),
("show", 2),
("answer", 2)
]) == 2 # question 2 has better rate
Test Case Summary
| Test | Why |
|---|---|
| Basic example | Validates standard functionality |
| Single question | Smallest valid dataset |
| Equal rates | Ensures tie-breaking logic works |
| Never answered | Confirms zero-rate handling |
| Same fractional rates | Tests secondary sorting condition |
| Duplicate rows | Verifies duplicates are counted |
| Large imbalance | Ensures maximum ratio selection |
Edge Cases
Multiple Questions With the Same Answer Rate
This is one of the most important edge cases because many incorrect solutions only sort by answer rate and forget the tie-breaking rule.
For example, if questions 1 and 2 both have an answer rate of 1.0, the correct answer must be 1 because it has the smaller question_id.
The implementation handles this correctly by adding:
question_id ASC
as the secondary sorting condition.
Questions That Are Never Answered
A question may be shown many times but never answered. In that case, the answer count is zero and the answer rate becomes 0.
A buggy implementation might accidentally divide by the wrong count or exclude such questions entirely.
The aggregation logic correctly counts zero answers and still computes a valid rate.
Duplicate Rows
The problem explicitly states that duplicate rows may exist.
A common mistake would be attempting to remove duplicates before processing. That would produce incorrect counts because duplicates are intended to contribute to the totals.
The solution avoids deduplication entirely and counts every row exactly as it appears in the table.