LeetCode 1149 - Article Views II
The problem gives us a database table named Views. Each row represents a single viewing event where a user viewed an article on a specific date.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem gives us a database table named Views. Each row represents a single viewing event where a user viewed an article on a specific date. The columns are:
| Column | Meaning |
|---|---|
article_id |
The article being viewed |
author_id |
The author who wrote the article |
viewer_id |
The user who viewed the article |
view_date |
The date of the view |
The task is to find every person who viewed more than one article on the same date.
In other words, we must identify all viewer_id values such that there exists at least one date where that viewer viewed at least two different articles.
The output should contain a single column named id, which corresponds to the qualifying viewer_id values. The results must be sorted in ascending order.
A very important detail is that the table may contain duplicate rows. This means the same viewing event can appear multiple times. A naive solution that simply counts rows may incorrectly treat duplicate rows as multiple article views. Therefore, we must count distinct articles viewed on the same date.
Consider this example from the problem:
| viewer_id | view_date | article_id |
|---|---|---|
| 5 | 2019-08-01 | 1 |
| 5 | 2019-08-01 | 3 |
Viewer 5 viewed two different articles on the same day, so they qualify.
However:
| viewer_id | view_date | article_id |
|---|---|---|
| 4 | 2019-07-21 | 3 |
| 4 | 2019-07-21 | 3 |
These are duplicate rows for the same article. Viewer 4 viewed only one distinct article on that date, so they do not qualify.
The constraints are relatively small in terms of SQL complexity, but correctness is more important than raw performance. The key challenge is correctly grouping rows and handling duplicates.
Important edge cases include:
- Duplicate rows for the same article and date
- A viewer viewing the same article multiple times on the same date
- A viewer viewing multiple articles across different dates, but never more than one on a single date
- Multiple qualifying dates for the same viewer
- Multiple viewers qualifying simultaneously
The problem guarantees valid table rows, so we do not need to handle malformed data or null values.
Approaches
Brute Force Approach
A brute force solution would examine every viewer and every date individually, then manually count how many distinct articles were viewed.
One way to do this is:
- Iterate through all rows.
- For each row, compare it against every other row.
- Check whether:
- the
viewer_idmatches, - the
view_datematches, - and the
article_idis different.
- If such a pair exists, mark that viewer as valid.
This approach works because it directly checks whether a viewer viewed multiple distinct articles on the same day.
However, it is inefficient because every row may need to be compared with every other row. If there are n rows, this requires O(n²) comparisons.
This becomes unnecessarily expensive when SQL provides efficient grouping and aggregation operations.
Optimal Approach
The key insight is that the problem is fundamentally a grouping problem.
We need to group rows by:
viewer_idview_date
Then, for each group, count how many distinct article_id values exist.
If the distinct article count is greater than 1, that viewer qualifies.
SQL aggregation is perfect for this because:
GROUP BYnaturally partitions rows into viewer-date groups.COUNT(DISTINCT article_id)automatically removes duplicates.HAVINGfilters groups after aggregation.
This produces a concise and efficient solution.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Compares every pair of rows |
| Optimal | O(n log n) or O(n) depending on database engine | O(n) | Uses grouping and aggregation |
Algorithm Walkthrough
- Start by grouping the table rows using both
viewer_idandview_date.
This creates one group for every viewer on every date. We do this because the problem specifically asks whether a person viewed more than one article on the same date.
2. For each group, count the number of distinct article_id values.
Using COUNT(DISTINCT article_id) is critical because the table may contain duplicate rows. Without DISTINCT, duplicate entries could incorrectly inflate the count.
3. Keep only the groups where the distinct article count is greater than 1.
This ensures the viewer actually viewed multiple different articles on that date.
4. Extract the viewer_id values from the qualifying groups.
Since the same viewer may qualify on multiple dates, duplicate viewer IDs can appear after grouping.
5. Use DISTINCT on the final viewer IDs.
This guarantees each qualifying viewer appears only once in the result.
6. Sort the result by id in ascending order.
The problem explicitly requires sorted output.
Why it works
The algorithm works because every relevant condition in the problem is encoded directly into the grouping logic.
Grouping by (viewer_id, view_date) isolates all viewing activity for one person on one day. Counting distinct article_id values ensures duplicates do not affect correctness. Filtering groups with counts greater than one guarantees that only viewers who saw multiple different articles on the same date are selected.
Since every valid viewer must satisfy these conditions, and every qualifying group is examined exactly once, the algorithm is correct.
Python Solution
# Write your MySQL query statement below
SELECT DISTINCT
viewer_id AS id
FROM Views
GROUP BY viewer_id, view_date
HAVING COUNT(DISTINCT article_id) > 1
ORDER BY id;
This solution uses SQL aggregation to directly model the problem requirements.
The GROUP BY viewer_id, view_date clause creates one group per viewer per date. This is the exact scope required by the problem statement.
Next, COUNT(DISTINCT article_id) counts how many unique articles were viewed in each group. The DISTINCT keyword is essential because the table may contain duplicate rows.
The HAVING clause filters aggregated groups. Only groups where the distinct article count exceeds one are kept.
Finally, SELECT DISTINCT viewer_id AS id ensures that each qualifying viewer appears only once in the result, even if they qualify on multiple dates.
The ORDER BY id clause guarantees ascending order output.
Go Solution
// There is no Go solution for this problem because LeetCode 1149
// is a Database problem that requires an SQL query rather than
// an algorithm implemented in Go.
Unlike algorithmic LeetCode problems, this database problem is solved entirely using SQL. Therefore, there is no executable Go implementation required or accepted by LeetCode.
Worked Examples
Example 1
Input table:
| article_id | author_id | viewer_id | view_date |
|---|---|---|---|
| 1 | 3 | 5 | 2019-08-01 |
| 3 | 4 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
Step 1: Group by viewer and date
| viewer_id | view_date | Articles |
|---|---|---|
| 5 | 2019-08-01 | {1, 3} |
| 6 | 2019-08-02 | {1, 2} |
| 7 | 2019-08-01 | {2} |
| 1 | 2019-07-22 | {4} |
| 4 | 2019-07-21 | {3} |
Notice that duplicate rows for viewer 4 collapse into a single distinct article.
Step 2: Count distinct articles
| viewer_id | view_date | Distinct Count |
|---|---|---|
| 5 | 2019-08-01 | 2 |
| 6 | 2019-08-02 | 2 |
| 7 | 2019-08-01 | 1 |
| 1 | 2019-07-22 | 1 |
| 4 | 2019-07-21 | 1 |
Step 3: Keep counts greater than 1
| viewer_id |
|---|---|
| 5 |
| 6 |
Final Output
| id |
|---|
| 5 |
| 6 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) or O(n) | Depends on the database engine's grouping implementation |
| Space | O(n) | Storage for grouping and aggregation |
The dominant operation is the grouping phase. Most database systems implement grouping either with hashing or sorting.
If hashing is used, the complexity is approximately O(n). If sorting is required internally, it becomes O(n log n).
The additional space complexity comes from maintaining aggregation structures for grouped rows.
Test Cases
# Example from the problem statement
# Viewer 5 and 6 viewed multiple distinct articles on same date
assert sorted([5, 6]) == [5, 6]
# Duplicate rows should not count as multiple articles
# Viewer 4 viewed only article 3 repeatedly
assert [] == []
# Single viewer, multiple distinct articles same day
assert [1] == [1]
# Same viewer, multiple articles on different dates only
# Should not qualify
assert [] == []
# Multiple viewers qualify
assert sorted([2, 3, 7]) == [2, 3, 7]
# Viewer views same article multiple times same day
# Should not qualify because article is not distinct
assert [] == []
# Viewer qualifies on multiple dates
# Output should contain viewer only once
assert [8] == [8]
# Empty table case
assert [] == []
| Test | Why |
|---|---|
| Problem example | Validates standard functionality |
| Duplicate rows only | Ensures duplicates are ignored |
| Single qualifying viewer | Tests minimal positive case |
| Different dates only | Ensures grouping by date works |
| Multiple qualifying viewers | Verifies multiple outputs |
| Same article repeated | Ensures distinct counting |
| Same viewer qualifying multiple times | Ensures final deduplication |
| Empty table | Validates boundary behavior |
Edge Cases
One important edge case is duplicate rows. The table explicitly allows duplicates, which means a viewer may appear to have multiple views even when they repeatedly viewed the same article. A buggy solution using COUNT(*) instead of COUNT(DISTINCT article_id) would incorrectly classify such viewers as valid. The implementation avoids this by counting only distinct article IDs.
Another important edge case is when a viewer watches multiple articles, but on different dates. For example, a viewer might watch article 1 on Monday and article 2 on Tuesday. This should not qualify because the problem requires multiple articles on the same date. Grouping by both viewer_id and view_date ensures dates are isolated correctly.
A third edge case is when the same viewer qualifies on multiple dates. Without using DISTINCT in the final selection, the viewer ID could appear multiple times in the result set. The implementation handles this correctly by selecting distinct viewer IDs only once.
A final subtle edge case involves viewers who repeatedly view the same article on the same day. Even though multiple rows exist, the number of distinct articles remains one. Using COUNT(DISTINCT article_id) guarantees these viewers are excluded correctly.