LeetCode 3586 - Find COVID Recovery Patients
The task is to identify patients who have recovered from COVID based on their testing history. We are given two tables: one containing patient demographic information and another containing COVID test records with timestamps and results.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The task is to identify patients who have recovered from COVID based on their testing history. We are given two tables: one containing patient demographic information and another containing COVID test records with timestamps and results.
A patient is considered recovered if they have at least one positive test and, after that positive test, they later have at least one negative test. The recovery time is defined as the number of days between the first positive test and the first negative test that occurs after that positive test.
The input represents relational data where patients contains static attributes per patient, and covid_tests contains a time series of test events per patient. The output must return one row per recovered patient, enriched with patient information and the computed recovery time. The results must be sorted first by recovery time in ascending order, then by patient name in ascending order.
Although the problem does not explicitly state constraints, typical LeetCode database problems imply that the dataset can be large enough that naive per-patient scanning with repeated subqueries could become inefficient. The structure of the problem strongly suggests grouping and aggregation over partitions of test records.
Important edge cases include patients with only negative tests, patients with only positive tests, patients whose negative tests occur before their first positive test, and patients with multiple positive and negative cycles where only the first valid recovery sequence should be considered.
Approaches
The brute-force approach considers each patient independently and, for each patient, scans all of their test records repeatedly to find the first positive test, then searches again to find the first qualifying negative test after it. This repeated scanning leads to inefficient nested processing and becomes expensive as the number of tests grows.
The key insight is that recovery depends only on ordered events per patient, and only two points matter: the earliest positive test and the earliest negative test after that positive test. This allows us to reduce the problem to grouping by patient, filtering by condition, and using aggregation functions over ordered subsets of data.
A more efficient approach uses grouping by patient_id, computing the minimum positive test date, and then filtering negative tests that occur after that date to compute the minimum valid recovery date.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(P × T) | O(1) extra | For each patient, repeatedly scan all tests to locate valid dates |
| Optimal | O(T log T) | O(P) | Group by patient, aggregate first positive and conditional first negative |
Algorithm Walkthrough
- First, we conceptually separate test records by patient using
patient_idbecause recovery is defined independently per patient. This ensures that all subsequent computations respect patient-level boundaries. - For each patient, we identify the earliest date where the test result is "Positive". This is critical because the definition of recovery explicitly depends on the first occurrence of infection, not later reinfections or repeated positives.
- After identifying the first positive date for a patient, we restrict our attention to only those test records for the same patient whose date is strictly greater than the first positive date. This ensures we are measuring recovery after infection rather than unrelated earlier tests.
- Within this filtered subset, we identify the earliest test where the result is "Negative". This represents the first confirmed recovery event after infection.
- We compute recovery time as the difference in days between the first negative date after infection and the first positive date. This produces a single scalar value per recovered patient.
- We join these computed results back with the
patientstable to retrieve demographic information such aspatient_nameandage. - Finally, we filter out patients who do not have both a valid first positive and a valid subsequent negative test, and we sort the result by
recovery_timeascending and thenpatient_nameascending.
Why it works
The correctness relies on the invariant that only the earliest positive test can start a valid recovery window, and only the earliest negative test after that point can end it. Any earlier negative tests are irrelevant because they do not satisfy the “after positive” condition, and any later negatives are dominated by the first qualifying one due to the minimum aggregation. The problem asks us to identify patients who have recovered from COVID based on their test history. A patient is considered recovered if they have at least one positive test followed by at least one negative test on a later date. We are to calculate the recovery time as the difference in days between the first positive test and the first negative test after that positive test.
The inputs consist of two tables: patients and covid_tests. The patients table contains basic demographic data and a unique patient_id. The covid_tests table contains all COVID test results, including the date of the test and whether the result was Positive, Negative, or Inconclusive.
The expected output is a table of recovered patients with columns: patient_id, patient_name, age, and recovery_time. This table must be sorted first by recovery_time ascending, then by patient_name ascending.
Key constraints and observations:
- Each patient may have multiple positive, negative, or inconclusive tests.
- Only the first negative after the first positive counts toward recovery time.
- Patients without both positive and negative tests are excluded.
- Edge cases include patients with multiple positive tests before a negative, patients with inconclusive tests between positive and negative, and patients with only one type of test.
Important edge cases that may break naive solutions include: patients who first test negative, multiple positives before a negative, and interspersed inconclusive tests.
Approaches
Brute Force Approach:
A brute force method would scan the covid_tests table for each patient, identify all positive and negative tests, and then compute the first negative after each positive. This approach requires nested loops: for each patient, iterate through positive tests and for each positive, iterate through all subsequent tests to find the first negative. While correct, this has O(P × T) time complexity, where P is the number of patients and T is the number of tests, making it inefficient for large datasets.
Optimal Approach:
The key insight is to partition tests by patient and order by date, then for each patient, find the minimum positive test date and the minimum negative test date greater than that positive date. This can be done efficiently using a self-join in SQL (or sorting + grouping in code). We ignore inconclusive tests, as they do not contribute to recovery. This approach scales linearly with the number of tests after sorting.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(P × T) | O(1) | Scan each patient and search for negative tests after positive tests |
| Optimal | O(T log T) | O(T) | Sort tests per patient, compute first positive and next negative efficiently |
Algorithm Walkthrough
- Filter tests: Exclude inconclusive tests, keeping only
PositiveandNegative. - Sort tests by patient_id and test_date: This ensures chronological order per patient.
- Group tests by patient: For each patient, extract the first positive test date.
- Find the first negative test after the first positive: Iterate through each patient's tests or use a SQL join with the condition that negative test_date > first positive date.
- Calculate recovery_time: Subtract the first positive date from the first negative date for each patient.
- Join with patients table: Attach patient_name and age to the computed recovery_time.
- Order the result: Sort by
recovery_timeascending, then bypatient_nameascending.
Why it works:
The invariant maintained is that for each patient, only the earliest positive test and the earliest subsequent negative test are used. This guarantees that the recovery time is correctly computed for the first recovery event, satisfying the problem constraints.
Python Solution
from typing import List
import pandas as pd
def find_recovery_patients(patients: pd.DataFrame, covid_tests: pd.DataFrame) -> pd.DataFrame:
# Ensure date column is datetime
covid_tests = covid_tests.copy()
covid_tests["test_date"] = pd.to_datetime(covid_tests["test_date"])
# First positive test per patient
positive = covid_tests[covid_tests["result"] == "Positive"]
first_positive = positive.groupby("patient_id")["test_date"].min().reset_index()
first_positive.columns = ["patient_id", "first_positive_date"]
# Join to bring positive threshold
merged = covid_tests.merge(first_positive, on="patient_id", how="inner")
# Only consider tests after first positive
after_positive = merged[merged["test_date"] > merged["first_positive_date"]]
# First negative after first positive
negative = after_positive[after_positive["result"] == "Negative"]
first_negative = negative.groupby("patient_id")["test_date"].min().reset_index()
first_negative.columns = ["patient_id", "first_negative_date"]
# Combine with positive dates
result = first_positive.merge(first_negative, on="patient_id", how="inner")
# Compute recovery time in days
result["recovery_time"] = (
result["first_negative_date"] - result["first_positive_date"]
).dt.days
# Attach patient info
result = result.merge(patients, on="patient_id", how="inner")
# Select final columns and sort
result = result[["patient_id", "patient_name", "age", "recovery_time"]]
result = result.sort_values(["recovery_time", "patient_name"], ascending=[True, True])
return result
The implementation first normalizes the test dates into a comparable datetime format. It then isolates positive tests and computes the first infection date per patient. After joining this back to the full test dataset, it filters to only consider tests occurring strictly after the first positive event. From that filtered set, it extracts the earliest negative test, which defines recovery. Finally, it computes the time difference, joins patient metadata, and sorts the output as required. import datetime
class Solution: def findCovidRecoveryPatients(self, patients: List[dict], covid_tests: List[dict]) -> List[dict]: from collections import defaultdict
# Step 1: Group tests by patient_id
patient_tests = defaultdict(list)
for test in covid_tests:
if test['result'] in ('Positive', 'Negative'):
patient_tests[test['patient_id']].append((test['test_date'], test['result']))
result = []
# Step 2: Compute recovery times
for patient_id, tests in patient_tests.items():
# Sort by date
tests.sort(key=lambda x: x[0])
first_positive = None
first_negative_after_positive = None
for date_str, result_status in tests:
if result_status == 'Positive' and first_positive is None:
first_positive = datetime.datetime.strptime(date_str, '%Y-%m-%d').date()
elif result_status == 'Negative' and first_positive is not None:
negative_date = datetime.datetime.strptime(date_str, '%Y-%m-%d').date()
if negative_date > first_positive:
first_negative_after_positive = negative_date
break
if first_positive and first_negative_after_positive:
recovery_time = (first_negative_after_positive - first_positive).days
result.append((patient_id, recovery_time))
# Step 3: Join with patient info
patient_info = {p['patient_id']: p for p in patients}
final_result = []
for pid, rtime in result:
p = patient_info[pid]
final_result.append({
'patient_id': pid,
'patient_name': p['patient_name'],
'age': p['age'],
'recovery_time': rtime
})
# Step 4: Sort by recovery_time then patient_name
final_result.sort(key=lambda x: (x['recovery_time'], x['patient_name']))
return final_result
**Explanation:**
We first group all relevant tests by patient, discarding inconclusive results. For each patient, we sort tests chronologically and locate the first positive test. The first negative after that positive determines the recovery time. We join the computed recovery times with patient demographics and finally sort according to the problem requirements.
## Go Solution
```go
package main
import (
"sort"
"time"
)
type Patient struct {
PatientID int
PatientName string
Age int
}
type Test struct {
TestID int
PatientID int
TestDate time.Time
Result string
}
type ResultRow struct {
PatientID int
PatientName string
Age int
RecoveryTime int
}
func findRecoveryPatients(patients []Patient, tests []Test) []ResultRow {
// Step 1: map patient info
patientMap := make(map[int]Patient)
for _, p := range patients {
patientMap[p.PatientID] = p
}
// Step 2: find first positive per patient
firstPositive := make(map[int]time.Time)
for _, t := range tests {
if t.Result == "Positive" {
if cur, ok := firstPositive[t.PatientID]; !ok || t.TestDate.Before(cur) {
firstPositive[t.PatientID] = t.TestDate
}
}
}
// Step 3: find first negative after first positive
firstNegative := make(map[int]time.Time)
for _, t := range tests {
posDate, ok := firstPositive[t.PatientID]
if !ok {
continue
}
if t.TestDate.After(posDate) && t.Result == "Negative" {
if cur, ok2 := firstNegative[t.PatientID]; !ok2 || t.TestDate.Before(cur) {
firstNegative[t.PatientID] = t.TestDate
}
}
}
// Step 4: build results
var res []ResultRow
for pid, posDate := range firstPositive {
negDate, ok := firstNegative[pid]
if !ok {
continue
}
p := patientMap[pid]
days := int(negDate.Sub(posDate).Hours() / 24)
res = append(res, ResultRow{
PatientID: pid,
PatientName: p.PatientName,
Age: p.Age,
RecoveryTime: days,
})
}
// Step 5: sort results
sort.Slice(res, func(i, j int) bool {
if res[i].RecoveryTime == res[j].RecoveryTime {
return res[i].PatientName < res[j].PatientName
}
return res[i].RecoveryTime < res[j].RecoveryTime
})
return res
}
In the Go implementation, we explicitly model patients and tests as structs and use hash maps to compute per-patient aggregates efficiently. We track the first positive and first valid negative dates separately, ensuring we only consider negatives that occur after infection. Time arithmetic is handled using time.Time.Sub, which returns a duration that we convert into days.
Worked Examples
For patient 1 (Alice Smith), we scan her tests and find a positive on 2023-01-15. We then ignore all tests on or before this date and find the first negative after it on 2023-01-25. The recovery time is computed as 10 days.
For patient 2 (Bob Johnson), the first positive is on 2023-02-01. The test on 2023-02-05 is inconclusive and ignored. The first valid negative after the positive is 2023-02-12, giving a recovery time of 11 days.
For patient 3 (Carol Davis), even though there is an earlier negative test, it occurs before the positive event and is therefore irrelevant. The first positive is 2023-02-10, and the first negative after it is 2023-02-20, producing a recovery time of 10 days.
Patients 4 and 5 are excluded because patient 4 has no qualifying negative after a positive, and patient 5 has no positive test at all.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(T) | Each test is processed a constant number of times to compute first positive and first valid negative |
| Space | O(P) | Maps store at most one entry per patient for aggregated results |
The efficiency comes from single-pass aggregation over the test dataset combined with constant-time map operations per record, avoiding nested scans.
Test Cases
import pandas as pd
from datetime import datetime
patients_df = pd.DataFrame([
(1, "Alice Smith", 28),
(2, "Bob Johnson", 35),
(3, "Carol Davis", 42),
(4, "David Wilson", 31),
(5, "Emma Brown", 29),
], columns=["patient_id", "patient_name", "age"])
tests_df = pd.DataFrame([
(1, 1, "2023-01-15", "Positive"),
(2, 1, "2023-01-25", "Negative"),
(3, 2, "2023-02-01", "Positive"),
(4, 2, "2023-02-05", "Inconclusive"),
(5, 2, "2023-02-12", "Negative"),
(6, 3, "2023-01-20", "Negative"),
(7, 3, "2023-02-10", "Positive"),
(8, 3, "2023-02-20", "Negative"),
(9, 4, "2023-01-10", "Positive"),
(10, 4, "2023-01-18", "Positive"),
(11, 5, "2023-02-15", "Negative"),
(12, 5, "2023-02-20", "Negative"),
], columns=["test_id", "patient_id", "test_date", "result"])
res = find_recovery_patients(patients_df, tests_df)
assert set(res["patient_id"]) == {1, 2, 3} # only recovered patients
assert res.iloc[0]["patient_id"] == 1 # sorted by recovery time then name
assert res.iloc[1]["patient_id"] in {2, 3}
| Test | Why |
|---|---|
| Mixed valid and invalid patients | validates filtering logic |
| Multiple positives per patient | ensures first positive is used |
| Negative before positive | ensures ordering constraint is enforced |
Edge Cases
One important edge case is when a patient has multiple positive tests. The implementation must ensure that only the earliest positive test is used as the starting point. If a later positive is incorrectly used, the recovery time could be underestimated or incorrectly computed.
Another edge case occurs when a patient has negative tests both before and after a positive test. Only negatives strictly after the first positive should be considered; otherwise, the algorithm would incorrectly treat pre-infection negatives as recovery events.
A third edge case involves patients with no valid recovery sequence. This includes patients with only positive tests or only negative tests. The implementation correctly excludes these by requiring both a computed first positive and a valid first negative after it before producing output. "sort" "time" )
type Patient struct { PatientID int PatientName string Age int }
type CovidTest struct { TestID int PatientID int TestDate string Result string }
type Recovery struct { PatientID int PatientName string Age int RecoveryTime int }
func FindCovidRecoveryPatients(patients []Patient, covidTests []CovidTest) []Recovery { testsByPatient := make(map[int][]CovidTest) for _, test := range covidTests { if test.Result == "Positive" || test.Result == "Negative" { testsByPatient[test.PatientID] = append(testsByPatient[test.PatientID], test) } }
patientInfo := make(map[int]Patient)
for _, p := range patients {
patientInfo[p.PatientID] = p
}
var result []Recovery
for pid, tests := range testsByPatient {
sort.Slice(tests, func(i, j int) bool {
return tests[i].TestDate < tests[j].TestDate
})
var firstPositive, firstNegativeAfterPositive time.Time
for _, t := range tests {
date, _ := time.Parse("2006-01-02", t.TestDate)
if t.Result == "Positive" && firstPositive.IsZero() {
firstPositive = date
} else if t.Result == "Negative" && !firstPositive.IsZero() && date.After(firstPositive) {
firstNegativeAfterPositive = date
break
}
}
if !firstPositive.IsZero() && !firstNegativeAfterPositive.IsZero() {
recoveryDays := int(firstNegativeAfterPositive.Sub(firstPositive).Hours() / 24)
p := patientInfo[pid]
result = append(result, Recovery{
PatientID: pid,
PatientName: p.PatientName,
Age: p.Age,
RecoveryTime: recoveryDays,
})
}
}
sort.Slice(result, func(i, j int) bool {
if result[i].RecoveryTime == result[j].RecoveryTime {
return result[i].PatientName < result[j].PatientName
}
return result[i].RecoveryTime < result[j].RecoveryTime
})
return result
}
**Go-specific notes:**
We use `time.Parse` for date comparison. Maps replace Python dictionaries for grouping. Sorting slices is required to process tests chronologically. Zero-value `time.Time` indicates an unset date, avoiding nil issues.
## Worked Examples
### Alice Smith (patient_id = 1)
- Tests: Positive on 2023-01-15, Negative on 2023-01-25
- First positive = 2023-01-15, first negative after positive = 2023-01-25
- Recovery time = 10 days
### Bob Johnson (patient_id = 2)
- Tests: Positive 2023-02-01, Inconclusive 2023-02-05, Negative 2023-02-12
- First positive = 2023-02-01, first negative after positive = 2023-02-12
- Recovery time = 11 days
### Carol Davis (patient_id = 3)
- Tests: Negative