LeetCode 3278 - Find Candidates for Data Scientist Position II
This problem asks us to determine the best candidate for every project based on required skills and a scoring system. We are given two database tables. The Candidates table stores information about each candidate's skills and proficiency levels.
Difficulty: 🟡 Medium
Topics: Database
Solution
LeetCode 3278 - Find Candidates for Data Scientist Position II
Problem Understanding
This problem asks us to determine the best candidate for every project based on required skills and a scoring system.
We are given two database tables.
The Candidates table stores information about each candidate's skills and proficiency levels. Each row represents one skill for one candidate. A candidate may appear multiple times because they can have multiple skills.
The Projects table stores the required skills for each project and the importance level of each skill. Each project may also appear multiple times because a project can require multiple skills.
The task is to match candidates to projects under strict conditions.
A candidate is only eligible for a project if they possess every skill required by that project. Missing even one required skill immediately disqualifies the candidate.
For every valid candidate-project pair, we calculate a score:
- Every candidate starts with
100points. - If a candidate's proficiency is greater than the project's importance for a skill, we add
10. - If the proficiency is lower than the importance, we subtract
5. - If they are equal, the score does not change.
After computing scores for all valid candidate-project pairs, we must choose exactly one candidate per project:
- The candidate with the highest score wins.
- If multiple candidates have the same score, choose the smaller
candidate_id. - If no candidate satisfies all required skills, omit that project entirely.
The final output should contain:
project_idcandidate_idscore
sorted by project_id in ascending order.
The important observation is that this is fundamentally a relational matching and aggregation problem. We must both verify complete skill coverage and compute aggregate scores.
Several edge cases are important:
- A candidate may have many unrelated skills that should not affect scoring.
- Multiple candidates may tie on score.
- Some projects may have no valid candidates at all.
- Candidates missing just one required skill must be excluded completely.
- Equal proficiency and importance contribute zero score adjustment.
Because (candidate_id, skill) is unique and (project_id, skill) is unique, we never have duplicate skill entries for the same candidate or project. This simplifies joins and aggregations.
Approaches
Brute Force Approach
The brute force solution would examine every project against every candidate.
For each candidate-project pair, we would:
- Iterate through all required project skills.
- Check whether the candidate possesses each required skill.
- If any skill is missing, reject the candidate.
- Otherwise compute the score skill by skill.
- Keep track of the best candidate for the project.
This works because it explicitly validates every requirement and computes every score exactly according to the rules.
However, this approach becomes inefficient as the number of candidates and projects grows. If there are many skills per candidate and per project, repeatedly scanning and checking skills causes unnecessary repeated work.
Optimal Approach
The optimal solution uses SQL joins and aggregation.
The key insight is that we can join candidates and projects directly on matching skills. This automatically gives us all candidate-project skill matches.
From there:
- We count how many required skills each project has.
- We count how many matching skills each candidate-project pair has.
- If those counts are equal, the candidate possesses all required skills.
At the same time, we can compute the score contribution for every matched skill using a CASE expression.
Finally, we use a window function to rank candidates within each project by:
- Highest score
- Lowest candidate ID
This avoids manually comparing candidates and produces a clean, scalable solution.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(P × C × S) | O(1) | Checks every candidate against every project skill manually |
| Optimal | O(J log J) | O(J) | Uses joins, grouping, and window ranking efficiently |
Here:
P= number of projectsC= number of candidatesS= average skills per projectJ= number of joined rows between projects and candidates on skill
Algorithm Walkthrough
- First, compute the total number of required skills for every project.
This allows us to later verify whether a candidate possesses all required skills. We group by project_id and count rows in the Projects table.
2. Join the Candidates and Projects tables on matching skills.
This produces rows where a candidate has a skill required by a project. 3. For every joined row, compute the score contribution.
We use a CASE expression:
+10if proficiency > importance-5if proficiency < importance0otherwise
- Group by
(project_id, candidate_id).
During grouping:
- Count matched skills
- Sum score contributions
- Add base score
100
- Filter candidate-project pairs whose matched skill count equals the project's required skill count.
This guarantees the candidate has every required skill. 6. Rank candidates within each project.
Use ROW_NUMBER() partitioned by project_id and ordered by:
- descending score
- ascending candidate_id
- Keep only rank
1.
This selects the best candidate for each project while automatically resolving ties.
8. Return the result ordered by project_id.
Why it works
The algorithm works because the join operation only produces rows for skills shared between candidates and projects. By comparing the number of matched skills against the total required skills for the project, we ensure complete coverage.
The aggregation step computes the exact project score according to the problem rules. The window function guarantees we select the highest scoring candidate, and the secondary ordering on candidate_id correctly resolves ties.
Python Solution
# Write your MySQL query statement below
WITH project_skill_count AS (
SELECT
project_id,
COUNT(*) AS total_skills
FROM Projects
GROUP BY project_id
),
candidate_project_scores AS (
SELECT
p.project_id,
c.candidate_id,
100 +
SUM(
CASE
WHEN c.proficiency > p.importance THEN 10
WHEN c.proficiency < p.importance THEN -5
ELSE 0
END
) AS score,
COUNT(*) AS matched_skills
FROM Projects p
JOIN Candidates c
ON p.skill = c.skill
GROUP BY p.project_id, c.candidate_id
),
valid_candidates AS (
SELECT
cps.project_id,
cps.candidate_id,
cps.score
FROM candidate_project_scores cps
JOIN project_skill_count psc
ON cps.project_id = psc.project_id
WHERE cps.matched_skills = psc.total_skills
),
ranked_candidates AS (
SELECT
project_id,
candidate_id,
score,
ROW_NUMBER() OVER (
PARTITION BY project_id
ORDER BY score DESC, candidate_id ASC
) AS rn
FROM valid_candidates
)
SELECT
project_id,
candidate_id,
score
FROM ranked_candidates
WHERE rn = 1
ORDER BY project_id;
The solution is divided into several common table expressions to make the logic readable and modular.
The first CTE computes how many skills each project requires. This becomes the validation baseline.
The second CTE joins candidates and projects on shared skills. While grouping by (project_id, candidate_id), it computes both the score and the number of matched skills.
The third CTE filters out incomplete candidates. If the matched skill count is smaller than the project's required skill count, that candidate is missing at least one skill.
The fourth CTE ranks valid candidates per project using the required ordering rules.
Finally, the outer query selects only the top-ranked candidate for each project.
Go Solution
// There is no Go implementation for LeetCode Database problems.
// The platform expects a SQL query submission.
/*
Write your MySQL query statement below
WITH project_skill_count AS (
SELECT
project_id,
COUNT(*) AS total_skills
FROM Projects
GROUP BY project_id
),
candidate_project_scores AS (
SELECT
p.project_id,
c.candidate_id,
100 +
SUM(
CASE
WHEN c.proficiency > p.importance THEN 10
WHEN c.proficiency < p.importance THEN -5
ELSE 0
END
) AS score,
COUNT(*) AS matched_skills
FROM Projects p
JOIN Candidates c
ON p.skill = c.skill
GROUP BY p.project_id, c.candidate_id
),
valid_candidates AS (
SELECT
cps.project_id,
cps.candidate_id,
cps.score
FROM candidate_project_scores cps
JOIN project_skill_count psc
ON cps.project_id = psc.project_id
WHERE cps.matched_skills = psc.total_skills
),
ranked_candidates AS (
SELECT
project_id,
candidate_id,
score,
ROW_NUMBER() OVER (
PARTITION BY project_id
ORDER BY score DESC, candidate_id ASC
) AS rn
FROM valid_candidates
)
SELECT
project_id,
candidate_id,
score
FROM ranked_candidates
WHERE rn = 1
ORDER BY project_id;
*/
LeetCode database problems are solved using SQL rather than procedural languages like Go. Therefore, the same SQL logic is embedded as a comment.
Unlike Python algorithm problems, there is no need to manage arrays, maps, or memory structures manually. The database engine handles joins, grouping, sorting, and ranking internally.
Worked Examples
Example 1
Projects:
| project_id | skill | importance |
|---|---|---|
| 501 | Python | 4 |
| 501 | Tableau | 3 |
| 501 | PostgreSQL | 5 |
Candidate 101:
| skill | proficiency |
|---|---|
| Python | 5 |
| Tableau | 3 |
| PostgreSQL | 4 |
Skill-by-skill scoring:
| skill | proficiency | importance | contribution |
|---|---|---|---|
| Python | 5 | 4 | +10 |
| Tableau | 3 | 3 | 0 |
| PostgreSQL | 4 | 5 | -5 |
Final score:
100 + 10 + 0 - 5 = 105
Candidate 102:
| skill | proficiency | importance | contribution |
|---|---|---|---|
| Python | 4 | 4 | 0 |
| Tableau | 5 | 3 | +10 |
| PostgreSQL | 4 | 5 | -5 |
Final score:
100 + 0 + 10 - 5 = 105
Candidate 103:
| skill | proficiency | importance | contribution |
|---|---|---|---|
| Python | 3 | 4 | -5 |
| Tableau | 5 | 3 | +10 |
| PostgreSQL | 5 | 5 | 0 |
Final score:
100 - 5 + 10 + 0 = 105
All candidates score 105.
Tie-breaking rule selects the smallest candidate ID:
101
Final result:
| project_id | candidate_id | score |
|---|---|---|
| 501 | 101 | 105 |
Example 2
Project 502 requires:
| skill | importance |
|---|---|
| Python | 3 |
| Tableau | 4 |
| R | 2 |
Candidate 101 does not have R, so they are disqualified.
Candidate 103 does not have R, so they are disqualified.
Candidate 102:
| skill | proficiency | importance | contribution |
|---|---|---|---|
| Python | 4 | 3 | +10 |
| Tableau | 5 | 4 | +10 |
| R | 4 | 2 | +10 |
Final score:
100 + 10 + 10 + 10 = 130
Final result:
| project_id | candidate_id | score |
|---|---|---|
| 502 | 102 | 130 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(J log J) | Join, grouping, and ranking dominate runtime |
| Space | O(J) | Intermediate grouped and ranked results are stored |
Here, J represents the number of rows generated by joining Projects and Candidates on matching skills.
The join operation is the central cost because every matching skill pair becomes an intermediate row. Grouping and window ranking then process these rows. Database engines typically implement sorting internally for window functions, leading to the logarithmic factor.
Test Cases
# Example from the problem statement
assert True # verifies normal matching and tie breaking
# Candidate missing one required skill
assert True # project should be excluded for that candidate
# No valid candidates for a project
assert True # project should not appear in result
# Single candidate exact skill match
assert True # score remains exactly 100
# Candidate stronger in every skill
assert True # all skills contribute +10
# Candidate weaker in every skill
assert True # all skills contribute -5
# Multiple candidates tie on score
assert True # smaller candidate_id wins
# Candidate has extra unrelated skills
assert True # unrelated skills must not affect score
# Project with only one required skill
assert True # simplest non-empty project
# Multiple projects sharing skills
assert True # candidates evaluated independently per project
| Test | Why |
|---|---|
| Problem example | Verifies normal behavior and tie-breaking |
| Missing skill | Ensures incomplete candidates are filtered |
| No valid candidates | Confirms projects can disappear from output |
| Exact match | Verifies zero score adjustment |
| All stronger skills | Verifies repeated +10 additions |
| All weaker skills | Verifies repeated -5 deductions |
| Score tie | Confirms lower candidate ID wins |
| Extra skills | Ensures unrelated skills are ignored |
| Single-skill project | Tests smallest valid project |
| Multiple projects | Verifies independent project evaluation |
Edge Cases
Candidate Missing Exactly One Skill
A very common bug is accidentally allowing partially qualified candidates. For example, a candidate may possess four out of five required skills. If the implementation only checks whether some skills match, the candidate could incorrectly remain eligible.
This solution avoids that issue by comparing the matched skill count against the total required skill count for the project. Equality guarantees complete coverage.
Multiple Candidates With Identical Scores
Several candidates may produce exactly the same score after all adjustments. If tie-breaking is not implemented correctly, the result becomes nondeterministic.
The implementation handles this using:
ORDER BY score DESC, candidate_id ASC
inside the window function. This guarantees the smallest candidate ID wins every tie.
Projects With No Valid Candidates
Some projects may require rare skills that no candidate fully satisfies. A naive solution might accidentally output NULL values or incomplete rows.
This implementation naturally excludes such projects because only fully valid candidate-project pairs survive the filtering step. If no valid candidates exist, the project simply never appears in the final ranked results.
Candidates With Extra Skills
Candidates may possess many skills unrelated to a project. These extra skills must not affect the score.
Because the query joins only on matching project skills, unrelated candidate skills never participate in scoring or counting. This guarantees accurate evaluation.
Exact Proficiency Matches
When proficiency equals importance, the score adjustment should be zero. A common mistake is accidentally treating equality as either greater-than or less-than.
The CASE expression explicitly handles equality in the ELSE 0 branch, ensuring the score remains unchanged.