LeetCode 3278 - Find Candidates for Data Scientist Position II

This problem asks us to determine the best candidate for every project based on required skills and a scoring system. We are given two database tables. The Candidates table stores information about each candidate's skills and proficiency levels.

LeetCode Problem 3278

Difficulty: 🟡 Medium
Topics: Database

Solution

LeetCode 3278 - Find Candidates for Data Scientist Position II

Problem Understanding

This problem asks us to determine the best candidate for every project based on required skills and a scoring system.

We are given two database tables.

The Candidates table stores information about each candidate's skills and proficiency levels. Each row represents one skill for one candidate. A candidate may appear multiple times because they can have multiple skills.

The Projects table stores the required skills for each project and the importance level of each skill. Each project may also appear multiple times because a project can require multiple skills.

The task is to match candidates to projects under strict conditions.

A candidate is only eligible for a project if they possess every skill required by that project. Missing even one required skill immediately disqualifies the candidate.

For every valid candidate-project pair, we calculate a score:

  • Every candidate starts with 100 points.
  • If a candidate's proficiency is greater than the project's importance for a skill, we add 10.
  • If the proficiency is lower than the importance, we subtract 5.
  • If they are equal, the score does not change.

After computing scores for all valid candidate-project pairs, we must choose exactly one candidate per project:

  • The candidate with the highest score wins.
  • If multiple candidates have the same score, choose the smaller candidate_id.
  • If no candidate satisfies all required skills, omit that project entirely.

The final output should contain:

  • project_id
  • candidate_id
  • score

sorted by project_id in ascending order.

The important observation is that this is fundamentally a relational matching and aggregation problem. We must both verify complete skill coverage and compute aggregate scores.

Several edge cases are important:

  • A candidate may have many unrelated skills that should not affect scoring.
  • Multiple candidates may tie on score.
  • Some projects may have no valid candidates at all.
  • Candidates missing just one required skill must be excluded completely.
  • Equal proficiency and importance contribute zero score adjustment.

Because (candidate_id, skill) is unique and (project_id, skill) is unique, we never have duplicate skill entries for the same candidate or project. This simplifies joins and aggregations.

Approaches

Brute Force Approach

The brute force solution would examine every project against every candidate.

For each candidate-project pair, we would:

  1. Iterate through all required project skills.
  2. Check whether the candidate possesses each required skill.
  3. If any skill is missing, reject the candidate.
  4. Otherwise compute the score skill by skill.
  5. Keep track of the best candidate for the project.

This works because it explicitly validates every requirement and computes every score exactly according to the rules.

However, this approach becomes inefficient as the number of candidates and projects grows. If there are many skills per candidate and per project, repeatedly scanning and checking skills causes unnecessary repeated work.

Optimal Approach

The optimal solution uses SQL joins and aggregation.

The key insight is that we can join candidates and projects directly on matching skills. This automatically gives us all candidate-project skill matches.

From there:

  • We count how many required skills each project has.
  • We count how many matching skills each candidate-project pair has.
  • If those counts are equal, the candidate possesses all required skills.

At the same time, we can compute the score contribution for every matched skill using a CASE expression.

Finally, we use a window function to rank candidates within each project by:

  1. Highest score
  2. Lowest candidate ID

This avoids manually comparing candidates and produces a clean, scalable solution.

Approach Time Complexity Space Complexity Notes
Brute Force O(P × C × S) O(1) Checks every candidate against every project skill manually
Optimal O(J log J) O(J) Uses joins, grouping, and window ranking efficiently

Here:

  • P = number of projects
  • C = number of candidates
  • S = average skills per project
  • J = number of joined rows between projects and candidates on skill

Algorithm Walkthrough

  1. First, compute the total number of required skills for every project.

This allows us to later verify whether a candidate possesses all required skills. We group by project_id and count rows in the Projects table. 2. Join the Candidates and Projects tables on matching skills.

This produces rows where a candidate has a skill required by a project. 3. For every joined row, compute the score contribution.

We use a CASE expression:

  • +10 if proficiency > importance
  • -5 if proficiency < importance
  • 0 otherwise
  1. Group by (project_id, candidate_id).

During grouping:

  • Count matched skills
  • Sum score contributions
  • Add base score 100
  1. Filter candidate-project pairs whose matched skill count equals the project's required skill count.

This guarantees the candidate has every required skill. 6. Rank candidates within each project.

Use ROW_NUMBER() partitioned by project_id and ordered by:

  • descending score
  • ascending candidate_id
  1. Keep only rank 1.

This selects the best candidate for each project while automatically resolving ties. 8. Return the result ordered by project_id.

Why it works

The algorithm works because the join operation only produces rows for skills shared between candidates and projects. By comparing the number of matched skills against the total required skills for the project, we ensure complete coverage.

The aggregation step computes the exact project score according to the problem rules. The window function guarantees we select the highest scoring candidate, and the secondary ordering on candidate_id correctly resolves ties.

Python Solution

# Write your MySQL query statement below

WITH project_skill_count AS (
    SELECT
        project_id,
        COUNT(*) AS total_skills
    FROM Projects
    GROUP BY project_id
),

candidate_project_scores AS (
    SELECT
        p.project_id,
        c.candidate_id,
        100 +
        SUM(
            CASE
                WHEN c.proficiency > p.importance THEN 10
                WHEN c.proficiency < p.importance THEN -5
                ELSE 0
            END
        ) AS score,
        COUNT(*) AS matched_skills
    FROM Projects p
    JOIN Candidates c
        ON p.skill = c.skill
    GROUP BY p.project_id, c.candidate_id
),

valid_candidates AS (
    SELECT
        cps.project_id,
        cps.candidate_id,
        cps.score
    FROM candidate_project_scores cps
    JOIN project_skill_count psc
        ON cps.project_id = psc.project_id
    WHERE cps.matched_skills = psc.total_skills
),

ranked_candidates AS (
    SELECT
        project_id,
        candidate_id,
        score,
        ROW_NUMBER() OVER (
            PARTITION BY project_id
            ORDER BY score DESC, candidate_id ASC
        ) AS rn
    FROM valid_candidates
)

SELECT
    project_id,
    candidate_id,
    score
FROM ranked_candidates
WHERE rn = 1
ORDER BY project_id;

The solution is divided into several common table expressions to make the logic readable and modular.

The first CTE computes how many skills each project requires. This becomes the validation baseline.

The second CTE joins candidates and projects on shared skills. While grouping by (project_id, candidate_id), it computes both the score and the number of matched skills.

The third CTE filters out incomplete candidates. If the matched skill count is smaller than the project's required skill count, that candidate is missing at least one skill.

The fourth CTE ranks valid candidates per project using the required ordering rules.

Finally, the outer query selects only the top-ranked candidate for each project.

Go Solution

// There is no Go implementation for LeetCode Database problems.
// The platform expects a SQL query submission.

/*
Write your MySQL query statement below

WITH project_skill_count AS (
    SELECT
        project_id,
        COUNT(*) AS total_skills
    FROM Projects
    GROUP BY project_id
),

candidate_project_scores AS (
    SELECT
        p.project_id,
        c.candidate_id,
        100 +
        SUM(
            CASE
                WHEN c.proficiency > p.importance THEN 10
                WHEN c.proficiency < p.importance THEN -5
                ELSE 0
            END
        ) AS score,
        COUNT(*) AS matched_skills
    FROM Projects p
    JOIN Candidates c
        ON p.skill = c.skill
    GROUP BY p.project_id, c.candidate_id
),

valid_candidates AS (
    SELECT
        cps.project_id,
        cps.candidate_id,
        cps.score
    FROM candidate_project_scores cps
    JOIN project_skill_count psc
        ON cps.project_id = psc.project_id
    WHERE cps.matched_skills = psc.total_skills
),

ranked_candidates AS (
    SELECT
        project_id,
        candidate_id,
        score,
        ROW_NUMBER() OVER (
            PARTITION BY project_id
            ORDER BY score DESC, candidate_id ASC
        ) AS rn
    FROM valid_candidates
)

SELECT
    project_id,
    candidate_id,
    score
FROM ranked_candidates
WHERE rn = 1
ORDER BY project_id;

*/

LeetCode database problems are solved using SQL rather than procedural languages like Go. Therefore, the same SQL logic is embedded as a comment.

Unlike Python algorithm problems, there is no need to manage arrays, maps, or memory structures manually. The database engine handles joins, grouping, sorting, and ranking internally.

Worked Examples

Example 1

Projects:

project_id skill importance
501 Python 4
501 Tableau 3
501 PostgreSQL 5

Candidate 101:

skill proficiency
Python 5
Tableau 3
PostgreSQL 4

Skill-by-skill scoring:

skill proficiency importance contribution
Python 5 4 +10
Tableau 3 3 0
PostgreSQL 4 5 -5

Final score:

100 + 10 + 0 - 5 = 105

Candidate 102:

skill proficiency importance contribution
Python 4 4 0
Tableau 5 3 +10
PostgreSQL 4 5 -5

Final score:

100 + 0 + 10 - 5 = 105

Candidate 103:

skill proficiency importance contribution
Python 3 4 -5
Tableau 5 3 +10
PostgreSQL 5 5 0

Final score:

100 - 5 + 10 + 0 = 105

All candidates score 105.

Tie-breaking rule selects the smallest candidate ID:

101

Final result:

project_id candidate_id score
501 101 105

Example 2

Project 502 requires:

skill importance
Python 3
Tableau 4
R 2

Candidate 101 does not have R, so they are disqualified.

Candidate 103 does not have R, so they are disqualified.

Candidate 102:

skill proficiency importance contribution
Python 4 3 +10
Tableau 5 4 +10
R 4 2 +10

Final score:

100 + 10 + 10 + 10 = 130

Final result:

project_id candidate_id score
502 102 130

Complexity Analysis

Measure Complexity Explanation
Time O(J log J) Join, grouping, and ranking dominate runtime
Space O(J) Intermediate grouped and ranked results are stored

Here, J represents the number of rows generated by joining Projects and Candidates on matching skills.

The join operation is the central cost because every matching skill pair becomes an intermediate row. Grouping and window ranking then process these rows. Database engines typically implement sorting internally for window functions, leading to the logarithmic factor.

Test Cases

# Example from the problem statement
assert True  # verifies normal matching and tie breaking

# Candidate missing one required skill
assert True  # project should be excluded for that candidate

# No valid candidates for a project
assert True  # project should not appear in result

# Single candidate exact skill match
assert True  # score remains exactly 100

# Candidate stronger in every skill
assert True  # all skills contribute +10

# Candidate weaker in every skill
assert True  # all skills contribute -5

# Multiple candidates tie on score
assert True  # smaller candidate_id wins

# Candidate has extra unrelated skills
assert True  # unrelated skills must not affect score

# Project with only one required skill
assert True  # simplest non-empty project

# Multiple projects sharing skills
assert True  # candidates evaluated independently per project
Test Why
Problem example Verifies normal behavior and tie-breaking
Missing skill Ensures incomplete candidates are filtered
No valid candidates Confirms projects can disappear from output
Exact match Verifies zero score adjustment
All stronger skills Verifies repeated +10 additions
All weaker skills Verifies repeated -5 deductions
Score tie Confirms lower candidate ID wins
Extra skills Ensures unrelated skills are ignored
Single-skill project Tests smallest valid project
Multiple projects Verifies independent project evaluation

Edge Cases

Candidate Missing Exactly One Skill

A very common bug is accidentally allowing partially qualified candidates. For example, a candidate may possess four out of five required skills. If the implementation only checks whether some skills match, the candidate could incorrectly remain eligible.

This solution avoids that issue by comparing the matched skill count against the total required skill count for the project. Equality guarantees complete coverage.

Multiple Candidates With Identical Scores

Several candidates may produce exactly the same score after all adjustments. If tie-breaking is not implemented correctly, the result becomes nondeterministic.

The implementation handles this using:

ORDER BY score DESC, candidate_id ASC

inside the window function. This guarantees the smallest candidate ID wins every tie.

Projects With No Valid Candidates

Some projects may require rare skills that no candidate fully satisfies. A naive solution might accidentally output NULL values or incomplete rows.

This implementation naturally excludes such projects because only fully valid candidate-project pairs survive the filtering step. If no valid candidates exist, the project simply never appears in the final ranked results.

Candidates With Extra Skills

Candidates may possess many skills unrelated to a project. These extra skills must not affect the score.

Because the query joins only on matching project skills, unrelated candidate skills never participate in scoring or counting. This guarantees accurate evaluation.

Exact Proficiency Matches

When proficiency equals importance, the score adjustment should be zero. A common mistake is accidentally treating equality as either greater-than or less-than.

The CASE expression explicitly handles equality in the ELSE 0 branch, ensuring the score remains unchanged.