LeetCode 274 - H-Index
The problem gives an array called citations, where each element represents how many citations a research paper has received. Each index corresponds to one paper, and the value at that index is the number of times that paper has been cited.
Difficulty: 🟡 Medium
Topics: Array, Sorting, Counting Sort
Solution
LeetCode 274, H-Index
Problem Understanding
The problem gives an array called citations, where each element represents how many citations a research paper has received. Each index corresponds to one paper, and the value at that index is the number of times that paper has been cited.
We must compute the researcher's h-index. The h-index is defined as the largest integer h such that the researcher has at least h papers with at least h citations each.
This definition is very important because it combines both productivity and impact. A researcher with many papers but few citations will have a low h-index. Similarly, a researcher with one extremely famous paper but very few other papers may also have a limited h-index.
For example, consider:
[3,0,6,1,5]
If we sort this array in descending order:
[6,5,3,1,0]
We can check how many papers satisfy the h-index condition:
- At least 1 paper has at least 1 citation
- At least 2 papers have at least 2 citations
- At least 3 papers have at least 3 citations
- Not at least 4 papers have at least 4 citations
So the maximum valid value is 3.
The constraints are relatively small:
1 <= n <= 50000 <= citations[i] <= 1000
An O(n log n) solution is completely acceptable for these limits. However, the problem also hints at counting sort as a topic, which suggests there may be a linear-time solution as well.
Several edge cases are important:
- All citation counts may be zero
- Citation counts may be much larger than the number of papers
- The array may already be sorted or completely unsorted
- Multiple papers may have the same citation count
- There may only be one paper
A naive implementation can easily make off-by-one mistakes when checking whether a candidate h is valid.
Approaches
Brute Force Approach
The brute-force approach tries every possible h-index from 0 to n.
For each candidate value h, we count how many papers have at least h citations. If at least h papers satisfy that condition, then h is valid.
For example:
citations = [3,0,6,1,5]
To test h = 3:
-
Papers with at least 3 citations:
-
3
-
6
-
5
There are 3 such papers, so h = 3 is valid.
To test h = 4:
-
Papers with at least 4 citations:
-
6
-
5
Only 2 papers satisfy the condition, so h = 4 is invalid.
This approach is correct because it directly checks the definition of h-index. However, it is inefficient because for every possible h, we scan the entire array again.
The time complexity becomes O(n^2).
Optimal Approach
The key observation is that the exact ordering of papers does not matter, only how many papers have citation counts above a threshold.
If we sort the array in descending order, we can determine the h-index in one pass.
Suppose the sorted array is:
[6,5,3,1,0]
At index i, there are exactly i + 1 papers to the left including the current paper.
If:
citations[i] >= i + 1
then there are at least i + 1 papers with at least i + 1 citations.
We continue until this condition fails.
This transforms the problem into a simple scan after sorting.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Try every possible h-index and rescan the array each time |
| Optimal | O(n log n) | O(1) or O(n) | Sort the array and scan once to find the maximum valid h-index |
Algorithm Walkthrough
Optimal Sorting Algorithm
- Sort the
citationsarray in descending order.
We want the most cited papers first because the h-index depends on how many papers exceed a threshold. Sorting makes it easy to compare the paper count with citation counts directly.
2. Iterate through the sorted array using index i.
At position i, there are exactly i + 1 papers considered so far.
3. For each position, check whether:
citations[i] >= i + 1
This means the current paper has enough citations to support an h-index of i + 1.
4. If the condition is true, update the answer.
We continue scanning because a larger h-index may still be possible. 5. If the condition becomes false, stop checking larger values.
Since the array is sorted in descending order, all future citation counts will be smaller or equal. Therefore, no larger h-index can exist. 6. Return the maximum valid h-index found.
Why it works
After sorting in descending order, the paper at index i is the (i + 1)th most cited paper. If that paper has at least i + 1 citations, then the first i + 1 papers all satisfy the h-index condition.
Because the array is sorted, once the condition fails, every later paper will also fail for larger h-values. This guarantees that the largest valid value encountered during the scan is the correct h-index.
Python Solution
from typing import List
class Solution:
def hIndex(self, citations: List[int]) -> int:
citations.sort(reverse=True)
h_index = 0
for index, citation_count in enumerate(citations):
paper_count = index + 1
if citation_count >= paper_count:
h_index = paper_count
else:
break
return h_index
The implementation begins by sorting the citations in descending order. This ensures that the most cited papers appear first.
The variable h_index stores the best valid value found so far.
The loop iterates through the sorted papers using enumerate, which provides both the index and citation count. At each step:
paper_count = index + 1
This represents how many papers have been examined.
The condition:
citation_count >= paper_count
checks whether there are at least paper_count papers with at least paper_count citations each.
If the condition is satisfied, the h-index can safely be updated. Otherwise, the loop breaks because further papers will only have smaller citation counts.
Finally, the method returns the maximum valid h-index discovered during the scan.
Go Solution
package main
import "sort"
func hIndex(citations []int) int {
sort.Sort(sort.Reverse(sort.IntSlice(citations)))
hIndex := 0
for index, citationCount := range citations {
paperCount := index + 1
if citationCount >= paperCount {
hIndex = paperCount
} else {
break
}
}
return hIndex
}
The Go solution follows the same algorithmic structure as the Python version.
Go does not provide a direct descending sort for integer slices, so the implementation uses:
sort.Sort(sort.Reverse(sort.IntSlice(citations)))
The rest of the logic is identical. The loop checks whether the current citation count supports a larger h-index.
There are no integer overflow concerns because the constraints are very small. Go slices naturally handle both nil and empty behavior safely, although the problem guarantees at least one paper.
Worked Examples
Example 1
citations = [3,0,6,1,5]
Step 1: Sort Descending
[6,5,3,1,0]
Step 2: Scan Through Array
| Index | Citation Count | Papers Considered | Condition | h-index |
|---|---|---|---|---|
| 0 | 6 | 1 | 6 >= 1 | 1 |
| 1 | 5 | 2 | 5 >= 2 | 2 |
| 2 | 3 | 3 | 3 >= 3 | 3 |
| 3 | 1 | 4 | 1 >= 4 is false | stop |
Final answer:
3
Example 2
citations = [1,3,1]
Step 1: Sort Descending
[3,1,1]
Step 2: Scan Through Array
| Index | Citation Count | Papers Considered | Condition | h-index |
|---|---|---|---|---|
| 0 | 3 | 1 | 3 >= 1 | 1 |
| 1 | 1 | 2 | 1 >= 2 is false | stop |
Final answer:
1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Sorting dominates the runtime |
| Space | O(1) or O(n) | Depends on the sorting implementation used internally |
The main cost comes from sorting the array. After sorting, the algorithm performs a single linear scan through the citations.
The scan itself is O(n), but sorting requires O(n log n) time, which dominates the total complexity.
The auxiliary space complexity depends on the language implementation of sorting. Python's sorting may use additional memory internally, while Go's sort implementation is closer to in-place behavior.
Test Cases
from typing import List
class Solution:
def hIndex(self, citations: List[int]) -> int:
citations.sort(reverse=True)
h_index = 0
for index, citation_count in enumerate(citations):
if citation_count >= index + 1:
h_index = index + 1
else:
break
return h_index
solution = Solution()
assert solution.hIndex([3, 0, 6, 1, 5]) == 3 # standard example
assert solution.hIndex([1, 3, 1]) == 1 # small unsorted input
assert solution.hIndex([0]) == 0 # single paper with zero citations
assert solution.hIndex([100]) == 1 # single highly cited paper
assert solution.hIndex([0, 0, 0, 0]) == 0 # all papers uncited
assert solution.hIndex([1, 1, 1, 1]) == 1 # all papers cited once
assert solution.hIndex([4, 4, 4, 4]) == 4 # every paper supports maximum h-index
assert solution.hIndex([10, 8, 5, 4, 3]) == 4 # descending order input
assert solution.hIndex([25, 8, 5, 3, 3]) == 3 # citation counts larger than n
assert solution.hIndex([6, 6, 6, 6, 1]) == 4 # failure occurs near end
assert solution.hIndex([2, 2, 2]) == 2 # exact equality condition
assert solution.hIndex([11, 15]) == 2 # small array with large citations
Test Case Summary
| Test | Why |
|---|---|
[3,0,6,1,5] |
Standard example from the problem |
[1,3,1] |
Small unsorted input |
[0] |
Minimum citation value |
[100] |
Single highly cited paper |
[0,0,0,0] |
No valid h-index greater than zero |
[1,1,1,1] |
Duplicate low citation counts |
[4,4,4,4] |
Maximum possible h-index |
[10,8,5,4,3] |
Already sorted descending |
[25,8,5,3,3] |
Citation counts larger than paper count |
[6,6,6,6,1] |
Failure condition appears late |
[2,2,2] |
Exact equality boundary |
[11,15] |
Small array with large values |
Edge Cases
One important edge case occurs when every paper has zero citations, such as:
[0,0,0]
A buggy implementation might incorrectly return 1 simply because there is at least one paper. However, the h-index requires at least one paper with at least one citation. Since none exist, the correct answer is 0. The implementation handles this because the condition:
citation_count >= paper_count
fails immediately on the first iteration.
Another important case occurs when citation counts are much larger than the number of papers:
[100,100,100]
The h-index can never exceed the total number of papers. Even though each paper has 100 citations, there are only 3 papers total, so the answer must be 3. The implementation naturally enforces this because the largest possible paper_count is n.
A third edge case involves exact equality:
[3,3,3]
The h-index should be 3, not 2. Some incorrect implementations use a strict greater-than comparison instead of greater-than-or-equal-to. The correct condition is:
citation_count >= paper_count
which properly handles equality boundaries.
Another subtle edge case is when the failure occurs in the middle of the scan:
[10,8,2,1]
The algorithm correctly stops once the condition fails. Since the array is sorted descending, no later paper can restore validity for a larger h-index. This guarantees correctness and avoids unnecessary work.