LeetCode 275: H-Index II
A clear explanation of the H-Index II problem using binary search on a sorted citations array.
Problem Restatement
We are given an integer array citations.
The array is sorted in ascending order.
Each value:
citations[i]
is the number of citations received by one paper.
Return the researcher's h-index.
The h-index is the maximum value h such that the researcher has at least h papers with at least h citations each.
The required runtime is logarithmic, so we need binary search. The official examples include [0,1,3,5,6] -> 3 and [1,2,100] -> 2.
Input and Output
| Item | Meaning |
|---|---|
| Input | Sorted ascending array citations |
| Output | The h-index |
| Goal | Find the largest valid h |
| Required time | O(log n) |
Example function shape:
def hIndex(citations: list[int]) -> int:
...
Examples
Example 1:
citations = [0, 1, 3, 5, 6]
There are 5 papers.
If h = 3, then the last three papers have citation counts:
[3, 5, 6]
All three have at least 3 citations.
If h = 4, then we would need four papers with at least four citations, but only two papers have at least four citations.
Answer:
3
Example 2:
citations = [1, 2, 100]
There are two papers with at least two citations:
[2, 100]
There are not three papers with at least three citations.
Answer:
2
First Thought: Linear Scan
Since the array is sorted, we can scan from left to right.
At index i, the number of papers from i to the end is:
n - i
If:
citations[i] >= n - i
then there are at least n - i papers with at least n - i citations.
The first index where this becomes true gives the h-index.
class Solution:
def hIndex(self, citations: list[int]) -> int:
n = len(citations)
for i, c in enumerate(citations):
h = n - i
if c >= h:
return h
return 0
This is correct, but it runs in O(n).
Problem With Linear Scan
The problem explicitly asks for logarithmic time.
So we need to use the sorted order more aggressively.
The condition:
citations[i] >= n - i
has a monotonic shape.
At earlier indices, citations[i] is smaller and n - i is larger.
At later indices, citations[i] is larger and n - i is smaller.
So the condition changes from false to true at most once.
That is exactly a binary search pattern.
Key Insight
For an index i, define:
h = n - i
This means there are h papers from index i through the end.
Because the array is sorted ascending, all papers from index i onward have at least citations[i] citations.
So if:
citations[i] >= h
then these h papers all have at least h citations.
This makes h a valid h-index candidate.
We want the earliest index where this condition becomes true.
Then the answer is:
n - index
Algorithm
- Let
n = len(citations). - Binary search over indices
0..n - 1. - At each
mid, compute:h = n - mid - If:
thencitations[mid] >= hmidis feasible, so search left for an earlier feasible index. - Otherwise, search right.
- When binary search ends,
leftis the first feasible index. - Return:
n - left
If no index is feasible, left == n, so the answer becomes 0.
Walkthrough
Use:
citations = [0, 1, 3, 5, 6]
Start:
n = 5
left = 0
right = 5
We use a half-open search interval [left, right).
First middle:
mid = 2
h = 5 - 2 = 3
citations[2] = 3
Since:
3 >= 3
index 2 is feasible.
Search left:
right = 2
Next middle:
mid = 1
h = 5 - 1 = 4
citations[1] = 1
Since:
1 < 4
index 1 is not feasible.
Search right:
left = 2
Now:
left == right == 2
The first feasible index is 2.
Return:
n - left = 5 - 2 = 3
Correctness
For each index i, there are exactly n - i papers from i to the end of the sorted array.
Because the array is sorted ascending, every paper from index i onward has at least citations[i] citations.
Therefore, if citations[i] >= n - i, then the researcher has at least n - i papers with at least n - i citations. So n - i is a valid h-index candidate.
The condition is monotonic. When it becomes true at some index, it remains true for later indices because citation counts do not decrease while n - i decreases.
Binary search finds the first index where the condition is true.
This first feasible index gives the largest valid h-index, because earlier indices correspond to larger h values. If no feasible index exists, the h-index is 0.
Therefore the algorithm returns the correct h-index.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(log n) |
Binary search over the sorted array |
| Space | O(1) |
Only a few variables are used |
Implementation
class Solution:
def hIndex(self, citations: list[int]) -> int:
n = len(citations)
left = 0
right = n
while left < right:
mid = (left + right) // 2
h = n - mid
if citations[mid] >= h:
right = mid
else:
left = mid + 1
return n - left
Code Explanation
We search in the half-open interval:
[left, right)
Initialize it to the whole array:
left = 0
right = n
At each index mid, the possible h-index is:
h = n - mid
If the current paper has enough citations to support that h:
if citations[mid] >= h:
then this index works, and maybe an earlier index also works.
So we move left:
right = mid
Otherwise, this index cannot support its candidate h, so we search to the right:
left = mid + 1
At the end, left is the first feasible index.
The answer is:
return n - left
If no index works, then left == n, and the function returns 0.
Testing
def run_tests():
s = Solution()
assert s.hIndex([0, 1, 3, 5, 6]) == 3
assert s.hIndex([1, 2, 100]) == 2
assert s.hIndex([0]) == 0
assert s.hIndex([100]) == 1
assert s.hIndex([0, 0, 0]) == 0
assert s.hIndex([1, 1, 1]) == 1
assert s.hIndex([4, 4, 4, 4]) == 4
assert s.hIndex([0, 1, 4, 5, 6]) == 3
print("all tests passed")
run_tests()
Test meaning:
| Test | Why |
|---|---|
[0, 1, 3, 5, 6] |
Official example |
[1, 2, 100] |
Official example |
[0] |
Single paper with zero citations |
[100] |
H-index capped by paper count |
| All zero | No positive h-index |
| All one | H-index is one |
| All high enough | H-index equals n |
| Mixed sorted values | Confirms binary search boundary |