LeetCode 275 - H-Index II
This problem asks us to compute a researcher's h-index from a sorted list of citation counts. The input array citations is sorted in non-decreasing order, meaning the citation counts appear from smallest to largest.
Difficulty: 🟡 Medium
Topics: Array, Binary Search
Solution
LeetCode 275, H-Index II
Problem Understanding
This problem asks us to compute a researcher's h-index from a sorted list of citation counts. The input array citations is sorted in non-decreasing order, meaning the citation counts appear from smallest to largest.
The h-index is defined as the largest integer h such that the researcher has at least h papers with at least h citations each.
For example, if a researcher has citation counts [0,1,3,5,6], then:
- There are 3 papers with at least 3 citations, specifically
3,5,6 - There are not 4 papers with at least 4 citations
Therefore, the h-index is 3.
The important detail in this problem is that the array is already sorted. Unlike the original H-Index problem where sorting may be required, here we can take advantage of the ordering to design a logarithmic time solution.
The input size can be as large as 10^5, so an inefficient quadratic approach would not work. The problem explicitly requires a logarithmic time algorithm, which strongly suggests binary search.
The array is guaranteed to be sorted in ascending order, and citation counts are non-negative integers. This guarantee is extremely important because binary search only works correctly on ordered data.
Several edge cases are important to think about:
- Arrays where all citation counts are
0 - Arrays where every paper has very high citations
- Arrays with only one paper
- Arrays where the h-index equals the total number of papers
- Arrays where the h-index is
0
A naive implementation can easily make off-by-one mistakes because the relationship between the array index and the number of papers remaining must be handled carefully.
Approaches
Brute Force Approach
A straightforward solution is to test every possible value of h from 0 to n.
For each candidate h, we count how many papers have citations greater than or equal to h. If at least h papers satisfy this condition, then h is valid.
Because the array is sorted, we could scan through it for each candidate value.
This approach is correct because it directly checks the definition of h-index. However, it is inefficient because for every possible h, we may scan the entire array.
With n possible values of h and up to n work for each one, the total complexity becomes O(n^2).
Even if optimized slightly, it still does not satisfy the logarithmic time requirement.
Optimal Binary Search Approach
The key observation is that the array is sorted.
Suppose we are at index i in the array. Since the array is sorted ascending:
- All elements from
iton - 1are at leastcitations[i] - The number of papers remaining is
n - i
If citations[i] >= n - i, then there are at least n - i papers with at least n - i citations.
That means n - i is a valid h-index candidate.
Because the condition transitions monotonically across the sorted array, binary search can locate the first index where:
citations[i] >= n - i
Once we find this position, the answer is simply:
n - i
This reduces the time complexity to O(log n).
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Tests every possible h-index by scanning papers |
| Optimal | O(log n) | O(1) | Uses binary search on the sorted array |
Algorithm Walkthrough
- Let
nbe the number of papers.
The array length matters because if we are at index i, then there are exactly n - i papers to the right, including the current paper.
2. Initialize binary search boundaries.
Set:
left = 0
right = n - 1
We will search for the first index where the h-index condition becomes true. 3. Compute the middle index.
During each iteration:
mid = (left + right) // 2
- Determine how many papers remain.
Since the array is sorted ascending, the papers from mid through the end all have at least citations[mid] citations.
The count of such papers is:
papers = n - mid
- Check whether the current position satisfies the h-index condition.
If:
citations[mid] >= papers
then we found a valid h-index candidate.
However, there may be an earlier index that also works and produces a larger h-index, so continue searching on the left side:
right = mid - 1
- Otherwise, move right.
If:
citations[mid] < papers
then the current position cannot produce a valid h-index.
We need larger citation values, so search the right half:
left = mid + 1
- After the binary search finishes,
leftpoints to the first valid index.
The answer becomes:
n - left
Why it works
The binary search works because the condition is monotonic.
As the index increases:
citations[i]tends to increasen - idecreases
Eventually, the condition:
citations[i] >= n - i
changes from false to true and stays true afterward.
Binary search is ideal for locating this transition point efficiently.
Python Solution
from typing import List
class Solution:
def hIndex(self, citations: List[int]) -> int:
n = len(citations)
left = 0
right = n - 1
while left <= right:
mid = (left + right) // 2
papers_with_at_least_mid_citations = n - mid
if citations[mid] >= papers_with_at_least_mid_citations:
right = mid - 1
else:
left = mid + 1
return n - left
The implementation begins by storing the array length in n. This value is repeatedly used to calculate how many papers remain from a given index onward.
The binary search boundaries are initialized with left and right. During each iteration, the middle index is computed.
The expression:
papers_with_at_least_mid_citations = n - mid
represents how many papers exist from index mid to the end of the array.
The condition:
citations[mid] >= papers_with_at_least_mid_citations
checks whether those papers satisfy the h-index definition.
If the condition is true, we continue searching left because an earlier index may produce a larger valid h-index.
If the condition is false, we move right because larger citation counts are needed.
After the loop terminates, left stores the first valid index, and the final h-index is n - left.
Go Solution
func hIndex(citations []int) int {
n := len(citations)
left := 0
right := n - 1
for left <= right {
mid := left + (right-left)/2
papers := n - mid
if citations[mid] >= papers {
right = mid - 1
} else {
left = mid + 1
}
}
return n - left
}
The Go implementation follows the same binary search logic as the Python version.
One small difference is the calculation of mid:
mid := left + (right-left)/2
This style avoids potential integer overflow in languages where integer sizes are fixed. While overflow is not an issue for this problem's constraints, this is standard binary search practice in Go.
Go slices naturally provide array length through len(citations), and no additional memory allocation is required.
Worked Examples
Example 1
Input:
citations = [0,1,3,5,6]
Initial values:
n = 5
left = 0
right = 4
| Iteration | left | right | mid | citations[mid] | papers = n - mid | Condition |
|---|---|---|---|---|---|---|
| 1 | 0 | 4 | 2 | 3 | 3 | 3 >= 3, true |
| 2 | 0 | 1 | 0 | 0 | 5 | 0 >= 5, false |
| 3 | 1 | 1 | 1 | 1 | 4 | 1 >= 4, false |
After iteration 3:
left = 2
right = 1
Loop ends.
Answer:
n - left = 5 - 2 = 3
Final result:
3
Example 2
Input:
citations = [1,2,100]
Initial values:
n = 3
left = 0
right = 2
| Iteration | left | right | mid | citations[mid] | papers = n - mid | Condition |
|---|---|---|---|---|---|---|
| 1 | 0 | 2 | 1 | 2 | 2 | 2 >= 2, true |
| 2 | 0 | 0 | 0 | 1 | 3 | 1 >= 3, false |
After iteration 2:
left = 1
right = 0
Loop ends.
Answer:
n - left = 3 - 1 = 2
Final result:
2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(log n) | Binary search halves the search space each iteration |
| Space | O(1) | Only a few variables are used |
The algorithm performs a standard binary search over the sorted array. Each iteration cuts the remaining search range in half, resulting in logarithmic time complexity.
No auxiliary data structures are allocated, so the space complexity remains constant.
Test Cases
from typing import List
class Solution:
def hIndex(self, citations: List[int]) -> int:
n = len(citations)
left = 0
right = n - 1
while left <= right:
mid = (left + right) // 2
papers = n - mid
if citations[mid] >= papers:
right = mid - 1
else:
left = mid + 1
return n - left
solution = Solution()
assert solution.hIndex([0, 1, 3, 5, 6]) == 3 # standard example
assert solution.hIndex([1, 2, 100]) == 2 # large outlier citation
assert solution.hIndex([0]) == 0 # single paper with zero citations
assert solution.hIndex([100]) == 1 # single highly cited paper
assert solution.hIndex([0, 0, 0, 0]) == 0 # no valid h-index
assert solution.hIndex([1, 1, 1, 1]) == 1 # all papers cited once
assert solution.hIndex([4, 4, 4, 4]) == 4 # h-index equals number of papers
assert solution.hIndex([0, 1, 2, 3, 4, 5]) == 3 # mixed increasing sequence
assert solution.hIndex([11, 15]) == 2 # every paper qualifies
assert solution.hIndex([0, 0, 4, 4]) == 2 # exact threshold boundary
| Test | Why |
|---|---|
[0,1,3,5,6] |
Standard example from the problem |
[1,2,100] |
Verifies handling of very large citation counts |
[0] |
Smallest input with no citations |
[100] |
Smallest input with valid h-index |
[0,0,0,0] |
Ensures h-index can be zero |
[1,1,1,1] |
Tests repeated values |
[4,4,4,4] |
Every paper satisfies the condition |
[0,1,2,3,4,5] |
General increasing sequence |
[11,15] |
Small array where all papers qualify |
[0,0,4,4] |
Boundary condition around exact threshold |
Edge Cases
One important edge case occurs when all citation counts are zero. In this situation, no paper satisfies even an h-index of 1. A buggy implementation might incorrectly return 1 due to off-by-one errors in binary search. This implementation correctly returns 0 because the condition citations[mid] >= n - mid never becomes true, causing left to move all the way to n.
Another important case is when every paper has very large citation counts. For example, [10,10,10,10] should produce an h-index of 4. Some incorrect implementations stop too early after finding a valid candidate. This solution continues searching left to locate the first valid index, ensuring the maximum h-index is returned.
A third critical edge case is arrays of length one. With a single paper, the answer can only be 0 or 1. Binary search implementations frequently fail on very small arrays due to incorrect loop conditions or pointer updates. This implementation handles single-element arrays naturally because the standard binary search logic still applies correctly.
Another subtle edge case occurs when the citation count exactly equals the number of remaining papers. For example:
[0,1,3,5,6]
At index 2, we have:
citations[2] = 3
n - 2 = 3
The equality case must count as valid. Using > instead of >= would produce the wrong answer. This implementation correctly uses >=, which matches the mathematical definition of h-index.