LeetCode 275 - H-Index II

This problem asks us to compute a researcher's h-index from a sorted list of citation counts. The input array citations is sorted in non-decreasing order, meaning the citation counts appear from smallest to largest.

LeetCode Problem 275

Difficulty: 🟡 Medium
Topics: Array, Binary Search

Solution

LeetCode 275, H-Index II

Problem Understanding

This problem asks us to compute a researcher's h-index from a sorted list of citation counts. The input array citations is sorted in non-decreasing order, meaning the citation counts appear from smallest to largest.

The h-index is defined as the largest integer h such that the researcher has at least h papers with at least h citations each.

For example, if a researcher has citation counts [0,1,3,5,6], then:

  • There are 3 papers with at least 3 citations, specifically 3,5,6
  • There are not 4 papers with at least 4 citations

Therefore, the h-index is 3.

The important detail in this problem is that the array is already sorted. Unlike the original H-Index problem where sorting may be required, here we can take advantage of the ordering to design a logarithmic time solution.

The input size can be as large as 10^5, so an inefficient quadratic approach would not work. The problem explicitly requires a logarithmic time algorithm, which strongly suggests binary search.

The array is guaranteed to be sorted in ascending order, and citation counts are non-negative integers. This guarantee is extremely important because binary search only works correctly on ordered data.

Several edge cases are important to think about:

  • Arrays where all citation counts are 0
  • Arrays where every paper has very high citations
  • Arrays with only one paper
  • Arrays where the h-index equals the total number of papers
  • Arrays where the h-index is 0

A naive implementation can easily make off-by-one mistakes because the relationship between the array index and the number of papers remaining must be handled carefully.

Approaches

Brute Force Approach

A straightforward solution is to test every possible value of h from 0 to n.

For each candidate h, we count how many papers have citations greater than or equal to h. If at least h papers satisfy this condition, then h is valid.

Because the array is sorted, we could scan through it for each candidate value.

This approach is correct because it directly checks the definition of h-index. However, it is inefficient because for every possible h, we may scan the entire array.

With n possible values of h and up to n work for each one, the total complexity becomes O(n^2).

Even if optimized slightly, it still does not satisfy the logarithmic time requirement.

Optimal Binary Search Approach

The key observation is that the array is sorted.

Suppose we are at index i in the array. Since the array is sorted ascending:

  • All elements from i to n - 1 are at least citations[i]
  • The number of papers remaining is n - i

If citations[i] >= n - i, then there are at least n - i papers with at least n - i citations.

That means n - i is a valid h-index candidate.

Because the condition transitions monotonically across the sorted array, binary search can locate the first index where:

citations[i] >= n - i

Once we find this position, the answer is simply:

n - i

This reduces the time complexity to O(log n).

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Tests every possible h-index by scanning papers
Optimal O(log n) O(1) Uses binary search on the sorted array

Algorithm Walkthrough

  1. Let n be the number of papers.

The array length matters because if we are at index i, then there are exactly n - i papers to the right, including the current paper. 2. Initialize binary search boundaries.

Set:

left = 0
right = n - 1

We will search for the first index where the h-index condition becomes true. 3. Compute the middle index.

During each iteration:

mid = (left + right) // 2
  1. Determine how many papers remain.

Since the array is sorted ascending, the papers from mid through the end all have at least citations[mid] citations.

The count of such papers is:

papers = n - mid
  1. Check whether the current position satisfies the h-index condition.

If:

citations[mid] >= papers

then we found a valid h-index candidate.

However, there may be an earlier index that also works and produces a larger h-index, so continue searching on the left side:

right = mid - 1
  1. Otherwise, move right.

If:

citations[mid] < papers

then the current position cannot produce a valid h-index.

We need larger citation values, so search the right half:

left = mid + 1
  1. After the binary search finishes, left points to the first valid index.

The answer becomes:

n - left

Why it works

The binary search works because the condition is monotonic.

As the index increases:

  • citations[i] tends to increase
  • n - i decreases

Eventually, the condition:

citations[i] >= n - i

changes from false to true and stays true afterward.

Binary search is ideal for locating this transition point efficiently.

Python Solution

from typing import List

class Solution:
    def hIndex(self, citations: List[int]) -> int:
        n = len(citations)

        left = 0
        right = n - 1

        while left <= right:
            mid = (left + right) // 2

            papers_with_at_least_mid_citations = n - mid

            if citations[mid] >= papers_with_at_least_mid_citations:
                right = mid - 1
            else:
                left = mid + 1

        return n - left

The implementation begins by storing the array length in n. This value is repeatedly used to calculate how many papers remain from a given index onward.

The binary search boundaries are initialized with left and right. During each iteration, the middle index is computed.

The expression:

papers_with_at_least_mid_citations = n - mid

represents how many papers exist from index mid to the end of the array.

The condition:

citations[mid] >= papers_with_at_least_mid_citations

checks whether those papers satisfy the h-index definition.

If the condition is true, we continue searching left because an earlier index may produce a larger valid h-index.

If the condition is false, we move right because larger citation counts are needed.

After the loop terminates, left stores the first valid index, and the final h-index is n - left.

Go Solution

func hIndex(citations []int) int {
	n := len(citations)

	left := 0
	right := n - 1

	for left <= right {
		mid := left + (right-left)/2

		papers := n - mid

		if citations[mid] >= papers {
			right = mid - 1
		} else {
			left = mid + 1
		}
	}

	return n - left
}

The Go implementation follows the same binary search logic as the Python version.

One small difference is the calculation of mid:

mid := left + (right-left)/2

This style avoids potential integer overflow in languages where integer sizes are fixed. While overflow is not an issue for this problem's constraints, this is standard binary search practice in Go.

Go slices naturally provide array length through len(citations), and no additional memory allocation is required.

Worked Examples

Example 1

Input:

citations = [0,1,3,5,6]

Initial values:

n = 5
left = 0
right = 4
Iteration left right mid citations[mid] papers = n - mid Condition
1 0 4 2 3 3 3 >= 3, true
2 0 1 0 0 5 0 >= 5, false
3 1 1 1 1 4 1 >= 4, false

After iteration 3:

left = 2
right = 1

Loop ends.

Answer:

n - left = 5 - 2 = 3

Final result:

3

Example 2

Input:

citations = [1,2,100]

Initial values:

n = 3
left = 0
right = 2
Iteration left right mid citations[mid] papers = n - mid Condition
1 0 2 1 2 2 2 >= 2, true
2 0 0 0 1 3 1 >= 3, false

After iteration 2:

left = 1
right = 0

Loop ends.

Answer:

n - left = 3 - 1 = 2

Final result:

2

Complexity Analysis

Measure Complexity Explanation
Time O(log n) Binary search halves the search space each iteration
Space O(1) Only a few variables are used

The algorithm performs a standard binary search over the sorted array. Each iteration cuts the remaining search range in half, resulting in logarithmic time complexity.

No auxiliary data structures are allocated, so the space complexity remains constant.

Test Cases

from typing import List

class Solution:
    def hIndex(self, citations: List[int]) -> int:
        n = len(citations)

        left = 0
        right = n - 1

        while left <= right:
            mid = (left + right) // 2

            papers = n - mid

            if citations[mid] >= papers:
                right = mid - 1
            else:
                left = mid + 1

        return n - left

solution = Solution()

assert solution.hIndex([0, 1, 3, 5, 6]) == 3  # standard example
assert solution.hIndex([1, 2, 100]) == 2  # large outlier citation
assert solution.hIndex([0]) == 0  # single paper with zero citations
assert solution.hIndex([100]) == 1  # single highly cited paper
assert solution.hIndex([0, 0, 0, 0]) == 0  # no valid h-index
assert solution.hIndex([1, 1, 1, 1]) == 1  # all papers cited once
assert solution.hIndex([4, 4, 4, 4]) == 4  # h-index equals number of papers
assert solution.hIndex([0, 1, 2, 3, 4, 5]) == 3  # mixed increasing sequence
assert solution.hIndex([11, 15]) == 2  # every paper qualifies
assert solution.hIndex([0, 0, 4, 4]) == 2  # exact threshold boundary
Test Why
[0,1,3,5,6] Standard example from the problem
[1,2,100] Verifies handling of very large citation counts
[0] Smallest input with no citations
[100] Smallest input with valid h-index
[0,0,0,0] Ensures h-index can be zero
[1,1,1,1] Tests repeated values
[4,4,4,4] Every paper satisfies the condition
[0,1,2,3,4,5] General increasing sequence
[11,15] Small array where all papers qualify
[0,0,4,4] Boundary condition around exact threshold

Edge Cases

One important edge case occurs when all citation counts are zero. In this situation, no paper satisfies even an h-index of 1. A buggy implementation might incorrectly return 1 due to off-by-one errors in binary search. This implementation correctly returns 0 because the condition citations[mid] >= n - mid never becomes true, causing left to move all the way to n.

Another important case is when every paper has very large citation counts. For example, [10,10,10,10] should produce an h-index of 4. Some incorrect implementations stop too early after finding a valid candidate. This solution continues searching left to locate the first valid index, ensuring the maximum h-index is returned.

A third critical edge case is arrays of length one. With a single paper, the answer can only be 0 or 1. Binary search implementations frequently fail on very small arrays due to incorrect loop conditions or pointer updates. This implementation handles single-element arrays naturally because the standard binary search logic still applies correctly.

Another subtle edge case occurs when the citation count exactly equals the number of remaining papers. For example:

[0,1,3,5,6]

At index 2, we have:

citations[2] = 3
n - 2 = 3

The equality case must count as valid. Using > instead of >= would produce the wrong answer. This implementation correctly uses >=, which matches the mathematical definition of h-index.