Fix quantile empty 29315 #29326

Open, wants to merge 3 commits into main

imran4444shaik

Description

This PR fixes two related issues with np.quantile():

  1. Empty array handling: Makes np.quantile([], 0.5) return np.nan, consistent with np.median([]), instead of raising IndexError
  2. Integer overflow: Fixes incorrect results for integer arrays whose values are far apart, where the difference between neighboring values overflows the dtype during interpolation (e.g., np.array([32767, -1], dtype=np.int16))

Changes

  1. Added an explicit empty-array check in _quantile() that returns a NaN/NaT-filled array
  2. Added conversion of integer arrays to float64 before interpolation, so the interpolation arithmetic cannot overflow
  3. Added tests verifying both fixes
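Taken together, the two changes amount to roughly the following wrapper. This is a minimal sketch built on the public np.quantile, not the PR's actual patch to _quantile(); `quantile_sketch` is a hypothetical name, and it assumes numeric input with axis=None.

```python
import numpy as np


def quantile_sketch(a, q):
    """Hypothetical wrapper illustrating both changes (numeric input, axis=None)."""
    a = np.asarray(a)
    q = np.asarray(q, dtype=np.float64)
    # Change 1: empty input -> NaN-filled result shaped like q, as median does
    if a.size == 0:
        return np.full(q.shape, np.nan)
    # Change 2: widen bool/signed/unsigned integers to float64 so the
    # interpolation arithmetic cannot wrap around in the input dtype
    if a.dtype.kind in "biu":
        a = a.astype(np.float64)
    return np.quantile(a, q)


print(quantile_sketch([], 0.5))                                     # nan
print(quantile_sketch(np.array([32767, -1], dtype=np.int16), 0.5))  # 16383.0
```

Datetime and timedelta inputs are outside this sketch; the PR fills those with NaT instead of NaN.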

Impact

Testing

Added test cases for:

  • Empty arrays of all supported types
  • Integer arrays with overflow potential
  • Verification against median results
  • Existing functionality remains unchanged

Before:
>>> np.quantile([], 0.5)
IndexError
>>> np.quantile(np.array([32767,-1], dtype=np.int16), 0.5)
49151.0  # Wrong due to overflow

After:
>>> np.quantile([], 0.5)
nan  # Matches median behavior
>>> np.quantile(np.array([32767,-1], dtype=np.int16), 0.5)
16383.0  # Correct, matches median
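The "After" values are the ones np.median already returns for the same inputs. A quick check using only public NumPy API (np.median emits RuntimeWarnings on empty input, which are silenced here):

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # np.median([]) warns ("Mean of empty slice"); silence it for the demo
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_median = np.median([])

# median averages the two middle values in float64, so no int16 wraparound
int16_median = np.median(np.array([32767, -1], dtype=np.int16))

# The correct midpoint, computed with Python's unbounded integers
expected = (32767 + (-1)) / 2

print(empty_median)  # nan
print(int16_median)  # 16383.0
print(expected)      # 16383.0
```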

Fixes numpy#29315 by making quantile of an empty array return NaN, consistent with median, instead of raising an IndexError. The fix:

1. Explicitly checks for empty arrays (size=0) in _quantile()
2. Returns NaN/NaT-filled array with correct shape and dtype
3. Maintains consistency with median behavior for empty inputs
4. Preserves all existing functionality for non-empty arrays

Handles all numeric, datetime and timedelta dtypes: numeric inputs produce NaN, and datetime/timedelta inputs produce NaT.

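The empty-input branch described above (correct shape, dtype-appropriate fill) could be sketched as follows. `empty_quantile_result` is a hypothetical helper, not the PR's code. It assumes `axis` is None or a single int, that the output shape is q.shape followed by the input shape with `axis` removed (np.quantile's documented layout), and that floating inputs keep their dtype while bool/int/uint inputs become float64.

```python
import numpy as np


def empty_quantile_result(arr, q, axis=None):
    """Hypothetical helper: NaN/NaT-filled quantile result for empty input."""
    arr = np.asanyarray(arr)
    q = np.asanyarray(q)
    if axis is None:
        reduced = ()
    else:
        ax = axis % arr.ndim  # normalise negative axes
        reduced = arr.shape[:ax] + arr.shape[ax + 1:]
    shape = q.shape + reduced  # quantiles first, then the remaining axes

    kind = arr.dtype.kind
    if kind == "M":  # datetime64: keep the unit, fill with NaT
        return np.full(shape, np.datetime64("NaT"), dtype=arr.dtype)
    if kind == "m":  # timedelta64: keep the unit, fill with NaT
        return np.full(shape, np.timedelta64("NaT"), dtype=arr.dtype)
    if kind == "f":  # floating input keeps its precision (assumption)
        return np.full(shape, np.nan, dtype=arr.dtype)
    if kind in "biu":  # bool/int/uint results are float64, like median
        return np.full(shape, np.nan, dtype=np.float64)
    raise TypeError(f"unsupported dtype for quantile: {arr.dtype}")


r = empty_quantile_result(np.empty((3, 0)), [0.25, 0.5], axis=1)
print(r.shape, r.dtype)  # (2, 3) float64
```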
Fixes integer overflow in quantile calculation by converting integer arrays
to float64 before interpolation. This ensures:

1. Correct calculation for extreme values (e.g., [32767, -1] in int16)
2. Consistent results with median for integer inputs
3. No effect on floating-point or datetime types
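To see where the 49151.0 in the "Before" example comes from, here is a simplified two-branch linear interpolation. It is our own model of the interpolation step, not NumPy's actual source, and `lerp` is a hypothetical name. It evaluates a + (b - a)*t for t < 0.5 and b - (b - a)*(1 - t) otherwise. In int16 the difference 32767 - (-1) = 32768 wraps to -32768 before t is applied, which pushes the result past b.

```python
import numpy as np


def lerp(a, b, t):
    """Hypothetical two-branch linear interpolation between a and b."""
    diff = b - a  # evaluated in the inputs' dtype, so integers can wrap
    # Two algebraically equal forms; pick one per element based on t
    return np.where(t < 0.5, a + diff * t, b - diff * (1 - t))


lo = np.array([-1], dtype=np.int16)     # sorted neighbours around the median
hi = np.array([32767], dtype=np.int16)
t = np.array([0.5])                     # fractional position between them

wrapped = lerp(lo, hi, t)                                        # int16 diff
widened = lerp(lo.astype(np.float64), hi.astype(np.float64), t)  # the fix

print(wrapped)  # [49151.]
print(widened)  # [16383.]
```

Array arithmetic on fixed-width integers wraps silently, which is why the wrong value appears without any warning.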

The fix handles boolean, signed integer, and unsigned integer types by casting
to float64 before interpolation, while leaving other types unchanged. This
matches median's behavior of returning a float for integer input.
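The dtype dispatch described above could look like this minimal sketch; `as_interpolation_input` is a hypothetical helper name, and the choice of dtype kinds ("b", "i", "u") follows the description rather than the PR's code.

```python
import numpy as np


def as_interpolation_input(arr):
    """Hypothetical helper: widen bool/int/uint arrays to float64.

    Floating, datetime64 and timedelta64 arrays are returned unchanged.
    """
    arr = np.asanyarray(arr)
    if arr.dtype.kind in "biu":
        return arr.astype(np.float64)
    return arr


print(as_interpolation_input(np.array([True, False])).dtype)             # float64
print(as_interpolation_input(np.array([255, 0], dtype=np.uint8)).dtype)  # float64
print(as_interpolation_input(np.array([1.5], dtype=np.float32)).dtype)   # float32
print(as_interpolation_input(np.array(["2025-07-05"], dtype="datetime64[D]")).dtype)  # datetime64[D]
```

One caveat on "safely": float64 has a 53-bit significand, so int64/uint64 values above 2**53 can round when widened (for example, float(2**53 + 1) == float(2**53)). The cast removes wraparound but is not exact for every 64-bit integer.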
jorenham (Member) commented Jul 5, 2025

Personally I would prefer a fail-fast approach and have this raise an appropriate error. Since this is size-dependent, it won't help when applying this over an axis: either all returned values are nan, or none of them are. So I don't see any advantage to returning nan instead of raising an error.
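A fail-fast guard along the lines suggested above could be as small as this sketch; `check_nonempty` and its error message are hypothetical. Every 1-D slice along `axis` has length arr.shape[axis], so a single shape check decides the outcome for all output positions at once.

```python
import numpy as np


def check_nonempty(arr, axis=None):
    """Hypothetical guard: raise instead of returning NaN for empty slices."""
    arr = np.asanyarray(arr)
    # All slices along `axis` share one length, so this one check covers
    # every output element: they are either all empty or none are.
    n = arr.size if axis is None else arr.shape[axis]
    if n == 0:
        raise ValueError("quantile of an empty slice is undefined")


check_nonempty(np.ones((3, 4)), axis=1)  # passes: every row has 4 elements
try:
    check_nonempty(np.empty((3, 0)), axis=1)  # all 3 rows are empty
except ValueError as exc:
    print(exc)  # quantile of an empty slice is undefined
```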

Projects
Status: Pending authors' response
Development

Successfully merging this pull request may close these issues.

BUG: quantile inconsistent with median for size=0
3 participants