-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF: Implement NDArrayBackedExtensionArray #33660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/arrays/datetimelike.py
Outdated
# NB: A bunch of Interval tests fail if we use ._data | ||
return self.asi8 | ||
|
||
def _from_backing_data(self, arr: np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
------ | ||
ValueError | ||
""" | ||
raise AbstractMethodError(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseMaskedArray is in core/arrays/masked.py. If the purpose of this is to create a common base class for another subset of extensionarrays, can we make the module names for the base classes more consistent.
We could also restructure the extension array tests with a similar hierarchy, so that could influence the naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to other naming ideas; what do you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have core/arrays/numpy_ for PandasArray, so the simple numpy name is taken. The difference though is a PandasArray can be instantiated (It is also possible to instantiate a BaseMaskedArray directly) whereas NDArrayBackedExtensionArray is abstract. maybe we need a subdirectory of core/arrays for the base classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could reasonably put the mixin in arrays.numpy_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Categorical.take docsting explicitly stated that it raises a TypeError.
While I agree that a ValueError is more appropriate, do we consider than an API change? We'll want a release note regardless.
@@ -1780,85 +1781,17 @@ def fillna(self, value=None, method=None, limit=None): | |||
|
|||
return self._constructor(codes, dtype=self.dtype, fastpath=True) | |||
|
|||
def take(self, indexer, allow_fill: bool = False, fill_value=None): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to restore this docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restored with TypeError->ValueError and whatsnew
|
||
def _from_backing_data(self, arr: np.ndarray): | ||
# Note: we do not retain `freq` | ||
return type(self)(arr, dtype=self.dtype) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlike Frame and Series, EAs do not have the concept of _constructor. could this be useful?
the exception is Categorical which uses _constructor. so maybe either extend to all EAs or can we remove from Categorical for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC Categorical has _constructor because it subclasses PandasObject.
_from_backing_data
is specifically about round-tripping or commutativity; not really a good fit for _constructor.
return self._codes | ||
|
||
def _from_backing_data(self, arr: np.ndarray): | ||
return self._constructor(arr, dtype=self.dtype, fastpath=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other comment re _constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC Categorical has _constructor because it subclasses PandasObject.
_from_backing_data
is specifically about round-tripping or commutativity; not really a good fit for _constructor.
in PandasObject _constructor is type(self), in Categorical, _constructor is Categorical (i.e type(self)). so I think it may worth investigating whether we can remove _constructor from Categorical and use type(self) here to be consistent with datetimelike._from_backing_data.
can u add return types for _from_backing_data, both here and in datetimelike
|
||
_ndarray: np.ndarray | ||
|
||
def _from_backing_data(self: _T, arr: np.ndarray) -> _T: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is typing self
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding is that is how we indicate that the return type is "same type as self", but id defer to @simonjayhawkins on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. T is a typevar, i.e. can take on different types (could be subtypes of a type or union of types) so adding to self binds the typevar and the return type is the same as self.
This can sometimes cause problems in Mixins, but here the 'Mixin' is IMO an abstract base class.
Many EAs are thin wrappers around np.ndarray. We can both de-duplicate a bunch of code and make things easier on downstream authors by implementing
NDArrayBackedExtensionArray
as a base class for such EAs.This PR only implements
NDArrayBackedExtensionArray.take
, but there is quite a bit more that can be shared in follow-ups:__getitem__
,__setitem__
, reductions, ...The only change in logic this PR makes is to make
Categorical.take
raise a ValueError instead of aTypeError
, matching DTA/TDA/PA.