Implement DataFrame.__array_ufunc__ #36955
For some cases, this will preserve extension types of arrays by calling the ufunc blockwise.

```python
In [1]: import pandas as pd; import numpy as np

In [2]: df = pd.DataFrame({"A": pd.array([0, 1], dtype="Sparse")})

In [3]: np.sin(df).dtypes
Out[3]:
A    Sparse[float64, nan]
dtype: object
```

We don't currently handle the multi-input case well (aside from ufuncs that are implemented as dunder ops like `np.add`). For these, we fall back to the old implementation of converting to an ndarray.
lgtm some questions.
needs rebase, otherwise i think pretty ready
can you merge master
@TomAugspurger needs rebase, otherwise i think good to go
)
else:
    reconstruct_axes = dict(zip(self._AXIS_ORDERS, self.axes))
ideally can you split this function up a bit (if easy)
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -220,6 +220,8 @@ Other enhancements
- :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
-
- Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
- Calling a NumPy ufunc on a ``DataFrame`` with extension types now presrves the extension types when possible (:issue:`23743`).
presrves -> preserves
can you update :->
if we were to implement does this make
IIRC I tried removing
…ame-array-ufunc
@TomAugspurger can you merge master? i tried doing it myself but am getting some really weird-looking errors
Gonna be a bit probably.
…ame-array-ufunc
ideally we could do this for 1.2 if can merge master
@jreback i merged master 2 days ago, this should be OK
ok 2 comments i think should do here (these are typos / doc-string). my comments about splitting up the actual function can be done later. ping on green.
thanks @TomAugspurger very nice
Thanks Brock.
The fact that we now align is a breaking change, but it seems one that could easily be detected (when the operands are not yet aligned) and done with a deprecation cycle?
Did we face a similar issue with Series implementing I vaguely recall a discussion around
Yes, we did a similar change back then for Series I think, and it seems we didn't discuss it much at the time either. And I also don't recall (m)any issues about it that were raised after doing the change. Which is certainly a sign that this might indeed be OK to simply change. I would say the main difference is that for Series we changed this in 0.25, but now for 1.x, we said to do changes using a deprecation cycle where possible (and this seems a case where it is easily possible)
Right, the 1.0 vs. 0.25 distinction is the important part. I'd prefer to see a deprecation warning for the alignment, but I don't know if I'll have time to do it for this release. One complication I just noticed: there isn't an easy way to silence the warning, like with a keyword. We'd need to only warn when the inputs are unaligned, and require users to call
Indeed, aligning manually would be the way to get around the warning if you want the new behaviour; otherwise, if you want the old non-aligning behaviour, the solution is to first convert to numpy arrays before calling the ufunc.
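A minimal sketch of the two workarounds described in the comment above, assuming two small frames whose indexes only partially overlap. The frames and variable names are illustrative and not taken from the PR; `DataFrame.align` and `DataFrame.to_numpy` are existing pandas methods.

```python
import numpy as np
import pandas as pd

# Illustrative frames (not from the PR): the indexes overlap only at label 1.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"A": [10, 20], "B": [30, 40]}, index=[1, 2])

# Option 1: align explicitly, then call the ufunc.
# Once the operands share the same labels, aligning and non-aligning
# implementations give the same answer.
left, right = df1.align(df2)  # outer join on both axes by default
aligned = np.add(left, right)

# Option 2: drop the labels first to get the old positional behaviour.
positional = np.add(df1.to_numpy(), df2.to_numpy())
```

In the first result only label 1 has values in both operands, so the rows for labels 0 and 2 are NaN. The second result is a plain ndarray that pairs rows by position and ignores the index entirely.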
For some cases, this will preserve extension types of arrays by calling the ufunc blockwise.

Implementation-wise, this was done by moving `Series.__array_ufunc__` to `NDFrame` and making it generic for Series / DataFrame. The DataFrame implementation goes through `BlockManager.apply(ufunc)`.

We don't currently handle the multi-input case well for dataframes (aside from ufuncs that are implemented as dunder ops like `np.add`). For these, we fall back to the old implementation of converting to an ndarray and wrapping the result. This loses extension types.

We also don't currently handle multi-output ufuncs (like `np.modf`). This would require a `BlockManager.apply` that returns a `Tuple[BlockManager]`, `nout` per input block. Maybe someday, but that's low priority.

closes #23743
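To make the three cases in the description concrete, here is a rough sketch that classifies a ufunc by its `nin` / `nout` attributes (both are standard NumPy ufunc attributes). The helper `describe_ufunc_path` is hypothetical and is not part of pandas or of this PR; it only mirrors the prose above and ignores the special handling of dunder-backed ufuncs such as `np.add`.

```python
import numpy as np
import pandas as pd


def describe_ufunc_path(ufunc):
    """Hypothetical helper: label a ufunc by the case it falls under above."""
    if ufunc.nout > 1:
        return "multi-output"   # e.g. np.modf; not handled blockwise
    if ufunc.nin > 1:
        return "multi-input"    # e.g. np.add; ndarray fallback unless a dunder op
    return "blockwise"          # e.g. np.sin; can go through the blockwise path


# A multi-output ufunc returns one result per output.
df = pd.DataFrame({"A": [1.5, 2.25]})
frac, whole = np.modf(df)
```

With a plain float column, `frac` holds the fractional parts and `whole` the integral parts. The description's caveat about extension types applies when the columns are extension arrays.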