
Add XArray compatibility features #102

Merged
hameerabbasi merged 9 commits into pydata:master from hameerabbasi:xarray-compat on Feb 22, 2018

Conversation

hameerabbasi
Collaborator

@hameerabbasi hameerabbasi commented Feb 19, 2018

Once the right broadcasting setup was in place, it was trivial to implement the three-argument version of where and NaN-skipping aggregations.

cc @mrocklin Feedback welcome.
cc @shoyer Feedback really welcome, as you know the ins and outs of XArray. Of course, "I can't" is okay, as always. :-)
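
For illustration, a minimal sketch of how these features might be used (not taken from the PR itself; it assumes the final API exposes sparse.where and sparse.nansum as module-level functions and supports elementwise comparisons like x > 1):

import numpy as np
import sparse

x = sparse.COO.from_numpy(np.array([[0.0, 1.0], [2.0, np.nan]]))
y = sparse.COO.from_numpy(np.array([[5.0, 0.0], [0.0, 7.0]]))

# Three-argument where: take elements from x where the condition holds, else from y.
z = sparse.where(x > 1, x, y)
print(z.todense())  # expected: [[5. 0.] [2. 7.]]

# NaN-skipping aggregation: NaNs are ignored in the sum.
print(sparse.nansum(x))  # expected: 3.0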

@hameerabbasi
Collaborator Author

Tests not yet added.

@hameerabbasi
Collaborator Author

hameerabbasi commented Feb 19, 2018

Unfortunately, there doesn't seem to be a way to override np.nansum. I added a print statement to our implementation, and:

>>> import numpy as np
>>> import sparse
>>> x = np.asarray([5, 6, np.nan])
>>> s = sparse.COO.from_numpy(x)
>>> s.nansum()
nanreduce
11.0
>>> np.nansum(s)
11.0

Edit: It doesn't fall back to our implementation even if we raise an error during coercion.

>>> class NoncoercibleCOO(sparse.COO):
...     def __array__(self, *args, **kwargs):
...         raise ValueError('Cannot coerce COO.')
...     
>>> s = NoncoercibleCOO.from_numpy(x)
>>> np.nansum(s)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/hameer/anaconda3/envs/sparse/lib/python3.6/site-packages/numpy/lib/nanfunctions.py", line 581, in nansum
    a, mask = _replace_nan(a, 0)
  File "/Users/hameer/anaconda3/envs/sparse/lib/python3.6/site-packages/numpy/lib/nanfunctions.py", line 64, in _replace_nan
    a = np.array(a, subok=True, copy=True)
  File "<input>", line 3, in __array__
ValueError: Cannot coerce COO.
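
The traceback above shows why: np.nansum densifies its argument via np.array(...) before reducing, so the sparse code path never runs. A hedged workaround sketch (assuming the NaN aggregations end up exposed as functions such as sparse.nansum, as the later commits in this PR do) is simply to call the library's own function instead of NumPy's:

import numpy as np
import sparse

x = np.array([5.0, 6.0, np.nan])
s = sparse.COO.from_numpy(x)

print(np.nansum(s))      # densifies s first, then reduces: 11.0
print(sparse.nansum(s))  # stays sparse throughout: 11.0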

@hameerabbasi
Collaborator Author

Hmm. It seems that, for nanmin and nanmax, we sometimes get -inf as a value (on an axis that contained JUST NaN and nothing else), whereas NumPy actually returns nan. I think the correct value of the reduction in this case is debatable (as a hobbyist mathematician, I, for one, think an empty min should be inf, and so on).

In any case, I'm willing to leave this as an xfail for now. Any input welcome.
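
To make the mismatch concrete, a small NumPy-only illustration (not from the PR): NumPy returns nan, with an "All-NaN slice encountered" RuntimeWarning, for an all-NaN axis, whereas a reduction that substitutes an infinite identity for NaN reports that infinity instead.

import numpy as np

x = np.array([[1.0, np.nan],
              [np.nan, np.nan]])

# NumPy's behaviour: nan for the all-NaN row, plus a RuntimeWarning.
print(np.nanmin(x, axis=1))  # [ 1. nan]

# A fill-based reduction reports the identity element for the all-NaN row.
print(np.min(np.where(np.isnan(x), np.inf, x), axis=1))   # [ 1. inf]
print(np.max(np.where(np.isnan(x), -np.inf, x), axis=1))  # [ 1. -inf]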

@hameerabbasi
Collaborator Author

hameerabbasi commented Feb 19, 2018

I added tests and matched the NumPy API. Looks review-ready to me. 💃 cc @mrocklin @shoyer

@hameerabbasi
Collaborator Author

hameerabbasi commented Feb 20, 2018

Do we want to support the edge case of NaNs in object arrays? It's proving difficult. It's certainly possible, but only with a bit of trickery.
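
For context, a quick NumPy-only illustration (not from the PR) of why object dtype is awkward here: np.isnan is not defined for object arrays, so NaN detection needs a detour such as the x != x trick (NaN is the only value that compares unequal to itself).

import numpy as np

obj = np.array([1.0, np.nan, "a"], dtype=object)

try:
    np.isnan(obj)
except TypeError as exc:
    print("np.isnan fails on object arrays:", exc)

print(obj != obj)  # [False  True False]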

@mrocklin
Contributor

mrocklin commented Feb 20, 2018 via email

@shoyer
Member

shoyer commented Feb 20, 2018 via email

@hameerabbasi
Collaborator Author

It seems we don't support object arrays in many cases at the moment. I've opened #104 to track this, but I'm inclined to ignore it here.

@hameerabbasi
Collaborator Author

hameerabbasi commented Feb 21, 2018

I fixed the NaN issue (by using fmin/fmax), tested for it, and matched the NumPy warning (and tested for that, too). This time around, I feel this really is ready for review.

Edit: I looked at the implementation of nanmin etc. in NumPy. Unfortunately, short of hooking into NumPy and replacing its functions with our own whenever the arguments are SparseArray, I don't see a way to solve this.
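
For reference, the fmin/fmax behaviour the fix relies on (plain NumPy, not PR code): unlike minimum/maximum, fmin/fmax ignore NaN when the other operand is a number and only return NaN when both operands are NaN, which matches nanmin/nanmax semantics, including the all-NaN case.

import numpy as np

print(np.minimum(np.nan, 5.0))  # nan (NaN propagates)
print(np.fmin(np.nan, 5.0))     # 5.0 (NaN ignored)
print(np.fmin(np.nan, np.nan))  # nan (all-NaN case preserved)

print(np.fmax.reduce([1.0, np.nan, 3.0]))  # 3.0
print(np.fmax.reduce([np.nan, np.nan]))    # nan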

@mrocklin
Contributor

This looks good to me :)

@hameerabbasi hameerabbasi merged commit 8965294 into pydata:master Feb 22, 2018
@hameerabbasi
Collaborator Author

Merged!

@hameerabbasi hameerabbasi deleted the xarray-compat branch February 22, 2018 09:00
hameerabbasi added a commit to hameerabbasi/sparse that referenced this pull request Feb 27, 2018
* Implement where.

* Implement NaN-skipping aggregations.

* Docs.

* Add tests, clarify docs a bit.

* Move NaN aggregations to be functions rather than methods to match Numpy

* Get rid of eval that was bothering me a lot.

* Fix NaN inequality issue.

* Remove object dtype code.

* Test for and fix warning code.