DOC: Document interoperability best practices #20998
Comments
I would suggest we discuss this here (or in a new topic there):
I think the transition problem should be clear enough. In my personal opinion, the question is not what we want, but how to transition users. There are three ways to do that, which I noted in the second point of the discussion above. For this issue: I doubt we should bother with documentation here; this needs to be solved ecosystem-wide in those discussions first. Note that, IMO, NEP 37 "moves" the responsibility a bit, but doesn't remove the problem itself entirely.
Thanks a lot @seberg. I went over those posts during the weekend. It's quite a lot to digest and I don't think I'll easily wrap my head around all of it, but in any case, I'll try to see what we can distill from those discussions now (even by adding notes or warnings about possible caveats), and what is still in flux and needs wider ecosystem consensus first.
To give one particular example, it seems that the idea of subclassing ndarrays is frowned upon, but I couldn't find detailed explanations, only breadcrumbs. For example, @rgommers in one of the threads you linked:
Subclasses are tangentially mentioned in NEP 47 when discussing the
And the readme of your precodita:
In summary, there seems to be some "tribal knowledge" that I would like to help properly capture in written form. It doesn't have to be "everybody agrees subclassing is bad", but at least we could describe the kind of surprises that might appear, how to avoid them, etc.
I'll be honest - I'd be out of my depth here. But as long as someone can point me in the right direction as far as what should actually be documented, or even pair up on this doc, I'd be happy to work on it.
I prefer "culture" ;). More seriously, subclasses are problematic for exactly the same reason that subclasses are always problematic: if you break the Liskov substitution principle, you have to live with the consequences. The problem is that:
but while such notes may make sense somewhere, I think they would belong on the doc page for "subclassing", which should probably point to the interop stuff. There is nothing wrong with subclassing (memmaps are fine, except that they should demote more aggressively IMO), but almost everyone breaks Liskov. That is OK, but there are clear limits, and most of the time the gains (a few things "just work") are not worth the downsides (the stuff that doesn't work may surprise you). EDIT: OK, to be clear, I am not sure masked arrays break "Liskov" explicitly; matrix does. But they "enrich" the base class in ways that mean you lose vital information if you go back to the base class.
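To make the kind of surprise described above concrete, here is a minimal sketch using only public NumPy objects (`np.matrix` and `np.ma.masked_array`). The variable names are made up for illustration, and this is one reading of the comment rather than an agreed-upon list of pitfalls: `np.matrix` changes the meaning of `*` and of row indexing, while a masked array silently loses its mask when converted back to a plain `ndarray`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)  # ndarray subclass that redefines some base-class behaviour

# `*` is elementwise for ndarray but matrix multiplication for np.matrix,
# so code written for ndarray gives different results when handed a matrix.
elementwise = a * a
matmul = m * m

# Indexing a row drops a dimension for ndarray, but a matrix always stays 2-D.
row_shape_array = a[0].shape
row_shape_matrix = m[0].shape

# Masked arrays "enrich" the base class: going back to ndarray drops the mask,
# so the masked-out value is suddenly included in the computation.
masked = np.ma.masked_array([1.0, 2.0, 1000.0], mask=[False, False, True])
mean_masked = masked.mean()             # ignores the masked 1000.0
mean_plain = np.asarray(masked).mean()  # mask is gone, 1000.0 is included

print(elementwise.tolist(), matmul.tolist())
print(row_shape_array, row_shape_matrix)
print(mean_masked, mean_plain)
```

A function that calls `np.asarray` on its inputs "works" with both subclasses, which is exactly why the differences tend to show up late and far from where the subclass was created.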
I think we have consensus that for many users subclassing is a bad idea, but our docs don't really give much of a hint as to why, or what to do instead. We have long had the answers to these questions, so this is a start at actually writing them down. Addresses some of the points in gh-20998.
Issue with current documentation:
I was reading the wonderful Interoperability with NumPy guide introduced in #20185 (thanks!) and it does a very good job of explaining what the user can do, by (a) giving detailed explanations of what the different methods do and (b) showing examples from popular libraries that apply such methods.
However, if one dives deep enough there seems to be some conflicting information. For example, NEP 18 (2018) says:
However, NEP 37 (2019) recalls:
Therefore, from my understanding (and I'd love to be corrected here if I'm wrong!), even though NEP 18 left the return value of `__array_function__` loosely defined, there was some friction when projects tried to adopt it. I think these findings, which were only possible after putting the code in the hands of users, are extremely valuable for new library authors.
In a way, I believe this goes to the core of something the guide briefly mentions at the beginning (inspiration from the chart at the top of data-apis/array-api#1): the difference between "array providers" (CuPy, pydata/sparse, dask.array) and "array consumers" (xarray), including the gray area between the two (pandas Series consume NumPy arrays but they also offer an array interface, and the same goes for astropy.units).
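As a rough illustration of what "loosely defined" means in practice, here is a minimal sketch of an array provider implementing `__array_function__`. The `DiagonalArray` class, the `HANDLED` registry and the `implements` decorator are hypothetical names invented for this example (loosely following the pattern in the NumPy docs); only `np.sum`, `np.transpose`, `np.mean` and the protocol itself are real NumPy API. Note that nothing in the protocol forces the overrides to return any particular type.

```python
import numpy as np

HANDLED = {}  # hypothetical registry: NumPy function -> our implementation


def implements(numpy_function):
    """Register an implementation of a NumPy function for DiagonalArray."""
    def decorator(func):
        HANDLED[numpy_function] = func
        return func
    return decorator


class DiagonalArray:
    """Hypothetical duck array: an n x n matrix with `value` on the diagonal."""

    def __init__(self, n, value):
        self.n = n
        self.value = value

    def __array_function__(self, func, types, args, kwargs):
        # Defer (return NotImplemented) for anything we do not explicitly handle.
        if func not in HANDLED:
            return NotImplemented
        if not all(issubclass(t, DiagonalArray) for t in types):
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr):
    # Returns a plain Python number, not an array of any kind.
    return arr.n * arr.value


@implements(np.transpose)
def _transpose(arr):
    # Returns another duck array; a consumer cannot assume an ndarray here.
    return DiagonalArray(arr.n, arr.value)


d = DiagonalArray(3, 2.0)
total = np.sum(d)
transposed = np.transpose(d)

# Functions without an override fail with TypeError instead of coercing.
try:
    np.mean(d)
    mean_error = None
except TypeError as exc:
    mean_error = exc

print(total, type(transposed).__name__, type(mean_error).__name__)
```

An array consumer calling `np.sum` or `np.transpose` on an unknown input therefore cannot know what type it will get back without looking at the provider, which I assume is part of the friction mentioned above.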
Idea or request for content:
Expand the current interoperability guide (or add another guide/cookbook/whatever) that addresses practical questions like
(random questions off the top of my head)
If folks are happy to discuss these, either here or on the mailing list, I'd be happy to try to sort out those thoughts and contribute something to the docs myself.