Linear Algebra routines as generalized ufuncs #2954
Conversation
… code from the matrix_multiply in other gufuncs
…ue and the functions are not guaranteed to work with them). also got cholesky working.
…al due to a bug in the harness.
… functions (as is done in the linalg module). Changed some code so that it uses BLAS (the Fortran functions) instead of cblas, in line with what is done in linalg. Modified the matrix multiply code: made it simpler and adapted it to use BLAS (it was using cblas with CblasRowMajor, which is not available in the Fortran interface).
… and needed by our library. It seems to work now on Linux.
…ed from PDL" ufuncs). Only missing from "inspired from PDL" is matrix_multiply3.
Added chosolve and poinv; they need testing.
… also a patched f2c to remove warnings.
…. Some bugs already fixed.
…changed to assignment
The only code that is really duplicated is the blas-lite and lapack-lite source, which is just an f2c version of the standard BLAS/LAPACK routines. Note that this is only used as a fallback when there is no other BLAS/LAPACK provider. As @charris mentioned, this code could be merged (it is mostly a grunt job: regenerating the source with f2c from the union of the functions used by both modules, and modifying the build scripts). Other than that, there are some differences in the way the two modules are implemented that make it hard to use the gufunc kernels in linalg transparently:
It may be possible to implement some linalg functions that cannot raise exceptions via gufunc-linalg. Maybe the differences in error handling are not important compared to the benefits of having the functions broadcast; I guess that is something to be discussed. But I also think it would be a good idea to have the new kernels available for a while before modifying a module that has been working for a long time, and consolidate later after some additional usage.
> linalg uses wrappers to the lapack functions directly, gufunc_linalg uses kernels written in C that call the lapack functions as part of the execution

Obviously, if linalg used the gufuncs as its backend, then this would change. I don't see how it matters how the old code that we'd be throwing away works.

> Error handling behavior would be hard to reproduce

If the gufunc code has bad error handling, then this is just a bug in the gufunc code that should be fixed regardless. I suppose the best fix would be to enhance the general (g)ufunc machinery so that error reporting is possible -- this is a pretty fundamentally useful thing to have! -- but lacking that, a for-now hack you could easily implement is to squirrel the LAPACK error code away in a private TLS variable and check for it afterwards.

> Linalg always uses double precision, independently of precision in input parameters. The gufunc_wrappers use single precision lapack functions

This is exactly the kind of subtle incompatibility BS that makes me opposed to having duplicate APIs! Either linalg is wrong, or the gufunc_wrappers are wrong, and one of them should be fixed. Going forward, this kind of stuff will just crop up repeatedly, as people make enhancements to one or the other, or discover more subtle incompatibilities. Just say no to subtly incompatible APIs.

> Linalg is probably nicer with matrices

Not sure what "probably" means here :-) Surely linalg and the gufunc machinery both call …

> But I also think that it would be a good idea to have the new kernels available for a while before modifying a module that has been working for a while, and consolidate it later after some additional usage

No, I disagree here too. If we break linalg, we'll hear about it very quickly (during the RC cycle for anything even moderately bad), and it'll get fixed. If we stick these functions off in some ghetto, then they'll get minimal testing, and if gufuncs_linalg (or linalg) doesn't work for someone they'll just switch to using linalg (or gufuncs_linalg), and there will end up being people complaining about how they were using one of them specifically because of some weird incompatible quirk (like the single-precision thing) and how dare we fix it, and it will be a nightmare to unravel. This kind of thing just makes more work for maintainers, and we don't have enough maintainers. Have the courage of your convictions! Working code is working code!
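The "squirrel the LAPACK error code away in a private TLS variable" hack mentioned above is a C-level idea; purely as an illustration of the pattern, here is a sketch in Python using `threading.local`. All names here (`record_lapack_info`, `check_lapack_info`) are hypothetical, not part of any NumPy API.

```python
import threading

# Hypothetical sketch of the TLS-error-code hack described above: the
# inner loop records the first nonzero LAPACK "info" code in
# thread-local storage instead of raising mid-loop, and the caller
# checks once after the whole batch has been processed.
_tls = threading.local()

def record_lapack_info(info):
    # Called from the inner loop: remember only the first failure.
    if info != 0 and getattr(_tls, "info", 0) == 0:
        _tls.info = info

def check_lapack_info():
    # Called after the loop: raise exactly once, then reset the slot.
    info = getattr(_tls, "info", 0)
    _tls.info = 0
    if info != 0:
        raise RuntimeError("LAPACK routine failed with info=%d" % info)
```

The point of the pattern is that error detection is decoupled from the loop itself, so a batch of problems can run to completion before any error is reported.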
Error handling is different for a reason. You can solve many problems with a single call, which means that many of them can fail, with different condition codes. The design was: solve all the problems regardless of the error conditions of any single element, and provide a way to detect the problems. If you have an array of, say, 10000 eigenvalue problems and the last one fails, you don't want to throw away 9999 valid results. And then, should it report the first error? Or all the errors? What happens to the valid results? IMO, in order to have "good" error handling there should be some machinery for per-element error handling and correction.

About matrices... you are right, I guess the machinery wraps matrices properly (or at least it should). There is code in linalg to handle wrapping, but that's because it creates its own temporary buffers.

I just want to add: the idea of having this module was two-fold: convenience, as it allows broadcasting to work, and performance, as it eliminates any need for looping in Python code and removes some Python glue code. The library was designed with that in mind, not to replace linalg. In that sense it adds value by adding a new tool, not by replacing an existing one. In fact, the two complement each other: an error in an entry executed via gufunc_linalg (which can be detected) could be diagnosed using the code in linalg, giving the best of both worlds (speed when everything runs fine, better error diagnosis when something goes wrong).
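To make the "don't throw away 9999 valid results" point concrete, here is a minimal Python sketch (not the PR's actual C implementation) of the per-element error semantics being described: a failed slice is marked with NaN while the rest of the batch survives.

```python
import numpy as np

def solve_stacked_nan(a, b):
    # Sketch of gufuncs_linalg-style error semantics: solve every
    # problem in the batch; a slice that fails yields NaN instead of
    # aborting the whole computation.
    out = np.full(b.shape, np.nan)
    for i in range(a.shape[0]):
        try:
            out[i] = np.linalg.solve(a[i], b[i])
        except np.linalg.LinAlgError:
            pass  # leave NaN; the other results stay valid
    return out

a = np.array([[[2.0, 0.0], [0.0, 2.0]],   # well-conditioned
              [[1.0, 1.0], [1.0, 1.0]]])  # singular
b = np.array([[2.0, 4.0], [1.0, 1.0]])
x = solve_stacked_nan(a, b)
# x[0] holds the valid solution; x[1] is all-NaN, flagging the failure
```

A caller can then re-run only the NaN-flagged slices through numpy.linalg to get a proper exception and diagnosis, which is the "best of both worlds" combination described above.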
In thinking about the library: the Numpy tradition is to support a limited number of operations in double precision, while Scipy provides full access to LAPACK. In light of that, I think it would make sense to make the modifications to the ufunc machinery, select among the new C functions to provide basic functionality in Numpy, and move the rest to Scipy, where the presence of the LAPACK library can be assumed. @rgommers @pv What do you think about that?

As to error handling, IMHO it has always been a bit inadequate in the ufunc loops, and if this work leads to improvement, so much the better. It might be worth bringing that up on the list for discussion.

I'd like to get much of this into Numpy without too much hassle. It is difficult to design and implement a perfect system on the first go; usually it works better to get something out there for people to work with and gather some feedback. It is a variant of the release early, release often mantra.
I think in the long run it's important to get rid of the duplication of work between scipy.linalg and numpy.linalg --- there's little point in having the two compete against each other in features (like this one); that's just wasted effort. One possible path to success is to get the versions of the functions in Numpy to feature parity with those in Scipy, and after that just drop the Scipy versions (or make them thin wrappers only, so that the rubber hits the road inside Numpy). So if a way to do single-precision arithmetic is added to the routines in Numpy, that's good. Adding new routines should probably go through the question: is it basic linear algebra?

Having the gufunc versions be completely separate from the numpy.linalg versions is IMHO not very good, as the functional difference is that one of them can operate on a "stacked" set of problems and one cannot. The error handling issue can likely be worked around by defining the generalization from N=1 upwards reasonably (e.g. raise if everything fails, or just drop the concept of LinAlgErrors, as any code relying on them is likely numerically unstable) --- so I don't see fundamental problems with, say, …

As far as this PR is concerned --- the feature itself is certainly something I've been wishing for. However, I don't think the API is yet in its final form. My take would be: +1 for getting it in, but integrating it into numpy.linalg should be considered seriously. So, maybe merge and mark the feature as experimental, and we'll work on from that point?

As to the question whether something should go to Scipy: I think this decision should be made on the basis of which functions are "basic" enough to put in numpy.linalg. There's in principle nothing preventing gufuncs in Scipy, and the f2py-based wrapping of BLAS/LAPACK currently used in Scipy is not an optimal way to go. I'd be happy to see Scipy linalg functions gufuncized.

(I think we can mostly ignore the LAPACK/BLAS issue --- in any serious environment Numpy is linked against a real LAPACK+BLAS combination, and we can always put more stuff into lapack_lite.c if necessary.)
Thanks Pauli. So if this is merged now, the remaining tasks would be:

My own priority would be 1), which is pretty much a requirement before proceeding with 2..4. @ovillellas Oscar, I still don't see where the documentation goes. What is your long-term interest in improving numpy linalg?
@pv doesn't the history of this separation between numpy.linalg and scipy.linalg have to do with Fortran code? So are you proposing to drop only the scipy versions where there is now an existing duplicate C version in numpy? Therefore scipy.linalg will always contain things that won't be available in numpy.linalg, correct? You're also talking about "basic enough". So we have a basic/non-basic and a C/Fortran divide. You're right that the duplication of work now is a waste. It would help me, though, to see an overview at some point of what goes where, for the scenarios of (a) only dropping duplicates, and (b) gufuncizing everything.
My idea was adding an entry to the sphinx documentation for the module, with bits from the gufuncs_linalg_contents.rst file. The idea would be to remove that .rst in favor of the sphinx source. Maybe the more technical details could be bundled as a README that is more suited for development. From the discussion here I gather that the documentation should make the differences with linalg clear, and not sell it as a replacement (which right now it is not). As soon as I submit that I will ping this thread. I do think that in any case getting the docs right is quite important to avoid problems. I was placing a link to the new entry in the reference, inside the routines section, under the entry for the linalg module. I was thinking of naming it something along the lines of "Generalized ufuncs for Linear Algebra", but I am open on how to name it and where to place it.

WRT the question about my long-term interest: we developed this as part of a project and wanted to share it, as others may find use for it. I would help to better integrate the code bases, but will be limited in how much time I can put into it, depending on my workload on other projects.

I just wanted to add that while developing the module there were plenty of ideas around the gufunc interface. One was related to the error handling mentioned here, where exceptions didn't look like a good fit for vector execution of problems (an exception kind of aborts the whole execution even if only a single element fails). But there were other points raised (and these eventually resulted in the creation of the wrappers, instead of it being just a collection of gufuncs in a C module):
@rgommers: the historical reason is probably to keep lapack/blas_lite.c reasonably small. The interface to LAPACK goes through the Fortran ABI anyway, and numpy.linalg functions indeed link against the system LAPACK/BLAS when available. Wrapping all of LAPACK (well, at least the …).

Also yes, I was only proposing to drop the already existing duplicate functions --- this seems like the path of least resistance. There's additional stuff in Scipy like …

EDIT: forgot, expokit is eventually coming to scipy.linalg, so there will indeed be more Fortran code there in the future.
I agree with Pauli that the numpy.linalg/scipy.linalg distinction is not …

Regarding single vs. double precision, I also think that if single …

Regarding error handling: it would indeed be excellent to have richer error …

I take Chuck's point about not trying to be perfect on the first go, but if …

Release early, release often is awesome, but we still have to think about …

This will all be much nicer if we can (1) make sure that this code is …

To show how close we are, here (I believe) is a complete, strictly …

def err(_args): …

This is truly trivial, and we could easily phase out the compatibility …
@njsmith: agreed on the API stability --- your prophecy rings true. Getting as much as possible into numpy.linalg would go with the idea that there should be one --- and preferably only one --- obvious way to do things. To keep the ball rolling:
So, something like this seems possible: pv/numpy@a344d47...pv:linalg-gu

Here, I punted on the float32 vs. float64 thing, and we may want to switch to raising warnings instead of LinAlgErrors (at least for inversion), but things seem robust. The test suite passes, for whatever that is worth. If we want to improve the error handling mechanism, e.g. raising custom error types, then I think an extension of the current errobj mechanism will be a soft point for pushing more functionality in. Overall, it seems to me that this functionality will be very easy to discover and understand if it's put in numpy.linalg.
@pv: Keep in mind that gufuncs_linalg.py is itself a wrapper of _umath_linalg, the C extension module that implements the functionality. In many cases it is a straight call to the underlying C function, but I decided to make each a full function so that the docstrings for all functionality were in a single Python file. So if the functionality really ends up in linalg, maybe gufuncs_linalg should just go away, and the wrapping it provides be placed in linalg itself. Or just keep it for the different exception semantics. Another option would be to allow changing the error behavior, either by a keyword parameter in the functions or by setting some variable in the module...
@ovillellas: yes, I think gufuncs_linalg.py shouldn't be kept around when the same features are in numpy.linalg. I think it is not too difficult to collapse everything into numpy.linalg while maintaining backward compatibility. How exactly the error handling should be done can be discussed. One option indeed is to add global state, e.g. …

This is perhaps partly orthogonal to the generalized ufunc implementation, as also for single matrices it can be useful to signal errors with NaNs rather than LinAlgErrors. One possibility is to first merge the functionality into numpy.linalg raising LinAlgErrors, and then think about how to allow users to e.g. convert errors to warnings, or ignore them altogether.
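For context on the "add global state" idea: NumPy already has per-context error-state machinery for floating-point errors, and the discussion here is about hooking linalg errors into something similar. Extending it to LinAlgErrors is hypothetical; the example below only shows the existing mechanism.

```python
import numpy as np

# The existing errstate machinery that linalg error control could hook
# into: floating-point error handling is configurable per context, so
# invalid results propagate as inf/NaN instead of raising or warning.
with np.errstate(divide='ignore', invalid='ignore'):
    r = np.array([1.0, 0.0]) / np.array([0.0, 0.0])
# r is [inf, nan]; no warning was emitted inside the context
```

The same design (a context manager or global setting switching between "raise", "warn", and "propagate NaN") is what is being proposed for linalg errors.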
@charris, @njsmith: what do you think? Merge now, maybe cherry-picking 1a482c7 and df6fe13, and then do the integration with numpy.linalg, sorting out how to get rid of LinAlgErrors and the casting to doubles in separate PRs? I have _umath_linalg-based backwards-compatible (or so I hope :) versions of the common routines in the branch.
@pv I'm inclined to merge early, but I think it would be good to open a task, or list of tasks, for what remains. AFAICT, that is to unify the *_lite libraries, decide on and implement the error handling, and use the new routines in linalg as far as possible. It might be a good idea to sort out the error handling before this goes in. However you want to get your work in works for me; either cherry-picking or a follow-on PR would be fine.
@pv: Is it an option now to just merge your branch and this together? Would …

IIUC this would leave master in a working state where the core parts of …
@njsmith: Yes, it was just replacing the core parts in numpy.linalg with the umath_linalg routines where applicable, keeping the old semantics for ndim <= 2, so nothing fancy. (The previous single vs. double behavior is handled by converting single->double before processing and then back double->single.) As far as I know, backward compatibility is preserved as long as it's about arrays with ndim <= 2. So we could go at it incrementally. The test suite passes, but it should still be tested against some Numpy-using packages; I didn't do that yet. Some 3rd-party code might in principle depend on errors being raised for ndim > 2 input, but I think we can just go and break this.
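The single->double compatibility behavior described above can be sketched as follows. This is illustrative only; `solve_compat` is a made-up name, not the branch's actual code.

```python
import numpy as np

def solve_compat(a, b):
    # Sketch of the described compatibility shim: upcast single-
    # precision input to double, compute in double precision, then
    # cast the result back so callers see the dtype they passed in.
    single = a.dtype == np.float32
    x = np.linalg.solve(a.astype(np.float64), b.astype(np.float64))
    return x.astype(np.float32) if single else x
```

This preserves the old "always compute in double" semantics while still returning a single-precision result for single-precision input.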
The general rule is that we don't worry about error -> non-error …

So maybe the best approach is for Pauli to go ahead and submit his branch …
@njsmith Makes sense. I guess the discussion should be linked somehow; just adding a pointer to the new pull request will suffice. Pauli's changes just modify linalg itself, right?

Just a note: gufuncs_linalg has some extra functions not present in linalg, so it may be a good idea to add those as well. I am thinking of chosolve (solve using the Cholesky decomposition, for Hermitian positive-definite matrices).
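For readers unfamiliar with it, chosolve amounts to the following. This is a naive sketch using dense solves where a real implementation would use triangular substitutions (LAPACK's potrf/potrs); the function name matches the one discussed, but the body is not the PR's code.

```python
import numpy as np

def chosolve(a, b):
    # Solve a @ x = b for Hermitian positive-definite a via the
    # Cholesky factorization a = L @ L^H, i.e. two triangular solves.
    L = np.linalg.cholesky(a)
    y = np.linalg.solve(L, b)              # forward substitution
    return np.linalg.solve(L.conj().T, y)  # back substitution
```

For matrices known to be positive definite this is roughly twice as fast as a general LU-based solve, which is why it is worth exposing separately.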
@ovillellas: Pauli's changes are here, you can look for yourself :-) Looks like they also add an _ to the beginning of the name of …

chosolve sounds like an excellent addition indeed...
Closing this to keep the PR list tidy (or tidier...).
> … point status will be set. Error handling can be configured for this cases using np.seterr.
>
> Additional functions some fused arithmetic, useful for efficient operation over …
Unfinished sentence (also missing verb, typo in "additional")
This pull request contains a module with some linear algebra functions implemented as generalized ufuncs. It is implemented using LAPACK/BLAS, and it is configured to compile with the same LAPACK/BLAS configuration as numpy.linalg. It also includes some extras, like some fused arithmetic ufuncs (multiply_add and similar).

More information can be found in gufuncs_linalg_contents.rst.
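As an illustration of what "implemented as generalized ufuncs" buys: a gufunc declares a core signature and broadcasts over all leading dimensions. Today's np.matmul (which landed in NumPy after this PR) behaves exactly this way.

```python
import numpy as np

# np.matmul is a generalized ufunc with core signature
# (n?,k),(k,m?)->(n?,m?); the leading "stack" dimensions broadcast,
# so a whole batch of matrix products is computed in one call.
rng = np.random.default_rng(0)
a = rng.random((10, 3, 4))   # stack of ten 3x4 matrices
b = rng.random((10, 4, 2))   # stack of ten 4x2 matrices
c = np.matmul(a, b)          # ten products, no Python loop
```

The same stacking behavior is what the linalg gufuncs in this PR provide for solve, inv, cholesky, and the rest.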