ENH, MAINT: Refactor `PyArray_InnerProduct` to use `PyArray_MatrixProduct2` #6968

jakirkham · 2016-01-07T21:20:13Z

Related: #6932
Related: #6977
Related: #6986
Related: #6987
Related: #6988

This follows @seberg's suggestion ( #6948 (comment) ) and does the simplest thing: it refactors `PyArray_InnerProduct` to use `PyArray_MatrixProduct2`. As a consequence, `np.inner` will see a speedup when the problem contains products of the form `a @ a.T` and `a.T @ a`, since these are already optimized in `cblas_matrixproduct` as of #6932.
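For readers less familiar with `np.inner`, here is a minimal sketch of how it lines up with the two product forms mentioned above for 2D input. The array `a` and the variable names are made up for illustration and do not come from the PR.

```python
import numpy as np

# Hypothetical 2D input, chosen only for illustration.
a = np.arange(12.0).reshape(3, 4)

# inner contracts the last axis of both operands, so for 2D arrays:
gram_rows = np.inner(a, a)      # same values as a @ a.T, shape (3, 3)
gram_cols = np.inner(a.T, a.T)  # same values as a.T @ a, shape (4, 4)

print(np.allclose(gram_rows, a @ a.T))
print(np.allclose(gram_cols, a.T @ a))
```

Both results are symmetric, which is the kind of product the description says the earlier PR optimized.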

Todo:

- [x] Refactor so that `PyArray_InnerProduct` just calls `PyArray_MatrixProduct2`.
- [x] Add more tests of different dimensions for `np.inner` to make sure it still behaves correctly.
- [x] Add benchmarks so that cases where `np.inner` can now see a speedup are shown vs those it can't as was done with `np.dot`.
- [x] Add test for type mismatch exception.

seberg · 2016-01-07T21:30:15Z

Yeah, that is what I mean. I will let someone else decide further, but I think the transpose may need to be a bit more complicated. To figure that out and test it, can you add a test for multiplying arrays larger than 2D?

jakirkham · 2016-01-07T21:40:22Z

My understanding is that this function (cblas_innerproduct) isn't intended for public consumption, as it only works on arrays that are 2D or smaller, just like cblas_matrixproduct. Instead, there is PyArray_InnerProduct, which is intended for use in the C API and is bound to np.inner in the Python API.

This only gets called in one place (numpy/numpy/core/src/multiarray/multiarraymodule.c, lines 848 to 854 in eeba2cb):

    #if defined(HAVE_CBLAS)
        if (PyArray_NDIM(ap1) <= 2 && PyArray_NDIM(ap2) <= 2 &&
                (NPY_DOUBLE == typenum || NPY_CDOUBLE == typenum ||
                 NPY_FLOAT == typenum || NPY_CFLOAT == typenum)) {
            return cblas_innerproduct(typenum, ap1, ap2);
        }
    #endif

As you can see, we are guaranteed to only have arrays with dimension 2 or less inside cblas_innerproduct.

seberg · 2016-01-07T21:52:07Z

Ah, right. Frankly, I would much prefer if we can do it for PyArray_InnerProduct making it just call PyArray_MatrixProduct2. Then remove the whole cblas_innerproduct function. Though I am not sure it makes sense, my guess would be it does.

jakirkham · 2016-01-07T21:54:38Z

It looks like there are some einsum tests that also give inner 3D arrays, but I don't know that this is the best way to test inner on these cases by comparing it to the much more complex einsum. I can certainly add some. ( https://github.com/jakirkham/numpy/blob/e72e1510d892ce4464cf102000e89582327953a0/numpy/core/tests/test_einsum.py#L262-L275 ).

Other things this probably needs are some benchmarks as I am claiming one can now get a speedup with inner in some special cases.

jakirkham · 2016-01-07T21:59:17Z

> Ah, right. Frankly, I would much prefer if we can do it for PyArray_InnerProduct making it just call PyArray_MatrixProduct2. Then remove the whole cblas_innerproduct function. Though I am not sure it makes sense, my guess would be it does.

It is probably removable, but what I liked about your original proposal (at least how I read it) is that everything stays pretty much the same in terms of how it all builds. Also, how many functions there are, which ones they are, and where they live all remain the same. As soon as we start messing with that, we open ourselves up to spending time hunting down potentially weird build errors. At present, this just works.

seberg · 2016-01-07T22:02:53Z

Well, look at MatrixProduct2 ;), it is a monster of annoying stuff and all that is needed to call it from inner is transposing the last two axes of op2 (if op2.ndim >= 2). The added complexity is only that the transpose is more complicated because you cannot pass in NULL. Other than that, you just remove more complexity. Note that the function you changed can be removed completely, it is not public.
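The relationship described here can be checked from Python. The following is a rough sketch, not the C implementation: `inner_via_dot` is a hypothetical helper name, and the shapes are arbitrary. It only covers operands with at least one dimension; 0-d operands are left out.

```python
import numpy as np

def inner_via_dot(a, b):
    """Hypothetical helper: compute inner(a, b) through dot.

    dot contracts the last axis of `a` with the second-to-last axis of `b`,
    while inner contracts the last axis of both. Swapping the last two axes
    of `b` (only when it has at least two) bridges the difference.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if b.ndim >= 2:
        b = np.swapaxes(b, -1, -2)
    return np.dot(a, b)

rng = np.random.default_rng(0)
# Arbitrary shape pairs that share the same last-axis length.
shape_pairs = [((4,), (4,)), ((3, 4), (4,)), ((4,), (5, 4)),
               ((3, 4), (5, 4)), ((2, 3, 4), (5, 6, 4))]
checks = []
for shape_a, shape_b in shape_pairs:
    a = rng.standard_normal(shape_a)
    b = rng.standard_normal(shape_b)
    checks.append(np.allclose(inner_via_dot(a, b), np.inner(a, b)))

print(checks)
```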

jakirkham · 2016-01-07T22:19:06Z

It would be nice if there was a PyArray_RollAxis.

seberg · 2016-01-07T22:30:13Z

Well, I don't know, you probably have to go way back to figure that one out ;). That is why I would like it refactored away. You save a single transpose (array creation) for all I figure.

jakirkham · 2016-01-07T22:35:37Z

Sorry, I realized I was asking a question that was basically becoming "why do we have numpy.dot and numpy.inner?" and decided I should just be quiet lest this gets derailed.

jakirkham · 2016-01-07T22:39:44Z

BTW, did you see this ( #5859 )?

jakirkham · 2016-01-08T17:30:19Z

So, if I make this simple change ( jakirkham@2288e34 ), I get a segmentation fault in the test suite. Is there something that I am doing wrong here? I'm not very familiar with the C API so I wouldn't be surprised if I am. I just need a few pointers.

seberg · 2016-01-08T17:59:36Z

Can you do me the favor and try to refactor all of PyArray_Inner? At least to me doing the transpose specifically in this subfunction seems half baked. Doing the full transpose should not be too difficult, see for example https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L135 for a "complex" Transpose operation, but it gives you the idea that you can just iterate and set all ndim (if ndim >= 2), and then switch the last two ndim around.
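To make the "set all ndim, then switch the last two" idea concrete, here is a small Python analogue. It is a sketch only: `swap_last_two_axes_perm` is a hypothetical name, and the real change works on the C permutation structure rather than a Python list. It also shows why a default transpose (no explicit axes, which reverses every axis) is not a substitute once arrays have more than two dimensions.

```python
import numpy as np

def swap_last_two_axes_perm(ndim):
    """Hypothetical helper: permutation that swaps only the last two axes."""
    perm = list(range(ndim))  # start from the identity permutation
    if ndim >= 2:
        perm[-1], perm[-2] = perm[-2], perm[-1]
    return perm

b = np.arange(24.0).reshape(2, 3, 4)
perm = swap_last_two_axes_perm(b.ndim)

explicit = np.transpose(b, perm)  # only the last two axes are swapped
default = np.transpose(b)         # all axes are reversed

a = np.arange(20.0).reshape(5, 4)
result = np.dot(a, explicit)      # contracts the shared length-4 axis

print(perm, explicit.shape, default.shape)
print(np.allclose(result, np.inner(a, b)))
```

For 2D input the two transposes coincide, which is presumably why the distinction only matters once the refactor moves to the N-dimensional `PyArray_InnerProduct`.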

jakirkham · 2016-01-08T18:02:18Z

So, I actually did that too. Though I have not pushed it to the PR yet.

I am struggling with a segmentation fault there, as well. Here is the commit ( jakirkham@8f5464c ). Any pointers on why the segmentation fault occurs in either case would be helpful.

jakirkham · 2016-01-10T00:10:19Z

Tried to add some tests for the exception, but they fail only with USE_DEBUG=1 due to some issue with the exception. Not really sure what is going on there. Thought it might be related to this ( http://bugs.python.org/issue20500 ), but that is supposed to be fixed. Any pointers anybody has would be helpful. Maybe related to this ( #6741 ), but I doubt it.

jakirkham · 2016-01-10T02:28:29Z

So, I have removed the changed exception, as it appears to be somewhat controversial, and moved it to a different PR ( #6987 ) with exception tests. I have also removed the exception tests, as they have issues that I cannot figure out at the moment, and placed them in another PR ( #6988 ). If we can get these working, I am willing to combine them back into this PR. However, I don't want this PR's fate to be determined by trying to come up with acceptable solutions to these more minor issues.

jakirkham · 2016-01-10T18:35:05Z

> Looks good to me, did you check there are tests for 3-dim arrays, or could you add them if they don't exist?

There are some, but they are comparing einsum to inner. Though we probably should have some tests for inner that don't rely on another function (especially a more complex one) to determine success, which I am adding. Also, probably could use at least one syrk vs. gemm test.
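One way to test `inner` on 3D input without leaning on `einsum` is to compare against plain loops. The sketch below is illustrative and is not the test that was added in the PR; `inner_reference` and the shapes are made up, and the helper assumes both operands have at least one dimension.

```python
import numpy as np

def inner_reference(a, b):
    """Hypothetical reference: inner product over the last axes via loops."""
    out = np.empty(a.shape[:-1] + b.shape[:-1], dtype=np.result_type(a, b))
    for i in np.ndindex(*a.shape[:-1]):
        for j in np.ndindex(*b.shape[:-1]):
            out[i + j] = sum(a[i + (k,)] * b[j + (k,)]
                             for k in range(a.shape[-1]))
    return out

# 3D operands with a shared last-axis length.
a3 = np.arange(24.0).reshape(2, 3, 4)
b3 = np.arange(40.0).reshape(5, 2, 4)
ok_3d = np.allclose(np.inner(a3, b3), inner_reference(a3, b3))

# 2D operands: the same array twice (the symmetric case) and two
# different arrays (the general case).
a2 = np.arange(12.0).reshape(3, 4)
b2 = a2[::-1].copy()
ok_same = np.allclose(np.inner(a2, a2), inner_reference(a2, a2))
ok_diff = np.allclose(np.inner(a2, b2), inner_reference(a2, b2))

print(ok_3d, ok_same, ok_diff)
```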

Yes, they went in the PR ( #6986 ), which is already merged. This PR has been rebased on master after those commits were added so includes them.

jakirkham · 2016-01-11T05:38:29Z

Figured out the issues with type exceptions thanks to help from @seberg and @charris. As they are now working, I have brought that PR ( #6988 ) back into the history of this PR.
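A type mismatch check along these lines might look like the sketch below. It assumes that pairing a Python float with a structured-dtype array is a case where no common type exists and that NumPy reports this as a `TypeError`; the variable names are chosen for illustration.

```python
import numpy as np

c = 1.0
A = np.array((1, 1), dtype='i,i')  # structured dtype, assumed not coercible with float

raised = []
for func in (np.dot, np.inner):
    for args in ((c, A), (A, c)):
        try:
            func(*args)
        except TypeError:
            raised.append(True)   # expected: no common type can be found
        else:
            raised.append(False)  # unexpected: the call succeeded

print(raised)
```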

jakirkham · 2016-01-11T14:37:22Z

Failures on AppVeyor ( #6991 ) had not previously occurred for this PR with exception of the segmentation fault issue (also happened on Travis), which has since been resolved. I believe these to be unrelated to the content of this PR.

jakirkham · 2016-01-11T16:21:31Z

As AppVeyor is merging with master, which is broken, it is currently failing the tests there. So, I ran my own AppVeyor build without this merge (this is rebased on a commit on master before the bad commit) and it looks like it checks out ( https://ci.appveyor.com/project/jakirkham/numpy/build/1.0.4 ).

charris · 2016-01-12T17:33:42Z

Thanks @jakirkham . The cblasfuncs.c could use some style fixes, but that isn't new here.

seberg · 2016-01-12T17:45:44Z

Thanks!

jakirkham · 2016-01-12T19:04:07Z

Thanks everyone.

Alright, @charris, I'll try to look at this at some point soon.

jakirkham · 2016-01-14T15:55:55Z

The benchmark shows the syrk speedup. ( https://pv.github.io/numpy-bench/#bench_linalg.Eindot.time_inner_trans_a_a )
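For a rough local comparison, something like the following could be used. It is a sketch with arbitrary sizes and repeat counts, not the linked benchmark suite, and whether any difference shows up depends on the NumPy build and the BLAS library it links against.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((300, 300))
b = a.copy()  # same values, but a distinct array object

# Same operand twice: the case the thread says can now take the faster path.
t_same = timeit.timeit(lambda: np.inner(a, a), number=10)
# Two distinct operands: the general matrix product case.
t_diff = timeit.timeit(lambda: np.inner(a, b), number=10)

print(f"inner(a, a): {t_same:.4f} s, inner(a, b): {t_diff:.4f} s")
```

Timings from a snippet like this are noisy, so the benchmark page linked above remains the better reference.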

mhvk · 2016-01-14T16:00:39Z

Very nice!

jakirkham · 2016-01-16T22:45:21Z

An attempt at fixing the style errors, @charris. ( #7038 )

jakirkham mentioned this pull request Jan 7, 2016

np.innerproduct(a, a) should be detected and dispatched to the BLAS syrk routine #6948

Closed

jakirkham force-pushed the optimize_innerproduct branch from 770b9bc to 9e8a47f Compare January 7, 2016 21:36

jakirkham changed the title ~~MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct~~ ENH, MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct Jan 8, 2016

jakirkham force-pushed the optimize_innerproduct branch 12 times, most recently from 6a4903e to d2ea634 Compare January 8, 2016 17:26

jakirkham force-pushed the optimize_innerproduct branch from d2ea634 to 81d4275 Compare January 8, 2016 17:55

jakirkham changed the title ~~ENH, MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct~~ WIP, ENH, MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct Jan 8, 2016

jakirkham force-pushed the optimize_innerproduct branch 4 times, most recently from a5220ef to c616a02 Compare January 9, 2016 23:29

jakirkham force-pushed the optimize_innerproduct branch from c616a02 to 0c708d8 Compare January 10, 2016 02:22

jakirkham force-pushed the optimize_innerproduct branch 2 times, most recently from df15b9a to 920b70c Compare January 11, 2016 05:32

jakirkham force-pushed the optimize_innerproduct branch from 920b70c to 2ba0898 Compare January 11, 2016 14:41

jakirkham mentioned this pull request Jan 11, 2016

BUG: Fix dot and inner type/value exception failures #6988

Merged

jakirkham added 5 commits January 11, 2016 19:42

- b491cc8 TST: Ensure dot fails correctly if array types cannot be coerced into a common type.
- 67592d3 TST: Ensure inner fails correctly if array types cannot be coerced into a common type.
- bab118d BUG: Clear error before constructing error message using calls to `PyObject_Repr`. Also, do a better job of handling any errors raised while constructing the error message.
- 88c8a9c MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct.
- 223513a MAINT: Refactor PyArray_InnerProduct so that it just performs a transpose and calls `PyArray_MatrixProduct2`.

jakirkham force-pushed the optimize_innerproduct branch from 2ba0898 to 223513a Compare January 12, 2016 00:42

charris added a commit that referenced this pull request Jan 12, 2016

Merge pull request #6968 from jakirkham/optimize_innerproduct

eb271a5

ENH, MAINT: Refactor `PyArray_InnerProduct` to use `PyArray_MatrixProduct2`

charris merged commit eb271a5 into numpy:master Jan 12, 2016

jakirkham deleted the optimize_innerproduct branch January 12, 2016 19:04
