Skip to content

ENH, MAINT: Refactor PyArray_InnerProduct to use PyArray_MatrixProduct2 #6968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 12, 2016

Conversation

jakirkham
Copy link
Contributor

Fixes #6948

Related: #6932
Related: #6977
Related: #6986
Related: #6987
Related: #6988

This follows @seberg's suggestion ( #6948 (comment) ) and does the simplest thing. Namely, refactors PyArray_InnerProduct to use PyArray_MatrixProduct2. As a consequence, np.inner will see a speedup in cases where the problem contains portions like these a @ a.T and a.T @ a as this already is optimized in cblas_matrixproduct because of this PR ( #6932 ).


Todo: - [x] Refactor so that `PyArray_InnerProduct` just calls `PyArray_MatrixProduct2`. - [x] Add more tests of different dimensions for `np.inner` to make sure it still behaves correctly. - [x] Add benchmarks so that cases where `np.inner` can now see a speedup are shown vs those it can't as was done with `np.dot`. - [x] Add test for type mismatch exception.

@seberg
Copy link
Member

seberg commented Jan 7, 2016

Yeah, that is what I mean. I will let someone else decide further, but I think the transpose may need to be a bit more complicated. To figure that out/test it, can you add a test for multiplying larger then 2D arrays?

@jakirkham jakirkham force-pushed the optimize_innerproduct branch from 770b9bc to 9e8a47f Compare January 7, 2016 21:36
@jakirkham
Copy link
Contributor Author

My understanding is that this isn't intended for public consumption as it only works on 2D arrays or smaller just as cblas_matrixproduct. Instead, there is a PyArray_InnerProduct, which is intend for use in the C API and is bound to np.inner in the Python API.

This only gets called in one place (

#if defined(HAVE_CBLAS)
if (PyArray_NDIM(ap1) <= 2 && PyArray_NDIM(ap2) <= 2 &&
(NPY_DOUBLE == typenum || NPY_CDOUBLE == typenum ||
NPY_FLOAT == typenum || NPY_CFLOAT == typenum)) {
return cblas_innerproduct(typenum, ap1, ap2);
}
#endif
). As you can see, we are ensured to only have arrays with dimension 2 or less inside cblas_innerproduct.

@seberg
Copy link
Member

seberg commented Jan 7, 2016

Ah, right. Frankly, I would much prefer if we can do it for PyArray_InnerProduct making it just call PyArray_MatrixProduct2. Then remove the whole cblas_innerproduct function. Though I am not sure it makes sense, my guess would be it does.

@jakirkham
Copy link
Contributor Author

It looks like there are some einsum tests that also give inner 3D arrays, but I don't know that this is the best way to test inner on these cases by comparing it to the much more complex einsum. I can certainly add some. ( https://github.com/jakirkham/numpy/blob/e72e1510d892ce4464cf102000e89582327953a0/numpy/core/tests/test_einsum.py#L262-L275 ).

Other things this probably needs are some benchmarks as I am claiming one can now get a speedup with inner in some special cases.

@jakirkham
Copy link
Contributor Author

Ah, right. Frankly, I would much prefer if we can do it for PyArray_InnerProduct making it just call PyArray_MatrixProduct2. Then remove the whole cblas_innerproduct function. Though I am not sure it makes sense, my guess would be it does.

It is probably removable, but what I liked about your original proposal (at least how I read it) is everything stays pretty much the same in terms of how everything builds. Also, how many, which, and where functions are remains the same. As soon as we start messing with that, we open ourselves up to spending time hunting down potentially weird build errors. At present, this just works.

@seberg
Copy link
Member

seberg commented Jan 7, 2016

Well, look at MatrixProduct2 ;), it is a monster of annoying stuff and all that is needed to call it from inner is transposing the last two axes of op2 (if op2.ndim >= 2). The added complexity is only that the transpose is more complicated because you cannot pass in NULL. Other then that, you just remove more complexity. Note that the function you changed can be removed completely, it is not public.

@jakirkham
Copy link
Contributor Author

It would be nice if there was a PyArray_RollAxis.

@seberg
Copy link
Member

seberg commented Jan 7, 2016

Well, I don't know, you probably have to go way back to figure that one out ;). That is why I would like it refactored away. You save a single tranpose (array creation) for all I figure.

@jakirkham
Copy link
Contributor Author

Sorry, I realized I was asking a question that was basically becoming "why do we have numpy.dot and numpy.inner?" and decided I should just be quiet lest this gets derailed.

@jakirkham
Copy link
Contributor Author

BTW, did you see this ( #5859 )?

@jakirkham jakirkham changed the title MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct ENH, MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct Jan 8, 2016
@jakirkham jakirkham force-pushed the optimize_innerproduct branch 12 times, most recently from 6a4903e to d2ea634 Compare January 8, 2016 17:26
@jakirkham
Copy link
Contributor Author

So, if I make this simple change ( jakirkham@2288e34 ), I get a segmentation fault in the test suite. Is there something that I am doing wrong here? I'm not very familiar with the C API so I wouldn't be surprised if I am. I just need a few pointers.

@jakirkham jakirkham force-pushed the optimize_innerproduct branch from d2ea634 to 81d4275 Compare January 8, 2016 17:55
@seberg
Copy link
Member

seberg commented Jan 8, 2016

Can you do me the favor and try to refactor all of PyArray_Inner? At least to me doing the transpose specifically in this subfunction seems half baked. Doing the full transpose should not be too difficult, see for example https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L135 for a "complex" Transpose operation, but it gives you the idea that you can just iterate and set all ndim (if ndim >= 2), and then switch the last two ndim around.

@jakirkham
Copy link
Contributor Author

So, I actually did that too. Though I have not pushed it to the PR yet.

I am struggling with a segmentation fault there, as well. Here is the commit ( jakirkham@8f5464c ). Any pointers on why the segmentation fault occurs in either case would be helpful.

@jakirkham jakirkham changed the title ENH, MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct WIP, ENH, MAINT: Refactor cblas_innerproduct to use cblas_matrixproduct Jan 8, 2016
@jakirkham jakirkham force-pushed the optimize_innerproduct branch 4 times, most recently from a5220ef to c616a02 Compare January 9, 2016 23:29
@jakirkham
Copy link
Contributor Author

Tried to add some tests for the exception, but they fail USE_DEBUG=1 only due to some issue with the exception. Not really, sure what is going on there. Thought it might be related to this ( http://bugs.python.org/issue20500 ), but that is suppose to be fixed. Any pointers anybody has would be helpful. Maybe related to this ( #6741 ), but I doubt it.

@jakirkham jakirkham force-pushed the optimize_innerproduct branch from c616a02 to 0c708d8 Compare January 10, 2016 02:22
@jakirkham
Copy link
Contributor Author

So, I have removed the changed exception as it appears to be somewhat controversial and it is in a different PR ( #6987 ) with exception tests. I have also removed the exceptions tests as they seem to have issues that at the moment I cannot seem to figure out and placed them in another PR ( #6988 ). If we can get these working, I am willing to combine them back into this PR. However, I don't want this PR's fate to be determined by trying to come up with acceptable solutions to these more minor issues.

@jakirkham
Copy link
Contributor Author

Looks good to me, did you check there are tests for 3-dim arrays, or could you add them if they don't exist?

There are some, but they are comparing einsum to inner. Though we probably should have some tests for inner that don't rely on another function (especially a more complex one) to determine success, which I am adding. Also, probably could use at least one syrk vs. gemm test.

Yes, they went in the PR ( #6986 ), which is already merged. This PR has been rebased on master after those commits were added so includes them.

@jakirkham jakirkham force-pushed the optimize_innerproduct branch 2 times, most recently from df15b9a to 920b70c Compare January 11, 2016 05:32
@jakirkham
Copy link
Contributor Author

Figured out the issues with type exceptions thanks to help from @seberg and @charris. As they are now working, I have brought that PR ( #6988 ) back into the history of this PR.

@jakirkham
Copy link
Contributor Author

Failures on AppVeyor ( #6991 ) had not previously occurred for this PR with exception of the segmentation fault issue (also happened on Travis), which has since been resolved. I believe these to be unrelated to the content of this PR.

@jakirkham
Copy link
Contributor Author

As AppVeyor is merging with master, which is broken, it is currently failing the tests there. So, I ran my own AppVeyor build without this merge (this is rebased on a commit on master before the bad commit) and it looks like it checks out ( https://ci.appveyor.com/project/jakirkham/numpy/build/1.0.4 ).

@jakirkham jakirkham force-pushed the optimize_innerproduct branch from 2ba0898 to 223513a Compare January 12, 2016 00:42
charris added a commit that referenced this pull request Jan 12, 2016
ENH, MAINT: Refactor `PyArray_InnerProduct` to use `PyArray_MatrixProduct2`
@charris charris merged commit eb271a5 into numpy:master Jan 12, 2016
@charris
Copy link
Member

charris commented Jan 12, 2016

Thanks @jakirkham . The cblasfuncs.c could use some style fixes, but that isn't new here.

@seberg
Copy link
Member

seberg commented Jan 12, 2016

Thanks!

@jakirkham
Copy link
Contributor Author

Thanks everyone.

Alright, @charris, I'll try to look at this at some point soon.

@jakirkham jakirkham deleted the optimize_innerproduct branch January 12, 2016 19:04
@jakirkham
Copy link
Contributor Author

The benchmark shows the syrk speedup. ( https://pv.github.io/numpy-bench/#bench_linalg.Eindot.time_inner_trans_a_a )

@mhvk
Copy link
Contributor

mhvk commented Jan 14, 2016

Very nice!

@jakirkham
Copy link
Contributor Author

An attempt at fixing the style errors, @charris. ( #7038 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants