-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Add support for 64 bit indices in CSR array normalization #9663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Please add to what's new... ideally listing which estimators can now handle the large CSR data, but perhaps then we need a test for them. Hmm.
@jnothman Thanks for the review! Maybe we could add the "what's new" entry once we actually have a mechanism for testing estimators with 64bit csr as you suggested here #2969 (comment) . Working on that.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1 as well for waiting to use that in actual estimator before adding an entry in changelog.
@jnothman Please let me know if you have other comments on this PR. Thanks! |
Okay. You just make sure the changelog is appropriately attributed, etc. Thanks again @rth |
This adds fused type to support 64 bit indices in
csr_row_norms
,inplace_csr_row_normalize_l1
andinplace_csr_row_normalize_l2
.Fixes #6468 andbegins to adress #2969 .In this PR, for CSR array indices I define a fused
integral
type (int
,long long
) which should produce either 32 bit and 64 bit respectively on Linux / Windows and 64bit / 32bit platforms as far as I understand. The array indices variables are alwaysnp.npy_intp
, while the array shapes(M, N)
arelong long
to support up to 64 bit array sizes. Please let me know if I missed something.cc @jnothman @lesteve @ogrisel
Also tested that the performance is not affected (with 32bit CSR indices) using the following script (needs to be run with ipython),