[MRG] Downcast large matrix indices where possible in sparsefuncs._minor_reduce (fix #13737) #13741
@@ -341,6 +341,11 @@ def inplace_swap_column(X, m, n):


def _minor_reduce(X, ufunc):
    major_index = np.flatnonzero(np.diff(X.indptr))

    # reduceat tries to cast X.indptr to intp, which errors
    # if it is int64 on a 32-bit system.
    # Reinitializing prevents this where possible, see #13737
    X = type(X)((X.data, X.indices, X.indptr), shape=X.shape)
Reviewer comment: So what does this reinitialisation actually do? Will it downcast indptr if X.shape is less than int32.max?

Author reply: Yes, precisely. The logic is in …

Reviewer comment: And in other cases is this a non-copying operation?

Author reply: Yes (see …
    value = ufunc.reduceat(X.data, X.indptr[major_index])
    return major_index, value
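A minimal sketch (not part of the PR) of what the reinitialization in the diff does: when a CSR matrix is rebuilt from its `(data, indices, indptr)` arrays, scipy re-selects the smallest safe index dtype for the matrix's shape, so int64 index arrays that fit in int32 are downcast. The specific matrix and dtypes below are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix(np.eye(3))
# Force 64-bit index arrays, as can happen after operations on very large matrices.
X.indices = X.indices.astype(np.int64)
X.indptr = X.indptr.astype(np.int64)

# Reinitializing lets scipy pick the smallest index dtype that can hold
# max(X.shape) and the array contents; here that is int32.
Y = type(X)((X.data, X.indices, X.indptr), shape=X.shape)
print(Y.indptr.dtype)   # int32, since the shape and contents fit
# When no downcast is needed, the data array is reused rather than copied.
print(np.shares_memory(Y.data, X.data))
```

This downcast is what lets `ufunc.reduceat` later cast `X.indptr` to `intp` without erroring on 32-bit systems, as the in-code comment explains.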
Reviewer comment: Better off putting it under preprocessing?
Author reply: Could do, although the code change is in sparsefuncs and will also affect feature_selection.VarianceThreshold once pull #13704 goes through.
Reviewer comment: +1, I believe that in this case the two classes potentially impacted by this bug are LabelBinarizer and MaxAbsScaler.
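To see how MaxAbsScaler reaches this code path, note that on sparse input it computes per-column maximum absolute values via sparsefuncs, which goes through `_minor_reduce` for CSR matrices. A small sketch (the input matrix is an illustrative assumption, not from the PR):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import MaxAbsScaler

# Per-column max absolute values are 2.0 and 4.0.
X = sp.csr_matrix(np.array([[1.0, -4.0], [2.0, 3.0]]))

# On sparse input, fitting computes column-wise max-abs through the
# sparsefuncs reductions that _minor_reduce implements.
scaled = MaxAbsScaler().fit_transform(X)
print(scaled.toarray())  # [[0.5, -1.0], [1.0, 0.75]]
```

With very large sparse inputs whose index arrays are int64, this is where the pre-fix code would fail on 32-bit platforms.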
Reviewer comment: I had not read the previous comment when writing mine. In the end, I am fine with the current PR.