[MRG] Downcast large matrix indices where possible in sparsefuncs._minor_reduce (fix #13737) #13741

rlms · 2019-04-28T22:30:06Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Currently sparsefuncs.min_max_axis raises a TypeError on a 32-bit machine whenever the input is a csc matrix with int64 indices and axis is 0 (due to a cast to intp in _minor_reduce). This does not necessarily occur for csr matrices, since the conversion to a csc matrix in _min_or_max_axis downcasts the indices when it is safe to do so (i.e. when all values fit in an int32). Reinitialising the input to _minor_reduce should avoid the error, and also the corresponding problem that would occur for csr inputs with axis 1.
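As an illustration of the reinitialisation described above, here is a minimal sketch. It assumes that SciPy's csc_matrix constructor selects the index dtype from the array contents, as discussed in this PR. The helper name `reinitialise` is hypothetical and is not part of scikit-learn; the int64 indices are forced by hand only to mimic the problematic input.

```python
import numpy as np
import scipy.sparse as sp


def reinitialise(X):
    """Rebuild X from its own arrays so SciPy re-selects the index dtype.

    Hypothetical helper that mirrors the one-line change in this PR.
    """
    return type(X)((X.data, X.indices, X.indptr), shape=X.shape)


# Small CSC matrix whose index arrays are forced to int64.
X = sp.csc_matrix(np.array([[0.0, 3.0, 0.0],
                            [2.0, -1.0, 0.0],
                            [0.0, 0.0, 5.0]]))
X.indices = X.indices.astype(np.int64)
X.indptr = X.indptr.astype(np.int64)

X_small = reinitialise(X)
print(X.indptr.dtype, "->", X_small.indptr.dtype)
```

The values are unchanged by the round trip; only the dtype of the index arrays differs when they fit in an int32.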

A test for min_max_axis has been changed to cover the large matrices this affects. As expected, it now fails on 32-bit Windows for the master code base.

Any other comments?

jnothman · 2019-04-28T23:53:15Z

We are working towards a release, so please ping after the release is out in a week or two.

rlms · 2019-05-13T10:35:11Z

@jnothman pinging

jnothman · 2019-05-22T04:08:02Z

sklearn/utils/sparsefuncs.py

+    # reduceat tries to cast X.indptr to intp, which errors
+    # if it is int64 on a 32-bit system.
+    # Reinitializing prevents this where possible, see #13737
+    X = type(X)((X.data, X.indices, X.indptr), shape=X.shape)


So what does this reinitialisation actually do? Will it downcast indptr if X.shape is less than int32.max?

Yes, precisely. The logic is in get_index_dtype.

And in other cases is this a non-copying operation?

Yes (see __init__ for _cs_matrix). That could be made explicit by adding copy=False.
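The non-copying claim can be checked with a small sketch like the one below. It assumes the SciPy matrix constructor accepts copy=False for the (data, indices, indptr) form and reuses the input buffers when no dtype conversion is needed. The variable names are illustrative only.

```python
import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 0.0, 3.0]]))

# Index arrays already have the preferred dtype, so nothing needs converting.
Y = type(X)((X.data, X.indices, X.indptr), shape=X.shape, copy=False)
data_shared = np.shares_memory(X.data, Y.data)

# With int64 indices the index arrays must be converted, which allocates
# new arrays for them.
X64 = X.copy()
X64.indices = X64.indices.astype(np.int64)
X64.indptr = X64.indptr.astype(np.int64)
Y64 = type(X64)((X64.data, X64.indices, X64.indptr),
                shape=X64.shape, copy=False)
indices_shared = np.shares_memory(X64.indices, Y64.indices)

print(data_shared, indices_shared)
```

So the reinitialisation only pays for a copy of the index arrays in the case where a downcast actually happens.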

jnothman

Is there any reason it's still marked WIP?

If it should be MRG, I'm happy to try to include this in release 0.21.2 (due imminently) if another reviewer approves.

Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:. We might end up moving this entry if the release ships sooner.

jnothman · 2019-05-23T01:08:47Z

The what's new entry should ideally mention the public API (e.g. estimators) that this affects.

jnothman · 2019-05-23T11:53:30Z

doc/whats_new/v0.21.rst

@@ -26,6 +26,14 @@ Changelog
  (regression introduced in 0.21) :issue:`13910`
  by :user:`Jérémie du Boisberranger <jeremiedbb>`.

+:mod:`sklearn.utils.sparsefuncs`


Better off putting it under preprocessing?

Could do, although the code change is in sparsefuncs and will also affect feature_selection.VarianceThreshold once PR #13704 is merged.

+1, I believe that in this case the two classes that are potentially impacted by this bug are: LabelBinarizer and MaxAbsScaler.

I had not read the previous comment when writing mine. In the end I am fine with the current PR.

ogrisel

~~Besides the whats new section issue,~~ LGTM.

ogrisel · 2019-05-23T12:37:34Z

@jnothman merge?

jnothman · 2019-05-23T12:55:18Z

Thanks @rlms

rlmacsween and others added 13 commits April 23, 2019 16:47

Changed VarianceThreshold behaviour when threshold is zero. See sciki…

35afc42

…t-learn#13691

Slightly modified change to VarianceThreshold and added test

330d5de

Removed blank lines from end of file

d00257b

Minor changes to new VarianceThreshold behaviour

77bda79

Commented test

4ac4504

Update sklearn/feature_selection/variance_threshold.py

ea02681

Co-Authored-By: rlms <macsweenroddy@gmail.com>

Update sklearn/feature_selection/variance_threshold.py

52ca4f5

Co-Authored-By: rlms <macsweenroddy@gmail.com>

Changed test format

b6fd15a

Merge branch 'variance-threshold-zero-variance' of https://github.com…

661595b

…/rlms/scikit-learn into variance-threshold-zero-variance

Reformatted assertion in test

f23d5ab

Added test

40b0c1a

Rolled back changes from other branch

137c4c6

Uncommented change

35c9b82

jnothman reviewed May 22, 2019

jnothman approved these changes May 23, 2019

rlms changed the title ~~[WIP] Downcast large matrix indices where possible in sparsefuncs._minor_reduce (fix #13737)~~ [MRG] Downcast large matrix indices where possible in sparsefuncs._minor_reduce (fix #13737) May 23, 2019

rlmacsween added 2 commits May 23, 2019 12:03

Merge remote-tracking branch 'upstream/master' into min-max-axis-csc

8dca597

Updated change log

d4e09cf

jnothman reviewed May 23, 2019

jnothman mentioned this pull request May 23, 2019

Release 0.21.2 #13915

Merged

ogrisel approved these changes May 23, 2019

jnothman merged commit dc5e4d8 into scikit-learn:master May 23, 2019

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request May 23, 2019

FIX downcast large matrix indices where possible in sparsefuncs._mino…

e833c92

…r_reduce (fix scikit-learn#13737) (scikit-learn#13741)

rlms deleted the min-max-axis-csc branch May 23, 2019 17:27

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

FIX downcast large matrix indices where possible in sparsefuncs._mino…

8516946

…r_reduce (fix scikit-learn#13737) (scikit-learn#13741)

lesteve mentioned this pull request Sep 26, 2023

Error inside Pyodide with sklearn.utils.sparsefuncs on scipy sparse arrays and int64 indices #27470

Closed
