Description
We use sp.hstack in a number of places in ClassifierChain where we may be stacking sparse with dense, e.g.:
scikit-learn/sklearn/multioutput.py
Line 948 in 36f6734
X_aug = sp.hstack((X, previous_predictions))
and
scikit-learn/sklearn/multioutput.py
Line 693 in 36f6734
X_aug = sp.hstack((X, Y_pred_chain), format="lil")
AFAICT, stacking a sparse matrix with a dense array via sp.hstack gives you a sparse result (even though sp.hstack is not documented to support dense input):
In [34]: import numpy as np
    ...: from scipy.sparse import coo_matrix, hstack
    ...:
    ...: A = coo_matrix([[1, 2], [3, 4]])

In [35]: B = np.zeros((2, 2))

In [36]: hstack([A, B])
Out[36]:
<2x4 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in COOrdinate format>
Maybe due to: https://github.com/scipy/scipy/blob/f990b1d2471748c79bc4260baf8923db0a5248af/scipy/sparse/_construct.py#L654 ?
Should we ensure y is sparse before using sp.hstack?
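For illustration, here is a minimal sketch of what "ensure y is sparse first" could look like. The helper name hstack_as_sparse is hypothetical (it is not an existing scikit-learn or SciPy function), and converting to CSR is just one possible choice of format for the dense block.

```python
import numpy as np
import scipy.sparse as sp


def hstack_as_sparse(X, y_block, format=None):
    """Hypothetical helper: make the dense block sparse before stacking.

    This avoids relying on sp.hstack accepting dense input, which is
    not part of its documented contract.
    """
    if not sp.issparse(y_block):
        # CSR is an arbitrary choice here; any sparse format would do.
        y_block = sp.csr_matrix(y_block)
    return sp.hstack((X, y_block), format=format)


# Toy stand-ins for the sparse X and the dense prediction columns.
X = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
previous_predictions = np.array([[1.0], [0.0]])

X_aug = hstack_as_sparse(X, previous_predictions, format="lil")
```

With this, both operands passed to sp.hstack are sparse, so the result no longer depends on how SciPy happens to handle dense blocks internally.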
I had a quick look at our code and could not find any other cases where we could end up stacking dense + sparse. I think ClassifierChain is unique in that we do not usually combine X with y.
Discussed here: #27700 (comment)
cc @glemaitre