Ensure predictions sparse before sp.hstack in ClassifierChain #27905

@lucyleeow

We use sp.hstack in a number of places in ClassifierChain where we may be stacking sparse with dense, e.g.:

X_aug = sp.hstack((X, previous_predictions))

and

X_aug = sp.hstack((X, Y_pred_chain), format="lil")

AFAICT, stacking a sparse matrix with a dense array via sp.hstack returns a sparse matrix (even though sp.hstack is not documented to support dense input):

In [34]: import numpy as np
    ...: from scipy.sparse import coo_matrix, hstack
    ...: 
    ...: A = coo_matrix([[1, 2], [3, 4]])

In [35]: B = np.zeros((2,2))

In [36]: hstack([A,B])
Out[36]: 
<2x4 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in COOrdinate format>

This may be due to: https://github.com/scipy/scipy/blob/f990b1d2471748c79bc4260baf8923db0a5248af/scipy/sparse/_construct.py#L654

Should we ensure y is sparse before using sp.hstack?
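For illustration, a minimal sketch of what "ensure y is sparse" could look like. The helper name `hstack_sparse` is hypothetical and not part of scikit-learn; the sketch assumes X is already sparse and that the predictions are a 2D array, and the choice of CSR for the conversion is arbitrary:

```python
import numpy as np
import scipy.sparse as sp


def hstack_sparse(X, predictions, format=None):
    """Hypothetical helper: stack sparse X with predictions column-wise.

    Dense predictions are converted explicitly, so sp.hstack only ever
    receives sparse blocks instead of relying on undocumented behavior.
    """
    if not sp.issparse(predictions):
        # Assumes predictions is 2D with shape (n_samples, n_outputs).
        predictions = sp.csr_matrix(predictions)
    return sp.hstack((X, predictions), format=format)


X = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
previous_predictions = np.array([[1.0], [0.0]])  # dense, as in the chain
X_aug = hstack_sparse(X, previous_predictions, format="csr")
```

The explicit conversion keeps the stacking call within what sp.hstack documents, at the cost of one extra sparse construction per call.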

I had a quick look at our code and could not find any other cases where we might stack dense with sparse. I think ClassifierChain is unique in that we do not usually combine X with y.

Discussed here: #27700 (comment)

cc @glemaitre
