fetch_mldata needs to handle sparse matrices as labels #700

amueller · 2012-03-15T16:42:43Z

This might be a special case but the "yeast" data set returns a sparse matrix as labels.
As this is a standard dataset for multi-label prediction, it would be good if we supported that.
Maybe we could even make an example.

amueller · 2012-03-18T17:02:34Z

done in af3b08a

davidmarek · 2012-03-18T17:49:42Z

Hi,
I am trying to get familiar with scikit-learn. I have looked at this bug, and I think the easiest way to handle sparse matrices is to check whether dataset['target'] is sparse and, if so, convert it into a dense matrix.

My fix looks like this:

    # set axes to sklearn conventions
    if transpose_data:
        dataset['data'] = dataset['data'].T
    if 'target' in dataset:
        if issparse(dataset['target']):
            dataset['target'] = dataset['target'].todense()
        dataset['target'] = dataset['target'].squeeze()

Do you think this is the right way to solve this bug? What else is there to do? Add tests?

amueller · 2012-03-18T18:17:53Z

You're just half an hour too late; I already fixed the bug. Sorry about that.

My fix was using squeeze only when the target is not sparse, which should be more efficient than converting to dense if there are many outputs.
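A minimal sketch of the "squeeze only when the target is not sparse" idea, for illustration only. This is not the code from af3b08a; `normalize_target` is a hypothetical helper name, and the small arrays are made-up stand-ins for real targets.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def normalize_target(target):
    """Hypothetical helper: squeeze dense targets, leave sparse ones untouched."""
    if issparse(target):
        # Sparse multi-label targets stay sparse; no dense copy is made.
        return target
    # Dense targets of shape (n, 1) or (1, n) become 1-D arrays.
    return np.asarray(target).squeeze()


# Made-up example data: a dense column of labels and a sparse label matrix.
dense_target = np.array([[0], [1], [1]])
sparse_target = csr_matrix(np.array([[1, 0, 1], [0, 1, 0]]))

squeezed = normalize_target(dense_target)
kept = normalize_target(sparse_target)
```

With many output columns, skipping the dense conversion avoids allocating the full label matrix.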

Let me have a look if I can find anything else that you could give a try.

amueller · 2012-03-18T18:28:13Z

You can try looking into #569 if you like. #615 should be very easy, #558 should be moderate, #615 should also be ok.

davidmarek · 2012-03-18T18:34:58Z

Thanks, I'll look at those issues. I wasn't sure where the labels are used or whether any code depends on getting a dense matrix.

amueller · 2012-03-18T18:38:39Z

Most of the code assumes dense matrices, but not inside the fetch_mldata part. So I thought it would be reasonable to let the user decide what to do once they have the targets.
I think both solutions have pros and cons, and as this really doesn't come up that often, I just went with the first one that came to mind.
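As a rough sketch of what "letting the user decide" could look like on the caller's side, assuming the target comes back as a SciPy sparse matrix: the `target` below is a made-up stand-in, not real yeast data. Note that `toarray()` gives a plain ndarray, whereas `todense()` gives an `np.matrix`, which stays 2-D even after `squeeze()`.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

# Stand-in for a sparse multi-label target (3 samples, 4 labels); not real data.
target = csr_matrix(np.array([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1]]))

if issparse(target):
    # The caller opts into a dense copy; toarray() returns a plain ndarray.
    y = target.toarray()
else:
    y = np.asarray(target)

# Number of active labels for each sample.
labels_per_sample = y.sum(axis=1)
```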

amueller closed this as completed Mar 18, 2012
