Redesign of the sklearn.datasets API #13122

Closed
@daniel-cortez-stevenson

@rth At the API level, I agree the functional programming style is friendlier. The real benefit would come when the sklearn datasets API is extended, since a shared design would allow for easier maintenance and ensure consistent behavior. The proposed pull request would not make sense if the datasets API is 'closed'.

@rth I also agree that a Datasets API is more obviously necessary in deep learning libraries. But, taking a leap of faith here, I'm hoping that, through a thoughtful redesign, we may identify useful extensions of a Dataset class that support the reproducibility and performance of scikit-learn's more notable functionality (Estimators/Transformers).

For example, could extending datasets to act as generators meaningfully improve the memory or time performance of sklearn?
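
To make the generator idea concrete, here is a rough sketch of what I have in mind. It is not a proposal for the actual API: `iter_batches`, `synthetic_rows`, and `RunningMean` are hypothetical names, and `RunningMean` is only a pure-Python stand-in for estimators that expose `partial_fit` (as `SGDClassifier` does). The point is that the consumer only ever holds one batch in memory. Whether this helps in practice would depend on the estimator supporting incremental fitting.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]
Batch = Tuple[List[Sequence[float]], List[int]]


def iter_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Batch]:
    """Lazily group (features, label) rows into (X_batch, y_batch) pairs."""
    X_batch: List[Sequence[float]] = []
    y_batch: List[int] = []
    for features, label in rows:
        X_batch.append(features)
        y_batch.append(label)
        if len(X_batch) == batch_size:
            yield X_batch, y_batch
            X_batch, y_batch = [], []
    if X_batch:  # emit the final, possibly smaller, batch
        yield X_batch, y_batch


class RunningMean:
    """Hypothetical stand-in for an estimator with a partial_fit method."""

    def __init__(self) -> None:
        self.n_samples_ = 0
        self.sums_: List[float] = []

    def partial_fit(self, X, y=None) -> "RunningMean":
        # Update per-feature sums from one batch; earlier batches are not kept.
        for row in X:
            if not self.sums_:
                self.sums_ = [0.0] * len(row)
            for j, value in enumerate(row):
                self.sums_[j] += value
            self.n_samples_ += 1
        return self

    @property
    def mean_(self) -> List[float]:
        return [s / self.n_samples_ for s in self.sums_]


def synthetic_rows(n: int) -> Iterator[Row]:
    """Hypothetical data source; a real one might stream rows from disk."""
    for i in range(n):
        yield (float(i), float(2 * i)), i % 2


model = RunningMean()
for X_batch, y_batch in iter_batches(synthetic_rows(10), batch_size=4):
    model.partial_fit(X_batch, y_batch)
```

The existing loader functions return everything at once, so a design like this would only pay off for datasets that do not fit comfortably in memory.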

I'll take a look at OpenML, and open up a fresh issue to start a high-level conversation. I hope this PR serves to get some brain juices flowing.

Originally posted by @daniel-cortez-stevenson in #13120 (comment)