Description
@rth At the API level, I agree that the functional programming style is friendlier. The real benefit would come when the sklearn datasets API is extended, since it would make maintenance easier and keep behavior consistent across datasets. The proposed pull request would not make sense if the datasets API is considered 'closed', i.e. not expected to grow.
@rth I also agree that a Datasets API is more obviously necessary in deep learning libraries. But, taking a leap of faith here, I'm hoping that, through a thoughtful redesign, we may identify useful extensions of a Dataset class that support the reproducibility and performance of scikit-learn's core functionality (Estimators/Transformers).
For example, could extending datasets to act as generators meaningfully improve the memory usage or runtime of sklearn?
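To make the generator idea concrete, here is a minimal sketch, assuming a hypothetical `ChunkedDataset` class (not part of scikit-learn) that yields `(X, y)` batches lazily. Only one batch is in memory at a time, and a fixed `random_state` makes every pass over the data reproducible. The synthetic data, batch size, and the helper `streaming_feature_mean` are illustrative assumptions, not a proposed API.

```python
import numpy as np


class ChunkedDataset:
    """Hypothetical dataset object that yields (X, y) batches lazily.

    Only one batch is materialized at a time; a fixed seed makes every
    pass over the data reproducible.
    """

    def __init__(self, n_samples, n_features, batch_size=100, random_state=0):
        self.n_samples = n_samples
        self.n_features = n_features
        self.batch_size = batch_size
        self.random_state = random_state

    def __iter__(self):
        # A fresh random generator per pass, so repeated iteration yields identical data.
        rng = np.random.default_rng(self.random_state)
        for start in range(0, self.n_samples, self.batch_size):
            size = min(self.batch_size, self.n_samples - start)
            X = rng.standard_normal((size, self.n_features))
            # Toy binary target derived from the first feature (illustrative only).
            y = (X[:, 0] > 0).astype(int)
            yield X, y


def streaming_feature_mean(dataset):
    """Compute per-feature means in one pass without holding all rows in memory."""
    total = np.zeros(dataset.n_features)
    count = 0
    for X, _ in dataset:
        total += X.sum(axis=0)
        count += X.shape[0]
    return total / count


if __name__ == "__main__":
    ds = ChunkedDataset(n_samples=1050, n_features=3, batch_size=100)
    print(streaming_feature_mean(ds))
```

With scikit-learn, each batch could instead be passed to an estimator that supports incremental learning, for example `SGDClassifier().partial_fit(X, y, classes=[0, 1])`. Whether this pattern improves memory usage or runtime meaningfully for typical sklearn workloads is exactly the open question raised above.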
I'll take a look at OpenML, and open up a fresh issue to start a high level conversation. I hope this PR serves to get some brain juices flowing.
Originally posted by @daniel-cortez-stevenson in #13120 (comment)