Skip to content

GridSearchCV saves all fitted estimator in cv_results['params'] when params are estimators #10063

@fcharras

Description

@fcharras

Description

I use GridSearchCV to optimize the hyperparameters of a pipeline. I set the param grid by inputing transformers or estimators at different steps of the pipeline, following the Pipeline documentation:

A step’s estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting to None.

I couldn't figure why dumping cv_results_ would take so much memory on disk. It happens that cv_results_['params'] and all cv_results_['param_*'] objects contains fitted estimators, as much as there are points on my grid.

This bug should happen only when n_jobs = 1 (which is my usecase).

I don't think this is intended (else the arguments and attributes refit and best_estimator wouldn't be used).

My guess is that during the grid search, those estimator's aren't cloned before use (which could be a problem if using the same grid search several times, because estimators passed in the next grid would be fitted...).

Version: 0.19.0

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions