Skip to content

FIX Extract estimator objects before aggregating dict of scores #17745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 27, 2020

Conversation

alfaro96
Copy link
Member

Reference Issues/PRs

Coming from #17332.

What does this implement/fix? Explain your changes.

This PR extracts the estimator objects for each cross-validation split before executing the _aggregate_score_dicts in cross_validation function to avoid that nested estimators are converted to arrays.

Any other comments?

WDYT @thomasjpfan about this solution?

Another idea to solve this issue?

@glemaitre
Copy link
Member

You are super-efficient. You open a PR before that I found that the CI was broken :)

@glemaitre
Copy link
Member

We should probably have a test to trigger this usage. I am thinking that this is maybe the job of the _aggregate_* function to take care to not convert everything to an array (but I have to look more into details).

@alfaro96
Copy link
Member Author

You are super-efficient. You open a PR before that I found that the CI was broken :)

I was looking to the failures in the CRON jobs and I realize that the documentation build was failing. Since I love keep things properly working, I propose the PR :)

@alfaro96 alfaro96 requested a review from glemaitre June 26, 2020 14:57
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe @thomasjpfan wants to have a look

@glemaitre
Copy link
Member

So maybe your solution is better then :)

Comment on lines 1564 to 1565
if key.endswith(("time", "score"))
else [score[key] for score in scores]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to dispatch based on the type:

    return {
        key: np.asarray([score[key] for score in scores])
        if isinstance(scores[0][key], numbers.Number)
        else [score[key] for score in scores]
        for key in scores[0]
    }

and then update the docstring:

Aggregate a list of dicts to a dict of ndarray or list.

WTYD @glemaitre ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change and merged so that PRs are not failing. We can have a followup PR if we want to go back to checking the key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... the only recent PR that was blocked was #17743

@thomasjpfan thomasjpfan merged commit 0e33229 into scikit-learn:master Jun 27, 2020
@thomasjpfan
Copy link
Member

Thank you @alfaro96 for being so quick with the fix!

@alfaro96
Copy link
Member Author

alfaro96 commented Jun 27, 2020

Thanks @thomasjpfan and @glemaitre for taking care and finalizing this PR!

@alfaro96 alfaro96 deleted the fix_named_steps branch June 28, 2020 08:47
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Jul 17, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants