[MRG+3] ENH Restructure grid_scores_ into a dict of 1D (numpy) (masked) arrays that can be imported into pandas as a DataFrame. #6697
Conversation
Force-pushed from 413404e to 12d8261
Force-pushed from cad5e6c to e153cbb
Sorry for the delay! I was trying to attack multiple metric support along with this and later realized that the Scorer interface discussion needs to take place before that! So I've just restructured the …
* ``mean_validation_score``, the mean score over the
  cross-validation folds
* ``cv_validation_scores``, the list of scores for each fold
search_results_ : dict of numpy (masked) ndarrays
By renaming grid_scores_ to search_results_, you're changing the API, right? You probably don't want to do that.
The model_selection module has not been publicly released, so there's an argument to be made that this is fair game.
Rather, this sort of change has been proposed and sought by the core devs for years; the fact that model_selection has not been released maybe makes it the best time to do it. However, I think we must consider this a deprecation case, not just drop grid_scores_ altogether. And that's because users will initially not rewrite all their old code but will change some imports, and will be much less angry if they get a DeprecationWarning at the end of a long fit rather than an AttributeError.
Hmm... We plan on adding multiple metric support subsequently. How would grid_scores_ look for that use case? Do we raise an error when grid_scores_ is accessed while multiple metrics are in use? Or do we return the first metric alone? Or all the metrics as a dict of lists (of _CVScoreTuples)?
Also, the old grid_search module is available for all the angry users who can tolerate a DeprecationWarning. My humble opinion is that trying to support old code here will constrain us, not now but in subsequent PRs.
I think @jnothman is right, especially because this is likely to be an issue at the end of a long fit. I guess grid_scores_ should raise an exception if (when) multiple metrics are used. They're not "angry users"; if anything, they will be users who embark on the process of updating their projects to keep up with scikit-learn changes. We want to make it smooth, such that they can stop in the middle of the rewriting effort, if they have to, and things will still work, right?
Maybe in such a case we could use a shorter deprecation cycle, if you think that would help keep things cleaner.
"users who embark on the process of updating their projects to keep up with scikit-learn changes"

Okay, with +3 I'm adding grid_scores_ back as a property function. Also, I was wondering whether we should store them as we did before, since computing them from search_results_ (especially the params) is not very clean.
It only has to work for cases formerly supported. Either way of producing them is fine.
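Not the PR's actual code, just a minimal sketch of how such a backward-compatible property could rebuild the old tuples from the new dict; the search_results_ attribute name, the 'params' key and the per-split key pattern are assumptions based on the example discussed in this thread:

import warnings
from collections import namedtuple

import numpy as np

# Field names follow the old grid_scores_ tuples described in the docstring above.
_CVScoreTuple = namedtuple('_CVScoreTuple',
                           ['parameters', 'mean_validation_score',
                            'cv_validation_scores'])

class _GridScoresCompatSketch(object):
    @property
    def grid_scores_(self):
        warnings.warn("grid_scores_ is deprecated; use the dict-of-arrays "
                      "results instead", DeprecationWarning)
        res = self.search_results_  # assumed attribute name
        # Assumed key pattern for per-fold scores, e.g. 'accuracy_score_split_0'
        split_keys = sorted(k for k in res if '_split_' in k)
        scores = np.array([res[k] for k in split_keys]).T
        return [_CVScoreTuple(p, s.mean(), s)
                for p, s in zip(res['params'], scores)]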
Ping again when this is ready for review.
             mask = [ True True False False]...),
'degree' : masked_array(data = [2.0 3.0 -- --],
             mask = [False False True True]...),
'accuracy_score_split_0' : [0.8, 0.7, 0.8, 0.9],
It is annoying that we are pandering to pandas (ha!) to not allow a 2d array here :(
Indeed! The other option is to store the scores (only the scores, not the parameters) in compact 2D numpy float arrays and have a sorted list of column headers (the dict keys of search_results_).
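A rough sketch of that alternative layout, purely for comparison; the column names and values below are made up, not from the PR:

import numpy as np

# One compact 2D float array: one row per candidate, one column per CV split;
# the column names live in a separate sorted list instead of dict keys.
score_columns = ['accuracy_score_split_0', 'accuracy_score_split_1']
scores = np.array([[0.8, 0.7],
                   [0.7, 0.6],
                   [0.8, 0.9],
                   [0.9, 0.8]])

# Per-candidate aggregates are then simple axis-1 reductions:
mean_scores = scores.mean(axis=1)
std_scores = scores.std(axis=1)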
I think some functions in networkx allow you to return pandas dataframes when possible by flipping a return_pandas parameter to True.
Well, I suppose the most idiomatic Pandas representation is using a MultiIndex. But I don't think it's necessarily our job to produce that.
Not that I'm a proficient enough Pandas user to really say what is idiomatic or useful.
To be practical, what I am a little concerned about is: how easy is it to calculate the standard deviations of scores across k-fold CV? It's easy in the current storage; how easy is it in the proposed? How easy is it in the Pandas rep?
@jnothman Assuming that the folds in question are named accuracy_score_split_0, ..., accuracy_score_split_(k-1), and the pandas dataframe with the scores is called search_results_, it's as easy as
fold_scores = search_results_.select(lambda name: name.startswith("accuracy_score_split_"), axis=1)
# Placing data into the dataframe
search_results_["std"] = fold_scores.std(axis=1)
search_results_["mean"] = fold_scores.mean(axis=1)
I guess that's not terrible...
On 1 May 2016 at 16:29, Henry Lin notifications@github.com wrote, quoting the example from sklearn/model_selection/_search.py (#6697 (comment)):

    kernel | gamma | degree | accuracy_score_split_0 ... | accuracy_score_mean ...
    ==============================================================================
    'poly' |   -   |   2    |            0.8             |          0.81
    'poly' |   -   |   3    |            0.7             |          0.60
    'rbf'  |  0.1  |   -    |            0.8             |          0.75
    'rbf'  |  0.2  |   -    |            0.9             |          0.82

    will be represented by a search_results_ dict of:

    {'kernel' : masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...),
     'gamma'  : masked_array(data = [-- -- 0.1 0.2],
                             mask = [ True True False False]...),
     'degree' : masked_array(data = [2.0 3.0 -- --],
                             mask = [False False True True]...),
     'accuracy_score_split_0' : [0.8, 0.7, 0.8, 0.9],
Do you think get_candidates(...) addresses this problem?

I think, @rvraghav93, you would've benefited from writing this in a more test-driven way. Indeed, we would be able to critique your expected format before you write the implementation! Please get that test written up so I don't need to keep referring to an outdated example, when I know whether or not the test passes.
This is otherwise going in the right direction and I look forward to its acceptance! I think fixing #1020 directly here (visualising and marginalising over parameters) is out of scope, though you're right that enabling Pandas access allows users to groupby and visualise. I also think #1742, adding training scores/times, should remain out of scope.
Sure, but I was just weighing in. I'll do my best to squeeze in an actual review.
Ok, I've re-added grid_scores_. I've also added a get_candidates(...) helper; not sure if this is helpful or superfluous, I just went ahead and implemented it. We could remove it if you feel it is superfluous and confusing. Refer to the 2nd cell of the sandbox notebook for a sample usage. I'll clean up docs and add tests soon.
I don't think …
Separately, I suggest we don't delimit everything in column names with underscores. We have no constraint that these be valid Python identifiers. Go for broke with colons, for instance.
Though I guess Pandas has special attribute accessors for Python identifier names... Maybe I'm wrong about the colons, but I dislike the long underscored names aesthetically and in terms of the potential for naming conflicts...
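(A tiny illustration of that attribute-access point; the column names here are invented for the example:)

import pandas as pd

df = pd.DataFrame({'test_mean_score': [0.8, 0.9],
                   'rank:test_score': [2, 1]})

df.test_mean_score       # attribute access works for identifier-like names
df['rank:test_score']    # colon-delimited names need item access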
Having said that #1742 is a separate concern, I remember that when I was designing an equivalent solution, I realised that our results need to distinguish between train scores and test scores. Is it overkill to already label columns "test_score" rather than "score"?
"""Generate the metric name given the scoring parameter""" | ||
if callable(scoring): | ||
if scoring.__name__ == "_passthrough_scorer": | ||
return "estimator_default_scorer" |
if "_score" is being appended, isn't "estimator" sufficient? i.e. "estimator_score"...?
All done. I think the Circle CI failure is unrelated. Could you confirm, @ogrisel? If so, merge? @jnothman @MechCoder @agramfort
+1 for merge.
Force-pushed from 18ac6a1 to a3b1eb9
Don't merge yet
I meant that as: I will not merge because there may yet be minor issues, but I'm generally satisfied with this.
Force-pushed from a3b1eb9 to 18ac6a1
Sorry for the short, uninformative comment. I had pushed a superfluous commit (unstashed from my multiple metric search work) by mistake and commented so to avoid an accidental merge. I just used reflog to fix it. Now it's all good. Thanks heaps for the reviews!!
Interesting... Force-pushing a previous version (restored via reflog) which was already tested here does not trigger a CI rebuild. Cool!
merge?
Yea... All good from my side!!
Well, well. Congratulations and thank you, @raghavrv!
Yay!! Congratulations and well done 👍 🍷 🍷 Or as they say over here, மகிழ்ச்சி (joy)!
Congrats!
🍻
Awesome! Great job!
for params, mean_score, scores in clf.grid_scores_:
means = clf.results_['test_mean_score']
stds = clf.results_['test_std_score']
for i in range(len(clf.results_['params'])):
why not zip?
I refactored this from the now-removed _get_candidate_scores. zip would indeed have been a better choice!
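For the record, a zip-based version of that loop, with clf being the fitted search object from the example and the results_ keys taken from the diff above:

means = clf.results_['test_mean_score']
stds = clf.results_['test_std_score']
for mean, std, params in zip(means, stds, clf.results_['params']):
    print("%0.3f (+/-%0.3f) for %r" % (mean, std, params))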
Because the reviewers got lazy towards the end :)
Sorry for coming late to the party, but why was name removed?
Thanks everyone! :)
Thanks. Then maybe remove …? Ok, saw #6697 (comment) by @jnothman.
The docstring doesn't render nicely :-/
Looks okay at http://scikit-learn.org/dev/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV... could you point to where it renders poorly, @amueller?
True, it renders ok, though the rank is a link. And the thing throws a bunch of warnings when building.
Overrides/Fixes #6686, #1034, #1787, #1768
Also related #1742, #1020
Tangentially related #1850, #1837, #2759
* Also adds a name to all the scorers. Reverted based on consensus.
* _get_scorer_name(checked_scoring_function). Reverted based on consensus.
* Test _get_scorer_name. Reverted based on consensus.
* grid_scores_ into dict of (masked) numpy arrays, which can be easily imported into pandas.
* _get_candidate_params() --> search.candidate_params_ --> search.results_['params']
* ... with nice plots? (not for this PR)

Reviews welcome!
The sandbox notebook
@amueller @vene @jnothman @agramfort @GaelVaroquaux @mblondel @hlin117 @ogrisel