
[MRG+3] ENH Restructure grid_scores_ into a dict of 1D (numpy) (masked) arrays that can be imported into pandas as a DataFrame. #6697


Merged: 5 commits merged into scikit-learn:master on Jun 16, 2016

Conversation

@raghavrv (Member) commented Apr 22, 2016

Overrides/Fixes #6686, #1034, #1787, #1768

Also related #1742, #1020

Tangentially related #1850, #1837, #2759

Also adds a name to all the scorers

  • _get_scorer_name(checked_scoring_function). Reverted based on consensus.
  • Test _get_scorer_name. Reverted based on consensus.
  • Restructure grid_scores_ into dict of (masked) numpy arrays, which can be easily imported into pandas.
  • Old tests pass
  • Modify old tests + New tests
  • _get_candidate_params() --> search.candidate_params_ --> search.results_['params']
  • Compute weighted std (for iid data) + test (see the sketch after this list)
  • Try vectorizing in bulk using reshape
  • Examples with nice plots? (not for this PR)
  • Doc + Whatsnew
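
A minimal sketch of the weighted-std computation referenced in the checklist above, assuming per-split score arrays and per-split test-set sizes; all names and values here are illustrative, not the merged implementation:

import numpy as np

# Scores of each candidate on each CV split: shape (n_splits, n_candidates).
split_scores = np.array([[0.80, 0.70, 0.80, 0.90],
                         [0.82, 0.50, 0.70, 0.74]])
# Number of test samples in each split (used as weights when iid=True).
n_test_samples = np.array([50, 48])

weights = n_test_samples / n_test_samples.sum()
mean_scores = np.average(split_scores, axis=0, weights=weights)
std_scores = np.sqrt(np.average((split_scores - mean_scores) ** 2,
                                axis=0, weights=weights))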

Reviews welcome!

The sandbox notebook
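
For a quick feel of the intended usage, converting such a results dict into a pandas DataFrame could look like the following sketch (values follow the docstring example discussed in this thread; the exact key names in the merged version may differ):

import numpy as np
import pandas as pd

# Toy dict in the proposed layout: masked arrays for parameters,
# plain arrays for per-split scores and their means.
search_results = {
    'kernel': np.ma.masked_array(['poly', 'poly', 'rbf', 'rbf'],
                                  mask=[False, False, False, False]),
    'gamma': np.ma.masked_array([0.0, 0.0, 0.1, 0.2],
                                mask=[True, True, False, False]),
    'degree': np.ma.masked_array([2.0, 3.0, 0.0, 0.0],
                                 mask=[False, False, True, True]),
    'accuracy_score_split_0': np.array([0.8, 0.7, 0.8, 0.9]),
    'accuracy_score_mean': np.array([0.81, 0.60, 0.75, 0.82]),
}

# Each key becomes a column; each row is one candidate parameter setting.
df = pd.DataFrame(search_results)
print(df)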

@amueller @vene @jnothman @agramfort @GaelVaroquaux @mblondel @hlin117 @ogrisel

@raghavrv force-pushed the multiple_metric_grid_search branch from 413404e to 12d8261 on April 22, 2016 17:55
@raghavrv changed the title from [WIP] ENH Restructure grid_scores_ into more efficient data structure + Simultaneous Multiple Metric Grid / Random Search / cross-validation to [WIP] ENH Restructure grid_scores_ into more efficient data structure on Apr 25, 2016
@raghavrv force-pushed the multiple_metric_grid_search branch 5 times, most recently from cad5e6c to e153cbb on April 27, 2016 18:29
@raghavrv (Member Author) commented Apr 27, 2016

Sorry for the delay! I was trying to tackle multiple metric support along with this, and later realized the Scorer interface discussion needs to take place before that!

So I've just restructured grid_scores_ into search_results_, which is a dict of numpy masked arrays for the parameters and regular arrays for the scores, means and ranks.

* ``mean_validation_score``, the mean score over the cross-validation folds
* ``cv_validation_scores``, the list of scores for each fold

search_results_ : dict of numpy (masked) ndarrays
Contributor:

By renaming grid_scores_ to search_results_, you're changing the API, right? You probably don't want to do that.

Member:

The model_selection module has not been publicly released, so there's an argument to be made that this is fair game.

Member:

Rather, this sort of change has been proposed and sought by the core devs for years; the fact that model_selection has not been released maybe makes this the best time to do it. However, I think we must treat this as a deprecation case, not just drop grid_scores_ altogether. That's because users will initially not rewrite all their old code but will change some imports, and they will be much less angry if they get a DeprecationWarning at the end of a long fit rather than an AttributeError.

@raghavrv (Member Author):

Hmm... We plan on adding multiple metric support subsequently. How would grid_scores_ look for that use case? Do we raise an error when grid_scores_ is accessed while multiple metrics are requested? Or do we return the first metric alone? Or all the metrics as a dict of lists (of _CVScoreTuples)?

Also, the old grid_search module is available for all the angry users who can tolerate a DeprecationWarning. My humble opinion is that trying to support old code here will constrain us, not now but in subsequent PRs.

Member:

I think @jnothman is right, especially because this is likely to be an issue at the end of a long fit. I guess grid_scores_ should raise an exception if (when) multiple metrics are used. They're not "angry users"; if anything, they will be users who embark on the process of updating their projects to keep up with scikit-learn changes. We want to make it smooth, such that they can stop in the middle of the rewriting effort if they have to, and things will still work, right?

Maybe in such a case we could use a shorter deprecation cycle, if you think that would help keep things cleaner.

@raghavrv (Member Author):

users who embark on the process of updating their projects to keep up with scikit-learn changes.

Okay, with +3 I'm adding grid_scores_ back as a property function. Also, I was wondering if we should store them like we did before, as computing them from search_results_ (especially the params) is not very clean.
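
A rough sketch of what such a backward-compatible property could look like, rebuilding the old (params, mean_score, split_scores) tuples from the new dict; the class, attribute and key names here are assumptions made for illustration, not the merged implementation:

import numpy as np
from warnings import warn

class SearchResultsDemo:
    """Toy stand-in for a *SearchCV estimator holding a results dict."""

    def __init__(self, results):
        self.results_ = results

    @property
    def grid_scores_(self):
        # Emit the deprecation warning lazily, only when the old API is used.
        warn("grid_scores_ is deprecated, use results_ instead.",
             DeprecationWarning)
        res = self.results_
        split_keys = sorted(k for k in res
                            if k.startswith('test_split') and k.endswith('_score'))
        # One (params, mean_score, per-split scores) entry per candidate,
        # mirroring the old list-of-tuples layout.
        return [(params, mean, np.array([res[k][i] for k in split_keys]))
                for i, (params, mean) in enumerate(
                    zip(res['params'], res['test_mean_score']))]

demo = SearchResultsDemo({
    'params': [{'C': 1}, {'C': 10}],
    'test_split0_score': np.array([0.80, 0.90]),
    'test_split1_score': np.array([0.70, 0.85]),
    'test_mean_score': np.array([0.75, 0.875]),
})
print(demo.grid_scores_)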

Member:

It only has to work for cases formerly supported. Either way of producing them is fine.

@MechCoder (Member):

Ping again when this is ready for review.

mask = [ True True False False]...),
'degree' : masked_array(data = [2.0 3.0 -- --],
mask = [False False True True]...),
'accuracy_score_split_0' : [0.8, 0.7, 0.8, 0.9],
Member:

It is annoying that we are pandering to pandas (ha!) to not allow a 2d array here :(

@raghavrv (Member Author) commented Apr 28, 2016:

Indeed! The other option is to store the scores (only the scores, not the parameters) in compact 2D numpy float arrays and have a sorted list of column headers (the dict keys of search_results_).

Contributor:

I think for some functions in networkx, they allow you to return pandas dataframes when possible by flipping a return_pandas parameter to True.

Member:

Well, I suppose the most idiomatic Pandas representation is using a MultiIndex. But I don't think it's necessarily our job to produce that.
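
For what it's worth, a user could build such a MultiIndex from the flat column names themselves; a small sketch using the split-column naming from the example in this thread:

import pandas as pd

# Flat per-split score columns as they would appear in the DataFrame.
df = pd.DataFrame({
    'accuracy_score_split_0': [0.80, 0.70, 0.80, 0.90],
    'accuracy_score_split_1': [0.82, 0.65, 0.70, 0.80],
})
# Split each name into a (metric, split) pair and use it as a two-level
# column index.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(name.rsplit('_split_', 1)) for name in df.columns],
    names=['metric', 'split'])
print(df)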

Member:

Not that I'm a proficient enough Pandas user to really say what is idiomatic or useful.

Member:

To be practical, what I am a little concerned about is: how easy is it to calculate the standard deviations of scores across k-fold CV? It's easy in the current storage; how easy is it in the proposed? How easy is it in the Pandas rep?

@hlin117 (Contributor) commented May 1, 2016:

@jnothman Assuming that the folds in question are named accuracy_score_split_0, ..., accuracy_score_split_(k-1), and the pandas dataframe with the scores is called search_results_, it's as easy as

fold_scores = search_results_.select(lambda name: name.startswith("accuracy_score_split_"), axis=1)

# Placing data into the dataframe
search_results_["std"] = fold_scores.std(axis=1)
search_results_["mean"] = fold_scores.mean(axis=1)

Member:

I guess that's not terrible...

On 1 May 2016 at 16:29, Henry Lin wrote, quoting the proposed docstring example in sklearn/model_selection/_search.py (#6697 (comment)):

    kernel | gamma | degree | accuracy_score_split_0 ... | accuracy_score_mean ...
    =============================================================================
    'poly' |   -   |   2    |            0.8             |          0.81
    'poly' |   -   |   3    |            0.7             |          0.60
    'rbf'  |  0.1  |   -    |            0.8             |          0.75
    'rbf'  |  0.2  |   -    |            0.9             |          0.82

    will be represented by a search_results_ dict of:

    {'kernel' : masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...),
     'gamma'  : masked_array(data = [-- -- 0.1 0.2],
                             mask = [ True  True False False]...),
     'degree' : masked_array(data = [2.0 3.0 -- --],
                             mask = [False False  True  True]...),
     'accuracy_score_split_0' : [0.8, 0.7, 0.8, 0.9],

@raghavrv (Member Author) commented May 1, 2016:

Do you think get_candidates(...) addresses this problem?

@jnothman (Member):

I think, @rvraghav93, you would've benefited from writing this in a more test-driven way. Indeed, we would be able to critique your expected format before you write the implementation! Please get that test written up so I don't need to keep referring to an outdated example, when I know whether or not the test passes.

@jnothman (Member):

This is otherwise going in the right direction and I look forward to its acceptance!

I think fixing #1020 directly here (visualising and marginalising over parameters) is out of scope, though you're right that enabling Pandas access allows users to groupby and visualise.

I also think #1742 (adding training scores/times) should probably remain out of scope.

@raghavrv (Member Author):

Thanks heaps @jnothman, @hlin117 and @vene for the review!! I'll clean up a bit and add the tests soon.

@vene (Member) commented Apr 28, 2016

Sure, but I was just weighing in. I'll do my best to squeeze in an actual review.

@raghavrv (Member Author) commented May 1, 2016

OK, I've re-added grid_scores_ as a property function.

I've also added a get_candidates(<int>/<list of ints>/None (for all candidates)) method that returns a (list of) dict(s) of candidate parameters and scores, to help users perform row-wise operations.

Not sure if this is helpful or superfluous. I just went ahead and implemented it. We could remove it if you feel it is superfluous and confuses search_results_.

Refer to the 2nd cell of the sandbox notebook for sample usage. I'll clean up the docs and add tests soon.

@jnothman (Member) commented May 1, 2016

I don't think get_candidates is helpful. I see it provides:

  • indexing, but this is far from exciting enough to justify multiple access paradigms
  • parameters as a dict, but this can be done within search_results_ too
  • fold scores as an array, but this could be done by providing fold_search_results_ or similar with one entry per fold rather than one entry per parameter setting; it does not require an entirely different data structure

Separately, I suggest we don't delimit everything in column names with underscores. We have no constraint that these be valid Python identifiers. Go for broke with colons, for instance.

@jnothman (Member) commented May 1, 2016

Though I guess Pandas has special attribute accessors for Python identifier names... Maybe I'm wrong about the colons, but I dislike the long underscored names aesthetically and in terms of the potential for naming conflicts...

@jnothman (Member) commented May 2, 2016

Having said that #1742 is a separate concern, I remember that when I was designing an equivalent solution, I realised that our results need to distinguish between train scores and test scores. Is it overkill to already label columns "test_score" rather than "score"?

"""Generate the metric name given the scoring parameter"""
if callable(scoring):
if scoring.__name__ == "_passthrough_scorer":
return "estimator_default_scorer"
Member:

if "_score" is being appended, isn't "estimator" sufficient? i.e. "estimator_score"...?

@raghavrv (Member Author):

All done.

I think the CircleCI failure is unrelated. Could you confirm, @ogrisel?

If so, merge? @jnothman @MechCoder @agramfort

@jnothman (Member):

+1 for merge.

@raghavrv force-pushed the multiple_metric_grid_search branch from 18ac6a1 to a3b1eb9 on June 15, 2016 22:54
@raghavrv (Member Author):

Don't merge yet

@jnothman (Member):

I meant that as: I will not merge because there may yet be minor issues, but I'm generally satisfied with this.

@raghavrv force-pushed the multiple_metric_grid_search branch from a3b1eb9 to 18ac6a1 on June 15, 2016 23:01
@raghavrv (Member Author) commented Jun 15, 2016

Sorry for the short, uninformative comment. I had pushed a superfluous commit (unstashed from my multiple metric search work) by mistake and commented so to avoid an accidental merge.

I just used reflog to fix it. Now it's all good. Thanks heaps for the reviews!!

@raghavrv (Member Author):

Interesting... Force-pushing a previous version (restored via reflog) that was already tested here does not trigger a CI rebuild. Cool!

@MechCoder (Member):

merge?

@raghavrv (Member Author):

Yea... All good from my side!!

@jnothman merged commit afd5d18 into scikit-learn:master on Jun 16, 2016
@jnothman (Member):

Well, well. Congratulations and thank you, @raghavrv!

@MechCoder (Member):

Yay!! Congratulations and well done 👍 🍷 🍷

Or as they say over here, மகிழ்ச்சி ("joy")

@raghavrv deleted the multiple_metric_grid_search branch on June 16, 2016 07:39
@TomDLT (Member) commented Jun 16, 2016

🍻

@amueller (Member):

Awesome! Great job!

for params, mean_score, scores in clf.grid_scores_:
means = clf.results_['test_mean_score']
stds = clf.results_['test_std_score']
for i in range(len(clf.results_['params'])):
Member:

why not zip?

@raghavrv (Member Author) commented Jun 16, 2016:

I refactored this from the now-removed _get_candidate_scores. zip would indeed have been a better choice!
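
For reference, a sketch of the zip-based variant suggested above (key names follow the diff under review at the time; the attribute was later renamed, so treat this as illustrative rather than runnable against current releases):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = GridSearchCV(SVC(), {'C': [1, 10]}, cv=3).fit(X, y)

# Iterate over candidates with zip instead of indexing into each array.
for mean, std, params in zip(clf.results_['test_mean_score'],
                             clf.results_['test_std_score'],
                             clf.results_['params']):
    print("%0.3f (+/- %0.3f) for %r" % (mean, std * 2, params))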

Member:

Because the reviewers got lazy towards the end :)

@amueller (Member) commented Jun 16, 2016

Sorry for coming late to the party, but why was results_ introduced instead of using grid_scores_? We just moved to a different module anyhow. Why should we have any deprecations in a new module?

@raghavrv (Member Author) commented Jun 16, 2016

Thanks everyone! :)

Why should we have any deprecations in a new module?

#6697 (comment)

The name results_ was chosen instead of grid_scores_ because:

  • it applies generically to GridSearchCV as well as RandomizedSearchCV and any other non-grid-based *SearchCV that may be added in the future;
  • results_ will store more than scores (very soon times/n_test_samples etc. too).

@amueller (Member) commented Jun 16, 2016

Thanks. Then maybe remove grid_scores_ instead of having a deprecated one?

OK, saw #6697 (comment) by @jnothman.
I guess it's not that much of a pain, so let's keep it.

@raghavrv changed the title from [MRG+3] ENH Restructure grid_scores_ into a hashmap of 1D (numpy) (masked) arrays that can be imported into pandas as a DataFrame. to [MRG+3] ENH Restructure grid_scores_ into a dict of 1D (numpy) (masked) arrays that can be imported into pandas as a DataFrame. on Aug 18, 2016
@amueller (Member) commented Sep 6, 2016

the docstring doesn't render nicely :-/

@jnothman (Member) commented Sep 6, 2016

@amueller (Member) commented Sep 8, 2016

True, it renders OK, though the rank is rendered as a link, and it throws a bunch of warnings when building.
