[MRG+1] DOC: Added Nested Cross Validation Example #7111
Conversation
I think the idea was more to use…
Okay! There's a comment included at the bottom of the code that shows that the…
Thanks for the PR :) Yes, like Andy suggests, conveying that doing nested CV has become simpler due to our new CV iterators could be more helpful. I think using LOLO-cv would be a good idea, as you can show that we can pass…
I remember speaking with @ogrisel IRL at Paris PyData. He suggested a good toy problem would be to predict the sepal/petal length/width using the other 3 features and use the iris classes as group labels. With this you can show that it is useful to use the LOLO cross-validator, as the measurements would depend on the iris class.
I feel that task will be confusing to people new to ML / sklearn.
Alright. In that case @mlliou112, you could additionally show how the scores without nested CV are not reliable and how nested CV reveals that inconsistency in scores for different parameter settings. That would be a nice way to highlight the importance of nested CV...
Great, sounds good. I'll work on it and push changes when I have them. On a separate note, is there any reason you would suspect for the test failing on CircleCI? Also, is this something I should be worried about, and how should I approach fixing it?
Alright, sorry for the delay. In this change, I've tried to illustrate the slight optimistic bias of non-nested CV versus nested CV, especially when the splits are on a small dataset such as the iris one. I tried to narrow down how many parameter values are optimized over on…
Don't worry about the delay. This looks okay, though I'm not sure we should give the impression that those differences are significant. The text is also a bit verbose. The thing to emphasise is that taking the max over multiple parameter settings in grid search is liable to over-fit, often yielding an over-estimate of generalisation error.
I trimmed the text to make it more concise. Let me know what you think!
I'll take a look at this later, but it looks like your example is failing tests.
I don't want to make the text too long, but I think there should be a connection to doing a "train/test" split and doing a "train/validation/test" split.
Also, it might be important to point out what the result of nested cross-validation is. It doesn't yield a model or even a best parameter setting, so you don't get a model that you could use on new data. It approximates the generalization error of the…
I was looking for a demo like this, so I like it! I then wanted to take it one step further and shuffled the labels, expecting that… The original use-case that made me think about this is:
Later on, return to repeat steps 2 and 3. However, that setup is subtly different from the one in this example. So yeah.
@betatim Your expectation is correct. What numbers are you getting? I added a few lines of…
@amueller I added two sentences with your suggestions. Let me know if they are appropriately placed!
Trying various seeds, it fluctuates up and down. Should have tried that first. What I learnt from this is that…
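A rough sketch of the shuffled-label sanity check being discussed, assuming the example's iris/SVC setup (the grid and seeds here are placeholders): with permuted labels the nested score should hover around chance level, roughly 1/3 for iris, though it fluctuates with the seed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target.copy()

rng = np.random.RandomState(0)
rng.shuffle(y)  # destroy the relation between features and labels

clf = GridSearchCV(SVC(), param_grid={"C": [1, 10, 100]},
                   cv=KFold(n_splits=4, shuffle=True, random_state=0))
nested_scores = cross_val_score(clf, X, y,
                                cv=KFold(n_splits=4, shuffle=True, random_state=0))
print(nested_scores.mean())  # should land near chance level
```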
I think this would be nice to have in 0.18. Could @jnothman mark this so?
This will likely miss the RC, @raghavrv, but we might be able to throw it in for final.
@mlliou112 you with us? :)
Yes! Following the 0.18 release has certainly been exciting. :) Is there anything else I have to do for this, or help in general?
You should refer to this example from the narrative docs on model selection.
performance of non-nested and nested CV strategies by taking the difference
between their scores.

See Also
These headings are too big. Can we use `.. topic` instead?
This example compares non-nested and nested cross-validation strategies on a
classifier of the iris data set. Nested cross-validation (CV) is often used to
train a model in which hyperparameters also need to be optimized. Nested CV
approximates the generalization error of the resulting estimator of the |
"approximates" -> "estimates"
How about "estimates the generalization error of the underlying model and its (hyper)parameter search"?
👍 I also changed other occurrences of "hyperparameter" -> (hyper)parameter.
We're inconsistent on this terminology. "Hyperparameter" is a Bayesian term that's recently become more popular, for clarity, in the rest of the ML community. But scikit-learn uses "get_params" and "set_params" to operate on these things and not on the model parameters in the Bayesian sense, so I find it a bit strange to call it a hyperparameter.
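As a small aside, a sketch of what that API behaviour looks like in practice; `SVC` is just an arbitrary estimator picked for illustration:

```python
from sklearn.svm import SVC

est = SVC()
est.set_params(C=10, gamma=0.1)   # the values a grid search tunes...
print(est.get_params()["C"])      # ...are exposed as plain "params"
```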
train a model in which hyperparameters also need to be optimized. Nested CV
approximates the generalization error of the resulting estimator of the
hyperparameter search. It is generally best practice to use nested CV as
non-nested CV will sometimes provide a slightly more biased and optimistic |
I don't think we need "slightly" and "sometimes".
Choosing the parameters that maximise non-nested CV biases it to the dataset, yielding an overly-optimistic score.
This notion is repeated multiple times here and I think you need to work on making it succinct.
👍 I have specified "it" -> "the model" and chose "maximize" over "maximise"... (sorry UK/Canada, just to be consistent with the rest of the example)
Rereading "them" (i.e. the parameters) would have done instead of it. I didn't write that with the intention that you would necessarily copy verbatim, so your edits to my paraphrases are welcome. Even when they arbitrarily side with Webster.
non-nested CV will sometimes provide a slightly more biased and optimistic
score.

In contrast to non-nested CV, an inner CV loop is introduced that partitions |
How about:
Model selection without nesting CV involves evaluating the model's performance on data that is also used to tune the model. Information may thus "leak" into the model and overfit the data. The magnitude of this effect is primarily dependent on the size of the dataset and the stability of the model. See Cawley and Talbot [1]_ for an analysis of these issues.
Nested CV effectively uses a series of train/validation/test set splits. Score is approximately maximised in fitting a model to each training set, and then directly maximised in selecting hyperparameters over the validation set. Assessing performance on a held-out test set avoids evaluating a model on data that has been used to tune it. Generalization error is estimated by averaging test set scores over several dataset splits.
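A minimal sketch of that pattern, assuming the 0.18 `model_selection` API; the estimator, grid, and fold counts below are placeholders rather than the example's actual settings:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# Inner loop: the grid search tunes parameters on train/validation splits.
clf = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

# Non-nested: the score that was maximised during tuning is reported directly.
non_nested_score = clf.fit(X, y).best_score_

# Nested: each tuned model is scored on an outer test fold it never saw during
# tuning; the mean estimates the generalization error of the whole procedure.
nested_score = cross_val_score(clf, X, y, cv=outer_cv).mean()
print(non_nested_score - nested_score)  # usually a small, seed-dependent gap
```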
👍 .
First paragraph: I altered first sentence to emphasize the data part.
"Model selection without nested CV uses the same data to tune model parameters and evaluate model performance."
Second paragraph: I would really like to emphasize where the tests splits are relative to the inner and outer CV loops, so I added prepositional phrases. I also removed the sentence "assessing performance..." and added a transitional phrase before the paragraph that I think demonstrates the same thing.
Again, "s" -> "z" 🇺🇸 :p
the dataset and the stability of the model. For more quantitative detail of
potential bias when tuning parameters, see this paper. [1]_

Each iteration of the inner CV loop will provide the best estimator for the |
Drop this sentence, perhaps this paragraph.
Dropped the paragraph.
It was a point of confusion for me when first starting, so I thought I would point it out explicitly, but I'm not at all attached.
It's true of CV generally. If you feel that's not clear enough in the narrative docs, propose a change there? I'd rather example text be to the point.
@jnothman Thanks for the review. I took most of your suggestions, with some minor alterations (see above). Let me know what you think.
@@ -79,6 +79,10 @@ evaluated and the best combination is retained.
classifier (here a linear SVM trained with SGD with either elastic
net or L2 penalty) using a :class:`pipeline.Pipeline` instance.

- See :ref:`example_model_selection_plot_nested_cross_validation_iris.py` |
Add "This is best practice for evaluating the performance of a model with grid search."
[1]_ for an analysis of these issues.

To avoid this problem, nested CV effectively uses a series of
train/validation/test set splits. In the inner loop, score is approximately |
"score" -> "the score"
# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g "LabelKFold", "LeaveOneOut","LeaveOneLabelOut", etc. |
space after comma.
Otherwise LGTM.
# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
inner_cv = KFold(n_folds=4, shuffle=True, random_state=i) |
n_splits instead of n_folds
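For reference, a sketch of the renamed keyword as it appears in the released `model_selection` API; `i` stands in for the trial index used in the example:

```python
from sklearn.model_selection import KFold

i = 0  # in the example, i is the index of the current random trial
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)
```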
Aye, I think I messed up this branch/PR. There were new commits to master. How do I fix this? (And what was the better way to integrate those changes?) 🙏 Should I just make a new PR?
To fix the mess:

```
# go to your branch
git checkout your_branch
# always do a backup
git checkout -b your_branch_back_up
# squash your 9 commits into one
git rebase -i HEAD~9
# Now you have only one commit. Note its SHA (16 hexadecimal characters)
git log -n 1
# delete master
git branch -D master
# and download a fresh new version of master
git fetch upstream master:master
# go to master
git checkout master
# delete your branch (you have a backup)
git branch -D your_branch
# recreate your branch from fresh master
git checkout -b your_branch
# apply only your last commit with its SHA
git cherry-pick SHA
# force push
git push -f origin your_branch
```

In the future, when you need to rebase:

```
# go to master
git checkout master
# update master
git pull --rebase upstream master
# go to your branch (never work on master)
git checkout your_branch
# rebase your branch on master
git rebase master
# solve your conflicts and force push
git push -f origin your_branch
```

Highly recommended reading: http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html
@TomDLT Thanks very much! The development workflow article is much more helpful than anything else I found.
LGTM, thanks @mlliou112! Merging. @amueller, please backport.
Thanks @mlliou112 :)
@TomDLT can you add the git stuff as a link to the contributing docs?
@amueller @mlliou112 I read the nested cross-validation material (http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) and it is really good! Do you know if I can get the best_params_ out of each fold of the outer-loop cross-validation during the nested cross-validation? I want to use nested cross-validation to identify the best hyperparameters.
@johnny5550822 The outer loop of the nested cross-validation is done only to evaluate the best models chosen by the inner loop. Refer to this Stack Overflow answer.
@raghavrv Ya, hmm... so what should I do if I want to get the best hyperparameters?
To select the best hyperparameters, you simply do one cross-validation. To evaluate this 'selection', you do the outer cross-validation and infer if your selection can be trusted. Read the lower part of the above SO answer on what to look for in the outer CV. If from those inferences you understand that your selection can't be trusted, you tweak a different set of parameters / choose a different model / do more feature engineering. Maybe @GaelVaroquaux @jnothman @amueller would be able to answer you in more detail. But unless I am mistaken, that is the crux of nested CV: do the selection in the inner CV, and use the outer CV to see if your selection can be trusted.
@raghavrv "To select the best hyper params, you simply do one cross-validation. To evaluate this 'selection', you do the outer cross-validation and infer if your selection can be trusted" Are you suggesting I use the inner-loop cross validation to identify the best hyperparameters? @GaelVaroquaux @jnothman @amueller |
Yes... |
@raghavrv but how can I do it in scikit-learn. In the nested cv example, seem like GridSearchCV is wrapped by cross_val_score. I don't know how to obtain the hyperparameters.
|
Just do (But please do wait for replies from others to gain more clarity...) |
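Spelled out as a rough, self-contained sketch; the dataset, grid, and `cv` values here are placeholders, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# The inner search alone, fit on all of the data, yields one final parameter
# setting; the nested CV score computed separately with cross_val_score tells
# you how much to trust this selection procedure.
clf = GridSearchCV(SVC(), param_grid={"C": [1, 10, 100]}, cv=4)
clf.fit(X, y)
print(clf.best_params_)
```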
This conversation would be much more appropriate on the mailing list or Stack Overflow / Cross Validated.
With nested CV you are able to make a statement like "On average, this
parameter selection strategy yields an accuracy of xx±x%." There may,
however be many different parameter sets selected to make up that average.
You can then apply the parameter selection strategy to the full dataset and
use the learnt model, with an assumption of that estimated accuracy.
Our cross_val_score function does not allow you to pull out any one of the
fitted estimators that make up its scores, although that would be another
option, saving time, but only fit with a portion of the dataset.
Is that clear?
@jnothman To be clear, that is to say nested cross-validation is not really the way to do hyperparameter selection (instead, we use repeated simple cross-validation). Rather, nested cross-validation not only provides an estimated performance of the model, but also tells you roughly "what is the simple cross-validation performance of the hyperparameter selection?" Thus, the nested cross-validation will give something like xx±x%. Am I right? Also, when we construct the nested CV, we pass the clf into cross_val_score (see below); there is no repeated trial inside the inner loop of CV, right (i.e., just a one-time CV for each possible combination of parameters inside the GridSearchCV)?
@johnny5550822 I think what you're saying is correct. I'd caution against thinking of nested and simple as different methods of cross-validation. Nested is really just simple CV done twice, for two things at once: estimating hyperparameters and evaluating the parameter selection strategy. (I'll reemphasize that the "optimal parameters" given by the inner CV may not be the same for all outer CV loops.) Yes, the 3-fold CV is just done once per combination of parameters, and gives the average performance of those 3 folds.
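A rough sketch, not taken from the merged example, of how one can see this directly by writing the outer loop explicitly; the grid and fold counts are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in outer_cv.split(X):
    # The inner 3-fold CV runs once per parameter combination on the
    # training part of this outer fold.
    search = GridSearchCV(SVC(), param_grid=p_grid, cv=3)
    search.fit(X[train_idx], y[train_idx])
    # best_params_ may differ from fold to fold; the test score evaluates
    # the selection procedure, not one final model.
    print(search.best_params_, search.score(X[test_idx], y[test_idx]))
```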
@mlliou112 Great. I wanted to make sure the inner loop is done just once, because I originally thought it was done more than once. This is because the suggested nested CV is a repeated nested CV, i.e. the inner loop is repeated several times before identifying the best score (https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10). Maybe this can be part of future work, allowing the user to choose repeated or not.
You are absolutely right. And that is exactly the problem. You cannot get the best parameters of a model while using nested cross-validation, since the models could be different in each outer loop. A good way to visualize this is with this image I found: https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png No one guarantees that the split of the inner CV will be the same in every outer CV loop, and thus neither the hyperparameters nor the model itself. So I think there are two solutions:
Have I understood it correctly?
Reference Issue
Fixes #5589
What does this implement/fix? Explain your changes.
An example of nested cross-validation using the new model selection module (SVC on the digits dataset).
Any other comments?
First contribution to Scikit-learn. Hence, I am by no means an expert on nested cross validation, but I read as much as I could and tried to follow all contribution guidelines appropriately.
I happily welcome any constructive comments, large or small!