MRG Training Score in Gridsearch #1742
Conversation
'cv_validation_scores'))
CVScoreTuple = namedtuple('CVScoreTuple',
                          ('parameters', 'mean_test_score',
                           'mean_training_score', 'cv_test_scores'))
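A minimal standalone sketch of the renamed tuple as it appears in this PR's diff (the field values below are made up for illustration; the PR itself only renames and reorders the fields):

```python
from collections import namedtuple

# Field names as in this PR's diff; the values are hypothetical.
CVScoreTuple = namedtuple('CVScoreTuple',
                          ('parameters', 'mean_test_score',
                           'mean_training_score', 'cv_test_scores'))

score = CVScoreTuple(parameters={'C': 1.0, 'gamma': 0.1},
                     mean_test_score=0.92,
                     mean_training_score=0.99,
                     cv_test_scores=[0.90, 0.93, 0.93])

# A large training/test gap for one parameter setting hints at overfitting.
gap = score.mean_training_score - score.mean_test_score
print(round(gap, 2))  # 0.07
```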
Why did you rename *_validation_score to *_test_score? "validation" sounds more correct in a CV setting. Don't you think?
First I thought training and test made a nicer pair. Then I thought validation would be better but didn't change it back. Will do once my slides are done ;)
Alright, as you wish; I don't have any strong opinion on this either.
What about measuring the …
It's on the todo. Is there a better way than using …?
I think …
Fixed doctests, rebased and squashed. Should be good to go.
for i, (arr, title) in enumerate(zip(arrays, titles)):
    pl.subplot(2, 2, i + 1)
    arr = np.array(arr).reshape(len(C_range), len(gamma_range))
    #pl.subplots_adjust(left=0.05, right=0.95, bottom=0.15, top=0.95)
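For context, the reshape in the loop above turns the flat list of per-parameter scores into an (n_C, n_gamma) matrix, one heatmap cell per (C, gamma) pair. A standalone sketch with made-up ranges (the real example derives these from the grid search results):

```python
import numpy as np

# Hypothetical ranges standing in for the example's C_range / gamma_range.
C_range = np.logspace(-2, 2, 5)      # 5 candidate C values
gamma_range = np.logspace(-2, 2, 4)  # 4 candidate gamma values

# One flat score per (C, gamma) combination, ordered C-major.
flat_scores = np.linspace(0.5, 1.0, len(C_range) * len(gamma_range))

# Reshape so row i corresponds to C_range[i] and column j to gamma_range[j];
# this is the matrix each subplot renders as a heatmap.
grid = flat_scores.reshape(len(C_range), len(gamma_range))
print(grid.shape)  # (5, 4)
```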
Is this a left-over of some experiment? It should be removed if it's not useful.
Whoops. Actually I still need to have a look at how it renders on the website.
Please add some smoke tests for the new tuple items: for instance check that all of them are positive and that train_score is lower than 1.0.
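The requested smoke test might look roughly like this (the field names follow this PR's CVScoreTuple; the grid scores below are stand-ins, where a real test would run GridSearchCV and check its grid_scores_):

```python
from collections import namedtuple

CVScoreTuple = namedtuple('CVScoreTuple',
                          ('parameters', 'mean_test_score',
                           'mean_training_score', 'cv_test_scores'))

def smoke_test_scores(grid_scores):
    """Sanity checks along the lines requested in the review."""
    for s in grid_scores:
        assert s.mean_test_score > 0
        assert s.mean_training_score > 0
        # Accuracy-like scores should not exceed 1.0.
        assert s.mean_training_score <= 1.0

# Stand-in data for illustration, not real search output.
scores = [CVScoreTuple({'C': 1.0}, 0.90, 0.95, [0.89, 0.91]),
          CVScoreTuple({'C': 10.0}, 0.85, 0.99, [0.84, 0.86])]
smoke_test_scores(scores)
print('ok')
```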
Other than the above comments, this looks good to me.
Also added some tests.
We can observe that the lower right half of the parameters (below the diagonal
with high C and gamma values) is characteristic of parameters that yields an
overfitting model: the trainin score is very high but there is a wide gap. The
typo: trainin (my fault)
" wide gap ... with the validation score"
See an alternative patch at https://github.com/jnothman/scikit-learn/tree/grid_search_more_info. Note that I have chosen different field names, aiming for consistency and memorability, if not precision of naming.
@jnothman btw, does your version work with lists of dicts as …?
I don't think it's better, but it's certainly no worse: it provides exactly the same ordering according to … It doesn't do anything particular to … PR forthcoming.
On 03/13/2013 01:07 AM, jnothman wrote:
Add docstrings for GridSearchCV, RandomizedSearchCV and fit_grid_point. In fit_grid_point I used test_score rather than validation_score, as the split is given to the function. The rbf svm grid search example now also shows training scores, which illustrates overfitting for high C, and training/prediction times, which basically serve to illustrate that this is possible. Maybe random forests would be better for evaluating training times?
Superseded by #7026.
This PR adds training scores to the GridSearchCV output, as wished for by @ogrisel.