[MRG + 2] ENH Allow cross_val_score, GridSearchCV et al. to evaluate on multiple metrics (#7388)
Conversation
(If it's focused on ...) I've been thinking of a function ...
For a single-param evaluation, I think it's easier to simply call mean/std directly on the return value of cross_val_score.
Yes, but I mean to also get times, training scores, multiple param results, ...
Maybe as a separate function? Thoughts @vene @amueller @agramfort? I thought we could simply have ...
Can you write a usage script the way you see it? Code snippets make things really concrete.
Thanks for the comment @agramfort. I will post a sample script soon.
And @GaelVaroquaux, thanks for the comment at #7435. Could you clarify what kind of output you have in mind?
@agramfort, this is the usage I had in mind:
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> from sklearn.datasets import make_regression
>>> dtc = DecisionTreeRegressor()
>>> X, y = make_regression(n_samples=100, random_state=42)
# For multiple metrics - as a list of metric names
>>> cross_val_score(dtc, X, y, cv=2, scoring=['neg_mean_absolute_error',
... 'neg_mean_squared_error',
... 'neg_median_absolute_error'])
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}
# For multiple metrics - as a dict of scorer callables
>>> from sklearn.metrics import (make_scorer, mean_absolute_error,
...                              mean_squared_error, median_absolute_error)
>>> neg_mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
>>> neg_mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
>>> neg_median_ae_scorer = make_scorer(median_absolute_error,
...                                    greater_is_better=False)
>>> cross_val_score(dtc, X, y, cv=2,
...                 scoring={'neg_mean_absolute_error': neg_mae_scorer,
...                          'neg_mean_squared_error': neg_mse_scorer,
...                          'neg_median_absolute_error': neg_median_ae_scorer})
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}
# For a single metric (as before)
>>> cross_val_score(dtc, X, y, cv=2, scoring='neg_mean_absolute_error')
array([-109.20020926, -124.05659102])
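A minimal sketch of aggregating that proposed dict-of-arrays output per metric (assuming the return format above, which is a proposal and not the current API):
>>> scores = cross_val_score(dtc, X, y, cv=2,
...                          scoring=['neg_mean_absolute_error',
...                                   'neg_mean_squared_error'])
>>> for metric, vals in scores.items():
...     # each value is an array with one score per CV split
...     print(metric, vals.mean(), vals.std())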
@mblondel WDYT?
Ah, for this use case I actually support a dict. It's a bit weird if the output type changes depending on whether you provide a single metric or not, though. For callables, couldn't we just use ...
So @jnothman suggested introducing a new function, and I think that might be a good idea. Optionally we could deprecate the current behavior of cross_val_score. I think the new output should be structured like the cv_results_ dict.
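For context, the API that was eventually merged (see the commit log further below) does this via a separate cross_validate function that returns a cv_results_-style dict of arrays; a minimal sketch of that final interface:
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import cross_validate
>>> from sklearn.svm import SVC
>>> X, y = make_classification(random_state=0)
>>> results = cross_validate(SVC(), X, y, cv=3,
...                          scoring=('accuracy', 'roc_auc'),
...                          return_train_score=False)
>>> sorted(results)  # one array of per-split values under each key
['fit_time', 'score_time', 'test_accuracy', 'test_roc_auc']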
Two functions can have the same name. We discussed this when we were brewing the ...
Were you suggesting that ...? The list of scores as returned by ... Can I suggest that we leave ...
Sorry, I missed that. But there is no multiple-metric support in ...
Not yet. Implementing it there is very straightforward given our new ... But before that we need to fix on ...
Hm, so do we also want to support ...
No, we won't support ...
If the user wants it, they can quickly wrap the individual scorers into separate scorers, each with a single-value output. In that case each such scorer will have a ranking associated with it... And the ...
EDIT: For a single metric, the current format of ...
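A minimal sketch of the wrapping just described, using the confusion-matrix example that the commit log below says was later added to the docs (the tn/fp helper names here are illustrative):
>>> from sklearn.metrics import confusion_matrix, make_scorer
>>> # illustrative wrappers: each extracts one value from the
>>> # multi-valued confusion matrix, so each scorer returns one number
>>> def tn(y_true, y_pred):
...     return confusion_matrix(y_true, y_pred)[0, 0]
>>> def fp(y_true, y_pred):
...     return confusion_matrix(y_true, y_pred)[0, 1]
>>> scoring = {'tn': make_scorer(tn), 'fp': make_scorer(fp)}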
In short, we can't design ...
Changed the title from "… cross_val_score to evaluate on multiple metrics" to "… cross_val_score, GridSearchCV and RandomizedSearchCV to evaluate on multiple metrics".
Force-pushed from f6d3fe6 to 8b89687.
Please get this in 0.19. Please please please. While I have monkey-patched this in my own code, I've fixed a colleague's code by simply avoiding ...
Sure :P I thought the 0.18 milestone was already complete, with the timing and training scores added? I intended this for ...
*0.19 only |
Force-pushed from 1149527 to f5a917d.
Great work Raghav!
Sure :) Sorry, I was flying to Austin!
@raghavrv see you tomorrow :)
Where is it? I can't find it.
[MRG + 2] ENH Allow cross_val_score, GridSearchCV et al. to evaluate on multiple metrics (scikit-learn#7388)
* ENH cross_val_score now supports multiple metrics
* DOCFIX permutation_test_score
* ENH validate multiple metric scorers
* ENH Move validation of multimetric scoring param out
* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics
* EXA Add an example demonstrating the multiple metric in GridSearchCV
* ENH Let check_multimetric_scoring tell if it's multimetric or not
* FIX For single metric, name of scorer should remain 'score'
* ENH validation_curve and learning_curve now support multiple metrics
* MNT move _aggregate_score_dicts helper into _validation.py
* TST More testing / fixing scores to the correct values
* EXA Add cross_val_score to multimetric example
* Rename to multiple_metric_evaluation.py
* MNT Remove scaffolding
* FIX doctest imports
* FIX wrap the scorer and unwrap the score when using _score() in rfe
* TST Clean up the tests; test for is_multimetric too
* TST Make sure it registers as single metric when scoring is of that type
* PEP8
* Don't use dict comprehension to make it work in python2.6
* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation
* FIX+TST delegated methods NA when multimetric is enabled; TST Add general tests to GridSearchCV and RandomizedSearchCV
* ENH add option to disable delegation on multimetric scoring
* Remove old function from __all__
* flake8
* FIX revert disable_on_multimetric
* stash
* Fix incorrect rebase
* [ci skip]
* Make sure refit works as expected and remove irrelevant tests
* Allow passing standard scorers by name in multimetric scorers
* Fix example
* flake8
* Address reviews
* Fix indentation
* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs
* Test that for single metric, 'score' is a key
* Typos
* Fix incorrect rebase
* Compare multimetric grid search with multiple single-metric searches
* Test X, y list and pandas input; test multimetric for unsupervised grid search
* Fix tests; unsupervised multimetric gs will not pass until scikit-learn#8117 is merged
* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators
* Add example to grid_search.rst
* Use the classic tuning of C param in SVM instead of estimators in RF
* FIX Remove scoring arg in default scorer test
* flake8
* Search for min_samples_split in DTC; also show f-score
* REVIEW Make check_multimetric_scoring private
* FIX Add more samples to see if 3% mismatch on 32-bit systems gets fixed
* REVIEW Plot best score; shorten legends
* REVIEW/COSMIT multimetric --> multi-metric
* REVIEW Mark the best scores of P/R scores too
* Revert "FIX Add more samples to see if 3% mismatch on 32-bit systems gets fixed" (this reverts commit ba766d9)
* ENH Use looping for iid testing
* FIX use param grid as scipy's stats dist in 0.12 do not accept seed
* ENH more looping, less code; use small non-noisy dataset
* FIX Use named arg after expanded args
* TST More testing of the refit parameter
* Test that in multimetric search refit to single metric, the delegated methods work as expected
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error
* COSMIT multimetric --> multi-metric
* REV Correct example doc
* COSMIT
* REVIEW Make tests stronger; fix bugs in _check_multimetric_scorer
* REVIEW refit param: raise for empty strings
* TST Invalid refit params
* REVIEW Use <scorer_name> alone; recall --> Recall
* REV specify when we expect scorers to not be None
* FLAKE8
* REVERT multimetrics in learning_curve and validation_curve
* REVIEW Simpler coding style
* COSMIT
* COSMIT
* REV Compress example a bit; move comment to top
* FIX fit_grid_point's previous API must be preserved
* Flake8
* TST Use loop; compare with single-metric
* REVIEW Use dict-comprehension instead of helper
* REVIEW Remove redundant test
* Fix tests' incorrect braces
* COSMIT
* REVIEW Use regexp
* REV Simplify aggregation of score dicts
* FIX precision and accuracy test
* FIX doctest and flake8
* TST the best_* attributes multimetric with single metric
* Address @jnothman's review
* Address more comments \o/
* DOCFIXES
* Fix: use the validated fit_param from fit's arguments
* Revert alpha to a lower value as before
* Using def instead of lambda
* Address @jnothman's review batch 1: fix tests / doc fixes
* Remove superfluous tests
* Remove more superfluous testing
* TST/FIX loop over refit and check found n_clusters
* Cosmetic touches
* Use zip instead of manually listing the keys
* Fix inverse_transform
* FIX bug in fit_grid_point; allow only single score; TST if fit_grid_point works as intended
* ENH Use only ROC-AUC and F1-score
* Fix typos and flake8; address Andy's reviews; MNT Add a comment on why we do such a transpose + some fixes
* ENH Better error messages for incorrect multimetric scoring values + ENH Avoid exception traceback while using incorrect scoring string
* Dict keys must be of string type only
* 1. Better error message for invalid scoring; 2. Internal functions return single score for single-metric scoring
* Fix test failures and shuffle tests
* Avoid wrapping scorer as dict in learning_curve
* Remove doc example as asked for
* Some leftover ones
* Don't wrap scorer in validation_curve either
* Add a doc example and skip it as dict order fails doctest
* Import zip from six for python2.7 compat
* Make cross_val_score return a cv_results-like dict
* Add relevant sections to userguide
* Flake8 fixes
* Add whatsnew and fix broken links
* Use AUC and accuracy instead of f1
* Fix failing doctests in cross_validation.rst
* DOC add the wrapper example for metrics that return multiple return values
* Address Andy's comments
* Be less weird
* Address more of Andy's comments
* Make a separate cross_validate function to return a dict, and a cross_val_score
* Update the docs to reflect the new cross_validate function
* Add cross_validate to toc-tree
* Add more tests on type of cross_validate return and time limits
* FIX failing doctests
* FIX ensure keys are not plural
* DOC fix
* Address some pending comments
* Remove the comment as it is irrelevant now
* Remove excess blank line
* Fix flake8 inconsistencies
* Allow fit_times to be 0 to conform with windows precision
* DOC specify how refit param is to be set in multiple metric case
* TST ensure cross_validate works for string single metrics + address @jnothman's reviews
* Doc fixes
* Remove the shape and transform parameter of _aggregate_score_dicts
* Address Joel's doc comments
* Fix broken doctest
* Fix the spurious file
* Address Andy's comments
* MNT Remove erroneous entry
* Address Andy's comments
* FIX broken links
* Update whats_new.rst missing newline
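As the log above notes (refit must be set explicitly for multiple metrics, and grid_scores_ is unavailable in multi-metric mode), the merged search interface looks roughly like this sketch of the final API:
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = make_classification(random_state=0)
>>> gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
...                   param_grid={'min_samples_split': [2, 10, 40]},
...                   scoring={'AUC': 'roc_auc', 'Accuracy': 'accuracy'},
...                   refit='AUC',  # metric used to pick best params for the final refit
...                   cv=3).fit(X, y)
>>> sorted(k for k in gs.cv_results_ if k.startswith('mean_test_'))
['mean_test_AUC', 'mean_test_Accuracy']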
* Fix Rouseeuw1984 broken link
* Change label vbgmm to bgmm (previously modified with PR #6651)
* Change tag name: "Old" refers to new tag added with PR #7388
* Remove prefix underscore to match tag
* Realign to fit 80 chars
* Link to metrics.rst; pairwise metrics yet to be documented
* Remove tag as LSHForest is deprecated
* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py; it is deprecated
* Fix a few Sphinx warnings
* Realign to 80 chars
* Changes based on PR review
* Remove unused ref in calibration
* Fix link ref in covariance.rst
* Fix linking issues
* Differentiate Rouseeuw1999 tag within file
* Change all duplicate Rouseeuw1999 tags
* Remove numbers from tag Rousseeuw
…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. 
* Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... 
Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline
* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw
…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. 
* Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... 
Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline
* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw
…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. 
* Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... 
Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline
…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. 
* Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... 
Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline
* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw
…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. 
* Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... 
Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline
* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw
…ate on multiple metrics (scikit-learn#7388)

* ENH cross_val_score now supports multiple metrics
* DOCFIX permutation_test_score
* ENH validate multiple metric scorers
* ENH Move validation of multimetric scoring param out
* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics
* EXA Add an example demonstrating the multiple metric in GridSearchCV
* ENH Let check_multimetric_scoring tell if its multimetric or not
* FIX For single metric name of scorer should remain 'score'
* ENH validation_curve and learning_curve now support multiple metrics
* MNT move _aggregate_score_dicts helper into _validation.py
* TST More testing/ Fixing scores to the correct values
* EXA Add cross_val_score to multimetric example
* Rename to multiple_metric_evaluation.py
* MNT Remove scaffolding
* FIX doctest imports
* FIX wrap the scorer and unwrap the score when using _score() in rfe
* TST Cleanup the tests. Test for is_multimetric too
* TST Make sure it registers as single metric when scoring is of that type
* PEP8
* Don't use dict comprehension to make it work in python2.6
* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation
* FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV
* ENH add option to disable delegation on multimetric scoring
* Remove old function from __all__
* flake8
* FIX revert disable_on_multimetric
* stash
* Fix incorrect rebase
* [ci skip]
* Make sure refit works as expected and remove irrelevant tests
* Allow passing standard scorers by name in multimetric scorers
* Fix example
* flake8
* Address reviews
* Fix indentation
* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs
* Test that for single metric, 'score' is a key
* Typos
* Fix incorrect rebase
* Compare multimetric grid search with multiple single metric searches
* Test X, y list and pandas input; Test multimetric for unsupervised grid search
* Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged
* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators
* Add example to grid_search.rst
* Use the classic tuning of C param in SVM instead of estimators in RF
* FIX Remove scoring arg in deafult scorer test
* flake8
* Search for min_samples_split in DTC; Also show f-score
* REVIEW Make check_multimetric_scoring private
* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed
* REVIEW Plot best score; Shorten legends
* REVIEW/COSMIT multimetric --> multi-metric
* REVIEW Mark the best scores of P/R scores too
* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9.
* ENH Use looping for iid testing
* FIX use param grid as scipy's stats dist in 0.12 do not accept seed
* ENH more looping less code; Use small non-noisy dataset
* FIX Use named arg after expanded args
* TST More testing of the refit parameter
* Test that in multimetric search refit to single metric, the delegated methods work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error
* COSMIT multimetric --> multi-metric
* REV Correct example doc
* COSMIT
* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer
* REVIEW refit param: Raise for empty strings
* TST Invalid refit params
* REVIEW Use <scorer_name> alone; recall --> Recall
* REV specify when we expect scorers to not be None
* FLAKE8
* REVERT multimetrics in learning_curve and validation_curve
* REVIEW Simpler coding style
* COSMIT
* COSMIT
* REV Compress example a bit. Move comment to top
* FIX fit_grid_point's previous API must be preserved
* Flake8
* TST Use loop; Compare with single-metric
* REVIEW Use dict-comprehension instead of helper
* REVIEW Remove redundant test
* Fix tests incorrect braces
* COSMIT
* REVIEW Use regexp
* REV Simplify aggregation of score dicts
* FIX precision and accuracy test
* FIX doctest and flake8
* TST the best_* attributes multimetric with single metric
* Address @jnothman's review
* Address more comments \o/
* DOCFIXES
* Fix use the validated fit_param from fit's arguments
* Revert alpha to a lower value as before
* Using def instead of lambda
* Address @jnothman's review batch 1: Fix tests / Doc fixes
* Remove superfluous tests
* Remove more superfluous testing
* TST/FIX loop over refit and check found n_clusters
* Cosmetic touches
* Use zip instead of manually listing the keys
* Fix inverse_transform
* FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended
* ENH Use only ROC-AUC and F1-score
* Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes
* ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string
* Dict keys must be of string type only
* 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring
* Fix test failures and shuffle tests
* Avoid wrapping scorer as dict in learning_curve
* Remove doc example as asked for
* Some leftover ones
* Don't wrap scorer in validation_curve either
* Add a doc example and skip it as dict order fails doctest
* Import zip from six for python2.7 compat
* Make cross_val_score return a cv_results-like dict
* Add relevant sections to userguide
* Flake8 fixes
* Add whatsnew and fix broken links
* Use AUC and accuracy instead of f1
* Fix failing doctests cross_validation.rst
* DOC add the wrapper example for metrics that return multiple return values
* Address andy's comments
* Be less weird
* Address more of andy's comments
* Make a separate cross_validate function to return dict and a cross_val_score
* Update the docs to reflect the new cross_validate function
* Add cross_validate to toc-tree
* Add more tests on type of cross_validate return and time limits
* FIX failing doctests
* FIX ensure keys are not plural
* DOC fix
* Address some pending comments
* Remove the comment as it is irrelevant now
* Remove excess blank line
* Fix flake8 inconsistencies
* Allow fit_times to be 0 to conform with windows precision
* DOC specify how refit param is to be set in multiple metric case
* TST ensure cross_validate works for string single metrics + address @jnothman's reviews
* Doc fixes
* Remove the shape and transform parameter of _aggregate_score_dicts
* Address Joel's doc comments
* Fix broken doctest
* Fix the spurious file
* Address Andy's comments
* MNT Remove erroneous entry
* Address Andy's comments
* FIX broken links
* Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link
* Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651
* Change tag name Old refers to new tag added with PR scikit-learn#7388
* Remove prefix underscore to match tag
* Realign to fit 80 chars
* Link to metrics.rst. pairwise metrics yet to be documented
* Remove tag as LSHForest is deprecated
* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated.
* Fix few Sphinx warnings
* Realign to 80 chars
* Changes based on PR review
* Remove unused ref in calibration
* Fix link ref in covariance.rst
* Fix linking issues
* Differentiate Rouseeuw1999 tag within file.
* Change all duplicate Rouseeuw1999 tags
* Remove numbers from tag Rousseeuw
Supersedes #2759
Fixes #1837

TODO

- `check_multimetric_scoring` for validation of multimetric `scoring` param
- `cross_val_score`
- `GridSearchCV` and `RandomizedSearchCV`
- `refit` param w.r.t multimetric setting...
- `GridSearchCV` plotting multiple metrics for the search of `min_samples_split` on a dtc
- `refit='<metric/scorer>'`
- `validation_curve`
- `learning_curve`
- `fit_grid_point` better to ensure previous public API is not broken
- `cross_val_score` a dict (like grid-search's `cv_results_`)
- `cross_val_score`'s userguide for multi-metric
- `GridSearchCV`'s userguide for multi-metric
Currently, in master, `scoring` can only be a single string ('precision', etc.) or a single callable (`make_scorer(precision_score)`, `custom_scorer`).

In this PR, `scoring` can now be a list/tuple like `('precision', 'accuracy', ...)` or a dict like `{'precision': make_scorer(precision_score), 'accuracy score': 'accuracy', 'custom': custom_scorer_callable}`.
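A minimal sketch of building both forms; `my_custom_loss` here is a hypothetical stand-in for the `custom_scorer_callable` mentioned above:

```python
from sklearn.metrics import make_scorer, precision_score

def my_custom_loss(y_true, y_pred):
    # Hypothetical metric: fraction of exact matches (same as accuracy).
    return (y_true == y_pred).mean()

# As a list/tuple of known scorer names:
scoring_as_list = ['precision', 'accuracy']

# As a dict mapping names to scorers; values may mix callables wrapped
# with make_scorer and builtin scorer name strings:
scoring_as_dict = {
    'precision': make_scorer(precision_score),
    'accuracy score': 'accuracy',
    'custom': make_scorer(my_custom_loss),
}
```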
If `scoring` is of multimetric type, the return of `cross_val_score`/`learning_curve`/`validation_curve` will be a dict mapping `scorer_names` to their corresponding `train_scores` or `test_scores`.
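For example, a sketch of that dict-style return, shown here with the `cross_validate` function that the commit log above says this work eventually settled on (the exact key names are assumptions based on the merged naming scheme):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)
scores = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                        scoring=('precision', 'accuracy'), cv=5)

# One array of per-split scores per scorer, plus timing entries;
# train_* keys appear as well when return_train_score is enabled.
print(sorted(scores))
# e.g. ['fit_time', 'score_time', 'test_accuracy', 'test_precision']
print(scores['test_accuracy'])  # 5 values, one per CV split
```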
`GridSearchCV`'s attributes `best_index_`, `best_params_` and `best_score_` will correspond to the metric set by the `refit` param. If `refit` is simply `True`, an error is raised.
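A minimal sketch of the intended `refit` behaviour under multi-metric scoring (dataset and parameter values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)
gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
                  param_grid={'min_samples_split': [2, 10, 50]},
                  scoring={'AUC': 'roc_auc', 'Accuracy': 'accuracy'},
                  refit='AUC', cv=5)
gs.fit(X, y)
print(gs.best_params_)  # best candidate as ranked by the 'AUC' metric
print(gs.best_score_)   # mean cross-validated AUC of that candidate
# With multi-metric scoring and refit left at True (the default),
# fit raises instead, since there is no single metric to refit on.
```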
`GridSearchCV`'s `cv_results_` attribute will consist of keys ending with the scorer names for multiple metrics...
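Continuing the sketch above, the per-scorer key suffixing would look roughly like:

```python
# Result columns are suffixed with each scorer's name instead of the
# single-metric 'score' suffix:
print([k for k in gs.cv_results_ if k.endswith('_AUC')])
# e.g. ['mean_test_AUC', 'rank_test_AUC', 'split0_test_AUC', ...,
#       'std_test_AUC']
```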
A sample plot of the multi-metric search over `min_samples_split` in a dtc (click on the plot to go to the example hosted at Circle CI):

cc: @jnothman @amueller @vene