
[MRG + 1] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search #8117


Conversation

raghavrv
Member

@raghavrv raghavrv commented Dec 26, 2016

This adds all cluster metrics that use supervised evaluation, like fowlkes-mallows, etc...

Code to reproduce

>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.cluster import KMeans
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> X, y = iris.data, iris.target

>>> km = KMeans(random_state=42)
>>> grid_search = GridSearchCV(km, param_grid=dict(n_clusters=[2, 3, 4, 5]),
...                            scoring='fowlkes_mallows_score')
>>> grid_search.fit(X, y).best_params_['n_clusters']

At master


'fowlkes_mallows_score' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1',
 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error',
 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples',
 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']

In this branch

3   # = np.unique(iris.target).shape[0]

@tguillemot @jnothman
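
(An aside: whether a scoring string is registered can be checked directly against the SCORERS dict; a minimal sketch assuming the public sklearn.metrics.SCORERS of this era, which was deprecated much later, in 1.0:)

>>> from sklearn.metrics import SCORERS
>>> 'fowlkes_mallows_score' in SCORERS  # False at master, True with this PR
True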

@jnothman
Member

Currently no clustering scores are listed there, are they? That's because clusterers don't generally implement predict.
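
A minimal sketch (mine, not from the thread) of the relevant distinction: scorers call estimator.predict, which KMeans provides but many clusterers, e.g. DBSCAN, do not:

>>> from sklearn.cluster import KMeans, DBSCAN
>>> hasattr(KMeans(), 'predict')  # inductive clusterer: can label new samples
True
>>> hasattr(DBSCAN(), 'predict')  # no predict, so a predict-based scorer can't be used
False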

@raghavrv
Member Author

Ah, so no clustering metric is added :/

adjusted_rand_score alone seems to be listed, and hence only that is permitted in grid search...

This doesn't work:

    grid_search = GridSearchCV(km, param_grid=dict(n_clusters=[2, 3, 4]),
                               scoring='fowlkes_mallows_score')
    grid_search.fit(X, y)

@raghavrv
Member Author

Should we add all the clustering metrics?

@raghavrv
Member Author

raghavrv commented Dec 26, 2016

(All cluster metrics that use supervised evaluation, i.e. that compare true and predicted labels like a classification metric?)
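
For reference, a sketch of the supervised (label-comparison) cluster metrics in sklearn.metrics.cluster, as I understand the set this PR targets; all of them are invariant to a permutation of the predicted labels:

>>> from sklearn.metrics.cluster import (
...     adjusted_rand_score, adjusted_mutual_info_score, completeness_score,
...     fowlkes_mallows_score, homogeneity_score, mutual_info_score,
...     normalized_mutual_info_score, v_measure_score)
>>> y_true, y_pred = [0, 0, 1, 1], [1, 1, 0, 0]  # same partition, labels swapped
>>> fowlkes_mallows_score(y_true, y_pred)
1.0
>>> adjusted_rand_score(y_true, y_pred)
1.0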

@jnothman
Member

jnothman commented Dec 26, 2016 via email

@raghavrv raghavrv force-pushed the add_cluster_metrics_to_scorer_error_list branch from 281197d to 77a133d on December 27, 2016 08:27
@raghavrv raghavrv changed the title [MRG] Add fowlkess-mallows and calinski scores to metrics.scorers [MRG] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search Dec 27, 2016
@raghavrv
Member Author

> we need to deal with the scoring framework for clusterers. (Might be an interesting thing to shape up as a GSoC project??)

+1. Maybe we should start a dedicated issue or wiki page well before the GSoC timeline to sketch out the design, so the student can spend less time on API design and more time on implementation...

@raghavrv
Member Author

And this is ready for a review... All supervised cluster metrics have been added, and there is a test for fowlkes_mallows_score in GridSearchCV with KMeans, in addition to adjusted_rand_score...
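
A rough sketch of what such a test could look like (my reconstruction from the iris example in the PR description, not necessarily the merged test):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV

    def test_fowlkes_mallows_scorer_in_grid_search():
        # Hypothetical test body; asserts the true number of iris species
        # (3) is recovered, as in the snippet at the top of this PR.
        iris = load_iris()
        gs = GridSearchCV(KMeans(random_state=42),
                          param_grid=dict(n_clusters=[2, 3, 4, 5]),
                          scoring='fowlkes_mallows_score')
        assert gs.fit(iris.data, iris.target).best_params_['n_clusters'] == 3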

@jnothman
Member

Please update doc/modules/model_evaluation.rst

@raghavrv
Member Author

Done! :)

@raghavrv
Member Author

raghavrv commented Dec 27, 2016

(Apart from another look by Joel for the 1st review) can I have a 2nd review from @TomDLT @tguillemot @lesteve in parallel?

Contributor

@tguillemot tguillemot left a comment


LGTM

@raghavrv raghavrv changed the title [MRG] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search [MRG + 1] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search Jan 3, 2017
@raghavrv
Member Author

raghavrv commented Jan 3, 2017

Thanks for the review @tguillemot :)

@raghavrv
Member Author

raghavrv commented Jan 5, 2017

Another review from @amueller maybe?

@@ -18,7 +18,7 @@
 from sklearn.base import BaseEstimator
 from sklearn.metrics import (f1_score, r2_score, roc_auc_score, fbeta_score,
                              log_loss, precision_score, recall_score)
-from sklearn.metrics.cluster import adjusted_rand_score
+from sklearn.metrics import cluster as cluster_module
Contributor

@tguillemot tguillemot Jan 6, 2017


I'd prefer you directly import all the necessary metrics, as is done on the line before.
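
The suggested style would look roughly like this (a sketch; the exact set of names to import is my assumption):

    from sklearn.metrics.cluster import (adjusted_mutual_info_score,
                                         adjusted_rand_score,
                                         completeness_score,
                                         fowlkes_mallows_score,
                                         homogeneity_score,
                                         mutual_info_score,
                                         normalized_mutual_info_score,
                                         v_measure_score)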

Contributor


Not a big deal.

@agramfort agramfort merged commit 2f7f5a1 into scikit-learn:master Jan 6, 2017
@agramfort
Member

thx @raghavrv !

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)

* Add supervised cluster metrics to metrics.scorers

* Add all the supervised cluster metrics to the tests

* Add test for fowlkes_mallows_score in unsupervised grid search

* COSMIT: Clarify comment on CLUSTER_SCORERS

* Fix doctest
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request May 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
amueller pushed a commit that referenced this pull request Jul 7, 2017
…ate on multiple metrics (#7388)

* ENH cross_val_score now supports multiple metrics

* DOCFIX permutation_test_score

* ENH validate multiple metric scorers

* ENH Move validation of multimetric scoring param out

* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics

* EXA Add an example demonstrating the multiple metric in GridSearchCV

* ENH Let check_multimetric_scoring tell if its multimetric or not

* FIX For single metric name of scorer should remain 'score'

* ENH validation_curve and learning_curve now support multiple metrics

* MNT move _aggregate_score_dicts helper into _validation.py

* TST More testing/ Fixing scores to the correct values

* EXA Add cross_val_score to multimetric example

* Rename to multiple_metric_evaluation.py

* MNT Remove scaffolding

* FIX doctest imports

* FIX wrap the scorer and unwrap the score when using _score() in rfe

* TST Cleanup the tests. Test for is_multimetric too

* TST Make sure it registers as single metric when scoring is of that type

* PEP8

* Don't use dict comprehension to make it work in python2.6

* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation

* FIX+TST delegated methods NA when multimetric is enabled...

TST Add general tests to GridSearchCV and RandomizedSearchCV

* ENH add option to disable delegation on multimetric scoring

* Remove old function from __all__

* flake8

* FIX revert disable_on_multimetric

* stash

* Fix incorrect rebase

* [ci skip]

* Make sure refit works as expected and remove irrelevant tests

* Allow passing standard scorers by name in multimetric scorers

* Fix example

* flake8

* Address reviews

* Fix indentation

* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs

* Test that for single metric, 'score' is a key

* Typos

* Fix incorrect rebase

* Compare multimetric grid search with multiple single metric searches

* Test X, y list and pandas input; Test multimetric for unsupervised grid search

* Fix tests; Unsupervised multimetric gs will not pass until #8117 is merged

* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators

* Add example to grid_search.rst

* Use the classic tuning of C param in SVM instead of estimators in RF

* FIX Remove scoring arg in deafult scorer test

* flake8

* Search for min_samples_split in DTC; Also show f-score

* REVIEW Make check_multimetric_scoring private

* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed

* REVIEW Plot best score; Shorten legends

* REVIEW/COSMIT multimetric --> multi-metric

* REVIEW Mark the best scores of P/R scores too

* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed"

This reverts commit ba766d9.

* ENH Use looping for iid testing

* FIX use param grid as scipy's stats dist in 0.12 do not accept seed

* ENH more looping less code; Use small non-noisy dataset

* FIX Use named arg after expanded args

* TST More testing of the refit parameter

* Test that in multimetric search refit to single metric, the delegated methods
  work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error

* COSMIT multimetric --> multi-metric

* REV Correct example doc

* COSMIT

* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer

* REVIEW refit param: Raise for empty strings

* TST Invalid refit params

* REVIEW Use <scorer_name> alone; recall --> Recall

* REV specify when we expect scorers to not be None

* FLAKE8

* REVERT multimetrics in learning_curve and validation_curve

* REVIEW Simpler coding style

* COSMIT

* COSMIT

* REV Compress example a bit. Move comment to top

* FIX fit_grid_point's previous API must be preserved

* Flake8

* TST Use loop; Compare with single-metric

* REVIEW Use dict-comprehension instead of helper

* REVIEW Remove redundant test

* Fix tests incorrect braces

* COSMIT

* REVIEW Use regexp

* REV Simplify aggregation of score dicts

* FIX precision and accuracy test

* FIX doctest and flake8

* TST the best_* attributes multimetric with single metric

* Address @jnothman's review

* Address more comments \o/

* DOCFIXES

* Fix use the validated fit_param from fit's arguments

* Revert alpha to a lower value as before

* Using def instead of lambda

* Address @jnothman's review batch 1: Fix tests / Doc fixes

* Remove superfluous tests

* Remove more superfluous testing

* TST/FIX loop over refit and check found n_clusters

* Cosmetic touches

* Use zip instead of manually listing the keys

* Fix inverse_transform

* FIX bug in fit_grid_point; Allow only single score

TST if fit_grid_point works as intended

* ENH Use only ROC-AUC and F1-score

* Fix typos and flake8; Address Andy's reviews

MNT Add a comment on why we do such a transpose + some fixes

* ENH Better error messages for incorrect multimetric scoring values +...

ENH Avoid exception traceback while using incorrect scoring string

* Dict keys must be of string type only

* 1. Better error message for invalid scoring 2...
Internal functions return single score for single metric scoring

* Fix test failures and shuffle tests

* Avoid wrapping scorer as dict in learning_curve

* Remove doc example as asked for

* Some leftover ones

* Don't wrap scorer in validation_curve either

* Add a doc example and skip it as dict order fails doctest

* Import zip from six for python2.7 compat

* Make cross_val_score return a cv_results-like dict

* Add relevant sections to userguide

* Flake8 fixes

* Add whatsnew and fix broken links

* Use AUC and accuracy instead of f1

* Fix failing doctests cross_validation.rst

* DOC add the wrapper example for metrics that return multiple return values

* Address andy's comments

* Be less weird

* Address more of andy's comments

* Make a separate cross_validate function to return dict and a cross_val_score

* Update the docs to reflect the new cross_validate function

* Add cross_validate to toc-tree

* Add more tests on type of cross_validate return and time limits

* FIX failing doctests

* FIX ensure keys are not plural

* DOC fix

* Address some pending comments

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline
massich pushed a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017
…ate on multiple metrics (scikit-learn#7388)
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…ate on multiple metrics (scikit-learn#7388)
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…ate on multiple metrics (scikit-learn#7388)
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…ate on multiple metrics (scikit-learn#7388)
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…ate on multiple metrics (scikit-learn#7388)
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
…ate on multiple metrics (scikit-learn#7388)
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…ate on multiple metrics (scikit-learn#7388)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
…ate on multiple metrics (scikit-learn#7388)
