
[MRG + 1] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search #8117


Conversation

raghavrv
Member

@raghavrv raghavrv commented Dec 26, 2016

This adds all cluster metrics that use supervised evaluation, like fowlkes-mallows, etc...

Code to reproduce

>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.cluster import KMeans
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> X, y = iris.data, iris.target

>>> km = KMeans(random_state=42)
>>> grid_search = GridSearchCV(km, param_grid=dict(n_clusters=[2, 3, 4, 5]),
...                            scoring='fowlkes_mallows_score')
>>> grid_search.fit(X, y).best_params_['n_clusters']

At master


'fowlkes_mallows_score' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1',
 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error',
 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples',
 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']

In this branch

3   # = np.unique(iris.target).shape[0]

@tguillemot @jnothman
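
(An aside: whether a scoring string is registered can be checked directly against the SCORERS dict; a minimal sketch assuming the public sklearn.metrics.SCORERS of this era, which was deprecated much later, in 1.0:)

>>> from sklearn.metrics import SCORERS
>>> 'fowlkes_mallows_score' in SCORERS  # False at master, True with this PR
True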

@jnothman
Member

Currently no clustering scores are listed there, are they? That's because clusterers don't generally implement predict.
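
A minimal sketch (mine, not from the thread) of the relevant distinction: scorers call estimator.predict, which KMeans provides but many clusterers, e.g. DBSCAN, do not:

>>> from sklearn.cluster import KMeans, DBSCAN
>>> hasattr(KMeans(), 'predict')  # inductive clusterer: can label new samples
True
>>> hasattr(DBSCAN(), 'predict')  # no predict, so a predict-based scorer can't be used
False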

@raghavrv
Member Author

Ah, so no clustering metric is added :/

adjusted_rand_score alone seems to be listed, and hence only that is permitted in grid search...

This doesn't work:

    grid_search = GridSearchCV(km, param_grid=dict(n_clusters=[2, 3, 4]),
                               scoring='fowlkes_mallows_score')
    grid_search.fit(X, y)

@raghavrv
Member Author

Should we add all the clustering metrics?

@raghavrv
Member Author

raghavrv commented Dec 26, 2016

(All cluster metrics that use supervised evaluation, i.e. that compare true and predicted labels like a classification metric?)
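
For reference, a sketch of the supervised (label-comparison) cluster metrics in sklearn.metrics.cluster, as I understand the set this PR targets; all of them are invariant to a permutation of the predicted labels:

>>> from sklearn.metrics.cluster import (
...     adjusted_rand_score, adjusted_mutual_info_score, completeness_score,
...     fowlkes_mallows_score, homogeneity_score, mutual_info_score,
...     normalized_mutual_info_score, v_measure_score)
>>> y_true, y_pred = [0, 0, 1, 1], [1, 1, 0, 0]  # same partition, labels swapped
>>> fowlkes_mallows_score(y_true, y_pred)
1.0
>>> adjusted_rand_score(y_true, y_pred)
1.0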

@jnothman
Member

jnothman commented Dec 26, 2016 via email

@raghavrv raghavrv force-pushed the add_cluster_metrics_to_scorer_error_list branch from 281197d to 77a133d on December 27, 2016 08:27
@raghavrv raghavrv changed the title [MRG] Add fowlkess-mallows and calinski scores to metrics.scorers [MRG] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search Dec 27, 2016
@raghavrv
Member Author

> we need to deal with the scoring framework for clusterers. (Might be an interesting thing to shape up as a GSoC project??)

+1. Maybe we should start a dedicated issue or wiki page well before the GSoC timeline to sketch out the design, so the student can spend less time on API design and more time on implementation...

@raghavrv
Member Author

And this is ready for a review... All supervised cluster metrics have been added, and there is a test for fowlkes_mallows_score in GridSearchCV with KMeans, in addition to adjusted_rand_score...
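
A rough sketch of what such a test could look like (my reconstruction from the iris example in the PR description, not necessarily the merged test):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV

    def test_fowlkes_mallows_scorer_in_grid_search():
        # Hypothetical test body; asserts the true number of iris species
        # (3) is recovered, as in the snippet at the top of this PR.
        iris = load_iris()
        gs = GridSearchCV(KMeans(random_state=42),
                          param_grid=dict(n_clusters=[2, 3, 4, 5]),
                          scoring='fowlkes_mallows_score')
        assert gs.fit(iris.data, iris.target).best_params_['n_clusters'] == 3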

@jnothman
Member

Please update doc/modules/model_evaluation.rst

@raghavrv
Member Author

Done! :)

@raghavrv
Member Author

raghavrv commented Dec 27, 2016

(Apart from another look by Joel for the 1st review) can I have a 2nd review from @TomDLT @tguillemot @lesteve in parallel?

Contributor

@tguillemot tguillemot left a comment


LGTM

@raghavrv raghavrv changed the title [MRG] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search [MRG + 1] Add fowlkess-mallows and other supervised cluster metrics to SCORERS dict so it can be used in hyper-param search Jan 3, 2017
@raghavrv
Member Author

raghavrv commented Jan 3, 2017

Thanks for the review @tguillemot :)

@raghavrv
Member Author

raghavrv commented Jan 5, 2017

Another review from @amueller maybe?

@@ -18,7 +18,7 @@
 from sklearn.base import BaseEstimator
 from sklearn.metrics import (f1_score, r2_score, roc_auc_score, fbeta_score,
                              log_loss, precision_score, recall_score)
-from sklearn.metrics.cluster import adjusted_rand_score
+from sklearn.metrics import cluster as cluster_module
Contributor

@tguillemot tguillemot Jan 6, 2017


I'd prefer you directly import all the necessary metrics, as is done on the line before.
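
The suggested style would look roughly like this (a sketch; the exact set of names to import is my assumption):

    from sklearn.metrics.cluster import (adjusted_mutual_info_score,
                                         adjusted_rand_score,
                                         completeness_score,
                                         fowlkes_mallows_score,
                                         homogeneity_score,
                                         mutual_info_score,
                                         normalized_mutual_info_score,
                                         v_measure_score)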

Contributor


Not a big deal.

@agramfort agramfort merged commit 2f7f5a1 into scikit-learn:master Jan 6, 2017
@agramfort
Member

thx @raghavrv !

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)

* Add supervised cluster metrics to metrics.scorers

* Add all the supervised cluster metrics to the tests

* Add test for fowlkes_mallows_score in unsupervised grid search

* COSMIT: Clarify comment on CLUSTER_SCORERS

* Fix doctest
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request May 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
amueller pushed a commit that referenced this pull request Jul 7, 2017
…ate on multiple metrics (#7388)

* ENH cross_val_score now supports multiple metrics

* DOCFIX permutation_test_score

* ENH validate multiple metric scorers

* ENH Move validation of multimetric scoring param out

* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics

* EXA Add an example demonstrating the multiple metric in GridSearchCV

* ENH Let check_multimetric_scoring tell if its multimetric or not

* FIX For single metric name of scorer should remain 'score'

* ENH validation_curve and learning_curve now support multiple metrics

* MNT move _aggregate_score_dicts helper into _validation.py

* TST More testing/ Fixing scores to the correct values

* EXA Add cross_val_score to multimetric example

* Rename to multiple_metric_evaluation.py

* MNT Remove scaffolding

* FIX doctest imports

* FIX wrap the scorer and unwrap the score when using _score() in rfe

* TST Cleanup the tests. Test for is_multimetric too

* TST Make sure it registers as single metric when scoring is of that type

* PEP8

* Don't use dict comprehension to make it work in python2.6

* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation

* FIX+TST delegated methods NA when multimetric is enabled...

TST Add general tests to GridSearchCV and RandomizedSearchCV

* ENH add option to disable delegation on multimetric scoring

* Remove old function from __all__

* flake8

* FIX revert disable_on_multimetric

* stash

* Fix incorrect rebase

* [ci skip]

* Make sure refit works as expected and remove irrelevant tests

* Allow passing standard scorers by name in multimetric scorers

* Fix example

* flake8

* Address reviews

* Fix indentation

* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs

* Test that for single metric, 'score' is a key

* Typos

* Fix incorrect rebase

* Compare multimetric grid search with multiple single metric searches

* Test X, y list and pandas input; Test multimetric for unsupervised grid search

* Fix tests; Unsupervised multimetric gs will not pass until #8117 is merged

* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators

* Add example to grid_search.rst

* Use the classic tuning of C param in SVM instead of estimators in RF

* FIX Remove scoring arg in deafult scorer test

* flake8

* Search for min_samples_split in DTC; Also show f-score

* REVIEW Make check_multimetric_scoring private

* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed

* REVIEW Plot best score; Shorten legends

* REVIEW/COSMIT multimetric --> multi-metric

* REVIEW Mark the best scores of P/R scores too

* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed"

This reverts commit ba766d9.

* ENH Use looping for iid testing

* FIX use param grid as scipy's stats dist in 0.12 do not accept seed

* ENH more looping less code; Use small non-noisy dataset

* FIX Use named arg after expanded args

* TST More testing of the refit parameter

* Test that in multimetric search refit to single metric, the delegated methods
  work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error

* COSMIT multimetric --> multi-metric

* REV Correct example doc

* COSMIT

* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer

* REVIEW refit param: Raise for empty strings

* TST Invalid refit params

* REVIEW Use <scorer_name> alone; recall --> Recall

* REV specify when we expect scorers to not be None

* FLAKE8

* REVERT multimetrics in learning_curve and validation_curve

* REVIEW Simpler coding style

* COSMIT

* COSMIT

* REV Compress example a bit. Move comment to top

* FIX fit_grid_point's previous API must be preserved

* Flake8

* TST Use loop; Compare with single-metric

* REVIEW Use dict-comprehension instead of helper

* REVIEW Remove redundant test

* Fix tests incorrect braces

* COSMIT

* REVIEW Use regexp

* REV Simplify aggregation of score dicts

* FIX precision and accuracy test

* FIX doctest and flake8

* TST the best_* attributes multimetric with single metric

* Address @jnothman's review

* Address more comments \o/

* DOCFIXES

* Fix use the validated fit_param from fit's arguments

* Revert alpha to a lower value as before

* Using def instead of lambda

* Address @jnothman's review batch 1: Fix tests / Doc fixes

* Remove superfluous tests

* Remove more superfluous testing

* TST/FIX loop over refit and check found n_clusters

* Cosmetic touches

* Use zip instead of manually listing the keys

* Fix inverse_transform

* FIX bug in fit_grid_point; Allow only single score

TST if fit_grid_point works as intended

* ENH Use only ROC-AUC and F1-score

* Fix typos and flake8; Address Andy's reviews

MNT Add a comment on why we do such a transpose + some fixes

* ENH Better error messages for incorrect multimetric scoring values +...

ENH Avoid exception traceback while using incorrect scoring string

* Dict keys must be of string type only

* 1. Better error message for invalid scoring 2...
Internal functions return single score for single metric scoring

* Fix test failures and shuffle tests

* Avoid wrapping scorer as dict in learning_curve

* Remove doc example as asked for

* Some leftover ones

* Don't wrap scorer in validation_curve either

* Add a doc example and skip it as dict order fails doctest

* Import zip from six for python2.7 compat

* Make cross_val_score return a cv_results-like dict

* Add relevant sections to userguide

* Flake8 fixes

* Add whatsnew and fix broken links

* Use AUC and accuracy instead of f1

* Fix failing doctests cross_validation.rst

* DOC add the wrapper example for metrics that return multiple return values

* Address andy's comments

* Be less weird

* Address more of andy's comments

* Make a separate cross_validate function to return dict and a cross_val_score

* Update the docs to reflect the new cross_validate function

* Add cross_validate to toc-tree

* Add more tests on type of cross_validate return and time limits

* FIX failing doctests

* FIX ensure keys are not plural

* DOC fix

* Address some pending comments

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline
massich pushed a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017
…ate on multiple metrics (scikit-learn#7388)
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…ate on multiple metrics (scikit-learn#7388)
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…ate on multiple metrics (scikit-learn#7388)
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…ate on multiple metrics (scikit-learn#7388)
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…ate on multiple metrics (scikit-learn#7388)
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
…ate on multiple metrics (scikit-learn#7388)
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…o SCORERS dict so it can be used in hyper-param search (scikit-learn#8117)
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…ate on multiple metrics (scikit-learn#7388)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
…ate on multiple metrics (scikit-learn#7388)
