Added user guide documentation for permutation_test_score #14757


Closed
wants to merge 9 commits
17 changes: 17 additions & 0 deletions doc/modules/cross_validation.rst
@@ -300,6 +300,23 @@ section.
* :ref:`sphx_glr_auto_examples_model_selection_plot_cv_predict.py`,
* :ref:`sphx_glr_auto_examples_model_selection_plot_nested_cross_validation_iris.py`.


.. _cv_significance_evaluation:

Cross-validation significance evaluation
----------------------------------------

The significance of cross-validation scores can be evaluated using the
:func:`permutation_test_score` function. The function returns a p-value, which
approximates the probability that the observed average cross-validation score
would be obtained by chance if the target were independent of the data.
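For context, a minimal usage sketch of the function being documented (the iris
data and linear SVC are illustrative choices, not part of this PR; the call
follows the signature shown in the ``_validation.py`` diff below):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")

# score: the cross-validated score on the true labels;
# perm_scores: one score per label permutation;
# pvalue: fraction of permutations scoring at least as well as `score`.
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, cv=5, n_permutations=30, random_state=0)
```

A p-value near the minimum possible value (``1 / (n_permutations + 1)``) here
indicates the score is very unlikely under the null hypothesis.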

It also returns cross-validation scores for each permutation of the y labels. It
Member:

Talking about return value here confuses the matter if what we want to talk about is how the P value is constructed

Author:

This documentation is for user guide. The function description and the paper cited give more details about how the p-value is constructed.

Member:

Ordinarily our user guide is less focused on API things like what the function returns. Conversely, a description of how the algorithm works belongs in the user guide not in the docstring.

Contributor (@kellycarmody, Sep 11, 2019):

Hi, we're trying to get all of the PR's from the WiMLDS sprint wrapped up, are we waiting on @aditi9783, or approval from another reviewer? Thanks!

Not completely sure what's going on with this one @reshamas

Member:

@kellycarmody We are waiting on @aditi9783

@aditi9783 We use the user guide to explain the math with the details necessary to explain the function. The function's docstring is usually brief and links to the user guide. In this case, moving most of the docstring into the user guide would be good.

Contributor:

Hi @thomasjpfan, @reshamas asked me to take over this PR, but I see that Nicolas Hug already approved the changes, and it is just waiting for one more reviewer.

Is the PR good? Or should I move most of the docstring into the user guide?

permutes the labels of the samples and computes the p-value against the null
hypothesis that the features and the labels are independent, meaning that there
Member:

Should "features" be "predictions"? Or is this right, if we add "conditioned on the estimator"?

Author:

Features are the input vectors (X), and thus not predictions.

Member:

Yeah, so you need to add "given the estimator"

is no difference between the classes.
Member:

"classes" implies classification, which we aren't necessarily talking about here.

Author:

The function permutes class labels. Isn't classification implied in that case?

Member:

No, it permutes target values for each sample, not class labels.
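To illustrate the reviewer's point, the permutation shuffles target values
across samples, so continuous regression targets work just as well as class
labels. A minimal NumPy sketch (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(12).reshape(6, 2)                # features are left untouched
y = np.array([0.2, 1.4, 0.7, 2.1, 0.9, 1.8])  # continuous targets, not class labels

# Re-assign targets to samples at random, breaking any X -> y dependence
# while preserving the marginal distribution of y.
y_perm = y[rng.permutation(len(y))]
```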



Cross validation iterators
==========================

27 changes: 21 additions & 6 deletions sklearn/model_selection/_validation.py
@@ -961,9 +961,23 @@ def _index_param_value(X, v, indices):
def permutation_test_score(estimator, X, y, groups=None, cv=None,
n_permutations=100, n_jobs=None, random_state=0,
verbose=0, scoring=None):
"""Evaluate the significance of a cross-validated score with permutations
"""Evaluates the significance of a cross-validated score by permutations.

Read more in the :ref:`User Guide <cross_validation>`.
Permutes labels and computes the p-value against the null
hypothesis that the features and the labels are independent, meaning that
there is no difference between the classes.

The p-value represents the fraction of randomized data sets where the
classifier would have had a larger error on the original data
than in the randomized one.

A small p-value (under a threshold, like ``0.05``) gives
enough evidence to conclude that the classifier has not learned a random
pattern in the data.
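The fraction described above can be sketched as follows. This is a hand-rolled
illustration of an empirical permutation p-value, not the library's internal
code; the ``+ 1`` terms are the usual correction that prevents reporting an
impossible p-value of exactly zero:

```python
import numpy as np

def empirical_pvalue(score, permutation_scores):
    """Fraction of permutations scoring at least as well as the original."""
    permutation_scores = np.asarray(permutation_scores)
    at_least_as_good = np.sum(permutation_scores >= score)
    return (at_least_as_good + 1) / (len(permutation_scores) + 1)
```

For example, if none of 99 permutations match the original score, the p-value
is ``1 / 100 = 0.01``, the smallest value obtainable with 99 permutations.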

Read more in the :ref:`User Guide <cv_significance_evaluation>`.

.. versionadded:: 0.9

Parameters
----------
@@ -1050,11 +1064,12 @@ def permutation_test_score(estimator, X, y, groups=None, cv=None,

Notes
-----
This function implements Test 1 in:
This function implements "Test 1" as described in the following paper:

Ojala and Garriga. Permutation Tests for Studying Classifier
Performance. The Journal of Machine Learning Research (2010)
vol. 11
* `Permutation Tests for Studying Classifier Performance
<http://ieeexplore.ieee.org/document/5360332/>`_,
Ojala and Garriga - The Journal of Machine Learning Research (2010)
vol. 11

"""
X, y, groups = indexable(X, y, groups)