From 6fb0605cc342f13b2a07d2b690e710a4d0481b4b Mon Sep 17 00:00:00 2001
From: Lucy Liu <jliu176@gmail.com>
Date: Fri, 31 Jul 2020 22:31:10 +0200
Subject: [PATCH 1/5] wip

---
 doc/modules/cross_validation.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/modules/cross_validation.rst b/doc/modules/cross_validation.rst
index 10808c4f4c82b..bf8552fb54398 100644
--- a/doc/modules/cross_validation.rst
+++ b/doc/modules/cross_validation.rst
@@ -856,3 +856,10 @@ Cross validation and model selection
 Cross validation iterators can also be used to directly perform model
 selection using Grid Search for the optimal hyperparameters of the
 model. This is the topic of the next section: :ref:`grid_search`.
+
+.. _permutation_test_score:
+
+Permutation test score
+======================
+
+

From f4601399ec97fedc7032edc8add5b0d82770e557 Mon Sep 17 00:00:00 2001
From: Lucy Liu <jliu176@gmail.com>
Date: Sat, 1 Aug 2020 17:50:29 +0200
Subject: [PATCH 2/5] add user guide

---
 doc/modules/cross_validation.rst       | 40 ++++++++++++++++++++++++++
 sklearn/model_selection/_validation.py |  2 ++
 2 files changed, 42 insertions(+)
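
A minimal usage sketch of the function documented in this patch, assuming a
recent scikit-learn. The dataset (iris), the linear ``SVC``, the 5-fold
``StratifiedKFold`` splitter and the random-feature control are illustrative
choices, not taken from the linked gallery example. The random features have no
dependency with the labels by construction, so they act as a negative control:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative negative control: random features with the same shape as X,
# independent of y by construction.
rng = np.random.RandomState(0)
X_rand = rng.normal(size=X.shape)

clf = SVC(kernel="linear")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Returns the CV score on the original labels, the CV scores obtained on the
# n_permutations shuffled label vectors, and the permutation p-value.
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, scoring="accuracy", cv=cv, n_permutations=100, random_state=0
)
score_rand, _, pvalue_rand = permutation_test_score(
    clf, X_rand, y, scoring="accuracy", cv=cv, n_permutations=100, random_state=0
)

print(f"iris:   score={score:.2f}, p-value={pvalue:.4f}")
print(f"random: score={score_rand:.2f}, p-value={pvalue_rand:.4f}")
```

Since the p-value is computed as ``(C + 1) / (n_permutations + 1)``, the
smallest value attainable with 100 permutations is 1/101 (about 0.0099), which
is what the iris run is expected to return. The random-feature run is not
expected to produce a small p-value, although its exact value depends on the
random seed.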

diff --git a/doc/modules/cross_validation.rst b/doc/modules/cross_validation.rst
index bf8552fb54398..6db2fe5195e6e 100644
--- a/doc/modules/cross_validation.rst
+++ b/doc/modules/cross_validation.rst
@@ -862,4 +862,44 @@ model. This is the topic of the next section: :ref:`grid_search`.
 Permutation test score
 ======================
 
+:func:`~sklearn.model_selection.permutation_test_score` offers another way
+to evaluate the performance of classifiers. It provides a permutation-based
+p-value, which represents how likely an observed performance of the
+classifier would be obtained by chance. The null hypothesis in this test is
+that the features and labels are independent.
+:func:`~sklearn.model_selection.permutation_test_score` generates a null
+distribution by calculating `n_permutations` different permutations of the
+data. In each permutation the labels are randomly shuffled, thereby removing
+any dependency between the features (data) and the labels. The p-value output
+is the fraction of permutations for which the score obtained is better
+that the score obtained using the original data.
+
+A low p-value provides evidence that the dataset contains real dependency
+between features and labels and the classifier was able to utilize this
+to obtain good results. A high p-value could be due to a lack of dependency
+between features and labels (there is no difference in feature values between
+the classes) or because the classifier was not able to use the dependency in
+the data. In the latter case, using a more appropriate classifier that
+is able to utilize the structure in the data could result in a low
+p-value.
+
+Cross-validation provides information about how well a classifier generalizes,
+specifically the range of expected errors of the classifier. However, a
+classifier trained on a high dimensional dataset with no structure may still
+perform well on cross-validation.
+:func:`~sklearn.model_selection.permutation_test_score` provides information
+on whether the classifier has found a real class structure and can help in
+evaluating the performance of the classifier.
+
+Finally, it is important to note that this test has been shown to produce low
+p-values even if there is only weak structure in the data
 
+.. topic:: Examples
+
+    * :ref:`sphx_glr_auto_examples_model_selection_plot_permutation_test_for_classification.py`
+
+.. topic:: References:
+
+ * Ojala and Garriga. `Permutation Tests for Studying Classifier Performance
+   <http://www.jmlr.org/papers/volume11/ojala10a/ojala10a.pdf>`_.
+   J. Mach. Learn. Res. 2010.
diff --git a/sklearn/model_selection/_validation.py b/sklearn/model_selection/_validation.py
index bcfe9a6bafedf..923d5a2a65c65 100644
--- a/sklearn/model_selection/_validation.py
+++ b/sklearn/model_selection/_validation.py
@@ -1053,6 +1053,8 @@ def permutation_test_score(estimator, X, y, *, groups=None, cv=None,
     and targets or the estimator was not able to use the dependency to
     give good predictions.
 
+    Read more in the :ref:`User Guide <permutation_test_score>`.
+
     Parameters
     ----------
     estimator : estimator object implementing 'fit'

From 9e621b246562841630addb2a55e5880a030a696e Mon Sep 17 00:00:00 2001
From: Lucy Liu <jliu176@gmail.com>
Date: Sat, 1 Aug 2020 18:59:48 +0200
Subject: [PATCH 3/5] fix link to ex

---
 doc/modules/cross_validation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/modules/cross_validation.rst b/doc/modules/cross_validation.rst
index 6db2fe5195e6e..4b60d38650484 100644
--- a/doc/modules/cross_validation.rst
+++ b/doc/modules/cross_validation.rst
@@ -896,7 +896,7 @@ p-values even if there is only weak structure in the data
 
 .. topic:: Examples
 
-    * :ref:`sphx_glr_auto_examples_model_selection_plot_permutation_test_for_classification.py`
+    * :ref:`sphx_glr_auto_examples_feature_selection_plot_permutation_test_for_classification.py`
 
 .. topic:: References:
 

From 10f8286a1087f98c21fbd4bd2def256c06f0c88f Mon Sep 17 00:00:00 2001
From: Lucy Liu <jliu176@gmail.com>
Date: Tue, 4 Aug 2020 20:27:13 +0200
Subject: [PATCH 4/5] suggestion

---
 doc/modules/cross_validation.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/doc/modules/cross_validation.rst b/doc/modules/cross_validation.rst
index 4b60d38650484..c35a072ae0f88 100644
--- a/doc/modules/cross_validation.rst
+++ b/doc/modules/cross_validation.rst
@@ -892,7 +892,10 @@ on whether the classifier has found a real class structure and can help in
 evaluating the performance of the classifier.
 
 Finally, it is important to note that this test has been shown to produce low
-p-values even if there is only weak structure in the data
+p-values even if there is only weak structure in the data because in the
+corresponding permutated datasets there is absolutely no structure. This
+test is therefore only able to show when the model reliably outperforms
+random guessing.
 
 .. topic:: Examples
 

From e314e4f09331322a8b0a70f572c8520bccd204fa Mon Sep 17 00:00:00 2001
From: Lucy Liu <jliu176@gmail.com>
Date: Thu, 6 Aug 2020 19:09:37 +0200
Subject: [PATCH 5/5] suggestions

---
 doc/modules/cross_validation.rst | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)
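
To illustrate the ``(n_permutations + 1) * n_cv`` fit count documented in this
patch, here is a hypothetical brute-force re-implementation.
``naive_permutation_test`` is an illustrative name, not a scikit-learn
function, and this is a sketch rather than the library's actual code. It
assumes the estimator's default ``score`` method is the metric and that ``cv``
is a splitter exposing ``split``. It mirrors the p-value formula from the
function's docstring, ``(C + 1) / (n_permutations + 1)``, where ``C`` counts
the permutation scores greater than or equal to the original score:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def naive_permutation_test(estimator, X, y, cv, n_permutations=100, seed=0):
    """Hypothetical brute-force permutation test (illustration only)."""
    rng = np.random.RandomState(seed)
    n_fits = 0

    def mean_cv_score(labels):
        # Fit a fresh clone on each training split and average the test scores.
        nonlocal n_fits
        scores = []
        for train, test in cv.split(X, labels):
            model = clone(estimator).fit(X[train], labels[train])
            n_fits += 1
            scores.append(model.score(X[test], labels[test]))
        return np.mean(scores)

    # One CV run on the original labels...
    score = mean_cv_score(y)
    # ...plus one CV run per randomly shuffled copy of the labels.
    perm_scores = np.array(
        [mean_cv_score(rng.permutation(y)) for _ in range(n_permutations)]
    )
    # C = number of permutations scoring at least as well as the original.
    c = np.sum(perm_scores >= score)
    pvalue = (c + 1) / (n_permutations + 1)
    return score, perm_scores, pvalue, n_fits


X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5)
score, perm_scores, pvalue, n_fits = naive_permutation_test(
    LogisticRegression(max_iter=1000), X, y, cv, n_permutations=20
)
print(n_fits)  # (20 + 1) * 5 = 105
```

Even at this toy size the sketch performs 105 fits, which is why the
documentation limits the function to datasets where fitting a single model is
very fast.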

diff --git a/doc/modules/cross_validation.rst b/doc/modules/cross_validation.rst
index c35a072ae0f88..e0fc1a61ac2bc 100644
--- a/doc/modules/cross_validation.rst
+++ b/doc/modules/cross_validation.rst
@@ -866,13 +866,16 @@ Permutation test score
 to evaluate the performance of classifiers. It provides a permutation-based
 p-value, which represents how likely an observed performance of the
 classifier would be obtained by chance. The null hypothesis in this test is
-that the features and labels are independent.
+that the classifier fails to leverage any statistical dependency between the
+features and the labels to make correct predictions on left-out data.
 :func:`~sklearn.model_selection.permutation_test_score` generates a null
 distribution by calculating `n_permutations` different permutations of the
 data. In each permutation the labels are randomly shuffled, thereby removing
-any dependency between the features (data) and the labels. The p-value output
-is the fraction of permutations for which the score obtained is better
-that the score obtained using the original data.
+any dependency between the features and the labels. The p-value output
+is the fraction of permutations for which the average cross-validation score
+obtained by the model is at least as good as the cross-validation score
+obtained using the original data. For reliable results, ``n_permutations``
+should typically be larger than 100 and ``cv`` should use 3 to 10 folds.
 
 A low p-value provides evidence that the dataset contains real dependency
 between features and labels and the classifier was able to utilize this
@@ -886,17 +889,24 @@ p-value.
 Cross-validation provides information about how well a classifier generalizes,
 specifically the range of expected errors of the classifier. However, a
 classifier trained on a high dimensional dataset with no structure may still
-perform well on cross-validation.
+perform better than expected on cross-validation, just by chance.
+This typically happens with small datasets of fewer than a few hundred
+samples.
 :func:`~sklearn.model_selection.permutation_test_score` provides information
 on whether the classifier has found a real class structure and can help in
 evaluating the performance of the classifier.
 
-Finally, it is important to note that this test has been shown to produce low
+It is important to note that this test has been shown to produce low
 p-values even if there is only weak structure in the data because in the
 corresponding permutated datasets there is absolutely no structure. This
 test is therefore only able to show when the model reliably outperforms
 random guessing.
 
+Finally, :func:`~sklearn.model_selection.permutation_test_score` is computed
+by brute force and internally fits ``(n_permutations + 1) * n_cv`` models,
+where ``n_cv`` is the number of CV splits. It is therefore only tractable
+with small datasets for which fitting an individual model is very fast.
+
 .. topic:: Examples
 
     * :ref:`sphx_glr_auto_examples_feature_selection_plot_permutation_test_for_classification.py`