-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Added user guide documentation for permutation_test_score #14757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@aditi9783 should we close this PR in favor of #14769? |
@NicolasHug Sure. |
…ction "Obtaining predictions by cross-validation" and modified function help text to follow PEP8.
Actually re-opening so that @aditi9783 can update (will probably close #14769 then ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aditi9783 and @vmanisha. I merged with master for the docs to compile (not sure what went wrong), and also made some very minor modifications.
Looks good!
It also returns cross validation scores for each permutation of y labels. It | ||
permutes the labels of the samples and computes the p-value against the null | ||
hypothesis that the features and the labels are independent, meaning that there | ||
is no difference between the classes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"classes" implies classification, which we aren't necessarily talking about here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function permutes class labels. Isn't classification implied in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it peemutes target values for each sample, not class labels
|
||
It also returns cross validation scores for each permutation of y labels. It | ||
permutes the labels of the samples and computes the p-value against the null | ||
hypothesis that the features and the labels are independent, meaning that there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "features" be "predictions"? Or is this right, if we add "conditioned on the estimator"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Features are the input vectors (X), and thus not predictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so you need to add "given the estimator"
approximates the probability that the average cross-validation score would be | ||
obtained by chance if the target is independent of the data. | ||
|
||
It also returns cross validation scores for each permutation of y labels. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking about return value here confuses the matter if what we want to talk about is how the P value is constructed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation is for user guide. The function description and the paper cited give more details about how the p-value is constructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordinarily our user guide is less focused on API things like what the function returns. Conversely, a description of how the algorithm works belongs in the user guide not in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, we're trying to get all of the PR's from the WiMLDS sprint wrapped up, are we waiting on @aditi9783, or approval from another reviewer? Thanks!
Not completely sure what's going on with this one @reshamas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kellycarmody We are waiting on @aditi9783
@aditi9783 We use the user guide to explain the math with the details necessary to explain the function. The function's docstring is usually brief and links to the user guide. In this case, moving most of the docstring into the user guide would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thomasjpfan, @reshamas asked me to take over this PR, but I see that Nicolas Hug already approved the changes, and it is just waiting for one more reviewer.
Is the PR good? Or should I move most of the docstring into the user guide?
This PR is completed from our (me and my sprint partner, Manisha Verma)
side and is ready for merge. I don't know if we need approval from a core
contributor or if there is anything else we need to do.
Thanks,
Aditi
…On Wed, Sep 11, 2019, 8:48 AM kellycarmody ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In doc/modules/cross_validation.rst
<#14757 (comment)>
:
> @@ -300,6 +300,23 @@ section.
* :ref:`sphx_glr_auto_examples_model_selection_plot_cv_predict.py`,
* :ref:`sphx_glr_auto_examples_model_selection_plot_nested_cross_validation_iris.py`.
+
+.. _cv_significance_evaluation:
+
+Cross-validation significance evaluation
+----------------------------------------
+
+Significance of cross validation scores can be evaluated using the
+:func:`permutation_test_score` function. The function returns a p-value, which
+approximates the probability that the average cross-validation score would be
+obtained by chance if the target is independent of the data.
+
+It also returns cross validation scores for each permutation of y labels. It
Hi, we're trying to get all of the PR's from the WiMLDS sprint wrapped up,
are we waiting on
@aditi9783 <https://github.com/aditi9783>, or approval from another
reviewer? Thanks
Not completely sure what's going on with this one @reshamas
<https://github.com/reshamas>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#14757?email_source=notifications&email_token=AALMXJGABNVXXGXNP6WDO4DQJBPPJA5CNFSM4IPGTBXKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCEKC74A#discussion_r323043617>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AALMXJH5X6PT64OJFQK7V7TQJBPPJANCNFSM4IPGTBXA>
.
|
Hi @aditi9783, Thomas Fan mentioned that they are still waiting for you to close this out. Would you be able to work on this sometime soon so we can start closing the PR's from the sprint? Thanks! |
@amueller @jnothman @thomasjpfan Is this typical to include a reference link in documentation that may require payment? I was attempting to access the paper to better understand the comments on this PR. cc: @cmarmo (this PR is from NYC wimlds Aug 2019 sprint) |
In this specific case, it is better to cite the following: http://www.jmlr.org/papers/volume11/ojala10a/ojala10a.pdf which is an open-access JMLR paper. It should the extended version of the IEEE version which was a conference paper.
If we really want to acknowledge the original method of an author, it might be the only way sometimes. |
I propose adding this PR to the upcoming Berlin beginner sprint list of issues. cc: @adrinjalali @cmarmo |
@reshamas do you have a repo as you usually do for the sprints to add these to a project on that repo? |
@adrinjalali |
also added @cmarmo to repo: |
Hey I just started working on this citation problem @adrinjalali @reshamas @aditi9783 |
@cmarmo this PR is one of the last ones from the NYC Aug 2019 sprint. Does it need a reviewer? |
Hi @reshamas , if I understand correctly, @lucyleeow has taken over in #17373. |
doesn't that mean this is superseded and not stalled @cmarmo ? |
Closed by #17373. |
Reference Issues/PRs
Fixes #10905
Fixes #10931. Worked on the branch https://github.com/maskani-moh/scikit-learn/tree/fix-10905.
Closes #14769
What does this implement/fix? Explain your changes.
Added user guide documentation for permutation_test_score.
Any other comments?
Worked with @vmanisha at the NYC scikit-learn sprint 2019.