Skip to content

Added user guide documentation for permutation_test_score #14757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

aditi9783
Copy link

@aditi9783 aditi9783 commented Aug 24, 2019

Reference Issues/PRs

Fixes #10905
Fixes #10931. Worked on the branch https://github.com/maskani-moh/scikit-learn/tree/fix-10905.

Closes #14769

What does this implement/fix? Explain your changes.

Added user guide documentation for permutation_test_score.

Any other comments?

Worked with @vmanisha at the NYC scikit-learn sprint 2019.

@NicolasHug
Copy link
Member

@aditi9783 should we close this PR in favor of #14769?

@aditi9783
Copy link
Author

@NicolasHug Sure.

@NicolasHug NicolasHug closed this Aug 24, 2019
…ction "Obtaining predictions by cross-validation" and modified function help text to follow PEP8.
@NicolasHug NicolasHug reopened this Aug 24, 2019
@NicolasHug
Copy link
Member

Actually re-opening so that @aditi9783 can update (will probably close #14769 then )

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aditi9783 and @vmanisha. I merged with master for the docs to compile (not sure what went wrong), and also made some very minor modifications.

Looks good!

@NicolasHug
Copy link
Member

@rth @amueller @thomasjpfan ;)

@rth rth changed the title Added user guide documentation for permutation_test_score #10905 Added user guide documentation for permutation_test_score Aug 26, 2019
It also returns cross validation scores for each permutation of y labels. It
permutes the labels of the samples and computes the p-value against the null
hypothesis that the features and the labels are independent, meaning that there
is no difference between the classes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"classes" implies classification, which we aren't necessarily talking about here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function permutes class labels. Isn't classification implied in that case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it peemutes target values for each sample, not class labels


It also returns cross validation scores for each permutation of y labels. It
permutes the labels of the samples and computes the p-value against the null
hypothesis that the features and the labels are independent, meaning that there
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "features" be "predictions"? Or is this right, if we add "conditioned on the estimator"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Features are the input vectors (X), and thus not predictions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so you need to add "given the estimator"

approximates the probability that the average cross-validation score would be
obtained by chance if the target is independent of the data.

It also returns cross validation scores for each permutation of y labels. It
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking about return value here confuses the matter if what we want to talk about is how the P value is constructed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation is for user guide. The function description and the paper cited give more details about how the p-value is constructed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordinarily our user guide is less focused on API things like what the function returns. Conversely, a description of how the algorithm works belongs in the user guide not in the docstring.

Copy link
Contributor

@kellycarmody kellycarmody Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, we're trying to get all of the PR's from the WiMLDS sprint wrapped up, are we waiting on @aditi9783, or approval from another reviewer? Thanks!

Not completely sure what's going on with this one @reshamas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kellycarmody We are waiting on @aditi9783

@aditi9783 We use the user guide to explain the math with the details necessary to explain the function. The function's docstring is usually brief and links to the user guide. In this case, moving most of the docstring into the user guide would be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thomasjpfan, @reshamas asked me to take over this PR, but I see that Nicolas Hug already approved the changes, and it is just waiting for one more reviewer.

Is the PR good? Or should I move most of the docstring into the user guide?

@aditi9783
Copy link
Author

aditi9783 commented Sep 11, 2019 via email

@reshamas
Copy link
Member

@adijo @vmanisha
I am wondering if you need to edit the title to include "[MRG]"?

cc: @kellycarmody @NicolasHug

@kellycarmody
Copy link
Contributor

Hi @aditi9783, Thomas Fan mentioned that they are still waiting for you to close this out. Would you be able to work on this sometime soon so we can start closing the PR's from the sprint? Thanks!

@glemaitre glemaitre self-requested a review September 24, 2019 13:16
@reshamas
Copy link
Member

@amueller @jnothman @thomasjpfan
This article is locked: https://ieeexplore.ieee.org/document/5360332
It looks like it is for corporate members, IEEE members or by paid subscription.

Is this typical to include a reference link in documentation that may require payment? I was attempting to access the paper to better understand the comments on this PR.

cc: @cmarmo (this PR is from NYC wimlds Aug 2019 sprint)

@glemaitre
Copy link
Member

In this specific case, it is better to cite the following: http://www.jmlr.org/papers/volume11/ojala10a/ojala10a.pdf which is an open-access JMLR paper.

It should the extended version of the IEEE version which was a conference paper.

Is this typical to include a reference link in documentation that may require payment?

If we really want to acknowledge the original method of an author, it might be the only way sometimes.

@reshamas
Copy link
Member

reshamas commented Jan 3, 2020

I propose adding this PR to the upcoming Berlin beginner sprint list of issues.

cc: @adrinjalali @cmarmo

@adrinjalali
Copy link
Member

@reshamas do you have a repo as you usually do for the sprints to add these to a project on that repo?

@reshamas
Copy link
Member

reshamas commented Jan 3, 2020

@adrinjalali
Yes, I just added you to the repo:
https://github.com/WiMLDS/berlin-2020-scikit-sprint

@reshamas
Copy link
Member

reshamas commented Jan 3, 2020

@stareh
Copy link
Contributor

stareh commented Jan 25, 2020

Hey I just started working on this citation problem @adrinjalali @reshamas @aditi9783
#wimlds

@reshamas
Copy link
Member

@cmarmo this PR is one of the last ones from the NYC Aug 2019 sprint. Does it need a reviewer?

@cmarmo cmarmo added the Stalled label Jun 26, 2020
@cmarmo
Copy link
Contributor

cmarmo commented Jun 26, 2020

@cmarmo this PR is one of the last ones from the NYC Aug 2019 sprint. Does it need a reviewer?

Hi @reshamas , if I understand correctly, @lucyleeow has taken over in #17373.

@adrinjalali
Copy link
Member

doesn't that mean this is superseded and not stalled @cmarmo ?

@cmarmo cmarmo added Superseded PR has been replace by a newer PR and removed Stalled labels Jun 26, 2020
@cmarmo
Copy link
Contributor

cmarmo commented Jul 31, 2020

Closed by #17373.

@cmarmo cmarmo closed this Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:model_selection Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

permutation_test_score has no user guide