Skip to content

ENH Allow sample_weight and other fit_params in RFE #20380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Jul 27, 2021

Conversation

fbidu
Copy link
Contributor

@fbidu fbidu commented Jun 26, 2021

Reference Issues/PRs

This PR takes over the work from #7333 and fixes #7308
closes #7333

What does this implement/fix? Explain your changes.

This PR just continues the work from the original PR by implementing the changes on the current main branch

The original description reads:

Adds support for passing sample weights into RFE.fit() and having them used by the estimator's fit method.

Adds a test for this with the iris dataset, where sample weights are used to give one class double its normal weight and compare that to doubling the samples in that class. The test passes if both these approaches produce the same feature ranking. This feature ranking is different from the one which arises when all samples have the same weight.

c/c: @ijpulidos

@glemaitre
Copy link
Member

The current failure is due to the fact that you need an entry in the what's new.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@fbidu
Copy link
Contributor Author

fbidu commented Jun 26, 2021

Thanks, @glemaitre! I'll do that

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@fbidu fbidu requested a review from glemaitre June 27, 2021 21:52
@reshamas
Copy link
Member

reshamas commented Jul 7, 2021

#DataUmbrella LATAM sprint

@adrinjalali
Copy link
Member

Alternatively, you could do what you see in this test: https://github.com/scikit-learn/scikit-learn/pull/20491/files#diff-54f1f33fb83821d84c45b02c34b247e231052da9380c79c6487a33a4d147ee72R499

@glemaitre I think the test I point to, runs faster and does the check as we want. WDYT?

Also, could we also add the same support for score?

@glemaitre
Copy link
Member

@adrinjalali Yes I agree. The test that you propose is more readable because it directly tests the metadata instead of using the fit to implicitly check that everything is fine. RFE is only passing the metadata around and not consuming them.

@glemaitre
Copy link
Member

@fbidu Could you implement what @adrinjalali proposed?

@glemaitre glemaitre changed the title Allow sample weights and other fit_params for RFE ENH Allow sample_weight and other fit_params in RFE Jul 22, 2021
@glemaitre glemaitre removed their request for review July 22, 2021 08:10
@reshamas
Copy link
Member

@fbidu Could you implement what @adrinjalali proposed?

@glemaitre Just sent a reminder email to both @fbidu and @ijpulidos, in case GitHub notifications get buried.

@fbidu
Copy link
Contributor Author

fbidu commented Jul 24, 2021

@fbidu Could you implement what @adrinjalali proposed?

@glemaitre sure, I'll give it a try

@fbidu fbidu requested a review from glemaitre July 24, 2021 14:32
@fbidu
Copy link
Contributor Author

fbidu commented Jul 24, 2021

So, I've made a 'merge' of sorts with @adrinjalali's mods in the other PR and added him as co-author as well. Will solve those conflicts now

@fbidu
Copy link
Contributor Author

fbidu commented Jul 24, 2021

Oh, I also removed the former test completely because I understood they are redundant to each other and @adrinjalali's is better. Let me know if I should keep them both

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

fbidu and others added 6 commits July 25, 2021 17:04
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the two minor changes, LGTM :)

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@glemaitre glemaitre self-requested a review July 27, 2021 16:20
@glemaitre glemaitre merged commit eb901df into scikit-learn:main Jul 27, 2021
@glemaitre
Copy link
Member

Thanks @fbidu

TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Jul 29, 2021
…#20380)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…#20380)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE/RFECV doesn't work with sample weights
4 participants