Skip to content

Poisson criterion in RandomForestRegressor #19304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks
lorentzenchr opened this issue Jan 30, 2021 · 18 comments · Fixed by #19836
Closed
3 tasks

Poisson criterion in RandomForestRegressor #19304

lorentzenchr opened this issue Jan 30, 2021 · 18 comments · Fixed by #19836
Assignees
Labels
Easy Well-defined and straightforward way to resolve module:ensemble New Feature
Milestone

Comments

@lorentzenchr
Copy link
Member

lorentzenchr commented Jan 30, 2021

Describe the workflow you want to enable

I want to officially use the Poisson splitting criterion in RandomForestRegressor.

Describe your proposed solution

#17386 implemented the poisson splitting criterion for DecisionTreeRegressor and ExtraTreeRegressor. This also enabled—somewhat silently—to do:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
y = [0, 1, 2]
X = np.arange(6).reshape(3, 2)
rf = RandomForestRegressor(criterion="poisson")
rf.fit(X, y)

Note: The same is true for ensemble.ExtraTreesRegressor.

Tasks:

  • Add the poisson splitting criterion to the docstring of RandomForestRegressor.
  • Add input validation (non-negative y) to RandomForestRegressor.
  • Expand the tests for RandomForestRegressor.
@lorentzenchr lorentzenchr added New Feature Easy Well-defined and straightforward way to resolve module:ensemble labels Jan 30, 2021
@pk1130
Copy link

pk1130 commented Jan 30, 2021

take

@NicolasHug
Copy link
Member

Add input validation (non-negative y) to RandomForestRegressor.

Isn't that taken care of by the DecisionTreeRegressor instances?

@pk1130
Copy link

pk1130 commented Feb 1, 2021

Hey guys! I'm new to open source contributions, and I'd love to contribute to this issue (the reason I took this up as soon as it was posted :)). I'm an avid machine learning enthusiast and scikit-learn user! Any tips for a newcomer like me? Since I'm not fully familiar with the entire codebase, it's taking me a little while to navigate my way around here.

@lorentzenchr
Copy link
Member Author

Add input validation (non-negative y) to RandomForestRegressor.

Isn't that taken care of by the DecisionTreeRegressor instances?

import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1, 2, 3]]).T
y = [-1, 0, 1]
rf = RandomForestRegressor(criterion="poisson", random_state=4)
rf.fit(X, y)
rf.predict(X)

gives array([-0.07666667, -0.07666667, -0.07666667]) but should raise ValueError.

@lorentzenchr
Copy link
Member Author

lorentzenchr commented Feb 1, 2021

@pk1130 Welcome. I hope your motivation keeps on going.
You'll find general advice in the developer's guide for contributing.
The class RandomForestRegressor lives in sklearn/ensemble/_forest.py. Input validation should happen in def fit(..) which is inherited from class BaseForest. So you need to adapt the code there.
I would start with adding the poisson criterion to the docstring of RandomForestRegressor.
Let me know if you need more guidance.

@pk1130
Copy link

pk1130 commented Feb 1, 2021

@lorentzenchr Thanks for the welcome and the quick direction to parts of the codebase. I was reading through class RandomForestRegressor just right now. I'll start off with the docstring as you suggested.

@lorentzenchr
Copy link
Member Author

@pk1130 Could you open a PR or can I help you somehow?
I'm asking because there is a sprint this Saturday and this issue might be a good candidate for it. I do not intend to take this issue away from you, just informing.

@pk1130
Copy link

pk1130 commented Feb 6, 2021

@lorentzenchr Sorry for the delayed response. Last couple days have been crazy with me injuring my ankle ligament. Couldn't get to this sooner. I've updated the docstring as you'd suggested and I'm figuring out how to add input validation. I'm guessing it's fairly trivial but since I'm relatively new to the codebase, just figuring it out. Haven't looked at the test suite yet to expand on it. Let me know if there's some initial work I can get done before today's sprint meeting.

@lorentzenchr
Copy link
Member Author

@pk1130 Easy, there is no rush and good recovery for you foot. If you open a PR, it is easier to help out. In order to open one, you push your branch to your forked repository on github, then you can add a pull request to scikit-learn from github (on your fork).

@anashas
Copy link
Contributor

anashas commented Feb 11, 2021

hi @lorentzenchr can I start working on this?

@lorentzenchr
Copy link
Member Author

@designer1234 This issue is currently taken by @pk1130 as you can see under Assignees. So you can ask him and offer our help.

@pk1130
Copy link

pk1130 commented Feb 15, 2021

Hey @lorentzenchr! Apologies for the delay in getting around to this. Opened a pull request like you'd suggested. Done everything other than expanding the test suite. Let me know what you think and whether you'd be able to review. Thanks!

@bsun94
Copy link
Contributor

bsun94 commented Apr 4, 2021

hi @lorentzenchr @pk1130 ! I'm new to the scikit-learn repo, and would love to help contribute. I was wondering if this ticket was open for additional help? If so, what could I do to help?

@lorentzenchr
Copy link
Member Author

@bsun94 Yes, help is welcome. You can either continue #19464 and incorporate the review comments or start fresh. The issue description at the top lists 3 tasks.

@bsun94
Copy link
Contributor

bsun94 commented Apr 6, 2021

thank you @lorentzenchr ! I might just need a bit of time to get more familiar with the code base, but will be putting in some time this & next week.

@lorentzenchr lorentzenchr assigned bsun94 and unassigned pk1130 Apr 6, 2021
@lorentzenchr
Copy link
Member Author

@bsun94 Great. It is usually advantageous to open/have a PR early such that we can help you there.

@bsun94
Copy link
Contributor

bsun94 commented Apr 9, 2021

Hi there! I'd put in a PR inclusive of a test - please let me know if it looks good!

bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 10, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 10, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 10, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 11, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 11, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 11, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 11, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 11, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 17, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 18, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue Apr 24, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue May 8, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue May 27, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue May 27, 2021
bsun94 added a commit to bsun94/scikit-learn that referenced this issue May 31, 2021
@lorentzenchr lorentzenchr linked a pull request Jun 6, 2021 that will close this issue
3 tasks
@lorentzenchr
Copy link
Member Author

Closed in #19836.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Easy Well-defined and straightforward way to resolve module:ensemble New Feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants