-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Raise an error when min_samples_split=1 in trees #25744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Raise an error when min_samples_split=1 in trees #25744
Conversation
@@ -364,7 +364,7 @@ class Interval(_Constraint): | |||
|
|||
Parameters | |||
---------- | |||
type : {numbers.Integral, numbers.Real} | |||
type : {numbers.Integral, numbers.Real, "real_not_int"} | |||
The set of numbers in which to set the interval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be good to have a description of these internally in the docstring? Just for community-devs that aren't familiar w/ what each of these are intended to mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a description of the new option.
A short description of all the constraints and what they represent can be found here https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_param_validation.py#L28
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I was just mentioning mainly the real_not_int
option, not the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes make sense to me and I'm glad this error is fixed since it actually came up for me before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, introducing "real_not_int" to Interval
is cleaner than adding another keyword parameter.
"closed must be either 'left', 'right', 'both' or 'neither'. " | ||
f"Got {self.closed} instead." | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the removal of @validate_params
, does self.left
and self.right
needs to be validated here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I added the missing validation
@jeremiedbb do you think that we should introduce this validation wherever we should in this PR or let it for another one? |
@glemaitre I already answered in the PR description :)
|
I should read the description of PRs :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #25481
Fixes a regression introduced by the new param validations. This PR introduces a new type for the Interval constraint: "real_not_int". It's useful when parameters have a behavior depending on whether the param is an int or a float.
@thomasjpfan I didn't follow exactly your suggestion from #25481 (comment) because there can only be 1 situation where invalid_type is meaningful, i.e real but not int (int but not real is already excluded by definition). I felt that making it a third type option was more natural. Let me know what you think.
I think we should use this type wherever a parameter has 2 Interval constraints, one for ints and one for reals. It would make constraints more disjoints and would prevent more situations like this one. I plan to do that in a separate PR though.