-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX avoid parameter validation in init in TweedieRegressor #22124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX avoid parameter validation in init in TweedieRegressor #22124
Conversation
I find it odd that family is both an attribute and also a property. I'm not sure why we're making it immutable by having those properties really. If the user sets an invalid value, it should raise in I'm happy with the change, but as you say, it's backward incompatible. I wonder what @jnothman thinks about it. This PR also needs a changelog entry. |
I share the same feeling. Maybe @lorentzenchr can comment on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @adrinjalali and to be more specific, I think the following should work although I am not 100% sure.
family_instance : TweedieDistribution | ||
Returns an instance of TweedieDistribution. | ||
""" | ||
return TweedieDistribution(power=self.power) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would be in favor of removing the TweedieRegressor.family
property getter and setter and instead validate the consistency of self.family
in _get_family_instance
which should be only called at fit
time instead of __init__
or set_params
time:
return TweedieDistribution(power=self.power) | |
if isinstance(self.family, TweedieDistribution): | |
# ignore self.power if the user passed a distribution instance directly | |
# for backward compat | |
return self.family | |
elif self.family == "tweedie": | |
return TweedieDistribution(power=self.power) | |
else: | |
raise ValueError( | |
"TweedieRegressor.family must always be 'tweedie', " | |
f"got: {self.family!r}" | |
) |
disclaimer: untested, there might be typos.
The issue was that est = PoissonRegressor()
est.family = "gamma" and get an estimator that is not actually a PoissonRegressor. We could enforce this in fit, but then it means that TweedieRegressor.fit needs to be aware of its subclasses PoissonRegressor which is a bit contrary to how class inheritance works. Generally, +1 to have a method I was really convinced we already had this common test for not validating in init when this PR was made. And it's the first time I hear about validation in set_params being inconsistent with the API though it makes sense in retrospective. |
I think we do have other estimators which can be tampered with to put them in an odd state. I'd say we don't have to worry about users doing odd things like the one you mention. If we do, we can add a So it seems to me that we have a consensus to remove the |
I'm OK with it, maybe let's also wait for @lorentzenchr's feedback. |
I'm +0, so no objection. Is it possible to rename the family parameter in the subclasses, e.g. to |
I'm not sure if it'd make sense to have a private property to check the validity of the value. If it's private, then people won't be changing it, I guess? @marenwestermann I think you can go ahead with the changes now. |
I opened #22548 for very different reasons. But the refactoring there also solves the issue of this PR. |
Reference Issues/PRs
Towards #21406
What does this implement/fix? Explain your changes.
I removed the parameter validation of the attribute
family
inTweedieRegressor.__init__
and changed the tests accordingly.Any other comments?
The test failure described in #21406 was due to the attribute
family
being set toTweedieDistribution
in init (family=TweedieDistribution(power=power)
).In the module
sklearn/linear_model/_glm/glm.py
thePoissonRegressor
,GammaRegressor
andTweedieRegressor
have the classGeneralizedLinearRegressor
as a shared parent class. In the first two estimators thefamily
parameter is set as a string. I therefore changed the family inTweedieRegressor
to be a string as well, however my implementation is not backward compatible.I introduced the method
_get_family_instance
inGeneralizedLinearRegressor
andTweedieRegressor
in order to get the correct distribution for the regressors and to set thepower
correctly inTweedieRegressor
.ping @adrinjalali @rth