Preserving dtype for numpy.float32 in Least Angle Regression #20155
Conversation
…odel's coef_path should be tested or not
Thanks @takoika for your pull request. A first comment about tests.
This LGTM. Just a small suggestion for an alternative strategy to set return_dtype.
Also, I wonder whether we should document this change in the changelog. I think we should, even though it will seem like a detail to most users, but I have no strong opinion.
sklearn/linear_model/_least_angle.py
Outdated
for input_array in (X, y, Xy, Gram):
    if input_array is not None:
        return_dtype = input_array.dtype
        break
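For illustration, here is how this first-non-None-wins loop behaves when the input dtypes disagree (a standalone sketch with made-up arrays, not code from the PR):

import numpy as np

X = np.ones((5, 2), dtype=np.float32)
y = np.ones(5, dtype=np.float64)   # deliberately mismatched with X
Xy, Gram = None, None

return_dtype = None
for input_array in (X, y, Xy, Gram):
    if input_array is not None:
        return_dtype = input_array.dtype
        break

print(return_dtype)  # float32: X is the first non-None array, so its dtype wins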
We can pass training data X and target y to the Least Angle Regressor. The test cases in this PR use numpy.float32 for both or numpy.float64 for both, because it is unclear which dtype is appropriate when X's dtype differs from y's dtype. This is the same approach as #13243.
Maybe the following would feel less arbitrary?
dtypes = set(a.dtype for a in (X, y, Xy, Gram) if a is not None)
if len(dtypes) == 1:
    # use the precision level of input data if it is consistent
    return_dtype = next(iter(dtypes))
else:
    # fallback to double precision otherwise
    return_dtype = np.float64
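For comparison, the same mismatched inputs under this consistency-check strategy fall back to double precision (again a standalone sketch, not code from the PR):

import numpy as np

X = np.ones((5, 2), dtype=np.float32)
y = np.ones(5, dtype=np.float64)   # deliberately mismatched with X
Xy, Gram = None, None

dtypes = set(a.dtype for a in (X, y, Xy, Gram) if a is not None)
if len(dtypes) == 1:
    # consistent input precision is preserved
    return_dtype = next(iter(dtypes))
else:
    # mixed precisions fall back to double precision
    return_dtype = np.float64

print(return_dtype)  # np.float64: mixed precisions trigger the fallback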
I am not so sure. Maybe giving priority to X and Gram over y and Xy would make more sense.
In which case, I think the tests should be extended to cover the case where X.dtype is float32 but y.dtype is float64.
I don't think this case can happen, because when the data is validated the dtypes of X and y are matched.
The suggestion is better. I have incorporated the suggestion. Thanks
Regarding the case where fit is called with X.dtype=np.float32 and y.dtype=np.float64, the trained coef_.dtype is np.float32. That may not be desirable; the mismatched case could be addressed in a later PR.
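A small sketch of that mismatched-dtype case, using Lars as a representative LARS estimator (the printed result reflects the behavior reported in this thread, not a guaranteed contract):

import numpy as np
from sklearn.linear_model import Lars

rng = np.random.RandomState(0)
X = rng.rand(30, 4).astype(np.float32)
y = rng.rand(30).astype(np.float64)  # dtype deliberately different from X

model = Lars().fit(X, y)
print(model.coef_.dtype)  # float32: y is cast to X's dtype during validation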
I guess it's the expected outcome. In all estimators where both dtypes are supported, the dtype of y is always matched to the dtype of X.
I think this PR needs a changelog entry because it does change the behavior of least angle regression. @takoika Please add an entry to the change log.
LGTM, thanks @takoika!
Reference Issues/PRs
This PR is part of #11000.
What does this implement/fix? Explain your changes.
This PR makes the trained coefficients of Least Angle Regression numpy.float32 when the input data is numpy.float32.
Previously, when you passed numpy.float32 data to Least Angle Regression, the trained coefficients came out as numpy.float64, which is inconsistent.
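A minimal usage sketch of the behavior described above, with Lars standing in for the LARS estimators touched by this PR:

import numpy as np
from sklearn.linear_model import Lars

rng = np.random.RandomState(0)
X = rng.rand(30, 4).astype(np.float32)
y = rng.rand(30).astype(np.float32)

coef = Lars().fit(X, y).coef_
print(coef.dtype)  # float64 before this change, float32 with it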
Any other comments?
We can pass training data X and target y to the Least Angle Regressor. The test cases in this PR use numpy.float32 for both or numpy.float64 for both, because it is unclear which dtype is appropriate when X's dtype differs from y's dtype. This is the same approach as #13243.
Further test cases may be required because this PR does not cover all argument variations.
I used #13303 and #13243 as references when making this change.
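For reference, additional coverage along the lines mentioned above could look roughly like the following; the estimator list, test name, and data shapes are illustrative assumptions, not the PR's actual tests:

import numpy as np
import pytest
from sklearn.linear_model import Lars, LassoLars

@pytest.mark.parametrize("Estimator", [Lars, LassoLars])
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_lars_coef_preserves_dtype(Estimator, dtype):
    rng = np.random.RandomState(0)
    X = rng.rand(40, 5).astype(dtype)
    y = rng.rand(40).astype(dtype)

    model = Estimator().fit(X, y)

    # The fitted coefficients should keep the precision of the input data.
    assert model.coef_.dtype == dtype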