Preserving dtype for numpy.float32 in Least Angle Regression #20155

takoika · 2021-05-28T01:41:38Z

Reference Issues/PRs

This PR is part of #11000.

What does this implement/fix? Explain your changes.

This PR makes the fitted coefficients of Least Angle Regression numpy.float32 when the input data is numpy.float32.

Currently, when you pass numpy.float32 data to Least Angle Regression, the fitted coefficients come out as numpy.float64, which is inconsistent with the input dtype.
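A minimal NumPy-only sketch of how this kind of upcast can arise, assuming the solver allocates its coefficient buffer without an explicit dtype (the commit "Set empty array's dtype from input array" suggests this was at least part of the cause). The function names are hypothetical and this is not scikit-learn code.

```python
import numpy as np


def alloc_coef_default(X):
    # Hypothetical: buffer allocated without a dtype, so NumPy picks float64.
    return np.zeros(X.shape[1])


def alloc_coef_matching(X):
    # Hypothetical: buffer allocated with the dtype of the input data.
    return np.zeros(X.shape[1], dtype=X.dtype)


X32 = np.ones((5, 3), dtype=np.float32)

coef_default = alloc_coef_default(X32)
coef_matching = alloc_coef_matching(X32)

# Writing float32 values into a float64 buffer keeps the buffer's dtype.
coef_default[:] = X32.sum(axis=0)
coef_matching[:] = X32.sum(axis=0)

print(coef_default.dtype)   # float64
print(coef_matching.dtype)  # float32
```

In this sketch the only difference between the two outcomes is whether the allocation carries the input dtype along.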

Any other comments?

Training data X and target y can both be passed to Least Angle Regression. The test cases in this PR only cover X and y both being numpy.float32 or both being numpy.float64, because it is unclear which dtype is appropriate when X's dtype differs from y's. This is the same approach as #13243.

Further test cases may be required because this PR does not cover argument variations.

I used #13303 and #13243 as references to make this.

cmarmo

Thanks @takoika for your pull request. A first comment about tests.

sklearn/linear_model/tests/test_least_angle.py

ogrisel

This LGTM. Just a small suggestion for an alternative strategy to set return_dtype.

Also I wonder if we should document this change in the changelog or not. I think we should, despite that it seems like a detail for most users but no strong opinion.

ogrisel · 2021-05-28T10:01:45Z

sklearn/linear_model/_least_angle.py

+    for input_array in (X, y, Xy, Gram):
+        if input_array is not None:
+            return_dtype = input_array.dtype
+            break


We can pass training data x and target y to Least Angle Regressor. Test cases in this PR are that both are numpy.float32 or numpy.float64, because it is unclear which type is appropriate when x's type is different from y's type. This is the same with #13243.

Maybe the following would feel less arbitrary?

    dtypes = set(a.dtype for a in (X, y, Xy, Gram) if a is not None)
    if len(dtypes) == 1:
        # use the precision level of input data if it is consistent
        return_dtype = next(iter(dtypes))
    else:
        # fallback to double precision otherwise
        return_dtype = np.float64

I am not so sure. Maybe giving the priority to X and Gram over y and Xy would make more sense.

In which case, I think the tests should be extended to cover the case where X.dtype is float32 but y.dtype is float64.
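The two strategies discussed in this comment can be sketched side by side. This is only an illustration: the helper names `dtype_if_consistent` and `dtype_with_priority` are hypothetical, and the float64 fallback when every input is None is an assumption, not something taken from the PR.

```python
import numpy as np


def dtype_if_consistent(*arrays):
    """Hypothetical helper: shared dtype if all inputs agree, else float64."""
    dtypes = {a.dtype for a in arrays if a is not None}
    if len(dtypes) == 1:
        return next(iter(dtypes))
    # Mixed (or no) inputs: fall back to double precision.
    return np.dtype(np.float64)


def dtype_with_priority(X=None, y=None, Xy=None, Gram=None):
    """Hypothetical helper: X and Gram take priority over y and Xy."""
    for a in (X, Gram, y, Xy):
        if a is not None:
            return a.dtype
    # Assumption: default to double precision when nothing is provided.
    return np.dtype(np.float64)


X32 = np.zeros((4, 2), dtype=np.float32)
y32 = np.zeros(4, dtype=np.float32)
y64 = np.zeros(4, dtype=np.float64)

print(dtype_if_consistent(X32, y32, None, None))  # float32
print(dtype_if_consistent(X32, y64, None, None))  # float64
print(dtype_with_priority(X=X32, y=y64))          # float32
```

The two helpers only disagree on mixed-precision input, which is exactly the X float32 / y float64 case raised here.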

I don't think this case could happen because when the data is validated, X and y dtypes are matched

The suggestion is better. I have incorporated the suggestion. Thanks

Regarding the case where fit is called with X.dtype == np.float32 and y.dtype == np.float64: the trained coef_.dtype is np.float32. This may not be desirable. The mismatched case could be addressed in a later PR.

I guess it's the expected outcome. In all estimators where both dtypes are supported, the dtype of y is always matched to the dtype of X.
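A rough sketch of what "the dtype of y is matched to the dtype of X" means in practice. The function `match_y_to_X` is a hypothetical stand-in for the validation step, written with plain NumPy; it is not the actual scikit-learn validation code.

```python
import numpy as np


def match_y_to_X(X, y):
    # Hypothetical stand-in for the validation step: cast y to X's dtype.
    X = np.asarray(X)
    return X, np.asarray(y, dtype=X.dtype)


X = np.ones((4, 2), dtype=np.float32)
y = np.arange(4, dtype=np.float64)

X_checked, y_checked = match_y_to_X(X, y)
print(X_checked.dtype)  # float32
print(y_checked.dtype)  # float32
```

Under this assumption, a float32 X with a float64 y reaches the solver as two float32 arrays, so a float32 coef_ is the consistent result.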

sklearn/linear_model/tests/test_least_angle.py

thomasjpfan · 2021-05-28T13:11:42Z

Also I wonder if we should document this change in the changelog or not. I think we should, despite that it seems like a detail for most users but no strong opinion.

I think this PR needs a change log because it does change the behavior of least angle regression.

@takoika Please add an entry to the change log at doc/whats_new/v1.0.rst with tag |Enhancement|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

doc/whats_new/v1.0.rst

jeremiedbb

LGTM, thanks @takoika !

takoika added 8 commits May 27, 2021 15:38

Add test case to confirm input/output type match

940b5b9

Add test case

d7a53f8

Set empty array's dtype from input array

2a4de87

Add test cases

23885aa

Address E302

d13de62

Test numerical consistency betwenn float32 and float64

ac59801

Use has_coef_path for parametrize test for explicitly represent the m…

ccd8841

…odel'coef_path should be tested or nor

Add model to be tested and add args to avoid warnings

71ee583

github-actions bot added the module:linear_model label May 28, 2021

cmarmo added the No Changelog Needed label May 28, 2021

cmarmo reviewed May 28, 2021

sklearn/linear_model/tests/test_least_angle.py (outdated, resolved)

takoika added 2 commits May 28, 2021 17:24

Use asser_allclose and set absolute torellance

b897381

Relax torellance in order to pass test

f7e1bbd

ogrisel approved these changes May 28, 2021

jeremiedbb reviewed May 28, 2021

sklearn/linear_model/tests/test_least_angle.py (outdated, resolved)

sklearn/linear_model/tests/test_least_angle.py (outdated, resolved)

sklearn/linear_model/tests/test_least_angle.py (resolved)

thomasjpfan removed the No Changelog Needed label May 28, 2021

takoika added 6 commits May 29, 2021 10:49

Add comments of test aim

e02dfa4

Remove an unnecessary variable

ebb7c44

Use consistent dtype as returned type among X, y, Xy and Gram

b25243c

Add change log

d50bf47

Use one parameter

3d088f1

Remove unnecessary print

f150fd2

jeremiedbb reviewed May 31, 2021

doc/whats_new/v1.0.rst (outdated, resolved)

jeremiedbb approved these changes May 31, 2021

jeremiedbb and others added 4 commits May 31, 2021 11:49

Update doc/whats_new/v1.0.rst

c1672a9

Merge branch 'master' into issue11000_least_angle

15103bd

Merge branch 'master' into issue11000_least_angle

3fa8f61

trigger ci

e0bdf88

jeremiedbb merged commit c8753d4 into scikit-learn:main May 31, 2021

takoika mentioned this pull request Dec 17, 2021

ENH Preserving dtype for np.float32 in *DictionaryLearning, SparseCoder and orthogonal_mp_gram #22002

Merged

This was referenced Jan 1, 2022

ENH Preserving dtype for np.float32 in SparsePCA and MiniBatchSparsePCA #22111

Merged

ENH Preserving dtype for np.float32 in LatentDirichletAllocation #22113

Closed

eddiebergman mentioned this pull request Nov 15, 2022

Update scikit learn 1.2 automl/auto-sklearn#1611

Closed

54 tasks
