
ENH Preserving dtype for np.float32 in *DictionaryLearning, SparseCoder and orthogonal_mp_gram #22002


Merged (25 commits, Dec 30, 2021)

Conversation

@takoika (Contributor) commented Dec 17, 2021

Reference Issues/PRs

This PR is part of #11000 .

What does this implement/fix? Explain your changes.

This PR makes the code and dictionary produced by Dictionary Learning numpy.float32 when the input data is numpy.float32, in order to preserve the input data type.

Any other comments?

I found two difficulties in testing numerical consistency between numpy.float32 and numpy.float64 for dictionary learning:

  • the optimal code and dictionary are not unique (see the sketch below);
  • it is difficult to match results with high precision (probably because of the chained linear algebra operations).

Within the scope of this PR that is acceptable, but it potentially makes it hard to guarantee numerical consistency for downstream methods that use dictionary learning.
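To illustrate the non-uniqueness, here is a minimal sketch with made-up matrices (not code from the PR): permuting the dictionary atoms together with the corresponding code columns, or flipping their signs, yields a different factorization with an identical reconstruction.

import numpy as np

rng = np.random.RandomState(0)
U = rng.randn(10, 3)  # code (made-up example data)
V = rng.randn(3, 8)   # dictionary

# Permute the atoms and flip their signs: a different but equally valid
# factorization that reconstructs the data identically.
perm = [2, 0, 1]
U2, V2 = -U[:, perm], -V[perm, :]

assert not np.allclose(U, U2)               # the factors differ...
np.testing.assert_allclose(U @ V, U2 @ V2)  # ...but the product does not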

Further test cases may be needed, since this PR does not cover all argument variations.

I used #13303, #13243 and #20155 as references while making this PR.

@ogrisel (Member) left a comment


Just a few questions/suggestions for improvement below, but the PR already LGTM as it is.

Please do not forget to document the enhancement in doc/whats_new/v1.1.rst.

# instead of comparing U and V directly.
assert_allclose(np.matmul(U_64, V_64), np.matmul(U_32, V_32), rtol=rtol, atol=atol)
assert_allclose(np.sum(np.abs(U_64)), np.sum(np.abs(U_32)), rtol=rtol, atol=atol)
assert_allclose(np.sum(V_64 ** 2), np.sum(V_32 ** 2), rtol=rtol, atol=atol)
Member


This is a clever way to test numerical equivalence of the solutions.

I am just wondering: is rtol really necessary if we already pass atol=1e-7?

Member


I'd rather have only rtol than only atol. In addition, I think rtol=1e-7 is a little too optimistic when comparing float32s, because it's slightly below the float32 machine precision; 1e-6 would be safer.
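For reference, the machine epsilon can be checked directly with NumPy (a minimal sketch):

import numpy as np

# float32 machine epsilon is ~1.19e-7, so rtol=1e-7 asks for more relative
# precision than float32 arithmetic can deliver; 1e-6 leaves some headroom.
print(np.finfo(np.float32).eps)  # 1.1920929e-07
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16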

Contributor Author


Thanks!
I have changed the tests to use only rtol.
Some tests still cannot pass with rtol=1e-7 or 1e-6, though, so each test is given the minimum rtol that lets it pass.

Comment on lines 122 to 124
- |Enhancement| `dict_learning` and `dict_learning_online` methods preserve dtype for numpy.float32.
:pr:`22002` by :user:`Takeshi Oura <takoika>`.

Member


Your changes also impact DictionaryLearning, MiniBatchDictionaryLearning and SparseCoder; please mention them here. I think it would also be nice to add similar tests for those classes.

Contributor Author


Changed.
I have also added unit tests verifying dtype matching for DictionaryLearning, MiniBatchDictionaryLearning and SparseCoder, as well as a test of numerical consistency between np.float32 and np.float64 for the sparse_encode method.
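A minimal sketch of what such a dtype-matching check can look like (illustrative parameters and data, not the PR's actual tests):

import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.RandomState(0).randn(12, 10).astype(np.float32)
dl = DictionaryLearning(n_components=5, max_iter=10, random_state=0).fit(X)

# With this PR, the fitted dictionary and the transformed codes follow
# the input dtype instead of being upcast to float64.
assert dl.components_.dtype == np.float32
assert dl.transform(X).dtype == np.float32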

@takoika takoika requested a review from jeremiedbb December 22, 2021 07:17
@jeremiedbb (Member)

I suspect that we only need to change these two allocations to make "omp" preserve dtype as well:

coef = np.zeros((len(Gram), Xy.shape[1], len(Gram)))

coef = np.zeros((len(Gram), Xy.shape[1]))

adding dtype=Gram.dtype to each.

Would you mind trying this? If it requires more work, we can do it later in a separate PR.
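In other words, a sketch of what the edit amounts to (using the variable names from the quoted snippet; the surrounding _omp.py code is not shown):

# np.zeros defaults to float64, silently upcasting float32 problems.
# Allocating with the Gram matrix's dtype keeps float32 inputs float32.
coef = np.zeros((len(Gram), Xy.shape[1], len(Gram)), dtype=Gram.dtype)
coef = np.zeros((len(Gram), Xy.shape[1]), dtype=Gram.dtype)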

@takoika (Contributor, Author) commented Dec 22, 2021

I suspect that we only need to change this to make "omp" preserve dtype as well

Thank you for the suggestion.
With the proposed change the "omp" algorithm now passes the unit tests, so I have added it to this PR.

@takoika (Contributor, Author) commented Dec 23, 2021

@jeremiedbb
The change to scikit-learn/sklearn/linear_model/_omp.py alters the behaviour of orthogonal_mp_gram. Should that be mentioned in the changelog?

In any case, orthogonal_mp, OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV will be affected by the change as well as orthogonal_mp_gram. It would be good to add unit tests confirming the relation between input dtype and output dtype for them, but to keep this PR small and simple it may be better to address _omp.py in a follow-up PR.
Any thoughts?

@glemaitre (Member)

Something that will also be required for the common tests is to add the proper tag to the classes that preserve the dtype. It boils down to adding the "preserves_dtype" key to the dictionary returned by _more_tags:

def _more_tags(self):
    return {
        "preserves_dtype": [np.float64, np.float32],
    }

This should be added to:

  • DictionaryLearning
  • MiniBatchDictionaryLearning
  • SparseCoder

Some checks will probably duplicate the ones already written, but I don't think that is a big deal, because I am not sure we run the common tests for all the above classes.
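A quick way to check that the tag is picked up (a sketch for illustration only; _get_tags is a private scikit-learn helper in this era of the codebase):

import numpy as np
from sklearn.decomposition import SparseCoder

# The common tests read "preserves_dtype" from the estimator's tags.
coder = SparseCoder(dictionary=np.eye(3))
print(coder._get_tags()["preserves_dtype"])  # expected: [np.float64, np.float32]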

@glemaitre (Member)

In any case, orthogonal_mp, OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV will be affected by the change as well as orthogonal_mp_gram. It would be good to add unit tests confirming the relation between input dtype and output dtype for them, but to keep this PR small and simple it may be better to address _omp.py in a follow-up PR.
Any thoughts?

I would advise making it in a separate PR if this is not as straightforward regarding the behaviour.

@glemaitre glemaitre changed the title Preserving dtype for numpy.float32 in Dictionary learning ENH Preserving dtype for np.float32 in *DictionaryLearning and SparseCoder Dec 23, 2021
@takoika takoika changed the title ENH Preserving dtype for np.float32 in *DictionaryLearning and SparseCoder ENH Preserving dtype for np.float32 in *DictionaryLearning, SparseCoder and orthogonal_mp_gram Dec 23, 2021
@takoika (Contributor, Author) commented Dec 23, 2021

@glemaitre
Thanks!

Something that will also be required for the common tests is to add the proper tag to the classes that preserve the dtype. It boils down to adding the "preserves_dtype" key to the dictionary returned by _more_tags.

I have added the preserves_dtype tag.

I would advise making it in a separate PR if this is not as straightforward regarding the behaviour.

I have confirmed that the change affects only orthogonal_mp_gram, so I have added unit tests verifying that np.float32 input is preserved as np.float32 output for orthogonal_mp_gram. orthogonal_mp, OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV do not yet preserve np.float32, so I will leave them as-is.
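A minimal sketch of such a check (made-up data; the assertion holds for scikit-learn builds that include this PR):

import numpy as np
from sklearn.linear_model import orthogonal_mp_gram

rng = np.random.RandomState(0)
X = rng.randn(20, 15).astype(np.float32)
y = rng.randn(20).astype(np.float32)

# orthogonal_mp_gram works on the precomputed Gram matrix X.T @ X and X.T @ y.
coef = orthogonal_mp_gram(X.T @ X, X.T @ y, n_nonzero_coefs=5)
assert coef.dtype == np.float32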

@glemaitre (Member) left a comment


LGTM on my side. @jeremiedbb, are you OK with the PR as-is?

@jeremiedbb (Member) left a comment


LGTM. Thanks @takoika !
