
ENH Preserving dtype for np.float32 in *DictionaryLearning, SparseCoder and orthogonal_mp_gram #22002


Merged (25 commits, Dec 30, 2021)

Conversation

@takoika (Contributor) commented Dec 17, 2021

Reference Issues/PRs

This PR is part of #11000 .

What does this implement/fix? Explain your changes.

This PR makes the code and dictionary produced by Dictionary Learning numpy.float32 when the input data is numpy.float32, in order to preserve the input data type.

Any other comments?

I found two difficulties in testing numerical consistency between numpy.float32 and numpy.float64 for dictionary learning:

  • the optimal code and dictionary are not unique (see the sketch below);
  • it is difficult to match results with high precision (probably because of the chained linear algebra operations).

Within the scope of this PR that is acceptable, but it potentially makes it hard to guarantee numerical consistency for downstream methods that use dictionary learning.
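To illustrate the non-uniqueness, here is a minimal sketch with made-up matrices (not code from the PR): permuting the dictionary atoms together with the corresponding code columns, or flipping their signs, yields a different factorization with an identical reconstruction.

import numpy as np

rng = np.random.RandomState(0)
U = rng.randn(10, 3)  # code (made-up example data)
V = rng.randn(3, 8)   # dictionary

# Permute the atoms and flip their signs: a different but equally valid
# factorization that reconstructs the data identically.
perm = [2, 0, 1]
U2, V2 = -U[:, perm], -V[perm, :]

assert not np.allclose(U, U2)               # the factors differ...
np.testing.assert_allclose(U @ V, U2 @ V2)  # ...but the product does not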

Further test cases may be needed, since this PR does not cover all argument variations.

I used #13303, #13243 and #20155 as references while making this PR.

@ogrisel (Member) left a comment


Just a few questions/suggestions for improvement below, but the PR already LGTM as it is.

Please do not forget to document the enhancement in doc/whats_new/v1.1.rst.

# instead of comparing U and V directly.
assert_allclose(np.matmul(U_64, V_64), np.matmul(U_32, V_32), rtol=rtol, atol=atol)
assert_allclose(np.sum(np.abs(U_64)), np.sum(np.abs(U_32)), rtol=rtol, atol=atol)
assert_allclose(np.sum(V_64 ** 2), np.sum(V_32 ** 2), rtol=rtol, atol=atol)
Member


This is a clever way to test numerical equivalence of the solutions.

I am just wondering: is rtol really necessary if we already pass atol=1e-7?

Member


I'd rather have only rtol than only atol. In addition, I think rtol=1e-7 is a little too optimistic when comparing float32s, because it's slightly below the float32 machine precision; 1e-6 would be safer.
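For reference, the machine epsilon can be checked directly with NumPy (a minimal sketch):

import numpy as np

# float32 machine epsilon is ~1.19e-7, so rtol=1e-7 asks for more relative
# precision than float32 arithmetic can deliver; 1e-6 leaves some headroom.
print(np.finfo(np.float32).eps)  # 1.1920929e-07
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16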

Contributor Author


Thanks!
I have changed the tests to use only rtol.
Some tests still cannot pass with rtol=1e-7 or 1e-6, though, so each test is given the minimum rtol that lets it pass.

Comment on lines 122 to 124
- |Enhancement| `dict_learning` and `dict_learning_online` methods preserve dtype for numpy.float32.
:pr:`22002` by :user:`Takeshi Oura <takoika>`.

Member


Your changes also impact DictionaryLearning, MiniBatchDictionaryLearning and SparseCoder; please mention them here. I think it would also be nice to add similar tests for those classes.

Contributor Author


Changed.
I have also added unit tests verifying dtype matching for DictionaryLearning, MiniBatchDictionaryLearning and SparseCoder, as well as a test of numerical consistency between np.float32 and np.float64 for the sparse_encode method.
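A minimal sketch of what such a dtype-matching check can look like (illustrative parameters and data, not the PR's actual tests):

import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.RandomState(0).randn(12, 10).astype(np.float32)
dl = DictionaryLearning(n_components=5, max_iter=10, random_state=0).fit(X)

# With this PR, the fitted dictionary and the transformed codes follow
# the input dtype instead of being upcast to float64.
assert dl.components_.dtype == np.float32
assert dl.transform(X).dtype == np.float32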

@takoika takoika requested a review from jeremiedbb December 22, 2021 07:17
@jeremiedbb (Member)

I suspect that we only need to change these two allocations to make "omp" preserve dtype as well:

coef = np.zeros((len(Gram), Xy.shape[1], len(Gram)))

coef = np.zeros((len(Gram), Xy.shape[1]))

adding dtype=Gram.dtype to each.

Would you mind trying this? If it requires more work, we can do it later in a separate PR.
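In other words, a sketch of what the edit amounts to (using the variable names from the quoted snippet; the surrounding _omp.py code is not shown):

# np.zeros defaults to float64, silently upcasting float32 problems.
# Allocating with the Gram matrix's dtype keeps float32 inputs float32.
coef = np.zeros((len(Gram), Xy.shape[1], len(Gram)), dtype=Gram.dtype)
coef = np.zeros((len(Gram), Xy.shape[1]), dtype=Gram.dtype)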

@takoika (Contributor, Author) commented Dec 22, 2021

I suspect that we only need to change this to make "omp" preserve dtype as well

Thank you for the suggestion.
With the proposed change the "omp" algorithm now passes the unit tests, so I have added it to this PR.

@takoika (Contributor, Author) commented Dec 23, 2021

@jeremiedbb
The change to scikit-learn/sklearn/linear_model/_omp.py alters the behaviour of orthogonal_mp_gram. Should that be mentioned in the changelog?

In any case, orthogonal_mp, OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV will be affected by the change as well as orthogonal_mp_gram. It would be good to add unit tests confirming the relation between input dtype and output dtype for them, but to keep this PR small and simple it may be better to address _omp.py in a follow-up PR.
Any thoughts?

@glemaitre (Member)

Something that will also be required for the common tests is to add the proper tag to the classes that preserve the dtype. It boils down to adding the "preserves_dtype" key to the dictionary returned by _more_tags:

def _more_tags(self):
    return {
        "preserves_dtype": [np.float64, np.float32],
    }

This should be added to:

  • DictionaryLearning
  • MiniBatchDictionaryLearning
  • SparseCoder

Some checks will probably duplicate the ones already written, but I don't think that is a big deal, because I am not sure we run the common tests for all the above classes.
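A quick way to check that the tag is picked up (a sketch for illustration only; _get_tags is a private scikit-learn helper in this era of the codebase):

import numpy as np
from sklearn.decomposition import SparseCoder

# The common tests read "preserves_dtype" from the estimator's tags.
coder = SparseCoder(dictionary=np.eye(3))
print(coder._get_tags()["preserves_dtype"])  # expected: [np.float64, np.float32]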

@glemaitre (Member)

In any case, orthogonal_mp, OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV will be affected by the change as well as orthogonal_mp_gram. It would be good to add unit tests confirming the relation between input dtype and output dtype for them, but to keep this PR small and simple it may be better to address _omp.py in a follow-up PR.
Any thoughts?

I would advise making it in a separate PR if this is not as straightforward regarding the behaviour.

@glemaitre glemaitre changed the title Preserving dtype for numpy.float32 in Dictionary learning ENH Preserving dtype for np.float32 in *DictionaryLearning and SparseCoder Dec 23, 2021
@takoika takoika changed the title ENH Preserving dtype for np.float32 in *DictionaryLearning and SparseCoder ENH Preserving dtype for np.float32 in *DictionaryLearning, SparseCoder and orthogonal_mp_gram Dec 23, 2021
@takoika (Contributor, Author) commented Dec 23, 2021

@glemaitre
Thanks!

Something that will also be required for the common tests is to add the proper tag to the classes that preserve the dtype. It boils down to adding the "preserves_dtype" key to the dictionary returned by _more_tags.

I have added the preserves_dtype tag.

I would advise making it in a separate PR if this is not as straightforward regarding the behaviour.

I have confirmed that the change affects only orthogonal_mp_gram, so I have added unit tests verifying that np.float32 input is preserved as np.float32 output for orthogonal_mp_gram. orthogonal_mp, OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV do not yet preserve np.float32, so I will leave them as-is.
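A minimal sketch of such a check (made-up data; the assertion holds for scikit-learn builds that include this PR):

import numpy as np
from sklearn.linear_model import orthogonal_mp_gram

rng = np.random.RandomState(0)
X = rng.randn(20, 15).astype(np.float32)
y = rng.randn(20).astype(np.float32)

# orthogonal_mp_gram works on the precomputed Gram matrix X.T @ X and X.T @ y.
coef = orthogonal_mp_gram(X.T @ X, X.T @ y, n_nonzero_coefs=5)
assert coef.dtype == np.float32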

@glemaitre (Member) left a comment


LGTM on my side. @jeremiedbb, are you OK with the PR as-is?

@jeremiedbb (Member) left a comment


LGTM. Thanks @takoika !
