-
-
Notifications
You must be signed in to change notification settings - Fork 26k
FIX Fix gram validation: dtype-aware tolerance #22059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… no attribute 'random'
Converted this into a draft PR because of a failing unit test https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=36348&view=logs&j=18b0749f-dd9a-5274-d197-77895e43d4e4&t=ba53dc33-2c0b-592b-6f69-b1c7af7ca977. See also the open problem here #21997 (comment). |
what makes little sense to me is to use the same default rtol and atol for float32 and float64 a solution along those lines:
seems to fix your problem. as it is it would require to update docstring etc but that would be my approach. |
Thanks @agramfort for the feedback. I agree that your proposal is the better solution. However, I wonder whether one could just call scikit-learn/sklearn/utils/_testing.py Lines 390 to 393 in 735d3c6
It is already data-type aware and for example also used in the module https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/estimator_checks.py. --> The function |
no objection to use sklearn.utils._testing.assert_allclose sorry for the slow reaction time @MalteKurz |
…ed instead of an ValueError
@agramfort No worries, thanks for the feedback. I adapted the code accordingly, such that the the validation is now dtype-aware. The description of this PR (#22059 (comment)) was also adapted in this lines and should now be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MalteKurz can you add a what's new entry?
🙏 for taking a stab at this.
sklearn/linear_model/_base.py
Outdated
rtol : float, default=None | ||
Relative tolerance; see numpy.allclose and | ||
sklearn.utils._testing.assert_allclose. | ||
If None, it is set based on the provided arrays' dtypes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explicit on the number. Before it was clear it was 1e-7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adapted the PR accordingly (see 9d2ae61) and now explicitly state the rtol
values in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
sklearn/linear_model/_base.py
Outdated
f"{expected} but the user-supplied value was " | ||
f"{actual}." | ||
) | ||
assert_allclose( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the exception class to ValueError
. Also importing from _testing
in non-testing code has the consequence of importing pytest
(if pytest
is installed). Loading an unused module can increase the startup time of programs.
I am in favor of copying the rtol
selection logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adapted the PR accordingly (see 9d2ae61):
- prevent import from
utils._testing
- copy-paste
rtol
logic - go back to
ValueError
sklearn/utils/_testing.py
Outdated
@@ -414,7 +414,6 @@ def assert_allclose( | |||
If None, it is set based on the provided arrays' dtypes. | |||
atol : float, optional, default=0. | |||
Absolute tolerance. | |||
If None, it is set based on the provided arrays' dtypes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a separate PR for this? This would keep this PR focused on fixing _check_precomputed_gram_matrix
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, see #23555.
… ValueError; explicitly state the rtol in the docstring
@agramfort A what's new entry was added in 1acfd42. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Reference Issues/PRs
Fixes #21997. See also #19004.
What does this implement/fix? Explain your changes.
In #19004 validations for user-supplied gram matrices have been introduced. A core part of the check is based on re-computing a single element of the gram matrix and checking whether it is equal to the provided entry of the Gram matrix.
The check can cause "false alarms" because there is no guarantee that
np.dot
always delivers "numerically" the same result if either called with two 2d-arrays (specific element extracted) or with two 1d-arrays corresponding to that specific element. Two examples are provided in the followingAs a consequence, the validation can even kick in when the user didn't provide a pre-computed Gram matrix. Such a case, where in a call to
LassoCV().fit()
the Gram matrix is internally pre-computed, passed through and the validation fails is described in #21997:The proposed fix: As suggested by @agramfort, the check for the Gram matrix is now done with dtype-aware tolerance. This is achieved by calling
sklearn.utils._testing.assert_allclose
withrtol=None
, see #22059 (comment). For this the defaultrtol
of_check_precomputed_gram_matrix
was changed toNone
. The docstring has been adapted accordingly.Unit test: Based on one of the above examples, I added a new unit test being sensitive for the described problem.
Any other comments?
sklearn.utils._testing.assert_allclose
. There, forrtol
as well asatol
it was stated that "If None, it is set based on the provided arrays' dtypes.". However, looking into the code it becomes clear that onlyrtol
is dtype-aware (if set toNone
) but notatol
. Therefore, I removed the statement from theatol
docstring, see 60372ce