
TST dtype dependent rtol, atol in assert_allclose_dense_sparse #13978


Merged: 3 commits merged into scikit-learn:master on Jun 26, 2019

Conversation

@oleksandr-pavlyk (Contributor) commented May 29, 2019

The closeness check between new_result and the previously computed result
in check_fit_idempotent should depend on the dtype of the result, and the
tolerance cannot be less than that precision's epsilon.
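As an illustration (a minimal sketch with a hypothetical helper, not the exact patch), the dtype-dependent tolerance selection looks like this:

    import numpy as np

    def dtype_tolerance(result):
        # Hypothetical helper: floating results use their own machine
        # epsilon; anything else (e.g. integer labels) falls back to
        # float64 eps.
        if np.issubdtype(result.dtype, np.floating):
            return np.finfo(result.dtype).eps
        return np.finfo(np.float64).eps

    print(dtype_tolerance(np.ones(3, dtype=np.float32)))  # ~1.19e-07
    print(dtype_tolerance(np.ones(3, dtype=np.float64)))  # ~2.22e-16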

Reference Issues/PRs

Closes #13977

What does this implement/fix? Explain your changes.

This change sets the atol keyword in the call to assert_allclose_dense_sparse in
the check_fit_idempotent estimator checker.

Any other comments?

@amueller @jeremiedbb

Check for closeness of new_result and the previously computed result
should depend on the dtype of the result, and cannot be less than
that precision's epsilon.
@oleksandr-pavlyk (Contributor, Author) commented

Is this build hung?

@oleksandr-pavlyk (Contributor, Author) commented

@ogrisel @jeremiedbb CI is green. Could you review, please?

@jeremiedbb (Member) commented

Shouldn't we adjust the rtol instead? Unless comparing to 0, atol is usually not meaningful and kind of unrelated to the dtype.
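For context, NumPy's allclose-style checks pass when |actual - desired| <= atol + rtol * |desired|, so for values of ordinary magnitude the rtol term dominates and atol only matters when the desired value is at or near zero. A quick illustration:

    import numpy as np

    # rtol alone covers values of ordinary magnitude...
    print(np.allclose(1.0 + 1e-8, 1.0, rtol=1e-7, atol=0.0))   # True
    # ...but rtol * |0| == 0, so comparisons against zero need atol.
    print(np.allclose(1e-12, 0.0, rtol=1e-7, atol=0.0))        # False
    print(np.allclose(1e-12, 0.0, rtol=1e-7, atol=1e-9))       # True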

@oleksandr-pavlyk (Contributor, Author) commented

Yes; in that case, how about setting atol = rtol = np.finfo(dtype).eps?

@jeremiedbb (Member) commented

Yes, that's fine.

@oleksandr-pavlyk force-pushed the check-fit-itempotent-atol branch 2 times, most recently from e654d07 to 1073501 (June 4, 2019 17:10)
@oleksandr-pavlyk force-pushed the check-fit-itempotent-atol branch from 1073501 to 35ca88b (June 4, 2019 17:11)
@oleksandr-pavlyk (Contributor, Author) commented

@jeremiedbb Done.

@jeremiedbb (Member) left a comment

Just a minor comment. Other than that, LGTM.

    if np.issubdtype(new_result.dtype, np.floating):
        tol = np.finfo(new_result.dtype).eps
    else:
        tol = np.finfo(np.float64).eps
@jeremiedbb (Member) commented on the diff

Maybe add a comment # integer dtype because it's not clear at first sight.

I wonder if it would not be even clearer to reformulate it this way:

if new_result.dtype == np.float32:
    tol = np.finfo(np.float32).eps
else:
    tol = np.finfo(np.float64).eps

wdyt?

@oleksandr-pavlyk (Contributor, Author) replied

I think using np.floating is much cleaner NumPy: it accounts for np.half, np.longdouble, etc., even though these types may not be used in scikit-learn.
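A quick check of that claim: np.floating is the abstract parent of all NumPy floating scalar types, so the np.issubdtype branch covers half and longdouble as well, while integer dtypes fall through to the float64 branch.

    import numpy as np

    # Note: longdouble's width (and printed name) is platform-dependent.
    for dt in (np.half, np.float32, np.float64, np.longdouble, np.int64):
        print(np.dtype(dt).name, np.issubdtype(dt, np.floating))
    # float16 True, float32 True, float64 True,
    # longdouble (e.g. float128) True, int64 False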

@jeremiedbb (Member) replied

Fair enough :)

@oleksandr-pavlyk (Contributor, Author) commented

Pinging reviewers, please.

rth previously approved these changes Jun 16, 2019

@rth (Member) left a comment

LGTM. Thanks @oleksandr-pavlyk!

@@ -2448,4 +2448,12 @@ def check_fit_idempotent(name, estimator_orig):
    for method in check_methods:
        if hasattr(estimator, method):
            new_result = getattr(estimator, method)(X_test)
            assert_allclose_dense_sparse(result[method], new_result)
            if np.issubdtype(new_result.dtype, np.floating):
                tol = np.finfo(new_result.dtype).eps
@rth (Member) commented Jun 16, 2019

No, actually for float64 this would be 2.22e-16; do we actually need such precision? I think the previous default of 1e-7 was enough. For other dtypes, I agree.

Maybe then,

    # previous defaults in `assert_allclose_dense_sparse`
    rtol = max(tol, 1e-7)
    atol = max(tol, 1e-9)

Also maybe take 2*np.finfo(dtype).eps just to be safe?

@oleksandr-pavlyk (Contributor, Author) replied

Thanks for the good feedback. I agree, and have pushed both proposed changes.

@rth dismissed their stale review June 25, 2019 11:52 (changes requested)

The tol parameter is set to 2*np.finfo(dtype).eps, rather than
np.finfo(dtype).eps.

The rtol and atol parameters of `assert_allclose_dense_sparse` are
set so as not to go below their default values.
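Combining the two commits, the resulting tolerances work out as sketched below (assuming tol = 2 * eps(dtype), clipped at the previous defaults, per the review discussion above):

    import numpy as np

    for dt in (np.float32, np.float64):
        tol = 2 * np.finfo(dt).eps
        rtol = max(tol, 1e-7)  # never below the previous default rtol
        atol = max(tol, 1e-9)  # never below the previous default atol
        print(np.dtype(dt).name, rtol, atol)
    # float32: rtol = atol = 2*eps (about 2.4e-07); 2*eps dominates
    # float64: rtol = 1e-7, atol = 1e-9; the old defaults dominate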
@rth (Member) commented Jun 26, 2019

Thanks!

@rth changed the title from "MAINT: Per #13977 set non-default absolute tolerance" to "TST dtype dependent rtol, atol in assert_allclose_dense_sparse" on Jun 26, 2019
@rth merged commit 3f25ea0 into scikit-learn:master Jun 26, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Linked issue closed by this PR: Default atol value in assert_allclose_dense_sparse is too low (#13977)