
CI: Disable numpy avx512 instructions #22099


Merged · 2 commits · Jan 4, 2022

Conversation

greglucas
Contributor

PR Summary

This disables the AVX512 instruction set at runtime within the CI using the NPY_DISABLE_CPU_FEATURES environment variable. This is due to small floating point differences causing test failures when using that instruction set on numpy 1.22 wheels on the GitHub Actions runners.
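For reference, a minimal sketch of how the variable takes effect (the feature names listed here are an assumption, not necessarily the exact set disabled in the workflow). numpy reads the variable when it is first imported, which is why CI has to set it at the job level rather than in test code:

```python
import os

# Assumed feature names for illustration; numpy warns at import time if a
# listed feature is unavailable on the host machine.
os.environ["NPY_DISABLE_CPU_FEATURES"] = "AVX512F AVX512CD AVX512_SKX"

import numpy as np  # runtime dispatch now skips the listed AVX512 paths
```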

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

This removes the NPY_DISABLE_CPU_FEATURES flag from the sphinx and tk tests,
because numpy emits warnings on CI when the flag is set, which causes the
subprocess-based tests to fail. The features don't need to be disabled for
these tests, so remove the variable from the environment that is passed in,
as sketched below.
@timhoffm added this to the v3.5.2 milestone · Jan 3, 2022
@jklymak
Member

jklymak commented Jan 3, 2022

This is pretty obscure. Does it make sense to just up the test tolerance instead? Though I think a couple of tests were quite wrong. Why did these flags have such a large effect?

@dstansby
Member

dstansby commented Jan 3, 2022

> This is pretty obscure. Does it make sense to just up the test tolerance instead? Though I think a couple of tests were quite wrong. Why did these flags have such a large effect?

I don't think any were unexpectedly wrong. I downloaded the failed images and all were minor changes apart from errorbar_mixed, which had a large change because (I presume) a small difference caused the axis limit to jump from 1e-2 to 1e-3.

I think this is the most pragmatic way to go forward, without having to play whack-a-mole with test tolerances across the test suite.

@jklymak
Member

jklymak commented Jan 3, 2022

Oh, I have the opposite point of view. If we have tests that are susceptible to compiler foibles, maybe we should fix the tests or increase their tolerances, because there will always be more compiler foibles.

@greglucas
Contributor Author

I agree this is a pretty big hammer, saying we won't test against certain floating point instruction sets. However, I assume we rely on numpy for some verification that their floating point calculations are "good enough" for them and this is impacting us because we are doing pixel-perfect comparisons with floating point inputs. I can see both arguments here, so I'm not sure which one people would be more generally comfortable with (increased tolerances, or reduced instruction sets).

There are only a few tests failing, so the tolerances on a few of them could probably be bumped. For the errorbar one, though, we should update the autoscaling in that test by bumping the np.minimum() call to add a small epsilon rather than adding a large tolerance.
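As a toy illustration of the failure mode (not matplotlib's actual autoscale code, and with made-up numbers): if the lower limit is effectively snapped to a decade, a few-ULP dip below 1e-2 drops it to 1e-3, and a small epsilon makes it stable.

```python
import numpy as np

lo = 1e-2 - 1e-16                       # tiny floating point slop below 1e-2
print(10.0 ** np.floor(np.log10(lo)))   # 0.001: the limit jumps a full decade

# Adding a small epsilon after np.minimum() keeps the limit stable.
lo_robust = np.minimum(lo, 1e-2) + 1e-12
print(10.0 ** np.floor(np.log10(lo_robust)))  # 0.01
```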

@timhoffm
Member

timhoffm commented Jan 3, 2022

We have to live with the fact that our dependencies may introduce minor variations. First and foremost we don't want brittle tests. There are two ways to get there:

  1. Make the test environment as reproducible as possible
  2. Increase tolerances

I argue that if we can easily get away with (1), that's better than (2). We have more control and don't blindly accept other changes as well. For the same reason we pin freetype or remove text from test images altogether.

@timhoffm
Member

timhoffm commented Jan 4, 2022

I'll merge as is to unbreak the builds. @jklymak, if you think tolerances are the better approach, we can discuss this in the next call and maybe change it then.

@jklymak
Member

jklymak commented Jan 4, 2022

Sure, I've added to the call agenda.

dstansby added a commit that referenced this pull request on Jan 4, 2022 (…099-on-v3.5.x):
Backport PR #22099 on branch v3.5.x (CI: Disable numpy avx512 instructions)
@greglucas deleted the numpy-cpu-avx512 branch · January 4, 2022 15:23
@tacaswell
Member

I may have over-learned from the issue we had with "rendering the wrong glyph but tests are passing", but I am very, very concerned about bumping tolerances on anything but a per-test basis.

@jklymak
Member

jklymak commented Jan 4, 2022

Sure, I wasn't suggesting a general large tolerance. But here we have one test that had a large change because of floating point slop, so in my opinion that test should be made more robust. The other changes are pretty small, so upping those tolerances would also make sense to me.
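For concreteness, a per-test bump would look something like this; the tol value is illustrative, not the one actually chosen:

```python
from matplotlib.testing.decorators import image_comparison

# tol is the allowed RMS difference against the baseline image; the default
# of 0 requires an exact match. The value below is purely illustrative.
@image_comparison(baseline_images=['errorbar_mixed'], extensions=['png'],
                  tol=0.02)
def test_errorbar_mixed():
    ...  # plotting code under test
```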

@greglucas
Contributor Author

Here is a link to a commit that does what @jklymak suggests, increasing tolerances/dtypes. All of the changes were pretty small.
greglucas@c030e05
I tried to move all of the failing tests to use longdouble rather than increasing tolerances, but there ended up being some unsafe downcasting to float64 in the qhull procedures of trisurf.
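An illustration of that pitfall (platform-dependent: on x86 Linux/macOS np.longdouble is 80-bit extended precision, while on Windows it is just float64):

```python
import numpy as np

one = np.longdouble(1)
a = one + np.finfo(np.longdouble).eps  # ~1.08e-19 above 1 on x86
print(a - one)               # nonzero: longdouble resolves the difference
print(np.float64(a) - 1.0)   # 0.0: a float64 round-trip (as in the
                             # float64-only Qhull path) rounds it away
```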

@tacaswell
Member

I came across https://etna.math.kent.edu/vol.52.2020/pp358-369.dir/pp358-369.pdf recently which has a case where increasing the precision of floats can make things worse (sometimes the rounding is actually in your favor!).

@tacaswell
Member

@greglucas, can you open a PR with those tolerances? They seem reasonable to me.

@greglucas
Contributor Author

Sure, see #22132.

That is an interesting finding! I didn't think about trying single precision instead in the update...
