Skip to content

DISCUSS: Runtime precision selection drive? #23362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
seberg opened this issue Mar 9, 2023 · 15 comments
Open

DISCUSS: Runtime precision selection drive? #23362

seberg opened this issue Mar 9, 2023 · 15 comments

Comments

@seberg
Copy link
Member

seberg commented Mar 9, 2023

Yesterday there was a bit of a discussion around adding an env variable to allow runtime selection (mainly opt out of lower precision SVML). I need to think it a bit over, but overall, I would be fine with doing that.

There was also a discussion that a proper solution may be to allow selecting this maybe as a context (or similar). I don't think that is a small project, but I am not sure it is huge either.

I am not planning on attacking it, but if e.g. @mattip is interested in it together with @seiko2plus it may be a well scoped project, so I thought I open an issue to see if things drop out.

What would it take? Some internal reorganization for sure:

  1. We probably need to fix np.errstate to be a context variable (that is needed anyway). I think we could then expand it to tag on a "float precision" context? np.errstate is now a contextvar with ENH,API: Make the errstate/extobj a contextvar #23936. This makes it now easier to extend and absolutely no worry to fetch the internal npy_extobj once and pass it around to anywhere needed. (It should be very fast to fetch now, so even multiple fetches are fine in practice.)
  2. We need to include that information in the context passed into ufuncs and get_loop.
  3. We need to actually use it to dispatch :).

The first thing is well scoped, but a mid-sized project (not too hard really). The second thing should be very easy afterwards, but may need refactoring of the errstate handling as a follow-up to the first.

The second point is easy enough.

For the third point, we end up here currently: https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/legacy_array_method.c#L195
that function can be customized freely for each ufunc. And the float context is even passed through to the inner-loop (but you need to customize that function to use the new-style inner-loop signature).

Now, we need to customize the get_loop(), which is a bit annoying in the current machinery. But I am not sure it is that bad: Matti effectively did the same for the ufunc.at loop specialization already.

Note that I am happy to leave ufunc->functions[i] function pointer that we currently use/fill at NULL. Nobody should be using that! (This part might need very managable followups if you want to keep supporting pnumpy style things, but I suspect that is a bridge that can be crossed when someone asks for it.)

(I am happy to close this, if there is no explicit interest.)

@mattip
Copy link
Member

mattip commented Mar 9, 2023

This seems like a good idea. We already have a start for disabling SIMD features in order to preserve byte-for-byte compatible results, this seems like a natural extension.

@mattip
Copy link
Member

mattip commented Mar 9, 2023

I think the context manager part is partially done, at least errstate can already be used as one.

@seberg
Copy link
Member Author

seberg commented Mar 9, 2023

It is a context manager, but not backed by a context variable, which has its own set of bugs, so it felt like if we extend that mechanism that might make sense. I am not sure if it is necessary though, I admit.

@mattip
Copy link
Member

mattip commented Jun 11, 2023

There was further discussion on the mailing list. I think there is consensus that the current 1.25.0rc2 errors for sin and cos are too large, and that we should revert. @r-devulap what should we set the permitted ULP error to here

std::vector<struct ufunc> umathfunc = {
{"sin", sin, sin, 2.37, 3.3},
{"cos", cos, cos, 2.36, 3.38},
instead of 2.37, 3.3 / 2.36, 3.38? Then we should regenerate the data files. As I understand the code we should change these lines to avoid using the SIMD code for sin/cos


#if NPY_SIMD_F64 && NPY_SIMD_FMA3

#if NPY_SIMD_F32 && NPY_SIMD_FMA3

@charris
Copy link
Member

charris commented Jun 11, 2023

Rather than revert, I'd rather just disable. Long term, I'm hoping for either/both of:

  • An improved approximation, especially around zero (I think this can be done).
  • A context manager that allows for using the faster approximation where it is useful.

@seberg
Copy link
Member Author

seberg commented Jun 11, 2023

The "fast" context manger would have one more advantage: Disabling FPEs for it is more of a no-brainer.

@mattip
Copy link
Member

mattip commented Jun 11, 2023

Sorry, I did not properly communicate my proposal. We should revert the code to the situation we had before SVML by disabling the SIMD code paths for sin and cos. I think this can be done by disabling the macros in the places I listed above. I agree that in the long-term allowing use of a context manager is needed.

@charris
Copy link
Member

charris commented Jun 11, 2023

Sounds like we are in agreement then. There don't seem to be any other unfixed problems with 1.25.0rc1, so I will release after the fix goes in.

@r-devulap
Copy link
Member

I will work on submitting a PR: disable SIMD for sin/cos, modify ULP error for sin/cos to 1 and regenerate data files.

@mattip
Copy link
Member

mattip commented Jun 12, 2023

@r-devulap thanks, much appreciated. 👏

@r-devulap
Copy link
Member

Are we reversing both float32 and float64 SIMD versions? float32 version does not use SVML and has an upper ULP bound of 1.49 (has been around since #13368).

@charris
Copy link
Member

charris commented Jun 12, 2023

Just the new float64 version. There have been no complaints about the float32 version and 1.49 ULPS seems reasonable.

@r-devulap
Copy link
Member

makes sense.

@r-devulap
Copy link
Member

See #23925

@seberg
Copy link
Member Author

seberg commented Jun 21, 2023

Since errstate is saner now which should help a push here, some additional thoughts:

  • Maybe fast_math=True would be a better name (rather than precision="medium" or similar). That wouldn't include low-precision, but has some advantages:
    • Sounds less pessimistic about lower precision
    • A bit more clear, since I suspect toggling it should also disable FPE checking (allowing loops to be sloppy about FPEs).
    • Familiar naming from compilers, which it would in practice match.
    • It may be interesting for numba to query such a flag to decide to JIT with fast-math enabled (when compiling on the fly).
    • Low accuracy would require an additional flag (maybe even forcing users to use fast_math=True, low_accuracy=True explicitly). This may be a downside, but I would see it only as one, if we anticipate more accuracy modes. (A "reproducible" would be amazing, but I doubt that is even remotely feasible.)
  • errstate is of course not a great name, it would be nice to find a better one.
  • It would be nice to think about what Matti brought up a few times: a way to trace selection of different specializations. But I don't think it should block us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants