DISCUSS: Runtime precision selection drive? #23362

seberg · 2023-03-09T09:40:12Z

Yesterday there was a bit of a discussion around adding an env variable to allow runtime selection (mainly opt out of lower precision SVML). I need to think it a bit over, but overall, I would be fine with doing that.

There was also a discussion that a proper solution may be to allow selecting this maybe as a context (or similar). I don't think that is a small project, but I am not sure it is huge either.

I am not planning on attacking it, but if e.g. @mattip is interested in it together with @seiko2plus it may be a well scoped project, so I thought I open an issue to see if things drop out.

What would it take? Some internal reorganization for sure:

~~We probably need to fix np.errstate to be a context variable (that is needed anyway). I think we could then expand it to tag on a "float precision" context?~~ np.errstate is now a contextvar with ENH,API: Make the errstate/extobj a contextvar #23936. This makes it now easier to extend and absolutely no worry to fetch the internal npy_extobj once and pass it around to anywhere needed. (It should be very fast to fetch now, so even multiple fetches are fine in practice.)
We need to include that information in the context passed into ufuncs and get_loop.
We need to actually use it to dispatch :).

The first thing is well scoped, but a mid-sized project (not too hard really). The second thing should be very easy afterwards, but may need refactoring of the errstate handling as a follow-up to the first.

The second point is easy enough.

For the third point, we end up here currently: https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/legacy_array_method.c#L195
that function can be customized freely for each ufunc. And the float context is even passed through to the inner-loop (but you need to customize that function to use the new-style inner-loop signature).

Now, we need to customize the get_loop(), which is a bit annoying in the current machinery. But I am not sure it is that bad: Matti effectively did the same for the ufunc.at loop specialization already.

Note that I am happy to leave ufunc->functions[i] function pointer that we currently use/fill at NULL. Nobody should be using that! (This part might need very managable followups if you want to keep supporting pnumpy style things, but I suspect that is a bridge that can be crossed when someone asks for it.)

(I am happy to close this, if there is no explicit interest.)

The text was updated successfully, but these errors were encountered:

mattip · 2023-03-09T20:36:28Z

This seems like a good idea. We already have a start for disabling SIMD features in order to preserve byte-for-byte compatible results, this seems like a natural extension.

mattip · 2023-03-09T20:59:38Z

I think the context manager part is partially done, at least errstate can already be used as one.

seberg · 2023-03-09T21:01:12Z

It is a context manager, but not backed by a context variable, which has its own set of bugs, so it felt like if we extend that mechanism that might make sense. I am not sure if it is necessary though, I admit.

mattip · 2023-06-11T14:58:31Z

There was further discussion on the mailing list. I think there is consensus that the current 1.25.0rc2 errors for sin and cos are too large, and that we should revert. @r-devulap what should we set the permitted ULP error to here

numpy/numpy/core/tests/data/generate_umath_validation_data.cpp

Lines 113 to 115 in e75214c

    
           std::vector<struct ufunc> umathfunc = { 
        
                   {"sin", sin, sin, 2.37, 3.3}, 
        
                   {"cos", cos, cos, 2.36, 3.38},

instead of 2.37, 3.3 / 2.36, 3.38? Then we should regenerate the data files. As I understand the code we should change these lines to avoid using the SIMD code for sin/cos

numpy/numpy/core/src/umath/loops_trigonometric.dispatch.c.src

Line 43 in e75214c

#if NPY_SIMD_F64

numpy/numpy/core/src/umath/loops_trigonometric.dispatch.c.src

Line 214 in e75214c

#if NPY_SIMD_F32

numpy/numpy/core/src/umath/loops_trigonometric.dispatch.c.src

Line 391 in e75214c

#if NPY_SIMD_F64 && NPY_SIMD_FMA3

numpy/numpy/core/src/umath/loops_trigonometric.dispatch.c.src

Line 425 in e75214c

#if NPY_SIMD_F32 && NPY_SIMD_FMA3

charris · 2023-06-11T15:34:37Z

Rather than revert, I'd rather just disable. Long term, I'm hoping for either/both of:

An improved approximation, especially around zero (I think this can be done).
A context manager that allows for using the faster approximation where it is useful.

seberg · 2023-06-11T15:45:11Z

The "fast" context manger would have one more advantage: Disabling FPEs for it is more of a no-brainer.

mattip · 2023-06-11T16:22:11Z

Sorry, I did not properly communicate my proposal. We should revert the code to the situation we had before SVML by disabling the SIMD code paths for sin and cos. I think this can be done by disabling the macros in the places I listed above. I agree that in the long-term allowing use of a context manager is needed.

charris · 2023-06-11T16:41:34Z

Sounds like we are in agreement then. There don't seem to be any other unfixed problems with 1.25.0rc1, so I will release after the fix goes in.

r-devulap · 2023-06-12T16:40:23Z

I will work on submitting a PR: disable SIMD for sin/cos, modify ULP error for sin/cos to 1 and regenerate data files.

mattip · 2023-06-12T16:50:18Z

@r-devulap thanks, much appreciated. 👏

r-devulap · 2023-06-12T20:15:41Z

Are we reversing both float32 and float64 SIMD versions? float32 version does not use SVML and has an upper ULP bound of 1.49 (has been around since #13368).

charris · 2023-06-12T20:30:51Z

Just the new float64 version. There have been no complaints about the float32 version and 1.49 ULPS seems reasonable.

r-devulap · 2023-06-12T20:33:59Z

makes sense.

r-devulap · 2023-06-12T21:04:50Z

See #23925

seberg · 2023-06-21T07:42:50Z

Since errstate is saner now which should help a push here, some additional thoughts:

Maybe fast_math=True would be a better name (rather than precision="medium" or similar). That wouldn't include low-precision, but has some advantages:
- Sounds less pessimistic about lower precision
- A bit more clear, since I suspect toggling it should also disable FPE checking (allowing loops to be sloppy about FPEs).
- Familiar naming from compilers, which it would in practice match.
- It may be interesting for numba to query such a flag to decide to JIT with fast-math enabled (when compiling on the fly).
- Low accuracy would require an additional flag (maybe even forcing users to use fast_math=True, low_accuracy=True explicitly). This may be a downside, but I would see it only as one, if we anticipate more accuracy modes. (A "reproducible" would be amazing, but I doubt that is even remotely feasible.)
errstate is of course not a great name, it would be nice to find a better one.
It would be nice to think about what Matti brought up a few times: a way to trace selection of different specializations. But I don't think it should block us.

melissawm added the 15 - Discussion label Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DISCUSS: Runtime precision selection drive? #23362

DISCUSS: Runtime precision selection drive? #23362

seberg commented Mar 9, 2023 •

edited

Loading

mattip commented Mar 9, 2023

mattip commented Mar 9, 2023

seberg commented Mar 9, 2023

mattip commented Jun 11, 2023

charris commented Jun 11, 2023 •

edited

Loading

seberg commented Jun 11, 2023

mattip commented Jun 11, 2023

charris commented Jun 11, 2023

r-devulap commented Jun 12, 2023

mattip commented Jun 12, 2023

r-devulap commented Jun 12, 2023

charris commented Jun 12, 2023

r-devulap commented Jun 12, 2023

r-devulap commented Jun 12, 2023

seberg commented Jun 21, 2023

DISCUSS: Runtime precision selection drive? #23362

DISCUSS: Runtime precision selection drive? #23362

Comments

seberg commented Mar 9, 2023 • edited Loading

mattip commented Mar 9, 2023

mattip commented Mar 9, 2023

seberg commented Mar 9, 2023

mattip commented Jun 11, 2023

charris commented Jun 11, 2023 • edited Loading

seberg commented Jun 11, 2023

mattip commented Jun 11, 2023

charris commented Jun 11, 2023

r-devulap commented Jun 12, 2023

mattip commented Jun 12, 2023

r-devulap commented Jun 12, 2023

charris commented Jun 12, 2023

r-devulap commented Jun 12, 2023

r-devulap commented Jun 12, 2023

seberg commented Jun 21, 2023

seberg commented Mar 9, 2023 •

edited

Loading

charris commented Jun 11, 2023 •

edited

Loading