ENH: enable multi-platform SIMD compiler optimizations #13516


Merged
merged 8 commits into from
Jun 17, 2020

Conversation

seiko2plus
Member

@seiko2plus seiko2plus commented May 9, 2019

This pull request makes the following changes:


New Distutils class CCompilerOpt

Used for handling CPU/hardware optimizations: parsing the command arguments, managing the relation between the CPU baseline and dispatch-able features, generating
the required C headers, and finally compiling the sources with the proper compiler flags.

Class CCompilerOpt doesn't provide runtime detection of CPU features;
it focuses only on the compiler side, but it creates abstract C headers
that can be used later by the final runtime dispatching process.


New Build Arguments

  • --cpu-baseline minimal set of required optimizations; the default value is min, which provides the minimum CPU features that can safely run on a wide range of user platforms.
  • --cpu-dispatch dispatched set of additional optimizations; the default value is max -xop -fma4, which enables all CPU features except for AMD legacy features.

The new arguments can be reached through build, build_clib, and build_ext.
If build_clib or build_ext is not specified by the user, the arguments of build will be used instead, which also hold the default values.

Both --cpu-baseline and --cpu-dispatch accept the following CPU features:

X86: SSE, SSE2, SSE3, SSSE3, SSE41, POPCNT, SSE42, AVX, F16C, XOP, FMA4, FMA3, AVX2, AVX512F, AVX512CD, AVX512_KNL, AVX512_KNM, AVX512_SKX, AVX512_CLX, AVX512_CNL, AVX512_ICL

IBM/Power64: VSX, VSX2, VSX3

ARMv7/8: NEON, NEON_FP16, NEON_VFPV4, ASIMD, ASIMDHP, ASIMDDP, ASIMDFHM

Other acceptable options:

  • MIN: Enables the minimum CPU features that can safely run on a wide range of user platforms:
    X86: SSE, SSE2
    X86_64: SSE, SSE2, SSE3
    IBM/Power64 (big-endian): nothing
    IBM/Power64 (little-endian): VSX, VSX2
    ARMv7: nothing
    ARMv8 (64-bit): NEON, NEON_FP16, NEON_VFPV4, ASIMD

  • MAX: Enables all CPU features supported by the compiler and platform.

  • NATIVE: Enables all CPU features supported by the current machine; this operation is based on the compiler flags (-march=native, -xHost, /QxHost).

  • NONE: enables nothing

  • Operand +/-: remove or add features; useful with the options MAX, MIN, and NATIVE.
    NOTE: the operand + is only added for nominal reasons. For example:

    • --cpu-baseline="min avx2" is equivalent to --cpu-baseline="min + avx2"
    • --cpu-baseline="min,avx2" is equivalent to --cpu-baseline="min,+avx2"

NOTES:

  • All CPU features and the other options are case-insensitive.
  • Comma or space can be used as a separator, e.g.
    --cpu-dispatch="avx2 avx512f" and --cpu-dispatch="avx2, avx512f" are both applicable.
  • If a CPU feature is not supported by the user's platform or compiler, it will be skipped rather than
    raising a fatal error.
  • Any CPU feature specified to --cpu-dispatch will be skipped if it is part of the CPU baseline features.
  • Argument --cpu-baseline force-enables implied features,
    e.g. --cpu-baseline="sse42" is equivalent to --cpu-baseline="sse sse2 sse3 ssse3 sse41 popcnt sse42".
  • The value of --cpu-baseline will be treated as "native" if a compiler native flag (-march=native, -xHost or /QxHost) is enabled through the environment variable CFLAGS.
  • The user should always check the final report in the build log to verify the enabled features.
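To make these rules concrete, here is a small illustrative Python sketch of the parsing behavior (case-insensitive tokens, comma/space separators, implied-feature expansion, and `-` removal). The `IMPLIED` table is deliberately tiny and the helper names are hypothetical — this is not CCompilerOpt's actual code, and removal is simplified here to dropping just the named feature.

```python
# Illustrative sketch of how a --cpu-baseline/--cpu-dispatch string could be
# parsed. The implication table is a small, incomplete example.
IMPLIED = {  # each feature implies the features listed for it
    "SSE2": ["SSE"],
    "SSE3": ["SSE2"],
    "SSSE3": ["SSE3"],
    "SSE41": ["SSSE3"],
    "POPCNT": ["SSE41"],
    "SSE42": ["POPCNT"],
}

def expand(feature):
    """Return a feature plus everything it transitively implies."""
    out = {feature}
    for dep in IMPLIED.get(feature, []):
        out |= expand(dep)
    return out

def parse(spec):
    """Parse a feature spec into a sorted list of enabled features."""
    enabled = set()
    for tok in spec.replace(",", " ").split():
        name = tok.lstrip("+-").upper()  # case-insensitive, strip operands
        if tok.startswith("-"):
            enabled.discard(name)        # "-feat" removes the named feature
        else:
            enabled |= expand(name)      # "+feat" or "feat" adds it (implied too)
    return sorted(enabled)
```

For example, `parse("sse42")` yields the whole implied chain, matching the `--cpu-baseline="sse42"` note above.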

Explicitly disable the new infrastructure

Add a new command argument --disable-optimization to explicitly disable the whole new infrastructure. It also adds a new compiler definition called NPY_DISABLE_OPTIMIZATION.

When --disable-optimization is enabled, the dispatch-able sources (*.dispatch.c, explained in this comment) will be treated as normal C sources. Also, because of this disabling, any C headers generated by CCompilerOpt must be guarded with NPY_DISABLE_OPTIMIZATION, otherwise the build will definitely break.


New auto-generated C header(core/src/common/_cpu_dispatch.h)

The new header contains all the definitions and headers of instruction-sets for the CPU baseline and dispatch-able features that have been enabled through the command arguments --cpu-baseline and --cpu-dispatch.

NPY_HAVE_ is the prefix of CPU feature definitions, e.g. NPY_HAVE_SSE2, NPY_HAVE_VSX, NPY_HAVE_NEON.
The new header can be included inside normal C files or dispatch-able C sources (explained below);
however, if the new header is included inside normal C files, then it only provides the definitions and headers of instruction-sets for the CPU baseline features. For example:

#include "_cpu_dispatch.h"
#ifdef NPY_HAVE_AVX2
// this branch will only be enabled if AVX2 is enabled through --cpu-baseline="avx2"
#endif

NOTE: It should not be used directly, but through another new header called npy_cpu_dispatch.h.


New C header(core/src/common/npy_cpu_dispatch.h)

This header contains all the utilities required for the whole CPU dispatching process; it can also be considered a bridge linking the new infrastructure with the NumPy CPU runtime detection (#13421).


New CPU dispatcher solution(dispatch-able sources)

Explained in this comment


Add new attributes to umath module

  • __cpu_baseline__ a list of CPU baseline feature names, as configured by --cpu-baseline

  • __cpu_dispatch__ a list of CPU dispatch feature names, as configured by --cpu-dispatch

The new attributes contain the final enabled CPU features that are supported by the platform.
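On a NumPy built with this infrastructure, the attributes can be inspected directly. This is a usage sketch, not part of this PR's diff; the private module path has moved over NumPy's history, hence the fallback import.

```python
# Inspect the baseline/dispatched CPU features of an installed NumPy build.
try:
    from numpy._core._multiarray_umath import __cpu_baseline__, __cpu_dispatch__
except ImportError:  # older NumPy layouts
    from numpy.core._multiarray_umath import __cpu_baseline__, __cpu_dispatch__

print("baseline:", __cpu_baseline__)
print("dispatch:", __cpu_dispatch__)
```

Since dispatched features that are part of the baseline are skipped, the two lists should never overlap.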


Print the supported CPU features during the run of PytestTester

This could be one of the easiest ways to trace the enabled features during the run of unit tests.

An example on GCC 8.3.0(64-bit) - i7-8550U:

NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?

The format is explained as follows:

  • Dispatched features supported by the running machine end with *

  • Dispatched features not supported by the running machine end with ?

  • The remaining features represent the baseline.

  • If any features are missing, that is because they are:

    • not supported by the compiler or platform, or
    • not configured via the command arguments --cpu-baseline and --cpu-dispatch.
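The marker scheme above can be decoded mechanically; here is a small illustrative helper (hypothetical, not NumPy code) that splits a report line back into the three groups:

```python
def classify(report):
    """Split a 'NumPy CPU features:' report string into baseline features,
    supported dispatched features (*), and unsupported dispatched ones (?)."""
    baseline, supported, unsupported = [], [], []
    for tok in report.split():
        if tok.endswith("*"):
            supported.append(tok[:-1])    # dispatched, machine supports it
        elif tok.endswith("?"):
            unsupported.append(tok[:-1])  # dispatched, machine lacks it
        else:
            baseline.append(tok)          # compiled-in baseline
    return baseline, supported, unsupported
```

Applied to the GCC 8.3.0 / i7-8550U example above, the AVX512 entries land in the unsupported group while SSE/SSE2 remain baseline.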

@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch 2 times, most recently from 026a1b2 to 1c24bd0 Compare May 11, 2019 11:12
@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch 5 times, most recently from 2561efa to 660f621 Compare May 18, 2019 05:41
@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch 2 times, most recently from e3f7603 to 95ffca3 Compare May 26, 2019 23:58
@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch from 95ffca3 to 9e05278 Compare June 4, 2019 17:57
@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch 6 times, most recently from 531f6a0 to 582cb46 Compare July 1, 2019 23:43
@mattip
Member

mattip commented Jul 3, 2019

I am concerned all this will become a maintenance burden moving forward. Is there a way to use this as a library or a subrepo?

@edelsohn

edelsohn commented Jul 3, 2019

What does this comment about library or subrepo mean? This sounds very hostile to NumPy supporting other architectures.

@rgommers
Member

rgommers commented Jul 3, 2019

I am concerned all this will become a maintenance burden moving forward. Is there a way to use this as a library or a subrepo?

@mattip this is quite clearly addressed in the discussion that spurred this PR: gh-13393. And we also discussed it with the Steering Council before saying yes to posting the bounty mentioned in gh-13393. So yes it's a concern, but if things are done the right way there's no reason that this should substantially increase the maintenance burden. Please review that issue and comment on code or ideas that are potentially problematic, no need to start a new abstract discussion.

This sounds very hostile to NumPy supporting other architectures.

@edelsohn that's certainly not our intent as a project. We'd really like to have complete SIMD support for PowerPC as well as other architectures, with the compiler doing most of the heavy lifting.

@rgommers
Member

rgommers commented Jul 3, 2019

That said, this is quite a bit of code. I believe a lot of it is well-tested though, taken over from OpenCV (correct me if I'm wrong @seiko2plus). Some more context in the issue description would be useful:

  • there's a TODO for an example
  • this contains both build-time and run-time features. does that change the situation we have today? IIRC we wanted to move more to run-time, is that achieved?
  • what parts are and aren't covered by CI (we have PPC on TravisCI, were thinking about ARM via drone.io I believe, and conda-forge has ARM too)?

@charris
Member

charris commented Jul 4, 2019

What does this comment about library or subrepo mean?

It is simply another way to organize the code. NumPy already has two submodules that provide tools to build the documentation; putting the code in a submodule allows NumPy to use the code while letting development proceed at its own pace. Likewise, NumPy builds several libraries for the sorting and math code, probably not strictly necessary these days, but routines in the libraries are linked in when needed and, strictly speaking, could be used by downstream projects. You have a number of new files dedicated to this work, so it might be nice to put them together in subdirectories, maybe organized by architecture. Note that the libraries have their own setup and include files and configure the system. So the question here is how we should organize the code, not whether we should have it.

@rkern
Member

rkern commented Jul 4, 2019

It seems to me, after a cursory inspection, that how you want to do the distutils integration might be dispositive. It would be nice to package this up into a separate library with its own distutils integration tooling so that I could write other packages that use this code without needing to use numpy.distutils. numpy.distutils works well for numpy and scipy, but we try not to make it necessary for packages that just want to build against numpy's functionality.

On the other hand, I recognize that the distutils integration required is fairly significant, and welding together two significant distutils augmentations is often fraught, so I can see an argument for avoiding it and just making this one more feature that numpy.distutils alone provides, like f2py. If that's the case, then numpy/numpy is the right place. There isn't much else that would be profitable to break out into a separate package.

One thing to note is that we ought to be able to use this infrastructure to build another package that uses CPU features that the numpy binary itself was not compiled to use. I don't see why that would not be the case in this PR, but I just wanted it stated explicitly. Organizing this as a separate library may help enforce that when future modifications are made.

@seiko2plus
Member Author

@mattip,

I am concerned all this will become a maintenance burden moving forward.

The maintenance burden is already moving forward; I think what I'm doing is just trying to ease it.

Is there a way to use this as a library or a subrepo?

I'm afraid not, at least for now, because the universal intrinsics (mapped intrinsics) are going to be heavily expanded depending on need, and I think they should be part of NumPy's core. Later we should forbid, or at least reduce, the direct use of raw SIMD intrinsics and rely only on universal intrinsics for CPU optimizations.


@edelsohn,

This sounds very hostile to NumPy supporting other architectures.

Actually, most of the work here is mainly for x86, due to the need to deal with many SIMD extensions, compilers, and platforms, which makes it somewhat complicated. I don't see VSX or even NEON creating any obstacles here. Supporting BE mode for VSX may be a bit challenging, but I'm planning to implement it in a separate pull request after I finish re-implementing the current optimized kernels using the new SIMD interface.


@rgommers,

That said, this is quite a bit of code.

Yes, well, I had to add some fundamental universal intrinsics just for testing purposes, but I will certainly add a lot more depending on need.

I believe a lot of it is well-tested though

I believe we shouldn't put our trust in any intrinsics, even ones that map directly to a native instruction set, since
compilers aren't developed by angels; that may be why we should try to cover all the used intrinsics with unit tests.

taken over from OpenCV (correct me if I'm wrong @seiko2plus)

Like I mentioned before in #13393, the whole roadmap is inspired by OpenCV, though not with exactly the same design or code; still, the similarity is very high, especially in the infrastructure.

there's a TODO for an example

Of course, I will not be able to cover all areas in this PR, but I think it's necessary to provide a decent example for illustrative and testing purposes. Please feel free to add any extra tasks.

what parts are and aren't covered by CI (we have PPC on TravisCI, were thinking about ARM via drone.io I believe, and conda-forge has ARM too)?

If it's necessary, having a custom CI using "buildbot" could be an alternative solution too.


@charris,

You have a number of new files dedicated to this work, so it might be nice to put them together in subdirectories, maybe organized by architecture. Note that the libraries have their own setup and include files and configure the system.

Do you think the current work isn't organized enough?

So the question here is how should we organize the code, not whether we should have it

Although this work is still not complete, I would truly appreciate it if you could provide a proposal or notes of any kind.


@rkern,

it seems to me, after a cursory inspection, that how you want to do the distutils integration might be dispositive. It would be nice to package this up into a separate library with its own distutils integration tooling so that I could write other packages that use this code without needing to use numpy.distutils. numpy.distutils works well for numpy and scipy, but we try not to make it necessary for packages that just want to build against numpy's functionality.

On the other hand, I recognize that the distutils integration required is fairly significant, and welding together two significant distutils augmentations is often fraught, so I can see an argument for avoiding it and just making this one more feature that numpy.distutils alone provides, like f2py. If that's the case, then numpy/numpy is the right place. There isn't much else that would be profitable to break out into a separate package.

One thing to note is that we ought to be able to use this infrastructure to build another package that uses CPU features that the numpy binary itself was not compiled to use. I don't see why that would not be the case in this PR, but I just wanted it stated explicitly. Organizing this as a separate library may help enforce that when future modifications are made.

Sure, it's a great idea, and it seems the community prefers that way (a separate package). But every project has its own needs, which means finding compatibilities and a flexible design, which means more time and more issues. I think we should focus on the main purpose instead, at least for now, and have some fun optimizing kernels :).

@rgommers
Member

rgommers commented Jul 4, 2019

if it's necessary having a custom CI using "buildbot" could be an alternative solution too.

No, not necessary, and in fact very much undesired. We used to have those in the pre-TravisCI age and they're a maintenance nightmare. We'll just cover what we can with the hosted CI platforms we have, and the rest we test manually at PR develop/merge time and rely on community input to keep things working. That's the way it is now too for AIX, Sparc, etc. We'll just slowly expand coverage as we go.

@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch 3 times, most recently from c1c7d9e to 8802a35 Compare August 22, 2019 13:40
  Add new C header located at `core/src/common/npy_cpu_dispatch.h`

    the new header works as a bridge linking the headers and macros
    generated by `CCompilerOpt` with the NumPy runtime CPU feature
    detection API, by providing several C macros that can be used to
    dispatch the generated objects from dispatch-able sources.
  - Add new attributes to umath module

    * __cpu_baseline__ a list containing the minimal set of required optimizations
      supported by the compiler and platform, according to the values specified
      for the command argument '--cpu-baseline'.

    * __cpu_dispatch__ a list containing the dispatched set of additional optimizations
      supported by the compiler and platform, according to the values specified
      for the command argument '--cpu-dispatch'.

  - Print required and additional optimizations during the run of PytestTester
   Add a testing unit for the utilities of the CPU dispatcher
@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch from ccfad27 to 93fee4e Compare June 16, 2020 16:42
@seiko2plus seiko2plus force-pushed the core_improve_infa_build branch from 93fee4e to e726538 Compare June 16, 2020 16:44
@mattip mattip merged commit 8245b39 into numpy:master Jun 17, 2020
@mattip
Member

mattip commented Jun 17, 2020

Thanks @seiko2plus. Step 1 is done!

@seiko2plus
Member Author

I started to love the purple color, thank you!

@charris
Member

charris commented Jul 4, 2020

@seiko2plus This PR raises compiler warnings with fedora gcc 10 when AVX is missing.

/home/charris/Workspace/numpy.git/numpy/distutils/checks/cpu_avx512f.c:5:13: error: AVX512F vector return without AVX512F enabled changes the ABI [-Werror=psabi]
    5 |     __m512i a = _mm512_abs_epi32(_mm512_setzero_si512());
      |             ^
In file included from /usr/lib/gcc/x86_64-redhat-linux/10/include/immintrin.h:55,
                 from /home/charris/Workspace/numpy.git/numpy/distutils/checks/cpu_avx512f.c:1:
/usr/lib/gcc/x86_64-redhat-linux/10/include/avx512fintrin.h:15594:1: error: inlining failed in call to ‘always_inline’ ‘_mm512_castsi512_si128’: target specific option mismatch
15594 | _mm512_castsi512_si128 (__m512i __A)
      | ^~~~~~~~~~~~~~~~~~~~~~

EDIT: Also for other avx checks. Seems the checks should not be compiled without AVX.

@seiko2plus
Member Author

@charris, class CCompilerOpt performs two tests for each required feature to determine compiler support:

  • test the flags without any involved intrinsics, to determine the availability of the flags.
  • test the source file with or without the flags, depending on the availability of the flags (friendly to MSVC).

If both tests fail, the feature is skipped; but since we're dealing with GCC 10, both tests should pass.
According to the provided error, CCompilerOpt was testing the check file cpu_avx512f.c without providing the AVX512F flag -mavx512f, which means the flags test failed.

However, I tested with GCC 10.1.0 and it looks fine to me.
Could you please provide the build log, or at least the failure part related to testing the flag -mavx512f?
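The two-stage probe described above can be mimicked from the command line. The following is a standalone sketch using subprocess, not CCompilerOpt's actual code; it assumes a gcc-compatible `cc` (or `gcc`) is on PATH and simply reports whether each stage succeeded.

```python
import os
import shutil
import subprocess
import tempfile

def probe(flag, source):
    """Stage 1: does the compiler accept `flag` on an empty program?
    Stage 2: does `source` compile, with the flag only if stage 1 passed?
    Returns True/False, or None when no compiler is available."""
    cc = shutil.which("cc") or shutil.which("gcc")
    if cc is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "test.c")
        obj = os.path.join(tmp, "test.o")
        # Stage 1: trivial translation unit, flag only.
        with open(src, "w") as f:
            f.write("int main(void){return 0;}\n")
        flag_ok = subprocess.run([cc, flag, "-c", src, "-o", obj],
                                 capture_output=True).returncode == 0
        # Stage 2: the real feature test file, with the flag when available.
        with open(src, "w") as f:
            f.write(source)
        args = [cc] + ([flag] if flag_ok else []) + ["-c", src, "-o", obj]
        return subprocess.run(args, capture_output=True).returncode == 0
```

With this scheme, charris's error corresponds to stage 2 being run without `-mavx512f` because stage 1 was judged to have failed.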

@charris
Member

charris commented Jul 8, 2020

@seiko2plus I don't see that warning anymore, may have changed with a recent kernel update. Let me try with an older kernel.

@charris
Member

charris commented Jul 8, 2020

No, they are still there, but tests run fine. I think the errors are configuration output that runtests.py should not be showing, so this is probably some sort of build organization problem. Are those the sort of errors you would expect during configuration? @mattip IIRC, you were the one who suppressed the configuration output, thoughts?

@mattip
Member

mattip commented Jul 8, 2020

Could you put the complete build log up somewhere like https://paste.ubuntu.com/ ?

@charris
Member

charris commented Jul 8, 2020

@mattip Here: https://paste.ubuntu.com/p/VH7x8qJkVS/ . Search for error:.

@seiko2plus
Member Author

seiko2plus commented Jul 8, 2020

@charris, hmm, it seems '-march=native' is provided through the environment variable CFLAGS, or it is part of the distutils config vars. See the final build log:

CPU baseline  : 
  Requested   : 'native'
  Enabled     : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2
  Flags       : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2

CPU dispatch  : 
  Requested   : 'max -xop -fma4'
  Enabled     : AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
  Generated   : none
CCompilerOpt._cache_write[786] : write cache to path -> /home/charris/Workspace/numpy.git/build/temp.linux-x86_64-3.9/ccompiler_opt_cache_ext.py

########### CLIB COMPILER OPTIMIZATION ###########
CPU baseline  : 
  Requested   : 'native'
  Enabled     : SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2
  Flags       : -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2

CPU dispatch  : 
  Requested   : 'max -xop -fma4'
  Enabled     : AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
  Generated   : none
CCompilerOpt._cache_write[786] : write cache to path -> /home/charris/Workspace/numpy.git/build/temp.linux-x86_64-3.9/ccompiler_opt_cache_clib.py

The failure messages are normal in this case, since CCompilerOpt was testing all features with the native flag to determine
the features supported by the running machine at build time.
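The rule in play here (also stated in the NOTES above) can be sketched as follows. The function name and structure are illustrative, not CCompilerOpt's actual code: if a native flag is smuggled in via CFLAGS, the effective baseline becomes "native" regardless of --cpu-baseline.

```python
import os

# Compiler flags that request the host machine's full feature set.
NATIVE_FLAGS = ("-march=native", "-xHost", "/QxHost")

def effective_baseline(cli_value="min"):
    """Return the baseline that would actually be requested: 'native' when
    CFLAGS carries a native flag, otherwise the --cpu-baseline value."""
    cflags = os.environ.get("CFLAGS", "").split()
    if any(flag in cflags for flag in NATIVE_FLAGS):
        return "native"
    return cli_value
```

This is consistent with the log above showing `Requested : 'native'` even though --cpu-baseline was left at its default.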

@mattip
Member

mattip commented Jul 8, 2020

This is part of "build_clib" (line 627). The configuration work is done during "build_src" (line 135). Maybe we should apply the same tricks in CCompilerOpt to ignore warnings that we apply during build_src.

@seiko2plus
Member Author

@mattip, why should we suppress the error messages?

@seiko2plus
Member Author

@charris,

I think the errors are configuration output that runtests.py should not be showing, so this is probably some sort of build organization problem.

Yes, it should be shown if --show-build-log is enabled.

Are those the sort of errors you would expect during configuration?

Yes, when the compiler doesn't support a certain feature.

").*"
)
@staticmethod
def _dist_test_spawn(cmd, display=None):
Member Author


@mattip, @charris, we can suppress errors from here. If it's necessary, I can add support for verbose levels.

@mattip
Member

mattip commented Jul 10, 2020

I am not sure whether we should suppress the warnings when someone specifies -march=native or not.

Labels
01 - Enhancement component: numpy.distutils component: SIMD Issues in SIMD (fast instruction sets) code or machinery