
ENH: Convert comparison from C universal intrinsics to C++ using Highway #28490


Open: wants to merge 14 commits into main

Conversation

Contributor

@ixgbe ixgbe commented Mar 13, 2025

No description provided.

@ixgbe
Contributor Author

ixgbe commented Mar 24, 2025

Hello @r-devulap

I apologize for the interruption, as I know you must be very busy. I submitted PR #28490 a while ago and was wondering if you might have time to review it when your schedule permits. I understand you're likely juggling many priorities, but I'd greatly appreciate any feedback you could provide on my contribution.

Thank you for all your work maintaining NumPy!

Best regards!

@r-devulap r-devulap requested a review from seiko2plus March 24, 2025 21:52
@r-devulap
Member

@ixgbe thanks for your contribution. @seiko2plus had a few comments about using a wrapper around Highway tags to make it easier to develop template functions. I have tagged him to add his comments as well.

@r-devulap
Member

From #21057: If I understood correctly, it was along the lines of

template <typename T>
struct OpEq {
#if NPY_SIMD
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE Vec<T> operator()(const Vec<T_> &a)
    { return a; }
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE auto operator()(const Vec<T_> &a, const Vec<T_> &b)
    { return Eq(a, b); }
#endif
    NPY_FINLINE T operator()(T a)
    { return a; }
    NPY_FINLINE npy_bool operator()(T a, T b)
    { return a == b; }
};
template <typename T>
struct OpNe {
#if NPY_SIMD
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE Vec<T> operator()(const Vec<T_> &a)
    { return a; }
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE auto operator()(const Vec<T_> &a, const Vec<T_> &b)
    { return Ne(a, b); }
#endif
    NPY_FINLINE T operator()(T a)
    { return a; }
    NPY_FINLINE npy_bool operator()(T a, T b)
    { return a != b; }
};
template <typename T>
struct OpLt {
#if NPY_SIMD
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE Vec<T> operator()(const Vec<T_> &a)
    { return a; }
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE auto operator()(const Vec<T_> &a, const Vec<T_> &b)
    { return Lt(a, b); }
#endif
    NPY_FINLINE T operator()(T a)
    { return a; }
    NPY_FINLINE npy_bool operator()(T a, T b)
    { return a < b; }
};
template <typename T>
struct OpLe {
#if NPY_SIMD
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE Vec<T> operator()(const Vec<T_> &a)
    { return a; }
    template <typename T_ = std::enable_if_t<kSupportLane<T>, T>>
    NPY_FINLINE auto operator()(const Vec<T_> &a, const Vec<T_> &b)
    { return Le(a, b); }
#endif
    NPY_FINLINE T operator()(T a)
    { return a; }
    NPY_FINLINE npy_bool operator()(T a, T b)
    { return a <= b; }
};

@ixgbe
Contributor Author

ixgbe commented Mar 27, 2025

Hi @seiko2plus,
I've updated the code to wrap Highway tags as suggested (#28490 (comment)). Could you please review the implementation when you have time?
Let me know if any further adjustments are needed. Thanks!

Member

@seiko2plus seiko2plus left a comment

During the last optimization meeting, I proposed a thin wrapper over Google's Highway
SIMD library to simplify its interface. The wrapper would eliminate the need for
class tags and use lane types directly, which can be deduced from the arguments in
most cases. We can also leverage namespaces for low-level register access and still
rely on lane type only.

While your last commits try to follow the structure of the proposed C++ universal
intrinsics example, the code still counts on tags.

So, I suggest using the following snippets as a reference to implement the wrapper:

We are going to need two headers to implement the wrapper: the first is the simd.hpp
header, which provides the SIMD interface, and the second is simd.inc.hpp, a template
header that is included multiple times under different namespaces to provide the
implementation of the SIMD interface.

The simd.hpp header will look like this:

/**
 * This header provides a thin wrapper over Google's Highway SIMD library.
 *
 * The wrapper aims to simplify the SIMD interface of Google's Highway by
 * getting rid of its class tags and using lane types directly, which can be
 * deduced from the arguments in most cases.
 */

#ifndef NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
#define NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_

#ifndef NPY_DISABLE_OPTIMIZATION
// Highway SIMD is only available when optimization is enabled
#include <hwy/highway.h>
#define NPY_SIMDX 1  // since NPY_SIMD indicates the C universal intrinsics interface
#else
#define NPY_SIMDX 0
#endif

namespace np {

/// Represents the max SIMD width supported by the platform.
namespace simd {
#if NPY_SIMDX
/// The highway namespace alias.
/// We can not import all the symbols from the HWY_NAMESPACE because it will
/// conflict with the existing symbols in the numpy namespace.
namespace hn = hwy::HWY_NAMESPACE;
// internally used by the template header
template <typename TLane>
using _Tag = hn::ScalableTag<TLane>;
#endif
#include "simd.inc.hpp"
}  // namespace simd

/// Represents the 128-bit SIMD width.
namespace simd128 {
#if NPY_SIMDX
namespace hn = hwy::HWY_NAMESPACE;
template <typename TLane>
using _Tag = hn::Full128<TLane>;
#endif
#include "simd.inc.hpp"
}  // namespace simd128

}  // namespace np

#endif  // NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_

and the simd.inc.hpp header will look like this:

#ifndef NPY_SIMDX
#error "This is not a standalone header; use simd.hpp instead"
#endif

// NOTE: This file is included by simd.hpp multiple times with different namespaces
// so avoid including any headers here

// #define NPY_SIMDX 1  // uncomment to enable Highlighting

/**
 * Determines whether the specified lane type is supported by the SIMD extension.
 * Always defined as false when SIMD is not enabled so it can be used in SFINAE.
 *
 * @tparam TLane The lane type to check for support.
 */
template <typename TLane>
constexpr bool kSupportLane = NPY_SIMDX != 0;

#if !NPY_SIMDX
// defined as void when SIMD is not enabled, for SFINAE purposes
template <typename TLane>
using Vec = void;
// defined as void when SIMD is not enabled, for SFINAE purposes
template <typename TLane>
using Mask = void;
#endif

#if NPY_SIMDX

#if HWY_HAVE_FLOAT16
template <>
constexpr bool kSupportLane<Half> = true;
#endif

#if HWY_HAVE_FLOAT64
template <>
constexpr bool kSupportLane<double> = true;
#endif

/// Represents an N-lane vector based on the specified lane type.
template <typename TLane>
using Vec = hn::Vec<_Tag<TLane>>;

/// Represents a mask vector with boolean values or as a bitmask.
template <typename TLane>
using Mask = hn::Mask<_Tag<TLane>>;

/// Unaligned load of a vector from memory.
template <typename TLane>
HWY_API Vec<TLane>
LoadU(const TLane *ptr)
{
    return hn::LoadU(_Tag<TLane>(), ptr);
}

/// Unaligned store of a vector to memory.
template <typename TLane>
HWY_API void
StoreU(const Vec<TLane> &a, TLane *ptr)
{
    hn::StoreU(a, _Tag<TLane>(), ptr);
}

/// Returns the number of vector lanes based on the lane type.
template <typename TLane>
HWY_API constexpr size_t
Lanes(TLane tag = 0 /*optional tag*/)
{
    return hn::Lanes(_Tag<TLane>());
}

/// Returns an uninitialized N-lane vector.
template <typename TLane>
HWY_API Vec<TLane>
Undefined(TLane tag = 0 /*optional tag*/)
{
    return hn::Undefined(_Tag<TLane>());
}

// Import common Highway intrinsics:
using hn::Add;
using hn::And;
using hn::Mul;
using hn::Not;
using hn::Or;
using hn::Sub;
using hn::Xor;

#endif  // NPY_SIMDX

"@ixgbe, please hold off on applying this suggestion, even if you agree with it, until we reach consensus on the final interface design. @r-devulap, @Mousius, and @jan-wassenberg - your input on this matter would be valuable."

Note this is not my final review; I'm just adding some examples:

#include "simd/simd.h"
#include "loops_utils.h"
#include "loops.h"
#include <hwy/highway.h>
Member

Suggested change
#include <hwy/highway.h>
#include "simd/simd.hpp"

will be used instead; note that Highway SIMD is only available when optimization is enabled (the default, through a meson flag)


namespace {

namespace hn = hwy::HWY_NAMESPACE;
Member

Suggested change
namespace hn = hwy::HWY_NAMESPACE;
using namespace np::simd;

example 0

Comment on lines 12 to 65
const hn::ScalableTag<uint8_t> u8;
const hn::ScalableTag<int8_t> s8;
const hn::ScalableTag<uint16_t> u16;
const hn::ScalableTag<int16_t> s16;
const hn::ScalableTag<uint32_t> u32;
const hn::ScalableTag<int32_t> s32;
const hn::ScalableTag<uint64_t> u64;
const hn::ScalableTag<int64_t> s64;
const hn::ScalableTag<float> f32;
const hn::ScalableTag<double> f64;

using vec_u8 = hn::Vec<decltype(u8)>;
using vec_s8 = hn::Vec<decltype(s8)>;
using vec_u16 = hn::Vec<decltype(u16)>;
using vec_s16 = hn::Vec<decltype(s16)>;
using vec_u32 = hn::Vec<decltype(u32)>;
using vec_s32 = hn::Vec<decltype(s32)>;
using vec_u64 = hn::Vec<decltype(u64)>;
using vec_s64 = hn::Vec<decltype(s64)>;
using vec_f32 = hn::Vec<decltype(f32)>;
using vec_f64 = hn::Vec<decltype(f64)>;

template<typename T>
struct TagSelector;

template<> struct TagSelector<uint8_t> { static const auto& value() { return u8; } };
template<> struct TagSelector<int8_t> { static const auto& value() { return s8; } };
template<> struct TagSelector<uint16_t> { static const auto& value() { return u16; } };
template<> struct TagSelector<int16_t> { static const auto& value() { return s16; } };
template<> struct TagSelector<uint32_t> { static const auto& value() { return u32; } };
template<> struct TagSelector<int32_t> { static const auto& value() { return s32; } };
template<> struct TagSelector<uint64_t> { static const auto& value() { return u64; } };
template<> struct TagSelector<int64_t> { static const auto& value() { return s64; } };
template<> struct TagSelector<float> { static const auto& value() { return f32; } };
template<> struct TagSelector<double> { static const auto& value() { return f64; } };

template<typename T>
constexpr const auto& GetTag() {
    return TagSelector<T>::value();
}

template <typename T>
constexpr bool kSupportLane = false;

template <> constexpr bool kSupportLane<uint8_t> = true;
template <> constexpr bool kSupportLane<int8_t> = true;
template <> constexpr bool kSupportLane<uint16_t> = true;
template <> constexpr bool kSupportLane<int16_t> = true;
template <> constexpr bool kSupportLane<uint32_t> = true;
template <> constexpr bool kSupportLane<int32_t> = true;
template <> constexpr bool kSupportLane<uint64_t> = true;
template <> constexpr bool kSupportLane<int64_t> = true;
template <> constexpr bool kSupportLane<float> = true;
template <> constexpr bool kSupportLane<double> = true;
Member

Suggested change (remove the whole block above);

there will be no need for it.

struct OpEq {
#if NPY_SIMD
template <typename V, typename = std::enable_if_t<kSupportLane<T>>>
HWY_INLINE HWY_ATTR auto operator()(const V &v)
Member

@seiko2plus seiko2plus Mar 28, 2025

Suggested change
HWY_INLINE HWY_ATTR auto operator()(const V &v)
HWY_INLINE auto operator()(const V &v)

No need for it, I suppose; and even if we decide to move to the Highway CPU dispatcher, its start/end macros can be used instead.

Comment on lines 268 to 270
const int vstep = hn::Lanes(u8);
const size_t nlanes = hn::Lanes(GetTag<T>());
const vec_u8 truemask = hn::Set(u8, 0x1);
Member

Suggested change
const int vstep = hn::Lanes(u8);
const size_t nlanes = hn::Lanes(GetTag<T>());
const vec_u8 truemask = hn::Set(u8, 0x1);
const int vstep = Lanes<uint8_t>();
const size_t nlanes = Lanes<T>();
const Vec<uint8_t> truemask = Set(uint8_t(0x1));

example 1

Comment on lines 274 to 275
auto a1 = op(hn::LoadU(GetTag<T>(), src1 + nlanes * 0));
auto b1 = op(hn::LoadU(GetTag<T>(), src2 + nlanes * 0));
Member

Suggested change
auto a1 = op(hn::LoadU(GetTag<T>(), src1 + nlanes * 0));
auto b1 = op(hn::LoadU(GetTag<T>(), src2 + nlanes * 0));
auto a1 = op(LoadU(src1 + nlanes * 0));
auto b1 = op(LoadU(src2 + nlanes * 0));

example 2

Comment on lines 290 to 291
auto m3_vec = hn::VecFromMask(GetTag<T>(), m3);
auto m4_vec = hn::VecFromMask(GetTag<T>(), m4);
Member

Suggested change
auto m3_vec = hn::VecFromMask(GetTag<T>(), m3);
auto m4_vec = hn::VecFromMask(GetTag<T>(), m4);
auto m3_vec = VecFromMask<T>(m3);
auto m4_vec = VecFromMask<T>(m4);

example 3

}
}
else {
ret = hn::BitCast(u8, m1_vec);
Member

Suggested change
ret = hn::BitCast(u8, m1_vec);
ret = BitCast<uint8_t>(m1_vec);

example 4

else {
ret = hn::BitCast(u8, m1_vec);
}
hn::StoreU(hn::And(ret, truemask), u8, dst);
Member

Suggested change
hn::StoreU(hn::And(ret, truemask), u8, dst);
StoreU(And(ret, truemask), dst);

example 5

template <typename T=uint8_t>
struct OpGeBool {};

#if !defined(__s390x__) && !defined(__arm__) && !defined(__loongarch64) && !defined(__loongarch64__)
Member

Suggested change
#if !defined(__s390x__) && !defined(__arm__) && !defined(__loongarch64) && !defined(__loongarch64__)

Snap for the build error, please

Contributor Author

Snap for the build error, please

build error: OrderedTruncate2To is not a member of hn

I use hn::OrderedTruncate2To; however, it seems there is no support for hn::OrderedTruncate2To on the s390x, ARM32, and loongarch64 architectures.

Member

@seiko2plus seiko2plus Mar 28, 2025

OrderedTruncate2To is only available when HWY_TARGET != HWY_SCALAR. Highway appears to be falling back to the scalar implementation, which makes sense for architectures like z13 and loongarch that Highway doesn't directly support. For armhf and risc-v, further investigation is needed.

To avoid performance regression, implementing new Highway intrinsics to pack mask data types (similar to the universal intrinsics) would be better than the current approach of bitmask-to-vector conversion followed by packing. Ideally, only one VecFromMask would be needed for lane types larger than 8 bits. This mask conversion is zero-cost only for SIMD extensions that don't provide native mask operations.

Contributor Author

OrderedTruncate2To is only available when HWY_TARGET != HWY_SCALAR. Highway appears to be falling back to the scalar implementation, which makes sense for architectures like z13 and loongarch that Highway doesn't directly support. For armhf and risc-v, further investigation is needed.

To avoid performance regression, implementing new Highway intrinsics to pack mask data types (similar to the universal intrinsics) would be better than the current approach of bitmask-to-vector conversion followed by packing. Ideally, only one VecFromMask would be needed for lane types larger than 8 bits. This mask conversion is zero-cost only for SIMD extensions that don't provide native mask operations.

Thanks for your advice! I'm not familiar with the new Highway intrinsics for packing mask data types; could you please explain?

Member

OrderedTruncate2To is only available when HWY_TARGET != HWY_SCALAR. Highway appears to be falling back to the scalar implementation, which makes sense for architectures like z13 and loongarch that Highway doesn't directly support. For armhf and risc-v, further investigation is needed.

Don't we already have scalar implementations and therefore we should never be building with HWY_SCALAR for fear of polluting the dispatch sources?

Contributor Author

OrderedTruncate2To is only available when HWY_TARGET != HWY_SCALAR. Highway appears to be falling back to the scalar implementation, which makes sense for architectures like z13 and loongarch that Highway doesn't directly support. For armhf and risc-v, further investigation is needed.

To avoid performance regression, implementing new Highway intrinsics to pack mask data types (similar to the universal intrinsics) would be better than the current approach of bitmask-to-vector conversion followed by packing. Ideally, only one VecFromMask would be needed for lane types larger than 8 bits. This mask conversion is zero-cost only for SIMD extensions that don't provide native mask operations.

For risc-v, OrderedTruncate2To is supported.

Member

Thanks for your advice! I'm not familiar with the new Highway intrinsics for packing mask data types; could you please explain?

As far as I know, there's no available intrinsic that performs mask packing. To implement such functionality, you would need to submit a PR to the upstream repository. Once it gets merged, we can update the Highway submodule accordingly. Alternatively, you could temporarily implement it directly in the source code itself to speed up the process.

We can also postpone this step until we run benchmarks. Modern compilers may optimize out these extra unnecessary operations. In my last benchmark of universal intrinsics in C++, I observed performance gains only for 8-bit types, while higher bit-width types performed similarly to the C code. We expect the new Highway code to show similar performance improvements primarily for 8-bit data types.

Don't we already have scalar implementations and therefore we should never be building with HWY_SCALAR for fear of polluting the dispatch sources?

I agree, meson.build needs to be updated to remove z13's VX:

[
AVX512_SKX, AVX512F, AVX2, SSE42, SSE2,
VSX3, VSX2,
NEON,
VXE, VX,
LSX,
RVV,

We should remove avx512f and keep only AVX512_SKX since Highway doesn't support Xeon Phi processors (which Intel no longer supports).

LSX should remain as-is (HWY WIP), it's part of the LoongArch CPU baseline and is statically dispatched anyway.

However, we should not use Highway when scalar mode is enabled.

For risc-v, OrderedTruncate2To is supported.

Could you provide the build log (build/meson-logs/meson-log.txt)? I need to see which flags are being provided to the CPU dispatcher. Another thing that caught my attention: how were you able to enable the SIMD path for RVV under #if NPY_SIMD? NPY_SIMD is part of the C universal intrinsics and should return 0 on RVV.

Contributor Author

build/meson-logs/meson-log.txt : meson-log.txt

Contributor Author

Could you provide the build log (build/meson-logs/meson-log.txt)? I need to see which flags are being provided to the CPU dispatcher. Another thing that caught my attention: how were you able to enable the SIMD path for RVV under #if NPY_SIMD? NPY_SIMD is part of the C universal intrinsics and should return 0 on RVV.

I am mistaken, you are right. It only compiled successfully on RISC-V.

@ixgbe ixgbe requested a review from seiko2plus March 28, 2025 03:07
@r-devulap
Member

During the last optimization meeting, I proposed a thin wrapper over Google's Highway
SIMD library to simplify its interface. The wrapper would eliminate the need for
class tags and use lane types directly, which can be deduced from the arguments in
most cases. We can also leverage namespaces for low-level register access and still
rely on lane type only.

Can we have Highway own this wrapper which can simplify its usage across other projects too? @jan-wassenberg

@jan-wassenberg
Contributor

During the meeting, I think Sayed preferred it be in numpy. I don't have a strong opinion.

On OrderedTruncate2To: yes, that's supported on all targets except scalar. I think the issue is that the compiler flags are missing the ones that explicitly request RVV, and runtime dispatch is not yet workable due to compiler bugs, e.g. llvm/llvm-project#56592.

@seiko2plus
Member

During the meeting, I think Sayed preferred it be in numpy. I don't have a strong opinion.

I would prefer to keep it in NumPy for a while until it matures enough, and then pass it to the Highway main repository. I initiated this via PR #28622.

5 participants