-
-
Notifications
You must be signed in to change notification settings - Fork 2
algo: implement sine/cosine based on Intel SVML for single/double precision #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add sin/cos functions with ≤1 ULP and ≤4 ULP accuracy variants - Support large argument ranges with proper range reduction - Standalone implementation doesn't require libm - Written using Google Highway for SIMD optimization - Optimize for both float32 and float64 data types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements highly accurate sine and cosine functions using Intel SVML techniques and extended precision range reduction, with separate implementations for small and large argument cases. Key changes include:
- New implementations of sin/cos functions that guarantee ≤1 ULP (single precision) and ≤4 ULP (double precision) accuracy.
- Support for large argument range reduction using multi-precision techniques based on a precomputed lookup table.
- Integration with the Google Highway library for platform-specific SIMD optimizations.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
npsr/utils-inl.h | Adds utility functions with conditional compilation and lookup support. |
npsr/trig/small-inl.h | Provides polynomial approximations for small-angle sin/cos calculations. |
npsr/trig/lut-inl.h.py | Implements a Python script to generate lookup tables for trigonometric reductions and approximations. |
npsr/trig/large-inl.h | Implements the extended precision algorithm for large argument reduction. |
npsr/trig/inl.h | Integrates the small and large argument implementations into a unified sin/cos API. |
npsr/npsr.h | Wraps the trigonometric headers ensuring single inclusion per target. |
npsr/common.h | Provides common macros, type definitions, and an RAII class for floating‑point environment control. |
Comments suppressed due to low confidence (2)
npsr/trig/small-inl.h:338
- Cosine-specific adjustments are marked as TODO and not yet implemented. Please complete the phase adjustment logic (i.e. correctly apply the π/2 shift) to ensure cosine accuracy matches the sine pathway.
// TODO: Implement cosine-specific adjustments
npsr/trig/large-inl.h:147
- [nitpick] Consider adding more inline comments to explain the rationale behind the magic numbers used for fractional shifts to improve code clarity and maintainability.
constexpr int kFracMidShift = kIsSingle ? 18 : 24;
npsr/common.h
Outdated
* | ||
* Precise precise = {kLowAccuracy, kNoSpecialCases, kNoLargeArgument}; | ||
* const ScalableTag<float> d; | ||
* typename V = Vec<DFromV<SclableTag>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo in the documentation comment: 'SclableTag' should be corrected to 'ScalableTag'.
* typename V = Vec<DFromV<SclableTag>>; | |
* typename V = Vec<DFromV<ScalableTag>>; |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think this is a good starting point. Do you want to land this with #if 0
and other various things incomplete @seiko2plus? We can be a bit relaxed as we setup the repo.
#ifdef NPSR_UTILS_INL_H_ | ||
#undef NPSR_UTILS_INL_H_ | ||
#else | ||
#define NPSR_UTILS_INL_H_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a lut-inl.h
, so we can haev LutX2
, LutX4
etc?
A generic utils-inl.h
is going to get pretty big I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, mainly to optimize 128-bit simd extensions, transpose and shuffles to reduce cache/memory access
npsr/common.h
Outdated
struct _Zero {}; | ||
static constexpr auto kForce = _Force{}; | ||
static constexpr auto kNearest = _Nearest{}; | ||
#if 0 // not used yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add these when we come across use cases for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, this good point .. I'm gonna remove them
|
||
namespace npsr::HWY_NAMESPACE::sincos { | ||
template <bool IS_COS, typename V, typename Prec> | ||
HWY_API V SinCos(Prec &prec, V x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially worth creating a new macro for HWY_API
so it doesn't change behaviour for Highway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest a name? HWY_API just combine HWY_ATTR and HWY_INLINE.
npsr/trig/inl.h
Outdated
#include "npsr/sincos/small-inl.h" | ||
|
||
// clang-format off | ||
#if defined(NPSR_TRIG_INL_H_) == defined(HWY_TARGET_TOGGLE) // NOLINT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be trig-inl.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
- Add Python-based generator tool for creating C++ headers and Python templates - Include sollya wrapper for mathematical computations - Add utilities for formatting C/Python arrays and header generation - Include comprehensive .gitignore for Python/C++ development
- Configure spin commands for building, testing, and development - Add custom generate command for sollya-based file generation - Set up IPython integration with pre-imported numpy_sr module - Configure project metadata and build requirements
- Implement Python bindings for Highway SIMD intrinsics - Add type system for scalar/vector data with proper conversions - Create dynamic dispatch system for multiple SIMD targets - Include helper functions for ULP distance calculations - Set up pytest infrastructure for comprehensive testing
Match Highway library's code formatting conventions
Highway provides portable SIMD intrinsics required by the testing framework
Initialize build configuration for npsr module
- Generate test vectors using Sollya for high-precision reference values - Test challenging cases including large arguments, pi multiples, and edge cases - Cover both float32 and float64 precision with ULP accuracy validation - Include tests for standard and low-accuracy modes
Clean up empty PreciseBind function in preparation for proper implementation
Enable trigonometric functions in the Python testing framework
- Correct polynomial coefficients for double precision cosine - Fix table indexing and data layout for lookup tables - Improve accuracy for large argument range reduction
- Implement efficient SVML-style linear approximation for sin/cos - Fix precision issues in Cody-Waite range reduction - Add proper sign handling for both sine and cosine - Improve polynomial evaluation with correction terms - Document mathematical equivalence between traditional and optimized approaches
- Move Precise class and related types to npsr/precise.h - Add deduction guides for cleaner template instantiation - Improve modularity by separating precision control from common utilities
…ures - Change parameter order to Precise<Args...>, V for consistency - Update include paths from sincos/ to trig/ - Fix constant types and naming conventions (kException -> kExceptions)
Add meson configuration to include trigonometric test data in installation
Define Python dependencies for building and testing: - meson/meson-python for build system - ninja for compilation - spin for development workflow - pytest for test execution
I was putting all my time into the testing unit, also improve luts generations.. so I left any extra optimizations off until I make sure that both functions sin/cos are well implemented. |
All stress tests passed localy on x86, just need a CI and nice cleanup and then it will be ready for review ... maybe I should split it up into multiple prs |
Yep, this is unreviewable as-is, so incremental PRs would be much appreciated 😸 |
Yes please, I am having a hard time reviewing this. |
Working on it. |
Uh oh!
There was an error while loading. Please reload this page.