Skip to content

algo: implement sine/cosine based on Intel SVML for single/double precision #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

seiko2plus
Copy link
Member

@seiko2plus seiko2plus commented Jun 4, 2025

  • Add sin/cos functions with ≤1 ULP and ≤4 ULP accuracy variants
  • Support large argument ranges with proper range reduction
  • Standalone implementation doesn't require libm
  • Written using Google Highway
  • implemented for both float32 and float64 data types

- Add sin/cos functions with ≤1 ULP and ≤4 ULP accuracy variants
- Support large argument ranges with proper range reduction
- Standalone implementation doesn't require libm
- Written using Google Highway for SIMD optimization
- Optimize for both float32 and float64 data types
@seiko2plus seiko2plus changed the title algo: implement sine/cosine using Intel SVML for single/double precision algo: implement sine/cosine based on Intel SVML for single/double precision Jun 4, 2025
@r-devulap r-devulap self-assigned this Jun 6, 2025
@r-devulap r-devulap requested a review from Copilot June 6, 2025 15:49
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements highly accurate sine and cosine functions using Intel SVML techniques and extended precision range reduction, with separate implementations for small and large argument cases. Key changes include:

  • New implementations of sin/cos functions that guarantee ≤1 ULP (single precision) and ≤4 ULP (double precision) accuracy.
  • Support for large argument range reduction using multi-precision techniques based on a precomputed lookup table.
  • Integration with the Google Highway library for platform-specific SIMD optimizations.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
npsr/utils-inl.h Adds utility functions with conditional compilation and lookup support.
npsr/trig/small-inl.h Provides polynomial approximations for small-angle sin/cos calculations.
npsr/trig/lut-inl.h.py Implements a Python script to generate lookup tables for trigonometric reductions and approximations.
npsr/trig/large-inl.h Implements the extended precision algorithm for large argument reduction.
npsr/trig/inl.h Integrates the small and large argument implementations into a unified sin/cos API.
npsr/npsr.h Wraps the trigonometric headers ensuring single inclusion per target.
npsr/common.h Provides common macros, type definitions, and an RAII class for floating‑point environment control.
Comments suppressed due to low confidence (2)

npsr/trig/small-inl.h:338

  • Cosine-specific adjustments are marked as TODO and not yet implemented. Please complete the phase adjustment logic (i.e. correctly apply the π/2 shift) to ensure cosine accuracy matches the sine pathway.
    // TODO: Implement cosine-specific adjustments

npsr/trig/large-inl.h:147

  • [nitpick] Consider adding more inline comments to explain the rationale behind the magic numbers used for fractional shifts to improve code clarity and maintainability.
constexpr int kFracMidShift = kIsSingle ? 18 : 24;

npsr/common.h Outdated
*
* Precise precise = {kLowAccuracy, kNoSpecialCases, kNoLargeArgument};
* const ScalableTag<float> d;
* typename V = Vec<DFromV<SclableTag>>;
Copy link
Preview

Copilot AI Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo in the documentation comment: 'SclableTag' should be corrected to 'ScalableTag'.

Suggested change
* typename V = Vec<DFromV<SclableTag>>;
* typename V = Vec<DFromV<ScalableTag>>;

Copilot uses AI. Check for mistakes.

@Mousius Mousius mentioned this pull request Jun 9, 2025
Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this is a good starting point. Do you want to land this with #if 0 and other various things incomplete @seiko2plus? We can be a bit relaxed as we setup the repo.

#ifdef NPSR_UTILS_INL_H_
#undef NPSR_UTILS_INL_H_
#else
#define NPSR_UTILS_INL_H_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a lut-inl.h, so we can haev LutX2, LutX4 etc?

A generic utils-inl.h is going to get pretty big I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, mainly to optimize 128-bit simd extensions, transpose and shuffles to reduce cache/memory access

npsr/common.h Outdated
struct _Zero {};
static constexpr auto kForce = _Force{};
static constexpr auto kNearest = _Nearest{};
#if 0 // not used yet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add these when we come across use cases for them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, this good point .. I'm gonna remove them


namespace npsr::HWY_NAMESPACE::sincos {
template <bool IS_COS, typename V, typename Prec>
HWY_API V SinCos(Prec &prec, V x) {
Copy link
Member

@Mousius Mousius Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially worth creating a new macro for HWY_API so it doesn't change behaviour for Highway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest a name? HWY_API just combine HWY_ATTR and HWY_INLINE.

npsr/trig/inl.h Outdated
#include "npsr/sincos/small-inl.h"

// clang-format off
#if defined(NPSR_TRIG_INL_H_) == defined(HWY_TARGET_TOGGLE) // NOLINT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be trig-inl.h?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

- Add Python-based generator tool for creating C++ headers and Python templates
- Include sollya wrapper for mathematical computations
- Add utilities for formatting C/Python arrays and header generation
- Include comprehensive .gitignore for Python/C++ development
- Configure spin commands for building, testing, and development
- Add custom generate command for sollya-based file generation
- Set up IPython integration with pre-imported numpy_sr module
- Configure project metadata and build requirements
- Implement Python bindings for Highway SIMD intrinsics
- Add type system for scalar/vector data with proper conversions
- Create dynamic dispatch system for multiple SIMD targets
- Include helper functions for ULP distance calculations
- Set up pytest infrastructure for comprehensive testing
Match Highway library's code formatting conventions
Highway provides portable SIMD intrinsics required by the testing framework
Initialize build configuration for npsr module
- Generate test vectors using Sollya for high-precision reference values
- Test challenging cases including large arguments, pi multiples, and edge cases
- Cover both float32 and float64 precision with ULP accuracy validation
- Include tests for standard and low-accuracy modes
Clean up empty PreciseBind function in preparation for proper implementation
Enable trigonometric functions in the Python testing framework
- Correct polynomial coefficients for double precision cosine
- Fix table indexing and data layout for lookup tables
- Improve accuracy for large argument range reduction
- Implement efficient SVML-style linear approximation for sin/cos
- Fix precision issues in Cody-Waite range reduction
- Add proper sign handling for both sine and cosine
- Improve polynomial evaluation with correction terms
- Document mathematical equivalence between traditional and optimized approaches
- Move Precise class and related types to npsr/precise.h
- Add deduction guides for cleaner template instantiation
- Improve modularity by separating precision control from common utilities
…ures

- Change parameter order to Precise<Args...>, V for consistency
- Update include paths from sincos/ to trig/
- Fix constant types and naming conventions (kException -> kExceptions)
Add meson configuration to include trigonometric test data in installation
Define Python dependencies for building and testing:
- meson/meson-python for build system
- ninja for compilation
- spin for development workflow
- pytest for test execution
@seiko2plus
Copy link
Member Author

In general I think this is a good starting point. Do you want to land this with #if 0 and other various things incomplete @seiko2plus? We can be a bit relaxed as we setup the repo.

I was putting all my time into the testing unit, also improve luts generations.. so I left any extra optimizations off until I make sure that both functions sin/cos are well implemented.

@seiko2plus
Copy link
Member Author

All stress tests passed localy on x86, just need a CI and nice cleanup and then it will be ready for review ... maybe I should split it up into multiple prs

@seiko2plus seiko2plus mentioned this pull request Jun 10, 2025
@Mousius
Copy link
Member

Mousius commented Jun 11, 2025

All stress tests passed localy on x86, just need a CI and nice cleanup and then it will be ready for review ... maybe I should split it up into multiple prs

Yep, this is unreviewable as-is, so incremental PRs would be much appreciated 😸

@r-devulap
Copy link
Member

maybe I should split it up into multiple prs

Yes please, I am having a hard time reviewing this.

@seiko2plus
Copy link
Member Author

Yep, this is unreviewable as-is, so incremental PRs would be much appreciated
Yes please, I am having a hard time reviewing this.

Working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants