| 1 | -QPFloat 0.3 beta |
| 2 | -Release under the GPL 3.0 License. See COPYING.txt |
| 3 | - |
| 4 | -***** FEATURES ***** |
| 5 | - |
| 6 | -This library emulates the quadruple precision floating-point (IEEE-754-2008 binary128). It includes: |
| 7 | - Primitive operations such as addition, subtraction, multiplication, and division |
| 8 | - Higher operations such as natural logarithm, arbitrary base logarithm, exp, and pow |
| 9 | - Cieling, Floor, Round, Truncate, and Fraction (Fraction returns just the fractional portion) |
| 10 | - Sin, Cos, Tan, ASin, ACos, ATan, ATan2 implemented using Maclaurin series |
| 11 | - ToString function |
| 12 | - Hard coded constants such as Pi and e to full quadruple precision |
| 13 | - Implemented on little-endian, may or may not work on big-endian |
| 14 | - Follows the IEEE in-memory format on little-endian machines (transferable to and from hardware) |
| 15 | - arithmetic on sub-normals |
| 16 | - round-to-even |
| 17 | - correct propogation Inf, -Inf, and NaN |
| 18 | - enablable exception mechanisms |
| 19 | - |
| 20 | - This library contains both an unmanaged implementation and a managed implementation (atop the unmanaged). |
| 21 | - |
| 22 | -***** USING ***** |
| 23 | - |
| 24 | -VB.Net and C# users: |
| 25 | - Add a reference to Release/QPFloat.dll (x64/Release/QPFloat.dll for x64) |
| 26 | - create new variables using the System.Quadruple type (C# can do "using System;" so that you can just type Quadruple) |
| 27 | - |
| 28 | -VC++/CLI users: |
| 29 | - Add a reference to Release/QPFloat.dll (x64/Release/QPFloat.dll for x64) |
| 30 | - create new variables using System::Quadruple |
| 31 | - |
| 32 | -C++ users: |
| 33 | - Add a reference to Release/QPFloat.dll (x64/Release/QPFloat.dll for x64) |
| 34 | - #include "__float128.h" |
| 35 | - create new variables using __float128 |
| 36 | - |
| 37 | -***** COMPILING ***** |
| 38 | - |
| 39 | -Microsoft Visual C++ users: |
| 40 | - This code base can be compiled with or without /clr (Managed C++) to the compiler. |
| 41 | - If /clr IS used, then this library can be used by C# and VB.Net via the type Quadruple, the same way as Double |
| 42 | - If /clr IS NOT used, #ifdef _MANAGED has been used to automatically exclude the managed implementation. |
| 43 | - Since extension methods can't be used to add static functions to System.Math, Operations like Abs, Sin, etc are static methods of Quadruple |
| 44 | - __float128 is the unmanaged (faster) implementation, which is still present even when /clr is used. |
| 45 | - |
| 46 | -Other compiler users: |
| 47 | - This code uses #ifdef to remove Microsoft specific functionality automatically (#ifdef _MANAGED) |
| 48 | - Though I've not tried, it should be relatively easy to build using other compilers. |
| 49 | - following existing conventions, __float128 is the type proffered by this library. |
| 1 | +# QPFloat (GPL 3.0) # |
| 2 | + |
| 3 | +For high-precision mathematics, the Quadruple-Precision Floating Point library (QPFloat) emulates the IEEE 754-2008 binary128 format on x86 and x64 (and probably any other little-endian platform) using integer arithmetic and bit manipulation. It contains a native C++ and assembler implementation and a .Net C++/CLI implementation. |
| 4 | + |
| 5 | +## Features ## |
| 6 | + |
| 7 | +Much effort has been put into supporting a full feature set, including optimized transcendental functions to full precision. |
| 8 | + |
| 9 | +### Standard operations ### |
| 10 | +* addition, subtraction, multiplication, division |
| 11 | +* Min, Max, Abs, Ceiling, Floor, Round, Truncate, Fraction |
| 12 | +* ToString and FromString |
| 13 | +* Cast operators to and from native numeric data types |
| 14 | + |
| 15 | +### Numeric information ### |
| 16 | +* IsZero |
| 17 | +* IsNaN |
| 18 | +* IsInfinite |
| 19 | +* IsSigned |
| 20 | +* IsSubNormal |
| 21 | + |
| 22 | +### Transcendental functions ### |
| 23 | +* natural logarithm (Ln), arbitrary-base logarithm (Log) |
| 24 | +* exponentiation (Exp) |
| 25 | +* power function (Pow) |
| 26 | +* Sin, Cos, Tan |
| 27 | +* ASin, ACos, ATan, ATan2 |
| 28 | + |
| 29 | +### Miscellaneous features ### |
| 30 | +* Fast! Optimized low level bit manipulation |
| 31 | +* Hard-coded constants Pi and E to full quadruple precision |
| 32 | +* Implemented on little-endian architecture; may or may not work on big-endian |
| 33 | +* Strictly follows IEEE specifications to the extent available on Wikipedia. |
| 34 | +* Multiple guard bits |
| 35 | +* Arithmetic on sub-normals is fully supported. |
| 36 | +* Inf, -Inf, and NaN are fully supported. |
| 37 | +* round-to-even |
| 38 | +* emulates FPU exceptions with enable/disable flags (default: all disabled) |
| 39 | + |
| 40 | +## Using ## |
| 41 | + |
| 42 | +### VB.Net, C#, and C++/CLI users ### |
| 43 | +1. Add a reference to Release/QPFloat.dll (x64/Release/QPFloat.dll for x64) |
| 44 | +2. Create new variables using System.Quadruple |
| 45 | + |
| 46 | +### C++ users ### |
| 47 | +1. Add a reference to QPFloat.dll (x64/Release/QPFloat.dll for x64) |
| 48 | +2. #include "__float128.h" |
| 49 | +3. Create new variables using __float128 |
| 50 | + |
| 51 | +## Compiling ## |
| 52 | + |
| 53 | +### Microsoft Visual C++ ### |
| 54 | +* This code base can be compiled with or without the /clr (Managed C++) compiler option. |
| 55 | +* If /clr IS used, then this library can be used by C#, VB.Net, and C++/CLI via the type System.Quadruple. |
| 56 | +* If /clr IS NOT used, #ifdef _MANAGED has been used to automatically exclude the managed implementation. |
| 57 | +* Since extension methods can't be used to add static functions to System.Math, operations like Abs, Sin, etc. are static methods of Quadruple. |
| 58 | +* __float128 is the unmanaged (faster) implementation, which is still present even when /clr is used. |
| 59 | + |
| 60 | +### Other compilers ### |
| 61 | +* This code uses #ifdef to remove Microsoft-specific functionality automatically (#ifdef _MANAGED). |
| 62 | +* Though I've not tried it, it should be relatively easy to build using other compilers. |
| 63 | +* Following existing conventions, __float128 is the type proffered by this library. |
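The "Numeric information" predicates listed in the new README (IsZero, IsNaN, IsInfinite, IsSigned, IsSubNormal) all follow from the IEEE 754-2008 binary128 layout: 1 sign bit, 15 exponent bits, and 112 fraction bits. The sketch below illustrates that classification with plain integer bit manipulation. It is not QPFloat's code: the `Binary128Bits` struct, the helper names, and `exampleChecks` are hypothetical and exist only for this illustration; the library's own type is `__float128`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a binary128 value stored as two 64-bit halves.
// NOT QPFloat's __float128 type; names here are illustrative assumptions.
struct Binary128Bits {
    std::uint64_t lo;  // low 64 bits of the 112-bit fraction
    std::uint64_t hi;  // sign (1 bit), exponent (15 bits), high 48 fraction bits
};

// Mask selecting the 48 fraction bits held in the high half.
constexpr std::uint64_t kFractionHiMask = 0x0000FFFFFFFFFFFFULL;
// All-ones exponent field, reserved for infinities and NaNs.
constexpr unsigned kExponentAllOnes = 0x7FFF;

constexpr bool isSigned(const Binary128Bits& v) {
    return (v.hi >> 63) != 0;
}

constexpr unsigned biasedExponent(const Binary128Bits& v) {
    return static_cast<unsigned>((v.hi >> 48) & 0x7FFF);
}

constexpr bool fractionIsZero(const Binary128Bits& v) {
    return (v.hi & kFractionHiMask) == 0 && v.lo == 0;
}

// Exponent 0, fraction 0: positive or negative zero.
constexpr bool isZero(const Binary128Bits& v) {
    return biasedExponent(v) == 0 && fractionIsZero(v);
}

// Exponent 0, fraction nonzero: subnormal (no implicit leading 1).
constexpr bool isSubNormal(const Binary128Bits& v) {
    return biasedExponent(v) == 0 && !fractionIsZero(v);
}

// Exponent all ones, fraction 0: positive or negative infinity.
constexpr bool isInfinite(const Binary128Bits& v) {
    return biasedExponent(v) == kExponentAllOnes && fractionIsZero(v);
}

// Exponent all ones, fraction nonzero: NaN.
constexpr bool isNaN(const Binary128Bits& v) {
    return biasedExponent(v) == kExponentAllOnes && !fractionIsZero(v);
}

// A few bit patterns checked against the predicates above.
inline void exampleChecks() {
    const Binary128Bits one{0, 0x3FFF000000000000ULL};       // 1.0 (bias is 16383)
    const Binary128Bits negZero{0, 0x8000000000000000ULL};   // -0.0
    const Binary128Bits tiny{1, 0};                          // smallest subnormal
    const Binary128Bits negInf{0, 0xFFFF000000000000ULL};    // -Inf
    const Binary128Bits quietNaN{0, 0x7FFF800000000000ULL};  // a quiet NaN

    assert(biasedExponent(one) == 16383 && !isZero(one) && !isSigned(one));
    assert(isZero(negZero) && isSigned(negZero));
    assert(isSubNormal(tiny) && !isZero(tiny));
    assert(isInfinite(negInf) && isSigned(negInf) && !isNaN(negInf));
    assert(isNaN(quietNaN) && !isInfinite(quietNaN));
}
```

Calling `exampleChecks()` from a `main` function runs the assertions. Splitting the value into `lo` and `hi` halves mirrors how a 128-bit pattern sits in memory on a little-endian machine, which is the only platform the README claims to support.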