Skip to content

Conversation

ganesh-k13
Copy link
Member

@ganesh-k13 ganesh-k13 commented Oct 10, 2021

Adding float.hex and float.fromhex Implementation

Added float.hex as follows:

>>> np.float32(3074).hex()
'0x1.804000p+11'

Added float.fromhex as follows

>>> np.float32.fromhex('0x1.804000p+11')
3074.0

TODO

  • Documentation
  • Annotations [Already present for 64]
  • Testcases
    • test_ends
    • test_invalid_inputs
    • test_whitespace
    • test_from_hex
    • test_roundtrip
    • test_subclass ?
  • Release notes
  • Benchmarks

Part of #13375

References:

@ganesh-k13 ganesh-k13 changed the title ENH: float.hex Implementation ENH: float.hex and float.fromhex Implementation Nov 20, 2021
@ganesh-k13
Copy link
Member Author

It was just easier to add fromhex than implement hex and fromhex separately for testing, etc.

@ganesh-k13 ganesh-k13 added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Nov 20, 2021
@ganesh-k13 ganesh-k13 force-pushed the enh_13375_float_hex branch 2 times, most recently from ad4f12a to 1f395ed Compare November 27, 2021 06:49
Comment on lines 2034 to 2046
#if @is_half@
(NPY_SIZEOF_FLOAT * CHAR_BIT + 3) / 4 + // mantissa hex digits max
#else
(NPY_SIZEOF_@NAME@ * CHAR_BIT + 3) / 4 + // mantissa hex digits max
#endif
1 + // decimal point, '.'
1 + // mantissa-exponent separator, 'p'
1 + // mantissa sign, '-' or '+'
#if @is_half@
(NPY_SIZEOF_FLOAT * CHAR_BIT + 2) / 3 + // exponent decimal digits max
#else
(NPY_SIZEOF_@NAME@ * CHAR_BIT + 2) / 3 + // exponent decimal digits max
#endif
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a staggering amount of #if @is_half@ :), open to any suggestions to avoid them. We can convert the PyObject of half itself to float and try perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda resolved in 07b4614, not very happy with the solution, hence keeping this thread open to suggestions,

Comment on lines 309 to 344
# XXX How are we getting to 970?
# f"0x3fffffffffffffp+{MAXEXP - NMANT}",
Copy link
Member Author

@ganesh-k13 ganesh-k13 Dec 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mdickinson, can you please help me understand how we arrive at 970 for 64 bits? It exactly gives 1.7976931348623158079e+308 when max is 1.7976931348623157e+308. Please ignore {MAXEXP - NMANT}, it was just a placeholder. [EDIT], to be more specific, how can we get it via MAXEXP, NMANT, etc

This is in reference to https://github.com/python/cpython/blob/e18d81569fa0564f3bc7bcfd2fce26ec91ba0a6e/Lib/test/test_float.py#L1332

Also, thanks for the extensive testsuite, was able to find a myriad of bugs :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x3fffffffffffffp+970 represents the smallest positive value that overflows rather than rounding to a finite number.

As to derivation: the maximum finite representable value is 2**DBL_MAX_EXP - 2**(DBL_MAX_EXP - DBL_MANT_DIG) (in terms of the constants defined in the C standard). However, this isn't the boundary at which we start to overflow: that boundary is halfway between the above value and 2**DBL_MAX_EXP, so it's 2**DBL_MAX_EXP - 2**(DBL_MAX_EXP - DBL_MANT_DIG - 1), which assuming IEEE 754 binary64, is 2**1024 - 2**970, or (2**54 - 1) * 2**970.

For IEEE 754 single-precision (float32), the corresponding boundary would be 2**128 - 2**103, or 0x1ffffffp+103. For float16, it would be 2**16 - 2**4 (0xfffp+4):

A bit off-topic, but there's something a bit funny going on with int-to-float32 conversions in NumPy. I suspect a double-rounding through float64 first.

>>> np.float32(2**128 - 2**103)  # ok: gives infinity, as expected
inf
>>> np.float32(2**128 - 2**103 - 1)  # bad: expect a finite result, get infinity instead
inf

It looks as though the actual boundary is at 2**128 - 2**103 - 2**74 instead of 2**128 - 2**103:

>>> np.float32(2**128 - 2**103 - 2**74)
inf
>>> np.float32(2**128 - 2**103 - 2**74 - 1)
3.4028235e+38

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the explanation @mdickinson! Will incorporate the same.

@mdickinson
Copy link
Contributor

A bit off-topic, but there's something a bit funny going on with int-to-float32 conversions in NumPy.

I opened #20687 for this.

@ganesh-k13 ganesh-k13 force-pushed the enh_13375_float_hex branch from 85717d1 to 07b4614 Compare January 1, 2022 15:19
Comment on lines 2103 to 2105
npy_clear_floatstatus_barrier((char *)&value);
half_value = npy_float_to_half(value);
if (npy_get_floatstatus_barrier((char *)&value) & NPY_FPE_OVERFLOW) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized after pushing, if we go beyond float instead of beyond half, we'll return at line 2094's if and the message will be .. too large for float instead of half. Hmm, half is getting real tricky with a lot of edge cases. Will see how we can handle this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this get handled?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wouldn't worry about this one. Yes, it would be nicer to report half, but overall it doesn't matter too much.

Rather than float barrier, it may be a bit faster/easier to explicitly check for an inf result (and that it was not inf before) for scalar cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change probably fixes the piodide error indirectly: It doesn't properly support FPEs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a compromise, I removed the @name@ so we don't get an incorrect message.

@ganesh-k13 ganesh-k13 force-pushed the enh_13375_float_hex branch 2 times, most recently from da93a4b to fbb99c9 Compare January 4, 2022 14:57
Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the fromhex signature in the .pyi stub file? Since currently it works with np.float64 types.

--- a/numpy/__init__.pyi
+++ b/numpy/__init__.pyi
@@ -2910,7 +2910,7 @@ def tolist(self) -> float: ...
     @classmethod
-    def fromhex(cls: type[float64], string: str, /) -> float64: ...
+    def fromhex(cls: type[_FloatType], string: str, /) -> _FloatType: ...

@ganesh-k13
Copy link
Member Author

Thanks @BvB93, I was wondering the same. Will add to TODO

@ganesh-k13 ganesh-k13 force-pushed the enh_13375_float_hex branch from 74c45be to fc8d562 Compare January 8, 2022 11:46
@ganesh-k13
Copy link
Member Author

I can resolve the merge conflict this week. @mattip any idea who can review this? Can Mark from CPython help? I can try pinging him.

@mattip
Copy link
Member

mattip commented Mar 13, 2023

Let's see once it is ready for review.

ganesh-k13 added a commit to ganesh-k13/numpy that referenced this pull request Mar 18, 2023
@ganesh-k13 ganesh-k13 force-pushed the enh_13375_float_hex branch from 620ee9b to 5931945 Compare March 18, 2023 05:56
@ganesh-k13
Copy link
Member Author

i have resolved the conflicts

@ganesh-k13
Copy link
Member Author

@mattip any pointers on how to proceed with this one?

@mattip
Copy link
Member

mattip commented Mar 28, 2023

There are some failing tests. Is there a problem with Pyodide and PyPy?

Could you add a test that the result of f.hex() is equivalent in NumPy and CPython, for some values of f (as a python float vs a NumPy scalar)

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review, but some comments (not all need addressing).

Overall, I guess we should just go with this.

I think right now these functions end up as public API in npy_math. That doesn't seem bad (they are very well defined), just wanted to check on it briefly.

* should be used as resultant floating point value.
*/
NPY_INPLACE npy_bool
npy_@name@_fromhex(const char *s, const char *s_end, npy_@name@ *result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick, but please return -1 on error, it is just the more common convention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited in 8d7bc55

('longdouble', ('-0x1.999999999999a000p-4', '0x1.921fb54442d18000p+1'))
):
# TODO: We loose precision for 128 bits here, hence the ending 0s
pi_rounded = getattr(_numerictypes, float_name)(3.141592653589793116)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the string for pi or np.sqrt(sctype(2)), but will differ on platforms anyway...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I did not fully get it, Do you mean change it to individual type's string so it's uniform across platforms?

// read mantissa (0xh.hh) and store end at mant_end
// Size of mantissa + decimal '.'
mant_end = i + (NPY_SIZEOF_HALF * CHAR_BIT + 3) / 4 + 1 + 1;
i = mant_end = mant_end+1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to round here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to round here. I don't have a mathematical proof as to why we don't need to round off, however, purely guessing from the tests passing.

Comment on lines 2103 to 2105
npy_clear_floatstatus_barrier((char *)&value);
half_value = npy_float_to_half(value);
if (npy_get_floatstatus_barrier((char *)&value) & NPY_FPE_OVERFLOW) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wouldn't worry about this one. Yes, it would be nicer to report half, but overall it doesn't matter too much.

Rather than float barrier, it may be a bit faster/easier to explicitly check for an inf result (and that it was not inf before) for scalar cases.

Comment on lines 2103 to 2105
npy_clear_floatstatus_barrier((char *)&value);
half_value = npy_float_to_half(value);
if (npy_get_floatstatus_barrier((char *)&value) & NPY_FPE_OVERFLOW) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change probably fixes the piodide error indirectly: It doesn't properly support FPEs.

Added the following tests:
* two spellings of infinity, with optional signs; case-insensitive
* nans with optional sign; case insensitive
* variations in input format
* moving the point around
* [WIP] results that should overflow...
* ...and those that round to +-max float
* zeros
Improved half handling via:
* Added a new utility function to convert float hex to half
* Split half `hex` and `fromhex` to avoid macro clutter
* Special handling of overflows for half cases
* Few minor comments and string changes.
test_from_hex:
* values that should underflow to 0

test_ends:
* Added case for MAX, MIN, TINY(smallest subnormal) and EPS

Misc:
* Added asserts. No idea what I was testing without it :)
* Various linting fixes
Added following cases:
* check round-half-even is working correctly near 0
* b.p.o. 44954 cases

Added TODO for following:
* Overflow end cases
* those that round to +-max float
* ... and near MIN
* ... and near 1.0
* Use proper Mantissa for moving the point around
Added TODO for 128 bit precision loss
Tracking in numpy#21903

1. moving the point around (pi).
2. Make it go towards NMANT (in ref to [1])
3. results that should overflow...
4. check round-half-even is working correctly near MIN
5. check round-half-even is working correctly near 1.0
@ganesh-k13 ganesh-k13 force-pushed the enh_13375_float_hex branch from 5931945 to a0112b0 Compare April 30, 2023 16:31
@ganesh-k13
Copy link
Member Author

The explicit inf check helped with Pyodide issue. I'm curious how the other calls are passing. Is it a lack of tests or is the caller handling it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. triaged Issue/PR that was discussed in a triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants