WIP, MAINT: scalar float str match array #28696

tylerjereddy · 2025-04-12T17:51:41Z

Fixes BUG: Floating point printing / string conversion is incorrect #28679, but mostly an initial demo for now...
This patch sends scalar floating point string representations through the exact dragon representation path rather than the unique path for the reasons outlined in the matching ticket. It does not concern itself with whether the final integer.0 is printed or excluded via integer.. The main argument in favor is that it is pretty weird to follow a different path for 0-D arrays vs. multi-element arrays. The main argument against is likely the potential shenanigans this kind of large-scale reprsentation change could entail downstream if folks already have shims around it and if the rounded representations are not truly bugs but just different design decisions. Also, the impact on round-tripping behaviors in some cases looks like it may cause quite some degree of annoyance for us based on remaining local test failures for this branch.
A hypothesis test was added that fails before and passes after the switch to the exact scalar float representations. hypothesis was chosen because the idiosyncrasies of asymmetric IEEE floating point number lines are well known and trying to test them with a few hard-coded manual cases seems like a recipe for missing something.
I think the NumPy team tends not to like partial fixes, so if we do this we may also want to apply the same decision to scientific notation representations, which I have not touched here yet, but the changes here should suffice to provide enough of a preview for decision making on the matter.
The stringdtype testing changes are rather egregious, so that's probably enough of a sample of potential drawbacks to weigh the design considerations for feedback. More gruesome changes would likely be needed for other tests based on local experiments--I see some other failures related to array/scalar/string round-tripping guarantees that would need careful thought and that are broken by this set of adjustments.

[ci skip] [skip ci]

* Fixes numpygh-28679, but mostly an initial demo for now... * This patch sends scalar floating point string representations through the exact dragon representation path rather than the unique path for the reasons outlined in the matching ticket. It does not concern itself with whether the final `integer.0` is printed or excluded via `integer.`. The main argument in favor is that it is pretty weird to follow a different path for 0-D arrays vs. multi-element arrays. The main argument against is likely the potential shenanigans this kind of large-scale reprsentation change could entail downstream if folks already have shims around it and if the rounded representations are not truly bugs but just different design decisions. * A `hypothesis` test was added that fails before and passes after the switch to the exact scalar flot representations. `hypothesis` was chosen because the idiosyncrasies of asymmetric IEEE floating point number lines are well known and trying to test them with a few hard-coded manual cases seems like a recipe for missing something. * I think the NumPy team tends not to like partial fixes, so if we do this we may also want to apply the same decision to scientific notation representations, which I have not touched here yet, but the changes here should suffice to provide enough of a preview for decision making on the matter. * The `stringdtype` testing changes are rather egregious, so that's probably enough of a sample of potential drawbacks to weigh the design considerations for feedback. More gruesome changes would likely be needed for other tests based on local experiments--I see some other failures related to array/scalar/string round-tripping guarantees that would need careful thought and that are broken by this set of adjustments. [ci skip] [skip ci]

tylerjereddy · 2025-04-12T17:53:50Z

numpy/_core/tests/test_stringdtype.py

+        ("csingle",
+         "(0.100000001490116119384765625+0.100000001490116119384765625j)"),
+        ("cdouble",
+         "(0.1000000000000000055511151231257827021181583404541015625+0.1000000000000000055511151231257827021181583404541015625j)"),


@f2013519

Here is an example of the kinds of shenanigans the NumPy team would have to deal with if we just naively switch to DigitMode_Exact as I've done in the demo here.

I believe the shims needed for certain round tripping behaviors that were previously solid may also be rather messy.

Not saying the team won't make changes, but at first glance.. yikes!

Is it possible to make it a more surgical fix to use the exact representation only when,

abs(x) > flintmax(precision)

This means we will be dealing only with integers. And so the exact representation will contain the same number of digits as the rounded one. That would fix the cases I mentioned without affecting these test cases.

I would agree the larger hammer probably does not make sense. Most people do not want to see endless digits on their screen :). The current algorithm does well for these scenarios.

tylerjereddy · 2025-04-12T17:55:06Z

numpy/_core/src/multiarray/scalartypes.c.src

@@ -705,7 +705,7 @@ format_@name@(@type@ val, npy_bool scientific,
    }
    else {
        return Dragon4_Positional_@Name@(&val,
-                        DigitMode_Unique, CutoffMode_TotalLength, precision,
+                        DigitMode_Exact, CutoffMode_TotalLength, precision,


Scientific notation is just above this, so would be equally easy to change.

But... the testsuite shims currently added in this PR are just a small sample of the tradeoffs we'd have to consider, so I won't invest more time here until that's all considered and weighed/balanced.

* Guard exact float printing for scalars to trigger only in the "integer" case. * Revert some testing shims that are no longer needed; a single shim in `test_stringdtype.py` is still needed for half precision.

tylerjereddy · 2025-04-12T20:12:21Z

numpy/_core/tests/test_stringdtype.py

+    if typename == "float16":
+        assert sres[0] == "0.0999755859375"
+    else:
+        assert sres[0] == "0.1"


@f2013519 interesting idea--I went with if (val == (long long)val) to guard the "exact" prints to "exact" integers, and it almost worked--this was the only shim I needed in the entire testsuite for things to pass locally. That probably means there is something I'm not quite getting--the StringDtype business is probably something Nathan can comment on though.

I'll go ahead and flush the CI test matrix here to see if there are any other issues reviewers would want to consider (32-bit, other platforms/architectures, etc.). Something tells me CI will find some other failures, but we'll have to wait and see.

tylerjereddy · 2025-04-12T20:29:27Z

Yeah, looks like CI is finding just 1 failure so far, for the new hypothesis test I added, for this case:

Falsifying example: test_gh_28679(
    dt_val=(dtype('<f4'), 99999996),
)

So, the code obviously isn't perfect yet, but the extent of the testsuite failures is actually kind of minor with the latest shim. I should move on for now--this at least provides a preview with a small diff and an example of the remaining challenges to deal with.

mhvk

As commented in #28679 (comment), I'm not sure it really is better to typeset exactly, as it gives a false sense of precision. But maybe best to discuss further in #28679 - in the meantime, it is very helpful to see what a possible solution would entail.

tylerjereddy · 2025-04-14T22:39:51Z

I think this PR has served its purpose of prototyping something and catalyzing a discussion.

gh-28703 appears to now be the preferred solution, and is also from a first-time contributor, which is even better, so let's close this and focus on that. I've repurposed the property based test drafted here to aid in the review process over there re: scientific notation transition points for arrays and scalars.

tylerjereddy added component: numpy._core 03 - Maintenance labels Apr 12, 2025

github-actions bot added the 25 - WIP label Apr 12, 2025

tylerjereddy commented Apr 12, 2025

View reviewed changes

MAINT: PR 28696 revisions

b0bc771

* Guard exact float printing for scalars to trigger only in the "integer" case. * Revert some testing shims that are no longer needed; a single shim in `test_stringdtype.py` is still needed for half precision.

tylerjereddy commented Apr 12, 2025

View reviewed changes

mhvk reviewed Apr 12, 2025

View reviewed changes

tylerjereddy mentioned this pull request Apr 14, 2025

MAINT: Improve float16 and float32 printing #28703

Merged

tylerjereddy closed this Apr 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

WIP, MAINT: scalar float str match array #28696

WIP, MAINT: scalar float str match array #28696

Uh oh!

tylerjereddy commented Apr 12, 2025 •

edited

Loading

Uh oh!

tylerjereddy Apr 12, 2025

Uh oh!

f2013519 Apr 12, 2025

Uh oh!

tylerjereddy Apr 12, 2025

Uh oh!

tylerjereddy Apr 12, 2025

Uh oh!

tylerjereddy commented Apr 12, 2025

Uh oh!

mhvk left a comment

Uh oh!

tylerjereddy commented Apr 14, 2025

Uh oh!

Uh oh!

Uh oh!

WIP, MAINT: scalar float str match array #28696

WIP, MAINT: scalar float str match array #28696

Uh oh!

Conversation

tylerjereddy commented Apr 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tylerjereddy Apr 12, 2025

Choose a reason for hiding this comment

Uh oh!

f2013519 Apr 12, 2025

Choose a reason for hiding this comment

Uh oh!

tylerjereddy Apr 12, 2025

Choose a reason for hiding this comment

Uh oh!

tylerjereddy Apr 12, 2025

Choose a reason for hiding this comment

Uh oh!

tylerjereddy commented Apr 12, 2025

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

tylerjereddy commented Apr 14, 2025

Uh oh!

Uh oh!

tylerjereddy commented Apr 12, 2025 •

edited

Loading