-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
WIP, MAINT: scalar float str match array #28696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Fixes numpygh-28679, but mostly an initial demo for now... * This patch sends scalar floating point string representations through the exact dragon representation path rather than the unique path for the reasons outlined in the matching ticket. It does not concern itself with whether the final `integer.0` is printed or excluded via `integer.`. The main argument in favor is that it is pretty weird to follow a different path for 0-D arrays vs. multi-element arrays. The main argument against is likely the potential shenanigans this kind of large-scale reprsentation change could entail downstream if folks already have shims around it and if the rounded representations are not truly bugs but just different design decisions. * A `hypothesis` test was added that fails before and passes after the switch to the exact scalar flot representations. `hypothesis` was chosen because the idiosyncrasies of asymmetric IEEE floating point number lines are well known and trying to test them with a few hard-coded manual cases seems like a recipe for missing something. * I think the NumPy team tends not to like partial fixes, so if we do this we may also want to apply the same decision to scientific notation representations, which I have not touched here yet, but the changes here should suffice to provide enough of a preview for decision making on the matter. * The `stringdtype` testing changes are rather egregious, so that's probably enough of a sample of potential drawbacks to weigh the design considerations for feedback. More gruesome changes would likely be needed for other tests based on local experiments--I see some other failures related to array/scalar/string round-tripping guarantees that would need careful thought and that are broken by this set of adjustments. [ci skip] [skip ci]
("csingle", | ||
"(0.100000001490116119384765625+0.100000001490116119384765625j)"), | ||
("cdouble", | ||
"(0.1000000000000000055511151231257827021181583404541015625+0.1000000000000000055511151231257827021181583404541015625j)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example of the kinds of shenanigans the NumPy team would have to deal with if we just naively switch to DigitMode_Exact
as I've done in the demo here.
I believe the shims needed for certain round tripping behaviors that were previously solid may also be rather messy.
Not saying the team won't make changes, but at first glance.. yikes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make it a more surgical fix to use the exact representation only when,
abs(x) > flintmax(precision)
This means we will be dealing only with integers. And so the exact representation will contain the same number of digits as the rounded one. That would fix the cases I mentioned without affecting these test cases.
I would agree the larger hammer probably does not make sense. Most people do not want to see endless digits on their screen :). The current algorithm does well for these scenarios.
@@ -705,7 +705,7 @@ format_@name@(@type@ val, npy_bool scientific, | |||
} | |||
else { | |||
return Dragon4_Positional_@Name@(&val, | |||
DigitMode_Unique, CutoffMode_TotalLength, precision, | |||
DigitMode_Exact, CutoffMode_TotalLength, precision, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scientific notation is just above this, so would be equally easy to change.
But... the testsuite shims currently added in this PR are just a small sample of the tradeoffs we'd have to consider, so I won't invest more time here until that's all considered and weighed/balanced.
* Guard exact float printing for scalars to trigger only in the "integer" case. * Revert some testing shims that are no longer needed; a single shim in `test_stringdtype.py` is still needed for half precision.
if typename == "float16": | ||
assert sres[0] == "0.0999755859375" | ||
else: | ||
assert sres[0] == "0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@f2013519 interesting idea--I went with if (val == (long long)val)
to guard the "exact" prints to "exact" integers, and it almost worked--this was the only shim I needed in the entire testsuite for things to pass locally. That probably means there is something I'm not quite getting--the StringDtype business is probably something Nathan can comment on though.
I'll go ahead and flush the CI test matrix here to see if there are any other issues reviewers would want to consider (32-bit, other platforms/architectures, etc.). Something tells me CI will find some other failures, but we'll have to wait and see.
Yeah, looks like CI is finding just 1 failure so far, for the new
So, the code obviously isn't perfect yet, but the extent of the testsuite failures is actually kind of minor with the latest shim. I should move on for now--this at least provides a preview with a small diff and an example of the remaining challenges to deal with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented in #28679 (comment), I'm not sure it really is better to typeset exactly, as it gives a false sense of precision. But maybe best to discuss further in #28679 - in the meantime, it is very helpful to see what a possible solution would entail.
I think this PR has served its purpose of prototyping something and catalyzing a discussion. gh-28703 appears to now be the preferred solution, and is also from a first-time contributor, which is even better, so let's close this and focus on that. I've repurposed the property based test drafted here to aid in the review process over there re: scientific notation transition points for arrays and scalars. |
Fixes BUG: Floating point printing / string conversion is incorrect #28679, but mostly an initial demo for now...
This patch sends scalar floating point string representations through the exact dragon representation path rather than the unique path for the reasons outlined in the matching ticket. It does not concern itself with whether the final
integer.0
is printed or excluded viainteger.
. The main argument in favor is that it is pretty weird to follow a different path for 0-D arrays vs. multi-element arrays. The main argument against is likely the potential shenanigans this kind of large-scale reprsentation change could entail downstream if folks already have shims around it and if the rounded representations are not truly bugs but just different design decisions. Also, the impact on round-tripping behaviors in some cases looks like it may cause quite some degree of annoyance for us based on remaining local test failures for this branch.A
hypothesis
test was added that fails before and passes after the switch to the exact scalar float representations.hypothesis
was chosen because the idiosyncrasies of asymmetric IEEE floating point number lines are well known and trying to test them with a few hard-coded manual cases seems like a recipe for missing something.I think the NumPy team tends not to like partial fixes, so if we do this we may also want to apply the same decision to scientific notation representations, which I have not touched here yet, but the changes here should suffice to provide enough of a preview for decision making on the matter.
The
stringdtype
testing changes are rather egregious, so that's probably enough of a sample of potential drawbacks to weigh the design considerations for feedback. More gruesome changes would likely be needed for other tests based on local experiments--I see some other failures related to array/scalar/string round-tripping guarantees that would need careful thought and that are broken by this set of adjustments.[ci skip] [skip ci]