Skip to content

ENH: Update scalar representations as per NEP 51 #22449

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 49 commits into from
Jul 26, 2023

Conversation

seberg
Copy link
Member

@seberg seberg commented Oct 18, 2022

This is a WIP for implementing NEP 51/gh-22261. There are a few things that go beyond just changing the scalar representation:

  1. It makes get_formatter() a more central/public API, to also move fmt=... to it.

    • This introduces the distinction that if the dtype is known, we can e.g. represent a longdouble as '3.1' (with quotes) rather than np.longdouble('3.1').
    • fmt is mainly used for fmt="r" (previous point), I think fmt="s" should be supported, but always be identical to str(arr[0]).
    • There could be an argument for the fmt path to be a ufunc (it should not inspect array values, unlike the default printing).
    • fmt should be used more cleanly for the __format__ work. This should be fine, I think it only means pushing the "option" adjustments lower.
  2. arr.tofile() used repr(arr.item(...)) to print each element. This is weird/problematic:

    • For longdouble, I want that repr to be np.longdouble('3.1').
    • For strings, this means it uses 'string' or b'string'
  3. Record and MA printing has to be adjusted to use the new functionality. This affects record scalars and the printing of the masked array fill_value.

    After some back and forth, I think the default should just be to go back to str() (this was changed silently in BUG 4381 Longdouble from string without precision loss #6199). str(arr.item()) is not ideal for float32 for example.
    This could use some thought (or should change to use the above get_formatter(fmt="s"), but I hope this can be decoupled from NEP 51.


I plan to update the NEP 51 draft to include these points. Happy to discuss here of course (will make pushing NEP 51 quicker). But for now the draft-PR exists primarily to help with the NEP 51 discussion.

TODOS::

  • Refguide check is failing (and may lead to fixups being necessary?)
  • Discuss and accept NEP 51
  • Decide whether this should be hidden behind a flag or if it should interact with the legacy print-mode (i.e. to allow inclusion in the next release, but only switch with NEP 50 – which would likely be a NumPy 2.0 as well)
  • New tests for some paths are needed. Mainly arr.tofile() is not well tested (which is why the change here should not fail tests).

@seberg seberg force-pushed the scalar-repr-change branch 2 times, most recently from 342b701 to ff552fc Compare October 18, 2022 11:59
@tylerjereddy
Copy link
Contributor

@seberg so far so good for latest main of SciPy alongside this feature branch in my hands (Ubuntu Linux; i9-7900X):

  • python dev.py test -j 10
  • 37669 passed, 2222 skipped, 139 xfailed, 9 xpassed in 83.59s (0:01:23)

@seberg
Copy link
Member Author

seberg commented Oct 21, 2022

So, I realized that the empty format string "" already has more or less the meaning of str(), and in that sense I was thinking of using the repr function itself as a singleton for "scalar repr" (rather than "r").

I don't really have an opinion, could even say that str should be used for !s formatting effectively. Or could go back, because why would anyone use {obj:r} for something that is not a repr...

EDIT: Reconsidering this, maybe "" is different, since it would be simpler if that was the same as the default formatting, which is not string. So using str and repr seems actually pretty nice?

The thing to look into now (besides test fixes), is that array float formatting doesn't match scalar float formatting, so I actually need to pass through the formatting information better (should not be hard, but needs a bit organizing/thought.)

@seberg seberg marked this pull request as ready for review October 21, 2022 13:14
@seberg seberg changed the title WIP: Update scalar representations as per NEP 51 ENH: Update scalar representations as per NEP 51 Oct 21, 2022
@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Oct 21, 2022
@seberg seberg marked this pull request as draft October 21, 2022 13:44
@seberg seberg force-pushed the scalar-repr-change branch from 893d77d to 387e26a Compare October 24, 2022 12:34
@seberg
Copy link
Member Author

seberg commented Oct 24, 2022

This is starting to settle, although I am considering changing np.float64(nan) to add the quotes always (np.float64('nan') similar to Python).

@rossbar do you know why the refguide check does not seem to notice changes from 0.0 to np.float64(0.0)? In a sense, I like that, making that a followup is probably far more manageable.

@seberg seberg marked this pull request as ready for review October 24, 2022 12:56
@seberg seberg force-pushed the scalar-repr-change branch 3 times, most recently from ce4166f to 9a64758 Compare October 24, 2022 15:31
@h-vetinari
Copy link
Contributor

I saw #22261 only now, so I won't complain if this is too late, but I'd really like to get away from long, longlong, etc. I'd much prefer to just use sized integers throughout. It's a much saner model, and now that C99 is available on all platforms, we could do it (modulo the fact that LAPACK and perhaps some other venerable libs still have those types in their API 😑)

@seberg
Copy link
Member Author

seberg commented Nov 1, 2022

I am proposing to print the bit-sized version, even if imprecise sometimes. Yes, it would be nice to change it, but it seems like a big ABI change on the C-side?

@h-vetinari
Copy link
Contributor

I am proposing to print the bit-sized version, even if imprecise sometimes.

Looked more closely, looks fine.

Yes, it would be nice to change it, but it seems like a big ABI change on the C-side?

I'm afraid so, but I still think it'd be worth it eventually (numpy 2.0 material?).

Copy link
Contributor

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a mismatch between the implementation and the NEP w.r.t. whether printing is affected or not.

Based on the implementation, I think this PR has pretty high chance of disrupting services where something gets calculated with numpy, and then passed on in some serialized form to another consumer, where numpy (or even python) might not necessarily be installed.

Is there any fallback we could provide so that people can adapt such interfaces (e.g. .value or .raw or ...)?

@ev-br
Copy link
Contributor

ev-br commented Nov 5, 2022

the refguide check does not seem to notice changes from 0.0 to np.float64(0.0)

It won't, as long as np.allclose(eval('np.float64(0.0)'), 0.0)

@charris
Copy link
Member

charris commented Feb 19, 2023

Needs rebase.

@seberg seberg force-pushed the scalar-repr-change branch from decdee0 to 08c07ce Compare June 21, 2023 14:26
@seberg seberg force-pushed the scalar-repr-change branch from 08c07ce to bf92d6e Compare June 30, 2023 16:01
@pllim
Copy link
Contributor

pllim commented Jul 3, 2023

@mhvk , how do you think astropy can/should deal with this in our doctest downstream? I can only think of disabling doctest for numpy-dev and then when numpy 2.0 is released, we update the docs and disable doctest for numpy 1.x. However, this comes with the risk that we won't catch any incompatibility if a feature is only tested in the doc examples. Hope you can advise. Thanks! 🙏

@larsoner
Copy link
Contributor

Looks like we're headed to green with legacy="1.25" (set only when the NumPy version is > 1.25), so the suggested workaround is all good (when deployed correctly) 👍

@seberg
Copy link
Member Author

seberg commented Jul 27, 2023

I will note that just setting the legacy mode for the tests will unfortunately hide issues if you use repr(scalar) for serialization/writing to a file (or a dependency does). So it is a good hot-fix for doctests or tests which check repr explicitly, but might cause real issues to be missed.

brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Jan 24, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 5, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 5, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 6, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 14, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 14, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 14, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 14, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 14, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 17, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 17, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this pull request Feb 17, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
ricardoV94 pushed a commit to pymc-devs/pytensor that referenced this pull request Feb 17, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
Aarsh-Wankar pushed a commit to Aarsh-Wankar/pytensor that referenced this pull request Feb 22, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants