Skip to content

gh-96512: Modernize int_max_str_digits protection in 3.12 #96568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Sep 5, 2022

This takes care of the TODOs outlined in the linked issue:

  • Cleanup: Move int_max_str_digits into PyConfig in 3.12.
  • Use int32_t or Py_ssize_t instead of int for the max digits value internally.
  • Minor feature: add PEP-578 PySys_Audit hook calls at the new ValueError raise spots.
  • Update the version notes about to say 3.11 and remove the 3.12 What's New and NEWS entry text about it as this shipped in 3.11. The original PR was created not knowing exactly when we'd merge.

@gpshead gpshead added type-feature A feature request or enhancement tests Tests in the Lib/test dir DO-NOT-MERGE interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 5, 2022
@gpshead gpshead self-assigned this Sep 5, 2022
@gpshead gpshead marked this pull request as ready for review September 15, 2022 23:24
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the rationale for audit events on operations which can be slow. Is that a new trend? If you consider that it's so important to emit audit events on "slow" operations, why not adding on every single Python operation which is "slow"? Like... time.sleep(), 10 ** (10 ** 10), or well... anything else.

It's hard for me to see how converting a large integer to a string, or a long string to an integer, is a threat. There is already the ValueErrror if it's larger than the limit.

Calling PySys_Audit() is cheap but it's not free. It has a minor impact on performance. Once it's added, it will hard to justify to remove it to "optimize Python". So I would prefer that we think twice about it.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@vstinner
Copy link
Member

Use int32_t or Py_ssize_t instead of int for the max digits value internally.

What's the rationale for that? It makes this specific option special and requires special code in Python/initconfig.c and other places. I would prefer to use an int for all integer options. If you want to put min or max values, you can add debug checks in config_check_consistency() and checks in _PyConfig_Read().

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is hard to review, it does many things at once:

  • Add PyConfig.int_max_str_digits: you should add tests
  • Add a new int32_t type for PyConfig
  • Add new audit events
  • Update the int_max_str_digits documentation

You might split it into separated PR, like move the addition of audit events to a new PR.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the adding of PyConfig.int_max_str_digits, IMO PyInterpreterState.int_max_str_digits becomes redundant and can introduce inconsistencies. I suggest removing it.

In PyInterpreterState, PyInterpreterState.config is a directly structure, not a pointer. At the machine code level, reading PyInterpreterState.config.int_max_str_digits is efficient: the offset in the memory is constant.

@vstinner
Copy link
Member

nitpick: in the commit message, you should mention that the change also removes _Py_global_config_int_max_str_digits.

I suggest to start with a PR with replaces _Py_global_config_int_max_str_digits + PyInterpreterState.int_max_str_digits with PyConfig.int_max_str_digits.

@gpshead
Copy link
Member Author

gpshead commented Sep 16, 2022

Thanks! Yep this is entirely reasonable to split up into separate PRs.

The audit hook can go away, it was just an idea. I've personally never used those.

I used individual commits for each list item to make wrangling this into smaller PRs after concept review easy, will do.

Using int32_t was a request from @mdickinson on the original PR.

@vstinner
Copy link
Member

Using int32_t was a request from @mdickinson on the original PR.

Objects/longobject.c uses int max_str_digits = interp->int_max_str_digits;. Can we cast the PyConfig value to int32_t at this place, and use int in PyConfig?

@mdickinson
Copy link
Member

Using int32_t was a request from @mdickinson on the original PR.

Let's just leave it as is.

@mdickinson
Copy link
Member

mdickinson commented Sep 16, 2022

Can we cast the PyConfig value to int32_t at this place

No, not safely, unless we know for sure that sizeof(int) == 4.

I'm more than happy to make sizeof(int) == 4 a CPython build-time assumption, since it's hardly ever likely not to be true, but in that case we should update the configure script to error out on systems where sizeof(int) is not 4, so that it's clear to all that it's safe to assume sizeof(int) == 4 within the CPython source.

@vstinner
Copy link
Member

No, not safely, unless we know for sure that sizeof(int) == 4.

In PyConfig, we can reject values which don't fit in an uint32_t, to make sure that the cast is safe. No?

What's the issue on int max_str_digits = interp->int_max_str_digits; if interp->int_max_str_digits is an int32_t? When int is smaller than int32_t, or when int is larger than int32_t?

@mdickinson
Copy link
Member

mdickinson commented Sep 16, 2022

In PyConfig, we can reject values which don't fit in an uint32_t, to make sure that the cast is safe. No?

Assuming you mean int32_t, not uint32_t, then yes: that would work fine.

What's the issue on int max_str_digits = interp->int_max_str_digits; if interp->int_max_str_digits is an int32_t?

The usual one whenever you're converting between two signed integer types that aren't known to have the same width - potential for undefined behaviour from overflow. There's no guarantee from the C standard that int has width exactly 32 bits. As I've already said, I have no objection to CPython assuming that the int type has width 32, but that assumption needs to be checked at build time and documented - if that's the way that people want to go, then that's fine with me.

I've put significant effort over the years into making sure that Object/longobject.c is highly portable and depends on very little beyond what's guaranteed by the C standards; I care very much about keeping that the case.

But really, this isn't worth the discussion at this point; I really wasn't anticipating that my request to use Py_ssize_t instead of int would cause this level of fuss. Let's just drop this change altogether and use int throughout, as in the current main branch.

@gpshead
Copy link
Member Author

gpshead commented Sep 16, 2022

Will do. Crafting the change was an informative exercise regardless. I'm a fan of always using explicitly sized types, so any attempt to do so in a codebase that didn't start out that way reveals how ready it is or isn't.

I'll dive back in on this code next week sometime, but questions in the back of my mind before I do that you may know already:

PyConfig vs PyInterpreterState vs subinterpreters? Right now I believe each subinterpreter gets its own limit. We need a test for that. I wouldn't want a sys.set_int_max_str_digits() call to change it for all interpreters if so.

Similarly, PyConfig... is that intended to be mutable beyond process startup? I thought might be meant to represent the environment and flag values at start, not not necessarily settings that were changed via APIs afterwards. thoughts?

@vstinner
Copy link
Member

Mark:

There's no guarantee from the C standard that int has width exactly 32 bits. As I've already said, I have no objection to CPython assuming that the int type has width 32, but that assumption needs to be checked at build time and documented - if that's the way that people want to go, then that's fine with me.

I don't think that only this specific change is a good motivation to require 32-bit int to build Python.

So far, we did our best to support all architectures. We accepted a change to support m68k (the code was generalized recently: commit dec0757). Hum. I checked and it seems like m68k uses 32-bit int, at least on Linux ;-)

Well, it seems like on Linux, int is always 32-bit. I heard (Wikipedia) about ILP64 and SILP64 which use 64-bit int, but it seems to be outdated:

  • "ILP64: HAL Computer Systems port of Solaris to the SPARC64" but HAL Computer Systems defunct in 2001.
  • "SILP64: UNICOS" which was created in 1984; 38 years ago and is now discontinued

All modern operating systems use 32-bit int.

@vstinner
Copy link
Member

Mark:

But really, this isn't worth the discussion at this point; I really wasn't anticipating that my request to use Py_ssize_t instead of int would cause this level of fuss. Let's just drop this change altogether and use int throughout, as in the current main branch.

My concern is more about the maintenance burden of Python/initconfig.c. Adding a new variable already requires a lot of code, as you can see in this PR. I would prefer to keep the code as simple as possible, by using only a few types: int, wchar_t* and list of wchar_t*. Supporting 3 types already requires different for each operation: cast to a dict, update from a dict, copy config to config2, print to stdout, etc.

I'm fine with using int32_t in longobject.c. IMO it's just a matter of making sure in PyConfig that the cast from PyConfig to uint32_t cannot overflow.

Gregory:

I'm a fan of always using explicitly sized types, so any attempt to do so in a codebase that didn't start out that way reveals how ready it is or isn't.

So maybe all PyConfig integer members should become int32_t? I don't know.

The int type comes from Python 3.6 global configuration variables like Py_VerboseFlag. In C, the int is commonly used and easy to use. For example, it's the "%i" formatter for printf(). For int32_t, the "%" PRIi32 formatter should be used instead and to be honest I had to use Google to find it since I only use it rarely.

@vstinner
Copy link
Member

PyConfig vs PyInterpreterState vs subinterpreters? Right now I believe each subinterpreter gets its own limit.

Each interpreter has its own config in PyInterpreter.config. Technically, each interepreter can have a different config. It's already the case if you set config._isolated_interpreter to 1 to create an "isolated" sub-interpreter (enforce more restrictions on subprocesses and threads).

Oh, there was a test API to set _isolated_interpreter=1, but it seems like it's gone. Or I misremember.

Example modifying the sub-interpreter config with _testinternalcapi.set_config():

import _testcapi
import _testinternalcapi

code = """
import _testinternalcapi

config = _testinternalcapi.get_config()
config['bytes_warning'] = 2
_testinternalcapi.set_config(config)

print("bytes warnings of the sub-interpreter?",
      _testinternalcapi.get_config()['bytes_warning'])
"""
_testcapi.run_in_subinterp(code)

print("bytes warnings of main interpreter?",
      _testinternalcapi.get_config()['bytes_warning'])

Output:

bytes warnings of the sub-interpreter? 2
bytes warnings of main interpreter? 0

Modifying the sub-interpreter config doesn't modify the main interpreter config ;-)

@gpshead
Copy link
Member Author

gpshead commented Sep 20, 2022

the pair of PRs mentioned above replace this with & address the feedback.

@gpshead gpshead closed this Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes DO-NOT-MERGE interpreter-core (Objects, Python, Grammar, and Parser dirs) tests Tests in the Lib/test dir type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants