Skip to content

gh-123660: Internal macros for accessing empty string/bytes singletons #123643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Sep 3, 2024

This is a proposal for using macros instead of typing everytime _Py_STR(empty) or _Py_SINGLETON(bytes_empty)

This is a follow-up to capi-workgroup/decisions#36 (comment).

I wanted to use Py_GetConstantBorrowed instead, but I wasn't sure which approach was the most efficient one. In Py_GetConstantBorrowed you have a constant_id < Py_ARRAY_LENGTH(constants) check and I'm not sure it's even possible to use the function since you'll need to create the constant in the array constants, and this uses PyUnicode_New(0, 0) which itself returns the singleton. So you'll end up with a recursion issue.

Anyway, it's still WIP although all occurrences to empty strings and bytes were replaced. I'm just opening a draft PR so it's easier to see the diff (I'll create an issue later).

@picnixz
Copy link
Member Author

picnixz commented Sep 3, 2024

@vstinner @encukou @erlend-aasland @zooba @serhiy-storchaka In our discussion on the PyBytes_Join function, we discussed on the possibility of using macros for empty strings/bytes. Do you think this is worth the change? if so, I'll open an associated issue. I haven't changed the PyUnicode_FromString("") calls since they are "optimized" to returnt he singleton (though there are quite a lot of functions being called under the hood).

@picnixz picnixz added the pending The issue will be closed if no feedback is provided label Sep 3, 2024
@encukou
Copy link
Member

encukou commented Sep 3, 2024

The C API WG deals with public API.

IMO the internal API is fine as it is; the rename doesn't bring much.

@zooba
Copy link
Member

zooba commented Sep 3, 2024

I'm quite happy to review internal C API designs as well - we all still have to live with them.

As a concept, this one looks fine to me. It'll help our contributors discover how best to access these singletons, as it isn't exactly obvious. If/when we make another way to get them it'll be easier to update.

As a PR, I think there are some generated files you've touched that will be undone when they regenerate (though I haven't checked for sure). Make sure you do a make regen to see if anything changes.

@picnixz
Copy link
Member Author

picnixz commented Sep 4, 2024

As a PR, I think there are some generated files you've touched that will be undone when they regenerate (though I haven't checked for sure).

I think there's already a CI job that checks whether the files are up-to-date or not and I locally did regenerate them so they should be part of the PR. It should be fine I think.

I'll create an issue soon (I think we can make a meta-issue to integrate the empty tuple case as well).

@picnixz picnixz removed skip issue pending The issue will be closed if no feedback is provided labels Sep 4, 2024
@picnixz picnixz changed the title (Proposal) Internal macros for accessing empty string/bytes singletons gh-123660: Internal macros for accessing empty string/bytes singletons Sep 4, 2024
@picnixz picnixz marked this pull request as ready for review September 4, 2024 08:17
@picnixz picnixz marked this pull request as draft September 4, 2024 09:31
@picnixz
Copy link
Member Author

picnixz commented Sep 4, 2024

For now, I'm converting it into a draft until I've resolved my own interrogations on whether to replace PyUnicode_FromString("") and PyUnicode_New(0, 0) or not.

@vstinner
Copy link
Member

vstinner commented Sep 4, 2024

For now, I'm converting it into a draft until I've resolved my own interrogations on whether to replace PyUnicode_FromString("") and PyUnicode_New(0, 0) or not.

_Py_STR() and _Py_SINGLETON() are replaced with direct pointers at build time, whereas PyUnicode_FromString("") and PyUnicode_New(0, 0) are implemented as regular function calls at runtime: it's slower.

@picnixz
Copy link
Member Author

picnixz commented Sep 4, 2024

_Py_STR() and _Py_SINGLETON() are replaced with direct pointers at build time, whereas PyUnicode_FromString("") and PyUnicode_New(0, 0) are implemented as regular function calls at runtime: it's slower.

Yes, but is it safe to replace all occurrences? Actually, what I'd like to know whether there was a deeper meaning of not using the singleton (is it because when the code was written, this singleton did not yet exist? I think it was introduced in 2022).

This ensures that the `_Py_EMPTY_STRING` keeps working.
@vstinner
Copy link
Member

vstinner commented Sep 4, 2024

Yes, but is it safe to replace all occurrences?

I don't think that it's useful to replace all occurrences.

@picnixz
Copy link
Member Author

picnixz commented Sep 4, 2024

Which is why I didn't do it and was interrogating myself on whether it's useful or not. Now that I have another opinion, I'll leave it as is. I don't want to have an immense diff so I kept it simple. Otherwise the number of occurrences to change is much larger and we'll have a lot of changed files (thereby creating conflicts in other PRs I think...)

@zooba
Copy link
Member

zooba commented Sep 4, 2024

I don't think that it's useful to replace all occurrences.

Are there some you know are not safe? Provided the reference counting works out (which may require additional changes), it should be fine to replace calls like PyUnicode_FromString("") with the empty str singleton, right?

@encukou
Copy link
Member

encukou commented Sep 5, 2024

The C API WG deals with public API.

I was extremely brief there, sorry. In the WG discussion, Marc-Andre asks for public API.

And I think that is what we need first. Perhaps something like:

  • #define Py_NONE Py_GetConstantBorrowed(Py_CONSTANT_NONE)
  • #define Py_FALSE Py_GetConstantBorrowed(Py_CONSTANT_FALSE)
  • ... and so on

These should be documented. They're ”borrowed from the interpreter”, i.e they don't necessarily survive interpreter shutdown/reinitialization, and can't be shared between interpreters, although both will work in CPython.
Unlike Py_None, Py_NONE would be in the limited API, but it would not be an exported ABI symbol.

And when that's in place, these can get a faster implementation for non-limited API or for Py_BUILD_CORE.

I don't think it's helpful to only add new undocumented internal macros.

@vstinner
Copy link
Member

vstinner commented Sep 5, 2024

I didn't propose #define Py_EMPTY_STR Py_GetConstantBorrowed(Py_CONSTANT_EMPTY_STR) since Py_GetConstantBorrowed() is documented as:

This function is primarily intended for backwards compatibility: using Py_GetConstant() is recommended for new code.

It's strange to me to add a new macro relying on something only provided "for backward compatibility".

If we add such macro, I would prefer to see the function documentation rephrased.

Or maybe it's a bad idea to add a new macro which returns a borrowed reference, whereas the C API Guidelines is against adding new functions returning borrowed references. I don't know.

@vstinner
Copy link
Member

vstinner commented Sep 5, 2024

@encukou:

I don't think it's helpful to only add new undocumented internal macros.

I have mixed feelings about this change. I'm not convinced that it makes Python easier to maintain or easier to understand. I like using explicitly the complex _Py_STR() and _Py_SINGLETON() macros, but maybe it's just a personal taste :-)

@encukou
Copy link
Member

encukou commented Sep 5, 2024

If we add such macro, I would prefer to see the function documentation rephrased.

I agree with that.

C API Guidelines is against adding new functions returning borrowed references

Which guidelines? I guess we should clarify those first :)

@zooba
Copy link
Member

zooba commented Sep 5, 2024

I like using explicitly the complex _Py_STR() and _Py_SINGLETON() macros, but maybe it's just a personal taste :-)

I suspect this is personal taste, because you are highly familiar with how it works under the covers :-) For someone who isn't so aware, I think they're unlikely to guess to write the more complex version, and so if we have an obvious macros that does it then they're more likely to choose something efficient.

@picnixz
Copy link
Member Author

picnixz commented Nov 12, 2024

I'm closing this one since we didn't manage to find an answer to what we should replace at the end and whether this macro would be useful or not. We can revisit this question later (and there are anyway lots of conflicts now and I'm cleaning up my stale branches).

@picnixz picnixz closed this Nov 12, 2024
@picnixz picnixz deleted the chore/empty-strings-macros branch November 12, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants