-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-106931: Intern Statically Allocated Strings Globally #107272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-106931: Intern Statically Allocated Strings Globally #107272
Conversation
Objects/unicodeobject.c
Outdated
Py_ssize_t | ||
_PyUnicode_InternedSize(void) | ||
{ | ||
return PyObject_Length(get_interned_dict(_PyInterpreterState_GET())); | ||
PyObject *dict = get_interned_dict(_PyInterpreterState_GET()); | ||
return _Py_hashtable_len(INTERNED_STRINGS) + PyObject_Length(dict); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return _Py_hashtable_len(INTERNED_STRINGS) + PyObject_Length(dict); | |
return _Py_hashtable_len(INTERNED_STRINGS) + PyDict_GET_SIZE(dict); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
FWIW, I wouldn't be too sad if we didn't do this in 3.12 (considering the current state is that strings just aren't interned in isolated subinterpreters), but if you feel confident enough to backport this before next monday, I'm okay with that. |
Honestly, I was a little on the fence myself. My rationale is IMHO reasonable but clearly not the strongest: leaving a performance penalty, however small, because of subinterpreters is frustrating. If the solution were any more complex I'd probably lean the other way.
Regardless of what I said above, I'll stew on it a bit more. Thanks for the clarity. |
Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
…gh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry picked from commit b72947a) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
GH-107358 is a backport of this pull request to the 3.12 branch. |
|
|
|
See gh-107362. |
…ythongh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry-picked from commit b72947a)
…ythongh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry-picked from commit b72947a)
GH-110713 is a backport of this pull request to the 3.12 branch. |
…7272) (gh-110713) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry-picked from commit b72947a)
We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic
_Py_hashtable_t
for that global cache instead of a dict.Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.