-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-118761: Always lazy import re
in locale
#129860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@AA-Turner Is this what you wanted to see? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always recompile the pattern, which is not ideal. In addition, this function is likely to be used by gettext
or other stuff so it will be called quite a lot IMO. Note that only the recent calls to re.compile()
are cached, so it could or could not be efficient and it could also badly interact with other occurrences of re.compile()
. An alternative is to use _percent_re = None
+ one-time compile but that could also harm readability.
I think I left out all re
imports that were needed for a global pattern (if that pattern was necessary in a common function). Again, the issue is not just about making the import faster.
So, I'm not really convinced by this specific change. But maybe I'm overthinking and we can do it. Now, locale
is likely to be used in text processing context where re
is likely to have already been imported.
But yes, that's what we wanted to see (namely tuna + hyperfine benchmarks). Ideally, I'd also like to have the benchmarks for |
Benchmarks for
|
I also found a weird argument passing for the Lines 329 to 336 in c1f352b
I understand why it works, but can I change it to the expected argument type, like |
@picnixz, sorry for the previous test, I forgot about the fact that import affects Benchmarks for
|
Sorry but that's not what I meant by "caches". The cache in question is just an LRU cache at Python level: Lines 330 to 337 in 3cf8590
so it has nothing to do with |
In another PR. We want atomic changes because it's easier to backport and inspect. We don't want multiple unrelated changes in general. But for this one, I think we don't really need to change it. It's fine, it works and it's in a test function. |
Now I understand, sorry :-) |
Simple example: locale.format_string(...)
some_other_function_that_calls_re_compile()
...
locale.format_string(...) # and we need to re-compile it The Anyway, apart from this possibility of performance loss, the readability is still affected IMO. I'm not against Adam giving his opinion on this matter but if we want to have a real performance improvement, I think we should use the _pattern = None
def format_string(...):
if _pattern is None:
import re
global _pattern
_pattern = re.compile(...)
... With this approach |
Thank you for your explanation. |
I did that None pattern in GH-128898, though admittedly my rationale for there was that it is quite "common" for gettext to be imported in contexts where it is installed as a global Note though that the import for this PR is only used by It also seems like it should be relatively common to import locale to get at getencoding without necessarily using re or generalized text manipulation, just needing to know the correct format to open files or even some pesky subprocesses which a dreadful habit of communicating via non-UTF8 stdout. |
Improve import time of
locale
by lazy importre
CPython configure flags:
Benchmarks:
Running:
pipx install tuna && ./python -X importtime -c 'import locale' 2> import.log && tuna import.log
Total import time: 0.014s -> 0.009s = x1.56 as fast
locale
import time: 0.007s -> 0.003s = x2.34 as fasthyperfine: 14.9ms -> 12.1ms = x1.23 as fast
Main branch:
PR branch: