gh-118761: Always lazy import re in locale #129860


Merged: 4 commits into python:main on Mar 27, 2025

Conversation

@donBarbos (Contributor) commented Feb 8, 2025

Improve the import time of locale by lazily importing re.

CPython configure flags:

./configure --enable-optimizations --with-lto --enable-loadable-sqlite-extensions

Benchmarks:

Running: pipx install tuna && ./python -X importtime -c 'import locale' 2> import.log && tuna import.log

Total import time: 0.014s -> 0.009s (1.56x as fast)

[Screenshots: tuna import-time breakdown, main branch vs. PR branch]

locale import time: 0.007s -> 0.003s (2.34x as fast)

[Screenshots: tuna breakdown of the locale import, main branch vs. PR branch]

hyperfine: 14.9ms -> 12.1ms (1.23x as fast)

Main branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import locale'"
Benchmark 1: ./python -c 'import locale'
  Time (mean ± σ):      14.9 ms ±   0.5 ms    [User: 12.8 ms, System: 2.1 ms]
  Range (min … max):    14.4 ms …  25.3 ms    3000 runs

PR branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import locale'"
Benchmark 1: ./python -c 'import locale'
  Time (mean ± σ):      12.1 ms ±   0.5 ms    [User: 10.0 ms, System: 2.1 ms]
  Range (min … max):    11.6 ms …  21.6 ms    3000 runs

@donBarbos (Contributor, Author)

@AA-Turner Is this what you wanted to see?

@picnixz (Member) left a comment

This will always recompile the pattern, which is not ideal. In addition, this function is likely to be used by gettext or other code, so it will be called quite a lot IMO. Note that only the most recent calls to re.compile() are cached, so it may or may not be efficient, and it could also interact badly with other occurrences of re.compile(). An alternative is to use _percent_re = None plus a one-time compile, but that could also harm readability.

I think I left out all re imports that were needed for a global pattern (if that pattern was necessary in a common function). Again, the issue is not just about making the import faster.

So, I'm not really convinced by this specific change. But maybe I'm overthinking and we can do it. Then again, locale is likely to be used in text-processing contexts where re has probably already been imported.
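(For illustration, a minimal sketch of the kind of change under review; the regex below is a stand-in, not the exact _percent_re pattern from Lib/locale.py:)

# Sketch only: move the module-level "import re" into format_string, so
# importing locale no longer pays for importing re; the cost is that
# re.compile() runs on every call (subject to re's internal cache).
def format_string(f, val, grouping=False, monetary=False):
    import re  # lazy import: only paid when format_string is first called
    percent_re = re.compile(r"%(?:\(.*?\))?[-#0-9 +*.hlL]*[diouxXeEfFgGcrs%]")
    ...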

@picnixz (Member) commented Feb 8, 2025

But yes, that's what we wanted to see (namely tuna + hyperfine benchmarks). Ideally, I'd also like to have benchmarks for locale.format_string now, since that call is more complex and re.compile() has no warm cache there (namely, I want to know how fast a single call to format_string is; since there is a caching mechanism, the cache will also need to be cleared after each call, so be careful when doing the benchmarks).
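(As a rough illustration of what is being asked, not a rigorous pyperf benchmark: re.purge() clears re's internal compile cache, so timing a call right after purging approximates a "cold" call to format_string.)

import locale
import re
import timeit

def cold_call():
    re.purge()  # drop all cached compiled patterns first
    locale.format_string("%d", 123456789)

def warm_call():
    locale.format_string("%d", 123456789)

print("cold:", min(timeit.repeat(cold_call, number=1000, repeat=5)))
print("warm:", min(timeit.repeat(warm_call, number=1000, repeat=5)))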

@donBarbos (Contributor, Author)

@picnixz

Benchmarks for the locale.format_string function:

Before running the tests: sudo -v

hyperfine: 44.6ms -> 44.4ms (no diff)

Main branch:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --runs 100 "./python -c 'import locale;locale.format_string(\"%d\", 123456789)'"
Benchmark 1: ./python -c 'import locale;locale.format_string("%d", 123456789)'
  Time (mean ± σ):      44.6 ms ±   1.6 ms    [User: 15.2 ms, System: 9.4 ms]
  Range (min … max):    41.1 ms …  51.0 ms    100 runs

PR branch:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --runs 100 "./python -c 'import locale;locale.format_string(\"%d\", 123456789)'"
Benchmark 1: ./python -c 'import locale;locale.format_string("%d", 123456789)'
  Time (mean ± σ):      44.4 ms ±   1.8 ms    [User: 15.5 ms, System: 9.1 ms]
  Range (min … max):    38.9 ms …  55.0 ms    100 runs

@donBarbos (Contributor, Author) commented Feb 8, 2025

I also found a weird argument passing for the format_string function in the same file on line 332:

cpython/Lib/locale.py

Lines 329 to 336 in c1f352b

def _test():
    setlocale(LC_ALL, "")
    #do grouping
    s1 = format_string("%d", 123456789,1)
    print(s1, "is", atoi(s1))
    #standard formatting
    s1 = str(3.14)
    print(s1, "is", atof(s1))

I understand why it works, but can I change it to the expected argument type, like format_string("%d", 123456789, True)?
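(Context for the question: the third positional parameter of format_string is the grouping flag, so passing 1 only works because it is truthy. A tiny illustration, assuming the current signature format_string(f, val, grouping=False, monetary=False):)

import locale

locale.setlocale(locale.LC_ALL, "")  # grouping only has an effect with a real locale
print(locale.format_string("%d", 123456789, 1))     # works: 1 is truthy
print(locale.format_string("%d", 123456789, True))  # same result, clearer intent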

@donBarbos (Contributor, Author)

@picnixz, sorry for the previous test; I forgot that the import itself affects the measurement.

Benchmarks for the locale.format_string function:

I hope I managed to take everything into account; I wrote the following script, benchmark.py:

import time
import locale
import os

code = """
locale.format_string('%d', 123456789)
"""

result_times = []

for _ in range(300):
    # drop the OS page cache before each run (requires sudo)
    os.system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
    start_time = time.time()
    exec(code, {"locale": locale})
    end_time = time.time()
    result_times.append(end_time-start_time)


import statistics

first = result_times[0]
mean = statistics.mean(result_times)
median = statistics.median(result_times)
stdev = statistics.stdev(result_times)
variance = statistics.variance(result_times)

print(f"First time: {first * 1_000_000:.2f}μs")
print(f"Mean: {mean * 1_000_000:.2f}μs")
print(f"Median: {median * 1_000_000:.2f}μs")
print(f"Standard deviation: {stdev * 1_000_000:.2f}μs")
print(f"Variance: {variance * 1_000_000:.2f}μs")

Main branch:

$ ./python -B benchmark.py
First time: 961.78μs
Mean: 127.01μs
Median: 122.90μs
Standard deviation: 49.28μs
Variance: 0.00μs

PR branch:

$ ./python -B benchmark.py
First time: 6951.33μs
Mean: 151.57μs
Median: 127.79μs
Standard deviation: 394.02μs
Variance: 0.16μs

To be honest, I don't understand why the first run in my code takes so much time.

@picnixz (Member) commented Feb 9, 2025

Sorry, but that's not what I meant by "caches". The cache in question is just an LRU cache at the Python level:

cpython/Lib/re/__init__.py

Lines 330 to 337 in 3cf8590

def _compile(pattern, flags):
    # internal: compile pattern
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache2[type(pattern), pattern, flags]
    except KeyError:
        pass

so it has nothing to do with /proc/sys/vm/drop_caches. In order to properly benchmark functions, we usually use pyperf instead. The idea is essentially to benchmark the call to format_string. I suspect that the first call to format_string will always be slower, as we need to compile the pattern. Then, if we keep using format_string without any call to re.compile() in between, there won't be any issue. However, if we do multiple re.compile() calls with different patterns, then the next call to format_string will be slower, as we need to do a full re-compile.
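(A hedged illustration of that last point: compiling enough distinct patterns in between can push a hard-coded pattern out of re's module-level cache, so the next compile pays full price. The pattern here is a stand-in and the cache limits are implementation details:)

import re
import timeit

PATTERN = r"%(?:\(.*?\))?[diouxXeEfFgGcrs%]"  # stand-in for locale's pattern

def recompile_time(n_other_patterns):
    re.compile(PATTERN)                   # warm re's cache with our pattern
    for i in range(n_other_patterns):     # churn the cache with unrelated patterns
        re.compile(f"unrelated-{i}")
    # If the churn exceeded the cache size, this line pays a full recompile:
    return timeit.timeit(lambda: re.compile(PATTERN), number=1)

print("little churn:", recompile_time(10))
print("heavy churn: ", recompile_time(2000))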

@picnixz (Member) commented Feb 9, 2025

I understand why it works, but can I change it to the expected argument type, like format_string("%d", 123456789, True)?

In another PR. We want atomic changes because they are easier to backport and inspect. We don't want multiple unrelated changes in general. But for this one, I think we don't really need to change it. It's fine, it works, and it's in a test function.

@donBarbos (Contributor, Author)

@picnixz

However, if we do multiple re.compile() calls with different patterns, then the next call to format_string will be slower, as we need to do a full re-compile.

Now I understand, sorry :-)
But how can it happen that we do multiple re.compile() calls with different patterns if the pattern is hardcoded (the user does not pass it)?

@picnixz (Member) commented Feb 9, 2025

But how can it happen that we do multiple re.compile() calls with different patterns if the pattern is hardcoded (the user does not pass it)?

Simple example:

locale.format_string(...)
some_other_function_that_calls_re_compile()
...
locale.format_string(...)  # and we need to re-compile it

The re.compile() cache is shared not by "location" but by the entire Python process. So it might happen that you have other re.compile() calls between different calls to locale.format_string().

Anyway, apart from this possibility of performance loss, the readability is still affected IMO. I'm not against Adam giving his opinion on this matter, but if we want a real performance improvement, I think we should use the _pattern = None approach, where you cache the compiled pattern at the module (locale) level:

_pattern = None

def format_string(...):
    global _pattern
    if _pattern is None:
        import re
        _pattern = re.compile(...)
    ...

With this approach, re.compile() is called only once and for all, but readability is affected even more.
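(A self-contained toy version of the pattern above, showing the one-time compile; the function name and regex are illustrative, not locale's own code:)

_number_re = None

def group_digits(value):
    global _number_re
    if _number_re is None:
        import re
        print("compiling pattern (first call only)")
        _number_re = re.compile(r"\d{1,3}(?=(?:\d{3})+$)")
    # insert a space before each trailing group of three digits
    return _number_re.sub(lambda m: m.group(0) + " ", str(value))

print(group_digits(123456789))  # compiles the pattern
print(group_digits(987654321))  # reuses the module-level compiled pattern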

@donBarbos (Contributor, Author)

Thank you for your explanation.
Then I will push a change that at least does not decrease performance, and I will leave this PR as it is until there is further feedback.

@eli-schwartz (Contributor)

This will always recompile the pattern, which is not ideal. In addition, this function is likely to be used by gettext or other code, so it will be called quite a lot IMO. Note that only the most recent calls to re.compile() are cached, so it may or may not be efficient, and it could also interact badly with other occurrences of re.compile(). An alternative is to use _percent_re = None plus a one-time compile, but that could also harm readability.

I think I left out all re imports that were needed for a global pattern (if that pattern was necessary in a common function). Again, the issue is not just about making the import faster.

So, I'm not really convinced by this specific change. But maybe I'm overthinking and we can do it. Then again, locale is likely to be used in text-processing contexts where re has probably already been imported.

I did that None pattern in GH-128898, though admittedly my rationale there was that it is quite "common" for gettext to be imported in contexts where it is installed as a global _() handler for, uh, readability purposes :) even if it never ends up being invoked -- and even if you turn out not to have any actual translations.

Note though that the import in this PR is only used by locale.format_string(), which isn't used by gettext at all. In fact, nothing in the standard library uses this function at all. While there are surely non-stdlib uses of locale.format_string(), it seems plausible that it isn't the default use case of the module :) so at the very least, avoiding a re.compile could be a bit of a time-saver even if your application still imports re either way...

It also seems like it should be relatively common to import locale to get at getencoding without necessarily using re or generalized text manipulation, just needing to know the correct encoding to open files, or even to deal with some pesky subprocesses which have a dreadful habit of communicating via non-UTF-8 stdout.
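(A small example of that kind of "encoding only" use of locale, for illustration; locale.getencoding() is Python 3.11+, and older code would use locale.getpreferredencoding(False):)

import locale
import subprocess

enc = locale.getencoding()  # e.g. "UTF-8" or "cp1252"
print("locale encoding:", enc)

# Open a file using whatever encoding the environment expects:
with open("notes.txt", "w", encoding=enc) as f:
    f.write("hello\n")

# Decode output from a tool that may write in the locale encoding, not UTF-8:
out = subprocess.run(["date"], capture_output=True).stdout
print(out.decode(enc, errors="replace"), end="")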

@hugovk merged commit cf5e438 into python:main on Mar 27, 2025 (43 checks passed).
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 1, 2025
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025