gh-118761: Always lazy import re in locale #129860


Merged: 4 commits into python:main on Mar 27, 2025

Conversation

@donBarbos (Contributor) commented Feb 8, 2025

Improve the import time of locale by lazily importing re.

CPython configure flags:

./configure --enable-optimizations --with-lto --enable-loadable-sqlite-extensions

Benchmarks:

Running: pipx install tuna && ./python -X importtime -c 'import locale' 2> import.log && tuna import.log

Total import time: 0.014s -> 0.009s (1.56x as fast)

[Screenshots: tuna import-time breakdown, main branch vs. PR branch]

locale import time: 0.007s -> 0.003s (2.34x as fast)

[Screenshots: tuna breakdown of the locale import, main branch vs. PR branch]

hyperfine: 14.9ms -> 12.1ms (1.23x as fast)

Main branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import locale'"
Benchmark 1: ./python -c 'import locale'
  Time (mean ± σ):      14.9 ms ±   0.5 ms    [User: 12.8 ms, System: 2.1 ms]
  Range (min … max):    14.4 ms …  25.3 ms    3000 runs

PR branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import locale'"
Benchmark 1: ./python -c 'import locale'
  Time (mean ± σ):      12.1 ms ±   0.5 ms    [User: 10.0 ms, System: 2.1 ms]
  Range (min … max):    11.6 ms …  21.6 ms    3000 runs

@donBarbos (Contributor, Author)

@AA-Turner Is this what you wanted to see?

@picnixz (Member) left a comment

This will always recompile the pattern, which is not ideal. In addition, this function is likely to be used by gettext or other code, so it will be called quite a lot IMO. Note that only the most recent calls to re.compile() are cached, so it may or may not be efficient, and it could also interact badly with other occurrences of re.compile(). An alternative is to use _percent_re = None plus a one-time compile, but that could also harm readability.

I think I left out all re imports that were needed for a global pattern (if that pattern was necessary in a common function). Again, the issue is not just about making the import faster.

So, I'm not really convinced by this specific change. But maybe I'm overthinking and we can do it. Then again, locale is likely to be used in text-processing contexts where re has probably already been imported.
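(For illustration, a minimal sketch of the kind of change under review; the regex below is a stand-in, not the exact _percent_re pattern from Lib/locale.py:)

# Sketch only: move the module-level "import re" into format_string, so
# importing locale no longer pays for importing re; the cost is that
# re.compile() runs on every call (subject to re's internal cache).
def format_string(f, val, grouping=False, monetary=False):
    import re  # lazy import: only paid when format_string is first called
    percent_re = re.compile(r"%(?:\(.*?\))?[-#0-9 +*.hlL]*[diouxXeEfFgGcrs%]")
    ...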

@picnixz (Member) commented Feb 8, 2025

But yes, that's what we wanted to see (namely tuna + hyperfine benchmarks). Ideally, I'd also like to have benchmarks for locale.format_string now, since that call is more complex and re.compile() has no warm cache there (namely, I want to know how fast a single call to format_string is; since there is a caching mechanism, the cache will also need to be cleared after each call, so be careful when doing the benchmarks).
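(As a rough illustration of what is being asked, not a rigorous pyperf benchmark: re.purge() clears re's internal compile cache, so timing a call right after purging approximates a "cold" call to format_string.)

import locale
import re
import timeit

def cold_call():
    re.purge()  # drop all cached compiled patterns first
    locale.format_string("%d", 123456789)

def warm_call():
    locale.format_string("%d", 123456789)

print("cold:", min(timeit.repeat(cold_call, number=1000, repeat=5)))
print("warm:", min(timeit.repeat(warm_call, number=1000, repeat=5)))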

@donBarbos (Contributor, Author)

@picnixz

Benchmarks for the locale.format_string function:

Before running the tests: sudo -v

hyperfine: 44.6ms -> 44.4ms (no diff)

Main branch:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --runs 100 "./python -c 'import locale;locale.format_string(\"%d\", 123456789)'"
Benchmark 1: ./python -c 'import locale;locale.format_string("%d", 123456789)'
  Time (mean ± σ):      44.6 ms ±   1.6 ms    [User: 15.2 ms, System: 9.4 ms]
  Range (min … max):    41.1 ms …  51.0 ms    100 runs

PR branch:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --runs 100 "./python -c 'import locale;locale.format_string(\"%d\", 123456789)'"
Benchmark 1: ./python -c 'import locale;locale.format_string("%d", 123456789)'
  Time (mean ± σ):      44.4 ms ±   1.8 ms    [User: 15.5 ms, System: 9.1 ms]
  Range (min … max):    38.9 ms …  55.0 ms    100 runs

@donBarbos (Contributor, Author) commented Feb 8, 2025

I also found a weird argument passing for the format_string function in the same file on line 332:

cpython/Lib/locale.py

Lines 329 to 336 in c1f352b

def _test():
    setlocale(LC_ALL, "")
    #do grouping
    s1 = format_string("%d", 123456789,1)
    print(s1, "is", atoi(s1))
    #standard formatting
    s1 = str(3.14)
    print(s1, "is", atof(s1))

I understand why it works, but can I change it to the expected argument type, like format_string("%d", 123456789, True)?
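(Context for the question: the third positional parameter of format_string is the grouping flag, so passing 1 only works because it is truthy. A tiny illustration, assuming the current signature format_string(f, val, grouping=False, monetary=False):)

import locale

locale.setlocale(locale.LC_ALL, "")  # grouping only has an effect with a real locale
print(locale.format_string("%d", 123456789, 1))     # works: 1 is truthy
print(locale.format_string("%d", 123456789, True))  # same result, clearer intent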

@donBarbos (Contributor, Author)

@picnixz, sorry for the previous test; I forgot that the import itself affects the measurement.

Benchmarks for the locale.format_string function:

I hope I managed to take everything into account; I wrote the following script, benchmark.py:

import time
import locale
import os

code = """
locale.format_string('%d', 123456789)
"""

result_times = []

for _ in range(300):
    # drop the OS page cache before each run (requires sudo)
    os.system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
    start_time = time.time()
    exec(code, {"locale": locale})
    end_time = time.time()
    result_times.append(end_time-start_time)


import statistics

first = result_times[0]
mean = statistics.mean(result_times)
median = statistics.median(result_times)
stdev = statistics.stdev(result_times)
variance = statistics.variance(result_times)

print(f"First time: {first * 1_000_000:.2f}μs")
print(f"Mean: {mean * 1_000_000:.2f}μs")
print(f"Median: {median * 1_000_000:.2f}μs")
print(f"Standard deviation: {stdev * 1_000_000:.2f}μs")
print(f"Variance: {variance * 1_000_000:.2f}μs")

Main branch:

$ ./python -B benchmark.py
First time: 961.78μs
Mean: 127.01μs
Median: 122.90μs
Standard deviation: 49.28μs
Variance: 0.00μs

PR branch:

$ ./python -B benchmark.py
First time: 6951.33μs
Mean: 151.57μs
Median: 127.79μs
Standard deviation: 394.02μs
Variance: 0.16μs

To be honest, I don't understand why the first run in my code takes so much time.

@picnixz (Member) commented Feb 9, 2025

Sorry, but that's not what I meant by "caches". The cache in question is just an LRU cache at the Python level:

cpython/Lib/re/__init__.py

Lines 330 to 337 in 3cf8590

def _compile(pattern, flags):
    # internal: compile pattern
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache2[type(pattern), pattern, flags]
    except KeyError:
        pass

so it has nothing to do with /proc/sys/vm/drop_caches. In order to properly benchmark functions, we usually use pyperf instead. The idea is essentially to benchmark the call to format_string. I suspect that the first call to format_string will always be slower, as we need to compile the pattern. Then, if we keep using format_string without any call to re.compile() in between, there won't be any issue. However, if we do multiple re.compile() calls with different patterns, then the next call to format_string will be slower, as we need to do a full re-compile.
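(A hedged illustration of that last point: compiling enough distinct patterns in between can push a hard-coded pattern out of re's module-level cache, so the next compile pays full price. The pattern here is a stand-in and the cache limits are implementation details:)

import re
import timeit

PATTERN = r"%(?:\(.*?\))?[diouxXeEfFgGcrs%]"  # stand-in for locale's pattern

def recompile_time(n_other_patterns):
    re.compile(PATTERN)                   # warm re's cache with our pattern
    for i in range(n_other_patterns):     # churn the cache with unrelated patterns
        re.compile(f"unrelated-{i}")
    # If the churn exceeded the cache size, this line pays a full recompile:
    return timeit.timeit(lambda: re.compile(PATTERN), number=1)

print("little churn:", recompile_time(10))
print("heavy churn: ", recompile_time(2000))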

@picnixz (Member) commented Feb 9, 2025

I understand why it works, but can I change it to the expected argument type, like format_string("%d", 123456789, True)?

In another PR. We want atomic changes because they are easier to backport and inspect. We don't want multiple unrelated changes in general. But for this one, I think we don't really need to change it. It's fine, it works, and it's in a test function.

@donBarbos (Contributor, Author)

@picnixz

However, if we do multiple re.compile() calls with different patterns, then the next call to format_string will be slower, as we need to do a full re-compile.

Now I understand, sorry :-)
But how can it happen that we do multiple re.compile() calls with different patterns if the pattern is hardcoded (the user does not pass it)?

@picnixz (Member) commented Feb 9, 2025

But how can it happen that we do multiple re.compile() calls with different patterns if the pattern is hardcoded (the user does not pass it)?

Simple example:

locale.format_string(...)
some_other_function_that_calls_re_compile()
...
locale.format_string(...)  # and we need to re-compile it

The re.compile() cache is shared not by "location" but by the entire Python process. So it might happen that you have other re.compile() calls between different calls to locale.format_string().

Anyway, apart from this possibility of performance loss, the readability is still affected IMO. I'm not against Adam giving his opinion on this matter, but if we want a real performance improvement, I think we should use the _pattern = None approach, where you cache the compiled pattern at the module (locale) level:

_pattern = None

def format_string(...):
    global _pattern
    if _pattern is None:
        import re
        _pattern = re.compile(...)
    ...

With this approach, re.compile() is called only once and for all, but readability is affected even more.
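(A self-contained toy version of the pattern above, showing the one-time compile; the function name and regex are illustrative, not locale's own code:)

_number_re = None

def group_digits(value):
    global _number_re
    if _number_re is None:
        import re
        print("compiling pattern (first call only)")
        _number_re = re.compile(r"\d{1,3}(?=(?:\d{3})+$)")
    # insert a space before each trailing group of three digits
    return _number_re.sub(lambda m: m.group(0) + " ", str(value))

print(group_digits(123456789))  # compiles the pattern
print(group_digits(987654321))  # reuses the module-level compiled pattern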

@donBarbos (Contributor, Author)

Thank you for your explanation.
Then I will push a change that at least does not decrease performance, and I will leave this PR as it is until there is further feedback.

@eli-schwartz (Contributor)

This will always recompile the pattern, which is not ideal. In addition, this function is likely to be used by gettext or other code, so it will be called quite a lot IMO. Note that only the most recent calls to re.compile() are cached, so it may or may not be efficient, and it could also interact badly with other occurrences of re.compile(). An alternative is to use _percent_re = None plus a one-time compile, but that could also harm readability.

I think I left out all re imports that were needed for a global pattern (if that pattern was necessary in a common function). Again, the issue is not just about making the import faster.

So, I'm not really convinced by this specific change. But maybe I'm overthinking and we can do it. Then again, locale is likely to be used in text-processing contexts where re has probably already been imported.

I did that None pattern in GH-128898, though admittedly my rationale there was that it is quite "common" for gettext to be imported in contexts where it is installed as a global _() handler for, uh, readability purposes :) even if it never ends up being invoked -- and even if you turn out not to have any actual translations.

Note though that the import in this PR is only used by locale.format_string(), which isn't used by gettext at all. In fact, nothing in the standard library uses this function at all. While there are surely non-stdlib uses of locale.format_string(), it seems plausible that it isn't the default use case of the module :) so at the very least, avoiding a re.compile could be a bit of a time-saver even if your application still imports re either way...

It also seems like it should be relatively common to import locale to get at getencoding without necessarily using re or generalized text manipulation, just needing to know the correct encoding to open files, or even to deal with some pesky subprocesses which have a dreadful habit of communicating via non-UTF-8 stdout.
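(A small example of that kind of "encoding only" use of locale, for illustration; locale.getencoding() is Python 3.11+, and older code would use locale.getpreferredencoding(False):)

import locale
import subprocess

enc = locale.getencoding()  # e.g. "UTF-8" or "cp1252"
print("locale encoding:", enc)

# Open a file using whatever encoding the environment expects:
with open("notes.txt", "w", encoding=enc) as f:
    f.write("hello\n")

# Decode output from a tool that may write in the locale encoding, not UTF-8:
out = subprocess.run(["date"], capture_output=True).stdout
print(out.decode(enc, errors="replace"), end="")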

@hugovk merged commit cf5e438 into python:main on Mar 27, 2025 (43 checks passed).
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 1, 2025
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025