json: Optimize escaping string in Encoder #133186
cd2e18c to 59e5131
I'm going to benchmark this on pyperformance on the Faster CPython infrastructure and report back in a couple of hours. |
I benchmarked this feature on my own library and I'm a bit worried. Strings without escapes are faster, but strings with escapes are a lot slower:
|
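A comparison of this kind can be reproduced roughly with the standard library. The following is only a sketch, not the benchmark used in this thread: the two inputs (`plain`, `escaped`) and the helper `bench` are hypothetical, and it measures the stdlib `json` module rather than the third-party library mentioned above.

```python
import json
import timeit

# Hypothetical inputs: one string with nothing to escape, and one where
# every other character needs a backslash escape.
plain = "a" * 1000
escaped = 'a"' * 500


def bench(value, number=2000):
    """Return the best per-call time in seconds for json.dumps(value)."""
    timer = timeit.Timer(lambda: json.dumps(value))
    return min(timer.repeat(repeat=5, number=number)) / number


if __name__ == "__main__":
    for label, value in (("no escapes", plain), ("many escapes", escaped)):
        print(f"{label:>12}: {bench(value) * 1e6:.2f} us per call")
```

Running it on builds with and without the patch gives one number per case; absolute timings depend on the machine and build options.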
How about adding if (copy_len > 0) before PyUnicodeWriter_WriteSubstring?
|
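To make the suggestion concrete, here is a pure-Python model of the run-copying loop, assuming the encoder copies each unescaped run in one substring write and then emits the escape. It is not the C code from this PR: `escape_string` and `ESCAPES` are illustrative names, and `out.append(s[start:i])` merely stands in for `PyUnicodeWriter_WriteSubstring`.

```python
import json

# Short escapes used by JSON; other control characters become \u00XX.
ESCAPES = {
    '"': '\\"', '\\': '\\\\', '\n': '\\n', '\r': '\\r',
    '\t': '\\t', '\b': '\\b', '\f': '\\f',
}


def escape_string(s):
    """Model of an encoder that copies unescaped runs in bulk."""
    out = ['"']
    start = 0  # start of the current run of characters needing no escape
    for i, ch in enumerate(s):
        if ch in ESCAPES:
            rep = ESCAPES[ch]
        elif ch < ' ':
            rep = f'\\u{ord(ch):04x}'
        else:
            continue
        copy_len = i - start
        if copy_len > 0:  # skip the substring write for empty runs
            out.append(s[start:i])
        out.append(rep)
        start = i + 1
    if len(s) - start > 0:  # trailing run after the last escape
        out.append(s[start:])
    out.append('"')
    return ''.join(out)
```

With consecutive escapes (for example `'""""'`), every run is empty, so the guard avoids one call per character. Whether that recovers the lost time in the C implementation is exactly what the measurements in this thread are about.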
Better, but it's still twice as slow:
|
How about just writing strings without escapes directly to the unicode writer?
_PyUnicodeWriter_WriteChar(writer, '"')
_PyUnicodeWriter_WriteStr(writer, pystr) // original string
_PyUnicodeWriter_WriteChar(writer, '"') |
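The fast path proposed here can be modelled in Python as follows. This is a sketch of the idea only: `encode_string` and `NEEDS_ESCAPE` are hypothetical names, the check assumes `ensure_ascii=False` semantics, and the slow path simply defers to `json.dumps` instead of reimplementing the escaping.

```python
import json
import re

# Characters that force escaping when ensure_ascii=False:
# control characters, the double quote, and the backslash.
NEEDS_ESCAPE = re.compile(r'[\x00-\x1f"\\]')


def encode_string(s):
    """Write the original string between quotes when nothing needs escaping."""
    if NEEDS_ESCAPE.search(s) is None:
        # Fast path: quote, original string, quote (three writes in total).
        return '"' + s + '"'
    # Slow path: fall back to the regular escaping routine.
    return json.dumps(s, ensure_ascii=False)
```

The trade-off is one extra scan of strings that do contain escapes, in exchange for a single bulk write of strings that do not.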
Results of that (nineteendo/jsonyx@7c31ee4):
It's going to be a little harder to apply the change here (unless we just duplicate the functions). |
I would still like a proper fix for faster-cpython/ideas#726 though. Should we just switch back to the private API? |
646f257 to 5c8fcf9
5c8fcf9 to 8e5e00b
See #133239 for my approach. |
b66863d to 19c0f1f
https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72
Benchmark hidden because not significant (5): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 dicts with 1 int, json_loads: Complex object, json_loads: Complex objectensure_ascii=False |
This PR is faster, but #133239 is enough to fix the regression from Python 3.13. In the longer term, the encoder should use a private (maybe UTF-8) buffer instead of PyUnicodeWriter. |
It's still not fully fixed: encoding booleans is twice as slow. And I don't fully understand why this PR is faster. |
Just as a data point, on our Faster CPython infrastructure, this makes the json_dumps benchmark 14.8% faster than main, which is within the noise of 3.13.0's performance. I will also kick off a run on #133239 for comparison. |
Using
Patch:
|