gh-135336: Add fast path to json string encoding #133239
Conversation
https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72
Benchmark hidden because not significant (10): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 booleans, json_loads: List of 256 ASCII strings, json_loads: List of 256 dicts with 1 int, json_loads: Medium complex object, json_loads: Complex object, json_loads: Dict with 256 lists of 256 dicts with 1 int, json_loads: List of 256 strings (ensure_ascii=False), json_loads: Complex object (ensure_ascii=False) |
@mdboom do you have the results from the Faster CPython infrastructure? |
Sorry, forgot to come back to them. Confirmed 14% faster on json_dumps benchmark. In the noise for the others (as one would expect). |
Some very high-level comments. I haven't dived too deep into the actual implementation yet.
It would also be good to make an issue explaining the rationale and whatnot, and a blurb entry containing the performance increase. |
Before merging this, we need to decide whether or not to use the private _PyUnicodeWriter APIs. |
This adds quite a bit of code. Couldn't it be shared between py_encode_basestring and write_escaped_unicode?
I've created an issue and re-used shared code, but https://blurb-it.herokuapp.com is down |
I ran my benchmark #133832 (comment) on this PR. I rebased the PR on the main branch. Encoding a list of ASCII strings is up to 1.7x faster, it's impressive! Sadly, encoding a long ASCII string is always slower (between 1.05x and 1.09x slower).
Benchmark hidden because not significant (2): encode 1000 integers, encode 1000 floats.
UPDATE: I had to re-run the benchmark since my first attempt was on debug builds :-( |
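For context, a rough sketch of the kind of pyperf microbenchmark discussed here is shown below. The data shapes (1000 short ASCII strings, one long ASCII string) and benchmark names are illustrative guesses, not the exact script from #133832:

```python
# Illustrative pyperf sketch; not the benchmark script from #133832.
import json
import pyperf

runner = pyperf.Runner()

# A list of many short ASCII strings exercises the per-string fast path.
short_strings = ["hello world %d" % i for i in range(1000)]
# One long ASCII string exercises a single large copy instead.
long_string = "x" * 1_000_000

runner.bench_func("encode 1000 ASCII strings", json.dumps, short_strings)
runner.bench_func("encode long ASCII string", json.dumps, long_string)
runner.bench_func("encode 1000 integers", json.dumps, list(range(1000)))
```

Results from two builds (e.g. main vs this PR) can then be compared with `python -m pyperf compare_to old.json new.json`.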
Whenever possible, I would prefer to use the public PyUnicodeWriter API. |
You can install the blurb tool (`pip install blurb`) and run it locally in a terminal to add a NEWS entry. |
This is not what I had in mind, although it does speed up a common case. Currently, encoding is two-pass: first we calculate the size of the encoded string, then we create a Unicode object of that size and fill it char by char. This PR uses the first step to determine whether we can get rid of the intermediate Unicode object (if there are no characters that need escaping). This helps for booleans, numbers, and many simple strings. But we can get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use a high-level API like |
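To make the fast path described above concrete, here is a minimal pure-Python sketch of the idea. The actual change lives in the C encoder (Modules/_json.c); the function and variable names below are hypothetical and the escaping logic is simplified:

```python
# Hypothetical sketch of the fast path: if a scan finds nothing to escape,
# emit the input string directly (plus quotes) without building an
# intermediate escaped copy. Not the PR's code.
import re

# Characters that force escaping: the quote, the backslash, and controls.
NEEDS_ESCAPE = re.compile(r'[\\"\x00-\x1f]')

ESCAPE_MAP = {'\\': '\\\\', '"': '\\"', '\n': '\\n', '\r': '\\r',
              '\t': '\\t', '\b': '\\b', '\f': '\\f'}

def encode_string(s: str) -> str:
    if NEEDS_ESCAPE.search(s) is None:
        # Fast path: no escaping needed, no intermediate string to build.
        return '"' + s + '"'
    # Slow path: escape character by character (simplified).
    out = []
    for ch in s:
        if ch in ESCAPE_MAP:
            out.append(ESCAPE_MAP[ch])
        elif ch < '\x20':
            out.append('\\u%04x' % ord(ch))
        else:
            out.append(ch)
    return '"' + ''.join(out) + '"'
```

The comment above goes a step further: rather than returning a new string at all, the encoder could reserve space in the writer and write the escaped output directly into it, avoiding the intermediate object even when escaping is needed.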
This is not exposed through the public API. You could maybe try to use |
Not sure why but calling |
- pyperformance (with `--enable-optimizations` and `--with-lto`)
- jsonyx-performance-tests (with `--enable-optimizations` and `--with-lto`)