
gh-135336: Add fast path to json string encoding #133239


Status: Open. Wants to merge 10 commits into main.

Conversation

@nineteendo (Contributor) commented May 1, 2025

pyperformance (with --enable-optimizations and --with-lto)

main.json
=========

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision c600310663
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-06-12 08:26:22.632424
End date: 2025-06-12 08:26:59.100296

feature.json
============

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision 660d962602
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-06-12 08:27:40.576627
End date: 2025-06-12 08:28:11.517308

### json_dumps ###
Mean +- std dev: 12.2 ms +- 0.2 ms -> 10.0 ms +- 0.2 ms: 1.22x faster
Significant (t=88.87)

jsonyx-performance-tests (with --enable-optimizations and --with-lto)

| encode | main | feature | difference |
| --- | --- | --- | --- |
| Dict with 65,536 booleans | 8735.25 μs | 5793.46 μs | 1.50x faster |
| List of 65,536 empty strings | 3424.57 μs | 1654.34 μs | 2.07x faster |
| List of 65,536 ASCII strings | 12975.45 μs | 5896.28 μs | 2.20x faster |
| List of 65,536 strings | 85195.07 μs | 85930.24 μs | 1.01x slower |

@methane (Member) commented May 1, 2025

https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72

| Benchmark | main | #133186 | #133239 |
| --- | --- | --- | --- |
| json_dumps: List of 256 booleans | 16.6 us | not significant | 17.2 us: 1.03x slower |
| json_dumps: List of 256 ASCII strings | 67.9 us | 34.7 us: 1.96x faster | 46.5 us: 1.46x faster |
| json_dumps: List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster | 112 us: 1.09x faster |
| json_dumps: Medium complex object | 205 us | 173 us: 1.18x faster | 189 us: 1.09x faster |
| json_dumps: List of 256 strings | 330 us | 302 us: 1.09x faster | 298 us: 1.11x faster |
| json_dumps: Complex object | 2.57 ms | 1.96 ms: 1.31x faster | not significant |
| json_dumps: Dict with 256 lists of 256 dicts with 1 int | 30.5 ms | 26.5 ms: 1.15x faster | 29.4 ms: 1.04x faster |
| json_dumps(ensure_ascii=False): List of 256 booleans | 16.6 us | not significant | 17.2 us: 1.03x slower |
| json_dumps(ensure_ascii=False): List of 256 ASCII strings | 68.1 us | 34.6 us: 1.96x faster | 46.5 us: 1.46x faster |
| json_dumps(ensure_ascii=False): List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster | 112 us: 1.09x faster |
| json_dumps(ensure_ascii=False): Medium complex object | 205 us | 172 us: 1.19x faster | 188 us: 1.09x faster |
| json_dumps(ensure_ascii=False): List of 256 strings | 329 us | 303 us: 1.09x faster | 298 us: 1.11x faster |
| json_dumps(ensure_ascii=False): Complex object | 2.56 ms | 1.95 ms: 1.31x faster | not significant |
| json_dumps(ensure_ascii=False): Dict with 256 lists of 256 dicts with 1 int | 30.6 ms | 26.5 ms: 1.15x faster | 29.4 ms: 1.04x faster |
| json_loads: List of 256 floats | 91.4 us | 88.3 us: 1.03x faster | not significant |
| json_loads: List of 256 strings | 848 us | 816 us: 1.04x faster | not significant |
| Geometric mean | (ref) | 1.13x faster | 1.05x faster |

Benchmark hidden because not significant (10): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 booleans, json_loads: List of 256 ASCII strings, json_loads: List of 256 dicts with 1 int, json_loads: Medium complex object, json_loads: Complex object, json_loads: Dict with 256 lists of 256 dicts with 1 int, json_loads: List of 256 strings (ensure_ascii=False), json_loads: Complex object (ensure_ascii=False)

@nineteendo (Contributor, Author)

@mdboom do you have the results of the Faster CPython infrastructure?

@mdboom (Contributor) commented May 10, 2025

> @mdboom do you have the results of the Faster CPython infrastructure?

Sorry, forgot to come back to them.

They are here: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20250501-3.14.0a7%2B-930e938/bm-20250501-linux-x86_64-nineteendo-speedup_json_encode-3.14.0a7%2B-930e938-vs-base.svg

Confirmed 14% faster on the json_dumps benchmark. In the noise for the others (as one would expect).

@nineteendo nineteendo marked this pull request as ready for review May 13, 2025 14:06

@ZeroIntensity (Member) left a comment


Some very high-level comments. I haven't dived too deep into the actual implementation yet.

@ZeroIntensity ZeroIntensity added the performance Performance or resource usage label Jun 10, 2025
@ZeroIntensity (Member)

It would also be good to make an issue explaining the rationale and whatnot, and a blurb entry containing the performance increase.

@methane (Member) commented Jun 10, 2025

Before merging this, we need to decide whether to use the private _PyUnicodeWriter APIs or not.
We shouldn't decide how to optimize further before that.

@serhiy-storchaka (Member) left a comment


This adds quite a bit of code. Couldn't it be shared between py_encode_basestring and write_escaped_unicode?
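For illustration only, the kind of sharing the review asks for could be sketched at the Python level as a single per-character escape helper that both writers reuse. The names below are made up for the example; the actual code under discussion is C.

```python
import json

# Shared table of two-character escapes (illustrative, mirroring the JSON
# spec's short escapes).
ESCAPE_MAP = {
    '"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
    '\n': '\\n', '\r': '\\r', '\t': '\\t',
}

def escape_char(c):
    """Return the JSON spelling of a single character."""
    if c in ESCAPE_MAP:
        return ESCAPE_MAP[c]
    if c < ' ':                      # remaining control characters
        return '\\u%04x' % ord(c)
    return c

def encode_basestring(s):
    # Both the scalar writer and a fast-path writer could call the same
    # escape_char helper instead of duplicating the escape logic.
    return '"' + ''.join(map(escape_char, s)) + '"'
```

As a sanity check, `encode_basestring('a"b\n')` matches `json.dumps('a"b\n', ensure_ascii=False)`.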

@nineteendo nineteendo changed the title json: Fast path for string encoding gh-135336: Add fast path to json string encoding Jun 10, 2025
@nineteendo (Contributor, Author)

I've created an issue and reused the shared code, but https://blurb-it.herokuapp.com is down

@vstinner (Member) commented Jun 11, 2025

I ran my benchmark #133832 (comment) on this PR. I rebased the PR on the main branch.

Encoding a list of ASCII strings is up to 1.7x faster, it's impressive!

Sadly, encoding a long ASCII string is always slower (between 1.05x and 1.09x slower).

| Benchmark | main | pr133239 |
| --- | --- | --- |
| encode 100 booleans | 4.38 us | 3.97 us: 1.10x faster |
| encode 100 integers | 7.97 us | 6.76 us: 1.18x faster |
| encode 100 floats | 12.7 us | 11.1 us: 1.14x faster |
| encode 100 "ascii" strings | 8.75 us | 5.63 us: 1.55x faster |
| encode ascii string len=100 | 540 ns | 577 ns: 1.07x slower |
| encode escaped string len=128 | 754 ns | 615 ns: 1.23x faster |
| encode Unicode string len=100 | 645 ns | 595 ns: 1.08x faster |
| encode 1000 booleans | 18.0 us | 19.7 us: 1.09x slower |
| encode 1000 "ascii" strings | 59.0 us | 34.1 us: 1.73x faster |
| encode ascii string len=1000 | 2.09 us | 1.91 us: 1.09x faster |
| encode escaped string len=896 | 2.33 us | 2.15 us: 1.08x faster |
| encode Unicode string len=1000 | 2.81 us | 2.90 us: 1.03x slower |
| encode 10000 booleans | 158 us | 169 us: 1.07x slower |
| encode 10000 integers | 501 us | 442 us: 1.13x faster |
| encode 10000 floats | 1.04 ms | 888 us: 1.18x faster |
| encode 10000 "ascii" strings | 596 us | 348 us: 1.71x faster |
| encode ascii string len=10000 | 16.9 us | 17.8 us: 1.05x slower |
| encode escaped string len=9984 | 20.2 us | 19.6 us: 1.03x faster |
| encode Unicode string len=10000 | 27.3 us | 24.1 us: 1.13x faster |
| Geometric mean | (ref) | 1.13x faster |

Benchmark hidden because not significant (2): encode 1000 integers, encode 1000 floats

UPDATE: I had to re-run the benchmark since my first attempt was on debug builds :-(

@vstinner (Member)

> Before merging this, we need to decide whether to use the private _PyUnicodeWriter APIs or not.

Whenever possible, I would prefer to use the public PyUnicodeWriter API. In issue gh-133968, I optimized PyUnicodeWriter to make the public API faster and therefore more attractive.

@vstinner (Member)

> I've created an issue and reused the shared code, but https://blurb-it.herokuapp.com/ is down

You can install the blurb tool (pip install blurb) and run it locally in a terminal to add a NEWS entry.

@serhiy-storchaka (Member)

This is not what I had in mind, although it does speed up a common case.

Currently, encoding is two-pass: first we calculate the size of the encoded string, then create a Unicode object of that size and fill it char by char. This PR uses the first step to determine whether we can get rid of the intermediate Unicode object (when there are no characters that need escaping). This helps for booleans, numbers, and many simple strings. But we could get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use a high-level API like PyUnicodeWriter_WriteChar(), but write directly into the buffer.
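The fast-path idea the PR implements can be sketched in Python roughly as follows; this is a conceptual illustration of the scan-then-copy shortcut, not the actual C implementation, and the default-mode escaping of non-ASCII characters is deliberately left out of the condition by using `ensure_ascii=False` as the reference.

```python
import json

def write_string(s):
    # Scanning pass: check whether any character needs escaping.
    if not any(c < ' ' or c in '"\\' for c in s):
        # Fast path: emit the string verbatim between quotes, skipping
        # the intermediate escaped copy entirely.
        return '"' + s + '"'
    # General path: fall back to full escaping.
    return json.dumps(s, ensure_ascii=False)
```

This mirrors why the benchmarks show large wins on lists of plain ASCII strings but no gain on strings that actually contain escapes.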

@nineteendo (Contributor, Author) commented Jun 12, 2025

> we can get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use high-level API like PyUnicodeWriter_WriteChar(), but write directly in the buffer.

This is not exposed through the public API. You could maybe try to use PyUnicodeWriter_WriteUCS4(), but I doubt that's much faster.

@nineteendo (Contributor, Author)

Not sure why, but calling PyUnicode_GET_LENGTH(), PyUnicode_DATA(), and PyUnicode_KIND() multiple times is inefficient.
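A Python-level analogue of that observation, purely for illustration: the remedy at the C level is to call those accessors once and cache the results in local variables, just as one hoists repeated lookups out of a hot Python loop. The function names below are invented for the example.

```python
def copy_uncached(s):
    # Re-does the out.append attribute lookup on every iteration.
    out = []
    for ch in s:
        out.append(ch)
    return ''.join(out)

def copy_cached(s):
    # Hoists the lookups out of the loop, mirroring how C code can cache
    # PyUnicode_KIND()/PyUnicode_DATA()/PyUnicode_GET_LENGTH() in locals
    # instead of re-invoking them per character.
    out = []
    append = out.append   # bound method cached once
    for ch in s:
        append(ch)
    return ''.join(out)
```

Both produce identical results; only the per-iteration cost differs.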

Labels: awaiting review, performance