
json: Optimize escaping string in Encoder #133186


Status: Open. Wants to merge 6 commits into main.

Conversation

@methane (Member) commented Apr 30, 2025

No description provided.

@methane added the performance (Performance or resource usage), skip issue, and extension-modules (C modules in the Modules dir) labels on Apr 30, 2025
@methane force-pushed the optimize-json-encode branch from cd2e18c to 59e5131 on April 30, 2025 07:32
@methane requested a review from Copilot on April 30, 2025 07:33
@methane (Member, Author) commented Apr 30, 2025

Without --enable-optimizations:

https://github.com/python/pyperformance/blob/main/pyperformance/data-files/benchmarks/bm_json_dumps/run_benchmark.py

Mean +- std dev: [main] 9.25 ms +- 0.07 ms -> [patched] 7.68 ms +- 0.03 ms: 1.20x faster
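For a rough local comparison without the full pyperformance setup, something like the following can be used (the payload shape here is an assumption for illustration, not the actual bm_json_dumps data):

```python
import json
import timeit

# Hypothetical payload; bm_json_dumps uses its own fixed data set.
data = {"strings": ["hello world"] * 256, "ints": list(range(256))}

# Time 1000 serializations of the payload.
elapsed = timeit.timeit(lambda: json.dumps(data), number=1000)
print(f"json.dumps x1000: {elapsed * 1000:.2f} ms")
```

For stable numbers like the ones above, `pyperf` (which bm_json_dumps is built on) is preferable to raw `timeit`.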

@mdboom (Contributor) commented Apr 30, 2025

I'm going to benchmark this on pyperformance on the Faster CPython infrastructure and report back in a couple of hours.

@nineteendo (Contributor)
I benchmarked this feature on my own library and I'm a bit worried. Strings without escapes are faster, but strings with escapes are a lot slower:

| encode | json (setuptools) | jsonyx (2.2.1) | reference time |
|---|---|---|---|
| List of 256 ASCII strings | 1.00x | 0.89x | 49.97 μs |
| List of 256 dicts with 1 int | 1.00x | 1.02x | 90.40 μs |
| Medium complex object | 1.00x | 1.06x | 138.32 μs |
| List of 256 strings | 1.00x | 0.91x | 310.31 μs |
| Complex object | 1.00x | 0.99x | 1522.59 μs |
| Dict with 256 lists of 256 dicts with 1 int | 1.00x | 1.07x | 23563.12 μs |

| encode | json (setuptools) | jsonyx (main) | reference time |
|---|---|---|---|
| List of 256 ASCII strings | 1.00x | 0.47x | 66.49 μs |
| List of 256 dicts with 1 int | 1.00x | 0.94x | 94.91 μs |
| Medium complex object | 1.00x | 0.91x | 146.82 μs |
| List of 256 strings | 1.00x | 2.76x | 323.10 μs |
| Complex object | 1.00x | 1.26x | 1523.92 μs |
| Dict with 256 lists of 256 dicts with 1 int | 1.00x | 0.92x | 22958.90 μs |

@methane (Member, Author) commented Apr 30, 2025 via email

@nineteendo (Contributor)
Better, but it's still twice as slow:

| encode | json (setuptools) | jsonyx | reference time |
|---|---|---|---|
| List of 256 ASCII strings | 1.00x | 0.60x | 50.39 μs |
| List of 256 dicts with 1 int | 1.00x | 0.92x | 91.32 μs |
| Medium complex object | 1.00x | 0.87x | 144.80 μs |
| List of 256 strings | 1.00x | 2.05x | 305.92 μs |
| Complex object | 1.00x | 1.15x | 1543.54 μs |
| Dict with 256 lists of 256 dicts with 1 int | 1.00x | 0.91x | 23013.43 μs |

@nineteendo (Contributor) commented Apr 30, 2025

How about just writing strings without escapes directly to the unicode writer? The main performance improvement of this PR comes from avoiding the creation of a new string.

```c
_PyUnicodeWriter_WriteChar(writer, '"');
_PyUnicodeWriter_WriteStr(writer, pystr);  /* original string */
_PyUnicodeWriter_WriteChar(writer, '"');
```
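The fast path described here can be sketched in plain Python (a behavioral sketch only, not the C implementation; `needs_escape` and `write_string` are made-up names, and the check assumes ensure_ascii=False, so only control characters, `"` and `\` matter):

```python
import json

def needs_escape(s: str) -> bool:
    # A character forces escaping if it is a quote, a backslash,
    # or a control character below U+0020.
    return any(c == '"' or c == '\\' or c < ' ' for c in s)

def write_string(parts: list[str], s: str) -> None:
    if not needs_escape(s):
        # Fast path: no new string is built; the original is
        # written between the quotes as-is.
        parts.append('"')
        parts.append(s)
        parts.append('"')
    else:
        # Slow path: fall back to the stock escaping encoder.
        parts.append(json.dumps(s))

parts: list[str] = []
write_string(parts, "plain")
write_string(parts, 'quote:"')
print("".join(parts))  # -> "plain""quote:\""
```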

@nineteendo (Contributor) commented Apr 30, 2025

Results of that (nineteendo/jsonyx@7c31ee4):

| encode | json (setuptools) | jsonyx (main) | reference time |
|---|---|---|---|
| List of 256 ASCII strings | 1.00x | 0.45x | 50.34 μs |
| List of 256 dicts with 1 int | 1.00x | 0.86x | 91.83 μs |
| Medium complex object | 1.00x | 0.86x | 141.84 μs |
| List of 256 strings | 1.00x | 0.97x | 313.86 μs |
| Complex object | 1.00x | 1.03x | 1529.10 μs |
| Dict with 256 lists of 256 dicts with 1 int | 1.00x | 0.86x | 23190.66 μs |

It's going to be a little harder to apply the change here (unless we just duplicate the functions).

@nineteendo (Contributor)
I would still like a proper fix for faster-cpython/ideas#726, though. Should we just switch back to the private API?

@methane force-pushed the optimize-json-encode branch from 646f257 to 5c8fcf9 on May 1, 2025 03:57
@methane force-pushed the optimize-json-encode branch from 5c8fcf9 to 8e5e00b on May 1, 2025 06:17
@nineteendo (Contributor)
See #133239 for my approach.

@methane force-pushed the optimize-json-encode branch from b66863d to 19c0f1f on May 1, 2025 08:01
@methane (Member, Author) commented May 1, 2025

https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72

| Benchmark | main | patched2 |
|---|---|---|
| json_dumps: List of 256 booleans | 16.6 us | 16.5 us: 1.01x faster |
| json_dumps: List of 256 ASCII strings | 67.9 us | 34.7 us: 1.96x faster |
| json_dumps: List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster |
| json_dumps: Medium complex object | 205 us | 173 us: 1.18x faster |
| json_dumps: List of 256 strings | 330 us | 302 us: 1.09x faster |
| json_dumps: Complex object | 2.57 ms | 1.96 ms: 1.31x faster |
| json_dumps: Dict with 256 lists of 256 dicts with 1 int | 30.5 ms | 26.5 ms: 1.15x faster |
| json_dumps(ensure_ascii=False): List of 256 booleans | 16.6 us | 16.5 us: 1.01x faster |
| json_dumps(ensure_ascii=False): List of 256 ASCII strings | 68.1 us | 34.6 us: 1.96x faster |
| json_dumps(ensure_ascii=False): List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster |
| json_dumps(ensure_ascii=False): Medium complex object | 205 us | 172 us: 1.19x faster |
| json_dumps(ensure_ascii=False): List of 256 strings | 329 us | 303 us: 1.09x faster |
| json_dumps(ensure_ascii=False): Complex object | 2.56 ms | 1.95 ms: 1.31x faster |
| json_dumps(ensure_ascii=False): Dict with 256 lists of 256 dicts with 1 int | 30.6 ms | 26.5 ms: 1.15x faster |
| json_loads: List of 256 booleans | 9.01 us | 9.09 us: 1.01x slower |
| json_loads: List of 256 ASCII strings | 40.7 us | 40.2 us: 1.01x faster |
| json_loads: List of 256 floats | 91.4 us | 88.3 us: 1.03x faster |
| json_loads: Medium complex object | 150 us | 147 us: 1.02x faster |
| json_loads: List of 256 strings | 848 us | 816 us: 1.04x faster |
| json_loads: Dict with 256 lists of 256 dicts with 1 int | 46.5 ms | 46.7 ms: 1.00x slower |
| json_loads: List of 256 strings ensure_ascii=False | 85.2 us | 85.7 us: 1.01x slower |
| Geometric mean | (ref) | 1.13x faster |

Benchmark hidden because not significant (5): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 dicts with 1 int, json_loads: Complex object, json_loads: Complex object ensure_ascii=False

@methane (Member, Author) commented May 1, 2025

This PR is faster, but #133239 is enough to fix the regression from Python 3.13.

For the longer term, the encoder should use a private (maybe UTF-8) buffer instead of PyUnicodeWriter. The calling overhead of PyUnicodeWriter is not negligible: it is low enough for "much faster than pure Python", but not low enough for a JSON serializer.
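The buffered-writer idea can be sketched in plain Python (a sketch under assumptions, not the actual C design; `BufferedWriter` and its size threshold are hypothetical): accumulate small fragments locally and pay the expensive per-call overhead only once per flush.

```python
class BufferedWriter:
    """Accumulate small pieces and flush to the sink in large chunks."""

    def __init__(self, sink, size=256):
        self.sink = sink      # stand-in for the expensive writer call
        self.size = size      # flush threshold (hypothetical value)
        self.buf = []
        self.buf_len = 0

    def write(self, s: str) -> None:
        self.buf.append(s)
        self.buf_len += len(s)
        if self.buf_len >= self.size:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            # One sink call for many small writes.
            self.sink("".join(self.buf))
            self.buf.clear()
            self.buf_len = 0

chunks: list[str] = []
w = BufferedWriter(chunks.append, size=8)
for piece in ['"a"', ",", '"b"', ",", '"c"']:
    w.write(piece)
w.flush()
print(chunks)  # -> ['"a","b",', '"c"']
```

Five `write` calls here result in only two sink calls; the C version would batch bytes the same way before touching PyUnicodeWriter.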

@nineteendo (Contributor)
> This PR is faster, but #133239 is enough to fix the regression from Python 3.13.

It's still not fully fixed: encoding booleans is twice as slow. And I don't fully understand why this PR is faster.

@mdboom (Contributor) commented May 1, 2025

Just as a data point: on our Faster CPython infrastructure, this makes the json_dumps benchmark 14.8% faster than main, and its performance is within the noise of 3.13.0.

I will also kick off a run on #133239 for comparison.

@methane requested a review from vstinner on May 2, 2025 06:03
@methane (Member, Author) commented May 2, 2025

Using _PyUnicodeWriter_WriteASCIIString() instead of PyUnicodeWriter_WriteUTF8():

```
$ ./python -m pyperf compare_to with-fast-path.json use_write_ascii.json -G
Slower (3):
- json_dumps(ensure_ascii=False): List of 256 dicts with 1 int: 101 us +- 0 us -> 102 us +- 0 us: 1.00x slower
- json_loads: Dict with 256 lists of 256 dicts with 1 int: 46.6 ms +- 0.1 ms -> 46.8 ms +- 0.5 ms: 1.00x slower
- json_dumps(ensure_ascii=False): List of 256 floats: 239 us +- 1 us -> 239 us +- 1 us: 1.00x slower

Faster (10):
- json_dumps(ensure_ascii=False): List of 256 strings: 303 us +- 5 us -> 279 us +- 3 us: 1.08x faster
- json_dumps: List of 256 strings: 302 us +- 3 us -> 278 us +- 3 us: 1.08x faster
- json_dumps(ensure_ascii=False): List of 256 booleans: 16.5 us +- 0.1 us -> 15.3 us +- 0.1 us: 1.08x faster
- json_dumps: List of 256 booleans: 16.5 us +- 0.1 us -> 15.3 us +- 0.1 us: 1.07x faster
- json_dumps: Complex object: 1.96 ms +- 0.01 ms -> 1.87 ms +- 0.01 ms: 1.05x faster
- json_dumps(ensure_ascii=False): Complex object: 1.96 ms +- 0.01 ms -> 1.87 ms +- 0.02 ms: 1.05x faster
- json_dumps: Medium complex object: 173 us +- 1 us -> 171 us +- 1 us: 1.01x faster
- json_dumps(ensure_ascii=False): Medium complex object: 172 us +- 1 us -> 171 us +- 1 us: 1.01x faster
- json_loads: Medium complex object: 148 us +- 1 us -> 147 us +- 1 us: 1.00x faster
- json_dumps: List of 256 floats: 239 us +- 0 us -> 239 us +- 0 us: 1.00x faster

Benchmark hidden because not significant (13): json_dumps: List of 256 ASCII strings, json_dumps: List of 256 dicts with 1 int, json_dumps: Dict with 256 lists of 256 dicts with 1 int, json_dumps(ensure_ascii=False): List of 256 ASCII strings, json_dumps(ensure_ascii=False): Dict with 256 lists of 256 dicts with 1 int, json_loads: List of 256 booleans, json_loads: List of 256 ASCII strings, json_loads: List of 256 floats, json_loads: List of 256 dicts with 1 int, json_loads: List of 256 strings, json_loads: Complex object, json_loads: List of 256 strings ensure_ascii=False, json_loads: Complex object ensure_ascii=False
```

Patch:

```diff
diff --git a/Modules/_json.c b/Modules/_json.c
index cd08fa688d3..cd57760282a 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -351,7 +351,7 @@ write_escaped_ascii(PyUnicodeWriter *writer, PyObject *pystr)
         }

         if (buf_len + 12 > ESCAPE_BUF_SIZE) {
-            ret = PyUnicodeWriter_WriteUTF8(writer, buf, buf_len);
+            ret = _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, buf, buf_len);
             if (ret) return ret;
             buf_len = 0;
         }
@@ -359,7 +359,7 @@ write_escaped_ascii(PyUnicodeWriter *writer, PyObject *pystr)

     assert(buf_len < ESCAPE_BUF_SIZE);
     buf[buf_len++] = '"';
-    return PyUnicodeWriter_WriteUTF8(writer, buf, buf_len);
+    return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, buf, buf_len);
 }

 static int
@@ -1612,13 +1612,13 @@ encoder_listencode_obj(PyEncoderObject *s, PyUnicodeWriter *writer,
     int rv;

     if (obj == Py_None) {
-      return PyUnicodeWriter_WriteUTF8(writer, "null", 4);
+      return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "null", 4);
     }
     else if (obj == Py_True) {
-      return PyUnicodeWriter_WriteUTF8(writer, "true", 4);
+      return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "true", 4);
     }
     else if (obj == Py_False) {
-      return PyUnicodeWriter_WriteUTF8(writer, "false", 5);
+      return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "false", 5);
     }
     else if (PyUnicode_Check(obj)) {
         return encoder_write_string(s, writer, obj);
@@ -1779,7 +1779,7 @@ encoder_listencode_dict(PyEncoderObject *s, PyUnicodeWriter *writer,

     if (PyDict_GET_SIZE(dct) == 0) {
         /* Fast path */
-        return PyUnicodeWriter_WriteUTF8(writer, "{}", 2);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "{}", 2);
     }

     if (s->markers != Py_None) {
@@ -1883,7 +1883,7 @@ encoder_listencode_list(PyEncoderObject *s, PyUnicodeWriter *writer,
         return -1;
     if (PySequence_Fast_GET_SIZE(s_fast) == 0) {
         Py_DECREF(s_fast);
-        return PyUnicodeWriter_WriteUTF8(writer, "[]", 2);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "[]", 2);
     }

     if (s->markers != Py_None) {
```
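One reason the substitution is safe, as I understand it: with ensure_ascii=True (the default), everything the escaping path emits is plain ASCII, so a write entry point that skips UTF-8 decoding and validation loses nothing. A quick Python illustration:

```python
import json

s = "héllo \u2603"       # non-ASCII input (é and a snowman)
out = json.dumps(s)      # ensure_ascii=True is the default
print(out)               # -> "h\u00e9llo \u2603"

# The escaped output is pure ASCII, so the buffer written by
# write_escaped_ascii never needs UTF-8 handling.
assert out.isascii()
```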

Labels: awaiting core review, extension-modules (C modules in the Modules dir), performance (Performance or resource usage), skip issue
3 participants