Skip to content

Use RecursionError to check for circular references in json.dumps #134935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aivarsk opened this issue May 30, 2025 · 9 comments
Closed

Use RecursionError to check for circular references in json.dumps #134935

aivarsk opened this issue May 30, 2025 · 9 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@aivarsk
Copy link
Contributor

aivarsk commented May 30, 2025

Feature or enhancement

Proposal:

I was looking at several Python JSON encoders that are faster than Python's json module. One thing they do not do is checking for circular references. Instead they rely on RecursionError^H^H^H^H^H^H^H custom recursion limit to do it for them:

ultrajson gives you OverflowError: Maximum recursion level reached
orjson gives you a TypeError: Recursion limit reached.

I tried comparing json.dumps() with json.dumps(check_circular=False) and it gave 15% on a JSON used in JSON library comparison. The result might be different for different use cases.

But I think having extra code to track references just to raise a nicer ValueError is a bit too much. I have a patch that removes the extra code and tries to keep everything backward compatible. Could remove even more by removing markers from c_make_encoder and _make_iterencode but that requires fixes to some tests and maybe it breaks something.

Anyway, open to feedback.

# Add a code block here, if required

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@nineteendo
Copy link
Contributor

@picnixz, another JSON issue.

@picnixz picnixz added the stdlib Python modules in the Lib dir label May 30, 2025
@aivarsk
Copy link
Contributor Author

aivarsk commented May 30, 2025

Something broke on WASI build, I will try to figure out

pyperformance:

before.json
===========

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision 1a89991d236
Report on Linux-6.11.0-19-generic-x86_64-with-glibc2.40
Number of logical CPUs: 8
Start date: 2025-05-30 19:14:01.969718
End date: 2025-05-30 19:14:14.319257

after.json
==========

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision c3120cbe1f
Report on Linux-6.11.0-19-generic-x86_64-with-glibc2.40
Number of logical CPUs: 8
Start date: 2025-05-30 19:13:37.909306
End date: 2025-05-30 19:13:56.541717

### json_dumps ###
Mean +- std dev: 24.2 ms +- 1.9 ms -> 22.2 ms +- 1.8 ms: 1.09x faster
Significant (t=5.81)

@nineteendo
Copy link
Contributor

Something broke on WASI build, I will try to figure out

Is this the "worse" referred to in the docs?

Also, if you're running python.exe -m pyperformance run -b "json_dumps" -r -o out.json, that's with check_circular=True.

@aivarsk
Copy link
Contributor Author

aivarsk commented May 30, 2025

pyperformance is for before and after my patch and it shows there is a performance improvement for check_circular=True (the default value) when the custom code is removed and RecursionError is used to signal about circular references.

Sorry, I did not understand the 'the "worse" referred to in the docs'

@gpshead
Copy link
Member

gpshead commented May 30, 2025

In general I like that this simplifies the code to maintain and as faster... BUT the one failure showing up in CI is an example of the potential danger of relying on RecursionError being raised where we did not before adds, API wise: A dying process instead of the previous nice error when CPython isn't able to detect recursion depth limits vs the underlying platform it is running on accurately. (unsurprisingly, the WASI test - the WASM VM has much tighter limits)

That doesn't necessarily block this change (the test could be adjusted to not go so deep). WASI has recursion depth limits problems in general. But it shows that it isn't entirely safe as other environments interpreters run in could hit the same situation.

@gpshead
Copy link
Member

gpshead commented May 30, 2025

i'm blindly assuming the WASI failure was a C stack overflow FWIW, I didn't dig into anything.

@nineteendo
Copy link
Contributor

pyperformance is for before and after my patch and it shows there is a performance improvement for check_circular=True (the default value) when the custom code is removed and RecursionError is used to signal about circular references.

Could you compare the performance of this PR with one that simply changes the default value? This is not really a fair comparison.

@serhiy-storchaka
Copy link
Member

RecursionError is raised in different case -- for too deep nested structures -- and it is just an implementation detail. It is possible to write an implementation that can serialize arbitrary nested structures.

Also, RecursionError is raised for two different reasons. In the C code it prevents the C stack overflow. This is unreliable way, but better than nothing. In the Python code it is not needed, it is raised implicitly because there is global limit for recursion calls.

Also, the circular reference can be detected as soon as it is found, while RecursionError is raised much later, wasting time and memory (and possibly writing or transmitting unnecessary data). The Python interpreter is also in vulnerable state near the end of the recursion limit -- some calls can fail with their own RecursionError.

In any case, I do not recommend to rely on RecursionError. This is an extreme case. A normal program shouldn't get to this point.

@aivarsk
Copy link
Contributor Author

aivarsk commented Jun 2, 2025

I am having doubts about my patch. I looked at other "faster" JSON library implementations and they rely on a custom counter for recursion depth and stop at 1024 by default our it is changeable through API.

@serhiy-storchaka serhiy-storchaka closed this as not planned Won't fix, can't repro, duplicate, stale Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

5 participants