-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
Use RecursionError to check for circular references in json.dumps #134935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@picnixz, another JSON issue. |
Something broke on WASI build, I will try to figure out pyperformance:
|
Is this the "worse" referred to in the docs? Also, if you're running |
pyperformance is for before and after my patch and it shows there is a performance improvement for check_circular=True (the default value) when the custom code is removed and RecursionError is used to signal about circular references. Sorry, I did not understand the 'the "worse" referred to in the docs' |
In general I like that this simplifies the code to maintain and as faster... BUT the one failure showing up in CI is an example of the potential danger of relying on RecursionError being raised where we did not before adds, API wise: A dying process instead of the previous nice error when CPython isn't able to detect recursion depth limits vs the underlying platform it is running on accurately. (unsurprisingly, the WASI test - the WASM VM has much tighter limits) That doesn't necessarily block this change (the test could be adjusted to not go so deep). WASI has recursion depth limits problems in general. But it shows that it isn't entirely safe as other environments interpreters run in could hit the same situation. |
i'm blindly assuming the WASI failure was a C stack overflow FWIW, I didn't dig into anything. |
Could you compare the performance of this PR with one that simply changes the default value? This is not really a fair comparison. |
RecursionError is raised in different case -- for too deep nested structures -- and it is just an implementation detail. It is possible to write an implementation that can serialize arbitrary nested structures. Also, RecursionError is raised for two different reasons. In the C code it prevents the C stack overflow. This is unreliable way, but better than nothing. In the Python code it is not needed, it is raised implicitly because there is global limit for recursion calls. Also, the circular reference can be detected as soon as it is found, while RecursionError is raised much later, wasting time and memory (and possibly writing or transmitting unnecessary data). The Python interpreter is also in vulnerable state near the end of the recursion limit -- some calls can fail with their own RecursionError. In any case, I do not recommend to rely on RecursionError. This is an extreme case. A normal program shouldn't get to this point. |
I am having doubts about my patch. I looked at other "faster" JSON library implementations and they rely on a custom counter for recursion depth and stop at 1024 by default our it is changeable through API. |
Uh oh!
There was an error while loading. Please reload this page.
Feature or enhancement
Proposal:
I was looking at several Python JSON encoders that are faster than Python's
json
module. One thing they do not do is checking for circular references. Instead they rely onRecursionError
^H^H^H^H^H^H^H custom recursion limit to do it for them:ultrajson gives you OverflowError: Maximum recursion level reached
orjson gives you a TypeError: Recursion limit reached.
I tried comparing
json.dumps()
withjson.dumps(check_circular=False)
and it gave 15% on a JSON used in JSON library comparison. The result might be different for different use cases.But I think having extra code to track references just to raise a nicer
ValueError
is a bit too much. I have a patch that removes the extra code and tries to keep everything backward compatible. Could remove even more by removingmarkers
fromc_make_encoder
and_make_iterencode
but that requires fixes to some tests and maybe it breaks something.Anyway, open to feedback.
# Add a code block here, if required
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Linked PRs
The text was updated successfully, but these errors were encountered: