-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Simplify utils.find_json function #68253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3006.x
Are you sure you want to change the base?
Conversation
This PR stems from a user trying to parse a very large JSON with
With the same JSON but the linear complexity algorithm implemented in this PR, the same JSON returns much quicker:
So we're looking at orders of magnitude speedup in this PR (as is expected when we go from multiplicative big-O complexity to linear). I cannot share the specific JSON I use for testing (because it has customer data), but should you want to replicate this, it shouldn't be difficult to create a valid JSON with a lot of opening/closing braces inside of the JSON. [0] Lines 50 to 56 in 4a84820
|
90f47d8
to
43119ef
Compare
43119ef
to
cfec538
Compare
cfec538
to
60cafab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind creating a test for this?
I wasn't happy with the fact that I traded correctness for speed and found a better way works better and is faster. The below is a benchmark of three implementations:
@twangboy there are already tests in |
60cafab
to
7e8be36
Compare
The previous implementation computed all combinations of potential JSON documents and tried to `json.loads()`them. That resumted in num({) * num(}) tries, which could take hours on large inputs. The approach implemented with this change simplifies the work we do: we only look for opening '{' and '[' characters, and try to parse the rest of input string with JSONDecoder.raw_decode. This method ignores extraneous data at the end and is faster than doing it ourselves in Python.
7e8be36
to
d0fbd82
Compare
@@ -49,6 +49,12 @@ class JSONTestCase(TestCase): | |||
) | |||
) | |||
|
|||
def test_find_json_unbalanced_brace_in_string(self): | |||
test_sample_json = '{"title": "I like curly braces like this one:{"}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements. Can you also add this same test for square bracket [
?
What does this PR do?
Instead of trying to parse all combinations of json blobs to find the largest valid one, track open JSON objects / lists and try to parse them one-by-one, ordered by size.
This approach prioritizes speed over correctness. We don't track strings inside JSON, i.e. strings with odd numbers of opening/closing brackets/braces break the algorithm.
What issues does this PR fix or reference?
Fixes #68258
Previous Behavior
Parsing long JSON strings takes a lot of time.
New Behavior
Parsing long JSON strings is quick.
Merge requirements satisfied?
[NOTICE] Bug fixes or features added to Salt require tests.
Commits signed with GPG?
Yes