Skip to content

bpo-33927: Handle pyton -m json.tool --json-lines infile infile #29478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

CharString
Copy link
Contributor

@CharString CharString commented Nov 8, 2021

Handles the remaining case that bpo-45644 left open of bpo-33927.
https://bugs.python.org/issue45644#msg405913

Thanks @remilapeyre for pointing that out.

As far as I can see this should be enough to close bpo-33927

https://bugs.python.org/issue33927

Chris Wesseling added 3 commits November 8, 2021 22:23
Handles the remaining case that bpo-45644 left open of bpo-33927.
https://bugs.python.org/issue45644#msg405913

Thanks Rémi Lapeyre for pointing that out.
Even if in the expected TimeoutExpired isn't raised.
@methane
Copy link
Member

methane commented Nov 12, 2021

I think this is better:

             if options.json_lines:
-                objs = (json.loads(line) for line in infile)
+                objs = [json.loads(line) for line in infile]
             else:
-                objs = (json.load(infile),)
+                objs = [json.load(infile)]

@CharString
Copy link
Contributor Author

@methane Thanks for looking at this.

+                objs = [json.loads(line) for line in infile]

That looks clearer, yes. But doing this unconditionally won't pass the test_jsonlines_from_a_stream case, we need a generator for that case. When dealing with json-lines, we may never know whether reading from infile ever halts (it could be stdin or a different kind of fifo). That's why just before writing I make a final check whether the infile and outfile are the same and only then turn it into a list. Because only when infile == outfile, we have something to gain from using a list instead of a generator.

We could lift this check into the if options.json_lines body. But in my opinion that clouds the clear sequence of what we are doing; deal with input, prepare outfile, maybe completely exhaust input, open outfile and write output.

@methane
Copy link
Member

methane commented Nov 12, 2021

json.tool is a toy/demo program. So I don't think we need to support every case.

But if you want to support such cases, I prefer "write to temp file and rename" idiom.

I created a topic for it.
https://discuss.python.org/t/adding-atomicwrite-in-stdlib/11899

@CharString
Copy link
Contributor Author

CharString commented Nov 12, 2021

json.tool is a toy/demo program. So I don't think we need to support every case.

I agree, that's why I made #29273, as small as it is. stdlib is where code goes to die (and live forever) with minimal change.

But if you want to support such cases, I prefer "write to temp file and rename" idiom.

I created a topic for it. https://discuss.python.org/t/adding-atomicwrite-in-stdlib/11899

Interesting. I've added to the discussion :-)

As for this patch. It's the smallest change (effectively 2 lines) that would close bpo-33927. I'm completely okay with a "won't fix" resolution of this, if we deem it too much complexity for a toy.

PS as someone that deals a lot with rest-api's, I think it is a hell of a useful toy when exploring. :)

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Dec 13, 2021
@CharString
Copy link
Contributor Author

CharString commented Jan 10, 2022

Concerning the news entry: I think the "skip news" label is in order, iff this makes it into 3.11. The news entry of #29273 already describes what this PR does. This one can be seen as a simple continuation that covers an extra case.
I stand corrected if 3.11 is already in feature freeze, or when this issue is a "won't fix". I'm just going through my github notifications.

@@ -69,6 +69,10 @@ def main():
if options.outfile is None:
out = sys.stdout
else:
if (options.outfile.exists() and
options.outfile.expanduser().samefile(infile.name)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee that samefile() works as expected for all cases.
For example,

$ ./python.exe x.py | ./python.exe -m json.tool --json-lines - out.jsonl
Traceback (most recent call last):
  File "/Users/inada-n/work/python/cpython/Lib/pathlib.py", line 1006, in samefile
    other_st = other_path.stat()
               ^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'stat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/inada-n/work/python/cpython/Lib/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/inada-n/work/python/cpython/Lib/runpy.py", line 86, in _run_code
    exec(code, run_globals)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/inada-n/work/python/cpython/Lib/json/tool.py", line 87, in <module>
    main()
    ^^^^^^
  File "/Users/inada-n/work/python/cpython/Lib/json/tool.py", line 73, in main
    options.outfile.expanduser().samefile(infile.name)):
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/inada-n/work/python/cpython/Lib/pathlib.py", line 1008, in samefile
    other_st = self._accessor.stat(other_path)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '<stdin>'

There may be other cases that samefile() don't work as your expect.

Additionally, if we don't load the whole file at once, we shouldn't do that even when infile and outfile is same file.

So I think creating a temporary file and rename on success is the right way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's rude, github... I had written a response and pressed "Resolve" and it didn't post my comment. Anyway the gist was:

[...]

There may be other cases that samefile() don't work as your expect.

Thanks for the counterexample the samefile() docs clearly link to the os.stat cases, I should have followed them.

Additionally, if we don't load the whole file at once, we shouldn't do that even when infile and outfile is same file.

So I think creating a temporary file and rename on success is the right way.

And while I think the tempfile idiom/technique has valid applications, the aim I had with #29273 was to implement what sponge and sort -o do, with little added complexity. Having seen the mess tools like pip and pipenv make of my /tmp, I feel uncomfortable introducing it into a small tool like json.tool.

So like you said

json.tool is a toy/demo program. So I don't think we need to support every case.

I've closed this PR. I'm unsure what should happen with the bpo. Feels a bit bad to just scratch my own itch and then drop the ball, but this rabbit hole goes a bit deeper than I initially judged. Thanks again for your time and patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review skip news stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants