Skip to content

Popen move from wait to communicate to avoid deadlock #5302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

franzhaas
Copy link
Contributor

@franzhaas franzhaas marked this pull request as draft January 1, 2025 20:51
@franzhaas
Copy link
Contributor Author

I do not understand why the second last run failed, or why the last changes fixed it...

Thus said, I believe this is good enough for review now.

@franzhaas franzhaas marked this pull request as ready for review January 2, 2025 07:34
@pekkaklarck
Copy link
Member

Switching from wait to communicate is great. Some quick comments:

  • The added test is currently not run at all, there needs to be a test on the atest/robot side for running it. It probably could be added to some existing suite and also the test under atest/testdata moved to some existing suite. See atest/README.rst for details and ask help on #devel if needed.
  • This can be added earliest in RF 7.3. I'll add Process: Avoid deadlock when standard streams are not redirected to files #4173 to its scope.
  • The code needs to be cleaned up. I'll review it properly once the test is added and we start RF 7.3 development.

@franzhaas franzhaas marked this pull request as draft January 3, 2025 21:02
@franzhaas
Copy link
Contributor Author

I added the test and made the change more invasive, leading to clearer code. I documented the two spots with the most wiredness...

@franzhaas franzhaas marked this pull request as ready for review January 4, 2025 10:25
@franzhaas
Copy link
Contributor Author

@pekkaklarck can we up-prioritise this PR? It solves 2 issues, the deadlock and the timeout issue.

I tried to balance the intrusiveness of the patch with code quality, but maybe some more aggressive usage of "communicate" can simplify this code for the cost of an invasive changeset. Maybe this is welcome in the long run, maybe for the 8.0 target?

@pekkaklarck
Copy link
Member

I tested the code and it fixes both #4173 and #5345 which is awesome. The code is rather complicated, though, and there are some style issues like accessing private attributes of the ExecutionResult object. While testing the implementation locally, I was able to simplify the code a lot. Now the options are:

  1. I write a code review so that you can fix the issues.
  2. I update the PR with my changes and merge it.
  3. I simply commit my changes and close this PR.

Because the fix is based on your solution, you'd be acknowledged in the release notes in all these cases.


_take_stdout = (process.stdout and (not process.stdout.closed))
_take_stderr = (process.stderr and (not process.stderr.closed))

Copy link
Member

@pekkaklarck pekkaklarck Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style:

  • We don't generally use empty line inside methods. If methods get so long that they need to be grouped by empty lines, we prefer introducing helper methods. In this particular case empty lines could just be removed.
  • We don't use _ prefix with local variables.
  • All parentheses are unnecessary.

# This is needed to handle stdin closes identically to how it was when the
# wait function was used.
_stdin = process.stdin
process.stdin = process.stdin if (process.stdin and not process.stdin.closed) else io.StringIO("")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that Popen.communicate() doesn't like stdin to be a closed PIPE, but there can also be problems if stdout and stderr are closed PIPEs. How they are all handled is pretty inconsistent and I submitted a bug about that:
python/cpython#131064

To be consistent with our old Popen.wait() based code, we need to handle all streams. Closed streams can be replaced with None instead io.StringIO(""), though.

# this is to allow the process to be aborted when on windows
# an exception is used to mange the test timeout situation
(_stdout, _stderr,) = process.communicate(timeout=0.1)
break
Copy link
Member

@pekkaklarck pekkaklarck Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't related to #4173 anymore and should rather be submitted as a separate PR. It also needs a separate test.

In my opinion try/except would make the code easier to understand that contextlib.suppress.


for f in [process.stdout, process.stderr] + result._custom_streams:
with contextlib.suppress(AttributeError):
f.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing _private members of the result object like the above code does is not ok. Making result.stdout property settable is a better solution.

@pekkaklarck
Copy link
Member

I looked at this again, and noticed that the change broke a case where only stdout or stderr is a PIPE and it is closed externally. For example, the following test succeeds with the old code but now with this one:

*** Test Cases ***
Example
    ${p}=    Start process    python    -V    stderr=DEVNULL
    Call method    ${p.stdout}    close
    ${r}=    Wait for process
    Log to console    ${r.stdout}

I have fixed the problem locally and added a test for it. I've also fixed other issues I mentioned in the review, and it would probably be easiest if I committed my changes and closed this PR. I'll wait until tomorrow with that in case you want to fix the issues yourself.

Regardless how we proceed, big thanks for the PR. It proved that the deadlock with PIPEs can be avoided.

@franzhaas
Copy link
Contributor Author

Sorry, I was unavailable the last few days. Sure, go ahead and commit + push if you already have the changes available.

Most of the things mentioned here I have no strong opinion about, except for the 'close stdout' issue, which should be fixed.

pekkaklarck added a commit that referenced this pull request Mar 11, 2025
The code is based on PR #5302 by @franzhaas, but contains some cleanup,
doc updates, and fix for handling closed stdout/stderr PIPEs.

Fixes #4173.
@pekkaklarck
Copy link
Member

pekkaklarck commented Mar 11, 2025

I committed my changes in f2f2f99. Notice that those changes don't contain the fix for timeouts (#5345) but I'll fix that separately. Huge thanks for your work @franzhaas!

pekkaklarck added a commit that referenced this pull request Mar 11, 2025
The fix is based on the code in PR #5302 by @franzhaas.

Fixes #5345.

Now that `Popen.communicate()` gets a timeout, its implementation
always gets to a code path where stdtout/stderr being a closed PIPE
doesn't matter and we only need to care about stdin. As the result
content written to PIPEs that are later closed is now available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants