-
-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: prevent celery from hanging due to spawned greenlet errors in greenlet drainers #9371
base: main
Are you sure you want to change the base?
fix: prevent celery from hanging due to spawned greenlet errors in greenlet drainers #9371
Conversation
…anual restart Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com> Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com>
…ckend-connection-error-handling
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9371 +/- ##
==========================================
+ Coverage 78.24% 78.30% +0.06%
==========================================
Files 153 153
Lines 19040 19056 +16
Branches 2520 2523 +3
==========================================
+ Hits 14898 14922 +24
+ Misses 3856 3848 -8
Partials 286 286
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com> Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com>
13e673e
to
0cfb89d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests to make sure the changes are behaving as expected 🙏
Thank you
Hi @Nusnus, thank you for looking at this PR. 🙏 I'm new here, and Python isn't my primary language. I may have rushed this. Let me make sure that I can run this locally and add the tests. As I look into this would you be able to take a look at the approach here and help confirm if the changes here are the right behavior? |
Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com> Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com>
Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com> Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com>
Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com> Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com>
Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com> Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com>
…est_EventletDrainer Co-authored-by: Jack <57678801+mothershipper@users.noreply.github.com> Co-authored-by: Linus Phan <13613724+linusphan@users.noreply.github.com>
quick update: we think that the PR is ready for a first pass review. We won't be pushing any more commits at the moment. We also think that code coverage lint is cached or something because we're seeing a check warning stating that "line #L150 was not covered by tests" in The only uncovered part is where this discussion thread is at. Happy to add tests if we reach a conclusion about the specific error and whether this is the approach that we'd like to take here. |
Hey @Nusnus don't mean to be a bother, but is there anything we can do to help move this along? We're using this fork in production, but would love to upstream it (or get this into a state that you'd be willing to take on). Thanks! |
I'll give it another look this week, let's see what I can do to help |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add some integration test for the change, please?
@auvipy / @thedrow happy to take a look at smoke tests, but is there anything in particular you're looking for coverage on? The only behavior change should be for users that are already on a non-happy path, the happy path shouldn't have changed. My gut says this may be tough to test without stubbing as it'd require killing redis from within the testing process (and then restarting it for other tests?). Let me know your thoughts, we'll still take a look today either way! |
I stand corrected, does look like you have support for initing / killing containers in the smoke tests already -- we'll see what we can do :) |
Check out the pytest-celery docs for the smoke tests: https://pytest-celery.readthedocs.io |
CI Issues fixed, 100% passing now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we improve the test coverage?
@auvipy I've been struggling to get the smoke tests to run locally, and don't want to spam you all/force CI to run broken tests while we figure it out. I've got a bit of time over the holiday break to make a second attempt, but I think we may be close to (or hitting) our limit in terms of being able to contribute effectively here :/ For what it's worth, I don't believe the codecov comment on this PR is accurate, I think it was cached based on earlier commits to the branch that didn't add tests. |
What issues are you having? |
@Nusnus how do you all recommend running the smoke tests on local? Was kind of hoping there'd be a Where I left off was having permission issues in the container when invoking tox, need to figure out which dirs need to be mounted as writable and hope that's the last blocker to execution. Haven't spent a much time trying to get the tests to run outside docker, but it seems like some of the deps are missing prebuilt wheels for apple silicon and need extra system deps -- didn't really want to install anything into my env unless really necessary. |
Fixes #4857
Description
This PR fixes an issue where an error raised in the spawned greenlet from the greenlet drainer causes the drainer to stop retrieving task results. Currently, these errors are only logged, making it difficult for clients to handle the situation effectively. As a result, a client may wait indefinitely for task results that will never be fetched, since the greenlet from the drainer has already stopped running. This change ensures that an error is thrown back to clients, enabling them to handle the error appropriately, such as by exiting or restarting the process.
Here are the steps that we took in our testing to reproduce the issue:
gevent
workers that allow for multiple connections.delay
method.