Increase consumer group test timeout #187
base: master
Conversation
In my opinion, it's slower than it used to be. If you re-run the tests, it eventually passes, but it's still strange. I suspect something changed internally with GitHub's CI, but I need to review things further.
I started to observe this behavior after merging #184, but I can't imagine anything in there being causally related. Maybe I should rethink how brokers are spun up for each CI test.
Test failed again after increasing the timeout. This time I reverted
Hmm, seems like the revert didn't fix it. Thanks for checking, though. I'm baffled that this is now an issue.
I have a suspicion that the test failure might be related to the CPU time available in the GitHub workflow. So I started an Ubuntu 24.04 VM on GCP using VM type n2d-standard-4 (4 vCPU, 2 cores, 16 GB RAM), and all tests pass (master branch). But if I limit the CPU to 0.1 core (using cgroup), the following tests fail.
I am still investigating and don't have enough evidence at the moment. I have also started testing on e2-micro (0.25-2 vCPU, 1 shared core, 1 GB RAM) and will update the results here later.
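For reference, a minimal sketch of one way to apply that kind of CPU cap with cgroup v2 on Ubuntu 24.04; the group name and the tox invocation are illustrative, and cpu.max takes MAX and PERIOD in microseconds, so 10000/100000 allows roughly 0.1 core:

```sh
# Create a cgroup v2 group and cap it at ~0.1 core (10 ms of CPU time per 100 ms period).
sudo mkdir -p /sys/fs/cgroup/ci-limit
echo "10000 100000" | sudo tee /sys/fs/cgroup/ci-limit/cpu.max

# Move the current shell into the group; child processes (the test run) inherit the limit.
echo $$ | sudo tee /sys/fs/cgroup/ci-limit/cgroup.procs
tox
```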
- n2d-standard-4 limited to 1 core
- n2d-standard-4 limited to 0.5 core: passed twice
- e2-micro, 1st run
- e2-micro, 2nd run
Most of them raised Result: Inconclusive. I'm not familiar with tox, but is there a way to run a single test? I tried something like
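For reference, assuming the project's tox.ini forwards {posargs} to pytest (a common setup, not verified here), a single test can usually be selected like this; the py312 environment name is illustrative:

```sh
# Everything after "--" is passed through to pytest as {posargs}.
tox -e py312 -- test/test_consumer_group.py::test_group

# Or, inside an already provisioned virtualenv, call pytest directly:
pytest test/test_consumer_group.py::test_group
```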
I am using this PR as a testing ground, but I have submitted #192; I assume it will fix part of the problem.
This PR still fails because
I haven't figured out why java dies in ... I guess the GitHub runner may not have enough memory, and I can reproduce the slowness on resource-constrained VMs when there are too many Kafka brokers running.
I agree. As a solution, my goal has been to run one Kafka instance per test in order to conserve memory. I was planning on tinkering with the concurrency groups described in https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/using-concurrency to improve this.
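For reference, a minimal sketch of the workflow-level concurrency stanza from the linked documentation; the group expression is illustrative and not taken from this repository's workflow:

```yaml
# Runs that share the same concurrency group execute one at a time;
# cancel-in-progress stops a superseded run instead of letting it queue.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```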
This reverts commit f76b6d4.
Force-pushed from 7d8bfac to 1608037.
^ I cannot reproduce the timeout in my environment (Kafka 0.8.2.2, Python 3.12, after test/test_partitioner.py, before test/test_producer.py). I have updated this branch to trigger the test to find out... |
Thank you for your meticulous investigation, I really do appreciate it. It's been troubling me why this has become an issue over the past month. Possibly Microsoft scaling down runner resources as a cost-cutting measure?
No worries. I am not sure. On paper, public repo runners have plenty of resources, but I am not sure what's behind the scenes, and I never tried to benchmark it... Note:
Test test/test_consumer_group.py::test_group failed. Increase the timeout to find out whether it is just slow or a real failure.