Fix the stalled linux/arm64 [cd build] jobs on travis #20958
The root cause of the problem seems to be described here: https://travis-ci.community/t/builds-hang-with-output-truncated-mid-line/7611/9 (currently reading the discussion).
OK, apparently my workaround works, for some reason I do not really understand...
Work around travis arm64 buffering issues that cause the job to be wrongly detected as stalled by travis after 10 min.
636605c
to
b0aa4e4
Testing wheels used to take 961.99s before tuning parallelism using |
Having a look at the logs, the issue is apparently a stalled test, is it not?
It could be but we cannot be sure because the std output buffer is truncated. And it often happens that the stalling happens when Others suspect in https://travis-ci.community/t/builds-hang-with-output-truncated-mid-line/7611/9 that the buffering of std output on travis linux/arm64 workers is actually the cause of the problem: because travis does not see output, it detects the build as stalled even though it is not. |
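To illustrate the buffering hypothesis discussed above, here is a minimal sketch (not the change made in this PR) of running a Python child process with output buffering disabled and forwarding each line as soon as it arrives, so that a CI watchdog keeps seeing output. `PYTHONUNBUFFERED` is a standard CPython environment variable; the helper name `run_unbuffered` is hypothetical.

```python
import os
import subprocess
import sys


def run_unbuffered(cmd):
    """Run cmd, echo its output line by line, return (exit code, lines).

    Hypothetical helper for illustration only.
    """
    # PYTHONUNBUFFERED=1 asks a Python child process not to buffer
    # stdout/stderr (equivalent to running it with `python -u`).
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        env=env,
    )
    lines = []
    for line in proc.stdout:
        # Forward each line immediately and flush our own stdout too,
        # so nothing sits in a buffer on this side either.
        sys.stdout.write(line)
        sys.stdout.flush()
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines


if __name__ == "__main__":
    code, _ = run_unbuffered([sys.executable, "-c", "print('hello')"])
    sys.exit(code)
```

Whether this kind of change is what actually fixed the stalled jobs is not established in this thread; it only shows what "disabling Python buffering" means in practice.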
This reverts commit 1067a4d.
I think parallelism is well tuned now. ~7 min for each build and they mostly happen in parallel. This is much faster than the circleci sequential build. Let me re-enable the other CIs.
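As a rough sketch of what tuning test parallelism can look like, the snippet below picks a worker count and builds a pytest-xdist command line (`-n` is the pytest-xdist option for the number of workers). The values used in this PR are not shown in the thread, so the function names and the fallback logic here are assumptions for illustration; only the `CPU_COUNT` variable name comes from the discussion.

```python
import os


def worker_count(env=None):
    """Return the number of test workers to use (hypothetical helper).

    Assumption: a CPU_COUNT environment variable, when set, overrides
    the number of CPUs detected on the machine.
    """
    env = os.environ if env is None else env
    override = env.get("CPU_COUNT")
    if override:
        # Never go below one worker, even if the override is 0.
        return max(1, int(override))
    # os.cpu_count() may return None on some platforms.
    return os.cpu_count() or 1


def pytest_command(n_workers):
    """Build a pytest invocation that uses pytest-xdist with n_workers."""
    return ["python", "-m", "pytest", "-n", str(n_workers)]


if __name__ == "__main__":
    print(" ".join(pytest_command(worker_count())))
```

Keeping the override in an environment variable makes it easy to lower the value per CI platform without touching the test script itself.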
I am not sure which change helps the most to fix the original problem. I think the first time I got it to work prior to rebasing was after moving the pip install inside the test script. I am not sure if disabling Python buffering and removing the But anyways it seems to work and more importantly it is fast! |
When squash-merging this PR, it would be nice to put [cd build] in the commit message so as to trigger a fake weekly build for linux/arm64 on the
The failure is unrelated and being addressed in #20963.
LGTM.
Thank you @ogrisel for working on this.
Just a question: have you tried the new CPU_COUNT without the || travis_terminate statements?
I think I had one that worked with a lower value of
Merged to make it easier to move forward with the release process.
@adrinjalali you might want to backport this to the
tagging #20965
As an alternative to #20711 that might be too slow because we only have one concurrent executor on the free circle ci account.