RFC/CI: use Shippable for testing on ARM v8 architecture #13073
Can we use one Cron job to package a wheel and save the environment
somewhere, and another to run the tests?
|
That seems feasible. |
You can clear the build cache at the start of the build job, store the freshly built wheel in the cache at the end of that job, and reuse the cached wheel afterwards. |
So if the cached wheel is less than n hours old, use it to run the tests,
and if more than n hours, compile a new one?
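
A minimal sketch of that freshness check, assuming the cached wheel is an ordinary file whose modification time reflects when it was built. The function name `wheel_is_fresh` and its parameters are hypothetical and not part of any CI service's API:

```python
import time
from pathlib import Path


def wheel_is_fresh(wheel_path, max_age_hours, now=None):
    """Return True if the cached wheel exists and is younger than max_age_hours.

    `now` is a Unix timestamp; it defaults to the current time and is only
    exposed so the check can be exercised deterministically.
    """
    path = Path(wheel_path)
    if not path.is_file():
        # No cached wheel at all: a new one has to be compiled.
        return False
    now = time.time() if now is None else now
    age_seconds = now - path.stat().st_mtime
    return age_seconds < max_age_hours * 3600
```

A job could call this first and compile a new wheel only when it returns False.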
|
Kinda but without automation for the checking. I would set up 2 distinct CRON jobs with a 1-hour difference. The first one to build (erasing the cache first), the second one to test the version which has been stored in the cache. |
Sounds good too. Do we also consider splitting the test suite in half?
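
One way the split could look, as a rough sketch: sort the test files and deal them out round-robin, so each job gets a deterministic, non-overlapping share. The helper `select_shard` and the file names in the usage note are illustrative assumptions, not an existing scikit-learn or pytest feature:

```python
def select_shard(test_paths, shard_index, n_shards=2):
    """Return the subset of test_paths assigned to one shard.

    Paths are sorted first so that every job computes the same assignment,
    then distributed round-robin across the shards.
    """
    if n_shards < 1 or not 0 <= shard_index < n_shards:
        raise ValueError("shard_index must satisfy 0 <= shard_index < n_shards")
    return sorted(test_paths)[shard_index::n_shards]
```

For example, with hypothetical files `["b.py", "a.py", "c.py"]`, shard 0 gets `["a.py", "c.py"]` and shard 1 gets `["b.py"]`. Round-robin balances the number of files, not their run time, so the two halves may still take different amounts of time.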
|
Maybe using ccache could help some.
Sounds like we are almost there. Would pytest-xdist help enough to get it finished in 1 job (assuming there are multiple cores on the ARM VM)? |
There were a few PRs to make tests faster, so this Shippable build might now complete. Also, those VMs have 96 CPU cores, but I don't know how many we can access. It would be interesting to run python -c "import joblib; print(joblib.effective_n_jobs())" there. If it's more than 1, can now use |
Also pytest-xdist is handy for catching segfaults in tests as well. |
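
As a rough sketch of how the worker count could be chosen, the snippet below counts the CPUs the process may actually use and builds a pytest command line with the pytest-xdist `-n` option only when more than one is available. It uses the standard library rather than joblib to stay self-contained; the helper names and the `max_workers` cap are assumptions for illustration:

```python
import os


def available_cpus():
    """Number of CPUs this process is allowed to run on (at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity limits, which may be lower than
        # the number of cores physically present on the machine.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def pytest_command(target, max_workers=None):
    """Build a pytest invocation, adding xdist workers when useful."""
    n_workers = available_cpus()
    if max_workers is not None:
        n_workers = min(n_workers, max_workers)
    cmd = ["python", "-m", "pytest", target]
    if n_workers > 1:
        # -n is provided by the pytest-xdist plugin.
        cmd += ["-n", str(n_workers)]
    return cmd
```

The returned list can be passed to `subprocess.run`; pytest-xdist also accepts `-n auto` to pick the number of workers itself.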
Actually conda-forge is doing this on Azure, though I don't see aarch in the public Agent pools. Not sure how they got it, but there is a possibility of asking for it as well. |
They are using |
OK, so it's an Azure VM that runs Docker, that runs QEMU that emulates ARM. No wonder the test suite is taking 2 hours 20min. If we can get real ARM hardware that would be best, and preliminary results above indicate that it's at least 2x faster. |
I was considering using GitLab CI for GitHub and configuring GitLab runners on our own Raspberry Pis (taking advantage of the new Raspberry Pi 4). |
FYI: Drone.io ( https://drone.io/ ) have an AArch64 and AArch32 build service that might be helpful. |
Hi, since manylinux2014 added Linux AArch64 support, I am interested in building and uploading Linux AArch64 wheels to PyPI. Please let me know if you are interested. Thanks |
MacPython/scikit-learn-wheels#61 could be one way to do it long term. But we would still need a CI on the main repo. drone.io seems promising (in particular, it is used by conda-forge). If we can manage to build scikit-learn and run the tests there, it would already be a step forward. CI is likely a prerequisite for building wheels. A PR to investigate would be very welcome; you should be able to activate that CI on your fork. |
@rth, I will work on it and update you. |
Azure Pipelines now supposedly supports the Linux/ARM64 platform as announced here: https://azure.microsoft.com/en-in/updates/azure-devops-pipelines-introduces-support-for-linuxarm64/ However I could not find any reference for this platform in the documentation. Maybe it's just a matter of trying to use it. |
Looking more into it, I think Azure added an ARM64 agent so that one can run a self-hosted ARM agent. I do not think Microsoft is hosting any ARM agents themselves. Here is a list of Microsoft-hosted agents. |
It's not explicitly mentioned here, but Travis CI also has some ARM CI I think. |
I recently checked the Shippable CI service, which offers some ARM architectures to test on. I tested it on my fork to check the required configuration. However, there is a one-hour timeout, and we are currently hitting this limit. Here is the log of such a build:
https://app.shippable.com/github/glemaitre/scikit-learn/runs/24/1/console
On ARM 32 bits, we have a "Bus error (core dumped)":
https://app.shippable.com/github/glemaitre/scikit-learn/runs/24/2/console
I already cached the numpy, scipy, and cython wheels. Even so, we still spend 15 minutes building the scikit-learn wheel and the remaining time running the tests (the tests almost complete, at ~99%).
So I was wondering if it would be interesting to add this CI at least in a CRON job. Then, regarding the timeout constraint, we might have different strategies to overcome it:
I would be happy to have some thoughts.