RFC/CI: use Shippable for testing on ARM v8 architecture #13073

Closed
glemaitre opened this issue Jan 31, 2019 · 20 comments · Fixed by #17996

Comments

@glemaitre
Member

I recently checked the Shippable CI service, which offers ARM machines to test on. I tried it on my fork to work out the required configuration. However, builds time out after one hour, and we currently hit this limit. Here is the log of such a build:

https://app.shippable.com/github/glemaitre/scikit-learn/runs/24/1/console

On ARM 32 bits, we have a "Bus error (core dumped)":

https://app.shippable.com/github/glemaitre/scikit-learn/runs/24/2/console

I already cached the numpy, scipy, and cython wheels. Then, we still spent 15 minutes to make the scikit-learn wheel and the remaining time for the tests (the tests almost complete ~99%).

So I was wondering if it would be interesting to add this CI at least in a CRON job. Then, regarding the timeout constraint, we might have different strategies to overcome it:

  • Build nightly wheels, including some for the ARM target, and fetch those in Shippable instead of building from source.
  • Ask Shippable whether they can lift the timeout constraint (they offer 120 minutes for paid plans), and offer to use this CI only for a CRON job and not on every PR or push to master.
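
For the first option, the install step of the CI job could tell pip to refuse source builds and pick up prebuilt wheels from a separate location. This is only a minimal sketch: the wheel URL and the helper name are hypothetical (no ARM wheel index is confirmed in this thread), while `--pre`, `--only-binary`, and `--find-links` are standard pip options.

```python
import sys

# Hypothetical location of nightly ARM wheels; this URL is an assumption
# made for illustration, not an existing index.
NIGHTLY_WHEELS_URL = "https://example.org/nightly-wheels/"


def pip_install_prebuilt(package, find_links=NIGHTLY_WHEELS_URL):
    """Build a pip command that installs `package` from wheels only."""
    return [
        sys.executable, "-m", "pip", "install",
        "--pre",                     # allow pre-release (nightly) versions
        "--only-binary", ":all:",    # never fall back to building from source
        "--find-links", find_links,  # also look for wheels at this location
        package,
    ]


if __name__ == "__main__":
    print(" ".join(pip_install_prebuilt("scikit-learn")))
```

With `--only-binary :all:`, a missing ARM wheel makes the install fail quickly instead of silently spending build time compiling from source.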

I would be happy to have some thoughts.

@jnothman
Member

jnothman commented Jan 31, 2019 via email

@glemaitre
Member Author

That seems feasible.

@glemaitre
Member Author

You can clear the build cache at the start of the build, store the build in the cache at the end of the job, and reuse the cache afterwards.

@jnothman
Member

jnothman commented Jan 31, 2019 via email

@glemaitre
Member Author

glemaitre commented Jan 31, 2019

Kinda but without automation for the checking.

I would set up 2 distinct CRON jobs, 1 hour apart: the first one builds (erasing the cache first), and the second one tests the version stored in the cache.

@jnothman
Member

jnothman commented Jan 31, 2019 via email

@rth
Member

rth commented Feb 4, 2019

Then, we still spent 15 minutes to make the scikit-learn wheel and the remaining time for the tests

Maybe using ccache could help some.

and the remaining time for the tests (the tests almost complete ~99%).

Sounds like we are almost there. Would pytest-xdist help enough to get it finished in 1 job (assuming there are multiple cores on the ARM VM)?

@rth
Member

rth commented Jun 24, 2019

There were a few PRs to make tests faster, so this Shippable build might now complete.

Also, those VMs have 96 CPU cores. I don't know how many we can access, so it would be interesting to run

python -c "import joblib; print(joblib.effective_n_jobs())"

there. If it's more than 1, we can now use pytest-xdist with pytest -n <n-cpu>, which might also help.
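
The check and the resulting pytest call could be wired together roughly as follows. This is a sketch under assumptions: it uses only the standard library rather than joblib, the helper names are made up for illustration, the `sklearn` test target is assumed, and the CPU count it reports may not reflect container-level limits on the CI machine.

```python
import os


def usable_cpu_count():
    """Return the number of CPUs this process is allowed to use."""
    # sched_getaffinity reflects CPU pinning on Linux; it is missing on
    # some platforms, so fall back to the machine-wide count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def pytest_command(n_workers):
    """Build the pytest invocation, enabling pytest-xdist only if useful."""
    cmd = ["pytest", "sklearn"]  # test target is an assumption
    if n_workers > 1:
        cmd += ["-n", str(n_workers)]  # -n is provided by pytest-xdist
    return cmd


if __name__ == "__main__":
    print(" ".join(pytest_command(usable_cpu_count())))
```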

@rth
Member

rth commented Jun 24, 2019

Also pytest-xdist is handy for catching segfaults in tests as well.
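
The likely reason is that pytest-xdist runs tests in separate worker processes, so a crash takes down a worker rather than the whole session. The sketch below illustrates only that general principle with the standard library; it is not how pytest-xdist is implemented, the helper name is hypothetical, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program that kills itself with SIGSEGV, standing in for a test
# that crashes inside compiled code.
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_isolated(code):
    """Run `code` in a separate interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    status = run_isolated(CRASHING_CHILD)
    # On POSIX, a negative return code -N means "terminated by signal N".
    if status < 0:
        print("child crashed with", signal.Signals(-status).name)
```

The parent process survives the crash and can report which child failed, which is what makes a segfault in one test attributable instead of ending the run with a bare "core dumped".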

@rth
Member

rth commented Jun 25, 2019

Actually conda-forge is doing this on Azure, though I don't see aarch in the public Agent pools. Not sure how they got it, but there is a possibility of asking for it as well.

@thomasjpfan
Member

They are using vmImage: ubuntu-16.04 with condaforge/linux-anvil-aarch64 as the Docker image. We can set up a similar test environment here for ARM.

@rth
Member

rth commented Jun 25, 2019

OK, so it's an Azure VM that runs Docker, that runs QEMU that emulates ARM. No wonder the test suite is taking 2 hours 20min.

If we can get real ARM hardware that would be best, and preliminary results above indicate that it's at least 2x faster.
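
As a rough check of that figure, assume the emulated run takes the 2 hours 20 minutes mentioned here and that the native Shippable run would finish close to its one-hour limit (it reached about 99% of the tests). Both numbers come from this thread, but treating the two workloads as comparable is an assumption.

```python
emulated_minutes = 2 * 60 + 20  # QEMU-emulated run time quoted in this thread
native_minutes = 60             # assumed: native run ends near the 1-hour limit

speedup = emulated_minutes / native_minutes
print(f"native ARM is roughly {speedup:.1f}x faster")
```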

@thomasjpfan
Member

I was considering using GitLab CI for GitHub and configuring GitLab runners on our own Raspberry Pis (and taking advantage of the new Raspberry Pi 4).

@rhenwood-arm

FYI: Drone.io ( https://drone.io/ ) have an AArch64 and AArch32 build service that might be helpful.

@odidev

odidev commented Jul 1, 2020

Hi,

Since manylinux2014 has added Linux AArch64 support, I am interested in building and uploading Linux AArch64 wheels to PyPI. Please let me know if you are interested.

Thanks

@rth
Member

rth commented Jul 1, 2020

MacPython/scikit-learn-wheels#61 could be one way to do it long term. But we would still need a CI on the main repo. drone.io seems promising (it is used by conda-forge in particular). If we can manage to build scikit-learn and run the tests there, it would already be a step forward. CI is likely a prerequisite for building wheels. A PR to investigate would be very welcome; you should be able to activate that CI on your fork.

@odidev

odidev commented Jul 2, 2020

@rth, I will work on it and update you.

@ogrisel
Member

ogrisel commented Jul 24, 2020

Azure Pipelines now supposedly supports the Linux/ARM64 platform as announced here:

https://azure.microsoft.com/en-in/updates/azure-devops-pipelines-introduces-support-for-linuxarm64/

However I could not find any reference for this platform in the documentation. Maybe it's just a matter of trying to use it.

@thomasjpfan
Member

Looking more into it, I think Azure added an ARM64 agent so that one can run a self-hosted ARM agent. I do not think Microsoft is hosting any ARM agents themselves. Here is a list of Microsoft-hosted agents.

@rth
Member

rth commented Jul 24, 2020

It's not explicitly mentioned here, but Travis CI also has some ARM CI I think.

7 participants