RFC set up Codespaces to ease contributor experience especially during sprints? #31091

Open
lesteve opened this issue Mar 27, 2025 · 7 comments

@lesteve
Member

lesteve commented Mar 27, 2025

IMO this could be useful as a fallback during sprints, in particular for pesky company Windows laptops, where I (and others, for example @adrinjalali and @glemaitre) have been guilty of debugging the Windows situation rather than focussing on more important stuff 😅.

Try it on my fork https://github.com/lesteve/scikit-learn

Full disclosure: for some reason, this does not work for me on Firefox; I need to use a Chromium-like browser (Vivaldi works, for example). This may be due to my add-ons, I am not sure.

The setup seems quite maintainable; see the current diff main...lesteve:scikit-learn:main. It could be tweaked, for example to set up ccache if we insist, but I think it is good enough as is.

I tried it on my fork on a 2-core machine (default):

  • build time from scratch: ~7 minutes
  • run full test suite: ~13 minutes (timings are similar with or without -n2)
  • doc build with make html-noplot (i.e. without running the examples): ~9 minutes the first time, ~1 minute the second time

Pricing: 120 core hours and 15 GB of storage are free per month. With the default 2-core machine, that is 60 hours of use per month, which is probably enough for sprints. See doc for more details.
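
As a rough back-of-the-envelope check of these numbers, here is a small sketch. It assumes usage is counted as wall-clock hours multiplied by the number of cores, and it ignores storage and idle time; the variable and function names are only illustrative.

```python
# Figures quoted in this issue (free quota and timings on the default machine).
FREE_CORE_HOURS_PER_MONTH = 120
CORES = 2  # default Codespaces machine used for the timings above

BUILD_MIN = 7        # build from scratch
TEST_MIN = 13        # full test suite
DOCS_FIRST_MIN = 9   # first `make html-noplot`


def wall_clock_hours(core_hours: float, cores: int) -> float:
    """Hours a machine with `cores` cores can run on a core-hour budget.

    Assumption: usage is billed as wall-clock hours times number of cores.
    """
    return core_hours / cores


hours = wall_clock_hours(FREE_CORE_HOURS_PER_MONTH, CORES)
first_session_min = BUILD_MIN + TEST_MIN + DOCS_FIRST_MIN
core_hours_used = first_session_min / 60 * CORES

print(f"{hours:.0f} wall-clock hours per month on {CORES} cores")
print(f"build + tests + docs: {first_session_min} min, "
      f"about {core_hours_used:.2f} core hours")
```

Under these assumptions, a first build, full test run and doc build together use about one core hour of the monthly quota.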

I guess we may want to add light documentation about it somewhere.

Previous related conversations:

@github-actions github-actions bot added the Needs Triage Issue requires triage label Mar 27, 2025
@lesteve lesteve added RFC and removed Needs Triage Issue requires triage labels Mar 27, 2025
@glemaitre
Member

At PyParis, we spoke with someone from GitHub who mentioned that we could request more credits for participants ahead of time if needed. I would not be against having this solution as a plan B if nothing else works.

@adrinjalali
Member

I would be happy to have this option, and I would even use it as a plan A for people who come to a sprint without setting up their machine.

@lesteve
Member Author

lesteve commented Mar 28, 2025

Thanks for the feedback. I opened a draft PR with the needed .devcontainer folder: #31096. I will look into adding some short doc next week.

@ogrisel
Member

ogrisel commented Mar 28, 2025

The timings you quote in the description look good enough to be a useful fallback. +1 for adding minimal documentation to your PR.

@betatim
Member

betatim commented Mar 31, 2025

I think this will be useful for getting users started without the negative experience of spending half the sprint setting things up.

One thing we could ponder is whether we can reduce the "time to working container" in some way. Basically, instead of downloading mamba and creating an environment when the devcontainer launches, can we do this ahead of time? Maybe with a custom Docker image? Or maybe there are options within the GitHub Codespaces experience to have "pre-launched" containers? Not something we need to do now, or something that is worth it if it adds a lot of complexity. As a side note: I rarely update the environment I use locally to do scikit-learn development, so prebuilding it for the container seems like it would work.

Maybe we can attract someone who is a devcontainers geek who can help with this.

edit: Codespaces also doesn't work for me in Firefox :( I think you can use devcontainers for local development as well, if you use VS Code (or maybe other editors as well?). Maybe that would also help Windows users, assuming that getting Docker to work is easy??

@ogrisel
Member

ogrisel commented Mar 31, 2025

I think it worked for me on Firefox (macOS). I tried on @lesteve's fork.

What is implemented in #31096 is good enough for a start, and then we can incrementally optimize the start-up time in follow-up PRs.

@thomasjpfan
Member

thomasjpfan commented Apr 1, 2025

I agree we can get started with a simple #31096.

> Not something we need to do now, or something that is worth it if it adds a lot of complexity.

Optimizing the devcontainer startup time does add complexity. GitHub does have prebuilds, but I cannot tell whether prebuilds on this repo can be shared with forks. If a user can go to the main scikit-learn/scikit-learn repo, click on "Open codespaces", and then push to their fork, then maybe they could use our GitHub prebuilds. We should test it out once #31096 gets merged.


If we cannot use prebuilds, we need a public custom base image for the dev container. Here are the optimization strategies, in order of increasing complexity:

  1. Build a custom base image with our dev dependencies. We can rebuild the image weekly, because these do not change that frequently.
  2. In our custom base image, install ccache and compile scikit-learn. This means forks can use this image and build scikit-learn faster. I have tried this before with Gitpod and SciPy, and it works.
  3. In our custom base image, build the docs to cache the docs build. We likely need to mess with the mtime of files to get Sphinx to recognize the cache when using the devcontainer in a fork. (I have not figured out the right way to do this 😅)
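
For the mtime part of strategy 3, one possible direction is to reset the modification time of the doc sources to the same fixed value both when the image is built and after checking out a fork, so that the timestamps compare equal. This is only a sketch of the file-touching step and an assumption on my side: whether Sphinx then reuses the cached build has not been verified in this thread. The helper name `normalize_mtimes`, the suffix list, and the timestamp are all hypothetical.

```python
import os
from pathlib import Path


def normalize_mtimes(root, timestamp, suffixes=(".rst", ".py")):
    """Set access and modification times of matching files under `root`.

    Hypothetical helper: every file whose suffix is in `suffixes` gets
    `timestamp` (seconds since the epoch) as both atime and mtime.
    Returns the number of files that were updated.
    """
    count = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            os.utime(path, (timestamp, timestamp))
            count += 1
    return count


# Example use (not run here): apply the same fixed timestamp in the image
# build and in the fork checkout.
# normalize_mtimes("doc", 1_700_000_000)
```

The fixed timestamp is arbitrary; what matters for this idea is that the same value is used in both places.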


6 participants