Skip to content

Devcontainer configuration added #27743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

brunnelu
Copy link

@brunnelu brunnelu commented Nov 7, 2023

Reference Issues/PRs

None

What does this implement/fix?

At a scikit-learn sprint many attendees have problems getting started with contributing and setting up their environment. Be it to clone a the repo, installing dependencies, ...

The github codespaces as an alternative are a possibility to speed this part up, such that most time can be used going through code/ updating docs.

Any other comments?

If this is even interesting or should be supported can be debated.

At least for updating docs it might be simpler then having people go through the contributing guide.

This PR probably is only helpful to people contributing once or a few times at most.

Copy link

github-actions bot commented Nov 7, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d20e037. Link to the linter CI: here

@thomasjpfan
Copy link
Member

I have tested using GitHub's Codespaces for scikit-learn development and it was not a great experience because the built times were too long. (Scikit-learn needs to compile some code for us to run tests, etc.) Even for documentation changes, building the docs with sphinx on Codespaces took a while.

Maybe the build times were improved recently. Can you try building scikit-learn and the docs and see how long it takes?

@betatim
Copy link
Member

betatim commented Nov 8, 2023

Is there a way to test drive this PR (I've never used codespaces/dev containers :-/)?

I think having a way to get started contributing that involves less preparation steps would be great (what is a ssh-key? how do I make one? how to setup remotes? what is a remote? why does https for remotes not work? conda what?...) There are a lot of seemingly innocent steps but for a newcomer it is a lot of steps.

I think a reason not to have something like devcontainers is that it is yet another thing that needs maintaining and keeping in sync. Which like the individual setup steps is not a big deal by itself, but when you combine many simple steps you suddenly have a whole bunch of work. Another thing I dislike about tools like this that lets you "hide complexity", is that by hiding it we remove pressure on keeping the instructions simple (and I say that as someone who created mybinder.org which is all about hiding complexity...).

So overall I wonder how much work and expertise we'd have to acquire to make the devcontainer experience good. What else could we do to make the "getting started experience" better (is there stuff we can cut out from the instructions)? I'm sure there are tricks of the trade to help improve start up and compile times, the question is how expert level does the knowledge of the person maintaining the setup need to be?

Overall I am a definite maybe on the fence certainly cool on this topic :D

@thomasjpfan
Copy link
Member

I have not seen other numerical Python projects really take off using devcontainers. Personally, I dislike hiding the complexity of setting up a dev environment. It is a useful skill to have and is transferable to other Python projects a contributor may work on.

With the technical barrier aside from #27743 (comment), I am overall -0.5 on devcontainers.

@glemaitre
Copy link
Member

I experimented with codespace during another sprint. If I recall correctly, I set up a mambaforge there and everything was straightforward following the linux guideline.

Having a devcontainer means that we impose mamba, pip or whatever package manager on people which is not a great experience as well. In terms of installing scikit-learn, I would say that it goes relatively well with linux and MacOS machines. I always ended up having issue with the compilers in Windows and the latest trick is to use WSL.

My bottom line is that I don't think we need to have devcontainer in the repository. However, I would not mind to having a blog post (https://blog.scikit-learn.org/) to show how to get started contributing with codespace.

@ogrisel
Copy link
Member

ogrisel commented Nov 13, 2023

I think we need at least 2 maintainers comfortable enough with using this tool so that PRs related to its configuration update can be maintained with minimum overhead.

Personally I have never gave it a try yet. But if build times are good (to be confirmed as @thomasjpfan requested above), then I might invest time in learning to use it.

Note: mambaforge is being soft-deprecated in favor of miniforge3 since the 2 tarballs have converged (mamba is shipped by default in miniforge3).

Furthermore, the libmamba solver is now used by default in the latest conda release. This means that we can probably simplify the packaging options to pip vs conda (with with conda-forge channel installed via miniforge3).

@ogrisel
Copy link
Member

ogrisel commented Nov 13, 2023

I think before considering dev containers, we should start by providing a single line install of all the dev and doc dependencies.

For instance by providing dev-requirements.txt and dev-environment.yml files.

One of those files could be used in the dev container.

@ogrisel
Copy link
Member

ogrisel commented Nov 13, 2023

Also I think the dev container should install and configure ccache. This can be done for conda environments with:

conda-forge/compilers-feedstock#31 (comment)

@adrinjalali
Copy link
Member

I think it would be a good thing to have the dev container, and to enable one or two online development environments (like codespaces). During the sprints, this is a real pain point and at this point I don't think people are getting much by configuring a complex environment. They're completely lost and don't understand what's happening anyway. This would make it much easier for them to start with things that they understand.

But I think we should use one of our lock files instead in the container, so that the dependencies are not out of sync with the CI.

@thomasjpfan
Copy link
Member

Also I think the dev container should install and configure ccache

I developed this cache approach for NumPy when they wanted to use GitPod. I had to create a CI workflow to build an image that compiles and populates the cache. This way, when a user opens a new workspace, the cache is available and the first compile is quick. GitPod had a prebuild feature, but it was not accessible on forks. Only organization members can use it.

BTW I could not figure out a way to properly cache the Sphinx build. I think it's possible, but I'll have to do some hacking around the files' timestamps.

GitHub Codespaces also has a pre-build feature. If GitHub's Codespace pre-builds are accessible on forks, then there is a path to get caching to work here.

As for sprints, I see that we use separate repos as the "sprint landing page". For example: https://github.com/scikit-learn-inria-fondation/EuroPython22 . That repo can have Codespaces and stand up a development environment for scikit-learn, which should be simple. It would be a bit harder to get caching to work, but it is doable.

@betatim
Copy link
Member

betatim commented Dec 8, 2023

Is there a way to test drive this PR with codespaces? As I never use it, I have a lot of (trivial) questions.

Using the separate sprint repo(s) is an interesting idea. I wonder how much of the integration between a codepsace and its repo for making PRs and such is lost by using such a "side repo" approach. That is why test driving would be nice.

@lesteve
Copy link
Member

lesteve commented Mar 27, 2025

I am going to close this one and discuss it in #31091.

I have a GitHub codespaces PoC that seems usable (build time from scratch is ~7 minutes) and I think this could be useful as fall-back during sprints (especially pesky Windows company laptops which we are sometimes spending a lot on time on).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants