Skip to content

CI: move Linux and MacOS Azure builds to conda lock files #22448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 115 commits into from
May 13, 2022

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Feb 11, 2022

Reference Issues/PRs

part of #22425

What does this implement/fix? Explain your changes.

This is a start:

  • I replaced the two pylatest_conda_forge_mkl builds (Linux and MacOS) with two lock files
  • I added an environment.yml used for these two builds
  • I added a script to generate these two lock files.

Any other comments

Right now I am thinking that the bot functionality is decoupled from using lock files in all the builds and it could happen in a second step.

@thomasjpfan
Copy link
Member

Note that the lock file in the development version of conda-lock is a yaml file which I think is easier to parse if we need to.

@lesteve
Copy link
Member Author

lesteve commented Feb 11, 2022

OK I think I saw some discussions about this but did not investigate further. Do you think we should wait for a conda-lock release then?

As long as the diff between two lock files (with the same conda lock command but after one dependency has a new release) is reasonably readable (full disclosure: I have not checked maybe the order can change quite a bit).

@lesteve
Copy link
Member Author

lesteve commented Feb 11, 2022

One of the nice thing of the conda-lock development release is the support for pip section in environment.yml with poetry (we may need this for some builds where we combine conda + pip, e.g. the one using the conda defaults channel, not 100% sure)

@thomasjpfan
Copy link
Member

One of the nice thing of the conda-lock development release is the support for pip section in environment.yml with poetry

There are few nice features in the dev version we can use. I would wait for their release.

@ogrisel
Copy link
Member

ogrisel commented Feb 15, 2022

There are few nice features in the dev version we can use. I would wait for their release.

Can we install the dev version (e.g. a fixed commit hash) with a pip git URL in the mean time?

@lesteve
Copy link
Member Author

lesteve commented Feb 16, 2022

I am seeing weird errors with the conda-lock development version, I opened conda/conda-lock#148 for one of them.

I think I can keep going moving more builds to environment.yml files with conda-lock 0.13.2 in this PR, see if there are any show-stopper, and then reassess the situation with the conda-lock dev version (or the new release if there has been any).

@lesteve
Copy link
Member Author

lesteve commented Feb 16, 2022

As an alternative I could do a first PR using environment.yml in all the builds and not using lock files. Once we have everything in environment.yml, moving to lock files would me a bit more straightforward.

@lesteve
Copy link
Member Author

lesteve commented Feb 16, 2022

As long as the diff between two lock files (with the same conda lock command but after one dependency has a new release) is reasonably readable (full disclosure: I have not checked maybe the order can change quite a bit).

About the diff between lock files, you can see in d2a6dea#diff-708171dcd71ffc9f7ed79f676e2cb3dd1b923ba3e21319eb5e91613f12737120 that the diff is reasonably readable even if the order changes a bit, for example requests is not at the same place.

@ogrisel
Copy link
Member

ogrisel commented Feb 18, 2022

I am seeing weird errors with the conda-lock development version, I opened conda/conda-lock#148 for one of them.

I think I can keep going moving more builds to environment.yml files with conda-lock 0.13.2 in this PR, see if there are any show-stopper, and then reassess the situation with the conda-lock dev version (or the new release if there has been any).

Since the bug has been fixed in the dev branch of conda-lock we can consider using the dev version again (installed from a fixed git hash for stability), in order to use the new features if needed.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the update of the Python version, the build_tools/azure/update_environments_and_lock_files.py now runs on my M1 machine.

There are likely merge conflicts from: #23174

@lesteve
Copy link
Member Author

lesteve commented May 10, 2022

In my last commit, CI is green for all the nightly builds (including the pip-nogil build).

Right now, the python-nogil lockfile is generated manually rather than with the update_environments_and_lock_files.py script. I added a comment at the top of the lockfile with the docker command I use. In an ideal world, the update_environments_and_lock_files.py would do it but maybe this can be done in a further PR?

@ogrisel
Copy link
Member

ogrisel commented May 10, 2022

In an ideal world, the update_environments_and_lock_files.py would do it but maybe this can be done in a further PR?

+1 for keeping it manual for the nogil build for now.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with the handling of nogil. I tested it on m1 and it works as expected.

DOCKER_CONTAINER: 'i386/debian:11.2'
JOBLIB_VERSION: 'min'
DISTRIB: 'debian-32'
LOCK_FILE: './build_tools/azure/debian_atlas_32bit_lock.txt'
# disable pytest xdist due to unknown bug with 32-bit container
PYTEST_XDIST_VERSION: 'none'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove PYTEST_XDIST_VERSION as well? I do not think it is being used anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PYTEST_XDIST_VERSION is still used in the test script:

if [[ "$PYTEST_XDIST_VERSION" != "none" ]]; then
TEST_CMD="$TEST_CMD -n$CPU_COUNT"

Longer term, I was thinking about cleaning up environment variables like PYTEST_XDIST_VERSION and COVERAGE and check whether a package is installed instead of relying on a environment variable.

Comment on lines 1 to 7
# The content below has been generated manually with the following command:
# docker run -v $PWD:/scikit-learn -it nogil/python bash -c 'pip install pip-tools; pip-compile /scikit-learn/build_tools/azure/python_nogil_requirements.txt -o /scikit-learn/build_tools/azure/python_nogil_lock.txt'
#
# The reason behind it is that you need python-nogil to generate the pip lock
# file. Using pip-compile --index and --extra-index will not work, for example
# the latest cython will be picked up from PyPI, rather than the one from the
# python-nogil index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run the docker run -v $PWD:/scikit-learn ... command this header gets replaced. Maybe we move this comment into python_nogil_requirements.txt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe good enough for now? Longer term, I was thinking about using Docker to generate all the pip-based lock files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing what you suggested (I added the --upgrade option as well which is needed to upgrade dependencies as I discovered recently)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codewise LGTM

Assuming we have an auto updating bot that opens a PR once a week to bump versions, what do you see the workflow look like when the PR fails?

@lesteve
Copy link
Member Author

lesteve commented May 12, 2022

Assuming we have an auto updating bot that opens a PR once a week to bump versions, what do you see the workflow look like when the PR fails?

I guess a human has to investigate to understand the root cause of the issue. It might be simpler to fix it "by hand" (e.g. pin a package to a previous version) and regenerate the lock files with the script. Of course I am open to better suggestions. I am hoping that in most cases the PR updating the dependencies will pass.

I had in mind to use the peter-evans/create-pull-request. The defaults behaviour is what we want I think:

This only opens one PR which gets updated continuously until it is merged or
closed

@thomasjpfan
Copy link
Member

thomasjpfan commented May 13, 2022

Here is a concrete example: Imagine we have lock files in place, and pytest 6.2.5 was the latest locked version. The bot opens a PR trying to update to 7.0, which fails. Here are the options I see for a human to do:

  1. (Quickest) Pin pytest to 6.2.5 in update_environments_and_lock_files.py and regenerate the lock files.
  2. Fix the pytest error in code so it works with 7.0 and run update_environments_and_lock_files.py.

In both cases, we end up closing the PR created by the bot.

@lesteve
Copy link
Member Author

lesteve commented May 13, 2022

I am fine with your proposal.

In both cases, we end up closing the PR created by the bot.

It seems OK to me. Maybe you were hoping for a better workflow?

I think the main advantage of the CI lock files is that we don't have to rush to fix it since other PRs and main will be unaffected by the issue.

@thomasjpfan
Copy link
Member

I think the main advantage of the CI lock files is that we don't have to rush to fix it since other PRs and main will be unaffected by the issue.

I'm on board with lock files. I'm trying to see what situations we end up with when we have lock files. Let's say a new version of NumPy and pytest are both causing errors. The most straight forward thing to do is to pin both of them in update_environments_and_lock_files and a contributor can work on one at a time.

There are a few follow up items to this PR:

  1. Configure bot to create PR to auto update.
  2. Document steps for a contributor to take if the PR fails.

@lesteve
Copy link
Member Author

lesteve commented May 13, 2022

There are a few follow up items to this PR:

Yep. On top of the things you mention, I also had in mind to switch to lock files:

  • in Azure Windows builds
  • in CircleCI builds

@thomasjpfan thomasjpfan merged commit f862129 into scikit-learn:main May 13, 2022
@lesteve lesteve deleted the ci-lock-file branch May 16, 2022 09:58
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022
…rn#22448)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
glemaitre pushed a commit that referenced this pull request Aug 5, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants