Skip to content

CI Experimental [nogil] build of scikit-learn #23174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
May 2, 2022

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented Apr 21, 2022

@colesbury announced in October 2021 an experimental fork of CPython that does not have a Global Interpreter Lock:

I recently started to test it on my local dev environment and since the fixes of the following issues:

all scikit-learn tests now pass when run on the experimental nogil branch of CPython (3.9).

I think it's a good idea to maintain a nightly CI entry to ensure that this stays the case or at least detect any regression to report them upstream so that the nogil branch stays relevant for the scientific Python stack and hopefully makes it in upstream CPython at some point in the future (maybe with an explicit flag).

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nogil project looks interesting. In the long term, I think nogil support only makes sense for us if nogil makes it into CPython itself. In the short term, I am concern about maintenance. If there is an issue with the nogil build, then I would consider that lower priority because nogil itself is experimental.

That being said, I am okay with adding this to the CI as a nightly build.

@ogrisel
Copy link
Member Author

ogrisel commented Apr 21, 2022

The nogil project looks interesting. In the long term, I think nogil support only makes sense for us if nogil makes it into CPython itself.

Yes but I am afraid this won't happen if we cannot show the CPython maintainers that a nogil mode is very important for CPU-intensive Python users such as machine-learners.

In the short term, I am concern about maintenance. If there is an issue with the nogil build, then I would consider that lower priority because nogil itself is experimental.

Yes I agree: the goal would be to detect problems and report them upstream. I don't expect that there will be anything to change in our own code base. The only example so far was the recently merged #23159.

@ogrisel
Copy link
Member Author

ogrisel commented Apr 21, 2022

Ok the CI is running but the nogil linux run seems to be much slower than other linux runs. I am not sure why as I cannot reproduce this slowdown on my local machine.

  • pytest -n4 completes in 274.20s (for ~25k tests) in my nogil venv on main after a make clean in to make sure the Cython files are compiled by the nogil variant of Cython
  • the same pytest -n4 completed in 263.70s (for ~27k tests) in my regular conda-forge based dev env.

So there might be a slight slowdown (could be caused by different openblas / numpy / scipy versions and binaries) but far from what we observe on this CI entry.

@ogrisel
Copy link
Member Author

ogrisel commented Apr 21, 2022

The nogil numpy and scipy wheels use an old version of OpenBLAS (0.3.3) where my SkylakeX CPU is detected as Prescott. This was reported as scipy/scipy#14886 and was fixed in more recent of scipy that ship a more recent version of OpenBLAS. So that might explain the slight slowdown compared to my conda-forge env which was running MKL (I just realized).

EDIT: I did another run with a fresh conda-forge env and pytest -n4 ran in 462.57s. So the nogil run was much better in comparison!

Note that on the CI the CPU is older and is detected as Haswell, so not as bad as detecting Prescott. So OpenBLAS CPU detection is probably not the problem on the CI.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@colesbury
Copy link
Contributor

colesbury commented Apr 21, 2022

@ogrisel I'm rebuilding the NumPy and SciPy wheels with a new version of OpenBLAS (0.3.18) and will re-upload them soon.

EDIT: the new wheels are uploaded

@ogrisel
Copy link
Member Author

ogrisel commented Apr 28, 2022

After disabling test coverage collection and reporting for this build, the test time went from 48 min to less than 12 min...

@colesbury Maybe there is a bad interaction between the coverage module and nogil CPython?

Anyway we don't need coverage reports for this nightly build.

I will try to re-enable xdist in a subsequent commit to see the impact.

@ogrisel
Copy link
Member Author

ogrisel commented Apr 28, 2022

It's now really fast with no coverage and with pytest-xdist. +1 for merge on my side :)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise LGTM

@ogrisel
Copy link
Member Author

ogrisel commented Apr 29, 2022

Weird, all the successful CI results went away... Let me push another empty commit to get at least one with the current state of the branch.

@@ -120,6 +135,24 @@ python_environment_install() {
pip install https://github.com/joblib/joblib/archive/master.zip
echo "Installing pillow master"
pip install https://github.com/python-pillow/Pillow/archive/main.zip
elif [[ "$DISTRIB" == "pip-nogil" ]]; then
setup_ccache # speed-up the build of CPython it-self
Copy link
Member

@lesteve lesteve Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move setup_ccache earlier e.g. in pre_python_environment_install and remove it from scikit_learn_install this way it will always be called once.

Not sure whether then you'd want the "ccache already configured, skipping..." logic

Other than this LGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move setup_ccache earlier e.g. in pre_python_environment_install and remove it from scikit_learn_install this way it will always be called once.

This is what I wanted to do initially but it does not work because on some builds (in particular the macos builds) the ccache command is installed with conda instead of apt.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for setting up the experimental build, @ogrisel.

@jeremiedbb
Copy link
Member

jeremiedbb commented May 2, 2022

Nice ! and fast !

@jeremiedbb jeremiedbb merged commit f6e6973 into scikit-learn:main May 2, 2022
@ogrisel ogrisel deleted the nogil-ci branch May 2, 2022 12:31
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
glemaitre pushed a commit that referenced this pull request Aug 5, 2022
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants