-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
CI Experimental [nogil] build of scikit-learn #23174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nogil project looks interesting. In the long term, I think nogil support only makes sense for us if nogil makes it into CPython itself. In the short term, I am concern about maintenance. If there is an issue with the nogil build, then I would consider that lower priority because nogil itself is experimental.
That being said, I am okay with adding this to the CI as a nightly build.
Yes but I am afraid this won't happen if we cannot show the CPython maintainers that a nogil mode is very important for CPU-intensive Python users such as machine-learners.
Yes I agree: the goal would be to detect problems and report them upstream. I don't expect that there will be anything to change in our own code base. The only example so far was the recently merged #23159. |
Ok the CI is running but the nogil linux run seems to be much slower than other linux runs. I am not sure why as I cannot reproduce this slowdown on my local machine.
So there might be a slight slowdown (could be caused by different openblas / numpy / scipy versions and binaries) but far from what we observe on this CI entry. |
The nogil numpy and scipy wheels use an old version of OpenBLAS (0.3.3) where my SkylakeX CPU is detected as Prescott. This was reported as scipy/scipy#14886 and was fixed in more recent of scipy that ship a more recent version of OpenBLAS. So that might explain the slight slowdown compared to my conda-forge env which was running MKL (I just realized). EDIT: I did another run with a fresh conda-forge env and Note that on the CI the CPU is older and is detected as Haswell, so not as bad as detecting Prescott. So OpenBLAS CPU detection is probably not the problem on the CI. |
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@ogrisel I'm rebuilding the NumPy and SciPy wheels with a new version of OpenBLAS (0.3.18) and will re-upload them soon. EDIT: the new wheels are uploaded |
After disabling test coverage collection and reporting for this build, the test time went from 48 min to less than 12 min... @colesbury Maybe there is a bad interaction between the coverage module and nogil CPython? Anyway we don't need coverage reports for this nightly build. I will try to re-enable xdist in a subsequent commit to see the impact. |
It's now really fast with no coverage and with pytest-xdist. +1 for merge on my side :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, otherwise LGTM
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Weird, all the successful CI results went away... Let me push another empty commit to get at least one with the current state of the branch. |
@@ -120,6 +135,24 @@ python_environment_install() { | |||
pip install https://github.com/joblib/joblib/archive/master.zip | |||
echo "Installing pillow master" | |||
pip install https://github.com/python-pillow/Pillow/archive/main.zip | |||
elif [[ "$DISTRIB" == "pip-nogil" ]]; then | |||
setup_ccache # speed-up the build of CPython it-self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move setup_ccache
earlier e.g. in pre_python_environment_install
and remove it from scikit_learn_install
this way it will always be called once.
Not sure whether then you'd want the "ccache already configured, skipping..." logic
Other than this LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move setup_ccache earlier e.g. in pre_python_environment_install and remove it from scikit_learn_install this way it will always be called once.
This is what I wanted to do initially but it does not work because on some builds (in particular the macos builds) the ccache command is installed with conda instead of apt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for setting up the experimental build, @ogrisel.
Nice ! and fast ! |
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@colesbury announced in October 2021 an experimental fork of CPython that does not have a Global Interpreter Lock:
I recently started to test it on my local dev environment and since the fixes of the following issues:
all scikit-learn tests now pass when run on the experimental
nogil
branch of CPython (3.9).I think it's a good idea to maintain a nightly CI entry to ensure that this stays the case or at least detect any regression to report them upstream so that the nogil branch stays relevant for the scientific Python stack and hopefully makes it in upstream CPython at some point in the future (maybe with an explicit flag).