CI Experimental [nogil] build of scikit-learn #23174

ogrisel · 2022-04-21T09:40:29Z

@colesbury announced in October 2021 an experimental fork of CPython that does not have a Global Interpreter Lock:

main repo: https://github.com/colesbury/nogil
design doc: https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

I recently started testing it in my local dev environment and, since the following issues were fixed:

Segmentation Fault when importing scipy.stats: SystemError: NYI: function() with closure colesbury/nogil#47
Segmentation Fault in multithreaded numpy calls in NPY_AUXDATA_FREE(auxdata) colesbury/nogil#48
Segfault in multithreaded Cython code in memoryview __dealloc__ in scikit-learn colesbury/nogil#50
Memory leak when using nogil Cython colesbury/nogil#51
"gc_get_refs(gc): -1" debug prints on stdout colesbury/nogil#53

all scikit-learn tests now pass when run on the experimental nogil branch of CPython (3.9).

I think it would be a good idea to maintain a nightly CI entry to ensure that this stays the case. At the very least, it would detect regressions so that we can report them upstream. That way the nogil branch stays relevant for the scientific Python stack and hopefully makes it into upstream CPython at some point in the future (maybe behind an explicit flag).
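The commit list further down uses a [nogil] marker in commit messages (see "Document the [nogil] commit flag") to run this job on demand in addition to the nightly schedule. Below is a minimal bash sketch of such a gating check. It assumes the build reason is passed in the form Azure Pipelines exposes as BUILD_REASON, where scheduled runs report "Schedule". The function name should_run_nogil_build is hypothetical, and the actual pipeline may express this as a job condition in azure-pipelines.yml instead.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide whether the experimental nogil job should run.
# Arguments: the build reason (Azure Pipelines exposes it as BUILD_REASON)
# and the message of the commit being built.
should_run_nogil_build() {
    local build_reason="$1"
    local commit_message="$2"
    # Nightly (scheduled) runs always include the nogil job.
    if [[ "$build_reason" == "Schedule" ]]; then
        return 0
    fi
    # Otherwise run it only when the commit message contains the [nogil] marker.
    # The quoted part of the pattern is matched literally (no glob bracket class).
    [[ "$commit_message" == *"[nogil]"* ]]
}

if should_run_nogil_build "${BUILD_REASON:-Manual}" "fix typos [nogil]"; then
    echo "running nogil job"
fi
```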

thomasjpfan

The nogil project looks interesting. In the long term, I think nogil support only makes sense for us if nogil makes it into CPython itself. In the short term, I am concerned about maintenance. If there is an issue with the nogil build, then I would consider it lower priority because nogil itself is experimental.

That being said, I am okay with adding this to the CI as a nightly build.


ogrisel · 2022-04-21T13:49:32Z

The nogil project looks interesting. In the long term, I think nogil support only makes sense for us if nogil makes it into CPython itself.

Yes, but I am afraid this won't happen unless we can show the CPython maintainers that a nogil mode is very important for CPU-intensive Python users such as machine learning practitioners.

In the short term, I am concerned about maintenance. If there is an issue with the nogil build, then I would consider it lower priority because nogil itself is experimental.

Yes, I agree: the goal would be to detect problems and report them upstream. I don't expect that we will need to change anything in our own code base. The only example so far was the recently merged #23159.

ogrisel · 2022-04-21T15:43:37Z

OK, the CI is running, but the nogil Linux run seems to be much slower than the other Linux runs. I am not sure why, as I cannot reproduce this slowdown on my local machine.

pytest -n4 completes in 274.20s (for ~25k tests) in my nogil venv on main, after a make clean to make sure the Cython files are compiled by the nogil variant of Cython;
the same pytest -n4 run completed in 263.70s (for ~27k tests) in my regular conda-forge based dev env.

So there might be a slight slowdown (possibly caused by different OpenBLAS / numpy / scipy versions and binaries), but it is far from what we observe on this CI entry.

ogrisel · 2022-04-21T15:51:06Z

The nogil numpy and scipy wheels use an old version of OpenBLAS (0.3.3) that detects my SkylakeX CPU as Prescott. This was reported as scipy/scipy#14886 and was fixed in more recent versions of scipy that ship a more recent OpenBLAS. So that might explain the slight slowdown compared to my conda-forge env, which (I just realized) was using MKL.

EDIT: I did another run with a fresh conda-forge env and pytest -n4 ran in 462.57s. So the nogil run was much better in comparison!
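To put the local timings reported above into numbers, here is a small bash/awk helper that computes the relative difference in wall-clock time against a reference run. The name rel_change is purely illustrative. The first pair of runs did not collect the same number of tests (~25k vs ~27k), so the result is only a rough indication.

```bash
#!/usr/bin/env bash
# Relative difference in wall-clock time between two pytest -n4 runs, in percent.
# Positive means the first run was slower than the reference run.
rel_change() {
    local run_s="$1" ref_s="$2"
    awk -v run="$run_s" -v ref="$ref_s" 'BEGIN { printf "%.1f%%\n", 100 * (run - ref) / ref }'
}

rel_change 274.20 263.70   # nogil venv vs. conda-forge env with MKL
rel_change 274.20 462.57   # nogil venv vs. fresh conda-forge env
```

This prints 4.0% and -40.7%: the nogil venv was about 4% slower than the MKL-based env but roughly 41% faster than the fresh conda-forge env.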

Note that on the CI the CPU is older and is detected as Haswell, so not as bad as detecting Prescott. So OpenBLAS CPU detection is probably not the problem on the CI.


colesbury · 2022-04-21T22:25:58Z

@ogrisel I'm rebuilding the NumPy and SciPy wheels with a new version of OpenBLAS (0.3.18) and will re-upload them soon.

EDIT: the new wheels are uploaded

ogrisel · 2022-04-28T09:57:57Z

After disabling test coverage collection and reporting for this build, the test time went from 48 min to less than 12 min...

@colesbury Maybe there is a bad interaction between the coverage module and nogil CPython?

Anyway we don't need coverage reports for this nightly build.

I will try to re-enable xdist in a subsequent commit to see the impact.
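Below is a minimal bash sketch of how a CI test script can make coverage opt-in per build and toggle pytest-xdist independently. The variable names COVERAGE and XDIST_WORKERS and the function build_test_cmd are illustrative assumptions, not necessarily what build_tools/azure/test_script.sh uses. The pytest, pytest-cov and pytest-xdist options themselves are standard.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: assemble the pytest command from per-build CI variables.
# COVERAGE and XDIST_WORKERS are illustrative names.
build_test_cmd() {
    local cmd="python -m pytest --showlocals --durations=20"
    # Coverage tracing took the nogil job from under 12 min to about 48 min,
    # so only enable it when explicitly requested.
    if [[ "${COVERAGE:-false}" == "true" ]]; then
        # An empty --cov-report= suppresses the pytest-cov terminal report.
        cmd="$cmd --cov=sklearn --cov-report="
    fi
    # Distribute tests over several worker processes with pytest-xdist.
    if [[ -n "${XDIST_WORKERS:-}" ]]; then
        cmd="$cmd -n $XDIST_WORKERS"
    fi
    echo "$cmd --pyargs sklearn"
}

# nogil nightly job: no coverage, 2 xdist workers.
COVERAGE=false XDIST_WORKERS=2 build_test_cmd
```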

ogrisel · 2022-04-28T12:30:26Z

It's now really fast with no coverage and with pytest-xdist. +1 for merge on my side :)

thomasjpfan

Minor comments, otherwise LGTM


ogrisel · 2022-04-29T08:44:57Z

Weird, all the successful CI results went away... Let me push another empty commit to get at least one CI result for the current state of the branch.
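An empty commit changes no files and only exists to re-trigger CI on the current tree; the "Trigger [nogil]" entries in the commit list below follow this pattern. A minimal bash sketch follows. The helper name retrigger_ci_commit is hypothetical, and the push step assumes a remote named origin.

```bash
#!/usr/bin/env bash
# Record a commit that changes no files, only to re-trigger CI on the current tree.
# The default message reuses the [nogil] marker so that the nogil job runs too.
retrigger_ci_commit() {
    local message="${1:-Trigger [nogil]}"
    git commit --allow-empty -m "$message"
}

# Then push the branch as usual, e.g. (assuming the remote is named origin):
#   git push origin HEAD
```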

lesteve · 2022-04-29T14:05:09Z

build_tools/azure/install.sh

@@ -120,6 +135,24 @@ python_environment_install() {
        pip install https://github.com/joblib/joblib/archive/master.zip
        echo "Installing pillow master"
        pip install https://github.com/python-pillow/Pillow/archive/main.zip
+    elif [[ "$DISTRIB" == "pip-nogil" ]]; then
+        setup_ccache  # speed-up the build of CPython it-self


You could move setup_ccache earlier, e.g. into pre_python_environment_install, and remove it from scikit_learn_install; this way it will always be called exactly once.

Not sure whether you would then still want the "ccache already configured, skipping..." logic.

Other than this LGTM.

You could move setup_ccache earlier, e.g. into pre_python_environment_install, and remove it from scikit_learn_install; this way it will always be called exactly once.

This is what I wanted to do initially, but it does not work because on some builds (in particular the macOS builds) ccache is installed with conda instead of apt.
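The commits "Skip ccache setup if not installed" and "Make setup_ccache idempotent" address exactly this: the function may run before ccache is available, and it may be called more than once. Below is a minimal bash sketch of such a function. It assumes the usual ccache masquerade setup, where compiler-named symlinks to ccache are placed first on PATH. The CCACHE_LINKS_DIR variable and the /tmp/ccache default are hypothetical, and the real setup_ccache in build_tools/azure/install.sh may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an idempotent setup_ccache.
setup_ccache() {
    local ccache_bin
    # ccache may not be installed yet on this runner (apt on Linux, conda on macOS).
    if ! ccache_bin="$(command -v ccache)"; then
        echo "ccache not installed, skipping ccache setup"
        return 0
    fi
    # Directory of compiler-named symlinks pointing to ccache (assumed location).
    local links_dir="${CCACHE_LINKS_DIR:-/tmp/ccache}"
    # Idempotence: do nothing if the symlink directory is already on PATH.
    case ":${PATH}:" in
        *":${links_dir}:"*)
            echo "ccache already configured, skipping..."
            return 0
            ;;
    esac
    mkdir -p "$links_dir"
    for name in cc c++ gcc g++ clang clang++; do
        ln -sf "$ccache_bin" "${links_dir}/${name}"
    done
    # When invoked as e.g. "gcc", ccache runs the next "gcc" found on PATH
    # that is not itself a link to ccache.
    export PATH="${links_dir}:${PATH}"
    echo "ccache configured in ${links_dir}"
}
```

Calling it a second time in the same shell leaves PATH untouched, so it no longer matters whether both the early and the late install steps invoke it.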

jjerphan

Thank you for setting up the experimental build, @ogrisel.


jeremiedbb · 2022-05-02T12:08:33Z

Nice! And fast!

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

CI Experimental [nogil] build of scikit-learn

c1a15af

ogrisel added the Build / CI label Apr 21, 2022

ogrisel added 10 commits April 21, 2022 11:48

fix typos [nogil]

b93f035

Skip ccache setup if not installed [nogil]

787b286

Move ccache setup back to itself original position [nogil]

9ec5962

missing sudo [nogil]

adcaf82

Add the APT package sources [nogil]

c64638c

typo [nogil]

64a09a7

tee -a instead of tee [nogil]

16c7a24

python.exe => python [nogil]

2966aba

VIRTUALENV is just a name, not a path [nogil]

34b2a5b

git clone --depth 1 [nogil]

e09e2ef

thomasjpfan reviewed Apr 21, 2022

azure-pipelines.yml (resolved)

build_tools/azure/install.sh (resolved)

Make setup_ccache idempotent

d6f6218

ogrisel added 2 commits April 21, 2022 15:52

Better output messages in setup_ccache [nogil]

f5afcdc

activate venv in tests [nogil]

f94cfb8

thomasjpfan reviewed Apr 21, 2022

azure-pipelines.yml (resolved)

Try to disable xdist [nogil]

17967d7

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

ogrisel added 6 commits April 27, 2022 11:46

Trigger [nogil] build

4aa9467

Merge remote-tracking branch 'origin/main' into nogil-ci

eba8dbd

enable nogil venv for doctests

715546d

Trigger [nogil]

3a9ebd2

Fixed a typo [nogil]

7767753

disable coverage for [nogil]

8173b1c

Re-enable pytext-xdist for [nogil]

baa4d4c

thomasjpfan approved these changes Apr 28, 2022

build_tools/azure/test_docs.sh (outdated, resolved)

build_tools/azure/test_script.sh (outdated, resolved)

azure-pipelines.yml (resolved)

thomasjpfan reviewed Apr 28, 2022

build_tools/azure/install.sh (outdated, resolved)

ogrisel and others added 4 commits April 29, 2022 09:52

Document the [nogil] commit flag

c872dea

Apply suggestions from code review [nogil]

b7a84e9

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

More explicit comment on the [nogil] specific index

bb79bb4

Typo [ci skip]

465a183

Trigger [nogil]

6729e0b

lesteve reviewed Apr 29, 2022

jjerphan approved these changes Apr 29, 2022

build_tools/azure/install.sh (outdated, resolved)

ogrisel added 2 commits May 2, 2022 12:19

reorg install script for [nogil]

4b478ad

reorg install script for [nogil]

3598eb6

jeremiedbb merged commit f6e6973 into scikit-learn:main May 2, 2022

ogrisel deleted the nogil-ci branch May 2, 2022 12:31

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022

CI Experimental [nogil] build of scikit-learn (scikit-learn#23174)

7f10a80

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

glemaitre pushed a commit that referenced this pull request Aug 5, 2022

CI Experimental [nogil] build of scikit-learn (#23174)

12f3e07

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
