CI Enable parallel build on CI via an env variable #18826

Merged 8 commits on Nov 13, 2020
1 change: 1 addition & 0 deletions .github/workflows/wheels.yml
@@ -83,6 +83,7 @@ jobs:
OMP_NUM_THREADS=2
OPENBLAS_NUM_THREADS=2
SKLEARN_SKIP_NETWORK_TESTS=1
SKLEARN_BUILD_PARALLEL=3
CIBW_BUILD: cp${{ matrix.python }}-${{ matrix.platform_id }}
CIBW_TEST_REQUIRES: pytest pandas threadpoolctl
# Test that there are no links to system libraries
14 changes: 7 additions & 7 deletions build_tools/azure/install.sh
@@ -133,19 +133,19 @@ try:
except ImportError:
    print('pandas not installed')
"
python -m pip list
# Set parallelism to 3 to overlap IO bound tasks with CPU bound tasks on CI
# workers with 2 cores when building the compiled extensions of scikit-learn.
export SKLEARN_BUILD_PARALLEL=3

python -m pip list
if [[ "$DISTRIB" == "conda-pip-latest" ]]; then
    # Check that pip can automatically install the build dependencies from
    # pyproject.toml using an isolated build environment:
    # Check that pip can automatically build scikit-learn with the build
    # dependencies specified in pyproject.toml using an isolated build
    # environment:
    pip install --verbose --editable .
else
    # Use the pre-installed build dependencies and build directly in the
    # current environment.
    # Use setup.py instead of `pip install -e .` to be able to pass the -j flag
    # to speed-up the building multicore CI machines.
    python setup.py build_ext --inplace -j 3
    python setup.py develop
fi

ccache -s
5 changes: 3 additions & 2 deletions build_tools/circle/build_doc.sh
@@ -183,8 +183,9 @@ source activate testenv
pip install sphinx-gallery
pip install numpydoc

# Build and install scikit-learn in dev mode
python setup.py build_ext --inplace -j 3
# Set parallelism to 3 to overlap IO bound tasks with CPU bound tasks on CI
# workers with 2 cores when building the compiled extensions of scikit-learn.
export SKLEARN_BUILD_PARALLEL=3
python setup.py develop

export OMP_NUM_THREADS=1
5 changes: 4 additions & 1 deletion build_tools/circle/build_test_pypy.sh
@@ -32,8 +32,11 @@ export CCACHE_COMPRESS=1
export PATH=/usr/lib/ccache:$PATH
export LOKY_MAX_CPU_COUNT="2"
export OMP_NUM_THREADS="1"
# Set parallelism to 3 to overlap IO bound tasks with CPU bound tasks on CI
# workers with 2 cores when building the compiled extensions of scikit-learn.
export SKLEARN_BUILD_PARALLEL=3

python setup.py build_ext --inplace -j 3
# Build and install scikit-learn in dev mode
pip install --no-build-isolation -e .

# Check that Python implementation is PyPy
14 changes: 14 additions & 0 deletions doc/developers/advanced_installation.rst
@@ -439,3 +439,17 @@ Before using ICC, you need to set up environment variables::
Finally, you can build scikit-learn. For example on Linux x86_64::

    python setup.py build_ext --compiler=intelem -i build_clib --compiler=intelem

Parallel builds
===============

It is possible to build scikit-learn compiled extensions in parallel by setting
an environment variable as follows before calling the ``pip install`` or
``python setup.py build_ext`` commands::

    export SKLEARN_BUILD_PARALLEL=3
    pip install --verbose --no-build-isolation --editable .

On a machine with 2 CPU cores, it can be beneficial to use a parallelism level
of 3 to overlap IO bound tasks (reading and writing files on disk) with CPU
bound tasks (actually compiling).
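
Reviewer note: the 2-core, parallelism-3 guidance above can be scripted. The sketch below is illustrative only. The helper name `build_env` and the "cores + 1" rule are assumptions that generalize the single example in the docs; they are not part of this PR or of scikit-learn.

```python
import os


def build_env(n_cores=None, base_env=None):
    """Return a copy of the environment with SKLEARN_BUILD_PARALLEL set.

    Hypothetical helper: the "cores + 1" rule is an assumption that
    generalizes the 2-core example above, not a documented recommendation.
    """
    if n_cores is None:
        # os.cpu_count() can return None when the count is undetermined.
        n_cores = os.cpu_count() or 1
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values must be strings.
    env["SKLEARN_BUILD_PARALLEL"] = str(n_cores + 1)
    return env


if __name__ == "__main__":
    env = build_env()
    print("SKLEARN_BUILD_PARALLEL=" + env["SKLEARN_BUILD_PARALLEL"])
    # The build could then be launched with this environment, for example:
    # subprocess.run([sys.executable, "-m", "pip", "install", "--verbose",
    #                 "--no-build-isolation", "--editable", "."],
    #                env=env, check=True)
```

Whether one extra job actually helps depends on the machine's disk and compiler workload, so the value is worth measuring rather than assuming.
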
18 changes: 16 additions & 2 deletions setup.py
@@ -108,13 +108,27 @@ def run(self):

cmdclass = {'clean': CleanCommand, 'sdist': sdist}

# custom build_ext command to set OpenMP compile flags depending on os and
# compiler
# Custom build_ext command to set OpenMP compile flags depending on os and
# compiler. Also makes it possible to set the parallelism level via
# an environment variable (useful for the wheel building CI).
# build_ext has to be imported after setuptools
try:
    from numpy.distutils.command.build_ext import build_ext  # noqa

    class build_ext_subclass(build_ext):

        def finalize_options(self):
            super().finalize_options()
            if self.parallel is None:
                # Do not override self.parallel if already defined by
                # command-line flag (--parallel or -j)

                parallel = os.environ.get("SKLEARN_BUILD_PARALLEL")
                if parallel:
                    self.parallel = int(parallel)
            if self.parallel:
                print("setting parallel=%d " % self.parallel)

        def build_extensions(self):
            from sklearn._build_utils.openmp_helpers import get_openmp_flag

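
Reviewer note: the precedence rule in `finalize_options` (a command-line `--parallel` / `-j` value wins, otherwise `SKLEARN_BUILD_PARALLEL` is used) can be checked in isolation. The function below is a standalone sketch of that rule. The name `resolve_parallel` and its signature are hypothetical and do not appear in `setup.py`.

```python
import os


def resolve_parallel(cli_parallel, environ=None):
    """Return the parallelism level: CLI flag first, then the env variable.

    Hypothetical standalone version of the logic in the diff above.
    """
    if environ is None:
        environ = os.environ
    if cli_parallel is not None:
        # A value passed via --parallel / -j is never overridden.
        return cli_parallel
    value = environ.get("SKLEARN_BUILD_PARALLEL")
    if value:
        # int() raises ValueError on non-numeric input, as in the diff.
        return int(value)
    # Variable unset or empty: keep the default (no parallelism level set).
    return None
```

For example, `resolve_parallel(None, {"SKLEARN_BUILD_PARALLEL": "3"})` yields `3`, while `resolve_parallel(4, {"SKLEARN_BUILD_PARALLEL": "3"})` keeps the command-line value `4`.
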