-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
BLD Migrate away from distutils and only use setuptools #24563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BLD Migrate away from distutils and only use setuptools #24563
Conversation
@@ -1,7 +1,7 @@ | |||
include *.rst | |||
recursive-include doc * | |||
recursive-include examples * | |||
recursive-include sklearn *.c *.h *.pyx *.pxd *.pxi *.tp | |||
recursive-include sklearn *.c *.cpp *.h *.pyx *.pxd *.pxi *.tp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required for the sdist
to contain *.cpp
files that is not a source file. Specifically, svm/src/libsvm/svm.cpp
.
Looks good but wheels are failing? |
@@ -8,9 +8,6 @@ | |||
import sys | |||
import textwrap | |||
import warnings | |||
import subprocess | |||
|
|||
from distutils.errors import CompileError, LinkError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setuptools
introduced CompileError and LinkError, but that requires 61.1. Even the latest version of Ubuntu (22.04LTS) only ships with 59.6.
In our case, I think we can get away with catching all Exceptions and display the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this migration, @thomasjpfan! 💯
There are still a few references to distutils
and numpy.distutils
, namely:
-W
options in this snippet in this CI script which I think should be removed or be turned-Werror
:
scikit-learn/build_tools/azure/test_script.sh
Lines 60 to 66 in bc55cd7
# Python 3.10 deprecates distutils, which is imported by numpy internally TEST_CMD="$TEST_CMD -Wignore:The\ distutils:DeprecationWarning" # Ignore distutils deprecation warning, used by joblib internally TEST_CMD="$TEST_CMD -Wignore:distutils\ Version\ classes\ are\ deprecated:DeprecationWarning" - in this snippet of the documentation which I think can instead state to use
sklearn.show_versions
:
scikit-learn/doc/computing/computational_performance.rst
Lines 279 to 286 in bc55cd7
You can display the BLAS / LAPACK implementation used by your NumPy / SciPy / scikit-learn install with the following commands:: from numpy.distutils.system_info import get_info print(get_info('blas_opt')) print(get_info('lapack_opt'))
Here are some questions and suggestions, mainly about the inclusion of subpackages which was specified in setup.py
files when recursively buidling.
# Check compilation with intel C++ compiler (ICC) | ||
- template: build_tools/azure/posix.yml | ||
parameters: | ||
name: Linux_Nightly_ICC | ||
vmImage: ubuntu-20.04 | ||
dependsOn: [git_commit, linting] | ||
condition: | | ||
and( | ||
succeeded(), | ||
not(contains(dependencies['git_commit']['outputs']['commit.message'], '[ci skip]')), | ||
or(eq(variables['Build.Reason'], 'Schedule'), | ||
contains(dependencies['git_commit']['outputs']['commit.message'], '[icc-build]') | ||
) | ||
) | ||
matrix: | ||
pylatest_conda_forge_mkl: | ||
DISTRIB: 'conda' | ||
LOCK_FILE: 'build_tools/azure/pylatest_conda_forge_mkl_no_coverage_linux-64_conda.lock' | ||
COVERAGE: 'false' | ||
BUILD_WITH_ICC: 'true' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few occurrences of ICC in the code-base, namely:
scikit-learn/doc/developers/contributing.rst
Line 552 in f905c21
[icc-build] Build & test with the Intel C compiler (ICC) |
scikit-learn/sklearn/_build_utils/openmp_helpers.py
Lines 21 to 25 in bc55cd7
if sys.platform == "win32" and ("icc" in compiler or "icl" in compiler): | |
return ["/Qopenmp"] | |
elif sys.platform == "win32": | |
return ["/openmp"] | |
elif sys.platform in ("darwin", "linux") and "icc" in compiler: |
Should we let users be able to use icc? or should we remove those if we do not test the support of this compiler on the CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep this, assuming setting the CC
env variable works for setuptools. But it's true it would be better to test it at some point. But I would not make this a blocker for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not get the Intel compilers to work by setting the CC
environment variable (and friends). From memory, I got Intel to compile by setting environment variables, but there was an issue when running pytest
.
(The Azure logs does not exist anymore because it has been too long)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let's keep that in mind for a follow-up PR.
"neighbors": [ | ||
{"sources": ["_ball_tree.pyx"], "include_np": True}, | ||
{"sources": ["_kd_tree.pyx"], "include_np": True}, | ||
{"sources": ["_partition_nodes.pyx"], "language": "c++", "include_np": True}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the distutils
directive for the support of C++ in the header of this file:
scikit-learn/sklearn/neighbors/_partition_nodes.pyx
Lines 1 to 2 in bc55cd7
# distutils : language = c++ | |
is_pypy = platform.python_implementation() == "PyPy" | ||
np_include = numpy.get_include() | ||
default_libraries = ["m"] if os.name == "posix" else [] | ||
default_extra_compile_args = ["-O3"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using the third level of optimisation with "-O3"
produces bigger binaries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe but our binaries are not too big compared to other projects in the ecosystem, so I think it's fine for now.
But it would be interesting to understand why the linux binaries are much bigger than the other OSes. Maybe -03
is very different for GCC used by default for Linux than it is for clang used for macos?
We should investigate in a later issue / PR but let's keep this PR focused on the handling of the distutils
deprecation.
sources=["_svmlight_format_fast.pyx"], | ||
include_dirs=[numpy.get_include()], | ||
) | ||
config.add_subpackage("tests") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are tests folders now added as sub-packages recursively and automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -294,6 +243,328 @@ def check_package_status(package, min_version): | |||
) | |||
|
|||
|
|||
extension_config = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this configuration, I guess the following comment can be updated to not mention the use of numpy.distutils
and of a recursively build:
Lines 25 to 31 in bc55cd7
# This is a bit (!) hackish: we are setting a global variable so that the | |
# main sklearn __init__ can detect if it is being loaded by the setup | |
# routine, to avoid attempting to load components that aren't built yet: | |
# the numpy distutils extensions that are used by scikit-learn to | |
# recursively build the compiled extensions in sub-packages is based on the | |
# Python import machinery. | |
builtins.__SKLEARN_SETUP__ = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed: aebf345 but we can keep this possible simplification for a later PR (preferably after 1.2 is released).
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending the points raised by @jjerphan about remaining references to distutils and the following, LGTM.
# Check compilation with intel C++ compiler (ICC) | ||
- template: build_tools/azure/posix.yml | ||
parameters: | ||
name: Linux_Nightly_ICC | ||
vmImage: ubuntu-20.04 | ||
dependsOn: [git_commit, linting] | ||
condition: | | ||
and( | ||
succeeded(), | ||
not(contains(dependencies['git_commit']['outputs']['commit.message'], '[ci skip]')), | ||
or(eq(variables['Build.Reason'], 'Schedule'), | ||
contains(dependencies['git_commit']['outputs']['commit.message'], '[icc-build]') | ||
) | ||
) | ||
matrix: | ||
pylatest_conda_forge_mkl: | ||
DISTRIB: 'conda' | ||
LOCK_FILE: 'build_tools/azure/pylatest_conda_forge_mkl_no_coverage_linux-64_conda.lock' | ||
COVERAGE: 'false' | ||
BUILD_WITH_ICC: 'true' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep this, assuming setting the CC
env variable works for setuptools. But it's true it would be better to test it at some point. But I would not make this a blocker for this PR.
I prefer to use |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if CI and CD are 🟢!
…#24563) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Closes #21499
This PR migrates away from
distutils
andnumpy.distutils
to only usingsetuptools
. Summary of changes:setuptools
does not support nestedsetup.py
, all the setup goes into the rootsetup.py
.compiler=intelem
is dropped. In the future, I think a possible way to support it is to use environment flags, such asCC
, etc.CCompiler_spawn
anymore.