Skip to content

BLD Migrate away from distutils and only use setuptools #24563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Oct 2, 2022

Closes #21499

This PR migrates away from distutils and numpy.distutils to only using setuptools. Summary of changes:

  1. Since setuptools does not support nested setup.py, all the setup goes into the root setup.py.
  2. As noted in Should we continue to support compiler=intelem? #24525, compiler=intelem is dropped. In the future, I think a possible way to support it is to use environment flags, such as CC, etc.
  3. No need to monkeypatch CCompiler_spawn anymore.

@@ -1,7 +1,7 @@
include *.rst
recursive-include doc *
recursive-include examples *
recursive-include sklearn *.c *.h *.pyx *.pxd *.pxi *.tp
recursive-include sklearn *.c *.cpp *.h *.pyx *.pxd *.pxi *.tp
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for the sdist to contain *.cpp files that is not a source file. Specifically, svm/src/libsvm/svm.cpp.

@amueller
Copy link
Member

Looks good but wheels are failing?

@@ -8,9 +8,6 @@
import sys
import textwrap
import warnings
import subprocess

from distutils.errors import CompileError, LinkError
Copy link
Member Author

@thomasjpfan thomasjpfan Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setuptools introduced CompileError and LinkError, but that requires 61.1. Even the latest version of Ubuntu (22.04LTS) only ships with 59.6.

In our case, I think we can get away with catching all Exceptions and display the error message.

@jjerphan jjerphan self-requested a review October 31, 2022 16:11
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this migration, @thomasjpfan! 💯

There are still a few references to distutils and numpy.distutils, namely:

  • -W options in this snippet in this CI script which I think should be removed or be turned -Werror:
    # Python 3.10 deprecates distutils, which is imported by numpy internally
    TEST_CMD="$TEST_CMD -Wignore:The\ distutils:DeprecationWarning"
    # Ignore distutils deprecation warning, used by joblib internally
    TEST_CMD="$TEST_CMD -Wignore:distutils\ Version\ classes\ are\ deprecated:DeprecationWarning"
  • in this snippet of the documentation which I think can instead state to use sklearn.show_versions:
    You can display the BLAS / LAPACK implementation used by your NumPy / SciPy /
    scikit-learn install with the following commands::
    from numpy.distutils.system_info import get_info
    print(get_info('blas_opt'))
    print(get_info('lapack_opt'))

Here are some questions and suggestions, mainly about the inclusion of subpackages which was specified in setup.py files when recursively buidling.

Comment on lines -109 to -129
# Check compilation with intel C++ compiler (ICC)
- template: build_tools/azure/posix.yml
parameters:
name: Linux_Nightly_ICC
vmImage: ubuntu-20.04
dependsOn: [git_commit, linting]
condition: |
and(
succeeded(),
not(contains(dependencies['git_commit']['outputs']['commit.message'], '[ci skip]')),
or(eq(variables['Build.Reason'], 'Schedule'),
contains(dependencies['git_commit']['outputs']['commit.message'], '[icc-build]')
)
)
matrix:
pylatest_conda_forge_mkl:
DISTRIB: 'conda'
LOCK_FILE: 'build_tools/azure/pylatest_conda_forge_mkl_no_coverage_linux-64_conda.lock'
COVERAGE: 'false'
BUILD_WITH_ICC: 'true'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few occurrences of ICC in the code-base, namely:

[icc-build] Build & test with the Intel C compiler (ICC)

if sys.platform == "win32" and ("icc" in compiler or "icl" in compiler):
return ["/Qopenmp"]
elif sys.platform == "win32":
return ["/openmp"]
elif sys.platform in ("darwin", "linux") and "icc" in compiler:

Should we let users be able to use icc? or should we remove those if we do not test the support of this compiler on the CI?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this, assuming setting the CC env variable works for setuptools. But it's true it would be better to test it at some point. But I would not make this a blocker for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not get the Intel compilers to work by setting the CC environment variable (and friends). From memory, I got Intel to compile by setting environment variables, but there was an issue when running pytest.

(The Azure logs does not exist anymore because it has been too long)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's keep that in mind for a follow-up PR.

"neighbors": [
{"sources": ["_ball_tree.pyx"], "include_np": True},
{"sources": ["_kd_tree.pyx"], "include_np": True},
{"sources": ["_partition_nodes.pyx"], "language": "c++", "include_np": True},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the distutils directive for the support of C++ in the header of this file:

# distutils : language = c++

is_pypy = platform.python_implementation() == "PyPy"
np_include = numpy.get_include()
default_libraries = ["m"] if os.name == "posix" else []
default_extra_compile_args = ["-O3"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using the third level of optimisation with "-O3" produces bigger binaries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe but our binaries are not too big compared to other projects in the ecosystem, so I think it's fine for now.

But it would be interesting to understand why the linux binaries are much bigger than the other OSes. Maybe -03 is very different for GCC used by default for Linux than it is for clang used for macos?

We should investigate in a later issue / PR but let's keep this PR focused on the handling of the distutils deprecation.

sources=["_svmlight_format_fast.pyx"],
include_dirs=[numpy.get_include()],
)
config.add_subpackage("tests")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are tests folders now added as sub-packages recursively and automatically?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -294,6 +243,328 @@ def check_package_status(package, min_version):
)


extension_config = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this configuration, I guess the following comment can be updated to not mention the use of numpy.distutils and of a recursively build:

scikit-learn/setup.py

Lines 25 to 31 in bc55cd7

# This is a bit (!) hackish: we are setting a global variable so that the
# main sklearn __init__ can detect if it is being loaded by the setup
# routine, to avoid attempting to load components that aren't built yet:
# the numpy distutils extensions that are used by scikit-learn to
# recursively build the compiled extensions in sub-packages is based on the
# Python import machinery.
builtins.__SKLEARN_SETUP__ = True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed: aebf345 but we can keep this possible simplification for a later PR (preferably after 1.2 is released).

ogrisel and others added 2 commits November 3, 2022 15:38
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending the points raised by @jjerphan about remaining references to distutils and the following, LGTM.

Comment on lines -109 to -129
# Check compilation with intel C++ compiler (ICC)
- template: build_tools/azure/posix.yml
parameters:
name: Linux_Nightly_ICC
vmImage: ubuntu-20.04
dependsOn: [git_commit, linting]
condition: |
and(
succeeded(),
not(contains(dependencies['git_commit']['outputs']['commit.message'], '[ci skip]')),
or(eq(variables['Build.Reason'], 'Schedule'),
contains(dependencies['git_commit']['outputs']['commit.message'], '[icc-build]')
)
)
matrix:
pylatest_conda_forge_mkl:
DISTRIB: 'conda'
LOCK_FILE: 'build_tools/azure/pylatest_conda_forge_mkl_no_coverage_linux-64_conda.lock'
COVERAGE: 'false'
BUILD_WITH_ICC: 'true'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this, assuming setting the CC env variable works for setuptools. But it's true it would be better to test it at some point. But I would not make this a blocker for this PR.

@ogrisel
Copy link
Member

ogrisel commented Nov 3, 2022

I prefer to use sklearn.show_versions() because it displays actual runtime information. The build time information from numpy might be wrong (and is less precise).

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI and CD are 🟢!

@jjerphan jjerphan merged commit 266d2a2 into scikit-learn:main Nov 3, 2022
andportnoy pushed a commit to andportnoy/scikit-learn that referenced this pull request Nov 5, 2022
…#24563)

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC Migrate away from numpy.distutils before CPython 3.12 (October 2023)
5 participants