Skip to content

BLD do not cythonize for sdist #15533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from

Conversation

adrinjalali
Copy link
Member

Fixes #14863
Closes #15335

This builds on top of #15335, except that it uses pyproject.toml instead.

On a fresh virtual environment:

$ pip install .
Processing /home/adrin/Projects/sklearn/scikit-learn
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting numpy>=1.11.0
  Using cached https://files.pythonhosted.org/packages/00/4a/e34fce8f18c0e052c2b49f1b3713469d855f7662d58ae2b82a88341e865b/numpy-1.17.3-cp37-cp37m-manylinux1_x86_64.whl
Collecting scipy>=0.17.0
  Using cached https://files.pythonhosted.org/packages/94/7f/b535ec711cbcc3246abea4385d17e1b325d4c3404dd86f15fc4f3dba1dbb/scipy-1.3.1-cp37-cp37m-manylinux1_x86_64.whl
Collecting joblib>=0.11
  Downloading https://files.pythonhosted.org/packages/8f/42/155696f85f344c066e17af287359c9786b436b1bf86029bb3411283274f3/joblib-0.14.0-py2.py3-none-any.whl (294kB)
     |████████████████████████████████| 296kB 1.6MB/s 
Building wheels for collected packages: scikit-learn
  Building wheel for scikit-learn (PEP 517) ... done
  Created wheel for scikit-learn: filename=scikit_learn-0.22.dev0-cp37-cp37m-linux_x86_64.whl size=23240615 sha256=dc15e41730f08a77b10cb95efd8b96292f85d030375df89f384bcae6ca115ca7
  Stored in directory: /home/adrin/.cache/pip/wheels/a7/f6/b8/87c0905207200a0cca42db062b540ba3a4dccc37310ac18a96
Successfully built scikit-learn
Installing collected packages: numpy, scipy, joblib, scikit-learn
Successfully installed joblib-0.14.0 numpy-1.17.3 scikit-learn-0.22.dev0 scipy-1.3.1

Before pip install:

$ pip freeze --all
pip==19.3.1
setuptools==41.6.0
wheel==0.33.6

after pip install:

$ pip freeze --all
joblib==0.14.0
numpy==1.17.3
pip==19.3.1
scikit-learn==0.22.dev0
scipy==1.3.1
setuptools==41.6.0
wheel==0.33.6

Our circleci min dep job tests the build and doc build with the minimum cython requirement. I think we don't do that in azure pipelines.

ping @rth @glemaitre @thomasjpfan

@adrinjalali adrinjalali added this to the 0.22 milestone Nov 4, 2019
@glemaitre
Copy link
Member

Thanks to have picked-up this one over. We need an additional step to check the version of pip (0.18) is required. Scipy is doing something similar: https://github.com/scipy/scipy/blob/master/setup.py#L285

@adrinjalali
Copy link
Member Author

The old ubuntu seems to have pip=10, we need 18 to build the package now. What do we want to do about it?

@glemaitre
Copy link
Member

Could we ask to upgrade the version of pip on ubuntu? Somehow, we already use cython from pip instead of the system one.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

@adrinjalali
Copy link
Member Author

any idea why the windows builds fail?

@thomasjpfan
Copy link
Member

Does pyproject.toml need to be added to MANIFEST.in?

@jeremiedbb
Copy link
Member

any idea why the windows builds fail?

On windows CI we install sklearn from source distribution (that we first build), no in editable mode. So sklearn ends up in python site-packages. But when we do that only the sklearn folder is put in site-packages, not top level scikit-learn. So there's no pyproject.toml

We would have the same issue on linux and mac if we installed from source distribution (maybe we should do that in one of the jobs ?)

So it turns out that parsing pyproject.toml is not a suitable option

@adrinjalali
Copy link
Member Author

Does pyproject.toml need to be added to MANIFEST.in?

I don't know, I'd say no, I just followed numpy.

So it turns out that parsing pyproject.toml is not a suitable option

WDYT @glemaitre ? I'd be happy to revert it and just have the version synced with a comment.

@ogrisel
Copy link
Member

ogrisel commented Nov 6, 2019

Could we ask to upgrade the version of pip on ubuntu?

On Ubuntu 18.04, the system pip3 is 9.0.1:

$ pip3 --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

We should never ever ask users to sudo pip3 install -U pip on there system.

Doing pip3 install -U pip on ubuntu 18.04 without sudo will install pip in $HOME/.local but then the system /usr/bin/pip3 script is broken:

Traceback (most recent call last):
  File "/usr/bin/pip3", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'

The fix is to delete the new version of pip in $HOME/.local is to configure $HOME/.local/bin to be first in the PATH but manually managing environment variables to get a consistent environment is likely to be a source of complication.

Note: to delete the locally installed pip and make /usr/bin/pip3 work again, just do:

$ rm -rf /home/ogrisel/.local/lib/python3.6/site-packages/pip*

So in the end my advice would be to do: as long as cython is installed (and recent enough), make sure that /usr/bin/pip3 can build scikit-learn from source without asking the user to upgrade pip.

This is what scipy does in https://github.com/scipy/scipy/blob/master/setup.py#L285: only complain about old pip version if cython is missing or too old. But even in this case we should probably ask the user to upgrade Cython instead.

/usr/bin/pip3 install -U cython on ubuntu 18.04 will install cython in $HOME/.local/lib/python3.6/site-packages which is fine to build numpy/scipy and scikit-learn.

@ogrisel
Copy link
Member

ogrisel commented Nov 6, 2019

Basically we should make it such that:

  • if you have a recent enough pip, the build dependency (cython) is automatically installed transparently;
  • if you have an old pip, you have to install the build dependency (cython) yourself first and then build scikit-learn. This is basically what we ask the user to do in:

https://scikit-learn.org/dev/developers/advanced_installation.html

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments:

@adrinjalali
Copy link
Member Author

Basically we should make it such that:

* if you have a new pip, the build dependency (cython) is automatically installed transparently;

* if you have an old pip, you have to install the build dependency (cython) yourself first and then build scikit-learn. This is basically what we ask the user to do in:

https://scikit-learn.org/dev/developers/advanced_installation.html

So you're suggesting that we don't check for pip version at all? Cause if pip is new, it'll get the new Cython anyway, right?

@ogrisel
Copy link
Member

ogrisel commented Nov 6, 2019

So you're suggesting that we don't check for pip version at all? Cause if pip is new, it'll get the new Cython anyway, right?

Yes.

@ogrisel
Copy link
Member

ogrisel commented Nov 7, 2019

@lesteve do you know what are the caveats with pip install -e . and build isolation? It seems to work fine with me. But maybe it's broken with slightly older versions of pip?

@@ -78,7 +79,8 @@ def configuration(parent_package='', top_path=None):
# add the test directory
config.add_subpackage('tests')

maybe_cythonize_extensions(top_path, config)
if 'sdist' not in sys.argv:
cythonize_extensions(top_path, config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure about this as it prevents to do things like python setup.py sdist bdist_wheel in a single line but maybe we don't care. Our CI does not do this.

@lesteve
Copy link
Member

lesteve commented Nov 7, 2019

@lesteve do you know what are the caveats with pip install -e . and build isolation?

I am not too sure, but I'll quote pandas-dev/pandas#28374 (comment):

the proper way to rebuild extensions is now

python setup.py build_ext -i  # -j 8 for parrallel
python -m pip install -e . --no-build-isolation

if you leave off the --no-build-isolation you'll be rebuilding from scratch each time.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 7, 2019

You also (apparently) also need --no-use-pep517. At pandas we had contributors where just python -m pip install -e . --no-build-isolation was failing, though none of us discovered why. So the full invocation is

python -m pip install -e . --no-build-isolation --no-use-pep517

@jorisvandenbossche
Copy link
Member

do you know what are the caveats with pip install -e . and build isolation? It seems to work fine with me.

When doing an editable install, I don't think you want to use build isolation. Actually, I don't really know what happens if you do (does incremental cython build work?). But it shouldn't be needed to create a build environment each time you build (all dependencies are already installed when doing development with editable installs).

Another thing that can be tricky with build isolation in general is the versions of packages you build against. If pyproject.toml has numpy>=1.11 that means that in the build env it might install 1.18, while you can have a much lower numpy version in your actual environment in which you are installing scikit-learn. So that can lead to incompatibilities.
Basically, I think you therefore need to pin numpy to the lowest supported version in pyproject.toml (that's what you also do when building wheels). In any case, that's also what numpy, scipy etc are doing (https://github.com/scipy/scipy/blob/master/pyproject.toml, the lowest numpy version depends on the python version)

@jorisvandenbossche
Copy link
Member

I seem to remember there were plenty of caveats when pandas tried adding a pyproject.toml originally

Yes, but I think most issues have been solved (build dependencies can now also come from wheels, can depend on the python version, etc). Scipy and numpy also added back their pyproject.toml in the meantime.

I think the main remaining issue we ran into this time is the confusion around editable installs (see above, basically editable installs is not yet supported by PEP517/PEP518, but you need to pass some extra flags to pip to keep the old behaviour for some reason ..)

@ogrisel
Copy link
Member

ogrisel commented Nov 7, 2019

Thank you very much for your feedback @jorisvandenbossche! Indeed if we decide to go with pyproject.toml we really need to pin the oldest support versions of numpy and scipy for each Python version. This is actually good because now this information is only (implicitly) present in the configuration of our release CI and conda-forge feedstock rather than in the main scikit-learn repo.

The fact that build isolation is enabled when --editable is indeed weird (even if it works). The remark on incremental build is a good point. I will run some timing tests about incremental building (with and without ccache).

Finally, in 2bc54a9 I enabled pip install -e . which now work with the latest push. The only problem is that when we build the project this way, we cannot pass the -j 3 option to use speed up the compilation stage and the "install" stage takes 6min instead of 3min for the other jobs...

@adrinjalali
Copy link
Member Author

So other than setting the numpy/scipy versions for each python version, do we actually want to have this in 0.22?

@ogrisel
Copy link
Member

ogrisel commented Nov 7, 2019

@adrinjalali I am still running some build timing tests to see what is the impact of using isolated builds.

An option would be to not introduce the pyproject.toml for now and just keep the rest of this pull request: users will have to pip install Cython manually prior to running pip install -e . as is already documented in the advanced installation documentation.

@adrinjalali
Copy link
Member Author

yeah that sounds like a good idea.

@ogrisel
Copy link
Member

ogrisel commented Nov 7, 2019

Indeed the isolated build (pip install --editable . when the pyproject.toml file is present) takes 5min each time (with or without ccache enabled it does not change much, maybe because the source file paths are different in the temp folder?).

Without the pyproject.toml the first call to pip install --editable . takes 5 minutes but then subsequent calls take 4 seconds (only changed cython files are recompiled so it's very fast if nothing has changed).

To me this is a big downside for introducing pyproject.toml. I find that having to pass --no-build-isolation each time is cumbersome. And furthermore if we update our documentation add the --no-build-isolation everywhere we will cause users to get errors on old pip versions. For instance with pip3 version 9.0.1 from ubuntu 18.04 one would get:

$ pip3 install -e . --no-build-isolation

Usage:   
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> ...
  pip install [options] [-e] <local project path> ...
  pip install [options] <archive url/path> ...

no such option: --no-build-isolation

So for now I would be in favor of not using pyproject.toml in scikit-learn (but keep the rest of this PR to not ship cythonized outputs in our sdist and just have the users to manually install cython first).

I think we should also motivate the fact that pip install --editable should always disable build isolation by default to benefit from incremental build performance. But I would need to read PEP 517/518 first to know whether this was already discussed.

Using setup_requires to install the cython build dependency automatically is a big "no no" for me: I really don't want to have anything installed via easy_install because easy_install tends to mess with .pth files in site-packages folder and it's misleading as hell to debug.

WDYT @adrinjalali @jorisvandenbossche ?

@adrinjalali
Copy link
Member Author

I'm more than happy with all the suggestions you have. They all sounds reasonable to me. I had the same reservations towards pyproject.toml when I read about it more extensively a while ago. I just forgot these two flags we need to pass to make it work.

@jeremiedbb
Copy link
Member

jeremiedbb commented Nov 7, 2019

To me this is a big downside for introducing pyproject.toml. I find that having to pass --no-build-isolation each time is cumbersome. And furthermore if we update our documentation add the --no-build-isolation everywhere we will cause users to get errors on old pip versions

I agree that we can delay the usage of pyproject.toml but I also think that we should not document pip install -e . as the way to rebuild sklearn each time a cython file is modified. To me the right command to document is python setup.py build_ext -i, and with that one won't have a full rebuild every time.

@rth
Copy link
Member

rth commented Nov 7, 2019

To me this is a big downside for introducing pyproject.toml. I find that having to pass --no-build-isolation each time is cumbersome.

I though this was supposed to be fixed in pip 19.1.1+? There was a long discussion about it several month ago https://discuss.python.org/t/pip-19-1-and-installing-in-editable-mode-with-pyproject-toml/1553/65 . According to pip release notes --no-use-pep517 should no longer needed. I though it did also imply --no-build-isolation but maybe it doesn't.

@ogrisel
Copy link
Member

ogrisel commented Nov 8, 2019

I also think that we should not document pip install -e . as the way to rebuild sklearn each time a cython file is modified. To me the right command to document is python setup.py build_ext -i, and with that one won't have a full rebuild every time.

The think is there is value in keeping things as simple as possible and you need to at least run pip install -e . once before being able to use python setup.py build_ext -i. And without build isolation pip install -e . is as efficient as python setup.py build_ext -i.

@ogrisel
Copy link
Member

ogrisel commented Nov 8, 2019

According to pip release notes --no-use-pep517 should no longer needed. I though it did also imply --no-build-isolation but maybe it doesn't.

Apparently no: with pip 19.2.2 I get the following lines when I pip install -e . -v in scikit-learn source tree:

  Getting requirements to build wheel ... done
    Running command /home/ogrisel/miniconda3/envs/pip/bin/python /home/ogrisel/miniconda3/envs/pip/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpcbnemrt_
    C compiler: gcc -pthread -B /home/ogrisel/miniconda3/envs/pip/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

    compile options: '-c'
    extra options: '-fopenmp'
    gcc: test_openmp.c
    gcc -pthread -B /home/ogrisel/miniconda3/envs/pip/compiler_compat -Wl,--sysroot=/ objects/test_openmp.o -o test_openmp -fopenmp
    Compiling sklearn/preprocessing/_csr_polynomial_expansion.pyx because it depends on /tmp/pip-build-env-tpb4w59h/overlay/lib/python3.7/site-packages/Cython/Includes/numpy/__init__.pxd.
    Compiling sklearn/cluster/_dbscan_inner.pyx because it depends on /tmp/pip-build-env-tpb4w59h/overlay/lib/python3.7/site-packages/Cython/Includes/libcpp/vector.pxd.
    Compiling sklearn/cluster/_hierarchical_fast.pyx because it depends on /tmp/pip-build-env-tpb4w59h/overlay/lib/python3.7/site-packages/Cython/Includes/libcpp/map.pxd.
    Compiling sklearn/cluster/_k_means_elkan.pyx because it depends on /tmp/pip-build-env-tpb4w59h/overlay/lib/python3.7/site-packages/Cython/Includes/numpy/__init__.pxd.
    Compiling sklearn/cluster/_k_means_fast.pyx because it depends on /tmp/pip-build-env-tpb4w59h/overlay/lib/python3.7/site-packages/Cython/Includes/numpy/__init__.pxd.
[...]

This is causing to rebuild everything every time instead of benefiting from incremental building.

@ogrisel
Copy link
Member

ogrisel commented Nov 8, 2019

I will open a new PR with the cythonization change only (no pyproject.toml) for the 0.22 release and report the issue to the pip developers. Let's keep this PR as it is for now for documentation purpose (reading the discussion in the current state of the diff).

@ogrisel
Copy link
Member

ogrisel commented Nov 8, 2019

I have opened: #15567.

@ogrisel
Copy link
Member

ogrisel commented Nov 8, 2019

I though this was supposed to be fixed in pip 19.1.1+? There was a long discussion about it several month ago https://discuss.python.org/t/pip-19-1-and-installing-in-editable-mode-with-pyproject-toml/1553/65 . According to pip release notes --no-use-pep517 should no longer needed. I though it did also imply --no-build-isolation but maybe it doesn't.

I thought that I was running the latest version but actually it was not the case: I was running pip 19.2.2 while the latest pip is 19.3.1.

When I try again pip install -e . -v with pip 19.3.1, it actually ignores the pyproject.toml as expected and incremental building with cython extension is fast again (but you still have to install the cython build dependency manually first). Still maybe we should stick to the strategy of #15567 (no pyproject.toml) for the 0.22 release and re-explore this option later.

@ogrisel
Copy link
Member

ogrisel commented Nov 8, 2019

Closing in favor of #15567 that was already merged in master for now.

Let's re-explore the opportunity to configure a pyproject.toml file with proper lower versions for the build dependencies once 0.22 is out so that we have some time to test it further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop cythonizing sdist PyPi releases
9 participants