Skip to content

Register build-time dependencies in the setup.py as a setup_requirement. #7867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
HolgerPeters opened this issue Nov 14, 2016 · 22 comments · Fixed by #10402
Closed

Register build-time dependencies in the setup.py as a setup_requirement. #7867

HolgerPeters opened this issue Nov 14, 2016 · 22 comments · Fixed by #10402

Comments

@HolgerPeters
Copy link
Contributor

I package Python packages as wheels using the pip wheel command. Scikit-learn does not report its requirements to setuptools as install_requires or setup_requires. This makes pip wheel fail.

Description

The current checks for the presence of numpy and scipy during installation is according to what I found in the repo and the tracker based on Issue #1495 and PR #4371.

I suggest to avoid raising errors in setup.py and list numpy as an setup requirement and numpy and scipy as an install requirement, so that pip/wheel can install the dependencies seamlessly in the background, if they are not present. Pip's setup_requires section should list all packages necessary to invoke setup.py subbcommands (http://setuptools.readthedocs.io/en/latest/setuptools.html#setup_requires).

In my usage example, pip wheel would install numpy then temporarily for the creation of the scikit-learn wheel.

Steps/Code to Reproduce

virtualenv sklearntest
. sklearntest/bin/activate
pip install -U pip setuptools wheel
pip wheel --no-binary :all: scikit-learn

Expected Results

A scikit-learn wheel should be buildable as simple as

% pip install scikit-learn
Processing /whatever-dir/devel/scikit-learn
Collecting numpy (from scikit-learn==0.19.dev0)
  File was already downloaded /whatever-dir/devel/scikit-learn/numpy-1.11.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting scipy (from scikit-learn==0.19.dev0)
  File was already downloaded /whatever-dir/devel/scikit-learn/scipy-0.18.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
[..............]
Successfully built scikit-learn
pip wheel .  87.87s user 9.66s system 76% cpu 2:07.13 total

Actual Results

Collecting scikit-learn
  Downloading scikit-learn-0.18.1.tar.gz (8.9MB)
    100% |################################| 8.9MB 143kB/s 
Building wheels for collected packages: scikit-learn
  Running setup.py bdist_wheel for scikit-learn ... error
  Complete output from command /private/tmp/sklearntest/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/tmpMCqZC2pip-wheel-:
  Partial import of sklearn during the build process.
  Traceback (most recent call last):
    File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 169, in get_numpy_status
      import numpy
  ImportError: No module named numpy
  Traceback (most recent call last):
    File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 149, in get_scipy_status
      import scipy
  ImportError: No module named scipy
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 270, in <module>
      setup_package()
    File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 250, in setup_package
      .format(numpy_req_str, instructions))
  ImportError: Numerical Python (NumPy) is not installed.
  scikit-learn requires NumPy >= 1.6.1.
  Installation instructions are available on the scikit-learn website: http://scikit-learn.org/stable/install.html

Discussion

The nature of my request is basically, to make scikit-learn behave more closely to how other packages behave.

@jnothman
Copy link
Member

Would you be able to submit a PR illustrating the desired fix?

On 14 November 2016 at 23:39, Holger Peters notifications@github.com
wrote:

I package Python packages as wheels using the pip wheel command.
Scikit-learn does not report its requirements to setuptools as
install_requires or setup_requires. This makes pip wheel fail.
Description

The current checks for the presence of numpy and scipy during installation
is according to what I found in the repo and the tracker based on Issue
#1495 #1495 and PR
#4371 #4371.

I suggest to avoid raising errors in setup.py and list numpy as an setup
requirement and numpy and scipy as an install requirement, so that
pip/wheel can install the dependencies seamlessly in the background, if
they are not present. Pip's setup_requires section should list all
packages necessary to invoke setup.py subbcommands (http://setuptools.
readthedocs.io/en/latest/setuptools.html#setup_requires).

In my usage example, pip wheel would install numpy then temporarily for
the creation of the scikit-learn wheel.
Steps/Code to Reproduce

virtualenv sklearntest
. sklearntest/bin/activate
pip install -U pip setuptools wheel
pip wheel --no-binary :all: scikit-learn

Expected Results

A scikit-learn wheel should be buildable as simple as

% pip install scikit-learn
Processing /whatever-dir/devel/scikit-learn
Collecting numpy (from scikit-learn==0.19.dev0)
File was already downloaded /whatever-dir/devel/scikit-learn/numpy-1.11.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting scipy (from scikit-learn==0.19.dev0)
File was already downloaded /whatever-dir/devel/scikit-learn/scipy-0.18.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
[..............]
Successfully built scikit-learn
pip wheel . 87.87s user 9.66s system 76% cpu 2:07.13 total

Actual Results

Collecting scikit-learn
Downloading scikit-learn-0.18.1.tar.gz (8.9MB)
100% |################################| 8.9MB 143kB/s
Building wheels for collected packages: scikit-learn
Running setup.py bdist_wheel for scikit-learn ... error
Complete output from command /private/tmp/sklearntest/bin/python2.7 -u -c "import setuptools, tokenize;file='/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/tmpMCqZC2pip-wheel-:
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 169, in get_numpy_status
import numpy
ImportError: No module named numpy
Traceback (most recent call last):
File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 149, in get_scipy_status
import scipy
ImportError: No module named scipy
Traceback (most recent call last):
File "", line 1, in
File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 270, in
setup_package()
File "/private/var/folders/kn/3fznwwr15z9chdd30p3q4qm80000gp/T/pip-build-489cEa/scikit-learn/setup.py", line 250, in setup_package
.format(numpy_req_str, instructions))
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1.
Installation instructions are available on the scikit-learn website: http://scikit-learn.org/stable/install.html

Discussion

The nature of my request is basically, to make scikit-learn behave more
closely to how other packages behave.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#7867, or mute the
thread
https://github.com/notifications/unsubscribe-auth/AAEz60Urz5tewlRcc3ju3bLoROGI1qzRks5q-FZqgaJpZM4KxRmC
.

@HolgerPeters
Copy link
Contributor Author

Okay, I spent some time reading setup.py and usage of numpy.distutils.core.setup might be an impediment, as using this patched version of setuptools's setup() would prevent setuptools from discovering the setup_requires declaration.

I don't know if I understand numpy.distutils properly, but to my understanding it is mostly a tool to simplify writing fortran-based libraries, so I don't really know what the benefits of using numpy.distutils is for scikit-learn (maybe it helps with "decentralizing" C-extension declarations?). Anyway, the Cythonic code bases I worked with were rather using setuptools.setup as outlined in http://docs.cython.org/en/latest/src/reference/compilation.html#configuring-the-c-build

Conclusion

I guess that my request is more tricky than I originally thought. Making scikit-learn pip wheel scikit-learn-able would probably require a rewrite of the build-declarations, which probably isn't a change you do lightly.

@njsmith
Copy link

njsmith commented Nov 18, 2016

Theoretically, numpy is both a setup_requires and a install_requires for scikit-learn.

The install_requires is more important, since this is what actually goes into the wheel metadata. It's also relatively safe to use these days, because numpy now provides wheels for all supported platforms, so it's no longer the case that pip install scikit-learn will start trying to build numpy from source. (Unless you're on some weird platform, in which case I guess trying to build from source is as good as anything.)

The setup_requires part is... kind of a disaster. Even if you do put numpy into setup_requires and deal with the circular bootstrap problem, it won't actually trigger a proper install, because setup_requires stuff gets installed by easy_install, which doesn't know about wheels, and... basically setup_requires is just kinda useless in practice.

The real solution is PEP 518, which provides a way for source packages to statically tell pip "hey please install these packages before you run setup.py". Unfortunately, PEP 518 is accepted but not yet implemented. If someone wants to write a patch for pip to actually implement it then many people will be grateful :-).

@HolgerPeters
Copy link
Contributor Author

@njsmith well, this (setup_requires as a "disaster") might be true for a python setup.py develop approach to installing from source. Yet, when building wheels a la pip wheel in a temporary directory - as https://github.com/blue-yonder/devpi-builder does it for example - I think it isn't bad. I.e. even if you link against a non-blas numpy installation, If I am not mistaken, the resulting scikit-learn wheel would be ABI compatible with a numpy wheel that was built with BLAS support, etc.

Pandas uses setup_requires for example, I would think they are performance conscious. So far I haven't had a pandas and setup_requires related issue.

@illume
Copy link

illume commented Jan 29, 2017

I tried adding setup_requires=['numpy', 'scipy'], and install_requires=['numpy', 'scipy'] to the setup.py and it didn't work. :(

PEP 518 is still in development (and I guess it will take quite some time). pypa/pip#3691

Note these two packages also require numpy to build and install ok with pip.

ivyleavedtoadflax pushed a commit to alphagov/classifyintentspipe that referenced this issue Jul 26, 2017
Some weirdness around installing scikit-learn.
See:

* scikit-learn/scikit-learn#8242
* scikit-learn/scikit-learn#7867

Solution is to force numpy and scipy installation before
requirements.txt is installed.
ivyleavedtoadflax pushed a commit to alphagov/classifyintentspipe that referenced this issue Jul 26, 2017
Some weirdness around installing scikit-learn.
See:

* scikit-learn/scikit-learn#8242
* scikit-learn/scikit-learn#7867

Solution is to force numpy and scipy installation before
requirements.txt is installed.
ivyleavedtoadflax pushed a commit to alphagov/classifyintentspipe that referenced this issue Aug 12, 2017
* Add reformat.R to convert smartsurvey format.
* Add feature generators.
* Extract data from db not flat files.
* Update README.
* Add db_to_flat_file.py to create merged db dump.
* Add urllookup as module.
* Add urllookup tests on travis.
* Add PII removal using scrubadub.
* Dump to pickle objects.

Note: DataFrameSelector outputs a numpy array, but it is easier to calculate
date features using pandas prior to conversion to a numpy array. For
this reason, create an argument as to whether the DataFrameSelector is
dealing with date features or not, and then apply the date transforms if
so, and output a numpy array.

Some weirdness around installing scikit-learn.
See:

* scikit-learn/scikit-learn#8242
* scikit-learn/scikit-learn#7867

Solution is to force numpy and scipy installation before
requirements.txt is installed.
ivyleavedtoadflax pushed a commit to alphagov/classifyintentspipe that referenced this issue Sep 17, 2017
* Add reformat.R to convert smartsurvey format.
* Add feature generators.
* Extract data from db not flat files.
* Update README.
* Add db_to_flat_file.py to create merged db dump.
* Add urllookup as module.
* Add urllookup tests on travis.
* Add PII removal using scrubadub.
* Dump to pickle objects.

Note: DataFrameSelector outputs a numpy array, but it is easier to calculate
date features using pandas prior to conversion to a numpy array. For
this reason, create an argument as to whether the DataFrameSelector is
dealing with date features or not, and then apply the date transforms if
so, and output a numpy array.

Some weirdness around installing scikit-learn.
See:

* scikit-learn/scikit-learn#8242
* scikit-learn/scikit-learn#7867

Solution is to force numpy and scipy installation before
requirements.txt is installed.
@barrywhart
Copy link
Contributor

Any updates on this? It would be great if this worked, so scikit-learn would work better with typical Python tools like pip.

@jnothman
Copy link
Member

jnothman commented Jan 3, 2018 via email

@jnothman
Copy link
Member

jnothman commented Jan 3, 2018 via email

@barrywhart
Copy link
Contributor

I'm working on a PR for this. Ready for early (but not final) review if anyone is interested.

@njsmith
Copy link

njsmith commented Jan 4, 2018

I really wouldn't recommend doing the scipy thing these days. What I'd recommend is:

  • Unconditionally declare the install_requires. This is what will end up in your wheel. In almost all cases, this will cause people who don't already have numpy installed to download the numpy wheels, which are optimized and available for all major platforms.

  • If import numpy doesn't work, then don't use setup_requires, just print an error and exit. If you put numpy in setup_requires, then setuptools will automatically try to install it using easy_install, which is some ancient software that doesn't understand wheels, so it'll try to build numpy from source and probably screw things up. Telling people to use pip/conda/whatever is much better.

There is light on the horizon for the setup_requires mess; PEP 518 has been accepted and will make it possible to declare setup requirements in a not-totally-broken way. But no-one's yet implemented it in pip, so it will be a little while yet.

@jnothman
Copy link
Member

jnothman commented Jan 4, 2018

Thanks for the input @njsmith!

@barrywhart
Copy link
Contributor

@njsmith: I had already implemented the first suggestion, and it seems to work okay. I tried the simpler approach, but on my Mac, source installations did not work correctly. I saw no apparent errors, but after installation was complete, import sklearn was failing.

Anyway, the PR is up for review.

@HolgerPeters
Copy link
Contributor Author

@njsmith summarizes exactly what I would consider a good solution to the problem (today).

@barrywhart
Copy link
Contributor

I created a second PR following @njsmith's recommendations. We can merge one and close the other, whichever seems better after review.

@lesteve
Copy link
Member

lesteve commented Jan 9, 2018

OK I have looked a bit at this (including the two PRs from @barrywhart) and I have to say that I am a bit stuck, here is where I am at:

  • following @njsmith recommendation, and using install_requires makes our setup.py pip-friendly, which is great for end users: they will do pip install scikit-learn and get the numpy and scipy wheels (if they are not already installed)
  • when using python setup.py, install_requires builds numpy and scipy from source AFAICT, which is basically what our setup.py was meant to avoid. What is the best way of keeping the existing behaviour when using python setup.py (basically have a nice error message saying that numpy and scipy needs to be available prior to building). Should we use a flag is_pip_running if that's even possible?
  • even when building from source using pip install ., I did not find an easy way to install numpy prior to use numpy.distutils to build our extensions other than first calling setuptoold.setup (which installs numpy and scipy wheels) and then numpy.distutils.setup (which builds our extensions). This is very likely not the right way to do it, is there a better way?

If there are examples of packages, which use numpy.distutils to build extensions, and that came up with a good way of doing this kind of things I'd be super happy if someone can point me to their setup.py.

@njsmith
Copy link

njsmith commented Jan 9, 2018

Unfortunately, python setup.py install is unfixably broken -- it doesn't properly record what it installed, so after you use it then future uninstalls/upgrades are broken. You might consider checking whether pip is running and erroring out (maybe only if the user asked for install). You can tell whether you were invoked by pip because pip makes sure to inject setuptools before your setup.py starts running, so you can do something like

# Must be at the top of setup.py, before importing setuptools
import sys
pip_is_running = ("setuptools" in sys.modules)

Also, yeah, until PEP 518 support lands in pip, there is no workable way to get numpy automatically installed at build time. Even if you use setup_requires, that will install using easy_install, which doesn't know about wheels, so your users will end up trying to build numpy from scratch and likely ending up with a no-BLAS version. If someone is building from source, there's currently not much you can do except print a nice error asking them to install numpy first.

@lesteve
Copy link
Member

lesteve commented Jan 9, 2018

I see, thanks a lot for the details!

If someone is building from source, there's currently not much you can do except print a nice error asking them to install numpy first.

Maybe a dumb question, but is there an easy way to figure out inside setup.py whether we are compiling from source or not?

@njsmith
Copy link

njsmith commented Jan 10, 2018

The simplest approach would be to make setup.py always error out if numpy isn't installed. Pretty much the only things you can do with setup.py are either to build from source, or some more obscure developer-only actions, and I assume developers will have numpy installed anyway.

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

@lesteve
Copy link
Member

lesteve commented Jan 10, 2018

Which is pretty much the status quo?

Agreed, basically this is what we are already doing. Unless there is a way in our setup.py to figure out that we are installing a wheel (or using pip wheel as in the OP), I am not sure what we can do to improve the situation 😞.

@njsmith
Copy link

njsmith commented Jan 10, 2018

My compete suggestion is:

  • Always put numpy and scipy in install_requires
  • Never use setup_requires
  • If setup.py is run without numpy installed, print a nice error and exit

Is that what's being called the status quo?

@lesteve
Copy link
Member

lesteve commented Jan 12, 2018

IIUC your complete suggestion is "Keep your setup.py as it is and just add numpy and scipy in install_requires".

I think this is what #10402 is doing we should probably discuss there for the finer details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants