The Linux wheels broke our build process #7570


Closed
ChrisAichinger opened this issue Apr 25, 2016 · 24 comments

@ChrisAichinger

We are running Ubuntu 12.04 and 16.04 on amd64 and we're building several Debian packages that contain NumPy in a Python 2.7 virtualenv. Recently, several of our build processes started failing because of the addition of Linux wheels to NumPy.

The wheels contain pre-built libraries (libgfortran-ed201abd.so.3.0.0 and libopenblasp-r0-39a31c03.2.18.so). For whatever reason, the Ubuntu strip command (used to remove debugging information) fails when called on these two libraries:

strip --remove-section=.comment --remove-section=.note --strip-unneeded libopenblasp-r0-39a31c03.2.18.so
BFD: stRpMnp0: Not enough room for program headers, try linking with -N
strip:stRpMnp0[.note.gnu.build-id]: Bad value
BFD: stRpMnp0: Not enough room for program headers, try linking with -N
strip:stRpMnp0: Bad value

Since strip is automatically called by dh_strip during the package build, this led to build errors. Even disabling stripping for this binary package led to errors, because dh_shlibdeps (which generates Debian package dependencies based on ELF executable/shlib dependencies) also fails for these binaries.

Steps to reproduce

In a freshly installed Ubuntu 12.04 or Ubuntu 16.04 on amd64, e.g. inside Vagrant:

sudo apt-get install python-virtualenv
python -m virtualenv venv
source venv/bin/activate
pip install -U pip          # pip-8.1.1.tar.gz
pip install -U setuptools   # setuptools-20.10.1
pip install numpy           # numpy-1.11.0-cp27-cp27mu-manylinux1_x86_64.whl
strip --remove-section=.comment --remove-section=.note --strip-unneeded venv/lib/python2.7/site-packages/numpy/.libs/libopenblasp-r0-39a31c03.2.18.so

Possible solutions

  • We can (and will) try to install numpy with pip's --no-binary option in the future
  • Remove the wheels from older NumPy releases. The wheels were added retroactively for releases going back to 1.6.0, which came out in 2011. In our package builds we had pinned a six-month-old NumPy version (1.9.3) exactly to avoid such surprises. If the wheels were only added from 1.11.0 going forward, this wouldn't be an issue.

Fixing the strip problem alone is not a viable solution, since dh_shlibdeps still fails after that, because the two libraries are not native to Ubuntu.

Broader issues

The pre-built wheel is great for individual users, but it's the wrong thing for anyone building RPM/Debian packages for others. Even if our build had continued to work, we still wouldn't want to ship generic pre-built libraries. Silently slipping this change into older releases is going to surprise a lot of people.

It's still going to surprise people if you start doing this in new releases only, but there you could communicate the change more vocally via release notes etc., so distributors/packagers are aware of it. Ultimately, they should all move to installing NumPy with pip install --no-binary numpy numpy, but that's not going to happen overnight.
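
For concreteness, a minimal sketch of the explicit opt-out (the :all: form refuses wheels for every package; naming numpy limits it to numpy alone):

# Force a source build of numpy only:
pip install --no-binary numpy numpy

# Or refuse wheels across the board:
pip install --no-binary :all: numpy

# requirements files accept the same flag on a line of its own:
#   --no-binary numpy
#   numpy==1.9.3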

@matthew-brett
Contributor

To summarize, the problem is that your build process assumed that numpy does not have binary wheels on Linux, and the right fix - as you say - is to state this explicitly with pip install --no-binary :all:.

I think your case is unusual, in that it is much more common for CI setups to want fast installs of earlier numpy, rather than distro-specific compiles. At least I know a lot of projects that want to test against earlier numpy, and none of them are building .deb / .rpm packages. Maybe you agree that this is the right way to go in the longer term.

It's somewhat hard to take the wheels down and put them back again, because

a) I guess that some people will already be relying on them and
b) PyPI prevents you from uploading a second time with the same filename.

I'm Cc'ing @yarikoptic because I know he builds a lot of Debian packages. Any thoughts?

@yarikoptic
Contributor

Well, in my case, while building Debian packages I am not touching pip even with a long stick (though I use it otherwise on a daily basis), simply because I want a guarantee that whatever packages I build against are available, now and in the future, from the Debian/Ubuntu archives, and that if necessary I can also fetch their sources, thus knowing the exact provenance of the materials in the packages I produce. Fetching binary blobs from other locations dilutes that assurance. Note that Debian policy altogether demands the absence of network communication (beyond apt repositories) for exactly that purpose -- packages must not require anything but other packages.

The strip'ing and shlib issues are intriguing, though, and it would be interesting to see whether they can be resolved (maybe not ;) )

@ChrisAichinger
Author

The virtualenv-inside-deb packaging setup is reasonably common, given the popularity of dh_virtualenv. It's especially appealing when working on older distributions (Ubuntu 12.04), where many interesting Python packages are just not available or available only in ancient versions. In our case, @yarikoptic's approach would be rather painful, as we'd have to backport tons of Python packages to 12.04.

It would be easier if there was some reasonable way to communicate this change to users. Having to find this out by digging through your CI failure logs after pushing a harmless change doesn't brighten anyone's day.

Fixing the stripping issue (no clue) and the shlib issue (maybe adding an appropriate relative rpath would help) would also go a long way toward solving the issue. If things continue to work, few people will complain, even if they don't get exactly what they used to get. As @matthew-brett said, we can't update the already-published wheels on PyPI, though :-(

For reference / Google, here are the errors that dh_shlibdeps generates:

dh_shlibdeps -ldebian/myproject//usr/lib
dpkg-shlibdeps: warning: Can't extract name and version from library name `libopenblasp-r0-ea933816.2.18.so'
[the warning above appears twelve times in the full log]
dpkg-shlibdeps: error: couldn't find library libgfortran-ed201abd.so.3.0.0 needed by debian/myprojecttools/usr/lib/myprojecttools/venv/lib/python2.7/site-packages/numpy/.libs/libopenblasp-r0-ea933816.2.18.so (ELF format: 'elf64-x86-64'; RPATH: '').
dpkg-shlibdeps: error: Cannot continue due to the error above.
Note: libraries are not searched in other binary packages that do not have any shlibs or symbols file.
To help dpkg-shlibdeps find private libraries, you might need to set LD_LIBRARY_PATH.
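
One possible workaround -- a sketch only, untested, with the path taken from the log above -- is to make dh_shlibdeps search numpy's private library directory, so dpkg-shlibdeps can find the vendored libgfortran:

# debian/rules (sketch; adjust the venv path to your package layout,
# and note the recipe line must be indented with a tab)
override_dh_shlibdeps:
	dh_shlibdeps -l$(CURDIR)/debian/myprojecttools/usr/lib/myprojecttools/venv/lib/python2.7/site-packages/numpy/.libs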

@njsmith
Member

njsmith commented Apr 25, 2016

I'm sorry this created problems for you.

Unfortunately, you're going to have to find a solution for this in dh_virtualenv regardless of what numpy does, because lots of packages are starting to post linux wheels.

Possibly it should be taught to ignore errors in dh_strip -- I guess strip is making some invalid assumptions about the structure of ELF binaries, and our slightly weird ELF binaries that have been modified by patchelf are violating strip's assumptions. In general these vendored libraries ought to be already stripped before distribution though. (I'm not 100% sure if that's actually happening now -- possibly auditwheel should run strip over the libraries it vendors by default? That would be a simple fix in auditwheel, might be worth filing a bug: https://github.com/pypa/auditwheel)

It also looks like auditwheel is being somewhat buggy in its RPATH handling -- libopenblas is indeed missing any RPATH/RUNPATH. This still works because libopenblas is getting loaded by multiarray.so, and multiarray.so does have a correct RPATH:

     28165:     file=libgfortran-ed201abd.so.3.0.0 [0];  needed by /home/njs/.user-python3.5-64bit/lib/python3.5/site-packages/numpy/core/../.libs/libopenblasp-r0-39a31c03.2.18.so [0]
     28165:     find library=libgfortran-ed201abd.so.3.0.0 [0]; searching
     28165:      search path=/home/njs/.user-python3.5-64bit/lib/python3.5/site-packages/numpy/core/../.libs            (RPATH from file /home/njs/.user-python3.5-64bit/lib/python3.5/site-packages/numpy/core/multiarray.cpython-35m-x86_64-linux-gnu.so)
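
(For reference, a trace like this comes from the dynamic linker's debug facility, via something like

LD_DEBUG=libs,files python -c 'import numpy'

with the output going to stderr.)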

It would make sense for auditwheel to set an RPATH on libopenblas too, just to be thorough, and that might make dpkg-shlibdeps happier.
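
For illustration, a sketch of what that could look like with patchelf (assuming patchelf is available; filenames as in this report):

# Show the (currently empty) RPATH on the vendored OpenBLAS:
patchelf --print-rpath libopenblasp-r0-39a31c03.2.18.so

# Point it at its own directory, where the vendored libgfortran lives:
patchelf --set-rpath '$ORIGIN' libopenblasp-r0-39a31c03.2.18.so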

@yarikoptic
Contributor

On Mon, 25 Apr 2016, Christian Aichinger wrote:

> The virtualenv-inside-deb packaging setup is reasonably common, given the
> popularity of dh_virtualenv. It's especially appealing when working on
> older distributions (Ubuntu 12.04), where many interesting Python packages
> are just not available or available only in ancient versions. In our case,
> @yarikoptic's approach would be rather painful, as we'd have to backport
> tons of Python packages to 12.04.

yeah -- 12.04 is quite aged by now... still, just out of interest, what
are the project/packages?

I saw a dh_virtualenv lightning talk at PyCon but never used it myself and
forgot the details... In your case I would have considered generating a
separate PROJECT-virtualenv package, which the leaf packages would then
depend (or build-depend) on to get the environment. I am not sure whether
that is available from stock dh_virtualenv functionality, but it would give
you the assurance of a specific, versioned (packages demand versions) set
of base environments, instead of generating a new one on the fly each time
and hoping that everything (network connection, etc.) will be swell ;)

> It would be easier if there was some reasonable way to communicate this
> change to users. Having to find this out by digging through your CI
> failure logs after pushing a harmless change doesn't brighten anyone's
> day.

hm... an interesting point. I saw tweets about wheels for numpy on Linux
being added (admiring the effort, but not caring personally, since I don't
use them myself), but I wonder if there is a
python-announcements@python.org mailing list of some kind, low-volume
enough for mass subscription, which would provide announcements of big
infrastructural innovations etc.? If not -- we should initiate one IMHO,
and we would both subscribe ;)

> Fixing the stripping issue (no clue) and the shlib issue (maybe adding an
> appropriate relative rpath would help) would also go a long way toward
> solving the issue. If things continue to work, few people will complain,
> even if they don't get exactly what they used to get. As @matthew-brett
> said, we can't update the already-published wheels on PyPI, though :-(
>
> For reference / Google, here are the errors that dh_shlibdeps generates:
>
> dh_shlibdeps -ldebian/myproject//usr/lib
> dpkg-shlibdeps: warning: Can't extract name and version from library name `libopenblasp-r0-ea933816.2.18.so'
indeed, the version was completely screwed up by being joined with a git
treeish or some other checksum... how was it generated, do you know,
@matthew-brett?


@njsmith
Member

njsmith commented Apr 25, 2016

> indeed, the version was completely screwed up by being joined with a git treeish or some other checksum... how was it generated, do you know

Haha oh no, there's a whole epic story behind that filename -- it's an intentionally unique name (original name + truncated hash of content), and it's necessary to work around this glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19884
Specifically it's necessary to avoid nasty conflicts between Debian's version of the library and the bundled version of the library. The name isn't the problem.
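
For illustration, the naming scheme amounts to splicing a truncated content hash into the original soname (a sketch of the idea, not auditwheel's exact code):

# First 8 hex chars of the content's sha256...
sha256sum libopenblasp-r0.2.18.so | cut -c1-8    # e.g. 39a31c03
# ...spliced into the name: libopenblasp-r0-39a31c03.2.18.so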

@ChrisAichinger
Author

Wheels by themselves are not the problem. The problems are wheels

  • that contain precompiled binaries which trip dh_strip/dh_shlibdeps
  • that were pushed to PyPI for previously released stable package versions.

But by this point it may be easier to keep everything as is (given that the PyPI packages can't be changed any more), and hope that this issue doesn't affect too many people.

In hindsight, it might have been a better idea to add wheels to new versions for a year or so before pushing them for all the historic releases, especially given how many users numpy has and how little wiggle room PyPI gives you for fixing problems.

I hope I'm not sounding too negative; this is a tremendous technical achievement and a great addition to NumPy. Thanks to all of you for working on this!

Neurocinetics pushed a commit to Neurocinetics/virtualenv that referenced this issue Feb 1, 2017
@EvenOldridge

Has anyone confirmed that the pre-built libraries (libgfortran-ed201abd.so.3.0.0 and libopenblasp-r0-39a31c03.2.18.so) are stripped? The file sizes lead me to believe they're not, and this is a huge issue when trying to do something like deploying to AWS Lambda with its 50 MB zip limit.
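
One quick way to check (path as in the reproduction steps above): file reports whether an ELF object still carries its symbol table.

# The report ends with 'stripped' or 'not stripped':
file venv/lib/python2.7/site-packages/numpy/.libs/libopenblasp-r0-39a31c03.2.18.so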

@njsmith
Member

njsmith commented Mar 22, 2017

@EvenOldridge: I don't think anyone has followed up on that, no. If you're interested then the scripts used to build the numpy wheels are here. There's also a patch that might make the binaries strippable here, and in general it might be nice if auditwheel were modified to automatically strip the binaries it vendors.

@EvenOldridge

Thanks @njsmith I'll do some follow up and report back.

@danlg

danlg commented Jul 20, 2017

@njsmith I also share the use case of AWS lambda.
In short, to save some disk space: if I run strip on all "*.so*" libraries of numpy on a Linux box (or in docker-lambda), will it work or not? Or can I strip all dynamic libraries of numpy except libgfortran-ed201abd.so.3.0.0 and libopenblasp-r0-39a31c03.2.18.so, as mentioned by @Grk0?
Thanks.
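
For reference, the selective strip described above could look something like this (a sketch, untested on these wheels; the hashed names vary per release):

# Strip everything under numpy except the vendored gfortran/OpenBLAS:
find venv/lib/python2.7/site-packages/numpy -name '*.so*' \
    ! -name 'libgfortran-*' ! -name 'libopenblasp-*' \
    -exec strip --strip-unneeded {} +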

@njsmith
Member

njsmith commented Jul 20, 2017

@danlg: I don't know, I haven't tried :-). I think the latest numpy wheels are using the new version of patchelf that's supposed to result in strippable binaries, but I don't know if it actually works.

I don't think we're stripping before distribution, and we probably should be, one way or another. If you wanted to fix that, many people might be grateful.

@danlg

danlg commented Jul 26, 2017

@njsmith Update: I've tried stripping on Linux (with docker-lambda) and the stripped scipy libraries do not load: "ELF load command address/offset not properly aligned". But not stripping them results in AWS Lambda complaining that the package is over 250 MB when unzipped... Pandas seems to load, though. So no solution at this stage for me.

@njsmith
Member

njsmith commented Jul 26, 2017

@danlg: can you confirm which version of the scipy wheel you're using that causes the problem? I just want to make sure it actually used the new patchelf...

@semitom

semitom commented Feb 5, 2018

Can confirm the problem regarding stripping scipy shared object files with numpy 1.14.0 and scipy 1.0.0.

scipy/special/_ufuncs.cpython-36m-x86_64-linux-gnu.so: ELF load command address/offset not properly aligned

@xhochy

xhochy commented Feb 6, 2018

Updating the scipy and numpy builds to the latest revision of the multibuild scripts produces strippable wheels that pass their unit tests. So when the next releases of NumPy and SciPy use newer versions of these scripts, this bug should be solved.

@charris
Member

charris commented Feb 6, 2018

@xoviat @matthew-brett Let me know when multibuild is in a good state so that I can update the numpy-wheels submodule commit.

@charris charris added this to the 1.14.1 release milestone Feb 6, 2018
@charris
Member

charris commented Feb 24, 2018

The wheels for the 1.14.1 release look to be stripped. I don't know that there is anything to be done about wheels for older versions. Kicking this off to 1.15 to wait for more feedback, but I don't think there is much more we can do.

@charris charris modified the milestones: 1.14.1 release, 1.15.0 release Feb 24, 2018
@bashtage
Contributor

Someone could spend a chunk of their time rebuilding old wheels with build tags 🥇:

https://github.com/MacPython/wiki/wiki/Build-Tags
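
For illustration (naming per the wheel file format; the build number here is arbitrary): a build tag is an extra numeric segment after the version, so a rebuilt 1.11.0 wheel could be published as

numpy-1.11.0-1-cp27-cp27mu-manylinux1_x86_64.whl

without colliding with the existing file on PyPI.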

@rgommers
Member

> Someone could spend a chunk of their time rebuilding old wheels with build tags

I think this issue is exactly an example of why we should not mess with old releases. Imho we should document that as a policy (only change old releases in case of serious issues) and then close this.

@charris
Member

charris commented May 16, 2018

Closing. I believe the wheels for older versions could still be fixed: the wheel name includes the version (and can include a build tag), so it is doable without new releases of old versions. @matthew-brett Ping.

@charris charris closed this as completed May 16, 2018
@matthew-brett
Contributor

Yes, it could work - someone would have to rebuild all those wheels using the current stripping, and then post them with build tags. Is this still enough of an issue to justify that work, and the smallish risk that we'll break something?

@pv
Member

pv commented May 16, 2018

Not sure if it's worth it --- moreover, it will invalidate any checksums listed in release notes (although I guess the build tag changes file names, in which case it would not be such a problem?).

@crizCraig

crizCraig commented Aug 4, 2018

Tangential, but I started seeing this while building my own wheel with manylinux, because numpy was being built as a dependency. The fix was to avoid repairing third-party wheels, including numpy, per pypa/python-manylinux-demo#7 (comment).
