-
Notifications
You must be signed in to change notification settings - Fork 34
[CIVP-10579] move to pip only? #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mheilman @stephen-hoover Take a look and tell me what you think. I need to update the docs still, but a 30% look at this would be good. One wrinkle: You have to install |
(hammer) (hammer) |
does this make sure numpy is linked to the appropriate linear algebra libraries? it'd probably be good to check this, and maybe run a benchmark too. in general, i'm skeptical that this will make our lives easier, but it also probably won't make things much worse. |
I am skeptical as well. It tests scipy, but not numpy. https://github.com/civisanalytics/datascience-python/blob/master/circle.yml#L12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the warning
warning: building with the bundled copy of libffi is deprecated on this platform. It will not be distributed with Python 3.7
Let's add libffi to the packages we pull.
When I build this image locally, the size has decreased from 1.89 GB on latest to 1.32 GB here. That's a big change. If anything, I would have expected the size to increase with this PR. Any ideas why it's smaller? Smaller is great, but I want to be sure it's not signaling a problem somewhere.
Also, just a note: The Python build gives
The necessary bits to build these optional modules were not found:
_dbm _gdbm _tkinter
I think that's fine; AFAIK we don't need DBM, and I don't expect anyone to be running tkinter.
Dockerfile
Outdated
libssl-dev \ | ||
make \ | ||
xz-utils \ | ||
zlib1g-dev' && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not install these with the first apt-get? It seems unnecessarily obfuscated as a string here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea...
Dockerfile
Outdated
curl -fSL "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" -o python.tar.xz.asc && \ | ||
export GNUPGHOME="$(mktemp -d)" && \ | ||
gpg --keyserver pgp.mit.edu --recv-keys "$GPG_KEY" && \ | ||
gpg --batch --verify python.tar.xz.asc python.tar.xz && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the warning
gpg: keyring
/tmp/tmp.nL93R0YNBb/secring.gpg' created gpg: keyring
/tmp/tmp.nL93R0YNBb/pubring.gpg' created
gpg: requesting key AA65421D from hkp server pgp.mit.edu
gpg: /tmp/tmp.nL93R0YNBb/trustdb.gpg: trustdb created
gpg: key AA65421D: public key "Ned Deily (Python release signing key) nad@python.org" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
gpg --batch --verify python.tar.xz.asc python.tar.xz
gpg: Signature made Fri 23 Dec 2016 02:53:10 AM UTC using RSA key ID AA65421D
gpg: Good signature from "Ned Deily (Python release signing key) nad@python.org"
gpg: aka "Ned Deily nad@baybryj.net"
gpg: aka "keybase.io/nad nad@keybase.io"
gpg: aka "Ned Deily (Python release signing key) nad@acm.org"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 0D96 DF4D 4110 E5C4 3FBF B17F 2D34 7EA6 AA65 421D
Is that expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Privacy is hard? :)
I don't understand pgp well enough to say what is going on here though I do have an idea.
Dockerfile
Outdated
rm python.tar.xz && \ | ||
\ | ||
cd /usr/src/python && \ | ||
./configure --enable-shared --enable-unicode=ucs4 && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line gives the warning
configure: WARNING: unrecognized options: --enable-unicode
What is that option supposed to do? We should correct it if we need it, or remove it if we don't.
I also get
If you want a release build with all optimizations active (LTO, PGO, etc),
please run ./configure --enable-optimizations
I think we should use that option: http://stackoverflow.com/questions/41405728/what-does-enable-optimizations-do-while-compiling-python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also discussion here: docker-library/python#160
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are getting a segfault like the issue you linked above with --enable-optimizations. I think we should pass.
Dockerfile
Outdated
RUN python -c "import matplotlib.pyplot" | ||
|
||
ENV VERSION=2.0.1 \ | ||
VERSION_MAJOR=2 \ | ||
ENV VERSION=3.0.0 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this change about the environment? Ideally, this change is only an implementation detail and should be in the next minor version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. Conda is gone completely. That would be a breaking change for anyone who was conda-installing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Scary.
Dockerfile
Outdated
# Ensure UTF-8 locale. | ||
RUN locale-gen en_US.UTF-8 | ||
# We need this file to install the right version of numpy. | ||
COPY requirements.txt /requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part of the image which will change most frequently. Let's put it near the end to avoid recompiling Python with every package version change. I think you can combine the
pip3 install --no-cache-dir `grep 'numpy' /requirements.txt` && \
+ find /usr/local -depth \
+ \( \
+ \( -type d -a -name test -o -name tests \) \
+ -o \
+ \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \
+ \) -exec rm -rf '{}' + && \
+ rm -rf /usr/src/python ~/.cache
onto the same layer with the other pip installs, rather than in the Python installation layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirements is at the top in order to install numpy right after the build process. This can be moved. I was just trying to get this to work once.
Dockerfile
Outdated
conda clean --all -y | ||
# Now install the rest of the dependencies from the requirements file. | ||
RUN pip install --no-cache-dir -r /requirements.txt | ||
RUN rm -rf ~/.cache/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way Docker works, there's not much point to having an rm on its own line. The files would still be in the stack of layers we download. This should be combined into the previous line. (Is there anything in the cache to clear, or is this just to be sure?)
If this really does cut the size of the image by 500 MB, then I'm in favor. Compiling our own Python is a bit of pain, but we only have to do it once. |
Latest build is at 1.31 GB. Not bad. We have about ~850 MB of python packages, of which TensorFlow is the biggest at ~160 MB. |
Linear algebra in numpy seems to work, @mheilman. |
@stephen-hoover I think the latest version addresses all of your comments in various ways. |
I am closing this for now. We can revisit later if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add miniconda too.
libffi-dev \ | ||
gfortran \ | ||
g++ \ | ||
git \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep git installed in the container. It's useful for installing new packages which aren't on pypi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops! I did not see that.
CIVIS_CONDA_VERSION=4.3.11 \ | ||
CIVIS_PYTHON_VERSION=3.6.0 | ||
|
||
RUN DEBIAN_FRONTEND=noninteractive apt-get update -y --no-install-recommends && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to keep the "DEBIAN_FRONTEND=noninteractive" bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
This PR moves this image to a pip-only install. We may not do this, but let's see what happens.