Skip to content

[CIVP-10579] move to pip only? #24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

[CIVP-10579] move to pip only? #24

wants to merge 13 commits into from

Conversation

beckermr
Copy link
Contributor

@beckermr beckermr commented Mar 10, 2017

This PR moves this image to a pip-only install. We may not do this, but let's see what happens.

@beckermr
Copy link
Contributor Author

@mheilman @stephen-hoover Take a look and tell me what you think. I need to update the docs still, but a 30% look at this would be good.

One wrinkle: You have to install numpy before you can install glmnet, so I have to run the pip install -r ... command twice. :(

@beckermr beckermr changed the title WIP start to move to pip only [CIVP-10579] move to pip only? Mar 13, 2017
@beckermr
Copy link
Contributor Author

(hammer) (hammer)

@mheilman
Copy link
Contributor

does this make sure numpy is linked to the appropriate linear algebra libraries? it'd probably be good to check this, and maybe run a benchmark too.

in general, i'm skeptical that this will make our lives easier, but it also probably won't make things much worse.

@beckermr
Copy link
Contributor Author

I am skeptical as well. It tests scipy, but not numpy.

https://github.com/civisanalytics/datascience-python/blob/master/circle.yml#L12

Copy link
Contributor

@stephen-hoover stephen-hoover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the warning

warning: building with the bundled copy of libffi is deprecated on this platform. It will not be distributed with Python 3.7
Let's add libffi to the packages we pull.

When I build this image locally, the size has decreased from 1.89 GB on latest to 1.32 GB here. That's a big change. If anything, I would have expected the size to increase with this PR. Any ideas why it's smaller? Smaller is great, but I want to be sure it's not signaling a problem somewhere.

Also, just a note: The Python build gives

The necessary bits to build these optional modules were not found:
_dbm _gdbm _tkinter

I think that's fine; AFAIK we don't need DBM, and I don't expect anyone to be running tkinter.

Dockerfile Outdated
libssl-dev \
make \
xz-utils \
zlib1g-dev' && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not install these with the first apt-get? It seems unnecessarily obfuscated as a string here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea...

Dockerfile Outdated
curl -fSL "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" -o python.tar.xz.asc && \
export GNUPGHOME="$(mktemp -d)" && \
gpg --keyserver pgp.mit.edu --recv-keys "$GPG_KEY" && \
gpg --batch --verify python.tar.xz.asc python.tar.xz && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the warning

gpg: keyring /tmp/tmp.nL93R0YNBb/secring.gpg' created gpg: keyring /tmp/tmp.nL93R0YNBb/pubring.gpg' created
gpg: requesting key AA65421D from hkp server pgp.mit.edu
gpg: /tmp/tmp.nL93R0YNBb/trustdb.gpg: trustdb created
gpg: key AA65421D: public key "Ned Deily (Python release signing key) nad@python.org" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
gpg --batch --verify python.tar.xz.asc python.tar.xz
gpg: Signature made Fri 23 Dec 2016 02:53:10 AM UTC using RSA key ID AA65421D
gpg: Good signature from "Ned Deily (Python release signing key) nad@python.org"
gpg: aka "Ned Deily nad@baybryj.net"
gpg: aka "keybase.io/nad nad@keybase.io"
gpg: aka "Ned Deily (Python release signing key) nad@acm.org"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 0D96 DF4D 4110 E5C4 3FBF B17F 2D34 7EA6 AA65 421D

Is that expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Privacy is hard? :)

I don't understand pgp well enough to say what is going on here though I do have an idea.

Dockerfile Outdated
rm python.tar.xz && \
\
cd /usr/src/python && \
./configure --enable-shared --enable-unicode=ucs4 && \
Copy link
Contributor

@stephen-hoover stephen-hoover Mar 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line gives the warning

configure: WARNING: unrecognized options: --enable-unicode

What is that option supposed to do? We should correct it if we need it, or remove it if we don't.

I also get

If you want a release build with all optimizations active (LTO, PGO, etc),
please run ./configure --enable-optimizations

I think we should use that option: http://stackoverflow.com/questions/41405728/what-does-enable-optimizations-do-while-compiling-python

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also discussion here: docker-library/python#160

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting a segfault like the issue you linked above with --enable-optimizations. I think we should pass.

Dockerfile Outdated
RUN python -c "import matplotlib.pyplot"

ENV VERSION=2.0.1 \
VERSION_MAJOR=2 \
ENV VERSION=3.0.0 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this change about the environment? Ideally, this change is only an implementation detail and should be in the next minor version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Conda is gone completely. That would be a breaking change for anyone who was conda-installing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Scary.

Dockerfile Outdated
# Ensure UTF-8 locale.
RUN locale-gen en_US.UTF-8
# We need this file to install the right version of numpy.
COPY requirements.txt /requirements.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part of the image which will change most frequently. Let's put it near the end to avoid recompiling Python with every package version change. I think you can combine the

pip3 install --no-cache-dir `grep 'numpy' /requirements.txt` && \
 +    find /usr/local -depth \
 +        \( \
 +            \( -type d -a -name test -o -name tests \) \
 +            -o \
 +            \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \
 +        \) -exec rm -rf '{}' + && \
 +    rm -rf /usr/src/python ~/.cache

onto the same layer with the other pip installs, rather than in the Python installation layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements is at the top in order to install numpy right after the build process. This can be moved. I was just trying to get this to work once.

Dockerfile Outdated
conda clean --all -y
# Now install the rest of the dependencies from the requirements file.
RUN pip install --no-cache-dir -r /requirements.txt
RUN rm -rf ~/.cache/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way Docker works, there's not much point to having an rm on its own line. The files would still be in the stack of layers we download. This should be combined into the previous line. (Is there anything in the cache to clear, or is this just to be sure?)

@stephen-hoover
Copy link
Contributor

If this really does cut the size of the image by 500 MB, then I'm in favor. Compiling our own Python is a bit of pain, but we only have to do it once.

@beckermr
Copy link
Contributor Author

Latest build is at 1.31 GB. Not bad. We have about ~850 MB of python packages, of which TensorFlow is the biggest at ~160 MB.

@beckermr
Copy link
Contributor Author

Linear algebra in numpy seems to work, @mheilman.

@beckermr
Copy link
Contributor Author

@stephen-hoover I think the latest version addresses all of your comments in various ways.

@beckermr
Copy link
Contributor Author

I am closing this for now. We can revisit later if needed.

@beckermr beckermr closed this Mar 17, 2017
Copy link
Contributor

@stephen-hoover stephen-hoover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add miniconda too.

libffi-dev \
gfortran \
g++ \
git \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep git installed in the container. It's useful for installing new packages which aren't on pypi.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! I did not see that.

CIVIS_CONDA_VERSION=4.3.11 \
CIVIS_PYTHON_VERSION=3.6.0

RUN DEBIAN_FRONTEND=noninteractive apt-get update -y --no-install-recommends && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep the "DEBIAN_FRONTEND=noninteractive" bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@beckermr beckermr reopened this Mar 17, 2017
@beckermr beckermr closed this Mar 17, 2017
@salilgupta1 salilgupta1 deleted the move-to-pip branch March 5, 2020 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants