Skip to content

[MRG] PyPy support #11010

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 64 commits into from
Jul 20, 2018
Merged

[MRG] PyPy support #11010

merged 64 commits into from
Jul 20, 2018

Conversation

rlamy
Copy link
Contributor

@rlamy rlamy commented Apr 22, 2018

Reference Issues/PRs

See #8625

What does this implement/fix? Explain your changes.

Working on adding PyPy support.

@rth
Copy link
Member

rth commented Apr 22, 2018

Thanks for working on this @rlamy !

For future reference to create a Python 3.5 environment with PyPy 5.10.1, I used the following steps,

  1. Installed PyPy 5.10 (from my Linux distribution),
  2. Run,
    virtualenv -p /usr/bin/pypy3 pypy-env
    source pypy-env/bin/activate
    pip install numpy Cython Tempita
    pip install scipy==1.1.0rc1
    # pandas is used in benchmarks
    pip install pandas --no-build-isolation
    cd scikit-learn
    pip install -e .

@rth
Copy link
Member

rth commented Apr 22, 2018

I added CI setup for PyPy and disabled other builds temporarly to avoid uncessary runs. Also copied the cython code that includes array.array back into separate files _hashing_pypy.py and _svmlight_format_pypy.py so it's easier to diff with the original version. Also added a IS_PYPY global flag similarly to scipy.

Unfortunately,

  • despite what the Travis CI documentation says pypy for Python 2 does not seem to be included anymore,
     $ ls ~/virtualenv/
     pypy3.5		python2.7     python2.7_with_system_site_packages  python3.6
     pypy3.5-5.10.1	python2.7.14  python3.4_with_system_site_packages  python3.6.3
    
    (specifying python: pypy2.7 returns an error about not found virtualenv). Unless I'm looking in wrong places
  • on Python 3, the build timeouts when building scipy at
    Processing scipy/spatial/_voronoi.pyx
    No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
    

generally I'm not sure if it's worth spending time on fixing this as locally the test suite with PyPy 3 takes ~30min for me (without the examples), so accounting for the time to build scipy I am not sure we will be able to run the test suite within the 50 min limit on Travis CI.

The output of the test suite I am getting locally with PyPy 3 - 5.10.1 can be found in
scikit-learn-pypy3-5.10.log; as you noted @rlamy there are ~105 failing tests out of 8727 which is surprisingly good for a first try. Most _load_svmlight_file tests fails so the included changes have issues (it seems I have missed a few imports while moving code - this should be fixed now).

Benchmark wise, we need to think which existing benchmarks under benchmarks/ could be used.

cc @jnothman @lesteve

@njsmith
Copy link

njsmith commented Apr 22, 2018

despite what the Travis CI documentation says pypy for Python 2 does not seem to be included anymore

The Travis CI docs are always misleading about what Pythons are actually available. If you want to actually know, the secret trick is to look directly at their s3 bucket:

s3cmd ls s3://travis-python-archives/binaries/ubuntu/16.04/x86_64/

So for example, right now that list includes pypy2.7-5.10.0.tar.bz2 and pypy3.5-5.10.1.tar.bz2, which means you can use python: pypy2.7-5.10.0 or python: pypy3.5-5.10.1 in your .travis.yml.

@rth
Copy link
Member

rth commented Apr 22, 2018

Thanks for the trick @njsmith! will use pypy2.7-5.10.0

On another topic, for future reference, I opened an issue at PyPy regarding array.array in Cython https://bitbucket.org/pypy/pypy/issues/2807/arrayarray-fails-to-import-in-cython

ls ~/virtualenv/
pip install numpy Cython Tempita
# use scipy 1.1 when it's released
pip install -vv git+https://github.com/scipy/scipy.git@b3162e187174fa0b1462f7b243c483285b38f45e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pip install scipy==1.1.0rc1 may be easier to remember (and maybe gets rid of the Cython timeout, since it's pre-cythonized)...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed! Sorry I missed the RC announcement...

@rth
Copy link
Member

rth commented Apr 23, 2018

Using scipy 1.1.rc1 worked much better in CI, thank you.

Current status:

  • pypy2.7-5.10.0 : 1 error at test collection time
  • pypy3.5-5.10 : Tavis CI timouts after 50min at 70% of the test suite

CircleCI seems to have a larger maximum allowed build time (5hours) so I'll try to use that instead.

@rlamy
Copy link
Contributor Author

rlamy commented Apr 23, 2018

Using @antocuni 's pypy-wheels should make the install much faster. Unfortunately, there is no suitable scipy wheel yet, but we can at least get numpy.

@rlamy
Copy link
Contributor Author

rlamy commented Apr 23, 2018

Well, numpy seems to be pre-installed by Travis, so this doesn't change anything.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

pypy3 build setup is failing:

Installing collected packages: scipy, MarkupSafe, Jinja2, Pygments, docutils, snowballstemmer, pytz, babel, alabaster, imagesize, chardet, idna, urllib3, certifi, requests, pyparsing, packaging, sphinxcontrib-websupport, sphinx, numpydoc
Successfully installed Jinja2-2.10 MarkupSafe-1.0 Pygments-2.2.0 alabaster-0.7.11 babel-2.6.0 certifi-2018.4.16 chardet-3.0.4 docutils-0.14 idna-2.7 imagesize-1.0.0 numpydoc-0.8.0 packaging-17.1 pyparsing-2.2.0 pytz-2018.5 requests-2.19.1 scipy-1.1.0 snowballstemmer-1.2.1 sphinx-1.7.6 sphinxcontrib-websupport-1.1.0 urllib3-1.23
+ ccache -M 512M
./build_tools/circle/build_test_pypy.sh: line 24: ccache: command not found
Exited with code 127

doc/faq.rst Outdated
in PyPy is complete or near-complete, and SciPy is ported over as well,
we can start thinking of a port.
We use too much of NumPy to work with a partial implementation.
just-in-time compiling Python implementation. Experimental support for
Copy link
Member

@ogrisel ogrisel Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather say: "PyPy <http://pypy.org/>_ is an alternative Python implementation with a built-in just-in-time compiler."

It's not that new anymore. It is fast in many case but for scikit-learn users it's quite unlikely to improve speed significantly on our models as they already rely on numpy and Cython.

@jnothman
Copy link
Member

Let's merge.

@jnothman
Copy link
Member

Thank you @rlamy and @rth.

@jnothman jnothman merged commit 5592a2e into scikit-learn:master Jul 20, 2018
@rth
Copy link
Member

rth commented Jul 20, 2018

Thanks @jnothman , the issue was that Circle CI was down, and so it didn't trigger at the last commit, and the PyPy build didn't pass on master.

I'll try to fix it in a separate PR, but a temporary workaround could be to disable the corresponding pypy3 build in CircleCI to avoid failing CI status on master...

@jorisvandenbossche
Copy link
Member

but a temporary workaround could be to disable the corresponding pypy3 build in CircleCI to avoid failing CI status on master..

+1

@qinhanmin2014
Copy link
Member

qinhanmin2014 commented Jul 20, 2018

Seems that we're running pypi (which fails) in PRs, which might confuse contributors and block codecov from providing results. Maybe we need a temporary workaround ASAP.

@rth
Copy link
Member

rth commented Jul 20, 2018

Temporary workaround applied in 6e94a63

@giodegas
Copy link

giodegas commented Sep 2, 2018

Hi to all. I just participated to the EuroSciPy 2018 sprint about PyPy. I am trying to build a dockerized pypy/numpy/scipy/scikitlearn set with jupyter notebook image.

docker pull giodegas/pypy-jupyter:scipy

( source repo here , please open issues )

Result of

pytest --pyargs sklearn > pytest.log

pytest.log

How can I quickly fix the error:

ImportError: cannot import name '_load_svmlight_file'

Thank's.

@rth
Copy link
Member

rth commented Sep 2, 2018

@giodegas Great to know you are working on this! Please make sure you are using the just released 0.20.rc1 -- version 0.19 doesn't support PyPy. Also it's expected that you would get some tests failures on Python 2, there shouldn't be any on Python 3. If you do experience other problems please open a new issue.

@rth
Copy link
Member

rth commented Sep 2, 2018

Also I would like to thank @rlamy for starting this effort, and all reviewers for very helpful suggestions!

(I still need to re-enable PyPy CI probably in a Circle CI Cron job)

@giodegas
Copy link

giodegas commented Sep 2, 2018

I also get the error:

ValueError: array.array has the wrong size, try recompiling. Expected 24, got 72

@giodegas
Copy link

giodegas commented Sep 2, 2018

@rth done the upgrade to 0.20.rc1.
The new pytest.log

New error showing up...

@giodegas
Copy link

giodegas commented Sep 2, 2018

@rth I am testing starting from latest pypy3 docker image.

@rth
Copy link
Member

rth commented Sep 2, 2018

@giodegas Could you please open a new issue with this information?

@giodegas
Copy link

giodegas commented Sep 2, 2018

Here suggests to do:

import pickle

instead of

import _pickle

@giodegas
Copy link

giodegas commented Sep 2, 2018

moved to #11971

@rth rth mentioned this pull request Sep 3, 2018
@lesteve lesteve mentioned this pull request Oct 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.