Skip to content

[MRG] BLD Specify build time dependencies via pyproject.toml #16244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 11, 2020

Conversation

jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Jan 27, 2020

Fixes #15298
I basically took all what's done by adrin in #15533 but the cythonizing part.

That was discussed in several previous PRs (#15335, #15533, #15567) but we decided to not add it right before a release. We now have more time before the next release. Let's not delay that to the last minute.

It can also let us use external build tools like extension-helpers (#15917) or oldest-supported-numpy (https://pypi.org/project/oldest-supported-numpy/).

@rth
Copy link
Member

rth commented Feb 3, 2020

I agree it would be good to merge this some time before the release. Is anything holding this?

@jeremiedbb
Copy link
Member Author

jeremiedbb commented Feb 4, 2020

Is anything holding this?

We need to decide if we do want this. The main concern is editable mode vs build isolation. With a "pyproject.toml", pip install -e . makes an isolated build. As a consequence, pip entirely re-builds sklearn each time. To get an incremental build, we'd need to run pip install --no-build-isolation -e .. In that case, we need to install build dependencies manually (as we do today).

So in developer mode, the new way to do is:

  • install build dependencies (same as today)
  • pip install --no-build-isolation -e . (1 additional flag)

Where I see the value of adding "pyproject.toml" is for people who install sklearn from source distribution. Today, one needs to install cython first otherwise pip will fail. But usually you'll never have to build sklearn again in that case, so here an isolated build makes totally sense.

@jeremiedbb
Copy link
Member Author

To keep track of previous discussions, the behavior reported in #15533 (comment) can't be reproduced. We tried it again with olivier and the build is still isolated.

@jeremiedbb jeremiedbb changed the title [WIP] Specify build time dependencies via pyproject.toml [MRG] Specify build time dependencies via pyproject.toml Feb 4, 2020
@jeremiedbb jeremiedbb changed the title [MRG] Specify build time dependencies via pyproject.toml [MRG] BLD Specify build time dependencies via pyproject.toml Feb 4, 2020
@NicolasHug
Copy link
Member

pip entirely re-builds scklearn each time

Are you sure -- do you have a ref?

From my understanding of https://pip.pypa.io/en/stable/reference/pip/#pep-517-and-518-support, scikit-learn woudn't be entirely re-built. The worst thing that I guess could happen is that the build dependencies are still installed again in the temp dir, even if they are already satisfied. Though I hope that pip is smart enough to avoid that?

I.e. I assume that merging this PR wouldn't change our workflow since we would still manually install cython numpy scipy and then pip install -e . or make in would work as before?

@jeremiedbb
Copy link
Member Author

jeremiedbb commented Feb 4, 2020

Are you sure -- do you have a ref?

tested it with different versions of pip from 10.X to 20.X. In verbose mode you can see the compilation happening twice when running pip install -e . twice.

I.e. I assume that merging this PR wouldn't change our workflow since we would still manually install cython numpy scipy and then pip install -e . or make in would work as before?

No, as I said, pip install -e . would re-compile everything. You'll need to run pip install --no-build-isolation -e ..

Make in would work as before

@jvesely
Copy link
Contributor

jvesely commented Feb 4, 2020

There's a benefit beyond pulling in cython automatically; pip creates and caches wheels so it doesn't have to be rebuilt every time scikit-learn is installed.
This improves CI times on platforms that lack official scikit-learn wheels (arm64, pypy3).

@NicolasHug
Copy link
Member

NicolasHug commented Feb 4, 2020

In verbose mode you can see the compilation happening twice when running pip install -e . twice

Indeed. This is because numpy is installed in the temp dir and Cython believes that it needs to recompile all its files that have a cimport numpy, because the numpy headers (in the temp dir) have a newer timestamp.

I think that if pip didn't installed numpy in the temp dir, the compilation issue would disappear.

I've opened an issue on https://discuss.python.org (currently going through the spam filter, will link to it when available).

EDIT: link to post

@jeremiedbb
Copy link
Member Author

I think it's due to cython being installed in the temp dir.

Compiling sklearn/cluster/_k_means_fast.pyx because it depends on /tmp/pip-build-env-uo_4y7gb/overlay/lib/python3.6/site-packages/Cython/Includes/numpy/__init__.pxd.

The detected change is in cython's header for numpy c api.

It also occurs for non numpy stuff:

Compiling sklearn/decomposition/_cdnmf_fast.pyx because it depends on /tmp/pip-build-env-uo_4y7gb/overlay/lib/python3.6/site-packages/Cython/Includes/libc/math.pxd.
Compiling sklearn/utils/_openmp_helpers.pyx because it depends on /tmp/pip-build-env-uo_4y7gb/overlay/lib/python3.6/site-packages/Cython/Includes/openmp.pxd.

The problem is that we can't expect to have cython in the original env.

@ogrisel
Copy link
Member

ogrisel commented Feb 6, 2020

Maybe Cython could be made smarter by using a sha256 hash of the the dependent source files instead of file timestamps?

@ogrisel
Copy link
Member

ogrisel commented Feb 6, 2020

Looking at the source code of Cython, I think only mtime-based change detection is implemented: https://github.com/cython/cython/blob/master/Cython/Build/Dependencies.py

@ogrisel
Copy link
Member

ogrisel commented Feb 6, 2020

I have the feeling that we will need to use the --no-build-isolation flag to benefit from incremental building our cython extensions...

@NicolasHug
Copy link
Member

yeah I guess that's fine. It will only impact contributors that modify Cython files and that can't use make in.

We'd need to update the note in https://scikit-learn.org/stable/developers/contributing.html#how-to-contribute to introduce the --no-build-isolation there (and also suggest make in instead). Also the note in https://scikit-learn.org/stable/developers/advanced_installation.html#building-from-source

@thomasjpfan
Copy link
Member

If we want to service those who does not have make, we can advise contributors to: python setup.py build_ext -i (and with -j if they want to do it faster).

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe worth pinging other core devs for completeness

# to speed-up the building multicore CI machines.
python setup.py build_ext --inplace -j 3
python setup.py develop
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicolasHug you did not finish your comment, or am I missing something ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was a missing line return but it's been fixed (it showed on the main diff, maybe not in the PR comments page)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see. I must have fixed that in a recent commit indeed :)

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using --no-build-isolation by contributors is a modest price to pay for the advantages this could bring.

If we want to service those who does not have make, we can advise contributors to: python setup.py build_ext -i (and with -j if they want to do it faster).

I am sad there is no equivalent of -j for pip install (or at least I haven't found one), but I was under the impression that python setup.py ... use is now discouraged, and I'm also not so keen on make in as that's not universal with respect to other projects.

pip install --no-build-isolation -e . is still the most standard way IMO.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well.

+1 for keeping the message simple and only mentioning the pip install --no-build-isolation -e . command as it's cross-platform.

pyproject.toml Outdated
"wheel",
"Cython>=0.28.5",
"numpy>=1.11.0",
"scipy>=0.17.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#15106 will be merged soon. Let's bump-up numpy to 1.13.3 and scipy to 0.19.1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged :)

@ogrisel
Copy link
Member

ogrisel commented Feb 7, 2020

BTW the discussion is going on at https://discuss.python.org/t/pep-518-and-editable-mode-dont-install-already-satisfied-dependencies/3124 if you are interested in contributing to pip to improve incremental building by reusing an existing build env instead of creating a new one each time.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jeremiedbb

@@ -5,3 +5,4 @@ recursive-include sklearn *.c *.h *.pyx *.pxd *.pxi *.tp
recursive-include sklearn/datasets *.csv *.csv.gz *.rst *.jpg *.txt *.arff.gz *.json.gz
include COPYING
include README.rst
include pyproject.toml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because we need this file in the sdist? Otherwise in the binary releases we don't need to have it. Is my understanding correct?

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jeremiedbb

@adrinjalali adrinjalali merged commit 97d49f2 into scikit-learn:master Feb 11, 2020
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
…16244)

* add pyproject.toml

* do it in CI

* cln

* no build isolation in the doc

* bump dependencies

* no-build-isolation everywhere needed
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
…16244)

* add pyproject.toml

* do it in CI

* cln

* no build isolation in the doc

* bump dependencies

* no-build-isolation everywhere needed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undeclared build dependency on cython
7 participants