Skip to content

CI Uses pytest-xdist to parallelize tests #13041

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 93 commits into from
Jun 13, 2019

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jan 24, 2019

Reference Issues/PRs

Fixes #13016

What does this implement/fix? Explain your changes.

Overall the changes are made to address how dictionaries are not ordered in Python <= 3.5. The major changes are as follows:

  1. Uses OrderedDict in test to make sure the values passed pytest.mark.parametrize has a deterministic order.
  2. Users sorted on sets for a deterministic order.
  3. Changes made to sklearn/linear_model/coordinate_descent.py were to ensure that staticmethod path has a deterministic order in their respective classes. Only the two classes that uses lasso_path as a staticmethod are affected.
  4. The Makefile has been updated to take advantage of pytest-xdist.

Any other comments?

When pytest-xdist spawns processes, the processes independently look through all files in sklearn to test. If the order of discovered test functions are not the same for all processes, pytest-xdist will return an error.

Travis-CI does not look like it gets that much faster. Locally, this PR does speed up running our tests suite.

@thomasjpfan
Copy link
Member Author

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jan 25, 2019

The LGTM error comes from this PR changing the Makefile which triggers C/C++ to run. Since this fails on master it also fails on this PR.

This LGTM C/C++ issue is being looked at in #13044.

Edit: #13044 now fixes the issue, after merging that PR, this PR should pass LGTM analysis: C/C++.

@glemaitre
Copy link
Member

I merge #13044 PR. Could merge master. It seems that Travis did not launch last time.

@thomasjpfan thomasjpfan changed the title [MRG] Uses pytest-xdist to parallelize tests [WIP] Uses pytest-xdist to parallelize tests Jan 26, 2019
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, but unless I'm mistaken there is (almost) no improvement in the Travis CI run time with this change.

What's the speed-up when you run these tests locally with pytest-xdist?

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jan 26, 2019

Not being able to speed up the travis build is concerning. I suspect running coverage slows it down.

I am not content with how _LassoPathStaticMixin needed to be added to get pytest-xdist to run.

On my system, pytest runs in 378 seconds, and pytest -n4 runs in 160 seconds, which is about a 2.4x speed up.

@thomasjpfan
Copy link
Member Author

BTW pytest-xdist can not be enabled for python 3.5 because of a bug in doctest which causes pytest collection to be indeterministic: pytest-dev/pytest#4838 (comment)

This PR only enables pytest-xdist for the latest python version, and windows 3.7. On Azure, there is a reduction of 40% of time spend running pytest.

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jun 4, 2019

The error in lgtm is a unrelated. I will fix it in another PR.

@@ -31,10 +31,15 @@ test-doc:
ifeq ($(BITS),64)
$(PYTEST) $(shell find doc -name '*.rst' | sort)
endif
test-code-parallel: in
$(PYTEST) -n auto --showlocals -v sklearn --durations=20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that this requires pytest-xdist. Else pytest fails with

pytest: error: unrecognized arguments: -n

IF "%PYTHON_ARCH%"=="64" (
call activate %VIRTUALENV%
set PYTEST_ARGS=%PYTEST_ARGS% -n2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Briefly describe -n2 for future us?

@@ -123,11 +123,14 @@ def test_normalized_output(metric_name):
assert not (score < 0).any()


SUPERVISED_UNSUPERVISED_METRICS = {**SUPERVISED_METRICS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has it been reverted @thomasjpfan ?

@@ -446,6 +446,7 @@ def fit(self, X, y):
"""Incorrect docstring but should not be tested"""


@ignore_warnings(category=DeprecationWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? add a comment?

@NicolasHug
Copy link
Member

Looks like some of the sorted parameters are still dicts. Since xdist will only work on python >= 3.6 this is not needed.

@jnothman jnothman changed the title [MRG] Uses pytest-xdist to parallelize tests CI Uses pytest-xdist to parallelize tests Jun 13, 2019
@jnothman
Copy link
Member

If you think this is finished, feel free to merge, @thomasjpfan

@thomasjpfan thomasjpfan merged commit bec8308 into scikit-learn:master Jun 13, 2019
@rth
Copy link
Member

rth commented Jun 14, 2019

I get 3.1x speed-up for the test suite on 4 cores (using OMP_NUM_THREADS=1). Very nice @thomasjpfan !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelize tests again?
7 participants