Skip to content

TST Download datasets before running pytest-xdist #19118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 13, 2021

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Related to #19114
Related to #17553

What does this implement/fix? Explain your changes.

Allows pytest to fetch the datasets when SKLEARN_SKIP_NETWORK_TESTS=0.

Any other comments?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If find the logic hard to follow: see comments below for details.

Also the documentation of SKLEARN_SKIP_NETWORK_TESTS in doc/computing/parallelism.rst should be updated to explain what is default behavior (skipping the network tests).

@thomasjpfan thomasjpfan changed the title TST Download datasets before running pytest [scipy-dev] TST Download datasets before running pytest-xdist Jan 6, 2021
Copy link
Member

@alfaro96 alfaro96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @thomasjpfan for your PR!

The CI is failing because the fixtures are not found. Maybe they should be included in sklearn/datasets/tests/conftest.py instead of conftest.py or the `@pytest.fixture marker is required?

@ogrisel
Copy link
Member

ogrisel commented Jan 12, 2021

I can reproduce this error (also triggered on some CI jobs):

[...]
INTERNALERROR> E               File "/home/vsts/work/1/s/sklearn/conftest.py", line 73, in pytest_collection_modifyitems
INTERNALERROR> E                 item_fixtures = set(item.fixturenames)
INTERNALERROR> E             AttributeError: 'DoctestItem' object has no attribute 'fixturenames'
[...]

when running SKLEARN_SKIP_NETWORK_TESTS=0 pytest -n 4 locally with the latest pytest packages from conda-forge:

% mamba list pytest
[...]
# Name                    Version                   Build  Channel
pytest                    6.2.1            py39h2804cbe_1    conda-forge
pytest-forked             1.2.0              pyh9f0ad1d_0    conda-forge
pytest-timeout            1.4.2              pyh9f0ad1d_0    conda-forge
pytest-xdist              2.2.0              pyhd8ed1ab_0    conda-forge

@ogrisel
Copy link
Member

ogrisel commented Jan 13, 2021

The coverage failure is expected: the network dependent tests are only run on the nightly scheduled job, hence not part of the coverage report. Merging.

@ogrisel
Copy link
Member

ogrisel commented Jan 13, 2021

Actually, let's re-rerun with the [scipy-dev] flag first.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, on a local run, after cleaning up the data folder, I get the following crash with this PR:

(dev) ogrisel@arm64-apple-darwin20 scikit-learn % rm -rf /Users/ogrisel/scikit_learn_data 
(dev) ogrisel@arm64-apple-darwin20 scikit-learn % SKLEARN_SKIP_NETWORK_TESTS=0 pytest -n 4
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.9.1, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: forked-1.2.0, xdist-2.2.0, timeout-1.4.2
gw0 [19408] / gw1 ok / gw2 ok / gw3 ok
INTERNALERROR> def worker_internal_error(self, node, formatted_error):                                                
INTERNALERROR>         """
INTERNALERROR>         pytest_internalerror() was called on the worker.
INTERNALERROR>     
INTERNALERROR>         pytest_internalerror() arguments are an excinfo and an excrepr, which can't
INTERNALERROR>         be serialized, so we go with a poor man's solution of raising an exception
INTERNALERROR>         here ourselves using the formatted message.
INTERNALERROR>         """
INTERNALERROR>         self._active_nodes.remove(node)
INTERNALERROR>         try:
INTERNALERROR> >           assert False, formatted_error
INTERNALERROR> E           AssertionError: Traceback (most recent call last):
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 269, in wrap_session
INTERNALERROR> E                 session.exitstatus = doit(config, session) or 0
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 322, in _main
INTERNALERROR> E                 config.hook.pytest_collection(session=session)
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> E                 return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> E                 return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> E                 self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> E                 return outcome.get_result()
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> E                 raise ex[1].with_traceback(ex[2])
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> E                 res = hook_impl.function(*args)
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 333, in pytest_collection
INTERNALERROR> E                 session.perform_collect()
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 637, in perform_collect
INTERNALERROR> E                 hook.pytest_collection_modifyitems(
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> E                 return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> E                 return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> E                 self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> E                 return outcome.get_result()
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> E                 raise ex[1].with_traceback(ex[2])
INTERNALERROR> E               File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> E                 res = hook_impl.function(*args)
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/conftest.py", line 91, in pytest_collection_modifyitems
INTERNALERROR> E                 dataset_fetchers[name]()
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/utils/validation.py", line 63, in inner_f
INTERNALERROR> E                 return f(*args, **kwargs)
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_twenty_newsgroups.py", line 437, in fetch_20newsgroups_vectorized
INTERNALERROR> E                 data_train = fetch_20newsgroups(data_home=data_home,
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/utils/validation.py", line 63, in inner_f
INTERNALERROR> E                 return f(*args, **kwargs)
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_twenty_newsgroups.py", line 259, in fetch_20newsgroups
INTERNALERROR> E                 cache = _download_20newsgroups(target_dir=twenty_home,
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_twenty_newsgroups.py", line 75, in _download_20newsgroups
INTERNALERROR> E                 archive_path = _fetch_remote(ARCHIVE, dirname=target_dir)
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_base.py", line 1190, in _fetch_remote
INTERNALERROR> E                 checksum = _sha256(file_path)
INTERNALERROR> E               File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_base.py", line 1156, in _sha256
INTERNALERROR> E                 with open(path, "rb") as f:
INTERNALERROR> E             FileNotFoundError: [Errno 2] No such file or directory: '/Users/ogrisel/scikit_learn_data/20news_home/20news-bydate.tar.gz'
INTERNALERROR> E           assert False
INTERNALERROR> 
INTERNALERROR> ../../miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py:187: AssertionError
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 269, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 323, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR>     self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR>     raise ex[1].with_traceback(ex[2])
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
INTERNALERROR>     self.loop_once()
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py", line 135, in loop_once
INTERNALERROR>     call(**kwargs)
INTERNALERROR>   File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py", line 175, in worker_workerfinished
INTERNALERROR>     self._active_nodes.remove(node)
INTERNALERROR> KeyError: <WorkerController gw0>

so it seems that we still have a race condition with this PR.

@ogrisel
Copy link
Member

ogrisel commented Jan 13, 2021

The [scipy-dev] build also failed in a similar manner (probably with a race condition on another dataset):

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=25253&view=logs&j=dfe99b15-50db-5d7b-b1e9-4105c42527cf&t=ef785ae2-496b-5b02-9f0e-07a6c3ab3081

INTERNALERROR> E               File "/usr/share/miniconda/envs/testvenv/lib/python3.9/site-packages/_pytest/main.py", line 637, in perform_collect
INTERNALERROR> E                 hook.pytest_collection_modifyitems

It seems that the pytest_collection_modifyitems hook is run in parallel in all the workers (instead of sequentially) and this causes the crash.

@ogrisel
Copy link
Member

ogrisel commented Jan 13, 2021

It seems to work, but I cannot see the print statements from the sequential downloads, even with -vs flags:

(dev) ogrisel@arm64-apple-darwin20 scikit-learn % rm -rf ~/scikit_learn_data
(dev) ogrisel@arm64-apple-darwin20 scikit-learn % SKLEARN_SKIP_NETWORK_TESTS=0 pytest -n 4 -svx sklearn/datasets/tests/test_20news.py
============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.9.1, pytest-6.2.1, py-1.10.0, pluggy-0.13.1 -- /Users/ogrisel/miniforge3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: forked-1.2.0, xdist-2.2.0, timeout-1.4.2
[gw0] darwin Python 3.9.1 cwd: /Users/ogrisel/code/scikit-learn
[gw1] darwin Python 3.9.1 cwd: /Users/ogrisel/code/scikit-learn
[gw2] darwin Python 3.9.1 cwd: /Users/ogrisel/code/scikit-learn
[gw3] darwin Python 3.9.1 cwd: /Users/ogrisel/code/scikit-learn
[gw0] Python 3.9.1 | packaged by conda-forge | (default, Jan 10 2021, 02:51:11)  -- [Clang 11.0.0 ]
[gw1] Python 3.9.1 | packaged by conda-forge | (default, Jan 10 2021, 02:51:11)  -- [Clang 11.0.0 ]
[gw2] Python 3.9.1 | packaged by conda-forge | (default, Jan 10 2021, 02:51:11)  -- [Clang 11.0.0 ]
[gw3] Python 3.9.1 | packaged by conda-forge | (default, Jan 10 2021, 02:51:11)  -- [Clang 11.0.0 ]
gw0 [7] / gw1 [7] / gw2 [7] / gw3 [7]
scheduling tests via LoadScheduling

sklearn/datasets/tests/test_20news.py::test_20news 
sklearn/datasets/tests/test_20news.py::test_20news_length_consistency 
sklearn/datasets/tests/test_20news.py::test_20news_normalization 
sklearn/datasets/tests/test_20news.py::test_20news_vectorized 
[gw1] PASSED sklearn/datasets/tests/test_20news.py::test_20news_length_consistency 
sklearn/datasets/tests/test_20news.py::test_as_frame_no_pandas 
[gw0] PASSED sklearn/datasets/tests/test_20news.py::test_20news 
sklearn/datasets/tests/test_20news.py::test_20news_as_frame 
[gw1] PASSED sklearn/datasets/tests/test_20news.py::test_as_frame_no_pandas 
[gw3] PASSED sklearn/datasets/tests/test_20news.py::test_20news_normalization 
[gw2] PASSED sklearn/datasets/tests/test_20news.py::test_20news_vectorized 
sklearn/datasets/tests/test_20news.py::test_outdated_pickle 
[gw2] PASSED sklearn/datasets/tests/test_20news.py::test_outdated_pickle 
[gw0] PASSED sklearn/datasets/tests/test_20news.py::test_20news_as_frame 

=============================================================================================== 7 passed in 34.03s ===============================================================================================

@thomasjpfan
Copy link
Member Author

It seems that worker nodes capture all the output. I am trying to see if there is a way to pass the logs to the master node.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, do not waste any more time on this. I did a bit of googling and it seems complicated. This PR is already a net improvement.

+1 for merging if the [scipy-dev] build is green.

@ogrisel
Copy link
Member

ogrisel commented Jan 13, 2021

It's not green, but this is expected: test_exactly_zero_info_score is failing as reported in #19114 (comment).

@ogrisel ogrisel merged commit 0e546eb into scikit-learn:master Jan 13, 2021
@ogrisel
Copy link
Member

ogrisel commented Jan 13, 2021

Great! Thank you very much!

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Jan 18, 2021
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
jeremiedbb pushed a commit that referenced this pull request Jan 19, 2021
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants