-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
TST Download datasets before running pytest-xdist #19118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST Download datasets before running pytest-xdist #19118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If find the logic hard to follow: see comments below for details.
Also the documentation of SKLEARN_SKIP_NETWORK_TESTS
in doc/computing/parallelism.rst
should be updated to explain what is default behavior (skipping the network tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @thomasjpfan for your PR!
The CI is failing because the fixtures are not found. Maybe they should be included in sklearn/datasets/tests/conftest.py
instead of conftest.py
or the `@pytest.fixture marker is required?
I can reproduce this error (also triggered on some CI jobs):
when running
|
The coverage failure is expected: the network dependent tests are only run on the nightly scheduled job, hence not part of the coverage report. Merging. |
Actually, let's re-rerun with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, on a local run, after cleaning up the data folder, I get the following crash with this PR:
(dev) ogrisel@arm64-apple-darwin20 scikit-learn % rm -rf /Users/ogrisel/scikit_learn_data
(dev) ogrisel@arm64-apple-darwin20 scikit-learn % SKLEARN_SKIP_NETWORK_TESTS=0 pytest -n 4
==================================================================== test session starts ====================================================================
platform darwin -- Python 3.9.1, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/ogrisel/code/scikit-learn, configfile: setup.cfg
plugins: forked-1.2.0, xdist-2.2.0, timeout-1.4.2
gw0 [19408] / gw1 ok / gw2 ok / gw3 ok
INTERNALERROR> def worker_internal_error(self, node, formatted_error):
INTERNALERROR> """
INTERNALERROR> pytest_internalerror() was called on the worker.
INTERNALERROR>
INTERNALERROR> pytest_internalerror() arguments are an excinfo and an excrepr, which can't
INTERNALERROR> be serialized, so we go with a poor man's solution of raising an exception
INTERNALERROR> here ourselves using the formatted message.
INTERNALERROR> """
INTERNALERROR> self._active_nodes.remove(node)
INTERNALERROR> try:
INTERNALERROR> > assert False, formatted_error
INTERNALERROR> E AssertionError: Traceback (most recent call last):
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 269, in wrap_session
INTERNALERROR> E session.exitstatus = doit(config, session) or 0
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 322, in _main
INTERNALERROR> E config.hook.pytest_collection(session=session)
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> E return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> E return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> E self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> E return outcome.get_result()
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> E raise ex[1].with_traceback(ex[2])
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> E res = hook_impl.function(*args)
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 333, in pytest_collection
INTERNALERROR> E session.perform_collect()
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 637, in perform_collect
INTERNALERROR> E hook.pytest_collection_modifyitems(
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> E return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> E return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> E self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> E return outcome.get_result()
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> E raise ex[1].with_traceback(ex[2])
INTERNALERROR> E File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> E res = hook_impl.function(*args)
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/conftest.py", line 91, in pytest_collection_modifyitems
INTERNALERROR> E dataset_fetchers[name]()
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/utils/validation.py", line 63, in inner_f
INTERNALERROR> E return f(*args, **kwargs)
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_twenty_newsgroups.py", line 437, in fetch_20newsgroups_vectorized
INTERNALERROR> E data_train = fetch_20newsgroups(data_home=data_home,
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/utils/validation.py", line 63, in inner_f
INTERNALERROR> E return f(*args, **kwargs)
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_twenty_newsgroups.py", line 259, in fetch_20newsgroups
INTERNALERROR> E cache = _download_20newsgroups(target_dir=twenty_home,
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_twenty_newsgroups.py", line 75, in _download_20newsgroups
INTERNALERROR> E archive_path = _fetch_remote(ARCHIVE, dirname=target_dir)
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_base.py", line 1190, in _fetch_remote
INTERNALERROR> E checksum = _sha256(file_path)
INTERNALERROR> E File "/Users/ogrisel/code/scikit-learn/sklearn/datasets/_base.py", line 1156, in _sha256
INTERNALERROR> E with open(path, "rb") as f:
INTERNALERROR> E FileNotFoundError: [Errno 2] No such file or directory: '/Users/ogrisel/scikit_learn_data/20news_home/20news-bydate.tar.gz'
INTERNALERROR> E assert False
INTERNALERROR>
INTERNALERROR> ../../miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py:187: AssertionError
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 269, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/_pytest/main.py", line 323, in _main
INTERNALERROR> config.hook.pytest_runtestloop(session=session)
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> return outcome.get_result()
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> raise ex[1].with_traceback(ex[2])
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
INTERNALERROR> self.loop_once()
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py", line 135, in loop_once
INTERNALERROR> call(**kwargs)
INTERNALERROR> File "/Users/ogrisel/miniforge3/envs/dev/lib/python3.9/site-packages/xdist/dsession.py", line 175, in worker_workerfinished
INTERNALERROR> self._active_nodes.remove(node)
INTERNALERROR> KeyError: <WorkerController gw0>
so it seems that we still have a race condition with this PR.
The
It seems that the |
It seems to work, but I cannot see the print statements from the sequential downloads, even with
|
It seems that worker nodes capture all the output. I am trying to see if there is a way to pass the logs to the master node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, do not waste any more time on this. I did a bit of googling and it seems complicated. This PR is already a net improvement.
+1 for merging if the [scipy-dev]
build is green.
It's not green, but this is expected: |
Great! Thank you very much! |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Related to #19114
Related to #17553
What does this implement/fix? Explain your changes.
Allows pytest to fetch the datasets when
SKLEARN_SKIP_NETWORK_TESTS=0
.Any other comments?