[MRG] Download and test datasets in cron job #16348
Conversation
With this update to master, PRs may show a negative coverage result (compared to master), because they will not run the network tests. This would be a false positive.
One alternative is to disable uploading of codecov results from nightly builds. It is already confusing, since it does not run for each commit. I feel that running otherwise-untested code is more important than having 100% accurate coverage results (which are not fully trustworthy as it is).
Looking at the …
Thanks for the feedback. In addition, I realized that tests might be skipped again if their order changes at some point. Therefore, I added wrappers to fetch the datasets, similar to how …

EDIT: I don't know about the failing doc-min-dependencies build; it seems unrelated.
Merging with master should resolve the CircleCI issue.
How do you feel about putting the logic in `sklearn/datasets/tests/conftest.py` as a fixture?
```python
# sklearn/datasets/tests/conftest.py
from os import environ

import pytest

from sklearn.datasets import fetch_kddcup99 as _fetch_kddcup99


def _wrapped_fetch(f, dataset_name):
    """Fetch a dataset, downloading it only if the environment allows."""
    download_if_missing = environ.get('SKLEARN_SKIP_NETWORK_TESTS', '1') == '0'

    def wrapped(*args, **kwargs):
        kwargs['download_if_missing'] = download_if_missing
        try:
            return f(*args, **kwargs)
        except IOError:
            pytest.skip("Download {} to run this test".format(dataset_name))

    return wrapped


@pytest.fixture
def fetch_kddcup99():
    # Wrap the real fetcher (imported as _fetch_kddcup99 above), not the
    # fixture itself, to avoid infinite recursion.
    return _wrapped_fetch(_fetch_kddcup99, dataset_name='kddcup99')

# Define the other fixtures here as well.
```
And then in the test we can do:

```python
def test_percent10(fetch_kddcup99):
    data = fetch_kddcup99()
```
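As a sketch of the remaining step noted in the conftest.py example above ("define the other fixtures here as well"), another fetcher could be wrapped the same way, reusing `_wrapped_fetch` and the `pytest` import from that sketch. `fetch_covtype` is used here purely as an example and may not match the datasets actually covered by this PR:

```python
# Hypothetical continuation of the conftest.py sketch above: wrap a
# second real fetcher. fetch_covtype is an example choice; the PR may
# cover a different set of datasets.
from sklearn.datasets import fetch_covtype as _fetch_covtype


@pytest.fixture
def fetch_covtype():
    return _wrapped_fetch(_fetch_covtype, dataset_name='covtype')
```

With `SKLEARN_SKIP_NETWORK_TESTS` left at its default of `'1'`, these fixtures skip rather than download, so only a build that sets it to `'0'` (e.g. the cron job) exercises the network.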
Done, and much cleaner this way. Thanks! Some tests were still skipped because pandas is available (e.g. …).
I think it is out of scope; I would be happy to include this feature in another PR. On our CI, we have instances that do not have pandas installed, in order to test checks that require pandas to not be installed.
I reverted the pandas-related changes and created a new PR for that.
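For illustration only (this logic lives in the separate PR mentioned above, and the test name here is made up): a test that exercises the no-pandas code path could skip itself whenever pandas is importable, e.g.:

```python
# Hypothetical sketch of the pandas-availability skip discussed above:
# a test for the no-pandas code path must be skipped when pandas is
# importable. The test name is invented for illustration.
import pytest

try:
    import pandas  # noqa: F401
    HAS_PANDAS = True
except ImportError:
    HAS_PANDAS = False


@pytest.mark.skipif(HAS_PANDAS,
                    reason="requires pandas to not be installed")
def test_fetch_without_pandas():
    ...
```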
Thanks @VarIr! I'm not fully convinced this is the right use case for fixtures. For me, fixtures are for setting up an environment (e.g. creating test data) in which a test can run. Having tests that test fixtures is more confusing, since it means the pytest machinery will show up in the traceback for errors.

Still, since it was suggested in an earlier review, let's go forward with it. LGTM, aside from one comment below.
Renamed the fixtures. Unfortunately, the PR labeler still fails after merging master.
Thanks. Never mind that: cf. #16520
This PR is already quite advanced, so please feel free to ignore this comment. Still:

In the long run, I would rather consolidate everything under the Azure Pipelines CI and stop using Travis (and maybe even CircleCI). That would make it easier for maintainers, who would no longer have to deal with multiple logins on multiple CI systems with different permission models.

Azure Pipelines allows scheduled jobs, so we could run daily/weekly checks there.

I think we should still merge this PR as is: most of the changes are unrelated to the CI provider. Then migrate to Azure Pipelines once #16603 is merged.
Thanks @VarIr!
Thank you @VarIr!
Thank you all for reviewing!
Reference Issues/PRs
Fixes #16340
What does this implement/fix? Explain your changes.
Download and test datasets in a Travis cron job. Otherwise, these tests are never run automatically.
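A hypothetical sketch of how the cron gating might look in `.travis.yml` (this is an assumption for illustration, not the PR's actual diff; Travis sets `TRAVIS_EVENT_TYPE=cron` for scheduled builds):

```yaml
# Hypothetical illustration, not the PR's actual configuration:
# enable the network tests only for scheduled (cron) builds.
before_script:
  - if [[ "$TRAVIS_EVENT_TYPE" == "cron" ]]; then
      export SKLEARN_SKIP_NETWORK_TESTS=0;
    fi
```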
Any other comments?