CI Use Python 3.12 in scipy-dev #28383
Conversation
Revert "… renaming". This reverts commit be8b69c.
Also use the same logic in doc/conf.py and tweak doc.
Codecov is red but I think this is fine. I have commented below about the uncovered lines.
```python
try:
    tarfile.extractall(path, filter="data")
except TypeError:
    tarfile.extractall(path)
```
This line is not covered because it seems that in all our Python 3.9 builds we are using Python >= 3.9.17.
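For context, a minimal sketch of what the compatibility helper called in the hunks below could look like; the name `tarfile_extractall` comes from the diff and `utils.fixes` is mentioned later in this thread, but the docstring and exact signature are assumptions:

```python
def tarfile_extractall(tarfile, path):
    """Extract a tar archive, passing filter="data" where supported.

    The `filter` argument was added in Python 3.12 and backported to
    3.9.17, 3.10.12 and 3.11.4; older interpreters raise TypeError
    when it is passed.
    """
    try:
        tarfile.extractall(path, filter="data")
    except TypeError:
        tarfile.extractall(path)
```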
```diff
@@ -76,7 +77,8 @@ def _download_20newsgroups(target_dir, cache_path):
     archive_path = _fetch_remote(ARCHIVE, dirname=target_dir)
 
     logger.debug("Decompressing %s", archive_path)
-    tarfile.open(archive_path, "r:gz").extractall(path=target_dir)
+    with tarfile.open(archive_path, "r:gz") as fp:
+        tarfile_extractall(fp, path=target_dir)
```
This line was not covered before either in a PR (unless you triggered a scipy-dev build). Any issues will be caught in a scheduled CI run on main.
```diff
@@ -109,7 +110,8 @@ def _check_fetch_lfw(data_home=None, funneled=True, download_if_missing=True):
     import tarfile
 
     logger.debug("Decompressing the data archive to %s", data_folder_path)
-    tarfile.open(archive_path, "r:gz").extractall(path=lfw_home)
+    with tarfile.open(archive_path, "r:gz") as fp:
+        tarfile_extractall(fp, path=lfw_home)
```
This line was not covered before either in a PR (unless you triggered a scipy-dev build). Any issues will be caught in a scheduled CI run on main.
```diff
@@ -175,7 +176,8 @@ def progress(blocknum, bs, size):
     assert sha256(archive_path.read_bytes()).hexdigest() == ARCHIVE_SHA256
 
     print("untarring Reuters dataset...")
-    tarfile.open(archive_path, "r:gz").extractall(data_path)
+    with tarfile.open(archive_path, "r:gz") as fp:
+        tarfile_extractall(fp, data_path)
```
Since this is in our examples, it deserves a note so that users are not confused.
Good point, I was not sure what the best thing to do here was:

1. Use `.extractall(data_path, filter='data')`. This would work as long as you run this example with Python >= 3.9.17, which includes our doc build.
2. Do it the safest way and use our utils.fixes, but this makes the example code more complicated.

I have done 2. for now, but I am a bit unsure whether 1. would not be simpler.
I have just pushed a commit that does 1., i.e. uses `.extractall(data_path, filter='data')`. That means this particular example will not work if you use Python >= 3.9, < 3.9.17.
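For reference, a minimal sketch of what option 1 looks like in the example (the paths below are illustrative placeholders, not the example's actual values):

```python
import tarfile

archive_path = "reuters21578.tar.gz"  # illustrative placeholder paths
data_path = "reuters_data"

# filter="data" was added in Python 3.12 and backported to 3.9.17,
# 3.10.12 and 3.11.4; on 3.9.0-3.9.16 this call raises TypeError.
with tarfile.open(archive_path, "r:gz") as fp:
    fp.extractall(data_path, filter="data")
```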
Nice to see this one merged! The slowness of the dataset download in the scipy-dev build, and only inside pytest, will remain a mystery for eternity, but I can certainly live with it 😉
Decision
In order to be able to at least run locally with Python 3.12 with warnings as errors, it would be great to merge this PR without too much additional work. I personally have bumped into this often (I have a number of Python 3.12 environments), and it has been reported in #27949 (comment) and #28372 (comment).
Based on the investigation below, I have already spent enough time on this, so I am going to move the dataset download to the pylatest_conda_forge_mkl build. I also enabled network tests only on scheduled runs (I think that should work, but I am not 100% sure, based on this Azure doc and this SO question) to avoid adding ~10 minutes to the pylatest_conda_forge_mkl build on each push to a PR.
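For what it's worth, a minimal sketch of restricting a step to scheduled runs in Azure Pipelines: `Build.Reason` equals `Schedule` on scheduled builds, but the step itself (script body, display name) is hypothetical and not the actual pipeline change in this PR:

```yaml
# Hypothetical step: enable network tests only on scheduled builds.
# SKLEARN_SKIP_NETWORK_TESTS is the variable discussed below; the
# script body and display name are illustrative.
- script: |
    export SKLEARN_SKIP_NETWORK_TESTS=0
    python -m pytest sklearn
  displayName: Run tests with network tests enabled
  condition: eq(variables['Build.Reason'], 'Schedule')
```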
Ongoing investigation
Let's see if we observe slowness as in #28374.

- So scipy-dev is slow (build log); in that build I only ignored the warnings. Tests took ~37 minutes (total build time: 48 minutes).
- About 20 minutes elapse between pytest being launched and "test session starts" (log excerpt).
- This is due to the dataset download, since setting SKLEARN_SKIP_NETWORK_TESTS=1 makes it fast again: tests take ~16 minutes (total build time ~27 minutes), see build log. A sketch of this gating appears after this list.
- Some things don't make any sense and seem very CI-specific in this scipy-dev doc build: outside of pytest, fetch_covtype takes ~1.3 minutes; inside pytest, ~16 minutes; and pytest_collection_modifyitems with the full dataset download took 19.5 minutes.
- For comparison, in a similar build for pylatest_pip_openblas_pandas (build log), fetch_covtype takes 90 s both outside and inside pytest, and pytest_collection_modifyitems took ~6 minutes.
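As a rough illustration of the SKLEARN_SKIP_NETWORK_TESTS gating mentioned in the list above, a conftest.py hook along these lines can skip network-dependent tests. This is only a sketch: the "network" marker name and the default value are assumptions, not scikit-learn's actual logic:

```python
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Assumed convention: tests that download datasets carry a
    # "network" marker; skip them unless network tests are enabled.
    if os.environ.get("SKLEARN_SKIP_NETWORK_TESTS", "1") == "1":
        skip_network = pytest.mark.skip(reason="network tests disabled")
        for item in items:
            if "network" in item.keywords:
                item.add_marker(skip_network)
```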