CI Use Python 3.12 in scipy-dev #28383
Conversation
Revert "… renaming". This reverts commit be8b69c.
Also use the same logic in doc/conf.py and tweak doc.
Codecov is red but I think this is fine. I have commented below about the uncovered lines.
```python
try:
    tarfile.extractall(path, filter="data")
except TypeError:
    tarfile.extractall(path)
```
This line is not covered because it seems that in all our Python 3.9 builds we are using Python >= 3.9.17.
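For context, a minimal sketch of what the compatibility helper called in the hunks below could look like; the name `tarfile_extractall` comes from the diff and `utils.fixes` is mentioned later in this thread, but the docstring and exact signature are assumptions:

```python
def tarfile_extractall(tarfile, path):
    """Extract a tar archive, passing filter="data" where supported.

    The `filter` argument was added in Python 3.12 and backported to
    3.9.17, 3.10.12 and 3.11.4; older interpreters raise TypeError
    when it is passed.
    """
    try:
        tarfile.extractall(path, filter="data")
    except TypeError:
        tarfile.extractall(path)
```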
```diff
@@ -76,7 +77,8 @@ def _download_20newsgroups(target_dir, cache_path):
     archive_path = _fetch_remote(ARCHIVE, dirname=target_dir)
 
     logger.debug("Decompressing %s", archive_path)
-    tarfile.open(archive_path, "r:gz").extractall(path=target_dir)
+    with tarfile.open(archive_path, "r:gz") as fp:
+        tarfile_extractall(fp, path=target_dir)
```
This line was not covered before either in a PR (unless you triggered a scipy-dev build). Any issues will be caught in a scheduled CI run on main.
```diff
@@ -109,7 +110,8 @@ def _check_fetch_lfw(data_home=None, funneled=True, download_if_missing=True):
     import tarfile
 
     logger.debug("Decompressing the data archive to %s", data_folder_path)
-    tarfile.open(archive_path, "r:gz").extractall(path=lfw_home)
+    with tarfile.open(archive_path, "r:gz") as fp:
+        tarfile_extractall(fp, path=lfw_home)
```
This line was not covered before either in a PR (unless you triggered a scipy-dev build). Any issues will be caught in a scheduled CI run on main.
```diff
@@ -175,7 +176,8 @@ def progress(blocknum, bs, size):
     assert sha256(archive_path.read_bytes()).hexdigest() == ARCHIVE_SHA256
 
     print("untarring Reuters dataset...")
-    tarfile.open(archive_path, "r:gz").extractall(data_path)
+    with tarfile.open(archive_path, "r:gz") as fp:
+        tarfile_extractall(fp, data_path)
```
Since this is in our examples, it deserves a note so that users are not confused.
Good point, I was not sure what the best thing to do here was:

1. Use `.extractall(data_path, filter='data')`. This would work as long as you run this example with Python >= 3.9.17, which includes our doc build.
2. Do it the safest way and use our utils.fixes, but this makes the example code more complicated.

I have done 2. for now, but I am a bit unsure whether 1. would not be simpler.
I have just pushed a commit that does 1., i.e. uses `.extractall(data_path, filter='data')`. That means this particular example will not work if you use Python >= 3.9, < 3.9.17.
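For reference, a minimal sketch of what option 1 looks like in the example (the paths below are illustrative placeholders, not the example's actual values):

```python
import tarfile

archive_path = "reuters21578.tar.gz"  # illustrative placeholder paths
data_path = "reuters_data"

# filter="data" was added in Python 3.12 and backported to 3.9.17,
# 3.10.12 and 3.11.4; on 3.9.0-3.9.16 this call raises TypeError.
with tarfile.open(archive_path, "r:gz") as fp:
    fp.extractall(data_path, filter="data")
```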
Nice to see this one merged! The slowness of the dataset download in the scipy-dev build, and only inside pytest, will remain a mystery for eternity, but I can certainly live with it 😉
Decision
In order to be able to at least run locally with Python 3.12 with warnings as errors, it would be great to merge this PR without too much additional work. I personally have bumped into this often (I have a number of Python 3.12 environments), and it has been reported in #27949 (comment) and #28372 (comment).
Based on the investigation below, I have already spent enough time on this, so I am going to move the dataset download to the pylatest_conda_forge_mkl build. I also enabled network tests only on scheduled runs (I think that should work, but I am not 100% sure, based on this Azure doc and this SO question) to avoid adding ~10 minutes to the pylatest_conda_forge_mkl build on each push to a PR.
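For what it's worth, a minimal sketch of restricting a step to scheduled runs in Azure Pipelines: `Build.Reason` equals `Schedule` on scheduled builds, but the step itself (script body, display name) is hypothetical and not the actual pipeline change in this PR:

```yaml
# Hypothetical step: enable network tests only on scheduled builds.
# SKLEARN_SKIP_NETWORK_TESTS is the variable discussed below; the
# script body and display name are illustrative.
- script: |
    export SKLEARN_SKIP_NETWORK_TESTS=0
    python -m pytest sklearn
  displayName: Run tests with network tests enabled
  condition: eq(variables['Build.Reason'], 'Schedule')
```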
Ongoing investigation
Let's see if we observe slowness as in #28374.

- So scipy-dev is slow (build log); in that build I only ignored the warnings. Tests took ~37 minutes (total build time: 48 minutes).
- About 20 minutes elapse between pytest being launched and "test session starts" (log excerpt).
- This is due to the dataset download, since setting SKLEARN_SKIP_NETWORK_TESTS=1 makes it fast again: tests take ~16 minutes (total build time ~27 minutes), see build log. A sketch of this gating appears after this list.
- Some things don't make any sense and seem very CI-specific in this scipy-dev doc build: outside of pytest, fetch_covtype takes ~1.3 minutes; inside pytest, ~16 minutes; and pytest_collection_modifyitems with the full dataset download took 19.5 minutes.
- For comparison, in a similar build for pylatest_pip_openblas_pandas (build log), fetch_covtype takes 90 s both outside and inside pytest, and pytest_collection_modifyitems took ~6 minutes.
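As a rough illustration of the SKLEARN_SKIP_NETWORK_TESTS gating mentioned in the list above, a conftest.py hook along these lines can skip network-dependent tests. This is only a sketch: the "network" marker name and the default value are assumptions, not scikit-learn's actual logic:

```python
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Assumed convention: tests that download datasets carry a
    # "network" marker; skip them unless network tests are enabled.
    if os.environ.get("SKLEARN_SKIP_NETWORK_TESTS", "1") == "1":
        skip_network = pytest.mark.skip(reason="network tests disabled")
        for item in items:
            if "network" in item.keywords:
                item.add_marker(skip_network)
```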