CI Use Python 3.12 in scipy-dev #28383

Conversation

lesteve (Member) commented Feb 8, 2024

Decision

To be able to at least run the tests locally with Python 3.12 with warnings treated as errors, it would be great to merge this PR without too much additional work. I have personally bumped into this often (I have a number of Python 3.12 environments), and it has also been reported in #27949 (comment) and #28372 (comment).

Based on the investigation below, I have already spent enough time on this, so I am going to move the dataset download to the pylatest_conda_forge_mkl build. I also enabled network tests only on scheduled runs (I think that should work, but I am not 100% sure, based on this Azure doc and this SO question) to avoid adding ~10 minutes to the pylatest_conda_forge_mkl build on each push to a PR.
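As a rough sketch of the gating (not the exact change in this PR): Azure Pipelines exposes its predefined Build.Reason variable to scripts as the BUILD_REASON environment variable, which equals "Schedule" for cron-triggered builds, and scikit-learn's test suite already honours SKLEARN_SKIP_NETWORK_TESTS:

    # Sketch only: run the dataset-fetching network tests on scheduled builds
    # and skip them on regular pushes to a PR.
    import os

    if os.environ.get("BUILD_REASON") == "Schedule":
        os.environ["SKLEARN_SKIP_NETWORK_TESTS"] = "0"
    else:
        os.environ["SKLEARN_SKIP_NETWORK_TESTS"] = "1"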

Ongoing investigation

Let's see if we observe the same slowness as in #28374.

  • So scipy-dev is slow (build log), even though I only ignored the warnings: tests took ~37 minutes (total build time 48 minutes).

    About 20 minutes pass between pytest being launched and "test session starts".

    Log excerpt
    2024-02-08T06:32:55.6427369Z + eval 'python -m pytest --showlocals --durations=20 --junitxml=test-data.xml --cov-config='\''/home/vsts/work/1/s/.coveragerc'\'' --cov sklearn --cov-report= -n2 --maxfail=10 --pyargs sklearn'
    2024-02-08T06:32:55.6428939Z ++ python -m pytest --showlocals --durations=20 --junitxml=test-data.xml --cov-config=/home/vsts/work/1/s/.coveragerc --cov sklearn --cov-report= -n2 --maxfail=10 --pyargs sklearn
    2024-02-08T06:52:55.6985287Z Downloading file 'face.dat' from 'https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat' to '/home/vsts/.cache/scipy-data'.
    2024-02-08T06:53:17.4786746Z ============================= test session starts ==============================
    2024-02-08T06:53:17.4788242Z platform linux -- Python 3.12.1, pytest-8.0.0, pluggy-1.4.0
    2024-02-08T06:53:17.4937170Z rootdir: /home/vsts/work/tmp_folder
    2024-02-08T06:53:17.4937852Z configfile: setup.cfg
    2024-02-08T06:53:17.4938359Z plugins: xdist-3.5.0, cov-4.1.0
    2024-02-08T06:53:17.4938636Z created: 2/2 workers
    2024-02-08T06:53:17.4938881Z 2 workers [36000 items]
    2024-02-08T06:53:17.4939058Z 
    2024-02-08T06:53:17.7882103Z ssssssssssssssssssssssssss
    
  • This is due to the dataset downloads: setting SKLEARN_SKIP_NETWORK_TESTS=1 makes it fast again, with tests taking ~16 minutes (total build time ~27 minutes); see build log.

  • Some things don't make any sense and seem very CI-specific in this scipy-dev doc build: outside of pytest, fetch_covtype takes ~1.3 minutes; inside pytest, ~16 minutes. pytest_collection_modifyitems with the full dataset download took 19.5 minutes (a repro sketch follows the logs at the end of this list).

       Ordered by: cumulative time
       List reduced from 747 to 20 due to restriction <20>
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          7/1    0.000    0.000  340.830  340.830 {built-in method builtins.exec}
            1    0.005    0.005  340.830  340.830 <string>:1(<module>)
          3/1    0.019    0.006  340.825  340.825 _param_validation.py:182(wrapper)
            1    4.654    4.654  340.814  340.814 _rcv1.py:75(fetch_rcv1)
            6    0.000    0.000  197.327   32.888 _base.py:1410(_fetch_remote)
            6    0.156    0.026  195.378   32.563 request.py:222(urlretrieve)
       110323    0.165    0.000  189.944    0.002 socket.py:693(readinto)
       110323    0.219    0.000  189.649    0.002 ssl.py:1238(recv_into)
       110323    0.108    0.000  189.389    0.002 ssl.py:1096(read)
       110323  189.269    0.002  189.269    0.002 {method 'read' of '_ssl._SSLSocket' objects}
       165146    0.548    0.000  189.017    0.001 {method 'read' of '_io.BufferedReader' objects}
        79980    0.165    0.000  188.857    0.002 client.py:463(read)
    80120/80064    1.361    0.000   83.705    0.001 {method 'write' of '_io.BufferedWriter' objects}
            4    0.000    0.000   83.083   20.771 numpy_pickle.py:424(dump)
            4    0.000    0.000   83.042   20.761 pickle.py:470(dump)
        274/4    0.000    0.000   83.042   20.760 numpy_pickle.py:322(save)
            8    0.023    0.003   83.038   10.380 numpy_pickle.py:97(write_array)
           60    0.002    0.000   82.771    1.380 compressor.py:466(write)
        274/4    0.001    0.000   82.516   20.629 pickle.py:529(save)
         16/4    0.000    0.000   82.516   20.629 pickle.py:615(save_reduce)
              40240826 function calls (40240584 primitive calls) in 76.848 seconds
    
       Ordered by: cumulative time
       List reduced from 699 to 20 due to restriction <20>
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          7/1    0.000    0.000   76.848   76.848 {built-in method builtins.exec}
            1    0.001    0.001   76.848   76.848 <string>:1(<module>)
          2/1    0.000    0.000   76.847   76.847 _param_validation.py:182(wrapper)
            1    1.151    1.151   76.846   76.846 _covtype.py:66(fetch_covtype)
            2    0.000    0.000   43.998   21.999 numpy_pickle.py:424(dump)
            2    0.000    0.000   43.996   21.998 pickle.py:470(dump)
         75/2    0.000    0.000   43.996   21.998 numpy_pickle.py:322(save)
            2    0.057    0.029   43.995   21.998 numpy_pickle.py:97(write_array)
    1423/1405    0.024    0.000   43.839    0.031 {method 'write' of '_io.BufferedWriter' objects}
           20    0.000    0.000   43.825    2.191 compressor.py:466(write)
           20   43.695    2.185   43.695    2.185 {method 'compress' of 'zlib.Compress' objects}
            1   14.048   14.048   28.825   28.825 _npyio_impl.py:1714(genfromtxt)
     31955660    8.084    0.000    8.084    0.000 _iotools.py:670(_loose_call)
            1    2.885    2.885    2.885    2.885 {built-in method numpy.array}
            1    0.000    0.000    2.862    2.862 _base.py:1410(_fetch_remote)
            1    0.002    0.002    2.829    2.829 request.py:222(urlretrieve)
       581013    0.371    0.000    2.274    0.000 _iotools.py:225(__call__)
         2169    0.003    0.000    2.156    0.001 socket.py:693(readinto)
         2169    0.004    0.000    2.151    0.001 ssl.py:1238(recv_into)
         2169    0.002    0.000    2.147    0.001 ssl.py:1096(read)
     total 8.0K
    drwxr-xr-x 2 vsts docker 4.0K Feb  8 21:35 RCV1
    drwxr-xr-x 2 vsts docker 4.0K Feb  8 21:36 covertype
    544M	/home/vsts/scikit_learn_data
    + eval 'python -m pytest --showlocals --durations=20 --junitxml=test-data.xml --cov-config='\''/home/vsts/work/1/s/.coveragerc'\'' --cov sklearn --cov-report= --maxfail=10 --pyargs sklearn'
    ++ python -m pytest --showlocals --durations=20 --junitxml=test-data.xml --cov-config=/home/vsts/work/1/s/.coveragerc --cov sklearn --cov-report= --maxfail=10 --pyargs sklearn
    Downloading file 'face.dat' from 'https://raw.githubusercontent.com/scipy/dataset-face/main/face.dat' to '/home/vsts/.cache/scipy-data'.
    ============================= test session starts ==============================
    platform linux -- Python 3.12.1, pytest-8.0.0, pluggy-1.4.0
    rootdir: /home/vsts/work/tmp_folder
    configfile: setup.cfg
    plugins: xdist-3.5.0, cov-4.1.0
    pytest_configure
    pytest_collection_modifyitems
    9 datasets to download
    dataset: fetch_20newsgroups_fxt
    dataset: fetch_20newsgroups_fxt took 8.54s
    dataset: fetch_20newsgroups_vectorized_fxt
    dataset: fetch_20newsgroups_vectorized_fxt took 18.54s
    dataset: fetch_california_housing_fxt
    dataset: fetch_california_housing_fxt took 1.51s
    dataset: fetch_covtype_fxt
    dataset: fetch_covtype_fxt took 947.15s
    dataset: fetch_kddcup99_fxt
    dataset: fetch_kddcup99_fxt took 14.82s
    dataset: fetch_olivetti_faces_fxt
    downloading Olivetti faces from https://ndownloader.figshare.com/files/5976027 to /home/vsts/scikit_learn_data
    dataset: fetch_olivetti_faces_fxt took 2.73s
    dataset: fetch_rcv1_fxt
    dataset: fetch_rcv1_fxt took 174.58s
    dataset: fetch_species_distributions_fxt
    dataset: fetch_species_distributions_fxt took 5.50s
    dataset: raccoon_face_fxt
    dataset: raccoon_face_fxt took 0.86s
    pytest_collection_modifyitems took 1174.38s
    
  • A similar build for pylatest_pip_openblas_pandas (build log): fetch_covtype takes ~90s outside of pytest and about the same inside pytest; pytest_collection_modifyitems took ~6 minutes.

       Ordered by: cumulative time
       List reduced from 704 to 20 due to restriction <20>
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          4/1    0.000    0.000  196.617  196.617 {built-in method builtins.exec}
            1    0.004    0.004  196.616  196.616 <string>:1(<module>)
          3/1    0.024    0.008  196.612  196.612 _param_validation.py:182(wrapper)
            1    5.202    5.202  196.600  196.600 _rcv1.py:75(fetch_rcv1)
    80120/80064    1.796    0.000   96.369    0.001 {method 'write' of '_io.BufferedWriter' objects}
            4    0.000    0.000   95.571   23.893 numpy_pickle.py:424(dump)
            4    0.000    0.000   95.524   23.881 pickle.py:476(dump)
        274/4    0.000    0.000   95.524   23.881 numpy_pickle.py:322(save)
            8    0.037    0.005   95.519   11.940 numpy_pickle.py:97(write_array)
           60    0.002    0.000   95.137    1.586 compressor.py:466(write)
        274/4    0.001    0.000   94.877   23.719 pickle.py:535(save)
         16/4    0.000    0.000   94.877   23.719 pickle.py:621(save_reduce)
         10/4    0.000    0.000   94.876   23.719 pickle.py:964(save_dict)
         10/4    0.000    0.000   94.876   23.719 pickle.py:977(_batch_setitems)
           60   94.025    1.567   94.025    1.567 {method 'compress' of 'zlib.Compress' objects}
            1    0.094    0.094   58.179   58.179 _svmlight_format_io.py:247(load_svmlight_files)
            1    0.000    0.000   57.624   57.624 _svmlight_format_io.py:371(<listcomp>)
            5   41.127    8.225   57.624   11.525 _svmlight_format_io.py:224(_open_and_load)
            6    0.000    0.000   31.112    5.185 _base.py:1410(_fetch_remote)
            6    0.182    0.030   28.740    4.790 request.py:221(urlretrieve)
              40291464 function calls (40291241 primitive calls) in 76.936 seconds
    
       Ordered by: cumulative time
       List reduced from 659 to 20 due to restriction <20>
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          4/1    0.000    0.000   90.255   90.255 {built-in method builtins.exec}
            1    0.006    0.006   90.255   90.255 <string>:1(<module>)
          2/1    0.000    0.000   90.249   90.249 _param_validation.py:182(wrapper)
            1    1.559    1.559   90.249   90.249 _covtype.py:66(fetch_covtype)
            2    0.000    0.000   52.408   26.204 numpy_pickle.py:424(dump)
            2    0.000    0.000   52.407   26.204 pickle.py:476(dump)
         75/2    0.000    0.000   52.407   26.204 numpy_pickle.py:322(save)
            2    0.080    0.040   52.406   26.203 numpy_pickle.py:97(write_array)
    1423/1405    0.033    0.000   52.225    0.037 {method 'write' of '_io.BufferedWriter' objects}
           20    0.001    0.000   52.207    2.610 compressor.py:466(write)
           20   52.046    2.602   52.046    2.602 {method 'compress' of 'zlib.Compress' objects}
            1    3.369    3.369   34.302   34.302 npyio.py:1742(genfromtxt)
            1    0.001    0.001   22.657   22.657 npyio.py:2327(<listcomp>)
     31955660    9.337    0.000    9.337    0.000 _iotools.py:670(_loose_call)
            1    4.138    4.138    4.138    4.138 {built-in method numpy.array}
       581013    0.391    0.000    2.407    0.000 _iotools.py:225(__call__)
            1    0.000    0.000    1.966    1.966 _base.py:1410(_fetch_remote)
            1    0.003    0.003    1.926    1.926 request.py:221(urlretrieve)
       581013    0.523    0.000    1.625    0.000 _iotools.py:198(_delimited_splitter)
       581013    0.424    0.000    1.523    0.000 gzip.py:396(readline)
     total 8.0K
    drwxr-xr-x 2 vsts docker 4.0K Feb  8 21:30 RCV1
    drwxr-xr-x 2 vsts docker 4.0K Feb  8 21:31 covertype
    544M	/home/vsts/scikit_learn_data
    + eval 'python -m pytest --showlocals --durations=20 --junitxml=test-data.xml --cov-config='\''/home/vsts/work/1/s/.coveragerc'\'' --cov sklearn --cov-report= --maxfail=10 --pyargs sklearn'
    ++ python -m pytest --showlocals --durations=20 --junitxml=test-data.xml --cov-config=/home/vsts/work/1/s/.coveragerc --cov sklearn --cov-report= --maxfail=10 --pyargs sklearn
    ============================= test session starts ==============================
    platform linux -- Python 3.9.18, pytest-8.0.0, pluggy-1.4.0
    rootdir: /home/vsts/work/tmp_folder
    configfile: setup.cfg
    plugins: xdist-3.5.0, cov-4.1.0
    pytest_configure
    pytest_collection_modifyitems
    9 datasets to download
    dataset: fetch_20newsgroups_fxt
    dataset: fetch_20newsgroups_fxt took 10.17s
    dataset: fetch_20newsgroups_vectorized_fxt
    dataset: fetch_20newsgroups_vectorized_fxt took 20.59s
    dataset: fetch_california_housing_fxt
    dataset: fetch_california_housing_fxt took 1.36s
    dataset: fetch_covtype_fxt
    dataset: fetch_covtype_fxt took 91.70s
    dataset: fetch_kddcup99_fxt
    dataset: fetch_kddcup99_fxt took 18.59s
    dataset: fetch_olivetti_faces_fxt
    downloading Olivetti faces from https://ndownloader.figshare.com/files/5976027 to /home/vsts/scikit_learn_data
    dataset: fetch_olivetti_faces_fxt took 2.63s
    dataset: fetch_rcv1_fxt
    dataset: fetch_rcv1_fxt took 203.30s
    dataset: fetch_species_distributions_fxt
    dataset: fetch_species_distributions_fxt took 6.67s
    dataset: raccoon_face_fxt
    pytest_collection_modifyitems took 355.17s
    
    


github-actions bot commented Feb 8, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f19f527.

lesteve marked this pull request as ready for review February 9, 2024 10:58
lesteve changed the title from "CI Second attempt on moving scipy-dev to Python 3.12" to "CI Use Python 3.12 in scipy-dev" Feb 9, 2024
lesteve (Member Author) left a comment

Codecov is red but I think this is fine. I have commented below about the uncovered lines.

try:
    tarfile.extractall(path, filter="data")
except TypeError:
    tarfile.extractall(path)
lesteve (Member Author) commented:

This line is not covered because, it seems, all our Python 3.9 builds use Python >= 3.9.17.
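
For context, here is a self-contained sketch of what this compatibility shim boils down to (the tarfile_extractall name comes from the hunks below; the parameter is renamed here to avoid shadowing the tarfile module, and the exact code in sklearn.utils.fixes may differ):

    import tarfile

    def tarfile_extractall(tarball, path):
        # The `filter` argument was added in Python 3.12 and backported to
        # bugfix releases such as 3.9.17; older interpreters raise TypeError,
        # in which case we fall back to the legacy behaviour.
        try:
            tarball.extractall(path, filter="data")
        except TypeError:
            tarball.extractall(path)

    # Hypothetical usage, assuming some archive.tar.gz exists locally:
    with tarfile.open("archive.tar.gz", "r:gz") as fp:
        tarfile_extractall(fp, path=".")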

@@ -76,7 +77,8 @@ def _download_20newsgroups(target_dir, cache_path):
 archive_path = _fetch_remote(ARCHIVE, dirname=target_dir)
 
 logger.debug("Decompressing %s", archive_path)
-tarfile.open(archive_path, "r:gz").extractall(path=target_dir)
+with tarfile.open(archive_path, "r:gz") as fp:
+    tarfile_extractall(fp, path=target_dir)
lesteve (Member Author) commented Feb 12, 2024:

This line was not covered before either in a PR (unless you triggered a scipy-dev build).

Any issues will be caught in a scheduled CI run on main

@@ -109,7 +110,8 @@ def _check_fetch_lfw(data_home=None, funneled=True, download_if_missing=True):
 import tarfile
 
 logger.debug("Decompressing the data archive to %s", data_folder_path)
-tarfile.open(archive_path, "r:gz").extractall(path=lfw_home)
+with tarfile.open(archive_path, "r:gz") as fp:
+    tarfile_extractall(fp, path=lfw_home)
lesteve (Member Author) commented Feb 12, 2024:

This line was not covered before either in a PR (unless you triggered a scipy-dev build).

Any issues will be caught in a scheduled CI run on main

@@ -175,7 +176,8 @@ def progress(blocknum, bs, size):
 assert sha256(archive_path.read_bytes()).hexdigest() == ARCHIVE_SHA256
 
 print("untarring Reuters dataset...")
-tarfile.open(archive_path, "r:gz").extractall(data_path)
+with tarfile.open(archive_path, "r:gz") as fp:
+    tarfile_extractall(fp, data_path)
A reviewer (Member) commented:

Since this is in our examples, it deserves a note so that users are not confused.

lesteve (Member Author) commented Feb 13, 2024:

Good point, I was not sure what the best thing to do here was:

  1. Use .extractall(data_path, filter='data'). This works as long as you run this example with Python >= 3.9.17, which includes our doc build.
  2. Do it the safest way and use our utils.fixes helper, but this makes the example code more complicated.

I have done 2. for now, but I am a bit unsure whether 1. would not be simpler.

lesteve (Member Author) commented Feb 14, 2024:

I have just pushed a commit that does 1., i.e. uses .extractall(data_path, filter='data'). That means this particular example will not work on Python >= 3.9 but < 3.9.17.
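
For the record, option 1. in the example then looks something like this (a sketch with illustrative file names, not the exact example code):

    import tarfile

    archive_path = "reuters21578.tar.gz"  # hypothetical: the downloaded archive
    data_path = "reuters"                 # hypothetical: the extraction target

    # Rely on the `filter` argument directly; it exists on Python 3.12 and on
    # bugfix releases such as 3.9.17, and raises TypeError on older versions.
    with tarfile.open(archive_path, "r:gz") as fp:
        fp.extractall(data_path, filter="data")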

thomasjpfan merged commit 50e8749 into scikit-learn:main Feb 14, 2024
lesteve deleted the simpler-warnings-as-errors-setup-python3.12 branch February 15, 2024 10:21
lesteve (Member Author) commented Feb 15, 2024

Nice to see this one merged! The slowness of the dataset downloads in the scipy-dev build, and only inside pytest, will remain a mystery for all eternity, but I can certainly live with it 😉
