
REL scikit-learn 1.5.2 for Python 3.13 #29987


Merged: 7 commits merged into scikit-learn:1.5.X on Oct 2, 2024


glemaitre
Member

@glemaitre glemaitre commented Oct 2, 2024

Closes #29973 (after manually uploading the generated wheels once this is merged into 1.5.X).

As discussed in the dev meeting, I'm trying to backport #29789 so that we can generate the Python 3.13 wheels ahead of the CPython release.

Let's try our luck here.


github-actions bot commented Oct 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 217ab6c. Link to the linter CI: here

@ogrisel
Member

ogrisel commented Oct 2, 2024

There is something fishy happening with openml.org:

  • on the regular doc build we get invalid digests:
Unexpected failing examples (2):

    ../examples/inspection/plot_partial_dependence.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/inspection/plot_partial_dependence.py", line 45, in <module>
        bikes = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True)
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1008, in fetch_openml
        data_info = _get_data_info_by_name(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 340, in _get_data_info_by_name
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset bike_sharing_demand with version 2 not found.

    ../examples/linear_model/plot_tweedie_regression_insurance_claims.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/linear_model/plot_tweedie_regression_insurance_claims.py", line 220, in <module>
        df = load_mtpl2()
      File "/home/circleci/project/examples/linear_model/plot_tweedie_regression_insurance_claims.py", line 73, in load_mtpl2
        df_sev = fetch_openml(data_id=41215, as_frame=True).data
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1127, in fetch_openml
        bunch = _download_data_to_bunch(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 681, in _download_data_to_bunch
        X, y, frame, categories = _retry_with_clean_cache(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 76, in wrapper
        return f(*args, **kw)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 524, in _load_arff_response
        raise ValueError(
    ValueError: md5 checksum of local file for data/v1/download/20649149 does not match description: expected: 24cc74449e3931cb1aad0d43a12e7a6e but got 87899c4863968a25a3dcd6f241ee91c3. Downloaded file could have been modified / corrupted, clean cache and retry...

  • on the doc-min-dependencies build we have missing datasets:
Unexpected failing examples:

    ../examples/linear_model/plot_tweedie_regression_insurance_claims.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/linear_model/plot_tweedie_regression_insurance_claims.py", line 220, in <module>
        df = load_mtpl2()
      File "/home/circleci/project/examples/linear_model/plot_tweedie_regression_insurance_claims.py", line 73, in load_mtpl2
        df_sev = fetch_openml(data_id=41215, as_frame=True).data
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1120, in fetch_openml
        data_qualities = _get_data_qualities(data_id, data_home)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 400, in _get_data_qualities
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset with data_id 41215 not found.

    ../examples/gaussian_process/plot_gpr_co2.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/gaussian_process/plot_gpr_co2.py", line 39, in <module>
        co2 = fetch_openml(data_id=41187, as_frame=True)
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1025, in fetch_openml
        data_description = _get_data_description_by_id(data_id, data_home)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 360, in _get_data_description_by_id
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset with data_id 41187 not found.

    ../examples/release_highlights/plot_release_highlights_0_22_0.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/release_highlights/plot_release_highlights_0_22_0.py", line 244, in <module>
        titanic = fetch_openml("titanic", version=1, as_frame=True, parser="pandas")
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1008, in fetch_openml
        data_info = _get_data_info_by_name(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 340, in _get_data_info_by_name
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset titanic with version 1 not found.

    ../examples/applications/plot_digits_denoising.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/applications/plot_digits_denoising.py", line 40, in <module>
        X, y = fetch_openml(data_id=41082, as_frame=False, return_X_y=True)
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1025, in fetch_openml
        data_description = _get_data_description_by_id(data_id, data_home)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 360, in _get_data_description_by_id
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset with data_id 41082 not found.

    ../examples/ensemble/plot_hgbt_regression.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/ensemble/plot_hgbt_regression.py", line 57, in <module>
        electricity = fetch_openml(
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1008, in fetch_openml
        data_info = _get_data_info_by_name(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 340, in _get_data_info_by_name
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset electricity with version 1 not found.

    ../examples/release_highlights/plot_release_highlights_1_1_0.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/release_highlights/plot_release_highlights_1_1_0.py", line 74, in <module>
        X, y = fetch_openml(
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 1008, in fetch_openml
        data_info = _get_data_info_by_name(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 340, in _get_data_info_by_name
        json_data = _get_json_content_from_openml_api(
      File "/home/circleci/project/sklearn/datasets/_openml.py", line 251, in _get_json_content_from_openml_api
        raise OpenMLError(error_message)
    sklearn.datasets._openml.OpenMLError: Dataset titanic with version 1 not found.

I will trigger a new doc run to check whether this failure is just transient or not.

Has the corrupted-file problem (in the download or in the cache) already been observed elsewhere?

EDIT: the new doc build was successful...
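The checksum failure above means the MD5 digest of the cached download did not match the one advertised by OpenML. For anyone wanting to check a cached file by hand, here is a minimal standard-library sketch; the helper names `file_md5` and `md5_matches` are hypothetical, and this is not scikit-learn's internal `_openml` code:

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops once read() hits end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected_md5):
    """Hypothetical helper: True if the file's MD5 equals the expected hex digest."""
    return file_md5(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Illustrative content only; a real check would point at the cached file.
        path = Path(tmp) / "download.arff"
        path.write_bytes(b"hello")
        print(md5_matches(path, "5d41402abc4b2a76b9719d911017c592"))  # True
```

On a mismatch, the remedy suggested by the error message is to delete the cached copy and download it again, which is what a clean retry of the doc build effectively did here.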

Member

@ogrisel ogrisel left a comment


The diff looks good, but someone should probably install and test the resulting wheel manually on at least one platform before uploading it to PyPI.

@ogrisel
Member

ogrisel commented Oct 2, 2024

Question: shall we upload both free-threaded and GIL-enabled wheels to PyPI?

  • numpy does it:

https://pypi.org/project/numpy/#files

  • scipy does not (yet):

https://pypi.org/project/scipy/#files

I am +0 on doing so, even if it's of limited use as long as scipy does not publish them either. But at least we will be ready the day it does.
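On PyPI, the two variants sit side by side and differ in the ABI tag of the wheel filename: `cp313` for the regular build and `cp313t` for the free-threaded one (the same `cp313t` that shows up in the Windows build path in the log below). A minimal standard-library sketch for telling them apart; the function names are hypothetical, and real tooling would more likely rely on `packaging.utils.parse_wheel_filename`:

```python
import re


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename.

    Assumes the standard layout
    {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl;
    compressed tag sets such as "py2.py3" are returned unexpanded.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    # The last three dash-separated fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_free_threaded_wheel(filename):
    """True if the ABI tag marks a free-threaded CPython build, e.g. 'cp313t'."""
    _, abi_tag, _ = wheel_tags(filename)
    # CPython ABI tags are "cp" + version digits + optional flag letters;
    # the "t" flag denotes the free-threaded build.
    match = re.fullmatch(r"cp\d+([a-z]*)", abi_tag)
    return bool(match) and "t" in match.group(1)


if __name__ == "__main__":
    # Illustrative filenames for a hypothetical package, not real PyPI files.
    for name in (
        "example_pkg-1.0-cp313-cp313-win_amd64.whl",
        "example_pkg-1.0-cp313-cp313t-win_amd64.whl",
    ):
        print(name, is_free_threaded_wheel(name))
```

Matching on the flag letters rather than a plain `endswith("t")` keeps tags like `abi3` or `none` from being misclassified.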

@lesteve
Member

lesteve commented Oct 2, 2024

Do we want to upload free-threaded wheels?

The failures on Windows free-threaded happen because there are no numpy development wheels (and no scipy wheels) for free-threaded Python on Windows, and the attempt to build numpy from source fails.
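When manually smoke-testing a wheel, it also helps to confirm which interpreter variant is actually running. A minimal sketch, assuming CPython: `sysconfig.get_config_var("Py_GIL_DISABLED")` is 1 on free-threaded builds, and the private `sys._is_gil_enabled()` (new in 3.13) reports whether the GIL is active, since a free-threaded interpreter can re-enable it at runtime. The function name `interpreter_threading_info` is hypothetical:

```python
import sys
import sysconfig


def interpreter_threading_info():
    """Report whether this CPython build is free-threaded and whether the GIL is active."""
    # 1 on free-threaded ("t") builds; 0 or None on regular builds and older versions.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on 3.13+; older versions always have a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_gil_enabled()),
    }


if __name__ == "__main__":
    print(interpreter_threading_info())
```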

@ogrisel
Member

ogrisel commented Oct 2, 2024

The Windows free-threading wheel build failed as follows:

C:\Users\runneradmin\AppData\Local\Temp\cibw-run-ydyqpout\cp313t-win_amd64\build\venv\Scripts\python.exe C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864\vendored-meson\meson\meson.py setup C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864 C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864\.mesonpy-qpaqen1j -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864\.mesonpy-qpaqen1j\meson-python-native-file.ini
      The Meson build system
      Version: 1.4.99
      Source dir: C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864
      Build dir: C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864\.mesonpy-qpaqen1j
      Build type: native build
      Project name: NumPy
      Project version: 2.1.1
      C compiler for the host machine: gcc (gcc 12.2.0 "gcc (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 12.2.0")
      C linker for the host machine: gcc ld.bfd 2.39
      C++ compiler for the host machine: c++ (gcc 12.2.0 "c++ (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 12.2.0")
      C++ linker for the host machine: c++ ld.bfd 2.39
      Cython compiler for the host machine: cython (cython 3.1.0)
      Host machine cpu family: x86_64
      Host machine cpu: x86_64
      Program python found: YES (C:\Users\runneradmin\AppData\Local\Temp\cibw-run-ydyqpout\cp313t-win_amd64\build\venv\Scripts\python.exe)
      Could not find Python3 library 'C:\\Users\\runneradmin\\AppData\\Local\\pypa\\cibuildwheel\\Cache\\nuget-cpython\\python-freethreaded.3.13.0-rc2\\tools\\python313.dll'
      Run-time dependency python found: NO (tried sysconfig)
  
      ..\meson.build:41:12: ERROR: Python dependency not found
  
      A full log can be found at C:\Users\runneradmin\AppData\Local\Temp\pip-install-vvlc3xgm\numpy_ebb35a4b339e4fd6af18a4584bcc3864\.mesonpy-qpaqen1j\meson-logs\meson-log.txt
      error: subprocess-exited-with-error
  
      Preparing metadata (pyproject.toml) did not run successfully.
      exit code: 1

I am fine with disabling free-threading in 1.5.X for now if we want to release the regular wheels faster.

@jeremiedbb
Member

I think it's okay to delay the free-threaded wheels until 1.6, making it the first release to support free-threaded Python.

@glemaitre
Member Author

OK. So I'll disable the free-threaded builds here, and I'll open a PR against main to add macOS since it is already available.

@lesteve
Member

lesteve commented Oct 2, 2024

Merging, thanks!

@lesteve lesteve merged commit 870081c into scikit-learn:1.5.X Oct 2, 2024
30 checks passed