ENH Allow for appropriate dtype use in `preprocessing.PolynomialFeatures` for sparse matrices #23731

Micky774 · 2022-06-22T18:07:41Z

Reference Issues/PRs

Fixes #16803
Fixes #17554
Resolves #19676 (stalled)
Resolves #20524 (stalled)

What does this implement/fix? Explain your changes.

PR #20524: Calculates the number of non-zero terms for each degree (row-wise) and creates dense arrays for data/indices/indptr to pass to the Cython routine _csr_polynomial_expansion. Since the output size is known a priori, the appropriate dtype can be used during construction. The use of fused types in _csr_polynomial_expansion allows only the minimally sufficient index dtype to be used, reducing wasted memory when int32 is sufficient.
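To make the counting argument concrete, here is a minimal Python sketch of how the expanded size and index dtype could be worked out ahead of time. It is not the code from this PR: the helper names `expanded_nnz_per_row` and `minimal_index_dtype` are hypothetical, and the check is deliberately conservative (it compares counts, not the largest index actually stored).

```python
from math import comb

import numpy as np


def expanded_nnz_per_row(row_nnz, degree, interaction_only=False):
    """Count the non-zero monomials of exactly `degree` produced by one row.

    Hypothetical helper: `row_nnz` is the number of stored entries in the row.
    """
    if interaction_only:
        # Products of `degree` distinct non-zero entries.
        return comb(row_nnz, degree)
    # Products of `degree` non-zero entries, repetition allowed.
    return comb(row_nnz + degree - 1, degree)


def minimal_index_dtype(indptr, n_features, degree, interaction_only=False):
    """Return np.int32 if the expansion's counts fit in int32, else np.int64.

    Hypothetical helper: uses Python integers so the counts cannot overflow.
    """
    row_nnz = np.diff(indptr)
    total_nnz = sum(
        expanded_nnz_per_row(int(k), degree, interaction_only) for k in row_nnz
    )
    if interaction_only:
        n_columns = comb(n_features, degree)
    else:
        n_columns = comb(n_features + degree - 1, degree)
    # Conservative bound covering both the indptr offsets and the column indices.
    largest = max(total_nnz, n_columns)
    return np.int32 if largest <= np.iinfo(np.int32).max else np.int64


indptr = np.array([0, 2, 5])  # two rows holding 2 and 3 stored entries
print(minimal_index_dtype(indptr, n_features=10, degree=2))
print(minimal_index_dtype(indptr, n_features=100_000, degree=2))
```

With 100,000 input features, the degree-2 terms alone need about 5 billion columns, which is the kind of case where int32 indices are no longer enough.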

This PR: reconciles w/ main and makes minor changes.

Any other comments?

The full functionality of this PR is really only enabled in scipy_version>1.8 since it depends on an upstream bug fix

…(bis) Fixes scikit-learn#19676

…nto csr_polynomial

ogrisel

Thanks for the PR. Here is a first pass of feedback.

doc/whats_new/v1.2.rst

sklearn/preprocessing/_csr_polynomial_expansion.pyx

sklearn/preprocessing/_polynomial.py

sklearn/preprocessing/tests/test_polynomial.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

sklearn/preprocessing/_polynomial.py

sklearn/preprocessing/_csr_polynomial_expansion.pyx

sklearn/preprocessing/tests/test_polynomial.py

sklearn/preprocessing/_polynomial.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

jjerphan

Thank you, @Micky774.

A few comments.

Note that a specific dtype and C type have been added for sparse matrices indices.

scikit-learn/sklearn/utils/_typedefs.pxd

Lines 19 to 28 in b157ac7

    
    # scipy matrices indices dtype (namely for indptr and indices arrays)
    #
    #   Note that indices might need to be represented as cnp.int64_t.
    #   Currently, we use Cython classes which do not handle fused types
    #   so we hardcode this type to cnp.int32_t, supporting all but edge
    #   cases.
    #
    # TODO: support cnp.int64_t for this case
    # See: https://github.com/scikit-learn/scikit-learn/issues/23653
    ctypedef cnp.int32_t SPARSE_INDEX_TYPE_t

Could we propagate this here for semantics?

sklearn/preprocessing/_csr_polynomial_expansion.pyx

sklearn/preprocessing/tests/test_polynomial.py

thomasjpfan

LGTM

sklearn/preprocessing/_polynomial.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Micky774 · 2023-04-24T15:03:23Z

@ogrisel wondering if you could take another look

ogrisel

Thanks for the update. Here are some more comments for a build problem that I am still investigating:

ogrisel · 2023-04-25T08:35:00Z

sklearn/preprocessing/tests/test_polynomial.py

+    # On Windows, scikit-learn is typically compiled with MSVC that
+    # does not support int128 arithmetic (at the time of writing):
+    # https://stackoverflow.com/a/6761962/163740
+    if sys.maxsize <= 2**32 or sys.platform == "win32":


The sys.maxsize <= 2**32 part was unexpected to me but I checked:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=54466&view=ms.vss-test-web.build-test-results-tab

and added "Passed" to the status filter and test_sizeof_LARGEST_INT_t to the name filter, and indeed it passed on our 32 bit linux build (with gcc).

So I think we need to check if this test would also pass with clang on 32 bit linux (at least manually, we don't necessarily need to configure a new CI entry for this).

clang (or clang-cl) on Windows would be challenging to test because I believe there is no documented way to use this compiler to build Python extensions at this time.

I tried the following Debian docker image for i386 (32 bit).

docker run -ti -v `pwd`:/io --platform linux/i386 i386/debian:11.2 bash

Then I installed build dependencies:

apt update && apt install clang python3-pip python3-dev build-essential python3-scipy

and then changed update-alternatives --config cc and update-alternatives --config c++ to point to clang version 11.0.1-2 instead of gcc (not sure if it's needed) and also set the following env variables:

export CC="clang"
export CXX="clang++"

and then installed cython with pip3 and built scikit-learn in editable mode with --no-build-isolation.

The beginning of the build works fine, but then it reaches this error:

(...)
clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/build/python3.9-hfHQKB/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/build/python3.9-hfHQKB/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -I/usr/include/python3.9 -c sklearn/preprocessing/_csr_polynomial_expansion.c -o build/temp.linux-i686-3.9/sklearn/preprocessing/_csr_polynomial_expansion.o -g0 -O2 -fopenmp
sklearn/preprocessing/_csr_polynomial_expansion.c:767:25: error: expected parameter declarator
typedef _BitInt(128) LARGEST_INT_t;
sklearn/preprocessing/_csr_polynomial_expansion.c:767:25: error: expected ')'
sklearn/preprocessing/_csr_polynomial_expansion.c:767:24: note: to match this '('
typedef _BitInt(128) LARGEST_INT_t;
sklearn/preprocessing/_csr_polynomial_expansion.c:767:30: error: expected function body after function declarator
typedef _BitInt(128) LARGEST_INT_t;
sklearn/preprocessing/_csr_polynomial_expansion.c:2061:22: error: unknown type name 'LARGEST_INT_t'
static CYTHON_INLINE LARGEST_INT_t __Pyx_pow_LARGEST_INT_t(LARGEST_INT_t, LARGEST_INT_t);
sklearn/preprocessing/_csr_polynomial_expansion.c:2061:75: error: redefinition of parameter 'LARGEST_INT_t'
static CYTHON_INLINE LARGEST_INT_t __Pyx_pow_LARGEST_INT_t(LARGEST_INT_t, LARGEST_INT_t);
sklearn/preprocessing/_csr_polynomial_expansion.c:2061:60: error: a parameter list without types is only allowed in a function definition
static CYTHON_INLINE LARGEST_INT_t __Pyx_pow_LARGEST_INT_t(LARGEST_INT_t, LARGEST_INT_t);
sklearn/preprocessing/_csr_polynomial_expansion.c:2079:22: error: unknown type name 'LARGEST_INT_t'
static CYTHON_INLINE LARGEST_INT_t __Pyx_PyInt_As_LARGEST_INT_t(PyObject *);
sklearn/preprocessing/_csr_polynomial_expansion.c:2091:63: error: unknown type name 'LARGEST_INT_t'
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_LARGEST_INT_t(LARGEST_INT_t value);
sklearn/preprocessing/_csr_polynomial_expansion.c:2149:154: error: redefinition of parameter 'LARGEST_INT_t'
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg2_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2149:169: error: redefinition of parameter 'LARGEST_INT_t'
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg2_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2149:184: error: unexpected type name '__pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t': expected identifier
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg2_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2149:139: error: a parameter list without types is only allowed in a function definition
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg2_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2150:154: error: redefinition of parameter 'LARGEST_INT_t'
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg3_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2150:169: error: redefinition of parameter 'LARGEST_INT_t'
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg3_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2150:184: error: redefinition of parameter 'LARGEST_INT_t'
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg3_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2150:199: error: unexpected type name '__pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t': expected identifier
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg3_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2150:139: error: a parameter list without types is only allowed in a function definition
static CYTHON_INLINE __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__deg3_column(LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2151:146: error: unexpected type name '__pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t': expected identifier
static __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__calc_expanded_nnz(LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t, LARGEST_INT_t, int __pyx_skip_dispatch); /*proto*/
sklearn/preprocessing/_csr_polynomial_expansion.c:2151:215: error: redefinition of parameter 'LARGEST_INT_t'
static __pyx_t_7sklearn_5utils_9_typedefs_int64_t __pyx_f_7sklearn_13preprocessing_25_csr_polynomial_expansion__calc_expanded_nnz(LARGEST_INT_t, __pyx_t_7sklearn_13preprocessing_25_csr_polynomial_expansion_FLAG_t, LARGEST_INT_t, int __pyx_skip_dispatch); /*proto*/

The first error seems to be triggering the other errors. But this code works with clang on macos. Maybe the version of clang (11) is too old?

I can try with a more recent version of debian or ubuntu. I used the same docker container as the one we use on the CI.

Actually maybe the problem above was caused by previously generated .c files. I need to try again from a clean build folder to make sure.

I tried again on 32 bit linux on armv7 because I could find more recent python / clang versions for that platform and now the build works. But the resulting _csr_polynomial_expansion compiled extension is still broken:

root@4af20c1dad16:/io# pytest -lxk test_sizeof_LARGEST_INT_t sklearn/preprocessing/tests/test_polynomial.py
ImportError while loading conftest '/io/sklearn/conftest.py'.
sklearn/conftest.py:18: in <module>
    from sklearn.datasets import fetch_20newsgroups
sklearn/datasets/__init__.py:8: in <module>
    from ._base import load_breast_cancer
sklearn/datasets/_base.py:20: in <module>
    from ..preprocessing import scale
sklearn/preprocessing/__init__.py:38: in <module>
    from ._polynomial import PolynomialFeatures
sklearn/preprocessing/_polynomial.py:22: in <module>
    from ._csr_polynomial_expansion import (
E   ImportError: /io/sklearn/preprocessing/_csr_polynomial_expansion.cpython-310-arm-linux-gnueabihf.so: undefined symbol: __divti3

Indeed I think we need to first check for 32bit systems before eagerly trying __int128 or _BitInt(128) since neither are representable on 32bit systems. Probably the most portable way would be to check sizeof(void*) == 8 right? I'll implement that soon.

Indeed:

If you compile for a 32-bit architecture like ARM, or x86 with -m32, no 128-bit integer type is supported with even the newest version of any of these compilers. So you need to detect support before using, if it's possible for your code to work at all without it.

from: https://stackoverflow.com/a/54815033/163740

I think we should also link to that SO answer which is quite a good reference for this topic.

Please also add a TODO note such that once all platforms have C23 compilers, we should instead switch to the new way to do this:

So C23 will let you typedef unsigned _BitInt(128) u128, modeled on clang's feature originally called _ExtInt() which works even on 32-bit machines; see a brief intro to it. Current GCC -std=gnu2x doesn't even support that syntax yet.

This will require a few years, but then it should work both for 32-bit and 64-bit target architectures.
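For reference, the pointer-width idea behind `sizeof(void*) == 8` can be sketched from the Python side as well. This is only an illustration under the assumption that the interpreter's pointer size matches the target the extension was compiled for; the helper name `is_64bit_build` is hypothetical, and the actual guard for `LARGEST_INT_t` has to live in the C/Cython layer.

```python
import struct
import sys


def is_64bit_build():
    """Return True when a C `void *` is 8 bytes wide for this interpreter.

    Hypothetical helper: struct.calcsize("P") reports the size of a native
    pointer, which is the Python-side equivalent of sizeof(void*).
    """
    return struct.calcsize("P") == 8


# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds,
# so the two checks are expected to agree on CPython.
print(is_64bit_build(), sys.maxsize > 2**32)
```

The `sys.maxsize <= 2**32` condition used in the test is the same check expressed through the width of `Py_ssize_t` instead of the pointer size.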

Note that the clang + 32-bit target is not just a theoretical possibility: emscripten / WASM / pyodide is a concrete platform that is built on that combo. I think @lesteve has the required tooling to test this PR on that target.

BTW @lesteve if you do so, please copy and paste the command lines you use to do this because I am curious but too lazy to RTFM ;)

Quick and dirty testing of this PR on Pyodide seems to indicate that the test_sizeof_LARGEST_INT_t fails but the other added tests pass:

>>> pytest.main(['--pyargs', 'sklearn.preprocessing', '-k', 'test_csr_polynomial_expansion_index_overflow_non_regression or test_csr_polynomial_expansion_too_large_to_index or LARGEST_INT', '-v'])
============================= test session starts ==============================
platform emscripten -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0 -- /home/pyodide/this.program
cachedir: .pytest_cache
rootdir: /home/pyodide
collecting ... collected 1098 items / 1089 deselected / 9 selected

tests/test_polynomial.py::test_csr_polynomial_expansion_index_overflow_non_regression[True-True] PASSED [ 11%]
tests/test_polynomial.py::test_csr_polynomial_expansion_index_overflow_non_regression[True-False] PASSED [ 22%]
tests/test_polynomial.py::test_csr_polynomial_expansion_index_overflow_non_regression[False-True] PASSED [ 33%]
tests/test_polynomial.py::test_csr_polynomial_expansion_index_overflow_non_regression[False-False] PASSED [ 44%]
tests/test_polynomial.py::test_csr_polynomial_expansion_too_large_to_index[True-True] PASSED [ 55%]
tests/test_polynomial.py::test_csr_polynomial_expansion_too_large_to_index[True-False] PASSED [ 66%]
tests/test_polynomial.py::test_csr_polynomial_expansion_too_large_to_index[False-True] PASSED [ 77%]
tests/test_polynomial.py::test_csr_polynomial_expansion_too_large_to_index[False-False] PASSED [ 88%]
tests/test_polynomial.py::test_sizeof_LARGEST_INT_t FAILED [100%]

=================================== FAILURES ===================================
__________________________ test_sizeof_LARGEST_INT_t ___________________________

    def test_sizeof_LARGEST_INT_t():
        # On Windows, scikit-learn is typically compiled with MSVC that
        # does not support int128 arithmetic (at the time of writing):
        # https://stackoverflow.com/a/6761962/163740
        if sys.maxsize <= 2**32 or sys.platform == "win32":
            expected_size = 8
        else:
            expected_size = 16

>       assert _get_sizeof_LARGEST_INT_t() == expected_size
E       assert 16 == 8
E        +  where 16 = _get_sizeof_LARGEST_INT_t()

/lib/python3.11/site-packages/sklearn/preprocessing/tests/test_polynomial.py:1094: AssertionError
=========================== short test summary info ============================
FAILED tests/test_polynomial.py::test_sizeof_LARGEST_INT_t - assert 16 == 8
================= 1 failed, 8 passed, 1089 deselected in 1.03s =================

Note that the clang + 32-bit target is not just a theoretical possibility: emscripten / WASM / pyodide is a concrete platform that is built on that combo.

By the way I am not sure __clang__ is defined in the Pyodide use case, I think __EMSCRIPTEN__ is used instead.

BTW @lesteve if you do so, please copy and paste the command lines you use to do this because I am curious but too lazy to RTFM ;)

Set up the correct emscripten version. It has to match the version Pyodide was built with, namely 3.1.32 for Pyodide stable (0.23.1 at the time of writing), see https://github.com/pyodide/pyodide/blob/0.23.1/Makefile.envs:

git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install 3.1.32
./emsdk activate 3.1.32
source ./emsdk_env.sh

Build a Pyodide wheel, this should work from the scikit-learn repo root:

pip install pyodide-build
pyodide build

Now you need to use this wheel. One possibility is something like this:

put this wheel in a github repo

use the stable Pyodide console

in the console use await micropip.install('https://cdn.jsdelivr.net/gh/<owner>/<repo>/path/to/wheel.whl') (the cdn.jsdelivr.net/gh is a CORS proxy if you are wondering)

Okay so I have:

Added a check for __EMSCRIPTEN__ as an alternative to __clang__

Updated the runtime test for largest integer type to account for the odd (but beneficial) exception that is emscripten
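A sketch of the platform logic the updated test has to encode, based on the Pyodide run reported above (16 bytes on emscripten despite the 32-bit target). The function name `expected_sizeof_largest_int` and its parameters are hypothetical and exist only so the rule can be exercised for several platforms; this is not the test as committed.

```python
import sys


def expected_sizeof_largest_int(maxsize=sys.maxsize, platform=sys.platform):
    """Return the expected sizeof(LARGEST_INT_t) in bytes (hypothetical helper)."""
    if platform == "emscripten":
        # The Pyodide run above reported 16 even though the target is 32-bit.
        return 16
    if maxsize <= 2**32 or platform == "win32":
        # 32-bit targets, and Windows builds with MSVC, fall back to 64-bit ints.
        return 8
    return 16


for name, maxsize in [
    ("linux", 2**63 - 1),
    ("linux", 2**31 - 1),
    ("win32", 2**63 - 1),
    ("emscripten", 2**31 - 1),
]:
    print(name, maxsize.bit_length() + 1, expected_sizeof_largest_int(maxsize, name))
```

The emscripten branch is checked first so that its 32-bit `sys.maxsize` does not route it to the 8-byte case.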

Also, when trying to build Pyodide/emscripten for myself, I get the following error. I'm not sure what's going on here 🤷 :

* Creating virtualenv isolated environment...
* Installing packages in isolated environment... (Cython>=0.29.24, setuptools<=61.0, wheel)
* Getting dependencies for wheel...
Partial import of sklearn during the build process.
running egg_info
writing scikit_learn.egg-info/PKG-INFO
writing dependency_links to scikit_learn.egg-info/dependency_links.txt
writing requirements to scikit_learn.egg-info/requires.txt
writing top-level names to scikit_learn.egg-info/top_level.txt
reading manifest file 'scikit_learn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*' found under directory 'asv_benchmarks'
warning: no previously-included files matching '*' found under directory 'benchmarks'
warning: no previously-included files matching '*' found under directory 'build_tools'
warning: no previously-included files matching '*' found under directory 'maint_tools'
warning: no previously-included files matching '*' found under directory 'benchmarks'
warning: no previously-included files matching '*' found under directory '.binder'
warning: no previously-included files matching '*' found under directory '.circleci'
warning: no previously-included files found matching '.codecov.yml'
warning: no previously-included files found matching '.git-blame-ignore-revs'
warning: no previously-included files found matching '.mailmap'
warning: no previously-included files found matching '.pre-commit-config.yaml'
warning: no previously-included files found matching 'azure-pipelines.yml'
warning: no previously-included files found matching 'CODE_OF_CONDUCT.md'
warning: no previously-included files found matching 'CONTRIBUTING.md'
warning: no previously-included files found matching 'SECURITY.md'
warning: no previously-included files found matching 'PULL_REQUEST_TEMPLATE.md'
adding license file 'COPYING'
writing manifest file 'scikit_learn.egg-info/SOURCES.txt'
* Installing packages in isolated environment... (wheel)
* Building wheel...
Partial import of sklearn during the build process.
Traceback (most recent call last):
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 221, in check_package_status
    module = importlib.import_module(package)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/numpy/__init__.py", line 141, in <module>
    from . import core
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/numpy/core/__init__.py", line 49, in <module>
    raise ImportError(msg)
ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.11 from "/tmp/build-env-mnfj07nc/bin/python"
  * The NumPy version is: "1.24.2"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/pep517/in_process/_in_process.py", line 351, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/pep517/in_process/_in_process.py", line 333, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pep517/in_process/_in_process.py", line 249, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/setuptools/build_meta.py", line 244, in build_wheel
    return self._build_with_temp_dir(['bdist_wheel'], '.whl',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/setuptools/build_meta.py", line 229, in _build_with_temp_dir
    self.run_setup()
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/setuptools/build_meta.py", line 282, in run_setup
    self).run_setup(setup_script=setup_script)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-mnfj07nc/lib/python3.11/site-packages/setuptools/build_meta.py", line 174, in run_setup
    exec(compile(code, __file__, 'exec'), locals())
  File "setup.py", line 655, in <module>
    setup_package()
  File "setup.py", line 645, in setup_package
    check_package_status("numpy", min_deps.NUMPY_MIN_VERSION)
  File "setup.py", line 248, in check_package_status
    raise ImportError(
ImportError: numpy is not installed.
scikit-learn requires numpy >= 1.17.3.
Installation instructions are available on the scikit-learn website: http://scikit-learn.org/stable/install.html

ERROR Backend subproccess exited when trying to invoke build_wheel

sklearn/preprocessing/_csr_polynomial_expansion.pyx

sklearn/preprocessing/_polynomial.py

jjerphan · 2023-05-01T16:35:47Z

@Micky774: is this PR reviewable now? If it is not the case, could you ping me when it is? :)

Micky774 · 2023-05-01T16:44:41Z

@Micky774: is this PR reviewable now? If it is not the case, could you ping me when it is? :)

It should be good for review! The last large point of interest was this thread: #23731 (comment)

which hopefully is resolved through: #23731 (comment)

Let me know if you have any questions or concerns

ogrisel

Alright, LGTM. If the wasm env triggers a new failure we can fix it up in a follow-up PR.

At some point we will introduce a pyodide CI entry but later :)

ogrisel · 2023-05-04T19:31:07Z

Thanks @Micky774!

jjerphan · 2023-05-04T19:34:38Z

Thank you all (and especially @Micky774) for your efforts in this rocambolesque journey!

Micky774 · 2023-05-04T19:36:24Z

It was definitely quite a tale of bugs (old and new) and niche features/interactions 😅

I really appreciate the patience and support of the reviewers as well ❤️

Edit: I should have known better than to say at the beginning of this PR "oh it shouldn't be too bad"

ogrisel · 2023-05-05T09:51:56Z

I tried to build scikit-learn in a local emscripten SDK on macOS and I get a build failure (seemingly unrelated to this PR, it fails when building sklearn/utils/_typedefs.cpython-311-wasm32-emscripten.so).

I suspect we might still want to test the code of this PR on pyodide because, in retrospect, the emscripten condition seems a bit counterintuitive to me.

(scikit-learn#26027) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * API Replace `n_iter` in `Bayesian Ridge` and `ARDRegression` (scikit-learn#25697) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * CLN Make _NumPyAPIWrapper naming consistent to _ArrayAPIWrapper (scikit-learn#26039) * CI disable coverage on Windows to keep CI times reasonable (scikit-learn#26052) * DOC Use Scientific Python Plausible instance for analytics (scikit-learn#25547) * MAINT Parameters validation for sklearn.preprocessing.scale (scikit-learn#26036) * MAINT Parameters validation for sklearn.metrics.pairwise.haversine_distances (scikit-learn#26047) * MAINT Parameters validation for sklearn.metrics.pairwise.laplacian_kernel (scikit-learn#26048) * MAINT Parameters validation for sklearn.metrics.pairwise.linear_kernel (scikit-learn#26049) * MAINT Parameters validation for sklearn.metrics.silhouette_samples (scikit-learn#26053) * MAINT Parameters validation for sklearn.preprocessing.add_dummy_feature (scikit-learn#26058) * Added Parameter Validation for metrics.cluster.normalized_mutual_info_score() (scikit-learn#26060) * DOC Typos in HistGradientBoosting documentation (scikit-learn#26057) * TST add global_random_seed fixture to sklearn/datasets/tests/test_rcv1.py (scikit-learn#26043) * MAINT Parameters validation for sklearn.metrics.pairwise.cosine_similarity (scikit-learn#26006) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ENH Adds isdtype to Array API wrapper (scikit-learn#26029) * MAINT Parameters validation for sklearn.metrics.silhouette_score (scikit-learn#26054) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * FIX fix spelling mistake in _NumPyAPIWrapper (scikit-learn#26064) * CI ignore more non-library Python files in codecov (scikit-learn#26059) * MAINT Parameters validation for sklearn.metrics.pairwise.cosine_distances 
(scikit-learn#26046) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Introduce BinaryClassifierCurveDisplayMixin (scikit-learn#25969) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ENH Forces shape to be tuple when using Array API's reshape (scikit-learn#26030) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Tim Head <betatim@gmail.com> * MAINT Parameters validation for sklearn.metrics.pairwise.paired_euclidean_distances (scikit-learn#26073) * MAINT Parameters validation for sklearn.metrics.pairwise.paired_manhattan_distances (scikit-learn#26074) * MAINT Parameters validation for sklearn.metrics.pairwise.paired_cosine_distances (scikit-learn#26075) * MAINT Parameters validation for sklearn.preprocessing.binarize (scikit-learn#26076) * MAINT Parameters validation for metrics.explained_variance_score (scikit-learn#26079) * DOC use correct template name for displays (scikit-learn#26081) * MAINT Parameters validation for sklearn.preprocessing.maxabs_scale (scikit-learn#26077) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.preprocessing.label_binarize (scikit-learn#26078) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT parameter validation for d2_absolute_error_score (scikit-learn#26066) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameter validation for roc_auc_score (scikit-learn#26007) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.preprocessing.normalize (scikit-learn#26069) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameter validation for metrics.cluster.fowlkes_mallows_score (scikit-learn#26080) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for compose.make_column_transformer (scikit-learn#25897) Co-authored-by: 
jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.metrics.pairwise.polynomial_kernel (scikit-learn#26070) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.pairwise.rbf_kernel (scikit-learn#26071) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.pairwise.sigmoid_kernel (scikit-learn#26072) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Param validation: constraint for numeric missing values (scikit-learn#26085) * FIX Adds support for negative values in categorical features in gradient boosting (scikit-learn#25629) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Tim Head <betatim@gmail.com> * MAINT Fix C warning in Cython module splitting.pyx (scikit-learn#26051) * MNT Updates _isotonic.pyx to use memoryviews instead of `cnp.ndarray` (scikit-learn#26068) * FIX Fixes memory regression for inspecting extension arrays (scikit-learn#26106) * PERF set openmp to use only physical cores by default (scikit-learn#26082) * MNT Update black to 23.3.0 (scikit-learn#26110) * MNT Adds black commit to git-blame-ignore-revs (scikit-learn#26111) * MAINT Parameters validation for sklearn.metrics.pair_confusion_matrix (scikit-learn#26107) * MAINT Parameters validation for sklearn.metrics.mean_poisson_deviance (scikit-learn#26104) * DOC Use notebook style in plot_lof_outlier_detection.py (scikit-learn#26017) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * MAINT utils._fast_dict uses types from utils._typedefs (scikit-learn#26025) * DOC remove sparse-matrix for `y` in ElasticNet (scikit-learn#26127) * ENH add exponential loss (scikit-learn#25965) * MAINT Parameters validation for sklearn.preprocessing.robust_scale 
(scikit-learn#26086) * MAINT Parameters validation for sklearn.datasets.fetch_rcv1 (scikit-learn#26126) * MAINT Parameters validation for sklearn.metrics.adjusted_rand_score (scikit-learn#26134) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.calinski_harabasz_score (scikit-learn#26135) * MAINT Parameters validation for sklearn.metrics.davies_bouldin_score (scikit-learn#26136) * MAINT: remove `from numpy.math cimport` statements (scikit-learn#26143) * MAINT Parameters validation for sklearn.inspection.permutation_importance (scikit-learn#26145) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.cluster.homogeneity_completeness_v_measure (scikit-learn#26137) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.rand_score (scikit-learn#26138) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC update comment in metrics/tests/test_classification.py (scikit-learn#26150) * CI small cleanup of Cirrus CI test script (scikit-learn#26168) * MAINT remove deprecated is_categorical_dtype (scikit-learn#26156) * DOC Add skforecast to related projects page (scikit-learn#26133) Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * FIX Keeps namedtuple's class when transform returns a tuple (scikit-learn#26121) * DOC corrected letter case for better readability in sklearn/metrics/_classification.py / (scikit-learn#26169) * MAINT Parameters validation for sklearn.preprocessing.power_transform (scikit-learn#26142) * FIX `roc_auc_score` now uses `y_prob` instead of `y_pred` (scikit-learn#26155) * MAINT Parameters validation for sklearn.datasets.load_iris (scikit-learn#26177) * MAINT Parameters validation for sklearn.datasets.load_diabetes (scikit-learn#26166) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_breast_cancer (scikit-learn#26165) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.cluster.entropy (scikit-learn#26162) * MAINT Parameters validation for sklearn.datasets.fetch_species_distributions (scikit-learn#26161) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ASV Fix tol in SGDRegressorBenchmark (scikit-learn#26146) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MNT use api.openml.org URLs for fetch_openml (scikit-learn#26171) * MAINT Parameters validation for sklearn.utils.resample (scikit-learn#26139) * MAINT make it explicit that additive_chi2_kernel does not accept sparse matrix (scikit-learn#26178) * MNT fix circleci link in README.rst (scikit-learn#26183) * CI Fix circleci artifact redirector action (scikit-learn#26181) * GOV introduce rights for groups as discussed in SLEP019 (scikit-learn#25753) Co-authored-by: Julien <git@jjerphan.xyz> Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * MAINT Parameters validation for sklearn.neighbors.sort_graph_by_row_values (scikit-learn#26173) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * FIX improve convergence criterion for LogisticRegression(penalty="l1", solver='liblinear') (scikit-learn#25214) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * MAINT Fix several typos in src and doc files (scikit-learn#26187) * PERF fix overhead of _rescale_data in LinearRegression (scikit-learn#26207) * ENH add Huber loss (scikit-learn#25966) * MAINT Refactor GraphicalLasso and graphical_lasso (scikit-learn#26033) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Cython linting (scikit-learn#25861) * DOC Add JupyterLite button in example gallery (scikit-learn#25887) * MAINT Parameters validation for sklearn.covariance.ledoit_wolf_shrinkage (scikit-learn#26200) * MAINT Parameters validation for sklearn.datasets.load_linnerud (scikit-learn#26199) * MAINT Parameters validation for sklearn.datasets.load_wine (scikit-learn#26196) * DOC Added redirect to Provost paper + minor refactor (scikit-learn#26223) * MAINT Parameter Validation for `covariance.graphical_lasso` (scikit-learn#25053) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_digits (scikit-learn#26195) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.preprocessing.quantile_transform (scikit-learn#26144) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.model_selection.cross_validate (scikit-learn#26129) Co-authored-by: 
jeremiedbb <jeremiedbb@yahoo.fr> * DOC Adds TargetEncoder example explaining the internal CV (scikit-learn#26185) Co-authored-by: Tim Head <betatim@gmail.com> * spelling mistake corrected in documentation for script `plot_document_clustering.py` (scikit-learn#26228) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * FIX possible UnboundLocalError in fetch_openml (scikit-learn#26236) * ENH Adds PyTorch support to LinearDiscriminantAnalysis (scikit-learn#25956) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Tim Head <betatim@gmail.com> * MNT Use fixed version of Pyodide (scikit-learn#26247) * MNT Reset transform_output default in example to fix doc build build (scikit-learn#26269) * DOC Update example plot_nearest_centroid.py (scikit-learn#26263) * MNT reduce JupyterLite build size (scikit-learn#26246) * DOC term -> meth in GradientBoosting (scikit-learn#26225) * MNT speed-up html-noplot build (scikit-learn#26245) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * MNT Use copy=False when creating DataFrames (scikit-learn#26272) * MAINT Parameters validation for sklearn.model_selection.permutation_test_score (scikit-learn#26230) * MAINT Parameters validation for sklearn.datasets.clear_data_home (scikit-learn#26259) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_files (scikit-learn#26203) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.get_data_home (scikit-learn#26260) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Fix y-axis plot labels in permutation test score example (scikit-learn#26240) * MAINT cython-lint ignores asv_benchmarks (scikit-learn#26282) * MAINT Parameter validation for metrics.cluster._supervised (scikit-learn#26258) Co-authored-by: Jérémie du Boisberranger 
<34657725+jeremiedbb@users.noreply.github.com> * DOC Improve docstring for tol in SequentialFeatureSelector (scikit-learn#26271) * MAINT Parameters validation for sklearn.datasets.load_sample_image (scikit-learn#26226) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Consistent param type for pos_label (scikit-learn#26237) * DOC Minor grammar fix to imputation docs (scikit-learn#26283) * MAINT Parameters validation for sklearn.calibration.calibration_curve (scikit-learn#26198) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.inspection.partial_dependence (scikit-learn#26209) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.model_selection.validation_curve (scikit-learn#26229) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.model_selection.learning_curve (scikit-learn#26227) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MNT Remove deprecated pandas.api.types.is_sparse (scikit-learn#26287) * CI Use Trusted Publishers for uploading wheels to PyPI (scikit-learn#26249) * MAINT Parameters validation for sklearn.metrics.pairwise.manhattan_distances (scikit-learn#26122) * PERF revert openmp use in csr_row_norms (scikit-learn#26275) * MAINT Parameters validation for metrics.check_scoring (scikit-learn#26041) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MNT Improve error message when checking classification target is of a non-regression type (scikit-learn#26281) Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * DOC fix link to User Guide encoder_infrequent_categories (scikit-learn#26309) * MNT remove unused args in _predict_regression_tree_inplace_fast_dense (scikit-learn#26314) * ENH Adds missing value support for trees (scikit-learn#23595) Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> * CLN Clean up logic in validate_data and cast_to_ndarray (scikit-learn#26300) * MAINT refactor scorer using _get_response_values (scikit-learn#26037) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> * DOC Add HGBDT to "see also" section of random forests (scikit-learn#26319) Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Tim Head <betatim@gmail.com> * MNT Bump Github Action labeler version to use newer Node (scikit-learn#26302) * FIX thresholds should not exceed 1.0 with probabilities in `roc_curve` (scikit-learn#26194) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * ENH Allow for appropriate dtype us in `preprocessing.PolynomialFeatures` for sparse matrices (scikit-learn#23731) Co-authored-by: Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * DOC Fix minor typo (scikit-learn#26327) * MAINT bump minimum version for pytest (scikit-learn#26184) Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * DOC fix return type in isotonic_regression (scikit-learn#26332) * FIX fix available_if for MultiOutputRegressor.partial_fit (scikit-learn#26333) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * FIX make pipeline pass check_estimator (scikit-learn#26325) * FEA Add multiclass support to `average_precision_score` (scikit-learn#24769) Co-authored-by: Geoffrey <geoffrey.bolmier@gmail.com> Co-authored-by: gbolmier <geoffrey.bolmier@volvocars.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> --------- Signed-off-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: zeeshan lone <56621467+still-learning-ev@users.noreply.github.com> Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Shiva chauhan <103742975+Shivachauhan17@users.noreply.github.com> Co-authored-by: AymericBasset <45051041+AymericBasset@users.noreply.github.com> Co-authored-by: Maren Westermann <maren.westermann@gmail.com> Co-authored-by: Nishu Choudhary <51842539+choudharynishu@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Benedek Harsanyi <80836204+hbenedek@users.noreply.github.com> Co-authored-by: Pooja Subramaniam <poojas2086@gmail.com> Co-authored-by: Rushil Desai <rushildesai01@gmail.com> 
Co-authored-by: Xiao Yuan <yuanx749@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: 2357juan <29247195+2357juan@users.noreply.github.com> Co-authored-by: Théophile Baranger <39696928+tbaranger@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Andreas Mueller <t3kcit@gmail.com> Co-authored-by: Jovan Stojanovic <62058944+jovan-stojanovic@users.noreply.github.com> Co-authored-by: Rahil Parikh <75483881+rprkh@users.noreply.github.com> Co-authored-by: Bharat Raghunathan <bharatraghunthan9767@gmail.com> Co-authored-by: Sortofamudkip <wishyutp0328@gmail.com> Co-authored-by: Gleb Levitski <36483986+glevv@users.noreply.github.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Ashwin Mathur <97467100+awinml@users.noreply.github.com> Co-authored-by: Sahil Gupta <sahil@Sahils-MBP.lan> Co-authored-by: Veghit <itay.vegh@gmail.com> Co-authored-by: Itay <itayvegh@gmail.com> Co-authored-by: precondition <57645186+precondition@users.noreply.github.com> Co-authored-by: Marc Torrellas Socastro <marc.torsoc@gmail.com> Co-authored-by: Dominic Fox <dominicjfox2@gmail.com> Co-authored-by: futurewarning <36329275+futurewarning@users.noreply.github.com> Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com> Co-authored-by: Joey Ortiz <orangesherbet0@gmail.com> Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Christian Veenhuis <veenhuis@gmail.com> Co-authored-by: adienes <51664769+adienes@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com> Co-authored-by: Lene Preuss <lene.preuss@gmail.com> Co-authored-by: A.H.Mansouri <83764851+A-H-Mansoury@users.noreply.github.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: Carla J <ca.jancik@gmail.com> Co-authored-by: windiana42 <61181806+windiana42@users.noreply.github.com> Co-authored-by: mdarii <dariimaxim@gmail.com> Co-authored-by: murezzda 
<47388020+murezzda@users.noreply.github.com> Co-authored-by: Peter Piontek <piontek0@gmail.com> Co-authored-by: John Pangas <swiftyxswaggy@outlook.com> Co-authored-by: Dmitry Nesterov <76070534+dmitrylala@users.noreply.github.com> Co-authored-by: Yuchen Zhou <72342196+ROMEEZHOU@users.noreply.github.com> Co-authored-by: Ekaterina Butyugina <102963496+ekaterinabutyugina@users.noreply.github.com> Co-authored-by: Jiawei Zhang <jiawei.zhang@nyu.edu> Co-authored-by: Ansam Zedan <86729068+ansamz@users.noreply.github.com> Co-authored-by: genvalen <genvalen@protonmail.com> Co-authored-by: farhan khan <86480450+BabaYaga1221@users.noreply.github.com> Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com> Co-authored-by: Jiawei Zhang <jz4721@nyu.edu> Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com> Co-authored-by: Jessicakk0711 <106110789+Jessicakk0711@users.noreply.github.com> Co-authored-by: Ankur Singh <singankur28@gmail.com> Co-authored-by: Seoeun(Sun☀️) Hong <75988952+seoeunHong@users.noreply.github.com> Co-authored-by: Nightwalkx <74856680+xi-jiajun@users.noreply.github.com> Co-authored-by: VIGNESH D <35656793+dvignesh1995@users.noreply.github.com> Co-authored-by: Vincent-violet <130581473+Vincent-violet@users.noreply.github.com> Co-authored-by: Elabonga Atuo <elabongaatuo@gmail.com> Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org> Co-authored-by: André Pedersen <andrped94@gmail.com> Co-authored-by: Ashish Dutt <ashish.dutt8@gmail.com> Co-authored-by: Phil <philsupertramp@users.noreply.github.com> Co-authored-by: Stanislav (Stanley) Modrak <44023416+smith558@users.noreply.github.com> Co-authored-by: hujiahong726 <52920842+hujiahong726@users.noreply.github.com> Co-authored-by: James Dean <24254612+AcylSilane@users.noreply.github.com> Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com> Co-authored-by: c-git <43485962+c-git@users.noreply.github.com> 
Co-authored-by: annegnx <64203599+annegnx@users.noreply.github.com> Co-authored-by: Geoffrey <geoffrey.bolmier@gmail.com> Co-authored-by: gbolmier <geoffrey.bolmier@volvocars.com>

niuk-a and others added 6 commits July 13, 2021 15:54

[WIP] FIX index overflow error in sparse matrix polynomial expansion …

7eef7ad

…(bis) Fixes scikit-learn#19676

Merge branch 'main' into csr_polynomial

4adbf38

Reconciled with main

baa98a2

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

2b9187d

…nto csr_polynomial

Merge branch 'main' into csr_polynomial

55424a0

Removed extra total_nnz assignment

69438dc

github-actions bot added the cython label Jun 22, 2022

Micky774 changed the title ~~ENH Allow for appropriate dtype us in preprocessing.PolynomialFeatures::_csr_polynomial_expansion~~ ENH Allow for appropriate dtype us in preprocessing.PolynomialFeatures for sparse matrices Jun 22, 2022

github-actions bot added the module:preprocessing label Jun 22, 2022

Micky774 added 6 commits June 22, 2022 14:09

Added fused type

9ecbf8a

Added clarifying comment

345e043

Merge branch 'main' into csr_polynomial

1d23b1d

Added changelog entry

cc6a548

Merge branch 'main' into csr_polynomial

8b189bb

Fixed PR tag in changelog entry

15b00fd

ogrisel reviewed Jun 27, 2022

Micky774 and others added 2 commits June 29, 2022 17:26

Apply suggestions from code review

ee8a3ba

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Merge branch 'main' into csr_polynomial

cd346f1

Micky774 commented Jun 29, 2022

sklearn/preprocessing/_polynomial.py (outdated, resolved)

Micky774 and others added 7 commits June 29, 2022 19:20

Streamlined logic and improved tests

0a17dee

Added test depending on scipy version

b118a3c

Clarified breaking and renamed types

fa1ecf2

Merge branch 'main' into csr_polynomial

d735c8f

Merge branch 'main' into csr_polynomial

f3bb5cd

Merge branch 'main' into csr_polynomial

0c9a563

Merge branch 'main' into csr_polynomial

a006bf0

ogrisel reviewed Jul 4, 2022

Apply suggestions from code review

96259d7

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

jjerphan reviewed Jul 7, 2022

Merge branch 'main' into csr_polynomial

2e44f39

Micky774 added 3 commits April 11, 2023 12:12

Adopted review feedback

99eabab

Improved constant documentation

9a90a59

Improved variable names

9d9d21b

thomasjpfan approved these changes Apr 12, 2023

sklearn/preprocessing/_polynomial.py (outdated, resolved)

Micky774 and others added 3 commits April 12, 2023 14:25

Update sklearn/preprocessing/_polynomial.py

7ac1a41

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Merge branch 'main' into csr_polynomial

1ebfbbe

Incorporated typedef changes

7756f62

ogrisel reviewed Apr 25, 2023

Micky774 added 4 commits April 25, 2023 11:58

Merge branch 'main' into csr_polynomial

6112cf8

Added check for 32bit-ness for clang

cf3c00c

Improved documentation

810fb3b

Merge branch 'main' into csr_polynomial

ff36da1

jjerphan self-requested a review April 27, 2023 19:24

Micky774 added 2 commits May 1, 2023 10:27

Merge branch 'main' into csr_polynomial

3033e4b

Updated for emscripten edge-case

ce6b72a

lorentzenchr reviewed May 1, 2023

sklearn/preprocessing/_polynomial.py (outdated, resolved)

Removed extraneous assertion

bcdee5d

ogrisel approved these changes May 4, 2023

ogrisel merged commit c6628a0 into scikit-learn:main May 4, 2023

Micky774 deleted the csr_polynomial branch May 4, 2023 19:31

jjerphan removed their request for review May 4, 2023 19:34

jjerphan mentioned this pull request May 23, 2023

Remove indices downcasting for sparse arrays scipy/scipy#18509

Merged

	# scipy matrices indices dtype (namely for indptr and indices arrays)
	#
	# Note that indices might need to be represented as cnp.int64_t.
	# Currently, we use Cython classes which do not handle fused types
	# so we hardcode this type to cnp.int32_t, supporting all but edge
	# cases.
	#
	# TODO: support cnp.int64_t for this case
	# See: https://github.com/scikit-learn/scikit-learn/issues/23653
	ctypedef cnp.int32_t SPARSE_INDEX_TYPE_t
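
The comment above notes that `indptr` and `indices` may need 64-bit integers in edge cases. Here is a minimal sketch of why sparse polynomial expansion can hit that edge case, assuming a full expansion (not `interaction_only`) and using only NumPy and the standard library. The helper names `n_expanded_features` and `minimal_index_dtype` are hypothetical illustrations; they are not part of scikit-learn or of this PR.

```python
from math import comb

import numpy as np


def n_expanded_features(n_features, degree, include_bias=True):
    """Count monomials of total degree <= `degree` in `n_features` variables.

    Hypothetical helper: this is the standard combinatorial count
    C(n_features + degree, degree), minus one if the constant term is dropped.
    """
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


def minimal_index_dtype(n_columns, total_nnz):
    """Pick the smallest index dtype that can address the output matrix.

    Hypothetical helper: `indices` stores column indices up to
    `n_columns - 1`, and `indptr` stores offsets up to `total_nnz`.
    """
    int32_max = np.iinfo(np.int32).max
    needs_int64 = max(n_columns - 1, total_nnz) > int32_max
    return np.int64 if needs_int64 else np.int32


# Small input: int32 is sufficient.
small = n_expanded_features(n_features=3, degree=2)
small_dtype = minimal_index_dtype(small, total_nnz=1_000)

# Wide input: the column count alone no longer fits in int32.
wide = n_expanded_features(n_features=100_000, degree=2)
wide_dtype = minimal_index_dtype(wide, total_nnz=1_000)
```

With 100,000 input features and degree 2, the expansion has 5,000,150,001 columns, which exceeds the int32 maximum of 2,147,483,647 even when the matrix stores very few non-zero entries.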

ogrisel left a comment

jjerphan left a comment

thomasjpfan left a comment

Micky774 commented Apr 24, 2023

ogrisel left a comment

ogrisel Apr 25, 2023 • edited

ogrisel Apr 25, 2023

ogrisel Apr 25, 2023

Micky774 Apr 25, 2023 • edited

ogrisel Apr 25, 2023 • edited

ogrisel Apr 25, 2023 • edited

lesteve Apr 26, 2023

lesteve Apr 26, 2023 • edited by ogrisel

Micky774 May 1, 2023

Micky774 commented May 4, 2023 • edited

ogrisel commented May 5, 2023 • edited