MAINT Improve the _middle_term_sparse_sparse_{32, 64} routines #25449

Merged

Vincent-Maladiere
Contributor

Reference Issues/PRs

Towards #22587
Follow up #24556

What does this implement/fix? Explain your changes.

In #24556, we introduced a routine for computing the dot product of sparse matrices efficiently for the Euclidean specialization of ArgKmin and RadiusNeighbors with CSR-CSR matrices.

  • This PR removes two TODOs aiming at improving the routine performance after trying these optimizations without success.
  • It also introduces shorter variable names to improve readability without losing too much context.

More details about these two optimization attempts are below.


TODO1:

# If possible optimize this routine to efficiently treat cases where
# `n_samples_X << n_samples_Y` met in practise when X_test consists of a
# few samples, and thus when there's a single chunk of X whose number of
# samples is less than the default chunk size.

This first optimization suggests focusing on the iteration order when there is a large imbalance between the number of rows in the X chunk and in the Y chunk (the default chunk size is 256).
As we already loop on n_X first, we found no further way to gain performance in this scenario.


TODO2:

# Compare this routine with the similar ones in SciPy, especially
# `csr_matmat` which might implement a better algorithm.
# See: https://github.com/scipy/scipy/blob/e58292e066ba2cb2f3d1e0563ca9314ff1f4f311/scipy/sparse/sparsetools/csr.h#L603-L669  # noqa

SciPy's csr_matmat implements a slightly different routine for the same operation. Since it uses only 3 for-loops where ours uses 4, we may gain some speed by applying similar logic.

Before we try reproducing this logic, note that our setup differs from SciPy's on several levels:

  • SciPy's csr_matmat uses a CSR matrix and a CSC matrix, instead of two CSR matrices. Since this is not documented, the only way to spot it is by manually running the routine on two small matrices.
  • We have to deal with chunks via X_start, X_end and Y_start, Y_end, while SciPy's routine consumes the entire input matrices. Handling the chunk bounds creates overhead that kills the performance of our candidate routine.

Our candidate routine, which passes all tests, is:

cdef void _middle_term_sparse_sparse_64(
    const DTYPE_t[:] X_data,
    const SPARSE_INDEX_TYPE_t[:] X_indices,
    const SPARSE_INDEX_TYPE_t[:] X_indptr,
    ITYPE_t X_start,
    ITYPE_t X_end,
    const DTYPE_t[:] Y_data,
    const SPARSE_INDEX_TYPE_t[:] Y_indices,
    const SPARSE_INDEX_TYPE_t[:] Y_indptr,
    ITYPE_t Y_start,
    ITYPE_t Y_end,
    DTYPE_t * D,
) nogil:
    # This routine assumes that D points to the first element of a
    # zeroed buffer of length at least equal to n_X × n_Y, conceptually
    # representing a 2-d C-ordered array.
    cdef:
        ITYPE_t i, j, k
        ITYPE_t n_X = X_end - X_start
        ITYPE_t n_Y = Y_end - Y_start
        ITYPE_t x_col, x_ptr, y_col, y_ptr
    for i in range(n_X):
        for x_ptr in range(X_indptr[X_start+i], X_indptr[X_start+i+1]):
            x_col = X_indices[x_ptr]
            for y_ptr in range(Y_indptr[x_col], Y_indptr[x_col+1]):
                y_col = Y_indices[y_ptr]
                if Y_start <= y_col < Y_end:
                    k = i * n_Y + y_col - Y_start
                    D[k] += -2 * X_data[x_ptr] * Y_data[y_ptr]
  • The main difference with our prior routine is that we got rid of the 3rd for-loop on n_Y by plugging x_col into Y_indptr directly.
  • We need to convert Y from CSR to CSC, and we achieve this in a single place, during SparseSparseMiddleTermComputer.__init__:
self.Y_data, self.Y_indices, self.Y_indptr = self.unpack_csr_matrix(Y.tocsc())
  • However, we need a very costly if Y_start <= y_col < Y_end: check to keep only the indices of Y that belong to the current chunk, which seriously degrades performance. A branchless variant doesn't fix this and causes erratic errors during testing.
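
For readers who want to play with the two loop structures without compiling Cython, here is a rough pure-Python sketch. It is not the scikit-learn code: the helper names (dense_to_csr, dense_to_csc, middle_term_four_loops, middle_term_three_loops) and the toy matrices are made up for illustration, and the four-loop version is only my reading of the existing routine. It checks that both variants agree with a dense computation of the middle term on one pair of chunks.

```python
def dense_to_csr(rows):
    """Return (data, indices, indptr) for a dense list-of-lists matrix."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0.0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def dense_to_csc(rows):
    # The CSC arrays of a matrix are the CSR arrays of its transpose.
    return dense_to_csr([list(col) for col in zip(*rows)])


def middle_term_four_loops(X, X_start, X_end, Y, Y_start, Y_end):
    # X and Y are both CSR; matching columns are found by comparison.
    X_data, X_indices, X_indptr = X
    Y_data, Y_indices, Y_indptr = Y
    n_X, n_Y = X_end - X_start, Y_end - Y_start
    D = [0.0] * (n_X * n_Y)
    for i in range(n_X):
        for j in range(n_Y):
            k = i * n_Y + j
            for x_ptr in range(X_indptr[X_start + i], X_indptr[X_start + i + 1]):
                x_col = X_indices[x_ptr]
                for y_ptr in range(Y_indptr[Y_start + j], Y_indptr[Y_start + j + 1]):
                    if x_col == Y_indices[y_ptr]:
                        D[k] += -2 * X_data[x_ptr] * Y_data[y_ptr]
    return D


def middle_term_three_loops(X, X_start, X_end, Y_csc, Y_start, Y_end):
    # X is CSR, Y is CSC: x_col indexes Y_indptr directly, but every stored
    # entry of that column must be filtered against the chunk bounds.
    X_data, X_indices, X_indptr = X
    Y_data, Y_indices, Y_indptr = Y_csc
    n_X, n_Y = X_end - X_start, Y_end - Y_start
    D = [0.0] * (n_X * n_Y)
    for i in range(n_X):
        for x_ptr in range(X_indptr[X_start + i], X_indptr[X_start + i + 1]):
            x_col = X_indices[x_ptr]
            for y_ptr in range(Y_indptr[x_col], Y_indptr[x_col + 1]):
                y_row = Y_indices[y_ptr]  # row (sample) index of Y
                if Y_start <= y_row < Y_end:
                    D[i * n_Y + y_row - Y_start] += -2 * X_data[x_ptr] * Y_data[y_ptr]
    return D


X_dense = [[1.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 3.0], [0.0, 4.0, 0.0, 0.0]]
Y_dense = [
    [0.0, 1.0, 0.0, 0.0],
    [5.0, 0.0, 6.0, 0.0],
    [0.0, 0.0, 0.0, 7.0],
    [0.0, 2.0, 0.0, 8.0],
]
X_start, X_end, Y_start, Y_end = 1, 3, 1, 4

# Dense reference for the selected chunks, flattened in C order.
reference = [
    -2 * sum(x * y for x, y in zip(X_dense[i], Y_dense[j]))
    for i in range(X_start, X_end)
    for j in range(Y_start, Y_end)
]
D_four = middle_term_four_loops(
    dense_to_csr(X_dense), X_start, X_end, dense_to_csr(Y_dense), Y_start, Y_end
)
D_three = middle_term_three_loops(
    dense_to_csr(X_dense), X_start, X_end, dense_to_csc(Y_dense), Y_start, Y_end
)
assert D_four == D_three == reference
```

Note that in the three-loop variant the inner loop walks every stored entry of a column of Y, including rows outside the chunk, which is where the filtering cost comes from.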

cc @jjerphan @glemaitre

Any other comments?

@jjerphan
Member

Hi @Vincent-Maladiere,

Thanks for exploring this. I have neither the time nor the bandwidth to look at the algorithm you developed right now, and will likely come back to you once the 1.2.1 release is out.

As discussed IRL, going branchless here might just evaluate both statements, which could end up more costly than a comparison and a jump. I would be in favor of not being too smart and keeping the code logic clear and transparent here.

@glemaitre glemaitre changed the title [MAINT] Remove TODOs from _middle_term_sparse_sparse_64 routine MAINT Remove TODOs from _middle_term_sparse_sparse_64 routine Jan 23, 2023
@ogrisel
Member

ogrisel commented Jan 24, 2023

We need to convert Y from CSR to CSC, and we achieve this in a single place, during SparseSparseMiddleTermComputer.__init__:

self.Y_data, self.Y_indices, self.Y_indptr = self.unpack_csr_matrix(Y.tocsc())

Note that this would also use more memory compared to working directly with chunks of CSR matrices as is the case in our current code base.
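
To make the memory remark concrete, here is a rough pure-Python sketch of what a CSR to CSC conversion has to do. This is not SciPy's tocsc implementation; csr_to_csc is a hypothetical helper and the toy arrays are made up. The point is only that the conversion allocates a second full set of data, indices and indptr arrays, in addition to the CSR arrays it reads from.

```python
def csr_to_csc(data, indices, indptr, n_cols):
    """Convert CSR arrays to CSC arrays with a counting sort on columns."""
    nnz = len(data)
    # Count stored entries per column, then prefix-sum into column pointers.
    csc_indptr = [0] * (n_cols + 1)
    for col in indices:
        csc_indptr[col + 1] += 1
    for col in range(n_cols):
        csc_indptr[col + 1] += csc_indptr[col]
    # Newly allocated buffers: this is the extra memory.
    csc_data = [0.0] * nnz
    csc_indices = [0] * nnz
    next_slot = csc_indptr[:-1]  # slicing copies: next free slot per column
    for row in range(len(indptr) - 1):
        for ptr in range(indptr[row], indptr[row + 1]):
            col = indices[ptr]
            dest = next_slot[col]
            csc_data[dest] = data[ptr]
            csc_indices[dest] = row
            next_slot[col] += 1
    return csc_data, csc_indices, csc_indptr


# Toy 4 x 4 matrix with 6 stored values, given directly as CSR arrays.
Y_data = [1.0, 5.0, 6.0, 7.0, 2.0, 8.0]
Y_indices = [1, 0, 2, 3, 1, 3]
Y_indptr = [0, 1, 3, 4, 6]

csc_data, csc_indices, csc_indptr = csr_to_csc(Y_data, Y_indices, Y_indptr, n_cols=4)
extra_entries = len(csc_data) + len(csc_indices) + len(csc_indptr)
```

Here extra_entries counts the elements of the three new arrays, which is roughly the size of the original CSR representation again.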

Member

@ogrisel ogrisel left a comment


I think I preferred the old variable names. They are a bit verbose but also more explicit.

Not a very strong preference though.

@jjerphan
Member

jjerphan commented Jan 26, 2023

SciPy's csr_matmat uses a CSR matrix and a CSC matrix, instead of two CSR matrices. Since this is not documented, the only way to spot it is by manually running the routine on two small matrices.

In our case we use two (chunks of) CSR matrices, but the second one is transposed and thus is seen as a CSC matrix without any conversion cost.

Since csr_matmat expects a (CSR, CSC)-couple, I still do not know what is blocking us from using it (I am saying it naively, I have not yet been able to get into the algorithm).

For the reader, profiling this script:

# csr_matmat.py
import numpy as np

from numpy.testing import assert_array_equal
from scipy.sparse import csr_matrix

n, p = 1000, 256
X = np.random.random((n, p))
X[X <= 0.3] = 0.
X = csr_matrix(X.astype(np.float64))

# While X is CSR, Y is CSC here due to the
# transposition (the coercion is natural).
Y = X.T

# The arrays are equal but are not
# identical (copies are created).
assert_array_equal(X.data, Y.data)
assert_array_equal(X.indices, Y.indices)
assert_array_equal(X.indptr, Y.indptr)

# This dispatches to csr_matmat
X @ Y

With:

py-spy record --rate=500 \
              --native \
              -o csr_matmat.svg \
              -f speedscope \
              -- python csr_matmat.py

gives the following profile, which can be inspected with speedscope: csr_matmat.svg
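
To see why the three assertions in the script hold, here is a small pure-Python sketch that does not rely on SciPy. The helpers to_csr and to_csc are made up for illustration and build the arrays from a dense list of lists. It shows that the CSR arrays of a matrix and the CSC arrays of its transpose are the same three arrays, which is why the transposition needs no conversion.

```python
def to_csr(dense):
    """Scan row by row: indices hold column numbers, indptr delimits rows."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0.0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def to_csc(dense):
    """Scan column by column: indices hold row numbers, indptr delimits columns."""
    n_rows, n_cols = len(dense), len(dense[0])
    data, indices, indptr = [], [], [0]
    for col in range(n_cols):
        for row in range(n_rows):
            value = dense[row][col]
            if value != 0.0:
                data.append(value)
                indices.append(row)
        indptr.append(len(data))
    return data, indices, indptr


X = [[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]
X_T = [list(column) for column in zip(*X)]  # dense transpose, shape 3 x 2

csr_of_X = to_csr(X)
csc_of_X_T = to_csc(X_T)
assert csr_of_X == csc_of_X_T
```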

@Vincent-Maladiere
Contributor Author

Hi @jjerphan, thanks for the clarification. Could you point out where the actual transpose operation on Y happens?
Empirically, the code adapted from SciPy only passed tests after converting Y to CSC (I didn't try to transpose it, though).

Note that SciPy's routine can't work for us as is, since it loops over its entire input, while we have to deal with {X,Y}_{start,end}, which introduces a significant cost.

@jjerphan
Member

Let's recap and make it explicit.

csr_matmat computes C = X @ Z where:

  • X is CSR
  • Z is CSC.
  • C is CSR.

To compute the middle term for the $(l, k)$-th pair of chunks, i.e.:

$$ - 2 \mathbf{X}^{(l)} {\mathbf{Y}^{(k)}}^\top $$

We use:

  • $\mathbf{X}^{(l)}$, a chunk (here the $l$-th) of $\mathbf{X}$. $\mathbf{X}$ is handled as X, a CSR matrix.
  • $\mathbf{Y}^{(k)}$, a chunk (here the $k$-th) of $\mathbf{Y}$. $\mathbf{Y}$ is handled as Y, a CSR matrix.
  • Hence ${\mathbf{Y}^{(k)}}^\top$ can be seen as a chunk of a CSC matrix (i.e. $\mathbf{Y}^{(k)}$ is conceptually but not programmatically transposed).
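
For context, and as a reminder rather than a quote from the code base: the term is called "middle" because it sits in the middle of the usual expansion of the squared Euclidean distance between a row $\mathbf{x}_i$ of $\mathbf{X}^{(l)}$ and a row $\mathbf{y}_j$ of $\mathbf{Y}^{(k)}$ (the row symbols $\mathbf{x}_i$ and $\mathbf{y}_j$ are notation introduced here):

$$ \lVert \mathbf{x}_i - \mathbf{y}_j \rVert_2^2 = \lVert \mathbf{x}_i \rVert_2^2 - 2 \mathbf{x}_i^\top \mathbf{y}_j + \lVert \mathbf{y}_j \rVert_2^2 $$

The $(i, j)$ entry of $- 2 \mathbf{X}^{(l)} {\mathbf{Y}^{(k)}}^\top$ is exactly $- 2 \mathbf{x}_i^\top \mathbf{y}_j$, and the two squared norms can be added separately.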

Thus, if we were computing it without chunks, i.e.:

$$ - 2 \mathbf{X} {\mathbf{Y}}^\top $$

we could slightly modify csr_matmat to change the accumulations of sums. @Vincent-Maladiere: can you confirm that we have the same understanding?
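
A minimal sketch of what that unchunked computation could look like, written in plain Python rather than C++ or Cython. It assumes $\mathbf{Y}$ is available in CSC layout (equivalently, $\mathbf{Y}^\top$ in CSR layout) and accumulates into a dense, C-ordered buffer. The function name and the toy arrays are illustrative only; this is not csr_matmat itself.

```python
def minus_two_x_yt(X_csr, Y_csc, n_X, n_Y):
    """Accumulate -2 * X @ Y.T into a flat, C-ordered buffer of size n_X * n_Y."""
    X_data, X_indices, X_indptr = X_csr
    Y_data, Y_indices, Y_indptr = Y_csc
    D = [0.0] * (n_X * n_Y)
    for i in range(n_X):
        for x_ptr in range(X_indptr[i], X_indptr[i + 1]):
            x_col = X_indices[x_ptr]
            scaled = -2 * X_data[x_ptr]
            # Column x_col of Y lists every row of Y with a stored value
            # for that feature: no bounds check is needed without chunks.
            for y_ptr in range(Y_indptr[x_col], Y_indptr[x_col + 1]):
                D[i * n_Y + Y_indices[y_ptr]] += scaled * Y_data[y_ptr]
    return D


# X = [[1, 0, 2], [0, 3, 0]] as CSR arrays.
X_csr = ([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3])
# Y = [[4, 0, 5], [0, 6, 0]] as CSC arrays.
Y_csc = ([4.0, 6.0, 5.0], [0, 1, 0], [0, 1, 2, 3])

D = minus_two_x_yt(X_csr, Y_csc, n_X=2, n_Y=2)

# Dense check of the same product.
X_dense = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
Y_dense = [[4.0, 0.0, 5.0], [0.0, 6.0, 0.0]]
expected = [
    -2 * sum(x * y for x, y in zip(x_row, y_row))
    for x_row in X_dense
    for y_row in Y_dense
]
assert D == expected
```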

Now, we are using chunks, so we can't simply translate csr_matmat from C++ to Cython but we might get some inspiration from it to better craft _middle_term_sparse_sparse_*. (This was the original motivation for my comment (named TODO2 above), but this was probably unclear or not explicit enough). Can you confirm, @Vincent-Maladiere?

@Vincent-Maladiere
Contributor Author

Thus, if we were computing it without chunks, we could slightly modify csr_matmat to change the accumulations of sums. @Vincent-Maladiere: can you confirm that we have the same understanding?

Absolutely! This would work like a charm, with a nice speed-up.

Now, we are using chunks, so we can't simply translate csr_matmat from C++ to Cython, but we might get some inspiration from it to better craft _middle_term_sparse_sparse_*. (This was the original motivation for my comment (named TODO2 above), but this was probably unclear or not explicit enough). Can you confirm, @Vincent-Maladiere?

That is precisely what I have been trying to achieve, but I haven't found a more efficient solution than the one described above. This is, of course, up to discussion, and I would appreciate having feedback on new candidates for _middle_term_sparse_sparse_*.

Also, note that the innovation of SciPy's approach is to remove the loop on n_Y by using x_col to look up Y_indptr directly.

@jjerphan
Member

That is precisely what I have been trying to achieve, but I haven't found a more efficient solution than the one described above. This is, of course, up to discussion, and I would appreciate having feedback on new candidates for _middle_term_sparse_sparse_*.

OK. 👍

I need to set aside some time to have a look at this.

@jjerphan jjerphan changed the title MAINT Remove TODOs from _middle_term_sparse_sparse_64 routine MAINT Improve the _middle_term_sparse_sparse_{32, 64} routines Jan 26, 2023
Member

@jjerphan jjerphan left a comment


LGTM, after IRL discussions with @Vincent-Maladiere.

I am +0 regarding integrating the variable names' changes.

@Micky774
Contributor

Micky774 commented Feb 4, 2023

Wanted to mention that I prefer the shorter names, since I think the truncated information is still clear enough in context of the code while making it easier to read.

@jjerphan
Member

jjerphan commented Feb 6, 2023

I'll let @ogrisel or @Micky774 merge when this looks good to them (to me, this can be merged, but this is not urgent).

Member

@ogrisel ogrisel left a comment


Let's merge then.

@ogrisel ogrisel merged commit 4acd91d into scikit-learn:main Feb 6, 2023
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Feb 27, 2023
This was already studied in:
scikit-learn#25449

Co-authored-by: Vincent M <maladiere.vincent@yahoo.fr>
AdarshPrusty7 added a commit to AdarshPrusty7/GSGP that referenced this pull request Mar 6, 2023
Co-authored-by: Nicola Fanelli <48762613+nicolafan@users.noreply.github.com>
Co-authored-by: Vincent M <maladiere.vincent@yahoo.fr>
Co-authored-by: partev <petrosyan@gmail.com>
Co-authored-by: ouss1508 <121971998+ouss1508@users.noreply.github.com>
Co-authored-by: ashah002 <97778401+ashah002@users.noreply.github.com>
Co-authored-by: Ahmedbgh <83551938+Ahmedbgh@users.noreply.github.com>
Co-authored-by: Pooja M <90301980+pm155@users.noreply.github.com>
Co-authored-by: Ian Thompson <ianiat11@gmail.com>
Co-authored-by: Ian Thompson <ian.thompson@hrblock.com>
Co-authored-by: SANJAI_3 <86285670+sanjail3@users.noreply.github.com>
Co-authored-by: Kaushik Amar Das <cozek@users.noreply.github.com>
Co-authored-by: Kaushik Amar Das <kaushik.amar.das@accenture.com>
Co-authored-by: Nawazish Alam <nawazishmail@gmail.com>
Co-authored-by: William M <64324808+Akbeeh@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@yahoo.fr>
Co-authored-by: JanFidor <66260538+JanFidor@users.noreply.github.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Co-authored-by: Logan Thomas <logan.thomas005@gmail.com>
Co-authored-by: Vyom Pathak <angerstick3@gmail.com>
Co-authored-by: as-90 <88336957+as-90@users.noreply.github.com>
Co-authored-by: Marvin Krawutschke <101656586+Marvvxi@users.noreply.github.com>
Co-authored-by: Haesun Park <haesunrpark@gmail.com>
Co-authored-by: Christine P. Chai <star1327p@gmail.com>
Co-authored-by: Christian Veenhuis <124370897+ChVeen@users.noreply.github.com>
Co-authored-by: Sortofamudkip <wishyutp0328@gmail.com>
Co-authored-by: sonnivs <48860780+sonnivs@users.noreply.github.com>
Co-authored-by: Ali H. El-Kassas <aliabdelmonem234@gmail.com>
Co-authored-by: Yusuf Raji <raji.yusuf234@gmail.com>
Co-authored-by: Tabea Kossen <tabeakossen@gmail.com>
Co-authored-by: Pooja Subramaniam <poojas2086@gmail.com>
Co-authored-by: JuliaSchoepp <63353759+JuliaSchoepp@users.noreply.github.com>
Co-authored-by: Jack McIvor <jacktmcivor@gmail.com>
Co-authored-by: zeeshan lone <56621467+still-learning-ev@users.noreply.github.com>
Co-authored-by: Max Halford <maxhalford25@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: genvalen <genvalen@protonmail.com>
Co-authored-by: Shiva chauhan <103742975+Shivachauhan17@users.noreply.github.com>
Co-authored-by: Dayne <daynesorvisto@yahoo.ca>
Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Kshitij Mathur <k.mathur68@gmail.com>