MAINT Improve the `_middle_term_sparse_sparse_{32, 64}` routines #25449

Vincent-Maladiere · 2023-01-21T15:09:12Z

Reference Issues/PRs

Towards #22587
Follow up #24556

What does this implement/fix? Explain your changes.

In #24556, we introduced a routine for computing the dot product of sparse matrices efficiently for the Euclidean specialization of ArgKmin and RadiusNeighbors with CSR-CSR matrices.
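For context, the dot product enters through the usual expansion of the squared Euclidean distance. Its middle term is the only part that couples $\mathbf{x}$ and $\mathbf{y}$, and for a pair of chunks it is the matrix that this routine accumulates into its output buffer:

$$
\|\mathbf{x} - \mathbf{y}\|_2^2 = \|\mathbf{x}\|_2^2 - 2\, \mathbf{x}^\top \mathbf{y} + \|\mathbf{y}\|_2^2,
\qquad
\text{middle term for the } (l, k)\text{-th chunk pair: } - 2 \mathbf{X}^{(l)} {\mathbf{Y}^{(k)}}^\top
$$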

This PR removes two TODOs that aimed at improving the routine's performance, since trying these optimizations was unsuccessful.
It also introduces shorter variable names to improve readability without losing too much context.

More details about these two optimization attempts are below.

TODO1:

```cython
# If possible optimize this routine to efficiently treat cases where
# `n_samples_X << n_samples_Y` met in practise when X_test consists of a
# few samples, and thus when there's a single chunk of X whose number of
# samples is less than the default chunk size.
```

This first optimization suggests focusing on the iteration order when the number of rows in the X chunk and in the Y chunk is very imbalanced (the default chunk size is 256).
Since we already loop over n_X first, we found no further way to improve performance in this scenario.

TODO2:

```cython
# Compare this routine with the similar ones in SciPy, especially
# `csr_matmat` which might implement a better algorithm.
# See: https://github.com/scipy/scipy/blob/e58292e066ba2cb2f3d1e0563ca9314ff1f4f311/scipy/sparse/sparsetools/csr.h#L603-L669  # noqa
```

SciPy's csr_matmat implements a slightly different routine for the same operation. Since it uses only 3 for-loops instead of the 4 in ours, we may gain some speed by applying a similar logic.
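To make the loop count concrete, here is a pure-Python sketch of a four-loop CSR×CSR middle term over one pair of chunks: rows of the X chunk, nonzeros of that row, rows of the Y chunk, nonzeros of that row. It is an illustrative transcription written for this note, not a copy of the Cython routine, and `middle_term_csr_csr_4_loops` is a hypothetical name:

```python
import numpy as np
from scipy.sparse import csr_matrix


def middle_term_csr_csr_4_loops(X_data, X_indices, X_indptr, X_start, X_end,
                                Y_data, Y_indices, Y_indptr, Y_start, Y_end):
    """Hypothetical helper: dense -2 * X[X_start:X_end] @ Y[Y_start:Y_end].T.

    Both X and Y are given as CSR arrays (data, indices, indptr).
    """
    n_X, n_Y = X_end - X_start, Y_end - Y_start
    D = np.zeros(n_X * n_Y)  # flat buffer, read as a C-ordered (n_X, n_Y) array
    for i in range(n_X):  # loop 1: rows of the X chunk
        for x_ptr in range(X_indptr[X_start + i], X_indptr[X_start + i + 1]):  # loop 2
            x_col = X_indices[x_ptr]
            for j in range(n_Y):  # loop 3: rows of the Y chunk
                for y_ptr in range(Y_indptr[Y_start + j], Y_indptr[Y_start + j + 1]):  # loop 4
                    if Y_indices[y_ptr] == x_col:  # same feature in both rows
                        D[i * n_Y + j] += -2 * X_data[x_ptr] * Y_data[y_ptr]
    return D.reshape(n_X, n_Y)


rng = np.random.default_rng(0)
X_dense = rng.random((20, 8))
X_dense[X_dense < 0.7] = 0.0
Y_dense = rng.random((30, 8))
Y_dense[Y_dense < 0.7] = 0.0
X, Y = csr_matrix(X_dense), csr_matrix(Y_dense)

D = middle_term_csr_csr_4_loops(X.data, X.indices, X.indptr, 5, 12,
                                Y.data, Y.indices, Y.indptr, 10, 25)
```

In this shape, loop 3 visits every row of the Y chunk for each stored entry of X, including rows that share no feature with it; that is the loop a csr_matmat-like structure avoids.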

Before we try reproducing this logic, note that our setup differs from SciPy's in several ways:

SciPy's csr_matmat uses a CSR matrix and a CSC matrix instead of two CSR matrices. Since this is not documented, the only way to notice it is to run the routine manually on two small matrices.
We have to deal with chunks via X_start, X_end and Y_start, Y_end, while SciPy's routine consumes the entire input matrices. This creates some overhead that severely degrades the performance of our candidate routine.

Our candidate routine, which passes all tests, is:

```cython
cdef void _middle_term_sparse_sparse_64(
    const DTYPE_t[:] X_data,
    const SPARSE_INDEX_TYPE_t[:] X_indices,
    const SPARSE_INDEX_TYPE_t[:] X_indptr,
    ITYPE_t X_start,
    ITYPE_t X_end,
    const DTYPE_t[:] Y_data,
    const SPARSE_INDEX_TYPE_t[:] Y_indices,
    const SPARSE_INDEX_TYPE_t[:] Y_indptr,
    ITYPE_t Y_start,
    ITYPE_t Y_end,
    DTYPE_t * D,
) nogil:
    # This routine assumes that D points to the first element of a
    # zeroed buffer of length at least equal to n_X × n_Y, conceptually
    # representing a 2-d C-ordered array.
    #
    # X is passed as CSR while Y is passed as CSC (the arrays of Y.tocsc()):
    # Y_indptr is indexed by feature, and Y_indices holds row indices of Y,
    # which is what y_col refers to below.
    cdef:
        ITYPE_t i, k
        ITYPE_t n_X = X_end - X_start
        ITYPE_t n_Y = Y_end - Y_start
        ITYPE_t x_col, x_ptr, y_col, y_ptr
    for i in range(n_X):
        for x_ptr in range(X_indptr[X_start+i], X_indptr[X_start+i+1]):
            x_col = X_indices[x_ptr]
            # Nonzeros of feature x_col over *all* rows of Y.
            for y_ptr in range(Y_indptr[x_col], Y_indptr[x_col+1]):
                y_col = Y_indices[y_ptr]
                # Only keep the rows of Y that belong to the current chunk.
                if Y_start <= y_col < Y_end:
                    k = i * n_Y + y_col - Y_start
                    D[k] += -2 * X_data[x_ptr] * Y_data[y_ptr]
```

The main difference from our prior routine is that we got rid of the 3rd for-loop on n_Y by plugging x_col into Y_indptr directly.
We need to convert Y from CSR to CSC, and we achieve this in a single place, during SparseSparseMiddleTermComputer.__init__:

```python
self.Y_data, self.Y_indices, self.Y_indptr = self.unpack_csr_matrix(Y.tocsc())
```

However, we need a very costly `if Y_start <= y_col < Y_end:` check to keep only the indices of Y belonging to the current chunk, which seriously degrades performance. A branchless variant does not fix this and causes erratic test failures.
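A pure-Python transcription of the candidate routine may help show where the filtering cost comes from. For each stored entry of X, the inner loop scans the nonzeros of feature `x_col` over *all* rows of Y, and the branch discards those outside `[Y_start, Y_end)`. So the work per chunk pair scales with the nonzeros of the whole of Y rather than of the Y chunk, which plausibly explains the slowdown. The function name and the two counters are hypothetical additions for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix


def middle_term_csr_csc_filtered(X_data, X_indices, X_indptr, X_start, X_end,
                                 Y_data, Y_indices, Y_indptr, Y_start, Y_end):
    """Hypothetical pure-Python transcription of the candidate routine.

    X is given as CSR arrays; Y is given as the arrays of Y.tocsc(), so
    Y_indptr is indexed by feature and Y_indices holds row indices of Y.
    Also returns how many Y entries were scanned and how many were kept.
    """
    n_X, n_Y = X_end - X_start, Y_end - Y_start
    D = np.zeros(n_X * n_Y)
    n_scanned = n_kept = 0
    for i in range(n_X):
        for x_ptr in range(X_indptr[X_start + i], X_indptr[X_start + i + 1]):
            x_col = X_indices[x_ptr]
            # Nonzeros of feature x_col over *all* rows of Y, not just the chunk.
            for y_ptr in range(Y_indptr[x_col], Y_indptr[x_col + 1]):
                n_scanned += 1
                y_col = Y_indices[y_ptr]  # a row index of Y
                if Y_start <= y_col < Y_end:  # the filtering branch
                    n_kept += 1
                    D[i * n_Y + y_col - Y_start] += -2 * X_data[x_ptr] * Y_data[y_ptr]
    return D.reshape(n_X, n_Y), n_scanned, n_kept


rng = np.random.default_rng(0)
X_dense = rng.random((20, 8))
X_dense[X_dense < 0.7] = 0.0
Y_dense = rng.random((30, 8))
Y_dense[Y_dense < 0.7] = 0.0
X, Y_csc = csr_matrix(X_dense), csr_matrix(Y_dense).tocsc()

D, n_scanned, n_kept = middle_term_csr_csc_filtered(
    X.data, X.indices, X.indptr, 5, 12,
    Y_csc.data, Y_csc.indices, Y_csc.indptr, 10, 25,
)
```

Comparing `n_kept` with `n_scanned` on realistic data gives a rough idea of how much work the branch throws away per chunk pair.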

cc @jjerphan @glemaitre

Any other comments?

jjerphan · 2023-01-23T08:06:51Z

Hi @Vincent-Maladiere,

Thanks for exploring this. I do not have the time nor the bandwidth to look at the algorithm you developed right now; I will likely come back to you once the 1.2.1 release is out.

As discussed IRL, a branchless version here might just evaluate both branches, which is more costly than a comparison followed by an eventual jump. I would be in favor of not being too clever and keeping the code logic clear and transparent here.

ogrisel · 2023-01-24T16:54:02Z

> We need to convert Y from CSR to CSC, and we achieve this in a single place, during SparseSparseMiddleTermComputer.__init__:
>
> `self.Y_data, self.Y_indices, self.Y_indptr = self.unpack_csr_matrix(Y.tocsc())`

Note that this would also use more memory than working directly with chunks of CSR matrices, as is the case in our current code base.
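A quick way to see the extra memory: `Y.tocsc()` allocates new `data`, `indices` and `indptr` arrays holding the same number of stored entries as `Y`, rather than reusing `Y`'s buffers. A minimal check on random data (the sizes are arbitrary):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
Y_dense = rng.random((1000, 256))
Y_dense[Y_dense <= 0.3] = 0.0
Y = csr_matrix(Y_dense)

Y_csc = Y.tocsc()  # full conversion: fresh data/indices/indptr arrays
# The CSC copy does not share its buffers with the original CSR matrix.
assert not np.shares_memory(Y.data, Y_csc.data)

extra_bytes = Y_csc.data.nbytes + Y_csc.indices.nbytes + Y_csc.indptr.nbytes
print(f"extra memory held by the CSC copy: {extra_bytes / 1e6:.1f} MB")
```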

ogrisel

I think I preferred the old variable names. They are a bit verbose but also more explicit.

Not a very strong preference though.

jjerphan · 2023-01-26T08:06:58Z

> Scipy csr_matmat uses a CSR matrix and a CSC matrix, instead of two CSR matrices. Since there is no documentation, the only way to spot it is by manually running the routine on two small matrices.

In our case, we use two (chunks of) CSR matrices, but the second one is transposed and thus is seen as a CSC matrix without any conversion cost.

Since csr_matmat expects a (CSR, CSC) pair, I still do not know what is blocking us from using it (I am saying this naively; I have not yet been able to dig into the algorithm).

For the reader, profiling this script:

```python
# csr_matmat.py
import numpy as np

from numpy.testing import assert_array_equal
from scipy.sparse import csr_matrix

n, p = 1000, 256
X = np.random.random((n, p))
X[X <= 0.3] = 0.
X = csr_matrix(X.astype(np.float64))

# While X is CSR, Y is CSC here due to the
# transposition (the coercion is natural).
Y = X.T

# The arrays are equal here, but they are
# not identical (copies are created).
assert_array_equal(X.data, Y.data)
assert_array_equal(X.indices, Y.indices)
assert_array_equal(X.indptr, Y.indptr)

# This dispatches to csr_matmat.
X @ Y
```

With:

```shell
py-spy record --rate=500 \
              --native \
              -o csr_matmat.svg \
              -f speedscope \
              -- python csr_matmat.py
```

gives the following profile, which can be inspected with SpeedScope:

Vincent-Maladiere · 2023-01-26T08:45:23Z

Hi @jjerphan, thanks for the clarification. Could you point out where the actual transpose operation on Y happens?
Empirically, the code adapted from SciPy only passed the tests after converting Y to CSC (I didn't try to transpose it, though).

Note that SciPy's routine can't work for us as-is, since it loops over its entire input, while we have to deal with {X,Y}_{start,end}, which introduces a significant cost.

jjerphan · 2023-01-26T09:18:15Z

Let's recap and make it explicit.

csr_matmat computes C = X @ Z where:

X is CSR,
Z is CSC,
C is CSR.

To compute the middle term for the $(l, k)$-th pair of chunks, i.e.:

$$ - 2 \mathbf{X}^{(l)} {\mathbf{Y}^{(k)}}^\top $$

We use:

$\mathbf{X}^{(l)}$, a chunk (here the $l$-th) of $\mathbf{X}$. $\mathbf{X}$ is handled as X, a CSR matrix.
$\mathbf{Y}^{(k)}$, a chunk (here the $k$-th) of $\mathbf{Y}$. $\mathbf{Y}$ is handled as Y, a CSR matrix; hence ${\mathbf{Y}^{(k)}}^\top$ can be seen as a chunk of a CSC matrix (i.e. $\mathbf{Y}^{(k)}$ is transposed conceptually, but not programmatically).
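The "conceptually but not programmatically transposed" point can be checked directly. The three CSR arrays of a chunk $\mathbf{Y}^{(k)}$, read as CSC arrays with the shape swapped, describe ${\mathbf{Y}^{(k)}}^\top$; only the interpretation changes, and the index arrays are not rearranged. In this sketch the chunk is materialized by slicing for convenience, whereas the Cython routines work with offsets into the full `Y_indptr`:

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

rng = np.random.default_rng(0)
Y_dense = rng.random((30, 8))
Y_dense[Y_dense < 0.6] = 0.0
Y = csr_matrix(Y_dense)

Y_start, Y_end = 10, 25
Y_chunk = Y[Y_start:Y_end]  # CSR chunk Y^{(k)}, shape (15, 8)

# Same (data, indices, indptr), swapped shape, read as CSC: this is the
# transposed chunk {Y^{(k)}}^T, shape (8, 15).
Y_chunk_T = csc_matrix(
    (Y_chunk.data, Y_chunk.indices, Y_chunk.indptr),
    shape=(Y_chunk.shape[1], Y_chunk.shape[0]),
)
assert np.array_equal(Y_chunk_T.toarray(), Y_dense[Y_start:Y_end].T)
```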

Thus, if we were computing it without chunks, i.e.:

$$ - 2 \mathbf{X} {\mathbf{Y}}^\top $$

we could slightly modify csr_matmat to change the accumulations of sums. @Vincent-Maladiere: can you confirm that we have the same understanding?
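A minimal sketch of what that modification could look like for the non-chunked case: a row-by-row (Gustavson-style) product, the scheme `csr_matmat` follows, with the `-2` factor folded into the accumulation. It relies on `Y.tocsc()` exposing the same arrays as the CSR representation of `Y.T`. Unlike SciPy's C++ routine, which uses a sparse accumulator and produces a CSR result, this sketch writes into a dense output, since the middle term is stored in a dense buffer. The function name is hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix


def middle_term_gustavson(X, Y):
    """Hypothetical helper: dense -2 * X @ Y.T from CSR matrices X and Y.

    Y.tocsc() holds the same arrays as the CSR form of Y.T, so the
    row-by-row product needs only three nested loops.
    """
    Y_csc = Y.tocsc()
    n_X, n_Y = X.shape[0], Y.shape[0]
    D = np.zeros((n_X, n_Y))
    for i in range(n_X):  # rows of X
        for x_ptr in range(X.indptr[i], X.indptr[i + 1]):  # nonzeros of X[i]
            x_col = X.indices[x_ptr]
            # Rows of Y with a nonzero in feature x_col.
            for y_ptr in range(Y_csc.indptr[x_col], Y_csc.indptr[x_col + 1]):
                # Modified accumulation: scale by -2 while summing.
                D[i, Y_csc.indices[y_ptr]] += -2.0 * X.data[x_ptr] * Y_csc.data[y_ptr]
    return D


rng = np.random.default_rng(0)
X_dense = rng.random((20, 8))
X_dense[X_dense < 0.7] = 0.0
Y_dense = rng.random((30, 8))
Y_dense[Y_dense < 0.7] = 0.0
X, Y = csr_matrix(X_dense), csr_matrix(Y_dense)

D = middle_term_gustavson(X, Y)
```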

Now, we are using chunks, so we can't simply translate csr_matmat from C++ to Cython, but we might get some inspiration from it to better craft `_middle_term_sparse_sparse_*`. (This was the original motivation for my comment, named TODO2 above, but it was probably unclear or not explicit enough.) Can you confirm, @Vincent-Maladiere?
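The chunked setting can be pictured as tiling the full matrix with per-chunk-pair blocks, each of which only sees rows `[X_start, X_end)` of X and `[Y_start, Y_end)` of Y. A minimal sketch using SciPy's own product for each block; the small `chunk_size` is chosen for illustration and this is not the library's chunking logic:

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
X_dense = rng.random((50, 16))
X_dense[X_dense < 0.7] = 0.0
Y_dense = rng.random((70, 16))
Y_dense[Y_dense < 0.7] = 0.0
X, Y = csr_matrix(X_dense), csr_matrix(Y_dense)

chunk_size = 16  # illustrative; the default mentioned in this PR is 256
full = np.empty((X.shape[0], Y.shape[0]))
for X_start in range(0, X.shape[0], chunk_size):
    X_end = min(X_start + chunk_size, X.shape[0])
    for Y_start in range(0, Y.shape[0], chunk_size):
        Y_end = min(Y_start + chunk_size, Y.shape[0])
        # Middle term of the (l, k)-th chunk pair: -2 X^{(l)} {Y^{(k)}}^T
        block = -2 * (X[X_start:X_end] @ Y[Y_start:Y_end].T).toarray()
        full[X_start:X_end, Y_start:Y_end] = block
```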

Vincent-Maladiere · 2023-01-26T09:32:51Z

> Thus, if we were computing it without chunks, we could slightly modify csr_matmat to change the accumulations of sums. @Vincent-Maladiere: can you confirm that we have the same understanding?

Absolutely! This would work like a charm, with a nice speed-up.

> Now, we are using chunks, so we can't simply translate csr_matmat from C++ to Cython, but we might get some inspiration from it to better craft `_middle_term_sparse_sparse_*`. (This was the original motivation for my comment (named TODO2 above), but this was probably unclear or not explicit enough). Can you confirm, @Vincent-Maladiere?

That is precisely what I have been trying to achieve, but I haven't found a more efficient solution than the one described above. This is, of course, open for discussion, and I would appreciate feedback on new candidates for `_middle_term_sparse_sparse_*`.

Also, note that the key idea of SciPy's approach is to remove the loop on n_Y by using x_col to look up Y_indptr directly.

jjerphan · 2023-01-26T09:45:54Z

> That is precisely what I have been trying to achieve, but I haven't found a more efficient solution than the one described above. This is, of course, up to discussion, and I would appreciate having feedback on new candidates for _middle_term_sparse_sparse_*.

OK. 👍

I need to scope some time to have a look at this.

jjerphan

LGTM, after IRL discussions with @Vincent-Maladiere.

I am +0 regarding integrating the variable names' changes.

sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx.tp


Micky774 · 2023-02-04T22:04:19Z

Wanted to mention that I prefer the shorter names, since I think the truncated information is still clear enough in the context of the code, while making it easier to read.

jjerphan · 2023-02-06T13:55:36Z

I'll let @ogrisel or @Micky774 merge when this LGTT (to me, this can be merged, but it is not urgent).

ogrisel

Let's merge then.


remove TODOs and simplify variable names

d29fe05

github-actions bot added module:metrics cython labels Jan 21, 2023

glemaitre changed the title ~~[MAINT] Remove TODOs from _middle_term_sparse_sparse_64 routine~~ MAINT Remove TODOs from _middle_term_sparse_sparse_64 routine Jan 23, 2023

Merge branch 'main' into optimize_csr_csr_routine

a8ad977

ogrisel reviewed Jan 24, 2023

jjerphan changed the title ~~MAINT Remove TODOs from _middle_term_sparse_sparse_64 routine~~ MAINT Improve the _middle_term_sparse_sparse_{32, 64} routines Jan 26, 2023

jjerphan approved these changes Jan 31, 2023

sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx.tp (outdated, resolved)

Vincent-Maladiere and others added 2 commits February 1, 2023 09:45

Update sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx.tp

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

777e012

Merge branch 'main' into optimize_csr_csr_routine

109c248

ogrisel approved these changes Feb 6, 2023

ogrisel merged commit 4acd91d into scikit-learn:main Feb 6, 2023

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Feb 27, 2023

Remove outdated TODO comment

a874def

This was already studied in: scikit-learn#25449

Co-authored-by: Vincent M <maladiere.vincent@yahoo.fr>

Vincent-Maladiere commented Jan 21, 2023

jjerphan commented Jan 23, 2023

ogrisel commented Jan 24, 2023

ogrisel left a comment

jjerphan commented Jan 26, 2023 (edited)

Vincent-Maladiere commented Jan 26, 2023

jjerphan commented Jan 26, 2023

Vincent-Maladiere commented Jan 26, 2023

jjerphan commented Jan 26, 2023

jjerphan left a comment (edited)

Micky774 commented Feb 4, 2023

jjerphan commented Feb 6, 2023

ogrisel left a comment
