MAINT PairwiseDistancesReduction: Do not slice memoryviews in _compute_dist_middle_terms #24715
See the reasons here: scikit-learn#17299
Did you see a performance improvement with this change?
Codewise, this LGTM. In general, I am okay with avoiding slicing memoryviews in Cython.
I am currently rerunning benchmarks on relevant configurations on a machine with 128 cores. Using Amdahl's Law, I know that about 2.5% of the implementation is sequential. If we get increased performance in this PR, it means that the slicing is likely part of the sequential portions. In this case, removing the slicing would bring the sequential portions below 2.5% of the whole implementation (and I would like to reassess the maximum speed-up again, but this can be done after this PR is merged if significant improvements are observed).
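To make the Amdahl's Law reasoning above concrete, here is a minimal sketch of the bound it gives. The function names are hypothetical and chosen for illustration. Only the 2.5% sequential fraction and the 128 cores come from the comment.

```c
#include <stdio.h>

/* Amdahl's Law: upper bound on the speed-up of a program whose
 * sequential fraction is `seq_fraction` when run on `n_threads` threads.
 * Hypothetical helper, not part of scikit-learn. */
double amdahl_speedup(double seq_fraction, int n_threads) {
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / (double)n_threads);
}

/* Print the bound for a few thread counts (chosen for illustration). */
void print_amdahl_bounds(double seq_fraction) {
    const int n_threads[] = {1, 8, 128};
    for (size_t i = 0; i < sizeof n_threads / sizeof n_threads[0]; ++i) {
        printf("%3d threads: at most x%.2f\n", n_threads[i],
               amdahl_speedup(seq_fraction, n_threads[i]));
    }
}
```

With a sequential fraction of 0.025, the bound is about 30.7 on 128 threads and can never exceed 1 / 0.025 = 40, so shrinking the sequential part is the only way to raise that ceiling.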
Most configurations aren't impacted but the ones with small datasets are sometimes slower or faster. 🤔
Full ASV results
I don't have a strong opinion on this one. I think the previous code was fine and did slightly fewer arithmetic operations (subtractions) than the new one.
Furthermore, I don't see the relation with #17299, which is about Cython function calls on views. Here there is no Cython function call on memory views, right?
It's only about manual pointer arithmetic and dereferencing before calling a function with pointer arguments.
Feel free to merge if you believe it's an improvement but I am not convinced myself.
I am confused by the ASV results. Maybe you can confirm that this does not significantly change the performance when using a lower number of threads?
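The two approaches being compared can be sketched in plain C as follows. This is a simplified illustration under stated assumptions: `view2d`, `slice_rows` and `row_ptr` are hypothetical names, the struct is not Cython's actual `__Pyx_memviewslice`, and strides are counted in elements here rather than in bytes.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a 2-D C-contiguous view.
 * Strides are in elements (Cython stores them in bytes). */
typedef struct {
    const double *data;
    ptrdiff_t shape[2];
    ptrdiff_t strides[2];
} view2d;

/* Slicing approach: build a new descriptor for rows [start, end),
 * then read the pointer and the row count from it. */
view2d slice_rows(const view2d *v, ptrdiff_t start, ptrdiff_t end) {
    view2d out = *v;
    out.data = v->data + start * v->strides[0];
    out.shape[0] = end - start;
    return out;
}

/* Pointer-arithmetic approach: compute the address of row `start`
 * directly; the caller derives the row count as end - start. */
const double *row_ptr(const view2d *v, ptrdiff_t start) {
    return v->data + start * v->strides[0];
}
```

Both routes yield the same base pointer and the same row count for the chunk. The difference lies in the extra descriptor that the slicing route has to create and, in Cython's case, reference-count.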
It is more that slicing comes with a set of extra instructions that are more costly than pointer arithmetic. See the red lines in the diff between the generated sources below, which call Cython internals for memoryview slice creation and reference counting.
Source code generated on `main` @ 68a7427
cat a.cpp
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
* return
*
* cdef DTYPE_t * _compute_dist_middle_terms( # <<<<<<<<<<<<<<
* self,
* ITYPE_t X_start,
*/
static __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_f_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_30DenseDenseMiddleTermComputer64__compute_dist_middle_terms(struct __pyx_obj_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_DenseDenseMiddleTermComputer64 *__pyx_v_self, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_thread_num) {
__Pyx_memviewslice __pyx_v_X_c = { 0, 0, { 0 }, { 0 }, { 0 } };
__Pyx_memviewslice __pyx_v_Y_c = { 0, 0, { 0 }, { 0 }, { 0 } };
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_dist_middle_terms;
enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Order __pyx_v_order;
enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_ta;
enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_tb;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_m;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_n;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_K;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_alpha;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_A;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_B;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_lda;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldb;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_beta;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldc;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_r;
__Pyx_memviewslice __pyx_t_1 = { 0, 0, { 0 }, { 0 }, { 0 } };
int __pyx_t_2;
__Pyx_memviewslice __pyx_t_3 = { 0, 0, { 0 }, { 0 }, { 0 } };
Py_ssize_t __pyx_t_4;
Py_ssize_t __pyx_t_5;
int __pyx_lineno = 0;
const char *__pyx_filename = NULL;
int __pyx_clineno = 0;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":228
* ) nogil:
* cdef:
* const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :] # <<<<<<<<<<<<<<
* const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
* DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
*/
__pyx_t_1.data = __pyx_v_self->X.data;
__pyx_t_1.memview = __pyx_v_self->X.memview;
__PYX_INC_MEMVIEW(&__pyx_t_1, 0);
__pyx_t_2 = -1;
if (unlikely(__pyx_memoryview_slice_memviewslice(
&__pyx_t_1,
__pyx_v_self->X.shape[0], __pyx_v_self->X.strides[0], __pyx_v_self->X.suboffsets[0],
0,
0,
&__pyx_t_2,
__pyx_v_X_start,
__pyx_v_X_end,
0,
1,
1,
0,
1) < 0))
{
__PYX_ERR(0, 228, __pyx_L1_error)
}
__pyx_t_1.shape[1] = __pyx_v_self->X.shape[1];
__pyx_t_1.strides[1] = __pyx_v_self->X.strides[1];
__pyx_t_1.suboffsets[1] = -1;
__pyx_v_X_c = __pyx_t_1;
__pyx_t_1.memview = NULL;
__pyx_t_1.data = NULL;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":229
* cdef:
* const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
* const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :] # <<<<<<<<<<<<<<
* DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
*
*/
__pyx_t_3.data = __pyx_v_self->Y.data;
__pyx_t_3.memview = __pyx_v_self->Y.memview;
__PYX_INC_MEMVIEW(&__pyx_t_3, 0);
__pyx_t_2 = -1;
if (unlikely(__pyx_memoryview_slice_memviewslice(
&__pyx_t_3,
__pyx_v_self->Y.shape[0], __pyx_v_self->Y.strides[0], __pyx_v_self->Y.suboffsets[0],
0,
0,
&__pyx_t_2,
__pyx_v_Y_start,
__pyx_v_Y_end,
0,
1,
1,
0,
1) < 0))
{
__PYX_ERR(0, 229, __pyx_L1_error)
}
__pyx_t_3.shape[1] = __pyx_v_self->Y.shape[1];
__pyx_t_3.strides[1] = __pyx_v_self->Y.strides[1];
__pyx_t_3.suboffsets[1] = -1;
__pyx_v_Y_c = __pyx_t_3;
__pyx_t_3.memview = NULL;
__pyx_t_3.data = NULL;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":230
* const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
* const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
* DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data() # <<<<<<<<<<<<<<
*
* # Careful: LDA, LDB and LDC are given for F-ordered arrays
*/
__pyx_v_dist_middle_terms = (__pyx_v_self->__pyx_base.dist_middle_terms_chunks[__pyx_v_thread_num]).data();
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
* #
* # Here, we use their counterpart values to work with C-ordered arrays.
* BLAS_Order order = RowMajor # <<<<<<<<<<<<<<
* BLAS_Trans ta = NoTrans
* BLAS_Trans tb = Trans
*/
__pyx_v_order = __pyx_e_7sklearn_5utils_12_cython_blas_RowMajor;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
* # Here, we use their counterpart values to work with C-ordered arrays.
* BLAS_Order order = RowMajor
* BLAS_Trans ta = NoTrans # <<<<<<<<<<<<<<
* BLAS_Trans tb = Trans
* ITYPE_t m = X_c.shape[0]
*/
__pyx_v_ta = __pyx_e_7sklearn_5utils_12_cython_blas_NoTrans;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
* BLAS_Order order = RowMajor
* BLAS_Trans ta = NoTrans
* BLAS_Trans tb = Trans # <<<<<<<<<<<<<<
* ITYPE_t m = X_c.shape[0]
* ITYPE_t n = Y_c.shape[0]
*/
__pyx_v_tb = __pyx_e_7sklearn_5utils_12_cython_blas_Trans;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
* BLAS_Trans ta = NoTrans
* BLAS_Trans tb = Trans
* ITYPE_t m = X_c.shape[0] # <<<<<<<<<<<<<<
* ITYPE_t n = Y_c.shape[0]
* ITYPE_t K = X_c.shape[1]
*/
__pyx_v_m = (__pyx_v_X_c.shape[0]);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
* BLAS_Trans tb = Trans
* ITYPE_t m = X_c.shape[0]
* ITYPE_t n = Y_c.shape[0] # <<<<<<<<<<<<<<
* ITYPE_t K = X_c.shape[1]
* DTYPE_t alpha = - 2.
*/
__pyx_v_n = (__pyx_v_Y_c.shape[0]);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":242
* ITYPE_t m = X_c.shape[0]
* ITYPE_t n = Y_c.shape[0]
* ITYPE_t K = X_c.shape[1] # <<<<<<<<<<<<<<
* DTYPE_t alpha = - 2.
* # Casting for A and B to remove the const is needed because APIs exposed via
*/
__pyx_v_K = (__pyx_v_X_c.shape[1]);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":243
* ITYPE_t n = Y_c.shape[0]
* ITYPE_t K = X_c.shape[1]
* DTYPE_t alpha = - 2. # <<<<<<<<<<<<<<
* # Casting for A and B to remove the const is needed because APIs exposed via
* # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
*/
__pyx_v_alpha = -2.;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
* # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
* # See: https://github.com/scipy/scipy/issues/14262
* DTYPE_t * A = <DTYPE_t *> &X_c[0, 0] # <<<<<<<<<<<<<<
* DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
* ITYPE_t lda = X_c.shape[1]
*/
__pyx_t_4 = 0;
__pyx_t_5 = 0;
__pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_X_c.data + __pyx_t_4 * __pyx_v_X_c.strides[0]) )) + __pyx_t_5)) )))));
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
* # See: https://github.com/scipy/scipy/issues/14262
* DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
* DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0] # <<<<<<<<<<<<<<
* ITYPE_t lda = X_c.shape[1]
* ITYPE_t ldb = X_c.shape[1]
*/
__pyx_t_5 = 0;
__pyx_t_4 = 0;
__pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_Y_c.data + __pyx_t_5 * __pyx_v_Y_c.strides[0]) )) + __pyx_t_4)) )))));
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
* DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
* DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
* ITYPE_t lda = X_c.shape[1] # <<<<<<<<<<<<<<
* ITYPE_t ldb = X_c.shape[1]
* DTYPE_t beta = 0.
*/
__pyx_v_lda = (__pyx_v_X_c.shape[1]);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
* DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
* ITYPE_t lda = X_c.shape[1]
* ITYPE_t ldb = X_c.shape[1] # <<<<<<<<<<<<<<
* DTYPE_t beta = 0.
* ITYPE_t ldc = Y_c.shape[0]
*/
__pyx_v_ldb = (__pyx_v_X_c.shape[1]);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":251
* ITYPE_t lda = X_c.shape[1]
* ITYPE_t ldb = X_c.shape[1]
* DTYPE_t beta = 0. # <<<<<<<<<<<<<<
* ITYPE_t ldc = Y_c.shape[0]
*
*/
__pyx_v_beta = 0.;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":252
* ITYPE_t ldb = X_c.shape[1]
* DTYPE_t beta = 0.
* ITYPE_t ldc = Y_c.shape[0] # <<<<<<<<<<<<<<
*
* # dist_middle_terms = `-2 * X_c @ Y_c.T`
*/
__pyx_v_ldc = (__pyx_v_Y_c.shape[0]);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
*
* # dist_middle_terms = `-2 * X_c @ Y_c.T`
* _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc) # <<<<<<<<<<<<<<
*
* return dist_middle_terms
*/
__pyx_fuse_1__pyx_f_7sklearn_5utils_12_cython_blas__gemm(__pyx_v_order, __pyx_v_ta, __pyx_v_tb, __pyx_v_m, __pyx_v_n, __pyx_v_K, __pyx_v_alpha, __pyx_v_A, __pyx_v_lda, __pyx_v_B, __pyx_v_ldb, __pyx_v_beta, __pyx_v_dist_middle_terms, __pyx_v_ldc);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":257
* _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc)
*
* return dist_middle_terms # <<<<<<<<<<<<<<
*
*
*/
__pyx_r = __pyx_v_dist_middle_terms;
goto __pyx_L0;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
* return
*
* cdef DTYPE_t * _compute_dist_middle_terms( # <<<<<<<<<<<<<<
* self,
* ITYPE_t X_start,
*/
/* function exit code */
__pyx_L1_error:;
__PYX_XDEC_MEMVIEW(&__pyx_t_1, 0);
__PYX_XDEC_MEMVIEW(&__pyx_t_3, 0);
__Pyx_WriteUnraisable("sklearn.metrics._pairwise_distances_reduction._middle_term_computer.DenseDenseMiddleTermComputer64._compute_dist_middle_terms", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 1);
__pyx_r = 0;
__pyx_L0:;
__PYX_XDEC_MEMVIEW(&__pyx_v_X_c, 0);
__PYX_XDEC_MEMVIEW(&__pyx_v_Y_c, 0);
return __pyx_r;
}
Source code generated on this PR @ 7705579
cat gh-24715_extract.cpp
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
* return
*
* cdef DTYPE_t * _compute_dist_middle_terms( # <<<<<<<<<<<<<<
* self,
* ITYPE_t X_start,
*/
static __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_f_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_30DenseDenseMiddleTermComputer64__compute_dist_middle_terms(struct __pyx_obj_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_DenseDenseMiddleTermComputer64 *__pyx_v_self, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_thread_num) {
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_dist_middle_terms;
enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Order __pyx_v_order;
enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_ta;
enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_tb;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_m;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_n;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_K;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_alpha;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_A;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_B;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_lda;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldb;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_beta;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldc;
__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_r;
__pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_t_1;
Py_ssize_t __pyx_t_2;
Py_ssize_t __pyx_t_3;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":228
* ) nogil:
* cdef:
* DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data() # <<<<<<<<<<<<<<
*
* # Careful: LDA, LDB and LDC are given for F-ordered arrays
*/
__pyx_v_dist_middle_terms = (__pyx_v_self->__pyx_base.dist_middle_terms_chunks[__pyx_v_thread_num]).data();
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":235
* #
* # Here, we use their counterpart values to work with C-ordered arrays.
* BLAS_Order order = RowMajor # <<<<<<<<<<<<<<
* BLAS_Trans ta = NoTrans
* BLAS_Trans tb = Trans
*/
__pyx_v_order = __pyx_e_7sklearn_5utils_12_cython_blas_RowMajor;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":236
* # Here, we use their counterpart values to work with C-ordered arrays.
* BLAS_Order order = RowMajor
* BLAS_Trans ta = NoTrans # <<<<<<<<<<<<<<
* BLAS_Trans tb = Trans
* ITYPE_t m = X_end - X_start
*/
__pyx_v_ta = __pyx_e_7sklearn_5utils_12_cython_blas_NoTrans;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
* BLAS_Order order = RowMajor
* BLAS_Trans ta = NoTrans
* BLAS_Trans tb = Trans # <<<<<<<<<<<<<<
* ITYPE_t m = X_end - X_start
* ITYPE_t n = Y_end - Y_start
*/
__pyx_v_tb = __pyx_e_7sklearn_5utils_12_cython_blas_Trans;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
* BLAS_Trans ta = NoTrans
* BLAS_Trans tb = Trans
* ITYPE_t m = X_end - X_start # <<<<<<<<<<<<<<
* ITYPE_t n = Y_end - Y_start
* ITYPE_t K = self.n_features
*/
__pyx_v_m = (__pyx_v_X_end - __pyx_v_X_start);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
* BLAS_Trans tb = Trans
* ITYPE_t m = X_end - X_start
* ITYPE_t n = Y_end - Y_start # <<<<<<<<<<<<<<
* ITYPE_t K = self.n_features
* DTYPE_t alpha = - 2.
*/
__pyx_v_n = (__pyx_v_Y_end - __pyx_v_Y_start);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
* ITYPE_t m = X_end - X_start
* ITYPE_t n = Y_end - Y_start
* ITYPE_t K = self.n_features # <<<<<<<<<<<<<<
* DTYPE_t alpha = - 2.
* # Casting for A and B to remove the const is needed because APIs exposed via
*/
__pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
__pyx_v_K = __pyx_t_1;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
* ITYPE_t n = Y_end - Y_start
* ITYPE_t K = self.n_features
* DTYPE_t alpha = - 2. # <<<<<<<<<<<<<<
* # Casting for A and B to remove the const is needed because APIs exposed via
* # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
*/
__pyx_v_alpha = -2.;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":245
* # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
* # See: https://github.com/scipy/scipy/issues/14262
* DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0] # <<<<<<<<<<<<<<
* DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
* ITYPE_t lda = self.n_features
*/
__pyx_t_2 = __pyx_v_X_start;
__pyx_t_3 = 0;
__pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_self->X.data + __pyx_t_2 * __pyx_v_self->X.strides[0]) )) + __pyx_t_3)) )))));
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":246
* # See: https://github.com/scipy/scipy/issues/14262
* DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
* DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0] # <<<<<<<<<<<<<<
* ITYPE_t lda = self.n_features
* ITYPE_t ldb = self.n_features
*/
__pyx_t_3 = __pyx_v_Y_start;
__pyx_t_2 = 0;
__pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_self->Y.data + __pyx_t_3 * __pyx_v_self->Y.strides[0]) )) + __pyx_t_2)) )))));
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
* DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
* DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
* ITYPE_t lda = self.n_features # <<<<<<<<<<<<<<
* ITYPE_t ldb = self.n_features
* DTYPE_t beta = 0.
*/
__pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
__pyx_v_lda = __pyx_t_1;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
* DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
* ITYPE_t lda = self.n_features
* ITYPE_t ldb = self.n_features # <<<<<<<<<<<<<<
* DTYPE_t beta = 0.
* ITYPE_t ldc = Y_end - Y_start
*/
__pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
__pyx_v_ldb = __pyx_t_1;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
* ITYPE_t lda = self.n_features
* ITYPE_t ldb = self.n_features
* DTYPE_t beta = 0. # <<<<<<<<<<<<<<
* ITYPE_t ldc = Y_end - Y_start
*
*/
__pyx_v_beta = 0.;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
* ITYPE_t ldb = self.n_features
* DTYPE_t beta = 0.
* ITYPE_t ldc = Y_end - Y_start # <<<<<<<<<<<<<<
*
* # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
*/
__pyx_v_ldc = (__pyx_v_Y_end - __pyx_v_Y_start);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":253
*
* # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
* _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc) # <<<<<<<<<<<<<<
*
* return dist_middle_terms
*/
__pyx_fuse_1__pyx_f_7sklearn_5utils_12_cython_blas__gemm(__pyx_v_order, __pyx_v_ta, __pyx_v_tb, __pyx_v_m, __pyx_v_n, __pyx_v_K, __pyx_v_alpha, __pyx_v_A, __pyx_v_lda, __pyx_v_B, __pyx_v_ldb, __pyx_v_beta, __pyx_v_dist_middle_terms, __pyx_v_ldc);
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
* _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc)
*
* return dist_middle_terms # <<<<<<<<<<<<<<
*
*
*/
__pyx_r = __pyx_v_dist_middle_terms;
goto __pyx_L0;
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
* return
*
* cdef DTYPE_t * _compute_dist_middle_terms( # <<<<<<<<<<<<<<
* self,
* ITYPE_t X_start,
*/
/* function exit code */
__pyx_L0:;
return __pyx_r;
}
Difference between both
diff main_extract.cpp gh-24715_extract.cpp
10,11d9
< __Pyx_memviewslice __pyx_v_X_c = { 0, 0, { 0 }, { 0 }, { 0 } };
< __Pyx_memviewslice __pyx_v_Y_c = { 0, 0, { 0 }, { 0 }, { 0 } };
27,34c25,27
< __Pyx_memviewslice __pyx_t_1 = { 0, 0, { 0 }, { 0 }, { 0 } };
< int __pyx_t_2;
< __Pyx_memviewslice __pyx_t_3 = { 0, 0, { 0 }, { 0 }, { 0 } };
< Py_ssize_t __pyx_t_4;
< Py_ssize_t __pyx_t_5;
< int __pyx_lineno = 0;
< const char *__pyx_filename = NULL;
< int __pyx_clineno = 0;
---
> __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_t_1;
> Py_ssize_t __pyx_t_2;
> Py_ssize_t __pyx_t_3;
39,110d31
< * const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :] # <<<<<<<<<<<<<<
< * const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
< * DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
< */
< __pyx_t_1.data = __pyx_v_self->X.data;
< __pyx_t_1.memview = __pyx_v_self->X.memview;
< __PYX_INC_MEMVIEW(&__pyx_t_1, 0);
< __pyx_t_2 = -1;
< if (unlikely(__pyx_memoryview_slice_memviewslice(
< &__pyx_t_1,
< __pyx_v_self->X.shape[0], __pyx_v_self->X.strides[0], __pyx_v_self->X.suboffsets[0],
< 0,
< 0,
< &__pyx_t_2,
< __pyx_v_X_start,
< __pyx_v_X_end,
< 0,
< 1,
< 1,
< 0,
< 1) < 0))
< {
< __PYX_ERR(0, 228, __pyx_L1_error)
< }
<
< __pyx_t_1.shape[1] = __pyx_v_self->X.shape[1];
< __pyx_t_1.strides[1] = __pyx_v_self->X.strides[1];
< __pyx_t_1.suboffsets[1] = -1;
<
< __pyx_v_X_c = __pyx_t_1;
< __pyx_t_1.memview = NULL;
< __pyx_t_1.data = NULL;
<
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":229
< * cdef:
< * const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
< * const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :] # <<<<<<<<<<<<<<
< * DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
< *
< */
< __pyx_t_3.data = __pyx_v_self->Y.data;
< __pyx_t_3.memview = __pyx_v_self->Y.memview;
< __PYX_INC_MEMVIEW(&__pyx_t_3, 0);
< __pyx_t_2 = -1;
< if (unlikely(__pyx_memoryview_slice_memviewslice(
< &__pyx_t_3,
< __pyx_v_self->Y.shape[0], __pyx_v_self->Y.strides[0], __pyx_v_self->Y.suboffsets[0],
< 0,
< 0,
< &__pyx_t_2,
< __pyx_v_Y_start,
< __pyx_v_Y_end,
< 0,
< 1,
< 1,
< 0,
< 1) < 0))
< {
< __PYX_ERR(0, 229, __pyx_L1_error)
< }
<
< __pyx_t_3.shape[1] = __pyx_v_self->Y.shape[1];
< __pyx_t_3.strides[1] = __pyx_v_self->Y.strides[1];
< __pyx_t_3.suboffsets[1] = -1;
<
< __pyx_v_Y_c = __pyx_t_3;
< __pyx_t_3.memview = NULL;
< __pyx_t_3.data = NULL;
<
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":230
< * const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
< * const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
117c38
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":235
126c47
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":236
131c52
< * ITYPE_t m = X_c.shape[0]
---
> * ITYPE_t m = X_end - X_start
135c56
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
139,140c60,61
< * ITYPE_t m = X_c.shape[0]
< * ITYPE_t n = Y_c.shape[0]
---
> * ITYPE_t m = X_end - X_start
> * ITYPE_t n = Y_end - Y_start
144c65
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
147,149c68,70
< * ITYPE_t m = X_c.shape[0] # <<<<<<<<<<<<<<
< * ITYPE_t n = Y_c.shape[0]
< * ITYPE_t K = X_c.shape[1]
---
> * ITYPE_t m = X_end - X_start # <<<<<<<<<<<<<<
> * ITYPE_t n = Y_end - Y_start
> * ITYPE_t K = self.n_features
151c72
< __pyx_v_m = (__pyx_v_X_c.shape[0]);
---
> __pyx_v_m = (__pyx_v_X_end - __pyx_v_X_start);
153c74
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
155,157c76,78
< * ITYPE_t m = X_c.shape[0]
< * ITYPE_t n = Y_c.shape[0] # <<<<<<<<<<<<<<
< * ITYPE_t K = X_c.shape[1]
---
> * ITYPE_t m = X_end - X_start
> * ITYPE_t n = Y_end - Y_start # <<<<<<<<<<<<<<
> * ITYPE_t K = self.n_features
160c81
< __pyx_v_n = (__pyx_v_Y_c.shape[0]);
---
> __pyx_v_n = (__pyx_v_Y_end - __pyx_v_Y_start);
162,165c83,86
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":242
< * ITYPE_t m = X_c.shape[0]
< * ITYPE_t n = Y_c.shape[0]
< * ITYPE_t K = X_c.shape[1] # <<<<<<<<<<<<<<
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
> * ITYPE_t m = X_end - X_start
> * ITYPE_t n = Y_end - Y_start
> * ITYPE_t K = self.n_features # <<<<<<<<<<<<<<
169c90,91
< __pyx_v_K = (__pyx_v_X_c.shape[1]);
---
> __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
> __pyx_v_K = __pyx_t_1;
171,173c93,95
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":243
< * ITYPE_t n = Y_c.shape[0]
< * ITYPE_t K = X_c.shape[1]
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
> * ITYPE_t n = Y_end - Y_start
> * ITYPE_t K = self.n_features
180c102
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":245
183,189c105,111
< * DTYPE_t * A = <DTYPE_t *> &X_c[0, 0] # <<<<<<<<<<<<<<
< * DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
< * ITYPE_t lda = X_c.shape[1]
< */
< __pyx_t_4 = 0;
< __pyx_t_5 = 0;
< __pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_X_c.data + __pyx_t_4 * __pyx_v_X_c.strides[0]) )) + __pyx_t_5)) )))));
---
> * DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0] # <<<<<<<<<<<<<<
> * DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
> * ITYPE_t lda = self.n_features
> */
> __pyx_t_2 = __pyx_v_X_start;
> __pyx_t_3 = 0;
> __pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_self->X.data + __pyx_t_2 * __pyx_v_self->X.strides[0]) )) + __pyx_t_3)) )))));
191c113
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":246
193,200c115,122
< * DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
< * DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0] # <<<<<<<<<<<<<<
< * ITYPE_t lda = X_c.shape[1]
< * ITYPE_t ldb = X_c.shape[1]
< */
< __pyx_t_5 = 0;
< __pyx_t_4 = 0;
< __pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_Y_c.data + __pyx_t_5 * __pyx_v_Y_c.strides[0]) )) + __pyx_t_4)) )))));
---
> * DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
> * DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0] # <<<<<<<<<<<<<<
> * ITYPE_t lda = self.n_features
> * ITYPE_t ldb = self.n_features
> */
> __pyx_t_3 = __pyx_v_Y_start;
> __pyx_t_2 = 0;
> __pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const *) ( /* dim=0 */ (__pyx_v_self->Y.data + __pyx_t_3 * __pyx_v_self->Y.strides[0]) )) + __pyx_t_2)) )))));
202,206c124,128
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
< * DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
< * DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
< * ITYPE_t lda = X_c.shape[1] # <<<<<<<<<<<<<<
< * ITYPE_t ldb = X_c.shape[1]
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
> * DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
> * DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
> * ITYPE_t lda = self.n_features # <<<<<<<<<<<<<<
> * ITYPE_t ldb = self.n_features
209c131,132
< __pyx_v_lda = (__pyx_v_X_c.shape[1]);
---
> __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
> __pyx_v_lda = __pyx_t_1;
211,214c134,137
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
< * DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
< * ITYPE_t lda = X_c.shape[1]
< * ITYPE_t ldb = X_c.shape[1] # <<<<<<<<<<<<<<
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
> * DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
> * ITYPE_t lda = self.n_features
> * ITYPE_t ldb = self.n_features # <<<<<<<<<<<<<<
216c139
< * ITYPE_t ldc = Y_c.shape[0]
---
> * ITYPE_t ldc = Y_end - Y_start
218c141,142
< __pyx_v_ldb = (__pyx_v_X_c.shape[1]);
---
> __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
> __pyx_v_ldb = __pyx_t_1;
220,222c144,146
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":251
< * ITYPE_t lda = X_c.shape[1]
< * ITYPE_t ldb = X_c.shape[1]
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
> * ITYPE_t lda = self.n_features
> * ITYPE_t ldb = self.n_features
224c148
< * ITYPE_t ldc = Y_c.shape[0]
---
> * ITYPE_t ldc = Y_end - Y_start
229,230c153,154
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":252
< * ITYPE_t ldb = X_c.shape[1]
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
> * ITYPE_t ldb = self.n_features
232c156
< * ITYPE_t ldc = Y_c.shape[0] # <<<<<<<<<<<<<<
---
> * ITYPE_t ldc = Y_end - Y_start # <<<<<<<<<<<<<<
234c158
< * # dist_middle_terms = `-2 * X_c @ Y_c.T`
---
> * # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
236c160
< __pyx_v_ldc = (__pyx_v_Y_c.shape[0]);
---
> __pyx_v_ldc = (__pyx_v_Y_end - __pyx_v_Y_start);
238c162
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":253
240c164
< * # dist_middle_terms = `-2 * X_c @ Y_c.T`
---
> * # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
247c171
< /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":257
---
> /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
266,270d189
< __pyx_L1_error:;
< __PYX_XDEC_MEMVIEW(&__pyx_t_1, 0);
< __PYX_XDEC_MEMVIEW(&__pyx_t_3, 0);
< __Pyx_WriteUnraisable("sklearn.metrics._pairwise_distances_reduction._middle_term_computer.DenseDenseMiddleTermComputer64._compute_dist_middle_terms", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 1);
< __pyx_r = 0;
272,273d190
< __PYX_XDEC_MEMVIEW(&__pyx_v_X_c, 0);
< __PYX_XDEC_MEMVIEW(&__pyx_v_Y_c, 0);
Still, I am rerunning benchmarks using 8 threads.
Performance has not significantly changed.
Details
asv continuous -b PairwiseDistancesR -e upstream/main maint/pdr-do-not-slice
Since the performance impact of slicing vs pointer arithmetic is negligible in this case (thanks for checking), I would go for the solution that is the most readable. One can argue that the new version is slightly better because it removes the
Reference Issues/PRs
Relates to #22587.
Originally part of #24542.
What does this implement/fix? Explain your changes.
See the reasons here: #17299
Any other comments?