Truly parallel execution of pairwise_kernels and pairwise_distances #29587
Some CPU-intensive operations in numpy or pandas do release the GIL, so the threading backend might be a good choice. Furthermore, CPython 3.13 will come with an optional build flag that should allow running without any GIL (also known as the "free threading" mode of Python). It's not yet operational (it can cause segfaults in some cases), but @ngoldbaum, @lesteve and others are making progress at identifying and fixing the remaining problems. See #28978 for a tracking issue on the scikit-learn side. It could be interesting to see if that can help with your specific workload before deciding to embark on a specific refactoring strategy for these functions.
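As a quick illustration of why the backend choice matters here: a pure-Python loop holds the GIL on a standard CPython build, so threads give it no speed-up, while a free-threaded build (or GIL-releasing numpy calls) can scale. A minimal stdlib-only sketch (the timings are printed, not asserted, since they depend on the machine and build):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def py_work(n):
    # Pure-Python loop: holds the GIL on a standard CPython build.
    total = 0
    for i in range(n):
        total += i * i
    return total

N, WORKERS = 200_000, 4

t0 = time.perf_counter()
serial = [py_work(N) for _ in range(WORKERS)]
t_serial = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as ex:
    threaded = list(ex.map(py_work, [N] * WORKERS))
t_threads = time.perf_counter() - t0

# On a GIL build, t_threads is roughly t_serial (no parallel speed-up);
# on a free-threaded build it should approach t_serial / WORKERS.
print(f"serial: {t_serial:.3f}s  threaded: {t_threads:.3f}s")
```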
I am worried about relying on this too much. In retrospect, I have the feeling that it's a bit of brittle black magic that is complex to debug and maintain, and that it is hard to reason about its performance implications (both in terms of memory usage and computational overhead). More generally, process-based parallelism can lead to hard-to-debug situations (see the recently opened #29579 for instance), so I would rather rely more on thread-based parallelism than process-based parallelism in scikit-learn in the future.
Hi @ogrisel,
True, that's why I wrote that it is especially severe for user-defined CPU-bound metric functions. I will try to put together some benchmarks for a common sklearn kernel to see the effect there.
That would be super nice to have. The question is how long it might take to make it practically usable in sklearn.
I observed a huge impact for my user-defined CPU-intensive metric function but for most people, predefined sklearn metrics are more relevant so I will benchmark that first.
Totally agree, automatic memmapping was just a side note
Sure, threading is better if it provides the same speed-up; it is just very often not the case with Python. That's why many sklearn functions use the default joblib backend based on multiprocessing. If free threading proves to work well and becomes available in sklearn, it might help a lot. But as of now, multiprocessing is the only way to get full parallelism. I will soon provide some benchmark data.
This is already usable in the sense that we have a CI for CPython 3.13 free-threaded and all the scikit-learn tests pass. If you have time to try CPython 3.13 free-threaded on your particular use case, feedback would be super useful and welcome 🙏. As of today, you have a few different ways to install CPython 3.13 free-threaded locally (through conda, your Linux distribution package manager, python.org downloads, etc.); see the py-free-threading.github.io doc. numpy, scipy, scikit-learn, and other projects have development wheels for CPython 3.13 free-threaded; see this doc. With this kind of ongoing work there may be some caveats along the road; for example, my understanding is that there is currently a hit on single-threaded performance in CPython 3.13 free-threaded, but this will be improved in the future. Not sure about the details; this may be quickly mentioned at one point in Anthony Shaw's PyCon US 2024 talk video.
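When trying a free-threaded build, it can help to verify at runtime that the GIL is actually disabled, since a free-threaded interpreter can still re-enable the GIL (e.g. when an incompatible extension module is imported). A small sketch; `sys._is_gil_enabled()` only exists on CPython 3.13+, so we fall back gracefully on older versions:

```python
import sys
import sysconfig

def gil_status():
    """Return (free_threaded_build, gil_currently_enabled)."""
    # Py_GIL_DISABLED is set at build time for free-threaded CPython.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() reports the runtime state on CPython 3.13+;
    # on older interpreters the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled

print(gil_status())
```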
@ogrisel Multithreading works better there in general. However, for the CPU-bound case, multiprocessing is way better: in fact, threading has no effect at all for a CPU-bound task. Also, slicing both X and Y decreases the communication and memory overhead, and maybe there is an even better slicing scheme than the one I used. You can find the code below. The problem I see is that the user does not have the option to choose between threading and multiprocessing. As the joblib manual states, one should not hardcode the backend, so the backend choice should be left to the user (cc @lesteve).

```python
import math
import time
from functools import partial

import matplotlib.pyplot as plt
import numpy as np
from joblib import Parallel, delayed, effective_n_jobs, parallel_config
from sklearn.metrics.pairwise import (
    _pairwise_callable,
    _parallel_pairwise,
    _return_float_dtype,
    rbf_kernel,
)
from sklearn.utils import gen_even_slices
from sklearn.utils.validation import _num_samples


def rbf_kernel2(x, y, gamma=None):
    # Pure-Python RBF kernel between two vectors (CPU-bound, GIL-holding).
    if gamma is None:
        gamma = 1.0 / len(x)
    k = 0.0
    for i in range(len(x)):
        k += (x[i] - y[i]) ** 2
    return math.exp(-gamma * k)


def _parallel_pairwise2(X, Y, func, n_jobs, **kwds):
    """Break the pairwise matrix into n_jobs even slices of Y
    and compute them in parallel."""
    if Y is None:
        Y = X
    X, Y, dtype = _return_float_dtype(X, Y)
    if effective_n_jobs(n_jobs) == 1:
        return func(X, Y, **kwds)
    ret = np.empty((X.shape[0], Y.shape[0]), dtype=dtype, order="F")
    slices = list(gen_even_slices(_num_samples(Y), effective_n_jobs(n_jobs)))
    out = Parallel(n_jobs=n_jobs)(
        delayed(func)(X, Y[s], **kwds) for s in slices
    )
    for i, s in enumerate(slices):
        ret[:, s] = out[i]
    return ret


def _parallel_pairwise3(X, Y, func, n_jobs, **kwds):
    """Break the pairwise matrix into an even grid of X and Y slices
    and compute them in parallel."""
    if Y is None:
        Y = X
    X, Y, dtype = _return_float_dtype(X, Y)
    if effective_n_jobs(n_jobs) == 1:
        return func(X, Y, **kwds)
    ret = np.empty((X.shape[0], Y.shape[0]), dtype=dtype, order="F")
    eff_jobs = effective_n_jobs(n_jobs)
    n_jobs1 = int(np.sqrt(eff_jobs))
    n_jobs2 = eff_jobs // n_jobs1
    slices1 = list(gen_even_slices(_num_samples(X), n_jobs1))
    slices2 = list(gen_even_slices(_num_samples(Y), n_jobs2))
    out = Parallel(n_jobs=n_jobs)(
        delayed(func)(X[s1], Y[s2], **kwds)
        for s1 in slices1 for s2 in slices2
    )
    i = 0
    for s1 in slices1:
        for s2 in slices2:
            ret[s1, s2] = out[i]
            i += 1
    return ret


for n_samples, metric, title in [
    (5000, rbf_kernel, 'IO-bound'),
    (100, partial(_pairwise_callable, metric=rbf_kernel2), 'CPU-bound'),
]:
    n_jobs = np.arange(1, 11)
    X, Y = np.random.rand(n_samples, 500), np.random.rand(n_samples, 500)
    results = [[] for _ in range(5)]
    for n in n_jobs:
        with parallel_config(backend='threading'):
            result = %timeit -o -n 1 -r 10 _parallel_pairwise(X, Y, metric, n_jobs=n)
            results[0].append(result)
            result = %timeit -o -n 1 -r 10 _parallel_pairwise2(X, Y, metric, n_jobs=n)
            results[1].append(result)
            result = %timeit -o -n 1 -r 10 _parallel_pairwise3(X, Y, metric, n_jobs=n)
            results[2].append(result)
        with parallel_config(backend='loky'):
            result = %timeit -o -n 1 -r 10 _parallel_pairwise2(X, Y, metric, n_jobs=n)
            results[3].append(result)
            result = %timeit -o -n 1 -r 10 _parallel_pairwise3(X, Y, metric, n_jobs=n)
            results[4].append(result)
        print()
    times = [[np.mean(result.timings) for result in res] for res in results]
    for scale in ['linear', 'log']:
        plt.figure()
        plt.title('_parallel_pairwise, ' + title + ', ' + scale + ' scale, '
                  + str(n_samples) + ' samples')
        plt.xlim(np.min(n_jobs), np.max(n_jobs))
        if scale == 'linear':
            plt.ylim(0, np.max(times))
        plt.xlabel('# of CPUs')
        plt.ylabel('times [s]')
        plt.xscale(scale)
        plt.yscale(scale)
        plt.plot(n_jobs, times[0], label='threading: original')
        plt.plot(n_jobs, times[1], label='threading: mod. Y slicing')
        plt.plot(n_jobs, times[2], label='threading: mod. X+Y slicing')
        plt.plot(n_jobs, times[3], label='multiprocessing: mod. Y slicing')
        plt.plot(n_jobs, times[4], label='multiprocessing: mod. X+Y slicing')
        plt.legend()
        plt.show()
```
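The slicing logic in the snippet above relies on `sklearn.utils.gen_even_slices`. To make its behavior concrete without requiring scikit-learn, here is a stand-in `even_slices` generator that follows the same splitting rule (contiguous, near-even chunks, with the remainder spread over the first slices), plus the near-square grid factorization used by `_parallel_pairwise3`:

```python
import math

def even_slices(n, n_packs):
    """Yield contiguous slices splitting range(n) into n_packs near-even
    chunks (same splitting rule as sklearn.utils.gen_even_slices)."""
    start = 0
    for pack_num in range(n_packs):
        this_n = n // n_packs
        if pack_num < n % n_packs:
            this_n += 1  # distribute the remainder over the first packs
        if this_n > 0:
            yield slice(start, start + this_n)
            start += this_n

print(list(even_slices(10, 3)))  # [slice(0, 4), slice(4, 7), slice(7, 10)]

# Near-square grid of workers for X+Y slicing, as in _parallel_pairwise3:
eff_jobs = 6
g1 = int(math.sqrt(eff_jobs))  # row slices of X
g2 = eff_jobs // g1            # column slices of Y
print(g1, g2)                  # 2 3
```

Note that `g1 * g2` can be smaller than `eff_jobs` when the worker count is not a product of two near-equal factors (e.g. a prime), which is one reason the grid slicing is suboptimal for some CPU counts.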
Thank you very much for your analysis. Looking forward to seeing the results with free-threading Python if someone has time to set it up.
@stepan-srsen could you add the output of your dependency versions? I am trying to run your snippet and, full disclosure, for now I am getting weird results on vanilla Python, so I must be doing something wrong; I need to have a closer look ...
@lesteve @ogrisel Everything, even multiprocessing, was tested with the GIL turned off. None of the approaches actually accelerates the calculation in this case/setup when allowing parallelism inside the called functions. These results correspond to the results from the regular Python setup above. Interestingly, BLAS threading can speed up single-core execution, but its interplay with the parallelism used within the pairwise functions is unclear. For the CPU-bound tasks, the picture is clear: free threading performs the same as multiprocessing or even slightly better (for numbers of CPUs for which the slicing is suboptimal), and the best times are basically the same as in my previous setup. I set the BLAS libraries to single-threaded mode. Dependency versions in the new setup:
@stepan-srsen just to make sure: do you set the BLAS libraries to single-threaded mode? I think not setting this could distort the comparison. I was also wondering how the outer-loop parallelism performance (i.e. what you are showing) with the BLAS library in single-threaded mode compares to no parallelism in the outer loop while letting the BLAS library use all the cores, but I did not have time to look at this ...
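One common way to pin the BLAS/OpenMP thread pools to a single thread for a benchmark like this is via environment variables. A sketch; the variable names below cover the usual OpenMP, OpenBLAS, and MKL builds, and they must be set before numpy/scipy first load their BLAS (otherwise the threadpoolctl package can limit the pools at runtime instead):

```python
import os

# Pin BLAS/OpenMP thread pools to one thread so that only the outer
# joblib loop parallelizes. Must run before numpy/scipy are imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

print({var: os.environ[var]
       for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")})
```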
That's already great news! Thanks very much for taking the time to run this. For the "IO-bound" case, it would be worth investigating the interaction with the different implementations of multithreaded BLAS used internally by numpy and scikit-learn (via scipy).
Thanks for updating the benchmark results. Given those results, I think we can keep scikit-learn's current behavior.
@lesteve @ogrisel
+1 for using the
Hi @ogrisel @lesteve, another thought: will Python 3.13 support free threading by default, or will it need some special compilation? If the latter, then sklearn shouldn't just assume that the user compiles Python with an experimental feature. BTW, you can remove the Needs Benchmarks and Needs Reproducible Code tags.
The official Python 3.13 binaries provided by python.org will not be free-threaded by default, but some conda channels (and maybe other distributions) will ship both free-threaded and GIL-based Python versions. scikit-learn will probably ship wheels for both ABIs, and users will be free to install what they want. If conda-forge is ready in time, we will also ship free-threaded scikit-learn packages on conda-forge. I don't think the full Python ecosystem will be 100% free-threading ready at the time of the CPython 3.13 release (in October), but I hope that most of the core scientific packages will be ready. My point is that I would rather invest dev effort in making free threading work as well as possible, as early as possible (in 2024 or 2025), than invest effort in working around GIL and process-based parallelism limitations in the medium term. I am fine with short-term pragmatic improvements for process-based parallelism if they do not add too much complexity to our code bases (scikit-learn, joblib, loky), though.
@ogrisel Results using the function below and NOT using free threading:
Describe the workflow you want to enable

Both `pairwise_kernels` and `pairwise_distances` call the `_parallel_pairwise` function, which is (contrary to its name) not parallel as it enforces the threading backend. Therefore, these functions are terribly slow, especially for computationally expensive user-defined metrics. I understand that the reasons for the threading backend are possibly large memory demands and data communication overhead, but I suggest a different approach. Also, the documentation for these functions talks about parallel execution and processes, which is currently simply not true.

Describe your proposed solution

The memory and data communication issues can be reduced by a smarter distribution of the input data to individual processes. Right now, only `Y` is sliced in the `_parallel_pairwise` function, which is suboptimal for parallel processing. Both `X` and `Y` should be sliced to lower the demands of multiprocessing. For example, for 100x100 `X` and `Y` distributed to 100 processes, we have to copy 100+1 inputs to every process when slicing only `Y`, but only 10+10 when slicing both `X` and `Y`. As a result, multiprocessing can be allowed. Also, joblib does automatic memmapping in some cases.

Alternatively, at least the documentation for `pairwise_kernels` and `pairwise_distances` should be corrected.

Describe alternatives you've considered, if relevant
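The 100+1 vs 10+10 figures above can be reproduced with a little arithmetic. A hedged sketch (the function name and signature are made up for illustration; the grid factorization mirrors the near-square split proposed above):

```python
import math

def rows_copied_per_worker(n_x, n_y, n_workers, slice_x_too):
    """Rows of input data shipped to each worker under the two schemes."""
    if slice_x_too:
        # Near-square grid of slices: g1 row-slices of X, g2 column-slices of Y.
        g1 = int(math.sqrt(n_workers))
        g2 = n_workers // g1
        return math.ceil(n_x / g1) + math.ceil(n_y / g2)
    # Current scheme: full X plus one even slice of Y per worker.
    return n_x + math.ceil(n_y / n_workers)

print(rows_copied_per_worker(100, 100, 100, slice_x_too=False))  # 101
print(rows_copied_per_worker(100, 100, 100, slice_x_too=True))   # 20
```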
No response
Additional context
No response