Truly parallel execution of pairwise_kernels and pairwise_distances #29587

Closed
@stepan-srsen

Description

Describe the workflow you want to enable

Both pairwise_kernels and pairwise_distances call the _parallel_pairwise helper, which, contrary to its name, does not actually run in parallel for Python-level metrics: it enforces the threading backend, and the GIL serializes pure-Python code. As a result, these functions are terribly slow, especially for computationally expensive user-defined metrics. I understand that the threading backend was probably chosen to avoid large memory demands and data-communication overhead, but I suggest a different approach below. Moreover, the documentation for these functions talks about parallel execution and processes, which is currently simply not true.
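To illustrate the problem, here is a minimal stdlib-only sketch of the current strategy (this is not the scikit-learn code; parallel_pairwise_threads and pairwise_block are hypothetical names): only Y is sliced, and the slices go to threads, so a pure-Python metric still executes under the GIL and gains little or nothing.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist  # Euclidean distance between two points (Python 3.8+)

def pairwise_block(X, Y_slice):
    """Distance matrix between all of X and one slice of Y."""
    return [[dist(x, y) for y in Y_slice] for x in X]

def parallel_pairwise_threads(X, Y, n_jobs=4):
    """Mimics the current strategy: slice only Y and dispatch the slices
    to *threads*.  For a pure-Python metric the GIL serializes the work,
    so the wall-clock time barely improves over a serial loop."""
    step = max(1, len(Y) // n_jobs)
    slices = [Y[i:i + step] for i in range(0, len(Y), step)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        blocks = list(pool.map(lambda s: pairwise_block(X, s), slices))
    # Stitch the column blocks back into full rows.
    return [sum((block[i] for block in blocks), []) for i in range(len(X))]

X = [(0.0, 0.0), (3.0, 4.0)]
Y = [(0.0, 0.0), (6.0, 8.0)]
print(parallel_pairwise_threads(X, Y, n_jobs=2))  # [[0.0, 10.0], [5.0, 5.0]]
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor would give true parallelism here, but every worker would then receive a pickled copy of the full X, which is the memory/communication cost discussed below.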

Describe your proposed solution

The memory and data-communication issues can be reduced by distributing the input data to the individual processes more cleverly. Right now, only Y is sliced in the _parallel_pairwise function, which is suboptimal for multiprocessing because every worker would still need a full copy of X. Slicing both X and Y lowers the per-process data volume. For example, with 100x100 X and Y distributed over 100 processes, slicing only Y means copying 100+1 rows to every process (all of X plus one row of Y), whereas slicing both X and Y into a 10x10 grid of blocks means copying only 10+10 rows per process. With that reduction, multiprocessing could be allowed. Also, joblib does automatic memmapping of large arrays in some cases, which would further reduce the copying cost.
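The copy-count arithmetic above can be checked with a short stdlib-only sketch (rows_copied_per_process is a hypothetical helper written for this issue, not part of scikit-learn; it assumes the process count is a perfect square when slicing both inputs):

```python
from math import isqrt

def rows_copied_per_process(n_x, n_y, n_procs, slice_both):
    """Rows of input data shipped to each worker process.

    slice_both=False: every process receives all of X plus one
                      1/n_procs slice of Y (the current scheme).
    slice_both=True : the processes form a g x g grid, g = sqrt(n_procs),
                      and each receives one row block of X plus one
                      column block of Y (the proposed scheme).
    """
    if slice_both:
        g = isqrt(n_procs)  # sketch assumes n_procs is a perfect square
        return n_x // g + n_y // g
    return n_x + n_y // n_procs

# The example from this issue: 100x100 X and Y on 100 processes.
print(rows_copied_per_process(100, 100, 100, slice_both=False))  # 101
print(rows_copied_per_process(100, 100, 100, slice_both=True))   # 20
```

So slicing both inputs cuts the per-process communication volume by roughly a factor of five in this example, and the gap grows with the number of processes.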

Alternatively, at least the documentation for pairwise_kernels and pairwise_distances should be corrected.

Describe alternatives you've considered, if relevant

No response

Additional context

No response
