Euclidean pairwise_distances slower for n_jobs > 1 #8216

As a follow-up to issue #8213, it looks like using `n_jobs > 1` with Euclidean `pairwise_distances` makes computations slower instead of speeding them up.

#### Steps to reproduce

```py
# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
from sklearn.metrics import pairwise_distances
import numpy as np

np.random.seed(99999)
n_dim = 200

for n_train, n_test in [(1000, 100000),
                        (10000, 10000),
                        (100000, 1000)]:
    print('\n# n_train={}, n_test={}, n_dim={}\n'.format(
        n_train, n_test, n_dim))

    X_train = np.random.rand(n_train, n_dim)
    X_test = np.random.rand(n_test, n_dim)

    for n_jobs in [1, 2]:
        print('n_jobs=', n_jobs, ' => ', end='')
        # squared=True is forwarded to the Euclidean metric function.
        %timeit pairwise_distances(X_train, X_test, 'euclidean', n_jobs=n_jobs, squared=True)
```
which, on a 2-core CPU, returns:
```
# n_train=1000, n_test=100000, n_dim=200

n_jobs= 1  => 1 loop, best of 3: 1.92 s per loop
n_jobs= 2  => 1 loop, best of 3: 4.95 s per loop

# n_train=10000, n_test=10000, n_dim=200

n_jobs= 1  => 1 loop, best of 3: 1.89 s per loop
n_jobs= 2  => 1 loop, best of 3: 4.74 s per loop

# n_train=100000, n_test=1000, n_dim=200

n_jobs= 1  => 1 loop, best of 3: 2 s per loop
n_jobs= 2  => 1 loop, best of 3: 5.6 s per loop
```
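One thing worth noting in these timings: all three shapes produce a result matrix with the same number of entries (n_train × n_test = 10^8), which is consistent with the `n_jobs=1` timings being nearly identical. A quick back-of-the-envelope check of the output size, assuming float64 results (8 bytes per entry); `distance_matrix_bytes` is just a hypothetical helper for the arithmetic:

```python
def distance_matrix_bytes(n_train, n_test, itemsize=8):
    """Size in bytes of an (n_train, n_test) result matrix.

    Assumes float64 entries (itemsize=8), NumPy's default dtype for
    the np.random.rand inputs used in the reproducer.
    """
    return n_train * n_test * itemsize


for n_train, n_test in [(1000, 100000), (10000, 10000), (100000, 1000)]:
    size = distance_matrix_bytes(n_train, n_test)
    # every configuration prints 800 MB
    print('n_train={}, n_test={}: {:.0f} MB'.format(n_train, n_test, size / 1e6))
```

If `n_jobs > 1` runs the work in separate processes, result blocks of this size presumably have to be copied back to the parent process, a cost that `n_jobs=1` does not pay. This is a hypothesis, not something measured here.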
For small datasets it would make sense that parallel processing does not improve performance, due to multiprocessing and related overhead, but these are by no means small datasets. Moreover, the compute time does not decrease when using, e.g., `n_jobs=4` on a 4-core CPU.
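To make the overhead argument more concrete, here is a minimal sketch of a common way to parallelize a pairwise computation: split `Y` into `n_jobs` contiguous row blocks, compute each block of the distance matrix independently, and stack the blocks horizontally. My understanding is that scikit-learn's `n_jobs > 1` path follows this pattern via joblib, but the helper names below (`even_slices`, `chunked_pairwise`, `sq_euclidean`) are hypothetical, and the sketch uses threads rather than processes, so it leaves out the inter-process copying a process-based backend would add.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def even_slices(n, n_chunks):
    """Split range(n) into n_chunks contiguous, nearly equal slices."""
    bounds = np.linspace(0, n, n_chunks + 1).astype(int)
    return [slice(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]


def chunked_pairwise(func, X, Y, n_jobs):
    """Evaluate func(X, Y[s]) for each row block s of Y, then hstack.

    Each call yields an (n_X, len(block)) piece of the full matrix.
    Threads are used for simplicity; with processes, X, each Y[s] and
    each result block would also have to be copied between processes.
    """
    slices = even_slices(Y.shape[0], n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        blocks = list(pool.map(lambda s: func(X, Y[s]), slices))
    return np.hstack(blocks)


def sq_euclidean(X, Y):
    """Naive squared Euclidean distances via broadcasting (small inputs only)."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)


rng = np.random.RandomState(0)
X = rng.rand(50, 5)
Y = rng.rand(70, 5)
serial = sq_euclidean(X, Y)
parallel = chunked_pairwise(sq_euclidean, X, Y, n_jobs=2)
print(np.allclose(serial, parallel))  # True
```

The chunked result is identical to the serial one; any speedup therefore depends entirely on whether the per-block work outweighs the cost of dispatching blocks and collecting the results.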

The slowdown also occurs for other numbers of dimensions:
**n_dim=10**
```
# n_train=1000, n_test=100000, n_dim=10

n_jobs= 1  => 1 loop, best of 3: 873 ms per loop
n_jobs= 2  => 1 loop, best of 3: 4.25 s per loop
```
**n_dim=1000**
```
# n_train=1000, n_test=100000, n_dim=1000

n_jobs= 1  => 1 loop, best of 3: 6.56 s per loop
n_jobs= 2  => 1 loop, best of 3: 8.56 s per loop
```
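A plausible reading of the `n_dim` results: as far as I know, scikit-learn's `euclidean_distances` relies on the expansion `||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2`, so the serial cost is dominated by one matrix product `X @ Y.T`. That cost grows with `n_dim`, while the size of the result (and any cost of moving it between processes) does not, which would fit the smaller relative slowdown at `n_dim=1000`. If NumPy is linked against a multithreaded BLAS (e.g. MKL, as Anaconda builds typically are), that product may already use several cores with `n_jobs=1`. The sketch below only illustrates the expansion with NumPy; it is not scikit-learn's implementation, and `sq_euclidean_expanded` is a hypothetical name.

```python
import numpy as np


def sq_euclidean_expanded(X, Y):
    """Squared Euclidean distances via ||x||^2 - 2 x.y + ||y||^2.

    The (n_x, n_y) cross term comes from a single matrix product,
    which NumPy delegates to its BLAS library.
    """
    XX = np.einsum('ij,ij->i', X, X)[:, np.newaxis]  # ||x||^2 as a column
    YY = np.einsum('ij,ij->i', Y, Y)[np.newaxis, :]  # ||y||^2 as a row
    D = XX - 2 * (X @ Y.T) + YY
    # Rounding can make tiny distances slightly negative; clip them to 0.
    np.maximum(D, 0, out=D)
    return D


rng = np.random.RandomState(0)
X = rng.rand(40, 200)
Y = rng.rand(30, 200)
naive = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(sq_euclidean_expanded(X, Y), naive))  # True
```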

Running `benchmarks/bench_plot_parallel_pairwise.py` also yields similar results:
![untitled](https://cloud.githubusercontent.com/assets/630936/22105598/240d33bc-de45-11e6-8b68-3ce97b6d7e2d.png)

This might affect a number of estimators and metrics that use `pairwise_distances`.

#### Versions
```
Linux-4.6.0-gentoo-x86_64-Intel-R-_Core-TM-_i5-6200U_CPU_@_2.30GHz-with-gentoo-2.3
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.1
SciPy 0.18.1
Scikit-Learn 0.18.1
```
I also get similar results with scikit-learn 0.17.1.
