
#4475 : Add a safe_pairwise_distances function, dealing with zero varian... #4495


Closed
LowikC wants to merge 2 commits into scikit-learn:master from Nzeuwik:issue_4475

Conversation

LowikC

@LowikC LowikC commented Apr 2, 2015

#4475 : Add a safe_pairwise_distances function, dealing with zero variance samples when using correlation metric.

The best fix would be to have the metric not return NaN values, but since the correlation metric is actually computed by scipy, we can't modify it directly.
So, when metric=='correlation', we replace the rows and columns corresponding to zero-variance samples with the maximum distance (here 1.0).
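A minimal numpy sketch of the post-processing step described above (the function name and signature here are illustrative, not the PR's actual code):

```python
import numpy as np

def fix_correlation_distances(D, X, Y=None, max_dist=1.0):
    # Illustrative sketch of the described fix: given a correlation-distance
    # matrix D computed from X (and optionally Y), overwrite the rows and
    # columns belonging to zero-variance samples (where scipy yields NaN)
    # with the maximum distance max_dist.
    D = D.copy()
    zero_var_rows = np.var(X, axis=1) == 0
    D[zero_var_rows, :] = max_dist
    Yv = X if Y is None else Y
    zero_var_cols = np.var(Yv, axis=1) == 0
    D[:, zero_var_cols] = max_dist
    return D
```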


@landscape-bot

Code Health
Code quality remained the same when pulling 359fd1a on Nzeuwik:issue_4475 into 1c33a6f on scikit-learn:master.

@coveralls

Coverage Status

Coverage decreased (-0.01%) to 95.11% when pulling 359fd1a on Nzeuwik:issue_4475 into 1c33a6f on scikit-learn:master.

@amueller
Member

amueller commented Apr 2, 2015

Why do you define a new function and not fix pairwise_distances?

@LowikC
Author

LowikC commented Apr 2, 2015

As there are several return statements in pairwise_distances, and the check I added must be applied for several of them, it seemed clearer to do it after calling the original function.

I pushed a new commit with the fix directly in pairwise_distances.

@landscape-bot

Code Health
Code quality remained the same when pulling b444c58 on Nzeuwik:issue_4475 into 1c33a6f on scikit-learn:master.


return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
if distances is None :
Member


why not an else here, so you don't need to initialize distances?

@amueller
Member

amueller commented Apr 2, 2015

The implementation of correlation is 7 lines: https://github.com/scipy/scipy/blob/master/scipy/spatial/distance.py#L328
We don't even need to validate again. Shouldn't we rather just reimplement a "stable" version?
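A hedged sketch of what such a "stable" dense-only reimplementation could look like, mirroring scipy's definition but mapping the zero-variance case to 1.0 as proposed in this PR (the function name is hypothetical):

```python
import numpy as np

def stable_correlation_distance(u, v):
    # Sketch of a "stable" correlation distance, following scipy's formula
    #   1 - dot(u - mean(u), v - mean(v)) / (||u - mean(u)|| * ||v - mean(v)||),
    # but returning 1.0 instead of NaN when either vector has zero variance.
    um = u - u.mean()
    vm = v - v.mean()
    nu = np.linalg.norm(um)
    nv = np.linalg.norm(vm)
    if nu == 0.0 or nv == 0.0:
        return 1.0  # convention proposed in this PR for zero-variance inputs
    return 1.0 - (um @ vm) / (nu * nv)
```

Note that with this convention, stable_correlation_distance(x, x) = 1.0 for a zero-variance x, which is exactly the ambiguity discussed below.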

@amueller
Member

amueller commented Apr 2, 2015

To explain my reasoning: adding another general function to the public interface seemed like overkill for a single-metric fix, and adding metric-specific code to a very general function seemed like a bad idea (that function could become very long if every metric needed a fix).

@LowikC
Author

LowikC commented Apr 12, 2015

I agree that it would be cleaner to fix the metric directly, but there are several points I'd like to clarify:

  • all metrics from sklearn support sparse matrices, whereas metrics from scipy don't. So if we include correlation in sklearn, it should also support sparse matrices, right? And in that case, the implementation will be tougher than the one in scipy.
  • The correlation distance between a zero-variance vector and any other vector should be 1 (instead of NaN). But what should the correlation distance be between two equal zero-variance vectors? 0 or 1? (In general, correlation_distance(x, x) = 0.)

@amueller
Member

It would be fine to add a metric that only supports dense matrices.
For the second, I'm not sure. I'd say 0, but checking x1 == x2 is not robust. If there is a robust way to write it such that it is 0, go for it.

@amueller amueller added the Bug label May 6, 2015
@amueller amueller added this to the 0.16.2 milestone May 6, 2015
@amueller amueller modified the milestones: 0.16.2, 0.17 Sep 8, 2015
@lesteve lesteve modified the milestones: 0.17, 0.18 Jul 27, 2016
@amueller amueller modified the milestones: 0.18, 0.19 Sep 22, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 14, 2017
@jnothman
Member

@Nzeuwik, do you intend to follow up @amueller's suggestion of implementing a metric?

@glemaitre glemaitre modified the milestones: 0.20, 0.21 Jun 13, 2018
@jnothman jnothman modified the milestones: 0.21, 0.22 Apr 15, 2019
@jnothman jnothman modified the milestones: 0.22, 0.23 Oct 31, 2019
@thomasjpfan thomasjpfan modified the milestones: 0.23, 0.24 Apr 20, 2020
@rth rth added the Stalled label Jul 25, 2020
@cmarmo cmarmo removed this from the 0.24 milestone Oct 15, 2020
Base automatically changed from master to main January 22, 2021 10:48
@thomasjpfan
Member

I am closing this PR since the original issue was closed: #4475. Thank you for working on this PR!
