TSNE with correlation metric: ValueError: Distance matrix 'X' must be symmetric #4475
Thanks for the report. Maybe the best fix would be to make the metric produce zeros instead of NaNs; that is how we usually deal with zero-variance cases.
…ith zero variance samples when using the correlation metric. The best fix would be for the metric not to return NaN values, but since the correlation metric is actually computed by scipy, we can't modify it directly. So, when metric=='correlation', we replace the rows and columns corresponding to zero-variance samples by the maximum distance (here 1.0).
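Here is a minimal NumPy sketch of the approach described in this commit message. It assumes a dense (n_samples, n_features) array. Instead of calling scipy, the hypothetical helper correlation_distances computes the quantity that scipy's correlation metric defines, 1 - r(u, v), by centring each row and taking cosine distances. A second hypothetical helper, replace_constant_rows, then overwrites the rows and columns of zero-variance samples with 1.0. Correlation distance actually lies in [0, 2], so 1.0 is the distance of uncorrelated vectors rather than a true maximum. Resetting the diagonal to 0 is an extra assumption that the message does not state.

```python
import numpy as np


def correlation_distances(X):
    """Pairwise correlation distance 1 - r(u, v) between the rows of X."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centred, axis=1)
    # A zero-variance row has norm 0, so its normalised row is 0/0 = NaN,
    # which then propagates to its whole row and column of the result.
    with np.errstate(divide="ignore", invalid="ignore"):
        unit = centred / norms[:, None]
        return 1.0 - unit @ unit.T


def replace_constant_rows(D, X, fill=1.0):
    """Overwrite rows/columns of zero-variance samples with `fill`."""
    X = np.asarray(X, dtype=float)
    D = np.array(D, dtype=float)  # work on a copy
    constant = np.all(X == X[:, :1], axis=1)  # exactly zero variance
    D[constant, :] = fill
    D[:, constant] = fill
    np.fill_diagonal(D, 0.0)  # assumption: keep d(x, x) = 0
    return D


X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.5],
              [3.0, 3.0, 3.0, 3.0]])  # last row has zero variance
D = correlation_distances(X)
print(np.isnan(D).any())                               # True
print(np.isfinite(replace_constant_rows(D, X)).all())  # True
```

A matrix cleaned this way could then be passed to TSNE(metric="precomputed"). In recent scikit-learn releases this also needs init="random", because PCA initialisation is not available with precomputed distances.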
It also happens with the cosine metric.
@littmus, can you give a small example?
Same as above, with cosine.
In the case of …
For the cosine metric, it should be fixed a posteriori by replacing negative numbers of small absolute value (e.g. smaller than …) with zero. For the correlation distance NaNs, it looks like a problem in scipy's …
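The a posteriori fix for cosine distances might look like the minimal sketch below. It assumes the distances are already in a NumPy array. The tolerance tol is a hypothetical stand-in for the threshold elided in the comment. Small negative values are treated as rounding error and clipped to zero. Anything more negative raises, since it would point to a real bug rather than finite-precision noise. cosine_distance_matrix and clip_small_negatives are illustrative names, not scikit-learn functions.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distance 1 - cos(u, v); rows must be non-zero."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def clip_small_negatives(D, tol=1e-10):
    """Clip rounding error into [0, 2]; reject clearly negative distances."""
    D = np.array(D, dtype=float)  # work on a copy
    if (D < -tol).any():
        raise ValueError("negative distances beyond rounding error")
    # Cosine distance lies in [0, 2], so clip rounding error at both ends.
    return np.clip(D, 0.0, 2.0)


X = np.array([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [3.0, -1.0, 0.5]])
D = clip_small_negatives(cosine_distance_matrix(X))
print((D >= 0).all())  # True
```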
Returning … scipy/#3728 also discusses the problem of negative values caused by finite-precision arithmetic. I will do some experiments. By the way, we would get a similar issue with missing values, but I don't think we want to handle that explicitly. See scipy/#3870.
I have tried all the … Regarding the negative values, we can follow what @ogrisel suggested, and it works fine. Or we could use the trick with …
@giorgiop there are no missing values in scikit-learn (apart from the preprocessing module).
Clipping to zero sounds good. For correlation, I'd say we should set the distance between constant points to zero.
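This suggestion could be expressed as post-processing along the lines of the NumPy sketch below, under stated assumptions. D is a precomputed correlation-distance matrix that contains NaN wherever a zero-variance row is involved. Pairs of constant rows get distance 0, as proposed here. The comment does not say what a constant row's distance to a non-constant row should be, so the mixed_fill default of 1.0 (the distance of uncorrelated vectors) is an assumption. fill_constant_pairs is a hypothetical helper.

```python
import numpy as np


def fill_constant_pairs(D, X, mixed_fill=1.0):
    """Replace NaN correlation distances caused by zero-variance rows."""
    X = np.asarray(X, dtype=float)
    D = np.array(D, dtype=float)  # work on a copy
    constant = np.all(X == X[:, :1], axis=1)
    both = constant[:, None] & constant[None, :]
    either = constant[:, None] | constant[None, :]
    D[both] = 0.0                   # two constant points: distance 0
    D[either & ~both] = mixed_fill  # assumption: default 1.0 treats r = 0
    return D


nan = np.nan
X = np.array([[1.0, 2.0, 4.0], [5.0, 5.0, 5.0], [7.0, 7.0, 7.0]])
D = np.array([[0.0, nan, nan], [nan, nan, nan], [nan, nan, nan]])
print(fill_constant_pairs(D, X))
```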
As post-processing?
👍
This is the problem of …
The fix is to add, in pairwise.py#L573, … (almost the same as you did in …).
Pull request fixing negative values: #7732.
So it seems the cosine bug has been fixed, but the correlation one is still open.
I think we should close this issue. As I understand it, this metric is undefined if one of the vectors has zero variance. Returning NaN seems to be the chosen behaviour for this metric, and no alternative seems clearly better. We could add a different assertion, but I am not sure we should.
I agree with the comments above; this issue can be closed. On …, the error becomes AssertionError: All probabilities should be finite, which is more informative about there being non-finite values such as NaNs. Edit: I am leaving this open for now, to see what other maintainers think.
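An up-front check could fail with a more direct message than the original error; the sketch below is one hedged way to write it. It also illustrates one plausible reason the original message mentioned symmetry. NaN compares unequal to itself, so a symmetry test such as np.allclose(D, D.T) fails on a matrix containing NaNs even when the matrix is structurally symmetric. check_precomputed_distances is a hypothetical helper, not part of scikit-learn, and its error messages are illustrative.

```python
import numpy as np


def check_precomputed_distances(D):
    """Validate a square distance matrix, reporting NaN/inf first."""
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {D.shape}")
    bad_rows = np.flatnonzero(~np.isfinite(D).all(axis=1))
    if bad_rows.size:
        # Report the offending samples, e.g. zero-variance rows
        # under the correlation metric.
        raise ValueError(f"non-finite distances in rows {bad_rows.tolist()}")
    if (D < 0).any():
        raise ValueError("distances must be non-negative")
    if not np.allclose(D, D.T):
        raise ValueError("distance matrix must be symmetric")
    return D


D = np.array([[0.0, np.nan], [np.nan, 0.0]])
print(np.allclose(D, D.T))  # False: NaN is never close to NaN
try:
    check_precomputed_distances(D)
except ValueError as exc:
    print(exc)  # non-finite distances in rows [0, 1]
```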
This looks good, imo.
TSNE raises an obscure error when the data set contains rows with a standard deviation of 0, and therefore undefined correlations: …
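The original reproduction snippet is not preserved here. As a stand-in, the hypothetical NumPy example below shows where the undefined correlations come from. Pearson correlation divides by each row's standard deviation, so a row with standard deviation 0 produces 0/0 = NaN, and the correlation distance 1 - r inherits those NaNs. The example uses np.corrcoef rather than scipy's correlation metric, on the assumption that both yield NaN for constant rows.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [4.0, 1.0, 0.0, 2.0],
              [5.0, 5.0, 5.0, 5.0]])  # standard deviation 0

print(X.std(axis=1))  # third entry is 0.0

# Suppress the RuntimeWarning NumPy emits for the 0/0 division.
with np.errstate(divide="ignore", invalid="ignore"):
    D = 1.0 - np.corrcoef(X)

print(np.isnan(D[2]).all(), np.isnan(D[:, 2]).all())  # True True
print(np.isfinite(D[:2, :2]).all())                   # True
```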