
Inconsistent behaviour of SDML when using skggm with fixed seed. #272


Closed
grudloff opened this issue Jan 13, 2020 · 4 comments

Comments

@grudloff (Contributor)

Description

I was using SDML_Supervised() for a subsequent 2D visualization with UMAP (similar to t-SNE) and got large differences in the results on every fit, even when using the same data. Fixing the seed makes no difference. I tracked the problem down to the call to quic() that is made when skggm is installed; reviewing their code, I found that the seed is fixed there, yet the results of that function still vary on every call.

Note: I am using the latest version of skggm; I will try to reproduce later with the version indicated in the documentation.

Steps/Code to Reproduce

from metric_learn import SDML_Supervised
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# Load and standardize the wine dataset
X, y = load_wine(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit twice with the same fixed seed; the two embeddings should be identical
sdml = SDML_Supervised(random_state=42)
X_transform = sdml.fit_transform(X_train, y_train)
print(np.sum(np.abs(X_transform - sdml.fit_transform(X_train, y_train))))

Expected Results

The two fits of SDML should give the same result, so the printed difference should be zero.
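This expectation can be phrased as a reusable check. Below is a minimal sketch; the fits_deterministically helper is hypothetical and not part of metric-learn's API. With the bug described above, passing SDML_Supervised(random_state=42).fit_transform to it would return False.

```python
import numpy as np

def fits_deterministically(fit_transform, X, y, atol=1e-10):
    """Call fit_transform twice on identical data and compare the outputs.

    fit_transform is any callable (X, y) -> ndarray. Hypothetical helper
    for illustration, not part of metric-learn.
    """
    first = fit_transform(X, y)
    second = fit_transform(X, y)
    return np.allclose(first, second, atol=atol)

# Usage with a trivially deterministic transform: returns True
X = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([0, 0, 1, 1])
assert fits_deterministically(lambda X, y: X * 2.0, X, y)
```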

Actual Results

Large differences, on the order of 100 to 300.

Versions

Linux-5.0.0-37-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Metric-Learn 0.5.0
Skggm 0.2.8

@bellet (Member) commented Jan 13, 2020

I just realized that we had noticed this when refurbishing SDML. See the following link and in particular the last post by jasonlaska:
skggm/skggm#122

It seems they have not made a new release since then, which is why our doc suggests installing a particular development version. Doing so should fix this problem. @grudloff can you try and confirm?

@grudloff (Contributor, Author)

I installed the indicated version and the behavior is now as expected. Maybe the following command could be added to the documentation to make installing that version easier:
pip install "git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8"

@bellet (Member) commented Jan 13, 2020

Sure, it could be added to the installation-and-setup part of the doc; feel free to open a PR for this.
Closing the issue.

@bellet bellet closed this as completed Jan 13, 2020
@bellet (Member) commented Jan 13, 2020

Note: since all supervised versions of metric learners pass sklearn's check_estimator (see link), they pass the test check_fit_idempotent (see link), which would fail if fit did not give consistent results.
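The property check_fit_idempotent enforces can be illustrated with a tiny stand-in estimator. This is a sketch only; MeanCenterer is illustrative and not an sklearn or metric-learn class.

```python
import numpy as np

class MeanCenterer:
    """Tiny deterministic estimator illustrating the property that
    check_fit_idempotent tests: refitting on the same data must not
    change the results."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_

rng = np.random.RandomState(42)
X = rng.rand(20, 3)

est = MeanCenterer().fit(X)
out1 = est.transform(X)
est.fit(X)                      # refit on identical data
out2 = est.transform(X)
assert np.allclose(out1, out2)  # idempotent: refitting changed nothing
```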
