
Inconsistent behaviour of SDML when using skggm with fixed seed. #272


Closed
grudloff opened this issue Jan 13, 2020 · 4 comments

Comments

@grudloff (Contributor)

Description

I was using SDML_Supervised() for a subsequent 2D visualization with UMAP (similar to t-SNE) and got large differences in the results on every fit, even when using the same data. Fixing the seed makes no difference. I tracked the problem down to the call to quic() that is made when skggm is installed; reviewing their code, I found that the seed is fixed there, yet the results of that function still vary on every call.

Note: I am using the latest version of skggm; I will try to reproduce later with the version indicated in the documentation.

Steps/Code to Reproduce

from metric_learn import SDML_Supervised
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# Load and standardize the wine dataset
X, y = load_wine(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit twice with the same fixed seed; the two embeddings should be identical
sdml = SDML_Supervised(random_state=42)
X_transform = sdml.fit_transform(X_train, y_train)
print(np.sum(np.abs(X_transform - sdml.fit_transform(X_train, y_train))))

Expected Results

The two fits of SDML should give the same result, so the printed difference should be zero.
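This expectation can be phrased as a reusable check. Below is a minimal sketch; the fits_deterministically helper is hypothetical and not part of metric-learn's API. With the bug described above, passing SDML_Supervised(random_state=42).fit_transform to it would return False.

```python
import numpy as np

def fits_deterministically(fit_transform, X, y, atol=1e-10):
    """Call fit_transform twice on identical data and compare the outputs.

    fit_transform is any callable (X, y) -> ndarray. Hypothetical helper
    for illustration, not part of metric-learn.
    """
    first = fit_transform(X, y)
    second = fit_transform(X, y)
    return np.allclose(first, second, atol=atol)

# Usage with a trivially deterministic transform: returns True
X = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([0, 0, 1, 1])
assert fits_deterministically(lambda X, y: X * 2.0, X, y)
```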

Actual Results

Large differences, on the order of 100 to 300.

Versions

Linux-5.0.0-37-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Metric-Learn 0.5.0
Skggm 0.2.8

@bellet (Member) commented Jan 13, 2020

I just realized that we had noticed this when refurbishing SDML. See the following link and in particular the last post by jasonlaska:
skggm/skggm#122

It seems they have not made a new release since then, which is why our doc suggests installing a particular development version. Doing so should fix this problem. @grudloff can you try and confirm?

@grudloff (Contributor, Author)

I installed the indicated version and the behavior is now as expected. Maybe the following command could be added to the documentation to make installing that version easier:
pip install "git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8"

@bellet (Member) commented Jan 13, 2020

Sure, it could be added to the installation-and-setup part of the doc; feel free to open a PR for this.
Closing the issue.

@bellet bellet closed this as completed Jan 13, 2020
@bellet (Member) commented Jan 13, 2020

Note: since all supervised versions of metric learners pass sklearn's check_estimator (see link), they pass the test check_fit_idempotent (see link), which would fail if fit did not give consistent results.
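The property check_fit_idempotent enforces can be illustrated with a tiny stand-in estimator. This is a sketch only; MeanCenterer is illustrative and not an sklearn or metric-learn class.

```python
import numpy as np

class MeanCenterer:
    """Tiny deterministic estimator illustrating the property that
    check_fit_idempotent tests: refitting on the same data must not
    change the results."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_

rng = np.random.RandomState(42)
X = rng.rand(20, 3)

est = MeanCenterer().fit(X)
out1 = est.transform(X)
est.fit(X)                      # refit on identical data
out2 = est.transform(X)
assert np.allclose(out1, out2)  # idempotent: refitting changed nothing
```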
