[MRG] Uniformize num_dims and add it for LMNN #193
Conversation
# Conflicts:
#	test/test_utils.py
Note: for LMNN, if we have shogun, it will do a PCA if asked to, but I couldn't find how the dimension of the PCA is chosen, so for now I didn't change anything... Also, maybe we won't have to bother once we have solved #124, because we may choose to have the 'pca' option trigger a scikit-learn PCA in both the shogun and non-shogun cases.
Have you checked what the pca option of Shogun is for? It could simply be a preprocessing step applied to the dataset, not an initialization of the algorithm. In that case we will set this option to false when solving #125.
Yes, according to this page: http://shogun.ml/notebook/latest/LMNN.html (in cell 5), they say PCA is used to find the initial matrix.
Indeed. Looking at this, I do not think they reduce the dimension anyway:
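For reference, here is a minimal sketch (not the actual metric-learn or Shogun code) of what a scikit-learn-based 'pca' initialization for LMNN could look like; the helper name `_pca_init` and its signature are hypothetical:

```python
from sklearn.decomposition import PCA


def _pca_init(X, num_dims):
    """Hypothetical helper: build an initial (num_dims, n_features)
    transformation for LMNN from the top principal components of X."""
    pca = PCA(n_components=num_dims)
    pca.fit(X)
    # components_ has shape (num_dims, n_features), which matches the
    # expected shape of the linear transformation L.
    return pca.components_
```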
metric_learn/_util.py
Outdated
  """Checks that num_dims is less that n_features and deal with the None
  case"""
  if num_dims is None:
    dim = n_features
As a style nitpick, I prefer early-return for cases like this:

if num_dims is None:
    return n_features
if 0 < num_dims <= n_features:
    return num_dims
raise ValueError(...)
That's better, I agree. Done.
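To make the suggestion concrete, here is a sketch of the early-return version of the helper (reusing the error message from the reviewed diff; the final merged wording may differ):

```python
def _check_num_dims(n_features, num_dims):
    """Check that num_dims is between 1 and n_features, and default to
    n_features when num_dims is None."""
    if num_dims is None:
        return n_features
    if 0 < num_dims <= n_features:
        return num_dims
    raise ValueError('Invalid num_dims, must be in [1, %d]' % n_features)
```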
metric_learn/_util.py
Outdated
def _check_num_dims(n_features, num_dims):
  """Checks that num_dims is less that n_features and deal with the None
typo: "less than"
Thanks, done
metric_learn/_util.py
Outdated
    dim = n_features
  else:
    if not 0 < num_dims <= n_features:
      raise ValueError('Invalid num_dims, must be in [1, %d]' % n_features)
Some existing code would only warn if num_dims > n_features. I think it's probably better to error out here, but we should keep in mind that this is technically a back-compat break.
That's true, maybe adding it in the changelog is enough? Something like "for all the algorithms that have a parameter num_dims, it will now be checked to be between 1 and n_features, with n_features the number of dimensions of the input space"?
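To make the back-compat note concrete, here is an assumed usage sketch (assuming the check runs at fit time and that LMNN exposes the new num_dims parameter added by this PR):

```python
import numpy as np
from metric_learn import LMNN

X = np.random.randn(50, 4)    # n_features = 4
y = np.array([0, 1] * 25)

lmnn = LMNN(num_dims=10)      # 10 > n_features
try:
    lmnn.fit(X, y)
except ValueError as e:
    # Previously some algorithms only warned here; after this PR an
    # out-of-range num_dims raises instead.
    print(e)
```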
Also, I kept the name ...
I agree that ...
I investigated the Travis failure and there seems to be a shape mismatch for LMNN when doing dimensionality reduction. The problem appears to be in the gradient: looking at this line, the gradient has the shape of an outer product of Xs, when it should have the shape of L. Looking at the PR on LMNN in scikit-learn, there might be a dot product with L missing. I'll look into it.
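To illustrate the shape issue (with made-up shapes, not the actual metric-learn variables): for a squared term ||L(x_i - x_j)||^2 the gradient with respect to L is 2 L (x_i - x_j)(x_i - x_j)^T, so the sum of outer products alone has shape (n_features, n_features) and still needs a left dot product with L to match L's shape:

```python
import numpy as np

rng = np.random.RandomState(0)
num_dims, n_features, n_pairs = 2, 5, 10
L = rng.randn(num_dims, n_features)      # transformation; the gradient must have this shape
diffs = rng.randn(n_pairs, n_features)   # pairwise differences x_i - x_j

outer_sum = diffs.T.dot(diffs)           # sum of outer products, shape (n_features, n_features)

# Gradient of sum_ij ||L (x_i - x_j)||**2 with respect to L:
grad = 2 * L.dot(outer_sum)              # shape (num_dims, n_features), same as L
assert grad.shape == L.shape
```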
# Conflicts:
#	metric_learn/rca.py
#	test/test_base_metric.py
#	test/test_utils.py
This reverts commit 81c9a8d.
if np.sign(res_1[0, 0]) != np.sign(res_2[0, 0]):
  res_2 *= -1
assert_array_almost_equal(res_1, res_2)
assert_array_almost_equal(abs(res_1), abs(res_2))
The tests were failing because of a sign error. I think it comes from the eigenvalue decomposition in LFDA, which can lead to different signs in the output. I made this modification because I thought that switching the sign of any column still keeps the result valid (any symmetry with respect to some of the axes is still valid, right?), not just a global sign switch as before. Am I right? Or should I rather try to change the code to make LFDA deterministic? What do you think @perimosocordiae @bellet ?
I just raised an issue for that (#211). I guess let's do it like this for now and fix the non-deterministic behavior later?
The sign switch is expected, as each eigenvector may be negated independently of the others. The result is valid in the sense that the underlying optimization problem is still solved, and distances in the transformed space are equivalent. Individual points in the transformed space won't always be the same, though.
I don't think it's worth trying to enforce a deterministic result, but I could probably be persuaded otherwise.
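As an illustration of that point (not code from this PR), a test could also align the sign of each column before comparing, which is stricter than comparing absolute values; the helper name below is made up:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def assert_equal_up_to_column_signs(A, B):
    """Assert two matrices are equal up to an independent sign flip of
    each column (e.g. eigenvectors returned in arbitrary orientation)."""
    assert A.shape == B.shape
    # Align each column of B with the corresponding column of A.
    signs = np.sign(np.sum(A * B, axis=0))
    signs[signs == 0] = 1
    assert_array_almost_equal(A, B * signs)
```

Unlike an element-wise comparison of absolute values, this also checks the relative signs within each column.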
Let's discuss it in the issue you opened
If you agree with my last modifications (see comment #193 (review)), we should be ready to merge.
LGTM. Merging
This solves #167 partially (it only adds a num_dims argument to LMNN and ensures that the check is done uniformly across the algorithms).