
[WIP] Add verbose to NCA and MLKR #105


Conversation

@wdevazelhes (Member) commented Aug 1, 2018

Fixes #67.

This PR adds a verbose argument to NCA and MLKR. Here is an example snippet for NCA (an analogous sketch for MLKR follows the output below).

from sklearn.datasets import load_digits
from metric_learn import NCA
dataset = load_digits()
X, y = dataset.data, dataset.target
nca = NCA(verbose=True, max_iter=5)
nca.fit(X, y)

Returns (UPDATE: edited after commit 8ca38b7, which corrects the training time):

[NCA]
[NCA]  Iteration      Objective Value    Time(s)
[NCA] ------------------------------------------
[NCA]          0         1.436318e+03       0.20
[NCA]          1         1.770346e+03       0.23
[NCA]          2         1.777018e+03       0.22
[NCA]          3         1.779409e+03       0.22
[NCA]          4         1.781304e+03       0.23
[NCA]          5         1.782404e+03       0.22
/home/will/Code/metric-learn-bis/metric_learn/nca.py:113: ConvergenceWarning: [NCA] NCA did not converge: b'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
  cls_name, opt_result.message), ConvergenceWarning)
[NCA] Training took     1.35s.
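
For MLKR the usage would be analogous. The following is only a sketch, assuming MLKR gains the same verbose and max_iter keywords as NCA in this PR; the dataset is an arbitrary toy regression problem:

from sklearn.datasets import make_regression
from metric_learn import MLKR

# Regression data, since MLKR learns a metric for kernel regression.
X, y = make_regression(n_samples=100, n_features=5, random_state=42)
mlkr = MLKR(verbose=True, max_iter=5)
mlkr.fit(X, y)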

The code mostly comes from scikit-learn's NCA PR (scikit-learn/scikit-learn#10058), and the verbose code in particular was mostly taken from scikit-learn's LMNN PR (scikit-learn/scikit-learn#8602).

TODO:

William de Vazelhes added 3 commits August 1, 2018 16:15
@perimosocordiae (Contributor) left a comment

A few minor comments, but overall I'm happy with this.

def _loss(self, flatA, X, y, dX):

  if self.n_iter_ == 0:
    if self.verbose:
@perimosocordiae (Contributor):

I'd combine these into a single condition.

@wdevazelhes (Member, Author):

Agreed, done
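
A minimal sketch of the combined check inside _loss (illustrative only; the header format string here is an assumption, and the rest of the _loss body is elided):

# Print the verbose header only once, on the first call of the loss function.
if self.n_iter_ == 0 and self.verbose:
  header = '{:>10} {:>20} {:>10}'.format('Iteration', 'Objective Value', 'Time(s)')
  cls_name = self.__class__.__name__
  print('[{}]'.format(cls_name))
  print('[{}] {}\n[{}] {}'.format(cls_name, header, cls_name, '-' * len(header)))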

cls_name = self.__class__.__name__
print('[{}]'.format(cls_name))
print('[{}] {}\n[{}] {}'.format(cls_name, header, cls_name,
                                '-' * len(header)))
@perimosocordiae (Contributor):

Named format slots might make this easier to read. That is, like the "by name" examples here: https://docs.python.org/2/library/string.html#format-examples

@wdevazelhes (Member, Author):

Agreed, done (except for the header field, where I thought keyword arguments would be more redundant than helpful for readability, but tell me if you disagree).
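
A rough sketch of the named-slot version (illustrative; it continues the snippet above, with the header left positional as discussed):

print('[{cls}]'.format(cls=cls_name))
print('[{cls}] {}\n[{cls}] {}'.format(header, '-' * len(header), cls=cls_name))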

print('[{}] {}\n[{}] {}'.format(cls_name, header, cls_name,
                                '-' * len(header)))

t_funcall = time.time()
@perimosocordiae (Contributor):

I'm not a big fan of the t_ prefix for times. How about start_time?

@wdevazelhes (Member, Author):

I agree, done
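
For illustration, a self-contained sketch of the renamed timer and a per-iteration verbose line (the values and exact column widths are placeholders, not taken from the PR):

import time

start_time = time.time()
# ... the objective and gradient would be computed here ...
cls_name, n_iter, loss = 'NCA', 0, 1.436318e3  # placeholder values for the print
print('[{}] {:>10} {:>20.6e} {:>10.2f}'.format(cls_name, n_iter, loss,
                                               time.time() - start_time))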

@wdevazelhes (Member, Author)

I just noticed that the Conjugate Gradient method (used in MLKR) counts iterations differently: unlike L-BFGS-B, one iteration is not equal to one function call, so printing the iteration number, objective, and gradient the way this PR does will not work for MLKR.

I was thinking that for MLKR I could instead print the cost function at each function evaluation rather than at each iteration?

Or we could use L-BFGS-B for MLKR too? @perimosocordiae, was there a particular reason to use Conjugate Gradient?
Looking at the documentation of Conjugate Gradient (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cg.html), the Notes say it works better when the function has a unique global minimum, which I don't think is the case here, right? (Any rotation of an optimal transformation A would also be optimal, since it preserves distances.)
@bellet, any thoughts on that?

Note: here is a thread that compares the two methods: https://scicomp.stackexchange.com/questions/507/bfgs-vs-conjugate-gradient-method
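
For reference, a minimal sketch of what switching the scipy solver looks like; the objective below is a toy stand-in, not the actual MLKR loss:

import numpy as np
from scipy.optimize import minimize

def loss_and_grad(flat_A, n):
    # Toy objective f(A) = ||A - I||_F^2 together with its exact gradient,
    # returned flattened, as scipy expects when jac=True.
    A = flat_A.reshape(n, n)
    diff = A - np.eye(n)
    return (diff ** 2).sum(), 2 * diff.ravel()

n = 3
A0 = np.random.randn(n * n)
# method='CG' is what MLKR currently uses; 'L-BFGS-B' is the proposed replacement.
result = minimize(loss_and_grad, A0, args=(n,), jac=True, method='L-BFGS-B',
                  options={'maxiter': 5})
print(result.fun)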

@perimosocordiae (Contributor)

The MLKR paper says this about the optimization problem, after briefly describing a gradient descent algorithm:

Of course, other methods for minimizing (3) such as conjugate gradient, stochastic gradient or BFGS may also be used and might lead to faster convergence results.

The initial MLKR implementation by @dsquareindia used a hand-rolled gradient descent, and I later converted this to use scipy.optimize.minimize probably without thinking too hard about the choice of method.

So in summary: if you think L-BFGS-B will work better, feel free to try it out! 👍

@bellet (Member) commented Aug 13, 2018

L-BFGS-B should also work well (maybe slightly better), at the cost of slightly higher memory. Maybe you can indeed give it a try; this would also make it more consistent with NCA.

@wdevazelhes (Member, Author)

Alright, we can go with L-BFGS-B for now (it can easily be changed in the future).
I just added a commit that also uses fewer features in the make_regression test, making the task easier to train on, so MLKR really does 2 iterations (and hits max_iter in the test) instead of stopping oddly at the beginning (see #104).

@bellet (Member) left a comment

LGTM. This slightly changes the interface of MLKR (removing epsilon and changing alpha to tol). If we want to ensure compatibility with existing code, we could keep the name alpha and show a deprecation warning for epsilon. But maybe this is not so important. @perimosocordiae?

@perimosocordiae (Contributor)

Changing the underlying optimizer is a bit of a compatibility break, too, so I'm fine with making it explicit by changing the keyword arguments.

William de Vazelhes added 2 commits August 17, 2018 09:49
# Conflicts:
#	metric_learn/mlkr.py
#	test/metric_learn_test.py
# Conflicts:
#	metric_learn/mlkr.py
#	metric_learn/nca.py
@wdevazelhes (Member, Author)

I just realized that L-BFGS-B in fact also has an epsilon parameter, so maybe we want to keep it? And consequently also add it to NCA?

@bellet (Member) commented Aug 17, 2018

From my understanding, epsilon is useless here because you already provide a function to compute the exact gradient (this is also true for NCA)
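
To illustrate the point with a toy objective (not the actual NCA/MLKR loss): when the exact gradient is passed via jac, L-BFGS-B uses it directly and never falls back to finite differences, so its eps option is simply never used.

import numpy as np
from scipy.optimize import minimize

def f_and_grad(w):
    # Simple quadratic together with its exact gradient.
    return (w ** 2).sum(), 2 * w

# With jac=True the supplied gradient is used; the 'eps' option only controls
# the finite-difference step when no gradient is provided.
result = minimize(f_and_grad, np.ones(3), jac=True, method='L-BFGS-B')
print(result.x)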

@wdevazelhes (Member, Author)

Oh yes, that's right.
I'll provide a deprecation warning similar to the one in NCA, and I think we'll be good to merge.
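
For reference, a hypothetical sketch of such a deprecation path (the helper name and keywords are made up for illustration, and as the next comment notes, this turned out not to be needed):

import warnings

def _resolve_deprecated_epsilon(epsilon, tol):
    # Hypothetical helper: if the old keyword is still passed, warn and use it as tol.
    if epsilon != 'deprecated':
        warnings.warn("'epsilon' is deprecated and will be removed; use 'tol' instead.",
                      DeprecationWarning)
        tol = epsilon
    return tol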

@wdevazelhes (Member, Author)

I just realized that MLKR is only being introduced in master (it will first appear in the next release), so I guess we actually don't need any deprecation?

@bellet (Member) commented Aug 17, 2018

Makes sense indeed :-) But we need to fix the Travis failure.

@wdevazelhes (Member, Author)

Done

@perimosocordiae perimosocordiae merged commit 7441357 into scikit-learn-contrib:master Aug 17, 2018
@wdevazelhes wdevazelhes deleted the feat/add_verbose_nca_mlkr branch August 22, 2018 06:50