
[WIP] Add verbose to NCA and MLKR #105


Conversation

@wdevazelhes (Member) commented Aug 1, 2018

Fixes #67.

This PR adds a verbose argument to NCA and MLKR. Here is an example snippet for NCA (an analogous sketch for MLKR follows the output below).

from sklearn.datasets import load_digits
from metric_learn import NCA
dataset = load_digits()
X, y = dataset.data, dataset.target
nca = NCA(verbose=True, max_iter=5)
nca.fit(X, y)

Returns (UPDATE: edited after commit 8ca38b7, which corrects the training time):

[NCA]
[NCA]  Iteration      Objective Value    Time(s)
[NCA] ------------------------------------------
[NCA]          0         1.436318e+03       0.20
[NCA]          1         1.770346e+03       0.23
[NCA]          2         1.777018e+03       0.22
[NCA]          3         1.779409e+03       0.22
[NCA]          4         1.781304e+03       0.23
[NCA]          5         1.782404e+03       0.22
/home/will/Code/metric-learn-bis/metric_learn/nca.py:113: ConvergenceWarning: [NCA] NCA did not converge: b'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
  cls_name, opt_result.message), ConvergenceWarning)
[NCA] Training took     1.35s.
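
For MLKR the usage would be analogous. The following is only a sketch, assuming MLKR gains the same verbose and max_iter keywords as NCA in this PR; the dataset is an arbitrary toy regression problem:

from sklearn.datasets import make_regression
from metric_learn import MLKR

# Regression data, since MLKR learns a metric for kernel regression.
X, y = make_regression(n_samples=100, n_features=5, random_state=42)
mlkr = MLKR(verbose=True, max_iter=5)
mlkr.fit(X, y)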

The code mostly comes from scikit-learn's NCA PR (scikit-learn/scikit-learn#10058), and the verbose code in particular was mostly taken from scikit-learn's LMNN PR (scikit-learn/scikit-learn#8602).

TODO:

William de Vazelhes added 3 commits August 1, 2018 16:15
@perimosocordiae (Contributor) left a comment

A few minor comments, but overall I'm happy with this.

def _loss(self, flatA, X, y, dX):

  if self.n_iter_ == 0:
    if self.verbose:
@perimosocordiae (Contributor):

I'd combine these into a single condition.

@wdevazelhes (Member, Author):

Agreed, done
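
A minimal sketch of the combined check inside _loss (illustrative only; the header format string here is an assumption, and the rest of the _loss body is elided):

# Print the verbose header only once, on the first call of the loss function.
if self.n_iter_ == 0 and self.verbose:
  header = '{:>10} {:>20} {:>10}'.format('Iteration', 'Objective Value', 'Time(s)')
  cls_name = self.__class__.__name__
  print('[{}]'.format(cls_name))
  print('[{}] {}\n[{}] {}'.format(cls_name, header, cls_name, '-' * len(header)))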

cls_name = self.__class__.__name__
print('[{}]'.format(cls_name))
print('[{}] {}\n[{}] {}'.format(cls_name, header, cls_name,
                                '-' * len(header)))
@perimosocordiae (Contributor):

Named format slots might make this easier to read. That is, like the "by name" examples here: https://docs.python.org/2/library/string.html#format-examples

@wdevazelhes (Member, Author):

Agreed, done (except for the header field, where I thought keyword arguments would be more redundant than helpful for readability, but tell me if you disagree).
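
A rough sketch of the named-slot version (illustrative; it continues the snippet above, with the header left positional as discussed):

print('[{cls}]'.format(cls=cls_name))
print('[{cls}] {}\n[{cls}] {}'.format(header, '-' * len(header), cls=cls_name))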

print('[{}] {}\n[{}] {}'.format(cls_name, header, cls_name,
                                '-' * len(header)))

t_funcall = time.time()
@perimosocordiae (Contributor):

I'm not a big fan of the t_ prefix for times. How about start_time?

@wdevazelhes (Member, Author):

I agree, done
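
For illustration, a self-contained sketch of the renamed timer and a per-iteration verbose line (the values and exact column widths are placeholders, not taken from the PR):

import time

start_time = time.time()
# ... the objective and gradient would be computed here ...
cls_name, n_iter, loss = 'NCA', 0, 1.436318e3  # placeholder values for the print
print('[{}] {:>10} {:>20.6e} {:>10.2f}'.format(cls_name, n_iter, loss,
                                               time.time() - start_time))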

@wdevazelhes (Member, Author)

I just noticed that the Conjugate Gradient method (used in MLKR) counts iterations differently: unlike L-BFGS-B, one iteration is not equal to one function call, so printing the iteration number, objective, and gradient the way this PR does will not work for MLKR.

I was thinking that for MLKR I could instead print the cost function at each function evaluation rather than at each iteration?

Or we could use L-BFGS-B for MLKR too? @perimosocordiae, was there a particular reason to use Conjugate Gradient?
Looking at the documentation of Conjugate Gradient (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cg.html), the Notes say it works better when the function has a unique global minimum, which I don't think is the case here, right? (Any rotation of an optimal transformation A would also be optimal, since it preserves distances.)
@bellet, any thoughts on that?

Note: here is a thread that compares the two methods: https://scicomp.stackexchange.com/questions/507/bfgs-vs-conjugate-gradient-method
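
For reference, a minimal sketch of what switching the scipy solver looks like; the objective below is a toy stand-in, not the actual MLKR loss:

import numpy as np
from scipy.optimize import minimize

def loss_and_grad(flat_A, n):
    # Toy objective f(A) = ||A - I||_F^2 together with its exact gradient,
    # returned flattened, as scipy expects when jac=True.
    A = flat_A.reshape(n, n)
    diff = A - np.eye(n)
    return (diff ** 2).sum(), 2 * diff.ravel()

n = 3
A0 = np.random.randn(n * n)
# method='CG' is what MLKR currently uses; 'L-BFGS-B' is the proposed replacement.
result = minimize(loss_and_grad, A0, args=(n,), jac=True, method='L-BFGS-B',
                  options={'maxiter': 5})
print(result.fun)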

@perimosocordiae (Contributor)

The MLKR paper says this about the optimization problem, after briefly describing a gradient descent algorithm:

Of course, other methods for minimizing (3) such as conjugate gradient, stochastic gradient or BFGS may also be used and might lead to faster convergence results.

The initial MLKR implementation by @dsquareindia used a hand-rolled gradient descent, and I later converted this to use scipy.optimize.minimize probably without thinking too hard about the choice of method.

So in summary: if you think L-BFGS-B will work better, feel free to try it out! 👍

@bellet (Member) commented Aug 13, 2018

L-BFGS-B should also work well (maybe slightly better), at the cost of slightly higher memory. Maybe you can indeed give it a try; this would also make it more consistent with NCA.

@wdevazelhes (Member, Author)

Alright, we can go with L-BFGS-B for now (it can easily be changed in the future).
I just added a commit that also uses fewer features in the make_regression test, making the task easier to train on, so MLKR really does 2 iterations (and hits max_iter in the test) instead of stopping oddly at the beginning (see #104).

@bellet (Member) left a comment

LGTM. This slightly changes the interface of MLKR (removing epsilon and changing alpha to tol). If we want to ensure compatibility with existing code, we could keep the name alpha and show a deprecation warning for epsilon. But maybe this is not so important. @perimosocordiae?

@perimosocordiae (Contributor)

Changing the underlying optimizer is a bit of a compatibility break, too, so I'm fine with making it explicit by changing the keyword arguments.

William de Vazelhes added 2 commits August 17, 2018 09:49
# Conflicts:
#	metric_learn/mlkr.py
#	test/metric_learn_test.py
# Conflicts:
#	metric_learn/mlkr.py
#	metric_learn/nca.py
@wdevazelhes (Member, Author)

I just realized that L-BFGS-B in fact also has an epsilon parameter, so maybe we want to keep it? And consequently also add it to NCA?

@bellet (Member) commented Aug 17, 2018

From my understanding, epsilon is useless here because you already provide a function to compute the exact gradient (this is also true for NCA)
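
To illustrate the point with a toy objective (not the actual NCA/MLKR loss): when the exact gradient is passed via jac, L-BFGS-B uses it directly and never falls back to finite differences, so its eps option is simply never used.

import numpy as np
from scipy.optimize import minimize

def f_and_grad(w):
    # Simple quadratic together with its exact gradient.
    return (w ** 2).sum(), 2 * w

# With jac=True the supplied gradient is used; the 'eps' option only controls
# the finite-difference step when no gradient is provided.
result = minimize(f_and_grad, np.ones(3), jac=True, method='L-BFGS-B')
print(result.x)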

@wdevazelhes (Member, Author)

Oh yes, that's right.
I'll provide a deprecation warning similar to the one in NCA, and I think we'll be good to merge.
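
For reference, a hypothetical sketch of such a deprecation path (the helper name and keywords are made up for illustration, and as the next comment notes, this turned out not to be needed):

import warnings

def _resolve_deprecated_epsilon(epsilon, tol):
    # Hypothetical helper: if the old keyword is still passed, warn and use it as tol.
    if epsilon != 'deprecated':
        warnings.warn("'epsilon' is deprecated and will be removed; use 'tol' instead.",
                      DeprecationWarning)
        tol = epsilon
    return tol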

@wdevazelhes (Member, Author)

I just realized that MLKR is only being introduced in master (it will first appear in the next release), so I guess we actually don't need any deprecation?

@bellet (Member) commented Aug 17, 2018

Makes sense indeed :-) But we need to fix the Travis failure.

@wdevazelhes (Member, Author)

Done

@perimosocordiae perimosocordiae merged commit 7441357 into scikit-learn-contrib:master Aug 17, 2018
@wdevazelhes wdevazelhes deleted the feat/add_verbose_nca_mlkr branch August 22, 2018 06:50