
[MRG] FIX LMNN gradient and cost function #201


Merged

Conversation

wdevazelhes (Member)

Fixes #46
There seem to be problems in LMNN that we might want to fix before the next release:

(Note: to find out what to do we can also compare to scikit-learn/scikit-learn#8602 )

https://github.com/metric-learn/metric-learn/blob/d4badc8adc0a8ce6689dd9a95e89181efaf5ee24/metric_learn/lmnn.py#L196

https://github.com/metric-learn/metric-learn/blob/d4badc8adc0a8ce6689dd9a95e89181efaf5ee24/metric_learn/lmnn.py#L198

I created a test that should pass when this is fixed (like the one in scikit-learn's PR)

I'll look into it, but I'm not yet that familiar with LMNN, so you might know how to fix it.

@wdevazelhes wdevazelhes added this to the v0.5.0 milestone May 13, 2019
@wdevazelhes wdevazelhes changed the title [WIP] FIX LMNN gradient [WIP] FIX LMNN gradient and cost function May 13, 2019
@bellet (Member)

bellet commented May 14, 2019

You must be careful when comparing with scikit-learn/scikit-learn#8602, because that version uses a squared hinge loss, which changes both the cost function and its gradient.

@bellet (Member) left a comment:

But I agree that there seems to be a problem with both the gradient and the cost: L should indeed appear in the gradient, since the objective is quadratic in L, and the cost should indeed sum the margin violations, not count them.
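
For reference, this is the objective from the original LMNN paper (Weinberger & Saul), in the linear-map parametrization, with \mu the push/pull trade-off, j \rightsquigarrow i ranging over target neighbors, and y_{il} = 1 iff x_i and x_l share a label:

\varepsilon(L) = (1 - \mu) \sum_{j \rightsquigarrow i} \|L(x_i - x_j)\|^2 + \mu \sum_{j \rightsquigarrow i} \sum_l (1 - y_{il}) \big[ 1 + \|L(x_i - x_j)\|^2 - \|L(x_i - x_l)\|^2 \big]_+

Both sums are quadratic in L, hence the factor of L in the gradient; and the hinge terms [\cdot]_+ contribute their violation values to the cost, not just their count.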

@wdevazelhes (Member, Author)

I realized that in fact there was just one matrix product with L missing; I added it, and also added a toy example for the loss function.

@@ -191,10 +191,12 @@ def _loss_grad(self, X, L, dfG, impostors, it, k, reg, target_neighbors, df,
# do the gradient update
assert not np.isnan(df).any()
G = dfG * reg + df * (1 - reg)

+ grad = 2 * L.dot(G)
@wdevazelhes (Author):

I put the multiplication by 2 here so that we can test the real gradient using check_grad.
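
For context, a minimal, generic example of scipy.optimize.check_grad (not the actual test): it compares an analytic gradient against a finite-difference estimate and returns the norm of their difference, which should be tiny when the gradient is exact.

import numpy as np
from scipy.optimize import check_grad

def fun(w):
    return 0.5 * w.dot(w)

def grad(w):
    return w

print(check_grad(fun, grad, np.array([1.0, -2.0, 3.0])))  # ~1e-7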

@@ -191,10 +191,12 @@ def _loss_grad(self, X, L, dfG, impostors, it, k, reg, target_neighbors, df,
# do the gradient update
assert not np.isnan(df).any()
G = dfG * reg + df * (1 - reg)

+ grad = 2 * L.dot(G)
# compute the objective function
objective = total_active * (1 - reg)
@wdevazelhes (Author):

In fact this line was right: it's the term coming from the "+1" that is in the hinge loss (cf. the question in #46).

@wdevazelhes (Member, Author)

I guess it is ready to merge now

@wdevazelhes wdevazelhes changed the title [WIP] FIX LMNN gradient and cost function [MRG] FIX LMNN gradient and cost function May 21, 2019
@perimosocordiae (Contributor) left a comment:

It's been a while since I've read the LMNN paper or looked at the shogun implementation, so I'm not sure whether this change is correct. I suspect there are a few different LMNN formulations floating around the web at this point, with slightly different characteristics. For example, I think the scikit-learn PR's implementation uses a modification of the original loss to make it more amenable to optimization with scipy.optimize.

Empirically, how does this PR affect the behavior of LMNN on iris / make_blobs / sandwich / etc.?

# want to have the same result when applying the function twice
return lmnn._loss_grad(X, L.reshape(-1, X.shape[1]), dfG, impostors,
1, k, reg, target_neighbors, df.copy(),
list(a1), list(a2))[0].ravel()
@perimosocordiae (Contributor):

This test seems particularly brittle, as it replicates so much LMNN functionality. Ideally we could structure the LMNN code to make this simpler, but at the least we should avoid copy-paste here:

def loss_grad(flat_L):
  return lmnn._loss_grad(X, flat_L.reshape(...), ...)

fun = lambda x: loss_grad(x)[1]
grad = lambda x: loss_grad(x)[0].ravel()

@wdevazelhes (Author):

Yes, you're right, let's go for this clearer expression.
Done.
(Note: I didn't use the lambda expressions because flake8 flagged them, but maybe that's something we want to ignore and still go for the lambdas?)
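
For the record, a def-based variant of the suggestion above that keeps the same structure while avoiding flake8's complaint about assigning lambdas (E731); loss_grad is the wrapper sketched in the review comment:

def fun(flat_L):
    return loss_grad(flat_L)[1]

def grad(flat_L):
    return loss_grad(flat_L)[0].ravel()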

(np.array([[0], [1], [2], [3]]),
[1, 0, 0, 1], 26.)])
def test_toy_ex_lmnn(X, y, loss):
"""Test that the loss give the right result on a toy example"""
@perimosocordiae (Contributor):

Where did you get these values from?

@wdevazelhes (Author):

I computed them by hand, following the paper (note that it's only the loss function, and that I set L to the identity, so the distances are easy to get from this 1D example).
Maybe I can add a small detail of the computation as a comment?
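
For illustration, a sketch of the by-hand computation for the toy example above (X = [[0], [1], [2], [3]], y = [1, 0, 0, 1]), assuming k=1 target neighbors, L = identity, and the default regularization 0.5 weighting the pull and push terms equally — a hypothetical reconstruction, not the test code:

# Points 0, 1, 2, 3 on a line; classes y = [1, 0, 0, 1], so the target
# neighbors (nearest same-class point) are 0 <-> 3 and 1 <-> 2.
pull = 9 + 9 + 1 + 1  # squared distances to target neighbors = 20

def hinge(v):
    return max(0.0, v)

# Push term: hinge(1 + d(i, target)^2 - d(i, impostor)^2) over all impostors.
push = (hinge(1 + 9 - 1) + hinge(1 + 9 - 4)     # i=0: impostors 1, 2
        + hinge(1 + 1 - 1) + hinge(1 + 1 - 4)   # i=1: impostors 0, 3
        + hinge(1 + 1 - 4) + hinge(1 + 1 - 1)   # i=2: impostors 0, 3
        + hinge(1 + 9 - 4) + hinge(1 + 9 - 1))  # i=3: impostors 1, 2
# push = 9 + 6 + 1 + 0 + 0 + 1 + 6 + 9 = 32

loss = 0.5 * pull + 0.5 * push  # = 10 + 16 = 26.0
print(loss)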

@wdevazelhes (Member, Author)

wdevazelhes commented May 21, 2019

> Empirically, how does this PR affect the behavior of LMNN on iris / make_blobs / sandwich / etc.?

This is the difference between master and this PR for LMNN:

master: [screenshot: sandwich example]

this PR: [screenshot: sandwich example]

So I'd say a significant improvement; I'll check the other datasets.

@wdevazelhes (Member, Author)

wdevazelhes commented May 21, 2019

But I agree I haven't been through every computation of the algorithm: I more or less just checked that the loss is correct at initialization on the toy examples, but I didn't check the part that recomputes which constraints are active after each iteration, etc.
However, if we assume the loss is correct, then from the code it has the form:
loss = constant + tr(A L.T L), with A = dfG * reg + df * (1 - reg) constant in L,
so the gradient is necessarily grad = L (A + A.T) = 2 * L A, since A is symmetric (following equation (3) of these course notes: http://cs229.stanford.edu/notes/cs229-notes1.pdf).
So I guess all that remains is to check whether the matrix A is the right one (i.e. that it contains the right outer-product differences)?
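
A quick numerical sanity check of that identity (an illustrative sketch; here A is a random symmetric matrix standing in for the LMNN sum of outer-product differences):

import numpy as np
from scipy.optimize import approx_fprime

rng = np.random.RandomState(0)
d = 4
A = rng.randn(d, d)
A = A + A.T  # symmetric, like the LMNN matrix

def fun(flat_L):
    L = flat_L.reshape(d, d)
    return np.trace(A.dot(L.T).dot(L))  # tr(A L.T L)

flat_L0 = rng.randn(d * d)
analytic = (2 * flat_L0.reshape(d, d).dot(A)).ravel()  # 2 * L A
numeric = approx_fprime(flat_L0, fun, 1e-6)
print(np.abs(analytic - numeric).max())  # ~1e-5 or smaller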

@wdevazelhes (Member, Author)

For iris, we don't see a big difference:

from sklearn.datasets import load_iris
from metric_learn import LMNN
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

red_dim = make_pipeline(LMNN(), TSNE())
X_e = red_dim.fit_transform(X, y)

plt.scatter(X_e[:, 0], X_e[:, 1], c=y)
plt.show()

With master: [screenshot: iris t-SNE embedding]

With this PR: [screenshot: iris t-SNE embedding]

@wdevazelhes (Member, Author)

On this example (inspired by #180, i.e. a dataset with two informative features and large noise on the other features), the difference is pretty clear:

from sklearn.utils import shuffle
from sklearn.datasets import make_classification
from metric_learn import LMNN
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline

X, y = make_classification(random_state=42, n_redundant=0, shuffle=False)
X[:, 2:] *= 10
X, y = shuffle(X, y)

red_dim = make_pipeline(LMNN(), TSNE())
X_e = red_dim.fit_transform(X, y)

plt.scatter(X_e[:, 0], X_e[:, 1], c=y)
plt.show()

With this PR: [screenshot: t-SNE embedding]

With master: [screenshot: t-SNE embedding]

@wdevazelhes (Member, Author)

Sorry, my last commit was for the wrong branch; I'll revert it.

William de Vazelhes added 2 commits May 22, 2019 10:40
@bellet (Member)

bellet commented May 24, 2019

> > Empirically, how does this PR affect the behavior of LMNN on iris / make_blobs / sandwich / etc.?
>
> This is the difference between master and this PR for LMNN: [screenshots of the sandwich example above]
>
> So I'd say a significant improvement; I'll check the other datasets.

This makes me think that we should fix a random seed in the sandwich example so that it is stable across runs.
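
For instance, a hypothetical one-liner at the top of the example script:

import numpy as np
np.random.seed(42)  # fix the global seed so the demo is reproducible across runs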

@bellet (Member)

bellet commented May 24, 2019

What about the problem in the objective function?

@wdevazelhes (Member, Author)

> What about the problem in the objective function?

You mean the one with the count being added? I think it may actually be normal (see comment #201 (comment)). In fact this count should be the sum of all the "+1" terms (the margin) from the hinge losses that are not 0 in the total sum. (But I haven't checked yet whether total_active is indeed this number of active hinge losses.)

- objective += G.flatten().dot(L.T.dot(L).flatten())
- return G, objective, total_active, df, a1, a2
+ objective += G.flatten().dot(L.flatten())
+ return 2 * G, objective, total_active, df, a1, a2
@bellet (Member):

I guess the factor 2 comes from differentiating L.T.dot(L). It would be clearer to put it directly in G then

@bellet (Member)

bellet commented May 24, 2019

> > What about the problem in the objective function?
>
> You mean the one with the count being added? I think it may actually be normal (see comment #201 (comment)). In fact this count should be the sum of all the "+1" terms (the margin) from the hinge losses that are not 0 in the total sum. (But I haven't checked yet whether total_active is indeed this number of active hinge losses.)

The hinge loss is equal to max(0, 1 - a) (in the case of LMNN, a is the distance to the impostor minus the distance to the target neighbor).
It is equal to 0 when the margin is satisfied (a >= 1); if not, it is equal to the margin violation (1 - a), not "+1"...

@bellet (Member) left a comment:

If I'm not mistaken, you should compute something like

violation1 = np.maximum(0, g1 - g0)
violation2 = np.maximum(0, g2 - g0)
total_active += (violation1 > 0).sum() + (violation2 > 0).sum()
total_violation += violation1.sum() + violation2.sum()

and replace objective = total_active * (1 - reg) with objective = total_violation * (1 - reg).

(You can keep using total_active as an interesting quantity to monitor, though.)

You can run some executions showing how the objective function evolves, to make sure it goes down at each iteration when a small enough step size is used, and compare with the version in master (the number of active constraints should tend to go down, but not necessarily at each iteration).
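
As a generic illustration of the check meant here (a toy quadratic objective, not the LMNN code): with a small enough step size, each gradient step should decrease the objective.

import numpy as np

def objective(w):
    return 0.5 * w.dot(w)

def gradient(w):
    return w

w = np.ones(3)
prev = np.inf
for it in range(20):
    obj = objective(w)
    assert obj <= prev, "objective increased: step size too large?"
    prev = obj
    w -= 0.1 * gradient(w)  # small enough step for this toy problem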

@wdevazelhes (Member, Author)

> The hinge loss is equal to max(0, 1 - a) (in the case of LMNN, a is the distance to the impostor minus the distance to the target neighbor).
> It is equal to 0 when the margin is satisfied (a >= 1); if not, it is equal to the margin violation (1 - a), not "+1"...

I was not thinking that the "+1" alone was the value of the hinge loss; I was thinking it was the "+1" part of the hinge loss (sum(1 - a) = sum(1) - sum(a) = total_active - sum(a)). I had the impression that the "df" part later (some outer products), through its operations with L, accounted for the sum of the "a"s (distance to the impostor minus distance to the target neighbor), so that in total we get something like: push_loss = total_active - sum of the differences between the distance to the impostor and the distance to the target neighbor.
But I definitely need to check that this is indeed the case; I haven't dug into the "df" computation part.

> You can run some executions showing how the objective function evolves, to make sure it goes down at each iteration when a small enough step size is used, and compare with the version in master (the number of active constraints should tend to go down, but not necessarily at each iteration).

Yes, good idea, I'll do that.

@wdevazelhes (Member, Author)

wdevazelhes commented May 29, 2019

Oops, in fact I introduced this bug myself in PR #101: I forgot to multiply by L when applying the update...

See comments https://github.com/metric-learn/metric-learn/pull/101/files#r288573192 and https://github.com/metric-learn/metric-learn/pull/101/files#r288573007

@bellet (Member)

bellet commented May 29, 2019

Ok, fair enough: the objective may be correct, but the code is very hard to parse. Let's fix the gradient issue for now and open an issue to maybe refactor the code later, to make it easier to maintain.

@wdevazelhes (Member, Author)

So in the end this PR just fixes the bug that I introduced, and adds some tests. So if everybody agrees, I think we are good to merge? @perimosocordiae?

@perimosocordiae perimosocordiae merged commit 187b59e into scikit-learn-contrib:master May 29, 2019
@perimosocordiae (Contributor)

Merged!

@wdevazelhes wdevazelhes mentioned this pull request Jun 14, 2019