[WIP] Metric Learning :: NCA #4789

Closed
wants to merge 31 commits

Conversation

artsobolev
Contributor

GSoC 2015 project, Iteration 1: Neighborhood Component Analysis

For details refer to my GSoC blog, especially a post on NCA.


def nca_vectorized_oracle(X, y, params):
    dX = X[:, None, :] - X[None, :, :]  # n_samples x n_samples x n_comp
    outer = dX[:, :, None, :] * dX[:, :, :, None]  # n_samples x n_samples x n_comp x n_comp
Contributor

please use np.newaxis instead of None. What does n_comp stand for? Could you use n_features instead?

Contributor

Ah, it's n_components. OK. Please write it out.

@eickenberg
Contributor

Some very superficial comments, mostly pertaining to form while waiting for the bench results.

I'm not sure I like the name "oracle". It may also be preferable to have _precompute functions for certain aspects, but that is up for discussion.

There are indeed some heavy tensors along the way in the fully vectorized version, so later on they will need to be memory profiled in addition to the speed profiling of certain operations.

@artsobolev
Contributor Author

@eickenberg I thought the name "oracle" is pretty much standard in optimization theory (in this case we have a first-order oracle that returns the function's value and gradient at a given point).

@eickenberg
Contributor

Well, oracle rings 'black box' to me. While it is true that you are not exploiting any specific structure of the problem here, we still know the function. Keep it for the moment if you like, but bear in mind that the term is very technical and should probably in future iterations be replaced by something like cost_and_gradient for better legibility.

logp -= sp.misc.logsumexp(logp)
assert logp.shape == (n_samples, )

p = np.exp(logp) # n_samples
Contributor Author

This bothers me. Sometimes I get various numerical warnings (underflows). Should I keep the probabilities in the log domain and use the log-sum-exp trick for sums?
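
For illustration, a minimal sketch (my own, not the PR's code) of keeping the whole softmax in the log domain; Lx is assumed to be the already-transformed data of shape (n_samples, n_components):

import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older SciPy versions

def log_neighbor_probabilities(Lx):
    d = Lx[:, np.newaxis, :] - Lx[np.newaxis, :, :]  # pairwise differences
    neg_sq_dist = -np.einsum('ijk,ijk->ij', d, d)    # -||L x_i - L x_j||^2
    np.fill_diagonal(neg_sq_dist, -np.inf)           # p_ii is defined as 0
    # row-wise normalization done entirely in the log domain
    return neg_sq_dist - logsumexp(neg_sq_dist, axis=1, keepdims=True)

Downstream sums such as sum_j p_ij could then also be taken with logsumexp over the relevant indices, exponentiating only once at the end.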

Contributor Author

On second thought: the outer products of the Xs can be almost arbitrary.

Barmaley.exe added 5 commits May 31, 2015 19:50
Fixed a bug in fully-vectorized optimizer
Major refactoring
Compared vectorized and semivectorized NCA-assisted 1NN with 1NN with Euclidean distance on UCI Wine dataset

def nca_vectorized_oracle(X, y, n_components, loss, threshold=0.0):
    dx = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # n_samples x n_samples x n_features
    outer = dx[:, :, np.newaxis, :] * dx[:, :, :, np.newaxis]  # n_samples x n_samples x n_features x n_features
Contributor

Could you check whether this full expansion is necessary? Looking at the formulas for the gradient, it seems possible to bring L inside all members of the sum. Assuming that L.shape[0] <= L.shape[1] (is there a use case for the opposite?), this has the potential to decrease the memory and CPU cycles spent on this outer expansion.
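
To make this concrete, a rough sketch (my own illustration, with W as an assumed matrix of pairwise weights from the gradient formula) of accumulating the gradient with L brought inside, so that only an (n_samples, n_samples, n_components) tensor is ever formed instead of the full outer expansion:

import numpy as np

def nca_grad_with_L_inside(X, W, L):
    # X: (n_samples, n_features), W: (n_samples, n_samples) pairwise weights,
    # L: (n_components, n_features) with n_components <= n_features.
    dx = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # (n, n, n_features)
    Ldx = dx @ L.T                                  # (n, n, n_components)
    # sum_{i,k} W_ik (L x_ik) x_ik^T  ->  (n_components, n_features)
    return 2 * np.einsum('ij,ijk,ijl->kl', W, Ldx, dx)

With n_components << n_features this replaces the n_samples^2 * n_features^2 cost of the outer tensor by roughly n_samples^2 * n_components * n_features.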

Contributor

Another thing to check in addition to this is to unravel the x_ik to what they actually are: x_ik[:, np.newaxis] * x_ik = (x_i - x_k)[:, np.newaxis] * (x_i - x_k) = x_i[:, np.newaxis] * x_i + ... which yields 4 terms, all of which you can already find in your dx. This should bring down memory consumption.
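
As a (hypothetical) illustration of that expansion: writing out the four terms and grouping them lets the whole weighted sum of outer products collapse into a single quadratic form, with W again an assumed matrix of pairwise weights:

import numpy as np

def weighted_outer_sum(X, W):
    # sum_{i,k} W_ik (x_i - x_k)(x_i - x_k)^T without materializing the
    # (n, n, n_features, n_features) tensor: the four expanded terms sum to
    # X^T (D_r + D_c - W - W^T) X, with D_r = diag(W @ 1) and D_c = diag(W^T @ 1).
    S = np.diag(W.sum(axis=1)) + np.diag(W.sum(axis=0)) - W - W.T
    return X.T @ S @ X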

Contributor Author

Yes, L is always either square or rectangular with n_components < n_features. The opposite doesn't make sense, since L's rank is bounded by n_features.

What's the benefit of propagating L inside? In the case of the vectorized optimizer we would lose the ability to cache outer products. It might be useful in the semivectorized optimizer, but the effect seems useless: we'd be computing L (x - y) (x - y)^T products, which don't seem to be of any help in the case of square L (though I can compute them efficiently, since I already have L(x - y) from the probabilities).

Contributor

In my understanding, L is not necessarily square. The reformulation does nothing much for square L, but it replaces one n_features by n_components in the computational complexity, so if n_components << n_features, this should result in a speed-up.

Contributor

(And yes, caching those outer products would be out the window, but we don't know yet how useful the caching actually is, since that tensor is incredibly redundant due to its low rank. This is the point I would like to attack by evaluating selectively, sequentially.)

@eickenberg
Contributor

Could you also make a comparison of gradient descent techniques? Preferably also on several datasets and for several numbers of samples/features, so we can get a feeling for the practical complexity of the methods.

@amueller
Member

amueller commented Jun 2, 2015

ping @bmcfee ;)

@eickenberg
Contributor

cc @kastnerkyle :)

@kastnerkyle
Member

I am catching up with this now and will have some questions/comments soon. Any progress toward stochastic NCA?

@artsobolev
Contributor Author

Sorry for the silence, I was busy with my uni.

So I merged the semivectorized oracle with the stochastic one. The resulting cost function can also use a neighbors heuristic that cuts off points that are too distant.
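
For illustration, a rough sketch (mine, not necessarily how this PR implements it) of such a neighbors heuristic, keeping only the n_neighbors closest points of each sample when forming the soft-neighbor probabilities:

import numpy as np

def neighbor_mask(X, n_neighbors):
    # Boolean mask marking, for every sample, its n_neighbors closest points;
    # contributions of all other (too distant) points are dropped.
    d2 = ((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]  # nearest points per row
    mask = np.zeros_like(d2, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask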

I also experimented with L "propagated" inside the gradient calculation. I measured execution time for 10 initializations on 2 datasets obtained by make_classification:

[timing plot for X.shape = (300, 100)]

[timing plot for X.shape = (500, 20)]

Overall, despite line 130, the propagated version seems at least nearly as fast as the non-propagated one, or even faster in the case of low-rank L.

@kastnerkyle
Member

This is looking pretty good so far!

In the bottom-left plot (X.shape = (500, 20), m = 5), what is up with the negative time? Have we run something like line_profiler and memory_profiler to see where the bottlenecks are in this code generally? I am surprised that the times are so close, but maybe I have the wrong intuition.

Over the next weeks my idea would be to document some of the intermediate functions more (since I have little knowledge of this field, it will be a great help for me :D), compare different evaluation strategies (do we evaluate against all other samples for the gradient, or only among the samples in the batch - @eickenberg brought this up), and evaluate on some big-ish datasets that are reasonable tests for this.

Does this sound like an OK approach to everyone?

@artsobolev
Contributor Author

Oh, that's weird. I didn't notice the negative values on that plot. I use time to measure execution time, and occasionally it returns weird results. The updated plot looks like this:

[updated timing plot]

Yes, I ran the %lprun profiler in my dev IPython notebook; most of the time seems to be spent (plus precompute_distances=False) in vectorized operations like calculating outer products, logsumexp, etc.
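
For reference, a hedged sketch of how that kind of per-line profiling can be run in a notebook with line_profiler and memory_profiler (the call arguments here are placeholders, not the actual benchmark):

%load_ext line_profiler
%load_ext memory_profiler
%lprun -f nca_vectorized_oracle nca_vectorized_oracle(X, y, n_components, loss)
# %mprun needs the profiled function to be defined in a file (module), not in the notebook
%mprun -f nca_vectorized_oracle nca_vectorized_oracle(X, y, n_components, loss)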

I do test it on bigger (yet still created by means of make_classification) datasets: so far it gives bad results on big datasets (like 10,000 × 1000): the accuracy of 1-nearest neighbors is higher without NCA. As far as I observe, the loss on all initializations is close to optimal.

@jnothman
Member

Could we get a statement on what the status of this is?

  • Should it be labeled [MRG] yet? If not, what blockers remain?
  • Is @Barmaley-exe going to bring it to merge post-GSoC?

@artsobolev
Contributor Author

@jnothman

Overall, it kinda works, but I don't think it's merge-ready. It lacks documentation, tests and examples.

@jnothman
Member

Fair enough. Which of those things do you intend to do in the near future?

@terrytangyuan

My friend and I actually just published this metric-learn package to PyPI. Anyone interested in helping me integrate other methods besides NCA into scikit-learn? Any suggestions would be appreciated.

@amueller
Member

That looks quite interesting. How does the NCA there compare against this one?

@terrytangyuan

I haven't looked into the detailed implementation of NCA here yet. But besides the wide range of algorithms the metric-learn package has, it is definitely more mature and stable, with tests and examples.

@maniteja123
Contributor

Hi everyone, sorry for disturbing, but it was suggested on the mailing list that this addition is of importance. I would be really grateful for some feedback on this. I understand there is quite some math behind this algorithm, but I will try my best to understand it if this feature is needed. Thanks a lot!

@bhargavvader

bhargavvader commented Sep 8, 2016

@Barmaley-exe , are you planning on taking this up? Mind if I have a crack at it?
I can fork this and work on top of what you've already done.

Or, as @terrytangyuan suggested, we could alternatively figure out a way to migrate the methods or ideas from the metric-learn package here. Any suggestions, @amueller?

@artsobolev
Contributor Author

You're welcome to contribute.

@GaelVaroquaux
Member

Closing as overridden by #10058
