
SCML : Sparse Compositional Metric Learning #278


Merged (71 commits) on Jun 17, 2020

Conversation

@grudloff (Contributor) commented Feb 14, 2020

Sparse Compositional Metric Learning (SCML) allows scalable learning of global, multi-task and multiple local Mahalanobis metrics for multi-class data under a unified framework based on sparse combinations of rank-one basis metrics. For this initial merge, only the global setting will be implemented.

The algorithm learns from triplets, so it is necessary to add a base class for triplet-based algorithms. For the sake of clarity, this will be added in a separate, concurrent PR.

This implementation closely follows the MATLAB implementation of Y. Shi, A. Bellet and F. Sha, "Sparse Compositional Metric Learning", AAAI Conference on Artificial Intelligence (AAAI), 2014. SCML paper

Theory - Global setting

The Mahalanobis matrix M is constructed as a nonnegative weighted sum of rank-one PSD matrices:

    M = Σᵢ wᵢ bᵢ bᵢᵀ,   with wᵢ ≥ 0

The bases are intended to be locally discriminative. In the original paper and in this implementation they are constructed by applying LDA to several local regions. Other options for constructing the bases will be added later.

The constraints are a set C of triplets (xᵢ, xⱼ, xₖ), enforced through the minimization problem:

    min_{w ≥ 0}  Σ_{(xᵢ,xⱼ,xₖ) ∈ C} [1 + d_M(xᵢ, xⱼ) − d_M(xᵢ, xₖ)]₊  +  β ‖w‖₁
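A minimal NumPy sketch of the two pieces above (random bases and weights for illustration only; the actual implementation builds the bases via LDA on local regions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_basis = 5, 20

# rank-one basis metrics b_i b_i^T (random here; the PR builds them with LDA)
B = rng.standard_normal((n_basis, n_features))
# learned weights, constrained to be nonnegative
w = rng.uniform(size=n_basis)

# M = sum_i w_i b_i b_i^T is PSD by construction
M = (B.T * w) @ B

def triplet_hinge(M, xa, xp, xn):
    """Hinge loss [1 + d_M(xa, xp) - d_M(xa, xn)]_+ for one triplet."""
    d_pos = (xa - xp) @ M @ (xa - xp)
    d_neg = (xa - xn) @ M @ (xa - xn)
    return max(0.0, 1.0 + d_pos - d_neg)
```

Since every eigenvalue of such an M is nonnegative, no projection onto the PSD cone is needed after an update.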

Advantages

  • No need to project onto the PSD cone, since M is constructed as a nonnegative sum of PSD matrices.
  • Only n_basis parameters need to be learned and stored, not counting the bases themselves.
  • Uses stochastic composite optimization, so large datasets can be handled by this algorithm. An efficient implementation of regularized dual averaging (RDA) is used for the optimization procedure.
  • Faster to train.
  • Good performance on standard datasets (see the paper for details), especially in the multi-task and local settings. A benchmark against the other algorithms in the package will be added to this PR later.
  • First triplet-based algorithm to be implemented in this package.
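As a rough sketch of the optimizer mentioned above: ℓ1-regularized dual averaging admits a closed-form, sparsity-inducing update computed from the running average gradient. The function below follows the standard ℓ1-RDA formula with an added nonnegativity constraint on the weights; parameter names and default values are illustrative, not the PR's.

```python
import numpy as np

def rda_step(avg_grad, t, lam=0.01, gamma=1.0):
    """Closed-form l1-RDA update at iteration t (illustrative parameters).

    Coordinates whose running average gradient is above -lam are set
    exactly to zero, which is what makes the learned combination sparse.
    """
    return np.maximum(0.0, -(np.sqrt(t) / gamma) * (avg_grad + lam))
```

For example, `rda_step(np.array([-0.5, 0.2]), t=4, lam=0.1)` keeps only the first coordinate active and zeroes out the second.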

Tests on Vehicles dataset
The results for the vehicles dataset found in the MATLAB implementation's GitHub repository are used to validate the consistency and correctness of the current implementation.
For each test, the algorithm was run 100 times; the mean and std of the resulting train and test accuracy, as well as the running time, are shown below each test link.
For the batch versions, the number of iterations is reduced so that each method performs the same number of gradient computations.

Test Vanilla

Time. Mean:  5.7989197754859925 std:  0.05547399180223441
Train - Accuracy. Mean:  79.3214990138067 std:  0.7612083562308343
Test - Accuracy. Mean:  77.71176470588235 std:  2.056124280977084

Test Batch

Time. Mean:  6.090397052764892 std:  0.048224950065408285
Train - Accuracy. Mean:  81.41222879684418 std:  0.9372967012148159
Test - Accuracy. Mean:  78.0 std:  1.6946894459868151

Test Adagrad

Time. Mean:  6.090397052764892 std:  0.048224950065408285
Train - Accuracy. Mean:  81.41222879684418 std:  0.9372967012148159
Test - Accuracy. Mean:  78.0 std:  1.6946894459868151

Test Batch+Adagrad

Time. Mean:  1.7653330016136168 std:  0.015989019692575546
Train - Accuracy. Mean:  81.31558185404339 std:  0.9613659886786882
Test - Accuracy. Mean:  78.03529411764706 std:  1.641208095836975

We can see that the results are almost the same, with even a slight variance improvement in test accuracy. Also, the use of mini-batches yields almost a 4-fold speedup.

Furthermore, the adagrad version converges faster than the "vanilla" version, as can be seen from the train accuracy achieved with 1/20 the number of iterations. But this comes with an apparent trade-off, as the test accuracy of the vanilla version is slightly better; this suggests it may be a good idea to allow both options.
Test Vanilla - 1/20 iterations

Time. Mean:  1.2541892004013062 std:  0.02197305793469499
Train - Accuracy. Mean:  79.0 std:  0.8888682213828464
Test - Accuracy. Mean:  77.37647058823529 std:  1.5683284028668087

Test Adagrad - 1/20 iterations

Time. Mean:  1.3263736057281494 std:  0.020214300581072035
Train - Accuracy. Mean:  80.65483234714003 std:  1.1301895624652614
Test - Accuracy. Mean:  77.0764705882353 std:  2.0688077237684213

TODO:

  • Code comments and docstrings
  • Documentation
  • Basic Tests
  • Specific Tests
  • Benchmark

@perimosocordiae (Contributor)

I suspect this will need a rebase now that the triplets PR is merged.

@grudloff (Contributor, Author) commented Mar 5, 2020

I edited the PR and added a test of this branch's current implementation on the Vehicles dataset from the MATLAB implementation repository. I will be adding tests on more datasets soon!

@grudloff (Contributor, Author)

I added a simple basis construction for the unsupervised version in the meantime. The implementation is now consistent with the other algorithm implementations, and with the new basis construction it is possible to check both the plain and supervised versions. The next step is to add specific tests and some benchmarks on other datasets!

I think the PR is now in shape for some feedback!

Comment on lines +91 to +93
X, y = make_classification(n_samples=100, n_classes=3, n_features=60,
                           n_informative=60, n_redundant=0, n_repeated=0,
                           random_state=42)
@grudloff (Contributor, Author) commented Mar 25, 2020

While making this test, I tried n_informative = 45 and n_redundant = 15, but got an empty components matrix as a result: the objective function never went below 1, the value it takes at iter == 0, so the weights kept their initialization value.

Looking at the change in the weights, it seems to be learning fine; the objective values are just bigger (it reaches a minimum of around 2), whereas in general values below 1 are reached very quickly.

I haven't looked into it deeply yet, but it seems like a situation that should be addressed.

EDIT: Never mind this comment; I noticed the issue is that the output-iteration procedure should not be entered on the first iteration. This is addressed in the next commit.

Member:

Testing class separation is nice (although it is not good here, probably due to the dimension compared to the number of data points), but I mostly meant making a test that checks that the number of bases obtained is as expected (check that the returned n_basis equals the expected one, and that the shape of the returned basis is also what we expect), parameterized over a few values of n_samples, n_features and n_classes.

(of course you should keep the two tests with toy data where you also know exactly what the basis should be, which is nice)

@grudloff (Contributor, Author) commented Mar 26, 2020

I actually intended this test only to cover the n_features > 50 case!

PS: It can actually do a lot better; it just has very bad hyperparameters!

Member:

OK, then you could keep it and just extend it to also check the number of bases and the basis shape, and parametrize it to cover a few values of the parameters mentioned above.

@grudloff (Contributor, Author):

I was already on it! Just added it in the last commit.

Member:

Perfect!
For n_samples, maybe remove 1000 to avoid slowing down the test suite too much?

Comment on lines 69 to 70
max_iter = int(self.max_iter/self.batch_size)
output_iter = int(self.output_iter/self.batch_size)
Member:

I think this is not consistent with the common semantics of "iteration": one iteration is one update of the parameters, regardless of the mini-batch size used to estimate the gradient.
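Under that semantics, a sketch of the loop (illustrative names and a dummy gradient) performs one weight update per iteration, whatever the batch size:

```python
import numpy as np

rng = np.random.default_rng(0)
max_iter, batch_size, n_basis = 10, 4, 3
w = np.zeros(n_basis)

n_updates = 0
for it in range(max_iter):  # not range(max_iter // batch_size)
    # mini-batch gradient estimate: average over batch_size sample gradients
    grad = rng.standard_normal((batch_size, n_basis)).mean(axis=0)
    w -= 0.1 * grad  # one parameter update == one iteration
    n_updates += 1
```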

@bellet (Member) commented Mar 26, 2020

Things are looking pretty good at this point. I think the priority now should be a benchmark that shows the benefit compared to existing algorithms in the package and allows us to decide on decent default parameter values.

@bellet (Member) commented Mar 27, 2020

Some additional small things to fix:

  • The algorithm crashes when output_iter is larger than max_iter, because best_w is not defined.
  • The algorithm crashes when some arguments are given with the wrong type, e.g. a float for max_iter. It would be a good idea to check that all arguments have the desired type and throw an error otherwise, along with tests.
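A minimal sketch of the suggested argument checking (the helper name and messages are illustrative, not the package's API):

```python
def check_positive_int(name, value):
    """Raise if `value` is not a positive integer (illustrative helper)."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("%s must be an integer, got %s"
                        % (name, type(value).__name__))
    if value < 1:
        raise ValueError("%s must be >= 1, got %d" % (name, value))
    return value
```

For example, `check_positive_int('max_iter', 100000)` passes, while `check_positive_int('max_iter', 1e5)` raises a TypeError because `1e5` is a float.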


# weight vector
w = np.zeros((1, n_basis))
# avarage obj gradient wrt weights
Member:

average

@terrytangyuan (Member) left a comment

+1 to merge what we have now but will wait for @perimosocordiae to decide/merge

@perimosocordiae (Contributor) left a comment

There are a few parts that need more testing, and some minor style nits here and there, but I think this is fine to merge as-is. We can clean up the remaining TODOs in future PRs.

@perimosocordiae perimosocordiae merged commit 43a60c9 into scikit-learn-contrib:master Jun 17, 2020
@perimosocordiae (Contributor)

Merged. Thanks for the huge contribution, @grudloff !

@grudloff (Contributor, Author)

This is great news, thanks a lot! I had to leave this hanging, but I definitely had in mind giving it a last push to completion once I had a bit of time. Glad you decided to merge as-is.

@bellet (Member) commented Jun 18, 2020

Congrats @grudloff!

@wdevazelhes (Member)

Congrats @grudloff !
