SCML: Sparse Compositional Metric Learning #278
Conversation
I suspect this will need a rebase now that the triplets PR is merged.
I edited the PR and added a test of the current implementation of this branch on the Vehicles dataset from the Matlab implementation repository. I will be adding tests on more datasets soon!
I added a simple basis construction for the unsupervised version in the meantime. The implementation is now consistent with the other algorithm implementations, and with the new basis construction for the unsupervised version it is possible to check both the plain and the supervised version. The next step is to add specific tests and some benchmarks on other datasets. I think the PR is now in shape for some feedback!
```python
from sklearn.datasets import make_classification  # import needed by this snippet

X, y = make_classification(n_samples=100, n_classes=3, n_features=60,
                           n_informative=60, n_redundant=0, n_repeated=0,
                           random_state=42)
```
While making this test, I tried `n_informative=45` and `n_redundant=15`, but got an empty `components` matrix as a result, because the objective function never went below 1, which is the value it takes at `iter == 0`, so the weights stayed at their initialization value.
Watching the change in the weights, it seems like it is learning OK; it is just that the values of the objective function are bigger (it reaches a minimum of 2 or so), whereas in general values below 1 are reached very fast.
I haven't looked deeply into it yet, but it seems like a situation that should be addressed.
EDIT: Never mind this comment, I noticed the issue is that it shouldn't enter the output-iteration procedure on the first iteration. This is addressed in the next commit.
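For illustration, a minimal sketch of the kind of guard described in the EDIT above; all names here (`output_iter`, `obj_best`, the placeholder update and objective) are illustrative, not the PR's actual code:

```python
import numpy as np

rng = np.random.default_rng(42)
max_iter, output_iter = 20, 5
w = np.zeros(4)                      # weights start at zero
obj_best, w_best = np.inf, w.copy()

def objective(w):
    # stand-in for the real triplet hinge-loss objective
    return float(np.sum((w - 1.0) ** 2))

for it in range(max_iter):
    w = w + 0.1 * rng.normal(size=w.shape)   # placeholder update step
    # the guard: skip the bookkeeping at it == 0 so the objective's
    # initialization value is never recorded as the best solution
    if it > 0 and it % output_iter == 0:
        obj = objective(w)
        if obj < obj_best:
            obj_best, w_best = obj, w.copy()
```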
Testing class separation is nice (although it is not good here, probably due to the dimension compared to the number of data points), but I mostly meant making a test to check that the number of bases one obtains is as expected: check that the returned `n_basis` is equal to the expected one, and that the shape of the returned `basis` is also what we expect, parameterized for a few values of `n_samples`, `n_features` and `n_classes` (of course you should keep the two tests with toy data where you also know exactly what the basis should be, which is nice).
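For illustration, such a test could look like the following sketch; the helper name `_generate_bases_LDA` and the expected shape `(n_basis, n_features)` are assumptions about this PR's internals, not confirmed API:

```python
import pytest
from sklearn.datasets import make_classification

from metric_learn import SCML_Supervised

@pytest.mark.parametrize('n_samples', [100, 300])
@pytest.mark.parametrize('n_features', [10, 50])
@pytest.mark.parametrize('n_classes', [2, 3])
def test_lda_basis_shape(n_samples, n_features, n_classes):
    X, y = make_classification(n_samples=n_samples, n_classes=n_classes,
                               n_features=n_features,
                               n_informative=n_features, n_redundant=0,
                               n_repeated=0, random_state=42)
    n_basis = 40  # expected number of bases
    scml = SCML_Supervised(n_basis=n_basis, random_state=42)
    # assumed name of the basis-construction helper in this PR
    basis, returned_n_basis = scml._generate_bases_LDA(X, y)
    assert returned_n_basis == n_basis
    assert basis.shape == (n_basis, n_features)
```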
I actually intended this test only to cover the `n_features > 50` case!
PS: It actually can do a lot better, it just has very bad hyperparameters!
OK, then you could keep it and just extend it to also check the number of bases and the basis shape, and parametrize it to cover a few values of the above-mentioned parameters.
I was already on it! Just added it in the last commit.
Perfect!
For `n_samples`, maybe remove 1000 to avoid slowing down the test suite too much?
metric_learn/scml.py
```python
max_iter = int(self.max_iter / self.batch_size)
output_iter = int(self.output_iter / self.batch_size)
```
I think this is not consistent with the common semantics of "iteration": one iteration is one update of the parameters, regardless of the mini-batch size used to estimate the gradient.
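Under that convention, the iteration counters should not be rescaled by the batch size; a minimal sketch (all names illustrative, not the PR's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_triplets, batch_size, max_iter = 1000, 10, 500
w = np.zeros(8)

for it in range(max_iter):  # one iteration == one parameter update
    # the batch size only changes how the gradient is estimated;
    # it should not divide max_iter or output_iter
    batch = rng.choice(n_triplets, size=batch_size, replace=False)
    grad = np.zeros_like(w)   # placeholder for the gradient on `batch`
    w -= 0.01 * grad          # a single update per iteration
```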
Things are looking pretty good at this point. I think the priority now should be to have a benchmark which shows the benefit compared to existing algorithms in the package and allows us to decide on decent default parameter values.
Some additional small things to fix:
```python
# weight vector
w = np.zeros((1, n_basis))
# avarage obj gradient wrt weights
```
average
+1 to merge what we have now but will wait for @perimosocordiae to decide/merge
There are a few parts that need more testing, and some minor style nits here and there, but I think this is fine to merge as-is. We can clean up the remaining TODOs in future PRs.
Merged. Thanks for the huge contribution, @grudloff!
This is great news, thanks a lot! I had to leave this hanging, but I definitely had in mind giving it the last push to reach completion once I had a bit of time. Glad you decided to merge as-is.
Congrats @grudloff!
Congrats @grudloff!
Sparse Compositional Metric Learning (SCML) allows scalable learning of global, multi-task and multiple local Mahalanobis metrics for multi-class data under a unified framework based on sparse combinations of rank-one basis metrics. For this initial merge, only the global setting will be implemented.
The algorithm learns on triplets, so it is necessary to add a base class for this kind of algorithm. For the sake of clarity, this will be added in a separate concurrent PR.
This implementation closely follows the Matlab implementation of Y. Shi, A. Bellet and F. Sha, "Sparse Compositional Metric Learning", AAAI Conference on Artificial Intelligence (AAAI), 2014 (SCML paper).
Theory - Global setting
The Mahalanobis matrix is constructed as a sum of rank-1 PSD matrices:

$$M = \sum_{i=1}^{K} w_i \, b_i b_i^T, \qquad w_i \geq 0,$$

where $K$ is the number of bases (`n_basis`), the $b_i$ are fixed basis vectors, and the nonnegative weight vector $w$ is encouraged to be sparse.
The bases are intended to be locally discriminative. In the original paper and in this implementation, they are constructed by applying LDA to several local regions. Other options to construct the basis will be added later.
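For concreteness, a minimal NumPy sketch of assembling the metric from a given basis set (the names `B` and `w` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis, n_features = 5, 3
B = rng.normal(size=(n_basis, n_features))    # rows are the basis vectors b_i
w = np.maximum(rng.normal(size=n_basis), 0)   # nonnegative (ideally sparse) weights

# M = sum_i w_i * b_i b_i^T, PSD by construction
M = B.T @ (w[:, None] * B)
assert np.allclose(M, M.T)
assert np.all(np.linalg.eigvalsh(M) >= -1e-10)
```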
The constraints are a set of triplets $C$ that are enforced through the minimization problem:

$$\min_{w \geq 0} \; \frac{1}{|C|} \sum_{(x_a,\, x_p,\, x_n) \in C} \left[ 1 + d_w(x_a, x_p) - d_w(x_a, x_n) \right]_+ \; + \; \beta \|w\|_1,$$

where $d_w(x, x') = (x - x')^T M (x - x')$ is the learned distance and $[\cdot]_+ = \max(0, \cdot)$ is the hinge loss.
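A simplified sketch of evaluating this objective for a given weight vector (an unoptimized illustration of the formula above, not the PR's implementation):

```python
import numpy as np

def scml_objective(w, B, triplets, beta=1e-5):
    """Triplet hinge loss plus L1 penalty, for weights w over basis B.

    `triplets` has shape (n_triplets, 3, n_features), with rows
    (anchor, positive, negative).
    """
    M = B.T @ (w[:, None] * B)                  # current Mahalanobis matrix
    a, p, n = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    d_ap = np.einsum('ij,jk,ik->i', a - p, M, a - p)  # squared distances
    d_an = np.einsum('ij,jk,ik->i', a - n, M, a - n)
    hinge = np.maximum(0.0, 1.0 + d_ap - d_an)  # [1 + d+ - d-]_+
    return hinge.mean() + beta * np.abs(w).sum()
```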
Advantages
Only `n_basis` parameters must be learned and stored, without taking into account the basis.
Tests on Vehicles dataset
The results for the Vehicles dataset found in the Matlab implementation's GitHub repository are used to validate the consistency and correctness of the current implementation.
For each test, the algorithm was run 100 times; the mean and standard deviation of the resulting test and train accuracies, as well as the runtime, are shown below each test link.
For the batch versions, the number of iterations is reduced so that every method performs the same total amount of gradient computations, as sketched below.
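Concretely, if the plain version processes one triplet per iteration while the batch version processes `batch_size` of them, matching the total gradient work amounts to dividing the iteration count (illustrative numbers, not the ones used in the benchmarks):

```python
# equalize the total number of per-triplet gradient computations
vanilla_iters = 100_000
batch_size = 10
batch_iters = vanilla_iters // batch_size  # 10_000 updates, same gradient work
```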
Test Vanilla
Test Batch
Test Adagrad
Test Batch+Adagrad
We can see that the results are almost the same, even with small improvements in the variance of the test accuracy. Also, the use of mini-batches yields almost a 4-fold improvement in runtime.
Furthermore, the Adagrad version converges faster than the "vanilla" version, as can be observed from the results obtained with 1/20 of the number of iterations, in particular the achieved train accuracy. But this comes with an apparent tradeoff, as the test accuracy of the vanilla version is a little bit better; this suggests that it may be a good idea to allow both options.
Test Vanilla - 1/20 iterations
Test Adagrad - 1/20 iterations
TODO: