[MRG + 1] Allow already formed tuples as an input. #92
Conversation
… Weakly Supervised Algorithms.
metric_learn/sdml.py
Outdated
# set up prior M
if self.use_cov:
  X = np.unique(pairs.reshape(-1, pairs.shape[2]), axis=0)
This is causing test failures, because the `axis` argument for `np.unique` was added in version 1.13.0, but we test with an earlier version.
Yes, indeed, I will change this to something compatible with version 1.12.1.
I'll go for this solution: https://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array/22941699#22941699
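For reference, the numpy-1.12-compatible replacement suggested in that Stack Overflow thread looks roughly like the sketch below (illustrative only, not necessarily the exact code committed in this PR; `unique_rows` is a hypothetical helper name):

```python
import numpy as np

def unique_rows(a):
    # np.unique only gained the axis= argument in numpy 1.13, so instead we
    # view each contiguous row as a single opaque (void) scalar, deduplicate
    # those scalars, and index back into the original rows.
    a = np.ascontiguousarray(a)
    void_view = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    _, idx = np.unique(void_view, return_index=True)
    return a[idx]

# e.g. recover the distinct points appearing in an array of formed pairs
pairs = np.array([[[0., 1.], [2., 3.]],
                  [[0., 1.], [4., 5.]]])
X = unique_rows(pairs.reshape(-1, pairs.shape[2]))  # -> 3 distinct points
```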
I also updated the docstring. Note also that for quadruplets learning, for now we need the quadruplets to be ordered so that we know which samples are more similar to others. We could also imagine letting the ordering between the first two and the last two pairs be arbitrary (most similar pair before the least similar, or the contrary), and then using a label to specify the ordering. This would be coherent with the supervision for pairs (where there is also a label of constraints), but is redundant. This can be decided later.
This looks good except for minor comments. I guess we have tests in place to confirm that these modifications did not change the output of the algorithms.
Whether the order of instances in quadruplets/triplets should be fixed or given by labels is indeed an open question. Maybe we can stay like this for now and make the final decision when implementing the score/predict functions. Having labels might be more natural in that context.
@@ -51,52 +51,65 @@ def __init__(self, gamma=1., max_iter=1000, convergence_threshold=1e-3,
     self.A0 = A0
     self.verbose = verbose
 
-  def _process_inputs(self, X, constraints, bounds):
-    self.X_ = X = check_array(X)
+  def _process_pairs(self, pairs, y, bounds):
I think it would make sense for this function `_process_pairs` to be shared across the class of pair metric learners. For instance, ruling out pairs made of identical points is useful for all algorithms.
Yes I agree, I added it to the small features TODO list at the end of the main issue: #91 (comment)
metric_learn/itml.py
Outdated
pos_no_ident = vector_norm(pos_pairs[:, 0, :] - pos_pairs[:, 1, :]) > 1e-9
pos_pairs = pos_pairs[pos_no_ident]
neg_no_ident = vector_norm(neg_pairs[:, 0, :] - neg_pairs[:, 1, :]) > 1e-9
neg_pairs = neg_pairs[neg_no_ident]
Maybe showing a warning to the user when such a pair is found and discarded would be useful. In particular, if a negative pair is made of two identical points, there is probably a problem with the way the user generated the pairs, or with the dataset.
Yes I agree. I added it to the TODO.
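For illustration, the warning suggested above could look roughly like this (a hypothetical sketch; the function name and exact message are not from the PR):

```python
import warnings
import numpy as np

def drop_identical_pairs(pairs, kind, tol=1e-9):
    # Keep only pairs whose two points differ by more than tol, and warn
    # about how many were discarded, as suggested in the review.
    distinct = np.linalg.norm(pairs[:, 0, :] - pairs[:, 1, :], axis=1) > tol
    n_dropped = int(np.sum(~distinct))
    if n_dropped:
        warnings.warn("Found and discarded %d %s pair(s) made of two identical "
                      "points; check how the pairs (or the dataset) were "
                      "generated." % (n_dropped, kind))
    return pairs[distinct]

# e.g. neg_pairs = drop_identical_pairs(neg_pairs, kind="negative")
```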
metric_learn/sdml.py
Outdated
pos_neg = c.positive_negative_pairs(num_constraints,
                                    random_state=random_state)
pairs, y = wrap_pairs(X, pos_neg)
y = 2 * y - 1
Shouldn't the pair labels be 0/1 and not -1/1?
That said, this raises the question of whether pair labels should be 0/1 or -1/1. If -1/1 is always more convenient for the algorithm implementations, we could switch to that, or potentially allow both and convert to -1/1 in the process-pairs helper function.
Yes, I agree.
Done (see commits 374a851 and b4bdec4). (Note that for now I did not try to simplify the code of the algorithms using these labels; I just added it to the new API issue's main message here: #91 (comment).)
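As a side note on the 0/1 vs -1/1 question, a process-pairs-style helper could accept both conventions and normalize to -1/1 along these lines (a sketch under that assumption; `normalize_pair_labels` is not an actual function of this PR):

```python
import numpy as np

def normalize_pair_labels(y):
    # Accept either 0/1 or -1/1 pair labels and always return -1/1.
    y = np.asarray(y)
    labels = set(np.unique(y))
    if labels <= {0, 1}:       # 0/1 convention: map 0 -> -1, 1 -> 1
        return 2 * y - 1
    if labels <= {-1, 1}:      # already -1/1
        return y
    raise ValueError("Pair labels should be 0/1 or -1/1, got %s" % sorted(labels))
```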
metric_learn/mmc.py
Outdated
pairs: array-like, shape=(n_constraints, 2, n_features)
    Array of pairs. Each row corresponds to two points.
y: array-like, of shape (n_constraints,)
    Labels of constraints. Should be 0 for dissimilar pair, 1 for similar.
Missing changes for the -1/1 labels here.
Maybe make sure to check that no change has been missed, including within the algorithms and tests.
Thanks, indeed I forgot the docstrings :p Tests should be passing (they do on my computer).
Thanks for the review. I am merging this PR; this will make it easier to review the next PR. If some changes remain to be made, they can be done later on, as we are working on a branch separate from master.
This PR changes the API for the first and easiest case of the new API to implement (see issue #91): the case where there is no preprocessor and tuples are already formed and given to the algorithm as such. The main question is for algorithms that need a covariance matrix to initialize the metric (LSML), or that compute bounds (ITML). For now I just reconstruct a dataset of points from the pairs using np.unique, in order to have the same results and pass the tests, but this is probably not the best thing to do. For SDML, I could express the term involved in the loss only with formed pairs instead of datapoints; here is a snippet that tests it: https://gist.github.com/wdevazelhes/3c349e13976613d15ebc46178c942474
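To make the workaround described above concrete, rebuilding a set of points from formed pairs in order to compute a covariance-based prior could look like the sketch below (illustrative only; note that `np.unique(..., axis=0)` needs numpy >= 1.13, hence the compatibility discussion earlier in this thread):

```python
import numpy as np

def points_from_pairs(pairs):
    # pairs has shape (n_pairs, 2, n_features); flatten the two points of each
    # pair into rows and keep only the distinct ones.
    flat = pairs.reshape(-1, pairs.shape[2])
    return np.unique(flat, axis=0)  # requires numpy >= 1.13

pairs = np.random.RandomState(42).randn(10, 2, 4)
X = points_from_pairs(pairs)
prior = np.cov(X, rowvar=False)  # covariance prior over the recovered points
```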
Left TODO: