[MRG] New API should allow prediction functions and scoring #95

wdevazelhes · 2018-05-22T13:47:33Z

The new API should allow metric learning algorithms that fit on tuples of points to also predict, score, etc. on tuples, which makes them usable in scikit-learn's cross-validation routines. This is one of the PRs needed for issue #91.

- Take the previous tests from [WIP] New API proposal #85, refactor them to use pytest, and make them accept already-formed tuples (3D arrays) instead of ConstrainedDatasets
- Take the previous code from [WIP] New API proposal #85 and adapt it to already-formed tuples
- Make tests work
- Make some modifications if needed
- ~~Move docstrings from _fit to fit~~
- ~~Remove unused imports~~

Basically these are the tests from PR scikit-learn-contrib#85, but reformatted to use pytest and already-formed tuples instead of ConstrainedDatasets.

- Make PairsClassifierMixin and QuadrupletsClassifierMixin classes to implement scoring functions
- Implement a new API for supervised wrappers of weakly supervised learning estimators, through base classes (e.g. BaseMMC) from which child classes (e.g. MMC and MMC_Supervised) inherit; this is the same idea as in PR scikit-learn-contrib#85
- Delete tests that use tuples learners as transformers, as we do not want to support this behaviour anymore: it is too complicated to allow such different input types (tuples or points) for the same estimator

# Conflicts: # metric_learn/sdml.py

wdevazelhes · 2018-05-25T09:34:51Z

I just merged with the recently merged PR #92, so changes introduced by this PR are now clearer.

bellet

- Maybe good to test warnings/errors in the check functions, such as wrong labels (not -1/1), etc.?
- Predict function for pairs: later we should think about how to implement a threshold-based predict, without fixing the threshold in advance but tuning it automatically on the train set to achieve a desired precision.
- It looks like we are losing the ability of weakly supervised algorithms to be used to transform the data, but I guess this will be fixed in the next PR introducing a Mahalanobis Mixin with an embed method.
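The threshold-based predict mentioned above is not implemented in this PR. As a rough sketch only, one way to tune such a threshold on training pairs might look like the following. The helper names `tune_threshold` and `predict_pairs` are hypothetical (not part of metric-learn), and the sketch assumes that smaller distances mean "more similar" and that labels are in {-1, 1}.

```python
import numpy as np


def tune_threshold(distances, labels, min_precision):
    """Return the largest distance threshold whose precision for the
    'similar' class (label 1) on the given pairs is >= min_precision.

    Hypothetical helper: pairs with distance <= threshold count as similar.
    Returns None if no threshold reaches the requested precision.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(distances, kind="stable")
    sorted_d = distances[order]
    is_pos = labels[order] == 1
    # Precision when the k+1 closest pairs are predicted similar.
    precision = np.cumsum(is_pos) / np.arange(1, len(sorted_d) + 1)
    # With tied distances, only the last pair of each tie group
    # corresponds to an achievable threshold.
    last_of_group = np.append(sorted_d[1:] != sorted_d[:-1], True)
    ok = np.flatnonzero((precision >= min_precision) & last_of_group)
    if ok.size == 0:
        return None
    return sorted_d[ok[-1]]


def predict_pairs(distances, threshold):
    """Return +1 for pairs at or below the threshold, -1 otherwise."""
    return np.where(np.asarray(distances) <= threshold, 1, -1)
```

Choosing the largest admissible threshold favours recall among the thresholds that meet the precision target; other trade-offs are equally possible.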

bellet · 2018-05-29T07:15:57Z

metric_learn/base_metric.py

+class _PairsClassifierMixin:
+
+  def predict(self, pairs):
+    """Predicts the learned similarity between input pairs.


should be metric instead of similarity here

Yes indeed, thanks

bellet · 2018-05-29T07:16:59Z

metric_learn/base_metric.py

+class _QuadrupletsClassifierMixin:
+
+  def predict(self, quadruplets):
+    """Predicts differences between sample similarities in input quadruplets.


Yes, thanks

bellet · 2018-05-29T07:24:22Z

test/test_weakly_supervised.py

+
+
+def build_pairs():
+  # test that you can do cross validation on a ConstrainedDataset with


no ConstrainedDataset anymore. also X_constrained should be renamed (this is a set of pairs)

Yes, thanks

As some tests are parameterized to work for both pairs and quadruplets, I will rename these to tuples in the tests, but keep pairs and quadruplets in the build_pairs and build_quadruplets functions that initialize the data.

bellet · 2018-05-29T07:24:28Z

test/test_weakly_supervised.py

+
+
+def build_quadruplets():
+  # test that you can do cross validation on a ConstrainedDataset with


perimosocordiae · 2018-05-31T02:27:51Z

metric_learn/base_metric.py

+    return (np.sqrt(np.sum(similar_diffs.dot(self.metric()) *
+                           similar_diffs, axis=1)) -
+            np.sqrt(np.sum(dissimilar_diffs.dot(self.metric()) *
+                           dissimilar_diffs, axis=1)))


This pattern, distance under some metric, seems like it should be factored out.

Yes indeed. The function will call score_pairs, which returns the learned metric between points. score_pairs will be inherited from BaseMetricLearner and implemented through ExplicitMixin (a mixin for all learners that can embed data), so it will be implemented as the Euclidean distance between embeddings.

(this should ultimately be in the Mahalanobis Mixin)
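For illustration, the repeated "distance under some metric" expression in the hunk above could be factored out roughly as below. This is only a sketch: the helper names `mahalanobis_distances` and `quadruplets_decision` are hypothetical, and it assumes that the first two points of each quadruplet form the similar pair and the last two the dissimilar pair, which the hunk does not show.

```python
import numpy as np


def mahalanobis_distances(diffs, M):
    """Return sqrt(d^T M d) for each row d of `diffs` (hypothetical helper)."""
    return np.sqrt(np.sum(diffs.dot(M) * diffs, axis=1))


def quadruplets_decision(quadruplets, M):
    """Distance of the first pair minus distance of the second pair.

    `quadruplets` has shape (n_quadruplets, 4, n_features); the pairing
    of points below is an assumption made for this sketch.
    """
    similar_diffs = quadruplets[:, 0, :] - quadruplets[:, 1, :]
    dissimilar_diffs = quadruplets[:, 2, :] - quadruplets[:, 3, :]
    return (mahalanobis_distances(similar_diffs, M)
            - mahalanobis_distances(dissimilar_diffs, M))
```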

perimosocordiae · 2018-05-31T02:29:13Z

metric_learn/base_metric.py

+      The quadruplets score.
+    """
+    predicted_sign = self.decision_function(quadruplets) < 0
+    return np.sum(predicted_sign) / predicted_sign.shape[0]


Why not np.mean(np.sign(...)) here?

Much cleaner indeed, thanks !
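As a side note on the simplification discussed here, a small check with made-up decision values shows how the candidate expressions relate. Taking the mean of the boolean mask gives the same value as the original sum-and-divide, whereas the mean of `np.sign` lies in [-1, 1] and only matches after rescaling (and only when no decision value is exactly zero). How the PR finally wrote the expression is not shown in this thread.

```python
import numpy as np

# Made-up decision values, for illustration only.
decision = np.array([-0.5, 0.2, -1.3, -0.1])

# Original expression from the hunk: fraction of negative decisions.
predicted_sign = decision < 0
score_original = np.sum(predicted_sign) / predicted_sign.shape[0]

# Mean of the boolean mask: same value, shorter.
score_mean = np.mean(predicted_sign)

# Mean of the signs lies in [-1, 1]; rescale it to recover the fraction.
mean_of_signs = np.mean(np.sign(decision))
score_from_signs = (1 - mean_of_signs) / 2
```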

wdevazelhes · 2018-06-05T09:25:12Z

> maybe good to test warning/errors in the check functions? such as wrong labels (not -1/1), etc

Yes indeed, I will add it to the TODO in issue #91

> predict function for pairs: later we should think about how to implement a threshold-based predict, without fixing the threshold in advance but tuning it automatically on the train set to achieve desired precision

Yes, it is in the TODO

> It looks like we are losing the ability of weakly supervised algorithms to be used to transform the data, but I guess this will be fixed in the next PR introducing a Mahalanobis Mixin with an embed method

Yes, indeed, the abstract method will be created in ExplicitMixin and then implemented in MahalanobisMixin. However, I wonder whether we could postpone ExplicitMixin until there are metric learners that are not explicit, and for now implement embed and score_pairs directly in MahalanobisMixin.

…and scikit-learn-contrib#95 (review)
- replace similarity by metric
- replace constrained dataset by pairs/quadruplets
- simplify score on quadruplets expression
- replace ``X_constrained`` in tests by pairs/quadruplets/tuples

bellet · 2018-06-05T12:24:58Z

Yes, one possibility is to implement only a Mahalanobis Mixin for now (since all current algorithms fall in this category)
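To make the idea concrete, here is a minimal sketch of what implementing embed and score_pairs directly in a Mahalanobis mixin could look like. It is not the code from this PR or from the follow-up one: the class names `MahalanobisMixinSketch` and `ToyLearner` and the attribute `L_` are hypothetical, and the sketch assumes the learner stores a linear map `L_` such that the learned metric is `M = L_.T @ L_`.

```python
import numpy as np


class MahalanobisMixinSketch:
    """Illustrative only: assumes a fitted attribute `L_` (hypothetical)
    holding a linear map of shape (n_components, n_features)."""

    def embed(self, X):
        # Map points of shape (n_samples, n_features) into the learned space.
        return np.asarray(X).dot(self.L_.T)

    def score_pairs(self, pairs):
        # `pairs` has shape (n_pairs, 2, n_features).
        pairs = np.asarray(pairs)
        diffs = self.embed(pairs[:, 0, :]) - self.embed(pairs[:, 1, :])
        # Euclidean distance between the two embeddings of each pair.
        return np.sqrt(np.sum(diffs ** 2, axis=1))


class ToyLearner(MahalanobisMixinSketch):
    """Stand-in for a fitted learner; it just stores a given linear map."""

    def __init__(self, L):
        self.L_ = np.asarray(L, dtype=float)
```

Under this assumption, the Euclidean distance between embeddings equals the Mahalanobis distance `sqrt(d^T M d)` computed in the earlier hunk, so the decision functions could rely on score_pairs alone.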

Add tests

776ab91

wdevazelhes mentioned this pull request May 22, 2018

New API to be more compatible with scikit-learn #91

Closed

7 tasks

William de Vazelhes added 4 commits May 24, 2018 11:50

fix pep8 errors and unused imports

237d467

let the transformer function inside BaseMetricLearner

c124ee6

Merge branch 'new_api_design' into feat/api_prediction

2dae03e

# Conflicts: # metric_learn/sdml.py

FIX move docstrings from _fit to fit

a70d1a8

wdevazelhes changed the title ~~[WIP] New API should allow prediction functions and scoring~~ [MRG] New API should allow prediction functions and scoring May 25, 2018

wdevazelhes requested review from perimosocordiae, bellet and nvauquie May 25, 2018 12:20

bellet approved these changes May 29, 2018

perimosocordiae reviewed May 31, 2018

wdevazelhes mentioned this pull request Jun 6, 2018

[MRG] Create new Mahalanobis mixin #96

Merged

7 tasks

wdevazelhes merged commit 24b0def into scikit-learn-contrib:new_api_design Jun 8, 2018

wdevazelhes deleted the feat/api_prediction branch August 22, 2018 06:50
