[MRG] New api design #139

wdevazelhes · 2018-12-21T09:03:34Z

This PR merges new_api_design into master, adding the new API to metric learn as well as a minimal documentation

Achieves #91
Also addresses #123

TODO:

Change README ?
~~- [ ] fix pep8 erros comparing (in the diff with master)~~
~~- [ ]update the notebook metric_plotting.ipynb with the new API~~
- [ ] take a second look at methods that were modified to prevent using self.X_ (and therefore have a new argument X) (ex: RCA._check_dimension) some of them could be static methods. It could be good to explicitly make them so (maybe just file an issue for that and solve after not to mix with this PR)

… Weakly Supervised Algorithms.

Basically these are the tests from PR #85, but reformatted to use pytest, and formed tuples instead of ConstrainedDatasets.

- Make PairsClassifierMixin and QuadrupletsClassifierMixin classes, to implement scoring functions - Implement a new API for supervised wrappers of weakly supervised learning estimators (through the use of base classes, (ex: BaseMMC), from which inherit child classes (ex: MMC and MMC_Supervised) (which is the same idea as in PR #85 - Delete tests that use tuples learners as transformers (as we do not want to support this behaviour anymore: it is too complicated to allow such different input types (tuples or points) for the same estimator

[MRG + 1] Allow already formed tuples as an input.

# Conflicts: # metric_learn/sdml.py

- replace similarity by metric - replace constrained dataset by pairs/quadruplets - simplify score on quadruplets expression - replace ``X_constrained`` in tests by pairs/quadruplets/tuples

[MRG] New API should allow prediction functions and scoring

* WIP create MahalanobisMixin * ENH Update algorithms with Mahalanobis Mixin: - Make them inherit from Mahalanobis Mixin, and implement the metric_ property - Improve metric_ property by checking if it exists and raising the appropriate warning if not - Make tests work, by replacing metric() with metric_ * FIX: add missing import * FIX: update sklearn's function check_no_fit_attributes_set_in_init to new check_no_attributes_set_in_init" This new function was introduced through PR scikit-learn/scikit-learn#9450 in scikit-learn. It also allows to pass tests that would otherwise not pass: indeed having abstract attributes as properties threw an error. But the new test functions handles well this property inheritance. * FIX: take function ``_get_args`` from scikit-learn's PR scikit-learn/scikit-learn#9450 Indeed, in the PR this function is modified to support python 2. This should solve the CI error. * ENH: add transformer_ attribute and improve docstring * WIP: move transform() in BaseMetricLearner to transformer_from_metric() in MahalanobisMixin * WIP: refactor metric to original formulation: a function, with result computed from the transformer * WIP: make all Mahalanobis Metric Learner algorithms have transformer_ and metric() * ENH Add score_pairs function - Make MahalanobisMixin inherit from BaseMetricLearner to give a concrete implementation of score_pairs - Use score_pairs to compute more easily predict - Add docstring - TST: for every algorithm: - test that using score_pairs pairwise returns an euclidean distance matrix - test that score_pairs works for 3D arrays of several pairs as well as 2D arrays of one pair (and there returns only a scalar) - test that score_pairs always returns a finite output * TST add test on toy example for score_pairs * ENH Add embed function - add the function and docstring - use it for score_pairs - TST : - should be finite - have right output dimension - embedding should be linear - should work on a toy example * FIX fix error in slicing of quadruplets * FIX minor corrections * FIX minor corrections - remove unusual s to test functions - remove redundant parenthesis * FIX fix PEP8 errors * FIX remove possible one-sample scoring from docstring for now * REF rename n_features_out to num_dims to be more coherent with current algorithms * MAINT: Adress #96 (review) - replace embed by transform and add always the input X in calling the function - mutualize _transformer_from_metric not to be overwritten in MMC - improve test_mahalanobis_mixin.test_score_pairs_pairwise according to #96 (comment) - improve test_mahalanobis_mixin.check_is_distance_matrix - correct typos and nitpicks * ENH: Add check_tuples * FIX: fix parenthesis * ENH: put docstring of transformer_ in each metric learner * FIX: style knitpicks to uniformize transformer_ docstring with childs * FIX: make transformer_from_metric public * Address #96 (review) * FIX: fix pairwise distances check * FIX: ensure random state is set in all tests * FIX: fix test with real value to test in check_tuples * FIX: update MetricTransformer to be abstract method and have the appropriate doc * MAINT: make BaseMetricLearner and MetricTransformer abstract * MAINT: remove __init__ method from BaseMetricLearner Since it is an abstract class it already returns an error at instanciation: TypeError: Can't instantiate abstract class BaseMetricLearner with abstract methods score_pairs

* WIP create MahalanobisMixin * ENH Update algorithms with Mahalanobis Mixin: - Make them inherit from Mahalanobis Mixin, and implement the metric_ property - Improve metric_ property by checking if it exists and raising the appropriate warning if not - Make tests work, by replacing metric() with metric_ * FIX: add missing import * FIX: update sklearn's function check_no_fit_attributes_set_in_init to new check_no_attributes_set_in_init" This new function was introduced through PR scikit-learn/scikit-learn#9450 in scikit-learn. It also allows to pass tests that would otherwise not pass: indeed having abstract attributes as properties threw an error. But the new test functions handles well this property inheritance. * FIX: take function ``_get_args`` from scikit-learn's PR scikit-learn/scikit-learn#9450 Indeed, in the PR this function is modified to support python 2. This should solve the CI error. * ENH: add transformer_ attribute and improve docstring * WIP: move transform() in BaseMetricLearner to transformer_from_metric() in MahalanobisMixin * WIP: refactor metric to original formulation: a function, with result computed from the transformer * WIP: make all Mahalanobis Metric Learner algorithms have transformer_ and metric() * ENH Add score_pairs function - Make MahalanobisMixin inherit from BaseMetricLearner to give a concrete implementation of score_pairs - Use score_pairs to compute more easily predict - Add docstring - TST: for every algorithm: - test that using score_pairs pairwise returns an euclidean distance matrix - test that score_pairs works for 3D arrays of several pairs as well as 2D arrays of one pair (and there returns only a scalar) - test that score_pairs always returns a finite output * TST add test on toy example for score_pairs * ENH Add embed function - add the function and docstring - use it for score_pairs - TST : - should be finite - have right output dimension - embedding should be linear - should work on a toy example * FIX fix error in slicing of quadruplets * FIX minor corrections * FIX minor corrections - remove unusual s to test functions - remove redundant parenthesis * FIX fix PEP8 errors * FIX remove possible one-sample scoring from docstring for now * REF rename n_features_out to num_dims to be more coherent with current algorithms * MAINT: Adress #96 (review) - replace embed by transform and add always the input X in calling the function - mutualize _transformer_from_metric not to be overwritten in MMC - improve test_mahalanobis_mixin.test_score_pairs_pairwise according to #96 (comment) - improve test_mahalanobis_mixin.check_is_distance_matrix - correct typos and nitpicks * ENH: Add check_tuples * FIX: fix parenthesis * ENH: First commit adding a preprocessor * ENH: Improve check_tuples with more comments and deal better with ensure_min_features * STY: remove unexpected spaces * FIX: Raise more appropriate error message The previous error message would have said "[...], shape=(shape_of_the_2D_array_extracted_from_3D)" But it is clearer to print the shape of the actual 3D initial array (tuples in input) * FIX: fix string formatting and refactor name and context to use context in custom functions but to give name to check_array * FIX: only allow 2D if preprocessor and 3D if not preprocessor * FIX: put format arguments in the right order * MAINT: better to say the preprocessor than a preprocessor in messages * FIX: numeric should be default if NO preprocessor * FIX: preprocessor argument has to be a boolean (before this change was a callable or an array) * FIX: fix preprocessor argument passed in check_tuples in base_metric * MAINT: say a preprocessor rather than the preprocessor * DOC: fix docstring of t in check_tuples * MAINT: make error messages better by only printing presence of preprocessor if the error is because not 2D or 3D shape * TST: Add tests for check_tuples * TST: simplify tests by removing the test for messages with the estimator name, but incorporating it in all other tests through parametrization * STY: remove unnecessary parenthesis * FIX: put back else statement that probably was wrongfully merged * TST: add tests for weakly supervised estimators and preprocessor that fetches indices * TST: add tests for preprocessor * FIX: remove redundant metric and transformer function, wrongly merged * MAINT: rename format_input into preprocess_tuples and input into tuples * MAINT: fixes and enhancements - fix the format_input previous incomplete refactoring - mututalize check_tuples for Weakly Supervised Algorithms - fix test of string representations * MAINT: mutualize check_tuples * MAINT: refactor SimplePreprocessor into ArrayIndexer * MAINT: improve check_tuples and tests * TST: add random seed for _Supervised classes * TST: Adapt test pipeline - use random state for _Supervised classes - call transform only for pipeline with a TransformerMixin * TST: fix test_progress_message_preprocessor_tuples by making func return an np.array * Remove deprecated cross_validation import and put model_selection instead * WIP replace checks by unique check_input function * Fixes some tests: - fixes dtype checks and conversion by ensuring the checks and conversions are made on the preprocessed array and not the input array (which can be an array of indices) - fix tests that were failing due to the wrong error message * TST: Cherry pick from new sklearn version ac0e230 [MRG] Quick fix of failed tests due to new scikit-learn version (0.20.0) (#130) * TST: Quick fix of failed tests due to new scikit-learn version (0.20.0) * FIX update values to pass test * FIX: get changes from master to pass test iris for NCA * FIX fix tests that were failing due to the error message FIX check_tuples at the end and only at the end, because the number of tuples should not be modified from the beginning, and we want to check them also at the end * TST: fix test_check_input_invalid_t that changed since we test t at the end now * TST fix NCA's iris test taking code from master * FIX fix tests: - use check_array instead of conversion to numpy array to ensure that sparse array are kept as such and not wrapped into a regular numpy array - check_array the metric with the right arguments for covariance in order not to fail if the array is a scalar (for one-feature samples) * FIX fix previous modification that removed self.X_ but was modifying the input at fit time * FIX ensure at least 2d only for checking the metric because after check_array of inputs the metric should be a scalar or an array so we only need to ensure it is an array (2D array) * STY: Fix PEP8 violations * MAINT: Refactor error messages with the help of numerical codes * MAINT: mutualize check_preprocessor and check_input for every estimator * FIX: remove format_map for python2.7 compatibility * DOC: Add docstring for check_input and fix some bugs * DOC: add docstrings * MAINT: Removing changes not related to this PR, and fixing previous probable unsuccessfull merge * STY: Fix PEP8 errors * STY: fix indent problems * Fixing docstring spaces * DOC: add preprocessor docstring when missing * STY: PEP8 fixes * MAINT: refactor the global check function into _prepare_input * FIX: fix quadruplets scoring and delete useless comments * MAINT: remove some enhancements to be coherent with previous code and simplify review * MAINT: Improve test messages * MAINT: reorganize tests * FIX: fix typo in LMNN shogun and clean todo for the equivalent code in python_LMNN * MAINT: Rename inputs and input into input_data * STY: add backticks to None * MAINT: add more detailed comment of first checks and remove old comment * MAINT: improve comments for checking num_features * MAINT: Refactor t into tuple_size * MAINT: Fix small PEP8 error * MAINT: FIX remaining t into tuple_size and replace hasattr if None by getattr with default None * MAINT: remove misplaced comment * MAINT: Put back/add docstrings for decision_function/predict * MAINT: remove unnecessary ellipsis and upadate docstring of decision_function * Add comments in LMNN for arguments useful for the shogun version that are not used in the python version * MAINT: Remove useless mock_preprocessor * MAINT: Remove useless loop * MAINT: refactor test_dict_unchanged * MAINT: remove _get_args copied from scikit-learn and replace it by an import * MAINT: Fragment check_input by extracting blocks into check_input_classic and check_input_tuples * MAINT: ensure min_samples=2 for supervised learning algorithms (we should at least have two datapoints to learn something) * ENH: Return custom error when some error is due to the preprocessor * MAINT: Refactor algorithms preprocessing steps - extract preprocessing steps in main fit function - remove self.X_ when found and replace it by X (Fixes #134) - extract the function to test collapsed pairs as _utils.check_collapsed_pairs and test it - note that the function is now not used where it was used before but the user should be responsible for having coherent input (if he wants he can use the helper function as a preprocessing step) * MAINT: finish the work of the previous commit * TST: add test for cross-validation: comparison of manual cross-val and scikit-learn cross-val * ENH: put y=None by default in LSML for better compatibility. This also allows to simplify some tests by removing the need for a separate case for lsml * ENH: add error message when type of inputs is not some expected type * TST: add test that checks that 'classic' is the default behaviour * TST: remove unnecessary conversion to vertical vector of y * FIX: remove wrong condition hasattr 'score' at top of loop * MAINT: Add comment to explain why we return twice X for build_regression and build_classification * ENH: improve test for preprocessor and return error message if the given preprocessor has the bad type * FIX: fix wrong type_of_inputs in a test * FIX: deal with the case where preprocessor is None * WIP refactor build_dataset * MAINT: refactor bool preprocessor to with_preprocessor * FIX: fix build_pairs and build_quadruplets because 'only named arguments should follow expression' * STY: fix PEP8 error * MAINT: mututalize test_same_with_or_without_preprocessor_tuples and test_same_with_or_without_preprocessor_classic * TST: give better names in test_same_with_or_without_preprocessor * MAINT: refactor list_estimators into metric_learners * TST: uniformize names input_data - tuples, labels - y * FIX: fix build_pairs and build_quadruplets * MAINT: remove forgotten code duplication * MAINT: address #117 (review)

* Create some text to initialize the PR * DOC: add doc outline * DOC: Add data visualisation to possible examples * Update documentation outline * Add doc from master * DOC: add beginning of doc tree * DOC: add some beginning of example to get started with the examples section using sphinx-gallery * DOC: modify gitignore to ignore auto_examples * WIP: add preprocessor section and some section about weakly supervised learners and copy the previous content to the different sections * A few style improvements (text wraping to line limit, better references formatting ...) * Address #133 (review) * raise instead of return * Fix quickstart example * Emphasize scikit-learn compatibility * Update introduction with new methods * address #133 (comment) * explain what happens when preprocessor=None * Precisions in doc about the input accepted by the preprocessor * address #133 (comment) * Better formulation of sentences * change title formatting in index * Fix references and some numering issue * Reformat link to preprocessor * Fix automatic link to algorithms for the supervised section * Reformatting and adding examples about supervised version in the end of the weakly supervised version * add precisions in the intro * add precisions for score_pairs in the intro * Change examples for weakly supervised section * add _Supervised section in Supervised section * change examples in weakly supervised section * fix empty module contents * rename sandwich.py into plot_sandwich.py to be found by sphinx-gallery

# Conflicts: # doc/index.rst # metric_learn/itml.py # metric_learn/lmnn.py # metric_learn/lsml.py # metric_learn/mlkr.py # metric_learn/mmc.py # metric_learn/nca.py # metric_learn/rca.py # metric_learn/sdml.py # test/metric_learn_test.py # test/test_base_metric.py

bellet · 2018-12-21T12:59:38Z

doc/supervised.rst

+    X = iris_data['data']
+    Y = iris_data['target']
+
+    nca = NCA(max_iter=1000, learning_rate=0.01)


remove learning_rate

Indeed, thanks !
Done

bellet · 2018-12-21T13:00:24Z

test/test_transformer_metric_conversion.py

    assert_array_almost_equal(L.T.dot(L), lmnn.metric())

  def test_sdml_supervised(self):
    seed = np.random.RandomState(1234)
    sdml = SDML_Supervised(num_constraints=1500)
    sdml.fit(self.X, self.y, random_state=seed)
-    L = sdml.transformer()
+    L = sdml.transformer_
    assert_array_almost_equal(L.T.dot(L), sdml.metric())

  def test_nca(self):
    n = self.X.shape[0]
    nca = NCA(max_iter=(100000//n), learning_rate=0.01)


wdevazelhes · 2018-12-21T13:55:01Z

I'll also update the README.rst here (for instance removing the paragraph about transformer())

bellet · 2018-12-21T14:12:49Z

Should we update the README, or do these instructions refer to the last release (install from PyPi)?

wdevazelhes · 2018-12-21T14:42:26Z

Should we update the README, or do these instructions refer to the last release (install from PyPi)?

Ah yes that's right, they seem to refer to the last release
Should we keep instructions on using the package (algorithms etc) in the README though ? Because this seems to be already documented in the documentation (by the way in the doc we say the distance is (x−y)M(x−y) whereas in the README we say it's sqrt((x−y)M(x−y)). I guess the real distance is with sqrt so we should modify in the docs right ? )
Because if we make changes in algorithms we'll have to always update this page, but do it just after the release, it seems a bit weird

wdevazelhes · 2018-12-21T14:49:20Z

I just added a TODO in the description of the PR

bellet · 2018-12-21T15:22:52Z

I agree that we could probably remove the detailed usage instructions on the README and only keep the link to the doc
And yes, the proper distance is with the square root and this is also what is returned by score_pairs so we should change the doc accordingly

bellet · 2018-12-21T15:25:59Z

About self.X_ / self.X: if you don't do it here, make sure you remove #134 from the list of issues addressed by this PR

perimosocordiae · 2018-12-30T19:40:40Z

I'd prefer to keep this PR limited as much as possible to the merge, and then fix the remaining minor issues (pep8 stuff, jupyter notebook, static methods, etc) separately. The idea is that those will be small and easy to review, whereas this PR is pretty large already.

bellet · 2019-01-02T10:56:55Z

Yes that's true, reviewing additional things here will be more difficult
@wdevazelhes could you perhaps simply update the README and move the rest to new issues that you can deal with separately? So we can merge this quickly and start making progress on small things and on improving the doc in separate PRs

wdevazelhes · 2019-01-02T13:18:33Z

I agree, I just pushed the new README and updated the introduction with the square root, and created the issues

wdevazelhes · 2019-01-02T13:31:06Z

About self.X_ / self.X: if you don't do it here, make sure you remove #134 from the list of issues addressed by this PR

I replaced self.X_ by self.X here, but there still remains some variables that maybe do not need to be stored after fit, like for instance self.labels_ in LMNN, or self.vab_ in LSML, so I'll remove it from listed issues, and change the title to a more general title (not just "avoid storing X" but "avoid storing unnecessary variables")

bellet · 2019-01-02T14:19:04Z

Thanks, I will merge this

…te in the merge scikit-learn-contrib#139

* API: remove num_labeled parameter * DEP: Add deprecation warnings for num_labels * MAINT: put deprecation for version 0.5.0 * Revert "MAINT: put deprecation for version 0.5.0" This reverts commit 8727c44. * Revert "Merge remote-tracking branch 'origin/master' into fix/remove_num_labeled_parameter" This reverts commit 944bb3e, reversing changes made to 8727c44. * Revert "Revert "MAINT: put deprecation for version 0.5.0"" This reverts commit bc1eb32. * FIX string representation test wrongly merged * git revert d6bd0d4 * STY fix pep8 errors * STY: fix docstring indentation * FIX remove tests from NCA that are dealt with in #143 * FIX remove nca deprecation test because we remove totally learning rate in the merge #139 * FIX update version * Remove the use of random_subset

William de Vazelhes and others added 26 commits May 14, 2018 15:55

Update API to be compatible with scikit-learn by taking 3D inputs for…

9f5c998

… Weakly Supervised Algorithms.

Merge branch 'master' into new_api_fresh_start

3acf31a

find unique rows in a way compatible with numpy 1.12.1

a7e4807

Update docstring for new api

903f174

Add tests

776ab91

Basically these are the tests from PR #85, but reformatted to use pytest, and formed tuples instead of ConstrainedDatasets.

fix pep8 errors and unused imports

237d467

let the transformer function inside BaseMetricLearner

c124ee6

Change labels y to be +1/-1 (cf. comment #92 (comment)).

374a851

update docstrings with change for +1/-1 labels (see #92 (comment))

b4bdec4

Merge pull request #92 from wdevazelhes/new_api_fresh_start

13f1535

[MRG + 1] Allow already formed tuples as an input.

Merge branch 'new_api_design' into feat/api_prediction

2dae03e

# Conflicts: # metric_learn/sdml.py

FIX move docstrings from _fit to fit

a70d1a8

FIX: corrections according to reviews #95 (review) and #95 (review)

b741a9e

- replace similarity by metric - replace constrained dataset by pairs/quadruplets - simplify score on quadruplets expression - replace ``X_constrained`` in tests by pairs/quadruplets/tuples

Merge pull request #95 from wdevazelhes/feat/api_prediction

24b0def

[MRG] New API should allow prediction functions and scoring

solve little glitch in merging

f5d9c6b

API: remove deprecated learning rate for v 0.5.0

e31d58e

MAINT: avoid to store unnecessary variables in Covariance

dd18993

FIX LMNN wrongly merged

780dd01

FIX merge

02c50c7

FIX merge

8848240

FIX merge

a9610b3

wdevazelhes changed the title ~~New api design~~ [WIP] New api design Dec 21, 2018

William de Vazelhes added 2 commits December 21, 2018 11:58

FIX merge

603e673

FIX merge

f545a5d

bellet reviewed Dec 21, 2018

View reviewed changes

FIX: remove deprecated learning rate

c08eb3c

MAINT: clean some imports

b43011a

Remove detailed usage from README and update introduction.rst with sqrt

9e13301

This was referenced Jan 2, 2019

Fix some PEP8 errors #140

Closed

Update metric_plotting.ipynb with the new API #141

Closed

Refactor static methods #142

Open

wdevazelhes mentioned this pull request Jan 2, 2019

Avoid storing any unnecessary variable after fit #134

Closed

bellet changed the title ~~[WIP] New api design~~ [MRG] New api design Jan 2, 2019

bellet merged commit 23d0746 into master Jan 2, 2019

This was referenced Jan 2, 2019

No unified function to check the inputs #123

Closed

New API to be more compatible with scikit-learn #91

Closed

wdevazelhes mentioned this pull request Jan 2, 2019

[MRG] API: remove num_labeled parameter #119

Merged

wdevazelhes pushed a commit to wdevazelhes/metric-learn that referenced this pull request Jan 2, 2019

FIX remove nca deprecation test because we remove totally learning ra…

3a200cd

…te in the merge scikit-learn-contrib#139

wdevazelhes mentioned this pull request Jan 2, 2019

[MRG] FIX put back nca's tests #143

Merged

bellet deleted the new_api_design branch January 2, 2019 17:04

bellet added this to the v0.5.0 milestone Jan 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MRG] New api design #139

[MRG] New api design #139

wdevazelhes commented Dec 21, 2018 •

edited

Loading

bellet Dec 21, 2018

wdevazelhes Dec 21, 2018

bellet Dec 21, 2018

wdevazelhes Dec 21, 2018

wdevazelhes commented Dec 21, 2018

bellet commented Dec 21, 2018 •

edited

Loading

wdevazelhes commented Dec 21, 2018

wdevazelhes commented Dec 21, 2018

bellet commented Dec 21, 2018

bellet commented Dec 21, 2018 •

edited

Loading

perimosocordiae commented Dec 30, 2018

bellet commented Jan 2, 2019

wdevazelhes commented Jan 2, 2019 •

edited

Loading

wdevazelhes commented Jan 2, 2019

bellet commented Jan 2, 2019

[MRG] New api design #139

[MRG] New api design #139

Conversation

wdevazelhes commented Dec 21, 2018 • edited Loading

bellet Dec 21, 2018

Choose a reason for hiding this comment

wdevazelhes Dec 21, 2018

Choose a reason for hiding this comment

bellet Dec 21, 2018

Choose a reason for hiding this comment

wdevazelhes Dec 21, 2018

Choose a reason for hiding this comment

wdevazelhes commented Dec 21, 2018

bellet commented Dec 21, 2018 • edited Loading

wdevazelhes commented Dec 21, 2018

wdevazelhes commented Dec 21, 2018

bellet commented Dec 21, 2018

bellet commented Dec 21, 2018 • edited Loading

perimosocordiae commented Dec 30, 2018

bellet commented Jan 2, 2019

wdevazelhes commented Jan 2, 2019 • edited Loading

wdevazelhes commented Jan 2, 2019

bellet commented Jan 2, 2019

wdevazelhes commented Dec 21, 2018 •

edited

Loading

bellet commented Dec 21, 2018 •

edited

Loading

bellet commented Dec 21, 2018 •

edited

Loading

wdevazelhes commented Jan 2, 2019 •

edited

Loading