[MRG] New api design #139

Merged: 31 commits, Jan 2, 2019
9f5c998
Update API to be compatible with scikit-learn by taking 3D inputs for…
May 14, 2018
3acf31a
Merge branch 'master' into new_api_fresh_start
May 18, 2018
a7e4807
find unique rows in a way compatible with numpy 1.12.1
May 22, 2018
903f174
Update docstring for new api
May 22, 2018
776ab91
Add tests
May 22, 2018
106cbd2
Implement scoring functions (and make tests work):
May 24, 2018
237d467
fix pep8 errors and unused imports
May 24, 2018
c124ee6
let the transformer function inside BaseMetricLearner
May 24, 2018
374a851
Change labels y to be +1/-1 (cf. comment https://github.com/metric-le…
May 24, 2018
b4bdec4
update docstrings with change for +1/-1 labels (see https://github.co…
May 24, 2018
13f1535
Merge pull request #92 from wdevazelhes/new_api_fresh_start
wdevazelhes May 25, 2018
2dae03e
Merge branch 'new_api_design' into feat/api_prediction
May 25, 2018
a70d1a8
FIX move docstrings from _fit to fit
May 25, 2018
b741a9e
FIX: corrections according to reviews https://github.com/metric-learn…
Jun 5, 2018
24b0def
Merge pull request #95 from wdevazelhes/feat/api_prediction
wdevazelhes Jun 8, 2018
e4685b1
[MRG] Create new Mahalanobis mixin (#96)
wdevazelhes Sep 4, 2018
010b34a
[MRG] Add preprocessor option (#117)
wdevazelhes Dec 14, 2018
073451a
Add documentation for the new API (#133)
wdevazelhes Dec 20, 2018
d34867d
Merge branch 'master' into new_api_design
Dec 21, 2018
f5d9c6b
solve little glitch in merging
Dec 21, 2018
e31d58e
API: remove deprecated learning rate for v 0.5.0
Dec 21, 2018
dd18993
MAINT: avoid to store unnecessary variables in Covariance
Dec 21, 2018
780dd01
FIX LMNN wrongly merged
Dec 21, 2018
02c50c7
FIX merge
Dec 21, 2018
8848240
FIX merge
Dec 21, 2018
a9610b3
FIX merge
Dec 21, 2018
603e673
FIX merge
Dec 21, 2018
f545a5d
FIX merge
Dec 21, 2018
c08eb3c
FIX: remove deprecated learning rate
Dec 21, 2018
b43011a
MAINT: clean some imports
Dec 21, 2018
9e13301
Remove detailed usage from README and update introduction.rst with sqrt
Jan 2, 2019
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ dist/
.coverage
htmlcov/
.cache/
doc/auto_examples/*
23 changes: 2 additions & 21 deletions README.rst
@@ -34,27 +34,8 @@ package installed).

**Usage**

For full usage examples, see the `sphinx documentation`_.

Each metric is a subclass of ``BaseMetricLearner``, which provides
default implementations for the methods ``metric``, ``transformer``, and
``transform``. Subclasses must provide an implementation for either
``metric`` or ``transformer``.

For an instance of a metric learner named ``foo`` learning from a set of
``d``-dimensional points, ``foo.metric()`` returns a ``d x d``
matrix ``M`` such that the distance between vectors ``x`` and ``y`` is
expressed ``sqrt((x-y).dot(M).dot(x-y))``.
Using scipy's ``pdist`` function, this would look like
``pdist(X, metric='mahalanobis', VI=foo.metric())``.

In the same scenario, ``foo.transformer()`` returns a ``d x d``
matrix ``L`` such that a vector ``x`` can be represented in the learned
space as the vector ``x.dot(L.T)``.

For convenience, the function ``foo.transform(X)`` is provided for
converting a matrix of points (``X``) into the learned space, in which
standard Euclidean distance can be used.
See the `sphinx documentation`_ for full documentation about installation, API,
usage, and examples.

**Notes**

2 changes: 1 addition & 1 deletion bench/benchmarks/iris.py
@@ -10,7 +10,7 @@
'LMNN': metric_learn.LMNN(k=5, learn_rate=1e-6, verbose=False),
'LSML_Supervised': metric_learn.LSML_Supervised(num_constraints=200),
'MLKR': metric_learn.MLKR(),
'NCA': metric_learn.NCA(max_iter=700, learning_rate=0.01, num_dims=2),
'NCA': metric_learn.NCA(max_iter=700, num_dims=2),
'RCA_Supervised': metric_learn.RCA_Supervised(dim=2, num_chunks=30,
chunk_size=2),
'SDML_Supervised': metric_learn.SDML_Supervised(num_constraints=1500),
4 changes: 4 additions & 0 deletions doc/conf.py
@@ -7,6 +7,7 @@
'sphinx.ext.viewcode',
'sphinx.ext.mathjax',
'numpydoc',
'sphinx_gallery.gen_gallery'
]

templates_path = ['_templates']
@@ -31,3 +32,6 @@
html_static_path = ['_static']
htmlhelp_basename = 'metric-learndoc'

# Option to only need single backticks to refer to symbols
default_role = 'any'

42 changes: 42 additions & 0 deletions doc/getting_started.rst
@@ -0,0 +1,42 @@
###############
Getting started
###############

Installation and Setup
======================

Run ``pip install metric-learn`` to download and install from PyPI.

Alternately, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.

**Dependencies**

- Python 2.7+, 3.4+
- numpy, scipy, scikit-learn
- (for running the examples only: matplotlib)

**Notes**

If a recent version of the Shogun Python modular (``modshogun``) library
is available, the LMNN implementation will use the fast C++ version from
there. The two implementations differ slightly, and the C++ version is
more complete.


Quick start
===========

This example loads the iris dataset, and evaluates a k-nearest neighbors
algorithm on an embedding space learned with `NCA`.

>>> from metric_learn import NCA
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.pipeline import make_pipeline
>>>
>>> X, y = load_iris(return_X_y=True)
>>> clf = make_pipeline(NCA(), KNeighborsClassifier())
>>> cross_val_score(clf, X, y)
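The same pipeline pattern works with any scikit-learn-compatible transformer. As a sanity check that runs with stock scikit-learn alone, here is a hedged sketch where ``PCA`` stands in for ``NCA`` (an assumption for illustration only, so the snippet does not depend on metric-learn being installed):

```python
# Hypothetical sanity check: PCA stands in for NCA so this runs with
# scikit-learn alone; swap NCA back in once metric-learn is installed,
# the pipeline pattern is identical.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
clf = make_pipeline(PCA(n_components=2), KNeighborsClassifier())
scores = cross_val_score(clf, X, y, cv=5)  # one accuracy score per fold
assert len(scores) == 5
assert all(0.0 <= s <= 1.0 for s in scores)
```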
96 changes: 12 additions & 84 deletions doc/index.rst
@@ -2,103 +2,31 @@ metric-learn: Metric Learning in Python
=======================================
|License| |PyPI version|

Distance metrics are widely used in the machine learning literature.
Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
the domain.
Distance metric learning (or simply, metric learning) is the sub-field of
machine learning dedicated to automatically constructing optimal distance
metrics.

This package contains efficient Python implementations of several popular
metric learning algorithms.

Supervised Algorithms
---------------------
Supervised metric learning algorithms take as inputs points `X` and target
labels `y`, and learn a distance matrix that makes points from the same class
(for classification) or with close target value (for regression) close to
each other, and points from different classes or with distant target values
far away from each other.
Welcome to metric-learn's documentation!
----------------------------------------

.. toctree::
:maxdepth: 1

metric_learn.covariance
metric_learn.lmnn
metric_learn.nca
metric_learn.lfda
metric_learn.mlkr
:maxdepth: 2

Weakly-Supervised Algorithms
----------------------------
Weakly supervised algorithms work on weaker information about the data points
than supervised algorithms. Rather than labeled points, they take as input
similarity judgments on tuples of data points, for instance pairs of similar
and dissimilar points. Refer to the documentation of each algorithm for its
particular form of input data.
getting_started

.. toctree::
:maxdepth: 1

metric_learn.itml
metric_learn.lsml
metric_learn.sdml
metric_learn.rca
metric_learn.mmc

Note that each weakly-supervised algorithm has a supervised version of the form
`*_Supervised` where similarity constraints are generated from
the label information and passed to the underlying algorithm.

Each metric learning algorithm supports the following methods:

- ``fit(...)``, which learns the model.
- ``transformer()``, which returns a transformation matrix
:math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
:math:`D`-dimensional learned metric space :math:`X L^{\top}`,
in which standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation.
- ``metric()``, which returns a Mahalanobis matrix
:math:`M = L^{\top}L` such that distance between vectors ``x`` and
``y`` can be computed as :math:`\left(x-y\right)M\left(x-y\right)`.


Installation and Setup
======================

Run ``pip install metric-learn`` to download and install from PyPI.
:maxdepth: 2

Alternately, download the source repository and run:
user_guide

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.

**Dependencies**

- Python 2.7+, 3.4+
- numpy, scipy, scikit-learn
- (for running the examples only: matplotlib)
.. toctree::
:maxdepth: 2

**Notes**
Package Overview <metric_learn>

If a recent version of the Shogun Python modular (``modshogun``) library
is available, the LMNN implementation will use the fast C++ version from
there. The two implementations differ slightly, and the C++ version is
more complete.
.. toctree::
:maxdepth: 2

Navigation
----------
auto_examples/index

:ref:`genindex` | :ref:`modindex` | :ref:`search`

.. toctree::
:maxdepth: 4
:hidden:

Package Overview <metric_learn>

.. |PyPI version| image:: https://badge.fury.io/py/metric-learn.svg
:target: http://badge.fury.io/py/metric-learn
.. |License| image:: http://img.shields.io/:license-mit-blue.svg?style=flat
38 changes: 38 additions & 0 deletions doc/introduction.rst
@@ -0,0 +1,38 @@
============
Introduction
============

Distance metrics are widely used in the machine learning literature.
Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
the domain.
Distance metric learning (or simply, metric learning) is the sub-field of
machine learning dedicated to automatically constructing task-specific distance
metrics from (weakly) supervised data.
The learned distance metric often corresponds to a Euclidean distance in a new
embedding space, hence distance metric learning can be seen as a form of
representation learning.

This package contains efficient Python implementations of several popular
metric learning algorithms, compatible with scikit-learn. This makes it
possible to use all the scikit-learn routines for pipelining and model
selection with metric learning algorithms.


Currently, each metric learning algorithm supports the following methods:

- ``fit(...)``, which learns the model.
- ``metric()``, which returns a Mahalanobis matrix
  :math:`M = L^{\top}L` such that the distance between vectors ``x`` and
``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
- ``transformer_from_metric(metric)``, which returns a transformation matrix
:math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
:math:`D`-dimensional learned metric space :math:`X L^{\top}`,
in which standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation.
- ``score_pairs(pairs)``, which returns the distance between pairs of
points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
2, n_features)``, or it can be a 2D array-like of pairs indicators of
shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
details).
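The identity tying ``metric()`` to the learned transformation, namely that the Mahalanobis distance under :math:`M = L^{\top}L` equals the plain Euclidean distance after mapping points through :math:`L`, can be checked with numpy alone (a sketch independent of the library itself; the matrices here are random stand-ins, not output of any learner):

```python
import numpy as np

rng = np.random.RandomState(0)
L = rng.randn(2, 4)   # transformation matrix: D=2 output dims, d=4 input dims
M = L.T.dot(L)        # Mahalanobis matrix M = L^T L, shape (4, 4)

x, y = rng.randn(4), rng.randn(4)
d_mahalanobis = np.sqrt((x - y).dot(M).dot(x - y))
d_euclidean = np.linalg.norm(L.dot(x) - L.dot(y))  # distance in learned space
assert np.isclose(d_mahalanobis, d_euclidean)
```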
2 changes: 1 addition & 1 deletion doc/metric_learn.nca.rst
Expand Up @@ -21,7 +21,7 @@ Example Code
X = iris_data['data']
Y = iris_data['target']

nca = NCA(max_iter=1000, learning_rate=0.01)
nca = NCA(max_iter=1000)
nca.fit(X, Y)

References
12 changes: 2 additions & 10 deletions doc/metric_learn.rst
@@ -1,8 +1,8 @@
metric_learn package
====================

Submodules
----------
Module Contents
---------------

.. toctree::

@@ -16,11 +16,3 @@ Submodules
metric_learn.nca
metric_learn.rca
metric_learn.sdml

Module contents
---------------

.. automodule:: metric_learn
:members:
:undoc-members:
:show-inheritance: