diff --git a/doc/index.rst b/doc/index.rst
index 9dbcd9b0..ed3f6ccb 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,8 +2,15 @@ metric-learn: Metric Learning in Python
=======================================
|License| |PyPI version|
-Welcome to metric-learn's documentation !
------------------------------------------
+Metric-learn contains efficient Python implementations of several
+popular supervised and weakly-supervised metric learning algorithms. The API
+of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine learning in
+Python. This allows the use of all the scikit-learn routines (for pipelining,
+model selection, etc.) with metric learning algorithms.
+
+Documentation outline
+---------------------
.. toctree::
:maxdepth: 2
diff --git a/doc/introduction.rst b/doc/introduction.rst
index 9f2b4165..f0195c83 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -1,38 +1,213 @@
-============
-Introduction
-============
-
-Distance metrics are widely used in the machine learning literature.
-Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
-the domain.
-Distance metric learning (or simply, metric learning) is the sub-field of
-machine learning dedicated to automatically construct task-specific distance
-metrics from (weakly) supervised data.
-The learned distance metric often corresponds to a Euclidean distance in a new
-embedding space, hence distance metric learning can be seen as a form of
-representation learning.
-
-This package contains a efficient Python implementations of several popular
-metric learning algorithms, compatible with scikit-learn. This allows to use
-all the scikit-learn routines for pipelining and model selection for
-metric learning algorithms.
-
-
-Currently, each metric learning algorithm supports the following methods:
-
-- ``fit(...)``, which learns the model.
-- ``metric()``, which returns a Mahalanobis matrix
- :math:`M = L^{\top}L` such that distance between vectors ``x`` and
- ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
-- ``transformer_from_metric(metric)``, which returns a transformation matrix
- :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
- data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
- :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
- in which standard Euclidean distances may be used.
-- ``transform(X)``, which applies the aforementioned transformation.
-- ``score_pairs(pairs)`` which returns the distance between pairs of
- points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
- 2, n_features)``, or it can be a 2D array-like of pairs indicators of
- shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
- details).
+========================
+What is Metric Learning?
+========================
+
+Many approaches in machine learning require a measure of distance between data
+points. Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
+domain. However, it is often difficult to design metrics that are well-suited
+to the particular data and task of interest.
+
+Distance metric learning (or simply, metric learning) aims at
+automatically constructing task-specific distance metrics from (weakly)
+supervised data. The learned distance metric can then be used to perform
+various tasks (e.g., k-NN classification, clustering, information retrieval).
+
+Problem Setting
+===============
+
+Metric learning problems fall into two main categories depending on the type
+of supervision available about the training data:
+
+- :doc:`Supervised learning <supervised>`: the algorithm has access to
+ a set of data points, each of them belonging to a class (label) as in a
+ standard classification problem.
+ Broadly speaking, the goal in this setting is to learn a distance metric
+ that puts points with the same label close together while pushing away
+ points with different labels.
+- :doc:`Weakly supervised learning <weakly_supervised>`: the
+ algorithm has access to a set of data points with supervision only
+ at the tuple level (typically pairs, triplets, or quadruplets of
+ data points). A classic example of such weaker supervision is a set of
+ positive and negative pairs: in this case, the goal is to learn a distance
+ metric that puts positive pairs close together and negative pairs far away.
+
+Based on the above (weakly) supervised data, the metric learning problem is
+generally formulated as an optimization problem where one seeks to find the
+parameters of a distance function that optimize some objective function
+measuring the agreement with the training data.
+
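+For instance, with supervision at the level of pairs, the training data can be
+represented as a 3D array of point pairs together with a vector of pair labels
+(a minimal illustrative sketch; the exact input format expected by each
+estimator is described in its own documentation):
+
+.. code-block:: python
+
+   import numpy as np
+
+   # Four pairs of 3-dimensional points: shape (n_pairs, 2, n_features).
+   pairs = np.array([[[1.2, 7.5, 1.0], [1.3, 1.5, 0.5]],
+                     [[6.4, 2.6, 0.9], [6.2, 9.7, 1.1]],
+                     [[1.3, 4.5, 1.2], [3.2, 4.6, 1.1]],
+                     [[6.2, 5.5, 0.8], [5.4, 5.4, 0.7]]])
+
+   # +1 marks a positive (similar) pair, -1 a negative (dissimilar) pair.
+   y_pairs = np.array([1, -1, 1, -1])
+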
+Mahalanobis Distances
+=====================
+
+In the metric-learn package, all algorithms currently implemented learn
+so-called Mahalanobis distances. Given a real-valued parameter matrix
+:math:`L` of shape ``(num_dims, n_features)``, where ``n_features`` is the
+number of features describing the data, the Mahalanobis distance associated
+with :math:`L` is defined as follows:
+
+.. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}
+
+In other words, a Mahalanobis distance is a Euclidean distance after a
+linear transformation of the feature space defined by :math:`L` (taking
+:math:`L` to be the identity matrix recovers the standard Euclidean distance).
+Mahalanobis distance metric learning can thus be seen as learning a new
+embedding space of dimension ``num_dims``. Note that when ``num_dims`` is
+smaller than ``n_features``, this achieves dimensionality reduction.
+
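+This definition translates directly into NumPy terms (a minimal sketch,
+independent of the metric-learn API; the values of ``L``, ``x`` and
+``x_prime`` below are made up):
+
+.. code-block:: python
+
+   import numpy as np
+
+   def mahalanobis_distance(x, x_prime, L):
+       """D(x, x') = sqrt((Lx - Lx')^T (Lx - Lx'))."""
+       diff = L @ x - L @ x_prime    # map both points with L, then subtract
+       return np.sqrt(diff @ diff)   # Euclidean norm in the embedding space
+
+   L = np.array([[1.0, 0.5, 0.0],    # shape (num_dims, n_features) = (2, 3):
+                 [0.0, 1.0, 2.0]])   # projects 3D inputs to a 2D embedding
+   x = np.array([1.0, 2.0, 3.0])
+   x_prime = np.array([2.0, 0.0, 1.0])
+   print(mahalanobis_distance(x, x_prime, L))  # 6.0
+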
+Strictly speaking, Mahalanobis distances are "pseudo-metrics": they satisfy
+three of the `properties of a metric
+<https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity,
+symmetry, triangle inequality) but not necessarily the identity of
+indiscernibles.
+
+.. note::
+
+ Mahalanobis distances can also be parameterized by a `positive semi-definite
+ (PSD) matrix
+   <https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
+ :math:`M`:
+
+ .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
+
+ Using the fact that a PSD matrix :math:`M` can always be decomposed as
+ :math:`M=L^\top L` for some :math:`L`, one can show that both
+ parameterizations are equivalent. In practice, an algorithm may thus solve
+ the metric learning problem with respect to either :math:`M` or :math:`L`.
+
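+The equivalence of the two parameterizations is easy to verify numerically (a
+small self-contained sketch; the matrix and points are arbitrary):
+
+.. code-block:: python
+
+   import numpy as np
+
+   rng = np.random.RandomState(42)
+   L = rng.randn(2, 3)        # arbitrary transformation matrix
+   M = L.T @ L                # the corresponding PSD matrix
+   x, x_prime = rng.randn(3), rng.randn(3)
+
+   d_L = np.sqrt((L @ x - L @ x_prime) @ (L @ x - L @ x_prime))
+   d_M = np.sqrt((x - x_prime) @ M @ (x - x_prime))
+   assert np.isclose(d_L, d_M)   # both formulas give the same distance
+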
+Use-cases
+=========
+
+There are many use-cases for metric learning. We list here a few popular
+examples (for code illustrating some of these use-cases, see the
+:doc:`examples ` section of the documentation):
+
+- `Nearest neighbors models
+  <https://scikit-learn.org/stable/modules/neighbors.html>`_: the learned
+ metric can be used to improve nearest neighbors learning models for
+ classification, regression, anomaly detection...
+- `Clustering <https://scikit-learn.org/stable/modules/clustering.html>`_:
+ metric learning provides a way to bias the clusters found by algorithms like
+ K-Means towards the intended semantics.
+- Information retrieval: the learned metric can be used to retrieve the
+ elements of a database that are semantically closer to a query element.
+- Dimensionality reduction: metric learning may be seen as a way to reduce the
+ data dimension in a (weakly) supervised setting.
+- More generally, the learned transformation :math:`L` can be used to project
+ the data into a new embedding space before feeding it into another machine
+ learning algorithm.
+
+The API of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine
+learning in Python. This makes it easy to pipeline metric learners with other
+scikit-learn estimators to realize the above use-cases, to perform joint
+hyperparameter tuning, etc.
+
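+For instance, a metric learner can be chained with a scikit-learn classifier.
+The sketch below assumes the supervised ``LMNN`` learner from metric-learn,
+which follows the usual ``fit``/``transform`` estimator interface:
+
+.. code-block:: python
+
+   from metric_learn import LMNN
+   from sklearn.datasets import load_iris
+   from sklearn.neighbors import KNeighborsClassifier
+   from sklearn.pipeline import make_pipeline
+
+   X, y = load_iris(return_X_y=True)
+
+   # Learn the metric, then classify in the induced embedding space.
+   pipe = make_pipeline(LMNN(), KNeighborsClassifier())
+   pipe.fit(X, y)
+   print(pipe.score(X, y))
+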
+Further reading
+===============
+
+For more information about metric learning and its applications, one can refer
+to the following resources:
+
+- **Tutorial:** `Similarity and Distance Metric Learning with Applications to
+ Computer Vision
+ `_ (2015)
+- **Surveys:** `A Survey on Metric Learning for Feature Vectors and Structured
+  Data <https://arxiv.org/abs/1306.6709>`_ (2013), `Metric Learning: A
+ Survey `_ (2012)
+- **Book:** `Metric Learning
+ `_ (2015)
+
+.. Methods [TO MOVE TO SUPERVISED/WEAK SECTIONS]
+.. =============================================
+
+.. Currently, each metric learning algorithm supports the following methods:
+
+.. - ``fit(...)``, which learns the model.
+.. - ``metric()``, which returns a Mahalanobis matrix
+..   :math:`M = L^{\top}L` such that the distance between vectors ``x`` and
+..   ``y`` can be computed as
+..   :math:`\sqrt{\left(x-y\right)^{\top}M\left(x-y\right)}`.
+.. - ``transformer_from_metric(metric)``, which returns a transformation matrix
+.. :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
+.. data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
+.. :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
+.. in which standard Euclidean distances may be used.
+.. - ``transform(X)``, which applies the aforementioned transformation.
+.. - ``score_pairs(pairs)`` which returns the distance between pairs of
+.. points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
+.. 2, n_features)``, or it can be a 2D array-like of pairs indicators of
+.. shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
+.. details).
\ No newline at end of file