FIX Fix multiple severe bugs in non-metric MDS #30514


Merged
merged 32 commits into from Mar 31, 2025

Changes from all commits (32 commits)
439f9b0
Fix multiple bugs in nonmetric MDS
dkobak Dec 20, 2024
505e6f4
Add what's new
dkobak Dec 20, 2024
ebd156c
Remove obsolete test
dkobak Dec 20, 2024
5037cbf
Update example to pass doctest
dkobak Dec 20, 2024
e4de182
Fix the MDS example
dkobak Mar 24, 2025
1cf59c5
Fix the docs and docstrings regarding normalized_stress
dkobak Mar 24, 2025
0ccd609
Add tests
dkobak Mar 24, 2025
40baee9
Add one more test
dkobak Mar 24, 2025
60f7364
Add issue number to test
dkobak Mar 25, 2025
287a950
Add random seed
dkobak Mar 25, 2025
a7bedf4
Add reference to the R implementation
dkobak Mar 25, 2025
437f146
Remove double line break
dkobak Mar 25, 2025
75297f8
Rename some variables
dkobak Mar 26, 2025
4a2346a
Fix the User Guide for MDS
dkobak Mar 26, 2025
8bed24c
Fix the User Guide
dkobak Mar 26, 2025
0a4dc1f
fix the typo that has been fixed in upstream/main
dkobak Mar 27, 2025
28f69d5
Merge branch 'main' into non-metric-mds-bugfixes
dkobak Mar 27, 2025
ce121e0
Update examples/manifold/plot_mds.py
dkobak Mar 27, 2025
e1a8e35
Update examples/manifold/plot_mds.py
dkobak Mar 27, 2025
11110d9
Update doc/modules/manifold.rst
dkobak Mar 27, 2025
f91923f
Update doc/modules/manifold.rst
dkobak Mar 27, 2025
18fc2a8
Update doc/modules/manifold.rst
dkobak Mar 27, 2025
2d9b7bf
update docs
dkobak Mar 27, 2025
93516a0
update docs
dkobak Mar 27, 2025
fb0e1d6
Fix an error in the example
dkobak Mar 27, 2025
2756453
Update doc/modules/manifold.rst
dkobak Mar 27, 2025
b4cb108
Update doc/modules/manifold.rst
dkobak Mar 27, 2025
d24f784
Exclude verbose outputs from coverage tests
dkobak Mar 27, 2025
65edaa9
Merge branch 'non-metric-mds-bugfixes' of github.com:dkobak/scikit-le…
dkobak Mar 27, 2025
45053be
Update doc/modules/manifold.rst
dkobak Mar 28, 2025
fc3b95c
Clarify formulation
dkobak Mar 28, 2025
108321c
Merge branch 'main' into non-metric-mds-bugfixes
dkobak Mar 28, 2025
76 changes: 39 additions & 37 deletions doc/modules/manifold.rst
@@ -418,67 +418,65 @@ Multi-dimensional Scaling (MDS)
 representation of the data in which the distances respect well the
 distances in the original high-dimensional space.

-In general, :class:`MDS` is a technique used for analyzing similarity or
-dissimilarity data. It attempts to model similarity or dissimilarity data as
-distances in a geometric space. The data can be ratings of similarity between
+In general, :class:`MDS` is a technique used for analyzing
+dissimilarity data. It attempts to model dissimilarities as
+distances in a Euclidean space. The data can be ratings of dissimilarity between
 objects, interaction frequencies of molecules, or trade indices between
 countries.

 There exist two types of MDS algorithm: metric and non-metric. In
-scikit-learn, the class :class:`MDS` implements both. In Metric MDS, the input
-similarity matrix arises from a metric (and thus respects the triangular
-inequality), the distances between output two points are then set to be as
-close as possible to the similarity or dissimilarity data. In the non-metric
-version, the algorithms will try to preserve the order of the distances, and
+scikit-learn, the class :class:`MDS` implements both. In metric MDS,
+the distances in the embedding space are set as
+close as possible to the dissimilarity data. In the non-metric
+version, the algorithm will try to preserve the order of the distances, and
 hence seek for a monotonic relationship between the distances in the embedded
-space and the similarities/dissimilarities.
+space and the input dissimilarities.

 .. figure:: ../auto_examples/manifold/images/sphx_glr_plot_lle_digits_010.png
    :target: ../auto_examples/manifold/plot_lle_digits.html
    :align: center
    :scale: 50


-Let :math:`S` be the similarity matrix, and :math:`X` the coordinates of the
-:math:`n` input points. Disparities :math:`\hat{d}_{ij}` are transformation of
-the similarities chosen in some optimal ways. The objective, called the
-stress, is then defined by :math:`\sum_{i < j} d_{ij}(X) - \hat{d}_{ij}(X)`
+Let :math:`\delta_{ij}` be the dissimilarity matrix between the
+:math:`n` input points (possibly arising as some pairwise distances
+:math:`d_{ij}(X)` between the coordinates :math:`X` of the input points).
+Disparities :math:`\hat{d}_{ij} = f(\delta_{ij})` are some transformation of
+the dissimilarities. The MDS objective, called the raw stress, is then
+defined by :math:`\sum_{i < j} (\hat{d}_{ij} - d_{ij}(Z))^2`,
+where :math:`d_{ij}(Z)` are the pairwise distances between the
+coordinates :math:`Z` of the embedded points.


 .. dropdown:: Metric MDS

-  The simplest metric :class:`MDS` model, called *absolute MDS*, disparities are defined by
-  :math:`\hat{d}_{ij} = S_{ij}`. With absolute MDS, the value :math:`S_{ij}`
-  should then correspond exactly to the distance between point :math:`i` and
-  :math:`j` in the embedding point.
-
-  Most commonly, disparities are set to :math:`\hat{d}_{ij} = b S_{ij}`.
+  In the metric :class:`MDS` model (sometimes also called *absolute MDS*),
+  disparities are simply equal to the input dissimilarities
+  :math:`\hat{d}_{ij} = \delta_{ij}`.

 .. dropdown:: Nonmetric MDS

   Non metric :class:`MDS` focuses on the ordination of the data. If
-  :math:`S_{ij} > S_{jk}`, then the embedding should enforce :math:`d_{ij} <
-  d_{jk}`. For this reason, we discuss it in terms of dissimilarities
-  (:math:`\delta_{ij}`) instead of similarities (:math:`S_{ij}`). Note that
-  dissimilarities can easily be obtained from similarities through a simple
-  transform, e.g. :math:`\delta_{ij}=c_1-c_2 S_{ij}` for some real constants
-  :math:`c_1, c_2`. A simple algorithm to enforce proper ordination is to use a
-  monotonic regression of :math:`d_{ij}` on :math:`\delta_{ij}`, yielding
-  disparities :math:`\hat{d}_{ij}` in the same order as :math:`\delta_{ij}`.
-
-  A trivial solution to this problem is to set all the points on the origin. In
-  order to avoid that, the disparities :math:`\hat{d}_{ij}` are normalized. Note
-  that since we only care about relative ordering, our objective should be
+  :math:`\delta_{ij} > \delta_{kl}`, then the embedding
+  seeks to enforce :math:`d_{ij}(Z) > d_{kl}(Z)`. A simple algorithm
+  to enforce proper ordination is to use an
+  isotonic regression of :math:`d_{ij}(Z)` on :math:`\delta_{ij}`, yielding
+  disparities :math:`\hat{d}_{ij}` that are a monotonic transformation
+  of dissimilarities :math:`\delta_{ij}` and hence have the same ordering.
+  This is done repeatedly after every step of the optimization algorithm.
+  In order to avoid the trivial solution where all embedding points are
+  overlapping, the disparities :math:`\hat{d}_{ij}` are normalized.
+
+  Note that since we only care about relative ordering, our objective should be
   invariant to simple translation and scaling, however the stress used in metric
-  MDS is sensitive to scaling. To address this, non-metric MDS may use a
-  normalized stress, known as Stress-1 defined as
+  MDS is sensitive to scaling. To address this, non-metric MDS returns
+  normalized stress, also known as Stress-1, defined as

   .. math::
-    \sqrt{\frac{\sum_{i < j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i < j} d_{ij}^2}}.
+    \sqrt{\frac{\sum_{i < j} (\hat{d}_{ij} - d_{ij}(Z))^2}{\sum_{i < j}
+    d_{ij}(Z)^2}}.

-  The use of normalized Stress-1 can be enabled by setting `normalized_stress=True`,
-  however it is only compatible with the non-metric MDS problem and will be ignored
-  in the metric case.
+  Normalized Stress-1 is returned if `normalized_stress=True`.

 .. figure:: ../auto_examples/manifold/images/sphx_glr_plot_mds_001.png
    :target: ../auto_examples/manifold/plot_mds.html
@@ -487,6 +485,10 @@ stress, is then defined by :math:`\sum_{i < j} d_{ij}(X) - \hat{d}_{ij}(X)`

 .. rubric:: References

+* `"More on Multidimensional Scaling and Unfolding in R: smacof Version 2"
+  <https://www.jstatsoft.org/article/view/v102i10>`_
+  Mair P., Groenen P., de Leeuw J. Journal of Statistical Software (2022)
+
 * `"Modern Multidimensional Scaling - Theory and Applications"
   <https://www.springer.com/fr/book/9780387251509>`_
   Borg, I.; Groenen P. Springer Series in Statistics (1997)
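As a sanity check on the stress formulas in the documentation change above, both quantities can be computed directly with NumPy. This is an illustrative sketch, not scikit-learn code: the embedding `Z` is random and the disparities `d_hat` are simply perturbed distances standing in for the output of the disparity transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
Z = rng.normal(size=(n, 2))  # hypothetical embedding coordinates

# Pairwise distances d_ij(Z) between embedded points
d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

# Stand-in disparities d_hat (in practice these come from the
# transformation f of the input dissimilarities)
d_hat = d + rng.normal(scale=0.05, size=d.shape)
d_hat = (d_hat + d_hat.T) / 2  # keep the matrix symmetric
iu = np.triu_indices(n, k=1)   # index pairs with i < j

raw_stress = np.sum((d_hat[iu] - d[iu]) ** 2)
stress1 = np.sqrt(raw_stress / np.sum(d[iu] ** 2))
```

Dividing by the total squared embedding distance is what removes the raw stress's dependence on the overall scale of the configuration, which is the motivation for Stress-1 given in the text.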
4 changes: 4 additions & 0 deletions doc/whats_new/upcoming_changes/sklearn.manifold/30514.fix.rst
@@ -0,0 +1,4 @@
+- :class:`manifold.MDS` now correctly handles non-metric MDS. Furthermore,
+  the returned stress value now corresponds to the returned embedding and
+  normalized stress is now allowed for metric MDS.
+  By :user:`Dmitry Kobak <dkobak>`
64 changes: 36 additions & 28 deletions examples/manifold/plot_mds.py
@@ -21,79 +21,87 @@
 from sklearn.decomposition import PCA
 from sklearn.metrics import euclidean_distances

+# Generate the data
 EPSILON = np.finfo(np.float32).eps
 n_samples = 20
-seed = np.random.RandomState(seed=3)
-X_true = seed.randint(0, 20, 2 * n_samples).astype(float)
+rng = np.random.RandomState(seed=3)
+X_true = rng.randint(0, 20, 2 * n_samples).astype(float)
 X_true = X_true.reshape((n_samples, 2))

 # Center the data
 X_true -= X_true.mean()

-similarities = euclidean_distances(X_true)
+# Compute pairwise Euclidean distances
+distances = euclidean_distances(X_true)

-# Add noise to the similarities
-noise = np.random.rand(n_samples, n_samples)
+# Add noise to the distances
+noise = rng.rand(n_samples, n_samples)
 noise = noise + noise.T
-noise[np.arange(noise.shape[0]), np.arange(noise.shape[0])] = 0
-similarities += noise
+np.fill_diagonal(noise, 0)
+distances += noise

 mds = manifold.MDS(
     n_components=2,
     max_iter=3000,
     eps=1e-9,
-    random_state=seed,
+    random_state=42,
     dissimilarity="precomputed",
     n_jobs=1,
 )
-pos = mds.fit(similarities).embedding_
+X_mds = mds.fit(distances).embedding_

 nmds = manifold.MDS(
     n_components=2,
     metric=False,
     max_iter=3000,
     eps=1e-12,
     dissimilarity="precomputed",
-    random_state=seed,
+    random_state=42,
     n_jobs=1,
     n_init=1,
 )
-npos = nmds.fit_transform(similarities, init=pos)
+X_nmds = nmds.fit_transform(distances)

 # Rescale the data
-pos *= np.sqrt((X_true**2).sum()) / np.sqrt((pos**2).sum())
-npos *= np.sqrt((X_true**2).sum()) / np.sqrt((npos**2).sum())
+X_mds *= np.sqrt((X_true**2).sum()) / np.sqrt((X_mds**2).sum())
+X_nmds *= np.sqrt((X_true**2).sum()) / np.sqrt((X_nmds**2).sum())

 # Rotate the data
-clf = PCA(n_components=2)
-X_true = clf.fit_transform(X_true)
-
-pos = clf.fit_transform(pos)
-
-npos = clf.fit_transform(npos)
+pca = PCA(n_components=2)
+X_true = pca.fit_transform(X_true)
+X_mds = pca.fit_transform(X_mds)
+X_nmds = pca.fit_transform(X_nmds)
+
+# Align the sign of PCs
+for i in [0, 1]:
+    if np.corrcoef(X_mds[:, i], X_true[:, i])[0, 1] < 0:
+        X_mds[:, i] *= -1
+    if np.corrcoef(X_nmds[:, i], X_true[:, i])[0, 1] < 0:
+        X_nmds[:, i] *= -1

 fig = plt.figure(1)
 ax = plt.axes([0.0, 0.0, 1.0, 1.0])

 s = 100
 plt.scatter(X_true[:, 0], X_true[:, 1], color="navy", s=s, lw=0, label="True Position")
-plt.scatter(pos[:, 0], pos[:, 1], color="turquoise", s=s, lw=0, label="MDS")
-plt.scatter(npos[:, 0], npos[:, 1], color="darkorange", s=s, lw=0, label="NMDS")
+plt.scatter(X_mds[:, 0], X_mds[:, 1], color="turquoise", s=s, lw=0, label="MDS")
+plt.scatter(X_nmds[:, 0], X_nmds[:, 1], color="darkorange", s=s, lw=0, label="NMDS")
 plt.legend(scatterpoints=1, loc="best", shadow=False)

-similarities = similarities.max() / (similarities + EPSILON) * 100
-np.fill_diagonal(similarities, 0)
 # Plot the edges
-start_idx, end_idx = np.where(pos)
+start_idx, end_idx = np.where(X_mds)
 # a sequence of (*line0*, *line1*, *line2*), where::
 # linen = (x0, y0), (x1, y1), ... (xm, ym)
 segments = [
-    [X_true[i, :], X_true[j, :]] for i in range(len(pos)) for j in range(len(pos))
+    [X_true[i, :], X_true[j, :]] for i in range(len(X_true)) for j in range(len(X_true))
 ]
-values = np.abs(similarities)
+edges = distances.max() / (distances + EPSILON) * 100
+np.fill_diagonal(edges, 0)
+edges = np.abs(edges)
 lc = LineCollection(
-    segments, zorder=0, cmap=plt.cm.Blues, norm=plt.Normalize(0, values.max())
+    segments, zorder=0, cmap=plt.cm.Blues, norm=plt.Normalize(0, edges.max())
 )
-lc.set_array(similarities.flatten())
+lc.set_array(edges.flatten())
 lc.set_linewidths(np.full(len(segments), 0.5))
 ax.add_collection(lc)

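The isotonic-regression step that produces the disparities in non-metric MDS (described in the manifold.rst changes above) can be illustrated with a tiny pool-adjacent-violators sketch. This is a simplified stand-alone version for intuition, not the implementation :class:`MDS` actually uses:

```python
import numpy as np

def pava(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        # Merge adjacent blocks while monotonicity is violated
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)

# Embedding distances listed in order of increasing input dissimilarity;
# the disparities are their monotone (non-decreasing) regression
d_in_delta_order = np.array([0.9, 2.2, 1.8, 3.5])
disparities = pava(d_in_delta_order)  # [0.9, 2.0, 2.0, 3.5]
```

Because the fitted values are non-decreasing along the dissimilarity ordering, the resulting disparities have the same ordering as the input dissimilarities, which is exactly the property non-metric MDS optimizes for.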