
[MRG] ENH: Add SVDD to svm module #5899


Closed
wants to merge 12 commits into from
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -1241,6 +1241,7 @@ Estimators
svm.LinearSVR
svm.NuSVR
svm.OneClassSVM
svm.SVDD

.. autosummary::
:toctree: generated/
91 changes: 57 additions & 34 deletions doc/modules/outlier_detection.rst
@@ -1,8 +1,8 @@
.. _outlier_detection:

===================================================
=============================
Novelty and Outlier Detection
===================================================
=============================

.. currentmodule:: sklearn

@@ -49,37 +49,56 @@ In general, it is about learning a rough, close frontier delimiting
the contour of the initial observations distribution, plotted in
embedding :math:`p`-dimensional space. Then, if further observations
lie within the frontier-delimited subspace, they are considered as
coming from the same population than the initial
observations. Otherwise, if they lay outside the frontier, we can say
that they are abnormal with a given confidence in our assessment.

The One-Class SVM has been introduced by Schölkopf et al. for that purpose
and implemented in the :ref:`svm` module in the
:class:`svm.OneClassSVM` object. It requires the choice of a
kernel and a scalar parameter to define a frontier. The RBF kernel is
usually chosen although there exists no exact formula or algorithm to
set its bandwidth parameter. This is the default in the scikit-learn
implementation. The :math:`\nu` parameter, also known as the margin of
the One-Class SVM, corresponds to the probability of finding a new,
but regular, observation outside the frontier.
coming from the same population as the initial observations. Otherwise,
if they lie outside the frontier, we can say that they are abnormal with a
given confidence in our assessment.

There are two SVM-based approaches for that purpose:

1. :class:`svm.OneClassSVM` finds a hyperplane which separates the data from
the origin by the largest margin.
2. :class:`svm.SVDD` finds the sphere of minimum radius which encloses
   the data.

Both methods can implicitly work in a transformed high-dimensional space using
the kernel trick. :class:`svm.OneClassSVM` provides a :math:`\nu` parameter for
controlling the trade-off between the margin and the number of outliers during
training: it is an upper bound on the fraction of outliers in the training
set and the probability of finding a new, but regular, observation outside the
frontier. :class:`svm.SVDD` provides a similar parameter
:math:`C = 1 / (\nu l)`, where :math:`l` is the number of samples, such that
:math:`1/C` approximately equals the number of outliers in the training set.

Both methods are equivalent if a) the kernel depends only on the difference
between two vectors (the RBF kernel is one example), and b) :math:`C = 1 / (\nu l)`.
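
This equivalence can be checked numerically. Below is a minimal sketch, assuming
the :class:`svm.SVDD` estimator proposed in this pull request with the ``C``
parameter defined as above (parameter values are purely illustrative)::

    import numpy as np
    from sklearn import svm

    rng = np.random.RandomState(42)
    X = rng.randn(200, 2)               # training observations
    nu, l = 0.1, X.shape[0]

    # One-Class SVM with an RBF (translation-invariant) kernel
    oc_svm = svm.OneClassSVM(kernel='rbf', gamma=0.5, nu=nu).fit(X)

    # SVDD (from this PR) with C = 1 / (nu * l); with an RBF kernel both
    # models should flag (almost) the same points as outliers
    svdd = svm.SVDD(kernel='rbf', gamma=0.5, C=1.0 / (nu * l)).fit(X)

    agreement = np.mean(oc_svm.predict(X) == svdd.predict(X))
    print("fraction of identical predictions: %.3f" % agreement)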

.. topic:: References:
.. figure:: ../auto_examples/svm/images/plot_oneclass_001.png
:target: ../auto_examples/svm/plot_oneclass.html
:align: center
:scale: 75%

.. figure:: ../auto_examples/svm/images/plot_oneclass_vs_svdd_001.png
:target: ../auto_examples/svm/plot_oneclass_vs_svdd.html
:align: center
:scale: 75

* `Estimating the support of a high-dimensional distribution
<http://dl.acm.org/citation.cfm?id=1119749>`_ Schölkopf,
Bernhard, et al. Neural computation 13.7 (2001): 1443-1471.

.. topic:: Examples:

* See :ref:`example_svm_plot_oneclass.py` for visualizing the
frontier learned around some data by a
:class:`svm.OneClassSVM` object.
frontier learned around some data by :class:`svm.OneClassSVM`.
* See :ref:`example_svm_plot_oneclass_vs_svdd.py` for an illustration of
  the difference between the two approaches.

.. topic:: References:

* Bernhard Schölkopf et al, `Estimating the Support of a High-Dimensional
Distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
computation 13.7 (2001): 1443-1471.
* David M. J. Tax and Robert P. W. Duin, `Support Vector Data Description
<http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
54(1):45-66, 2004.

.. figure:: ../auto_examples/svm/images/plot_oneclass_001.png
:target: ../auto_examples/svm/plot_oneclasse.html
:align: center
:scale: 75%


Outlier Detection
=================
@@ -131,7 +150,7 @@ This strategy is illustrated below.


Isolation Forest
----------------------------
----------------

One efficient way of performing outlier detection in high-dimensional datasets
is to use random forests.
@@ -187,7 +206,11 @@ results in these situations.
The examples below illustrate how the performance of the
:class:`covariance.EllipticEnvelope` degrades as the data is less and
less unimodal. The :class:`svm.OneClassSVM` works better on data with
multiple modes and :class:`ensemble.IsolationForest` performs well in every cases.
multiple modes and :class:`ensemble.IsolationForest` performs well in all
cases.

:class:`svm.SVDD` is not included in the comparison since it behaves the same
as :class:`svm.OneClassSVM` when the RBF kernel is used.
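
For reference, here is a minimal sketch of such a comparison on a bimodal toy
dataset (the datasets and parameters behind the figures below differ)::

    import numpy as np
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    # two inlier clusters plus a few uniformly scattered outliers
    inliers = np.r_[0.3 * rng.randn(100, 2) + 2, 0.3 * rng.randn(100, 2) - 2]
    outliers = rng.uniform(low=-6, high=6, size=(20, 2))
    X = np.r_[inliers, outliers]

    detectors = {
        "Robust covariance": EllipticEnvelope(contamination=0.1),
        "One-Class SVM": OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1),
        "Isolation Forest": IsolationForest(max_samples=len(X), random_state=42),
    }
    for name, detector in detectors.items():
        # predict returns +1 for inliers and -1 for outliers
        n_flagged = (detector.fit(X).predict(X) == -1).sum()
        print("%s flagged %d points as outliers" % (name, n_flagged))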

.. |outlier1| image:: ../auto_examples/covariance/images/plot_outlier_detection_001.png
:target: ../auto_examples/covariance/plot_outlier_detection.html
@@ -213,20 +236,20 @@ multiple modes and :class:`ensemble.IsolationForest` performs well in every case
:class:`covariance.EllipticEnvelope` learns an ellipse, which
fits well the inlier distribution. The :class:`ensemble.IsolationForest`
performs as well.
- |outlier1|
- |outlier1|

*
*
- As the inlier distribution becomes bimodal, the
:class:`covariance.EllipticEnvelope` does not fit well the
inliers. However, we can see that both :class:`ensemble.IsolationForest`
and :class:`svm.OneClassSVM` have difficulty detecting the two modes,
and that the :class:`svm.OneClassSVM`
tends to overfit: because it has no model of the inliers, it
interprets a region where, by chance, some outliers are
clustered, as inliers.
- |outlier2|
clustered, as inliers.
- |outlier2|

*
*
- If the inlier distribution is strongly non-Gaussian, the
:class:`svm.OneClassSVM` is able to recover a reasonable
approximation as well as :class:`ensemble.IsolationForest`,
48 changes: 34 additions & 14 deletions doc/modules/svm.rst
@@ -325,32 +325,54 @@ floating point values instead of integer values::
* :ref:`example_svm_plot_svm_regression.py`



.. _svm_outlier_detection:

Density estimation, novelty detection
=======================================
Novelty and outlier detection
=============================

Support vector machines can be used to detect novelty and outliers in
unlabeled data sets: given a set of samples, they detect the soft boundary of
that set so as to classify new points as belonging to that set or not.

There are two SVM-based approaches to this problem:

One-class SVM is used for novelty detection, that is, given a set of
samples, it will detect the soft boundary of that set so as to
classify new points as belonging to that set or not. The class that
implements this is called :class:`OneClassSVM`.
1. :class:`OneClassSVM` finds a hyperplane which separates the data from
the origin by the largest margin.
2. :class:`SVDD` finds the sphere of minimum radius which encloses
   the data.

In this case, as it is a type of unsupervised learning, the fit method
will only take as input an array X, as there are no class labels.
Both methods can be tuned to trade off the number of outliers against the
margin (or radius) of the separating boundary.

See, section :ref:`outlier_detection` for more details on this usage.
See section :ref:`outlier_detection` for more details on their usage.
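
A minimal usage sketch (:class:`SVDD` is the estimator proposed in this pull
request; parameter values are purely illustrative)::

    from sklearn import svm

    X_train = [[0.0, 0.0], [0.1, -0.1], [0.9, 1.0], [1.0, 0.9]]
    X_new = [[0.05, 0.0], [3.0, 3.0]]

    # One-Class SVM: nu upper-bounds the fraction of training outliers
    clf = svm.OneClassSVM(kernel='rbf', gamma=2.0, nu=0.25).fit(X_train)
    print(clf.predict(X_new))   # +1 for inliers, -1 for outliers

    # SVDD (from this PR): 1/C roughly bounds the number of training outliers
    clf = svm.SVDD(kernel='rbf', gamma=2.0, C=1.0).fit(X_train)
    print(clf.predict(X_new))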

.. figure:: ../auto_examples/svm/images/plot_oneclass_001.png
:target: ../auto_examples/svm/plot_oneclass.html
:align: center
:scale: 75

.. figure:: ../auto_examples/svm/images/plot_oneclass_vs_svdd_001.png
:target: ../auto_examples/svm/plot_oneclass_vs_svdd.html
:align: center
:scale: 75

.. topic:: Examples:

* :ref:`example_svm_plot_oneclass.py`
* :ref:`example_applications_plot_species_distribution_modeling.py`
* See :ref:`example_svm_plot_oneclass.py` for visualizing the
frontier learned around some data by :class:`OneClassSVM`.
* See :ref:`example_svm_plot_oneclass_vs_svdd.py` for an illustration of
  the difference between the two approaches.
* :ref:`example_applications_plot_species_distribution_modeling.py`

.. topic:: References:

* Bernhard Schölkopf et al, `Estimating the Support of a High-Dimensional
Distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
computation 13.7 (2001): 1443-1471.
* David M. J. Tax and Robert P. W. Duin, `Support Vector Data Description
<http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
54(1):45-66, 2004.



Complexity
@@ -707,5 +729,3 @@ computations. These libraries are wrapped using C and Cython.

- `LIBLINEAR -- A Library for Large Linear Classification
<http://www.csie.ntu.edu.tw/~cjlin/liblinear/>`_


21 changes: 12 additions & 9 deletions examples/applications/plot_outlier_detection_housing.py
@@ -19,7 +19,7 @@
able to focus on the main mode of the data distribution, it sticks to the
assumption that the data should be Gaussian distributed, yielding some biased
estimation of the data structure, but yet accurate to some extent.
The One-Class SVM algorithm
The One-Class SVM algorithm and Support Vector Data Description

First example
-------------
@@ -39,7 +39,7 @@
distribution: the location seems to be well estimated, although the covariance
is hard to estimate due to the banana-shaped distribution. Anyway, we can
get rid of some outlying observations.
The One-Class SVM is able to capture the real data structure, but the
The One-Class SVM and SVDD are able to capture the real data structure, but the
difficulty is to adjust their kernel bandwidth parameter so as to obtain
a good compromise between the shape of the data scatter matrix and the
risk of over-fitting the data.
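
One way to explore this bandwidth trade-off is to scan ``gamma`` and monitor
the fraction of training points flagged as outliers; a rough sketch on
synthetic stand-in data, not part of this example::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X = rng.randn(500, 2)    # stand-in for two housing features

    # Too small a bandwidth under-fits the support, too large over-fits it;
    # watch how many training points end up flagged as outliers.
    for gamma in [0.001, 0.01, 0.05, 0.1, 1.0]:
        clf = OneClassSVM(nu=0.26, gamma=gamma).fit(X)
        frac = np.mean(clf.predict(X) == -1)
        print("gamma=%-6g  fraction flagged as outliers: %.3f" % (gamma, frac))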
@@ -52,7 +52,7 @@

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM
from sklearn.svm import OneClassSVM, SVDD
import matplotlib.pyplot as plt
import matplotlib.font_manager
from sklearn.datasets import load_boston
@@ -67,8 +67,9 @@
contamination=0.261),
"Robust Covariance (Minimum Covariance Determinant)":
EllipticEnvelope(contamination=0.261),
"OCSVM": OneClassSVM(nu=0.261, gamma=0.05)}
colors = ['m', 'g', 'b']
"OCSVM": OneClassSVM(nu=0.261, gamma=0.05),
"SVDD": SVDD(kernel='rbf', gamma = 0.03, C=0.01)}
colors = ['m', 'g', 'b', 'y']
legend1 = {}
legend2 = {}

@@ -105,8 +106,9 @@
plt.ylim((yy1.min(), yy1.max()))
plt.legend((legend1_values_list[0].collections[0],
legend1_values_list[1].collections[0],
legend1_values_list[2].collections[0]),
(legend1_keys_list[0], legend1_keys_list[1], legend1_keys_list[2]),
legend1_values_list[2].collections[0],
legend1_values_list[3].collections[0]),
(legend1_keys_list[0], legend1_keys_list[1], legend1_keys_list[2], legend1_keys_list[3]),
loc="upper center",
prop=matplotlib.font_manager.FontProperties(size=12))
plt.ylabel("accessibility to radial highways")
@@ -122,8 +124,9 @@
plt.ylim((yy2.min(), yy2.max()))
plt.legend((legend2_values_list[0].collections[0],
legend2_values_list[1].collections[0],
legend2_values_list[2].collections[0]),
(legend2_values_list[0], legend2_values_list[1], legend2_values_list[2]),
legend2_values_list[2].collections[0],
legend2_values_list[3].collections[0]),
(legend2_keys_list[0], legend2_keys_list[1], legend2_keys_list[2], legend2_keys_list[3]),
loc="upper center",
prop=matplotlib.font_manager.FontProperties(size=12))
plt.ylabel("% lower status of the population")
12 changes: 7 additions & 5 deletions examples/covariance/plot_outlier_detection.py
@@ -49,8 +49,10 @@
classifiers = {
"One-Class SVM": svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,
kernel="rbf", gamma=0.1),
"robust covariance estimator": EllipticEnvelope(contamination=.1),
"Isolation Forest": IsolationForest(max_samples=n_samples, random_state=rng)}
"Robust Covariance Estimator": EllipticEnvelope(contamination=0.1),
"Isolation Forest": IsolationForest(max_samples=n_samples,
random_state=rng)
}

# Compare given classifiers under given settings
xx, yy = np.meshgrid(np.linspace(-7, 7, 500), np.linspace(-7, 7, 500))
@@ -83,7 +85,6 @@
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
subplot = plt.subplot(1, 3, i + 1)
subplot.set_title("Outlier detection")
subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
cmap=plt.cm.Blues_r)
a = subplot.contour(xx, yy, Z, levels=[threshold],
@@ -95,11 +96,12 @@
subplot.axis('tight')
subplot.legend(
[a.collections[0], b, c],
['learned decision function', 'true inliers', 'true outliers'],
['Decision function', 'True inliers', 'True outliers'],
prop=matplotlib.font_manager.FontProperties(size=11))
subplot.set_xlabel("%d. %s (errors: %d)" % (i + 1, clf_name, n_errors))
subplot.set_xlabel("%s (errors: %d)" % (clf_name, n_errors))
subplot.set_xlim((-7, 7))
subplot.set_ylim((-7, 7))
plt.suptitle("Outlier detection")
plt.subplots_adjust(0.04, 0.1, 0.96, 0.94, 0.1, 0.26)

plt.show()
4 changes: 2 additions & 2 deletions examples/svm/plot_oneclass.py
@@ -40,8 +40,8 @@
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("Novelty Detection")
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
plt.title("Novelty Detection by One-class SVM")
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')
