[MRG+1] Support Vector Data Description #7910

Open — wants to merge 41 commits into main

Commits (41)
eb2d93a
ENH: nu-SVDD with sample weights, based on Chang, Lee, Lin (2013)
ivannz Nov 19, 2016
6d373a6
a Whatsnew entry and a minor comment fix in base.py
ivannz Dec 27, 2016
d86b3fd
Update to address #8711
ivannz Apr 8, 2017
766a344
docstring fix reflecting #9048
ivannz Jun 8, 2017
4093763
updated what's new according to #9505
ivannz Sep 15, 2017
941ca4b
review and sync with #9015
ivannz Oct 6, 2017
e8cd614
temporary ocSVM-test patch
ivannz Oct 9, 2017
543c4e6
removed 'random_state' from SVDD
ivannz Oct 10, 2017
d751148
fixed sparse SVDD test
ivannz Oct 10, 2017
ba11173
score_samples() test for the SVDD
ivannz Oct 10, 2017
1a7083a
BaseLibSVM interface update
ivannz Feb 23, 2018
5472505
FIX: Updated the default gamma to reflect #10331 and tests, fixed the…
ivannz Jul 29, 2018
a959082
TEST: fixed assertion in test_svm.py to reflect #12717
ivannz Dec 7, 2018
355c548
TST Fixed SVDD tests affected by scale redefinition in #13221
ivannz Mar 19, 2019
d765a40
updated docstrings and default parameters
ivannz Jul 25, 2019
bba31b6
TST fixed legacy asserts _eq and _gt in SVDD related tests
ivannz Jul 25, 2019
71ecce1
Update MRO in SVDD to satisfy #14884
ivannz Oct 5, 2019
8c60b69
Simplified super() calls according to #12812
ivannz Nov 1, 2019
fab4538
DOC new docstring guidelines in svdd (according to #16060)
ivannz Jan 10, 2020
7082a56
DOCTEST fixed object and corrected reference scores
ivannz Jan 10, 2020
dbbc90b
fixed sphinx warnings due to bad indentation for circleci
ivannz Jan 10, 2020
c13582d
fixed unresolved conflict
ivannz Jan 30, 2020
90085e0
fixed unused kwarg warning
ivannz Jan 30, 2020
84a1dcb
some oneliners
ivannz Jan 30, 2020
3d39584
side-by-side comparison of scsvm with svdd (stationary kernel)
ivannz Jan 30, 2020
4d7217d
moved SVDD announcement from v0.20 to v0.23
ivannz Jan 30, 2020
2654479
removed hardcoded sample sizes
ivannz Jan 30, 2020
1e54626
patches to svdd-l1 reflecting #14286, #16530, #16992 and #16973
ivannz May 30, 2020
7a0ede0
reflect #17176: zero weight in SV models means that a sample is never…
ivannz Jul 5, 2020
6bf0fb5
reflect #15521: document attributes inherited by SVDD from BaseLibSVM
ivannz Jul 5, 2020
00f3279
reflect #14286: test for negative or null sample_weights in SVDD
ivannz Jul 5, 2020
0e015e0
update mode in more-tags (reflecting #17361)
ivannz Oct 15, 2020
6bb003f
moved SVDD announcement from v0.23 to v1.0
ivannz Feb 25, 2021
fd43605
docfix in SVDD related to #20236
ivannz Jun 15, 2021
b0f4926
migrate svdd code style to Black (#18948)
ivannz Jul 23, 2021
742954a
update version in svdd docs to 1.1, relocate from 1.0 to 1.1 in whats…
ivannz Nov 10, 2021
9c95eea
move feature announcement from 1.1 to 1.2
ivannz May 15, 2022
36778b4
fixed user guide ref in SVDD docstring, copied kernel parameter docs …
ivannz May 15, 2022
4e5ca41
Removed deprecated `class_weight_` from the docs of SVDD (related to …
ivannz Jun 14, 2022
3cc3610
add parameter validation to SVDD and update dunder-docs (similar to o…
ivannz Aug 29, 2022
80a1725
clarify the parent space of the SVDD hypersphere (#r374672496)
ivannz Sep 4, 2022
3 changes: 2 additions & 1 deletion doc/modules/classes.rst
@@ -1529,9 +1529,10 @@ Estimators
svm.LinearSVR
svm.NuSVC
svm.NuSVR
svm.OneClassSVM
svm.SVC
svm.SVR
svm.OneClassSVM
svm.SVDD

.. autosummary::
:toctree: generated/
46 changes: 44 additions & 2 deletions doc/modules/outlier_detection.rst
@@ -157,8 +157,8 @@
coming from the same population as the initial
observations. Otherwise, if they lie outside the frontier, we can say
that they are abnormal with a given confidence in our assessment.

The One-Class SVM has been introduced by Schölkopf et al. for that purpose
and implemented in the :ref:`svm` module in the
The :ref:`svm_one_class_svm` has been introduced by Schölkopf et al.
for that purpose and implemented in the :ref:`svm` module in the
:class:`svm.OneClassSVM` object. It requires the choice of a
kernel and a scalar parameter to define a frontier. The RBF kernel is
usually chosen although there exists no exact formula or algorithm to
@@ -167,12 +167,29 @@
implementation. The `nu` parameter, also known as the margin of
the One-Class SVM, corresponds to the probability of finding a new,
but regular, observation outside the frontier.

The Support Vector Data Description (:ref:`svm_svdd`) is an alternative
model for estimating the support of a data distribution. It was proposed
by Tax and Duin, and later reformulated by Chang et al. The reparametrized
SVDD model, which has better parameter interpretability, is implemented
in the :class:`svm.SVDD` object in the :ref:`svm` module. Its interface,
as well as the interpretation of its parameters, is similar to that of the
:ref:`svm_one_class_svm` model.
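
A minimal usage sketch (illustrative only: the data, the kernel and the
value of ``nu`` are arbitrary choices, and :class:`svm.SVDD` is the class
added in this pull request)::

    import numpy as np
    from sklearn.svm import OneClassSVM, SVDD

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)  # training sample, assumed to be mostly inliers

    # Both estimators share the nu-parametrized one-class interface.
    ocsvm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X)
    svdd = SVDD(kernel="rbf", gamma=0.5, nu=0.1).fit(X)

    X_new = rng.randn(5, 2)
    print(ocsvm.predict(X_new))  # +1 for inliers, -1 for outliers
    print(svdd.predict(X_new))   # same labels here, since RBF is stationary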

.. topic:: References:

* `Estimating the support of a high-dimensional distribution
<https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-99-87.pdf>`_
Schölkopf, Bernhard, et al. Neural computation 13.7 (2001): 1443-1471.

* `Support vector data description
<http://dx.doi.org/10.1023/B:MACH.0000008084.60811.49>`_
Tax, and Duin. Machine learning, 54(1) (2004), pp.45-66.

* `A revisit to support vector data description (SVDD).
<http://w.csie.org/~cjlin/papers/svdd.pdf>`_ Chang, Lee,
and Lin. Technical Report (2013), Dept. of Computer Science,
National Taiwan University.

.. topic:: Examples:

* See :ref:`sphx_glr_auto_examples_svm_plot_oneclass.py` for visualizing the
@@ -415,3 +432,28 @@
Novelty detection with Local Outlier Factor is illustrated below.
:target: ../auto_examples/neighbors/plot_lof_novelty_detection.html
:align: center
:scale: 75%

.. _outlier_detection_ocsvm_vs_svdd:

One-Class SVM versus SVDD-L1
----------------------------

The :ref:`svm_one_class_svm` and :ref:`svm_svdd` models, though apparently
different, both try to construct a hypersurface enveloping the densest
regions of the training sample. In the case of a stationary kernel
:math:`K(x,y)=K(x-y)`, such as the RBF kernel (see :ref:`svm_kernels`), the
decision functions are identical for any :math:`\nu\in (0,1)`:

.. figure:: ../auto_examples/svm/images/sphx_glr_plot_oneclass_vs_svdd_001.png
:target: ../auto_examples/svm/plot_oneclass_vs_svdd.html
:align: center
:scale: 75%

But for a non-stationary kernel :math:`K(x,y)`, such as polynomial, the decision
functions may be dramatically different:

.. figure:: ../auto_examples/svm/images/sphx_glr_plot_oneclass_vs_svdd_002.png
:target: ../auto_examples/svm/plot_oneclass_vs_svdd.html
:align: center
:scale: 75%
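
The equivalence for stationary kernels, and its failure for non-stationary
ones, can be checked numerically. A sketch (the sample, grid and parameter
values are arbitrary; :class:`svm.SVDD` is the class proposed in this pull
request)::

    import numpy as np
    from sklearn.svm import OneClassSVM, SVDD

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)     # training sample
    grid = rng.randn(100, 2)  # points at which the two models are compared

    for kernel in ("rbf", "poly"):
        ocsvm = OneClassSVM(kernel=kernel, nu=0.1).fit(X)
        svdd = SVDD(kernel=kernel, nu=0.1).fit(X)
        agree = np.mean(ocsvm.predict(grid) == svdd.predict(grid))
        # expected: 1.0 for "rbf" (stationary), typically < 1.0 for "poly"
        print(kernel, agree)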

Member comment:

I am not familiar with this model. Is there something more specific that we could say to help the user choose between the two models when the kernel is not stationary?

For instance, in the example we can see that for the polynomial kernel the high-density region of SVDD-L1 is convex, while that is not the case for the One-Class SVM model. Is this always the case? Or maybe under some assumption on the shape of the data?

It would be great to make the inductive bias of each model a bit more explicit, without going into a deep mathematical comparison.
218 changes: 207 additions & 11 deletions doc/modules/svm.rst
@@ -271,7 +271,7 @@
with and without weight correction.


:class:`SVC`, :class:`NuSVC`, :class:`SVR`, :class:`NuSVR`, :class:`LinearSVC`,
:class:`LinearSVR` and :class:`OneClassSVM` implement also weights for
:class:`LinearSVR`, :class:`OneClassSVM` and :class:`SVDD` also implement weights for
individual samples in the `fit` method through the ``sample_weight`` parameter.
Similar to ``class_weight``, this sets the parameter ``C`` for the i-th
example to ``C * sample_weight[i]``, which will encourage the classifier to
@@ -339,6 +339,28 @@ Density estimation, novelty detection
The class :class:`OneClassSVM` implements a One-Class SVM which is used in
outlier detection.

:ref:`svm_one_class_svm` and :ref:`svm_svdd` models can be used for novelty
detection: given a set of samples, the model detects a soft boundary of that
set so as to classify new points as belonging to that set or not. The
classes that implement these models are :class:`OneClassSVM` and
:class:`SVDD` respectively.

Since novelty detection is a type of unsupervised learning, the ``fit`` method
requires only an array X as input, as there are no class labels.

See section :ref:`outlier_detection` for more details on this usage.
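
For instance, a minimal sketch (the data is an arbitrary stand-in)::

    import numpy as np
    from sklearn.svm import OneClassSVM

    X = np.random.RandomState(0).randn(100, 2)
    clf = OneClassSVM(nu=0.1).fit(X)  # only X is passed: no labels y
    labels = clf.predict(X)           # +1 for inliers, -1 for outliers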

.. figure:: ../auto_examples/svm/images/sphx_glr_plot_oneclass_001.png
:target: ../auto_examples/svm/plot_oneclass.html
:align: center
:scale: 75


.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_svm_plot_oneclass.py`
* :ref:`sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py`

See :ref:`outlier_detection` for the description and usage of OneClassSVM.

Complexity
@@ -382,11 +404,11 @@ Tips on Practical Use
function can be configured to be almost the same as the :class:`LinearSVC`
model.

* **Kernel cache size**: For :class:`SVC`, :class:`SVR`, :class:`NuSVC` and
:class:`NuSVR`, the size of the kernel cache has a strong impact on run
times for larger problems. If you have enough RAM available, it is
recommended to set ``cache_size`` to a higher value than the default of
200(MB), such as 500(MB) or 1000(MB).
* **Kernel cache size**: For :class:`SVC`, :class:`SVR`, :class:`NuSVC`,
  :class:`NuSVR`, :class:`OneClassSVM` and :class:`SVDD`, the size of the
  kernel cache has a strong impact on run times for larger problems. If
  you have enough RAM available, it is recommended to set ``cache_size``
  to a higher value than the default of 200(MB), such as 500(MB) or 1000(MB).


* **Setting C**: ``C`` is ``1`` by default and it's a reasonable default
@@ -422,8 +444,9 @@ Tips on Practical Use
using a large stopping tolerance), the code without using shrinking may
be much faster*

* Parameter ``nu`` in :class:`NuSVC`/:class:`OneClassSVM`/:class:`NuSVR`
  approximates the fraction of training errors and support vectors.
* Parameter ``nu`` in :class:`NuSVC`, :class:`OneClassSVM`, :class:`NuSVR`,
  and :class:`SVDD` approximates the fraction of training errors and support
  vectors.

Contributor comment:

Could you please add OneClassSVM and SVDD in Kernel cache size above?
Contributor comment:

Could you also add SVDD after OneClassSVM in Randomness of the underlying implementations? (The implementations of OneClassSVM and SVDD are both not random.)

* In :class:`SVC`, if the data is unbalanced (e.g. many
positive and few negative), set ``class_weight='balanced'`` and/or try
@@ -435,9 +458,10 @@ Tips on Practical Use
``probability`` is set to ``True``). This randomness can be controlled
with the ``random_state`` parameter. If ``probability`` is set to ``False``
these estimators are not random and ``random_state`` has no effect on the
results. The underlying :class:`OneClassSVM` implementation is similar to
the ones of :class:`SVC` and :class:`NuSVC`. As no probability estimation
is provided for :class:`OneClassSVM`, it is not random.
results. The underlying :class:`OneClassSVM` and :class:`SVDD`
implementations are similar to those of :class:`SVC` and :class:`NuSVC`.
As no probability estimation is provided for :class:`OneClassSVM` and
:class:`SVDD`, they are not random.

The underlying :class:`LinearSVC` implementation uses a random number
generator to select features when fitting the model with a dual coordinate
@@ -760,6 +784,178 @@
where we make use of the epsilon-insensitive loss, i.e. errors of less than
:math:`\varepsilon` are ignored. This is the form that is directly optimized
by :class:`LinearSVR`.

.. _svm_one_class_svm:

One-Class SVM
-------------

This model, proposed by Schölkopf et al. (2001), estimates the support
of a high-dimensional distribution by constructing a supporting hyperplane
in the feature space corresponding to the kernel, which effectively
separates the data set from the origin with maximum margin.

For the training sample :math:`(x_i)_{i=1}^{n}` with weights :math:`(w_i)_{i=1}^{n}`,
:math:`\sum_{i=1}^{n} w_i>0`, the One-Class SVM solves the following primal problem:


.. math::

\min_{\rho,\xi,w} \frac12 w^Tw - \rho + \frac{1}{\nu W} \sum_{i=1}^{n} w_i \xi_i \,, \\

\textrm {subject to } & w^T\phi(x_i) \geq \rho - \xi_i \,, \\
& \xi_i \geq 0\,,\, i=1, \ldots, n \,,


where :math:`\phi(\cdot)` is the feature map associated with the
kernel :math:`K(\cdot,\cdot)`, and :math:`W = \sum_{i=1}^{n} w_i`.

The dual problem is


.. math::

\min_\alpha \frac12 \alpha^T Q\alpha\,\\

\textrm {subject to } & 0\leq \alpha_i \leq w_i\,,\, i=1, \ldots, n \,,\\
& e^T\alpha = \nu W \,,


where :math:`e\in \mathbb{R}^{n\times 1}` is the vector of ones and
:math:`Q_{ij} = K(x_i, x_j)` is the kernel Gram matrix.

The optimal decision function is given by:

.. math:: x\mapsto \operatorname{sgn}(\sum_{i=1}^{n} \alpha_i K(x_i, x) - \rho) \,,

where :math:`+1` indicates an inlier and :math:`-1` an outlier.
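
The decision function can be reconstructed from the fitted dual
coefficients. A sketch (the data and parameters are arbitrary; in the
fitted estimator ``dual_coef_`` stores the nonzero :math:`\alpha_i` and
``intercept_`` plays the role of :math:`-\rho`)::

    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2)
    clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.2).fit(X)

    X_new = rng.randn(5, 2)
    K = rbf_kernel(X_new, clf.support_vectors_, gamma=0.5)
    # sum_i alpha_i K(x_i, x) - rho, evaluated at the new points
    manual = K @ clf.dual_coef_.ravel() + clf.intercept_
    assert np.allclose(manual, clf.decision_function(X_new))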

The parameter :math:`\nu\in(0,1]` determines the fraction of outliers
in the training dataset. More technically, :math:`\nu` is:

- an upper bound on the fraction of the training points lying outside
the estimated region;
- a lower bound on the fraction of support vectors.

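Both bounds are easy to check empirically. A sketch (the sample is
arbitrary, and the error bound holds only approximately on finite samples)::

    import numpy as np
    from sklearn.svm import OneClassSVM

    X = np.random.RandomState(42).randn(1000, 2)
    nu = 0.2
    clf = OneClassSVM(kernel="rbf", nu=nu).fit(X)

    outlier_frac = np.mean(clf.predict(X) == -1)      # roughly bounded by nu
    sv_frac = clf.support_vectors_.shape[0] / len(X)  # at least about nu
    print(outlier_frac, sv_frac)
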
.. topic:: References:

* `Estimating the support of a high-dimensional distribution
<http://dl.acm.org/citation.cfm?id=1119749>`_ Schölkopf,
Bernhard, et al. Neural computation 13.7 (2001): 1443-1471.
doi:10.1162/089976601750264965


.. _svm_svdd:

SVDD
----

Support Vector Data Description (SVDD), proposed by Tax and Duin (2004),
aims at finding a spherically shaped boundary around a data set. Specifically,
it computes a minimum-volume hypersphere (in the feature space induced by the
kernel) containing most of the data, with the number of outliers controlled
by a parameter of the model.

The original formulation suffered from non-convexity issues related to the
optimality of the attained solution for certain values of the regularization
parameter :math:`C`. Chang, Lee, and Lin (2013) suggested a reformulation of
the SVDD model with a well-defined and provably unique global solution for
any :math:`C>0`.

The implementation in the class :class:`SVDD` is based on a modified version
of the 2013 SVDD formulation. The following changes were made to problem (7)
in Chang et al. (2013):

* **sample weights**: instead of a uniform penalty :math:`C>0` sample
observations are allowed to have different costs :math:`(C_i)_{i=1}^{n}`,
:math:`\sum_{i=1}^{n} C_i > 0`;

* :math:`\nu`-**parametrization**: the penalties are determined by
  :math:`C_i = \frac{w_i}{\nu \sum_{j=1}^{n} w_j}`, where :math:`\nu\in(0, 1]`
  and :math:`(w_i)_{i=1}^{n}` are non-negative sample weights.
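
As a worked example of the reparametrization (plain arithmetic with
arbitrary weights, no estimator involved)::

    import numpy as np

    w = np.array([1.0, 2.0, 1.0, 4.0])  # non-negative sample weights
    nu = 0.5
    C = w / (nu * w.sum())              # per-sample penalties C_i
    print(C.sum())                      # always equals 1 / nu, here 2 > 1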

Contributor comment:

Same comment as for the OCSVM: as the other implementations in the SVM module do not include the weights in the description of their formulation, maybe we should do the same here. But let's see what the others think.

Member comment:

I would rather update all the formulas to also include the weights. In cases where the formula with the weights becomes too complex, maybe one could mention two formulas: the first without the weights (to get the gist of the loss function) and the second with the weights for the sake of completeness.

A straightforward extension of theorems 2-4 of Chang et al. (2013) to the
case of non-uniform penalties yields :math:`\sum_{i=1}^{n} C_i > 1`, or
equivalently :math:`\nu < 1`, as the condition distinguishing the case
:math:`R>0` (theorem 4, case 1) from :math:`R=0` (theorem 4, case 2).

The main benefit of the :math:`\nu`-parametrization is a clearer interpretation
and a unified interface to the :ref:`svm_one_class_svm` model: :math:`\nu` is an
upper bound on the fraction of the training points lying outside the estimated
region, and a lower bound on the fraction of support vectors. Under the original
:math:`C`-parametrization the value :math:`\frac{1}{n C}` served as these bounds.

Note that in a typical run of the SVDD model the weights are set to :math:`w_i = 1`,
which is equivalent to the original 2013 SVDD formulation for :math:`C = \frac{1}{\nu n}`.

The primal problem of this modified version of SVDD for the training sample
:math:`(x_i)_{i=1}^{n}` with weights :math:`(w_i)_{i=1}^{n}`,
:math:`\sum_{i=1}^{n} w_i>0`, is:


.. math::

\min_{R,\xi,a} R + \frac{1}{\nu W} \sum_{i=1}^{n} w_i \xi_i\,,\\

\textrm {subject to } & \|\phi(x_i) - a\|^2 \leq R + \xi_i\,,\\
& \xi_i \geq 0\,,\, i=1, \ldots, n\,,\\
& R \geq 0\,,


where :math:`\phi(\cdot)` is the feature map associated with the kernel
:math:`K(\cdot,\cdot)`, and :math:`W = \sum_{i=1}^{n} w_i`.

When :math:`\nu \geq 1`, the optimal :math:`R` is :math:`0` and the primal
problem reduces to an unconstrained convex optimization problem independent
of :math:`\nu`:

.. math :: \min_a \sum_{i=1}^{n} w_i \|\phi(x_i) - a\|^2\,,

whose solution is the weighted centroid of the sample in the feature space,
:math:`a = \frac{1}{W}\sum_{i=1}^{n} w_i \phi(x_i)`. Note that in this case
every observation is an outlier.

When :math:`\nu < 1`, the constraint :math:`R\geq 0` is redundant, strong
duality holds, and the dual problem has the form:


.. math ::

\min_\alpha \frac12 \alpha^T Q\alpha - \frac{\nu W}{2} \sum_{i=1}^{n} \alpha_i Q_{ii}\,,\\

\textrm {subject to } & 0 \leq \alpha_i \leq w_i\,,\, i=1, \ldots, n\,,\\
& e^T \alpha = \nu W\,,


where :math:`e\in \mathbb{R}^{n\times 1}` is the vector of ones and
:math:`Q_{ij} = K(x_i, x_j)` is the kernel Gram matrix.

The decision function of the SVDD is given by:

.. math:: x\mapsto \operatorname{sgn}(R - \|\phi(x) - a\|^2) \,,

where :math:`+1` indicates an inlier and :math:`-1` an outlier. The
distances in the feature space and :math:`R` are computed implicitly through
the dual coefficients and the optimal value of the objective of the
corresponding dual problem.
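
In code, these implicitly computed quantities are surfaced through the
usual scoring API. A sketch (the data and parameters are arbitrary, and it
is assumed that :class:`SVDD` exposes the same methods as
:class:`OneClassSVM`)::

    import numpy as np
    from sklearn.svm import SVDD

    X = np.random.RandomState(0).randn(200, 2)
    svdd = SVDD(kernel="rbf", nu=0.1).fit(X)

    # the sign of the decision function matches sgn(R - ||phi(x) - a||^2)
    scores = svdd.decision_function(X)
    print(np.mean(scores < 0))  # fraction flagged as outliers, about nu or less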

It is worth noting that for a stationary kernel :math:`K(x,y)=K(x-y)`
the SVDD and One-Class SVM models are provably equivalent. Indeed, the values
:math:`Q_{ii} = K(x_i, x_i)` in the last term of the dual of the SVDD are all
equal to :math:`K(0)`, which makes the whole term independent of :math:`\alpha`.
Therefore the objective functions of the dual problems of the One-Class SVM
and the SVDD coincide up to a constant. This, however, **does not imply**
that one model generalizes the other: their solutions just happen to coincide
for a particular family of kernels (see :ref:`outlier_detection_ocsvm_vs_svdd`).

.. topic:: References:

* `Support vector data description
<http://dx.doi.org/10.1023/B:MACH.0000008084.60811.49>`_
Tax, and Duin. Machine learning, 54(1) (2004), pp.45-66.

* `A revisit to support vector data description (SVDD).
<http://w.csie.org/~cjlin/papers/svdd.pdf>`_ Chang, Lee,
and Lin. Technical Report (2013), Dept. of Computer Science,
National Taiwan University.


.. _svm_implementation_details:

Implementation details
4 changes: 3 additions & 1 deletion doc/whats_new/_contributors.rst
@@ -176,4 +176,6 @@

.. _Nicolas Hug: https://github.com/NicolasHug

.. _Guillaume Lemaitre: https://github.com/glemaitre

.. _Ivan Nazarov: https://github.com/ivannz
4 changes: 4 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -327,6 +327,10 @@ Changelog
:class:`svm.NuSVR`, :class:`svm.SVR`, :class:`svm.OneClassSVM`.
:pr:`22898` by :user:`Meekail Zain <micky774>`.

- |Feature| Added the :class:`svm.SVDD` class for novelty detection based
  on a soft minimal-volume hypersphere around the sample data. :pr:`7910`
  by :user:`Ivan Nazarov <ivannz>`.

:mod:`sklearn.tree`
...................

1 change: 1 addition & 0 deletions examples/miscellaneous/plot_anomaly_comparison.py
@@ -108,6 +108,7 @@
),
),
),
("SVDD", svm.SVDD(nu=outliers_fraction, kernel="rbf", gamma=0.1)),
(
"Isolation Forest",
IsolationForest(contamination=outliers_fraction, random_state=42),
6 changes: 3 additions & 3 deletions examples/svm/plot_oneclass.py
@@ -1,11 +1,11 @@
"""
==========================================
One-class SVM with non-linear kernel (RBF)
One-Class SVM with non-linear kernel (RBF)
==========================================

An example using a one-class SVM for novelty detection.
An example using a One-Class SVM for novelty detection.

:ref:`One-class SVM <svm_outlier_detection>` is an unsupervised
:ref:`One-Class SVM <svm_outlier_detection>` is an unsupervised
algorithm that learns a decision function for novelty detection:
classifying new data as similar or different to the training set.
