
DOC add cross-reference to examples instead of duplicating content for GPR #20003


Merged · 47 commits · Sep 12, 2023
Commits (47)
a0d84a5
DOC revamp documentation of GPR
glemaitre Apr 28, 2021
286408e
iter
glemaitre Apr 28, 2021
0383114
iter
glemaitre May 8, 2021
69a984e
iter
glemaitre May 17, 2021
e3f359d
iter
glemaitre May 17, 2021
5cababf
Merge remote-tracking branch 'origin/main' into user_guide_gp
glemaitre May 17, 2021
bd281fa
remove exercise
glemaitre May 17, 2021
7de916b
iter
glemaitre May 18, 2021
8e90750
iter
glemaitre May 18, 2021
37d1cef
iter
glemaitre May 18, 2021
d073926
iter
glemaitre May 18, 2021
1a674e5
iter
glemaitre May 18, 2021
d08a09c
Apply suggestions from code review
glemaitre May 26, 2021
7c2d3a6
Update examples/gaussian_process/plot_compare_gpr_krr.py
glemaitre May 26, 2021
83cb215
Update examples/gaussian_process/plot_compare_gpr_krr.py
glemaitre May 26, 2021
c965bdb
Update examples/gaussian_process/plot_compare_gpr_krr.py
glemaitre May 26, 2021
a20e370
Update examples/gaussian_process/plot_gpr_co2.py
glemaitre May 26, 2021
9bedda0
Update examples/gaussian_process/plot_gpr_co2.py
glemaitre May 26, 2021
a2e4057
Update examples/gaussian_process/plot_gpr_co2.py
glemaitre May 26, 2021
76a4a11
Update examples/gaussian_process/plot_gpr_co2.py
glemaitre May 26, 2021
d0a4396
Update examples/gaussian_process/plot_compare_gpr_krr.py
glemaitre May 26, 2021
e1467af
Apply suggestions from code review
glemaitre May 26, 2021
e0b5422
Update examples/gaussian_process/plot_compare_gpr_krr.py
glemaitre May 26, 2021
41a8b2e
Update examples/gaussian_process/plot_compare_gpr_krr.py
glemaitre May 26, 2021
7cfc324
Apply suggestions from code review
glemaitre May 26, 2021
8d5a356
Update examples/gaussian_process/plot_gpr_noisy_targets.py
glemaitre May 26, 2021
2329354
Update examples/gaussian_process/plot_gpr_noisy_targets.py
glemaitre May 26, 2021
49db719
Update examples/gaussian_process/plot_gpr_noisy_targets.py
glemaitre May 26, 2021
9e87455
Update examples/gaussian_process/plot_gpr_noisy_targets.py
glemaitre May 26, 2021
2089316
Update examples/gaussian_process/plot_gpr_noisy_targets.py
glemaitre May 26, 2021
88f4278
Update examples/gaussian_process/plot_gpr_noisy.py
glemaitre May 26, 2021
1d7bf95
Apply suggestions from code review
glemaitre May 26, 2021
39f62e0
Update examples/gaussian_process/plot_gpr_co2.py
glemaitre May 26, 2021
d3509e3
Update examples/gaussian_process/plot_gpr_co2.py
glemaitre May 26, 2021
81ee851
Merge remote-tracking branch 'origin/main' into user_guide_gp
glemaitre May 26, 2021
9ac4237
PEP8
glemaitre Jun 11, 2021
8a468d1
fix
glemaitre Jun 11, 2021
4552275
iter
glemaitre Jun 11, 2021
3e86ff7
iter
glemaitre Jun 11, 2021
da32d5e
Merge remote-tracking branch 'origin/main' into user_guide_gp
glemaitre Nov 3, 2021
5677f19
Merge remote-tracking branch 'origin/main' into user_guide_gp
glemaitre Aug 2, 2023
bd8bb62
forgot to save
glemaitre Aug 2, 2023
a547d34
Merge remote-tracking branch 'origin/main' into user_guide_gp
glemaitre Aug 11, 2023
87c9272
improve title
glemaitre Aug 11, 2023
45bfb48
Merge remote-tracking branch 'origin/main' into user_guide_gp
glemaitre Aug 29, 2023
53b2eda
add changes proposed by Noa
glemaitre Aug 29, 2023
b06bdca
Merge branch 'main' into user_guide_gp
glemaitre Sep 11, 2023
212 changes: 43 additions & 169 deletions doc/modules/gaussian_process.rst
Contributor:

I would strongly encourage including the term NONPARAMETRIC when discussing GPs.

I would rephrase the opening (lines 9-11):
Gaussian Processes (GP) are a nonparametric supervised learning method used
to solve regression and probabilistic classification problems.

@@ -1,5 +1,3 @@


.. _gaussian_process:

==================
@@ -8,7 +6,7 @@ Gaussian Processes

.. currentmodule:: sklearn.gaussian_process

**Gaussian Processes (GP)** are a generic supervised learning method designed
**Gaussian Processes (GP)** are a nonparametric supervised learning method used
to solve *regression* and *probabilistic classification* problems.

The advantages of Gaussian processes are:
@@ -27,8 +25,8 @@ The advantages of Gaussian processes are:

The disadvantages of Gaussian processes include:

- They are not sparse, i.e., they use the whole samples/features information to
  perform the prediction.
- Our implementation is not sparse, i.e., it uses the whole samples/features
  information to perform the prediction.

Contributor: Sparse Gaussian processes are a thing, just not in scikit-learn: one defines a set of inducing points (smaller than the data) and uses them for learning the GP instead of the full data. This is a good blog post about it.

- They lose efficiency in high dimensional spaces -- namely when the number
of features exceeds a few dozen.
@@ -42,31 +40,44 @@ Gaussian Process Regression (GPR)
.. currentmodule:: sklearn.gaussian_process

The :class:`GaussianProcessRegressor` implements Gaussian processes (GP) for
regression purposes. For this, the prior of the GP needs to be specified. The
prior mean is assumed to be constant and zero (for ``normalize_y=False``) or the
training data's mean (for ``normalize_y=True``). The prior's
covariance is specified by passing a :ref:`kernel <gp_kernels>` object. The
hyperparameters of the kernel are optimized during fitting of
GaussianProcessRegressor by maximizing the log-marginal-likelihood (LML) based
on the passed ``optimizer``. As the LML may have multiple local optima, the
optimizer can be started repeatedly by specifying ``n_restarts_optimizer``. The
first run is always conducted starting from the initial hyperparameter values
of the kernel; subsequent runs are conducted from hyperparameter values
that have been chosen randomly from the range of allowed values.
If the initial hyperparameters should be kept fixed, `None` can be passed as
optimizer.
regression purposes. For this, the prior of the GP needs to be specified. The
GP combines this prior with a likelihood function based on the training
samples. This yields a probabilistic approach to prediction: both the mean and
the standard deviation are returned when predicting.

The noise level in the targets can be specified by passing it via the
parameter ``alpha``, either globally as a scalar or per datapoint.
Note that a moderate noise level can also be helpful for dealing with numeric
issues during fitting as it is effectively implemented as Tikhonov
regularization, i.e., by adding it to the diagonal of the kernel matrix. An
alternative to specifying the noise level explicitly is to include a
WhiteKernel component into the kernel, which can estimate the global noise
level from the data (see example below).
.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_002.png
:target: ../auto_examples/gaussian_process/plot_gpr_noisy_targets.html
:align: center

The prior mean is assumed to be constant and zero (for `normalize_y=False`) or
the training data's mean (for `normalize_y=True`). The prior's covariance is
specified by passing a :ref:`kernel <gp_kernels>` object. The hyperparameters
of the kernel are optimized when fitting the :class:`GaussianProcessRegressor`
by maximizing the log-marginal-likelihood (LML) based on the passed
`optimizer`. As the LML may have multiple local optima, the optimizer can be
started repeatedly by specifying `n_restarts_optimizer`. The first run is
always conducted starting from the initial hyperparameter values of the kernel;
subsequent runs are conducted from hyperparameter values that have been chosen
randomly from the range of allowed values. If the initial hyperparameters
should be kept fixed, `None` can be passed as optimizer.
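
A minimal sketch of this workflow (the data and the initial hyperparameter
values below are illustrative, not taken from any example)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, 20).reshape(-1, 1)
    y = np.sin(X).ravel()

    # The kernel only sets *initial* hyperparameter values; they are tuned
    # during fit by maximizing the log-marginal-likelihood (LML).
    kernel = 1.0 * RBF(length_scale=1.0)
    gpr = GaussianProcessRegressor(
        kernel=kernel,
        n_restarts_optimizer=5,  # rerun the optimizer from 5 random starts
        normalize_y=False,       # constant zero-mean prior
        # optimizer=None,        # would keep the initial hyperparameters fixed
    ).fit(X, y)
    print(gpr.kernel_)  # kernel with LML-optimized hyperparameters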

The noise level in the targets can be specified by passing it via the parameter
`alpha`, either globally as a scalar or per datapoint. Note that a moderate
noise level can also be helpful for dealing with numeric instabilities during
fitting as it is effectively implemented as Tikhonov regularization, i.e., by
adding it to the diagonal of the kernel matrix. An alternative to specifying
the noise level explicitly is to include a
:class:`~sklearn.gaussian_process.kernels.WhiteKernel` component into the
kernel, which can estimate the global noise level from the data (see example
below). The figure below shows how a noisy target is handled by setting the
parameter `alpha`.
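
As a sketch of the two options on synthetic noisy data (all values below are
hypothetical)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, 20).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

    # Option 1: known noise level passed via `alpha` (scalar or per-sample)
    gpr_alpha = GaussianProcessRegressor(
        kernel=1.0 * RBF(), alpha=0.1**2
    ).fit(X, y)

    # Option 2: estimate the noise level from data with a WhiteKernel component
    kernel = 1.0 * RBF() + WhiteKernel(noise_level=1.0)
    gpr_white = GaussianProcessRegressor(kernel=kernel).fit(X, y)
    print(gpr_white.kernel_)  # fitted WhiteKernel shows the estimated noise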

.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_003.png
:target: ../auto_examples/gaussian_process/plot_gpr_noisy_targets.html
:align: center

The implementation is based on Algorithm 2.1 of [RW2006]_. In addition to
the API of standard scikit-learn estimators, GaussianProcessRegressor:
the API of standard scikit-learn estimators, :class:`GaussianProcessRegressor`:

* allows prediction without prior fitting (based on the GP prior)

@@ -77,149 +88,12 @@ the API of standard scikit-learn estimators, GaussianProcessRegressor:

* provides an additional method ``sample_y(X)``, which evaluates samples
  drawn from the GPR (prior or posterior) at given inputs

* exposes a method ``log_marginal_likelihood(theta)``, which can be used
  externally for other ways of selecting hyperparameters, e.g., via
  Markov chain Monte Carlo.
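
A short sketch of these extra capabilities (synthetic data; values are
illustrative)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.linspace(0, 5, 10).reshape(-1, 1)
    gpr = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=1.0))

    # Prediction without fitting: mean and std of the GP *prior*
    mean_prior, std_prior = gpr.predict(X, return_std=True)

    # Samples drawn from the prior at the given inputs
    prior_samples = gpr.sample_y(X, n_samples=3, random_state=0)

    # After fitting, the LML can be evaluated for arbitrary hyperparameters
    gpr.fit(X, np.sin(X).ravel())
    lml = gpr.log_marginal_likelihood(gpr.kernel_.theta)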

.. topic:: Examples

Member: maybe rename to gpr_examples? we might easily conflict with another place defining an Examples topic.

Member Author: This is a topic and not the name of a section. I don't think you can use it as a reference target.

GPR examples
============

GPR with noise-level estimation
-------------------------------

This example illustrates that GPR with a sum-kernel including a WhiteKernel can
estimate the noise level of data. An illustration of the
log-marginal-likelihood (LML) landscape shows that there exist two local
maxima of LML.

.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_003.png
:target: ../auto_examples/gaussian_process/plot_gpr_noisy.html
:align: center

The first corresponds to a model with a high noise level and a
large length scale, which explains all variations in the data by noise.

.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_004.png
:target: ../auto_examples/gaussian_process/plot_gpr_noisy.html
:align: center

The second one has a smaller noise level and shorter length scale, which explains
most of the variation by the noise-free functional relationship. The second
model has a higher likelihood; however, depending on the initial value for the
hyperparameters, the gradient-based optimization might also converge to the
high-noise solution. It is thus important to repeat the optimization several
times for different initializations.

.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_005.png
:target: ../auto_examples/gaussian_process/plot_gpr_noisy.html
:align: center
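
A minimal sketch of such a sum-kernel setup (synthetic data; values are
illustrative)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, 25).reshape(-1, 1)
    y = 0.5 * np.sin(3 * X).ravel() + rng.normal(0, 0.3, X.shape[0])

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    # Restarting the optimizer lowers the risk of ending in the high-noise
    # local maximum of the LML described above.
    gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
    gpr.fit(X, y)
    print(gpr.kernel_)  # the WhiteKernel's noise_level is the estimated noise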


Comparison of GPR and Kernel Ridge Regression
---------------------------------------------

Both kernel ridge regression (KRR) and GPR learn
a target function by internally employing the "kernel trick". KRR learns a
linear function in the space induced by the respective kernel which corresponds
to a non-linear function in the original space. The linear function in the
kernel space is chosen based on the mean-squared error loss with
ridge regularization. GPR uses the kernel to define the covariance of
a prior distribution over the target functions and uses the observed training
data to define a likelihood function. Based on Bayes' theorem, a (Gaussian)
posterior distribution over target functions is defined, whose mean is used
for prediction.

A major difference is that GPR can choose the kernel's hyperparameters based
on gradient-ascent on the marginal likelihood function while KRR needs to
perform a grid search on a cross-validated loss function (mean-squared error
loss). A further difference is that GPR learns a generative, probabilistic
model of the target function and can thus provide meaningful confidence
intervals and posterior samples along with the predictions while KRR only
provides predictions.

The following figure illustrates both methods on an artificial dataset, which
consists of a sinusoidal target function and strong noise. The figure compares
the learned model of KRR and GPR based on an ExpSineSquared kernel, which is
suited for learning periodic functions. The kernel's hyperparameters control
the smoothness (length_scale) and periodicity of the kernel (periodicity).
Moreover, GPR learns the noise level of the data explicitly via an additional
WhiteKernel component in the kernel, whereas KRR absorbs it through its
regularization parameter alpha.

.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_compare_gpr_krr_005.png
:target: ../auto_examples/gaussian_process/plot_compare_gpr_krr.html
:align: center

The figure shows that both methods learn reasonable models of the target
function. GPR provides reasonable confidence bounds on the prediction which are not
available for KRR. A major difference between the two methods is the time
required for fitting and predicting: while fitting KRR is fast in principle,
the grid-search for hyperparameter optimization scales exponentially with the
number of hyperparameters ("curse of dimensionality"). The gradient-based
optimization of the parameters in GPR does not suffer from this exponential
scaling and is thus considerably faster on this example with 3-dimensional
hyperparameter space. The time for predicting is similar; however, generating
the variance of the predictive distribution of GPR takes considerably longer
than just predicting the mean.
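
A condensed sketch of the comparison (synthetic data; the grids and values
are illustrative simplifications, not those of the example)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import GridSearchCV

    rng = np.random.RandomState(0)
    X = 15 * rng.rand(100, 1)
    y = np.sin(X).ravel() + 0.5 * rng.normal(size=X.shape[0])

    # KRR: hyperparameters chosen by grid search on a cross-validated loss
    krr = GridSearchCV(
        KernelRidge(kernel=ExpSineSquared()),
        param_grid={
            "alpha": [1e0, 1e-1, 1e-2],
            "kernel__length_scale": [0.1, 1.0, 10.0],
            "kernel__periodicity": [1.0, 3.0, 6.0],
        },
    ).fit(X, y)

    # GPR: hyperparameters (incl. noise level) tuned by gradient ascent on LML
    kernel = 1.0 * ExpSineSquared() + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
    mean, std = gpr.predict(X, return_std=True)  # KRR gives no predictive std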

GPR on Mauna Loa CO2 data
-------------------------

This example is based on Section 5.4.3 of [RW2006]_.
It illustrates an example of complex kernel engineering and
hyperparameter optimization using gradient ascent on the
log-marginal-likelihood. The data consists of the monthly average atmospheric
CO2 concentrations (in parts per million by volume (ppmv)) collected at the
Mauna Loa Observatory in Hawaii, between 1958 and 1997. The objective is to
model the CO2 concentration as a function of the time t.

The kernel is composed of several terms that are responsible for explaining
different properties of the signal (a construction sketch follows the list):

- a long-term, smooth rising trend is to be explained by an RBF kernel. The
RBF kernel with a large length-scale enforces this component to be smooth;
it is not enforced that the trend is rising, which leaves this choice to the
GP. The specific length-scale and the amplitude are free hyperparameters.

- a seasonal component, which is to be explained by the periodic
ExpSineSquared kernel with a fixed periodicity of 1 year. The length-scale
of this periodic component, controlling its smoothness, is a free parameter.
In order to allow decaying away from exact periodicity, the product with an
RBF kernel is taken. The length-scale of this RBF component controls the
decay time and is a further free parameter.

- smaller, medium-term irregularities are to be explained by a
RationalQuadratic kernel component, whose length-scale and alpha parameter,
which determines the diffuseness of the length-scales, are to be determined.
According to [RW2006]_, these irregularities can better be explained by
a RationalQuadratic than an RBF kernel component, probably because it can
accommodate several length-scales.

- a "noise" term, consisting of an RBF kernel contribution, which shall
explain the correlated noise components such as local weather phenomena,
and a WhiteKernel contribution for the white noise. The relative amplitudes
and the RBF's length scale are further free parameters.
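
The composite kernel described above could be assembled as follows (a sketch;
the initial values are rough guesses, not the tuned ones)::

    from sklearn.gaussian_process.kernels import (
        RBF, ExpSineSquared, RationalQuadratic, WhiteKernel,
    )

    # All free hyperparameters are tuned during fitting by maximizing the LML.
    long_term = 50.0**2 * RBF(length_scale=50.0)      # smooth rising trend
    seasonal = (
        2.0**2
        * RBF(length_scale=100.0)                     # decay from periodicity
        * ExpSineSquared(length_scale=1.0, periodicity=1.0,
                         periodicity_bounds="fixed")  # fixed 1-year period
    )
    irregular = 0.5**2 * RationalQuadratic(alpha=1.0, length_scale=1.0)
    noise = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.1**2)

    co2_kernel = long_term + seasonal + irregular + noise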

Maximizing the log-marginal-likelihood after subtracting the target's mean
yields the following kernel with an LML of -83.214:

::

  34.4**2 * RBF(length_scale=41.8)
  + 3.27**2 * RBF(length_scale=180) * ExpSineSquared(length_scale=1.44,
                                                     periodicity=1)
  + 0.446**2 * RationalQuadratic(alpha=17.7, length_scale=0.957)
  + 0.197**2 * RBF(length_scale=0.138) + WhiteKernel(noise_level=0.0336)

Thus, most of the target signal (34.4ppm) is explained by a long-term rising
trend (length-scale 41.8 years). The periodic component has an amplitude of
3.27ppm, a decay time of 180 years and a length-scale of 1.44. The long decay
time indicates that the seasonal component is locally very close to periodic.
The correlated noise has an amplitude of 0.197ppm with a length scale of
0.138 years, and the white-noise contribution has an amplitude of 0.183ppm
(the square root of the fitted noise_level of 0.0336). Thus, the overall
noise level is very small, indicating that the data can be very well
explained by the model. The figure also shows that the model makes very
confident predictions until around 2015.

.. figure:: ../auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_003.png
:target: ../auto_examples/gaussian_process/plot_gpr_co2.html
:align: center
* :ref:`sphx_glr_auto_examples_gaussian_process_plot_gpr_noisy_targets.py`
* :ref:`sphx_glr_auto_examples_gaussian_process_plot_gpr_noisy.py`
* :ref:`sphx_glr_auto_examples_gaussian_process_plot_compare_gpr_krr.py`
* :ref:`sphx_glr_auto_examples_gaussian_process_plot_gpr_co2.py`

.. _gpc:

6 changes: 3 additions & 3 deletions examples/gaussian_process/plot_gpr_co2.py
@@ -1,7 +1,7 @@
"""
=======================================================
Gaussian process regression (GPR) on Mauna Loa CO2 data
=======================================================
=====================================================================================
Forecasting of CO2 level on Mauna Loa dataset using Gaussian process regression (GPR)
=====================================================================================

This example is based on Section 5.4.3 of "Gaussian Processes for Machine
Learning" [RW2006]_. It illustrates an example of complex kernel engineering
6 changes: 3 additions & 3 deletions examples/gaussian_process/plot_gpr_noisy.py
@@ -1,7 +1,7 @@
"""
=============================================================
Gaussian process regression (GPR) with noise-level estimation
=============================================================
=========================================================================
Ability of Gaussian process regression (GPR) to estimate data noise-level
=========================================================================

This example shows the ability of the
:class:`~sklearn.gaussian_process.kernels.WhiteKernel` to estimate the noise