Merged
2 changes: 1 addition & 1 deletion doc/modules/density.rst
@@ -90,7 +90,7 @@ Here we have used ``kernel='gaussian'``, as seen above.
Mathematically, a kernel is a positive function :math:`K(x;h)`
which is controlled by the bandwidth parameter :math:`h`.
Given this kernel form, the density estimate at a point :math:`y` within
- a group of points :math:`x_i; i=1\cdots N` is given by:
+ a group of points :math:`x_i; i=1, \cdots, N` is given by:

.. math::
\rho_K(y) = \sum_{i=1}^{N} K(y - x_i; h)
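
As a quick numerical check of the sum this hunk touches, the following is a minimal sketch that assumes a normalized one-dimensional Gaussian kernel. The helper names `gaussian_kernel` and `density_estimate` are hypothetical and are not part of scikit-learn.

```python
import math


def gaussian_kernel(u, h):
    """Normalized 1-D Gaussian kernel K(u; h) with bandwidth h (assumed form)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def density_estimate(y, points, h):
    """rho_K(y) = sum_{i=1}^{N} K(y - x_i; h), with no 1/N factor, as written in the formula."""
    return sum(gaussian_kernel(y - x_i, h) for x_i in points)


# Three sample points x_1, x_2, x_3 and a query point y = 0.
points = [-1.0, 0.0, 1.0]
rho = density_estimate(0.0, points, h=1.0)
```

The index in the loop runs over every point from the first to the N-th, which is what the corrected notation :math:`i=1, \cdots, N` states.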
2 changes: 1 addition & 1 deletion doc/modules/gaussian_process.rst
@@ -337,7 +337,7 @@ of a :class:`Sum` kernel, where it modifies the mean of the Gaussian process.
It depends on a parameter :math:`constant\_value`. It is defined as:

.. math::
- k(x_i, x_j) = constant\_value \;\forall\; x_1, x_2
+ k(x_i, x_j) = constant\_value \;\forall\; x_i, x_j
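
To illustrate why the quantifier should range over the same :math:`x_i, x_j` that appear on the left-hand side, here is a minimal sketch of the Gram matrix such a kernel produces. The function `constant_kernel` is a hypothetical stand-in written for this note, not the scikit-learn implementation.

```python
def constant_kernel(X, Y=None, constant_value=1.0):
    """Gram matrix of a constant kernel: every entry k(x_i, x_j) equals constant_value."""
    if Y is None:
        Y = X
    # The inputs are never inspected: the value is the same for all pairs (x_i, x_j).
    return [[constant_value for _ in Y] for _ in X]


# Three arbitrary 1-D samples; the resulting 3x3 matrix is filled with 2.5.
X = [[0.0], [1.0], [5.0]]
K = constant_kernel(X, constant_value=2.5)
```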

The main use-case of the :class:`WhiteKernel` kernel is as part of a
sum-kernel where it explains the noise-component of the signal. Tuning its
2 changes: 1 addition & 1 deletion doc/modules/linear_model.rst
@@ -383,7 +383,7 @@ scikit-learn.
For a linear Gaussian model, the maximum log-likelihood is defined as:

.. math::
- \log(\hat{L}) = - \frac{n}{2} \log(2 \pi) - \frac{n}{2} \ln(\sigma^2) - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{2\sigma^2}
+ \log(\hat{L}) = - \frac{n}{2} \log(2 \pi) - \frac{n}{2} \log(\sigma^2) - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{2\sigma^2}

where :math:`\sigma^2` is an estimate of the noise variance,
:math:`y_i` and :math:`\hat{y}_i` are respectively the true and predicted
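
The three terms of the formula in this hunk can be evaluated directly. The sketch below assumes every logarithm is the natural logarithm, which is why writing `\log` throughout is consistent; the function name `gaussian_log_likelihood` is hypothetical.

```python
import math


def gaussian_log_likelihood(y_true, y_pred, sigma2):
    """Maximum log-likelihood of a linear Gaussian model for a given noise variance sigma2."""
    n = len(y_true)
    # Sum of squared residuals: sum_i (y_i - y_hat_i)^2
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    return (
        -0.5 * n * math.log(2.0 * math.pi)  # -(n/2) log(2 pi)
        - 0.5 * n * math.log(sigma2)        # -(n/2) log(sigma^2)
        - rss / (2.0 * sigma2)              # -RSS / (2 sigma^2)
    )


# With perfect predictions and unit variance, only the first term remains.
log_l = gaussian_log_likelihood([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], sigma2=1.0)
```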
4 changes: 2 additions & 2 deletions doc/modules/neural_networks_supervised.rst
@@ -22,7 +22,7 @@ Multi-layer Perceptron
**Multi-layer Perceptron (MLP)** is a supervised learning algorithm that learns
a function :math:`f: R^m \rightarrow R^o` by training on a dataset,
where :math:`m` is the number of dimensions for input and :math:`o` is the
- number of dimensions for output. Given a set of features :math:`X = {x_1, x_2, ..., x_m}`
+ number of dimensions for output. Given a set of features :math:`X = \{x_1, x_2, ..., x_m\}`
and a target :math:`y`, it can learn a non-linear function approximator for either
classification or regression. It is different from logistic regression, in that
between the input and the output layer, there can be one or more non-linear
@@ -233,7 +233,7 @@ training.

.. dropdown:: Mathematical formulation

- Given a set of training examples :math:`(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)`
+ Given a set of training examples :math:`\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}`
where :math:`x_i \in \mathbf{R}^n` and :math:`y_i \in \{0, 1\}`, a one hidden
layer one hidden neuron MLP learns the function :math:`f(x) = W_2 g(W_1^T x + b_1) + b_2`
where :math:`W_1 \in \mathbf{R}^m` and :math:`W_2, b_1, b_2 \in \mathbf{R}` are
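
For context on the function named in this hunk, a minimal sketch of :math:`f(x) = W_2 g(W_1^T x + b_1) + b_2` follows. The hunk does not define :math:`g`, so the sketch assumes the hyperbolic tangent; the function name and the sample weights are hypothetical.

```python
import math


def one_hidden_neuron_mlp(x, W1, b1, W2, b2, g=math.tanh):
    """f(x) = W2 * g(W1^T x + b1) + b2, with W1 a vector and W2, b1, b2 scalars."""
    # W1^T x: dot product between the weight vector and the input vector.
    pre_activation = sum(w * x_j for w, x_j in zip(W1, x)) + b1
    return W2 * g(pre_activation) + b2


# Sample weights chosen so that W1^T x + b1 = 0, hence g(0) = 0 and f(x) = b2.
out = one_hidden_neuron_mlp(x=[1.0, 2.0], W1=[0.5, -0.25], b1=0.0, W2=3.0, b2=0.7)
```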
2 changes: 1 addition & 1 deletion doc/modules/sgd.rst
@@ -405,7 +405,7 @@ Mathematical formulation
We describe here the mathematical details of the SGD procedure. A good
overview with convergence rates can be found in [#6]_.

- Given a set of training examples :math:`(x_1, y_1), \ldots, (x_n, y_n)` where
+ Given a set of training examples :math:`\{(x_1, y_1), \ldots, (x_n, y_n)\}` where
:math:`x_i \in \mathbf{R}^m` and :math:`y_i \in \mathbf{R}`
(:math:`y_i \in \{-1, 1\}` for classification),
our goal is to learn a linear scoring function