Ch-1 Probability Distributions
• Example: for N = 20 trials with success probability p = 0.5,
P(X = 6) = (20 choose 6) · 0.5^6 · 0.5^14 ≈ 0.0369644
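The example above can be checked with a short computation of the binomial pmf; this is a minimal sketch using only the standard library (the function name is ours, not from the slides).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): (n choose k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduce the slide's example: N = 20, p = 0.5
print(binomial_pmf(6, 20, 0.5))  # ≈ 0.0369644
```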
• (The gamma function extends the factorial to real numbers, i.e., Γ(n) =
(n − 1)!.) The mean and variance of Beta(μ|a, b) are given by
E[μ] = a / (a + b),  var[μ] = ab / ((a + b)² (a + b + 1))
with l = N − m (the number of observations of x = 0).
• Simple interpretation of hyperparameters a and b as effective
numbers of observations of x = 1 and x = 0 (a priori)
• As we observe new data, a and b are updated
• As N → ∞, the variance (uncertainty) decreases and the mean
converges to the ML estimate
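The conjugate Beta–Bernoulli update described by these bullets can be sketched as follows; the prior values and the data are hypothetical, chosen only for illustration.

```python
def beta_update(a, b, data):
    """Conjugate update: each observation x=1 increments a, each x=0 increments b."""
    ones = sum(data)
    return a + ones, b + len(data) - ones

def beta_mean_var(a, b):
    """Mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)) of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

a, b = 2, 2          # prior: 2 effective observations of each outcome (assumed)
data = [1, 1, 0, 1]  # hypothetical coin flips
a, b = beta_update(a, b, data)
print(a, b, beta_mean_var(a, b))
```

As more data arrive, a + b grows, so the variance shrinks and the mean approaches the ML estimate m/N.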
Figure 2.5 Plots of the Dirichlet distribution over three variables, where the two horizontal axes
are coordinates in the plane of the simplex and the vertical axis corresponds to the value of the
density. Here {αk} = 0.1 on the left plot, {αk} = 1 in the centre plot, and {αk} = 10 in the right plot.
Machine Learning Fantahun B. (PhD) @ AAU-AAiT 20
Multinomial Variables: Dirichlet Distribution
• Some plots of a Dirichlet distribution over 3 variables:
Dirichlet distributions with parameters (clockwise from top left): α = (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).
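Samples like those plotted above can be drawn with the standard construction of normalizing independent Gamma draws; this is a sketch, not code from the slides.

```python
import random

def sample_dirichlet(alpha):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

random.seed(0)
sample = sample_dirichlet([6, 2, 2])
print(sample)  # a point on the simplex: non-negative components summing to 1
```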
Multinomial Variables: Bayesian Way 3/3
• Multiplying the prior (2.38) by the likelihood function (2.34)
yields the posterior:
p(μ|D, α) ∝ p(D|μ) p(μ|α) ∝ ∏k μk^(αk + mk − 1)
i.e., the posterior is again a Dirichlet, Dir(μ|α + m).
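In code, this conjugate update is just adding the observed category counts m to the prior parameters α; the prior and counts below are hypothetical.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dir(alpha + m) after observing counts m_k per category."""
    return [a + m for a, m in zip(alpha, counts)]

alpha = [1.0, 1.0, 1.0]   # symmetric prior (assumed)
counts = [5, 2, 3]        # hypothetical observed category counts
post = dirichlet_posterior(alpha, counts)
mean = [a / sum(post) for a in post]   # posterior mean: alpha_k / sum(alpha)
print(post, mean)
```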
Figure 2.6 Histogram plots of the mean of N uniformly distributed numbers for various values
of N. We observe that as N increases, the distribution tends towards a Gaussian.
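The tendency shown in Figure 2.6 can be reproduced numerically: the sample mean of N uniforms concentrates around 0.5 with variance 1/(12N). This is a minimal stdlib sketch; the trial count and seed are arbitrary choices of ours.

```python
import random
import statistics

def mean_of_uniforms(n, trials=2000, seed=1):
    """Return `trials` realizations of the mean of n Uniform(0,1) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

# As N grows, the empirical variance shrinks toward 1/(12N)
for n in (1, 10):
    samples = mean_of_uniforms(n)
    print(n, statistics.fmean(samples), statistics.variance(samples))
```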
The Gaussian distribution: Properties
• The density is a function of the Mahalanobis distance Δ from x to μ:
Δ² = (x − μ)ᵀ Σ⁻¹ (x − μ)
where Σ is the covariance matrix; Δ reduces to the Euclidean distance when Σ = I.
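The quadratic form above can be computed directly; here is a sketch for the 2-D case, inverting the 2×2 covariance by hand (the function name is ours).

```python
def mahalanobis2_2d(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) in 2-D."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    # inverse of a 2x2 matrix, then the quadratic form
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)

# With Sigma = I the Mahalanobis distance reduces to the Euclidean distance
print(mahalanobis2_2d([3, 4], [0, 0], [[1, 0], [0, 1]]))  # 25.0
```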
• Unlike the Gaussian, Student's t-distribution is robust to outliers:
Figure 2.16 Illustration of the robustness of Student’s t-
distribution compared to a Gaussian.
(a) Histogram distribution of 30 data points drawn
from a Gaussian distribution, together with the
maximum likelihood fit obtained from a t-distribution
(red curve) and a Gaussian (green curve, largely
hidden by the red curve). Because the t-distribution
contains the Gaussian as a special case it gives
almost the same solution as the Gaussian.
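The robustness in Figure 2.16 comes from the t-distribution's heavy tails: it assigns far more density to points several standard deviations out, so a few outliers pull its ML fit much less. A sketch comparing the two densities at an outlying point (formulas from the standard definitions; parameter values are ours):

```python
from math import gamma, pi, sqrt, exp

def gauss_pdf(x, mu=0.0, sigma=1.0):
    """N(x | mu, sigma^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def student_t_pdf(x, nu, mu=0.0, lam=1.0):
    """Student's t density with nu degrees of freedom, mean mu, precision lam."""
    c = gamma((nu + 1) / 2) / gamma(nu / 2) * sqrt(lam / (pi * nu))
    return c * (1 + lam * (x - mu) ** 2 / nu) ** (-(nu + 1) / 2)

# An outlier at x = 5 is orders of magnitude more plausible under the heavy-tailed t
print(gauss_pdf(5.0), student_t_pdf(5.0, nu=3))
```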
• The von Mises (circular normal) distribution is
p(θ|θ₀, m) = exp{ m cos(θ − θ₀) } / (2π I₀(m))
where
m is the concentration (precision) parameter,
θ₀ is the mean,
and I₀(m) is the zeroth-order modified Bessel function of the first kind.
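The density above can be evaluated with only the standard library by summing the power series of I₀; this is a sketch, and the truncation length is our choice (adequate for moderate m).

```python
from math import cos, exp, pi

def bessel_i0(x, terms=30):
    """Modified Bessel function I_0(x) via its power series sum_k ((x/2)^2k / (k!)^2)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2) ** 2 / k ** 2
        total += term
    return total

def von_mises_pdf(theta, theta0, m):
    """p(theta | theta0, m) = exp(m cos(theta - theta0)) / (2 pi I0(m))."""
    return exp(m * cos(theta - theta0)) / (2 * pi * bessel_i0(m))

# The density peaks at theta = theta0 and is smallest at the antipode
print(von_mises_pdf(0.0, 0.0, m=2.0), von_mises_pdf(pi, 0.0, m=2.0))
```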
Mixtures (of Gaussians) (1/3)
• Data with distinct regimes are better modeled with mixtures
• A linear superposition of K Gaussians gives (2.230):
p(x) = Σk πk N(x | μk, Σk),  with mixing coefficients πk ≥ 0 and Σk πk = 1
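The mixture density and its generative view (pick a component with probability πk, then draw from that Gaussian) can be sketched in one dimension; the weights, means, and standard deviations below are hypothetical.

```python
import random
from math import exp, sqrt, pi

def gauss_pdf(x, mu, sigma):
    """N(x | mu, sigma^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def mixture_pdf(x, weights, mus, sigmas):
    """p(x) = sum_k pi_k N(x | mu_k, sigma_k^2); weights must sum to 1."""
    return sum(w * gauss_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def sample_mixture(weights, mus, sigmas, rng):
    """Ancestral sampling: pick a component k ~ pi, then draw from N(mu_k, sigma_k^2)."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(mus[k], sigmas[k])

rng = random.Random(0)
weights, mus, sigmas = [0.3, 0.7], [-2.0, 3.0], [0.5, 1.0]  # two assumed regimes
xs = [sample_mixture(weights, mus, sigmas, rng) for _ in range(5)]
print(xs, mixture_pdf(0.0, weights, mus, sigmas))
```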