06 Gaussian Distributions


Probabilistic Inference and Learning

Lecture 06
Gaussian Probability Distributions

Philipp Hennig
04 May 2021

Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
 #  date    content                       Ex |  #  date    content                       Ex
 1  20.04.  Introduction                   1 | 14  09.06.  Logistic Regression            8
 2  21.04.  Reasoning under Uncertainty      | 15  15.06.  Exponential Families
 3  27.04.  Continuous Variables           2 | 16  16.06.  Graphical Models               9
 4  28.04.  Monte Carlo                      | 17  22.06.  Factor Graphs
 5  04.05.  Markov Chain Monte Carlo       3 | 18  23.06.  The Sum-Product Algorithm     10
 6  05.05.  Gaussian Distributions           | 19  29.06.  Example: Topic Models
 7  11.05.  Parametric Regression          4 | 20  30.06.  Mixture Models                11
 8  12.05.  Understanding Deep Learning      | 21  06.07.  EM
 9  18.05.  Gaussian Processes             5 | 22  07.07.  Variational Inference         12
10  19.05.  An Example for GP Regression     | 23  13.07.  Example: Topic Models
11  25.05.  Understanding Kernels          6 | 24  14.07.  Example: Inferring Topics     13
12  26.05.  Gauss-Markov Models              | 25  20.07.  Example: Kernel Topic Models
13  08.06.  GP Classification              7 | 26  21.07.  Revision

The (univariate) Gaussian distribution
an exponentiated square

    p(x) = 1/(σ√(2π)) · exp(−(x−µ)²/(2σ²)) =: N(x; µ, σ²)

    µ    the mean of x
    σ²   the variance of x
    σ    the standard deviation of x

[figure: bell curve of p(x) over x, with µ−σ, µ, and µ+σ marked on the horizontal axis]
Univariate Gaussians
some observations, notations, and conventions

Definition

    N(x; µ, σ²) := 1/(σ√(2π)) · exp(−(x−µ)²/(2σ²))    with µ, σ ∈ R

will be called the Gaussian or normal distribution of x. We call x the argument or variable and µ, σ² the
parameters. We write x ∼ N(µ, σ²) to say that the variable x is distributed with pdf N(x; µ, σ²).

▶ ∫ N(x; µ, σ²) dx = 1 and N(x; µ, σ²) > 0 ∀x ∈ R, so N is the density of a probability measure.
▶ Symmetry in x and µ: N(x; µ, σ²) = N(µ; x, σ²)
▶ An exponential of a quadratic polynomial, with natural parameters η and τ:

    N(x; µ, σ²) = exp(a + ηx − ½ τx²)    with τ = σ⁻² ("precision"), η = σ⁻²µ,
    and normalization constant a = −½ (log(2π) − log τ + η²/τ)
Gaussian Inference
The Gaussian is its own conjugate prior

Let

    p(x)     = N(x; µ, σ²)
    p(y | x) = N(y; x, ν²)

Then

    p(x | y) = p(x) p(y | x) / ∫ p(x) p(y | x) dx = N(x; m, s²),    with

    s² := 1 / (σ⁻² + ν⁻²)
    m  := (σ⁻²µ + ν⁻²y) / (σ⁻² + ν⁻²)

[figure: prior p(x), likelihood p(y | x), and the narrower posterior p(x | y) over x]
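
As a side note (not from the slides), the update rule above is two lines of arithmetic; a minimal numpy sketch, with illustrative function name and values:

    import numpy as np

    def gaussian_posterior(mu, sigma2, y, nu2):
        """Posterior N(m, s2) for prior N(mu, sigma2) and likelihood N(y; x, nu2)."""
        s2 = 1.0 / (1.0 / sigma2 + 1.0 / nu2)   # precisions add
        m = s2 * (mu / sigma2 + y / nu2)        # precision-weighted mean
        return m, s2

    m, s2 = gaussian_posterior(mu=2.0, sigma2=1.0, y=4.0, nu2=0.5)
    print(m, s2)  # the posterior mean lies between prior mean and observation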
Gaussian Inference
Least-Squares Estimation

    p(x)     = N(x; µ, σ²)
    p(y | x) = ∏ᵢ₌₁ᴺ N(yᵢ; x, νᵢ²)

    p(x | y) = p(x) p(y | x) / ∫ p(x) p(y | x) dx = N(x; m, s²),    with

    s⁻²   := σ⁻² + ∑ᵢ₌₁ᴺ νᵢ⁻²
    s⁻² m := σ⁻²µ + ∑ᵢ₌₁ᴺ νᵢ⁻² yᵢ

If σ⁻² → 0 and νᵢ = ν ∀i, then m is the arithmetic mean of the observations yᵢ.

[figure: prior and posterior over x given several observations yᵢ]
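
As a numeric illustration (our own, not the lecture's): fusing N noisy observations by adding precisions; with a near-flat prior and equal noise, the posterior mean reduces to the arithmetic mean. Names and values are illustrative:

    import numpy as np

    def fuse(mu, sigma2, ys, nus2):
        """Posterior mean/variance for observations ys with noise variances nus2."""
        prec = 1.0 / sigma2 + np.sum(1.0 / nus2)        # s^{-2}
        m = (mu / sigma2 + np.sum(ys / nus2)) / prec    # precision-weighted mean
        return m, 1.0 / prec

    ys = np.array([3.1, 2.9, 3.4])
    m, s2 = fuse(mu=0.0, sigma2=1e12, ys=ys, nus2=np.full(3, 0.25))  # near-flat prior
    print(m, ys.mean())  # ≈ identical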
The Method of Least Squares
The Gaussian is the unique error distribution for which the most probable value of the location is the
arithmetic mean of the measurements. [image: portrait of C.F. Gauss by C.A. Jensen, 1840]
The Multivariate Gaussian distribution
An exponentiated quadratic form

Definition (multivariate Gaussian distribution)

    N(x; µ, Σ) = 1/((2π)^(n/2) |Σ|^(1/2)) · exp(−½ (x−µ)ᵀ Σ⁻¹ (x−µ))    x, µ ∈ Rⁿ, Σ ∈ Rⁿˣⁿ spd

Σ must be symmetric positive definite (spd).

Definition (symmetric positive definite matrix)

A matrix A ∈ Rⁿˣⁿ is called symmetric positive definite if A = Aᵀ and

    vᵀAv > 0    ∀v ∈ Rⁿ \ {0},

and symmetric positive semi-definite if vᵀAv ≥ 0 for all v ∈ Rⁿ. Equivalent statement: all eigenvalues
of the symmetric matrix A are positive (non-negative in the semi-definite case).
The Multivariate Gaussian distribution
Equiprobability lines are ellipsoids

    N(x; µ, Σ) = 1/((2π)^(n/2) |Σ|^(1/2)) · exp(−½ (x−µ)ᵀ Σ⁻¹ (x−µ))    x, µ ∈ Rⁿ, Σ ∈ Rⁿˣⁿ spd

▶ ∫ N(x; µ, Σ) dx = 1 and N(x; µ, Σ) > 0 ∀x ∈ Rⁿ.
▶ Symmetry in x and µ: N(x; µ, Σ) = N(µ; x, Σ)
▶ An exponential of a quadratic polynomial:

    N(x; µ, Σ) = exp(a + ηᵀx − ½ xᵀΛx)           (1)
               = exp(a + ηᵀx − ½ tr(xxᵀΛ))        (2)

  with the natural parameters Λ = Σ⁻¹ (the precision matrix) and η = Λµ, and the sufficient statistics
  x and xxᵀ.

[figure: elliptical equiprobability contours of N(x; µ, Σ) in the (x₁, x₂) plane, centred at (µ₁, µ₂)]
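
Converting between the moment parameters (µ, Σ) and the natural parameters (η, Λ) is one matrix inverse and one matrix-vector product; a small sketch with illustrative values:

    import numpy as np

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])

    Lam = np.linalg.inv(Sigma)            # precision matrix Λ = Σ⁻¹
    eta = Lam @ mu                        # η = Λµ

    mu_back = np.linalg.solve(Lam, eta)   # back to µ = Λ⁻¹η
    print(np.allclose(mu, mu_back))       # True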
Products of Gaussians are Gaussians
Closure under multiplication

To multiply Gaussians, add the natural parameters:

    N(x; a, A) · N(x; b, B) = N(x; c, C) · Z
    C = (A⁻¹ + B⁻¹)⁻¹
    c = C(A⁻¹a + B⁻¹b)
    Z = N(a; b, A + B)

Note the similarity to the univariate case.

[figure: contours of the two Gaussian factors and their (rescaled) product in the (x₁, x₂) plane]
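
A direct transcription of these formulas in numpy (our own sketch; the normalization constant Z is evaluated with scipy's density):

    import numpy as np
    from scipy.stats import multivariate_normal

    def gaussian_product(a, A, b, B):
        """N(x; a, A) · N(x; b, B) = N(x; c, C) · Z."""
        C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))    # add precisions
        c = C @ (np.linalg.solve(A, a) + np.linalg.solve(B, b))   # add η's
        Z = multivariate_normal.pdf(a, mean=b, cov=A + B)         # Z = N(a; b, A + B)
        return c, C, Z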
Linear Projections of Gaussians are Gaussians
Closure under linear maps

To linearly project a Gaussian variable, project the parameters:

    p(z) = N(z; µ, Σ)    ⇒    p(Az) = N(Az; Aµ, AΣAᵀ)

[figure: a two-dimensional Gaussian in the (x₁, x₂) plane and its one-dimensional projection]
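
A quick Monte Carlo check of the projection rule (our own; the projection matrix A is an arbitrary illustrative choice):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, 2.0])
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])
    A = np.array([[1.0, -1.0]])                    # project onto x₁ − x₂

    z = rng.multivariate_normal(mu, Sigma, size=100_000)
    proj = z @ A.T                                 # samples of Az

    print(proj.mean(), (A @ mu).item())            # ≈ Aµ
    print(proj.var(), (A @ Sigma @ A.T).item())    # ≈ AΣAᵀ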
Marginals of Gaussians are Gaussians
Closure under marginalization

    p(z) = N(z; µ, Σ)    ⇒    p(Az) = N(Az; Aµ, AΣAᵀ)

Choose A = (I  0):

    ∫ N([x; y]; [µx; µy], [[Σxx, Σxy], [Σyx, Σyy]]) dy = N(x; µx, Σxx)

▶ this is the sum rule:

    ∫ p(x, y) dy = ∫ p(y | x) p(x) dy = p(x)

▶ so every finite-dimensional Gaussian is a marginal of infinitely many larger ones

[figure: a two-dimensional Gaussian and its marginal along x₁]
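
In code, marginalization is just dropping rows and columns (illustrative values):

    import numpy as np

    mu = np.array([0.5, -1.0, 2.0])          # joint over (x, y₁, y₂)
    Sigma = np.array([[1.0, 0.3, 0.1],
                      [0.3, 2.0, 0.4],
                      [0.1, 0.4, 1.5]])

    ix = [0]                                 # indices belonging to x
    mu_x = mu[ix]                            # µx
    Sigma_xx = Sigma[np.ix_(ix, ix)]         # Σxx
    print(mu_x, Sigma_xx)                    # parameters of the marginal p(x)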
Cuts through Gaussians are Gaussians
Closure under conditioning

    p(x | Ax = y) = p(x, y) / p(y)
                  = N(x; µ + ΣAᵀ(AΣAᵀ)⁻¹(y − Aµ), Σ − ΣAᵀ(AΣAᵀ)⁻¹AΣ)

▶ this is the product rule
▶ so Gaussians are closed under the rules of probability

[figure: a two-dimensional Gaussian and the one-dimensional Gaussian obtained by cutting along a line]
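
The conditioning formula, transcribed directly (a sketch; in practice one would use a Cholesky solve rather than an explicit inverse):

    import numpy as np

    def condition(mu, Sigma, A, y):
        """Parameters of p(x | Ax = y) for x ~ N(mu, Sigma)."""
        S = A @ Sigma @ A.T                   # AΣAᵀ
        G = Sigma @ A.T @ np.linalg.inv(S)    # gain ΣAᵀ(AΣAᵀ)⁻¹
        m = mu + G @ (y - A @ mu)             # conditional mean
        C = Sigma - G @ A @ Sigma             # conditional covariance
        return m, C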
Inference with Gaussians
Since conditioning and marginalization reduce to linear algebra, so does Bayes' theorem

Theorem

If    p(x)     = N(x; µ, Σ)
and   p(y | x) = N(y; Ax + b, Λ),
then  p(y)     = N(y; Aµ + b, Λ + AΣAᵀ)
and   p(x | y) = N(x; µ + ΣAᵀ(AΣAᵀ + Λ)⁻¹(y − (Aµ + b)), Σ − ΣAᵀ(AΣAᵀ + Λ)⁻¹AΣ)
              = N(x; (Σ⁻¹ + AᵀΛ⁻¹A)⁻¹(AᵀΛ⁻¹(y − b) + Σ⁻¹µ), (Σ⁻¹ + AᵀΛ⁻¹A)⁻¹)

Here ΣAᵀ(AΣAᵀ + Λ)⁻¹ is the gain, AΣAᵀ + Λ the Gram matrix, y − (Aµ + b) the residual, and
Σ⁻¹ + AᵀΛ⁻¹A the posterior precision matrix.
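
The covariance form of the theorem in numpy (a sketch; names are illustrative):

    import numpy as np

    def linear_gaussian_posterior(mu, Sigma, A, b, Lam, y):
        """Posterior of x for p(x) = N(mu, Sigma), p(y|x) = N(Ax + b, Lam)."""
        Gram = A @ Sigma @ A.T + Lam               # AΣAᵀ + Λ
        gain = Sigma @ A.T @ np.linalg.inv(Gram)   # ΣAᵀ(AΣAᵀ + Λ)⁻¹
        m = mu + gain @ (y - (A @ mu + b))         # mean update with the residual
        C = Sigma - gain @ A @ Sigma               # covariance shrinks
        return m, C

The covariance form inverts a matrix of the size of y; the precision form of the theorem inverts matrices of the size of x. Which is cheaper depends on which dimension is smaller.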
The Core Insight for All of This
Gaussian inference is linear algebra at its core [image: portrait of Issai Schur; photo: Konrad Jacobs]

    A = [[P, Q],
         [R, S]],        M := (S − RP⁻¹Q)⁻¹

    A⁻¹ = [[P⁻¹ + P⁻¹QMRP⁻¹,   −P⁻¹QM],
           [−MRP⁻¹,             M     ]]

    (Z + UWVᵀ)⁻¹ = Z⁻¹ − Z⁻¹U(W⁻¹ + VᵀZ⁻¹U)⁻¹VᵀZ⁻¹
    |Z + UWVᵀ|   = |Z| · |W| · |W⁻¹ + VᵀZ⁻¹U|

Issai Schur (1875–1941)
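
The matrix inversion lemma is easy to verify numerically (random illustrative matrices):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 5, 2
    Z = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    Z = Z @ Z.T                                   # make Z symmetric positive definite
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((n, k))
    W = np.eye(k)

    lhs = np.linalg.inv(Z + U @ W @ V.T)
    Zi = np.linalg.inv(Z)
    rhs = Zi - Zi @ U @ np.linalg.inv(np.linalg.inv(W) + V.T @ Zi @ U) @ V.T @ Zi
    print(np.allclose(lhs, rhs))                  # True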
Gaussians provide the linear algebra of inference
if all joints are Gaussian and all observations are linear, all posteriors are Gaussian

▶ products of Gaussians are Gaussians:

    N(x; a, A) · N(x; b, B) = N(x; c, C) · N(a; b, A + B)
    C := (A⁻¹ + B⁻¹)⁻¹,    c := C(A⁻¹a + B⁻¹b)

▶ linear projections of Gaussians are Gaussians:

    p(z) = N(z; µ, Σ)    ⇒    p(Az) = N(Az; Aµ, AΣAᵀ)

▶ marginals of Gaussians are Gaussians:

    ∫ N([x; y]; [µx; µy], [[Σxx, Σxy], [Σyx, Σyy]]) dy = N(x; µx, Σxx)

▶ (linear) conditionals of Gaussians are Gaussians:

    p(x | y) = p(x, y) / p(y) = N(x; µx + ΣxyΣyy⁻¹(y − µy), Σxx − ΣxyΣyy⁻¹Σyx)

Bayesian inference becomes linear algebra: if p(x) = N(x; µ, Σ) and p(y | x) = N(y; Aᵀx + b, Λ), then

    p(Bᵀx + c | y) = N(Bᵀx + c; Bᵀµ + c + BᵀΣA(AᵀΣA + Λ)⁻¹(y − Aᵀµ − b),
                        BᵀΣB − BᵀΣA(AᵀΣA + Λ)⁻¹AᵀΣB)
Example 1: Conditional Independence, Marginal Correlation
Bayesian Inference with Gaussians [DJC MacKay, The humble Gaussian distribution, 2006]

[graphical model: x₂ (temperature outside) is the parent of x₁ (temperature in building 1) and x₃
(temperature in building 2)]

    x₂ = ν₂              p(ν₂) = N(ν₂; µ₂, σ₂²)
    x₁ = w₁x₂ + ν₁       p(ν₁) = N(ν₁; µ₁, σ₁²)
    x₃ = w₃x₂ + ν₃       p(ν₃) = N(ν₃; µ₃, σ₃²)

With p(ν) = N(ν; µ, diag(σ²)), the model reads x = Aν with

    A = [[1, w₁, 0],
         [0, 1,  0],
         [0, w₃, 1]]

so that

    p(x = Aν) = N(x; Aµ, A diag(σ²) Aᵀ) =: N(x; m, Σ),

    Σ = [[w₁²σ₂² + σ₁²,   w₁σ₂²,    w₁w₃σ₂²      ],
         [w₁σ₂²,           σ₂²,      w₃σ₂²        ],
         [w₁w₃σ₂²,         w₃σ₂²,    w₃²σ₂² + σ₃² ]]

From the graph: x₁ ⊥⊥ x₃ | x₂. Where can we see this in the pdf? (Not in Σ: no entry is zero, so x₁ and
x₃ are marginally correlated.)
Example 1: Conditional Independence, Marginal Correlation
A zero in the precision matrix means independence conditional on everything else [DJC MacKay, The humble Gaussian distribution, 2006]

    x₂ = ν₂              p(ν₂) = N(ν₂; µ₂, σ₂²)
    x₁ = w₁x₂ + ν₁       p(ν₁) = N(ν₁; µ₁, σ₁²)
    x₃ = w₃x₂ + ν₃       p(ν₃) = N(ν₃; µ₃, σ₃²)

To simplify the exposition, set µ = 0. Then

    p(x₁, x₂, x₃) = p(x₂) · p(x₁ | x₂) · p(x₃ | x₂)
                  = 1/(Z₁Z₂Z₃) · exp(−½ [x₂²/σ₂² + (x₁ − w₁x₂)²/σ₁² + (x₃ − w₃x₂)²/σ₃²])
                  = 1/(Z₁Z₂Z₃) · exp(−½ [x₂²(1/σ₂² + w₁²/σ₁² + w₃²/σ₃²) + x₁²/σ₁² − 2x₁x₂ w₁/σ₁²
                                         + x₃²/σ₃² − 2x₃x₂ w₃/σ₃²])
                  = 1/(Z₁Z₂Z₃) · exp(−½ (x₁, x₂, x₃) Λ (x₁, x₂, x₃)ᵀ)    with

    Λ = [[1/σ₁²,     −w₁/σ₁²,                       0        ],
         [−w₁/σ₁²,   1/σ₂² + w₁²/σ₁² + w₃²/σ₃²,     −w₃/σ₃²  ],
         [0,         −w₃/σ₃²,                       1/σ₃²    ]]

The zero at Λ₁₃ is exactly the conditional independence x₁ ⊥⊥ x₃ | x₂.
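
Numerically (with illustrative weights and variances), the covariance is dense but the precision has the zero:

    import numpy as np

    w1, w3 = 0.9, 1.1
    s1, s2, s3 = 0.5, 2.0, 0.7               # σ₁², σ₂², σ₃²

    A = np.array([[1.0, w1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, w3, 1.0]])
    Sigma = A @ np.diag([s1, s2, s3]) @ A.T  # covariance of x = Aν

    Lam = np.linalg.inv(Sigma)               # precision matrix
    print(Sigma[0, 2])                       # w1*w3*s2 ≠ 0: marginally correlated
    print(Lam[0, 2])                         # ≈ 0: x₁ ⊥⊥ x₃ | x₂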
Example 2: Explaining away
Bayesian Inference with Gaussians [DJC MacKay, The humble Gaussian distribution, 2006]

[graphical model: x₁ (gas price) and x₃ (emission price) are both parents of x₂ (electricity price)]

    x₁ = ν₁                     p(ν₁) = N(ν₁; µ₁, σ₁²)
    x₃ = ν₃                     p(ν₃) = N(ν₃; µ₃, σ₃²)
    x₂ = w₁x₁ + w₃x₃ + ν₂       p(ν₂) = N(ν₂; µ₂, σ₂²)

The joint is p(x) = N(x; m, Σ) with

    Σ = [[σ₁²,     w₁σ₁²,                      0     ],
         [w₁σ₁²,   σ₂² + w₁²σ₁² + w₃²σ₃²,      w₃σ₃² ],
         [0,       w₃σ₃²,                      σ₃²   ]]

In particular, the marginal of the two parents factorizes:

    p(x₁, x₃) = N([x₁; x₃]; [µ₁; µ₃], [[σ₁², 0], [0, σ₃²]])
Example 2: Explaining away
A ± value in the precision matrix implies ∓ correlation conditional on everything else [DJC MacKay, The humble Gaussian distribution, 2006]

    x₁ = ν₁                     p(ν₁) = N(ν₁; µ₁, σ₁²)
    x₃ = ν₃                     p(ν₃) = N(ν₃; µ₃, σ₃²)
    x₂ = w₁x₁ + w₃x₃ + ν₂       p(ν₂) = N(ν₂; µ₂, σ₂²)

    p(x₁, x₂, x₃) = p(x₁) · p(x₃) · p(x₂ | x₁, x₃)
                  = 1/(Z₁Z₂Z₃) · exp(−½ [x₁²/σ₁² + x₃²/σ₃² + (x₂ − w₁x₁ − w₃x₃)²/σ₂²])
                  = 1/(Z₁Z₂Z₃) · exp(−½ [x₁²(1/σ₁² + w₁²/σ₂²) + x₂²/σ₂² − 2x₁x₂ w₁/σ₂²
                                         + x₃²(1/σ₃² + w₃²/σ₂²) − 2x₂x₃ w₃/σ₂² + 2x₁x₃ w₁w₃/σ₂²])
                  = 1/(Z₁Z₂Z₃) · exp(−½ (x₁, x₂, x₃) Λ (x₁, x₂, x₃)ᵀ)    with

    Λ = [[1/σ₁² + w₁²/σ₂²,   −w₁/σ₂²,   w₁w₃/σ₂²        ],
         [−w₁/σ₂²,            1/σ₂²,     −w₃/σ₂²         ],
         [w₁w₃/σ₂²,           −w₃/σ₂²,   1/σ₃² + w₃²/σ₂² ]]

Σ₁₃ = 0, but Λ₁₃ = w₁w₃/σ₂² ≠ 0: the parents are marginally independent, yet become (for positive
weights, negatively) correlated once x₂ is observed: explaining away.
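
Numerically (illustrative values): the covariance has the zero, the precision does not:

    import numpy as np

    w1, w3 = 1.0, 0.8
    s1, s2, s3 = 1.0, 0.25, 1.0        # σ₁², σ₂², σ₃²

    Sigma = np.array([[s1,       w1 * s1,                      0.0    ],
                      [w1 * s1,  s2 + w1**2 * s1 + w3**2 * s3, w3 * s3],
                      [0.0,      w3 * s3,                      s3     ]])

    Lam = np.linalg.inv(Sigma)
    print(Sigma[0, 2])   # 0: x₁ and x₃ marginally independent
    print(Lam[0, 2])     # = w1*w3/s2 > 0: negatively correlated given x₂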
Example 2: Explaining away
Bayesian Inference with Gaussians [DJC MacKay, The humble Gaussian distribution, 2006]

[figure: contours of p(x₁, x₃) over x₁ (gas price, USD/MMBtu) and x₃ (emission price, EUR/t), before
and after conditioning on the electricity price x₂]

    p(x₁, x₃)      = N([x₁; x₃]; [µ₁; µ₃], [[σ₁², 0], [0, σ₃²]])
    p(x₂)          = N(x₂; w₁µ₁ + w₃µ₃ + µ₂, σ₂² + w₁²σ₁² + w₃²σ₃²)
    p(x₂ | x₁, x₃) = N(x₂; w₁x₁ + w₃x₃ + µ₂, σ₂²)

Conditioning on x₂, with w := (w₁, w₃), µ₁,₃ := (µ₁, µ₃) and Σ₁,₃ := diag(σ₁², σ₃²):

    p(x₁, x₃ | x₂) = N(x₁,₃; µ₁,₃ + Σ₁,₃wᵀ (x₂ − wµ₁,₃ − µ₂)/(wΣ₁,₃wᵀ + σ₂²),
                         Σ₁,₃ − Σ₁,₃wᵀ (wΣ₁,₃wᵀ + σ₂²)⁻¹ wΣ₁,₃)
                   = N([x₁; x₃]; [µ₁; µ₃] + [w₁σ₁²; w₃σ₃²] (x₂ − w₁µ₁ − w₃µ₃ − µ₂)/(w₁²σ₁² + w₃²σ₃² + σ₂²),
                         [[σ₁², 0], [0, σ₃²]] − [w₁σ₁²; w₃σ₃²][w₁σ₁², w₃σ₃²]/(w₁²σ₁² + w₃²σ₃² + σ₂²))

The posterior covariance acquires the negative off-diagonal entry −w₁w₃σ₁²σ₃²/(w₁²σ₁² + w₃²σ₃² + σ₂²):
once the electricity price is known, a higher gas price makes a lower emission price more plausible.
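
The same computation in numpy (illustrative prices and weights; w is written as a 1×2 matrix so the general conditioning formula applies directly):

    import numpy as np

    w = np.array([[1.0, 0.8]])                 # (w₁, w₃)
    mu13 = np.array([3.0, 15.0])               # prior means of gas / emission price
    S13 = np.diag([1.0, 4.0])                  # diag(σ₁², σ₃²)
    mu2, s2 = 0.0, 0.25                        # likelihood offset and noise variance σ₂²

    x2 = np.array([7.0])                       # observed electricity price
    gram = w @ S13 @ w.T + s2                  # wΣ₁,₃wᵀ + σ₂²
    gain = S13 @ w.T / gram                    # Σ₁,₃wᵀ / (wΣ₁,₃wᵀ + σ₂²)
    m = mu13 + (gain * (x2 - w @ mu13 - mu2)).ravel()
    C = S13 - gain @ w @ S13
    print(m)
    print(C)   # off-diagonal < 0: explaining away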
 
    N(x; µ, Σ) = 1/((2π)^(n/2) |Σ|^(1/2)) · exp(−½ (x−µ)ᵀ Σ⁻¹ (x−µ))

Today:
▶ Gaussian distributions provide the linear algebra of inference:
  ▶ products of Gaussians are Gaussians
  ▶ linear maps of Gaussian variables are Gaussian variables
  ▶ marginals of Gaussians are Gaussians
  ▶ linear conditionals of Gaussians are Gaussians

If all variables in a generative model are linearly related and the distributions of the parent variables
are Gaussian, then all conditionals, joints and marginals are Gaussian, with means and covariances
computable by linear algebra operations.

▶ A zero off-diagonal element in the covariance matrix implies independence once all other variables
  are integrated out:

    [Σ]ᵢⱼ = 0  ⇒  p(xᵢ, xⱼ) = N(xᵢ; [µ]ᵢ, [Σ]ᵢᵢ) · N(xⱼ; [µ]ⱼ, [Σ]ⱼⱼ)

▶ A zero off-diagonal element in the precision matrix implies independence conditional on all other
  variables:

    [Σ⁻¹]ᵢⱼ = 0  ⇒  p(xᵢ, xⱼ | x≠ᵢ,ⱼ) = p(xᵢ | x≠ᵢ,ⱼ) · p(xⱼ | x≠ᵢ,ⱼ)
The Toolbox

Framework:

    ∫ p(x₁, x₂) dx₂ = p(x₁)        p(x₁, x₂) = p(x₁ | x₂) p(x₂)        p(x | y) = p(y | x) p(x) / p(y)

Modelling:                               Computation:
▶ Directed Graphical Models              ▶ Monte Carlo
▶ Gaussian Distributions                 ▶ Linear algebra / Gaussian inference
