Tut6 Questions
Reminder: If you need more guidance to get started on a question, seek clarifications and
hints on the class forum. Move on if you’re getting stuck on a part for a long time. Full
answers will be released after the last group meets.
$x^{(n)} \sim \mathcal{N}(m, \sigma^2)$.
a) I don’t give you the raw data, { x (n) }, but tell you the mean of the observations:
$$\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}.$$
What is the likelihood¹ of m given only this mean $\bar{x}$? That is, what is $p(\bar{x} \mid m)$?²
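Without deriving the answer, you can sanity-check the mean and variance you identify numerically. A minimal sketch (my own illustration, not part of the tutorial; the values m = 1.5, σ = 2, N = 10 are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, N = 1.5, 2.0, 10       # assumed values for illustration
trials = 100_000

# Draw `trials` datasets of N observations each and compute each dataset's mean.
xbar = rng.normal(m, sigma, size=(trials, N)).mean(axis=1)

# The empirical moments of x-bar should match the mean and variance
# you identified for p(xbar | m).
print(xbar.mean())               # close to m = 1.5
print(xbar.var())                # close to sigma^2 / N = 0.4
```

Footnote 2 tells you the exact distribution is Gaussian; the simulation only checks the two moments you need to identify.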
b) A sufficient statistic is a summary of some data that contains all of the information
about a parameter.
ii) If we don’t know the noise variance $\sigma^2$ or the mean, is $\bar{x}$ still a sufficient
statistic in the sense that $p(m \mid \bar{x}) = p(m \mid \{x^{(n)}\}_{n=1}^{N})$? Explain your reasoning.
2. Conjugate priors: (This question sets up some intuitions about the larger picture for
Bayesian methods. But if you’re finding the course difficult, look at Q3 first.)
A conjugate prior for a likelihood function is a prior where the posterior is a distribution
in the same family as the prior. For example, a Gaussian prior on the mean of a
Gaussian distribution is conjugate to Gaussian observations of that mean.
1. I’m using the traditional statistics usage of the word “likelihood”: it’s a function of parameters given data, equal
to the probability of the data given the parameters. Personally I avoid saying “likelihood of the data” (cf. p. 29 of
MacKay’s textbook), although you’ll see that usage too.
2. The sum of Gaussian outcomes is Gaussian distributed; you only need to identify a mean and variance.
3. Numerical libraries often come with a gammaln or lgamma function to evaluate the log of the gamma function.
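As a concrete instance of the Gaussian–Gaussian example above, the posterior update can be written in closed form and checked numerically. A minimal sketch (the prior $\mathcal{N}(0, 1)$, noise σ = 2, and true mean 1.5 are assumed for illustration; this is not the course’s reference code):

```python
import numpy as np

rng = np.random.default_rng(1)
m0, s0 = 0.0, 1.0        # assumed Gaussian prior on the mean m: N(m0, s0^2)
sigma, N = 2.0, 10       # assumed known noise std and number of observations

x = rng.normal(1.5, sigma, size=N)   # data drawn with an assumed true mean of 1.5
xbar = x.mean()

# Conjugacy: the posterior over m is Gaussian again, with
# posterior precision = prior precision + N * (likelihood precision).
post_var = 1.0 / (1.0/s0**2 + N/sigma**2)
post_mean = post_var * (m0/s0**2 + N*xbar/sigma**2)
print(post_mean, np.sqrt(post_var))
```

Because the posterior is Gaussian again, repeating the update with more data keeps you in the same family, which is the defining property of conjugacy.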
b) i) If a conjugate prior exists, then the data can be replaced with sufficient
statistics. Can you explain why?
ii) Explain whether there could be a conjugate prior for the hard classifier:
$$P(y = 1 \mid \mathbf{x}, \mathbf{w}) = \Theta(\mathbf{w}^\top \mathbf{x} + b) = \begin{cases} 1 & \mathbf{w}^\top \mathbf{x} + b > 0 \\ 0 & \text{otherwise.} \end{cases}$$
where h is a vector of hidden unit values. These could be hidden units from the neural
network used to compute the function f(x; θ), or there could be a separate network to
model the variances.
a) Assume that h is the final layer of the same neural network used to compute f .
How could we modify the training procedure for a neural network that fits f by
least squares, to fit this new model?
b) In the suggestion above, the activation $a^{(\sigma)} = \mathbf{w}^{(\sigma)\top} \mathbf{h} + b^{(\sigma)}$ sets the log of the
variance of the observations.
i) Why not set the variance directly to this activation value, $\sigma^2 = a^{(\sigma)}$?
ii) Harder (I don’t know if you’ll have an answer, but I’m curious to find out):
Why not set the variance to the square of this activation value, $\sigma^2 = (a^{(\sigma)})^2$?
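Under a Gaussian observation model, the log-variance parametrisation in part b) corresponds to minimising a negative log-likelihood rather than plain squared error. A minimal sketch of that per-datapoint loss (the function name is my own; constant terms are dropped):

```python
import numpy as np

def gaussian_nll(y, f, a_sigma):
    """Per-datapoint negative log-likelihood of y under N(f, sigma^2),
    where the activation a_sigma sets the log variance: log sigma^2 = a_sigma.
    The additive constant 0.5*log(2*pi) is dropped."""
    log_var = a_sigma                  # exponentiating keeps sigma^2 positive
    return 0.5 * (log_var + (y - f)**2 * np.exp(-log_var))

# With a_sigma = 0 the variance is 1 and the loss reduces to half squared error:
print(gaussian_nll(2.0, 0.0, 0.0))    # 0.5 * (2 - 0)^2 = 2.0
```

Note how setting $a^{(\sigma)} = 0$ recovers (half) the usual least-squares loss, so the model strictly generalises the standard fit.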
c) Given a test input $\mathbf{x}^{(*)}$, the model above outputs both a guess of an output, $f(\mathbf{x}^{(*)})$,
and an ‘error bar’ $\sigma(\mathbf{x}^{(*)})$, which indicates how wrong the guess could be.
The Bayesian linear regression and Gaussian process models covered in lectures
also give error bars on their predictions. What are the pros and cons of the neural
network approach in this question? Would you use this neural network to help
guide which experiments to run?