Fuskpaper Bayes

Bertil Wegmann
Dept of Computer and Information Science

Linköping University
Bayesian Learning
Computer Lab 1
You are recommended to use R for solving the labs.
You work and submit your labs in pairs, but both of you should contribute equally
and understand all parts of your solutions.
It is not allowed to share exact solutions with other student pairs.
The submitted lab reports will be veried through URKUND and indications of
plagiarism will be investigated by the Disciplinary Board.
Submit your solutions via LISAM, no later than April 19 at 23:30.
Please note the following about the format of the submitted lab report:
1. The lab report should include all solutions and plots to the stated problems with
necessary comments.
2. Submit the lab report with your code attached to the solution of each sub-problem
(1a), 1b),...) in one PDF document.
3. Submit a separate le containing all code.
1. Daniel Bernoulli
Let y1 , ..., yn |θ ∼ Bern(θ), and assume that you have obtained a sample with s = 13
successes in n = 50 trials. Assume a Beta(α0 , β0 ) prior for θ and let α0 = β0 = 5.
(a) Draw random numbers from the posterior θ|y ∼ Beta(α0 + s, β0 + f ), where
y = (y1 , . . . , yn ), and verify graphically that the posterior mean E [θ|y] and
standard deviation SD [θ|y] converges to the true values as the number of
random draws grows large. [Hint: use rbeta()].
(b) Use simulation (nDraws = 10000) to compute the posterior probability
Pr(θ < 0.3|y) and compare with the exact value from the Beta posterior. [Hint:
use pbeta()].
(c) Simulate draws from the posterior distribution of the log-odds φ = log 1−θ θ
by
using simulated draws from the Beta posterior for θ and plot the posterior
distribution of φ (nDraws = 10000). [Hint: hist() and density() can be
utilized].
2. Log-normal distribution and the Gini coecient.
Assume that you have asked 9 randomly selected persons about their monthly in-
come (in thousands Swedish Krona) and obtained the following nine observations:
33, 24, 48, 32, 55, 74, 23, 76 and 17. A common model for non-negative continuous
variables is the log-normal distribution. The log-normal distribution log N (µ, σ 2 )
has density function

2 1 1 2
p(y|µ, σ ) = √ exp − 2 (log y − µ) ,
y · 2πσ 2 2σ
1
where y > 0, µ > 0 and σ 2 > 0. The log-normal distribution is related to the
normal distribution as follows: if y ∼ log N (µ, σ 2 ) then log y ∼ N (µ, σ 2 ). Let
y1 , ..., yn |µ, σ 2 ∼ log N (µ, σ 2 ), where µ = 3.5 is assumed to be known but σ 2 is
iid
unknown with non-informative prior p(σ 2 ) ∝ 1/σ 2 . The posterior for σ 2 is the
Inv − χ2 (n, τ 2 ) distribution, where
Pn
2 i=1 (log yi − µ)2
τ = .
n
(a) Simulate 10, 000 draws from the posterior of σ 2 by assuming µ = 3.5 and plot
the posterior distribution.
(b) The most common measure of income inequality is the Gini coecient, G,
where 0 ≤ G ≤ 1. G = 0 means a completely equal income distribution,
whereas G = 1 means complete income inequality (see e.g. Wikipedia for√more
information about the Gini coecient). It can be shown that G = 2Φ σ/ 2 −

1 when incomes follow a log N (µ, σ 2 ) distribution. Φ(z) is the cumulative

distribution function (CDF) for the standard normal distribution with mean
zero and unit variance. Use the posterior draws in a) to compute the posterior
distribution of the Gini coecient G for the current data set.
(c) Use the posterior draws from b) to compute a 95% equal tail credible interval
for G. A 95% equal tail interval (a, b) cuts o 2.5% percent of the posterior
probability mass to the left of a, and 2.5% to the right of b.
(d) Use the posterior draws from b) to compute a 95% Highest Posterior Density
Interval (HPDI) for G. Compare the two intervals in (c) and (d). [Hint: do
a kernel density estimate of the posterior of G using the density function in
R with default settings, and use that kernel density estimate to compute the
HPDI. Note that you need to order/sort the estimated density values to obtain
the HPDI.].
3. Bayesian inference for the concentration parameter in the von Mises distribution.
This exercise is concerned with directional data. The point is to show you that
the posterior distribution for somewhat weird models can be obtained by plotting
it over a grid of values. The data points are observed wind directions at a given
location on ten dierent days. The data are recorded in degrees:
(285, 296, 314, 20, 299, 296, 40, 303, 326, 308),
where North is located at zero degrees (see Figure 1 on the next page, where the
angles are measured clockwise). To t with Wikipedias description of probability
distributions for circular data we convert the data into radians −π ≤ y ≤ π . The
10 observations in radians are
(1.83, 2.02, 2.33, −2.79, 2.07, 2.02, −2.44, 2.14, 2.54, 2.23).
Assume that these data points conditional on (µ, κ) are independent observations
from the following von Mises distribution:
exp [κ · cos(y − µ)]
p(y|µ, κ) = , −π ≤ y ≤ π,
2πI0 (κ)
where I0 (κ) is the modied Bessel function of the rst kind of order zero [see
?besselI in R]. The parameter µ (−π ≤ µ ≤ π ) is the mean direction and κ > 0 is
2
called the concentration parameter. Large κ gives a small variance around µ, and
vice versa. Assume that µ is known to be 2.51. Let κ ∼ Exponential(λ = 1) a
priori, where λ is the rate parameter of the exponential distribution (so that the
mean is 1/λ).
(a) Derive the expression for what the posterior p(κ|y, µ) is proportional to. Hence,
derive the function f (κ) such that p(κ|y, µ) ∝ f (κ). Then, plot the posterior
distribution of κ for the wind direction data over a ne grid of κ values. [Hint:
you need to normalize the posterior distribution of κ so that it integrates to
one.]
(b) Find the (approximate) posterior mode of κ from the information in a).
Figure 1: The wind direction data. Angles are measured clock-wise starting from North.
Good luck!
3
Answers to lab1 bayesian learning
Isabe723, mansj125
Question 1
(a) For each of the draws the mean (E[θ|y]) and standard deviation (SD[θ|y]) were calculated and plotted. It was
then compared to the true mean and standard deviation of the posterior distribution. The plots are given
below,
It can also be seen from the graphs given above that the mean and standard deviation of our random draws
from the posterior distribution converges to the true mean and standard deviation as the number of random
draws increases.
Code:
#---------------------------------1.1a----------------------
#priors
prior_alpha = 5
prior_beta = 5
#observed data
trials = 50
sucesses = 13
fails = trials - sucesses
#posteriors for beta distr

posterior_alpha = (sucesses+prior_alpha)
posterior_beta = (fails + prior_beta)
N = 5000
drawn_mean = c()
drawn_sd = c()
for ( n in seq(1,N)){
draw = rbeta(n,prior_alpha+sucesses,prior_beta+fails)
drawn_mean = c(drawn_mean , mean(draw))
drawn_sd = c(drawn_sd , sd(draw))
}
#beta distribution
#pdf: p ∝ x^(a-1) *(1-x)^(b-1)
#mean: a / (a + b)
#variance: (a*b) / (a + b)^2 * ( a + b + 1)
#true vals
#mean / expected value of Beta-distr
true_mean = (sucesses+prior_alpha) / (trials + prior_alpha + prior_beta)
#sd of Beta distr

true_sd = sqrt((posterior_alpha*posterior_beta) /
((posterior_alpha+posterior_beta)^2 * ( posterior_alpha + posterior_beta +1)))
plot(drawn_mean, col="blue", xlab="Number of drawn samples")

abline(h=true_mean, lty = 2, lwd = 2, col ="red")
plot(drawn_sd,col="blue", xlab = "number of drawn samples")

abline(h=true_sd, lty = 2, lwd = 2, col ="red")
(b) The probability (Pr(θ< 0.3|y) calculated from the draws was found to be 0.5189
The exact probability value from the distribution was found to be 0.5150226
Code:
#----------------------------------1.1b-------------------
simulated_posterior_prob = mean(rbeta(10000, posterior_alpha,posterior_beta) <.3

) #draw from posterior distribution, and check if smaller than 0.3
exact_posterior_prob = pbeta(.3, posterior_alpha, posterior_beta) #get the exact
value using pbeta,
print(paste("simulated:",simulated_posterior_prob,"exact:",exact_posterior_prob,
sep=" "))
(c) The posterior distribution of log odds was plotted using both hist and density and is given below,
Code:
#---------------------------------1.1c------------------
log_odds <- function(p){

return(log ( p/(1-p)))
}
ndraws = 10000
posterior_theta_draws = rbeta(ndraws,posterior_alpha, posterior_beta) # again,
draw from posterior distribution
hist(posterior_logodds, freq=F, main = "posterior distribution of phi =

logodds(theta)")
lines(density(posterior_logodds))
Question 2
a)
#------------------------------2a----------------------------
#priors
mu = 3.5
#sigma unknown with prior p(s^2) ∝ 1/s^2
#data
y = c(33, 24, 48, 32, 55, 74, 23, 76 , 17)
n = length(y)
numdraws = 1E5
tau_sq = sum((log(y)-mu)^2)/n
X_draws = rchisq(numdraws,n-1)
sigma_sq = (n-1)*tau_sq / X_draws # Simulating draws from Inv−χ2(n,τ)

distribution
# Theoretical density function for Inv−χ2(n,τ) distribution

inversechisq <- function(x, v, s){
density = (((v/2)^(v/2))/gamma(v/2)) * (s^v) * (x^(-(v/2+1))) * (exp((-
v*s^2)/(2*x)))
return (density)
}
# Plotting posterior draws from sampling

plot(density(sigma_sq), lwd=3, main="Simulated and theoretical posteriors for
sigma^2")
# Plotting posterior distribution using therotical density function
lines(seq(0.1,15,by=0.1), inversechisq(seq(0.1,15,by=0.1), n, sqrt(tau_sq)), type
= "l", col = "red", lty=2, lwd=3)
legend("topright", inset=.02, c("Simulated Posterior Draws", "Theoretical
Posterior Density"), fill=c("black", "red"))
b)
#------------------------------2b-------------------------
gini = 2*pnorm(sqrt(sigma_sq)/sqrt(2))-1 #formula given in instructions

plot(density(gini),main="posterior distribution of Gini-coefficient")
c) + d)
The intevals are quite different. The tail on the upper side of the ETCI is much longer than on the lower,
thus causing 95% equal tail interval to be significantly higher than the HDPI.
#----------------------------2d--------------------------
sorted = sort(gini)
# 2.d)
# Finding density of gini distribution
dense = density(sorted)
# storing coordinate and desnity values in a dataframe
df = data.frame(x=dense$x,y=dense$y)
# sorting dataframe based on density values
df = df[order(df$y),]
# Finding cumulative sum of sorted density values
df$cumsum = cumsum(df$y)
# finding all points at which the cumulative sum is > 5% if the total density (
95% Highest posterior interval)
TOTAL = sum(df$y)
df_2 = df[df$cumsum>TOTAL*0.05,c("x","y")]
points(df_2,col="red")
legend("topright", inset=.01, c("95% Equal Tail Confidence Intervel", "Highest
Density Posterior
Intervel"), fill=c("green", "red"))
Question 3
(a) & (b)
Code:
#-----------------------------3a&b---------------------------
y = c(1.83, 2.02, 2.33, -2.79, 2.07, 2.02, -2.44, 2.14, 2.54, 2.23)
p_cond <- function(y,k,mu=2.51){

n = length(y)
res = (1/(2*pi*besselI(k,0))^n) * exp(sum(k*cos(y-mu))-k)
return(res)
}
probs = c()
part_length = 0.01
tested_k_values = seq(0,10,part_length)
max_prob = 0
k_max_prob = 0
for (k in tested_k_values){
p =p_cond(y,k)
probs = c(probs,p)
if (p > max_prob){
max_prob = p
k_max_prob = k
}
}
plot(x =tested_k_values, y = probs / sum(probs * part_length),ylab = "density",

xlab = "k", main = "posterior distribution of k, with mode plotted.", sub =
paste("k_mode =",k_max_prob))
abline(v=k_max_prob, col ="red", lty=2)
Bertil Wegmann
Bayesian Learning
Computer Lab 2
Submit your solutions via LISAM, no later than May 3 at 23:30.
necessary comments.
1. Linear and polynomial regression

The dataset TempLambohov.txt contains daily average temperatures (in degree Cel-
cius) at Lambohov, Linköping over the course of the year 2019. The response vari-
able is temp and the covariate is
the number of days since the beginning of the year
time = .
365
A Bayesian analysis of the following quadratic regression model is to be performed:
temp = β0 + β1 · time + β2 · time2 + ε, ε ∼ N (0, σ 2 ).
iid
(a) Use the conjugate prior for the linear regression model. The prior hyper-
parameters µ0 , Ω0 , ν0 and σ02 shall be set to sensible values. Start with
µ0 = (−10, 100, −100)T , Ω0 = 0.02 · I3 , ν0 = 3 and σ02 = 2. Check if this
prior agrees with your prior opinions by simulating draws from the joint prior
of all parameters and for every draw compute the regression curve. This gives
a collection of regression curves; one for each draw from the prior. Does the
collection of curves look reasonable? If not, change the prior hyperparame-
ters until the collection of prior regression curves agrees with your prior beliefs
about the regression curve.
[Hint: R package mvtnorm can be used and your Inv -χ2 simulator of random
draws from Lab 1.]
(b) Write a function that simulate draws from the joint posterior distribu-
tion of β0 , β1 ,β2 and σ2 .
1
i. Plot a histogram for each marginal posterior of the parameters.
ii. Make a scatter plot of the temperature data and overlay a curve for the
posterior median of the regression function f (time) = E [temp|time] =
β0 + β1 · time + β2 · time2 , i.e. the median of f (time) is computed for
every value of time. In addition, overlay curves for the 95% equal tail
posterior probability intervals of f (time), i.e. the 2.5 and 97.5 posterior
percentiles of f (time) is computed for every value of time. Does the
posterior probability intervals contain most of the data points? Should
they?
(c) It is of interest to locate the time with the highest expected temperature (i.e.
the time where f (time) is maximal). Let's call this value x̃. Use the simulated
draws in (b) to simulate from the posterior distribution of x̃. [Hint: the
regression curve is a quadratic polynomial. Given each posterior draw of β0 , β1
and β2 , you can nd a simple formula for x̃.]
(d) Say now that you want to estimate a polynomial regression of order 8,
but you suspect that higher order terms may not be needed, and you worry
about overtting the data. Suggest a suitable prior that mitigates this potential
problem. You do not need to compute the posterior. Just write down your
prior. [Hint: the task is to specify µ0 and Ω0 in a suitable way.]
2. Posterior approximation for classication with logistic regression

The dataset WomenAtWork.dat contains n = 168 observations on the following eight
variables related to women:
Variable Data type Meaning Role
Work Binary Whether or not the woman works Response y
Constant 1 Constant to the intercept Feature
HusbandInc Numeric Husband's income Feature
EducYears Counts Years of education Feature
ExpYears Counts Years of experience Feature
Age Counts Age Feature
NSmallChild Counts Number of child ≤ 6 years in household Feature
NBigChild Counts Number of child > 6 years in household Feature
(a) Consider the logistic regression model:

exp xT β
Pr(y = 1|x, β) = ,
1 + exp (xT β)
where y equals 1 if the woman works and 0 if she does not. x is a 7-dimensional
vector containing the seven features (including a 1 to model the intercept).
The goal is to approximate the posterior distribution of the parameter vector
β with a multivariate normal distribution

β|y, x ∼ N β̃, Jy−1 (β̃) ,
where β̃ is the posterior mode and J(β̃) = − ∂ ∂β∂β |β=β̃ is the negative of
ln p(β|y)2
T
the observed Hessian evaluated at the posterior mode. Note that ∂ ∂β∂β is
2 ln p(β|y)
T
a 7 × 7 matrix with second derivatives on the diagonal and cross-derivatives
2
on the o-diagonal. You can compute this derivative by hand, but
∂ 2 ln p(β|y)
∂βi ∂βj
we will let the computer do it numerically for you. Calculate both β̃ and J(β̃)
by using the optim function in R. [Hint: You may use code snippets from my
demo of logistic regression in Lecture 6.] Use the prior β ∼ N (0, τ 2 I), where
τ = 5.
Present the numerical values of β̃ and Jy−1 (β̃) for the WomenAtWork data. Com-
pute an approximate 95% equal tail posterior probability interval for the regres-
sion coecient to the variable NSmallChild. Would you say that this feature
is of importance for the probability that a woman works?
[Hint: You can verify that your estimation results are reasonable by comparing
the posterior means to the maximum likelihood estimates, given by: glmModel
<- glm(Work ~ 0 + ., data = WomenAtWork, family = binomial).]
(b) Use your normal approximation to the posterior from (a). Write a function
that simulate draws from the posterior predictive distribution of Pr(y = 1|x),
where the values of x corresponds to a 43-year-old woman, with two children (7
and 10 years old), 12 years of education, 8 years of experience, and a husband
with an income of 20. Plot the posterior predictive distribution of Pr(y = 1|x)
for this woman.
[Hints: The R package mvtnorm will be useful. Remember that Pr(y = 1|x)
can be calculated for each posterior draw of β .]
(c) Now, consider 11 women which all have the same features as the woman in
(b). Rewrite your function and plot the posterior predictive distribution for
the number of women, out of these 11, that are working. [Hint: Simulate
from the binomial distribution, which is the distribution for a sum of Bernoulli
random variables.]
Good luck!
Best, Bertil
3
Report lab2 Bayesian
Isabe723, mansj125
Question 1
1.a
The collection of curves is reasonable. The darker areas indicate most of the draws resulted in a
somewhat satisfying model. Ω_0 was scaled down from 0.02 · I_3 to 0.1· I_3 in order to achieve a
tighter distribution.
CODE:
library(mvtnorm)
#--------------------------------2.1a----------------------
dat = read.table("TempLambohov.txt",header = T )
summary(dat)
plot(dat)
y = dat[,2]
X = data.frame(dat[,1])
X$time_sq = dat[,1]^2
colnames(X) = c("time", "time_sq")

X$const = 1
X = X[,c(3,1,2)]
head(X)
#hyperparams (for beta)

mu_0 = c(-10, 100,-100)
omega_0 = 0.1 * diag(3)
nu_0 = 3
sigma_sq_0 = 2
#draw sigma from inv_chgi_sq

numdraws = 50
Xdraws = rchisq(numdraws,nu_0)
#as per slide in L3 but with other params

sigma_sq = nu_0*sigma_sq_0/Xdraws
#sigma_sq
#draw beta from N

betas = matrix(nrow=numdraws,ncol=3)
for (i in 1:numdraws){
tmp = rmvnorm(1,mu_0,sigma_sq[i]*solve(omega_0)) # L5
#print(tmp)
betas[i,] = tmp
}
X_sample = data.frame(const = 1, time = seq(0,1,1/364),time_sq =

seq(0,1,1/364)^2)
plot(dat,ylim=c(-100,100), main="Measured data and simulated draws.")
draws_predict =betas%*%t(X_sample)
points(X_sample$time,draws_predict[i,],col="#F7000010")
}
1.b
The posterior intervals do not contain “most” of the data points. The model we have chosen is not
complex enough to fit the measured data. The posterior betas are selected to best fit the data, not to
“box in“ the measured data. It is therefore expected that the credible interval does not contain most of
the measured data. The plotted points do not take the error term into account, only the median.
CODE:
1.c
The purple bar indicated the highest expected temperature with each of the parameters sampled from
the posterior distribution. They are clustered in a small area just after midsummer, which makes sense.
CODE:
#-----------------------------------2.1b.i---------------------------------
"
create all the '_n' varialbes
draw some number of sigmas.
use the new sigmas to draw betas
show marginalö posterior for betas
X,y, omega0, mu_0, v0, n, sigma0_sq
"
posterior <- function(numdraws, X, y, omega_0, mu_0, nu_0,sigma_sq_0 ){
n = length(y)
XtX = t(X)%*%X
mu_n = 0
#mu_n
beta_hat = solve(t(X)%*%X)%*%t(X)%*%y # (X.t * X)^-1 * X.t * y
mu_n = solve(XtX + omega_0) %*% (XtX%*%beta_hat + omega_0%*%mu_0)
#omega_n
omega_n = XtX + omega_0
#nu_n
nu_n = nu_0 + n
#vs_n
vs_n = nu_0*sigma_sq_0 + (t(y)%*%y + t(mu_0)%*%omega_0%*%mu_0 -
t(mu_n)%*%omega_n%*%mu_n)
#draw sigmas
X_draws = rchisq(numdraws, nu_n)
sigma_sq = vs_n / X_draws
#print(sigma_sq)
parameters = matrix(nrow = numdraws,ncol=length(mu_0)+1)

om_inv = solve(omega_n)
parameters[i,1:length(mu_0)] = rmvnorm(1,mean= mu_n,sigma =
sigma_sq[i]*om_inv)
}
parameters[,length(mu_0)+1] = sigma_sq
return(parameters)
}
parameters = posterior(1000, as.matrix(X), as.matrix(y), omega_0, mu_0, nu_0,

sigma_sq_0 )
print(head(parameters))
hist(parameters[,1],freq = F,main="beta0- constant")#,xlim=c(-100,300))

hist(parameters[,2],freq = F, main ="beta1- time")#,xlim=c(-100,300))
hist(parameters[,3],freq = F, main = "beta2- time^2")#,xlim=c(-100,300))
hist(parameters[,4],freq = F, main = "sigma^2- variance")
#------------------------------------2.1b.ii--------------------------
plot(dat)
betas = parameters[,1:3]
n = dim(X_sample)[1]
preds = betas %*%t(X_sample)
median_preds = matrix(ncol=n)
quants = matrix(nrow=2,ncol=n)
for (i in 1:n){
median_preds[i] = median(preds[,i])
quants[,i] = quantile(preds[,i],c(.025,0.975))
points(X_sample$time,median_preds, col = "#ff000023")

points(X_sample$time,quants[1,], col ="#0000ff23")
points(X_sample$time,quants[2,], col ="#0000ff23")
#points(X_sample$time,median_preds, col = alpha("red",.8))
1.c)
The highest temperature is predicted just past mid-year. Wich is in line with what we would expect.
Code:
#--------------------------------2.1c---------------------
beta_derivs = matrix(nrow = dim(betas)[1], ncol = 2)

zeros = matrix(rep(0,n),nrow=n)
beta_derivs[,1] = betas[,2]
beta_derivs[,2] = 2*betas[,3]
highest_temp = -beta_derivs[,1]/beta_derivs[,2]
head(highest_temp)
hist(highest_temp,freq=F, main="Posterior distribtuiton for x~, time of year with
highest expected temp")
1.d
Using the prior μ_0=0,Ω_0=λ*I.
μ_0=0 will ensure the parameters stay closer to 0.

Ω_0=λ*I a higher vale of λ will give smaller spread in the posterior distribution and thus ensure
our parameters do not stray too far from the μ_0 we have specified.
Setting these hyperparameters correctly can help us avoid overfitting.
We could also vary the lambda for each parameter such that the lower order terms are given
more freedom to move, and the higher order terms are more restricted.
Question 2
a)
The numerical values for the beta modes are given below,
The numerical value of J-1(beta mode) is,
The 95% equal tail credible interval for the variable NSmallChild was found out to be,
Yes, this feature is important for the probability that a woman works because the 95% equal tailed
credible interval for the parameter has a significant negative value. This implies that there is a 95%
probability that this parameter is a significant negative value which in turn reduces the probability of the
women working when applied in the function for logistic regression. The plot for the distribution of
NSmallChild and its credible interval is given below,
CODE:
#-----------------------------2.2a------------------------------
library(mvtnorm)
#data
dat = read.table("WomenAtWork.dat",header=T)
head(dat)
X = as.matrix(dat[,-1])
y = as.matrix(dat[,1])
#priors
#mean (for betas)

num_covs = dim(X)[2]
betas_prior = as.matrix(rep(0,num_covs)) # ~N(0,tau^2 * I)
tau = 5
Sigma_prior = tau^2 * diag(num_covs)
LogPostLogistic <- function(betas,y,X,mu,Sigma){

linPred <- X%*%betas;
logLik <- sum( linPred*y - log(1 + exp(linPred)) );
#if (abs(logLik) == Inf) logLik = -20000; # Likelihood is not finite, stear the
optimizer away from here!
logPrior <- dmvnorm(t(betas), t(mu), Sigma, log=TRUE);
return(logLik + logPrior)
}
LogPostLogistic(betas_prior,y,X,betas_prior,Sigma_prior)
optres = optim(betas_prior,LogPostLogistic, gr = NULL,y,X,betas_prior,

Sigma_prior,method=c("BFGS"),control=list(fnscale=-1),hessian=TRUE)
optres
J = optres$hessian
beta_mode = optres$par
beta_posterior_draws = rmvnorm(1E4,mean = beta_mode, sigma = solve(-J))
small_child_draws = beta_posterior_draws[,6]
hist(small_child_draws, freq=F,sub="red lines indicate boundries of 95% equal

tail")
abline(v=quantile(small_child_draws,probs=c(0.025,0.975)), col="red")
#the interval is fully negative, indicating we number of small children is
significant at least to that level of certainty
glm(formula = "Work ~ 0 + .",data = dat, family = "binomial")
b)
The plot of densities and histogram of the posterior predictive distribution for the women working is
given below,
CODE:
#---------------------------------2.2b-----------------------
posterior_draw_predictive <- function(X, beta_mode, Sigma, numdraws = 1){
betas = rmvnorm(numdraws,beta_mode,sigma=Sigma)
linpred = matrix(nrow=numdraws,ncol=1)
linpred[i] = X %*% betas[i,]
}
tmp =exp(linpred)
prob = tmp / ( 1 + tmp)
return(prob)
N = 1E4
X_test = matrix(data = c(1, 20, 12, 8, 43, 0,2),ncol = 7, nrow=1 )
probs = posterior_draw_predictive(X_test,beta_mode,solve(-J), numdraws = N)
hist(probs, freq=F, main= "Posterior draws for probabilty that the woman is
working")
lines(density(probs))
#---------------------------2.2c-----------------------------
c)
The plot of histogram of the posterior predictive distribution for the number of women out of these 11
that are working is given below,
CODE:
#---------------------------2.2c-----------------------------
posterior_draw_predictive_11 <- function(X, beta_mode, Sigma, numdraws = 1){
betas = rmvnorm(numdraws,beta_mode,sigma=Sigma)
num_women = matrix(nrow=numdraws,ncol=1)
linpred = X %*% betas[i,]
tmp =exp(linpred)
prob = tmp / ( 1 + tmp)
num = rbinom(1,11,prob)
num_women[i] = num
}
return(num_women)
N = 1E4
X_test = matrix(data = c(1, 20, 12, 8, 43, 0,2),ncol = 7, nrow=1 )
num_women = posterior_draw_predictive_11(X_test,beta_mode,solve(-J), numdraws =

N)
hist(num_women, freq = F)
Bertil Wegmann
Bayesian Learning
Computer Lab 3
Submit your solutions via LISAM, no later than May 17 at 23:30.
necessary comments.
1. Gibbs sampler for a normal model

The dataset Precipitation.rds consists of daily records of weather with rain or
snow (in units of mm) from the beginning of 1962 to the end of 2008 in a certain
area. Assume the natural log of the daily precipitation {y1 , ..., yn } to be indepen-
∼ N (µ, σ 2 ), where both µ and σ 2 are
dent normally distributed, ln y1 , ..., ln yn |µ, σ 2 iid
unknown. Let µ ∼ N (µ0 , τ02 ) independently of σ 2 ∼ Inv -χ2 (ν0 , σ02 ).
(a) Implement (code!) a Gibbs sampler that simulates from the joint posterior
p(µ, σ 2 | ln y1 , ..., ln yn ). The full conditional posteriors are given on the slides
from Lecture 7. Evaluate the convergence of the Gibbs sampler by calculating
the Ineciency Factors (IFs) and by plotting the trajectories of the sampled
Markov chains.
(b) Plot the following in one gure: 1) a histogram or kernel density estimate of
the daily precipitation {y1 , ..., yn }. 2) The resulting posterior predictive density
p (ỹ|y1 , . . . , yn ) using the simulated posterior draws from (a). How well does
the posterior predictive density agree with this data?
2. Metropolis Random Walk for Poisson regression

Consider the following Poisson regression model
iid
yi |β ∼ Poisson exp xTi β , i = 1, ..., n,

where yi is the count for the ith observation in the sample and xi is the p-dimensional
vector with covariate observations for the ith observation. Use the data set
1
eBayNumberOfBidderData.dat. This dataset contains observations from 1000 eBay
auctions of coins. The response variable is nBids and records the number of bids
in each auction. The remaining variables are features/covariates (x):
Const (for the intercept)
PowerSeller (equal to 1 if the seller is selling large volumes on eBay)
VerifyID (equal to 1 if the seller is a veried seller by eBay)
Sealed (equal to 1 if the coin was sold in an unopened envelope)
MinBlem (equal to 1 if the coin has a minor defect)
MajBlem (equal to 1 if the coin has a major defect)
LargNeg (equal to 1 if the seller received a lot of negative feedback from

customers)
LogBook (logarithm of the book value of the auctioned coin according to
expert sellers. Standardized)
MinBidShare (ratio of the minimum selling price (starting price) to the book
value. Standardized).
(a) Obtain the maximum likelihood estimator of β in the Poisson regression model
for the eBay data [Hint: glm.R, don't forget that glm() adds its own intercept
so don't input the covariate Const]. Which covariates are signicant?
(b) Let's
do a Bayesian analysis of the Poisson regression. Let the prior be β ∼
N 0, 100 · (X X) , where X is the n × p covariate matrix. This is a com-
T −1
monly used prior, which is called Zellner's g-prior. Assume rst that the pos-
terior density is approximately multivariate normal:

β|y ∼ N β̃, Jy−1 (β̃) ,
where β̃ is the posterior mode and Jy (β̃) is the negative Hessian at the posterior
mode. β̃ and Jy (β̃) can be obtained by numerical optimization (optim.R)
exactly like you already did for the logistic regression in Lab 2 (but with the
log posterior function replaced by the corresponding one for the Poisson model,
which you have to code up.).
(c) Let's simulate from the actual posterior of β using the Metropolis algorithm
and compare the results with the approximate results in b). Program a general
function that uses the Metropolis algorithm to generate random draws from an
arbitrary posterior density. In order to show that it is a general function for
any model, we denote the vector of model parameters by θ. Let the proposal
density be the multivariate normal density mentioned in Lecture 8 (random
walk Metropolis):
θp |θ(i−1) ∼ N θ(i−1) , c · Σ ,

where Σ = Jy−1 (β̃) was obtained in b). The value c is a tuning parameter and
should be an input to your Metropolis function. The user of your Metropo-
lis function should be able to supply her own posterior density function, not
necessarily for the Poisson regression, and still be able to use your Metropolis
function. This is not so straightforward, unless you have come across function
objects in R. The note HowToCodeRWM.pdf in Lisam describes how you
can do this in R.
2
Now, use your new Metropolis function to sample from the posterior of β
in the Poisson regression for the eBay dataset. Assess MCMC convergence by
graphical methods.
(d) Use the MCMC draws from c) to simulate from the predictive distribution of
the number of bidders in a new auction with the characteristics below. Plot
the predictive distribution. What is the probability of no bidders in this new
auction?
PowerSeller =1
VerifyID = 0
Sealed = 1
MinBlem = 0
MajBlem = 1
LargNeg = 0
LogBook = 1.2
MinBidShare = 0.8
3. Time series models in Stan
(a) Write a function in R that simulates data from the AR(1)-process

iid
xt = µ + φ (xt−1 − µ) + εt , εt ∼ N 0, σ 2 ,

for given values of µ, φ and σ 2 . Start the process at x1 = µ and then simulate
values for xt for t = 2, 3 . . . , T and return the vector x1:T containing all time
points. Use µ = 13, σ 2 = 3 and T = 300 and look at some dierent realizations
(simulations) of x1:T for values of φ between −1 and 1 (this is the interval
of φ where the AR(1)-process is stationary). Include a plot of at least one
realization in the report. What eect does the value of φ have on x1:T ?
(b) Use your function from a) to simulate two AR(1)-processes, x1:T with φ = 0.2
and y1:T with φ = 0.95. Now, treat your simulated vectors as synthetic data,
and treat the values of µ, φ and σ 2 as unknown parameters. Implement Stan-
code that samples from the posterior of the three parameters, using suitable
non-informative priors of your choice. [Hint: Look at the time-series models
examples in the Stan user's guide/reference manual, and note the dierent
parameterization used here.]
i. Report the posterior mean, 95% credible intervals and the number of ef-
fective posterior samples for the three inferred parameters for each of the
simulated AR(1)-process. Are you able to estimate the true values?
ii. For each of the two data sets, evaluate the convergence of the samplers
and plot the joint posterior of µ and φ. Comments?
Good luck!
Best, Bertil
3
Lab 3, Bayesian learning
mansj125, Isabe723
Question 1
(a)
Inefficiency Factor for μ = 1.26403
Inefficiency Factor for σ2 = 1.065704
The trajectories of the sampled Markov Chains for μ and σ2 are given below,
It can be seen that the sample generated from does not seem to completely strongly converge to a
single value, but seems to move around a value with some variance.
CODE:
Note: The code includes Gibbs sampling as well as direct sampling which was done for verification
purposes
library(scales)
#---------------------------------3.1--------------------
data = readRDS("Precipitation.rds")
y = log(data)
n = length(y)
y_mean = mean(y)
# initializing hyper parameters

#these are not given and no method for choosing provided
#have chosen reasonable values and checked with draws from posterior
mu_0 = mean(y)*1.2
t_sq_0 = var(y)*.8
sigma_sq_0 = .5
v0 = 100*.1
# Plotting prior to check if it is at all similar to the data given

#simulated draws from inverss chi^2
prior_sigmasq = c()
N_Draws = 2000
prior_mu = rnorm(N_Draws,mu_0,t_sq_0)
for(i in 1:N_Draws){
X = rchisq(1,v0)
prior_sigmasq = c(prior_sigmasq, v0*sigma_sq_0 / X)
}
# Drawing form the simulated prior mean and sigma^2

draws_from_prior = c()
for(i in 1:N_Draws){
draws_from_prior = c(draws_from_prior, rnorm(1, prior_mu[i],prior_sigmasq[i]))
}
# Plotting histogram of data points

hist(draws_from_prior,freq=F)
mean(draws_from_prior)
var(draws_from_prior)
#plotting given data

hist(y, freq=F)
mean(y)
var(y)
#------------------gibbs sampling------------------
N_Draws = n
# initializing hyper parameters
mu_0 = mean(y)*1.2
t_sq_0 = var(y)*.8
sigma_sq_0 = .5
v0 = 100*.1
v_n = v0 + n
# Initial Setting
mu = rnorm(1,mu_0,t_sq_0) # draw using our hyperparams
#draw from inv_chi^2

X = rchisq(1,v0)
sigmasq = v0*sigma_sq_0 / X
Gibbs_posterior_mu = c()
Gibbs_posterior_sigma = c()
#mu is mean of the normal distribution for y

#mu_n is the mean of the normal distribution fro mu
#t_sq_n is the variance of the normal distribution for mu
# v0, v_n and sigma_sq_0 are used in drawing for sigma_sq
for( i in 1:N_Draws){
w = (n/sigmasq) / ( (n/sigmasq) + (1 / t_sq_0))

mu_n = w*y_mean + (1-w)*mu_0
t_sq_n = 1/((n/sigmasq) + (1/t_sq_0))
mu = rnorm(1,mu_n,t_sq_n)
Gibbs_posterior_mu = c(Gibbs_posterior_mu, mu)
X = rchisq(1,v_n)
thing = (v0*sigma_sq_0 + sum((mu - y)^2))/(n + v0)
sigmasq = v_n*thing/X
Gibbs_posterior_sigma = c(Gibbs_posterior_sigma, sigmasq)
}
#autocorrelations of sampled params
acf_mu = acf(Gibbs_posterior_mu)
acf_sigma = acf(Gibbs_posterior_sigma)
#inefficiencty factors
if_mu = 1+2*sum(acf_mu$acf[-1])
if_sigma = 1+2*sum(acf_sigma$acf[-1])
hist(Gibbs_posterior_mu)
hist(Gibbs_posterior_sigma)
plot(Gibbs_posterior_mu,type="l",main = "trajectory of mu")
plot(Gibbs_posterior_sigma,type="l", main = "trajectory of sigma")
#--------------verification, through direct draws, not part of question ---------

--------------
# Direct Draws Done for verification
mu_0 = mean(y)
k0 = 1
sigma_sq_0 = 1.8
v0 = 100
# Draws from Piror
nDraws = 2000
X = rchisq(nDraws,v0)
prior_direct_sigmasq = v0*sigma_sq_0 / X
prior_direct_mu = c()
for(i in 1:nDraws){
prior_direct_mu = c(prior_direct_mu, rnorm(1,mu_0,prior_direct_sigmasq[i]/k0))
}
hist(mu)
hist(prior_direct_sigmasq)
# Posterior from Direct Sampling

nDraws = 500
mu_n = ((k0/(k0+n))*mu_0) + ((n/(k0+n))*y_mean)
kn = k0 + n
vn = v0 + n
s_sq = sum((y-y_mean)^2)/(n-1)
sigmasq_n = (v0*sigma_sq_0 + (n-1)*s_sq + (((k0*n)/(k0+n))*(y_mean - mu_0)^2))/vn
X = rchisq(nDraws,vn)
direct_sigmasq = vn*sigmasq_n / X
direct_mu = c()
for(i in 1:nDraws){
direct_mu = c(direct_mu, rnorm(1,mu_n,direct_sigmasq[i]/kn))
}
hist(direct_mu)
hist(direct_sigmasq)
plot(direct_mu,type="l")
plot(direct_sigmasq,type="l")
b)
It can be seen that the posterior prediction also closely resembles the trends of the dataset that was
given but the posterior predictions has some outliers.
CODE:
#-----------------------------------------------3.1.b--------------------------
draws_posterior = c()
for(i in 1:n){
draws_posterior = c(draws_posterior, rnorm(1,Gibbs_posterior_mu[i],
Gibbs_posterior_sigma[i]))
}
hist(data, freq=F,col="red")
hist(exp(draws_posterior),add=T, freq=F, breaks = 100,
xlim=range(0,80),col="blue")
legend("topright",legend=c("historical data", "posterior draws"), col=c("red",
"blue"), lty=1)
Question 2
A. Significant covariates
I. Intercept
II. VerifyID
III. Sealed
IV. Logbook
V. MinBidShare
CODE:
#----------------------------------------------3.2.a--------------------------
library(mvtnorm)
data <- read.table("eBayNumberOfBidderData.dat", sep = "" , header = T ,
na.strings ="", stringsAsFactors= F)
data = as.data.frame(data)
col_names = colnames(data)
#poisson distributed, with exponentail in expression

model <- glm("nBids ~ . - Const" , data, family = poisson(link = "log"))
model
summary(model)
B. See code
CODE:
#----------------------------------3.2.b--------------------------------
Y = data$nBids
X = as.matrix(data[,-1])
n = dim(data)[1]
mu = rep(0,9)
Sigma = 100*solve(t(X)%*%X)
beta_prior = rmvnorm(1,mean=mu,sigma=Sigma)
dim(beta_prior)
LogPostPoisson <- function(betas,Y,X,mu,Sigma){

linPred <- betas%*%t(X);
# finding the log likelihood
logLik <- sum(Y*linPred - exp(linPred) - log(factorial(Y)));
#print(logLik)
# finding the log logPrior using the parameters given
logPrior <- dmvnorm(betas, mu, Sigma, log=TRUE);
#print(logPrior)
# returning the log posterior as the sum of loglikihood and logPrior
return(logLik + logPrior)
}
OptimRes <-
optim(beta_prior,LogPostPoisson,gr=NULL,Y,X,mu,Sigma,method=c("BFGS"),control=lis
t(fnscale=-1),hessian=TRUE)
# Getting the hessian matrix from the results returned from the optim function
beta_mode_hessian = OptimRes$hessian
# Getting the mode Î² values as the parameters returned from the optim function
beta_mode = OptimRes$par
colnames(beta_mode) = col_names[-1]
# Getting the J(beta_mode) as the negative of the hessian returned
J = -beta_mode_hessian
# Calculating inverse of J
J_inverse = solve(-beta_mode_hessian)
posterior_beta_draw = rmvnorm(1,beta_mode, J_inverse)
C. The beta values do not converge. Neither do they diverge. They oscillate around the values
suggested by our glm model from question 1. It seems like the significant(according to glm)
features stay closer to the red line than the insignificant ones. Suggesting they influence model
performance more.
Note the oscillations are bigger than the bounds of the window for some of the features. They
are all scaled to 0 - > 2*beta from glm tol illustratet the difference In relative magnitude of
scillations.
CODE:
#-----------------------------------3.2c---------------------------------
num_random_walk = 1E5
MetropolisRandomWalk <- function(beta_initial, c, J_inverse,
log_density_function, mu, Sigma, Y, X){
metropolis_betas = matrix(data = NA, nrow = num_random_walk, ncol = dim(X)[2])

prev_beta = beta_initial
for(i in 1:num_random_walk){
sample = rmvnorm(1, mean = prev_beta, sigma = c * J_inverse)
log_density_sample = log_density_function(sample, Y, X, mu, Sigma)

log_density_prev_sample = log_density_function(prev_beta, Y, X, mu, Sigma)
alpha = min(1, exp((log_density_sample - log_density_prev_sample)))

#comparing the sum if log-prior and log likelihood of the two samples
#update betas if new one is sufficiently 'better', based on random draw
rand = runif(n=1, min = 0, max = 1)

if(rand<alpha){
metropolis_betas[i,] = sample
prev_beta = sample
}
else{
metropolis_betas[i,] = prev_beta
}
}
return(metropolis_betas)
}
metropolis_posterior_betas = MetropolisRandomWalk(posterior_beta_draw, .6,

J_inverse, LogPostPoisson, mu, Sigma, Y, X)
colnames(metropolis_posterior_betas) = col_names[-1]
par(mfrow=c(3,3))
for(i in 1:dim(X)[2]){
plot(metropolis_posterior_betas[,i], type = "l",ylab = col_names[1+i])# ylim =
c(0,2*model$coefficients[i]))
abline(h=model$coefficients[i], col="red")
}
D. Our predictive distribution. The model from question 1 predicted 0.68 on the same test case.
Once again it seems they agree.
We estimate the probability of no bidders to be between 50 and 51 percent.
CODE:
#-----------------------------------3.2c---------------------------------
num_random_walk = 1E5
MetropolisRandomWalk <- function(beta_initial, c, J_inverse,
log_density_function, mu, Sigma, Y, X){
metropolis_betas = matrix(data = NA, nrow = num_random_walk, ncol = dim(X)[2])

prev_beta = beta_initial
for(i in 1:num_random_walk){
sample = rmvnorm(1, mean = prev_beta, sigma = c * J_inverse)
log_density_sample = log_density_function(sample, Y, X, mu, Sigma)

log_density_prev_sample = log_density_function(prev_beta, Y, X, mu, Sigma)
alpha = min(1, exp((log_density_sample - log_density_prev_sample)))

#comparing the sum if log-prior and log likelihood of the two samples
#update betas if new one is sufficiently 'better', based on random draw
rand = runif(n=1, min = 0, max = 1)

if(rand<alpha){
metropolis_betas[i,] = sample
prev_beta = sample
}
else{
metropolis_betas[i,] = prev_beta
}
}
return(metropolis_betas)
}
metropolis_posterior_betas = MetropolisRandomWalk(posterior_beta_draw, .6,

J_inverse, LogPostPoisson, mu, Sigma, Y, X)
colnames(metropolis_posterior_betas) = col_names[-1]
par(mfrow=c(3,3))
for(i in 1:dim(X)[2]){
plot(metropolis_posterior_betas[,i], type = "l",ylab = col_names[1+i])# ylim =
c(0,2*model$coefficients[i]))
abline(h=model$coefficients[i], col="red")
}
#-------------------------------------------3.2.d--------------------------------
----
par(mfrow=c(1,1))
sample = matrix(c(1, 1, 0, 1, 0, 1,0,1.2,0.8))
intensities = exp(metropolis_posterior_betas %*% sample)
poisson_draws = matrix(nrow=num_random_walk,ncol=1)
for (i in 1:num_random_walk){
poisson_draws[i,1] = rpois(1,intensities[i])
}
#print(exp(model$coefficients%*%sample))
hist(poisson_draws, main = "Predictive distribution", sub =paste("glm predicts:
",exp(model$coefficients%*%sample)), freq=F)
prob_no_bids = mean(poisson_draws==0)
prob_no_bids
Question 3
A. See realizations using phi = [-.8, 0, .8] below, including autcorrelation
Using phi < 0 makes the series jump back and forth around the mean in a very erratic way. The
autocorrelation is changing sign with each lag-step and approaching zero as the number of lag
steps increases
Using phi == 0 gives a trajectory where the only things affecting the values are the mean and the
error/random term. There is no significant autocorrelation.
phi > 0 will give us a smoother series of values, where each value will tend to be close to the
previous. The autocorrelation is positive but decreases exponentially with the number of lag
steps.
CODE:
#----------------3.3.2 create data-----------------------------------------------

----------
fi_1 = .2
mu_1 = 15
sigma_sq_1 = 4
fi_2 = .95
mu_2 = 2
sigma_sq_2 = 8
par(mfrow=c(1,1))
path = matrix(nrow = t_max,ncol=2)

path[1,] = c(mu_1,mu_2)
for (t in 2:t_max){
path[t,1] = mu_1 + fi_1*(path[t-1,1]-mu_1) + rnorm(1,0,sqrt(sigma_sq_1))
}
plot(path[,2],type="line",col="blue", main="trajectories sampled with our two
sets of parameters")
lines(path[,1],col="red")#acf(path,main=paste("ACF ",fi))
B-I
i) The 95% credible intervals for each of the parameters that has been obtained from the
model are given below,
ii) First trajectory, true values [.2, 15, 4]
i. Phi - (0.135, 0.286)

ii. Mu - (14.76, 15.2)
iii. Sigma^2 - (3.725 , 4.608 )
iii) Second trajectory, true values [.95, 2, 8]
i. Phi - (0.899, 0.998)

ii. Mu - (-21.435, 15.939)
iii. Sigma^2 - (7.178 , 9.945)
The interval for mu is huge in the second run. The mean is curiously enough quite close (2.147).
The other parameters converged nicely. Neff for mu is only 64. I suspect this might be solved by
using more data and running the chains for longer and/or using more chains. The variance in the
randomness is a lot higher here which I suspect makes the problem more difficult for the chain.
B-II)
Like in question 2.c the values do not converge but rather oscillate around some value and stay
there. It seems that no benefit would be gained by performing more iterations. Both chains have
Rhat-values close to 1, indicating convergence. However the second sampler had a lot fewer
efficient iterations.
CODE:
t_max = 300
# ------------------ 3.3.a-------------------------------------------------------
------
par(mfrow=c(3,2))
for (fi in seq(-.8,.8,.8)){
path = matrix(nrow = t_max)
path[1] = mu
for (t in 2:t_max){
path[t] = mu + fi*(path[t-1]-mu) + rnorm(1,0,sigma_sq)
}
plot(path,main=fi,type="line",col="blue")
acf(path,main=paste("ACF ",fi))
}
#----------------3.3.a create data-----------------------------------------------

----------
fi_1 = .2
mu_1 = 15
sigma_sq_1 = 4
fi_2 = .95
mu_2 = 2
sigma_sq_2 = 8
par(mfrow=c(1,1))
path = matrix(nrow = t_max,ncol=2)

path[1,] = c(mu_1,mu_2)
for (t in 2:t_max){
}
plot(path[,2],type="line",col="blue", main="trajectories sampled with our two

sets of parameters")
lines(path[,1],col="red")#acf(path,main=paste("ACF ",fi))
write.table(path[,1],file="path1.txt",header=T)
write.table(path[,2],file="path2.txt",header=T)
paths = path
#---------------------------------------------3.3.b---------------------
library(rstan)
#y=c(4,5,6,4,0,2,5,3,8,6,10,8)
#N=length(y)
StanModel = '
data {
vector[300] M; // synthetic data
//matrix[300,2] M
}
parameters {
real mu;
real<lower=0> sigma2;
real phi;
}
model {
mu ~ uniform(-100,100); // Normal with mean 0, st.dev. 100
sigma2 ~ uniform(0,100); // Scaled-inv-chi2 with nu 1,sigma 2
phi ~ uniform(-10,10);
for(i in 2:300){
M[i] ~ normal(mu + phi*(M[i-1]-mu),sqrt(sigma2));

}
}'
data1 <- list(M=paths[,1])

data2 <- list(M=paths[,2])
#-------------------fitting for first series
niter <- 5000

warmup <- as.integer(niter*.15)
fit <- stan(model_code=StanModel,data=data1, warmup=warmup,iter=niter,chains=4)

# Print the fitted model
print(fit,digits_summary=3)
# Extract posterior samples
postDraws <- extract(fit)
# Do traceplots of the first chain
par(mfrow = c(1,1))
plot(postDraws$mu[1:(niter-warmup)],type="l",ylab="mu",main="Traceplot")
# Do automatic traceplots of all chains
traceplot(fit)
# Bivariate posterior plots
pairs(fit)
#---------------------fitting for second series

#
niter <- 5000
warmup <- as.integer(niter*.15)
fit <- stan(model_code=StanModel,data=data2, warmup=warmup,iter=niter,chains=4)
# Print the fitted model
print(fit,digits_summary=3)
# Extract posterior samples
postDraws <- extract(fit)
# Do traceplots of the first chain
par(mfrow = c(1,1))
plot(postDraws$mu[warmup:niter],type="l",ylab="mu",main="Traceplot")
# Do automatic traceplots of all chains
traceplot(fit)
# Bivariate posterior plots
pairs(fit)

Fuskpaper Bayes

Uploaded by

Copyright:

Available Formats

Fuskpaper Bayes

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Fuskpaper Bayes

Uploaded by

Copyright:

Available Formats

Bertil Wegmann

Dept of Computer and Information Science

1 when incomes follow a log N (µ, σ 2 ) distribution. Φ(z) is the cumulative

#posteriors for beta distr

#sd of Beta distr

plot(drawn_mean, col="blue", xlab="Number of drawn samples")

plot(drawn_sd,col="blue", xlab = "number of drawn samples")

simulated_posterior_prob = mean(rbeta(10000, posterior_alpha,posterior_beta) <.3

log_odds <- function(p){

hist(posterior_logodds, freq=F, main = "posterior distribution of phi =

sigma_sq = (n-1)*tau_sq / X_draws # Simulating draws from Inv−χ2(n,τ)

# Theoretical density function for Inv−χ2(n,τ) distribution

# Plotting posterior draws from sampling

gini = 2*pnorm(sqrt(sigma_sq)/sqrt(2))-1 #formula given in instructions

p_cond <- function(y,k,mu=2.51){

plot(x =tested_k_values, y = probs / sum(probs * part_length),ylab = "density",

1. Linear and polynomial regression

2. Posterior approximation for classication with logistic regression

(a) Consider the logistic regression model:

a 7 × 7 matrix with second derivatives on the diagonal and cross-derivatives

colnames(X) = c("time", "time_sq")

#hyperparams (for beta)

#draw sigma from inv_chgi_sq

#as per slide in L3 but with other params

#draw beta from N

X_sample = data.frame(const = 1, time = seq(0,1,1/364),time_sq =

mu_n = solve(XtX + omega_0) %*% (XtX%*%beta_hat + omega_0%*%mu_0)

parameters = matrix(nrow = numdraws,ncol=length(mu_0)+1)

parameters = posterior(1000, as.matrix(X), as.matrix(y), omega_0, mu_0, nu_0,

hist(parameters[,1],freq = F,main="beta0- constant")#,xlim=c(-100,300))

points(X_sample$time,median_preds, col = "#ff000023")

beta_derivs = matrix(nrow = dim(betas)[1], ncol = 2)

Using the prior μ_0=0,Ω_0=λ*I.

μ_0=0 will ensure the parameters stay closer to 0.

The numerical value of J-1(beta mode) is,

#mean (for betas)

LogPostLogistic <- function(betas,y,X,mu,Sigma){

optres = optim(betas_prior,LogPostLogistic, gr = NULL,y,X,betas_prior,

beta_posterior_draws = rmvnorm(1E4,mean = beta_mode, sigma = solve(-J))

hist(small_child_draws, freq=F,sub="red lines indicate boundries of 95% equal

glm(formula = "Work ~ 0 + .",data = dat, family = "binomial")

probs = posterior_draw_predictive(X_test,beta_mode,solve(-J), numdraws = N)

posterior_draw_predictive_11 <- function(X, beta_mode, Sigma, numdraws = 1){

num_women = posterior_draw_predictive_11(X_test,beta_mode,solve(-J), numdraws =

1. Gibbs sampler for a normal model

2. Metropolis Random Walk for Poisson regression

 VerifyID (equal to 1 if the seller is a veried seller by eBay)

 Sealed (equal to 1 if the coin was sold in an unopened envelope)

 MinBlem (equal to 1 if the coin has a minor defect)

 MajBlem (equal to 1 if the coin has a major defect)

 LargNeg (equal to 1 if the seller received a lot of negative feedback from

3. Time series models in Stan

(a) Write a function in R that simulates data from the AR(1)-process

Inefficiency Factor for μ = 1.26403

Inefficiency Factor for σ2 = 1.065704

# initializing hyper parameters

# Plotting prior to check if it is at all similar to the data given

# Drawing form the simulated prior mean and sigma^2

# Plotting histogram of data points

2. Posterior approximation for classication with logistic regression

mu_n = solve(XtX + omega_0) %% (XtX%%beta_hat + omega_0%*%mu_0)

VerifyID (equal to 1 if the seller is a veried seller by eBay)

Sealed (equal to 1 if the coin was sold in an unopened envelope)

MinBlem (equal to 1 if the coin has a minor defect)

MajBlem (equal to 1 if the coin has a major defect)

LargNeg (equal to 1 if the seller received a lot of negative feedback from