Bayes Cheat Sheet
Bayesian Learning
Computer Lab 1
You are recommended to use R for solving the labs.
You work and submit your labs in pairs, but both of you should contribute equally
and understand all parts of your solutions.
It is not allowed to share exact solutions with other student pairs.
The submitted lab reports will be verified through URKUND and indications of
plagiarism will be investigated by the Disciplinary Board.
Submit your solutions via LISAM, no later than April 19 at 23:30.
Please note the following about the format of the submitted lab report:
1. The lab report should include all solutions and plots to the stated problems with
necessary comments.
2. Submit the lab report with your code attached to the solution of each sub-problem
(1a), 1b),...) in one PDF document.
3. Submit a separate file containing all code.
1. Daniel Bernoulli
Let y1 , ..., yn |θ ∼ Bern(θ), and assume that you have obtained a sample with s = 13
successes in n = 50 trials. Assume a Beta(α0 , β0 ) prior for θ and let α0 = β0 = 5.
(a) Draw random numbers from the posterior θ|y ∼ Beta(α0 + s, β0 + f), where
y = (y1, . . . , yn) and f = n − s is the number of failures, and verify graphically
that the posterior mean E[θ|y] and standard deviation SD[θ|y] converge to the
true values as the number of random draws grows large. [Hint: use rbeta()].
(b) Use simulation (nDraws = 10000) to compute the posterior probability
Pr(θ < 0.3|y) and compare with the exact value from the Beta posterior. [Hint:
use pbeta()].
(c) Simulate draws from the posterior distribution of the log-odds φ = log(θ/(1 − θ))
by using simulated draws from the Beta posterior for θ, and plot the posterior
distribution of φ (nDraws = 10000). [Hint: hist() and density() can be
utilized].
2. Log-normal distribution and the Gini coefficient.
Assume that you have asked 9 randomly selected persons about their monthly in-
come (in thousands Swedish Krona) and obtained the following nine observations:
33, 24, 48, 32, 55, 74, 23, 76 and 17. A common model for non-negative continuous
variables is the log-normal distribution. The log-normal distribution log N(µ, σ²)
has density function
p(y|µ, σ²) = 1 / (y · √(2πσ²)) · exp(−(1/(2σ²)) (log y − µ)²),
where y > 0, µ > 0 and σ 2 > 0. The log-normal distribution is related to the
normal distribution as follows: if y ∼ log N (µ, σ 2 ) then log y ∼ N (µ, σ 2 ). Let
y1, ..., yn | µ, σ² ∼ iid log N(µ, σ²), where µ = 3.5 is assumed to be known but σ² is
unknown with non-informative prior p(σ²) ∝ 1/σ². The posterior for σ² is the
Inv-χ²(n, τ²) distribution, where
τ² = (∑ᵢ₌₁ⁿ (log yᵢ − µ)²) / n.
(a) Simulate 10, 000 draws from the posterior of σ 2 by assuming µ = 3.5 and plot
the posterior distribution.
(b) The most common measure of income inequality is the Gini coefficient, G,
where 0 ≤ G ≤ 1. G = 0 means a completely equal income distribution,
whereas G = 1 means complete income inequality (see e.g. Wikipedia for more
information about the Gini coefficient). It can be shown that G = 2Φ(σ/√2) − 1
when incomes follow a log N(µ, σ²) distribution, where Φ(z) is the CDF of the
standard normal distribution.
where North is located at zero degrees (see Figure 1 on the next page, where the
angles are measured clockwise). To fit with Wikipedia's description of probability
distributions for circular data we convert the data into radians −π ≤ y ≤ π . The
10 observations in radians are
(1.83, 2.02, 2.33, −2.79, 2.07, 2.02, −2.44, 2.14, 2.54, 2.23).
Assume that these data points conditional on (µ, κ) are independent observations
from the following von Mises distribution:
p(y|µ, κ) = exp[κ · cos(y − µ)] / (2πI₀(κ)),   −π ≤ y ≤ π,
where I0 (κ) is the modified Bessel function of the first kind of order zero [see
?besselI in R]. The parameter µ (−π ≤ µ ≤ π ) is the mean direction and κ > 0 is
called the concentration parameter. Large κ gives a small variance around µ, and
vice versa. Assume that µ is known to be 2.51. Let κ ∼ Exponential(λ = 1) a
priori, where λ is the rate parameter of the exponential distribution (so that the
mean is 1/λ).
(a) Derive the expression for what the posterior p(κ|y, µ) is proportional to. Hence,
derive the function f (κ) such that p(κ|y, µ) ∝ f (κ). Then, plot the posterior
distribution of κ for the wind direction data over a fine grid of κ values. [Hint:
you need to normalize the posterior distribution of κ so that it integrates to
one.]
(b) Find the (approximate) posterior mode of κ from the information in a).
Figure 1: The wind direction data. Angles are measured clock-wise starting from North.
Good luck!
3
Answers to Lab 1, Bayesian Learning
Isabe723, mansj125
Question 1
(a) For each sample size n from 1 to N, n random draws were made from the posterior, and their mean (E[θ|y]) and
standard deviation (SD[θ|y]) were calculated and plotted. These were then compared to the true mean and
standard deviation of the posterior distribution. The plots are given below.
The graphs show that the mean and standard deviation of the random draws
from the posterior distribution converge to the true mean and standard deviation as the number of random
draws increases.
Code:
#---------------------------------1.1a----------------------
# priors
prior_alpha = 5
prior_beta = 5
# observed data
trials = 50
successes = 13
fails = trials - successes
N = 5000
drawn_mean = c()
drawn_sd = c()
for (n in seq(1, N)) {
  draw = rbeta(n, prior_alpha + successes, prior_beta + fails)
  drawn_mean = c(drawn_mean, mean(draw))
  drawn_sd = c(drawn_sd, sd(draw))
}
# Beta distribution
# pdf: p(x) ∝ x^(a-1) * (1-x)^(b-1)
# mean: a / (a + b)
# variance: (a*b) / ((a + b)^2 * (a + b + 1))
# true values of the Beta posterior (a = alpha0 + s, b = beta0 + f)
true_mean = (successes + prior_alpha) / (trials + prior_alpha + prior_beta)
true_sd = sqrt((successes + prior_alpha) * (fails + prior_beta) /
               ((trials + prior_alpha + prior_beta)^2 * (trials + prior_alpha + prior_beta + 1)))
# convergence plots, as described in the report text above
plot(drawn_mean, type = "l"); abline(h = true_mean, col = "red")
plot(drawn_sd, type = "l"); abline(h = true_sd, col = "red")
(b) The probability Pr(θ < 0.3|y) estimated from the draws was found to be 0.5189.
The exact value from the Beta posterior was found to be 0.5150226.
Code:
#----------------------------------1.1b-------------------
posterior_alpha = prior_alpha + successes
posterior_beta = prior_beta + fails
simulated_posterior_prob = mean(rbeta(10000, posterior_alpha, posterior_beta) < 0.3)
exact_posterior_prob = pbeta(0.3, posterior_alpha, posterior_beta)
print(paste("simulated:", simulated_posterior_prob, "exact:", exact_posterior_prob, sep = " "))
(c) The posterior distribution of the log-odds was plotted using both hist() and density() and is given below.
Code:
#---------------------------------1.1c------------------
ndraws = 10000
posterior_theta_draws = rbeta(ndraws, posterior_alpha, posterior_beta)  # draw from the Beta posterior again
phi = log(posterior_theta_draws / (1 - posterior_theta_draws))          # log-odds
hist(phi, freq = FALSE, breaks = 50)
lines(density(phi))
Question 2
a)
#------------------------------2a----------------------------
# priors: mu known, sigma^2 unknown with prior p(sigma^2) ∝ 1/sigma^2
mu = 3.5
# data
y = c(33, 24, 48, 32, 55, 74, 23, 76, 17)
n = length(y)
numdraws = 1E5
tau_sq = sum((log(y) - mu)^2) / n
# posterior is Inv-chi2(n, tau_sq): if X ~ chi2(n), then n * tau_sq / X ~ Inv-chi2(n, tau_sq)
X_draws = rchisq(numdraws, n)   # df corrected from n-1 to n to match the stated posterior
sigma_sq_draws = n * tau_sq / X_draws
hist(sigma_sq_draws, breaks = 100, freq = FALSE)
#------------------------------2b-------------------------
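The 2b code itself is missing from the extraction. A minimal sketch, using the sigma_sq_draws vector from 2a and the relation G = 2Φ(σ/√2) − 1 stated in the assignment:
gini = 2 * pnorm(sqrt(sigma_sq_draws / 2)) - 1   # Phi is the standard normal CDF, i.e. pnorm()
hist(gini, breaks = 100, freq = FALSE)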
c) + d)
The intervals are quite different. The upper tail of the posterior is much longer than the lower one,
which pushes the upper endpoint of the 95% equal tail interval significantly higher than that of the HPDI.
#----------------------------2d--------------------------
# 95% equal tail credible interval
sorted = sort(gini)
etci = quantile(sorted, c(0.025, 0.975))
# finding the density of the gini distribution
dense = density(sorted)
plot(dense, main = "Posterior of the Gini coefficient")
abline(v = etci, col = "green")
# storing coordinate and density values in a dataframe
df = data.frame(x = dense$x, y = dense$y)
# sorting the dataframe based on density values
df = df[order(df$y), ]
# cumulative sum of the sorted density values
df$cumsum = cumsum(df$y)
# keep all points at which the cumulative sum is > 5% of the total density
# (the 95% highest posterior density interval)
TOTAL = sum(df$y)
df_2 = df[df$cumsum > TOTAL * 0.05, c("x", "y")]
points(df_2, col = "red")
legend("topright", inset = .01,
       c("95% Equal Tail Credible Interval", "95% Highest Posterior Density Interval"),
       fill = c("green", "red"))
Question 3
(a) & (b)
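The derivation asked for in (a) is not written out in the report. From the von Mises likelihood and the Exponential(λ = 1) prior,
p(κ|y, µ) ∝ ∏ᵢ₌₁ⁿ [exp(κ · cos(yᵢ − µ)) / (2πI₀(κ))] · e^(−κ) ∝ f(κ) = exp(κ(∑ᵢ₌₁ⁿ cos(yᵢ − µ) − 1)) / I₀(κ)ⁿ,
which is the unnormalized posterior that the code below evaluates on a grid.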
Code:
#-----------------------------3a&b---------------------------
y = c(1.83, 2.02, 2.33, -2.79, 2.07, 2.02, -2.44, 2.14, 2.54, 2.23)
mu = 2.51
lambda = 1
# unnormalized posterior: von Mises likelihood times Exponential(1) prior
# (this function was referenced but not defined in the extracted code)
p_cond <- function(y, k) {
  prod(exp(k * cos(y - mu)) / (2 * pi * besselI(k, 0))) * lambda * exp(-lambda * k)
}
probs = c()
part_length = 0.01
tested_k_values = seq(0, 10, part_length)
max_prob = 0
k_max_prob = 0
for (k in tested_k_values) {
  p = p_cond(y, k)
  probs = c(probs, p)
  if (p > max_prob) {
    max_prob = p
    k_max_prob = k
  }
}
# normalize so the posterior integrates to one over the grid, then plot;
# k_max_prob is the (approximate) posterior mode asked for in (b)
posterior = probs / (sum(probs) * part_length)
plot(tested_k_values, posterior, type = "l", xlab = "kappa", ylab = "posterior density")
Bayesian Learning
Computer Lab 2
You are recommended to use R for solving the labs.
You work and submit your labs in pairs, but both of you should contribute equally
and understand all parts of your solutions.
It is not allowed to share exact solutions with other student pairs.
The submitted lab reports will be verified through URKUND and indications of
plagiarism will be investigated by the Disciplinary Board.
Submit your solutions via LISAM, no later than May 3 at 23:30.
Please note the following about the format of the submitted lab report:
1. The lab report should include all solutions and plots to the stated problems with
necessary comments.
2. Submit the lab report with your code attached to the solution of each sub-problem
(1a), 1b),...) in one PDF document.
3. Submit a separate file containing all code.
(a) Use the conjugate prior for the linear regression model. The prior hyper-
parameters µ0 , Ω0 , ν0 and σ02 shall be set to sensible values. Start with
µ0 = (−10, 100, −100)T , Ω0 = 0.02 · I3 , ν0 = 3 and σ02 = 2. Check if this
prior agrees with your prior opinions by simulating draws from the joint prior
of all parameters and for every draw compute the regression curve. This gives
a collection of regression curves; one for each draw from the prior. Does the
collection of curves look reasonable? If not, change the prior hyperparame-
ters until the collection of prior regression curves agrees with your prior beliefs
about the regression curve.
[Hint: R package mvtnorm can be used and your Inv -χ2 simulator of random
draws from Lab 1.]
(b) Write a function that simulates draws from the joint posterior distribu-
tion of β0, β1, β2 and σ².
i. Plot a histogram for each marginal posterior of the parameters.
ii. Make a scatter plot of the temperature data and overlay a curve for the
posterior median of the regression function f (time) = E [temp|time] =
β0 + β1 · time + β2 · time2 , i.e. the median of f (time) is computed for
every value of time. In addition, overlay curves for the 95% equal tail
posterior probability intervals of f (time), i.e. the 2.5 and 97.5 posterior
percentiles of f(time) are computed for every value of time. Do the
posterior probability intervals contain most of the data points? Should
they?
(c) It is of interest to locate the time with the highest expected temperature (i.e.
the time where f (time) is maximal). Let's call this value x̃. Use the simulated
draws in (b) to simulate from the posterior distribution of x̃. [Hint: the
regression curve is a quadratic polynomial. Given each posterior draw of β0 , β1
and β2 , you can find a simple formula for x̃.]
(d) Say now that you want to estimate a polynomial regression of order 8,
but you suspect that higher order terms may not be needed, and you worry
about overfitting the data. Suggest a suitable prior that mitigates this potential
problem. You do not need to compute the posterior. Just write down your
prior. [Hint: the task is to specify µ0 and Ω0 in a suitable way.]
where y equals 1 if the woman works and 0 if she does not. x is a 7-dimensional
vector containing the seven features (including a 1 to model the intercept).
The goal is to approximate the posterior distribution of the parameter vector
β with a multivariate normal distribution
β|y, x ∼ N(β̃, Jy⁻¹(β̃)),
where β̃ is the posterior mode and J(β̃) = −∂² ln p(β|y)/∂β∂βᵀ |β=β̃ is the negative of
the observed Hessian evaluated at the posterior mode. Note that ∂² ln p(β|y)/∂β∂βᵀ
is a matrix with the second derivatives on the diagonal and the cross-derivatives
∂² ln p(β|y)/∂βᵢ∂βⱼ on the off-diagonal. You can compute this derivative by hand, but
we will let the computer do it numerically for you. Calculate both β̃ and J(β̃)
by using the optim function in R. [Hint: You may use code snippets from my
demo of logistic regression in Lecture 6.] Use the prior β ∼ N (0, τ 2 I), where
τ = 5.
Present the numerical values of β̃ and Jy⁻¹(β̃) for the WomenAtWork data. Com-
pute an approximate 95% equal tail posterior probability interval for the regres-
sion coefficient of the variable NSmallChild. Would you say that this feature
is of importance for the probability that a woman works?
[Hint: You can verify that your estimation results are reasonable by comparing
the posterior means to the maximum likelihood estimates, given by: glmModel
<- glm(Work ~ 0 + ., data = WomenAtWork, family = binomial).]
(b) Use your normal approximation to the posterior from (a). Write a function
that simulates draws from the posterior predictive distribution of Pr(y = 1|x),
where the values of x correspond to a 43-year-old woman, with two children (7
and 10 years old), 12 years of education, 8 years of experience, and a husband
with an income of 20. Plot the posterior predictive distribution of Pr(y = 1|x)
for this woman.
[Hints: The R package mvtnorm will be useful. Remember that Pr(y = 1|x)
can be calculated for each posterior draw of β .]
(c) Now, consider 11 women who all have the same features as the woman in
(b). Rewrite your function and plot the posterior predictive distribution for
the number of women, out of these 11, that are working. [Hint: Simulate
from the binomial distribution, which is the distribution for a sum of Bernoulli
random variables.]
Good luck!
Best, Bertil
Report, Lab 2, Bayesian Learning
Isabe723, mansj125
Question 1
1.a
The collection of curves is reasonable. The darker areas indicate that most of the draws resulted in a
somewhat satisfying model. Ω_0 was increased from 0.02 · I_3 to 0.1 · I_3 in order to achieve a
tighter prior distribution.
CODE:
library(mvtnorm)
#--------------------------------2.1a----------------------
dat = read.table("TempLambohov.txt",header = T )
summary(dat)
plot(dat)
y = dat[,2]
X = data.frame(dat[,1])
X$time_sq = dat[,1]^2
X$ones = 1          # intercept column (restored: the reordering below assumes three columns)
X = X[,c(3,1,2)]    # reorder to (intercept, time, time^2)
head(X)
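The prior simulation itself is missing from the extracted code. A minimal sketch, assuming the conjugate prior σ² ∼ Inv-χ²(ν0, σ0²) and β|σ² ∼ N(µ0, σ²Ω0⁻¹), with the hyperparameters from the assignment and Ω0 = 0.1·I3 as discussed above:
library(mvtnorm)
mu_0 = c(-10, 100, -100)
omega_0 = 0.1 * diag(3)
nu_0 = 3
sigma_sq_0 = 2
X_mat = as.matrix(X)
plot(dat, main = "Regression curves drawn from the prior")
for (i in 1:100) {
  sigma_sq = nu_0 * sigma_sq_0 / rchisq(1, nu_0)                     # Inv-chi2 draw
  beta = rmvnorm(1, mean = mu_0, sigma = sigma_sq * solve(omega_0))  # beta | sigma^2
  lines(dat[, 1], X_mat %*% t(beta), col = rgb(0, 0, 1, 0.2))
}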
1.c
The purple bars indicate the highest expected temperature for each set of parameters sampled from
the posterior distribution. They are clustered in a small area just after midsummer, which makes sense.
CODE:
#-----------------------------------2.1b.i---------------------------------
# Compute the posterior hyperparameters (mu_n, omega_n, nu_n, nu_n * sigma_n^2),
# draw sigma^2 from its marginal posterior, then draw betas given each sigma^2,
# and show the marginal posteriors of the betas.
posterior <- function(numdraws, X, y, omega_0, mu_0, nu_0, sigma_sq_0) {
  n = length(y)
  XtX = t(X) %*% X
  beta_hat = solve(XtX) %*% t(X) %*% y   # (X'X)^-1 X'y
  # omega_n
  omega_n = XtX + omega_0
  # mu_n (this assignment was lost in extraction; standard conjugate formula)
  mu_n = solve(omega_n) %*% (XtX %*% beta_hat + omega_0 %*% mu_0)
  # nu_n
  nu_n = nu_0 + n
  # nu_n * sigma_n^2
  vs_n = nu_0 * sigma_sq_0 + as.numeric(t(y) %*% y + t(mu_0) %*% omega_0 %*% mu_0 -
                                        t(mu_n) %*% omega_n %*% mu_n)
  # draw sigmas from Inv-chi2(nu_n, sigma_n^2)
  X_draws = rchisq(numdraws, nu_n)
  sigma_sq = vs_n / X_draws
  # draw betas | sigma^2 ~ N(mu_n, sigma^2 * omega_n^-1) (restored)
  betas = matrix(nrow = numdraws, ncol = ncol(X))
  for (i in 1:numdraws) {
    betas[i, ] = rmvnorm(1, mean = as.vector(mu_n), sigma = sigma_sq[i] * solve(omega_n))
  }
  parameters = cbind(betas, sigma_sq)
  return(parameters)
}
parameters = posterior(10000, as.matrix(X), y, omega_0, mu_0, nu_0, sigma_sq_0)  # call restored
print(head(parameters))
for (i in 1:4) hist(parameters[, i], breaks = 50)  # marginal posteriors of beta0, beta1, beta2, sigma^2
#------------------------------------2.1b.ii--------------------------
plot(dat)
betas = parameters[, 1:3]
X_sample = as.matrix(X)        # design matrix over the observed time points (restored)
n = dim(X_sample)[1]
preds = betas %*% t(X_sample)  # one regression curve per posterior draw
median_preds = numeric(n)
quants = matrix(nrow = 2, ncol = n)
for (i in 1:n) {
  median_preds[i] = median(preds[, i])
  quants[, i] = quantile(preds[, i], c(0.025, 0.975))
}
lines(dat[, 1], median_preds, col = "red")   # posterior median curve
lines(dat[, 1], quants[1, ], col = "green")  # 95% equal tail band
lines(dat[, 1], quants[2, ], col = "green")
The highest temperature is predicted just past mid-year, which is in line with what we would expect.
Code:
#--------------------------------2.1c---------------------
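The 1.c code body is missing from the extraction. A minimal sketch, using the posterior beta draws from (b) and the fact that the quadratic β0 + β1·time + β2·time² is maximized at x̃ = −β1/(2β2):
x_tilde = -betas[, 2] / (2 * betas[, 3])
hist(x_tilde, breaks = 50, main = "Posterior of the time with highest expected temperature")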
1.d
Question 2
a)
The numerical values for the posterior modes of the betas are given below.
The 95% equal tail credible interval for the variable NSmallChild was found to be:
Yes, this feature is important for the probability that a woman works: the 95% equal tail
credible interval for the coefficient lies entirely below zero, so with high posterior probability the
coefficient is clearly negative, which in turn reduces the predicted probability that the
woman works under the logistic regression. The plot of the posterior distribution of the
NSmallChild coefficient and its credible interval is given below.
CODE:
#-----------------------------2.2a------------------------------
library(mvtnorm)
# data
dat = read.table("WomenAtWork.dat", header = T)
head(dat)
X = as.matrix(dat[, -1])
y = as.matrix(dat[, 1])
num_covs = dim(X)[2]
# priors
tau = 5
betas_prior = rep(0, num_covs)          # prior mean vector
Sigma_prior = tau^2 * diag(num_covs)
# log posterior of the logistic regression (the head of this function was lost
# in extraction and is reconstructed here)
LogPostLogistic <- function(betas, y, X, mu, Sigma) {
  linPred = X %*% betas
  logLik = sum(linPred * y - log(1 + exp(linPred)))
  logPrior = dmvnorm(betas, mu, Sigma, log = TRUE)
  return(logLik + logPrior)
}
LogPostLogistic(betas_prior, y, X, betas_prior, Sigma_prior)
# posterior mode and observed information via numerical optimization (call restored)
optres = optim(betas_prior, LogPostLogistic, gr = NULL, y, X, betas_prior, Sigma_prior,
               method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
J = -optres$hessian       # negative Hessian at the mode
beta_mode = optres$par
# approximate posterior draws; NSmallChild is column 6
beta_posterior_draws = rmvnorm(1e4, mean = beta_mode, sigma = solve(J))
small_child_draws = beta_posterior_draws[, 6]
b)
The density plot and histogram of the posterior predictive distribution of Pr(y = 1|x) for this
woman are given below.
CODE:
#---------------------------------2.2b-----------------------
posterior_draw_predictive <- function(X, beta_mode, Sigma, numdraws = 1) {
  betas = rmvnorm(numdraws, beta_mode, sigma = Sigma)
  linpred = matrix(nrow = numdraws, ncol = 1)
  for (i in 1:numdraws) {
    linpred[i] = X %*% betas[i, ]
  }
  tmp = exp(linpred)
  prob = tmp / (1 + tmp)   # logistic function
  return(prob)
}                          # closing brace restored
N = 1E4
X_test = matrix(data = c(1, 20, 12, 8, 43, 0, 2), ncol = 7, nrow = 1)
probs = posterior_draw_predictive(X_test, beta_mode, solve(J), N)   # call restored
hist(probs, freq = F, main = "Posterior draws for probability that the woman is working")
lines(density(probs))
c)
The histogram of the posterior predictive distribution for the number of women, out of these 11,
that are working is given below.
CODE:
#---------------------------2.2c-----------------------------
# function wrapper restored; the extracted code had only its body
posterior_draw_num_working <- function(X, beta_mode, Sigma, numdraws = 1) {
  betas = rmvnorm(numdraws, beta_mode, sigma = Sigma)
  num_women = matrix(nrow = numdraws, ncol = 1)
  for (i in 1:numdraws) {
    linpred = X %*% betas[i, ]
    tmp = exp(linpred)
    prob = tmp / (1 + tmp)
    num_women[i] = rbinom(1, 11, prob)   # sum of 11 Bernoulli draws
  }
  return(num_women)
}
N = 1E4
X_test = matrix(data = c(1, 20, 12, 8, 43, 0, 2), ncol = 7, nrow = 1)
num_women = posterior_draw_num_working(X_test, beta_mode, solve(J), N)  # call restored
hist(num_women, freq = F)
Bertil Wegmann
Dept of Computer and Information Science
Linköping University
Bayesian Learning
Computer Lab 3
You are recommended to use R for solving the labs.
You work and submit your labs in pairs, but both of you should contribute equally
and understand all parts of your solutions.
It is not allowed to share exact solutions with other student pairs.
The submitted lab reports will be verified through URKUND and indications of
plagiarism will be investigated by the Disciplinary Board.
Submit your solutions via LISAM, no later than May 17 at 23:30.
Please note the following about the format of the submitted lab report:
1. The lab report should include all solutions and plots to the stated problems with
necessary comments.
2. Submit the lab report with your code attached to the solution of each sub-problem
(1a), 1b),...) in one PDF document.
3. Submit a separate file containing all code.
where yi is the count for the ith observation in the sample and xi is the p-dimensional
vector with covariate observations for the ith observation. Use the data set
eBayNumberOfBidderData.dat. This dataset contains observations from 1000 eBay
auctions of coins. The response variable is nBids and records the number of bids
in each auction. The remaining variables are features/covariates (x):
Const (for the intercept)
PowerSeller (equal to 1 if the seller is selling large volumes on eBay)
Assume the prior β ∼ N(0, 100 · (XᵀX)⁻¹), a commonly used prior which is called
Zellner's g-prior. Assume first that the posterior density is approximately
multivariate normal:
β|y ∼ N(β̃, Jy⁻¹(β̃)),
where β̃ is the posterior mode and Jy (β̃) is the negative Hessian at the posterior
mode. β̃ and Jy (β̃) can be obtained by numerical optimization (optim.R)
exactly like you already did for the logistic regression in Lab 2 (but with the
log posterior function replaced by the corresponding one for the Poisson model,
which you have to code up.).
(c) Let's simulate from the actual posterior of β using the Metropolis algorithm
and compare the results with the approximate results in b). Program a general
function that uses the Metropolis algorithm to generate random draws from an
arbitrary posterior density. In order to show that it is a general function for
any model, we denote the vector of model parameters by θ. Let the proposal
density be the multivariate normal density mentioned in Lecture 8 (random
walk Metropolis):
θp | θ^(i−1) ∼ N(θ^(i−1), c · Σ),
where Σ = Jy−1 (β̃) was obtained in b). The value c is a tuning parameter and
should be an input to your Metropolis function. The user of your Metropo-
lis function should be able to supply her own posterior density function, not
necessarily for the Poisson regression, and still be able to use your Metropolis
function. This is not so straightforward, unless you have come across function
objects in R. The note HowToCodeRWM.pdf in Lisam describes how you
can do this in R.
Now, use your new Metropolis function to sample from the posterior of β
in the Poisson regression for the eBay dataset. Assess MCMC convergence by
graphical methods.
(d) Use the MCMC draws from c) to simulate from the predictive distribution of
the number of bidders in a new auction with the characteristics below. Plot
the predictive distribution. What is the probability of no bidders in this new
auction?
PowerSeller =1
VerifyID = 0
Sealed = 1
MinBlem = 0
MajBlem = 1
LargNeg = 0
LogBook = 1.2
MinBidShare = 0.8
(a) Write a function in R that simulates data from the AR(1)-process
xt = µ + φ(xt−1 − µ) + εt,  εt ∼ iid N(0, σ²),
for given values of µ, φ and σ². Start the process at x1 = µ and then simulate
values for xt for t = 2, 3 . . . , T and return the vector x1:T containing all time
points. Use µ = 13, σ² = 3 and T = 300 and look at some different realizations
(simulations) of x1:T for values of φ between −1 and 1 (this is the interval
of φ where the AR(1)-process is stationary). Include a plot of at least one
realization in the report. What effect does the value of φ have on x1:T ?
(b) Use your function from a) to simulate two AR(1)-processes, x1:T with φ = 0.2
and y1:T with φ = 0.95. Now, treat your simulated vectors as synthetic data,
and treat the values of µ, φ and σ 2 as unknown parameters. Implement Stan-
code that samples from the posterior of the three parameters, using suitable
non-informative priors of your choice. [Hint: Look at the time-series models
examples in the Stan user's guide/reference manual, and note the different
parameterization used here.]
i. Report the posterior mean, 95% credible intervals and the number of ef-
fective posterior samples for the three inferred parameters for each of the
simulated AR(1)-processes. Are you able to estimate the true values?
ii. For each of the two data sets, evaluate the convergence of the samplers
and plot the joint posterior of µ and φ. Comments?
Good luck!
Best, Bertil
Lab 3, Bayesian Learning
mansj125, Isabe723
Question 1
(a)
The trajectories of the sampled Markov chains for µ and σ² are given below.
The generated samples do not converge to a single point, but move around a value
with some variance, as expected for MCMC draws from a posterior distribution.
CODE:
Note: The code includes Gibbs sampling as well as direct sampling which was done for verification
purposes
library(scales)
#---------------------------------3.1--------------------
data = readRDS("Precipitation.rds")
y = log(data)
n = length(y)
y_mean = mean(y)
#------------------gibbs sampling------------------
N_Draws = n
# initializing hyperparameters
mu_0 = mean(y)*1.2
t_sq_0 = var(y)*.8
sigma_sq_0 = .5
v0 = 100*.1
v_n = v0 + n
# initial setting
sigmasq = sigma_sq_0
Gibbs_posterior_mu = c()
Gibbs_posterior_sigma = c()
for (i in 1:N_Draws) {
  # full conditional for mu given sigma^2 (these updates were lost in extraction;
  # standard normal-model formulas)
  t_sq_n = 1 / (n / sigmasq + 1 / t_sq_0)
  w = (n / sigmasq) / (n / sigmasq + 1 / t_sq_0)
  mu_n = w * y_mean + (1 - w) * mu_0
  mu = rnorm(1, mu_n, sqrt(t_sq_n))   # rnorm takes the sd, not the variance
  Gibbs_posterior_mu = c(Gibbs_posterior_mu, mu)
  # full conditional for sigma^2 given mu: scaled inverse chi-square
  X = rchisq(1, v_n)
  thing = (v0 * sigma_sq_0 + sum((mu - y)^2)) / (n + v0)
  sigmasq = v_n * thing / X
  Gibbs_posterior_sigma = c(Gibbs_posterior_sigma, sigmasq)
}
#autocorrelations of sampled params
acf_mu = acf(Gibbs_posterior_mu)
acf_sigma = acf(Gibbs_posterior_sigma)
#inefficiency factors
if_mu = 1+2*sum(acf_mu$acf[-1])
if_sigma = 1+2*sum(acf_sigma$acf[-1])
hist(Gibbs_posterior_mu)
hist(Gibbs_posterior_sigma)
plot(Gibbs_posterior_mu,type="l",main = "trajectory of mu")
plot(Gibbs_posterior_sigma,type="l", main = "trajectory of sigma")
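The prior and posterior parameter definitions used by the direct sampler below were lost in extraction. A sketch under the normal-inverse-χ² conjugate model; the value k0 = 1 is an assumption, the rest follow the standard update formulas:
nDraws = N_Draws
k0 = 1        # prior 'sample size' for mu (assumed)
kn = k0 + n
vn = v0 + n
mu_n = (k0 * mu_0 + n * y_mean) / kn
sigmasq_n = (v0 * sigma_sq_0 + sum((y - y_mean)^2) +
             (n * k0 / kn) * (y_mean - mu_0)^2) / vn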
X = rchisq(nDraws, v0)
prior_direct_sigmasq = v0 * sigma_sq_0 / X
prior_direct_mu = c()
for (i in 1:nDraws) {
  prior_direct_mu = c(prior_direct_mu, rnorm(1, mu_0, sqrt(prior_direct_sigmasq[i] / k0)))
}
hist(prior_direct_mu)   # was hist(mu), which plotted the last Gibbs draw instead
hist(prior_direct_sigmasq)
X = rchisq(nDraws, vn)
direct_sigmasq = vn * sigmasq_n / X
direct_mu = c()
for (i in 1:nDraws) {
  direct_mu = c(direct_mu, rnorm(1, mu_n, sqrt(direct_sigmasq[i] / kn)))
}
hist(direct_mu)
hist(direct_sigmasq)
plot(direct_mu, type = "l")
plot(direct_sigmasq, type = "l")
b)
The posterior predictive draws closely resemble the distribution of the given dataset,
though with some outliers.
CODE:
#-----------------------------------------------3.1.b--------------------------
draws_posterior = c()
for (i in 1:n) {
  draws_posterior = c(draws_posterior,
                      rnorm(1, Gibbs_posterior_mu[i], sqrt(Gibbs_posterior_sigma[i])))  # sd, not variance
}
hist(data, freq = F, col = "red")
hist(exp(draws_posterior), add = T, freq = F, breaks = 100, xlim = range(0, 80), col = "blue")
legend("topright", legend = c("historical data", "posterior draws"), col = c("red", "blue"), lty = 1)
Question 2
A. Significant covariates
I. Intercept
II. VerifyID
III. Sealed
IV. LogBook
V. MinBidShare
CODE:
#----------------------------------------------3.2.a--------------------------
library(mvtnorm)
data <- read.table("eBayNumberOfBidderData.dat", sep = "", header = T, na.strings = "", stringsAsFactors = F)
data = as.data.frame(data)
col_names = colnames(data)
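The glm fit that the significance statements above and the later references to model$coefficients rely on is missing from the extracted code. A minimal sketch (the Const column serves as the intercept, hence the ~ 0 + . formula):
model = glm(nBids ~ 0 + ., family = poisson, data = data)
summary(model)   # the significant covariates listed above are read off from here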
B. See code
CODE:
#----------------------------------3.2.b--------------------------------
Y = data$nBids
X = as.matrix(data[, -1])
n = dim(data)[1]
num_covs = dim(X)[2]
# Zellner's g-prior
mu = rep(0, num_covs)
Sigma = 100 * solve(t(X) %*% X)
beta_prior = rmvnorm(1, mean = mu, sigma = Sigma)
dim(beta_prior)
# log posterior of the Poisson regression (this function was lost in extraction;
# reconstructed: Poisson log-likelihood up to a constant plus the normal log prior)
LogPostPoisson <- function(betas, Y, X, mu, Sigma) {
  linPred = X %*% betas
  logLik = sum(Y * linPred - exp(linPred))
  logPrior = dmvnorm(betas, mu, Sigma, log = TRUE)
  return(logLik + logPrior)
}
# numerical optimization (call restored)
OptimRes = optim(rep(0, num_covs), LogPostPoisson, gr = NULL, Y, X, mu, Sigma,
                 method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
# Getting the hessian matrix from the results returned from the optim function
beta_mode_hessian = OptimRes$hessian
# Getting the mode beta values as the parameters returned from the optim function
beta_mode = matrix(OptimRes$par, nrow = 1)
colnames(beta_mode) = col_names[-1]
# Getting J(beta_mode) as the negative of the hessian returned
J = -beta_mode_hessian
# Calculating the inverse of J
J_inverse = solve(J)
C. The beta values do not converge to a point, nor do they diverge; they oscillate around the values
suggested by our glm model from question 1. The features that are significant (according to glm)
seem to stay closer to the red line than the insignificant ones, suggesting they influence model
performance more.
Note that the oscillations are bigger than the bounds of the window for some of the features. All
panels are scaled to the range 0 to 2·(glm estimate) to illustrate the difference in the relative
magnitude of the oscillations.
CODE:
#-----------------------------------3.2c---------------------------------
num_random_walk = 1E5
# general random walk Metropolis sampler; the loop body was lost in extraction and is reconstructed here
MetropolisRandomWalk <- function(beta_initial, c, J_inverse, log_density_function, mu, Sigma, Y, X) {
  metropolis_betas = matrix(nrow = num_random_walk, ncol = length(beta_initial))
  beta_current = beta_initial
  log_p_current = log_density_function(beta_current, Y, X, mu, Sigma)
  for (i in 1:num_random_walk) {
    beta_proposal = as.vector(rmvnorm(1, beta_current, c * J_inverse))  # random walk proposal
    log_p_proposal = log_density_function(beta_proposal, Y, X, mu, Sigma)
    if (log(runif(1)) < log_p_proposal - log_p_current) {  # accept w.p. min(1, ratio)
      beta_current = beta_proposal
      log_p_current = log_p_proposal
    }
    metropolis_betas[i, ] = beta_current
  }
  return(metropolis_betas)
}
# c = 0.6 is an assumed tuning value; the report does not state the one actually used
metropolis_posterior_betas = MetropolisRandomWalk(rep(0, num_covs), 0.6, J_inverse, LogPostPoisson, mu, Sigma, Y, X)
# traceplots against the glm estimates (red) to assess convergence
par(mfrow = c(3, 3))
for (i in 1:dim(X)[2]) {
  plot(metropolis_posterior_betas[, i], type = "l", ylab = col_names[1 + i])  # ylim = c(0, 2*model$coefficients[i])
  abline(h = model$coefficients[i], col = "red")
}
D. Our predictive distribution is shown below. The model from question 1 predicted 0.68 for the
same test case; once again the two approaches agree.
We estimate the probability of no bidders to be between 50 and 51 percent.
CODE:
#-------------------------------------------3.2.d------------------------------------
par(mfrow=c(1,1))
sample = matrix(c(1, 1, 0, 1, 0, 1,0,1.2,0.8))
intensities = exp(metropolis_posterior_betas %*% sample)
poisson_draws = matrix(nrow=num_random_walk,ncol=1)
for (i in 1:num_random_walk){
poisson_draws[i,1] = rpois(1,intensities[i])
}
#print(exp(model$coefficients%*%sample))
hist(poisson_draws, main = "Predictive distribution", freq = F,
     sub = paste("glm predicts:", exp(model$coefficients %*% sample)))
prob_no_bids = mean(poisson_draws==0)
prob_no_bids
Question 3
Using phi < 0 makes the series jump back and forth around the mean in a very erratic way. The
autocorrelation changes sign with each lag step and approaches zero as the number of lag steps
increases.
Using phi = 0 gives a trajectory where the only things affecting the values are the mean and the
error/random term. There is no significant autocorrelation.
Using phi > 0 gives a smoother series of values, where each value tends to be close to the
previous one. The autocorrelation is positive but decreases exponentially with the number of lag
steps.
B-I
i) The 95% credible intervals for each of the parameters obtained from the
model are given below,
ii) First trajectory, true values [.2, 15, 4].
The interval for mu is huge in the second run; curiously enough, the mean is quite close (2.147).
The other parameters converged nicely. N_eff for mu is only 64. This might be solved by
using more data, running the chains for longer, and/or using more chains. The variance of the
noise term is a lot higher here, which I suspect makes the problem more difficult for the chain.
B-II)
Like in question 2.c the values do not converge but rather oscillate around some value and stay
there. It seems that no benefit would be gained from performing more iterations. Both chains have
Rhat values close to 1, indicating convergence. However, the second sampler had far fewer
effective samples.
CODE:
t_max = 300
# ------------------ 3.3.a-----------------------------------------------------
mu = 13       # values from the assignment (these definitions were missing from the extracted code)
sigma_sq = 3
par(mfrow = c(3, 2))
for (fi in seq(-.8, .8, .8)) {
  path = matrix(nrow = t_max)
  path[1] = mu
  for (t in 2:t_max) {
    path[t] = mu + fi * (path[t - 1] - mu) + rnorm(1, 0, sqrt(sigma_sq))  # rnorm takes the sd, not the variance
  }
  plot(path, main = fi, type = "l", col = "blue")  # type = "line" is not a valid plot type
  acf(path, main = paste("ACF ", fi))
}
fi_1 = .2
mu_1 = 15
sigma_sq_1 = 4
fi_2 = .95
mu_2 = 2
sigma_sq_2 = 8
par(mfrow=c(1,1))
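The loop that actually generates the two synthetic series written out below is missing from the extraction. A sketch using the AR(1) recursion from 3.a (again, rnorm takes the standard deviation):
path = matrix(nrow = t_max, ncol = 2)
path[1, ] = c(mu_1, mu_2)
for (t in 2:t_max) {
  path[t, 1] = mu_1 + fi_1 * (path[t - 1, 1] - mu_1) + rnorm(1, 0, sqrt(sigma_sq_1))
  path[t, 2] = mu_2 + fi_2 * (path[t - 1, 2] - mu_2) + rnorm(1, 0, sqrt(sigma_sq_2))
}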
write.table(path[, 1], file = "path1.txt", row.names = F, col.names = F)  # write.table has no 'header' argument
write.table(path[, 2], file = "path2.txt", row.names = F, col.names = F)
paths = path
#---------------------------------------------3.3.b---------------------
library(rstan)
StanModel = '
data {
  vector[300] M; // synthetic data
}
parameters {
  real mu;
  real<lower=0> sigma2;
  real phi;
}
model {
  // flat priors over wide ranges (comments corrected to match the code)
  mu ~ uniform(-100,100);
  sigma2 ~ uniform(0,100);
  phi ~ uniform(-10,10);
  for(i in 2:300){
    // AR(1) likelihood; the model block was truncated in extraction and is
    // completed here from the process definition (normal() takes the sd)
    M[i] ~ normal(mu + phi * (M[i-1] - mu), sqrt(sigma2));
  }
}
'