Mathematical Biostatistics Boot Camp: Lecture 13, Binomial Proportions
Brian Caffo
Table of contents
1 Table of contents
2 Intervals for binomial proportions
3 Agresti-Coull interval
4 Bayesian analysis
    Prior specification
    Posterior
    Credible intervals
5 Summary
Intervals for binomial parameters
• When X ∼ Binomial(n, p) we know that
  a. p̂ = X/n is the MLE for p
  b. E[p̂] = p
  c. Var(p̂) = p(1 − p)/n
  d. $\dfrac{\hat p - p}{\sqrt{\hat p(1 - \hat p)/n}}$ approximately follows a standard normal distribution for large n
• The latter fact leads to the Wald interval for p
  $$\hat p \pm Z_{1-\alpha/2}\sqrt{\hat p(1 - \hat p)/n}$$
Some discussion
• The Wald interval performs terribly
• Coverage probability varies wildly, sometimes being quite low for certain values of n even when p is not near the boundaries
• For example, when p = .5 and n = 40 the actual coverage of a 95% interval is only 92%
• When p is small or large, coverage can be quite poor even for extremely large values of n
• For example, when p = .005 and n = 1,876 the actual coverage rate of a 95% interval is only 90%
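The coverage figures above can be checked exactly rather than by simulation: the coverage is the total binomial probability of the values of X whose Wald interval contains the true p. A minimal base-R sketch; `wald_coverage` is a hypothetical helper name, not a function from any package.

```r
# Exact coverage probability of the nominal (1 - alpha) Wald interval.
# wald_coverage is a hypothetical helper written for illustration.
wald_coverage <- function(p, n, alpha = 0.05) {
  x <- 0:n                                  # every possible number of successes
  phat <- x / n
  z <- qnorm(1 - alpha / 2)
  half_width <- z * sqrt(phat * (1 - phat) / n)
  covers <- (phat - half_width <= p) & (p <= phat + half_width)
  sum(dbinom(x, n, p)[covers])              # P(interval contains the true p)
}

wald_coverage(0.5, 40)      # about 0.92
wald_coverage(0.005, 1876)  # about 0.90
```

Enumerating all n + 1 outcomes gives the coverage exactly, so no simulation error enters the comparison with the nominal 95%.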
Simple fix
• A simple fix for the problem is to add two successes and two failures
• That is, let p̃ = (X + 2)/(n + 4) and ñ = n + 4
• The (Agresti-Coull) interval is
  $$\tilde p \pm Z_{1-\alpha/2}\sqrt{\tilde p(1 - \tilde p)/\tilde n}$$
• Motivation: when p is large or small, the distribution of p̂ is skewed and it does not make sense to center the interval at the MLE; adding the pseudo observations pulls the center of the interval toward .5
• Later we will show that this interval is the inversion of a hypothesis testing technique
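As a rough check that the pseudo observations help, the same exact-enumeration idea can be applied to the Agresti-Coull interval at p = .5 and n = 40, the case where the Wald interval reaches only about 92%. A minimal base-R sketch; `ac_interval` and `ac_coverage` are hypothetical names.

```r
# Agresti-Coull interval: add two successes and two failures, then use the
# Wald form centered at p-tilde with n-tilde = n + 4. Names are hypothetical.
ac_interval <- function(x, n, alpha = 0.05) {
  ntilde <- n + 4
  ptilde <- (x + 2) / ntilde
  z <- qnorm(1 - alpha / 2)
  ptilde + c(-1, 1) * z * sqrt(ptilde * (1 - ptilde) / ntilde)
}

# Exact coverage: sum the binomial probabilities of the x values whose
# Agresti-Coull interval contains the true p.
ac_coverage <- function(p, n, alpha = 0.05) {
  x <- 0:n
  covers <- vapply(x, function(xi) {
    ci <- ac_interval(xi, n, alpha)
    ci[1] <= p && p <= ci[2]
  }, logical(1))
  sum(dbinom(x, n, p)[covers])
}

ac_coverage(0.5, 40)   # about 0.96
```

Here the interval is slightly conservative rather than anti-conservative; a single (p, n) pair does not characterize the whole coverage curve.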
Example
• Suppose that 13 of 20 sampled subjects have hypertension. Estimate the prevalence of hypertension in this population.
• p̂ = .65, n = 20
• p̃ = .63, ñ = 24
[Figure: likelihood for p given these data; vertical axis: likelihood (0 to 1), horizontal axis: p]
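For this example both intervals follow directly from the formulas for the Wald and Agresti-Coull intervals. A minimal base-R sketch, assuming x = 13 successes out of n = 20 (which gives p̂ = .65).

```r
x <- 13   # subjects with hypertension (p-hat = .65 with n = 20)
n <- 20
z <- qnorm(0.975)   # two-sided 95% interval

# Wald interval, centered at the MLE
phat <- x / n
wald <- phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)

# Agresti-Coull interval, centered at p-tilde = (x + 2) / (n + 4)
ntilde <- n + 4
ptilde <- (x + 2) / ntilde
ac <- ptilde + c(-1, 1) * z * sqrt(ptilde * (1 - ptilde) / ntilde)

round(rbind(wald = wald, agresti_coull = ac), 2)
# wald:          0.44 0.86
# agresti_coull: 0.43 0.82
```

The Agresti-Coull interval is shifted toward .5 relative to the Wald interval, since p̃ = .625 is smaller than p̂ = .65.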
Bayesian analysis
• Bayesian statistics posits a prior on the parameter of interest
• All inferences are then performed on the distribution of the parameter given the data, called the posterior
• In general,
  Posterior ∝ Likelihood × Prior
• Therefore (as we saw in diagnostic testing) the likelihood is the factor by which our prior beliefs are updated to produce conclusions in the light of the data
Beta priors
• The beta distribution is the default prior for parameters between 0 and 1.
• The beta density depends on two parameters α and β
  $$\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1}(1 - p)^{\beta - 1} \quad \text{for } 0 \le p \le 1$$
[Figure: beta densities for several values of α and β; vertical axis: density, horizontal axis: p; recovered panel titles include alpha = 2 beta = 0.5, alpha = 2 beta = 1, alpha = 2 beta = 2]
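The beta density formula can be coded directly and compared against R's built-in `dbeta`. A minimal sketch; `beta_density` is a hypothetical name, it works on the log scale with `lgamma` so that Γ does not overflow for large α and β, and it is meant for 0 < p < 1 (the log form gives NaN at an endpoint when a shape parameter equals 1). The mean α/(α + β) checked at the end is a standard beta fact, not taken from the slide.

```r
# Beta density written out from the formula
#   Gamma(a + b) / (Gamma(a) Gamma(b)) * p^(a - 1) * (1 - p)^(b - 1)
# computed on the log scale; intended for 0 < p < 1.
beta_density <- function(p, alpha, beta) {
  log_const <- lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
  exp(log_const + (alpha - 1) * log(p) + (beta - 1) * log(1 - p))
}

p <- c(0.1, 0.5, 0.9)
beta_density(p, 2, 2)   # 0.54 1.50 0.54
dbeta(p, 2, 2)          # built-in density, same values

# Numerical checks: the density integrates to 1 and has mean alpha / (alpha + beta)
integrate(function(q) beta_density(q, 2, 10), 0, 1)$value       # about 1
integrate(function(q) q * beta_density(q, 2, 10), 0, 1)$value   # about 2/12
```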
Posterior
• Suppose that we choose values of α and β so that the beta prior is indicative of our degree of belief regarding p in the absence of data
• Then using the rule that
  Posterior ∝ Likelihood × Prior
  and throwing out anything that doesn't depend on p, we have that
  $$\text{Posterior} \propto p^{x}(1 - p)^{n - x} \times p^{\alpha - 1}(1 - p)^{\beta - 1} = p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1}$$
• This is the kernel of a beta density, so the posterior is a beta distribution with parameters x + α and n − x + β
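The conjugacy argument can be checked numerically: normalizing likelihood × prior by numerical integration should reproduce the beta density with parameters x + α and n − x + β. A minimal sketch; the data (x = 13, n = 20) and the prior (α = β = 2) are illustrative choices.

```r
x <- 13; n <- 20          # illustrative data
a <- 2;  b <- 2           # illustrative beta prior parameters

# Unnormalized posterior: binomial likelihood times beta prior
unnorm_post <- function(p) dbinom(x, n, p) * dbeta(p, a, b)

# Normalizing constant by numerical integration over (0, 1)
const <- integrate(unnorm_post, 0, 1)$value

grid <- seq(0.05, 0.95, by = 0.05)
numeric_post   <- unnorm_post(grid) / const
conjugate_post <- dbeta(grid, x + a, n - x + b)

max(abs(numeric_post / conjugate_post - 1))   # essentially zero
```

Every factor not involving p (the binomial coefficient and the beta normalizing constants) is absorbed into `const`, which is exactly the "throwing out" step in the derivation.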
• The posterior mean is a mixture of the MLE (p̂) and the prior mean:
  $$E[p \mid x] = \frac{x + \alpha}{n + \alpha + \beta} = \pi\,\hat p + (1 - \pi)\frac{\alpha}{\alpha + \beta}, \qquad \pi = \frac{n}{n + \alpha + \beta}$$
• π goes to 1 as n gets large; for large n the data swamps the prior
• For small n, the prior mean dominates
• Generalizes how science should ideally work; as data becomes increasingly available, prior beliefs should matter less and less
• With a prior that is degenerate at a value, no amount of data can overcome the prior
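The mixture form of the posterior mean is easy to verify numerically, and a small table shows the weight π on the MLE approaching 1 as n grows with p̂ held fixed. A minimal sketch; the prior (α = 2, β = 10) and the sample sizes are illustrative.

```r
a <- 2; b <- 10       # illustrative prior; prior mean a / (a + b) = 1/6
phat <- 0.65          # hold the MLE fixed while n grows
n <- c(20, 200, 2000, 20000)
x <- phat * n

post_mean <- (x + a) / (n + a + b)              # mean of Beta(x + a, n - x + b)
w <- n / (n + a + b)                            # weight pi on the MLE
mixture <- w * phat + (1 - w) * a / (a + b)

all.equal(post_mean, mixture)                   # TRUE
round(cbind(n, pi = w, post_mean), 3)
```

At n = 20 the posterior mean (about .47) sits well below p̂ = .65 because of the prior; by n = 20000 it is essentially p̂.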
Posterior variance
• The posterior variance is
  $$\operatorname{Var}(p \mid x) = \frac{(x + \alpha)(n - x + \beta)}{(n + \alpha + \beta)^2\,(n + \alpha + \beta + 1)}$$
• If α = β = 2 then the posterior mean is
  $$\tilde p = (x + 2)/(n + 4)$$
  and the posterior variance is
  $$\tilde p(1 - \tilde p)/(\tilde n + 1)$$
  where ñ = n + 4
• This is almost exactly the mean and variance we used for the Agresti-Coull interval
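With α = β = 2 the posterior Beta(x + 2, n − x + 2) has exactly the Agresti-Coull center, and its variance differs from the Agresti-Coull variance only by the factor ñ/(ñ + 1). A minimal sketch checking this for illustrative data x = 13, n = 20.

```r
x <- 13; n <- 20                  # illustrative data
a <- 2;  b <- 2
s1 <- x + a; s2 <- n - x + b      # posterior is Beta(s1, s2)

post_mean <- s1 / (s1 + s2)
post_var  <- s1 * s2 / ((s1 + s2)^2 * (s1 + s2 + 1))   # beta variance formula

ptilde <- (x + 2) / (n + 4)
ntilde <- n + 4
ac_var <- ptilde * (1 - ptilde) / ntilde                # variance used by Agresti-Coull

c(post_mean = post_mean, ptilde = ptilde)               # both 0.625
c(post_var = post_var,
  formula  = ptilde * (1 - ptilde) / (ntilde + 1),
  ac_var   = ac_var)
```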
Example
• With the uniform prior (α = β = 1), the posterior is proportional to
  $$p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1} = p^{x}(1 - p)^{n - x}$$
  that is, for the uniform prior, the posterior is the likelihood
• Consider the instance where α = β = 2 (recall this prior is humped around the point .5); the posterior is proportional to
  $$p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1} = p^{x + 1}(1 - p)^{n - x + 1}$$
[Figures: prior, likelihood and posterior for p plotted together, one panel per prior; vertical axis: prior, likelihood, posterior (0 to 1), horizontal axis: p; recovered panel titles: alpha = 1 beta = 1; alpha = 2 beta = 2; alpha = 2 beta = 10; alpha = 100 beta = 100]
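To see how priors like those in these panels pull the posterior, the posterior mean (x + α)/(n + α + β) and an equal-tail 95% interval can be tabulated for each. A minimal sketch assuming the hypertension data (x = 13, n = 20); the data behind the panels are not recoverable from the figures, so this is an illustration, not a reproduction.

```r
x <- 13; n <- 20    # assumed data (hypertension example)
priors <- data.frame(alpha = c(1, 2, 2, 100), beta = c(1, 2, 10, 100))

s1 <- x + priors$alpha          # posterior is Beta(s1, s2)
s2 <- n - x + priors$beta

summary_table <- data.frame(
  priors,
  post_mean = s1 / (s1 + s2),
  lower = qbeta(0.025, s1, s2),   # equal-tail 95% credible limits
  upper = qbeta(0.975, s1, s2)
)
round(summary_table, 3)
```

With α = β = 100 the prior carries the weight of 200 pseudo observations against 20 real ones, so the posterior mean stays near .5, in line with the statement that for small n the prior mean dominates.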
Bayesian credible intervals
• A Bayesian credible interval is the Bayesian analog of a confidence interval
• A 95% credible interval [a, b] would satisfy
  P(p ∈ [a, b] | x) = .95
• The best credible intervals chop off the posterior with a horizontal line in the same way we did for likelihoods
• These are called highest posterior density (HPD) intervals
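Any interval with posterior probability .95 satisfies the definition; the simplest, though generally not the shortest, cuts 2.5% from each tail of the posterior. A minimal sketch assuming the Jeffreys prior (α = β = 0.5) and the hypertension data x = 13, n = 20, so the posterior is Beta(13.5, 7.5).

```r
x <- 13; n <- 20
s1 <- x + 0.5; s2 <- n - x + 0.5      # Jeffreys prior: alpha = beta = 0.5

# Equal-tail 95% credible interval from the posterior quantiles
equal_tail <- qbeta(c(0.025, 0.975), s1, s2)
equal_tail

# Check the defining property P(p in [a, b] | x) = .95
diff(pbeta(equal_tail, s1, s2))       # 0.95
```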
[Figure: posterior density for p with the 95% HPD interval cut off by a horizontal line at height 0.64, from (0.44, 0.64) to (0.84, 0.64)]
R code
    library(binom)
    # 95% HPD interval for x = 13 successes in n = 20 trials
    binom.bayes(13, 20, type = "highest")
gives the HPD interval. The default credible level is 95% and the default prior is the Jeffreys prior, a beta prior with α = β = 0.5.
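Without the binom package, an HPD interval for a unimodal beta posterior can be found by searching for the shortest interval with 95% posterior probability, which is equivalent to chopping the density with a horizontal line. A minimal base-R sketch; `hpd_beta` is a hypothetical helper, and this generic shortest-interval search is not necessarily the algorithm binom uses internally.

```r
# Shortest interval [qbeta(t), qbeta(t + level)] over the lower tail mass t.
# For a unimodal posterior this is the HPD interval. hpd_beta is hypothetical.
hpd_beta <- function(shape1, shape2, level = 0.95) {
  width <- function(t) qbeta(t + level, shape1, shape2) - qbeta(t, shape1, shape2)
  t_best <- optimize(width, interval = c(0, 1 - level), tol = 1e-8)$minimum
  qbeta(c(t_best, t_best + level), shape1, shape2)
}

x <- 13; n <- 20
hpd <- hpd_beta(x + 0.5, n - x + 0.5)   # posterior under the Jeffreys prior
round(hpd, 2)                           # about 0.44 0.84

# At the HPD limits the posterior density is (nearly) equal
dbeta(hpd, x + 0.5, n - x + 0.5)
```

Minimizing the width forces the densities at the two endpoints to match, which is why this search lands on the horizontal-cut interval.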
Interpretation of confidence intervals
• For these data, the 1/8 likelihood interval is [.42, .84]
• Fuzzy interpretation:
  The interval [.42, .84] represents plausible values for p.
• Actual interpretation:
  The interval [.42, .84] represents plausible values for p in the sense that for each point in this interval, there is no other point that is more than 8 times better supported given the data.
• Yikes!
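The interval [.42, .84] can be recovered by solving L(p)/L(p̂) = 1/8 on each side of the MLE, using the binomial likelihood p^x (1 − p)^(n − x). A minimal base-R sketch assuming the hypertension data x = 13, n = 20; `uniroot` finds each endpoint.

```r
x <- 13; n <- 20
phat <- x / n

# log(L(p) / L(phat)) - log(1/8): zero exactly where the relative
# likelihood equals 1/8
rel_loglik <- function(p) {
  x * log(p / phat) + (n - x) * log((1 - p) / (1 - phat)) - log(1 / 8)
}

lower <- uniroot(rel_loglik, c(1e-6, phat))$root
upper <- uniroot(rel_loglik, c(phat, 1 - 1e-6))$root
round(c(lower, upper), 2)   # 0.42 0.84
```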
Credible intervals
• Recall the Jeffreys prior 95% credible interval was
  [.44, .84]
• Actual interpretation:
  The probability that p is between .44 and .84 is 95%.