Lecture 13

Mathematical Biostatistics Boot Camp: Lecture 13, Binomial Proportions

Brian Caffo

Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University

October 18, 2012

Table of contents

1 Table of contents
2 Intervals for binomial proportions
3 Agresti-Coull interval
4 Bayesian analysis
    Prior specification
    Posterior
    Credible intervals
5 Summary

Intervals for binomial parameters

• When X ∼ Binomial(n, p) we know that
    a. p̂ = X/n is the MLE for p
    b. E[p̂] = p
    c. Var(p̂) = p(1 − p)/n
    d. $(\hat{p} - p) / \sqrt{\hat{p}(1 - \hat{p})/n}$ is approximately standard normal for large n
• The latter fact leads to the Wald interval for p
    $$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$$

Some discussion

• The Wald interval performs terribly
• Coverage probability varies wildly, sometimes being quite low for certain values of n even when p is not near the boundaries
    • Example: when p = .5 and n = 40, the actual coverage of a 95% interval is only 92%
• When p is small or large, coverage can be quite poor even for extremely large values of n
    • Example: when p = .005 and n = 1,876, the actual coverage rate of a 95% interval is only 90%
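
The coverage figures quoted above can be checked exactly, because X takes only n + 1 values: sum the binomial probabilities of every outcome whose Wald interval contains the true p. The sketch below uses base R only; `wald_coverage` is a hypothetical helper name introduced here, not a function from any package.

```r
# Exact coverage of the nominal 95% Wald interval for X ~ Binomial(n, p).
# wald_coverage is a hypothetical helper, written for this illustration.
wald_coverage <- function(n, p, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  x <- 0:n
  phat <- x / n
  half <- z * sqrt(phat * (1 - phat) / n)
  covers <- (phat - half <= p) & (p <= phat + half)
  # Coverage = total probability of the outcomes whose interval contains p
  sum(dbinom(x[covers], n, p))
}

cov_mid  <- wald_coverage(40, 0.5)      # p in the middle, moderate n
cov_rare <- wald_coverage(1876, 0.005)  # small p, large n
print(round(c(cov_mid, cov_rare), 3))
```

Both values should come out below the nominal .95, in line with the two examples on the slide.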

Simple fix

• A simple fix for the problem is to add two successes and two failures
• That is, let p̃ = (X + 2)/(n + 4) and ñ = n + 4
• The (Agresti-Coull) interval is
    $$\tilde{p} \pm Z_{1-\alpha/2} \sqrt{\tilde{p}(1 - \tilde{p})/\tilde{n}}$$
• Motivation: when p is large or small, the distribution of p̂ is skewed and it does not make sense to center the interval at the MLE; adding the pseudo observations pulls the center of the interval toward .5
• Later we will show that this interval is, to a close approximation, the inversion of a hypothesis testing technique
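
To see what the fix buys, the same exact-coverage calculation can be run for both intervals at p = .5 and n = 40. This is a minimal base-R sketch; the function name `coverage` and its `method` argument are assumptions made for the illustration.

```r
# Exact coverage of the Wald and Agresti-Coull intervals (hypothetical helper).
coverage <- function(n, p, method = c("wald", "agresti-coull"), conf = 0.95) {
  method <- match.arg(method)
  z <- qnorm(1 - (1 - conf) / 2)
  x <- 0:n
  if (method == "agresti-coull") {
    center <- (x + 2) / (n + 4)   # add two successes and two failures
    size <- n + 4
  } else {
    center <- x / n               # the MLE
    size <- n
  }
  half <- z * sqrt(center * (1 - center) / size)
  covers <- (center - half <= p) & (p <= center + half)
  sum(dbinom(x[covers], n, p))
}

wald_cov <- coverage(40, 0.5, "wald")
ac_cov   <- coverage(40, 0.5, "agresti-coull")
print(round(c(wald = wald_cov, agresti_coull = ac_cov), 3))
```

At this setting the Agresti-Coull coverage sits slightly above the nominal level, while the Wald coverage falls short of it.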

Example

Suppose that in a random sample of an at-risk population 13 of 20 subjects had hypertension. Estimate the prevalence of hypertension in this population.

• p̂ = .65, n = 20
• p̃ = 15/24 = .625 ≈ .63, ñ = 24
• Z.975 = 1.96
• Wald interval [.44, .86]
• Agresti-Coull interval [.43, .82] (computed with the unrounded p̃ = .625; using the rounded .63 gives a lower limit of .44)
• 1/8 likelihood interval [.42, .84]
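
The three intervals for the hypertension example can be reproduced in a few lines of base R. This is a sketch rather than the code behind the slide; the variable names are chosen here, and the likelihood interval is found by root-finding on the likelihood scaled to peak at 1.

```r
x <- 13; n <- 20
z <- qnorm(0.975)

# Wald interval, centered at the MLE
phat <- x / n
wald <- phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)

# Agresti-Coull interval: add two successes and two failures
ptilde <- (x + 2) / (n + 4)
ntilde <- n + 4
ac <- ptilde + c(-1, 1) * z * sqrt(ptilde * (1 - ptilde) / ntilde)

# 1/8 likelihood interval: where the likelihood, scaled to peak at 1, equals 1/8
rel_lik <- function(p) {
  exp(dbinom(x, n, p, log = TRUE) - dbinom(x, n, phat, log = TRUE))
}
lik <- c(
  uniroot(function(p) rel_lik(p) - 1 / 8, c(1e-6, phat), tol = 1e-8)$root,
  uniroot(function(p) rel_lik(p) - 1 / 8, c(phat, 1 - 1e-6), tol = 1e-8)$root
)

print(round(rbind(wald, ac, lik), 2))
```

With the unrounded p̃ = .625 the Agresti-Coull lower limit rounds to .43; plugging in the rounded value .63 instead moves it to .44.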


[Figure: likelihood (vertical axis, 0 to 1) plotted against p (horizontal axis, 0 to 1)]

Bayesian analysis

• Bayesian statistics posits a prior on the parameter of interest
• All inferences are then performed on the distribution of the parameter given the data, called the posterior
• In general,
    Posterior ∝ Likelihood × Prior
• Therefore (as we saw in diagnostic testing) the likelihood is the factor by which our prior beliefs are updated to produce conclusions in the light of the data

Beta priors

• The beta distribution is the default prior for parameters between 0 and 1.
• The beta density depends on two parameters α and β
    $$\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1} \quad \text{for } 0 \le p \le 1$$
• The mean of the beta density is α/(α + β)
• The variance of the beta density is
    $$\frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
• The uniform density is the special case where α = β = 1
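
The mean and variance formulas can be sanity-checked by integrating the density numerically. The sketch below uses base R's `dbeta` and `integrate`; the parameter values (here called `a` and `b` for α and β) are arbitrary choices for illustration.

```r
a <- 2; b <- 10   # alpha and beta, illustrative values chosen here

mean_formula <- a / (a + b)
var_formula  <- a * b / ((a + b)^2 * (a + b + 1))

# Moments computed directly from the density
m1 <- integrate(function(p) p * dbeta(p, a, b), 0, 1)$value
m2 <- integrate(function(p) p^2 * dbeta(p, a, b), 0, 1)$value

print(c(formula = mean_formula, numeric = m1))
print(c(formula = var_formula, numeric = m2 - m1^2))

# With alpha = beta = 1 the density is flat (uniform) on [0, 1]
flat <- dbeta(c(0.1, 0.5, 0.9), 1, 1)
print(flat)
```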

[Figure: 3 × 3 grid of beta densities plotted against p. Rows: alpha = 0.5, alpha = 1, alpha = 2. Columns: beta = 0.5, beta = 1, beta = 2.]

Posterior

• Suppose that we chose values of α and β so that the beta prior is indicative of our degree of belief regarding p in the absence of data
• Then using the rule that
    Posterior ∝ Likelihood × Prior
  and throwing out anything that doesn't depend on p, we have that
    $$
    \begin{aligned}
    \text{Posterior} &\propto p^{x} (1 - p)^{n - x} \times p^{\alpha - 1} (1 - p)^{\beta - 1} \\
    &= p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1}
    \end{aligned}
    $$
• This density is just another beta density with parameters α̃ = x + α and β̃ = n − x + β
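
The conjugacy result can be verified numerically: multiply likelihood by prior on a grid, normalize, and compare with the closed-form beta density. This is a minimal base-R sketch; the data are the 13-of-20 example from earlier in the lecture, and the prior parameters are an arbitrary illustrative choice.

```r
x <- 13; n <- 20      # data from the hypertension example
a <- 2; b <- 2        # prior parameters alpha and beta, chosen for illustration

h <- 0.001
p <- seq(h, 1 - h, by = h)

unnorm <- dbinom(x, n, p) * dbeta(p, a, b)   # likelihood x prior on the grid
grid_post <- unnorm / sum(unnorm * h)        # normalize with a Riemann sum

exact_post <- dbeta(p, x + a, n - x + b)     # Beta(x + alpha, n - x + beta)

max_gap <- max(abs(grid_post - exact_post))
print(max_gap)
```

The largest gap between the two curves should be negligible, which is the numerical counterpart of dropping the constants that do not depend on p.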

Posterior mean

• Posterior mean
    $$
    \begin{aligned}
    E[p \mid X] &= \frac{\tilde{\alpha}}{\tilde{\alpha} + \tilde{\beta}} \\
    &= \frac{x + \alpha}{x + \alpha + n - x + \beta} \\
    &= \frac{x + \alpha}{n + \alpha + \beta} \\
    &= \frac{x}{n} \times \frac{n}{n + \alpha + \beta} + \frac{\alpha}{\alpha + \beta} \times \frac{\alpha + \beta}{n + \alpha + \beta} \\
    &= \text{MLE} \times \pi + \text{Prior Mean} \times (1 - \pi)
    \end{aligned}
    $$
  where π = n/(n + α + β)
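
The last line of the derivation can be checked with numbers. The sketch below computes the weight π and the weighted average, then compares the result with the direct formula (x + α)/(n + α + β). The function name `posterior_mean` and the prior Beta(2, 10) are assumptions made for the illustration.

```r
# Posterior mean as a weighted average of the MLE and the prior mean
# (hypothetical helper; a and b stand for alpha and beta).
posterior_mean <- function(x, n, a, b) {
  w <- n / (n + a + b)             # pi: the weight placed on the MLE
  mle <- x / n
  prior_mean <- a / (a + b)
  c(weight = w, mean = w * mle + (1 - w) * prior_mean)
}

res <- posterior_mean(13, 20, 2, 10)
direct <- (13 + 2) / (20 + 2 + 10)   # (x + alpha) / (n + alpha + beta)
print(res)
print(direct)
```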

• The posterior mean is a mixture of the MLE (p̂) and the prior mean
• π goes to 1 as n gets large; for large n the data swamps the prior
• For small n, the prior mean dominates
• Generalizes how science should ideally work; as data becomes increasingly available, prior beliefs should matter less and less
• With a prior that is degenerate at a value, no amount of data can overcome the prior
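
A short numerical illustration of the data swamping the prior: hold the observed proportion at .65 and let n grow. The prior Beta(2, 10), whose mean is 1/6, and the sample sizes are hypothetical choices made for this sketch.

```r
a <- 2; b <- 10                    # prior with mean 1/6, chosen for illustration
n <- c(20, 200, 2000, 20000)
x <- 0.65 * n                      # hypothetical data with the MLE held at .65

w <- n / (n + a + b)               # pi, the weight on the MLE
post_mean <- (x + a) / (n + a + b)

print(data.frame(n, weight = round(w, 3), post_mean = round(post_mean, 3)))
```

The weight climbs toward 1 and the posterior mean moves from near the prior mean toward .65 as n increases.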

Posterior variance

• The posterior variance is
    $$
    \begin{aligned}
    \mathrm{Var}(p \mid x) &= \frac{\tilde{\alpha}\tilde{\beta}}{(\tilde{\alpha} + \tilde{\beta})^2 (\tilde{\alpha} + \tilde{\beta} + 1)} \\
    &= \frac{(x + \alpha)(n - x + \beta)}{(n + \alpha + \beta)^2 (n + \alpha + \beta + 1)}
    \end{aligned}
    $$
• Let p̃ = (x + α)/(n + α + β) and ñ = n + α + β; then we have
    $$\mathrm{Var}(p \mid x) = \frac{\tilde{p}(1 - \tilde{p})}{\tilde{n} + 1}$$

Discussion

• If α = β = 2 then the posterior mean is
    p̃ = (x + 2)/(n + 4)
  and the posterior variance is
    p̃(1 − p̃)/(ñ + 1)
• This is almost exactly the mean and variance we used for the Agresti-Coull interval: the center is identical, and the variance differs only in dividing by ñ + 1 rather than ñ
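
The correspondence with the Agresti-Coull interval can be confirmed on the 13-of-20 example. This base-R sketch computes the Beta(2, 2) posterior mean and variance from the beta formulas and sets them beside the Agresti-Coull center and variance; all variable names are chosen here.

```r
x <- 13; n <- 20
a <- 2; b <- 2                                 # alpha = beta = 2

post_a <- x + a
post_b <- n - x + b
post_mean <- post_a / (post_a + post_b)
post_var  <- post_a * post_b / ((post_a + post_b)^2 * (post_a + post_b + 1))

ptilde <- (x + 2) / (n + 4)
ntilde <- n + 4
ac_var <- ptilde * (1 - ptilde) / ntilde       # variance used by Agresti-Coull

print(c(post_mean = post_mean, ac_center = ptilde))
print(c(post_var = post_var, ac_var = ac_var))
```

The two centers agree, and the ratio of the variances is ñ/(ñ + 1) = 24/25.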

Example

• Consider the previous example where x = 13 and n = 20
• Consider a uniform prior, α = β = 1
• The posterior is proportional to (see formula above)
    $$p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1} = p^{x} (1 - p)^{n - x}$$
  that is, for the uniform prior, the posterior is proportional to the likelihood
• Consider the instance where α = β = 2 (recall this prior is humped around the point .5); the posterior is proportional to
    $$p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1} = p^{x + 1} (1 - p)^{n - x + 1}$$
• The "Jeffreys prior", which has some theoretical benefits, puts α = β = .5
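
A plot of prior, likelihood, and posterior for a given beta prior can be sketched in base R as follows. Rescaling each curve to peak at 1 is an assumption made here so that the three share one axis; the helper name `curves` and the grid are likewise choices for this illustration.

```r
x <- 13; n <- 20
p <- seq(0.005, 0.995, by = 0.005)   # open grid: Beta(.5, .5) is unbounded at 0 and 1

# Prior, likelihood, and posterior on the grid (hypothetical helper).
curves <- function(a, b) {
  scale1 <- function(v) v / max(v)   # rescale so the curve peaks at 1 on the grid
  cbind(
    prior      = scale1(dbeta(p, a, b)),
    likelihood = scale1(dbinom(x, n, p)),
    posterior  = scale1(dbeta(p, x + a, n - x + b))
  )
}

jeff <- curves(0.5, 0.5)             # Jeffreys prior
matplot(p, jeff, type = "l", lty = 1:3, col = 1,
        xlab = "p", ylab = "prior, likelihood, posterior",
        main = "alpha = 0.5 beta = 0.5")
legend("topleft", legend = colnames(jeff), lty = 1:3)
```

Calling `curves(1, 1)`, `curves(2, 2)`, `curves(2, 10)`, or `curves(100, 100)` and plotting the result in the same way shows how a more concentrated prior pulls the posterior away from the likelihood.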


[Figures: five plots of the prior, likelihood, and posterior against p (horizontal axis 0 to 1, vertical axis 0 to 1), one for each beta prior: alpha = 0.5, beta = 0.5; alpha = 1, beta = 1; alpha = 2, beta = 2; alpha = 2, beta = 10; alpha = 100, beta = 100]

Bayesian credible intervals

• A Bayesian credible interval is the Bayesian analog of a confidence interval
• A 95% credible interval, [a, b], would satisfy
    P(p ∈ [a, b] | x) = .95
• The best credible intervals chop off the posterior with a horizontal line in the same way we did for likelihoods
• These are called highest posterior density (HPD) intervals

[Figure: posterior density plotted against p, with the 95% HPD interval marked by a horizontal line at density 0.64 that meets the curve at p = 0.44 and p = 0.84]

R code

Install the binom package, then the command

    # install.packages("binom")   # one-time install, if needed
    library(binom)
    # 95% HPD interval for x = 13 successes in n = 20 trials
    binom.bayes(13, 20, type = "highest")

gives the HPD interval. The default credible level is 95% and the default prior is the Jeffreys prior.
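
For readers who want to see what an HPD calculation involves, here is a base-R sketch that does not rely on the binom package. It is not that package's algorithm; it uses the fact that, for a unimodal posterior, the HPD interval is the shortest interval holding 95% of the posterior mass, and searches for it with `optimize`.

```r
x <- 13; n <- 20
a <- x + 0.5; b <- n - x + 0.5     # posterior under the Jeffreys prior (alpha = beta = 0.5)

# Width of the interval that leaves `lower_tail` posterior mass below it
# and holds 95% of the mass inside it.
width <- function(lower_tail) {
  qbeta(lower_tail + 0.95, a, b) - qbeta(lower_tail, a, b)
}

best <- optimize(width, interval = c(0, 0.05), tol = 1e-10)$minimum
hpd <- qbeta(c(best, best + 0.95), a, b)
central <- qbeta(c(0.025, 0.975), a, b)   # equal-tail interval, for comparison

print(round(rbind(hpd, central), 3))
```

The HPD interval is never wider than the equal-tail interval, and the posterior density is (up to numerical tolerance) the same height at its two endpoints, which is the "horizontal line" picture from the slides.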

Interpretation of confidence intervals

• Confidence interval: (Wald) [.44, .86]
• Fuzzy interpretation:
    We are 95% confident that p lies between .44 and .86
• Actual interpretation:
    The interval .44 to .86 was constructed such that in repeated independent experiments, 95% of the intervals obtained would contain p.
• Yikes!
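
The "repeated independent experiments" reading can be made concrete with a small simulation: fix a true p, draw many samples, build a Wald interval from each, and count how often the interval contains p. The true value .65, the number of replicates, and the seed are all hypothetical choices for this sketch.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
p_true <- 0.65; n <- 20             # hypothetical true prevalence and sample size
reps <- 10000
z <- qnorm(0.975)

x <- rbinom(reps, n, p_true)        # one simulated study per replicate
phat <- x / n
half <- z * sqrt(phat * (1 - phat) / n)
covered <- (phat - half <= p_true) & (p_true <= phat + half)

prop_covered <- mean(covered)       # fraction of intervals that contain p_true
print(prop_covered)
```

The 95% is a statement about this long-run fraction, not about any single interval such as [.44, .86]; for the Wald interval at this sample size the fraction need not equal .95 exactly.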

Likelihood intervals

• Recall the 1/8 likelihood interval was [.42, .84]
• Fuzzy interpretation:
    The interval [.42, .84] represents plausible values for p.
• Actual interpretation:
    The interval [.42, .84] represents plausible values for p in the sense that for each point in this interval, there is no other point that is more than 8 times better supported given the data.
• Yikes!

Credible intervals

• Recall the Jeffreys prior 95% credible interval was [.44, .84]
• Actual interpretation:
    The probability that p is between .44 and .84 is 95%.
