Lecture 13


Mathematical Biostatistics Boot Camp: Lecture 13, Binomial Proportions

Brian Caffo

Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University

October 18, 2012


Table of contents

1 Table of contents
2 Intervals for binomial proportions
3 Agresti-Coull interval
4 Bayesian analysis
    Prior specification
    Posterior
    Credible intervals
5 Summary
Intervals for binomial parameters

• When X ∼ Binomial(n, p) we know that
    a. p̂ = X/n is the MLE for p
    b. E[p̂] = p
    c. Var(p̂) = p(1 − p)/n
    d. $\frac{\hat p - p}{\sqrt{\hat p(1 - \hat p)/n}}$ follows an approximately standard normal distribution for large n
• The latter fact leads to the Wald interval for p

$$\hat p \pm Z_{1-\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}$$
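Here is a minimal base-R sketch of the Wald interval. It takes the normal quantile from `qnorm()`. The helper name `wald_interval` is hypothetical and does not come from the lecture. For the hypertension example below (x = 13, n = 20), it should reproduce the Wald interval of about [.44, .86].

```r
# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
# `wald_interval` is a hypothetical helper name used only for illustration.
wald_interval <- function(x, n, conf = 0.95) {
  p_hat <- x / n                          # MLE of p
  z <- qnorm(1 - (1 - conf) / 2)          # Z_{1 - alpha/2}, about 1.96 for 95%
  se <- sqrt(p_hat * (1 - p_hat) / n)     # plug-in standard error
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

wald_interval(13, 20)
```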
Some discussion

• The Wald interval performs terribly: its actual coverage often falls well short of the nominal level
• Coverage probability varies wildly, sometimes being quite low for certain values of n even when p is not near the boundaries
• Example: when p = .5 and n = 40, the actual coverage of a 95% interval is only 92%
• When p is small or large, coverage can be quite poor even for extremely large values of n
• Example: when p = .005 and n = 1,876, the actual coverage rate of a 95% interval is only 90%
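Coverage figures like these can be computed exactly rather than by simulation. At a given p, the coverage is the total Binomial(n, p) probability of the outcomes x whose Wald interval contains p. This base-R sketch uses the 95% Wald interval defined earlier. The name `wald_coverage` is hypothetical.

```r
# Exact coverage of the Wald interval at a given true p: add up the
# Binomial(n, p) probabilities of every outcome x whose interval contains p.
# `wald_coverage` is a hypothetical helper name.
wald_coverage <- function(p, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  x <- 0:n
  p_hat <- x / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  covers <- (p_hat - z * se <= p) & (p <= p_hat + z * se)
  sum(dbinom(x, n, p)[covers])
}

wald_coverage(0.5, 40)
wald_coverage(0.005, 1876)
```

For p = .5 and n = 40, the covering outcomes are x = 15, ..., 25. The first call therefore returns P(15 ≤ X ≤ 25), which is about .92.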
Simple fix

• A simple fix for the problem is to add two successes and two failures
• That is, let p̃ = (X + 2)/(n + 4) and ñ = n + 4
• The (Agresti-Coull) interval is

$$\tilde p \pm Z_{1-\alpha/2}\sqrt{\frac{\tilde p(1 - \tilde p)}{\tilde n}}$$

• Motivation: when p is large or small, the distribution of p̂ is skewed and it does not make sense to center the interval at the MLE; adding the pseudo-observations pulls the center of the interval toward .5
• Later we will show that this interval is the inversion of a hypothesis testing technique
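The following base-R sketch implements the "add two successes and two failures" interval exactly as stated above. The helper name `agresti_coull_interval` is hypothetical.

```r
# Agresti-Coull style interval: add 2 successes and 2 failures, then use the
# Wald form centered at p_tilde with n_tilde = n + 4.
# `agresti_coull_interval` is a hypothetical helper name.
agresti_coull_interval <- function(x, n, conf = 0.95) {
  n_tilde <- n + 4
  p_tilde <- (x + 2) / n_tilde
  z <- qnorm(1 - (1 - conf) / 2)
  se <- sqrt(p_tilde * (1 - p_tilde) / n_tilde)
  c(lower = p_tilde - z * se, upper = p_tilde + z * se)
}

agresti_coull_interval(13, 20)
```

For the example below, p̃ = 15/24 = .625 is left unrounded, and the result is roughly [.431, .819], that is, [.43, .82] to two decimals.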
Example

Suppose that in a random sample of an at-risk population 13 of 20 subjects had hypertension. Estimate the prevalence of hypertension in this population.

• p̂ = .65, n = 20
• p̃ = .625, ñ = 24
• Z.975 = 1.96
• Wald interval [.44, .86]
• Agresti-Coull interval [.43, .82]
• 1/8 likelihood interval [.42, .84]
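The 1/8 likelihood interval in the example can be found numerically. Its endpoints are the two values of p where the likelihood, divided by its maximum at p̂, equals 1/8. This base-R sketch uses `uniroot()` with one search bracket on each side of the MLE.

```r
# 1/8 likelihood interval for x = 13, n = 20
x <- 13; n <- 20
p_hat <- x / n
# Likelihood divided by its maximum; the binomial coefficient cancels
rel_lik <- function(p) dbinom(x, n, p) / dbinom(x, n, p_hat)
f <- function(p) rel_lik(p) - 1 / 8
lower <- uniroot(f, c(1e-6, p_hat))$root        # root left of the MLE
upper <- uniroot(f, c(p_hat, 1 - 1e-6))$root    # root right of the MLE
c(lower = lower, upper = upper)
```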


[Figure: likelihood plotted against p]
Bayesian analysis

• Bayesian statistics posits a prior on the parameter of interest
• All inferences are then performed on the distribution of the parameter given the data, called the posterior
• In general,

    Posterior ∝ Likelihood × Prior

• Therefore (as we saw in diagnostic testing) the likelihood is the factor by which our prior beliefs are updated to produce conclusions in the light of the data
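Here is a quick numerical illustration of Posterior ∝ Likelihood × Prior. The sketch evaluates the product on a grid of p values and rescales it so that it integrates to 1. The uniform prior, the grid spacing and the example data (x = 13, n = 20) are illustrative choices.

```r
x <- 13; n <- 20
step <- 0.001
p <- seq(step / 2, 1 - step / 2, by = step)            # cell midpoints on (0, 1)
prior <- dunif(p)                                      # uniform prior density
likelihood <- dbinom(x, n, p)
unnormalized <- likelihood * prior                     # Likelihood x Prior
posterior <- unnormalized / sum(unnormalized * step)   # rescale to integrate to 1
post_mean <- sum(p * posterior * step)
post_mean
```

With a uniform prior, the exact posterior is Beta(14, 8), whose mean is 14/22 ≈ .636. The grid answer can be checked against that value.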
Beta priors

• The beta distribution is the default prior for parameters between 0 and 1.
• The beta density depends on two parameters α and β

$$\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1}(1 - p)^{\beta - 1} \quad \text{for } 0 \le p \le 1$$

• The mean of the beta density is α/(α + β)
• The variance of the beta density is

$$\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

• The uniform density is the special case where α = β = 1
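The mean and variance formulas can be checked by numerically integrating the density with base R's `dbeta()` and `integrate()`. The values α = 2, β = 10 are an arbitrary illustrative choice.

```r
beta_moments <- function(a, b) {
  m1 <- integrate(function(p) p * dbeta(p, a, b), 0, 1)$value     # E[p]
  m2 <- integrate(function(p) p^2 * dbeta(p, a, b), 0, 1)$value   # E[p^2]
  c(mean = m1, var = m2 - m1^2)
}

a <- 2; b <- 10
beta_moments(a, b)                                              # numerical
c(mean = a / (a + b), var = a * b / ((a + b)^2 * (a + b + 1)))  # formulas
```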
[Figure: beta densities plotted against p in nine panels, for alpha ∈ {0.5, 1, 2} and beta ∈ {0.5, 1, 2}]
Posterior

• Suppose that we chose values of α and β so that the beta prior is indicative of our degree of belief regarding p in the absence of data
• Then using the rule that

    Posterior ∝ Likelihood × Prior

and throwing out anything that doesn't depend on p, we have that

$$\text{Posterior} \propto p^{x}(1 - p)^{n - x} \times p^{\alpha - 1}(1 - p)^{\beta - 1} = p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1}$$

• This density is just another beta density with parameters α̃ = x + α and β̃ = n − x + β
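This sketch checks the conjugacy result numerically, using the illustrative values x = 13, n = 20 and α = β = 2. Normalizing the product of the likelihood and prior kernels by numerical integration should reproduce `dbeta()` with shapes x + α and n − x + β.

```r
x <- 13; n <- 20; a <- 2; b <- 2
# Likelihood kernel times prior kernel (constants dropped)
unnorm <- function(p) p^x * (1 - p)^(n - x) * p^(a - 1) * (1 - p)^(b - 1)
const <- integrate(unnorm, 0, 1)$value      # normalizing constant
p <- c(0.3, 0.5, 0.65, 0.8)
cbind(p = p,
      normalized = unnorm(p) / const,
      beta_density = dbeta(p, x + a, n - x + b))
```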
Posterior mean

• Posterior mean

$$
\begin{aligned}
E[p \mid X] &= \frac{\tilde\alpha}{\tilde\alpha + \tilde\beta}
 = \frac{x + \alpha}{x + \alpha + n - x + \beta}
 = \frac{x + \alpha}{n + \alpha + \beta} \\
&= \frac{x}{n} \times \frac{n}{n + \alpha + \beta} + \frac{\alpha}{\alpha + \beta} \times \frac{\alpha + \beta}{n + \alpha + \beta} \\
&= \text{MLE} \times \pi + \text{Prior Mean} \times (1 - \pi)
\end{aligned}
$$

where π = n/(n + α + β).
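This small base-R sketch confirms the decomposition for the hypertension data with α = β = 2. It also shows the weight π = n/(n + α + β) for growing n. The name `posterior_mean_parts` is a hypothetical helper.

```r
posterior_mean_parts <- function(x, n, a, b) {
  post_mean <- (x + a) / (n + a + b)            # E[p | X]
  w <- n / (n + a + b)                          # pi: weight on the MLE
  mixture <- (x / n) * w + (a / (a + b)) * (1 - w)
  c(post_mean = post_mean, pi = w, mixture = mixture)
}

posterior_mean_parts(13, 20, 2, 2)

# The weight on the MLE approaches 1 as n grows (alpha = beta = 2)
sapply(c(20, 200, 2000), function(m) m / (m + 2 + 2))
```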


• The posterior mean is a mixture of the MLE (p̂) and the prior mean
• π goes to 1 as n gets large; for large n the data swamp the prior
• For n small relative to α + β, the prior mean dominates
• This generalizes how science should ideally work: as data become increasingly available, prior beliefs should matter less and less
• With a prior that is degenerate at a value, no amount of data can overcome the prior
Posterior variance

• The posterior variance is

$$\operatorname{Var}(p \mid x) = \frac{\tilde\alpha\tilde\beta}{(\tilde\alpha + \tilde\beta)^2(\tilde\alpha + \tilde\beta + 1)} = \frac{(x + \alpha)(n - x + \beta)}{(n + \alpha + \beta)^2(n + \alpha + \beta + 1)}$$

• Let p̃ = (x + α)/(n + α + β) and ñ = n + α + β; then we have

$$\operatorname{Var}(p \mid x) = \frac{\tilde p(1 - \tilde p)}{\tilde n + 1}$$
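This base-R check confirms that the beta-variance form and the p̃(1 − p̃)/(ñ + 1) form agree. The Jeffreys prior α = β = .5 is chosen only for illustration.

```r
x <- 13; n <- 20; a <- 0.5; b <- 0.5
a_t <- x + a; b_t <- n - x + b                           # posterior shapes
v_beta <- a_t * b_t / ((a_t + b_t)^2 * (a_t + b_t + 1))  # beta variance formula
p_t <- (x + a) / (n + a + b); n_t <- n + a + b
v_short <- p_t * (1 - p_t) / (n_t + 1)                   # p~(1 - p~)/(n~ + 1)
c(v_beta = v_beta, v_short = v_short)
```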
Discussion

• If α = β = 2 then the posterior mean is

    p̃ = (x + 2)/(n + 4)

and the posterior variance is

    p̃(1 − p̃)/(ñ + 1)

• This is almost exactly the mean and variance we used for the Agresti-Coull interval; the variance differs only in using ñ + 1 in place of ñ
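For the example data (x = 13, n = 20), this sketch compares the Agresti-Coull standard error with the posterior standard deviation under α = β = 2. The posterior means match exactly. The standard deviations differ only through ñ versus ñ + 1, so their ratio is sqrt(24/25).

```r
x <- 13; n <- 20
p_t <- (x + 2) / (n + 4); n_t <- n + 4
a_t <- x + 2; b_t <- n - x + 2                           # Beta(15, 9) posterior
post_mean <- a_t / (a_t + b_t)
post_sd <- sqrt(a_t * b_t / ((a_t + b_t)^2 * (a_t + b_t + 1)))
ac_se <- sqrt(p_t * (1 - p_t) / n_t)                     # Agresti-Coull standard error
c(post_mean = post_mean, p_tilde = p_t, post_sd = post_sd, ac_se = ac_se,
  ratio = post_sd / ac_se)
```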
Example

• Consider the previous example where x = 13 and n = 20
• Consider a uniform prior, α = β = 1
• The posterior is proportional to (see formula above)

$$p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1} = p^{x}(1 - p)^{n - x}$$

that is, for the uniform prior, the posterior is proportional to the likelihood
• Consider the instance where α = β = 2 (recall this prior is humped around the point .5); the posterior is proportional to

$$p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1} = p^{x + 1}(1 - p)^{n - x + 1}$$

• The "Jeffreys prior", which has some theoretical benefits, puts α = β = .5
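This base-R sketch computes the posterior parameters and posterior means for the three priors mentioned above, with x = 13 and n = 20.

```r
x <- 13; n <- 20
priors <- data.frame(name = c("uniform", "beta(2,2)", "Jeffreys"),
                     a = c(1, 2, 0.5), b = c(1, 2, 0.5))
priors$post_a <- x + priors$a                 # alpha~ = x + alpha
priors$post_b <- n - x + priors$b             # beta~  = n - x + beta
priors$post_mean <- priors$post_a / (priors$post_a + priors$post_b)
priors
```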


[Figures: prior, likelihood and posterior plotted against p, one panel per beta prior, for (alpha, beta) = (0.5, 0.5), (1, 1), (2, 2), (2, 10) and (100, 100)]
Bayesian credible intervals

• A Bayesian credible interval is the Bayesian analog of a confidence interval
• A 95% credible interval, [a, b], would satisfy

    P(p ∈ [a, b] | x) = .95

• The best credible intervals chop off the posterior with a horizontal line in the same way we did for likelihoods
• These are called highest posterior density (HPD) intervals
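For a unimodal posterior, the interval cut off by a horizontal line is also the shortest interval with the required probability. One way to compute an HPD interval for a beta posterior is therefore to search over the lower-tail probability u and minimize the width qbeta(u + .95) − qbeta(u). This base-R sketch assumes a unimodal beta posterior. `hpd_beta` is a hypothetical name, and the code does not necessarily match how the binom package computes it.

```r
hpd_beta <- function(shape1, shape2, level = 0.95) {
  # Width of [qbeta(u), qbeta(u + level)], an interval with posterior mass `level`
  width <- function(u) qbeta(u + level, shape1, shape2) - qbeta(u, shape1, shape2)
  u <- optimize(width, c(0, 1 - level))$minimum   # shortest such interval
  c(lower = qbeta(u, shape1, shape2), upper = qbeta(u + level, shape1, shape2))
}

# Jeffreys prior (alpha = beta = 0.5) with x = 13, n = 20: Beta(13.5, 7.5) posterior
ci <- hpd_beta(13 + 0.5, 20 - 13 + 0.5)
ci
dbeta(ci, 13.5, 7.5)   # about equal: both endpoints lie on one horizontal line
```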
[Figure: posterior density plotted against p with the 95% HPD interval marked; its endpoints are labeled (0.44, 0.64) and (0.84, 0.64)]
R code

After installing the binom package (once, with install.packages("binom")), the command

    library(binom)
    binom.bayes(13, 20, type = "highest")

gives the HPD interval. The default credible level is 95% and the default prior is the Jeffreys prior.
Interpretation of confidence intervals

• Confidence interval (Wald): [.44, .86]
• Fuzzy interpretation:
    We are 95% confident that p lies between .44 and .86
• Actual interpretation:
    The interval .44 to .86 was constructed such that in repeated independent experiments, 95% of the intervals obtained would contain p.
• Yikes!
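Simulation makes the repeated-sampling interpretation concrete. This sketch assumes a hypothetical true value p = .65 and n = 20. It draws many independent experiments, builds a 95% Wald interval from each, and reports the fraction that contain p. Because the Wald interval undercovers, expect a value somewhat below .95.

```r
set.seed(2012)
p_true <- 0.65; n <- 20; reps <- 10000          # hypothetical true p and design
x <- rbinom(reps, n, p_true)                    # one count per simulated experiment
p_hat <- x / n
se <- sqrt(p_hat * (1 - p_hat) / n)
z <- qnorm(0.975)
covered <- (p_hat - z * se <= p_true) & (p_true <= p_hat + z * se)
mean(covered)   # empirical coverage; compare with the nominal 0.95
```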
Likelihood intervals

• Recall the 1/8 likelihood interval was [.42, .84]
• Fuzzy interpretation:
    The interval [.42, .84] represents plausible values for p.
• Actual interpretation:
    The interval [.42, .84] represents plausible values for p in the sense that for each point in this interval, there is no other point that is more than 8 times better supported given the data.
• Yikes!
Credible intervals

• Recall the Jeffreys prior 95% credible interval was

    [.44, .84]

• Actual interpretation:
    The probability that p is between .44 and .84 is 95%.
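Under the Jeffreys prior, the posterior for x = 13, n = 20 is Beta(13.5, 7.5), so the probability statement can be checked directly with `pbeta()`. The endpoints are rounded to two decimals, so the result is close to .95 but not exactly .95.

```r
a_post <- 13 + 0.5; b_post <- 20 - 13 + 0.5        # Beta(13.5, 7.5) posterior
mass <- pbeta(0.84, a_post, b_post) - pbeta(0.44, a_post, b_post)
mass   # posterior probability that .44 <= p <= .84
```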
