or, equivalently, as
\[
p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}.
\]
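In R, this inverse-logit function is available directly as plogis(); a quick sketch (with illustrative coefficient values, which are assumptions) confirming it matches the formula above:

# plogis(q) = 1 / (1 + exp(-q)) is R's standard logistic (inverse-logit) function
beta0 <- -2; beta1 <- 0.5; x <- 3                # illustrative values
p.formula <- exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))
p.plogis  <- plogis(beta0 + beta1 * x)
all.equal(p.formula, p.plogis)                   # TRUE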
[Figure: logistic regression curves with positive, zero, and negative slopes, plotted against X on the probability scale (left) and the log-odds scale (right).]
Plot the observed probability of mortality and the empirical logits with
linear and quadratic LS fits (which are not the same as the logistic MLE
fits).
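A minimal sketch of the quantities plotted, assuming vectors y (deaths), m (group sizes), and conc (concentrations); the 0.5 continuity correction in the empirical logit is an assumption, since its exact definition is not shown in this excerpt:

# empirical logit with a 0.5 continuity correction (assumed form)
emp.logit <- log((y + 0.5) / (m - y + 0.5))
fit.lin  <- lm(emp.logit ~ conc)                 # 'naive' linear LS fit
fit.quad <- lm(emp.logit ~ conc + I(conc^2))     # 'naive' quadratic LS fit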
[Figure: "Observed mortality, probability scale" (left, p.hat vs conc) and "Empirical logit with 'naive' LS fits (not MLE)" (right, emp.logit vs conc), with points marked by replicate (rep = 1, 2).]
However, there does not appear to be any biological motivation for this
and so here they are retained in the data set.
Combining the data from the two replicates and plotting the empirical
logit of the observed proportions against concentration gives a relationship
that is better fit by a quadratic than a linear relationship,
\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X + \beta_2 X^2.
\]
The right plot below shows the linear and quadratic model fits to the
observed values with pointwise 95% confidence bands on the logit scale;
the left plot shows the same on the proportion scale.
[Figure: "Observed and predicted mortality" on the probability scale (left, p.hat vs conc) and the logit scale (right, emp.logit vs conc); linear and quadratic fits (modelorder) with pointwise 95% confidence bands, and observed points by replicate (rep = 1, 2).]
\[
p_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}
\qquad \text{and} \qquad
1 - p_i = \frac{1}{1 + \exp(x_i^\top \beta)}.
\]
To obtain the MLEs we first write the log-likelihood in (1.1) as a function
of $\beta$,
\[
\begin{aligned}
\ell(\beta)
&= \sum_{i=1}^{n} \left\{ m_i \log\!\left( \frac{1}{1+\exp(x_i^\top \beta)} \right)
   + y_i \log\!\left( \frac{\exp(x_i^\top \beta)\big/\bigl(1+\exp(x_i^\top \beta)\bigr)}
                           {1\big/\bigl(1+\exp(x_i^\top \beta)\bigr)} \right) \right\} \\
&= \sum_{i=1}^{n} \left\{ m_i \log\!\left( \frac{1}{1+\exp(x_i^\top \beta)} \right)
   + y_i \,(x_i^\top \beta) \right\} \\
&= \sum_{i=1}^{n} \left\{ y_i \,(x_i^\top \beta)
   - m_i \log\bigl(1+\exp(x_i^\top \beta)\bigr) \right\}.
\qquad (1.2)
\end{aligned}
\]
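As a quick numerical check of (1.2), the simplified form agrees with the direct binomial kernel; the toy data below are illustrative assumptions:

# numerical check of (1.2) on toy data (illustrative values)
set.seed(1)
X <- cbind(1, rnorm(5))                    # design matrix with intercept
beta <- c(0.5, -1)
m <- rep(10, 5)
y <- rbinom(5, size = m, prob = plogis(X %*% beta))
eta <- as.vector(X %*% beta)               # linear predictor x_i' beta
p <- plogis(eta)
ll.kernel <- sum(y * log(p) + (m - y) * log(1 - p))  # y log p + (m-y) log(1-p)
ll.eq12   <- sum(y * eta - m * log(1 + exp(eta)))    # simplified form (1.2)
all.equal(ll.kernel, ll.eq12)              # TRUE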
The MLE $\hat\beta$ solves the score equations
\[
\dot\ell(\beta) = 0_{r+1}.
\]
and
\[
\begin{aligned}
\frac{\partial}{\partial \beta_j} \log\bigl(1+\exp(x_i^\top \beta)\bigr)
&= \frac{\exp(x_i^\top \beta)}{1+\exp(x_i^\top \beta)}
   \,\frac{\partial}{\partial \beta_j}\,(x_i^\top \beta) \\
&= p_i(x_i, \beta)\, x_{ij},
\qquad (1.5)
\end{aligned}
\]
and so
\[
\begin{aligned}
\frac{\partial \ell(\beta)}{\partial \beta_j}
&= \sum_{i=1}^{n} \left\{ y_i x_{ij} - m_i\, p_i(x_i, \beta)\, x_{ij} \right\} \\
&= \sum_{i=1}^{n} x_{ij} \bigl( y_i - m_i\, p_i(x_i, \beta) \bigr),
\qquad j = 0, 1, \ldots, r. \qquad (1.6)
\end{aligned}
\]
It is straightforward to show that
\[
\frac{\partial p_i(x_i, \beta)}{\partial \beta_k}
= x_{ik}\, p_i(x_i, \beta)\,\bigl(1 - p_i(x_i, \beta)\bigr).
\]
So
\[
\frac{\partial^2 \ell}{\partial \beta_j\, \partial \beta_k}
= -\sum_{i=1}^{n} \left\{ x_{ij}\, x_{ik}\, m_i\, p_i(x_i, \beta)\,\bigl(1 - p_i(x_i, \beta)\bigr) \right\}.
\]
Recall that $\mathrm{Var}(Y_i) = m_i\, p_i(x_i, \beta)\,(1 - p_i(x_i, \beta))$, from the variance of the
binomial distribution. Let $\mathrm{Var}(Y_i) = v_i(\beta) = v_i(x_i, \beta)$.
For programming, it is convenient to use vector/matrix notation. Let
\[
Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix},
\quad
p = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix},
\quad
m = \begin{pmatrix} m_1 \\ \vdots \\ m_n \end{pmatrix},
\quad
X = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix},
\quad
\log\!\left(\frac{p}{1-p}\right)
= \begin{pmatrix} \log\frac{p_1}{1-p_1} \\ \vdots \\ \log\frac{p_n}{1-p_n} \end{pmatrix},
\]
where the log-odds operate elementwise.
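In this notation the score (1.6) and the information matrix reduce to one-line R expressions; a sketch, continuing the toy example above:

# score and information in matrix form, continuing the toy example above
score <- t(X) %*% (y - m * p)       # X'(y - m p), the score vector (1.6)
v <- m * p * (1 - p)                # binomial variances v_i(x_i, beta)
info <- t(X) %*% (v * X)            # X' V X with V = diag(v): the information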
\[
\frac{\hat\beta_j - 0}{\sigma_j(\hat\beta)} \;\overset{\cdot}{\sim}\; \mathrm{Normal}(0, 1).
\]
General remarks
1. There is an extensive literature on conditions for existence and uniqueness
of MLEs for logistic regression.
2. MLEs may not exist. One case is when you have "separation" of
covariates (e.g., all successes to the left and all failures to the right
of some value of x).
4. If you have two observations $Y_1 \overset{\text{ind}}{\sim} \text{Binomial}(m_1, p)$ and
$Y_2 \overset{\text{ind}}{\sim} \text{Binomial}(m_2, p)$ with the same success probability $p$, then the log-likelihood
(excluding constants) is the same regardless of whether you treat $Y_1$ and $Y_2$
as separate binomial observations or you combine them as
$Y_1 + Y_2 \sim \text{Binomial}(m_1 + m_2, p)$. More generally, Bernoulli
observations with the same covariate vector can be combined into
a single binomial response (provided observations are independent)
when defining the log-likelihood.
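A quick sketch verifying remark 4: the two log-likelihoods differ only by a constant (the binomial coefficients) across values of p; the toy counts are illustrative assumptions:

# log-likelihoods for separate vs combined binomials (toy counts assumed)
loglik <- function(p, y, m) sum(dbinom(y, size = m, prob = p, log = TRUE))
p.grid <- seq(0.1, 0.9, by = 0.2)
ll.sep  <- sapply(p.grid, loglik, y = c(2, 4), m = c(5, 7))   # separate
ll.comb <- sapply(p.grid, loglik, y = 6, m = 12)              # combined
ll.sep - ll.comb    # constant in p: same kernel, different constants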
1.3 Implementation
Function f.lr.p() computes the probability vector under a logistic regression
model,
\[
p_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)},
\]
from the design matrix $X$ and regression vector $\beta$. The function assumes
that $X$ and $\beta$ are of the correct dimensions.
f.lr.p <- function(X, beta) {
# compute vector p of probabilities for logistic regression with logit link
X <- as.matrix(X)
beta <- as.vector(beta)
p <- exp(X %*% beta) / (1 + exp(X %*% beta))
return(p)
}
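One caveat: exp(X %*% beta) overflows for large linear predictors. A numerically safer variant (a sketch, not part of the original routine) uses R's built-in plogis():

# numerically safer variant (a sketch): plogis(q) = 1 / (1 + exp(-q))
# avoids overflow of exp() when the linear predictor q is large
f.lr.p.stable <- function(X, beta) {
  plogis(as.matrix(X) %*% as.vector(beta))
}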
Function f.lr.l() computes the binomial log-likelihood from three input
vectors: the counts y, the sample sizes m, and the probabilities p. The
function is general, working for any binomial model.
f.lr.l <- function(y, m, p) {
  # binomial log likelihood function
  # input: vectors: y = counts; m = sample sizes; p = probabilities
  # output: log-likelihood l, a scalar
  # constants (the binomial coefficients) are excluded, as in (1.2)
  l <- sum(y * log(p) + (m - y) * log(1 - p))
  return(l)
}
The Fisher's scoring routine for logistic regression, f.lr.FS(), finds the
MLE β̂, following from the derivation above; a simple step-size search is
also applied, recorded as step.size in the iteration histories below.
Convergence is based on the number of iterations (maxit = 50), the Euclidean
distance between successive iterates of β̂ (eps1), and the distance between
successive values of the log-likelihood (eps2). The absolute difference
in log-likelihoods between successive steps is new for us, but a sensible
addition.
Comments
1. The iteration scheme
\[
\hat\beta_{i+1}
= \hat\beta_i + (X^\top v(\beta) X)^{-1} X^\top \bigl(y - \mu(\beta)\bigr)
= \hat\beta_i + (\text{inverse Info})(\text{Score func})
\]
is implemented below in two ways. The commented method takes
the inverse of the information matrix, which can be computationally
intensive and (occasionally) numerically unstable. The uncommented
method solves
\[
(X^\top v(\beta) X)\,(\hat\beta_{i+1} - \hat\beta_i) = X^\top \bigl(y - \mu(\beta)\bigr)
\]
for the increment $(\hat\beta_{i+1} - \hat\beta_i)$. The new estimate is $\hat\beta_{i+1} = \hat\beta_i + \text{(increm)}$.
# update beta
beta.2 <- beta.1 # old guess is current guess
mu.2 <- m * f.lr.p(X, beta.2) # m * p is mean
# variance matrix
v.2 <- diag(as.vector(m * f.lr.p(X, beta.2) * (1 - f.lr.p(X, beta.2))))
score.2 <- t(X) %*% (y - mu.2) # score function
# this increment version inverts the information matrix
# Iinv.2 <- solve(t(X) %*% v.2 %*% X) # Inverse information matrix
# increm <- Iinv.2 %*% score.2 # increment, solve() is inverse
# this increment version solves for (beta.2-beta.1) without inverting Information
increm <- solve(t(X) %*% v.2 %*% X, score.2) # solve for increment
# iteration history: step, change in beta, change in loglik, loglik, step size
NR.hist <- rbind(NR.hist, c(i, diff.beta, diff.like, llike.1, alpha.step[ind.max.alpha]))
beta.hist <- rbind(beta.hist, matrix(beta.1, nrow = 1))
}
# prepare output
out <- list()
out$beta.MLE <- beta.1
out$iter <- i - 1
out$NR.hist <- NR.hist
out$beta.hist <- beta.hist
v.1 <- diag(as.vector(m * f.lr.p(X, beta.1) * (1 - f.lr.p(X, beta.1))))
Iinv.1 <- solve(t(X) %*% v.1 %*% X) # Inverse information matrix
out$beta.cov <- Iinv.1
# quadratic model
X <- matrix(c(rep(1,n), X.temp, X.temp^2), nrow = n)
colnames(X) <- c("Int", "conc", "conc2")
r <- ncol(X) - 1 # number of regression coefficients - 1
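The output below comes from running the scoring routine on this design matrix. A sketch of the call; the argument order of f.lr.FS() and the starting value are assumptions (the routine is only excerpted above), though the first row of $beta.hist is consistent with starting at the empirical logit of the pooled proportion:

# sketch of the call (argument order assumed); start at the empirical
# logit of the pooled proportion, with all slope coefficients at zero
p.pool <- sum(y) / sum(m)
beta.init <- c(log(p.pool / (1 - p.pool)), rep(0, r))
out <- f.lr.FS(X, y, m, beta.init)
out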
## $beta.MLE
## [,1]
## Int 7.968410
## conc -0.516593
## conc2 0.006372
##
## $iter
## [1] 6
##
## $NR.hist
## i diff.beta diff.like llike.1 step.size
## 1 1 Inf Inf -322.7 1.0
## 2 2 2.531e+01 1.329e+02 -189.8 1.4
## 3 3 2.701e+01 6.658e+00 -183.2 1.2
## 4 4 4.931e+00 1.050e+00 -182.1 1.2
## 5 5 9.305e-01 8.664e-03 -182.1 1.0
## 6 6 6.066e-03 1.195e-06 -182.1 1.0
## 7 7 1.171e-06 8.527e-14 -182.1 0.9
##
## $beta.hist
## [,1] [,2] [,3]
## [1,] 0.4263 0.0000 0.000000
## [2,] -24.8787 0.5947 -0.002996
## [3,] 2.1174 -0.2900 0.004244
## [4,] 7.0444 -0.4867 0.006130
## [5,] 7.9745 -0.5168 0.006373
## [6,] 7.9684 -0.5166 0.006372
## [7,] 7.9684 -0.5166 0.006372
##
## $beta.cov
## Int conc conc2
## Int 121.80053 -4.115854 3.444e-02
## conc -4.11585 0.139603 -1.172e-03
## conc2 0.03444 -0.001172 9.878e-06
Compare our parameter estimate table above to the one from the glm()
function.
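The object glm.beetles2 is not defined in this excerpt; a minimal sketch of the equivalent call, assuming the beetle data vectors y (deaths), m (group sizes), and X.temp (concentrations) used above:

# sketch: the equivalent quadratic logistic fit via glm() (the object name
# comes from the excerpt; the data-frame construction here is an assumption)
beetles2 <- data.frame(conc = X.temp, y = y, m = m)
glm.beetles2 <- glm(cbind(y, m - y) ~ conc + I(conc^2),
                    family = binomial(link = "logit"), data = beetles2)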
summary(glm.beetles2)$coefficients
[Figure: "Observed and predicted mortality, probability scale" from the Fisher scoring fit (left, "FS") and from glm() (right); fitted curves and observed points (p.hat vs conc) by replicate (rep = 1, 2).]
Also note that the observed and fitted proportions are fairly close,
which qualitatively suggests a reasonable model for the data.
and NRES (the number of NTOTAL that survived at least one year from
the time of diagnosis).
The researchers are interested in modelling the probability p of surviving
at least one year as a function of WBC and AG. They believe that WBC
should be transformed to a log scale, given the skewness in the WBC
values.
## Leukemia white blood cell types example
# ntotal = number of patients with IAG and WBC combination
# nres = number surviving at least one year
# ag = 1 for AG+, 0 for AG-
# wbc = white cell blood count
# lwbc = log white cell blood count
# p.hat = empirical probability
leuk <- read.table("http://statacumen.com/teach/SC1/SC1_11_leuk.dat", header = TRUE)
leuk$lwbc <- log(leuk$wbc)
leuk$p.hat <- leuk$nres / leuk$ntotal
[Figure: fitted survival probability (left) and log-odds (right) as functions of LWBC, with separate curves for IAG = 1 and IAG = 0.]
Before looking at output for the equal slopes model, note that the
data set has 30 distinct AG and LWBC combinations, or 30 “groups” or
samples. Only two samples have more than 1 observation. The majority of
the observed proportions surviving at least one year (number surviving ≥ 1
year/group sample size) are 0 (i.e., 0/1) or 1 (i.e., 1/1). This sparseness
of the data makes it difficult to graphically assess the suitability of the
logistic model (because the estimated proportions are almost all 0 or 1).
Let’s fit the model with our Fisher’s Scoring method.
# create data variables: m, y, X
n <- nrow(leuk)
m <- leuk$ntotal
y <- leuk$nres
X <- matrix(c(rep(1,n), leuk$lwbc, leuk$ag), nrow = n)
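As before, a sketch of the call; the argument order and starting value are assumptions (the first row of $beta.hist below, −0.6931 = logit(1/3), is consistent with this start):

# sketch of the call, as in the beetle example (argument order assumed)
r <- ncol(X) - 1
p.pool <- sum(y) / sum(m)
beta.init <- c(log(p.pool / (1 - p.pool)), rep(0, r))
out <- f.lr.FS(X, y, m, beta.init)
out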
## $beta.MLE
## [,1]
## Int 5.543
## lwbc -1.109
## ag 2.520
##
## $iter
## [1] 5
##
## $NR.hist
## i diff.beta diff.like llike.1 step.size
## 1 1 Inf 1.000e+09 -21.00 1.0
## 2 2 6.081e+00 7.168e+00 -13.84 1.3
## 3 3 5.602e-01 4.164e-01 -13.42 1.2
## 4 4 1.814e-01 4.077e-03 -13.42 1.0
## 5 5 3.747e-03 1.267e-06 -13.42 1.0
## 6 6 1.368e-06 1.901e-13 -13.42 0.9
##
## $beta.hist
## [,1] [,2] [,3]
## [1,] -0.6931 0.0000 0.000
## [2,] 4.9039 -0.9312 2.188
## [3,] 5.3702 -1.0819 2.460
## [4,] 5.5399 -1.1082 2.518
## [5,] 5.5433 -1.1088 2.520
## [6,] 5.5433 -1.1088 2.520
##
## $beta.cov
## Int lwbc ag
## Int 9.1350 -1.3400 0.4507
## lwbc -1.3400 0.2125 -0.1798
## ag 0.4507 -0.1798 1.1896
At each step, the log-likelihood increased, and the norm of the difference
between successive estimates eventually decreased to zero. The estimates
are 5.543 for the constant term, −1.109 for the LWBC coefficient, and 2.52
for the AG coefficient.
# create a parameter estimate table
beta.Est <- out$beta.MLE
beta.SE <- sqrt(diag(out$beta.cov)) # sqrt diag inverse Information matrix
beta.z <- beta.Est / beta.SE         # Wald z-statistics
beta.pval <- 2 * pnorm(-abs(beta.z)) # two-sided p-values
# assemble and print the table
beta.table <- data.frame(Est = beta.Est, SE = beta.SE, z = beta.z, p.val = beta.pval)
beta.table
Compare our parameter estimate table above to the one from the glm()
function.
## compare to the glm() fit:
summary(glm.i.l)$call
summary(glm.i.l)$coefficients
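The object glm.i.l is not defined in this excerpt; a minimal sketch of the equal-slopes fit it presumably holds, using the leuk data frame built above:

# sketch: equal-slopes logistic fit via glm() (the object name comes from
# the excerpt; the formula is inferred from the model described above)
glm.i.l <- glm(cbind(nres, ntotal - nres) ~ lwbc + ag,
               family = binomial(link = "logit"), data = leuk)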
survival probability, after taking LWBC into account. This test is rejected
at any of the usual significance levels, suggesting that the AG level affects
the survival probability (assuming a very specific model).
library(ggplot2)
# p.MLE = fitted probabilities from the Fisher scoring fit (computed earlier, not shown)
p <- ggplot(leuk, aes(x = lwbc, y = p.hat, colour = ag))
# fitted values
p <- p + geom_line(aes(y = p.MLE))
p <- p + geom_point(aes(y = p.MLE), size = 2)
# observed values
p <- p + geom_point(size = 2, alpha = 0.5)
p <- p + labs(title = "FS Observed and predicted probability of 1+ year survival")
print(p)
[Figure: "FS Observed and predicted probability of 1+ year survival"; p.hat vs lwbc, with fitted curves and observed points coloured by ag (0, 1).]
The plot from our Fisher’s Scoring method above is the same as the
plot below from the glm() procedure.
[Figure: the corresponding plot from the glm() fit, identical to the Fisher scoring plot above; p.hat vs lwbc, coloured by ag (0, 1).]
or (here $8.06 = 5.543 + 2.520$, the intercept for the AG+ group)
\[
\tilde p = \frac{\exp(8.06 - 1.11\,\mathrm{LWBC})}{1 + \exp(8.06 - 1.11\,\mathrm{LWBC})}.
\]
Although the equal slopes model appears to fit well, a more general
model might fit better. A natural generalization here would be to add an
interaction, or product term, AG ∗ LWBC to the model. The logistic
model with an AG effect and the AG ∗ LWBC interaction is equivalent
to fitting separate logistic regression lines to the two AG groups. This
interaction model provides an easy way to test whether the slopes are
equal across AG levels. I will note that the interaction term is not needed
here.
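A minimal sketch of that interaction fit, using the leuk data frame built above; the Wald test on the lwbc:ag coefficient addresses the equal-slopes question:

# sketch: interaction (separate slopes) model; lwbc * ag expands to
# lwbc + ag + lwbc:ag, so the lwbc:ag row tests equality of slopes
glm.int <- glm(cbind(nres, ntotal - nres) ~ lwbc * ag,
               family = binomial(link = "logit"), data = leuk)
summary(glm.int)$coefficients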