Lecture3-Logistic Regression 6-5-08
Advanced Statistical Methods
June 5, 2008
Introduction
– multinomial
• In a nutshell:
[Figure: scatter plot of Length of Stay (days) vs. Age (yrs.)]
What is Logistic Regression?
[Figure: 100-day mortality (CHD: Died = 1, Alive = 0) vs. Age (yrs.), with fitted logistic curve]
Fig 1. Logistic regression curves for the three drug combinations. The dashed reference line represents the probability of DLT of .33. The estimated MTD can be obtained as the value on the horizontal axis that coincides with a vertical line drawn through the point where the dashed line intersects the logistic curve. Taken from “Parallel Phase I Studies of Daunorubicin Given With Cytarabine and Etoposide With or Without the Multidrug Resistance Modulator PSC-833 in Previously Untreated Patients 60 Years of Age or Older With Acute Myeloid Leukemia: Results of Cancer and Leukemia Group B Study 9420,” Journal of Clinical Oncology, Vol 17, Issue 9 (September), 1999: 2831. http://www.jco.org/cgi/content/full/17/9/2831
What can we use Logistic Regression for?
    p = exp(z) / (1 + exp(z))

    LOGIT(p) = z = log[ p / (1 - p) ]

where z is the log odds.
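The two formulas above are inverses of each other: one maps a log odds z to a probability, the other maps a probability back to its log odds. A minimal sketch (function names `expit` and `logit` follow common convention; they are not from the slides):

```python
import math

def expit(z):
    """Inverse logit: map log odds z to a probability, p = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """LOGIT(p) = log(p / (1 - p)), the log odds of probability p."""
    return math.log(p / (1.0 - p))

# A log odds of 0 corresponds to probability 0.5, and
# logit(expit(z)) recovers z for any z.
```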
Logistic Regression
Logistic Regression:

    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

Linear Regression:

    Y = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K
The Logistic Regression Model
    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

Y is the dichotomous outcome; X_1, …, X_K are the predictor variables.

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome.
The Logistic Regression Model
    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

β_0 is the intercept; β_1, …, β_K are the model coefficients.

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome.
Logistic Regression uses Odds Ratios
    Odds(event) = Probability(event) / (1 - Probability(event))

    Probability(event) = Odds(event) / (1 + Odds(event))
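The two conversions above can be sketched directly (function names are illustrative, not from the slides):

```python
def odds_from_prob(p):
    """Odds(event) = Probability(event) / (1 - Probability(event))."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """Probability(event) = Odds(event) / (1 + Odds(event))."""
    return odds / (1.0 + odds)

# A probability of 0.4 gives odds of 0.4 / 0.6 ≈ 0.667,
# and converting those odds back recovers the probability 0.4.
```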
The Odds Ratio
    OR (Trt vs. Placebo) = 0.667 / 0.25 ≈ 2.67
Interpretation of the Odds Ratio
• Example cont’d: Outcome = response; OR (trt vs. plb) = 2.67
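The odds in this example are consistent with response probabilities of 0.4 on treatment and 0.2 on placebo (an illustrative assumption; the slide gives only the odds). A quick check:

```python
def odds(p):
    # Odds = P / (1 - P)
    return p / (1.0 - p)

p_trt, p_plb = 0.4, 0.2            # illustrative response probabilities

or_trt_vs_plb = odds(p_trt) / odds(p_plb)

print(round(odds(p_trt), 3))       # 0.667
print(round(odds(p_plb), 3))       # 0.25
print(round(or_trt_vs_plb, 2))     # 2.67
```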
• Example: 10 coin flips: T, H, H, T, T, H, H, T, H, H

• In fact, p̂ = X / N = (# of heads) / (total # of flips) is the ML estimator of p.
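The ML estimate for the flips shown above works out directly:

```python
flips = ["T", "H", "H", "T", "T", "H", "H", "T", "H", "H"]

x = flips.count("H")   # X = number of heads
n = len(flips)         # N = total number of flips
p_hat = x / n          # ML estimator of p

print(x, n, p_hat)     # 6 10 0.6
```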
Maximum Likelihood
    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

β_0 is the intercept; β_1, …, β_K are the model coefficients.

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome.
Maximum Likelihood
    L = Pr(y_1, y_2, …, y_N) = Pr(y_1) Pr(y_2) ⋯ Pr(y_N) = ∏_{i=1}^{N} Pr(y_i)

    Pr(y) = p^y (1 - p)^{1-y}

    L = ∏_{i=1}^{N} Pr(y_i) = ∏_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1-y_i}

      = ∏_{i=1}^{N} [ p_i / (1 - p_i) ]^{y_i} (1 - p_i)
• Taking the logarithm of both sides:

    ln L = Σ_i y_i ln[ p_i / (1 - p_i) ] + Σ_i ln(1 - p_i)        (Can you see why?)

Remember that:

    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K = x_i′β

• Substituting in using the logistic regression model:

    ln L = Σ_i y_i (x_i′β) - Σ_i ln[ 1 + exp(x_i′β) ]
• Now we choose values of β that make this equation as large as possible.
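Maximizing ln L over β is usually done numerically. A minimal sketch for one predictor, using plain gradient ascent on simulated data (the "true" coefficients (-1, 2), the learning rate, and the iteration count are all illustrative choices, not from the slides; standard software uses Newton-type methods instead):

```python
import math
import random

random.seed(0)

def expit(z):
    """p = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Simulate data from hypothetical "true" coefficients beta = (-1, 2)
xs = [random.uniform(-2, 2) for _ in range(500)]
ys = [1 if random.random() < expit(-1.0 + 2.0 * x) else 0 for x in xs]

def log_lik(b0, b1):
    """ln L = sum_i [ y_i * (x_i' beta) - ln(1 + exp(x_i' beta)) ]."""
    return sum(y * (b0 + b1 * x) - math.log(1.0 + math.exp(b0 + b1 * x))
               for x, y in zip(xs, ys))

# Gradient ascent on ln L; the gradient components are
# sum_i (y_i - p_i) and sum_i (y_i - p_i) * x_i.
b0 = b1 = 0.0
rate = 0.5
for _ in range(5000):
    resid = [y - expit(b0 + b1 * x) for x, y in zip(xs, ys)]
    b0 += rate * sum(resid) / len(xs)
    b1 += rate * sum(r * x for r, x in zip(resid, xs)) / len(xs)

# The fitted (b0, b1) should land near the values used to simulate the data.
```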
Steps:
• Observe data on the outcome, Y, and characteristics X_1, X_2, …, X_K
    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

Y is the dichotomous outcome; ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome.
Form for Predicted Probabilities
    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

    P(Y) = exp(β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K) / [ 1 + exp(β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K) ]
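The predicted-probability form translates into a few lines of code (the function name and the example coefficients are illustrative):

```python
import math

def predicted_prob(beta, x):
    """P(Y) = exp(b0 + b1*x1 + ...) / (1 + exp(b0 + b1*x1 + ...)).

    beta[0] is the intercept; beta[1:] are coefficients for predictors x.
    """
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return math.exp(z) / (1.0 + math.exp(z))

# e.g. with hypothetical coefficients beta = (0.5, -0.2) and predictor x1 = 3:
# z = 0.5 - 0.6 = -0.1, so P(Y) = exp(-0.1) / (1 + exp(-0.1)) ≈ 0.475
```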
Logistic Regression:

    ln[ P(Y) / (1 - P(Y)) ] = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K

Linear Regression:

    Y = β_0 + β_1 X_1 + β_2 X_2 + … + β_K X_K
Why not use linear regression for
dichotomous outcomes?
– Yi are independent
    odds = Pr(Y = 1) / [ 1 - Pr(Y = 1) ]  ranges from 0 to ∞
Logistic Regression Models are estimated by Maximum Likelihood

    (e^L, e^U) is a 95% CI for the odds ratio e^{β_K}, where (L, U) is a 95% CI for β_K
The Logistic Regression Model
Example:
In Assisted Reproduction Technology (ART) clinics, one of the
main outcomes is clinical pregnancy.
    ln[ Pr(pregnancy) / (1 - Pr(pregnancy)) ] = 2.67 - 0.13 × Age

    Pr(pregnancy) = exp(2.67 - 0.13 × Age) / [ 1 + exp(2.67 - 0.13 × Age) ]
The Logistic Regression Model
    ln[ Pr(pregnancy) / (1 - Pr(pregnancy)) ] = 2.67 - 0.13 × Age

    Pr(pregnancy) = exp(2.67 - 0.13 × Age) / [ 1 + exp(2.67 - 0.13 × Age) ]
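Plugging ages into this model shows the predicted pregnancy probability falling with age (the negative sign on the age coefficient is a reconstruction; the extracted equation lost its operators):

```python
import math

def pr_pregnancy(age):
    """Predicted probability from ln(odds) = 2.67 - 0.13 * Age."""
    z = 2.67 - 0.13 * age
    return math.exp(z) / (1.0 + math.exp(z))

# Predicted clinical pregnancy probability at a few ages:
for age in (25, 35, 45):
    print(age, round(pr_pregnancy(age), 3))
```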
The Logistic Regression Model
1. Likelihood Ratio test
2. Wald test
Likelihood Ratio test
• Idea is to compare the (log) Likelihoods of two models to test H_0: β_K = 0
• Two models:
1. Full model = with predictor included
2. Reduced model = without predictor
• Then,

    -2 ln( L̂_Reduced / L̂_Full ) = -2 ln L̂_Reduced - ( -2 ln L̂_Full )

    ~ χ² with df = # of extra parameters in the full model

    (here df = 1; critical value χ²_1 = 3.84 for α = 0.05)
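The test statistic is just a difference of -2 log likelihoods. A sketch with hypothetical maximized log likelihoods (the numbers are made up for illustration):

```python
# Hypothetical maximized log likelihoods for two nested models
ll_reduced = -240.1   # so -2 ln L_reduced = 480.2
ll_full = -235.6      # so -2 ln L_full   = 471.2

# LR statistic: -2 ln L_reduced - (-2 ln L_full)
lr_stat = -2.0 * ll_reduced - (-2.0 * ll_full)
print(round(lr_stat, 1))   # 9.0

# Compare against the chi-square critical value with df = 1 (3.84 at alpha = 0.05)
print(lr_stat > 3.84)      # True -> reject H0: beta_K = 0
```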
Wald test
• Idea is to use the large-sample Z statistic from a single model to test H_0: β_K = 0

    Here, Z = β̂_K / SE(β̂_K), where Z ~ N(0, 1)
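The Wald test needs only the coefficient estimate and its standard error from one fitted model (the numbers below are hypothetical):

```python
# Hypothetical estimate and standard error for a single coefficient
beta_hat = 0.92
se = 0.31

# Wald Z statistic: estimate divided by its standard error
z = beta_hat / se
print(round(z, 2))      # 2.97

# |z| > 1.96 indicates significance at the 0.05 level under Z ~ N(0, 1)
print(abs(z) > 1.96)    # True
```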
[Figure: proportion of CHD by Age Group]
Logistic regression
    e^{β_1} = odds ratio of CHD for persons taking the "New Drug" vs. "Placebo"
Logistic Regression
• We can add multiple predictor variables in modeling
the log odds of getting CHD:
    log[ Pr(CHD_i) / Pr(no CHD_i) ] = β_0 + β_1 Drug_i + β_2 (Age_i - 45)

    Drug_i = 1 if Drug, 0 if Placebo, for person i
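With the age term held fixed, the drug-vs-placebo odds ratio is exp(β_1) regardless of age. A sketch with hypothetical coefficient values (the slides leave them blank):

```python
import math

# Hypothetical fitted coefficients, for illustration only
b0, b1, b2 = -2.0, 0.8, 0.05

def log_odds_chd(drug, age):
    """log odds of CHD = b0 + b1 * Drug + b2 * (Age - 45)."""
    return b0 + b1 * drug + b2 * (age - 45)

# Difference in log odds between drug and placebo at the same age is b1,
# so the odds ratio is exp(b1), whatever age we pick:
or_drug = math.exp(log_odds_chd(1, 50) - log_odds_chd(0, 50))
print(round(or_drug, 3))   # exp(0.8) ≈ 2.226
```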
Interpretation for Drug
    β̂_1 = ?    e^{β̂_1} = ?    SE(β̂_1) = ?

    z = β̂_1 / SE(β̂_1) = ?
• Conclusion: strong evidence that the odds of CHD are associated with the drug.
Suggested exercises
• No need to hand in
Looking ahead