Lecture 3: Logistic Regression (6-5-08)


HSRP 734: Advanced Statistical Methods
June 5, 2008
Introduction

• Categorical data analysis

– multinomial

– 2x2 and RxC analysis

– 2x2xK, RxCxK analysis

• Stratified analysis (CMH) considers the problem of controlling for other variables
Introduction

• Need to extend to scientific questions of higher dimension.

• When the number of potential covariates increases, traditional methods of contingency table analysis become limited.

• One alternative approach to stratified analyses is the development of regression models that incorporate covariates and interactions among variables.
Introduction

• Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous

• General theory: analysis of variance (ANOVA) and logistic regression are both special cases of the generalized linear model (GLM)
OBJECTIVES

• To describe what simple and multiple logistic regression are and how to perform them

• To describe maximum likelihood techniques to fit logistic regression models

• To describe likelihood ratio and Wald tests

OBJECTIVES
• To describe how to interpret odds ratios for logistic regression with categorical and continuous predictors

• To describe how to estimate and interpret predicted probabilities from logistic models

• To describe how to do the above 5 using SAS Enterprise
What is Logistic Regression?

• In a nutshell:

A statistical method used to model dichotomous or binary outcomes (but not limited to them) using predictor variables.

Used when the research question focuses on whether or not an event occurred, rather than when it occurred (time course information is not used).
What is Logistic Regression?

• What is the “Logistic” component?

Instead of modeling the outcome, Y, directly, the method models the log odds(Y) using the logistic function.
What is Logistic Regression?

• What is the “Regression” component?

Methods used to quantify the association between an outcome and predictor variables. Can also be used to build predictive models as a function of the predictors.
What is Logistic Regression?

[Scatter plot: Length of Stay (days) vs. Age (yrs.)]
What is Logistic Regression?

[Plot: CHD 100-day mortality (Died=1, Alive=0) vs. Age (yrs.), with a fitted logistic curve]

[Figure: logistic regression curves from a published dose-finding study]
Fig 1. Logistic regression curves for the three drug combinations. The dashed reference line represents the probability of DLT of .33. The estimated MTD can be obtained as the value on the horizontal axis that coincides with a vertical line drawn through the point where the dashed line intersects the logistic curve. Taken from “Parallel Phase I Studies of Daunorubicin Given With Cytarabine and Etoposide With or Without the Multidrug Resistance Modulator PSC-833 in Previously Untreated Patients 60 Years of Age or Older With Acute Myeloid Leukemia: Results of Cancer and Leukemia Group B Study 9420,” Journal of Clinical Oncology, Vol 17, Issue 9 (September), 1999: 2831. http://www.jco.org/cgi/content/full/17/9/2831
What can we use Logistic Regression for?

• To estimate adjusted prevalence rates, adjusted for potential confounders (sociodemographic or clinical characteristics)

• To estimate the effect of a treatment on a dichotomous outcome, adjusted for other covariates

• To explore how well characteristics predict a categorical outcome
History of Logistic Regression

• The logistic function was invented in the 19th century to describe the growth of populations and the course of autocatalytic chemical reactions.

• Quetelet and Verhulst

• Population growth was most easily described by exponential growth, but that led to impossible values
History of Logistic Regression

• The logistic function was the solution to a differential equation that arose from attempts to dampen exponential population growth models.
History of Logistic Regression

• Published in 3 different papers around the 1840s. The first paper showed how the logistic models agreed very well with the actual course of the populations of France, Belgium, Essex, and Russia for periods up to the early 1830s.
The Logistic Curve
The logit transform maps a probability p to a log odds z, and the logistic function maps z back to p:

$\mathrm{LOGIT}(p) = \ln\left(\frac{p}{1-p}\right) = z$

$p = \frac{\exp(z)}{1+\exp(z)}$

(The logistic curve plots p, the probability, against z, the log odds.)
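A quick numeric check that these two maps invert each other (a minimal Python sketch; the course software is SAS Enterprise, so this is purely illustrative):

```python
import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))

def expit(z):
    """Logistic function: maps a log odds z back to a probability."""
    return math.exp(z) / (1 + math.exp(z))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), expit(logit(p)))  # round-trips back to p
```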
Logistic Regression

• Simple logistic regression = logistic regression with 1 predictor variable

• Multiple logistic regression = logistic regression with multiple predictor variables

• Multiple logistic regression = Multivariable logistic regression = Multivariate logistic regression
The Logistic Regression Model

Logistic Regression:

$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$

Linear Regression:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \varepsilon$
The Logistic Regression Model
$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$

Y is the dichotomous outcome and the X's are the predictor variables; $\ln\left(\frac{P(Y)}{1-P(Y)}\right)$ is the log(odds) of the outcome.
The Logistic Regression Model

 P Y  
ln    0  1 X 1   2 X 2     K X K
 1-P Y  
 
intercept model coefficients

 PY  
ln  is the log(odds) of the outcome.
 1  PY 
Logistic Regression uses Odds Ratios

• Does not model the outcome directly, which would lead to effect estimates quantified by means (i.e., differences in means)

• Estimates of effect are instead quantified by “Odds Ratios”
Relationship between
Odds & Probability

Probability event 
Odds event  =
1-Probability event 

Odds event 
Probability event  
1+Odds event 
The Odds Ratio

Definition of Odds Ratio: the ratio of two odds estimates.

So, if Pr(response | trt) = 0.40 and Pr(response | placebo) = 0.20, then:

$\mathrm{Odds(response \mid trt\ group)} = \frac{0.40}{1-0.40} = 0.667$

$\mathrm{Odds(response \mid placebo\ group)} = \frac{0.20}{1-0.20} = 0.25$

$\mathrm{OR(Trt\ vs.\ Placebo)} = \frac{0.667}{0.25} = 2.67$
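The same worked example as a few lines of Python (a sketch for checking the arithmetic; the course itself uses SAS Enterprise):

```python
# Odds and odds ratio for the slide's treatment vs. placebo example
p_trt, p_plb = 0.40, 0.20

odds_trt = p_trt / (1 - p_trt)       # 0.667
odds_plb = p_plb / (1 - p_plb)       # 0.25

odds_ratio = odds_trt / odds_plb
print(round(odds_ratio, 2))          # 2.67
```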
Interpretation of the Odds Ratio
• Example cont’d:

Outcome = response, $OR_{\mathrm{trt\ vs.\ plb}} = 2.67$

Then, the odds of a response in the treatment group were estimated to be 2.67 times the odds of a response in the placebo group.

Alternatively, the odds of a response were 167% higher in the treatment group than in the placebo group.
Odds Ratio vs. Relative Risk

• An Odds Ratio of 2.67 for trt. vs. placebo does NOT mean that the outcome is 2.67 times as LIKELY to occur.

• It DOES mean that the ODDS of the outcome occurring are 2.67 times as high for trt. vs. placebo.
Odds Ratio vs. Relative Risk

• The Odds Ratio is NOT mathematically equivalent to the Relative Risk (Risk Ratio)

• However, for “rare” events, the Odds Ratio can approximate the Relative Risk (RR):

$OR = RR \times \frac{1-P(\mathrm{response} \mid \mathrm{plb})}{1-P(\mathrm{response} \mid \mathrm{trt})}$

When both probabilities are small, the correction factor is close to 1, so OR ≈ RR.
 
Maximum Likelihood
Idea of Maximum Likelihood

• Flipped a fair coin 10 times:

T, H, H, T, T, H, H, T, H, H

• What is the Pr(Heads) given the data?

1/100? 1/5? 1/2? 6/10?

• Did you do the home experiment?

T, H, H, T, T, H, H, T, H, H

• What is the Pr(Heads) given the data?

• The most reasonable data-based estimate would be 6/10.

• In fact, $\hat{p} = \frac{X}{N} = \frac{\#\ \mathrm{of\ heads}}{\mathrm{total}\ \#\ \mathrm{of\ flips}}$ is the ML estimator of p.
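To see why 6/10 maximizes the likelihood, evaluate the binomial likelihood over a grid of candidate p values (a minimal Python sketch):

```python
# Likelihood of 6 heads and 4 tails in 10 flips, over a grid of p values.
# The maximum lands at p = 0.6, matching the X/N estimator.
candidates = [i / 100 for i in range(1, 100)]
likelihoods = [p**6 * (1 - p)**4 for p in candidates]
best_p = max(zip(likelihoods, candidates))[1]
print(best_p)  # 0.6
```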
Maximum Likelihood

• The method of maximum likelihood estimation chooses values for parameter estimates (regression coefficients) which make the observed data “maximally likely.”

• Standard errors are obtained as a by-product of the maximization process
The Logistic Regression Model

 P Y  
ln    0  1 X 1   2 X 2     K X K
 1-P Y  
 
intercept model coefficients

 PY  
ln  is the log(odds) of the outcome.
 1  PY 
Maximum Likelihood

• We want to choose β’s that maximize the probability of observing the data we have:

$L = \Pr(y_1, y_2, \ldots, y_N) = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_N) = \prod_{i=1}^{N} \Pr(y_i)$

Assumption: independent y’s

Maximum Likelihood
• Define p = Pr(y = 1). Then, for a dichotomous outcome, Pr(y = 0) = 1 − Pr(y = 1) = 1 − p. Then:

$\Pr(y) = p^{y}(1-p)^{1-y}$

For $y = 1$: $\Pr(1) = p^{1}(1-p)^{0} = p$

For $y = 0$: $\Pr(0) = p^{0}(1-p)^{1} = 1-p$
• So, given that $\Pr(y) = p^{y}(1-p)^{1-y}$:

$L = \prod_{i=1}^{N} \Pr(y_i) = \prod_{i=1}^{N} p_i^{y_i}(1-p_i)^{1-y_i} = \prod_{i=1}^{N} \left(\frac{p_i}{1-p_i}\right)^{y_i}(1-p_i)$
• Taking the logarithm of both sides (can you see why?):

$\ln L = \sum_i y_i \ln\left(\frac{p_i}{1-p_i}\right) + \sum_i \ln(1-p_i)$

Remember that:

$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K = x_i'\beta$

• Substituting in using the logistic regression model:

$\ln L = \sum_i y_i\, x_i'\beta - \sum_i \ln\left(1 + \exp(x_i'\beta)\right)$
• Now we choose the values of β that make this equation as large as possible.

• Maximizing the lnL maximizes L

• Maximizing involves derivatives & iteration
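The iteration the slide refers to is typically Newton–Raphson (equivalently, iteratively reweighted least squares). Below is a minimal Python sketch of that maximization on simulated data; the function name, the dataset, and the true coefficient values are all made up for illustration:

```python
import numpy as np

def logistic_mle(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X: (N, K+1) design matrix with an intercept column; y: (N,) 0/1 outcomes.
    Returns coefficient estimates and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))     # fitted probabilities
        W = p * (1 - p)                     # Bernoulli variances
        score = X.T @ (y - p)               # gradient of lnL
        info = X.T @ (X * W[:, None])       # information matrix (-Hessian)
        beta = beta + np.linalg.solve(info, score)
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # SEs come as a by-product
    return beta, se

# Illustrative use on simulated data (true intercept -4, true slope 0.08)
rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 500)
y = rng.binomial(1, 1 / (1 + np.exp(-(-4 + 0.08 * age))))
X = np.column_stack([np.ones_like(age), age])
print(logistic_mle(X, y))  # estimates should land near (-4, 0.08)
```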


Maximum Likelihood
• The method of maximum likelihood estimation chooses values for parameter estimates which make the observed data “maximally likely.”

• ML estimators have great properties:

– Consistent (converge to the true β’s)

– Asymptotically efficient (narrow CI’s)

– Asymptotically normally distributed (can calculate CI’s and test statistics using familiar Z formulas)
Estimating a Logistic Regression Model

Steps:

• Observe data on the outcome, Y, and characteristics X1, X2, …, XK

• Estimate the model coefficients using ML

• Perform inference: calculate confidence intervals, odds ratios, etc.
The Logistic Regression Model
$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$

Y is the dichotomous outcome and the X's are the predictor variables; $\ln\left(\frac{P(Y)}{1-P(Y)}\right)$ is the log(odds) of the outcome.
The Logistic Regression Model

 P Y  
ln    0  1 X 1   2 X 2     K X K
 1-P Y  
 
intercept model coefficients

 PY  
ln  is the log(odds) of the outcome.
 1  PY 
Form for Predicted Probabilities
 P Y  
ln    0  1 X 1   2 X 2     K X K
 1-P Y  
 

exp   0  1 X 1   2 X 2     K X K 
P Y  
1  exp   0  1 X 1   2 X 2     K X K 

In this latter form, the logistic regression model directly


relates the probability of Y to the predictor variables.
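As a sketch, that inverse-logit computation in Python (the function name is illustrative; the coefficient and covariate values would come from a fitted model):

```python
import math

def predicted_prob(intercept, coefs, xs):
    """P(Y) from a fitted logistic model: inverse logit of the linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return math.exp(z) / (1 + math.exp(z))

# e.g., a hypothetical model with two predictors
print(predicted_prob(-2.0, [0.5, 1.2], [1, 0.8]))  # about 0.37
```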
The Logistic Regression Model

Logistic Regression:

$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$

Linear Regression:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \varepsilon$
Why not use linear regression for
dichotomous outcomes?

• If we model Y directly and Y is dichotomous, this necessarily violates the linear regression assumptions (homoscedasticity)

• One of the more intuitive reasons: we will end up with predicted values other than 0 or 1 (possibly even outside the range [0, 1]).
Assumptions in logistic regression

• Assumptions in logistic regression

– Yi are from a Bernoulli or binomial(ni, πi) distribution

– Yi are independent

– The log odds of P(Yi = 1), logit P(Yi = 1), is a linear function of the covariates
• Relationships among probability, odds and log odds

Measure                                          Min    Max    Name
$\Pr(Y=1)$                                       0      1      prob
$\frac{\Pr(Y=1)}{1-\Pr(Y=1)}$                    0      ∞      odds
$\log\left(\frac{\Pr(Y=1)}{1-\Pr(Y=1)}\right)$   −∞     ∞      log odds
Commonality between
linear and logistic regression

• Operating on the logit scale allows a linear model, similar to linear regression, to be applied

• Both linear and logistic regression are part of the family of Generalized Linear Models (GLMs)
Logistic Regression is a
Generalized Linear Model (GLM)
• Family of regression models that use the same general framework

• The outcome variable determines the choice of model

Outcome        GLM Model
Continuous     Linear regression
Dichotomous    Logistic regression
Counts         Poisson regression
Logistic Regression Models are
estimated by Maximum Likelihood

• Using this estimation gives model coefficient estimates that are asymptotically consistent, efficient, and normally distributed.

• Thus, a 95% Confidence Interval for $\beta_K$ is given by:

$\hat{\beta}_K \pm z_{\alpha/2} \cdot SE(\hat{\beta}_K) = (\beta_L, \beta_U)$
Logistic Regression Models are
estimated by Maximum Likelihood

• The Odds Ratio for the kth model coefficient is:

$OR = \exp(\hat{\beta}_K)$

• We can also get a 95% CI for the OR from:

$\left(e^{\beta_L}, e^{\beta_U}\right)$, where $(\beta_L, \beta_U)$ is a 95% CI for $\beta_K$
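Turning a coefficient and its standard error into an OR with a 95% CI (a Python sketch; the numbers are hypothetical, not from a fitted course model):

```python
import math

beta_hat, se = 0.9808, 0.25          # hypothetical estimate and SE
z = 1.96                             # 95% two-sided critical value

lo, hi = beta_hat - z * se, beta_hat + z * se
print(f"OR = {math.exp(beta_hat):.2f}, "
      f"95% CI = ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
# OR = 2.67, 95% CI = (1.63, 4.35)
```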
The Logistic Regression Model
Example:
In Assisted Reproduction Technology (ART) clinics, one of the main outcomes is clinical pregnancy.

There is much empirical evidence that the candidate mother’s age is a significant factor that affects the chances of pregnancy success.

A recent study examined the effect of the mother’s age, along with clinical characteristics, on the odds of pregnancy success on the first ART attempt.
The Logistic Regression Model

 Pr pregnancy 
ln  2.67  0.13  Age
 1  Pr pregnancy

exp2.67  0.13  Age
Pr pregnancy 
1  exp2.67  0.13  Age
The Logistic Regression Model
 Pr pregnancy 
ln  2.67  0.13  Age
 1  Pr pregnancy

Q1. What is the effect of Age on Pregnancy?



A. The OR Age exp 0.13 0.88
This implies that for every 1 yr. increase in age, the
odds of pregnancy decrease by 12%.
The Logistic Regression Model

Q2. What is the predicted probability of a 25 yr. old having pregnancy success with the first ART attempt?
The Logistic Regression Model

 Pr pregnancy 
ln  2.67  0.13  Age
 1  Pr pregnancy

exp2.67  0.13  Age
Pr pregnancy 
1  exp2.67  0.13  Age
The Logistic Regression Model

Q2. What is the predicted probability of a 25 yr. old having pregnancy success with the first ART attempt?

$\Pr(\mathrm{pregnancy}) = \frac{\exp(2.67 - 0.13 \times 25)}{1 + \exp(2.67 - 0.13 \times 25)} = 0.359$

A. From this model, a 25 yr. old has about a 36% chance of pregnancy success.
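The same arithmetic as a quick Python check (using the slide's fitted ART coefficients):

```python
import math

def pr_pregnancy(age):
    """Predicted probability from the slide's ART model."""
    z = 2.67 - 0.13 * age                  # log odds at the given age
    return math.exp(z) / (1 + math.exp(z))

print(round(pr_pregnancy(25), 3))  # 0.359
print(round(math.exp(-0.13), 2))   # OR per year of age: 0.88
```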
Hypothesis testing

• Usually interested in testing $H_0: \beta_K = 0$

• Two types of tests we’ll discuss:

1. Likelihood Ratio test

2. Wald test
Likelihood Ratio test
• Idea is to compare the (log) likelihoods of two models to test $H_0: \beta_K = 0$

• Two models:
1. Full model = with the predictor included
2. Reduced model = without the predictor

• Then,

$-2\ln\left(\frac{\hat{L}_{\mathrm{Reduced}}}{\hat{L}_{\mathrm{Full}}}\right) = -2\ln\hat{L}_{\mathrm{Reduced}} - \left(-2\ln\hat{L}_{\mathrm{Full}}\right) \sim \chi^2$, with df = # of extra parameters in the full model

(here df = 1; critical $\chi^2_1 = 3.84$ for $\alpha = 0.05$)
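Computing the LR statistic and its p-value from two fitted log-likelihoods (a Python sketch; the log-likelihood values are hypothetical):

```python
from scipy.stats import chi2

lnL_reduced, lnL_full = -235.8, -229.4   # hypothetical fitted log-likelihoods

lr_stat = -2 * lnL_reduced - (-2 * lnL_full)   # 12.8
p_value = chi2.sf(lr_stat, df=1)               # one extra parameter
print(lr_stat, p_value)   # reject H0 at alpha = 0.05 since 12.8 > 3.84
```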
Wald test
• Idea is to use the large-sample Z statistic from a single model to test $H_0: \beta_K = 0$

Here, $Z = \frac{\hat{\beta}_K}{SE(\hat{\beta}_K)}$, where $Z \sim N(0, 1)$

• Critical Z value for α = 0.05 is 1.96 (two-sided)
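The corresponding computation (a Python sketch with hypothetical estimates, matching the OR example above):

```python
from scipy.stats import norm

beta_hat, se = 0.9808, 0.25        # hypothetical estimate and SE
z = beta_hat / se
p_value = 2 * norm.sf(abs(z))      # two-sided p-value
print(round(z, 2), p_value)        # |z| = 3.92 > 1.96 => reject H0
```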


Hypothesis testing

• As the sample size gets larger and larger, the Wald test will approximate the Likelihood Ratio test.

• The LR test is preferred, but the Wald test is common

• Why? Not to scald the Wald but…
Predictive ability of Logistic regression

• Generalized R-squared statistics are controversial

• The ROC curve plots Sensitivity vs. 1−Specificity based on the fitted model

• c statistic = the area under the ROC curve, commonly used to summarize the predictive ability of a model
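Computing a c statistic from observed outcomes and fitted probabilities (a Python sketch with made-up values; scikit-learn's roc_auc_score is one common implementation):

```python
from sklearn.metrics import roc_auc_score

# Illustrative values only: observed 0/1 outcomes and model-fitted probabilities
y_observed = [0, 0, 1, 0, 1, 1, 0, 1]
p_fitted   = [0.1, 0.5, 0.7, 0.4, 0.8, 0.3, 0.2, 0.6]

c_stat = roc_auc_score(y_observed, p_fitted)  # area under the ROC curve
print(c_stat)  # 0.875 here; 0.5 = no discrimination, 1.0 = perfect
```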
SAS Enterprise:
chd.sas7bdat
Logistic Regression
• Motivating example

• Consider the problem of exploring how the risk for coronary heart disease (CHD) changes as a function of age.

• How would you test whether age is associated with CHD?

• What does a scatter plot of CHD and age look like?


[Plot: Proportion of CHD by Age Group]
Logistic regression

Taking the exp of β1 gives the odds ratio:

$\exp(\beta_1) = \frac{\mathrm{odds\ of\ CHD\ taking\ Drug}}{\mathrm{odds\ of\ CHD\ taking\ Placebo}}$

$e^{\beta_1}$ = the odds ratio of CHD for persons taking the “New Drug” vs. “Placebo”
Logistic Regression
• We can add multiple predictor variables in modeling the log odds of getting CHD:

$\log\left(\frac{\Pr(\mathrm{CHD}_i)}{\Pr(\mathrm{no\ CHD}_i)}\right) = \beta_0 + \beta_1 \mathrm{Drug}_i + \beta_2 (\mathrm{Age}_i - 45)$

$\mathrm{Age}_i$ = Person i’s age in years

$\mathrm{Drug}_i = \begin{cases} 1, & \mathrm{if\ Drug} \\ 0, & \mathrm{if\ Placebo} \end{cases}$ for Person i
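The course fits this model in SAS Enterprise; as a sketch of the same fit in Python's statsmodels, on simulated data (the dataset and the true coefficient values are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data, for illustration only
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({"drug": rng.integers(0, 2, n),
                   "age": rng.uniform(30, 60, n)})
true_logit = -0.5 - 0.8 * df["drug"] + 0.10 * (df["age"] - 45)
df["chd"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# I(age - 45) reproduces the slide's centered-age parameterization
fit = smf.logit("chd ~ drug + I(age - 45)", data=df).fit()
print(np.exp(fit.params))  # odds ratios: drug effect and per-year age effect
```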
Interpretation for Drug

• Interpretation of the coefficient β1 when there is more than one variable:

– the log odds ratio when the other variables are held constant

– e.g., the log odds ratio between having CHD with and without the drug, adjusting for age
Interpretation for Age

• Interpretation of the coefficient β2 when there is more than one variable

• Interpretation of the coefficient β2 for a continuous covariate:

– e.g., the log odds ratio for a 1-year change in age (a unit difference in the covariate), adjusting for drug
• Maximum Likelihood Estimates:

$\hat{\beta}_1 = ?$    $e^{\hat{\beta}_1} = ?$    $SE(\hat{\beta}_1) = ?$

$z = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = ?$

95% CI for $\beta_1$?    95% CI for $e^{\beta_1}$?

• Conclusion:
Strong evidence that the odds of CHD are associated with the drug.
Likelihood Ratio test
• Idea is to compare the (log) likelihoods of two models to test $H_0: \beta_K = 0$

• Two models:
1. Full model = with the predictor included
2. Reduced model = without the predictor

• Then,

$-2\ln\left(\frac{\hat{L}_{\mathrm{Reduced}}}{\hat{L}_{\mathrm{Full}}}\right) = -2\ln\hat{L}_{\mathrm{Reduced}} - \left(-2\ln\hat{L}_{\mathrm{Full}}\right) \sim \chi^2$, with df = # of extra parameters in the full model

(here df = 1; critical $\chi^2_1 = 3.84$ for $\alpha = 0.05$)
Suggested exercises

• Read Kleinbaum Chapters 1, 2, 3 (Detailed Outline)

• Chapter 1 in Kleinbaum & Klein: Practice Exercises (can check answers)

• No need to hand in
Looking ahead

• HW 3: Due June 12th

• Next 2 classes: Model Building, Diagnostics & Extensions

• Review & Exam 1
