Chapter 6


Chapter 6: Generalized Linear Models (GLMs) for Count Data: Poisson Regression and Loglinear Models
By: Ashenafi A.
Overview
➔ Introduction to Poisson regression
➔ Estimation and testing
➔ Count Regression for Rate Data
➔ Loglinear Models for Two-Way and Three-Way Tables
➔ Loglinear Model of Independence
➔ Interpretation of Parameters in Independence Model
➔ Saturated Model
Poisson regression
➢ Many discrete response variables have counts as possible
outcomes. An example is Y = number of parties attended in the
past month, for a sample of students.
➢ Counts also occur in summarizing categorical variables with
contingency tables.
➢ This section introduces GLMs for count data.
➢ The simplest GLMs for count data assume a Poisson
distribution for the random component. Like counts, Poisson
variates can take any nonnegative integer value.
Poisson Model
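The Maximum Likelihood Estimator
➔ We observe data $\{(x_i, y_i) \mid 1 \le i \le n\}$. The number $y_i$ is a realization of the random variable $Y_i$, which is Poisson distributed with mean $\mu_i$. Using independence of the observations, the total log-likelihood is

$$\log L(\beta) = \sum_{i=1}^{n} \left[ y_i \log \mu_i - \mu_i - \log(y_i!) \right],$$

where $\mu_i = \exp(\beta^{T} x_i)$. Writing $\log L(\beta)$ as shorthand for the total log-likelihood, it follows that

$$\log L(\beta) = \sum_{i=1}^{n} \left[ y_i \beta^{T} x_i - \exp(\beta^{T} x_i) - \log(y_i!) \right].$$

The maximum likelihood (ML) estimator is then defined as

$$\hat{\beta} = \arg\max_{\beta} \log L(\beta).$$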

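The definition of the ML estimator can be checked numerically. The following is a minimal sketch, assuming simulated data (the sample size, the covariate grid and the values in `beta_true` are hypothetical and not taken from these slides). It maximizes the Poisson log-likelihood directly with `optim()` and compares the result with `glm()`.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
n <- 50
x <- seq(0, 2, length.out = n)
X <- cbind(1, x)                       # design matrix with an intercept column
beta_true <- c(0.5, 0.8)               # assumed "true" coefficients
y <- rpois(n, lambda = exp(drop(X %*% beta_true)))

# Negative Poisson log-likelihood: -sum(y_i * eta_i - exp(eta_i) - log(y_i!))
neg_loglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - exp(eta) - lgamma(y + 1))
}

# Gradient of the negative log-likelihood: -X^T (y - mu)
neg_score <- function(beta) {
  mu <- exp(drop(X %*% beta))
  -drop(crossprod(X, y - mu))
}

# Direct numerical maximization, started from an intercept-only guess
start <- c(log(mean(y)), 0)
fit_optim <- optim(start, neg_loglik, neg_score, method = "BFGS",
                   control = list(reltol = 1e-12))

# Reference fit using R's built-in Fisher scoring
fit_glm <- glm(y ~ x, family = poisson(link = "log"))

beta_optim <- fit_optim$par
beta_glm <- unname(coef(fit_glm))
print(rbind(optim = beta_optim, glm = beta_glm))
```

The two rows should agree closely, since both approaches maximize the same log-likelihood. `glm()` uses Fisher scoring rather than BFGS, so small numerical differences are expected.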

Poisson regression
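❖ For a single explanatory variable x, the Poisson loglinear
model has form

$$\log \mu = \alpha + \beta x.$$

The mean satisfies the exponential relationship $\mu = \exp(\alpha + \beta x) = e^{\alpha}(e^{\beta})^{x}$.

A one-unit increase in x has a multiplicative impact of $e^{\beta}$ on μ: the mean at x + 1 equals the mean at x multiplied by $e^{\beta}$.

Interpretation of β (continued)
If β = 0, then $e^{\beta} = 1$ and the mean does not change as x changes. If β > 0, the mean increases as x increases; if β < 0, the mean decreases as x increases.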
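Example: Number of Deaths Due to AIDS
The number of deaths due to AIDS per 3-month period from January 1983 to
June 1986.
Number of Deaths Due to AIDS by 3-Month Period
# y: deaths in each 3-month period; x: period index (1 = January to March 1983)
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
x <- 1:14
plot(x, y, col = "red", xlab = "3-month period", ylab = "Number of deaths")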
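A Linear Model for AIDS Data
Let’s try a linear model:
The estimated parameters from a GLM with the log link are shown below. The call uses the quasipoisson family, which gives the same coefficient estimates as the Poisson family but scales the standard errors by an estimated dispersion parameter: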
R-code
> poisson.model<-glm(y ~ x, family = quasipoisson(link = "log"))
> summary(poisson.model)
Call:
glm(formula = y ~ x, family = quasipoisson(link = "log"))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.33963    0.38946   0.872      0.4
x            0.25652    0.03417   7.507 7.17e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 2.403942)

Null deviance: 207.272 on 13 degrees of freedom
Residual deviance: 29.654 on 12 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
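The relationship between the Poisson and quasi-Poisson fits of the AIDS data can be sketched as follows. This is an illustrative check rather than part of the original analysis; the object names `fit_pois`, `fit_quasi` and `dispersion` are introduced here. It computes the dispersion estimate as the Pearson chi-squared statistic divided by the residual degrees of freedom and shows how it rescales the standard errors.

```r
# AIDS deaths per 3-month period, January 1983 to June 1986
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
x <- 1:14

fit_pois  <- glm(y ~ x, family = poisson(link = "log"))
fit_quasi <- glm(y ~ x, family = quasipoisson(link = "log"))

# Dispersion estimate: Pearson chi-squared statistic / residual degrees of freedom
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]

print(coef(fit_pois))              # same estimates as the quasi-Poisson fit
print(dispersion)                  # matches the dispersion reported by summary(fit_quasi)
print(se_quasi / se_pois)          # each ratio equals sqrt(dispersion)
print(exp(coef(fit_pois)["x"]))    # multiplicative change in the mean per 3-month period
```

With the slope estimate 0.25652 from the output above, exp(0.25652) is about 1.29, so the estimated mean number of deaths is multiplied by roughly 1.29 from one 3-month period to the next.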
Example
The number of awards earned by students at one high school.
Predictors of the number of awards earned include the type of
program in which the student was enrolled (e.g., vocational,
general or academic) and the score on their final exam in math.
Data are available at: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/poisson_sim.sav
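> # poisson_sim is assumed to be already loaded from the .sav file above
> p <- poisson_sim
> hist(p$num_awards, col = "yellow", border = "blue",
+      xlim = c(0, 8), ylim = c(0, 200), breaks = 7,
+      main = "Histogram", xlab = "Number of awards")
Example (continued)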
> summary(m1 <- glm(num_awards ~ prog + math, family="poisson", data=p))

Call:
glm(formula = num_awards ~ prog + math, family = "poisson", data = p)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2043 -0.8436 -0.5106 0.2558 2.6796
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.24712 0.65845 -7.969 1.60e-15 ***
progAcademic 1.08386 0.35825 3.025 0.00248 **
progVocational 0.36981 0.44107 0.838 0.40179
math 0.07015 0.01060 6.619 3.63e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 287.67 on 199 degrees of freedom
Residual deviance: 189.45 on 196 degrees of freedom
AIC: 373.5
Number of Fisher Scoring iterations: 6
Interpretation
log(E[num_awards]) = Intercept + b1(progAcademic) +
b2(progVocational) + b3(math)
E[num_awards] = exp(Intercept) * exp(b1(progAcademic)) *
exp(b2(progVocational)) * exp(b3(math))
The output above indicates that the expected number of awards for
[prog=Academic] is exp(1.08386) = 2.956 times that for the reference
group, General, holding math constant. Likewise, the expected number of
awards for [prog=Vocational] is exp(0.36981) = 1.45 times that for the
General group, holding the other variables constant, although this
difference is not statistically significant (p = 0.40). Each one-unit
increase in math multiplies the expected number of awards by
exp(0.07015) = 1.07, an increase of about 7%.
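The multiplicative effects quoted in this interpretation can be reproduced from the reported estimates. The sketch below copies the coefficients from the summary(m1) output rather than refitting the model; the names `est`, `rate_ratios` and `percent_change` are introduced here for illustration.

```r
# Coefficient estimates copied from the summary(m1) output above
est <- c(progAcademic = 1.08386, progVocational = 0.36981, math = 0.07015)

# Exponentiating a log-link coefficient gives a multiplicative effect on the mean
rate_ratios <- exp(est)

# The same effects expressed as percent changes in the expected count
percent_change <- 100 * (rate_ratios - 1)

print(round(rate_ratios, 3))
print(round(percent_change, 1))
```

When the fitted object `m1` is available, `exp(coef(m1))` gives the same ratios directly.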
Loglinear Models

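Consider an I × J contingency table that cross-classifies n subjects. When the responses are statistically independent, the joint cell probabilities $\{\pi_{ij}\}$ are determined by the row and column marginal probabilities,

$$\pi_{ij} = \pi_{i+}\pi_{+j}, \quad i = 1, \ldots, I, \quad j = 1, \ldots, J.$$

Loglinear models use the expected frequencies $\mu_{ij} = n\pi_{ij}$ rather than the probabilities. Under independence, $\mu_{ij} = n\pi_{i+}\pi_{+j}$ for all i and j.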


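Loglinear Model of Independence for Two-Way Table
➔ Denote the row variable by X and the column variable by Y. The condition of independence, $\mu_{ij} = n\pi_{i+}\pi_{+j}$, is multiplicative.

➔ Taking logs gives an additive relation, $\log \mu_{ij} = \log n + \log \pi_{i+} + \log \pi_{+j}$. Thus, independence has the form

$$\log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y}$$

for a row effect $\lambda_i^{X}$ and a column effect $\lambda_j^{Y}$.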
Loglinear Model of Independence for Two-Way Table: Example
Example: Belief in Life after Death
In the 2000 General Social Survey, subjects were asked whether they believed in life after death. The number who answered “yes” was 1339 of the 1639 whites, 260 of the 315 blacks, and 88 of the 110 classified as “other” on race. The table shows the observed counts together with the fitted values from the independence loglinear model for the 3 × 2 table. The model fits well. For the constraints used, $\hat{\lambda}_1^{Y} = 1.50$ and $\hat{\lambda}_2^{Y} = 0$. Under the independence model the log odds of a “yes” response, $\log(\mu_{i1}/\mu_{i2}) = \lambda_1^{Y} - \lambda_2^{Y}$, do not depend on race, so the estimated odds of belief in life after death were exp(1.50) = 4.5 for each race.
Belief in Life after Death (observed counts, with fitted values under independence in parentheses)

Race     Yes               No              Total
White    1339 (1339.63)    300 (299.37)    1639
Black     260 (257.46)      55 (57.54)      315
Other      88 (89.91)       22 (20.09)      110
Total    1687              377             2064
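The independence fit for this table can be sketched as a Poisson GLM on the six cell counts. This is an illustrative reconstruction, not necessarily the software used for the slides. The data frame `counts` and the choice of “No” as the baseline response (so that the “No” parameter is zero) are assumptions made to match the constraints described in the text.

```r
# Observed 3 x 2 table in long format, one row per cell
counts <- data.frame(
  race   = factor(rep(c("White", "Black", "Other"), each = 2),
                  levels = c("White", "Black", "Other")),
  belief = factor(rep(c("Yes", "No"), times = 3),
                  levels = c("No", "Yes")),   # "No" is the baseline level
  count  = c(1339, 300, 260, 55, 88, 22)
)

# Independence model: log(mu_ij) = lambda + lambda_i^X + lambda_j^Y
indep <- glm(count ~ race + belief, family = poisson(link = "log"), data = counts)

# Fitted values equal (row total * column total) / n
counts$fitted <- fitted(indep)
print(counts)

# Column effect for "Yes" and the implied odds of belief, common to all races
lambda_yes <- unname(coef(indep)["beliefYes"])
print(lambda_yes)
print(exp(lambda_yes))

# Likelihood-ratio goodness-of-fit statistic and its degrees of freedom
print(c(G2 = deviance(indep), df = df.residual(indep)))
```

A small deviance relative to its 2 degrees of freedom is what "the model fits well" refers to.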
Saturated Model for Two-Way Tables

Variables that are statistically dependent rather than independent satisfy the more complex loglinear model,

$$\log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}.$$

The $\lambda_{ij}^{XY}$ parameters are association terms that reflect deviations from independence.
Example: Belief in Life after Death

The estimated odds ratios between belief and race are exp(0.1096) = 1.12 for white and other, exp(0.1671) = 1.18 for black and other, and exp(0.1096 − 0.1671) = 0.94 for white and black. Here 0.1096 and 0.1671 are the estimated association parameters for whites and blacks with the response “yes”, taking “other” as the baseline race. For example, the estimated odds of belief in life after death for whites are 0.94 times the estimated odds for blacks. Since the independence model fitted well, none of these estimated odds ratios differs significantly from 1.0.
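The association estimates and odds ratios quoted here can be reproduced by fitting the saturated model as a Poisson GLM. The sketch below is illustrative. The data frame `counts` and the baseline levels (“Other” for race, “No” for belief) are assumptions chosen so that the interaction coefficients correspond to the values in the text.

```r
# Observed 3 x 2 table in long format, with "Other" and "No" as baseline levels
counts <- data.frame(
  race   = factor(rep(c("White", "Black", "Other"), each = 2),
                  levels = c("Other", "White", "Black")),
  belief = factor(rep(c("Yes", "No"), times = 3),
                  levels = c("No", "Yes")),
  count  = c(1339, 300, 260, 55, 88, 22)
)

# Saturated model: main effects plus all race-by-belief association terms
sat <- glm(count ~ race * belief, family = poisson(link = "log"), data = counts)

# Association parameters (log odds ratios relative to "Other")
assoc <- coef(sat)[c("raceWhite:beliefYes", "raceBlack:beliefYes")]

odds_ratios <- c(
  white_vs_other = exp(assoc[[1]]),
  black_vs_other = exp(assoc[[2]]),
  white_vs_black = exp(assoc[[1]] - assoc[[2]])
)

print(round(assoc, 4))
print(round(odds_ratios, 2))
print(deviance(sat))   # essentially zero: the saturated model reproduces the counts
```

Because the saturated model has as many parameters as cells, its fitted values equal the observed counts, and each odds ratio is the sample odds ratio from the corresponding pair of rows.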
