Logit, Probit and Tobit:
Models for Categorical and Limited Dependent Variables

By Rajulton Fernando
Presented at PLCS/RDC Statistics and Data Series at Western
March 23, 2011
• In social science research, categorical data are often collected through surveys.
– Categorical → Nominal and Ordinal variables
– They take only a few values that do NOT have a metric.
• A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
– The labor force status of a person.
– Voting behavior of a person (in favor of a new policy).
– Whether a person got married or divorced.
– Whether a person was involved in criminal behaviour, etc.
• With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
– For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector x′i of explanatory variables, the probability that y takes the value 1 is modeled as
$$P(y_i = 1 \mid x_i) = F(x_i'\beta) = F(z_i), \qquad z_i = x_i'\beta$$
where β is a (K + 1)-dimensional column vector of parameters.
• The transformation function F is crucial. It maps the linear combination into [0,1] and in general satisfies F(−∞) = 0, F(+∞) = 1, and dF(z)/dz > 0 [that is, it is a cumulative distribution function].
The Logit and Probit Models
• When the transformation function F is the logistic function, the response probabilities are given by
$$P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

• And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by
$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \phi(s)\,ds = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\,ds$$

• The Logit and Probit models are almost identical (see the Figure on the next slide) and the choice of the model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).
Source: J.S. Long, 1997
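To make the two transformation functions concrete, here is a minimal Python sketch (standard library only) that evaluates the logistic cdf and the standard normal cdf on a grid of z values and checks the properties listed above: both stay near 0 in the left tail, near 1 in the right tail, and increase in z. The helper names `logit_F`, `probit_F` and `linear_index`, and the x and β values in the last lines, are hypothetical illustrations, not taken from the slides.

```python
import math
from statistics import NormalDist

def logit_F(z: float) -> float:
    """Logistic cdf e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def probit_F(z: float) -> float:
    """Standard normal cdf Phi(z)."""
    return NormalDist().cdf(z)

def linear_index(x, beta):
    """z_i = x_i' beta; x[0] = 1 carries the intercept."""
    return sum(xk * bk for xk, bk in zip(x, beta))

grid = [-6.0, -2.0, -1.0, 0.0, 1.0, 2.0, 6.0]
for F in (logit_F, probit_F):
    values = [F(z) for z in grid]
    # Cdf-like: close to 0 on the left, close to 1 on the right, increasing in z.
    assert all(lo < hi for lo, hi in zip(values, values[1:]))
    print(F.__name__, [round(v, 3) for v in values])

z = linear_index([1.0, 2.0], [0.5, 0.25])   # hypothetical x_i and beta
print(z, logit_F(z), probit_F(z))
```

Both curves pass through 0.5 at z = 0, but the probit curve is steeper there (density 0.399 versus 0.25), which is the scaling difference discussed next.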
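• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model. (This is largely because the standard logistic distribution has variance π²/3, a standard deviation of about 1.8, whereas the standard normal has a standard deviation of 1.)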
• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of β is consistent and asymptotically normally distributed.
However, the estimation rests on the strong assumption that the latent error term follows the assumed distribution (normal for probit, logistic for logit) and is homoscedastic. If homoscedasticity is violated, there is no easy solution.
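The ML step can be sketched for the logit case with an intercept and one regressor. This is a minimal Newton-Raphson illustration in plain Python, assuming independent observations and a tiny hypothetical data set (not the womenwk data); statistical packages use more robust versions of the same idea. The standard errors come from the inverse of the information matrix, which is what the asymptotic normality of the ML estimator justifies. Function names are illustrative.

```python
import math

def logit_score_and_info(b0, b1, x, y):
    """Score vector and information matrix of the logit log-likelihood
    for a model with an intercept b0 and one regressor with slope b1."""
    g0 = g1 = 0.0
    h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)              # Bernoulli variance at this observation
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logit_ml(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta_new = beta_old + I(beta_old)^{-1} * score(beta_old)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = logit_score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # 2x2 inverse applied to the score
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Asymptotic standard errors: square roots of the diagonal of I(beta)^{-1}.
    _, (h00, h01, h11) = logit_score_and_info(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return (b0, b1), (math.sqrt(h11 / det), math.sqrt(h00 / det))

# Hypothetical toy data (not from the slides); observations assumed independent.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
(b0, b1), (se0, se1) = fit_logit_ml(x, y)
print(f"b0 = {b0:.3f} (SE {se0:.3f}), b1 = {b1:.3f} (SE {se1:.3f})")
```

At the ML solution the score is zero, so the loop stops once the Newton steps become negligible.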
• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when the probability is near zero (or near one) than when it is near the middle. Thus, it is a non-linear response function.
• How to interpret the coefficients: In both models,
If b > 0 → p increases as X increases
If b < 0 → p decreases as X increases
– As mentioned above, b cannot be interpreted as a simple slope as in ordinary regression, because the rate at which the curve ascends or descends changes according to the value of X.
– In other words, it is not a constant change as in ordinary regression. → The greatest rate of change is at p = 0.5.
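The changing slope can be computed directly. For the logit model dp/dX = b·p(1 − p), and for the probit model dp/dX = b·φ(z), where φ is the standard normal density and z = x′β. The sketch below uses an illustrative b = 0.25 (hypothetical) to show the slope shrinking toward the tails and peaking at p = 0.5; the function names are not from the slides.

```python
from statistics import NormalDist

def logit_slope(b, p):
    """dp/dX in the logit model at probability p: b * p * (1 - p)."""
    return b * p * (1.0 - p)

def probit_slope(b, z):
    """dp/dX in the probit model at index z = x'beta: b * phi(z)."""
    return b * NormalDist().pdf(z)

b = 0.25  # illustrative coefficient
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"logit  p = {p:.2f}  dp/dX = {logit_slope(b, p):.4f}")
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"probit z = {z:+.1f}  dp/dX = {probit_slope(b, z):.4f}")
# Maxima: b/4 for the logit (at p = 0.5); about 0.399*b for the probit (at z = 0, p = 0.5).
```

The b/4 value at p = 0.5 is a handy upper bound on how much a one-unit change in X can move the probability in a logit model.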
– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds.
Example: If b = 0.25, then e^0.25 = 1.28. Thus, when X changes by one unit, the odds p/(1 − p) increase by a factor of 1.28, that is, by 28%. (The corresponding change in p itself depends on the starting value of p.)
– In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the Probit of "success") increases by b units. [Or, we can also say that an increase in X changes Z by b standard deviation units.]
– If you like, you can convert the z-score to p using the normal table.
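Both interpretations can be checked numerically. The sketch below uses the slide's example b = 0.25 for the logit model and, for the probit model, b = 0.447 (the probit coefficient on children in the SPSS illustration later in these slides). The starting points p = 0.5 and z = 0 are arbitrary choices for illustration, and Python's `NormalDist().cdf` stands in for the normal table.

```python
import math
from statistics import NormalDist

def logit_unit_change(p0, b):
    """Return e^b and the new probability after a one-unit increase in X,
    starting from probability p0: the odds are multiplied by e^b."""
    odds0 = p0 / (1.0 - p0)
    odds1 = odds0 * math.exp(b)
    return math.exp(b), odds1 / (1.0 + odds1)

def probit_unit_change(z0, b):
    """Shift the z-score by b and convert both z-scores to probabilities."""
    Phi = NormalDist().cdf
    return Phi(z0), Phi(z0 + b)

ratio, p1 = logit_unit_change(0.5, 0.25)
print(f"e^0.25 = {ratio:.3f}; p moves from 0.500 to {p1:.3f}")
before, after = probit_unit_change(0.0, 0.447)
print(f"probit: p moves from {before:.3f} to {after:.3f}")
```

Starting from p = 0.5, multiplying the odds by 1.28 raises p only to about 0.56, which is why e^b is read as an effect on the odds rather than on p.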
Models for Polytomous Data
• B) Polytomous Case
– Here we need to distinguish between purely nominal variables and truly ordinal variables.
– When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as the reference and modeling the other responses j = 1, 2, ..., m−1 relative to the reference.
• Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates:
$$P(y = 1) = \frac{\exp(\beta_1 x)}{1 + \exp(\beta_1 x) + \exp(\beta_2 x)}$$
$$P(y = 2) = \frac{\exp(\beta_2 x)}{1 + \exp(\beta_1 x) + \exp(\beta_2 x)}$$
$$P(y = 3) = \frac{1}{1 + \exp(\beta_1 x) + \exp(\beta_2 x)}$$
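The three formulas for the 3-category example amount to a softmax in which the reference category's linear predictor is fixed at zero. A minimal sketch, with hypothetical values β1 = 0.8, β2 = −0.4 and x = 1.5 (no intercept, to match the slide's formulas); `mnl_probs` is an illustrative name.

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit probabilities with the LAST category as reference.
    betas[j] is the coefficient of category j+1 versus the reference; the
    reference category's linear predictor is fixed at 0, so exp(0) = 1."""
    weights = [math.exp(b * x) for b in betas] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: beta1 = 0.8, beta2 = -0.4, x = 1.5.
p1, p2, p3 = mnl_probs(1.5, [0.8, -0.4])
print(f"P(y=1) = {p1:.3f}, P(y=2) = {p2:.3f}, P(y=3) = {p3:.3f}")
print(f"ln(p1/p3) = {math.log(p1 / p3):.3f}  (equals beta1 * x = 1.2)")
```

Whatever the values, the probabilities sum to one and ln(p1/p3) recovers β1x, matching the logit defined against the reference category.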
Polytomous Case
– When the variable is truly ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.
– Example: Suppose Y has 4 categories; then,
• logit(p1) = ln{p1/(1 − p1)} = a1 + bX
• logit(p1 + p2) = ln{(p1 + p2)/(1 − p1 − p2)} = a2 + bX
• logit(p1 + p2 + p3) = ln{(p1 + p2 + p3)/(1 − p1 − p2 − p3)} = a3 + bX
– Since these are cumulative logits, the probabilities are attached to being in category j and lower.
– Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
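A minimal sketch of the 4-category example, assuming hypothetical thresholds a1 < a2 < a3 and a hypothetical slope b, and using the a_j + bX form written above. It converts the cumulative logits into category probabilities and confirms the proportional odds property: raising X by one unit shifts every cumulative log-odds by the same amount b.

```python
import math

def cumulative_logit_probs(x, cuts, b):
    """Category probabilities when logit P(Y <= j) = a_j + b*X.
    `cuts` holds a_1 < a_2 < ... < a_{m-1}; returns m probabilities."""
    cum = [1.0 / (1.0 + math.exp(-(a + b * x))) for a in cuts] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)          # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        prev = c
    return probs

def cumulative_log_odds(x, cuts, b):
    """ln{P(Y <= j) / (1 - P(Y <= j))} for j = 1..m-1, rebuilt from the probabilities."""
    out, acc = [], 0.0
    for p in cumulative_logit_probs(x, cuts, b)[:-1]:
        acc += p
        out.append(math.log(acc / (1.0 - acc)))
    return out

cuts, b = [-1.0, 0.5, 2.0], 0.7       # hypothetical a1 < a2 < a3 and common slope
print([round(p, 3) for p in cumulative_logit_probs(0.0, cuts, b)])
shifts = [hi - lo for lo, hi in zip(cumulative_log_odds(0.0, cuts, b),
                                    cumulative_log_odds(1.0, cuts, b))]
print([round(s, 6) for s in shifts])   # one common shift b for every j
```

If a test of proportionality rejects, the shifts would differ across j, and a model with category-specific slopes would be needed.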
Ordinal Logistic
– a1, a2, a3, … are the "intercepts" that satisfy the property a1 < a2 < a3 …, interpreted as "thresholds" of the latent variable.
– Interpretation of parameter estimates depends on the
software used! Check the software manual.
• If the RHS = a + bX, a positive coefficient is associated more with lower order categories and a negative coefficient is associated more with higher order categories.
• If the RHS = a − bX, a negative coefficient is more associated with lower ordered categories, and a positive coefficient is more associated with higher ordered categories.
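The sign convention can be seen by evaluating P(Y ≤ 1) under both parameterizations with the same positive b. The values a1 = −1.0 and b = 0.7 below are hypothetical; which form a given package uses is exactly what the manual check above should settle.

```python
import math

def p_lowest(x, a1, b, form):
    """P(Y <= 1) when logit P(Y <= 1) = a1 + b*X ("plus") or a1 - b*X ("minus")."""
    eta = a1 + b * x if form == "plus" else a1 - b * x
    return 1.0 / (1.0 + math.exp(-eta))

a1, b = -1.0, 0.7   # hypothetical values; the same positive b in both forms
for form in ("plus", "minus"):
    trend = [round(p_lowest(x, a1, b, form), 3) for x in (0, 1, 2)]
    print(f"RHS = a {'+' if form == 'plus' else '-'} bX: P(Y <= 1) at X = 0, 1, 2 -> {trend}")
```

With a + bX a positive b raises the probability of the lowest category; with a − bX the same b lowers it, shifting probability toward higher categories.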
Model for Limited Dependent Variable
• C) Tobit Model
• This model is for a metric dependent variable that is "limited" in the sense that we observe it only if it is above or below some cutoff level. For example,
– Wages may be limited from below by the minimum wage
– The donation amount given to charity
– "Top coding" of income at, say, $300,000
– Time use and leisure activity of individuals
– Extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse this use of the term "censoring" with the one used in dynamic modeling.]
The Tobit Model
• The model is called Tobit because it was first proposed
by Tobin (1958), and involves aspects of Probit analysis –
a term coined by Goldberger for Tobin’s Probit.
• Reasoning behind the model:
– If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
– If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), we will overestimate the intercept and underestimate the slope.
– The degree of bias in both cases will increase as the number of observations that take on the value of zero increases. (See the Figure on the next slide.)
Source: J.S. Long
• The Tobit model uses all of the information, including information on censoring, and provides consistent estimates of the parameters.
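• It is also a nonlinear model, similar to the probit model. It is estimated using maximum likelihood estimation techniques. For censoring from below at zero, with y*i = x′iβ + εi and εi ~ N(0, σ²), the likelihood function for the tobit model takes the form:
$$L = \prod_{y_i > 0} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \times \prod_{y_i = 0} \left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right]$$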
• This is an unusual function: it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).
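The two-term structure translates directly into code. A minimal sketch of the log-likelihood for a Tobit model left-censored at a lower limit (zero by default), assuming y* = x′β + ε with ε ~ N(0, σ²) and each row of X starting with 1 for the intercept. The function name and the small data set are hypothetical; in practice this function would be passed to a numerical optimizer.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def tobit_loglik(beta, sigma, X, y, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower`.
    Uncensored y add log[(1/sigma) * phi((y - x'b)/sigma)]   (pdf term);
    censored y (y <= lower) add log[Phi((lower - x'b)/sigma)] (cdf term)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(xk * bk for xk, bk in zip(xi, beta))
        if yi <= lower:
            ll += math.log(_STD_NORMAL.cdf((lower - xb) / sigma))
        else:
            ll += math.log(_STD_NORMAL.pdf((yi - xb) / sigma) / sigma)
    return ll

# Hypothetical data: intercept plus one regressor; two observations censored at 0.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7], [1.0, 2.1], [1.0, -1.2]]
y = [0.4, 1.9, 0.0, 2.6, 0.0]
print(tobit_loglik([0.1, 1.0], 0.8, X, y))
```

With lower = 0, Φ((0 − x′β)/σ) equals 1 − Φ(x′β/σ), the censored-observation term in the likelihood.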
• The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.
• But such an interpretation may not be useful, since we are interested in the effect of X on the observable y (or the change in the censored outcome).
– It can be shown that the change in the expected value of the observed y is found by multiplying the coefficient by Pr(a < y* < b), that is, the probability of being uncensored. Since this probability is a fraction, the marginal effect is attenuated.
– In the above, a and b denote the lower and upper censoring points. For example, in left censoring at zero, the limits will be a = 0, b = +∞.
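As a sketch of this attenuation, the functions below compute Pr(a < y* < b) for y* ~ N(x′β, σ²) and multiply a coefficient by it. The example plugs in the age coefficient (.052157), σ (1.872811) and Pr(lwf > 0) = .81920975 from the Stata tobit and mfx output later in these slides, recovering x′β at the means from that probability. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def prob_uncensored(xb, sigma, a=0.0, b=math.inf):
    """Pr(a < y* < b) when y* ~ N(xb, sigma^2).
    a = 0, b = +inf corresponds to left censoring at zero."""
    upper = 1.0 if b == math.inf else _STD_NORMAL.cdf((b - xb) / sigma)
    lower = 0.0 if a == -math.inf else _STD_NORMAL.cdf((a - xb) / sigma)
    return upper - lower

def censored_marginal_effect(beta_j, xb, sigma, a=0.0, b=math.inf):
    """Effect of x_j on the expected observed (censored) y: beta_j * Pr(a < y* < b)."""
    return beta_j * prob_uncensored(xb, sigma, a, b)

# Values taken from the tobit/mfx output for lwf (left-censored at 0).
beta_age, sigma, pr_obs = 0.052157, 1.872811, 0.81920975
xb_at_means = sigma * _STD_NORMAL.inv_cdf(pr_obs)   # since Pr(y* > 0) = Phi(xb/sigma)

me_age = censored_marginal_effect(beta_age, xb_at_means, sigma)
print(f"latent effect {beta_age:.4f} -> effect on observed lwf {me_age:.4f}")
```

Here about 18% of the latent effect is lost, because roughly 82% of women are predicted to be uncensored at the means.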
Illustrations for logit, probit and tobit models, using womenwk.dta from Baum available at

Descriptive Statistics

Variable              N     Minimum   Maximum   Mean      Std. Deviation
age                   2000  20        59        36.21     8.287
education             2000  10        20        13.08     3.046
married               2000  0         1         .67       .470
children              2000  0         5         1.64      1.399
wagefull              2000  -1.68     45.81     21.3118   7.01204
wage                  1343  5.88      45.81     23.6922   6.30537
lw                    1343  1.77      3.82      3.1267    .28651
work                  2000  0         1         .67       .470
lwf                   2000  .00       3.82      2.0996    1.48752
Valid N (listwise)    1343

Binary Logistic Regression

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2055.829a           .212                   .295
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      6.491        8    .592

Variables in the Equation
                    B        S.E.   Wald      df   Sig.   Exp(B)
Step 1  age         .058     .007   64.359    1    .000   1.060
        education   .098     .019   27.747    1    .000   1.103
        married     .742     .126   34.401    1    .000   2.100
        children    .764     .052   220.110   1    .000   2.148
        Constant    -4.159   .332   156.909   1    .000   .016
a. Variable(s) entered on step 1: age, education, married, children.
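Exp(B) is e raised to the power B, and (Exp(B) − 1) × 100 is the percentage change in the odds of working per unit increase in the variable. A minimal sketch using the B column of the table above:

```python
import math

# B column from "Variables in the Equation" (logit model for work).
B = {"age": 0.058, "education": 0.098, "married": 0.742, "children": 0.764}

for name, b in B.items():
    odds_ratio = math.exp(b)                  # Exp(B)
    pct_change = 100.0 * (odds_ratio - 1.0)   # % change in the odds per unit of X
    print(f"{name:10s} Exp(B) = {odds_ratio:.3f}  odds change = {pct_change:+.1f}%")
```

Because B is printed to three decimals, recomputed Exp(B) values can differ from the table in the last digit (for children, e^.764 is about 2.147 against 2.148 in the table).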

Binary Probit Regression (in SPSS, use the Ordinal Regression menu and select the probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1645.024
Final            1166.702            478.322      4    .000
Link function: Probit.

Parameter Estimates
                                                                   95% Confidence Interval
                        Estimate   Std. Error   Wald      df   Sig.   Lower Bound   Upper Bound
Threshold  [work = 0]   2.037      .209         94.664    1    .000   1.626         2.447
Location   age          .035       .004         67.301    1    .000   .026          .043
           education    .058       .011         28.061    1    .000   .037          .080
           children     .447       .029         243.907   1    .000   .391          .503
           [married=0]  -.431      .074         33.618    1    .000   -.577         -.285
           [married=1]  0a         .            .         0    .      .             .
Link function: Probit.
a. This parameter is set to zero because it is redundant.
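Two checks connect this output to the earlier slides. First, as far as I understand the SPSS documentation, its ordinal procedure is parameterized as P(work ≤ 0) = Φ(threshold − location), so the threshold enters a conventional binary probit with a minus sign; and since [married=0] is the non-reference level, rewriting with a married dummy gives a constant of −2.037 − .431 = −2.468 and a coefficient of +.431 on married. Treat this sign conversion as an assumption to verify against the manual. Second, dividing the logit B values by these probit estimates gives ratios inside the 1.6 to 1.8 range mentioned earlier. The sketch also evaluates both models at the sample means from the Descriptive Statistics table.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

# Sample means from the Descriptive Statistics table.
means = {"age": 36.21, "education": 13.08, "married": 0.67, "children": 1.64}

# Logit coefficients (B) from "Variables in the Equation".
logit_b = {"age": 0.058, "education": 0.098, "married": 0.742, "children": 0.764}
logit_const = -4.159

# Probit estimates rewritten with a married dummy, assuming the SPSS form
# P(work <= 0) = Phi(threshold - location).
probit_b = {"age": 0.035, "education": 0.058, "married": 0.431, "children": 0.447}
probit_const = -2.037 - 0.431   # minus the threshold, plus the [married=0] estimate

def index(const, coefs, x):
    """Linear predictor const + sum_k coef_k * x_k."""
    return const + sum(coefs[k] * x[k] for k in coefs)

p_logit = 1.0 / (1.0 + math.exp(-index(logit_const, logit_b, means)))
p_probit = Phi(index(probit_const, probit_b, means))
print(f"P(work = 1) at the means: logit {p_logit:.3f}, probit {p_probit:.3f}")

ratios = {k: logit_b[k] / probit_b[k] for k in logit_b}
for k, r in ratios.items():
    print(f"{k:10s} logit/probit = {r:.2f}")
```

The two models give nearly the same predicted probability at the means, even though the coefficients differ by a factor of about 1.7.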

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.
First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

lwf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
age | .0363624 .003862 9.42 0.000 .0287885 .0439362
married | .3188214 .0690834 4.62 0.000 .1833381 .4543046
children | .3305009 .0213143 15.51 0.000 .2887004 .3723015
education | .0843345 .0102295 8.24 0.000 .0642729 .1043961
_cons | -1.077738 .1703218 -6.33 0.000 -1.411765 -.7437105

. tobit lwf age married children education, ll(0)

Tobit regression Number of obs = 2000
LR chi2(4) = 461.85
Prob > chi2 = 0.0000
Log likelihood = -3349.9685 Pseudo R2 = 0.0645

lwf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
age | .052157 .0057457 9.08 0.000 .0408888 .0634252
married | .4841801 .1035188 4.68 0.000 .2811639 .6871964
children | .4860021 .0317054 15.33 0.000 .4238229 .5481812
education | .1149492 .0150913 7.62 0.000 .0853529 .1445454
_cons | -2.807696 .2632565 -10.67 0.000 -3.323982 -2.291409
/sigma | 1.872811 .040014 1.794337 1.951285
Obs. summary: 657 left-censored observations at lwf<=0
1343 uncensored observations
0 right-censored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit

y = Pr(lwf>0) (predict, pr(0,.))
= .81920975
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
age | .0073278 .00083 8.84 0.000 .005703 .008952 36.208
married*| .0706994 .01576 4.48 0.000 .039803 .101596 .6705
children | .0682813 .00479 14.26 0.000 .058899 .077663 1.6445
educat~n | .0161499 .00216 7.48 0.000 .011918 .020382 13.084
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit

y = E(lwf|lwf>0) (predict, e(0,.))
= 2.3102021
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
age | .0314922 .00347 9.08 0.000 .024695 .03829 36.208
married*| .2861047 .05982 4.78 0.000 .168855 .403354 .6705
children | .2934463 .01908 15.38 0.000 .256041 .330852 1.6445
educat~n | .0694059 .00912 7.61 0.000 .051531 .087281 13.084
(*) dy/dx is for discrete change of dummy variable from 0 to 1
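The mfx results can be reproduced from the tobit estimates with the standard formulas for a model left-censored at zero: ∂Pr(y > 0)/∂xj = φ(z)βj/σ and ∂E(y | y > 0)/∂xj = βj[1 − λ(z)(z + λ(z))], where z = x′β/σ at the means and λ(z) = φ(z)/Φ(z) is the inverse Mills ratio. This sketch recovers z from the reported Pr(lwf > 0) = .81920975 rather than from the covariate means. It applies the continuous formulas to every variable, so the married row will not match the table, which reports a discrete 0 to 1 change for that dummy.

```python
from statistics import NormalDist

N = NormalDist()

# Tobit estimates for lwf (left-censored at 0) from the Stata output.
beta = {"age": 0.052157, "married": 0.4841801,
        "children": 0.4860021, "education": 0.1149492}
sigma = 1.872811
pr_positive = 0.81920975            # Pr(lwf > 0) at the means, from mfx

z = N.inv_cdf(pr_positive)          # z = x'b / sigma
mills = N.pdf(z) / N.cdf(z)         # inverse Mills ratio lambda(z)

effects = {}
for name, b in beta.items():
    d_prob = N.pdf(z) * b / sigma                # d Pr(lwf > 0) / dx
    d_cond = b * (1.0 - mills * (z + mills))     # d E(lwf | lwf > 0) / dx
    effects[name] = (d_prob, d_cond)
    print(f"{name:10s} {d_prob:.7f}  {d_cond:.7f}")

cond_mean = sigma * (z + mills)     # E(lwf | lwf > 0) = x'b + sigma * lambda(z)
print(f"E(lwf | lwf > 0) = {cond_mean:.4f}")
```

The factor 1 − λ(z)(z + λ(z)) is about 0.60 here, so the effects on E(lwf | lwf > 0) are roughly 60% of the latent tobit coefficients.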

