Appendix: Statistical Models of Attrition

This document summarizes two statistical models for addressing attrition in studies: the Heckman selection model and the tobit model. The Heckman model involves two equations, one for outcomes and one for the selection process, and assumes bivariate normal errors. The tobit model assumes outcomes are missing below a cutoff and estimates the regression by treating values below the cutoff as left-censored. Applying these models to a voucher lottery study, the tobit model estimates the voucher effect on reading scores is 3.3, larger than the simple regression estimate of 0.7. Both models make strong assumptions but allow estimating treatment effects under different assumptions about the attrition process.


An alternative to the nonparametric approach discussed in Chapter 7 is to posit a
parametric model of the attrition process and the potential outcomes. Parametric models make
assumptions about the functions linking cause and effect and the distributions from which
unobserved causes are drawn. The most widely used parametric model, first proposed by
Heckman (1979), involves two regression equations. The first equation offers a model of the
outcome:

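$Y_i = \beta_0 + \beta_1 z_i + u_i$, (A7.1)

where $Y_i$ is the outcome for subject $i$, $z_i$ indicates assignment to treatment, and $u_i$ represents unobserved causes of the outcome.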

This “outcome equation” may also be expressed in terms of potential outcomes using the same
form as equation (4.7).

The second equation offers a model of the process that determines whether the outcome
is observed or missing. This “selection equation” predicts each subject’s propensity to render
observed outcomes:

$r_i^* = \alpha_0 + \alpha_1 z_i + \alpha_2 X_i + e_i$, (A7.2)

where $r_i^*$ is the latent propensity to be observed, $e_i$ represents unobserved causes of attrition, and $X_i$ is a variable (or collection of variables) that predicts attrition but is unrelated to $u_i$, the unobserved causes of the outcome.¹
This exclusion restriction is similar to the assumption we encountered in Chapter 5, when we
discussed an “instrumental variable” that predicted whether a subject received treatment but had
no causal effect on outcomes. The strongest case for excludability of $X_i$ occurs when the
intensity of effort to obtain outcome data is randomly allocated. DiNardo and McCrary (2010),
for example, discuss an experiment in which researchers randomly varied the amount of effort
they devoted to obtaining outcomes.

The two equations work together in the following way. Let $r_i = 1$ if $r_i^* > 0$; otherwise
$r_i = 0$. In other words, we observe outcomes for subjects whose propensity to be observed is above
a certain threshold. In the case of the voucher lottery example, we would observe students
whose potential test scores were above a certain cutoff level. Thus,

$Y_i$ is observed if $r_i = 1$; otherwise $Y_i$ is missing. (A7.3)

Because we observe outcomes for some observations and not others, a regression of $Y_i$ on $z_i$ may
generate biased estimates. Equation (A7.4) shows that the bias in the estimated treatment effect
stems from the relationship between $u_i$ and $e_i$. The expected value of $Y_i$ among subjects whose outcomes are observed can be written as

$E[Y_i \mid z_i, X_i, r_i = 1] = \beta_0 + \beta_1 z_i + E[u_i \mid z_i, X_i, r_i = 1]$. (A7.4)

If $u_i$ and $e_i$ are independent, there is no bias. However, the omitted factors predicting attrition
($e_i$) are typically correlated with omitted factors predicting outcomes ($u_i$).

¹ Although this approach can produce estimates even when there is no excluded variable in the selection
equation, it is rarely applied without an excluded variable because in the absence of an instrument the results are
entirely driven by the specific distributional and modeling assumptions.
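The bias described above can be illustrated with a small simulation. This is only a sketch under assumed values: the parameter values, the function name `simulate_attrition`, and the absence of an excluded variable are all hypothetical choices made for illustration, not part of the original analysis. The disturbances of the outcome and selection equations are drawn as correlated standard normals, and the difference in means is computed once for all subjects and once for subjects whose outcomes are observed.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def simulate_attrition(n=20000, beta1=1.0, alpha1=1.0, rho=0.8, seed=1):
    """Return (estimate using all subjects, estimate using observed subjects only)."""
    rng = random.Random(seed)
    everyone = {0: [], 1: []}   # outcomes for all subjects, by assignment
    observed = {0: [], 1: []}   # outcomes for subjects whose outcome is observed
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        e = rng.gauss(0.0, 1.0)                      # unobserved cause of attrition
        v = rng.gauss(0.0, 1.0)
        u = rho * e + math.sqrt(1.0 - rho ** 2) * v  # corr(u, e) = rho, sd(u) = 1
        y = beta1 * z + u                            # outcome equation, intercept 0
        everyone[z].append(y)
        if alpha1 * z + e > 0:                       # selection equation, intercept 0, no X
            observed[z].append(y)
    return (mean(everyone[1]) - mean(everyone[0]),
            mean(observed[1]) - mean(observed[0]))

# Assumed true effect is 1.0. Treatment raises the chance of being observed.
full_est, observed_est = simulate_attrition()
# Same setup, but attrition does not depend on treatment (alpha1 = 0).
_, equal_rate_est = simulate_attrition(alpha1=0.0)
print(f"all subjects: {full_est:.2f}")
print(f"observed only, unequal attrition: {observed_est:.2f}")
print(f"observed only, equal attrition: {equal_rate_est:.2f}")
```

In this setup the observed-only estimate falls well below the assumed effect when treatment changes the rate of attrition, and is close to it when attrition rates are equal across groups.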

In order to eliminate this bias, the researcher imposes assumptions that allow the third
term in equation (A7.4) to be measured and included as a control variable in a regression. The
critical assumptions are that $u_i$ and $e_i$ are bivariate normal (each is normally distributed, but they
are correlated, with correlation $\rho$), with the standard deviation of $e_i$ normalized to 1 and the standard deviation of $u_i$ equal to $\sigma$. It is important to appreciate how
strong these assumptions are: the errors are not only assumed to be bivariate normal, they are
also assumed to be homoskedastic.

Under these assumptions,

$E[u_i \mid z_i, X_i, r_i = 1] = \rho\sigma \, \dfrac{\phi(\Phi^{-1}(\hat{p}_i))}{\hat{p}_i}$, (A7.5)

where $\phi$ is the normal pdf and $\Phi$ is the normal cdf, $\rho$ is the correlation between $u_i$ and $e_i$, $\sigma$ is the standard deviation of $u_i$, and $\hat{p}_i$ is the predicted probability that $r_i = 1$
based on a probit estimation of the selection model. In order to estimate the average treatment
effect, one regresses $Y_i$ on $z_i$ and a “correction term,” $\phi(\Phi^{-1}(\hat{p}_i))/\hat{p}_i$. Although in general, controlling for a
covariate is an inadequate remedy for systematic attrition, under the assumptions of this model,
adding the correction term to the regression model produces unbiased estimates of $\beta_1$. Another
interesting property of this model is that regressing $Y_i$ on $z_i$ alone yields unbiased estimates so
long as $z_i$ is independent of the correction term, a condition that is satisfied when the expected
rate of attrition is the same for the treatment and control group.
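The correction term can be computed directly from a predicted probability of being observed. The sketch below is illustrative only: the function names and the values of the correlation, standard deviation, and probit index are assumptions, and the probability is taken as known rather than estimated by probit. It compares the analytic expression (correlation times standard deviation times the correction term) with a Monte Carlo average of the outcome disturbance among observed subjects.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def correction_term(p_hat):
    """Normal pdf at the probit index implied by p_hat, divided by p_hat."""
    index = STD_NORMAL.inv_cdf(p_hat)
    return STD_NORMAL.pdf(index) / p_hat

def simulated_mean_u_given_observed(index, rho, sigma, n=200000, seed=7):
    """Monte Carlo mean of u among subjects with index + e > 0."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)  # selection disturbance, sd normalized to 1
        u = sigma * (rho * e + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        if index + e > 0:        # outcome is observed
            total += u
            count += 1
    return total / count

# Hypothetical values chosen for illustration.
rho, sigma, index = 0.5, 2.0, 0.3
p_hat = STD_NORMAL.cdf(index)                    # probability of being observed
analytic = rho * sigma * correction_term(p_hat)  # analytic conditional mean of u
simulated = simulated_mean_u_given_observed(index, rho, sigma)
print(round(analytic, 3), round(simulated, 3))
```

The two printed numbers should agree up to simulation noise when the bivariate normal assumption holds, which is exactly the assumption the text flags as strong.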

The selection model above rests on strong assumptions. Homoskedasticity, for example,
presupposes that the variance of $u_i$ is the same for all subjects, an assumption that does not
follow from random allocation of subjects to treatment. A different set of assumptions leads to a
different estimation approach, known as tobit. Recall from equation (7.10) that applying the
difference-in-means estimator to observed outcomes is, in expectation, equivalent to estimating
the treatment effect among those who would be observed if assigned to the control group, plus a
term that represents the selection effect. Suppose we assume that the treatment effect is positive
for all subjects. Under this assumption, the subjects who would be observed if assigned to the
control group will also be observed if assigned to treatment, since $Y_i(1) > c$ if $Y_i(0) > c$, where $c$ is the cutoff below which outcomes are missing. As for
the term that represents the selection effect, if the treatment effect is positive, $E[Y_i(1) \mid r_i(0) = 1] - E[Y_i(1) \mid r_i(1) = 1]$ will be positive. The intuition here is that the set of subjects for whom $r_i(0) = 1$
is a more select set with higher potential outcomes than the set of subjects for whom
$r_i(1) = 1$. When $z_i = 0$, $r_i = 1$ if $Y_i(0) > c$. When $z_i = 1$, $r_i = 1$ if $Y_i(1) > c$, but this is an easier hurdle
because $Y_i(1)$ is greater than $Y_i(0)$.
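The logic of this paragraph can be checked on made-up potential outcomes. The sketch below assumes a constant positive effect (stronger than the text's assumption of positive effects for all subjects), a hypothetical cutoff of 40, and normally distributed untreated scores; none of these values come from the voucher study. It confirms that everyone observed under control is also observed under treatment, and that the control-observed group is the more select one.

```python
import random

def selection_under_cutoff(n=20000, effect=5.0, cutoff=40.0, seed=3):
    """Compare who is observed under control vs. treatment with a score cutoff."""
    rng = random.Random(seed)
    y0 = [rng.gauss(45.0, 10.0) for _ in range(n)]  # hypothetical untreated scores
    y1 = [score + effect for score in y0]            # treated scores, always higher
    seen_if_control = [i for i in range(n) if y0[i] > cutoff]
    seen_if_treated = [i for i in range(n) if y1[i] > cutoff]

    def mean_y1(indices):
        return sum(y1[i] for i in indices) / len(indices)

    return (seen_if_control, seen_if_treated,
            mean_y1(seen_if_control), mean_y1(seen_if_treated))

control_set, treated_set, mean_control_set, mean_treated_set = selection_under_cutoff()
print(set(control_set) <= set(treated_set))   # observed under control implies observed under treatment
print(mean_control_set > mean_treated_set)    # control-observed subjects have higher treated scores
```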

Based on the assumption of positive treatment effects, Angrist et al. (2006) propose a
parametric model of the effects of vouchers on test scores:

$Y_i^* = \beta_0 + \beta_1 z_i + u_i$, (A7.6)

where $Y_i^*$ is the latent outcome and $Y_i = Y_i^*$ if $Y_i^* > c$ for a cutoff $c$; otherwise $Y_i$ is missing. This model is similar to the selection model
above except that now missingness is a function of latent outcomes, not covariates. If the
outcomes were not truncated, the parameters of equation (A7.6) could be estimated using
regression. Given truncation, regression is biased. Angrist et al. assume that the $u_i$ are drawn
independently from a normal distribution and estimate this regression model for different values
of the cutoff parameter, $c$. They term this approach artificial censoring, because for a small
fraction of subjects, contrary to the model, observed values fall below the proposed censoring
value $c$, in which case the researchers treat these subjects as though they were missing.
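To make the estimation idea concrete, here is a sketch of the log-likelihood that a left-censored normal regression maximizes. It is not the authors' code: the data are simulated, and the coefficient values, cutoff, and function name `tobit_loglik` are assumptions for illustration. Censored observations contribute the probability of falling at or below the cutoff, and uncensored observations contribute the normal density. The example only compares the likelihood at the assumed true parameters with the likelihood at naive estimates that ignore censoring; it does not run an optimizer.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tobit_loglik(y, z, cutoff, b0, b1, sigma):
    """Log-likelihood of a left-censored normal regression of y on z."""
    total = 0.0
    for yi, zi in zip(y, z):
        mu = b0 + b1 * zi
        if yi <= cutoff:   # censored: probability that the latent score is at or below the cutoff
            total += math.log(max(STD_NORMAL.cdf((cutoff - mu) / sigma), 1e-300))
        else:              # uncensored: normal density of the observed score
            total += math.log(STD_NORMAL.pdf((yi - mu) / sigma) / sigma)
    return total

# Hypothetical data: latent scores with an assumed effect of 3, censored at 40.
rng = random.Random(11)
n, b0, b1, sigma, cutoff = 4000, 45.0, 3.0, 15.0, 40.0
z = [i % 2 for i in range(n)]
latent = [b0 + b1 * zi + rng.gauss(0.0, sigma) for zi in z]
y = [max(score, cutoff) for score in latent]   # scores at or below the cutoff are set to the cutoff

# Naive estimates: group means and residual sd of the censored scores.
mean0 = sum(yi for yi, zi in zip(y, z) if zi == 0) / z.count(0)
mean1 = sum(yi for yi, zi in zip(y, z) if zi == 1) / z.count(1)
resid_sq = sum((yi - (mean1 if zi else mean0)) ** 2 for yi, zi in zip(y, z))
naive_sigma = math.sqrt(resid_sq / n)

loglik_true = tobit_loglik(y, z, cutoff, b0, b1, sigma)
loglik_naive = tobit_loglik(y, z, cutoff, mean0, mean1 - mean0, naive_sigma)
print(loglik_true > loglik_naive)
```

A full estimator would search over the intercept, slope, and standard deviation for the values that maximize this function, which is what a tobit routine does internally.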

Table A7.1 shows the results of tobit estimation based on the assumption that outcomes
are missing whenever the score the subject would have received is less than 32, which is the 1st
percentile score among the observed scores. In contrast to the missing at random assumption,
this censoring value suggests that the missing subjects would have done very poorly on the
exam. The details of estimating these models are as follows: To prepare the data for the
command used to estimate the tobit model, all outcome values less than or equal to 32 are set to
32. The Stata command option ll in the command line below indicates that the smallest value in
the data will be used as the left censoring value.
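The data preparation step can be sketched as follows. The helper name `artificially_censor` and the example scores are hypothetical, and the treatment of missing scores is an assumption: the text only says that values at or below 32 are set to 32, and this sketch additionally assigns the censoring value to missing scores so that they count as left-censored.

```python
def artificially_censor(scores, cutoff):
    """Set missing scores (None) and scores at or below the cutoff to the cutoff."""
    return [cutoff if s is None or s <= cutoff else s for s in scores]

# Hypothetical raw scores; None marks a subject with no recorded score.
raw_scores = [None, 20.0, 32.0, 45.5, None, 61.0]
readcens = artificially_censor(raw_scores, cutoff=32.0)
n_censored = sum(1 for s in readcens if s <= 32.0)
print(readcens, n_censored)
```

After this step the smallest value in the prepared variable equals the censoring value, which is the situation in which the ll option with no argument uses the minimum as the left-censoring limit.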

The tobit estimates imply that attrition severely distorts simple regression results. If the
model is correct, the estimated effect of vouchers on reading scores is 3.3, rather than 0.7.
Repeating this exercise for a range of censoring values indicates the sensitivity of the estimates
to different assumptions about truncation.

. tobit readcens1 vouch0 age sex_name, ll

Tobit regression                                  Number of obs   =       3541
                                                  LR chi2(3)      =     888.95
                                                  Prob > chi2     =     0.0000
Log likelihood = -6143.5532                       Pseudo R2       =     0.0675

------------------------------------------------------------------------------
   readcens1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      vouch0 |   3.289796   .7048559     4.67   0.000     1.907831    4.671761
         age |  -9.029631   .3660284   -24.67   0.000    -9.747279   -8.311982
    sex_name |  -1.673547   .6865377    -2.44   0.015    -3.019596   -.3274971
       _cons |   137.9839   4.393478    31.41   0.000     129.3699    146.5979
-------------+----------------------------------------------------------------
      /sigma |   16.29016    .385755                      15.53384    17.04649
------------------------------------------------------------------------------
  Obs. summary:       2334  left-censored observations at readcens1<=32
                      1207     uncensored observations
                         0 right-censored observations
