
The Economics and Econometrics of Active Labor Market Programs

1999, Handbook of Labor Economics


The Economics and Econometrics of Active Labor Market Programs

James J. Heckman, University of Chicago
Robert J. LaLonde, Michigan State University
Jeffrey A. Smith, University of Western Ontario

Prepared for the Handbook of Labor Economics, Volume III, Orley Ashenfelter and David Card, editors. We thank Susanne Ackum Agell for her helpful comments on Scandinavian active labor market programs and Costas Meghir for very helpful comments on Sections 1-7.

Contents

1. Introduction
2. Public Job Training and Active Labor Market Policies
3. The Evaluation Problem and the Parameters of Interest in Evaluating Social Programs
  3.1 The Evaluation Problem
  3.2 The Counterfactuals of Interest
  3.3 The Counterfactuals Most Commonly Estimated in the Literature
  3.4 Is Treatment on the Treated an Interesting Economic Parameter?
4. The Prototypical Solutions to the Evaluation Problem
  4.1 The Before-After Estimator
  4.2 The Difference-in-Differences Estimator
  4.3 The Cross Section Estimator
5. Social Experiments
  5.1 How Social Experiments Solve the Evaluation Problem
  5.2 Intention to Treat and Substitution Bias
  5.3 Social Experiments in Practice
    5.3.1 Two Important Social Experiments
    5.3.2 The Practical Importance of Dropping Out and Substitution
    5.3.3 Additional Problems Common to All Evaluations
6. Econometric Models of Outcomes and Program Participation
  6.1 Uses of Economic Models
  6.2 Prototypical Models of Earnings and Program Participation
  6.3 Expected Present Value of Earnings Maximization
    6.3.1 Common Treatment Effect
    6.3.2 A Separable Representation
    6.3.3 Variable Treatment Effect
    6.3.4 Imperfect Credit Markets
    6.3.5 Training as a Form of Job Search
  6.4 The Role of Program Eligibility Rules in Determining Participation
  6.5 Administrative Discretion and the Efficiency and Equity of Training Provision
  6.6 The Conflict between the Economic Approach to Program Evaluation and the Modern Approach to Social Experiments
7. Non-experimental Evaluations
  7.1 The Problem of Causal Inference in Non-experimental Evaluations
  7.2 Constructing a Comparison Group
  7.3 Econometric Evaluation Estimators
  7.4 Identification Assumptions for Cross-Section Estimators
    7.4.1 The Method of Matching
    7.4.2 Index Sufficient Methods and the Classical Econometric Selection Model
    7.4.3 The Method of Instrumental Variables
    7.4.4 The Instrumental Variable Estimator as a Matching Estimator
    7.4.5 IV Estimators and the Local Average Treatment Effect
    7.4.6 Regression Discontinuity Estimators
  7.5 Using Aggregate Time Series Data on Cohorts of Participants to Evaluate Programs
  7.6 Panel Data Estimators
    7.6.1 Analysis of the Common Coefficient Model
    7.6.2 The Fixed Effects Method
    7.6.3 $U_t$ Follows a First-Order Autoregressive Process
    7.6.4 $U_t$ is Covariance Stationary
    7.6.5 Repeated Cross-Section Analogs of Longitudinal Procedures
    7.6.6 The Fixed Effect Model
    7.6.7 The Error Process Follows a First-Order Autoregression
    7.6.8 Covariance Stationary Errors
    7.6.9 The Anomalous Properties of First Difference or Fixed Effect Models
    7.6.10 Robustness of Panel Data Methods in the Presence of Heterogeneous Responses to Treatment
    7.6.11 Panel Data Estimators as Matching Estimators
  7.7 Robustness to Biased Sampling Plans
    7.7.1 The IV Estimator and Choice-Based Sampling
    7.7.2 The IV Estimator and Contamination Bias
    7.7.3 Repeated Cross-Section Methods with Unknown Training Status and Choice-Based Sampling
  7.8 Bounding and Sensitivity Analysis
8. Econometric Practice
  8.1 Data Sources
    8.1.1 Using Existing General Survey Data Sets
    8.1.2 Using Administrative Data
    8.1.3 Collecting New Survey Data
    8.1.4 Combining Data Sources
  8.2 Characterizing Selection Bias
  8.3 A Simulation Study of the Sensitivity of Nonexperimental Methods
    8.3.1 A Model of Earnings and Program Participation
    8.3.2 The Data Generating Process
    8.3.3 The Estimators We Examine
    8.3.4 Results from the Simulations
  8.4 Specification Testing and the Fallacy of Alignment
9. Indirect Effects, Displacement, and General Equilibrium Treatment Effects
  9.1 Review of Traditional Approaches to Displacement and Substitution
  9.2 General Equilibrium Approaches
    9.2.1 Davidson and Woodbury
    9.2.2 Heckman, Lochner, and Taber
  9.3 Summary on General Equilibrium Approaches
10. A Survey of Empirical Findings
  10.1 The Objectives of Program Evaluations
  10.2 The Impact of Government Programs on Labor Market Outcomes
  10.3 The Findings from U.S. Social Experiments
  10.4 The Findings from Non-experimental Evaluations of U.S. Programs
  10.5 The Findings from European Evaluations
11. Conclusions

1 Introduction

Public provision of job training, of wage subsidies and of job search assistance is a feature of the modern welfare state. These activities are cornerstones of European "active labor market policies," and have been a feature of U.S. social welfare policy for more than three decades. Such policies also have been advocated as a way to soften the shocks administered to the labor markets of the former East Bloc and Latin American economies currently in transition to market-based systems.

A central characteristic of the modern welfare state is a demand for "objective" knowledge about the effects of various government tax and transfer programs. Different parties benefit and lose from such programs. Assessments of these benefits and losses often play critical roles in policy decision-making. Recently, interest in evaluation has been elevated as many economies with modern welfare states have floundered, and as the costs of running welfare states have escalated.

This chapter examines the evidence on the effectiveness of welfare state active labor market policies such as training, job search and job subsidy policies, and the methods used to obtain the evidence on their effectiveness. Our methodological discussion of alternative approaches to evaluating programs has more general interest. Few U.S. government programs have received such intensive scrutiny, and have been subject to so many different types of evaluation methodologies, as has governmentally-supplied job training. In part, this is because short run measures of the outcomes of government training programs are more easily obtained and more readily accepted. Outcomes such as earnings, employment, and educational and occupational attainment are all more easily measured than the outcomes of health and public school education programs. In addition, short run measures of the outcomes of training programs are more closely linked to the "treatment" of training. In public school and health programs, a variety of inputs over the life cycle often give rise to measured outcomes. For these programs, attribution of specific effects to specific causes is more problematic.
A major focus of this chapter is on the general lessons learned from over thirty years of experience in evaluating government training programs. Most of our lessons come from American studies, because the U.S. government has been much more active in promoting evaluations than other governments have been, and the results from the evaluations are often used to expand, or contract, government programs. We demonstrate that recent studies in Europe indicate that the basic patterns and lessons from the American case apply more generally.

The two relevant empirical questions in this literature are (i) adjusting for their lower skills and abilities, do participants in government employment and training programs benefit from these programs? and (ii) are these programs worthwhile social investments? As currently constituted, these programs are often ineffective on both counts. For most groups of participants, the benefits are modest, and at worst participation in government programs is harmful. Moreover, many programs and initiatives cannot pass a cost-benefit test. Even when programs are cost effective, they are rarely associated with a large scale improvement in skills. But, at the same time, there is substantial heterogeneity in the impacts of these programs. For some groups these programs appear to generate significant benefits both to the participants and to society.

We believe that there are two reasons why the private and social gains from these programs are generally small. First, the per-capita expenditures on participants are usually small relative to the deficits that these programs are being asked to address. In order for such interventions to generate large gains, they would have to be associated with very large internal rates of return. Moreover, these returns would have to be larger than what is estimated for private sector training (Mincer, 1993). Another reason the gains from these programs are generally low is that these services are targeted toward relatively unskilled and less able individuals. Evidence on the complementarity between the returns to training and skill in the private sector suggests that the returns to training in the public sector should be relatively low.

We also survey the main methodological lessons learned from thirty years of evaluation activity conducted mainly in the United States. We have identified eight lessons from the evaluation literature that we believe should guide practice in the future.

First, there are many parameters of interest in evaluating any program. This multiplicity of parameters results in part from the heterogeneous impacts of these programs. As a result of this heterogeneity, some popular estimators that are well-suited for estimating one set of parameters are poorly suited for estimating others. Understanding that responses to the same measured treatment are heterogeneous across people, that measured treatments themselves are heterogeneous, that in many cases people participate in programs based in part on this heterogeneity, and that econometric estimators should allow for this possibility, is an important insight of the modern literature that challenges traditional approaches to program evaluation. Because of this heterogeneity, many different parameters are required to answer the interesting evaluation questions.

Second, there is inherently no method of choice for conducting program evaluations. The choice of an appropriate estimator should be guided by the economics underlying the problem, the data that are available or that can be acquired, and the evaluation question being addressed.

A third lesson from the evaluation literature is that better data helps a lot.
The data available to most analysts have been exceedingly crude, as we document below. Too much has been asked of econometric methods to remedy the defects of the underlying data. When certain features of the data are improved, the evaluation problem becomes much easier. The best solution to the evaluation problem lies in improving the quality of the data on which evaluations are conducted, and not in the development of formal econometric methods to circumvent inadequate data.

Fourth, it is important to compare comparable people. Many non-experimental evaluations identify the parameter of interest by comparing observationally different persons, using extrapolations based on inappropriate functional forms imposed to make incomparable people comparable. A major advantage of nonparametric methods for solving the problem of selection bias is that, rigorously applied, they force analysts to compare only comparable people.

Fifth, evidence that different non-experimental estimators produce different estimates of the same parameter does not indicate that non-experimental methods cannot address the underlying self-selection problem in the data. Instead, different estimates obtained from different estimators simply indicate that different estimators address the selection problem in different ways, and that non-random participation in social programs is an important problem that deserves more attention in its own right. Different methods produce the same estimates only if there is no problem of selection bias.

Sixth, a corollary lesson, derived from lessons three, four and five, is that the message from LaLonde's (1986) influential study of nonexperimental estimators has been misunderstood. Once analysts define bias clearly, compare comparable people, know a little about the unemployment histories of trainees and comparison group members, administer them the same questionnaire and place them in the same local labor market, much of the bias in using nonexperimental methods is attenuated. Variability in estimates across estimators arises from the fact that different nonexperimental estimators solve the selection problem under different assumptions, and these assumptions are often incompatible with each other. Only if there is no selection bias would all evaluation estimators identify the same parameter.

Seventh, three decades of experience with social experimentation have enhanced our understanding of the benefits and limitations of this approach to program evaluation. Like all evaluation methods, this method is based on implicit identifying assumptions. Experimental methods estimate the effect of the program compared to no program at all when they are used to evaluate the effect of a program for which there are few good substitutes. They are less effective when evaluating ongoing programs, in part because they appear to disrupt established bureaucratic procedures. The threat of disruption leads local bureaucrats to oppose their adoption. To the extent that programs are disrupted, the program evaluated by the method is not the ongoing program that one seeks to evaluate. The parameter estimated in experimental evaluations is often not of primary interest to policy makers and researchers, and in any event has to be more carefully interpreted than is commonly done in most public policy discussions.
However, if there is no disruption, and the other problems that plague experiments are absent, the evidence from social experiments provides a benchmark for learning about the performance of alternative non-experimental methods.

Eighth, and finally, programs implemented at a national or regional level affect both participants and nonparticipants. The current practice in the entire "treatment effect" literature is to ignore the indirect effects of programs on nonparticipants by assuming they are negligible. This practice can produce substantially misleading estimates of program impacts if indirect effects are substantial. To account for the impacts of programs on both participants and nonparticipants, general equilibrium frameworks are required when programs substantially impact the economy.

The remainder of the chapter is organized as follows. In Section 2, we distinguish among several types of active labor market policies and describe the types of employment and training services offered both in the U.S. and in Europe, their approximate costs, and their intended effects. We introduce the evaluation problem in Section 3. We discuss the importance of heterogeneity in the response to treatment for defining counterfactuals of interest. We consider what economic questions the most widely used counterfactuals answer. In Section 4, we present three prototypical solutions to the problem cast in terms of mean impacts. These prototypes are generalized throughout the rest of this chapter, but three basic principles introduced in this section underlie all approaches to program evaluation when the parameters of interest are means or conditional means. In Section 5, we present conditions under which social experiments solve the evaluation problem and assess the effectiveness of social experiments as a tool for evaluating employment and training programs. In Section 6, we outline two prototypical models of program participation and outcomes that represent the earliest and the latest thinking in the literature. We demonstrate the implications of these decision rules for the choice of an econometric evaluation estimator. We discuss the empirical evidence on the determinants of participation in government training programs. The econometric models used to evaluate the impact of training programs in nonexperimental settings are described in Section 7. The interplay between the economics of program participation and the choice of an appropriate evaluation estimator is stressed. In Section 8, we discuss some of the lessons learned from implementing various approaches to evaluation. Included in this section are the results of a simulation analysis based on the empirical model of Ashenfelter and Card (1985), in which we demonstrate the sensitivity of the performance of alternative estimators to assumptions about heterogeneity in impact among persons and about other features of the data generating process of the underlying econometric model. We also reexamine LaLonde's (1986) evidence on the performance of nonexperimental estimators and reinterpret the main lessons from his study. Section 9 discusses the problems that arise in using microeconomic methods to evaluate programs with macroeconomic consequences. A striking example of the problems that can arise from this practice is provided. Two empirically operational general equilibrium frameworks are presented, and the lessons from applying them in practice are summarized.
Section 10 surveys the findings from the non-experimental literature and contrasts them with those from experimental evaluations. We conclude in Section 11 by surveying the main methodological lessons learned from the program evaluation literature on job training.

2 Public Job Training and Active Labor Market Policies

Many government policies affect employment and wages. The "active labor market" policies we analyze have two important features that distinguish them from general policies, such as income taxes, that also affect the labor market. First, they are targeted toward the unemployed or toward those with low skills or little work experience who have completed (usually at a low level) their formal schooling. Second, the policies are aimed at promoting employment and/or wage growth among this population, rather than just providing income support.

Table 2.1 describes the set of policies we consider. This set includes: (a) classroom training (CT), consisting of basic education to remedy deficiencies in general skills or vocational training to provide the skills necessary for particular jobs; (b) subsidized employment with public or private employers (WE), which includes public service employment (wholly subsidized temporary government jobs) and work experience (subsidized entry-level jobs at public or non-profit employers designed to introduce young people to the world of work), as well as wage supplements and fixed payments to private firms for hiring new workers; (c) subsidies to private firms for the provision of on-the-job training (OJT); (d) training in how to obtain a job; and (e) in-kind subsidies to job search, such as referrals to employers and free access to job listings. Policies (d) and (e) fall under the general heading of job search assistance (JSA), which also includes the job matching services provided by the U.S. Employment Service and similar agencies in other countries.

As we argue in more detail below, distinguishing the types of training provided is important for two reasons. First, different types of training often imply different economic models of training participation and impact, and therefore different econometric estimation strategies. Second, because most existing training programs provide a mix of these services, heterogeneity in the impact of training becomes an important practical concern. As we show in Section 7, this heterogeneity has important implications for the choice of econometric methods for evaluating active labor market policies.

We do not analyze privately supplied job training, despite its greater quantitative importance to modern economies (see Heckman, Lochner and Taber, 1998a, or Mincer, 1962, 1993). For example, in the United States, Jacob Mincer has estimated that such training amounts to approximately 4 to 5 percent of GDP annually. Despite the magnitude of this investment, there are surprisingly few publicly-available studies of the returns to private job training, and many of those that are available do not control convincingly for the non-random allocation of training among private sector workers. Governments demand publicly-justified evaluations of training programs, while private firms, to the extent that they formally evaluate their training programs, keep their findings to themselves. An emphasis on objective, publicly accessible evaluations is a distinctive feature of the modern welfare state, especially in an era of limited funds and public demands for accountability.
Table 2.2 presents the amount spent on active labor market policies by a number of OECD countries. Most OECD countries provide some mix of the employment and training services described in Table 2.1. Differences among countries include the relative emphasis on each type of service, the particular populations targeted for service, the total resources spent on the programs, how resources are allocated among programs, and the extent to which employment and training services are integrated with other programs such as unemployment insurance or social assistance. In addition, although the programs we study are funded by governments, they are not always conducted by governments, especially in the U.S. and the U.K. In decentralized training systems, private firms and local organizations play an important role in providing employment and training services.

Table 2.2 reveals that many OECD countries spend substantial sums on active labor market policies. In nearly all countries, total expenditures are more than one-third of total expenditures on unemployment benefits, and some countries' expenditures on active labor market policies exceed those on unemployment benefits. Usually only a fraction of these expenditures are for CT. Further, even in countries that emphasize classroom training, governments spend substantial sums on other active labor market policies. Denmark spends 1 percent of its GDP on CT for adults, the most of any OECD country. However, this expenditure amounts to only 40 percent of its total spending on active labor market programs. Only in Canada is the fraction spent on CT larger. At the opposite extreme, Japan and the U.S. spend only 0.03 percent and 0.04 percent, respectively, of their GDP on CT. However, as the table shows, these two countries also spend the smallest share of GDP on active labor market policies.

The low percentage of GDP spent on active labor market programs in the U.S. has led some researchers to comment on the irony that, despite these low expenditures, U.S. programs have been evaluated more extensively and over a longer period of time than programs elsewhere (Haveman and Saks, 1985; Björklund, 1993). Indeed, much of what is known about the impacts of these programs, and many of the methodological developments associated with evaluating them, come from U.S. evaluations.[1]

[1] However, the level of total expenditure in the U.S. is still quite large. Relative total expenditures on active labor market policies can be inferred from Table 2.2 using the relative sizes of each economy compared with the U.S. For example, the German economy is somewhat less than one-fourth the size of the U.S. economy, and the French, Italian and British economies are approximately one-sixth the size of the U.S. economy. Accordingly, training expenditures are somewhat greater in Germany and France, about the same in Italy, and less in the United Kingdom than in the U.S. See OECD, Employment Outlook (1996), Table 1.1, p. 2.
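The inference in footnote 1 is simple arithmetic: a country's absolute spending relative to the U.S. equals the share of its own GDP devoted to active policies times the size of its GDP relative to the U.S. The sketch below makes this concrete. The GDP ratios are the rough figures cited in the footnote; the expenditure shares are illustrative placeholders, not the actual Table 2.2 values.

```python
# GDP ratios follow footnote 1; the ALMP shares of GDP are illustrative
# placeholders, NOT the actual numbers from Table 2.2.
gdp_relative_to_us = {"US": 1.00, "Germany": 0.24, "France": 0.17, "UK": 0.17}
almp_share_of_gdp = {"US": 0.002, "Germany": 0.013, "France": 0.013, "UK": 0.005}

us_spending = gdp_relative_to_us["US"] * almp_share_of_gdp["US"]
for country, gdp_ratio in gdp_relative_to_us.items():
    # absolute spending (in U.S. GDP units) = GDP ratio x share of own GDP
    spending = gdp_ratio * almp_share_of_gdp[country]
    print(f"{country}: spending relative to U.S. = {spending / us_spending:.2f}")
```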
We now consider in detail each type of employment and training service in Table 2.1. This discussion motivates the consideration of alternative economic models of program participation and impact in Sections 6 and 7, and our focus on heterogeneity in program impacts. It also provides a context for the empirical literature on the impact of these programs that we review in Section 10.

The first category listed in Table 2.1 is classroom training. In many countries, CT represents the largest fraction of government expenditures on active labor market policy, and most of that expenditure is devoted to vocational training. Even in the U.S., where remedial programs aimed at high school dropouts and other low-skill individuals play a larger role than elsewhere, most CT programs provide vocational training. By design, most CT programs in the OECD are of limited duration. For example, CT in Denmark typically lasts 2 to 4 weeks (Jensen, et al., 1993), while durations of four months in Sweden and three months in the United Kingdom and the United States are more typical. Per capita expenditures on such training vary substantially, with a training slot costing approximately $7,500 in Sweden and between $2,000 and $3,000 in the United States.[2] The Swedish figures include stipends for participants while the U.S. figures do not.

[2] Unless otherwise indicated, all monetary units are expressed in 1997 U.S. dollars.

An important difference among OECD countries that provide CT is the extent to which the training is relatively standardized, and therefore less tailored to the requirements of firms or the market in general. In the 1980s and early 1990s, the Nordic countries usually provided CT in government training centers that used standardized materials and teaching methods. However, the emphasis has shifted recently, especially in Sweden, toward decentralized and firm-based training. In the United Kingdom and the U.S., the provision of CT is highly decentralized, and its content depends on the choices made by local councils of business, political, and labor leaders. The local councils receive funding from the federal government and then subcontract for CT with private vocational and proprietary schools and local community colleges. Due to this highly decentralized structure, both participant characteristics and training content can vary substantially among locales, which suggests that the impact of training is likely to vary substantially across individuals in evaluations of such programs.

The second category of services listed in Table 2.1 is wage and employment subsidies. This category encompasses several different specific services, which we group together due to their analytic similarity. The simplest example of this type of policy provides subsidies to private firms for hiring workers in particular groups. These subsidies may take the form of a fixed amount for each new employee hired or some fraction of the employee's wage for a period of time. In the U.S., the Targeted Jobs Tax Credit is an example of this type of program. Heckman, Lochner, Smith and Taber (1997) discuss the empirical evidence on the effectiveness of wage and employment subsidies in greater detail.

Temporary work experience (WE) usually targets low skilled youth or adults with poor employment histories and provides them with a job lasting 3 to 12 months in the public or nonprofit sector. The idea of these programs is to ease the transition of these groups into regular jobs by helping them learn about the world of work and develop good work habits. Such programs constitute a very small proportion of U.S. training initiatives, but substantial fractions of the services provided to youth in countries such as France (TUC) and the United Kingdom (Community Programmes).
In public sector employment (PSE) programs, governments create temporary public sector jobs. These jobs usually require some amount of skill and are aimed at unemployed adults with recent work experience rather than at youth or the disadvantaged. Except for a brief period during the late 1970s, they have not been used in the United States since the Depression era. However, they have been and remain an important component of active labor market policy in several European countries.

The third category in Table 2.1 is subsidized on-the-job training at private firms. The goal of subsidized OJT programs is to induce employers to provide job-relevant skills, including firm-specific skills, to disadvantaged workers. In the U.S., employers receive a 50 percent wage subsidy for up to six months; in the U.K., employers receive a lump sum per week (O'Higgins, 1994). Although evidence is limited and firm training is difficult to measure, there is a widespread view that these programs in fact provide little training, even informal on-the-job training, and are better characterized as work experience or wage subsidy programs (e.g., Breen, 1988; Hutchinson and Church, 1989).[3] Survey responses by employers who have hired or sponsored OJT trainees suggest that they value the program for its help in reducing the costs associated with hiring and retaining suitable employees more than for the opportunity to increase the skills of new workers (Begg, et al., 1991). For purposes of evaluation, it is almost always impossible to distinguish those OJT experiences from which new skills were acquired from those that amounted to work experience or a wage subsidy without a training component. In addition, because OJT is provided by individual employers, this indeterminacy is not simply a program-specific feature, but holds among individuals within the same program. Consequently, OJT programs will likely have heterogeneous effects, and the impact, if any, of these programs will result from some combination of learning by doing, the usual training provided by the firm to new workers, and incremental training beyond that provided to unsubsidized workers.

[3] The provision of subsidized OJT is particularly hard to monitor, both because on-the-job training has proven difficult to measure with survey methods (Barron, Berger and Black, 1997) and because trainees often do not perceive that they have been treated any differently than their co-workers who are not subsidized. In fact, both groups may have received substantial amounts of informal on-the-job training. For evidence of the importance of informal on-the-job training in the U.S., see Barron, Black and Lowenstein (1989).

The fourth category of services in Table 2.1 is job search assistance. The purpose of these services is to facilitate the matching process between workers and firms, both by reducing time unemployed and by increasing match quality. The programs are usually operated by the national or local employment service, but sometimes may be subcontracted out to third parties. Included under this category are direct placement in vacant jobs, employer referrals, in-kind subsidies to search such as free access to job listings and telephones for contacting employers, career counseling, and instruction in job search skills. The last of these, which often includes instruction in general social skills, was developed in the U.S., but is now used in the U.K., Sweden, and, recently, France (Björklund and Regner, 1996, p. 24).
In recent years, JSA has become more popular due to its low cost, usually just a few hundred dollars per participant, and its relatively solid record of performance (which we discuss in detail in Section 10).

To conclude this section, we discuss five features of employment and training programs that should be kept in mind when evaluating them. First, as the operation of these programs has become more decentralized in OECD countries, differences have emerged between how these programs were designed and how they are implemented (Hollister and Freedman, 1988). Actual practice can deviate substantially from explicit written policy.[4] Therefore, the evaluator must be careful to characterize the program as implemented when assessing its impacts.

[4] For example, see Breen (1988) and Hollister and Freedman (1990) describing the implementation of WEP in Ireland, and Hollister and Freedman (1990) and Leigh (1995) describing the implementation of JTPA in the United States.

Second, participants often receive services from more than one category in Table 2.1. For example, classroom training in vocational skills might be followed by job search assistance. In the U.K., the Youth Training Scheme (now Youth Training) was explicitly designed to combine OJT with 13 weeks of CT. Some expensive programs combine several of the services listed in Table 2.1 into a single package. For example, in the U.S. the Job Corps program for youth combines classroom training with work experience and job search assistance in a residential setting, at a current cost of around $19,000 per participant. Many available survey data sets do not identify all the services received by a participant. In this case, the practice of combining various types of training, particularly when combinations are tailored to the needs of individual trainees as in the U.S. JTPA program, constitutes another source of heterogeneity in the impact of training. Even when administrative data are available that identify the services received, isolating the impact of particular individual services often proves difficult or impossible in practice, due to the small samples receiving particular combinations of services or due to difficulties in determining the process by which individuals come to receive particular service combinations.

Third, certain features of active labor market programs affect individuals' decisions to participate in training. In some countries, such as Sweden and the United Kingdom, participation in training is a condition for receiving unemployment benefits rather than less generous social assistance payments. In the U.S., participation is sometimes required by a court order in lieu of alternative punishment.

Fourth, program administrators often have considerable discretion over whom they admit into government training programs. This discretion results from the fact that the number of applicants often exceeds the number of available training positions. It has long been a feature of U.S. programs, but has also characterized programs in Austria, Denmark, Germany, Norway, and the United Kingdom (Björklund and Regner, 1996; Westergård-Nielsen, 1993; Kraus, et al., 1997). Consequently, when modeling participation in training, it may be important to account for not only individual incentives, but also those of the program operators. In Section 6, we discuss the incentives facing program operators and how they affect the characteristics of participants in government training programs.
Finally, the different types of services require different economic models of program participation and impact. For example, the standard human capital model captures the essence of individual decisions to invest in vocational skills (CT), but it provides little guidance to behavior regarding job search assistance or wage subsidies. In Section 6 we describe economic models of participation in alternative programs and discuss their implications for evaluation research.

3 The Evaluation Problem and the Parameters of Interest in Evaluating Social Programs

3.1 The Evaluation Problem

Constructing counterfactuals is the central problem in the literature on evaluating social programs. In the simplest form of the evaluation problem, persons are imagined as being able to occupy one of two mutually exclusive states: "0" for the untreated state and "1" for the treated state. Treatment is associated with participation in the program being evaluated.[5] Associated with each state is an outcome, or set of outcomes. It is easiest to think of each state as consisting of only a single outcome measure, such as earnings, but just as easily we can use the framework to model vectors of outcomes such as earnings, employment and participation in welfare programs. In the models presented in Section 6, we study an entire vector of earnings or employment at each age that results from program participation.

[5] In this paper, we only consider a two potential state model in order to focus on the main ideas. Heckman (1998a) develops a multiple state model of potential outcomes for a large number of mutually exclusive states. The basic ideas in his work are captured in the two outcome models we present here.

We can express these outcomes as a function of conditioning variables, $X$. Denote the potential outcomes by $Y_0$ and $Y_1$, corresponding to the untreated and treated states. Each person has a $(Y_0, Y_1)$ pair. Assuming that means exist, we may write the (vector of) outcomes in each state as

(3.1a) $Y_0 = \mu_0(X) + U_0$

(3.1b) $Y_1 = \mu_1(X) + U_1$

where $E(Y_0 \mid X) = \mu_0(X)$ and $E(Y_1 \mid X) = \mu_1(X)$. To simplify the notation, we keep the conditioning on $X$ implicit unless it serves to clarify the exposition by making it explicit. The potential outcome actually realized depends on decisions made by individuals, firms, families or government bureaucrats. This model of potential outcomes is variously attributed to Fisher (1935), Neyman (1935), Roy (1951), Quandt (1972, 1988) or Rubin (1974).

To focus on main ideas, throughout most of this chapter we assume $E(U_1 \mid X) = E(U_0 \mid X) = 0$, although, as we note at several places in this paper, this is not strictly required. For many of the estimators that we consider in this chapter we allow for the more general case

$Y_0 = g_0(X) + U_0$
$Y_1 = g_1(X) + U_1$

where $E(U_0 \mid X) \neq 0$ and $E(U_1 \mid X) \neq 0$. Then $\mu_0(X) = g_0(X) + E(U_0 \mid X)$ and $\mu_1(X) = g_1(X) + E(U_1 \mid X)$.[6] Thus $X$ is not necessarily exogenous in the ordinary econometric usage of that term. These conditions do not imply that $E(U_1 - U_0 \mid X, D = 1) = 0$; $D$ may depend on $U_1$, $U_0$ or $U_1 - U_0$ and $X$. Note also that $Y$ may be a vector of outcomes or a time series of potential outcomes $(Y_{0t}, Y_{1t})$, $t = 1, \ldots, T$, on the same type of variable. We will encounter the latter case when we analyze panel data on outcomes.

[6] For example, an exogeneity assumption is not required when using social experiments to identify $E(Y_1 - Y_0 \mid X, D = 1)$.
In this case, there is usually a companion set of $X$ variables which we will sometimes assume to be strictly exogenous in the conventional econometric meaning of that term: $E(U_{0t} \mid X) = 0$ and $E(U_{1t} \mid X) = 0$, where $X = (X_1, \ldots, X_T)$. In defining a sequence of "treatment on the treated" parameters,

$E(Y_{1t} - Y_{0t} \mid X, D = 1), \quad t = 1, \ldots, T,$

this assumption allows us to abstract from any dependence between $U_{1t}$, $U_{0t}$ and $X$. It excludes differences in $U_{1t}$ and $U_{0t}$ arising from $X$ dependence and allows us to focus on differences in outcomes solely attributable to $D$. While convenient, this assumption is overly strong. However, we stress that the exogeneity assumption, in either cross section or panel contexts, is only a matter of convenience and is not strictly required. What is required for an interpretable definition of the "treatment on the treated" parameter is avoiding conditioning on $X$ variables caused by $D$, even holding the vector of potential outcomes $Y^P = ((Y_{01}, Y_{11}), \ldots, (Y_{0T}, Y_{1T}))$ fixed. More precisely, we require that the conditional density of the data satisfy $f(X \mid D, Y^P) = f(X \mid Y^P)$, i.e., we require that the realization of $D$ not determine $X$ given the vector of potential outcomes. Otherwise, the parameter $E(Y_1 - Y_0 \mid X, D = 1)$ does not capture the full effect of treatment on the treated as it operates through all channels, and certain other technical problems discussed in Heckman (1998a) arise. In order to obtain $E(Y_{1t} - Y_{0t} \mid X_c, D = 1)$ defined on a subset $X_c$ of $X$, simply integrate $E(Y_{1t} - Y_{0t} \mid X, D = 1)$ against the density $f(\tilde{X}_c \mid D = 1)$, where $\tilde{X}_c$ is the portion of $X$ not in $X_c$: $X = (X_c, \tilde{X}_c)$.

Note, finally, that the choice of a base state "0" is arbitrary. Clearly the roles of "0" and "1" can be reversed. In the case of human capital investments, there is a natural base state, but for many other evaluation problems the choice of a base is arbitrary. Assumptions appropriate for one choice of "0" and "1" need not carry over to the opposite choice. With this cautionary note in mind, we proceed as if a well-defined base state exists. In many problems it is convenient to think of "0" as a benchmark "no treatment" state.

The gain to the individual of moving from "0" to "1" is given by

(3.2) $\Delta = Y_1 - Y_0$.

If one could observe both $Y_0$ and $Y_1$ for the same person at the same time, the gain $\Delta$ would be known for each person. The fundamental evaluation problem arises because we do not know both coordinates of $(Y_1, Y_0)$, and hence $\Delta$, for anybody. All approaches to solving this problem attempt to estimate the missing data. These attempts to solve the evaluation problem differ in the assumptions they make about how the missing data are related to the available data, and about what data are available.

Most approaches to evaluation in the social sciences accept the impossibility of constructing $\Delta$ for anyone. Instead, the evaluation problem is redefined from the individual level to the population level, to estimate the mean of $\Delta$, or some other aspect of the distribution of $\Delta$, for various populations of interest. The question becomes what features of the distribution of $\Delta$ should be of interest, and for what populations should it be defined?
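The missing-data structure of the problem is easy to see in a small simulation. The sketch below uses purely illustrative distributions: it draws a $(Y_0, Y_1)$ pair for each person, reveals only one coordinate, and shows that while no individual $\Delta$ is recoverable, the population mean is estimable when assignment is random.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent potential outcomes; in real data the pair is never observed jointly.
y0 = rng.normal(10.0, 2.0, size=n)        # outcome in the untreated state
y1 = y0 + rng.normal(1.0, 3.0, size=n)    # treated state, heterogeneous gain
delta = y1 - y0                           # individual gain: unobservable in practice

# Each person reveals exactly one coordinate of (y0, y1).
d = rng.integers(0, 2, size=n)            # here: random assignment
y = np.where(d == 1, y1, y0)              # the only outcome the analyst sees

# Individual deltas cannot be recovered from (y, d), but the population mean
# is identified because d is independent of (y0, y1):
ate_hat = y[d == 1].mean() - y[d == 0].mean()
print(f"true E(Delta) = {delta.mean():.3f}, estimate = {ate_hat:.3f}")
```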
3.2 The Counterfactuals of Interest

There are many possible counterfactuals of interest for evaluating a social program. One might like to compare the state of the world in the presence of the program to the state of the world if the program were operated in a different way, or to the state of the world if the program did not exist at all, or to the state of the world if alternative programs were used to replace the present program. A full evaluation entails an enumeration of all outcomes of interest for all persons, both in the current state of the world and in all the alternative states of interest, and a mechanism for valuing the outcomes in the different states. Outcomes of interest in program evaluations include the direct benefits received, the levels of behavioral variables for participants and nonparticipants, and the payments for the program for both participants and nonparticipants, including taxes levied to finance a publicly provided program. These measures would be displayed for each individual in the economy to characterize each state of the world.

In a Robinson Crusoe economy, participation in a program is a well-defined event. In a modern economy, almost everyone participates in each social program, either directly or indirectly. A training program affects more than the trainees. It also affects the persons with whom the trainees compete in the labor market, the firms that hire them and the taxpayers who finance the program. The impact of the program depends on the number and composition of the trainees. Participation in a program does not mean the same thing for all people.

The traditional evaluation literature usually defines the effect of participation to be the effect of the program on participants explicitly enrolled in the program. These are the "direct effects." They exclude the effects of a program that do not flow from direct participation, known as the "indirect effects." This distinction appears in the pioneering work of H. G. Lewis on measuring union relative wage effects (Lewis, 1963). His insights apply more generally to all evaluation problems in social settings. There may be indirect effects for both direct participants and direct nonparticipants. Thus a direct participant may pay taxes to support the program, just as persons who do not directly participate may also pay taxes. A firm may be an indirect beneficiary of the lower wages resulting from an expansion of the trained workforce. The conventional econometric and statistical literature ignores the indirect effects of programs and equates "treatment" outcomes with the direct outcome $Y_1$ in the program state and "no treatment" with the direct outcome $Y_0$ in the no program state.

Determining all outcomes in all states is not enough to evaluate a program. Another aspect of the evaluation problem is the valuation of the outcomes. In a democratic society, aggregation of the evaluations and the outcomes in a form useful for social deliberations also is required. Different persons may value the same state of the world differently, even if they experience the same "objective" outcomes and pay the same taxes. Preferences may be interdependent. Redistributive programs exist, in part, because of altruistic or paternalistic preferences. Persons may value the outcomes of other persons either positively or negatively. Only if one person's preferences are dominant (the idealized case of a social planner with a social welfare function) is there a unique evaluation of the outcomes associated with each possible state arising from each possible program.
The traditional program evaluation literature assumes that the valuation of the direct effects of the program boils down to the effect of the program on GDP. This assumption ignores the important point that different persons value the same outcomes differently, and that the democratic political process often entails coalitions of persons who value outcomes in different ways. Both efficiency and equity considerations may receive different weights from different groups. Different mechanisms for aggregating evaluations and resolving social conflicts exist in different societies. Different types of information are required to evaluate a program under different modes of social decision making.

Both for pragmatic and political reasons, government social planners, statisticians or policy makers may value objective output measures differently than the persons or institutions being evaluated. The classic example is the value of nonmarket time (Greenberg, 1997). Traditional program evaluations exclude such valuations, largely because of the difficulty of imputing the value and quantity of nonmarket time. By doing this, however, these evaluations value labor supply in the market sector at the market wage, but value labor supply in the nonmarket sector at a zero wage. By contrast, individuals value labor supply in the nonmarket sector at their reservation wage. In this example, two different sets of preferences value the same outcomes differently. In evaluating a social program in a society that places weight on individual preferences, it is appropriate to recognize personal evaluations and that the same outcome may be valued in different ways by different social actors.

Programs that embody redistributive objectives inherently involve different groups. Even if the taxpayers and the recipients of the benefits of a program have the same preferences, their valuations of a program will, in general, differ. Altruistic considerations often motivate such programs. These often entail private valuations of distributions of program impacts - how much recipients gain over what they would experience in the absence of the program. (See Heckman and Smith, 1993, 1995, 1998a, and Heckman, Smith and Clements, 1997.) Answers to many important evaluation questions require knowledge of the distribution of program gains, especially for programs that have a redistributive objective or for which altruistic motivations play a role in motivating the existence of the program.

Let $D = 1$ denote direct participation in the program and $D = 0$ denote direct nonparticipation. To simplify the argument in this section, ignore any indirect effects. From the standpoint of a detached observer of a social program who takes the base state values (denoted "0") as those that would prevail in the absence of the program, it is of interest to know, among other things:

(A) the proportion of people taking the program who benefit from it: $\Pr(Y_1 > Y_0 \mid D = 1) = \Pr(\Delta > 0 \mid D = 1)$;

(B) the proportion of the total population benefiting from the program: $\Pr(Y_1 > Y_0 \mid D = 1) \cdot \Pr(D = 1) = \Pr(\Delta > 0 \mid D = 1) \cdot \Pr(D = 1)$;

(C) selected quantiles of the impact distribution: $\inf\{\Delta : F(\Delta \mid D = 1) > q\}$, where $q$ is a quantile of the distribution of $\Delta$ and "inf" denotes the smallest attainable value of $\Delta$ that satisfies the condition stated in the braces;

(D) the distribution of gains at selected base state values: $F(\Delta \mid D = 1, Y_0 = y_0)$;

(E) the increase in the level of outcomes above a certain threshold $\bar{y}$ due to a policy: $\Pr(Y_1 > \bar{y} \mid D = 1) - \Pr(Y_0 > \bar{y} \mid D = 1)$.
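Before discussing what each measure captures, a numerical sketch may help fix ideas. Note the caveat built into the exercise: every one of these measures depends on the joint distribution of $(Y_0, Y_1)$ among participants, which is not identified from standard data without additional assumptions, so the joint distribution below is simply posited for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Posited joint distribution of (Y0, Y1) and a participation rule that
# depends partly on the gain; all parameter values are illustrative.
y0 = rng.normal(10.0, 2.0, size=n)
y1 = y0 + rng.normal(1.0, 3.0, size=n)
d = (y1 - y0 + rng.normal(0.0, 2.0, size=n)) > 0
delta_t = (y1 - y0)[d]                        # gains among participants

pA = (delta_t > 0).mean()                     # (A) share of participants who gain
pB = pA * d.mean()                            # (B) share of whole population gaining
qC = np.quantile(delta_t, 0.25)               # (C) lower-quartile impact, q = 0.25
gD = delta_t[y0[d] < 8.0]                     # (D) gains where base state Y0 is low
ybar = 12.0
pE = (y1[d] > ybar).mean() - (y0[d] > ybar).mean()  # (E) share lifted above ybar

print(f"(A) {pA:.2f}  (B) {pB:.2f}  (C) {qC:.2f}  "
      f"(D) mean gain | low Y0: {gD.mean():.2f}  (E) {pE:.2f}")
```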
Measure (A) is of interest in determining how widely program gains are distributed among participants. Participants in the political process with preferences over distributions of program outcomes would be unlikely to assign the same weight to two programs with the same mean outcome, one of which produced favorable outcomes for only a few persons while the other distributed gains more broadly. When considering a program, it is also of interest to determine the proportion of participants who are harmed as a result of program participation, indicated by $\Pr(Y_1 < Y_0 \mid D = 1)$. Negative mean impact results might be acceptable if most participants gain from the program. These features of the outcome distribution are likely to be of interest to evaluators even if the persons studied do not know their $Y_0$ and $Y_1$ values in advance of participating in the program.

Measure (B) is the proportion of the entire population that benefits from the program, assuming that the costs of financing the program are broadly distributed and are not perceived to be related to the specific program being evaluated. If voters have correct expectations about the joint distribution of outcomes, it is of interest to politicians to determine how widely program benefits are distributed. At the same time, large program gains received by a few persons may make it easier to organize interest groups in support of a program than if the same gains are distributed more widely.

Evaluators interested in the distribution of program benefits would be interested in measure (C). Evaluators who take a special interest in the impact of a program on recipients in the lower tail of the base state distribution would find measure (D) of interest. It reveals how the distribution of gains depends on the base state for participants. Measure (E) provides the answer to the question "do the distributions of gains for the participants dominate the distribution of outcomes if they did not participate?" (See Heckman, Smith and Clements, 1997, and Heckman and Smith, 1998a.) Expanding the scope of the discussion to evaluate the indirect effects of the program makes it more likely that estimating distributional impacts is an important part of conducting program evaluations.

3.3 The Counterfactuals Most Commonly Estimated in the Literature

The evaluation problem in its most general form for distributions of outcomes is formidable and is not considered in depth either in this chapter or in the literature. (Heckman and Smith, 1998a, and Heckman, Smith and Clements, 1997, consider identification and estimation of counterfactual distributions.) Instead, in this chapter we focus on counterfactual means, and consider a form of the problem in which analysts have access to information on persons who are in one state or the other at any time, and for certain time periods there are some persons in both states, but there is no information on any single person who is in both states at the same time. As discussed in Heckman (1998a) and Heckman and Smith (1998a), a crucial assumption in the traditional evaluation literature is that the no treatment state approximates the no program state. This would be true if indirect effects are negligible. Most of the empirical work in the literature on evaluating government training programs focuses on means, and in particular on one mean counterfactual: the mean direct effect of treatment on those who take treatment.
The transition from the individual to the group level counterfactual recognizes the inherent impossibility of observing the same person in both states at the same time. By dealing with aggregates, rather than individuals, it is sometimes possible to estimate group impact measures even though it may be impossible to measure the impacts of a program on any particular individual. To see this point more formally, consider the switching regression model with two regimes, denoted by "1" and "0" (Quandt, 1972). The observed outcome $Y$ is given by

(3.3) $Y = D Y_1 + (1 - D) Y_0$.

When $D = 1$ we observe $Y_1$; when $D = 0$ we observe $Y_0$.

To cast the foregoing model in a more familiar-looking form, and to distinguish it from conventional regression models, express the means in (3.1a) and (3.1b) in more familiar linear regression form: $E(Y_j \mid X) = \mu_j(X) = X\beta_j$, $j = 0, 1$. With these expressions, substitute from (3.1a) and (3.1b) into (3.3) to obtain

$Y = D(\mu_1(X) + U_1) + (1 - D)(\mu_0(X) + U_0)$.

Rewriting,

$Y = \mu_0(X) + D(\mu_1(X) - \mu_0(X) + U_1 - U_0) + U_0$.

Using the linear regression representation, we obtain

(3.4) $Y = X\beta_0 + D(X(\beta_1 - \beta_0) + U_1 - U_0) + U_0$.

Observe that, from the definition of a conditional mean, $E(U_0 \mid X) = 0$ and $E(U_1 \mid X) = 0$.

The parameter most commonly invoked in the program evaluation literature, although not the one actually estimated in social experiments or in most nonexperimental evaluations, is the effect of randomly picking a person with characteristics $X$ and moving that person from "0" to "1": $E(Y_1 - Y_0 \mid X) = E(\Delta \mid X)$. In terms of the switching regression model, this parameter is the coefficient on $D$ in the non-error ("regression") component of the following equation:

(3.5) $Y = \mu_0(X) + D(\mu_1(X) - \mu_0(X)) + \{U_0 + D(U_1 - U_0)\}$
  $\phantom{Y} = \mu_0(X) + D\,E(\Delta \mid X) + \{U_0 + D(U_1 - U_0)\}$
  $\phantom{Y} = X\beta_0 + D X(\beta_1 - \beta_0) + \{U_0 + D(U_1 - U_0)\}$,

where the term in braces is the "error." If the model is specialized so that there are $K$ regressors plus an intercept, with $\beta_1 = (\beta_{10}, \ldots, \beta_{1K})$ and $\beta_0 = (\beta_{00}, \ldots, \beta_{0K})$, where the intercepts occupy the first position, and the slope coefficients are the same in both regimes,

$\beta_{1j} = \beta_{0j} = \beta_j, \quad j = 1, \ldots, K,$

with $\beta_{00} = \beta_0$ and $\beta_{10} - \beta_{00} = \alpha$, the parameter under consideration reduces to $\alpha$:

(3.6) $E(Y_1 - Y_0 \mid X) = \beta_{10} - \beta_{00} = \alpha$.

The regression model for this special case may be written as

(3.7) $Y = X\beta + D\alpha + \{U_0 + D(U_1 - U_0)\}$.

It is nonstandard from the standpoint of elementary econometrics because the error term has a component that switches on or off with $D$. In general, its mean is not zero, because

$E[U_0 + D(U_1 - U_0)] = E(U_1 - U_0 \mid D = 1) \Pr(D = 1)$.

If $U_1 - U_0$, or variables statistically dependent on it, help determine $D$, then $E(U_1 - U_0 \mid D = 1) \neq 0$. Intuitively, if persons who have high gains ($U_1 - U_0$) are more likely to appear in the program, then this term is positive.

In practice, most non-experimental and experimental studies do not estimate $E(\Delta \mid X)$. Instead, most nonexperimental studies estimate the effect of treatment on the treated, $E(\Delta \mid X, D = 1)$. This parameter conditions on participation in the program as follows:

(3.8) $E(\Delta \mid X, D = 1) = E(Y_1 - Y_0 \mid X, D = 1) = X(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X, D = 1)$.

It is the coefficient on $D$ in the non-error component of the following regression equation:

(3.9) $Y = \mu_0(X) + D\,E(\Delta \mid X, D = 1) + \{U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)]\}$
  $\phantom{Y} = X\beta_0 + D(X(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X, D = 1)) + \{U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)]\}$.

$E(\Delta \mid X, D = 1)$ is a nonstandard parameter in conventional econometrics.
It combines "structural" parameters $X(\beta_1 - \beta_0)$ with the means of the unobservables, $E(U_1 - U_0 \mid X, D = 1)$. It measures the average gain in the outcome for persons who choose to participate in a program compared to what they would have experienced in the base state. It computes the average gain in terms of both observables and unobservables. It is the latter that makes the parameter look nonstandard. Most econometric activity is devoted to separating $\beta_0$ and $\beta_1$ from the effects of the regressors on $U_1$ and $U_0$. Parameter (3.8) combines these effects.

This parameter is implicitly defined conditional on the current levels of participation in the program in society at large. Thus it recognizes social interaction. But at any point in time the aggregate participation level is just a single number, and the composition of trainees is fixed. From a single cross section of data, it is not possible to estimate how variation in the levels and composition of participants in a program affects the parameter.

The two evaluation parameters we have just presented are the same if we assume that $U_1 - U_0 = 0$, so that the unobservables are common across the two states. From (3.9) we then have $Y_1 - Y_0 = \mu_1(X) - \mu_0(X) = X(\beta_1 - \beta_0)$: the difference between potential outcomes in the two states is a function of $X$ but not of unobservables. Further specializing the model to one of intercept differences (i.e., $Y_1 - Y_0 = \alpha$) requires that the difference between potential outcomes be a constant. The associated regression can be written as the familiar-looking dummy variable regression model

(3.10) $Y = X\beta + D\alpha + U$, where $E(U) = 0$.

The parameter $\alpha$ is easy to interpret as a standard structural parameter, and the specification (3.10) looks conventional. In fact, model (3.10) dominates the conventional evaluation literature. The validity of many conventional instrumental variables methods and longitudinal estimation strategies is contingent on this specification, as we document below. The conventional econometric evaluation literature focuses on $\alpha$ or, more rarely, $X(\beta_1 - \beta_0)$, and the selection problem arises from the correlation between $D$ and $U$.

While familiar, the framework of (3.10) is very special. Potential outcomes $(Y_1, Y_0)$ differ only by a constant ($Y_1 - Y_0 = \alpha$). The best $Y_1$ is the best $Y_0$. All people gain or lose the same amount in going from "0" to "1". There is no heterogeneity in gains. Even in the more general case, with $\mu_1(X)$ and $\mu_0(X)$ distinct, or $\beta_1 \neq \beta_0$ in the linear regression representation, so long as $U_1 = U_0$ among people with the same $X$, there is no heterogeneity in the outcomes of moving from "0" to "1". This assumed absence of heterogeneity in response to treatments is strong. When tested, it is almost always rejected (see Heckman, Smith and Clements, 1997, and the evidence presented below).

There is one case in which $U_1 \neq U_0$ where the two parameters of interest are still equal even though there is dispersion in the gain $\Delta$. This case occurs when

(3.11) $E(U_1 - U_0 \mid X, D = 1) = 0$.

Condition (3.11) arises when, conditional on $X$, $D$ does not explain or predict $U_1 - U_0$. This condition could arise if agents who select into state "1" from "0" either do not know, or do not act on, $U_1 - U_0$, or on information dependent on $U_1 - U_0$, in making their decision to participate in the program. Ex post, there is heterogeneity, but ex ante it is not acted on in determining participation in the program.
When the gain does not affect individuals' decisions to participate in the program, the error terms (the terms in braces in (3.7) and (3.9)) have conventional properties. The only bias in estimating the coefficients on D in the regression models arises from the dependence between U_0 and D, just as the only source of bias in the common coefficient model is the covariance between U and D when E(U | X) = 0. To see this point, take the expectation of the terms in braces in (3.7) and (3.9), respectively, to obtain

E(U_0 + D(U_1 − U_0) | X, D) = E(U_0 | X, D)

and

E(U_0 + D[(U_1 − U_0) − E(U_1 − U_0 | X, D = 1)] | X, D) = E(U_0 | X, D).

A problem that remains when condition (3.11) holds is that the D component in the error terms contributes a component of variance to the model and so makes the model heteroscedastic:

Var(U_0 + D(U_1 − U_0) | X, D) = Var(U_0 | X, D) + 2 Cov(U_0, U_1 − U_0 | X, D) D + Var(U_1 − U_0 | X, D) D.

The distinction between a model with U_1 = U_0 and one with U_1 ≠ U_0 is fundamental to understanding modern developments in the program evaluation literature. When U_1 = U_0 and we condition on X, everyone with the same X has the same treatment effect. The evaluation problem greatly simplifies, and one parameter answers all of the conceptually distinct evaluation questions we have posed. "Treatment on the treated" is the same as the effect of taking a person at random and putting him/her into the program. The distributional questions (A)-(E) all have simple answers because everyone with the same X has the same Δ. Equation (3.10) is amenable to analysis by conventional econometric methods. Eliminating the covariance between D and U is the central problem in this model.

When U_1 ≠ U_0, but (3.11) characterizes the program being evaluated, most of the familiar econometric intuition remains valid. This is the "random coefficient" model, with the coefficient on D "random" (from the standpoint of the observing economist) but uncorrelated with D. The central problem in this model is the covariance between U_0 and D, and the only additional econometric problem arises in accounting for heteroscedasticity to obtain the right standard errors for the coefficients. In this case, the response to treatment varies among persons with the same X values, but the mean effect of treatment on the treated and the effect of treatment on a randomly chosen person are the same.

In the general case, when U_1 ≠ U_0 and (3.11) no longer holds, we enter a new world not covered in the traditional econometric evaluation literature. A variety of different treatment effects can be defined. Conventional econometric procedures often break down or require substantial modification. The error term for the model (3.5) has a non-zero mean.[7] Both error terms are heteroscedastic. The distinctions among these three models - (a) the coefficient on D is fixed (given X) for everyone; (b) the coefficient on D is variable (given X), but does not help determine program participation; and (c) the coefficient on D is variable (given X) and does help determine program participation - are fundamental to this chapter and the entire literature on program evaluation.

[7] E[U_0 + D(U_1 − U_0) | X] = E(U_1 − U_0 | X, D = 1) Pr(D = 1 | X) ≠ 0.

3.4 Is Treatment on the Treated an Interesting Economic Parameter?

What economic question does parameter (3.2) answer? How does it relate to the conventional parameter of interest in cost-benefit analysis - the effect of a program on GDP?
In order to relate the parameter (3.2) to the parameters needed to perform traditional cost-benefit analysis, it is fruitful to consider a more general framework. Following our previous discussion, we consider two discrete states or sectors corresponding to direct participation and nonparticipation, and a vector of policy variables φ that affect the outcomes in both states and the allocation of persons to states or sectors. The policy variables may be discrete or continuous. Our framework departs from the conventional treatment effect literature and allows for general equilibrium effects.

Assuming that costless lump-sum transfers are possible, that a single social welfare function governs the distribution of resources, and that prices reflect true opportunity costs, traditional cost-benefit analysis (see, e.g., Harberger, 1971) seeks to determine the impact of programs on the total output of society. Efficiency becomes the paramount criterion in this framework, with the distributional aspects of policies assumed to be taken care of by lump-sum transfers and taxes engineered by an enlightened social planner. In this framework, impacts on total output are the only objects of interest in evaluating programs. The distribution of program impacts is assumed to be irrelevant. This framework is favorable to the use of mean outcomes to evaluate social programs.

Within the context of the simple framework discussed in Section 3.1, let Y_1 and Y_0 be individual output, which trades at a constant relative price of "1" set externally and not affected by the decisions of the agents we analyze. Alternatively, assume that the policies we consider do not alter relative prices. Let φ be a vector of policy variables which operate on all persons. These generate indirect effects. c(φ) is the social cost of φ denominated in "0" units. We assume that c(0) = 0 and that c is convex and increasing in φ. Let N_1(φ) be the number of persons in state "1" and N_0(φ) the number of persons in state "0". The total output of society is

N_1(φ) E(Y_1 | D = 1, φ) + N_0(φ) E(Y_0 | D = 0, φ) − c(φ),

where N_1(φ) + N_0(φ) = N̄ is the total number of persons in society. For simplicity, we assume that all persons have the same person-specific characteristics X. The vector φ is general enough to include financial incentive variables for participation in the program as well as mandates that assign persons to a particular state. A policy may benefit some and harm others.

Assume for convenience that the treatment choice and mean outcome functions are differentiable, and for the sake of argument further assume that φ is a scalar. Then the change in output in response to a marginal increase in φ from any given position is:

(3.12)  Δ(φ) = [∂N_1(φ)/∂φ] [E(Y_1 | D = 1, φ) − E(Y_0 | D = 0, φ)]
        + [ N_1(φ) ∂E(Y_1 | D = 1, φ)/∂φ + N_0(φ) ∂E(Y_0 | D = 0, φ)/∂φ ] − ∂c(φ)/∂φ.

The first term arises from the transfer of persons across sectors that is induced by the policy change. The second term arises from changes in output within each sector induced by the policy change. The third term is the marginal social cost of the change. In principle, this measure could be estimated from time-series data on the change in aggregate GDP occurring after the program parameter φ is varied.
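A numerical sketch may help fix ideas. Under assumed, purely illustrative functional forms for N_1(φ), the sectoral means, and c(φ) (none of which come from the chapter), the following code checks the decomposition in (3.12) against a finite-difference derivative of total output:

```python
# Finite-difference check of eq. (3.12) under assumed functional forms.
import numpy as np

N = 1000.0                                    # total population N-bar
N1 = lambda phi: N / (1 + np.exp(-phi))       # participants
EY1 = lambda phi: 12.0 + 0.1 * phi            # mean output in sector "1"
EY0 = lambda phi: 10.0                        # mean output in sector "0"
cost = lambda phi: 200.0 * phi ** 2           # convex social cost

def total_output(phi):
    return N1(phi) * EY1(phi) + (N - N1(phi)) * EY0(phi) - cost(phi)

phi, h = 0.5, 1e-6
dN1 = (N1(phi + h) - N1(phi - h)) / (2 * h)
switching = dN1 * (EY1(phi) - EY0(phi))       # first term: sector switchers
within = N1(phi) * 0.1 + (N - N1(phi)) * 0.0  # second term: dEY1/dphi = 0.1, dEY0/dphi = 0
dcost = (cost(phi + h) - cost(phi - h)) / (2 * h)

lhs = (total_output(phi + h) - total_output(phi - h)) / (2 * h)
print(lhs, switching + within - dcost)        # the two numbers agree
```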
Assuming a well-defined social welfare function, and making the additional assumption that prices are constant at initial values, an increase in GDP evaluated at base period prices raises social welfare provided that feasible bundles can be constructed from the output after the social program parameter is varied so that all losers can be compensated (see, e.g., Laffont, 1989, p. 155, or the comprehensive discussion in Chipman and Moore, 1976).

If marginal policy changes have no effect on intra-sector mean output, the bracketed elements in the second set of terms are zero. In this case, the parameters of interest for evaluating the impact of the policy change on GDP are:

(i) ∂N_1(φ)/∂φ, the number of people entering or leaving state 1;

(ii) E(Y_1 | D = 1, φ) − E(Y_0 | D = 0, φ), the mean output difference between sectors;

(iii) ∂c(φ)/∂φ, the social marginal cost of the policy.

It is revealing that nowhere on this list are the parameters that receive the most attention in the econometric policy evaluation literature (see, e.g., Heckman and Robb, 1985a). These are "the effect of treatment on the treated":

(a) E(Y_1 − Y_0 | D = 1, φ), or

(b) E(Y_1 | φ = φ̄) − E(Y_0 | φ = 0), where φ = φ̄ sets N_1(φ̄) = N̄. This is the effect of universal coverage for the program.

Parameter (ii) can be estimated by taking simple mean differences between the outputs in the two sectors; no adjustment for selection bias is required. Parameter (i) can be obtained from knowledge of the net movement of persons across sectors in response to the policy change, something usually neglected in micro policy evaluation (for exceptions, see Moffitt, 1992, or Heckman, 1992). Parameter (iii) can be obtained from cost data. Full social marginal costs should be included in the computation of this term. The typical micro evaluation neglects all three terms. Costs are rarely collected and gross outcomes are typically reported; entry effects are neglected; and term (ii) is usually "adjusted" to avoid selection bias when, in fact, no adjustment is needed to estimate the impact of the program on GDP.

It is informative to place additional structure on this model. This leads to a representation of a criterion that is widely used in the literature on microeconomic program evaluation and also establishes a link with the models of program participation used in the later sections of this chapter. Assume a binary choice random utility framework. Suppose that agents make choices based on net utility and that policies affect participant utility through an additively-separable term k(φ) that is assumed scalar and differentiable. Net utility is

U = X + k(φ),

where k is monotonic in φ and where the joint distributions of (Y_1, X) and (Y_0, X) are F(y_1, x) and F(y_0, x), respectively. The underlying variables are assumed to be continuously distributed.
In the special case of the Roy model of self-selection (see Heckman and Honoré, 1990, for one discussion),

X = Y_1 − Y_0,

D = 1(U ≥ 0) = 1(X ≥ −k(φ)),

where 1(·) is the indicator function (1(Z > 0) = 1 if Z > 0; = 0 otherwise),

N_1(φ) = N̄ Pr(U ≥ 0) = N̄ ∫_{−k(φ)}^{∞} f(x) dx,  and

N_0(φ) = N̄ Pr(U < 0) = N̄ ∫_{−∞}^{−k(φ)} f(x) dx.

Total output is

N̄ ∫_{−∞}^{∞} ∫_{−k(φ)}^{∞} y_1 f(y_1, x | φ) dx dy_1 + N̄ ∫_{−∞}^{∞} ∫_{−∞}^{−k(φ)} y_0 f(y_0, x | φ) dx dy_0 − c(φ).

Under standard conditions (see, e.g., Royden, 1968), we may differentiate this expression to obtain the following expression for the marginal change in output with respect to a change in φ:

(3.13)  Δ(φ) = N̄ k′(φ) f_x(−k(φ)) [E(Y_1 | D = 1, X = −k(φ), φ) − E(Y_0 | D = 0, X = −k(φ), φ)]
        + N̄ [ ∫_{−∞}^{∞} ∫_{−k(φ)}^{∞} y_1 ∂f(y_1, x | φ)/∂φ dx dy_1 + ∫_{−∞}^{∞} ∫_{−∞}^{−k(φ)} y_0 ∂f(y_0, x | φ)/∂φ dx dy_0 ] − ∂c(φ)/∂φ.

This model has a well-defined margin, X = −k(φ), which is the utility of the marginal entrant into the program. The utility of the participant might be distinguished from the objective of the social planner, who seeks to maximize total output. The first set of terms corresponds to the gain arising from the movement of persons at the margin (the term in brackets), weighted by the proportion of the population at the margin, k′(φ) f_x(−k(φ)), times the number of people in the population. This term is the net gain from switching sectors. The expression in brackets in the first term is a limit form of the "local average treatment effect" of Imbens and Angrist (1994), which we discuss further in our discussion of instrumental variables in Section 7.4.3. The second set of terms is the intra-sector change in output resulting from a policy change. This includes both direct and indirect effects. The second set of terms is ignored in most evaluation studies. It describes how people who do not switch sectors are affected by the policy. The third term is the direct marginal social cost of the policy change. It includes the cost of administering the program plus the opportunity cost of consumption foregone to raise the taxes used to finance the program. Below we demonstrate the empirical importance of accounting for the full social costs of programs.

At an optimum, Δ(φ) = 0, provided standard second-order conditions are satisfied. Marginal benefit should equal marginal cost. We can use either a cost-based measure of marginal benefit or a benefit-based measure of cost to evaluate the marginal gains or marginal costs of the program, respectively.

Observe that the local average treatment effect is simply the effect of treatment on the treated for persons at the margin (X = −k(φ)):

(3.14)  E(Y_1 | D = 1, X = −k(φ), φ) − E(Y_0 | D = 0, X = −k(φ), φ) = E(Y_1 − Y_0 | D = 1, X = −k(φ), φ).

This expression is obvious once it is recognized that the set X = −k(φ) is the indifference set. Persons in that set are indifferent between participating in the program and not participating. The Imbens and Angrist (1994) parameter is a marginal version of the "treatment on the treated" evaluation parameter for gross outcomes. This parameter is one of the ingredients required to produce an evaluation of the impact of a marginal change in the social program on total output, but it ignores costs and the effect of a change in the program on the outcomes of persons who do not switch sectors.[8]

[8] Heckman and Smith (1998a) and Heckman (1997) present comprehensive discussions of the Imbens and Angrist (1994) parameter. We discuss this parameter further in Section 7. One important difference between their parameter and the traditional treatment on the treated parameter is that the latter excludes variables like φ from the conditioning set, but the Imbens-Angrist parameter includes it.
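To make the margin concrete, consider a simulated Roy economy (the distribution of gains and the policy change below are assumed purely for illustration). A small increase in φ draws in entrants whose gains lie at the margin X = −k(φ), and their average gain is far below the average gain among all participants:

```python
# Simulated Roy model: the local average treatment effect at the margin
# versus treatment on the treated. All distributions are assumed.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
X = rng.normal(1, 2, n)            # X = Y1 - Y0, mean 1, s.d. 2
k = lambda phi: phi                # policy shifts the participation margin

phi0, phi1 = 0.0, 0.1              # a small policy change
D0 = X >= -k(phi0)
D1 = X >= -k(phi1)
switchers = D1 & ~D0               # persons induced to enter by the change

print(X[switchers].mean())         # ~ 0: gains of marginal entrants, near -k(phi)
print(X[D1].mean())                # treatment on the treated: much larger
```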
The conventional evaluation parameter, E(Y_1 − Y_0 | D = 1, X, φ), does not incorporate costs, does not correspond to a marginal change, and includes rents accruing to persons. This parameter is in general inappropriate for evaluating the effect of a policy change on GDP. However, under certain conditions which we now specify, this parameter is informative about the gross gain accruing to the economy from the existence of a program at level φ̃ compared to the alternative of shutting it down. This is the information required for an "all or nothing" evaluation of a program. The appropriate criterion for an all-or-nothing evaluation of a policy at level φ = φ̃ is

A(φ̃) = {N_1(φ̃) E(Y_1 | D = 1, φ = φ̃) + N_0(φ̃) E(Y_0 | D = 0, φ = φ̃) − c(φ̃)}
      − {N_1(0) E(Y_1 | D = 1, φ = 0) + N_0(0) E(Y_0 | D = 0, φ = 0)},

where φ = 0 corresponds to the case where there is no program, so that N_1(0) = 0 and N_0(0) = N̄. If A(φ̃) > 0, total output is increased by establishing the program at level φ̃.

In the special case where the outcome in the benchmark state "0" is the same whether or not the program exists,

(3.15)  E(Y_0 | D = 0, φ = φ̃) = E(Y_0 | D = 0, φ = 0).

This condition defines the absence of general equilibrium effects in the base state, so the no-program state for nonparticipants is the same as the nonparticipation state. Assumption (3.15) is what enables analysts to generalize from partial equilibrium to general equilibrium settings. Recalling that N̄ = N_1(φ) + N_0(φ), when (3.15) holds we have

(3.16)  A(φ̃) = N_1(φ̃) E(Y_1 − Y_0 | D = 1, φ = φ̃) − c(φ̃).[9]

Given costless redistribution of the benefits, the output-maximizing solution for φ also maximizes social welfare. For this important case, which is applicable to small-scale social programs with partial participation, the measure "treatment on the treated" on which we focus in this chapter is justified. For evaluating the effect of marginal variation or "fine-tuning" of existing policies, the measure Δ(φ) is more appropriate.[10]

[9] Condition (3.15) is stronger than what is required to justify (3.16). The condition only has to hold for the subset of the population (N_0(φ̃) in number) who would not participate in the presence of the program.

[10] Björklund and Moffitt (1987) estimate both the marginal gross gain and the average gross gain from participating in a program. However, they do not present estimates of marginal or average costs.

4 Prototypical Solutions to the Evaluation Problem

An evaluation entails making some comparison between "treated" and "untreated" persons. This section considers three widely-used comparisons for estimating the impact of treatment on the treated, E(Y_1 − Y_0 | X, D = 1). All use some form of comparison to construct the required counterfactual E(Y_0 | X, D = 1). Data on E(Y_1 | X, D = 1) are available from program participants. A person who has participated in a program is paired with an "otherwise comparable" person or set of persons who have not participated in it. The set may contain just one person. In most applications of the method, the paired partner is not literally assumed to be a replica of the treated person in the untreated state, although some panel data evaluation estimators make such an assumption. Thus, in general, Δ = Y_1 − Y_0 is not estimated exactly.
Instead, the outcome of the paired partners is treated as a proxy for Y_0 for the treated individual, and the population mean difference between treated and untreated persons is estimated by averaging over all pairs. The method can be applied symmetrically to nonparticipants to estimate what they would have earned if they had participated. For that problem the challenge is to find E(Y_1 | X, D = 0), since the data on nonparticipants enable one to identify E(Y_0 | X, D = 0). A major difficulty with the application of this method is providing some objective way of demonstrating that a candidate partner or set of partners is "otherwise comparable." Many econometric and statistical methods are available for adjusting differences between persons receiving treatment and potential matching partners, which we discuss in Section 7.

4.1 The Before-After Estimator

In the empirical literature on program evaluation, the most commonly-used evaluation strategy compares a person with himself/herself. This is a comparison strategy based on longitudinal data. It exploits the intuitively-appealing idea that persons can be in both states at different times, and that outcomes measured in one state at one time are good proxies for outcomes in the same state at other times, at least for the no-treatment state. This gives rise to the motivation for the simple "before-after" estimator, which is still widely used. Its econometric descendant is the fixed effect estimator without a comparison group. The method assumes that there is access either (i) to longitudinal data on outcomes measured before and after a program for a person who participates in it, or (ii) to repeated cross-section data from the same population where at least one cross section is from a period prior to the program.

To incorporate time into our analysis, we introduce "t" subscripts. Let Y_{1t} be the post-program earnings of a person who participates in the program. When longitudinal data are available, Y_{0t′} is the pre-program outcome of the person. For simplicity, assume that program participation occurs only at time period k, where t > k > t′. The before-after estimator uses pre-program earnings Y_{0t′} to proxy the no-treatment state in the post-program period. In other words, the underlying identifying assumption is

(4.A.1)  E(Y_{0t} − Y_{0t′} | D = 1) = 0.

If this assumption is valid, the before-after estimator is given by

(4.1)  (Ȳ_{1t} − Ȳ_{0t′})_1,

where the subscript "1" denotes conditioning on D = 1, and the bar denotes sample means. To see how this estimator works, observe that for each individual the gain from the program may be written as

Y_{1t} − Y_{0t} = (Y_{1t} − Y_{0t′}) + (Y_{0t′} − Y_{0t}).

The second term, (Y_{0t′} − Y_{0t}), is the approximation error. If this term averages out to zero, we may estimate the impact of participation on those who participate in a program by subtracting participants' mean pre-program earnings from the mean of their post-program earnings. These means may also be defined for different values of participants' characteristics, X.

The before-after estimator does not literally require longitudinal data to identify the means (Heckman and Robb, 1985a,b). As long as the approximation error averages out, repeated cross-section data that sample the same population over time, but not necessarily the same persons, are sufficient to construct a before-after estimate. An advantage of this approach is that it only requires information on the participants and their pre-participation histories to evaluate the program.
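A minimal implementation of the before-after estimator (4.1) on simulated data (all numbers assumed) shows the role of identifying assumption (4.A.1); adding a common macroeconomic trend between t′ and t violates (4.A.1), and the trend is attributed to the program:

```python
# Before-after estimator, eq. (4.1), on simulated participant data.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
Y0_pre = 10.0 + rng.normal(0, 1, n)         # Y_{0t'}: pre-program earnings
Y0_post = Y0_pre + rng.normal(0, 1, n)      # Y_{0t}: same mean, so (4.A.1) holds
Y1_post = Y0_post + 2.0                     # true impact of 2

print(Y1_post.mean() - Y0_pre.mean())       # ~ 2.0 when (4.A.1) holds

# A common macro trend of 0.5 between t' and t violates (4.A.1):
Y1_post_trend = Y0_pre + 0.5 + rng.normal(0, 1, n) + 2.0
print(Y1_post_trend.mean() - Y0_pre.mean()) # ~ 2.5: the trend is attributed to training
```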
The major drawback to this estimator is its reliance on the assumption that the approximation errors average out. This assumption requires that, among participants, the mean outcome in the no-treatment state be the same in t and t′. Changes in the overall state of the economy between t and t′, or changes in the life cycle position of a cohort of participants, can violate this assumption.

A good example of a case in which assumption (4.A.1) is likely violated is provided in the work of Ashenfelter (1978). Ashenfelter observed that prior to enrollment in a training program, participants experience a decline in their earnings. Later research demonstrates that Ashenfelter's "dip" is a common feature of the pre-program earnings of participants in government training programs. See Figures 4.1 to 4.6, which show the dip for a variety of programs in different countries. If this decline in earnings is transitory, and earnings is a mean-reverting process so that the dip is eventually restored even in the absence of participation in the program, and if period t′ falls in the period of transitorily low earnings, then the approximation error will not average out. In this example, the before-after estimator overstates the average effect of training on the trained and attributes mean reversion that would have occurred in any event to the effect of the program. On the other hand, if the decline is permanent, the before-after estimator is unbiased for the parameter of interest. In this case, any improvement in earnings is properly attributable to the program.

Another potential defect of this estimator is that it attributes to the program any trend in earnings due to macro or lifecycle factors. Two different approaches have been used to solve these problems with the before-after estimator. One controversial method generalizes the before-after estimator by making use of many periods of pre-program data and extrapolating from the period before t′ to generate the counterfactual state in period t. It assumes that Y_{0t} and Y_{0t′} can be adjusted to equality using data on the same person, or the same populations of persons, followed over time. As an example, suppose that Y_{0t} is a function of t, or is a function of t-dated variables. If we have access to enough data on pre-program outcomes prior to date t′ to extrapolate post-program outcomes Y_{0t}, and if there are no errors of extrapolation, or if it is safe to assume that such errors average out to zero across persons in period t, one can replace the missing data, or at least averages of the missing data, with extrapolated values. This method is appropriate if population mean outcomes evolve as deterministic functions of time or macroeconomic variables like unemployment. This procedure is discussed further in Section 7.5.[11] The second approach is based on the difference-in-differences estimator, which we discuss next.

[11] See also Heckman and Robb (1985a), pp. 210-215.

4.2 The Difference-in-Differences Estimator

A more widely used approach to the evaluation problem assumes access either (i) to longitudinal data or (ii) to repeated cross-section data on nonparticipants in periods t and t′. If the mean change in the no-program outcome measures is the same for participants and nonparticipants, i.e.,
if the following assumption is valid:

(4.A.2)  E(Y_{0t} − Y_{0t′} | D = 1) = E(Y_{0t} − Y_{0t′} | D = 0),

then the difference-in-differences estimator, given by

(4.2)  (Ȳ_{1t} − Ȳ_{0t′})_1 − (Ȳ_{0t} − Ȳ_{0t′})_0,  t > k > t′,

is valid for E(Δ_t | D = 1) = E(Y_{1t} − Y_{0t} | D = 1), where Δ_t = Y_{1t} − Y_{0t}, because E[(Ȳ_{1t} − Ȳ_{0t′})_1 − (Ȳ_{0t} − Ȳ_{0t′})_0] = E(Δ_t | D = 1).[12]

[12] The proof is immediate. Make the following decomposition: (Ȳ_{1t} − Ȳ_{0t′})_1 = (Ȳ_{1t} − Ȳ_{0t})_1 + (Ȳ_{0t} − Ȳ_{0t′})_1. The claim follows upon taking expectations.

If assumption (4.A.2) is valid, the change in the outcome measure in the comparison group serves to benchmark common year or age effects among participants. Because we cannot form the change in outcomes between the treated and untreated states, the expression (Y_{1t} − Y_{0t′})_1 − (Y_{0t} − Y_{0t′})_0 cannot be formed for anyone, although we can form one or the other of these terms for everyone. Thus, we cannot use the difference-in-differences estimator to identify the distribution of gains without making further assumptions.[13]

[13] One assumption that identifies the distribution of gains is that (Y_{1t} − Y_{0t})_1 is independent of (Y_{0t} − Y_{0t′})_1 and that the distribution of (Y_{0t} − Y_{0t′})_1 is the same as the distribution of (Y_{0t} − Y_{0t′})_0. Then the results on deconvolution in Heckman, Smith and Clements (1997) can be applied. See their paper for details.

Like the before-after estimator, the difference-in-differences estimator for means (4.2) can be implemented on repeated cross sections. It is not necessary to sample the same persons in periods t and t′, just persons from the same populations.

Ashenfelter's dip provides an example of a case where assumption (4.A.2) is likely to be violated. If Y is earnings, t′ is measured at the time of a transitory earnings dip, and nonparticipants do not experience the dip, then (4.A.2) will be violated, because the time path of no-program earnings between t′ and t will differ between participants and nonparticipants. In this example, the difference-in-differences estimator overstates the average impact of training on the trainee.

4.3 The Cross-Section Estimator

A third estimator compares the mean outcomes of participants and nonparticipants at time t. This estimator is sometimes called the cross-section estimator. It does not compare the same persons, because by hypothesis a person cannot be in both states at the same time. Because of this fact, cross-section estimators cannot estimate the distribution of gains unless additional assumptions are invoked beyond those required to estimate mean impacts. The key identifying assumption for the cross-section estimator of the mean is that

(4.A.3)  E(Y_{0t} | D = 1) = E(Y_{0t} | D = 0),

i.e., that on average persons who do not participate in the program have the same no-treatment outcome as those who do participate. If this assumption is valid, the cross-section estimator is given by

(4.3)  (Ȳ_{1t})_1 − (Ȳ_{0t})_0.

This estimator is valid under assumption (4.A.3) because E[(Ȳ_{1t})_1 − (Ȳ_{0t})_0] = E(Δ_t | D = 1).[14]

[14] Proof: (Ȳ_{1t})_1 − (Ȳ_{0t})_0 = (Ȳ_{1t})_1 − (Ȳ_{0t})_1 + (Ȳ_{0t})_1 − (Ȳ_{0t})_0; take expectations invoking assumption (4.A.3).

If persons go into the program based on outcome measures in the post-program state, then assumption (4.A.3) will be violated. The assumption would be satisfied if participation in the program is unrelated to outcomes in the no-program state in the post-program period. Thus, it is possible for Ashenfelter's dip to characterize the data on earnings in the pre-program period and yet for (4.A.3) to be satisfied.
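The following simulation (with assumed, purely illustrative numbers) contrasts the three estimators when participants experience a transitory Ashenfelter dip at t′. The before-after and difference-in-differences estimators attribute the dip's reversion to the program, while the cross-section estimator remains unbiased here because (4.A.3) holds in this design:

```python
# Ashenfelter's dip and the trilogy of estimators (4.1)-(4.3).
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
impact = 2.0

# Nonparticipants: stable mean earnings in both periods.
Y0_pre_np = 10.0 + rng.normal(0, 1, n)
Y0_post_np = 10.0 + rng.normal(0, 1, n)

# Participants: same long-run mean, but a transitory dip of 1.5 at t'.
Y0_pre_p = 10.0 - 1.5 + rng.normal(0, 1, n)
Y0_post_p = 10.0 + rng.normal(0, 1, n)      # the dip has reverted by t
Y1_post_p = Y0_post_p + impact

before_after = Y1_post_p.mean() - Y0_pre_p.mean()
diff_in_diff = before_after - (Y0_post_np.mean() - Y0_pre_np.mean())
cross_section = Y1_post_p.mean() - Y0_post_np.mean()

print(before_after)    # ~ 3.5: dip reversion is attributed to the program
print(diff_in_diff)    # ~ 3.5: nonparticipants have no dip, so no correction
print(cross_section)   # ~ 2.0: (4.A.3) holds here, so no bias
```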
Moreover, as long as the macro economy and the aging process operate identically on participants and nonparticipants, the cross-section estimator is not vulnerable to the problems that plague the before-after estimator.

The cross-section estimator (4.3), the difference-in-differences estimator (4.2), and the before-after estimator (4.1) comprise the trilogy of conventional non-experimental evaluation estimators. All of these estimators can be defined conditional on observable characteristics X. Conditioning on X or additional "instrumental" variables makes it more likely that modified versions of assumptions (4.A.3), (4.A.2), or (4.A.1) will be satisfied, but this is not guaranteed. If, for example, the distribution of X characteristics differs between participants (D = 1) and nonparticipants (D = 0), conditioning on X may eliminate systematic differences in outcomes between the two groups. Using modern nonparametric procedures, it is possible to exploit each of the identifying conditions to estimate nonparametric versions of all three estimators. On the other hand, if the difference between participants and nonparticipants is due to unobservables, conditioning may accentuate, and not eliminate, differences between participants and nonparticipants in the no-program state.[15]

[15] Thus if |E(Y_0 | D = 1) − E(Y_0 | D = 0)| = M, there is no guarantee that |E(Y_0 | D = 1, X) − E(Y_0 | D = 0, X)| < M. For some values of X, the gap could widen.

The three estimators exploit three different principles, but all are based on making some comparison. The assumptions that justify one method will not, in general, justify any of the other methods. All of the estimators considered in this chapter exploit one of these three principles. They extend the simple mean differences just discussed by making a variety of adjustments to the means. Throughout the rest of the chapter, we organize our discussion of alternative estimators by discussing how they modify the simple mean differences used in the three intuitive estimators to account for nonstationary environments and different regressors in the different comparison groups. We first consider social experimentation and how it constructs the counterfactuals used in policy evaluations.

5 Social Experiments

Randomization is one solution to the evaluation problem. Recent years have witnessed increasing use of experimental designs to evaluate North American employment and training programs. This approach has been less common in Europe, though a small number of experiments have been conducted in Britain, Norway and Sweden. When the appropriate qualifications are omitted, the impact estimates from these social experiments are easy for analysts to calculate and for policymakers to understand (see, e.g., Burtless, 1995). As a result of this apparent simplicity, evidence from social experiments has had an important impact on the design of U.S. welfare and training programs.[16] Because of the importance of experimental designs in this literature, in this section we show how they solve the evaluation problem, describe how they have been implemented in practice, and discuss their advantages and limitations.

[16] We discuss this evidence in Section 10.

5.1 How Social Experiments Solve the Evaluation Problem

An important lesson of this section is that social experiments, like other evaluation methods, provide estimates of the parameters of interest only under certain behavioral and statistical assumptions.
To see this, let "*" denote outcomes in the presence of random assignment. Thus, conditional on X, for each person we have (Y_1*, Y_0*, D*) in the presence of random assignment and (Y_1, Y_0, D) when the program operates normally without randomization. Let R = 1 if a person for whom D* = 1 is randomized into the program and R = 0 if the person is randomized out. Thus, R = 1 corresponds to the experimental treatment group and R = 0 to the experimental control group. The essential assumption required to use randomization to solve the evaluation problem for estimating the mean effect of treatment on the treated is that

(5.A.1)  E(Y_1* − Y_0* | X, D* = 1) = E(Y_1 − Y_0 | X, D = 1).

A stronger set of conditions, not strictly required, is

(5.A.2a)  E(Y_1* | X, D* = 1) = E(Y_1 | X, D = 1)

and

(5.A.2b)  E(Y_0* | X, D* = 1) = E(Y_0 | X, D = 1).

Assumption (5.A.2a) states that the means from the treatment and control groups generated by random assignment produce the desired population parameter. With certain exceptions discussed below, this assumption rules out changes in the impact of participation due to the presence of random assignment as well as changes in the process of program participation. The first part of this assumption can in principle be tested by comparing the outcomes of participants under a regime of randomization with the outcomes of participants under the usual regime. If (5.A.2a) is true, among the population for whom D = 1 and R = 1 we can identify

E(Y_1 | X, D = 1, R = 1) = E(Y_1 | X, D = 1).

Under (5.A.2a), information sufficient to estimate this mean without bias is routinely produced from data collected on participants in social programs. The new information produced by an experiment comes from those randomized out of the program. Using the experimental control group, it is possible to estimate

E(Y_0 | X, D = 1, R = 0) = E(Y_0 | X, D = 1).

Thus, experiments produce data that satisfy assumption (4.A.3). Simple application of the cross-section estimator identifies

E(Δ | X, D = 1) = E(Y_1 − Y_0 | X, D = 1).

Within the context of the model of equation (3.10), an experiment that satisfies (5.A.1), or (5.A.2a) and (5.A.2b), does not make D orthogonal to U. It simply equates the bias in the two groups R = 1 and R = 0. Thus, in the model of equation (3.1), under (5.A.2a) and (5.A.2b),

E(Y | X, D = 1, R = 1) = g_1(X) + E(U_1 | X, D = 1)

and

E(Y | X, D = 1, R = 0) = g_0(X) + E(U_0 | X, D = 1).[17]

Rewriting the first conditional mean, we obtain

E(Y | X, D = 1, R = 1) = g_1(X) + E(U_1 − U_0 | X, D = 1) + E(U_0 | X, D = 1).

Subtracting the second mean from the first eliminates the common selection bias component E(U_0 | X, D = 1), so

E(Y | X, D = 1, R = 1) − E(Y | X, D = 1, R = 0) = g_1(X) − g_0(X) + E(U_1 − U_0 | X, D = 1).

When the model (3.1) is specialized to one of intercept differences, as in (3.10), this parameter simplifies to α. Notice that the method of social experiments does not set either E(U_1 | X, D = 1) or E(U_0 | X, D = 1) equal to zero. Rather, it balances the selection bias in the treatment and control groups.

[17] Notice that in this section we allow for the more general model Y_0 = g_0(X) + U_0, Y_1 = g_1(X) + U_1, where E(U_0 | X) ≠ 0 and E(U_1 | X) ≠ 0.
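A simulated version of this argument (the design and all numbers are assumed) makes the balancing property concrete. Applicants select into the program partly on U_0, so both experimental arms carry the same nonzero selection term E(U_0 | X, D = 1), and the treatment-control contrast recovers the effect of treatment on the treated:

```python
# Randomization conditional on D = 1 balances, rather than removes,
# selection bias. Simulated design with assumed numbers.
import numpy as np

rng = np.random.default_rng(5)
n = 400_000
U0 = rng.normal(0, 2, n)
gain = 2.0 + rng.normal(0, 1, n)
accepted = U0 + gain > 1.0                 # selection on U0 and the gain
U0_a, gain_a = U0[accepted], gain[accepted]

R = rng.integers(0, 2, U0_a.size)          # randomize among accepted applicants
Y_treat = 10.0 + U0_a + gain_a             # outcomes if trained
Y_ctrl = 10.0 + U0_a                       # outcomes if randomized out

print(U0_a.mean())                                     # E(U0 | D = 1) > 0: bias in both arms
print(Y_treat[R == 1].mean() - Y_ctrl[R == 0].mean())  # experimental contrast
print(gain_a.mean())                                   # target: E(Delta | D = 1)
```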
Stronger assumptions must be made to identify the distribution of impacts, F(Δ | D = 1). Without invoking further assumptions, data from experiments, like data from nonexperimental sources, are unable to identify the distribution of impacts because the same person is not observed in both states at the same time (Heckman, 1992; Heckman, Smith and Clements, 1997; Heckman and Smith, 1993, 1995, 1998a).

If assumption (5.A.1) or assumptions (5.A.2a) and (5.A.2b) fail to hold because the program participation probabilities are affected, so that D* and D differ, then the composition of the participant population differs in the presence of random assignment. In two important special cases, experimental data still provide unbiased estimates of the effect of treatment on the treated. First, if the effect of training is the same for everyone, changing the composition of the participants has no effect because the parameter of interest is the same for all possible participant populations (Heckman, 1992). This assumption is sometimes called the common treatment effect assumption and, letting subscript i denote the value for individual i, may be formally expressed as

(5.A.3)  Y_{1i} − Y_{0i} = Δ_i ≡ Δ for all i.

This assumption is equivalent to setting U_1 = U_0 in (3.9). Assumption (5.A.3) can be defined conditionally on observed characteristics, so we may write Δ = Δ(X). Notice, however, that in this case, if randomization induces persons with certain X values not to participate in the program, then estimates of Δ(X) can only be obtained for values of X possessed by persons who participate in the program. In this case (5.A.1) is satisfied but (5.A.2a) and (5.A.2b) are not.

The second special case where experimental data still provide unbiased estimates of the effect of treatment on the treated arises when decisions about training are not affected by the realized gain from participating in the program. This case could arise if potential trainees know E(Δ | X) but not Δ at the time participation decisions are made. Formally, the second condition is

(5.A.4)  E(Δ | X, D = 1) = E(Δ | X),

which is equivalent to condition (3.11) in the model (3.9). If either (5.A.3) or (5.A.4) holds, the simple experimental mean difference estimator is unbiased for E(Δ | X, D = 1).

Randomization improves on the non-experimental cross-section estimator even if there is no selection bias. In an experiment, for all values of X for which D = 1, one can identify[18] E(Δ | X, D = 1) = E(Y_1 − Y_0 | X, D = 1). Using assumption (4.A.3) in an ordinary nonexperimental evaluation, there may be values of X such that Pr(D = 1 | X) = 1; that is, there may be values of X with no comparison group members. Randomization avoids this difficulty by balancing the distribution of X
We cannot estimate the e¤ect of randomly selecting a person to go into the program: E(¢ j X) = E(Y1 ¡ Y0 j X); by using simple experimental means unless one of two conditions prevails. The …rst condition is just the common e¤ect assumption (5.A.3). This assumption is explicit in the widely-used dummy endogenous variable model (Heckman, 1978). The second condition is that embodied in assumption (5.A.4), that participation decisions are independent of the person-speci…c component of the impact. In both cases, the mean impact of treatment on a randomly selected person is the same as the mean impact of treatment on the treated. In the general case, it is di¢cult to estimate the e¤ect of randomly assigning a person with characteristics X to go into a program. This is because persons randomized into a program cannot be compelled to participate in it. In order to secure compliance, it may be necessary to compensate or persuade persons to participate. For example, in many U.S. social experiments, program operators threaten to reduce participants’ social assistance bene…ts, if they refuse to participate in training. Such actions, even if successful, alter the environment in which persons operate and may make it impossible to estimate E(¢ j X) using experimental means. One assumption that guarantees compliance is the existence of a “compensation” or “punishment” level c such that (5.A.5a) Pr(D = 1 j X; c) = 1 and (5.A5b) E(¢ j X; c) = E(¢ j X): The …rst part of the assumption guarantees that a person with characteristics X can be “bribed” or “persuaded” to participate in the program. The second part of the assumption guarantees that compensation c does not a¤ect the outcome being evaluated.19 If c is a 19 Observe that the value of c is not necessarily unique. 38 monetary payment, it would be optimal from the standpoint of an experimental analyst to …nd the minimal value of c that satis…es these conditions. Randomization of eligibility is sometimes proposed as a less disruptive alternative to randomization conditional on D = 1. Randomizing eligibility avoids the application and screening costs that are incurred when accepted individuals are randomized out of a program. Because the randomization is performed outside of training centers, it also avoids some of the political costs that have accompanied the use of the experimental method. Consider a population of persons who are usually eligible for the program. Randomize eligibility within this population. Let e = 1 if a person retains eligibility and e = 0 if a person becomes ineligible. Assume that eligibility does not disturb the underlying structure of the random variables (Y0 ; Y1 ; D; X) and that Pr(D = 1 j X) 6= 0. Then Heckman (1996) shows that E(Y j X; e = 1) ¡ E(Y j X; e = 0) = E(¢ j X; D = 1): Pr(D = 1 j X; e = 1) Randomization of eligibility produces samples that can be used to identify E(¢ j X; D = 1) and also to recover Pr(D = 1jX): The latter is not recovered from samples which condition on D = 1 (Heckman, 1992; Mo¢tt, 1992). Without additional assumptions of the sort previously discussed, randomization on eligibility will not, in general, identify E(¢ j X). 5.2 Intention to Treat and Substitution Bias The objective of most experimental designs is to estimate the conditional mean impact of training, or E(¢ j X; D = 1). 
However, in many experiments a significant fraction of the treatment group drops out of the program and does not receive the services being evaluated.[20] In general, in the presence of dropping out, E(Δ | X, D = 1) cannot be identified using comparisons of means. Instead, the experimental mean difference estimates the mean effect of the offer of treatment, or what is sometimes called the "intent to treat." For many purposes, this is the policy-relevant parameter. It is informative about how the availability of a program affects participant outcomes. Attrition is a normal feature of an ongoing program.

[20] Using the analysis in the preceding subsection, dropping out by experimental treatment group members could be reduced by compensating them for completing training.

To obtain an estimate of the impact of training on those who actually receive it, additional assumptions are required beyond (5.A.1) or (5.A.2a) and (5.A.2b). Let T be an indicator for actual receipt of treatment, with T = 1 for persons actually receiving training and T = 0 otherwise. Let T* be a similarly defined latent variable for control group members indicating whether or not they would have actually received training had they been in the treatment group. Define

E(Δ | X, D = 1, R = 1, T = 1) = E(Δ | X, D = 1, T = 1)

as the mean impact of training on those members of the treatment group who actually receive it. This parameter will equal the original parameter of interest, E(Δ | X, D = 1), only in the special cases where (5.A.3), the common effect assumption, holds, or where an analog to (5.A.4) holds, so that the decision of treatment group members to drop out is independent of (Δ − E(Δ)), the person-specific component of their impact.

A consistent estimate of the impact of training on those who actually received it can be obtained under the assumption that the mean outcome of the treatment group dropouts is the same as that of their analogs in the control group, so that

(5.A.6)  E(Y | X, D = 1, R = 1, T = 0) = E(Y | X, D = 1, R = 0, T* = 0).

Note that this assumption rules out situations where the treatment group dropouts receive potentially valuable partial treatment. Under (5.A.6),

(5.1)  [E(Y | X, D = 1, R = 1) − E(Y | X, D = 1, R = 0)] / Pr(T = 1 | X, D = 1, R = 1)

identifies the mean impact of training on those who receive it.[21] This estimator scales up the experimental mean difference estimate by dividing it by the fraction of the treatment group receiving training. When all treatment group members receive training, the denominator equals one and the estimator reduces to the simple experimental mean difference. Estimator (5.1) also shows that the simple mean difference estimator provides a downward-biased estimate of the mean impact of training on the trained when there are dropouts from the treatment group, because the denominator always lies between zero and one.

[21] See, e.g., Mallar (1978), Bloom (1984) and Heckman, Smith and Taber (1998).

Heckman, Smith and Taber (1998) present methods for estimating distributions of outcomes and for testing the identifying assumptions in the presence of dropping out. They present evidence on the validity of the assumptions that justify (5.1) in the National JTPA Study data.
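A minimal sketch of estimator (5.1) on simulated data (numbers assumed; dropout here is random within the treatment group, so (5.A.6) holds by construction):

```python
# Dropout adjustment, eq. (5.1): scale the experimental mean difference
# by the fraction of the treatment group actually trained.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
Y0 = 10.0 + rng.normal(0, 2, n)
impact = 2.0
R = rng.integers(0, 2, n)                  # random assignment among D = 1
T = (R == 1) & (rng.random(n) < 0.6)       # only 60% of treatments get trained

Y = Y0 + impact * T                        # dropouts receive Y0, as (5.A.6) requires
mean_diff = Y[R == 1].mean() - Y[R == 0].mean()
p_trained = T[R == 1].mean()

print(mean_diff)                # ~ 1.2: intention-to-treat, diluted by dropouts
print(mean_diff / p_trained)    # ~ 2.0: impact on those actually trained
```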
In an experimental evaluation, the converse problem can also arise for the control group members. In an ideal experiment, no control group members would receive either the experimental treatment or close substitutes to it from other sources. In practice, a significant fraction of controls often receives similar services from other sources. In this situation, the mean earnings of control group members no longer correspond to E(Y_0 | X, D = 1), and neither the experimental mean difference estimator nor the adjusted estimator (5.1) identifies the impact of training relative to no training for those who receive it. However, under certain conditions discussed in Section 3, the experimental estimate can be interpreted as the mean incremental effect of the program relative to a world in which it does not exist.

As in the case of treatment group dropouts, identifying the impact of training on the trained in the presence of control group substitution requires additional assumptions beyond (5.A.1) or (5.A.2a) and (5.A.2b). Let S = 1 denote control group members receiving substitute training from alternative sources, let S = 0 denote control group members receiving no training, and let Y_2 be the outcome conditional on receipt of alternative training. Consider the general case with both treatment group dropping out and control group substitution. In this context, one approach would be to invoke the assumptions required to apply the non-experimental techniques described in Section 7 to the treatment group data to obtain an estimate of the impact of the training being evaluated on those who receive it. Heckman, Hohmann, Khoo and Smith (1998) employ this and other strategies using data from the National JTPA Study.

Alternatively, two other assumptions allow use of the control group data to estimate the impact of training on the trained. The first assumption is a generalized common effect assumption, where to distinguish individuals we restore the subscript i:

(5.A.3′)  Y_{1i} − Y_{0i} = Y_{2i} − Y_{0i} = Δ_i ≡ Δ for all i.

This assumption states (a) that the impact of the program being evaluated is the same as the impact of substitute programs for each person, and (b) that all persons respond exactly the same way to the program (a common effect assumption). The second assumption is a generalized version of (5.A.4):

(5.A.4′)  E(Y_1 − Y_0 | X, D = 1, T = 1, R = 1) = E(Y_2 − Y_0 | X, D = 1, S = 1, R = 0).

This assumption states that the mean impact of the training being evaluated on those treatment group members who do not drop out equals the mean impact of substitute training on those control group members who receive it. Both (5.A.3′) and (5.A.4′) are strong assumptions. To be plausible, either would require evidence that the training received by treatment group members was similar in content and duration to that received by control group members. Note that (5.A.3′) implies (5.A.4′). Under either assumption, the ratio

(5.2)  [E(Y | X, D = 1, R = 1) − E(Y | X, D = 1, R = 0)] / [Pr(T = 1 | X, D = 1, R = 1) − Pr(S = 1 | X, D = 1, R = 0)]

identifies the mean impact of training on those who receive it in both the experimental treatment and control groups, provided that the denominator is not zero. The similarity of estimator (5.2) to the instrumental variable estimator defined in Section 7 is not accidental; under assumptions (5.A.3′) or (5.A.4′), random assignment is a valid instrument for training because it is correlated with training receipt but not with any other determinants of the outcome Y. Without one of these assumptions, random assignment is not, in general, a valid instrument (Heckman, 1997; Heckman, Hohmann, Khoo and Smith, 1998). To see this point, consider a model in which individuals know their gain from training but, because the treatment group has access to the program being evaluated, it faces a lower cost of training. In this case, controls are less likely to be trained, but the mean gross impact would be larger among control trainees than among the treatment trainees. Drawing on the analysis of Section 7, this correlation violates the condition required for the IV estimator to identify the parameter of interest.
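By contrast, when the common effect assumption (5.A.3′) does hold, estimator (5.2) recovers the impact of training on the trained despite both dropping out and substitution. The following sketch on simulated data (all numbers assumed) illustrates:

```python
# Substitution-adjusted estimator, eq. (5.2), under a common effect.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
Y0 = 10.0 + rng.normal(0, 2, n)
delta = 2.0                                # common effect of any training
R = rng.integers(0, 2, n)
T = (R == 1) & (rng.random(n) < 0.7)       # 70% of treatments trained
S = (R == 0) & (rng.random(n) < 0.3)       # 30% of controls substitute

Y = Y0 + delta * (T | S)
mean_diff = Y[R == 1].mean() - Y[R == 0].mean()
receipt_gap = T[R == 1].mean() - S[R == 0].mean()

print(mean_diff)                 # ~ 0.8 = 2.0 * (0.7 - 0.3)
print(mean_diff / receipt_gap)   # ~ 2.0: impact of training on the trained
```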
5.3 Social Experiments in Practice

In this subsection we discuss how social experiments operate in practice. We present empirical evidence on some of the theoretical issues surrounding social experiments discussed in the preceding subsections and provide a context for the discussion of the experimental evidence on the impact of training in Section 10. To make the discussion concrete, we focus in particular on two of the best-known U.S. social experiments: the National Supported Work (NSW) Demonstration (Hollister, et al., 1984) and the recent National JTPA Study (NJS).[22] We begin with a brief discussion of the implementation of these two experiments.

[22] See, among others, Doolittle and Traeger (1990), Bloom, et al. (1993) and Orr, et al. (1994).

5.3.1 Two Important Social Experiments

The NSW Demonstration was one of the first employment and training experiments. It tested the effect of 9 to 18 months of guaranteed work experience in unskilled occupations on groups of long-term AFDC (welfare) recipients, ex-drug addicts, ex-criminal offenders, and economically disadvantaged youths in 10 sites across the U.S. These jobs were in a sheltered environment in which productivity standards were gradually raised over time and participants met frequently with program counselors to discuss grievances and performance.

The NSW enrollment process began with a referral, usually by a welfare agency, drug rehabilitation agency, or prisoners' assistance society. Program operators then interviewed potential participants and eliminated any persons that they believed "would be disruptive to their programs" (Hollister, et al., 1984, p. 35). Following this screening, a third party randomly assigned one-half of the qualified applicants to the treatment group. The remainder were assigned to the control group and prevented from receiving NSW services. Although the controls could not receive NSW services, program administrators could not prevent them from receiving other training services in their community, such as those offered under another widely available training program with the acronym CETA. Follow-up data on the experimental treatment and control groups were collected via both surveys and administrative earnings records.

In contrast to the NSW, the NJS sought to evaluate the effectiveness of an ongoing training program. From the start, the goal of evaluating an ongoing program without significantly disrupting its operations – and thereby violating assumption (5.A.1) or assumptions (5.A.2a) and (5.A.2b) – posed significant problems. The first of these arose in selecting the training centers at which random assignment would take place. Initially, evaluators planned to use a random sample of the nearly 600 U.S. JTPA training sites. Randomly choosing the evaluation sites would enhance the "external validity" of the experiment – the extent to which its findings can be generalized to the population of JTPA training centers.
Yet it was difficult to persuade local administrators to participate in an evaluation that required them to deny services randomly to eligible applicants. When only four of the randomly selected sites or their alternates agreed to participate, the study was redesigned to include a "diverse" group of 16 centers willing to participate in a random assignment study (see Doolittle and Traeger, 1990, or the summary of their analysis presented in Hotz, 1992). Evaluators had to contact 228 JTPA training centers in order to obtain these sixteen volunteers.[23] The option of forcing centers to participate was rejected because of the importance of securing the cooperation of local administrators in preserving the integrity of random assignment. Such concerns are not without foundation, as the integrity of an experimental training evaluation in Norway was undermined by the behavior of local operators (Torp, et al., 1993).

[23] Very large training centers (e.g., Los Angeles) and small, rural centers were excluded from the study design from the outset of the center enrollment process, for administrative and cost reasons, respectively. The final set of 16 training centers received a total of $1 million in payments to cover the costs of participating in the experiment.

Concerns about disrupting normal program operations and violating (5.A.1) or (5.A.2a)-(5.A.2b) also led to an unusual approach to the evaluation of the specific service types provided by JTPA. This program offers a personalized mix of employment and training services, including all those listed in Table 2.1 with the exception of public service employment. During their enrollment in the program, participants may receive two or more of these services in sequence, where the sequence may depend on the participant's success or failure in the services provided first. As a result of this heterogeneous, fluid structure, it was impossible, without changing the character of the program, to conduct random assignment conditional on (planned) receipt of particular services or sets of services. Instead, JTPA staff recommended particular services for each potential participant prior to random assignment, and impact estimates were calculated conditional on these recommendations. In particular, the recommendations were grouped into three "treatment streams": the "CT-OS stream," which included persons recommended for classroom training (CT) and possibly other services (OS), but not on-the-job training (OJT); the "OJT stream," which included persons recommended for OJT (and possibly other services) but not CT; and the "other stream," which included the rest of the admitted applicants, most of whom ended up receiving only job search assistance. Note that this issue did not arise in the NSW, which provided a single service to all of its participants. In the NJS, follow-up data on earnings, employment and other outcomes were obtained from both surveys and multiple administrative data sources.

5.3.2 The Practical Importance of Dropping Out and Substitution

The most important problems affecting social experiments are treatment group dropout and control group substitution. These problems are not unique to experiments. Persons drop out of programs whether or not they are experimentally evaluated, and there is no evidence that the rate of dropping out increases during an experimental evaluation. Most programs have good substitutes, so that the effect of a program as typically estimated is measured relative to the full range of activities in which nonparticipants engage.
Experiments exacerbate this problem by creating a pool of persons who attempted to take training and who then flock to substitute programs when they are placed in an experimental control group.

Table 5.1 demonstrates the practical importance of these problems in experimental evaluations by reporting the rates of treatment group dropout and control group substitution from a variety of social experiments. It reveals that the fraction of treatment group members receiving program services is often less than 0.7, and sometimes less than 0.5. Furthermore, the observed characteristics of the treatment group members who drop out often differ from those of members who remain and receive the program services.[24] In regard to substitution, Table 5.1 shows that as many as 40 percent of the controls in some experiments received substitute services elsewhere. In an ideal experiment, all treatment group members receive the treatment and there is no control group substitution, so that the difference between the fractions of treatments and controls that receive the treatment equals 1.0. In practice, this difference is often well below 1.0.

[24] For the NSW, see LaLonde (1984); for the NJS, see Smith (1992).

The extent of both substitution and dropout depends on the characteristics of the treatment being evaluated and on the local program environment. In the NSW, where the treatment was relatively unique and of high enough quality to be clearly perceived as valuable by participants, dropout and substitution rates were low enough to approximate the ideal case. In contrast, in the NJS and other evaluations of programs that provide low-cost services widely available from other sources, substitution and dropout rates are high.[25]

[25] For the NJS, Table 5.1 reveals the additional complication that estimates of the rate of training receipt in the treatment and control groups depend on the data source used to make the calculation. In particular, because many treatment group members do not report training that administrative records show they received, dropout rates measured using only the survey data are substantially higher than those that combine the survey and administrative data. At the same time, because administrative data are not available on control group training receipt (other than for the very small number of persons who defeated the experimental protocol), using only self-report data on controls but the combined data for the treatment group will likely overstate the difference in service receipt levels between the two groups.

In the NJS, the substitution problem is accentuated by the fact that JTPA relies on outside vendors to provide most of its training. Many of these vendors, such as community colleges, provide the same training to the general public, often with subsidies from other government programs such as Pell Grants. In addition, in order to help in recruiting sites to participate in the NJS, evaluators allowed them to provide control group members with a list of alternative training providers in the community. Of the 16 sites in the NJS, 14 took advantage of this opportunity to alert control group members to substitute training opportunities.

To see the effect of high rates of dropping out and substitution on the interpretation of the experimental evidence, consider Project Independence. The unadjusted experimental impact estimate is $264 over the 2-year follow-up period, while application of the IV estimator that uses sample moments in place of (5.2) yields an adjusted impact estimate of $1,100 ($264/0.24).
The first estimate indicates the mean impact of the offer of treatment relative to the other employment and training opportunities available in the community. Under assumptions (5.A.3′) or (5.A.4′), the latter estimate indicates the impact of training relative to no training in both the treatment and control groups. Under these assumptions, the high rates of dropping out and substitution suggest that the experimental mean difference estimate is strongly downward biased as an estimate of the impact of treatment on the treated, the primary parameter of policy interest.

A problem unique to experimental evaluations is violation of (5.A.1), or of (5.A.2a) and (5.A.2b), which produces what Heckman (1992) and Heckman and Smith (1993, 1995) call "randomization bias." In the NJS, this problem took the form of concerns that expanding the pool of accepted applicants, which was required to keep the number of participants at normal levels while creating a control group, would change the process of selection of persons into the program. Specifically, training centers were concerned that the additional recruits brought in during the experiment would be less motivated and harder to train and would therefore benefit less from the program. Concerns about this problem were frequently cited by training centers that declined to participate in the NJS (Doolittle and Traeger, 1990). To partially allay these concerns, random assignment was changed from the 1:1 ratio that minimizes the sampling variance of the experimental impact estimator to a 2:1 ratio of treatments to controls.

Although we have no direct evidence on the empirical importance of changes in participation patterns on measured outcomes during the NJS, there is some indirect evidence about the validity of (5.A.1) or (5.A.2a) and (5.A.2b) in this instance. First, a number of training centers in the NJS streamlined their intake processes during the experiment - sometimes with the help of an intake consulting firm whose services were subsidized as part of the evaluation. In so doing, they generally reduced the number of visits and other costs paid by potential trainees, thereby including among those randomly assigned less motivated persons than were normally served. Second, some training centers asked for, and received, additional temporary reductions in the random assignment ratio during the course of the experiment when they experienced difficulties recruiting sufficient qualified applicants to keep the program operating at normal levels.

A second problem unique to experiments involves obtaining experimental estimates of the effects of individual components of services provided in sequence as part of a single program. Experimental designs can readily determine how access to a bundle of services affects participants' earnings. More difficult is the question of how participation at each stage influences earnings when participants can drop out during the sequence. Providing an experimental answer to this question requires randomization at each stage in the sequence.26 In a program with several stages, this would lead to a proliferation of treatments and either large (and costly) samples or insufficient sample sizes. In practice, such sequential randomization has not been attempted in evaluating job training programs.

26. Alternatively, in a program with three stages, program administrators might randomly assign eligible participants to one of several treatment groups, with the first group receiving only stage 1 services, the second receiving stage 1 and stage 2 services, and the third receiving services from all three stages. However, a problem may arise with this scheme if participants assigned to the second and third stages of the program at some point decline to participate. In that case, the design described in the text would be more effective.

A final problem unique to experimental designs is that even under ideal conditions, they are unable to answer many questions of interest besides the narrow "treatment on the treated" parameter. For example, it is not possible in practice to obtain simple experimental estimates of the duration of post-random assignment employment, due to post-random assignment selection problems (Ham and LaLonde, 1990). An elaborate analysis of self-selection, of the sort that social experiments are designed to avoid, is required. As another example, consider estimating the impact of training on wage rates. The problem that arises in this case is that we observe wages only for those employed following random assignment. If the experimental treatment affects employment, then the sample of employed treatments will have different observed and unobserved characteristics than the employed controls. In general, we would expect that the persons without wages will be less skilled. The experimental impact estimate cannot separate out differences between the distributions of observed wages in the treatment and control groups that result from the effect of the program on wage rates from those that result from the effect of the program on selection into employment. Under these circumstances, only non-experimental methods such as those discussed in Section 7 can provide an answer to the question of interest.
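A stylized simulation of this wage-selection problem may help fix ideas. The construction below is entirely ours, with invented parameters: the program has zero true effect on wage rates but raises employment, yet the treatment-control gap in observed wages is negative because the treatment draws lower-skill persons into employment.

```python
# Stylized simulation: employment selection biases experimental wage comparisons.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
skill = rng.normal(size=n)               # unobserved skill
treated = rng.integers(0, 2, size=n)     # random assignment

# Employment depends on skill; treatment shifts employment up for everyone.
employed = (skill + 0.5 * treated + rng.normal(size=n)) > 0
wage = 10.0 + 2.0 * skill                # true wage effect of treatment is zero

obs_treat = wage[employed & (treated == 1)].mean()
obs_ctrl = wage[employed & (treated == 0)].mean()
print(obs_treat - obs_ctrl)              # negative, despite a zero true effect
```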
5.3.3 Additional Problems Common to All Evaluations

There are a number of other problems that arise in both social experiments and non-experimental evaluations. Solving these problems in an experimental setting requires analysts to make the same types of choices (and assumptions) that are required in a non-experimental analysis. An important point of this subsection is that experimental impact estimates are sensitive to these choices in the same way as non-experimental estimates. A related concern is that experimental evaluations should, but often do not, include sensitivity analyses indicating the effect of the choices made on the impact estimates obtained.

The first common evaluation problem arises from imperfect data. Different survey instruments can yield different measures of the same variable for the same person in a given time period (see Smith, 1997a,b, and the citations therein). For example, self-reported measures of earnings or welfare receipt from surveys typically differ from administrative measures covering the same period (LaLonde and Maynard, 1987; Bloom et al., 1993). As we discuss in Section 8, in the case of earnings, data sources commonly used for evaluation research differ in the types of earnings covered, the presence or absence of top-coding and the extent of missing or incorrect values. The evaluator must trade off these factors when choosing which data source to rely on. Whatever the data source used, the analyst must make decisions about how to handle outliers and missing values. To underscore the point that experimental impacts for the same program can differ due to different choices about data sources and data handling, we compare the impact estimates for the NJS presented in the two official experimental impact reports, Bloom et al. (1993) and Orr et al. (1994).27
As shown in Table 5.2, these two reports give substantially different estimates of the impact of JTPA training for the same demographic groups over the same time period. The differences result from different decisions about whom to include in the evaluation sample, how to combine earnings information from surveys and administrative data, how to treat seemingly anomalous reports of overtime earnings in the survey data, and so on. Several of the point estimates differ substantially, as do the implications about the relative effectiveness of the three treatment streams for adult women. The estimated 18-month impact for adult women in the "other services" stream triples from the 18-month impact report to the 30-month impact report, making it the service with the largest estimated impact despite the low average cost of the services provided to persons in this stream.

27. A complete discussion of the impact estimates from the NJS appears in Section 10.

The second problem common to experimental and non-experimental evaluations is sample attrition. Note that sample attrition is not the same as dropping out of the program: both control and treatment group members can attrit from the sample, and treatment group members who drop out of the program will often remain in the data. In the NSW, attrition from the evaluation sample by the 18-month follow-up interview was 10 percent for adult women, but more than 30 percent for the male participants. In the NJS, sample attrition by the 18-month follow-up was 12 percent for adult women and approximately 20 percent for adult males. Such high rates of attrition are common among the disadvantaged due to relatively frequent changes in residence and other difficulties with making follow-up contacts.

Sample attrition poses a problem for experimental evaluations when it is correlated with individual characteristics or with the impact of treatment conditional on characteristics. In practice, persons with poorer labor market characteristics tend to have higher attrition rates (see, e.g., Brown, 1979). Even if attrition affects the experimental and control groups in the same way, the experiment estimates the mean impact of the program only for those who remain in the sample. Usually, attrition rates are both non-random and larger for controls than for treatments. In this case, the experimental estimate of training is biased because individuals' experimental status, R, is correlated with their likelihood of being in the sample. In this setting, experimental evaluations become non-experimental evaluations, because evaluators must make some assumption to deal with selection bias.

6 Econometric Models of Outcomes and Program Participation

The economic approach to program evaluation is based on estimating behavioral relationships that can be applied to evaluate policies not yet implemented. A focus on invariant behavioral relationships is the cornerstone of the econometric approach. Economic relationships provide frameworks within which empirical knowledge can be accumulated across different studies. They offer guidance on the specification of empirical relationships for any given study and on the type of data required to estimate a behaviorally-motivated evaluation model. Alternative empirical evaluation strategies can be judged, in part, by the economic justification for them. Estimators that make economically implausible or empirically unjustified assumptions about behavior should receive little support.
The approach to evaluation guided by economic models contrasts with the case-by-case approach of statistics, which at best offers intuitive frameworks for motivating estimators. The emphasis in statistics is on particular estimators and not on the models motivating the estimators. The output of such case-by-case studies often does not cumulate. Since no articulated behavioral theory is used in this approach, it is not helpful in organizing evidence across studies or in suggesting explanatory variables or behaviorally motivated empirical relationships for a given study. It produces estimated parameters that are very difficult to use in answering well-posed evaluation questions.

All economic evaluation models have two ingredients: (a) a model of outcomes and (b) a model of program participation. This section presents several prototypical econometric models. The first was developed by Heckman (1978) to rationalize the evidence in Ashenfelter (1978). The second rationalizes the evidence presented in Heckman and Smith (1998b) and Heckman, Ichimura, Smith and Todd (1998).

6.1 Uses of Economic Models

There are several distinct uses of economic models. (1) They suggest lists of explanatory variables that might belong in both outcome and participation equations. (2) They sometimes suggest plausible "exclusion restrictions" - variables that influence participation but do not directly influence outcomes - that can be used to help identify models in the presence of self-selection by participants. (3) They sometimes suggest specific functional forms of estimating equations motivated by a priori theory or by cumulated empirical wisdom.

6.2 Prototypical Models of Earnings and Program Participation

To simplify the discussion, and to start where the published literature currently stops, assume that persons have only one period in their lives, period k, in which they have the chance to take job training. From the beginning of economic life, t = 1, up through t = k, persons have one outcome associated with the no-training state "0":

Y_{0t}, t = 1, ..., k.

After period k, there are two potential outcomes, corresponding to the training state ("1") and the no-training state ("0"):

(Y_{0t}, Y_{1t}), t = k+1, ..., T,

where T is the end of economic life. Persons participate in training only if they apply to a program and are accepted into it. Several decision makers may be involved: individuals, family members and bureaucrats. Let D = 1 if a person participates in a program and D = 0 otherwise. Then the full description of participation and potential outcomes is

(6.1)  (D; Y_{0t}, t = 1, ..., k; (Y_{0t}, Y_{1t}), t = k+1, ..., T).

As before, observed outcomes after period k can be written in the form of a switching regression model:

Y_t = D Y_{1t} + (1 − D) Y_{0t}.

The most familiar model, and the one that is most widely used in the training program evaluation literature, assumes that program participation decisions are based on individual maximization of the expected present value of earnings. It ignores family and bureaucratic influences on participation decisions.

6.3 Expected Present Value of Earnings Maximization

In period k, a prospective trainee seeks to maximize the expected present value of earnings, the outcome of interest. The information available to the agent in period k is I_k. The cost of program participation consists of two components: c (direct costs) and foregone earnings during the period. Training takes one period to complete.
Assume that credit markets are perfect, so that agents can lend and borrow freely at interest rate r. The expected-present-value-maximizing decision rule is to participate in the program (D = 1) if

(6.2)  E[ \sum_{j=1}^{T-k} Y_{1,k+j}/(1+r)^j − c − \sum_{j=0}^{T-k} Y_{0,k+j}/(1+r)^j | I_k ] ≥ 0,

and not to participate in the program (D = 0) if this inequality does not hold. In (6.2), the expectations are computed with respect to the information available to the person in period k, I_k. It is important to notice that the expectations in (6.2) are the private expectations of the decision maker. They may or may not conform to the expectations computed against the true ex ante distribution. Note further that I_k may differ among persons in the same environment or may differ among environments. Many variables external to the model may belong in the information sets of persons. Thus friends, relatives and other channels of information may affect personal expectations.28

28. A sharp contrast between a model of perfect certainty and a model of uncertainty is that the latter introduces the possibility of incorporating many more "explanatory variables" in the model, in addition to the direct objects of the theory.

The following are consequences of this decision rule. (a) Older persons, and persons with higher discount rates, are less likely to take training. (b) Earnings prior to time period k are irrelevant for determining participation in the program except for their value in forecasting future earnings (i.e., except as they enter the person's information set I_k). (c) Only current costs and the discounted gain in future earnings determine participation in the program. Persons with lower foregone earnings and lower direct costs of program participation are more likely to go into the program. (d) Any dependence between the realized (measured) income at date t and D is induced by the decision rule. It is the relationship between the expected outcomes at the time decisions are made and the realized outcomes that generates the structure of the bias of any econometric estimator of the model. This framework underlies much of the empirical work in the literature on evaluating job training programs (see, e.g., Ashenfelter, 1978; Bassi, 1983, 1984; and Ashenfelter and Card, 1985). We now consider various specializations of it.

6.3.1 Common Treatment Effect

As discussed in Section 3, the common treatment effect model is implicitly assumed in much of the literature evaluating job training programs. It assumes that Y_{1t} − Y_{0t} = α_t for t > k, where α_t is a common constant for everyone. Another version writes α_t as a function of X: α_t(X). We take this model as a point of departure for our analysis. It was first presented in Heckman (1978); Ashenfelter and Card (1985) and Heckman and Robb (1985a, 1986a) develop it. In this model, the effect of treatment on the treated and the effect of randomly assigning a person to treatment come to the same thing, i.e., E(Y_{1t} − Y_{0t} | X, D = 1) = E(Y_{1t} − Y_{0t} | X), since the difference between the two income streams is the same for all persons with the same X characteristics. Under this model, decision rule (6.2) specializes to the discrete choice model

(6.3)  D = 1 if E[ \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j − c − Y_{0k} | I_k ] ≥ 0;  D = 0 otherwise.

If the α_{k+j} are constant at α in all periods and T is large (T → ∞), the criterion simplifies to

(6.4)  D = 1 if E[ α/r − c − Y_{0k} | I_k ] ≥ 0;  D = 0 otherwise.

Even though agents are assumed to be farsighted and to possess the ability to make accurate forecasts, the decision rule is simple.
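A minimal numerical sketch of this decision rule, under invented parameter values (none of the numbers below come from the chapter), is the following; it also checks the infinite-horizon simplification (6.4).

```python
# Sketch of decision rule (6.2) under perfect foresight and assumed parameters.
def participate(gain_path, c, y0k, r):
    """Apply (6.2) with a known per-period gain path alpha_{k+j}."""
    pv_gain = sum(a / (1 + r) ** j for j, a in enumerate(gain_path, start=1))
    return pv_gain - c - y0k >= 0

alpha = 500.0          # constant per-period earnings gain
r = 0.05               # interest rate
horizon = 40           # T - k remaining periods of economic life
print(participate([alpha] * horizon, c=1000.0, y0k=6000.0, r=r))  # True
# As T grows large the criterion approaches (6.4): alpha/r - c - y0k >= 0.
print(alpha / r - 1000.0 - 6000.0 >= 0)                           # True
```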
Persons compare current costs (both direct costs c and foregone earnings Y_{0k}) with expected future rewards E[ \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j | I_k ]. Future rewards are the same for everyone of the same age and with the same discount rate. Future values of Y_{0t} do not directly determine participation given Y_{0k}. The link between D and Y_{0t}, t > k, comes through the dependence with Y_{0k} and any dependence on cost c. If one knew, or could proxy, Y_{0k} and c, one could condition on these variables and eliminate selective differences between participants and nonparticipants. Since returns are identical across persons, only variation across persons in the direct cost and foregone earnings components determines the variation in the probability of program participation across persons. Assuming that c and Y_{0k} are unobserved by the econometrician, but known to the agent making the decision to go into training,

Pr(D = 1) = Pr( \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j > c + Y_{0k} ).

In the case of an infinite-horizon, temporally-constant treatment effect α, the expression simplifies to

Pr(D = 1) = Pr( α/r ≥ c + Y_{0k} ).

This simple model is rich enough to be consistent with Ashenfelter's dip. As discussed in Section 4, the "dip" refers to the pattern that the earnings of program participants decline just prior to their participation in the program. If earnings are temporarily low in enrollment period k, and c does not offset Y_{0k}, persons with low earnings in the enrollment period enter the program. Since the return is the same for everyone, it is low opportunity costs or low tuition that drive program participation in this model. If α, c or Y_{0k} depend on observed characteristics, one can condition on those characteristics in constructing the probability of program participation.

This model is an instance of a more general approach to modelling behavior that is used in the economic evaluation literature. Write the net utility of program participation of the decision maker as IN. An individual participates in the program (D = 1) if and only if IN > 0. Adopting a separable specification, we may write

IN = H(X) − V.

In terms of the previous example, H(X) = \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j is a constant, and V = c + Y_{0k}. The probability that D = 1 given X is

(6.5)  Pr(D = 1 | X) = Pr(V < H(X) | X).

If V is stochastically independent of X, we obtain the important special case

Pr(D = 1 | X) = Pr(V < H(X)),

which is widely assumed in econometric studies of discrete choice.29 If V is normal with mean μ_1 and variance σ²_V, then

(6.6)  Pr(D = 1 | X) = Pr(V < H(X)) = Φ((H(X) − μ_1)/σ_V),

where Φ is the cumulative distribution function of a standard normal random variable. If V is a standardized logit,

Pr(D = 1 | X) = exp(H(X))/(1 + exp(H(X))).

Although these functional forms are traditional, they are restrictive and are not required by the econometric approach. Conditions for nonparametric identifiability of Pr(D = 1 | X), given different assumptions about the dependence between X and V, are presented in Cosslett (1983) and Matzkin (1992). Cosslett (1983), Matzkin (1993) and Ichimura (1993) consider nonparametric estimation of H and of the distribution of V. Lewbel (1998) demonstrates how discrete choice models can be identified under much weaker assumptions than independence between X and V.

29. Conditions for the existence of a discrete choice random utility representation of a choice process are given in McLennan (1990).
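The two parametric special cases in (6.6) are easy to sketch directly; the index value passed in below is arbitrary.

```python
# Probit and logit participation probabilities, as in (6.6) and the
# standardized logit alternative; mu_1 and sigma_V are assumed values.
from math import erf, sqrt, exp

def std_normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_participation_probit(h_of_x, mu_1=0.0, sigma_v=1.0):
    """Equation (6.6): Pr(D = 1 | X) = Phi((H(X) - mu_1)/sigma_V)."""
    return std_normal_cdf((h_of_x - mu_1) / sigma_v)

def prob_participation_logit(h_of_x):
    """Standardized logit: exp(H(X))/(1 + exp(H(X)))."""
    return exp(h_of_x) / (1.0 + exp(h_of_x))

print(prob_participation_probit(0.3))   # ~0.618
print(prob_participation_logit(0.3))    # ~0.574
```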
Under certain conditions, information about agents' decisions to participate in a training program can be informative about their preferences and about the outcomes of the program. Heckman and Smith (1998a) demonstrate conditions under which knowledge of the self-selection decisions of agents embodied in Pr(D = 1 | X) is informative about the value of Y_1 relative to Y_0. In the Roy model (see, e.g., Heckman and Honoré, 1990),

IN = Y_1 − Y_0 = (μ_1(X) − μ_0(X)) + (U_1 − U_0).

Assuming X is independent of U_1 − U_0, it is possible to estimate μ_1(X) − μ_0(X) up to scale from the self-selection decisions of persons into the program, where the scale is [Var(U_1 − U_0)]^{1/2}. This is a standard result in discrete choice theory. Thus in the Roy model it is possible to recover E(Y_1 − Y_0 | X) up to scale just from knowledge of the choice probability. Under additional assumptions on the support of X, Heckman and Smith (1998a) demonstrate that it is possible to recover the full joint distribution F(y_0, y_1 | X) and to answer all of the evaluation questions about means and distributions posed in Section 3. Under more general self-selection rules, it is still possible to infer the personal valuations of a program from observing selection into the program and attrition from it. The Roy model is the one case where personal evaluations of a program, as revealed by the choice behavior of the agents studied, coincide with the "objective" evaluations based on Y_1 − Y_0.
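A stylized simulation of this identification result (entirely our construction, with assumed parameters): inverting the cell frequencies of D at different values of X recovers the index (μ_1(X) − μ_0(X))/[Var(U_1 − U_0)]^{1/2}.

```python
# Roy-model selection: D = 1 iff Y1 > Y0. The choice probability identifies
# mu_1(X) - mu_0(X) up to the scale sd(U1 - U0), here set to 2 by assumption.
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.normal(size=n)
mu_diff = 1.0 + 0.5 * x                   # mu_1(X) - mu_0(X), assumed linear
u_diff = rng.normal(scale=2.0, size=n)    # U1 - U0, sd = 2
d = mu_diff + u_diff > 0                  # self-selection into the program

# The model implies Pr(D=1|X=x) = Phi((1 + 0.5 x)/2); inverting empirical
# cell frequencies recovers the scaled index (1 + 0.5 x)/2.
for x0 in (-1.0, 0.0, 1.0):
    cell = np.abs(x - x0) < 0.05
    z_hat = NormalDist().inv_cdf(d[cell].mean())
    print(x0, round(z_hat, 3), (1 + 0.5 * x0) / 2.0)
```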
Within the context of a choice-theoretic model, it is of interest to consider the assumptions that justify the three intuitive evaluation estimators introduced in Section 4, starting with the cross-section estimator (3.3), which is valid if assumption (4.A.3) is correct. Given decision rule (6.3), under what conditions is it plausible to assume that

(4.A.3)  E(Y_{0t} | D = 1) = E(Y_{0t} | D = 0),  t > k,

so that cross-section comparisons identify the true program effect? (Recall that in a model with homogeneous treatment impacts, the various mean treatment effects all come to the same thing.) We assume that evaluators observe neither costs nor Y_{0k} for trainees. Assumption (4.A.3) would be satisfied in period t if

E( Y_{0t} | \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j − c − Y_{0k} ≥ 0 ) = E( Y_{0t} | \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j − c − Y_{0k} < 0 ),  t > k.

One way this condition can be satisfied is if earnings are distributed independently over time (Y_{0k} independent of Y_{0t}, t > k) and direct costs c are independent of Y_{0t}, t > k. More generally, only mean independence with respect to c + Y_{0k} is required.30 If the dependence in earnings vanishes for earnings measured more than ℓ periods apart (e.g., if earnings are a moving average of order ℓ), then assumption (4.A.3) would be satisfied in periods t > k + ℓ. Considerable evidence indicates that earnings have an autoregressive component (see, e.g., Ashenfelter, 1978; Ashenfelter and Card, 1985; MaCurdy, 1982; Farber and Gibbons, 1994). Thus (4.A.3) seems implausible except in special cases.31 Moreover, if stipends (a component of c) are determined in part by current and past income because they are targeted toward low-income workers, then (4.A.3) is unlikely to be satisfied.

30. Formally, it is required that E(Y_{0t} | c + Y_{0k}) not depend on c and Y_{0k} for all t > k.

31. Note, however, that much of this evidence is for log earnings and not earnings levels.

Access to better information sometimes makes it more likely that a version of assumption (4.A.3) will be satisfied if it is revised to condition on observables X:

(4.A.3′)  E(Y_{0t} | D = 1, X) = E(Y_{0t} | D = 0, X).

In this example, the candidate conditioning variables are X = (c, Y_{0k}). If we observe Y_{0k} for everyone, can condition on it, and c is independent of Y_{0t} given Y_{0k}, then

E(Y_{0t} | D = 1, Y_{0k}) = E( Y_{0t} | \sum_{j=1}^{T-k} α_{k+j}/(1+r)^j − Y_{0k} ≥ c, Y_{0k} ) = E(Y_{0t} | Y_{0k}) = E(Y_{0t} | D = 0, Y_{0k}).

Thus for common values of Y_{0k}, assumption (4.A.3′) is satisfied for X = Y_{0k}.

Ironically, using too much information may make it difficult to satisfy (4.A.3′). To see this, suppose that we observe both c and Y_{0k}, and set X = (c, Y_{0k}). Now E(Y_{0t} | D = 1, (c, Y_{0k})) = E(Y_{0t} | c, Y_{0k}) and E(Y_{0t} | D = 0, (c, Y_{0k})) = E(Y_{0t} | c, Y_{0k}), because c and Y_{0k} perfectly predict D. But (4.A.3′) is not satisfied, because decision rule (6.3) perfectly partitions the (c, Y_{0k}) space into disjoint participant and nonparticipant sets: there are no common values of X = (c, Y_{0k}) at which (4.A.3′) can be satisfied. In this case, the "regression discontinuity design" estimator of Campbell and Stanley (1966) is appropriate. We discuss this estimator in Section 7.4.6 below. If we assume that 0 < Pr(D = 1 | X) < 1, we rule out perfect predictability of D given X. This condition guarantees that persons with the same X values have a positive probability of being both participants and nonparticipants.32 Ironically, having too much information may be a bad thing: we need some "random" variation that places observationally equivalent people in both states. The existence of this fortuitous randomization lies at the heart of the method of matching.

32. This is one of two conditions that Rosenbaum and Rubin (1983) call "strong ignorability"; they are central to the validity of matching. We discuss these conditions further in Section 7.3.

Next consider assumption (4.A.1). It is satisfied in this example if, in a time-homogeneous environment, a "fixed effect" or "components of variance" structure characterizes Y_{0t}, so that there is a time-invariant random variable φ such that Y_{0t} can be written as

(6.7)  Y_{0t} = β_t + φ + U_{0t} for all t, with E(U_{0t} | φ) = 0 for all t,

where the U_{0t} are mutually independent and c is independent of U_{0t}. If Y_{0t} is earnings, then φ is "permanent income" and the U_{0t} are "transitory deviations" around it. Then using (6.3), for t > k > t′ we have

E(Y_{1t} − Y_{0t′} | D = 1) = α_t + β_t − β_{t′},

since E(U_{0t} | D = 1) − E(U_{0t′} | D = 1) = 0. From the assumption of time homogeneity, β_t = β_{t′}. Thus assumption (4.A.1) is satisfied and the before-after estimator identifies α_t. It is clearly not necessary to assume that the U_{0t} are mutually independent, just that

(6.8)  E(U_{0t} − U_{0t′} | D = 1) = 0,

i.e., that the innovation U_{0t} − U_{0t′} is mean independent of U_{0k} + c. In terms of the economics of the model, it is required that participation not depend on transitory innovations in earnings in periods t and t′. For decision model (6.3), this condition is satisfied as long as U_{0k} is independent of U_{0t} and U_{0t′}, or as long as U_{0k} + c is mean independent of both terms. If, however, the U_{0t} are serially correlated, then (4.A.1) will generally not be satisfied. Thus if a transitory decline in earnings persists over several time periods (as seems to be true as a consequence of Ashenfelter's dip), so that there is stochastic dependence of (U_{0t}, U_{0t′}) with U_{0k}, then it is unlikely that the key identifying assumption is satisfied.
One special case where it is satisfied, developed by Heckman (1978) and Heckman and Robb (1985a) and applied by Ashenfelter and Card (1985) and Finifter (1987), among others, is a "symmetric differences" assumption. If t and t′ are symmetrically aligned (so that t = k + ℓ and t′ = k − ℓ) and conditional expectations forward and backward are symmetric, so that

(6.9)  E(U_{0t} | c + β_k + U_{0k}) = E(U_{0t′} | c + β_k + U_{0k}),

then assumption (4.A.1) is satisfied. This identifying condition motivates the symmetric differences estimator discussed in Section 7.6.

Some evidence of non-stationary wage growth presented by Farber and Gibbons (1994), MaCurdy (1982), Topel and Ward (1992) and others suggests that earnings can be approximated by a "random walk" specification. If

(6.10)  Y_{0t} = β_t + η + \sum_{j=0}^{t} ν_j,

where the ν_j are mean zero, mutually independent and identically distributed random variables independent of η, then (6.8) and (6.9) will not generally be satisfied. Thus even if conditional expectations are linear, both forward and backward, it does not follow that (4.A.1) will hold. Let the variances of η and of the ν_j be finite, assume that E(η) = 0, and suppose c is independent of all the ν_j and of η. Then

E(U_{0t} | c + β_k + U_{0k}) = [ (σ²_η + kσ²_ν)/(σ²_c + σ²_η + kσ²_ν) ] (c + U_{0k} − E(c))

and

E(U_{0t′} | c + β_k + U_{0k}) = [ (σ²_η + t′σ²_ν)/(σ²_c + σ²_η + kσ²_ν) ] (c + U_{0k} − E(c)).

These two expressions are not equal unless σ²_ν = 0.

A more general model that is consistent with the evidence reported in the literature writes

Y_{0t} = μ_{0t}(X) + η + U_{0t},  where  U_{0t} = \sum_{j=1}^{k} ρ_{0j} U_{0,t−j} + \sum_{j=1}^{m} m_{0j} ν_{t−j},

where the ν_{t−j} satisfy E(ν_{t−j}) = 0 at all leads and lags and are uncorrelated with η, so that U_{0t} is an autoregression of order k with a moving average component of length m. Some authors, like MaCurdy (1982) or Gibbons and Farber, allow the coefficients (ρ_{0j}, m_{0j}) to depend on t and do not require that the innovations be identically distributed over time. For the logarithm of white male earnings in the United States, MaCurdy (1982) finds that a model with a permanent component (η), plus one autoregressive coefficient (k = 1) and two moving average terms (m = 2), describes his data.33 Gibbons and Farber report similar evidence.

33. The estimated value of ρ_{01} is close to 1, so that the model is close to a random walk in levels of log earnings.

These time series models suggest generalizations of the before-after estimator that exploit the longitudinal structure of earnings processes but work with more general types of differences that align future and past earnings. These are developed at length in Heckman and Robb (1985, 1986), Heckman (1998a) and in Section 7.6.

If there are "time effects," so that β_t ≠ β_{t′}, (4.A.1) will not be satisfied. Before-after estimators will confound time effects with program gains. The "difference-in-differences" estimator circumvents this problem for models in which (4.A.1) is satisfied for the unobservables of the model but β_t ≠ β_{t′}. Note, however, that in order to apply this assumption it is necessary that time effects be additive in some transformation of the dependent variable and identical across participants and nonparticipants. If they are not, then (4.A.2) will not be satisfied. For example, if the decision rule for program participation is such that persons with lower life cycle wage growth paths are admitted into the program, or persons who are more vulnerable to the national economy are trained, then the assumption of common time (or age) effects across participants and nonparticipants will be inappropriate, and the difference-in-differences estimator will not identify true program impacts.
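The contrast between the fixed-effect model (6.7) and the random walk (6.10) can be made concrete with a small simulation, entirely our construction with invented parameters: the before-after estimator recovers the true impact under (6.7), but is biased under (6.10) because the period-k shock that drives selection is built into the post-pre earnings difference.

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha, r, base = 400_000, 1000.0, 0.10, 8000.0

def before_after(u_pre, u_k, u_post):
    """Mean post-pre earnings change among participants, who self-select
    via rule (6.4) on period-k earnings and direct costs."""
    c = rng.normal(1000.0, 300.0, size=n)
    d = alpha / r - c - (base + u_k) >= 0
    return ((base + u_post + alpha) - (base + u_pre))[d].mean()

# Fixed-effect world (6.7): iid transitory shocks around a person effect phi;
# selection operates only through phi and the period-k shock, so the
# before-after estimate is close to alpha.
phi = rng.normal(0.0, 2000.0, size=n)
eps = lambda: rng.normal(0.0, 1000.0, size=n)
print(before_after(phi + eps(), phi + eps(), phi + eps()))   # ~1000

# Random-walk world (6.10): shocks accumulate, so selecting on low period-k
# earnings also selects on shocks that appear in the post-pre difference.
v1, v2, v3 = eps(), eps(), eps()
print(before_after(v1, v1 + v2, v1 + v2 + v3))               # well below 1000
```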
6.3.2 A Separable Representation

In implementing econometric evaluation strategies, it is common to control for observed characteristics X. Invoking the separability assumption, we write the outcome equation for Y_{0t} as

Y_{0t} = g_{0t}(X) + U_{0t},

where g_{0t} is a behavioral relationship and U_{0t} has a finite mean conditional on X. A parallel expression can be written for Y_{1t}:

Y_{1t} = g_{1t}(X) + U_{1t}.

The expression g_{0t}(X) is a structural relationship that may or may not be different from μ_{0t}(X), the conditional mean. It is a ceteris paribus relationship that informs us of the effect of changes of X on Y_{0t} holding U_{0t} constant. Throughout this chapter we distinguish μ_{1t} from g_{1t} and μ_{0t} from g_{0t}. For the latter, we allow for the possibility that E(U_{1t} | X) ≠ 0 and E(U_{0t} | X) ≠ 0. Separability enables us to isolate the effect of self-selection, as it operates through the "error term," from the structural outcome equation:

(6.11a)  E(Y_{0t} | D = 0, X) = g_{0t}(X) + E(U_{0t} | D = 0, X),
(6.11b)  E(Y_{1t} | D = 1, X) = g_{1t}(X) + E(U_{1t} | D = 1, X).

The g_{0t}(X) and g_{1t}(X) functions are invariant across different conditioning schemes and decision rules, provided that X is available to the analyst. One can borrow knowledge of these functions from other studies collected under different conditioning rules, including the conditioning rules that define the samples used in social experiments. Although the conditional means of the errors differ across studies, the g_{0t}(X) and g_{1t}(X) functions are invariant across studies. If they can be identified, they can be meaningfully compared across studies, unlike the parameter treatment on the treated which, in the case of heterogeneous response to treatment that is acted on by agents, differs across programs with different decision rules and different participant compositions.

A special case of this representation is the basis for an entire literature. Suppose that

(P.1) the random utility representation (6.5) is valid;

(P.2) (U_{0t}, U_{1t}, V) ⊥⊥ X, where "⊥⊥" denotes stochastic independence; and

(P.3) the distribution function of V, F(V), is strictly increasing in V.

Then

(6.12a)  E(U_{0t} | D = 1, X) = K_{0t}(Pr(D = 1 | X))

and

(6.12b)  E(U_{1t} | D = 1, X) = K_{1t}(Pr(D = 1 | X)).34

The mean error term is a function of P, the probability of participation in the program. This special case receives empirical support in Heckman, Ichimura, Smith and Todd (1998) and Heckman, Ichimura and Todd (1997). It enables analysts to characterize the dependence between U_{0t} and X by the dependence of U_{0t} on Pr(D = 1 | X), which is a scalar function of X. As a practical matter, this greatly reduces the empirical task of estimating selection models. Instead of having to explore all possible dependence relationships between U and X, the analyst can confine attention to the more manageable task of exploring the dependence between U and Pr(D = 1 | X). An investigation of the effect of conditioning on program eligibility rules or self-selection on Y_{0t} comes down to an investigation of the effect of the conditioning on Y_{0t} as it operates through the probability P. This motivates a focus on the determinants of participation in the program in order to understand selection bias, and it is the basis for the "control function" estimators developed in Section 7.

34. The proof is immediate; the proof of (6.12b) follows by similar reasoning. We follow Heckman (1980) and Heckman and Robb (1985a, 1986b). Assume that (U_{0t}, V) are jointly continuous random variables, with density f(U_{0t}, V | X). From (P.2), f(U_{0t}, V | X) = f(U_{0t}, V). Thus

E(U_{0t} | X, D = 1) = [ \int_{-∞}^{∞} U_{0t} \int_{-∞}^{H(X)} f(U_{0t}, V) dV dU_{0t} ] / [ \int_{-∞}^{H(X)} f(V) dV ].

Now Pr(D = 1 | X) = \int_{-∞}^{H(X)} f(V) dV. Inverting, we obtain H(X) = F_V^{-1}(Pr(D = 1 | X)). Thus

E(U_{0t} | X, D = 1) = [ \int_{-∞}^{∞} U_{0t} \int_{-∞}^{F_V^{-1}(Pr(D=1|X))} f(U_{0t}, V) dV dU_{0t} ] / Pr(D = 1 | X) ≡ K_{0t}(Pr(D = 1 | X)).
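Under the additional assumption that (U_{0t}, V) are jointly normal - which is stronger than (P.1)-(P.3) - the control function K_{0t} takes the familiar inverse-Mills-ratio form. The sketch below, with invented parameters, checks this numerically.

```python
# Control function under joint normality: with D = 1 iff V < H(X) and
# corr(U, V) = rho, E(U | D=1, X) = -rho * phi(H(X)) / Phi(H(X)).
import numpy as np
from statistics import NormalDist
from math import exp, sqrt, pi

nd = NormalDist()
rng = np.random.default_rng(3)
n = 400_000
x = rng.normal(size=n)
h = 0.5 + 1.0 * x                              # index H(X), assumed linear
rho = 0.6                                      # assumed corr(U_0t, V)
v = rng.normal(size=n)
u = rho * v + sqrt(1 - rho**2) * rng.normal(size=n)
d = v < h                                      # participation

for x0 in (-1.0, 0.0, 1.0):
    cell = d & (np.abs(x - x0) < 0.05)
    h0 = 0.5 + x0
    k = -rho * exp(-h0**2 / 2) / sqrt(2 * pi) / nd.cdf(h0)
    # empirical mean error among participants vs. the control function value
    print(x0, round(u[cell].mean(), 3), round(k, 3))
```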
If, however, (P.2) is not satisfied, then the separable representation is not valid, and it is necessary to know more than the probability of participation to characterize E(U_{0t} | X, D = 1). In that case one must characterize both the dependence between U_{0t} and X given D = 1 and the probability of participation.

6.3.3 Variable Treatment Effect

A more general version of the decision rule, given by (6.2), allows (Y_{0t}, Y_{1t}) to be a pair of random variables with no necessary restriction connecting them. In this more general case,

α_t = Y_{1t} − Y_{0t},  t > k,

is now a random variable, and, as previously discussed in Section 3, there is a distinction between the parameter "the mean effect of treatment on the treated" and "the mean effect of randomly assigning a person with characteristics X into the program."

In one important case discussed in Heckman and Robb (1985a), the two parameters have the same ex post mean value even if the treatment effect α_t is heterogeneous after conditioning on X. Suppose that α_t is unknown to the agent at the time enrollment decisions are made. The agent forecasts α_t using the information available in his or her information set I_k; E(α_t | I_k) is the private expectation of gain by the agent. If the ex post gains of participants with characteristics X are the same as what the ex post gains of nonparticipants would have been had they participated, then the two parameters are the same. This would arise if both participants and nonparticipants have the same ex ante expected gains,

E(α_t | D = 1, I_k) = E(α_t | D = 0, I_k) = E(α_t | I_k),

and if

E[E(α_t | I_k) | X, D = 1] = E[E(α_t | I_k) | X, D = 0],

where the outer expectations are computed with respect to the observed ex post distribution of the X. This condition requires that the information in the participants' information sets bear the same relationship to X as it does for nonparticipants. The interior expectations in the preceding expression are subjective; the exterior expectations are computed with respect to distributions of objectively-observed characteristics. The condition for the two parameters to be the same is

E[E(α_t | I_k, D = 1) | X, D = 1] = E[E(α_t | I_k, D = 0) | X, D = 0].

As long as the ex post objective expectation of the subjective expectations is the same, the two parameters, E(α_t | X, D = 1) and E(α_t(X)), are the same. This condition would be satisfied if, for example, all agents, irrespective of their X values, place themselves at the mean of the objective distribution, i.e.,

E(α_t | I_k, D = 1) = E(α_t | I_k, D = 0) = ᾱ_t

(see, e.g., Heckman and Robb, 1985a). Differences across persons in program participation are then generated by factors other than potential outcomes. In this case, the ex post surprise, α_t − ᾱ_t, does not depend on X or D, in the sense that

E(α_t − ᾱ_t | X, D = 1) = 0,  so  E(Y_{1t} − Y_{0t} | X, D = 1) = ᾱ_t.

This discussion demonstrates the importance of understanding the decision rule and its relationship to measured outcomes in formulating an evaluation model.
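A tiny simulation of this case (ours, with invented numbers): when the idiosyncratic component of α_t lies outside I_k, participation is independent of the realized gain, and the mean gain among participants equals the population mean gain.

```python
# Agents know only abar = E(alpha | I_k); the surprise is unforecastable,
# so selection via rule (6.4) operates only through costs and Y_0k.
import numpy as np

rng = np.random.default_rng(4)
n, abar, r = 1_000_000, 500.0, 0.05
surprise = rng.normal(0.0, 300.0, size=n)   # unknown to the agent at k
alpha = abar + surprise                     # realized ex post gain
cost = rng.normal(400.0, 200.0, size=n)
y0k = rng.normal(8000.0, 1500.0, size=n)
d = abar / r > cost + y0k                   # decision rule (6.4)

print(alpha[d].mean(), alpha.mean())        # both ~ abar
```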
If agents do not make their decisions based on the unobserved components of gains from the program, or on variables statistically related to those components, the analysis for the common coefficient model presented in Section 6.3.1 remains valid even if there is variability in U_{1t} − U_{0t}. If agents anticipate the gains and base decisions on them, at least in part, then a different analysis is required. The conditions for the absence of bias for one parameter differ from the conditions for the absence of bias for another parameter. The difference between the "random assignment" parameter E(Y_{1t} − Y_{0t} | X) and the "treatment on the treated" parameter is the gain in the unobservables in going from one state to the other: letting Δ_t = Y_{1t} − Y_{0t}, and normalizing E(U_{1t} − U_{0t} | X) = 0,

E(Δ_t | X, D = 1) − E(Δ_t | X) = E(U_{1t} − U_{0t} | X, D = 1).

The only way to avoid bias for both mean parameters is if E(U_{1t} − U_{0t} | X, D = 1) = 0.

Unlike the other estimators, the before-after estimators are not robust to time effects that are common across participants and nonparticipants. The difference-in-differences and cross-section estimators are unbiased under different conditions. The cross-section estimator, for both the period-t common-effect model and the "treatment on the treated" parameter in the variable-effect version of the model, requires that mean unobservables in the no-program state be the same for participants and nonparticipants. The difference-in-differences estimator requires a balance of the bias in the change in the unobservables from period t′ to period t. If the cross-section conditions for the absence of bias are satisfied for all t, then the assumption justifying the difference-in-differences estimator is satisfied. However, the converse is not true: even if the conditions for the absence of bias in the difference-in-differences estimator are satisfied, the conditions for the absence of bias in the cross-section estimator are not necessarily satisfied. Moreover, failure of the difference-in-differences condition for the absence of bias does not imply failure of the condition for the absence of bias in the cross-section estimator. Ashenfelter's dip provides an empirically relevant example of this point: if t′ is measured during the period of the dip, but the dip is mean-reverting in post-program periods, then the condition for the absence of cross-section bias could still be satisfied, because post-program there could be no selective differences between participants and nonparticipants.

6.3.4 Imperfect Credit Markets

How robust is the analysis of Sections 6.2 and 6.3, and in particular the conditions for bias, to alternative specifications of decision rules and of the economic environments in which individuals operate? To answer this question, we first reexamine the decision rule after dropping our assumption of perfect credit markets. There are many ways to model imperfect credit markets. The most extreme approach assumes that persons consume their earnings each period. This changes the decision rule (6.2) and produces a new interpretation for the conditions for the absence of bias. Let G denote a time-separable strictly concave utility function and let β be a subjective discount factor. Suppose that persons have an exogenous income flow η_t per period. Expected utility maximization given information I_k produces the following program participation rule:

(6.13)  D = 1 if E[ \sum_{j=1}^{T-k} β^j {G(Y_{1,k+j} + η_{k+j}) − G(Y_{0,k+j} + η_{k+j})} + G(η_k − c) − G(Y_{0k} + η_k) | I_k ] ≥ 0;  D = 0 otherwise.

As in the previous cases, earnings prior to time period k are relevant only for forecasting future earnings (i.e., as elements of I_k).
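A numerical sketch of rule (6.13), with G = log and invented income paths, illustrates the wealth effects that arise once G is nonlinear: two agents facing the same earnings gain can decide differently, because the utility cost of financing training out of current income differs with the exogenous income flow.

```python
from math import log

def participate_6_13(y1_post, y0_post, eta_post, y0k, eta_k, c, beta):
    """Decision rule (6.13) with G = log: discounted utility gain from
    training, minus the period-k utility cost of foregone earnings and
    fees, for an agent who must consume current income each period."""
    gain = sum(beta ** j * (log(y1 + e) - log(y0 + e))
               for j, (y1, y0, e) in enumerate(zip(y1_post, y0_post, eta_post), 1))
    cost_k = log(y0k + eta_k) - log(eta_k - c)
    return gain >= cost_k

periods, beta, y0, c = 20, 0.95, 8000.0, 400.0
y1 = [y0 + 1200.0] * periods        # the same earnings gain for both agents
y0_path = [y0] * periods
for eta in (500.0, 9000.0):         # poor vs. rich exogenous income flow
    print(eta, participate_6_13(y1, y0_path, [eta] * periods, y0, eta, c, beta))
# With linear G both agents would decide identically; with log utility the
# poorer agent declines because consuming eta - c during training is too costly.
```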
However, the decision rule (6.2) is fundamentally altered in this case. Future earnings in both states determine participation in a different way, and common components of earnings in the two states do not difference out unless G is a linear function.35

35. Due to the nonlinearity of G, there are wealth effects in the decision to take training.

Consider the permanent-transitory model of equation (6.7). That model is favorable to the application of longitudinal before-after estimators. Suppose that the U_{0t} are independent and identically distributed, and that there is a common-effect model. Condition (6.8) is not satisfied in a perfect foresight environment when there are credit constraints, or in an environment in which the U_{0t} can be partially forecast,36 because for t > k > t′,

E(U_{0t} | X, D = 1) ≠ 0 even though E(U_{0t′} | X, D = 1) = 0,

so

E(U_{0t} − U_{0t′} | X, D = 1) ≠ 0.

The before-after estimator is now biased, and so is the difference-in-differences estimator. If, however, the U_{0t} are not known and cannot be partially forecast, then condition (6.8) is valid, so both the before-after and difference-in-differences estimators are unbiased. Even in a common-effect model, with Y_{0t} (or U_{0t}) independently and identically distributed, the cross-section estimator is biased for periods t > k in an environment of perfect certainty with credit constraints, because D depends on Y_{0t} through decision rule (6.13). On the other hand, if Y_{0t} is not forecastable with respect to the information in I_k, the cross-section estimator is unbiased.

36. "Partially forecastable" means that some component of U_{0t} resides in the information set I_k. That is, letting f(y | x) denote the density of y given x, f(U_{0t} | I_k) ≠ f(U_{0t}), so that I_k predicts U_{0t} in this sense. One could define "moment forecastability" using conditional expectations of certain moments of a function φ: if E(φ(U_{0t}) | I_k) ≠ E(φ(U_{0t})), then φ(U_{0t}) is partially moment forecastable using the information in I_k. More formally, a random variable is fully forecastable if the σ-algebra generating U_{0t} is contained in the σ-algebra of I_k. It is partially forecastable if the complement of the projection of the σ-algebra of U_{0t} onto the σ-algebra of I_k is not the empty set. It is fully unforecastable if the projection of the σ-algebra of U_{0t} onto the σ-algebra of I_k is the empty set.

The analysis in this and the previous subsections has major implications for a certain style of evaluation research. Understanding the stochastic model of the outcome process is not enough. It is also necessary to know how decision makers process information and make decisions about program participation.

6.3.5 Training as a Form of Job Search

Heckman and Smith (1998b) find that among persons eligible for the JTPA program, the unemployed are much more likely to enter the program than are other eligible persons. Persons are defined to be unemployed if they are not working but report themselves as actively seeking work. The relationship uncovered by Heckman and Smith is not due to eligibility requirements: in the United States, unemployment is not a precondition for participation in the program.

Several previous studies suggest that Ashenfelter's dip results from changes in labor force status, instead of from declines in wages or hours among those who work. Using even a crude measure of employment rates, namely whether a person was employed at all during a calendar year, Card and Sullivan (1988) observed that U.S.
CETA training participants' employment rates declined prior to entering training.37 Their evidence suggests that changes in labor force dynamics, rather than changes in earnings, may be a more precise way to characterize participation in training. Heckman and Smith (1998b) show that whether a person is employed, unemployed (not employed and looking for work), or out of the labor force is a powerful predictor of participation in training programs. Moreover, they find that recent changes in labor force status are important determinants of participation for all demographic groups. In particular, eligible persons who have just become unemployed, either through job loss or through re-entry into the labor force, have the highest probabilities of participation. For women, divorce, another form of job termination, is a predictor of who goes into training. Among those who are either employed or out of the labor force, persons who have recently entered these states have much higher program participation probabilities than persons who have been in those states for some time.

37. Ham and LaLonde (1990) report the same result using semi-monthly employment rates for adult women participating in the NSW.

This evidence is formalized by the model presented in this section. The previous models that we have considered are formulated in terms of levels of costs and earnings: when opportunity costs or tuition costs are low, persons are more likely to enter training. The model presented here recognizes that changes in labor force states account for participation in training. Low earnings levels are a subsidiary predictor of program participation, overshadowed in empirical importance by unemployment dynamics in the analyses of Heckman and Smith (1998b). Persons with zero earnings differ substantially in their participation probabilities depending on their recent labor force status histories. Yet in models based on pre-training earnings dynamics, such as the one presented in Section 6.3, such persons are assumed to behave identically irrespective of their labor market histories. The importance of labor force status histories is also not surprising, given that many employment and training services, such as job search assistance, on-the-job training at private firms, and direct placement, are designed to lead to immediate employment. By providing these services, these programs function as a form of job search for many participants. Recognizing this role of active labor market policies is an important development in recent research. It indicates that in many cases participation in active labor market programs should not be modeled as if it were a schooling decision, as we have modeled it in the preceding sections.

In this section, we summarize the evidence on the determinants of participation in the program and construct a simple economic model in which job search makes two contributions to labor market prospects: (a) it raises the rate of arrival of job offers, and (b) it improves the distribution of wages, in the sense of giving agents a stochastically dominant wage distribution compared to the one they face without search. Training is one form of unemployment that facilitates job search. Different training options produce different job prospects, characterized by different wage and layoff distributions.
Searchers might participate in programs that subsidize the rate of arrival of job offers (JSA, as described in Section 2), or that improve the distribution from which wage offers are drawn (i.e., basic educational and training investments). Instead of motivating participation in training with a standard human capital model, we motivate participation as a form of search among options. Because JSA constitutes a large component of active labor market policy, it is of interest to see how the decision rule is altered if enhanced job search, rather than human capital accumulation, is the main factor motivating individuals' participation in these programs.

Our model is based on the idea that in program j, wage offers arrive from a distribution F_j at rate λ_j. Persons pay c_j to sample from F_j (the costs can be negative). Assume that the arrival times are statistically independent of the wage offers, and that the arrival times and wage offers from one search option are independent of the wages and arrival times of the other search options. At any point in time, persons pick the search option with the highest expected return. To simplify the analysis, suppose that all distributions are time invariant, and denote by N the value of nonmarket time. Persons can select among any of J options, denoted by j. Associated with each option is a rate at which jobs appear, λ_j. Let the discount rate be r. These parameters may vary among persons, but for simplicity we assume that they are constant for the same person over time. This heterogeneity among persons produces differences in the choice of training options and differences in the decision to undertake training.

In the unemployed state, a person receives a nonmarket benefit N. The choice among the training and job search options can be written in "Gittins index" form (see, e.g., Berry and Fristedt, 1986). Under our assumptions, being in the nonmarket state has constant per-period value N irrespective of the search option selected. Letting V_j^e be the value of employment arising from search option j, the value of being unemployed under training option j is

(6.14a)  V_j^u = N − c_j + (λ_j/(1+r)) E_j max[V_j^e, V_j^u] + ((1 − λ_j)/(1+r)) V_j^u.

The first term, N − c_j, is the value of nonmarket time minus the j-specific cost of search. The second term is the discounted product of the probability that an offer arrives next period if the jth option is used and the expected value of the maximum of the two options: work (valued at V_j^e) or continued unemployment (V_j^u). The third term is the probability that the person continues to search, times the discounted value of doing so. In a stationary environment, if it is optimal to search from j today, it is optimal to do so tomorrow.

Let σ_{je} be the exogenous rate at which jobs disappear. For a job holder, the value of employment V_j^e is

(6.14b)  V_j^e = Y_j + ((1 − σ_{je})/(1+r)) V_j^e + (σ_{je}/(1+r)) E_j[max(V_N, V_j^u)],

where V_j^u is the value of optimal job search under j. The expression consists of the current flow of earnings Y_j, plus the discounted (1/(1+r)) expected value of employment (V_j^e) times the probability that the job is retained (1 − σ_{je}), plus a third term arising from the possibility that the person loses his or her job (which happens with probability σ_{je}), times the discounted expected value of the maximum of the search and nonmarket value (V_N) options. To simplify this expression, assume that V_j^u > V_N; if this is not so, the person would never search under any training option in any event.
In this case, V_j^e simplifies to

V_j^e = Y_j + ((1 − σ_{je})/(1+r)) V_j^e + (σ_{je}/(1+r)) V_j^u,

so

(6.14c)  V_j^e = (1+r)Y_j/(r + σ_{je}) + (σ_{je}/(r + σ_{je})) V_j^u.

Substituting (6.14c) into (6.14a), we obtain, after some rearrangement,

V_j^u = [ (1+r)(N − c_j) + λ_j E(V_j^e | V_j^e > V_j^u) Pr(Y_j > V_j^u r/(1+r)) ] / [ r + λ_j Pr(Y_j > V_j^u r/(1+r)) ].

In deriving this expression, we assume that the environment is stationary, so that the optimal policy at time t is also the optimal policy at t′, provided that the state variables are the same in each period. The optimal search strategy is

ĵ = argmax_j {V_j^u},

provided that V_j^u > V_N for at least one j. The lower is c_j and the higher is λ_j, the more attractive is option j. The larger is F_j - in the sense that j stochastically dominates j′ (F_j(x) < F_{j′}(x)), so that more of the mass of F_j lies in the upper portion of the distribution - the more attractive is option j. Given the search options available to an individual, enrollment in a job training program may be the most effective option.

The probability that training from option j lasts t_j periods or more is

Pr(T_j ≥ t_j) = [1 − λ_j(1 − F_j(V_j^u r/(1+r)))]^{t_j},

where 1 − λ_j(1 − F_j(V_j^u r/(1+r))) is the sum of the probability of receiving no offer, 1 − λ_j, and the probability of receiving an offer that is not acceptable, λ_j F_j(V_j^u r/(1+r)).

This model is nonlinear in the basic parameters. Because of this nonlinearity, many estimators relying on additive separability of the unobservables, such as difference-in-differences or fixed effect schemes for eliminating unobservables, are ineffective evaluation estimators.

This simple model summarizes the available empirical evidence on job training programs. (a) It rationalizes variability in the length of time that persons with identical characteristics spend in training: persons receive different wage offers at different times and leave the program to accept those offers at different dates. (b) It captures the notion that training programs might facilitate the rate of job arrivals - the λ_j - which is an essential function of "job search assistance" programs, or might produce skills by improving the F_j, or both. (c) It accounts for recidivism into training programs: as jobs are terminated (at rate σ_{je}), persons re-enter the program to search for a replacement job. Recidivism is an important feature of major job training programs. Trott and Baj (1993) estimate that as many as 20 percent of all JTPA program participants in Northern Illinois have been in the program at least twice, with the modal number of spells being three. This has important implications for the contamination bias problem that we discuss in Section 7.7.
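For a single option j, the fixed point defined by (6.14a) and (6.14c) is easy to compute numerically. The sketch below uses an invented discrete offer distribution and assumes V_j^u > V_N throughout; the implied reservation wage is V_j^u r/(1+r), as derived above.

```python
# Fixed-point iteration on the unemployment value (6.14a), using (6.14c)
# for the employment value. All parameter values are assumed.
import numpy as np

r, lam, sigma = 0.05, 0.40, 0.10    # interest, offer arrival, job loss rates
N, c = 100.0, 20.0                  # value of nonmarket time, search cost
wages = np.array([150.0, 250.0, 400.0])   # support of F_j
probs = np.array([0.5, 0.3, 0.2])

def v_employed(y, v_u):
    # (6.14c): laid-off workers return to search because V_u > V_N
    return ((1 + r) * y + sigma * v_u) / (r + sigma)

v_u = 0.0
for _ in range(2000):               # contraction mapping; converges
    v_e = v_employed(wages, v_u)
    ev_max = probs @ np.maximum(v_e, v_u)
    v_u = N - c + (lam * ev_max + (1 - lam) * v_u) / (1 + r)

reservation = v_u * r / (1 + r)     # accept offers with Y_j above this
print(round(v_u, 1), round(reservation, 1))
```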
A less attractive feature of the model is that persons do not switch search strategies. This is a consequence of the assumed stationarity of the environment and of the assumption that agents know both the arrival rates and the wage offer distributions. Relaxing the stationarity assumption produces switching among strategies, which seems to be consistent with the evidence. A more general - but less analytically tractable - model allows for learning about wage offer distributions, as in Weitzman (1979). In such a model, persons may switch strategies as they learn about the arrival rates or the wage offers obtained under a given strategy. The learning can take place within each type of program and may also entail word-of-mouth learning from fellow trainees taking the option. Weitzman's model captures this idea in a very simple way and falls within the Gittins index framework. The basic idea is as follows. Persons have J search options. They pick the option with the highest value and take a draw from it. They accept the draw if the value of the realized draw is better than the expected value of the best remaining option; otherwise they try out the latter option. If the draws from the J options are independently distributed, a Gittins-index strategy describes this policy. In this framework, unemployed persons may try a variety of options - including job training - before they take a job or drop out of the labor force.

One could also extend this model to allow the value of nonmarket time, N, to become stochastic. If N fluctuates, persons would enter or exit the labor force depending on the value of N. Adding this feature captures the employment dynamics of trainees described by Card and Sullivan (1988). In this more general model, shocks to the value of leisure or the termination of previous jobs make persons contemplate taking training. Whether or not they do so depends on the value of training compared with the value of other strategies for finding jobs. Allowing for these considerations produces a model broadly consistent with the evidence presented in Heckman and Smith (1998b) that persons enter training as a consequence of displacement from both the market and nonmarket sectors. The full details of this model remain to be developed (see Heckman and Smith, 1999, for a start). We suggest that future analyses of program participation be based on this empirically more concordant model. For the rest of this chapter, however, we take decision rule (6.3) as canonical in order to motivate and justify the choice of alternative econometric estimators. We urge readers to modify our analysis to incorporate the lessons of the framework of labor force dynamics sketched here.

6.4 The Role of Program Eligibility Rules in Determining Participation

Several institutional features of most training programs suggest that the participation rule is more complex than the one characterized by the simple model presented in Section 6.2. For example, eligibility for training is often based on a set of objective criteria, such as current or past earnings being below some threshold. In this instance, individuals can take training at time k only if they have had low earnings, regardless of its potential benefit to them. For example, enrollees satisfy both the participation condition and the eligibility rule:

(6.15)  α/r − Y_{ik} − c_i > 0 and Y_{i,k−1} < K,

where K is a cutoff level. More general eligibility rules can be analyzed within the same framework.

The universality of Ashenfelter's dip in pre-program earnings among program participants occurs despite substantial variation in eligibility rules across training programs. This suggests that earnings or employment dynamics drive the participation process, and that Ashenfelter's dip is not an artifact of eligibility rules. Few major training programs in the United States have required earnings declines to qualify for program eligibility. Certain CETA programs in the late 1970s required participants to be unemployed during the period just prior to enrollment, while the NSW required participants to be unemployed at the date of enrollment. MDTA contained no eligibility requirements, but restricted training stipends to persons who were unemployed or "underemployed."38 For the JTPA program, eligibility
One could also extend this model to allow the value of non-market time, $N$, to become stochastic. If $N$ fluctuates, persons would enter or exit the labor force depending on its value. Adding this feature captures the employment dynamics of trainees described by Card and Sullivan (1988). In this more general model, shocks to the value of leisure or the termination of previous jobs make persons contemplate taking training. Whether or not they do so depends on the value of training compared with the value of other strategies for finding jobs. Allowing for these considerations produces a model broadly consistent with the evidence presented in Heckman and Smith (1998b) that persons enter training as a consequence of displacement from both the market and nonmarket sectors. The full details of this model remain to be developed (see Heckman and Smith, 1999, for a start). We suggest that future analyses of program participation be based on this empirically more concordant model. For the rest of this chapter, however, we take decision rule (6.3) as canonical in order to motivate and justify the choice of alternative econometric estimators. We urge our readers to modify our analysis to incorporate the lessons from the framework of labor force dynamics sketched here.

6.4 The Role of Program Eligibility Rules in Determining Participation

Several institutional features of most training programs suggest that the participation rule is more complex than the one characterized by the simple model presented in Section 6.2. For example, eligibility for training is often based on a set of objective criteria, such as current or past earnings being below some threshold. In this instance, individuals can take training at time $k$ only if they have had low earnings, regardless of its potential benefit to them. For example, enrollees satisfy

$$\frac{\alpha}{r} - Y_{ik} - c_i > 0 \quad \text{and the eligibility rule} \quad Y_{i,k-1} < K, \qquad (6.15)$$

where $K$ is a cutoff level. More general eligibility rules can be analyzed within the same framework.

The universality of Ashenfelter's dip in the pre-program earnings of program participants occurs despite substantial variation in eligibility rules across training programs. This suggests that earnings or employment dynamics drive the participation process and that Ashenfelter's dip is not an artifact of eligibility rules. Few major training programs in the United States have required earnings declines to qualify for program eligibility. Certain CETA programs in the late 1970s required participants to be unemployed during the period just prior to enrollment, while the NSW required participants to be unemployed at the date of enrollment. MDTA contained no eligibility requirements, but restricted training stipends to persons who were unemployed or "underemployed."[38] For the JTPA program, eligibility has been confined to the economically disadvantaged, defined by low family income over the past six months, participation in a cash welfare program or Food Stamps, or being a foster child or disabled. There is also a 10 percent "audit window" of eligibility for persons facing other unspecified "barriers to employment."

It is possible that Ashenfelter's dip results simply from the mechanical operation of program eligibility rules that condition on recent earnings. Such rules select individuals with particular types of earnings patterns into the eligible population. To illustrate this point, consider the monthly earnings of adult males who were eligible for JTPA in a given month from the 1986 panel of the U.S. Survey of Income and Program Participation (SIPP). For most people, eligibility is determined by family earnings over the past six months. The mean monthly earnings of adult males appear in Figure 4.1, aligned relative to month $k$, the month when eligibility is measured. The figure reveals a dip in the mean earnings of adult male eligibles centered in the middle of the six-month window over which family income is measured when determining JTPA eligibility. Figure 4.1 also displays the mean earnings of adult males in the experimental control group from the NJS.[39] The earnings dip for the controls, who applied to and were admitted into the program, is larger than that for the sample of JTPA eligibles from the SIPP. Moreover, this dip reaches its minimum during month $k$, rather than three or four months earlier as the operation of the eligibility rules would indicate. The substantial difference between the mean earnings patterns of JTPA participants and eligibles implies that Ashenfelter's dip does not result from the mechanical operation of program eligibility rules.[40]

[38] Eligibility for CETA varied by subprogram. CETA's controversial Public Sector Employment (PSE) program required participants to have experienced a minimum number of days of unemployment or "underemployment" just prior to enrollment. In general, persons became eligible for other CETA programs by having a low income or limited ability in English. Considerable discretion was left to the states and training centers to determine who enrolled in the program. By contrast, the NSW eligibility requirements were quite specific. Adult women had to be on AFDC at the time of enrollment, have received AFDC for 30 of the last 36 months, and have a youngest child age six years or older. Youth in the NSW had to be age 17-20 years with no high school diploma or equivalency degree and not have been in school in the past six months. In addition, fifty percent of youth participants had to have had some contact with the criminal justice system (Hollister, et al., 1984).

[39] Such data were collected at four of the 16 training centers that participated in the study.

[40] Devine and Heckman (1996) present certain nonstationary family income processes that can generate Ashenfelter's dip from the application of the JTPA eligibility rules. However, in their empirical work they find a dip centered at $k-3$ or $k-4$ for adult men and adult women, but no dip for male and female youth.
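The mechanical selection channel just described is easy to reproduce. The simulation below is our own stylized illustration: earnings follow an AR(1) process around a constant mean (all parameters hypothetical), and "eligibility" at month $k$ requires a low six-month average of earnings over months $k-6$ to $k-1$. The mean earnings of the simulated eligibles display a dip centered in the middle of the window, around $k-3$ or $k-4$, as with the SIPP eligibles in Figure 4.1, even though no one's behavior responds to the program.

```python
# Simulation of the mechanical "dip" produced by an eligibility rule that
# conditions on low average earnings over the six months before month k.
# Stylized AR(1) earnings process; all parameters are our own assumptions.
import numpy as np

rng = np.random.default_rng(42)
n, T, k = 50_000, 36, 18          # people, months, eligibility month
rho, sigma, mu = 0.8, 300.0, 1500.0

# Monthly earnings: y_t = mu + u_t, with u_t = rho * u_{t-1} + e_t
u = np.zeros((n, T))
for t in range(1, T):
    u[:, t] = rho * u[:, t - 1] + rng.normal(0, sigma, n)
y = np.clip(mu + u, 0, None)

# JTPA-style rule: eligible if mean earnings over months k-6..k-1 fall
# below a cutoff (here the 30th percentile of the window average).
window = y[:, k - 6:k].mean(axis=1)
eligible = window < np.quantile(window, 0.30)

profile = y[eligible].mean(axis=0)
for t in range(k - 12, k + 12):
    print(f"month k{t - k:+d}: mean earnings {profile[t]:7.1f}")
# The eligibles' mean earnings dip in the middle of the six-month window
# and mean-revert afterwards: selection alone produces an Ashenfelter-style
# dip, but centered near k-3/k-4 rather than at k as for NJS participants.
```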
6.5 Administrative Discretion and the Efficiency and Equity of Training Provision

Training participation also often depends on discretionary choices made by program operators. Recent research focuses on how program operators allocate training services among groups and on how administrative performance standards affect the allocation of these services. The main question that arises in these studies is the potential trade-off between equity and efficiency, and the potential conflict between social objectives and program operators' incentives. An efficiency criterion that seeks to maximize the social return to public training investments, regardless of the implications for income distribution, implies focusing training resources on those groups for whom the impact per dollar spent is largest. In contrast, equity and redistributive criteria dictate focusing training resources on the groups most in "need" of services. These goals of efficiency and equity are written into the U.S. Job Training Partnership Act.[41] Whether these twin goals conflict with each other depends on the empirical relationship between initial skill levels and the impact of training. As we discuss below in Section 10, the impact of training appears to vary with observable characteristics, such as sex, age, race and what practitioners call "barriers to employment": low schooling, lack of employment experience and so on. The twin goals would conflict if the largest social returns resulted from training the most job-ready applicants.

In recent years, especially in the United States, policymakers have used administrative performance standards to assess the success of program operators at different training sites. Under JTPA, these standards are based primarily on the average employment rates and average wage rates of trainees shortly after they leave training. The target levels for each site are adjusted using a regression model that attempts to hold constant features of the environment over which the local training site has no control, such as racial composition.[42] Sites whose performance exceeds these standards may be rewarded with additional funding; those that fall below them may be sanctioned.

The use of such performance standards, instead of measures of the impact of training, raises the issue of "cream-skimming" by program operators (Bassi, 1984). Program staff concerned solely with their site's performance relative to the standard should admit into the program applicants who are likely to be employed at good wages (the "cream"), regardless of whether or not they benefit from the program. By contrast, they should avoid applicants who are less likely to be employed after leaving training, or who have low expected wages, even if the impact of the training on such persons is likely to be large. The implications of cream-skimming for equity are clear: if it exists, program operators are directing resources away from those most in need. However, its implications for efficiency depend on the empirical relationship between short-term outcome levels and long-term impacts. If applicants who are likely to be subsequently employed are also those who benefit the most from the program, performance standards indirectly encourage the efficient provision of training services.[43]

[41] A related issue involves differences in the types of services provided to different groups conditional on participation in a program. The U.S. General Accounting Office (1991) finds such differences alarming in the JTPA program. Smith (1992) argues that they result from differences across groups in readiness for immediate employment and in the availability of income support during classroom training.

[42] See Heckman and Smith (1997d) and the essays in Heckman (1998b) for more detailed descriptions of the JTPA performance standards system. Similar systems based on the JTPA system now form a part of most U.S. training programs.

[43] Heckman and Smith (1997d) discuss this issue in greater depth. The discussion in the text presumes that the costs of training provided to different groups are roughly equal.
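Whether cream-skimming is costly in efficiency terms thus turns on the joint distribution of outcome levels and impacts. The toy simulation below (our own construction; the correlation values are hypothetical) compares the mean impact among applicants admitted on the basis of high predicted outcome levels, as a standards-driven operator would select them, with the mean impact under impact-based and random admission.

```python
# Toy illustration of the levels-versus-gains issue discussed above.
# Y0 is the outcome without training, delta the impact of training; the
# correlation between them is a free (hypothetical) parameter.
import numpy as np

rng = np.random.default_rng(1)
n, capacity = 100_000, 20_000

for corr in (-0.5, 0.0, 0.5):
    cov = [[1.0, corr], [corr, 1.0]]
    y0, delta = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    delta = 1.0 + delta                        # mean impact of 1.0

    cream = np.argsort(-y0)[:capacity]         # admit highest predicted levels
    efficient = np.argsort(-delta)[:capacity]  # admit highest impacts

    print(f"corr(Y0, impact) = {corr:+.1f}: "
          f"mean impact, cream-skimming = {delta[cream].mean():.2f}; "
          f"impact-based = {delta[efficient].mean():.2f}; "
          f"random = {delta.mean():.2f}")
# Cream-skimming raises efficiency only insofar as outcome levels and
# impacts are positively correlated; with zero correlation it does no
# better than random admission, echoing the evidence cited below.
```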
A small literature examines the empirical importance of cream-skimming in JTPA programs. Anderson, et al. (1991) and Anderson, et al. (1993) look for evidence of cream-skimming by comparing the observable characteristics of JTPA participants with those of individuals eligible for JTPA. They report evidence of cream-skimming, defined in their studies as the case in which individuals with fewer barriers to employment have differentially higher probabilities of participating in training. However, this finding may result not from cream-skimming by JTPA staff, but from the fact that, among those in the JTPA-eligible population, more employable persons self-select into training.[44]

Two more recent studies address this problem. Using data from the NJS, Heckman and Smith (1998e) decompose the process of participation in JTPA into a series of stages. They find that much of what appears to be cream-skimming in simple comparisons between participants' and eligibles' characteristics is self-selection. For example, high school dropouts are very unlikely to be aware of JTPA and, as a result, are unlikely ever to apply. To assess the role of cream-skimming, Heckman, Smith and Taber (1996) study a sample of applicants from one of the NJS training centers. They find that program staff at this training center do not cream-skim, and appear instead to favor the hard-to-serve when deciding whom to admit into the program. Such evidence suggests that cream-skimming may not be of major empirical importance, perhaps because the social service orientation of JTPA staff moderates the incentives provided by the performance standards system, or because of local political incentives to serve more disadvantaged groups. For programs in Norway, Aakvik (1998) finds strong evidence of negative selection of participants on outcomes. Heinrich (1998) reports just the opposite for a job training program in the United States. At this stage, no universal generalization about bureaucratic behavior regarding cream-skimming is possible.

Studies based on the NJS also provide evidence on the implications of cream-skimming, were it to exist. Heckman, Smith and Clements (1997) find that, except for those who are very unlikely to be employed, the impact of training does not vary with the expected level of employment or earnings in the absence of training. This finding indicates that the impact on efficiency of cream-skimming (or, alternatively, the efficiency cost of serving the hard-to-serve) is low. Similarly, Heckman and Smith (1998d) find little empirical relationship between the outcome measures used in the JTPA performance standards system and experimental estimates of the impact of JTPA training. These findings suggest that cream-skimming has little impact on efficiency, and that administrative performance standards, to the extent that they affect who is served, do little to increase either the efficiency or the equity of training provision.

[44] Program staff often have some control over who applies through their decisions about where and how much to publicize the program. However, this control is much less important than their ability to select among program applicants.
6.6 The Conflict between the Economic Approach to Program Evaluation and the Modern Approach to Social Experiments

We have already noted in Section 5 that, under ideal conditions, social experiments identify $E(Y_1 - Y_0 \mid X, D = 1)$. Without further assumptions and econometric manipulation, they do not answer the other evaluation questions posed in Section 3. As a consequence of the self-selected nature of the samples generated by social experiments, the data produced from them are far from ideal for estimating the structural parameters of behavioral models. This makes it difficult to generalize findings across experiments or to use experiments to identify the policy-invariant structural parameters that are required for econometric policy evaluation.

To see this, recall that social experiments balance bias, but they do not eliminate the dependence between $U_0$ and $D$ or $U_1$ and $D$. Thus, from experiments conducted under ideal conditions, we can recover the conditional densities $f(y_0 \mid X, D = 1)$ and $f(y_1 \mid X, D = 1)$. From nonparticipants we can recover $f(y_0 \mid X, D = 0)$. It is the density $f(y_0 \mid X, D = 1)$ that is the new information produced by social experiments. The other densities are available from observational data. All of these densities condition on choices. Knowledge of the conditional means

$$E(Y_0 \mid X, D = 1) = g_0(X) + E(U_0 \mid X, D = 1)$$

and

$$E(Y_1 \mid X, D = 1) = g_1(X) + E(U_1 \mid X, D = 1)$$

does not allow us to separately identify the structure $(g_0(X), g_1(X))$ from the conditional error terms without invoking the usual assumptions made in the nonexperimental selection literature. Moreover, the error processes for $U_0$ and $U_1$ conditional on $D = 1$ are fundamentally different from those in the population at large if participation in the program depends, in part, on $U_0$ and $U_1$.

For these reasons, evidence from social experiments on programs with different participation and eligibility rules does not cumulate in any interpretable way. The estimated treatment effects reported from the experiments combine structure and error in different ways, and the conditional means of the outcomes bear no simple relationship to $g_0(X)$ or $g_1(X)$ ($X\beta_0$ and $X\beta_1$ in a linear regression setting). Thus it is not possible, without conducting a nonexperimental selection study, to relate the conditional means or regression functions obtained from a social experiment to a core set of policy-invariant structural parameters. Ham and LaLonde (1996) present one of the few attempts to recover structural parameters from a randomized experiment, where randomization was administered at the stage where persons applied to, and were accepted into, the program. The complexity of their analysis is revealing about the difficulty of recovering structural parameters from social experiments. By bypassing the need to specify economic models, many recent social experiments produce evidence that is not informative about such models. They generate choice-based, endogenously stratified samples that are difficult to use in addressing any economic question apart from the narrow question of determining the impact of treatment on the treated for one program with one set of participation and eligibility rules.
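A small simulation makes the point concrete (it is our own, and the Roy-style participation rule and all parameter values are hypothetical). Two programs share the same structural means $g_0$ and $g_1$ but differ in participation rules. Ideal experiments on each recover different values of $E(Y_1 - Y_0 \mid D = 1)$, and in each case the control-group mean equals $g_0 + E(U_0 \mid D = 1)$ rather than $g_0$.

```python
# Simulation: ideal experiments identify E(Y1 - Y0 | D=1), but that estimand
# depends on the participation rule, so estimates do not cumulate across
# programs with different rules. All parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
g0, g1 = 10.0, 11.0                    # structural means (X suppressed for clarity)
u0 = rng.normal(0, 2, n)
u1 = 0.5 * u0 + rng.normal(0, 2, n)    # correlated unobservables U0, U1
y0, y1 = g0 + u0, g1 + u1

for label, cost in (("program A (low cost)", 0.0), ("program B (high cost)", 2.0)):
    d = (y1 - y0) > cost               # Roy-style self-selection into training
    # An ideal experiment randomizes among D=1 applicants, so it recovers
    # E(Y1 | D=1) and E(Y0 | D=1), i.e., treatment on the treated.
    tt = y1[d].mean() - y0[d].mean()
    print(f"{label}: E(Y1-Y0|D=1) = {tt:.2f}; "
          f"control mean = {y0[d].mean():.2f} vs g0 = {g0:.1f}")
# Same structure (g1 - g0 = 1.0), different participation rules, different
# experimental impact estimates; the control-group mean g0 + E(U0|D=1)
# does not identify g0 without selection assumptions.
```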