The Construct-Related Validity of Assessment Center Ratings: A Review and Meta-Analysis of the Role of Methodological Factors
In the present study, we provide a systematic review of the assessment center literature with
respect to specific design and methodological characteristics that potentially moderate the
construct-related validity of assessment center ratings. We also conducted a meta-analysis of
the relationship between these characteristics and construct-related validity outcomes. Results
for rating approach, assessor occupation, assessor training, and length of assessor training
were in the predicted direction such that a higher level of convergent and a lower level of discriminant
validity were obtained for the across-exercise compared to the within-exercise rating
method; for psychologists compared to managers/supervisors as assessors; for assessor training
compared to no assessor training; and for longer compared to shorter assessor training. Partial support
was also obtained for the effects of the number of dimensions and assessment center purpose.
Our review also indicated that relatively few studies have examined both construct-related
and criterion-related validity simultaneously. Furthermore, these studies provided little, if any,
support for the view that assessment center ratings lack construct-related validity while at
the same time demonstrating criterion-related validity. The implications of these findings for
assessment center construct-related validity are discussed.
Over the past several decades, assessment centers have enjoyed increasing popularity.
They are currently used in numerous private and public organizations to assess thousands of
people each year (Lowry, 1997; Spychalski, Quiñones, Gaugler & Pohley, 1997; Thornton
& Byham, 1982). The validity of assessment centers is undoubtedly partially responsible for
their popularity. Evidence supporting the criterion-related validity of assessment center rat-
ings has been consistently documented (Arthur, Day, McNelly & Edens, in press; Gaugler,
Rosenthal, Thornton & Bentson, 1987). In addition, content-related methods of validation
are also regularly used in assessment center development in an effort to meet professional
and legal requirements (Sackett, 1987). Evidence for the construct-related validity of assess-
ment center dimensions, however, has been less promising. Specifically, assessment centers
are designed to evaluate individuals on specific dimensions of job performance across sit-
uations or exercises. Research, however, has indicated that exercise rather than dimension
factors emerge in the evaluation of assessees (Bycio, Alvares & Hahn, 1987; Highhouse
& Harris, 1993; Schneider & Schmitt, 1992; Turnage & Muchinsky, 1982). Thus, a lack
of evidence of convergent validity, as well as a partial lack of evidence of discriminant
validity, has been extensively reported in the literature (Brannick, Michaels & Baker, 1989;
Klimoski & Brickner, 1987; Sackett & Harris, 1988). These findings have led to a pre-
vailing view that assessment center ratings demonstrate criterion-related validity while at
the same time lacking construct-related validity (e.g., evidence of convergent/discriminant
validity).
It is important to note that this “prevailing view” is inconsistent with the unitarian con-
ceptualization of validity which postulates that content-, criterion-, and construct-related
validity are simply different strategies for demonstrating the construct validity of a test
or measure (Binning & Barrett, 1989). Here, consistent with Binning and Barrett (1989),
Landy (1986), and other proponents of the unitarian conceptualization of validity, we draw
a distinction between construct-related validity and construct validity (or validation) and
consider the “validation of personnel selection decisions [to be] merely a special case of the
more general validation process” (Binning & Barrett, 1989: 480). Psychological constructs
are conceptualizations regarding the arrangement and interaction of covarying groups of
behavior (i.e., theory-building). In this sense, a construct is a hypothesis concerning these
commonalities among behaviors. Within this framework, construct validation refers to “the
process for identifying constructs by developing measures of such constructs and examin-
ing relationships among the various measures” (Binning & Barrett, 1989: 474). Construct
validation is then, fundamentally, a process of assessing what a test or measurement mea-
sures and how well it does so. In contrast, construct-related validity (e.g., evidence of
convergent/discriminant validity) refers to a specific evidential approach for justifying a
specific measure-construct link and is one of several inferential strategies that can be used
to contribute to our understanding of the construct validity of a test.
Content-related and criterion-related validity are two other commonly used inferential
strategies where the former is typically a rational assessment of the content overlap between
the performance domain and that sampled by the predictor, and the latter is an empirical
demonstration of the relationship between the predictor and the criterion measure. Thus, within the unitarian
framework of validity, content-related, criterion-related, and construct-related validity are
considered to be three of several evidential bases for demonstrating the construct validity
of a test or measure (AERA, APA & NCME, 1999; SIOP, 2002), where construct validity
(as differentiated from construct-related validity), generally refers to whether a test is mea-
suring what it purports to measure, how well it does so, and the appropriateness of infer-
ences that are drawn from the test scores (AERA et al., 1999; Binning & Barrett, 1989;
Landy, 1986; Lawshe, 1985; Messick, 1989, 1995, 1998). And because these evidential
bases form an interrelated, bound, logical system, demonstration of any two conceptually
implies that the third is also present (Binning & Barrett, 1989). So, within the unitarian frame-
work, at a theoretical level, if a measurement tool demonstrates criterion-related validity
and content-related validity, as has been established with assessment centers, it should also
be expected to demonstrate construct-related validity.
Here it may be helpful to expand on the assessment center “validity paradox.” Within
the context of the unitarian view of validity, this paradox is reflected in the idea that assess-
ment center ratings demonstrate (1) content-related validity—it is widely accepted that the
situations and exercises incorporated into assessment centers represent relatively realistic
work samples and that the knowledge, skills, and abilities required for successful assess-
ment center performance are the same as those required for successful job performance;
(2) criterion-related validity—as noted above, the predictive validity of assessment center
ratings has been consistently documented (Arthur et al., in press; Gaugler et al., 1987);
and (3) a lack of construct-related validity—again as noted above, the assessment center
literature consistently points to a lack of convergent and discriminant validity with respect
to assessment center dimensions (cf. Arthur, Woehr & Maldegan, 2000).
Several explanations have been postulated for the presence of evidence supporting as-
sessment center content- and criterion-related validity in the absence of construct-related
validity evidence. One recently endorsed view is the construct misspecification explana-
tion. That is, assessment centers may be measuring constructs other than those originally
intended by the assessment center designers (Arthur & Tubre, 2002; Raymark & Binning,
1997; Russell & Domm, 1995). This explanation suggests that the lack of convergent and
discriminant validity evidence is not due to measurement error, but instead due to misspecifi-
cation of the latent structure of the construct domain. As Russell and Domm (1995: 26) note,
“simply put, assessment center ratings must be valid representations of some construct(s),
we just do not know which one(s).” An alternate perspective on the construct misspecifi-
cation explanation has also recently been advanced. Several researchers have argued that
rather than the construct domain being “misspecified,” the factors are correctly specified
but misinterpreted. They argue that the exercise factors that typically emerge represent valid
cross-situational specificity and not “method bias” (Ladd, Atchley, Gniatczyk & Bauman,
2002; Lance et al., 2000).
Although conceptually plausible, the misspecification hypothesis has yet to be demon-
strated empirically. In addition, although it may be argued that construct misspecification
may not be particularly troublesome when assessment centers are used for selection or
promotion, it has dire implications for the use of assessment centers as training and develop-
ment interventions. Specifically, the use of assessment centers as training and development
interventions is predicated on the assumption that they are indeed measuring the specified
targeted dimensions (e.g., team building, flexibility, influencing others) and consequently,
developmental feedback reports and interviews, and individual development plans are all
designed and developed around these dimensions.
It is important to note that explanations such as the construct misspecification explanation
are predicated on the idea that the assessment center validity paradox actually exists. It is
possible, however, that this paradox is illusory. Specifically, evidence supporting the validity
paradox would require that specific assessment centers which demonstrate content- and
criterion-related validity also lack construct-related validity. Yet a cursory examination of
the literature suggests that studies examining assessment center construct-related validity
and those examining criterion-related validity are largely independent. Thus, one important
question with respect to the assessment center validity paradox is how many individual
studies have demonstrated a lack of construct-related validity while also demonstrating
criterion-related validity for a specific assessment center application?
Assuming, however, that the assessment center validity paradox is not illusory, a second
explanation for this paradox posits that assessment center design, implementation, and other
methodological factors may add measurement error that prevents appropriate convergent
and discriminant validity from being obtained (Arthur et al., 2000; Jones, 1992; Lievens,
1998). Here, it may be argued that if assessment centers are implemented in a manner consis-
tent with their theoretical and conceptual basis, more consistent validity outcomes should
be obtained. Specifically, assessment center dimensions should display content-related,
criterion-related, and construct-related validity.
Although the lack of construct-related validity has been widely cited, conceptual and
methodological explanations have not been closely considered (Jones, 1992). However,
recent research (e.g., Arthur & Tubre, 2002; Arthur et al., 2000; Born, Kolk & van der
Flier, 2000; Howard, 1997; Jones, 1992; Kudisch, Ladd & Dobbins, 1997; Lievens, 1998,
2001; Thornton, Tziner, Dahan, Clevenger & Meir, 1997) has focused on these issues, and
subsequently called the lack of construct-related validity view into question.
There is a body of research which indicates that differences in the design and implemen-
tation of assessment centers can result in large variations in their psychometric outcomes.
For example, Schmitt, Schneider and Cohen (1990) compared the correlations of overall
assessment ratings (OARs) with teacher ratings from one assessment center implemented
at 16 different sites. Although the original implementation was the same across sites, some
sites took liberties to make changes during the time the assessment center was in use.
These changes in implementation resulted in a considerable range of predictive validity
coefficients, from −.40 to .82. In addition, Lievens (1998) reviewed 21 studies
that explicitly manipulated assessment center design characteristics hypothesized to impact
the construct-related validity of assessment center ratings. Across studies, design-related
characteristics were sorted into five categories, namely dimension characteristics, exercise
characteristics, assessor characteristics, observation and evaluation approach, and rating
integration approach. Results of the review indicated no clear impact of rating integra-
tion approach or observation and evaluation approaches on evidence of assessment center
construct-related validity. However, manipulations focusing on dimension characteristics
(e.g., number of dimensions, conceptual distinctiveness, and transparency), assessor char-
acteristics (e.g., type of assessor and training), and exercise characteristics (e.g., exercise
format) were all found to moderate construct-related validity evidence.
Studies such as those reviewed by Lievens (1998) provide clear evidence that assess-
ment center design-related factors can impact the validity of assessment center ratings.
However, these studies differ markedly from the vast majority of studies on which pre-
vailing views of assessment center validity are based. Specifically, the studies reviewed
by Lievens (1998) almost exclusively incorporate experimental or quasi-experimental de-
signs in which one or two design characteristics were directly manipulated. In addition,
these studies were typically conducted in relatively artificial or contrived settings and
thus, did not address criterion-related validity. In fact, of the 21 studies included in the
Lievens review, 10 were based on student samples (students serving as either assessors,
assessees or both), 7 used videotaped “hypothetical” assessees, and none presented any
criterion-related validity data. Although this research indicates that design-related charac-
teristics can impact validity, it does not provide an indication of the actual design features
of the operational assessment centers from which the validity paradox stems. Thus, in or-
der to evaluate the role of design characteristics in the validity paradox, one must have
a clear view of the existing literature with respect to methodological and design-related
factors.
The first variables of interest are the participant-to-assessor ratio and the number of di-
mensions assessors are asked to observe, record, and rate. These variables play an important
role in the validity of assessment center ratings (Bycio et al., 1987). For example, Schmitt
(1977) found that instead of using the 17 designated dimensions, in evaluating participants,
assessors actually collapsed these 17 dimensions into three global dimensions for rating
purposes. Along similar lines, Sackett and Hakel (1979) found that only 5 dimensions, out
of a total of 17, were required to predict most of the variance in OARs. In an extension of
Sackett and Hakel, Russell (1985) also found that out of 16 dimensions, a single dimension
dominated assessors’ ratings.
Gaugler and Thornton (1989) further demonstrated that assessors have difficulty differ-
entiating between a large number of performance dimensions. In this study, assessors were
responsible for rating 3, 6, or 9 dimensions. Those assessors who were asked to rate 3 or 6
dimensions provided more accurate ratings than those asked to rate 9. Thus, it appears that
when asked to rate a large number of dimensions, the cognitive demands placed on assessors
may make it difficult for them to process information at the dimension level resulting in a
failure to obtain convergent and discriminant validity. These findings are consistent with the
upper limits of human information processing capacity reported in the cognitive psychol-
ogy literature (Miller, 1956). Relatedly, the role of cognitive processes in the performance
evaluation and rating process has been well established (Bretz, Milkovich & Read, 1992;
Ilgen, Barnes-Farrell & McKellin, 1993).
Although there is less direct evidence, a similar argument can be made with respect to
the number of assessment center participants any given assessor is required to observe and
evaluate in any given exercise. That is, as the participant-to-assessor ratio increases, the
cognitive demands placed on assessors may make it more difficult to process information at
a dimension level for each participant. In addition, assessment center ratings will be more
susceptible to bias and information processing errors under conditions of high cognitive
demand (Martell, 1991; Woehr & Roch, 1996).
In summary, this body of research would suggest that when placed under high cognitive
demands or overload due to a large number of dimensions (Gaugler & Thornton, 1989;
Reilly, Henry & Smither, 1990) or assigned participants, assessors are unable to distinguish
between and use dimensions consistently across exercises. This means that there is much
to lose from the inclusion of a large number of assessment center dimensions or a large
participant-to-assessor ratio. The inability to simultaneously process a large number of
dimensions across multiple participants may account for assessors’ tendency to rate using
more global dimensions resulting in a failure to obtain convergent and discriminant validity.
Consequently, we hypothesized that a smaller number of dimensions (Hypothesis 1) and a lower participant-to-assessor ratio (Hypothesis 2) would each be associated with higher dimension convergent validity and lower dimension discriminant validity.

Rating Approach
Two primary evaluation approaches have been identified across assessment centers
(Sackett & Dreher, 1982; Robie et al., 2000). In the within-exercise approach assessees
are rated on each dimension after completion of each exercise. Two variations of this
within-exercise approach have been described: (a) the same assessors observe all exercises
but provide dimension ratings after observing each exercise and (b) different sets of asses-
sors observe each exercise and provide ratings for each dimension. In the across-exercise
approach, evaluation occurs after all of the exercises have been completed and dimen-
sion ratings are based on performance from all of the exercises. Two variations of the
across-exercise approach have also been described: (a) assessors provide an overall rating
for each dimension reflecting performance across all exercises and (b) assessors provide
dimension ratings for each exercise, but after all exercises are observed.
Silverman et al. (1986) provide some evidence that the choice of approach may moderate
findings of convergent and discriminant validity in assessment center ratings. And although
their results would seem to suggest that an across-exercise approach is preferable to a
within-exercise approach, Harris et al. (1993: 677) failed to replicate their findings. Harris
et al.’s results “showed that both across- and within-exercise scoring methods produced vir-
tually the same average monotrait-heteromethod correlations and heterotrait-monomethod
correlations.” However, Robie et al. (2000) recently provided further evidence supporting
the across-exercise rating approach. Specifically, Robie et al. found that when assessors
rated one dimension across all exercises, clear dimension factors emerged. Alternately,
when assessors rated all dimensions within one exercise, clear exercise factors emerged.
Given the research to date, it may be argued that the across-exercise approach is concep-
tually more appropriate and thus, results in better evidence of construct-related validity.
Consequently, we hypothesized that dimension convergent validity estimates would be higher, and dimension discriminant validity estimates lower, when an across-exercise rather than a within-exercise rating approach is used (Hypothesis 3).
Type of Assessor
The fourth factor pertains to the type of assessor, specifically psychologists vs. managers
and supervisors. In an explanation of their meta-analytic results, Gaugler et al. (1987) posit
that psychologists make better assessors because, as a result of their education and training,
they are better equipped to observe, record, and rate behavior. Sagie and Magnezy (1997)
demonstrated that type of assessor (i.e., managers vs. psychologists) significantly influenced
the construct-related validity of assessment center ratings. Thus, all things being equal,
studies that use industrial/organizational (I/O) psychologists (and similarly trained human
resource consultants and professionals) as assessors, are more likely to obtain evidence of
convergent/discriminant validity in contrast to those that use managers, supervisors, and
incumbents. Thus, we hypothesized that:
Hypothesis 4: The type of assessor used in assessment centers (psychologists vs. man-
agers/supervisors) will be related to construct-related validity such that dimension conver-
gent validity estimates will be higher when ratings are provided by psychologists compared
to managers/supervisors. In addition, dimension discriminant validity estimates will be
lower when ratings are provided by psychologists compared to managers/supervisors.
Assessor Training
Because assessment center ratings are inherently judgmental in nature, train-
ing assessors/raters is an important element in the development and design of assessment
centers. Thus, the type of training is also an important variable (Woehr & Huffcutt, 1994).
For instance, there is consensus in the literature that frame-of-reference (FOR) training is a highly
effective approach to rater training (Lievens, 2001; Noonan & Sulsky, 2001; Schleicher
& Day, 1998; Woehr & Huffcutt, 1994). However, irrespective of the training approach
used, assessment centers that have more extensive rater training are more likely to result
in ratings that display convergent/discriminant validity. Consequently, we hypothesized
that dimension convergent validity estimates would be higher, and dimension discriminant validity estimates lower, for assessment centers reporting assessor training compared to those reporting no assessor training (Hypothesis 5), and for longer compared to shorter assessor training (Hypothesis 6).

Assessment Center Purpose
Another variable that may impact assessment center construct-related validity outcomes
is the purpose for which assessment center ratings are collected. Here, it may be argued
that assessors may evaluate candidates differently depending on whether their ratings will
be used for selection or promotion decisions, or for training and development purposes.
Although research focusing on rating purpose in the assessment center literature is limited,
this issue has received a great deal of attention in the performance appraisal literature.
This literature suggests that rating purpose impacts rater cognitive processing such that
raters process incoming information differently depending on whether they begin with an
evaluative or observational goal (e.g., Feldman, 1981; Woehr & Feldman, 1993). Research
in this area has indicated that raters are more likely to form differentiated dimension-based
evaluations, as opposed to overall global evaluations, when initial processing goals focus
on observation and differentiation as opposed to pure evaluation (Woehr, 1992; Woehr &
Feldman, 1993). Thus, assessment centers conducted for training and development purposes
may lead to more differentiated ratings than would assessment centers conducted solely for
selection or promotion purposes. Thus, we hypothesized that dimension convergent validity estimates would be higher, and dimension discriminant validity estimates lower, for assessment centers conducted for training and development purposes compared to those conducted solely for selection or promotion purposes (Hypothesis 7).
In summary, the literature reviewed above identified several assessment center method-
ological/design factors that potentially moderate assessment center dimension construct-
related validity evidence. These are (1) number of performance dimensions assessed, (2) the
participant-to-assessor ratio, (3) type of rating approach, (4) type of assessor, (5) asses-
sor training, (6) length of assessor training, and (7) assessment center purpose. Another
methodological factor that has received some attention in the assessment center litera-
ture is type of rating scale. We chose not to include this variable in the current study
for two reasons. First, we found very few studies reporting information on the type of
rating scale used and the vast majority of this small subset were laboratory-based stud-
ies and thus, would not have met our criterion for inclusion (Lievens, 1998 found only
six studies and almost all of these were laboratory-based with student samples). Second,
there is very limited evidence with respect to the impact of different rating scales in an
assessment center context. In contrast, there is a great deal more literature on the impact
of rating scales on ratings in the performance appraisal literature, and the generally ac-
cepted conclusion of this literature is that specific rating scale format has little effect on
performance ratings. In fact, over 20 years ago Landy and Farr (1980) went so far as to
call for a moratorium on rating scale format research, arguing that it had largely proved
fruitless.
Thus, the methodological factors considered here are not intended to be an exhaustive
list of all possible potential moderators. Rather these characteristics are those that appear
most likely to impact construct-related validity outcomes. That is, based on both the
conceptual and empirical arguments presented, these characteristics appear to be those for
which the hypothesized effect on construct-related validity outcomes (both positive and
negative) can be most clearly articulated.
Present Study
We propose that the lack of convergent and discriminant validity for assessment center
dimensions is not an inherent flaw of the assessment center as a measurement tool or method,
but rather these findings may be attributable to certain design and methodological features
(Gaugler et al., 1987; Jones, 1992; Schmitt et al., 1990). Given this proposition, it would
seem worthwhile to systematically re-examine the literature on which the current view
(i.e., that assessment center ratings demonstrate content-related and criterion-related but
not construct-related validity) is based.
Our primary objective in the present study was to review this literature with respect
to several methodological/design-related characteristics and to conduct a meta-analysis to
empirically examine the relationship between these characteristics and assessment center
construct-related validity. Toward this objective, we first provided a detailed review of the
existing literature examining the construct-related validity of assessment center ratings. The
goal of this review was to provide summary descriptive information on the existing literature
with respect to the seven assessment center characteristics presented above and then formu-
late specific hypotheses with respect to the effect of these methodological/design-related
characteristics on the construct-related validity of assessment center ratings. We next con-
ducted a meta-analysis to test the hypothesized effects of the specified methodological and
design characteristics.
Another goal of the present study was to review the extent to which the studies comprising
the existing literature on assessment center construct-related validity simultaneously exam-
ine multiple sources of validity evidence. Here we sought to document the extent to which
studies that examine the construct-related validity of assessment center dimensions also
present data on the criterion-related validity of the assessment center dimension ratings.
Specifically, how many individual studies have demonstrated a lack of construct-related
validity while also demonstrating criterion-related validity for a specific assessment center
application? Thus, overall we sought to provide a detailed picture of the literature on which
the prevailing view of assessment center validity is based and use meta-analytic procedures
to empirically examine the impact of specific assessment center methodological/design
characteristics on the construct-related validity of assessment center ratings.
Method
A search was conducted to locate studies which empirically examined the construct-related
validity of assessment center ratings. A literature search was conducted using a number of
computerized databases (i.e., PsycINFO, Social Sciences Citation Index, Web of Science).
In addition, reference lists from obtained studies were also examined in order to identify
additional studies. We used several criteria for the inclusion of studies. First, we sought
out studies that directly examined the construct-related validity of assessment center di-
mensions. Second, we included only those studies which provided information about the
construct-related validity of operational assessment center ratings. Specifically, we focused
on studies in which assessment centers were conducted in an actual organizational context
and thus, excluded studies based on “simulated” assessment centers (i.e., we did not exclude
studies in which assessment center characteristics were examined using an experimental or
quasi-experimental approach—however we did exclude studies based on student samples
[either as assessors or assessees] or those using videotaped “hypothetical” assessees). The
search resulted in the location of 32 studies spanning over 30 years (from 1966 to 2001)
reporting results for 48 separate assessment centers. This set of studies served as the basis
for our descriptive review of the literature. Finally, we also identified the subset of these
studies that reported traditional MTMM correlation-based data. We used these summary data as the input for the meta-analytic portion of the study.
Each of the 48 assessment centers was reviewed and coded with respect to the seven
methodological/design characteristics discussed above. These characteristics were:
(1) number of dimensions evaluated; (2) participant-to-assessor ratio; (3) rating approach
(within-exercise vs. across-exercise); (4) assessor occupation (manager or supervisor vs.
psychologist); (5) whether assessor training was reported; (6) the length of assessor train-
ing; and (7) the assessment center purpose. Each study was also coded with respect to
four additional pieces of descriptive information: (1) number of assessees (i.e., sample
size); (2) number of exercises included in the assessment center; (3) descriptions of the
dimensions/constructs rated; and (4) type of analysis used to examine construct-related va-
lidity (exploratory factor analysis, confirmatory factor analysis, MTMM data, nomological
net). For those studies reporting MTMM data, we also recorded convergent (i.e., mean
monotrait-heteromethod rs) and/or discriminant validity coefficients (i.e., mean heterotrait-
monomethod rs). Finally, each of the studies was reviewed to ascertain whether criterion-
related validity evidence was reported in addition to the construct-related validity evidence.
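To make this coding concrete, the sketch below illustrates how mean convergent (monotrait-heteromethod) and discriminant (heterotrait-monomethod) coefficients can be computed from a dimension-by-exercise MTMM correlation matrix. It is a minimal illustration in Python rather than the tooling used in the original studies, and the dimension labels, exercise labels, and correlation values are hypothetical.

```python
import itertools
import numpy as np

def mtmm_summary(corr, labels):
    """Mean convergent (same dimension, different exercises) and discriminant
    (different dimensions, same exercise) correlations from an MTMM matrix."""
    convergent, discriminant = [], []
    for (i, (dim_i, ex_i)), (j, (dim_j, ex_j)) in itertools.combinations(enumerate(labels), 2):
        if dim_i == dim_j and ex_i != ex_j:      # monotrait-heteromethod r
            convergent.append(corr[i, j])
        elif dim_i != dim_j and ex_i == ex_j:    # heterotrait-monomethod r
            discriminant.append(corr[i, j])
    return float(np.mean(convergent)), float(np.mean(discriminant))

# Hypothetical example: two dimensions each rated in two exercises.
labels = [("leadership", "in-basket"), ("leadership", "role-play"),
          ("planning", "in-basket"), ("planning", "role-play")]
corr = np.array([[1.00, 0.30, 0.55, 0.20],
                 [0.30, 1.00, 0.25, 0.60],
                 [0.55, 0.25, 1.00, 0.35],
                 [0.20, 0.60, 0.35, 1.00]])
mean_convergent, mean_discriminant = mtmm_summary(corr, labels)
print(f"mean convergent r = {mean_convergent:.2f}; mean discriminant r = {mean_discriminant:.2f}")
```

In this toy matrix the mean convergent correlation is noticeably lower than the mean discriminant correlation, which is the pattern typically reported in the assessment center construct-related validity literature.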
Meta-Analytic Procedures
As previously noted, we used both convergent and discriminant validity coefficients (i.e.,
mean monotrait-heteromethod and/or heterotrait-monomethod rs, respectively) as measures
of construct-related validity—in other words, the meta-analysis used the convergent and dis-
criminant validity coefficients (rs) as the outcome statistic. Consequently, the meta-analysis
was based on the 31 (out of 48) studies that reported traditional MTMM correlation-based
data.
The participant-to-assessor ratio and number of dimensions assessed were initially coded
as continuous variables, but were converted to dichotomous variables for the meta-analysis
using a median split. We also categorized the length of training into three levels—less
than 1 day, 1–5 days, and more than 5 days of training. Although it permitted us to run
the specified analyses, the limitations associated with the coding of assessor training must
be noted. First, the variable represents whether or not assessor training was reported, not
necessarily whether training actually occurred. It is possible that training was provided and
simply not reported. Second, this coding provides no indication of the nature or content of
the training provided. It would have been preferable to code for training with respect to the
content of training, but unfortunately, very few of the studies provided sufficient detail for
such an approach.
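As a concrete illustration of this coding step, the following sketch dichotomizes two continuous study characteristics at their medians and bins length of training into the three levels described above. It is a minimal sketch in Python; the data frame, column names, and values are hypothetical, and the category boundary handling is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical study-level codings; column names and values are illustrative only.
studies = pd.DataFrame({
    "n_dimensions":  [6, 12, 9, 20, 8],
    "p_to_a_ratio":  [1.0, 2.0, 1.5, 4.0, 2.0],
    "training_days": [0.5, 2.0, 7.0, 3.0, np.nan],   # NaN = length of training not reported
})

# Dichotomize the originally continuous moderators at the median.
for col in ["n_dimensions", "p_to_a_ratio"]:
    studies[col + "_level"] = np.where(studies[col] <= studies[col].median(), "low", "high")

# Categorize length of assessor training into three levels.
studies["training_level"] = pd.cut(studies["training_days"],
                                   bins=[0, 1, 5, np.inf],
                                   labels=["<1 day", "1-5 days", ">5 days"])
print(studies)
```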
The data analyses were performed using Arthur, Bennett and Huffcutt’s (2001) SAS
PROC MEANS meta-analysis program to compute sample-weighted convergent and dis-
criminant validities for the specified levels of the methodological characteristics. Sample
weighting assigns studies with larger sample sizes more weight and reduces the effect
of sampling error since sampling error generally decreases as the sample size increases
(Hunter & Schmidt, 1990). We also computed 95% confidence intervals (CIs) for the
sample-weighted convergent and discriminant validities. CIs assess the accuracy of the
estimate of the mean validity/effect size (Whitener, 1990). CIs estimate the extent to which
sampling error remains in the sample-size-weighted validity. Thus, CI gives the range of
values that the mean validity is likely to fall within if other sets of studies were taken from
the population and used in the meta-analysis. A desirable CI is one that does not include
zero if a non-zero relationship is hypothesized.
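The sketch below illustrates the general form of such an analysis: a sample-weighted (bare-bones) meta-analysis of correlations that also reports the percentage of variance attributable to sampling error and a 95% confidence interval around the mean validity (computed here from the standard error of the mean observed validity, one common approach). It is a minimal Python illustration of standard Hunter and Schmidt (1990) style formulas, not the Arthur, Bennett and Huffcutt (2001) SAS program itself, and the input coefficients and sample sizes are hypothetical.

```python
import numpy as np

def bare_bones_meta(rs, ns, z=1.96):
    """Sample-weighted meta-analysis of correlations (one coefficient per study)."""
    rs, ns = np.asarray(rs, dtype=float), np.asarray(ns, dtype=float)
    k = len(rs)
    mean_r = np.sum(ns * rs) / np.sum(ns)                    # sample-weighted mean validity
    var_obs = np.sum(ns * (rs - mean_r) ** 2) / np.sum(ns)   # sample-weighted observed variance
    var_err = (1 - mean_r ** 2) ** 2 / (np.sum(ns) / k - 1)  # expected sampling-error variance
    pct_acc = 100 * var_err / var_obs                        # % of variance due to sampling error
    se_mean = np.sqrt(var_obs / k)                           # standard error of the mean validity
    ci = (mean_r - z * se_mean, mean_r + z * se_mean)        # 95% confidence interval
    return mean_r, np.sqrt(var_obs), pct_acc, ci

# Hypothetical convergent validities and sample sizes for a handful of studies.
mean_r, sd_r, pct, ci = bare_bones_meta([0.29, 0.43, 0.38, 0.31], [120, 260, 75, 310])
print(f"mean r = {mean_r:.2f}, SDr = {sd_r:.2f}, "
      f"% var. acc. for = {pct:.1f}, 95% CI = {ci[0]:.2f}/{ci[1]:.2f}")
```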
Results
As expected there was a great deal of variability across studies in terms of the specified
methodological and design characteristics. Specifically, the mean sample size across studies
was 269.58 (SD = 281.69; median = 159.5; mode = 75) and ranged from 29 to 1170.
The mean number of dimensions to be evaluated was 10.60 (SD = 5.11; median = 9.00;
mode = 8) and ranged from 3 to 25. It is also interesting to note that across the 48 assessment
center studies 129 different dimension labels were recorded (a listing of the dimension labels
is available from the authors). The mean number of exercises included in the assessment
centers represented was 4.78 (SD = 1.47; median = 5; mode = 4; minimum = 2;
maximum = 8). It should be noted that this number represents only situational exercises;
several of the studies also included paper-and-pencil measures of some of the
dimensions evaluated. Only 26 of the 48 studies (54%) reported information on the ratio
of participants to assessors. For these 26, the participant-to-assessor ratio ranged from 1
participant for 4 assessors to 4 participants for each assessor with a mean ratio of 1.71
(mode = 2) participants per assessor. With respect to rating approach, 17 studies reported
using an across-exercise approach in which dimensional ratings were collected after the
completion of all exercises. Twenty-nine reported using a within-exercise approach in which
dimensional ratings were collected after the completion of each exercise.
Thirty-five of the 48 studies (73%) provided information with respect to whether rater
training was included, but of these, only 22 studies (44%) reported information pertaining to
length of training. For these studies, the mean length of training was 3.36 days (SD = 3.06;
median = 2; mode = 2) ranging from 1 to 15 days. Of the 48 assessment centers presented,
40 (83%) indicated that they were used primarily for selection/promotion decisions and 8
(17%) indicated training/development as the primary purpose. Finally, 26 of the 48 studies
(54%) provided information with respect to assessor occupation; of these, 21 (81%) reported using
managers or supervisors from the same organization in which the assessment center was
being implemented and 5 (19%) reported using psychologists.
As noted above, the meta-analysis used the mean convergent and discriminant validity coefficients as the outcome statistic. Convergent validity coefficients represent the level of intercorrelation within dimensions and across exercises;
in contrast, discriminant validity coefficients represent the level of intercorrelation within
exercise and across dimensions. Thus, construct-related validity is expressed by high con-
vergent and low discriminant validity coefficients.
The results of the meta-analysis, which are presented in Table 1, indicate that the results
for rating approach (Hypothesis 3), assessor occupation (Hypothesis 4), assessor training
(Hypothesis 5), and length of assessor training (Hypothesis 6) were in the predicted direc-
tion. Specifically, for rating approach, the mean dimension convergent validity was higher
for the across-exercise approach compared to the within-exercise approach (.43 vs. .29).
In addition, dimension discriminant validity was lower for the across-exercise approach
compared to the within-exercise approach (.48 vs. .58). Likewise, for type-of-assessor,
the mean dimension convergent validity was higher for psychologists compared to man-
agers/supervisors (.45 vs. .38), and the dimension discriminant validity was lower when
ratings are provided by psychologists compared to managers/supervisors (.40 vs. .64). Similar
patterns of results were obtained for assessor training compared to no assessor training.
And excluding the single data point for more than 5 days of training, the results of the
meta-analysis also indicated that longer assessor training was associated with higher levels
of convergent validity and lower levels of discriminant validity.
Partial support was obtained for the number of dimensions (Hypothesis 1) and assess-
ment center purpose (Hypothesis 7). Specifically, fewer dimensions were associated with
higher levels of convergent validity, but contrary to our hypothesis, fewer dimensions were
also associated with a higher level of discriminant validity. Likewise, although higher, the
level of convergent validity for training/development assessment centers was not meaning-
fully higher than that for assessment centers used for selection/promotion. Furthermore,
contrary to the study hypothesis, the level of discriminant validity was higher for train-
ing/development assessment centers than selection/promotion assessment centers. Finally,
the participant-to-assessor ratio hypothesis (Hypothesis 2) was not supported. Both the con-
vergent and discriminant validity results for the participant-to-assessor ratio were opposite
to what we had hypothesized—lower participant-to-assessor ratios were associated with
lower convergent validity and higher discriminant validity. In summary, 4 of the 7 study
hypotheses were fully supported, partial support was obtained for 2, and 1 hypothesis was
not supported.
Table 1
Meta-analysis results: sample-weighted convergent and discriminant validities

Overall convergent validity: K = 31, N = 7440, mean r = .34, SDr = .11, % var. acc. for = 25.11, 95% CI = .33/.36.
Overall discriminant validity: K = 30, N = 6412, mean r = .55, SDr = .13, % var. acc. for = 13.31, 95% CI = .53/.56.

Note: K = number of convergent/discriminant validities; N = number of participants; mean r = mean of sample-weighted convergent and discriminant validities; SDr = standard deviation of sample-weighted convergent/discriminant validities; % var. acc. for = percentage of variance due to sampling error; 95% CI = lower and upper values of the 95% confidence interval. CIs estimate the extent to which sampling error remains in the sample-size-weighted mean effect size. Thus, the CI gives the range of values within which the mean effect size is likely to fall if other sets of studies were taken from the population and used in the meta-analysis. A desirable CI is one that does not include zero if a non-zero relationship is hypothesized.
Table 2
Summary of analyses used to investigate construct- and criterion-related validity evidence

Bray and Grant (1966)
Type of analysis: EFA—hierarchical factor analysis of mean ratings on 25 dimensions resulted in 11 factors for the college sample and 8 factors for the non-college sample.
Criterion-related validity evidence: Correlations between derived “factor” scores and salary progression.

Chan (1996)
Type of analysis: MTMM—mean within-dimension, cross-exercise r of .07; mean within-exercise, cross-dimension r of .71. EFA—principal components analysis with orthogonal rotation of 6 exercises × 14 dimension ratings resulted in 6 exercise factors. Nomological net—pattern of correlations between AC dimension ratings and measures of cognitive ability and personality does not support construct validity.
Criterion-related validity evidence: Mean AC rating (rxy with performance ratings = .06; with actual promotion = .59); consensus “promotability” rating (rxy with performance ratings = .25; with actual promotion = .70).

Fleenor (1996)
Type of analysis: MTMM—mean within-dimension, cross-exercise r of .22; mean within-exercise, cross-dimension r of .42. EFA—principal components analysis with orthogonal rotation of 8 exercises × 10 dimension ratings resulted in 8 exercise factors.
Criterion-related validity evidence: Mean correlation of AC dimension ratings of .10 with subordinate performance ratings, .15 with self performance ratings, and .17 with supervisor performance ratings.

Henderson et al. (1995)
Type of analysis: MTMM—mean within-dimension, cross-exercise r of .19; mean within-exercise, cross-dimension r of .42. EFA—analysis of dimension ratings with orthogonal rotation resulted in 5 exercise factors.
Criterion-related validity evidence: Job performance ratings regressed on 14 dimension scores; results indicated only 2 dimensions were significant predictors of performance.

Hinrichs (1969)
Type of analysis: EFA—principal components analysis with non-orthogonal rotation of 12 trait ratings resulting in 3 overlapping factors.
Criterion-related validity evidence: Scores based on the 3 factors were correlated with relative salary standing, overall management potential, and the overall assessment center-based evaluation; the r’s ranged from .15 to .78.

Huck and Bray (1976)
Type of analysis: EFA—principal components analysis with orthogonal rotation of 16 dimension ratings resulting in 4 factors.
Criterion-related validity evidence: Overall assessment rating (rxy with overall performance rating = .41; with rated potential for advancement = .59) for whites; overall assessment rating (rxy with overall performance rating = .35; with rated potential for advancement = .54) for blacks.

Jansen and Stoop (2001)
Type of analysis: MTMM—mean within-dimension, cross-exercise r of .28; mean within-exercise, cross-dimension r of .62.
Criterion-related validity evidence: Correlations of average salary growth with dimension scores from each exercise (mean r = .09, min. = −.02, max. = .30).

Shore et al. (1992)
Type of analysis: Nomological net—pattern of correlations between self and peer AC dimension ratings and measures of cognitive ability and personality supports construct validity.
Criterion-related validity evidence: Correlations of job advancement with peer (mean r = .20) and self (mean r = .07) AC ratings.

Note: EFA = exploratory factor analysis; CFA = confirmatory factor analysis; MTMM = multitrait-multimethod data.
With respect to the type of evidence presented for construct-related validity, across all
of the 48 studies, several approaches were indicated with many studies incorporating mul-
tiple analytic strategies. Some type of exploratory factor analysis (typically examining the
number and nature of factors underlying ratings) was used to analyze data from 26 of the
assessment centers, and 16 cases used confirmatory factor analysis (most often evaluating
some form of a MTMM model incorporating dimension and exercise latent variables). Typi-
cal MTMM correlation matrix data (i.e., monotrait-heteromethod and monomethod-heterotrait
rs) were reported for 32 of the assessment centers, while only 6 cases reported using a vari-
ance partitioning approach (i.e., ANOVA) looking at proportions of dimension and exercise
variance. Finally, five studies reported data based on the relationship of assessment cen-
ter dimension ratings with measures of other constructs (a “nomological net” approach
examining patterns of relationships relative to expectations).
Again, somewhat unexpectedly, evidence with respect to the construct-related validity of
assessment center ratings was mixed and tended to depend on the analytic strategy used.
Evidence from the 31 studies reporting traditional MTMM correlation-based data indicated a
mean within-dimension, across-exercise (i.e., monotrait-heteromethod) sample-weighted r
of .34 (SDr = .11) and a mean across-dimension, within-exercise (monomethod-heterotrait) sample-weighted r of .55 (SDr = .13).
Discussion
The purpose of the present paper was to provide a systematic re-examination of the liter-
ature on which the current view that assessment center ratings demonstrate criterion-related
but not construct-related validity is based. We argue that these findings may be attributable to
certain design and methodological features as opposed to an inherent flaw of the assessment
center as a measurement tool or method. Thus, we examine the methodological character-
istics of studies examining the construct-related validity of assessment centers. We also
argue that the prevailing view that assessment center ratings demonstrate criterion-related
validity but not construct-related validity is inconsistent with current conceptualizations of
validity and the validation process. Thus, we also examine the extent to which evidence
with respect to criterion-related validity and construct-related validity stems from the same
empirical studies.
There are two limitations with the meta-analysis that should be noted. First, to investi-
gate the effects of the methodological characteristics, each was broken down into specified
levels to run the sublevel analysis. Although there is no standard as to the minimum number
of data points required for a stable and interpretable meta-analysis, breaking down vari-
ables into sublevels sometimes results in a small number of data points that can result in
second-order sampling error (Arthur et al., 2001; Hunter & Schmidt, 1990). Consequently,
because the levels of some of our methodological characteristics had a small number of
data points, their associated results should be cautiously interpreted. Second, to permit the
sublevel analyses, variables which were originally continuous (i.e., number of dimensions,
participant-to-assessor ratio, length of assessor training) had to be categorized. For the num-
ber of dimensions and the participant-to-assessor ratio, this was accomplished by using a
median split. Because of the problems associated with this procedure, we reanalyzed these
methodological characteristics by correlating each with the convergent and discriminant
validity coefficients across studies. The results of these correlational analyses replicated
those obtained for the meta-analysis.
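For illustration, such a supplementary check amounts to little more than correlating each originally continuous characteristic with the study-level validity coefficients; the brief sketch below shows the idea in Python with hypothetical values.

```python
import numpy as np

# Hypothetical study-level values for one continuous moderator and the two outcomes.
n_dimensions   = np.array([6, 12, 9, 20, 8, 15])
convergent_r   = np.array([0.45, 0.31, 0.38, 0.22, 0.41, 0.28])
discriminant_r = np.array([0.46, 0.58, 0.52, 0.66, 0.49, 0.61])

print("r(number of dimensions, convergent validity)   =",
      round(np.corrcoef(n_dimensions, convergent_r)[0, 1], 2))
print("r(number of dimensions, discriminant validity) =",
      round(np.corrcoef(n_dimensions, discriminant_r)[0, 1], 2))
```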
Our primary goal in the present study was to provide a systematic review of the assess-
ment center literature with respect to specific design and methodological characteristics
that potentially moderate the validity of assessment center ratings. Given the results of this
review, we believe that studies that directly manipulate specific characteristics, where feasi-
ble, have a lot to contribute to further our understanding of assessment centers. Our findings
suggest a number of features and characteristics that may impact the validity of assessment
centers. We believe that future research should be directed toward systematically examining
design factors that influence the psychometric properties of assessment centers. These de-
sign factors include, but are not limited to, the number of dimensions used, characteristics
of assessors, how ratings are made (across-exercise vs. within-exercise), assessment center
purpose, and assessor training. Other potential moderators of the convergent/discriminant
validity of assessment centers that should also be examined include the use of behavior
checklists (Reilly et al., 1990), and the non-transparency of dimensions (Kleinmann, 1993;
Kleinmann, Kuptsch & Koller, 1996).
With respect to the extent to which findings pertaining to construct- and criterion-related
validity stem from a common literature, our results indicate that this evidence is largely
drawn from independent bodies of research. There is, of course, nothing inherently wrong
with this approach. Given that the lack of construct-related validity evidence in the presence
of criterion-related and content-related validity evidence is inconsistent with the unitarian
view of validity, it is conceivable that studies in which there was a lack of construct-related
validity may also have demonstrated a lack of criterion-related validity (which would be
consistent with the unitarian view). However, because the evidence is drawn from largely
independent research studies, this possibility remains a plausible one. Indeed, we found only
four studies that reported both criterion-related and construct-related validity data. And of
these, only one (Chan, 1996) reported support for criterion-related validity in the absence of
construct-related validity. For the other three (Fleenor, 1996; Henderson et al., 1995; Jansen
& Stoop, 2001), lack of construct-related validity was coupled with a lack of criterion-related
validity. Thus, although our results do not disprove the prevailing view of assessment center
validity, they do raise serious concerns about its veridicality. We believe that future research
should also be directed at providing simultaneous examinations of multiple evidential bases
of validity.
Our re-examination of the literature on which the current view, that assessment cen-
ter ratings demonstrate criterion-related (and content-related) validity evidence but not
construct-related validity evidence, is based suggests that these findings may be attributable
to certain design and methodological features as opposed to an inherent flaw of the assess-
ment center as a measurement tool. Alternately, there may be other plausible explanations
for the presence of assessment center content- and criterion-related validity in the absence of
convergent and discriminant validity. One such explanation focuses on the idea of construct
misspecification/misidentification (Raymark & Binning, 1997; Russell & Domm, 1995).
Thus, instead of measuring the targeted constructs of interest—such as team building, flexibility, and influencing others—assessment centers may unwittingly be measuring unspecified constructs such as self-monitoring or impression management (Church, 1997; Cronshaw & Ellis, 1991), or researchers may be misinterpreting the nature of “exercise” factors (Ladd et al., 2002; Lance et al., 2000). In the former case, the actual explanatory variable—self-monitoring—is a “deeper” source trait operating at a different nomological level than the assessment center constructs ostensibly being measured.
On one hand, this construct misspecification hypothesis has yet to receive extensive
empirical attention and appears to be an area worthy of future research (cf. Arthur and
Tubre (2002); see also Russell and Domm’s (1995) investigation of role congruency as
a plausible explanatory construct). On the other hand, in our opinion, a potential prob-
lem with this reconciliation or explanation for the assessment center validity paradox is
that it may have dire implications for the current use of assessment centers as training
and development interventions. Specifically, the use of assessment centers in this manner
is largely predicated on the assumption that they are indeed measuring the specified tar-
geted dimensions (e.g., team building, flexibility, influencing others) and consequently,
developmental feedback reports and interviews, and individual development plans are
all designed and developed around these dimensions. Have all of these efforts
been fundamentally misguided? Is this important use of assessment centers fundamen-
tally flawed? We think not. Although conceptually plausible, the misspecification hypoth-
esis has yet to be demonstrated empirically. Furthermore, given the data and arguments
presented in the present study, we are inclined to believe that the position that the lack
of discriminant and convergent validity is due to development and design factors is a
more parsimonious and succinct explanation for the assessment center construct validity
paradox.
In conclusion, our findings question the prevailing view that assessment center ratings
do not demonstrate construct-related validity, and instead they lead us to conclude that
as measurement tools, assessment centers are probably only as good as their development,
design, and implementation. Furthermore, we believe that the assessment center is a method,
and like any method there will be variability in its implementation. Thus, future research
needs to be directed toward systematically examining design factors that influence the
psychometric properties of assessment centers and at the simultaneous examination of
multiple evidential bases of validity.
Appendix A. Summary of Design-Related Characteristics of Studies that Investigated Convergent/Discriminant Validity for Assessment Center Ratings

For each study, the following characteristics are summarized: sample size; participant-to-assessor ratio; number of dimensions; number of exercises; within- vs. across-exercise rating method; assessor occupation; purpose; whether assessor training was reported; and length of training (in days). (Within-exercise method = rating all dimensions within an exercise before proceeding to the next exercise; across-exercise method = rating a dimension across all exercises before proceeding to the next dimension.)
References
Arthur, W., Jr., Woehr, D. J., & Maldegan, R. 2000. Convergent and discriminant validity of assessment center
dimensions: A conceptual and empirical reexamination of the assessment center construct-related validity
paradox. Journal of Management, 26: 813–835.
Binning, J. F., & Barrett, G. V. 1989. Validity of personnel decisions: A conceptual analysis of the inferential and
evidential bases. Journal of Applied Psychology, 74: 478–494.
Born, M. P., Kolk, N. J., & van der Flier, H. 2000. A meta-analytic study of assessment center construct validity.
Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology,
New Orleans, LA.
Brannick, M. T., Michaels, C. E., & Baker, D. P. 1989. Construct validity of in-basket scores. Journal of Applied
Psychology, 74: 957–963.
∗ Bray, D. W., & Grant, D. L. 1966. The assessment center in the measurement of potential for business management.
Psychological Monographs, 80 (17, Whole No. 625).
Donahue, L. M., Truxillo, D. M., Cornwell, J. M., & Gerrity, M. J. 1997. Assessment center construct validity
and behavioral checklists: Some additional findings. Journal of Social Behavior and Personality, 12: 85–108.
∗ Fleenor, J. W. 1996. Constructs and developmental assessment centers: Further troubling empirical findings.
Journal of Business and Psychology, 10: 319–335.
∗ Henderson, F., Anderson, A., & Rick, S. 1995. Future competency profiling: Validating and redesigning the ICL
graduate assessment centre. Personnel Review, 24: 19–31.
∗ Highhouse, S., & Harris, M. M. 1993. The measurement of assessment center situations: Bem’s template matching
technique for examining exercise similarity. Journal of Applied Social Psychology, 23: 140–155.
∗ Hinrichs, J. R. 1969. Comparison of “real life” assessments of management potential with situational exercises,
paper-and-pencil ability tests, and personality inventories. Journal of Applied Psychology, 53: 425–432.
Howard, A. 1997. A reassessment of assessment centers: Challenges for the 21st century. Journal of Social
Behavior and Personality, 12: 13–52.
Huck, J. R., & Bray, D. W. 1976. Management assessment center evaluations and subsequent job performance of
white and black females. Personnel Psychology, 29: 13–30.
Hunter, J. E., & Schmidt, F. L. 1990. Methods of meta-analysis: Correcting error and bias in research findings.
Newbury Park, CA: Sage.
Ilgen, D. R., Barnes-Farrell, J. L., & McKellin, D. B. 1993. Performance appraisal process research in the 1980s:
What has it contributed to appraisals in use? Organizational Behavior and Human Decision Processes, 54:
321–368.
∗ Jansen, P. G. W., & Stoop, B. A. M. 2001. The dynamics of assessment center validity: Results of a 7-year study.
Journal of Applied Psychology, 86: 741–753.
Kudisch, J. D., Ladd, R. T., & Dobbins, G. H. 1997. New evidence on the construct validity of diagnostic
assessment centers: The findings may not be so troubling after all. Journal of Social Behavior and Personality,
12: 129–144.
Kleinmann, M. 1993. Are rating dimensions in assessment centers transparent for participants? Consequences for
criterion and construct validity. Journal of Applied Psychology, 78: 988–993.
Kleinmann, M., Kuptsch, C., & Koller, O. 1996. Transparency: A necessary requirement for the construct validity
of assessment centers. Applied Psychology: An International Review, 45: 67–84.
Klimoski, R., & Brickner, M. 1987. Why do assessment centers work? The puzzle of assessment center validity.
Personnel Psychology, 40: 243–259.
Ladd, R. T., Atchley, E. K., Gniatczyk, L. A., & Baumann, L. B. 2002. An evaluation of the construct validity of an
assessment center using multiple-regression importance analysis. Paper presented at the 17th annual meeting
of the Society for Industrial/Organizational Psychology, Toronto, Canada, April 2002.
Lance, C. E., Newbolt, W. H., Gatewood, R. D., Foster, M. S., French, N. R., & Smith, D. E. 2000. Assessment
center exercise factors represent cross-situational specificity, not method bias. Human Performance, 13: 323–
353.
Landy, F. J. 1986. Stamp collecting versus science: Validation as hypothesis testing. American Psychologist, 41:
1181–1192.
Landy, F. J., & Farr, J. L. 1980. Performance rating. Psychological Bulletin, 87: 72–107.
Lawshe, C. H. 1985. Inferences from personnel tests and their validity. Journal of Applied Psychology, 70: 237–238.
Lievens, F. 1998. Factors which improve the construct validity of assessment centers: A review. International
Journal of Selection and Assessment, 6: 141–152.
Lievens, F. 2001. Assessors and use of assessment center dimensions: A fresh look at a troubling issue. Journal
of Organizational Behavior, 22: 203–221.
Lowry, P. E. 1997. The assessment center process: New directions. Journal of Social Behavior & Personality, 12:
53–62.
Martell, R. F. 1991. Sex bias at work: The effects of attentional and memory demands on performance ratings of
men and women. Journal of Applied Social Psychology, 21: 1939–1960.
Messick, S. J. 1989. Validity. In R. L. Linn (Ed.), Educational measurement: 13–103. New York: Macmillan.
Messick, S. J. 1995. The validity of psychological assessment: Validation of inferences from persons’ responses
and performances as scientific inquiry into score meaning. American Psychologist, 50: 741–749.
Messick, S. J. 1998. Alternative modes of assessment, uniform standards of validity. In M. D. Hakel (Ed.), Beyond
multiple choice: Evaluating alternatives to traditional testing for selection: 59–74. Mahwah, NJ: Lawrence
Erlbaum.
Miller, G. A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing
information. Psychological Review, 63: 81–97.
∗ Nedig, R. D., Martin, J. C., & Yates, R. E. 1979. The contribution of exercise skill ratings to final assessment
Robertson, I. T., Gratton, L., & Sharpley, D. 1987. The psychometric properties and design of managerial
assessment centres: Dimensions into exercises won’t go. Journal of Occupational Psychology, 60: 187–195.
Robie, C., Adams, K. A., Osburn, H. G., Morris, M. A., & Etchegaray, J. M. 2000. Effects of the rating process
on the construct validity of assessment center dimension evaluations. Human Performance, 13: 355–370.
∗ Russell, C. J. 1985. Individual decision processes in an assessment center. Journal of Applied Psychology, 70:
737–746.
∗ Russell, C. J. 1987. Person characteristics vs. role congruency explanations for assessment center ratings. Academy
of Management Journal, 30: 817–826.
Sackett, P. R., & Hakel, M. D. 1979. Temporal stability and individual differences in using assessment
information to form overall ratings. Organizational Behavior and Human Performance, 23: 120–137.
∗ Sackett, P. R., & Harris, M. M. 1988. A further examination of the constructs underlying assessment center
∗ Thornton, G. C., III, Tziner, A., Dahan, M., Clevenger, J. P., & Meir, E. 1997. Construct validity of assessment
center judgments: Analysis of the behavioral reporting method. Journal of Social Behavior and Personality,
12: 109–128.
∗ Turnage, J. J., & Muchinsky, P. M. 1982. Transsituational variability in human performance within assessment
centers. Organizational Behavior and Human Performance, 30: 174–200.
Winfred Arthur Jr. is currently a Professor of Psychology and Management at Texas A&M
University. He received his Ph.D. in Industrial/Organizational Psychology from the Uni-
versity of Akron in 1988. His research interests are in the areas of personnel psychology,
testing, selection, and validation, human performance, team selection and training, train-
ing development, design, delivery, and evaluation, human performance and complex skill
acquisition and retention, models of job performance, and meta-analysis.