Standardized (structured) interviews with clear scoring instructions will be more reliable than unstructured interviews. The reason is that structured interviews reduce both information variance and criterion variance.

Information variance refers to the variation in the questions that clinicians ask, the observations that are made during the interview, and the method of integrating the information that is obtained.

Criterion variance refers to the variation in scoring thresholds among clinicians.

Table 6-5 presents a hypothetical data set from a study assessing the reliability of alcoholism diagnoses derived from a structured interview.

[Table 6-5: Agreement between Rater 1 and Rater 2 on the presence/absence of an alcoholism diagnosis in 100 patients. Both raters judged the diagnosis Present in 30 cases and Absent in 60 cases; the raters disagreed in the remaining 10 cases. Reliability (kappa) = .78.]

This example assesses interrater reliability (the level of agreement between two raters), but the calculations would be the same if one wanted to assess test–retest reliability. In that case, the data for Rater 2 would be replaced by data for Testing 2 (Retest). As can be seen, the two raters evaluated the same 100 patients for the presence/absence of an alcoholism diagnosis, using a structured interview. These two raters agreed in 90% of the cases [(30 + 60)/100]. Agreement here refers to coming to the same conclusion: not just agreeing that the diagnosis is present but also that the diagnosis is absent. The table also presents the calculation for kappa, a chance-corrected index of agreement that is typically lower than overall agreement. The reason for this lower value is that raters will agree on the basis of chance alone in situations where the prevalence rate for a diagnosis is relatively high or relatively low. In the example shown in Table 6-5, the diagnosis of alcoholism is relatively infrequent. Therefore, a rater who always judged the disorder to be absent would be correct (and likely to agree with the other rater) in many cases. The kappa coefficient takes such instances of chance agreement into account and adjusts the agreement index downward accordingly. In general, a kappa value between .75 and 1.00 is considered to reflect excellent interrater agreement beyond chance.
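To make the chance correction concrete, here is a minimal Python sketch that computes overall agreement and Cohen's kappa, defined as kappa = (p_o − p_e) / (1 − p_e), from a 2 × 2 agreement table like Table 6-5. The diagonal counts (30 and 60) come from the example above; the even 5/5 split of the 10 disagreements is an assumption for illustration (with these marginal totals, any split of the 10 disagreements yields a kappa of about .78).

```python
# Cohen's kappa for the hypothetical Table 6-5 data.
# Diagonal cells (30 and 60) come from the text; the 5/5 split of the
# 10 disagreements is assumed here for illustration.

def percent_agreement(table):
    """Observed agreement p_o: proportion of cases where the raters agree."""
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return agree / total

def cohens_kappa(table):
    """Kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement
    expected by chance from each rater's marginal (base-rate) totals."""
    total = sum(sum(row) for row in table)
    p_o = percent_agreement(table)
    # Chance agreement: product of row and column marginal proportions,
    # summed over the diagnostic categories.
    p_e = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(len(table))
    )
    return (p_o - p_e) / (1 - p_e)

# Rows = Rater 1 (Present, Absent); columns = Rater 2 (Present, Absent).
table = [[30, 5],
         [5, 60]]

print(f"Overall agreement: {percent_agreement(table):.2f}")  # 0.90
print(f"Kappa:             {cohens_kappa(table):.2f}")       # 0.78
```

Note how the low base rate of the diagnosis inflates chance agreement (p_e = .545 here), which is exactly why kappa (.78) falls below the raw 90% agreement figure.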
Validity

The validity of any type of psychological measure can take many forms. Convergent validity refers to the interview's ability to correlate with measures that are theoretically related to the construct being measured; an interview demonstrates convergent validity, for example, to the extent that its scores correlate with measures of peer rejection and aggressive behavior.

Discriminant validity refers to the interview's ability not to correlate with measures that are not theoretically related to the construct being measured. For example, there is no theoretical reason a specific phobia (e.g., of heights) should be correlated with level of intelligence. Therefore, a demonstration that the two measures are not significantly correlated would indicate the specific phobia interview's discriminant validity.

Construct validity refers to all of these aspects of validity: the extent to which interview scores are correlated with other measures or behaviors in a logical and theoretically consistent way. Demonstrating construct validity will involve a demonstration of both convergent and discriminant validity.

Predictive validity, a form of criterion-related validity, is the extent to which interview scores correlate with scores on other relevant measures administered at some point in the future.
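In practice, convergent and discriminant validity are both demonstrated with correlations. The Python sketch below is purely illustrative: interview scores are simulated to correlate with one theoretically related measure and to be independent of an unrelated one (standing in for something like level of intelligence); all variable names and data are hypothetical.

```python
# Hypothetical illustration of convergent vs. discriminant validity:
# interview scores should correlate with a theoretically related
# measure and not with an unrelated one. All data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 100

interview = rng.normal(size=n)                   # structured-interview scores
related = 0.6 * interview + rng.normal(size=n)   # theoretically related measure
unrelated = rng.normal(size=n)                   # unrelated measure (e.g., IQ)

r_convergent = np.corrcoef(interview, related)[0, 1]
r_discriminant = np.corrcoef(interview, unrelated)[0, 1]

print(f"Convergent r:   {r_convergent:.2f}")    # substantial correlation expected
print(f"Discriminant r: {r_discriminant:.2f}")  # near-zero correlation expected
```

A pattern of substantial correlations with related measures and negligible correlations with unrelated ones is the kind of evidence that supports the interview's construct validity.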