A Thin Slice Perspective on the Accuracy
Author's personal copy
Abstract
The accuracy of first impressions was examined by investigating judged construct (negative affect,
positive affect, the Big Five personality variables, intelligence), exposure time (5, 20, 45, 60, and 300 s),
and slice location (beginning, middle, end). Three hundred and thirty-four judges rated 30 targets.
Accuracy was defined as the correlation between a judge’s ratings and the target’s criterion scores on
the same construct. Negative affect, extraversion, conscientiousness, and intelligence were judged
moderately well after 5-s exposures; however, positive affect, neuroticism, openness, and
agreeableness required more exposure time to achieve similar levels of accuracy. Overall, accuracy
increased with exposure time, judgments based on later segments of the 5-min interactions were more
accurate, and 60 s yielded the optimal ratio between accuracy and slice length. Results suggest that
accuracy of first impressions depends on the type of judgment made, amount of exposure, and temporal
location of the slice of judged social behavior.
© 2007 Elsevier Inc. All rights reserved.
1. Introduction
0092-6566/$ - see front matter 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.jrp.2007.01.004
ago that people make broad generalizations about personality based on limited exposure
to others. The ubiquity of personality judgments derived from limited information and the
social consequences of these judgments make this an important topic of inquiry. The term
“thin slices” has been used to describe short excerpts of social behavior from which
perceivers can draw inferences about states, traits, and other personally-relevant
characteristics (Ambady & Rosenthal, 1992; Ambady, Bernieri, & Richeson, 2000) and is an approach
that is well suited for studying the accuracy of first impressions. In the current study, we
use the thin slice approach to study when first impressions are right and wrong, and
examine the amount of exposure that judges need to increase the likelihood of producing
an accurate judgment.
Considerable research has addressed the degree to which inferences based on thin slices
are accurate, or can predict other meaningful attributes of the stimulus person. Nevertheless,
significant gaps in knowledge remain in the thin slices literature. In this study, we
address four issues that have yet to be fully studied. (a) Previous empirical studies have not
fully investigated the impact of slice “thickness” (i.e., length) on accuracy across a range of
constructs. (b) Little is known about how the location of a slice within a behavioral stream
might influence judges’ accuracy about target individuals. For example, slices of social
behavior taken from the start of an interaction may be less informative about a target’s
personality than slices taken later in the interaction, as individuals
show more revealing sides of themselves. (c) It is not known to what extent accuracy may
differ according to the construct being judged. Thus far, no single research study has
investigated differences in accuracy for emotions, personality, and cognitive ability.
Furthermore, research has not yet investigated whether the effects of slice length and location
generalize across different types of judged constructs. (d) Finally, the stimuli presented to
judges for evaluation have been highly variable from study to study (e.g., naturalistic
interactions vs. posed expressions; college students vs. community members; get-acquainted
conversations vs. people reading a prepared script, etc.). The list of possible moderators
could be extended yet further to include, for example, the channel of exposure (e.g., face vs.
voice, Rosenthal, Hall, Di Matteo, Rogers, & Archer, 1979). Thus, efforts to address the
foregoing questions by comparing results between studies have been problematic. The
present study addresses these shortcomings by systematically varying slice length, slice
location within the behavioral stream, and the constructs judged, all within the situational
context of two college students participating in a getting acquainted interaction.
We will examine how the thickness of a behavioral slice affects accuracy and/or prediction.
The realistic accuracy model (RAM; Funder, 2001) and the weighted average model
(WAM; Kenny, 1994) both describe when and how accuracy is achieved in person perception.
These models agree that judgmental accuracy should increase as the amount of available
information increases, suggesting that accuracy should be greater for “thicker” slices.
However, research does not always support this hypothesis, which we discuss in more
detail below. In the discussion that follows, we describe the different methods researchers
have used that may account for inconsistent findings, and then we will introduce the
methodological approach to be used in the current study.
level of extraversion of your new acquaintance. After some period of observation and
interaction, you decide that she is highly extraverted. During a subsequent conversation,
she tells you that she views herself to be utterly extraverted. The two of you agree she is
highly extraverted and, despite the possibility that both of you are wrong, chances are good
that she really is an extravert. This scenario is about agreement, which we define as the
match between a thin-slice judgment and a criterion where both are measured on the same
content (e.g., judge and new acquaintance rating of extraversion). For example, researchers
have found that judgments of sexual orientation based on 1- and 10-s slices agreed with
targets’ actual sexual orientation (Ambady, Hallahan, & Conner, 1999); judgments of
intelligence based on thin slices agreed with targets’ actual IQ scores (Murphy, Hall, & Colvin,
2003; Reynolds & Gifford, 2001); and judgments of personality traits based on thin slices
agreed with targets’ self-rated personality traits (Borkenau & Liebler, 1995; Lippa & Dietz,
2000). Moreover, many studies have found that judges’ evaluations of emotion from thin
slices of behavior exhibit agreement with criteria for those states (Nowicki & Duke, 1994;
Rosenthal et al., 1979).
Returning to our example above, now assume that you decide your new acquaintance is
highly extraverted and you use this evaluation to predict that she will be an excellent
lecturer and receive positive student evaluations at the end of the semester. In this example, a
personality judgment is used to predict a future behavior. This is predictive validity, and in
this context it refers to the relation between ratings of thin slices and characteristics or
outcomes of the target persons that are different in content from the rating. For example,
Ambady and Rosenthal (1993) showed that thin-slice judgments of enthusiasm, attentiveness,
and warmth (among others) predicted semester-end teacher evaluations. Many of the
studies in Ambady and Rosenthal’s (1992) meta-analysis similarly demonstrate thin-slice
predictive validity. There is no theoretical requirement that the impact of slice length or
location should be the same for studies that utilize agreement versus predictive validity.
A second source of possible discrepancy in results stems from differences, and possibly
lack of clarity, in how the term “accuracy” is defined and discussed (for a review, see
Funder, 1995). To make a claim for accuracy, a researcher performs a statistical test
against a null or chance value (what that value is depends on methodological factors;
Hall, Bernieri, & Carney, 2005). If the test is significant, one might conclude that the
thin-slice judgment exhibits accuracy. But what does this mean? To find that a coefficient
reflecting average agreement or predictive validity exceeds chance may not be saying
very much. Such a coefficient can be significantly better than chance even if it is
actually small in an absolute sense. For example, Zebrowitz, Hall, Murphy, and Rhodes
(2002) found in a meta-analysis of published research that intelligence was accurately
inferred from facial expressions, but the average of the individual perceivers’ correlations
between inference and criterion was an arguably modest r = .19. Similarly, other
research has shown that accuracy of individual perceivers’ judgments of rapport
between two people having a conversation (calculated as in the preceding study) was
significant but also only r = .19 (Bernieri, Gillis, Davis, & Grahe, 1996). Although the
magnitudes of these correlations are considered to be modest to moderate on the accuracy
continuum, the sweeping conclusion that there is “accuracy” may create a misleading
impression if the distinction between significance test and magnitude is not made
clear.¹ Furthermore, broad statements about accuracy that do not distinguish larger
from smaller effects are also problematic.
1.4. Context
Different studies have used thin-slice samples from a wide range of settings, tasks, and
populations such as standardized interviews about a movie (Carney, 2004); office spaces
and bedrooms (Gosling, Ko, Mannarelli, & Morris, 2002); university employees (Schmid
Mast & Hall, 2004); college students having a competitive discussion (Bernieri et al., 1996);
community-dwelling adults reading a weather report (Borkenau & Liebler, 1993); college
men role-playing being a television announcer (Lippa & Dietz, 2000); or college students
getting acquainted or talking with a close friend (Vogt & Colvin, 2003). Such variation
contributes to the ability to generalize if the results converge across studies. However,
comparisons between studies can be confounded by these contextual differences. In the present
study we held context constant by basing all analyses on one set of expressors in one
context: college students in an opposite-gender get-acquainted situation.
1
Such comparison is made doubly difficult by the fact that different studies report results that used
incompatible metrics, for example in terms of mean percentage accuracy versus the correlation between
ratings and a criterion. Within studies that use the correlation between ratings and a criterion as the
operational definition of accuracy, the magnitude of results can further depend on whether accuracy was
calculated as the correlation between judgments and criteria across items versus across stimuli, and if it
is the latter, whether accuracy was calculated per individual perceiver and then averaged across
perceivers or whether the ratings were averaged across perceivers and then correlated with the criterion
(see Hall et al., 2005, for a discussion).
2
Because accuracy of judging affect and accuracy of judging personality traits are typically scored using
different metrics (percent accuracy vs. profile correlation, respectively), even studies that include both
do not allow a direct comparison (e.g., Realo et al., 2003).
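
The footnoted distinction between averaging per-perceiver accuracy correlations and correlating perceiver-averaged ratings with the criterion can be made concrete with a small sketch. The ratings and criterion values below are hypothetical toy numbers (not data from any study cited here), and the helper name `pearson_r` is ours. For simplicity the per-perceiver correlations are averaged directly rather than in Fisher-z units.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical criterion scores for five targets (illustrative only).
criterion = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical ratings: one row per perceiver, one column per target.
ratings = [
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
    [3, 2, 1, 4, 5],
]

# Method A: one accuracy correlation per perceiver, then average them.
per_perceiver_r = [pearson_r(row, criterion) for row in ratings]
mean_individual_r = mean(per_perceiver_r)

# Method B: average the ratings across perceivers, then correlate once.
consensus = [mean(column) for column in zip(*ratings)]
consensus_r = pearson_r(consensus, criterion)

print(round(mean_individual_r, 3), round(consensus_r, 3))
```

In this toy case the consensus-based coefficient is larger than the mean of the individual coefficients, because averaging ratings across perceivers cancels some idiosyncratic error. This is one reason the two metrics should not be compared directly across studies.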
Research is mixed on whether slice length is related to accuracy. It was found that
observers’ accuracy in judging targets’ personality increased with the number and variety
of targets’ videotaped behavioral contexts, such as introducing oneself versus telling a joke
versus solving a logical problem versus telling a dramatic story (Borkenau, Mauer,
Riemann, Spinath, & Angleitner, 2004). Other research has shown that across 100 personality
items, there was a statistically significant linear increase in agreement between observers’
and targets’ ratings as a function of exposure time. In this latter study, accuracy increased
from r = .22 when judgments were based on 5–10 min to r = .26 when judgments were based
on 25–30 min (Blackman & Funder, 1998). Ambady et al. (1999) found that the accuracy of
judgments of sexual orientation increased from r = .35 at 1-s exposures to r = .52 at 10-s exposures.
Rosenthal et al. (1979) reported on a version of the Profile of Nonverbal Sensitivity (PONS
test) in which the exposures to videotaped face and body cues were 1/24, 3/24, 9/24, and
27/24 s. Accuracy was significantly greater than chance even at 1/24 s, but increased
dramatically after that (with not much change across the longer exposure lengths). Some
researchers have found a linear trend in accuracy for judging basic facial emotions across exposure
lengths of 1/15, 2/15, and 3/15 s (Matsumoto et al., 2000).
On the other hand, the meta-analysis of Ambady and Rosenthal (1992) found that
across 38 studies of thin-slice accuracy, there was no linear increase in correlations from
slices of under 30 s to slices of 300 s in length. Moreover, Ambady and Rosenthal (1993)
showed that prediction of end-of-semester teaching evaluations did not vary when slices of
2 s versus 5 s were compared (prediction based on 10 s was stronger but not statistically
significantly so). Bernieri and Gillis (2001) compared accuracy of judging rapport for slices
varying from 5 s to 60 min and found only minimal increases as more information was
made available to perceivers.
It appears that there is little consensus on whether exposure length makes a difference.
Again, due to methodological differences, comparisons between studies that examine different
slice lengths are problematic. For example, it would be inadvisable to compare accuracy for
2-s slices of affect conveyed in the voice (Rosenthal et al., 1979) to accuracy for 5-min slices of
personality conveyed in full video (Vogt & Colvin, 2003). In reality, slice length may matter
only under some circumstances, for some constructs, or within a specific range of slice
lengths. It is also important to reiterate that finding accuracy at very short exposures is not
inherently incompatible with finding an effect of slice length. Furthermore, there may be a
linear effect up to a point and no evident gains beyond that, or there may be threshold effects.
Clearly, much more research is needed on the question of accuracy as a function of slice
length. In the present study we examined five slice lengths: 5, 20, 45, 60, and 300 s.
Very little is known about where in the behavioral stream thin slices are most diag-
nostic. It has been argued that accuracy is enhanced when judges are exposed to “good”
information (i.e., information derived from contexts in which individuals freely express
their underlying personality characteristics) as opposed to less valid information
(Funder, 1995, 2001). In addition, Funder noted that “the quality dimension is just
beginning to receive its due attention, but already it seems clear that some kinds of
acquaintanceship and contexts of observation are more informative than others”
(Funder, 2001, p. 133).
We will examine whether the location within the behavioral stream from which a
slice is excerpted (first min, middle/third min, and fifth/last min) is related to accuracy.
We reasoned that when strangers “get to know each other” during a 5-min interaction,
as in the current study, information from the beginning of the behavioral stream
may be colored by awkwardness as the two strangers settle in. Then, as the strangers begin to
feel more comfortable with their environment and with each other, the information
may be optimal for making accurate assessments because, presumably, they are acting
more consistently with who they really are. We predicted
that judges’ accuracy would be highest in the third and fifth minute and lowest in the
first.
would be more accurate at detecting negative than positive affect after extremely brief
exposure.
Accuracy research often demonstrates that female judges are more accurate than male
judges, particularly when judging emotions (see meta-analyses by Hall, 1978, 1984; and
McClure, 2000). Women’s greater accuracy has also been observed for judgments of per-
sonality traits and intelligence (Ambady, Hallahan, & Rosenthal, 1995; Lippa & Dietz,
2000; Murphy et al., 2003; Vogt & Colvin, 2003). Overall, past research suggests that
women may be more accurate than men on all the constructs we assessed in the current
study.
Table 1
Ratings and corresponding criteria for eight judged constructs
Construct | Items that targets were judged on | α | Target information (criterion measures) | α
Positive affect | Active, alert, attentive, determined, enthusiastic, inspired, interested, proud, strong (PANAS; Watson et al., 1988) | .90 | Emotive behavioral Q-sort items | .77
Negative affect | Distressed, upset, afraid, jittery, nervous, guilty, scared, hostile, irritable (PANAS; Watson et al., 1988) | .89 | Emotive behavioral Q-sort items | .83
Neuroticism | Nervous, moody, fearful, self-pitying | .79 | NEO-Neuroticism (average of self, friends, and parents) | .87
Extraversion | Talkative, energetic, outgoing, dominant | .86 | NEO-Extraversion (average of self, friends, and parents) | .85
Openness | Wide interests, intelligent, insightful, curious | .79 | NEO-Openness (average of self, friends, and parents) | .84
Agreeableness | Sympathetic, kind, trusting, pleasant | .80 | NEO-Agreeableness (average of self, friends, and parents) | .92
Conscientiousness | Dependable, conscientious, precise, practical | .79 | NEO-Conscientiousness (average of self, friends, and parents) | .92
Intelligence | Estimated IQ | N/A | Wonderlic score (performance measure) | .89
Note: No reliability coefficient for intelligence ratings was computed because there was only one item.
α, coefficient alpha.
2. Method
2.1.3. Personality
The Big Five factors of neuroticism, extraversion, openness, agreeableness, and consci-
entiousness were calculated for each target by using an average of self report, peer report,
and parent reports on the NEO-PI-R (Costa & McCrae, 1992; peer and parent reports are
further described in Vogt & Colvin, 2003).
2.1.4. Intelligence
Intelligence was measured with the Wonderlic personnel test (Wonderlic, 1984), a 12-
min 50-item measure that is highly correlated with established IQ tests (Dodrill, 1983).
4
Across the three groups of targets, accuracy of participants’ responses did not differ and the pattern of
relationships was similar. Therefore, all reported results are based on data collapsed across all targets.
Accuracy was calculated for each judge, for each judged construct, across targets using
profile correlations (Carney, 2004; Hall et al., 2005; Hall & Carter, 1999; Lippa & Dietz,
2000; Tickle-Degnen & Lyons, 2004; Vogt & Colvin, 2003).⁵ After each judge’s accuracy
score for each construct was calculated, the profile correlations (accuracy scores) were
transformed into Fisher’s-z coefficients before any descriptive or inferential statistics were
computed. All analyses were based on the Fisher’s-z transformed accuracy scores and
results were converted back into r for presentation.⁶
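
As a rough illustration of this scoring procedure, the sketch below computes one judge's accuracy for one construct: item ratings are averaged per target, the composite is correlated with the criterion across targets, and the correlation is converted to Fisher's z. Mean accuracy across judges is then obtained by averaging in z units and converting back to r. All data and function names are hypothetical; this is a minimal sketch of the described steps, not the authors' analysis code.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def judge_accuracy(item_ratings, criterion):
    """Return (r, Fisher z) for one judge on one construct.

    item_ratings: one list of item ratings per target (e.g. four items).
    criterion:    one criterion score per target, in the same order.
    """
    composite = [mean(items) for items in item_ratings]
    r = pearson_r(composite, criterion)
    return r, math.atanh(r)  # Fisher's z is the inverse hyperbolic tangent of r


def mean_accuracy(z_scores):
    """Average accuracy in z units, converted back to r for presentation."""
    return math.tanh(mean(z_scores))


# Hypothetical judge: four item ratings for each of five targets.
item_ratings = [
    [1, 2, 1, 2],
    [2, 3, 2, 3],
    [4, 4, 3, 3],
    [3, 3, 4, 4],
    [5, 4, 5, 4],
]
criterion = [10, 20, 30, 40, 50]  # hypothetical criterion scores

r, z = judge_accuracy(item_ratings, criterion)
print(round(r, 3), round(z, 3))
```

Because the z transformation stretches large correlations, a mean computed in z units and converted back is slightly larger than the plain mean of the rs; for example, `mean_accuracy([math.atanh(0.2), math.atanh(0.4)])` is a little above .30.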
3. Results
3.1. Overall effects for constructs, slice length, and slice location
5
For example, judge 1 rated targets 1–10 on four extraversion items. Judge 1’s four ratings were then averaged
for each target. Then, judge 1’s averaged ratings of extraversion for targets 1–10 were correlated with targets
1–10’s criterion extraversion scores (derived from an average of self, friend, and parent’s NEO on the target).
This correlation is a profile, or accuracy, correlation indicating how accurate judge 1 is at making
assessments of the rated targets’ extraversion.
6
Correlation coefficients are not normally distributed; thus, transforming them into Fisher-z coefficients, which
are approximately normally distributed, avoids violating the normality assumption of ANOVA and other
statistical tests based on the general linear model.
Table 2
Mean accuracy for different constructs and slice lengths
Construct | Overall | 5 s | 20 s | 45 s | 60 s | 300 s | Linearity test t
*** *** *** *** ***
Positive affect | .20 | .06 | .28 | .25 | .20 | .26 | 1.29
Negative affect | .32*** | .31*** | .35*** | .31*** | .33*** | .28*** | −.51
Neuroticism | .21*** | .14* | .19*** | .25*** | .22*** | .29*** | 1.46
Extraversion | .42*** | .22*** | .41*** | .46*** | .52*** | .55*** | 2.73**
Openness | .17*** | .10*** | .22*** | .20*** | .16*** | .21** | .74
Agreeableness | .11*** | .04 | .09* | .12** | .17*** | .21** | 1.63+
Conscientiousness | .28*** | .21*** | .26*** | .28*** | .34*** | .39*** | 2.06*
Intelligence | .22*** | .24*** | .20*** | .22*** | .24*** | .21** | −.22
Overall (across construct) | .25*** | .17*** | .25*** | .26*** | .28*** | .31*** | 4.74***
Note: Mean accuracy values were calculated using Fisher-z transformed accuracy correlations and were
transformed back into r for presentation. One-sample t-tests performed on the Fisher-z transformed accuracy
scores were used to test correlations against zero.
Ns in each exposure-length condition (in order according to the table) are: 82, 74, 73, 77, and 24. Linear
weights were proportional to the slice length (−81, −66, −41, −26, and +214).
* p < .05.
** p < .01.
*** p < .001.
+ p < .10.
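
The linear weights reported in the note to Table 2 can be reproduced by centering the five slice lengths on their mean (86 s), which is one standard way to build contrast weights proportional to an unequally spaced factor. The sketch below shows that step and, purely as an illustration, applies the weights to the "Overall (across construct)" row of Table 2 after converting each r to Fisher's z. The function name is ours, and the resulting contrast value is not a test statistic (the reported t also depends on the error term and the condition Ns).

```python
import math

slice_lengths = [5, 20, 45, 60, 300]  # seconds, as in the study


def linear_contrast_weights(levels):
    """Center the levels on their mean so the weights sum to zero."""
    centre = sum(levels) / len(levels)
    return [level - centre for level in levels]


weights = linear_contrast_weights(slice_lengths)
print(weights)  # [-81.0, -66.0, -41.0, -26.0, 214.0]

# Illustration only: overall mean accuracy per slice length (Table 2, last row).
overall_r = [0.17, 0.25, 0.26, 0.28, 0.31]
overall_z = [math.atanh(r) for r in overall_r]

# A positive contrast value indicates accuracy rising with slice length.
contrast_value = sum(w * z for w, z in zip(weights, overall_z))
print(contrast_value > 0)
```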
Table 3
Significance values for paired-samples comparisons between different constructs
Construct | PA (M r = .20) | NA (M r = .32) | N (M r = .21) | E (M r = .42) | O (M r = .17) | A (M r = .11) | C (M r = .28) | IQ (M r = .22)
PA |  | .001 | .82 | .001 | .18 | .01 | .001 | .36
NA |  |  | .001 | .001 | .001 | .001 | .112 | .001
N |  |  |  | .001 | .30 | .001 | .01 | .54
E |  |  |  |  | .001 | .001 | .001 | .001
O |  |  |  |  |  | .09 | .001 | .08
A |  |  |  |  |  |  | .001 | .001
C |  |  |  |  |  |  |  | .01
Note: PA, positive affect; NA, negative affect; N, neuroticism; E, extraversion; O, openness; A, agreeableness; C,
conscientiousness; IQ, intelligence quotient.
constructs, there was a statistically significant linear effect of slice length (bottom of Table
2). These results suggest that overall, accuracy increases with exposure length.
It is important, of course, to understand the effects of slice length and location for
individual constructs. In the sections that follow, the eight judged constructs are grouped into
affect, personality, and cognitive ability.
Table 4
Mean accuracy at different slice locations
Construct | First minute | Third minute | Fifth minute
Positive affect | .11a | .21ab | .26b
Negative affect | .24a | .31a | .42b
Neuroticism | .19a | .18a | .22a
Extraversion | .29a | .51b | .41c
Openness | .16ac | .23ab | .12c
Agreeableness | .08a | .12a | .12a
Conscientiousness | .20a | .28ab | .34b
Intelligence | .11a | .30b | .26b
Note: Mean accuracy values were calculated using Fisher-z transformed accuracy correlations and were
transformed back into r for presentation. Within a row, accuracy values sharing subscripts are not
significantly different whereas values with different subscripts are (p < .05).
other slice lengths were not different from each other (p > .57). There was no linear effect of
slice length on accuracy (see last column in Table 2).
To examine the effects of slice location on positive affect, a 4 (slice length) × 3 (slice
location) between-participants ANOVA was used. There was a main effect of slice location,
F(2, 326) = 3.75, p < .02 (Table 4). Accuracy was greatest when judgments were based on
the third or fifth minute (the comparison of first with third min was not significant). To
determine whether slice location mattered for each of the slice lengths, a series of one-way
ANOVAs was conducted on the slice locations for each slice length. A statistically
significant one-way ANOVA would indicate that slice location moderated accuracy for a
particular slice length. Table 5 displays the mean accuracy achieved at each slice location
separately for each slice length. The rightmost column gives the one-way ANOVA
indicating whether slice location mattered for each particular slice length. Rows 1 through
4 of Table 5 suggest that slice location matters at all slice lengths, most strongly
for 5- and 60-s exposures.
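
For readers who want to see what each of these one-way ANOVAs computes, the sketch below derives the F ratio and its degrees of freedom from three groups of scores, one per slice location. The scores are hypothetical toy values standing in for judges' Fisher-z accuracy scores, and the function name is ours. With three locations the numerator has 2 degrees of freedom, and the denominator has the number of judges minus 3, which is consistent with, for example, F(2, 79) in the 5-s condition with 82 judges.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical accuracy scores for judges who saw the first, third,
# or fifth minute (toy values, not data from the study).
first_min = [1, 2, 3]
third_min = [2, 3, 4]
fifth_min = [5, 6, 7]

f_ratio, df_b, df_w = one_way_anova_f([first_min, third_min, fifth_min])
print(f_ratio, df_b, df_w)
```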
Table 5
Mean accuracy for combinations of slice length and slice locations
Construct | Slice length | First minute | Third minute | Fifth minute | One-way ANOVA
Positive affect | 5 s | −.12 | .12 | .16 | F(2, 79) = 4.85, p < .01
 | 20 s | .26 | .16 | .38 | F(2, 71) = 2.72, p < .08
 | 45 s | .16 | .21 | .33 | F(2, 70) = 2.50, p < .09
 | 60 s | .13 | .37 | .13 | F(2, 74) = 4.97, p < .01
Negative affect | 5 s | .11 | .37 | .43 | F(2, 79) = 5.95, p < .01
 | 20 s | .22 | .38 | .45 | F(2, 71) = 4.44, p < .02
 | 45 s | .27 | .25 | .37 | F(2, 70) = .95, p > .39
 | 60 s | .34 | .21 | .41 | F(2, 74) = 2.12, p > .12
Neuroticism | 5 s | .07 | .11 | .22 | F(2, 79) = .70, p > .49
 | 20 s | .25 | .18 | .15 | F(2, 71) = .46, p > .63
 | 45 s | .16 | .25 | .32 | F(2, 70) = .65, p > .52
 | 60 s | .27 | .18 | .20 | F(2, 74) = .28, p > .75
Extraversion | 5 s | .02 | .42 | .19 | F(2, 79) = 9.67, p < .001
 | 20 s | .15 | .54 | .51 | F(2, 71) = 10.65, p < .001
 | 45 s | .44 | .53 | .41 | F(2, 70) = 1.17, p > .31
 | 60 s | .50 | .55 | .51 | F(2, 74) = .29, p > .75
Openness | 5 s | .21 | .10 | .00 | F(2, 79) = 4.34, p < .02
 | 20 s | .12 | .29 | .25 | F(2, 71) = 1.54, p > .22
 | 45 s | .09 | .36 | .14 | F(2, 70) = 3.37, p < .04
 | 60 s | .21 | .17 | .08 | F(2, 74) = 1.01, p > .36
Agreeableness | 5 s | −.03 | .04 | .10 | F(2, 79) = 1.00, p > .37
 | 20 s | .21 | .04 | .03 | F(2, 71) = 2.86, p < .07
 | 45 s | .01 | .17 | .17 | F(2, 70) = 1.78, p > .17
 | 60 s | .11 | .27 | .17 | F(2, 74) = 1.45, p > .24
Conscientiousness | 5 s | .12 | .18 | .32 | F(2, 79) = 2.36, p > .10
 | 20 s | .28 | .22 | .28 | F(2, 71) = .21, p > .81
 | 45 s | .14 | .29 | .38 | F(2, 70) = 3.48, p < .04
 | 60 s | .23 | .43 | .39 | F(2, 74) = 2.69, p < .08
Intelligence | 5 s | .10 | .34 | .26 | F(2, 73) = 3.93, p < .03
 | 20 s | .11 | .16 | .32 | F(2, 70) = 2.89, p < .07
 | 45 s | .09 | .34 | .20 | F(2, 65) = 2.15, p > .12
 | 60 s | .12 | .36 | .26 | F(2, 73) = 3.44, p < .04
3.3.1. Neuroticism
The first column in Table 2 shows that judges were significantly better than chance at
detecting neuroticism. Table 2 also shows that accuracy for neuroticism was achieved at 5 s
(although statistically significant, the magnitude of the mean accuracy correlation was very
small) and longer exposures. Planned contrasts revealed no differences between the
different slice lengths on accuracy (all p > .52) and no significant linear trend.
To examine the effects of slice location on neuroticism, a 4 (slice length) × 3 (slice
location) between-participants ANOVA was used. There was no main effect of slice location,
F(2, 303) = .29, p > .74 (Table 4). Table 5 (the 4 rows associated with neuroticism) suggests
that slice location did not affect accuracy at any slice lengths.
3.3.2. Extraversion
The first column in Table 2 shows that judges were significantly better than chance at
detecting extraversion. In addition, Table 2 shows that statistically significant accuracy for
extraversion was achieved at all slice lengths. Planned contrasts revealed that 5 s was less
accurate than all other slice lengths (all p < .001), and there were no other differences (all
p > .21). There was a linear effect of slice length on accuracy for extraversion (Table 2).
To examine the effects of slice location on extraversion, a 4 (slice length) × 3 (slice
location) between-participants ANOVA was used. There was a main effect of slice location on
accuracy, F(2, 294) = 12.64, p < .001. Accuracy was highest when it was based on the third
min and lowest when it was based on the first min (Table 4). Table 5 (the 4 rows associated
with extraversion) suggests that slice location affected accuracy at 5 and 20 s but not at 45
and 60 s exposures.
3.3.3. Openness
Table 2, column 1, shows that judges were significantly better than chance at detecting
openness. Table 2 also shows that accuracy for openness was achieved at all slice
lengths. Planned contrasts revealed no differences in accuracy as a function of slice
length (all p > .22). There was no linear effect of slice length on accuracy for openness
(Table 2).
To examine the effects of slice location on openness, a 4 (slice length) × 3 (slice location)
between-participants ANOVA was used, revealing a main effect of slice location, F(2,
294) = 3.03, p < .05. Accuracy was lower in the first as compared to the third and fifth min
(the latter two were not different; Table 4). Table 5 (the 4 rows associated with openness)
suggests that slice location affected accuracy the most at 5 and 45 s but not at 20 and 60 s.
3.3.4. Agreeableness
Column 1 in Table 2 shows that judges were significantly better than chance at detecting
agreeableness. Table 2 shows that significant accuracy was not achieved until 20-s
exposures, and did not reach a magnitude above .15 until 60-s exposures. Planned contrasts
revealed that 5 s was marginally less accurate than 60 s (p < .08). No other differences were
statistically significant (all p > .18). The linear effect of slice length on accuracy for
agreeableness was marginally significant (Table 2).
To examine the effects of slice location on agreeableness, a 4 (slice length) × 3 (slice
location) between-participants ANOVA was used. There was no main effect of slice location,
F(2, 294) = .74, p > .47 (Table 4). Table 5 (the 4 rows associated with agreeableness)
suggests that slice location did not affect accuracy, except marginally at 20-s exposures.
3.3.5. Conscientiousness
Column 1 in Table 2 reveals that judges were significantly better than chance at detecting
conscientiousness. In addition, Table 2 shows that accuracy for conscientiousness was
better than chance at all slice lengths. Planned contrasts revealed that 5 s was not different
from any other lengths (all p > .11). The linear effect of slice length on accuracy was
statistically significant (Table 2).
To examine the effects of slice location on conscientiousness, a 4 (slice length) × 3 (slice
location) between-participants ANOVA was used. There was a main effect of slice location,
F(2, 317) = 5.27, p < .01 (Table 4). Table 5 (the 4 rows associated with conscientiousness)
suggests that slice location affected accuracy the most at 45- and 60-s exposures.
Accuracy for judging intelligence was significantly greater than zero overall (Column
1 of Table 2). Table 2 also shows that accuracy for intelligence was achieved at all slice
lengths from 5 s to 5 min. Post hoc tests revealed that none of the slice lengths
was different from any other (p > .97), and there was no linear effect of slice length
(Table 2).
To examine effects of slice location on intelligence, a 4 (slice length) × 3 (slice location)
between-participants ANOVA was used. There was a main effect of slice location, F(2,
281) = 8.79, p < .001, such that accuracy based on the first min was lower than accuracy
based on the third or fifth minute (third and fifth were not different; Table 4). Table 5 (the
4 rows associated with intelligence) suggests that slice location affected accuracy
significantly at 5- and 60-s exposures and marginally at 20-s exposures.
Across all judged constructs, females’ mean accuracy was higher (M
r = .26) than males’ (M r = .22), F(1, 325) = 7.11, p < .01; effect size r = .15. At the individual
construct level, one-way ANOVAs revealed that females achieved significantly or
marginally significantly higher accuracy (M rs = .35, .20, and .25) than males (M rs = .28, .12, and
.18) on negative affect, F(1, 325) = 3.26, p < .08, openness, F(1, 325) = 4.24, p < .04, and
intelligence, F(1, 312) = 3.33, p < .07, respectively. Gender did not interact with slice length (all
p > .32) or location (all p > .40).
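
The effect size r reported for the overall gender difference is consistent with one common conversion for an F test with one numerator degree of freedom, r = sqrt(F / (F + df_error)). Whether this is the exact formula the authors used is an assumption on our part; the sketch below simply shows that it reproduces the reported value from the reported F(1, 325) = 7.11.

```python
import math


def effect_size_r(f_value, df_error):
    """Effect size r for an F test with 1 numerator degree of freedom."""
    return math.sqrt(f_value / (f_value + df_error))


# Reported overall gender difference: F(1, 325) = 7.11.
r_gender = effect_size_r(7.11, 325)
print(round(r_gender, 2))  # 0.15
```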
4. Discussion
Our empirical results demonstrate that the achievement of accurate first impressions
depends upon the judgment being made, and the quantity and quality of information on
which the judgment is based. Under varying judgment circumstances, first impressions
either can be remarkably right or substantially wrong.
We predicted that exposure time and accuracy would be positively related, which is
consistent with two models of trait accuracy (Funder, 2001; Kenny, 1994) and previous
research on judgments of personality based on excerpts longer than 5 min (Blackman &
Funder, 1998). Consistent with these predictions, there was a positive relationship between
exposure time and accuracy for the aggregated set of variables. However, slice length (i.e.,
exposure time) mattered most for three variables: positive affect, extraversion, and
agreeableness. For each of these three variables, accuracy at 5 s was significantly lower than
accuracy at longer exposures. There was a significant linear effect of slice length on
accuracy for extraversion and a marginally significant one for agreeableness. These results
seem to suggest that more information yields more accuracy, particularly for constructs
related to positive affect and social approach.
In contrast, increased exposure time was unrelated to accuracy when judging negative
affect, neuroticism, openness, and intelligence – accuracy was no different at 5 s than at
300 s. Judgments of conscientiousness exhibited a linear increase in accuracy with greater
slice length, but the difference between 5 and 300 s was not statistically significant. Overall,
it might be argued that these five variables fall into the two categories of negative affect or
threat, and intelligence or competence. Quick and accurate judgments of these behavioral
categories may be both life-saving and life-promoting.
The current study, as far as we know, is the first to empirically examine the influence of
slice location on accuracy. Judges observed targets in an unstructured “get acquainted”
interaction and it was found, consistent with our predictions, that slices from the middle
and end of the behavioral stream produced the greatest accuracy for 6 of 8 variables (the
exceptions were neuroticism and agreeableness). Overall, accuracy was highest for slices
extracted from the third minute of the 5 min interaction. Considerable accuracy was also
observed for slices extracted from the fifth min, whereas accuracy for the first min was
much lower. The middle min may be the most informative as participants pass through the
nervous phase of initial introductions, begin to learn something about each other’s
characteristics, but have not yet entered the awkward period of having nothing left to discuss.
These results support Funder’s (2001) claim that qualitative differences in information can
influence the accuracy of judges’ ratings. This finding is more prescriptive than theoretical;
however, it suggests that slice location ought to be considered in future research on first
impressions. It remains an open question whether slices positioned later in the behavioral
stream will consistently produce greater accuracy.
Because the thin slice literature is marked by studies that are not always directly
comparable, we thought it was important to investigate in one study first impression
accuracy for emotions, personality traits, and intelligence. Furthermore, the stimuli
presented to judges for evaluation were all derived from the same set of dyadic get
acquainted interactions so that accuracy estimates could be compared across
constructs. As it turned out, there was considerable variability in the judgment accuracy of
our set of constructs (e.g., Funder & Colvin, 1988). Thin-slice judgments for
extraversion exhibited the greatest accuracy, followed by negative affect, conscientiousness, IQ,
neuroticism, positive affect, openness, and agreeableness (for which accuracy was
significantly lower than all other variables).
We expected, and found, greater accuracy for negative than positive affect, which may
reflect the survival value of accurately judging negative affect. We expected accuracy to be
higher for extraversion, conscientiousness, and agreeableness than for neuroticism and
openness. Our results were consistent with this pattern except that, on average, neuroticism
was judged relatively accurately whereas agreeableness was not. The results are consistent
with those reported by Kenny, Albright, Malloy, and Kashy (1994) on accuracy at
zero-acquaintance in which consensus on the Big 5 factors was found to be highest for
extraversion and lowest for agreeableness.
Women’s judgments of targets, across all 8 constructs, were significantly more accurate
than men’s judgments of targets (for related personality results, see Vogt & Colvin, 2003).
Women were significantly or marginally more accurate on openness, intelligence (e.g.,
Murphy et al., 2003), and negative affect (as frequently found in affect judgment studies).
Women and men did not differ in accuracy on neuroticism, extraversion, or positive affect
(cf. Lippa & Dietz, 2000; Ambady et al., 1995).
It should be noted that our accuracy criteria, although based on multiple methods,
multiple raters, or objective tests, varied across constructs. The constructs explored spanned
affective, personality, and intelligence domains, and thus required different types of
accuracy criteria against which to compare judges’ ratings. For example, intelligence is typically
measured with an objective IQ measure as it was in our study, and not by self, friend, or
parent ratings as our other traits were. Thus, the direct comparability of our intelligence
criteria to other criteria used in this study may be questioned. Nevertheless, at least one
study has demonstrated that observers can evaluate the intelligence of others with
considerable validity (Block & Kremen, 1998). Still, it is difficult to know whether the various
criteria used in this study were equally valid. Therefore, differences in average levels of
accuracy from construct to construct may be due in part to the differences in criterion
measurement.
A strength of the study is that we evaluated accuracy in the emotion, personality, and
intelligence domains. However, our sampling of constructs is a relatively small one and the
results we reported are likely to reflect this fact. Future research will benefit from studies
that evaluate a wide range of constructs in order to develop parameter estimates for the
independent variables we and others have begun to study. We considered only a restricted
range of exposure lengths (only up to 5 min). It may now be the right time to integrate thin
slices and trait accuracy research in which the former tends to make judgments based on 5 s
to 5 min observations whereas the latter tends to make judgments based on 5 min and
longer observations.
Sixty-s slices provided sufficient behavioral fodder to yield the optimal
accuracy-to-slice-length ratio for all judged variables. Increasing exposures to 300 s (the full 5 min) did
not significantly increase accuracy from that obtained at one minute. However, 60-s slices
generally yielded significantly more accuracy than shorter slices. This result is at odds with
previous findings in which thicker slices were not related to accuracy (Ambady &
Rosenthal, 1992, 1993). In addition, 60-s slices were the most impervious to slice location. That is,
slice location did not generally influence accuracy if the excerpts were 60 s long. Thus, if a
researcher’s goal is to determine an excerpt length to optimize accuracy, 60 s is the answer.
However, if a researcher needs to know which variable to study because his or her question
can only be met by investigating a variable under conditions of extremely brief exposure,
the answers, in the order of magnitude of accuracy at 5-s exposure, are negative affect,
extraversion, conscientiousness, and intelligence.
References
Albright, L., Kenny, D. A., & Malloy, T. E. (1988). Consensus in personality judgments at zero acquaintance.
Journal of Personality and Social Psychology, 55, 387–395.
Allport, G. W. (1937). Personality: A psychological interpretation. New York: Holt.
Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal
consequences: a meta-analysis. Psychological Bulletin, 111, 256–274.
Ambady, N., & Rosenthal, R. (1993). Half a minute: predicting teacher evaluations from thin slices of nonverbal
behavior and physical attractiveness. Journal of Personality and Social Psychology, 64, 431–441.
Ambady, N., Bernieri, F. J., & Richeson, J. A. (2000). Toward a histology of social behavior: judgmental accuracy
from thin slices of the behavioral stream. In M. P. Zanna (Ed.), Advances in experimental social psychology
(Vol. 32, pp. 201–271). San Diego, CA: Academic Press.
Ambady, N., Hallahan, M., & Conner, B. (1999). Accuracy of judgments of sexual orientation from thin slices of
behavior. Journal of Personality and Social Psychology, 77, 538–547.
Ambady, N., Hallahan, M., & Rosenthal, R. (1995). On judging and being judged accurately in zero acquaintance
situations. Journal of Personality and Social Psychology, 69, 518–529.
Bernieri, F. J., & Gillis, J. S. (2001). Judging rapport: employing Brunswik’s lens model to study interpersonal
sensitivity. In J. A. Hall & F. J. Bernieri (Eds.), Interpersonal sensitivity: Theory and measurement (pp. 67–86).
Mahwah, NJ: Erlbaum.
Bernieri, F., Gillis, J. S., Davis, J. M., & Grahe, J. E. (1996). Dyad rapport and the accuracy of its judgment across
situations: a lens model analysis. Journal of Personality and Social Psychology, 71, 110–129.
Blackman, M. C., & Funder, D. C. (1998). The effect of information on consensus and accuracy in personality
judgment. Journal of Experimental Social Psychology, 34, 164–181.
Block, J., & Kremen, A. (1998). IQ and ego-resiliency: conceptual and empirical connections and separateness.
Journal of Personality and Social Psychology, 70, 349–361.
Block, J. H., & Block, J. (1980). The role of ego-control and ego-resiliency in the organization of behavior. In W.
A. Collins (Ed.), Minnesota Symposium on Child Psychology (Vol. 13, pp. 39–101). Hillsdale, NJ: Erlbaum.
Borkenau, P., & Liebler, A. (1993). Convergence of stranger ratings of personality and intelligence with
self-ratings, partner ratings, and measured intelligence. Journal of Personality and Social Psychology, 65,
546–553.
Borkenau, P., & Liebler, A. (1995). Observable attributes as manifestations and cues of personality and
intelligence. Journal of Personality, 63, 1–25.
Borkenau, P., Mauer, N., Riemann, R., Spinath, F. M., & Angleitner, A. (2004). Thin slices of behavior as cues of
personality and intelligence. Journal of Personality and Social Psychology, 86, 599–614.
Carney, D. R. (2004). The nonverbal expression and accurate detection of implicitly and explicitly measured anti-
Black attitudes. Unpublished doctoral dissertation, Northeastern University, Boston, MA.
Costa, P., & McCrae, R. (1992). NEO PI-R Professional Manual. Odessa, FL: Psychological Assessment
Resources.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
177–193.
Darwin, C. (1965). The expression of the emotions in man and animals. Chicago: University of Chicago Press
(Original work published 1872).
Dodrill, C. B. (1983). Long-term reliability of the Wonderlic personnel test. Journal of Consulting and Clinical
Psychology, 51, 316–317.
Esteves, F., Dimberg, U., & Oehman, A. (1994). Automatically elicited fear: conditioned skin conductance
responses to masked facial expressions. Cognition and Emotion, 8, 393–413.
[...] experiments. Berkeley, CA: University of California Press.
Funder, D. C. (1995). On the accuracy of personality judgment: a realistic approach. Psychological Review, 102,
652–670.
Funder, D. C. (2001). Accuracy in personality judgment: research and theory concerning an obvious question. In
B. W. Roberts & R. Hogan (Eds.), Personality Psychology in the Workplace. Decade of Behavior (pp. 121–140).
Washington: American Psychological Association.
Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: acquaintanceship, agreement, and the accuracy of
personality judgment. Journal of Personality and Social Psychology, 55, 149–158.
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: properties associated with interjudge
agreement. Journal of Personality and Social Psychology, 52, 409–418.
Funder, D. C., & Sneed, C. D. (1993). Behavioral manifestations of personality: an ecological approach to
judgmental accuracy. Journal of Personality and Social Psychology, 64, 479–490.
Funder, D. C., Furr, R. M., & Colvin, C. R. (2000). The Riverside behavioral Q-sort: a tool for the description of
social behavior. Journal of Personality, 68, 451–489.
Gifford, R. (1991). Mapping nonverbal behavior on the interpersonal circle. Journal of Personality and Social
Psychology, 61, 279–288.
Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue: personality judgments based
on offices and bedrooms. Journal of Personality and Social Psychology, 82, 379–398.
Hall, J. A. (1978). Gender effects in decoding nonverbal cues. Psychological Bulletin, 85, 845–857.
Hall, J. A. (1984). Nonverbal sex differences: Communication accuracy and expressive style. Baltimore, MD: Johns
Hopkins University Press.
Hall, J. A., & Bernieri, F. J. (2001). Interpersonal sensitivity: Theory and measurement. Mahwah, NJ: Erlbaum.
Hall, J. A., & Carter, J. D. (1999). Gender-stereotype accuracy as an individual difference. Journal of Personality
and Social Psychology, 77, 350–359.
Hall, J. A., Bernieri, F. J., & Carney, D. R. (2005). Nonverbal behavior and interpersonal sensitivity. In J. A.
Harrigan, R. Rosenthal, & K. R. Scherer (Eds.), Handbook of nonverbal behavior research methods in the affective
sciences. New York: Oxford.
Izard, C. E. (1991). The psychology of emotions. New York: Plenum Press.
John, O. P. (1989). Towards a taxonomy of personality descriptors. In D. M. Buss & N. Cantor (Eds.), Personality
psychology: Recent trends and emerging directions (pp. 261–271). New York: Springer.
Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. New York: Guilford.
Kenny, D. A., Albright, L., Malloy, T. E., & Kashy, D. A. (1994). Consensus in interpersonal perception:
acquaintance and the big five. Psychological Bulletin, 116, 245–258.
Lippa, R., & Dietz, J. K. (2000). The relation of gender, personality, and intelligence to judges’ accuracy in judging
strangers’ personality from brief video segments. Journal of Nonverbal Behavior, 24, 25–43.
Matsumoto, D., LeRoux, J., Wilson-Cohn, C., Raroque, J., Kooken, K., Ekman, P., et al. (2000). A new test to
measure emotion recognition ability: Matsumoto and Ekman’s Japanese and Caucasian Brief Affect
Recognition Test (JACBART). Journal of Nonverbal Behavior, 24, 179–209.
McClure, E. B. (2000). A meta-analytic review of sex differences in facial expression processing and their
development in infants, children, and adolescents. Psychological Bulletin, 126, 424–453.
Murphy, N. A., Hall, J. A., & Colvin, C. R. (2003). Accurate intelligence assessments in social interactions:
mediators and gender effects. Journal of Personality, 71, 465–493.
Norman, W. T., & Goldberg, L. R. (1966). Raters, ratees, and randomness in personality structure. Journal of Per-
sonality and Social Psychology, 4, 681–691.
Nowicki, S., & Duke, M. P. (1994). Individual differences in the nonverbal communication of affect: the diagnostic
analysis of nonverbal accuracy scale. Journal of Nonverbal Behavior, 18, 9–35.
Realo, A., Allik, J., Nõlvak, A., Valk, R., Ruus, T., Schmidt, M., et al. (2003). Mind-reading ability: beliefs and
performance. Journal of Research in Personality, 37, 420–445.
Reynolds, D. J., Jr., & Gifford, R. (2001). The sounds and sights of intelligence: a lens model channel analysis.
Personality and Social Psychology Bulletin, 27, 187–200.
Rosenthal, R., Hall, J. A., Di Matteo, M. R., Rogers, P. L., & Archer, D. (1979). Sensitivity to nonverbal
communication: The PONS test. Baltimore, MD: The Johns Hopkins University Press.
Schmid Mast, M., & Hall, J. A. (2004). Who is the boss and who is not? Accuracy of judging status. Journal of
Nonverbal Behavior, 28, 145–165.
Tickle-Degnen, L., & Lyons, K. D. (2004). Practitioners’ impressions of patients with Parkinson’s disease: the
social ecology of the expressive mask. Social Science & Medicine, 58, 603–614.
Vogt, D. S., & Colvin, C. R. (2003). Interpersonal orientation and the accuracy of personality judgments. Journal
of Personality, 71, 267–295.
Watson, D. (1989). Strangers’ ratings of the five robust personality factors: evidence of a surprising convergence
with self-report. Journal of Personality and Social Psychology, 57, 120–128.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and
negative affect: the PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070.
Wonderlic, E. F. (1984). Wonderlic personnel test manual. Northfield, IL: Wonderlic & Associates.
Zebrowitz, L. A., Hall, J. A., Murphy, N. A., & Rhodes, G. (2002). Looking smart and looking good: facial cues to
intelligence and their origins. Personality and Social Psychology Bulletin, 28, 238–249.