Journal of Experimental Psychology:
Human Perception and Performance
2006, Vol. 32, No. 6, 1422–1435
Copyright 2006 by the American Psychological Association
0096-1523/06/$12.00 DOI: 10.1037/0096-1523.32.6.1422
What Is Being Masked in Object Substitution Masking?
Angus Gellatly and Michael Pilling, The Open University
Geoff Cole, University of Durham
Paul Skarratt, University of Hull
Object substitution masking (OSM) is said to occur when a perceptual object is hypothesized that is
mismatched by subsequent sensory evidence, leading to a new hypothesized object being substituted for
the first. For example, when a brief target is accompanied by a longer lasting display of nonoverlapping
mask elements, reporting of target features may be impaired. J. T. Enns and V. Di Lollo (2000)
considered it an outstanding question whether OSM masks some or all aspects of a target. The authors
report three experiments demonstrating that OSM can selectively affect target features. Participants may
be able to detect a target while being unable to report other aspects of it or to report the color but not the
orientation of a target (or vice versa). We discuss these findings in relation to two other visual
phenomena.
Keywords: object substitution masking, visual attention
Visual masking is said to occur when two stimulus displays are
presented in close spatial and temporal contiguity and the visibility
of one of them (the target) is reduced by the presence of the other
(the mask). The target is flashed only briefly, but the mask display
may be presented for a shorter or longer period depending on the
requirements of the particular study. Over the years, several supposedly distinct forms of masking have been proposed. In a recent
influential article, Enns and Di Lollo (1997) reported what they
claimed to be a new form of visual masking, which they termed
object substitution masking (OSM). They (Di Lollo, Enns, &
Rensink, 2000; Enns, 2004) have contrasted OSM, supposedly
involving substitution of one perceptual object by another, with
what Enns (2004) has called object formation masking (OFM). The
latter refers to masking that supposedly results from interference
with the perceptual formation process involved in segmenting the
target from the camouflage of background and other nearby objects. The term OFM subsumes much of what has previously been
referred to as integration, interruption, or metacontrast masking,
although Di Lollo et al. (2000) and Enns (2004) suggested that
demonstrations of these purported categories of masking may
often have included components of both OFM and OSM. It is not
our intention in this article to review the huge literature on visual
masking or to debate fine details of taxonomy and nomenclature in
relation to it. We adopt the terminological dichotomy of OFM and
OSM on pragmatic grounds because it serves our present purpose.
In recent articles dealing with OSM (e.g., Di Lollo et al., 2000;
Enns, 2004; Enns & Di Lollo, 1997; Kahan & Mathis, 2002), a
usage has developed in which new labels are sometimes used in
place of terms with a longer history in the literature on masking.
Again, we do not wish to argue the superiority of one nomenclature over the other. We aim to use both sets of terms interchangeably and in such a manner as to be understood equally by those
accustomed to the more traditional terminology and those familiar
with usage in recent articles dealing with OSM.
According to Di Lollo et al. (2000), OFM is sensitive to factors
such as contour abutment and overlap and relative luminances of
target and mask displays. It also depends critically on the exact
timing of target and mask onsets. When studied as integration or
metacontrast masking, OFM typically peaks at a target–mask
stimulus onset asynchrony (SOA) of 50 ms or less and is largely
absent at SOAs of 100 ms or more (see Enns, 2004). OFM is also
little affected by manipulations of spatial attention toward or away
from the target.
OSM, by contrast, is highly sensitive to attentional manipulations but not to the local spatiotemporal contour interactions
thought to give rise to OFM. A standard demonstration of OSM
uses what Kahan and Mathis (2002) have called the briefly masked
control method, comparing two conditions in which target and
mask onset simultaneously (common onset). In the briefly masked
control (or “no masking” control) condition, they also offset simultaneously. In the mask condition, the (temporally trailing)
mask remains present after target offset. In the earlier literature,
these were frequently referred to as simultaneous-offset and
delayed-offset conditions. Reporting of some target feature is
markedly reduced in the second condition relative to the first.
Angus Gellatly and Michael Pilling, Department of Psychology, The
Open University, Milton Keynes, United Kingdom; Geoff Cole, Department of Psychology, University of Durham, Durham, United Kingdom;
Paul Skarratt, Department of Psychology, University of Hull, Hull, United
Kingdom.
Michael Pilling is now at the MRC Institute for Hearing Research,
University of Nottingham, Nottingham, United Kingdom.
This research was supported by Economic and Social Research Council
Grant R000223824.
Correspondence concerning this article should be addressed to Angus
Gellatly, Department of Psychology, The Open University, Walton Hall,
Milton Keynes MK7 6AA, United Kingdom. E-mail: a.gellatly@open.ac.uk
OBJECT SUBSTITUTION MASKING
Because spatial and temporal contour relationships at onset are
identical, the degree of OFM is usually thought to be equal in both
conditions, and the reduction in target visibility is, therefore, taken
as a measure of OSM. (An alternative interpretation would be that
the greater time-integrated energy of the trailing mask simply
produces a greater degree of OFM than does the simultaneous-offset mask. In our introduction and discussion of Experiment 2,
we present evidence and arguments against this interpretation.)
The theory of OSM (Di Lollo et al., 2000) assumes that perception arises from continuous and recurrent communication between neurons at lower and higher levels within the visual system.
Newly appearing objects stimulate lower level cells with spatially
local receptive fields and geometrically simple stimulus requirements. In a feed-forward sweep, output from these cells activates
higher level neurons, which have larger receptive fields and are
tuned to more complex stimulus properties. Competing pattern
hypotheses are generated at this higher level. Resolution of competition between these hypotheses, as well as binding of patterns to
precise spatial locations, is thought to require feedback sweeps.
Activations at higher and lower levels are compared for consistency, and there may be some number of iterations of forward and
backward sweeps before a stable percept emerges. If the visual
scene remains constant over the iterations required to achieve
dynamic stability, the new object will be consciously perceived.
However, if a mismatch is detected between activation at the
different levels, the iterative process will begin again only on the
basis of current sensory input. OSM is said to occur as a result of
such a mismatch. Onset of target and mask sets up lower level
activation leading to the hypothesis of target plus mask. If both
offset simultaneously before the arrival of the feedback sweep, this
hypothesis can still be matched to persisting but fading activity at
the lower level. However, if the mask display continues after offset
of the target, the hypothesis will mismatch strong sensory evidence
that there is now only a mask present. Further iterations result in
only the mask being consciously perceived. Perception of the
target plus mask will have been substituted by perception of the
mask alone. Spatial attention is thought to modulate the masking
effect because if attention is already focused at the appropriate
location, conscious perception of the target will be achieved with
fewer iterations than if it is focused elsewhere or is diffuse. In the
same vein, studies by Neill, Hutchinson, and Graves (2002) and
Tata and Giaschi (2004) have shown that the extent of OSM is
modulated by the power of the mask to capture attention away
from the target.
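The mismatch logic just described can be illustrated with a deliberately simplified sketch. It is a deterministic toy rule, not the computational model of Di Lollo et al. (2000): the cycle duration `MS_PER_ITERATION`, the iteration counts, and the function name `percept` are all hypothetical values chosen only to make the verbal account concrete.

```python
# Toy, deterministic sketch of the mismatch logic described above.
# All numeric values (MS_PER_ITERATION, iteration counts) are hypothetical.

MS_PER_ITERATION = 10  # assumed duration of one feed-forward/feedback cycle


def percept(target_ms, mask_ms, iterations_needed):
    """Return the percept predicted by the toy rule for a common-onset display.

    target_ms: duration of the target
    mask_ms: duration of the mask (onset together with the target)
    iterations_needed: cycles required to stabilise the 'target plus mask'
        hypothesis (assumed smaller when attention is already at the target)
    """
    settle_ms = iterations_needed * MS_PER_ITERATION
    if target_ms >= settle_ms:
        # Scene stayed constant until the hypothesis stabilised.
        return "target plus mask"
    if mask_ms <= target_ms:
        # Common offset: hypothesis is matched to fading lower level activity.
        return "target plus mask"
    # Mask outlasts target: hypothesis mismatches input, processing restarts.
    return "mask alone"


focused = percept(target_ms=33, mask_ms=500, iterations_needed=2)
diffuse = percept(target_ms=33, mask_ms=500, iterations_needed=6)
control = percept(target_ms=33, mask_ms=33, iterations_needed=6)
```

Under these assumed values, only the trailing mask combined with diffuse attention yields the "mask alone" outcome, which is the qualitative pattern the theory attributes to OSM.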
Supposedly, OSM takes place after object formation, at the level
of object representation (Enns, 2004; Lleras & Moore, 2003;
Moore & Lleras, 2005). But what does this mean? In one example
of OSM, a diamond target missing either a left or right corner is
briefly presented (usually 33 ms or less), and the observer has to
report the side of the missing corner. Under a variety of conditions,
accuracy of report can be reduced by a mask of just four dots
surrounding but not abutting the target. For example, in Experiment 1 reported by Enns and Di Lollo (1997), there were three
horizontally aligned locations at which the briefly presented target
might appear. After a variable SOA, a four-dot mask appeared for
30 ms at the same location as the target or at a different location.
Accuracy of reporting the missing corner of the target was greatly
reduced in the former condition relative to the latter. The same
outcome was obtained by Kahan and Mathis (2002, Experiment 1),
when the target diamond appeared unpredictably in one of four
quadrant positions and a simultaneous-onset four-dot mask with
delayed-offset appeared in the same or in a different quadrant.
Similarly, in their Experiment 3, Enns and Di Lollo found that
accuracy of report was reduced for displays containing two distractor diamonds in addition to the masked target as compared with
displays containing only a masked target.
We are interested in the following question: When, in studies
such as those just cited, observers are unable to report the side of
the missing corner of the diamond, what is it that is being masked?
For example, are observers unaware that a target was presented?
Given that detection performance is generally a more sensitive
measure than discrimination performance, it seems unlikely that
this need be the case. However, the proposal that OSM occurs at
the level of object tokens has about it the suggestion that all
representation of the target object is erased from perception. Although we are unaware of published data that address the issue,
statements by several authors come very close to such a claim. For
example, in relation to four-dot masking of a diamond, Kahan and
Mathis (2002) stated that “the phenomenological experience of
this effect is that a mask surface replaces the target” (p. 1249).
Neill et al. (2002) reported for four-dot masking of a letter that
“not only does the space inside the dots appear blank, but there is
a strong subjective impression of the contours of a square connecting the [masking] dots” (p. 683). For masking of variously
oriented C shapes, Di Lollo et al. (2000) stated,
Although the four-dot mask was insufficient as a source of contour
interference, it was entirely adequate for defining a trailing configuration (a square surface) that replaced the target as an object of
perception . . . . At longer durations of the trailing mask . . . the four
dots appeared to be clearly visible, but the target location appeared
empty. (p. 492)
For similar displays, Tata (2002) reported that, by contrast
with previous metacontrast masking effects in which “visibility of
the target is reduced, but its presence is nevertheless detected,” in
his studies masking “was phenomenologically complete . . . the
observers reported seeing a blank space among the distractors
where the target should have been” (p. 1036). Clearly, Di Lollo et
al. and Tata were open to the possibility that OSM may eliminate
all trace of the target representation. In contrast to these strong if
introspectively based claims, Enns and Di Lollo (2000, p. 351)
took a more cautious position and considered it an outstanding
question whether OSM (which in the context they termed
common-onset masking) interferes with the perception of some or
all aspects of a target. They wrote,
For example, many iterative cycles might be required to perceive
specific attributes of the target such as its detailed shape or colour.
Simpler attributes such as mere presence or absence might require
fewer cycles, in which case masking for these attributes would be
reduced or eliminated. (p. 351)
The first aim of the present article is to provide data to address this
outstanding question identified by Enns and Di Lollo.
Before we proceed further, it is important to emphasize the logic
of inquiry in relation to this matter. Suppose the phenomenological
claims reported above were supported by behavioral evidence that
observers could not discriminate between presence and absence of
the target. What would this demonstrate? Consider that Di Lollo et
GELLATLY, PILLING, COLE, AND SKARRATT
al. (2000) used displays with up to 15 distractors and thus had 16
potential targets and target locations. It is possible that in conditions of such high perceptual load, target detection might indeed be
at the level of chance. This would not prove that OSM necessarily
involves elimination of all trace of the target. It would simply show
that there happen to be conditions in which detection performance
is reduced to chance and that also produce OSM. However, proving the obverse case, that OSM for some feature of a target can
occur without all trace of the target having been erased from
conscious perception, requires only a single counterdemonstration.
If a substantial degree of OSM can be obtained for a particular
target feature in conditions that produce a much smaller decrement
in detection performance and if the level of detection is higher than
the level of discrimination, then there must be some trials on which
the target is detectable while the feature is not discriminable. In
Experiment 1 we had the aim of investigating this issue.
Our results may be summarized as follows: Experiment 1
showed that strong four-dot masking can occur for a discrimination task under conditions in which reporting of presence or
absence is much less affected. This raised the further question of
whether, in OSM, properties such as color and orientation can be
independently masked. The results of Experiment 2 indicate that
masking of different properties is at least partially independent,
and those of Experiment 3 reveal that it can be fully independent.
Theoretical implications of this independent processing are
discussed.
Experiment 1: Detection and Discrimination Under OSM
This study was closely modeled on Enns and Di Lollo’s (1997)
Experiment 3. Participants reported either which corner of a
masked diamond was missing (discrimination) or whether a target
had been present or absent at the masked location (detection).
Method
Participants. Ten postgraduate students and employees at the Open
University served as paid participants. All had normal or corrected-to-normal vision.
Materials and procedure. Stimuli were presented on a PC monitor.
Following a warning tone, a trial began with presentation of two blue bars,
2° above and below the center of the screen, between which participants
were instructed to fixate throughout the trial. After 400 ms, three diamonds
were presented for 17 or 33 ms, each with either the left or right corner
deleted at random. One diamond, the target, was surrounded by four
simultaneously appearing dots (except on half of the detection trials, on
which the two distractor diamonds appeared with the dots surrounding an
empty location). Mask dots either offset simultaneously with target and
distractors (unmasked or simultaneous-offset condition) or remained for
500 ms (trailing mask or delayed-offset condition). For discrimination, the
missing corner of the target was to be indicated with the left or right slash
key. For detection, the same keys were used to report target presence or
absence, with this response mapping and the order of the two tasks
counterbalanced across participants. Target and mask (or mask alone)
appeared randomly and equally often in each of the three locations. Each
task comprised demonstration trials (with extended frame durations), followed by 32 practice trials and four blocks of 48 experimental trials. The
whole session lasted approximately 40 min.
Stimuli were presented on a PC controlled by custom software and were
viewed from 70 cm. Target and distractors were white diamonds (all
monitor color guns at 63) on black (all monitor color guns at 0); they were
0.9° on a side with a corner missing (triangular section 0.25° on its equal
sides). Masking dots were squares 0.4° on a side, forming a virtual square
of 2°. Target and distractors appeared at the center and 3° to left and right
of center.
Results
Accuracy for both tasks is shown in Table 1. False positive
errors occurred on less than 1% of target-absent detection trials.
An initial analysis of variance (ANOVA) on the accuracy data
showed target duration did not produce a significant main effect or
interactions with other factors (F < 1.5 in all cases); therefore, the
data were subsequently collapsed across durations. A two-way
repeated-measures ANOVA showed significant main effects of
task, F(1, 9) = 63.3, p < .001, and masking, F(1, 9) = 23.5, p <
.001, and a significant interaction between these two factors, F(1,
9) = 14.5, p < .005, reflecting the larger effect of masking on
discrimination performance than on detection performance. Post
hoc one-tailed tests showed, however, that masking had a significant effect on both discrimination, t(9) = 5.04, p < .005, and
detection, t(9) = 2.12, p < .05.
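The masking effect for each task is simply the unmasked accuracy minus the masked accuracy. As a worked illustration, the short sketch below recomputes these differences from the cell means reported in Table 1; the dictionary layout and the names `table1`, `masking_effect`, and `interaction_contrast` are ours and serve only to make the arithmetic explicit.

```python
# Cell means (% correct) as reported in Table 1 of Experiment 1.
table1 = {
    "detection": {"unmasked": 93.8, "masked": 89.8},
    "discrimination": {"unmasked": 81.2, "masked": 65.1},
}


def masking_effect(cell):
    """Unmasked minus masked accuracy, in percentage points."""
    return round(cell["unmasked"] - cell["masked"], 1)


effects = {task: masking_effect(cell) for task, cell in table1.items()}

# Difference between the two masking effects: the descriptive counterpart
# of the Task x Masking interaction.
interaction_contrast = round(
    effects["discrimination"] - effects["detection"], 1
)

print(effects)
print(interaction_contrast)
```

The recomputed effects agree with the Masking effect column of Table 1, and the contrast between them shows how much larger the decrement was for discrimination than for detection.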
Discussion
Targets that masking has reduced in visibility or even rendered
phenomenally absent may serve as effectively as unmasked targets
to elicit implicit measures of detection such as response time to
display onset (Fehrer & Biederman, 1962; Fehrer & Raab, 1962)
or two-alternative forced choice discrimination (Schiller & Smith,
1966). Present or absent responses are thought to more closely
reflect phenomenal experience, although see Jacoby (1998) for a
discussion of the relative contributions of conscious and unconscious processes to performance on explicit performance tests.
Nevertheless, the present results indicate that four-dot masking can
reduce explicit detection performance even for fairly low-load
visual displays. The effect is quite small in the present experiment,
but it is significant. On this basis, it is possible that in different
circumstances, especially with higher load displays such as those
used by Di Lollo et al. (2000), the effect might be much larger,
Table 1
Percentage Mean Correct Responses and Standard Deviations on Present/Absent Detection Task
and Left/Right Discrimination Task in Experiment 1

                               Unmasked            Masked         Masking effect
Task                        % Correct   SD     % Correct   SD     % Correct   SD
Present–absent detection       93.8     2.3       89.8     6.8        4.0     5.9
Left–right discrimination      81.2     7.2       65.1    12.0       16.1    10.1
with detection performance possibly even down at chance level (as
implied in the quotations cited earlier). Be that as it may, however,
in Experiment 1 OSM had a much greater effect on discrimination
than on detection, indicating that on some proportion of trials the
target must have been detectable while the missing corner was not
discriminable.
The answer to the question posed by Enns and Di Lollo (2000)
is that OSM can interfere to a greater extent with one aspect of a
target than with another. Contrary to one reading of the introspective reports cited earlier, OSM need not be an all-or-none affair; it
does not have to entail complete substitution of the conscious
representation of the target (Di Lollo et al., 2000). So exactly what
aspects of the target do get masked in OSM? A striking feature of
OSM is that aspects of a target can be obscured by masking
elements or even a single element (Lleras & Moore, 2003) very
different from it in shape and location. Typically, however, OSM
experiments use target and mask stimuli that differ in various
uncontrolled ways and are poorly matched psychophysically. The
diamond targets and masking dots used in Experiment 1 and in
several previous studies of OSM (e.g., Enns & Di Lollo, 1997;
Kahan & Mathis, 2002) were characteristic in this respect. The
dots were actually small squares; target and mask objects differed
greatly in size and by 45° in their major orientation (although the
truncated diamonds contain a vertical edge also). That such differently shaped and spatially distant mask elements could impair
perception of the target was, of course, precisely what made the
OSM demonstration so impressive. At the same time, however, it
left open the question of whether the intensity of OSM may be
determined by the extent of the physical differences between target
and mask objects. Although there have been careful parametric
studies in relation to OSM of target–mask SOA, target–mask
separation, and number of distractors (e.g., Di Lollo et al., 2000;
Enns, 2004), we are unaware of any studies of OSM that have
systematically varied feature differences between target and mask
elements (though see Moore & Lleras, 2005). In our next two
experiments we used target and mask stimuli that differed in a
controlled manner on two physical dimensions. As in Experiment
1, a target item was surrounded by four masking items, but all
these were identically shaped rectangles. Target and mask bars
could differ in color, orientation, both, or neither.
In Experiments 2 and 3, we addressed the question of whether
discrimination of target color and target orientation can be independently subject to OSM. One way of putting this is to ask
whether what gets substituted is the representation of an integrated
object or of a bundle of stimulus features that remain somewhat
unbound and independent (Wolfe & Cave, 1999). If OSM occurs
at the level of object representations, it might be expected that
features would already be bound together and so not subject to
independent OSM. We defer full consideration of such theoretical
issues until the data have been reported.
Experiment 2: OSM for Reporting Color or Orientation of
Target
The task for Experiments 2 and 3 is illustrated in Figure 1.
Participants fixated between two blue bars presented for 500 ms
and reported either the color or orientation (Experiment 2), or both
(Experiment 3), of a target bar. Target and distractors occurred
centrally and to the left and right. The 17-ms target display
[Figure 1. Presentation sequence for the control mask (simultaneous offset) and trailing mask (delayed offset) conditions of Experiments 2 and 3. Panel labels: Delayed offset; Simultaneous offset. Frame labels: 500 ms; 17 ms; 500 ms; Till Response(s).]
contained either the target and two distractor bars or just two
distractors. Target location was identified by four horizontal masking bars surrounding one of the three potential targets’ locations.
There were three masking conditions. The mask display could
onset and offset simultaneously with the target, the simultaneous-offset, or (briefly masked) control condition. Or the mask display
could onset simultaneously with the target but remain present for
500 ms, the delayed-offset, or trailing mask condition. Or, finally,
the mask could onset at a 100 ms SOA following target onset and
remain present for 500 ms, the delayed-onset condition. Participants pressed the left slash key (for Orange or Horizontal) or the
right slash key (for Red or Vertical) to indicate either the color or
the orientation of the target, or the space bar if they either did not
know the color or orientation of the target or thought that no target
had been presented. They were instructed not to guess the color or
orientation of the target if they were uncertain but to press the
space bar in such cases. Responses of this kind are referred to as
omission responses as opposed to correct responses (reporting the
correct color or orientation) and error responses (reporting the
incorrect color or orientation).
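The three masking conditions and the three response categories can be laid out compactly as follows. This is a minimal sketch for illustration, not the software used to run the experiments: the class `MaskCondition`, the function `classify_response`, and the use of `None` to stand for a space-bar press are our own assumptions, and the classification applies to target-present trials only.

```python
from dataclasses import dataclass

TARGET_DURATION_MS = 17  # duration of the target display


@dataclass(frozen=True)
class MaskCondition:
    name: str
    onset_ms: int      # mask onset relative to target onset (the SOA)
    duration_ms: int   # how long the mask remains present

    @property
    def offset_ms(self):
        """Mask offset relative to target onset."""
        return self.onset_ms + self.duration_ms


CONDITIONS = [
    MaskCondition("simultaneous-offset (control)", 0, TARGET_DURATION_MS),
    MaskCondition("delayed-offset (trailing mask)", 0, 500),
    MaskCondition("delayed-onset", 100, 500),
]


def classify_response(response, target_value):
    """Classify a response on a target-present trial.

    response: the reported feature value, or None for a space-bar press
    target_value: the true value of the reported feature
    """
    if response is None:
        return "omission"
    return "correct" if response == target_value else "error"


for condition in CONDITIONS:
    print(condition.name, condition.onset_ms, condition.offset_ms)
```

For example, `classify_response("red", "orange")` is an error response, whereas `classify_response(None, "orange")` is an omission response.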
The wording of these instructions and the inclusion of target-absent trials require explanation. Two-alternative forced choice
decisions are commonly used in psychophysical work because
they have the advantage that, in principle, data interpretation need
not take response bias into account. The use of target-absent trials,
as well as the instruction to report color or orientation only when
fairly confident of them, potentially introduces issues of response
bias to the interpretation of our data. However, we deliberately
chose this method rather than a two-alternative forced choice
decision precisely to avoid the danger of a particular response bias.
Our participants were required to report either the color or the
orientation of a target bar fleetingly presented at the center of four
clearly visible horizontal bars that themselves had one of the two
values of color and orientation that the target could take. During a
run of repetitive experimental trials, participants reporting near-threshold experiences might sometimes unconsciously respond
with the color (orientation) of the mask bars rather than with what
they thought might be the color (orientation) of the target. Alternatively, participants might very consciously adopt a strategy of
responding with the color (orientation) of the mask when they
were uncertain about the color (orientation) of the target precisely
because they hypothesized that target color (orientation) was more
discernible when different from that of the mask. In other words,
a possible strategy would be, “when you don’t see the color
(orientation) of the target, respond with the value of the mask.”
Inclusion of target-absent trials and instructions not to guess when
uncertain was intended to discourage use of this strategy.¹ This
seems to have worked, so much so that there were even signs of an
opposite strategy having been adopted by some participants in
Experiment 3. It also allowed us to assess whether participants
distinguished between absence and presence of a masked target, as
they had in Experiment 1. If the false positive rate was low, then
target-absent trials were not being mistaken for target-present
trials. Note, though, that because of the nature of our instructions,
false negative responses (omission responses on target-present
trials) could not be interpreted as failures to detect the target. The
participant may have detected that “something was there” but not
known its color or orientation. Indeed, the results of Experiment 1
and the very low false positive rates we report for Experiments 2
and 3 strongly suggest that this was usually the case when an
omission response was made, suggesting that in these experiments
OSM occurred at the feature level. Moreover, we believe that the
robust pattern of data across Experiments 2 and 3 justifies our
decision on this matter of method.
Mask objects in Experiments 2 and 3 were narrowly separated
from the target, and some target contour was paralleled by mask
contour. In order to draw any conclusions about OSM, one must
demonstrate that the results obtained do not simply reflect OFM.
One way to do this is to use the briefly masked control method
used in Experiment 1, which involves comparing a simultaneous-offset mask with a delayed-offset (trailing) mask. Because both
conditions involve simultaneous onset of target and mask, the
extent of OFM is usually thought to be equal in both cases, so, as
described earlier, any difference in performance on the two conditions is then attributed to OSM alone. Alternatively, OSM is
sometimes thought to be demonstrated by comparing a
simultaneous-onset and delayed-offset (trailing) mask with the
same duration mask presented at an SOA of (in our case) 100 ms,
a delayed-onset mask. As just described, the former condition is
thought potentially to give rise to both OFM, because of simultaneous onset of mask and target, and OSM, because of the delayed
offset of the mask elements (though see also below). In the
delayed-onset condition, by contrast, this interpretation predicts
that there should be little if any OFM because with a 100 ms SOA
between target onset and mask onset, OFM will be greatly attenuated if not absent (Spencer & Shuntich, 1970), leaving only OSM
caused by the trailing mask elements.
In Experiment 2 we used both control methods to ensure that the
effects we reported were indeed examples of OSM, not OFM. Note
that we did not attempt to establish whether OFM actually occurred in any of the conditions of Experiment 2. One way to have
done this would have been to vary the separation between target
and mask elements. If separation had no effect on the extent of
masking, it could have been concluded that only OSM and not
OFM was affecting performance (Enns & Di Lollo, 1997, Experiment 2). However, a problem arises with this method of distinguishing between OSM and OFM if target–mask separation does
prove to have an effect on performance because there is no means
of measuring the relative contribution of each type of masking.
Accordingly, instead of adopting this procedure, we followed the
logic of common-onset masking (Di Lollo et al., 2000; Enns & Di
Lollo, 2000). As just described, the usual argument here is that if
OFM has occurred, its strength should be equal for simultaneous-offset and delayed-offset masks because both involve an identical
simultaneous onset. As noted earlier, however, it could be argued
that the greater time-integrated energy of the trailing mask simply
causes more OFM than does the simultaneous-offset mask. Our
use of a delayed-onset mask with the same energy as the trailing
mask should allow us to distinguish between these competing
interpretations. Because OFM is known to decrease with target–
mask SOA and to be almost absent at SOAs at or beyond 100 ms
(Spencer & Shuntich, 1970), then on either account the delayed-onset mask should cause less OFM than the simultaneous-onset
trailing mask. If the two types of mask cause similar degrees of
masking, this indicates that both are causing OSM rather than
OFM.
To summarize, the logic of the experiments was as follows.
Target and mask could have the same orientation and color, could
differ on color or orientation, or could differ on both features. If
OSM occurs after object features have been bound into an integrated representation, then the extent of masking in any condition
should be equal for reporting of either feature. This follows because if the representation of the target (plus mask) has been
substituted by a representation of the mask alone, then access to a
record of either target feature should be equally impossible. By
contrast, if OSM occurs prior to the binding of features into an
integrated representation, then it presumably occurs at the level of
individual features. Therefore, when target and mask differ on a
single feature, reporting of that feature should evidence less masking than reporting of the feature that is the same for target and
mask elements. This is because the signal-to-noise ratio is greater
for the former than for the latter. By the same reasoning, because
reporting of a feature should reflect only the signal-to-noise ratio
for that feature, accuracy of reporting should be unaffected by
¹ One method of avoiding the problem might have been to borrow
techniques used by Mounts and Melara (1999) and to use two pairs of
similar colors such as red or orange and green or blue. A red or orange
target, for example, could have been surrounded by either red and orange
mask elements (similar condition) or by green and blue mask elements
(dissimilar condition). In this case, participants would not have been able
to utilize a strategy of simply responding with either the mask color or its
opposite. However, pilot testing and the results of Experiments 2 and 3
show how difficult it is to hold performance on the present task below
ceiling even when the color difference between target and mask elements
in the dissimilar color condition is very small (i.e., red vs. orange). This
rules out use of the method just described. Following the same logic,
Mounts and Melara also used pairs of orientations that were close to
vertical or close to horizontal. But this technique was also inapplicable to
the present studies because of ceiling and floor effects and the impossibility
of matching for contour proximity across similar and dissimilar conditions.
OBJECT SUBSTITUTION MASKING
whether there is a match or mismatch between target and mask on
the other feature. The extent of masking should vary across conditions. Reporting of color should be affected only by match or
mismatch on color, and reporting of orientation should be affected
only by match or mismatch on orientation.
1427
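The two sets of predictions summarized above can be laid out side by side. The following is a minimal sketch, not part of the reported method: the function names are hypothetical, the returned values are ordinal only (higher means more masking), and the assumption that object-level masking rises with the number of shared features is made purely for illustration. The one property taken from the text is that the object-level account ignores which feature is reported, whereas the feature-level account depends only on the match for the reported feature.

```python
from itertools import product


def object_level_prediction(report, color_same, orient_same):
    """Object-level account: the whole target representation is substituted,
    so predicted masking does not depend on the reported feature.
    Illustrative assumption: masking rises with the number of shared features."""
    return int(color_same) + int(orient_same)


def feature_level_prediction(report, color_same, orient_same):
    """Feature-level account: masking of the reported feature depends only on
    whether target and mask match on that same feature."""
    same = color_same if report == "color" else orient_same
    return int(same)


def prediction_table():
    """List (target type, reported feature, object-level, feature-level)."""
    rows = []
    for color_same, orient_same in product([True, False], repeat=2):
        label = ("SC" if color_same else "DC") + "-" + ("SO" if orient_same else "DO")
        for report in ("color", "orientation"):
            rows.append((
                label,
                report,
                object_level_prediction(report, color_same, orient_same),
                feature_level_prediction(report, color_same, orient_same),
            ))
    return rows


if __name__ == "__main__":
    for row in prediction_table():
        print(row)
```

Under the object-level account the two report rows for any target type carry the same value; under the feature-level account they diverge whenever target and mask match on one feature but not the other.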
Method
Participants. Twenty University of Keele undergraduates with normal
or corrected-to-normal vision served as participants in the experiment in
partial fulfillment of a course requirement, half reporting target color and
half target orientation.
Equipment and stimuli. Stimuli were presented as for Experiment 1
and viewed from 70 cm. Following a warning tone, two horizontal blue
fixation lines appeared. After 500 ms they were joined by the target
display, containing two or three bars at the three potential target locations,
at the center of the screen and 3° to left and right. Target, distractor, and
mask bars were 1° × 0.2° and colored red (45, 0, 0) or orange (45, 28, 0).
They were photometrically matched for on-screen luminance. Simultaneously with target onset (control and trailing mask conditions) or 100 ms
later (delayed mask condition), one location was surrounded by four red or
orange horizontal masking bars, centered 0.5° above or below and 0.8° left
or right of the location center. After 17 ms, the target (if any) and distractor
bars offset. The mask bars offset either simultaneously (control) or after
being present for 500 ms (trailing mask and delayed mask). The target
location contained a target bar on two thirds of trials. Unmasked locations
always contained horizontal distractors the same color as the mask bars.
When a target was present at the masked location, it was equally often the
same color and orientation as the mask (SC–SO), different color and same
orientation (DC–SO), same color and different orientation (SC–DO), or
different on both features (DC–DO).
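The timing of the three mask conditions can be summarized as onset and offset times relative to target onset. This is a minimal sketch under one stated assumption: that the 500 ms for which the delayed mask was present is counted from its own onset, which the description above does not state explicitly. The function and constant names are hypothetical.

```python
TARGET_DURATION_MS = 17       # target and distractor bars offset after 17 ms
MASK_DELAY_MS = 100           # mask onset delay in the delayed mask condition
TRAILING_DURATION_MS = 500    # mask duration in the two trailing conditions


def mask_schedule(condition):
    """Return (mask_onset, mask_offset) in ms relative to target onset."""
    if condition == "control":
        # Mask onsets and offsets together with the target.
        return (0, TARGET_DURATION_MS)
    if condition == "trailing":
        # Mask onsets with the target and remains for 500 ms.
        return (0, TRAILING_DURATION_MS)
    if condition == "delayed":
        # Assumption: the 500 ms is counted from the delayed mask onset.
        return (MASK_DELAY_MS, MASK_DELAY_MS + TRAILING_DURATION_MS)
    raise ValueError("unknown mask condition: " + str(condition))


def mask_overlaps_target(condition):
    """True if the mask is on screen at any time while the target is present."""
    onset, _ = mask_schedule(condition)
    return onset < TARGET_DURATION_MS
```

On this reading, only the delayed mask never shares the screen with the target, which is the property that makes it useful for separating OSM from OFM.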
Procedure and design. We explained the task by using demonstration
trials with prolonged frame durations. Participants were informed that
response speed was unimportant and that their sole aim was to be accurate
in color or orientation decisions and response key selection. They were also
told (a) that they should not be reluctant to press the space bar to indicate
either “target absent” or that they were uncertain as to the color or
orientation of the target, and (b) that on one third of trials no target would
be presented. Central fixation throughout a trial was emphasized. Participants completed a practice block of 40 trials making color or orientation
judgments followed by 18 experimental blocks of 60 randomly ordered
trials of the same judgment. There were 60 target-present and 30 target-absent trials per combination of target type (SC/SO, DC/SO, SC/DO,
DC/DO) and mask condition (control/trailing/delayed).
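As a consistency check on the design just described, the trial counts per cell can be compared with the block structure. This is a minimal sketch using only the numbers given above; the function and dictionary key names are hypothetical.

```python
TARGET_TYPES = ("SC/SO", "DC/SO", "SC/DO", "DC/DO")
MASK_CONDITIONS = ("control", "trailing", "delayed")
PRESENT_PER_CELL = 60     # target-present trials per target type x mask condition
ABSENT_PER_CELL = 30      # target-absent trials per target type x mask condition
BLOCKS = 18               # experimental blocks
TRIALS_PER_BLOCK = 60     # trials per experimental block


def design_totals():
    """Compare the trial total implied by the cells with that implied by the blocks."""
    cells = len(TARGET_TYPES) * len(MASK_CONDITIONS)
    per_cell = PRESENT_PER_CELL + ABSENT_PER_CELL
    return {
        "cells": cells,
        "trials_from_cells": cells * per_cell,
        "trials_from_blocks": BLOCKS * TRIALS_PER_BLOCK,
        "absent_fraction": ABSENT_PER_CELL / per_cell,
    }
```

Both counts come to 1,080 experimental trials, and the proportion of target-absent trials is one third, in agreement with the instructions given to participants.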
Results
The false positive rate was very low, with participants correctly
pressing the space bar on 99% of target-absent trials. Responses on
target-present trials were either correct, omissions, or errors. Errors (giving the wrong color or orientation of a presented target)
occurred on only 1.1% of target-present trials, indicating that
participants were able to follow the instruction not to guess when
uncertain as to target color or orientation. A 2 × 3 × 4 ANOVA
on the error data with factors of reported dimension (color/orientation), mask condition (control/trailing/delayed-onset), and target
type (SC/SO, DC/SO, SC/DO, DC/DO) gave no significant effects
(F < 2, p > .15 in all cases). Because error rates did not differ
across conditions, guessing corrections were not applied, and we
report the percentage of correct responses (out of 60) for each
condition. The low error rate also means that the vast majority of
noncorrect responses were omissions (“no target” or
“don’t know”) and that the percentage of omissions was effectively the complement of the percentage of correct responses. The latter
figures are shown in Panel A of Figure 2 for reporting color and
Panel B for reporting orientation. For the simultaneous-offset
control condition, reporting of both dimensions was highly accurate for all target types. For the delayed-offset (trailing) mask and
delayed-onset mask conditions, data patterns and absolute accuracy levels were highly similar. In both, reporting of target color
was greatly improved by a target–mask difference in color and to
a lesser extent by a difference in orientation. Similarly, for both
conditions, reporting of target orientation was greatly improved by
a target–mask difference in orientation and to some extent by a
difference in color.
To compare the delayed-offset (trailing) mask and delayed-onset mask conditions, we conducted a 2 × 2 × 4 mixed ANOVA
with related variables of mask type (delayed-offset/delayed-onset)
and target type (SC/SO, DC/SO, SC/DO, DC/DO) and an unrelated variable of reported dimension (color/orientation). Mask type
had a nonsignificant effect and did not enter into any significant
interactions, F < 1 in all cases. We do not comment on other
aspects of this analysis because they recurred in the analyses that
follow and are considered in the next paragraph. Also, because
these two sets of data were indistinguishable, only one of them was
used in further analysis.
We calculated masking scores by subtracting for each participant his or her scores in the delayed-offset (trailing) mask condition from the corresponding score in the simultaneous-offset control condition. These data were entered into separate 2 × 2
within-participant ANOVAs for the report color and report orientation groups, with factors of target color (same/different) and
orientation (same/different). For the report color group, there were
significant main effects of target color, F(1, 9) = 36.72, p < .001,
and orientation, F(1, 9) = 23.67, p < .001. Target color had a
slightly larger effect when target and mask orientation were the
same rather than different, but this interaction effect was nonsignificant, F(1, 9) = 3.36, p < .1. For reporting orientation, there
were main effects of target color, F(1, 9) = 26.81, p < .001, and
orientation, F(1, 9) = 23.09, p < .001, and a significant interaction, F(1, 9) = 10.44, p < .01, because target orientation had a
larger effect when target and mask color were the same rather than
different.
Discussion
Experiment 2 was designed to provide two checks on whether
the pattern of data obtained with the delayed-offset (trailing) mask
reflected OSM, OFM, or a combination of the two. The first
involved comparing the simultaneous-offset and delayed-offset
conditions. Figure 2 shows that these produced very different
patterns of data, suggesting that the pattern for the delayed-offset
condition was due to OSM rather than to OFM. However, the
strength of this conclusion could be open to question. There is a
potential problem because performance with the simultaneous-offset mask was at or close to ceiling for all conditions. It is
possible that if performance with this control mask had been
sufficiently below ceiling, then the same pattern of results might
have been obtained as with the delayed-offset (trailing) mask,
which would have suggested that in both cases OFM was at work.
This would be consistent with the interpretation that says that the
more intense masking with the trailing mask was due to its greater
energy causing a higher degree of OFM. However, evidence
against this comes from the second check built into the experiment.
Results for the delayed-offset (trailing) mask and the delayed-onset (also trailing) mask are indistinguishable. If the former had
its effect largely by means of OFM, then much smaller masking
effects should have been caused by the latter. That this was not the
case indicates that the identical effects seen in both these conditions were the result of OSM, not OFM.
[Figure 2: bar charts in two panels, A (Report Colour) and B (Report Orientation). Y-axis: percentage correct score (0 to 100). X-axis: target type, defined by Target Colour (Same/Different) and Target Orient. (Same/Different), in the order Same/Same, Different/Same, Same/Different, Different/Different. Legend: Simultaneous Onset Control, Delayed Offset (Trailing) Mask, Delayed Onset Mask.]
Figure 2. Percentage of correct responses for report color (Panel A) and report orientation (Panel B) by target
type and masking condition in Experiment 2. Error bars denote the standard error of the mean.
In the two trailing mask conditions (delayed-offset and delayed-onset), a target–mask difference on either color or orientation
reduced masking more for that feature than for the other feature.
This indicates a degree of dimensional independence in OSM,
which in turn implies that what was being masked was not an
integrated representation of the target object, an object token.
Conversely, the fact that for reporting of both color and orientation, masking was also significantly reduced by a target–mask
difference on the other dimension suggests that independence of
dimensional processing may have been less than complete. The
data from Experiment 2 are, then, somewhat ambiguous with
respect to the predictions outlined earlier. Once again, however,
account has to be taken of the possible role of ceiling effects.
Performance in reporting of either dimension was close to ceiling
for all target types in the simultaneous-offset control condition
(and for some target types, particularly DC–DO, in the delayed-offset and delayed-onset conditions). By obscuring possible differences between target–mask conditions, these high levels of
performance may have distorted the calculated masking scores.
Experiment 3: OSM for Reporting Color and Orientation of Target
Our main aim in Experiment 3 was to test the same predictions as
described for Experiment 2 under conditions in which ceiling-level
performance could be avoided.
To this end, we made target signals for color and orientation
smaller in Experiment 3 by reducing the size of the stimulus
elements. Also, because the delayed-offset mask and the delayed-onset mask had produced indistinguishable results in Experiment
2, only the former condition was included in Experiment 3. Finally, instead of reporting either color or orientation of the target,
participants in Experiment 3 were required to report both features
with equal priority. We made this change because the ambiguous
evidence of dimensional selectivity of OSM observed in Experiment 2 might have resulted from participants selectively attending
to their to-be-reported dimension. Such a top-down attentional set
might have biased participants against forming an integrated representation of the target object, resulting in data that reflected only
a contingent and partial dimensional independence of masking.
Requiring participants to report both dimensions was intended to
ensure that they attended equally to both features and that, to the
extent this might be subject to top-down control, they were biased
to form an integrated representation of the target from which both
features could be read off.
Method
Participants. There were 10 new participants from the Open University, as described for Experiment 1.
Equipment and stimuli. Equipment and stimuli were as in Experiment
2, except the bar stimuli were one third of their previous length and
distances between locations were halved.
Procedure and design. The procedure and design were as in Experiment 2, except for the changes already indicated, and there were 48 rather
than 60 target-present trials per condition. As previously, one third of trials
contained no target. Half of participants reported color before orientation,
half the reverse.
Results
The false positive rate was again very low, with participants
correctly pressing the space bar on 98.5% of target-absent trials.
For target-present trials, responses were either correct, omissions,
or errors. As in Experiment 2, the large majority of noncorrect
responses were omissions. However, errors (reporting the wrong
color or orientation of a presented target) averaged 3% for the
simultaneous-offset control mask and 10% for the delayed-offset
(trailing) mask, being higher for color reports of same-color targets
than of different-color targets and higher for orientation reports of
same-orientation targets than of different-orientation targets. Because errors clearly were not evenly distributed across conditions,
guessing corrections were applied to individual participant data for
each combination of mask type and target type. For reporting
color, errors on SC/SO were subtracted from the number correct on
DC/SO and vice versa, and similarly for SC/DO and DC/DO. For
reporting orientation, errors on SC/SO were subtracted from the number correct on SC/DO and vice versa, and similarly for DC/SO and
DC/DO. The guessing-corrected scores are shown in Figure 3. The
correction procedure reduced effect sizes but did not alter the data
pattern. Accuracy for reporting color in the simultaneous-offset
control condition remained close to ceiling for all target types,
even with smaller stimuli and a requirement to report both target
dimensions. However, accuracy for reporting orientation in the
control condition was lower in all conditions. Data of the control
conditions were entered into a two-way related ANOVA with
factors of reported dimension (color/orientation) and target type
(SC–SO, DC–SO, SC–DO, DC–DO). There was a main effect of
reported dimension, F(1, 9) = 15.58, p < .01, and also of target
type, F(3, 27) = 5.34, p < .01 but, it is important to note, no
interaction, F(3, 27) = 1.80.
For the delayed-offset (trailing) mask condition, performance
for reporting both features was well below ceiling and, as can be
seen from Figure 3, target type appeared to have differential effects
on reporting of color and orientation. Masking scores were calculated as in Experiment 2, by subtracting guessing-corrected
delayed-offset scores from guessing-corrected simultaneous-offset
scores (see Figure 4). For reporting of each feature, a two-way
ANOVA was computed with factors of target color (same/different) and target orientation (same/different). For reporting color,
there was a main effect of target color, F(1, 9) = 6.13, p < .05, but
no effect of target orientation and no interaction, F < 1 in both
cases. For reporting orientation, there was a main effect of target
orientation, F(1, 9) = 14.13, p < .005, but no effect of target color
and no interaction (F < 1.3 in both cases).
Figure 3. Percentage guessing-corrected scores for reporting target color and orientation by target type and
mask condition in Experiment 3. Error bars denote the standard error of the mean.
Discussion
Comparing Figures 2 and 3 demonstrates that there are clear
similarities in the pattern of results obtained in Experiments 2 and
3. In both, a color difference was more important for reporting
color and an orientation difference more important for reporting
orientation. But whereas the results of Experiment 2 are ambiguous as to whether masking is wholly dimension specific, those of
Experiment 3 are rather more clear-cut. With performance in the
delayed-offset (trailing) mask condition well below ceiling, the
ability to report target color was affected only by a target-mask
color difference and not by an orientation difference. For reporting
target orientation, accuracy is affected mainly by a target–mask
orientation difference, although there is also some sign that a color
difference helped. However, the same slight benefit of a color
difference to reporting orientation is also seen for the
simultaneous-offset control condition. Presumably, this arose because color was the more salient stimulus dimension for these
stimuli, as shown by performance still being very close to ceiling
for reporting of color in the control condition. However, once
masking scores are calculated (see Figure 4), the dimensional
specificity of OSM becomes apparent. As borne out by statistical
analysis, and unlike in Experiment 2, OSM for reporting of each
feature was affected only by a target–mask difference on that
feature and not by a difference on the other feature.
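The guessing correction and masking scores described in the Results of Experiment 3 can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the analysis code used for the experiment: the function names are hypothetical, any counts passed in are invented for illustration, and expressing corrected scores as percentages of the 48 target-present trials per condition is inferred from the Figure 3 caption.

```python
# For each target type, the type whose errors are subtracted from its correct count.
# Color reports pair types that differ only in color; orientation reports pair
# types that differ only in orientation, as described in the Results.
COLOR_PAIRS = {"SC/SO": "DC/SO", "DC/SO": "SC/SO", "SC/DO": "DC/DO", "DC/DO": "SC/DO"}
ORIENT_PAIRS = {"SC/SO": "SC/DO", "SC/DO": "SC/SO", "DC/SO": "DC/DO", "DC/DO": "DC/SO"}


def guessing_corrected(correct, errors, pairs, n_trials=48):
    """Percentage corrected score per target type for one participant and mask type.

    correct, errors: dicts mapping target type to trial counts.
    pairs: COLOR_PAIRS or ORIENT_PAIRS, depending on the reported feature.
    """
    return {
        target: 100.0 * (correct[target] - errors[pairs[target]]) / n_trials
        for target in correct
    }


def masking_scores(control_pct, trailing_pct):
    """Masking score per target type: control minus delayed-offset (trailing)."""
    return {target: control_pct[target] - trailing_pct[target] for target in control_pct}
```

For example, with hypothetical counts of 24 correct and 12 errors on SC/SO and 36 correct and no errors on DC/SO, the corrected color scores are 50% for both target types, because the 12 SC/SO errors are subtracted from the 36 correct DC/SO responses.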
It might be objected that because reporting of color in the
simultaneous-offset control condition remained close to ceiling for
all target–mask combinations, therefore strong conclusions should
no more be drawn from the results of Experiment 3 than from
those of Experiment 2. This objection supposes that ceiling effects
are obscuring differences between target–mask combinations in
the report color control condition that, were they evident, would
alter calculated masking scores in such a way as to remove
evidence for dimensional independence of OSM. This would be
the case only if they were obscuring a benefit to reporting color of
a difference in orientation. A number of counterarguments can be
mounted against such a possibility. First, if ceiling effects were
distorting the data of the report color control condition when there
were clearly no such ceiling effects for report orientation, it might
well be expected that there would be an interaction between
feature reported and target–mask combination. Yet the analysis of
control data showed no interaction. Performance in the control
conditions was slightly higher for different color targets than for
same color targets, regardless of which feature was being reported.
This is wholly consistent with color being the more salient feature,
as evidenced by greater accuracy of report for color.
Secondly, as already noted, there were no ceiling effects for
reporting orientation and yet the evidence for feature independent
OSM is as strong in this case as it is for reporting color. This is all
the more striking in the light of the greater salience of target color.
Given the imbalance in saliency, it might well have been expected
that the more salient color difference signal would affect orientation judgments even if the reverse were not the case (Nothdurft,
2000). That such an asymmetry of effect was not observed suggests the independent processing of different feature dimensions
was robust. Of course, it is possible that with a sufficiently large
imbalance in feature saliency a difference on a more salient feature
would affect reporting of a less salient feature. However, in Experiment 3 the significant difference in saliency did not lead to
such an asymmetry of effect on reporting target features. Along the
same lines, the fact that participants had to attend to and report
both features of the target might have been expected to bias them
against independent processing of features. That it did not do so
once again suggests a robust independence of feature processing.
[Figure 4: bar chart. Y-axis: simultaneous minus delayed score (0 to 60). X-axis: target type, defined by Target Colour (Same/Different) and Target Orient. (Same/Different), in the order Same/Same, Different/Same, Same/Different, Different/Different. Legend: Report Colour, Report Orientation.]
Figure 4. Size of masking effect by target type for reporting target color and target orientation in Experiment
3. Error bars denote the standard error of the mean.
General Discussion
Experiment 1 demonstrated that OSM of a target feature can
occur even though the presence of the target has been detected
(i.e., when a representation of the target has been established but
masking prevents perception of its features). Experiments 2 and
3 examined whether the degree of OSM is a function of target–
mask feature similarity and whether target features become available for report conjointly or independently. The data show that
target–mask similarity on a particular dimension affected reporting
of that dimension only, indicating that target features are independently processed. In other words, consistent with Experiment 1,
OSM was occurring at the level of features rather than of integrated objects. In the remainder of this section we first address
some possible objections to the conclusions we have drawn. We
then relate the present findings about OSM to two other visual
phenomena and consider various theoretical implications of all
three phenomena.
OSM
That the extent of visual masking can be modulated by target–
mask similarity on such features as color, shape, and spatial
frequency has been shown several times previously (e.g., Breitmeyer, 1984; Growney & Weisstein, 1972; Uttal, 1970; Yellott &
Wandell, 1976). The present results confirm these earlier findings
but, when viewed in the context of a distinction between OFM and
OSM, also extend them in two important ways. First, whereas the
earlier studies were not designed to indicate whether target–mask
similarity effects are associated with OFM, OSM, or both, the
present experiments have shown that they can definitely be associated with at least OSM. Our conclusion that the extent of OSM
is a function of target–mask similarity (or dissimilarity) on feature
dimensions might appear to be challenged by two aspects of our
own data. Although detection performance in Experiment 1 was
less affected by four-dot masking than was discrimination, there
was such an effect on discrimination. Because target and mask
elements differed in many respects, this could be taken to show
that OSM occurs even when target and mask seem not to share
features. In fact, however, target and mask elements in Experiment
1 did share at least some features. They were the same color, which
Experiments 2 and 3, as well as a recent study by Moore and Lleras
(2005), showed to be an important dimension of similarity. They
also both contained right angles, though whether this would be
important we do not know. Indeed, our starting point for these
studies was precisely that little was known as to what, if any,
dimensions of similarity would be important for OSM.
We have shown that color and orientation are important dimensions but there could well be others such as degree of overlap of
spatial frequency content, closure, or goodness of figure. A second
aspect of our data that might also be thought to challenge our
interpretation comes from Experiment 3 in which, for both reporting color and reporting orientation, OSM was observed even when
target and mask differed on both dimensions. It could be argued
that given these feature differences, OSM should not have occurred. However, it should be noted that in Experiment 2 there was
no masking of targets that differed on both dimensions. Indeed, it
was in order to avoid such ceiling effects that the color and
orientation feature differences in Experiment 3 were deliberately
reduced in comparison with those of Experiment 2. Our overall
results indicate that the extent of OSM is a function of the degree
of target–mask similarity along various feature dimensions, of
which color and orientation are two. Varying target–mask separation (or overlap, or signal-to-noise ratio) on one of these feature
dimensions affects reporting of that dimension only and not reporting of the other. But for the stimuli of Experiment 3, the
feature values deliberately varied by such small amounts that even
targets differing on both dimensions were subject to some degree
of OSM. That feature-specific OSM has been demonstrated with
the present displays does not preclude the possibility that, with
attention spread across larger and busier displays including more
eccentric targets, OSM may be obtainable with target–mask combinations differing as much as possible on as many features as
possible. In addition to feature-level OSM, there may also be
object-level OSM (Lleras & Moore, 2003; Moore & Lleras, 2005;
Treisman & Kanwisher, 1998), a topic to which we return later in
this article.
The second way in which our results go beyond previous studies
of target–mask similarity is in demonstrating that under conditions
of OSM, the effect of similarity is a function of what target feature
is to be reported. That is, target–mask similarity cannot be specified simply in terms of the physical features of the stimulus
configuration but must also be defined with respect to what the
observer is reporting.2 We go on now to consider two other visual
phenomena that have been shown to exhibit this kind of task-specific tuning.
Feature Specificity of Pop-Out and Sparse Representation
Mounts and Melara (1999) asked participants to report the color
or orientation of a target that preattentively popped out of a
48-item array from which it differed in either color or orientation.
All items in the array were subject to interruption masking. Participants were better able to report the color than the orientation of
a color pop-out target and better able to report the orientation than
the color of an orientation pop-out target. Furthermore, and consistent with the results of Experiment 3, this was the case even
when both features had to be reported on every trial. Mounts and
Melara attributed the effect to the fact that in their experiments the
target was more discriminable from distractors on the pop-out
dimension than on the non-pop-out dimension. Unlike in the
present study, the design of Mounts and Melara’s experiments did
not distinguish between OFM and OSM nor allow an evaluation of
whether featural processing was wholly or only partially independent. Mounts and Melara interpreted their findings as evidence
against object-based theories of attention (e.g., Driver & Baylis,
1989; Duncan, 1984; Duncan & Humphreys, 1992; Egly, Driver,
& Rafal, 1994), according to which objects are selected in an
all-or-none fashion, so knowing an object’s color should entail
knowing equally its orientation and vice versa. Mounts and Melara
(1999) concluded that, whether linkages among features are set up
in terms of a common spatial location or of a common object
token, attentional selection is coordinated at the level of features.
They saw their results as consistent with more recent object-based
models of attention that allow for partially independent processing
of object features (Duncan, 1996; Logan, 1996, 2004). The present
data, especially from Experiment 3, push this line of reasoning still
further by evidencing wholly independent processing of different
features of the same object. But if object features are processed
completely independently, then in what sense can selection in
these conditions be said to be object based?
One approach to the issue is that of sparse representation, a view
advocated by various authors over the years (e.g., Hochberg, 1984;
MacKay, 1973; O’Regan, 1992; Rensink, 2000). According to this
view, there is no rich and detailed internal representation of the
visual world. Representation of a particular object or area will be
only as detailed as it need be for the task at hand. We initially
posed the question “Is what gets substituted in OSM the representation of an integrated object or of a bundle of stimulus features
that remain somewhat unbound and independent?” On the sparse
representation view, the question itself is faulty because it takes for
granted that, given adequate presentation conditions, visual representations inevitably get rich and detailed independently of the
task at hand.3 The alternative is that we never see whole objects
but only those aspects of an object relevant to what we need to do.
A frequently cited analogy is with the perception of an object held
in the hand (MacKay, 1973; O’Regan, 1992). The experience is of
a complete object although it is one based on a fragmentary
representation, with sensory input restricted to just those object
parts actually in contact with the skin (plus haptic information).
The experience of visual scenes and complete visual objects is
similarly derived from partial input and representation. According
to Rensink (2000) and others, a seen object seems to appear before
us as real and fully complete only because any particular property
can be made explicit as required simply by interrogating the
external world. In Rensink’s coherence theory (CT), preattention
yields proto-objects, which can be surprisingly detailed (for instance, including three-dimensional organization) but are coherent only over small spatial and temporal extents, having constantly
to be regenerated. Focused attention is a process by which one (or
a few) of these proto-objects acquires a high degree of coherence
such that it can retain its identity across brief interruptions, and its
various features and their interrelationships can be experienced as
required.
Footnote 2. We thank Jim Enns for pointing out this way of thinking about our
results.
Footnote 3. We thank Jim Enns for pointing this out to us.
CT has much in common with Treisman’s (1988) feature integration theory (FIT). For example, in CT attention selects a proto-object, which is transformed into a coherence field, or visual object
(Rensink, 2000). In FIT, attention selects “loosely organized feature bundles” (Wolfe & Cave, 1999, p. 17), the features of which are
then glued together to make an integrated visual object. So far, so
similar. Where the two approaches seem to differ is that in CT the
features of the attended object are represented only as required,
whereas in FIT all the features of the attended object are automatically represented, by which it is meant that they are bound
together. How many objects may be so richly represented at one
time depends in FIT on the overall perceptual and cognitive load
(Lavie, 2005; Treisman, 1995). Rensink (2000) was somewhat
ambivalent as to whether there is rich representation of an attended
object, first saying we may “instead of simultaneously representing
in detail all of the objects in our surroundings, represent only those
objects – and only those particular properties of those objects –
needed for the task at hand” (p. 1475, italics added) but then later
on more ambiguously suggesting, “An interesting possibility in
this regard is the binding problem may be illusory – it may be that
the properties of only one object at a time are ever bound together”
(p. 1484). A very strong version of the sparse representation view
has been defended by Enns and Austen (2003, in press), who have
argued that not all features of even a single attended object are
necessarily represented perceptually. Consistent with this view,
Droll, Hayhoe, Triesch, and Sullivan (2005) reported evidence that
object features may be represented only while they remain relevant
to action selection, and Rafal, Danziger, Giordano, Machado, and
Ward (2005) argued that explicit object representation is at the
level needed to select task relevant actions. Our results for Experiment 3, in which both features had to be reported, together with
those of Mounts and Melara (1999), show that representation of
task-relevant features is constrained independently by the quality
of the data (signal-to-noise ratio) on each feature.
Nevertheless, however sparsely features may be represented, they need to be linked to an object representation of
some kind. We turn now to consideration of token and type
representations and their possible involvement in OSM.
Repetition Blindness and Type–Token Binding
Another visual phenomenon shown to exhibit task-specific tuning is repetition blindness (RB). Kanwisher, Driver, and Machado
(1995) found that spatial RB is modulated by selective attention to
stimulus dimensions. Their participants were briefly presented
with two simultaneous characters (C1 and C2), each followed by
an interruption mask; each character could be one of three letters
and one of three colors. Participants reported the identity or color
of first the left (C1) then the right (C2) character, responding “no
character” when they thought a location had been empty. Reporting of the relevant feature (identity or color) was markedly less
accurate when its value was repeated across C1 and C2 than when
it was not repeated but was unaffected by repetition of the nonrelevant feature. For example, if both characters were red, reporting of their colors was less accurate than if they were different
colors, regardless of the identities of the two letters, whereas if
both letters were Es, reporting of their identities was less accurate
than if they were different, regardless of the colors of the two
letters. For reporting either feature there was RB for the relevant
but not for the irrelevant stimulus dimension, and this was shown
to be unrelated to unwillingness to repeat a response. As with the
present results, variation on an irrelevant dimension did not affect
accuracy of reporting the relevant dimension.
Kanwisher (1987; Kanwisher et al., 1995) has argued that RB
arises because although repeated characters (or colors) are usually
appropriately recognized (matched to stored types), they are less
likely than unrepeated characters to be individuated as distinct
perceptual tokens. This failure of type–token binding supposedly
applies equally to temporal or spatial repetition of items. Consideration of the type–token distinction raises the question of how it
may apply to the phenomenon of OSM. The present masking
displays were similar in many ways to the displays used by
Kanwisher et al. (1995) to demonstrate spatial RB. Consider those
conditions in which target and mask elements onset simultaneously (i.e., all conditions of all three experiments other than the
delayed mask condition of Experiment 2). The location of the
mask indicated which of the three potential targets had to be
reported on, so it was necessary to attend first to the mask elements
and then to the target itself, which was similar to Kanwisher’s
participants having to attend first to C1 then to C2. Just as C2
might repeat one, both, or neither of the features of C1, so targets
in our Experiments 2 and 3 repeated one, both, or neither of the
features of the mask. And in both cases, repetition on a dimension
reduced accuracy of report for that dimension—more RB in the
Kanwisher et al. study, more OSM in our experiments—whereas
repetition on the other dimension had no effect. Perhaps the two
performance deficits reflect a common processing failure. In the
RB literature, in which there has been an emphasis on similarity
between C1 and C2, this has been taken to be a failure to individuate separate occurrences of a type. In the OSM literature, in
which, especially in relation to four-dot masking, the emphasis
has been on dissimilarity between mask elements and the target,
the failure has been taken to be in maintaining a representation (of
some sort) of the target.
In OSM terminology, cells early in the system initially code for
features, with cells later in the system synthesizing candidate
object descriptions. However, drawing parallels between OSM and
RB reminds us that there is a distinction between type and token
representations (see also Lleras & Moore, 2003; Moore & Lleras,
2005). Just as the present experiments included no-target trials, the
experiments of Kanwisher et al. (1995) included trials in which
either C1 or C2 was omitted. As in our experiments, participants
were well able to distinguish target absence from a target glimpsed
indistinctly. Kanwisher et al. stressed that their results “reflect a
failure in binding the appropriate identity (type) to distinct object
representations (tokens) when a type is repeated, rather than a
complete failure to set up distinct tokens” (pp. 329–330). In the
terms of OSM theory, it may be that early visual cells not only
process an object’s features but also cause an object token representation to be set up on the master map of locations (Treisman,
1988). Later cells would then synthesize a candidate type representation, which reentrant processing would link (or not) to the
appropriate object token on the master map. Our findings with
reporting of both features, along with those of Mounts and Melara
(1999), imply that under impoverished conditions different features need not be equally well bound into the type representation
nor, consequently, into the token representation. (Indeed, for visual
objects as structurally and semantically pared down as brief colored bars, perhaps the type representation is no more than a loose
conjunction of more or less well-represented features.)
The essential idea behind the concept of OSM is that it is a form
of masking that takes place at the level of object representation (Di
Lollo et al., 2000; Enns, 2004; Enns & Di Lollo, 1997). The
present demonstration of feature-specific OSM indicates that what
is masked need not be an integrated object representation, but we
have already conjectured that, dependent on experimental conditions, there could be different forms of OSM involving either
feature-specific or object-level representations. Lleras and Moore
(2003; Moore & Lleras, 2005) have presented evidence for OSM
involving object-level representations, favoring the term object
token in their first study and object file in the second. Moore and
Lleras showed that OSM was reduced when mask and target were
different colors and when, through the use of apparent motion, the
mask appeared to slide across the target, so only incidentally
surrounding it briefly. Their interpretation was that “susceptibility
to OSM is determined by the extent to which separate object
representations can be established for the target and mask prior to
mask offset; when separate object representations can be established, target information associated with that representation can
be protected from OSM” (p. 1178). Although this is certainly an
interesting argument, we note that the reductions in OSM could
also be due to target and mask having differed on the features of
color and motion or stationarity, respectively. Although the idea of
separate object-level and feature-level OSM effects is an intriguing
possibility, empirically distinguishing between levels is likely to
pose considerable challenges (Scholl, 2001).
Although it is important to try to theoretically relate different
visual phenomena to each other, it would be unwise at this stage to
force the comparison between OSM and RB too far. It may be that
they are similar yet distinct phenomena. Only empirical investigation will reveal whether, for example, one can be obtained under
conditions in which the other cannot or whether it may be possible
to obtain additive effects of the two. These are issues we are
currently pursuing.
Summary
The present findings suggest that the theory of OSM needs to be
expanded to take account of the distinction between type and token
representations. OSM can occur independently for separate target
features (Experiments 2 and 3) and has similarities to RB in that
both phenomena can be thought of as failures of type–token
binding. However, there may be examples of OSM involving
masking of object tokens or object files, as well as examples
involving failure to bind feature and/or type representations appropriately to a token representation. Possibly, the answer to our
original questions may be that OSM can operate at more than one
level. Certainly, it can operate at the level of independent features.
References
Breitmeyer, B. G. (1984). Visual masking: An integrative approach. New
York: Oxford University Press.
Di Lollo, V., Enns, J. T., & Rensink, R. A. (2000). Competition for
consciousness among visual events: The psychophysics of reentrant
visual processes. Journal of Experimental Psychology: General, 129,
481–507.
Driver, J., & Baylis, G. C. (1989). Movement and visual attention: The
spotlight metaphor breaks down. Journal of Experimental Psychology:
Human Perception and Performance, 15, 448–456.
Droll, J. A., Hayhoe, M. M., Triesch, J., & Sullivan, B. T. (2005). Task
demands control acquisition and storage of visual information. Journal
of Experimental Psychology: Human Perception and Performance, 31,
1416–1438.
Duncan, J. (1984). Selective attention and organization of visual information. Journal of Experimental Psychology: General, 113, 501–517.
Duncan, J. (1996). Cooperating brain systems in selective perception and
action. In T. Inui & J. L. McClelland (Eds.), Attention and performance
XVI: Information integration in perception and communication (pp.
549–578). Cambridge, MA: MIT Press.
Duncan, J., & Humphreys, G. W. (1992). Beyond the search surface:
Visual search and attentional engagement. Journal of Experimental
Psychology: Human Perception and Performance, 18, 578–588.
Egly, R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention
between objects and locations: Evidence from normal and parietal lesion
subjects. Journal of Experimental Psychology: General, 123, 161–177.
Enns, J. T. (2004). Object substitution and its relation to other forms of
visual masking. Vision Research, 44, 1321–1331.
Enns, J. T., & Austen, E. (2003). Change detection in an attended face
depends on the expectations of the observer. Journal of Vision, 3, 64–74.
Enns, J. T., & Austen, E. (in press). Mental schemata and the limits of
perception. In M. A. Peterson, B. Gillam, & H. A. Sedgwick (Eds.), In
the mind’s eye: Julian Hochberg on the perception of pictures, film, and
the world. New York: Oxford University Press.
Enns, J. T., & Di Lollo, V. (1997). Object substitution: A new form of
masking in unattended visual locations. Psychological Science, 8, 135–
139.
Enns, J. T., & Di Lollo, V. (2000). What’s new in visual masking? Trends
in Cognitive Sciences, 4, 345–352.
Fehrer, E., & Biederman, I. (1962). A comparison of reaction time and
verbal report in the detection of masked stimuli. Journal of Experimental
Psychology, 64, 126–130.
Fehrer, E., & Raab, D. (1962). Reaction time to stimuli masked by
metacontrast. Journal of Experimental Psychology, 63, 143–147.
Growney, R., & Weisstein, N. (1972). Spatial characteristic of metacontrast. Journal of the Optical Society of America, 62, 690–696.
Hochberg, J. (1984). Form perception: Experience and explanations. In
P. C. Dodwell & T. Caelli (Eds.), Figural synthesis (pp. 1–30). Hillsdale,
NJ: Erlbaum.
Jacoby, L. L. (1998). Invariance in automatic influences on memory:
Toward a user’s guide for the process dissociation procedure. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 24, 3–26.
Kahan, T. A., & Mathis, K. M. (2002). Gestalt grouping and common onset
masking. Perception & Psychophysics, 64, 1248–1259.
Kanwisher, N. (1987). Repetition blindness: Type recognition without
token individuation. Cognition, 27, 117–143.
Kanwisher, N., Driver, J., & Machado, L. (1995). Spatial repetition blindness is modulated by selective attention to colour or shape. Cognitive
Psychology, 29, 303–337.
Lavie, N. (2005). Distracted and confused?: Selective attention under load.
Trends in Cognitive Sciences, 9, 75–82.
Lleras, A., & Moore, C. M. (2003). When the target becomes a mask:
Using apparent motion to isolate the object component of object-substitution masking. Journal of Experimental Psychology: Human Perception and Performance, 29, 106–120.
Logan, G. D. (1996). The CODE theory of visual attention: An integration
of space-based and object-based attention. Psychological Review, 103,
603–649.
Logan, G. D. (2004). Cumulative progress in formal theories of attention.
Annual Review of Psychology, 55, 207–234.
MacKay, D. M. (1973). Visual stability and voluntary eye movements. In
R. Jung (Ed.), Handbook of sensory physiology (Vol. VII/3A, pp. 307–
331). Berlin: Springer.
Moore, C. M., & Lleras, A. (2005). On the role of object representations in
object substitution masking. Journal of Experimental Psychology: Human Perception and Performance, 31, 1171–1180.
Mounts, J. R. W., & Melara, R. D. (1999). Attentional selection of objects
or features: Evidence from a modified search task. Perception & Psychophysics, 61, 322–341.
Neill, W. T., Hutchinson, K. A., & Graves, D. F. (2002). Masking by object
substitution: Dissociation of masking and cuing effects. Journal of
Experimental Psychology: Human Perception and Performance, 28,
682–694.
Nothdurft, H.-C. (2000). Salience from feature contrast: Additivity across
dimensions. Vision Research, 40, 1183–1202.
O’Regan, J. K. (1992). Solving the “real” mysteries of visual perception:
The world as an outside memory. Canadian Journal of Psychology, 46,
461–488.
Rafal, R., Danziger, S., Giordano, G., Machado, L., & Ward, R. (2002).
Visual detection is gated by attending for action: Evidence from hemispatial neglect. Proceedings of the National Academy of Sciences, 99,
16371–16375.
Rensink, R. A. (2000). Seeing, sensing and scrutinizing. Vision Research,
40, 1469–1487.
Schiller, P. H., & Smith, M. C. (1966). Detection in metacontrast. Journal
of Experimental Psychology, 71, 32–39.
Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition,
80, 1–46.
Spencer, T. J., & Shuntich, R. (1970). Evidence for an interruption theory
of backward masking. Journal of Experimental Psychology, 85, 198–203.
Tata, M. S. (2002). Attend to it now or lose it forever: Selective attention,
metacontrast masking and object substitution. Perception & Psychophysics, 64, 1028–1038.
Tata, M. S., & Giaschi, D. E. (2004). Warning: Attending to a mask may
be hazardous to your perception. Psychonomic Bulletin & Review, 11,
262–268.
Treisman, A. M. (1988). Features and objects: The fourteenth Bartlett
Memorial lecture. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 40(A), 201–237.
Treisman, A. M. (1995). Modularity and attention: Is the binding problem
real? Visual Cognition, 2, 303–311.
Treisman, A. M., & Kanwisher, N. G. (1998). Perceiving visually presented objects: Recognition, awareness, and modularity. Current Opinion in Neurobiology, 8, 218 –226.
Uttal, W. R. (1970). On the physiological basis of masking with dotted
noise. Perception & Psychophysics, 7, 321–327.
Wolfe, J. M., & Cave, K. R. (1999). The psychophysical evidence for a
binding problem in human vision. Neuron, 24, 11–17.
Yellott, J. L., & Wandell, B. A. (1976). Colour properties of the contrast
flash effect: Monoptic vs dichoptic comparisons. Vision Research, 16,
1275–1280.
Received August 17, 2005
Revision received April 19, 2006
Accepted April 19, 2006