
SWIFT explorations

2003, The mind's eye: Cognitive and applied …

SWIFT is a computational model of eye guidance in reading. It assumes (1) spatially distributed lexical processing, (2) a separation of saccade timing from saccade target selection, and (3) autonomous and parallel generation of saccades with inhibition by foveal targets. The model accounts for fixation probabilities as well as various measures of inspection time in their relation to lexical processing difficulty. We illustrate the dynamics associated with saccade generation and inhibition by foveal targets. In addition, we generate predictions for an experiment involving gaze-contingent display change.

Chapter 19

SWIFT Explorations

Reinhold Kliegl and Ralf Engbert

Introduction

Is it sufficient to assume that reading involves sequential shifts of attention from one word to the next, or are several words within the perceptual span processed in parallel? There is still a very productive controversy surrounding this issue, summarized recently by Starr and Rayner (2001). These authors concluded that

    One potential solution would be to abandon the serial framework of attention models of eye-movement control and replace it with a parallel mechanism. . . . Words would thus be processed in parallel, although the processing of information would be most accurate at the center of the attentional distribution. . . . However, such a model seems rather complicated and would be difficult to implement in a computational model. Thus a challenge for proponents of a parallel mechanism of attention during reading is to delineate the parameters of such a framework (Starr & Rayner, 2001, p. 162).

We proposed a model that fits this description (Engbert, Longtin & Kliegl, 2002).
The model is based on three principles: spatially distributed lexical processing, a partial separation of saccade timing from saccade target selection, and autonomous Saccade generation With Inhibition by Foveal Targets. From the last principle we also derived an acronym for the model (i.e., SWIFT). In the following we present a synopsis of the model components, presenting in greater detail our assumptions about foveal inhibition and the dynamics of saccade generation and saccade cancellation. In addition, we evaluate the model with respect to predictions for an experiment with gaze-contingent display changes. With these explorations of the SWIFT model, we aim at a greater transparency of its core principles and demonstrate the utility of such a computational model for accounts of extant data as well as the prediction of novel aspects of eye guidance in reading.

The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research. Copyright © 2003 by Elsevier Science Ltd. All rights of reproduction in any form reserved. ISBN: 0–444–51020–6

The SWIFT Model

Lexical Processing

Figure 19.1 provides an overview of the model components. The “Lexical processing” box encapsulates how words are processed relative to the current eye position. We call this processing “foveal lexical activity.” In contrast to other computational models such as E-Z Reader (Reichle, Pollatsek, Fisher & Rayner, 1998; Reichle, Rayner & Pollatsek, 1999) or the model proposed by Engbert and Kliegl (2001), we assume that the perceptual span encompasses four words, namely the word currently fixated as well as the one to the left and the two words to the right.
The (normalized) processing rate depends on fixation location: it is largest for the fixated word [parameter estimate: λ(0) = 0.798], considerably smaller for the left and right neighbors [λ(1) = λ(−1) = 0.077], and even smaller for the second word to the right [λ(2) = 0.048]. As the sum of the λs was fixed at 1.0, we used only two degrees of freedom for the three parameter estimates. Details about parameter estimation will be presented in the “Model evaluation” section; the model was estimated with a total of 11 free parameters (see Table 19.1). A word is processed as soon as it is within the perceptual span, leading to a dynamic change in what we call the lexical activity associated with this word. We assume that there is a maximum of lexical activity for each word, depending on its frequency and its predictability from the prior sentence context, using the specification proposed by Reichle et al. (1998), that is,

    L_n = (1 − p_n)(α − β log f_n),

where p_n represents the predictability of word n and f_n the printed frequency of word n; α and β are model parameters, which were estimated as 148.5 and 5.71, respectively. Thus, the maximum of lexical activity will range between 148.5 (= α) for an unpredictable, very-low-frequency word and 0 for a perfectly predictable word. Over time lexical activity increases from zero for unprocessed words to the maximum lexical activity and then decreases back to zero. The processing time associated with the increase in lexical activity is called lexical preprocessing; the processing time required for the return to the zero baseline is called lexical completion, again in rough analogy to the two-level processing introduced by Reichle et al. (1998). Obviously, preprocessing and completion could be conceived as two independent processes.
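As a minimal sketch of the formula above, the following Python snippet computes the maximum lexical activity of a word from its predictability and printed frequency, and checks that the four processing rates sum to one. The names `max_lexical_activity`, `ALPHA`, `BETA`, and `RATES` are chosen here for illustration, and the use of the natural logarithm is an assumption; the text does not state the base of the logarithm.

```python
import math

# Estimates reported in the chapter (see Table 19.1).
ALPHA = 148.5  # difficulty intercept
BETA = 5.71    # difficulty slope

# Processing rate by word offset relative to the fixated word (0 = fixated).
RATES = {-1: 0.077, 0: 0.798, 1: 0.077, 2: 0.048}


def max_lexical_activity(predictability, frequency, alpha=ALPHA, beta=BETA):
    """Maximum lexical activity L_n = (1 - p_n) * (alpha - beta * log f_n).

    The natural logarithm is an assumption made for this sketch.
    """
    if not 0.0 <= predictability <= 1.0:
        raise ValueError("predictability must lie in [0, 1]")
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return (1.0 - predictability) * (alpha - beta * math.log(frequency))
```

With these values, an unpredictable word of frequency 1 yields the upper bound of 148.5 and a perfectly predictable word yields 0, matching the range given in the text.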
Rather than estimating a single processing rate and preprocessing factor, we could estimate two separate rates for these processes, with the first process “running” from zero to maximum lexical activity and the second process from maximum to zero lexical activity. The remaining link between these two processes is that they use the same maximum of lexical activity for a given word as a stopping value for the process. As an analogy of this conceptualization it may be useful to interpret the maximum lexical activity of a word as an indicator of its “attractiveness” to the mind. In this sense, unpredictable or low-frequency words are more interesting to the mind and it takes longer for this interest to subside than in the case of predictable or high-frequency words. Note that the current formula based on printed frequency and predictability serves only as a descriptive interface to the general difficulty of the words in the perceptual span. Ideally, such lexical activity values should be provided by a theoretical account of sentence processing. Predictability and printed frequency are but convenient proxies of theoretically unspecified contributions of syntactic, semantic, and pragmatic sources of variance for the processing difficulty associated with a given word in its sentence context.

Figure 19.1: Schematic diagram of SWIFT. The main subsystems are saccade programming and lexical processing. These two subsystems are coupled via a foveally inhibited random timing system and a saccade execution system which moves the eyes during saccades (from Engbert et al., 2002).

Processing rates for the left and right parafoveal words could be constrained to the same estimate without loss of fit (i.e., 0.077).
However, it is well known that there is a processing asymmetry in the direction of reading. Indeed, in our initial qualitative specification of parameters we assumed a higher processing rate for the right than the left parafoveal word. In the quantitative estimation it turned out that this asymmetry was already captured in the parameter that allows preprocessing and completion rates to differ by a constant factor. This parameter (f = 62.5) suggests that lexical preprocessing is completed much faster than lexical completion. It turned out that lexical preprocessing usually occurs in the right parafovea and lexical completion in the right parafovea or fovea. If it occurred at all, processing in the left parafovea was restricted to lexical completion. Thus, lexical processing in the model reflects the well-known asymmetry of processing. Moreover, it allows us to interpret word-position dependent processing rates (i.e., the λs) as indicators of retinal acuity, which would be symmetric relative to the fixation location.

Saccade Programming

The second major model component specifies “Saccade programming”, which comprises (a) saccade initiation, (b) inhibition by foveal targets, (c) labile and nonlabile stages of timing (When?), (d) target selection (Where?), and (e) saccade execution (see Figure 19.1). In the following we describe each of these aspects.

Saccade initiation

In the SWIFT model saccade initiation occurs autonomously, after a random interval generated by a timer (t_s) (see “Random timing” box in Figure 19.1), an idea already implemented in our earlier sequential-attention shift model (Engbert & Kliegl, 2001). Assuming a gamma distribution, t_s was estimated with a mean of 187.1 ms and a standard deviation of 0.239 relative to the mean (i.e., 44.7 ms).
(A single value for the relative standard deviation was estimated for saccade initiation times and other saccade-related timing distributions; see next paragraph.) The assumption of autonomous saccade initiation is fundamentally different from E-Z Reader and its predecessor models (e.g., Morrison, 1984), where saccade initiation is strictly coupled to some aspect of lexical processing.

Inhibition by foveal targets

The intuition guiding the assumption of autonomous saccade initiation is that during reading we initiate saccade programs according to some preferred mean rate. However, we want to allow for some influence of lexical processes. Specifically, we assume that high lexical activity delays the saccade initiation. In other words, if there is a chance that comprehension may lag behind the autonomous generation of saccades and, consequently, comprehension and eye position threaten to desynchronize, then saccade initiation can be postponed. In Figure 19.1, this intervention is represented by the inhibitory link from the foveal lexical activity to the link between “Random timing” and “Saccade programming”. We assume an additive contribution of foveal lexical activity [h·a_k(t)] to the random interval generated by the timer (t_s). Thus, a new saccade program is started after the time interval t′ = t_s + h·a_k(t). For a word of maximum difficulty (α = 148.5, p = 0), no parafoveal preprocessing, and purely foveal processing, the maximum “inhibition time” h·a_k(t) amounts to 181 ms for the current set of parameter estimates [λ(0) = 0.798; h = 50.3] (see Appendix A). Thus, the next saccade will be initiated at the latest 181 ms after the value drawn from the distribution of saccade initiation times.
(The calculation in the appendix also shows that the maximum inhibition time does not depend on the precise value of h as long as h is sufficiently large; for an infinite value of h, the maximum inhibition time would be 186 ms. Thus, in principle this free parameter might not be necessary in future model versions.) In general, however, the delay will be much smaller because of lower lexical activity, parafoveal preprocessing, asynchrony of maximum lexical activity, and determination of inhibition time during foveal processing. Indeed, the amount of fixation time due to foveal inhibition amounted to less than 15% for low-frequency words in the simulation. Nevertheless, foveal inhibition was necessary to explain the dependency of first fixation duration on word frequency. We will discuss a similar proposal by Yang and McConkie (2001; McConkie & Yang, this volume) in the final section of this chapter.

Labile and nonlabile stages of saccade program (“when?”)

Once a saccade is initiated, that is, once a saccade program is started, we assume two stages, a labile and a non-labile stage. Assuming gamma distributions, labile and non-labile times were estimated to last on average 128.6 ms (SD = 30.7 ms) and 41.6 ms (SD = 9.9 ms), respectively. The model assumes that a saccade can be cancelled and saccade targets can be modified during the labile phase. A distinction between labile and non-labile stage is implemented in E-Z Reader as well (Reichle et al., 1998). However, E-Z Reader does not allow direct target modification. Rather, a saccade program is always initiated to the next word at the completion of the first stage of lexical processing (i.e., the familiarity check); target modification can occur indirectly through the cancelation of a saccade during the labile stage. In the SWIFT model, a new saccade can be initiated during the preparation of an older one. Such interactions of a new saccade program with an older one will be the topic of the next section.
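The timing distributions described so far share a single relative standard deviation, which by itself fixes the shape of each gamma distribution. The sketch below shows one way to turn a reported mean into gamma parameters and draw durations from it. It assumes the standard shape/scale parameterization (mean = shape × scale, SD = √shape × scale) and uses Python's `random.gammavariate`; the helper names `gamma_params` and `sample_duration` are hypothetical and not taken from the SWIFT implementation.

```python
import math
import random

REL_SD = 0.239  # relative SD shared by the saccade-related timing distributions

# Mean durations in ms as reported in the text.
MEAN_TIMER = 187.1      # random timer t_s
MEAN_LABILE = 128.6     # labile stage
MEAN_NONLABILE = 41.6   # non-labile stage


def gamma_params(mean, rel_sd=REL_SD):
    """Return (shape, scale) of a gamma distribution with the given mean
    and a standard deviation of rel_sd * mean."""
    shape = 1.0 / rel_sd ** 2
    scale = mean * rel_sd ** 2
    return shape, scale


def sample_duration(mean, rng, rel_sd=REL_SD):
    """Draw one gamma-distributed duration (ms) using the supplied RNG."""
    shape, scale = gamma_params(mean, rel_sd)
    return rng.gammavariate(shape, scale)


def gamma_sd(mean, rel_sd=REL_SD):
    """Standard deviation implied by gamma_params (equals rel_sd * mean)."""
    shape, scale = gamma_params(mean, rel_sd)
    return math.sqrt(shape) * scale
```

Under this parameterization the implied standard deviations round to the values quoted in the text (44.7 ms, 30.7 ms, and 9.9 ms), which is a useful consistency check on the reported estimates.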
Target selection (“where?”)

The distinction between “when?” and “where?” is motivated by neurophysiological results (e.g., Carpenter, 2000; Findlay & Walker, 1999; see Reilly & Radach, this volume, for another implementation of this distinction in a computational model of reading). Saccadic target selection (where to move next) is specified as largely independent of saccade timing (when to move). Target selection was estimated to occur after 87% of the labile phase, that is, on average after 112.1 ms in the current implementation. This dependence could be relaxed substantially in future versions of the model. The target of the next saccade, one of the words within the current perceptual window, is stochastically determined according to the values of current lexical activities. Thus, the word with the largest current lexical activity is the most likely target; perfectly predictable words with a lexical activity of zero will be skipped. The differences in processing rates associated with fixation position and the differences in lexical preprocessing and completion rates generate a “bow wave” of lexical activity pulling the eye in the direction of reading across the sentence. The reason is that the currently fixated word is processed at the highest rate (see above) and is likely to be already in the stage of lexical completion, with a continuous decrease in lexical activity, at the time of target selection. Words to the right of the fixated word are processed much more slowly than the fixated word and are likely to be still in the stage of lexical preprocessing, with a continuous increase in lexical activity (or perhaps even “trail” the fixated word in the stage of lexical completion). Consequently, at the time of target selection the lexical activity is likely to be lower for the currently fixated word than for its right neighbor.
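The selection rule described above can be sketched in a few lines of Python. This is only an illustration: it assumes that selection probabilities are directly proportional to current lexical activity, which the text implies but does not spell out, and the function name `select_target` and the optional winner-take-all switch are hypothetical.

```python
import random


def select_target(activities, rng, winner_take_all=False):
    """Pick the next saccade target from candidate words.

    activities maps a word index to its current lexical activity.
    Words with zero activity are never selected; if no word has positive
    activity, None is returned (a convention assumed for this sketch).
    """
    candidates = [(word, act) for word, act in activities.items() if act > 0]
    if not candidates:
        return None
    if winner_take_all:
        # Deterministic variant: the most active word always wins.
        return max(candidates, key=lambda pair: pair[1])[0]
    words, weights = zip(*candidates)
    # Stochastic variant: probability proportional to lexical activity.
    return rng.choices(words, weights=weights, k=1)[0]
```

For example, with activities `{0: 10.0, 1: 60.0, 2: 30.0, 3: 0.0}` the right neighbor (index 1) is the most likely target, the fixated word (index 0) can still be chosen as a refixation, and the fully processed word (index 3) is always skipped.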
Therefore, the latter is more likely to be selected as the next saccade target.

Consequences of target selection mechanism

The stochastics of the selection process as well as differences between maximum lexical activities will lead to refixations of the current word and refixations to the previous word. In addition, there is the possibility that the eye moves on before a word (i.e., the word to the left of the fixated one) is completely processed, with the residual lexical activity remaining at a constant level as long as the word is outside the perceptual span. These words will remain potential targets for saccades with a probability derived from their residual lexical activities. As the residual lexical activity is typically rather low, the selection is likely to occur late, when the eye has moved towards the end of the sentence. We call such regressions to words to the left of the current perceptual span “long regressions”. Their empirical plausibility (i.e., the accuracy of regressions to previous words in a text) has been demonstrated by Kennedy and Murray (1987).

Saccade execution

The end of the non-labile stage of saccade programming triggers “Saccade execution” (see Figure 19.1). At this time, lexical preprocessing is suspended but lexical completion continues. The rationale for this distinction is that lexical preprocessing requires perceptual input, which is suppressed or strongly attenuated during saccade execution. Lexical completion should not be affected by saccadic suppression. In the simulation, times for saccade executions were fixed with a mean of 25 ms and a standard deviation of 8.3 ms, assuming a gamma distribution of latencies. The saccade execution shifts the position of the eye, which in turn leads to a change in the foveal and non-foveal lexical activities.

Dynamics of Saccade Generation

Most of the model assumptions are fairly straightforward, but there are some intricacies associated with the initiation of saccade programs.
In particular, due to autonomous saccade generation it is possible that a new saccade program is started while the previous saccade program is still “under construction.” The effects of such parallel saccade programs on the resulting fixation durations depend on the stage of completion of the first saccade program (completed, labile stage, non-labile stage, or execution stage). As described in the last section, lexical processing has an effect on the initiation of the next saccade program solely via the inhibition by foveal lexical activity. Lexical activity can only delay the initiation of a saccade program; it cannot start or cancel a saccade program. Thus, if the currently fixated word is very difficult and if there is a high lexical activity when the initiation time for the next saccade is sampled, then the gamma-distributed latency will be extended accordingly. Therefore, for the following illustration the effects of lexical activity via foveal inhibition can be subsumed under the initialization time. Figure 19.2 illustrates a hypothetical sequence of saccade programs, assuming for now a deterministic scheme with labile stages lasting 125 ms, non-labile stages 50 ms, and execution times 25 ms. We also assume that targets are selected 100 ms after saccade initiation. Now we turn to a description of the various cases of overlap between saccade programs.

Non-overlapping Saccades

When a saccade program (SP) is started, the time for the initiation of the next saccade is determined as well. The initial saccade program SP0 is started with an initialization of t = 0 (see Figure 19.2). At the beginning of SP0 the initialization time for saccade program SP1 is sampled according to a gamma distribution; the dashed horizontal line indicates that SP1 is to start at t = 250 ms.
According to the above specifications, for the first fixation duration F0 we simply compute the sum of labile and non-labile stages, which amounts to 175 ms. The target for SP0 was specified after 100 ms during the labile stage. The first fixation (i.e., the nonlabile saccade program stage) is terminated with the execution of the saccade lasting 25 ms. Thus, the next fixation F1 starts at 200 ms. Note that there are still 50 ms of processing (250 ms − 200 ms) before SP1 is initiated. Thus, the initial saccade program does not overlap with the first one; SP1 is started during F1 at absolute time t = 250 ms. At this point in time a latency for SP2 is chosen with a value of 225 ms. Consequently, there is again enough time for SP1 to complete without interference from SP2: its labile phase will end at absolute time 375 ms and the non-labile phase at absolute time 425 ms (amounting to an F1 duration of 225 ms = 50 ms + 175 ms); target selection occurred at absolute time 350 ms. The execution of the second saccade lasts from 425 ms to 450 ms. This illustration shows that there is no simple link between fixation duration and the time intervals at which saccade programs are initiated, because the difference between the time for the initiation of the next saccade program and the time required for carrying out the current saccade program is allocated to the next fixation duration.

Saccade Initiation During Labile Stage of Current Saccade Program

At absolute time 475 ms, saccade program SP2 is started and the next saccade program SP3 is to be initiated with a latency of 75 ms. The initiation of SP3 falls in the labile stage of SP2. In this case, SP2 is simply canceled and replaced with SP3 (i.e., the duration in the labile stage is reset to zero; any target would be de-selected as well). Obviously such a cancelation extends the duration of the current fixation by the amount of time already spent in the labile stage of the canceled saccade program (see F2).
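The deterministic illustration can be replayed with a short Python sketch. It uses the stage durations stated for the example (labile 125 ms, non-labile 50 ms, execution 25 ms) and covers only two cases: non-overlapping programs and cancellation of a program whose successor starts during its labile stage. The function name `fixation_durations` is hypothetical, and the input below assumes that no further program starts after SP3, which the text does not specify.

```python
LABILE = 125     # ms, labile stage in the deterministic illustration
NONLABILE = 50   # ms, non-labile stage
EXECUTION = 25   # ms, saccade execution


def fixation_durations(start_times):
    """Fixation durations (ms) for saccade programs started at the given
    absolute times (ascending). The first fixation starts at t = 0.

    A program is cancelled if its successor starts during its labile stage;
    all other programs run to completion and trigger a saccade.
    """
    surviving = []
    for i, start in enumerate(start_times):
        has_next = i + 1 < len(start_times)
        if has_next and start_times[i + 1] < start + LABILE:
            continue  # replaced by the next program before leaving the labile stage
        surviving.append(start)

    durations = []
    fixation_start = 0
    for start in surviving:
        saccade_onset = start + LABILE + NONLABILE
        durations.append(saccade_onset - fixation_start)
        fixation_start = saccade_onset + EXECUTION
    return durations


# SP0..SP3 start at 0, 250, 475 and 550 ms; SP2 is cancelled by SP3.
print(fixation_durations([0, 250, 475, 550]))
```

The first two values reproduce F0 = 175 ms and F1 = 225 ms from the text. Under the stated assumption the third fixation lasts 275 ms, that is, 75 ms longer than it would have lasted had SP2 run to completion, which is exactly the time SP2 had already spent in its labile stage.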
Note also that cancelation of a saccade program implies that only one saccade program is active.

Figure 19.2: Dynamics of saccade generation in SWIFT. When a saccade program (SP) is started, the time for the initiation of the next saccade is determined as well. This will lead to non-overlapping (SP0 and SP1) and various cases of overlapping saccade programs. Initiation of the second program during the labile stage of the first one leads to cancellation of the first program (SP2 and SP3). Initiation during the non-labile stage does not interfere (SP4 and SP5), but the two programs cannot be in the non-labile stage at the same time (SP6 and SP7). In this case the second program has to wait. Fixation durations are shown next to the time axis (F0 to F6).

Saccade Initiation During Non-labile Stage of Current Saccade Program

The case of a saccade initiation during the non-labile stage of a saccade program is illustrated for SP4 and SP5. As implied by the name of the stage, there is no consequence for a saccade program in the non-labile stage. The program will run its course and land at the target word selected earlier. There is also no consequence for the initiation of the new saccade program. Thus, in this case there are two saccade programs running in parallel without interference. The same logic applies if the initiation of a new saccade program occurs during the execution of the current saccade.

Two Saccade Programs in Non-labile or Execution Stages

Durations of labile and non-labile stages were specified as stochastic variables. Consequently, the second saccade program might enter the execution stage while the first saccade is being executed. Obviously the eyes cannot go to two targets at the same time. In this case the later saccade program simply has to wait for the current one to be completed. This case is illustrated with SP6 and SP7. Note that this case can lead to very short fixations because the eye is at rest only for the duration of the nonlabile stage of SP7.

Model Evaluation

We used the corpus of Schilling, Rayner and Chumbley (1998) to fit the model and estimate parameters. The corpus comprises 48 sentences with a total of 536 words. Frequency and predictability values are available for each word. Moreover, Schilling et al. reported statistics of gaze, first, and single fixation duration as well as the probabilities of single, double, and zero fixations (i.e., skipping). Trials with regressions were discarded from the analysis. The same corpus was used by Reichle et al. (1998) and Engbert and Kliegl (2001). For each sentence we obtained 100 simulations. The model was fitted with 11 free parameters. These parameters, the best fitting values, and the associated standard deviations are listed in Table 19.1.

Table 19.1: Model parameter estimates.

Parameter                              Symbol      Value    SD
Lexical parameters
  Difficulty, intercept                α           148.5    3.6
  Difficulty, slope                    β           5.71     0.29
Processing rate
  Foveal                               λ(0)        0.798    0.017
  Parafoveal                           λ(1, −1)    0.077    0.017
  Parafoveal                           λ(2)        0.048    0.017
  Preprocessing                        f           62.5     5.8
Saccade parameters
  Random timing (ms)                   t_s         187.1    2.6
  Labile stage (ms)                    l           128.6    3.2
  Nonlabile stage (ms)                 n           41.6     4.7
  SD (relative to mean)                            0.239    0.021
  Target selection (% labile stage)    tar         0.872    0.056
  Inhibition factor                    h           50.3     14.1

Notes: Standard deviations (SD) were based on five runs of parameter estimation by genetic algorithm (see Engbert et al., 2002, Appendix A). Processing rates were estimated with the constraint λ(0) + 2λ(1, −1) + λ(2) = 1, yielding a total of 11 free parameters for the model.

The panels of Figure 19.3 represent three simulations of the same sentence using the parameter estimates of Table 19.1. The sentence read was: “Mark told Jane that he would meet her after baseball practice”. Time runs from left to right. The solid black line indicates the position of the eye at each point in time. Vertical gray lines indicate saccades; the vertical dashed line marks the end of reading. The thin lines above the solid eye-position line show the time course of lexical activity for each word. Finally, the solid black segments on the time axes indicate delays of saccade program initiation due to foveal inhibition. Obviously, aside from differences in total reading time between simulations, the three examples illustrate the high degree of complexity that follows from the theoretical principles and the inherent dynamics of the model, and the large variance of eye movement traces that result from them. The same principles and dynamics, however, also strongly constrain the type of paths the eye can take through the sentence (Engbert, Longtin & Kliegl, in press). Indeed, such statistics may prove very useful for comparing computational models with each other and with human data.

The traces also illustrate links between lexical activity and various types of eye movements. The top panel represents a trace consisting solely of forward moves. At the beginning the eye is on the first word and the model starts processing the first three words, as indicated in the increase and subsequent decrease of lexical activity visualized for each word. The rate of change is highest for the fixated word. After about 500 ms the eye moves to the second word.
The decision point to select this word as the next target occurred around 100 ms into the first fixation. Note that at this time lexical activity of the fixated word was probably already smaller than that of the second and the third word. The height of lexical activity indicates the probability with which a word is selected as the next target. Therefore, chances were high for the selection of the second or third word, with the second word “winning” in this case. Note that in the second and third trace for this sentence (middle and bottom panels of Figure 19.3) the third word was selected as the target, with the second word being skipped in the process. Words are also skipped if they are already processed completely in the parafovea of earlier fixations. Examples of such skips are the words “that”, “her” and “practice” in the first trace (top panel).

The second and third trace illustrate two different regressions to the skipped word “told”. In the second trace (middle panel of Figure 19.3), “told” has the highest lexical activity among the four words in the perceptual span (i.e., “told Janet that he”) during the second fixation and was selected as the target for the third saccade, leading to a regression back from “Janet”. The same mechanism will also yield refixations of a word, indicated with a circle on the trace. For example, in the second trace (middle panel) “told” was fixated twice, that is, it was selected again as target during the first fixation. Obviously, such refixations are more likely for low-frequency, unpredictable words. We already encountered regressive movements of the eye to an earlier word within the perceptual span. A word will also be the target of a long regressive movement if its processing was not completed while it was in the perceptual span.
For example, the word “told” in the bottom panel of Figure 19.3 was left in such an unfinished state because first “Janet” and then “he” were selected as targets, causing “told” to fall to the left of the perceptual span. In the model the residual lexical activity of this word will stay at its last value in the perceptual span until it is selected again as a target. Any word with residual lexical activity will compete for target selection irrespective of whether or not it still is in the perceptual span. Typically such a residual lexical activity is low due to earlier processing, and therefore the chances of being selected as a target are small as long as there are unprocessed words to the right. However, the predictability of words increases with serial word position and, consequently, maximum lexical activity will decrease across a sentence. Thus, as the eye approaches the end of the sentence, chances increase again that words with residual lexical activity will be selected. In the third trace (bottom panel of Figure 19.3) “told” was selected as a target when the eye fixated “would.” The following words had been processed completely; therefore reading resumes at the next word with lexical activity outside the perceptual span, which is “after” in this case. (Incidentally, “after” was first processed during the fixation on “meet”. Due to the subsequent regression, processing ceased while lexical activity was fairly high.) There is good evidence that such long regressions are typically very precise, suggesting that reading sets up a spatial representation of word locations (Kennedy, 2000). Experimental evidence for long regressions due to residual lexical activity would constitute very strong support for the model because, traditionally, such long regressions have been attributed only to high-level sentence parsing problems, such as revisions of an interpretation in a garden-path sentence.
Predictions relating to this distinction remain to be tested in experiments, but at this point they illustrate the innovative potential of a dynamic model.

From such simulations we can compile summary statistics of various measures of inspection time as well as various measures of fixation probability and compare them with experimental results. These comparisons are displayed in the two panels of Figure 19.4 as a function of logarithmic word frequency. In general, the model reproduces the qualitative patterns very well. Most notable are the increase in skipping probability and the drop in single-fixation probability for high-frequency words. Also the decrease in inspection time measures as a function of logarithmic word frequency is reproduced very well. In the current implementation the height of lexical activity at the time of target selection is used as a weight for determining the next saccadic goal. Explorations of the model showed that data could also be fitted if height alone determined the next target (i.e., a “winner-take-all” rule). Thus, the stochastic nature of target selection was not very critical. However, substantially lower parafoveal than foveal processing rates were critical for obtaining adequate model fits.

Simulation of Eye-contingent Display Changes

In this section, we demonstrate that SWIFT can be used to predict several measures of a typical reading experiment. Properties of the perceptual span in reading are fundamental to the model. To derive the model we used psychologically plausible assumptions about the perceptual window.
To test whether these assumptions lead to realistic behavioral patterns, we investigate predictions that can be obtained from numerical simulations with respect to an experimental manipulation of the display during reading (Binder, Pollatsek & Rayner, 1999; we restrict our analysis to the preview condition). In an eye-contingent display change experiment, a target word in a sentence was changed during preview. As soon as the eyes entered the target region, the preview word was replaced by the target word. The display change was performed during the saccade to the target region, and the subjects tested were unaware of this manipulation. Some results of the Binder et al. (1999) experiment are summarized in Table 19.2. If the preview was not changed, that is, if an identical base word appeared at the target location, a skipping probability of 0.30 resulted. If a different word was used during preview, that is, if the word was replaced with a different word during the approaching saccade, skipping probability decreased to 0.165. There was also an increase of first fixation durations and regression probability for the changed word. These results highlight the importance of parafoveal lexical processing during reading.

Figure 19.3: Trajectories for the same sentence from three simulation runs of SWIFT with parameter estimates of Table 19.1. Lexical activities (thin lines) are plotted over time together with the eye position (bold line). The execution of saccades is indicated by the shaded vertical regions. The beginning of a refixation is indicated by a circle. Foveal inhibition (delay of initiation of saccade program) is marked by the bold segments on the time axis. Sentence and data on word frequencies and predictability were taken from the Schilling et al. (1998) experiment.

Figure 19.4: Statistical evaluation of SWIFT performance; experimental data are from Schilling et al. (1998). (a) First fixation duration, gaze duration, and single fixation duration as a function of word frequency class (averaged over 1000 statistical realizations from SWIFT simulations, i.e. 1000 simulations of the model over the same corpus of sentences but with different pseudo-random numbers). (b) Probabilities for word skipping, performing a single fixation, and making two fixations (computed from the same runs as in (a)) as a function of word frequency class. (From Engbert et al., 2002.)

In our numerical simulations, we used a sentence of the corpus by Schilling, Rayner and Chumbley (1998) to demonstrate the effect of reduced preview in our model. Obviously, we only aimed for a qualitative reproduction of key results because of differences in the sentence material and experimental set-up. To this end, we reset the lexical activity of the target word to zero as soon as a saccade to the target region occurs (see Figure 19.5). As a result, we find a comparable reduction in skipping probability from 0.47 to 0.15. Furthermore, like Binder et al. (1999) we observe an increase in the first fixation durations in the changed compared to the identical preview condition (Table 19.2). Finally, to demonstrate the extraction of information to the left of the fixation point, we analyzed the probability of regressions to the target word. Our theoretical expectation was that the reset of lexical activity would lead to an increase in the number of regressions. For the target word used here, the probability of regressing was 0.042 without preview manipulation. The increase of the regression probability to 0.07 as a consequence of the reset of lexical activity is much smaller than the increase observed in the experiment by Binder et al. (1999).
This discrepancy, however, may be explained by the fact that, in the model, regressions are completely caused by incomplete lexical processing. Different sources of regressions, as reviewed for example by Rayner (1998), are beyond the scope of the current version.

Table 19.2: Gaze-contingent display changes: A comparison of SWIFT and Binder et al. (1999).

                                       SWIFT                    Binder et al. (1999)
Reading measure / Preview              Identical   Different    Identical   Different
Probability of skipping preview        0.470       0.150        0.300       0.165
First fixation on target (ms)          187         201          228         246
Probability of regressing to target    0.042       0.070        0.080       0.220

Note: Values for different preview of Binder et al. (1999) represent mean of related and unrelated preview changes.

Figure 19.5: Numerical simulation of the preview experiment by Binder et al. (1999). A preview display was changed during the first saccade entering the target region (word6). In our simulations, the lexical activity of the target word was reset to zero when the first saccade to the target region was performed. In the example trajectory shown here, this occurs at time t = 930 ms. As a result, the fixation duration of word6 increases. Also the word is skipped less often and is more likely to attract a regressive saccade.

In summary, the model can be used to predict typical measures used for the analysis of eye movement experiments with gaze-contingent display changes. The influence of preview manipulations and the mechanism of extraction of information to the left of the fixated word are qualitatively in good agreement with experimental results. These results underline the model’s psychological plausibility and how properties of the perceptual window can be thought to influence the dynamics of eye movements.
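The qualitative agreement summarized above can be checked directly from the values reported in Table 19.2. The following is a minimal Python sketch, not part of the SWIFT implementation: the dictionary keys and function names are hypothetical labels introduced here for illustration, and only the numbers are taken from the table. It computes, for each reading measure, the difference between the different-preview and the identical-preview condition, and then tests whether model and experiment change in the same direction.

```python
# Values transcribed from Table 19.2 (SWIFT simulation vs. Binder et al., 1999).
# Key names are illustrative labels, not identifiers from the SWIFT source code.
TABLE_19_2 = {
    "SWIFT": {
        "identical": {"skip_prob": 0.470, "first_fix_ms": 187, "regress_prob": 0.042},
        "different": {"skip_prob": 0.150, "first_fix_ms": 201, "regress_prob": 0.070},
    },
    "Binder": {
        "identical": {"skip_prob": 0.300, "first_fix_ms": 228, "regress_prob": 0.080},
        "different": {"skip_prob": 0.165, "first_fix_ms": 246, "regress_prob": 0.220},
    },
}


def preview_effect(source):
    """Return different-minus-identical preview differences for one data source."""
    identical = TABLE_19_2[source]["identical"]
    different = TABLE_19_2[source]["different"]
    return {m: round(different[m] - identical[m], 3) for m in identical}


def effects_agree_in_sign():
    """True if every measure changes in the same direction in model and experiment."""
    model = preview_effect("SWIFT")
    data = preview_effect("Binder")
    return all((model[m] > 0) == (data[m] > 0) for m in model)


if __name__ == "__main__":
    for source in TABLE_19_2:
        print(source, preview_effect(source))
    print("same direction for all measures:", effects_agree_in_sign())
```

Under these numbers all three measures move in the same direction in the simulation and in the experiment, while the size of the regression effect differs markedly (an increase of 0.028 in the model against 0.14 in the data), which is the discrepancy discussed in the text.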
Comparisons and Perspectives

Comparison with E-Z Reader

Computational models of eye movements in reading hold much promise for providing unifying accounts of rich and diverse sets of experimental results. Recent years have witnessed the emergence of a few attempts in this direction (Engbert & Kliegl, 2001; Reichle et al., 1998, 1999; Reilly & Radach, this volume). The SWIFT model was developed as an alternative to the E-Z Reader (Reichle et al., 1998), which at the time, in our opinion, was the most advanced computational model in the domain of attentional and ocular control during reading. Consequently, central assumptions guiding the design of E-Z Reader were adopted for SWIFT, such as two stages of lexical access (including the formula for combining word frequency and predictability) and the distinction between labile and nonlabile stages of saccade programs terminating with saccade execution. We also followed Reichle et al.’s (1998) lead in simulating reading at the level of words, not characters, which probably underestimates the role of perceptual factors given the correlation of word length and word frequency, but greatly reduced the complexity of model construction. Finally, like Reichle et al. (1998), we used the corpus of data from Schilling et al. (1998) to evaluate the model. In preliminary model comparisons, goodness-of-fit statistics were typically quite comparable. However, given the large architectural differences between the models, quantitative model fits may not be very meaningful. Rather, we want to point to a few qualitative differences between the models which we considered to be critical in the design of the SWIFT model.
There are three pieces of empirical evidence that may prove problematic for the notion of sequential attention shifts (SAS) in reading and, consequently, also for computational models subscribing to this assumption, such as E-Z Reader (Reichle et al., 1998) or our own earlier model of this type (Engbert & Kliegl, 2001) (for references see Engbert et al., 2002, p. 622f.): First, there is some evidence that processing of a fixated word is influenced by the difficulty of the next word. Second, there is evidence that information is picked up to the left of the fixated word. Third, there is little empirical support for longer fixation durations prior to skipped words, an implication of sequential attention shift models. The experimental evidence for the first problem is controversial. The second problem was handled at the cost of an increase in model complexity in more recent versions of the model (see Pollatsek, Reichle & Rayner, this volume). However, in our opinion, there is no easy solution to the third problem: In SAS models, word skipping requires that a saccade targeting the next word is cancelled and re-programmed to the following word. Saccade cancellation necessarily increases the fixation duration prior to the saccade. Indeed, both E-Z Reader and our own previous SAS model exhibited a strong effect of this sort (173 ms in E-Z Reader 5; 75 ms in Engbert & Kliegl, 2001). The three empirical studies cited by Reichle et al. (1998) in support of longer fixation durations prior to skipped words reported the following effects: 3 to 7 ms (Hogaboam, 1983, Tables 18.1, 18.3), 21 ms (Pollatsek, Rayner & Balota, 1986, p. 126), and 38 ms (Reichle et al., 1998, p. 147).
Given that the effect cannot be secured empirically in large data sets (e.g., McConkie, Kerr & Dyre, 1994; Radach & Heller, 2000), the SAS assumption should be reconsidered, and SAS models require, at a minimum, architectural revision. One solution might be to postpone target selection, similar to the partial separation of saccade initiation and target selection in the SWIFT model. Accordingly, for SWIFT simulations we observed increases of 10 to 21 ms in fixation durations prior to skipped words, which appears to be in agreement with the empirical data. Incidentally, in our opinion, it is to the credit of E-Z Reader and other SAS models that they can be seriously challenged with experimental evidence. A general problem of E-Z Reader relates to the order-of-processing methodology. Even slight modifications of the model typically increase the number of states to be considered and seriously limit the complexity of the dynamics that can be covered in the model. Given that SWIFT was designed with at least some of these problems in mind, it is not surprising that it is not affected by them. We should also point out that SWIFT does not (yet) account for the full scope of the Schilling et al. (1998) corpus; most notably, we did not fit distributions of fixation durations as a function of logarithmic word frequency.

Comparison with Competition–Inhibition Theory

Yang and McConkie (2001; also McConkie & Yang, this volume) proposed a Competition–Inhibition theory which is very specific with respect to the timing of saccades, that is, the “when?” component of saccade programs. Although the theory is not yet implemented as a computational model, there are two central assumptions of the SWIFT model that are conceptually very much in agreement with the Competition–Inhibition theory: autonomous timing of saccades and inhibition of foveal targets due to lexical factors.
First, autonomous timing of saccades represents active search for new information as well as predictions and expectations about where relevant information is to be found. This is very different from models assuming that the completion of lexical or cognitive processes triggers the initiation of new saccades; the familiarity check in the E-Z Reader model is such an example (Reichle et al., 1998). The proposal that the eye is acting on expectations that are corrected by sensory feedback if necessary (i.e., the “motor prediction” perspective of Wolpert & Flanagan, 2000) is also in line with more general theories of the relation of eye movements and complex actions such as driving or cricket batting (Land & Furneaux, 1997; Land & McLeod, 2000). It seems plausible that eye movements in reading will eventually be understood as a special case of a more general theory of eye movements and action. The second notion common to SWIFT and Competition–Inhibition theory is that lexical factors affect saccadic latency (and indirectly fixation duration) via an inhibition by foveal targets. The assumption is that the process of autonomous saccade generation can be delayed by lexical difficulty. In the current implementation of the SWIFT model, the sampled time interval for initiating the next saccade program increases with the lexical difficulty of the fixated word. Obviously, this prolongs the fixation on the current word, up to a maximum of an additional 181 ms. Yang and McConkie’s (2001) inhibition process is very sophisticated. They distinguish between three types of saccades (early saccades initiated about 100–125 ms after fixation; normal saccades initiated after 175–200 ms; and late saccades initiated after 225 ms). Their most relevant result for the current discussion is that display changes of text content (word to nonword) affected only the initiation of late saccades.
If there are qualitatively different saccade types and if lexical factors influence only the late ones, then SWIFT will have to be changed to accommodate this high degree of specificity, for example, by linking lexical processing with the labile phase of saccade generation conditional on some minimum amount of processing of the foveal word.

The Next Version of SWIFT

We opted for some glaring simplifications in the evaluation and specification of the SWIFT model. They were motivated by keeping model complexity down and model comparability high. There are at least three necessary extensions. The first two extensions are to increase the scope of the data base that should be accounted for without major revisions of the current model. Specifically, as mentioned earlier, unlike E-Z Reader the SWIFT model does not account for distributions of fixation durations as a function of logarithmic word frequency. Moreover, although SWIFT uses a single mechanism to generate all types of within-line eye movements (i.e., word-to-word saccades, skippings, refixations, and regressions), we have not modeled regression probabilities, perhaps also based on a distinction between regressions within the perceptual span and long regressions. The reason for this omission was that the Schilling et al. (1998) corpus had removed sentences with regressions prior to the analysis. Fitting regression probabilities requires a new corpus of eye movement data. The more serious extension of the SWIFT model concerns a switch from word-based to letter-based processing. The model should reproduce typical landing-position probabilities as a function of word length and saccadic launch distance (for a review of the relevant literature we refer to Radach and Heller, 2000). Moreover, the relation between fixation positions and fixation durations has been of lasting concern in eye movement research.
Specifically, there is evidence for two independent effects of fixation position on fixation duration: (1) Fixation durations are longer for fixations in the center of words, irrespective of whether they are single or first fixation durations (Vitu, McConkie, Kerr & O’Regan, 2001), and (2) fixation durations increase with the launch distance of the last saccade (Radach & Heller, 2000; Vitu et al., 2001). Finally, in this context it may be useful (or even necessary) to allow for a dynamic adjustment of the letter-based perceptual span in response to lexical difficulty. Obviously, such an extension requires a data base that includes information about landing positions of fixations at the letter level in addition to the statistics that were reported in Figure 19.4. Such an enlarged data base is necessary to constrain the model parameter space. If successful, such an extension would provide a very desirable modeling framework for the joint consideration of oculomotor, perceptual and low-level cognitive control issues.

Acknowledgments

This work was supported by Deutsche Forschungsgemeinschaft (DFG grants KL 955/3–1, 3–2, 3–3). A SWIFT applet and source codes of the model can be found at: http://www.agnld.uni-potsdam.de/~ralf/swift/. We thank André Longtin, Ralph Radach, Ronan Reilly, and an anonymous reviewer for constructive comments.

Address for correspondence: Reinhold Kliegl, Department of Psychology, University of Potsdam, PO Box 601553, 14415 Potsdam, Germany. E-mail: kliegl@rz.uni-potsdam.de (Reinhold Kliegl), engbert@rz.uni-potsdam.de (Ralf Engbert).

References

Binder, K. S., Pollatsek, A., & Rayner, K. (1999). Extraction of information to the left of the fixated word in reading. Journal of Experimental Psychology: Human Perception and Performance, 25, 1162–1172.
Carpenter, R. H. S. (2000). The neural control of looking. Current Biology, 10, R291–R293.
Engbert, R., Longtin, A., & Kliegl, R. (in press). Complexity of eye movements in reading. International Journal of Bifurcation and Chaos.
Engbert, R., Longtin, A., & Kliegl, R. (2002). A dynamical model of saccade generation in reading based on spatially distributed lexical processing. Vision Research, 42, 621–636.
Engbert, R., & Kliegl, R. (2001). Mathematical models of eye movements in reading: A possible role for autonomous saccades. Biological Cybernetics, 85, 77–87.
Findlay, J. M., & Walker, R. (1999). A model of saccade generation based on parallel processing and competitive inhibition. Behavioral and Brain Sciences, 22, 661–721.
Hogaboam, T. W. (1983). Reading patterns in eye movements. In: K. Rayner (ed.), Eye Movements in Reading. New York: Academic Press.
Inhoff, A. W., Radach, R., Starr, M., & Greenberg, S. (2000). Attention and saccade programming. In: A. Kennedy, R. Radach, D. Heller and J. Pynte (eds), Reading as a Perceptual Process. Amsterdam: Elsevier.
Kennedy, A. (2000). Attention allocation in reading: Sequential or parallel? In: A. Kennedy, R. Radach, D. Heller and J. Pynte (eds), Reading as a Perceptual Process. Amsterdam: Elsevier.
Kennedy, A., & Murray, W. S. (1987). Spatial coding and reading: Some comments on Monk (1985). Quarterly Journal of Experimental Psychology, 39A, 649–718.
Land, M. F., & Furneaux, S. (1997). The knowledge base of the oculomotor system. Philosophical Transactions of the Royal Society London, B352, 1231–1239.
Land, M. F., & McLeod, P. (2000). From eye movements to actions: How batsmen hit the ball. Nature Neuroscience, 3, 1340–1345.
McConkie, G. W., Kerr, P. W., & Dyre, B. P. (1994). What are “normal” eye movements during reading: Toward a mathematical description. In: J. Ygge and G. Lennestrand (eds), Eye Movements in Reading. Oxford: Elsevier.
Morrison, R. E. (1984). Manipulations of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667–682.
Pollatsek, A., Rayner, K., & Balota, D. A. (1986). Inferences about eye movement control from the perceptual span in reading. Perception & Psychophysics, 40, 123–130.
Radach, R., & Heller, D. (2000). Spatial and temporal aspects of eye movement control. In: A. Kennedy, R. Radach, D. Heller and J. Pynte (eds), Reading as a Perceptual Process (pp. 165–191). Oxford: Elsevier.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105, 125–157.
Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading: Accounting for initial fixation locations and refixations within the E-Z Reader model. Vision Research, 39, 4403–4411.
Schilling, H. E. H., Rayner, K., & Chumbley, J. I. (1998). Comparing naming, lexical decision, and eye fixation times: Word frequency effects and individual differences. Memory & Cognition, 26, 1270–1281.
Starr, M. S., & Rayner, K. (2001). Eye movements during reading: Some current controversies. Trends in Cognitive Sciences, 5, 156–163.
Vitu, F., McConkie, G. W., Kerr, P., & O’Regan, J. K. (2001). Fixation location effects on fixation durations during reading: An inverted optimal viewing position effect. Vision Research, 41, 3513–3533.
Wolpert, D. M., & Flanagan, J. R. (2000). Motor prediction. Current Biology, 11, R729–R732.
Yang, S.-N., & McConkie, G. W. (2001). Eye movements during reading: A theory of saccade initiation times. Vision Research, 41, 3567–3585.
Appendix A: Analytical Calculation of the Theoretical Maximum of Inhibition Time

In the SWIFT model, the time between two subsequent decisions to start a saccade program is given by a random time interval $t_s$ and an additive contribution of foveal inhibition, $h \cdot a_k(t)$. Let us denote the time of the end of the last saccade (or, equivalently, the start of the current fixation) by $t = 0$. The next command to start a saccade program is then generated at time

$$t' = t_s + h \cdot a_k(t). \tag{1}$$

The theoretical maximum of the contribution of the inhibition process, i.e. $\max\{h \cdot a_k(t)\}$, can be calculated. For simplicity, we can choose $t = t'$. The inhibition mechanism reaches its maximum under three conditions:

1. The random component is, by chance, zero: $t_s = 0$.
2. There has been no preprocessing of the foveal word: $a_k(0) = 0$, i.e. the lexical activity of the foveal word is zero at the start of the fixation. Since lexical preprocessing time is short compared to lexical completion, however, we assume that $a_k(0) = L_k$ to further simplify our calculations.
3. The foveal word has a very low frequency, which implies that its lexical difficulty is $L_k = \alpha$.

Since the foveal word is lexically processed with rate $\lambda(0)$, its lexical activity decreases linearly according to the relation

$$a_k(t') = \alpha - \lambda(0) \cdot t'. \tag{2}$$

Combining this equation with Equation 1 for $t_s = 0$, i.e. $t' = h \cdot a_k(t')$, we obtain the relation $t'/h = \alpha - \lambda(0) \cdot t'$, which can be rearranged to the final equation for the maximum of the inhibition time,

$$t' = \frac{\alpha}{\lambda(0) + \frac{1}{h}}. \tag{3}$$

We interpret this result by discussing the two limiting cases of a vanishing or infinite inhibition parameter $h$:

$$h \to 0: \quad t' = 0, \qquad\qquad h \to +\infty: \quad t' = \frac{\alpha}{\lambda(0)}. \tag{4}$$

In the first case, without inhibition, the maximum of $t'$ is obviously zero. In the second, and more interesting, case, even an arbitrarily large inhibition parameter leads to a finite contribution of $t' = \alpha/\lambda(0) = 186$ ms (for the estimated value $h = 50.3$, we obtain a slightly lower value, $t' = 181$ ms). The small increase of 5 ms between the cases $h = 50.3$ and $h \to +\infty$ also explains the large standard deviation estimated for the inhibition factor $h$ (see Table 19.1).
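The closed-form result of Equation 3 and its two limiting cases can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the values chosen for the lexical difficulty of the foveal word (`ALPHA`) and the foveal processing rate (`LAM0`) are illustrative numbers picked only so that their ratio equals the 186 ms limit reported above; they are not the published parameter estimates. Only h = 50.3 and the values 186 ms and 181 ms come from the text. The sketch also solves the implicit relation t′ = h · a_k(t′) by bisection to confirm that it agrees with the rearranged formula.

```python
# Illustrative values (assumptions): chosen so that ALPHA / LAM0 = 186 ms.
LAM0 = 0.72            # foveal lexical processing rate (assumed)
ALPHA = 186.0 * LAM0   # lexical difficulty of a very low-frequency word (assumed)
H_EST = 50.3           # inhibition parameter as reported in the text


def max_inhibition_time(alpha, lam0, h):
    """Closed form of Equation 3: t' = alpha / (lam0 + 1/h)."""
    return alpha / (lam0 + 1.0 / h)


def max_inhibition_time_bisect(alpha, lam0, h, tol=1e-9):
    """Solve t' = h * (alpha - lam0 * t') for t' by bisection.

    f(t) = t - h * (alpha - lam0 * t) is increasing in t, negative at t = 0
    and positive at t = alpha / lam0, so the root lies in between.
    """
    lo, hi = 0.0, alpha / lam0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - h * (alpha - lam0 * mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("h = 50.3      :", round(max_inhibition_time(ALPHA, LAM0, H_EST), 1), "ms")
    print("h -> infinity :", round(max_inhibition_time(ALPHA, LAM0, 1e12), 1), "ms")
    print("h -> 0        :", round(max_inhibition_time(ALPHA, LAM0, 1e-12), 1), "ms")
```

Because the inhibition time saturates at the ratio of lexical difficulty to processing rate, large changes in h around its estimate move the maximum by only a few milliseconds, which is consistent with the remark above about the large standard deviation of the estimate of h.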