The Rolling Cross-Section and Causal Attribution: Henry E. Brady and Richard Johnston
paign. The illustration shows the limitation of a panel design in causal attribution, but it also shows the limits of the rolling cross-section. This forces us to ask just what the rolling cross-section is and how it might be deployed, the topic of the second section. Then follows an exposition of the logic of the primary method of compensating for the potential lack of statistical power: graphical smoothing. Part of the argument is for graphs as such: the rolling cross-section makes their use both desirable and relatively unproblematic. They are desirable in that they greatly facilitate primary research—not to mention exposition—where a major element in analysis is real time. They are relatively unproblematic because of the random assignment of each respondent to an interview date; controls for respondents' accessibility are just not required. But the smallness of daily samples forces graphical data to be smoothed, and choices among smoothing alternatives are not simple. We end with a discussion of a mixed design and a quick overview of other literature on analyzing the rolling cross-section design.
An Example
Johnston, Hagen, and Jamieson (2004) argue that a pivotal feature of the 2000 campaign was a shift in perceptions of Al Gore's character, in particular of his honesty. It would be natural for a researcher to assume beforehand that one of the major campaign events causing opinion shifts would be the presidential debates and to design a panel to capture possible shifts. Figure 1 certainly points in this direction. Figure 1a sets up data from the 2000 National Annenberg Election Survey (NAES)1 as if resources had been committed to a simple three-wave panel with interviews before the debates (September and the first two days of October), between the first and last debate (October 3 to 16), and after the last debate (October 17 to the end). Mean values for Gore's honesty rating are indicated by solid horizontal bars, with 95 percent confidence intervals around them, for each of the three periods. For interpretive ease, ratings have been rescaled to the −1 to +1 interval, with values below zero conveying negative judgment.2 The narrow confidence intervals reflect the massive accumulation of sample in the NAES.
Fig. 1. Debates and perceptions of Al Gore's honesty. (a) Pre–post means; whole-period estimates; dashed lines are approximate 95 percent confidence interval. (b) Daily means; daily estimates; dashed lines are approximate 95 percent confidence interval. (Data from 2000 Annenberg Election Survey.)

Unquestionably, Al Gore was better regarded before the first debate than after it. The predebate mean is positive while the postdebate mean is negative. The confidence intervals suggest that there is no possibility that these results were generated from the same underlying distribution. If any debate mattered, it must have been the first one, as the first shift is both larger and statistically less ambiguous than the second one. There is a suggestion that opinion on Gore deteriorated further after the second debate, but even the large sample sizes in the period do not allow us to reject the null hypothesis of no difference between the days before and the days after the last debate.
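For readers working with a respondent-level release of the data, panel a of figure 1 can be reproduced with a few lines of code. The sketch below is in Python; the file name, column names, and exact period boundaries are illustrative assumptions rather than the NAES codebook's labels.

```python
import numpy as np
import pandas as pd

# Hypothetical respondent-level file: one row per interview, with the
# interview date and the rescaled (-1 to +1) honesty rating for Gore.
df = pd.read_csv("naes2000_honesty.csv", parse_dates=["interview_date"])

# Cut the field period at the debate dates, mimicking a three-wave panel:
# before the first debate, between the first and last debate, after the last.
bins = [pd.Timestamp("2000-09-01"), pd.Timestamp("2000-10-03"),
        pd.Timestamp("2000-10-17"), pd.Timestamp("2000-11-08")]
labels = ["pre-debates", "between debates", "post-debates"]
df["period"] = pd.cut(df["interview_date"], bins=bins, labels=labels, right=False)

# Period means with approximate 95 percent confidence intervals (mean +/- 1.96 SE).
summary = df.groupby("period")["gore_honest"].agg(["mean", "std", "count"])
summary["se"] = summary["std"] / np.sqrt(summary["count"])
summary["ci_low"] = summary["mean"] - 1.96 * summary["se"]
summary["ci_high"] = summary["mean"] + 1.96 * summary["se"]
print(summary[["mean", "ci_low", "ci_high", "count"]])
```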
But how do we know that any debate was critical? The data were
cut arbitrarily at the dates of the public events to simulate the results
from a panel designed on the premise that debates are crucial moments
in the history of a campaign. It is, of course, a reasonable supposition
that if anything has dynamic impact, debates will. But the analysis is
based on that supposition, not on any consideration of actual dynamics.
Certainly, if one is precommitted to a panel design, it would not make
sense to mark the boundary between interview and reinterview at any-
thing other than a major public event. The campaign is about more than
public events, however.
Figure 1b suggests that using only a crude pre-/postevent design might lead to an inappropriate causal attribution. In this panel the NAES data are fully rolled out as a daily tracking. The data are noisy, of course, as indicated both by the amount of surplus day-to-day vertical movement in the data and by the 95 percent confidence interval, which is an order of magnitude larger than those in figure 1a. Notwithstanding the noise, the first debate does not seem to be the whole of it. Values in the week or so before the first debate are lower than those that typify early September. There is a strong hint, then, that downward movement started even before the debates. But there is also a suggestion that emphasis on the first debate is not entirely misplaced. The day after the first debate witnessed a sharp drop in Gore's rating. The drop was not the largest of the series, but it is one of the few that were not corrected by immediately following observations. Could it be that the debate accelerated the decay? The picture also suggests that other debates did not affect Gore's ratings. The entire drop after the first debate occurred in the first few post–first debate days.
The picture so far is unclear. The temporally crude but statistically powerful periodization in panel a confirms that Gore's ratings dropped. There is no question that a sharp contrast exists between the periods before and after the first debate. But figure 1b indicates that focus on the first debate does not do justice to the data. Movement probably predated the debate and may not have been affected by it. Then again, it might have, and identification of the real predebate turning point is next to impossible. As the rolling cross-section data are presented in panel b, they are powerful enough to undermine an exclusive emphasis on the debate but not powerful enough to underpin a conclusive alternative interpretation.
The Design
What is the design that gets us to this point? In essence, a "rolling" cross-section is just a cross-section of respondents, but with a twist. In any survey, when the list of potential respondents is released to interviewers to begin the process of contacting them for an interview, the interviewers are asked to follow a careful mix of calling at different times of the day and on different days of the week in order to maximize the chance of eventually finding the respondent at home. The process of completing interviews in this way is called "clearing the sample." Aggressive and systematic clearance compensates for the accidents of daily life that cause people to be away from their telephone at different times. Much of the variation in the quality of surveys and of survey houses lies in the willingness to spend money on clearance. As a result, any self-respecting survey will have several days for clearance built into it.
At the same time, the more such days, the more vulnerable the sur-
vey will be to changes in responses because of real events. People called by
pollsters after September 11, 2001, for example, had much different atti-
tudes on terrorism and defense than people called just before the tragic
events of that day. But to complicate things, some of the apparent effect of
time will not be from events in real time but from differences in the respon-
dents: from any sampling frame, respondents interviewed later in the clear-
ance period are likely to differ systematically from those easier to reach
and thus interviewed earlier (Dunkelberg and Day 1973; Hawkins 1975;
Groves 1989). Disentangling impact from factors evolving in real time
from impact due to mere accessibility of respondents is a formidable task.
But failure to take the task on may lead an analyst to misrepresent the data.
The rolling cross-section design converts the “bug” of temporal het-
erogeneity into a “feature.” The steps in executing the design for a tele-
phone survey are as follows.
day, such that by the end of this day over 60 percent of interviews that
would ultimately be completed were in the bank. Thereafter, increments
were small: under 10 percent of ultimate completions for days 3 and 4 and
under 5 percent for all succeeding days. By one week out, over 90 percent
of ultimate completions had been recorded, and by two weeks (the nomi-
nal end of the interviewing window for any replicate) over 99 percent of
interviews were in the bank. As it happens, one interview from the July 5
sample was conducted four weeks after release.
Now imagine the transposition of this sequence into the completion pattern for replicates released on later days. If the second day's replicates have exactly the same distribution as the first, then about 115 to 120 interviews on day 2 will be from that day's replicates, and about 50 interviews completed that day will be from replicates released the day before. The total number of interviews on day 2 should be about 165. On the third day, another 115 to 120 interviews will accrue from that day's release, along with about 50 from day 2 and 20 from day 1, for a total of about 210. The daily total will build for about two weeks, at which point earlier replicates will have been exhausted and dropped. From this point on, we can say that the day on which a respondent is interviewed is the product of a random draw.4 Practically speaking, this is effectively true after about a week of interviewing, so that from that time on the group of people interviewed on each day can be treated as a representative cross-section of the population.
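A back-of-the-envelope simulation can make the arithmetic of overlapping replicates concrete. The sketch below assumes a within-replicate clearance distribution chosen only to echo the percentages quoted above; it is not the NAES field report's actual distribution.

```python
import numpy as np

# Assumed within-replicate clearance distribution: the share of a replicate's
# ultimate completions obtained on day 1, day 2, ... after release.  These
# numbers are illustrative only, chosen to mimic the text (over 60 percent
# banked by day 2, about 90 percent by one week, essentially all by two weeks).
clearance = np.array([0.42, 0.19, 0.09, 0.08, 0.05, 0.04, 0.03,
                      0.03, 0.02, 0.02, 0.01, 0.01, 0.005, 0.005])
clearance = clearance / clearance.sum()

replicate_size = 300          # target completions per daily replicate
n_days = 30                   # days of daily replicate release

# Each day a fresh replicate is released; completions on any calendar day are
# the sum of contributions from all replicates still being cleared.
daily_completions = np.zeros(n_days + len(clearance))
for release_day in range(n_days):
    daily_completions[release_day:release_day + len(clearance)] += \
        replicate_size * clearance

for day, total in enumerate(daily_completions[:21], start=1):
    print(f"day {day:2d}: about {total:3.0f} completed interviews")
# After roughly two weeks the daily total settles near the replicate size,
# and the mix of easy- and hard-to-reach respondents is the same every day.
```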
Reality is slightly messier, but only slightly, according to figure 3. This figure tracks actual completions from July 5 to Election Day. The actual number of completions on July 5 is larger than the number implied in figure 2, as the NAES was already in the field. Before the Independence Day holiday, only one replicate was released per day, with an average daily completion rate of 50 interviews. The ramping up of fieldwork required close to two weeks, although the presence of open numbers from before July 5 accelerated the uptake modestly. In any case, from mid-July on, completions oscillated around the 300-person target.5
Advantages
For all the apparent complication of sample release and clearance, what re-
sults is just a set of daily cross-sections that can be combined into a large
cross-section. And almost any temporal subsample can be combined into a
Fig. 3. Completed interviews by day, July 5 to Election Day. (Data from 2000
Annenberg Election Survey.)
Low Cost
Inescapably, the NAES response rate started to drop two weeks before the end, and the same will be true, mutatis mutandis, for any rolling cross-section.8 Initial response-rate decline is tiny, but it accelerates, essentially, on the temporal inverse of the pattern in figure 2b. The crucial point, however, is that each daily cross-section is representative of the population.
Fresh Respondents
say, the immediacy with which impact unfolds and/or in differences in time path between critical partitions of the sample (for example, whether or not the respondent saw any of the debate, or the respondent's general exposure and attention to the mass media), relevant data not compromised by accessibility bias appear immediately after the event.11 Meanwhile, joining up any combination of consecutive days is unproblematic. From the sampling perspective, all that adding or subtracting a day does is reduce or expand the standard error. No other aspect of sample selection is being tweaked in the slightest. So the design combines flexibility in periodization with power in combination of days.
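Because each day is an independent random draw from the same frame, pooling k consecutive days simply multiplies the effective sample size by k and shrinks the standard error by the square root of k. A minimal sketch of that arithmetic, borrowing the daily n of roughly 300 and the variance of about .48 that appear later in this essay:

```python
import math

sigma2 = 0.48      # within-day variance of the honesty rating (estimated later in the text)
daily_n = 300      # approximate completions per day in the NAES

# Standard error of a mean based on k pooled consecutive daily cross-sections.
for k in (1, 2, 4, 8):
    se = math.sqrt(sigma2 / (k * daily_n))
    print(f"{k} day(s) pooled: n = {k * daily_n:5d}, SE of the mean = {se:.3f}")
# Pooling changes nothing about who is in the sample; it only trades temporal
# resolution for precision, exactly the tradeoff the smoothing section formalizes.
```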
If this approach to seeking campaign effects seems ad hoc, the accusation is not troubling given the lack of theory about what drives the twists and turns of campaigns. Theory leaves us short of expectations for the identification of important campaign events, much less their timing and temporal shape. Holbrook (1996, chap. 6) very usefully and imaginatively goes beyond conventions and debates by considering other kinds of campaign events, but his criteria for selecting events are "admittedly vague" (127), and he does not consider their temporal shape. Shaw's 1999 work catalogs alternative time paths suggested by control theory, and he tries to see what kinds of events follow which paths. But even he does not supply more than a typology of dependent-variable distributions. There is, so far, no body of propositions that might distinguish debates from conventions, for example, or a news impact from an advertising one. At the very least, the rolling cross-section allows us to maximize the variance to be explained. Down the road, it should allow us to build an inventory of dynamic patterns.
What makes this possible, of course, is daily management of data collection, such that any single day yields a random subsample of the total sample. Apart from the flexibility this affords us, the natural temptation is to focus on individual days, but the idea, just fronted, of identifying dynamic patterns involves more than juxtaposing consecutive days. It requires examining the pattern within that body of days or consecutive groups of small numbers of days, ideally by day-by-day comparison. The problem is illustrated by the example this essay opened with. We noted that there did seem to be some effect of the first debate, in that the immediately following days seemed to yield low readings, relative to the days before. But the days before also seemed to exhibit a drop in Gore's rating. This cast some doubt on the independent effect of the debate. But identifying an earlier point of discontinuity defied the naked eye. Realizing the full value of the design requires some mode of induction that separates the signal of true turning points from the noise of sampling error.
Graphical Smoothing
have a large enough sample size—so that variation from one time period to
the next due to sampling error is small compared to variation from cam-
paign events.
Even if the population were homogenous or even if we had a panel,
the variation from period to period might be due to measurement error—
the different ways that people can answer questions when they have the
same opinions. People with the same opinions may interpret the question
in different ways because of interviewer effects or simply the imprecision
of the question. Once again, with rolling cross-sections, the solution to
this problem is to have a large enough sample size so that average measure-
ment error is small compared to variation due to campaign events.
A large sample size is the best way to smooth the data because it improves the signal-to-noise ratio by diminishing sampling error, but large samples are costly and constrained by limited budgets. Given the limits on daily sample sizes, the problem is to find the best way to extract signal from noise given the data at hand. For the Annenberg study this means finding the best way to analyze data on about three hundred respondents per day, as shown in figure 3. The goal is to get the best rendition of the course of public opinion—the shape of the curve $u_t$, where $t$ is time and $u_t$ is the true daily mean of opinion.
To do this, we need some criterion by which we can judge whether we have done a good or a bad job of smoothing the data. One criterion is unbiasedness. By this standard, if we want to know the population's average estimate of Gore's honesty (denoted by $u_1$, $u_2$, and $u_3$) for the three periods 1, 2, and 3, then the best estimate for each mean is the sample average of Gore's honesty ratings in each period from the three hundred respondents who were interviewed during that period. We denote these sample averages by $u_1^*$, $u_2^*$, and $u_3^*$. It is a standard result that with random sampling the expected value of each of these is equal to the true mean for that particular day; that is, $u_1^*$, $u_2^*$, and $u_3^*$ are unbiased estimates of $u_1$, $u_2$, and $u_3$, respectively. For period 1, for example, this means that if we were to repeatedly sample from the population and get many estimates of $u_1^*$ (say, from different polling firms operating on that same day), then the average of these many estimates would equal the true population value $u_1$. Unbiasedness of this sort is an especially useful property if we are looking for turning points, because we want to make sure that each daily estimate is an unbiased estimate of the true signal for that day.
But another criterion is minimizing variance, so that the standard errors of estimates of views about Gore's honesty are as small as possible. Since standard errors are a measurement of the amount by which our estimates vary from sample to sample, minimizing them means that we have reduced noise to a minimum. If we assume that the total variance due to heterogeneity and measurement error is $\sigma^2$, then the sampling variance for each of $u_1^*$, $u_2^*$, and $u_3^*$ is $\sigma^2/n$, where $n = 300$, the number of observations in each period. For small $n$ the quantity $\sigma^2/n$ can still be quite large. We could get an even smaller sampling variance if we assumed that Gore's honesty did not change over periods 1, 2, and 3, so that we could average $u_1^*$, $u_2^*$, and $u_3^*$ to get $\bar{u} = (u_1^* + u_2^* + u_3^*)/3$ with a variance of $\sigma^2/3n$—one-third the size of the previous sampling variance.
The quantity $\bar{u}$ is a three-period average, and if for any time series with observations at $t-1$, $t$, and $t+1$ we define $\bar{u}_t = (u_{t-1}^* + u_t^* + u_{t+1}^*)/3$, then we have a three-period moving average. The equal weights of (1/3, 1/3, 1/3) that define this estimator $\bar{u}_t$ are called the kernel weights in the statistical literature, or just the kernel. Note that the weights always sum to one, but different patterns of weights define different estimators. The kernel for the estimator that takes just the current period's mean $u_t^*$ from among $(u_{t-1}^*, u_t^*, u_{t+1}^*)$ is (0, 1, 0). Hence, the kernel defines different ways to combine the daily sample means to produce an estimator, and kernels have different shapes, ranging from the flat or "uniform" distribution with equal weights for the three-period moving average to the sharply peaked (at the middle value) shape for the current period's mean.
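A minimal sketch of this kind of kernel smoothing in Python. The daily series is simulated rather than drawn from the NAES, and the kernels shown are the two named in the text (the uniform three-period moving average and the "current day only" kernel), plus a trailing ("prior") average of the sort discussed in note 18.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily means: a true curve plus sampling noise for 68 days,
# roughly mimicking 300 respondents per day on a unit-scale rating.
days = np.arange(68)
true_curve = 0.15 - 0.25 / (1 + np.exp(-(days - 34) / 3))   # a gentle downward turn
daily_mean = true_curve + rng.normal(0, np.sqrt(0.48 / 300), size=days.size)

def kernel_smooth(y, weights):
    """Weighted moving average with the given kernel weights (must sum to 1).
    Endpoints where the centered window does not fit are left as NaN."""
    w = np.asarray(weights, dtype=float)
    half = len(w) // 2
    out = np.full_like(y, np.nan, dtype=float)
    valid = np.convolve(y, w[::-1], mode="valid")
    out[half:half + valid.size] = valid
    return out

centered = kernel_smooth(daily_mean, [1/3, 1/3, 1/3])   # uniform three-period kernel
raw = kernel_smooth(daily_mean, [0, 1, 0])               # current day only

# "Prior" (trailing) three-day average: uses days t, t-1, t-2 only, so a turn
# in the smoothed series cannot begin before the turn in the true series.
prior = np.convolve(daily_mean, np.ones(3) / 3, mode="full")[:days.size]
prior[:2] = np.nan

print(np.round(np.column_stack([daily_mean, raw, centered, prior])[30:38], 3))
```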
Unfortunately, $\bar{u}$ might be a biased estimate of even $u_2$, the middle period's value for Gore's honesty, if Gore's honesty varies by period.12 Thus, there is a tradeoff between bias and sampling variance, but we might be willing to trade a little bit of bias for a lot smaller sampling variance. There are, of course, limits to this, and we would not be willing to trade a lot of bias for a slightly reduced sampling variance. Statisticians formalize this tradeoff by considering mean squared error as a summary criterion for any estimator. Mean squared error is equal to the bias squared plus the sampling variance. To simplify the computation of mean squared error in this case, we set the zero point of the honesty scale by assuming that the middle value $u_2$ for honesty is equal to zero. There is no loss in generality in doing this because the scale is arbitrary to begin with. With this costless simplification, we can easily compute the mean squared error for $\bar{u}$ as an estimator of $u_2$:13

$$\mathrm{MSE}(\bar{u}) = \frac{(u_1 + u_3)^2}{9} + \frac{\sigma^2}{3n}.$$

The formula for the bias term $(u_1 + u_3)^2/9$ may not seem intuitively obvious, so we explore its properties in more detail later. We can also compute the mean squared error from using $u_2^*$ as the estimator for $u_2$, which will be just equal to the variance of $u_2^*$ since $u_2^*$ is an unbiased estimator of $u_2$:

$$\mathrm{MSE}(u_2^*) = \frac{\sigma^2}{n}.$$
Under what conditions should we use $\bar{u}$ versus $u_2^*$? That is, under what conditions should we use the three-period moving average versus the one-period estimate? As the variance $\sigma^2$ gets bigger, there is more noise in the data. It makes sense in this situation to use $\bar{u}$ instead of $u_2^*$ because the bias term in $\mathrm{MSE}(\bar{u})$, that is, $(u_1 + u_3)^2/9$, will be dominated by the variance term, $\sigma^2/3n$, so it is worth accepting some bias from averaging all three periods to get the much smaller variance term ($\sigma^2/3n$) in $\bar{u}$ compared to that ($\sigma^2/n$) in $u_2^*$. As $n$ gets bigger and bigger, the bias term in $\mathrm{MSE}(\bar{u})$ will dominate the variance term, so it will make sense to use $u_2^*$, which does not have any bias term. With more observations, there is no need to average over adjoining periods in order to reduce noise—the number of observations in a single period does that nicely. Similarly, as the bias term gets bigger and bigger, it makes sense to use $u_2^*$ instead of $\bar{u}$ because $u_2^*$ is unbiased. That is, if the twists and turns of the campaign cause the variable of interest to change a lot, then we should refrain from averaging over adjacent periods.14 In summary, for large variance, small $n$, and small bias, it makes sense to use $\bar{u}$. For small variance, large $n$, and large bias, we should use $u_2^*$.
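A small numerical illustration of the tradeoff, using the variance of about .48 and the daily n of 300 quoted later in this essay, and a purely illustrative range of values for the quantity $u_1 + u_3$ that drives the bias:

```python
import numpy as np

sigma2, n = 0.48, 300   # per-respondent variance and daily sample size (from the text)

def mse_moving_average(u1_plus_u3):
    """MSE of the three-period moving average as an estimator of u2 (with u2 = 0)."""
    return (u1_plus_u3 ** 2) / 9 + sigma2 / (3 * n)

def mse_single_period():
    """MSE of the single-period mean, which is unbiased."""
    return sigma2 / n

for curvature in (0.0, 0.02, 0.05, 0.10, 0.20):   # illustrative values of u1 + u3
    mse_bar = mse_moving_average(curvature)
    mse_raw = mse_single_period()
    better = "moving average" if mse_bar < mse_raw else "single period"
    print(f"u1 + u3 = {curvature:.2f}:  MSE(avg) = {mse_bar:.5f}, "
          f"MSE(single) = {mse_raw:.5f}  ->  prefer {better}")
```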
The bias term, $(u_1 + u_3)^2$, deserves some additional discussion because it summarizes the shape of the response curve—the way that ratings of Gore's honesty can be expected to go up and down. Because we have set $u_2 = 0$, this quantity attains its smallest possible value of zero when $u_1$ and $u_3$ are also zero. In this case, it clearly makes sense to combine the three periods to estimate Gore's honesty because the true value is the same for all three periods. But the bias term also attains its smallest possible value when $u_1 = -u_3$ and the three points lie along a straight line $u_t$ with a constant slope. Thus, the bias term is nonzero only when the three points $u_1$, $u_2$, and $u_3$ depart from lying along a straight line—only when the slope of the curve $u_t$ is changing. The classic measure of the change in a slope is the second derivative of the curve. Consider the standard difference-in-differences approximation of a second derivative:

$$u_t'' \approx \frac{u_{t+h} - 2u_t + u_{t-h}}{h^2},$$

where $h$ is some small unit of time. Since we have assumed that $u_2 = 0$, this amounts to $u_t'' \approx (u_3 + u_1)/h^2$, so that by a little algebra, $h^2 u_t'' \approx (u_3 + u_1)$, which is the square root of the bias term. This result will come in handy later because $(u_t'')^2$ is a convenient measure of the shape of the response that we want to detect.
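Spelled out for the three periods, with the spacing between periods equal to $h$ and the zero point set at $u_2 = 0$, the "little algebra" is simply:

\[
u_2'' \;\approx\; \frac{u_3 - 2u_2 + u_1}{h^2} \;=\; \frac{u_1 + u_3}{h^2}
\quad\Longrightarrow\quad
h^2 u_2'' \;\approx\; u_1 + u_3,
\qquad
\bigl(h^2 u_2''\bigr)^2 \;\approx\; (u_1 + u_3)^2 ,
\]

which is the bias term used above.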
can slice the time periods smaller and smaller and still get some observa-
tions. Thus months can be split into weeks and weeks into days. Of course,
there is a limit to how far we can do this with the rolling cross-section de-
sign because our smallest unit is a day, but this thought experiment is nev-
ertheless useful. Assume that the total length of the time period on the hor-
izontal axis is one unit and that there are N observations spread evenly
over this entire time period. We break the horizontal axis into a number of
evenly spaced time periods, each of which is h units apart, where h is some
fraction of one. These time units might be months, weeks, or days. Then for any given time period, there are $n = hN$ observations. Our goal will be to see what happens as we change $h$, which is called the bandwidth in smoothing language. Intuitively, we would expect that as the bandwidth $h$ gets smaller, the amount of bias in the estimator will decrease, but the number of observations $n = hN$ will also get smaller, causing the variance in the estimator to increase. Thus the choice of bandwidth is an essential aspect of choosing a smoother, because a good choice will minimize the mean squared error.
We choose the equal-weighting kernel (moving average) so that the mean squared error is as stated earlier:

$$\mathrm{MSE}(\bar{u}) = \frac{(u_1 + u_3)^2}{9} + \frac{\sigma^2}{3n}.$$

Using the previous result for the bias, $h^2 u_t'' \approx (u_3 + u_1)$, and the fact that $n = hN$, we can rewrite this as15

$$\mathrm{MSE}(\bar{u}) = \frac{h^4 (u_t'')^2}{9} + \frac{\sigma^2}{3hN}.$$

Just as we expected, as the bandwidth $h$ gets bigger, the bias term increases but the variance term gets smaller. Furthermore, the amount of bias depends upon the size of the second derivative and the curve's deviation from linearity. The more "wiggly" the curve, the more bias there is in the estimator.
Minimizing this mean squared error with respect to the bandwidth yields the optimal choice of $h$:

$$h^5 = \frac{3\sigma^2}{4N (u_t'')^2}.$$

The optimal bandwidth gets wider with greater population variance $\sigma^2$ and narrower with increasing $N$ and increasingly wiggly curves.
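The expression follows from differentiating the rewritten mean squared error with respect to $h$ and setting the derivative to zero:

\[
\frac{d}{dh}\!\left[\frac{h^4 (u_t'')^2}{9} + \frac{\sigma^2}{3hN}\right]
= \frac{4h^3 (u_t'')^2}{9} - \frac{\sigma^2}{3h^2 N} = 0
\quad\Longrightarrow\quad
h^5 = \frac{3\sigma^2}{4N\,(u_t'')^2}.
\]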
We can use this formula to determine the optimal smoothing for the Annenberg data on Gore's honesty. From the daily data, we can estimate $\sigma^2$ as the average of the daily variances.17 The result is a value of about .48. The value of $N$ for the sixty-eight days of interviewing is 20,892. The value of $u_t''$ depends upon the kinds of responses we expect to find in the general population. One natural measure of a unit of response is the cross-sectional standard deviation in the variable of interest, such as perceptions of Gore's honesty. Changes in the mean equivalent to about 5 percent of the standard deviation might be considered significant, although they might also be hard to detect. Changes in the mean equivalent to about 25 percent of the standard deviation would certainly be substantial, and we would want to be able to detect them. Consider each of these possibilities.
Assume that we expect changes in the mean value of Gore's honesty of one-quarter of the cross-sectional standard deviation, and assume that we expect that these changes might happen within three days. Then we can calculate an approximate value for $u_t''$ as follows. Suppose that the trend line is flat and that it changes upward (or downward) by one-quarter of a standard deviation (.25 units on the honesty scale) in three days, which is about one-twentieth (.05) of the total sixty-eight days in figures 1a and 1b. Then $u_t''$ will be $.25/.05 = 5$ over this period. Putting this number along with the variance $\sigma^2$ (.48) and the total number of interviews (20,892) in the previous formula yields $h = .058$, so that $n = hN$ will be about 1,200, or four days of interviewing at three hundred respondents per day. Similarly, if we are expecting changes of 5 percent of a standard deviation, then $u_t'' = .05/.05 = 1$ and $h = .111$, so that $n$ will be about 2,400, or eight days of interviewing. These results suggest that the ideal amount of smoothing will be something like four to eight days.
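The same calculation can be scripted in a few lines; the inputs below are the values quoted in the text, and the conversion to "days of smoothing" simply assumes roughly three hundred interviews per day.

```python
sigma2 = 0.48        # average daily variance of the honesty rating (from the text)
N = 20_892           # total interviews over the sixty-eight-day window
daily_n = 300        # approximate completions per day

def optimal_bandwidth(second_derivative):
    """Optimal bandwidth h (as a fraction of the whole window) for the
    equal-weight moving-average kernel: h^5 = 3*sigma^2 / (4*N*(u'')^2)."""
    return (3 * sigma2 / (4 * N * second_derivative ** 2)) ** 0.2

for label, u_dd in [("25% of an SD over ~3 days", 0.25 / 0.05),
                    ("5% of an SD over ~3 days", 0.05 / 0.05)]:
    h = optimal_bandwidth(u_dd)
    n = h * N
    print(f"{label}:  h = {h:.3f},  n = {n:,.0f} interviews "
          f"(about {n / daily_n:.0f} days of smoothing)")
# Reproduces values close to those in the text: h of roughly .058 (about four
# days of interviewing) and .111 (about eight days).
```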
Green and Silverman 1994; Ruppert, Wand, and Carroll 2003) that smooth data by fitting them to piecewise polynomial (often linear) functions that are spliced together at knots. There are close relationships among these methods. Silverman (1985, 3–4), for example, shows that spline smoothing can be considered a form of weighted moving-average smoothing with a particular kernel and varying bandwidth (see also Hardle 1990, 56–64).

Although we have focused on the optimal smoothing problem, statisticians have also given considerable attention to the problems of inference from smoothed data, and they have developed methods for describing confidence intervals for the curves produced by smoothing. These methods provide ways to address the statistical power issues highlighted by John Zaller (2002) in his studies of the inferences that can be made from election studies. Some representative references are Hardle 1990, chapter 4; and Ruppert, Wand, and Carroll 2003, chapter 6.
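For readers who want to experiment with these alternatives, here is a brief sketch using SciPy's smoothing spline on a simulated daily series; the smoothing parameter s plays a role loosely analogous to the bandwidth discussed above, and the simulated series merely stands in for the NAES data.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
days = np.arange(68, dtype=float)

# Simulated daily means: a gentle downward turn plus day-to-day sampling noise.
true_curve = 0.15 - 0.25 / (1 + np.exp(-(days - 34) / 3))
daily_mean = true_curve + rng.normal(0, np.sqrt(0.48 / 300), size=days.size)

# A cubic smoothing spline; a larger s forces a smoother (more biased) curve,
# much as a wider bandwidth does for a kernel smoother.
spline = UnivariateSpline(days, daily_mean, k=3, s=len(days) * (0.48 / 300))
smoothed = spline(days)

print(np.round(np.column_stack([daily_mean, smoothed])[30:38], 3))
```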
A Mixed Design
Although this essay began by treating the rolling cross-section and the panel design as substitutes, in fact they are better seen as complements. That is, a properly constructed election survey can be both a rolling cross-section and a panel; all that is required is for one wave of interviewing to have controlled release of sample. It might be tempting to deploy a precampaign cross-section as a baseline and then meter the next wave over the campaign. But if the point of the initial wave is to establish a baseline, this can be achieved by examination of distributions and parameters in the early days of the campaign. As most comparisons to baseline that truly matter are aggregate ones, reinterviewing adds little value. Meanwhile, panel conditioning might distort estimates of aggregate change. It might be best to compare fresh cross-section with fresh cross-section.
The obvious way to connect the designs is with a simple pre–post setup, where the preelection, or campaign, wave is the temporally metered rolling cross-section. Temporal metering of the postelection wave might also be undertaken but on a different basis: release of numbers at the postelection wave should be as uncorrelated as possible to the timing of first-wave completions.19 This makes it possible to do two things. First, the postelection rolling cross-sections can be used to monitor aggregate changes that occur in the postelection period in the same way as the preelection rolling cross-sections are used. Second, over-time changes in
that stretches over more than a few days is likely to have temporal hetero-
geneity, as several essays in this volume show. It is best that that hetero-
geneity be recognized explicitly, guarded against if possible by making
sure that the dates of interviews are uncorrelated with one another, and ul-
timately modeled directly if it still remains a problem. And, of course, the
heterogeneity produced by events ought not to be confused with hetero-
geneity produced by differences in respondent accessibility.
The rolling cross-section component benefits from the merger of the two designs by clearer separation of cross-sectional and longitudinal variance. A simple example of such leverage is portrayed in figure 6, for analysis of a debate effect. If one observes after a debate a difference between those who saw the debate and those who did not, is the difference the result of actual exposure to the debate, or is it merely symptomatic of an abiding difference that also correlates with the likelihood of viewing the debate in the first place? By itself, a rolling cross-section data file cannot address this question. But one linked to a postelection wave can. A critical fact about the postelection wave is that debate exposure information can be gleaned from all respondents, including those first interviewed before the debate. The postelection data allow us to read back through the event and to distinguish its endogenous and exogenous components.
The example in figure 6 is from the 1988 Canadian Election Study. In that year's debate among the party leaders, John Turner of the Liberal Party apparently scored a clear victory. This both primed and moved opinion on the main issue, Canadian–U.S. free trade, and it rehabilitated Turner's reputation as a leader. Figure 6 shows the extent to which this rehabilitation was conditional on exposure to the event itself. Exposure is indicated by response to a postelection question, and so the comparison extends back virtually to the start of the campaign. Smoothing is by prior moving averages (the technique exemplified in fig. 4 and in Johnston, Hagen, and Jamieson 2004), and so any turning points should be correctly located. There is the merest hint that respondents who would watch the debate began to reevaluate Turner just before the event.21 In general, however, it appears that debate watchers did not bring any different beliefs to the moment than did nonwatchers. So the difference right after the event is mostly real, in the sense that it truly reflects impact from the moment, not from selection bias.
Impact from the moment is not entirely the same as impact from the
debate, however. This potential indeterminacy shows how leverage can
work the other way.
more than a few days after the event is likely to yield a false negative for the event.23
Conclusions
The rolling cross-section design is a powerful one for detecting the impact
of events over time, and it has strengths that are lacking in the standard
panel design. Indeed, we show that panels can miss important turning
points and events. Moreover, designers of panels have probably underesti-
mated the problems that can arise in making inferences from panels when
fieldwork for each wave is spread out, as it almost inevitably has to be, over
a period of time. Because of its focus on temporal change, the rolling cross-
section design suggests ways that panels themselves could be improved by
incorporating rolling cross-sections in each wave.
Despite its advantages, the rolling cross-section design also pre-
sents substantial analytical challenges due to the small size of each period’s
NOTES
1. For details on the NAES, see Romer et al. 2003.
2. The question is, “Does the word ‘honest’ describe Al Gore extremely well,
quite well, not too well, or not well at all?”
3. The rigidity of the callback sequence is modified when interviewers pursue opportunities presented by the field. For example, if a contact expresses willingness to make an appointment outside the normal two-week clearance window, interviewers are commonly instructed to make the appointment, as maximizing response rate is always a serious priority. Similarly, if a call attempt indicates that the respondent is at home but already engaged with a call, the interviewer will phone back promptly.
4. Thus, although the completions on a given day come from different replicates—from today's, the preceding day's, the replicate from the day before that, and so forth—they should still amount to a random sample of the population if the samples have all been worked with the same intensity. By working the samples with the same intensity, we ensure that today's interviews from the replicate of five days ago are statistically valid substitutes for the group of people from today's replicate who will be ultimately interviewed five days from now. The "same intensity" assumption, therefore, allows us to make the jump from random replicates to the assumption that those interviewed on a given day represent a representative sample of the population.
5. The data show that the fieldwork house, Schulman, Ronca, and Bucuvalas, Inc. (SRBI), struggled early on to find the target but was on top of the task by early August.
6. The only exceptions to this rule are samples from transitional subperiods,
such as the days following July 5. Relatively inaccessible respondents will be un-
derrepresented, relative to other periods, at transitions that involve increasing
sample size and overrepresented at transitions involving reductions in sample
size. Analysis for transitional days should, strictly speaking, employ weights for
accessibility.
7. We are grateful to David Northrup, project director on the Canadian Elec-
tion Studies, for this insight.
8. Well, maybe not in Canada. In the 1993, 1997, and 2000 Canadian Election Surveys, completion numbers climb in the last week, quite without any change in fieldwork intensity. The heart of the matter seems to be that respondents who earlier would schedule a later interview now agree on the spot or agree to be interviewed promptly, under the shadow of the deadline.
9. Some leverage on this question could be gained by drawing a fresh postdebate cross-section and using this for calibration. This starts to inflate costs, however, and it presents its own comparison problems, as the second wave of the panel is not itself a cross-section.
10. Johnston, Hagen, and Jamieson (2004), for example, find that both party identification and liberal/conservative ideology drift toward the temporarily advantaged party and then away as the advantage shifts. Such endogenous movement in party identification can be minimized by using response to the root question. This means that the seven-point scale, where "leaners" are assigned to parties and partisans assigned intensity scores, is inappropriate for rolling cross-section analysis.
11. Postevent surveys started right after the event and completed within a day or two overrepresent those respondents who are easily accessible by the interview method. The rolling cross-section overcomes this problem. Consider, for example, a population in which "stay-at-homes" almost always answer on the first day of interviewing whereas those who "get-out-of-the-house" typically require several days of calls. Further assume that after a week's effort, both groups are just about as likely to be interviewed. A postevent survey conducted for one or two days would have very high response rates for "stay-at-homes" and consist mostly of such people. A rolling cross-section would interview the correct proportions of each group because it would pick up those who "get-out-of-the-house" and were not interviewed before the event from replicates released before the event. If "stay-at-homes" are different from those who "get-out-of-the-house" (and there is abundant evidence that they are), then the postevent survey will provide a biased picture of the impact of the event. Brady and Orren (1992) provide an example with respect to the Canadian debates.
12. The bias in using $\bar{u}$ to estimate $u_2$ comes from using information that is one period away from period 2 (namely, information from periods 1 and 3) as well as contemporaneous information. It seems likely that there will be even more bias in using $\bar{u}$ to estimate $u_1$ or $u_3$ because $\bar{u}$ uses some information from two periods away.
13. We can generalize the result a bit by assuming that $\bar{u} = a_1 u_1^* + a_2 u_2^* + a_3 u_3^*$ with the weights adding up to one ($a_1 + a_2 + a_3 = 1$). Since $\bar{u}$ is an estimator for $u_2$, it makes sense to assume a symmetrical treatment of period 1 and period 3 observations so that $a_1 = a_3 = a$. Then we can write $\bar{u} = a u_1^* + (1 - 2a) u_2^* + a u_3^*$. The expected value of this is $E(\bar{u}) = a u_1 + (1 - 2a) u_2 + a u_3$, and the true value of the period 2 average is $u_2$, so that the expected bias is $\mathrm{Bias}(\bar{u}) = E(\bar{u}) - u_2 = a u_1 - 2a u_2 + a u_3$. Since we have set $u_2 = 0$, this simplifies to $\mathrm{Bias}(\bar{u}) = a(u_1 + u_3)$. The variance of $\bar{u}$ can also be easily calculated as $\mathrm{Var}(\bar{u}) = (6a^2 - 4a + 1)\sigma^2/n$. Hence, the mean squared error is

$$\mathrm{MSE}(\bar{u}) = a^2 (u_1 + u_3)^2 + (6a^2 - 4a + 1)\frac{\sigma^2}{n}.$$

If $a = 1/3$, then this becomes the expression in the text for the three-period moving average.
14. This analysis can be done more formally with the results from the preceding footnote by minimizing the mean squared error in that footnote with respect to the parameter $a$, which produces $\bar{u}$ when $a = 1/3$ and $u_2^*$ when $a = 0$. We can find the value of $a$ by taking the derivative of the MSE in the preceding footnote with respect to $a$, setting the derivative equal to zero, and solving for $a$. The result, after some algebra, is

$$a = \frac{2}{6 + n\left[(u_1 + u_3)^2/\sigma^2\right]}.$$

Clearly this has the two limits 1/3 (producing $\bar{u}$) and zero (producing $u_2^*$). Furthermore, for small $n$ or small $(u_1 + u_3)^2$, we obtain something close to $\bar{u}$, whereas for large $n$ or large $(u_1 + u_3)^2$ we get $u_2^*$. For large $\sigma^2$ we get $\bar{u}$, and for small $\sigma^2$ we get $u_2^*$.
15. Note that we conveniently choose the interval $h$ for computing the approximation to the derivative to be the same as the bandwidth. This means that the expression for $\mathrm{MSE}(\bar{u})$ is only approximate.
16. If we carry through with the more general case in the preceding footnotes, we get

$$\mathrm{MSE}(\bar{u}) = a^2 h^4 [u''(t)]^2 + (6a^2 - 4a + 1)\frac{\sigma^2}{hN}.$$

And if we think of a kernel as a function $K(t)$ that defines weights for each value of $t$, then we can define

$$c = \sum_t [K(t)]^2 = \text{sum of squared kernel weights} = 6a^2 - 4a + 1,$$
$$d = \sum_t t^2 K(t) = \text{variance of the kernel weights} = 2a,$$

so that we can write $\mathrm{MSE}(\bar{u}) = h^4 (d^2/4)[u''(t)]^2 + c\sigma^2/hN$. This result is identical to the general result of Gasser and Muller reported in Hardle as Theorem 3.1.1 (Hardle 1990, 29–30).
17. We are eliding a potential complication here by assuming that $\sigma^2$ is constant across the campaign. Brady and Johnston (1987, 170–73) show how standard deviations for trait batteries become greater over the course of a primary campaign (see also Campbell 2000; Wlezien and Erikson 2002). In this case, the variation in $\sigma^2$ is probably a second-order problem, but it will not be in every case.
18. We use "prior" moving averages in which the smoothed point on day $t$ from a $p$-period moving average is calculated from the average of days $t$, $t-1$, $t-2$, . . . , $t-p+1$. Prior moving averages have the virtue that, if a turning point occurs in the underlying true series $u_t$, then the prior moving average will only start to turn at the point where the true series begins to turn. They have the defect that the prior moving average may underestimate the size of the turn.
19. It is impossible to ensure that the actual gap between first- and second-wave interviews is uncorrelated to first-wave timing. The closest we can come is to make rerelease of the number to field uncorrelated to the initial completion date.
20. Note that this analysis is symmetrical and that it would also allow inferences
about the impacts of postelection events by comparing groups interviewed before
and after some postelection occurrence.
21. The predebate uptick among eventual debate watchers reflects one outlier. All other observations for this group in the period indicate no predebate shift.
22. Indeed, much of this coverage was simple repetition of the key moment in
the debate.
23. This observation applies to any postevent retrospective question, not just
one posed in the second wave of a panel.
REFERENCES
Brady, Henry E., and Richard Johnston. 1987. “What’s the Primary Message: Horse
Race or Issue Journalism?” In Media and Momentum, ed. Gary Orren and Nelson
Polsby. Chatham, NJ: Chatham House.
Brady, Henry E., and Gary Orren. 1992. “Polling Pitfalls: Sources of Error in Pub-
lic Opinion Surveys.” In Media Polls in American Politics, ed. Thomas Mann and
Gary Orren. Washington, DC: Brookings Institution.
Campbell, James E. 2000. The American Campaign: U.S. Presidential Campaigns and the
National Vote. College Station: Texas A&M Press.
Cleveland, William. 1979. “Robust Locally Weighted Regression and Smoothing
Scatterplots.” Journal of the American Statistical Association 74:829–36.
Deaton, Angus. 1985. “Panel Data from Time Series of Cross-Sections.” Journal of
Econometrics 30:109–26.
Dunkelberg, William C., and George S. Day. 1973. “Nonresponse Bias and Call-
backs in Sample Surveys.” Journal of Marketing Research 10:160–68.
Green, P. J., and B. W. Silverman. 1994. Nonparametric Regression and Generalized Linear
Models. London: Chapman and Hall.
Groves, Robert M. 1989. Survey Errors and Survey Costs. New York: Wiley.
Hardle, Wolfgang. 1990. Applied Nonparametric Regression. Cambridge: Cambridge
University Press.
Hawkins, Thomas M. 1975. “Estimation of Nonresponse Bias.” Sociological Methods
and Research 3:461–88.
Holbrook, Thomas M. 1996. Do Campaigns Matter? Thousand Oaks, CA: Sage.
Johnston, Richard, André Blais, Henry E. Brady, and Jean Crête. 1992. Letting the
People Decide: Dynamics of a Canadian Election. Stanford: Stanford University Press.
Johnston, Richard, and Henry E. Brady. 2002. “The Rolling Cross-Section Design.”
Electoral Studies 21:283–95.
Johnston, Richard, Michael G. Hagen, and Kathleen Hall Jamieson. 2004. The 2000
Presidential Election and the Foundations of Party Politics. Cambridge: Cambridge Uni-
versity Press.
Moffitt, Robert. 1993. "Identification and Estimation of Dynamic Models with a Time-Series of Repeated Cross-Sections." Journal of Econometrics 59:99–123.
Romer, Daniel, Kate Kenski, Paul Waldman, Christopher Adasiewicz, and Kath-
leen Hall Jamieson. 2004. Capturing Campaign Dynamics: The Annenberg National Elec-
tion Survey. New York: Oxford University Press.
Ruppert, David, M. P. Wand, and R. J. Carroll. 2003. Semiparametric Regression. Cam-
bridge: Cambridge University Press.
Shaw, Daron R. 1999. “A Study of Presidential Campaign Effects from 1952 to
1992.” Journal of Politics 61:387–422.
Silverman, B. W. 1985. "Some Aspects of the Spline Smoothing Approach to Non-Parametric Regression Curve Fitting." Journal of the Royal Statistical Society, Series B (Methodological), 47:1–52.
Verbeek, M., and T. Nijman. 1992a. “Can Cohort Data Be Treated as Genuine
Panel Data?” Empirical Economics 17:9–23.
———. 1992b. “Pseudo Panel Data.” In The Econometrics of Panel Data: Handbook of
Theory and Applications, ed. Laszlo Matyas and Patrick Sevestre. Dordrecht, the
Netherlands: Kluwer Academic Publishers.
Wlezien, Christopher, and Robert S. Erikson. 2002. “The Timeline of Presidential
Election Campaigns.” Journal of Politics 64:969–93.
Zaller, John. 1998. “Monica Lewinsky’s Contribution to Political Science.” PS: Po-
litical Science and Politics 31:182–89.
———. 2002. “The Statistical Power of Election Studies to Detect Media Expo-
sure Effects in Political Campaigns.” Electoral Studies 21: 297–329.