Research On Statistics Learning and Reasoning: J. Michael Shaughnessy
J. Michael Shaughnessy
PORTLAND STATE UNIVERSITY
At the beginning of the 1990s a rather unique professional opportunity came my way when the National Council of Teachers of Mathematics (NCTM) asked me to author a chapter on research on the teaching and learning of probability and statistics for the Handbook of Research on the Teaching and Learning of Mathematics (Shaughnessy, 1992). At that time, research on students’ understanding of probability and statistics was just beginning to blossom in the United States, though research in stochastics had been conducted for several decades in other countries, principally in Europe.

During the 1980s, interest in research in probability and statistics in the United States was being fueled by interactions with researchers from other countries at conferences such as the International Conference on Teaching of Statistics (ICOTS) and the International Group for the Psychology of Mathematics Education (PME). Towards the end of the 1980s, the long efforts of a core group of mathematics and statistics educators in the United States to promote an increased emphasis on statistics in school mathematics programs finally took root. Probability and statistics were included among the content standards in NCTM’s groundbreaking document, Curriculum and Evaluation Standards for School Mathematics (1989). This was the first time that a national organization in the United States, in fact the national organization for mathematics teachers, placed statistics on an equal footing with number sense, algebra, geometry, and measurement as a critical foundation stone for school mathematics. Suddenly statistics became a more attractive area in which to conduct research on student learning. State curriculum leaders followed the lead of the NCTM Curriculum Standards and began to build statistics into their state mathematics frameworks and likewise began to assess students’ growth in learning statistical concepts. Consequently, curriculum developers began to pay more than lip service to statistics. Prior to the Standards, statistics had been a lost stepchild in mathematics curriculum frameworks, the mere frosting on any mathematics program if there was time at the end of the school year. Now statistics is here to stay as a major strand in school mathematics programs in the United States.

This book represents the second effort by NCTM to survey, analyze, and compile research in mathematics education, and I have another unique opportunity to author a chapter, this time focusing on statistics (maybe they are hoping I will get it right this time). However, this time around a synthesis and analysis of the research in data and chance is far more challenging than it was 15 years ago. Research in probability and statistics is no longer a fledgling discipline, and reviewing all of the relevant literature is no longer possible as it was for the first Handbook. There has been an amazing boom in research, curriculum development, and assessment in statistics education. A recent review of the research literature in data and chance carried out just in Australasia over a 4-year period from 2000 to 2003 turned up over 150 citations in statistics education (Pfannkuch & Watson, 2005). Compare that to around 150 international references that were in the 1992 Research Handbook chapter that covered a 30-year period of research in statistics all over the world, and
958 ■ STUDENTS AND LEARNING
the magnitude of the current task becomes apparent. Contributions to the field have come from a very wide range of investigators, including mathematics and statistics educators, statisticians, cognitive psychologists, educational psychologists, and science educators.

Satellite roundtable conferences on statistics education have been held at the last three meetings of the International Congress on Mathematical Education (ICME) in Granada (1996), Hiroshima (2000), and Lund (2004), and these have continued to catalyze research in statistics education. There is also now an international group of statistics educators who have been holding biennial conferences on Statistical Reasoning, Thinking, and Literacy (SRTL) in Israel (1999), Australia (2001), the United States (2003), and New Zealand (2005). The field is now expanding so rapidly that keeping abreast of everything is quite impossible. However, I do not think that the growth in research in probability and statistics is well known among the general community of mathematics educators. Recently at a conference a famous and very widely read colleague in mathematics education asked me, “Are you still working in statistics? There hasn’t been very much going on there recently, has there?” I was really amazed that he was unaware of what has been going on in the area, and he is probably typical of many mathematics educators.

The sheer volume of research in both probability and statistics has in fact necessitated two separate chapters in this edition of the Handbook, one on probability (see Jones, Langrall, & Mooney, this volume) and one on statistics. In one sense, this is rather unfortunate because there are natural connections between statistics concepts and probability concepts, for example in sampling distributions, confidence intervals, and significance tests. Furthermore, researchers must continue to clarify the learning connections between statistics and probability for teachers and students. However, reviews of research are more useful if they focus on particular issues, rather than trying to do everything. In this chapter I will focus primarily on research on students’ learning and reasoning in statistics from the past 15 years or so, making connections to probability along the way. Most of the research discussed in this chapter concerns students’ understanding of descriptive statistics and data analysis, as there has not yet been much research devoted to students’ understanding of inferential statistics (confidence intervals, hypothesis testing, p-values, and so on).

(2000) (PSSM), statistics continued to hold a prominent position in NCTM’s vision for school mathematics. The two editions of the Standards documents (1989, 2000) have had counterparts in other countries that have proclaimed similar messages, touting the important role that statistics plays in the education of our citizens, both as learners of statistics at the school level and as consumers of statistics at the adult level. For example, A National Statement on Mathematics for Australian Schools (Australian Education Council, 1991) and Mathematics in the New Zealand Curriculum also called for considerable statistics to be taught as part of school mathematics programs, in part to empower students to critically evaluate data and claims made about data.

National calls for increased attention to statistics have played an important role in catalyzing research in the learning of statistics. However, national calls themselves will not amount to much unless good curriculum materials for students and accompanying guides for classroom teachers are available to implement the suggestions of those calls. Statistics has been embedded in a prominent way in many of the recent curriculum development projects in the United States and Canada throughout the 1990s. These curricula, partly inspired by the development and implementation of the Standards documents, have provided another major impetus for research on statistics. The Standards documents have influenced curriculum projects such as Mathematics in Context (1994), Core-Plus Mathematics (1997), Data Driven Mathematics (1999), Connected Mathematics (1998), and Investigations Into Number, Data and Space (1998), just to point out a few examples. The round of curriculum materials in the 1990s systematically wove statistics throughout all their targeted grade levels and incorporated material to help meet the recommendations of PSSM for probability and statistics in K–12 that say that students should:

• Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them.
• Select and use appropriate statistical methods to analyze data.
• Develop and evaluate inferences and predictions that are based on data.
• Understand and apply basic concepts of probability.
curriculum materials for statistics does not mean that they are being uniformly and faithfully implemented everywhere throughout the United States and Canada. Our teaching force is undernourished in statistical experience, as statistics has not often been a part of many teachers’ own school mathematics programs. In many schools there is a tremendous need for professional development in the area of statistics.

At the same time that many teachers need more work with statistics themselves, the number of students taking Advanced Placement (AP) statistics courses in secondary school has greatly increased in the United States. The AP exams, constructed by Educational Testing Service (ETS), provide a mechanism for secondary school students to obtain college-level credit for coursework taken in high school. Growth in the number of students who register for the AP statistics exam in the United States has been the fastest of any AP course in the history of the Advanced Placement Program. In 1997, the first time it was administered, 7,500 students took the AP statistics test. According to the chair of the AP statistics grading committee, the number of students taking the exam grew to 37,000 students in 2000, then to 50,000 students in 2002, and to as many as 65,000 in 2004 (Roxy Peck, personal communication, March, 2004). The AP statistics option has been another catalytic force in the growth of attention to statistics in K–12 schools. Advanced Placement Statistics is a very popular option in secondary schools because most students need to take some sort of statistics course in college, whatever their major. In fact, introductory statistics has recently outgrown calculus for the largest enrollment in any mathematics or statistics class at many colleges and universities.

Spreading the word on the importance of statistics for students and getting good statistics materials adopted and faithfully implemented in K–12 classrooms requires a long-term effort that must be coupled with research on student learning and assessment of student progress. In that regard, perhaps an overall snapshot of student progress in learning statistics is a good starting point for our research journey in this chapter. In the next section I provide a brief overview of students’ knowledge of statistical concepts and skills as documented by the National Assessment of Educational Progress (NAEP) in the United States.

NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS: A SHORT OVERVIEW ON STATISTICS

Statistics items have been included in each of the NAEP frameworks since 1973. The variety and depth of items have continued to grow over the past 30 years in NAEP. The percentage of NAEP items classified as data analysis, statistics and probability has more than tripled at Grade 12 (from 6% in 1986 to 20% in 1996) and almost doubled in Grade 8 (from 8% to 15% over the same time period). In the 1996 administration, the NAEP content strand called Data Analysis, Statistics and Probability was designed to emphasize “the appropriate methods for gathering data, the visual exploration of data, various ways of representing data, and the development and evaluation of arguments based on data analysis” (National Assessment Governing Board, 1994, p. 77).

In the past, NAEP statistics items have included calculating measures of center (mean, median, mode), as well as making inferences from data, reading and interpreting graphical representations of data, predicting beyond the data given in graphs, identifying which statistic (e.g., mean or median) is more appropriate in various situations, and answering questions about samples and sampling. Information from NAEP tasks can help to identify growth in students’ understanding of statistical concepts as well as potential trouble spots for students in understanding statistics. Therefore, NAEP data can be a good source for mathematics educators to formulate researchable questions about student growth and understanding. Some summary highlights of the 1996, 2000, and 2003 NAEP findings about students’ knowledge of statistical concepts provide a backdrop for the rest of the chapter. In an analysis of the 1996 NAEP statistics items, Zawojewski and Shaughnessy (2000) noted that:

• Over half of 8th-grade students read information from tables, charts, and graphs but experienced difficulty in using the information for other substantive purposes such as drawing conclusions based on data.
• Based on performance on two NAEP items, about three-fourths of students in Grade 12 can successfully read line graphs, yet only 37 percent could read a box plot.
• Comparing 1990 results to 1996 results, there was significant growth in 8th- and 12th-grade students’ performance on NAEP items that required them to find the mean and median for particular data sets.
• When given a choice, 8th-grade and 12th-grade students tend to select the mean over the median, regardless of the distribution of the data.
• Given their performance on items that assessed the appropriateness of survey samples, over half of the students in Grade
8 appropriately considered the potential for bias and the number of data points.

A subsequent analysis by Tarr and Shaughnessy (in press) of the results on the statistics items from the 2000 and 2003 NAEP administrations found that:

tistics concepts have been slowly making their way into school mathematics programs. There is still an enormous amount of work to be done to get statistics into every student’s school mathematics program throughout the K–12 school years.
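
As an aside on the 1996 NAEP finding that students tend to choose the mean over the median regardless of the distribution of the data, the following minimal Python sketch shows why the distribution matters. The data set is invented purely for illustration (it is not NAEP data), and the helper name summarize is hypothetical. It assumes only the standard library statistics module.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only (not NAEP data):
# weekly allowances in dollars for nine students, one of them unusually large.
allowances = [5, 5, 6, 7, 8, 8, 9, 10, 150]


def summarize(values):
    """Return the (mean, median) pair for a list of numbers."""
    return mean(values), median(values)


avg, mid = summarize(allowances)
print(f"all nine values:     mean = {avg:.2f}, median = {mid}")

# Dropping the single extreme value shows how strongly it pulled the mean,
# while the median barely moves.
avg_trim, mid_trim = summarize(allowances[:-1])
print(f"without the outlier: mean = {avg_trim:.2f}, median = {mid_trim}")
```

In this made-up data set one extreme value pulls the mean to roughly 23 dollars, a figure larger than eight of the nine allowances, while the median stays at 8. A student who reports the mean by default would therefore describe a "typical" allowance that almost nobody in the group receives.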
& Tversky, 1972, 1973a, 1973b; Tversky & Kahneman, 1974, 1983). Other authors (e.g., Shaughnessy, 1977; Konold, 1989; Konold, Pollatsek, Well, Lohmeier, & Lipson, 1993) have applied or extended the work of Kahneman and Tversky. Konold (1989) accounted for types of student reasoning on tasks that did not fall neatly into Kahneman and Tversky’s categories of heuristic reasoning. A model for reasoning under uncertainty has turned out to be much more complicated than some of the initial models suggested, as pointed out in several recent interpretive reviews of research in statistics (Konold & Higgins, 2003) and probability (Shaughnessy, 2003a).

Much of the earlier research in probability and statistics was concerned with misconceptions that students had about probability and statistics, a line of research that has persisted even until fairly recently (Fischbein & Schnarch, 1997). However, researchers have begun to pay more careful attention to the details of the development of students’ understanding of and thinking about statistics. More mature representations of student thinking have arisen over the past decade. Much of this growth in new models and frameworks has come about from careful documentation and analysis of student thinking by researchers employing a qualitative research methodology while students work on statistical tasks that can elicit an array of responses, from rather naive to somewhat sophisticated. As a result, most researchers in statistics (and probability) now hunt for a spectrum of student thinking in their research work, rather than taking an approach that students either do or do not “have it.” This alteration of perspectives on student thinking from a misconceptions viewpoint to more of a transitional conceptions position is partly due to an increased attention to and acceptance of a constructivist epistemology that places students at the active center of their learning. It is also partly due to the types of models and frameworks themselves that have been employed in response to a constructivist theory of knowledge.

In reviewing the literature for this second round of the Handbook, I have found that a new crop of frameworks and models for research in statistics is surfacing. A discussion of some of these models and frameworks is in order early on in this chapter, as they will help provide lenses for discussion and comparison of research results.

STATISTICAL THINKING, STATISTICAL LITERACY, AND STATISTICAL REASONING

The literature on judgment and decision making under uncertainty that was reviewed and analyzed in the probability and statistics chapter in the first edition of this Handbook (Shaughnessy, 1992) included a distinction among normative, prescriptive, and descriptive models that may help to distinguish among the current constructs of statistical thinking, statistical literacy, and statistical reasoning. Garfield (2002) summarized a number of general perspectives on what she called statistical reasoning, including correct and incorrect ways that students reason about statistics, and ways of assessing statistical reasoning. In this chapter I take a perspective that attempts to distinguish between statistical thinking and statistical reasoning, as well as to discuss statistical literacy.

Models of statistical thinking help both researchers and teachers to attend to the important concepts and processes in the teaching and learning of statistics. These models reflect what we want learners, consumers, and producers of statistics to know. Thus, models of statistical thinking are primarily normative models of what statisticians feel are the important concepts and processes of their discipline.

Models of statistical literacy help to identify critical statistical survival skills for both school students and adults. Students are primarily learners of statistics, but they can also be consumers of statistics in making decisions on what to buy, or possibly even producers of statistics if they are working on a research project themselves. Adults are often in job situations in which they are producers of statistics. Models of statistical literacy often have a prescriptive tone, suggesting what students and life-long learning adults need to do in order to be well informed, or to make good decisions, or to take advantage of the data that are available to them. Statistical literacy may also include recommendations for the development of students’ and adults’ critical thinking skills, so that claims made with data can be questioned and analyzed.

Finally, cognitive and developmental research frameworks can provide interpretive lenses to help us identify and track students’ and adults’ statistical reasoning and their conceptual development. Models of statistical reasoning are primarily descriptive models that help clarify how people are thinking about statistics, what they seem to know and understand, and where they have difficulty. The descriptions of student thinking obtained from models of statistical reasoning can also point out opportunities for scaffolding statistical ideas in the teaching of statistics.

Each of these three important realms (statistical thinking, statistical literacy, and statistical reasoning), while overlapping with the others, is a potential focal point for research, teaching, and curriculum development in statistics. They have been the motivating force behind the creation of an ongoing international working group in statistics education that has spawned the SRTL conferences and produced a book on statistical thinking, literacy, and reasoning (Garfield & Ben Zvi, 2004).

A Model of Statistical Thinking

How do statisticians generate statistical questions? How do they decide upon a design for a study, which data to collect, and what to take into account when analyzing the data? Are particular ways of thinking germane to statistics? Wild and Pfannkuch (1999) presented a four-dimensional model of statistical thinking consisting of two dimensions they called cycles of thinking activity (an Interrogative cycle and an Investigative cycle) and two more dimensions called Types of statistical thinking and Dispositions, respectively (see Figure 21.1).

According to Wild and Pfannkuch, when a statistician works on a statistical problem, parts of these four dimensions are continually, and simultaneously, in use (Pfannkuch & Wild, 2000, 2004). They claim that
[Figure 21.1: Wild and Pfannkuch’s four-dimensional model of statistical thinking. Panel contents:]

(a) Dimension 1: The Investigative Cycle (PPDAC)
• Problem: grasping system dynamics; defining problem
• Plan: planning of the measurement system, “sampling design,” data management, piloting and analysis
• Data: data collection; data management; data cleaning
• Analysis: data exploration; planned analyses; unplanned analyses; hypothesis generation
• Conclusions: interpretation; conclusions; new ideas; communication

(b) Dimension 2: Types of Thinking
General types
• Strategic: planning, anticipating problems; awareness of practical constraints
• Seeking explanations
• Modelling: construction followed by use
• Applying techniques: following precedents; recognition and use of archetypes; use of problem solving tools
Types fundamental to statistical thinking (foundations)
• Recognition of need for data
• Transnumeration (changing representations to engender understanding): capturing “measures” from real system; changing data representations; communicating messages in data
• Consideration of variation: noticing and acknowledging; measuring and modelling for the purposes of prediction, explanation, or control; explaining and dealing with; investigative strategies
• Reasoning with statistical models: aggregate-based reasoning
• Integrating the statistical and contextual: information, knowledge, conceptions

(c) Dimension 3: The Interrogative Cycle
• Generate: imagine possibilities for plans of attack, explanations/models, information requirements
• Seek: information and ideas, internally and externally
• Interpret: read/hear/see; translate; internally summarize; compare; connect
• Criticize: check against reference points, internal and external
• Judge: decide what to believe, continue to entertain, discard

(d) Dimension 4: Dispositions
• Scepticism
• Imagination
• Curiosity and awareness (observant, noticing)
• Openness (to ideas that challenge preconceptions)
• A propensity to seek deeper meaning
• Being logical
• Engagement
• Perseverance
their model of statistical thinking can also be used to analyze student thinking, not just a statistician’s thinking, and that such analyses can inform both teaching and curriculum development in statistics. The initial building blocks for their model came from three data sources: students’ work on statistical tasks; interviews with student team leaders of statistical projects; and, perhaps most importantly, interviews with six statisticians working in various settings (business, marketing, medicine, etc.). In developing their model Wild and Pfannkuch tapped and expanded upon the writings of previous statistics educators, such as Moore (1990, 1997), who discussed the omnipresence of variability, the need for data and data production strategies, and the measuring and modeling of variability. Statistics educators are likely to resonate with Wild and Pfannkuch’s model, whereas mathematics educators who have not delved much into the area of statistics may encounter some thought-provoking and unfamiliar issues. Cobb and Moore (1997) have written eloquently and persuasively about the differences between the disciplines of statistics and mathematics. Some of the differences between solving mathematical problems and solving statistical problems surface in the Wild and Pfannkuch model.

The Investigative cycle, tabbed PPDAC (Problem, Plan, Data, Analysis, Conclusion), is really the mantra of all statistical investigations and is reminiscent of Polya’s (1945) seemingly timeless four-step model of mathematical problem solving (Understand, Plan, Execute, Review). However, there are some crucial differences between Polya’s model and PPDAC, chiefly lying in the two Ps and the D. Statistical problems are often ill posed at first, as they arise out of messy contexts. Often there is a need for several iterative cycles just between the Problem ↔ Plan phases of the Investigative cycle, in order to adequately formulate a statistical problem from a murky prestatistical situation. The D, for data in the PPDAC cycle of a statistics problem, is very different from the type of data that one might encounter in a mathematical problem. Data for statistics problems are accompanied by excess baggage such as bias, uncontrollable sources of variation, context issues, and so on. Most of the current statistics education in the United States places a heavy emphasis on the DAC parts of the Investigative cycle, but precious little time is devoted in classrooms to the PP parts. If students are given only prepackaged statistics problems, in which the tough decisions of problem formulation, design, and data production have already been made for them, they will encounter an impoverished, three-phase investigative cycle and will be ill equipped to deal with statistics problems in their early formulation stages.

The dimension on Types of thinking is two-pronged in the Wild and Pfannkuch model. Although some thinking is inherently of a statistical nature, there are also more general types of strategic thinking that remind one of Polya’s problem-solving heuristics. The inherently statistical types of thinking highlighted by Wild and Pfannkuch include (a) the need for data; (b) attention to variation; (c) the use of historical, statistical, and probabilistic models (such as those used in inference); (d) the critical importance of context knowledge; and (e) transnumeration, a word coined by Wild and Pfannkuch. Except for the modeling tools aspect, Wild and Pfannkuch’s list is quite different from a typical list of the hallmarks of mathematical thinking, which include processes such as looking for patterns, abstracting, generalizing, specializing, and generating and applying algorithms. In particular, the invented word transnumeration needs some further explication.

Wild and Pfannkuch created the word transnumeration because sometimes in the data organization and analysis phase, a particular representation of the data can reveal entirely new or different features that were previously hidden. These hidden features may have a major impact on how one interprets the data in that particular context. Wild and Pfannkuch needed a word that went beyond a mere transformation or re-representation of the data, so as to identify instances in which striking features of a context are suddenly revealed. An analogy might be the sudden-insight “Eureka!” experience that mathematical problem solvers often speak about.

An example of transnumeration occurred in the reanalysis of the O-ring data for the Space Shuttle after the Challenger incident (Dalal, Fowlkes, & Hoadley, 1989). It was hypothesized that there might be a relationship between the air temperature at launch time and O-ring failures on the shuttle. The original data set that was being analyzed prior to the launch of the Challenger included only information on the launch temperature and number of O-ring failures per launch. No significant trends could be found. However, when data on launch temperatures and the number of O-ring successes were included, the augmented data set revealed a cut-off temperature above which O-ring failure had never occurred on the shuttle. The fact that it was probably too cold to launch the Challenger was not clearly evident until after this transnumeration process.

Although transnumeration is more likely to be encountered by statisticians as they investigate a statistical situation, it can also occur during teaching episodes with students. For example, while watching students work with a data set on the wait time between
blasts for the Old Faithful Geyser, Shaughnessy and own opinions about statistical information and data-
Pfannkuch (2002) found that when students moved based messages” (Gal, 2003, p.16). There is a clear,
from representing the data in box plots or histograms overarching importance for students and adults alike
to creating plots over time, a whole new vista on the to be able to critically read and evaluate information
data was opened up to them. (The geyser data are in tables, graphs, and media reports, and to adopt a
discussed in more detail later in this chapter. See for healthy questioning attitude towards what is presented
example Figures 2 and 3). Plots over time revealed a by sellers and buyers, by scientists and by the govern-
short-long cycle of wait times, whereas box plots and ment, by politicians and by the news media. Perhaps
other graphical representations of the geyser data someday quoting numbers without an adequate basis
masked the nature of the variability. A teaching and in fact will be outlawed. Meanwhile everyone is sus-
learning culture that encourages transnumeration ceptible to those who would stretch the truth, or only
might evolve if teachers and curriculum developers tell that part of a data story that suits their purposes.
heed the advice of some researchers who have rec- Better that we pay heed to quotes such as those above,
ommended that students have more opportunities than that we fall prey to statements such as this one
to construct their own representations of data rather made by a mid-western congressman several years ago
than working primarily with canned tables and graphs while on the campaign trail: “We will legislate that all
(Cobb, 1999; Lehrer and Romberg, 1996). In this re- children in our state schools will be above average.”
gard, the Tinkerplots software (Key Curriculum, 2005a) allows students considerable flexibility in constructing their own visual representations of data and in generating their own hypotheses and conjectures, and thus it encourages the transnumeration process.

The Interrogative cycle in Wild and Pfannkuch’s model involves explicit metacognitive activity. There is an intense reflective component in the Interrogative cycle, as statistical problem solvers must always deal with beliefs, emotions, and the danger of their own, or their client’s, narrow perspectives. The Interrogative cycle echoes the claim by Shaughnessy and Pfannkuch (2002) that statistical problem solving needs to be done by “data detectives” who continually question and reflect upon the processes of data production and data analysis.

The Dispositions dimension of Wild and Pfannkuch’s model has much in common with not only mathematical problem solving, but also problem solving in any arena. All problem solvers need to be curious, be aware, have imagination, be skeptical, be open to alternative interpretations, and seek deeper meaning, as in the look back stage of Polya’s model.

Models Focused on Statistical Literacy

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” This quote from H.G. Wells appeared in Darrell Huff’s famous little book How to Lie with Statistics (1954). Since then numerous authors have addressed the importance of statistics for an educated and competent citizenry. Consider: “The need for statistical thinking in social decision-making is exemplified every day in the news media” (Watson, 1997, p. 107); or, to be statistically literate, one must have “. . . the ability to interpret, critically evaluate, and express one’s

Although a thorough trek through the literature on statistical literacy is well beyond the scope or intention of this chapter, work in statistical literacy has provided several frameworks that are useful lenses to consider as we examine research in statistics education.

Among the abilities subsumed under statistical literacy are Document Literacy, Prose Literacy, and Quantitative Literacy that comprise the three facets of adult literacy identified by Kirsch, Jungeblut, and Mosenthal (1998). Kirsch et al. presented a five-category framework (locating, cycling, integrating, generating, and making inferences) that adults use when they read information from tables and graphs. Mosenthal and Kirsch (1998) illustrated the power of using this framework for understanding the types of skills that adults need in order to read documents by providing a measure of document complexity. This framework for analyzing the graph- and table-reading skills of adults has been used in publications of several large-scale surveys, such as The National Adult Literacy Survey (NALS) and the International Adult Literacy Survey (IALS). The categories in Kirsch et al.’s adult literacy framework are somewhat similar to Curcio’s (1989) three levels of reading graphs, “Reading the Graph,” “Reading Within the Graph,” and “Reading Beyond the Graph.” Reading the Graph corresponds to locating in the Kirsch et al. model. Integrating and generating are suggestive of Curcio’s Reading Within the Graph, and Reading Beyond the Graph includes such activity as making inferences in the Kirsch model. The resonance between these two frameworks for graph and table literacy, Kirsch et al.’s and Curcio’s, is indicative of what seems to be occurring more and more in research in statistics education: similar statistical reasoning phenomena have been identified independently by separate researchers who are pursuing related but not necessarily identical questions. Although the lan-
guage is different, the types of categories that Curcio and Kirsch et al. developed point to a similar complexity on a scale of graphical literacy. When interpretive frameworks align like this, they cross validate one another in powerful ways.

Gal has been one of the champions in researching and promoting statistical literacy, mainly as part of his interest in general adult literacy (Gal, 2002, 2004). In an article in the International Statistical Review¹ (Gal, 2002), Gal claimed that statistically literate behavior by adults depends on their ability to access five knowledge bases: general literacy knowledge (like the previously mentioned document, prose and quantitative types of literacy), statistical knowledge, mathematical knowledge, knowledge of the context, and knowledge of how to be critical and question claims. Gal also examined how these five knowledge bases can interact with a person’s dispositions, beliefs, and attitudes towards data and statistics in general. There is some overlap between Gal’s model of statistical literacy and Wild and Pfannkuch’s model of statistical thinking discussed above, although they are focused on different constructs: what adults need to do to be statistically literate versus what statisticians do on their job.

¹ International Statistical Review is a publication of the International Statistical Institute (ISI).

In another study Gal (2003) examined six web sites with large databases to analyze the types of knowledge and skills needed to access, process and manipulate the information on those sites. The web sites Gal examined contained information about national statistics, health issues, international education and international economic development. Gal noted that there were press releases, reports, executive summaries, and aggregate data sets available on each of these web sites but they were often in a form primarily intended for policy makers. A considerable amount of context and background knowledge was necessary for someone from the general public to wade through these websites and to make sense of the information that was provided. Gal concluded that the abilities to access, define, locate, extract, and filter information were additional critical skills needed for statistical literacy.

Clearly the type of statistical literacy that Gal described is a different kind of statistical literacy than just reading and evaluating data and graphs. There are many levels and contexts for literacy in Gal’s work. He implied that it is important for adults, particularly in their roles as consumers and voters, to become “web-data” literate. However, a person does not just automatically step in and start interpreting and critiquing data and document information on the web. It is challenging to unpack data that appear in reduced formats on the web, especially when the data come from contexts that are unfamiliar. Gal’s reflections on adult literacy should give both researchers and curriculum developers some pause as they think about what precursor skills are necessary for adults to develop this type of web literacy. This type of literacy also should start with students when they are in school. Perhaps students need to begin with digestible chunks from large databases that will help them scaffold from a basic level of statistical literacy (say, reading and interpreting tables and graphs) to the type of literacy that Gal is suggesting. In this regard, the Watson (1997) framework for statistical literacy may provide a roadmap to develop students’ statistical literacy skills.

While investigating students’ developmental progression in statistical literacy, Watson (1997) identified three Tiers of literacy that arose as students worked on statistical tasks involving the media, newspapers, magazines, news reports, and so on. Tier 1 is an understanding of basic statistical terminology; Tier 2 involves considering and embedding statistical terms within a real context; and Tier 3 encompasses a critical reasoning component. People at Tier 3 are able to question statistical claims and critique media items that involve statistics using their understanding of the basic statistical concepts and their understanding of the context (Tiers 1 and 2).

Watson and her colleagues began to use these Tiers in their research studies to help analyze student thinking on statistical tasks (Watson & Moritz, 1997a, 2000a, 2000b). Much of Watson and her colleagues’ subsequent work on statistical literacy arose from the planning, execution, and multiple analyses of the results of a large longitudinal study of the development of school students’ knowledge of probability and statistics. About a thousand students in Australia were initially tested on a large set of survey items on statistics and probability, and subsequently many of these students were retested 2 years later and some even 4 years later. Students’ understanding of such concepts as means and middles, chance, graphs, variability, sampling, comparing data sets, and their thinking on items from the media involving data and graphs were examined. As a result, over the past decade, Watson has been one of the most active and prolific researchers in statistics education in the world. Her work on students’ understandings of particular statistical concepts will be addressed throughout this chapter.

In a reanalysis of data from survey tasks given to students in Grades 3 to 9 over a 7-year period (1993–2000), Watson and Callingham (2003) used Rasch model techniques and combined results on statistical tasks from
multiple student surveys from over 3000 students. They found quantitative support for the identification of six levels of development along a uni-dimensional statistical literacy scale, ranging from idiosyncratic to critical mathematical. These six levels are closely associated with Watson’s (1997) three Tiers of statistical literacy that initially arose through qualitative analyses of students’ responses to the tasks. Watson has thus provided substantial documentation in multiple research studies for the developmental progression of a complex construct called statistical literacy, as well as some of its components (Watson, 1997; Watson & Callingham, 2003; Watson, Kelly, Callingham, & Shaughnessy, 2003). In addition to the research contributions Watson has made in the area of statistical literacy, her three Tier framework (statistical terms, considering terms in context, and justifying statistical claims) provides a useful introductory framework for scaffolding instruction in statistical literacy.

Both Gal and Watson have made it clear that any model of statistical literacy must accord a major role to context, as well as the use of statistical terms, tools, and techniques. Furthermore, they both have noted the importance of being able to communicate reactions to statistical information and to critique it (Watson, 1997; Gal, 2002). According to Watson and Moritz (1997a) “Judging statistical claims from the media is fundamental to being statistically literate” (p. 129). The research into statistical literacy has unveiled a very deep construct involving a myriad of types of skills and cognitive processes. It is therefore important for researchers to be quite clear about what they mean by statistical literacy when they use the term, as it has many levels of meaning.

Models That Capture Statistical Reasoning

Over the last 15 years a number of researchers have conducted studies that focus on students’ reasoning about particular statistics concepts or processes, such as students’ notions of average, variation, sampling, comparing data sets, graph sense, and data representations. In most of these studies an explicit, or implicit, conceptual analysis occurs. Researchers attempt to distill and categorize the ways students reason with and think about the concepts, sometimes resulting in the identification of levels of reasoning. In this section I’ll first discuss a model that has often been used to identify and assess levels of student reasoning in mathematics and statistics and then point to several emerging models for student reasoning.

The SOLO Model

The SOLO Model is a neo-Piagetian model created to analyze the complexity of student responses to tasks. SOLO, which stands for Structure of Observed Learning Outcomes, was first developed by Biggs and Collis (1982) as a general model for evaluating learning in any context or environment. It has been heavily used in the past decade particularly by Australian researchers, and often to help code and analyze collections of responses to tasks that arise in clinical interviews. I discuss SOLO here briefly because it has become a prevalent analytical tool in some research circles in statistics education.

SOLO posits five modes of reasoning: sensorimotor, ikonic (images), concrete symbolic, formal, and post formal, adding “post formal” to Piaget’s original four modes. SOLO also postulates U-M-R (uni-structural, multi-structural, relational) cycles within each mode. Within a given mode, for a particular task, there may be several such U-M-R cycles. These cycles represent increasing orders of complexity while a person functions within a particular mode. Uni-structural responses suggest attention to only one relevant aspect of a task, whereas multi-structural responses involve several disjoint but relevant aspects, and relational responses suggest an understanding that integrates several aspects of a task within a mode. Analyzing student verbal responses is a tricky business, as anyone who has done it is aware. In studies that employ the SOLO methodology, boundaries between the U-M-R levels are often blurred, but that does not necessarily prevent the model from assisting a researcher in identifying a spectrum of complexity in student responses.

A good example of the potential advantages of the SOLO framework can be found in Watson, Collis, Callingham, and Moritz (1995). They introduced the SOLO model and then applied it at the concrete symbolic mode to analyze student responses to a sorting task. Students were given a set of data cards containing information about a set of people, including their eating habits, their TV watching habits, and their weight. The authors originally hypothesized that there would be two U-M-R cycles within the concrete symbolic mode on the data cards task given to students in Grades 6 and 9. Students were first interviewed individually while working on the task, and then groups of students worked together with the data cards and presented a report. In their analysis, the authors outlined a U-M-R cycle with the data cards as follows:

U: Students form images of individual people and invent stories about what the people must be like. For example, they might describe an individual person, who watches a lot of TV, eats a lot of fast food, and seems of higher weight, but no real sorting occurs.
M: Students begin to group cards by aspects, one at a time; perhaps all the high and low TV watchers are grouped in separate piles.
R: Students are sorting by multiple aspects, and making conjectures. For example, people who were high in both TV watching and fast food consumption might be grouped, and conjectures might be made about their weight.

The second cycle that Watson et al. (1995) hypothesized was not found in individual responses but was found when groups of students worked on the card sorting task. This second cycle involved graphical representations of the data (U2, M2), and conjecturing and defending conjectures about the graphs (R2). Groups of students made both histograms and scatterplots of the card data.

SOLO may be a useful research tool for statistics educators, as its framework is designed to assess responses to open-ended complex tasks that elicit a hierarchy of student reasoning. The SOLO model also helps researchers to identify partial success on tasks, so that a spectrum of student reasoning can be observed, rather than an all-or-nothing, right-or-wrong approach. For example, in an investigation of elementary school students’ reasoning about data sets, Jones, Thornton, Langrall, Mooney, Perry and Putt (2000) modeled a spectrum of student reasoning using an approach quite similar to SOLO to describe four stages of reasoning: idiosyncratic, transitional, quantitative, and analytical. This four-stage model was subsequently employed by Mooney (2002) to characterize middle school students’ statistical reasoning.

Some Emerging Models

The recent development of cognitive and developmental models to interpret student reasoning in statistics is a healthy sign of growing research maturity in the field. A detailed review and analysis of the use of models in statistics education research can be found in Jones, Langrall, Mooney, and Thornton (2004). Among the models they discussed are a series of four-stage, SOLO-like models that they have developed for interpreting and analyzing primary and middle school students’ responses to statistics and probability tasks (Jones, Langrall, Thornton, & Mogill, 1999; Jones et al., 2000; Mooney, 2002). In the context of reading and interpreting data in graphs, Jones et al. (2000) superimposed their four-stage model of student reasoning (idiosyncratic, transitional, quantitative, and analytical) on Curcio’s (1989) three stages of graph sense: reading the data, reading between the data, reading beyond the data. They provided rich descriptions of student reasoning within each cell of the resulting 3 × 4 (Curcio × Jones) matrix.

In addition to the models that have emerged from SOLO theory (e.g., Jones et al., 1999, 2000), several groups of researchers, initially working quite independently, have identified and begun to use similar terms to describe types of student thinking. For example, Saldanha and Thompson (2003) referred to “additive” and “multiplicative” thinking when they described students’ work with sampling distributions. Shaughnessy, Ciancetta, and Canada (2004a) identified three levels of student reasoning about variability in a repeated samples environment, “additive, proportional, and distributional.” Watson (2002) referred to “additive” and “distributional” reasoning when students are comparing data sets, and Cobb (1999) to “additive” and “multiplicative” thinking when he discussed students’ thinking while comparing data sets. The “additive/multiplicative” terminology has been borrowed from other areas of research in mathematics education (Vergnaud, 1983; Doerr, 2000; Thompson & Saldanha, 2003). The terms proportional and distributional are particularly important for statistics and can have specialized meanings. Examples of the importance of proportional reasoning for handling statistical tasks are presented in Watson and Shaughnessy (2004). Ways of characterizing distributional reasoning are being developed by several groups of researchers, for example, Bakker and Gravemeijer (2004), Shaughnessy et al. (2004a), and Shaughnessy, Ciancetta, Best, and Noll (2005). Further discussion of elements of these emerging models is provided in other sections of this chapter.

In this section I have presented some models of statistical thinking, statistical literacy, and statistical reasoning: interpretive and descriptive frameworks that researchers have used when analyzing student responses to statistical tasks. The spectrum of the models and frameworks discussed includes both the particular and the general, from Wild and Pfannkuch’s model of statistical thinking that is specifically tailored to statistics, to the SOLO model that is a general framework for analyzing student responses in any content area or discipline. In the next sections I discuss some of the research that has focused on students’ understanding of particular statistical concepts and processes, such as centers and average, variability, information from samples, comparison of data sets, and graph sense.

RESEARCH ON STUDENTS’ UNDERSTANDINGS OF SOME STATISTICAL CONCEPTS

The research on students’ conceptual understanding of statistics has been conducted with a wide range of students from primary, to middle school, to second-
ary, to tertiary. In this regard, there has been a change from the research reported in the first Handbook of Research in Mathematics Education (Shaughnessy, 1992) in two ways. First, much of the research reported in the first Handbook was conducted with college level students. Second, the stochastics research at the time of the first Handbook had concentrated more on probability concepts, or probability distributions, than it did on statistical concepts. This time around a majority of the research in stochastics has been conducted on students’ understanding of specific statistics concepts.

Perhaps the overarching goal of statistics education is to enable students (of any age) to read, analyze, critique, and make inferences from distributions of data. The concept of a distribution in statistics is very complicated, and the word is used in different ways. Statisticians talk about distributions of data, but they also talk about sampling distributions and probability distributions. Distributions of data sometimes have an underlying probability distribution, such as the normal distribution or the binomial distribution, or there may be no apparent normative probability distribution for a data set. The term sampling distribution often refers to a finite frequency distribution of repeated observations of some statistic (such as a distribution of means, or sample proportions, or standard deviations) that have been calculated for samples drawn from some population. On the other hand, the term sampling distribution may refer to the “distribution of all possible such statistics” for a given population, an infinite, theoretical construct. A more detailed discussion of these various uses of the word distribution can be found in Shaughnessy and Chance (2005).

For the purposes of this chapter, an understanding of students’ reasoning about distributions is the (oft unstated) goal behind much of the research that has been conducted on students’ understandings of particular statistical concepts. The notion of distribution includes concepts like center, shape, and spread. When researchers investigate student thinking about means and middles or about variation, or when they investigate students’ thinking while comparing data sets or making decisions or inferences from graphs, they are researching students’ understandings of aspects of distributions of data.

Bakker and Gravemeijer (2004) discussed examples of both upwards (from particular data points to entire distributions of data) and downwards (from entire distributions back to particular data points) reasoning by students as they learn to reason about distributions of data. Shaughnessy, Ciancetta, Best, and Noll (2005) presented examples of students’ thinking about centers, spread, and shapes as they reasoned about distributions of data.

In the next sections of this chapter, as the research lens focuses on students’ understanding of various statistical concepts such as centers, or variation, or information from samples, or on students’ graph sense, keep in mind that this division into subsections is one of convenience. The aspects of a distribution (center, shape, and spread) are quite interrelated. Furthermore, distributions are often represented in graphical form so that reasoning about distributions is also confounded with reasoning about the graphs themselves. Though all these concepts and constructs are connected within the field of statistics, research that focuses on a particular concept in statistics can sometimes reveal aspects of student thinking that help to inform the teaching of statistics.

Research on Students’ Understanding of Average

Concepts of average, middles and means in particular, are very powerful in statistics because means and other measures of center are used to help summarize information about an entire data set. Furthermore, if the data set is a sample that has been appropriately drawn from a parent population, the sample should mirror aspects of the parent population, and the sample mean should provide an estimate for the mean of the parent population from which the sample was drawn. On a more basic level the mean of a data set or a sample is representative, or in some sense typical, of that sample or data set. “Typical” values of different data sets can provide an efficient, albeit at times potentially misleading, mechanism to compare and contrast those whole data sets. Indeed the mean is a very powerful idea in statistics, and it is fundamental for students’ understanding of summary statistics and statistical tests. Thus, there is plenty of motivation for researchers to study students’ conceptions of average. It is one of the most important concepts in all of the mathematical sciences. And yet, students’ school experiences with concepts of average are often reduced to a computational shell.

What do students do if they are given an opportunity to think intuitively or conceptually about averages? What notions of average, if any, do they employ when making decisions? Are students’ notions of average the common statistical notions, such as mean or middles, or do they have other conceptions of what average means?

Early Work on Average

Much of the early research on students’ understanding of averages was done with tertiary students (e.g., Pollatsek, Lima, & Well, 1981; Mevarech, 1983;
Pollatsek, Konold, Well, & Lima, 1984). Pollatsek et al. (1981) found that when college students were given the population mean (500) for SAT scores and a rather extreme data value from a sample of SAT scores, the students did not take the extreme value into account or adjust their prediction for a sample mean away from 500. Rather, they just predicted that the mean of the sample would also be 500. Students often believe that the mean is always their best bet for any prediction. Sometimes they even believe the mean is the most likely result to occur in a sample, even if the mean itself is not a possible data point. The Pollatsek studies also found that when tertiary students were given means for two unequal-sized samples and asked to find the mean of the combined sample, the students tended to weight the samples equally and find the midpoint of the two sample means (average of averages, a common mistake by students). Mevarech (1983) referred to this as the “closure misconception,” as she hypothesized that students have a “group structure” in the back of their mind when operating with means or variances in statistics. These early studies on students’ understanding of the mean indicated that many tertiary students have at best a very procedural understanding of the mean, as something to be calculated.

Strauss and Bichler (1988) conducted one of the first studies reported on younger students’ (Ages 8–14) conceptions of average. They asked students a series of structured questions in a one-on-one setting to probe students’ understanding of “statistical” or “abstract” properties of average, as well as whether students realized that an average was representative of a set of values. Strauss and Bichler concentrated on computational and measurement properties of the mean, rather than on conceptual properties. For example, they tested to see if students realized that the mean had to be located between the extreme values in a data set. They tested to see if students were aware that the mean is influenced by particular values in a data set. They tested to see if students realized that the average itself did not have to be one of the values in the set, or if students knew that the sum of deviations of data points from the average was zero. They also tested to see if students realized that the mean is in some sense closest to all the values in a data set. (Formally, this would mean that the mean is the value that minimizes the sum of the squared deviations from data points, a fairly advanced concept for students Ages 8 to 14.)

In their analysis Strauss and Bichler identified two very different levels of difficulty in their tasks. On the one hand, students were quite aware that the mean was between extremes, and that particular data values can influence the mean. However, more sophisticated measurement ideas like minimizing deviations, or that a zero data value must also be included and accounted for when calculating the mean, proved extremely difficult for their students. The process of minimizing deviations from the mean normally surfaces in regression analysis in high school or college, and it can be challenging for students at that level, so it is no wonder these younger students had trouble with it. Strauss and Bichler found that children do not think about the concept of the mean in the same ways that statistically mature adults do.

Conceptual Models of Students’ Thinking About Averages

Mokros and Russell (1995) conducted one of the first studies that investigated young students’ conceptual understanding of averages (see also Russell & Mokros, 1996). They interviewed Grade 4, 6, and 8 students in what they called “messy data” situations, using contexts like allowance money and food prices that were familiar to students. The tasks went beyond straightforward algorithmic computations and tried to elicit students’ own developing constructs of average. All of their students had been taught the procedure for finding the arithmetic average, so they had some familiarity with computing means.

The types of tasks Mokros and Russell used tended to ask students to work backwards from a mean to possibilities for a data set that could have that mean. For example, in the Potato Chips problem, students were told that the mean cost of a bag of potato chips was $1.35, and then they were asked to construct a collection of bag prices that had that mean of $1.35. The Allowance problem was similar except that students were given several existing data points along with the mean, were asked to create the rest of the collection of data points that would have that mean, and were told that they could not use the mean itself as a data point. Mokros and Russell were searching for students’ own preferred strategies when dealing with averages. Their analysis of students’ thinking resulted in the identification of five different mental constructs that students had of average: average as mode, average as algorithm, average as reasonable, average as midpoint, and average as point of balance.

Mokros and Russell found that students who focused on modes, or “mosts,” in data sets had difficulty working backwards from the mean to construct a data set if they were not allowed to use the actual average value itself as a data value. They concluded that modal thinking students do not see the whole data set, the distribution, as an entity in itself. They see only individual data values. About the same time as the Mokros and Russell study, research by Cai (1995) found that although most students could calculate a mean when given all
the data, they had great difficulty working backwards, filling in missing values when given the mean.
not provide good underpinnings for connecting to computational algorithms for average.
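Two of the computations that these studies asked students to carry out can be written down directly. The sketch below is illustrative only: the function names and all of the numbers are hypothetical and are not taken from Pollatsek et al. (1981), Cai (1995), or Mokros and Russell (1995). It contrasts the size-weighted mean of two combined samples with the equal-weight midpoint that students tended to report, and it works backwards from a given mean to a single missing data value.

```python
def combined_mean(means, sizes):
    """Mean of the pooled sample: each group mean is weighted by its group size."""
    if len(means) != len(sizes) or not sizes:
        raise ValueError("means and sizes must be non-empty and of equal length")
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


def midpoint_of_means(means):
    """The 'average of averages' shortcut: every group is weighted equally."""
    return sum(means) / len(means)


def missing_value(target_mean, known_values, n_total):
    """Work backwards: the one value that completes a data set of n_total
    values so that the mean of the whole set equals target_mean."""
    if n_total != len(known_values) + 1:
        raise ValueError("exactly one value must be missing")
    return target_mean * n_total - sum(known_values)


# Hypothetical numbers, chosen only for illustration.
pooled = combined_mean([70.0, 80.0], [10, 30])   # (70*10 + 80*30) / 40
naive = midpoint_of_means([70.0, 80.0])          # (70 + 80) / 2
fill = missing_value(5.0, [3.0, 4.0, 6.0], 4)    # 5*4 - (3 + 4 + 6)

print(pooled, naive, fill)
```

With these made-up inputs the pooled mean is pulled toward the mean of the larger sample, whereas the midpoint ignores sample size; the two agree only when the samples are the same size or have the same mean.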
Mokros and Russell also discovered that students who had a purely algorithmic conception of average were unable to make connections from their computational procedures back to the actual context. For example, in the Potato Chips problem, some students multiplied $1.35 by 9, and then just divided by 9 again, thus generating a data set that had $1.35 for an average, but every single value in the data set was also $1.35. For such students average is something that you do with numbers; they have no rich conceptual understanding of average. On the basis of responses gathered during student interviews Mokros and Russell suggested that students who preferred a rule or algorithm for averages may actually have had their own intuitive thinking about average as “typical” interfered with during their schooling.

Students who thought of average as reasonable tended to refer to information from their own lives. Perhaps they thought of average as a mathematically reasonable, but not necessarily precise, approximation for a set of numbers. These students thought that no precise answer existed for problems like the Elevator problem, which asked “If there are 6 women who average 120 pounds, and 2 men who average 150 pounds in an elevator, what is the average weight of everyone in the elevator?” Students would say, “But, you don’t know everyone’s exact weight.” For such students, the mean is somewhat representative of a situation, but not always calculable.

Even though students might not formally know what a median is, they have a good sense of average as midpoint. Mokros and Russell found that some students worked backwards to a distribution by symmetrically choosing values above and below the average, for example, above and below the $1.35 value for the Potato Chips problem. Similar to the students who focused on modes, these students also had some trouble when they were not allowed to use the average itself as one of the data points.

Mokros and Russell provided one of the first attempts to build a developmental framework for characterizing students’ thinking about average. In their research they were disappointed to not find more students who had richer conceptions of average, such as average as balance point, which might be adapted more naturally to computational algo-

Foreman and Bennett (1995) make the case that a related intuitive concept of average, average as fair share, does in fact translate well to computational algorithms. Students represent data in columns of blocks, and then “level” the blocks to produce a visual display of fair shares. This approach starts with whole numbers but can generalize to rational numbers (as stacks of blocks can be sketched and “cut” when necessary). The fair-share approach allows students to generate their own computational algorithms that reflect the physical leveling of stacks of blocks. This “leveling” approach can also lead to the mean as the point of balance for a distribution, as it highlights the value of the mean as a representative value of an entire data set. Friel (1998) discussed connections to computational algorithms for both the leveling and the balancing models.

In a longitudinal study of the development of students’ concepts of average in Grades 3 to 9, Watson and Moritz (2000c, 2000d) built upon the work of Mokros and Russell, as well as work of other researchers. Watson and Moritz included questions such as “Have you heard of the word average? What does it mean?” and “How do you think they got the average of 3 hours a day for watching TV?” taken from a media report. In another question they told students “On average, Australian families have 2.3 children. What can you tell from this?” In addition to probing for initial understandings of average in media or everyday contexts, Watson and Moritz asked students to work backwards and fill in missing data values when given the mean, and to find the average in weighted mean situations, similar to the tasks of Mokros and Russell.

Watson and Moritz (2000c, 2000d) used the SOLO taxonomy to model student responses on tasks about average and they described six levels of student understanding of average. For example, a response by a student who invented a story about a secret camera that was used to keep an eye on students to see how much television they were viewing instead of doing homework was classified as pre-structural. Responses involving colloquial terms for average like “normal” or “alright” were coded uni-structural. A reference to a computational algorithm without strong conceptual connections was coded as multi-structural. If students’ thinking suggested that they considered
rithms. They concluded that the development of the average as a representative for an entire data set,
higher-level conceptions of average may need to be their response would be considered relational. Wat-
scaffolded for students via instructional interactions. son and Moritz also later identified two response
They also lamented the fact that some students’ pre- levels beyond relational with a subset of these same
ferred intuitive notions of average, such as average students who successfully dealt with applications of
as modal or average as reasonable, conceptions that do the mean in multiple contexts.
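The Elevator problem and the fair-share ("leveling") model described above both reduce to pooling a total and sharing it equally. The following is a minimal sketch of that arithmetic, not a procedure taken from the studies cited; the function names and the stack heights are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def weighted_mean(groups):
    """Mean of a combined data set, given (count, group_mean) pairs."""
    total = sum(Fraction(count) * Fraction(group_mean) for count, group_mean in groups)
    n = sum(count for count, _ in groups)
    return total / n

def level_stacks(stacks):
    """Fair-share model: pool all blocks and share them equally.

    The result is a Fraction, so a stack may have to be "cut"
    when the blocks do not divide evenly.
    """
    return Fraction(sum(stacks), len(stacks))

# Elevator problem: 6 women averaging 120 lb and 2 men averaging 150 lb.
elevator_mean = weighted_mean([(6, 120), (2, 150)])

# Hypothetical stacks of blocks (illustrative values, not from the source).
leveled_height = level_stacks([2, 7, 4, 5])

print(float(elevator_mean))  # 127.5
print(leveled_height)        # 9/2
```

The combined mean can be calculated without knowing any individual weight, and it differs from the midpoint of 120 and 150 because the two groups are of unequal size.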
Watson and Moritz have provided strong evidence from a large sample of student interviews (over 90) that students’ conceptions of average follow a developmental path that proceeds from “stories” then to “mosts and middles” and finally to the mean as “representative” of a data set. At Grade 3, 71% of the students’ responses were either pre-structural or uni-structural. At Grade 5, about 50% of the responses were above the uni-structural level, and by Grade 7 all the students in their study were at least at the multi-structural level. A strong positive association was found between the SOLO levels and student responses by grade level. A partial explanation for this might be that by Grade 7 the three measures of center (mode, median, and mean) were to have been covered in the Australian national curriculum recommendations. However, there is also a purely developmental component to the students’ understanding of average that is very robust. Watson and Moritz (2000c) conducted follow-up interviews 3 to 4 years later with these same students and found that they had maintained the level of their thinking about average. Once students had developed powerful and flexible conceptions of the mean, including the ability to calculate weighted averages in real contexts, they maintained those conceptions over time.

Watson and Moritz’s results suggest that it takes many years for students to develop their concept of the mean to the point where it is a representative of a data set. Whereas Mokros and Russell identified each student’s dominant conception of average, Watson and Moritz have documented and analyzed the great variety of conceptions of average that a given student can have. They claimed that, “It would appear that many students hold eclectic ideas associated with average, even when they have not been taught formally” (Watson & Moritz, 2000c, p. 46). They strongly recommended that the teaching of the concept of average should build up from students’ initial preferences for “middles” and “mosts” to the more normative conception of mean as representative of a data set. They also recommended delaying the formal introduction of the mean until students’ developmental stages have had an opportunity to unfold. Bakker and Gravemeijer (2004) made similar recommendations for delaying the introduction of some of the more formal concepts in statistics, like the mean. They found a wealth of student ideas to build upon, including informal conceptions of center and shape, when students were asked to compare distributions of data, or to generate their own hypothetical distributions of data.

Konold and Pollatsek (2002) added to the literature on the complexity of the average concept with a theoretical reflection that extends Mokros and Russell’s earlier work. They postulated four conceptual perspectives for the mean: mean as typical value, mean as fair share, mean as data reducer, and mean as signal amid noise. They argued that from a statistical point of view the mean as “signal amid noise” is the most important and most useful conception of the mean, because they feel that this conception of the mean is the most helpful for comparing two data sets. Furthermore, they recommended that the mean should be introduced to students in the context of comparing data sets. They also argued that inasmuch as conceptions of average like typical or fair share are not as powerful for comparing groups, they shouldn’t be emphasized with students. Konold and Pollatsek’s conceptions of mean as typical or mean as fair share are more closely tied to a data analysis perspective on statistics, while their conceptions of mean as a data reducer or mean as a signal are more closely connected to decision-making in statistics. Data reduction is necessary in decision-making in order to locate an informative signal amid the noise of variability.

From a normative point of view (what statisticians are looking for) Konold and Pollatsek may have a good argument. However, on the basis of the work of Mokros and Russell and of Watson and Moritz, mean as fair-share, and subsequently mean as typical value, are perhaps better first introductions to the notion of measures of center, because they build on students’ primary intuitions. The mean as data reducer requires more sophistication from students, and a willingness on their part to let go of some pieces of information. This can be difficult for students to do, especially in those instances when their own information is included in a data set, for they feel their information will get lost in the data reduction process. The mean as signal amid noise involves yet another level of complexity. It takes considerable experience with data sets to realize that there even is such a thing as “noise” in data. Noise could appear in data just from random variability in samples or in gathering data from probability experiments. Noise could be the result of measurement error, either systematic error or careless error. Noise could be introduced by poor data-production techniques, or biases in sampling procedures. Teachers and students must spend some time focusing on the noise itself before getting too carried away with determining a signal in the data. I am convinced that it is just as important, if not more so, to analyze the variability in data, and to look for both special-cause and common-cause variation in data as it is to compare data sets by comparing their means. Wild and Pfannkuch (1999) agree, and identify variability as the critical component in the development of students’ understanding of distributions of data.
Research on Students’ Understanding of Variability

Like average, variability is itself a very complex construct. Researchers have often tended to use the terms variability and variation somewhat interchangeably. I prefer the distinction pointed out in Reading and Shaughnessy (2004), in which variability is the propensity for something to change, and variation is a description of or a measurement of that change. “A survey of various dictionaries demonstrated that variation is a noun used to describe the act of varying or changing condition, and variability is a noun form of the adjective variable, meaning that something is apt or liable to vary or change. The term variability will be taken to mean the characteristic of the entity that is observable, and the term variation to mean the describing or measuring of that characteristic” (Reading & Shaughnessy, 2004, pp. 201–202). However, I will attempt to remain faithful to the way that particular researchers refer to the concept, variability or variation, when discussing their research.

Within the field of statistics variability arises everywhere. Even in formal statistics variability is a slippery concept. Data vary, samples vary, and distributions vary. Furthermore, variation occurs both within samples and distributions as well as across samples and distributions. A large part of statistical analysis often involves parsing out the relative contributions and locations of sources of variation. The objects within which variability occurs in statistics (in data, in samples, and in distributions) are clearly not independent of one another. Statisticians may be interested in the data in a particular sample, or the distribution of a sample, or the sampling distribution of a statistic, or even variation across several distributions of data.

Similarly, research on students’ thinking and understanding about variability could focus on variation in data, or on students’ conceptions of variability in samples, or on the variability across several distributions of data that are being compared. Thus, variability occurs within many levels of statistical objects, and our students need to develop their intuition for what is a reasonable or an unreasonable amount of variability in these objects.

It is a bit surprising then, given the overarching importance of variability in statistics (Moore, 1997), that practically no research on students’ conceptions of variability was reported prior to 1999. A number of researchers had called for research on the teaching and learning of variability and had raised concerns about the lack of explicit attention to the concept of variability in introductory statistics texts or in school curriculum materials (Green, 1993; Batanero, Godino, Vallecillos, Green & Holmes, 1994; Loosen, Lioen, & Lacante, 1995; Shaughnessy, 1997). It wasn’t until the turn of the millennium that research on and about variability began to appear at professional meetings or in scholarly journals.

Several research efforts have identified a variety of components of variability that indicate the potential depth of students’ thinking and understanding about variability. For example, Wild and Pfannkuch (1999) included a number of aspects of variation in their model of statistical thinking, such as acknowledging, measuring, explaining, and controlling variation. Reading and Shaughnessy (2004) extended Wild and Pfannkuch’s list of aspects of variation to include describing and representing variability, and Canada (2004) provided a detailed framework for analyzing students’ thinking about noticing, describing, and attributing causes of variation. Reading and Shaughnessy also provided evidence from student work on several statistical tasks for both a description hierarchy and a causation hierarchy for variability. For example, in the description hierarchy lower level responses might be concerned only with outliers or only with middles (uni-structural), whereas a higher level response might mention both middles and extremes (multi-structural). At an even higher level a student response might discuss the deviations of data from some fixed value like the mean, thus making connections between the concepts of center and variability (relational).

Some questions come to mind in considering student thinking about variability. Do students acknowledge variability? If so, how do they describe it or talk about it? Are students’ conceptions of variability influenced by context? Do they recognize potential sources of variability? Will they attempt to control aspects of an experiment in order to minimize variability? Do students think about variability in a variety of ways, similar to the different conceptions of average that were discussed in the previous section? The next section reviews research that has investigated some of these questions and discusses students’ thinking about variability across the spectrum of statistical objects: in data, among samples, and across distributions.

Researchers have now begun to study students’ thinking about variability in a number of statistical contexts, such as when reasoning about distributions, dealing with samples and sampling, reasoning about the outcomes of a probability experiment, and comparing data sets (Shaughnessy, Watson, Moritz, & Reading, 1999; Melitou, 2000, 2002; Reading & Shaughnessy, 2000, 2004; Torok, 2000; Torok & Watson, 2000; Watson & Kelly, 2002; Reading, 2003; Ciancetta, Shaughnessy, & Canada, 2003; Shaughnessy, Ciancetta, & Canada, 2003; Watson, Kelly,
Callingham, & Shaughnessy, 2003; Canada, 2004; Shaughnessy, Ciancetta, Best, & Canada, 2004b; Watson & Kelly, 2004). Melitou (2002) presented one of the first reviews of the literature on research on the teaching and learning of variability and described a number of the ways that researchers were beginning to investigate student thinking about the construct. As this chapter was being written, an issue of the Statistics Education Research Journal (Vol. 4, No. 1) with a special section on reasoning about variation was in process (J. Garfield & D. Ben-Zvi, Eds.). In the next several sections I discuss research on students’ thinking about variability, in data, in samples, and in distributions. While this is a convenient approach to discuss student thinking about variability, it is also a somewhat artificial categorization because data, samples of data, and distributions of data all interconnect with one another.

Variability in Data

Data sets tell stories, and the heart of any statistical story is usually contained in the variability in the data. When analyzing data, the role of a student or a statistician is to be a “data detective,” to uncover the stories that are hidden in the data. From a data detective point of view, there are important signals in the variability as well as in measures of center. In fact, premature attention to measures of center can result in missing the important trends in the variability in the data. For example, consider the data for a series of wait times for eruptions of the Old Faithful geyser in Yellowstone National Park (Figure 21.2), and the accompanying student graphs of the geyser data (Figure 21.3). This example, taken from Shaughnessy and Pfannkuch (2002), presents data for approximately 3 consecutive days of wait times (in minutes) between eruptions of Old Faithful. Students were asked to work in groups, to analyze the data, and to make a decision on how long they would expect to wait for an eruption of Old Faithful.

Typically, many beginning students first just calculate a mean or determine a median for a day’s worth of Old Faithful data and then base their initial prediction on a measure of central tendency. (For the data in Figure 21.2, the mean of the first day is 70.1 and the median is 70.5; the mean of the second day is 75.9 and the median is 77, and so forth.) While the mean does give a one-number summary of the data set, it can also mask important features in the distribution of the data. When these students were subsequently asked to graph the data set, some of their representations uncovered a pattern in the data that was highly bimodal. More importantly, students discovered an alternating short-long pattern in the Old Faithful blasts when they created plots over time, or dot plots or bar graphs (see Figure 21.3).

This oscillating pattern can be completely missed if one just calculates a mean or draws a box-plot for the data. The signal in the variability is much stronger than the signal in the center in the Old Faithful example. The variation in the Old Faithful data is not random. There are likely to be some underlying geological causes or relationships for the variation. Shaughnessy and Pfannkuch found that students who attended to the variability in the data were much more likely to predict a range of outcomes or an interval for the wait time for Old Faithful, such as “Most of the time you’ll wait from 60 to 85 minutes,” than to predict a single number (such as 77 minutes) for the wait time. Means mask what is going on in the Old Faithful data set. The definitive long-short pattern is lost in an average. Means may be critical in making statistical inferences when comparing groups, but variation is often even more important to the data detective.

In the first edition of this Handbook I discussed some differences between the points of view of psychologists and of mathematics educators on research in the teaching and learning of statistics and probability. Konold and Pollatsek’s (2002) argument on the importance of the mean as the signal in data sets may have arisen from their frustration with students’ lack of attention to the mean (Konold et al., 1997) as a great tool for making comparisons between data sets, something that psychologists often wish to do in their research work. However, mathematics educators can be equally frustrated when students rush to compute a mean without even considering what the variability in a data set might reveal about the context (Shaughnessy & Pfannkuch, 2002). Students also need to recognize and investigate potential sources of variation within the data and not just rush to looking at centers.
Day 1: 51 82 58 81 49 92 50 88 62 93 56 89 51 79 58 82 52 88
Day 2: 86 78 71 77 76 94 75 50 83 82 72 77 75 65 79 72 78 77
Day 3: 65 89 49 88 51 78 85 65 75 77 69 92 68 87 61 81 55 93
Figure 21.2 Wait times in minutes for the eruption of the Old Faithful Geyser.
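The contrast between a measure of center and the short-long pattern in the Figure 21.2 data can be checked directly. The sketch below is illustrative only: the `alternation_fraction` helper is a hypothetical measure introduced here, not one used by Shaughnessy and Pfannkuch, and the wait times are transcribed from the figure as printed above.

```python
from statistics import mean, median

# Wait times in minutes, transcribed from Figure 21.2.
wait_times = {
    "Day 1": [51, 82, 58, 81, 49, 92, 50, 88, 62, 93, 56, 89, 51, 79, 58, 82, 52, 88],
    "Day 2": [86, 78, 71, 77, 76, 94, 75, 50, 83, 82, 72, 77, 75, 65, 79, 72, 78, 77],
    "Day 3": [65, 89, 49, 88, 51, 78, 85, 65, 75, 77, 69, 92, 68, 87, 61, 81, 55, 93],
}

def alternation_fraction(values):
    """Fraction of consecutive changes that reverse direction.

    Hypothetical helper: 1.0 means the series strictly alternates
    between going up and going down.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return reversals / (len(diffs) - 1)

for day, times in wait_times.items():
    print(
        day,
        "mean:", round(mean(times), 1),
        "median:", median(times),
        "range:", (min(times), max(times)),
        "alternation:", round(alternation_fraction(times), 2),
    )
```

For Day 1 every consecutive change reverses direction, which is the oscillation that a single mean or median conceals. A plot of wait time against eruption order would show the same thing visually.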
[Figure: bowl of lollies containing 20 yellow, 30 blue, and 50 red]
2. In Australia the word lollies is used for hard candies that are wrapped in cellophane or paper.
Jenny pulls out a handful of 10 lollies, counts the number of reds, and records it on the board. Then, Jenny puts the
lollies back into the bowl, and mixes them all up again.
Four of Jenny’s classmates, Jack, Julie, Jason, and Jerry do the same thing. One at a time they pull ten lollies, count the
reds, and write down the number of reds, and put the lollies back in the bowl and mix them up again.
____________ ____________
I think this because:
2. I think the list for the number of reds is most likely to be (circle one)
3. I think the numbers of reds went from (a low of) _______ to a high of _______.
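The sampling setup in this task can be worked out exactly, which gives a benchmark for judging the kinds of lists students predict. This is a minimal sketch under stated assumptions: the bowl holds the 100 lollies shown in the figure (50 of them red), each handful of 10 is a random draw without replacement, and the five handfuls are independent because the lollies are returned and remixed. The variable and function names are hypothetical.

```python
from fractions import Fraction
from math import comb

RED, TOTAL, HANDFUL = 50, 100, 10  # 50 red among 100 lollies; handfuls of 10

def prob_reds(k):
    """Exact probability of k reds in one handful (hypergeometric)."""
    return Fraction(
        comb(RED, k) * comb(TOTAL - RED, HANDFUL - k),
        comb(TOTAL, HANDFUL),
    )

pmf = {k: prob_reds(k) for k in range(HANDFUL + 1)}

# Expected number of reds in one handful.
expected_reds = sum(k * p for k, p in pmf.items())

# Chance that all five students pull exactly 5 reds (independent handfuls).
p_all_fives = pmf[5] ** 5

# Chance that a single handful has between 3 and 7 reds inclusive.
p_three_to_seven = sum(pmf[k] for k in range(3, 8))

print(expected_reds)  # 5
print(float(pmf[5]))
print(float(p_all_fives))
print(float(p_three_to_seven))
```

Under these assumptions 5 is the single most likely count, yet five handfuls in a row of exactly 5 is very unlikely, while most handfuls fall within a few lollies of 5.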
example, some students predicted all high numbers of reds, like 6, 7, 5, 8, 9, mostly numbers above the expected value of 5 for samples of 10 lollies from this 50% red mixture. These students usually reasoned that there were “a lot of red in there, so it (the red ones) will happen a lot.” Other students, mostly among the Grade 4 students, predicted all low numbers (all numbers ≤ 5) and said that there were a lot of “non-reds” in the mixture that would prevent the reds from being pulled very often. Still other students predicted a wide list of outcomes, for example, 1, 5, 7, 9, 2 (range ≥ 8) because “any result could occur, you never know,” suggesting that they may be using “outcome approach” reasoning (Konold, 1989) or an equi-probability conception (Lecoutre, 1992). Still other students predicted a very narrow list for the numbers of reds, for example, 5, 5, 5, 5, 5, or 5, 6, 5, 5, 6 (range ≤ 1). The “narrow” responses, especially all 5s, occurred more frequently among the older students, Grade 12 in particular. Students who predicted narrow reasoned that, “5 is the most likely outcome” or “5 is what you are supposed to get.” Reading and Shaughnessy (2000)
found that the narrow predictors were reluctant to change their answers, even after they acknowledged that the repeated samples were unlikely to all be identical. Finally, some students’ lists were reasonable in that they were distributed in a more normative way around 5, such as 3, 7, 5, 6, 5, centered around the expected value within a reasonable range.

There is some evidence that when students are given a chance to actually draw their own samples from a lollie type of mixture, a higher percentage of them will subsequently give reasonable predictions for the Lollie task. Shaughnessy et al. (1999) reported an increase from 17% to 55% in reasonable responses when a sample of 94 middle school students actually conducted a simulation of the lollie problem with colored cubes in a box. On the other hand, in a study using student interviews, Kelly and Watson (2002) did not find such changes in students’ predictions, though they admitted there may have been too few trials in their experiment to change students’ predictions. The tenacity of beliefs and intuitions about probabilistic and statistical phenomena has been well documented over the years (Kahneman & Tversky, 1972, 1973a, 1973b; Tversky & Kahneman, 1974, 1983; Shaughnessy & Dick, 1991; Shaughnessy & Bergman, 1993). Beliefs and conceptions about data and chance are very difficult to change, and research has suggested that empirical experiments and simulations must be systematically built into instruction over a longer period of time in order to change the patterns of students’ intuitive conceptions (Shaughnessy, 1977; Fischbein & Schnarch, 1997).

The emerging conceptual model introduced previously in this chapter has been used by several groups of researchers to describe the progression of student reasoning about variation on the Lollie tasks, from ikonic, to additive, to proportional, and finally to distributional. According to Kelly and Watson (2002), some students, particularly younger students, reason “ikonically,” by using physical circumstances or personal stories when they predict sample proportions for the lollie task. Students who are reasoning ikonically may say, “They might get more reds because their hand could find them,” or “Maybe they are lucky and will get all reds,” without any reference to the actual contents or the proportion of colors in the bowl. In a sample of 272 students in Grades 6–12, Shaughnessy et al. (2004a) classified students’ reasoning on the Lollie task as predominantly additive, proportional, or distributional. Additive reasoners focus on frequencies, rather than on relative frequencies. Proportional reasoners tend to predict “around 5” for the Lollie problem and defend it with statements such as, “There are 50 red,” or “I’d expect 5 red out of the 10 candies.” Proportional reasoners explicitly discuss the connections between sample proportions and population proportions. Distributional reasoners combine both centers and spreads in their reasoning about the Lollie problem. They make comparisons between the sample proportions and the population proportion, and they also explicitly mention variation about the expected value. Shaughnessy et al. also discussed transitional phases in students’ thinking that occur between proportional reasoning and distributional reasoning. Attention to just one of the aspects of a distribution (e.g., center, shape, or spread) does not necessarily guarantee attention to the other aspects. It takes some time for students to be able to integrate these various aspects of a distribution.

Variability: From Samples to Sampling Distributions

Distributional reasoning involves making connections from populations to samples, and back again. In order to adequately reason about samples pulled from populations, students must have a strong concept of population proportion, what Kahneman and Tversky (1972) have called the base rate. The propensity for subjects to ignore the base rate was so prevalent across Kahneman and Tversky’s research that recognition of population proportions is clearly an important idea for instructors to emphasize when they first introduce sampling to students. The complexity of the levels of statistical objects involved in sampling, and the proportional nature of the relations among those objects, is highlighted in work by Saldanha and Thompson (2003). They designed a teaching experiment to develop secondary students’ understandings of the concept of sampling distribution. In the experiment the students wrestled with a multitude of statistical objects: individual data values; collections of data values (samples); statistics for a sample (like the mean of a sample); and finally, collections of statistics for many samples (for example, a distribution of sample means). The concept of population proportion was critical to Saldanha and Thompson’s teaching experiment, as was the relationship between sample proportion and population proportion. Saldanha and Thompson emphasized the differences among data, samples, and populations with the students in their teaching experiment. However, they expressed some frustration that even with this explicit emphasis, their students had great difficulty conceiving of and distinguishing among the various levels of statistical objects while working on sampling tasks. Their students “did not have a sense of variability that extended to ideas of distribution” (Saldanha & Thompson, 2003, p. 264).

Saldanha and Thompson’s students interacted primarily with a computer environment, generating samples and sampling distributions. One wonders if Saldanha and Thompson’s students would have had
more success if they had spent more time experiencing a hands-on approach, doing simulations with objects to generate sampling distributions, rather than in the computer environment. Studies that have involved substantial hands-on simulation activity have shown some success in influencing student thinking in stochastic situations (Shaughnessy, 1977; Shaughnessy et al., 1999). In a second teaching experiment Saldanha (2003) found that the addition of physical, hands-on sampling provided better support for student understanding of sampling distributions than the computer environment alone.

Rubin, Bruce, and Tenney (1991) pointed out that people’s understanding of what a sample tells us ranges along a spectrum from “knowing everything” to “knowing nothing.” Knowing everything is the result of overconfidence in how well the sample represents the population from which it is drawn, reminiscent of Kahneman and Tversky’s (1972) representativeness heuristic. At the other extreme, knowing nothing reflects a belief that a sample is just chance, because absolutely anything can happen in a sample. This other extreme of Rubin’s spectrum reflects reasoning similar to Konold’s (1989) outcome approach. Another way to think about these two extremes is that at one end people are too fixated on centers (they feel all samples should be perfectly representative) and at the other end they are too fixated on variability, and all the possibilities for individual outcomes. Students need to have a balance between representativeness and variability, between expectation and variation (Watson & Kelly, 2004).

Rubin et al. created several sampling scenarios to investigate how students might attempt to balance sample representativeness with sample variability. For example, in one scenario students were told that packets containing six Gummy Bears were handed out to all the children attending a parade. The packets were assembled from a huge vat of 1 million red and 2 million green Bears that were all mixed up. Then the students were asked how many children out of 100 they thought got packets with exactly 4 green and 2 red Bears. Half the students said over 75% would get packets with 4 greens. All but one student said that over 50% of the children would get 4 greens. The wording of this problem focused the students’ attention on the population proportion of two-thirds green, and likely tipped the students’ thinking towards the representativeness end of Rubin’s spectrum. However, when a different version of the problem was given to the students, asking “out of 100 packets, how many children would get 0, 1, 2, . . . , 6 green Bears,” they tended to spread their predictions out across all the possible outcomes, tipping their responses toward the variability end of Rubin’s spectrum. Rubin et al. found that students had little or no understanding of the interaction between representativeness and variability on these tasks, and that students had very little intuition for the shape of the sampling distribution for the Gummy Bears packets.

Other researchers have found results similar to Rubin et al. Shaughnessy et al. (2004a) gave a lollie-type question to secondary students that asked them to predict a sampling distribution for repeated samples drawn from a lollie jar. They found that many students focused heavily on aspects of variability, predicting overly wide ranges for the outcomes in a sampling distribution, so that too many extreme results would occur. On the other hand, Saldanha and Thompson (2003) found that students’ concept of sampling did not entail much of a sense of variability at all. Their secondary students tended to judge samples purely on representativeness, perhaps relying too heavily on the underlying population proportion.

This series of studies (Rubin et al., 1991; Saldanha & Thompson, 2003; Shaughnessy et al., 2004a; Watson & Kelly, 2004) clearly shows that there is always a tension between representativeness and variability in sampling situations. Furthermore, the framing of the task itself can tip students’ thinking in the direction of either an overreliance on the population proportion, or an overreliance on variability. The goal for teachers and curriculum developers is to help students to grow a middle ground between representativeness and variability, between knowing all and knowing nothing, to “knowing something,” so that both population proportion and variability are taken into account together. This balance is necessary so that students understand that one can actually know something with a reasonable likelihood in sampling situations. One approach to sampling that may promote a balance for students between expectation and variation involves student-generated confidence intervals (Landwehr, Watkins, & Swift, 1987).

Variability across Distributions: From Informal to Formal Inference

Some research has focused on student thinking about variability across distributions, using tasks in which students are asked to compare several distributions or to make decisions based on a collection of distributions. Gal, Rothschild and Wagner (1989, 1990) found that middle grade students normally did not use the mean when they were asked to compare two data sets. Instead Gal et al. (1989) found that students used statistical strategies, quantitative data summaries, or proto-statistical strategies that focused on incomplete features of the data sets. Some students invented their own stories to compare data sets. Gal et al. (1989) were among the first researchers to point out the essential role that proportional reasoning can play when
students are asked to compare two data sets. Konold, Pollatsek, Well, and Gagnon (1997) reported similar results with some upper secondary students, who also neglected to use the mean as a comparative measure. Their students often focused on particular data points or individual features of the data rather than on making global comparisons of the distributions. Evidently the power of the mean as a representative measure for making comparisons of data sets is not a primary intuition (Fischbein, 1987), and must be carefully developed within a teaching-learning setting.

In an 8-week design experiment with a fourth grade class, Petrosino, Lehrer, & Schauble (2003) introduced distributions of measurement data as objects for comparison and decision-making. Students gathered data on the heights of model rockets, some with pointed noses and others with rounded noses, and then were asked to decide which was the best design. The students found the median height for both types of rockets, but this did not entirely satisfy them as a comparison basis, because the data for one type of rocket were more inconsistent than for the other. They decided to find a way to measure the inconsistency, that is, the variability. This class devised a way to compare the variation for the two rockets by computing differences from rocket height to median rocket height for each type of rocket. Then they created two new distributions from the original distributions, the distributions of height differences from the medians. These fourth graders were thus comparing distributions of residuals, a rather sophisticated concept, but one that arose naturally in this particular design experiment.

The study by Petrosino et al. provides a glimpse into classroom dynamics and into the social-conceptual growth of a class of students who attempted to use both centers and variability in their decision making process. It is interesting that the students in the Petrosino et al. study looked for a signal from the variation when they compared the distributions, rather than a signal from the centers. They felt that the center values for the two types of rockets could be misleading, as one rocket design was more inconsistent than the other in the heights it attained. In this case, it was not just the mean or the median that became the signal to inform a decision (see Konold & Pollatsek, 2002). The variability was the primary signal that aided the students’ decision. Petrosino et al. also reported that their fourth graders performed very well on the 1996 NAEP Grade 4 statistics tasks that were administered to the class after the 8-week teaching experiment. This suggests that there may be long-term retention benefits when students invent their own approaches to data analysis and the comparison of data sets.

In a series of articles, Watson reported on Grade 3 to 9 students’ types of strategies when comparing pairs of data sets (Watson and Moritz, 1999; Watson, 2001a, 2001b). A version of the data sets used for these studies is presented in Figure 21.6.

In interview settings, Watson and Moritz presented students with the pairs of the data sets from Figure 21.6 (e.g., Yellow-Brown, Pink-Black) and told them that the graphs showed the results of students’ test scores in two different classes. They then asked students which class did better in each pair, and why. In the first study, Watson and Moritz (1999) report that students’ strategies fell into two SOLO cycles. The first cycle dealt with equal sized data sets (e.g., Yellow-Brown). Student explanations ranged from a comparison of individual data values, to calculating the total class values (adding up all the values in the data set), to visual comparison strategies (“Yellow goes higher”), and to combinations of these explanations. In the second SOLO cycle, data sets of different sizes were compared (e.g., the Pink-Black data sets in Figure 21.6) and proportional reasoning played a more prominent role. Students’ reasoning strategies again included totaling up the scores and visual strategies, but students also calculated the means to compare the two groups. Watson and Moritz thus identified both additive and proportional reasoning strategies among their students. They concluded that students use “bottom line” strategies, such as attending to certain features of the graphs or reasoning about modal clumps, when comparing data sets. Watson and Moritz argued that these informal reasoning strategies can provide opportunities for teachers to introduce the comparison of two data sets prior to calculating means. Bakker and Gravemeijer (2004) came to the same conclusion: Students can, and should, compare data sets using their own intuitive strategies prior to formal statistics. In a follow-up study conducted 3 years later with some of the same students, Watson (2001a) found considerable improvement in the level and complexity of students’ responses on these tasks.

In a third study Watson (2002) used an interesting methodology to set up cognitive conflict among students and to challenge their thinking. After they had responded to the comparing-data-sets tasks in Figure 21.6, Watson showed students video clips of other students who had compared the data sets using a different strategy. For example, if a student reasoned that, “The Pink Class did better, because they have more students who scored higher,” Watson might show this student a clip of a student who computed the means of the Pink and Black class and claimed that Black was better. Or, if a student said that “Yellow did better because they got high numbers” (referring to the height of the 5 column in the yellow data set) she might show a clip of a student who said that there was no difference, because both classes totaled 45, or there was no
Figure 21.6 Paired data set comparisons. (From Watson & Shaughnessy, 2004. Proportional reasoning: Lessons from
research in data and chance. Mathematics Teaching in the Middle School, p. 105)
difference because both classes had an average of 15, or that the Brown class did better because it had a 7.

Of those students whose reasoning could improve the second time around (that is, students who didn’t reason at the top level the first time), 57% improved on the Yellow–Brown comparison, but only 30% improved on the Black–Pink comparison, after being shown a clip of another student. Students need to reason proportionally to get the Pink–Black comparison correct. Students who only reason additively will usually pick Pink (because it has higher totals), and it is difficult to change their minds if they are not proportional reasoners. Some students reasoned proportionally on this problem without actually calculating the means. For example, they said, “a higher percentage of the Black class got higher scores.” This sort of reasoning is proportional, because students are comparing relative frequencies, not just absolute frequencies. Watson (2002) noted that the improvement rate after the cognitive conflict was nearly identical to the improvement rate over 4 years in the longitudinal study (Watson, 2001a). Thus cognitive conflict might help accelerate students in their ability to analyze and compare data sets. A question that comes to mind, one acknowledged by Watson, is how stable are the students’ responses after the cognitive conflict? What would they have said if asked again 3 or 4 years later?

Shaughnessy (2003b) has used three of these comparing-groups tasks from Figure 21.6 in a survey form with middle and secondary school students. Percentages of types of reasoning, including additive, proportional, and distributional reasoning, on the Brown–Yellow and Pink–Black tasks are shown in Tables 21.1 and 21.2. Although there was some growth across grade levels on the Pink–Black task, comparing unequal sized groups remained a challenge for many students. Nearly three-fourths of the students in the survey said that the Pink class did better. More work with students at comparing unequal sized groups is clearly an important target for teaching and assessment.

Most of the research in this section has dealt with students’ informal inference strategies as they compare distributions. Pfannkuch (2005) reviewed the literature on probability and statistical inference, and suggested some approaches to help teachers move from informal inference towards the use of more formal inference tools, such as using simulations in a resampling approach to statistical inference with students as early as Grade 12. However, Lipson (2002) found that software capabilities disrupted the transition between empirical and theoretical approaches to inference. Also, delMas, Garfield, and Chance (1999) found no substantial evidence that simulations improved their students’ conceptual understanding of sampling distributions. Despite these discouraging results Pfannkuch remains optimistic that resampling techniques and simulations can provide a bridge to help students transition from informal inference strategies to a more formal (normative) approach to inference. The concepts surrounding
Table 21.1 Frequencies (%) of Responses for Codes for Yellow–Brown Group
Comparisons (Survey Results from NSF Grant No. REC-0207842)
Codes: 0—No response, misread, unclear reasoning “they look the same”
1—Using Sums; individual graph characteristics e.g., “there’s more 5’s in yellow,” or “there’s a 7 in brown”
2—Using Average (if computed)
3—Variation explanation; spread or tightness of distributions
4—Sophisticated use of Average and Variation in combination
Table 21.2 Frequencies (%) of Responses for Codes for Pink–Black Group
Comparisons (Survey Results from NSF Grant No. REC-0207842)
Codes: 0—No response, or “Pink because of more scores or higher total”; or, black or equal with bad reasoning
1—Black, use of graph characteristics
2—Black, use of averages
3—Black, use of characteristics of the distribution, i.e., “higher proportion of scores are . . .”
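
The contrast between additive and proportional reasoning that these codes capture can be made concrete with a small sketch. The two score lists below are invented for illustration (the actual Pink and Black data sets from Figure 21.6 are not reproduced here), and the function name `compare_groups` is likewise hypothetical. With groups of unequal size, comparing totals and comparing means can point to different classes.

```python
from statistics import mean

def compare_groups(a, b):
    """Say which group 'did better' by totals (additive) and by means (proportional)."""
    def winner(x, y):
        if x > y:
            return "A"
        if y > x:
            return "B"
        return "tie"
    return {
        "by_total": winner(sum(a), sum(b)),
        "by_mean": winner(mean(a), mean(b)),
    }

# Hypothetical test scores, NOT the data from Figure 21.6.
larger_class = [2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7]   # group A: 12 students, total 54
smaller_class = [4, 5, 5, 6, 6, 7, 7, 8]              # group B: 8 students, total 48

result = compare_groups(larger_class, smaller_class)
print(result)  # {'by_total': 'A', 'by_mean': 'B'}
```

In this invented case the larger class wins on total points simply because it has more students, while the smaller class has the higher mean (6.0 against 4.5), which is the kind of reversal that a purely additive strategy cannot detect.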
statistical inference are very complex, and the transition for students to formal inference is likely to be in process for several years. There is no quick fix to understanding these concepts, any more than there is a quick fix to understanding the formal concept of a limit in mathematics.

Variability and Random Outcomes in Probability Experiments

Probability is the focal point of another chapter in this volume (Jones, Langrall, & Mooney, this volume), but there are important connections between probability and statistics, particularly when repeated trials of probability experiments generate a distribution of possible outcomes.

Truran (1994) wanted to push students to investigate how far experimental probabilities would have to deviate from expectations before students would consider revising their predictions. In a sampling-with-replacement setting, Truran asked 32 students in Years 4, 6, 8, and 10 what they would expect in samples of size 9 and samples of size 50 drawn from an urn with 2 green balls and 1 blue ball. Extremes were found to be surprising to most of these students, and some of them thought that their results would tend toward 50–50 if they “did it enough times.” Several students offered personal confidence intervals from 4–8 greens for the small sample, and from 27–40 greens for the large sample. Many of Truran’s students gave responses that involved personal theories about the balls or the urn that did not relate to the actual mixture proportions.

The probability task in Figure 21.7 is an extended constructed response item that was given to twelfth graders on the 1996 NAEP. Zawojewski and Shaughnessy (2000) discovered some startling responses to this task.

The two fair spinners above are part of a carnival game. A player wins a prize only when both arrows land on black after each spinner has been spun once.
James thinks he has a 50–50 chance of winning. Do you agree?
A Yes B No
Justify your answer.

Figure 21.7 The 1996 NAEP Spinner Task (From Zawojewski & Shaughnessy, 2000, p. 263)

Nearly 50% of the 1270 Grade 12 students tested agreed with James, and said that there was a 50–50 chance that both spinners would land black. In a convenience sample of responses (N = 306) the reason given by a third of the students in support of a 50–50 chance of both spinners landing black was that since the spinners each were half white and half black there was a 50–50 chance that both of them would land on black. Only 8% of the students in the NAEP sample claimed that the probability was 1/4 and gave an adequate justification for their answer.

Responses to this item speak volumes about the state of probability instruction in the United States. Simple compound probability problems like this spinner problem are mentioned among the important probability concepts in the Principles and Standards for School Mathematics (NCTM, 2000). This spinner problem is not a particularly difficult compound probability problem. In fact, isomorphic versions of this problem appear in many middle school mathematics programs. Shaughnessy and Zawojewski (1999) outlined ways that teachers could use this spinner task, and similar ones, as teaching and assessment tools to explore students’ understanding of probability.

Shaughnessy and Ciancetta (2002) found somewhat more promising results for the spinner task when they surveyed a sample of 652 students in Grades 6–12 and had a subset of students gather actual data. Correct responses initially ranged from 20% in Grades 6–7 to near 90% in classes of students who were studying more advanced mathematics such as precalculus, AP calculus, or AP statistics. Follow-up interviews with 28 of the students from Grades 8–12 were conducted in a setting where students first predicted the results for 10 repeated trials of the spinner task and then gathered their own spinner data for 10 actual trials. After conducting their own experimental trials, only 5 students persisted in believing that the chance of both spinners landing black was 50%. As the students experienced the variability in results across several sets of 10 trials and witnessed the actual oscillation in the number of times both spinners landed black, most students realized that the true expectation of winning the spinner game was not 50%, but something much smaller.

Watson and Kelly (2004) assigned SOLO levels to student responses on the 50–50 spinners task. Typical responses in the main three SOLO levels were, “it’s 50–50 because it’s half white” (uni-structural); “the chance that both land black is less than 50% because only one of the three outcomes has both black” (multi-structural); and “there are four outcomes, BB, BW, WB, and WW, so the chance is 25% both land black” (relational). Zawojewski and Shaughnessy (2000) found that these same three response categories accounted for most of the student responses to this task on the 1996 NAEP. Watson and Kelly concluded that the concept of independence of the two spinners was not intuitive for their students, and it was especially difficult for the middle school students in their study. This points to a major challenge for teaching and curriculum development in statistics: how do we help students get a better intuitive feel for what a reasonable spread is in experimental probability situations?

Students’ recognition of the appropriate variability across the outcomes of a probability experiment is often very dependent on the task. Shaughnessy, Ciancetta, and Canada (2003) compared 84 middle school students’ predictions about variability on repeated trials across three task environments: the Lollie task, a task that involved repeated sets of 20 tosses of a regular six-sided die, and a single spinner task. The percentages of students’ responses that fell within “reasonable” variation guidelines established for these three tasks were 70% for the lollies, 30% for the die, and 49% for the spinner task. These results suggest that in the mental tug of war that students face between variability (spread) and expectation (centers), the Lollie problem pulls students more in the direction of acknowledging variability, whereas the die problem pulls students more in the direction of predicting expectation. The spinner problem split the students down the middle into about 50% reasonable and 50% unreasonable predictions for variability across repeated sets of trials. There seems to be an influence from probability on these types of repeated-trials tasks, especially in the die problem but also somewhat in the spinner problem. Probability instruction might interfere with student thinking about variability, since students may tend to predict “what should happen” theoretically. In order for students to reason distributionally, in order for them to grow beyond a mere focus on expectation, they must develop their intuition for a reasonable amount of variation around an expected value, not just the expected value itself.

of a nearly perfect relationship between an increase in heart deaths and an increase in motor vehicle use as described in a newspaper article. They also asked students what questions they would raise about such a claim. Using their three-tiered framework for analyzing statistical claims, Watson and Moritz found some students who were reasoning at the third tier because they questioned whether there really was a cause-and-effect relationship between the incidence of heart deaths and motor vehicle use. However, most of their students just accepted the claim about an association between driving a car and having a heart attack without questioning it. Students are often too willing to accept anything that they read in print, and are probably inexperienced in critiquing statistical claims about relationships between variables.

Moritz (2000, 2004) provided an overview and analysis of the research on students’ reasoning about association and covariation, and presented bivariate tasks to students in Grades 3 to 9 that involved both positive and negative association. In one task, Moritz gave students the description and graph in Figure 21.8 that portrays a negative association between the number of people in a classroom and the noise level in that classroom. Then, he asked students to explain the graph to someone who could not actually see it.

Some students were doing a project on noise. They visited 6 different classrooms. They measured the level of noise in the class with a sound meter. They counted the number of people in the class. They used the numbers to draw this graph.
Moritz identified four levels in responses to this task: nonstatistical, single aspect, inadequate covariation, and appropriate covariation. Nonstatistical responses did not address anything about covariation, for example “It’s a graph of noise of students.” Single-aspect responders tended to pick out a few data points, perhaps extremes or outliers, and talk about them, but they used only a very few data points. If students only talked about one of the two variables, or if they compartmentalized their responses on the two variables and did not talk about how the two were related to one another, their responses were coded as inadequate covariation. Finally, responses that related the changes in the two variables to one another were coded as appropriate covariation.

Moritz (2004) reflected on approaches that might help students to transition through several stages of statistical reasoning, from single data points to the variation within individual variables, to a consideration of how both variables change simultaneously. He agreed with Nemirovsky (1996), who in his work in algebraic thinking suggested that covariation might best be introduced to students by using time as the independent variable. Shaughnessy (2003b) also used graphs over time such as the one in Figure 21.9 in interviews and in classroom teaching episodes with middle and secondary level students to investigate whether students would look for potential causes of variability in food consumption over time.

Most students did make some conjectures about the variability over time in such per capita food consumption graphs (e.g., per capita milk, coffee, soft drink, and bottled water consumption). Students will even make something up if they cannot provide a solid contextual explanation for the humps and dips in these food consumption graphs over time. Students have said, “maybe it was inflation,” or “maybe the economy was bad.” They rarely attribute the big jumps or big dips in the food consumption graphs to random variation. Rather, students tend to look for what Wild and Pfannkuch call “special cause” variation. Shaughnessy (2003b) found students who attributed the swings in these food consumption graphs to the baby boom, to improved production and distribution of food, to the Depression, to World War II, to the war in Vietnam, and finally, if all else failed, some students claimed, “It must have been the hippies.” Students do try to make contextual conjectures for why such graphs vary. Nemirovsky (1996) had a point: covariation might be best introduced with time as one of the variables, because students are interested in trends over time, and this type of data connects naturally to topics that are of interest to them.
[Figure 21.9. Graph over time: Fish_and_Shellfish (vertical axis, 10 to 15) by Year (horizontal axis, 1910 to 2000).]
Whereas Moritz looked at younger students’ understanding of covariation, Batanero, Estepa, Godino, & Green (1996) researched beginning tertiary students’ reasoning about association of variables in 2 × 2 and 3 × 3 contingency tables. Among their findings was the tendency for tertiary students to make decisions about association between two variables on the basis of the frequency in only one cell of a contingency table. When reasoning with contingency tables the critical information is usually in the row and column proportions rather than in the cell frequencies. For example, the data in Table 21.3 suggest that there may be a relationship between travel class and death rates on the Titanic. The death rate was 38%, 64%, and 76%, respectively, for first, second, and third class travel accommodations (Takis, 1999). The proportion of those who died in each class level is the critical issue, not the absolute frequencies.

Table 21.3 Deaths by Travel Class on the Titanic (From Takis, 1999, p. 483)

Travel Class   Died   Survived   Total
First           122        197     319
Second          167         94     261
Third           476        151     627
Total           765        442    1207

Batanero et al. (1996) found that many students had correct or partially correct strategies when reasoning about association of variables in contingency tables, but they also found three main incorrect reasoning patterns. Some students expected a perfect correspondence between variables, that there should be no exceptions in the data. Batanero et al. called this a deterministic approach to association. Other students thought that association could only be in the positive direction and ignored situations in which variables were inversely related, and this they called the uni-directional misconception. In a third incorrect strategy, called the localist approach, students considered only part of the information, such as just one cell, or perhaps just the main diagonal. Batanero et al. concluded that their tertiary students lacked the necessary proportional reasoning skills to reason about contingency tables.

In another study with senior secondary students, Estepa, Batanero, & Sanchez (1999) explored students’ ability to make associations between two variables when data sets were presented in two-way tables. Some students used statistical approaches, such as means, totals, percentages, or attempted to compare the whole distributions. Other students used what Estepa et al. called deterministic strategies, like comparing lowest and highest values, comparing ranges, looking at coincidences, or arguing from their own personal beliefs. Gal (1998) discussed a variety of levels of questions to assess students’ understanding of data in two-way tables. Gal noted that percentages are needed to back up opinions or to defend claims that are made about data in two-way tables, because individual cell frequencies are inadequate. Gal encouraged teachers to pose more open-ended, less directive types of questions about two-way tables to promote a higher level of statistical reasoning by students.

Types of Conceptions of Variability

One of the strengths of the research on average noted above has been the spectrum of conceptions of average that researchers have identified, both in investigations of students’ thinking and in theoretical conceptual analyses. For example, Mokros and Russell (1995) pointed to average as middle, average as most, average as balance, and average as reasonable. Konold and Pollatsek (2002) mentioned mean as typical, mean as fair share, mean as data reducer, and mean as signal. In a similar way, research efforts on students’ understanding of variability are beginning to identify a variety of conceptions of variability, in part because of the variety of statistical objects that can vary, such as data, samples, and distributions. The types of student conceptions of variability identified by recent research include the following:

1. Variability in particular values, including extremes or outliers. In this conception of variability, students focus their attention on particular data values as pointers, often on very large or very small values, or very strange individual values in a graph or in a data set (Konold and Pollatsek, 2002; Konold, Higgins, Russell, & Khalil, unpublished manuscript).

2. Variability as change over time. As discussed, this conception of variability may be a good starting point to introduce covariation (Nemirovsky, 1996; Moritz, 2004).

3. Variability as whole range, the spread of all possible values. This conception of variability involves the spread of an entire data set or distribution and is closely related to the concept of sample space in probability, the set of all the possible outcomes. In this conception students have begun to move away from seeing data only as individual values that
vary, to recognizing that entire samples of data can also vary (Shaughnessy et al., 1999).

4. Variability as the likely range of a sample. This conception of variability arises in tasks like the Lollie problem (Reading & Shaughnessy, 2004) or in repeated trials of probability experiments (Shaughnessy et al., 2003). It can lead to statistical tools for representing variability within or across samples, such as box plots or frequency distributions. This conception of variability requires the concept of relative frequency and thus relies on proportional reasoning. It can also lead to the concept of a sampling distribution when applied to the likely range of a distribution of means, or distributions of other sample statistics (Saldanha & Thompson, 2003).

5. Variability as distance or difference from some fixed point. This concept involves an actual or a visual measurement, either from an endpoint value (as in a geometric distribution) or from some measure of center (usually the mean or median). Here students are predominantly concerned with the variability of one data point at a time from a center rather than the variability of an entire distribution of data from a center (Moritz, 2004).

6. Variability as the sum of residuals. This is a measure of the collective amount a distribution is “off” from some fixed value and provides a measure of the total variability of an entire distribution of data. This is the notion of variability that arose in the design experiment of the Grade 4 class investigated by Petrosino et al. (2003). It provides the foundation for such concepts as standard deviation and regression analysis.

7. Variation as covariation or association. This conception of variability involves the interaction of several variables, and how changes in one may correspond to (though not necessarily cause) changes in another. Covariation raises issues about the strength of relationships among variables and poses challenges to parsing out what part of variation is due to chance, and what part may actually be due to cause and effect (Batanero et al., 1996; Moritz, 2004).

8. Variation as distribution. Distributions themselves can vary. When the variation between or among a set of distributions is compared, the specter of statistical significance arises. Theoretical probability distributions emerge to help make decisions about distributions of data, or about sampling distributions (Bakker and Gravemeijer, 2004; Shaughnessy, Ciancetta, Best, & Noll, 2005).

Student Thinking About Information Obtained from Samples and Surveys

In her work with Grades 4 and 5 students, Jacobs (1997, 1999) found that children’s evaluation of survey methods fell into four main categories: potential for bias, fairness, practical issues, and results. Some students noted the potential for bias in certain survey methods, but others were more concerned with fairness issues. For many students a survey is not fair unless it has representation from all possible subgroups in the survey population. For example, a school survey would have to have a boy and a girl from each class in the school, in order to be “fair.” On the surface, this looks like a sophisticated stratified sampling scheme, but in practice those who favor the fair-sample approach would reject any form of randomization. Jacobs also found that students who were concerned with results of surveys disagreed with them if they did not match their own preconceived notions of what should happen. Students also were concerned about how decisive the results of a survey really were. In their minds it was supposed to “be right.”

Several of Jacobs’s tasks asked students to evaluate three different survey techniques that she classified as restricted, self-selection, and random. In one task, Jacobs presented students with six different survey scenarios on whether students in a school were interested in conducting a raffle to raise money. Watson et al. (2003) used this task in Grades 3 to 9 as part of their study to measure students’ understanding of sources of variation. Shaughnessy (2003b) used a version of Jacobs’s school survey task with secondary school students to see if they could identify important aspects of sampling. The task is presented in Figure 21.10, accompanied by the results from a survey of students in Grades 7–12 in Table 21.4.

Responses to Part 1 of the task were scored 0 to 4, based on the sampling methods described by the students and whether their methods did or did not include explicit reference to (a) sampling, (b) appropriate size (30 to 120 was rated appropriate), (c) stratification, and (d) randomness. Only about a third of the students scored a 3 or 4 on this task, indicating that they had a statistically appropriate sampling plan. Many of the students wanted to survey most, if not all, of the students in the entire school.

The percentage of students who rated each of the three different sampling methods (second part of the
Part 1. A class wanted to raise money for their school trip to Disney World. They could raise money by selling raffle
tickets for a Nintendo Game system. But before they decided to have a raffle they wanted to estimate how many
students in their whole school would buy a ticket.
So they decided to do a survey to find out first. The school has 600 students in grades 7–12 with 100 students in
each grade.
How many students would you survey and how would you choose them? Explain why?
Part 2. Three students in the school suggested different methods to survey the students in the school about buying
the raffle tickets.
a) Shannon got the names of all 600 children in the school and put them in a hat, and then pulled out 60 of them.
What do you think of Shannon’s survey?
b) Raffi surveyed 60 of his friends. What do you think of Raffi’s survey?
c) Claire set up a booth outside of the cafeteria. Anyone who wanted to stop and fill out a survey could. She
stopped collecting surveys when she got 60 kids to complete them. What do you think of Claire’s survey?
(After each of these sampling methods, students were asked to rate the method, and to give a reason for their rating).
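The sampling ideas in this task (a names-in-a-hat random draw such as Shannon’s, and a stratified draw by grade) can be made concrete with a short sketch. The Python code below is only an illustration built on the numbers stated in the task (600 students, Grades 7–12, 100 per grade, samples of 60). The roster, the student identifiers, the function names, and the seed are hypothetical and are not part of the original study.

```python
import random

GRADES = range(7, 13)        # Grades 7-12, as stated in the task
STUDENTS_PER_GRADE = 100     # 100 students in each grade, as stated in the task


def build_roster():
    """Return a hypothetical roster of (student_id, grade) pairs for all 600 students."""
    return [(f"G{g}-{i:03d}", g) for g in GRADES for i in range(STUDENTS_PER_GRADE)]


def simple_random_sample(roster, n, rng):
    """Names-in-a-hat sampling: draw n students without replacement from the whole school."""
    return rng.sample(roster, n)


def stratified_sample(roster, per_grade, rng):
    """Draw the same number of students at random, without replacement, from each grade."""
    sample = []
    for g in GRADES:
        stratum = [student for student in roster if student[1] == g]
        sample.extend(rng.sample(stratum, per_grade))
    return sample


rng = random.Random(2007)    # arbitrary fixed seed so the sketch is repeatable
roster = build_roster()
srs = simple_random_sample(roster, 60, rng)   # Shannon's approach
strat = stratified_sample(roster, 10, rng)    # 10 students from each of the 6 grades
```

In both plans every student has the same chance of being chosen (60 in 600). The stratified plan additionally guarantees exactly 10 students from each grade, whereas a simple random sample balances the grades only on average. Neither Raffi’s nor Claire’s method can be written this way, because selection there depends on friendship or on who chooses to stop at the booth.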
Table 21.4 Percentages Favoring Three Sampling Methods (Survey Results from NSF Grant No. REC-0207842)

Rating/Method   Shannon (Random)   Raffi (Friends)   Claire (Cafeteria)
Good            24a                37                59
Not sure        9                  10                29
Bad             67                 53a               12a
Best Method     31a                5                 64
a Indicates the expected response

task) as good, not sure, or bad, is listed in Table 21.4. Shaughnessy’s sample of students heavily favored the self-selection approach outside the cafeteria in this task. This is consistent with results found by Jacobs (1997) and Watson and Moritz (2000a). A third of these students felt that asking friends (Raffi’s way) was a good way to get a sense of the opinion in the school. Their reasons indicated that they wanted to predetermine the survey results and this was a way to make it happen. For teachers of statistics who promote the importance of random sampling this is a troubling result. So much for democracy!

Another striking result was that only 12% of the students recognized the potential for bias in the self-selection method outside the cafeteria. For example, some groups of students might not eat lunch in the cafeteria. Only about a third of the students surveyed preferred the random sampling approach (Shannon’s way), whereas over half felt that Claire’s self-selection survey approach was the best, because “everyone has the same chance this way.” The fairness criterion for sampling that Jacobs (1997) found so prevalent among younger students is still quite robust among older students.

Watson and Moritz reported on a series of studies to investigate whether students knew what a sample was, whether they would be sensitive to sample size, and whether they would recognize the possibility for bias in real sampling situations from the media (Watson & Moritz, 2000a, 2000b; Watson, 2004). In one longitudinal study they examined students over a 3- to 4-year period, administering the same four tasks three times to students who were in Grades 3 through 11. Students were asked the same questions 2 years later and again 4 years later to see if they had grown in their understanding of a sample as representative of the population from which it was drawn (Watson & Moritz, 2000b). The first question asked was “If you were given a sample, what would you have?” Some students gave examples of samples (samples of food, a blood sample, a free sample in the mail) when asked what they would have if they had a sample. Others responded “a little bit,” or “a small
portion.” Still others mentioned a “bit of something” or “a part of a group.” Overall, students’ thinking ranged from personal examples, to the notion of a piece, to the idea that a sample should be a representative piece of something larger.

The second question asked students whether they would put more faith in a friend’s recommendation for a car purchase or in the recommendation of Consumer Reports magazine, or if it did not matter one way or the other. Watson and Moritz’s other two questions were based on articles from a newspaper. One article claimed that over 90% of those who phoned in on a survey were in favor of legalizing marijuana, and another article generalized a claim “that 6 of every 10 students from a sample in Chicago could easily bring a handgun to school” to all of the United States. Thus all three of these contexts had the clear potential for bias.

Watson and Moritz describe a progression in thinking about samples in which students (a) do not distinguish between sample and population, then (b) recognize the difference between sample and population but really want to sample everyone, and finally (c) realize that samples can be used to represent the population and to estimate population parameters. Watson and Moritz found considerable growth in their longitudinal study: 50% of the students improved on the four questions after 2 years, and 75% had improved from their initial responses after 4 years. The cumulative school experience and outside world experience seem to have improved these students’ understanding of the concept of a sample.

In another study Watson and Moritz (2000a) interviewed 62 students who had completed the four survey questions in their longitudinal study. In the interviews Watson and Moritz included some more general questions on sampling and sample size, such as a “hospital-type” problem, in order to investigate whether students would be sensitive to issues of sample size and variability. (The prototype of this task involved the gender percentages of babies born in large or small hospitals; see, for example, Kahneman and Tversky, 1972; Shaughnessy, 1977.) The hospital-type task went as follows: A sample of 50 students was taken from a large urban school and a sample of 20 students was taken from a small rural school. One of these samples was strange in that it had 80% boys. Which do you think is more likely? (a) The sample is from the small school, (b) it is from the large school, or (c) it could be from the large school or the small school? As part of this study, Watson and Moritz (2000a) identified six different levels of understanding sampling, using selection criteria and sample size as coding variables. They determined these levels by applying both the SOLO model and their three-tiered model of statistical literacy: knowing terms, applying terms in context, and asking questions and critiquing the media (Watson, 1997). Their six levels of understanding samples ranged from small samplers with no sampling method to large samplers using random sampling methods that were sensitive to bias. Small samplers were those students who were content to ask just a few people, or to listen to their friend rather than Consumer Reports about the car purchase question. Large samplers recognized the increased power of information when it represented a larger fraction of the population. Students’ own preferred sampling methods ranged from pre-selected or distributed criteria so that “things were fair,” as in the Jacobs (1997) study, to the random samples that large samplers pulled in order to control for bias.

Watson and Moritz were troubled that so many students failed to recognize the bias in the handgun newspaper article that made a claim about the entire United States on the basis of a survey of just one high school in Chicago. Some students did question the validity of the claims of the handgun newspaper article, exhibiting Tier 3 reasoning on the literacy scale, but such students were rare. From 55% to 65% of the students surveyed in each grade level were quite content to believe that those reportedly interviewed in the handgun question were representative of everyone in the United States. As for the sample size task, only 9 of 41 students interviewed said that the strange result of 80% boys was more likely to occur at the rural school, because there is more relative variability in small samples. Many of the students who got the School Problem wrong did see the need for and benefits of larger samples on the other questions. They just could not apply what they knew about large/small samples to the School Problem. This suggests teachers have their work cut out for them to help students become aware of potential sources of bias in sampling, and to reach the level of critical thinkers at Tier 3 on Watson’s statistical literacy hierarchy. Students can recognize the importance of getting a larger sample as a means to “avoid getting all the same,” or as a means to “avoid too many extreme outcomes,” or as a means to “balance” the input more fairly. But they may still not see that larger samples provide a method of tightening relative sample variability. Watson and Moritz sum up what they found about the development of students’ understanding of samples as follows:

    Students initially build a concept of sample from experiences with sample products in medical and science related contexts, perhaps associating the term random with sampling. As students begin to acknowledge variation in the population, they recognize the
    importance of sample selection, at first attempting to ensure representation by predetermined selection but subsequently by realizing that adequate sample size coupled with random or stratified selection is a valid method to obtain samples representing the whole population. (Watson and Moritz, 2000a, p. 63)

Research on Students’ Understanding of Graphs

Graphs are critical for data representation, data reduction, and data analysis in statistical thinking and reasoning. A complete review of all the research on reasoning about graphs is well beyond the scope of this chapter. A thorough review of the literature concerning sense-making in graphs is available in Friel, Curcio, & Bright (2001). They define graph comprehension as “the ability of graph readers to derive meaning from graphs created by others or by themselves” (p. 132). They discussed the influences that visual perception, the characteristics of graph readers, and experience with statistics all have on the ability of people to make sense of graphs. Friel et al. (2001) were particularly interested in students’ ability to comprehend statistical graphs, as contrasted with graphs of functions in algebra or calculus. It is interesting to compare the references in the Friel et al. review with another article by Roth and Bowen (2001) on reading graphs that appeared in the same issue of that journal. Of the hundreds of references in these two articles, only four or five references appear in both. The overall scope of research on understanding graphs has become enormous. For my purposes, I want to concentrate on a few areas of research on graph sense that are of particular importance to statistical reasoning.

Research on Understanding Particular Types of Statistical Graphs

An analysis of the student results on graph items from the 1996 NAEP indicated that although students performed well when reading information represented in pictographs and stem-and-leaf plots, Grade 12 students’ ability to read and interpret histograms or box plots lagged behind their performance with other graphical representations (Zawojewski & Shaughnessy, 2000). Performance on histograms, which requires some proportional reasoning, was especially low. Students tested in the 1996, 2000, and 2003 NAEP administrations could read graphs fairly well but had trouble interpreting graphs, and even more trouble making predictions based on graphical information (Zawojewski & Shaughnessy, 2000; Tarr & Shaughnessy, in press). On one extended constructed response task students were given data for railroad ticket sales and asked which of two different graphical representations of the same data they would use in a presentation on how sales had grown. Shaughnessy and Zawojewski (1999) reported that only 2% of Grade 8 students gave a reason for their choice of graphs that merited the highest score on the NAEP rubric for the rail task. Even though an additional 18% of students gave partially correct answers, the overall performance, especially the number of omits on this NAEP task, suggests that students’ graph interpretation skills are weak.

Other research has confirmed some difficulties that students have when reading and interpreting particular types of graphs. Researchers have examined student thinking on bar graphs (Pereira-Mendoza & Mellor, 1991), line graphs (Aberg-Bengtsson & Ottoson, 1995), stem-and-leaf plots (Pereira-Mendoza & Dunkels, 1989; Dunkels, 1994), box-plots (Carr & Begg, 1994), scatter-plots (Estepa & Batanero, 1994), pictographs (Watson & Moritz, 2001), and histograms (Melitiou & Lee, 2002). Pereira-Mendoza (1995) suggested that children should:

1. Explore the assumptions underlying the classification of data and interpretation of the meaning of data.
2. Discuss and explore the possibility of alternative representations.
3. Predict from the data. (Pereira-Mendoza, 1995, p. 6)

His point is that by directing students’ attention to alternative representations, teachers can help move students beyond mere drawing and tabulating of data to more critical elements in graph sense.

Carr and Begg (1994) introduced box-plots to 11- and 12-year-old students in order to investigate whether such graphs were appropriate for elementary school students. Their informal observational study of 8 students included a brief instructional component on constructing box-plots, followed by unstructured student interviews to determine if students understood the ideas of center and spread. They concluded that box-plots are an appropriate topic for students in this age group provided that teachers emphasize the understanding and interpretation of the plots, and not just the construction of them. However, not all researchers are in agreement about early introduction of box-plots. Bakker et al. (2004) pleaded for delaying the introduction of box-plots due to the difficulties that middle school students have with the proportional reasoning needed to construct and interpret box-plots.
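The proportional reasoning at issue with box-plots can be illustrated with a small computation. In a box-plot each of the four segments (the two whiskers and the two halves of the box) represents roughly a quarter of the data, however long or short that segment is drawn. The sketch below uses a made-up data set and Python’s standard statistics module. The data values, the function names, and the choice of the “inclusive” quartile method are assumptions made for illustration; other quartile conventions give slightly different cut points.

```python
import statistics

# Hypothetical data set of 12 values (not taken from any study cited here).
data = [2, 3, 4, 5, 6, 7, 8, 10, 13, 17, 22, 30]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def segment_report(values):
    """For each of the four box-plot segments, return (length, share of data)."""
    cuts = five_number_summary(values)
    report = []
    for i in range(4):
        left, right = cuts[i], cuts[i + 1]
        is_last = (i == 3)
        # Half-open intervals, except that the final segment includes the maximum.
        count = sum(
            1 for v in values
            if left <= v < right or (is_last and v == right)
        )
        report.append((right - left, count / len(values)))
    return report


for length, share in segment_report(data):
    print(f"segment length {length:5.2f} holds {share:.0%} of the data")
```

For this data set the segments differ greatly in length while each holds the same share of the values, so a long whisker signals sparse, spread-out values rather than “more” data. Reading segment length as density rather than as count is the step that requires proportional reasoning.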
The research on students’ understanding of particular graph types suggests that teachers should (a) include a variety of graphical representations and (b) go beyond mere graph construction to discuss the meanings and interpretations of the graphs. Lehrer and Romberg (1996) note that most examples of graphs in textbooks are preprocessed, and they recommend more attention be given to having students construct and share their own graphical representations of data.

A Closer Look at “Bar” Graphs

Given the research on students’ understanding of particular graphs, consider the complexity of just one type of graph, the bar graph. Bar graphs are first introduced in the elementary grades to represent frequency counts of categorical or integer data. They follow in a sort of natural way from stacked dot plots, allowing data representation to move from discrete to continuous displays. But this can be a big jump for students, as the exact count that was visible in a dot plot is no longer obvious in a bar and the vertical axis may no longer be integer valued. Bar graphs are first used to represent frequencies, but later on they are used with bar heights that represent relative frequencies. Still later, when histograms are introduced to represent the frequency or relative frequency of continuous data within continuous intervals, the horizontal axis is no longer categorical or integer valued; it also becomes a continuous scale. Relative frequency distributions are a critical tool for normalizing data in order to compare unequal-sized data sets, but research has shown that this is a very challenging issue for students to grasp.

In their analysis of document literacy among adults (discussed previously in this chapter), Kirsch, Jungeblatt, & Mosenthal (1998) and Mosenthal and Kirsch (1998) have done large-scale studies that point out the difficulties that adults have with bar graphs or line graphs that require more than simply reading the data. When integration of information in bar graphs is required, or when adults are asked to summarize in writing the conclusions from a graph, they perform much more poorly than on simple reading-the-graph tasks. These results are consistent with reports about NAEP tasks (Zawojewski & Shaughnessy, 2000; Tarr & Shaughnessy, in press) and with what Gal (2002) discussed about the Grade 12 TIMSS results on graph sense.

The conceptual complexity of bar-type graphs continually increases throughout the teaching and learning process. The values and scales on the horizontal and vertical axes morph from counts, to percentages, and then to percentages over an interval. Watson and Moritz (1997b) discussed the complexity in this transition among graphical representations (see also Moritz & Watson, 1997). The cognitive transition in bar graphs mirrors the transition from additive to proportional to distributional thinking previously discussed in this chapter, with a heavy reliance on proportional thinking. Without proportional thinking, students see only counts and miss the power of graphs to reduce data and show trends. Proportional thinking is the lynchpin in making sense of statistical graphs.

Research on Components of Graph Sense

Curcio (1987) introduced a framework for graph sense by building on research in the literature on reading. She identified three elements (form, content, and topic) that contributed to the ability of students to construct their own graph-reading schemata. In later work she characterized three levels of reading of graphs: reading the graph, reading within the graph, and reading beyond the graph (Curcio, 1989). These categories help to shed light on the complex nature of understanding graphs. For example, consider the graph of fruit juice consumption over time in Figure 21.11. This graph is based on data collected by the United States Department of Agriculture (USDA, 2004) and posted on their website.

[Line graph: Fruit_Juices (vertical axis, 5.5 to 8.5) plotted against Year (horizontal axis, 1970 to 2000).]
Figure 21.11 Fruit Juice Consumption, Gallons per capita, over time (USDA, 2004).

In order to be reading the graph, students have to at least understand the scale and the measurement units. This is a graph over time, from 1970 to 2000, and the vertical axis is in gallons per person per year. It is also important for students to realize and understand that “per capita” data are rate data. Reading within the graph is especially important for graphs over time, in order to be able to discuss general trends. For example, the timeline shows a fairly steady increase in consumption, with a couple of sudden dips. Why is there a general increase, and why did those dips occur? Reading beyond the graph, Curcio’s highest level of graph comprehension, includes such skills as projecting into the future and asking questions about the data. For example, one might suggest that fruit juice consumption appears to be dropping at the end of the graph, and that it may level off at some point in the future. Or, one might ask where these data came from, how they were collected, and how the USDA managed to calculate or estimate “gallons of fruit juice per capita” for each of the years.

I have previously argued (Shaughnessy, Garfield, & Greer, 1996) for another level of graph comprehension beyond Curcio’s three levels, that is, reading behind the data or graph. This involves more than just reading beyond the graph. Statistics are within a context. As Wild and Pfannkuch (1999) have noted, one must always search for special causes of variation in the data. The special role of statistics as a scientific discipline lies in making connections between the context and the graph, that is, what lies behind the graph. For example, in the juice consumption graph one might look at historical, economic, or demographic influences that may have affected juice consumption over the last 40 years. In reading behind this graph, one might conjecture that in the mid-1980s, better production of juice came about. Perhaps a switch occurred from smaller orchards to larger centralized locations for collecting and processing fruit for juice. As time went on, a global economy arose so that fruit juice could be produced and shipped more easily all year round from somewhere in the world. This greater availability could contribute to a more rapid growth in fruit juice consumption. Perhaps a killing frost explains the two years, 1981 and 1989, that show a drastic drop in fruit juice consumption. Whatever the cause of the drop behind the data in those years, a traumatic event affected juice consumption but was followed by a quick rebound to the status quo in subsequent years. The drop-off during more recent years might be due to increased consumption of other beverages, such as soft drinks or bottled water. These are just a few of the possible special causes that might affect fruit juice consumption over time, which involve looking behind the graph, and using some data detective skills to analyze graphical information.

Watson and Moritz (1997a) applied their three tiers of statistical literacy, namely (a) knowledge of basic statistical terminology, (b) applying statistics in context, and (c) challenging statistical claims, to analyze students’ responses to some graphs taken from newspapers. The tiers of statistical literacy match up fairly well with Curcio’s three levels of graph sense when one considers statistical literacy about graphs. Watson and Moritz found a lack of attention to scaling issues in the data and graphs they gave their students. Furthermore, only 10% of their students were aware of the complexity of or questioned some of the features of one particularly misleading graph. When students were asked to make some calculations based on information in the graph, many of them ignored the graph altogether and reverted to personal experiences. Watson and Moritz discovered gaps in students’ graphical literacy within all three of their levels of statistical literacy.

In another series of studies that investigated students’ thinking about pictographs, Watson and Moritz (2001) identified three levels of student reasoning. In follow-up studies 3 years later they documented growth in students’ thinking about pictographs. They also found that cognitive conflict could improve students’ initial reasoning. When a sample of Grade 3 students was shown responses from other children about the pictographs, their reasoning about the graphs tended to improve.
Investigations of students’ graph comprehension and graph interpretation skills have been one approach to exploring the components necessary to develop good graph sense. Another approach has involved analyzing student-generated graphs. Following on earlier work in relation to the SOLO model (Watson et al., 1995), Chick and Watson (2001, 2002) used data card tasks that included information about students’ weight, age, favorite activity, and number of fast food meals eaten per week. Students in Grades 5 and 6 were put into groups of three and asked to generate some hypotheses and to construct graphical representations of the information on the case cards. Chick and Watson’s students produced three levels of representations of information in the data cards. Some students depicted individual aspects of the data set, such as a table of single values of one variable with no interpretation offered. Other students produced a graph of a single variable, and still others produced a bivariate representation, such as a scatter graph, indicating a possible relationship between two of the variables. Students’ reasoning on the task followed a pattern similar to their graphical representations: looking at individual data points, considering the entire range of values for one variable, or conjecturing cause-and-effect relationships among several variables. Nearly half the students could interpret the data at a higher level than they could represent it graphically, suggesting that students’ graphical sense may lag behind their data interpretation skills.

The attention to individual data points exhibited by Chick and Watson’s students has been pointed out by other researchers, notably Cobb (1999) and Konold and Higgens (2002, 2003). These researchers have suggested that case-value representations of data are a good starting point to help students develop graph sense and begin making connections to the context. Konold and Higgens (2003) presented case-value data in a value bar representation and demonstrated a progression of representations from value bars, to end points of the value bars, to stacked dot plots. This transition path could aid students and teachers to move beyond students’ initial preferences for individual case-value plots to stacked dot plots that provide a visual representation of the entire distribution of the data, including information about shape, center, and spread.

A summary analysis of the literature on students’ understanding of graphs by Friel et al. (2001) identified six behaviors that they considered to be closely associated with graph sense. Each of these six behaviors seems to fit nicely with one of Curcio’s three levels of graph reading (read the data, read within the data, read beyond the data). I also suggest two additional behaviors that fall under the level of reading behind the data.

1. Recognizing components of graphs (Reading the data).
2. Speaking the language of graphs (Reading the data).
3. Understanding relationships among tables, graphs, and data (Reading within the data).
4. Making sense of a graph, but avoiding personalization and maintaining an objective stance while talking about the graphs (Reading within the data).
5. Interpreting information in a graph and answering questions about it (Reading beyond the data).
6. Recognizing appropriate graphs for a given data set and its context (Reading beyond the data).

In addition, I would add

7. Looking for possible causes of variation (Reading behind the data).
8. Looking for relationships among variables in the data (Reading behind the data).

The cumulative results from a number of researchers on graph sense indicate that students have poor graphical interpretation skills and are often unable to reason beyond graphs. Also, unless the graph is very straightforward, students may even have trouble just reading a graph at all. In the special case of statistical graphs, reading behind the data is critical to making connections between the context and the data. Basic graph sense such as reading, reading within, and reading beyond graphs is critical to statistical thinking, reasoning, and literacy.

Technology and Research on Learning Statistics

My principal research interest in the use of technology in teaching statistics is the development of students’ statistical thinking in the conceptual areas discussed in the previous section: averages, variability, information from samples and surveys, and graph sense. The availability of powerful computing tools has led to improved methods of analyzing data and exploring data graphically (Hawkins, Jolliffe, & Glickman, 1992). What sorts of technological environments can enhance student learning of the major concepts in statistics?
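As a small illustration of how a computing tool can bear on one of these conceptual areas (variability in samples), the sketch below revisits the sample-size task described earlier in this chapter, in which one of two samples (50 students or 20 students) turned out to have 80% boys. The calculation rests on assumptions that are not part of the original task: each sampled student is treated as independently equally likely to be a boy or a girl, and “80% boys” is read as “at least 80% boys.” The function name is hypothetical.

```python
from math import comb


def prob_at_least(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more boys among n students."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Assumed model: independent students, each a boy with probability 0.5.
p_small = prob_at_least(20, 16)   # at least 80% boys in a sample of 20
p_large = prob_at_least(50, 40)   # at least 80% boys in a sample of 50

print(f"sample of 20: {p_small:.6f}")
print(f"sample of 50: {p_large:.8f}")
print(f"ratio: {p_small / p_large:.0f}")
```

Under these assumptions the extreme result is several hundred times more likely in the sample of 20 than in the sample of 50, which is the reasoning behind the expected answer that the strange sample more likely came from the small school.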
Biehler (1993, 1994a, 1994b, 1997) has written extensively on the type of software tools that are necessary to enhance the teaching and learning of statistics. Over a decade ago Biehler described aspects of technology needed to empower students to do interactive exploratory data analysis, using visualization and simulation tools to understand statistical concepts and methods. He proposed the following components for ideal software:

• Student tools for data analysis, for method construction and evaluation, for modeling and for visualization, that can grow and expand along several paths into a professional version, instead of mere technically reduced student versions of professional systems
• A system of coordinated computer experiments, learning environments, and major visualizations that can be adapted to students’ and teachers’ needs. (Biehler, 1994b, p. 3)

In another early analysis of and report on the use of technology for teaching statistics, Garfield (1990) led a Working Group on Technology and Data of the American Statistical Association (ASA). This Working Group outlined attributes of technological environments that can facilitate the learning of data handling, including:

• Direct access, which allows students to view and explore data in different forms, including subsets of data and different visual representations.
• Flexibility, which allows students to experiment with and alter displays of data, change intervals on a graph, and explore different models that may fit the data.
• Connectedness, so that students are able to access resources on the Internet, as well as to obtain software or data used in the study of other disciplines.
• Representations, including dynamic ones from which students may choose among different graphs in order to select the best way to interpret and display a data set.

Both Biehler and Garfield have continually recommended that in order to truly enhance statistics education, technology for learning statistics should go way beyond mere statistical packages that carry out procedures. They advocate for technology that puts the design and representation of data structures in the hands of the student, rather than merely allowing selection of procedures from an already existing repertoire. In an analysis of existing and desirable statistical software, Biehler (1997) distinguished among tools, resources, and microworlds for the teaching and learning of statistics with technology. By tools, Biehler means the type of software and hardware support that professional statisticians use to practice their trade. For example, Minitab (2005) is a software package that consists primarily of statistical tools for data analysis. Resources would include data sets with references, and background and context information. These types of data sets can often be found at Internet sites. The ERS Web site maintained by the Department of Agriculture (2005) from which the food consumption data (see Figures 9 & 11) were downloaded is one example of what Biehler means by resources. Microworlds provide a creative, exploratory environment for students to represent data, and to carry out simulations. Both Tinkerplots (Key Curriculum Press, 2005a) and Fathom (Key Curriculum Press, 2005b) are examples of microworlds that are flexible and that put a lot of the power in the hands of the students to create a variety of types of graphs and data representations.

Ben-Zvi (2000) presented examples of types of technological tools for learning statistics that are similar to Biehler’s categories, such as statistical software packages, microworlds, tutorials, and resources on the Internet. According to Ben-Zvi, tools for statistical learning have been developed to support these areas:

1. Students’ active knowledge construction, by “doing” and “seeing” statistics.
2. Opportunities for students to reflect on observed phenomena.
3. The development of students’ metacognitive capabilities, that is, knowledge about their own thought processes, self-regulation, and control.
4. The renewal of statistics instruction and curriculum on the basis of strong synergies among content, pedagogy, and technology. (Ben-Zvi, 2000, p. 128)

Ben-Zvi shared a detailed example of some middle school students as they worked with spreadsheets and constructed graphical representations of some data on Olympic records for the 100-meter dash. He documented how even a simple tool like a spreadsheet can be very powerful in the hands of students who are directing their own learning and how a spreadsheet can provide support for the four areas he identified.

Bakker (2002) referred to Fathom and Tinkerplots as “landscape-type” tools and contrasted them with “route-type” tools with which students have far fewer
choices at their disposal. Examples of route-type tools include the Mini-tools discussed
by Cobb et al. (1999) and the tools created by the Chance–Plus project such as Prob-Sim
and Data-Scope (Konold & Miller, 1994). Landscape-type tools put students in very open
situations, with a variety of powerful choices, including sorting and arranging data into
a variety of visual display formats. Bakker expressed concern about starting students
with these landscape-type tools, as they might overwhelm students with too many
representational choices. He argued that it is better to first introduce students to some
more limited route-type tools, such as Prob-Sim or Mini-tools, that can provide students
with focused opportunities to explore particular concepts. Friel (2007) has provided a
detailed review of research on the interaction of technology with the teaching and
learning of statistics, including an analysis of the strengths and weaknesses of a
variety of types of technological tools for teaching statistics, including both
route-type and landscape-type tools.

Research on what students learn about statistics or how their statistical thinking
develops while using new statistical tools like Tinkerplots and Fathom is relatively
sparse as of this writing, especially when compared to the general explosion of research
in other areas of statistics education. During the development phases of statistical
software, most of the developers’ energy is focused on the tools themselves, or on
professional development work with teachers to help them use the new software. Normally
very little research is done on student learning or on student conceptual growth in
statistics during the developmental phase of software. Finzer (2002) made a good case for
the simultaneous integration of research on student learning with software development,
so that students’ learning issues can inform the software development team in midstream,
and vice-versa:

    In some ideal world the boundary between development of software for use in research
    on learning and development of software for classroom use would be truly porous so
    that researchers could easily adapt classroom software for research purposes and
    classrooms would reap the benefits of educational research. (Finzer, 2002, p. 1)

General access for all students and teachers to statistical software is still a problem.
Too few teachers and schools are currently using data analysis tools like Fathom or
Tinkerplots. Even AP statistics teachers are more likely to use graphing calculator
statistics packages or some of the more traditional data-crunching packages such as
Minitab. To argue the flip side of Bakker’s concern about landscape-type software being
too open, the current teaching of statistics may involve too much route-type and not
enough landscape-type use of technology. The landscape-type software is more likely to
help students to become good data detectives. Open-ended microworlds give students a lot
of autonomy and power in data exploration and analysis, whereas some route-type software
might be a bit confining and sometimes overly procedurally oriented.

Some Research on Students’ Statistical Thinking with Technology

Bakker and Gravemeijer (2004) presented a detailed analysis of a teaching experiment with
seventh graders that was designed to get students to reason about distributions in
informal ways. They noted four critical components of distributions: centers, spreads,
density or clustering, and skewness. Their teaching experiment was designed to have
students encounter these four components and to move students along from looking at data
as individual case values to reasoning about entire distributions of data. Using two
computer Mini-tools, a value bar tool and a stacked dot-plot tool, Bakker and Gravemeijer
had students investigate two data sets on battery life, data sets that had also been used
by other researchers (Cobb, 1999; Cobb, McClain, & Gravemeijer, 2003).

The students first compared the two battery data sets using the value bar tool. During
this exploration, students began to use terms like majority, outliers, reliability, and
spread out. Bakker and Gravemeijer claimed that the value bar tool helped to provide a
visual representation of the mean for students. Subsequently the stacked dot-plot tool
allowed students to develop qualitative notions of more advanced aspects of distributions
such as frequency, classes, spread, quartiles, median, and density (Bakker & Gravemeijer,
2004). The dot-plot tool also allowed students to partition data into 2, 4, or more
equal-sized groups. This provided an underlying structure for development of concepts
such as median, box-plot, density, and eventually histogram. Students were able to reason
about clumps of data and compare clumps across the two battery-life distributions, so
that the underpinnings of Konold’s “signal amid the noise” conception of the mean were
available. Cobb (1999) described a similar learning trajectory for a group of middle
school students who were exploring the battery data. Cobb’s principal lens for the study
was classroom discourse and social interaction, which he claims is a necessary component
for the type of growth and learning witnessed in his work.

On the basis of the statistical development that they witnessed among the students in
their studies, Bakker and Gravemeijer, and Cobb, concluded that
it is important to provide opportunities for students to contribute their own ideas to
the statistical learning process and that teachers need to provide a lot of time for
discussion and interaction during class explorations of data. They also believe that
formal measures to describe distributions, such as median and quartiles, should not be
introduced until after students have had opportunities to develop their own intuitive
notions about distributions.

In reading the Bakker and Gravemeijer study, it is striking how natural the integration
of the Mini-tool environment was in the teaching experiment. Furthermore, the process of
students’ growth from first looking at data as individual data cases, then considering
clumps of data, and finally making comparisons and decisions based on clumps and spreads,
was clearly supported and enhanced by the Mini-tools technology. Even though these two
Mini-tools are more in Bakker and Gravemeijer’s route-type software category, the
students were able to grow considerably despite the software’s limited capabilities. This
study was enhanced by the strong alignment between the task environment and the
Mini-tools’ capability to explore that task environment. Based on these studies on
teaching statistical concepts with technology, the need for a close match between the
statistical questions that are being investigated and the capability of the software to
directly tackle those questions seems paramount. These studies also suggest that the type
of classroom discourse that takes place during a statistical exploration also has a major
impact on students’ conceptual growth in statistics.

Friel (2007) shared a number of visual representations of data from two different types
of roller coasters using Tinkerplots. Stacked dot plots allow students to partition the
two data sets in different ways, to compare spreads, middles, and clumps. Gradually the
shape of the two distributions becomes apparent. The roller coaster comparison task is
rich in student exploration possibilities, similar to the battery-life task discussed
above. The combination of interesting data sets with versatile software tools has made
these two environments, the roller coasters and the batteries, excellent candidates for
future teaching and research projects.

In reviewing a series of efforts in which researchers have asked students to explore and
compare data sets within rich technological environments, Konold et al. (unpublished
manuscript) summarized a number of different ways to consider data. Among their research
sources were cases of primary teachers who wrote about how their students dealt with data
(Russell, Shifter, & Bastable, 2002), some research by Cobb et al. (2003) on how students
describe covariation in scatter-plots, and some of their own work with secondary school
students who explored a large multivariate data set containing information on a number of
variables about high school students (Konold, Pollatsek, Well, & Gagnon, 1997). According
to Konold et al. (unpublished manuscript) data can be viewed as pointers, case values,
classifiers, or aggregates. A major issue in statistics education is to find ways to help
students’ thinking about data to evolve along this continuum. Although Konold has written
about the importance of the mean as the “signal amid the noise,” indicating that he
considers data as aggregate to be the most important way for students to view data, in
this working paper he has also acknowledged the potential benefits of viewing data in
other ways. For example, case values can help to highlight the variability in data, a
concept that gets short shrift in school mathematics (Shaughnessy et al., 1999). The
types of technological tools that Konold et al. include in their paper support student
conceptual growth on centers, as well as on variability.

As noted earlier in this chapter, students have a very difficult time learning about
sampling distributions and the accompanying statistical ideas such as the law of large
numbers (Saldanha & Thompson, 2003). Chance et al. (2004) shared research on a series of
experiments on sampling distributions conducted with college students in introductory
statistics courses. This work built upon earlier explorations of tertiary students’
understanding of sampling distributions (delMas, Garfield, & Chance, 1998; Garfield,
delMas, & Chance, 1999), and used a software tool developed by delMas (2001) to run
simulations that generate sampling distributions.

In their research Chance et al. identified four prerequisite concepts that students need
in order to understand sampling distributions: variability, distribution, sampling, and
the concept of the normal distribution. Research reviewed in previous sections of this
chapter has already noted the complexity and diversity of student thinking about
variability and about samples and sampling, and has noted issues that students deal with
when comparing distributions. With the addition of the normal distribution as a
prerequisite concept, it is no wonder that Saldanha and Thompson (2003) encountered so
much difficulty in their teaching experiment on sampling distributions with secondary
students. Chance et al. used the developmental model of statistical thinking that Jones
et al. (2000) had introduced when working with elementary and middle school students.
Chance et al. uncovered and subsequently validated five different levels of reasoning
about sampling distribution among their
college students, namely, idiosyncratic, verbal, transitional, procedural, and integrated
types of reasoning.

Even though Chance et al. spent considerable time targeting tasks in which their students
ran software simulations to explore sampling distributions, they found only a few
students who actually managed to integrate all the statistical concepts involved as they
explored properties of sampling distributions.

    Part of the problem in developing a complete understanding of sampling distributions
    appears to be due to students’ less than complete understanding of related concepts,
    such as distribution and standard deviation. We have found our own research
    progressing backward, studying the instruction of topics earlier in the course and
    the subsequent effects on students’ ability to develop an understanding of sampling
    distributions. For example, initially we explored student understanding of the effect
    of sample size on the shape and variability of distributions of sample means. We
    learned, however, that many students did not fully understand the meanings of
    distribution and variability. Thus, we were not able to help them integrate and build
    on these ideas in the context of sampling distributions until they better understood
    the earlier terminology and concepts. (Chance et al., 2004, p. 312)

Like Saldanha and Thompson, they go on to conclude that the “concept of sampling
distribution is a difficult concept.”

    The few pages given in most textbooks, a definition of the Central Limit Theorem, and
    static demonstrations of sampling distributions are not sufficient to help students
    develop an integrated understanding of the processes involved, nor to correct the
    persistent misconceptions many students bring to or develop during a first statistics
    course. Our research suggests that it is vital for teachers to spend substantial time
    in their course on concepts related to sampling distributions. (Chance et al., 2004,
    p. 312)

If the results of the research on student understandings of average, variability, and
distribution that were discussed in earlier sections of this chapter can be effective in
promoting better and earlier teaching and curriculum development in K–12 statistics,
perhaps Chance and others will eventually have a stronger statistical foundation to build
upon when they work with tertiary students. In the meantime, tertiary statistics
instructors need to heed Chance et al.’s advice and start teaching from where their
students actually are in their knowledge of statistics. The research reviewed in this
section suggests that technological tools are very important for helping students to
transition from those naive conceptions to richer, more powerful understandings of
statistical concepts.

RESEARCH ON AND DEVELOPMENT OF TEACHERS’ UNDERSTANDING OF STATISTICS

Much of the research on teachers’ understanding of statistical concepts has been derived
from professional development work that is intended to extend teachers’ knowledge and
competence in statistics. This is partly a matter of convenience, combining research with
professional development. It is also partly because it is difficult to obtain direct
information about teachers’ knowledge about statistical concepts. Most K–12 mathematics
teachers in the United States have very little background in statistics. The exceptions
are those teachers who may have had a concentration in statistics during their master’s
program for secondary teachers, or middle school teachers who completed one of the few
special programs that exist in the United States for middle school mathematics teachers.
Teachers know that their statistical understanding is shaky, and so attempts to obtain
direct information from them using surveys or interviews can be embarrassing for them. As
a result, a good deal of what is known about how teachers think and reason about
statistics has tended to be somewhat anecdotal. Sometimes field notes can be taken during
observations of classroom statistics activity, or retrospective statements can be
obtained from teachers while debriefing trial lessons of new statistics materials.

Research on Teachers’ Understanding of Statistics Within Professional Development

Over the course of several curriculum development and professional development projects,
Rubin used field notes and reflections after interviews to gather information on
teachers’ thinking about statistical concepts. In one project several teachers
volunteered to trial the Stretchy Histograms and Shifty Lines software from the Elastic
project with their students (Rubin & Rosebery, 1988). These two tools were among the very
first tools to allow users to click and drag elements of graphs, like bars in histograms,
or lines in scatter-plots, and to observe changes in the mean and median of distributions
or changes in the equations of regression lines. Rubin and Rosebery found that when
teachers asked their students to alter histograms by adding values to the original graph,
the
teachers became puzzled. For example, they wondered why the median did not change when
the mean did. The teachers were unaware that there might be multiple occurrences of the
median value in the data set, or that the median remains constant if the same number of
data values are added in on either side of it. As a result of Rubin and Rosebery’s work,
recent software packages like Tinkerplots prefer to use stacked dot-plots as the default
representation mode rather than histograms. Dot plots preserve every data point, and the
median value can easily be highlighted in a dot plot.

In another project, Hammerman and Rubin (2003) worked with middle school and secondary
school teachers as they used Tinkerplots to clump data into bins, thereby reducing data
complexity. This helped the teachers to find ways to make comparisons between groups of
unequal sizes. Research discussed earlier in this chapter documented the difficulty that
students have when comparing groups of unequal size (e.g., Watson, 2002), and one would
expect teachers to have some of those same difficulties. One data set that Hammerman and
Rubin gave teachers consisted of T-cell counts for two groups infected with HIV, a
control group with n = 186 and an experimental group with n = 46. The gender subgroups
within the groups were also of unequal size. (The T-cell data set had previously been
used by Cobb, Gravemeijer, Doorman, and Bowers, 1999.) Tinkerplots allowed the teachers
to create bins by T-cell count, then to count the total number of people in each bin, and
finally to create pie-graphs of the percentage of men and women in each bin. The
pie-graph percentages can disguise the actual bin counts, and decisions made on
percentages alone may be misleading because of small sample sizes within some bins.
Conversely, actual bin counts can also be misleading if relative sizes are neglected.
However, these teachers were simultaneously able to compare both frequencies and relative
frequencies for T-cell count by gender, and to use the information to make some
conjectures about the relative success of a treatment for HIV for each gender. Hammerman
and Rubin (2003) argued that when faced with a lot of complex data, their teachers sought
ways to reduce or manage variability, so they could make better judgments.

Several things are striking in the work of Hammerman and Rubin. First, the teachers found
ways to use the software to create proportions within bin cells, and to reason
proportionally when comparing data sets, rather than purely additively with frequencies.
Even though proportional reasoning had its potential pitfalls in this case, the fact that
the teachers used proportional comparisons of unequal-sized data sets is promising.
Second, the types of graphical representations that are now available to compare
bivariate data (gender × treatment in this case) have the potential to revolutionize data
analysis in classrooms in the future. At present, a typical AP statistics class approach
to the T-cell problem might be to put the data into a 2 × 2 contingency table, sex ×
treatment, and examine cell values and margin proportions. In a few years, if middle
school students have already explored bivariate data using powerful visual
data-representation tools, the standard numerical statistical procedures for analyzing
data might not be their first choice for approaching bivariate problems. More of the
decision-making power for how the analysis is to proceed could be put directly into the
students’ hands. The work of Rubin and her colleagues has important implications for the
statistical education of teachers.

    We believe that one powerful way for teachers to gain expertise in statistical
    reasoning is to have more experience in “being statisticians” themselves. In the area
    of exploratory data analysis, teacher education should consist largely of teachers
    investigating statistical problems that interest them, collecting data, analyzing it,
    and drawing conclusions in the same way that statisticians would. (Rubin & Rosebery,
    1988, p. 17)

In fact, this vision for the statistical education of teachers should be the norm for the
statistical education of all of our students.

Makar and Confrey (2002) have also conducted professional development projects in
statistics with an imbedded research component. They cleverly merged their teachers’
concerns about their students’ performance on large-scale state testing in mathematics
with an upgrade of the teachers’ own understanding of statistics. Makar and Confrey
introduced teachers to data analysis with Fathom using data from students’ performance on
the state mathematics test. Teachers were particularly interested in comparing different
groups of students on the state test, so Makar and Confrey took advantage of that
context. They identified four constructs they wished their teachers to consider as they
decided whether differences between groups of students’ test scores were really
meaningful differences: measurable conjectures, tolerance for variability, context, and
an ability to draw conclusions. For example, they pointed out that tolerance for
variability requires a very different mindset than the deterministic foundations of
traditional mathematics courses. Statistics requires decision making under uncertainty.
Their teachers discovered a new appreciation for variability within the context of
student scores on the state test. What began as a surface-level analysis of just the bare
numbers transitioned to a rich discussion re-
lated to context when the teachers were told that the numbers in their data sets were
actual state test scores for a set of different classes. In analyzing the teachers’ work
Makar and Confrey created a framework of five levels of reasoning used by the teachers
when they compared data sets: pre-descriptive, descriptive, emerging distributional,
transitional, and emerging statistical. Makar and Confrey’s levels describe the growth in
the teachers’ understanding of the role and importance of variability when comparing data
sets, and the difficulty of drawing conclusions about measurable conjectures.

In another study Makar and Confrey (2005) investigated preservice mathematics and science
teachers’ understanding of variation and distribution during a teacher education course.
The teachers were introduced to the Fathom software and then given some assessment data
to analyze. Makar and Confrey were interested to see if their participants would compare
groups just using means, or if they would look at variation and spread in their
comparisons. In interviews these preservice teachers made comparisons between groups of
students using conventional statistical terms, such as means, measures of spread,
discussions of shape, and the proportion of students who improved on the assessment.
However, Makar and Confrey also encountered a good deal of non-statistical language among
their preservice teachers, such as “clustered” or “spread out” or “bulk of” or “majority”
or “clumps” or “big chunk.” In their analysis of the responses Makar and Confrey
concentrated on “clumps” and “chunks.” A clump means exactly what it sounds like,
including modal clumps of data. A chunk is a contiguous subset of a distribution of data,
not necessarily a clump. This language is reminiscent of the “bumps” and “gaps” language
used by Friel, Mokros, and Russell (1992) in their work with elementary children in the
Used Numbers Project. It is an intuitive, natural language used to describe a collection
of data. Makar and Confrey were capturing the primary intuitions of the student teachers
(Fischbein, 1987), the raw material upon which secondary intuitions can be developed
through teaching environments. Makar and Confrey warned that the nonstandard terminology
that students use should not be ignored, as it might be a rich source of student
understanding that could be overlooked.

Heaton and Mickelson have also explored K–6 teachers’ understanding of some statistical
concepts (Heaton & Mickelson, 2002; Mickelson & Heaton, 2003, 2004). Several of their
research efforts have involved a single case study of one third-grade teacher as she
integrated statistics investigations into other parts of her curriculum, such as science,
literature, or social studies investigations. One report focused on a teacher’s
understanding of data and distribution (Mickelson & Heaton, 2004), and another on
manifestations of that same teacher’s understanding of variability (Mickelson & Heaton,
2003). In some contexts, the teacher exhibited strong statistical reasoning skills, and
in other contexts, only much more naive statistical skills. This teacher was acting as
both a learner and a teacher in their research project, a dual role that Mickelson and
Heaton urge more researchers and professional development leaders to consider, especially
with elementary teachers who are trying new ideas out in statistics and data analysis in
their classrooms. Although this third-grade teacher had had considerable professional
development work in data analysis herself, at times she had difficulty finding an
appropriate vehicle to transfer what she herself had learned to the students in her own
classroom. Teachers need opportunities to create statistical activities and
investigations themselves and to try them out while supported in their classroom by a
statistics educator. The teaming of teachers with researchers in the classroom may help
to facilitate teachers’ abilities to transfer their own insights in statistics to
experiences for their students.

Larger Scale Research on Teachers’ Understanding of Statistics

Professional development cases, such as those described by Hammerman and Rubin and by
Makar and Confrey in the previous section, tend to be with small groups of teachers who
are special volunteers. In several studies that involved a larger number of teachers,
Bright and Friel (1993) asked primary teachers to create concept maps for probability and
statistics. They found that the maps were disconnected and rather sparse in statistical
content. Greer and Ritson (1993) surveyed teachers at all levels in Northern Ireland and
found a universal need for in-service teacher training in probability and statistics
across all grade levels.

Watson (2001b) found a rather clever way to amass a considerable amount of information
from teachers on their perceptions of statistics and on their own strengths and
weaknesses in the data and chance curriculum in Australia. Using an information profile
to assess the need for professional development, Watson asked teachers to rate the
importance of certain statistical concepts, to describe their own lesson practices, to
estimate their confidence in teaching certain statistical topics, and to suggest two
possible student responses to some statistical tasks, one appropriate and the other
inappropriate. This proved to be a less threatening methodology for obtaining information
on teacher knowledge of statistics than testing them
directly. Watson used Shulman’s (1987) knowledge typology to guide the categories of
questions on the teacher profile, thereby obtaining information about teachers’ content
knowledge, knowledge of teaching, pedagogical content knowledge, and other knowledge
categories identified by Shulman. Watson was able to administer the profile to over 40
teachers, some in individual interviews, some via written survey, and some via an
Internet survey. Of note in Watson’s findings were the teachers’ lack of confidence in
statistics and their lack of knowledge about sampling. Also of note was the need
expressed by many of her teachers to further their own professional development in data
and chance, as well as their frustration with the lack of support from local authorities
to help them in teaching statistical topics.

Watson’s profile was designed to provide information about teachers’ content knowledge in
statistics while assessing their professional development needs. A next step would be to
administer such a profile anonymously, directly to teachers, to get a better
understanding of their content knowledge. Inasmuch as this is a potentially risky and
threatening venture for classroom teachers, it might be easier to investigate the
statistical content knowledge of preservice teachers. Canada (2004) conducted a study of
preservice elementary teachers’ conceptions of variability in three contexts: data and
graphs, sampling, and probability. Building upon the aspects of variability identified by
Wild and Pfannkuch (1999), Canada began by documenting a variety of types of preservice
elementary teachers’ thinking about variability. From his analysis of students’ responses
to survey tasks, an emerging framework evolved consisting of three main aspects:
expecting variation, displaying variation, and interpreting variation. Canada
subsequently used his framework to compare and contrast the thinking of 6 case-study
students before and after he imbedded teaching episodes on statistics in a mathematics
course for preservice elementary teachers. Canada found growth in all three aspects among
the case-study students, including richer conceptions involving expectation of variation
in data, more flexibility with displays of variation, and stronger interpretations of
variation. Canada’s evidence suggests that class interventions can help to strengthen
preservice teachers’ conceptions of variation.

In response to the tremendous need for professional development in data and chance that a
number of researchers have identified, Friel and her colleagues developed some excellent
professional development and curricular activities for primary teachers to help
strengthen teachers’ statistical content knowledge and pedagogical content knowledge
(Friel & Bright, 1998; Teach-STAT, 1996a, 1996b). In project Teach-STAT an intensive
3-week in-service course in data analysis for teachers in Grades 1–6 led to the creation
of a professional development manual for statistics educators. The in-service materials
for teachers were based on the Used Numbers curriculum materials that were written for
elementary students (Russell & Corwin, 1989; Corwin & Friel, 1990; Friel, Mokros, &
Corwin, 1992).

In Australia, several trials to deliver professional development in statistics for
teachers from a wide geographical area were conducted by the Australian Association of
Mathematics Teachers (AAMT) (Watson, 1998). The professional development was delivered by
regional satellite television, by video-conferencing across the nation, by the
preparation of a CD-ROM for teachers, and by the development of a Website with the help
of a capital city newspaper. The material delivered was based on workshops and other
material to promote data and chance as suggested by the national mathematics statement
(Australian Educational Council, 1991). Evaluation by consultants and participants was
quite positive (Watson & Moritz, 1997b). However, as with many such technology-based
innovations, when the government funding was exhausted the ability to reach teachers was
greatly diminished.

A number of other efforts have addressed the professional development needs of teachers
in statistics, such as books by Hawkins (1990) and Hawkins et al. (1992) in the United
Kingdom, the six books on data and chance of the Navigations (2004a, 2004b) series of the
National Council of Teachers of Mathematics (see, e.g., Burrill, Franklin, Godbold, &
Young, 2004; Shaughnessy, Barrett, Billstein, Kranendonk, & Peck, 2004c), and the book
Statistical Questions from the Classroom (Shaughnessy & Chance, 2005), just to name a
few. As documented in a number of studies discussed above, professional development
materials for teachers can lead to windows of opportunity for researchers to explore
teachers’ own understandings of statistical concepts.

SOME RECOMMENDATIONS FOR FUTURE RESEARCH

At the end of my chapter in the first edition of this Handbook I put forth some
recommendations for future research in the teaching and learning of probability and
statistics (Shaughnessy, 1992). One wonders if anyone ever really pays attention to such
recommendations, but editors of research handbooks and readers who search through the
chapters of such handbooks usually expect a look back over where the research has been,
and some suggestions for where
the research could (should?) go next. As an author, it would be fun to have a conversation with readers at this point in the chapter to see what recommendations the readers themselves would make for future research after plowing through this lengthy tome. I do have some thoughts for future research, but first I will take a peek back at my recommendations in the first edition of this Handbook. How have they fared?

Research Recommended in the First Edition of the Handbook

As I mentioned in the introduction to this chapter, research in probability and statistics was just beginning to take hold in a number of countries during the decade prior to the arrival of the first Handbook of Research. At that time the field was rather naive, and the suggestions I made for future research were probably equally naive. That wish list included the need for assessment instruments; a request to investigate secondary students’ and classroom teachers’ conceptions of probability and statistics; a methodological recommendation for more teaching experiments in probability and statistics; and a request to investigate the effects of technology on student learning of statistical concepts. There were also recommendations for cross-cultural studies comparing students’ statistical thinking, and a suggestion to investigate the role of metacognition in solving statistical problems, but to my knowledge no one took those two recommendations very seriously.

Progress has been made on some of the recommendations, although in some cases not as much as I would like to have seen. Researchers have investigated secondary students’ conceptions of statistics; see, for example, Saldanha and Thompson (2003) and Shaughnessy et al. (2004a, 2004b, 2005). In addition, many of the studies by Watson and her colleagues reviewed in this chapter have been conducted with students from a range of grade levels, including Grades 9 and 11. However, much more of the research on students’ understanding of statistics has been conducted with students from Grades 3 to 8. Perhaps this is because students in these grade levels are actually doing some statistics, whereas there is still no guarantee that secondary students, at least in the United States, will study any statistics. Statistics is growing in United States secondary schools, but it is still treated as an optional content area in many schools. It should not be optional.

The research on teachers’ understanding of statistics has also begun to blossom with work like that of Hammerman and Rubin (2003), Confrey and Makar (2002), Watson (2001b), and Heaton and Mickelson (2004). A few teaching experiments have also been conducted (Saldanha & Thompson, 2003; Saldanha, 2003; Cobb et al., 2003), as well as instances of the more modern version of a teaching experiment, the “design experiment” (Petrosino et al., 2003), but teaching experiments are still not a frequent occurrence in statistics education. They yield a wealth of information, but they are hard to conduct, very intense, and very time-consuming if done properly. As for research on the influence of technology, even though some interesting “landscape-type” software packages such as Tinkerplots or Fathom are now available, very little research has been conducted on how or what students learn about statistics with these powerful statistical tools. Student reasoning with technology has more often been researched using “route-type” tools (Bakker & Gravemeijer, 2004; Ben-Zvi, 2000).

Concerning the recommendation in the first edition to develop assessment tools in statistics, a good deal of progress has been made. The book The Assessment Challenge in Statistics Education (Gal & Garfield, 1997) contains a wealth of assessment information. In one chapter a framework for statistical assessment is presented by Friel et al. (1997). Among the authors who contributed to the development of assessment tools in statistics education are Schau and Mattern (1997) on the use of concept maps, Watson (1997) on using the media, Curcio and Artzt (1997) on small group work, Starkings (1997) and Holmes (1997) on statistical projects, Keeler (1997) on portfolios, Lajoie on using technology in assessment, Lesh, Amit, and Schorr (1997) on using real-life problems, and Jolliffe (1997) on instrument construction. More recently, Garfield, delMas, and Chance (2005) have led a project that has created a website of assessment items for improving statistical thinking.

Recommendations for Future Research: Take Two

This time around my recommendations for future research directions fall under three broad categories: (a) research on conceptual issues in statistics, (b) research on teaching issues in statistics, and (c) some methodological issues for research in statistics.

Conceptual Issues

The notion of distribution needs to be clarified and students’ conceptions of the interrelationships of the aspects of a distribution deserve more research attention. Statisticians and statistics educators use the word distribution all the time. The word is used to refer to a single distribution of data, to a sampling distribution of statistics, and to a probability distribution. We all
sort of know what we mean, some kind of collection of observations or numbers that is represented in a table or a graph, but it is not very well defined. Probability distributions can be defined in terms of the values that are taken on by random variables (which are themselves a type of function), so clearly the word distribution involves a “range of meanings.” It would be beneficial for statistics educators, present company included, to pin down what we mean when we use the term distribution. Researchers are beginning to talk about “distributional reasoning.” For some researchers, distributional reasoning in data analysis involves the explicit integration of multiple aspects of a distribution, such as centers, shape, and variability (see, for example, Shaughnessy et al., 2004a, 2005). A recent meeting of the International Forum on Statistical Reasoning, Thinking, and Literacy held in Auckland, New Zealand (SRTL IV, August, 2005)³ was even entitled “Reasoning about Distributions.” A number of researchers are beginning to articulate their own thinking about distributions and are sharing their research on students’ understanding of distributions in a more prominent way.

³ A forthcoming issue of the Statistics Education Research Journal (SERJ) will include edited papers about research on students’ thinking about distribution from the SRTL IV conference (Vol. 5, No. 2, November 2006). There is also a CD of the proceedings available (Makar, 2005).

More research is needed on tertiary students’ conceptions of statistics. At the time of the first edition of the Handbook much of the research had investigated tertiary students’ conceptions of probability and statistics, and very little research had been done with K–12 students. This time it is the other way around. There has been a major influx of studies of K–12 students’ statistical thinking in school mathematics settings, and not so much has been done with tertiary students. What are tertiary students’ conceptions of centers and average? What are their conceptions of variability? How do they reason when comparing data sets? Do tertiary students show any noticeable improvement in reasoning over their K–12 counterparts, or are they stuck at the same levels on the same issues as elementary and secondary students are? Statistics is now required in almost all collegiate majors. Are students learning anything beyond procedures in those courses? Are they learning to be data detectives, or just data crunchers?

The use of student-generated graphs is a very promising area for mining student thinking. Student-generated graphs, an instance of what have been called student inscriptions, are very powerful tools to investigate student thinking. Student graphs not only provide information about students’ graphical skills, but also information on the level of their thinking about data (cases, clumps, aggregates, and so forth). Inscriptions can also inform us on how students do or do not think about trends over time, spreads, centers, and shape. Bakker and Gravemeijer (2004), Moritz (2004), and Kelly and Watson (2002) have provided good examples of research that has uncovered a wealth of information from student-generated graphs.

More research is needed on students’ conceptual growth in statistics when they work in technology-rich environments. Too little research has been conducted on the effects of statistical software packages on students’ conceptual growth and thinking in statistics. Studies by Bakker and Gravemeijer (2004), Saldanha and Thompson (2003), Ben-Zvi (2000), and Chance et al. (2004) are among the welcome exceptions to the dearth of research on the effects of technology on statistical thinking. Currently there is very little research on the effects of “landscape-type” data analysis tools such as Tinkerplots or Fathom.

More research is needed on teachers’ conceptions of statistics. Teachers have the same difficulties with statistical concepts as the students they teach. Research needs to find ways to help teachers develop their statistical knowledge and thinking, especially now that statistics has a more prominent role in the K–12 curricula. Some promising inroads into teachers’ understandings of statistics have been made within professional development programs (Watson, 2001b; Heaton & Mickelson, 2002; Hammerman & Rubin, 2003; Makar & Confrey, 2004). Student work samples or the results of student test scores have been used by researchers to catalyze teachers’ reflections and reasoning about statistical issues. This technique is a promising indirect, non-threatening approach to finding out more about what teachers know and do not know about statistics.

Teaching Issues

What is the statistical knowledge necessary for teaching? Ball and Bass (2000) and Ball, Lubienski, and Mewborn (2001) have argued that a special type of knowledge is needed to teach mathematics, that this knowledge is different from just more mathematical content, and that it is more than just what Shulman (1987) called pedagogical content knowledge. For example, a special knowledge is needed for teaching algebra that involves knowing the types of symbolic mistakes and misconceptions that students will have, and how to address them. A special knowledge is needed for geometry that involves the role of examples and counterexamples, and reasoning with proof. Within each content area of mathematics there are special types of knowledge that are critical to effective teaching of that content. What is needed to successfully teach statistics? What are the particularly tough concepts and areas for potential misunderstandings in statistics? Examples of statistical knowledge for teaching are addressed in Shaughnessy and Chance (2005) in their discussions of questions that arise from teachers and students in the classroom.

Research is needed on classroom discourse in statistics. Are students being asked to analyze data at a high level? Are statistical tasks posed and discussed in classrooms in ways that promote high-level thinking, critical analysis, multiple representations, and thoughtful communication of results and solutions? This type of analysis of classroom processes has been undertaken in research on the teaching of mathematics, but so far no such research has been reported for the teaching of statistics. Analyzing, critiquing, communicating, and representing are critical skills that have been identified by Wild and Pfannkuch (1999) and by Watson (1997) in their models of statistical thinking and statistical literacy, respectively. Are we doing enough to identify and promote the types of teaching that will enhance our students’ discourse skills in statistics? How well is statistics taught by mathematics teachers? Statistics is not the same as mathematics (Cobb & Moore, 1997). Are we balancing our classroom discourse between exploratory data analysis (data detective work) and the teaching of statistical concepts and procedures? Discourse analysis in statistics education is a wide-open area for future research.

There has been very little research into students’ and teachers’ beliefs and attitudes towards statistics. This is another area of research that is wide open for statistics education. There has been research into teacher affect and teachers’ beliefs about mathematics (Thompson, 1992) as well as research into student attitudes and beliefs about mathematics (McLeod, 1992), but very little work has been done on students’ or teachers’ attitudes and beliefs specifically about statistics.

Methodological Issues and Recommendations

The use of multiple research methodologies in research on students’ conceptions of statistics has powerful payoffs. There has been a growing trend for researchers to use both quantitative and qualitative methodologies in their research on students’ reasoning on statistical tasks. In many studies researchers have gathered and quantified results of surveys on statistical tasks administered to large numbers of students but have also conducted clinical interviews with smaller numbers of students, either in conjunction with survey administration or as follow-up work to the surveys. Hypotheses generated about why students are answering survey questions in particular ways can be validated in detailed clinical interviews. Also, interviews often reveal lines of thought that were in the survey data all the time but that were initially missed by the researchers. I applaud the use of multiple research methodologies and urge my fellow researchers to continue this practice.

The statistics education research community would benefit from a thoughtful discussion and debate about the strengths and limitations of using the SOLO model. The SOLO model has been used by many researchers, particularly in Australia where it was born, to identify levels of student reasoning on a number of constructs. Many research studies reviewed in this chapter employed the SOLO model in an analysis of student responses to statistical tasks. The SOLO model is based on the assumption that development can be represented in hierarchical structures. Is that assumption warranted? One of the criticisms leveled against the SOLO model is that it is not falsifiable, so the validity of any conclusions reached via a SOLO approach cannot be easily challenged. On the other hand, the SOLO model has been genuinely useful in helping to describe student reasoning on a number of concepts in statistics like average, variation, comparison of data sets, and so on. I recommend that the statistics education community engage in healthy debate on the merits and demerits of the SOLO model, and that the debate be published in a public forum.

The statistics education research community would benefit from a thoughtful discussion and debate about the strengths and limitations of using the Rasch model. Recently several statistics educators have begun to use the Rasch model to quantify students’ statistical literacy, including tasks involving such concepts as average and variability (see, for example, Callingham & Watson, 2005; Watson & Callingham, 2003; Watson et al., 2003). In several instances Rasch measurement has been used in conjunction with the SOLO model in order to scale students’ responses to statistical tasks along a SOLO-type hierarchy. The results of the Rasch analysis in these studies appear to be very robust, and Rasch provides a way to quantify and measure student-constructed responses on surveys and interviews. It is an underpinning assumption of Rasch measurement that the target construct is unidimensional. One wonders if this is really the case with such constructs as average and variability, for which an eclectic variety of types of conceptions have been identified in the literature. Are linear models of complex concepts sufficient for valid measurements? This is a question for debate and discussion within the statistics education community.
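
For readers unfamiliar with the unidimensionality assumption of Rasch measurement, the simplest (dichotomous) form of the Rasch model can be sketched as follows. The notation is conventional and is an assumption here, not taken from the studies cited: $\theta_n$ denotes the ability of student $n$, $b_i$ the difficulty of item $i$, and $X_{ni}$ the scored response (1 for correct, 0 otherwise). Studies that scale SOLO-type response levels may rely on a polytomous extension rather than this exact form.

```latex
\[
P(X_{ni} = 1 \mid \theta_n, b_i) \;=\; \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
\]
```

Because each student is summarized by a single number $\theta_n$ and each item by a single number $b_i$ on the same scale, the model places all students and all items along one line. That is the sense in which the target construct is assumed to be unidimensional, and it is why one may question the fit for constructs such as average and variability.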
Some Implications From Research for the Teaching of Statistics

There are so many recommendations that could be made from research for the teaching and learning of statistics that this section could be a separate chapter in itself. I will confine myself to just a few.

• Emphasize variability as one of the primary issues in statistical thinking and statistical analysis. In the past there has been a tendency to overemphasize centers as the principal concept in statistics, and the important role of variability has been neglected. Students need to integrate the concepts of centers and variability when they investigate data so that they can reason about properties of aggregates of data.
• Introduce comparison of data sets much earlier on with students, prior to formal statistics. Bakker and Gravemeijer (2004), Konold and Higgins (2003), and Watson and Moritz (1999) have all found that students can develop their own powerful, intuitive ways to compare data sets prior to the introduction of formal concepts like mean, median, variation, or standard deviation. Have students compare data sets right from the start of their statistical education.
• Build on students’ intuitive notions of center and variability. Research has uncovered a spectrum of student conceptions about these two important concepts. Students will be in transition from their own colloquial understandings of centers and variability (such as average as “typical” or variability as “things that change over time”) to more statistical understandings of these concepts, such as means or likely ranges. We should start with what students bring to the table on these concepts and build from there.
• Make the role of proportional reasoning in the connections between populations and samples more explicit. Although it might seem obvious that carefully chosen samples should be representative of whole populations, a long history of research evidence suggests that people ignore base rates when making inferences from samples or predictions to populations (see, for example, Kahneman & Tversky, 1972, 1973; Tversky & Kahneman, 1974). Students should have repeated opportunities to actually choose samples themselves, so they have chances to see the proportional relationship firsthand.
• Remember that there is a difference between statistics and mathematics! Wild and Pfannkuch’s work (1999) and the writings of David Moore (1990, 1997) are signals to everyone who teaches mathematics that there are ways of thinking and analytical tools that are specific to statistics. In particular, statistics is fraught with contextual issues, which is the nature of the discipline, whereas often mathematics strips off the context in order to abstract and generalize.

Concluding Remarks

I had to make choices in developing this chapter, so I tried to take to heart the old adage for authors, “Write what you know.” Or, in this case perhaps it is, “Write what you think you know.” There are many opportunities for thoughtful and knowledgeable readers to fill in the many gaps that I left behind. Hopefully the chapter will provide a sufficient network of references so that an interested reader can pursue a particular topic by going directly to the original works and tracing back further among other references. I want to remind the reader that in this chapter you have encountered statistical research through an indirect mode, through the lens of my own personal filter. There is some efficiency in this approach, and that is why people are asked to write reviews and syntheses of the literature. However, this is never any substitute for reading the original sources. I urge all interested readers to do just that, especially in those instances that pique your curiosity. Do not just take my word for it; go and find out for yourself!

REFERENCES

Aberg-Bengtsson, L., & Ottoson, T. (1995). Children’s understanding of graphically represented quantitative information. Paper presented at the 6th Conference of the European Association for Research on Learning and Instruction. Nijmegen, The Netherlands.
Australian Educational Council. (1991). A National Statement on Mathematics for Australian Schools. Canberra, Australia: Author.
Bakker, A. (2002). Route-type and landscape-type software for learning statistical data analysis. In B. Phillips (Ed.), Developing a Statistically Literate Society: Proceedings of the Sixth International Conference on Teaching Statistics. Voorburg, The Netherlands: International Statistical Institute.
Bakker, A., Biehler, R., & Konold, C. (2004). Should young students learn about box plots? In G. Burrill & M. Camden (Eds.), Curricular Development in Statistics Education. International Association for Statistical Education (IASE) Roundtable, Lund, Sweden (pp. 163–173). Voorburg, The Netherlands: International Statistical Institute.
Bakker, A., & Gravemeijer, K. P. E. (2004). Learning to reason about distribution. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 147–168). Dordrecht, The Netherlands: Kluwer.
Ball, D. L., & Bass, H. (2000). Interweaving content and pedagogy in teaching and learning to teach: Knowing and using mathematics. In J. Boaler (Ed.), Multiple perspectives on the teaching and learning of mathematics (pp. 83–104). Westport, CT: Ablex.
Ball, D. L., Lubienski, S. T., & Mewborn, D. S. (2001). Research on teaching mathematics: The unsolved problem of teachers’ mathematical knowledge. In V. Richardson (Ed.), Handbook of research on teaching (4th ed., pp. 433–457). New York: Macmillan.
Batanero, C., Estepa, A., Godino, J. D., & Green, D. R. (1996). Intuitive strategies and preconceptions about association in contingency tables. Journal for Research in Mathematics Education, 27, 151–169.
Batanero, C., Godino, J. D., Vallecillos, A., Green, D. R., & Holmes, P. (1994). Errors and difficulties in understanding elementary statistical concepts. International Journal of Mathematics Education in Science and Technology, 25, 527–547.
Ben-Zvi, D. (2000). Toward understanding the role of technological tools in statistics learning. Mathematical Thinking and Learning, 2, 127–155.
Biehler, R. (1993). Software tools and mathematics education: The case for statistics. In C. Keitel & K. Ruthven (Eds.), Learning from computers: Mathematics education and technology (pp. 68–100). NATO ASI Series F, Computers and Systems Sciences. Berlin, Germany: Springer-Verlag.
Biehler, R. (1994a, July). Cognitive technologies for statistics education: Relating the perspective of tools for learning and of tools for doing statistics. In L. Brunelli & G. Cicchitelli (Eds.), Proceedings of the First Scientific Meeting of IASE (pp. 173–190). Perugia, Italy: University of Perugia Press.
Biehler, R. (1994b, July). Requirements for an ideal software tool in order to support learning and doing statistics. Paper presented at the Fourth International Conference on Teaching Statistics, Marrakech, Morocco.
Biehler, R. (1997). Software for learning and for doing statistics. International Statistical Review, 65, 167–189.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York: Academic Press.
Bright, G. W., & Friel, S. (1993, April). Elementary teachers’ representations of relationships among statistics concepts. Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, GA.
Burrill, G., Franklin, C., Godbold, L., & Young, L. (2004). Navigating through data analysis, grades 9–12. Reston, VA: National Council of Teachers of Mathematics.
Cai, J. (1995). Beyond the computational algorithm: Students’ understanding of the arithmetic average concept. In L. Meira & D. Carraher (Eds.), Proceedings of the 19th Psychology of Mathematics Education Conference (Vol. 3, pp. 144–151). Sao Paulo, Brazil: PME Program Committee.
Canada, D. (2004). Pre-service elementary teachers conceptions of variability. Unpublished doctoral dissertation, Portland State University, Portland, OR.
Carr, J., & Begg, A. (1994). Introducing box and whisker plots. In J. Garfield (Ed.), Research Papers from the Fourth International Conference on Teaching Statistics. Minneapolis, MN.
Chance, B., delMas, R., & Garfield, J. (2004). Reasoning about sampling distributions. Data driven mathematics. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 295–324). Dordrecht, The Netherlands: Kluwer.
Callingham, R. A., & Watson, J. M. (2005). Measuring statistical literacy. Journal of Applied Measurement, 6, 19–47.
Chick, H. L., & Watson, J. M. (2001). Data representations and interpretation by primary school students working in groups. Mathematics Education Research Journal, 13, 91–111.
Chick, H. L., & Watson, J. M. (2002). Collaborative influences on emergent statistical thinking: A case study. Journal of Mathematical Behavior, 21, 317–400.
Ciancetta, M., Shaughnessy, J. M., & Canada, D. (2003, July). Middle school students’ emerging definitions of variability. In N. Pateman, B. Dougherty, & J. Zilliox (Eds.), Poster Session in the Proceedings of the 27th Conference of the International Group for the Psychology of Mathematics Education (Vol. 4, p. 481). Honolulu: University of Hawaii.
Cobb, G. W., & Moore, D. S. (1997). Mathematics, statistics, and teaching. American Mathematical Monthly, 104, 801–823.
Cobb, P. A. (1999). Individual and collective mathematics development: The case of statistical data analysis. Mathematical Thinking and Learning, 1, 5–44.
Cobb, P., Gravemeijer, K. P. E., Doorman, M., & Bowers, J. (1999). Computer Mini-tools for exploratory data analysis (Version Prototype). Nashville, TN: Vanderbilt University.
Cobb, P., McClain, K., & Gravemeijer, K. P. E. (2003). Learning about statistical co-variation. Cognition and Instruction, 21(1), 1–78.
Connected Mathematics. (1998). Palo Alto, CA: Dale Seymour Publications.
Core-Plus Mathematics Project. (1997). Contemporary mathematics in context: A unified approach. Dedham, MA: Janson Publications.
Corwin, R. B., & Friel, S. N. (1990). Used numbers: Prediction and sampling. Palo Alto, CA: Dale Seymour.
Curcio, F. R. (1987). Comprehension of mathematical relationships expressed in graphs. Journal for Research in Mathematics Education, 18, 382–393.
Curcio, F. R. (1989). Developing graph comprehension. Reston, VA: National Council of Teachers of Mathematics.
Curcio, F. R., & Artzt, A. F. (1997). Assessing students’ statistical problem solving behaviors in a small group setting. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 107–122). Amsterdam: IOS Press.
Dalal, S. R., Fowlkes, E. B., & Hoadley, B. (1989). Risk analysis of the space shuttle: Pre-Challenger prediction of failure. Journal of the American Statistical Association, 84, 945–951.
Data Driven Mathematics. (1999). White Plains, NY: Dale Seymour.
delMas, R. (2001). Sampling SIM (Version 5). Retrieved April 23, 2003, from http://www.gen.umn.edu/faculty_staff/delMas/stat_tools.
delMas, R., Garfield, J., & Chance, B. (1998). Assessing the effects of a computer microworld on statistical reasoning. In L. Pereira-Mendoza, L. S. Kea, T. W. Kee, & W. Wong (Eds.), Proceedings of the Fifth International Conference on Teaching Statistics (pp. 1083–1089), Nanyang Technological University. Singapore: International Statistical Institute.
delMas, R., Garfield, J., & Chance, B. (1999). A model of classroom research in action: Developing simulation activities to improve students’ statistical reasoning. Journal of Statistics Education, 7(3). Retrieved July 10, 2005, from www.amstat.org/publications/jse/v7n3.
Doerr, H. M. (2000). How can I find a pattern in this random cation Research Journal, 2, 3–21. Retrieved March 15, 2005,
data? The convergence of multiplicative and probabilistic from http://fehps.une.edu.au/serj.
reasoning. Journal of Mathematical Behavior, 18, 431–454. Gal, I. (2004). Statistical literacy: Meanings, components, re-
Dunkels, A. (1994). Interweaving numbers, shapes, statistics, sponsibilities. In J. Garfield & D. Ben-Zvi (Eds.), The chal-
and the real world in primary school and primary teacher lenge of developing statistical literacy, reasoning and thinking
education. In D. F. Robitaille, D. H. Wheeler, & C. Kieran (pp. 47–78). Dordrecht, The Netherlands: Kluwer.
(Eds.), Selected Lectures from the 7th International Congress on Gal, I., & Garfield, J. (1997). The assessment challenge in statistics
Mathematical Education (pp. 123–135). Sainte-Foy, Quebec, education. Amsterdam: IOS Press.
Canada: Laval University Press. Gal, I., Rothschild, K., & Wagner, D. A. (1989, April). Which
Estepa, A., & Batanero, C. (1994, July). Judgments of association in group is better? The development of statistical reasoning in school
scatter-plots: An empirical study of students’ strategies and precon- children. Paper presented at the meeting of the Society for
ceptions. Paper presented at the Fourth International Con- Research in Child Development, Kansas City, KS.
ference on Teaching Statistics: Marrakech, Morocco. Gal, I., Rothschild, K., & Wagner, D. A. (1990, April). Statistical
Estepa, A., Batanero, C., & Sanchez, F. T. (1999). Students’ in- concepts and statistical reasoning in children: Convergence or di-
tuitive strategies in judging association when comparing vergence? Paper presented at the meeting of the American
two samples. Hiroshima Journal of Mathematics Education, 7, Educational Research Association, Boston, MA.
17–30. Garfield, J. (1990). Technology and data: Models and analysis. Re-
Finzer, W. (2002). The Fathom experience—is research-based port of the Working Group on Technology and Statistics.
development of a commercial statistics learning environ- Madison, WI: NCRMSE.
ment possible? In B. Phillips (Ed.), Developing a statistically Garfield, J. (2002). The challenge of developing statistical reason-
literate society: Proceedings of the Sixth International Conference ing. Journal of Statistics Education, 10(3). Retrieved April 23,
on Teaching Statistics. [CD-ROM]. Voorburg, The Nether- 2003, from http://www.amstat.org/publications/jse/.
lands: International Statistical Institute. Garfield, J., & Ben-Zvi, D. (2004). The challenge of developing sta-
Fischbein, E. (1987). Intuition in science and mathematics. Dordre- tistical literacy, reasoning and thinking. Dordrecht, The Neth-
cht, The Netherlands: D. Reidel. erlands: Kluwer.
Fischbein, E., & Schnarch, D. (1997). The evolution with age of Garfield, J., delMas, R., & Chance, B. (1999, August). Develop-
probabilistic intuitively based misconceptions. Journal for ing statistical reasoning about sampling distributions. Present-
Research in Mathematics Education, 28, 96–105. ed at the First International Research Forum on Statisti-
Friel, S. N. (1998). Teaching statistics: What’s average? In L. cal Reasoning, Thinking, and Literacy (SRTL I). Kibbutz
J. Morrow (Ed.), The teaching and learning of algorithms in Be’eri, Israel.
school mathematics (pp. 208–217). Reston, VA: National Garfield, J., delMas, B., & Chance, B. (2005). ARTIST: Assess-
Council of Teachers of Mathematics. ment resource tools for improving statistical thinking. Re-
Friel, S. N. (2007). The research frontier: Where technology in- trieved March 20, 2005, from http://data.gen.umn.edu/
teracts with the teaching and learning of data analysis and artist/index.html.
statistics. In G. Blume & K. Heid (Eds.), Research on technology Green, D. (1993). Data analysis: What research do we need? In
and the teaching and learning of mathematics: Cases and perspec- L. Pereira-Mendoza (Ed.), Introducing data analysis in the
tives (Vol. 1, pp. 279–331). Greenwich, CT: Information Age. schools: Who should teach it? (pp. 219–239). Voorburg, The
Friel, S. N., & Bright, G. W. (1998). Teach-Stat: A model for pro- Netherlands: International Statistics Institute.
fessional development and data analysis for teachers K–6. Greer, B., & Ritson, R. (1993). Teaching data handling with the
In S. Lajoie (Ed.), Reflections on statistics: Learning, teaching, Northern Ireland Mathematics Curriculum: Report on survey in
and assessment in grades K–12 (pp. 89–117). Mahwah, NJ: schools. Belfast, Ireland: Queen’s University.
Erlbaum. Hammerman, J. K., & Rubin, A. (2003). Reasoning in the pres-
Friel, S. N., Mokros, J. R., & Russell, S. J. (1992). Used numbers: Mid- ence of variability. In C. Lee (Ed.), Proceedings of the Third
dles, means, and in-betweens. Palo Alto, CA: Dale Seymour. International Research Forum on Statistical Reasoning, Think-
Friel, S. N., Bright, G. W., Frierson, D., & Kader, G. D. (1997). ing, and Literacy (SRTL-3, CD-ROM). Mt. Pleasant: Central
A framework for assessing knowledge and learning in sta- Michigan University.
tistics (K–8). In I. Gal and J. Garfield (Eds.), The assessment Hawkins, A. (Ed.). (1990). Teaching teachers to teach statistics. Voor-
challenge in statistics education (pp. 55–64). Amsterdam: IOS burg, The Netherlands: International Statistics Institute.
Press. Hawkins, A., Jolliffe, F., & Glickman, L. (1992). Teaching statisti-
Friel, S. N., Curcio, F. R., & Bright, G. W. (2001). Making sense cal concepts. London: Longman Publishing.
of graphs: Critical factors influencing comprehension and Heaton, R., & Mickelson, W. (2002). The learning and teaching
instructional implications. Journal for Research in Mathemat- of statistical investigation in teaching and teacher educa-
ics Education, 32, 12–158. tion. Journal of Mathematics Teachers Education, 5, 35–59.
Foreman, L. C., & Bennett, A. B. (1995). Math alive, Course I. Holmes, P. (1997). Assessing project work by external reviewers.
Salem, OR: The Math Learning Center. In I. Gal & J. Garfield (Eds.), The assessment challenge in sta-
Gal, I. (1998). Assessing statistical knowledge as it relates to stu- tistics education (pp. 153–164). Amsterdam: IOS Press.
dents’ interpretation of data. In S. Lajoie (Ed.), Reflections Huff, D. (1954). How to lie with statistics. New York: W.W. Norton
on statistics: Learning, teaching, and assessment in grades K–12 Publishing.
(pp. 275–295). Mahwah, NJ: Erlbaum. Investigations into Number, Data, and Space. (1998). White
Gal, I. (2002). Adults’ statistical literacy: Meanings, components, Plains, NY: Dale Seymour Publications.
responsibilities. International Statistical Review, 70, 1–25. Jacobs, V. R. (1997, April). Children’s understanding of sampling
Gal, I. (2003). Expanding conceptions of statistical literacy: An in surveys. Paper presented at the annual meeting of the
analysis of products from statistics agencies. Statistics Edu- American Educational Research Association, Chicago.
Jacobs, V. R. (1999). How do students think about statistical sampling before instruction? Mathematics Teaching in the Middle School, 5, 240–246, 263.
Jones, G. A., Langrall, C. W., Thornton, C. A., & Mogill, A. T. (1999). Students’ probabilistic thinking in instruction. Journal for Research in Mathematics Education, 30, 487–519.
Jones, G. A., Thornton, C. A., Langrall, C. W., Mooney, E. S., Perry, B., & Putt, I. J. (2000). A framework for characterizing students’ statistical thinking. Mathematical Thinking and Learning, 2, 269–307.
Jones, G. A., Langrall, C. W., Mooney, E. S., & Thornton, C. A. (2004). Models of development in statistical reasoning. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 97–118). Dordrecht, The Netherlands: Kluwer.
Jones, G. A., Langrall, C. W., & Mooney, E. S. (this volume). Research in probability: Responding to classroom realities. In F. Lester (Ed.), Handbook of research on the teaching and learning of mathematics (2nd ed.). Reston, VA: National Council of Teachers of Mathematics.
Jolliffe, F. (1997). Issues in constructing assessment instruments for the classroom. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 191–204). Amsterdam: The International Statistical Institute.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430–454.
Kahneman, D., & Tversky, A. (1973a). On the psychology of prediction. Psychological Review, 80, 237–251.
Kahneman, D., & Tversky, A. (1973b). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232.
Kelly, B. A., & Watson, J. M. (2002). Variation in a chance sampling setting: The lollies task. In B. Barton, K. C. Irvin, M. Pfannkuch, & M. J. Thomas (Eds.), Proceedings of the 25th annual conference of the Mathematics Education Research Group of Australasia: Mathematics education in the South Pacific, Auckland (Vol. 2, pp. 366–373). Sydney, Australia: MERGA.
Keeler, C. M. (1997). Portfolio assessment in graduate level statistics courses. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 165–178). Amsterdam: IOS Press.
Key Curriculum Press. (2005a). Tinkerplots Dynamic Data Exploration (Version 1.0) [Computer software]. Emeryville, CA: Author.
Key Curriculum Press. (2005b). Fathom Dynamic Data Software (Version 2.0) [Computer software]. Emeryville, CA: Author.
Kirsch, I. S., Jungeblut, S. S., & Mosenthal, P. M. (1998). The measurement of adult literacy. In S. T. Murray, I. S. Kirsch, & L. B. Jenkins (Eds.), Adult literacy in OECD countries: Technical report on the first International Adult Literacy Survey (pp. 105–134). Washington, DC: National Center for Education Statistics, U.S. Department of Education.
Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction, 6, 59–98.
Konold, C., & Higgens, T. (2002). Working with data: Highlights related to research. In S. J. Russell, D. Schifter, & V. Bastable (Eds.), Developing mathematical ideas: Collecting, representing, and analyzing data (pp. 165–201). Mahwah, NJ: Erlbaum.
Konold, C., & Higgens, T. (2003). Reasoning about data. In J. Kilpatrick, W. G. Martin, & D. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 193–215). Reston, VA: National Council of Teachers of Mathematics.
Konold, C., & Miller, C. (1994). Data scope and prob-sim. [Computer software]. Amherst: University of Massachusetts, SRRI.
Konold, C., & Pollatsek, A. (2002). Data analysis as a search for signals in noisy processes. Journal for Research in Mathematics Education, 33, 259–289.
Konold, C., Pollatsek, A., Well, A., Lohmeier, J., & Lipson, A. (1993). Inconsistencies in students’ reasoning about probability. Journal for Research in Mathematics Education, 24, 392–414.
Konold, C., Pollatsek, A., Well, A., & Gagnon, A. (1997). Students analyzing data: Research of critical barriers. In J. B. Garfield & G. Burrill (Eds.), Research on the role of technology in teaching and learning statistics. Voorburg, The Netherlands: International Statistical Institute.
Konold, C., Higgens, T., Russell, S. J., & Khalil, K. (2004, February). Data seen through different lenses. Unpublished manuscript, Amherst, MA: University of Massachusetts.
Landwehr, J. M., Watkins, A. E., & Swift, J. (1987). Exploring surveys: Information from samples. Palo Alto, CA: Dale Seymour.
Lajoie, S. P. (1997). Technologies for assessing and extending statistical learning. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 179–190). Amsterdam: IOS Press.
LeCoutre, V. P. (1992). Cognitive models and problem spaces in “purely random” situations. Educational Studies in Mathematics, 23, 557–568.
Lehrer, R., & Romberg, T. (1996). Exploring children’s data modeling. Cognition and Instruction, 14, 69–108.
Lesh, R., Amit, M., & Schorr, R. Y. (1997). Using “real-life” problems to prompt students to construct conceptual models. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 65–84). Amsterdam: IOS Press.
Lipson, K. (2002). The role of computer based technology in developing understanding of the concept of sampling distribution. In B. Phillips (Ed.), Proceedings of the Sixth International Conference on Teaching Statistics: Developing a statistically literate society, Cape Town, South Africa. [CD-ROM]. Voorburg, The Netherlands: International Statistical Institute.
Loosen, F., Lioen, M., & Lacante, M. (1985). The standard deviation: Some drawbacks of an intuitive approach. Teaching Statistics, 7, 29–39.
Makar, K. (2005). Reasoning about distributions: A collection of recent research studies. Proceedings of the Fourth International Research Forum for Statistical Reasoning, Thinking, and Literacy, Auckland, NZ. Brisbane, Australia: University of Queensland.
Makar, K., & Confrey, J. (2002, August). Comparing two distributions: Investigating secondary teachers’ statistical thinking. Paper presented at the Sixth International Conference on Teaching Statistics. Cape Town, South Africa.
Makar, K., & Confrey, J. (2005). “Variation Talk”: Articulating meaning in statistics. Statistics Education Research Journal, 4(1), 27–54. Retrieved October 15, 2006, from http://www.stat.auckland.ac.nz/~iase/publications.php?show=serjarchive.
Makar, K., & Confrey, J. (2004). Secondary teachers’ reasoning about comparing two groups. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy, reasoning, and thinking (pp. 353–374). Dordrecht, The Netherlands: Kluwer.
Mathematics in context. (1994). New York: Rand McNally.
McCleod, D. (1992). Research on affect in mathematics education: A reconceptualization. In D. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 575–596). Reston, VA: National Council of Teachers of Mathematics.
Meletiou, M. (2000). Developing students’ conceptions of variation: An untapped well in statistical reasoning. Unpublished doctoral dissertation, The University of Texas at Austin.
Meletiou, M. (2002). Conceptions of variation: A literature review. Statistics Education Research Journal, 1(1), 46–52.
Meletiou, M., & Lee, C. (2002). Student understanding of histograms: A stumbling stone to the development of intuitions about variation. In B. Phillips (Ed.), Proceedings of the Sixth International Conference on Teaching Statistics: Developing a statistically literate society, Cape Town, South Africa. [CD-ROM]. Voorburg, The Netherlands: International Statistical Institute.
Mevarech, Z. (1983). A deep structure model of students’ statistical misconceptions. Educational Studies in Mathematics, 14, 415–429.
Mickelson, W., & Heaton, R. (2003). Purposeful statistical investigation merged with K–6 content: Variability, learning, and teacher knowledge use in teaching. In C. Lee (Ed.), Proceedings of the Third International Research Forum on Statistical Reasoning, Thinking, and Literacy (SRTL-3, CD-ROM). Mt. Pleasant: Central Michigan University.
Mickelson, W., & Heaton, R. (2004). Primary teachers’ statistical reasoning about data. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy, reasoning, and thinking (pp. 327–352). Dordrecht, The Netherlands: Kluwer.
Minitab Statistical Software. (2005). State College, PA: Pennsylvania State University Press.
Mokros, J., & Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for Research in Mathematics Education, 26, 20–39.
Mooney, E. S. (2002). A framework for characterizing middle school students’ statistical thinking. Mathematical Thinking and Learning, 4, 23–63.
Moore, D. (1990). Uncertainty. In L. Steen (Ed.), On the shoulders of giants: New approaches to numeracy (pp. 95–137). Washington, DC: National Academy Press.
Moore, D. (1997). New pedagogy and new content: The case of statistics. International Statistical Review, 65, 123–165.
Moritz, J. B., & Watson, J. M. (1997). Graphs: Communication lines to students. In F. Biddulph & K. Carr (Eds.), People in mathematics education (Vol. 2, pp. 344–351). Proceedings of the Twentieth Annual Meeting of the Mathematics Education Research Group of Australasia. Waikato, New Zealand: The University of Waikato Printery.
Moritz, J. B. (2000). Graphical representations of statistical associations by upper primary students. In J. Bana & A. Chapman (Eds.), Mathematics education beyond 2000. Proceedings of the 23rd Annual Conference of the Mathematics Education Research Group of Australasia, 2 (pp. 440–447). Perth, Australia: Mathematics Education Research Group of Australasia.
Moritz, J. B. (2004). Reasoning about co-variation. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 227–256). Dordrecht, The Netherlands: Kluwer.
Mosenthal, P. M., & Kirsch, I. S. (1998). A new measure for assessing document complexity: The PMOSE/IKIRSCH document readability formula. Journal of Adolescent and Adult Literacy, 41(8), 638–657.
National Assessment Governing Board. (1994). Mathematics Framework for the 1996 National Assessment of Educational Progress. Washington, DC: Author.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for K–12 mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2004a). Navigating through data. Reston, VA: Author.
National Council of Teachers of Mathematics. (2004b). Navigating through probability. Reston, VA: Author.
Nemirovsky, R. (1996). Mathematical narratives, modeling, and algebra. In N. Bednarz, C. Kieran, & L. Lee (Eds.), Approaches to algebra: Perspectives for research and teaching (pp. 197–220). Dordrecht, The Netherlands: Kluwer.
Petrosino, A. J., Lehrer, R., & Schauble, L. (2003). Structuring error and experimental variation as distribution in 4th grade. Mathematical Thinking and Learning, 5, 131–136.
Pereira-Mendoza, L. (1995). Graphing in the primary school: Algorithm versus comprehension. Teaching Statistics, 17, 2–6.
Pereira-Mendoza, L., & Dunkels, A. (1989). Stem-and-leaf plots in the primary grades. Teaching Statistics, 11, 34–37.
Pereira-Mendoza, L., & Mellor, J. (1991). Students’ concepts of bar graphs—some preliminary findings. In D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics. Vol. I: School and General Issues (pp. 150–157). Voorburg, The Netherlands: International Statistical Institute.
Pfannkuch, M. (2005). Probability and statistical inference: How can teachers enable learners to make the connection? In G. Jones (Ed.), Exploring probability in school: Challenges for teaching and learning (pp. 1–32). Dordrecht, The Netherlands: Kluwer.
Pfannkuch, M., & Watson, J. (2005). Statistics education. In B. Perry, G. Anthony, & C. Diezmann (Eds.), Research in mathematics education in Australasia 2000–2003 (pp. 265–289). Brisbane, Australia: PostPressed.
Pfannkuch, M., & Wild, C. J. (2000). Statistical thinking and statistical practice: Themes gleaned from professional statisticians. Statistical Science, 15, 132–152.
Pfannkuch, M., & Wild, C. J. (2004). Towards an understanding of statistical thinking. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 17–46). Dordrecht, The Netherlands: Kluwer.
Pollatsek, A., Lima, S., & Well, A. D. (1981). Concept or computation: Students’ understanding of the mean. Educational Studies in Mathematics, 12, 191–204.
Pollatsek, A., Konold, C., Well, A., & Lima, S. (1984). Beliefs underlying random sampling. Cognition and Instruction, 12, 395–401.
Polya, G. (1945). How to solve it. Princeton, NJ: Princeton University Press.
Reading, C. (2003, July). Student perceptions of variation in a real world context. Paper presented at the Third International Research Forum on Statistical Reasoning, Thinking, and Literacy (SRTL 3). Lincoln, NE.
Reading, C., & Shaughnessy, J. M. (2000). Student perceptions of variation in a sampling situation. In T. Nakahara & M. Koyama (Eds.), Proceedings of the 24th Conference of the International Group for the Psychology of Mathematics Education (Vol. 4, pp. 89–96). Hiroshima, Japan: Hiroshima University.
Reading, C., & Shaughnessy, J. M. (2004). Reasoning about variation. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 201–226). Dordrecht, The Netherlands: Kluwer.
Roth, W.-M., & Bowen, G. M. (2001). Professionals read graphs: A semiotic analysis. Journal for Research in Mathematics Education, 32, 159–193.
Rubin, A., & Rosebery, A. S. (1988, August). Teachers’ misunderstandings in statistical reasoning: Evidence from a field test of innovative materials. Paper presented at the International Statistical Institute Round Table Conference, Budapest, Hungary. Voorburg, The Netherlands: International Statistical Institute.
Rubin, A., Bruce, B., & Tenney, Y. (1991). Learning about sampling: Trouble at the core of statistics. In D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics (Vol. 1, pp. 314–319). Voorburg, The Netherlands: International Statistical Institute.
Russell, S. J., & Corwin, R. B. (1989). Used numbers: The shape of the data. Palo Alto, CA: Dale Seymour.
Russell, S. J., & Mokros, J. (1996). What do children understand about average? Teaching Children Mathematics, 2, 360–364.
Russell, S. J., Schifter, D., & Bastable, V. (2002). Developing mathematical ideas: Working with data. Parsippany, NJ: Dale Seymour.
Saldanha, L. (2003). Is this sample unusual? An investigation of students exploring connections between sampling distributions and statistical inference. Unpublished doctoral dissertation, Vanderbilt University, Tennessee.
Saldanha, L., & Thompson, P. (2003). Conceptions of sample and their relationship to statistical inference. Educational Studies in Mathematics, 51, 257–270.
Schau, C., & Mattern, N. (1997). Assessing students’ connected understanding of statistical relationships. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 91–106). Amsterdam: IOS Press.
Shaughnessy, J. M. (1977). Misconceptions of probability: An experiment with a small-group, activity-based, model building approach to introductory probability at the college level. Educational Studies in Mathematics, 8, 285–316.
Shaughnessy, J. M. (1992). Research on probability and statistics: Reflections and directions. In D. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 465–494). Reston, VA: National Council of Teachers of Mathematics.
Shaughnessy, J. M. (1997). Missed opportunities in research on the teaching and learning of data and chance. In F. Biddulph & K. Carr (Eds.), People in mathematics education (Vol. 1, pp. 6–22). Proceedings of the Twentieth Annual Meeting of the Mathematics Education Research Group of Australasia. Waikato, New Zealand: The University of Waikato Printery.
Shaughnessy, J. M. (2003a). Research on students’ understanding of probability. In J. Kilpatrick, W. G. Martin, & D. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 216–226). Reston, VA: National Council of Teachers of Mathematics.
Shaughnessy, J. M. (2003b). The development of secondary students’ conceptions of variability. (Annual report year 1. NSF Grant No. REC 0207842). Portland, OR: Portland State University.
Shaughnessy, J. M., & Bergman, B. (1993). Thinking about uncertainty: Probability and statistics. In P. Wilson (Ed.), Research ideas for the classroom: High school mathematics (pp. 177–197). Reston, VA: National Council of Teachers of Mathematics.
Shaughnessy, J. M., & Chance, B. (2005). Statistical questions from the classroom. Reston, VA: National Council of Teachers of Mathematics.
Shaughnessy, J. M., & Ciancetta, M. (2002). Students’ understanding of variability in a probability environment. In B. Phillips (Ed.), CD of the Proceedings of the Sixth International Conference on Teaching Statistics: Developing a statistically literate society, Cape Town, South Africa. Voorburg, The Netherlands: International Statistical Institute.
Shaughnessy, J. M., & Dick, T. P. (1991). Monty’s Dilemma: Should you stick or switch? The Mathematics Teacher, 84, 252–256.
Shaughnessy, J. M., & Pfannkuch, M. (2002). How faithful is Old Faithful? Statistical thinking: A story of variation and prediction. The Mathematics Teacher, 95, 252–259.
Shaughnessy, J. M., & Zawojewski, J. S. (1999). Secondary students’ performance on data and chance in the 1996 NAEP. The Mathematics Teacher, 92, 713–718.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data handling. In A. J. Bishop, K. Clements, C. Keitel, J. Kilpatrick, & C. Laborde (Eds.), International handbook of mathematics education (pp. 205–237). Dordrecht, The Netherlands: Kluwer.
Shaughnessy, J. M., Watson, J. M., Moritz, J. B., & Reading, C. (1999, April). School mathematics students’ acknowledgement of statistical variation: There’s more to life than centers. Paper presented at the Research Pre-session of the 77th annual meeting of the National Council of Teachers of Mathematics, San Francisco, CA.
Shaughnessy, J. M., Ciancetta, M., & Canada, D. (2003). Middle school students’ thinking about variability in repeated trials: A cross-task comparison. In N. Pateman, B. Dougherty, & J. Zilliox (Eds.), Proceedings of the 27th Conference of the International Group for the Psychology of Mathematics Education (Vol. 4, pp. 159–166). Honolulu, HI: University of Hawaii.
Shaughnessy, J. M., Ciancetta, M., & Canada, D. (2004a). Types of student reasoning on sampling tasks. In M. Johnsen Hoines & A. Berit Fuglestad (Eds.), Proceedings of the 28th meeting of the International Group for the Psychology of Mathematics Education (Vol. 4, pp. 177–184). Bergen, Norway: Bergen University College Press.
Shaughnessy, J. M., Ciancetta, M., Best, K., & Canada, D. (2004b, April). Students’ attention to variability when comparing distributions. Paper presented at the Research Pre-session of the 82nd annual meeting of the National Council of Teachers of Mathematics, Philadelphia, PA.
Shaughnessy, J. M., Barrett, G., Billstein, R., Kranendonk, H. A., & Peck, R. (2004c). Navigating through probability, grades 9–12. Reston, VA: National Council of Teachers of Mathematics.
Shaughnessy, J. M., Ciancetta, M., Best, K., & Noll, J. (2005, April). Secondary and middle school students’ attention to variability when comparing data sets. Paper presented at the Research Pre-session of the 83rd annual meeting of the National Council of Teachers of Mathematics, Anaheim, CA.
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57, 1–22.
Starkings, S. (1997). Assessing statistical projects. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 139–151). Amsterdam: IOS Press.
Strauss, S., & Bichler, E. (1988). The development of children’s concepts of the arithmetic average. Journal for Research in Mathematics Education, 19, 64–80.
Tarr, J., & Shaughnessy, J. M. (in press). Statistics and probability. In P. Kloosterman & F. Lester (Eds.), Results from the Eighth & Ninth Mathematics Assessment of the National Assessment of Educational Progress. Reston, VA: National Council of Teachers of Mathematics.
Takis, S. L. (1999). Titanic: A statistical exploration. The Mathematics Teacher, 92, 660–664.
Teach-STAT. (1996a). Teach-STAT: Teaching statistics grades 1–6: A key for better mathematics. The University of North Carolina Mathematics and Science Education Network. Palo Alto, CA: Dale Seymour.
Teach-STAT. (1996b). Teach-STAT for statistics educators. The University of North Carolina Mathematics and Science Education Network. Palo Alto, CA: Dale Seymour.
Thompson, A. (1992). Teachers’ beliefs and conceptions: A synthesis of the research. In D. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 127–146). Reston, VA: National Council of Teachers of Mathematics.
Thompson, P. W., & Saldanha, L. A. (2003). Fractions and multiplicative reasoning. In J. Kilpatrick, W. G. Martin, & D. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 95–113). Reston, VA: National Council of Teachers of Mathematics.
Torok, R. (2000). Putting the variation into chance and data. Australian Mathematics Teacher, 56, 25–31.
Torok, R., & Watson, J. (2000). Development of the concept of statistical variation: An exploratory study. Mathematics Education Research Journal, 9, 60–82.
Truran, J. (1994). Children’s intuitive understanding of variance. In J. Garfield (Ed.), Research papers from the Fourth International Conference on Teaching Statistics. Minneapolis, MN: The International Study Group for Research on Learning Probability and Statistics.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293–315.
United States Department of Agriculture. (2004). Economic Research Service. Retrieved October 13, 2005, from www.ers.usda.gov/data/foodconsumption/foodavailspreads.
Vergnaud, G. (1983). Multiplicative structures. In R. Lesh & M. Landau (Eds.), Acquisition of mathematics concepts and processes (pp. 127–174). New York: Academic Press.
Watson, J. M. (1997). Assessing statistical thinking using the media. In I. Gal & J. Garfield (Eds.), The assessment challenge in statistics education (pp. 107–121). Amsterdam: IOS Press.
Watson, J. M. (1998). Professional development for teachers of probability and statistics: Into an era of technology. International Statistical Review, 66, 271–289.
Watson, J. M. (2001a). Longitudinal development of inferential reasoning by school students. Educational Studies in Mathematics, 47, 337–372.
Watson, J. M. (2001b). Profiling teachers’ competence and confidence to teach particular mathematics topics: The case of data and chance. Journal of Mathematics Teacher Education, 4, 305–337.
Watson, J. M. (2002). Inferential reasoning and the influence of cognitive conflict. Educational Studies in Mathematics, 51, 225–256.
Watson, J. M. (2004). Developing reasoning about samples. In J. Garfield & D. Ben-Zvi (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 277–294). Dordrecht, The Netherlands: Kluwer.
Watson, J. M., & Callingham, R. (2003). Statistical literacy: A complex hierarchical construct. Statistics Education Research Journal, 2, 3–46.
Watson, J. M., & Kelly, B. A. (2002). Can grade 3 students learn about variation? In B. Phillips (Ed.), Proceedings of the Sixth International Conference on Teaching Statistics: Developing a statistically literate society, Cape Town, South Africa. [CD-ROM]. Voorburg, The Netherlands: International Statistical Institute.
Watson, J. M., & Kelly, B. A. (2004). Expectation versus variation: Students’ decision making in a chance environment. Canadian Journal of Science, Mathematics, and Technology Education, 4, 371–396.
Watson, J. M., & Moritz, J. B. (1997a). Student analysis of variables in a media context. In B. Phillips (Ed.), Papers on Statistical Education Presented at ICME 8 (pp. 129–147). Hawthorn, Australia: Swinburne Press.
Watson, J. M., & Moritz, J. B. (1997b). The C&D PD CD: Professional development in chance and data in the technological age. In N. Scott & H. Hollingsworth (Eds.), Mathematics creating the future. Proceedings of the 16th Biennial Conference of the Australian Association of Mathematics Teachers (pp. 442–450). Adelaide, South Australia: AAMT.
Watson, J. M., & Moritz, J. B. (1999). The beginning of statistical inference: Comparing two data sets. Educational Studies in Mathematics, 37, 145–168.
Watson, J. M., & Moritz, J. B. (2000a). Developing concepts of sampling. Journal for Research in Mathematics Education, 31, 44–70.
Watson, J. M., & Moritz, J. B. (2000b). Development of understanding of sampling for statistical literacy. Journal of Mathematical Behavior, 19, 109–136.
Watson, J. M., & Moritz, J. B. (2000c). The longitudinal development of understanding of average. Mathematical Thinking and Learning, 2, 11–50.
Watson, J. M., & Moritz, J. B. (2000d). The development of concepts of average. Focus on Learning Problems in Mathematics, 21, 15–39.
Watson, J. M., & Moritz, J. B. (2001). Development of reasoning associated with pictographs: Representing, interpreting, and predicting. Educational Studies in Mathematics, 48, 47–81.
Watson, J. M., & Shaughnessy, J. M. (2004). Proportional reasoning: Lessons from research in data and chance. Mathematics Teaching in the Middle School, 10, 104–109.
Watson, J. M., Collis, K. F., Callingham, R. A., & Moritz, J. B. (1995). A model for assessing higher order thinking in statistics. Educational Research and Evaluation, 1, 247–275.
Watson, J. M., Kelly, B. A., Callingham, R. A., & Shaughnessy, J. M. (2003). The measurement of school students’ understanding of statistical variation. International Journal of Mathematical Education in Science and Technology, 34, 1–29.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical Review, 67, 223–265.
Zawojewski, J. S., & Shaughnessy, J. M. (2000). Data and chance. In E. A. Silver & P. A. Kenney (Eds.), Results from the Seventh Mathematics Assessment of the National Assessment of Education-
AUTHOR NOTE
I would like to express my heartfelt thanks to Iddo Gal, Jane Watson, and Joan Garfield for their insightful suggestions in reviewing two earlier versions of this chapter. A chapter of this magnitude is often a community effort in a field of inquiry, and that was truly the case for this chapter. Iddo, Jane, and Joan helped me to more clearly articulate the big themes in our research in statistics education, and pointed me in directions that I had not previously been aware of. I am very fortunate to have such thoughtful and professional colleagues who helped make the chapter stronger.