Unit 7
Data Analysis
Unit 7 Overview
Students will learn beginning techniques of data management and analysis. They will also learn the
introductory concepts of statistics and probability. The big ideas in this unit are:
• There are three measures of central tendency (sometimes referred to as averages); they are the
mean, median and mode.
• When deciding which measure of central tendency best represents a set of data, the presence (and
effects) of outliers must be considered.
• Probability ranges between two values, 0 and 1 (or 0% and 100%). A probability of 0 means an
event is impossible, while a probability of 1 means an event is certain.
• There are two types of probability to consider: experimental and theoretical. As the number of
trials in an experiment increases, the experimental probability of an event occurring will tend to get
closer to the theoretical probability of that event occurring.
The students will examine sets of data in order to determine the three measures of central tendency. They
will use computational methods to accomplish this and also use technology to assist them. The students
will be required to determine the existence of outliers and analyze the effects of any outliers on the
measures of central tendency. The students will learn to determine which measure is the best
representative of a set of data.
Using skills learned in earlier units (shifting between fractions, decimals, and percents), the students will
express the probability of an event in several formats. They will learn what it means, statistically
speaking, to say that an event is certain or impossible.
Students will develop sample spaces (lists of all possible outcomes) for events. They will utilize tree
diagrams and tables to accomplish this, and then use those sample spaces to determine the probability of
two independent events occurring simultaneously, or sequentially. The students will then compare those
theoretical probabilities to experimentally determined probabilities for the same independent events.
Understanding data analysis is a valuable skill for any person living in a modern 21st century society.
The concepts taught in data management and probability are used every day to make important
decisions in many industries, such as marketing, research, sports, medicine, law making and law
enforcement, business, and government. Being familiar with these ideas will allow students to
make intelligent and relevant decisions as they go through life.
Then there is the man who drowned crossing a stream with an average depth of six inches.
~W.I.E. Gates
It is expected that students will:
7SP1. Demonstrate an understanding of central tendency and range by:
• determining the measures of central tendency (mean, median, mode) and range
• determining the most appropriate measures of central tendency to report findings.
[C, PS, R, T]

Achievement Indicators
7SP1.1 Determine mean, median and mode for a given set of data, and explain why these values may be
the same or different.
7SP1.2 Determine the range for a given set of data.
7SP1.3 Solve a given problem involving the measures of central tendency.

Measures of central tendency allow us to describe a set of data with a single meaningful number. The
study of mean, median and mode as measures of central tendency is entirely new to these students in
grade 7.

The focus of this outcome is to determine mean, median and mode and to understand that situational
context will determine which measure is most meaningful. It may be appropriate to use one, two or
three of these measures to represent a given data set.

The mean is the sum of the numbers in a set of data divided by the number of items of data (arithmetic
average). It is the number that most people are describing when they talk about average. The
calculation of the mean describes a set of data by identifying a value obtained from combining all
values of the data set and distributing them equally.

The median is the middle number when data are arranged in numerical order. Half of the data values
are above the median and half are below. If there are two numbers in the middle of a data set, the
median is the mean of those two values. The median may be the same as the mean, or it may be
different.

The mode is the number that occurs most often in a set of data. It is possible that a data set has one
mode, several modes or no mode at all. Ordered lists, bar graphs and stem and leaf plots are useful
data displays for identifying the mode of a given data set. Students have studied these displays prior
to grade 7.

When considering the data as a whole it is often of value to consider the spread of the data. One
strategy for examining this is to consider the range of the data. Students will calculate the range by
subtracting the smallest data value from the greatest. Range may be used in combination with one of
the measures of central tendency to create a better representation of the data in a set.
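The definitions of mean, median, mode and range can be checked with technology. The following is a
minimal sketch in Python, not a prescribed method. The data set and the function name describe are
hypothetical, and the rule "report no mode when every value occurs equally often" is an assumed
classroom convention.

```python
from statistics import mean, median, multimode


def describe(data):
    """Return the mean, median, mode(s) and range of a list of numbers."""
    modes = multimode(data)  # every value that occurs most often
    # Assumed convention: if all distinct values tie, report no mode.
    if len(modes) > 1 and len(modes) == len(set(data)):
        modes = []
    return {
        "mean": mean(data),                # sum of values / number of values
        "median": median(data),            # middle value of the ordered data
        "modes": modes,
        "range": max(data) - min(data),    # greatest value - smallest value
    }


quiz_scores = [6, 7, 7, 8, 9, 10, 9, 7]  # hypothetical data set
summary = describe(quiz_scores)
print(summary)
```

Because this data set has an even number of values, the median is the mean of the two middle values
(7 and 8).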
Graphic Organizer
Create a tri-fold foldable to define and create examples of each of
the measures of central tendency. On each of the outside panels,
name and define mean, median or mode. On the corresponding
inside panel, create and solve an example of a problem using the
measure of central tendency on the front.
1. Create a set of data for each of the following. Each set must
have at least 6 pieces of data.
A. Situation 1: The mean, median and mode are the same.
B. Situation 2: The mean, median and mode are different.
It is expected that students will:
7SP2. Determine the effect on the mean, median and mode when an outlier is included in a data set.
[C, CN, PS, R]

Achievement Indicators
7SP2.2 Explain the effect of outliers on the measures of central tendency for a given data set.

In a set of data, we often find values which are significantly different from the others. These values
are called outliers. The presence of outliers may affect which measure of central tendency best
represents the data.

Outliers are often identified from a numbered set but can also be identified in different data displays.

If, however, the data contain just one or two extreme values, the mean may be less representative.
For the data set 3, 4, 5, 5, 6, 19, the mean

mean = (3 + 4 + 5 + 5 + 6 + 19) / 6 = 42 / 6 = 7

is greatly influenced by the outlier 19 and does not represent the data as well as the median
(3, 4, 5, 5, 6, 19), which remains unchanged at 5.
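The effect of the outlier 19 on the data set 3, 4, 5, 5, 6, 19 can be seen by calculating the mean and
median with and without it. This is an illustrative sketch only; the variable names are hypothetical.

```python
from statistics import mean, median

data = [3, 4, 5, 5, 6, 19]
without_outlier = [x for x in data if x != 19]  # drop the extreme value

mean_with = mean(data)                     # 42 / 6
median_with = median(data)                 # mean of the two middle values, 5 and 5
mean_without = mean(without_outlier)       # 23 / 5
median_without = median(without_outlier)   # middle value of 3, 4, 5, 5, 6

print("mean:  ", mean_with, "->", mean_without)
print("median:", median_with, "->", median_without)
```

The mean drops from 7 to 4.6 when 19 is left out, while the median stays at 5.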
7SP2 (Cont’d)

Achievement Indicators
7SP2.3 Identify outliers in a given set of data, and justify whether or not they are to be included in
reporting the measures of central tendency.
7SP2.4 Provide examples of situations in which outliers would and would not be used in reporting the
measures of central tendency.

In any data collection process it is possible for errors to occur. These errors may be due to human
error in measurement or in the recording of information.

Students must learn to distinguish between an error that appears as an outlier and a legitimate data
value which is significantly different from the others. Data directly attributable to error are omitted
from the calculation of measures of central tendency. If no error has occurred, the data value must be
included in the calculation of the measures of central tendency.

When choosing the best measure of central tendency to represent a data set, the presence of outliers
and their effect on the mean, median and mode must be considered.

Players on the grade 7 basketball team were asked to record their height in cm on a chart in the
classroom. The data obtained was to be used to represent the height of the team. Here is the data.

155 cm   153 cm   150 cm   167 cm
164 cm   182 cm   170 cm   159 cm
185 cm   19 cm    182 cm   174 cm
a) Does this data set contain any outliers? How can you tell?
b) Suggest a reason for this outlier. Should this outlier be
included in the calculations for the measures of central
tendency? Why or why not?
c) Calculate the mean, median and mode for these heights.
d) Which measure(s) of central tendency would you use to
represent the height of the team? Why?
Define the term outlier. Provide an example of a situation in which
an outlier must be excluded from the data before calculating the
measures of central tendency. Why would you exclude the outlier in
this case?
7SP2 (Cont’d)
Achievement Indicators: 7SP2.3 and 7SP2.4 (continued)

Solutions to the questions on the previous two page spread:

a) When responding to this question, students should be able to determine that 19 cm is significantly
different from all of the other values. It is over 130 cm in difference from the next closest value.

b) Perhaps the student made an error and should have written 190 cm. Unfortunately we cannot know
for certain. This outlier is, however, clearly an error. No grade 7 student would reasonably measure
19 cm tall. Consequently, it should be excluded when calculating the measures of central tendency.

c) mean = 1841 / 11 = 167.36, which is 167 cm when rounded.
median: 150, 153, 155, 159, 164, 167, 170, 174, 182, 182, 185
thus median = 167 cm
mode: From the ordered list, we can easily see that 182 cm is the mode.

d) Both the mean and the median are good representations for the height of the team in this case. The
mode, however, is not a good choice as it represents a height greater than most of the players on the
team.
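Students who use technology can check the basketball solutions with a short script. This is a sketch
only; it assumes, as in solution b), that the 19 cm entry is a recording error and is excluded. The
variable names are hypothetical.

```python
from statistics import mean, median, multimode

recorded = [155, 153, 150, 167, 164, 182, 170, 159, 185, 19, 182, 174]

# Assumption from solution b): 19 cm is an error, so it is left out.
heights = [h for h in recorded if h != 19]

gap = min(heights) - 19            # distance from 19 cm to the next closest value
team_mean = mean(heights)          # 1841 / 11
team_median = median(heights)      # 6th of the 11 ordered values
team_modes = multimode(heights)    # value(s) recorded most often

print("gap to next value:", gap)
print("mean:", round(team_mean, 2))
print("median:", team_median)
print("mode(s):", team_modes)
```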
7SP1 (Cont’d)

Achievement Indicators
7SP1.4 Provide a context in which the mean, median or mode is the most appropriate measure of central
tendency to use when reporting findings.

Classroom discussion at this point should be focussed upon realistic situations and making decisions
among measures of central tendency.

The mean is most meaningful when the data values contain few outliers. For instance, when
considering student progress, a teacher calculating a student’s mark uses the mean when the set of
marks does not contain any extreme values.

The median is most meaningful when there are a small number of significantly different pieces of
data. In this case, the median is often a better representation of the data. For example, when
considering the progress of a student who regularly performs well on tests but had one failure, the
teacher may wish to use the median of the student’s scores when discussing test results with parents
on interview night.

Certain life situations exist in which the mode is the only acceptable measure of central tendency. A
store selling shoes or dresses will not find it useful to know that the average size of shoe or dress is
8.32.

Example:
Restaurant staff were required to prepare 400 meals per day. After a 15 day period, they noted that
some food was always left over. The manager knew she wanted to prepare enough food for her
customers, but did not want to be wasteful. In an effort to determine how many meals the staff should
prepare each day, she analysed the sales receipts to determine the number of meals sold. She recorded
the number of meals sold each day.
For each situation, would the mean, median, or mode be most helpful
to know? Justify your choice.
A. You are ordering bowling shoes for a bowling alley.
B. You want to know if you read more or fewer books per
month than most people in your class.
C. You want to know the “average” amount spent per week
on junk food in your class.
Darryl, Gordon and Joan are captains of the school math teams.
Their contest results are recorded in the table below.
A. Based on the mean, whose math team is the best?
B. Based on the median, whose math team is the best?
C. Based on the mode, whose math team is the best?
D. Which measure would you choose to determine whose
team is the best? Why?
E. Why might someone disagree with you?
Resources: ST: pp. 271–275; Practice and HW Book, pp. 161–163
7SP1 (Cont’d)
Achievement Indicators: 7SP1.4 (continued)

Solution to the question on the previous two page spread:

A reasonable suggestion would be to prepare 320 meals per day. The following analysis is suggested.

Students should think about finding the measures of central tendency for this data.

Mean = 4328 / 15 = 288.53. Therefore the mean number of meals is 289. Students can use a calculator
to find the mean.

Using an ordered list, students can easily find the median and the mode.

120
265
272, 278
288
295, 296, 298, 299
311, 315
320, 320, 325, 326

The median is 298.
The mode is 320.

***If students chose to remove the outlier of 120, there is no significant change in the measures of
central tendency.***
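The restaurant analysis can also be checked with technology. The sketch below uses the 15 values from
the ordered list and computes each measure with and without the value 120. The helper name summarize
is hypothetical.

```python
from statistics import mean, median, multimode

meals = [120, 265, 272, 278, 288, 295, 296, 298, 299,
         311, 315, 320, 320, 325, 326]


def summarize(data):
    """Return (mean rounded to 2 places, median, list of modes)."""
    return round(mean(data), 2), median(data), multimode(data)


all_days = summarize(meals)
without_120 = summarize([m for m in meals if m != 120])

print("all 15 days:", all_days)
print("without 120:", without_120)
```

With 120 removed, the median moves from 298 to 298.5 and the mode stays at 320. The mean moves from
about 289 to about 301, so it is the measure most affected by this value.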
It is expected that students will:
7SP4. Express probabilities as ratios, fractions and percents.
[C, CN, R, T, V]

Achievement Indicators
7SP4.1 Determine the probability of a given outcome occurring for a given probability experiment, and
express it as a ratio, fraction and percent.

In grade 6, students were exposed to probability as a measure of how likely an event is to occur. They
have been exposed to identifying possible outcomes and determining theoretical and experimental
probabilities.

Theoretical probability can sometimes be obtained by carefully considering the possible outcomes and
using the rules of probability. For example, in flipping a coin, there are only two equally likely
outcomes, so the probability of flipping a head is, in theory, ½. Often in real-life situations
involving probability, it is not possible to determine theoretical probability. We must rely on
observation of several trials (experiments) and a good estimate, which can often be made through a
data collection process. This is called experimental probability.

It is important for students to acquire an understanding that probability can be represented in
multiple forms. The probability of an event occurring is most often represented by using a fraction,
where the numerator represents the number of favourable outcomes and the denominator represents the
total possible outcomes.

P(event) = (# of favourable outcomes) / (# of possible outcomes)

This representation has many advantages since it often maintains the original numbers in simple
situations. Probability can similarly be represented as a ratio. It can also be represented in
decimal form. Likewise, students will often hear in news/weather reports various probability data
presented as percents. For example, the likelihood of rainfall for a given day is almost always
provided in percent form. In order for all situations encountered to be meaningful to students, they
should work with all representations of probability (fractions/decimals, ratios and percents).
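Moving between the representations of a probability can be illustrated with a short sketch. The
function name probability_forms is hypothetical, and the ratio is written here as favourable outcomes
to possible outcomes (for example 1:2 for a coin landing heads), which is an assumed notation.

```python
from fractions import Fraction


def probability_forms(favourable, possible):
    """Express P(event) = favourable / possible in several forms."""
    p = Fraction(favourable, possible)  # reduces to lowest terms
    return {
        "fraction": f"{p.numerator}/{p.denominator}",
        "ratio": f"{p.numerator}:{p.denominator}",  # favourable : possible
        "decimal": float(p),
        "percent": f"{float(p) * 100:g}%",
    }


forms = probability_forms(1, 2)  # flipping a head with a fair coin
print(forms)
```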
Use a scale with benchmarks 0 (0%), 1/4 (25%), 1/2 (50%), 3/4 (75%), and 1 (100%) to estimate a
reasonable probability for each of the events described below. Explain each choice.
A. The next baby born in your town will be a boy.
B. It will snow at least once in the month of June.
C. A person will live 6 months without water.
D. The sun will set tomorrow.
Resources: Math Makes Sense 7, Lesson 7.5, Unit 7: Data Analysis; TR: ProGuide, pp. 25–29
It is expected that students will:
7SP5. Identify the sample space (where the combined sample space has 36 or fewer elements) for a
probability experiment involving two independent events.
[C, ME, PS]

Achievement Indicators
7SP5.1 Provide an example of two independent events, such as:
• spinning a four section spinner and rolling an eight-sided die
• tossing a coin and rolling a twelve-sided die
• tossing two coins
• rolling two dice
and explain why they are independent.
7SP5.2 Identify the sample space (all possible outcomes) for each of two independent events, using a
tree diagram, table or other graphic organizer.

In grade 7, the study of sample space is limited to independent events. Events are considered to be
independent if the result of one does not depend on the result of another. The sample space for a
probability experiment is the list of all possible outcomes for the events.

Students must understand that spinning a four-sided spinner does not in any way affect the number an
eight-sided die will land upon when tossed.

Students will explore several ways to organize the sample space for two independent events.
For example:
What is the sample space for spinning a four coloured spinner and rolling a six-sided die?

Using a table:

Spinner   Die      Spinner   Die
Red       1        Green     1
Red       2        Green     2
Red       3        Green     3
Red       4        Green     4
Red       5        Green     5
Red       6        Green     6
Yellow    1        Blue      1
Yellow    2        Blue      2
Yellow    3        Blue      3
Yellow    4        Blue      4
Yellow    5        Blue      5
Yellow    6        Blue      6

(This elaboration is continued on the next two page spread...)
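The same sample space can be generated with technology by pairing every spinner colour with every die
face. This is one possible sketch; the variable names are hypothetical.

```python
from itertools import product

colours = ["Red", "Green", "Yellow", "Blue"]  # four coloured spinner
die_faces = range(1, 7)                       # six-sided die: 1 to 6

# Each outcome is a (colour, number) pair; the spin does not affect the roll.
sample_space = list(product(colours, die_faces))

for outcome in sample_space:
    print(outcome)
print("number of outcomes:", len(sample_space))
```

The number of outcomes is 4 × 6 = 24, which matches the 24 rows of the table.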
Achievement Indicators: 7SP5.1 (continued)

Using a graphic (e.g. a fishbone organizer):
It is expected that students will:
7SP6. Conduct a probability experiment to compare the theoretical probability (determined using a
tree diagram, table or other graphic organizer) and experimental probability of two independent
events.
[C, PS, R, T]

Achievement Indicators
7SP6.1 Determine the theoretical probability of a given outcome involving two independent events.
7SP6.2 Conduct a probability experiment for an outcome involving two independent events, with and
without technology, to compare the experimental probability with the theoretical probability.

Theoretical probability of an event is the ratio of the number of favourable outcomes in an event to
the total number of possible outcomes, when all possible outcomes are equally likely. It can only be
used to predict what will happen in the long run, when outcomes are equally likely to occur. The
theoretical probability of event Y is:

P(Y) = (total # of favourable outcomes) / (total # of possible outcomes in the sample space)

Experimental probability is the ratio of the number of times an event is observed to the total number
of trials carried out in simulations and experiments. Students should realize that the outcomes in
many situations cannot be characterized as equally likely, such as tossing a thumb tack to see if it
lands with the point up or down, and therefore, theoretical probability is more difficult to
determine. In such cases, experiments may be conducted to identify the probability. The experimental
probability of an event Y is:

P(Y) = (total # of observed occurrences of Y) / (total # of trials)

Before conducting experiments, students should predict the probability whenever possible, and use the
experiment to verify or refute the prediction.
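The theoretical probability formula can be applied directly to a sample space generated with
technology. The sketch below assumes all outcomes are equally likely; the function name
theoretical_probability and the two example events are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import product


def theoretical_probability(sample_space, is_favourable):
    """Favourable outcomes / possible outcomes (equally likely outcomes assumed)."""
    favourable = [o for o in sample_space if is_favourable(o)]
    return Fraction(len(favourable), len(sample_space))


# Rolling two six-sided dice: 36 outcomes in the sample space.
two_dice = list(product(range(1, 7), repeat=2))
p_sum_7 = theoretical_probability(two_dice, lambda o: o[0] + o[1] == 7)

# Spinning a four coloured spinner and rolling a six-sided die: 24 outcomes.
spinner_and_die = list(product(["Red", "Green", "Yellow", "Blue"], range(1, 7)))
p_red_6 = theoretical_probability(spinner_and_die, lambda o: o == ("Red", 6))

print("P(sum of 7) =", p_sum_7)
print("P(Red and 6) =", p_red_6)
```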
Informal Observation
Conduct an experiment of spinning the spinner twice, and finding
the sum of the numbers from the two spins. Predict which sum will
appear most often. Explain your thinking.
***Have students work in pairs to carry out the experiment with each group doing 10 or 20 trials.
Collate the results to obtain at least 100 trials. Have students compare the experimental results to
their prediction and explain why there may be differences.***
7SP6 (Cont’d)
Achievement Indicators: 7SP6.1 and 7SP6.2 (continued)

For example, tell students that an experiment of tossing two fair coins was conducted. About how
many times in an experiment with 64 trials would you expect to get two heads? Explain your thinking.

Have students work in pairs to carry out the experiment with each group doing 10 or 20 trials. Collate
the results to obtain 64 trials and then add more trials as needed to show that experimental
probability approaches theoretical probability as the number of trials increases. Have them calculate
the experimental probability of getting two heads when two coins are tossed. Have students compare
the experimental probability to the theoretical probability. (This can be simulated using the TI-84
graphing calculator.)
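The two-coin experiment can also be simulated in Python as an alternative to a graphing calculator.
This is a sketch under stated assumptions: the function name two_heads_rate, the seed value and the
trial counts are hypothetical choices, and results will vary with the seed.

```python
import random


def two_heads_rate(trials, rng):
    """Toss two fair coins `trials` times; return the fraction of HH results."""
    hits = 0
    for _ in range(trials):
        first = rng.choice("HT")
        second = rng.choice("HT")
        if first == "H" and second == "H":
            hits += 1
    return hits / trials  # experimental probability of two heads


theoretical = 1 / 4                  # HH is 1 of the 4 outcomes HH, HT, TH, TT
expected_in_64 = 64 * theoretical    # expected number of HH results in 64 trials

rng = random.Random(7)  # fixed seed so a run can be repeated
for trials in (64, 640, 6400, 64000):
    print(trials, "trials:", two_heads_rate(trials, rng))
print("theoretical:", theoretical, "expected in 64 trials:", expected_in_64)
```

As the number of trials grows, the printed experimental values should tend to settle near 0.25,
although any single run of 64 trials may land noticeably above or below it.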
Unit Problem:
