Sample Problems On Data Analysis: What Is Your Favorite Class?
(a) How many variables does this table have? Are the variables categorical or
quantitative?
The percentage of all students who are male students and who chose art (joint
relative frequency).
(2/30)*100% = 6.7%
The percentage of all male students who chose math (conditional relative
frequency).
(4/10)*100% = 40%
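The two calculations differ only in the denominator: all students for the joint relative frequency, and only male students for the conditional one. A minimal sketch, assuming the counts implied by the fractions in the answers (30 students in total, 10 of them male); the variable names are illustrative only:

```python
# Counts read off the worked answers; the full two-way table is not reproduced here.
total_students = 30   # all students in the table
male_students = 10    # all male students
male_and_art = 2      # male students who chose art
male_and_math = 4     # male students who chose math


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


# Joint relative frequency: denominator is ALL students.
joint_male_art = percent(male_and_art, total_students)

# Conditional relative frequency: denominator is MALE students only.
conditional_math_given_male = percent(male_and_math, male_students)

print(f"Joint (male and art): {joint_male_art:.1f}%")                    # 6.7%
print(f"Conditional (math | male): {conditional_math_given_male:.1f}%")  # 40.0%
```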
(d) In the table above, calculate the relative frequencies for each gender. Show your
work for at least one calculation.
See above.
(e) Sketch side-by-side bar graphs, a segmented bar graph, and a mosaic plot for this
data. For help, you can go to https://www.statsmedic.com/applets (1 Categorical
Variable, Multiple Groups) and enter the class data.
(g) If there were no association between favorite class and gender, what would the
graphs look like?
The corresponding bars would be the same height, e.g.,
What is your pulse rate?
Suppose that we collected data on the pulse rates of 19 high school students when they
were resting and after 5 minutes of running.
The data are displayed below in three ways: (1) dot plots, (2) a back-to-back
stemplot, and (3) histograms.
(c) Write a few sentences comparing the distributions of resting and after-exercise
pulse rates.
Center: The center for after-exercise is higher than the center for resting.
Unusual features: For resting pulse rates, 120 is a potential outlier; for after-
exercise pulse rates, 146 is a potential outlier. We’ll learn a rule for identifying
outliers soon.
Shape: The distributions of resting pulse rates and after-exercise pulse rates are
both similarly skewed to the right.
Spread: The after-exercise pulse rates vary a bit more (range of 60) than the
resting pulse rates (range of 52).
3, 3, 4, 4, 4, 5, 5, 6, 9, 12
(c) What is the median number of colleges? Describe how you found it.
Median = 4.5 colleges
Put the data in order. Find the average of 5th and 6th observations.
If there is an even number of values, find the average of the two middle values.
If there is an odd number of values, find the middle value.
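The rule above can be sketched in a few lines of Python. This is only an illustration of the even/odd rule, not a required method; the function name `median` and the list name `colleges` are chosen here for the example.

```python
def median(values):
    """Return the median: the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # step 1: put the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]
print(median(colleges))  # 4.5 (average of the 5th and 6th observations, 4 and 5)
```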
(e) Using your calculator, calculate the following values (five number summary).
Minimum: 3 Q1: 4 Median: 4.5 Q3: 6 Maximum: 12
(f) The interquartile range (or IQR) is defined as Q3 – Q1. Find the IQR. Where do you
see the IQR in the boxplot?
IQR = Q3 – Q1 = 6 – 4 = 2 colleges
The IQR is the length of the box.
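For checking the calculator output by hand, here is a sketch of the five-number summary. It assumes the "median of each half" convention for quartiles (the lower and upper halves exclude the median when the count is odd), which reproduces the values above; other software may use a different quartile convention and give slightly different Q1 and Q3.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                   # values below the median position
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]
minimum, q1, med, q3, maximum = five_number_summary(colleges)
iqr = q3 - q1  # length of the box in the boxplot

print(minimum, q1, med, q3, maximum)  # 3 4 4.5 6 12
print("IQR =", iqr)                   # IQR = 2
```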
(g) An observation is an outlier if it is less than Q1 - (1.5*IQR) or greater than Q3 +
(1.5*IQR). Are there any outliers? Show your work.
Q1 - (1.5*IQR) = 4 – (1.5*2) = 1. No value is less than 1 (the minimum is 3), so there are no low outliers.
Q3 + (1.5*IQR) = 6 + (1.5*2) = 9. Since 12 > 9, 12 is a high outlier.
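The 1.5*IQR rule can also be checked with a short sketch. The quartiles are taken from the five-number summary in part (e); the names `lower_fence` and `upper_fence` are just labels used here for the two cutoffs.

```python
colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]

q1, q3 = 4, 6                      # quartiles from part (e)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr       # values below this are low outliers
upper_fence = q3 + 1.5 * iqr       # values above this are high outliers

# Strict inequalities: a value equal to a fence (such as 9) is not an outlier.
outliers = [x for x in colleges if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence)    # 1.0 9.0
print(outliers)                    # [12]
```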
(h) Use the five number summary to make a boxplot. Add labels, e.g., min, Q1.
(i) Compare the mean and median for the set of data.
Median (4.5) < mean (5.5)
(j) Would you use the mean or the median to summarize the center of this
data? Explain.
I would use the median because the data is skewed to the right with a high
outlier at 12. The median (and IQR) are resistant to the influence of outliers.
The mean (and standard deviation) are not resistant to outliers.
If the data is symmetric, use the mean and standard deviation.
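To see what "resistant" means, one can make the outlier more extreme and watch which summary moves. The sketch below uses a hypothetical modification of the data (the 12 replaced by 30) purely for illustration; that value is not part of the original problem.

```python
from statistics import mean, median

colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]

# Hypothetical data set: same values, but the outlier 12 is replaced by 30.
more_extreme = [3, 3, 4, 4, 4, 5, 5, 6, 9, 30]

print(mean(colleges), median(colleges))          # 5.5 4.5
print(mean(more_extreme), median(more_extreme))  # 7.3 4.5

# The median is unchanged; the mean is pulled toward the extreme value.
```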
Spread: The IQR for the distribution is 2 colleges.
(n) Another CHS senior, Anika joins the group. She applied to 4 colleges.
What effect will adding this observation have on the group’s mean and
standard deviation? Include in your explanation the direction of the
changes.
Since Anika’s number of colleges (4) is less than the mean (5.5), the mean of
the group will decrease. New mean = 59/11 = 5.36 colleges.
Since Anika’s number of colleges is within one standard deviation of
the mean, the standard deviation will decrease. New standard deviation =
2.64 colleges.
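The direction of both changes can be verified numerically. In the sketch below, note one assumption: the value 2.64 matches the population formula (divide by n, `pstdev` in Python), so that is used for the main comparison; the sample formula (divide by n - 1, `stdev`) gives different values but the same direction of change.

```python
from statistics import mean, pstdev, stdev

colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]
with_anika = colleges + [4]        # Anika applied to 4 colleges

print(round(mean(colleges), 2), round(mean(with_anika), 2))      # 5.5 5.36

# Population standard deviation (divide by n): appears to match the 2.64 above.
print(round(pstdev(colleges), 2), round(pstdev(with_anika), 2))  # 2.73 2.64

# Sample standard deviation (divide by n - 1): different values, same direction.
print(round(stdev(colleges), 2), round(stdev(with_anika), 2))    # 2.88 2.77
```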
(o) Instead, suppose that each senior in the original group of 10 applied to 3
more colleges. What would happen to the group mean and standard
deviation?
By properties of means, the mean of the group would increase by 3.
By properties of standard deviations, the standard deviation of the group
would stay the same.
Adding a constant a to the original values shifts the center of the distribution
by that amount, but doesn’t change the spread, i.e.,
$\mu_{X+a} = \mu_X + a, \qquad \sigma_{X+a} = \sigma_X$
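A quick numerical check of the shift property with a = 3, using the original ten values. This is a sketch for verification only; the population standard deviation (`pstdev`) is used here, but the sample version behaves the same way under a shift.

```python
import math
from statistics import mean, pstdev

colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]
a = 3
shifted = [x + a for x in colleges]   # every senior applies to 3 more colleges

print(mean(colleges), mean(shifted))  # 5.5 8.5  (mean increases by 3)

# The spread is unchanged: every value moves by the same amount.
same_spread = math.isclose(pstdev(shifted), pstdev(colleges))
print(same_spread)                    # True
```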
(p) Instead, suppose each senior in the original group applied to double the
number of colleges. What would happen to the group mean and standard
deviation?
By properties of means, the mean of the group would double.
By properties of standard deviations, the standard deviation of the group
would double.
Multiplying all of the original values by a constant b multiplies the mean by
b and multiplies the spread by |b|, i.e.,
$\mu_{bX} = b\,\mu_X, \qquad \sigma_{bX} = |b|\,\sigma_X$
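The scaling property can be checked the same way with b = 2. The second case, b = -2, is not part of the problem; it is included only as an illustration of why the absolute value appears in the formula for the standard deviation.

```python
import math
from statistics import mean, pstdev

colleges = [3, 3, 4, 4, 4, 5, 5, 6, 9, 12]

doubled = [2 * x for x in colleges]   # b = 2, as in part (p)
negated = [-2 * x for x in colleges]  # b = -2, illustration only

print(mean(colleges), mean(doubled), mean(negated))  # 5.5 11 -11

# The standard deviation is multiplied by |b| = 2 in both cases.
sd = pstdev(colleges)
doubled_ok = math.isclose(pstdev(doubled), 2 * sd)
negated_ok = math.isclose(pstdev(negated), 2 * sd)
print(doubled_ok, negated_ok)         # True True
```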
Match the following histograms to their corresponding boxplot.