Skittles Project With Reflection
Data Collection:
Organizing and Displaying Categorical Data: Colors
Observations:
The color counts came out relatively close to one another. We expected one or two colors to be
lower in count and perhaps a favored color to be out in front, which we saw in half of the charts,
while the others were more evenly distributed. The data from our bag follow the pattern of the
whole class's bags, except that purple is higher in ours.
Group Three Bag:
Summary statistics:
Column: NUMBER
n: 5
Mean: 48.4
Variance: 43.3
Median: 46
Range: 16
Min: 44
Max: 60
Q1: 45
Q3: 47
Sum: 242
Class Bags:
Summary statistics:
Candies per bag   Frequency   Relative Frequency   Percent of Total   Cumulative Frequency
54                1           0.038461538          3.8461538          1
55                1           0.038461538          3.8461538          2
56                1           0.038461538          3.8461538          3
58                5           0.19230769           19.230769          8
59                6           0.23076923           23.076923          14
60                6           0.23076923           23.076923          20
61                5           0.19230769           19.230769          25
62                1           0.038461538          3.8461538          26
Mean: 59.1
Standard Deviation: 1.90
5 Number Summary:
Min: 54
Q1: 58
Q2: 59
Q3: 60
Max: 62
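The class summary values can be checked with a short script. This is a minimal sketch using only Python's standard library. It assumes the two lowest bag counts are 54 and 55 (one bag each, which is consistent with the reported minimum and mean), and it assumes the "inclusive" quartile convention, which may differ from the one used by the statistics software.

```python
import statistics

# Candies per bag -> number of bags, taken from the class frequency table.
# Assumption: the two lowest counts are 54 and 55, one bag each.
frequency = {54: 1, 55: 1, 56: 1, 58: 5, 59: 6, 60: 6, 61: 5, 62: 1}

# Expand the table into one value per bag.
bags = [count for count, n_bags in frequency.items() for _ in range(n_bags)]

mean = statistics.mean(bags)
std_dev = statistics.stdev(bags)  # sample standard deviation (n - 1 divisor)
q1, q2, q3 = statistics.quantiles(bags, n=4, method="inclusive")

print(f"n = {len(bags)}, mean = {mean:.1f}, standard deviation = {std_dev:.2f}")
print(f"min = {min(bags)}, Q1 = {q1:g}, Q2 = {q2:g}, Q3 = {q3:g}, max = {max(bags)}")
```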
Observations:
The shape of the distribution is close to a normal bell shape, although it is skewed to the left.
The class data appear roughly normal, which is to be expected since there are no extreme
outliers. We expected there would be about the same number of candies in each bag, which
wasn't the case. The number of candies from my individual bag was 26 and the total
number of bags in the sample is 26.
Reflection
Categorical data are qualitative variables that consist of names or labels, not numbers that
represent counts or measurements. Pie charts and bar graphs make sense for categorical data
because they compare the frequency of each category against the others. Computation, or
arranging in order such as low to high, does not make sense for categorical data. Survey
responses of yes, no, and undecided are an example: they can be counted but not averaged.
Quantitative data are numerical variables consisting of numbers that can be measured,
ordered, or counted. Scatterplots and stem plots make sense for quantitative data because a
scatterplot helps determine whether there is a relationship between two variables, while a stem
plot displays the distribution by separating each value into two parts (a stem and a leaf).
Computation makes sense for quantitative data, for example to find the mean, standard
deviation, five-number summary, and sum.
The 95% confidence interval estimate for the true proportion of purple
candies:
Based on the calculations from our sample data, we are 95% confident that the interval between
0.182 and 0.222 contains the true value of the population proportion of purple candies.
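As a rough illustration of how such an interval is computed, the sketch below applies the usual normal-approximation formula, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n). The total of 1536 candies comes from the class hypothesis test output. The purple count is not given in this report, so the value 310 is a hypothetical count chosen only because it reproduces the reported interval. The function name `proportion_ci` is likewise just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumption: 310 purple candies (hypothetical) out of 1536 candies in total.
low, high = proportion_ci(310, 1536, 0.95)
print(f"{low:.3f} to {high:.3f}")
```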
The 99% confidence interval estimate for the true mean number of candies
per bag:
99% confidence interval results:
μ : Mean of variable
Variable: Mean candies per bag
Sample Mean: 59.076923
Std. Err.: 0.37178603
DF: 25
L. Limit: 58.040593
U. Limit: 60.113253
Based on the calculations from our data, we are 99% confident that the interval between 58.041
and 60.113 contains the true population mean number of candies per bag.
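The limits can be approximated by hand from the sample mean, the sample standard deviation, and a t critical value. The sketch below is one way to do that. The critical value 2.787 is an assumed table value (t distribution, 25 degrees of freedom, 0.005 in each tail), so the result matches the software output only to about three decimal places.

```python
from math import sqrt

n = 26                 # bags in the class sample
mean = 59.076923       # sample mean reported above
std_dev = 1.895744234  # sample standard deviation reported for the class bags
t_critical = 2.787     # assumed table value: 25 df, 0.005 in each tail

std_err = std_dev / sqrt(n)   # standard error of the mean
margin = t_critical * std_err
lower, upper = mean - margin, mean + margin

print(f"Std. Err. = {std_err:.4f}")
print(f"99% interval: {lower:.3f} to {upper:.3f}")
```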
The 98% confidence interval estimate for the standard deviation of the
number of candies per bag:
98% confidence interval results:
σ : Standard deviation of variable
Variable: Mean candies per bag
Standard Deviation: 1.895744234
DF: 25
L. Limit: 1.423897574
U. Limit: 2.792213262
Based on the calculations from our data, we are 98% confident that the interval between 1.424
and 2.792 contains the true population standard deviation of the number of candies per bag.
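An interval for a standard deviation is usually built from the chi-square distribution: each limit is sqrt((n - 1)s² / χ²), with the larger critical value giving the lower limit. The sketch below follows that approach. The two critical values are assumed table values for 25 degrees of freedom, not output from the software used in this project.

```python
from math import sqrt

n = 26
std_dev = 1.895744234  # sample standard deviation reported above

# Assumed chi-square table values for 25 degrees of freedom (98% confidence).
chi2_right = 44.314  # 0.01 of the area lies to the right of this value
chi2_left = 11.524   # 0.99 of the area lies to the right of this value

df = n - 1
lower = sqrt(df * std_dev**2 / chi2_right)  # larger critical value -> lower limit
upper = sqrt(df * std_dev**2 / chi2_left)   # smaller critical value -> upper limit

print(f"98% interval: {lower:.3f} to {upper:.3f}")
```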
Hypothesis tests
A hypothesis is a claim about some property of a population. The population parameters most
often involved in hypothesis testing are the mean, the proportion, the standard deviation, and
the variance. Hypothesis tests are used to evaluate whether sample data are consistent with the
claim (hypothesis) made about a property of a population. In the following section, our group
performed two hypothesis tests on our class's Skittles candy data.
Test the claim that 20% of all Skittles are green (class bags):
Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2
Count: 311
Total: 1536
Sample Prop.: 0.20247396
Std. Err.: 0.010206207
Z-Stat: 0.24239742
P-value: 0.8085

Sample Mean: 60.5
Std. Err.: 0.8660254
DF: 3
T-Stat: 5.1961524
P-value: 0.0138
Since our p-value of 0.8085 was greater than the significance level α, we fail to reject H0.
There is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles
candies are green.
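The test statistic and p-value for the green proportion can be reproduced with a one-proportion z test. The sketch below uses the reported count (311 green out of 1536 candies) and a two-sided alternative. The function name `one_proportion_z_test` is an illustrative choice, not part of any statistics package.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    std_err = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(311, 1536, 0.2)
print(f"Z-Stat = {z:.4f}, P-value = {p_value:.4f}")
```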
The mean number of candies in a bag of Skittles is 56 (class bags):
Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56
Variable: Candies per bag class
Sample Mean: 59.076923
Std. Err.: 0.37178603
DF: 25
T-Stat: 8.2760589
P-value: <0.0001
Since our p-value (less than 0.0001) was less than the significance level α, we reject the claim
that the mean number of candies in a bag of Skittles is 56. There is sufficient evidence to
warrant rejection of the claim.
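The t statistic follows directly from the class summary values. The sketch below recomputes it and compares it with a critical value. The significance level is not stated in this report, so α = 0.05 is an assumption, and 2.060 is the corresponding assumed table value (25 degrees of freedom, two-tailed).

```python
from math import sqrt

n = 26
sample_mean = 59.076923
std_dev = 1.895744234  # sample standard deviation of the class bags
claimed_mean = 56      # value of the mean under H0

std_err = std_dev / sqrt(n)
t_stat = (sample_mean - claimed_mean) / std_err

# Assumption: alpha = 0.05, two-tailed; table value for 25 degrees of freedom.
t_critical = 2.060
reject_h0 = abs(t_stat) > t_critical

print(f"T-Stat = {t_stat:.3f}, reject H0: {reject_h0}")
```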
Reflection:
The conditions for doing interval estimates and hypothesis tests:
The sample must be a simple random sample, and either the population must be normally
distributed or the sample size n must be > 30.
The data for our sample met both requirements: the class sample was a simple random
sample and, although our sample size was less than 30, the data were approximately normally
distributed, as shown in the histogram from the previous section.
A possible source of error is that, although the distribution is approximately normal, it is
slightly skewed, and the sample size is less than 30, which could distort the results.
The sampling method could be improved by making the sample more random. For
example, we would likely get more accurate results if the sample were taken from
students in statistics classes throughout Utah.