SB Assignment
STUDENT DETAILS
DECLARATION
I hold a copy of this assignment if the original is lost or damaged.
I hereby certify that no part of this assignment or product has been copied from any other student’s work or
from any other source except where due acknowledgement is made in the assignment.
I hereby certify that no part of this assignment or product has been submitted by me in another
(previous or current) assessment, except where appropriately referenced, and with prior permission
from the Lecturer / Tutor / Unit Coordinator for this unit.
No part of the assignment/product has been written/ produced for me by any other person except
where collaboration has been authorised by the Lecturer / Tutor /Unit Coordinator concerned.
I am aware that this work may be reproduced and submitted to plagiarism detection software programs for
the purpose of detecting possible plagiarism (which may retain a copy on its database for future
plagiarism checking).
CONTRIBUTION
Hoàng Diễm Quỳnh: Problems 3 & 4a; Section b (Diagram)
Huỳnh Dương Phương Anh: Problems 4b & 4c; Sections a and c
Trần Đình Quân: Problem 5; Section b (Interpretation)
Note: The workload was divided equally and fairly among the members from the
start. All members worked out their solutions and then discussed, adjusted,
and completed the answers together. Everyone was active, dedicated, and
punctual throughout this group assignment.
1. Problem Solving
Problem 1
We use the law of total probability for this problem.
The probability that the man left his umbrella in the second shop is equal to the
probability that he did not leave it in the first shop, because the given condition
is that he left his umbrella in one of the two shops. So:
Given that the probability that the man leaves his umbrella in any shop is 1/5
⇒ The probability that he does not leave it in the first shop is 1 - 1/5 = 4/5
⇒ P(left in the second shop) = 4/5
Therefore, the probability that the man left his umbrella in the second shop is
4/5 or 0.8.
Problem 2
Tails   Heads   Total
25      75      100
P(normal fair coin | result comes up heads) = 25/75 = 1/3
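The frequency count above can be cross-checked with Bayes' rule. The sketch below assumes the setup that the table implies: one fair coin and one two-headed coin, each picked with probability 1/2. Those priors and all variable names are assumptions, since the problem statement is not reproduced here.

```python
from fractions import Fraction

# Assumed setup (not stated in the text above): a fair coin and a
# two-headed coin, each chosen with probability 1/2.
prior = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "two_headed": Fraction(1)}

# Law of total probability: P(heads) summed over both coins.
total_heads = sum(prior[coin] * p_heads[coin] for coin in prior)

# Bayes' rule: P(fair | heads) = P(fair) * P(heads | fair) / P(heads).
posterior_fair = prior["fair"] * p_heads["fair"] / total_heads

print(total_heads)     # 3/4, matching 75 heads out of 100
print(posterior_fair)  # 1/3, matching 25/75
```

Exact fractions are used so that the result can be compared directly with the 25/75 obtained from the table.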
Problem 3
a) The probabilities sum to 1, as must be true for any probability
distribution. That means, we have:
⇒ 0.1 + p + q + 0.2 = 1 (1)
Given that E(X) = 1.5 and E(X) = ∑_{i=1}^{N} x_i P(x_i)
⇒ 0×0.1 + 1×p + 2×q + 3×0.2 = 1.5
⇒ p + 2q + 0.6 = 1.5 (2)
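Equations (1) and (2) form a two-by-two linear system in p and q. As a minimal sketch, assuming X takes the values 0, 1, 2, 3 with probabilities 0.1, p, q, 0.2 (so that the constant term in (2) is 3 × 0.2 = 0.6), the system can be solved by subtracting (1) from (2):

```python
from fractions import Fraction

# Assumed distribution: X takes 0, 1, 2, 3 with probabilities 0.1, p, q, 0.2.
p0, p3 = Fraction(1, 10), Fraction(2, 10)
mean = Fraction(3, 2)  # E(X) = 1.5

# (1) p + q  = 1 - P(X=0) - P(X=3)
rhs1 = 1 - p0 - p3
# (2) p + 2q = E(X) - 0*P(X=0) - 3*P(X=3)
rhs2 = mean - 0 * p0 - 3 * p3

q = rhs2 - rhs1  # subtracting (1) from (2) leaves q alone
p = rhs1 - q     # back-substitute into (1)

print(float(p), float(q))  # 0.5 0.2
```

With these values the probabilities sum to 0.1 + 0.5 + 0.2 + 0.2 = 1, as required.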
Problem 4
a) The probability that the demand is exactly 2 fans in any one week is:
P(X=2) = λ^x e^(−λ) / x! = 3.2^2 e^(−3.2) / 2! = 0.2087
b) The probability that the demand for fans in that week cannot be satisfied is:
P(X>4) = 1 - P(X≤ 4)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]
= 1 - (0.0407 + 0.1304 + 0.2087 + 0.2226 + 0.178)
= 0.2196
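The Poisson probabilities in parts a) and b) can be reproduced with a few lines of standard-library Python. This is only a sketch: the helper name `poisson_pmf` is illustrative, and the stock level of 4 is inferred from the event X > 4 used above. Because the code keeps full precision, its tail probability comes out near 0.219, slightly different from the 0.2196 obtained by adding terms already rounded to four decimals.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 3.2   # mean weekly demand used in the text
stock = 4   # stock level implied by the event X > 4 in part b)

p_exactly_2 = poisson_pmf(2, lam)
# Demand is unmet when it exceeds the stock: 1 - P(X <= stock).
p_unmet = 1 - sum(poisson_pmf(k, lam) for k in range(stock + 1))

print(round(p_exactly_2, 4))  # 0.2087
print(round(p_unmet, 4))
```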
c) The least value of n for which the probability of his not being able to
satisfy the demand for fans in that week is less than 0.05:
Testing:
P(X>5) = 1 - P(X≤ 5)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)]
= 1 - (0.0407 + 0.1304 + 0.2087 + 0.2226 + 0.178 + 0.1139)
= 0.1057 (>0.05, unacceptable)
P(X>6) = 1 - P(X≤ 6)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) +
P(X=6)]
= 1 - (0.0407 + 0.1304 + 0.2087 + 0.2226 + 0.178 + 0.1139 +
0.0607)
= 0.045 (<0.05, acceptable)
⇒ n = 6
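The trial-and-error search in part c) can be automated: increase n until the probability that demand exceeds n drops below the threshold. The function names below are illustrative, not taken from the assignment.

```python
import math

def poisson_cdf(n: int, lam: float) -> float:
    """P(X <= n) for a Poisson variable with mean lam."""
    return sum(lam ** k * math.exp(-lam) / math.factorial(k) for k in range(n + 1))

def least_stock(lam: float, max_risk: float) -> int:
    """Smallest n such that P(X > n) < max_risk."""
    n = 0
    while 1 - poisson_cdf(n, lam) >= max_risk:
        n += 1
    return n

print(least_stock(3.2, 0.05))  # 6
```

The loop stops at the first n that meets the condition, which is why only n = 5 and n = 6 need to be shown in the hand calculation.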
Problem 5
a)
X = length of the bar
Notice that:
P(X < 20.02) = 12%; P(X > 20.06) = 33%
Because X is normally distributed:
P(z < a) = 12%; P(z < b) = 67% (obtained from P(z > b) = 33%)
⇔ a = -1.18; b = 0.44 (according to the C-2 appendix of the textbook)
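Here we have: (20.02 - μ)/σ = -1.18 and (20.06 - μ)/σ = 0.44
⇒ σ = 0.04/1.62 ≈ 0.0247 cm and μ = 20.02 + 1.18σ ≈ 20.049 cm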
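The z-scores a and b fix μ and σ through (20.02 - μ)/σ = a and (20.06 - μ)/σ = b. The sketch below solves that pair of equations with Python's `statistics.NormalDist`, which stands in for the printed table. Since `inv_cdf` returns a ≈ -1.175 rather than the table entry -1.18, the results may differ from a table-based calculation in the last digits.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

a = std_normal.inv_cdf(0.12)  # z with P(Z < a) = 12%
b = std_normal.inv_cdf(0.67)  # z with P(Z < b) = 67%

# Subtracting the two equations eliminates mu:
#   (20.06 - 20.02) / sigma = b - a
sigma = (20.06 - 20.02) / (b - a)
# Back-substitute into (20.02 - mu) / sigma = a.
mu = 20.02 - a * sigma

print(round(mu, 3), round(sigma, 4))
```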
b)
Z-score formula: z = (x − μ)/σ
In the case of x = 20.03, we have z = -0.775.
According to the C-2 appendix of the textbook, this z-score corresponds to a
cumulative probability of 22.06%, which means that the proportion of steel
bars measuring 20.03 cm or more is approximately 100% - 22.06% = 77.94%.
c)
P(reject) = P(X<20.02) + P(X>20.08)
= P(z<-1.18) + P(z>1.25)
= P(z<-1.18) + (1 - P(z<1.25))
= 11.9% + (100% - 89.44%) (according to the C-2 appendix of
the textbook)
= 22.46%
Conclusion: The percentage of bars rejected as being outside the
acceptable range is 22.46%.
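Parts b) and c) can be checked without a printed table. The sketch below rebuilds μ and σ from the table z-scores -1.18 and 0.44 used in part a) and then evaluates the normal CDF directly. Because it does not round z to two decimals, the proportion for 20.03 cm comes out slightly different from the table-based 77.94%.

```python
from statistics import NormalDist

# mu and sigma rebuilt from the table z-scores -1.18 and 0.44 of part a).
sigma = (20.06 - 20.02) / (0.44 - (-1.18))
mu = 20.02 + 1.18 * sigma
bars = NormalDist(mu, sigma)

# b) proportion of bars measuring 20.03 cm or more
p_at_least_20_03 = 1 - bars.cdf(20.03)

# c) proportion rejected: shorter than 20.02 cm or longer than 20.08 cm
p_reject = bars.cdf(20.02) + (1 - bars.cdf(20.08))

print(round(p_at_least_20_03, 4))
print(round(p_reject, 4))
```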
2. Data Analysis
a)
(Table: descriptive statistics, including measures of variability)
And the histograms for each gender and for the whole data set:
b) The data represent the heights of 500 American men and 500 American
women. To compare and contrast the distributions of the two groups, we may
either use the descriptive statistics or interpret some suitable diagrams.
- From the descriptive statistics:
● The distributions of both data sets are roughly unimodal (the mean,
mode, and median are approximately equal), but that of men is
left-skewed (the mean is less than the median) and that of women is
right-skewed (the mean is greater than the median).
● The distribution of women's heights may contain low outliers because
of its lower variability (as measured by the range, interquartile
range, variance, and standard deviation).
● The interquartile range is an appropriate measure of the spread of a
distribution; here, the two IQRs are far apart (8.00 and 18.00), which
indicates that the distributions of the heights of American women and
men are quite different.
- From some specific diagrams:
The histogram:
● Both have a rough bell shape, so the heights spread approximately
equally to both sides of the centre.
● It is clear that the distribution of women's heights has fatter tails.
● On the left side, the men's heights have some strange outliers that are
isolated from the rest of the group. Similarly, the women's heights also
have such strange outliers on the left side.
The CDF: Since the blue line is above the red line, the distribution of
women's heights has fatter tails. Normally, fatter tails mean that the
probability of extreme events is higher than normal. In this case, from the
variability in the descriptive statistics or from the histogram, it is obvious
that the distribution of men's heights contains more outliers than that of
women's heights. Therefore, extreme values of women's heights are likely to be
more frequent than those of men's heights, but they fall into the unusual
group rather than the outlier group.
The dotplot: It is similar to the histogram in showing the spread, but it
still reveals something interesting: there are also “strange outliers” on the
right side of the distribution of men's heights. The dot plot also shows the
shape of the distribution more clearly than the histogram (because the sample
here is large enough), and the shape is quite irregular in both groups. It is
neither really bell-shaped nor normally distributed: many observations do not
follow the pattern, and some clusters appear to come from another population.
Because it is not stated that the samples of 500 men and 500 women were
randomly chosen, this may be explained by each group containing several
smaller groups that differ slightly in height (it does not seem to be due to
sampling error, because the irregularity is pronounced). For example, perhaps
the 500 women were drawn from different regions of the US with different
average heights (with unequal sample sizes per region), or perhaps the 500 men
include a group of adolescents whose average height is lower than that of the
population.
H0: μ = 178 cm (the average height of American men is 178 cm)
H1: μ ≠ 178 cm (the average height of American men is not 178 cm)
For α = 0.05, the two-tailed critical value for d.f. = n - 1 = 500 - 1 = 499
degrees of freedom is approximately 1.96. It can be obtained from either
appendix C-2 or D: when the sample size is greater than 30, we can normally
use the z-score instead of the t-score even if σ is unknown, and with a sample
size of 500 we can be confident that the z-test and the t-test give
practically the same answer.
We will reject H0 if tcalc > 1.96 or if tcalc < -1.96, as illustrated in the figure.
Since the test statistic falls in the left tail of the rejection region, we
reject the null hypothesis H0: μ = 178 cm and conclude H1: μ ≠ 178 cm at the 5
percent level of significance.
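The decision rule can be sketched in code. The hypothesised mean 178 and n = 500 come from the text, and 5.84 is the men's sample standard deviation quoted later in the report. The sample mean of 177.0 is a hypothetical placeholder, because the observed mean does not appear in the text, so the printed statistic only illustrates the mechanics.

```python
import math

def one_sample_t(sample_mean: float, sample_sd: float, n: int, mu0: float) -> float:
    """t statistic for H0: mu = mu0 when sigma is unknown."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# sample_mean = 177.0 is HYPOTHETICAL; replace it with the observed mean.
t_calc = one_sample_t(sample_mean=177.0, sample_sd=5.84, n=500, mu0=178.0)

# Two-tailed rule at alpha = 0.05 with the critical value used in the text.
reject_h0 = abs(t_calc) > 1.96

print(round(t_calc, 2), reject_h0)
```

A negative statistic below -1.96 corresponds to the left tail of the rejection region.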
Because the variances of the populations are unknown and the sample standard
deviations appear different (6.00 for women's heights and 5.84 for men's
heights), we will assume that the population variances are unequal. Therefore,
we will apply the formula for the case “unknown variances, assumed unequal”.
We will have the t statistic:
If we use the quick rule for degrees of freedom, then we get d.f. =
min(n1 – 1, n2 – 1) = 499 and t.05 = 1.96. Here, we can easily see that the t
statistic falls in the left tail of the rejection region.
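For the case of unknown variances assumed unequal, the statistic is the difference in sample means divided by sqrt(s1²/n1 + s2²/n2). The sketch below computes it together with the Welch-Satterthwaite degrees of freedom and the quick rule. The standard deviations and sample sizes come from the text. The two sample means (164.0 and 177.0) are hypothetical placeholders, and taking women as group 1 is also an assumption.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and Welch-Satterthwaite d.f. for unequal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Means are HYPOTHETICAL; group 1 = women, group 2 = men (assumed order).
t_calc, df_welch = welch_t(164.0, 6.00, 500, 177.0, 5.84, 500)

# Quick rule from the text: the smaller of n1 - 1 and n2 - 1.
df_quick = min(500 - 1, 500 - 1)

print(round(t_calc, 2), round(df_welch, 1), df_quick)
```

The quick rule never gives more degrees of freedom than the Welch-Satterthwaite formula, so it is the more conservative choice. With samples this large, the critical value is close to 1.96 either way.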