SB Assignment

The document contains details of a group assignment for a statistics unit. It lists the names and student IDs of four students assigned to the group. It also provides the unit name and number, lecture details, assignment title and due date. The students signed a declaration confirming their contributions. One student addressed problems 1 and 2, another addressed the data analysis sections, and the other two students addressed other problems. The workload was divided evenly among the group members.

GROUP ASSIGNMENT COVER SHEET

STUDENT DETAILS

Student name: Huỳnh Dương Phương Anh Student ID number: 31221023804

Student name: Vũ Hoàng Khánh Linh Student ID number: 31221023321

Student name: Trần Đình Quân Student ID number: 31221026975

Student name: Hoàng Diễm Quỳnh Student ID number: 31221023102


UNIT AND TUTORIAL DETAILS

Unit name: Statistics for Business Unit number: MAT102


Tutorial/Lecture: Lecture Class day and time: Friday 12:00 p.m
Lecturer or Tutor name: Mr. Tran Minh Hoang
ASSIGNMENT DETAILS

Title: Group assignment


Length: Due date: 05/01/2023 Date submitted: 05/01/2023

DECLARATION
I hold a copy of this assignment if the original is lost or damaged.
I hereby certify that no part of this assignment or product has been copied from any other student’s work or
from any other source except where due acknowledgement is made in the assignment.
I hereby certify that no part of this assignment or product has been submitted by me in another
(previous or current) assessment, except where appropriately referenced, and with prior permission
from the Lecturer / Tutor / Unit Coordinator for this unit.
No part of the assignment/product has been written/ produced for me by any other person except
where collaboration has been authorised by the Lecturer / Tutor /Unit Coordinator concerned.
I am aware that this work may be reproduced and submitted to plagiarism detection software programs for
the purpose of detecting possible plagiarism (which may retain a copy on its database for future
plagiarism checking).

Student’s signature: Huynh Duong Phuong Anh


Student’s signature: Vu Hoang Khanh Linh
Student’s signature: Tran Dinh Quan
Student’s signature: Hoang Diem Quynh
Note: An examiner or lecturer / tutor has the right to not mark this assignment if the above declaration has not
been signed.

CONTRIBUTION

Member                    Problem            Data analysis

Vũ Hoàng Khánh Linh       Problem 1 & 2      Section d
Hoàng Diễm Quỳnh          Problem 3 & 4a     Section b (Diagram)
Huỳnh Dương Phương Anh    Problem 4b & 4c    Section a, Section c
Trần Đình Quân            Problem 5          Section b (Interpretation)

Note: The workload was divided equally and fairly among the members from the
start. Every member worked out solutions and took part in the discussions to
adjust and complete the answers. All members were active, dedicated, and
punctual throughout this group assignment.

1. Problem Solving

Problem 1
We use the law of total probability together with conditional probability for
this problem.

Let A be the event that the man left his umbrella in the first shop and B the
event that he left it in the second shop. The probability that he leaves his
umbrella in any shop he visits is 1/5, and he can only leave it in the second
shop if he did not leave it in the first, so:

$$P(A) = \frac{1}{5}, \qquad P(B) = \frac{4}{5} \times \frac{1}{5} = \frac{4}{25}$$

By the law of total probability, the probability that he left it in one of the
two shops is:

$$P(A \cup B) = \frac{1}{5} + \frac{4}{25} = \frac{9}{25}$$

The given condition is that he left his umbrella in one of the two shops, so:

$$P(B \mid A \cup B) = \frac{4/25}{9/25} = \frac{4}{9}$$

Therefore, the probability that the man left his umbrella in the second shop is
4/9, or about 0.44.

Problem 2

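Assume that a coin is picked at random and flipped 100 times in total. The
expected counts are:

                                 Tails   Heads   Total

Normal fair coin                   25      25      50
Coin with heads on both sides       0      50      50
Total                              25      75     100

$$P(\text{normal fair coin} \mid \text{result comes up heads}) = \frac{25}{75} = \frac{1}{3}$$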
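
The same result can be cross-checked with Bayes' rule instead of a table of counts. The sketch below is illustrative only: it assumes, as the 50/50 split in the table does, that each coin is equally likely to be picked, and the variable names are ours.

```python
from fractions import Fraction

# Assumption: each coin is equally likely to be picked (the 50/50 split above).
prior_fair = Fraction(1, 2)
prior_two_headed = Fraction(1, 2)

# Probability of heads for each type of coin.
heads_given_fair = Fraction(1, 2)
heads_given_two_headed = Fraction(1)

# Law of total probability for P(heads), then Bayes' rule for the posterior.
p_heads = prior_fair * heads_given_fair + prior_two_headed * heads_given_two_headed
p_fair_given_heads = prior_fair * heads_given_fair / p_heads

print(p_heads)             # 3/4
print(p_fair_given_heads)  # 1/3
```

Exact fractions are used so that the output can be compared directly with 25/75.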

Problem 3

a) The probabilities must sum to 1, as is true for any probability
distribution. That means we have:

$$0.1 + p + q + 0.2 = 1 \quad (1)$$

Given that $E(X) = 1.5$ and $E(X) = \sum_{i=1}^{N} x_i P(x_i)$:

$$0 \times 0.1 + 1 \times p + 2 \times q + 3 \times 0.2 = 1.5$$
$$\Rightarrow p + 2q + 0.6 = 1.5 \quad (2)$$

From (1) and (2), we have:

$$p + q = 0.7, \qquad p + 2q = 0.9$$

Solving this set of equations, we find that: p = 0.5 and q = 0.2

b)
$$\mathrm{Var}(X) = \sum_{i=1}^{N} (x_i - \mu)^2 P(x_i) = (0-1.5)^2 \times 0.1 + (1-1.5)^2 \times 0.5 + (2-1.5)^2 \times 0.2 + (3-1.5)^2 \times 0.2 = 0.85$$
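
As a check on parts a) and b), the two equations can be solved and the variance recomputed with exact fractions. This is a minimal sketch; it assumes X takes the values 0, 1, 2, 3 with probabilities 0.1, p, q, 0.2, as used in the working above, and the variable names are ours.

```python
from fractions import Fraction

# Known part of the distribution: P(X=0) = 0.1 and P(X=3) = 0.2.
p0, p3 = Fraction(1, 10), Fraction(1, 5)
mean = Fraction(3, 2)  # E(X) = 1.5

# Equation (1): p + q  = 1 - P(X=0) - P(X=3)
# Equation (2): p + 2q = E(X) - 3 * P(X=3)
rhs1 = 1 - p0 - p3
rhs2 = mean - 3 * p3
q = rhs2 - rhs1  # subtract (1) from (2)
p = rhs1 - q

# Variance as the probability-weighted squared deviation from the mean.
probs = {0: p0, 1: p, 2: q, 3: p3}
variance = sum((x - mean) ** 2 * pr for x, pr in probs.items())

print(float(p), float(q), float(variance))  # 0.5 0.2 0.85
```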

Problem 4

a) The probability that the demand is exactly 2 fans in any one week is:

$$P(X=2) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{3.2^2 e^{-3.2}}{2!} = 0.2087$$

b) The probability that he will not be able to satisfy the demand for fans in that week is:
P(X>4) = 1 - P(X≤ 4)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]
= 1 - (0.0407 + 0.1304 + 0.2087 + 0.2226 + 0.178)
= 0.2196

c) We look for the least value of n for which the probability of his not being
able to satisfy the demand for fans in that week is less than 0.05, that is,
the least n with P(X>n) < 0.05.

Testing n = 5:

P(X>5) = 1 - P(X≤ 5)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)]
= 1 - (0.0407 + 0.1304 + 0.2087 + 0.2226 + 0.178 + 0.1139)
= 0.1057 (>0.05, unacceptable)

Testing n = 6:

P(X>6) = 1 - P(X≤ 6)
= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) + P(X=6)]
= 1 - (0.0407 + 0.1304 + 0.2087 + 0.2226 + 0.178 + 0.1139 + 0.0607)
= 0.045 (<0.05, acceptable)
⇒ n = 6
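
The Poisson calculations in parts a) to c) can be reproduced with a short script. This is a sketch under the assumption that weekly demand is Poisson with λ = 3.2, as used above; the function names are ours. Because the hand calculation rounds each term before summing, the full-precision tail probabilities can differ slightly in the later decimal places (for example, about 0.2194 for P(X>4)), but the conclusion is unchanged.

```python
from math import exp, factorial

LAM = 3.2  # mean weekly demand used in the text


def poisson_pmf(x, lam=LAM):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)


def prob_demand_exceeds(n, lam=LAM):
    """P(X > n) = 1 - P(X <= n)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))


def least_stock(threshold=0.05, lam=LAM):
    """Smallest n for which P(X > n) is below the threshold."""
    n = 0
    while prob_demand_exceeds(n, lam) >= threshold:
        n += 1
    return n


print(round(poisson_pmf(2), 4))  # 0.2087
print(round(prob_demand_exceeds(4), 4))
print(least_stock())             # 6
```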

Problem 5
a)
X = length of the bar (cm)
We are given that:
P(X<20.02) = 12%; P(X>20.06) = 33%
Because X follows a normal distribution, we can work with the standard normal
variable z:
P(z<a) = 12%; P(z<b) = 67% (obtained from P(z>b) = 33%)
⇔ a = -1.18; b = 0.44 (according to appendix C-2 of the textbook)
Here we have:

$$\frac{20.02 - \mu}{\sigma} = -1.18, \qquad \frac{20.06 - \mu}{\sigma} = 0.44$$

Solving this set of equations, we find that:

μ = Mean = 20.0491358
σ = Standard deviation = 2/81 ≈ 0.0247
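
The two standardisation equations can be solved in a couple of lines. The sketch below assumes the table z-values quoted above (-1.18 and 0.44) rather than exact quantiles; the variable names are ours.

```python
# z-values read from the textbook table, as quoted in the text.
z_low, z_high = -1.18, 0.44
x_low, x_high = 20.02, 20.06

# (x_low - mu) / sigma = z_low and (x_high - mu) / sigma = z_high.
# Subtracting the first equation from the second eliminates mu.
sigma = (x_high - x_low) / (z_high - z_low)
mu = x_low - z_low * sigma

print(round(sigma, 4), round(mu, 4))  # 0.0247 20.0491
```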

b)
Z-score formula: $z = \frac{x - \mu}{\sigma}$
In the case of x = 20.03, we have z = -0.775.
According to appendix C-2 of the textbook, this z-score corresponds to a
cumulative probability of 22.06%, that is, P(X<20.03) ≈ 22.06%. Hence the
proportion of steel bars which measure 20.03 cm or more is approximately
100% - 22.06% = 77.94%.
c)
P(reject) = P(X<20.02) + P(X>20.08)
= P(z<-1.18) + P(z>1.25)
= P(z<-1.18) + (1 - P(z<1.25))
= 11.9% + (100% - 89.44%) (according to appendix C-2 of the textbook)
= 22.46%
Conclusion: The percentage of bars rejected as being outside the
acceptable range is 22.46%.
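
Parts b) and c) can also be checked without a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the μ and σ found in part a). Since it evaluates the normal CDF directly instead of rounding z to two decimals, its results should be close to, but not identical with, the table-based figures (roughly 78% for part b and 22.5% for part c).

```python
from statistics import NormalDist

# Parameters found in part a).
bars = NormalDist(mu=20.0491358, sigma=2 / 81)

# b) Proportion of bars measuring 20.03 cm or more.
p_at_least_2003 = 1 - bars.cdf(20.03)

# c) Proportion of bars outside the range 20.02 cm to 20.08 cm.
p_reject = bars.cdf(20.02) + (1 - bars.cdf(20.08))

print(round(p_at_least_2003, 3))
print(round(p_reject, 3))
```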

2. Data Analysis

a)

Descriptive statistics of the given data:

MEASURES OF CENTER                 Women      Men    Both genders

Size                                 500      500        1000
Mean                              163.25   176.46      169.86
Mode                              163.00   177.00      177.00
Median                            163.00   177.00      170.00
Min                               146.00   154.00      146.00
Max                               180.00   195.00      195.00
Mid Range                         163.00   174.50      170.50

MEASURES OF VARIABILITY            Women      Men    Both genders

Range                              34.00    41.00       49.00
Variance                           36.00    34.12       78.64
Standard deviation                  6.00     5.84        8.87
Coefficient of Variation (CV)      3.68%    3.31%       5.22%
Q1                                159.00   172.00      163.00
Median                            163.00   177.00      170.00
Q3                                167.00   190.00      177.00
Interquartile range                 8.00    18.00       14.00
Mean absolute deviation             4.81     4.63        7.41
Kurtosis                           -0.08     0.42       -0.70
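
Several rows of the table are derived from other rows (range, mid range, interquartile range, and coefficient of variation). The sketch below recomputes them from the tabulated values only, since the raw data set is not reproduced here; the dictionary layout and function name are ours.

```python
# Summary values copied from the table above (heights in cm).
summary = {
    "Women": {"mean": 163.25, "sd": 6.00, "min": 146.0, "max": 180.0,
              "q1": 159.0, "q3": 167.0},
    "Men": {"mean": 176.46, "sd": 5.84, "min": 154.0, "max": 195.0,
            "q1": 172.0, "q3": 190.0},
    "Both": {"mean": 169.86, "sd": 8.87, "min": 146.0, "max": 195.0,
             "q1": 163.0, "q3": 177.0},
}


def derived_measures(s):
    """Recompute the rows that follow from other rows of the table."""
    return {
        "range": s["max"] - s["min"],
        "mid_range": (s["min"] + s["max"]) / 2,
        "iqr": s["q3"] - s["q1"],
        "cv_percent": round(100 * s["sd"] / s["mean"], 2),
    }


for group, stats in summary.items():
    print(group, derived_measures(stats))
```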

And the histograms for each gender and for the whole data set:

b) The data represent the heights of 500 American men and 500 American
women. To compare and contrast the distributions of the two groups, we may
either use the descriptive statistics or interpret some suitable diagrams.
- From the descriptive statistics:
● The distributions of both data sets (men and women) are
roughly unimodal and close to symmetric (because the mean, mode, and median
are approximately equal), but that of men is slightly left skewed (because
the mean is less than the median) and that of women is slightly right skewed
(because the mean is greater than the median).
● The distribution of women's heights may contain low outliers, given its
lower variability in terms of range and interquartile range (its variance
and standard deviation are slightly higher than those of men).
● The interquartile range is an appropriate measure of the spread of a
distribution; here, the two IQRs are far apart (8.00 for women and 18.00
for men), which suggests that the distributions of the heights of
American women and men are quite different.
- From some specific diagrams:

The histogram:
● Both distributions are roughly bell-shaped, so the heights spread
approximately equally on both sides of the centre.
● The distribution of women's heights clearly has fatter tails.
● On the left side, the men's heights show some strange outliers that are
isolated from the rest of the group. Similarly, the women's heights also
show such strange outliers on the left side.

The CDF: Since the blue line is above the red line, the distribution of
women's heights has fatter tails. Normally, fatter tails mean that extreme
events are more probable than usual. In this case, the variability measures
in the descriptive statistics and the histogram indicate that the
distribution of men's heights contains more outliers than that of women's
heights. Therefore, extreme values of women's heights are likely to be more
frequent than those of men's heights, but they fall into the unusual group
rather than the outlier group.

The dotplot: It is similar to the histogram in showing the spread, but it
still reveals something interesting: there are also "strange outliers" on the
right side of the distribution of men's heights. The dot plot also shows the
shape of the distribution more clearly than the histogram (because the sample
here is large enough), and the shape is quite irregular in both groups. It is
neither really bell-shaped nor normally distributed: many observations do not
follow the pattern, and some clusters appear to come from another population.
Because we are not told that the 500 men and 500 women were randomly chosen,
a possible explanation is that each group contains several small subgroups
with slightly different heights (sampling error seems unlikely because the
departure is substantial). For example, perhaps the data for the 500 women
were drawn from different regions of the US with different average heights
(with unequal sample sizes per region), or perhaps the 500 men include a
group of adolescents whose average height is lower than that of the
population.

c) We use a two-tailed test. The null hypothesis conforms to the statement.

H0: μ = 178 cm (the average height of American men is 178 cm)

H1: μ ≠ 178 cm (the average height of American men is not 178 cm)

For α = 0.05, the two-tailed critical value for d.f. = n - 1 = 500 - 1 = 499
degrees of freedom is 1.96. It can be obtained from either appendix C-2 or
appendix D: when the sample size is greater than 30, the z-score can normally
be used in place of the t-score even if σ is unknown, so with a sample size
of 500 either test gives an accurate answer.

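We will reject H0 if t_calc > 1.96 or if t_calc < -1.96, as illustrated in the figure.

Calculate the test statistic, using the sample mean (176.46) and standard
deviation (5.84) of men's heights from part a):

$$t_{calc} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{176.46 - 178}{5.84 / \sqrt{500}} \approx -5.90$$

Since the test statistic falls in the left tail of the rejection region, we
reject the null hypothesis H0: μ = 178 cm and conclude H1: μ ≠ 178 cm at the
5 percent level of significance.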

Conclusion: There is enough evidence to support that the average
height of American men is not 178 cm at the 5% level of significance.
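
The one-sample test in part c) can be recomputed from the summary statistics in part a). This is a minimal sketch that uses the men's sample mean, standard deviation, and size from the table and the critical value 1.96 quoted above; the variable names are ours.

```python
from math import sqrt

# Men's heights, taken from the descriptive statistics table.
n, sample_mean, sample_sd = 500, 176.46, 5.84
mu0 = 178.0      # hypothesised mean under H0
critical = 1.96  # two-tailed critical value at alpha = 0.05

# One-sample t statistic: (sample mean - mu0) / standard error.
t_calc = (sample_mean - mu0) / (sample_sd / sqrt(n))
reject_h0 = abs(t_calc) > critical

print(round(t_calc, 2), reject_h0)  # -5.9 True
```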

d) We will do a two-tailed test at α = 0.05, where μ1 is the mean height of
men and μ2 is the mean height of women. The hypotheses are:

H0: μ1 – μ2 = 14 cm (on average, American men are 14 cm taller than women)

H1: μ1 – μ2 ≠ 14 cm (the statement is false)

Because the population variances are unknown and the sample standard
deviations differ (6.00 for women's heights and 5.84 for men's heights),
we will assume that the population variances are unequal. Therefore, we will
apply the formula for the case “unknown variances, assumed unequal”.

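We will have the t statistic:

$$t_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - 14}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{(176.46 - 163.25) - 14}{\sqrt{34.12/500 + 36.00/500}} \approx -2.11$$

If we use the quick rule for degrees of freedom, we get d.f. =
min(n1 – 1, n2 – 1) = 499 and a two-tailed critical value of 1.96. The t
statistic therefore falls in the left tail of the rejection region.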

Conclusion: There is enough evidence to support that, on average,
American men are not 14 cm taller than women at the 5% level of significance.
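
The two-sample statistic in part d) can be recomputed from the table in part a). The sketch below assumes unequal population variances, as stated above, and uses the quick-rule critical value of 1.96; the variable names are ours.

```python
from math import sqrt

# Sample sizes, means, and variances from the descriptive statistics table.
n_men, mean_men, var_men = 500, 176.46, 34.12
n_women, mean_women, var_women = 500, 163.25, 36.00
d0 = 14.0        # hypothesised difference under H0
critical = 1.96  # two-tailed critical value at alpha = 0.05

# Standard error of the difference for unequal variances.
std_error = sqrt(var_men / n_men + var_women / n_women)
t_calc = ((mean_men - mean_women) - d0) / std_error
reject_h0 = abs(t_calc) > critical

print(round(t_calc, 2), reject_h0)  # -2.11 True
```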
