Module 3 (301 SI-1)


RV Institute of Technology & Management ®

MODULE-III

STATISTICAL INFERENCE 1

Topic Learning Objectives:

Upon completion of this module, the student will be able to:

• Demonstrate the validity of testing a hypothesis.
• Solve problems on hypothesis testing and on probability distribution functions of two
variables.
• Apply discrete and continuous distributions in analyzing the probability models
arising in engineering.

Hypothesis Testing
Introduction:
Statistical inference is the branch of statistics that uses probability concepts to deal with
uncertainty in decision making. We come across many situations that involve such decisions.
For example, consider buying 1 kilogram of rice. When we visit the shop, we do not check each
and every grain stored in the gunny bag; rather, we put a hand inside the bag, take out a sample
of grains and examine it. Based on this sample, we decide whether or not to buy. Thus, the
problem involves studying all the rice stored in the bag using only a sample of rice grains.

First, what is meant by hypothesis testing?

It means testing a hypothetical statement about a parameter of a population.
Conventional approach to testing:
The procedure involves the following:
1. First, we set up a definite statement about the population parameter, called the null
hypothesis and denoted by $H_0$. The null hypothesis is the statement that is tested for possible
rejection under the assumption that it is true. Next, we set up another statement, called the
alternative hypothesis and denoted by $H_1$, which is complementary to the null hypothesis.
Therefore, if we start with $H_0: \mu = \mu_0$, then the alternative hypothesis may be any one of
the following statements:

$H_1: \mu \neq \mu_0$, or $H_1: \mu < \mu_0$, or $H_1: \mu > \mu_0$.

Because we study a population parameter on the basis of a sample, the job cannot be done with
100% accuracy: the sample is drawn from the population, and a particular sample may not
represent the whole population. Therefore, we conduct the analysis at a chosen confidence level
lower than 100%, such as 99%, 98%, 95% or 90%. The complement of the confidence level is called
the level of significance and is denoted by $\alpha$; it is usually expressed as a percentage,
such as 5% or 1% (corresponding to 95% or 99% confidence). We test $H_0$ against $H_1$ at this
level of significance, and the confidence with which a person rejects or accepts $H_0$ depends
upon the significance level adopted. When $\alpha$ is set at 5%, the probability of rejecting the
null hypothesis when it is true is only 5%. Equivalently, a statistician who tests true
hypotheses at the 5% level will, in the long run, wrongly reject them only 5% of the time.
Choosing the level of significance is the second step of hypothesis testing.

III-Semester, Mathematics for Computer Science (MCS) (BCS301)

Critical values (fiducial limit values) for a two-tailed test:

| Sl. No. | Level of significance | Theoretical value |
|---|---|---|
| 1 | $\alpha = 1\%$ | 2.58 |
| 2 | $\alpha = 2\%$ | 2.33 |
| 3 | $\alpha = 5\%$ | 1.96 |

Critical values (fiducial limit values) for a single-tailed test (right and left):

| Tabulated value | $\alpha = 1\%$ | $\alpha = 5\%$ | $\alpha = 10\%$ |
|---|---|---|---|
| Right-tailed test | 2.33 | 1.645 | 1.28 |
| Left-tailed test | -2.33 | -1.645 | -1.28 |
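
The tabulated critical values can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `critical_value` is an illustrative choice and not part of the notes.

```python
from statistics import NormalDist


def critical_value(alpha, two_tailed=True):
    """Return the positive critical z value for significance level alpha.

    For a two-tailed test, alpha is split equally between both tails;
    for a single-tailed test, the whole of alpha lies in one tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


# Two-tailed values for alpha = 1%, 2%, 5%
for alpha in (0.01, 0.02, 0.05):
    print(f"two-tailed, alpha={alpha}: {critical_value(alpha):.3f}")

# Right-tailed values for alpha = 1%, 5%, 10% (left-tailed values are the negatives)
for alpha in (0.01, 0.05, 0.10):
    print(f"one-tailed, alpha={alpha}: {critical_value(alpha, two_tailed=False):.3f}")
```

Rounded to the precision used in the tables, these agree with the tabulated values.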


Setting a test criterion: The third step in the hypothesis testing procedure is to construct a
test criterion. This involves selecting an appropriate probability distribution for the
particular test. The t and F distributions are used when the sample size is small (lower than
30); for large samples, the normal distribution is preferred. The next step is to compute the
test statistic from the sample items drawn from the population. Samples are usually drawn by
random sampling, in which every item of the population has the same chance of being included in
the sample. The computed value of the test criterion is then compared with the tabulated value:
for a two-tailed test, as long as the magnitude of the calculated value is lower than or equal
to the tabulated value, we accept the null hypothesis; otherwise, we reject the null hypothesis
and accept the alternative hypothesis. Decisions are valid only at the particular level of
significance adopted.
During the course of the analysis, two types of errors can occur: (i) Type I error and
(ii) Type II error.

Type I error: This error occurs when the null hypothesis is true but we reject it, i.e.
rejection of a correct (true) hypothesis constitutes a Type I error. The probability of
committing a Type I error is denoted by $\alpha$, where
$\alpha$ = Probability of making a Type I error = Probability [Rejecting $H_0$ | $H_0$ is true]

Type II error: Here, the null hypothesis is actually false but we accept it, i.e. a Type II
error is committed by not rejecting a hypothesis when it is false. The probability of
committing this error is denoted by $\beta$, where
$\beta$ = Probability of making a Type II error = Probability [Accepting $H_0$ | $H_0$ is false]

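Critical region:
A region in the sample space S that amounts to rejection of $H_0$ is termed the critical region.

One-tailed test and two-tailed test:
Whether a test is one-tailed or two-tailed depends upon how the null and alternative hypotheses
are set up: $H_1: \mu \neq \mu_0$ leads to a two-tailed test, whereas $H_1: \mu < \mu_0$ or
$H_1: \mu > \mu_0$ leads to a left-tailed or right-tailed test, respectively.

A note on the computed test criterion value:

1. When the sampling distribution is based on a population of proportions or means, the test
criterion may be given as

$$Z_{cal} = \frac{\text{Observed result} - \text{Expected result}}{\text{Standard error of the distribution}}$$

For a two-tailed test, only the magnitude of $Z_{cal}$ is compared with the tabulated value.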

Application of standard error:

1. The standard error (S.E.) enables us to determine the probable limits within which the
population parameter may be expected to lie. For example, the probable limits for a population
proportion are given by $p \pm 3\sqrt{pq/n}$. Here, $p$ represents the chance of a success in a
single trial, $q = 1 - p$ is the chance of a failure in the trial and $n$ is the size of the
sample.
2. The magnitude of the standard error gives an index of the precision of the estimate of the
parameter.

Probable limits of the population mean are:
95% fiducial limits of the population mean: $\bar{X} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
99% fiducial limits of the population mean: $\bar{X} \pm 2.58\,\dfrac{\sigma}{\sqrt{n}}$
Further, the test criterion is $z_{cal} = \dfrac{\bar{x} - \mu}{\text{S.E.}}$.

The binomial distribution is regarded as the sampling distribution of the number of successes
in the sample. The mean of this distribution is $np$ and its standard deviation is $\sqrt{npq}$.
The standard error of the proportion of successes is $\sqrt{pq/n}$.
The associated standard normal variate is defined by $z = \dfrac{x - np}{\sqrt{npq}}$. The
probable limits at the 99% level are $p \pm 2.58\sqrt{pq/n}$.


Problems:
1. A coin is tossed 400 times and heads turn up 216 times. Test the hypothesis that the coin is
unbiased.
Solution: First, we set up the null and alternative hypotheses: $H_0$: the coin is unbiased;
$H_1$: the coin is biased. If the coin is fair and it is tossed 400 times, we expect 200 heads
and 200 tails. Thus, the expected number of heads is 200, whereas the observed number is 216, a
difference of 16. The standard error is $\sigma = \sqrt{npq}$; with $p = 1/2$, $q = 1/2$ and
$n = 400$, we get $\sigma = 10$. The test criterion is

$$z_{cal} = \frac{\text{difference}}{\text{standard error}} = \frac{216 - 200}{10} = 1.6$$

If we choose $\alpha = 5\%$, the tabulated value for a two-tailed test is 1.96. Since the
calculated value is lower than the tabulated value, we accept the null hypothesis that the coin
is unbiased.
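
The arithmetic of this kind of problem can be checked with a few lines of Python. This is a minimal sketch, assuming the normal approximation to the binomial used in the solution; the function name `z_for_count` is hypothetical.

```python
import math


def z_for_count(observed, n, p):
    """z statistic for an observed number of successes in n trials,
    under the null hypothesis that the success probability is p."""
    expected = n * p                              # expected number of successes
    standard_error = math.sqrt(n * p * (1 - p))   # sqrt(npq)
    return (observed - expected) / standard_error


# Coin problem: 216 heads in 400 tosses, H0: p = 1/2
z = z_for_count(observed=216, n=400, p=0.5)
print(f"z_cal = {z:.2f}")
print("accept H0 at 5%" if abs(z) <= 1.96 else "reject H0 at 5%")
```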

2. A person throws 10 dice 500 times and obtains 2560 times 4, 5 or 6. Can this be attributed
to fluctuations in sampling?
Solution: As in the previous problem, we set up $H_0$: the dice are fair and $H_1$: the dice are
unfair, and treat the problem as a two-tailed test. Choosing the level of significance
$\alpha = 5\%$, the tabulated value is 1.96. The test criterion is

$$z_{cal} = \frac{\text{Observed result} - \text{Expected value}}{\text{standard error}}$$

Here, 10 dice thrown 500 times give $n = 5000$ throws. If a die is fair, the chance of getting
any one of the 6 numbers is 1/6, so the chance of getting 4, 5 or 6 is $p = 1/2$; also
$q = 1/2$. With $n = 5000$, the standard error is $\sigma = \sqrt{npq} = 35.36$, and the
expected number of times 4, 5 or 6 is obtained is 2500. Hence,

$$z_{cal} = \frac{2560 - 2500}{35.36} = 1.7$$

which is lower than 1.96. Hence, we accept $H_0$: the difference can be attributed to
fluctuations in sampling, and the dice are fair.
3. A sample of 1000 days is taken from the meteorological records of a certain district, and
120 of them are found to be foggy. What are the probable limits to the percentage of foggy days
in the district?
Solution: Let $p$ denote the probability that a day in the district is foggy, as estimated from
the meteorological records. Clearly, $p = \dfrac{120}{1000} = 0.12$ and $q = 0.88$. With
$n = 1000$, the probable limits to the proportion of foggy days are given by
$p \pm 3\sqrt{pq/n}$. Using the data of this problem, one obtains
$0.12 \pm 3\sqrt{\dfrac{0.12 \times 0.88}{1000}} = 0.12 \pm 0.0308$. Equivalently, the
percentage of foggy days lies between 8.92% and 15.08%.
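
The probable limits can be evaluated numerically as follows. This is a minimal sketch of the formula $p \pm k\sqrt{pq/n}$ with $k = 3$ by default; the function name `probable_limits` and its parameter `k` are illustrative assumptions.

```python
import math


def probable_limits(p, n, k=3.0):
    """Return (lower, upper) limits p -/+ k * sqrt(p*q/n) for a proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - k * standard_error, p + k * standard_error


# Foggy-days problem: 120 foggy days in a sample of 1000 days
low, high = probable_limits(p=120 / 1000, n=1000)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

Passing `k=2.58` or `k=1.96` instead gives the 99% or 95% limits used elsewhere in this module.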

4. A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the
assumption of random throwing, do the data indicate that the die is biased?
Solution: We set up the null hypothesis $H_0$: the die is unbiased, and the alternative
$H_1$: the die is biased. Let us take the level of significance as $\alpha = 5\%$. Assuming that
the sampling distribution is normal, the tabulated value for a two-tailed test is 1.96. The
chance of getting each of the 6 numbers is the same and equals 1/6; therefore, the chance of
getting either 5 or 6 is 1/3. In 9000 throws, the expected number of times 5 or 6 is obtained
is $\frac{1}{3} \times 9000 = 3000$. The difference between the observed and expected results is
240. With $p = 1/3$, $q = 2/3$ and $n = 9000$, S.E. $= \sqrt{npq} = 44.72$. The test criterion is

$$z_{cal} = \frac{\text{Difference}}{\text{S.E.}} = \frac{240}{44.72} = 5.367$$

which is more than the tabulated value. Therefore, we reject the null hypothesis and accept the
alternative that the die is biased.

Tests of significance for large samples:


In the previous section, we discussed problems pertaining to the sampling of attributes. We now
turn to the sampling of variables that one may come across in practical situations, such as
height, weight, etc. We say that a sample is small when its size is lower than 30; otherwise,
it is called a large sample.
The study here is based on the following assumptions: (i) the random sampling distribution of
a statistic is approximately normal, and (ii) the values given by the sample are sufficiently
close to the population values and can be used in their place for calculating the standard
error. When the standard deviation of the population is known,
S.E.$(\bar{X}) = \dfrac{\sigma_p}{\sqrt{n}}$, where $\sigma_p$ denotes the standard deviation of
the population. When the standard deviation of the population is unknown,
S.E.$(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ is the standard deviation of the sample.
Fiducial limits of the population mean are:
95% fiducial limits of the population mean: $\bar{X} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
99% fiducial limits of the population mean: $\bar{X} \pm 2.58\,\dfrac{\sigma}{\sqrt{n}}$
Further, the test criterion is $z_{cal} = \dfrac{\bar{x} - \mu}{\text{S.E.}}$.

Testing the hypothesis

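| | Significance of proportions | Significance of a sample mean | Significance of difference between proportions | Significance of difference between means |
|---|---|---|---|---|
| Standard normal variate (s.n.v.) | $z = \dfrac{x - np}{\sqrt{npq}}$ | $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ | $z = \dfrac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$ | $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ |
| Probable limits | $p \pm z_c\sqrt{pq/n}$ | $\bar{x} \pm z_c\,(\sigma/\sqrt{n})$ | $(p_1 - p_2) \pm z_c\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ | $(\bar{x} - \bar{y}) \pm z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ |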

Problems:
1. A sample of 100 tyres is taken from a lot. The mean life of the tyres is found to be 39,350
kilometers with a standard deviation of 3,260 kilometers. Could the sample come from a
population with a mean life of 40,000 kilometers? Establish 99% confidence limits within which
the mean life of the tyres is expected to lie.
Solution: First, we set up the null hypothesis $H_0: \mu = 40{,}000$ and the alternative
hypothesis $H_1: \mu \neq 40{,}000$. We treat the problem as a two-tailed test and choose
$\alpha = 5\%$; the corresponding tabulated value is 1.96. The test criterion is
$z_{cal} = \dfrac{\bar{x} - \mu}{\text{S.E.}}$. Here, $\mu = 40{,}000$, $\bar{x} = 39{,}350$,
$\sigma = 3{,}260$ and $n = 100$, so that
S.E. $= \dfrac{\sigma}{\sqrt{n}} = \dfrac{3{,}260}{\sqrt{100}} = 326$. Thus,
$z_{cal} = \dfrac{39{,}350 - 40{,}000}{326} = -1.994$, so $|z_{cal}| = 1.994$. As this value is
slightly greater than 1.96, we reject the null hypothesis and conclude that the sample has not
come from a population with a mean life of 40,000 kilometers.
The 99% confidence limits within which the population mean is expected to lie are given by
$\bar{x} \pm 2.58 \times \text{S.E.}$, i.e. $39{,}350 \pm 2.58 \times 326 = (38{,}509,\ 40{,}191)$.


2. The mean lifetime of a sample of 400 fluorescent light bulbs produced by a company is
found to be 1,570 hours with a standard deviation of 150 hours. Test the hypothesis that the
mean lifetime of the bulbs is 1,600 hours against the alternative hypothesis that it is greater
than 1,600 hours at the 1% and 5% levels of significance.
Solution: First, we set up the null hypothesis $H_0: \mu = 1{,}600$ hours. We treat the problem
as a two-tailed test with the alternative hypothesis $H_1: \mu \neq 1{,}600$ hours; the
tabulated values at $\alpha = 5\%$ and $\alpha = 1\%$ are 1.96 and 2.58 respectively. The test
criterion is $z_{cal} = \dfrac{\bar{x} - \mu}{\text{S.E.}}$. Here, $\mu = 1{,}600$,
$\bar{x} = 1{,}570$, $n = 400$ and $\sigma = 150$ hours, so that
S.E. $= \dfrac{150}{\sqrt{400}} = 7.5$ and $z_{cal} = \dfrac{1{,}570 - 1{,}600}{7.5} = -4.0$.
Since $|z_{cal}| = 4.0$ is greater than both 1.96 and 2.58, we reject the null hypothesis at
both levels and accept the two-sided alternative. Because the sample mean lies below 1,600
hours, the data give no support to the one-sided alternative $\mu > 1{,}600$: for a right-tailed
test, $z_{cal} = -4.0$ lies far from the critical region $z > 1.645$ (or $z > 2.33$).


3. A light bulb company claims that the 100-watt light bulb it sells has an average life of
1200 hours with a standard deviation of 100 hours. To test the claim, 50 new bulbs were
selected randomly and allowed to burn out. The average lifetime of these bulbs was found to
be 1180 hours. Is the company’s claim true at the 5% level of significance?
Solution: Here, we are given that
Specified value of population mean = $\mu$ = 1200 hours,
Population standard deviation = $\sigma$ = 100 hours,
Sample size = $n$ = 50,
Sample mean = $\bar{X}$ = 1180 hours.
$H_0: \mu = 1200$
$H_1: \mu \neq 1200$
Thus, for testing the null hypothesis, the test statistic is given by

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{1180 - 1200}{100/\sqrt{50}} = -1.41$$

The critical (tabulated) values at the 5% level of significance for this two-tailed test are
$\pm z_{\alpha/2} = \pm 1.96$.
Since $|Z| = 1.41 < 1.96$, we accept the null hypothesis and reject the alternative hypothesis.
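
The large-sample test for a single mean, and the corresponding confidence limits, can be sketched in Python as below. The function names `z_for_mean` and `mean_limits` are hypothetical, and the sketch assumes the normal approximation used throughout this section.

```python
import math


def z_for_mean(sample_mean, mu0, sd, n):
    """z statistic for testing H0: mu = mu0 from a large sample."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


def mean_limits(sample_mean, sd, n, z_c):
    """Confidence (fiducial) limits sample_mean -/+ z_c * sd / sqrt(n)."""
    margin = z_c * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Light bulb problem: n = 50, sample mean 1180, claimed mean 1200, sd 100
print(f"z = {z_for_mean(1180, 1200, 100, 50):.2f}")

# Tyre problem: 99% limits for the mean life (z_c = 2.58)
low, high = mean_limits(39350, 3260, 100, z_c=2.58)
print(f"99% limits: ({low:.0f}, {high:.0f})")
```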


4. A manufacturer of ball point pens claims that a certain pen manufactured by him has a mean
writing life of at least 460 A-4 size pages. A purchasing agent selects a sample of 100 pens and
puts them to the test. The mean writing life of the sample is found to be 453 A-4 size pages
with a standard deviation of 25 A-4 size pages. Should the purchasing agent reject the
manufacturer’s claim at the 1% level of significance?
Solution: Here, we are given that
Specified value of population mean = $\mu$ = 460,
Sample size = $n$ = 100,
Sample mean = $\bar{X}$ = 453,
Sample standard deviation = $\sigma$ = 25.
$H_0: \mu \geq 460$
$H_1: \mu < 460$
Thus, for testing the null hypothesis, the test statistic is given by

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{453 - 460}{25/\sqrt{100}} = -2.8$$

The critical (tabulated) value at the 1% level of significance for this left-tailed test is
$-z_{\alpha} = -2.33$.
Since $Z = -2.8 < -2.33$, we reject the null hypothesis and accept the alternative hypothesis;
the purchasing agent should reject the manufacturer’s claim.
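
The decision rule depends on whether the test is two-tailed, left-tailed or right-tailed. The following sketch encodes the three critical regions; the function name `reject_null` and the `tail` labels are illustrative assumptions, and the critical values come from the standard normal distribution rather than from the rounded table.

```python
from statistics import NormalDist


def reject_null(z_cal, alpha, tail="two"):
    """Return True if z_cal falls in the critical region at level alpha.

    tail = "two"   : reject when |z| exceeds the upper alpha/2 point
    tail = "left"  : reject when z lies below the lower alpha point
    tail = "right" : reject when z lies above the upper alpha point
    """
    std_normal = NormalDist()
    if tail == "two":
        return abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":
        return z_cal < std_normal.inv_cdf(alpha)
    if tail == "right":
        return z_cal > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Ball point pen problem: left-tailed test at the 1% level with z = -2.8
print(reject_null(-2.8, alpha=0.01, tail="left"))
```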

Test of significance of difference between the means of two samples

Consider two populations $P_1$ and $P_2$. Let $S_1$ and $S_2$ be two samples drawn at random
from these two populations. Suppose we have the following data about the two samples:

| Sample | Sample size | Mean | Standard deviation |
|---|---|---|---|
| $S_1$ | $n_1$ | $\bar{x}_1$ | $\sigma_1$ |
| $S_2$ | $n_2$ | $\bar{x}_2$ | $\sigma_2$ |

Then the standard error of the difference between the means of the two samples $S_1$ and $S_2$ is

$$\text{S.E.} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

and the test criterion is $Z_{cal} = \dfrac{\text{Difference of sample means}}{\text{Standard error}}$.
The rest of the analysis is the same as in the preceding sections.

When the two samples are drawn from the same population with standard deviation $\sigma$, the
standard error is

$$\text{S.E.} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

and the test criterion is again $Z_{cal} = \dfrac{\text{Difference of sample means}}{\text{Standard error}}$.

When the population standard deviations are unknown, they are replaced by the standard
deviations of the two samples. Thus, $\text{S.E.} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$,
where $s_1$ and $s_2$ are the standard deviations of the two samples considered in the problem.

Problems:

1. An intelligence test on two groups of boys and girls gave the following data:

| Data | Mean | Standard deviation | Sample size |
|---|---|---|---|
| Boys | 75 | 15 | 150 |
| Girls | 70 | 20 | 250 |

Is there a significant difference in the mean scores obtained by boys and girls?
Solution: We set up the null hypothesis $H_0$: there is no significant difference between the
mean scores obtained by boys and girls. The alternative hypothesis is $H_1$: there is a
significant difference in the mean scores obtained by boys and girls. We choose the level of
significance $\alpha = 5\%$, so that the tabulated value is 1.96. The test criterion is
$z_{cal} = \dfrac{\text{Difference of means}}{\text{Standard error}}$. The standard error is

$$\text{S.E.} = \sqrt{\frac{15^2}{150} + \frac{20^2}{250}} = 1.761$$

so the test criterion is $z_{cal} = \dfrac{75 - 70}{1.761} = 2.84$. As 2.84 is more than 1.96, we
reject the null hypothesis and accept the alternative hypothesis that there is a significant
difference in the mean scores of boys and girls.
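
The two-sample computation can be sketched in Python as follows. The function name `z_for_mean_difference` is hypothetical; the sketch uses the sample standard deviations in place of the population values, as in the large-sample formula of this section.

```python
import math


def z_for_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two large-sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Intelligence test problem: boys (75, 15, 150) versus girls (70, 20, 250)
z = z_for_mean_difference(75, 15, 150, 70, 20, 250)
print(f"z_cal = {z:.2f}")
print("reject H0 at 5%" if abs(z) > 1.96 else "accept H0 at 5%")
```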

2. A man buys 50 electric bulbs of “Philips” and 50 bulbs of “Surya”. He finds that the Philips
bulbs give an average life of 1,500 hours with a standard deviation of 60 hours and the Surya
bulbs give an average life of 1,512 hours with a standard deviation of 80 hours. Is there a
significant difference in the mean life of the two makes of bulbs?

Solution: We set up the null hypothesis $H_0$: there is no significant difference between the
mean life of the bulbs made by the two companies, and the alternative hypothesis $H_1$: there is
a significant difference in the mean life of the bulbs. Taking $\alpha = 1\%$ and
$\alpha = 5\%$, the respective tabulated values are 2.58 and 1.96. The standard error is

$$\text{S.E.} = \sqrt{\frac{60^2}{50} + \frac{80^2}{50}} = 14.14$$

so that $z_{cal} = \dfrac{1512 - 1500}{14.14} = 0.849$. Since the calculated value is lower than
both tabulated values, we accept the hypothesis that there is no significant difference in the
mean life of the two makes of bulbs.

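3. A random sample of size $N = 100$ is taken from a population with standard deviation
$\sigma = 5.1$. Given that the sample mean is $\bar{X} = 21.6$, obtain the 95% confidence
interval for the population mean $\mu$.
Solution: $N = 100$, $\sigma = 5.1$, $\bar{X} = 21.6$.
Confidence limits for the population mean are

$$\bar{X} \pm Z_c\frac{\sigma}{\sqrt{N}} = 21.6 \pm Z_c\frac{5.1}{\sqrt{100}}$$

For the 95% confidence level, $Z_c = 1.96$.

$$\therefore\ \bar{X} \pm Z_c\frac{\sigma}{\sqrt{N}} = 21.6 \pm (1.96)\frac{5.1}{\sqrt{100}} = 21.6 \pm 0.9996$$

That is, the 95% confidence interval for $\mu$ is approximately (20.60, 22.60).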

4. One type of aircraft is found to develop engine trouble in 5 flights out of 100 flights and
another type in 7 flights out of 200 flights. Is there a significant difference between the two
types of aircraft so far as engine defects are concerned?
Solution: Let $p_1$ be the proportion of type 1 aircraft flights that develop engine trouble;
then $p_1 = 5/100 = 0.05$ in the sample of size $N_1 = 100$.
Let $p_2$ be the proportion of type 2 aircraft flights that develop engine trouble; then
$p_2 = 7/200 = 0.035$ in the sample of size $N_2 = 200$.
Let us make the hypothesis that there is no difference between the two types of aircraft.
Then the mean of the distribution of differences in proportions is zero, that is,
$\mu_{(p_1 - p_2)} = 0$, and the standard deviation of the distribution is

$$\sigma_{(p_1 - p_2)} = \sqrt{\frac{p_1 q_1}{N_1} + \frac{p_2 q_2}{N_2}} = \sqrt{\frac{0.05(0.95)}{100} + \frac{0.035(0.965)}{200}} = 0.0254$$

The corresponding Z-score is

$$z = \frac{(p_1 - p_2) - \mu_{(p_1 - p_2)}}{\sigma_{(p_1 - p_2)}} = \frac{0.05 - 0.035}{0.0254} = 0.591$$

This Z-score is less than both $Z_c = 1.96$ (5% level) and $Z_c = 2.58$ (1% level). Therefore,
the hypothesis cannot be rejected. This means that the difference between the two types of
aircraft is not significant.
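
The standard deviation used in this solution combines the two sample proportions without pooling them. A minimal sketch of that computation is given below; the function name `z_two_proportions_unpooled` is hypothetical.

```python
import math


def z_two_proportions_unpooled(x1, n1, x2, n2):
    """z statistic for p1 - p2 with standard error sqrt(p1*q1/n1 + p2*q2/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


# Aircraft problem: 5 troubles in 100 flights versus 7 troubles in 200 flights
z = z_two_proportions_unpooled(5, 100, 7, 200)
print(f"z = {z:.3f}")
print("significant at 5%" if abs(z) > 1.96 else "not significant at 5%")
```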

5. A sample of the heights of 6400 soldiers has a mean of 172.34 cm and a standard deviation of
6.5 cm, while a sample of the heights of 1600 sailors has a mean of 174.12 cm and a standard
deviation of 6.4 cm. Do the data indicate that the sailors are, on average, taller than the
soldiers? Use the left-tailed test.
Solution: Here,
$N_1 = 6400$, $\bar{X}_1 = 172.34$, $s_1 = 6.5$, $N_2 = 1600$, $\bar{X}_2 = 174.12$, $s_2 = 6.4$.
We test the null hypothesis
$H_0$: there is no difference in the heights of soldiers and sailors (on average), against the
alternative hypothesis
$H_1$: sailors are taller than soldiers, on average; that is, $\mu_2 > \mu_1$, or $\mu_1 - \mu_2 < 0$.
Under the hypothesis $H_0$, we have $\mu_{(\bar{X}_1 - \bar{X}_2)} = \mu_1 - \mu_2 = 0$ and

$$\sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}} = \sqrt{\frac{(6.5)^2}{6400} + \frac{(6.4)^2}{1600}} = 0.179$$

The corresponding Z-score is

$$z = \frac{(172.34 - 174.12) - 0}{0.179} = -9.94$$

This score lies in the critical region $z < -z_c$ for the left-tailed test at both the 0.01 and
0.05 levels of significance ($z_c = 2.33$ and $z_c = 1.645$ respectively). Therefore, we reject
the null hypothesis $H_0$ and accept the alternative hypothesis $H_1$. Thus, the data indicate
that, on average, the sailors are taller than the soldiers.

6. In a sample of 500 men, it was found that 60% of them were overweight. Find the 99%
confidence limits for the proportion of men in the population who are overweight.
Solution: Proportion of men who are overweight: $p = 60\% = 0.60$, so $q = 40\% = 0.40$.
For the 99% confidence level, $Z_c = 2.58$.
The probable limits are

$$p \pm z_c\,\sigma_p = p \pm z_c\sqrt{\frac{pq}{N}} = 0.6 \pm (2.58)\sqrt{\frac{0.6 \times 0.4}{500}} = 0.6 \pm 0.057$$

7. In two samples of women from Punjab and Tamilnadu, the mean heights of 1000 and 2000 women
are 67.6 and 68.0 inches respectively. If the population standard deviations of Punjab and
Tamilnadu are the same and equal to 5.5 inches, can the mean heights of Punjab and Tamilnadu
women be regarded as the same at the 1% level of significance?
Solution: We are given
$n_1 = 1000$, $n_2 = 2000$, $\bar{x} = 67.6$, $\bar{y} = 68.0$, $\sigma_1 = \sigma_2 = 5.5$
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Thus, for testing the null hypothesis, the test statistic is given by

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{67.6 - 68.0}{\sqrt{\frac{(5.5)^2}{1000} + \frac{(5.5)^2}{2000}}} = -1.88$$

The critical (tabulated) values at the 1% level of significance for this two-tailed test are
$\pm z_{\alpha/2} = \pm 2.58$.
Since $|z| = 1.88 < 2.58$, we accept the null hypothesis and reject the alternative hypothesis.
8. A university conducts both face-to-face and distance-mode classes for a particular course,
intended to be identical. A sample of 50 face-to-face students yields an examination mean and
SD of $\bar{X} = 80.4$ and $\sigma_1 = 12.8$ respectively, and another sample of 100
distance-mode students yields a mean and SD in the same course of $\bar{Y} = 74.3$ and
$\sigma_2 = 20.5$ respectively. Are both educational methods statistically equal at the 5% level?
Solution: We are given that
$n_1 = 50$, $\bar{X} = 80.4$, $\sigma_1 = 12.8$; $n_2 = 100$, $\bar{Y} = 74.3$, $\sigma_2 = 20.5$
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Thus, for testing the null hypothesis, the test statistic is given by

$$z = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{80.4 - 74.3}{\sqrt{\frac{(12.8)^2}{50} + \frac{(20.5)^2}{100}}} = 2.23$$

The critical (tabulated) values at the 5% level of significance for this two-tailed test are
$\pm z_{\alpha/2} = \pm 1.96$.
Since $|z| = 2.23 > 1.96$, we reject the null hypothesis and accept the alternative hypothesis.
9. A machine produces a large number of items, of which 25% are found to be defective. To
check this, the company manager takes a random sample of 100 items and finds 35 items defective.
Is there evidence of further deterioration of quality at the 5% level of significance?
Solution: The company manager wants to check whether the machine produces 25% defective items.
Here, the attribute under study is defectiveness, and we define success and failure as getting
a defective and a non-defective item respectively.
Let $p_0$ be the population proportion of defective items = 0.25; $q_0 = 1 - p_0 = 0.75$.
$p$ is the observed proportion of defective items in the sample = 35/100 = 0.35.
$H_0: p \leq p_0$
$H_1: p > p_0$
Thus, for testing the null hypothesis, the test statistic is given by

$$z = \frac{p - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.35 - 0.25}{\sqrt{\frac{0.25 \times 0.75}{100}}} = 2.31$$

The critical (tabulated) value at the 5% level of significance for this right-tailed test is
$z_{\alpha} = 1.645$.
Since $z = 2.31 > 1.645$, we reject the null hypothesis and accept the alternative hypothesis.
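
The single-proportion test statistic can be checked numerically. This is a minimal sketch; the function name `z_for_proportion` is hypothetical, and the standard error is computed under the null hypothesis value $p_0$, as in the solution.

```python
import math


def z_for_proportion(successes, n, p0):
    """z statistic for testing an observed proportion against the value p0."""
    p_observed = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # sqrt(p0*q0/n)
    return (p_observed - p0) / standard_error


# Defective items problem: 35 defectives in 100 items, p0 = 0.25, right-tailed at 5%
z = z_for_proportion(35, 100, 0.25)
print(f"z = {z:.2f}")
print("reject H0" if z > 1.645 else "accept H0")
```

Algebraically this is the same statistic as $(x - np_0)/\sqrt{np_0 q_0}$, written in terms of proportions instead of counts.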
10. A die is thrown 9000 times and a throw of 2 or 5 is observed 3100 times. Can we regard the
die as unbiased at the 5% level of significance?
Solution:
Let getting a 2 or 5 be a success and getting a number other than 2 or 5 be a failure. Then, in
the usual notation, we have
$n = 9000$, $X$ = number of successes = 3100, $p = \dfrac{3100}{9000} = 0.3444$.

Here, we want to test that the die is unbiased. If the die is unbiased, then the probability of
getting a 2 or 5 is
$p_0$ = Probability of getting 2 + Probability of getting 5
$p_0 = \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{2}{6} = \dfrac{1}{3}$; $q_0 = 1 - p_0 = \dfrac{2}{3}$
$H_0: p = p_0$
$H_1: p \neq p_0$
Thus, for testing the null hypothesis, the test statistic is given by

$$z = \frac{p - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.3444 - 0.3333}{\sqrt{\frac{0.3333 \times 0.6667}{9000}}} = 2.24$$

The critical (tabulated) values at the 5% level of significance for this two-tailed test are
$\pm z_{\alpha/2} = \pm 1.96$.
Since $|z| = 2.24 > 1.96$, we reject the null hypothesis and accept the alternative hypothesis.

11. In a random sample of 100 persons from town A, 60 are found to be high consumers of wheat.
In another sample of 80 persons from town B, 40 are found to be high consumers of wheat. Do
these data reveal a significant difference between the proportions of high wheat consumers in
town A and town B (at α = 0.05)?

Solution: Here, the attribute under study is high consumption of wheat, and we define success
and failure as getting a person who is a high consumer of wheat and one who is not,
respectively.
We are given that
$n_1$ = total number of persons in the sample of town A = 100
$n_2$ = total number of persons in the sample of town B = 80
$X_1$ = number of high consumers of wheat in town A = 60
$X_2$ = number of high consumers of wheat in town B = 40
The sample proportion of high wheat consumers in town A is
$p_1 = \dfrac{X_1}{n_1} = \dfrac{60}{100} = 0.60$
and the sample proportion of high wheat consumers in town B is
$p_2 = \dfrac{X_2}{n_2} = \dfrac{40}{80} = 0.50$
$H_0: p_1 = p_2$
$H_1: p_1 \neq p_2$
The estimate of the combined proportion ($P$) of high wheat consumers in the two towns is
given by
$P = \dfrac{X_1 + X_2}{n_1 + n_2} = \dfrac{60 + 40}{100 + 80} = \dfrac{5}{9}$; $Q = 1 - P = \dfrac{4}{9}$
Thus, for testing the null hypothesis, the test statistic is given by

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.60 - 0.50}{\sqrt{\frac{5}{9} \times \frac{4}{9}\left(\frac{1}{100} + \frac{1}{80}\right)}} = 1.34$$

The critical (tabulated) values at the 5% level of significance for this two-tailed test are
$\pm z_{\alpha/2} = \pm 1.96$.
Since $|z| = 1.34 < 1.96$, we accept the null hypothesis and reject the alternative hypothesis.
12. A machine produced 60 defective articles in a batch of 400. After overhauling, it produced
30 defective articles in a batch of 300. Has the machine improved due to overhauling?
(Take α = 0.01.)
Solution: Here, the attribute under study is defectiveness, and we define success and failure
as getting a defective and a non-defective article respectively.
We are given that
$X_1$ = number of defective articles produced by the machine before overhauling = 60
$X_2$ = number of defective articles produced by the machine after overhauling = 30
$n_1 = 400$; $n_2 = 300$
Let $p_1$ be the observed proportion of defective articles in the sample before overhauling:
$p_1 = \dfrac{X_1}{n_1} = \dfrac{60}{400} = 0.15$
and $p_2$ be the observed proportion of defective articles in the sample after overhauling:
$p_2 = \dfrac{X_2}{n_2} = \dfrac{30}{300} = 0.10$
$H_0: p_1 \leq p_2$
$H_1: p_1 > p_2$
Since $P$ is unknown, the pooled estimate of the proportion is given by
$P = \dfrac{X_1 + X_2}{n_1 + n_2} = \dfrac{60 + 30}{400 + 300} = \dfrac{9}{70}$; $Q = 1 - P = \dfrac{61}{70}$
Thus, for testing the null hypothesis, the test statistic is given by

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.15 - 0.10}{\sqrt{\frac{9}{70} \times \frac{61}{70}\left(\frac{1}{400} + \frac{1}{300}\right)}} \approx 1.956$$

The critical (tabulated) value at the 1% level of significance for this right-tailed test is
$z_{\alpha} = 2.33$.
Since $z \approx 1.956 < 2.33$, we accept the null hypothesis and reject the alternative
hypothesis: the data do not show, at the 1% level, that the machine has improved.
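
The pooled two-proportion statistic used in the last two problems can be sketched in Python as follows. The function name `z_two_proportions_pooled` is hypothetical; the pooled estimate $P = (X_1 + X_2)/(n_1 + n_2)$ follows the formula in the solutions.

```python
import math


def z_two_proportions_pooled(x1, n1, x2, n2):
    """z statistic for p1 - p2 using the pooled proportion P under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                    # P
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Wheat consumers problem: 60 of 100 in town A versus 40 of 80 in town B
print(f"towns A and B: z = {z_two_proportions_pooled(60, 100, 40, 80):.3f}")

# Overhauling problem: 60 of 400 defective before versus 30 of 300 after
print(f"overhauling:   z = {z_two_proportions_pooled(60, 400, 30, 300):.3f}")
```

Each value is then compared with the tabulated value for the chosen tail and level of significance.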

Video links:

1. Hypothesis Testing - Statistics - YouTube


2. Testing of Hypothesis for Difference of Two Population Means
3. Testing of Hypothesis for Difference of Two Population Proportions
