Statistical Analysis


Probability and

Expectation

NOTES
1. PROBABILITY AND EXPECTATION

STRUCTURE

Introduction
Some Important definitions
Theorems on Probability
Addition Theorems on Probability
Conditional Probability
Multiplication Theorems on Probability
Addition Theorem for Independent events
Bayes Theorem

1.1. INTRODUCTION

The words ‘Probability’ and ‘Chance’ are quite familiar to everyone. Many a time,
we come across statements like “Probably it may rain today”, “Chances of hitting the
target are very few” and “It is possible that he may top the examination”. In these
statements, the words probably, chances, possible, etc. convey a sense of uncertainty about
the occurrence of some event. Ordinarily, it appears that there cannot be any exact
measurement of these uncertainties, but in Mathematical Statistics we have methods
for calculating the degree of certainty of events as a numerical value, under certain
conditions. When we perform some experiments in science and engineering repeatedly under
identical conditions, we get almost the same result. There also exist experiments in
which the outcome may be different even if the experiment is performed under identical
conditions. In such experiments, the outcome of each trial depends on chance.

1.2. SOME IMPORTANT DEFINITIONS



Experiment : Any operation that results in two or more outcomes is called an
experiment, and a single performance of an experiment is called a trial.

Self-Instructional Material 1
Random Experiment : A random experiment is an experiment in which all possible
outcomes are known and which can be repeated under identical conditions, but in which it
is not possible to predict the outcome of any particular trial in advance.
e.g. Tossing a coin or throwing a die is a random experiment.
Sample Space : The sample space of a random experiment is defined as the set
of all possible outcomes of the experiment. The possible outcomes are called sample
points. The sample space is generally denoted by the letter S.
e.g. In throwing a fair die, the sample space is S = {1, 2, 3, 4, 5, 6}. In tossing two
unbiased coins, the sample space is S = {HH, HT, TH, TT}.
Event : Any subset of the sample space is defined as an event. An event is
called an elementary (or simple) event if it contains only one sample point. In the
experiment of throwing a die, the event A of getting 2 is a simple event. We write A =
{2}. Also an event is called an impossible event if it can never occur. In the above
experiment, event B = {7} of getting 7 is an impossible event. An event which is sure to
occur is called a certain event.
e.g. In throwing a die, the event of getting a number less than 7 is a certain event.
Exhaustive Events : All possible outcomes of a trial, taken together, are known as
exhaustive events or cases, and their total number is the number of exhaustive cases.
e.g. In tossing a coin, there are two exhaustive cases, head and tail. In throwing a die,
there are 6 exhaustive cases, since any one of the six faces may turn up.
Note. In throwing n dice, the number of exhaustive cases is $6^n$.
Equally Likely Events : Events are said to be equally likely, if there is no
reason to expect any one in preference to any other.
e.g. When a card is drawn from a well-shuffled pack, any one of the 52 cards may turn
up, so the 52 different cases are equally likely.
Favourable Events : The cases which ensure the happening of the required event are
said to be favourable to that event.
e.g. In throwing a die, the number of cases favourable to the appearance of a multiple
of 2 is three, viz. 2, 4 and 6. In drawing two cards from a pack of 52 cards, the number
of cases favourable to drawing 2 aces is $^{4}C_{2}$.
Independent Events : Events are said to be independent if the happening (or
non-happening) of one event is not affected by the happening (or non-happening) of
others.
e.g. In case a card is drawn from a pack of well shuffled cards and is not replaced, then
the second draw of the card is dependent on the first draw. However, if the first card
drawn is replaced before drawing the second card, the result of the second draw is
independent of the first draw.
Mutually Exclusive Events : Two events are said to be mutually exclusive if
they cannot occur together i.e., if one occurs then other cannot.
e.g. In tossing a coin, the events head and tail are mutually exclusive, since if the
outcome is tail, the possibility of getting head in the same trial is ruled out.
Compound Events : Events obtained by combining together two or more
elementary events are known as the compound events.
e.g. In throwing a die, getting 5 or 6 is called a compound event.
Mathematical (or Classical) Definition of Probability : If an event can
happen in n ways which are equally likely, exhaustive and mutually exclusive, and out
of these n ways m are favourable to an event A, then the probability of the happening
of A is given by
$$p = P(A) = \frac{m}{n}.$$
If A happens in m ways, it fails in (n – m) ways, so that the probability of its
failure is
$$q = P(\bar{A}) = \frac{n - m}{n} = 1 - \frac{m}{n} = 1 - p.$$
$$\therefore\; p + q = 1, \text{ i.e., } P(A) + P(\bar{A}) = 1, \qquad 0 \le p \le 1, \quad 0 \le q \le 1.$$
If P(A) = 1, then A is called a certain event. If P(A) = 0, then A is called an impossible
event.
Statistical (or Empirical) Definition of Probability : If in n trials an event
A happens m times, then the probability of happening of A is given by
$$p = P(A) = \lim_{n \to \infty} \frac{m}{n}.$$
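The two definitions can be compared numerically. The sketch below is illustrative only: it assumes a fair six-sided die, counts favourable cases to get the classical value of P(even), and estimates the same probability as the relative frequency m/n over simulated throws using Python's standard `random` module. The function name `empirical_probability`, the seed and the number of trials are our own arbitrary choices.

```python
import random
from fractions import Fraction

# Classical definition: favourable cases / equally likely cases (fair die assumed).
sample_space = [1, 2, 3, 4, 5, 6]
favourable = [x for x in sample_space if x % 2 == 0]     # even numbers: 2, 4, 6
classical = Fraction(len(favourable), len(sample_space))


def empirical_probability(n_trials, seed=0):
    """Relative frequency m/n of an even number in n_trials simulated throws."""
    rng = random.Random(seed)                             # fixed seed: reproducible
    m = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return m / n_trials


print(classical)                          # 1/2
print(empirical_probability(100_000))     # close to 0.5
```

With a fixed seed the estimate is reproducible; increasing `n_trials` typically brings m/n closer to the classical value 1/2, which is what the limit in the statistical definition expresses.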

THEOREMS ON PROBABILITY

1.3. ADDITION THEOREMS ON PROBABILITY

1.3.1. Theorem 1 (Addition Theorem for Two Events)


If A and B are two events associated with a random experiment, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Proof. Let S be the sample space associated with the given random experiment.
Suppose the experiment results in n mutually exclusive, exhaustive and equally likely
ways. Then S contains n elementary events.
Let m1, m2 and m be the numbers of elementary events favourable to A, B and
$A \cap B$ respectively. Then
$$P(A) = \frac{m_1}{n}, \quad P(B) = \frac{m_2}{n} \quad\text{and}\quad P(A \cap B) = \frac{m}{n}.$$
[Figure: Venn diagram of events A and B in the sample space S, regions labelled m1, m and m2]
The number of elementary events favourable to A only is m1 – m. Similarly, the
number of elementary events favourable to B only is m2 – m. Since m elementary events
are favourable to both A and B, the number of elementary events favourable to A or
B or both, i.e., to $A \cup B$, is
$$(m_1 - m) + (m_2 - m) + m = m_1 + m_2 - m.$$
$$\text{So,}\quad P(A \cup B) = \frac{m_1 + m_2 - m}{n} = \frac{m_1}{n} + \frac{m_2}{n} - \frac{m}{n}$$
$$\therefore\; P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

Corollary : If A and B are mutually exclusive events, then $P(A \cap B) = 0$;
therefore,
$$P(A \cup B) = P(A) + P(B).$$
This is the addition theorem for mutually exclusive events.
1.3.2. Theorem 2 (Addition Theorem for Three Events)
If A, B and C are three events associated with a random experiment, then
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C).$$
Proof. Let $D = B \cup C$. Then
$$P(A \cup B \cup C) = P(A \cup D) = P(A) + P(D) - P(A \cap D) \qquad \ldots(1) \text{ (by Th. 1)}$$
Now, $A \cap D = A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
$$\therefore\; P(A \cap D) = P[(A \cap B) \cup (A \cap C)] = P(A \cap B) + P(A \cap C) - P[(A \cap B) \cap (A \cap C)]$$
$$= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C) \qquad \ldots(2) \text{ (by Th. 1)}$$
$$[\because (A \cap B) \cap (A \cap C) = A \cap B \cap C]$$
$$\text{Also}\quad P(D) = P(B \cup C) = P(B) + P(C) - P(B \cap C) \qquad \ldots(3)$$
From (1), (2) and (3), we get
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(B \cap C) - [P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)]$$
$$= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C).$$
Corollary : If A, B and C are mutually exclusive events, then
$$P(A \cap B) = P(B \cap C) = P(A \cap C) = P(A \cap B \cap C) = 0$$
$$\therefore\; P(A \cup B \cup C) = P(A) + P(B) + P(C).$$
This is the addition theorem for three mutually exclusive events.
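Theorems 1 and 2 are the two-event and three-event cases of the general inclusion–exclusion rule: add the probabilities of single events, subtract those of pairwise intersections, add those of triple intersections, and so on. The sketch below illustrates this under our own assumptions: a fair die as the sample space, the hypothetical helper name `prob_union_inclusion_exclusion`, and the events A = {even number}, B = {multiple of 3}, C = {number less than 3}. It compares a direct count of $A \cup B \cup C$ with the alternating sum.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))          # fair die: six equally likely outcomes (assumed)


def prob(event):
    """Classical probability n(E)/n(S)."""
    return Fraction(len(event), len(S))


def prob_union_inclusion_exclusion(events):
    """P(E1 ∪ ... ∪ Ek): add single events, subtract pairs, add triples, ..."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group))
    return total


A = frozenset({2, 4, 6})    # even number
B = frozenset({3, 6})       # multiple of 3
C = frozenset({1, 2})       # number less than 3

print(prob(A | B | C))                            # 5/6 (direct count)
print(prob_union_inclusion_exclusion([A, B, C]))  # 5/6 (Theorem 2)
```

Passing only `[A, B]` to the same helper reproduces Theorem 1.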

1.4. CONDITIONAL PROBABILITY

Let A and B be two events associated with a random experiment, with $P(B) \neq 0$.
The probability of occurrence of A under the condition that B has already occurred is
called the conditional probability of A given B and is denoted by P(A/B).
Thus, P(A/B) = Probability of occurrence of A given that B has already occurred.
Similarly, P(B/A) = Probability of occurrence of B given that A has already
occurred.

1.5. MULTIPLICATION THEOREMS ON PROBABILITY

1.5.1. Theorem 1
If A and B are two events associated with a random experiment, then
$$P(A \cap B) = P(A)\,P(B/A), \quad \text{if } P(A) \neq 0$$
$$\text{or}\quad P(A \cap B) = P(B)\,P(A/B), \quad \text{if } P(B) \neq 0.$$

Proof. Let S be the sample space associated with the given random experiment.
Suppose S contains n equally likely elementary events. Let m1, m2 and m be the numbers
of elementary events favourable to A, B and $A \cap B$ respectively. Then
$$P(A) = \frac{m_1}{n}, \quad P(B) = \frac{m_2}{n} \quad\text{and}\quad P(A \cap B) = \frac{m}{n}.$$
[Figure: Venn diagram of events A and B in the sample space S, regions labelled m1, m and m2]
Since m1 elementary events are favourable to A, out of which m are favourable to B,
$$P(B/A) = \frac{m}{m_1}.$$
$$\text{Similarly,}\quad P(A/B) = \frac{m}{m_2}.$$
$$\text{Now,}\quad P(A \cap B) = \frac{m}{n} = \frac{m}{m_1} \cdot \frac{m_1}{n} = P(B/A)\,P(A) \qquad \ldots(1)$$
$$\text{and}\quad P(A \cap B) = \frac{m}{n} = \frac{m}{m_2} \cdot \frac{m_2}{n} = P(A/B)\,P(B) \qquad \ldots(2)$$
Note 1. From (1) and (2) in the above theorem, we find that
$$P(B/A) = \frac{P(A \cap B)}{P(A)} \quad\text{and}\quad P(A/B) = \frac{P(A \cap B)}{P(B)}.$$
$P(A \cap B)$ is also written as P(AB).
2. For three events A, B, C, the probability of the simultaneous occurrence of A, B and C is
$$P(A \cap B \cap C) = P(ABC) = P(A)\,P(B/A)\,P(C/AB) = P(A)\,P(B/A)\,P(C/(A \cap B)).$$
If A1, A2, ..., An are n events, then
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2/A_1)\,P(A_3/A_1 \cap A_2) \cdots P(A_n/A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$
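As an illustration of Note 2 (our own example, not from the text), let three cards be drawn one after another without replacement and let $A_i$ be the event that the i-th card is an ace. The chain rule gives $\frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50}$, which should agree with the direct count $^{4}C_3 / {}^{52}C_3$. The sketch computes both with exact fractions.

```python
from fractions import Fraction
from math import comb

aces, cards = 4, 52

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2/A1) P(A3/A1 ∩ A2)
chain = Fraction(1)
for i in range(3):
    # after i aces have been drawn, (aces - i) aces remain among (cards - i) cards
    chain *= Fraction(aces - i, cards - i)

# Same event counted directly: choose 3 of the 4 aces out of all 3-card hands
counting = Fraction(comb(aces, 3), comb(cards, 3))

print(chain, counting)   # 1/5525 1/5525
```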

1.5.2. Multiplication Theorems For Independent Events


Theorem 1. If A and B are independent events associated with a random
experiment, then
$$P(A \cap B) = P(A)\,P(B).$$
Proof. By the multiplication theorem, we have
$$P(A \cap B) = P(A)\,P(B/A).$$
Since A and B are independent events, P(B/A) = P(B).
$$\text{Hence,}\quad P(A \cap B) = P(A)\,P(B).$$
Theorem 2. If A1, A2, ..., An are independent events associated with a random
experiment, then
$$P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n).$$
Proof. By the multiplication theorem, we have
$$P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n) = P(A_1)\,P(A_2/A_1)\,P(A_3/A_1 \cap A_2) \cdots P(A_n/A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$
Since A1, A2, ..., An – 1, An are independent events,
$$P(A_2/A_1) = P(A_2), \quad P(A_3/A_1 \cap A_2) = P(A_3), \quad \ldots, \quad P(A_n/A_1 \cap A_2 \cap \cdots \cap A_{n-1}) = P(A_n).$$
$$\text{Hence,}\quad P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n).$$
1.6. ADDITION THEOREM FOR INDEPENDENT EVENTS

1.6.1. Theorem
If A1, A2, ..., An are n independent events associated with a random experiment,
then
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - P(\bar{A}_1)\,P(\bar{A}_2) \cdots P(\bar{A}_n).$$
Proof. We have
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - P(\overline{A_1 \cup A_2 \cup \cdots \cup A_n})$$
$$= 1 - P(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n) \qquad \text{(by De Morgan's law)}$$
$$= 1 - P(\bar{A}_1)\,P(\bar{A}_2) \cdots P(\bar{A}_n)$$
(since A1, A2, ..., An are independent events, so are $\bar{A}_1, \bar{A}_2, \ldots, \bar{A}_n$).
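The theorem can be checked by brute force for a small case. In the sketch below (both helper names are our own and hypothetical), `prob_at_least_one` applies the formula $1 - P(\bar{A}_1) \cdots P(\bar{A}_n)$, while `prob_at_least_one_brute_force` lists all $2^n$ patterns of occurrence and non-occurrence of independent events and adds the probabilities of the patterns in which at least one event occurs. The probabilities 1/2, 1/3 and 1/4 are illustrative values.

```python
from fractions import Fraction
from itertools import product
from math import prod


def prob_at_least_one(probs):
    """1 - P(not A1) P(not A2) ... P(not An), valid for independent events."""
    return 1 - prod(1 - p for p in probs)


def prob_at_least_one_brute_force(probs):
    """Add P(pattern) over all 2^n occur/not-occur patterns with at least one occurrence."""
    total = Fraction(0)
    for pattern in product([True, False], repeat=len(probs)):
        if any(pattern):
            # independence: the probability of a pattern is a product of factors
            total += prod(p if occurs else 1 - p for p, occurs in zip(probs, pattern))
    return total


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]    # illustrative values
print(prob_at_least_one(probs))                # 3/4
print(prob_at_least_one_brute_force(probs))    # 3/4
```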

SOLVED EXAMPLES
Example 1. Find the probability of getting a tail in a throw of a coin.
Solution. Clearly, the sample space is S = {H, T} and the event of getting a tail is
E = {T}, so n(E) = 1 and n(S) = 2.
Hence, the probability of getting a tail is
$$P(E) = \frac{n(E)}{n(S)} = \frac{1}{2}.$$
Alternatively, if E is the required event, then E = {T} and
$$P(E) = \frac{\text{No. of cases favourable to } E}{\text{Total number of cases}} = \frac{1}{2}.$$
Example 2. Three coins are tossed. Find the probability of getting at least two
heads.
Solution. Clearly, the sample space is
S = {HHH, HHT, HTH, THH, THT, TTH, HTT, TTT}
If E is the required event, then
E = {HHH, HHT, HTH, THH}
$$P(E) = \frac{\text{No. of cases favourable to } E}{\text{Total number of cases}} = \frac{4}{8} = \frac{1}{2}.$$
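For small experiments the sample space can be listed mechanically instead of by hand. The sketch below uses Python's `itertools.product` to generate the $2^3 = 8$ outcomes of tossing three coins and counts those with at least two heads; like the example, it assumes all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))           # 8 outcomes such as ('H', 'T', 'H')
E = [w for w in S if w.count("H") >= 2]     # at least two heads

p = Fraction(len(E), len(S))
print(len(S), len(E), p)                    # 8 4 1/2
```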
Example 3. If there are two children in a family, find the probability that there
is at least one girl in the family.
Solution. Let S be the sample space, then
S = {BB, BG, GB, GG},
where B and G stand for ‘Boy’ and ‘Girl’ respectively.
If E is the required event, then
E = {BG, GB, GG}
$$\therefore\; P(E) = \frac{3}{4}.$$
Example 4. What is the chance that a leap year, selected at random, will contain
53 Fridays?

Solution. There are 366 days in a leap year and 366 = (7 × 52) + 2, i.e., a leap
year consists of 52 complete weeks and 2 extra days. So every leap year contains at least
52 Fridays, and it contains 53 Fridays exactly when one of the 2 extra days is a Friday.
The possible combinations for the 2 extra days are :
(i) Sunday and Monday (ii) Monday and Tuesday
(iii) Tuesday and Wednesday (iv) Wednesday and Thursday
(v) Thursday and Friday (vi) Friday and Saturday
(vii) Saturday and Sunday.
Of these seven equally likely cases, only (v) and (vi) are favourable.
Hence, the required probability = $\frac{2}{7}$.
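The argument can be checked by brute force. Assuming, as the solution does, that each of the seven weekdays is equally likely to be the first day of the leap year, the sketch counts the Fridays in 366 consecutive days for each possible starting day. The encoding 0 = Monday, ..., 6 = Sunday and the helper name `fridays_in_leap_year` are our own choices.

```python
from fractions import Fraction

FRIDAY = 4   # our encoding: 0 = Monday, 1 = Tuesday, ..., 6 = Sunday


def fridays_in_leap_year(first_day):
    """Count Fridays among 366 consecutive days starting on weekday first_day."""
    return sum(1 for d in range(366) if (first_day + d) % 7 == FRIDAY)


# Each of the 7 possible starting weekdays is assumed equally likely.
counts = [fridays_in_leap_year(start) for start in range(7)]
favourable = sum(1 for c in counts if c == 53)
print(counts)                       # each entry is 52 or 53
print(Fraction(favourable, 7))      # 2/7
```

The two favourable starting days (Thursday and Friday) correspond to cases (v) and (vi) above, since the 2 extra days of a leap year fall on its first weekday and the one after it.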
Example 5. What is the probability of getting an even number in the throw of an
unbiased die?
Solution. Clearly, there are 6 equally likely possible outcomes 1, 2, 3, 4, 5, 6.
Hence, the sample space is S = {1, 2, 3, 4, 5, 6}.
Let E be the required event; then we have
E = {2, 4, 6}
$$\text{Hence,}\quad P(E) = \frac{3}{6} = \frac{1}{2}.$$
Example 6. A bag contains 7 red, 12 white and 4 green balls. Three balls are drawn
at random. What is the probability that
(i) the 3 balls drawn are all white, and
(ii) the 3 balls drawn are one of each colour?
Solution. Total number of balls = 7 + 12 + 4 = 23.
3 balls out of these 23 balls can be drawn in
$$^{23}C_3 = \frac{23 \times 22 \times 21}{3 \times 2 \times 1} = 1771 \text{ ways}.$$
So the sample space for this experiment contains 1771 sample points, i.e., n(S) = 1771.
(i) Let E1 be the event that the 3 balls drawn are all white. Now 3 white balls can be
drawn from 12 white balls in
$$^{12}C_3 = \frac{12 \times 11 \times 10}{3 \times 2 \times 1} = 220 \text{ ways}.$$
$$\therefore\; n(E_1) = 220 \quad\text{and}\quad P(E_1) = \frac{n(E_1)}{n(S)} = \frac{220}{1771}.$$
(ii) Let E2 be the event that the three balls are one of each colour.
Now 1 red ball can be drawn out of 7 red balls in $^{7}C_1 = 7$ ways,
1 white ball can be drawn out of the 12 white balls in $^{12}C_1 = 12$ ways and 1 green
ball can be drawn out of the 4 green balls in $^{4}C_1 = 4$ ways.
So 3 balls, one of each colour, can be drawn in 7 × 12 × 4 = 336 ways.
$$\therefore\; n(E_2) = 336 \quad\text{and}\quad P(E_2) = \frac{n(E_2)}{n(S)} = \frac{336}{1771}.$$
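The counts in Example 6 can be reproduced with `math.comb` (available in Python 3.8 and later), which computes $^{n}C_r$. Exact `Fraction` objects are used so that the probabilities are printed in lowest terms: 220/1771 reduces to 20/161 and 336/1771 to 48/253.

```python
from fractions import Fraction
from math import comb

red, white, green = 7, 12, 4
n_S = comb(red + white + green, 3)          # 23C3 ways of drawing 3 balls

# (i) all three balls white
p_all_white = Fraction(comb(white, 3), n_S)
# (ii) one ball of each colour
p_one_each = Fraction(comb(red, 1) * comb(white, 1) * comb(green, 1), n_S)

print(n_S)            # 1771
print(p_all_white)    # 20/161  (= 220/1771)
print(p_one_each)     # 48/253  (= 336/1771)
```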

Example 7. From a pack of 52 cards, three are drawn at random. Find the chance
that they are a king, a queen and a knave.
Solution. From a pack of 52 cards, three can be drawn in $^{52}C_3$ ways. Thus,
$$n = {}^{52}C_3.$$
There are 4 kings, 4 queens and 4 knaves. A king can be drawn in $^{4}C_1$ ways, a
queen in $^{4}C_1$ ways and a knave in $^{4}C_1$ ways. Since each of these choices can be
combined with each of the others, the number of favourable ways is
$$m = {}^{4}C_1 \times {}^{4}C_1 \times {}^{4}C_1.$$
$$\therefore\; \text{Required probability} = \frac{^{4}C_1 \times {}^{4}C_1 \times {}^{4}C_1}{^{52}C_3} = \frac{4 \times 4 \times 4 \times 3 \times 2 \times 1}{52 \times 51 \times 50} = \frac{16}{5525}.$$
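The same count in code, as a quick check using `math.comb`; the knave is the card usually called the jack.

```python
from fractions import Fraction
from math import comb

favourable = comb(4, 1) * comb(4, 1) * comb(4, 1)   # a king, a queen and a knave
total = comb(52, 3)                                 # all 3-card selections
print(favourable, total, Fraction(favourable, total))   # 64 22100 16/5525
```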
Example 8. A and B are two mutually exclusive events of an experiment. If
P(‘not A’) = 0.65, $P(A \cup B) = 0.65$ and P(B) = p, find the value of p.
Solution. By the addition theorem for mutually exclusive events, we have
$$P(A \cup B) = P(A) + P(B)$$
$$P(A \cup B) = 1 - P(\bar{A}) + P(B) \qquad [\because P(A) = 1 - P(\bar{A})]$$
$$0.65 = 1 - 0.65 + p$$
$$\therefore\; p = 0.30.$$
Example 9. The probability that at least one of the events A and B occurs is 0.6.
If A and B occur simultaneously with probability 0.2, then find $P(\bar{A}) + P(\bar{B})$.
Solution. We have $P(A \cup B) = 0.6$ and $P(A \cap B) = 0.2$.
$$\text{Now}\quad P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$0.6 = P(A) + P(B) - 0.2$$
$$0.6 = 1 - P(\bar{A}) + 1 - P(\bar{B}) - 0.2 = 1.8 - [P(\bar{A}) + P(\bar{B})]$$
$$\therefore\; P(\bar{A}) + P(\bar{B}) = 1.8 - 0.6 = 1.2.$$
Example 10. A, B, C are three mutually exclusive and exhaustive events
associated with a random experiment. Find P(A), given that $P(B) = \frac{3}{2}P(A)$ and
$P(C) = \frac{1}{2}P(B)$.
Solution. Let P(A) = p. Then
$$P(B) = \frac{3}{2}P(A) \;\Rightarrow\; P(B) = \frac{3}{2}p$$
$$\text{and}\quad P(C) = \frac{1}{2}P(B) \;\Rightarrow\; P(C) = \frac{3}{4}p.$$
Since A, B, C are mutually exclusive and exhaustive events associated with a
random experiment,
$$A \cup B \cup C = S \;\Rightarrow\; P(A \cup B \cup C) = P(S) \;\Rightarrow\; P(A \cup B \cup C) = 1 \qquad [\because P(S) = 1]$$
$$\Rightarrow\; P(A) + P(B) + P(C) = 1$$
$$\Rightarrow\; p + \frac{3}{2}p + \frac{3}{4}p = 1 \;\Rightarrow\; p = \frac{4}{13}.$$
Example 11. A card is drawn from a pack of 52 cards. Find the probability of
getting a king or a heart or a red card.
Solution. Consider the following events :
A = getting a king, B = getting a heart,
C = getting a red card.
We have
$$P(A) = \frac{^{4}C_1}{^{52}C_1} = \frac{4}{52}, \quad P(B) = \frac{^{13}C_1}{^{52}C_1} = \frac{13}{52}, \quad P(C) = \frac{^{26}C_1}{^{52}C_1} = \frac{26}{52}$$
$$P(A \cap B) = P(\text{getting the king of hearts}) = \frac{1}{52}$$
$$P(B \cap C) = P(\text{getting a heart}) = \frac{13}{52}$$
$$P(C \cap A) = P(\text{getting a red king}) = \frac{2}{52}$$
$$P(A \cap B \cap C) = P(\text{getting the king of hearts}) = \frac{1}{52}$$
Required probability,
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$$
$$= \frac{4}{52} + \frac{13}{52} + \frac{26}{52} - \frac{1}{52} - \frac{13}{52} - \frac{2}{52} + \frac{1}{52} = \frac{28}{52} = \frac{7}{13}.$$
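Example 11 can be cross-checked by building the 52-card deck explicitly and counting the cards that are a king, a heart or red. The rank and suit labels below are our own encoding; 'J' stands for the knave.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
RED_SUITS = {"hearts", "diamonds"}

deck = list(product(RANKS, SUITS))      # 52 equally likely (rank, suit) cards

# king OR heart OR red card
wanted = [(r, s) for r, s in deck if r == "K" or s == "hearts" or s in RED_SUITS]
print(len(deck), len(wanted), Fraction(len(wanted), len(deck)))   # 52 28 7/13
```

The count of 28 is the 26 red cards plus the 2 black kings, matching the inclusion–exclusion total above.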
Example 12. Consider the experiment of throwing a pair of dice. Let A and B be the
events A = the sum of points is 8; B = there is an even number on the first die. Find
P(A/B) and P(B/A).
Solution. We have A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
and B = {(2, 1), ..., (2, 6), (4, 1), ..., (4, 6), (6, 1), ..., (6, 6)}
$$\therefore\; P(A) = \frac{5}{36} \quad\text{and}\quad P(B) = \frac{18}{36}.$$
Also, $A \cap B = \{(2, 6), (4, 4), (6, 2)\}$, so $n(A \cap B) = 3$.
Now P(A/B) = Probability of occurrence of A when B occurs
= Probability of getting 8 as the sum when there is an even number
on the first die
$$= \frac{n(A \cap B)}{n(B)} = \frac{3}{18} = \frac{1}{6}$$
and P(B/A) = Probability of occurrence of B when A occurs
= Probability of getting an even number on the first die when the sum
of the numbers on the two dice is 8
$$= \frac{n(A \cap B)}{n(A)} = \frac{3}{5}.$$
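On a finite, equally likely sample space, a conditional probability reduces to counting, $P(E/F) = n(E \cap F)/n(F)$. The sketch below enumerates the 36 outcomes of Example 12 with `itertools.product` and applies this ratio; the helper name `conditional` is our own.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # 36 equally likely (first, second) pairs
A = {w for w in S if sum(w) == 8}            # sum of points is 8
B = {w for w in S if w[0] % 2 == 0}          # even number on the first die


def conditional(E, F):
    """P(E/F) = n(E ∩ F) / n(F) on a finite, equally likely sample space."""
    return Fraction(len(E & F), len(F))


print(conditional(A, B), conditional(B, A))  # 1/6 3/5
```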
Example 13. A bag contains 10 white and 15 black balls. Two balls are drawn
in succession without replacement. What is the probability that the first is white and the
second is black?
Solution. Consider the following events :
A = getting a white ball in the first draw
B = getting a black ball in the second draw.
Required probability = Probability of getting a white ball in the first draw and a
black ball in the second draw
$$= P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B/A).$$
$$\text{Now}\quad P(A) = \frac{^{10}C_1}{^{25}C_1} = \frac{10}{25} = \frac{2}{5}$$
and P(B/A) = Probability of getting a black ball in the second draw when a
white ball has already been drawn in the first draw
$$= \frac{^{15}C_1}{^{24}C_1} = \frac{15}{24} = \frac{5}{8}$$
(since 24 balls are left after drawing a white ball in the first draw, out of which
15 are black).
So the required probability is
$$P(A \cap B) = P(A)\,P(B/A) = \frac{2}{5} \times \frac{5}{8} = \frac{1}{4}.$$
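The multiplication theorem used in Example 13 amounts to multiplying the two stage probabilities. The minimal sketch below (variable names are our own) does this with exact fractions and compares it with a direct count of ordered pairs of distinct balls, $10 \times 15$ favourable out of $25 \times 24$.

```python
from fractions import Fraction

white, black = 10, 15
total = white + black

p_A = Fraction(white, total)                 # P(A): white on the first draw
p_B_given_A = Fraction(black, total - 1)     # P(B/A): black on the second draw, 24 balls left
p_A_and_B = p_A * p_B_given_A                # multiplication theorem

# Same answer by counting ordered pairs of distinct balls
by_counting = Fraction(white * black, total * (total - 1))
print(p_A, p_B_given_A, p_A_and_B, by_counting)   # 2/5 5/8 1/4 1/4
```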
Example 14. Two balls are drawn one by one without replacement from an urn
containing 2 white, 3 red and 4 black balls. What is the probability that at least one
ball is red?
Solution. Consider the following events :
A = not getting a red ball in the first draw
B = not getting a red ball in the second draw
Required probability = Probability that at least one ball is red
= 1 – Probability that none is red
$$= 1 - P(A \text{ and } B) = 1 - P(A \cap B) = 1 - P(A)\,P(B/A).$$
Now P(A) = Probability of not getting a red ball in the first draw
= Probability of getting a ball of another colour (white or black) in the
first draw
$$= \frac{6}{9} = \frac{2}{3}.$$
When a ball of another colour is drawn in the first draw, 8 balls remain: 5 of other
colours (white or black) and 3 red, out of which one ball of another colour can be drawn
in $^{5}C_1$ ways.
$$\therefore\; P(B/A) = \frac{5}{8}$$
$$\text{Required probability} = 1 - P(A)\,P(B/A) = 1 - \frac{2}{3} \times \frac{5}{8} = \frac{7}{12}.$$
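Example 14 can also be checked without the complement rule by labelling every ball, listing all $9 \times 8 = 72$ equally likely ordered draws of two distinct balls, and counting those that contain at least one red ball. This is an illustrative sketch; the colour codes 'W', 'R' and 'B' are our own.

```python
from fractions import Fraction
from itertools import permutations

urn = ["W"] * 2 + ["R"] * 3 + ["B"] * 4            # 2 white, 3 red, 4 black
draws = list(permutations(range(len(urn)), 2))     # ordered pairs of distinct balls
with_red = [d for d in draws if any(urn[i] == "R" for i in d)]

print(len(draws), len(with_red))                   # 72 42
print(Fraction(len(with_red), len(draws)))         # 7/12
```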
Example 15. If A and B are two events such that P(A) = 0.5, P(B) = 0.6 and
$P(A \cup B) = 0.8$, find P(A/B) and P(B/A).
Solution. We have $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$$\therefore\; P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.5 + 0.6 - 0.8 = 0.3$$
$$\text{Now,}\quad P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{0.3}{0.6} = \frac{1}{2}$$
$$\text{and}\quad P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{0.3}{0.5} = \frac{3}{5}.$$
Example 16. A coin is tossed twice and the four possible outcomes are assumed
to be equally likely. If A is the event ‘both head and tail have appeared’ and B is the
event ‘at most one tail is observed’, find P(A), P(B), P(A/B) and P(B/A).
Solution. Here, S = {HH, HT, TH, TT}, A = {HT, TH} and B = {HH, HT, TH}.
$$\therefore\; A \cap B = \{HT, TH\}$$
$$\text{Now,}\quad P(A) = \frac{n(A)}{n(S)} = \frac{2}{4} = \frac{1}{2}, \quad P(B) = \frac{n(B)}{n(S)} = \frac{3}{4} \quad\text{and}\quad P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{2}{4} = \frac{1}{2}$$
$$\therefore\; P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/2}{3/4} = \frac{2}{3} \quad\text{and}\quad P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/2}{1/2} = 1.$$
Example 17. A coin is tossed thrice and all eight outcomes are equally likely. Let
A = ‘The first throw results in head’
B = ‘The last throw results in tail’
Prove that events A and B are independent.
Solution. Let S be the sample space, then
S = {HHH, HHT, THH, HTH, TTH, HTT, THT, TTT}
A = {HHH, HHT, HTH, HTT}, B = {HHT, HTT, THT, TTT}
$A \cap B$ = {HHT, HTT}
$$P(A) = \frac{n(A)}{n(S)} = \frac{4}{8} = \frac{1}{2}, \quad P(B) = \frac{n(B)}{n(S)} = \frac{4}{8} = \frac{1}{2}$$
$$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{2}{8} = \frac{1}{4}$$
Clearly, $P(A \cap B) = \frac{1}{4} = P(A)\,P(B)$.
Hence, A and B are independent events.
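Independence in Example 17 is a statement about counts: $P(A \cap B)$ must equal $P(A)\,P(B)$. The sketch below generates the eight equally likely outcomes with `itertools.product` and compares the two sides exactly; the function name `P` is our own shorthand.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))        # 8 equally likely outcomes
A = {w for w in S if w[0] == "H"}       # first throw is a head
B = {w for w in S if w[-1] == "T"}      # last throw is a tail


def P(E):
    """Classical probability n(E)/n(S)."""
    return Fraction(len(E), len(S))


print(P(A), P(B), P(A & B))              # 1/2 1/2 1/4
print(P(A & B) == P(A) * P(B))           # True, so A and B are independent
```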
Example 18. Events A and B are independent. Find P(B) if P(A) = 0.35 and
$P(A \cup B) = 0.6$.
Solution. We have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$= P(A) + P(B) - P(A)\,P(B) \qquad (\because A \text{ and } B \text{ are independent})$$
$$= P(A) + P(B)\,[1 - P(A)]$$
$$0.6 = 0.35 + P(B)\,(1 - 0.35)$$
$$0.25 = 0.65\,P(B) \;\Rightarrow\; P(B) = \frac{0.25}{0.65} = \frac{5}{13}.$$
Example 19. X can solve 90% of the problems given in a book and Y can solve
70%. What is the probability that at least one of them will solve a problem selected at
random from the book?
Solution. Let A and B be the events defined as follows :
A = X solves the problem, B = Y solves the problem.
Clearly, A and B are independent events such that
$$P(A) = \frac{90}{100} = \frac{9}{10} \quad\text{and}\quad P(B) = \frac{70}{100} = \frac{7}{10}.$$
Now the required probability is
$$P(A \cup B) = 1 - P(\bar{A})\,P(\bar{B}) \qquad (\because A \text{ and } B \text{ are independent events})$$
$$= 1 - \left(1 - \frac{9}{10}\right)\left(1 - \frac{7}{10}\right) = 1 - \frac{1}{10} \times \frac{3}{10} = 0.97.$$
EXERCISE 1.1
1. Find the probability of getting a head in a throw of a coin.
2. Three unbiased coins are tossed. Find the probability of getting
(i) all heads (ii) two heads
(iii) one head (iv) at least one head
(v) at least two heads.
3. A bag contains 7 white, 6 red and 5 black balls. Two balls are drawn at random. Find the
probability that they will both be white.
4. Four cards are drawn from a pack of cards. Find the probability that
(i) all are diamonds (ii) there is one card of each suit, and
(iii) there are two spades and two hearts.
5. Two dice are thrown simultaneously. Find the probability of getting
(i) an even number as the sum (ii) the sum as a prime number
(iii) a total of at least 10 (iv) a doublet of even number.
6. Tickets numbered from 1 to 20 are mixed up together and then a ticket is drawn at
random. What is the probability that the ticket has a number which is a multiple of 3 or
7?
7. A bag contains 50 tickets numbered 1, 2, 3, ..., 50 of which five are drawn at random and
arranged in ascending order of magnitude (x1 < x2 < x3 < x4 < x5). Find the probability
that x3 = 30.
8. If A, B, C are mutually exclusive and exhaustive events, find P(B) if
$\frac{1}{3}P(C) = \frac{1}{2}P(A) = P(B)$.
9. If P(A) = a and P(B) = b, then show that $P(A/B) \geq (a + b - 1)/b$.
10. Two cards are drawn from a pack of 52 cards. What is the probability that either both
are red or both are kings ?
11. Given two mutually exclusive events A and B such that $P(A) = \frac{1}{2}$ and
$P(B) = \frac{1}{3}$, find P(A or B).
12. A die is thrown twice and the sum of numbers appearing is observed to be 6. What is the
conditional probability that the number 4 has appeared at least once ?
13. A bag contains 19 tickets, numbered from 1 to 19. A ticket is drawn and then another
ticket is drawn without replacement. Find the probability that both tickets will show
even numbers.
14. If A and B are two events such that P(A) = 0.3, P(B) = 0.6 and P(B/A) = 0.5, find P(A/B)
and $P(A \cup B)$.
15. A bag contains 3 red and 4 black balls and another bag has 4 red and 2 black balls. One
bag is selected at random and from the selected bag a ball is drawn. Let A be the event
that the first bag is selected, B be the event that the second bag is selected and C be the
event that the ball drawn is red. Find P(A), P(B), P(C/A) and P(C/B).
16. If P(A) = 0.4, P(B) = p, $P(A \cup B) = 0.6$ and A and B are given to be independent events,
find the value of p.
17. A bag contains 5 white, 7 red and 4 black balls. If four balls are drawn one by one with
replacement, what is the probability that none is white ?

18. Two dice are thrown. Find the probability of getting an odd number on the first die and
a multiple of 3 on the other.
19. A problem in statistics is given to 3 students whose chances of solving it are
$\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{4}$. What is the probability that the problem is solved?

Answers
1. 1/2   2. (i) 1/8 (ii) 3/8 (iii) 3/8 (iv) 7/8 (v) 1/2
3. 7/51   4. (i) 11/4165 (ii) 2197/20825 (iii) 468/20825
5. (i) 1/2 (ii) 5/12 (iii) 1/6 (iv) 1/12
6. 2/5   7. 551/15134   8. 1/6   10. 55/221
11. 5/6   12. 2/5   13. 4/19   14. 1/4; 0.75
15. 1/2; 1/2; 3/7; 2/3   16. 1/3   17. $(11/16)^4$   18. 1/6
19. 3/4.

1.7. BAYES THEOREM

Let B1, B2, ..., Bn be a set of exhaustive and mutually exclusive events, and let A
be an event which can occur only if one of B1, B2, ..., Bn occurs. Suppose the probabilities
P(B1), P(B2), ..., P(Bn) and the conditional probabilities P(A/Bi), i = 1, 2, 3, ..., n, are
known. Then the conditional probability P(Bi/A), when A has already occurred, is given
by
$$P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A/B_j)} = \frac{P(B_i)\,P(A/B_i)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + \cdots + P(B_n)\,P(A/B_n)}.$$
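The formula translates directly into a few lines of code. The sketch below uses our own hypothetical helper `bayes_posteriors`: it takes the prior probabilities P(B_i) and the likelihoods P(A/B_i) as exact fractions, forms the denominator as the sum of the products, and returns every P(B_i/A). As a check it uses the numbers of Example 1, which follows.

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """Return [P(B_i/A)] from P(B_i) and P(A/B_i); the B_i must be exhaustive and mutually exclusive."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]   # P(B_i) P(A/B_i)
    total = sum(joint)                                         # denominator: sum over all B_j
    return [j / total for j in joint]


# Check with Example 1: B1 = transferred ball white, B2 = black; A = black ball drawn
post = bayes_posteriors([Fraction(2, 3), Fraction(1, 3)],
                        [Fraction(3, 5), Fraction(4, 5)])
print(post)    # [Fraction(3, 5), Fraction(2, 5)]
```

The posteriors always sum to 1, because each one is a share of the same denominator.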

SOLVED EXAMPLES
Example 1. Two boxes contain respectively 4 white and 2 black balls, and 1 white and
3 black balls. One ball is transferred from the first box into the second and then one
ball is drawn from the second. It turns out to be black. What is the probability that the
transferred ball was white?
Solution. Let B1 be the event that the transferred ball (the ball drawn from the
first box) is white and B2 be the event that the transferred ball is black.
$$P(B_1) = \frac{4}{6} = \frac{2}{3}, \quad P(B_2) = \frac{2}{6} = \frac{1}{3}$$
Let A be the event that the ball drawn from the second box (after a ball has been
transferred from the first box to the second) is black. If the transferred ball is white,
the second box holds 2 white and 3 black balls; if it is black, the second box holds
1 white and 4 black balls. Then
$$P(A/B_1) = \frac{3}{5}, \quad P(A/B_2) = \frac{4}{5}$$
P(B1/A) = The probability that the ball transferred from the first box is white
when the ball drawn from the second box is known to be black.
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2)} = \frac{\frac{2}{3} \times \frac{3}{5}}{\frac{2}{3} \times \frac{3}{5} + \frac{1}{3} \times \frac{4}{5}} = \frac{\frac{2}{5}}{\frac{2}{5} + \frac{4}{15}} = \frac{2}{5} \times \frac{3}{2} = \frac{3}{5}.$$
Example 2. The chance that doctor X will diagnose disease Y correctly is 60%.
The chance that a patient will die under his treatment after a correct diagnosis is 40%,
and the chance of death after a wrong diagnosis is 70%. A patient of doctor X, who had
disease Y, died. What is the chance that his disease was correctly diagnosed?
Solution. Let B1 be the event that the diagnosis is correct and B2 be the event
that the diagnosis is incorrect. Let A be the event that the patient dies. Then
$$P(B_1) = \frac{60}{100} = 0.6, \quad P(B_2) = 1 - P(B_1) = 1 - 0.6 = 0.4$$
$$P(A/B_1) = \frac{40}{100} = 0.4, \quad P(A/B_2) = \frac{70}{100} = 0.7$$
P(B1/A) = Probability that a patient was correctly diagnosed, given that he
had died.
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2)} = \frac{0.6 \times 0.4}{0.6 \times 0.4 + 0.4 \times 0.7} = \frac{0.24}{0.24 + 0.28} = \frac{0.24}{0.52} = 0.4615 \text{ or } 46.15\%.$$
Example 3. The contents of urns I, II and III are as follows :
1 white, 2 black and 3 red balls,
2 white, 1 black and 1 red ball, and
4 white, 5 black and 3 red balls.
One urn is chosen at random and two balls are drawn. They happen to be white and
red. What are the probabilities that they come from urns I, II and III?
Solution. Let B1, B2 and B3 denote the events that urn I, II and III is chosen,
respectively, and let A be the event that the two balls taken from the selected urn are
white and red. Then
$$P(B_1) = P(B_2) = P(B_3) = \frac{1}{3} \qquad (\because \text{there are 3 equally likely urns})$$
$$P(A/B_1) = \frac{1 \times 3}{^{6}C_2} = \frac{1}{5}, \quad P(A/B_2) = \frac{2 \times 1}{^{4}C_2} = \frac{1}{3} \quad\text{and}\quad P(A/B_3) = \frac{4 \times 3}{^{12}C_2} = \frac{2}{11}$$
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{\frac{1}{3} \times \frac{1}{5}}{\frac{1}{3} \times \frac{1}{5} + \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{2}{11}} = \frac{1}{5} \times \frac{165}{118} = \frac{33}{118}$$
$$P(B_2/A) = \frac{P(B_2)\,P(A/B_2)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{\frac{1}{3} \times \frac{1}{3}}{\frac{1}{3} \times \frac{1}{5} + \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{2}{11}} = \frac{1}{3} \times \frac{165}{118} = \frac{55}{118}$$
$$P(B_3/A) = 1 - [P(B_1/A) + P(B_2/A)] = 1 - \left(\frac{33}{118} + \frac{55}{118}\right) = 1 - \frac{88}{118} = \frac{30}{118}.$$
Example 4. In a bolt factory, machines X, Y, Z manufacture respectively 25%,
35% and 40% of the total output. Of their output, 5, 4 and 2 per cent respectively are
defective bolts. A bolt is drawn at random from the product and is found to be defective.
What are the probabilities that it was manufactured by machines X, Y and Z?
Solution. Let B1, B2, B3 denote the events that a bolt selected at random is
manufactured by the machines X, Y and Z respectively, and let A denote the event of
its being defective. Then we have
$$P(B_1) = \frac{25}{100} = 0.25, \quad P(B_2) = \frac{35}{100} = 0.35, \quad P(B_3) = \frac{40}{100} = 0.40$$
The probability of drawing a defective bolt manufactured by machine X is
$$P(A/B_1) = \frac{5}{100} = 0.05$$
$$\text{Similarly,}\quad P(A/B_2) = \frac{4}{100} = 0.04, \quad P(A/B_3) = \frac{2}{100} = 0.02$$
P(B1/A) = The probability that a defective bolt selected at random
is manufactured by machine X
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{0.25 \times 0.05}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \frac{125}{345} = \frac{25}{69}$$
$$P(B_2/A) = \frac{P(B_2)\,P(A/B_2)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{0.35 \times 0.04}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \frac{140}{345} = \frac{28}{69}$$
$$P(B_3/A) = 1 - [P(B_1/A) + P(B_2/A)] = 1 - \left(\frac{25}{69} + \frac{28}{69}\right) = 1 - \frac{53}{69} = \frac{16}{69}.$$

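Examples 3 and 4 above can be reproduced with exact fractions. In the sketch below, `posteriors` is our own helper implementing the formula of Bayes theorem; the urn likelihoods of Example 3 are computed with `math.comb`, and the percentages of Example 4 are entered as fractions of 100. Fractions print in lowest terms, so 30/118 appears as 15/59.

```python
from fractions import Fraction
from math import comb


def posteriors(priors, likelihoods):
    """P(B_i/A) = P(B_i) P(A/B_i) / sum over j of P(B_j) P(A/B_j)."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    return [j / sum(joint) for j in joint]


# Example 3: (white, black, red) in urns I, II, III; A = one white and one red drawn
urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]
urn_likelihoods = [Fraction(w * r, comb(w + b + r, 2)) for w, b, r in urns]
urn_post = posteriors([Fraction(1, 3)] * 3, urn_likelihoods)

# Example 4: machine shares and defective rates, written as exact fractions
machine_post = posteriors([Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)],
                          [Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)])

print(urn_likelihoods)  # [Fraction(1, 5), Fraction(1, 3), Fraction(2, 11)]
print(urn_post)         # [Fraction(33, 118), Fraction(55, 118), Fraction(15, 59)]
print(machine_post)     # [Fraction(25, 69), Fraction(28, 69), Fraction(16, 69)]
```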
EXERCISE 1.2
1. A doctor has taken a vaccine from either storage unit P (which contains 30 current and
10 outdated vaccines), or from unit Q (which contains 20 current and 20 outdated
vaccines), or from unit R (which contains 10 current and 30 outdated vaccines), but he is
twice as likely to have taken it from unit P as from unit Q and twice as likely to have
taken it from unit Q as from unit R. If the vaccine selected is outdated, what is the
probability that it came from unit P ?

2. A factory has two machines. Empirical evidence has established that machines I
and II produce 30% and 70% of the output respectively. It has also been established that
5% and 1% of the output produced by these machines respectively is defective. A
defective item is drawn at random. What is the probability that the defective item was
produced by machine II?
3. A doctor has decided to prescribe two new drugs to 200 heart patients as follows : 50 get
drug A, 50 get drug B and 100 get both. Drug A reduces the probability of a heart attack
by 35%, drug B reduces the probability by 20%, and the two drugs, when taken together,
work independently. The 200 patients were chosen so that each has an 80% chance of
having a heart attack. If a randomly selected patient has a heart attack, what is the
probability that the patient was given both drugs?
4. In a class of 75 students, 15 were considered to be very intelligent, 45 medium and
the rest below average. The probability that a very intelligent student fails in a viva-voce
examination is 0.005, the probability that a medium student fails is 0.05, and the
corresponding probability for a below-average student is 0.15. If a student is known to
have passed the viva-voce examination, what is the probability that he is below average?
5. Suppose that there is a chance for a newly constructed flyover to collapse whether or
not the design is faulty. The chance that the design is faulty is 5%. The chance that the
flyover collapses if the design is faulty is 95%; otherwise it is 30%. A flyover collapsed.
What is the probability that it collapsed because of faulty design?

Answers
1. 0.3636 2. 0.318 3. 0.4176
4. 0.18 5. 0.1428.

Probability Distributions
2. PROBABILITY DISTRIBUTIONS

STRUCTURE

Binomial Distribution
Applications of Binomial Distribution
Recurrence Formula for the Binomial Distribution
Mean, Variance and Standard Deviation of Binomial Distribution
Poisson Distribution
Applications of Poisson Distribution
Recurrence Formula for the Poisson Distribution
Mean, Variance and Standard Deviation of Poisson Distribution
Normal Distribution
Properties of the Normal Distribution
Standard Form of the Normal Distribution

Frequency distributions can be classified under two categories :


(i) Observed Frequency Distributions
(ii) Theoretical or Expected Frequency Distributions
Observed frequency distributions are based on actual observation and
experimentation. If a certain hypothesis is assumed, it is sometimes possible to derive
mathematically what the frequency distribution of a certain universe (population) should
be. Such distributions are called Theoretical Distributions.
Here, we will deal with two types of probability distributions :
(i) Discrete Probability Distributions
(ii) Continuous Probability Distributions
Under the first type we will deal with
(i) Binomial Distribution
(ii) Poisson Distribution
Under the second type we will deal with Normal Distribution.
Discrete random variables represent count data such as the number of defectives
in a sample of n items. Continuous random variables represent measured data such as
heights, distances, temperatures in a given interval, etc.
A discrete random variable assumes each of its values with a certain probability.
A table listing all possible values that a discrete random variable can take, along with
the associated probabilities, is called a Discrete Probability Distribution.

2.1. BINOMIAL DISTRIBUTION

Binomial distribution was discovered by James Bernoulli in the year 1700.


Let a random experiment be performed repeatedly and let the occurrence of an
event in any trial be called a success and its non-occurrence a failure. Consider a
series of n independent trials. Let a random variable X denote the number of successes
in these n trials. Let p be the probability of a success and q = 1 – p that of a failure in
a single trial. Let p be constant for each trial.
The probability of r successes in n trials in a specified order, say SSSFFS ... FFSF
(where S represents success and F failure), is given by
$$\begin{aligned}
P(SSSFFS\ldots FFSF) &= P(S)\,P(S)\,P(S)\,P(F)\,P(F)\,P(S)\cdots P(F)\,P(S)\,P(F)\\
&= p\,p\,p\,q\,q\,p\cdots q\,p\,q\\
&= \underbrace{p\cdot p\cdots p}_{r\ \text{factors}}\;\underbrace{q\cdot q\cdots q}_{(n-r)\ \text{factors}}\\
&= p^{r}q^{n-r}
\end{aligned}$$
But r successes in n trials can occur in ${}^{n}C_{r}$ ways, and the probability for each of
these ways is $p^{r}q^{n-r}$. Hence, the probability of r successes in n trials is given by
$$P(X = r) = {}^{n}C_{r}\,p^{r}q^{n-r}, \quad \text{where } p + q = 1 \text{ and } r = 0, 1, 2, \ldots, n.$$
The probability distribution of the number of successes so obtained is called the
Binomial probability distribution and X is called the Binomial Variate.
Note. (i) P(X = r) is usually written as P(r).
(ii) n and p in the binomial distribution are called the parameters of the distribution.
(iii) Each trial has only two possible outcomes, called success and failure.
(iv) There is a finite number of trials, say n.
(v) All trials are identical, i.e., p (and hence q) is constant in each trial.
(vi) The trials are independent of each other.
(vii) If the set of n independent trials is repeated N times, then the expected frequency of r
successes is N . P(r).
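As a minimal sketch, the binomial probability and the expected frequency N . P(r) can be computed with Python's standard library (`math.comb`, Python 3.8+). The function names `binomial_pmf` and `expected_frequency` are illustrative and do not come from the text.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) = nCr * p**r * q**(n - r), with q = 1 - p."""
    if not 0 <= r <= n:
        return 0.0                      # r outside 0..n is impossible
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

def expected_frequency(r: int, n: int, p: float, N: int) -> float:
    """Expected number of repetitions (out of N) with exactly r successes: N * P(r)."""
    return N * binomial_pmf(r, n, p)

# The probabilities for r = 0..n add up to (q + p)**n = 1.
total = sum(binomial_pmf(r, 10, 0.3) for r in range(11))
print(round(total, 12))                          # 1.0
print(expected_frequency(2, 4, 0.5, 160))        # 60.0
```

The second call corresponds to four unbiased coins tossed 160 times, where two heads are expected 60 times.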

2.2. APPLICATIONS OF BINOMIAL DISTRIBUTION

This distribution is mainly applied in problems concerning


(i) Number of defectives in a sample from a production line.
(ii) Estimation of reliability of systems.
(iii) Number of rounds fired from a gun hitting a target.
(iv) Radar detection.

2.3. RECURRENCE FORMULA FOR THE BINOMIAL DISTRIBUTION

We have $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$ and $P(r+1) = {}^{n}C_{r+1}\,p^{r+1}q^{n-(r+1)}$.
$$\begin{aligned}
\frac{P(r+1)}{P(r)} &= \frac{{}^{n}C_{r+1}\,p^{r+1}q^{n-r-1}}{{}^{n}C_{r}\,p^{r}q^{n-r}}
= \frac{n!}{(r+1)!\,(n-r-1)!}\times\frac{r!\,(n-r)!}{n!}\times\frac{p^{r+1}q^{n-r-1}}{p^{r}q^{n-r}}\\
&= \frac{r!\,(n-r)\,(n-r-1)!}{(r+1)\,r!\,(n-r-1)!}\times\frac{p}{q} = \frac{n-r}{r+1}\cdot\frac{p}{q}
\end{aligned}$$
or
$$P(r+1) = \frac{n-r}{r+1}\cdot\frac{p}{q}\cdot P(r),$$
which is the required recurrence formula. Using this formula successively, we can
find P(1), P(2), ..., if P(0) is known.
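A minimal Python sketch of this procedure: start from P(0) = qⁿ and apply the recurrence n times. It assumes 0 < p < 1 so that the ratio p/q is defined. The function name is hypothetical, and the `list[float]` annotation needs Python 3.9+.

```python
from math import comb

def binomial_probs_by_recurrence(n: int, p: float) -> list[float]:
    """[P(0), ..., P(n)] via P(r+1) = (n - r)/(r + 1) * (p/q) * P(r); assumes 0 < p < 1."""
    q = 1.0 - p
    probs = [q**n]                                    # P(0) = q**n
    for r in range(n):
        probs.append((n - r) / (r + 1) * (p / q) * probs[-1])
    return probs

probs = binomial_probs_by_recurrence(6, 0.5)
print([round(x * 64) for x in probs])                 # [1, 6, 15, 20, 15, 6, 1]

# Cross-check against the direct formula nCr p^r q^(n - r).
direct = [comb(6, r) * 0.5**r * 0.5**(6 - r) for r in range(7)]
assert all(abs(a - b) < 1e-12 for a, b in zip(probs, direct))
```

Each step costs one multiplication, so no factorials are needed. This is why the recurrence is convenient for fitting a distribution by hand.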

2.4. MEAN, VARIANCE AND STANDARD DEVIATION OF BINOMIAL DISTRIBUTION

For binomial distribution, we have $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
The mean ($\mu$) is given by
$$\begin{aligned}
\text{Mean } (\mu) &= \sum_{r=0}^{n} r\,P(r) = \sum_{r=0}^{n} r\,{}^{n}C_{r}\,p^{r}q^{n-r}\\
&= 0 + 1\cdot{}^{n}C_{1}\,p\,q^{n-1} + 2\cdot{}^{n}C_{2}\,p^{2}q^{n-2} + \cdots + r\cdot{}^{n}C_{r}\,p^{r}q^{n-r} + \cdots + n\cdot{}^{n}C_{n}\,p^{n}\\
&= np\,q^{n-1} + \frac{2\,n(n-1)}{2!}\,p^{2}q^{n-2} + \cdots + n\,p^{n}\\
&= np\left[q^{n-1} + (n-1)\,p\,q^{n-2} + \cdots + p^{n-1}\right]\\
&= np\,(q+p)^{n-1} = np \qquad (\because\ p + q = 1)
\end{aligned}$$
Hence, the mean of the binomial distribution is np.
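The variance ($\sigma^{2}$) is given by
$$\begin{aligned}
\text{Variance } (\sigma^{2}) &= \sum_{r=0}^{n} r^{2}P(r) - \mu^{2} = \sum_{r=0}^{n}\left[r + r(r-1)\right]P(r) - \mu^{2}\\
&= \sum_{r=0}^{n} r\,P(r) + \sum_{r=0}^{n} r(r-1)\,P(r) - \mu^{2}\\
&= \mu + \sum_{r=0}^{n} r(r-1)\,{}^{n}C_{r}\,p^{r}q^{n-r} - \mu^{2}\\
&= \mu + \left[2\cdot 1\cdot{}^{n}C_{2}\,p^{2}q^{n-2} + 3\cdot 2\cdot{}^{n}C_{3}\,p^{3}q^{n-3} + \cdots + n(n-1)\cdot{}^{n}C_{n}\,p^{n}\right] - \mu^{2}\\
&= \mu + \left[2\cdot\frac{n(n-1)}{2!}\,p^{2}q^{n-2} + 6\cdot\frac{n(n-1)(n-2)}{3!}\,p^{3}q^{n-3} + \cdots + n(n-1)\,p^{n}\right] - \mu^{2}\\
&= \mu + n(n-1)\,p^{2}\left[q^{n-2} + (n-2)\,p\,q^{n-3} + \cdots + p^{n-2}\right] - \mu^{2}\\
&= \mu + n(n-1)\,p^{2}(q+p)^{n-2} - \mu^{2}\\
&= \mu + n(n-1)\,p^{2} - \mu^{2} \qquad (\because\ p + q = 1)\\
&= np + n(n-1)\,p^{2} - n^{2}p^{2} \qquad (\because\ \mu = np)\\
&= np\left[1 + (n-1)\,p - np\right]\\
\sigma^{2} &= np\,(1-p) = npq
\end{aligned}$$
Hence, the variance of the binomial distribution is npq.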
The standard deviation ($\sigma$) is given by
$$\text{Standard deviation } (\sigma) = \sqrt{npq}$$
Hence, the standard deviation of the binomial distribution is $\sqrt{npq}$.
Note. (i) $\gamma_{1} = \dfrac{q-p}{\sqrt{npq}} = \dfrac{1-2p}{\sqrt{npq}}$ gives the measure of skewness of the binomial distribution.
If $p < \frac{1}{2}$, skewness is positive; if $p > \frac{1}{2}$, skewness is negative; and if $p = \frac{1}{2}$, skewness is zero.
(ii) $\beta_{2} = 3 + \dfrac{1-6pq}{npq}$ gives a measure of the kurtosis of the binomial distribution.
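As a rough numerical check (not a proof), the sketch below computes the mean, the variance, γ₁ and β₂ by summing directly over the probabilities and compares them with np, npq and the two formulas in the note. The helper name `binomial_moments` is hypothetical. The values n = 16, p = 1/4 are the ones obtained in Example 7 of the solved examples. Python 3.9+ is assumed.

```python
from math import comb, sqrt

def binomial_moments(n: int, p: float) -> tuple[float, float, float, float]:
    """Mean, variance, skewness gamma_1 and kurtosis beta_2, summed over the pmf."""
    q = 1 - p
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    m3 = sum((r - mean) ** 3 * pr for r, pr in enumerate(probs))   # third central moment
    m4 = sum((r - mean) ** 4 * pr for r, pr in enumerate(probs))   # fourth central moment
    return mean, var, m3 / var**1.5, m4 / var**2

n, p = 16, 0.25
q = 1 - p
mean, var, gamma1, beta2 = binomial_moments(n, p)
print(round(mean, 10), round(var, 10))                          # 4.0 3.0
print(abs(gamma1 - (q - p) / sqrt(n * p * q)) < 1e-9)           # True
print(abs(beta2 - (3 + (1 - 6 * p * q) / (n * p * q))) < 1e-9)  # True
```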

SOLVED EXAMPLES
Example 1. A die is thrown 6 times. If getting an even number is a success, what
is the probability of :
(i) no success (ii) exactly 5 successes
(iii) at least 5 successes (iv) at most 5 successes.
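Solution. Here, S = {1, 2, 3, 4, 5, 6}. Let A denote ‘getting an even number’. Then A = {2, 4, 6}, so
$$p = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}, \qquad q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}, \qquad n = 6.$$
We know that $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
(i) $P(\text{no success}) = P(r = 0) = {}^{6}C_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{6-0} = \left(\frac{1}{2}\right)^{6} = \frac{1}{64}$
(ii) $P(\text{exactly 5 successes}) = P(5) = {}^{6}C_{5}\left(\frac{1}{2}\right)^{5}\left(\frac{1}{2}\right)^{6-5} = 6\left(\frac{1}{2}\right)^{6} = \frac{3}{32}$
(iii) $P(\text{at least 5 successes}) = P(r \ge 5) = P(\text{5 successes or 6 successes}) = P(5) + P(6)$
$$= {}^{6}C_{5}\left(\tfrac{1}{2}\right)^{5}\left(\tfrac{1}{2}\right)^{6-5} + {}^{6}C_{6}\left(\tfrac{1}{2}\right)^{6}\left(\tfrac{1}{2}\right)^{6-6} = \frac{3}{32} + \frac{1}{64} = \frac{6+1}{64} = \frac{7}{64}$$
(iv) $P(\text{at most 5 successes}) = P(r \le 5) = 1 - P(r > 5) = 1 - P(r = 6) = 1 - \frac{1}{64} = \frac{63}{64}$.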
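The four answers of Example 1 can be checked exactly with Python's `fractions.Fraction`, which avoids floating-point rounding. This is a small sketch; the helper name `pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def pmf(r: int, n: int, p: Fraction) -> Fraction:
    """Exact binomial probability nCr p^r q^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 6, Fraction(1, 2)                 # six throws, P(even number) = 1/2
print(pmf(0, n, p))                      # 1/64
print(pmf(5, n, p))                      # 3/32
print(pmf(5, n, p) + pmf(6, n, p))       # 7/64
print(1 - pmf(6, n, p))                  # 63/64
```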
Example 2. The items produced by a company contain 5% defective items. What
is the probability of getting 2 defective items in a sample of 10 items?
Solution. Here, $p = \frac{5}{100} = \frac{1}{20}$, $n = 10$, $r = 2$, and $q = 1 - p = 1 - \frac{1}{20} = \frac{19}{20}$.
We know that $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
$$P(\text{2 defective items}) = P(r = 2) = {}^{10}C_{2}\left(\frac{1}{20}\right)^{2}\left(\frac{19}{20}\right)^{10-2} = \frac{10\times 9}{2}\times\frac{(19)^{8}}{(20)^{10}} = \frac{45\times(19)^{8}}{(20)^{10}} \approx 0.0746.$$
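Example 3. A pair of dice is thrown 10 times. If getting a doublet (the same number
on both dice) is considered a success, find the probability of
(i) no success (ii) 3 successes.
Solution. A doublet can be obtained when a pair of dice is thrown as
(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), i.e., in 6 ways.
The two dice can be thrown in $6^{2} = 36$ ways.
$$p = P(\text{getting a doublet}) = \frac{6}{36} = \frac{1}{6}, \qquad q = 1 - p = 1 - \frac{1}{6} = \frac{5}{6}, \qquad n = 10$$
We know that $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
(i) $P(\text{no success}) = P(0) = {}^{10}C_{0}\left(\frac{1}{6}\right)^{0}\left(\frac{5}{6}\right)^{10-0} = 1\times 1\times\left(\frac{5}{6}\right)^{10} = \left(\frac{5}{6}\right)^{10}$
(ii) $P(\text{3 successes}) = P(3) = {}^{10}C_{3}\left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right)^{10-3} = \frac{10\times 9\times 8}{3\times 2\times 1}\times\frac{1}{6^{3}}\times\left(\frac{5}{6}\right)^{7} = \frac{120}{216}\left(\frac{5}{6}\right)^{7} = \frac{5}{9}\left(\frac{5}{6}\right)^{7}$.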

Example 4. Five cards are drawn successively with replacement from a well-
shuffled pack of 52 cards. What is the probability that
(i) none is spade (ii) only 3 cards are spade ?
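Solution. $p = P(\text{spade card}) = \frac{13}{52} = \frac{1}{4}$, $q = 1 - p = 1 - \frac{1}{4} = \frac{3}{4}$, $n = 5$.
We know that $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
(i) $P(\text{none is spade}) = P(0) = {}^{5}C_{0}\left(\frac{1}{4}\right)^{0}\left(\frac{3}{4}\right)^{5-0} = 1\times 1\times\left(\frac{3}{4}\right)^{5} = \frac{243}{1024}$
(ii) $P(\text{only 3 cards are spade}) = P(3) = {}^{5}C_{3}\left(\frac{1}{4}\right)^{3}\left(\frac{3}{4}\right)^{5-3} = \frac{5\times 4}{2\times 1}\times\frac{1}{4^{3}}\times\frac{3^{2}}{4^{2}} = 10\times\frac{9}{4^{5}} = \frac{90}{1024} = \frac{45}{512}$.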

Example 5. The probability of hitting a target is 10%, and 10 shots are fired
independently. What is the probability that the target will be hit at least once?
Solution. Here, $p = \frac{10}{100} = \frac{1}{10}$, $q = 1 - p = 1 - \frac{1}{10} = \frac{9}{10}$, $n = 10$.
We know that $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
$$\begin{aligned}
P(\text{target will be hit at least once}) &= P(r \ge 1) = 1 - P(r < 1) = 1 - P(r = 0)\\
&= 1 - {}^{10}C_{0}\left(\frac{1}{10}\right)^{0}\left(\frac{9}{10}\right)^{10-0} = 1 - 1\times 1\times\left(\frac{9}{10}\right)^{10}\\
&= 1 - 0.3487 = 0.6513.
\end{aligned}$$
Example 6. A policeman fires 4 bullets at a dacoit. The probability that the
dacoit will be killed by a bullet is 0.6. What is the probability that the dacoit is still alive?
Solution. Here, p = 0.6, q = 1 – p = 1 – 0.6 = 0.4, n = 4.
We know that $P(r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
$$P(\text{dacoit is still alive}) = P(\text{not killed}) = P(r = 0) = {}^{4}C_{0}\,(0.6)^{0}(0.4)^{4-0} = 1\times 1\times(0.4)^{4} = 0.0256.$$
Example 7. Find the parameters of the binomial distribution for which mean
= 4 and variance = 3.
Solution. We know that for a binomial distribution
Mean = np and variance = npq
Here, np = 4 and npq = 3
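We have $\dfrac{npq}{np} = \dfrac{3}{4} \Rightarrow q = \dfrac{3}{4}$, so $p = 1 - q = 1 - \dfrac{3}{4} = \dfrac{1}{4}$.
Then $n\times\dfrac{1}{4} = 4 \Rightarrow n = 16$.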
Example 8. Comment on the following statement: "The mean of a binomial
distribution is 3 and its standard deviation is 5."
Solution. We know that for a binomial distribution, mean = np and standard deviation = $\sqrt{npq}$.
Here, np = 3 and $\sqrt{npq} = 5$, i.e., npq = 25.
Now, $\dfrac{npq}{np} = \dfrac{25}{3} \Rightarrow q = \dfrac{25}{3} > 1$,
which is not possible, because a probability cannot exceed 1. Hence, the statement is incorrect.
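Examples 7 and 8, and Example 14 later on, all follow the same recipe: q = variance/mean, then p = 1 − q, then n = mean/p, rejecting any q outside (0, 1) or any n that is not a whole number. The sketch below implements that recipe with exact fractions. The helper name `binomial_parameters` is hypothetical, and the code assumes a positive mean.

```python
from fractions import Fraction

def binomial_parameters(mean: Fraction, variance: Fraction) -> tuple[int, Fraction]:
    """Recover (n, p) from mean = np and variance = npq, or raise if impossible."""
    q = variance / mean                     # npq / np = q
    if not 0 < q < 1:
        raise ValueError(f"q = {q} is not a valid probability")
    p = 1 - q
    n = mean / p
    if n.denominator != 1:
        raise ValueError(f"n = {n} is not a whole number of trials")
    return int(n), p

print(binomial_parameters(Fraction(4), Fraction(3)))        # (16, Fraction(1, 4))
print(binomial_parameters(Fraction(5), Fraction(10, 3)))    # (15, Fraction(1, 3))
try:
    binomial_parameters(Fraction(3), Fraction(25))           # mean 3, s.d. 5
except ValueError as err:
    print(err)                                                # q = 25/3 is not a valid probability
```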
Example 9. Obtain the mean and variance of a binomial distribution for which
P(X = 3) = 16 P(X = 7) and n = 10.
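Solution. $P(X = 3) = {}^{10}C_{3}\,p^{3}q^{10-3} = {}^{10}C_{3}\,p^{3}q^{7}$ and $P(X = 7) = {}^{10}C_{7}\,p^{7}q^{10-7} = {}^{10}C_{7}\,p^{7}q^{3}$.
According to the given condition,
$$\begin{aligned}
{}^{10}C_{3}\,p^{3}q^{7} &= 16\times{}^{10}C_{7}\,p^{7}q^{3}\\
p^{3}q^{7} &= 16\,p^{7}q^{3} \qquad (\because\ {}^{10}C_{3} = {}^{10}C_{7})\\
\Rightarrow\ q^{4} &= 16\,p^{4} = (2p)^{4} \ \Rightarrow\ q = 2p
\end{aligned}$$
In a binomial distribution, $p + q = 1 \Rightarrow p + 2p = 1 \Rightarrow p = \frac{1}{3}$, and $q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}$.
$$\text{Mean} = np = 10\times\frac{1}{3} = \frac{10}{3}, \qquad \text{Variance} = npq = 10\times\frac{1}{3}\times\frac{2}{3} = \frac{20}{9}.$$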
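Example 10. The probability of a man hitting a target is $\frac{1}{4}$. How many times
must he fire so that the probability of his hitting the target at least once is greater than $\frac{2}{3}$?
Solution. Let the man fire n times.
Here, $p = \frac{1}{4}$ and $q = 1 - p = 1 - \frac{1}{4} = \frac{3}{4}$.
$$P(\text{hitting the target at least once}) = P(r \ge 1) = 1 - P(r < 1) = 1 - P(r = 0)$$
According to the given condition,
$$\begin{aligned}
1 - P(r = 0) &> \frac{2}{3}\\
1 - {}^{n}C_{0}\left(\frac{1}{4}\right)^{0}\left(\frac{3}{4}\right)^{n-0} &> \frac{2}{3}\\
1 - (0.75)^{n} > \frac{2}{3}, \quad\text{i.e.,}\quad (0.75)^{n} &< \frac{1}{3} = 0.3333
\end{aligned}$$
Taking logarithms (base 10) on both sides, we have
$$\begin{aligned}
n\log(0.75) &< \log(0.3333)\\
n(-0.1249) &< -0.47712\\
n(0.1249) &> 0.47712\\
n &> \frac{0.47712}{0.1249}, \quad\text{i.e.,}\quad n > 3.82
\end{aligned}$$
Hence, he must fire at least n = 4 times.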
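The same answer can be reached without logarithms by a direct search over n using exact fractions. This is a swapped-in technique (brute-force search) rather than the book's method. The sketch assumes 0 < p ≤ 1 and a target below 1, so the loop is guaranteed to stop. The function name `min_shots` is illustrative.

```python
from fractions import Fraction

def min_shots(p: Fraction, target: Fraction) -> int:
    """Smallest n with 1 - q**n > target; assumes 0 < p <= 1 and target < 1."""
    q = 1 - p
    n = 1
    while 1 - q**n <= target:       # P(at least one hit in n shots) = 1 - q**n
        n += 1
    return n

print(min_shots(Fraction(1, 4), Fraction(2, 3)))   # 4
```

For n = 3 the probability is 37/64 ≈ 0.578, and for n = 4 it is 175/256 ≈ 0.684. Four shots is therefore the first value that exceeds 2/3.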
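Example 11. Six dice are thrown 729 times. How many times do you expect at
least three dice to show a five or six?
Solution. Here, p = the probability of getting 5 or 6 with one die $= \frac{2}{6} = \frac{1}{3}$,
$q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}$, $n = 6$, $N = 729$.
The expected number of times at least three dice show five or six
$$\begin{aligned}
&= N\cdot P(r \ge 3) = 729\times\left[P(r = 3) + P(r = 4) + P(r = 5) + P(r = 6)\right]\\
&= 729\left[{}^{6}C_{3}\left(\tfrac{1}{3}\right)^{3}\left(\tfrac{2}{3}\right)^{3} + {}^{6}C_{4}\left(\tfrac{1}{3}\right)^{4}\left(\tfrac{2}{3}\right)^{2} + {}^{6}C_{5}\left(\tfrac{1}{3}\right)^{5}\left(\tfrac{2}{3}\right)^{1} + {}^{6}C_{6}\left(\tfrac{1}{3}\right)^{6}\left(\tfrac{2}{3}\right)^{0}\right]\\
&= \frac{729}{3^{6}}\left[160 + 60 + 12 + 1\right] = 233.
\end{aligned}$$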
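A short exact check of Example 11 in Python. It sums P(r) for r = 3 to 6 with `Fraction` and multiplies by N, as in the solution above.

```python
from fractions import Fraction
from math import comb

n, N = 6, 729
p = Fraction(1, 3)          # P(a five or a six on one die)
q = 1 - p
prob = sum(comb(n, r) * p**r * q**(n - r) for r in range(3, n + 1))   # P(r >= 3)
print(prob, N * prob)       # 233/729 233
```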
Example 12. Out of 800 families with 5 children each, how many families would
be expected to have
(i) 3 boys (ii) 5 girls (iii) either 2 or 3 boys?
Assume equal probabilities for boys and girls.
Solution. Here, n = 5, N = 800,
$$p = P(\text{a boy}) = \frac{1}{2}, \qquad q = P(\text{a girl}) = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}.$$
(i) The probability of having 3 boys out of 5 children
$$= P(r = 3) = {}^{5}C_{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{5-3} = 10\left(\frac{1}{2}\right)^{5} = \frac{10}{32} = 0.3125$$
The expected number of families = N . P(r = 3) = 800 × 0.3125 = 250.
(ii) The probability of having 5 girls out of 5 children
$$= P(r = 0) = {}^{5}C_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{5-0} = 1\times 1\times\left(\frac{1}{2}\right)^{5} = \frac{1}{32} = 0.03125$$
The expected number of families = N . P(r = 0) = 800 × 0.03125 = 25.
(iii) The probability of having 2 or 3 boys out of 5 children
$$\begin{aligned}
&= P(r = 2 \text{ or } r = 3) = P(r = 2) + P(r = 3)\\
&= {}^{5}C_{2}\left(\tfrac{1}{2}\right)^{2}\left(\tfrac{1}{2}\right)^{5-2} + {}^{5}C_{3}\left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{5-3}\\
&= 10\left(\tfrac{1}{2}\right)^{5} + 10\left(\tfrac{1}{2}\right)^{5} = 20\left(\tfrac{1}{2}\right)^{5} = \frac{20}{32} = 0.625
\end{aligned}$$
The expected number of families = N . P(r = 2 or r = 3) = 800 × 0.625 = 500.
Example 13. In sampling a large number of parts manufactured by a machine,
the mean number of defectives in a sample of 20 is 2. Out of 2000 such samples, how
many would be expected to contain at least 3 defective parts?
Solution. Here, $\mu$ = mean number of defectives = 2, so np = 2 with n = 20.
$$p = \frac{2}{20} = \frac{1}{10} = 0.1, \qquad q = 1 - p = 1 - 0.1 = 0.9, \qquad N = 2000$$
The probability of having at least 3 defectives in a sample of 20 parts is
$$\begin{aligned}
P(r \ge 3) &= 1 - P(r < 3) = 1 - \left[P(r = 0) + P(r = 1) + P(r = 2)\right]\\
&= 1 - \left[{}^{20}C_{0}(0.1)^{0}(0.9)^{20-0} + {}^{20}C_{1}(0.1)^{1}(0.9)^{20-1} + {}^{20}C_{2}(0.1)^{2}(0.9)^{20-2}\right]\\
&= 1 - \left[0.1216 + 0.2702 + 0.2852\right] = 1 - 0.677 = 0.323
\end{aligned}$$
The expected number of samples = N . P(r ≥ 3) = 2000 × 0.323 = 646.
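Example 14. Find the binomial distribution whose mean is 5 and variance is $\frac{10}{3}$.
Solution. We know that mean = np and variance = npq.
So np = 5 and npq = $\frac{10}{3}$.
Now, $\dfrac{npq}{np} = \dfrac{10/3}{5} \Rightarrow q = \dfrac{2}{3}$, so $p = 1 - q = 1 - \dfrac{2}{3} = \dfrac{1}{3}$. But $np = 5 \Rightarrow n = 15$.
Hence, the binomial distribution is
$$P(r) = {}^{15}C_{r}\left(\frac{1}{3}\right)^{r}\left(\frac{2}{3}\right)^{15-r}, \quad r = 0, 1, 2, \ldots, 15.$$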
Example 15. Four coins are tossed 160 times. The number of times r heads
occur is given below.

r              0    1    2    3    4
No. of times   8    34   69   43   6

Fit a binomial distribution to this data on the hypothesis that the coins are unbiased.
Solution. The coins are unbiased, so the probability of getting a head is $\frac{1}{2}$.
So $p = \frac{1}{2}$, $q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$.
Here, n = 4, N = 160, and f(r) = expected frequency = N . P(r).
$$P(0) = {}^{4}C_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{4-0} = 1\times 1\times\left(\frac{1}{2}\right)^{4} = \frac{1}{16}$$
Using the recurrence relation, we have
$$P(r+1) = \frac{n-r}{r+1}\cdot\frac{p}{q}\cdot P(r) = \frac{4-r}{r+1}\,P(r) \qquad (\because\ p = q \text{ and } n = 4)$$
$$\begin{aligned}
P(1) &= \frac{4-0}{0+1}\,P(0) = 4\times\frac{1}{16} = \frac{1}{4}\\
P(2) &= \frac{4-1}{1+1}\,P(1) = \frac{3}{2}\times\frac{1}{4} = \frac{3}{8}\\
P(3) &= \frac{4-2}{2+1}\,P(2) = \frac{2}{3}\times\frac{3}{8} = \frac{1}{4}\\
P(4) &= \frac{4-3}{3+1}\,P(3) = \frac{1}{4}\times\frac{1}{4} = \frac{1}{16}
\end{aligned}$$

r    P(r)    N . P(r)
0    1/16    160 × 1/16 = 10
1    1/4     160 × 1/4 = 40
2    3/8     160 × 3/8 = 60
3    1/4     160 × 1/4 = 40
4    1/16    160 × 1/16 = 10
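A minimal sketch of the same fitting procedure in Python. It starts at P(0) = qⁿ, applies the recurrence, and scales by N, using exact fractions under the text's hypothesis p = 1/2. The function name `fit_binomial` is hypothetical.

```python
from fractions import Fraction

def fit_binomial(n: int, p: Fraction, N: int) -> list[Fraction]:
    """Expected frequencies N*P(r), r = 0..n, built with the recurrence formula."""
    q = 1 - p
    prob = q**n                                           # P(0)
    freqs = [N * prob]
    for r in range(n):
        prob = prob * Fraction(n - r, r + 1) * (p / q)    # P(r + 1)
        freqs.append(N * prob)
    return freqs

observed = [8, 34, 69, 43, 6]
expected = fit_binomial(4, Fraction(1, 2), sum(observed))   # N = 160
print([int(f) for f in expected])    # [10, 40, 60, 40, 10]  (exact integers here)
```

With a biased p the expected frequencies are usually not whole numbers. They would then be rounded, as in the fitted tables of the text.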

EXERCISE 2.1
1. A pair of dice is thrown 6 times. If getting a total of 9 is considered a success, what is the
probability of at least 5 successes?
2. A die is thrown 6 times. If getting an odd number is a success, find the probability of
(i) no success (ii) 5 successes
(iii) at least 5 successes (iv) at most 5 successes.
3. A coin is tossed 5 times. What is the probability of getting at least 3 heads?
4. Find the probability distribution of the number of heads observed when a coin is tossed
3 times.
5. If on an average one ship in every ten is wrecked, find the probability that out of 5 ships
expected to arrive, at least 4 will arrive safely.
6. A pair of dice is thrown 4 times. What is the probability of getting doublets at least
twice?
7. “The mean and variance of a binomial distribution are respectively 6 and 9.” Is this
statement correct?
8. A student is given a true-false examination with 8 questions. If he gets 6 or more correct
answers, he passes the examination. Given that he guesses the answer to each question,
find the probability that he passes the examination.
9. In a box containing 60 bulbs, 6 are defective. What is the probability that out of a sample
of 5 bulbs
(i) none is defective (ii) exactly 2 are defective?
10. The sum of the mean and variance of a binomial distribution is 15 and the sum of their
squares is 117. Determine the distribution.
11. Out of 2000 families with 4 children each, how many would you expect to have
(i) at least one boy (ii) 2 boys
(iii) 1 or 2 girls (iv) no girls?
Assume equal probabilities for boys and girls.
12. In sampling a large number of parts manufactured by a machine, the mean number of
defectives in a sample of 20 is 2. Out of 1000 such samples, how many would be expected
to contain at least 3 defective parts?
13. Assume that 20% of the population of a city are literate, so that the chance of an
individual being literate is $\frac{1}{5}$, and that 100 investigators each take 10
individuals to see whether they are literate. How many investigators would you expect
to report that 3 or fewer were literate?

Answers
1. $\dfrac{49}{9^{6}}$   2. (i) $\dfrac{1}{64}$ (ii) $\dfrac{3}{32}$ (iii) $\dfrac{7}{64}$ (iv) $\dfrac{63}{64}$
3. $\dfrac{1}{2}$
4.
r      0     1     2     3
P(r)   1/8   3/8   3/8   1/8
5. $\dfrac{7}{5}\left(\dfrac{9}{10}\right)^{4}$   6. $\dfrac{19}{144}$
7. No.   8. $\dfrac{37}{256}$
9. (i) $\left(\dfrac{9}{10}\right)^{5}$ (ii) $\dfrac{729}{10000}$
10. $P(r) = {}^{27}C_{r}\left(\dfrac{1}{3}\right)^{r}\left(\dfrac{2}{3}\right)^{27-r}$; r = 0, 1, 2, ..., 27
11. (i) 1875 (ii) 750 (iii) 1250 (iv) 125
12. 323   13. 88 (approx.)

2.5. POISSON DISTRIBUTION

Poisson distribution is a discrete probability distribution. It was discovered by the
French mathematician Simeon Denis Poisson in 1837.
In a random experiment, let p be the probability of the occurrence of an event
and let n trials be made in such a way that
(i) p is very small, i.e., $p \to 0$
(ii) n is very large, i.e., $n \to \infty$
(iii) $np = \lambda$ (say) is finite.
Then the probability of occurrence of this event r times is given by the Poisson
distribution as
$$P(X = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}, \quad \text{where } r = 0, 1, 2, 3, \ldots$$
Note. (i) P(X = r) is usually written as P(r).
(ii) $\lambda$ is called the parameter of the distribution.
(iii) The sum of the probabilities P(r) for r = 0, 1, 2, 3, ... is 1,
since
$$\begin{aligned}
P(0) + P(1) + P(2) + P(3) + \cdots &= \frac{e^{-\lambda}\lambda^{0}}{0!} + \frac{e^{-\lambda}\lambda^{1}}{1!} + \frac{e^{-\lambda}\lambda^{2}}{2!} + \frac{e^{-\lambda}\lambda^{3}}{3!} + \cdots\\
&= e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \cdots\right] = e^{-\lambda}\cdot e^{\lambda} = 1.
\end{aligned}$$
(iv) The events must be random and independent of each other.
(v) Events must be rare events.
(vi) If the set of n independent trials is repeated N times, then the expected frequency of r
successes is N . P(r).
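A minimal Python sketch of the Poisson probability, assuming only the standard library. It checks note (iii) on a truncated series and compares the binomial and Poisson values in the limiting case described above. The function name `poisson_pmf` and the illustrative values n = 200, p = 0.02 are assumptions, not from the text.

```python
from math import comb, exp, factorial

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) = e^(-lam) * lam**r / r! for a Poisson variate with parameter lam."""
    return exp(-lam) * lam**r / factorial(r)

lam = 2.0
# Note (iii): the probabilities sum to 1 (series truncated after 50 terms here).
partial = sum(poisson_pmf(r, lam) for r in range(50))
print(round(partial, 12))                 # 1.0

# Limiting case: binomial with n = 200, p = 0.02 (np = 4) against Poisson with lam = 4.
binom_3 = comb(200, 3) * 0.02**3 * 0.98**197
print(round(binom_3, 4), round(poisson_pmf(3, 4.0), 4))
```

The last two printed values (about 0.196 and 0.195) agree to roughly two decimal places. This illustrates why the Poisson form is a convenient approximation when n is large and p is small.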

2.6. APPLICATIONS OF POISSON DISTRIBUTION

This distribution is mainly applied in problems concerning


(i) The demand for a product.
(ii) Typographical errors in a book.
(iii) The occurrence of accidents in a factory over a period of time.
(iv) The pattern of arrival of customers at a check-out counter.
(v) Number of air accidents in some time.
(vi) Number of deaths in an area due to a rare disease.
(vii) Number of fragments from a shell hitting a target.
(viii) Number of printing mistakes on each page of the book.

2.7. RECURRENCE FORMULA FOR THE POISSON DISTRIBUTION
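
We have $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$ and $P(r+1) = \dfrac{e^{-\lambda}\lambda^{r+1}}{(r+1)!}$.
$$\frac{P(r+1)}{P(r)} = \frac{e^{-\lambda}\lambda^{r+1}/(r+1)!}{e^{-\lambda}\lambda^{r}/r!} = \frac{\lambda^{r+1}}{\lambda^{r}}\times\frac{r!}{(r+1)!} = \frac{\lambda\,r!}{(r+1)\,r!} = \frac{\lambda}{r+1}$$
$$P(r+1) = \frac{\lambda}{r+1}\,P(r),$$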
which is the required recurrence formula. Using this formula successively, we can
find P(1), P(2), ..., if P(0) is known.
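The Poisson recurrence in code is a short loop starting from P(0) = e^(−λ). This is a sketch with a hypothetical function name. The `list[float]` annotation assumes Python 3.9+.

```python
from math import exp

def poisson_probs_by_recurrence(lam: float, r_max: int) -> list[float]:
    """[P(0), ..., P(r_max)] via P(r+1) = lam / (r + 1) * P(r), starting at P(0) = e^(-lam)."""
    probs = [exp(-lam)]
    for r in range(r_max):
        probs.append(lam / (r + 1) * probs[-1])
    return probs

print([round(x, 4) for x in poisson_probs_by_recurrence(2.0, 4)])
# [0.1353, 0.2707, 0.2707, 0.1804, 0.0902]
```

The value 0.2706 that appears in Example 8 comes from multiplying the rounded e^(−2) = 0.1353. The full-precision value of P(1) rounds to 0.2707.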

2.8. MEAN, VARIANCE AND STANDARD DEVIATION OF POISSON DISTRIBUTION
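For Poisson distribution, we have
$$P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$$
The mean ($\mu$) is given by
$$\begin{aligned}
\text{Mean } (\mu) &= \sum_{r=0}^{\infty} r\,P(r) = \sum_{r=0}^{\infty} r\,\frac{e^{-\lambda}\lambda^{r}}{r!} = e^{-\lambda}\sum_{r=0}^{\infty}\frac{r\,\lambda^{r}}{r!}\\
&= e^{-\lambda}\left[0 + 1\cdot\frac{\lambda}{1!} + 2\cdot\frac{\lambda^{2}}{2!} + 3\cdot\frac{\lambda^{3}}{3!} + \cdots\right]\\
&= e^{-\lambda}\left[\lambda + \frac{\lambda^{2}}{1!} + \frac{\lambda^{3}}{2!} + \cdots\right]\\
&= \lambda e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \cdots\right] = \lambda e^{-\lambda}\cdot e^{\lambda} = \lambda
\end{aligned}$$
Hence, the mean of the Poisson distribution is equal to the parameter $\lambda$.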
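The variance ($\sigma^{2}$) is given by
$$\begin{aligned}
\text{Variance } (\sigma^{2}) &= \sum_{r=0}^{\infty} r^{2}P(r) - \mu^{2} = \sum_{r=0}^{\infty} r^{2}\,\frac{e^{-\lambda}\lambda^{r}}{r!} - \lambda^{2}\\
&= e^{-\lambda}\left[0 + 1^{2}\frac{\lambda}{1!} + 2^{2}\frac{\lambda^{2}}{2!} + 3^{2}\frac{\lambda^{3}}{3!} + 4^{2}\frac{\lambda^{4}}{4!} + \cdots\right] - \lambda^{2}\\
&= \lambda e^{-\lambda}\left[1 + \frac{2\lambda}{1!} + \frac{3\lambda^{2}}{2!} + \frac{4\lambda^{3}}{3!} + \cdots\right] - \lambda^{2}\\
&= \lambda e^{-\lambda}\left[1 + \frac{(1+1)\lambda}{1!} + \frac{(1+2)\lambda^{2}}{2!} + \frac{(1+3)\lambda^{3}}{3!} + \cdots\right] - \lambda^{2}\\
&= \lambda e^{-\lambda}\left[\left(1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \cdots\right) + \left(\frac{\lambda}{1!} + \frac{2\lambda^{2}}{2!} + \frac{3\lambda^{3}}{3!} + \cdots\right)\right] - \lambda^{2}\\
&= \lambda e^{-\lambda}\left[e^{\lambda} + \lambda\left(1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \cdots\right)\right] - \lambda^{2}\\
&= \lambda e^{-\lambda}\left[e^{\lambda} + \lambda e^{\lambda}\right] - \lambda^{2} = \lambda e^{-\lambda}e^{\lambda}(1+\lambda) - \lambda^{2} = \lambda(1+\lambda) - \lambda^{2} = \lambda
\end{aligned}$$
Hence, the variance of the Poisson distribution is also $\lambda$.
The standard deviation ($\sigma$) is given by standard deviation ($\sigma$) $= \sqrt{\lambda}$.
Note. The mean and the variance of the Poisson distribution are the same.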
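As a numerical illustration (not a proof), the sketch below sums r·P(r) and r²·P(r) over a truncated series and confirms that the mean and the variance both come out equal to λ. The cutoff of 200 terms and the helper name are assumptions. The cutoff is adequate for moderate λ, but `exp(-lam)` underflows to zero for very large λ.

```python
from math import exp

def poisson_mean_variance(lam: float, terms: int = 200) -> tuple[float, float]:
    """Mean and variance from a truncated sum over the Poisson pmf (recurrence-built)."""
    prob, mean, second = exp(-lam), 0.0, 0.0
    for r in range(terms):
        mean += r * prob
        second += r * r * prob
        prob *= lam / (r + 1)          # P(r + 1) from P(r)
    return mean, second - mean**2

m, v = poisson_mean_variance(3.0)
print(round(m, 10), round(v, 10))      # 3.0 3.0
```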

SOLVED EXAMPLES
Example 1. Using Poisson distribution, find the probability that the ace of
spades will be drawn from a well-shuffled pack of cards at least once in 104
consecutive trials. (Given $e^{-2} = 0.1353$)
Solution. $p = \frac{1}{52}$, $n = 104$, $\lambda = np = 104\times\frac{1}{52} = 2$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
$$P(\text{at least once}) = P(r \ge 1) = 1 - P(r < 1) = 1 - P(r = 0) = 1 - \frac{e^{-2}\,2^{0}}{0!} = 1 - e^{-2} = 1 - 0.1353 = 0.8647.$$
Example 2. Suppose a book of 585 pages contains 43 typographical mistakes. If
these mistakes are randomly distributed throughout the book, what is the probability
that 10 pages, selected at random, will be free of mistakes? (Given $e^{-0.735} = 0.4795$)
Solution. $p = \frac{43}{585} = 0.0735$, $n = 10$, $\lambda = np = 0.0735\times 10 = 0.735$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
$$\text{Required probability} = P(r = 0) = \frac{e^{-0.735}(0.735)^{0}}{0!} = e^{-0.735} = 0.4795.$$
Example 3. A car hire firm has two cars, which it hires out day-by-day. The
number of demands for a car on each day is distributed as a Poisson distribution with
mean 1.5. Calculate the proportion of days on which neither car is used and the proportion
of days on which some demand is refused. (Given $e^{-1.5} = 0.2231$)
Solution. The mean of the Poisson distribution is $\lambda$, so $\lambda = 1.5$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
The proportion of days on which neither car is used
= Probability of there being no demand for the car
$$= P(r = 0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = e^{-1.5} = 0.2231$$
The proportion of days on which some demand is refused
= Probability for the number of demands to be more than two
$$\begin{aligned}
&= P(r > 2) = 1 - P(r \le 2) = 1 - \left[P(r = 0) + P(r = 1) + P(r = 2)\right]\\
&= 1 - \left[e^{-1.5} + e^{-1.5}\frac{(1.5)}{1!} + e^{-1.5}\frac{(1.5)^{2}}{2!}\right]\\
&= 1 - \left[1 + 1.5 + 1.125\right]e^{-1.5} = 1 - 3.625\times 0.2231 = 0.1913.
\end{aligned}$$
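The two proportions in Example 3 can be reproduced with the Poisson recurrence. The sketch uses Python's full-precision `math.exp` rather than the four-figure table value.

```python
from math import exp

lam = 1.5                                # mean number of demands per day
p0 = exp(-lam)                           # P(r = 0)
p1 = lam / 1 * p0                        # recurrence: P(r+1) = lam/(r+1) * P(r)
p2 = lam / 2 * p1
print(round(p0, 4))                      # 0.2231  proportion of days neither car is used
print(round(1 - (p0 + p1 + p2), 4))      # 0.1912  proportion of days some demand is refused
```

The last digit differs from the 0.1913 in the solution only because the text multiplies by the rounded value e^(−1.5) = 0.2231.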
Example 4. Find the probability that at most 5 defective components will be
found in a lot of 200, if experience shows that 2% of such components are defective. Also
find the probability of more than 5 defective components. (Given $e^{-4} = 0.018$)
Solution. Here, $p = \frac{2}{100} = \frac{1}{50}$, $n = 200$, $\lambda = np = 200\times\frac{1}{50} = 4$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
Probability that at most 5 defective components will be found
$$\begin{aligned}
P(r \le 5) &= P(r = 0) + P(r = 1) + P(r = 2) + P(r = 3) + P(r = 4) + P(r = 5)\\
&= e^{-4}\frac{(4)^{0}}{0!} + e^{-4}\frac{(4)^{1}}{1!} + e^{-4}\frac{(4)^{2}}{2!} + e^{-4}\frac{(4)^{3}}{3!} + e^{-4}\frac{(4)^{4}}{4!} + e^{-4}\frac{(4)^{5}}{5!}\\
&= e^{-4}\left[1 + 4 + \frac{16}{2} + \frac{64}{6} + \frac{256}{24} + \frac{1024}{120}\right]\\
&= 0.018\times 42.86 = 0.7715
\end{aligned}$$
Probability of more than 5 defective components
$$P(r > 5) = 1 - P(r \le 5) = 1 - 0.7715 = 0.2285.$$

Example 5. It is given that 2% of the electric bulbs manufactured by a company
are defective. Using Poisson distribution, find the probability that a sample of 200
bulbs will contain
(i) no defective bulb (ii) 2 defective bulbs
(iii) at most 3 defective bulbs (iv) at least 3 defective bulbs. (Given $e^{-4} = 0.0183$)
Solution. Here, $p = \frac{2}{100} = \frac{1}{50}$, $n = 200$, $\lambda = np = 200\times\frac{1}{50} = 4$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
(i) Probability of no defective bulb $= P(r = 0) = e^{-4}\dfrac{(4)^{0}}{0!} = e^{-4} = 0.0183$
(ii) Probability of 2 defective bulbs $= P(r = 2) = e^{-4}\dfrac{(4)^{2}}{2!} = 0.0183\times\dfrac{16}{2} = 0.1464$
(iii) Probability of at most 3 defective bulbs
$$\begin{aligned}
P(r \le 3) &= P(r = 0) + P(r = 1) + P(r = 2) + P(r = 3)\\
&= e^{-4} + e^{-4}\frac{(4)^{1}}{1!} + e^{-4}\frac{(4)^{2}}{2!} + e^{-4}\frac{(4)^{3}}{3!}\\
&= e^{-4}\left[1 + 4 + 8 + \frac{32}{3}\right] = 0.0183\times\frac{71}{3} = 0.4331
\end{aligned}$$
(iv) Probability of at least 3 defective bulbs
$$\begin{aligned}
P(r \ge 3) &= 1 - P(r < 3) = 1 - \left[P(r = 0) + P(r = 1) + P(r = 2)\right]\\
&= 1 - \left[e^{-4} + 4e^{-4} + 8e^{-4}\right] = 1 - 13\,e^{-4} = 1 - 0.0183\times 13\\
&= 1 - 0.2379 = 0.7621.
\end{aligned}$$
Example 6. Assume that the chance of an individual coal-miner being killed in
a mine accident during a year is $\frac{1}{1500}$. Use Poisson distribution to calculate the
probability that in a mine employing 375 miners, there will be at least one fatal accident
in a year. (Given $e^{-0.25} = 0.78$)
Solution. Here, $p = \frac{1}{1500}$, $n = 375$, $\lambda = np = 375\times\frac{1}{1500} = \frac{1}{4} = 0.25$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
$$\begin{aligned}
P(\text{at least one fatal accident}) = P(r \ge 1) &= 1 - P(r < 1) = 1 - P(r = 0)\\
&= 1 - e^{-0.25}\frac{(0.25)^{0}}{0!} = 1 - e^{-0.25} = 1 - 0.78 = 0.22.
\end{aligned}$$
Example 7. A manufacturer knows that the razor blades he makes contain on
the average 0.5% defectives. He packs them in packets of 5. What is the probability that
a packet picked at random contains 3 or more defective blades? (Given $e^{-0.025} = 0.9753$)
Solution. Here, $p = 0.5\% = \frac{0.5}{100} = 0.005$, $n = 5$, $\lambda = np = 5\times 0.005 = 0.025$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
Probability of 3 or more defective blades
$$\begin{aligned}
P(r \ge 3) &= 1 - P(r < 3) = 1 - \left[P(r = 0) + P(r = 1) + P(r = 2)\right]\\
&= 1 - \left[e^{-0.025} + e^{-0.025}\frac{(0.025)^{1}}{1!} + e^{-0.025}\frac{(0.025)^{2}}{2!}\right]\\
&= 1 - e^{-0.025}\left[1 + 0.025 + 0.0003125\right] = 1 - 1.0253125\,e^{-0.025}.
\end{aligned}$$
Because the answer is very small, the four-figure value of $e^{-0.025}$ is not accurate enough here.
With $e^{-0.025} = 0.9753099$, this gives $1 - 0.9999974 = 0.0000026$ (approx.).
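The sketch below, assuming ordinary IEEE double-precision floats, computes the tail probability of Example 7 two ways. The first is the complement used in the solution. The second sums the tail terms P(3), P(4), ... directly, which avoids subtracting two nearly equal numbers. Truncating the tail at r = 19 is an assumption; the terms beyond it are negligible for λ = 0.025.

```python
from math import exp, factorial

lam = 5 * 0.005                                   # n * p = 0.025
# As in the text: 1 - [P(0) + P(1) + P(2)].
by_complement = 1 - exp(-lam) * (1 + lam + lam**2 / 2)
# The same tail summed directly, term by term, for r = 3, 4, ..., 19.
by_tail_sum = sum(exp(-lam) * lam**r / factorial(r) for r in range(3, 20))
print(f"{by_complement:.2e} {by_tail_sum:.2e}")   # 2.56e-06 2.56e-06
```

Both routes give about 2.6 × 10⁻⁶. The leading term e^(−λ)λ³/3! already accounts for almost all of it.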
Example 8. If the variance of the Poisson distribution is 2, find the probabilities for
r = 1, 2, 3, 4 using the recurrence relation of the Poisson distribution. Also find $P(r \ge 4)$.
(Given $e^{-2} = 0.1353$)
Solution. Here, variance = 2, so $\lambda = 2$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$, so
$$P(0) = e^{-2}\frac{(2)^{0}}{0!} = e^{-2} = 0.1353$$
We know that the recurrence relation is
$$P(r+1) = \frac{\lambda}{r+1}\,P(r)$$
Now putting r = 0, 1, 2, 3 in the recurrence relation, we have
$$\begin{aligned}
P(1) &= \frac{\lambda}{1}\,P(0) = 2\times 0.1353 = 0.2706\\
P(2) &= \frac{\lambda}{2}\,P(1) = \frac{2}{2}\times 0.2706 = 0.2706\\
P(3) &= \frac{\lambda}{3}\,P(2) = \frac{2}{3}\times 0.2706 = 0.1804\\
P(4) &= \frac{\lambda}{4}\,P(3) = \frac{2}{4}\times 0.1804 = 0.0902
\end{aligned}$$
and
$$\begin{aligned}
P(r \ge 4) &= 1 - P(r < 4) = 1 - \left[P(r = 0) + P(r = 1) + P(r = 2) + P(r = 3)\right]\\
&= 1 - (0.1353 + 0.2706 + 0.2706 + 0.1804) = 1 - 0.8569 = 0.1431.
\end{aligned}$$
Example 9. A manufacturer who produces medicine bottles finds that 0.1% of
the bottles are defective. The bottles are packed in boxes containing 500 bottles. A drug
manufacturer buys 1000 boxes from the producer of bottles. Using Poisson distribution,
find how many boxes will contain:
(i) no defective bottle
(ii) at least two defective bottles. (Given $e^{-0.5} = 0.6065$)
Solution. Here, $p = 0.1\% = \frac{0.1}{100} = 0.001$, $n = 500$, $\lambda = np = 500\times 0.001 = 0.5$, $N = 1000$.
Number of boxes containing no defective bottle
$$= N\cdot P(r = 0) = 1000\times e^{-0.5}\frac{(0.5)^{0}}{0!} = 1000\times 0.6065 = 606.5 = 606 \text{ (approx.)}$$
Number of boxes containing at least two defective bottles
$$\begin{aligned}
&= N\cdot P(r \ge 2) = N\cdot\left[1 - P(r < 2)\right] = N\times\left[1 - \left(P(r = 0) + P(r = 1)\right)\right]\\
&= 1000\times\left[1 - \left(e^{-0.5} + e^{-0.5}\frac{(0.5)^{1}}{1!}\right)\right]\\
&= 1000\times\left[1 - (0.6065\times 1.5)\right] = 1000\times\left[1 - 0.90975\right]\\
&= 1000\times 0.09025 = 90.25 = 90 \text{ (approx.)}
\end{aligned}$$
Example 10. After correcting 100 pages of a book, the proof-reader finds that
there are, on the average, 4 errors per 10 pages. How many pages would one expect to
find with 0, 1 and 2 errors in 1000 pages of the first print of the book?
(Given $e^{-0.4} = 0.6703$)
Solution. Here, $\lambda$ = average number of errors per page $= \frac{4}{10} = 0.4$, $N = 1000$.
We know that $P(r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$.
(i) Probability of no errors $= P(r = 0) = e^{-0.4}\dfrac{(0.4)^{0}}{0!} = e^{-0.4} = 0.6703$
Number of pages containing no errors = N . P(r = 0) = 1000 × 0.6703 = 670.3 = 670 (approx.)
(ii) Probability of one error $= P(r = 1) = e^{-0.4}\dfrac{(0.4)^{1}}{1!} = 0.6703\times 0.4 = 0.26812$
Number of pages containing one error = N . P(r = 1) = 1000 × 0.26812 = 268.12 = 268 (approx.)
(iii) Probability of two errors $= P(r = 2) = e^{-0.4}\dfrac{(0.4)^{2}}{2!} = 0.6703\times 0.08 = 0.053624$
Number of pages containing two errors = N . P(r = 2) = 1000 × 0.053624 = 53.624 = 54 (approx.).
Example 11. For a Poisson variate X, calculate P(X > 0), if it is given that
4P(X = 4) = 5P(X = 5).
Solution. Given 4P(X = 4) = 5P(X = 5),
$$\begin{aligned}
4\cdot\frac{e^{-\lambda}\lambda^{4}}{4!} &= 5\cdot\frac{e^{-\lambda}\lambda^{5}}{5!}\\
4\,\frac{\lambda^{4}}{4!} &= 5\,\frac{\lambda^{5}}{5\cdot 4!}\\
4\lambda^{4} &= \lambda^{5} \ \Rightarrow\ \lambda = 4
\end{aligned}$$
$$P(X > 0) = 1 - P(X \le 0) = 1 - P(X = 0) = 1 - e^{-4}\frac{(4)^{0}}{0!} = 1 - e^{-4} = 1 - 0.0183 = 0.9817.$$
Example 12. The frequency of accidents per shift in a factory is shown in the
following data:

Accidents per shift   Frequency
0                     192
1                     100
2                     24
3                     3
4                     1
Total                 320

Calculate the mean number of accidents per shift. Fit a Poisson distribution and
calculate theoretical frequencies.
Solution. Mean number of accidents per shift
$$= \frac{\sum fx}{\sum f} = \frac{0 + 100 + 48 + 9 + 4}{320} = \frac{161}{320} = 0.5031$$
so $\lambda = 0.5031$.
$$\text{Required Poisson distribution} = N\cdot\frac{e^{-\lambda}\lambda^{r}}{r!} = 320\times e^{-0.5031}\times\frac{(0.5031)^{r}}{r!} = \frac{(193.48)\,(0.5031)^{r}}{r!}$$

r    N . P(r)    Theoretical frequencies
0    193.48      193
1    97.34       97
2    24.49       24
3    4.10        4
4    0.51        1
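A minimal sketch of the fitting procedure used in Examples 12 and 13. It estimates λ by the sample mean, as the text does, and then builds N . P(r) with the recurrence. The function name `fit_poisson` and the dictionary input format are illustrative assumptions. Python 3.9+ is assumed for the annotations.

```python
from math import exp

def fit_poisson(counts: dict[int, int]) -> tuple[float, list[float]]:
    """Estimate lam by the sample mean, then return expected frequencies N * P(r)."""
    N = sum(counts.values())
    lam = sum(r * f for r, f in counts.items()) / N
    expected, prob = [], exp(-lam)                 # prob starts at P(0)
    for r in range(max(counts) + 1):
        expected.append(N * prob)
        prob *= lam / (r + 1)                      # P(r + 1) from P(r)
    return lam, expected

lam, expected = fit_poisson({0: 192, 1: 100, 2: 24, 3: 3, 4: 1})
print(round(lam, 4))                               # 0.5031
print([round(e, 2) for e in expected])
```

The printed frequencies agree with the table above to within a few hundredths. The small differences come from the text rounding λ to 0.5031 before computing e^(−λ).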

Example 13. A typist kept a record of mistakes per day during 300 working
days:

Mistakes per day   0     1    2    3    4
Number of days     143   90   44   14   9

Fit a Poisson distribution for the above data and calculate theoretical frequencies.
Solution. Here, $\lambda = \text{mean} = \dfrac{\sum fx}{\sum f} = \dfrac{0 + 90 + 88 + 42 + 36}{300} = \dfrac{256}{300} = 0.853$, $N = 300$.
$$P(r) = e^{-0.853}\frac{(0.853)^{r}}{r!} = (0.426)\frac{(0.853)^{r}}{r!}$$
$$\begin{aligned}
P(0) &= (0.426)\frac{(0.853)^{0}}{0!} = 0.426\\
P(1) &= (0.426)\frac{(0.853)^{1}}{1!} = 0.426\times 0.853 = 0.363\\
P(2) &= (0.426)\frac{(0.853)^{2}}{2!} = 0.426\times 0.3638 = 0.155\\
P(3) &= (0.426)\frac{(0.853)^{3}}{3!} = 0.426\times 0.1034 = 0.044\\
P(4) &= (0.426)\frac{(0.853)^{4}}{4!} = 0.426\times 0.0221 = 0.009
\end{aligned}$$

r    N . P(r)    Theoretical frequencies
0    127.8       128
1    108.9       109
2    46.5        47
3    13.2        13
4    2.7         3

EXERCISE 2.2
1. Suppose a book of 600 pages contains 40 printing mistakes. If these mistakes are randomly
distributed throughout the book, what is the probability that 10 pages, selected at
random, will be free of mistakes? (Given $e^{-0.67} = 0.51$)
2. Suppose 300 misprints are distributed randomly throughout a book of 500 pages. Find
the probability that a given page contains (i) exactly 2 misprints (ii) 2 or more misprints.
3. Suppose 2 percent of the items made by a factory are defective. Find the probability that
there are 3 defective items in a sample of 100 items. (Given $e^{-2} = 0.135$)
4. If the probability that an individual suffers a bad reaction from a certain injection is
0.001, determine the probability that out of 2000 individuals
(i) exactly 3 (ii) more than 2 individuals
(iii) none (iv) more than 1 individual, will suffer a bad reaction.
5. An insurance company finds that 0.005% of the population dies from a certain kind of
accident each year. What is the probability that the company must pay off no more than
3 of 10,000 insured risks against such accident in a given year? (Given $e^{-0.5} = 0.6065$)
6. A manufacturer of screws knows that 4% of his product is defective. If he sells the screws
in boxes of 100 and guarantees that not more than 5 screws will be defective, what is the
probability that a box will fail to meet the guaranteed quality?
7. A manufacturer knows that the condensers he makes contain on the average 1% of
defectives. He packs them in boxes of 100. What is the probability that a box picked at
random will contain 4 or more defective condensers?
8. Assume that the probability of an individual coal-miner being killed in a mine accident
during a year is $\frac{1}{2400}$. Use Poisson distribution to calculate the probability that in a
mine employing 200 miners, there will be at least one fatal accident in a year.
9. An insurance company found that only 0.01% of the population is involved in a certain
type of accident each year. If its 1000 policy holders were randomly selected from the
population, then what is the probability that not more than two of its clients are involved
in such an accident next year? (Given $e^{-0.1} = 0.9048$)
10. If X is a Poisson variate such that P(X = 2) = 9 P(X = 4) + 90 P(X = 6), find the mean of X.
11. If X is a Poisson variate such that P(X = 1) = 0.01487 and P(X = 2) = 0.04461, find P(X = 3).
12. If X is a Poisson variate such that P(X = 1) = P(X = 2), find
(i) mean of the distribution (ii) P(X = 0) (iii) P(X = 4)
(Given $e^{-2} = 0.1353$)
13. The number of accidents in a year involving taxi drivers in a city follows a Poisson
distribution with mean equal to 3. Out of 1000 taxi drivers, find approximately the
number of drivers with
(i) no accident in a year (ii) more than 3 accidents in a year.
14. In a certain factory turning out razor blades, there is a small chance $\frac{1}{500}$ for any blade to
be defective. The blades are supplied in packets of 10. Using Poisson distribution,
calculate the approximate number of packets containing
(i) no defective
(ii) one defective and
(iii) two defective blades respectively in a consignment of 10,000 packets.
(Given $e^{-0.02} = 0.9802$)
15. The distribution of typing mistakes committed by a typist is given below:

Mistakes per page   0     1     2    3    4    5
No. of pages        142   156   69   27   5    1

Assuming a Poisson model, find the expected frequencies.

16. Accidents per day were recorded in a certain city for a period of 400 days, as follows:

No. of accidents   0     1     2    3    4    5
No. of days        213   128   37   18   3    1

Assuming a Poisson model, find the expected frequencies.

17. The first proof of 200 pages of a book containing 560 pages revealed the following
distribution of the number of printing errors:

No. of errors in a page   0     1    2    3    4    5
No. of pages              112   63   20   3    1    1

Fit a Poisson distribution corresponding to these data.

Answers
1. 0.51 2. (i) 0.1 (ii) 0.122
3. 0.18
4. (i) 0.18 (ii) 0.325 (iii) 0.135 (iv) 0.59

5. 0.3235 6. $1 - e^{-4}\left[5 + \dfrac{4^2}{2!} + \dfrac{4^3}{3!} + \dfrac{4^4}{4!} + \dfrac{4^5}{5!}\right]$
7. 0.019 8. 0.08 9. 0.9998
10. 1 11. 0.08922
12. (i) 2 (ii) 0.1353 (iii) 0.0902
13. (i) 50 (ii) 353
14. (i) 9802 (ii) 196 (iii) 2
15. 147, 147, 74, 25, 6, 1 pages 16. 202, 138, 47, 11, 2, 0
17. 109, 66, 20, 4, 1, 0
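The expected frequencies in answers 15 to 17 can be checked with a short Python sketch. It assumes the usual fitting procedure: the Poisson mean m is estimated by the sample mean, and each expected frequency is N × P(X = x), computed with the recurrence P(X = x) = P(X = x − 1) · m / x so that no factorials are needed. The name `poisson_fit` is hypothetical.

```python
import math


def poisson_fit(freqs):
    """Fit a Poisson model to a frequency table.

    freqs[x] is the observed frequency of the value x (x = 0, 1, 2, ...).
    Returns the estimated mean m and the expected frequencies N * P(X = x).
    """
    n = sum(freqs)
    m = sum(x * f for x, f in enumerate(freqs)) / n  # sample mean estimates m
    expected = []
    p = math.exp(-m)  # P(X = 0) = e^(-m)
    for x in range(len(freqs)):
        if x > 0:
            p *= m / x  # P(X = x) = P(X = x - 1) * m / x
        expected.append(n * p)
    return m, expected


# Exercise 15: typing mistakes per page
m, expected = poisson_fit([142, 156, 69, 27, 5, 1])
print(m, [round(e) for e in expected])  # → 1.0 [147, 147, 74, 25, 6, 1]
```

Called with [213, 128, 37, 18, 3, 1] and [112, 63, 20, 3, 1, 1], the same function gives, after rounding, 202, 138, 47, 11, 2, 0 and 109, 66, 20, 4, 1, 0, in agreement with answers 16 and 17.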

2.9. NORMAL DISTRIBUTION



The normal distribution was discovered by the French mathematician De Moivre in 1733. It was derived as a limiting case of the binomial distribution. The normal distribution is a continuous distribution.
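A random variable X is said to have a normal distribution with mean $\mu$ and standard deviation $\sigma$ if its probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty,\; -\infty < \mu < \infty,\; \sigma > 0,$$
where $e = 2.7183$ and $\sqrt{2\pi} = 2.5066$.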
The normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$.
The graph of the normal distribution is called the normal curve. It is bell-shaped and symmetrical about the mean $\mu$. The two tails of the curve extend to $+\infty$ and $-\infty$ in the positive and negative directions of the x-axis respectively, and gradually approach the x-axis without ever meeting it.
[Figure: the normal curve, symmetrical about x = μ, with the points x = a and x = b marked on the x-axis]
For a normally distributed random variable x with mean $\mu$ and variance $\sigma^2$, the probability that x lies between a and b is given by
P(a < x < b) = area under the normal curve f(x) between x = a and x = b.
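In the examples the area under the curve is read from a table, but it can also be computed from the normal cumulative distribution function F, as F(b) − F(a). The sketch below assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides a `cdf` method; `prob_between` is a hypothetical helper name. It uses the data of Example 5 below.

```python
from statistics import NormalDist


def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), i.e. the area under the
    normal curve between x = a and x = b, computed as F(b) - F(a)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Data of Example 5: mu = 50, sigma = 8, a = 38, b = 72
print(round(prob_between(38, 72, 50, 8), 4))  # → 0.9302
```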

2.10. PROPERTIES OF THE NORMAL DISTRIBUTION

The normal probability curve with mean $\mu$ and standard deviation $\sigma$ is given by the equation
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
and has the following properties :
(i) $f(x) \ge 0$.
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e., the total area under the normal curve above the x-axis is 1.
(iii) The normal curve is bell-shaped and symmetrical about the line $x = \mu$, i.e., the mean.
(iv) It is a unimodal distribution, and since it is also symmetrical, mean = median = mode.
(v) The height of the normal curve is maximum at the mean value. The maximum ordinate, at $x = \mu$, is given by $y = \dfrac{1}{\sigma\sqrt{2\pi}}$.
(vi) $P(\mu - \sigma < x < \mu + \sigma) \approx 68\%$
$P(\mu - 2\sigma < x < \mu + 2\sigma) \approx 95.5\%$
$P(\mu - 3\sigma < x < \mu + 3\sigma) \approx 99.7\%$.

2.11. STANDARD FORM OF THE NORMAL DISTRIBUTION

A random variable Z which has a normal distribution with $\mu = 0$ and $\sigma = 1$ is said to have a standard normal distribution. The probability density function for the normal distribution in standard form is given by
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2},$$
where $Z = \dfrac{x - \mu}{\sigma}$; Z is called the standardized normal random variable, and its distribution is denoted by $N(0, 1)$.
$P(a \le Z \le b)$ = area under the standard normal curve between Z = a and Z = b.
Note. The probabilities $P(z_1 \le Z \le z_2)$, $P(z_1 < Z \le z_2)$, $P(z_1 \le Z < z_2)$ and $P(z_1 < Z < z_2)$ are all regarded as the same, since for a continuous random variable the probability of any single value, such as $P(Z = z_1)$, is zero.
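The tables used in the following examples list the area under the standard normal curve between 0 and z. A sketch of how such table values arise, assuming Python 3.8+ `statistics.NormalDist`; `z_score` and `table_area` are hypothetical helper names, and a negative z is handled through the symmetry of the curve.

```python
from statistics import NormalDist

STANDARD = NormalDist(0, 1)  # Z ~ N(0, 1)


def z_score(x, mu, sigma):
    """Standardize x: Z = (x - mu) / sigma."""
    return (x - mu) / sigma


def table_area(z):
    """Area under the standard normal curve between 0 and z.
    For negative z, symmetry gives the same area as for |z|."""
    return abs(STANDARD.cdf(z) - 0.5)


for z in (1.0, 1.4, 2.0, -1.67):
    print(z, round(table_area(z), 4))
# 1.0 0.3413
# 1.4 0.4192
# 2.0 0.4772
# -1.67 0.4525
```

For instance, `z_score(98, 80, 6)` returns 3.0, the standard score found in part (i) of Example 1 below.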

SOLVED EXAMPLES
Example 1. The marks obtained by a group of students who appeared for a test
were normally distributed with mean 80 and standard deviation 6. Find the standard
scores for the student who scored
(i) 98 marks (ii) 58 marks (iii) 50 marks.
Solution. Suppose x is normally distributed with mean ($\mu$) = 80 and standard deviation ($\sigma$) = 6.
We know that
Standard normal variate $Z = \dfrac{x - \mu}{\sigma}$
(i) When x = 98, $Z = \dfrac{98 - 80}{6} = \dfrac{18}{6} = 3$
(ii) When x = 58, $Z = \dfrac{58 - 80}{6} = \dfrac{-22}{6} = -3.67$
(iii) When x = 50, $Z = \dfrac{50 - 80}{6} = \dfrac{-30}{6} = -5$.

Example 2. Find the area under the standard normal curve in each of the following :
(i) $P(0 \le Z \le 1.4)$ (ii) $P(-1.67 \le Z < 0)$
(iii) $P(0.65 \le Z < 2.35)$ (iv) $P(-3 \le Z < 1.6)$.
Solution. Using the table of areas under the standard normal curve, we have
(i) $P(0 \le Z \le 1.4) = 0.4192$
(ii) $P(-1.67 \le Z < 0) = P(0 < Z \le 1.67) = 0.4525$ (by symmetry)
(iii) $P(0.65 \le Z < 2.35) = P(0 < Z \le 2.35) - P(0 < Z \le 0.65)$
$= 0.4906 - 0.2422$
$= 0.2484$.
(iv) $P(-3 \le Z < 1.6) = P(-3 \le Z < 0) + P(0 \le Z < 1.6)$
$= P(0 < Z \le 3) + P(0 \le Z < 1.6)$ (by symmetry)
$= 0.4987 + 0.4452$
$= 0.9439$.

Example 3. Students of a class were given an aptitude test. Their marks were
found to be normally distributed with mean 60 and standard deviation 5. What
percentage of students scored more than 60 marks ?
Solution. Here, $\mu = 60$, $\sigma = 5$, x = 60
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{60 - 60}{5} = 0$
$P(x > 60) = P(Z > 0)$
$= P(0 < Z < \infty)$
$= 0.5 = 50\%$.

Example 4. A sample of 100 dry battery cells tested to find the length of life produced the following results :
$\mu$ = 12 hours, $\sigma$ = 3 hours
Assuming the data to be normally distributed, what percentage of battery cells are expected to have a life of
(i) more than 15 hours (ii) less than 6 hours
(iii) between 10 and 14 hours ?
Solution. Here, x denotes the length of life of dry battery cells.
We know that $Z = \dfrac{x - \mu}{\sigma} = \dfrac{x - 12}{3}$
(i) When x = 15, $Z = \dfrac{15 - 12}{3} = \dfrac{3}{3} = 1$
$P(x > 15) = P(Z > 1)$
$= P(0 < Z < \infty) - P(0 < Z < 1)$
$= 0.5 - 0.3413$
$= 0.1587 = 15.87\%$.
(ii) When x = 6, $Z = \dfrac{6 - 12}{3} = \dfrac{-6}{3} = -2$
$P(x < 6) = P(Z < -2)$
$= P(Z > 2)$ (by symmetry)
$= P(0 < Z < \infty) - P(0 < Z < 2)$
$= 0.5 - 0.4772$
$= 0.0228 = 2.28\%$
(iii) When x = 10, $Z = \dfrac{10 - 12}{3} = \dfrac{-2}{3} = -0.67$
When x = 14, $Z = \dfrac{14 - 12}{3} = \dfrac{2}{3} = 0.67$
$P(10 < x < 14) = P(-0.67 < Z < 0.67)$
$= P(-0.67 < Z < 0) + P(0 < Z < 0.67)$
$= P(0 < Z < 0.67) + P(0 < Z < 0.67)$
$= 2P(0 < Z < 0.67)$
$= 2 \times 0.2486 = 0.4972$
$= 49.72\%$.
Example 5. A normal distribution is given with mean 50 and standard deviation
8. Find the probability that x assumes a value between 38 and 72.
Solution. Here, $\mu = 50$, $\sigma = 8$
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{x - 50}{8}$
When x = 38, $Z = \dfrac{38 - 50}{8} = \dfrac{-12}{8} = -1.5$
When x = 72, $Z = \dfrac{72 - 50}{8} = \dfrac{22}{8} = 2.75$
$P(38 < x < 72) = P(-1.5 < Z < 2.75)$
$= P(-1.5 < Z < 0) + P(0 < Z < 2.75)$
$= P(0 < Z < 1.5) + P(0 < Z < 2.75)$
$= 0.4332 + 0.4970 = 0.9302$.
Example 6. In a sample of 1000 cases, the mean of a certain test is 14 and the standard deviation is 2.5. Assuming the distribution to be normal, find :
(i) how many students score between 12 and 15 ?
(ii) how many score above 18 ?
(iii) how many score below 8 ?
(iv) how many score 16 ?
Solution. Here, N = 1000, $\mu = 14$ and $\sigma = 2.5$
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{x - 14}{2.5}$
(i) When x = 12, $Z = \dfrac{12 - 14}{2.5} = \dfrac{-2}{2.5} = -0.8$
When x = 15, $Z = \dfrac{15 - 14}{2.5} = \dfrac{1}{2.5} = 0.4$
$P(12 < x < 15) = P(-0.8 < Z < 0.4)$
$= P(-0.8 < Z < 0) + P(0 < Z < 0.4)$
$= P(0 < Z < 0.8) + P(0 < Z < 0.4)$
$= 0.2881 + 0.1554 = 0.4435$
Number of students scoring between 12 and 15 = 1000 × 0.4435 = 443.5 ≈ 444 (approx.)
(ii) When x = 18, $Z = \dfrac{18 - 14}{2.5} = \dfrac{4}{2.5} = 1.6$
$P(x > 18) = P(Z > 1.6)$
$= P(0 < Z < \infty) - P(0 < Z < 1.6)$
$= 0.5 - 0.4452$
$= 0.0548$
Number of students scoring above 18 = 1000 × 0.0548 = 54.8 ≈ 55 (approx.)
(iii) When x = 8, $Z = \dfrac{8 - 14}{2.5} = \dfrac{-6}{2.5} = -2.4$
$P(x < 8) = P(Z < -2.4)$
$= P(Z > 2.4)$ (by symmetry)
$= P(0 < Z < \infty) - P(0 < Z < 2.4)$
$= 0.5 - 0.4918$
$= 0.0082$
Number of students scoring below 8 = 1000 × 0.0082 = 8.2 ≈ 8 (approx.)
(iv) Since the scores are whole numbers, a score of 16 corresponds to the interval from x = 15.5 to x = 16.5, so we find the area between these two values.
When x = 15.5, $Z = \dfrac{15.5 - 14}{2.5} = \dfrac{1.5}{2.5} = 0.6$
When x = 16.5, $Z = \dfrac{16.5 - 14}{2.5} = \dfrac{2.5}{2.5} = 1$
$P(15.5 < x < 16.5) = P(0.6 < Z < 1)$
$= P(0 < Z < 1) - P(0 < Z < 0.6)$
$= 0.3413 - 0.2257$
$= 0.1156$
Number of students scoring 16 = 1000 × 0.1156 = 115.6 ≈ 116 (approx.).
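Part (iv) uses a continuity correction: the whole-number score 16 is treated as the interval from 15.5 to 16.5. A sketch of the same calculation, assuming Python 3.8+ `statistics.NormalDist`; `expected_count_at` is an illustrative name. It uses exact areas rather than four-figure table values, and its probability (about 0.1156) agrees with the text to four decimal places.

```python
from statistics import NormalDist


def expected_count_at(score, mu, sigma, n):
    """Expected number of the n cases whose whole-number score equals `score`,
    using the continuity correction: score - 0.5 < x < score + 0.5."""
    dist = NormalDist(mu, sigma)
    return n * (dist.cdf(score + 0.5) - dist.cdf(score - 0.5))


# Example 6 (iv): N = 1000, mu = 14, sigma = 2.5, score 16
print(round(expected_count_at(16, 14, 2.5, 1000)))  # → 116
```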
Example 7. The life of army shoes is normally distributed with mean 8 months and standard deviation 2 months. If 5000 pairs are issued, how many pairs would be expected to need replacement after 12 months ?
Solution. Here, $\mu = 8$, $\sigma = 2$, N = 5000
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{12 - 8}{2} = \dfrac{4}{2} = 2$
$P(x > 12) = P(Z > 2)$
$= P(0 < Z < \infty) - P(0 < Z < 2)$
$= 0.5 - 0.4772$
$= 0.0228$
Number of pairs whose life is more than 12 months = 5000 × 0.0228 = 114
Number of pairs that wear out within 12 months, and so need replacement by the end of 12 months = 5000 – 114 = 4886 pairs.
Example 8. Assume that the diameters of 1000 brass plugs taken consecutively from a machine form a normal distribution with mean 0.7515 inches and standard deviation 0.0020 inches. How many of the plugs are likely to be rejected if the diameter is to be 0.752 ± 0.004 inches ?
Solution. Here, N = 1000, $\mu = 0.7515$, $\sigma = 0.0020$
Least diameter of a non-defective plug = 0.752 – 0.004 = 0.748 inches
Greatest diameter of a non-defective plug = 0.752 + 0.004 = 0.756 inches
When x = 0.748, $Z = \dfrac{0.748 - 0.7515}{0.0020} = -\dfrac{0.0035}{0.0020} = -1.75$
When x = 0.756, $Z = \dfrac{0.756 - 0.7515}{0.0020} = \dfrac{0.0045}{0.0020} = 2.25$
$P(0.748 \le x \le 0.756) = P(-1.75 \le Z \le 2.25)$
$= P(-1.75 \le Z \le 0) + P(0 \le Z \le 2.25)$
$= P(0 \le Z \le 1.75) + P(0 \le Z \le 2.25)$
$= 0.4599 + 0.4878 = 0.9477$
Number of plugs to be accepted = 1000 × 0.9477 = 947.7 ≈ 948 (approx.)
Number of plugs likely to be rejected = 1000 – 948 = 52.
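The rejection count can also be obtained directly from the specification limits, as N times the probability of falling outside nominal ± tolerance. A sketch assuming Python 3.8+ `statistics.NormalDist`; the helper `expected_rejects` and its parameter names are illustrative. With exact areas instead of four-figure table values it still rounds to the 52 rejected plugs found in Example 8.

```python
from statistics import NormalDist


def expected_rejects(n, mu, sigma, nominal, tolerance):
    """Expected number of the n items falling outside nominal ± tolerance."""
    dist = NormalDist(mu, sigma)
    p_accept = dist.cdf(nominal + tolerance) - dist.cdf(nominal - tolerance)
    return n * (1 - p_accept)


# Example 8: 1000 plugs, mean 0.7515 in, sd 0.0020 in, specification 0.752 ± 0.004 in
print(round(expected_rejects(1000, 0.7515, 0.0020, 0.752, 0.004)))  # → 52
```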
Example 9. A manufacturer knows from experience that the resistance of the resistors he produces is normally distributed with mean 100 ohms and standard deviation 2 ohms. What percentage of resistors will have resistance between 98 ohms and 102 ohms ?
Solution. Here, $\mu = 100$, $\sigma = 2$
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{x - 100}{2}$
When x = 98, $Z = \dfrac{98 - 100}{2} = \dfrac{-2}{2} = -1$
When x = 102, $Z = \dfrac{102 - 100}{2} = \dfrac{2}{2} = 1$
$P(98 < x < 102) = P(-1 < Z < 1) = P(-1 \le Z \le 0) + P(0 \le Z \le 1)$
$= P(0 \le Z \le 1) + P(0 \le Z \le 1) = 2P(0 \le Z \le 1)$
$= 2 \times 0.3413 = 0.6826$
Resistors having resistance between 98 ohms and 102 ohms = 68.26%.
Example 10. In a normal distribution, 31% of the items are under 45 and 8%
are over 64. Find the mean and standard deviation of the distribution.
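Sol. Let $\mu$ be the mean and $\sigma$ the standard deviation.
When x = 45, $Z_1 = \dfrac{45 - \mu}{\sigma}$
When x = 64, $Z_2 = \dfrac{64 - \mu}{\sigma}$
Area between 0 and $Z_1$ = 0.50 – 0.31 = 0.19
From the table, when the area is 0.19, $Z_1 = -0.496$ ($Z_1 < 0$, since fewer than half of the items are under 45)
Area between 0 and $Z_2$ = 0.5 – 0.08 = 0.42
From the table, when the area is 0.42, $Z_2 = 1.405$
[Figure: standard normal curve with 31% of the area to the left of z1, 19% between z1 and 0, 42% between 0 and z2, and 8% to the right of z2]
We have
$-0.496 = \dfrac{45 - \mu}{\sigma}$ and $1.405 = \dfrac{64 - \mu}{\sigma}$,
i.e. $\mu - 0.496\sigma = 45$ and $\mu + 1.405\sigma = 64$.
Subtracting the first equation from the second, $1.901\sigma = 19$, so $\sigma \approx 10$, and then $\mu = 45 + 0.496 \times 10 \approx 50$.
Hence $\mu = 50$ and $\sigma = 10$.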
Example 11. Fit a normal curve to the following data :

Length of line (in cm) 4 6 8 10 12 14 16 18 20 22 24

Frequency 1 7 15 22 35 43 38 20 13 5 1

Solution. First find the mean $\mu$ and standard deviation $\sigma$ as follows :
Let the assumed mean be A = 14.
Length of line   Frequency   d = x – A    fd       fd²
(in cm) x        f           = x – 14
4                1           – 10         – 10     100
6                7           – 8          – 56     448
8                15          – 6          – 90     540
10               22          – 4          – 88     352
12               35          – 2          – 70     140
14 (A)           43          0            0        0
16               38          2            76       152
18               20          4            80       320
20               13          6            78       468
22               5           8            40       320
24               1           10           10       100
Total            Σf = 200                 Σfd = – 30   Σfd² = 2940

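Mean ($\mu$) $= A + \dfrac{\sum fd}{\sum f} = 14 + \dfrac{(-30)}{200} = 14 - 0.15 = 13.85$
Standard deviation ($\sigma$) $= \sqrt{\dfrac{\sum fd^2}{\sum f} - \left(\dfrac{\sum fd}{\sum f}\right)^2} = \sqrt{\dfrac{2940}{200} - \left(\dfrac{-30}{200}\right)^2} = \sqrt{14.7 - 0.0225} = \sqrt{14.6775} = 3.83$
Hence, the equation of the normal curve is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} = \frac{1}{3.83\sqrt{2\pi}}\, e^{-\frac{(x - 13.85)^{2}}{29.355}}.$$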

EXERCISE 2.3
1. On a final examination in Statistics, the mean was 72, and the standard deviation was
15. Determine the standard scores of students receiving grades :
(i) 60 (ii) 93 (iii) 72.
2. Find the area under the normal curve in each of the cases :
(i) Z = 0 and Z = 1.2 (ii) Z = – 0.68 and Z = 0
(iii) Z = – 0.46 and Z = 2.21 (iv) Z = 0.81 and Z = 1.94
(v) To the left of Z = – 0.6 (vi) To the right of Z = – 1.28.
3. Find the value of Z in each of the cases
(i) area between 0 and Z is 0.3770 (ii) area to the left of Z is 0.8621
4. If X is a normal variate with mean 12 and standard deviation 4, then find
(i) $P(X \ge 20)$ (ii) $P(X \le 20)$
5. The scores of candidates in a certain test are normally distributed with mean 500 and
standard deviation 100. What percentage of candidates receives the scores between 350
and 550 ?
6. Assume the mean height of soldiers to be 68.22 inches with a variance of 10.8 square inches. How many soldiers in a regiment of 10,000 would you expect to be over 6 feet tall ?
7. In a sample of 1000 items, the mean weight and standard deviation are 45 kgs and
15 kgs respectively. Assuming the distribution to be normal, find the number of items
weighing between 40 kgs and 60 kgs.
8. A workshop produces 2000 units per day. The average weight of units is 130 kgs with a standard deviation of 10 kgs. Assuming a normal distribution, how many units are expected to weigh less than 142 kgs ?
9. The mean of a normal distribution is 50 and 5% of the values are greater than 60. Find the standard deviation of the distribution.
10. The time taken to complete a particular type of job is approximately normally distributed with a mean of 1.8 hours and a standard deviation of 0.1 hour. If normal working time finishes at 6.00 p.m. and a job is started at 4.00 p.m., what is the probability that the job will need overtime payments ?
11. The marks of the students in a class are normally distributed with mean 70 and standard deviation 5. If the instructor decides to give an 'A' grade to the top 15% of the students in the class, how many marks must a student get to be able to get an 'A' grade ?
12. Find the mean and standard deviation of a normal distribution from the following data :
10% of the items are under 40
95% of the items are under 75.
13. In a sample of 240 workers in a factory, the mean and standard deviation of wages were ₹ 113.50 and ₹ 30.30 respectively. Find the percentage of workers getting wages between ₹ 90 and ₹ 170 in the whole factory, assuming that the wages are normally distributed.
14. In an exactly normal distribution, 7% of the items are under 35 and 89% are under 63. What are the mean and standard deviation of the distribution ?
15. Fit a normal curve to the following data :

Variable (r) 0 1 2 3 4 5

Frequency (f) 10 14 19 8 5 4

Answers
1. (i) – 0.8 (ii) 1.4 (iii) 0
2. (i) 0.3849 (ii) 0.2518 (iii) 0.6637 (iv) 0.1828
(v) 0.2743 (vi) 0.8997 3. (i) z = ± 1.16 (ii) Z = 1.09
4. (i) 0.0228 (ii) 0.9772 5. 62.47% 6. 1251
7. 471 approx. 8. 1770 9. 6.1 10. 0.0228
11. 75.18 12. 55.32, 11.97 13. 75% 14. 50.3, 10.33

15. $f(x) = \dfrac{1}{1.40\sqrt{2\pi}}\, e^{-\frac{(x - 1.93)^{2}}{3.92}}$.

3. STATISTICAL DECISION THEORY

STRUCTURE

Introduction
Elements of a Decision Problem
Types of Decision Making Environment
Decision Making Under Uncertainty
Decision Making Under Risk
Decision Tree

3.1. INTRODUCTION

Decision analysis involves the use of a rational process for selecting the best of several alternatives. The goodness of a selected alternative depends on the quality of the data used in describing the decision situation. A decision making process falls into one of three categories : decision making under certainty, decision making under uncertainty and decision making under risk.
Nowadays, management students, businessmen, engineers, and persons from industry and government place great emphasis on decision making under conditions of uncertainty, as most situations involve making choices under uncertainty. The study of how to choose the best among a number of alternative courses of action is known as decision theory or statistical decision theory.

3.2. ELEMENTS OF A DECISION PROBLEM



There are certain essential elements which are common to all decision making
categories :
1. The decision maker. The decision maker is charged with the responsibility
for making the decisions. The decision maker can be an individual, a group of
individuals, any company, an industrial body, etc.

2. Acts. The acts are the alternative courses of action or strategies that are available to the decision maker.

3. Events. The events are also known as states of nature. They are the occurrences which are not under the control of the decision maker and which determine the level of success of a given act.
4. Pay-off. Each combination of a course of action and an event is associated with a pay-off. It is a quantitative measure of the value of the outcome to the decision maker, i.e. the net benefit to the decision maker from a given combination of course of action and event. The pay-off usually represents net monetary gain (profit), but other measures can also be used; for example, a cost can be treated as a negative profit.
5. Pay-off table. Suppose the problem under consideration has m possible events (states of nature) denoted by E1, E2, ..., Em and n alternative acts (strategies) denoted by A1, A2, ..., An. Then the pay-off corresponding to strategy Aj of the decision maker under the state of nature Ei will be denoted by pij (i = 1, 2, ..., m ; j = 1, 2, ..., n). The totality of mn pay-offs arranged in a tabular form is known as the pay-off table.

Events                Pay-off (₹) for courses of action/acts/strategies
(States of nature)    A1       A2       ......    An
E1                    p11      p12      ......    p1n
E2                    p21      p22      ......    p2n
⋮                     ⋮        ⋮                  ⋮
Em                    pm1      pm2      ......    pmn

6. Regret or opportunity loss table. An opportunity loss is defined as the difference between the highest possible profit for a state of nature and the actual profit obtained for the particular action taken, i.e. an opportunity loss is the loss incurred due to failure to adopt the best possible action. For a given state of nature, the opportunity loss of a course of action is the difference between the pay-off for the best course of action that could have been selected and the pay-off of that course of action.

Events                Opportunity loss (₹) for courses of action/acts/strategies
(States of nature)    A1           A2           ......    An
E1                    M1 – p11     M1 – p12     ......    M1 – p1n
E2                    M2 – p21     M2 – p22     ......    M2 – p2n
⋮                     ⋮            ⋮                      ⋮
Em                    Mm – pm1     Mm – pm2     ......    Mm – pmn

where M1, M2, ..., Mm are the maximum pay-offs under the states of nature E1, E2, ..., Em respectively, i.e. Mi is the largest of pi1, pi2, ..., pin.

3.3. TYPES OF DECISION MAKING ENVIRONMENT

There are three categories of decision making environment.


1. Decision making under certainty. In this environment the decision maker
knows with certainty the consequence of every alternative or decision choice. There
will be only one outcome for each alternative. Examples of such decision problems are
linear programming, dynamic programming, transportation, assignment, integer
programming, etc.
2. Decision making under uncertainty. In this environment, the decision maker cannot assess the probabilities of the outcomes with confidence. In other words, the information about the outcomes is incomplete, and the available information cannot be described in terms of a probability distribution. Decisions under uncertainty refer to situations where more than one outcome can result from any single decision.
3. Decision making under risk. In this environment the pay-offs associated with each decision alternative are usually described by probability distributions. For this reason, decision making under risk is usually based on the expected monetary value (EMV) criterion or the expected opportunity loss (EOL) criterion.

3.4. DECISION MAKING UNDER UNCERTAINTY



Different rules for making a decision in such an environment are as follows :
1. Maximax or Minimin Criterion (Criterion of Optimism). This criterion
is based upon ‘extreme optimism’. The basic steps of this criterion are as follows :
(i) Determine the maximum possible pay-off for each alternative.
(ii) Choose that alternative which corresponds to the maximum of the above
maximum pay-offs.
In decision problems dealing with costs, the minimum for each alternative is
determined and then the alternative which minimizes the above minimum cost is
selected.
2. Maximin or Minimax Criterion (Criterion of Pessimism). This criterion is based upon the 'conservative approach', which assumes that the worst possible outcome is going to happen. The basic steps of this criterion are as follows :
(i) Determine the minimum possible pay-off for each alternative.
(ii) Choose that alternative which corresponds to the maximum of the above
minimum pay-offs.
In decision problems dealing with costs, the maximum for each alternative is
determined and then the alternative which minimizes the above maximum cost is
selected.
3. Laplace Criterion (Equally Likely Decisions). Since there is no information about the probabilities of occurrence, the Laplace criterion uses all the available information by assigning equal probabilities to all events of each alternative. The basic steps of this criterion are as follows :
(i) Assign equal probabilities $\frac{1}{n}$ to each pay-off of a strategy (having n pay-offs).
(ii) Determine the expected pay-off value for each alternative by multiplying each pay-off by its probability and then adding.
(iii) Choose that alternative which corresponds to the maximum of the expected pay-offs.
In decision problems dealing with costs, select that alternative which corresponds to the minimum of the expected costs.
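4. Savage Criterion (Criterion of Regret). The Savage criterion is based on the concept of regret (or opportunity loss). This criterion is also known as the minimax regret criterion. The basic steps of this criterion are as follows :
(i) Construct the regret table, where for each state of nature
$$\text{regret (opportunity loss)} = \begin{cases} (\text{maximum pay-off}) - (\text{pay-off}), & \text{if the pay-offs represent profits,} \\ (\text{pay-off}) - (\text{minimum pay-off}), & \text{if the pay-offs represent costs,} \end{cases}$$
the maximum (or minimum) being taken over all the alternatives under that state of nature.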
(ii) Determine the maximum regret for each alternative.
(iii) Choose the alternative with minimum regret out of these maximum regrets.
5. Hurwicz Criterion. The Hurwicz criterion is based on the concept that the
decision makers are neither completely pessimistic nor completely optimistic, but are
a combination of the two extremes. Therefore, we should give attention to both. The
basic steps of this criterion are as follows :
(i) Choose an appropriate degree of optimism (or pessimism) of the decision maker. Let α (0 ≤ α ≤ 1) be his degree of optimism (so 1 – α is his degree of pessimism).
(ii) Determine the maximum as well as the minimum pay-off for each alternative and obtain the quantity D (decision index) as
D = α × maximum pay-off + (1 – α) × minimum pay-off
for each alternative.
(iii) Choose the alternative with the maximum value of D when the pay-offs are profits. When the pay-offs are costs, α is applied to the minimum cost instead, i.e. D = α × minimum cost + (1 – α) × maximum cost, and the alternative with the minimum value of D is chosen.
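The five criteria above can be sketched as small functions over a pay-off table of profits. This is an illustrative Python sketch, not a standard library: the table is assumed to be a dictionary mapping each act to its list of pay-offs (one per state of nature, in the same order), ties are broken in favour of the act listed first, and cost tables would need the minimizing versions described in the text.

```python
def maximax(table):
    """Criterion of optimism: act with the largest maximum pay-off."""
    return max(table, key=lambda act: max(table[act]))


def maximin(table):
    """Criterion of pessimism: act with the largest minimum pay-off."""
    return max(table, key=lambda act: min(table[act]))


def laplace(table):
    """Equally likely states: act with the largest average pay-off."""
    return max(table, key=lambda act: sum(table[act]) / len(table[act]))


def minimax_regret(table):
    """Savage criterion: act whose largest regret is smallest."""
    acts = list(table)
    n_states = len(table[acts[0]])
    # Best pay-off under each state of nature
    best = [max(table[a][j] for a in acts) for j in range(n_states)]
    worst_regret = {a: max(best[j] - table[a][j] for j in range(n_states)) for a in acts}
    return min(acts, key=lambda a: worst_regret[a])


def hurwicz(table, alpha):
    """Decision index D = alpha * max + (1 - alpha) * min; choose the largest D."""
    return max(table, key=lambda act: alpha * max(table[act]) + (1 - alpha) * min(table[act]))


# Pay-offs of Example 1 below (states S1 to S4)
table = {"A1": [16, 10, 12, 7], "A2": [13, 12, 9, 9], "A3": [11, 14, 15, 14]}
print(maximin(table), maximax(table), minimax_regret(table), laplace(table))
# A3 A1 A3 A3
```

With the matrix of Example 4 below and α = 0.75, `hurwicz` selects A4, and with the shampoo and crop tables of Examples 2 and 3 the functions return the same choices as the worked solutions.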

SOLVED EXAMPLES
Example 1. Given the following profit pay-off table :

Strategy    Pay-off (in ₹) under states of nature
            S1     S2     S3     S4
A1          16     10     12     7
A2          13     12     9      9
A3          11     14     15     14

Which strategy should the decision maker choose on the basis of


(i) Maximin criterion (ii) Maximax criterion
(iii) Minimax regret criterion (iv) Laplace criterion ?

Solution.
(i) Maximin criterion :

Strategy    States of nature              Minimum for
            S1     S2     S3     S4       each strategy
A1          16     10     12     7        7
A2          13     12     9      9        9
A3          11     14     15     14       11 (Max.)

Using the maximin criterion, the maximum of these minimum pay-offs is 11, corresponding to strategy A3. So A3 should be selected.
(ii) Maximax criterion :

Strategy    States of nature              Maximum for
            S1     S2     S3     S4       each strategy
A1          16     10     12     7        16 (Max.)
A2          13     12     9      9        13
A3          11     14     15     14       15

Using the maximax criterion, the maximum of these maximum pay-offs is 16, corresponding to strategy A1. So A1 should be selected.
(iii) Minimax regret criterion :
Regret table

Strategy    States of nature                                      Maximum
            S1           S2           S3           S4             regret
A1          16–16 = 0    14–10 = 4    15–12 = 3    14–7 = 7       7
A2          16–13 = 3    14–12 = 2    15–9 = 6     14–9 = 5       6
A3          16–11 = 5    14–14 = 0    15–15 = 0    14–14 = 0      5 (Min.)

Using the minimax regret criterion, the minimum of these maximum regrets is 5, corresponding to strategy A3. So A3 should be selected.
(iv) Laplace criterion :
Here, p = 1/4
EMV (Strategy A1) $= \frac{1}{4} \times 16 + \frac{1}{4} \times 10 + \frac{1}{4} \times 12 + \frac{1}{4} \times 7 = \frac{1}{4}(16 + 10 + 12 + 7) = \frac{45}{4} = 11.25$
EMV (Strategy A2) $= \frac{1}{4}(13 + 12 + 9 + 9) = \frac{43}{4} = 10.75$
EMV (Strategy A3) $= \frac{1}{4}(11 + 14 + 15 + 14) = \frac{54}{4} = 13.5$
Since the EMV is maximum for strategy A3, A3 should be selected.

Example 2. The research department of Hindustan Lever has recommended that the marketing department launch a shampoo of one of three different types. The marketing manager has to decide which type of shampoo to launch, given the following estimated pay-offs for various levels of sales :
Types of shampoo    Estimated levels of sales (units)
                    15,000    10,000    5,000
Egg shampoo         30        10        10
Clinic shampoo      40        15        5
Delux shampoo       55        20        3

What should the marketing manager's decision be using the
(i) Maximin criterion (ii) Minimax criterion
(iii) Maximax criterion (iv) Laplace criterion
(v) Minimax regret criterion ?
Solution. (i) Maximin criterion :

Types of shampoo    Estimated levels of sales (units)    Minimum
                    15,000    10,000    5,000            of each decision
Egg                 30        10        10               10 (Max.)
Clinic              40        15        5                5
Delux               55        20        3                3

Using the maximin criterion, the maximum of these minimum pay-offs is 10, corresponding to Egg shampoo. So Egg shampoo should be launched.
(ii) Minimax criterion :

Types of shampoo    Estimated levels of sales (units)    Maximum
                    15,000    10,000    5,000            of each decision
Egg                 30        10        10               30 (Min.)
Clinic              40        15        5                40
Delux               55        20        3                55

Using the minimax criterion, the minimum of these maximum pay-offs is 30, corresponding to Egg shampoo. So Egg shampoo should be launched.
(iii) Maximax criterion :

Types of shampoo    Estimated levels of sales (units)    Maximum
                    15,000    10,000    5,000            of each decision
Egg                 30        10        10               30
Clinic              40        15        5                40
Delux               55        20        3                55 (Max.)

Using the maximax criterion, the maximum of these maximum pay-offs is 55, corresponding to Delux shampoo. So Delux shampoo should be launched.

(iv) Laplace criterion : Here, p = 1/3
EMV (Egg shampoo) $= \frac{1}{3} \times 30 + \frac{1}{3} \times 10 + \frac{1}{3} \times 10 = \frac{1}{3}(30 + 10 + 10) = \frac{50}{3} = 16.67$
EMV (Clinic shampoo) $= \frac{1}{3}(40 + 15 + 5) = \frac{60}{3} = 20$
EMV (Delux shampoo) $= \frac{1}{3}(55 + 20 + 3) = \frac{78}{3} = 26$
Since the EMV is maximum for Delux shampoo, Delux shampoo should be launched.
(v) Minimax regret criterion :

Regret table

Types of shampoo    Estimated levels of sales (units)           Maximum regret
                    15,000        10,000        5,000
Egg                 55–30 = 25    20–10 = 10    10–10 = 0       25
Clinic              55–40 = 15    20–15 = 5     10–5 = 5        15
Delux               55–55 = 0     20–20 = 0     10–3 = 7        7 (Min.)

Using the minimax regret criterion, the minimum of these maximum regrets is 7, corresponding to Delux shampoo. So Delux shampoo should be launched.
Example 3. A farmer wants to decide which of three crops he should plant on his 100-acre farm. The profit from each crop depends upon the rainfall during the growing season, which could be high, medium or low. The estimated profit of the farmer for each of the crops is shown in the table :

Rainfall    Estimated Conditional Profit
            Crop A    Crop B    Crop C
High        6000      3000      7000
Medium      4000      4500      4000
Low         2000      5000      5000

The farmer decides to plant only one crop. Which would be his best crop using the following criteria :
(i) Maximax criterion (ii) Maximin criterion
(iii) Laplace criterion (iv) Minimax regret criterion.

Solution. (i) Maximax criterion :

Type of crop    Estimated Conditional Profit (Rainfall)    Maximum
                High      Medium    Low                    of each crop
Crop A          6000      4000      2000                   6000
Crop B          3000      4500      5000                   5000
Crop C          7000      4000      5000                   7000 (Max.)

Using the maximax criterion, the maximum of these maximum profits is 7000, corresponding to crop C. So crop C is the best crop.
(ii) Maximin criterion :

Type of crop    Estimated Conditional Profit (Rainfall)    Minimum
                High      Medium    Low                    of each crop
Crop A          6000      4000      2000                   2000
Crop B          3000      4500      5000                   3000
Crop C          7000      4000      5000                   4000 (Max.)

Using the maximin criterion, the maximum of these minimum profits is 4000, corresponding to crop C. So crop C is the best crop.
(iii) Laplace criterion : Here, p = 1/3
EMV (Crop A) $= \frac{1}{3} \times 6000 + \frac{1}{3} \times 4000 + \frac{1}{3} \times 2000 = \frac{1}{3}(6000 + 4000 + 2000) = \frac{12000}{3} = 4000$
EMV (Crop B) $= \frac{1}{3}(3000 + 4500 + 5000) = \frac{12500}{3} = 4166.67$
EMV (Crop C) $= \frac{1}{3}(7000 + 4000 + 5000) = \frac{16000}{3} = 5333.33$
Since the EMV is maximum for crop C, crop C is the best crop.
(iv) Minimax regret criterion :

Regret table

Type of crop    Rainfall                                                       Maximum
                High                High               Low                     regret

Example 4. Consider the following pay-off (profit) matrix :

Alternative    Events
               E1     E2     E3     E4
A1             5      10     18     25
A2             8      7      8      23
A3             21     18     12     21
A4             30     22     19     15

Find the optimum alternative using the Hurwicz criterion with α = 0.75.
Solution. Here, α = 0.75, so (1 – α) = 1 – 0.75 = 0.25

Alternative    Maximum pay-off    Minimum pay-off    D = α × (i) + (1 – α) × (ii)
               (i)                (ii)
A1             25                 5                  25 × 0.75 + 5 × 0.25 = 20
A2             23                 7                  23 × 0.75 + 7 × 0.25 = 19
A3             21                 12                 21 × 0.75 + 12 × 0.25 = 18.75
A4             30                 15                 30 × 0.75 + 15 × 0.25 = 26.25

According to the Hurwicz criterion, the maximum value of D is 26.25, corresponding to A4. So A4 is the optimum alternative.

3.5. DECISION MAKING UNDER RISK

Different criteria for making a decision in such an environment are as follows :
1. Expected Monetary Value (EMV) Criterion. The expected monetary value
criterion seeks the maximization of expected profit or the minimization of expected
cost. The basic steps of this criterion are as follows :
(i) Construct the pay-off table listing all possible courses of action and events (states of nature), along with the corresponding event probabilities.
(ii) Determine the expected conditional profit values for each course of action.
(iii) Determine the EMV for each course of action (strategy) by
EMV (Aj) = p1j P(E1) + p2j P(E2) + ... + pmj P(Em)   (j = 1, 2, ..., n)
(iv) Choose that course of action (strategy) having the highest EMV.
2. Expected Opportunity Loss (EOL) Criterion. An alternative approach
to maximizing expected monetary value (EMV) is to minimize expected opportunity
loss (EOL). The basic steps of this criterion are as follows :
(i) Construct the opportunity loss table listing all possible courses of actions
and events (states of nature), along with the corresponding event probabilities.
(ii) Determine the conditional opportunity loss values for each event.
(iii) Determine the expected conditional opportunity loss values and sum these
values to get the expected opportunity loss (EOL) for each course of action by
EOL (Aj) = (M1 – p1j) P(E1) + (M2 – p2j) P(E2) + ... + (Mm – pmj) P (Em)
(j = 1, 2, ..., n)
(iv) Choose that course of action (strategy) having lowest EOL.

3. Expected Value of Perfect Information (EVPI). The expected value with perfect information is the expected or average return, in the long run, if we have perfect information before a decision is made. The EVPI may be defined as the maximum amount the decision maker should be willing to spend to get perfect (additional) information. The expected pay-off under perfect information (EPPI) is calculated by multiplying the pay-off of the best outcome for each state of nature by the probability of that state of nature and summing these products.
The expected value of perfect information (EVPI) is the expected outcome with
perfect information minus the expected outcome without perfect information (maximum
EMV).
EVPI = EPPI – max. EMV
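EMV, EOL and EVPI can be computed together from one pay-off table. A minimal Python sketch, assuming the pay-offs are profits stored as a dictionary from each act to its list of pay-offs, one per state of nature in the same order as the probability list; the function names are illustrative. It also illustrates a standard identity: since EOL(A) = EPPI − EMV(A) for every act A, the smallest EOL equals the EVPI. The data are those of Example 2 below.

```python
def emv(table, probs):
    """EMV of each act: sum of pay-off x probability over the states of nature."""
    return {act: sum(p * x for p, x in zip(probs, row)) for act, row in table.items()}


def eol(table, probs):
    """EOL of each act: sum of (best pay-off for the state - act's pay-off) x probability."""
    acts = list(table)
    best = [max(table[a][i] for a in acts) for i in range(len(probs))]
    return {a: sum(p * (b - x) for p, b, x in zip(probs, best, table[a])) for a in acts}


def evpi(table, probs):
    """EVPI = EPPI - max EMV, where EPPI uses the best pay-off under each state."""
    acts = list(table)
    eppi = sum(p * max(table[a][i] for a in acts) for i, p in enumerate(probs))
    return eppi - max(emv(table, probs).values())


# Example 2 below: acts X, Y, Z; states P, Q, R with probabilities 0.3, 0.5, 0.2
table = {"X": [-120, 200, 260], "Y": [-80, 400, -260], "Z": [100, -300, 600]}
probs = [0.3, 0.5, 0.2]
print({a: round(v, 2) for a, v in emv(table, probs).items()})
print(round(evpi(table, probs), 2), round(min(eol(table, probs).values()), 2))
```

For these data the EMVs are 116, 124 and 0, act Y has the highest EMV, and both the EVPI and the smallest EOL equal 226.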

SOLVED EXAMPLES
Example 1. A management is faced with the problem of choosing one of three
products for manufacturing. The potential demand for each product may turn out to be
good, moderate or poor. The probabilities for each of the states of nature were estimated
as follows :

Product   Nature of demand
          Good    Moderate    Poor
X         0.70    0.20        0.10
Y         0.50    0.30        0.20
Z         0.40    0.50        0.10

The estimated profit or loss (in ₹) under the three states may be taken as :

Product   Good       Moderate   Poor
X         300,000    200,000    100,000
Y         600,000    300,000    200,000
Z         400,000    100,000    –150,000 (loss)

Prepare the expected value table and advise the management about the choice of
the product.
Solution.

Nature of   Expected pay-off (in ₹ lakhs) for various acts
demand      X                          Y                          Z
            Pay-off  Prob.  Product    Pay-off  Prob.  Product    Pay-off  Prob.  Product
Good        3        0.7    2.1        6        0.5    3.0        4        0.4    1.6
Moderate    2        0.2    0.4        3        0.3    0.9        1        0.5    0.5
Poor        1        0.1    0.1        2        0.2    0.4        –1.5     0.1    –0.15
EMV = Σ (pay-off × prob.)   2.6                        4.3                        1.95

Since the EMV is maximum for product Y (₹4.3 lakhs), Y should be selected as the best
product.
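In this example each product has its own demand distribution, so the probability column changes from act to act. A short Python check of the table, with the pay-offs in ₹ lakhs and a dictionary layout chosen purely for illustration:

```python
# Each product maps to (pay-offs under Good, Moderate, Poor demand; its own probabilities).
products = {
    "X": ([3.0, 2.0, 1.0], [0.70, 0.20, 0.10]),
    "Y": ([6.0, 3.0, 2.0], [0.50, 0.30, 0.20]),
    "Z": ([4.0, 1.0, -1.5], [0.40, 0.50, 0.10]),
}


def emv(payoffs, probs):
    """Expected monetary value: sum of pay-off x probability over the demand states."""
    return sum(x * p for x, p in zip(payoffs, probs))


emvs = {name: emv(x, p) for name, (x, p) in products.items()}
best = max(emvs, key=emvs.get)
for name, value in emvs.items():
    print(name, round(value, 2))
print("Best product:", best)  # Best product: Y
```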

Example 2. Pay-offs of three acts X, Y, Z and the states of nature P, Q, R are as
follows :

State of nature   Pay-offs (₹)
                  Acts
                  X       Y       Z
P                 –120    –80     100
Q                 200     400     –300
R                 260     –260    600

The probabilities of the states of nature are 0.3, 0.5 and 0.2 respectively. Tabulate
the expected monetary values (EMVs) for the above data and state which can be selected
as the best act.
Solution.

State of   Pay-offs (₹) (Acts)
nature     X                          Y                          Z
           Pay-off  Prob.  Product    Pay-off  Prob.  Product    Pay-off  Prob.  Product
P          –120     0.3    –36        –80      0.3    –24        100      0.3    30
Q          200      0.5    100        400      0.5    200        –300     0.5    –150
R          260      0.2    52         –260     0.2    –52        600      0.2    120
EMV = Σ (pay-off × prob.)  116                        124                        0

Since the EMV is maximum for act Y, Y should be selected as the best act.
Example 3. A newspaper distributor assigns probabilities to the demand for a
magazine as follows :

Copies demanded 1 2 3 4

Probability 0.4 0.3 0.2 0.1

A copy of the magazine sells for ₹7 and costs ₹6. What can be the maximum possible
expected monetary value (EMV) if the distributor can return unsold copies for ₹5 each?
Solution. Cost of a magazine = ₹6
Selling price of a magazine = ₹7
Profit per magazine = ₹(7 – 6) = ₹1
Loss on each unsold magazine = ₹(6 – 5) = ₹1
$$\text{Conditional profit} = \begin{cases} 1 \times S = S, & \text{if } D \ge S,\\ 1 \times D - 1 \times (S - D) = 2D - S, & \text{if } D < S, \end{cases}$$
where D = no. of magazines demanded
S = no. of magazines in stock

The resulting pay-offs and corresponding expected pay-offs are as follows :

Event      Probability   Conditional pay-off (₹)          Expected pay-off (₹)
(Demand)                 Act (Stock)                       Act (Stock)
D          (i)           1      2       3      4          1          2           3          4
                         (ii)   (iii)   (iv)   (v)        (i)×(ii)   (i)×(iii)   (i)×(iv)   (i)×(v)
1          0.4           1      0       –1     –2         0.4        0           –0.4       –0.8
2          0.3           1      2       1      0          0.3        0.6         0.3        0
3          0.2           1      2       3      2          0.2        0.4         0.6        0.4
4          0.1           1      2       3      4          0.1        0.2         0.3        0.4
EMV                                                       1.0        1.2         0.8        0

Since the EMV is maximum for act (stock) 2, the optimum act for the distributor is to
stock 2 copies of the magazine, giving the maximum possible EMV of ₹1.20.
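The conditional-profit rule above can be coded directly and evaluated for every stock level. A sketch under the example's figures (price ₹7, cost ₹6, refund ₹5 per unsold copy); the constant and function names are hypothetical.

```python
PRICE, COST, REFUND = 7, 6, 5   # figures taken from the example


def conditional_profit(demand, stock):
    """Profit of 1 per copy sold minus a loss of 1 per copy returned."""
    sold = min(demand, stock)
    unsold = stock - sold
    return (PRICE - COST) * sold - (COST - REFUND) * unsold


demand_probs = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

# EMV of each candidate stock level (the same values 1..4 as the demand).
emv_by_stock = {
    s: sum(p * conditional_profit(d, s) for d, p in demand_probs.items())
    for s in demand_probs
}
best_stock = max(emv_by_stock, key=emv_by_stock.get)
print({s: round(v, 2) for s, v in emv_by_stock.items()})  # {1: 1.0, 2: 1.2, 3: 0.8, 4: 0.0}
print(best_stock)  # 2
```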
Example 4. The following pay-off table is given :

Acts   Events
       E1     E2     E3      E4
A1     40     200    –200    100
A2     200    0      200     0
A3     0      100    0       150
A4     –50    400    100     0
Suppose that the probabilities of the events are :
P(E1) = 0.20, P(E2) = 0.15, P(E3) = 0.40 and P(E4) = 0.25. Calculate the expected
pay-off and the expected loss of each action. Find the optimum act using the EMV and EOL
criteria.
Solution. Computation of expected pay-offs

Event   Probability   Conditional pay-off (₹)         Expected pay-off (₹)
                      Act                              Act
        (i)           A1     A2     A3     A4         A1         A2          A3         A4
                      (ii)   (iii)  (iv)   (v)        (i)×(ii)   (i)×(iii)   (i)×(iv)   (i)×(v)
E1      0.20          40     200    0      –50        8          40          0          –10
E2      0.15          200    0      100    400        30         0           15         60
E3      0.40          –200   200    0      100        –80        80          0          40
E4      0.25          100    0      150    0          25         0           37.5       0
EMV                                                   –17        120         52.5       90

Since the EMV is maximum for act A2, A2 is the optimum act.

Computation of expected loss

Event   Probability   Opportunity loss (₹)                                     Expected loss (₹)
                      Act                                                      Act
        (i)           A1             A2           A3            A4            A1         A2          A3         A4
                      (ii)           (iii)        (iv)          (v)           (i)×(ii)   (i)×(iii)   (i)×(iv)   (i)×(v)
E1      0.20          200–40=160     200–200=0    200–0=200     200+50=250    32         0           40         50
E2      0.15          400–200=200    400–0=400    400–100=300   400–400=0     30         60          45         0
E3      0.40          200+200=400    200–200=0    200–0=200     200–100=100   160        0           80         40
E4      0.25          150–100=50     150–0=150    150–150=0     150–0=150     12.5       37.5        0          37.5
EOL                                                                           234.5      97.5        165        127.5

Since the EOL is minimum for act A2, A2 is the optimum act.
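For every act, the EMV and the EOL computed above add up to the same total, the expected pay-off under perfect information (EPPI = ₹217.5 here), which is why both criteria select A2. A small check of that identity on the Example 4 data, offered as an illustration rather than as part of the original solution:

```python
# Example 4 data: act -> pay-offs (in rupees) under E1, E2, E3, E4.
payoffs = {
    "A1": [40, 200, -200, 100],
    "A2": [200, 0, 200, 0],
    "A3": [0, 100, 0, 150],
    "A4": [-50, 400, 100, 0],
}
probs = [0.20, 0.15, 0.40, 0.25]

# M_k: best pay-off under each event; EPPI weights these by P(E_k).
best_per_event = [max(column) for column in zip(*payoffs.values())]
eppi = sum(m * p for m, p in zip(best_per_event, probs))

results = {}
for act, row in payoffs.items():
    emv = sum(x * p for x, p in zip(row, probs))
    eol = sum((m - x) * p for m, x, p in zip(best_per_event, row, probs))
    results[act] = (emv, eol)
    print(act, round(emv, 2), round(eol, 2), round(emv + eol, 2))
print("EPPI =", round(eppi, 2))  # EPPI = 217.5
```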
Example 5. A grocery with a bakery department is faced with the problem of
how many cakes to buy in order to meet the day’s demand. The grocer prefers not to sell
day-old goods in competition with fresh products ; leftover cakes are, therefore, a complete
loss. On the other hand, if a customer desires a cake and all of them have been sold, the
disappointed customer will buy elsewhere and the sale will be lost. The grocer has,
therefore, collected information on past sales over a selected 100-day period as follows:

Sales per day No. of days Probability

25 10 0.10
26 30 0.30
27 50 0.50
28 10 0.10

A cake costs ₹80 and sells for ₹100. Construct the pay-off table and the opportunity
loss table. What is the optimum number of cakes that should be bought each day ?
Solution. Let A1, A2, A3, A4 denote the alternative strategies (acts) of stocking 25, 26, 27
and 28 cakes, and E1, E2, E3, E4 the states of nature (events) of a daily demand of 25, 26, 27
and 28 cakes.
Here, cost of a cake = ₹80
Selling price of a cake = ₹100
Profit per cake sold = ₹(100 – 80) = ₹20
Loss on each unsold cake = ₹80
$$\text{Conditional pay-off} = \begin{cases} 20S, & \text{if } D \ge S,\\ 20D - 80(S - D) = 100D - 80S, & \text{if } D < S, \end{cases}$$
where D = no. of cakes demanded
S = no. of cakes in stock

The resulting pay-offs (conditional profits) are as follows :

Event      Probability   Conditional pay-off (₹)
(Demand)                 Act (Stock)
D                        A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25    0.10          500       420       340       260
E2 : 26    0.30          500       520       440       360
E3 : 27    0.50          500       520       540       460
E4 : 28    0.10          500       520       540       560

The expected conditional pay-offs are computed as follows :

Event      Probability   Expected conditional pay-off (₹)
                         A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25    0.10          50        42        34        26
E2 : 26    0.30          150       156       132       108
E3 : 27    0.50          250       260       270       230
E4 : 28    0.10          50        52        54        56
EMV                      500       510       490       420

Since the EMV is maximum for act A2, 26 cakes should be bought (stocked)
each day.
The conditional opportunity losses are computed as follows :

Event      Probability   Conditional opportunity loss (₹)
(Demand)                 Act (Stock)
                         A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25    0.10          0         80        160       240
E2 : 26    0.30          20        0         80        160
E3 : 27    0.50          40        20        0         80
E4 : 28    0.10          60        40        20        0

The expected conditional opportunity losses are computed as follows :

Event      Probability   Expected conditional opportunity loss (₹)
                         A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25    0.10          0         8         16        24
E2 : 26    0.30          6         0         24        48
E3 : 27    0.50          20        10        0         40
E4 : 28    0.10          6         4         2         0
EOL                      32        22        42        112

Since the EOL is minimum for act A2, 26 cakes should be bought (stocked)
each day.
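The same answer can be reached without building the full tables by marginal (incremental) analysis, a standard shortcut that the worked solution does not use: stock one more cake as long as the probability of selling it, P(D ≥ S), is at least ML/(MP + ML), where MP = ₹20 is the profit on a cake sold and ML = ₹80 the loss on a cake left over. A sketch with exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

MP = 20   # profit on a cake that is sold (100 - 80)
ML = 80   # loss on a cake left unsold
demand_probs = {25: Fraction(1, 10), 26: Fraction(3, 10),
                27: Fraction(5, 10), 28: Fraction(1, 10)}

critical = Fraction(ML, MP + ML)   # stock the S-th cake only if P(D >= S) >= 4/5


def prob_demand_at_least(s):
    """P(D >= s) from the observed demand distribution."""
    return sum(p for d, p in demand_probs.items() if d >= s)


# P(D >= S) never increases with S, so the largest qualifying S is the optimum.
optimal_stock = max(s for s in demand_probs if prob_demand_at_least(s) >= critical)
print(critical, optimal_stock)  # 4/5 26
```

Here P(D ≥ 26) = 0.9 clears the 0.8 threshold but P(D ≥ 27) = 0.6 does not, which agrees with the EMV and EOL tables.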

Example 6. A wholesaler of sports goods has an opportunity to buy 5000 pairs of
gloves that have been declared surplus by the government. The wholesaler will pay ₹50
per pair and can obtain ₹100 a pair by selling gloves to retailers. The price is well
established, but the wholesaler is in doubt as to just how many pairs he will be able to sell.
Any gloves left over he can sell to discount outlets at ₹20 a pair. After a careful
consideration of the past data, the wholesaler assigns probabilities to the demand as
follows :
Retailer’s demand Probability
1000 pairs 0.6
3000 pairs 0.3
5000 pairs 0.1
(i) Compute the conditional monetary and expected monetary values.
(ii) Compute the expected profit with a perfect predicting device.
(iii) Compute the EVPI.
Solution. Cost per pair = ₹50
Selling price per pair = ₹100
Profit per pair = ₹50 (on each pair sold)
Disposal selling price = ₹20 (on each unsold pair)
Loss on each unsold pair = ₹(50 – 20) = ₹30
$$\text{Conditional pay-off (profit)} = \begin{cases} 50S, & \text{if } D \ge S,\\ 50D - 30(S - D) = 80D - 30S, & \text{if } D < S, \end{cases}$$
where D = no. of pairs demanded
S = no. of pairs stocked
(i) The resulting conditional pay-offs and corresponding expected pay-offs are
computed as follows :

Retailer's    Probability   Conditional pay-offs (₹'000)   Expected pay-offs (₹'000)
demand D                    Pairs stocked                  Pairs stocked
                            1000     3000     5000         1000     3000     5000
1000 pairs    0.6           50       –10      –70          30       –6       –42
3000 pairs    0.3           50       150      90           15       45       27
5000 pairs    0.1           50       150      250          5        15       25
EMV                                                        50       54       10

(ii) The expected profit under perfect information (EPPI) is computed as follows:

Retailer's    Probability   Conditional pay-offs (₹'000)   Under perfect information (₹'000)
demand D                    Pairs stocked
              (i)           1000     3000     5000         Maximum pay-off (v)            Expected pay-off
                            (ii)     (iii)    (iv)         [from (ii), (iii) and (iv)]    (i) × (v)
1000 pairs    0.6           50       –10      –70          50                             30
3000 pairs    0.3           50       150      90           150                            45
5000 pairs    0.1           50       150      250          250                            25
                                                                                 EPPI =   100

(iii) EVPI = EPPI – max EMV = 100 – 54 = 46 (₹'000)
Thus, EVPI = ₹46,000.
Example 7. Pay-offs of three acts A1, A2 and A3 and the states of nature X, Y, Z are as
follows :

State of nature   Pay-off (₹)
                  Acts
                  A1      A2      A3
X                 –20     –50     200
Y                 200     –100    –50
Z                 400     600     300

The probabilities of the states of nature are 0.3, 0.4 and 0.3 respectively. Calculate
the EMV for the given data and select the best act. Also find the expected value of
perfect information (EVPI).
Solution. Computation of expected pay-off

State of    Probability   Pay-off (₹)              Expected pay-off (₹)
nature                    Acts                     Acts
                          A1     A2      A3        A1     A2     A3
X           0.3           –20    –50     200       –6     –15    60
Y           0.4           200    –100    –50       80     –40    –20
Z           0.3           400    600     300       120    180    90
EMV                                                194    125    130

Since the EMV is maximum for act A1, A1 is the best act.

The expected profit under perfect information (EPPI) is computed as follows :

State of    Probability   Pay-off (₹)             Under perfect information (₹)
nature                    Acts
                          A1     A2      A3       Maximum pay-off   Expected pay-off
X           0.3           –20    –50     200      200               200 × 0.3 = 60
Y           0.4           200    –100    –50      200               200 × 0.4 = 80
Z           0.3           400    600     300      600               600 × 0.3 = 180
                                                                    EPPI = 320

EVPI = EPPI – max EMV = 320 – 194 = 126
Thus, EVPI = ₹126.

3.6. DECISION TREE

A decision tree is a graphical representation of the various alternatives and the
sequence of events in a decision problem. It is a simple method of decision making in which
all the options open to the decision maker are displayed in concise form.
Decision trees are useful in simple as well as complex decision-making situations.
They are mainly drawn for problems in which more than one decision has to be taken
and the decision taken at one stage affects the subsequent decisions.
The basic rules for drawing a decision tree are as follows.
(i) Identify all decisions to be made and the order in which they must be made.
(ii) Identify the chance events that can occur after each decision.
(iii) Develop a tree diagram showing the sequence of decisions and chance events.
The tree is constructed starting from the left and moving towards the right. A
square box denotes a decision point at which the available strategies are
considered. A circle represents a chance node (event), from which the various states
of nature or outcomes emanate.
(iv) Estimate the probability of occurrence of each outcome.
(v) Obtain estimates of the consequences of all possible outcomes or actions.
(vi) Calculate the expected value of all possible actions.
(vii) Select the action offering the most attractive expected value.
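The roll-back in steps (vi) and (vii), averaging at circles and maximising at squares from right to left, can be sketched as a short recursive function. The tuple-based node encoding is a hypothetical choice for illustration, and the usage reproduces the cell-phone problem of Example 1 below (pay-offs in ₹ thousand).

```python
# Hypothetical node encoding:
#   a number                      -> terminal pay-off
#   ("chance", [(prob, child)])   -> circle: expected value of its branches
#   ("decision", {act: child})    -> square: best act and its value


def roll_back(node):
    """Return (value, chosen_act); chosen_act is None except at decision nodes."""
    if isinstance(node, (int, float)):
        return node, None
    kind, branches = node
    if kind == "chance":
        return sum(p * roll_back(child)[0] for p, child in branches), None
    if kind == "decision":
        values = {act: roll_back(child)[0] for act, child in branches.items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node type: {kind!r}")


def outcomes(high, medium, low):
    """Chance node with the High/Medium/Low probabilities of Example 1."""
    return ("chance", [(0.25, high), (0.40, medium), (0.35, low)])


tree = ("decision", {
    "Manufacture": outcomes(200, 50, -10),
    "Royalty": outcomes(60, 40, 20),
    "Sell rights": outcomes(50, 50, 50),
})
value, act = roll_back(tree)
print(act, round(value, 2))  # Manufacture 66.5
```

Because decision nodes may sit below chance nodes, the same function also handles multi-stage trees such as the one in Example 5.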

SOLVED EXAMPLES
Example 1. XYZ Ltd. has invented a picture cell phone. It is faced with selecting
one alternative out of the following strategies :
(i) Manufacture the cell phone
(ii) Take royalty from another manufacturer
(iii) Sell the rights for the invention and take a lump-sum amount.

Profit (in thousands of rupees) that can be earned and the probability associated
with each alternative are shown in the following table:

Event     Probability   Manufacture   Royalty   Sell rights
High      0.25          200           60        50
Medium    0.40          50            40        50
Low       0.35          –10           20        50

Represent the company’s problem in the form of the decision tree and suggest
what decision the company should take to maximize profit.
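Solution.

Decision tree (monetary values in ₹ thousand; probabilities in brackets):

[D] --+-- Manufacture --> (chance)  High 200 (0.25) | Medium 50 (0.40) | Low –10 (0.35)   EMV = ₹66,500
      +-- Royalty -----> (chance)  High 60 (0.25)  | Medium 40 (0.40) | Low 20 (0.35)    EMV = ₹38,000
      +-- Sell rights -> (chance)  High 50 (0.25)  | Medium 50 (0.40) | Low 50 (0.35)    EMV = ₹50,000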

Thus, the EMV for strategy (i), manufacturing the cell phone, is maximum. So the best
decision for XYZ Ltd. is to manufacture the picture cell phone itself, for an expected
profit of ₹66,500.
Example 2. A company is evaluating four alternative single-period investment
opportunities whose returns are based on the state of the economy. The possible states of
the economy and the associated probability distribution are as follows :

State Fair Good Great

Probability 0.2 0.5 0.3

The returns for each investment opportunity and each state of the economy are as
follows :

Alternative   State of Economy
              Fair (₹)    Good (₹)    Great (₹)
A             1,000       3,000       6,000
B             500         4,500       6,800
C             0           5,000       8,000
D             –4,000      6,000       8,500

Using the decision tree approach, determine the expected return for each
alternative. Which alternative investment proposal would you recommend if the expected
monetary value criterion is to be employed ?
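Solution.

Decision tree (monetary values in ₹; probabilities in brackets):

[D] --+-- A --> (chance)  Fair 1,000 (0.2)  | Good 3,000 (0.5) | Great 6,000 (0.3)   EMV = ₹3,500
      +-- B --> (chance)  Fair 500 (0.2)    | Good 4,500 (0.5) | Great 6,800 (0.3)   EMV = ₹4,390
      +-- C --> (chance)  Fair 0 (0.2)      | Good 5,000 (0.5) | Great 8,000 (0.3)   EMV = ₹4,900
      +-- D --> (chance)  Fair –4,000 (0.2) | Good 6,000 (0.5) | Great 8,500 (0.3)   EMV = ₹4,750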

Thus, the EMV for C is maximum, so alternative C is the best, with a maximum expected
return of ₹4,900.
Example 3. A manager has a choice between (i) a risky contract promising ₹7
lakhs with probability 0.6 and ₹4 lakhs with probability 0.4, and (ii) a diversified portfolio
consisting of two contracts with independent outcomes, each promising ₹3.5 lakhs with
probability 0.6 and ₹2 lakhs with probability 0.4. Construct a decision tree and
suggest which choice the manager should opt for using the EMV criterion.
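Solution.

Decision tree (monetary values in ₹ lakhs; probabilities in brackets):

[D] --+-- (i) Risky contract --> node 1:  7 (0.6) | 4 (0.4)                    EMV = ₹5.8 lakhs
      +-- (ii) Diversified portfolio --> node 2 (both contracts are held)
               +-- Contract 1 --> node 3:  3.5 (0.6) | 2 (0.4)             EMV = ₹2.9 lakhs
               +-- Contract 2 --> node 4:  3.5 (0.6) | 2 (0.4)             EMV = ₹2.9 lakhs

EMV at node 1 = (7 × 0.6) + (4 × 0.4) = ₹5.8 lakhs.
Since the portfolio holds both contracts, its pay-off is the sum of their pay-offs, so
EMV at node 2 = 2.9 + 2.9 = ₹5.8 lakhs.

Thus, both choices have the same EMV of ₹5.8 lakhs, and the EMV criterion alone
cannot separate them. The portfolio's total pay-off is 7, 5.5 or 4 lakhs with probabilities
0.36, 0.48 and 0.16, which is less dispersed about the common mean than the risky
contract's 7 or 4 lakhs with probabilities 0.6 and 0.4; a manager who wishes to reduce
risk should therefore opt for choice (ii).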

Example 4. A shopkeeper has the facility to store a large number of perishable
items. He buys them at a rate of ₹3 per item and sells at the rate of ₹5 per item. If an
item is not sold at the end of the day, then there is a loss of ₹3 per item. The daily
demand of the item has the following probability distribution :
Number of items sold 4 5 6

Probability 0.2 0.5 0.3

How many items should he store so that his daily expected profit is maximum ?
Use decision tree approach.
Solution. Profit per item sold = ₹(5 – 3) = ₹2; loss per unsold item = ₹3

Decision tree (profit in ₹; probabilities in brackets):

[D] --+-- Stock 4 items --> (chance)  Demand 4: 8 (0.2) | Demand 5: 8 (0.5)  | Demand 6: 8 (0.3)    EMV = ₹8
      +-- Stock 5 items --> (chance)  Demand 4: 5 (0.2) | Demand 5: 10 (0.5) | Demand 6: 10 (0.3)   EMV = ₹9
      +-- Stock 6 items --> (chance)  Demand 4: 2 (0.2) | Demand 5: 7 (0.5)  | Demand 6: 12 (0.3)   EMV = ₹7.5

Thus, the EMV is maximum (₹9) for the second strategy, so the shopkeeper should stock 5 items.
Example 5. Matrix company is planning to launch a new product, which can be
introduced initially in Western India or in the entire country. If the product is introduced
only in Western India, the investment outlay will be ₹12 million. After two years, Matrix
can evaluate the project to determine whether it should cover the entire country. For
such expansion it will have to incur an additional investment of ₹10 million. To introduce
the product in the entire country right in the beginning would involve an outlay of ₹20
million. The product, in any case, will have a life of 5 years, after which the plant will
have zero net value.
If the product is introduced only in Western India, demand would be high or low
with probabilities of 0.8 and 0.2 respectively, with annual cash inflows of ₹4 million
and ₹2.5 million respectively.
If the product is introduced in the entire country right in the beginning, the demand
would be high or low with probabilities of 0.6 and 0.4 respectively, with annual cash
inflows of ₹8 million and ₹5 million respectively.
Based on the observed demand in Western India, if the product is introduced in
the entire country, the following probabilities would exist for high and low demand on an
all-India basis :

Western India     Entire Country
                  High demand   Low demand
High demand       0.9           0.1
Low demand        0.4           0.6

The hurdle rate applicable to this project is 12 percent.
(i) Set up a decision tree for the investment situation.
(ii) Advise Matrix company on the investment policy it should follow.
Support your advice with appropriate reasoning.
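Solution.

Decision tree (₹ million; probabilities in brackets):

[D1] --+-- Western India (outlay 12) --> node 1                                        EMV = 7.60
       |      +-- High demand (0.8) --> [D3] --+-- Expansion --> node 3: High demand (0.9) | Low demand (0.1)
       |      |                                +-- No expansion
       |      +-- Low demand (0.2) ---> [D2] --+-- Expansion --> node 4: High demand (0.4) | Low demand (0.6)
       |                                       +-- No expansion
       +-- Entire country (outlay 20) --> node 2: High demand (0.6) | Low demand (0.4)  EMV = 14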

Decision point           Outcome        Probability   Conditional value   Expected value
                                                      (₹ million)         (₹ million)

D3 (demand high in Western India)
(i) Expansion            High demand    0.9           8                   7.2
                         Low demand     0.1           5                   0.5
                                                                          7.7
                         7.7 × 3 years  = 23.1
                         Less cost      = 10.0
                         Total          = 13.1
(ii) No expansion                                                         0
Total expected profit = 13.1

D2 (demand low in Western India)
(i) Expansion            High demand    0.4           8                   3.2
                         Low demand     0.6           5                   3.0
                                                                          6.2
                         6.2 × 3 years  = 18.6
                         Less cost      = 10.0
                         Total          = 8.6
(ii) No expansion                                                         0
Total expected profit = 8.6

D1
(i) Introduction in      High demand    0.6           8                   4.8
    entire country       Low demand     0.4           5                   2.0
                                                                          6.8
                         6.8 × 5 years  = 34.0
                         Less cost      = 20.0
                         Total expected profit = 14.0
(ii) Introduction in     High demand    0.8           4                   0.8 × (4 × 2 + 13.1) = 16.88
     Western India       Low demand     0.2           2.5                 0.2 × (2.5 × 2 + 8.6) = 2.72
                                                                          = 19.60
                         Less cost      = 12.00
                         Total expected profit = 7.60

Thus, the EMV at node 2 (₹14 million) is greater than the EMV at node 1 (₹7.6 million), so
the company should launch the product in the entire country right at the beginning.
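The roll-back above can be reproduced in a few lines. The sketch deliberately follows the worked solution's own assumptions: cash flows are not discounted at the 12 percent hurdle rate, and the "no expansion" branches are valued at 0, as in the table. All names are illustrative.

```python
def expected(pairs):
    """Expected value of a chance node from (probability, value) pairs."""
    return sum(p * v for p, v in pairs)


# Decision points D3 and D2, reached after two years in Western India:
# expand (3 remaining years, extra outlay 10) or do not expand (valued at 0 here).
expand_after_high = expected([(0.9, 8), (0.1, 5)]) * 3 - 10   # about 13.1
expand_after_low = expected([(0.4, 8), (0.6, 5)]) * 3 - 10    # about 8.6
d3 = max(expand_after_high, 0.0)
d2 = max(expand_after_low, 0.0)

# Decision point D1: two years of Western India cash flow plus the best later choice,
# less the initial outlay, compared with launching in the entire country at once.
western = expected([(0.8, 4 * 2 + d3), (0.2, 2.5 * 2 + d2)]) - 12   # about 7.6
entire = expected([(0.6, 8), (0.4, 5)]) * 5 - 20                   # 14.0
choice = "Entire country" if entire > western else "Western India"
print(round(western, 2), round(entire, 2), choice)  # 7.6 14.0 Entire country
```

Discounting each year's cash flow at 12 percent would change the figures; the sketch does not attempt that, since the text's solution does not.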

EXERCISE 3.1
1. The ABC company is faced with four decision alternatives relating to investments in a
capital expansion programme. Since these investments are made in the future, the company
foresees different market conditions as expressed in the form of states of nature. The
following table summarizes the decision alternatives, the various states of nature and
the rate of return associated with each state of nature :

Decision   States of nature
           Q1     Q2     Q3
D1         17%    15%    8%
D2         18%    16%    9%
D3         21%    14%    9%
D4         19%    12%    10%

If the company has no information regarding the probability of occurrence of the three
states of nature, give the recommended decision for each of the following decision criteria :
(i) Maximax criterion (ii) Maximin criterion
(iii) Minimax regret criterion (iv) Laplace criterion

2. A food products company is contemplating the introduction of a revolutionary new product
with new packaging to replace the existing product at a much higher price (A1), or a moderate
change in the composition of the existing product with new packaging at a small increase
in price (A2), or a small change in the composition of the existing product except the word ‘New’
with a negligible increase in price (A3). The three possible states of nature or events are
(i) a high increase in sales (S1), (ii) no change in sales (S2) and (iii) a decrease in sales (S3).
The marketing department of the company worked out the pay-offs in terms of yearly
net profits for each of the strategies under the three events (expected sales). These are shown
in the following table :

Strategy   States of nature
           Pay-offs (in ₹)
           S1          S2          S3
A1         7,00,000    3,00,000    1,50,000
A2         5,00,000    4,50,000    0
A3         3,00,000    3,00,000    3,00,000

Which strategy should the concerned executive choose on the basis of
(i) Maximin criterion (ii) Maximax criterion
(iii) Minimax regret criterion (iv) Laplace criterion ?
3. A person wants to invest in one of three alternative investment plans : stock, bonds or a
savings account. It is assumed that the person wishes to invest all of the funds in one
plan. The conditional pay-offs of the investments are based on three potential economic
conditions : accelerated, normal or slow growth. The pay-off matrix is given as :

Alternative   Economic conditions
investment    Pay-off (₹)
              Accelerated growth   Normal growth   Slow growth
Stocks        10,000               6,500           –4,000
Bonds         8,000                6,000           1,000
Savings       5,000                5,000           5,000

Determine the best investment plan using each of the following :
(i) Maximin criterion (ii) Maximax criterion
(iii) Laplace criterion (iv) Hurwicz criterion with α = 0.6
4. Consider the following cost matrix :

Alternatives   States of nature
               S1    S2    S3    S4
A1             1     3     8     5
A2             2     5     4     7
A3             4     6     6     3
A4             6     8     3     5

Determine the best alternative using :


(i) Minimax criterion
(ii) Minimin criterion
(iii) Minimax regret criterion.

5. Pay-offs (in ₹) of three acts A1, A2 and A3 and the possible states of nature S1, S2 and S3
are as follows :

State of nature   Pay-offs (₹)
                  Act
                  A1      A2      A3
S1                –20     –50     200
S2                200     –100    –50
S3                400     600     300

The probabilities of the states of nature are 0.3, 0.4 and 0.3 respectively. Tabulate the
expected monetary values (EMVs) for the above data and state which can be selected as
the best act.
6. An investor is given the following investment alternatives and percentage rates of return:

Strategy          States of nature (Market conditions)
                  Low      Medium    High
Regular shares    7%       10%       15%
Risky shares      –10%     12%       25%
Property          –12%     18%       30%

Over the past 300 days, 150 days have been medium market conditions and 60 days
have been high market conditions.
On the basis of these data, state the optimum investment strategy for the investor.
7. Pay-offs (₹) of three acts A1, A2, A3 and the states of nature S1, S2 and S3 are as follows :

States of nature   Pay-offs (₹)
                   Act
                   A1     A2     A3
S1                 25     –10    –125
S2                 400    440    400
S3                 650    740    750

The probabilities of the states of nature are 0.1, 0.7 and 0.2 respectively. Tabulate the
expected monetary values (EMVs) and state which can be selected as the best act.
8. XYZ flower shop promises its customers delivery within four hours on all flower orders.
All flowers are purchased on the prior day and delivered to XYZ by 8:00 the next morning.
XYZ’s daily demand for roses is as follows :

Dozens roses 7 8 9 10

Probability 0.1 0.2 0.4 0.3

XYZ purchases roses for ₹10.00 per dozen and sells them for ₹30.00. All unsold roses
are donated to a local hospital. How many dozens of roses should XYZ order each evening
to maximize its profit ? What is the optimum expected profit ?
9. A producer of boats has estimated the following distribution of demand for a particular
kind of boat :

No. demanded 0 1 2 3 4 5 6

Probability 0.14 0.27 0.27 0.18 0.09 0.04 0.01

Each boat costs him ₹7,000 and he sells them for ₹10,000 each. Any boats that are left
unsold at the end of the season must be disposed of for ₹6,000 each. How many boats
should be kept in stock so as to maximize his expected profit ?
10. Consider the following pay-off table.

Acts   Events
       E1    E2    E3    E4
A1     18    10    12    8
A2     16    12    10    10
A3     12    13    11    12

The probabilities of events E1, E2, E3 and E4 are 0.25, 0.40, 0.15 and 0.20 respectively.
Find the optimum act using expected opportunity loss (EOL) criterion.
11. A man has the choice of running either a hot-snack stall or an ice-cream stall at a seaside
resort during the summer season. If it is a fairly cool summer, he should make ₹5,000 by
running the hot-snack stall, but if the summer is quite hot he can only expect to make
₹1,000. On the other hand, if he operates the ice-cream stall, his profit is estimated at
₹6,500 if the summer is hot, but only ₹1,000 if it is cool. There is a 40% chance of the
summer being hot. Should he opt for running the hot-snack stall or the ice-cream stall ?
Give mathematical argument.
12. The cost of making an item is ₹25 and the selling price of the item is ₹30 if it is sold within
a week; it can be disposed of at ₹20 per piece at the end of the week if unsold. The frequency
of weekly sales is given as :

Weekly sales    (≤ 3)   4    5    6    7    (≥ 8)
No. of weeks    0       10   20   40   30   0

Find the optimum number of items per week the industry should make using the EMV and
EOL criteria. Also find the EVPI.
13. A company wants to know whether or not a new shaving cream should be marketed. The
present value of all future profits for the success of the cream is ₹10,00,000 and its
failure would result in a net loss of ₹5,00,000.
Not marketing it would not change the profits. The chances of the success of the new
cream are 50%. Determine the optimum act and find the EVPI.
14. A modern home appliances dealer finds that the cost of holding a mini-cooking range in
stock for a month is ₹200 (insurance, minor deterioration, interest on borrowed capital,
etc.). Customers who cannot obtain a cooking range immediately tend to go to other
dealers, and he estimates that for every customer who cannot get immediate delivery he
loses an average of ₹500. The probabilities of a demand of 0, 1, 2, 3, 4 and 5 cooking
ranges in a month are 0.05, 0.10, 0.20, 0.30, 0.20 and 0.15 respectively. Determine the
optimum stock level of cooking ranges. Also find the EVPI.
15. A manufacturer of leather goods must decide whether to expand his plant capacity now
or wait at least another year. His advisors tell him that if he expands now and economic
conditions remain good, there will be a profit of ₹1,64,000 during the next year. If he
expands now and there is a recession, there will be a loss of ₹40,000. If he waits at least
another year and economic conditions remain good, there will be a profit of ₹80,000, and
if he waits at least another year and there is a recession, there will be a small profit of
₹8,000. What should the manufacturer decide to do if he wants to minimize the expected
loss during the next year and he feels that the odds are 2 : 1 that there will be a recession? Use
the decision tree approach.
16. XYZ Ltd. wants to update/change its existing manufacturing process for product A. It
wants to strengthen its research and development cell and conduct research for finding
a better process of manufacturing, which can get them higher profits. At present the
company is earning a profit of ₹20,000 after paying for material, labour and overheads.
XYZ Ltd. has the following four alternatives :
(i) The company continues with the existing process.
(ii) The company conducts research P, which costs ₹20,000, has a 75% probability of success
and can get a profit of ₹5,000.
(iii) The company conducts research Q, which costs ₹10,000, has a 50% probability of success
and can get a profit of ₹25,000.
(iv) The company pays ₹10,000 as royalty for a new product and can get a profit of ₹20,000.
The company can carry out only one of the two types of research, P and Q, because
of certain limitations. Draw a decision tree diagram and find the best strategy for
XYZ Ltd.
17. The investment staff of a bank is considering four investment proposals for clients: shares,
bonds, real estate and saving certificates. These investments will be held for one year.
The past data regarding the four proposals are given as follows :
Shares. There is a 25% chance that shares will decline by 10%, a 30% chance that they will
remain stable and a 45% chance that they will increase in value by 15%. Also, the shares
under consideration do not pay any dividends.
Bonds. These bonds stand a 40% chance of an increase in value by 5% and a 60% chance of
remaining stable, and they yield 12%.
Real Estate. This proposal has a 20% chance of increasing 30% in value, a 25% chance
of increasing 20% in value, a 40% chance of increasing 10% in value, a 10% chance of
remaining stable and a 5% chance of losing 5% of its value.
Saving Certificates. These certificates will yield 8.5% with certainty.
Use a decision tree to structure the alternatives available to the investment staff and,
using the expected monetary value criterion, choose the alternative with the highest
expected value.
18. A manufacturing company has to select one of two products, A or B, for manufacturing.
Product A requires an investment of ₹20,000 and product B ₹40,000. A market research
survey shows high, medium and low demands, with the corresponding probabilities and returns
from sales (in ₹ thousand) for the two products given in the following table :

Market    Probability       Return from sales
          A       B         A       B
High      0.4     0.3       50      80
Medium    0.3     0.5       30      60
Low       0.3     0.2       10      50

Construct an appropriate decision tree. What decision should the company take ?

Answers
1. (i) D3 (ii) D4 (iii) D3 (iv) D3     2. (i) A3 (ii) A1 (iii) A1 (iv) A1
3. (i) Savings (ii) Stocks (iii) Bonds or Savings (iv) Bonds
4. (i) A3 (ii) A1 (iii) A3     5. A1     6. Property     7. A2
8. 9 dozen, ₹168     9. 3 boats     10. A2
11. Hot-snack stall     12. 6 items, ₹3.50     13. Market cream, ₹2,50,000
14. 4 cooking ranges, ₹315     15. Wait for one year
16. Conduct research P to find a new process     17. Invest in real estate     18. Product B

4. SAMPLING AND SAMPLING DISTRIBUTIONS

STRUCTURE

Sampling
Types of Sampling
Use of Random Numbers
Parameter and Statistic
Sampling Distribution of Mean
Sampling Distribution of Sample Variance
Sampling Distribution of Sample Proportion
Estimation
Point Estimation
Interval Estimation
Bayesian Estimation

4.1. SAMPLING

Sampling means the selection of a part of the aggregate with a view to drawing
some statistical information about the whole. The aggregate under investigation is
called the population and the selected part is called a sample. A population is finite or
infinite according to its size, i.e., the number of its members.
The main objective of sampling is to obtain the maximum information about the
population. The sample is analysed to obtain an idea of the probability
distribution of the variable in the population.
Even when a proper sampling procedure is applied, a sample may not represent
the characteristics of the population exactly. This discrepancy is called sampling
error.

4.2. TYPES OF SAMPLING

There are different sampling methods. We describe below some important types
of sampling.
(a) Simple random sampling. In this type of sampling every unit of the
population has an equal chance of being selected in a sample. There are two ways of
drawing a simple random sample: With Replacement (WR) and Without Replacement
(WOR).
In the WR type, the drawn unit is returned to the population
so that the size of the population remains the same before each drawing. In the WOR type,
the drawn unit is not returned to the population, so for a finite population
the size diminishes as the sampling process continues.
(b) Systematic sampling. In systematic sampling one unit is chosen at random
from the population and the subsequent items are selected regularly at predetermined
intervals. This method compares well with simple random sampling, provided there is no
deliberate attempt to change the sequence of the units in the population.
(c) Cluster sampling. When the population consists of certain groups (clusters)
of units, it may be advantageous and economical to select a few clusters of units and
then examine all the units in the selected clusters. For example, when goods
are packed in cartons and repacking is costly, it is advisable to select only a few cartons
and inspect all the goods inside them.
(d) Two-stage sampling. When the population consists of a large number of
groups, each consisting of a number of items, it may not be economical to select a few
groups and inspect all the items in them. In this case, the sample is selected in
two stages. In the first stage, a desired number of groups (primary units) are selected
at random, and in the second stage, the required number of items are chosen at random
from the selected primary units.
(e) Stratified sampling. Here the population is subdivided into several parts,
called strata showing the heterogenity of the items is not so prominent and then a sub
sample is selected from each of the strata. All the sub-samples combined together give
the stratified sample. This sampling is useful when the population is heterogeneous.
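The five schemes (a) to (e) can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a survey-sampling library: the population of 100 numbered units, the two strata, the 20 "cartons" of 5 items and all function names are hypothetical, and the systematic scheme assumes the population size is a multiple of the sample size.

```python
import random

def srs_with_replacement(population, n, rng):
    # WR: every draw is made from the full population, so a unit may repeat.
    return rng.choices(population, k=n)

def srs_without_replacement(population, n, rng):
    # WOR: a drawn unit is not returned, so each unit appears at most once.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Sampling interval k = N // n; random start among the first k units,
    # then every k-th unit thereafter.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def cluster_sample(clusters, m, rng):
    # Choose m whole clusters at random and keep every unit inside them.
    chosen = rng.sample(clusters, m)
    return [unit for cluster in chosen for unit in cluster]

def two_stage_sample(groups, m, per_group, rng):
    # Stage 1: choose m primary units; stage 2: SRS (WOR) of items within each.
    chosen = rng.sample(groups, m)
    return [unit for group in chosen for unit in rng.sample(group, per_group)]

def stratified_sample(strata, sizes, rng):
    # Independent SRS (WOR) from each stratum; the sub-samples are pooled.
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, sizes[name]))
    return sample

rng = random.Random(1)                                      # fixed seed: reproducible
population = list(range(1, 101))                            # hypothetical units 1..100
cartons = [population[i:i + 5] for i in range(0, 100, 5)]  # 20 hypothetical cartons
strata = {"low": population[:50], "high": population[50:]}  # hypothetical strata

print(srs_with_replacement(population, 10, rng))
print(srs_without_replacement(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(cluster_sample(cartons, 2, rng))
print(two_stage_sample(cartons, 4, 2, rng))
print(stratified_sample(strata, {"low": 4, "high": 6}, rng))
```

Passing an explicit `random.Random` object keeps each run reproducible without touching the module-level generator.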

4.3. USE OF RANDOM NUMBERS

Random numbers are a sequence of digits that appear in a perfectly random order,
so that every number in a table of random numbers has the same probability of
selection. There are various methods to generate random numbers, and tables of random
numbers are also available. We briefly illustrate their use. Consider the following
two-digit random numbers:
23, 04, 82, 07, 14, 66, 54, 10, 72 and 32.
Suppose we have the marks of 100 students in a subject and we want to draw a
sample of marks of size 10. To do this, number the students from 00 to 99. Using the
above random numbers, first select the marks of the student numbered 23, since the
first random number is 23. Next select the student numbered 04, since the next random
number is 04. Repeating this process, we obtain a sample of marks of size 10.
By taking another set of 10 random numbers, we can construct another
sample of marks of size 10, and so on.
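The table-lookup procedure above can be mimicked in a few lines. In this sketch the marks list is invented for illustration, and the rule of skipping repeated or out-of-range numbers is an assumption the text does not spell out; the ten random numbers are the ones quoted above.

```python
import random

# The ten two-digit random numbers quoted in the text.
random_numbers = [23, 4, 82, 7, 14, 66, 54, 10, 72, 32]

# Hypothetical marks of 100 students; list index = student number 00..99.
marks = [35 + (17 * i) % 60 for i in range(100)]

def select_by_random_numbers(values, numbers, size):
    # Take the unit whose serial number equals each random number in turn.
    # Repeats and numbers outside the numbering range are skipped (an assumption),
    # until `size` distinct units have been chosen.
    chosen, sample = set(), []
    for r in numbers:
        if r < len(values) and r not in chosen:
            chosen.add(r)
            sample.append(values[r])
        if len(sample) == size:
            break
    return sample

first_sample = select_by_random_numbers(marks, random_numbers, 10)
print(first_sample)

# Another set of random numbers (generated here rather than read from a table)
# gives another sample of size 10.
second_sample = select_by_random_numbers(marks, random.sample(range(100), 10), 10)
print(second_sample)
```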

4.4. PARAMETER AND STATISTIC

Any statistical measure relating to the population which is based on all units of
the population is called a parameter, e.g., the population mean ($\mu$), the population
S.D. ($\sigma$), the moments $\mu_r$, $\mu'_r$, etc.
Any statistical measure relating to the sample which is based on all units of the
sample is called a statistic, e.g., the sample mean ($\bar{x}$), the sample variance, the
moments $m_r$, $m'_r$, etc. The value of a statistic varies from sample to sample; this
variation is called 'sampling fluctuation'. A parameter has no such fluctuation: it is a
constant.
The probability distribution of a statistic is called its 'sampling distribution'. The
standard deviation (S.D.) of the sampling distribution is called the 'standard error' of
the statistic.
Example 1. For a population of five units, the values of a characteristic x are
given below:
8, 2, 6, 4 and 10.
Consider all possible samples of size 2 (drawn without replacement) from the above
population and show that the mean of the sample means is exactly equal to the
population mean.
Solution. The population mean is $\mu = \dfrac{30}{5} = 6$.
Random samples of size two (Without Replacement)

Serial  Sample   Sample  |  Serial  Sample   Sample
no.     values   mean    |  no.     values   mean
1       8, 2     5       |  6       2, 4     3
2       8, 6     7       |  7       2, 10    6
3       8, 4     6       |  8       6, 4     5
4       8, 10    9       |  9       6, 10    8
5       2, 6     4       |  10      4, 10    7
Total            31      |  Total            29

$\therefore$ Mean of sample means $= \dfrac{31 + 29}{10} = \dfrac{60}{10} = 6$, which is equal to the
population mean.
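The enumeration in Example 1 can be checked mechanically. This short sketch lists every sample of size 2 drawn without replacement with `itertools.combinations` and averages the sample means; it reproduces the hand computation rather than adding anything new.

```python
from itertools import combinations
from statistics import mean

population = [8, 2, 6, 4, 10]
mu = mean(population)                        # population mean

# All C(5, 2) = 10 samples of size 2 drawn without replacement.
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

print(len(samples), mu, mean(sample_means))  # both means equal 6
```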

4.5. SAMPLING DISTRIBUTION OF MEAN

Case I: σ Known
Consider a population having mean $\mu$ and variance $\sigma^2$. If a random sample of
size n is taken from this population, then the sample mean $\bar{X}$ is a random variable
whose distribution has mean $\mu$.
If the population is infinite, the variance of this distribution is $\sigma^2/n$ and the
standard error is
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}.$$
If the population is finite, of size N, then, provided the sample is drawn without
replacement, the variance of this distribution is
$$\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$
and the standard error is
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
The factor $\sqrt{\dfrac{N-n}{N-1}}$ is called the finite population correction factor.
Let us consider the standardized sample mean
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
Then we have the central limit theorem:
If $\bar{X}$ is the mean of a sample of size n taken from a population whose mean is $\mu$
and variance is $\sigma^2$, then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to N(0, 1) \quad \text{as } n \to \infty.$$
If the samples come from a normal population, the sampling distribution of the mean
is normal regardless of the sample size.
If the population is not normal, the sampling distribution of the mean is
approximately normal when the sample size is large (say, n ≥ 25).
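A quick simulation illustrates the central limit theorem for a non-normal population. The sketch assumes an exponential population with rate 1 (mean 1, S.D. 1), a sample size of 30 and 20,000 repetitions, all chosen only for illustration; if Z is close to N(0, 1), roughly 95% of the standardized means should fall in [−1.96, 1.96] and their average should be near 0.

```python
import math
import random

rng = random.Random(2024)   # fixed seed so the run is reproducible
mu, sigma = 1.0, 1.0        # exponential population with rate 1: mean 1, S.D. 1
n, reps = 30, 20000         # sample size and number of simulated samples

def standardized_mean(sample):
    # Z = (xbar - mu) / (sigma / sqrt(n))
    xbar = sum(sample) / len(sample)
    return (xbar - mu) / (sigma / math.sqrt(len(sample)))

zs = [standardized_mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

# For an exact N(0, 1) variable, 95% of values fall in [-1.96, 1.96].
coverage = sum(abs(z) <= 1.96 for z in zs) / reps
mean_z = sum(zs) / reps
print(round(coverage, 3), round(mean_z, 3))
```

Because the exponential population is skewed, the coverage comes out close to, but not exactly, 0.95; increasing n moves it closer.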
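Example 2. A random sample of size 100 is taken from an infinite population
having mean $\mu = 66$ and variance $\sigma^2 = 225$. What is the probability of getting an
$\bar{x}$ between 64 and 68?
Solution. Let $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, with n = 100, $\mu = 66$, $\sigma = 15$, so that $\sigma/\sqrt{n} = 1.5$.
Required probability $= P[64 < \bar{x} < 68]$
$= P[-1.33 < Z < 1.33]$
$= 2\,\Phi(1.33) = 2(0.4082)$
$= 0.8164$,
where $\Phi(z)$ denotes the area under the standard normal curve between 0 and z.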
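Example 2 can be reproduced with `statistics.NormalDist` from the standard library, treating the sampling distribution of the mean as $N(\mu, \sigma^2/n)$. The table solution rounds z to 1.33 and gets 0.8164; using the exact z = 4/3 gives a slightly larger value, about 0.8176.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 66, 15, 100
se = sigma / sqrt(n)                 # S.E. of the mean for an infinite population: 1.5

xbar_dist = NormalDist(mu, se)       # sampling distribution of the mean, N(66, 1.5^2)
prob = xbar_dist.cdf(68) - xbar_dist.cdf(64)
print(round(prob, 4))
```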
Example 3. A random sample of size 5 is drawn without replacement from a
finite population consisting of 35 units. If the population standard deviation is 2.25,
what is the standard error of the sample mean?
Solution. Here n = 5, N = 35, $\sigma = 2.25$.
$$\text{S.E. of sample mean} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{2.25}{\sqrt{5}}\sqrt{\frac{30}{34}} \approx 0.95.$$
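The standard-error formulas of Case I fit in one small helper. The function name `standard_error_mean` is hypothetical; it applies the finite population correction only when a population size N is supplied, which corresponds to sampling without replacement.

```python
from math import sqrt

def standard_error_mean(sigma, n, N=None):
    """S.E. of the sample mean; multiplies by the finite population correction
    sqrt((N - n) / (N - 1)) when a population size N is given (sampling WOR)."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

print(round(standard_error_mean(2.25, 5, N=35), 4))  # Example 3, about 0.9452
print(standard_error_mean(15, 100))                   # infinite population: 1.5
```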
Case II: σ Unknown
For small samples, if the population is assumed to be normal, the sampling
distribution of $\bar{X}$ can still be obtained, but $\sigma$ is replaced by the sample standard
deviation S. Then
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}, \quad \text{where } S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2,$$
is a random variable having the t distribution with $\nu = n - 1$ degrees of freedom.
4.6. SAMPLING DISTRIBUTION OF SAMPLE VARIANCE

Like the sample mean, the sample variance calculated for each sample drawn from a
population is also a random variable. We have the following result:
If a random sample of size n with sample variance $S^2$ is taken from a normal population
having variance $\sigma^2$, then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}, \quad \text{where } S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2,$$
is a random variable having the chi-square distribution with $\nu = n - 1$ degrees of
freedom.
(In the chi-square distribution table, $\chi^2_\alpha$ denotes the value for which the area under
the chi-square curve to its right is equal to $\alpha$.)
If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes $n_1$ and $n_2$
respectively, taken from two normal populations having the same variance, then
$$F = \frac{S_1^2}{S_2^2}$$
is a random variable having the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
degrees of freedom.
Example 4. If two independent random samples of sizes $n_1 = 9$ and $n_2 = 16$ are
taken from a normal population, what is the probability that the variance of the first
sample will be at least four times as large as that of the second sample?
Solution. Here $\nu_1 = 9 - 1 = 8$, $\nu_2 = 16 - 1 = 15$, and we require
$P[S_1^2 \ge 4S_2^2] = P[F \ge 4]$.
From the F distribution table we find that
$F_{0.01} = 4.00$ for $\nu_1 = 8$ and $\nu_2 = 15$.
Thus, the desired probability is 0.01.
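Since the standard library has no F-distribution function, Example 4 can be checked by simulation instead. This sketch assumes both samples come from the same normal population (mean 0 and S.D. 1 are arbitrary choices) and counts how often $S_1^2 \ge 4S_2^2$; the estimate should land near the tabulated 0.01.

```python
import random

rng = random.Random(7)          # fixed seed for reproducibility
n1, n2, reps = 9, 16, 40000

def sample_variance(xs):
    # S^2 with divisor (n - 1), as in the F ratio above
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

hits = 0
for _ in range(reps):
    # Both samples drawn from the same normal population (mean 0, S.D. 1 assumed)
    s1 = sample_variance([rng.gauss(0, 1) for _ in range(n1)])
    s2 = sample_variance([rng.gauss(0, 1) for _ in range(n2)])
    if s1 >= 4 * s2:
        hits += 1

print(hits / reps)   # Monte Carlo estimate of P[F >= 4] with (8, 15) d.f.
```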

4.7. SAMPLING DISTRIBUTION OF SAMPLE PROPORTION

Consider a lot with proportion of defectives P. If a random sample of size n with
proportion of defectives p is drawn from this population, then, provided n is
sufficiently large, the sampling distribution of p is approximately normal with mean P
and
$$\text{S.D.} = \text{S.E. of sample proportion} = \sqrt{\frac{PQ}{n}}, \quad \text{where } Q = 1 - P.$$
If the random sample is drawn without replacement from a finite population of size N,
the S.D. formula must be multiplied by the correction factor $\sqrt{\dfrac{N-n}{N-1}}$.
If $p_1$ and $p_2$ denote the proportions from independent samples of sizes $n_1$ and $n_2$,
drawn from two populations with proportions $P_1$ and $P_2$ respectively, then
$$\text{S.E. of } (p_1 - p_2) = \sqrt{\frac{P_1Q_1}{n_1} + \frac{P_2Q_2}{n_2}},$$
where $P_1 + Q_1 = 1$ and $P_2 + Q_2 = 1$.
Example 5. It has been found that 3% of the tools produced by a certain machine
are defective. What is the probability that in a shipment of 450 such tools, 2% or more
will be defective?
Solution. Since the sample size n = 450 is large, the sample proportion p is
approximately normally distributed with mean P = 3% = 0.03 and
$$\text{S.D.} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{(0.03)(0.97)}{450}} \approx 0.008.$$
Hence $z = \dfrac{0.02 - 0.03}{0.008} = -1.25$, and
$\therefore$ Required probability $= P[p \ge 0.02]$
$= P[Z \ge -1.25] = 0.5 + \Phi(1.25)$
$= 0.5 + 0.3944 = 0.8944$,
where $\Phi(1.25) = 0.3944$ is the area under the standard normal curve between 0 and 1.25.
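The normal approximation for a sample proportion can be evaluated directly with `statistics.NormalDist`. The text rounds the S.E. to 0.008; keeping it unrounded (about 0.00804) gives z ≈ −1.24 and a probability of about 0.893 instead of 0.8944, a difference due only to rounding.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.03, 450
se = sqrt(P * (1 - P) / n)           # S.E. of the sample proportion, about 0.00804

# Normal approximation to the sampling distribution of p: N(P, se^2)
prob = 1 - NormalDist(P, se).cdf(0.02)
print(round(se, 5), round(prob, 4))
```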

EXERCISE 4.1
1. A population consists of 5 numbers (2, 3, 6, 8, 11). Consider all possible samples of size
two which can be drawn with replacement from this population. Calculate the S.E. of
sample means.
2. When we sample from an infinite population, what happens to the standard error of the
mean if the sample size is (a) increased from 30 to 270, (b) decreased from 256 to 16?
3. A random sample of size 400 is taken from an infinite population having mean $\mu = 86$
and variance $\sigma^2 = 625$. What is the probability that $\bar{X}$ will be greater than 90?
4. The number of letters that a department receives each day can be modeled by a
distribution having mean 25 and standard deviation 4. For a random sample of 30 days,
what will be the probability that the sample mean will be less than 26?
5. A random sample of 400 mangoes was taken from a large consignment and 30 were
found to be bad. Find the S.E. of the proportion of bad ones in a sample of this size.
6. A sample is drawn from a large population of men whose S.D. is 5, and the
standard error is found to be 0.5. What is the sample size?
7. A population of 20 elements has mean 9 and S.D. 3. A sample of 5 elements
is taken without replacement. Find the mean and S.D. of the sampling distribution of
the mean. What will be the S.D. for samples of size 10?
8. Of the total output of a machine producing components for transistor sets, 6 percent
are defective. A random sample of 5 components is taken for examination from (i) a very
large lot of produce, (ii) a box of 10 components. Find the mean and S.D. of the average
number of defectives found among the 5 components taken for examination.
9. A population consists of five numbers 2, 3, 6, 8, 11. Consider all possible samples of size
two which can be drawn without replacement from the population. Find
(a) The mean of the population
(b) Standard deviation of the population
(c) The mean of the sampling distribution of means
(d) The standard deviation of the sampling distribution of means.

Answers
1. 2.32    2. (a) It is divided by 3, (b) It is multiplied by 4
3. 0.0007    4. 0.9147
5. 0.013    6. 100
7. For samples of 5 elements: mean of the sampling distribution = 8, S.D. = $\sqrt{27/19}$
For samples of 10 elements: mean of the sampling distribution = 8, S.D. = $3/\sqrt{19}$
8. Mean = 0.06, S.D. = 0.106
9. (a) 6, (b) 3.29, (c) 6, (d) 2.01.

4.8. ESTIMATION

When we deal with a population, the parameters are unknown most of the time, so
we cannot draw conclusions about the population directly. To learn about the unknown
parameters, we draw a sample from the population and gather information about each
parameter through a function of the sample values whose value is reasonably close to it.
The value so obtained is called an estimate of the parameter, the process is called
estimation, and the estimating function is called an estimator.
A good estimator should satisfy the following four properties, which we explain briefly below.
(a) Unbiasedness. A statistic t is said to be an unbiased estimator of a parameter
$\theta$ if $E[t] = \theta$.
Otherwise it is said to be 'biased'.
Theorem 1. Prove that the sample mean $\bar{x}$ is an unbiased estimator of the
population mean $\mu$.
Proof. Let $x_1, x_2, \ldots, x_n$ be a simple random sample drawn with replacement from a
finite population $X_1, X_2, \ldots, X_N$ of size N.
Here, $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$ and $\mu = (X_1 + X_2 + \cdots + X_N)/N$.
We have to prove that $E(\bar{x}) = \mu$.
When $x_i$ is drawn, it can be any one of the population members, each with
probability 1/N; i.e., the probability distribution of $x_i$ (for i = 1, 2, ..., n) is:

$x_i$:          $X_1$    $X_2$    ...    $X_N$
Probability:    1/N      1/N      ...    1/N

Therefore,
$$E(x_i) = X_1\cdot\frac{1}{N} + X_2\cdot\frac{1}{N} + \cdots + X_N\cdot\frac{1}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \mu, \quad i = 1, 2, \ldots, n,$$
and
$$E(\bar{x}) = E\left[\frac{x_1 + x_2 + \cdots + x_n}{n}\right] = \frac{E(x_1) + E(x_2) + \cdots + E(x_n)}{n} = \frac{\mu + \mu + \cdots + \mu}{n} = \frac{n\mu}{n} = \mu.$$
The same result also holds for an infinite population and for sampling without
replacement.
Theorem 2. The sample variance
$$s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$
is a biased estimator of the population variance $\sigma^2$.
Proof. Let $x_1, x_2, \ldots, x_n$ be a random sample from an infinite population with
mean $\mu$ and variance $\sigma^2$.
Then $E(x_i) = \mu$ and $\mathrm{Var}(x_i) = E(x_i - \mu)^2 = \sigma^2$, for i = 1, 2, ..., n.
$$s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2,$$
where $y_i = x_i - \mu$ (so $\bar{y} = \bar{x} - \mu$), since the S.D. is unaffected by a change of origin.
Hence
$$s^2 = \frac{1}{n}\sum (x_i - \mu)^2 - (\bar{x} - \mu)^2,$$
$$\therefore\ E(s^2) = \frac{1}{n}\sum E(x_i - \mu)^2 - E(\bar{x} - \mu)^2 = \frac{1}{n}\cdot n\sigma^2 - \mathrm{Var}(\bar{x}) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2 \ne \sigma^2.$$
$\therefore$ $s^2$ is a biased estimator of $\sigma^2$.
Note. Let $S^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$; then
$$E(S^2) = \frac{n}{n-1}\,E(s^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2.$$
Thus, $S^2$ is an unbiased estimator of $\sigma^2$.
Example 1. A population consists of the 4 values 3, 7, 11, 15. Draw all possible samples
of size two with replacement. Verify that the sample mean is an unbiased estimator of
the population mean.
Solution. No. of samples = $4^2 = 16$, which are listed below:
(3, 3), (7, 3), (11, 3), (15, 3)
(3, 7), (7, 7), (11, 7), (15, 7)
(3, 11), (7, 11), (11, 11), (15, 11)
(3, 15), (7, 15), (11, 15), (15, 15)
Population mean, $\mu = \dfrac{3 + 7 + 11 + 15}{4} = \dfrac{36}{4} = 9$.
Sampling distribution of the sample mean:

Sample mean ($\bar{x}$)   Frequency $f(\bar{x})$   $\bar{x}\cdot f(\bar{x})$
3                         1                        3
5                         2                        10
7                         3                        21
9                         4                        36
11                        3                        33
13                        2                        26
15                        1                        15
Total                     16                       144

Mean of sample means $= \dfrac{144}{16} = 9$.
Since $E(\bar{x}) = \mu$, the sample mean is an unbiased estimator of the population mean.
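Exact enumeration with `fractions.Fraction` confirms both Example 1 and Theorem 2 on the same population. Over all 16 samples of size 2 drawn with replacement, the average of $\bar{x}$ equals $\mu = 9$, the average of the divisor-n variance $s^2$ equals $\frac{n-1}{n}\sigma^2 = 10$, and the average of the divisor-(n − 1) variance $S^2$ equals $\sigma^2 = 20$, where $\sigma^2$ is the population variance computed with divisor N.

```python
from itertools import product
from fractions import Fraction

population = [3, 7, 11, 15]
N = len(population)
mu = Fraction(sum(population), N)                        # population mean, 9
sigma2 = sum((x - mu) ** 2 for x in population) / N      # population variance, 20

n = 2
samples = list(product(population, repeat=n))            # 4^2 = 16 samples WR

def xbar(s):
    return Fraction(sum(s), len(s))

def s2_biased(s):        # divisor n (Theorem 2)
    m = xbar(s)
    return sum((x - m) ** 2 for x in s) / len(s)

def s2_unbiased(s):      # divisor n - 1 (the Note)
    m = xbar(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)

E_xbar = sum(xbar(s) for s in samples) / len(samples)
E_s2 = sum(s2_biased(s) for s in samples) / len(samples)
E_S2 = sum(s2_unbiased(s) for s in samples) / len(samples)
print(E_xbar, E_s2, E_S2, sigma2)   # 9 10 20 20
```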
(b) Consistency. A statistic $t_n$ obtained from a random sample of size n is said
to be a consistent estimator of a parameter $\theta$ if it converges in probability to $\theta$ as n
tends to infinity.
Alternatively, if $E[t_n] \to \theta$ and $\mathrm{Var}[t_n] \to 0$ as $n \to \infty$, then the statistic $t_n$ is a
consistent estimator of $\theta$.
For example, in sampling from a normal population $N(\mu, \sigma^2)$,
$$E[\bar{x}] = \mu \quad\text{and}\quad \mathrm{Var}[\bar{x}] = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty.$$
Hence, the sample mean is a consistent estimator of the population mean.


(c) Efficiency. There may exist more than one consistent estimator of a
parameter. Let $T_1$ and $T_2$ be two consistent estimators of a parameter $\theta$. If
$\mathrm{Var}(T_1) < \mathrm{Var}(T_2)$ for all n, then $T_1$ is said to be more efficient than $T_2$ for all
sample sizes.
If a consistent estimator has a smaller variance than any other consistent estimator
of a parameter, it is called the most efficient estimator.
Let T be the most efficient estimator and $T_1$ be any other consistent estimator of
a parameter. Then we define
$$\text{Efficiency} = \frac{\mathrm{Var}(T)}{\mathrm{Var}(T_1)},$$
which is less than or equal to one.
A statistic which is unbiased and also the most efficient is said to be the Minimum
Variance Unbiased Estimator (MVUE).

Note. If $T_1$ and $T_2$ are two MVU estimators of a parameter, then $T_1 = T_2$.
For example, the sample mean $\bar{x}$ obtained from a normal population is the MVUE of
the parameter $\mu$.
Let $x_1, x_2, \ldots, x_n$ be a random sample and
$$T = a_1x_1 + a_2x_2 + \cdots + a_nx_n,$$
where $a_1, a_2, \ldots, a_n$ are constants. If T is unbiased and has the minimum variance among
all such linear unbiased estimators, then T is called the Best Linear Unbiased Estimator (BLUE).

Example 2. A random sample $(X_1, X_2, X_3, X_4, X_5, X_6)$ of size 6 is drawn from a
normal population with unknown mean $\mu$. Consider the following estimators of $\mu$:
(i) $T_1 = \dfrac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6}$
(ii) $T_2 = \dfrac{X_1 + X_2 + X_3}{2} + \dfrac{X_4 + X_5 + X_6}{3}$
(iii) $T_3 = \dfrac{1}{2}(X_1 + X_2) + X_3 + X_4 + \dfrac{1}{3}(X_5 + X_6)$
Are these estimators unbiased? Find the best estimator among $T_1$, $T_2$ and $T_3$.
Solution. Here $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$ (say), and $\mathrm{Cov}(X_i, X_j) = 0$ for $i \ne j$.
$$E(T_1) = \frac{1}{6}[E(X_1) + E(X_2) + \cdots + E(X_6)] = \frac{1}{6}[\mu + \mu + \mu + \mu + \mu + \mu] = \frac{1}{6}\cdot 6\mu = \mu.$$
$$E(T_2) = \frac{1}{2}[E(X_1) + E(X_2) + E(X_3)] + \frac{1}{3}[E(X_4) + E(X_5) + E(X_6)] = \frac{1}{2}[3\mu] + \frac{1}{3}[3\mu] = \frac{3\mu}{2} + \mu = \frac{5\mu}{2}.$$
$$E(T_3) = \frac{1}{2}[E(X_1) + E(X_2)] + E(X_3) + E(X_4) + \frac{1}{3}[E(X_5) + E(X_6)] = \mu + 2\mu + \frac{2\mu}{3} = \frac{11\mu}{3}.$$
Since $E(T_1) = \mu$, $T_1$ is unbiased; $T_2$ and $T_3$ are biased estimators.
$$\mathrm{Var}(T_1) = \frac{1}{36}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_6)] = \frac{1}{36}(6\sigma^2) = \frac{\sigma^2}{6}.$$
$$\mathrm{Var}(T_2) = \frac{1}{4}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)] + \frac{1}{9}[\mathrm{Var}(X_4) + \mathrm{Var}(X_5) + \mathrm{Var}(X_6)] = \frac{3\sigma^2}{4} + \frac{3\sigma^2}{9} = \frac{13\sigma^2}{12}.$$
$$\mathrm{Var}(T_3) = \frac{1}{4}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2)] + \mathrm{Var}(X_3) + \mathrm{Var}(X_4) + \frac{1}{9}[\mathrm{Var}(X_5) + \mathrm{Var}(X_6)] = \frac{2\sigma^2}{4} + 2\sigma^2 + \frac{2\sigma^2}{9} = \frac{49\sigma^2}{18}.$$
Since $\mathrm{Var}(T_1)$ is the smallest, $T_1$ is the best estimator.
$$\text{Efficiency of } T_1 \text{ over } T_2 = \frac{\sigma^2/6}{13\sigma^2/12} = \frac{2}{13} \approx 0.15,$$
$$\text{Efficiency of } T_1 \text{ over } T_3 = \frac{\sigma^2/6}{49\sigma^2/18} = \frac{3}{49} \approx 0.06.$$
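For any linear estimator $T = \sum a_iX_i$ with independent $X_i$ of common mean $\mu$ and variance $\sigma^2$, we have $E(T) = (\sum a_i)\mu$ and $\mathrm{Var}(T) = (\sum a_i^2)\sigma^2$. The sketch below applies these two rules to the coefficients of $T_1$, $T_2$, $T_3$ using exact fractions, reproducing the means, variances and efficiencies of Example 2.

```python
from fractions import Fraction as F

# Coefficients a_i of X1..X6 in each linear estimator T = sum(a_i * X_i)
estimators = {
    "T1": [F(1, 6)] * 6,
    "T2": [F(1, 2)] * 3 + [F(1, 3)] * 3,
    "T3": [F(1, 2), F(1, 2), F(1), F(1), F(1, 3), F(1, 3)],
}

for name, a in estimators.items():
    mean_coef = sum(a)                 # E(T) = (sum a_i) * mu; unbiased iff this is 1
    var_coef = sum(c * c for c in a)   # Var(T) = (sum a_i^2) * sigma^2 (independent X_i)
    print(name, "E(T) =", mean_coef, "mu;", "Var(T) =", var_coef, "sigma^2")

var = {k: sum(c * c for c in a) for k, a in estimators.items()}
print("Efficiency of T1 over T2:", var["T1"] / var["T2"])   # 2/13
print("Efficiency of T1 over T3:", var["T1"] / var["T3"])   # 3/49
```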
Example 3. A random sample $(X_1, X_2, X_3, X_4)$ of size 4 is drawn from a normal
population with unknown mean $\mu$. If
$$T = 2X_1 + \frac{\lambda}{2}X_2 + 3X_3 - 4X_4$$
is an unbiased estimator of $\mu$, find $\lambda$.
Solution. Let $E(X_i) = \mu$, i = 1, 2, 3, 4.
For unbiasedness, $E(T) = \mu$
$\Rightarrow 2E(X_1) + \dfrac{\lambda}{2}E(X_2) + 3E(X_3) - 4E(X_4) = \mu$
$\Rightarrow 2\mu + \dfrac{\lambda}{2}\mu + 3\mu - 4\mu = \mu$
$\Rightarrow \mu + \dfrac{\lambda}{2}\mu = \mu$
$\Rightarrow \dfrac{\lambda}{2} = 0 \Rightarrow \lambda = 0$.
(d) Sufficiency. Let $x_1, x_2, \ldots, x_n$ be a random sample from a population whose
p.m.f. or p.d.f. is $f(x, \theta)$. Then T is said to be a sufficient estimator of $\theta$ if we can write
$$f(x_1, \theta)\cdot f(x_2, \theta)\cdots f(x_n, \theta) = g_1(T, \theta)\cdot g_2(x_1, x_2, \ldots, x_n),$$
where $g_1(T, \theta)$ is the sampling distribution of T and contains $\theta$, and $g_2(x_1, x_2, \ldots, x_n)$ is
independent of $\theta$.
Sufficient estimators exist in only a few cases. However, in random sampling from
a normal population, the sample mean $\bar{x}$ is a sufficient estimator of $\mu$.

4.9. POINT ESTIMATION

If sampling is used to estimate a single value for an unknown parameter of the
population, the process of estimation is called point estimation. We discuss two
methods of point estimation below.
I. Method of Maximum Likelihood
Let $x_1, x_2, \ldots, x_n$ be a random sample from a population whose p.m.f. (discrete case)
or p.d.f. (continuous case) is $f(x, \theta)$, where $\theta$ is the parameter. Construct the
likelihood function
$$L = f(x_1, \theta)\cdot f(x_2, \theta)\cdots f(x_n, \theta).$$
Since $\log L$ is maximum when L is maximum, we obtain the estimate of $\theta$ by
maximizing $\log L$:
$$\frac{\partial}{\partial\theta}(\log L) = 0 \;\Rightarrow\; \theta = \hat{\theta}, \quad\text{and}\quad \frac{\partial^2}{\partial\theta^2}(\log L) < 0 \text{ at } \theta = \hat{\theta}.$$
Here $\hat{\theta}$ is called the Maximum Likelihood Estimator (MLE).
Properties of MLE
(i) An MLE is not necessarily unbiased.
(ii) Under general regularity conditions, an MLE is consistent and most efficient, and it
is also sufficient, provided a sufficient estimator exists.
(iii) An MLE tends to be normally distributed for large samples.
(iv) If $g(\theta)$ is a function of $\theta$ and $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.
Example 4. A discrete random variable X can take all non-negative integer values, and
$$P(X = r) = p(1 - p)^r \quad (r = 0, 1, 2, \ldots),$$
where p (0 < p < 1) is the parameter of the distribution. Find the MLE of p for a sample
$x_1, x_2, \ldots, x_n$ of size n from the population of X.
Solution. Consider the likelihood function
$$L = P(X = x_1)\cdot P(X = x_2)\cdots P(X = x_n) = p(1-p)^{x_1}\cdot p(1-p)^{x_2}\cdots p(1-p)^{x_n} = p^n(1-p)^{x_1 + x_2 + \cdots + x_n} = p^n(1-p)^{\sum x_i}.$$
Taking logarithms on both sides, we obtain
$$\ln L = n\ln p + \left(\sum x_i\right)\ln(1 - p).$$
Now
$$\frac{d\ln L}{dp} = 0 \;\Rightarrow\; \frac{n}{p} - \frac{\sum x_i}{1 - p} = 0 \;\Rightarrow\; \frac{1 - p}{p} = \frac{\sum x_i}{n} = \bar{x} \;\Rightarrow\; \frac{1}{p} - 1 = \bar{x} \;\Rightarrow\; \hat{p} = \frac{1}{1 + \bar{x}}.$$
Also,
$$\frac{d^2\ln L}{dp^2} = -\frac{n}{p^2} - \frac{\sum x_i}{(1 - p)^2} = -n\left[\frac{1}{p^2} + \frac{\bar{x}}{(1 - p)^2}\right],$$
and at $\hat{p} = \dfrac{1}{1 + \bar{x}}$ (so that $1 - \hat{p} = \dfrac{\bar{x}}{1 + \bar{x}}$) this becomes
$$-n\left[(1 + \bar{x})^2 + \frac{(1 + \bar{x})^2}{\bar{x}}\right] = -n(1 + \bar{x})^2\left(1 + \frac{1}{\bar{x}}\right) < 0.$$
Hence the MLE of p is $\dfrac{1}{1 + \bar{x}}$.
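A numerical check of Example 4: the closed-form MLE $1/(1 + \bar{x})$ should coincide with the maximizer of $\ln L = n\ln p + (\sum x_i)\ln(1 - p)$. The data below are hypothetical, and the grid search is only a crude confirmation, not a recommended optimizer.

```python
import math

def log_likelihood(p, xs):
    # ln L = n ln p + (sum x_i) ln(1 - p)
    return len(xs) * math.log(p) + sum(xs) * math.log(1 - p)

def mle_closed_form(xs):
    # p_hat = 1 / (1 + xbar)
    xbar = sum(xs) / len(xs)
    return 1 / (1 + xbar)

data = [0, 2, 1, 3, 0, 1, 4, 1]   # hypothetical observations, xbar = 1.5

p_hat = mle_closed_form(data)
# Crude check: search a grid of p values in (0, 1) for the largest log-likelihood.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, data))
print(p_hat, p_grid)   # both equal 0.4
```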

Example 5. A random variable X has a distribution with density function
$$f(x) = \theta x^{\theta - 1} \quad (0 < x < 1),$$
where $\theta$ is the parameter. Find the MLE of $\theta$ for a sample $x_1, x_2, \ldots, x_n$ of size n from
the population of X.
Solution. Consider the likelihood function
$$L = f(x_1)\cdot f(x_2)\cdots f(x_n) = \theta x_1^{\theta - 1}\cdot\theta x_2^{\theta - 1}\cdots\theta x_n^{\theta - 1} = \theta^n(x_1x_2\cdots x_n)^{\theta - 1}.$$
Taking logarithms on both sides, we obtain
$$\ln L = n\ln\theta + (\theta - 1)\ln(x_1x_2\cdots x_n).$$
Now,
$$\frac{d\ln L}{d\theta} = 0 \;\Rightarrow\; \frac{n}{\theta} + \ln(x_1x_2\cdots x_n) = 0 \;\Rightarrow\; \frac{n}{\theta} = -\ln(x_1x_2\cdots x_n) \;\Rightarrow\; \hat{\theta} = -\frac{n}{\ln(x_1x_2\cdots x_n)}.$$
Also,
$$\frac{d^2\ln L}{d\theta^2} = -\frac{n}{\theta^2} < 0.$$
Hence, the MLE of $\theta$ is $-\dfrac{n}{\ln(x_1x_2\cdots x_n)}$, which is positive because $0 < x_i < 1$.
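The MLE of Example 5 can be computed as $-n/\sum \ln x_i$, and substituting it back into the score $d\ln L/d\theta = n/\theta + \sum \ln x_i$ should give zero. The sample values below are hypothetical, chosen only to lie in (0, 1).

```python
import math

def mle_theta(xs):
    # theta_hat = -n / ln(x1 x2 ... xn) = -n / sum(ln x_i), valid for 0 < x_i < 1
    return -len(xs) / sum(math.log(x) for x in xs)

def score(theta, xs):
    # d ln L / d theta = n / theta + sum(ln x_i); it vanishes at the MLE
    return len(xs) / theta + sum(math.log(x) for x in xs)

data = [0.62, 0.91, 0.45, 0.78, 0.83, 0.57]   # hypothetical sample from (0, 1)
theta_hat = mle_theta(data)
print(round(theta_hat, 4), score(theta_hat, data))
```

Summing logarithms rather than taking the log of the product avoids underflow when n is large.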

Example 6. X tossed a biased coin 40 times and got heads 15 times, while Y tossed
it 50 times and got heads 30 times. Find the MLE of the probability of getting a head when
the coin is tossed.
Solution. Let P be the unknown probability of getting a head.
Using the binomial distribution,
Probability of getting 15 heads in 40 tosses $= \dbinom{40}{15}P^{15}(1 - P)^{25}$,
Probability of getting 30 heads in 50 tosses $= \dbinom{50}{30}P^{30}(1 - P)^{20}$.
The likelihood function is obtained by multiplying these probabilities:
$$L = \binom{40}{15}\binom{50}{30}P^{45}(1 - P)^{45}$$
$$\therefore\ \log L = \log\left[\binom{40}{15}\binom{50}{30}\right] + 45\log P + 45\log(1 - P).$$
Hence,
$$\frac{\partial\log L}{\partial P} = 0 \;\Rightarrow\; \frac{45}{P} - \frac{45}{1 - P} = 0 \;\Rightarrow\; P = \frac{1}{2},$$
which is the MLE.
II. Method of Moments
In this method, the first few moments of the population are equated with the
corresponding moments of the sample:
$$\mu'_r = m'_r, \quad\text{where } \mu'_r = E(X^r) \text{ and } m'_r = \frac{1}{n}\sum x_i^r.$$
Solving these equations for the parameters gives the estimates. This method is
applicable only when the population moments exist.
Example 7. Estimate the parameter p of the binomial distribution by the method
of moments (when n is known).
Solution. Here $\mu'_1 = E(X) = np$ and $m'_1 = \bar{x}$.
Taking $\mu'_1 = m'_1$, we have
$$np = \bar{x} \;\Rightarrow\; \hat{p} = \frac{\bar{x}}{n},$$
which is the estimated value.
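The moment equation of Example 7 translates directly into code. In this sketch the counts (successes in n = 5 trials, observed on 8 occasions) are invented for illustration, and the function name is hypothetical.

```python
def binomial_p_moments(xs, n):
    # Equate mu'_1 = n p with m'_1 = xbar, so p_hat = xbar / n (n known)
    xbar = sum(xs) / len(xs)
    return xbar / n

# Hypothetical data: number of successes in n = 5 trials, observed on 8 occasions
counts = [2, 3, 1, 2, 4, 2, 3, 3]
print(binomial_p_moments(counts, 5))   # 0.5
```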

4.10. INTERVAL ESTIMATION

In interval estimation we find an interval which is expected to include the unknown
parameter $\theta$ with a specified probability, i.e.,
$$P(t_1 \le \theta \le t_2) = k,$$
where $[t_1, t_2]$ is called the confidence interval (C.I.),
$t_1$ and $t_2$ are called the confidence limits,
and k is called the confidence coefficient of the interval.
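(a) C.I. for mean with known S.D. Consider a random sample of size n
from a normal population $N(\mu, \sigma^2)$ in which $\sigma^2$ is known. We want a C.I. for the mean $\mu$.
We know that $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ follows the standard normal distribution, and 95% of the
area under the standard normal curve lies between z = −1.96 and z = 1.96. Then
$$P[-1.96 \le z \le 1.96] = 0.95 \;\Rightarrow\; P\left[-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right] = 0.95,$$
i.e., in 95% of cases we have
$$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96 \;\Rightarrow\; \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}.$$
The interval $\left[\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right]$ is known as the 95% confidence interval
for $\mu$.
Similarly, $\left[\bar{x} - 2.58\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\dfrac{\sigma}{\sqrt{n}}\right]$ is known as the 99% C.I. for $\mu$, and
$\left[\bar{x} - 3\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 3\dfrac{\sigma}{\sqrt{n}}\right]$ is known as the 99.73% C.I. for $\mu$.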
(b) C.I. for mean with unknown S.D. σ.
In this case, when sampling from a normal population $N(\mu, \sigma^2)$, the statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}, \quad\text{where } s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2,$$
follows the t distribution with (n − 1) degrees of freedom.
Then for the 95% confidence interval for the mean $\mu$ we have
$$-t_{0.025} \le \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \le t_{0.025} \;\Rightarrow\; \bar{x} - t_{0.025}\frac{s}{\sqrt{n-1}} \le \mu \le \bar{x} + t_{0.025}\frac{s}{\sqrt{n-1}}.$$
Thus, $\left[\bar{x} - t_{0.025}\dfrac{s}{\sqrt{n-1}},\ \bar{x} + t_{0.025}\dfrac{s}{\sqrt{n-1}}\right]$ is called the 95% C.I. for $\mu$.
Similarly, $\left[\bar{x} - t_{0.005}\dfrac{s}{\sqrt{n-1}},\ \bar{x} + t_{0.005}\dfrac{s}{\sqrt{n-1}}\right]$ is called the 99% C.I. for $\mu$.
(c) C.I. for variance s2 with known mean. We know that (xi – µ)2/2 follows
chi-square distribution with n degrees of freedom.
For probability 95% we have
 20.975   ( xi  )2 / 2  0.025
2

  ( xi   )2 / 20.025  2   ( xi  )2 / 20.975


which is 95% confidence interval for 2.
Similarly,

 ( xi   )2 / 20.005  2   ( xi   )2 / 0.995
2

is the 99% confidence interval for .


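(d) C.I. for variance σ² with unknown mean. In this case
$$\frac{ns^2}{\sigma^2} = \frac{\sum (x_i - \bar{x})^2}{\sigma^2}$$
follows the chi-square distribution with (n − 1) degrees of freedom.
For probability 95% we have
$$\chi^2_{0.975} \le \frac{ns^2}{\sigma^2} \le \chi^2_{0.025} \;\Rightarrow\; \frac{ns^2}{\chi^2_{0.025}} \le \sigma^2 \le \frac{ns^2}{\chi^2_{0.975}},$$
which is the 95% C.I. for $\sigma^2$.
Similarly, $\dfrac{ns^2}{\chi^2_{0.005}} \le \sigma^2 \le \dfrac{ns^2}{\chi^2_{0.995}}$ is the 99% C.I. for $\sigma^2$.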
Some confidence limits (for normal populations $N(\mu, \sigma^2)$) are given below.
Difference of means $(\mu_1 - \mu_2)$, S.D.s known:
95% confidence limits $= (\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
99% confidence limits $= (\bar{x}_1 - \bar{x}_2) \pm 2.58\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Difference of means $(\mu_1 - \mu_2)$, common S.D. unknown:
95% confidence limits $= (\bar{x}_1 - \bar{x}_2) \pm t_{0.025}\, s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
99% confidence limits $= (\bar{x}_1 - \bar{x}_2) \pm t_{0.005}\, s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
For a proportion P:
95% confidence limits = p ± 1.96 (S.E. of p)
99% confidence limits = p ± 2.58 (S.E. of p)
where S.E. of $p = \sqrt{\dfrac{PQ}{n}} \approx \sqrt{\dfrac{pq}{n}}$.
For a difference of proportions $P_1 - P_2$:
95% confidence limits = $(p_1 - p_2) \pm 1.96$ [S.E. of $(p_1 - p_2)$]
99% confidence limits = $(p_1 - p_2) \pm 2.58$ [S.E. of $(p_1 - p_2)$]
where S.E. of $(p_1 - p_2) = \sqrt{\dfrac{P_1Q_1}{n_1} + \dfrac{P_2Q_2}{n_2}} \approx \sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}$.

Example 8. A random sample of size 10 was drawn from a normal population with an
unknown mean and a variance of 35.4 cm². If the observations (in cm) are 55, 75, 71, 66,
73, 77, 63, 67, 60 and 76, obtain the 99% confidence interval for the population mean.
Solution. Given n = 10 and $\sum x_i = 683$, we have $\bar{x} = \dfrac{\sum x_i}{n} = 68.3$.
Since the population S.D. is known, the 99% C.I. for $\mu$ is given by
$$\left[\bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right],$$
i.e.,
$$\left[68.3 - \frac{2.58\sqrt{35.4}}{\sqrt{10}},\ 68.3 + \frac{2.58\sqrt{35.4}}{\sqrt{10}}\right],$$
i.e., [63.45, 73.15].
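The known-σ interval of section 4.10(a) can be computed with the standard library, using `NormalDist().inv_cdf` for the critical value instead of a table. The function name `z_interval` is hypothetical; with the exact 99% value (about 2.5758 rather than 2.58) Example 8 still rounds to [63.45, 73.15].

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(data, sigma, level):
    # xbar -/+ z * sigma / sqrt(n), z = two-sided standard normal critical value
    n, xbar = len(data), mean(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 2.5758 for 99%
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

obs = [55, 75, 71, 66, 73, 77, 63, 67, 60, 76]
lo, hi = z_interval(obs, sqrt(35.4), 0.99)
print(round(lo, 2), round(hi, 2))   # 63.45 73.15
```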
Example 9. A random sample of size 10 drawn from a normal population gave the
values 48, 56, 50, 55, 49, 45, 55, 54, 47, 43. Find the 95% confidence interval for the
mean $\mu$ of the population.
Solution. From the given data, $\sum x_i = 502$, so $\bar{x} = 50.2$, n = 10.
Let d = x − 50; then the sample values become
−2, 6, 0, 5, −1, −5, 5, 4, −3, −7,
with $\sum d = 2$ and $\sum d^2 = 190$.
$$\therefore\ s^2 = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2 = \frac{190}{10} - \left(\frac{2}{10}\right)^2 = 18.96, \quad s = 4.35.$$
Since the population S.D. is unknown, the 95% C.I. for the mean $\mu$ is (with $t_{0.025} = 2.262$
for 9 degrees of freedom)
$$\left[\bar{x} - 2.262\frac{s}{\sqrt{n-1}},\ \bar{x} + 2.262\frac{s}{\sqrt{n-1}}\right],$$
i.e.,
$$\left[50.2 - (2.262)\frac{4.35}{\sqrt{9}},\ 50.2 + (2.262)\frac{4.35}{\sqrt{9}}\right],$$
i.e., [46.92, 53.48].
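The unknown-σ interval of section 4.10(b) can be sketched as follows. Python's standard library has no t-quantile function, so the critical value 2.262 ($t_{0.025}$ with 9 d.f., read from a t table) is passed in; `t_interval` is a hypothetical name. Computing s with divisor n and dividing by $\sqrt{n-1}$, as in 4.10(b), the data of Example 9 give roughly [46.92, 53.48].

```python
from math import sqrt

def t_interval(data, t_crit):
    # xbar -/+ t * s / sqrt(n - 1), where s^2 uses divisor n (section 4.10(b));
    # equivalently xbar -/+ t * S / sqrt(n) with the (n - 1)-divisor S.
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / n)
    half = t_crit * s / sqrt(n - 1)
    return xbar - half, xbar + half

obs = [48, 56, 50, 55, 49, 45, 55, 54, 47, 43]
lo, hi = t_interval(obs, 2.262)    # t_0.025 with 9 d.f. from a t table
print(round(lo, 2), round(hi, 2))  # 46.92 53.48
```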
Example 10. The standard deviation of a random sample of size 15 drawn from
a normal population is 3.2. Calculate the 95% confidence interval for the standard
deviation ($\sigma$) of the population.
Solution. Here n = 15 and the sample S.D. is s = 3.2.
The 95% confidence interval for $\sigma^2$ is
$$\frac{ns^2}{\chi^2_{0.025}} \le \sigma^2 \le \frac{ns^2}{\chi^2_{0.975}}.$$
From the chi-square table with 14 degrees of freedom,
$\chi^2_{0.025} = 26.12$, $\chi^2_{0.975} = 5.63$.
Therefore the C.I. is
$$\frac{15(3.2)^2}{26.12} \le \sigma^2 \le \frac{15(3.2)^2}{5.63},$$
i.e., $5.88 \le \sigma^2 \le 27.28$,
i.e., $2.42 \le \sigma \le 5.22$.
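The chi-square interval of section 4.10(d) reduces to two divisions and two square roots once the table values are known. This sketch takes the chi-square critical values for 14 d.f. quoted in Example 10 as inputs (the standard library has no chi-square quantile function); `sd_interval` is a hypothetical name.

```python
from math import sqrt

def sd_interval(n, s, chi2_upper, chi2_lower):
    # n s^2 / chi2_0.025 <= sigma^2 <= n s^2 / chi2_0.975; square roots give sigma
    ns2 = n * s * s
    return sqrt(ns2 / chi2_upper), sqrt(ns2 / chi2_lower)

# Chi-square table values for 14 degrees of freedom, as quoted in Example 10
lo, hi = sd_interval(15, 3.2, chi2_upper=26.12, chi2_lower=5.63)
print(round(lo, 3), round(hi, 3))
```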


Example 11. A sample of 500 springs is taken from a large consignment produced in a
factory, and 65 are found to be defective. Assign limits within which the percentage of
defectives in the consignment lies.
Solution. There are 65 defective springs in a sample of size n = 500.
$\therefore$ The sample proportion of defectives is
$$p = \frac{65}{500} = 0.13.$$
The limits for the proportion of defectives are taken as the C.I.
[p − 3 (S.E. of p), p + 3 (S.E. of p)].
Here
$$\text{S.E. of } p = \sqrt{\frac{PQ}{n}} \approx \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.13)(0.87)}{500}} \approx 0.015.$$
Thus, the limits are [0.13 − 3(0.015), 0.13 + 3(0.015)],
i.e., [0.085, 0.175], so the percentage of defectives lies between about 8.5% and 17.5%.
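The proportion limits used in Example 11 can be computed as $p \pm k\sqrt{pq/n}$. Here k = 3 as in the example (1.96 or 2.58 would give 95% or 99% limits). Keeping the S.E. unrounded, about 0.0150, gives limits of roughly 0.085 and 0.175. The function name is hypothetical.

```python
from math import sqrt

def proportion_limits(defective, n, k=3):
    # p -/+ k * sqrt(p q / n), with the sample p standing in for the unknown P
    p = defective / n
    se = sqrt(p * (1 - p) / n)
    return p - k * se, p + k * se, se

lo, hi, se = proportion_limits(65, 500)
print(round(se, 4), round(lo, 3), round(hi, 3))
```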

4.11. BAYESIAN ESTIMATION


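Bayesian estimation makes use of subjective judgement, for example in engineering design.
For the discrete case, let the parameter $\theta$ take the values $\theta_i$, i = 1, 2, ..., n, with the
probabilities $p_i = P[\theta = \theta_i]$. Let $\varepsilon_0$ be the observed outcome of the experiment.
Then, by Bayes' theorem, we obtain
$$P[\theta = \theta_i \mid \varepsilon_0] = \frac{P[\varepsilon_0 \mid \theta = \theta_i]\cdot P[\theta = \theta_i]}{\sum_{j=1}^{n} P[\varepsilon_0 \mid \theta = \theta_j]\cdot P[\theta = \theta_j]}, \quad i = 1, 2, \ldots, n.$$
The expected value of $\theta$ given $\varepsilon_0$ is called the Bayesian estimator of the parameter, i.e.,
$$\hat{\theta} = E[\theta \mid \varepsilon_0] = \sum_{i=1}^{n} \theta_i\cdot P[\theta = \theta_i \mid \varepsilon_0].$$
Using this, we can calculate
$$P[X \le a] = \sum_{i=1}^{n} P[X \le a \mid \theta = \theta_i]\cdot P[\theta = \theta_i \mid \varepsilon_0].$$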
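For the continuous case, let $\theta$ be a random variable representing the parameter of the
distribution, with (prior) density function $f'(\theta)$. Then
$$P[\theta_i < \theta < \theta_i + \Delta\theta] = f'(\theta_i)\,\Delta\theta, \quad i = 1, 2, \ldots, n.$$
If $\varepsilon_0$ is an observed experimental outcome, then the posterior density $f''$ satisfies
$$f''(\theta_i)\,\Delta\theta = \frac{P[\varepsilon_0 \mid \theta_i]\, f'(\theta_i)\,\Delta\theta}{\sum_{j=1}^{n} P[\varepsilon_0 \mid \theta_j]\, f'(\theta_j)\,\Delta\theta}, \quad i = 1, 2, \ldots, n.$$
In the limit we obtain
$$f''(\theta) = \frac{P[\varepsilon_0 \mid \theta]\, f'(\theta)}{\int P[\varepsilon_0 \mid \theta]\, f'(\theta)\, d\theta}.$$
Then the Bayesian estimator is
$$\hat{\theta} = E[\theta \mid \varepsilon_0] = \int \theta\, f''(\theta)\, d\theta.$$
Using this, we can calculate
$$P[X \le a] = \int P[X \le a \mid \theta]\, f''(\theta)\, d\theta.$$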
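The continuous formulas can be evaluated by numerical integration. This sketch assumes a uniform prior $f'(\theta) = 1$ on (0, 1) and a hypothetical binomial observation (3 successes in 12 trials), then applies the midpoint rule to the normalizing integral and to $\int \theta f''(\theta)\,d\theta$. For this particular prior the posterior is a Beta(4, 10) density, so the result can be compared with its known mean 4/14.

```python
from math import comb

k, n = 3, 12          # hypothetical observation eps0: 3 successes in 12 trials
M = 20000             # number of midpoints for the numerical integrals

def prior(theta):
    # f'(theta): uniform prior on (0, 1), an assumption for illustration
    return 1.0

def likelihood(theta):
    # P[eps0 | theta] under a binomial model
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

h = 1.0 / M
grid = [(i + 0.5) * h for i in range(M)]              # midpoint rule on (0, 1)
weights = [likelihood(t) * prior(t) for t in grid]
norm = sum(weights) * h                                # denominator integral
posterior = [w / norm for w in weights]                # f''(theta) at grid points

theta_hat = sum(t * f for t, f in zip(grid, posterior)) * h
print(round(theta_hat, 4), round((k + 1) / (n + 2), 4))  # Beta(k+1, n-k+1) mean
```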


SUMMARY

• Sampling means the selection of a part of an aggregate with a view to drawing
statistical information about the whole. The aggregate under investigation is
called the population and the selected part is called the sample.
• Any statistical measure relating to the population which is based on all units
of the population is called a parameter.
• Any statistical measure relating to the sample which is based on all units of
the sample is called a statistic.
• When we deal with a population, the parameters are unknown most of the time,
so we cannot draw conclusions about the population directly. To learn about the
unknown parameters, we draw a sample from the population and gather information
about each parameter through a function of the sample values that is reasonably
close to it. The value so obtained is called an estimate of the parameter, the
process is called estimation, and the estimating function is called an estimator.
• If a consistent estimator has a smaller variance than any other consistent
estimator of a parameter, it is called the most efficient estimator.
• If sampling is used to estimate a single value for an unknown parameter of
the population, the process of estimation is called point estimation.

EXERCISE 4.2
1. A random variable X has a distribution with density function
$f(x) = (\theta + 1)x^\theta$, $0 \le x \le 1$, $\theta > -1$
= 0, otherwise,
and a random sample of size 8 produces the data 0.2, 0.4, 0.8, 0.5, 0.7, 0.9, 0.8 and 0.9.
Find the MLE of the unknown parameter $\theta$.
2. A random variable X has a distribution with density function
$f(x) = \dfrac{(a + 1)x^a}{2^{a+1}}$, $0 \le x \le 2$
= 0, otherwise.
Find the MLE of the parameter a (> 0).
3. Consider a random sample of size n from a population following Poisson distribution.
Obtain the MLE of the parameter of this distribution.
4. Consider a random sample x1, x2, ..., xn from a normal population having mean zero.
Obtain the MLE of the variance and show that it is unbiased.
5. Consider a random sample x1, x2, ..., xn from a population following binomial
distribution having parameters n and p. Find the MLE of p and show that it is unbiased.
6. Find the estimates of µ and  in the normal populations N (µ, 2) by the method of
moments.
7. Show that the estimates of the parameter of the Poisson distribution obtained by the
method of maximum likelihood and the method of moments are identical.
8. Find a 95% C.I. for the mean of a normal population with  = 3, given the sample 2.3,
– 0.2, 0.4 and – 0.9.
9. In a sample of size 10, the sample mean is 3.22 and the sample variance 1.21. Find
the 95% C.I. for the population mean.
10. A sample of size 10 from a normal population produces the data 2.03, 2.02, 2.01, 2.00,
1.99, 1.98, 1.97, 1.99, 1.96 and 1.95. From the sample find the 95% C.I. for the
population mean.
11. A random sample of size 10 from a N (µ, 2) yields sample mean 4.8 and sample
variance 8.64. Find 95% and 99% confidence intervals for the population mean.
12. The following random sample was obtained from a normal population : 12, 9, 10, 14,
11, 8. Find the 95% C.I. for the population S.D. when the population mean is (i)
known to be 13, (ii) unknown.
13. The marks obtained by 15 students in an examination have a mean 60 and variance
30. Find 99% confidence interval for the mean of the population of marks, assuming it
to be normal.
14. 228 out of 400 voters picked at random from a large electorate said that they were
going to vote for a particular candidate. Find 95% C.I. for the proportion of voters of
the electorate who would in favour of the candidate.
15. In a random sample of 300 road accidents, it was found that 114 were due to bad
weather. Construct a 99% confidence interval for the corresponding true proportion.
16. A study shows that 102 of 190 persons who saw an advertisement for a product on
T.V. during a sports program, and 75 of 190 other persons who saw it advertised on a
variety show, purchased the product. Construct a 99% confidence interval for the
difference of the corresponding true proportions.
Answers
1. 0.890091   2. $\hat{a} = \dfrac{n}{\ln\left(2^{n} \big/ \prod_{i=1}^{n} x_i\right)} - 1$   3. $\bar{x}$
4. $\hat{\sigma}^2 = \sum x_i^2 / n$   5. $\hat{p} = \bar{x} / n$   6. $\hat{\mu} = \bar{x}$, $\hat{\sigma}^2 = s^2$
8. [–2.54, 3.34]   9. [2.39, 4.05]   10. [1.972, 2.008]
11. 95% C.I. [2.233, 7.367], 99% C.I. [1.616, 7.984]
12. (i) [1.97, 6.72], (ii) [1.35, 5.30]   13. [55.64, 64.36]
14. [0.52, 0.62]   15. [0.31, 0.45]   16. [0.02, 0.28].

5. HYPOTHESIS TESTING
STRUCTURE

Introduction
Null Hypothesis and Alternative Hypothesis
Level of Significance and Confidence Limits
Type I Error and Type II Error
Power of the Test
Test of Significance for Small Samples
Student’s t-Test
Assumptions for Student’s t-test
Degree of Freedom
Test for Single Mean
t-test for Difference of Means
Paired t-test For Difference of Means
F-test
Properties of F-distribution
Procedure to F-test
Critical Values of F-distribution
Test of Significance for Large Samples
Test of Significance for Proportion
Test of Significance for Single Mean
Test of Significance for Difference of Means

5.1. INTRODUCTION

To describe a set of data or observations, we use statistics such as the mean and
standard deviation. These statistics are estimated from samples. A sample is a small
section selected from the population, and the process of drawing or selecting a sample
from the population is called 'sampling'. A sample must be selected at random, so that
each member of the population has an equal chance of being included in the sample. A
statistical population consists of observations of some characteristic of interest
associated with the individuals concerned, not of the individual items or persons
themselves.
A statistical measure based only on the units selected in a sample is called a
'statistic', e.g., the sample mean, the sample standard deviation, the proportion of
defectives, etc., whereas a statistical measure based on all the units in the population
is called a 'parameter'. Measures such as the mean, median, mode and standard deviation
are called parameters when they describe characteristics of the population, and
statistics when they describe characteristics of the sample.
A very important aspect of sampling theory is the study of tests of significance, which
enable us to decide, on the basis of sample results, whether to accept or reject a
hypothesis. A test of significance can also be used to compare the characteristics of
two samples of the same type. Some of the well-known tests of significance for small
samples are the t-test and the F-test.
5.2. NULL HYPOTHESIS AND ALTERNATIVE HYPOTHESIS

A statistical hypothesis is a statement about a population parameter. There are two
types of statistical hypothesis: the null hypothesis and the alternative hypothesis.
A hypothesis formulated for the sake of possibly rejecting it, under the assumption
that it is true, is called the null hypothesis and is denoted by H0. The null hypothesis
asserts that there is no significant difference between the sample statistic and the
population parameter, and that any observed difference is merely due to fluctuations
in sampling from the same population.
Rejecting the null hypothesis implies that it is rejected in favour of some other
hypothesis, which is then accepted. The hypothesis that is accepted when H0 is rejected
is called the alternative hypothesis and is denoted by H1. What we intend to conclude is
stated in the alternative hypothesis.

5.3. LEVEL OF SIGNIFICANCE AND CONFIDENCE LIMITS

The probability level below which we reject the hypothesis is known as the 'level
of significance'. The region such that H0 is rejected whenever the sample value falls in
it is known as the 'critical region' or the 'rejection region'. We generally take critical
regions that cover 5% and 1% of the area under the normal curve.
Depending on the nature of the problem, we use a single-tail test or a double-tail
test to judge the significance of a result. In a single-tail test, only the area in one
tail (right or left) of the curve representing the sampling distribution is taken into
consideration, whereas in a double-tail test, the areas in both tails are taken into
consideration.
For example, a test of the mean of a population
$$ H_0 : \mu = \mu_0 $$
against the alternative hypothesis $H_1 : \mu > \mu_0$ (right tailed) or $H_1 : \mu < \mu_0$ (left tailed) is
a single-tailed test. In the right-tailed test ($H_1 : \mu > \mu_0$), the critical region lies entirely
in the right tail of the sampling distribution, while for the left-tailed test ($H_1 : \mu < \mu_0$),
the critical region lies entirely in the left tail of the sampling distribution.
A test of a statistical hypothesis in which the alternative hypothesis is two-tailed,
such as
$$ H_0 : \mu = \mu_0 \quad \text{against} \quad H_1 : \mu \ne \mu_0 \;(\mu > \mu_0 \text{ or } \mu < \mu_0), $$
is known as a two-tailed test, and in such a case the critical region is given by the
portion of the area lying in both tails of the probability curve of the test statistic.
For a two-tailed test, the critical value of z at the 5% level of significance is ±1.96,
and at the 1% level of significance it is ±2.58. The sets of z-scores outside the ranges
±1.96 and ±2.58 constitute the critical regions of the hypothesis (the regions of
rejection) at the 5% and 1% levels of significance respectively.
The following figure shows the regions of acceptance and rejection at the 5% and 1%
levels of significance.
[Figure: Two normal curves. At the 5% level of significance, the region of acceptance (95% area) lies between z = –1.96 and z = 1.96, with a critical region (region of rejection) of 2.5% in each tail. At the 1% level of significance, the region of acceptance (99% area) lies between z = –2.58 and z = 2.58, with a critical region of 0.5% in each tail.]

5.4. TYPE I ERROR AND TYPE II ERROR

The error of rejecting H0 when H0 is true is called the type I error, and the error
of accepting H0 when H0 is false (H1 is true) is called the type II error. The probability
of a type I error is denoted by α and the probability of a type II error by β:
$$ P(\text{rejecting } H_0 \text{ when } H_0 \text{ is true}) = \alpha, \qquad P(\text{accepting } H_0 \text{ when } H_1 \text{ is true}) = \beta. $$

5.5. POWER OF THE TEST

A good test should accept the null hypothesis when it is true and reject it when it
is false. The quantity 1 – β (i.e., 1 – probability of type II error) measures how well
the test is working and is called the power of the test:
$$ \text{Power of the test} = 1 - \beta. $$
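To make α, β and power concrete, here is a minimal sketch for a right-tailed z-test of H0: µ = µ0 against H1: µ > µ0. It assumes the population standard deviation σ is known, and the numbers (µ0 = 50, true mean 52, σ = 5, n = 25) are purely hypothetical; α is fixed by the chosen level of significance, while β depends on the true mean.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def right_tailed_z_power(mu0, mu1, sigma, n, alpha=0.05):
    """Return (beta, power) for H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    z_alpha = Z.inv_cdf(1 - alpha)              # reject H0 when z exceeds this
    shift = (mu1 - mu0) * math.sqrt(n) / sigma  # mean of the z statistic when mu = mu1
    beta = Z.cdf(z_alpha - shift)               # P(accept H0 | H1 true)
    return beta, 1 - beta

# Hypothetical illustration values
beta, power = right_tailed_z_power(mu0=50, mu1=52, sigma=5, n=25)
print(round(beta, 3), round(power, 3))  # 0.361 0.639
```

Increasing n or the gap between µ0 and the true mean increases `shift`, which lowers β and raises the power.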

TEST OF SIGNIFICANCE FOR SMALL SAMPLES

5.6. STUDENT’S t-TEST

Let x1, x2, ..., xn be a random sample of size n (n < 30) from a normal population
with mean µ and variance σ². Student's t statistic is defined as
$$ t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, $$
where
$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, $$
x̄ being the sample mean and S² an unbiased estimate of the population variance σ².
5.7. ASSUMPTIONS FOR STUDENT’S t-TEST

The following assumptions are made in student’s t–test :


(i) The parent population from which the sample is drawn is normal.
(ii) The population standard deviation (σ) is unknown.
(iii) The sample size is less than 30.

5.8. DEGREE OF FREEDOM

The number of independent variates that make up a statistic is known as its
degrees of freedom (d.f.) and is denoted by ν (the Greek letter 'nu').
In general, the degrees of freedom are given by
d.f. = number of frequencies – number of independent constraints on them.
For example, the n deviations xi – x̄ always satisfy the single constraint Σ(xi – x̄) = 0,
so the statistic S defined above has n – 1 degrees of freedom.

5.9. TEST FOR SINGLE MEAN

Suppose we want to test
(i) whether a random sample xi (i = 1, 2, ..., n) of size n has been drawn from a normal
population with a specified mean, say µ, or
(ii) whether the sample mean differs significantly from the hypothetical value µ of the
population mean.
Under the null hypothesis H0 that
(i) the sample has been drawn from the population with mean µ, or
(ii) there is no significant difference between the sample mean x̄ and the
population mean µ,
the statistic
$$ t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \quad \text{where} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, $$
follows Student's t-distribution with (n – 1) degrees of freedom.
We now compare the calculated value of |t| with the tabulated value at a chosen
level of significance. If the calculated |t| > tabulated t, H0 is rejected; if the calculated
|t| < tabulated t, H0 may be accepted.
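Note. The sample variance is
$$ s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2, $$
so that
$$ n s^2 = (n-1) S^2 \;\Rightarrow\; \frac{S^2}{n} = \frac{s^2}{n-1} \quad \text{or} \quad \frac{S}{\sqrt{n}} = \frac{s}{\sqrt{n-1}}. $$
Hence, the test statistic becomes
$$ t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}. $$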
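The equivalence in the Note can be checked numerically. In the sketch below, `statistics.stdev` (divisor n – 1) plays the role of S and `statistics.pstdev` (divisor n) the role of s; the function name `t_single_mean` and the five data values are illustrative assumptions, not taken from the text.

```python
import math
from statistics import mean, stdev, pstdev

def t_single_mean(sample, mu):
    """Return Student's t computed with S and with s, plus the d.f. n - 1."""
    n = len(sample)
    xbar = mean(sample)
    S = stdev(sample)    # divisor n - 1: the S of the test statistic
    s = pstdev(sample)   # divisor n: the sample s.d. s used in the Note
    t_with_S = (xbar - mu) / (S / math.sqrt(n))
    t_with_s = (xbar - mu) / (s / math.sqrt(n - 1))
    return t_with_S, t_with_s, n - 1

t1, t2, df = t_single_mean([12, 15, 11, 14, 13], mu=12)
print(round(t1, 4), round(t2, 4), df)  # 1.4142 1.4142 4
```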

SOLVED EXAMPLES
Example 1. The mean weekly sales of soap bars in departmental stores was
146.3 bars per store. After an advertising campaign, the mean weekly sales in 22 stores
for a typical week increased to 153.7, with a standard deviation of 17.2. Was the
advertising campaign successful?
Solution. Here, n = 22, x̄ = 153.7, s = 17.2.
Null hypothesis H0 : µ = 146.3, i.e., the advertising campaign is not successful.
Alternative hypothesis H1 : µ > 146.3 (right tail).
Under H0, the test statistic is
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \quad \text{with } (22 - 1) = 21 \text{ d.f.} $$
$$ t = \frac{153.7 - 146.3}{17.2/\sqrt{22 - 1}} = \frac{7.4 \times \sqrt{21}}{17.2} \approx 1.97. $$
Since the calculated value t ≈ 1.97 is greater than the tabulated value t = 1.72 for
21 d.f. at the 5% level of significance (right tail), it is significant. So H0 is rejected, i.e.,
the advertising campaign was successful in promoting sales.
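The arithmetic of Example 1 can be reproduced from the summary figures alone. This sketch assumes, as in the Note above, that the quoted s = 17.2 was computed with divisor n, so the denominator uses √(n – 1); the helper name `t_from_summary` is hypothetical.

```python
import math

def t_from_summary(xbar, mu, s, n):
    """t statistic when s is the sample s.d. computed with divisor n."""
    return (xbar - mu) / (s / math.sqrt(n - 1))

t = t_from_summary(xbar=153.7, mu=146.3, s=17.2, n=22)
t_table = 1.72  # tabulated one-tailed t at 5% for 21 d.f., as quoted in the example
print(round(t, 2), t > t_table)  # 1.97 True
```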
Example 2. Ten individuals are chosen at random from a normal population
and their heights, in inches, are found to be 63, 63, 66, 67, 68, 69, 70, 70, 71 and 71. Test
whether the sample belongs to a population whose mean height is 66 inches. (Given
t0.05 = 2.26 for 9 d.f.)
Solution.
xi    xi – x̄    (xi – x̄)²
63    –4.8    23.04
63    –4.8    23.04
66    –1.8    3.24
67    –0.8    0.64
68    0.2     0.04
69    1.2     1.44
70    2.2     4.84
70    2.2     4.84
71    3.2     10.24
71    3.2     10.24
Σxi = 678    Σ(xi – x̄)² = 81.6


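Here, n = 10,
$$ \bar{x} = \text{sample mean} = \frac{\sum x_i}{n} = \frac{678}{10} = 67.8 \text{ inches}, $$
$$ S = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{\frac{81.6}{9}} = \sqrt{9.0667} = 3.011. $$
Null hypothesis H0 : µ = 66, i.e., the population mean is 66 inches.
Under H0, the test statistic is
$$ t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{67.8 - 66}{3.011/\sqrt{10}} = \frac{1.8 \times \sqrt{10}}{3.011} = \frac{5.692}{3.011} = 1.8904. $$
Degrees of freedom = n – 1 = 10 – 1 = 9; t0.05 = 2.26 for 9 d.f.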

As the calculated value of |t| is less than t0.05, the difference between x̄ and µ
may be due to fluctuations of random sampling, and H0 may be accepted. In other words,
the data do not provide any significant evidence against the hypothesis that the
population mean is 66 inches.
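Example 2 can be reproduced directly from the raw heights with Python's `statistics` module, whose `stdev` uses the n – 1 divisor and so matches S. The critical value 2.26 is the one given in the example, not computed here.

```python
import math
from statistics import mean, stdev

heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
mu0 = 66                      # hypothesised population mean (inches)
n = len(heights)

t = (mean(heights) - mu0) / (stdev(heights) / math.sqrt(n))
t_table = 2.26                # given t0.05 for 9 d.f.
print(round(t, 4), abs(t) < t_table)  # 1.8904 True
```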
Example 3. A random sample of 16 values from a normal population showed
a mean of 41.5 inches and a sum of squares of deviations from this mean equal to
135 square inches. Show that the assumption of a mean of 43.5 inches for the
population is not reasonable. (Given t0.05 = 2.13, t0.01 = 2.95 for 15 degrees of freedom)
Solution. Here, x̄ = 41.5 inches, n = 16, Σ(xi – x̄)² = 135 sq. inches.
$$ S = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{\frac{135}{15}} = \sqrt{9} = 3. $$
Null hypothesis H0 : µ = 43.5 inches, i.e., the data are consistent with the
assumption that the mean height in the population is 43.5 inches.
Alternative hypothesis H1 : µ ≠ 43.5 inches.
Under H0, the test statistic is
$$ t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \qquad |t| = \frac{|41.5 - 43.5|}{3/\sqrt{16}} = \frac{2 \times 4}{3} = 2.667. $$
degrees of freedom = n – 1 = 16 – 1 = 15
We are given t0.05 = 2.13 and t0.01 = 2.95 for 15 degrees of freedom.
Since calculated |t| is greater than t0.05 = 2.13, null hypothesis H0 is rejected at
5% level of significance and we conclude that the assumption of mean 43.5 inches for
the population is not reasonable.
Remark. Since the calculated |t| is less than t0.01 = 2.95, the null hypothesis H0 may be
accepted at the 1% level of significance.

5.10. t-TEST FOR DIFFERENCE OF MEANS

Given two independent random samples xi (i = 1, 2, ..., n1) and yj (j = 1, 2, ..., n2)
of sizes n1 and n2, with means x̄ and ȳ and standard deviations S1 and S2, drawn from
normal populations with the same variance, we have to test the hypothesis that the
population means are the same. In other words, since a normal distribution is
completely specified by its mean and variance, we have to test the hypothesis that the
two independent samples come from the same normal population.
The statistic
$$ t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, $$
where
$$ \bar{x} = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i, \qquad \bar{y} = \frac{1}{n_2}\sum_{j=1}^{n_2} y_j $$
and
$$ S^2 = \frac{1}{n_1 + n_2 - 2}\left[(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right] = \frac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right], $$
follows Student's t-distribution with (n1 + n2 – 2) degrees of freedom.
If the calculated value of |t| is greater than the tabulated t, the difference between the
sample means is said to be significant at that level of significance; otherwise the data are
said to be consistent with the hypothesis.
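The pooled-variance statistic above can be sketched in a few lines of Python. The function name `pooled_t` is an illustrative choice; as a check it is applied to the marks of Group I and Group II from Example 3 of the solved examples in this section, where the hand calculation gives |t| = 0.61 with 25 d.f.

```python
import math
from statistics import mean

def pooled_t(x, y):
    """Two-sample t with pooled variance S^2; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    xbar, ybar = mean(x), mean(y)
    ss_x = sum((xi - xbar) ** 2 for xi in x)   # sum of squared deviations, sample 1
    ss_y = sum((yj - ybar) ** 2 for yj in y)   # sum of squared deviations, sample 2
    S = math.sqrt((ss_x + ss_y) / (n1 + n2 - 2))
    t = (xbar - ybar) / (S * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

group1 = [25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25]
group2 = [44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22]
t, df = pooled_t(group1, group2)
print(round(abs(t), 2), df)  # 0.61 25
```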
5.11. PAIRED t-TEST FOR DIFFERENCE OF MEANS

Suppose the two samples are of the same size, say n, and the data are paired,
i.e. (xi, yi), (i = 1, 2, ..., n) correspond to the same ith sample unit. The problem
is to test whether the sample means differ significantly.
Here, we consider the increments di = xi – yi (i = 1, 2, ..., n).
Under the null hypothesis H0 that the increments are due to fluctuations of sampling,
the statistic
$$ t = \frac{\bar{d}}{S/\sqrt{n}}, $$
where
$$ \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2, $$
follows Student's t-distribution with (n – 1) degrees of freedom. If d̄ is negative, we
may consider |d̄|. This test is generally a one-tailed test, so the alternative
hypothesis is H1 : µ1 > µ2 or H1 : µ1 < µ2.
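A minimal sketch of the paired t-test: form the increments, then apply the single-mean formula to them. The function name `paired_t` is hypothetical; the check uses the before/after memory scores of Example 4 of the solved examples in this section, for which the hand calculation gives |t| = 4.32 with 7 d.f.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t on the increments d_i = x_i - y_i; returns (t, degrees of freedom)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    # stdev uses the n - 1 divisor, matching S in the formula above
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1

before = [49, 53, 51, 52, 47, 50, 52, 53]
after = [52, 55, 52, 53, 50, 54, 54, 53]
t, df = paired_t(before, after)
print(round(t, 2), df)  # -4.32 7
```

The sign of t is negative here because d = x – y and the scores rose after training; the test is one-tailed, so |t| is compared with the tabulated value.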

SOLVED EXAMPLES

Example 1. The following data relate to the heights (in cm) of two different
varieties of wheat plants.

Variety 1 63 65 68 69 71 72

Variety 2 61 62 65 66 69 69 70 71 72 73

Test the null hypothesis that the mean heights of plants of both varieties are the
same.
Solution. Given n1 = 6, n2 = 10
Null hypothesis H0 : µ1 = µ2
Alternative hypothesis H1 : µ1 > µ2 (right tail)
Under H0, the test statistic is given by
$$ t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}. $$
Variety 1                               Variety 2
x    x – x̄ = x – 68    (x – x̄)²          y    y – ȳ = y – 67    (y – ȳ)²
63   –5                25                61   –6                36
65   –3                9                 62   –5                25
68   0                 0                 65   –2                4
69   1                 1                 65   –2                4
71   3                 9                 66   –1                1
72   4                 16                66   –1                1
–    –                 –                 70   3                 9
–    –                 –                 70   3                 9
–    –                 –                 72   5                 25
–    –                 –                 73   6                 36
Σx = 408   Σ(x – x̄)² = 60               Σy = 670   Σ(y – ȳ)² = 150

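$$ \bar{x} = \frac{1}{n_1}\sum x_i = \frac{408}{6} = 68, \qquad \bar{y} = \frac{1}{n_2}\sum y_j = \frac{670}{10} = 67 $$
$$ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right] = \frac{60 + 150}{6 + 10 - 2} = \frac{210}{14} = 15 \;\Rightarrow\; S = 3.873 $$
$$ t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{68 - 67}{3.873\sqrt{\dfrac{1}{6} + \dfrac{1}{10}}} = \frac{1}{3.873 \times 0.5164} \approx 0.5 $$
The tabulated t0.05 for 14 degrees of freedom for a single-tail test is 1.76.
Since the calculated value of t is less than 1.76, it is not significant at the 5% level
of significance. Hence, H0 may be accepted, and we conclude that the heights of the
plants do not differ significantly at the 5% level of significance.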
Example 2. The mean birth weights, with standard deviations and sample sizes,
are given below by socio-economic status. Is the difference in mean birth weight
between the socio-economic groups significant?
                      High socio-economic group    Low socio-economic group
Sample size           n1 = 15                      n2 = 10
Birth weight (kg)     x̄ = 2.91                     ȳ = 2.26
Standard deviation    S1 = 0.27                    S2 = 0.22

Solution. Given n1 = 15, n2 = 10, x̄ = 2.91, ȳ = 2.26, S1 = 0.27 and S2 = 0.22.
Null hypothesis H0 : µ1 = µ2
Alternative hypothesis H1 : µ1 > µ2 (right tail), i.e. the high socio-economic group
is superior to the low socio-economic group.
Under H0 the test statistic is
$$ t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, $$

where
$$ S^2 = \frac{1}{n_1 + n_2 - 2}\left[(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right] = \frac{(15 - 1)(0.27)^2 + (10 - 1)(0.22)^2}{15 + 10 - 2} = \frac{1.0206 + 0.4356}{23} = \frac{1.4562}{23} = 0.063 \;\Rightarrow\; S = 0.25 $$
$$ t = \frac{2.91 - 2.26}{0.25\sqrt{\dfrac{1}{15} + \dfrac{1}{10}}} = \frac{0.65 \times \sqrt{150}}{0.25 \times \sqrt{25}} = \frac{0.65 \times 2.45}{0.25} = 6.37 $$
The tabulated value of t for 23 degrees of freedom at the 5% level of significance for a
right-tailed test is 1.71. Since the calculated t is much greater than the tabulated t, it is
highly significant; H0 is rejected and we conclude that the mean birth weight of the high
group is greater than that of the low group.
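When only sample sizes, means and standard deviations are reported, as in Example 2, the pooled t can be computed from the summary figures. This sketch assumes S1 and S2 are the (n – 1)-divisor standard deviations, as the example's formula implies; the helper name is hypothetical. Because it does not round S to 0.25, it gives about 6.33 rather than 6.37; the conclusion is unchanged.

```python
import math

def pooled_t_from_summary(n1, xbar, S1, n2, ybar, S2):
    """Pooled two-sample t from sizes, means and (n - 1)-divisor standard deviations."""
    pooled_var = ((n1 - 1) * S1 ** 2 + (n2 - 1) * S2 ** 2) / (n1 + n2 - 2)
    S = math.sqrt(pooled_var)
    t = (xbar - ybar) / (S * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t_from_summary(15, 2.91, 0.27, 10, 2.26, 0.22)
print(round(t, 2), df)  # 6.33 23
```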
Example 3. In a test examination given to two groups of students, the marks
obtained were as follows :

Group I 25 32 30 34 24 14 32 24 30 31 35 25

Group II 44 34 22 10 47 31 40 30 32 35 18 21 35 29 22

Examine the significance of the difference between the arithmetic averages of the marks
secured by the students of the two groups.
Solution. Here, n1 = 12, n2 = 15
Null hypothesis H0 : µ1 = µ2
Alternative hypothesis H1 : µ1 ≠ µ2 (two-tailed)
x    x – x̄ = x – 28    (x – x̄)²          y    y – ȳ = y – 30    (y – ȳ)²
25   –3                9                 44   14                196
32   4                 16                34   4                 16
30   2                 4                 22   –8                64
34   6                 36                10   –20               400
24   –4                16                47   17                289
14   –14               196               31   1                 1
32   4                 16                40   10                100
24   –4                16                30   0                 0
30   2                 4                 32   2                 4
31   3                 9                 35   5                 25
35   7                 49                18   –12               144
25   –3                9                 21   –9                81
–    –                 –                 35   5                 25
–    –                 –                 29   –1                1
–    –                 –                 22   –8                64
Σx = 336   Σ(x – x̄) = 0   Σ(x – x̄)² = 380     Σy = 450   Σ(y – ȳ) = 0   Σ(y – ȳ)² = 1410

$$ \bar{x} = \frac{1}{n_1}\sum x_i = \frac{336}{12} = 28, \qquad \bar{y} = \frac{1}{n_2}\sum y_j = \frac{450}{15} = 30 $$
Under H0, the test statistic is
$$ t = \frac{|\bar{x} - \bar{y}|}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, $$
where
$$ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right] = \frac{380 + 1410}{12 + 15 - 2} = \frac{1790}{25} = 71.6 \;\Rightarrow\; S = 8.46 $$
$$ t = \frac{30 - 28}{8.46\sqrt{\dfrac{1}{12} + \dfrac{1}{15}}} = \frac{2}{8.46 \times 0.387} = 0.61 $$
The tabulated value of t0.05 for 25 degrees of freedom is 2.06.
Since the calculated value of t is less than the tabulated value of t at the 5% level of
significance, H0 may be accepted and we may conclude that the two averages do not differ
significantly.
Example 4. Memory capacity of 8 students was tested before and after training.
State at 5% level of significance whether the training was effective from the following
scores :

Student 1 2 3 4 5 6 7 8 Total

Before 49 53 51 52 47 50 52 53 407
After 52 55 52 53 50 54 54 53 423

Use paired t-test for your answer.


Solution. Let x denote the scores before training and y the scores after training.
Null hypothesis H0 : µ1 = µ2, i.e. there is no significant difference in the scores
before and after the training. In other words, the given increments are just due to chance
(fluctuations of sampling).
Alternative hypothesis H1 : µ1 < µ2, i.e. the training has been effective (one tail).

Student   Score before training (x)   Score after training (y)   d = x – y   d²
1    49    52    –3    9
2    53    55    –2    4
3    51    52    –1    1
4    52    53    –1    1
5    47    50    –3    9
6    50    54    –4    16
7    52    54    –2    4
8    53    53    0     0
Σd = –16   Σd² = 44
Under H0 the test statistic is
$$ t = \frac{\bar{d}}{S/\sqrt{n}}, $$
where
$$ \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{-16}{8} = -2, $$
$$ S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum d_i^2 - n\bar{d}^2\right] = \frac{1}{7}\left[44 - 8 \times (-2)^2\right] = \frac{44 - 32}{7} = \frac{12}{7} = 1.714 \;\Rightarrow\; S = 1.31, $$
$$ |t| = \frac{|\bar{d}|}{S/\sqrt{n}} = \frac{|-2|}{1.31/\sqrt{8}} = \frac{2 \times 2.83}{1.31} = 4.32. $$
The tabulated t0.05 for (8 – 1) = 7 degrees of freedom for a one-tailed test is 1.90.
Since the calculated value of |t| is greater than the tabulated t, H0 is rejected at the 5%
level of significance. Hence, we conclude that the scores differ significantly before and
after the training, i.e. the training was effective.
Example 5. A certain drug administered to 10 patients showed the following
additional hours of sleep:
–1.0, 0.5, 2.7, –0.6, 1.2, 1.8, 1.6, 3.5, 0.2, –1.7
Can it be concluded that the drug does produce additional hours of sleep?
Solution. Here, the di are given as
di = xi – yi = –1.0, 0.5, 2.7, –0.6, 1.2, 1.8, 1.6, 3.5, 0.2, –1.7
n = 10
$$ \bar{d} = \frac{\sum d_i}{n} = \frac{-1.0 + 0.5 + 2.7 - 0.6 + 1.2 + 1.8 + 1.6 + 3.5 + 0.2 - 1.7}{10} = \frac{8.2}{10} = 0.82 $$
Σdi² = 1 + 0.25 + 7.29 + 0.36 + 1.44 + 3.24 + 2.56 + 12.25 + 0.04 + 2.89 = 31.32
Null hypothesis H0 : µ1 = µ2, i.e. the drug does not produce any additional hours
of sleep.
Alternative hypothesis H1 : µ1 > µ2, i.e. the mean increment is positive and the drug is
effective (one tail).
Under H0, the test statistic is
$$ t = \frac{\bar{d}}{S/\sqrt{n}}, $$
where
$$ S^2 = \frac{1}{n-1}\sum (d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum d_i^2 - n\bar{d}^2\right] = \frac{1}{10 - 1}\left[31.32 - 10 \times (0.82)^2\right] = \frac{1}{9}\left[31.32 - 6.724\right] = 2.733 \;\Rightarrow\; S = 1.653, $$
$$ t = \frac{0.82 \times \sqrt{10}}{1.653} = \frac{2.593}{1.653} = 1.57. $$
The tabulated t0.05 = 1.833 with (10 – 1) = 9 degrees of freedom at the 5% level of
significance (one tail).
Since the calculated value of t is less than the tabulated t, H0 may be accepted at the 5%
level of significance. Hence, the data do not show that the drug produces additional hours
of sleep.

EXERCISE 5.1

1. A brand of matches is sold in boxes on which it is claimed that the average contents are
40 matches. A check on a pack of 5 boxes gives the following results :
41, 39, 37, 40, 38
(i) Test the manufacturer’s claim keeping the interests of both the manufacturer and
the customer in mind.
(ii) As a customer test the manufacturer’s claim.
2. A sample of size 10 drawn from a normal population has a mean 31 and a variance 2.25.
Is it reasonable to assume that the mean of the population is 30 ? (Use 1% level of
significance).
3. A random sample of size 10 from a normal population with mean µ gives a sample mean
of 40 and a sample standard deviation of 6. Test the hypothesis that µ = 44 against µ ≠ 44
at the 5% level of significance.
4. A drug manufacturer wants to market a new drug only if he can be quite sure that
the mean temperature of a healthy person taking the drug will not rise above 98.6°F;
otherwise he will withhold the drug. The drug is administered to a random sample of 17
healthy persons. The mean temperature was found to be 98.4°F with a standard deviation
of 0.6°F. Assuming that the distribution of the temperature is normal and α = 0.01, what
should the manufacturer do?
5. The marks of students in two groups were obtained as

I 18 20 36 50 49 36 34 49 41

II 29 28 26 35 30 44 46

Test whether the groups were identical.


(Given t0.05 = 2.14 for 14 degrees of freedom)
6. Two different types of drugs A and B were tried on certain patients for increasing weight.
5 persons were given drug A and 7 persons were given drug B. The increase in weight in
pounds is given below :

Drug A 8 12 13 9 3

Drug B 10 8 12 15 6 8 11

Do the two drugs differ significantly with regard to their effect in increasing weight?
(Given t0.05 = 2.23 for 10 degrees of freedom)
7. The mean life of a sample of 10 electric light bulbs was found to be 1456 hours with
standard deviation of 423 hours. A second sample of 17 bulbs chosen from a different
batch showed a mean life of 1280 hours with standard deviation of 398 hours. Is there a
significant difference between the means of the two batches ?
(Given t0.05 = 2.06 for 25 degrees of freedom)
8. To verify whether a course in Statistics improved performance, a similar test was given
to 12 participants both before and after the course. The original marks recorded in
alphabetical order of the participants were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and
72. After the course, the marks were in the same order 53, 38, 69, 57, 46, 39, 73, 48, 73,
74, 60 and 78. Was the course useful ?
(Given t0.05 = 2.201 for 11 degrees of freedom)
9. A certain medicine given to each of 9 patients resulted in the following increases in
blood pressure. Can it be concluded that the medicine will, in general, be accompanied by
an increase in blood pressure?
7, 3, –1, 4, –3, 5, 6, –4, –1
(Given t0.05 = 2.306 for 8 degrees of freedom)
Answers
1. (i) Accept the manufacturer's claim (ii) The manufacturer's claim is justified.
2. Yes 3. Accept null hypothesis
4. The manufacturer should market the drug 5. Two groups are identical
6. No 7. No 8. Yes 9. No

5.12. F-TEST

This test uses the variance ratio to test the significance of the difference between
two sample variances. The F-test, which is based on the F-distribution, is so called in
honour of the great statistician Prof. R.A. Fisher.
Let x1, x2, ..., xn1 and y1, y2, ..., yn2 be the values of two independent random
samples drawn from the same normal population with variance σ². Then we define the
variance ratio F as follows:
$$ F = \frac{S_1^2}{S_2^2}, \qquad S_1 > S_2, $$
where
$$ S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_i - \bar{x})^2, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(y_i - \bar{y})^2, $$
and x̄, ȳ are the sample means.
The distribution of the variance ratio F with ν1 and ν2 degrees of freedom is given
by
$$ y = \frac{y_0\, F^{(\nu_1 - 2)/2}}{\left(1 + \dfrac{\nu_1}{\nu_2} F\right)^{(\nu_1 + \nu_2)/2}}, $$
where y0 is chosen so that the total area under the curve is unity.
The parameters ν1 and ν2 represent degrees of freedom. For samples of sizes n1
and n2, we have
ν1 = n1 – 1 and ν2 = n2 – 1.

5.13. PROPERTIES OF F-DISTRIBUTION

(i) The value of F cannot be negative, as both terms of the F-ratio are squared
values.
(ii) The range of the values of F is from 0 to ∞.
(iii) The F-distribution is independent of the population variance σ² and depends
on ν1 and ν2 only.
The F-distribution for various degrees of freedom ν1 and ν2 is given in the
following table:
Table: Values of F for the 5% and 1% levels, where ν1 is the number of degrees of
freedom for the greater estimate of variance and ν2 that for the smaller estimate of variance.

5.14. PROCEDURE TO F-TEST

(i) Set up the null hypothesis H0 : σ1² = σ2² = σ², i.e. the independent estimates
of the common population variance do not differ significantly.
(ii) Find the degrees of freedom ν1 = n1 – 1 and ν2 = n2 – 1.
(iii) Calculate the variances of the two samples and then calculate F, placing the larger
variance in the numerator.
(iv) From the F-distribution table, note the value of F for (ν1, ν2) degrees of freedom at
the desired level of significance.
(v) Compare the calculated value of F with the tabulated value of F at the desired
level of significance. If the calculated value of F is less than the tabulated value, the
difference is not significant and we may conclude that the samples could have come
from two populations with the same variance, i.e., accept H0; otherwise reject H0.
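Steps (ii) and (iii) can be sketched in Python: compute the two unbiased variances, put the larger on top, and attach the degrees of freedom of whichever sample ends up in the numerator. The function name `f_ratio` is hypothetical; the data are Sample I and Sample II of Example 4 of the solved examples in this section, where the larger variance belongs to Sample II, so the numerator has 9 d.f.

```python
from statistics import variance

def f_ratio(sample_a, sample_b):
    """Variance ratio with the larger unbiased variance in the numerator.

    Returns (F, nu1, nu2), where nu1 belongs to the numerator sample.
    """
    va, vb = variance(sample_a), variance(sample_b)  # divisor n - 1
    if va >= vb:
        return va / vb, len(sample_a) - 1, len(sample_b) - 1
    return vb / va, len(sample_b) - 1, len(sample_a) - 1

sample1 = [60, 65, 71, 74, 76, 82, 85, 87]
sample2 = [61, 66, 67, 85, 78, 88, 86, 85, 63, 91]
F, nu1, nu2 = f_ratio(sample1, sample2)
print(round(F, 2), nu1, nu2)  # 1.47 9 7
```

The calculated F is then compared with the tabulated value for (ν1, ν2), step (iv) and (v).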

5.15. CRITICAL VALUES OF F-DISTRIBUTION

The available F-tables give the critical values of F for the right-tailed test, i.e. the
critical region is determined by the right-tail area. Thus, the significant value
Fα(ν1, ν2) at level of significance α and (ν1, ν2) degrees of freedom is determined by
P[F > Fα(ν1, ν2)] = α, as shown below:
[Figure: Density curve P(F) of the F-distribution. The critical value Fα(ν1, ν2) separates the acceptance region, of area 1 – α, on the left from the rejection region, of area α, in the right tail.]
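The defining relation P[F > Fα(ν1, ν2)] = α can be checked numerically from the density y given in Section 5.12. In the sketch below, the normalising constant is taken as y0 = (ν1/ν2)^(ν1/2) / B(ν1/2, ν2/2), the standard value that makes the total area unity, and the tail area is computed with Simpson's rule; the helper names and the truncation point of 200 are assumptions of this sketch. For ν1 = 7, ν2 = 9 the tail beyond the tabulated 3.29 should come out close to 0.05.

```python
import math

def f_pdf(x, nu1, nu2):
    """Density y of the F-distribution with (nu1, nu2) degrees of freedom."""
    if x <= 0:
        return 0.0
    # y0 = (nu1/nu2)**(nu1/2) / B(nu1/2, nu2/2), evaluated on the log scale
    log_beta = math.lgamma(nu1 / 2) + math.lgamma(nu2 / 2) - math.lgamma((nu1 + nu2) / 2)
    log_y0 = (nu1 / 2) * math.log(nu1 / nu2) - log_beta
    log_y = (log_y0 + ((nu1 - 2) / 2) * math.log(x)
             - ((nu1 + nu2) / 2) * math.log(1 + nu1 * x / nu2))
    return math.exp(log_y)

def upper_tail(f_value, nu1, nu2, upper=200.0, steps=100_000):
    """Approximate P[F > f_value] by Simpson's rule on [f_value, upper].

    The area beyond `upper` is ignored; it is negligible for the
    degrees of freedom used below.
    """
    h = (upper - f_value) / steps  # steps must be even for Simpson's rule
    total = f_pdf(f_value, nu1, nu2) + f_pdf(upper, nu1, nu2)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * f_pdf(f_value + k * h, nu1, nu2)
    return total * h / 3

area = upper_tail(0.0, 7, 9)    # whole curve: should be close to 1
alpha = upper_tail(3.29, 7, 9)  # tail beyond the tabulated F0.05 for (7, 9)
print(round(area, 4), round(alpha, 3))
```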

SOLVED EXAMPLES

Example 1. In one sample of size 8, the sum of the squares of deviations of the
sample values from the sample mean is 84.4, and in another sample of size 10 it is
102.6. Test whether this difference is significant at the 5% level, given that for ν1 = 7 and
ν2 = 9, F0.05 = 3.29.
Solution. Here, n1 = 8, n2 = 10
and Σ(x – x̄)² = 84.4, Σ(y – ȳ)² = 102.6.
$$ S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{84.4}{7} = 12.057, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{102.6}{9} = 11.4 $$
Under H0 : σ1² = σ2² = σ², i.e. the estimates of σ² given by the samples are
homogeneous,
$$ F = \frac{S_1^2}{S_2^2} = \frac{12.057}{11.4} = 1.058. $$
For ν1 = 7 and ν2 = 9, we have F0.05 = 3.29. Since the calculated value of F is less
than F0.05, H0 may be accepted at the 5% level of significance.
Example 2. Two random samples gave the following information:
Sample   Size   Sample mean   Sum of squares of deviations from the mean
1        10     15            90
2        12     14            108
Test whether the samples have been drawn from the same normal population,
given that for ν1 = 9 and ν2 = 11, F0.05 = 2.90 (approx.).
Solution. Here, n1 = 10, n2 = 12, x̄ = 15, ȳ = 14,
Σ(x – x̄)² = 90 ; Σ(y – ȳ)² = 108.
$$ S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{90}{9} = 10, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{108}{11} = 9.82 $$
Under H0 : σ1² = σ2² = σ², i.e. the two samples have been drawn from the same
normal population,
$$ F = \frac{S_1^2}{S_2^2} = \frac{10}{9.82} = 1.018. $$
For ν1 = 9 and ν2 = 11, we have F0.05 = 2.90.
Since the calculated value of F is less than F0.05, it is not significant. Hence, the null
hypothesis H0 may be accepted.
Example 3. Two samples of sizes 9 and 8 give sums of squares of deviations
from their respective means equal to 160 and 91 square units respectively. Test whether
the samples have been drawn from the same normal population, given that for ν1 = 8
and ν2 = 7, F0.05 = 3.73.
Solution. Here, n1 = 9, n2 = 8, Σ(x – x̄)² = 160, Σ(y – ȳ)² = 91.
$$ S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{160}{8} = 20, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{91}{7} = 13 $$
Under H0 : σ1² = σ2² = σ², i.e. the two samples have been drawn from the same
normal population,
$$ F = \frac{S_1^2}{S_2^2} = \frac{20}{13} = 1.54 \text{ (approx.)} $$
For ν1 = 8 and ν2 = 7, we have F0.05 = 3.73.
Since the calculated value of F is less than F0.05, it is not significant. Hence, H0 may
be accepted.
Example 4. Two samples are drawn from two normal populations. From the
following data test whether the two samples have the same variances at 5% level of
significance.

Sample I 60 65 71 74 76 82 85 87

Sample II 61 66 67 85 78 88 86 85 63 91

Solution. Here, n1 = 8, n2 = 10.
H0 : σ1² = σ2², i.e. the two populations have the same variance.
H1 : σ1² ≠ σ2²
Sample I                                 Sample II
x    x – x̄             (x – x̄)²          y    y – ȳ             (y – ȳ)²
60   60 – 75 = –15     225               61   61 – 77 = –16     256
65   65 – 75 = –10     100               66   66 – 77 = –11     121
71   71 – 75 = –4      16                67   67 – 77 = –10     100
74   74 – 75 = –1      1                 85   85 – 77 = 8       64
76   76 – 75 = 1       1                 78   78 – 77 = 1       1
82   82 – 75 = 7       49                88   88 – 77 = 11      121
85   85 – 75 = 10      100               86   86 – 77 = 9       81
87   87 – 75 = 12      144               85   85 – 77 = 8       64
–    –                 –                 63   63 – 77 = –14     196
–    –                 –                 91   91 – 77 = 14      196
Σx = 600   Σ(x – x̄)² = 636               Σy = 770   Σ(y – ȳ)² = 1200
$$ \bar{x} = \frac{\sum x}{n_1} = \frac{600}{8} = 75, \qquad \bar{y} = \frac{\sum y}{n_2} = \frac{770}{10} = 77 $$
$$ \text{Variance of sample I} = S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{636}{8 - 1} = 90.857 $$
$$ \text{Variance of sample II} = S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{1200}{10 - 1} = 133.33 $$
$$ F = \frac{S_2^2}{S_1^2} = \frac{133.33}{90.857} = 1.468 $$
Since the larger variance belongs to sample II, the numerator has ν1 = 10 – 1 = 9 degrees
of freedom and the denominator ν2 = 8 – 1 = 7, for which F0.05 = 3.68.
Since the calculated value of F is less than F0.05, H0 may be accepted, i.e. samples I
and II may be regarded as coming from populations with the same variance.


EXERCISE 5.2

1. In a sample of 8 observations, the sum of squared deviations of items from the mean was
94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether
the difference is significant at the 5% level.
2. The following are the values, in thousandths of an inch, obtained by two engineers in 10
successive measurements with the same micrometer. Is one engineer significantly more
consistent than the other?

Engineer A 503 505 497 505 495 502 499 493 510 501

Engineer B 502 497 492 498 499 495 497 496 498

3. The nicotine content (in milligrams) of two samples of tobacco were found to be as follows :

Sample A 24 27 26 21 25

Sample B 27 30 28 31 22 36

Can it be said that the two samples come from the same normal population ?
4. The daily wages (in ₹) of skilled workers in two cities are as follows:

City Size of sample of workers S.D. of wages in the sample

A 16 25
B 13 32

Test at 5% level of significance the equality of variances of the wage distribution in the
two cities.
5. The time taken by workers in performing a job by methods I and II is given below :

Method I 20 16 26 27 23 22 –

Method II 27 33 42 35 32 34 38

Do the data show that the variances of the time distributions in the populations from which
these samples are drawn do not differ significantly?
6. Two random samples drawn from two normal populations are given below :

Sample I 63 65 68 69 71 72 – – – –

Sample II 63 62 65 66 69 69 70 71 72 73

Test whether the two populations have the same variance at 5% level of significance.

Answers
1. No 2. Not significant 3. yes
4. Accepted 5. Not significant 6. Yes.

TEST OF SIGNIFICANCE FOR LARGE SAMPLES

For practical purposes, a sample is regarded as large if n > 30. The important
large-sample tests of significance are as follows:
1. Test of significance for proportions
(i) Single proportion (ii) Difference of proportions
2. Test of significance for a single mean
3. Test of significance for differences of
(i) Means (ii) Standard deviations

5.16. TEST OF SIGNIFICANCE FOR PROPORTION

(i) Single proportion

This test is used to test whether the difference between a sample proportion and the
population proportion is significant.
Let X be the number of successes in n independent trials with constant probability
P of success for each trial.
We have $E(X) = nP$ and $V(X) = nPQ$, where $Q = 1 - P$ is the probability of failure.
Let $p = \frac{X}{n}$ be the observed proportion of success. Then

$$E(p) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{nP}{n} = P$$

$$V(p) = V\left(\frac{X}{n}\right) = \frac{1}{n^2}V(X) = \frac{nPQ}{n^2} = \frac{PQ}{n}$$

$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$

$$Z = \frac{p - E(p)}{\text{S.E.}(p)} = \frac{p - P}{\sqrt{PQ/n}} \sim N(0, 1) \quad \text{(approximately, for large } n\text{)}$$

where E denotes expected value, V variance and S.E. standard error.
Z is called the test statistic; it is used to test the significance of the difference between the
sample proportion and the population proportion.
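As an illustration, the statistic can be computed in a few lines of Python. This is a minimal sketch: the function name `proportion_z` is hypothetical, and the two-tailed p-value (an extra not used in the text) comes from the standard library's `statistics.NormalDist`. The normal approximation assumed here is reasonable only for large n.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(x: int, n: int, P: float) -> float:
    """Z = (p - P) / sqrt(PQ/n) for x successes in n trials, under H0: proportion = P."""
    p = x / n              # observed proportion of success
    Q = 1 - P              # probability of failure under H0
    se = sqrt(P * Q / n)   # standard error of p under H0
    return (p - P) / se


# Illustration: 36 defectives among 600 items when the claimed rate is P = 0.04
z = proportion_z(36, 600, 0.04)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
print(f"Z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```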
Note 1. The probable limits for the observed proportion of success are $E(p) \pm Z_\alpha\sqrt{V(p)}$,
i.e., $P \pm Z_\alpha\sqrt{\frac{PQ}{n}}$, where $Z_\alpha$ is the significant value of Z at the level of significance α.
2. If P is not known, the probable limits for the proportion in the population are
$p \pm Z_\alpha\sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
3. If α is not given, we can use 3σ limits. Hence the probable limits for the observed
proportion of success are $P \pm 3\sqrt{\frac{PQ}{n}}$, and the probable limits for the proportion in the population are
$p \pm 3\sqrt{\frac{pq}{n}}$.
4. A set of four selected values is commonly used for α. Each α and the corresponding $Z_{\alpha/2}$ (two-tailed)
and $Z_\alpha$ (one-tailed) values are given in the following table:

For two-tailed test              For one-tailed test
α        $Z_{\alpha/2}$          α        $Z_\alpha$
0.20     1.282                   0.10     1.282
0.10     1.645                   0.05     1.645
0.05     1.960                   0.025    1.960
0.01     2.576                   0.01     2.326
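The table values can be checked with Python's standard library, whose `statistics.NormalDist.inv_cdf` returns quantiles of the standard normal distribution. A short sketch (the helper names are our own):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_two_tailed(alpha: float) -> float:
    """Critical value Z_{alpha/2}, so that P(|Z| > z) = alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)


def z_one_tailed(alpha: float) -> float:
    """Critical value Z_alpha, so that P(Z > z) = alpha."""
    return std_normal.inv_cdf(1 - alpha)


for a in (0.20, 0.10, 0.05, 0.01):
    print(f"two-tailed alpha = {a:<5} Z = {z_two_tailed(a):.3f}")
for a in (0.10, 0.05, 0.025, 0.01):
    print(f"one-tailed alpha = {a:<5} Z = {z_one_tailed(a):.3f}")
```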

(ii) Difference of Proportions

This test is used to test whether the difference between two sample proportions is significant.
Let $X_1$ and $X_2$ be the numbers of successes in two independent samples of sizes $n_1$ and $n_2$
drawn from two populations; then $p_1 = \frac{X_1}{n_1}$ and $p_2 = \frac{X_2}{n_2}$.

To test the significance of the difference between the sample proportions $p_1$ and
$p_2$, we set up the null hypothesis H0 that there is no significant difference between the
two sample proportions.
Under the null hypothesis H0, the test statistic is

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \text{ and } Q = 1 - P$$

If the sample proportions are not given but the population proportions $P_1$ and $P_2$ are known,
we set the null hypothesis

H0 : $p_1 = p_2$
and under H0 the test statistic is

$$Z = \frac{P_1 - P_2}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}, \quad \text{where } Q_1 = 1 - P_1 \text{ and } Q_2 = 1 - P_2$$
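The pooled statistic for the difference of proportions can be sketched in Python as follows. The function name is hypothetical and the counts are those of Example 8 below (2% of 500 = 10 and 1.5% of 800 = 12 defectives); the result differs slightly from the text's 0.678 only because the text rounds P to 0.017.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled Z statistic for H0: no difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)   # same as (n1*p1 + n2*p2) / (n1 + n2)
    Q = 1 - P
    se = sqrt(P * Q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Illustration: 10 defectives among 500 articles versus 12 among 800
print(round(two_proportion_z(10, 500, 12, 800), 3))
```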

SOLVED EXAMPLES

Example 1. A coin is tossed 324 times and the head turned up 175 times. Test
the hypothesis that the coin is unbiased.
Solution. Null hypothesis H0 : the coin is unbiased, i.e., $P = \frac{1}{2}$.
Here, n = 324, X = number of heads = 175,
P = probability of getting a head in a toss = $\frac{1}{2}$, and $Q = 1 - P = 1 - \frac{1}{2} = \frac{1}{2}$.

$$Z = \frac{X - E(X)}{\text{S.E. of } X} = \frac{X - nP}{\sqrt{nPQ}} = \frac{175 - 324 \times \frac{1}{2}}{\sqrt{324 \times \frac{1}{2} \times \frac{1}{2}}} = \frac{13}{9} = 1.44 < 1.96$$

Since | Z | < 1.96, the null hypothesis is accepted at the 5% level of significance. Hence
the coin may be regarded as unbiased.
Example 2. A die is thrown 1000 times and a throw of 5 or 6 was obtained 420
times. On the assumption of random throwing do the data indicate an unbiased die ?
Solution. Null hypothesis H0 : the die is unbiased.
Under H0, P = probability of getting a 5 or a 6 = $\frac{1}{6} + \frac{1}{6} = \frac{1}{3}$, and $Q = 1 - P = 1 - \frac{1}{3} = \frac{2}{3}$.
Here, n = 1000, X = number of successes = 420.

$$Z = \frac{X - nP}{\sqrt{nPQ}} = \frac{420 - 1000 \times \frac{1}{3}}{\sqrt{1000 \times \frac{1}{3} \times \frac{2}{3}}} = \frac{420 - 333.33}{\sqrt{222.222}} = \frac{86.67}{14.91} = 5.813$$

Since | Z | = 5.813 > 3, Z lies outside the 3σ limits, so H0 is rejected, i.e., the die is
biased.
Example 3. A manufacturer claims that only 4% of the products supplied by him are
defective. A random sample of 600 products contained 36 defectives. Test the
claim of the manufacturer.
Solution. Here p = sample proportion of defectives = $\frac{36}{600} = 0.06$,
P = proportion of defectives in the population = $\frac{4}{100} = 0.04$,
$Q = 1 - P = 1 - 0.04 = 0.96$ and n = 600.
Null hypothesis H0 : P = 0.04, i.e., the claim of the manufacturer is right.

$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.06 - 0.04}{\sqrt{\frac{0.04 \times 0.96}{600}}} = \frac{0.02}{0.008} = 2.5$$

If we set the alternative hypothesis H1 : $P \ne 0.04$, we apply a two-tailed test.
Since | Z | = 2.5 > 1.96, H0 is rejected at the 5% level of significance,
i.e., the manufacturer's claim is not acceptable.
If we set the alternative hypothesis H1 : P > 0.04, we apply a right-tailed test.
Since Z = 2.5 > 1.645, H0 is rejected at the 5% level of significance, i.e., the manufacturer's
claim is not acceptable.
Example 4. 500 apples are taken at random from a large basket and 65 are
found to be bad. Find the S.E. of the proportion of bad ones in a sample of this size and
assign limits within which the percentage of bad apples most probably lies.
Solution. Here, n = 500, X = number of bad apples in the sample = 65,
p = proportion of bad apples in the sample = $\frac{65}{500} = 0.13$ and
$q = 1 - p = 1 - 0.13 = 0.87$.
∵ The proportion P of bad apples in the population is not known,
∴ we take P = p = 0.13, Q = q = 0.87 and n = 500.

$$\text{S.E. of proportion} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.13 \times 0.87}{500}} = 0.015$$

The limits for the proportion of bad apples in the population are

$$P \pm 3\sqrt{\frac{PQ}{n}} = 0.13 \pm 3\sqrt{\frac{0.13 \times 0.87}{500}} = 0.13 \pm 0.045 = 0.175 \text{ and } 0.085,$$

i.e., 17.5% and 8.5%.
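The 3σ probable limits of Note 3 can be sketched in Python as below; the function name and its `z` parameter (3 by default, following Note 3) are our own choices, shown here with the apple counts of this example.

```python
from math import sqrt


def proportion_limits(x: int, n: int, z: float = 3.0) -> tuple[float, float]:
    """Probable limits p +/- z * sqrt(pq/n) when the population proportion is unknown."""
    p = x / n
    q = 1 - p
    half_width = z * sqrt(p * q / n)
    return p - half_width, p + half_width


low, high = proportion_limits(65, 500)
print(f"{low:.1%} to {high:.1%}")
```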
Example 5. A manufacturer claimed that at least 95% of the equipment which
he supplied to a factory conformed to specifications. An examination of a sample of 300
pieces of equipment revealed that 27 were faulty. Test his claim at a significance level of (i) 5%
(ii) 1%.
Solution. Here,
X = number of pieces conforming to specifications in the sample
= 300 – 27 = 273
p = sample proportion conforming to specifications = $\frac{273}{300} = 0.91$
Null hypothesis H0 : P = 0.95 (the proportion of equipment conforming to
specifications in the population is 95%)
$Q = 1 - P = 0.05$
Alternative hypothesis H1 : P < 0.95 (fewer than 95% conform to specifications; left-tailed test)

$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.91 - 0.95}{\sqrt{\frac{0.95 \times 0.05}{300}}} = \frac{-0.04}{0.0126} = -3.175$$

| Z | = | – 3.175 | = 3.175
(i) Since H1 is one-tailed, the significant value of Z at the 5% level of
significance is 1.645 in magnitude.
Now | Z | = 3.175 > 1.645, so H0 is rejected, i.e., the manufacturer's claim is not
acceptable.
(ii) The significant value of Z at the 1% level of significance for one tail is 2.33.
Now | Z | = 3.175 > 2.33, so H0 is rejected, i.e., the manufacturer's claim is not
acceptable.
Example 6. Before an increase in excise duty on tea, 400 people out of a sample
of 500 persons were found to be tea drinkers. After an increase in the excise duty,
400 persons were known to be tea drinkers in a sample of 600 people. Do you think that
there has been a significant decrease in the consumption of tea after the increase in the
excise duty?
Solution. Here $n_1 = 500$, $n_2 = 600$, $X_1 = 400$, $X_2 = 400$.
$p_1$ = proportion of drinkers in first sample = $\frac{400}{500} = \frac{4}{5} = 0.8$
$p_2$ = proportion of drinkers in second sample = $\frac{400}{600} = \frac{2}{3} = 0.67$
Since the proportion P of the population is not given, it is estimated by

$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{400 + 400}{500 + 600} = \frac{800}{1100} = \frac{8}{11} \quad \text{and} \quad Q = 1 - P = 1 - \frac{8}{11} = \frac{3}{11}$$

Null hypothesis H0 : $P_1 = P_2$ (there is no significant difference in the consumption
of tea before and after the increase in excise duty)
Alternative hypothesis H1 : $P_1 > P_2$ (right-tailed test). Under H0 the test statistic is

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.8 - 0.67}{\sqrt{\frac{8}{11} \times \frac{3}{11}\left(\frac{1}{500} + \frac{1}{600}\right)}} = \frac{0.13}{0.027} = 4.815$$

Since Z = 4.815 exceeds both 1.645 and 2.33, the significant values of Z at the 5% and 1%
levels of significance respectively, H0 is rejected, i.e., there is a
significant decrease in the consumption of tea after the increase in excise duty.
Example 7. During a countrywide investigation the incidence of a chronic disease
was found to be 1%. In a village with a population of 400, 5 people were reported to be affected, whereas
in another village with a population of 1200, 10 people were reported to be affected. Does this indicate
any significant difference?
Solution. Here, P = 0.01 (the countrywide incidence) and $Q = 1 - P = 1 - 0.01 = 0.99$
$n_1 = 400$, $p_1 = \frac{5}{400} = 0.0125$
$n_2 = 1200$, $p_2 = \frac{10}{1200} = 0.0083$
Null hypothesis H0 : $P_1 = P_2$ (there is no significant difference)
Alternative hypothesis H1 : $P_1 \ne P_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.0125 - 0.0083}{\sqrt{0.01 \times 0.99\left(\frac{1}{400} + \frac{1}{1200}\right)}} = \frac{0.0042}{0.00574} = 0.732$$

Since | Z | = 0.732 < 1.96, the null hypothesis is accepted at the 5% level of significance.
Hence the difference is not significant.
Example 8. 500 articles from a factory are examined and found to be 2% defective.
800 similar articles from a second factory are found to have only 1.5% defectives. Can it
reasonably be concluded that the products of the first factory are inferior to those of the second?
Solution. Here, $n_1 = 500$,
$p_1$ = proportion of defectives from first factory = $\frac{2}{100} = 0.02$
$n_2 = 800$,
$p_2$ = proportion of defectives from second factory = $\frac{1.5}{100} = 0.015$
Since the proportion P of the population is not given, it is estimated by

$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{10 + 12}{500 + 800} = \frac{22}{1300} = 0.017$$

and $Q = 1 - P = 1 - 0.017 = 0.983$.
Null hypothesis H0 : $P_1 = P_2$ (there is no significant difference between the
products of the first and second factories)
Alternative hypothesis H1 : $P_1 \ne P_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.02 - 0.015}{\sqrt{0.017 \times 0.983\left(\frac{1}{500} + \frac{1}{800}\right)}} = \frac{0.005}{0.00737} = 0.678$$

Since | Z | = 0.678 < 1.96, the null hypothesis is accepted at the 5% level of significance.
Hence there is no significant difference between the products of the first and second factories,
i.e., it cannot be concluded that the products of the first factory are inferior to those of the second.
Example 9. A manufacturing firm claims that its brand A product outsells its
brand B product by 8%. It is found that 84 out of a sample of 400 persons prefer
brand A and 36 out of another sample of 200 persons prefer brand B. Test whether the
8% difference is a valid claim.
Solution. Here, $n_1 = 400$, $n_2 = 200$
$p_1$ = proportion of preference for brand A = $\frac{84}{400} = 0.21$
$p_2$ = proportion of preference for brand B = $\frac{36}{200} = 0.18$

$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{84 + 36}{400 + 200} = \frac{120}{600} = 0.2$$

and $Q = 1 - P = 1 - 0.2 = 0.8$
Null hypothesis H0 : there is an 8% difference in the sales of brand A and
brand B, i.e., $P_1 - P_2 = 0.08$
Alternative hypothesis H1 : $P_1 - P_2 \ne 0.08$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.21 - 0.18) - 0.08}{\sqrt{0.2 \times 0.8\left(\frac{1}{400} + \frac{1}{200}\right)}} = -\frac{0.05}{0.0346} = -1.44$$

Since | Z | = 1.44 < 1.96, the null hypothesis is accepted at the 5% level of significance.
Hence the data are consistent with the claimed 8% difference in the sales of brand A and brand B.
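A sketch of this calculation in Python follows. Like the text, it uses the pooled P in the standard error even though H0 specifies a non-zero difference; some treatments use the unpooled standard error $\sqrt{p_1q_1/n_1 + p_2q_2/n_2}$ in this situation instead. The function name and the parameter `d0` (the hypothesised difference $P_1 - P_2$) are our own.

```python
from math import sqrt


def two_proportion_z_diff(x1: int, n1: int, x2: int, n2: int, d0: float) -> float:
    """Z for H0: P1 - P2 = d0, with the pooled P in the standard error (as in the text)."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)   # pooled estimate of the population proportion
    Q = 1 - P
    se = sqrt(P * Q * (1 / n1 + 1 / n2))
    return ((p1 - p2) - d0) / se


z9 = two_proportion_z_diff(84, 400, 36, 200, 0.08)
print(round(z9, 2))  # compare with |Z| < 1.96 for a two-tailed test at 5%
```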
Example 10. In two large populations there are 30% and 25% respectively of
fair-haired people. Is this difference likely to be hidden in samples of 1400 and 1000
respectively from the two populations?
Solution. Here, $n_1 = 1400$, $n_2 = 1000$
$P_1$ = proportion of fair-haired people in the first population = $\frac{30}{100} = 0.3$
$P_2$ = proportion of fair-haired people in the second population = $\frac{25}{100} = 0.25$
$Q_1 = 1 - P_1 = 1 - 0.3 = 0.7$, $Q_2 = 1 - P_2 = 1 - 0.25 = 0.75$
Null hypothesis H0 : $p_1 = p_2$ (sample proportions are equal), i.e., the difference in
population proportions is likely to be hidden in sampling.
Alternative hypothesis H1 : $p_1 \ne p_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{P_1 - P_2}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}} = \frac{0.30 - 0.25}{\sqrt{\frac{0.3 \times 0.7}{1400} + \frac{0.25 \times 0.75}{1000}}} = \frac{0.05}{0.01837} = 2.72$$

Since | Z | = 2.72 > 1.96, the null hypothesis is rejected at the 5% level of significance.
Hence, at the 5% level of significance, these samples will reveal the difference in the
population proportions.
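When the population proportions are known, the unpooled statistic used in Example 10 takes only a few lines of Python; this is a sketch with a hypothetical function name.

```python
from math import sqrt


def known_proportions_z(P1: float, n1: int, P2: float, n2: int) -> float:
    """Z = (P1 - P2) / sqrt(P1*Q1/n1 + P2*Q2/n2) for known population proportions."""
    se = sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)
    return (P1 - P2) / se


z10 = known_proportions_z(0.30, 1400, 0.25, 1000)
print(round(z10, 2))  # |Z| > 1.96 means the samples should reveal the difference
```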

EXERCISE 5.3

1. A coin was tossed 400 times and the head turned up 216 times. Test the hypothesis that
the coin is unbiased.
2. In a hospital 525 female and 475 male babies were born in a month. Do these figures
confirm the hypothesis that females and males are born in equal number ?
3. A die is thrown 10000 times and a throw of 3 or 4 was obtained 4200 times. On the
assumption of random throwing do the data indicate an unbiased die ?
4. Given that on the average 4% of insured men of age 65 die within a year and that 60 of
a particular group of 1000 such men (age 65) died within a year. Can this group be
regarded as a representative sample ?
5. 325 men out of 600 men chosen from a big city were found to be smokers. Does this
information support the conclusion that the majority of men in the city are smokers ?
6. A random sample of 400 apples is taken from a large basket and 40 are found to be bad.
Estimate the proportion of bad apples in the basket and assign limits within which the
percentage most probably lies.

7. A manufacturer claimed that at least 95% of the equipment which he supplied to a
factory conformed to specifications. An examination of a sample of 200 pieces of
equipment revealed that 18 were faulty. Test the manufacturer's claim at a level of
significance of (i) 5% (ii) 1%.
8. 1000 articles from a factory are examined and found to be 2.5% defective. 1500 similar
articles from a second factory are found to have only 2% defectives. Can it reasonably
be concluded that the products of the first factory are inferior to those of the second?
9. A manufacturing firm claims that its brand A product outsells its brand B product by
8%. It is found that 42 out of a sample of 200 persons prefer brand A and 18 out of
another sample of 100 persons prefer brand B. Test whether the 8% difference is a valid
claim.
10. In a survey on a particular matter in a college, 850 males and 560 females voted. 500 males
and 320 females voted yes. Does this indicate a significant difference of opinion between
males and females on this matter at the 1% level of significance?
11. Two samples of sizes 1200 and 900 respectively are drawn from two large populations. In
the two populations there are 30% and 25% respectively of fair-haired people. Test
whether these two samples will reveal the difference in the population proportions.
12. Before an increase in excise duty on tea 800 persons out of a sample of 1000 persons
were found to be tea drinkers. After an increase in excise duty 800 people were tea
drinkers in a sample of 1200 people. Test whether there is a significant decrease in the
consumption of tea after the increase in excise duty.

Answers
1. H0 is accepted at 5% level of significance
2. Yes, H0 is accepted at 5% level of significance
3. H0 is rejected 4. H0 is rejected
5. H0 is rejected at 5% level of significance 6. 8.5 : 11.5
7. Using left tailed test, H0 is rejected at both 5% and 1% level of significance
8. No, H0 is accepted 9. H0 is accepted
10. H0 is accepted 11. H0 is rejected at 5% level of significance
12. H0 is rejected.

5.17. TEST OF SIGNIFICANCE FOR SINGLE MEAN

This test is used to test whether the difference between a sample mean and the
population mean is significant.
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal population with
mean μ and variance $\sigma^2$.
The standard error (S.E.) of the mean of a random sample of size n from a population
is given by

$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}},$$

where σ is the standard deviation of the population.
We set up the null hypothesis H0 that the sample has been drawn from a large
population with mean μ and variance $\sigma^2$, i.e., there is no significant difference between
the sample mean ($\bar{x}$) and the population mean (μ).
Under the null hypothesis H0 the test statistic is

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

If the standard deviation of the population (σ) is not known, we use the test statistic

$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

where s is the standard deviation of the sample.
Note. The limits of the population mean μ are given by $\bar{x} \pm Z_\alpha \frac{\sigma}{\sqrt{n}}$, i.e.,

$$\bar{x} - Z_\alpha \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_\alpha \frac{\sigma}{\sqrt{n}}$$

These limits are called the confidence limits for μ.
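A minimal Python sketch of the single-mean statistic, using the figures of Examples 1 and 2 below. The function name is hypothetical; the same function serves both formulas, since only the standard deviation passed in changes.

```python
from math import sqrt


def mean_z(xbar: float, mu: float, sd: float, n: int) -> float:
    """Z = (xbar - mu) / (sd / sqrt(n)).

    Pass the population sigma when it is known; otherwise pass the sample s.
    """
    return (xbar - mu) / (sd / sqrt(n))


z1 = mean_z(6.75, 6.8, 1.5, 400)   # population mean 6.8, sigma 1.5
z2 = mean_z(99, 100, 8, 400)       # population mean 100, sigma 8
for z in (z1, z2):
    verdict = "reject H0" if abs(z) > 1.96 else "accept H0"
    print(f"Z = {z:.2f}: {verdict} at the 5% level (two-tailed)")
```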

SOLVED EXAMPLES

Example 1. A normal population has a mean of 6.8 and standard deviation of
1.5. A sample of 400 members gave a mean of 6.75. Is the difference significant?
Solution. Here, μ = 6.8, $\bar{x} = 6.75$, σ = 1.5, n = 400
Null hypothesis H0 : $\bar{x} = \mu$ (there is no significant difference between $\bar{x}$ and μ)
Alternative hypothesis H1 : there is a significant difference between $\bar{x}$ and μ

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{6.75 - 6.8}{1.5/\sqrt{400}} = -\frac{0.05}{0.075} = -0.67$$

Since | Z | = 0.67 < 1.96, H0 is accepted at the 5% level of significance. Hence there
is no significant difference between $\bar{x}$ and μ.
Example 2. A random sample of 400 members has a mean 99. Can it be
reasonably regarded as a sample from a large population of mean 100 and standard
deviation 8 at the 5% level of significance?
Solution. Here, μ = 100, $\bar{x} = 99$, σ = 8, n = 400
Null hypothesis H0 : the sample is drawn from a large population with mean
100 and standard deviation 8.
Alternative hypothesis H1 : $\mu \ne 100$ (two-tailed test)

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{99 - 100}{8/\sqrt{400}} = -\frac{1}{0.4} = -2.5$$

Since | Z | = 2.5 > 1.96, H0 is rejected at the 5% level of significance. Hence there is
a significant difference between $\bar{x}$ and μ, i.e., the sample cannot be regarded as drawn from
this population.
Example 3. The management of a company claims that the average weekly income
of its employees is ₹900. The trade union disputes this claim, stressing that it is
lower. An independent sample of 150 randomly selected employees estimated the
average to be ₹856 with a standard deviation of ₹354. Would you accept the view of the
management?
Solution. Here, μ = 900, $\bar{x} = 856$, s = 354, n = 150
Null hypothesis H0 : there is no significant difference between $\bar{x}$ and μ, i.e., the
view of the management is correct.
Alternative hypothesis H1 : $\mu \ne 900$ (two-tailed test)

$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{856 - 900}{354/\sqrt{150}} = -\frac{44}{28.904} = -1.52$$

Since | Z | = 1.52 < 1.96, H0 is accepted at the 5% level of significance. Hence the
view of the management cannot be rejected.

Example 4. In a population with a standard deviation of 14.8, what sample size
is needed to estimate the mean of the population within ±1.2 with 95% confidence?
Solution. Here, $\bar{x} - \mu = \pm 1.2$, σ = 14.8, Z = 1.96
We know that

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Using this, we have

$$1.96 = \frac{1.2}{14.8/\sqrt{n}} = \frac{1.2\sqrt{n}}{14.8}$$

On squaring both sides we have

$$n = \left(\frac{1.96 \times 14.8}{1.2}\right)^2 = 584.35$$

Since n must be a whole number and the error must not exceed 1.2, n is rounded up: n = 585.

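The sample-size calculation amounts to solving $Z\sigma/\sqrt{n} \le E$ for n and rounding up. A minimal Python sketch; the function name and the parameter `margin` (the ±1.2 above) are our own labels.

```python
from math import ceil


def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n >= (z * sigma / margin)^2."""
    return ceil((z * sigma / margin) ** 2)


print(sample_size_for_mean(14.8, 1.2))
```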
Example 5. A random sample of 900 measurements from a large population
gave a mean value of 64. If this sample has been drawn from a normal population with
standard deviation of 20, find the 95% and 99% confidence limits for the mean in the
population.
Solution. Here, n = 900, $\bar{x} = 64$, σ = 20
At 95% confidence Z = 1.96
At 99% confidence Z = 2.58
The confidence limits for the population mean μ are given by

$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$$

The confidence limits for 95% confidence are

$$64 \pm 1.96 \times \frac{20}{\sqrt{900}} = 64 \pm 1.307 = 62.693 \text{ and } 65.307$$

The confidence limits for 99% confidence are

$$64 \pm 2.58 \times \frac{20}{\sqrt{900}} = 64 \pm 1.72 = 62.28 \text{ and } 65.72$$
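The confidence limits can also be computed in Python with Z taken from the standard library's `statistics.NormalDist` instead of a table. This is a sketch with a hypothetical function name; because it uses Z = 2.576 rather than the rounded 2.58, its 99% limits (about 62.283 and 65.717) are slightly narrower than those in the text.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_limits(xbar: float, sigma: float, n: int, conf: float) -> tuple[float, float]:
    """Limits xbar +/- Z * sigma / sqrt(n), with Z the two-tailed standard normal value."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


for conf in (0.95, 0.99):
    low, high = mean_confidence_limits(64, 20, 900, conf)
    print(f"{conf:.0%}: {low:.3f} to {high:.3f}")
```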

EXERCISE 5.4

1. A random sample of 900 members has a mean 3.4 cms. Can it be reasonably regarded as
a sample from a large population of mean 3.2 cms and standard deviation 2.3 cms ?
2. A random sample of 400 male students is found to have a mean height of 160 cms. Can
it be reasonably regarded as a sample from a large population with mean height 162.5 cms
and standard deviation 4.5 cms ?
3. A random sample of 200 measurements from a large population gave a mean value of 50
and a standard deviation of 9. Determine 95% confidence interval for the mean of
population.
4. A random sample of 400 measurements from a large population gave a mean value of 82
and a standard deviation of 18. Determine 95% confidence interval for the mean of
population.
5. A company manufacturing electric bulbs claims that the average life of its bulbs is
1600 hours. The average life and standard deviation of random sample of 100 such bulbs
were 1570 hours and 120 hours respectively. Should we accept the claim of the company ?

6. An insurance agent has claimed that the average age of policy holders who insure through
him is less than the average for all agents, which is 30.5 years. A random sample of
100 policy holders who had insured through him revealed that the mean and standard
deviation are 28.8 years and 6.35 years respectively. Test his claim at the 5% level of
significance.
7. The guaranteed average life of a certain type of bulb is 1000 hours with a standard
deviation of 125 hours. It is decided to sample the output so as to ensure that 90% of the
bulbs do not fall short of the guaranteed average by more than 2.5%. What must be the
minimum size of the sample?

Answers
1. Yes, H0 is accepted 2. No, H0 is rejected
3. 48.8 and 51.2 4. 80.24 and 83.76
5. No, rejected at 5% level of significance 6. Claim is valid
7. n=4

5.18. TEST OF SIGNIFICANCE FOR DIFFERENCE OF MEANS

(i) Means. This test is used to test whether the difference between the means of two
large samples is significant.
Let $\bar{x}_1$ be the mean of a sample of size $n_1$ from a population with mean $\mu_1$ and
variance $\sigma_1^2$, and let $\bar{x}_2$ be the mean of an independent sample of size $n_2$ from another
population with mean $\mu_2$ and variance $\sigma_2^2$.
We set up the null hypothesis H0 that there is no significant difference between the
sample means, i.e., $\mu_1 = \mu_2$.
Under the null hypothesis H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

If the samples are drawn from the same population with common standard
deviation σ, then under the null hypothesis the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad (\because \sigma_1 = \sigma_2 = \sigma)$$

Note. 1. If $\sigma_1 \ne \sigma_2$ and $\sigma_1$ and $\sigma_2$ are not known, the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

2. If the common standard deviation σ is not known and $\sigma_1 = \sigma_2$, then σ can be estimated by

$$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$$

and the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
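The first two formulas share one shape, so a single Python function can serve both; this is a sketch with a hypothetical name, checked against Examples 1 and 8 below. Passing the same σ twice gives $\sqrt{\sigma^2/n_1 + \sigma^2/n_2} = \sigma\sqrt{1/n_1 + 1/n_2}$, the common-σ denominator.

```python
from math import sqrt


def two_mean_z(xbar1: float, sd1: float, n1: int,
               xbar2: float, sd2: float, n2: int) -> float:
    """Z = (xbar1 - xbar2) / sqrt(sd1^2/n1 + sd2^2/n2).

    Use the population sigmas when known, otherwise the sample s values.
    For a common sigma, pass the same value twice.
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (xbar1 - xbar2) / se


z_common = two_mean_z(140, 10, 50, 150, 10, 60)      # common sigma = 10
z_separate = two_mean_z(582, 24, 100, 546, 28, 100)  # sample s1 and s2
print(round(z_common, 2), round(z_separate, 2))
```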

(ii) Standard Deviations. This test is used to test whether the difference between the
standard deviations of two populations is significant.
Let two independent random samples of sizes $n_1$ and $n_2$, having standard deviations
$s_1$ and $s_2$, be drawn from two normal populations with standard deviations $\sigma_1$ and $\sigma_2$
respectively.
We set up the null hypothesis H0 that the sample standard deviations do not differ
significantly, i.e., $\sigma_1 = \sigma_2$.
Under the null hypothesis H0 the test statistic is

$$Z = \frac{s_1 - s_2}{\sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}}$$

If $\sigma_1$ and $\sigma_2$ are unknown, the test statistic is

$$Z = \frac{s_1 - s_2}{\sqrt{\frac{s_1^2}{2n_1} + \frac{s_2^2}{2n_2}}}$$
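Both forms of the standard-deviation test are sketched below in Python (the function name is hypothetical). When the population σ values are omitted, the sample values are substituted, as in the second formula; the inputs are those of Examples 6 and 7 below.

```python
from math import sqrt
from typing import Optional


def sd_difference_z(s1: float, n1: int, s2: float, n2: int,
                    sigma1: Optional[float] = None,
                    sigma2: Optional[float] = None) -> float:
    """Z = (s1 - s2) / sqrt(sigma1^2/(2 n1) + sigma2^2/(2 n2)).

    If a population sigma is not supplied, the corresponding sample s is used.
    """
    v1 = (sigma1 if sigma1 is not None else s1) ** 2
    v2 = (sigma2 if sigma2 is not None else s2) ** 2
    return (s1 - s2) / sqrt(v1 / (2 * n1) + v2 / (2 * n2))


z_known = sd_difference_z(3.5, 100, 3.0, 50, sigma1=4, sigma2=4)  # common sigma = 4
z_unknown = sd_difference_z(2.58, 1000, 2.50, 1200)               # sigmas unknown
print(round(z_known, 2), round(z_unknown, 2))
```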

SOLVED EXAMPLES

Example 1. Examine whether there is any significant difference between the
two samples for the following data:

Sample Size Mean

1 50 140
2 60 150

Standard deviation of the population = 10.

Solution. Here, $n_1 = 50$, $n_2 = 60$, $\bar{x}_1 = 140$, $\bar{x}_2 = 150$, σ = 10
Null hypothesis H0 : $\mu_1 = \mu_2$, i.e., the samples are drawn from the same normal
population.
Alternative hypothesis H1 : $\mu_1 \ne \mu_2$
Under H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{140 - 150}{10\sqrt{\frac{1}{50} + \frac{1}{60}}} = -\frac{10}{1.915} = -5.22$$

Since | Z | = 5.22 > 3, H0 is rejected. Hence the samples are not drawn from the
same normal population.

Example 2. Intelligence tests on two groups of boys and girls gave the following
results:

Mean S.D. Size

Girls 70 10 70
Boys 75 11 100

Examine if the difference between mean scores is significant.

Solution. Here, $n_1 = 70$, $n_2 = 100$, $\bar{x}_1 = 70$, $\bar{x}_2 = 75$, $s_1 = 10$, $s_2 = 11$
Null hypothesis H0 : there is no significant difference between mean scores, i.e.,
$\mu_1 = \mu_2$
Alternative hypothesis H1 : $\mu_1 \ne \mu_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{70 - 75}{\sqrt{\frac{10^2}{70} + \frac{11^2}{100}}} = -\frac{5}{\sqrt{2.639}} = -\frac{5}{1.624} = -3.08$$

Since | Z | = 3.08 > 1.96, H0 is rejected at the 5% level of significance. Hence
there is a significant difference between the mean scores.
Example 3. Two samples were taken from two normal populations. The following
information was available on these samples regarding the expenditure in rupees per
month per family.
Sample 1: $n_1 = 42$, $\bar{x}_1 = 744.85$, $\sigma_1^2 = 158165.43$
Sample 2: $n_2 = 32$, $\bar{x}_2 = 516.78$, $\sigma_2^2 = 26413.61$
Test whether the average expenditure per month per family is equal.
Solution. Null hypothesis H0 : $\mu_1 = \mu_2$, i.e., the average expenditure per month
per family is equal.
Alternative hypothesis H1 : $\mu_1 \ne \mu_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{744.85 - 516.78}{\sqrt{\frac{158165.43}{42} + \frac{26413.61}{32}}} = \frac{228.07}{67.76} = 3.37$$

Since | Z | = 3.37 > 1.96, H0 is rejected at the 5% level of significance. Hence the
average expenditure per month per family is not equal.
Example 4. The means of two large samples of 1000 and 2000 members are
168.75 cms and 170 cms respectively. Can the samples be regarded as drawn from the
same population of standard deviation 6.25 cms?
Solution. Here, $n_1 = 1000$, $n_2 = 2000$, $\bar{x}_1 = 168.75$, $\bar{x}_2 = 170$, σ = 6.25
Null hypothesis H0 : $\mu_1 = \mu_2$, i.e., the samples are drawn from the same population.
Alternative hypothesis H1 : $\mu_1 \ne \mu_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{168.75 - 170}{6.25\sqrt{\frac{1}{1000} + \frac{1}{2000}}} = -\frac{1.25}{0.242} = -5.165$$

Since | Z | = 5.165 > 1.96, H0 is rejected at the 5% level of significance. Hence the
samples are not drawn from the same population.
Example 5. Two random samples of 1000 and 2000 farms gave average
yields of 2000 kg and 2050 kg respectively. The standard deviation of the yield of wheat farms in the country
may be taken as 10 kg. Examine whether the two samples differ significantly in yield.
Solution. Here, $n_1 = 1000$, $n_2 = 2000$, $\bar{x}_1 = 2000$, $\bar{x}_2 = 2050$, $\sigma^2 = 100$, i.e., σ = 10
Null hypothesis H0 : $\mu_1 = \mu_2$, i.e., the samples are drawn from the same population.
Alternative hypothesis H1 : $\mu_1 \ne \mu_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{2000 - 2050}{10\sqrt{\frac{1}{1000} + \frac{1}{2000}}} = -\frac{50}{0.387} = -129.20$$

Since | Z | = 129.20 > 3, far outside the 3σ limits, the difference is highly significant and H0 is rejected.
Hence the samples are not drawn from the same normal population.
Example 6. The standard deviation of the weights of all students in a college was
found to be 4 kgs. Two random samples are drawn: the standard deviation of the
weights of 100 undergraduate students is 3.5 kgs and that of 50 postgraduate students is 3 kgs.
Test the significance of the difference in standard deviations of the samples at the 5% level
of significance.
Solution. Here, $n_1 = 100$, $n_2 = 50$, $s_1 = 3.5$, $s_2 = 3$, σ = 4
Null hypothesis H0 : $\sigma_1 = \sigma_2$, i.e., the sample standard deviations do not differ
significantly
Alternative hypothesis H1 : $\sigma_1 \ne \sigma_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{s_1 - s_2}{\sigma\sqrt{\frac{1}{2n_1} + \frac{1}{2n_2}}} = \frac{3.5 - 3}{4\sqrt{\frac{1}{200} + \frac{1}{100}}} = \frac{0.5}{0.49} = 1.02$$

Since | Z | = 1.02 < 1.96, H0 is accepted. Hence the sample standard deviations do
not differ significantly.
Example 7. Random samples drawn from two large cities gave the following
information relating to the heights of adult males:

          Mean height (in inches)    Standard deviation    No. in samples
City 1    67.42                      2.58                  1000
City 2    67.25                      2.50                  1200

Test the significance of the difference in standard deviations of the samples at the 5%
level of significance.
Solution. Here, $n_1 = 1000$, $n_2 = 1200$, $\bar{x}_1 = 67.42$, $\bar{x}_2 = 67.25$, $s_1 = 2.58$, $s_2 = 2.50$;
the population standard deviations are not known.
Null hypothesis H0 : $\sigma_1 = \sigma_2$, i.e., the sample standard deviations do not differ
significantly
Alternative hypothesis H1 : $\sigma_1 \ne \sigma_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{s_1 - s_2}{\sqrt{\frac{s_1^2}{2n_1} + \frac{s_2^2}{2n_2}}} = \frac{2.58 - 2.50}{\sqrt{\frac{(2.58)^2}{2000} + \frac{(2.50)^2}{2400}}} = \frac{0.08}{0.077} = 1.039$$

Since | Z | = 1.039 < 1.96, H0 is accepted. Hence the sample standard deviations do
not differ significantly.
Example 8. In a survey of the incomes of two classes of workers, two random
samples gave the following data:

            Size of sample    Mean annual income (₹)    Standard deviation (₹)
Sample 1    100               582                       24
Sample 2    100               546                       28

Examine whether the differences between
(i) the means and
(ii) the standard deviations
are significant.
Solution. Here, $n_1 = 100$, $n_2 = 100$, $\bar{x}_1 = 582$, $\bar{x}_2 = 546$, $s_1 = 24$, $s_2 = 28$
(i) Null hypothesis H0 : $\mu_1 = \mu_2$, i.e., the sample means do not differ significantly.
Alternative hypothesis H1 : $\mu_1 \ne \mu_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{582 - 546}{\sqrt{\frac{(24)^2}{100} + \frac{(28)^2}{100}}} = \frac{36}{3.6878} = 9.762$$

Since | Z | = 9.762 > 1.96, the difference is highly significant and H0 is rejected at the 5% level of
significance. Hence the sample means differ significantly.
(ii) Null hypothesis H0 : $\sigma_1 = \sigma_2$, i.e., the sample standard deviations do not differ
significantly.
Alternative hypothesis H1 : $\sigma_1 \ne \sigma_2$ (two-tailed test)
Under H0 the test statistic is

$$Z = \frac{s_1 - s_2}{\sqrt{\frac{s_1^2}{2n_1} + \frac{s_2^2}{2n_2}}} = \frac{24 - 28}{\sqrt{\frac{(24)^2}{200} + \frac{(28)^2}{200}}} = -\frac{4}{2.6077} = -1.53$$

Since | Z | = 1.53 < 1.96, H0 is accepted at the 5% level of significance. Hence the
sample standard deviations do not differ significantly.

EXERCISE 5.5

1. The number of accidents per day was studied for 144 days in city A and for 100 days in
city B. The mean numbers of accidents and standard deviations were respectively 4.5
and 1.2 for city A and 5.4 and 1.5 for city B. Is city A more prone to accidents than city B?
2. The mean yields of a crop from two places in a district were 210 kgs and 220 kgs per acre
from 100 acres and 150 acres respectively. Can it be regarded that the samples were
drawn from the same district, which has a standard deviation of 11 kgs per acre?
3. Given the following data:

            No. of cases    Mean wages (₹)    Standard deviation of wages (₹)
Sample 1    400             47.4              3.1
Sample 2    900             50.3              3.3

Examine whether the two mean wages differ significantly.


4. A sample of heights of 6400 soldiers has a mean of 67.85 inches and a standard deviation
of 2.56 inches. While another sample of heights of 1600 sailors has a mean of 68.55
inches and a standard deviation of 2.52 inches. Do the data indicate that the sailors are
on the average taller than soldiers ?
5. Intelligence tests on two groups of boys and girls gave the following results :

Mean S.D Size

Girls 75 8 60
Boys 73 10 100

Examine if the difference between mean scores is significant.


6. The yield of a crop in a random sample of 1000 farms in a certain area has a standard
deviation of 192 kgs. Another random sample of 1000 farms gives a standard deviation
of 224 kgs. Are the standard deviations significantly different ?
7. The standard deviation of a random sample of 900 members is 4.6 and that of another
random sample of 1600 is 4.8. Examine if the standard deviations are significantly
different.
8. The mean yield of two sets of plots and their variability are as follows :

Set of 40 plots Set of 60 plots

Mean yield per plot 1258 kgs 1243 kgs


S.D. per plot 34 28

Examine whether
(i) the difference in the variability in yields is significant,
(ii) the difference in the mean yields is significant.

Answers
1. No 2. No 3. Yes, highly significant

4. Highly significant 5. Not significant at 5% 6. Yes


7. Not significant 8. (i) Not significant (ii) significant.

6. NON-PARAMETRIC TESTS

STRUCTURE

Chi-square Test
Chi-square Test to Test the Goodness of Fit
Chi-square Test to Test the Independence of Attributes
Conditions for χ² Test
Uses of χ² Test
Correlation Analysis
Scatter or Dot Diagram
Characteristics of the coefficient of Correlation r
Spearman’s Rank Correlation

6.1. CHI-SQUARE TEST

In tests of hypotheses about parameters, it is usually assumed that the random
variable follows a particular distribution. To check whether this assumption is right,
the chi-square test is used; it measures the discrepancy between the observed (actual)
frequencies and the theoretical (expected) frequencies obtained from the outcomes of a trial
or from observational data. Chi (χ) is a letter of the Greek alphabet, and the statistic is denoted by
χ². The chi-square distribution is a continuous distribution which takes only non-negative values.

6.2. CHI-SQUARE TEST TO TEST THE GOODNESS OF FIT

The value of χ² is used to test whether the deviations of the observed (actual)
frequencies from the theoretical (expected) frequencies are significant or not. The chi-square
test is also used to test whether a set of observations fits a given distribution or
not. Therefore, chi-square provides a test of goodness of fit.



If O1, O2, ......, On is a set of observed (actual) frequencies and E1, E2, ......, En is
the corresponding set of theoretical (expected) frequencies, then the statistic
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
is distributed with (n – 1) degrees of freedom.
Here, we test the null hypothesis
H0 : There is no significant difference between the observed (actual) values and
the corresponding expected (theoretical) values,
v.s. H1 : H0 is not true.
If $\chi^2_{\text{cal}} > \chi^2_{\text{tab}}$ (i.e., $\chi^2_{\alpha,\,n-1}$), then H0 is rejected; otherwise H0 is accepted.
Note. If the null hypothesis H0 is true, the test statistic χ² follows the chi-square distribution
with (n – 1) degrees of freedom, where
$$\sum_{i=1}^{n} O_i = \sum_{i=1}^{n} E_i, \quad \text{i.e.,} \quad \sum_{i=1}^{n} (O_i - E_i) = 0.$$
If k parameters of the fitted distribution are estimated from the data (for example, the mean of a
Poisson distribution), one further degree of freedom is lost for each, giving (n – k – 1) degrees of freedom.

6.3. CHI-SQUARE TEST TO TEST THE INDEPENDENCE OF ATTRIBUTES

The value of χ² is used to test whether two attributes are associated or not, i.e., to test the
independence of attributes. To test the independence of attributes, a contingency table is
used.
A contingency table is a two-way table in which rows are classified according to
one attribute or criterion and columns are classified according to the other attribute or
criterion. Each cell contains the number of items Oij possessing the qualities of the
ith row and the jth column, where i = 1, 2, ......, r and j = 1, 2, ......, s. In such a case the
contingency table is said to be of order (r × s). Each row or column total is known as a
marginal total. Also, the sum of the row totals is equal to the sum of the column totals, i.e.,
$$\sum_{i=1}^{r} R_i = \sum_{j=1}^{s} C_j = N,$$
where N is the total frequency.

Let us consider two attributes A and B, where A is divided into r classes A1,
A2, ......, Ar and B is divided into s classes B1, B2, ......, Bs. Let Ri represent the number of
persons possessing the attribute Ai, Cj the number of persons possessing
the attribute Bj, and Oij the number of persons possessing both attributes Ai and
Bj. The contingency table of order (r × s) is shown in the following table :

                      Columns
Rows      B1      B2      ......   Bs      Total
A1        O11     O12     ......   O1s     R1
A2        O21     O22     ......   O2s     R2
⋮         ⋮       ⋮                ⋮       ⋮
Ar        Or1     Or2     ......   Ors     Rr
Total     C1      C2      ......   Cs      N

Corresponding to each Oij, the expected frequency Eij in a contingency table is
calculated by
$$E_{ij} = \frac{R_i \times C_j}{N} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}.$$
Here, we test the null hypothesis
H0 : There is no association between the attributes under study, i.e., attributes
A and B are independent,
v.s. H1 : the attributes are associated, i.e., attributes A and B are not independent.
H0 can be tested by the statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$
which is distributed with (r – 1)(s – 1) degrees of freedom.

If $\chi^2_{\text{cal}} > \chi^2_{\text{tab}}$ (i.e., $\chi^2_{\alpha,\,(r-1)(s-1)}$), then H0 is rejected; otherwise H0 is accepted.


Note 1. For a contingency table with r rows and s columns, the degrees of freedom
= (r – 1)(s – 1).
2. For a 2 × 2 contingency table
a    b
c    d
we use the following formula to calculate the value of the statistic χ² :
$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}, \qquad N = a + b + c + d.$$
Here χ² has (2 – 1)(2 – 1) = 1 degree of freedom.
3. Yates' correction. In a 2 × 2 contingency table, if any cell frequency is less than 5,
we apply a correction for continuity, because the χ² distribution is continuous while cell frequencies
are discrete. Decrease by 1/2 those cell frequencies which are greater than the expected frequencies,
and increase by 1/2 those cell frequencies which are less than the expected frequencies. This does not
affect the marginal totals. This correction is known as Yates' correction.
After applying Yates' correction, the corrected value of χ² is given by
$$\chi^2 = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}.$$
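The 2 × 2 shortcut and its Yates-corrected form can be sketched as one hypothetical helper, `chi_square_2x2`. Clamping the corrected difference at zero is an assumption of this sketch, not something stated in the text.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Shortcut chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, |ad - bc| is replaced by |ad - bc| - N/2 in the numerator.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Assumption of this sketch: clamp at 0 so the correction never overshoots
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Literacy and smoking data of Example 8: a = 83, b = 57, c = 45, d = 68
print(round(chi_square_2x2(83, 57, 45, 68), 3))  # 9.476
print(chi_square_2x2(83, 57, 45, 68, yates=True))  # smaller than the uncorrected value
```

The uncorrected value agrees with the cell-by-cell result of 9.475 in Example 8 up to rounding; the corrected value is always the smaller of the two.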

6.4. CONDITIONS FOR χ² TEST

1. The number of observations collected must be large, i.e., n ≥ 30.
2. No theoretical (expected) frequency should be very small; as a rule of thumb, each should be at least 5.
3. The sample observations should be independent.
4. N, the total of the frequencies, should be reasonably large, say, greater than 50.

6.5. USES OF χ² TEST

1. To test the goodness of fit.


2. To test the discrepancies between observed and expected frequencies.
3. To determine the association between attributes.

SOLVED EXAMPLES

Example 1. The following table gives the number of accidents that took place in
an industry during various days of the week. Test whether the accidents are uniformly
distributed over the week.

Days Mon. Tue. Wed. Thu. Fri. Sat.

No. of accidents 16 20 14 13 17 16

Solution. Here, n = 6, total number of accidents = 96


Null hypothesis H0 : the accidents are uniformly distributed over the week.
Under H0, the expected number of accidents on each of these days is
$$E_i = \frac{\text{Total no. of accidents}}{\text{No. of days}} = \frac{96}{6} = 16.$$
The observed and expected numbers of accidents are given below :

Oi            16    20    14    13    17    16
Ei            16    16    16    16    16    16
(Oi – Ei)²     0    16     4     9     1     0

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{0 + 16 + 4 + 9 + 1 + 0}{16} = \frac{30}{16} = 1.875.$$
The tabulated value of χ² for 5 (6 – 1 = 5) degrees of freedom at the 5% level of significance
is 11.07.
Since the calculated value of χ² is less than the tabulated value, H0 is accepted,
i.e., the accidents are uniformly distributed over the week.

Example 2. A die is thrown 120 times and the results of these throws are given
as :

No. appeared on the die    1     2     3     4     5     6
Frequency                 16    30    22    18    14    20

Test whether the die is biased or not.


Solution. Here, n = 6, total frequency = 120
Null hypothesis H0 : the die is unbiased.
Under H0, the expected frequency for each face is
$$E_i = \frac{120}{6} = 20.$$
The observed and expected frequencies are given below :

Oi            16    30    22    18    14    20
Ei            20    20    20    20    20    20
(Oi – Ei)²    16   100     4     4    36     0

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{16 + 100 + 4 + 4 + 36 + 0}{20} = \frac{160}{20} = 8.$$
The tabulated value of χ² for 5 (6 – 1 = 5) degrees of freedom at the 5% level of significance
is 11.07. Since the calculated value of χ² is less than the tabulated value, H0 is accepted,
i.e., the die is unbiased.
Example 3. The following table shows the distribution of digits in numbers
chosen at random from a telephone directory :

Digits 0 1 2 3 4 5 6 7 8 9

Frequency 1026 1107 997 966 1075 933 1107 972 964 853

Test at 5% level whether the digits may be taken to occur equally frequently in
the directory.
Solution. Here, n = 10, total frequency = 10,000
Null hypothesis H0 : all the digits occur equally frequently in the directory.
Under H0, the expected frequency of each of the digits is
$$E_i = \frac{10{,}000}{10} = 1000.$$
The observed and expected frequencies are given below :

Oi           1026    1107    997    966    1075    933    1107    972    964    853
Ei           1000    1000   1000   1000    1000   1000    1000   1000   1000   1000
(Oi – Ei)²    676   11449      9   1156    5625   4489   11449    784   1296  21609

$$\chi^2 = \sum_{i=1}^{10} \frac{(O_i - E_i)^2}{E_i} = \frac{676 + 11449 + \cdots + 21609}{1000} = \frac{58542}{1000} = 58.542.$$
The tabulated value of χ² for 9 (10 – 1 = 9) degrees of freedom at the 5% level of
significance is 16.92.
Since the calculated value of χ² is greater than the tabulated value, H0 is rejected,
i.e., the digits in the numbers in the telephone directory do not all occur equally
frequently.
Example 4. A survey of 320 families with 5 children each revealed the following
information :

No. of male births      5     4     3     2     1     0
No. of female births    0     1     2     3     4     5
No. of families        14    56   110    88    40    12

Test whether the data are consistent with the hypothesis that the binomial law holds
and that male and female births are equally probable.
Solution. Null hypothesis
H0 : male and female births are equally probable, i.e., p = q = 1/2, where p is
the probability of a female birth and q is the probability of a male birth.
The expected frequencies are calculated by using the binomial distribution as :
E(r) = N × P(X = r), r = 0, 1, 2, 3, 4, 5, where N = 320 is the total frequency
and E(r) is the number of families with r female children, and
$$P(X = r) = {}^{n}C_r\, p^r q^{n-r}, \qquad n = 5 \text{ (number of children)}.$$
$$E(0) = 320 \times {}^{5}C_0 \left(\tfrac{1}{2}\right)^{0}\left(\tfrac{1}{2}\right)^{5} = 320 \times \tfrac{1}{32} = 10$$
$$E(1) = 320 \times {}^{5}C_1 \left(\tfrac{1}{2}\right)^{1}\left(\tfrac{1}{2}\right)^{4} = 320 \times \tfrac{5}{32} = 50$$
$$E(2) = 320 \times {}^{5}C_2 \left(\tfrac{1}{2}\right)^{2}\left(\tfrac{1}{2}\right)^{3} = 320 \times \tfrac{10}{32} = 100$$
$$E(3) = 320 \times {}^{5}C_3 \left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{2} = 320 \times \tfrac{10}{32} = 100$$
$$E(4) = 320 \times {}^{5}C_4 \left(\tfrac{1}{2}\right)^{4}\left(\tfrac{1}{2}\right)^{1} = 320 \times \tfrac{5}{32} = 50$$
$$E(5) = 320 \times {}^{5}C_5 \left(\tfrac{1}{2}\right)^{5}\left(\tfrac{1}{2}\right)^{0} = 320 \times \tfrac{1}{32} = 10$$

Oi            14    56   110    88    40    12
Ei            10    50   100   100    50    10
(Oi – Ei)²    16    36   100   144   100     4

$$\chi^2 = \sum_{i=0}^{5} \frac{(O_i - E_i)^2}{E_i} = \frac{16}{10} + \frac{36}{50} + \frac{100}{100} + \frac{144}{100} + \frac{100}{50} + \frac{4}{10}$$
= 1.60 + 0.72 + 1.00 + 1.44 + 2.00 + 0.40 = 7.16
The tabulated value of χ² for 5 (6 – 1 = 5) degrees of freedom at the 5% level of significance
is 11.07. Since the calculated value of χ² is less than the tabulated value, H0 is accepted,
i.e., male and female births are equally probable.
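The binomial expected frequencies of Example 4 can be reproduced with a short Python sketch. The helper name `binomial_expected` is hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_expected(total, n, p):
    """Expected frequencies N * nCr * p^r * q^(n - r) for r = 0, 1, ..., n."""
    q = 1 - p
    return [total * comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


# Example 4: 320 families with 5 children; r = number of female children
observed = [14, 56, 110, 88, 40, 12]
expected = binomial_expected(320, 5, 0.5)
print(expected)  # [10.0, 50.0, 100.0, 100.0, 50.0, 10.0]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # 7.16
```

Because p = 1/2 is specified by the hypothesis rather than estimated from the data, the degrees of freedom stay at 6 – 1 = 5, as in the worked example.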
Example 5. Fit a Poisson distribution for the following data and test the goodness
of fit.

No. of defects (x) 0 1 2 3 4 5

Frequency 6 13 13 8 4 3

Solution. Null hypothesis H0 : the Poisson distribution is a good fit to the data.
We first find the Poisson distribution for the above data.
$$\text{Mean of the given distribution} = \frac{\sum f_i x_i}{\sum f_i} = \frac{94}{47} = 2.$$
Here, λ = 2 (for a Poisson distribution, mean = λ) and N = Σfi = 47.
The expected frequencies of the Poisson distribution are given by
$$E(r) = N \times \frac{e^{-\lambda}\lambda^{r}}{r!} = 47 \times \frac{e^{-2}\, 2^{r}}{r!}, \qquad r = 0, 1, 2, 3, 4, 5.$$
Using e^(–2) = 0.1353, the expected frequencies are :
$$E(0) = 47 \times e^{-2} \cdot \frac{2^0}{0!} = 6.36 \approx 6$$
$$E(1) = 47 \times e^{-2} \cdot \frac{2^1}{1!} = 12.72 \approx 13$$
$$E(2) = 47 \times e^{-2} \cdot \frac{2^2}{2!} = 12.72 \approx 13$$
$$E(3) = 47 \times e^{-2} \cdot \frac{2^3}{3!} = 8.48 \approx 9$$
$$E(4) = 47 \times e^{-2} \cdot \frac{2^4}{4!} = 4.24 \approx 4$$
$$E(5) = 47 \times e^{-2} \cdot \frac{2^5}{5!} = 1.696 \approx 2$$

x             0         1         2         3         4         5
Oi            6        13        13         8         4         3
Ei            6.36     12.72     12.72      8.48      4.24      1.696
(Oi – Ei)²    0.1296    0.0784    0.0784    0.2304    0.0576    1.7004

$$\chi^2 = \sum_{i=0}^{5} \frac{(O_i - E_i)^2}{E_i} = \frac{0.1296}{6.36} + \frac{0.0784}{12.72} + \frac{0.0784}{12.72} + \frac{0.2304}{8.48} + \frac{0.0576}{4.24} + \frac{1.7004}{1.696}$$
= 0.02038 + 0.00616 + 0.00616 + 0.02717 + 0.01358 + 1.0026
= 1.07605
Since one parameter (λ) was estimated from the data, the degrees of freedom are 6 – 1 – 1 = 4.
The tabulated value of χ² for 4 degrees of freedom at the 5% level of significance
is 9.488.
Since the calculated value of χ² is less than the tabulated value, H0 is accepted,
i.e., the Poisson distribution is a good fit to the data.
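A sketch of the Poisson fit in Example 5, assuming, as the text does, that λ is estimated by the sample mean (the helper name `poisson_fit` is hypothetical). It uses the exact value of e^(–2) instead of 0.1353, so the statistic comes out near 1.076 rather than exactly 1.07605; like the worked example, it does not add a tail class for x > 5.

```python
from math import exp, factorial


def poisson_fit(xs, freqs):
    """Fit a Poisson distribution by setting lambda to the sample mean.

    Returns (lambda, expected frequencies N * e^(-lambda) * lambda^x / x!).
    """
    total = sum(freqs)
    lam = sum(x * f for x, f in zip(xs, freqs)) / total
    expected = [total * exp(-lam) * lam ** x / factorial(x) for x in xs]
    return lam, expected


xs = [0, 1, 2, 3, 4, 5]
freqs = [6, 13, 13, 8, 4, 3]
lam, expected = poisson_fit(xs, freqs)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(freqs, expected))
df = len(xs) - 1 - 1  # one extra degree of freedom lost because lambda was estimated
print(lam, df)  # 2.0 4
print(round(chi_sq, 3))  # roughly 1.076
```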
Example 6. Theory predicts that the proportions of beans in the four groups
A, B, C and D should be in the ratio 11 : 4 : 3 : 2. In an experiment with 2000 beans, the
numbers in the four groups A, B, C and D are 1070, 430, 330 and 170 respectively. Does the
experimental result support the theory?
Solution. Null hypothesis H0 : the experimental result supports the theory, i.e.,
there is no significant difference between the observed and theoretical frequencies.
Under H0, the expected (theoretical) frequencies can be calculated as :
Total number of beans = 1070 + 430 + 330 + 170 = 2000
Sum of ratios = 11 + 4 + 3 + 2 = 20
$$E(A) = 2000 \times \frac{11}{20} = 1100, \qquad E(B) = 2000 \times \frac{4}{20} = 400,$$
$$E(C) = 2000 \times \frac{3}{20} = 300, \qquad E(D) = 2000 \times \frac{2}{20} = 200.$$

Oi            1070    430    330    170
Ei            1100    400    300    200
(Oi – Ei)²     900    900    900    900

$$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{900}{1100} + \frac{900}{400} + \frac{900}{300} + \frac{900}{200}$$
= 0.8182 + 2.250 + 3.000 + 4.500 = 10.5682
The tabulated value of χ² for 3 (4 – 1 = 3) degrees of freedom at the 5% level of significance
is 7.815. Since the calculated value of χ² is greater than the tabulated value, H0 is
rejected, i.e., the experimental results do not support the theory.

Example 7. Find the expected frequencies of the 2 × 2 contingency table
a    b
c    d

Solution.
Attributes    B1        B2        Total
A1            a         b         a + b
A2            c         d         c + d
Total         a + c     b + d     N = a + b + c + d

The expected frequencies are
$$E(a) = E(A_1, B_1) = \frac{(a+b)(a+c)}{a+b+c+d}, \qquad E(b) = E(A_1, B_2) = \frac{(a+b)(b+d)}{a+b+c+d},$$
$$E(c) = E(A_2, B_1) = \frac{(c+d)(a+c)}{a+b+c+d}, \qquad E(d) = E(A_2, B_2) = \frac{(c+d)(b+d)}{a+b+c+d}.$$
Example 8. The following data were collected on two characteristics :

              Smokers    Non-smokers
Literate        83           57
Illiterate      45           68

From this information, find out whether there is any relation between literacy
and smoking.
Solution. Null hypothesis H0 : There is no relation between literacy and
smoking, i.e., they are independent.

              Smokers     Non-smokers    Total
Literate        83           57          140 (R1)
Illiterate      45           68          113 (R2)
Total         128 (C1)     125 (C2)      N = 253

Under the null hypothesis, the expected frequencies can be calculated by using
$$E_{ij} = \frac{R_i \times C_j}{N} \qquad (i = 1, 2;\; j = 1, 2).$$
The expected frequencies are

              Smokers                     Non-smokers                 Total
Literate      140 × 128 / 253 = 70.83     140 × 125 / 253 = 69.17     140
Illiterate    113 × 128 / 253 = 57.17     113 × 125 / 253 = 55.83     113
Total         128                         125                         253

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(83 - 70.83)^2}{70.83} + \frac{(57 - 69.17)^2}{69.17} + \frac{(45 - 57.17)^2}{57.17} + \frac{(68 - 55.83)^2}{55.83}$$
= 2.091 + 2.141 + 2.590 + 2.653 = 9.475
The tabulated value of χ² for 1 [(2 – 1)(2 – 1) = 1] degree of freedom at the 5% level of
significance is 3.841.
Since the calculated value of χ² is greater than the tabulated value, H0 is rejected,
i.e., there is a relation between literacy and smoking, or they are not independent.

Example 9. In a locality, 100 persons were randomly selected and asked about
their educational achievements. The results are given below :

                        Education
Sex         Middle    High school    College
Male          10          15            25
Female        25          10            15

Based on this information, can you say that education depends on sex?
Solution. Null hypothesis H0 : Education is independent of sex.
Under the null hypothesis, the expected frequencies can be calculated by using
$$E_{ij} = \frac{R_i \times C_j}{N} \qquad (i = 1, 2;\; j = 1, 2, 3).$$

                        Education
Sex         Middle     High school    College     Total
Male        10         15             25          50 (R1)
Female      25         10             15          50 (R2)
Total       35 (C1)    25 (C2)        40 (C3)     N = 100

The expected frequencies are

                                  Education
Sex         Middle                  High school             College               Total
Male        50 × 35 / 100 = 17.5    50 × 25 / 100 = 12.5    50 × 40 / 100 = 20    50
Female      50 × 35 / 100 = 17.5    50 × 25 / 100 = 12.5    50 × 40 / 100 = 20    50
Total       35                      25                      40                    100

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(10 - 17.5)^2}{17.5} + \frac{(15 - 12.5)^2}{12.5} + \frac{(25 - 20)^2}{20} + \frac{(25 - 17.5)^2}{17.5} + \frac{(10 - 12.5)^2}{12.5} + \frac{(15 - 20)^2}{20}$$
= 3.214 + 0.5 + 1.25 + 3.214 + 0.5 + 1.25 = 9.928
The tabulated value of χ² for 2 [(2 – 1)(3 – 1) = 2] degrees of freedom at the 5% level of
significance is 5.991. Since the calculated value of χ² is greater than the tabulated value,
H0 is rejected, i.e., education is not independent of sex, or there is a relation between
education and sex.

Example 10. From the following table regarding the eye colour of fathers and
sons, test whether the colour of the son's eyes is associated with that of the father's.

                            Eye colour of son
Eye colour of father      Light    Not light    Total
Light                      471        151        622
Not light                  148        230        378
Total                      619        381       1000

Solution. Null hypothesis H0 : The colour of the son's eyes is not associated with
that of the father's, i.e., they are independent.
Under the null hypothesis, the expected frequencies can be calculated by using
$$E_{ij} = \frac{R_i \times C_j}{N} \qquad (i = 1, 2;\; j = 1, 2).$$
The expected frequencies are

                            Eye colour of son
Eye colour of father      Light                          Not light                      Total
Light                     622 × 619 / 1000 = 385.018     622 × 381 / 1000 = 236.982     622
Not light                 378 × 619 / 1000 = 233.982     378 × 381 / 1000 = 144.018     378
Total                     619                            381                            1000

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(471 - 385.018)^2}{385.018} + \frac{(151 - 236.982)^2}{236.982} + \frac{(148 - 233.982)^2}{233.982} + \frac{(230 - 144.018)^2}{144.018}$$
= 19.201 + 31.196 + 31.596 + 51.333 = 133.326
The tabulated value of χ² for 1 [(2 – 1)(2 – 1) = 1] degree of freedom at the 5% level of
significance is 3.841.
Since the calculated value of χ² is greater than the tabulated value, H0 is rejected,
i.e., the colour of the son's eyes is associated with that of the father's, or they are dependent.
Example 11. The following table gives the number of good and bad parts
produced by each of the three shifts in a factory.

Good parts Bad parts Total

Day shift 960 40 1000

Evening shift 940 50 990

Night shift 950 45 995

Total 2850 135 2985

Test whether the production of bad parts is independent of the shift on which
they were produced.
Solution. Null hypothesis H0 : The production of bad parts is independent of
the shift on which they were produced, i.e., production and shifts are independent.
Under the null hypothesis, the expected frequencies can be calculated by using
$$E_{ij} = \frac{R_i \times C_j}{N} \qquad (i = 1, 2, 3;\; j = 1, 2).$$
The expected frequencies are

                 Good parts                        Bad parts                      Total
Day shift        1000 × 2850 / 2985 = 954.774      1000 × 135 / 2985 = 45.226     1000
Evening shift    990 × 2850 / 2985 = 945.226       990 × 135 / 2985 = 44.774      990
Night shift      995 × 2850 / 2985 = 950.000       995 × 135 / 2985 = 45.000      995
Total            2850                              135                            2985

$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(960 - 954.774)^2}{954.774} + \frac{(40 - 45.226)^2}{45.226} + \frac{(940 - 945.226)^2}{945.226} + \frac{(50 - 44.774)^2}{44.774} + \frac{(950 - 950)^2}{950} + \frac{(45 - 45)^2}{45}$$
= 0.0286 + 0.6039 + 0.0289 + 0.6099 + 0 + 0 = 1.2713
The tabulated value of χ² for 2 [(3 – 1)(2 – 1) = 2] degrees of freedom at the 5% level of
significance is 5.991.
Since the calculated value of χ² is less than the tabulated value, H0 is accepted,
i.e., the production of bad parts is independent of the shift on which they were produced.

EXERCISE 6.1

1. The frequency distribution of the digits on a set of random numbers was observed to be :

Digits 0 1 2 3 4 5 6 7 8 9

Frequency 18 19 23 21 16 25 22 20 21 15

Test the hypothesis that the digits are uniformly distributed.


2. The sales in a supermarket during a week are given below :

Days                Mon.    Tue.    Wed.    Thu.    Fri.    Sat.
Sales (₹ '000)       65      54      60      56      71      84

Test the hypothesis that the sales do not depend on the day of the week, using a 5%
level of significance.

3. The following table gives the number of accidents that took place in an industry during
various days of the week :

Days                Mon.    Tue.    Wed.    Thu.    Fri.    Sat.
No. of accidents     14      18      12      11      15      14

Test if accidents are uniformly distributed over the week.


4. A die is thrown 276 times and the results of these throws are given below :

No. appeared on the die 1 2 3 4 5 6

Frequency 40 32 29 59 57 59

Test whether the die is biased or not.


5. A sample analysis of the examination results of 500 students was made. It was found that
220 had failed, 170 had secured a third class, 90 were placed in the second class and 20 got
a first class. Are these results consistent with the general examination result, which
is in the ratio 4 : 3 : 2 : 1 for the above categories respectively?
6. Four dice were thrown 112 times, and the number of dice showing 1, 3 or 5 was recorded as
under :

No. of dice showing 1, 3 or 5     0     1     2     3     4
Frequency                        10    25    40    30     7

Test the hypothesis that all the dice were fair.


7. Fit a Poisson distribution for the following data and test the goodness of fit.

No. of defects (x) 0 1 2 3 4

Frequency 109 65 22 3 1

8. The following table gives the classification of 100 workers according to sex and the nature of
work. Using the χ²-test, examine whether the nature of work is independent of the sex of the
worker.

Sex Nature of work

Skilled Unskilled

Male 40 20

Female 10 30

9. For the data given in the following table, use the χ²-test to test the effectiveness of inoculation
in preventing attacks of smallpox.

Attacked Not attacked

Inoculated 25 220

Not inoculated 90 160

10. Two investigators draw samples from the same town in order to estimate the number of
persons falling in the income groups ‘poor’, ‘middle class’ and ‘well to do’. Their results
are as follows :

                       Income groups
Investigator    Poor    Middle class    Well to do
A               140         100             15
B               140          50             20

Test whether the sampling techniques of the two investigators are significantly dependent
on the income groups of people.

Answers
1. Yes 2. Accepted
3. Yes 4. Biased
5. No 6. Yes
7. Poisson distribution is a good fit to the data
8. No
9. Inoculation against smallpox is a preventive measure
10. Sampling techniques are dependent on the income groups

6.6. CORRELATION ANALYSIS

In a bivariate distribution, if a change in one variable is accompanied by a
change in the other variable in such a way that an increase in one variable results in
an increase or decrease in the other, then the two variables are said to be correlated.
Examples are income and expenditure, the heights and weights of students in a class, and the
price and demand of certain commodities.
If an increase (or decrease) in one variable results in a corresponding increase
(or decrease) in the other, the correlation is said to be direct or positive. But if an increase
(or decrease) in one variable results in a corresponding decrease (or increase) in the
other, the correlation is said to be negative. If the two variables vary in such a way that they are
exactly linearly related (for example, their ratio is always constant), the correlation is said to be perfect.

6.7. SCATTER OR DOT DIAGRAM

When we plot the corresponding values of two variables, taking one on X-axis
and the other along Y-axis, it shows a collection of dots. This collection of dots is called
a dot diagram or a scatter diagram.
If all the plotted points lie in a straight line and show an upward trend, then the
correlation is perfect positive. If all the plotted points lie in a straight line and show a
downward trend, then the correlation is perfect negative.
If the plotted points are not on a straight line but seem to be scattered around a
straight line, the variables are correlated. The closer the points cluster around a line, the
higher the degree of correlation. If the plotted points are not clustered around a
straight line but are widely scattered over the diagram, then there is a very low degree
of correlation between the variables.
If the plotted points show no trend at all, then the variables are uncorrelated.

[Figure: scatter diagrams (Y against X) showing perfect positive correlation, perfect negative correlation, higher degree of positive correlation, higher degree of negative correlation, low degree of positive correlation, low degree of negative correlation, and no correlation.]

Karl Pearson’s Coefficient of Correlation

The correlation coefficient r(x, y) between two variables x and y is given by
$$r(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}.$$
r(x, y) is also denoted by ρ(x, y) or r_xy or simply by r.
$$r(x, y) = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
or
$$r(x, y) = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}.$$
If the values of the xi's or yi's are large or involve fractions, then define
$$u_i = \frac{x_i - a}{h} \quad\text{and}\quad v_i = \frac{y_i - b}{k},$$
where a and b are the assumed means of the x-series and y-series respectively, and h and k are
positive constants. This is known as a change of origin and scale. The correlation coefficient
is independent of change of origin and scale (for h, k > 0). In this case r(x, y) is given by the formula
$$r(x, y) = r(u, v) = \frac{n\sum_{i=1}^{n} u_i v_i - \sum_{i=1}^{n} u_i \sum_{i=1}^{n} v_i}{\sqrt{n\sum_{i=1}^{n} u_i^2 - \left(\sum_{i=1}^{n} u_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} v_i^2 - \left(\sum_{i=1}^{n} v_i\right)^2}}.$$

6.8. CHARACTERISTICS OF THE COEFFICIENT OF CORRELATION r

(i) –1 ≤ r ≤ +1.
(ii) If r = –1, then there is perfect negative correlation between x and y.
(iii) If r = 1, then there is perfect positive correlation between x and y.
(iv) If r = 0, then there is no (linear) correlation between x and y.
(v) If –1 ≤ r < 0, then there is negative correlation between x and y.
(vi) If 0 < r ≤ 1, then there is positive correlation between x and y.

6.9. SPEARMAN’S RANK CORRELATION

Sometimes we have to deal with problems in which the data cannot be measured
quantitatively but a qualitative assessment is possible, e.g., beauty, honesty, morality,
etc. In such cases we assign ranks to the individuals possessing these attributes or
characteristics. The best individual is given rank 1, the next rank 2, and so on.
The coefficient of rank correlation r is given by
$$r(x, y) = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$
where di is the difference between the corresponding ranks and n is the number of pairs of
observations.
Let (xi, yi), i = 1, 2, ......, n, be the ranks of the ith individual in two characteristics
x and y respectively. Assuming that no two individuals are equal in either classification,
each of x and y takes the values 1, 2, 3, ......, n. Then
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}, \qquad \bar{y} = \frac{n+1}{2}.$$
$$\sigma_x^2 = \sigma_y^2 = \frac{1}{n}(1^2 + 2^2 + \cdots + n^2) - \left(\frac{n+1}{2}\right)^2 = \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4}$$
$$= \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n+1}{12}(4n + 2 - 3n - 3) = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}.$$
Since $\bar{x} = \bar{y}$,
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}(x_i - y_i)^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 - 2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$
Dividing by n,
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 - \frac{2}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$
We know that
$$\operatorname{var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \operatorname{var}(y) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad \operatorname{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$
Therefore, since $r(x, y) = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$ and $\sigma_x = \sigma_y$,
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \operatorname{var}(x) + \operatorname{var}(y) - 2\operatorname{cov}(x, y) = 2\sigma_x^2 - 2r(x, y)\,\sigma_x\sigma_y = 2\sigma_x^2 - 2r(x, y)\,\sigma_x^2$$
$$= 2\sigma_x^2\,[1 - r(x, y)] = 2 \cdot \frac{n^2 - 1}{12}\,[1 - r(x, y)] = \frac{n^2 - 1}{6}\,[1 - r(x, y)].$$
Hence
$$\frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - r(x, y) \quad\Rightarrow\quad r(x, y) = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}.$$
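The result of this derivation, that Spearman's formula coincides with Pearson's r computed on the ranks when there are no ties, can be checked numerically. In this sketch the helper names are hypothetical, and `ranks_descending` assumes no tied values; it gives rank 1 to the largest value, as in Example 7 below.

```python
def ranks_descending(values):
    """Rank 1 for the largest value, 2 for the next, and so on (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(rx, ry):
    """Spearman's rank correlation 1 - 6 * sum(d^2) / (n (n^2 - 1)) from two rank lists."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pearson_r(xs, ys):
    """Pearson's r, used here only to check that rho equals r computed on the ranks."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Example 7: marks in Statistics and Mathematics, converted to ranks first
stats = [35, 23, 47, 17, 10, 43, 9, 6, 28]
maths = [30, 33, 45, 23, 8, 49, 12, 4, 31]
rx, ry = ranks_descending(stats), ranks_descending(maths)
print(rx)  # [3, 5, 1, 6, 7, 2, 8, 9, 4]
print(round(spearman_rho(rx, ry), 4), round(pearson_r(rx, ry), 4))  # 0.9 0.9
```

With tied values the two quantities can differ, which is why the derivation assumes that no two individuals are equal in either classification.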

SOLVED EXAMPLES

Example 1. Find the coefficient of correlation for the following data :
n = 10, Σx = 50, Σy = –30, Σx² = 290, Σy² = 300, Σxy = –115.
Solution.
$$r(x, y) = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{10 \times (-115) - (50)(-30)}{\sqrt{10 \times 290 - (50)^2}\,\sqrt{10 \times 300 - (-30)^2}}$$
$$= \frac{-1150 + 1500}{\sqrt{400}\,\sqrt{2100}} = \frac{350}{200\sqrt{21}} = 0.3819.$$
Example 2. A computer, while calculating the correlation coefficient between two
variables x and y from 25 pairs of observations, obtained the following results :
n = 25, Σx = 125, Σy = 100, Σx² = 650, Σy² = 460, Σxy = 508.
It was, however, later discovered at the time of checking that he had copied down
two pairs as
x     y
6     14
8     6
while the correct values were
x     y
8     12
6     8
Obtain the correct value of the correlation coefficient.
Solution. Corrected Σx = given Σx – (sum of the incorrect values) + (sum of the correct values)
Corrected Σx = 125 – (6 + 8) + (8 + 6) = 125
Corrected Σy = 100 – (14 + 6) + (12 + 8) = 100
Corrected Σx² = 650 – (6² + 8²) + (8² + 6²) = 650
Corrected Σy² = 460 – (14² + 6²) + (12² + 8²) = 436
Corrected Σxy = 508 – (6 × 14 + 8 × 6) + (8 × 12 + 6 × 8) = 520
$$\text{Corrected } r(x, y) = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - (125)^2}\,\sqrt{25 \times 436 - (100)^2}}$$
$$= \frac{500}{25 \times 30} = \frac{2}{3} \approx 0.67.$$
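The correction in Example 2 amounts to subtracting the contributions of the wrongly copied pairs from each sum and adding those of the correct pairs, then applying the computational formula to the corrected sums. A minimal sketch, assuming the sums are held in a dictionary with the hypothetical keys `sx`, `sy`, `sxx`, `syy`, `sxy`:

```python
from math import sqrt


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson's r from the summary sums n, Σx, Σy, Σx², Σy², Σxy."""
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


def correct_sums(sums, wrong_pairs, right_pairs):
    """Remove the contributions of wrongly copied (x, y) pairs and add the correct ones."""
    fixed = dict(sums)
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            fixed["sx"] += sign * x
            fixed["sy"] += sign * y
            fixed["sxx"] += sign * x * x
            fixed["syy"] += sign * y * y
            fixed["sxy"] += sign * x * y
    return fixed


given = {"sx": 125, "sy": 100, "sxx": 650, "syy": 460, "sxy": 508}
fixed = correct_sums(given, wrong_pairs=[(6, 14), (8, 6)], right_pairs=[(8, 12), (6, 8)])
print(fixed)  # {'sx': 125, 'sy': 100, 'sxx': 650, 'syy': 436, 'sxy': 520}
print(round(r_from_sums(25, **fixed), 4))  # 0.6667
```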

Example 3. Calculate Karl Pearson’s coefficient of correlation for the following
data :

x     2     4     6     8    10
y    20    12    18    10    40
Solution.

x      y      x²      y²       xy
2      20     4       400      40
4      12     16      144      48
6      18     36      324      108
8      10     64      100      80
10     40     100     1600     400
Σx = 30    Σy = 100    Σx² = 220    Σy² = 2568    Σxy = 676

Here, n = 5.
$$r(x, y) = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{5 \times 676 - 30 \times 100}{\sqrt{5 \times 220 - (30)^2}\,\sqrt{5 \times 2568 - (100)^2}}$$
$$= \frac{3380 - 3000}{\sqrt{1100 - 900}\,\sqrt{12840 - 10000}} = \frac{380}{\sqrt{200}\,\sqrt{2840}} = 0.5042.$$
Example 4. Find the Karl Pearson’s coefficient of correlation between x and y
for the following data :

x 150 153 154 155 157 160 163 164

y 65 66 67 70 68 53 70 63

Solution. Let ui = xi – 155 and vi = yi – 68.

x       y      u      v       u²     v²      uv
150     65     –5     –3      25     9       15
153     66     –2     –2      4      4       4
154     67     –1     –1      1      1       1
155     70     0      2       0      4       0
157     68     2      0       4      0       0
160     53     5      –15     25     225     –75
163     70     8      2       64     4       16
164     63     9      –5      81     25      –45
Total          16     –22     204    272     –84

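Here, n = 8. Since the correlation coefficient is independent of change of origin, r(x, y) = r(u, v), so
$$r(x, y) = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2}\,\sqrt{n\sum v_i^2 - (\sum v_i)^2}} = \frac{8 \times (-84) - (16)(-22)}{\sqrt{8 \times 204 - (16)^2}\,\sqrt{8 \times 272 - (-22)^2}}$$
$$= \frac{-672 + 352}{\sqrt{1376}\,\sqrt{1692}} = \frac{-320}{1525.8414} = -0.2097.$$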
Example 5. Calculate the rank correlation coefficient for the following data :

Student A B C D E F G H I J

Rank in Maths. 9 10 6 5 7 2 4 8 1 3

Rank in Stats. 1 2 3 4 5 6 7 8 9 10

Solution. Here, the ranks are given and n = 10.

Student    R1    R2    d = R1 – R2    d²
A          9     1      8             64
B          10    2      8             64
C          6     3      3             9
D          5     4      1             1
E          7     5      2             4
F          2     6     –4             16
G          4     7     –3             9
H          8     8      0             0
I          1     9     –8             64
J          3     10    –7             49
                                      Σd² = 280

$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(10^2 - 1)} = 1 - \frac{1680}{990} = 1 - 1.697 = -0.697.$$
Example 6. Ten competitors in a beauty contest are ranked by three judges in
the following order :

First Judge 1 6 5 10 3 2 4 9 7 8

Second Judge 3 5 8 4 7 10 2 1 6 9

Third Judge 6 4 9 8 1 2 3 10 5 7

Using the rank correlation method, discuss which pair of judges has the nearest
approach to common taste in beauty ?

Solution. Let R1, R2, R3 be the ranks given by the three judges.

R1    R2    R3    d12 = R1 – R2    d13 = R1 – R3    d23 = R2 – R3    d12²    d13²    d23²
1     3     6     –2               –5               –3               4       25      9
6     5     4      1                2                1               1       4       1
5     8     9     –3               –4               –1               9       16      1
10    4     8      6                2               –4               36      4       16
3     7     1     –4                2                6               16      4       36
2     10    2     –8                0                8               64      0       64
4     2     3      2                1               –1               4       1       1
9     1     10     8               –1               –9               64      1       81
7     6     5      1                2                1               1       4       1
8     9     7     –1                1                2               1       1       4
Total                                                                200     60      214

Here, n = 10.
Rank correlation coefficient between the first and second judges :
$$r_{12} = 1 - \frac{6\sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 200}{10 \times 99} = 1 - \frac{40}{33} = -0.212$$
Rank correlation coefficient between the first and third judges :
$$r_{13} = 1 - \frac{6\sum d_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10 \times 99} = 1 - \frac{4}{11} = 0.636$$
Rank correlation coefficient between the second and third judges :
$$r_{23} = 1 - \frac{6\sum d_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{10 \times 99} = 1 - \frac{214}{165} = -0.297$$
Since r13 is the maximum, the pair of first and third judges has the
nearest approach to common tastes in beauty.
Example 7. The marks obtained by 9 students in Statistics and Mathematics
are given below :

Marks in Statistics 35 23 47 17 10 43 9 6 28

Marks in Mathematics 30 33 45 23 8 49 12 4 31

Compute the rank correlation coefficient.


Solution. Here, the marks are given. First find the ranks and then the differences.

Marks in          Marks in             Rank in    Rank in
Statistics (X)    Mathematics (Y)      X (xi)     Y (yi)    di = xi – yi    di²
35                30                   3          5         –2              4
23                33                   5          3          2              4
47                45                   1          2         –1              1
17                23                   6          6          0              0
10                8                    7          8         –1              1
43                49                   2          1          1              1
9                 12                   8          7          1              1
6                 4                    9          9          0              0
28                31                   4          4          0              0
Total                                                                       12

Here, n = 9, Σdi² = 12.
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(9^2 - 1)} = 1 - \frac{6 \times 12}{9 \times 80} = 1 - 0.1 = 0.9.$$

EXERCISE 6.2

1. Calculate the Karl Pearson’s correlation coefficient between height of father and height
of son from the given data :

Height of father (in inches) 64 65 66 67 68 69 70

Height of son (in inches) 66 67 65 68 70 68 72

2. Calculate Karl Pearson’s correlation coefficient from the following data :

Overheads (₹ ’000) 80 90 100 110 120 130 140 150 160

Cost (₹ ’000) 15 19 16 19 17 18 16 18 15

3. Calculate Karl Pearson’s correlation coefficient from the following data, using 20 as the working mean for price and 70 as the working mean for demand.

Price 14 16 17 18 19 20 21 22 23

Demand 84 78 70 75 66 67 62 58 60

4. The coefficient of correlation between x and y for 20 items is 0.3. The mean of x is 15 and the mean of y is 20, while the standard deviations are 4 and 5 for x and y respectively. At the time of calculation, one item 27 was wrongly taken as 17 in the x series, and 35 was taken instead of 30 in the y series. Find the correct coefficient of correlation.
5. Ten students got the following marks in Statistics and Mathematics :

Marks in Statistics 78 36 98 25 75 82 90 62 65 39

Marks in Mathematics 84 51 91 60 68 62 86 58 53 47

Calculate the coefficient of correlation.


6. Calculate the correlation coefficient from the following results :
n = 10, Σx = 140, Σy = 150, Σ(x – 10)² = 180,
Σ(y – 15)² = 215 and Σ(x – 10)(y – 15) = 60.
7. Calculate the coefficient of rank correlation from the following data :

x 10 12 8 15 20 25 40

y 15 10 6 25 16 12 18

8. Calculate the coefficient of rank correlation from the following data :

x 4 6 8 10 12 14 16 18

y 10 15 20 25 30 35 40 45

9. In a beauty contest two judges rank the ten competitors in the following order :

Competitors A B C D E F G H I J

Rank by Judge I 6 4 3 1 2 7 9 8 10 5

Rank by Judge II 4 1 6 7 5 8 10 9 3 2

Determine if the two judges have the same taste in beauty.


10. The coefficient of rank correlation between marks in Statistics and marks in Mathematics obtained by a certain group of students is 0.8. If the sum of the squares of the differences in ranks is 33, find the number of students in the group.
11. The coefficient of rank correlation of the marks obtained by 10 students in Mathematics and Statistics was found to be 0.5. It was then detected that the difference in ranks in the two subjects for one particular student was wrongly taken to be 3 in place of 7. What should be the correct rank correlation coefficient ?

Answers
1. r = 0.81 2. r = – 0.11547 3. r = – 0.9542
4. Correct r = 0.515 5. r = 0.78 6. 0.9151 7. r = 0.57
8. r=1 9. Yes 10. n = 10 11. r = 0.2576.

7. REGRESSION ANALYSIS

STRUCTURE

Linear Regression
Lines of Regression
Properties of Regression Coefficients
Angle Between Two Lines of Regression
Standard Error of Estimate (or Prediction)
Coefficient of Determination
Properties of Coefficient of Determination

Regression analysis attempts to establish the nature of the relationship between variables. It also helps to determine the functional relationship between the variables, so that the value of one variable can be predicted or estimated for a given value of the other. Regression measures the nature and extent of correlation.

7.1. LINEAR REGRESSION

If the variables in a bivariate distribution are correlated, the points in the scatter diagram will be more or less concentrated around a curve. This curve is called the curve of regression. If the curve is a straight line, it is called a line of regression and the regression is said to be linear. Since the line of regression gives the best estimate of the value of the dependent variable for any given value of the independent variable, it is called the line of best fit; it is obtained by the method of least squares.
Either of the two variables x and y can be taken as the independent variable and the other as the dependent variable. Therefore, there are two regression lines: the line of regression of y on x and the line of regression of x on y.

7.2. LINES OF REGRESSION

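Let the equation of the line of regression of y on x be
$$y = a + bx \qquad \ldots(1)$$
Since the line of best fit passes through the point of means $(\bar{x}, \bar{y})$ (divide the first normal equation in (4) by n), we have
$$\bar{y} = a + b\bar{x} \qquad \ldots(2)$$
Now subtracting (2) from (1), we have
$$y - \bar{y} = b(x - \bar{x}) \qquad \ldots(3)$$
The normal equations for equation (1) are
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2 \qquad \ldots(4)$$
Shifting the origin to $(\bar{x}, \bar{y})$, the second equation of (4) becomes
$$\sum (x - \bar{x})(y - \bar{y}) = a\sum (x - \bar{x}) + b\sum (x - \bar{x})^2 \qquad \ldots(5)$$
We know that
$$r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y}, \qquad \sum (x - \bar{x}) = 0 \qquad \text{and} \qquad \sigma_x^2 = \frac{1}{n}\sum (x - \bar{x})^2$$
From (5), we have
$$n r \sigma_x \sigma_y = a \cdot 0 + b \cdot n\sigma_x^2 \quad \text{or} \quad b = r\frac{\sigma_y}{\sigma_x}$$
Hence, from (3), the line of regression of y on x is given by
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
Similarly, the line of regression of x on y is given by
$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
The quantity $r\sigma_y/\sigma_x$ is called the regression coefficient of y on x and is denoted by $b_{yx}$:
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
The quantity $r\sigma_x/\sigma_y$ is called the regression coefficient of x on y and is denoted by $b_{xy}$:
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\operatorname{cov}(x, y)}{\sigma_y^2} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$$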
Note. The line of regression of y on x is used to estimate the value of y for given value of
x. The line of regression of x on y is used to estimate the value of x for given value of y.
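The computational forms of byx and bxy above translate directly into code. A minimal Python sketch, assuming the raw sums n, Σx, Σy, Σx², Σy², Σxy are already available as integers; exact fractions are used so the results match hand working. The function names are illustrative, and the usage lines use the sums for x = 1, ..., 5 and y = 7, 6, 5, 4, 3.

```python
from fractions import Fraction


def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (x_bar, y_bar, b_yx, b_xy) from n, Σx, Σy, Σx², Σy², Σxy (integers)."""
    numerator = n * sxy - sx * sy                  # n Σxy − Σx Σy, shared by both
    b_yx = Fraction(numerator, n * sxx - sx ** 2)  # divided by n Σx² − (Σx)²
    b_xy = Fraction(numerator, n * syy - sy ** 2)  # divided by n Σy² − (Σy)²
    return Fraction(sx, n), Fraction(sy, n), b_yx, b_xy


def line_y_on_x(x_bar, y_bar, b_yx):
    """(slope, intercept) of y − ȳ = b_yx (x − x̄), i.e. y = slope·x + intercept."""
    return b_yx, y_bar - b_yx * x_bar


def line_x_on_y(x_bar, y_bar, b_xy):
    """(slope, intercept) of x − x̄ = b_xy (y − ȳ), i.e. x = slope·y + intercept."""
    return b_xy, x_bar - b_xy * y_bar


# Sums for x = 1..5, y = 7, 6, 5, 4, 3
x_bar, y_bar, b_yx, b_xy = regression_from_sums(5, 15, 25, 55, 135, 65)
slope, intercept = line_y_on_x(x_bar, y_bar, b_yx)
print(b_yx, b_xy)                            # -1 -1
print(slope * Fraction(7, 2) + intercept)    # estimate of y at x = 3.5: 9/2
```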

7.3. PROPERTIES OF REGRESSION COEFFICIENTS

(i) The correlation coefficient and two regression coefficients are of the same
sign.
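(ii) If one of the regression coefficients is greater than unity in absolute value, the other must be less than unity in absolute value (since $b_{yx} b_{xy} = r^2 \le 1$).
(iii) The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient in absolute value, i.e. $\frac{|b_{yx} + b_{xy}|}{2} \ge |r|$.
(iv) The correlation coefficient is the geometric mean of the regression coefficients, i.e. $r = \pm\sqrt{b_{yx} b_{xy}}$, taking the common sign of the regression coefficients.
(v) Regression coefficients are independent of a change of origin but not of a change of scale.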
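Properties (ii) to (v) can be checked numerically on any small data set. The Python sketch below uses an arbitrary illustrative data set (x = 1, ..., 5; y = 2, 5, 3, 8, 7) and population moments; the helper names are hypothetical.

```python
import math
from statistics import fmean


def b_coefficient(u, v):
    """Regression coefficient of v on u: cov(u, v) / var(u)."""
    mu, mv = fmean(u), fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    var_u = sum((a - mu) ** 2 for a in u) / len(u)
    return cov / var_u


def pearson_r(u, v):
    mu, mv = fmean(u), fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
b_yx, b_xy, r = b_coefficient(xs, ys), b_coefficient(ys, xs), pearson_r(xs, ys)

print(b_yx * b_xy <= 1)                     # (ii) the product equals r^2 <= 1
print((b_yx + b_xy) / 2 >= r)               # (iii) AM >= r (both positive here)
print(math.isclose(r, math.copysign(math.sqrt(b_yx * b_xy), b_yx)))  # (iv)
# (v) a change of origin leaves b_yx unchanged ...
print(math.isclose(b_coefficient([x - 3 for x in xs], [y - 5 for y in ys]), b_yx))
# ... but a change of scale does not: dividing x by 10 multiplies b_yx by 10.
print(math.isclose(b_coefficient([x / 10 for x in xs], ys), 10 * b_yx))
```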

7.4. ANGLE BETWEEN TWO LINES OF REGRESSION

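If θ is the acute angle between the two lines of regression in the case of two variables x and y, then
$$\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2},$$
where r, σx and σy have their usual meanings. Explain the significance when r = 0 and r = ± 1.
Proof. The equation of the line of regression of y on x is
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
and the equation of the line of regression of x on y is
$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
Their slopes (in the xy-plane) are $m_1 = r\dfrac{\sigma_y}{\sigma_x}$ and $m_2 = \dfrac{\sigma_y}{r\sigma_x}$. Therefore
$$\tan\theta = \pm\frac{m_2 - m_1}{1 + m_1 m_2} = \pm\frac{\dfrac{\sigma_y}{r\sigma_x} - r\dfrac{\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \pm\frac{1 - r^2}{r} \cdot \frac{\sigma_y}{\sigma_x} \cdot \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2} = \pm\frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
Since $r^2 \le 1$ and σx, σy are positive, the sign that makes the right-hand side non-negative (the sign of r) gives the acute angle between the lines. Hence
$$\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
When r = 0, tan θ is infinite, so θ = π/2.
So the two lines of regression are perpendicular to each other.
When r = ± 1, tan θ = 0, so that θ = 0 or π.
So the two lines of regression coincide and there is perfect correlation between the two variables x and y.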
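A quick numerical check of the formula: the Python sketch below computes θ both from the closed form (using |r| so that θ is the acute angle) and directly from the two slopes m1 and m2. The function names and the sample values r = 0.6, –0.5, 1 with σx = 3, σy = 4 are illustrative assumptions.

```python
import math


def acute_angle(r, sigma_x, sigma_y):
    """Acute angle (radians) between the two regression lines, from the closed form."""
    if r == 0:
        return math.pi / 2                      # lines are perpendicular
    tan_theta = (1 - r * r) / abs(r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)
    return math.atan(tan_theta)


def acute_angle_from_slopes(r, sigma_x, sigma_y):
    """Same angle from the slopes m1 = r*sy/sx and m2 = sy/(r*sx); needs r != 0."""
    m1 = r * sigma_y / sigma_x
    m2 = sigma_y / (r * sigma_x)
    return math.atan(abs((m2 - m1) / (1 + m1 * m2)))


for r in (0.6, -0.5, 1.0):
    a = acute_angle(r, 3, 4)
    b = acute_angle_from_slopes(r, 3, 4)
    print(r, round(math.degrees(a), 2), math.isclose(a, b, abs_tol=1e-12))
```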

SOLVED EXAMPLES

Example 1. Find the equation of two lines of regression for the data :

x 1 2 3 4 5

y 7 6 5 4 3

and hence find an estimate of y for x = 3.5 from the appropriate line of regression.

Solution.

x    y    x²    y²    xy
1    7    1     49    7
2    6    4     36    12
3    5    9     25    15
4    4    16    16    16
5    3    25    9     15
Σx = 15    Σy = 25    Σx² = 55    Σy² = 135    Σxy = 65

Here, n = 5,
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{15}{5} = 3, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{25}{5} = 5$$
Now,
$$b_{yx} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{5 \times 65 - 15 \times 25}{5 \times 55 - (15)^2} = \frac{325 - 375}{275 - 225} = -1$$
$$b_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum y_i^2 - (\sum y_i)^2} = \frac{5 \times 65 - 15 \times 25}{5 \times 135 - (25)^2} = \frac{325 - 375}{675 - 625} = -1$$
So the line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 5 = -1(x - 3) \Rightarrow y = -x + 8$$
and the line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 3 = -1(y - 5) \Rightarrow x = -y + 8$$
To estimate the value of y when x is given, we use the line of regression of y on x, i.e.
y = – x + 8
Now substituting x = 3.5, we have
y = – 3.5 + 8 = 4.5.
Example 2. The following table gives age (x) in years of cars and annual
maintenance cost (y) in hundred rupees :

x 1 3 5 7 9

y 15 18 21 23 22

Estimate the maintenance cost for a 4-year-old car after finding the appropriate line of regression.
Solution.

x    y     x²    xy
1    15    1     15
3    18    9     54
5    21    25    105
7    23    49    161
9    22    81    198
Σx = 25    Σy = 99    Σx² = 165    Σxy = 533

Here, n = 5,
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{25}{5} = 5, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{99}{5} = 19.8$$
$$b_{yx} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{5 \times 533 - 25 \times 99}{5 \times 165 - (25)^2} = \frac{2665 - 2475}{825 - 625} = \frac{190}{200} = 0.95$$
The regression line of y on x is given by
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 19.8 = 0.95(x - 5) \Rightarrow y = 0.95x + 15.05$$
When x = 4 years,
y = 0.95 × 4 + 15.05
= 18.85 hundred rupees
= ₹ 1885.
Example 3. From the following information on values of two variables x and y, find the two regression lines and the correlation coefficient between x and y.
n = 10, Σx = 20, Σy = 40, Σx² = 240, Σy² = 410, Σxy = 200.
Solution. We know that
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 200 - 20 \times 40}{10 \times 240 - (20)^2} = \frac{1200}{2000} = \frac{3}{5}$$
and
$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{10 \times 200 - 20 \times 40}{10 \times 410 - (40)^2} = \frac{1200}{2500} = \frac{12}{25}$$
$$\bar{x} = \frac{\sum x}{n} = \frac{20}{10} = 2, \qquad \bar{y} = \frac{\sum y}{n} = \frac{40}{10} = 4$$
The two regression lines are
$$y - \bar{y} = b_{yx}(x - \bar{x}), \text{ i.e. } y - 4 = \frac{3}{5}(x - 2) \Rightarrow y = 0.6x + 2.8$$
and
$$x - \bar{x} = b_{xy}(y - \bar{y}), \text{ i.e. } x - 2 = \frac{12}{25}(y - 4) \Rightarrow x = 0.48y + 0.08$$
We know that
$$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{\frac{3}{5} \times \frac{12}{25}} = \sqrt{\frac{36}{125}} = 0.537$$
(Since both regression coefficients are positive, r is positive.)
Example 4. For 100 students of a class, the regression equation of marks in Statistics (x) on marks in Mathematics (y) is 3y – 5x + 180 = 0. The mean mark in Mathematics is 50 and the variance of marks in Statistics is 4/9 of the variance of marks in Mathematics. Find the mean mark in Statistics and the coefficient of correlation between marks in the two subjects.
Solution. Since the given line of regression is that of x on y, we have
$$x = \frac{3}{5}y + \frac{180}{5} = \frac{3}{5}y + 36$$
so that
$$b_{xy} = \frac{3}{5} = r\frac{\sigma_x}{\sigma_y}$$
Given that variance of x = (4/9) × variance of y,
$$\frac{\sigma_x^2}{\sigma_y^2} = \frac{4}{9} \Rightarrow \frac{\sigma_x}{\sigma_y} = \frac{2}{3}$$
$$\therefore \frac{3}{5} = r \times \frac{2}{3} \Rightarrow r = \frac{9}{10} = 0.9$$
(Since bxy is positive, r is positive.)
Since the point of means $(\bar{x}, \bar{y})$ lies on the regression lines, we have
$$\bar{x} = \frac{3}{5}\bar{y} + 36 \Rightarrow \bar{x} = \frac{3}{5} \times 50 + 36 = 66 \qquad (\because \bar{y} = 50)$$
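Example 5. The lines of regression of y on x and x on y are y = x + 5 and 16x – 9y = 94 respectively. Find the variance of x if the variance of y is 16. Also find the covariance of x and y.
Solution. The regression equation of y on x is
y = x + 5 ⇒ byx = 1 (coefficient of x)
The regression equation of x on y is
$$16x - 9y = 94, \text{ i.e. } x = \frac{9}{16}y + \frac{94}{16} \Rightarrow b_{xy} = \frac{9}{16} \quad (\text{coefficient of } y)$$
We know that
$$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{1 \times \frac{9}{16}} = \frac{3}{4} = 0.75$$
(r is positive since byx and bxy are positive.)
Since $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ and $\sigma_y = 4$ (as $\sigma_y^2 = 16$),
$$\sigma_x = \frac{b_{xy}\,\sigma_y}{r} = \frac{\frac{9}{16} \times 4}{\frac{3}{4}} = 3,$$
so the variance of x is $\sigma_x^2 = 9$.
Also, since $r = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$,
$$\operatorname{cov}(x, y) = r\sigma_x\sigma_y = \frac{3}{4} \times 3 \times 4 = 9.$$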
Example 6. From the following information on values of two variables x and y, find the two regression lines, and estimate x when y = 10 and y when x = 8.
n = 5, Σx = 15, Σy = 18, Σx² = 55, Σy² = 74, Σxy = 58.
Solution. We know that
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{5 \times 58 - 15 \times 18}{5 \times 55 - (15)^2} = \frac{290 - 270}{275 - 225} = \frac{20}{50} = \frac{2}{5}$$
and
$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{5 \times 58 - 15 \times 18}{5 \times 74 - (18)^2} = \frac{290 - 270}{370 - 324} = \frac{20}{46} = \frac{10}{23}$$
$$\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{18}{5}$$
The line of regression of y on x is given by
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - \frac{18}{5} = \frac{2}{5}(x - 3) \Rightarrow y = 0.4x + 2.4 \qquad \ldots(1)$$
The line of regression of x on y is given by
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 3 = \frac{10}{23}\left(y - \frac{18}{5}\right) \Rightarrow x = 0.435y + 1.435 \qquad \ldots(2)$$
Putting x = 8 in (1), we have
y = 0.4 × 8 + 2.4 = 3.2 + 2.4 = 5.6
Putting y = 10 in (2), we have
x = 0.435 × 10 + 1.435 = 4.35 + 1.435 = 5.785.
Example 7. The information about advertising and sales of a manufacturing concern is given as follows :

          Advertising expenditure (x)    Sales (y)
          (₹ lakhs)                      (₹ lakhs)
Mean      10                             90
S.D.      3                              12

Correlation coefficient = 0.8.
Find (i) the regression coefficients byx and bxy
(ii) the two regression lines
(iii) the likely sales when advertisement expenditure is ₹ 18 lakhs
(iv) the advertisement expenditure if the company wants to attain a sales target of ₹ 115 lakhs.
Solution. Given $\bar{x} = 10$, $\bar{y} = 90$, $\sigma_x = 3$, $\sigma_y = 12$ and r = 0.8.
$$\text{(i)} \quad b_{yx} = r\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{12}{3} = 3.2, \qquad b_{xy} = r\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{3}{12} = 0.2$$
(ii) The line of regression of y on x is given by
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 90 = 3.2(x - 10) \Rightarrow y = 3.2x + 58$$
The line of regression of x on y is given by
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 10 = 0.2(y - 90) \Rightarrow x = 0.2y - 8$$
(iii) Given x = 18, we have to estimate y:
∴ y = 3.2 × 18 + 58 = ₹ 115.6 lakhs
(iv) Given y = 115, we have to estimate x:
∴ x = 0.2 × 115 – 8 = ₹ 15 lakhs.
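When only the means, standard deviations and r are given, as in Example 7, both lines follow from byx = rσy/σx and bxy = rσx/σy. A minimal Python sketch; the function name `lines_from_summary` is hypothetical.

```python
def lines_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Return ((slope, intercept) of y on x, (slope, intercept) of x on y)."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (b_yx, y_bar - b_yx * x_bar)   # y = b_yx*x + (ȳ − b_yx*x̄)
    x_on_y = (b_xy, x_bar - b_xy * y_bar)   # x = b_xy*y + (x̄ − b_xy*ȳ)
    return y_on_x, x_on_y


(m_y, c_y), (m_x, c_x) = lines_from_summary(10, 90, 3, 12, 0.8)
print(round(m_y, 2), round(c_y, 2))   # slope and intercept of y on x
print(round(m_x, 2), round(c_x, 2))   # slope and intercept of x on y
print(round(m_y * 18 + c_y, 2))       # likely sales for x = 18
print(round(m_x * 115 + c_x, 2))      # expenditure needed for y = 115
```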
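Example 8. The equations of two lines of regression are
4x + 3y + 7 = 0 and 3x + 4y + 8 = 0.
Find (i) the mean values of x and y
(ii) the regression coefficients byx and bxy
(iii) the correlation coefficient between x and y
(iv) the standard deviation of y, if the variance of x is 4
(v) the value of y for x = 5.
Solution. (i) Since the point of means $(\bar{x}, \bar{y})$ lies on both regression lines, we have
$$4\bar{x} + 3\bar{y} = -7 \quad \text{and} \quad 3\bar{x} + 4\bar{y} = -8$$
On solving these equations for $\bar{x}$ and $\bar{y}$, we have
$$\bar{x} = -\frac{4}{7} \quad \text{and} \quad \bar{y} = -\frac{11}{7}$$
∴ Mean of x = – 4/7 and mean of y = – 11/7.
(ii) Suppose first that the regression line of y on x is
$$4x + 3y + 7 = 0 \text{ or } y = -\frac{4}{3}x - \frac{7}{3} \Rightarrow b_{yx} = -\frac{4}{3}$$
and that the regression line of x on y is
$$3x + 4y + 8 = 0 \text{ or } x = -\frac{4}{3}y - \frac{8}{3} \Rightarrow b_{xy} = -\frac{4}{3}$$
Then $b_{yx} \cdot b_{xy} = \frac{16}{9} > 1$, which is impossible since $b_{yx} b_{xy} = r^2 \le 1$. So the lines must be taken the other way round. The regression line of y on x is
$$3x + 4y + 8 = 0 \text{ or } y = -\frac{3}{4}x - 2 \Rightarrow b_{yx} = -\frac{3}{4}$$
and the regression line of x on y is
$$4x + 3y + 7 = 0 \text{ or } x = -\frac{3}{4}y - \frac{7}{4} \Rightarrow b_{xy} = -\frac{3}{4}$$
Now $b_{yx} \cdot b_{xy} = \left(-\frac{3}{4}\right)\left(-\frac{3}{4}\right) = \frac{9}{16} < 1$, so this choice of regression lines is correct, and
byx = – 3/4 and bxy = – 3/4.
(iii) We know that
$$r = -\sqrt{b_{yx} \times b_{xy}} = -\sqrt{\frac{9}{16}} = -\frac{3}{4}$$
(since byx and bxy are negative).
(iv) We have $\sigma_x^2 = 4 \Rightarrow \sigma_x = 2$. Now
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} \Rightarrow -\frac{3}{4} = -\frac{3}{4} \times \frac{\sigma_y}{2} \Rightarrow \sigma_y = 2$$
(v) Since we have to find y when x is given, we use the line of regression of y on x:
$$y = -\frac{3}{4}x - 2$$
Putting x = 5, we have
$$y = -\frac{3}{4} \times 5 - 2 = -3.75 - 2 = -5.75.$$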
Example 9. Consider the two regression lines :
3x + 2y = 26 and 6x + y = 31.
(i) Find the mean values of x and y.
(ii) Find the correlation coefficient between x and y.
(iii) Show that the estimated value of y for x = 0 is 13 whereas estimated value of
x for y = 13 is 3.
Solution. (i) Since the point of means $(\bar{x}, \bar{y})$ lies on both regression lines, we have
$$3\bar{x} + 2\bar{y} = 26 \quad \text{and} \quad 6\bar{x} + \bar{y} = 31$$
On solving these two equations for $\bar{x}$ and $\bar{y}$, we have $\bar{x} = 4$ and $\bar{y} = 7$.
∴ Mean of x = 4 and mean of y = 7.
(ii) Let the regression line of y on x be
$$3x + 2y = 26 \text{ or } y = -\frac{3}{2}x + 13 \Rightarrow b_{yx} = -\frac{3}{2}$$
and let the regression line of x on y be
$$6x + y = 31 \text{ or } x = -\frac{1}{6}y + \frac{31}{6} \Rightarrow b_{xy} = -\frac{1}{6}$$
Here $b_{yx} b_{xy} = \frac{1}{4} \le 1$, so this choice is admissible. We know that
$$r = -\sqrt{b_{yx} \times b_{xy}} = -\sqrt{\frac{3}{2} \times \frac{1}{6}} = -\sqrt{\frac{1}{4}} = -\frac{1}{2}$$
(r is negative since byx and bxy are negative.)
(iii) To find the estimated value of y for a given value of x, we use the line of regression of y on x:
$$y = -\frac{3}{2}x + 13$$
Putting x = 0, we have $y = -\frac{3}{2} \times 0 + 13 = 13$.
To find the estimated value of x for a given value of y, we use the line of regression of x on y:
$$x = -\frac{1}{6}y + \frac{31}{6}$$
Putting y = 13, we have
$$x = -\frac{1}{6} \times 13 + \frac{31}{6} = \frac{18}{6} = 3.$$
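Examples 8 and 9 follow the same recipe: solve the two equations for the means, then decide which line is the line of y on x by requiring 0 < byx·bxy ≤ 1. A minimal Python sketch of that recipe with exact fractions; passing each line a·x + b·y = c as a tuple (a, b, c) is our own convention, and both a and b are assumed non-zero.

```python
from fractions import Fraction


def analyse_regression_lines(line1, line2):
    """Lines are (a, b, c) meaning a*x + b*y = c, with a, b non-zero.
    Returns (x_bar, y_bar, b_yx, b_xy, r)."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    # Both lines pass through (x_bar, y_bar): solve by Cramer's rule.
    x_bar = Fraction(c1 * b2 - c2 * b1, det)
    y_bar = Fraction(a1 * c2 - a2 * c1, det)
    # Try each line as "y on x"; the other line is then "x on y".
    for (ay, by, _), (ax, bx, _) in ((line1, line2), (line2, line1)):
        b_yx = Fraction(-ay, by)      # slope when solved for y
        b_xy = Fraction(-bx, ax)      # slope when solved for x
        product = b_yx * b_xy         # equals r^2, so must lie in (0, 1]
        if 0 < product <= 1:
            r = float(product) ** 0.5
            return x_bar, y_bar, b_yx, b_xy, (r if b_yx > 0 else -r)
    raise ValueError("no assignment gives 0 < r^2 <= 1")


print(analyse_regression_lines((4, 3, -7), (3, 4, -8)))   # Example 8
print(analyse_regression_lines((3, 2, 26), (6, 1, 31)))   # Example 9
```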

7.5. STANDARD ERROR OF ESTIMATE (OR PREDICTION)

The square root of the arithmetic mean of the squared deviations of the predicted values from the observed values is known as the standard error of estimate (or prediction). It is given by
$$E_{yx} = \sqrt{\frac{\sum (y - y_p)^2}{n}},$$
where y is the actual value and $y_p$ is the predicted value; $E_{yx}$ is called the standard error of estimate (or prediction) of y on x.
Example. Find the standard error of estimate of y on x from the following data :

x 1 2 3 4 5

y 2 5 3 8 7

Solution. To find the standard error of estimate of y on x, we first find the line of regression of y on x.

x    y    x²    xy
1    2    1     2
2    5    4     10
3    3    9     9
4    8    16    32
5    7    25    35
Σx = 15    Σy = 25    Σx² = 55    Σxy = 88

Here, n = 5,
$$\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3, \qquad \bar{y} = \frac{\sum y}{n} = \frac{25}{5} = 5$$
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{5 \times 88 - 15 \times 25}{5 \times 55 - (15)^2} = \frac{65}{50} = 1.3$$
The line of regression of y on x is given by
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 5 = 1.3(x - 3) \Rightarrow y = 1.3x + 1.1, \text{ or } y_p = 1.3x + 1.1$$
Now

x    y    yp = 1.3x + 1.1          y – yp    (y – yp)²
1    2    1.3 × 1 + 1.1 = 2.4      – 0.4     0.16
2    5    1.3 × 2 + 1.1 = 3.7      1.3       1.69
3    3    1.3 × 3 + 1.1 = 5.0      – 2.0     4.00
4    8    1.3 × 4 + 1.1 = 6.3      1.7       2.89
5    7    1.3 × 5 + 1.1 = 7.6      – 0.6     0.36
                                             Σ(y – yp)² = 9.10

$$E_{yx} = \sqrt{\frac{\sum (y - y_p)^2}{n}} = \sqrt{\frac{9.10}{5}} = \sqrt{1.82} = 1.349.$$
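The same calculation as a Python sketch: fit the least-squares line of y on x, then average the squared residuals. As in the formula above, the divisor is n (some texts use n – 2 instead; this sketch does not). The function name is hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys):
    """E_yx = sqrt(sum((y - y_p)^2) / n), y_p from the least-squares line of y on x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b_yx * sx / n            # line passes through (x̄, ȳ)
    squared_residuals = [(y - (a + b_yx * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_residuals) / n)


print(round(standard_error_of_estimate([1, 2, 3, 4, 5], [2, 5, 3, 8, 7]), 3))  # 1.349
```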

7.6. COEFFICIENT OF DETERMINATION

The quantity r² is called the coefficient of determination. It is the ratio of the explained variation to the total variation:
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (y_p - \bar{y})^2}{\sum (y - \bar{y})^2}$$
The quantity (1 – r²) is called the coefficient of non-determination, and the quantity $\sqrt{1 - r^2}$ is called the coefficient of alienation.
Since r lies between – 1 and + 1, r² lies between 0 and 1, both inclusive.
Note. This gives another formula for calculating the correlation coefficient r: the square root gives |r|, and r takes the sign of the regression coefficients.
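To see the ratio form in action, the Python sketch below computes the explained and total variation for the data of the standard-error example (x = 1, ..., 5; y = 2, 5, 3, 8, 7) and compares the ratio with r² from the product-moment formula. The function name is hypothetical.

```python
import math


def determination(xs, ys):
    """Return (explained / total variation, r squared) for the line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)            # total variation
    b_yx = sxy / sxx
    y_pred = [y_bar + b_yx * (x - x_bar) for x in xs]
    explained = sum((yp - y_bar) ** 2 for yp in y_pred)
    r = sxy / math.sqrt(sxx * syy)
    return explained / syy, r * r


ratio, r_squared = determination([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(round(ratio, 4), round(r_squared, 4))   # both 0.65
print(round(1 - ratio, 4))                     # coefficient of non-determination
```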

7.7. PROPERTIES OF COEFFICIENT OF DETERMINATION

(i) As an index of fit, it is interpreted as the proportion of the total variance in y that is explained by x.
(ii) As a measure of linear relationship, it tells us how well the regression line fits the data.
(iii) As an indicator of the predictive accuracy of the regression equation: as a rule of thumb, r² should be at least 0.8; otherwise, the predictive accuracy is considered low.

EXERCISE 7.1

1. Find the line of regression of y on x for the following data :

x 10 9 8 7 6 4 3

y 8 12 7 10 8 9 6

2. Find the line of regression of y on x for the following data :

x 1 3 4 6 8 9 11 14

y 1 2 4 4 5 7 8 9
Estimate the value of y, when x = 10.
3. Find the regression lines for the following data :

x 6 2 10 4 8

y 9 11 5 8 7

4. Find the regression coefficient bxy between x and y for the following data :
Σx = 30, Σy = 42, Σxy = 199, Σx² = 184, Σy² = 318 and n = 6
5. Find the regression coefficients byx and bxy for the following data :
Σx = 24, Σy = 12, Σx² = 374, Σy² = 97, Σxy = 157 and n = 7.
Also, find the coefficient of correlation between x and y.
6. Find the regression line of x on y and estimate the value of x, when y = 5 from the following data :
Σx = 125, Σy = 100, Σx² = 1650, Σy² = 1500, Σxy = 50 and n = 25.
7. The following regression equations were obtained from a correlation table :
y = 0.516 x + 33.73 ; x = 0.512 y + 32.52
Find the value of
(i) the mean of x’s and the mean of y’s (ii) the correlation coefficient.
(iii) the coefficient of determination.
8. You are given the following data :
Series x y
Mean 18 100
Standard deviation 14 20
Correlation coefficient between x and y = 0.8.
Find (i) the regression coefficients byx and bxy.
(ii) the two regression lines.
(iii) estimate the value of y, when x = 70.
(iv) estimate the value of x, when y = 90.
9. If 4x – 5y + 33 = 0 and 20x – 9y – 107 = 0 are two lines of regression, find
(i) the mean values of x and y.
(ii) the regression coefficients byx and bxy.
(iii) the correlation coefficient between x and y.
(iv) the standard deviation of y, if variance of x is 9.
(v) the coefficient of determination.
10. Find the standard error of estimate of y on x for the following data :

x 1 3 4 6 8 9 11 14

y 1 2 4 4 5 7 8 9

11. In a partially destroyed record, for the estimation of the two lines of regression from a
bivariate data (x, y), the following results were available :
Regression coefficient of y on x = – 1.6, regression coefficient of x on y = – 0.4, standard
error of the estimate of y on x = 3.
Find (i) the coefficient of correlation between x and y (ii) the standard deviations of x and y (iii) the standard error of estimate of x on y.

Answers
1. y = 133/21 + (1/3)x   2. y = 6/11 + (7/11)x ; 6.91   3. y = 11.9 – 0.65x ; x = 16.4 – 1.3y
4. – 0.46   5. byx = 0.397 ; bxy = 1.516 ; r = 0.776
6. x = – (9/22)y + 146/22 ; 4.591   7. (i) 67.6, 68.61 (ii) 0.514 (iii) 0.264
8. (i) byx = 1.14 and bxy = 0.56 (ii) y = 1.14x + 79.41 ; x = 0.56y – 38 (iii) 159.21 (iv) 12.4
9. (i) 13, 17 (ii) byx = 4/5, bxy = 9/20 (iii) r = 0.6 (iv) σy = 4 (v) r² = 0.36
10. 0.564   11. (i) r = – 0.8 (ii) σx = 2.5, σy = 5 (iii) Exy = 1.5

8. STATISTICAL QUALITY CONTROL

STRUCTURE

Introduction
Causes of Variations
Methods of Statistical Quality Control
Advantages of Statistical Quality Control
Control Charts
Types of Control Charts
Control Charts for Variables
Control Charts for Attributes
(i) Control Chart for Fraction Defectives (p-chart)
(ii) Control Chart for Number of Defectives (np-chart)
(iii) Control Chart for Number of Defects (c-Chart)

8.1. INTRODUCTION

Statistical quality control (SQC) is one of the major areas of production management. It is a specialised professional technique used to maintain the technical efficiency of production processes. SQC is a simple statistical method for determining the extent to which quality goals are being met without necessarily checking every item produced, and for indicating whether or not the variations that occur exceed normal expectations. SQC enables us to decide whether to accept or reject a particular product.

8.2. CAUSES OF VARIATIONS

It is not possible to produce items of exactly the same quality in the continuous flow of any manufacturing process, so variations in the quality of the product are inevitable. These variations occur due to two types of causes :
(i) Chance or random causes. Some deviations from the desired specifications are bound to occur in the items produced, however efficient the production process may be. If the variation arises from some inherent pattern of variation and no cause can be assigned to it, it is called chance or random variation.
For instance, slight variations in temperature, pressure, humidity, etc. interact randomly to produce slight variations in the quality of the product. Chance variation is tolerable and does not materially affect the quality of a product. In such a situation, the process is said to be under statistical control.
(ii) Assignable causes. Assignable causes (also called non-random or systematic causes) can be easily identified. An assignable cause may occur at any stage of the process, and such causes can be easily removed. Assignable causes of variation may be due to defective raw material, negligence of the operators, improper handling of machines, faulty equipment, etc. In such a situation, the process is said to be out of control.

8.3. METHODS OF STATISTICAL QUALITY CONTROL

To control the quality characteristics of the product, there are two main methods :
1. Process control. The main aim in any production process is to control and maintain the quality of the product at the requisite standard during the manufacturing process. This is termed process control and is achieved through the use of control charts developed by W.A. Shewhart.
2. Product control. This technique is concerned with the inspection of already manufactured products to ascertain whether they are acceptable to the consumer or not. This is achieved through acceptance inspection, i.e., a sampling inspection plan; such sampling inspection is often termed product control.

8.4. ADVANTAGES OF STATISTICAL QUALITY CONTROL

1. SQC makes it possible to determine whether a deviation from the requisite standard occurring in the product during the manufacturing process is due to chance causes or to assignable causes.
2. SQC is extremely useful, particularly where the units are destroyed under inspection, e.g., testing the life of an electric bulb, the explosiveness of crackers and bombs, or the life of a battery cell.
3. SQC makes it possible to determine whether the quality standards are being met without inspecting every unit produced.
4. SQC helps to show whether the manufacturing process is under control or not, and if it has gone out of control, remedial measures can be applied.
5. SQC reduces the waste of time and material to the absolute minimum by giving an early warning about the occurrence of defects.
6. The greatest advantage is the low cost of inspection.
7. SQC minimizes the risk to the consumer as well as to the producer.
8. SQC protects the manufacturer against losses due to the rejection of manufactured products that would otherwise occur later on.
9. Efficient utilization of personnel, machines and materials results in higher production.
10. SQC helps remove bottlenecks in the production process.

8.5. CONTROL CHARTS

Control charts are devices for describing patterns of variation. They were developed by W.A. Shewhart of Bell Telephone Laboratories in 1924. Based on the theory of probability and sampling, they enable us to detect the presence of assignable causes of erratic variation in the process. These causes are then identified and eliminated, and the process is stabilized and controlled at the desired performance.
A control chart is a running graphical record of the performance of some quality characteristic. A control chart consists of the following three horizontal lines on the graph :
(i) a control or central line (CL) depicting the desired standard or level of the process
(ii) an upper control limit (UCL)
(iii) a lower control limit (LCL).

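[Figure: control chart showing the central line (CL) between the upper control limit (UCL) and the lower control limit (LCL); vertical axis: quality characteristics; horizontal axis: sample numbers; the band between the limits is marked "Under control", the regions beyond them "Out of control".]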

The control chart has a horizontal scale that represents the consecutive sample
number and a vertical scale that represents the quality characteristic of each sample.

8.6. TYPES OF CONTROL CHARTS

Control charts are of two types, depending on whether a given characteristic is measurable or not.
(i) Control Charts for Variables. These charts are used to achieve and
maintain an acceptable quality level for a process whose product can be subjected to
quantitative measurements.
(ii) Control Charts for Attributes. These charts are used to achieve and
maintain an acceptable quality level for a process whose product cannot be subjected
to quantitative measurements but can be classified as good or bad, acceptable or
non-acceptable.

8.7. CONTROL CHARTS FOR VARIABLES

The most common charts for variables are
(i) Control charts for sample means (x̄-charts)
(ii) Control charts for sample ranges (R-charts)

The various steps to construct x̄- and R-charts are as follows :
1. A random sample of size n (n is usually 4 or 5 units) is taken during the manufacturing process over a period of time and the quality measurements x1, x2, ..., xn are noted.
2. The sample mean x̄ and sample range R are calculated by using
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad R = x_{\max} - x_{\min},$$
where $x_{\max}$ and $x_{\min}$ are the largest and smallest of the measurements x1, x2, ..., xn respectively.
3. If the process is found to be satisfactory, k successive samples (k usually varies from 20 to 30) are taken, and for each sample the mean x̄ and range R are calculated. Then the combined mean $\bar{\bar{x}}$ and the mean range $\bar{R}$ are found by using
$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_k}{k} = \frac{1}{k}\sum_{i=1}^{k} \bar{x}_i \quad \text{and} \quad \bar{R} = \frac{R_1 + R_2 + \cdots + R_k}{k} = \frac{1}{k}\sum_{i=1}^{k} R_i$$
4. Calculation of control limits for the x̄-chart :
$$\text{Control or central line (CL)} = \bar{\bar{x}}$$
$$\text{Upper control limit (UCL)} = \bar{\bar{x}} + \frac{3\bar{R}}{d_2\sqrt{n}} \quad \text{or} \quad \bar{\bar{x}} + A_2\bar{R}$$
$$\text{Lower control limit (LCL)} = \bar{\bar{x}} - \frac{3\bar{R}}{d_2\sqrt{n}} \quad \text{or} \quad \bar{\bar{x}} - A_2\bar{R},$$
where d2 and A2 can be found from the table depending upon the sample size n (the two forms agree because $A_2 = 3/(d_2\sqrt{n})$).
5. Calculation of control limits for the R-chart :
$$\text{Control or central line (CL)} = \bar{R}, \qquad \text{UCL} = D_4\bar{R}, \qquad \text{LCL} = D_3\bar{R},$$
where D3 and D4 can be found from the table depending upon the sample size n.
6. The natural tolerance limits (upper and lower tolerance limits) for individual values of x are calculated by using
$$\text{UTL}_x = \bar{\bar{x}} + \frac{3\bar{R}}{d_2}, \qquad \text{LTL}_x = \bar{\bar{x}} - \frac{3\bar{R}}{d_2}$$
The process is said to be capable of meeting the customer’s specifications if these natural tolerance limits fall within the customer’s specifications.
The process capability is
$$6\sigma = \frac{6\bar{R}}{d_2},$$
where σ is the process standard deviation.
Table

Sample size (n)   A2     D3     D4     d2
2                 1.88   0.00   3.27   1.13
3                 1.02   0.00   2.57   1.69
4                 0.73   0.00   2.28   2.06
5                 0.58   0.00   2.11   2.33
6                 0.48   0.00   2.00   2.53
7                 0.42   0.08   1.92   2.70
8                 0.37   0.14   1.86   2.85
9                 0.34   0.18   1.82   2.97
10                0.31   0.22   1.78   3.08
11                0.29   0.26   1.74   3.17
12                0.27   0.28   1.72   3.26
13                0.25   0.31   1.69   3.34
14                0.24   0.33   1.67   3.41
15                0.22   0.35   1.65   3.47
16                0.21   0.36   1.64   3.53
17                0.20   0.38   1.62   3.59
18                0.19   0.39   1.61   3.64
19                0.19   0.40   1.60   3.69
20                0.18   0.41   1.59   3.74
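The steps for x̄- and R-charts can be sketched in Python as follows. The constants are copied from the table for n = 2 to 5 only, the two subgroups in the usage line are made-up illustrative data (in practice k is usually 20 to 30), and the function names are our own.

```python
import math

# (A2, D3, D4, d2) from the table, for subgroup sizes 2 to 5 only.
SQC_CONSTANTS = {
    2: (1.88, 0.00, 3.27, 1.13),
    3: (1.02, 0.00, 2.57, 1.69),
    4: (0.73, 0.00, 2.28, 2.06),
    5: (0.58, 0.00, 2.11, 2.33),
}


def xbar_r_limits(subgroups):
    """Return ((LCL, CL, UCL) for the x-bar chart, (LCL, CL, UCL) for the R chart).
    All subgroups must have the same size n, with n a key of SQC_CONSTANTS."""
    n = len(subgroups[0])
    a2, d3, d4, _ = SQC_CONSTANTS[n]
    k = len(subgroups)
    means = [sum(s) / n for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_mean = sum(means) / k        # x double bar
    r_bar = sum(ranges) / k            # mean range R bar
    xbar_chart = (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar)
    r_chart = (d3 * r_bar, r_bar, d4 * r_bar)
    return xbar_chart, r_chart


def xbar_limits_d2(grand_mean, r_bar, n, d2):
    """x-bar chart limits in the equivalent form grand_mean ± 3·R̄ / (d2·√n)."""
    half_width = 3 * r_bar / (d2 * math.sqrt(n))
    return grand_mean - half_width, grand_mean, grand_mean + half_width


print(xbar_r_limits([[10, 12, 11, 13], [12, 14, 13, 11]]))
print(xbar_limits_d2(138.6, 7.4, 12, 3.258))
```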

8.8. CONTROL CHARTS FOR ATTRIBUTES

Sometimes it is impossible to determine the quality of a product by measurement. The product is then classified as good or bad, acceptable or non-acceptable. At times, the product is inspected for defects. Such characteristics are called attributes.
The most common charts for attributes are
(i) Control chart for fraction defectives (p-chart)
(ii) Control chart for number of defectives (np-chart)
(iii) Control chart for number of defects (c-chart).

8.9. (i) CONTROL CHART FOR FRACTION DEFECTIVES (p-CHART)

Let n be the sample size taken from the production process at different time intervals. If d is the number of defectives in a sample of size n, then the fraction defective in this sample is given by
$$p = \frac{d}{n} \quad \text{or} \quad d = np$$
If $\bar{p}$ represents the average fraction defective over all the samples (k samples) inspected, then
$$\bar{p} = \frac{\text{Total number of defectives in all the samples inspected}}{\text{Total number of items inspected in all the samples}}$$
The binomial distribution is used to construct the p-chart. By the binomial distribution, the standard deviation $\sigma_p$ is given by
$$\sigma_p = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
Control limits for the p-chart are given by
$$CL_p = \bar{p}$$
$$UCL_p = \bar{p} + 3\sigma_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
$$LCL_p = \bar{p} - 3\sigma_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
Since the number of defectives (or the fraction defective) cannot be negative, if the LCL comes out to be negative, it is taken as zero.
To construct the p-chart, p-values are plotted on the y-axis and sample numbers on the x-axis. If any point lies outside the control limits, it is concluded that the process is not under control; otherwise it is under control.
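A minimal Python sketch of the p-chart rule, assuming every sample has the same size n (with unequal sizes σp would differ from sample to sample, which this sketch does not handle). The defective counts in the usage line are invented for illustration, and the function name is hypothetical.

```python
import math


def p_chart(defectives, n):
    """p-chart for samples of equal size n; `defectives` lists d for each sample.
    Returns (LCL, CL, UCL, sample numbers falling outside the limits)."""
    p_bar = sum(defectives) / (n * len(defectives))
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma_p)    # a negative LCL is taken as zero
    ucl = p_bar + 3 * sigma_p
    outside = [i + 1 for i, d in enumerate(defectives) if not lcl <= d / n <= ucl]
    return lcl, p_bar, ucl, outside


# Hypothetical data: 4 samples of 50 items each.
print(p_chart([2, 3, 1, 14], 50))
```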

8.9. (ii) CONTROL CHART FOR NUMBER OF DEFECTIVES (np-CHART)

If n is the sample size and d is the number of defectives in this sample, then d = np, where p is the fraction defective in the sample.
Let $n\bar{p}$ represent the average number of defectives per sample of constant size, i.e.,
$$n\bar{p} = \frac{\text{Total number of defective items in all the samples inspected}}{\text{Number of samples inspected}}$$
Then the standard deviation $\sigma_{np}$ is given by
$$\sigma_{np} = n\sigma_p = n\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{n\bar{p}(1 - \bar{p})}$$
Control limits for the np-chart are given by
$$CL_{np} = n\bar{p}$$
$$UCL_{np} = n\bar{p} + 3\sigma_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})}$$
$$LCL_{np} = n\bar{p} - 3\sigma_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}$$
Since the number of defectives cannot be negative, if the LCL comes out to be negative, it is taken as zero. To construct the np-chart, np (i.e., d) values are plotted on the y-axis and sample numbers on the x-axis. If any point lies outside the control limits, it is concluded that the process is not under control; otherwise it is under control.
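The np-chart limits differ from the p-chart only by the factor n. A minimal Python sketch for samples of constant size n; the counts in the usage line are hypothetical.

```python
import math


def np_chart(defectives, n):
    """np-chart for samples of constant size n. Returns (LCL, CL, UCL)."""
    np_bar = sum(defectives) / len(defectives)   # average defectives per sample
    p_bar = np_bar / n
    sigma_np = math.sqrt(np_bar * (1 - p_bar))
    return max(0.0, np_bar - 3 * sigma_np), np_bar, np_bar + 3 * sigma_np


print(np_chart([5, 10, 15, 10], 100))   # hypothetical counts from samples of 100
```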

8.9. (iii) CONTROL CHART FOR NUMBER OF DEFECTS (c-CHART)
There are many situations in which it is useful to know the number of defects in an item or product after it has been classified as defective. In such situations the c-chart is used. The sample for a c-chart may be a single unit, such as a radio, a match box, a computer or an aircraft, or a group of units. The number of defects may be 0, 1, 2, .... These values are discrete, so they follow a discrete distribution; the Poisson distribution is used to construct the c-chart. Since the mean and variance of a Poisson distribution are equal, the standard deviation $\sigma_c$ is given by

$$\sigma_c = \sqrt{\bar{c}},$$

where $\bar{c}$ is the average number of defects per sample:

$$\bar{c} = \frac{\text{Total number of defects in all the samples inspected}}{\text{Number of samples inspected}}$$

Control limits for the c-chart are given by

$$CL_c = \bar{c}$$

$$UCL_c = \bar{c} + 3\sigma_c = \bar{c} + 3\sqrt{\bar{c}}$$

$$LCL_c = \bar{c} - 3\sigma_c = \bar{c} - 3\sqrt{\bar{c}}$$
Since the number of defects cannot be negative, an LCL that comes out negative is taken as zero. To construct the c-chart, the c values are plotted on the y-axis against the sample numbers on the x-axis. If any point lies outside the control limits, the process is concluded to be not under control; otherwise it is under control.
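The c-chart limits can be sketched the same way. The function name `c_chart_limits` is hypothetical; it uses the Poisson property that the variance equals the mean, so $\sigma_c = \sqrt{\bar{c}}$. The usage lines use the missing-rivet counts from Example 17.

```python
import math


def c_chart_limits(defects: list[int]) -> tuple[float, float, float]:
    """Return (LCL, CL, UCL) for a c-chart from defect counts per inspection unit."""
    c_bar = sum(defects) / len(defects)  # average number of defects per unit
    sigma_c = math.sqrt(c_bar)  # Poisson: variance equals the mean
    ucl = c_bar + 3 * sigma_c
    lcl = max(0.0, c_bar - 3 * sigma_c)  # a defect count cannot be negative
    return lcl, c_bar, ucl


if __name__ == "__main__":
    # Missing rivets on 12 aircraft (Example 17)
    rivets = [7, 15, 13, 18, 10, 14, 13, 10, 20, 11, 22, 15]
    lcl, cl, ucl = c_chart_limits(rivets)
    print(f"CL = {cl:.2f}, UCL = {ucl:.3f}, LCL = {lcl:.3f}")
```

This prints `CL = 14.00, UCL = 25.225, LCL = 2.775`; Example 17 rounds the UCL to 25.23.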

SOLVED EXAMPLES

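Example 1. Using the following data, calculate the control limits for the $\bar{x}$-chart:
n = 12, $\bar{\bar{x}}$ = 138.6, $\bar{R}$ = 7.4 and $d_2$ = 3.258.
Solution. The control limits for the $\bar{x}$-chart are calculated as:

$$CL = \bar{\bar{x}} = 138.6$$

$$UCL = \bar{\bar{x}} + \frac{3\bar{R}}{d_2\sqrt{n}} = 138.6 + \frac{3 \times 7.4}{3.258\sqrt{12}} = 138.6 + 1.967 = 140.567$$

$$LCL = \bar{\bar{x}} - \frac{3\bar{R}}{d_2\sqrt{n}} = 138.6 - \frac{3 \times 7.4}{3.258\sqrt{12}} = 138.6 - 1.967 = 136.633$$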
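Example 1 estimates the process standard deviation as $\bar{R}/d_2$, so the half-width of the $\bar{x}$ limits is $3\bar{R}/(d_2\sqrt{n})$. A minimal Python sketch of that arithmetic follows; the function name `xbar_limits_from_d2` is a hypothetical choice.

```python
import math


def xbar_limits_from_d2(grand_mean: float, r_bar: float, n: int, d2: float):
    """3-sigma x-bar chart limits, estimating sigma as R-bar / d2."""
    half_width = 3 * r_bar / (d2 * math.sqrt(n))  # 3 * sigma / sqrt(n)
    return grand_mean - half_width, grand_mean, grand_mean + half_width


if __name__ == "__main__":
    lcl, cl, ucl = xbar_limits_from_d2(138.6, 7.4, 12, 3.258)  # Example 1 data
    print(f"CL = {cl}, UCL = {ucl:.3f}, LCL = {lcl:.3f}")
```

The script prints `CL = 138.6, UCL = 140.567, LCL = 136.633`.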
Example 2. Using the following data, calculate the control limits for the R-chart:
n = 4, $\bar{R}$ = 9.60, $d_2$ = 2.059 and $d_3$ = 0.880.
Solution. The control limits for the R-chart are calculated as:

$$CL = \bar{R} = 9.60$$

$$UCL = \bar{R} + \frac{3 d_3 \bar{R}}{d_2} = 9.60 + \frac{3 \times 0.880 \times 9.60}{2.059} = 9.60 + 12.3089 = 21.91$$

$$LCL = \bar{R} - \frac{3 d_3 \bar{R}}{d_2} = 9.60 - \frac{3 \times 0.880 \times 9.60}{2.059} = 9.60 - 12.3089 = -2.71$$

Since a range cannot be negative, the LCL is taken as zero.
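The same calculation as a minimal Python sketch (the function name `r_limits_from_d2_d3` is hypothetical). It estimates the standard deviation of R as $d_3\bar{R}/d_2$ and clamps the negative LCL to zero.

```python
def r_limits_from_d2_d3(r_bar: float, d2: float, d3: float):
    """3-sigma R-chart limits with sigma_R estimated as d3 * R-bar / d2."""
    half_width = 3 * d3 * r_bar / d2
    ucl = r_bar + half_width
    lcl = max(0.0, r_bar - half_width)  # a range cannot be negative
    return lcl, r_bar, ucl


if __name__ == "__main__":
    lcl, cl, ucl = r_limits_from_d2_d3(9.60, 2.059, 0.880)  # Example 2 data
    print(f"CL = {cl:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
```

This prints `CL = 9.60, UCL = 21.91, LCL = 0.00`. The tabulated constants used in the later examples follow from the same idea: $D_4 = 1 + 3d_3/d_2$ and $D_3 = \max(0, 1 - 3d_3/d_2)$, which for n = 4 give $D_4 \approx 2.282$ and $D_3 = 0$.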
Example 3. A machine is set to deliver packets of a given weight. Ten samples of size 5 each were recorded, and the mean and range of each sample are as follows:

Sample No.          1    2    3    4    5    6    7    8    9    10
Mean ($\bar{x}$)    49   45   48   53   39   47   46   39   51   45
Range (R)           7    5    7    9    5    8    8    6    7    6

Calculate the control limits of the $\bar{x}$- and R-charts. Comment on the state of control without drawing the charts.
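Solution. Here, n = 5, k = 10, $A_2$ = 0.58, $D_3$ = 0 and $D_4$ = 2.11 (from the table for n = 5).

$$\bar{\bar{x}} = \frac{\sum \bar{x}}{k} = \frac{49 + 45 + \dots + 45}{10} = \frac{462}{10} = 46.2$$

$$\bar{R} = \frac{\sum R}{k} = \frac{68}{10} = 6.8$$

For the $\bar{x}$-chart:

$$CL = \bar{\bar{x}} = 46.2$$

$$UCL = \bar{\bar{x}} + A_2\bar{R} = 46.2 + 0.58 \times 6.8 = 46.2 + 3.944 = 50.144$$

$$LCL = \bar{\bar{x}} - A_2\bar{R} = 46.2 - 0.58 \times 6.8 = 46.2 - 3.944 = 42.256$$

For the R-chart:

$$CL = \bar{R} = 6.8$$

$$UCL = D_4\bar{R} = 2.11 \times 6.8 = 14.348$$

$$LCL = D_3\bar{R} = 0 \times 6.8 = 0$$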
For the $\bar{x}$-chart, samples 4 and 9 (means 53 and 51) lie above the UCL and samples 5 and 8 (mean 39 each) lie below the LCL, so the process is not under control.
For the R-chart, all the points lie within the UCL and LCL, so the process is under control.
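The comparison in Example 3 can be checked mechanically. This Python sketch uses the constants $A_2$ = 0.58, $D_3$ = 0 and $D_4$ = 2.11 quoted in the example; the helper name `outside` is an illustrative choice, not part of the text.

```python
means = [49, 45, 48, 53, 39, 47, 46, 39, 51, 45]
ranges = [7, 5, 7, 9, 5, 8, 8, 6, 7, 6]
A2, D3, D4 = 0.58, 0.0, 2.11  # constants for n = 5 as used in Example 3

k = len(means)
x_dbar = sum(means) / k  # grand mean
r_bar = sum(ranges) / k  # average range

x_lcl, x_ucl = x_dbar - A2 * r_bar, x_dbar + A2 * r_bar
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar


def outside(values, lcl, ucl):
    """1-based sample numbers whose value falls outside [lcl, ucl]."""
    return [i for i, v in enumerate(values, start=1) if v < lcl or v > ucl]


print(f"x-bar chart: LCL = {x_lcl:.3f}, CL = {x_dbar:.1f}, UCL = {x_ucl:.3f}")
print("  samples outside:", outside(means, x_lcl, x_ucl))
print(f"R chart: LCL = {r_lcl:.1f}, CL = {r_bar:.1f}, UCL = {r_ucl:.3f}")
print("  samples outside:", outside(ranges, r_lcl, r_ucl))
```

The $\bar{x}$-chart check flags samples 4, 5, 8 and 9, while the R-chart check flags none.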
Example 4. A company manufactures screws to a nominal diameter of 0.500 ± 0.030 cm. Five samples of size 3 each were taken from the manufactured lot at different lengths. The readings are as follows:

Sample No.   Measurements x (in cm)
             1        2        3
1            0.488    0.489    0.505
2            0.494    0.495    0.499
3            0.498    0.515    0.487
4            0.492    0.509    0.514
5            0.490    0.508    0.499

Calculate the control limits of the $\bar{x}$- and R-charts. Comment on the state of control by drawing the charts.
Solution. Calculation of $\bar{x}$:

For sample 1, $\bar{x}_1 = \frac{0.488 + 0.489 + 0.505}{3} = \frac{1.482}{3} = 0.494$
For sample 2, $\bar{x}_2 = \frac{0.494 + 0.495 + 0.499}{3} = \frac{1.488}{3} = 0.496$
For sample 3, $\bar{x}_3 = \frac{0.498 + 0.515 + 0.487}{3} = \frac{1.500}{3} = 0.500$
For sample 4, $\bar{x}_4 = \frac{0.492 + 0.509 + 0.514}{3} = \frac{1.515}{3} = 0.505$
For sample 5, $\bar{x}_5 = \frac{0.490 + 0.508 + 0.499}{3} = \frac{1.497}{3} = 0.499$

$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \bar{x}_4 + \bar{x}_5}{5} = \frac{0.494 + 0.496 + 0.500 + 0.505 + 0.499}{5} = \frac{2.494}{5} = 0.4988$$

Calculation of R:

For sample 1, $R_1 = x_{\max} - x_{\min} = 0.505 - 0.488 = 0.017$
For sample 2, $R_2 = 0.499 - 0.494 = 0.005$
For sample 3, $R_3 = 0.515 - 0.487 = 0.028$
For sample 4, $R_4 = 0.514 - 0.492 = 0.022$
For sample 5, $R_5 = 0.508 - 0.490 = 0.018$

$$\bar{R} = \frac{R_1 + R_2 + R_3 + R_4 + R_5}{5} = \frac{0.017 + 0.005 + 0.028 + 0.022 + 0.018}{5} = \frac{0.090}{5} = 0.018$$

Control limits for the $\bar{x}$-chart (since $A_2$ = 1.02 for n = 3):

$$CL = \bar{\bar{x}} = 0.4988$$

$$UCL = \bar{\bar{x}} + A_2\bar{R} = 0.4988 + 1.02 \times 0.018 = 0.4988 + 0.01836 = 0.5172$$

$$LCL = \bar{\bar{x}} - A_2\bar{R} = 0.4988 - 1.02 \times 0.018 = 0.4988 - 0.01836 = 0.4804$$

Control limits for the R-chart (since $D_3$ = 0 and $D_4$ = 2.57 for n = 3):

$$CL = \bar{R} = 0.018$$

$$UCL = D_4\bar{R} = 2.57 \times 0.018 = 0.0463$$

$$LCL = D_3\bar{R} = 0 \times 0.018 = 0$$

[Figure ($\bar{x}$-chart): sample mean $\bar{x}$ against sample number; CL = 0.4988, UCL = 0.5172, LCL = 0.4804]

It is clear from the figure that all the values of $\bar{x}$ lie within the UCL and LCL, so the process is under control.

[Figure (R-chart): sample range R against sample number; CL = 0.018, UCL = 0.0463, LCL = 0]

It is clear from the figure that all the values of R lie within the UCL and LCL, so the process is under control.
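When raw measurements are given, as in Example 4, the sample means and ranges come first. This Python sketch starts from the readings and uses the constants quoted in the example ($A_2$ = 1.02, $D_3$ = 0, $D_4$ = 2.57 for n = 3). The variable names are illustrative.

```python
samples = [
    [0.488, 0.489, 0.505],
    [0.494, 0.495, 0.499],
    [0.498, 0.515, 0.487],
    [0.492, 0.509, 0.514],
    [0.490, 0.508, 0.499],
]
A2, D3, D4 = 1.02, 0.0, 2.57  # constants for n = 3 as used in Example 4

means = [sum(s) / len(s) for s in samples]  # sample means x-bar_i
ranges = [max(s) - min(s) for s in samples]  # sample ranges R_i = x_max - x_min
x_dbar = sum(means) / len(means)  # grand mean
r_bar = sum(ranges) / len(ranges)  # average range

x_limits = (x_dbar - A2 * r_bar, x_dbar + A2 * r_bar)
r_limits = (D3 * r_bar, D4 * r_bar)

x_ok = all(x_limits[0] <= m <= x_limits[1] for m in means)
r_ok = all(r_limits[0] <= r <= r_limits[1] for r in ranges)
print(f"grand mean = {x_dbar:.4f}, R-bar = {r_bar:.3f}")
print(f"x-bar limits = ({x_limits[0]:.4f}, {x_limits[1]:.4f}), all inside: {x_ok}")
print(f"R limits = ({r_limits[0]:.4f}, {r_limits[1]:.4f}), all inside: {r_ok}")
```

Both checks report every sample inside its limits, matching the conclusion drawn from the two charts.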
Example 5. The following are the mean lengths and ranges of lengths of a finished product from 10 samples, each of size 5. The specification limits for length are 200 ± 5 cm. Construct the $\bar{x}$- and R-charts and examine whether the process is under control.

Sample No.          1     2     3     4     5     6     7     8     9     10
Mean ($\bar{x}$)    201   198   202   200   203   204   199   196   199   201
Range (R)           5     0     7     3     4     7     2     8     5     6

Assume for n = 5, $A_2$ = 0.577, $D_3$ = 0 and $D_4$ = 2.115.


Solution. The specification limits for length are given as 200 ± 5 cm. Hence, the process mean is taken as known: $\mu$ = 200.

$$\bar{R} = \frac{5 + 0 + 7 + 3 + 4 + 7 + 2 + 8 + 5 + 6}{10} = \frac{47}{10} = 4.7$$

Control limits for the $\bar{x}$-chart:

$$CL = \mu = 200$$

$$UCL = \mu + A_2\bar{R} = 200 + 0.577 \times 4.7 = 200 + 2.712 = 202.712$$

$$LCL = \mu - A_2\bar{R} = 200 - 0.577 \times 4.7 = 200 - 2.712 = 197.288$$

Control limits for the R-chart:

$$CL = \bar{R} = 4.7$$

$$UCL = D_4\bar{R} = 2.115 \times 4.7 = 9.941$$

$$LCL = D_3\bar{R} = 0 \times 4.7 = 0$$

[Figure ($\bar{x}$-chart): sample mean $\bar{x}$ against sample number; CL = 200, UCL = 202.712, LCL = 197.288]

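It is clear from the figure that three points lie outside the control limits: samples 5 and 6 (means 203 and 204) lie above the UCL, and sample 8 (mean 196) lies below the LCL. Hence the process is not in control.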

[Figure (R-chart): sample range R against sample number; CL = 4.7, UCL = 9.941, LCL = 0]

It is clear from the figure that all the values of R lie within the UCL and LCL, so the process is under control.
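Example 5 differs from Example 3 in one respect: the $\bar{x}$ limits are centred on the known mean $\mu$ = 200 taken from the specification, not on the grand mean of the samples. A Python sketch of that variant follows, with the constants given in the example. The variable names are illustrative.

```python
MU = 200.0  # process mean taken from the specification 200 ± 5 cm
means = [201, 198, 202, 200, 203, 204, 199, 196, 199, 201]
ranges = [5, 0, 7, 3, 4, 7, 2, 8, 5, 6]
A2, D3, D4 = 0.577, 0.0, 2.115  # constants for n = 5 given in Example 5

r_bar = sum(ranges) / len(ranges)  # average range
x_lcl, x_ucl = MU - A2 * r_bar, MU + A2 * r_bar  # centred on the known mean
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar

x_out = [i for i, m in enumerate(means, start=1) if not x_lcl <= m <= x_ucl]
r_out = [i for i, r in enumerate(ranges, start=1) if not r_lcl <= r <= r_ucl]
print(f"x-bar chart: ({x_lcl:.3f}, {x_ucl:.3f}), samples outside: {x_out}")
print(f"R chart: ({r_lcl:.3f}, {r_ucl:.3f}), samples outside: {r_out}")
```

The $\bar{x}$ check flags samples 5, 6 and 8, and the R check flags none.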
Example 6. A bulb manufacturing company, ABC, samples fused bulbs, taking a sample of 5 every hour. The values in each sample of five have been arranged in increasing order, one sample per column, as follows:

Sample No.   1    2    3    4    5    6     7     8    9    10    11    12
             45   42   20   35   43   52    61    20   16   70    65    60
             68   46   25   55   52   70    65    25   28   100   85    75
             75   65   82   69   57   75    70    32   40   110   95    94
             77   70   87   78   60   80    90    55   65   115   100   109
             88   92   87   85   79   120   110   65   85   160   110   140

Construct the $\bar{x}$- and R-charts and examine whether the process is under control.
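Solution.
For sample 1, $\bar{x}_1 = \frac{45 + 68 + 75 + 77 + 88}{5} = \frac{353}{5} = 70.6$
Similarly,
$\bar{x}_2 = \frac{315}{5} = 63$, $\bar{x}_3 = \frac{301}{5} = 60.2$, $\bar{x}_4 = \frac{322}{5} = 64.4$,
$\bar{x}_5 = \frac{291}{5} = 58.2$, $\bar{x}_6 = \frac{397}{5} = 79.4$, $\bar{x}_7 = \frac{396}{5} = 79.2$,
$\bar{x}_8 = \frac{197}{5} = 39.4$, $\bar{x}_9 = \frac{234}{5} = 46.8$, $\bar{x}_{10} = \frac{555}{5} = 111$,
$\bar{x}_{11} = \frac{455}{5} = 91$, $\bar{x}_{12} = \frac{478}{5} = 95.6$

For sample 1, $R_1 = x_{\max} - x_{\min} = 88 - 45 = 43$
Similarly, $R_2$ = 50, $R_3$ = 67, $R_4$ = 50, $R_5$ = 36, $R_6$ = 68, $R_7$ = 49, $R_8$ = 45, $R_9$ = 69, $R_{10}$ = 90, $R_{11}$ = 45, $R_{12}$ = 80

$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_{12}}{12} = \frac{858.8}{12} = 71.57$$

$$\bar{R} = \frac{R_1 + R_2 + \dots + R_{12}}{12} = \frac{692}{12} = 57.67$$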
Control limits for the $\bar{x}$-chart (since $A_2$ = 0.577 for n = 5):

$$CL = \bar{\bar{x}} = 71.57$$

$$UCL = \bar{\bar{x}} + A_2\bar{R} = 71.57 + 0.577 \times 57.67 = 71.57 + 33.27 = 104.84$$

$$LCL = \bar{\bar{x}} - A_2\bar{R} = 71.57 - 0.577 \times 57.67 = 71.57 - 33.27 = 38.3$$

Control limits for the R-chart (since $D_3$ = 0 and $D_4$ = 2.115 for n = 5):

$$CL = \bar{R} = 57.67$$

$$UCL = D_4\bar{R} = 2.115 \times 57.67 = 121.97$$

$$LCL = D_3\bar{R} = 0 \times 57.67 = 0$$

[Figure ($\bar{x}$-chart): sample mean $\bar{x}$ against sample number; CL = 71.57, UCL = 104.84, LCL = 38.3]

It is clear from the figure that one point, for sample 10 (mean 111), lies above the UCL, so the process is not in control.

[Figure (R-chart): sample range R against sample number; CL = 57.67, UCL = 121.97, LCL = 0]

It is clear from the figure that all the values of R lie within the UCL and LCL, so the process is under control.
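In Example 6 the data table is laid out with one sample per column, so a program has to transpose it before computing means and ranges. This Python sketch does so with `zip(*rows)` and then applies the constants used in the example; the variable names are illustrative.

```python
# Each printed row holds the 1st, 2nd, ..., 5th smallest value of every sample,
# so each column of the table is one sample of five readings.
rows = [
    [45, 42, 20, 35, 43, 52, 61, 20, 16, 70, 65, 60],
    [68, 46, 25, 55, 52, 70, 65, 25, 28, 100, 85, 75],
    [75, 65, 82, 69, 57, 75, 70, 32, 40, 110, 95, 94],
    [77, 70, 87, 78, 60, 80, 90, 55, 65, 115, 100, 109],
    [88, 92, 87, 85, 79, 120, 110, 65, 85, 160, 110, 140],
]
A2, D3, D4 = 0.577, 0.0, 2.115  # constants for n = 5 as used in Example 6

samples = [list(col) for col in zip(*rows)]  # transpose: one list per sample
means = [sum(s) / len(s) for s in samples]
ranges = [max(s) - min(s) for s in samples]
x_dbar = sum(means) / len(means)  # grand mean
r_bar = sum(ranges) / len(ranges)  # average range

x_lcl, x_ucl = x_dbar - A2 * r_bar, x_dbar + A2 * r_bar
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar
x_out = [i for i, m in enumerate(means, start=1) if not x_lcl <= m <= x_ucl]
r_out = [i for i, r in enumerate(ranges, start=1) if not r_lcl <= r <= r_ucl]

print(f"grand mean = {x_dbar:.2f}, R-bar = {r_bar:.2f}")
print(f"x-bar limits = ({x_lcl:.2f}, {x_ucl:.2f}), samples outside: {x_out}")
print(f"R limits = ({r_lcl:.2f}, {r_ucl:.2f}), samples outside: {r_out}")
```

Only sample 10 is flagged on the $\bar{x}$-chart, and no sample is flagged on the R-chart.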
Example 7. In a factory producing spark plugs, the numbers rejected in the inspection of 10 lots of size 100 each are given below:

Lot No.   Number rejected   Fraction rejected
1         4                 0.040
2         7                 0.070
3         8                 0.080
4         2                 0.020
5         3                 0.030
6         4                 0.040
7         5                 0.050
8         8                 0.080
9         6                 0.060
10        10                0.100

Construct an appropriate control chart and state whether the process is in control.
Solution. Since the data are given as fractions rejected, the p-chart is suitable for this situation.

$$\bar{p} = \frac{\text{Total number rejected}}{\text{Total number of items inspected in all samples}} = \frac{57}{10 \times 100} = 0.057$$

$$CL_p = \bar{p} = 0.057$$

$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.057 + 3\sqrt{\frac{0.057(1-0.057)}{100}} = 0.057 + 0.070 = 0.127$$

$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.057 - 3\sqrt{\frac{0.057(1-0.057)}{100}} = 0.057 - 0.070 = -0.013$$

Since $LCL_p$ is negative, it is taken as zero.

[Figure (p-chart): fraction rejected p against lot number; CL = 0.057, UCL = 0.127, LCL = 0]

It is clear from the figure that all the values of fraction rejected p lie within the UCL and LCL, so the process is under control.
Example 8. Based on 15 subgroups, each of size 200, taken at intervals of 45 minutes from a manufacturing process, the average fraction defective was found to be 0.068. Calculate the values of CL, UCL and LCL.
Solution. Since the average fraction defective is given, we calculate the control limits of the p-chart.

$$CL_p = \bar{p} = 0.068$$

$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.068 + 3\sqrt{\frac{0.068(1-0.068)}{200}} = 0.068 + 0.053 = 0.121$$

$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.068 - 3\sqrt{\frac{0.068(1-0.068)}{200}} = 0.068 - 0.053 = 0.015$$

Example 9. Samples of 100 tubes are drawn randomly from the output of a process that produces several thousand units daily. Sample items are inspected for quality, and defective tubes are rejected. The results of 15 samples are as follows:

Sample No.               1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
No. of defective tubes   8    10   13   9    8    10   14   6    10   13   18   15   12   14   9

Construct a control chart for fraction defective, and examine whether the process is under control.
Solution.

Sample No. No. of defective tubes Fraction defective

1 8 0.08
2 10 0.10
3 13 0.13
4 9 0.09
5 8 0.08
6 10 0.10
7 14 0.14
8 6 0.06
9 10 0.10
10 13 0.13
11 18 0.18
12 15 0.15
13 12 0.12
14 14 0.14
15 9 0.09

$$\bar{p} = \frac{\text{Total number of defectives}}{\text{Total number of items inspected in all samples}} = \frac{169}{15 \times 100} = 0.113$$

Control limits for the p-chart:

$$CL_p = \bar{p} = 0.113$$

$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.113 + 3\sqrt{\frac{0.113(1-0.113)}{100}} = 0.113 + 0.095 = 0.208$$

$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.113 - 3\sqrt{\frac{0.113(1-0.113)}{100}} = 0.113 - 0.095 = 0.018$$
[Figure (p-chart): fraction defective p against sample number; CL = 0.113, UCL = 0.208, LCL = 0.018]

It is clear from the figure that all the values of fraction defective p lie within the UCL and LCL, so the process is under control.
Example 10. The following data refer to visual defects found during the inspection of the first 10 samples, of size 50 each, from a lot of two-wheelers manufactured by an automobile company:

Sample No.          1   2   3   4   5   6   7   8   9   10
No. of defectives   4   3   2   3   4   4   4   1   3   2

Draw the p-chart and examine whether the process is under control.
Solution.

Sample No. No. of defectives Fraction defectives

1 4 0.08
2 3 0.06
3 2 0.04
4 3 0.06
5 4 0.08
6 4 0.08
7 4 0.08
8 1 0.02
9 3 0.06
10 2 0.04

$$\bar{p} = \frac{\text{Total number of defectives}}{\text{Total number of items inspected in all samples}} = \frac{30}{10 \times 50} = 0.06$$

Control limits for the p-chart:

$$CL_p = \bar{p} = 0.06$$

$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.06 + 3\sqrt{\frac{0.06(1-0.06)}{50}} = 0.06 + 0.1008 = 0.1608$$

$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.06 - 3\sqrt{\frac{0.06(1-0.06)}{50}} = 0.06 - 0.1008 = -0.0408$$

Since $LCL_p$ is negative, it is taken as zero.

[Figure (p-chart): fraction defective p against sample number; CL = 0.06, UCL = 0.1608, LCL = 0]

It is clear from the figure that all the values of fraction defective p lie within the UCL and LCL, so the process is under control.
Example 11. Ten samples of the hourly production of a mass-produced item are taken, and the number of defectives in each sample is noted. On the basis of these data, obtain the control limits of the control chart for fraction defective.

Sample No. 1 2 3 4 5 6 7 8 9 10

Size of sample 148 160 155 156 161 167 164 160 156 173

No. of defectives 7 6 8 8 5 9 8 8 7 10

Solution. Here, the sample sizes are different, so the average sample size is determined first:
Number of items examined = 1600
Number of samples = 10

$$\text{Average sample size } (n) = \frac{1600}{10} = 160$$

$$\bar{p} = \frac{\text{Total number of defectives}}{\text{Total number of items inspected in all samples}} = \frac{76}{1600} = 0.0475$$

Control limits for the p-chart:

$$CL_p = \bar{p} = 0.0475$$

$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.0475 + 3\sqrt{\frac{0.0475(1-0.0475)}{160}} = 0.0475 + 0.0504 = 0.0979$$

$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.0475 - 3\sqrt{\frac{0.0475(1-0.0475)}{160}} = 0.0475 - 0.0504 = -0.0029$$

Since $LCL_p$ is negative, it is taken as zero.
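Example 11 handles unequal sample sizes by using the average sample size n = 160 in the p-chart formula. A Python sketch of that approach follows; it also computes each sample's own fraction defective so the points can be compared with the limits. The variable names are illustrative.

```python
import math

sizes = [148, 160, 155, 156, 161, 167, 164, 160, 156, 173]
defectives = [7, 6, 8, 8, 5, 9, 8, 8, 7, 10]

n_avg = sum(sizes) / len(sizes)  # average sample size, as in Example 11
p_bar = sum(defectives) / sum(sizes)  # pooled fraction defective
sigma = math.sqrt(p_bar * (1 - p_bar) / n_avg)
ucl = p_bar + 3 * sigma
lcl = max(0.0, p_bar - 3 * sigma)  # a negative LCL is taken as zero

fractions = [d / s for d, s in zip(defectives, sizes)]  # plotted points
outside = [i for i, p in enumerate(fractions, start=1) if not lcl <= p <= ucl]
print(f"n = {n_avg:.0f}, p-bar = {p_bar:.4f}, UCL = {ucl:.4f}, LCL = {lcl}")
print("samples outside:", outside)
```

With these data no sample falls outside the limits. Using the average n is the simplification the text adopts. A common alternative when sizes differ widely is to compute separate limits for each sample from its own size.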
Example 12. In a blade manufacturing factory, 1000 blades are examined daily. The following data show the number of defective blades found each day. Draw the np-chart and comment on the state of control.

Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

No. of defective blades 9 10 12 8 7 15 10 12 10 8 7 13 14 15 16

Solution. Here, n = 1000, k = 15

$$n\bar{p} = \frac{\text{Total number of defectives}}{\text{Number of samples inspected}} = \frac{166}{15} = 11.067$$

$$\bar{p} = \frac{166}{1000 \times 15} = 0.011$$

Control limits for the np-chart:

$$CL_{np} = n\bar{p} = 11.067$$

$$UCL_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})} = 11.067 + 3\sqrt{11.067(1-0.011)} = 11.067 + 9.925 = 20.992$$

$$LCL_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})} = 11.067 - 3\sqrt{11.067(1-0.011)} = 11.067 - 9.925 = 1.142$$

[Figure (np-chart): number of defective blades against day; CL = 11.067, UCL = 20.992, LCL = 1.142]

It is clear from the figure that all the numbers of defectives lie within the UCL and LCL, so the process is under control.
Example 13. An inspection of 10 samples of size 400 each from 10 lots revealed the following numbers of defectives: 17, 15, 14, 26, 9, 4, 19, 12, 9, 15.
Draw the control chart for the number of defectives and examine whether the process is under control.
Solution. Here, n = 400 and k = 10

$$n\bar{p} = \frac{\text{Total number of defectives}}{\text{Number of samples inspected}} = \frac{140}{10} = 14$$

$$\bar{p} = \frac{140}{400 \times 10} = 0.035$$

Control limits for the np-chart:

$$CL_{np} = 14$$

$$UCL_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})} = 14 + 3\sqrt{14(1-0.035)} = 14 + 11.027 = 25.027$$

$$LCL_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})} = 14 - 3\sqrt{14(1-0.035)} = 14 - 11.027 = 2.973$$

[Figure (np-chart): number of defectives against sample number; CL = 14, UCL = 25.027, LCL = 2.973]
It is clear from the figure that the point for sample 4 (26 defectives) lies above the UCL, so the process is not under control.
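The decision in Example 13 can be reproduced with the following Python sketch, which computes the np-chart limits from the counts and lists the samples that fall outside them. The variable names are illustrative.

```python
import math

defectives = [17, 15, 14, 26, 9, 4, 19, 12, 9, 15]
n = 400  # common sample size

np_bar = sum(defectives) / len(defectives)  # average defectives per sample
p_bar = np_bar / n  # average fraction defective
sigma_np = math.sqrt(np_bar * (1 - p_bar))
ucl = np_bar + 3 * sigma_np
lcl = max(0.0, np_bar - 3 * sigma_np)  # a count cannot be negative

outside = [i for i, d in enumerate(defectives, start=1) if not lcl <= d <= ucl]
print(f"CL = {np_bar}, UCL = {ucl:.3f}, LCL = {lcl:.3f}")
print("samples outside:", outside)
```

It prints `CL = 14.0, UCL = 25.027, LCL = 2.973` and flags only sample 4.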
Example 14. An inspection of 10 samples of size 100 each revealed the following data:

Sample No.          1   2   3   4   5   6   7   8   9   10
No. of defectives   2   1   1   3   2   3   4   2   2   0

Draw the control chart for the number of defectives (np-chart) and examine whether the process is under control.
Solution. Here, n = 100, k = 10

$$n\bar{p} = \frac{\text{Total number of defectives}}{\text{Number of samples inspected}} = \frac{20}{10} = 2$$

$$\bar{p} = \frac{20}{100 \times 10} = 0.02$$

Control limits for the np-chart:

$$CL_{np} = n\bar{p} = 2$$

$$UCL_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})} = 2 + 3\sqrt{2(1-0.02)} = 2 + 4.2 = 6.2$$

$$LCL_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})} = 2 - 3\sqrt{2(1-0.02)} = 2 - 4.2 = -2.2$$

Since $LCL_{np}$ is negative, it is taken as zero.

[Figure (np-chart): number of defectives against sample number; CL = 2, UCL = 6.2, LCL = 0]

It is clear from the figure that all the numbers of defectives lie within the UCL and LCL, so the process is under control.

Example 15. During an inspection of pieces of cloth of equal length, the following numbers of defects were observed:
2, 3, 4, 0, 5, 6, 7, 4, 3, 2.
Draw a control chart for the number of defects and comment on whether the process is under control.
Solution. The average number of defects in the 10 samples is given by

$$\bar{c} = \frac{\text{Total number of defects}}{\text{No. of samples inspected}} = \frac{2+3+4+0+5+6+7+4+3+2}{10} = \frac{36}{10} = 3.6$$
Control limits for the c-chart:

$$CL_c = \bar{c} = 3.6$$

$$UCL_c = \bar{c} + 3\sqrt{\bar{c}} = 3.6 + 3\sqrt{3.6} = 3.6 + 5.692 = 9.292$$

$$LCL_c = \bar{c} - 3\sqrt{\bar{c}} = 3.6 - 3\sqrt{3.6} = 3.6 - 5.692 = -2.092$$

Since $LCL_c$ is negative, it is taken as zero.

[Figure (c-chart): number of defects c against sample number; CL = 3.6, UCL = 9.292, LCL = 0]

It is clear from the figure that all the values of the number of defects (c) lie within the UCL and LCL, so the process is under control.
Example 16. The numbers of complaints received daily by an organization are as follows:

Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Complaints 2 3 0 1 9 2 0 0 4 2 0 7 0 2 4

Draw a suitable control chart and examine whether the process is under control.
Solution. For the given problem, the suitable control chart is the c-chart. Let the number of complaints be denoted by c.

$$\bar{c} = \frac{\text{Total number of complaints}}{\text{Number of days}} = \frac{36}{15} = 2.4$$

Control limits for the c-chart:

$$CL_c = \bar{c} = 2.4$$

$$UCL_c = \bar{c} + 3\sqrt{\bar{c}} = 2.4 + 3\sqrt{2.4} = 2.4 + 4.648 = 7.048$$

$$LCL_c = \bar{c} - 3\sqrt{\bar{c}} = 2.4 - 3\sqrt{2.4} = 2.4 - 4.648 = -2.248$$

Since $LCL_c$ is negative, it is taken as zero.

[Figure (c-chart): number of complaints against day number; CL = 2.4, UCL = 7.048, LCL = 0]

It is clear from the figure that the value of c for day 5 (9 complaints) lies above the UCL, so the process is out of control.
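A Python sketch of the c-chart check in Example 16 follows. It computes $\bar{c}$, applies $\bar{c} \pm 3\sqrt{\bar{c}}$ with the negative LCL clamped to zero, and lists the days outside the limits. The variable names are illustrative.

```python
import math

complaints = [2, 3, 0, 1, 9, 2, 0, 0, 4, 2, 0, 7, 0, 2, 4]

c_bar = sum(complaints) / len(complaints)  # average complaints per day
ucl = c_bar + 3 * math.sqrt(c_bar)
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))  # a count cannot be negative

days_outside = [day for day, c in enumerate(complaints, start=1) if not lcl <= c <= ucl]
print(f"CL = {c_bar}, UCL = {ucl:.3f}, LCL = {lcl}")
print("days outside:", days_outside)
```

Only day 5 is flagged. Day 12 (7 complaints) lies just under the UCL of about 7.048.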
Example 17. The following table shows the number of missing rivets observed during the inspection of 12 aircraft. Find the control limits for the number-of-defects chart and comment on the state of control.

Aircraft Number 1 2 3 4 5 6 7 8 9 10 11 12

No. of missing rivets 7 15 13 18 10 14 13 10 20 11 22 15

Solution.

$$\bar{c} = \frac{\text{Total number of missing rivets}}{\text{Number of aircraft inspected}} = \frac{168}{12} = 14$$

Control limits for the c-chart:

$$CL_c = \bar{c} = 14$$

$$UCL_c = \bar{c} + 3\sqrt{\bar{c}} = 14 + 3\sqrt{14} = 14 + 11.225 = 25.23$$

$$LCL_c = \bar{c} - 3\sqrt{\bar{c}} = 14 - 3\sqrt{14} = 14 - 11.225 = 2.775$$

[Figure (c-chart): number of missing rivets against aircraft number; CL = 14, UCL = 25.23, LCL = 2.775]

It is clear from the figure that all the numbers of missing rivets lie within the UCL and LCL, so the process is under control.

EXERCISE 8.1

1. In the manufacturing process of a certain item, 20 subgroups of size 4 each were examined, and it was found that $\sum \bar{x}$ = 41.283 and $\sum R$ = 0.335. Compute the control limits for the $\bar{x}$- and R-charts.
2. A machine is set to deliver packets of a given weight. Ten samples of size 5 each were recorded, and the mean and range of each sample are as follows:

Sample No.          1    2    3    4    5    6    7    8    9    10
Mean ($\bar{x}$)    15   17   15   18   17   14   18   15   17   16
Range (R)           7    7    4    9    8    7    12   4    11   5

Calculate the control limits of the $\bar{x}$- and R-charts and comment on the state of control.
3. A company manufactures a product which is packed in cans, using automatic filling equipment. It takes a sample of 5 cans every hour and measures the filling (in grams). The measurements for the last 5 samples are:

Sample No.   Individual measurements
             1       2       3       4       5
1            1001    998     1002    1002    999
2            999     998     1001    998     999
3            995     1001    1003    1002    1002
4            1000    998     999     1001    1002
5            994     1000    996     996     999

Calculate the control limits of the $\bar{x}$- and R-charts and comment on the state of control.

4. The following data show the sample mean $\bar{x}$ and range R for 10 samples of size 5 each.

Sample No.          1      2      3      4      5      6     7      8     9      10
Mean ($\bar{x}$)    11.2   11.8   10.8   11.6   11.0   9.6   10.4   9.6   10.6   10.0
Range (R)           7      4      8      5      7      4     8      4     7      9

Construct the $\bar{x}$- and R-charts and examine whether the process is under control.
(Given for n = 5, $A_2$ = 0.577, $D_3$ = 0, $D_4$ = 2.115)
5. The following data show the sample mean $\bar{x}$ and range R for 10 samples of size 5 each.

Sample No.          1    2    3    4    5    6    7    8    9    10
Mean ($\bar{x}$)    43   49   37   44   45   37   51   46   43   47
Range (R)           5    6    5    7    7    4    8    6    4    6

Construct the $\bar{x}$- and R-charts and comment on the state of control.


6. The average fraction defective of a large sample of products is 0.1537. Calculate the control limits, given that the subgroup size is 2000.
7. A company produces fuses for automobile electric systems. Five hundred of the fuses are
tested per day for 30 days. The following table gives the number of defective fuses found
per day for the 30 days.

Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

No. of defectives 3 3 3 3 1 1 1 1 6 1 1 1 5 4 6

Day 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

No. of defectives 3 6 2 7 3 2 3 6 1 2 3 1 4 4 5

Calculate the central line, upper control limit and lower control limit for p-chart.
8. A daily sample of 30 items was taken over a period of 14 days in order to establish
attributes control limits. If 21 defectives were found, what should be the upper and
lower control limits of the proportion of defectives ?
9. In a factory producing an item, the numbers rejected in the inspection of 20 lots of size 100 each are given below:

Lot No. No. rejected Fraction rejected Lot No. No. rejected Fraction rejected

1 5 0.050 11 4 0.040
2 10 0.100 12 7 0.070
3 12 0.120 13 8 0.080
4 8 0.080 14 2 0.020
5 6 0.060 15 3 0.030
6 5 0.050 16 4 0.040
7 6 0.060 17 5 0.050
8 3 0.030 18 8 0.080
9 3 0.030 19 6 0.060
10 5 0.050 20 10 0.100

Construct an appropriate control chart and state whether the process is in control.

10. The table given below shows the results of the production and inspection of 100 castings a day for 20 days. Based on these data, construct a p-chart and state whether the process is in control.

Day   No. of defectives   Day   No. of defectives
1     6                   11    33
2     11                  12    39
3     20                  13    25
4     22                  14    18
5     9                   15    17
6     40                  16    14
7     12                  17    13
8     10                  18    5
9     31                  19    7
10    30                  20    9

11. The following data refer to defects found during inspection of the first 10 samples of size
100 each.

Sample No. 1 2 3 4 5 6 7 8 9 10

No. of defectives 4 8 11 3 11 7 7 16 12 6

Calculate the control limits for np-chart and state whether the process is in control.
12. Twenty samples each of size 10 were inspected. The number of defectives found in each
of them is given below :

Sample No. 1 2 3 4 5 6 7 8 9 10

No. of defectives 0 1 0 3 9 2 0 7 0 1

Sample No. 11 12 13 14 15 16 17 18 19 20

No. of defectives 1 0 0 3 1 0 0 2 0 0

Construct an appropriate chart and state whether the process is in control.


13. The following data refer to number of defectives found during inspection of first 10 samples
of size 100 each :

Sample No. 1 2 3 4 5 6 7 8 9 10

No. of defectives 4 8 11 3 11 7 7 16 12 6

Obtain the upper and lower control limits of np-chart and state whether the process is in
control.
14. The following data refer to number of defectives found on 24 consecutive production
days in daily samples of 400 items :

Production day 1 2 3 4 5 6 7 8 9 10 11 12

No. of defectives 20 10 20 24 22 18 38 8 24 54 50 18

Production day 13 14 15 16 17 18 19 20 21 22 23 24

No. of defectives 24 30 16 28 20 8 22 22 52 6 20 22

Draw np-chart and state whether the process is in control.

15. Ten pieces of cloth from different rolls of equal length contained the following numbers of defects:
1, 7, 3, 1, 2, 4, 8, 2, 0, 3
Draw a control chart for the number of defects and state whether the process is in control.
16. The numbers of complaints received daily by an organization are as follows:

Day 1 2 3 4 5 6 7 8 9 10

Complaints 2 3 4 0 5 6 7 4 3 2

Draw a control chart for the number of defects (complaints) and comment on whether the process is in control.
17. The numbers of mistakes made by an accounts clerk are as follows:

Week No. 1 2 3 4 5 6 7 8 9 10

No. of mistakes 1 0 2 0 1 0 1 0 1 2

Week No. 11 12 13 14 15 16 17 18 19 20

No. of mistakes 3 3 1 0 0 7 1 0 1 0

Draw an appropriate control chart and state whether the clerk's mistakes are under control.

Answers
1. For $\bar{x}$-chart: CL = 2.06415, UCL = 2.076, LCL = 2.0519
   For R-chart: CL = 0.01675, UCL = 0.03819, LCL = 0
2. For $\bar{x}$-chart: CL = 16.2, UCL = 20.492, LCL = 11.908
   For R-chart: CL = 7.4, UCL = 15.614, LCL = 0
   Process is under control by both charts.
3. For $\bar{x}$-chart: CL = 999.32, UCL = 1001.772, LCL = 996.872
   For R-chart: CL = 4.4, UCL = 9.306, LCL = 0
   Process under control using both $\bar{x}$- and R-charts.
4. For $\bar{x}$-chart: CL = 10.66, UCL = 14.295, LCL = 7.025; process under control
   For R-chart: CL = 6.3, UCL = 13.3245, LCL = 0; process under control.
5. For $\bar{x}$-chart: CL = 44.2, UCL = 47.564, LCL = 40.836; out of control
   For R-chart: CL = 5.8, UCL = 12.267, LCL = 0; under control
6. CLp = 0.1537, UCLp = 0.17788, LCLp = 0.1295.
7. CLp = 0.0061, UCLp = 0.0166, LCLp = 0.
8. CLp = 0.05, UCLp = 0.17, LCLp = 0.
9. CLp = 0.06, UCLp = 0.1311, LCLp = 0 ; under control.
10. CLp = 0.186, UCLp = 0.303, LCLp = 0.069 ; out of control.
11. CLnp = 8.5, UCLnp = 16.87, LCLnp = 0.13; under control.
12. CLnp = 1.5, UCLnp = 4.89, LCLnp = 0 ; out of control.
13. UCLnp = 16.87, LCLnp = 0.13 ; under control.
14. CLnp = 24, UCLnp = 38.25, LCLnp = 9.75 ; out of control.
15. CLc = 3.1, UCLc = 8.38, LCLc = 0 ; under control.
16. CLc = 3.6, UCLc = 9.292, LCLc = 0; under control.
17. CLc = 1.2, UCLc = 4.49, LCLc = 0 ; not under control.