Statistical Analysis
1. PROBABILITY AND EXPECTATION
STRUCTURE
Introduction
Some Important Definitions
Theorems on Probability
Addition Theorems on Probability
Conditional Probability
Multiplication Theorems on Probability
Addition Theorem for Independent Events
Bayes' Theorem
1.1. INTRODUCTION
The words probability and chance are quite familiar to everyone. Many a time we come across
statements like "It will probably rain today", "The chances of hitting the target are very few"
or "It is possible that he may top the examination". In these statements, words such as
probably, chances and possible convey a sense of uncertainty about the occurrence of some
event. Ordinarily, it appears that there cannot be any exact measurement of such uncertainty,
but in Mathematical Statistics we have methods for calculating the degree of certainty of
events as a numerical value, under certain conditions. When we perform experiments in science
and engineering repeatedly under identical conditions, we get almost the same result. There
also exist experiments in which the outcome may be different even if the experiment is
performed under identical conditions. In such experiments, the outcome of each trial depends
on chance.
Self-Instructional Material 1
Random Experiment : A random experiment is an experiment in which all possible outcomes are
known and which can be repeated under identical conditions, but in which the outcome of any
particular trial cannot be predicted in advance.
e.g. Tossing a coin or throwing a die is a random experiment.
Sample Space : The sample space of a random experiment is the set of all possible outcomes
of the experiment. The possible outcomes are called sample points. The sample space is
generally denoted by the letter S.
e.g. In throwing a fair die, the sample space is S = {1, 2, 3, 4, 5, 6}. In tossing two
unbiased coins, the sample space is S = {HH, HT, TH, TT}.
Event : Any subset of the sample space is defined as an event. An event is
called an elementary (or simple) event if it contains only one sample point. In the
experiment of throwing a die, the event A of getting 2 is a simple event. We write A =
{2}. Also an event is called an impossible event if it can never occur. In the above
experiment, event B = {7} of getting 7 is an impossible event. An event which is sure to
occur is called a certain event.
e.g. In throwing a die, the event of getting a number less than 7 is a certain event.
Exhaustive Events : The total number of possible outcomes of a trial is known as the number
of exhaustive events or cases.
e.g. In tossing a coin, there are two exhaustive events, head and tail. In throwing a die,
there are 6 exhaustive cases, since any one of the six faces may turn up.
Note. In throwing n dice, the number of exhaustive cases is $6^n$.
Equally Likely Events : Events are said to be equally likely, if there is no
reason to expect any one in preference to any other.
e.g. If a card is drawn from a well-shuffled pack, any of the 52 cards may turn up, so the 52
different cases are equally likely.
Favourable Events : The cases which ensure the happening of the required event are said to
be favourable events (or cases).
e.g. In throwing a die, the number of cases favourable to the appearance of a multiple of 2
is three, viz. 2, 4 and 6. In drawing two cards from a pack of 52 cards, the number of cases
favourable to drawing 2 aces is ${}^4C_2 = 6$.
Independent Events : Events are said to be independent if the happening (or
non-happening) of one event is not affected by the happening (or non-happening) of
others.
e.g. In case a card is drawn from a pack of well shuffled cards and is not replaced, then
the second draw of the card is dependent on the first draw. However, if the first card
drawn is replaced before drawing the second card, the result of the second draw is
independent of the first draw.
Mutually Exclusive Events : Two events are said to be mutually exclusive if
they cannot occur together i.e., if one occurs then other cannot.
e.g. In tossing a coin, the events head and tail are mutually exclusive, since if the
outcome is tail, the possibility of getting head in the same trial is ruled out.
Compound Events : Events obtained by combining together two or more
elementary events are known as the compound events.
e.g. In throwing a die, getting 5 or 6 is called a compound event.
Mathematical (or Classical) Definition of Probability : If an event can happen in n ways
which are equally likely, exhaustive and mutually exclusive, and m of these n ways are
favourable to an event A, then the probability of happening of A is given by
$$p = P(A) = \frac{m}{n}$$
If A happens in m ways, it fails in $(n - m)$ ways, so the probability of its failure is
$$q = P(\bar{A}) = \frac{n - m}{n} = 1 - \frac{m}{n} = 1 - p$$
Hence $p + q = 1$, i.e., $P(A) + P(\bar{A}) = 1$, with $0 \le p \le 1$ and $0 \le q \le 1$.
If P(A) = 1, then A is called a certain event. If P(A) = 0, then A is called an impossible
event.
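The classical definition can be checked by direct counting. The following is a minimal sketch in Python, assuming the listed outcomes are equally likely; the function name `classical_probability` is hypothetical and not from the text. It uses exact fractions so that p + q = 1 holds without rounding.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = m/n, assuming every outcome in sample_space is equally likely."""
    sample_space = set(sample_space)
    favourable = set(event) & sample_space  # m: outcomes favourable to A
    return Fraction(len(favourable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
p = classical_probability(even, die)          # probability of A (even number)
q = classical_probability(die - even, die)    # probability of not-A
print(p, q, p + q)                            # 1/2 1/2 1

print(classical_probability({7}, die))                       # impossible event: 0
print(classical_probability({x for x in die if x < 7}, die)) # certain event: 1
```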
Statistical (or Empirical) Definition of Probability : If in n trials an event A happens m
times, then the probability of happening of A is given by
$$p = P(A) = \lim_{n \to \infty} \frac{m}{n}$$
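The limit in the statistical definition can be illustrated, though not proved, by simulation. The sketch below assumes a fair coin modelled with Python's `random` module (seeded for reproducibility); the relative frequency m/n of heads fluctuates for small n and settles near 1/2 as n grows.

```python
import random

def relative_frequency(trials, seed=1):
    """m/n: fraction of simulated fair-coin tosses that show a head."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(trials) if rng.random() < 0.5)
    return heads / trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

A simulation only approximates the limit; the exact values printed depend on the seed.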
THEOREMS ON PROBABILITY
Corollary : If A and B are mutually exclusive events, then $P(A \cap B) = 0$; therefore,
$$P(A \cup B) = P(A) + P(B)$$
This is the addition theorem for mutually exclusive events.
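1.3.2. Theorem 2 (Addition Theorem for three events)
If A, B and C are three events associated with a random experiment, then
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$$
Proof. Let $D = B \cup C$. Then
$$P(A \cup B \cup C) = P(A \cup D) = P(A) + P(D) - P(A \cap D) \quad \ldots(1) \text{ (by Th. 1)}$$
Now, $A \cap D = A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, so
$$\begin{aligned} P(A \cap D) &= P[(A \cap B) \cup (A \cap C)] \\ &= P(A \cap B) + P(A \cap C) - P[(A \cap B) \cap (A \cap C)] \\ &= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C) \quad \ldots(2) \text{ (by Th. 1)} \end{aligned}$$
[since $(A \cap B) \cap (A \cap C) = A \cap B \cap C$]
Also
$$P(D) = P(B \cup C) = P(B) + P(C) - P(B \cap C) \quad \ldots(3)$$
From (1), (2) and (3), we get
$$\begin{aligned} P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(B \cap C) - [P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)] \\ &= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \end{aligned}$$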
Corollary : If A, B and C are mutually exclusive events, then
$P(A \cap B) = P(B \cap C) = P(A \cap C) = P(A \cap B \cap C) = 0$, and therefore
$$P(A \cup B \cup C) = P(A) + P(B) + P(C)$$
This is the addition theorem for three mutually exclusive events.
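Theorem 2 can be checked on a small example by comparing the probability of the union, counted directly, with the right-hand side of the formula. The three die events below are an illustrative choice, not taken from the text.

```python
from fractions import Fraction

S = set(range(1, 7))                # throwing one die
A = {x for x in S if x % 2 == 0}    # even number
B = {x for x in S if x % 3 == 0}    # multiple of 3
C = {x for x in S if x > 4}         # greater than 4

def P(event):
    """Classical probability on the equally likely outcomes of S."""
    return Fraction(len(event), len(S))

direct = P(A | B | C)               # count the union directly
formula = (P(A) + P(B) + P(C)
           - P(A & B) - P(B & C) - P(A & C)
           + P(A & B & C))          # addition theorem for three events
print(direct, formula)              # 5/6 5/6
```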
1.4. CONDITIONAL PROBABILITY
Let A and B be two events associated with a random experiment. Then the probability of
occurrence of A under the condition that B has already occurred, where $P(B) \neq 0$, is
called the conditional probability of A given B and is denoted by P(A/B).
Thus, P(A/B) = probability of occurrence of A given that B has already occurred.
Similarly, P(B/A) = probability of occurrence of B given that A has already occurred,
where $P(A) \neq 0$.
1.5. MULTIPLICATION THEOREMS ON PROBABILITY
1.5.1. Theorem 1
If A and B are two events associated with a random experiment, then
$$P(A \cap B) = P(A)\,P(B/A), \quad \text{if } P(A) \neq 0$$
or
$$P(A \cap B) = P(B)\,P(A/B), \quad \text{if } P(B) \neq 0$$
Proof. Let S be the sample space associated with the given random experiment, and suppose
S contains n elementary events. Let $m_1$, $m_2$ and m be the numbers of elementary events
favourable to A, B and $A \cap B$ respectively. Then
$$P(A) = \frac{m_1}{n}, \quad P(B) = \frac{m_2}{n} \quad \text{and} \quad P(A \cap B) = \frac{m}{n}.$$
[Figure: Venn diagram of events A and B in the sample space S]
Since $m_1$ elementary events are favourable to A, out of which m are favourable to B,
therefore
$$P(B/A) = \frac{m}{m_1}.$$
Similarly,
$$P(A/B) = \frac{m}{m_2}.$$
Now,
$$P(A \cap B) = \frac{m}{n} = \frac{m}{m_1} \cdot \frac{m_1}{n} = P(B/A)\,P(A) \quad \ldots(1)$$
and
$$P(A \cap B) = \frac{m}{n} = \frac{m}{m_2} \cdot \frac{m_2}{n} = P(A/B)\,P(B) \quad \ldots(2)$$
Note 1. From (1) and (2) in the above theorem, we find that
$$P(B/A) = \frac{P(A \cap B)}{P(A)} \quad \text{and} \quad P(A/B) = \frac{P(A \cap B)}{P(B)}$$
$P(A \cap B)$ is also written as P(AB).
2. For three events A, B, C, the probability of their simultaneous occurrence is
$$P(A \cap B \cap C) = P(ABC) = P(A)\,P(B/A)\,P(C/AB) = P(A)\,P(B/A)\,P(C/(A \cap B))$$
If $A_1, A_2, \ldots, A_n$ are n events, then
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2/A_1)\,P(A_3/A_1 \cap A_2) \cdots P(A_n/A_1 \cap A_2 \cap \cdots \cap A_{n-1})$$
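To see the multiplication theorem for several events at work, the sketch below uses an illustrative event not taken from the text: drawing three aces in succession without replacement from a 52-card pack. The product of conditional probabilities is compared with a direct count of favourable 3-card selections.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2/A1) P(A3/A1 ∩ A2)
# A_i = "the i-th card drawn is an ace", no replacement between draws.
p_chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Same event by direct counting: 4C3 ways to pick the aces out of 52C3 selections.
p_count = Fraction(comb(4, 3), comb(52, 3))

print(p_chain, p_count)  # 1/5525 1/5525
```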
1.6.1. Theorem
If $A_1, A_2, \ldots, A_n$ are n independent events associated with a random experiment,
then
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - P(\bar{A}_1)\,P(\bar{A}_2) \cdots P(\bar{A}_n)$$
SOLVED EXAMPLES
Example 1. Find the probability of getting a tail in a throw of a coin.
Solution. Clearly the sample space is S = {H, T}, and the event of getting a tail is E = {T}.
Clearly n(E) = 1 and n(S) = 2, so the probability of getting a tail is given by
$$P(E) = \frac{n(E)}{n(S)} = \frac{1}{2}$$
Alternatively, if E is the required event, then E = {T} and
$$P(E) = \frac{\text{No. of cases favourable to E}}{\text{Total number of cases}} = \frac{1}{2}.$$
Example 2. Three coins are tossed, find the probability of getting at least two
heads.
Solution. Clearly the sample space
S = {HHH, HHT, HTH, THH, THT, TTH, HTT, TTT}
If E is the required event, then
E = {HHH, HHT, HTH, THH}
$$P(E) = \frac{\text{No. of cases favourable to E}}{\text{Total number of cases}} = \frac{4}{8} = \frac{1}{2}.$$
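The sample space of Example 2 can be generated rather than listed by hand. A short sketch using `itertools.product` to enumerate all 2³ head/tail sequences and count those with at least two heads:

```python
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes
E = [s for s in S if s.count("H") >= 2]            # at least two heads
print(len(S), sorted(E), Fraction(len(E), len(S)))
# 8 ['HHH', 'HHT', 'HTH', 'THH'] 1/2
```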
Example 3. If there are two children in a family, find the probability that there
is at least one girl in the family.
Solution. Let S be the sample space, then
S = {BB, BG, GB, GG},
where B and G stand for Boy and Girl respectively.
If E is the required event, then
E = {BG, GB, GG}
$$P(E) = \frac{3}{4}.$$
Example 4. What is the chance that a leap year, selected at random, will contain 53
Fridays?
Solution. There are 366 days in a leap year and 366 = (7 × 52) + 2. So a leap year
consists of 52 complete weeks, and hence contains at least 52 Fridays, plus two more days.
The possible combinations for the remaining two days are as follows :
(i) Sunday and Monday (ii) Monday and Tuesday
(iii) Tuesday and Wednesday (iv) Wednesday and Thursday
(v) Thursday and Friday (vi) Friday and Saturday
(vii) Saturday and Sunday.
Of these seven equally likely cases, only (v) and (vi) are favourable.
$$\text{Hence, the required probability} = \frac{2}{7}.$$
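The argument of Example 4 can be confirmed by brute force: walk through all 366 days for each possible starting weekday and count the Fridays. This sketch assumes, as the text does, that the seven possible starting weekdays are equally likely; `fridays_in_leap_year` is a hypothetical helper name.

```python
from fractions import Fraction

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def fridays_in_leap_year(first_day):
    """Count Fridays in a 366-day year whose first day is DAYS[first_day]."""
    return sum(1 for d in range(366) if DAYS[(first_day + d) % 7] == "Friday")

# Assumption: each of the 7 starting weekdays is equally likely.
favourable = sum(1 for start in range(7) if fridays_in_leap_year(start) == 53)
print(Fraction(favourable, 7))  # 2/7
```

Only a year starting on Thursday or Friday yields 53 Fridays, matching cases (v) and (vi).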
Example 5. What is the probability of getting an even number in the throw of an
unbiased die ?
Solution. Clearly, there are 6 equally likely possible outcomes 1, 2, 3, 4, 5, 6.
Hence, the sample space S = {1, 2, 3, 4, 5, 6}
Let E be the required event, then we have
E = {2, 4, 6}
$$\text{Hence, } P(E) = \frac{3}{6} = \frac{1}{2}.$$
Example 6. A bag contains 7 red, 12 white and 4 green balls. What is the
probability that
(i) 3 balls drawn are all white and
(ii) 3 balls drawn are one of each colour.
Solution. Total number of balls = 7 + 12 + 4 = 23.
3 balls out of these 23 balls can be drawn in
$${}^{23}C_3 = \frac{23 \times 22 \times 21}{3 \times 2 \times 1} = 1771 \text{ ways}$$
The sample space for this experiment contains 1771 sample points, i.e., n(S) = 1771.
(i) Let $E_1$ = event that the 3 balls drawn are all white. Now 3 white balls can be drawn
from 12 white balls in
$${}^{12}C_3 = \frac{12 \times 11 \times 10}{3 \times 2 \times 1} = 220 \text{ ways}$$
so $n(E_1) = 220$ and
$$P(E_1) = \frac{n(E_1)}{n(S)} = \frac{220}{1771} = \frac{20}{161}.$$
(ii) Let $E_2$ = event that the three balls are one of each colour.
Now 1 red ball can be drawn out of 7 red balls in ${}^7C_1 = 7$ ways, 1 white ball can be
drawn out of the 12 white balls in ${}^{12}C_1 = 12$ ways and 1 green ball can be drawn out
of the 4 green balls in ${}^4C_1 = 4$ ways.
So 3 balls, one of each colour, can be drawn in 7 × 12 × 4 = 336 ways, i.e., $n(E_2) = 336$ and
$$P(E_2) = \frac{n(E_2)}{n(S)} = \frac{336}{1771} = \frac{48}{253}.$$
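The counts in Example 6 map directly onto Python's `math.comb` (available from Python 3.8). A short check, with `Fraction` reducing the answers to lowest terms:

```python
from fractions import Fraction
from math import comb

red, white, green = 7, 12, 4
n_S = comb(red + white + green, 3)               # 23C3 ways to draw 3 balls
p_all_white = Fraction(comb(white, 3), n_S)      # 12C3 / 23C3
p_one_each = Fraction(red * white * green, n_S)  # 7C1 x 12C1 x 4C1 / 23C3
print(n_S, p_all_white, p_one_each)              # 1771 20/161 48/253
```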
Example 7. From a pack of 52 cards, three are drawn at random. Find the chance that they
are a king, a queen and a knave.
Solution. From a pack of 52 cards, three can be drawn in ${}^{52}C_3$ ways. Thus,
$n = {}^{52}C_3$.
There are 4 kings, 4 queens and 4 knaves. A king can be drawn in ${}^4C_1$ ways, a queen in
${}^4C_1$ ways and a knave in ${}^4C_1$ ways. Since each of these may be combined with each of
the others, a king, a queen and a knave can be drawn together in
${}^4C_1 \times {}^4C_1 \times {}^4C_1$ ways, so
$$m = {}^4C_1 \times {}^4C_1 \times {}^4C_1$$
$$\text{Required probability} = \frac{{}^4C_1 \times {}^4C_1 \times {}^4C_1}{{}^{52}C_3} = \frac{4 \times 4 \times 4 \times 3 \times 2 \times 1}{52 \times 51 \times 50} = \frac{16}{5525}.$$
Example 8. A and B are two mutually exclusive events of an experiment. If P(not A) = 0.65,
$P(A \cup B) = 0.65$ and P(B) = p, find the value of p.
Solution. By the addition theorem for mutually exclusive events, we have
$$P(A \cup B) = P(A) + P(B)$$
$$\Rightarrow P(A \cup B) = 1 - P(\text{not } A) + P(B) \quad [\because P(A) = 1 - P(\bar{A})]$$
$$\Rightarrow 0.65 = 1 - 0.65 + p \Rightarrow p = 0.30.$$
Example 9. The probability that at least one of the events A and B occurs is 0.6. If A and
B occur simultaneously with probability 0.2, find $P(\bar{A}) + P(\bar{B})$.
Solution. We have $P(A \cup B) = 0.6$ and $P(A \cap B) = 0.2$.
Now
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$\Rightarrow 0.6 = P(A) + P(B) - 0.2$$
$$\Rightarrow 0.6 = 1 - P(\bar{A}) + 1 - P(\bar{B}) - 0.2 = 1.8 - [P(\bar{A}) + P(\bar{B})]$$
$$\Rightarrow P(\bar{A}) + P(\bar{B}) = 1.8 - 0.6 = 1.2.$$
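Example 10. A, B, C are three mutually exclusive and exhaustive events associated with a
random experiment. Find P(A), it being given that $P(B) = \frac{3}{2} P(A)$ and
$P(C) = \frac{1}{2} P(B)$.
Solution. Let P(A) = p. Then
$$P(B) = \frac{3}{2} P(A) \Rightarrow P(B) = \frac{3}{2} p$$
$$\text{and} \quad P(C) = \frac{1}{2} P(B) \Rightarrow P(C) = \frac{3}{4} p$$
Since A, B, C are mutually exclusive and exhaustive events associated with a random
experiment, therefore,
$$A \cup B \cup C = S$$
$$\Rightarrow P(A \cup B \cup C) = P(S) \Rightarrow P(A \cup B \cup C) = 1 \quad [\because P(S) = 1]$$
$$\Rightarrow P(A) + P(B) + P(C) = 1$$
$$\Rightarrow p + \frac{3}{2} p + \frac{3}{4} p = 1 \Rightarrow p = \frac{4}{13}.$$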
Example 11. A card is drawn from a pack of 52 cards. Find the probability of
getting a king or a heart or a red card.
Solution. Consider the following events :
A = getting a king, B = getting a heart
C = getting a red card.
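We have
$$P(A) = \frac{{}^4C_1}{{}^{52}C_1} = \frac{4}{52}, \quad P(B) = \frac{{}^{13}C_1}{{}^{52}C_1} = \frac{13}{52}, \quad P(C) = \frac{{}^{26}C_1}{{}^{52}C_1} = \frac{26}{52}$$
$$P(A \cap B) = P(\text{getting the king of hearts}) = \frac{1}{52}$$
$$P(B \cap C) = P(\text{getting a heart}) = \frac{13}{52}$$
$$P(C \cap A) = P(\text{getting a red king}) = \frac{2}{52}$$
$$P(A \cap B \cap C) = P(\text{getting the king of hearts}) = \frac{1}{52}$$
Required probability,
$$\begin{aligned} P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C) \\ &= \frac{4}{52} + \frac{13}{52} + \frac{26}{52} - \frac{1}{52} - \frac{13}{52} - \frac{2}{52} + \frac{1}{52} = \frac{28}{52} = \frac{7}{13}. \end{aligned}$$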
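Example 11 can be verified by building a 52-card deck and counting the cards that satisfy at least one of the three conditions; the answer should agree with the inclusion-exclusion calculation. The deck representation (rank, suit pairs) is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# A king, or a heart, or a red card
favourable = [(r, s) for r, s in deck
              if r == "K" or s == "hearts" or suits[s] == "red"]
print(len(favourable), Fraction(len(favourable), len(deck)))  # 28 7/13
```

The 28 favourable cards are the 26 red cards plus the 2 black kings.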
Example 12. Consider the experiment of throwing a pair of dice. Let A and B be the events
given by A = the sum of points is 8; B = there is an even number on the first die. Find
P(A/B) and P(B/A).
Solution. We have A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
and B = {(2, 1), ..., (2, 6), (4, 1), ..., (4, 6), (6, 1), ..., (6, 6)}
$$P(A) = \frac{5}{36} \quad \text{and} \quad P(B) = \frac{18}{36}$$
Here $A \cap B = \{(2, 6), (4, 4), (6, 2)\}$, so $n(A \cap B) = 3$.
Now P(A/B) = probability of occurrence of A when B occurs
= probability of getting 8 as the sum when there is an even number on the first die
$$= \frac{n(A \cap B)}{n(B)} = \frac{3}{18} = \frac{1}{6}$$
and P(B/A) = probability of occurrence of B when A occurs
= probability of getting an even number on the first die when the sum of the numbers on
the two dice is 8
$$= \frac{n(A \cap B)}{n(A)} = \frac{3}{5}.$$
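Conditional probability on equally likely outcomes amounts to counting inside the reduced sample space. A sketch of Example 12 that enumerates all 36 ordered pairs; the helper name `cond` is hypothetical.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 ordered pairs (first die, second die)
A = {w for w in S if sum(w) == 8}         # sum of points is 8
B = {w for w in S if w[0] % 2 == 0}       # even number on the first die

def cond(X, Y):
    """P(X/Y) = n(X ∩ Y) / n(Y), counted in the reduced sample space Y."""
    return Fraction(len(X & Y), len(Y))

print(cond(A, B), cond(B, A))  # 1/6 3/5
```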
Example 13. A bag contains 10 white and 15 black balls. Two balls are drawn
in succession without replacement. What is the probability that first is white and second
is black ?
Solution. Consider the following events :
A = getting a white ball in first draw
B = getting a black ball in second draw.
Required probability = probability of getting a white ball in the first draw and a black
ball in the second draw
$$= P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B/A)$$
Now
$$P(A) = \frac{{}^{10}C_1}{{}^{25}C_1} = \frac{10}{25} = \frac{2}{5}$$
and P(B/A) = probability of getting a black ball in the second draw when a white ball has
already been drawn in the first draw
$$= \frac{{}^{15}C_1}{{}^{24}C_1} = \frac{15}{24} = \frac{5}{8}$$
(since 24 balls are left after drawing a white ball in the first draw, of which 15 are black)
So the required probability is
$$P(A \cap B) = P(A)\,P(B/A) = \frac{2}{5} \times \frac{5}{8} = \frac{1}{4}.$$
Example 14. Two balls are drawn from an urn containing 2 white, 3 red and 4
black balls one by one without replacement. What is the probability that at least one
ball is red ?
Solution. Consider the following events :
A = not getting a red ball in first draw
B = not getting a red ball in second draw
Required probability = probability that at least one ball is red
= $1 -$ probability that none is red
$$= 1 - P(A \text{ and } B) = 1 - P(A \cap B) = 1 - P(A)\,P(B/A)$$
Now P(A) = probability of not getting a red ball in the first draw
= probability of getting a ball of another colour (white or black) in the first draw
$$= \frac{6}{9} = \frac{2}{3}$$
When a ball of another colour is drawn in the first draw, 5 other-colour (white or black)
balls and 3 red balls remain, out of which one other-colour ball can be drawn in ${}^5C_1$
ways.
$$\therefore P(B/A) = \frac{5}{8}$$
$$\text{Required probability} = 1 - P(A)\,P(B/A) = 1 - \frac{2}{3} \times \frac{5}{8} = \frac{7}{12}.$$
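Drawing without replacement, one by one, corresponds to ordered pairs of distinct balls. The sketch below labels the balls by index (so that balls of the same colour stay distinct) and counts the ordered draws of Example 14 that contain at least one red ball.

```python
from fractions import Fraction
from itertools import permutations

urn = ["W"] * 2 + ["R"] * 3 + ["B"] * 4   # 2 white, 3 red, 4 black
# Ordered draws of two distinct balls, identified by index.
draws = list(permutations(range(len(urn)), 2))
at_least_one_red = [d for d in draws if "R" in (urn[d[0]], urn[d[1]])]
print(len(draws), Fraction(len(at_least_one_red), len(draws)))  # 72 7/12
```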
Example 15. If A and B are two events such that P(A) = 0.5, P(B) = 0.6 and
$P(A \cup B) = 0.8$, find P(A/B) and P(B/A).
Solution. We have $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$$\Rightarrow P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.5 + 0.6 - 0.8 = 0.3$$
Now,
$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{0.3}{0.6} = \frac{1}{2}$$
and
$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{0.3}{0.5} = \frac{3}{5}.$$
Example 16. A coin is tossed twice and the four possible outcomes are assumed to be equally
likely. If A is the event "both head and tail have appeared" and B is the event "at most one
tail is observed", find P(A), P(B), P(A/B) and P(B/A).
Solution. Here, S = {HH, HT, TH, TT}, A = {HT, TH} and B = {HH, HT, TH}.
$$\therefore A \cap B = \{HT, TH\}$$
Now,
$$P(A) = \frac{n(A)}{n(S)} = \frac{2}{4} = \frac{1}{2}, \quad P(B) = \frac{n(B)}{n(S)} = \frac{3}{4} \quad \text{and} \quad P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{2}{4} = \frac{1}{2}$$
$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/2}{3/4} = \frac{2}{3} \quad \text{and} \quad P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/2}{1/2} = 1.$$
Example 17. A coin is tossed thrice and all eight outcomes are equally likely.
A = The first throw results in head
B = The last throw results in tail
Prove that events A and B are independent.
Solution. Let S be the sample space, then
S = {HHH, HHT, THH, HTH, TTH, HTT, THT, TTT}
A = {HHH, HHT, HTH, HTT}, B = {HHT, HTT, THT, TTT}
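$A \cap B$ = {HHT, HTT}
$$P(A) = \frac{n(A)}{n(S)} = \frac{4}{8} = \frac{1}{2}, \quad P(B) = \frac{n(B)}{n(S)} = \frac{4}{8} = \frac{1}{2}$$
$$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{2}{8} = \frac{1}{4}$$
Clearly $P(A \cap B) = \frac{1}{4} = P(A)\,P(B)$.
Hence, A and B are independent events.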
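The independence check of Example 17 is mechanical once the sample space is enumerated: compute $P(A \cap B)$ and compare it with $P(A)\,P(B)$. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes
A = {s for s in S if s[0] == "H"}                  # first throw results in head
B = {s for s in S if s[-1] == "T"}                 # last throw results in tail

def P(E):
    return Fraction(len(E), len(S))

print(P(A), P(B), P(A & B), P(A & B) == P(A) * P(B))  # 1/2 1/2 1/4 True
```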
Example 18. Events A and B are independent. Find P(B) if P(A) = 0.35 and
$P(A \cup B) = 0.6$.
Solution. We have $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$$\Rightarrow P(A \cup B) = P(A) + P(B) - P(A)\,P(B) \quad (\because \text{A and B are independent})$$
$$= P(A) + P(B)\,[1 - P(A)]$$
$$\Rightarrow 0.6 = 0.35 + P(B)\,(1 - 0.35)$$
$$\Rightarrow 0.25 = 0.65\,P(B) \Rightarrow P(B) = \frac{0.25}{0.65} = \frac{5}{13}.$$
Example 19. X can solve 90% of the problems given in a book and Y can solve 70%. What is
the probability that at least one of them will solve a problem selected at random from the
book?
Solution. Let A and B be the events defined as follows :
A = X solves the problem, B = Y solves the problem
Clearly A and B are independent events such that
$$P(A) = \frac{90}{100} = \frac{9}{10} \quad \text{and} \quad P(B) = \frac{70}{100} = \frac{7}{10}$$
Now the required probability is
$$\begin{aligned} P(A \cup B) &= 1 - P(\bar{A})\,P(\bar{B}) \quad (\because \text{A and B are independent events}) \\ &= 1 - \left(1 - \frac{9}{10}\right)\left(1 - \frac{7}{10}\right) = 1 - \frac{1}{10} \times \frac{3}{10} = 0.97 \end{aligned}$$
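The rule used in Example 19, "at least one of several independent events occurs" $= 1 -$ (product of the complements), is easy to wrap as a function. A sketch, where `p_at_least_one` is a hypothetical name; it assumes the events are independent and uses `math.prod` (Python 3.8+).

```python
from fractions import Fraction
from math import prod

def p_at_least_one(probs):
    """P(A1 ∪ ... ∪ An) = 1 - P(not A1) ... P(not An), for independent events."""
    return 1 - prod(1 - p for p in probs)

print(p_at_least_one([Fraction(9, 10), Fraction(7, 10)]))  # 97/100
```

Exact fractions keep the result free of floating-point rounding; plain floats such as `[0.9, 0.7]` also work.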
EXERCISE 1.1
1. Find the probability of getting a head in a throw of a coin.
2. Three unbiased coins are tossed. Find the probability of getting
(i) all heads (ii) two heads
(iii) one head (iv) at least one head
(v) at least two heads.
3. A bag contains 7 white, 6 red and 5 black balls. Two balls are drawn at random. Find the
probability that they will both be white.
4. Four cards are drawn from a pack of cards. Find the probability that
(i) all are diamonds (ii) there is one card of each suit, and
(iii) there are two spades and two hearts.
5. Two dice are thrown simultaneously. Find the probability of getting
(i) an even number as the sum (ii) the sum as a prime number
(iii) a total of at least 10 (iv) a doublet of even number.
6. Tickets numbered from 1 to 20 are mixed up together and then a ticket is drawn at
random. What is the probability that the ticket has a number which is a multiple of 3 or
7?
7. A bag contains 50 tickets numbered 1, 2, 3, ..., 50 of which five are drawn at random and
arranged in ascending order of magnitude (x1 < x2 < x3 < x4 < x5). Find the probability
that x3 = 30.
8. If A, B, C are mutually exclusive and exhaustive events, find P(B) if
$\frac{1}{3} P(C) = \frac{1}{2} P(A) = P(B)$.
9. If P(A) = a and P(B) = b, then show that $P(A/B) \ge (a + b - 1)/b$.
10. Two cards are drawn from a pack of 52 cards. What is the probability that either both
are red or both are kings ?
11. Given two mutually exclusive events A and B such that $P(A) = \frac{1}{2}$ and
$P(B) = \frac{1}{3}$, find P(A or B).
12. A die is thrown twice and the sum of numbers appearing is observed to be 6. What is the
conditional probability that the number 4 has appeared at least once ?
13. A bag contains 19 tickets, numbered from 1 to 19. A ticket is drawn and then another
ticket is drawn without replacement. Find the probability that both tickets will show
even numbers.
14. If A and B are two events such that P(A) = 0.3, P(B) = 0.6 and P(B/A) = 0.5, find P(A/B)
and $P(A \cup B)$.
15. A bag contains 3 red and 4 black balls and another bag has 4 red and 2 black balls. One
bag is selected at random and from the selected bag a ball is drawn. Let A be the event
that the first bag is selected, B be the event that the second bag is selected and C be the
event that the ball drawn is red. Find P(A), P(B), P(C/A) and P(C/B).
16. If P(A) = 0.4, P(B) = p, $P(A \cup B) = 0.6$ and A and B are given to be independent
events, find the value of p.
17. A bag contains 5 white, 7 red and 4 black balls. If four balls are drawn one by one with
replacement, what is the probability that none is white ?
18. Two dice are thrown. Find the probability of getting an odd number on the first die and
a multiple of 3 on the other.
19. A problem in statistics is given to 3 students whose chances of solving it are
$\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{4}$. What is the probability that the problem is solved?
Answers
1. 1/2
2. (i) 1/8 (ii) 3/8 (iii) 3/8 (iv) 7/8 (v) 1/2
3. 7/51
4. (i) 11/4165 (ii) 2197/20825 (iii) 468/20825
5. (i) 1/2 (ii) 5/12 (iii) 1/6 (iv) 1/12
6. 2/5 7. 551/15134 8. 1/6 10. 55/221
11. 5/6 12. 2/5 13. 4/19 14. 1/4; 0.75
15. 1/2; 1/2; 3/7; 2/3 16. 1/3 17. $(11/16)^4$ 18. 1/6
19. 3/4
Bayes' Theorem : Suppose an event A can occur only if one of the set of exhaustive and
mutually exclusive events $B_1, B_2, \ldots, B_n$ occurs, and that the probabilities
$P(B_1), P(B_2), \ldots, P(B_n)$ and the conditional probabilities $P(A/B_i)$,
i = 1, 2, ..., n, are known. Then the conditional probability $P(B_i/A)$, given that A has
already occurred, is
$$P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A/B_j)} = \frac{P(B_i)\,P(A/B_i)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + \cdots + P(B_n)\,P(A/B_n)}$$
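Bayes' formula is a normalisation of the products $P(B_i)\,P(A/B_i)$: the denominator is their sum, i.e. the total probability of A. A minimal sketch, where `bayes` is a hypothetical helper name; it assumes the $B_i$ are mutually exclusive and exhaustive and that $P(A) > 0$. The usage line applies it to the numbers of Example 1 below.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return [P(B1/A), ..., P(Bn/A)] from P(Bi) and P(A/Bi).

    Assumes B1, ..., Bn are mutually exclusive and exhaustive and P(A) > 0.
    """
    joint = [pb * pa for pb, pa in zip(priors, likelihoods)]  # P(Bi) P(A/Bi)
    total = sum(joint)                                         # denominator = P(A)
    return [j / total for j in joint]

# P(B1) = 2/3, P(B2) = 1/3, P(A/B1) = 3/5, P(A/B2) = 4/5
print(bayes([Fraction(2, 3), Fraction(1, 3)], [Fraction(3, 5), Fraction(4, 5)]))
# [Fraction(3, 5), Fraction(2, 5)]
```

The function also accepts floats, e.g. `bayes([0.6, 0.4], [0.4, 0.7])` for the data of Example 2.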
SOLVED EXAMPLES
Example 1. Two boxes contain respectively 4 white and 2 black and 1 white and
3 black balls. One ball is transferred from the first box into the second and then one
ball is drawn from the second. It turns out to be black. What is the probability that the
transferred ball was white ?
Solution. Let B1 be the event that the transferred ball (ball drawn from the
first box) is white and B2 be the event that the transferred ball is black.
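$$P(B_1) = \frac{4}{6} = \frac{2}{3}, \quad P(B_2) = \frac{2}{6} = \frac{1}{3}$$
Let A be the event that the ball drawn from the second box (after a ball is transferred from
the first box to the second box) is black. If a white ball is transferred, the second box
holds 2 white and 3 black balls; if a black ball is transferred, it holds 1 white and 4 black
balls. Then
$$P(A/B_1) = \frac{3}{5}, \quad P(A/B_2) = \frac{4}{5}$$
$P(B_1/A)$ = the probability that the ball transferred from the first box is white when the
ball drawn from the second box is known to be black.
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2)} = \frac{\frac{2}{3} \times \frac{3}{5}}{\frac{2}{3} \times \frac{3}{5} + \frac{1}{3} \times \frac{4}{5}} = \frac{\frac{2}{5}}{\frac{2}{5} + \frac{4}{15}} = \frac{2}{5} \times \frac{15}{10} = \frac{3}{5}.$$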
Example 2. The chance that doctor X will diagnose disease Y correctly is 60%.
The chance that a patient will die by his treatment after correct diagnosis is 40% and
the chance of death by wrong diagnosis is 70%. A patient of doctor X, who had disease
Y, died. What is the chance that his disease was correctly diagnosed ?
Solution. Let B1 be the event that the diagnosis is correct and B2 be the event
that the diagnosis is incorrect. Let A be the event that the patient dies. Then
$$P(B_1) = \frac{60}{100} = 0.6, \quad P(B_2) = 1 - P(B_1) = 1 - 0.6 = 0.4$$
$$P(A/B_1) = \frac{40}{100} = 0.4, \quad P(A/B_2) = \frac{70}{100} = 0.7$$
$P(B_1/A)$ = probability that a patient was correctly diagnosed, given that he had died.
$$\begin{aligned} P(B_1/A) &= \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2)} = \frac{0.6 \times 0.4}{0.6 \times 0.4 + 0.4 \times 0.7} = \frac{0.24}{0.24 + 0.28} \\ &= \frac{0.24}{0.52} = 0.4615 \text{ or } 46.15\%. \end{aligned}$$
Example 3. The contents of urns I, II and III are as follows :
1 white, 2 black and 3 red balls,
2 white, 1 black and 1 red balls, and
4 white, 5 black and 3 red balls.
One urn is chosen at random and two balls are drawn. They happen to be white and red. What
are the probabilities that they come from urns I, II and III?
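Solution. Let $B_1$, $B_2$ and $B_3$ denote the events that urn I, II and III is chosen,
respectively, and let A be the event that the two balls taken from the selected urn are
white and red. Then
$$P(B_1) = P(B_2) = P(B_3) = \frac{1}{3} \quad (\because n = 3 \text{ urns}, m = 1)$$
$$P(A/B_1) = \frac{1 \times 3}{{}^6C_2} = \frac{1}{5}, \quad P(A/B_2) = \frac{2 \times 1}{{}^4C_2} = \frac{1}{3}$$
$$\text{and} \quad P(A/B_3) = \frac{4 \times 3}{{}^{12}C_2} = \frac{2}{11}$$
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{\frac{1}{3} \times \frac{1}{5}}{\frac{1}{3} \times \frac{1}{5} + \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{2}{11}} = \frac{1}{5} \times \frac{165}{118} = \frac{33}{118}$$
$$P(B_2/A) = \frac{P(B_2)\,P(A/B_2)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{\frac{1}{3} \times \frac{1}{3}}{\frac{1}{3} \times \frac{1}{5} + \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{2}{11}} = \frac{1}{3} \times \frac{165}{118} = \frac{55}{118}$$
$$P(B_3/A) = 1 - [P(B_1/A) + P(B_2/A)] = 1 - \left[\frac{33}{118} + \frac{55}{118}\right] = 1 - \frac{88}{118} = \frac{30}{118}.$$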
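The likelihoods and posteriors of Example 3 can be recomputed from the urn contents alone. A sketch using exact fractions; the tuple layout (white, black, red) is an illustrative choice.

```python
from fractions import Fraction
from math import comb

# (white, black, red) contents of urns I, II and III
urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]
prior = Fraction(1, 3)  # each urn equally likely to be chosen

# P(A/Bi): one white and one red ball when two balls are drawn from urn i
likelihood = [Fraction(w * r, comb(w + b + r, 2)) for w, b, r in urns]
joint = [prior * l for l in likelihood]
posterior = [j / sum(joint) for j in joint]
print(likelihood)  # [Fraction(1, 5), Fraction(1, 3), Fraction(2, 11)]
print(posterior)   # [Fraction(33, 118), Fraction(55, 118), Fraction(15, 59)]
```

Note that 15/59 is 30/118 in lowest terms.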
Example 4. In a bolt factory machines X, Y, Z manufacture respectively 25%,
35% and 40% of the total. Of their output 5, 4 and 2 percent are defective bolts. A bolt is
drawn at random from the product and is found to be defective. What are the probabilities
that it was manufactured by machines X, Y and Z ?
Solution. Let B1, B2, B3 denote the events that a bolt selected at random is
manufactured by the machines X, Y and Z respectively and let A denote the event of
its being defective. Then we have
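$$P(B_1) = \frac{25}{100} = 0.25, \quad P(B_2) = \frac{35}{100} = 0.35, \quad P(B_3) = \frac{40}{100} = 0.40$$
The probability of drawing a defective bolt manufactured by machine X is
$$P(A/B_1) = \frac{5}{100} = 0.05$$
Similarly,
$$P(A/B_2) = \frac{4}{100} = 0.04, \quad P(A/B_3) = \frac{2}{100} = 0.02$$
$P(B_1/A)$ = the probability that a defective bolt selected at random is manufactured by
machine X
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2) + P(B_3)\,P(A/B_3)} = \frac{0.25 \times 0.05}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \frac{0.0125}{0.0345} = \frac{25}{69}$$
Similarly,
$$P(B_2/A) = \frac{0.35 \times 0.04}{0.0345} = \frac{0.014}{0.0345} = \frac{28}{69}$$
$$P(B_3/A) = 1 - [P(B_1/A) + P(B_2/A)] = 1 - \left[\frac{25}{69} + \frac{28}{69}\right] = 1 - \frac{53}{69} = \frac{16}{69}.$$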
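A quick check of Example 4 with exact fractions, so that the three posteriors come out as 25/69, 28/69 and 16/69 without decimal rounding:

```python
from fractions import Fraction

machines = ["X", "Y", "Z"]
share = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]   # P(Bi)
defective = [Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)]  # P(A/Bi)

joint = [s * d for s, d in zip(share, defective)]  # P(Bi) P(A/Bi)
p_defective = sum(joint)                           # total probability P(A) = 0.0345
posterior = [j / p_defective for j in joint]
for name, post in zip(machines, posterior):
    print(name, post)  # prints X 25/69, then Y 28/69, then Z 16/69
```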
EXERCISE 1.2
1. A doctor has taken a vaccine from either storage unit P (which contains 30 current and
10 outdated vaccines), or from unit Q (which contains 20 current and 20 outdated
vaccines), or from unit R (which contains 10 current and 30 outdated vaccines), but he is
twice as likely to have taken it from unit P as from unit Q and twice as likely to have
taken it from unit Q as from unit R. If the vaccine selected is outdated, what is the
probability that it came from unit P ?
2. A factory has two machines. Empirical evidence has established that machines I and II
produce 30% and 70% of the output respectively. It has also been established that 5% and 1%
of the output produced by these machines respectively is defective. A defective item is
drawn at random. What is the probability that the defective item was produced by
machine II?
3. A doctor has decided to prescribe two new drugs to 200 heart patients, as follows : 50 get
drug A, 50 get drug B and 100 get both. Drug A reduces the probability of a heart attack
by 35%, drug B reduces the probability by 20% and the two drugs, when taken together,
work independently. The 200 patients were chosen so that each has an 80% chance of
having a heart attack. If a randomly selected patient has a heart attack, what is the
probability that the patient was given both drugs ?
4. In a class of 75 students, 15 were considered to be very intelligent, 45 as medium and the
rest below average. The probability that a very intelligent student fails in a viva-voce
examination is 0.005, the probability that a medium student fails is 0.05, and the
corresponding probability for a below-average student is 0.15. If a student is known to have
passed the viva-voce examination, what is the probability that he is below average?
5. Suppose that a newly constructed flyover may collapse whether or not its design is
faulty. The chance that the design is faulty is 5%. The chance that the flyover collapses if
the design is faulty is 95%; otherwise it is 30%. A flyover collapsed. What is the
probability that it collapsed because of faulty design?
Answers
1. 0.3636 2. 0.318 3. 0.4176
4. 0.18 5. 0.1428.
Probability Distributions
2. PROBABILITY DISTRIBUTIONS
STRUCTURE
Binomial Distribution
Applications of Binomial Distribution
Recurrence Formula for the Binomial Distribution
Mean, Variance and Standard Deviation of Binomial Distribution
Poisson Distribution
Applications of Poisson Distribution
Recurrence Formula for the Poisson Distribution
Mean, Variance and Standard Deviation of Poisson Distribution
Normal Distribution
Properties of the Normal Distribution
Standard Form of the Normal Distribution
2.1. BINOMIAL DISTRIBUTION
$$= \underbrace{p \cdot p \cdot p \cdots p}_{r \text{ factors}} \cdot \underbrace{q \cdot q \cdot q \cdots q}_{(n - r) \text{ factors}} = p^r q^{n-r}$$
But r successes in n trials can occur in ${}^nC_r$ ways, and the probability for each of
these ways is $p^r q^{n-r}$. Hence, the probability of r successes in n trials is given by
$$P(X = r) = {}^nC_r\, p^r q^{n-r}, \quad \text{where } p + q = 1 \text{ and } r = 0, 1, 2, \ldots, n$$
The probability distribution of the number of successes so obtained is called the
Binomial probability distribution and X is called the Binomial Variate.
Note. (i) P(X = r) is usually written as P(r).
(ii) n and p in the binomial distribution are called the parameters of the distribution.
(iii) Each trial has only two possible outcomes, called success and failure.
(iv) There is a finite number of trials, say n.
(v) All trials are identical, i.e., p (and hence q) is constant in each trial.
(vi) The trials are independent of each other.
(vii) If the set of n independent trials is repeated N times, then the expected frequency of
r successes is N · P(r).
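The binomial formula translates directly into code with `math.comb`. A minimal sketch; `binomial_pmf` is a hypothetical helper name, and the values n = 6, p = 1/2, N = 64 are illustrative. It lists P(0), ..., P(n), checks that they sum to 1, and computes the expected frequencies N · P(r) of note (vii).

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr p^r q^(n-r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 6, 0.5
dist = [binomial_pmf(r, n, p) for r in range(n + 1)]  # P(0), ..., P(6)
print(sum(dist))  # 1.0

# Expected frequency of r successes when the n trials are repeated N times
N = 64
print([N * pr for pr in dist])  # [1.0, 6.0, 15.0, 20.0, 15.0, 6.0, 1.0]
```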
$$\begin{aligned} \frac{P(r+1)}{P(r)} &= \frac{{}^nC_{r+1}\, p^{r+1} q^{n-r-1}}{{}^nC_r\, p^r q^{n-r}} = \frac{n!}{(r+1)!\,(n-r-1)!} \cdot \frac{r!\,(n-r)!}{n!} \cdot \frac{p^{r+1} q^{n-r-1}}{p^r q^{n-r}} \\ &= \frac{r!\,(n-r)\,(n-r-1)!}{(r+1)\,r!\,(n-r-1)!} \cdot \frac{p}{q} = \frac{n-r}{r+1} \cdot \frac{p}{q} \end{aligned}$$
or
$$P(r+1) = \frac{n-r}{r+1} \cdot \frac{p}{q} \cdot P(r),$$
which is the required recurrence formula. Using this formula successively, we can find
P(1), P(2), ..., if P(0) is known.
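The recurrence starts from $P(0) = q^n$ and builds the remaining probabilities one at a time, avoiding a fresh binomial coefficient at each step. A sketch, assuming 0 < p < 1 so that q ≠ 0; the function name is hypothetical. The result is compared with the direct formula.

```python
from math import comb

def binomial_by_recurrence(n, p):
    """P(0), ..., P(n) via P(0) = q^n and P(r+1) = (n-r)/(r+1) * (p/q) * P(r)."""
    q = 1 - p  # assumes 0 < p < 1, so q != 0
    probs = [q**n]
    for r in range(n):
        probs.append((n - r) / (r + 1) * (p / q) * probs[-1])
    return probs

n, p = 10, 0.1
rec = binomial_by_recurrence(n, p)
direct = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
print(all(abs(a - b) < 1e-12 for a, b in zip(rec, direct)))  # True
```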
1.4. MEAN, VARIANCE AND STANDARD DEVIATION OF BINOMIAL DISTRIBUTION
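$$\begin{aligned} \text{Mean } (\mu) &= \sum_{r=0}^{n} r \cdot P(r) = \sum_{r=0}^{n} r \cdot {}^nC_r\, p^r q^{n-r} \\ &= 0 + 1 \cdot {}^nC_1\, p\, q^{n-1} + 2 \cdot {}^nC_2\, p^2 q^{n-2} + \cdots + r \cdot {}^nC_r\, p^r q^{n-r} + \cdots + n \cdot {}^nC_n\, p^n \\ &= np\, q^{n-1} + \frac{2\,n(n-1)}{2!}\, p^2 q^{n-2} + \cdots + n\, p^n \\ &= np\,[q^{n-1} + (n-1)\, p\, q^{n-2} + \cdots + p^{n-1}] \\ &= np\,(q + p)^{n-1} = np \quad (\because p + q = 1) \end{aligned}$$
Hence, the mean of the binomial distribution is np.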
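The variance ($\sigma^2$) is given by
$$\begin{aligned} \sigma^2 &= \sum_{r=0}^{n} r^2 P(r) - \mu^2 = \sum_{r=0}^{n} [r + r(r-1)]\, P(r) - \mu^2 \\ &= \sum_{r=0}^{n} r\, P(r) + \sum_{r=0}^{n} r(r-1)\, P(r) - \mu^2 \\ &= \mu + \sum_{r=0}^{n} r(r-1)\, {}^nC_r\, p^r q^{n-r} - \mu^2 \\ &= \mu + \left[2 \cdot 1 \cdot \frac{n(n-1)}{2!}\, p^2 q^{n-2} + 3 \cdot 2 \cdot \frac{n(n-1)(n-2)}{3!}\, p^3 q^{n-3} + \cdots + n(n-1)\, p^n\right] - \mu^2 \\ &= \mu + n(n-1)\, p^2\, [q^{n-2} + (n-2)\, p\, q^{n-3} + \cdots + p^{n-2}] - \mu^2 \\ &= \mu + n(n-1)\, p^2\, (q + p)^{n-2} - \mu^2 \\ &= \mu + n(n-1)\, p^2 - \mu^2 \quad (\because p + q = 1) \\ &= np + n(n-1)\, p^2 - n^2 p^2 \quad (\because \mu = np) \\ &= np\,[1 + (n-1)\, p - np] \\ &= np\,(1 - p) = npq \end{aligned}$$
Hence, the variance of the binomial distribution is npq.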
The standard deviation ($\sigma$) is given by
$$\text{Standard deviation } (\sigma) = \sqrt{npq}$$
Hence, the standard deviation of the binomial distribution is $\sqrt{npq}$.
Note. (i)
$$\gamma_1 = \frac{q - p}{\sqrt{npq}} = \frac{1 - 2p}{\sqrt{npq}}$$
gives the measure of skewness of the binomial distribution. If $p < \frac{1}{2}$, the
skewness is positive; if $p > \frac{1}{2}$, it is negative; and if $p = \frac{1}{2}$, it is zero.
(ii)
$$\beta_2 = 3 + \frac{1 - 6pq}{npq}$$
gives a measure of the kurtosis of the binomial distribution.
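The closed forms for the mean, variance and skewness can be checked numerically by summing over the probability distribution itself. A sketch with the illustrative values n = 16, p = 1/4 (so np = 4 and npq = 3); `binomial_moments` is a hypothetical helper name, and the skewness is computed as the third central moment divided by $\sigma^3$.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean, variance and skewness of the binomial, computed from the pmf."""
    pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(pmf))
    var = sum(r * r * pr for r, pr in enumerate(pmf)) - mean**2
    third = sum((r - mean)**3 * pr for r, pr in enumerate(pmf))  # third central moment
    return mean, var, third / var**1.5

n, p = 16, 0.25
q = 1 - p
mean, var, skew = binomial_moments(n, p)
print(mean, n * p)                      # both 4, up to floating-point rounding
print(var, n * p * q)                   # both 3, up to floating-point rounding
print(skew, (q - p) / sqrt(n * p * q))  # both about 0.2887
```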
SOLVED EXAMPLES
Example 1. A die is thrown 6 times. If getting an even number is a success, what
is the probability of :
(i) no success (ii) exactly 5 successes
(iii) at least 5 successes (iv) at most 5 successes.
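Solution. Here, S = {1, 2, 3, 4, 5, 6}. Let A denote getting an even number, so
A = {2, 4, 6}.
$$p = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}, \quad q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}, \quad n = 6$$
We know that $P(r) = {}^nC_r\, p^r q^{n-r}$.
(i)
$$P(\text{no success}) = P(0) = {}^6C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^6 = \frac{1}{64}$$
(ii)
$$P(\text{exactly 5 successes}) = P(5) = {}^6C_5 \left(\frac{1}{2}\right)^5 \left(\frac{1}{2}\right)^1 = \frac{6}{64} = \frac{3}{32}$$
(iii)
$$P(\text{at least 5 successes}) = P(r \ge 5) = P(5) + P(6) = {}^6C_5 \left(\frac{1}{2}\right)^5 \left(\frac{1}{2}\right)^{6-5} + {}^6C_6 \left(\frac{1}{2}\right)^6 \left(\frac{1}{2}\right)^{6-6} = \frac{3}{32} + \frac{1}{64} = \frac{7}{64}$$
(iv)
$$P(\text{at most 5 successes}) = P(r \le 5) = 1 - P(r > 5) = 1 - P(r = 6) = 1 - \frac{1}{64} = \frac{63}{64}.$$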
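All four parts of Example 1 follow from one list of binomial probabilities. A sketch with exact fractions (n = 6, p = 1/2, as in the example):

```python
from fractions import Fraction
from math import comb

n, p = 6, Fraction(1, 2)
q = 1 - p
P = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]  # P(0), ..., P(6)

print(P[0])         # (i)   no success: 1/64
print(P[5])         # (ii)  exactly 5 successes: 3/32
print(P[5] + P[6])  # (iii) at least 5 successes: 7/64
print(1 - P[6])     # (iv)  at most 5 successes: 63/64
```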
Example 2. The items produced by a company contain 5% defective items. What is the
probability of getting 2 defective items in a sample of 10 items?
Solution. Here, $p = \frac{5}{100} = \frac{1}{20}$, n = 10, r = 2, and
$$q = 1 - p = 1 - \frac{1}{20} = \frac{19}{20}$$
We know that $P(r) = {}^nC_r\, p^r q^{n-r}$.
$$P(2 \text{ defective items}) = P(r = 2) = {}^{10}C_2 \left(\frac{1}{20}\right)^2 \left(\frac{19}{20}\right)^{10-2} = \frac{10 \times 9}{2} \cdot \frac{(19)^8}{(20)^{10}} = \frac{45\,(19)^8}{(20)^{10}} \approx 0.0746.$$
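Example 3. A pair of dice is thrown 10 times. If getting a doublet (the same number on both
dice) is considered a success, find the probability of
(i) no success (ii) 3 successes.
Solution. A doublet can be obtained when a pair of dice is thrown in the ways (1, 1), (2, 2),
(3, 3), (4, 4), (5, 5), (6, 6), i.e., in 6 ways.
The two dice can be thrown in $6^2 = 36$ ways.
$$p = P(\text{getting a doublet}) = \frac{6}{36} = \frac{1}{6}, \quad q = 1 - p = 1 - \frac{1}{6} = \frac{5}{6}, \quad n = 10$$
We know that $P(r) = {}^nC_r\, p^r q^{n-r}$.
(i)
$$P(\text{no success}) = P(0) = {}^{10}C_0 \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^{10} = 1 \times 1 \times \left(\frac{5}{6}\right)^{10} = \left(\frac{5}{6}\right)^{10}$$
(ii)
$$P(3 \text{ successes}) = P(3) = {}^{10}C_3 \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^{10-3} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} \cdot \frac{1}{6^3} \left(\frac{5}{6}\right)^7 = \frac{120}{216} \left(\frac{5}{6}\right)^7 = \frac{5}{9} \left(\frac{5}{6}\right)^7.$$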
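Example 4. Five cards are drawn successively with replacement from a well-shuffled pack of 52 cards. What is the probability that
(i) none is a spade (ii) exactly 3 cards are spades?
Solution. $p = P(\text{spade card}) = \frac{13}{52} = \frac{1}{4}$, $q = 1 - p = 1 - \frac{1}{4} = \frac{3}{4}$, n = 5.
We know that $P(r) = {}^{n}C_{r}\,p^{r} q^{n-r}$.
(i) P(none is a spade) = $P(0) = {}^{5}C_{0}\left(\tfrac{1}{4}\right)^{0}\left(\tfrac{3}{4}\right)^{5-0} = 1 \times 1 \times \tfrac{243}{1024} = \tfrac{243}{1024}$.
(ii) P(exactly 3 spades) = $P(3) = {}^{5}C_{3}\left(\tfrac{1}{4}\right)^{3}\left(\tfrac{3}{4}\right)^{5-3} = 10 \times \tfrac{9}{4^{5}} = \tfrac{90}{1024} = \tfrac{45}{512}$.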
Example 5. The probability of hitting a target is 10%, and 10 shots are fired independently. What is the probability that the target will be hit at least once?
Solution. Here, $p = \frac{10}{100} = \frac{1}{10}$, $q = 1 - p = 1 - \frac{1}{10} = \frac{9}{10}$, n = 10.
We know that $P(r) = {}^{n}C_{r}\,p^{r} q^{n-r}$.
$$P(\text{target hit at least once}) = P(r \ge 1) = 1 - P(r < 1) = 1 - P(r = 0)$$
$$= 1 - {}^{10}C_{0}\left(\frac{1}{10}\right)^{0}\left(\frac{9}{10}\right)^{10-0} = 1 - 1 \times 1 \times \left(\frac{9}{10}\right)^{10} = 1 - 0.3487 = 0.6513.$$
Example 6. A policeman fires 4 bullets at a dacoit. The probability that the dacoit will be killed by a bullet is 0.6. What is the probability that the dacoit is still alive?
Solution. Here, p = 0.6, q = 1 − p = 1 − 0.6 = 0.4, n = 4.
We know that $P(r) = {}^{n}C_{r}\,p^{r} q^{n-r}$.
P(dacoit is still alive) = P(not killed by any bullet)
$$= P(r = 0) = {}^{4}C_{0}\,(0.6)^{0}\,(0.4)^{4-0} = 1 \times 1 \times (0.4)^{4} = 0.0256.$$
Example 7. Find the parameters of the binomial distribution for which mean = 4 and variance = 3.
Solution. We know that for a binomial distribution, mean = np and variance = npq.
Here, np = 4 and npq = 3.
We have $\frac{npq}{np} = \frac{3}{4} \Rightarrow q = \frac{3}{4}$, so $p = 1 - q = 1 - \frac{3}{4} = \frac{1}{4}$.
Then $n \times \frac{1}{4} = 4 \Rightarrow n = 16$.
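Example 8. Comment on the following statement: "The mean of a binomial distribution is 3 and its standard deviation is 5."
Solution. Here, np = 3 and $\sqrt{npq} = 5$, so npq = 25. Then $q = \frac{npq}{np} = \frac{25}{3} > 1$, which is impossible because q is a probability. Hence, the statement is incorrect.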
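According to the given condition,
$${}^{10}C_{3}\,p^{3} q^{7} = 16 \times {}^{10}C_{7}\,p^{7} q^{3}$$
$$\Rightarrow p^{3} q^{7} = 16\,p^{7} q^{3} \qquad (\because\ {}^{10}C_{3} = {}^{10}C_{7})$$
$$\Rightarrow q^{4} = 16\,p^{4} \Rightarrow q^{4} = (2p)^{4} \Rightarrow q = 2p.$$
In a binomial distribution, p + q = 1, so p + 2p = 1, i.e., $p = \frac{1}{3}$ and $q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}$.
$$\text{Mean} = np = 10 \times \frac{1}{3} = \frac{10}{3}, \qquad \text{Variance} = npq = 10 \times \frac{1}{3} \times \frac{2}{3} = \frac{20}{9}.$$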
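Example 10. The probability of a man hitting a target is $\frac{1}{4}$. How many times must he fire so that the probability of his hitting the target at least once is greater than $\frac{2}{3}$?
Solution. Let the man fire n times.
Here, $p = \frac{1}{4}$ and $q = 1 - p = 1 - \frac{1}{4} = \frac{3}{4}$.
P(hitting the target at least once) = P(r ≥ 1) = 1 − P(r < 1) = 1 − P(r = 0).
According to the given condition,
$$1 - P(r = 0) > \frac{2}{3} \;\Rightarrow\; 1 - {}^{n}C_{0}\left(\frac{1}{4}\right)^{0}\left(\frac{3}{4}\right)^{n-0} > \frac{2}{3} \;\Rightarrow\; 1 - (0.75)^{n} > \frac{2}{3}, \text{ i.e., } (0.75)^{n} < \frac{1}{3}.$$
So $(0.75)^{n} < 0.3333$. Taking logarithms (base 10) on both sides, we have
$$n \log(0.75) < \log(0.3333) \;\Rightarrow\; n(-0.1249) < -0.47712 \;\Rightarrow\; n(0.1249) > 0.47712 \;\Rightarrow\; n > \frac{0.47712}{0.1249}, \text{ i.e., } n > 3.82.$$
Since n must be a whole number, the least value is n = 4.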
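The logarithm step can be cross-checked by brute force: starting from n = 1, increase n until 1 − qⁿ first exceeds the required probability. The sketch below assumes 0 < p < 1 (otherwise the loop would never stop); the function name `min_trials` is hypothetical.

```python
import math

def min_trials(p, target):
    """Smallest n for which P(at least one success) = 1 - q**n exceeds target.
    Assumes 0 < p < 1 and 0 <= target < 1, so the loop terminates."""
    q = 1 - p
    n = 1
    while 1 - q**n <= target:
        n += 1
    return n

# Logarithmic bound used in the text: n > log(1/3) / log(0.75)
bound = math.log(1 / 3) / math.log(0.75)
print(round(bound, 2), min_trials(0.25, 2 / 3))   # 3.82 4
```

The ratio of logarithms does not depend on the base, so natural logarithms give the same bound of about 3.82 as the base-10 values in the text.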
Example 11. Six dice are thrown 729 times. How many times do you expect at least three dice to show a five or a six?
Solution. Here, p = the probability of getting 5 or 6 with one die $= \frac{2}{6} = \frac{1}{3}$, $q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}$, n = 6, N = 729.
The expected number of times at least three dice show a five or a six
$$= N \cdot P(r \ge 3) = 729 \times [P(r = 3) + P(r = 4) + P(r = 5) + P(r = 6)]$$
$$= 729\left[{}^{6}C_{3}\left(\tfrac{1}{3}\right)^{3}\left(\tfrac{2}{3}\right)^{3} + {}^{6}C_{4}\left(\tfrac{1}{3}\right)^{4}\left(\tfrac{2}{3}\right)^{2} + {}^{6}C_{5}\left(\tfrac{1}{3}\right)^{5}\left(\tfrac{2}{3}\right)^{1} + {}^{6}C_{6}\left(\tfrac{1}{3}\right)^{6}\left(\tfrac{2}{3}\right)^{0}\right]$$
$$= \frac{729}{3^{6}}\,[160 + 60 + 12 + 1] = 233.$$
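Example 12. Out of 800 families with 5 children each, how many families would be expected to have
(i) 3 boys (ii) 5 girls (iii) either 2 or 3 boys?
Assume equal probabilities for boys and girls.
Solution. Here, n = 5, N = 800, $p = P(\text{a boy}) = \frac{1}{2}$, $q = P(\text{a girl}) = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$.
(i) The probability of having 3 boys out of 5 children is
$$P(r = 3) = {}^{5}C_{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{5-3} = 10\left(\frac{1}{2}\right)^{5} = \frac{10}{32} = 0.3125.$$
The expected number of families = N · P(r = 3) = 800 × 0.3125 = 250.
(ii) The probability of having 5 girls out of 5 children is
$$P(r = 0) = {}^{5}C_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{5-0} = 1 \times 1 \times \frac{1}{32} = 0.03125.$$
The expected number of families = N · P(r = 0) = 800 × 0.03125 = 25.
(iii) The probability of having either 2 or 3 boys is
$$P(r = 2) + P(r = 3) = {}^{5}C_{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{3} + {}^{5}C_{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2} = \frac{10}{2^{5}} + \frac{10}{2^{5}} = \frac{20}{32} = 0.625.$$
The expected number of families = N · P(r = 2 or r = 3) = 800 × 0.625 = 500.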
Example 13. In sampling a large number of parts manufactured by a machine, the mean number of defectives in a sample of 20 is 2. Out of 2000 such samples, how many would be expected to contain at least 3 defective parts?
Solution. Here, μ = mean number of defectives = 2, so np = 2 with n = 20.
$p = \frac{2}{20} = \frac{1}{10} = 0.1$, q = 1 − p = 1 − 0.1 = 0.9, N = 2000.
The probability of having at least 3 defectives in a sample of 20 parts is
$$P(r \ge 3) = 1 - P(r < 3) = 1 - [P(r = 0) + P(r = 1) + P(r = 2)]$$
$$= 1 - \left[{}^{20}C_{0}\,(0.1)^{0}(0.9)^{20-0} + {}^{20}C_{1}\,(0.1)^{1}(0.9)^{20-1} + {}^{20}C_{2}\,(0.1)^{2}(0.9)^{20-2}\right]$$
$$= 1 - [0.1216 + 0.2702 + 0.2852] = 1 - 0.677 = 0.323.$$
The expected number of samples = N · P(r ≥ 3) = 2000 × 0.323 = 646.
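Expected frequencies of the form N · P(r ≥ k) follow the same pattern in every example of this kind: sum the complementary probabilities, subtract from 1, then multiply by N. A minimal sketch for Example 13 (the helper name `binomial_pmf` is hypothetical):

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(r) = nCr p^r q^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p, N = 20, 0.1, 2000
# P(r >= 3) = 1 - [P(0) + P(1) + P(2)]
p_at_least_3 = 1 - sum(binomial_pmf(r, n, p) for r in range(3))
print(round(p_at_least_3, 4), round(N * p_at_least_3))   # 0.3231 646
```

With full precision the probability is about 0.3231 rather than the 0.323 obtained from four-place terms, but the expected count still rounds to 646.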
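Example 14. Find the binomial distribution whose mean is 5 and variance is $\frac{10}{3}$.
Solution. We know that mean = np and variance = npq, so np = 5 and $npq = \frac{10}{3}$.
Now, $\frac{npq}{np} = \frac{10/3}{5} \Rightarrow q = \frac{2}{3}$, so $p = 1 - q = 1 - \frac{2}{3} = \frac{1}{3}$. But np = 5, so n = 15.
Hence, the binomial distribution is
$$P(r) = {}^{15}C_{r}\left(\frac{1}{3}\right)^{r}\left(\frac{2}{3}\right)^{15-r}, \qquad r = 0, 1, 2, \dots, 15.$$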
Example 15. Four coins are tossed 160 times. The number of times r heads occur is given below.
r               0    1    2    3    4
No. of times    8   34   69   43    6
Fit a binomial distribution to this data on the hypothesis that the coins are unbiased.
Solution. The coins are unbiased, so the probability of getting a head is $\frac{1}{2}$.
So $p = \frac{1}{2}$, $q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$. Here, n = 4, N = 160.
f(r) = expected frequency = N · P(r).
$$P(0) = {}^{4}C_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{4-0} = 1 \times 1 \times \frac{1}{16} = \frac{1}{16}.$$
Using the recurrence relation
$$P(r+1) = \frac{n-r}{r+1} \cdot \frac{p}{q} \cdot P(r) = \frac{4-r}{r+1}\,P(r) \qquad (\because\ p = q \text{ and } n = 4),$$
we have
$$P(1) = \frac{4-0}{0+1}\,P(0) = 4 \times \frac{1}{16} = \frac{1}{4}, \qquad P(2) = \frac{4-1}{1+1}\,P(1) = \frac{3}{2} \times \frac{1}{4} = \frac{3}{8},$$
$$P(3) = \frac{4-2}{2+1}\,P(2) = \frac{2}{3} \times \frac{3}{8} = \frac{1}{4}, \qquad P(4) = \frac{4-3}{3+1}\,P(3) = \frac{1}{4} \times \frac{1}{4} = \frac{1}{16}.$$
r     P(r)     N · P(r)
0     1/16     160 × 1/16 = 10
1     1/4      160 × 1/4  = 40
2     3/8      160 × 3/8  = 60
3     1/4      160 × 1/4  = 40
4     1/16     160 × 1/16 = 10
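The recurrence P(r + 1) = ((n − r)/(r + 1)) · (p/q) · P(r) turns fitting into a single loop that starts from P(0) = qⁿ. The sketch below uses exact fractions so that the fitted frequencies of Example 15 come out as whole numbers; the function name `fit_binomial` is hypothetical.

```python
from fractions import Fraction

def fit_binomial(n, p, N):
    """Expected frequencies N * P(r), r = 0..n, built with the recurrence
    P(r+1) = (n - r)/(r + 1) * (p/q) * P(r), starting from P(0) = q^n."""
    q = 1 - p
    probs = [q**n]
    for r in range(n):
        probs.append(probs[-1] * Fraction(n - r, r + 1) * p / q)
    return [N * pr for pr in probs]

print([int(f) for f in fit_binomial(4, Fraction(1, 2), 160)])   # [10, 40, 60, 40, 10]
```

The same function also reproduces Example 11: with n = 6, p = 1/3 and N = 729, the last four fitted frequencies are 160, 60, 12 and 1, which add up to 233.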
EXERCISE 2.1
1. A pair of dice is thrown 6 times. If getting a total of 9 is considered a success, what is the probability of at least 5 successes?
2. A die is thrown 6 times. If getting an odd number is a success, find the probability of
(i) no success (ii) 5 successes
(iii) at least 5 successes (iv) at most 5 successes.
3. A coin is tossed 5 times. What is the probability of getting at least 3 heads?
4. Find the probability distribution of the number of heads observed when a coin is tossed 3 times.
5. If, on average, one ship in every ten is wrecked, find the probability that, out of 5 ships expected to arrive, at least 4 will arrive safely.
6. A pair of dice is thrown 4 times. What is the probability of getting doublets at least twice?
7. The mean and variance of a binomial distribution are 6 and 9, respectively. Is this statement correct?
8. A student is given a true-false examination with 8 questions. If he gets 6 or more correct answers, he passes the examination. Given that he guesses the answer to each question, find the probability that he passes the examination.
9. A box contains 60 bulbs, of which 6 are defective. What is the probability that, in a sample of 5 bulbs,
(i) none is defective (ii) exactly 2 are defective?
10. The sum of the mean and variance of a binomial distribution is 15, and the sum of their squares is 117. Determine the distribution.
11. Out of 2000 families with 4 children each, how many would you expect to have
(i) at least one boy (ii) 2 boys
(iii) 1 or 2 girls (iv) no girls?
Assume equal probabilities for boys and girls.
12. In sampling a large number of parts manufactured by a machine, the mean number of defectives in a sample of 20 is 2. Out of 1000 such samples, how many would be expected to contain at least 3 defectives?
13. Assume that 20% of the population of a city are literate, so that the chance of an individual being literate is $\frac{1}{5}$, and that 100 investigators each take 10 individuals to see whether they are literate. How many investigators would you expect to report that 3 or fewer were literate?
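Answers
1. $\frac{49}{9^{6}}$   2. (i) $\frac{1}{64}$ (ii) $\frac{3}{32}$ (iii) $\frac{7}{64}$ (iv) $\frac{63}{64}$
3. $\frac{1}{2}$
4. r:    0     1     2     3
   P(r): 1/8   3/8   3/8   1/8
5. $\frac{7}{5}\left(\frac{9}{10}\right)^{4}$   6. $\frac{19}{144}$
7. No.   8. $\frac{37}{256}$
9. (i) $\left(\frac{9}{10}\right)^{5}$ (ii) $\frac{729}{10000}$
10. $P(r) = {}^{27}C_{r}\left(\frac{1}{3}\right)^{r}\left(\frac{2}{3}\right)^{27-r}$; r = 0, 1, 2, ..., 27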
$$P(X = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}, \qquad \text{where } r = 0, 1, 2, 3, \dots$$
Note. (i) P(X = r) is usually written as P(r).
(ii) λ is called the parameter of the distribution.
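(iii) The sum of the probabilities P(r) for r = 0, 1, 2, 3, ... is 1, since
$$P(0) + P(1) + P(2) + P(3) + \dots = e^{-\lambda}\frac{\lambda^{0}}{0!} + e^{-\lambda}\frac{\lambda^{1}}{1!} + e^{-\lambda}\frac{\lambda^{2}}{2!} + e^{-\lambda}\frac{\lambda^{3}}{3!} + \dots$$
$$= e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \dots\right] = e^{-\lambda} \cdot e^{\lambda} = 1.$$
(iv) The events must be random and independent of each other.
(v) The events must be rare.
(vi) If an experiment consisting of n independent trials is repeated N times, the expected frequency of r successes is N · P(r).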
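The Poisson probability function and Notes (iii) and (vi) translate directly into a few lines of Python. The sketch below is illustrative: λ = 2 and N = 1000 are values chosen here, the infinite sum is truncated at r = 59 (where the remaining terms are negligible for λ = 2), and the name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(r) = e^(-lambda) * lambda^r / r!"""
    return exp(-lam) * lam**r / factorial(r)

lam = 2.0
# Note (iii): the probabilities add up to 1 (truncated at r = 59 here)
print(sum(poisson_pmf(r, lam) for r in range(60)))
# Note (vi): expected frequencies N * P(r), e.g. for N = 1000 repetitions
N = 1000
print([round(N * poisson_pmf(r, lam), 1) for r in range(5)])
```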
2.6. APPLICATIONS OF POISSON DISTRIBUTION
We have $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$ and $P(r+1) = \frac{e^{-\lambda}\lambda^{r+1}}{(r+1)!}$, so
$$\frac{P(r+1)}{P(r)} = \frac{e^{-\lambda}\lambda^{r+1}/(r+1)!}{e^{-\lambda}\lambda^{r}/r!} = \frac{\lambda \cdot r!}{(r+1)!} = \frac{\lambda \cdot r!}{(r+1)\,r!} = \frac{\lambda}{r+1}$$
$$\Rightarrow P(r+1) = \frac{\lambda}{r+1}\,P(r),$$
which is the required recurrence formula. Using this formula successively, we can find P(1), P(2), ..., if P(0) is known.
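The recurrence avoids computing powers and factorials separately: each probability is the previous one multiplied by λ/(r + 1). A minimal sketch (the function name `poisson_table` is hypothetical), applied to λ = 2 as in Example 8 of the solved examples:

```python
from math import exp

def poisson_table(lam, r_max):
    """P(0), ..., P(r_max) via P(r+1) = lambda/(r+1) * P(r), with P(0) = e^(-lambda)."""
    probs = [exp(-lam)]
    for r in range(r_max):
        probs.append(probs[-1] * lam / (r + 1))
    return probs

probs = poisson_table(2, 4)
print([round(x, 4) for x in probs])   # [0.1353, 0.2707, 0.2707, 0.1804, 0.0902]
print(round(1 - sum(probs[:4]), 4))   # P(r >= 4) = 0.1429
```

Because full precision is used for e^(−2) instead of the four-place 0.1353, P(1) and P(2) round to 0.2707 and P(r ≥ 4) to 0.1429, which differ slightly in the fourth decimal from the hand-computed values in Example 8.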
2.8. MEAN, VARIANCE AND STANDARD DEVIATION OF POISSON DISTRIBUTION
For the Poisson distribution, we have
$$P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}.$$
The mean (μ) is given by
$$\mu = \sum_{r=0}^{\infty} r\,P(r) = \sum_{r=0}^{\infty} r\,\frac{e^{-\lambda}\lambda^{r}}{r!} = e^{-\lambda}\left[0 + 1 \cdot \frac{\lambda}{1!} + 2 \cdot \frac{\lambda^{2}}{2!} + 3 \cdot \frac{\lambda^{3}}{3!} + \dots\right]$$
$$= \lambda e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \dots\right] = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$
Hence, the mean of the Poisson distribution is equal to the parameter λ.
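The variance (σ²) is given by
$$
\begin{aligned}
\sigma^{2} &= \sum_{r=0}^{\infty} r^{2} P(r) - \mu^{2} = \sum_{r=0}^{\infty} r^{2}\,\frac{e^{-\lambda}\lambda^{r}}{r!} - \lambda^{2} \\
&= e^{-\lambda}\left[0 + \frac{1^{2}\lambda}{1!} + \frac{2^{2}\lambda^{2}}{2!} + \frac{3^{2}\lambda^{3}}{3!} + \frac{4^{2}\lambda^{4}}{4!} + \dots\right] - \lambda^{2} \\
&= \lambda e^{-\lambda}\left[1 + \frac{2\lambda}{1!} + \frac{3\lambda^{2}}{2!} + \frac{4\lambda^{3}}{3!} + \dots\right] - \lambda^{2} \\
&= \lambda e^{-\lambda}\left[1 + \frac{(1+1)\lambda}{1!} + \frac{(1+2)\lambda^{2}}{2!} + \frac{(1+3)\lambda^{3}}{3!} + \dots\right] - \lambda^{2} \\
&= \lambda e^{-\lambda}\left[\left(1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \dots\right) + \left(\frac{\lambda}{1!} + \frac{2\lambda^{2}}{2!} + \frac{3\lambda^{3}}{3!} + \dots\right)\right] - \lambda^{2} \\
&= \lambda e^{-\lambda}\left[e^{\lambda} + \lambda\left(1 + \frac{\lambda}{1!} + \frac{\lambda^{2}}{2!} + \dots\right)\right] - \lambda^{2} \\
&= \lambda e^{-\lambda}\left[e^{\lambda} + \lambda e^{\lambda}\right] - \lambda^{2} = \lambda + \lambda^{2} - \lambda^{2} = \lambda.
\end{aligned}
$$
Hence, the variance of the Poisson distribution is also λ, and its standard deviation is $\sqrt{\lambda}$.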
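Both results, mean = λ and variance = λ, can be checked numerically by summing r·P(r) and r²·P(r) over a long but finite range of r, generating P(r) with the recurrence formula of the previous section. The truncation point r_max = 100 is an assumption that is adequate only for moderate λ; the function name `poisson_moments` is hypothetical.

```python
from math import exp

def poisson_moments(lam, r_max=100):
    """Mean and variance of a Poisson variate from the pmf truncated at r_max."""
    pr = exp(-lam)          # P(0)
    mean = second = 0.0
    for r in range(r_max + 1):
        mean += r * pr
        second += r * r * pr
        pr *= lam / (r + 1)  # recurrence: P(r+1) = lambda/(r+1) * P(r)
    return mean, second - mean**2

print(poisson_moments(1.5))   # both values approximately 1.5
```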
SOLVED EXAMPLES
Example 1. Using the Poisson distribution, find the probability that the ace of spades will be drawn from a well-shuffled pack of cards at least once in 104 consecutive trials. (Given $e^{-2} = 0.1353$)
Solution. $p = \frac{1}{52}$, n = 104, so $\lambda = np = 104 \times \frac{1}{52} = 2$.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
$$P(\text{at least once}) = P(r \ge 1) = 1 - P(r < 1) = 1 - P(r = 0) = 1 - \frac{e^{-2}\,2^{0}}{0!} = 1 - e^{-2} = 1 - 0.1353 = 0.8647.$$
Example 2. Suppose a book of 585 pages contains 43 typographical mistakes. If these mistakes are randomly distributed throughout the book, what is the probability that 10 pages, selected at random, will be free of mistakes? (Given $e^{-0.735} = 0.4795$)
Solution. $p = \frac{43}{585} = 0.0735$, n = 10, so λ = np = 0.0735 × 10 = 0.735.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
$$\text{Required probability} = P(r = 0) = \frac{e^{-0.735}(0.735)^{0}}{0!} = e^{-0.735} = 0.4795.$$
Example 3. A car hire firm has two cars, which it hires out day by day. The number of demands for a car on each day follows a Poisson distribution with mean 1.5. Calculate the proportion of days on which neither car is used and the proportion of days on which some demand is refused. (Given $e^{-1.5} = 0.2231$)
Solution. The mean of the Poisson distribution is λ = 1.5.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
The proportion of days on which neither car is used
= probability of there being no demand for a car
$$= P(r = 0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = e^{-1.5} = 0.2231.$$
The proportion of days on which some demand is refused
= probability that the number of demands is more than two
$$= P(r > 2) = 1 - P(r \le 2) = 1 - [P(r = 0) + P(r = 1) + P(r = 2)] = 1 - \left[e^{-1.5} + e^{-1.5}\frac{(1.5)}{1!} + e^{-1.5}\frac{(1.5)^{2}}{2!}\right]$$
$$= 1 - [1 + 1.5 + 1.125]\,e^{-1.5} = 1 - 3.625 \times 0.2231 = 0.1913.$$
Example 4. Find the probability that at most 5 defective components will be found in a lot of 200, if experience shows that 2% of such components are defective. Also find the probability of more than 5 defective components. (Given $e^{-4} = 0.018$)
Solution. Here, $p = \frac{2}{100} = \frac{1}{50}$, n = 200, so $\lambda = np = 200 \times \frac{1}{50} = 4$.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
Probability that at most 5 defective components will be found = P(r ≤ 5)
$$= P(r = 0) + P(r = 1) + P(r = 2) + P(r = 3) + P(r = 4) + P(r = 5)$$
$$= e^{-4}\frac{4^{0}}{0!} + e^{-4}\frac{4^{1}}{1!} + e^{-4}\frac{4^{2}}{2!} + e^{-4}\frac{4^{3}}{3!} + e^{-4}\frac{4^{4}}{4!} + e^{-4}\frac{4^{5}}{5!}$$
$$= e^{-4}\left[1 + 4 + \frac{16}{2} + \frac{64}{6} + \frac{256}{24} + \frac{1024}{120}\right] = 0.018 \times 42.86 = 0.7715.$$
Probability of more than 5 defective components = P(r > 5) = 1 − P(r ≤ 5) = 1 − 0.7715 = 0.2285.
Example 5. It is given that 2% of the electric bulbs manufactured by a company are defective. Using the Poisson distribution, find the probability that a sample of 200 bulbs will contain
(i) no defective bulb (ii) 2 defective bulbs
(iii) at most 3 defective bulbs (iv) at least 3 defective bulbs. (Given $e^{-4} = 0.0183$)
Solution. Here, $p = \frac{2}{100} = \frac{1}{50}$, n = 200, so $\lambda = np = 200 \times \frac{1}{50} = 4$.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
(i) Probability of no defective bulb = $P(r = 0) = \frac{e^{-4}\,4^{0}}{0!} = e^{-4} = 0.0183$.
(ii) Probability of 2 defective bulbs = $P(r = 2) = \frac{e^{-4}\,4^{2}}{2!} = 0.0183 \times \frac{16}{2} = 0.1464$.
(iii) Probability of at most 3 defective bulbs = P(r ≤ 3)
= P(r = 0) + P(r = 1) + P(r = 2) + P(r = 3)
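Solution. Here, $p = 0.5\% = \frac{0.5}{100} = 0.005$, n = 5, so λ = np = 5 × 0.005 = 0.025.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
Probability of 3 or more defective blades = P(r ≥ 3)
$$= 1 - P(r < 3) = 1 - [P(r = 0) + P(r = 1) + P(r = 2)] = 1 - \left[e^{-0.025} + e^{-0.025}\frac{(0.025)^{1}}{1!} + e^{-0.025}\frac{(0.025)^{2}}{2!}\right]$$
$$= 1 - e^{-0.025}\,[1 + 0.025 + 0.0003125] = 1 - 0.9753099 \times 1.0253125 = 1 - 0.9999974 = 0.0000026.$$
Because the answer is the difference of two nearly equal numbers, $e^{-0.025}$ must be carried to about seven decimal places here; with the four-place value 0.9753, the subtraction gives a result that is several times too large.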
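The loss of accuracy in this example is a cancellation effect, and a short Python check makes it visible. The sketch below compares three computations: the formula with full double-precision $e^{-0.025}$, the same formula with the four-place value 0.9753, and the direct sum of the tail P(3) + P(4) + ... (truncated at r = 19, which is far more than enough for λ = 0.025). Summing the tail directly involves no subtraction, so it is a useful independent check. The variable names are illustrative only.

```python
from math import exp, factorial, isclose

lam = 0.025
bracket = 1 + lam + lam**2 / 2          # 1 + 0.025 + 0.0003125

direct = 1 - exp(-lam) * bracket        # full double-precision e^(-0.025)
rounded = 1 - 0.9753 * bracket          # e^(-0.025) rounded to four places
# Cross-check: add the tail P(3) + P(4) + ... directly, with no subtraction
tail = sum(exp(-lam) * lam**r / factorial(r) for r in range(3, 20))

print(f"{direct:.3e} {tail:.3e} {rounded:.3e}")
print(isclose(direct, tail, rel_tol=1e-6))   # True
```

The first two numbers agree at about 2.56 × 10⁻⁶, while the four-place version gives about 1.27 × 10⁻⁵, roughly five times too large.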
Example 8. If the variance of the Poisson distribution is 2, find the probabilities for r = 1, 2, 3, 4 using the recurrence relation of the Poisson distribution. Also find P(r ≥ 4). (Given $e^{-2} = 0.1353$)
Solution. Here, variance = 2. Since the variance of a Poisson distribution equals its parameter, λ = 2.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$, so
$$P(0) = \frac{e^{-2}\,2^{0}}{0!} = e^{-2} = 0.1353.$$
We know that the recurrence relation is $P(r+1) = \frac{\lambda}{r+1}\,P(r)$.
Now, putting r = 0, 1, 2, 3 in the recurrence relation, we have
$P(1) = \frac{2}{1}\,P(0) = 2 \times 0.1353 = 0.2706$
$P(2) = \frac{2}{2}\,P(1) = 1 \times 0.2706 = 0.2706$
$P(3) = \frac{2}{3}\,P(2) = \frac{2}{3} \times 0.2706 = 0.1804$
$P(4) = \frac{2}{4}\,P(3) = \frac{1}{2} \times 0.1804 = 0.0902$
and
$$P(r \ge 4) = 1 - P(r < 4) = 1 - [P(0) + P(1) + P(2) + P(3)] = 1 - (0.1353 + 0.2706 + 0.2706 + 0.1804) = 1 - 0.8569 = 0.1431.$$
Example 9. A manufacturer who produces medicine bottles finds that 0.1% of the bottles are defective. The bottles are packed in boxes containing 500 bottles. A drug manufacturer buys 1000 boxes from the producer of bottles. Using the Poisson distribution, find how many boxes will contain
(i) no defective bottle
(ii) at least two defective bottles. (Given $e^{-0.5} = 0.6065$)
Solution. Here, $p = 0.1\% = \frac{0.1}{100} = 0.001$, n = 500, so λ = np = 500 × 0.001 = 0.5, and N = 1000.
Number of boxes containing no defective bottle = N · P(r = 0)
$$= 1000 \times \frac{e^{-0.5}(0.5)^{0}}{0!} = 1000 \times 0.6065 = 606.5 \approx 606 \text{ (approx.)}$$
Number of boxes containing at least two defective bottles = N · P(r ≥ 2) = N · [1 − P(r < 2)] = N × [1 − (P(r = 0) + P(r = 1))]
$$= 1000\left[1 - \left(e^{-0.5} + e^{-0.5}\frac{(0.5)^{1}}{1!}\right)\right] = 1000 \times [1 - (0.6065 \times 1.5)] = 1000 \times [1 - 0.90975]$$
$$= 1000 \times 0.09025 = 90.25 \approx 90 \text{ (approx.)}.$$
Example 10. After correcting 100 pages of a book, the proof-reader finds that there are, on average, 4 errors per 10 pages. How many pages would one expect to find with 0, 1 and 2 errors in 1000 pages of the first print of the book? (Given $e^{-0.4} = 0.6703$)
Solution. Here, λ = average number of errors per page $= \frac{4}{10} = 0.4$, N = 1000.
We know that $P(r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$.
(i) Probability of no errors = $P(r = 0) = \frac{e^{-0.4}(0.4)^{0}}{0!} = e^{-0.4} = 0.6703$.
Number of pages containing no errors = N · P(r = 0) = 1000 × 0.6703 = 670.3 ≈ 670 (approx.)
(ii) Probability of one error = $P(r = 1) = \frac{e^{-0.4}(0.4)^{1}}{1!} = 0.6703 \times 0.4 = 0.26812$.
Number of pages containing one error = N · P(r = 1) = 1000 × 0.26812 = 268.12 ≈ 268 (approx.)
(iii) Probability of two errors = $P(r = 2) = \frac{e^{-0.4}(0.4)^{2}}{2!} = 0.6703 \times 0.08 = 0.053624$.
Number of pages containing two errors = N · P(r = 2) = 1000 × 0.053624 = 53.624 ≈ 54 (approx.).
Example 11. For a Poisson variate X, calculate P(X > 0), if it is given that 4P(X = 4) = 5P(X = 5).
Solution. Given 4P(X = 4) = 5P(X = 5),
$$4 \cdot \frac{e^{-\lambda}\lambda^{4}}{4!} = 5 \cdot \frac{e^{-\lambda}\lambda^{5}}{5!} \;\Rightarrow\; \frac{4\lambda^{4}}{4!} = \frac{5\lambda^{5}}{5 \cdot 4!} \;\Rightarrow\; 4\lambda^{4} = \lambda^{5} \;\Rightarrow\; \lambda = 4.$$
$$P(X > 0) = 1 - P(X \le 0) = 1 - P(X = 0) = 1 - \frac{e^{-4}\,4^{0}}{0!} = 1 - e^{-4} = 1 - 0.0183 = 0.9817.$$
Example 12. The frequency of accidents per shift in a factory is shown in the following data:
No. of accidents per shift (x)    Frequency (f)
0                                 192
1                                 100
2                                 24
3                                 3
4                                 1
Total                             320
Calculate the mean number of accidents per shift. Fit a Poisson distribution and calculate the theoretical frequencies.
Solution. Mean number of accidents per shift
$$\lambda = \frac{\sum fx}{\sum f} = \frac{0 + 100 + 48 + 9 + 4}{320} = \frac{161}{320} = 0.5031.$$
Required Poisson distribution
$$= N \cdot \frac{e^{-\lambda}\lambda^{r}}{r!} = 320 \times e^{-0.5031} \times \frac{(0.5031)^{r}}{r!} = \frac{193.48\,(0.5031)^{r}}{r!}.$$
r    N · P(r)    Theoretical frequency
0    193.48      193
1    97.34       97
2    24.49       24
3    4.10        4
4    0.51        1
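Fitting a Poisson distribution to a frequency table, as in Example 12, takes two steps: estimate λ by the sample mean Σfx/Σf, then compute N · P(r) for each r. A minimal sketch, where `freq[r]` is assumed to hold the observed frequency of r events (the function name `fit_poisson` is hypothetical):

```python
from math import exp, factorial

def fit_poisson(freq):
    """freq[r] = observed frequency of r events.
    Returns (lambda, expected frequencies N * P(r) for the same r values)."""
    N = sum(freq)
    lam = sum(r * f for r, f in enumerate(freq)) / N   # sample mean
    expected = [N * exp(-lam) * lam**r / factorial(r) for r in range(len(freq))]
    return lam, expected

lam, expected = fit_poisson([192, 100, 24, 3, 1])
print(lam)                              # 0.503125
print([round(e) for e in expected])     # [193, 97, 24, 4, 1]
```

As in the worked example, the rounded theoretical frequencies add up to 319, not 320, because rounding each entry separately loses a little of the total.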
Example 13. A typist kept a record of mistakes per day during 300 working days:
Fit a Poisson distribution for the above data and calculate the theoretical frequencies.
Solution. Here,
$$\lambda = \text{mean} = \frac{\sum fx}{\sum f} = \frac{0 + 90 + 88 + 42 + 36}{300} = \frac{256}{300} = 0.853, \qquad N = 300.$$
$$P(r) = \frac{e^{-0.853}(0.853)^{r}}{r!} = (0.426)\,\frac{(0.853)^{r}}{r!}$$
$P(0) = (0.426)\frac{(0.853)^{0}}{0!} = 0.426$
$P(1) = (0.426)\frac{(0.853)^{1}}{1!} = 0.426 \times 0.853 = 0.363$
$P(2) = (0.426)\frac{(0.853)^{2}}{2!} = 0.426 \times 0.3638 = 0.155$
$P(3) = (0.426)\frac{(0.853)^{3}}{3!} = 0.426 \times 0.1034 = 0.044$
$P(4) = (0.426)\frac{(0.853)^{4}}{4!} = 0.426 \times 0.0221 = 0.009$
r    N · P(r)    Theoretical frequency
0    127.8       128
1    108.9       109
2    46.5        47
3    13.2        13
4    2.7         3
EXERCISE 2.2
1. Suppose a book of 600 pages contains 40 printing mistakes. If these mistakes are randomly distributed throughout the book, what is the probability that 10 pages, selected at random, will be free of mistakes? (Given $e^{-0.67} = 0.51$)
2. Suppose 300 misprints are distributed randomly throughout a book of 500 pages. Find the probability that a given page contains (i) exactly 2 misprints (ii) 2 or more misprints.
3. Suppose 2 percent of the items made by a factory are defective. Find the probability that there are 3 defective items in a sample of 100 items. (Given $e^{-2} = 0.135$)
4. If the probability that an individual suffers a bad reaction from a certain injection is 0.001, determine the probability that, out of 2000 individuals,
(i) exactly 3 (ii) more than 2
(iii) none (iv) more than 1
will suffer a bad reaction.
5. An insurance company finds that 0.005% of the population dies from a certain kind of accident each year. What is the probability that the company must pay off on no more than 3 of 10,000 insured risks against such an accident in a given year? (Given $e^{-0.5} = 0.6065$)
6. A manufacturer of screws knows that 4% of his product is defective. He sells the screws in boxes of 100 and guarantees that not more than 5 screws will be defective. What is the probability that a box will fail to meet the guaranteed quality?
7. A manufacturer knows that the condensers he makes contain, on average, 1% defectives. He packs them in boxes of 100. What is the probability that a box picked at random will contain 4 or more defective condensers?
8. Assume that the probability of an individual coal-miner being killed in a mine accident during a year is $\frac{1}{2400}$. Use the Poisson distribution to calculate the probability that, in a mine employing 200 miners, there will be at least one fatal accident in a year.
9. An insurance company found that only 0.01% of the population is involved in a certain type of accident each year. If its 1000 policy holders were randomly selected from the population, what is the probability that not more than two of its clients are involved in such an accident next year? (Given $e^{-0.1} = 0.9048$)
10. If X is a Poisson variate such that P(X = 2) = 9P(X = 4) + 90P(X = 6), find the mean of X.
11. If X is a Poisson variate such that P(X = 1) = 0.01487 and P(X = 2) = 0.04461, find P(X = 3).
12. If X is a Poisson variate such that P(X = 1) = P(X = 2), find
(i) the mean of the distribution (ii) P(X = 0) (iii) P(X = 4)
(Given $e^{-2} = 0.1353$)
13. The number of accidents in a year involving taxi drivers in a city follows a Poisson distribution with mean equal to 3. Out of 1000 taxi drivers, find approximately the number of drivers with
(i) no accident in a year (ii) more than 3 accidents in a year.
14. In a certain factory turning out razor blades, there is a small chance, $\frac{1}{500}$, for any blade to be defective. The blades are supplied in packets of 10. Using the Poisson distribution, calculate the approximate number of packets containing
(i) no defective
(ii) one defective and
(iii) two defective blades, respectively, in a consignment of 10,000 packets.
(Given $e^{-0.02} = 0.9802$)
15. The distribution of typing mistakes committed by a typist is given below:
No. of mistakes 0 1 2 3 4 5
Answers
1. 0.51   2. (i) 0.1 (ii) 0.122
3. 0.18
4. (i) 0.18 (ii) 0.325 (iii) 0.135 (iv) 0.59
5. 0.9982   6. $1 - e^{-4}\left[5 + \frac{4^{2}}{2!} + \frac{4^{3}}{3!} + \frac{4^{4}}{4!} + \frac{4^{5}}{5!}\right]$
7. 0.019   8. 0.08   9. 0.9998
10. 1   11. 0.08922
12. (i) 2 (ii) 0.1353 (iii) 0.0902
13. (i) 50 (ii) 353
14. (i) 9802 (ii) 196 (iii) 2
15. 147, 147, 74, 25, 6, 1 pages   16. 202, 138, 47, 11, 2, 0
17. 109, 66, 20, 4, 1, 0
2.10. PROPERTIES OF THE NORMAL DISTRIBUTION
The normal probability curve with mean μ and standard deviation σ is given by the equation
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty,$$
and has the following properties:
(i) f(x) ≥ 0.
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e., the total area under the normal curve above the x-axis is 1.
(iii) The normal curve is bell-shaped and symmetrical about the line x = μ, i.e., about the mean.
(iv) It is a unimodal distribution, and because it is symmetrical, mean = median = mode.
(v) The height of the normal curve is maximum at the mean. The maximum ordinate, at x = μ, is $y = \frac{1}{\sigma\sqrt{2\pi}}$.
(vi) P(μ − σ < x < μ + σ) ≈ 68.27%
P(μ − 2σ < x < μ + 2σ) ≈ 95.45%
P(μ − 3σ < x < μ + 3σ) ≈ 99.73%.
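Property (vi) and the maximum ordinate in (v) can be reproduced without a printed table. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal distribution function and uses Python's `math.erf`; the function names `normal_pdf` and `normal_cdf` are hypothetical.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-((x - mu)/sigma)^2 / 2)"""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x), written with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Property (vi): area within k standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

For property (v), `normal_pdf(mu, mu, sigma)` equals 1/(σ√(2π)), the height of the curve at the mean.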
2.11. STANDARD FORM OF THE NORMAL DISTRIBUTION
SOLVED EXAMPLES
Example 1. The marks obtained by a group of students who appeared for a test were normally distributed with mean 80 and standard deviation 6. Find the standard scores for the students who scored
(i) 98 marks (ii) 58 marks (iii) 50 marks.
Solution. Suppose x is normally distributed with mean (μ) = 80 and standard deviation (σ) = 6.
We know that the standard normal variate is $Z = \frac{x - \mu}{\sigma}$.
(i) When x = 98, $Z = \frac{98 - 80}{6} = \frac{18}{6} = 3$.
(ii) When x = 58, $Z = \frac{58 - 80}{6} = \frac{-22}{6} = -3.67$.
(iii) When x = 50, $Z = \frac{50 - 80}{6} = \frac{-30}{6} = -5$.
Example 2. Find the area under the standard normal curve in each of the following:
(i) P(0 ≤ Z ≤ 1.4) (ii) P(−1.67 ≤ Z < 0)
(iii) P(0.65 ≤ Z < 2.35) (iv) P(−3 ≤ Z < 1.6).
Solution. Using the table of areas under the standard normal curve, we have
(i) P(0 ≤ Z ≤ 1.4) = 0.4192
Example 3. Students of a class were given an aptitude test. Their marks were found to be normally distributed with mean 60 and standard deviation 5. What percentage of students scored more than 60 marks?
Solution. Here, μ = 60, σ = 5, x = 60.
$$Z = \frac{x - \mu}{\sigma} = \frac{60 - 60}{5} = 0$$
P(x > 60) = P(Z > 0) = P(0 < Z < ∞) = 0.5 = 50%.
Example 4. A sample of 100 dry battery cells tested to find the length of life produced the following results:
μ = 12 hours, σ = 3 hours.
Assuming the data to be normally distributed, what percentage of battery cells are expected to have a life of
(i) more than 15 hours (ii) less than 6 hours
(iii) between 10 and 14 hours?
Solution. Here, x denotes the length of life of the dry battery cells.
We know that $Z = \frac{x - \mu}{\sigma} = \frac{x - 12}{3}$.
(i) When x = 15, $Z = \frac{15 - 12}{3} = \frac{3}{3} = 1$.
P(x > 15) = P(Z > 1) = P(0 < Z < ∞) − P(0 < Z < 1) = 0.5 − 0.3413 = 0.1587 = 15.87%.
(ii) When x = 6, $Z = \frac{6 - 12}{3} = \frac{-6}{3} = -2$.
P(x < 6) = P(Z < −2) = P(Z > 2) (by symmetry) = P(0 < Z < ∞) − P(0 < Z < 2) = 0.5 − 0.4772 = 0.0228 = 2.28%.
(iii) When x = 10, $Z = \frac{10 - 12}{3} = \frac{-2}{3} = -0.67$. When x = 14, $Z = \frac{14 - 12}{3} = \frac{2}{3} = 0.67$.
P(10 < x < 14) = P(−0.67 < Z < 0.67)
= P(−0.67 < Z < 0) + P(0 < Z < 0.67)
= P(0 < Z < 0.67) + P(0 < Z < 0.67)
= 2P(0 < Z < 0.67) = 2 × 0.2486 = 0.4972 = 49.72%.
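The table look-ups in Example 4 can be cross-checked with the error function, using the standard identity Φ(z) = ½[1 + erf(z/√2)]. The helper names `phi` and `z_score` below are hypothetical, and the default μ = 12 and σ = 3 are taken from the example.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative probability P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_score(x, mu=12, sigma=3):
    """Standard normal variate Z = (x - mu) / sigma."""
    return (x - mu) / sigma

print(round(100 * (1 - phi(z_score(15))), 2))                 # (i)   about 15.87
print(round(100 * phi(z_score(6)), 2))                        # (ii)  about 2.28
print(round(100 * (phi(z_score(14)) - phi(z_score(10))), 2))  # (iii) about 49.5
```

Part (iii) comes out near 49.5% rather than 49.72% because the code uses the exact Z = 2/3, whereas the table calculation rounds Z to 0.67 first.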
Example 5. A normal distribution is given with mean 50 and standard deviation 8. Find the probability that x assumes a value between 38 and 72.
Solution. Here, μ = 50, σ = 8, so $Z = \frac{x - \mu}{\sigma} = \frac{x - 50}{8}$.
When x = 38, $Z = \frac{38 - 50}{8} = \frac{-12}{8} = -1.5$.
When x = 72, $Z = \frac{72 - 50}{8} = \frac{22}{8} = 2.75$.
P(38 < x < 72) = P(−1.5 < Z < 2.75)
= P(−1.5 < Z < 0) + P(0 < Z < 2.75)
= P(0 < Z < 1.5) + P(0 < Z < 2.75)
= 0.4332 + 0.4970 = 0.9302.
Example 6. In a sample of 1000 cases, the mean of a certain test is 14 and the standard deviation is 2.5. Assuming the distribution to be normal, find:
(iii) When x = 8, $Z = \frac{8 - 14}{2.5} = \frac{-6}{2.5} = -2.4$.
P(x < 8) = P(Z < −2.4) = P(Z > 2.4) = P(0 < Z < ∞) − P(0 < Z < 2.4) = 0.5 − 0.4918 = 0.0082.
Number of students scoring below 8 = 1000 × 0.0082 = 8.2 ≈ 8 (approx.)
(iv) Since marks are whole numbers, a score of 16 corresponds to the area between x = 15.5 and x = 16.5.
When x = 15.5, $Z = \frac{15.5 - 14}{2.5} = \frac{1.5}{2.5} = 0.6$.
When x = 16.5, $Z = \frac{16.5 - 14}{2.5} = \frac{2.5}{2.5} = 1$.
P(15.5 < x < 16.5) = P(0.6 < Z < 1) = P(0 < Z < 1) − P(0 < Z < 0.6) = 0.3413 − 0.2257 = 0.1156.
Number of students scoring 16 = 1000 × 0.1156 = 115.6 ≈ 116 (approx.).
Example 7. The life of army shoes is normally distributed with mean 8 months and standard deviation 2 months. If 5000 pairs are issued, how many pairs would be expected to need replacement after 12 months?
Solution. Here, μ = 8, σ = 2, N = 5000.
$$Z = \frac{x - \mu}{\sigma} = \frac{12 - 8}{2} = \frac{4}{2} = 2$$
P(x > 12) = P(Z > 2) = P(0 < Z < ∞) − P(0 < Z < 2) = 0.5 − 0.4772 = 0.0228.
Expected number of pairs = N · P(x > 12) = 5000 × 0.0228 = 114.
Frequency 1 7 15 22 35 43 38 20 13 5 1
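Length of line (in cm)   Frequency   d = x − A
x                        f           = x − 14     fd      fd²
4                        1           −10          −10     100
6                        7           −8           −56     448
8                        15          −6           −90     540
10                       22          −4           −88     352
12                       35          −2           −70     140
14 (A)                   43          0            0       0
16                       38          2            76      152
18                       20          4            80      320
20                       13          6            78      468
22                       5           8            40      320
24                       1           10           10      100
Total                    Σf = 200                 Σfd = −30   Σfd² = 2940
$$\text{Mean } (\mu) = A + \frac{\sum fd}{\sum f} = 14 + \frac{(-30)}{200} = 14 - 0.15 = 13.85$$
$$\text{Standard deviation } (\sigma) = \sqrt{\frac{\sum fd^{2}}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^{2}} = \sqrt{\frac{2940}{200} - \left(\frac{-30}{200}\right)^{2}} = \sqrt{14.7 - 0.0225} = \sqrt{14.6775} = 3.83$$
Hence, the equation of the normal curve is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} = \frac{1}{3.83\sqrt{2\pi}}\,e^{-\frac{(x-13.85)^{2}}{29.355}}.$$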
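The assumed-mean (step-deviation) calculation above is easy to automate: with d = x − A, the mean is A + Σfd/Σf and σ² = Σfd²/Σf − (Σfd/Σf)². A minimal sketch using the table's values (the function name `step_deviation_stats` is hypothetical):

```python
from math import sqrt

def step_deviation_stats(xs, fs, A):
    """Mean and standard deviation using deviations d = x - A from an assumed mean A."""
    N = sum(fs)
    sum_fd = sum(f * (x - A) for x, f in zip(xs, fs))
    sum_fd2 = sum(f * (x - A) ** 2 for x, f in zip(xs, fs))
    mean = A + sum_fd / N
    sd = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
    return mean, sd

xs = list(range(4, 25, 2))                     # 4, 6, ..., 24
fs = [1, 7, 15, 22, 35, 43, 38, 20, 13, 5, 1]
mean, sd = step_deviation_stats(xs, fs, A=14)
print(round(mean, 2), round(sd, 2), round(2 * sd**2, 3))   # 13.85 3.83 29.355
```

The result does not depend on the choice of A; an assumed mean near the centre only keeps the hand arithmetic small, which is why the text uses A = 14.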
EXERCISE 2.3
1. On a final examination in Statistics, the mean was 72 and the standard deviation was 15. Determine the standard scores of students receiving grades:
(i) 60 (ii) 93 (iii) 72.
2. Find the area under the normal curve in each of the following cases:
(i) between Z = 0 and Z = 1.2 (ii) between Z = −0.68 and Z = 0
(iii) between Z = −0.46 and Z = 2.21 (iv) between Z = 0.81 and Z = 1.94
(v) to the left of Z = −0.6 (vi) to the right of Z = −1.28.
3. Find the value of Z in each of the following cases:
(i) the area between 0 and Z is 0.3770 (ii) the area to the left of Z is 0.8621.
4. If X is a normal variate with mean 12 and standard deviation 4, find
(i) P(X ≥ 20) (ii) P(X ≤ 20).
5. The scores of candidates in a certain test are normally distributed with mean 500 and standard deviation 100. What percentage of candidates receive scores between 350 and 550?
6. Assume the mean height of soldiers to be 68.22 inches with a variance of 10.8 square inches. How many soldiers in a regiment of 10,000 would you expect to be over 6 feet tall?
7. In a sample of 1000 items, the mean weight and standard deviation are 45 kg and 15 kg, respectively. Assuming the distribution to be normal, find the number of items weighing between 40 kg and 60 kg.
8. A workshop produces 2000 units per day. The average weight of the units is 130 kg with a standard deviation of 10 kg. Assuming a normal distribution, how many units are expected to weigh less than 142 kg?
9. The mean of a normal distribution is 50, and 5% of the values are greater than 60. Find the standard deviation of the distribution.
10. The time taken to complete a particular type of job is approximately normally distributed with a mean of 1.8 hours and a standard deviation of 0.1 hour. If normal working time finishes at 6.00 p.m. and a job is started at 4.00 p.m., what is the probability that the job will need overtime payment?
11. The marks of the students in a class are normally distributed with mean 70 and standard deviation 5. If the instructor decides to give an A grade to the top 15% of the class, how many marks must a student get to receive an A grade?
12. Find the mean and standard deviation of a normal distribution from the following data:
10% of the items are under 40;
95% of the items are under 75.
13. In a sample of 240 workers in a factory, the mean and standard deviation of wages were ₹113.50 and ₹30.30, respectively. Find the percentage of workers getting wages between ₹90 and ₹170 in the whole factory, assuming that the wages are normally distributed.
14. In a distribution that is exactly normal, 7% of the items are under 35 and 89% are under 63. What are the mean and standard deviation of the distribution?
15. Fit a normal curve to the following data:
Variable (r)    0    1    2    3    4    5
Frequency (f)   10   14   19   8    5    4
Answers
1. (i) −0.8 (ii) 1.4 (iii) 0
2. (i) 0.3849 (ii) 0.2518 (iii) 0.6637 (iv) 0.1828
(v) 0.2743 (vi) 0.8997   3. (i) Z = ±1.16 (ii) Z = 1.09
4. (i) 0.0228 (ii) 0.9772   5. 62.47%   6. 1251
7. 471 approx.   8. 1770   9. 6.1   10. 0.0228
11. 75.18   12. 55.32, 11.97   13. 75%   14. 50.3, 10.33
15. $f(x) = \frac{1}{1.40\sqrt{2\pi}}\,e^{-\frac{(x-1.93)^{2}}{3.92}}$
3. STATISTICAL DECISION THEORY
STRUCTURE
Introduction
Elements of a Decision Problem
Types of Decision Making Environment
Decision Making Under Uncertainty
Decision Making Under Risk
Decision Tree
3.1. INTRODUCTION
Decision analysis involves the use of a rational process for selecting the best of several alternatives. The goodness of a selected alternative depends on the quality of the data used in describing the decision situation. A decision-making process falls into one of three categories: decision making under certainty, decision making under uncertainty and decision making under risk.
Nowadays, management students, business people, engineers, and people from industry and government place much emphasis on decision making under uncertainty, since most real situations involve making choices under uncertainty. The study of how to choose the best among a number of alternative courses of action is known as decision theory or statistical decision theory.
The following essential elements are common to all categories of decision making:
1. The decision maker. The decision maker is charged with the responsibility for making the decisions. The decision maker can be an individual, a group of individuals, a company, an industrial body, etc.
2. Acts. The acts are the alternative courses of action, or strategies, that are available to the decision maker.
3. Events. The events, also known as states of nature, are the occurrences that are not under the control of the decision maker and that determine the level of success of a given act.
4. Pay-off. Each combination of a course of action and an event is associated with a pay-off, a quantitative measure of the value of the outcome to the decision maker. It measures the net benefit to the decision maker from a given combination of course of action and event. The pay-off usually represents the net monetary gain (profit), but other measures can also be used; for example, a cost can be treated as a negative profit.
5. Pay-off table. Suppose the problem under consideration has m possible events (states of nature), denoted by $E_1, E_2, \dots, E_m$, and n alternative acts (strategies), denoted by $A_1, A_2, \dots, A_n$. Then the pay-off corresponding to strategy $A_j$ of the decision maker under the state of nature $E_i$ is denoted by $p_{ij}$ (i = 1, 2, ..., m; j = 1, 2, ..., n). The totality of mn pay-offs arranged in tabular form is known as the pay-off table.
where M1, M2, ...., Mm are the maximum of these quantities respectively.
3.3. TYPES OF DECISION MAKING ENVIRONMENT
Different rules for making a decision under uncertainty, where no probabilities can be attached to the events, are as follows :
1. Maximax or Minimin Criterion (Criterion of Optimism). This criterion
is based upon extreme optimism. The basic steps of this criterion are as follows :
(i) Determine the maximum possible pay-off for each alternative.
(ii) Choose that alternative which corresponds to the maximum of the above
maximum pay-offs.
In decision problems dealing with costs, the minimum for each alternative is
determined and then the alternative which minimizes the above minimum cost is
selected.
2. Maximin or Minimax Criterion (Criterion of Pessimism). This criterion
is based upon the conservative assumption that the worst possible outcome is going to
happen. The basic steps of this criterion are as follows :
(i) Determine the minimum possible pay-off for each alternative.
(ii) Choose that alternative which corresponds to the maximum of the above
minimum pay-offs.
In decision problems dealing with costs, the maximum for each alternative is
determined and then the alternative which minimizes the above maximum cost is
selected.
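The two rules above can be sketched in Python as follows. This is a minimal illustration. It assumes the pay-off table is held as a dict that maps each alternative to its list of pay-offs, one per event. The function names `maximax` and `maximin` are hypothetical. The `costs` flag switches to the minimin and minimax versions described in the text. The data are the profit pay-offs of Example 1 below.

```python
# Profit pay-off table of Example 1: {alternative: [pay-off under each event]}.
payoffs = {
    "A1": [16, 10, 12, 7],
    "A2": [13, 12, 9, 9],
    "A3": [11, 14, 15, 14],
}


def maximax(table, costs=False):
    """Optimism: take each row's best case, then the best of those.

    With costs=True the best case is the minimum cost (minimin criterion).
    """
    pick = min if costs else max
    best_case = {alt: pick(row) for alt, row in table.items()}
    return pick(best_case, key=best_case.get), best_case


def maximin(table, costs=False):
    """Pessimism: take each row's worst case, then the best of those.

    With costs=True the worst case is the maximum cost (minimax criterion).
    """
    worst = max if costs else min
    best = min if costs else max
    worst_case = {alt: worst(row) for alt, row in table.items()}
    return best(worst_case, key=worst_case.get), worst_case


print(maximax(payoffs))  # ('A1', {'A1': 16, 'A2': 13, 'A3': 15})
print(maximin(payoffs))  # ('A3', {'A1': 7, 'A2': 9, 'A3': 11})
```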
3. Laplace Criterion (Equally Likely Decisions). The Laplace criterion uses
all the information by assigning equal probabilities to all events of each alternative,
as there is no information about probabilities of occurrence. The basic steps of this
criterion are as follows :
(i) Assign equal probabilities 1/n to each pay-off of a strategy (having n pay-offs).
(ii) Determine the expected pay-off value for each alternative by multiplying
each pay-off by its probability and then adding.
(iii) Choose that alternative which corresponds to the maximum of the expected
pay-offs.
In decision problems dealing with costs, select that alternative which corresponds
to the minimum of the expected pay-offs.
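The Laplace steps can be sketched as below. This is a minimal sketch under the same dict-of-rows assumption. `Fraction` keeps 50/3 exact instead of rounding it. The function name `laplace` is hypothetical. The data are the shampoo pay-offs of Example 2 below.

```python
from fractions import Fraction

# Shampoo pay-offs (columns: sales of 15,000, 10,000 and 5,000 units).
payoffs = {
    "Egg": [30, 10, 10],
    "Clinic": [40, 15, 5],
    "Delux": [55, 20, 3],
}


def laplace(table, costs=False):
    """Equally likely criterion: weight each of the n pay-offs of a row by 1/n."""
    expected = {alt: Fraction(sum(row), len(row)) for alt, row in table.items()}
    pick = min if costs else max          # minimum expected cost when pay-offs are costs
    return pick(expected, key=expected.get), expected


choice, expected = laplace(payoffs)
print(choice)                                               # Delux
print({alt: round(float(v), 2) for alt, v in expected.items()})
# {'Egg': 16.67, 'Clinic': 20.0, 'Delux': 26.0}
```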
4. Savage Criterion (Criterion of Regret). The Savage criterion is based on
the concept of regret (or opportunity loss). This criterion is also known as the minimax regret
criterion. The basic steps of this criterion are as follows :
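(i) Construct the regret table, where
$$\text{regret (opportunity loss)} = \begin{cases} (\text{max pay-off}) - (\text{pay-off}), & \text{if the pay-offs represent profits},\\ (\text{pay-off}) - (\text{min pay-off}), & \text{if the pay-offs represent costs}, \end{cases}$$
the maximum (or minimum) being taken over all alternatives under the same event.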
(ii) Determine the maximum regret for each alternative.
(iii) Choose the alternative with minimum regret out of these maximum regrets.
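The regret steps can be sketched as follows. This is a minimal illustration, assuming a dict of rows in which position k of every row belongs to the same event. The function name `minimax_regret` is hypothetical. Applied to the shampoo pay-offs of Example 2 below, the sketch builds the regret table and picks the alternative with the smallest maximum regret.

```python
# Shampoo pay-offs (columns: sales of 15,000, 10,000 and 5,000 units).
payoffs = {
    "Egg": [30, 10, 10],
    "Clinic": [40, 15, 5],
    "Delux": [55, 20, 3],
}


def minimax_regret(table, costs=False):
    """Savage criterion: regret is measured against the best entry of each event column."""
    columns = list(zip(*table.values()))          # pay-offs grouped by event
    if costs:
        best = [min(col) for col in columns]      # lowest cost under each event
        regret = {a: [p - b for p, b in zip(row, best)] for a, row in table.items()}
    else:
        best = [max(col) for col in columns]      # highest profit under each event
        regret = {a: [b - p for p, b in zip(row, best)] for a, row in table.items()}
    worst_regret = {a: max(r) for a, r in regret.items()}
    return min(worst_regret, key=worst_regret.get), regret, worst_regret


choice, regret, worst = minimax_regret(payoffs)
print(regret)  # {'Egg': [25, 10, 0], 'Clinic': [15, 5, 5], 'Delux': [0, 0, 7]}
print(worst)   # {'Egg': 25, 'Clinic': 15, 'Delux': 7}
print(choice)  # Delux
```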
5. Hurwicz Criterion. The Hurwicz criterion is based on the concept that the
decision makers are neither completely pessimistic nor completely optimistic, but are
a combination of the two extremes. Therefore, we should give attention to both. The
basic steps of this criterion are as follows :
(i) Choose an appropriate degree of optimism (or pessimism) of the decision
maker. Let α (0 ≤ α ≤ 1) be his degree of optimism (so 1 − α is his degree of pessimism).
(ii) Determine the maximum as well as the minimum pay-off for each alternative
and obtain, for each alternative, the quantity D (decision index) as
$$D = \alpha \cdot (\text{maximum pay-off}) + (1 - \alpha) \cdot (\text{minimum pay-off})$$
(iii) Choose the maximum value of D when profits are given (choose the minimum
value of D when costs are given).
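For profit tables, the Hurwicz index can be sketched as below. This is a minimal sketch. The function name `hurwicz` is hypothetical. The sketch covers only profits, because the text does not spell out how α is applied to a cost table. The data and the weight α = 0.75 are taken from the Hurwicz computation in Example 4 below.

```python
# Profit pay-offs of Example 4 (columns: events E1 to E4).
payoffs = {
    "A1": [5, 10, 18, 25],
    "A2": [8, 7, 8, 23],
    "A3": [21, 18, 12, 21],
    "A4": [30, 22, 19, 15],
}


def hurwicz(table, alpha):
    """D = alpha * max + (1 - alpha) * min per alternative (profits only); pick the largest D."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    index = {alt: alpha * max(row) + (1 - alpha) * min(row) for alt, row in table.items()}
    return max(index, key=index.get), index


choice, index = hurwicz(payoffs, alpha=0.75)
print(index)   # {'A1': 20.0, 'A2': 19.0, 'A3': 18.75, 'A4': 26.25}
print(choice)  # A4
```

Setting α = 1 reproduces the maximax choice and α = 0 the maximin choice. These are the two extremes the criterion blends.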
SOLVED EXAMPLES
Example 1. Given the following profit pay-off table :
A1 16 10 12 7
A2 13 12 9 9
A3 11 14 15 14
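Solution.
(i) Maximin criterion :
Alternative    Pay-offs              Minimum pay-off
A1             16   10   12    7      7
A2             13   12    9    9      9
A3             11   14   15   14     11 (Max.)
The maximum of the minimum pay-offs corresponds to A3, so A3 is selected.
(ii) Maximax criterion :
Alternative    Pay-offs              Maximum pay-off
A1             16   10   12    7     16 (Max.)
A2             13   12    9    9     13
A3             11   14   15   14     15
The maximum of the maximum pay-offs corresponds to A1, so A1 is selected.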
Example 2. The research department of Hindustan Lever has recommended that
the marketing department launch a shampoo of three different types. The marketing
manager has to decide which type of shampoo to launch, given the following
estimated pay-offs for various levels of sales :
Types of shampoo      Estimated levels of sales (units)
                      15,000    10,000    5,000
Egg shampoo             30        10        10
Clinic shampoo          40        15         5
Delux shampoo           55        20         3
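Solution.
(i) Maximin criterion :
Shampoo    Pay-offs          Minimum pay-off
Egg        30   10   10      10 (Max.)
Clinic     40   15    5       5
Delux      55   20    3       3
Hence Egg shampoo should be launched.
(ii) Minimax criterion :
Shampoo    Pay-offs          Maximum pay-off
Egg        30   10   10      30 (Min.)
Clinic     40   15    5      40
Delux      55   20    3      55
Hence Egg shampoo should be launched.
(iii) Maximax criterion :
Shampoo    Pay-offs          Maximum pay-off
Egg        30   10   10      30
Clinic     40   15    5      40
Delux      55   20    3      55 (Max.)
Hence Delux shampoo should be launched.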
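(iv) Laplace criterion : Here, p = 1/3.
$$\text{EMV (Egg shampoo)} = \tfrac{1}{3}\times 30 + \tfrac{1}{3}\times 10 + \tfrac{1}{3}\times 10 = \tfrac{1}{3}(30 + 10 + 10) = \tfrac{50}{3} = 16.67$$
$$\text{EMV (Clinic shampoo)} = \tfrac{1}{3}(40 + 15 + 5) = \tfrac{60}{3} = 20$$
$$\text{EMV (Delux shampoo)} = \tfrac{1}{3}(55 + 20 + 3) = \tfrac{78}{3} = 26$$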
Since the EMV is maximum for Delux shampoo, Delux shampoo should be
launched.
(v) Minimax regret criterion :
Regret table
The farmer decides to plant only one crop, which would be his best crop using the
following :
(i) Maximax criterion (ii) Maximin criterion
(iii) Laplace criterion (iv) Minimax regret criterion.
Solution. (i) Maximax criterion :
Type of crop    Estimated conditional profit (rainfall)    Maximum of
                High      Medium     Low                   each crop
Crop A          6000      4000       2000                  6000
Crop B          3000      4500       5000                  5000
Crop C          7000      4000       5000                  7000 (Max.)
Hence crop C should be planted under the maximax criterion.
Regret table
Example 4. Consider the following pay-off (profit) matrix :
Alternative    Events
               E1    E2    E3    E4
A1              5    10    18    25
A2              8     7     8    23
A3             21    18    12    21
A4             30    22    19    15
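Hurwicz criterion (α = 0.75) :
Alternative   Maximum pay-off   Minimum pay-off   D = 0.75 × max + 0.25 × min
A1            25                 5                25 × 0.75 + 5 × 0.25 = 20
A2            23                 7                23 × 0.75 + 7 × 0.25 = 19
A3            21                12                21 × 0.75 + 12 × 0.25 = 18.75
A4            30                15                30 × 0.75 + 15 × 0.25 = 26.25
Since D is maximum for A4, A4 is selected under the Hurwicz criterion.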
Different criteria for making a decision under risk, where the probabilities of the events are known, are as follows :
1. Expected Monetary Value (EMV) Criterion. The expected monetary value
criterion seeks the maximization of expected profit or the minimization of expected
cost. The basic steps of this criterion are as follows :
(i) Construct the pay-off table listing all possible courses of actions and events
(states of nature), along with the corresponding event probabilities.
(ii) Determine the expected conditional profit values for each course of action.
(iii) Determine the EMV for each course of action (strategy) by
$$EMV(A_j) = p_{1j}\,P(E_1) + p_{2j}\,P(E_2) + \dots + p_{mj}\,P(E_m) \qquad (j = 1, 2, \dots, n)$$
(iv) Choose that course of action (strategy) having highest EMV.
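The EMV rule translates directly into code. This is a minimal sketch. It assumes the pay-off table is a list of rows, one per event Ei, whose columns are the acts Aj, so `payoff[i][j]` plays the role of pij. The data are the XYZ Ltd. cell-phone profits (₹ thousand) from the decision-tree examples below. The function name `emv` is hypothetical.

```python
acts = ["Manufacture", "Royalty", "Sell rights"]
probabilities = [0.25, 0.40, 0.35]           # P(High), P(Medium), P(Low)
payoff = [                                    # rows: events, columns: acts
    [200, 60, 50],    # High demand
    [50, 40, 50],     # Medium demand
    [-10, 20, 50],    # Low demand
]


def emv(payoff, probabilities):
    """EMV(A_j) = sum over i of p_ij * P(E_i), for every act j."""
    n_acts = len(payoff[0])
    return [sum(p_i * row[j] for p_i, row in zip(probabilities, payoff))
            for j in range(n_acts)]


values = emv(payoff, probabilities)
best = max(range(len(acts)), key=lambda j: values[j])
print([round(v, 2) for v in values])  # [66.5, 38.0, 50.0]
print(acts[best])                     # Manufacture
```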
2. Expected Opportunity Loss (EOL) Criterion. An alternative approach
to maximizing expected monetary value (EMV) is to minimize expected opportunity
loss (EOL). The basic steps of this criterion are as follows :
(i) Construct the opportunity loss table listing all possible courses of actions
and events (states of nature), along with the corresponding event probabilities.
(ii) Determine the conditional opportunity loss values for each event.
(iii) Determine the expected conditional opportunity loss values and sum these
values to get the expected opportunity loss (EOL) for each course of action by
$$EOL(A_j) = (M_1 - p_{1j})\,P(E_1) + (M_2 - p_{2j})\,P(E_2) + \dots + (M_m - p_{mj})\,P(E_m) \qquad (j = 1, 2, \dots, n)$$
(iv) Choose that course of action (strategy) having lowest EOL.
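A matching sketch for EOL uses the same layout (rows = events, columns = acts). Here `max(row)` is Mi, the best pay-off under event Ei. The data are again the XYZ Ltd. profits (₹ thousand). The function name `eol` is hypothetical.

```python
acts = ["Manufacture", "Royalty", "Sell rights"]
probabilities = [0.25, 0.40, 0.35]
payoff = [
    [200, 60, 50],    # High demand
    [50, 40, 50],     # Medium demand
    [-10, 20, 50],    # Low demand
]


def eol(payoff, probabilities):
    """EOL(A_j) = sum over i of (M_i - p_ij) * P(E_i), where M_i is the best pay-off under E_i."""
    best_per_event = [max(row) for row in payoff]          # M_1, ..., M_m
    n_acts = len(payoff[0])
    return [sum(p_i * (m_i - row[j])
                for p_i, m_i, row in zip(probabilities, best_per_event, payoff))
            for j in range(n_acts)]


losses = eol(payoff, probabilities)
print([round(v, 2) for v in losses])                          # [21.0, 49.5, 37.5]
print(acts[min(range(len(acts)), key=lambda j: losses[j])])   # Manufacture
```

The act with the lowest EOL is the same act that has the highest EMV. The two criteria always agree, because EOL(Aj) equals the EPPI minus EMV(Aj).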
3. Expected Value of Perfect Information (EVPI). The expected value with
perfect information is the expected or average return, in the long run, if we have perfect
information before a decision is made. The EVPI may be defined as the maximum
amount the decision maker should be willing to spend to get perfect (additional) information.
The expected pay-off under perfect information (EPPI) is calculated by multiplying the pay-off
of the best act under each state of nature by the probability of that state and summing
these products.
The expected value of perfect information (EVPI) is the expected outcome with
perfect information minus the expected outcome without perfect information (maximum
EMV) :
$$EVPI = EPPI - \max EMV$$
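The EVPI definition can be checked numerically with a short sketch. It assumes the same rows-are-events layout and reuses the XYZ Ltd. profits (₹ thousand) from the decision-tree examples below. The function name `evpi` is hypothetical.

```python
probabilities = [0.25, 0.40, 0.35]
payoff = [
    [200, 60, 50],    # High demand
    [50, 40, 50],     # Medium demand
    [-10, 20, 50],    # Low demand
]


def evpi(payoff, probabilities):
    """Return (EPPI, max EMV, EVPI) with EVPI = EPPI - max EMV."""
    # With perfect information the best act is chosen separately for every event.
    eppi = sum(p * max(row) for p, row in zip(probabilities, payoff))
    n_acts = len(payoff[0])
    emvs = [sum(p * row[j] for p, row in zip(probabilities, payoff)) for j in range(n_acts)]
    return eppi, max(emvs), eppi - max(emvs)


eppi, best_emv, value = evpi(payoff, probabilities)
print(round(eppi, 2), round(best_emv, 2), round(value, 2))   # 87.5 66.5 21.0
```

The EVPI of 21 equals the smallest expected opportunity loss of the same table. This is a general property of EVPI.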
SOLVED EXAMPLES
Example 1. A management is faced with the problem of choosing one of three
products for manufacturing. The potential demand for each product may turn out to be
good, moderate or poor. The probabilities for each of the states of nature were estimated
as follows :
The estimated profit or loss (in ₹) under the three states may be taken as :
Prepare the expected value table and advise the management about the choice of
the product.
Solution.
Since the EMV is maximum for product Y, Y should be selected as the best
product.
Example 2. Pay-offs of three acts X, Y, Z and the states of nature P, Q, R are as
follows :
State of nature    X     Y     Z
P 120 80 100
Q 200 400 300
R 260 260 600
The probabilities of the states of nature are 0.3, 0.5 and 0.2 respectively. Tabulate
the expected monetary values (EMVs) for the above data and state which can be selected
as the best act.
Solution.
[Table: conditional pay-offs (₹) of acts X, Y and Z under each state of nature, and the corresponding expected pay-offs (probability × pay-off)]
Since the EMV is maximum for act Y, Y should be selected as the best act.
Example 3. A newspaper distributor assigns probabilities to the demand for a
magazine as follows :
Copies demanded 1 2 3 4
A copy of the magazine sells for ₹ 7 and costs ₹ 6. What can be the maximum possible
expected monetary value (EMV) if the distributor can return unsold copies for ₹ 5 each ?
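Solution. Cost of a magazine = ₹ 6
Selling price of a magazine = ₹ 7
Profit per magazine = ₹ (7 − 6) = ₹ 1
Loss on each unsold magazine = ₹ (6 − 5) = ₹ 1
$$\text{Conditional profit} = \begin{cases} 1 \times S = S, & \text{if } D \ge S,\\ 1 \times D - 1 \times (S - D) = 2D - S, & \text{if } D < S, \end{cases}$$
where D = no. of copies demanded and S = no. of copies stocked.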
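The conditional profit rule (S if D ≥ S, otherwise 2D − S) can be written as a small function. The demand probabilities do not appear in the table above, so this sketch only builds the conditional pay-off table. Multiplying each row by its probability and summing down each column would then give the EMVs. The function and parameter names are hypothetical.

```python
def conditional_profit(demand, stock, margin=1, loss_per_unsold=1):
    """Profit (in rupees) from stocking `stock` copies when `demand` copies are wanted.

    margin          : profit on each copy sold (7 - 6 = 1 here)
    loss_per_unsold : cost minus return price of each unsold copy (6 - 5 = 1 here)
    """
    if demand >= stock:
        return margin * stock
    return margin * demand - loss_per_unsold * (stock - demand)


# Conditional pay-off table: one row per demand D, columns for stocks of 1 to 4 copies.
for d in range(1, 5):
    print(d, [conditional_profit(d, s) for s in range(1, 5)])
# 1 [1, 0, -1, -2]
# 2 [1, 2, 1, 0]
# 3 [1, 2, 3, 2]
# 4 [1, 2, 3, 4]
```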
The resulting pay-offs and corresponding expected pay-offs are as follows :
[Table: for each demand D (event) and its probability, the conditional pay-offs (₹) and expected pay-offs (₹ = probability × conditional pay-off) of stocking 1, 2, 3 or 4 copies (acts)]
Since the EMV is maximum for act (stock) 2, the optimum act for the
distributor would be to stock 2 copies of the magazine.
Example 4. The following pay-off table is given :
Event   Probability   Conditional pay-off        Expected pay-off
                      A1    A2    A3    A4       A1    A2    A3    A4
E1      0.20          40   200     0    50        8    40     0    10
E2      0.15         200     0   100   400       30     0    15    60
E3      0.40         200   200     0   100       80    80     0    40
E4      0.25         100     0   150     0       25     0  37.5     0
Since the EMV is maximum for act A2, A2 is the optimum act.
Computation of expected loss
Since the EOL is minimum for act A2, A2 is the optimum act.
Example 5. A grocery with a bakery department is faced with the problem of
how many cakes to buy in order to meet the day's demand. The grocer prefers not to sell
day-old goods in competition with fresh products ; leftover cakes are, therefore, a complete
loss. On the other hand, if a customer desires a cake and all of them have been sold, the
disappointed customer will buy elsewhere and the sale will be lost. The grocer has,
therefore, collected information on past sales over a selected 100-day period as follows :
Sales per day    No. of days    Probability
25               10             0.10
26               30             0.30
27               50             0.50
28               10             0.10
A cake costs ₹ 80 and sells for ₹ 100. Construct the pay-off table and the opportunity
loss table. What is the optimum number of cakes that should be bought each day ?
Solution. Let A1, A2, A3, A4 denote the alternative strategies (acts) of stocking 25, 26, 27
and 28 cakes, and E1, E2, E3, E4 the states of nature (events) of a daily demand of 25, 26, 27
and 28 cakes.
Here, cost of a cake = ₹ 80
Selling price of a cake = ₹ 100
Profit per cake sold = ₹ (100 − 80) = ₹ 20
Loss on each unsold cake = ₹ 80
$$\text{Conditional pay-off} = \begin{cases} 20S, & \text{if } D \ge S,\\ 20D - 80(S - D) = 100D - 80S, & \text{if } D < S, \end{cases}$$
where D = no. of cakes demanded
S = no. of cakes in stock
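The resulting pay-offs (conditional profits) are as follows :
Event (demand D)   Probability   Conditional pay-off (₹) for act (stock)
                                 A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25            0.10          500       420       340       260
E2 : 26            0.30          500       520       440       360
E3 : 27            0.50          500       520       540       460
E4 : 28            0.10          500       520       540       560
The corresponding expected pay-offs (probability × conditional pay-off, ₹) are :
Event (demand D)   Probability   A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25            0.10           50        42        34        26
E2 : 26            0.30          150       156       132       108
E3 : 27            0.50          250       260       270       230
E4 : 28            0.10           50        52        54        56
EMV                              500       510       490       420
Since the EMV is maximum for act A2, 26 cakes should be bought (stocked)
each day.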
The expected opportunity losses (probability × conditional opportunity loss, ₹) are computed as follows :
Event (demand D)   Probability   A1 : 25   A2 : 26   A3 : 27   A4 : 28
E1 : 25            0.10            0         8        16        24
E2 : 26            0.30            6         0        24        48
E3 : 27            0.50           20        10         0        40
E4 : 28            0.10            6         4         2         0
EOL                               32        22        42       112
Since the EOL is minimum for act A2, 26 cakes should be bought (stocked)
each day.
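Both tables for the cake problem can be regenerated from the conditional pay-off rule (20S if D ≥ S, otherwise 100D − 80S). This is a minimal sketch using only the figures given in the example. The helper name `payoff` is hypothetical.

```python
demand_probs = {25: 0.10, 26: 0.30, 27: 0.50, 28: 0.10}
stocks = [25, 26, 27, 28]
PROFIT_PER_SOLD, LOSS_PER_UNSOLD = 20, 80     # rupees: 100 - 80, and the full cost of 80


def payoff(d, s):
    """Conditional pay-off: 20S if D >= S, else 20D - 80(S - D) = 100D - 80S."""
    if d >= s:
        return PROFIT_PER_SOLD * s
    return PROFIT_PER_SOLD * d - LOSS_PER_UNSOLD * (s - d)


emv = {s: sum(p * payoff(d, s) for d, p in demand_probs.items()) for s in stocks}
# Opportunity loss: best pay-off for the realised demand minus the pay-off of the chosen stock.
eol = {s: sum(p * (max(payoff(d, t) for t in stocks) - payoff(d, s))
              for d, p in demand_probs.items()) for s in stocks}

print({s: round(v, 2) for s, v in emv.items()})  # {25: 500.0, 26: 510.0, 27: 490.0, 28: 420.0}
print({s: round(v, 2) for s, v in eol.items()})  # {25: 32.0, 26: 22.0, 27: 42.0, 28: 112.0}
print(max(emv, key=emv.get), min(eol, key=eol.get))  # 26 26
```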
Example 6. A wholesaler of sports goods has an opportunity to buy 5000 pairs of
gloves that have been declared surplus by the government. The wholesaler will pay ₹ 50
per pair and can obtain ₹ 100 a pair by selling gloves to retailers. The price is well
established, but the wholesaler is in doubt as to just how many pairs he will be able to sell.
Any gloves left over can be sold to discount outlets at ₹ 20 a pair. After a careful
consideration of the past data, the wholesaler assigns probabilities to the demand as
follows :
Retailers demand Probability
1000 pairs 0.6
3000 pairs 0.3
5000 pairs 0.1
(i) Compute the conditional monetary and expected monetary values.
(ii) Compute the expected profit with a perfect predicting device.
(iii) Compute the EVPI.
Solution. Cost per pair = ₹ 50
Selling price per pair = ₹ 100
Profit per pair = ₹ 50 (on each pair sold)
Disposal selling price = ₹ 20 (on each unsold pair)
Loss on each unsold pair = ₹ (50 − 20) = ₹ 30
$$\text{Conditional pay-off (profit)} = \begin{cases} 50S, & \text{if } D \ge S,\\ 50D - 30(S - D) = 80D - 30S, & \text{if } D < S, \end{cases}$$
where D = no. of pairs demanded
S = no. of pairs stocked
(i) The resulting conditional pay-offs and corresponding expected pay-offs are
computed as follows :
EMV (₹ 000) : 50 for a stock of 1000 pairs, 54 for 3000 pairs and 10 for 5000 pairs
(ii) The expected profit under perfect information (EPPI) is computed as follows :
Retailers'    Probability   Conditional pay-offs (₹ 000)     Under perfect information (₹ 000)
demand D                    Stock: 1000   3000   5000 pairs   Maximum pay-off   Expected pay-off
1000 pairs    0.6               50        −10    −70               50               30
3000 pairs    0.3               50        150     90              150               45
5000 pairs    0.1               50        150    250              250               25
EPPI = 100
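The glove figures follow from the rule "50S if D ≥ S, otherwise 80D − 30S". The short sketch below recomputes the EMVs and the EPPI, in rupees. It also computes the EVPI asked for in part (iii) as EPPI − max EMV. The helper name `payoff` is hypothetical.

```python
demand_probs = {1000: 0.6, 3000: 0.3, 5000: 0.1}
stocks = [1000, 3000, 5000]


def payoff(d, s):
    """Rupees: 50 profit per pair sold, 30 loss (50 - 20) per pair sold off at discount."""
    return 50 * s if d >= s else 50 * d - 30 * (s - d)


emv = {s: sum(p * payoff(d, s) for d, p in demand_probs.items()) for s in stocks}
# Under perfect information the best stock is chosen for each demand separately.
eppi = sum(p * max(payoff(d, s) for s in stocks) for d, p in demand_probs.items())
evpi = eppi - max(emv.values())

print({s: round(v / 1000, 2) for s, v in emv.items()})  # {1000: 50.0, 3000: 54.0, 5000: 10.0}
print(round(eppi / 1000, 2), round(evpi / 1000, 2))   # 100.0 46.0
```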
Pay-off (₹)
State of nature    Acts
                   A1     A2     A3
X                  20     50    200
Y                 200    100     50
Z                 400    600    300
The probabilities of the state of nature are 0.3, 0.4 and 0.3 respectively. Calculate
the EMV for the given data and select the best act. Also find the expected value of
perfect information (EVPI).
Solution. Computation of expected pay-off
State of nature   Probability   Conditional pay-off      Expected pay-off
                                A1     A2     A3         A1     A2     A3
X                 0.3           20     50    200          6     15     60
Y                 0.4          200    100     50         80     40     20
Z                 0.3          400    600    300        120    180     90
Since the EMV is maximum for act A1, A1 is the best act.
The expected profit under perfect information (EPPI) is computed as follows :
EPPI = 320
SOLVED EXAMPLES
Example 1. XYZ Ltd. has invented a picture cell phone. It is faced with selecting
one alternative out of the following strategies :
(i) Manufacture the cell phone
(ii) Take royalty from another manufacturer
(iii) Sell the rights for the invention and take a lump-sum amount.
The profit (in thousands of rupees) that can be earned and the probability associated
with each alternative are shown in the following table :
Alternative      High demand (0.25)   Medium demand (0.40)   Low demand (0.35)
Manufacture            200                    50                    –10
Royalty                 60                    40                     20
Sell rights             50                    50                     50
Represent the company's problem in the form of a decision tree and suggest
what decision the company should take to maximize profit.
Solution.
[Decision tree: a decision node with branches Manufacture, Royalty and Sell rights, each ending in a chance node with High (0.25), Medium (0.40) and Low (0.35) demand.
Manufacture : 200, 50, –10 (₹ thousand); EMV = ₹ 66,500
Royalty : 60, 40, 20 (₹ thousand); EMV = ₹ 38,000
Sell rights : 50, 50, 50 (₹ thousand); EMV = ₹ 50,000]
Thus, the EMV for strategy (i), manufacturing the cell phone, is maximum. So the best
decision for XYZ Ltd. is to manufacture the picture cell phone itself, for an expected profit
of ₹ 66,500.
Example 2. A company is evaluating four alternative single-period investment
opportunities whose returns are based on the state of the economy. The possible states of
the economy and the associated probability distribution are as follows :
State of economy    Fair    Good    Great
Probability          0.2     0.5     0.3
The returns for each investment opportunity and each state of the economy are as
follows :
               State of economy
Alternative    Fair (₹)    Good (₹)    Great (₹)
A               1,000       3,000       6,000
B                 500       4,500       6,800
C                   0       5,000       8,000
D              –4,000       6,000       8,500
Using the decision tree approach, determine the expected return for each
alternative. Which alternative investment proposal would you recommend if the expected
monetary value criterion is to be employed ?
Solution.
[Decision tree: a decision node with branches A, B, C and D, each ending in a chance node with Fair (0.2), Good (0.5) and Great (0.3) states of the economy.
A : ₹ 1,000, 3,000, 6,000; EMV = ₹ 3,500
B : ₹ 500, 4,500, 6,800; EMV = ₹ 4,390
C : ₹ 0, 5,000, 8,000; EMV = ₹ 4,900
D : ₹ –4,000, 6,000, 8,500; EMV = ₹ 4,750]
Thus, the EMV for C is maximum, so alternative C is the best, with a maximum expected return
of ₹ 4,900.
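Rolling back a one-stage tree like this one takes two operations. A chance node takes the probability-weighted sum of its branches, and the decision node keeps its best branch. This is a minimal sketch with the figures of the example. The names `tree` and `chance_value` are hypothetical.

```python
# Chance node for each alternative: (return in rupees, probability) for Fair, Good, Great.
tree = {
    "A": [(1000, 0.2), (3000, 0.5), (6000, 0.3)],
    "B": [(500, 0.2), (4500, 0.5), (6800, 0.3)],
    "C": [(0, 0.2), (5000, 0.5), (8000, 0.3)],
    "D": [(-4000, 0.2), (6000, 0.5), (8500, 0.3)],
}


def chance_value(branches):
    """Value of a chance node = probability-weighted sum of its branch values."""
    return sum(value * prob for value, prob in branches)


emv = {alt: chance_value(branches) for alt, branches in tree.items()}
best = max(emv, key=emv.get)          # the decision node keeps its best branch
print({alt: round(v) for alt, v in emv.items()})  # {'A': 3500, 'B': 4390, 'C': 4900, 'D': 4750}
print(best)                                      # C
```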
Example 3. A manager has a choice between (i) a risky contract promising ₹ 7
lakhs with probability 0.6 and ₹ 4 lakhs with probability 0.4, and (ii) a diversified portfolio
consisting of two contracts with independent outcomes, each promising ₹ 3.5 lakhs with
probability 0.6 and ₹ 2 lakhs with probability 0.4. Construct a decision tree and
suggest which choice the manager should opt for using the EMV criterion.
Solution.
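[Decision tree (partial): for the diversified portfolio, chance nodes 3 and 4 each represent one contract, with outcomes ₹ 3.5 lakhs (0.6) and ₹ 2 lakhs (0.4) and an EMV of ₹ 2.9 lakhs each.]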
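The EMVs of the two choices can be checked with a short sketch. By linearity of expectation the portfolio EMV is simply 2.9 + 2.9 lakhs. The sketch still enumerates the four joint outcomes so that the independence assumption (joint probability = p1 × p2) is visible. The variable names are hypothetical.

```python
from itertools import product

contract = [(3.5, 0.6), (2.0, 0.4)]     # (₹ lakhs, probability) for one small contract

# (i) Risky contract: a single chance node.
risky_emv = 7 * 0.6 + 4 * 0.4

# (ii) Portfolio: the two contracts are independent, so each joint outcome has
# probability p1 * p2 and pays the sum of the two amounts.
portfolio_emv = sum((v1 + v2) * (p1 * p2)
                    for (v1, p1), (v2, p2) in product(contract, repeat=2))

print(round(risky_emv, 2), round(portfolio_emv, 2))   # 5.8 5.8
```

On EMV alone the two choices come out equal, at ₹ 5.8 lakhs each. The portfolio's outcomes are less spread out, and the EMV criterion does not capture this.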
Example 4. A shopkeeper has the facility to store a large number of perishable
items. He buys them at ₹ 3 per item and sells them at ₹ 5 per item. If an
item is not sold by the end of the day, there is a loss of ₹ 3 per item. The daily
demand for the item has the following probability distribution :
Number of items sold 4 5 6
How many items should he store so that his daily expected profit is maximum ?
Use decision tree approach.
Solution. Profit per item = ₹ (5 − 3) = ₹ 2
Thus, the EMV for the second strategy is maximum, so the shopkeeper should stock 5 items.
Example 5. Matrix company is planning to launch a new product, which can be
introduced initially in Western India or in the entire country. If the product is introduced
only in Western India, the investment outlay will be ₹ 12 million. After two years, Matrix
can evaluate the project to determine whether it should cover the entire country. For
such expansion it will have to incur an additional investment of ₹ 10 million. To introduce
the product in the entire country right at the beginning would involve an outlay of ₹ 20
million. The product, in any case, will have a life of 5 years, after which the plant will
have zero net value.
If the product is introduced only in Western India, demand would be high or low
with probabilities of 0.8 and 0.2 respectively, and annual cash flows of ₹ 4 million
and ₹ 2.5 million respectively.
If the product is introduced in the entire country right at the beginning, the demand
would be high or low with probabilities of 0.6 and 0.4 respectively, and annual cash
inflows of ₹ 8 million and ₹ 5 million respectively.
Based on the observed demand in Western India, if the product is introduced in
the entire country the following probabilities would exist for high and low demand on an
all India basis :
The hurdle rate applicable to this project is 12 percent.
(i) Set up a decision tree for the investment situation.
(ii) Advise Matrix company on the investment policy it should follow.
Support your advice with appropriate reasoning.
Solution.
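[Decision tree: decision node D1 chooses between "Western India" (chance node 1, EMV = 7.60) and "Entire country" (chance node 2, EMV = 14). At chance node 1, high demand (0.8) leads to decision node D3 (expansion, i.e. chance node 3 with high demand 0.9 and low demand 0.1, or no expansion), and low demand (0.2) leads to decision node D2 (expansion, i.e. chance node 4 with high demand 0.4 and low demand 0.6, or no expansion). At chance node 2, demand is high (0.6) or low (0.4).]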
(Demand high in Western India)
(i) Expansion :   High demand   0.9 × 8 = 7.2
                  Low demand    0.1 × 5 = 0.5
                  Expected annual cash flow (D3) = 7.7
                  7.7 × 3 years = 23.1
                  Less cost = 10.0
                  Total = 13.1
(ii) No expansion : 0
Total expected profit = 13.1
(Demand low in Western India)
(i) Expansion :   High demand   0.4 × 8 = 3.2
                  Low demand    0.6 × 5 = 3.0
                  Expected annual cash flow (D2) = 6.2
                  6.2 × 3 years = 18.6
                  Less cost = 10.0
                  Total = 8.6
(ii) No expansion : 0
Total expected profit = 8.6
Thus, the EMV at node 2 is maximum, so the decision is to launch the product in the
entire country.
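The roll-back (fold-back) of this two-stage tree can be sketched as follows. All amounts are in ₹ million. As in the worked figures above, the cash flows are not discounted at the 12 percent hurdle rate. Chance node 1 is treated as two years of Western India cash flow followed by the D3 or D2 value, less the ₹ 12 million outlay. This structure is an assumption, but it reproduces the EMV of 7.60 shown in the tree. The "no expansion" branches are valued at 0, as in the worked solution. Valuing them instead as three more years of regional cash flow (12 and 7.5) would still leave expansion preferred at both D3 and D2. The helper name `chance` is hypothetical.

```python
def chance(branches):
    """Expected value of a chance node given (probability, value) pairs."""
    return sum(p * v for p, v in branches)


# Decision nodes after two years in Western India (3 years of product life remain).
expand_if_high = 3 * chance([(0.9, 8), (0.1, 5)]) - 10     # 7.7 * 3 - 10 = 13.1
expand_if_low = 3 * chance([(0.4, 8), (0.6, 5)]) - 10      # 6.2 * 3 - 10 = 8.6
no_expansion = 0                                            # as in the worked solution
d3 = max(expand_if_high, no_expansion)
d2 = max(expand_if_low, no_expansion)

# Chance node 1: Western India first (2 years of regional cash flow, outlay 12).
node1 = chance([(0.8, 2 * 4 + d3), (0.2, 2 * 2.5 + d2)]) - 12
# Chance node 2: entire country from the start (5 years, outlay 20).
node2 = 5 * chance([(0.6, 8), (0.4, 5)]) - 20

print(round(d3, 2), round(d2, 2))        # 13.1 8.6
print(round(node1, 2), round(node2, 2))  # 7.6 14.0
print("Entire country" if node2 > node1 else "Western India")  # Entire country
```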
EXERCISE 3.1
1. The ABC company is faced with four decision alternatives relating to investments in a
capital expansion programme. Since these investments are made in future, the company
foresees different market conditions as expressed in the form of states of nature. The
following table summarizes the decision alternatives, the various states of nature and
the rate of return associated with each state of nature :
If the company has no information regarding the probability of occurrence of the three
states of nature, give the recommended decision for the decision criterion as follows :
(i) Maximax criterion (ii) Maximin criterion
(iii) Minimax regret criterion (iv) Laplace criterion
2. A food products company is contemplating the introduction of a revolutionary new product
with new packaging to replace the existing product at a much higher price (A1), or a moderate
change in the composition of the existing product with new packaging at a small increase
in price (A2), or a small change in the composition of the existing product, except the word 'New',
with a negligible increase in price (A3). The three possible states of nature or events are
(i) high increase in sales (S1), (ii) no change in sales (S2) and (iii) decrease in sales (S3).
The marketing department of the company worked out the pay-offs in terms of yearly
net profits for each of the strategies under the three events (expected sales). This is represented
in the following table :
A1 1 3 8 5
A2 2 5 4 7
A3 4 6 6 3
A4 6 8 3 5
5. Pay-offs (in ₹) of three acts A1, A2 and A3 and the possible states of nature S1, S2 and S3
are as follows :
S1 20 50 200
S2 200 100 50
S3 400 600 300
The probabilities of the states of nature are 0.3, 0.4 and 0.3 respectively. Tabulate the
expected monetary values (EMVs) for the above data and state which can be selected as
the best act.
6. An investor is given the following investment alternatives and percentage rates of return:
Over the past 300 days, 150 days have had medium market conditions and 60 days
have had high market conditions.
On the basis of these data, state the optimum investment strategy for the investor.
7. Pay-offs (₹) of three acts A1, A2, A3 and the states of nature S1, S2 and S3 are as follows :
S1 25 10 125
S2 400 440 400
S3 650 740 750
The probabilities of the states of nature are 0.1, 0.7 and 0.2 respectively. Tabulate the
expected monetary values (EMVs) and state which can be selected as the best act.
8. XYZ flower shop promises its customers delivery within four hours on all flower orders.
All flowers are purchased on the prior day and delivered to XYZ by 8:00 the next morning.
XYZ's daily demand for roses is as follows :
Dozens of roses 7 8 9 10
XYZ purchases roses for ₹ 10.00 per dozen and sells them for ₹ 30.00. All unsold roses
are donated to a local hospital. How many dozens of roses should XYZ order each evening
to maximize its profit ? What is the optimum expected profit ?
9. A producer of boats has estimated the following distribution of demand for a particular
kind of boat :
No. demanded 0 1 2 3 4 5 6
Each boat costs him ₹ 7,000 and he sells them for ₹ 10,000 each. Any boats that are left
unsold at the end of the season must be disposed of for ₹ 6,000 each. How many boats
should be in stock so as to maximize his expected profit ?
10. Consider the following pay-off table.
A1 18 10 12 8
A2 16 12 10 10
A3 12 13 11 12
The probabilities of events E1, E2, E3 and E4 are 0.25, 0.40, 0.15 and 0.20 respectively.
Find the optimum act using expected opportunity loss (EOL) criterion.
11. A man has the choice of running either a hot-snack stall or an ice-cream stall at a seaside
resort during the summer season. If it is a fairly cool summer, he should make ₹ 5,000 by
running the hot-snack stall, but if the summer is quite hot he can only expect to make
₹ 1000. On the other hand, if he operates the ice-cream stall, his profit is estimated at
₹ 6500 if the summer is hot, but only ₹ 1000 if it is cool. There is a 40% chance of the
summer being hot. Should he opt for running the hot-snack stall or the ice-cream stall ?
Give mathematical argument.
12. The cost of making an item is ₹ 25 and its selling price is ₹ 30 if it is sold within
a week; it could be disposed of at ₹ 20 per piece at the end of the week if unsold. The frequency
of weekly sales is given as :
Weekly sales   (≤ 3)   4    5    6    7   (≥ 8)
No. of weeks     0    10   20   40   30     0
Find the optimum number of items per week the industry should make using the EMV and
EOL criteria. Also find the EVPI.
13. A company wants to know whether or not a new shaving cream should be marketed. The
present value of all future profits for the success of the cream is ₹ 10,00,000 and its
failure would result in a net loss of ₹ 5,00,000.
Not marketing it would not change the profits. The chances of the success of the new
cream are 50%. Determine the optimum act and find the EVPI.
14. A modern home appliances dealer finds that the cost of holding a mini-cooking range in
stock for a month is ₹ 200 (insurance, minor deterioration, interest on borrowed capital,
etc.). A customer who cannot obtain a cooking range immediately tends to go to other
dealers, and he estimates that for every customer who cannot get immediate delivery, he
loses an average of ₹ 500. The probabilities of a demand of 0, 1, 2, 3, 4 and 5 cooking
ranges in a month are 0.05, 0.10, 0.20, 0.30, 0.20 and 0.15 respectively. Determine the
optimum stock level of cooking ranges. Also find the EVPI.
15. A manufacturer of leather goods must decide whether to expand his plant capacity now
or wait at least another year. His advisors tell him that if he expands now and economic
conditions remain good, there will be a profit of ₹ 1,64,000 during the next year. If he
expands now and there is a recession, there will be a loss of ₹ 40,000. If he waits at least
another year and economic conditions remain good, there will be a profit of ₹ 80,000, and
if he waits at least another year and there is a recession, there will be a small profit of
₹ 8,000. What should the manufacturer decide to do if he wants to minimize the expected
loss during the next year and he feels that the odds are 2 : 1 that there will be a recession ? Use
the decision tree approach.
16. XYZ Ltd. wants to update/change its existing manufacturing process for product A. It
wants to strengthen its research and development cell and conduct research to find
a better process of manufacturing, which can get it higher profits. At present the
company is earning a profit of ₹ 20,000 after paying for material, labour and overheads.
XYZ Ltd. has the following four alternatives :
(i) The company continues with the existing process.
(ii) The company conducts research P, which costs ₹ 20,000, has a 75% probability of success
and can get a profit of ₹ 5,000.
(iii) The company conducts research Q, which costs ₹ 10,000, has a 50% probability of success
and can get a profit of ₹ 25,000.
(iv) The company pays ₹ 10,000 as royalty for a new product and can get a profit of ₹ 20,000.
The company can carry out only one of the two types of research, P and Q, because
of certain limitations. Draw a decision tree diagram and find the best strategy for
XYZ Ltd.
17. The investment staff of a bank is considering four investment proposals for clients : shares,
bonds, real estate and saving certificates. These investments will be held for one year.
The past data regarding the four proposals are given as follows :
Shares. There is a 25% chance that shares will decline by 10%, a 30% chance that they will
remain stable and a 45% chance that they will increase in value by 15%. Also, the shares
under consideration do not pay any dividends.
Bonds. These bonds stand a 40% chance of increasing in value by 5% and a 60% chance of
remaining stable, and they yield 12%.
Real Estate. This proposal has a 20% chance of increasing 30% in value, a 25% chance
of increasing 20% in value, a 40% chance of increasing 10% in value, a 10% chance of
remaining stable and a 5% chance of losing 5% of its value.
Saving Certificates. These certificates will yield 8.5% with certainty.
Use a decision tree to structure the alternatives available to the investment staff, and
using the expected monetary value criterion, choose the alternative with the highest
expected value.
18. A manufacturing company has to select one of the two products A or B for manufacturing.
Product A requires an investment of ₹ 20,000 and product B ₹ 40,000. A market research
survey shows high, medium and low demands, with corresponding probabilities and returns
from sales (in ₹ thousand) for the two products, in the following table :
Construct an appropriate decision tree. What decision should the company take ?
Answers
1. (i) D3 (ii) D4 (iii) D3 (iv) D3     2. (i) A3 (ii) A1 (iii) A1 (iv) A1
3. (i) Savings (ii) Stock (iii) Bonds or Savings (iv) Bonds
4. (i) A3 (ii) A1 (iii) A3     5. A1     6. Property     7. A2
8. 9 dozen, ₹ 168     9. 3 boats     10. A2
11. Hot-snack stall     12. 6 items, ₹ 3.50     13. Market cream, ₹ 2,50,000
14. 4 cooking ranges, ₹ 315     15. Wait for one year
16. Conduct research P to find a new process     17. Invest in real estate     18. Product B
STRUCTURE
Sampling
Types of Sampling
Use of Random Numbers
Parameter and Statistic
Sampling Distribution of Mean
Sampling Distribution of Sample Variance
Sampling Distribution of Sample Proportion
Estimation
Point Estimation
Interval Estimation
Bayesian Estimation
4.1. SAMPLING
Sampling means the selection of a part of the aggregate with a view to drawing
some statistical information about the whole. This aggregate of the investigation is
called the population and the selected part is called a sample. A population is finite or infinite
according as its size, i.e., its number of members, is finite or infinite.
The main objective of sampling is to obtain the maximum information about the
population. The analysis of the sample is done to obtain an idea of the probability
distribution of the variable in the population.
Even when a proper sampling procedure is applied, the sample may not represent
the characteristics of the population exactly. This discrepancy is called sampling
error.
There are different sampling methods. We describe below some important types
of sampling.
(a) Simple random sampling. In this type of sampling every unit of the
population has an equal chance of being selected in a sample. There are two ways of
drawing a simple random sample : With Replacement (WR) and Without Replacement
(WOR).
In WR type, the drawn unit of the population is returned to the population,
so that the size of the population remains the same before each drawing. In WOR type, the
drawn unit of the population is not returned to the population; for a finite population,
the size diminishes as the sampling process continues.
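The two ways of drawing a simple random sample map directly onto Python's standard `random` module. `choices` draws with replacement and `sample` draws without replacement. The population of 20 labelled units and the seed are hypothetical, chosen only for illustration.

```python
import random

population = list(range(1, 21))   # hypothetical population of 20 unit labels
rng = random.Random(42)           # fixed seed so the draw can be repeated

# With replacement (WR): every draw is made from the full population, so repeats can occur.
wr_sample = rng.choices(population, k=5)
# Without replacement (WOR): a drawn unit is not returned, so all selected units are distinct.
wor_sample = rng.sample(population, k=5)

print(wr_sample, wor_sample)
```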
(b) Systematic sampling. In systematic sampling one unit is chosen at random
from the population and the remaining units are selected at regular, predetermined intervals.
This method compares well with simple random sampling, provided there is no
deliberate attempt to change the sequence of the units in the population.
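One common way to carry out systematic sampling is sketched below. It assumes the random start is drawn from the first k = N // n units, with every k-th unit taken after that. The population of 100 numbered units, the seed and the function name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(units, n, rng=random):
    """Random start among the first k = N // n units, then every k-th unit after it."""
    k = len(units) // n                 # sampling interval
    start = rng.randrange(k)            # random start in 0 .. k-1
    return [units[start + i * k] for i in range(n)]


units = list(range(1, 101))             # hypothetical population of 100 numbered units
print(systematic_sample(units, 10, random.Random(7)))
```

Because k × n ≤ N, the last index start + (n − 1)k never exceeds N − 1.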
(c) Cluster sampling. When the population consists of certain groups, or clusters,
of units, it may be advantageous and economical to select a few clusters of units and
then examine all the units in the selected clusters. For example, when goods are packed
in cartons and repacking is costly, it is advisable to select only a few cartons
and inspect all the goods inside them.
(d) Two-stage sampling. When the population consists of a large number of
groups, each consisting of a number of items, it may not be economical to select a few
groups and inspect all the items in them. In this case, the sample is selected in
two stages. In the first stage, the desired number of groups (primary units) is selected
at random, and in the second stage, the required number of items is chosen at random
from the selected primary units.
(e) Stratified sampling. Here the population is subdivided into several parts, called strata, such that within each stratum the heterogeneity of the items is not so prominent, and then a sub-sample is selected from each stratum. All the sub-samples combined together give the stratified sample. This sampling is useful when the population is heterogeneous.
Any statistical measure relating to the population which is based on all units of the population is called a parameter, e.g., the population mean ($\mu$), population S.D. ($\sigma$), moments $\mu_r$, $\mu'_r$, etc.
Any statistical measure relating to the sample which is based on all units of the sample is called a statistic, e.g., the sample mean ($\bar{x}$), sample variance, moments $m_r$, $m'_r$, etc. The value of a statistic therefore varies from sample to sample; this variation is called sampling fluctuation. A parameter has no such fluctuation: it is a constant.
The probability distribution of a statistic is called its sampling distribution. The standard deviation (S.D.) of the sampling distribution is called the standard error of the statistic.
Example 1. For a population of five units, the values of a characteristic x are given below:
8, 2, 6, 4 and 10.
Consider all possible samples of size 2 from the above population and show that the mean of the sample means is exactly equal to the population mean.
Solution. The population mean is $\mu = \frac{30}{5} = 6$.
Random samples of size two (without replacement):

Serial no.   Sample values   Sample mean   Serial no.   Sample values   Sample mean
1            8, 2            5             6            2, 4            3
2            8, 6            7             7            2, 10           6
3            8, 4            6             8            6, 4            5
4            8, 10           9             9            6, 10           8
5            2, 6            4             10           4, 10           7
Total                        31            Total                        29

Mean of sample means $= \frac{31 + 29}{10} = \frac{60}{10} = 6$, which is equal to the population mean.
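The enumeration in Example 1 can be checked with a short Python sketch using only the standard library. `itertools.combinations` produces the ten unordered samples in the same order as the table. The last line also compares the variance of the sample means with $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, the finite-population result stated in Case I below.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [8, 2, 6, 4, 10]
N, n = len(population), 2
mu = mean(population)                          # population mean

# All C(5, 2) = 10 samples of size 2 drawn without replacement
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

print(len(samples), mu, mean(sample_means))    # mean of sample means equals mu

# Variance of the sample means versus (sigma^2 / n) * (N - n) / (N - 1)
print(pvariance(sample_means), pvariance(population) / n * (N - n) / (N - 1))
```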
Case I : $\sigma$ Known
Consider a population having mean $\mu$ and variance $\sigma^2$. If a random sample of size n is taken from this population, then the sample mean $\bar{X}$ is a random variable whose distribution has the mean $\mu$.
If the population is infinite, the variance of this distribution is $\sigma^2/n$ and the standard error is
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}.$$
If the population is finite, of size N, and the sample is drawn without replacement, the variance of this distribution is
$$\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$
and the standard error is
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
The factor $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor.
Let us consider the standardized sample mean
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
Then we have the central limit theorem as follows:
If $\bar{X}$ is the mean of a random sample of size n taken from a population whose mean is $\mu$ and variance is $\sigma^2$, then the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
approaches the standard normal distribution N(0, 1) as $n \to \infty$.
If the samples come from a normal population, the sampling distribution of the mean is normal regardless of the size of the sample.
If the population is not normal, the sampling distribution of the mean is approximately normal when the sample is large (say $n \ge 25$).
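Example 2. A random sample of size 100 is taken from an infinite population having the mean $\mu = 66$ and the variance $\sigma^2 = 225$. What is the probability of getting an $\bar{x}$ between 64 and 68?
Solution. Let $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, with n = 100, $\mu = 66$, $\sigma = 15$, so that $\sigma/\sqrt{n} = 1.5$.
Required probability $= P[64 < \bar{x} < 68]$
$= P[-1.33 < Z < 1.33]$
= 2 × (area under the standard normal curve from z = 0 to z = 1.33) = 2 (0.4082)
= 0.8164.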
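Example 2 can be reproduced with `statistics.NormalDist` from the Python standard library, which gives normal probabilities directly instead of from a table. With the exact z = 4/3 the result is about 0.8176; rounding z to 1.33, as a printed table does, recovers the text's 0.8164.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 66, 15, 100
se = sigma / sqrt(n)                  # standard error of the mean = 1.5
xbar_dist = NormalDist(mu, se)        # sampling distribution of the mean

p = xbar_dist.cdf(68) - xbar_dist.cdf(64)
print(round(p, 4))                    # about 0.8176 (exact z = 4/3)

# Rounding z to 1.33, as when reading a table, reproduces the text's value
z = NormalDist()
p_table = z.cdf(1.33) - z.cdf(-1.33)
print(round(p_table, 4))
```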
Example 3. A random sample of size 5 is drawn without replacement from a finite population consisting of 35 units. If the population standard deviation is 2.25, what is the standard error of the sample mean?
Solution. Here, n = 5, N = 35, $\sigma = 2.25$.
$$\text{S.E. of sample mean} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{2.25}{\sqrt{5}}\sqrt{\frac{30}{34}} \approx 0.95.$$
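A small helper covering both standard-error formulas of Case I, written as an illustrative sketch (the function name `se_mean` is ours, not the text's). Leaving `N` unset gives the infinite-population formula; passing `N` applies the finite population correction factor for sampling without replacement.

```python
from math import sqrt

def se_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    N=None  -> infinite population: sigma / sqrt(n)
    N given -> finite population of size N, sampling without replacement:
               sigma / sqrt(n) * sqrt((N - n) / (N - 1))
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

print(round(se_mean(2.25, 5, N=35), 2))   # Example 3: 0.95
print(se_mean(15, 100))                   # Example 2: 1.5
```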
Case II : $\sigma$ Unknown
For small samples, we assume that the population is normal and replace $\sigma$ by the sample standard deviation S. Then
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}, \quad\text{where } S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2,$$
is a random variable having the t distribution with $\nu = n - 1$ degrees of freedom.
4.6. SAMPLING DISTRIBUTION OF SAMPLE VARIANCE
Like the sample mean, the sample variance calculated for each sample drawn from a population is also a random variable. We have the following result:
If a random sample of size n with sample variance $S^2$ is taken from a normal population having the variance $\sigma^2$, then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}, \quad\text{where } S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2,$$
is a random variable having the chi-square distribution with $\nu = n - 1$ degrees of freedom.
(In the chi-square distribution table, $\chi^2_\alpha$ denotes the value for which the area under the chi-square distribution to its right is equal to $\alpha$.)
If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes $n_1$ and $n_2$ respectively, taken from two normal populations having the same variance, then
$$F = \frac{S_1^2}{S_2^2}$$
is a random variable having the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
Example 4. If two independent random samples of sizes $n_1 = 9$ and $n_2 = 16$ are taken from the same normal population, what is the probability that the variance of the first sample will be at least four times as large as that of the second sample?
Solution. Here $\nu_1 = 9 - 1 = 8$, $\nu_2 = 16 - 1 = 15$, and we require $P(S_1^2 \ge 4S_2^2) = P(F \ge 4)$.
From the F distribution table we find that $F_{0.01} = 4.00$ for $\nu_1 = 8$ and $\nu_2 = 15$.
Thus, the desired probability is 0.01.
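The table value in Example 4 can be checked by simulation. The sketch below is a Monte Carlo experiment (not the table method of the text): it repeatedly draws samples of sizes 9 and 16 from a standard normal population (the ratio $S_1^2/S_2^2$ does not depend on the common mean or variance) and counts how often $S_1^2 \ge 4S_2^2$. The number of trials and the seed are arbitrary choices.

```python
import random

def sample_var(xs):
    """S^2 with divisor (n - 1), as in the text."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def prob_ratio_at_least(n1, n2, c, trials=50_000, seed=0):
    """Monte Carlo estimate of P(S1^2 >= c * S2^2) for independent samples of
    sizes n1 and n2 from the same normal population (standard normal here)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s1 = sample_var([rng.gauss(0.0, 1.0) for _ in range(n1)])
        s2 = sample_var([rng.gauss(0.0, 1.0) for _ in range(n2)])
        if s1 >= c * s2:
            hits += 1
    return hits / trials

p_hat = prob_ratio_at_least(9, 16, 4.0)
print(p_hat)   # should come out close to the tabulated 0.01
```

The estimate carries simulation error of roughly $\sqrt{0.01 \times 0.99/50000} \approx 0.0004$, so it agrees with the table only to about three decimal places.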
The S.D. (standard error) of the sampling distribution of the sample proportion is $\sqrt{PQ/n}$, where P is the population proportion, Q = 1 − P, and the sample size n is sufficiently large. If the random sample is drawn from a finite population without replacement, we have to multiply the S.D. formula by the correction factor $\sqrt{\frac{N-n}{N-1}}$.
If $p_1$ and $p_2$ denote the proportions from independent samples of sizes $n_1$ and $n_2$ drawn from two populations with proportions $P_1$ and $P_2$ respectively, then
$$\text{S.E. of } (p_1 - p_2) = \sqrt{\frac{P_1Q_1}{n_1} + \frac{P_2Q_2}{n_2}},$$
where $P_1 + Q_1 = 1$ and $P_2 + Q_2 = 1$.
Example 5. It has been found that 3% of the tools produced by a certain machine are defective. What is the probability that in a shipment of 450 such tools, 2% or more will be defective?
Solution. Since the sample size n = 450 is large, the sample proportion p is approximately normally distributed with mean P = 3% = 0.03 and
$$\text{S.D.} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{(0.03)(0.97)}{450}} \approx 0.008.$$
Required probability $= P[p \ge 0.02] = P\left[Z \ge \frac{0.02 - 0.03}{0.008}\right] = P[Z \ge -1.25]$
= 0.5 + (area under the standard normal curve from z = 0 to z = 1.25)
= 0.5 + 0.3944 = 0.8944.
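Example 5 as a short `statistics.NormalDist` sketch. It computes the probability twice: once with the unrounded standard error (about 0.00804) and once with the value 0.008 used in the text, which makes the effect of rounding the S.E. visible.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.03, 450
se = sqrt(P * (1 - P) / n)                       # about 0.00804; the text rounds to 0.008

p_exact_se = 1 - NormalDist(P, se).cdf(0.02)     # P(p >= 0.02), unrounded S.E.
p_text_se = 1 - NormalDist(P, 0.008).cdf(0.02)   # z = -1.25, as in the text
print(round(p_exact_se, 4), round(p_text_se, 4))
```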
EXERCISE 4.1
1. A population consists of 5 numbers (2, 3, 6, 8, 11). Consider all possible samples of size
two which can be drawn with replacement from this population. Calculate the S.E. of
sample means.
2. When we sample from an infinite population, what happens to the standard error of the
mean if the sample size is (a) increased from 30 to 270, (b) decreased from 256 to 16?
3. A random sample of size 400 is taken from an infinite population having the mean µ = 86 and the variance $\sigma^2 = 625$. What is the probability that $\bar{X}$ will be greater than 90?
4. The number of letters that a department receives each day can be modeled by a
distribution having mean 25 and standard deviation 4. For a random sample of 30 days,
what will be the probability that the sample mean will be less than 26?
5. A random sample of 400 mangoes was taken from a large consignment and 30 were found to be bad. Find the S.E. of the proportion of bad ones in a sample of this size.
6. From a population of a large number of men with S.D. 5, a sample is drawn and the standard error of the mean is found to be 0.5. What is the sample size?
7. A population of 20 elements has mean 9 and S.D. 3, and a sample of 5 elements is taken without replacement. Find the mean and S.D. of the sampling distribution of the mean. What will be the S.D. for samples of size 10?
8. A machine produces components for transistor sets; of the total produce, 6 percent are defective. A random sample of 5 components is taken for examination from (i) a very large lot of produce, (ii) a box of 10 components. Find the mean and S.D. of the average number of defectives found among the 5 components taken for examination.
9. A population consists of five numbers 2, 3, 6, 8, 11. Consider all possible samples of size
two which can be drawn without replacement from the population. Find
(a) The mean of the population
(b) Standard deviation of the population
(c) The mean of the sampling distribution of means
(d) The standard deviation of the sampling distribution of means.
Answers
1. 2.32   2. (a) It is divided by 3  (b) It is multiplied by 4
3. 0.0007   4. 0.9147
5. 0.013   6. 100
7. For samples of 5 elements, sampling mean = 9, S.D. = $\sqrt{27/19}$; for samples of 10 elements, sampling mean = 9, S.D. = $3/\sqrt{19}$.
8. Mean = 0.06, S.D. = 0.106
9. (a) 6, (b) 3.29, (c) 6, (d) 2.01.
4.8. ESTIMATION
When we deal with a population, most of the time the parameters are unknown, so we cannot draw conclusions about the population directly. To learn about the unknown parameters, we draw a sample from the population and gather information about each parameter through a function of the sample values whose value is reasonably close to it. The value so obtained is called an estimate of the parameter, the process is called estimation, and the estimating function is called an estimator.
A good estimator should satisfy the following four properties, which we briefly explain below:
(a) Unbiasedness. A statistic t is said to be an unbiased estimator of a parameter $\theta$ if $E[t] = \theta$. Otherwise it is said to be biased.
Theorem 1. Prove that the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$.
Proof. Let $x_1, x_2, \ldots, x_n$ be a simple random sample drawn with replacement from a finite population of size N, say $X_1, X_2, \ldots, X_N$. Here
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad \mu = \frac{X_1 + X_2 + \cdots + X_N}{N}.$$
We have to prove that $E(\bar{x}) = \mu$.
While drawing $x_i$, it can be any one of the N population members, each with probability 1/N. Hence
$$E(x_i) = X_1\cdot\frac{1}{N} + X_2\cdot\frac{1}{N} + \cdots + X_N\cdot\frac{1}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \mu, \quad i = 1, 2, \ldots, n,$$
and therefore
$$E(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\cdot n\mu = \mu.$$
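On the other hand, the sample variance
$$s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$
is not unbiased. Since the S.D. is unaffected by a change of origin, we may replace $x_i$ by $y_i = x_i - \mu$:
$$s^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{1}{n}\sum (x_i - \mu)^2 - (\bar{x} - \mu)^2.$$
Therefore
$$E(s^2) = \frac{1}{n}\sum E(x_i - \mu)^2 - E(\bar{x} - \mu)^2 = \frac{1}{n}\cdot n\sigma^2 - \mathrm{Var}(\bar{x}) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2 \ne \sigma^2.$$
Hence $s^2$ is a biased estimator of $\sigma^2$.
Note. Let $S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{n}{n-1}s^2$. Then
$$E(S^2) = \frac{n}{n-1}E(s^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2.$$
Thus, $S^2$ is an unbiased estimator of $\sigma^2$.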
Example 1. A population consists of 4 values 3, 7, 11, 15. Draw all possible samples of size two with replacement. Verify that the sample mean is an unbiased estimator of the population mean.
Solution. Number of samples = $4^2 = 16$, which are listed below:
(3, 3), (7, 3), (11, 3), (15, 3)
(3, 7), (7, 7), (11, 7), (15, 7)
(3, 11), (7, 11), (11, 11), (15, 11)
(3, 15), (7, 15), (11, 15), (15, 15)
Population mean, $\mu = \frac{3 + 7 + 11 + 15}{4} = \frac{36}{4} = 9$.
Sampling distribution of the sample mean:

Sample mean $\bar{x}$   Frequency $f(\bar{x})$   $\bar{x}\cdot f(\bar{x})$
3                        1                        3
5                        2                        10
7                        3                        21
9                        4                        36
11                       3                        33
13                       2                        26
15                       1                        15
Total                    16                       144

Mean of sample means $= \frac{144}{16} = 9$.
Since $E(\bar{x}) = \mu$, the sample mean is an unbiased estimator of the population mean.
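The same 16 samples can be enumerated exactly with `itertools.product` and `fractions.Fraction`, which checks both Example 1 and the result proved above that $E(s^2) = \frac{n-1}{n}\sigma^2$ while $E(S^2) = \sigma^2$. The helper names `mean` and `var_n` are local to this sketch.

```python
from fractions import Fraction
from itertools import product

population = [3, 7, 11, 15]
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N      # population variance

samples = list(product(population, repeat=n))            # 4**2 = 16 samples (WR)

def mean(xs):
    return Fraction(sum(xs), len(xs))

def var_n(xs):
    """s^2 with divisor n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

E_xbar = mean([mean(s) for s in samples])
E_s2 = mean([var_n(s) for s in samples])
E_S2 = E_s2 * Fraction(n, n - 1)                          # S^2 = n s^2 / (n - 1)

print(mu, E_xbar)              # 9 9
print(sigma2, E_s2, E_S2)      # 20 10 20
```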
(b) Consistency. A statistic $t_n$ obtained from a random sample of size n is said to be a consistent estimator of a parameter $\theta$ if it converges in probability to $\theta$ as n tends to infinity.
A sufficient condition: if $E[t_n] \to \theta$ and $\mathrm{Var}[t_n] \to 0$ as $n \to \infty$, then the statistic $t_n$ is a consistent estimator of $\theta$.
For example, in sampling from a normal population $N(\mu, \sigma^2)$,
$$E[\bar{x}] = \mu \quad\text{and}\quad \mathrm{Var}[\bar{x}] = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty,$$
so $\bar{x}$ is a consistent estimator of $\mu$.
A statistic which is unbiased and also the most efficient is said to be the Minimum Variance Unbiased Estimator (MVUE).
For example, the sample mean $\bar{x}$ obtained from a normal population is the MVUE for the parameter $\mu$.
Let x1, x2, ..., xn be a random sample and
T = a1x1 + a2x2 + ... + anxn
where a1, a2, ..., an are constants. If T is an MVUE, then T is also called Best Linear
Unbiased Estimator (BLUE).
Example 2. A random sample $(X_1, X_2, X_3, X_4, X_5, X_6)$ of size 6 is drawn from a normal population with unknown mean $\mu$. Consider the following estimators of $\mu$:
(i) $T_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6}$
(ii) $T_2 = \frac{X_1 + X_2 + X_3}{2} + \frac{X_4 + X_5 + X_6}{3}$
(iii) $T_3 = \frac{1}{2}(X_1 + X_2) + X_3 + X_4 + \frac{1}{3}(X_5 + X_6)$
Are these estimators unbiased? Find the estimator which is best among $T_1$, $T_2$ and $T_3$.
Solution. Here $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$ (say), $\mathrm{Cov}(X_i, X_j) = 0$, $i \ne j$.
$$E(T_1) = \frac{1}{6}[E(X_1) + E(X_2) + \cdots + E(X_6)] = \frac{1}{6}\cdot 6\mu = \mu.$$
$$E(T_2) = \frac{1}{2}[E(X_1) + E(X_2) + E(X_3)] + \frac{1}{3}[E(X_4) + E(X_5) + E(X_6)] = \frac{3}{2}\mu + \mu = \frac{5}{2}\mu.$$
$$E(T_3) = \frac{1}{2}[E(X_1) + E(X_2)] + E(X_3) + E(X_4) + \frac{1}{3}[E(X_5) + E(X_6)] = \mu + 2\mu + \frac{2}{3}\mu = \frac{11}{3}\mu.$$
Since $E(T_1) = \mu$, $T_1$ is unbiased; $T_2$ and $T_3$ are biased estimators.
$$\mathrm{Var}(T_1) = \frac{1}{36}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_6)] = \frac{1}{36}(6\sigma^2) = \frac{\sigma^2}{6}.$$
$$\mathrm{Var}(T_2) = \frac{1}{4}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)] + \frac{1}{9}[\mathrm{Var}(X_4) + \mathrm{Var}(X_5) + \mathrm{Var}(X_6)] = \frac{3\sigma^2}{4} + \frac{3\sigma^2}{9} = \frac{13}{12}\sigma^2.$$
$$\mathrm{Var}(T_3) = \frac{1}{4}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2)] + \mathrm{Var}(X_3) + \mathrm{Var}(X_4) + \frac{1}{9}[\mathrm{Var}(X_5) + \mathrm{Var}(X_6)] = \frac{\sigma^2}{2} + 2\sigma^2 + \frac{2\sigma^2}{9} = \frac{49}{18}\sigma^2.$$
Since $\mathrm{Var}(T_1)$ is the smallest, $T_1$ is the best estimator.
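$$\text{Efficiency of } T_2 \text{ relative to } T_1 = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_2)} = \frac{\sigma^2/6}{13\sigma^2/12} = \frac{2}{13} \approx 0.15.$$
$$\text{Efficiency of } T_3 \text{ relative to } T_1 = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_3)} = \frac{\sigma^2/6}{49\sigma^2/18} = \frac{3}{49} \approx 0.06.$$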
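For linear estimators $T = \sum a_i X_i$ built from independent $X_i$ with common mean $\mu$ and variance $\sigma^2$, $E(T) = (\sum a_i)\mu$ and $\mathrm{Var}(T) = (\sum a_i^2)\sigma^2$. The sketch below applies these two rules to the coefficients of $T_1$, $T_2$, $T_3$ in Example 2, using exact fractions so the results match the hand calculation.

```python
from fractions import Fraction as F

# Coefficients of X1, ..., X6 in each estimator of Example 2
estimators = {
    "T1": [F(1, 6)] * 6,
    "T2": [F(1, 2)] * 3 + [F(1, 3)] * 3,
    "T3": [F(1, 2), F(1, 2), F(1), F(1), F(1, 3), F(1, 3)],
}

# For independent X_i with E(X_i) = mu and Var(X_i) = sigma^2:
#   E(T) / mu        = sum of the coefficients
#   Var(T) / sigma^2 = sum of the squared coefficients
results = {name: (sum(c), sum(a * a for a in c)) for name, c in estimators.items()}

for name, (e, v) in results.items():
    print(f"{name}: E(T) = {e} mu, Var(T) = {v} sigma^2")

# Variance ratios Var(T1) / Var(T2) and Var(T1) / Var(T3)
print(results["T1"][1] / results["T2"][1], results["T1"][1] / results["T3"][1])
```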
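Example 3. A random sample $(X_1, X_2, X_3, X_4)$ of size 4 is drawn from a normal population with unknown mean $\mu$. If
$$T = 2X_1 + \frac{\lambda}{2}X_2 + 3X_3 - 4X_4$$
is an unbiased estimator of $\mu$, find $\lambda$.
Solution. Let $E(X_i) = \mu$, $i = 1, 2, 3, 4$. For unbiasedness, $E(T) = \mu$:
$$2E(X_1) + \frac{\lambda}{2}E(X_2) + 3E(X_3) - 4E(X_4) = \mu$$
$$\Rightarrow 2\mu + \frac{\lambda}{2}\mu + 3\mu - 4\mu = \mu \Rightarrow \mu + \frac{\lambda}{2}\mu = \mu \Rightarrow \frac{\lambda}{2} = 0 \Rightarrow \lambda = 0.$$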
(d) Sufficiency. Let $x_1, x_2, \ldots, x_n$ be a random sample from a population whose p.m.f. or p.d.f. is $f(x, \theta)$. Then T is said to be a sufficient estimator of $\theta$ if we can express
$$f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta) = g_1(T, \theta)\, g_2(x_1, x_2, \ldots, x_n),$$
where $g_1(T, \theta)$ is the sampling distribution of T and contains $\theta$, and $g_2(x_1, x_2, \ldots, x_n)$ is independent of $\theta$.
Sufficient estimators exist only in a few cases. However, in random sampling from a normal population, the sample mean $\bar{x}$ is a sufficient estimator of $\mu$.
If a single value, computed from the sample, is used as the estimate of an unknown parameter of the population, the process is called point estimation. We shall discuss two methods of point estimation below:
I. Method of Maximum Likelihood
Let $x_1, x_2, \ldots, x_n$ be a random sample from a population whose p.m.f. (discrete case) or p.d.f. (continuous case) is $f(x, \theta)$, where $\theta$ is the parameter. Construct the likelihood function
$$L = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta).$$
Since $\log L$ is maximum when L is maximum, we obtain the estimate $\hat{\theta}$ of $\theta$ from
$$\frac{\partial}{\partial\theta}(\log L) = 0 \text{ at } \theta = \hat{\theta}, \quad\text{with}\quad \frac{\partial^2}{\partial\theta^2}(\log L) < 0 \text{ at } \theta = \hat{\theta}.$$
Here $\hat{\theta}$ is called the Maximum Likelihood Estimator (MLE).
Properties of MLE
(i) MLE is not necessarily unbiased.
(ii) MLE is consistent, most efficient and also sufficient, provided a sufficient
estimator exists.
(iii) MLE tends to be distributed normally for large samples.
(iv) If $g(\theta)$ is a function of $\theta$ and $\hat{\theta}$ is an MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.
Example 4. A discrete random variable X can take all non-negative integer values and
$$P(X = r) = p(1 - p)^r \quad (r = 0, 1, 2, \ldots),$$
where p (0 < p < 1) is the parameter of the distribution. Find the MLE of p for a sample $x_1, x_2, \ldots, x_n$ of size n from the population of X.
Solution. Consider the likelihood function
$$L = P(X = x_1)\, P(X = x_2) \cdots P(X = x_n) = p(1-p)^{x_1}\cdot p(1-p)^{x_2} \cdots p(1-p)^{x_n} = p^n (1-p)^{\sum x_i}.$$
Taking logarithms on both sides, we obtain
$$\ln L = n\ln p + \left(\sum x_i\right)\ln(1-p).$$
Now
$$\frac{d\ln L}{dp} = 0 \Rightarrow \frac{n}{p} - \frac{\sum x_i}{1-p} = 0 \Rightarrow \frac{1-p}{p} = \frac{\sum x_i}{n} = \bar{x} \Rightarrow \frac{1}{p} - 1 = \bar{x} \Rightarrow \hat{p} = \frac{1}{1 + \bar{x}}.$$
Also,
$$\frac{d^2\ln L}{dp^2} = -\frac{n}{p^2} - \frac{\sum x_i}{(1-p)^2} = -n\left[\frac{1}{p^2} + \frac{\bar{x}}{(1-p)^2}\right],$$
which at $\hat{p} = \frac{1}{1+\bar{x}}$ (so that $1 - \hat{p} = \frac{\bar{x}}{1+\bar{x}}$) equals
$$-n(1 + \bar{x})^2\left(1 + \frac{1}{\bar{x}}\right) < 0.$$
Hence the MLE of p is $\dfrac{1}{1 + \bar{x}}$.
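The closed form $\hat{p} = 1/(1 + \bar{x})$ can be checked numerically: the sketch below evaluates the log-likelihood of Example 4 on a fine grid of p values and confirms that the grid maximum sits next to $\hat{p}$. The sample `xs` is hypothetical, chosen so that $\bar{x} = 2$ and $\hat{p} = 1/3$.

```python
from math import log

def geometric_loglik(p, xs):
    """ln L = n ln p + (sum x_i) ln(1 - p) for P(X = r) = p (1 - p)^r."""
    return len(xs) * log(p) + sum(xs) * log(1 - p)

def geometric_mle(xs):
    """Closed form derived above: p_hat = 1 / (1 + x_bar)."""
    xbar = sum(xs) / len(xs)
    return 1 / (1 + xbar)

xs = [0, 1, 2, 3, 4]          # hypothetical sample, x_bar = 2
p_hat = geometric_mle(xs)      # 1/3

# Crude numerical check: maximize ln L over the grid 0.001, 0.002, ..., 0.999
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda p: geometric_loglik(p, xs))
print(p_hat, best)
```

The grid search is only a sanity check; because $\ln L$ is concave in p here, the calculus solution is the global maximum.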
$$L = \theta^n (x_1 x_2 \cdots x_n)^{\theta - 1}.$$
Taking logarithms on both sides, we obtain
$$\ln L = n\ln\theta + (\theta - 1)\ln(x_1 x_2 \cdots x_n).$$
Now,
$$\frac{d\ln L}{d\theta} = 0 \Rightarrow \frac{n}{\theta} + \ln(x_1 x_2 \cdots x_n) = 0 \Rightarrow \hat{\theta} = -\frac{n}{\ln(x_1 x_2 \cdots x_n)}.$$
Also,
$$\frac{d^2\ln L}{d\theta^2} = -\frac{n}{\theta^2} < 0.$$
Hence, the MLE of $\theta$ is $-\dfrac{n}{\ln(x_1 x_2 \cdots x_n)}$.
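The likelihood above corresponds to a density of the form $\theta x^{\theta - 1}$ on $0 < x < 1$ (we infer this from L, since the statement of that example is not shown here). A minimal sketch of the resulting estimator follows; it sums logarithms instead of forming the product $x_1 x_2 \cdots x_n$, which can underflow for large n. The data are those of Exercise 4.2, Q. 1, reused only as an illustration.

```python
from math import log

def mle_power_density(xs):
    """MLE of theta for f(x) = theta * x**(theta - 1), 0 < x < 1:
    theta_hat = -n / ln(x1 * x2 * ... * xn) = -n / sum(ln x_i)."""
    return -len(xs) / sum(log(x) for x in xs)

data = [0.2, 0.4, 0.8, 0.5, 0.7, 0.9, 0.8, 0.9]   # data of Exercise 4.2, Q. 1
theta_hat = mle_power_density(data)
print(round(theta_hat, 6))
```

The density $(\theta + 1)x^{\theta}$ of Exercise 4.2, Q. 1 is the same family with $\theta$ shifted by one, so its MLE is this value minus 1, about 0.8901; this agrees with the listed answer 0.890091 to four decimal places.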
Example 6. X tossed a biased coin 40 times and got heads 15 times, while Y tossed it 50 times and got heads 30 times. Find the MLE of the probability of getting a head when the coin is tossed.
Solution. Let P be the unknown probability of getting a head. Using the binomial distribution,
Probability of getting 15 heads in 40 tosses $= \binom{40}{15}P^{15}(1 - P)^{25}$,
Probability of getting 30 heads in 50 tosses $= \binom{50}{30}P^{30}(1 - P)^{20}$.
The likelihood function is obtained by multiplying these probabilities:
$$L = \binom{40}{15}\binom{50}{30}P^{45}(1 - P)^{45},$$
$$\log L = \log\left[\binom{40}{15}\binom{50}{30}\right] + 45\log P + 45\log(1 - P).$$
Hence
$$\frac{\partial \log L}{\partial P} = 0 \Rightarrow \frac{45}{P} - \frac{45}{1 - P} = 0 \Rightarrow P = \frac{1}{2},$$
which is the MLE.
II. Method of Moments
In this method, the first few moments of the population are equated with the corresponding moments of the sample:
$$\mu'_r = m'_r, \quad\text{where } \mu'_r = E(X^r) \text{ and } m'_r = \frac{1}{n}\sum x_i^r.$$
Solving these equations for the parameters gives the estimates. This method is applicable only when the population moments exist.
Example 7. Estimate the parameter p of the binomial distribution by the method of moments (when n is known).
Solution. Here, $\mu'_1 = E(X) = np$ and $m'_1 = \bar{x}$. Taking $\mu'_1 = m'_1$, we have
$$np = \bar{x} \Rightarrow \hat{p} = \frac{\bar{x}}{n}.$$
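4.10. INTERVAL ESTIMATION
We know that $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ follows the standard normal distribution and that 95% of the area under the standard normal curve lies between z = −1.96 and z = 1.96. Then
$$P[-1.96 \le z \le 1.96] = 0.95 \Rightarrow P\left[-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right] = 0.95,$$
i.e., in 95% of cases we have
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}.$$
The interval $\left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$ is known as the 95% confidence interval (C.I.) for $\mu$.
Similarly, $\left[\bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right]$ is known as the 99% C.I. for $\mu$, and $\left[\bar{x} - 3\frac{\sigma}{\sqrt{n}},\ \bar{x} + 3\frac{\sigma}{\sqrt{n}}\right]$ is known as the 99.73% C.I. for $\mu$.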
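(b) C.I. for mean with unknown S.D. $\sigma$.
In this case, for sampling from a normal population $N(\mu, \sigma^2)$, the statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}, \quad\text{where } s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2,$$
follows the t distribution with n − 1 degrees of freedom. For 95% probability,
$$-t_{0.025} \le \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \le t_{0.025} \Rightarrow \bar{x} - t_{0.025}\frac{s}{\sqrt{n-1}} \le \mu \le \bar{x} + t_{0.025}\frac{s}{\sqrt{n-1}}.$$
Thus, $\left[\bar{x} - t_{0.025}\frac{s}{\sqrt{n-1}},\ \bar{x} + t_{0.025}\frac{s}{\sqrt{n-1}}\right]$ is called the 95% C.I. for $\mu$.
Similarly, $\left[\bar{x} - t_{0.005}\frac{s}{\sqrt{n-1}},\ \bar{x} + t_{0.005}\frac{s}{\sqrt{n-1}}\right]$ is called the 99% C.I. for $\mu$.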
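(c) C.I. for variance $\sigma^2$ with known mean. We know that $\sum (x_i - \mu)^2/\sigma^2$ follows the chi-square distribution with n degrees of freedom.
For probability 95% we have
$$\chi^2_{0.975} \le \frac{\sum (x_i - \mu)^2}{\sigma^2} \le \chi^2_{0.025} \Rightarrow \frac{\sum (x_i - \mu)^2}{\chi^2_{0.025}} \le \sigma^2 \le \frac{\sum (x_i - \mu)^2}{\chi^2_{0.975}}.$$
Similarly, for probability 99%,
$$\frac{\sum (x_i - \mu)^2}{\chi^2_{0.005}} \le \sigma^2 \le \frac{\sum (x_i - \mu)^2}{\chi^2_{0.995}}.$$
$$99\% \text{ Confidence limits} = (\bar{x}_1 - \bar{x}_2) \pm t_{0.005}\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
For Proportion P:
95% Confidence limits = p ± 1.96 (S.E. of p)
99% Confidence limits = p ± 2.58 (S.E. of p)
where $\text{S.E. of } p = \sqrt{\frac{PQ}{n}} \approx \sqrt{\frac{pq}{n}}$.
For Difference of Proportions $P_1 - P_2$:
95% Confidence limits = $(p_1 - p_2) \pm 1.96$ [S.E. of $(p_1 - p_2)$]
99% Confidence limits = $(p_1 - p_2) \pm 2.58$ [S.E. of $(p_1 - p_2)$]
where $\text{S.E. of } (p_1 - p_2) = \sqrt{\frac{P_1Q_1}{n_1} + \frac{P_2Q_2}{n_2}} \approx \sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}}$.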
Example 8. A random sample of size 10 was drawn from a normal population with an unknown mean and a variance of 35.4 cm². If the observations (in cm) are 55, 75, 71, 66, 73, 77, 63, 67, 60 and 76, obtain the 99% confidence interval for the population mean.
Solution. Given n = 10, $\sum x_i = 683$, so $\bar{x} = \frac{\sum x_i}{n} = 68.3$.
Since the population S.D. is known, the 99% C.I. for $\mu$ is
$$\left[\bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right], \text{ i.e., } \left[68.3 - 2.58\sqrt{\frac{35.4}{10}},\ 68.3 + 2.58\sqrt{\frac{35.4}{10}}\right],$$
i.e., [63.45, 73.15].
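A sketch of the known-$\sigma$ interval $\bar{x} \pm z\,\sigma/\sqrt{n}$. By default the multiplier is the exact standard normal quantile from `statistics.NormalDist.inv_cdf` (about 2.5758 for 99%); passing `z=2.58` reproduces the table-based working of Example 8. The function name `z_interval` is ours.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(xs, sigma, conf=0.99, z=None):
    """C.I. for mu when the population S.D. sigma is known:
    x_bar -/+ z * sigma / sqrt(n). If z is None, the exact normal quantile is used."""
    if z is None:
        z = NormalDist().inv_cdf(0.5 + conf / 2)
    xbar = mean(xs)
    half = z * sigma / sqrt(len(xs))
    return xbar - half, xbar + half

heights = [55, 75, 71, 66, 73, 77, 63, 67, 60, 76]
ci = z_interval(heights, sqrt(35.4), z=2.58)   # Example 8: about (63.45, 73.15)
ci_exact = z_interval(heights, sqrt(35.4))     # exact 99% quantile, slightly narrower
print(ci, ci_exact)
```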
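Example 9. A random sample of size 10 was drawn from a normal population; the values are 48, 56, 50, 55, 49, 45, 55, 54, 47, 43. Find the 95% confidence interval for the mean $\mu$ of the population.
Solution. From the given data, $\sum x_i = 502$, so $\bar{x} = 50.2$, n = 10.
Let d = x − 50; then the sample values are changed to
−2, 6, 0, 5, −1, −5, 5, 4, −3, −7,
with $\sum d = 2$ and $\sum d^2 = 190$, so
$$s^2 = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2 = \frac{190}{10} - \left(\frac{2}{10}\right)^2 = 18.96, \qquad s = 4.35.$$
Since the population S.D. is unknown, the 95% C.I. for the mean $\mu$ (with $t_{0.025} = 2.262$ for 9 degrees of freedom) is
$$\left[\bar{x} - 2.262\frac{s}{\sqrt{n-1}},\ \bar{x} + 2.262\frac{s}{\sqrt{n-1}}\right], \text{ i.e., } \left[50.2 - 2.262\cdot\frac{4.35}{3},\ 50.2 + 2.262\cdot\frac{4.35}{3}\right],$$
i.e., [46.92, 53.48].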
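A sketch of the unknown-$\sigma$ interval in the text's notation, $\bar{x} \pm t\, s/\sqrt{n-1}$ with s the divisor-n S.D. (`statistics.pstdev`). The Python standard library has no t quantile function, so the critical value is passed in from a t table (2.262 for 9 d.f. in Example 9).

```python
from math import sqrt
from statistics import mean, pstdev

def t_interval(xs, t_crit):
    """C.I. for mu with unknown sigma: x_bar -/+ t * s / sqrt(n - 1),
    where s is the S.D. with divisor n. This equals x_bar -/+ t * S / sqrt(n)
    with S the divisor-(n - 1) S.D. t_crit comes from a t table with n - 1 d.f."""
    n = len(xs)
    xbar = mean(xs)
    s = pstdev(xs)                     # divisor n
    half = t_crit * s / sqrt(n - 1)
    return xbar - half, xbar + half

data = [48, 56, 50, 55, 49, 45, 55, 54, 47, 43]
ci = t_interval(data, 2.262)           # t_0.025 for 9 d.f.
print(ci)                              # about (46.92, 53.48)
```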
Example 10. The standard deviation of a random sample of size 15 drawn from a normal population is 3.2. Calculate the 95% confidence interval for the standard deviation ($\sigma$) of the population.
Solution. Here n = 15, sample S.D. s = 3.2. The 95% confidence interval for $\sigma^2$ is
$$\frac{ns^2}{\chi^2_{0.025}} \le \sigma^2 \le \frac{ns^2}{\chi^2_{0.975}}.$$
From the chi-square table with 14 degrees of freedom, $\chi^2_{0.025} = 26.12$ and $\chi^2_{0.975} = 5.63$. Therefore the C.I. is
$$\frac{15(3.2)^2}{26.12} \le \sigma^2 \le \frac{15(3.2)^2}{5.63},$$
i.e., $5.88 \le \sigma^2 \le 27.28$, and hence $2.42 \le \sigma \le 5.22$.
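Example 10 as a short sketch. Because the standard library has no chi-square quantile function, the two table values for 14 d.f. are passed in as arguments; the function name and argument names are ours. Taking square roots of the limits gives the interval for $\sigma$.

```python
from math import sqrt

def variance_interval(n, s, chi2_upper, chi2_lower):
    """95% C.I. for sigma^2 in the text's form:
    n s^2 / chi2_0.025 <= sigma^2 <= n s^2 / chi2_0.975,
    with s the sample S.D. (divisor n); chi2_upper = chi2_0.025 and
    chi2_lower = chi2_0.975 are read from a table with n - 1 d.f."""
    ns2 = n * s ** 2
    return ns2 / chi2_upper, ns2 / chi2_lower

lo, hi = variance_interval(15, 3.2, chi2_upper=26.12, chi2_lower=5.63)
print(round(lo, 2), round(hi, 2))               # Example 10: 5.88 27.28
print(round(sqrt(lo), 2), round(sqrt(hi), 2))   # corresponding limits for sigma
```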
Here
$$\text{S.E. of } p = \sqrt{\frac{PQ}{n}} \approx \sqrt{\frac{pq}{n}} = \sqrt{\frac{65}{500}\left(1 - \frac{65}{500}\right)\frac{1}{500}} \approx 0.015 \approx 0.02.$$
Thus, the limits are [0.13 − 3(0.02), 0.13 + 3(0.02)], i.e., [0.07, 0.19].
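If $\xi_0$ is an observed experimental outcome, then, by Bayes' theorem, the posterior probabilities of the possible parameter values $\theta_1, \theta_2, \ldots, \theta_n$ are
$$f''(\theta_i) = \frac{P(\xi_0 \mid \theta_i)\, f'(\theta_i)}{\sum_{j=1}^{n} P(\xi_0 \mid \theta_j)\, f'(\theta_j)}, \quad i = 1, 2, \ldots, n,$$
where $f'$ denotes the prior and $f''$ the posterior distribution of $\theta$. In the limit we obtain
$$f''(\theta) = \frac{P(\xi_0 \mid \theta)\, f'(\theta)}{\int P(\xi_0 \mid \theta)\, f'(\theta)\, d\theta}.$$
Then the Bayesian estimator is
$$\hat{\theta} = E(\theta \mid \xi_0) = \int \theta\, f''(\theta)\, d\theta.$$
Using this we can calculate
$$P(X \le a) = \int P(X \le a \mid \theta)\, f''(\theta)\, d\theta.$$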
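The discrete form of the posterior formula and the Bayesian estimator (the posterior mean) can be sketched as follows. The example is hypothetical: a coin whose head probability $\theta$ is one of three values, equally likely a priori, and the observed outcome is a single head, so $P(\text{outcome} \mid \theta) = \theta$.

```python
def posterior(prior, likelihood):
    """Discrete Bayes rule:
    f''(theta_i) = P(obs | theta_i) f'(theta_i) / sum_j P(obs | theta_j) f'(theta_j)."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]

def bayes_estimate(thetas, post):
    """Posterior mean E(theta | obs) = sum_i theta_i f''(theta_i)."""
    return sum(t * w for t, w in zip(thetas, post))

# Hypothetical illustration: three possible head probabilities, uniform prior,
# one observed head, so the likelihood of each theta is theta itself.
thetas = [0.3, 0.5, 0.7]
prior = [1 / 3, 1 / 3, 1 / 3]
post = posterior(prior, likelihood=thetas)
print([round(w, 4) for w in post], round(bayes_estimate(thetas, post), 4))
```

The posterior shifts weight toward larger $\theta$ after a head, and the Bayesian estimate moves from the prior mean 0.5 to about 0.553.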
SUMMARY
Sampling means the selection of a part of the aggregate with a view to drawing some statistical information about the whole. This aggregate under investigation is called the population and the selected part is called the sample.
Any statistical measure relating to the population which is based on all units of the population is called a parameter.
Any statistical measure relating to the sample which is based on all units of the sample is called a statistic.
When we deal with a population, most of the time the parameters are unknown, so we cannot draw conclusions about the population directly. To learn about the unknown parameters, we draw a sample from the population and gather information about each parameter through a function of the sample values whose value is reasonably close to it. The value so obtained is called an estimate of the parameter, the process is called estimation, and the estimating function is called an estimator.
If a consistent estimator has a smaller variance than any other consistent estimator of a parameter, it is called the most efficient estimator.
If a single value, computed from the sample, is used as the estimate of an unknown parameter of the population, the process is called point estimation.
EXERCISE 4.2
1. A random variable X has a distribution with density function
$f(x) = (\theta + 1)x^{\theta}$, $0 \le x \le 1$, $\theta > -1$
$\quad\;\; = 0$, otherwise,
and a random sample of size 8 produces the data 0.2, 0.4, 0.8, 0.5, 0.7, 0.9, 0.8 and 0.9. Find the MLE of the unknown parameter $\theta$.
2. A random variable X has a distribution with density function
$f(x) = \dfrac{(a + 1)x^{a}}{2^{a+1}}$, $0 \le x \le 2$
$\quad\;\; = 0$, otherwise.
Find the MLE of the parameter a (> 0).
3. Consider a random sample of size n from a population following Poisson distribution.
Obtain the MLE of the parameter of this distribution.
4. Consider a random sample x1, x2, ..., xn from a normal population having mean zero.
Obtain the MLE of the variance and show that it is unbiased.
5. Consider a random sample x1, x2, ..., xn from a population following binomial
distribution having parameters n and p. Find the MLE of p and show that it is unbiased.
6. Find the estimates of $\mu$ and $\sigma$ in the normal population $N(\mu, \sigma^2)$ by the method of moments.
7. Show that the estimates of the parameter of the Poisson distribution obtained by the
method of maximum likelihood and the method of moments are identical.
8. Find a 95% C.I. for the mean of a normal population with $\sigma = 3$, given the sample 2.3, 0.2, 0.4 and 0.9.
9. In a sample of size 10, the sample mean is 3.22 and the sample variance 1.21. Find
the 95% C.I. for the population mean.
10. A sample of size 10 from a normal population produces the data 2.03, 2.02, 2.01, 2.00,
1.99, 1.98, 1.97, 1.99, 1.96 and 1.95. From the sample find the 95% C.I. for the
population mean.
11. A random sample of size 10 from $N(\mu, \sigma^2)$ yields sample mean 4.8 and sample variance 8.64. Find 95% and 99% confidence intervals for the population mean.
12. The following random sample was obtained from a normal population : 12, 9, 10, 14,
11, 8. Find the 95% C.I. for the population S.D. when the population mean is (i)
known to be 13, (ii) unknown.
13. The marks obtained by 15 students in an examination have a mean 60 and variance
30. Find 99% confidence interval for the mean of the population of marks, assuming it
to be normal.
14. 228 out of 400 voters picked at random from a large electorate said that they were going to vote for a particular candidate. Find a 95% C.I. for the proportion of voters of the electorate who would vote in favour of the candidate.
15. In a random sample of 300 road accidents, it was found that 114 were due to bad
weather. Construct a 99% confidence interval for the corresponding true proportions.
16. A study shows that 102 of 190 persons who saw an advertisement for a product on T.V. during a sports program, and 75 of 190 other persons who saw it advertised on a variety show, purchased the product. Construct a 99% confidence interval for the difference between the corresponding population proportions.
Answers
1. $\hat{\theta} = 0.890091$   2. $\hat{a} = \dfrac{n}{\sum_{i=1}^{n}\ln(2/x_i)} - 1$   3. $\hat{\lambda} = \bar{x}$
4. $\hat{\sigma}^2 = \sum x_i^2/n$   5. $\hat{p} = \bar{x}/n$   6. $\hat{\mu} = \bar{x}$, $\hat{\sigma}^2 = s^2$
8. [2.54, 3.34]   9. [2.39, 4.05]   10. [1.972, 2.008]
11. 95% C.I. [2.233, 7.367], 99% C.I. [1.616, 7.984]
12. (i) [1.97, 6.72], (ii) [1.35, 5.30]   13. [55.64, 64.36]
14. [0.52, 0.62]   15. [0.31, 0.45]   16. [0.02, 0.28].
Hypothesis Testing
5. HYPOTHESIS TESTING
STRUCTURE
Introduction
Null Hypothesis and Alternative Hypothesis
Level of Significance and Confidence Limits
Type I Error and Type II Error
Power of the Test
Test of Significance for Small Samples
Students t-Test
Assumptions for Students t-test
Degree of Freedom
Test for Single Mean
t-test for Difference of Means
Paired t-test For Difference of Means
F-test
Properties of F-distribution
Procedure to F-test
Critical Values of F-distribution
Test of Significance for Large Samples
Test of Significance for Proportion
Test of Significance for Single Mean
Test of Significance for Difference of Means
5.1. INTRODUCTION
accept or reject the hypothesis. A test of significance can be used to compare the characteristics of two samples of the same type. Some of the well-known tests of significance for small samples are the t-test and the F-test.
5.2. NULL HYPOTHESIS AND ALTERNATIVE HYPOTHESIS
The probability level below which we reject the hypothesis is known as the level of significance. The region such that the hypothesis is rejected whenever the sample value falls in it is known as the critical region or the rejection region. We generally take two critical regions, which cover 5% and 1% of the area under the normal curve.
Depending on the nature of the problem, we use a single-tail test or a double-tail test to assess the significance of a result. In a single-tail test, only the area in one tail of the curve (to the right or to the left of an ordinate) is taken into consideration, whereas in a double-tail test, the areas in both tails of the curve representing the sampling distribution are taken into consideration.
For example, a test of the mean of a population
$H_0 : \mu = \mu_0$
against the alternative hypothesis $H_1 : \mu > \mu_0$ (right-tailed) or $H_1 : \mu < \mu_0$ (left-tailed) is a single-tailed test. In the right-tailed test ($H_1 : \mu > \mu_0$) the critical region lies entirely in the right tail of the sampling distribution, while for the left-tailed test ($H_1 : \mu < \mu_0$) the critical region lies entirely in the left tail of the sampling distribution.
A test of a statistical hypothesis where the alternative hypothesis is two-tailed, such as
$H_0 : \mu = \mu_0$ against the alternative hypothesis
$H_1 : \mu \ne \mu_0$ ($\mu > \mu_0$ or $\mu < \mu_0$),
is known as a two-tailed test, and in such a case the critical region is given by the portion of the area lying in both tails of the probability curve of the test statistic.
For a two-tailed test, the value of z corresponding to the 5% level of significance is 1.96, and corresponding to the 1% level of significance it is 2.58. The sets of z-scores outside the ranges ±1.96 and ±2.58 constitute the critical regions (regions of rejection) of the hypothesis at the 5% and 1% levels of significance respectively.
The following figure shows the regions of acceptance and rejection for the 5% and 1% levels of significance.
[Figure: Normal curves showing the central region of acceptance (95% area at the 5% level; 99% area at the 1% level) and a critical region, or region of rejection, in each tail.]
The error of rejecting $H_0$ when $H_0$ is true is called the type I error, and the error of accepting $H_0$ when $H_0$ is false ($H_1$ is true) is called the type II error. The probability of a type I error is denoted by $\alpha$ and the probability of a type II error is denoted by $\beta$:
P(rejecting $H_0$ when $H_0$ is true) = $\alpha$
P(accepting $H_0$ when $H_1$ is true) = $\beta$
A good test should accept the null hypothesis when it is true and reject it when it is false. The quantity $1 - \beta$ (i.e., 1 − probability of type II error) measures how well the test is working and is called the power of the test:
Power of the test = $1 - \beta$.
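To make $\alpha$, $\beta$ and power concrete, here is a sketch for one simple case that we choose for illustration: a right-tailed z-test of $H_0 : \mu = \mu_0$ against $H_1 : \mu > \mu_0$ with known $\sigma$, evaluated at one particular alternative value $\mu_1$. All numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_errors(mu0, mu1, sigma, n, alpha=0.05):
    """Right-tailed z-test of H0: mu = mu0 vs H1: mu > mu0 with known sigma.
    H0 is rejected when x_bar > mu0 + z_alpha * sigma / sqrt(n).
    Returns (alpha, beta, power), with beta evaluated at the alternative mu = mu1."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    beta = NormalDist(mu1, se).cdf(cutoff)     # P(accept H0 | mu = mu1)
    return alpha, beta, 1 - beta

# Hypothetical numbers: mu0 = 0, mu1 = 1, sigma = 2, n = 16
alpha, beta, power = z_test_errors(0, 1, 2, 16)
print(alpha, round(beta, 4), round(power, 4))
```

Raising n (or moving $\mu_1$ further from $\mu_0$) lowers $\beta$ and raises the power, while $\alpha$ stays fixed by the choice of critical value.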
Let $x_1, x_2, \ldots, x_n$ be a random sample of size n (n < 30) from a normal population with mean $\mu$ and variance $\sigma^2$. Student's t-statistic is defined as
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}},$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is an unbiased estimate of $\sigma^2$.
SOLVED EXAMPLES
Example 1. The mean weekly sales of soap bars in departmental stores was 146.3 bars per store. After an advertising campaign, the mean weekly sales in 22 stores for a typical week increased to 153.7, with a standard deviation of 17.2. Was the advertising campaign successful?
Solution. Here, n = 22, $\bar{x} = 153.7$, s = 17.2.
Null hypothesis $H_0 : \mu = 146.3$, i.e., the advertising campaign is not successful.
Alternative hypothesis $H_1 : \mu > 146.3$ (right tail).
Under $H_0$, the test statistic is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \text{ with } (22 - 1) = 21 \text{ d.f.},$$
$$t = \frac{153.7 - 146.3}{17.2/\sqrt{22 - 1}} = \frac{7.4\sqrt{21}}{17.2} \approx 1.97.$$
Since the calculated value t = 1.97 is greater than the tabulated value t = 1.72 for 21 d.f. at the 5% level of significance (single tail), the result is significant. So $H_0$ is rejected, i.e., the advertising campaign was successful in promoting sales.
Example 2. Ten individuals are chosen at random from a normal population and their heights, in inches, are found to be 63, 63, 66, 67, 68, 69, 70, 70, 71 and 71. Test whether the sample belongs to a population whose mean height is 66 inches. (Given $t_{0.05} = 2.26$ for 9 d.f.)
Solution.

$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
63       −4.8               23.04
63       −4.8               23.04
66       −1.8               3.24
67       −0.8               0.64
68       0.2                0.04
69       1.2                1.44
70       2.2                4.84
70       2.2                4.84
71       3.2                10.24
71       3.2                10.24
Total: $\sum x_i = 678$, so $\bar{x} = 67.8$; $\sum (x_i - \bar{x})^2 = 81.60$

$$S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{81.60}{9} = 9.0667, \qquad S = 3.011.$$
Null hypothesis $H_0 : \mu = 66$, i.e., the population mean is 66 inches.
Alternative hypothesis $H_1 : \mu \ne 66$ (two tails).
Under $H_0$, the test statistic is
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{67.8 - 66}{3.011/\sqrt{10}} = \frac{1.8\sqrt{10}}{3.011} = \frac{5.692}{3.011} = 1.8904.$$
Degrees of freedom = n − 1 = 10 − 1 = 9, and $t_{0.05} = 2.26$ for 9 d.f.
As the calculated value of |t| is less than $t_{0.05}$, the difference between $\bar{x}$ and $\mu$ may be due to fluctuations of random sampling, and $H_0$ may be accepted. In other words, the data do not provide any significant evidence against the hypothesis that the population mean is 66 inches.
Example 3. A random sample of 16 values from a normal population showed a mean of 41.5 inches, and the sum of squares of deviations from this mean was 135 square inches. Show that the assumption of a mean of 43.5 inches for the population is not reasonable. (Given $t_{0.05} = 2.13$, $t_{0.01} = 2.95$ for 15 degrees of freedom.)
Solution. Here, $\bar{x} = 41.5$ inches, n = 16, $\sum (x_i - \bar{x})^2 = 135$ sq. inches.
$$S = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{\frac{135}{15}} = \sqrt{9} = 3.$$
Null hypothesis $H_0 : \mu = 43.5$ inches, i.e., the data are consistent with the assumption that the population mean is 43.5 inches.
Alternative hypothesis $H_1 : \mu \ne 43.5$ inches.
Under $H_0$, the test statistic is
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \qquad |t| = \frac{|41.5 - 43.5|}{3/\sqrt{16}} = \frac{2 \times 4}{3} = 2.667.$$
Degrees of freedom = n − 1 = 16 − 1 = 15. We are given $t_{0.05} = 2.13$ and $t_{0.01} = 2.95$ for 15 degrees of freedom.
Since the calculated |t| is greater than $t_{0.05} = 2.13$, the null hypothesis $H_0$ is rejected at the 5% level of significance, and we conclude that the assumption of mean 43.5 inches for the population is not reasonable.
Remark. Since the calculated |t| is less than $t_{0.01} = 2.95$, the null hypothesis $H_0$ may be accepted at the 1% level of significance.
NOTES
5.11. PAIRED t-TEST FOR DIFFERENCE OF MEANS
Suppose the two samples are of the same size, say n, and the data are paired, i.e. each pair (xi, yi), (i = 1, 2, ......, n), corresponds to the same ith sample unit. The problem is to test whether the sample means differ significantly or not.
Here, we consider the increments $d_i = x_i - y_i$, (i = 1, 2, ......, n).
Under the null hypothesis H0 that the increments are due to fluctuations of sampling, the statistic
$$t = \frac{\bar{d}}{S/\sqrt{n}}, \quad \text{where} \quad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2,$$
follows Student's t-distribution with (n − 1) degrees of freedom.
SOLVED EXAMPLES
Example 1. The following data relate to the heights (in cms) of two different varieties of wheat plants.
Variety 1   63  65  68  69  71  72
Variety 2   61  62  65  65  66  66  70  70  72  73
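Test the null hypothesis that the mean heights of plants of both varieties are the same.
Solution. Given n1 = 6, n2 = 10.
Null hypothesis H0 : $\mu_1 = \mu_2$
Alternative hypothesis H1 : $\mu_1 > \mu_2$ (right tail)
Under H0 the test statistic is given by
$$t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{with } (n_1 + n_2 - 2) \text{ d.f.}$$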
          Variety 1                              Variety 2
  x    x − x̄ (x̄ = 68)   (x − x̄)²        y    y − ȳ (ȳ = 67)   (y − ȳ)²
  63        −5              25          61        −6              36
  65        −3               9          62        −5              25
  68         0               0          65        −2               4
  69         1               1          65        −2               4
  71         3               9          66        −1               1
  72         4              16          66        −1               1
                                        70         3               9
                                        70         3               9
                                        72         5              25
                                        73         6              36
Σx = 408         Σ(x − x̄)² = 60         Σy = 670        Σ(y − ȳ)² = 150
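$$\bar{x} = \frac{1}{n_1}\sum x_i = \frac{408}{6} = 68, \qquad \bar{y} = \frac{1}{n_2}\sum y_i = \frac{670}{10} = 67$$
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right] = \frac{1}{6 + 10 - 2}[60 + 150] = \frac{210}{14} = 15, \qquad S = 3.873$$
$$t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{68 - 67}{3.873\sqrt{\dfrac{1}{6} + \dfrac{1}{10}}} = \frac{1}{3.873 \times 0.5164} = 0.50$$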
Tabulated t0.05 for 14 degrees of freedom for a one-tailed test is 1.76.
Since the calculated value of t is less than 1.76, it is not significant at 5% level of significance. Hence, H0 may be accepted and we conclude that the heights of the plants of the two varieties do not differ significantly at 5% level of significance.
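The pooled two-sample t statistic above can be sketched in Python as follows. The function name `pooled_t` is illustrative; it assumes equal population variances, as the pooled S² does, and the Variety 2 values are those used in the worked table.

```python
import math
import statistics


def pooled_t(x, y):
    """Two-sample t assuming equal population variances.

    Pooled S^2 = [sum (x - xbar)^2 + sum (y - ybar)^2] / (n1 + n2 - 2).
    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    df = n1 + n2 - 2
    S = math.sqrt(ss / df)
    return (xbar - ybar) / (S * math.sqrt(1 / n1 + 1 / n2)), df


variety1 = [63, 65, 68, 69, 71, 72]
variety2 = [61, 62, 65, 65, 66, 66, 70, 70, 72, 73]  # values from the worked table
t, df = pooled_t(variety1, variety2)
print(f"t = {t:.3f} with {df} d.f.")
print("reject H0" if t > 1.76 else "accept H0")  # right tail, t0.05 = 1.76
```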
Example 2. The mean birth weights, with standard deviations and sample sizes, are given below by socio-economic group. Is the mean difference in birth weight between the socio-economic groups significant?
Sample size              n1 = 15       n2 = 10
Birth weight (kg)        x̄ = 2.91      ȳ = 2.26
Standard deviation       S1 = 0.27     S2 = 0.22
Group I    25  32  30  34  24  14  32  24  30  31  35  25
Group II   44  34  22  10  47  31  40  30  32  35  18  21  35  29  22

  x    x − x̄ (x̄ = 28)   (x − x̄)²        y    y − ȳ (ȳ = 30)   (y − ȳ)²
  25        −3               9          44        14             196
  32         4              16          34         4              16
  30         2               4          22        −8              64
  34         6              36          10       −20             400
  24        −4              16          47        17             289
  14       −14             196          31         1               1
  32         4              16          40        10             100
  24        −4              16          30         0               0
  30         2               4          32         2               4
  31         3               9          35         5              25
  35         7              49          18       −12             144
  25        −3               9          21        −9              81
                                        35         5              25
                                        29        −1               1
                                        22        −8              64
Σx = 336   Σ(x − x̄) = 0   Σ(x − x̄)² = 380
Student    1    2    3    4    5    6    7    8    Total
Before    49   53   51   52   47   50   52   53     407
After     52   55   52   53   50   54   54   53     423

Student   Before (xi)   After (yi)   di = xi − yi   di²
  1           49            52            −3          9
  2           53            55            −2          4
  3           51            52            −1          1
  4           52            53            −1          1
  5           47            50            −3          9
  6           50            54            −4         16
  7           52            54            −2          4
  8           53            53             0          0
                                      Σdi = −16   Σdi² = 44
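Under H0 the test statistic is
$$t = \frac{\bar{d}}{S/\sqrt{n}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{-16}{8} = -2$$
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum d_i^2 - n\bar{d}^2\right] = \frac{1}{7}\left[44 - 8 \times (-2)^2\right] = \frac{12}{7} = 1.714, \qquad S = 1.31$$
$$|t| = \frac{|\bar{d}|}{S/\sqrt{n}} = \frac{|-2|}{1.31/\sqrt{8}} = \frac{2 \times 2.83}{1.31} = 4.32$$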
Tabulated t0.05 for (8 − 1) = 7 degrees of freedom for a one-tailed test is 1.90.
Since the calculated value of |t| is greater than the tabulated t, H0 is rejected at 5% level of significance. Hence, we conclude that the scores differ significantly before and after the training, i.e. the training was effective.
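The paired t-test reduces to a one-sample test on the differences, which the following Python sketch makes explicit. The name `paired_t` is illustrative; `statistics.stdev` uses divisor n − 1, matching S in the text.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic on d_i = x_i - y_i; returns (t, df) with df = n - 1."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    dbar = statistics.mean(d)
    S = statistics.stdev(d)  # divisor n - 1, same as S in the text
    return dbar / (S / math.sqrt(n)), n - 1


before = [49, 53, 51, 52, 47, 50, 52, 53]
after = [52, 55, 52, 53, 50, 54, 54, 53]
t, df = paired_t(before, after)
print(f"|t| = {abs(t):.2f} with {df} d.f.")
print("reject H0" if abs(t) > 1.90 else "accept H0")  # one tail, t0.05 = 1.90
```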
Example 5. A certain drug administered to 10 patients showed the following additional hours of sleep :
1.0, 0.5, 2.7, 0.6, 1.2, 1.8, 1.6, 3.5, 0.2, 1.7
Can it be concluded that the drug does produce additional hours of sleep ?
Solution. Here, the di are given as
di = xi − yi = 1.0, 0.5, 2.7, 0.6, 1.2, 1.8, 1.6, 3.5, 0.2, 1.7 ; n = 10
$$\bar{d} = \frac{\sum d_i}{n} = \frac{8.2}{10} = 0.82$$
$$\sum d_i^2 = 1 + 0.25 + 7.29 + 0.36 + 1.44 + 3.24 + 2.56 + 12.25 + 0.04 + 2.89 = 31.32$$
Null hypothesis H0 : $\mu_1 = \mu_2$, i.e. the drug does not produce any additional hours of sleep.
Alternative hypothesis H1 : $\mu_1 > \mu_2$, i.e. the drug is effective (one tail).
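Under H0, the test statistic is
$$t = \frac{\bar{d}}{S/\sqrt{n}}$$
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum d_i^2 - n\bar{d}^2\right] = \frac{1}{10 - 1}\left[31.32 - 10 \times (0.82)^2\right] = \frac{1}{9}[31.32 - 6.724] = 2.733, \qquad S = 1.653$$
$$t = \frac{0.82 \times \sqrt{10}}{1.653} = \frac{2.593}{1.653} = 1.57$$
Tabulated t0.05 for 9 degrees of freedom for a one-tailed test is 1.83. Since the calculated value of t is less than 1.83, H0 may be accepted at 5% level of significance, i.e. the data do not show that the drug produces additional hours of sleep.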
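When only the totals Σdi and Σdi² are at hand, as in Examples 4 and 5, the shortcut S² = [Σdi² − n d̄²]/(n − 1) gives the statistic directly. A minimal Python sketch; the function name `paired_t_from_sums` is illustrative.

```python
import math


def paired_t_from_sums(sum_d, sum_d2, n):
    """Paired t from the totals sum(d_i) and sum(d_i^2).

    Uses S^2 = [sum d_i^2 - n * dbar^2] / (n - 1), the shortcut form in the text.
    Returns (t, df) with df = n - 1.
    """
    dbar = sum_d / n
    S = math.sqrt((sum_d2 - n * dbar ** 2) / (n - 1))
    return dbar / (S / math.sqrt(n)), n - 1


t5, df5 = paired_t_from_sums(sum_d=8.2, sum_d2=31.32, n=10)   # Example 5
t4, df4 = paired_t_from_sums(sum_d=-16, sum_d2=44, n=8)       # Example 4
print(f"Example 5: t = {t5:.2f} with {df5} d.f.")
print(f"Example 4: t = {t4:.2f} with {df4} d.f.")
```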
EXERCISE 5.1
1. A brand of matches is sold in boxes on which it is claimed that the average contents are
40 matches. A check on a pack of 5 boxes gives the following results :
41, 39, 37, 40, 38
(i) Test the manufacturer's claim, keeping the interests of both the manufacturer and the customer in mind.
(ii) As a customer, test the manufacturer's claim.
2. A sample of size 10 drawn from a normal population has a mean 31 and a variance 2.25.
Is it reasonable to assume that the mean of the population is 30 ? (Use 1% level of
significance).
3. A random sample of size 10 from a normal population with mean $\mu$ gives a sample mean of 40 and a sample standard deviation of 6. Test the hypothesis that $\mu = 44$ against $\mu \neq 44$ at 5% level of significance.
4. A drug manufacturer wants to market a new drug only if he can be quite sure that the mean temperature of a healthy person taking the drug will not rise above 98.6°F; otherwise he will withhold the drug. The drug is administered to a random sample of 17 healthy persons. The mean temperature was found to be 98.4°F with a standard deviation of 0.6°F. Assuming that the distribution of the temperature is normal and $\alpha = 0.01$, what should the manufacturer do?
5. The marks of students in two groups were obtained as
I 18 20 36 50 49 36 34 49 41
II 29 28 26 35 30 44 46
Drug A 8 12 13 9 3
Drug B 10 8 12 15 6 8 11
Do the two drugs differ significantly with regard to their effect in increasing weight.
(Given t0.05 = 2.23 for 10 degrees of freedom)
7. The mean life of a sample of 10 electric light bulbs was found to be 1456 hours with
standard deviation of 423 hours. A second sample of 17 bulbs chosen from a different
batch showed a mean life of 1280 hours with standard deviation of 398 hours. Is there a
significant difference between the means of the two batches ?
(Given t0.05 = 2.06 for 25 degrees of freedom)
8. To verify whether a course in Statistics improved performance, a similar test was given
to 12 participants both before and after the course. The original marks recorded in
alphabetical order of the participants were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and
72. After the course, the marks were in the same order 53, 38, 69, 57, 46, 39, 73, 48, 73,
74, 60 and 78. Was the course useful ?
(Given t0.05 = 2.201 for 11 degrees of freedom)
9. A certain medicine given to each of 9 patients resulted in the following increases in blood pressure. Can it be concluded that the medicine will in general be accompanied by an increase in blood pressure?
7, 3, 1, 4, 3, 5, 6, 4, 1
(Given t0.05 = 2.306 for 8 degrees of freedom)
Answers
1. (i) Accept the manufacturer's claim (ii) The manufacturer's claim is justified.
2. Yes 3. Accept null hypothesis
4. The manufacturer should market the drug 5. Two groups are identical
6. No 7. No 8. Yes 9. No
5.12. F-TEST
This test uses the variance ratio to test the significance of the difference between two sample variances. The F-test, which is based on the F-distribution, is so called in honour of the great statistician Prof. R.A. Fisher.
Let x1, x2, ......, xn1 and y1, y2, ......, yn2 be the values of two independent random samples drawn from the same normal population with variance $\sigma^2$. Then we define the variance ratio F as follows :
$$F = \frac{S_1^2}{S_2^2}, \qquad S_1^2 > S_2^2,$$
where
$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_i - \bar{x})^2, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(y_i - \bar{y})^2.$$
5.13. PROPERTIES OF F-DISTRIBUTION
(i) The value of F cannot be negative, as both terms of the F-ratio are squared values.
(ii) The range of the values of F is from 0 to ∞.
(iii) The F-distribution is independent of the population variance $\sigma^2$ and depends on $\nu_1$ and $\nu_2$ only.
The F-distribution for various degrees of freedom $\nu_1$ and $\nu_2$ is given in the following table :
Table : Values of F for 5% and 1% levels, where $\nu_1$ is the number of degrees of freedom for the greater estimate of variance and $\nu_2$ for the smaller estimate of variance.
(i) Set up the null hypothesis H0 : $\sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e. the independent estimates of the common population variance do not differ significantly.
(ii) Find the degrees of freedom $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
(iii) Calculate the variances of the two samples and then calculate F.
(iv) From the F-distribution table, note the value of F for ($\nu_1$, $\nu_2$) degrees of freedom at the desired level of significance.
(v) Compare the calculated value of F with the tabulated value of F at the desired level of significance. If the calculated value of F is less than the tabulated value, the difference is not significant and we may conclude that the samples could have come from two populations with the same variance, i.e., accept H0; otherwise reject H0.
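Steps (ii) and (iii) can be sketched in Python when the data come as sums of squared deviations, as in Examples 1 and 3 below. The function name `f_ratio` is illustrative; putting the larger estimate in the numerator follows the convention $S_1^2 > S_2^2$ above. Step (iv), the table look-up, is left to the reader.

```python
def f_ratio(ss1, n1, ss2, n2):
    """Variance-ratio F from sums of squared deviations.

    Each variance estimate is S^2 = ss / (n - 1); the larger estimate goes
    in the numerator and the degrees of freedom follow it.
    Returns (F, nu1, nu2).
    """
    s1, s2 = ss1 / (n1 - 1), ss2 / (n2 - 1)
    if s1 >= s2:
        return s1 / s2, n1 - 1, n2 - 1
    return s2 / s1, n2 - 1, n1 - 1


F1, a1, b1 = f_ratio(84.4, 8, 102.6, 10)   # Example 1
F3, a3, b3 = f_ratio(160, 9, 91, 8)        # Example 3
print(f"Example 1: F = {F1:.3f} with ({a1}, {b1}) d.f.")
print(f"Example 3: F = {F3:.3f} with ({a3}, {b3}) d.f.")
```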
The available F-tables give the critical values of F for the right-tailed test, i.e. the critical region is determined by the right-tail area. Thus, the significant value $F_\alpha(\nu_1, \nu_2)$ at level of significance $\alpha$ and ($\nu_1$, $\nu_2$) degrees of freedom is determined by $P[F > F_\alpha(\nu_1, \nu_2)] = \alpha$, as shown below :
[Figure: density curve P(F), with the acceptance region (area 1 − α) to the left of the critical value Fα(ν1, ν2) and the rejection region (area α) to the right of it.]
SOLVED EXAMPLES
Example 1. In one sample of size 8, the sum of the squares of deviations of the sample values from the sample mean is 84.4, and in another sample of size 10 it is 102.6. Test whether this difference is significant at 5% level. Given that for $\nu_1 = 7$ and $\nu_2 = 9$, F0.05 = 3.29.
Solution. Here, n1 = 8, n2 = 10
and $\sum (x - \bar{x})^2 = 84.4$, $\sum (y - \bar{y})^2 = 102.6$
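$$S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{1}{7} \times 84.4 = 12.057$$
$$S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{1}{9} \times 102.6 = 11.4$$
Under H0 : $\sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e. the estimates of $\sigma^2$ given by the samples are homogeneous,
$$F = \frac{S_1^2}{S_2^2} = \frac{12.057}{11.4} = 1.057$$
Since the calculated value of F is less than F0.05 = 3.29, it is not significant. Hence, H0 may be accepted.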
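Example 2.
Sample   Size   Sample mean   Sum of squares of deviations from mean
  1       10        15                       90
  2       12        14                      108
Test whether the samples have been drawn from the same normal population.
Given that for $\nu_1 = 9$ and $\nu_2 = 11$, F0.05 = 2.90 (approx.).
Solution. Here, n1 = 10, n2 = 12, $\bar{x} = 15$, $\bar{y} = 14$,
$\sum (x - \bar{x})^2 = 90$ ; $\sum (y - \bar{y})^2 = 108$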
$$S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{90}{9} = 10$$
$$S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{108}{11} = 9.82$$
Under H0 : $\sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e. the two samples have been drawn from the same normal population,
$$F = \frac{S_1^2}{S_2^2} = \frac{10}{9.82} = 1.018$$
For $\nu_1 = 9$ and $\nu_2 = 11$, we have F0.05 = 2.90.
Since the calculated value of F is less than F0.05, it is not significant. Hence, the null hypothesis H0 may be accepted.
Example 3. Two samples of sizes 9 and 8 give sums of squares of deviations from their respective means equal to 160 and 91 square units respectively. Test whether the samples have been drawn from the same normal population. Given that for $\nu_1 = 8$ and $\nu_2 = 7$, F0.05 = 3.73.
Solution. Here, n1 = 9, n2 = 8, $\sum (x - \bar{x})^2 = 160$, $\sum (y - \bar{y})^2 = 91$
$$S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{1}{8} \times 160 = 20$$
$$S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{1}{7} \times 91 = 13$$
$$F = \frac{S_1^2}{S_2^2} = \frac{20}{13} = 1.54 \text{ (approx.)}$$
For $\nu_1 = 8$ and $\nu_2 = 7$, we have F0.05 = 3.73.
Since the calculated value of F is less than F0.05, it is not significant. Hence, H0 may be accepted.
Example 4. Two samples are drawn from two normal populations. From the following data, test whether the two samples have the same variance at 5% level of significance.
Sample I    60  65  71  74  76  82  85  87
Sample II   61  66  67  85  78  88  86  85  63  91
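Solution. Here, n1 = 8, n2 = 10.
H0 : $\sigma_1^2 = \sigma_2^2$, i.e. the two samples have the same variance.
H1 : $\sigma_1^2 \neq \sigma_2^2$
        Sample I                        Sample II
  x    x − x̄   (x − x̄)²          y    y − ȳ   (y − ȳ)²
  60    −15      225             61    −16      256
  65    −10      100             66    −11      121
  71     −4       16             67    −10      100
  74     −1        1             85      8       64
  76      1        1             78      1        1
  82      7       49             88     11      121
  85     10      100             86      9       81
  87     12      144             85      8       64
                                 63    −14      196
                                 91     14      196
Σx = 600         636            Σy = 770         1200
$$\bar{x} = \frac{\sum x}{n_1} = \frac{600}{8} = 75, \qquad \bar{y} = \frac{\sum y}{n_2} = \frac{770}{10} = 77$$
$$\text{Variance of sample I} = S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{636}{8 - 1} = 90.857$$
$$\text{Variance of sample II} = S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{1200}{10 - 1} = 133.33$$
$$F = \frac{S_2^2}{S_1^2} = \frac{133.33}{90.857} = 1.467$$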
Since the larger variance (sample II, n2 = 10) is in the numerator, $\nu_1 = 9$ and $\nu_2 = 7$, for which F0.05 = 3.68.
Since the calculated value of F is less than F0.05, H0 may be accepted, i.e. samples I and II have the same variance.
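For raw data, the variance estimates can come straight from `statistics.variance` (divisor n − 1), with the larger one placed in the numerator so that the degrees of freedom follow it. A minimal sketch; the name `f_test_raw` is illustrative.

```python
import statistics


def f_test_raw(sample1, sample2):
    """Return (F, nu1, nu2) with the larger sample variance (divisor n - 1) on top."""
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    n1, n2 = len(sample1), len(sample2)
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1


sample_1 = [60, 65, 71, 74, 76, 82, 85, 87]
sample_2 = [61, 66, 67, 85, 78, 88, 86, 85, 63, 91]
F, nu1, nu2 = f_test_raw(sample_1, sample_2)
print(f"F = {F:.3f} with ({nu1}, {nu2}) d.f.")
```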
1. In a sample of 8 observations, the sum of squared deviations of items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference is significant at 5% level.
2. The following are the values in thousandths of an inch obtained by two engineers in 10 successive measurements with the same micrometer. Is one engineer significantly more consistent than the other?
Engineer A 503 505 497 505 495 502 499 493 510 501
Engineer B 502 497 492 498 499 495 497 496 498
3. The nicotine contents (in milligrams) of two samples of tobacco were found to be as follows :
Sample A 24 27 26 21 25
Sample B 27 30 28 31 22 36
Can it be said that the two samples come from the same normal population ?
4. The daily wages in ₹ of skilled workers in two cities are as follows :
A 16 25
B 13 32
Test at 5% level of significance the equality of variances of the wage distribution in the
two cities.
5. The time taken by workers in performing a job by methods I and II is given below :
Method I 20 16 26 27 23 22
Method II 27 33 42 35 32 34 38
Do the data show that the variances of the time distributions in the populations from which these samples are drawn do not differ significantly?
6. Two random samples drawn from two normal populations are given below :
Sample I 63 65 68 69 71 72
Sample II 63 62 65 66 69 69 70 71 72 73
Test whether the two populations have the same variance at 5% level of significance.
Answers
1. No 2. Not significant 3. yes
4. Accepted 5. Not significant 6. Yes.
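$$V(p) = V\left(\frac{X}{n}\right) = \frac{1}{n^2}V(X) = \frac{nPQ}{n^2} = \frac{PQ}{n}, \qquad \text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$
$$Z = \frac{p - E(p)}{\text{S.E.}(p)} = \frac{p - P}{\sqrt{PQ/n}} \sim N(0, 1)$$
where E denotes the expected value, V the variance and S.E. the standard error.
Z is called a test statistic; it is used to test the significance of the difference between the sample proportion and the population proportion.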
Note 1. The probable limits for the observed proportion of success are $E(p) \pm Z_\alpha\sqrt{V(p)}$, i.e., $P \pm Z_\alpha\sqrt{\dfrac{PQ}{n}}$, where $Z_\alpha$ is the significant value at the level of significance $\alpha$.
2. If P is not known, then the probable limits for the proportion in the population are $p \pm Z_\alpha\sqrt{\dfrac{pq}{n}}$.
3. If $\alpha$ is not given, then we can use 3σ limits. Hence, the probable limits for the observed proportion of success are $P \pm 3\sqrt{\dfrac{PQ}{n}}$, and the probable limits for the proportion in the population are $p \pm 3\sqrt{\dfrac{pq}{n}}$.
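The single-proportion statistic and the 3σ limits of Note 3 can be sketched in Python as below. The function names are illustrative; the usage lines take figures from the solved examples that follow (175 heads in 324 tosses, and 65 bad apples out of 500).

```python
import math


def proportion_z(x, n, P):
    """Z = (p - P) / sqrt(P * Q / n) for an observed count x out of n, with Q = 1 - P."""
    p = x / n
    return (p - P) / math.sqrt(P * (1 - P) / n)


def probable_limits(p, n, z=3.0):
    """p +/- z * sqrt(p * q / n); z = 3 gives the 3-sigma limits of Note 3."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


print(f"coin: Z = {proportion_z(175, 324, 0.5):.2f}")
low, high = probable_limits(65 / 500, 500)
print(f"bad apples: {low:.3f} to {high:.3f}")
```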
4. A set of four selected values is commonly used for $\alpha$. Each $\alpha$ and the corresponding $Z_\alpha$ and $Z_{\alpha/2}$ values are given in the following table :
To test the significance of the difference between the sample proportions p1 and p2, we set up the null hypothesis H0 that there is no significant difference between the two sample proportions.
Under the null hypothesis H0, the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where} \quad P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \quad \text{and} \quad Q = 1 - P.$$
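The pooled two-proportion statistic can be sketched in Python as follows. Passing the counts x1 and x2 (so that p1 = x1/n1 and n1 p1 = x1) is an illustrative choice, as is the function name; the usage reproduces Example 6 (400 of 500 and 400 of 600) and Example 8 (2% of 500 is 10, 1.5% of 800 is 12) below.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z for H0: P1 = P2, using the pooled P = (n1 p1 + n2 p2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)  # n1 * p1 = x1 and n2 * p2 = x2
    se = math.sqrt(P * (1 - P) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z6 = two_proportion_z(400, 500, 400, 600)  # tea drinkers before / after the duty rise
z8 = two_proportion_z(10, 500, 12, 800)    # defectives from the two factories
print(f"Example 6: Z = {z6:.2f}")
print(f"Example 8: Z = {z8:.3f}")
```

Example 8 gives about 0.680 here against 0.678 in the text, because the text rounds P to 0.017.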
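SOLVED EXAMPLES
Example 1. A coin is tossed 324 times and the head turns up 175 times. Test the hypothesis that the coin is unbiased.
Solution. Null hypothesis H0 : the coin is unbiased, i.e., P = 1/2.
Here, n = 324, X = number of heads = 175,
P = prob. of getting a head in a toss = 1/2, Q = 1 − P = 1 − 1/2 = 1/2.
$$Z = \frac{X - nP}{\sqrt{nPQ}} = \frac{175 - 324 \times \frac{1}{2}}{\sqrt{324 \times \frac{1}{2} \times \frac{1}{2}}} = \frac{175 - 162}{9} = 1.44$$
Since |Z| = 1.44 < 1.96, H0 is accepted at 5% level of significance, i.e., the coin may be regarded as unbiased.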
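Example 2. Here n = 1000, X = 420 and, under H0 : the die is unbiased, P = 1/3 and Q = 2/3.
$$Z = \frac{X - nP}{\sqrt{nPQ}} = \frac{420 - 1000 \times \frac{1}{3}}{\sqrt{1000 \times \frac{1}{3} \times \frac{2}{3}}} = \frac{420 - 333.33}{\sqrt{222.222}} = \frac{86.67}{14.91} = 5.813$$
Since |Z| = 5.813 > 3 (the 3σ limit), H0 is rejected, i.e., the die is biased.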
Example 3. A manufacturer claims that only 4% of the products supplied by him are defective. A random sample of 600 products contained 36 defectives. Test the claim of the manufacturer.
Solution. Here p = sample proportion of defectives = 36/600 = 0.06,
P = proportion of defectives in the population = 4/100 = 0.04,
Q = 1 − P = 1 − 0.04 = 0.96, n = 600.
Null hypothesis H0 : P = 0.04 is true, i.e., the claim of the manufacturer is right.
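The text stops after stating H0, so the following Python sketch only computes the statistic. It assumes a right-tailed alternative H1 : P > 0.04 (not stated in the text) and borrows the one-tail significant values 1.645 and 2.33 quoted in Example 5.

```python
import math

# Example 3 figures. H1: P > 0.04 (right tail) is an assumption; the text does
# not state the alternative hypothesis.
n, defectives, P = 600, 36, 0.04
p = defectives / n
se = math.sqrt(P * (1 - P) / n)   # S.E. of the proportion under H0
z = (p - P) / se
print(f"S.E. = {se:.4f}, Z = {z:.2f}")
for level, z_crit in [("5%", 1.645), ("1%", 2.33)]:  # one-tail values from Example 5
    print(level, "reject H0" if z > z_crit else "accept H0")
```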
p = proportion of bad apples in the sample = 65/500 = 0.13 and
q = 1 − p = 1 − 0.13 = 0.87.
Since the proportion of bad apples P in the population is not known, we take P = p = 0.13, Q = q = 0.87 and n = 500.
$$\text{S.E. of proportion} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.13 \times 0.87}{500}} = 0.015$$
Limits for the proportion of bad apples in the population are
$$P \pm 3\sqrt{\frac{PQ}{n}} = 0.13 \pm 3\sqrt{\frac{0.13 \times 0.87}{500}} = 0.13 \pm 0.045 = 0.175 \text{ and } 0.085,$$
i.e., 17.5% and 8.5%.
Example 5. A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specifications. An examination of a sample of 300 pieces of equipment revealed that 27 were faulty. Test his claim at a significance level of (i) 5% (ii) 1%.
Solution. Here, n = 300 and
X = number of pieces of equipment conforming to specifications in the sample = 300 − 27 = 273,
p = sample proportion conforming to specifications = 273/300 = 0.91.
Null hypothesis H0 : P = 0.95 (the proportion of equipment conforming to specifications in the population is 95%), Q = 1 − P = 0.05.
Alternative hypothesis H1 : P < 0.95 (fewer than 95% conform to specifications; left-tailed test)
Under H0 the test statistic is
$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.91 - 0.95}{\sqrt{0.95 \times 0.05/300}} = \frac{-0.04}{0.0126} = -3.175, \qquad |Z| = 3.175$$
(i) Since H1 is one-tailed, the significant value of Z at 5% level of significance is 1.645. Now |Z| = 3.175 > 1.645, so H0 is rejected, i.e., the manufacturer's claim is not acceptable.
(ii) The significant value of Z at 1% level of significance for one tail is 2.33. Now |Z| = 3.175 > 2.33, so H0 is rejected, i.e., the manufacturer's claim is not acceptable.
Example 6. Before an increase in excise duty on tea, 400 people out of a sample of 500 persons were found to be tea drinkers. After an increase in the excise duty, 400 persons were found to be tea drinkers in a sample of 600 people. Do you think that there has been a significant decrease in the consumption of tea after the increase in the excise duty?
Solution. Here n1 = 500, n2 = 600,
X1 = 400, X2 = 400.
p1 = proportion of drinkers in first sample = 400/500 = 4/5 = 0.8
p2 = proportion of drinkers in second sample = 400/600 = 2/3 = 0.67
Since the proportion P of the population is not given, it can be estimated by using
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{400 + 400}{500 + 600} = \frac{800}{1100} = \frac{8}{11}, \qquad Q = 1 - P = 1 - \frac{8}{11} = \frac{3}{11}$$
Null hypothesis H0 : P1 = P2 (there is no significant difference in the consumption of tea before and after the increase of excise duty)
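Alternative hypothesis H1 : P1 > P2 (right-tailed test). Under H0 the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.8 - 0.667}{\sqrt{\dfrac{8}{11} \times \dfrac{3}{11}\left(\dfrac{1}{500} + \dfrac{1}{600}\right)}} = \frac{0.1333}{0.02697} = 4.94$$
Since Z = 4.94 > 1.645, the significant value of Z at 5% level for a right-tailed test, H0 is rejected, i.e., there has been a significant decrease in the consumption of tea after the increase in the excise duty.
Example 7. Here p1 = 0.0125 (n1 = 400), p2 = 0.0083 (n2 = 1200), P = 0.01 and Q = 0.99. Under H0 the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.0125 - 0.0083}{\sqrt{0.01 \times 0.99\left(\dfrac{1}{400} + \dfrac{1}{1200}\right)}} = \frac{0.0042}{0.00574} = 0.732$$
Since |Z| = 0.732 < 1.96, the null hypothesis is accepted at 5% level of significance. Hence the difference is not significant.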
Example 8. 500 articles from a factory are examined and found to be 2% defective. 800 similar articles from a second factory are found to have only 1.5% defectives. Can it reasonably be concluded that the products of the first factory are inferior to those of the second?
Solution. Here n1 = 500,
p1 = proportion of defectives from first factory = 2/100 = 0.02,
n2 = 800,
p2 = proportion of defectives from second factory = 1.5/100 = 0.015.
Since the proportion P of the population is not given, it can be estimated by using
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{10 + 12}{500 + 800} = \frac{22}{1300} = 0.017, \qquad Q = 1 - P = 1 - 0.017 = 0.983$$
Null hypothesis H0 : P1 = P2 (there is no significant difference between the products of the first and second factory)
Alternative hypothesis H1 : P1 ≠ P2 (two-tailed test)
Under H0 the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.02 - 0.015}{\sqrt{0.017 \times 0.983\left(\dfrac{1}{500} + \dfrac{1}{800}\right)}} = \frac{0.005}{0.00737} = 0.678$$
Since |Z| = 0.678 < 1.96, the null hypothesis is accepted at 5% level of significance. Hence there is no significant difference between the products of the first and second factory, i.e., the products of the first factory are not inferior to those of the second.
Example 9. A manufacturing firm claims that its brand A product outsells its brand B product by 8%. It is found that 84 out of a sample of 400 persons prefer brand A and 36 out of another sample of 200 persons prefer brand B. Test whether the 8% difference is a valid claim.
Solution. Here, n1 = 400, n2 = 200,
p1 = proportion of preference for brand A = 84/400 = 0.21,
p2 = proportion of preference for brand B = 36/200 = 0.18,
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{84 + 36}{400 + 200} = \frac{120}{600} = 0.2, \qquad Q = 1 - P = 1 - 0.2 = 0.8$$
Null hypothesis H0 : there is an 8% difference in the sales of brand A and brand B, i.e., P1 − P2 = 0.08
Alternative hypothesis H1 : P1 − P2 ≠ 0.08 (two-tailed test)
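The text stops before computing the statistic. A hedged Python sketch of one common way to finish: subtract the claimed difference 0.08 in the numerator and keep the pooled P = 0.2 in the standard error. Both choices are assumptions, not steps shown in the text.

```python
import math

# Example 9 figures. Subtracting the claimed difference in the numerator and
# using the pooled P for the standard error are assumptions of this sketch.
n1, x1, n2, x2, claimed = 400, 84, 200, 36, 0.08
p1, p2 = x1 / n1, x2 / n2
P = (x1 + x2) / (n1 + n2)
se = math.sqrt(P * (1 - P) * (1 / n1 + 1 / n2))
z = (p1 - p2 - claimed) / se
print(f"Z = {z:.2f}")
print("reject H0" if abs(z) > 1.96 else "accept H0")  # two tails, 5% level
```

Under these assumptions |Z| is about 1.44, below 1.96, which agrees with the answer "H0 is accepted" listed for this example in the answers to Exercise 5.3.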
EXERCISE 5.3
1. A coin was tossed 400 times and the head turned up 216 times. Test the hypothesis that
the coin is unbiased.
2. In a hospital, 525 female and 475 male babies were born in a month. Do these figures confirm the hypothesis that females and males are born in equal numbers?
3. A die is thrown 10000 times and a throw of 3 or 4 was obtained 4200 times. On the assumption of random throwing, do the data indicate an unbiased die?
4. Given that on average 4% of insured men of age 65 die within a year, and that 60 of a particular group of 1000 such men (age 65) died within a year, can this group be regarded as a representative sample?
5. 325 men out of 600 men chosen from a big city were found to be smokers. Does this
information support the conclusion that the majority of men in the city are smokers ?
6. A random sample of 400 apples is taken from a large basket and 40 are found to be bad.
Estimate the proportion of bad apples in the basket and assign limits within which the
percentage most probably lies.
Answers
1. H0 is accepted at 5% level of significance
2. Yes, H0 is accepted at 5% level of significance
3. H0 is rejected 4. H0 is rejected
5. H0 is rejected at 5% level of significance 6. 8.5 : 11.5
7. Using left tailed test, H0 is rejected at both 5% and 1% level of significance
8. No, H0 is accepted 9. H0 is accepted
10. H0 is accepted 11. H0 is rejected at 5% level of significance
12. H0 is rejected.
This test is used to test the significance of the difference between a sample mean and the population mean.
Let X1, X2, ..., Xn be a random sample of size n from a normal population with mean $\mu$ and variance $\sigma^2$.
The standard error (S.E.) of the mean of a random sample of size n from a population is given by
$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}},$$
where $\sigma$ is the standard deviation of the population.
We set up the null hypothesis H0 that the sample has been drawn from a large population with mean $\mu$ and variance $\sigma^2$, i.e., there is no significant difference between the sample mean ($\bar{x}$) and the population mean ($\mu$).
Under the null hypothesis H0 the test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
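A minimal Python sketch of the large-sample statistic and of the corresponding 95% limits $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$. The function names are illustrative, and replacing $\sigma$ by the sample s for large n is the usual large-sample approximation, assumed here. The usage takes questions 5 and 3 of Exercise 5.4 below.

```python
import math


def mean_z(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n)); for large n, sigma may be replaced by s."""
    return (xbar - mu) / (sigma / math.sqrt(n))


def ci95(xbar, s, n):
    """Approximate 95% limits xbar +/- 1.96 * s / sqrt(n) for a large sample."""
    half = 1.96 * s / math.sqrt(n)
    return xbar - half, xbar + half


# Exercise 5.4, question 5: n = 100, xbar = 1570, mu = 1600, s = 120
print(f"Z = {mean_z(1570, 1600, 120, 100):.2f}")
# Exercise 5.4, question 3: n = 200, xbar = 50, s = 9
lo, hi = ci95(50, 9, 200)
print(f"95% limits: {lo:.2f} and {hi:.2f}")
```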
SOLVED EXAMPLES
$$1.96 = \frac{1.2}{14.8/\sqrt{n}} = \frac{1.2\sqrt{n}}{14.8}$$
On squaring both sides, we have
$$n = \frac{(1.96)^2 \times (14.8)^2}{(1.2)^2} = 584.35 \approx 584.$$
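The same algebra, solved for n, in a small Python sketch; the name `required_n` and its parameter names are illustrative, and the values 1.96, 14.8 and 1.2 are those in the working above.

```python
def required_n(z, sigma, error):
    """Sample size from z = error / (sigma / sqrt(n)), i.e. n = (z * sigma / error) ** 2."""
    return (z * sigma / error) ** 2


n = required_n(1.96, 14.8, 1.2)
print(f"n = {n:.2f}")  # the text rounds this to 584
```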
EXERCISE 5.4
1. A random sample of 900 members has a mean 3.4 cms. Can it be reasonably regarded as
a sample from a large population of mean 3.2 cms and standard deviation 2.3 cms ?
2. A random sample of 400 male students is found to have a mean height of 160 cms. Can
it be reasonably regarded as a sample from a large population with mean height 162.5 cms
and standard deviation 4.5 cms ?
3. A random sample of 200 measurements from a large population gave a mean value of 50
and a standard deviation of 9. Determine 95% confidence interval for the mean of
population.
4. A random sample of 400 measurements from a large population gave a mean value of 82
and a standard deviation of 18. Determine 95% confidence interval for the mean of
population.
5. A company manufacturing electric bulbs claims that the average life of its bulbs is
1600 hours. The average life and standard deviation of random sample of 100 such bulbs
were 1570 hours and 120 hours respectively. Should we accept the claim of the company ?
Answers
1. Yes, H0 is accepted 2. Yes, H0 is accepted
3. 48.8 and 51.2 4. 80.24 and 83.76
5. No, rejected at 5% level of significance 6. Claim is valid
7. n=4
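(i) Means. This test is used to test the significance of the difference between the means of two large samples.
Let $\bar{x}_1$ be the mean of a sample of size n1 from a population with mean $\mu_1$ and variance $\sigma_1^2$, and let $\bar{x}_2$ be the mean of an independent sample of size n2 from another population with mean $\mu_2$ and variance $\sigma_2^2$.
We set up the null hypothesis H0 that there is no significant difference between the sample means, i.e., $\mu_1 = \mu_2$.
Under the null hypothesis H0 the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
If the samples are drawn from the same population with common standard deviation $\sigma$, then under the null hypothesis the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad (\because \sigma_1 = \sigma_2 = \sigma)$$
Note. 1. If $\sigma_1 \neq \sigma_2$ and $\sigma_1$, $\sigma_2$ are not known, the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$
2. If the common standard deviation $\sigma$ is not known and $\sigma_1 = \sigma_2$, then $\sigma$ can be obtained by using
$$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}.$$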
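Both forms of the two-sample statistic can be sketched in Python; the function names are illustrative, and the usage takes figures from the solved examples that follow (means 582 and 546 with s = 24 and 28, n = 100 each; means 140 and 150 with common σ = 10, n = 50 and 60).

```python
import math


def two_mean_z(x1, x2, sd1, sd2, n1, n2):
    """Z = (x1 - x2) / sqrt(sd1^2 / n1 + sd2^2 / n2).

    Pass the population SDs when known; otherwise the sample SDs s1, s2 (Note 1).
    """
    return (x1 - x2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def two_mean_z_common(x1, x2, sigma, n1, n2):
    """Z when both samples share one standard deviation sigma."""
    return (x1 - x2) / (sigma * math.sqrt(1 / n1 + 1 / n2))


z_a = two_mean_z(582, 546, 24, 28, 100, 100)
z_b = two_mean_z_common(140, 150, 10, 50, 60)
print(f"{z_a:.3f}  {z_b:.2f}")
```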
(ii) Standard Deviations. This test is used to test the significance of the difference between the standard deviations of two populations.
Let two independent random samples of sizes n1 and n2, having standard deviations s1 and s2, be drawn from two normal populations with standard deviations $\sigma_1$ and $\sigma_2$ respectively.
We set up the null hypothesis H0 that the sample standard deviations do not differ significantly, i.e., $\sigma_1 = \sigma_2$.
Under the null hypothesis H0 the test statistic is
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}}$$
If $\sigma_1$ and $\sigma_2$ are unknown, then the test statistic is
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}}.$$
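A short Python sketch of the second form (unknown σ1, σ2), using s = 24 and 28 with n = 100 each from the solved examples below; the name `sd_z` is illustrative.

```python
import math


def sd_z(s1, s2, n1, n2):
    """Large-sample Z for two standard deviations with unknown sigmas:
    (s1 - s2) / sqrt(s1^2 / (2 n1) + s2^2 / (2 n2))."""
    return (s1 - s2) / math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))


z = sd_z(24, 28, 100, 100)
print(f"Z = {z:.2f}")
print("significant" if abs(z) > 1.96 else "not significant")  # 5% level, two tails
```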
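SOLVED EXAMPLES
Sample   Size   Mean
  1       50     140
  2       60     150
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{140 - 150}{10\sqrt{\dfrac{1}{50} + \dfrac{1}{60}}} = -\frac{10}{1.915} = -5.22$$
Since |Z| = 5.22 > 3, H0 is rejected. Hence the samples are not drawn from the same normal population.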
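$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{70 - 75}{\sqrt{\dfrac{10^2}{70} + \dfrac{11^2}{100}}} = -\frac{5}{\sqrt{2.639}} = -\frac{5}{1.624} = -3.08$$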
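$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{582 - 546}{\sqrt{\dfrac{(24)^2}{100} + \dfrac{(28)^2}{100}}} = \frac{36}{3.6878} = 9.762$$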
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{24 - 28}{\sqrt{\dfrac{(24)^2}{200} + \dfrac{(28)^2}{200}}} = -\frac{4}{2.6077} = -1.53$$
1. The number of accidents per day was studied for 144 days in city A and for 100 days in city B. The mean numbers of accidents and standard deviations were respectively 4.5 and 1.2 for city A and 5.4 and 1.5 for city B. Is city A more prone to accidents than city B?
2. The mean yields of a crop from two places in a district were 210 kgs and 220 kgs per acre from 100 acres and 150 acres respectively. Can it be regarded that the samples were drawn from the same district, which has a standard deviation of 11 kgs per acre?
3. Given the following data :
Girls 75 8 60
Boys 73 10 100
Examine whether
(i) the difference in the variability in yields is significant,
(ii) the difference in the mean yields is significant.
Answers
1. No 2. No 3. Yes, highly significant
6. NON-PARAMETRIC TESTS
STRUCTURE
Chi-square Test
Chi-square Test to Test the Goodness of Fit
Chi-square Test to Test the Independence of Attributes
Conditions for χ² Test
Uses of χ² Test
Correlation Analysis
Scatter or Dot Diagram
Characteristics of the coefficient of Correlation r
Spearman's Rank Correlation
The value of χ² is used to test whether the deviations of the observed (actual) frequencies from the theoretical (expected) frequencies are significant or not. The chi-square test is also used to test whether a set of observations fits a given distribution or not. Therefore, chi-square provides a test of goodness of fit.
$$\chi^2 = \sum_{i=1}^{n}\left\{\frac{(O_i - E_i)^2}{E_i}\right\}$$
is distributed with (n − 1) degrees of freedom.
Here, we test the null hypothesis
H0 : There is no significant difference between the observed (actual) values and the corresponding expected (theoretical) values,
vs. H1 : H0 is not true.
If $\chi^2_{cal} > \chi^2_{tab}$ (i.e. $\chi^2_{\alpha,\,n-1}$), then H0 is rejected; otherwise H0 is accepted.
Note. If the null hypothesis H0 is true, the test statistic χ² follows the chi-square distribution with (n − 1) degrees of freedom, where
$$\sum_{i=1}^{n} O_i = \sum_{i=1}^{n} E_i, \quad \text{i.e.} \quad \sum_{i=1}^{n} (O_i - E_i) = 0.$$
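The goodness-of-fit statistic is a one-line sum; the Python sketch below (function name illustrative) checks it on the accident counts of Example 1 that follows, where under uniformity each of the 6 days expects 96/6 = 16 accidents.

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


accidents = [16, 20, 14, 13, 17, 16]
uniform = [sum(accidents) / len(accidents)] * len(accidents)  # 16 per day
stat = chi_square(accidents, uniform)
df = len(accidents) - 1
print(f"chi-square = {stat:.3f} with {df} d.f.")
```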
The value of χ² is used to test whether two attributes are associated or not, i.e. to test the independence of attributes. To test the independence of attributes, a contingency table is used.
A contingency table is a two-way table in which the rows are classified according to one attribute or criterion and the columns according to the other attribute or criterion. Each cell contains the number of items $O_{ij}$ possessing the qualities of the ith row and the jth column, where i = 1, 2, ......, r and j = 1, 2, ......, s. In such a case the contingency table is said to be of order (r × s). Each row or column total is known as a marginal total. Also, the sum of the row totals is equal to the sum of the column totals, i.e.
$$\sum_{i=1}^{r} R_i = \sum_{j=1}^{s} C_j = N,$$
where N is the total frequency.
Let us consider two attributes A and B, where A is divided into r classes A1, A2, ......, Ar and B is divided into s classes B1, B2, ......, Bs. If Ri represents the number of persons possessing the attribute Ai and Cj represents the number of persons possessing the attribute Bj, the frequencies can be arranged as follows :
                       Columns
Rows       B1      B2     ......    Bs      Total
A1         O11     O12    ......    O1s     R1
A2         O21     O22    ......    O2s     R2
⋮          ⋮       ⋮                ⋮       ⋮
Ar         Or1     Or2    ......    Ors     Rr
Total      C1      C2     ......    Cs      N
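For a 2 × 2 contingency table with cell frequencies a, b, c, d (N = a + b + c + d), χ² with Yates' correction for continuity is
$$\chi^2 = \frac{N\left(|ad - bc| - \dfrac{N}{2}\right)^2}{(a + b)(b + d)(a + c)(c + d)}.$$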
SOLVED EXAMPLES
Example 1. The following table gives the number of accidents that took place in
an industry during various days of the week. Test whether the accidents are uniformly
distributed over the week.
No. of accidents   16   20   14   13   17   16
Oi                 16   20   14   13   17   16
Ei                 16   16   16   16   16   16
(Oi − Ei)²          0   16    4    9    1    0
$$\chi^2 = \sum_{i=1}^{6}\frac{(O_i - E_i)^2}{E_i} = \frac{0 + 16 + 4 + 9 + 1 + 0}{16} = \frac{30}{16} = 1.875.$$
Tabulated value of χ² for 5 (6 − 1) degrees of freedom at 5% level of significance is 11.07.
Since the calculated value of χ² is less than the tabulated value of χ², H0 is accepted, i.e., the accidents are uniformly distributed over the week.
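Oi                 16   30   22   18   14   20
Ei                 20   20   20   20   20   20
(Oi − Ei)²         16  100    4    4   36    0
$$\chi^2 = \sum_{i=1}^{6}\frac{(O_i - E_i)^2}{E_i} = \frac{16 + 100 + 4 + 4 + 36 + 0}{20} = \frac{160}{20} = 8$$
Tabulated value of χ² for 5 (6 − 1) degrees of freedom at 5% level of significance is 11.07. Since the calculated value of χ² is less than the tabulated value of χ², H0 is accepted, i.e. the die is unbiased.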
Example 3. The following table shows the distribution of digits in numbers
chosen at random from a telephone directory :
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 1026 1107 997 966 1075 933 1107 972 964 853
Test at 5% level whether the digits may be taken to occur equally frequently in
the directory.
Solution. Here, n = 10, total frequency = 10,000.
Null hypothesis H0 : all the digits occur equally frequently in the directory.
Under H0, the expected frequency of each of the digits = 10,000/10 = 1000.
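The observed and expected frequencies are given below :
Oi          1026   1107   997   966   1075   933   1107   972   964    853
Ei          1000   1000  1000  1000   1000  1000   1000  1000  1000   1000
(Oi − Ei)²   676  11449     9  1156   5625  4489  11449   784  1296  21609
$$\chi^2 = \sum_{i=1}^{10}\frac{(O_i - E_i)^2}{E_i} = \frac{676 + 11449 + \cdots + 21609}{1000} = \frac{58542}{1000} = 58.542$$
Tabulated value of χ² for 9 (10 − 1) degrees of freedom at 5% level of significance is 16.92. Since the calculated value of χ² is greater than the tabulated value of χ², H0 is rejected, i.e. the digits do not occur equally frequently in the directory.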
Test whether the data are consistent with the hypothesis that the Binomial law holds and that male and female births are equally probable.
Solution. Null hypothesis H0 : male and female births are equally probable, i.e., p = q = 1/2, where p is the probability of a female birth and q is the probability of a male birth.
The expected frequencies are calculated by using the Binomial distribution as
E(r) = N × P(X = r), r = 0, 1, 2, 3, 4, 5,
where N is the total frequency and E(r) is the number of families with r female children, and
$$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}, \quad \text{where n is the number of children.}$$
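$$E(0) = \text{No. of families with 0 female children} = 320 \times {}^{5}C_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{5} = 320 \times \frac{1}{32} = 10$$
$$E(5) = 320 \times {}^{5}C_{5}\left(\frac{1}{2}\right)^{5}\left(\frac{1}{2}\right)^{0} = 320 \times \frac{1}{32} = 10$$
Oi    14    56    110    88    40    12
Ei    10    50    100   100    50    10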
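The binomial expected frequencies and the resulting χ² can be sketched in Python (3.8+ for `math.comb`); the helper name is illustrative. Because p = 1/2 is fixed by H0 rather than estimated from the data, the statistic would be referred to 6 − 1 = 5 degrees of freedom, for which the tabulated 5% value is 11.07.

```python
import math


def binomial_expected(N, n, p):
    """Expected frequencies E(r) = N * nCr * p^r * q^(n - r), for r = 0..n."""
    q = 1 - p
    return [N * math.comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


observed = [14, 56, 110, 88, 40, 12]   # families with r = 0..5 female children
expected = binomial_expected(sum(observed), 5, 0.5)
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 2) for e in expected])
print(f"chi-square = {stat:.2f}")
```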
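x            0     1     2     3     4     5
Frequency    6    13    13     8     4     3
$$E(3) = 47 \times e^{-2}\,\frac{2^3}{3!} = 8.48$$
$$E(4) = 47 \times e^{-2}\,\frac{2^4}{4!} = 4.24$$
$$E(5) = 47 \times e^{-2}\,\frac{2^5}{5!} = 1.696$$
x             0        1        2        3        4        5
Oi            6       13       13        8        4        3
Ei         6.36    12.72    12.72     8.48     4.24    1.696
(Oi − Ei)² 0.1296  0.0784   0.0784   0.2304   0.0576   1.7004
$$\chi^2 = \sum_{i=0}^{5}\frac{(O_i - E_i)^2}{E_i} = \frac{0.1296}{6.36} + \frac{0.0784}{12.72} + \frac{0.0784}{12.72} + \frac{0.2304}{8.48} + \frac{0.0576}{4.24} + \frac{1.7004}{1.696}$$
$$= 0.02038 + 0.00616 + 0.00616 + 0.02717 + 0.01358 + 1.0026 = 1.07605$$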
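A Python sketch of the Poisson fit. It assumes the parameter 2 used above is the sample mean (94/47 = 2), which the text does not state explicitly, and, like the text, uses P(X = 5) for the last class. When the parameter is estimated from the data, one further degree of freedom is usually subtracted.

```python
import math


def poisson_expected(N, lam, kmax):
    """Expected frequencies E(x) = N * e^(-lam) * lam^x / x!, for x = 0..kmax."""
    return [N * math.exp(-lam) * lam ** x / math.factorial(x) for x in range(kmax + 1)]


freq = [6, 13, 13, 8, 4, 3]                        # observed frequencies for x = 0..5
N = sum(freq)                                      # 47
lam = sum(x * f for x, f in enumerate(freq)) / N   # assumed: sample mean, 94 / 47
expected = poisson_expected(N, lam, 5)
stat = sum((o - e) ** 2 / e for o, e in zip(freq, expected))
print(f"lambda = {lam}, chi-square = {stat:.3f}")
```

Working with unrounded expected frequencies gives about 1.076, in line with the 1.07605 obtained above from rounded values.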
$$\chi^2 = \sum_{i=1}^{4}\frac{(O_i - E_i)^2}{E_i} = \frac{900}{1100} + \frac{900}{400} + \frac{900}{300} + \frac{900}{200}$$
$$= 0.8182 + 2.250 + 3.000 + 4.500 = 10.5682$$
Tabulated value of χ² for 3 (4 − 1 = 3) degrees of freedom at 5% level of significance is 7.815. Since the calculated value of χ² is greater than the tabulated value of χ², H0 is rejected, i.e. the experimental results do not support the theory.
          B1      B2      Total
A1        a       b       a + b
A2        c       d       c + d
Literate 83 57
Illiterate 45 68
From this information, find out whether there is any relation between literacy and smoking.
Solution. Null hypothesis H0 : There is no relation between literacy and smoking, i.e. they are independent.
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
                 Education
Sex        Middle   High school   College
Male         10         15           25
Female       25         10           15
Based on this information, can you say that education depends on sex?
Solution. Null hypothesis H0 : Education is independent of sex.
Under the null hypothesis expected frequencies can be calculated by using
Ri C j
Eij =
N
(i = 1, 2 ; j = 1, 2, 3)
                 Education
Sex        Middle   High school   College   Total
Male         10         15           25      50 (R1)
Female       25         10           15      50 (R2)
Expected frequencies:
Sex        Middle                 High school            College              Total
Male       (50 × 35)/100 = 17.5   (50 × 25)/100 = 12.5   (50 × 40)/100 = 20     50
Female     (50 × 35)/100 = 17.5   (50 × 25)/100 = 12.5   (50 × 40)/100 = 20     50
Total      35                     25                     40                    100
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(10 - 17.5)^2}{17.5} + \frac{(15 - 12.5)^2}{12.5} + \frac{(25 - 20)^2}{20} + \frac{(25 - 17.5)^2}{17.5} + \frac{(10 - 12.5)^2}{12.5} + \frac{(15 - 20)^2}{20}$$
$$= 3.214 + 0.5 + 1.25 + 3.214 + 0.5 + 1.25 = 9.928$$
The tabulated value of χ² for 2 [= (2 − 1)(3 − 1)] degrees of freedom at the 5% level of
significance is 5.991. Since the calculated value of χ² is greater than the tabulated value,
H0 is rejected, i.e., education is not independent of sex; there is a relation between
education and sex.
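The r × c procedure above (expected frequencies R_i C_j / N, then the double sum, then (r − 1)(c − 1) degrees of freedom) can be sketched in Python as follows; the function name `contingency_chi_square` is our own. Without intermediate rounding the statistic is 9.9286, which agrees with the hand value 9.928.

```python
def contingency_chi_square(table):
    """Return (chi-square, degrees of freedom, expected table) for an r x c table,
    using E_ij = R_i * C_j / N under the hypothesis of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


observed = [[10, 15, 25],   # Male: Middle, High school, College
            [25, 10, 15]]   # Female
chi2, df, expected = contingency_chi_square(observed)
print(round(chi2, 3), df, expected)
```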
Solution. Null hypothesis H0 : The colour of the son's eyes is not associated with
that of the father's, i.e., they are independent.
Under the null hypothesis, expected frequencies can be calculated by using
$$E_{ij} = \frac{R_i\,C_j}{N} \qquad (i = 1, 2;\ j = 1, 2)$$
Expected frequencies are
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
EXERCISE 6.1
1. The frequency distribution of the digits on a set of random numbers was observed to be :
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 18 19 23 21 16 25 22 20 21 15
(Sales, ₹ '000) 65 54 60 56 71 84
Test the hypothesis that the sales do not depend on the day of the week, using a 5%
level of significance.
Frequency 40 32 29 59 57 59
Frequency 10 25 40 30 7
Frequency 109 65 22 3 1
8. The following table gives the classification of 100 workers according to sex and nature of
work. Using the χ²-test, examine whether the nature of work is independent of the sex of the
worker.
Skilled Unskilled
Male 40 20
Female 10 30
9. For the data given in the following table, use the χ²-test to test the effectiveness of inoculation
in preventing the attack of smallpox.
Inoculated 25 220
A 140 100 15
B 140 50 20
Test whether the sampling techniques of the two investigators are significantly dependent
on the income groups of people.
Answers
1. Yes 2. Accepted
3. Yes 4. Biased
5. No 6. Yes
7. Poisson distribution is a good fit to the data
8. No
9. Inoculation against smallpox is a preventive measure
10. Sampling techniques are dependent on the income groups
6.6. CORRELATION ANALYSIS
6.7. SCATTER OR DOT DIAGRAM
When we plot the corresponding values of two variables, taking one on X-axis
and the other along Y-axis, it shows a collection of dots. This collection of dots is called
a dot diagram or a scatter diagram.
If all the plotted points lie in a straight line and show an upward trend, then the
correlation is perfect positive. If all the plotted points lie in a straight line and show a
downward trend, then the correlation is perfect negative.
If the plotted points are not on a straight line but seem to be scattered around one,
the variables are correlated. The closer the points cluster around a line, the higher the
degree of correlation. If the plotted points are not clustered around a straight line but
are widely scattered over the diagram, then there is a very low degree of correlation
between the variables.
If the plotted points show no trend at all, then the variables are not correlated.
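[Figure: scatter diagrams showing different degrees of correlation, including a panel labelled "No correlation"; horizontal axis X, vertical axis Y]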
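or
$$r(x, y) = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i\sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$
Similarly, if $u_i$ and $v_i$ are the deviations of $x_i$ and $y_i$ from working (assumed) means,
$$r(x, y) = \frac{n\sum_{i=1}^{n} u_i v_i - \sum_{i=1}^{n} u_i\sum_{i=1}^{n} v_i}{\sqrt{n\sum_{i=1}^{n} u_i^2 - \left(\sum_{i=1}^{n} u_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} v_i^2 - \left(\sum_{i=1}^{n} v_i\right)^2}}$$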
(i) −1 ≤ r ≤ +1.
(ii) If r = −1, then there is perfect negative correlation between x and y.
(iii) If r = 1, then there is perfect positive correlation between x and y.
(iv) If r = 0, then there is no correlation between x and y.
(v) If −1 ≤ r < 0, then there is negative correlation between x and y.
(vi) If 0 < r ≤ 1, then there is positive correlation between x and y.
$$r(x, y) = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$
where $d_i$ is the difference between the corresponding ranks and $n$ is the number of pairs of
observations.
Let (xi, yi), i = 1, 2, ..., n, be the ranks of the ith individual in two characteristics
x and y respectively. Assuming that no two individuals are tied in either classification,
each of x and y takes the values 1, 2, 3, ..., n.
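Then
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{n+1}{2}$$
$$\sigma_x^2 = \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{1}{n}\left(1^2 + 2^2 + \cdots + n^2\right) - \left(\frac{n+1}{2}\right)^2 = \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4}$$
$$= \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n+1}{12}\,(4n + 2 - 3n - 3) = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$$
Now, since $\bar{x} = \bar{y}$,
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}(x_i - y_i)^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2$$
$$= \sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 - 2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Dividing by $n$,
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 - \frac{2}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
We know that
$$\operatorname{var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \quad \operatorname{var}(y) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2, \quad \operatorname{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Therefore
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \operatorname{var}(x) + \operatorname{var}(y) - 2\operatorname{cov}(x, y) = 2\sigma_x^2 - 2r(x, y)\,\sigma_x\sigma_y \qquad \left[\because r(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x\sigma_y}\right]$$
$$= 2\sigma_x^2 - 2r(x, y)\,\sigma_x^2 \qquad [\because \sigma_x = \sigma_y]$$
$$= 2\cdot\frac{n^2 - 1}{12}\,[1 - r(x, y)] = \frac{n^2 - 1}{6}\,[1 - r(x, y)]$$
$$\Rightarrow\quad \frac{6}{n(n^2 - 1)}\sum_{i=1}^{n} d_i^2 = 1 - r(x, y)$$
$$\therefore\quad r(x, y) = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}.$$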
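The derivation shows that, for untied ranks, the rank formula is simply Pearson's r computed on the ranks. The Python sketch below (function names are illustrative) checks this numerically on the Maths and Stats ranks of students A to J used in the solved examples, and also checks that the variance of the ranks 1, ..., n equals (n² − 1)/12.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation cov(x, y) / (sigma_x * sigma_y), with divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def spearman(rank_x, rank_y):
    """1 - 6 * sum(d^2) / (n (n^2 - 1)); assumes untied ranks 1..n."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]   # ranks of students A..J in Maths
stats = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # ranks of the same students in Stats
print(round(spearman(maths, stats), 3), round(pearson(maths, stats), 3))  # -0.697 -0.697

n = len(maths)
var_ranks = sum(k * k for k in range(1, n + 1)) / n - ((n + 1) / 2) ** 2
print(var_ranks, (n * n - 1) / 12)                                         # 8.25 8.25
```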
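SOLVED EXAMPLES
x   2    4    6    8    10
y   20   12   18   10   40
Solution.
x     y     x²     y²     xy
2     20    4      400    40
4     12    16     144    48
6     18    36     324    108
8     10    64     100    80
10    40    100    1600   400
Σ     30    100    220    2568   676
Here, n = 5.
$$r(x, y) = \frac{n\sum xy - \sum x\sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{5 \times 676 - 30 \times 100}{\sqrt{5 \times 220 - (30)^2}\,\sqrt{5 \times 2568 - (100)^2}}$$
$$= \frac{3380 - 3000}{\sqrt{1100 - 900}\,\sqrt{12840 - 10000}} = \frac{380}{\sqrt{200}\,\sqrt{2840}} = 0.5042$$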
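The raw-sum formula used above translates directly into code. This Python sketch (the helper name `pearson_from_sums` is our own) reproduces r = 0.5042 for the same data.

```python
from math import sqrt


def pearson_from_sums(x, y):
    """r = (n Sxy - Sx Sy) / sqrt((n Sxx - Sx^2)(n Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [2, 4, 6, 8, 10]
y = [20, 12, 18, 10, 40]
print(round(pearson_from_sums(x, y), 4))   # 0.5042
```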
Example 4. Find Karl Pearson's coefficient of correlation between x and y
for the following data :
x   150   153   154   155   157   160   163   164
y   65    66    67    70    68    53    70    63
Solution. Taking u = x − 155 and v = y − 68,
x     y     u     v      u²    v²     uv
150   65    −5    −3     25    9      15
153   66    −2    −2     4     4      4
154   67    −1    −1     1     1      1
155   70    0     2      0     4      0
157   68    2     0      4     0      0
160   53    5     −15    25    225    −75
163   70    8     2      64    4      16
164   63    9     −5     81    25     −45
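Because r is unaffected by a change of origin, computing it from the deviations u = x − 155 and v = y − 68 must give the same value as computing it from x and y directly. The Python sketch below (the helper `pearson_from_sums` is illustrative) checks this for Example 4; the surviving text does not show the final value of r for this example.

```python
from math import sqrt


def pearson_from_sums(x, y):
    """Raw-sum form of Karl Pearson's coefficient."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [150, 153, 154, 155, 157, 160, 163, 164]
y = [65, 66, 67, 70, 68, 53, 70, 63]
u = [xi - 155 for xi in x]    # deviations from working mean 155
v = [yi - 68 for yi in y]     # deviations from working mean 68
r_uv, r_xy = pearson_from_sums(u, v), pearson_from_sums(x, y)
print(round(r_uv, 4), round(r_xy, 4))
```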
Student          A    B    C    D    E    F    G    H    I    J
Rank in Maths.   9    10   6    5    7    2    4    8    1    3
Rank in Stats.   1    2    3    4    5    6    7    8    9    10
Student   R1   R2   d = R1 − R2   d²
A         9    1    8             64
B         10   2    8             64
C         6    3    3             9
D         5    4    1             1
E         7    5    2             4
F         2    6    −4            16
G         4    7    −3            9
H         8    8    0             0
I         1    9    −8            64
J         3    10   −7            49
                                  Σd² = 280
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(10^2 - 1)} = 1 - \frac{1680}{990} = 1 - 1.697 = -0.697.$$
Example 6. Ten competitors in a beauty contest are ranked by three judges in
the following order :
First Judge 1 6 5 10 3 2 4 9 7 8
Second Judge 3 5 8 4 7 10 2 1 6 9
Third Judge 6 4 9 8 1 2 3 10 5 7
Using the rank correlation method, discuss which pair of judges has the nearest
approach to common taste in beauty?
R1    R2    R3    d12 = R1 − R2   d13 = R1 − R3   d23 = R2 − R3   d12²   d13²   d23²
1     3     6     −2              −5              −3              4      25     9
6     5     4     1               2               1               1      4      1
5     8     9     −3              −4              −1              9      16     1
10    4     8     6               2               −4              36     4      16
3     7     1     −4              2               6               16     4      36
2     10    2     −8              0               8               64     0      64
4     2     3     2               1               −1              4      1      1
9     1     10    8               −1              −9              64     1      81
7     6     5     1               2               1               1      4      1
8     9     7     −1              1               2               1      1      4
Total                                                             200    60     214
Here, n = 10.
Rank correlation coefficient between the first and second judges,
$$r_{12} = 1 - \frac{6\sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 200}{10 \times 99} = 1 - \frac{40}{33} = -0.212$$
Rank correlation coefficient between the first and third judges,
$$r_{13} = 1 - \frac{6\sum d_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10 \times 99} = 1 - \frac{4}{11} = 0.636$$
Rank correlation coefficient between the second and third judges,
$$r_{23} = 1 - \frac{6\sum d_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{10 \times 99} = 1 - \frac{214}{165} = -0.297$$
Since r13 is the largest of the three coefficients, the first and third judges have the
nearest approach to common tastes in beauty.
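The pairwise comparison of the three judges can be automated. In this Python sketch the dictionary keys are illustrative labels; the code computes each rank correlation with the formula above and picks the pair with the largest coefficient.

```python
from itertools import combinations


def spearman(r1, r2):
    """Rank correlation 1 - 6 sum(d^2) / (n (n^2 - 1)) for untied ranks."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))


judges = {
    "first":  [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "second": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "third":  [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}
coeffs = {(a, b): spearman(judges[a], judges[b]) for a, b in combinations(judges, 2)}
for pair, r in coeffs.items():
    print(pair, round(r, 3))
print("closest pair:", max(coeffs, key=coeffs.get))
```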
Example 7. The marks obtained by 9 students in Statistics and Mathematics
are given below :
Marks in Statistics 35 23 47 17 10 43 9 6 28
Marks in Mathematics 30 33 45 23 8 49 12 4 31
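Marks in Statistics   Marks in Mathematics   R1   R2   d = R1 − R2   d²
35                    30                     3    5    −2            4
23                    33                     5    3    2             4
47                    45                     1    2    −1            1
17                    23                     6    6    0             0
10                    8                      7    8    −1            1
43                    49                     2    1    1             1
9                     12                     8    7    1             1
6                     4                      9    9    0             0
28                    31                     4    4    0             0
Total                                                              12
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(9^2 - 1)} = 1 - \frac{6 \times 12}{9 \times 80} = 1 - 0.1 = 0.9.$$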
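When raw marks are given, they must first be converted to ranks. The Python sketch below assumes, as the table does, that rank 1 goes to the highest mark and that there are no ties; the helper name `ranks_descending` is our own.

```python
def ranks_descending(values):
    """Rank 1 = highest value; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


stats = [35, 23, 47, 17, 10, 43, 9, 6, 28]
maths = [30, 33, 45, 23, 8, 49, 12, 4, 31]
r1, r2 = ranks_descending(stats), ranks_descending(maths)
n = len(stats)
d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
r_s = 1 - 6 * d2 / (n * (n * n - 1))
print(r1, r2, d2, round(r_s, 3))
```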
EXERCISE 6.2
1. Calculate Karl Pearson's correlation coefficient between the height of father and the height
of son from the given data :
2. Calculate Karl Pearson's correlation coefficient from the following data :
Cost (₹ '000) 15 19 16 19 17 18 16 18 15
3. Calculate Karl Pearson's correlation coefficient from the following data, using 20 as
the working mean for price and 70 as the working mean for demand.
Price 14 16 17 18 19 20 21 22 23
Demand 84 78 70 75 66 67 62 58 60
4. Coefficient of correlation between x and y for 20 items is 0.3. Mean of x is 15 and mean of
y is 20 while standard deviations are 4 and 5 for x and y respectively. At the time of
calculation one item 27 has wrongly been taken as 17 in case of x series and 35 instead
of 30 in case of y series. Find the correct coefficient of correlation.
5. Ten students got the following marks in Statistics and Mathematics :
Marks in Statistics 78 36 98 25 75 82 90 62 65 39
Marks in Mathematics 84 51 91 60 68 62 86 58 53 47
x 10 12 8 15 20 25 40
y 15 10 6 25 16 12 18
x 4 6 8 10 12 14 16 18
y 10 15 20 25 30 35 40 45
Competitors A B C D E F G H I J
Rank by Judge I 6 4 3 1 2 7 9 8 10 5
Rank by Judge II 4 1 6 7 5 8 10 9 3 2
Answers
1. r = 0.81 2. r = 0.11547 3. r = 0.9542
4. Correct r = 0.515 5. r = 0.78 6. 0.9151 7. r = 0.57
8. r=1 9. Yes 10. n = 10 11. r = 0.2576.
7. REGRESSION ANALYSIS
STRUCTURE
Linear Regression
Lines of Regression
Properties of Regression Coefficients
Angle Between Two Lines of Regression
Standard Error of Estimate (or Prediction)
Coefficient of Determination
Properties of Coefficient of Determination
7.3. PROPERTIES OF REGRESSION COEFFICIENTS
(i) The correlation coefficient and the two regression coefficients have the same
sign.
(ii) If one of the regression coefficients is numerically greater than unity, the other must be
less than unity.
(iii) The arithmetic mean of the regression coefficients is numerically greater than or equal
to the correlation coefficient.
(iv) The correlation coefficient is the geometric mean of the two regression
coefficients, i.e., $r = \pm\sqrt{b_{yx}\,b_{xy}}$, with the sign common to $b_{yx}$ and $b_{xy}$.
(v) Regression coefficients are independent of a change of origin but not of scale.
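Properties (ii) to (iv) can be checked numerically. The Python sketch below borrows r = 0.8, σx = 14, σy = 20 from Exercise 8 of Exercise 7.1, forms b_yx = rσy/σx and b_xy = rσx/σy, and recovers r as their signed geometric mean.

```python
from math import sqrt, copysign

r, sd_x, sd_y = 0.8, 14, 20       # data of Exercise 8 in Exercise 7.1
b_yx = r * sd_y / sd_x            # regression coefficient of y on x
b_xy = r * sd_x / sd_y            # regression coefficient of x on y

# (iv) r is the geometric mean of b_yx and b_xy, carrying their common sign
r_back = copysign(sqrt(b_yx * b_xy), b_yx)
# (iii) the arithmetic mean is numerically at least |r|
am = (b_yx + b_xy) / 2
print(round(b_yx, 2), round(b_xy, 2), round(r_back, 3), round(am, 3))
```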
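If θ is the acute angle between the two lines of regression in the case of two
variables x and y, then
$$\tan\theta = \frac{1 - r^2}{|r|}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2},$$
where r, σx and σy have their usual meanings.
Explain the significance when r = 0 and r = ±1.
Proof. The equation of the line of regression of y on x is
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
and the equation of the line of regression of x on y is
$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}).$$
Their slopes are
$$m_1 = r\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad m_2 = \frac{\sigma_y}{r\,\sigma_x}.$$
Therefore
$$\tan\theta = \pm\frac{m_2 - m_1}{1 + m_1 m_2} = \pm\frac{\dfrac{\sigma_y}{r\,\sigma_x} - r\dfrac{\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \pm\frac{1 - r^2}{r}\cdot\frac{\sigma_y}{\sigma_x}\cdot\frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2} = \pm\frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
Since r² ≤ 1 and σx, σy are positive, the sign that makes the right-hand side non-negative
(i.e., replacing r by |r|) gives the acute angle between the lines.
Hence
$$\tan\theta = \frac{1 - r^2}{|r|}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}.$$
When r = 0, tan θ → ∞, so θ = π/2.
So the two lines of regression are perpendicular to each other.
When r = ±1, tan θ = 0, so θ = 0 or π.
So the two lines of regression coincide and there is perfect correlation between the
two variables x and y.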
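A quick numerical check of the result: the Python sketch below (function names are illustrative) computes tan θ once from the closed-form expression and once directly from the two slopes m1 and m2, using σx = 14, σy = 20 and a few sample values of r, including a negative one.

```python
from math import atan, degrees


def acute_angle_tan(r, sd_x, sd_y):
    """tan(theta) = (1 - r^2)/|r| * sd_x*sd_y / (sd_x^2 + sd_y^2), for r != 0."""
    return (1 - r * r) / abs(r) * sd_x * sd_y / (sd_x ** 2 + sd_y ** 2)


def tan_from_slopes(r, sd_x, sd_y):
    """Acute-angle tangent |(m2 - m1) / (1 + m1 m2)| from the two regression slopes."""
    m1 = r * sd_y / sd_x          # slope of the line of regression of y on x
    m2 = sd_y / (r * sd_x)        # slope of the line of regression of x on y, in the (x, y) plane
    return abs((m2 - m1) / (1 + m1 * m2))


t = acute_angle_tan(0.8, 14, 20)
print(round(t, 4), round(tan_from_slopes(0.8, 14, 20), 4), round(degrees(atan(t)), 2))
```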
SOLVED EXAMPLES
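Example 1. Find the equations of the two lines of regression for the data :
x   1   2   3   4   5
y   7   6   5   4   3
and hence find an estimate of y for x = 3.5 from the appropriate line of regression.
x    y    x²    y²    xy
1    7    1     49    7
2    6    4     36    12
3    5    9     25    15
4    4    16    16    16
5    3    25    9     15
Σ    15   25    55    135   65
Here, n = 5.
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{15}{5} = 3, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{25}{5} = 5$$
Now,
$$b_{yx} = \frac{n\sum x_i y_i - \sum x_i\sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{5 \times 65 - 15 \times 25}{5 \times 55 - (15)^2} = \frac{25(13 - 15)}{25(11 - 9)} = -1$$
$$b_{xy} = \frac{n\sum x_i y_i - \sum x_i\sum y_i}{n\sum y_i^2 - \left(\sum y_i\right)^2} = \frac{5 \times 65 - 15 \times 25}{5 \times 135 - (25)^2} = \frac{25(13 - 15)}{25(27 - 25)} = -1$$
So the line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \;\Rightarrow\; y - 5 = -1(x - 3) \;\Rightarrow\; y = -x + 8$$
and the line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; x - 3 = -1(y - 5) \;\Rightarrow\; x = -y + 8$$
To estimate the value of y when x is given, we use the line of regression of y on
x, i.e.,
y = −x + 8.
Substituting x = 3.5, we have
y = −3.5 + 8 = 4.5.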
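The computation in Example 1 can be sketched in Python with the same raw-sum formulas; `regression_lines` and `predict_y` are illustrative names, not from the text.

```python
def regression_lines(x, y):
    """Return (b_yx, b_xy, mean_x, mean_y) using the raw-sum formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    b_yx = (n * sxy - sx * sy) / (n * sum(a * a for a in x) - sx ** 2)
    b_xy = (n * sxy - sx * sy) / (n * sum(b * b for b in y) - sy ** 2)
    return b_yx, b_xy, sx / n, sy / n


def predict_y(xv, b_yx, mean_x, mean_y):
    """Estimate y from the line of regression of y on x."""
    return mean_y + b_yx * (xv - mean_x)


x = [1, 2, 3, 4, 5]
y = [7, 6, 5, 4, 3]
b_yx, b_xy, mx, my = regression_lines(x, y)
print(b_yx, b_xy, mx, my, predict_y(3.5, b_yx, mx, my))   # -1.0 -1.0 3.0 5.0 4.5
```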
Example 2. The following table gives age (x) in years of cars and annual
maintenance cost (y) in hundred rupees :
x 1 3 5 7 9
y 15 18 21 23 22
Estimate the maintenance cost for a 6-year-old car after finding the appropriate
line of regression.
Solution.
x y x2 xy
1 15 1 15
3 18 9 54
5 21 25 105
7 23 49 161
9 22 81 198
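The remainder of the worked solution for the car-maintenance example is not present in this excerpt. The Python sketch below completes the calculation under the assumption, as in Example 1, that the line of regression of y on x is the appropriate one for estimating y. It returns b_yx = 0.95 and an estimated cost of about 20.75 (hundred rupees) for x = 6.

```python
def regression_y_on_x(x, y):
    """Return (b_yx, mean_x, mean_y) for the line y - mean_y = b_yx (x - mean_x)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return b_yx, sx / n, sy / n


age = [1, 3, 5, 7, 9]            # x: age of car in years
cost = [15, 18, 21, 23, 22]      # y: annual maintenance cost in hundred rupees
b_yx, mx, my = regression_y_on_x(age, cost)
estimate = my + b_yx * (6 - mx)  # estimated cost for a 6-year-old car
print(b_yx, mx, my, round(estimate, 2))
```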
$$r = \sqrt{\frac{3}{5} \times \frac{12}{25}} = \sqrt{\frac{36}{125}} = 0.536$$
(∵ the regression coefficients are positive, so r is positive)
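Example 4. For 100 students of a class, the regression equation of marks in
Statistics (x) on the marks in Mathematics (y) is 3y − 5x + 180 = 0. The mean marks in
Mathematics is 50 and the variance of marks in Statistics is $\frac{4}{9}$th of the variance of marks
$$r = \sqrt{1 \times \frac{9}{16}} = \frac{3}{4} = 0.75 \qquad (r \text{ is positive since } b_{yx} \text{ and } b_{xy} \text{ are positive})$$
$$\sigma_x = \frac{b_{xy}\,\sigma_y}{r} = \frac{\frac{9}{16} \times 4}{\frac{3}{4}} = 3 \qquad \left(\because b_{xy} = r\frac{\sigma_x}{\sigma_y},\ \sigma_y^2 = 16\right)$$
$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x\sigma_y} \;\Rightarrow\; \operatorname{cov}(x, y) = r\,\sigma_x\sigma_y = \frac{3}{4} \times 3 \times 4 = 9.$$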
Example 6. From the following information on the values of two variables x and y,
find the two regression lines and estimate the value of x when y = 10 and the value of y when x = 8.
n = 5, Σx = 15, Σy = 18, Σx² = 55, Σy² = 74, Σxy = 58.
$$x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; x - 3 = \frac{10}{23}\left(y - \frac{18}{5}\right)$$
or x = 0.435y + 1.435 ...(2)
3 3 3 3
r=
4 4 4
==
4
(∵ byx and bxy have the negative sign)
$$r = -\sqrt{\frac{3}{2} \times \frac{1}{6}} = -\frac{1}{2}$$
(r is negative since byx and bxy are negative)
(iii) Since we have to find the estimated value of y for the given value of x, we
use the line of regression of y on x:
$$y = -\frac{3}{2}x + 13$$
Putting x = 0, we have y = −(3/2) × 0 + 13 = 13.
To find the estimated value of x for the given value of y, we use the line of regression of
x on y:
$$x = -\frac{1}{6}y + \frac{31}{6}$$
Putting y = 13, we have
$$x = -\frac{1}{6} \times 13 + \frac{31}{6} = \frac{18}{6} = 3.$$
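Given the two lines of regression y = −(3/2)x + 13 and x = −(1/6)y + 31/6, the Python sketch below recovers r as the signed geometric mean of the slopes, finds the means as the point where the two lines intersect (both lines pass through (x̄, ȳ)), and reproduces the two estimates. The means (4, 7) are computed here; they are not stated in the surviving text.

```python
from math import sqrt, copysign

# Lines of regression written as y = b_yx * x + a1 and x = b_xy * y + a2
b_yx, a1 = -3 / 2, 13
b_xy, a2 = -1 / 6, 31 / 6

# r takes the common sign of the two regression coefficients
r = copysign(sqrt(b_yx * b_xy), b_yx)

# Intersection of the two lines: x = b_xy * (b_yx * x + a1) + a2
mean_x = (b_xy * a1 + a2) / (1 - b_xy * b_yx)
mean_y = b_yx * mean_x + a1

y_at_0 = b_yx * 0 + a1       # estimate of y when x = 0
x_at_13 = b_xy * 13 + a2     # estimate of x when y = 13
print(r, round(mean_x, 6), round(mean_y, 6), y_at_0, round(x_at_13, 6))
```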
7.5. STANDARD ERROR OF ESTIMATE (OR PREDICTION)
The square root of the arithmetic mean of the squared deviations of the predicted values
from the observed values is known as the standard error of estimate (or prediction). It is
given by
$$E_{yx} = \sqrt{\frac{\sum (y - y_p)^2}{n}},$$
where y is the actual value and y_p is the predicted value; E_yx is called the standard
error of estimate (or prediction) of y on x.
Example. Find the standard error of estimate of y on x from the following data :
x 1 2 3 4 5
y 2 5 3 8 7
x y x2 xy
1 2 1 2
2 5 4 10
3 3 9 9
4 8 16 32
5 7 25 35
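Σx = 15   Σy = 25   Σx² = 55   Σxy = 88
$$\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3, \qquad \bar{y} = \frac{\sum y}{n} = \frac{25}{5} = 5$$
Here, n = 5.
$$b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5 \times 88 - 15 \times 25}{5 \times 55 - (15)^2} = \frac{65}{50} = 1.3$$
The line of regression of y on x is given by
$$y - \bar{y} = b_{yx}(x - \bar{x}) \;\Rightarrow\; y - 5 = 1.3(x - 3) \;\Rightarrow\; y = 1.3x + 1.1 \quad\text{or}\quad y_p = 1.3x + 1.1$$
$$E_{yx} = \sqrt{\frac{\sum (y - y_p)^2}{n}} = \sqrt{\frac{9.10}{5}} = \sqrt{1.82} = 1.349.$$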
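The table of predicted values is not shown above, so here is a short Python sketch that recomputes y_p = 1.3x + 1.1 for each x, the residual sum of squares Σ(y − y_p)² = 9.10, and E_yx = √1.82 ≈ 1.349.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)      # 65 / 50 = 1.3
mx, my = sx / n, sy / n

y_pred = [my + b_yx * (a - mx) for a in x]            # y_p = 1.3x + 1.1
sse = sum((obs - p) ** 2 for obs, p in zip(y, y_pred))
e_yx = sqrt(sse / n)                                  # standard error of estimate of y on x
print(round(b_yx, 3), round(sse, 2), round(e_yx, 3))  # 1.3 9.1 1.349
```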
7.8. PROPERTIES OF COEFFICIENT OF DETERMINATION
EXERCISE 7.1
x 10 9 8 7 6 4 3
y 8 12 7 10 8 9 6
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
Estimate the value of y, when x = 10.
3. Find the regression lines for the following data :
x 6 2 10 4 8
y 9 11 5 8 7
4. Find the regression coefficient bxy between x and y for the following data :
Σx = 30, Σy = 42, Σxy = 199, Σx² = 184, Σy² = 318 and n = 6
5. Find the regression coefficients byx and bxy for the following data :
Σx = 24, Σy = 12, Σx² = 374, Σy² = 97, Σxy = 157 and n = 7.
Also, find the coefficient of correlation between x and y.
6. Find the regression line of x on y and estimate the value of x, when y = 5 from the
following data :
Σx = 125, Σy = 100, Σx² = 1650, Σy² = 1500, Σxy = 50 and n = 25.
7. The following regression equations were obtained from a correlation table :
y = 0.516 x + 33.73 ; x = 0.512 y + 32.52
Find the value of
(i) the mean of x's and the mean of y's (ii) the correlation coefficient.
(iii) the coefficient of determination.
8. You are given the following data :
Series x y
Mean 18 100
Standard deviation 14 20
Correlation coefficient between x and y = 0.8.
Find (i) the regression coefficients byx and bxy.
(ii) the two regression lines.
(iii) estimate the value of y, when x = 70.
(iv) estimate the value of x, when y = 90.
9. If 4x − 5y + 33 = 0 and 20x − 9y − 107 = 0 are the two lines of regression, find
(i) the mean values of x and y.
(ii) the regression coefficients byx and bxy.
(iii) the correlation coefficient between x and y.
(iv) the standard deviation of y, if variance of x is 9.
(v) the coefficient of determination.
10. Find the standard error of estimate of y on x for the following data :
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
11. In a partially destroyed record, for the estimation of the two lines of regression from a
bivariate data (x, y), the following results were available :
Regression coefficient of y on x = 1.6, regression coefficient of x on y = 0.4, standard
error of the estimate of y on x = 3.
Find (i) coeff. of correlation between x and y (ii) standard deviation x and y (iii) standard
error of estimate of x on y.
8. (i) byx = 1.14 and bxy = 0.56 (ii) y = 1.14x + 79.41 ; x = 0.56y − 38 (iii) 159.21 (iv) 12.4
9. (i) 13, 17 (ii) byx = 4/5, bxy = 9/20 (iii) r = 0.6 (iv) σy = 4 (v) r² = 0.36
10. 0.564 11. (i) r = 0.8 (ii) σx = 2.5, σy = 5 (iii) Exy = 1.5
STRUCTURE
Introduction
Causes of Variations
Methods of Statistical Quality Control
Advantages of Statistical Quality Control
Control Charts
Types of Control Charts
Control Charts for Variables
Control Charts for Attributes
(i) Control Chart for Fraction Defectives (p-chart)
(ii) Control Chart for Number of Defectives (np-chart)
(iii) Control Chart for Number of Defects (c-Chart)
8.1. INTRODUCTION
It is not possible to produce items of exactly the same quality in the continuous flow
of any manufacturing process, so variation in the quality of the product is inevitable.
These variations occur due to two types of causes :
(i) Chance or random causes. Some deviations from the desired specifications
are bound to occur in the items produced, however efficient the production process
may be. If the variation occurs due to some inherent pattern of variation and no
cause can be assigned to it, it is called chance or random variation.
For instance, slight variations in temperature, pressure, humidity, etc. interact
randomly to produce slight variations in the quality of the product. Chance variation is
tolerable and does not materially affect the quality of a product. In such a situation,
the process is said to be under statistical control.
(ii) Assignable causes. Assignable (also called non-random or systematic) causes
can be easily identified and removed. An assignable cause may occur at any stage of the
process. Assignable causes of variation may be due to defective raw material, negligence
of the operators, improper handling of machines, faulty equipment, etc. In such a
situation, the process is said to be out of control.
To control the quality characteristics of the product, there are two main methods :
1. Process control. The main aim in any production process is to control and
maintain the quality of the product to requisite standard during the manufacturing
process. This is termed as process control and is achieved through the use of control
charts given by W.A. Shewhart.
2. Product control. This technique is concerned with the inspection of already
manufactured products to ascertain whether they are acceptable to the consumer or
not. This is achieved through acceptance inspection or a sampling inspection plan.
Such a sampling inspection is often termed product control.
8.4. ADVANTAGES OF STATISTICAL QUALITY CONTROL
Control charts are the devices to describe the patterns of variation. The control
charts were developed by W.A. Shewhart of Bell Telephone Laboratories in 1924. Based
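[Figure: a control chart with a central line (CL) between an upper control limit (UCL) and a lower control limit (LCL); points between the limits are "under control", points above the UCL or below the LCL are "out of control". Horizontal axis: sample numbers; vertical axis: quality characteristics.]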
The control chart has a horizontal scale that represents the consecutive sample
number and a vertical scale that represents the quality characteristic of each sample.
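and
$$\bar{R} = \frac{R_1 + R_2 + \cdots + R_k}{k} = \frac{1}{k}\sum_{i=1}^{k} R_i$$
$$\text{Upper control limit (UCL)} = \bar{\bar{x}} + \frac{3\bar{R}}{d_2\sqrt{n}}$$
$$\text{Lower control limit (LCL)} = \bar{\bar{x}} - \frac{3\bar{R}}{d_2\sqrt{n}}$$
$$\text{UTL}_x = \bar{\bar{x}} + \frac{3\bar{R}}{d_2}, \qquad \text{LTL}_x = \bar{\bar{x}} - \frac{3\bar{R}}{d_2}$$
The process capability is $6\sigma = \dfrac{6\bar{R}}{d_2}$, where σ is the standard deviation.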
Let n be the sample size taken from the production process at different time
intervals. If d is the number of defectives in this sample of size n, then the fraction
defective in this sample is given by
$$p = \frac{d}{n} \quad\text{or}\quad d = np$$
$$\sigma_p = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
Control limits for p-chart are given by
$$CL_p = \bar{p}$$
$$UCL_p = \bar{p} + 3\sigma_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
$$LCL_p = \bar{p} - 3\sigma_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
Since the number of defectives (or the fraction defective) cannot be negative, if the
LCL comes out to be negative, it is taken as zero.
To construct the p-chart, p-values are taken on the y-axis and sample numbers
on the x-axis. If any point lies outside the control limits, it is concluded that the process
is not under control otherwise under control.
8.9. (ii) CONTROL CHART FOR NUMBER OF DEFECTIVES (np-CHART)
If n is the sample size and d is the number of defectives in this sample, then
d = np, where p is the fraction defective in the sample.
Now let $n\bar{p}$ represent the average number of defectives per sample of constant
size, i.e.,
$$n\bar{p} = \frac{\text{Total number of defective items in all the samples inspected}}{\text{Number of samples inspected}}$$
Now the standard deviation $\sigma_{np}$ is given by
$$\sigma_{np} = n\sigma_p = n\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{n\bar{p}(1 - \bar{p})}$$
Control limits for np-chart are given by
$$CL_{np} = n\bar{p}$$
$$UCL_{np} = n\bar{p} + 3\sigma_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})}$$
$$LCL_{np} = n\bar{p} - 3\sigma_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})}$$
Since the number of defectives cannot be negative, if LCL comes out to be
negative, it is taken as zero. To construct the np-chart, np, i.e., d values are taken on
the y-axis and sample numbers on the x-axis. If any point lies outside the control
limits, it is concluded that the process is not under control otherwise under control.
SOLVED EXAMPLES
Example 1. Using the following data, calculate the control limits for the x̄-chart:
n = 12, x̿ = 138.6, R̄ = 7.4 and d2 = 3.258.
Solution. The control limits for the x̄-chart are calculated as :
CL = x̿ = 138.6
$$UCL = \bar{\bar{x}} + \frac{3\bar{R}}{d_2\sqrt{n}} = 138.6 + \frac{3 \times 7.4}{3.258\sqrt{12}} = 138.6 + 1.967 = 140.567$$
$$LCL = \bar{\bar{x}} - \frac{3\bar{R}}{d_2\sqrt{n}} = 138.6 - \frac{3 \times 7.4}{3.258\sqrt{12}} = 138.6 - 1.967 = 136.633.$$
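The d2-based limits of Example 1 can be sketched in Python as follows; the function name `xbar_limits_d2` is our own.

```python
from math import sqrt


def xbar_limits_d2(grand_mean, mean_range, d2, n):
    """Return (CL, UCL, LCL) = grand mean, grand mean +/- 3 * mean range / (d2 * sqrt(n))."""
    half_width = 3 * mean_range / (d2 * sqrt(n))
    return grand_mean, grand_mean + half_width, grand_mean - half_width


cl, ucl, lcl = xbar_limits_d2(138.6, 7.4, 3.258, 12)
print(cl, round(ucl, 3), round(lcl, 3))   # 138.6 140.567 136.633
```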
Example 2. Using the following data, calculate the control limits for the R-chart:
n = 4, R̄ = 9.60, d2 = 2.059 and d3 = 0.880.
Solution. The control limits for the R-chart are calculated as :
CL = R̄ = 9.60
$$UCL = \bar{R} + \frac{3d_3\bar{R}}{d_2} = 9.60 + \frac{3 \times 0.880 \times 9.60}{2.059} = 9.60 + 12.3089 = 21.91$$
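A Python sketch of the same R-chart calculation is given below (the function name is illustrative). The text does not show the LCL for this example; the code assumes that a negative lower limit is set to zero, the convention the text applies to defectives, since a range cannot be negative either.

```python
def r_chart_limits_d3(mean_range, d2, d3):
    """Return (CL, UCL, LCL) = R-bar, R-bar +/- 3 * d3 * R-bar / d2; a negative LCL becomes 0."""
    half_width = 3 * d3 * mean_range / d2
    return mean_range, mean_range + half_width, max(0.0, mean_range - half_width)


cl, ucl, lcl = r_chart_limits_d3(9.60, 2.059, 0.880)
print(cl, round(ucl, 2), lcl)   # 9.6 21.91 0.0
```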
Sample No. 1 2 3 4 5 6 7 8 9 10
Mean (x̄) 49 45 48 53 39 47 46 39 51 45
Range (R) 7 5 7 9 5 8 8 6 7 6
Calculate the control limits of x̄ and R-charts. Comment on the state of control
without drawing the charts.
Solution. Here, n = 5, k = 10, A2 = 0.58, D3 = 0
and D4 = 2.11 (from the table for n = 5)
$$\bar{\bar{x}} = \frac{\sum \bar{x}}{k} = \frac{49 + 45 + \cdots + 45}{10} = \frac{462}{10} = 46.2$$
$$\bar{R} = \frac{\sum R}{k} = \frac{68}{10} = 6.8$$
For x̄-chart
CL = x̿ = 46.2
UCL = x̿ + A2R̄ = 46.2 + 0.58 × 6.8 = 46.2 + 3.944 = 50.144
LCL = x̿ − A2R̄ = 46.2 − 0.58 × 6.8 = 46.2 − 3.944 = 42.256
For R-chart
CL = R̄ = 6.8
UCL = D4R̄ = 2.11 × 6.8 = 14.348
LCL = D3R̄ = 0 × 6.8 = 0
For the x̄-chart, samples 4 and 9 (means 53 and 51) lie above the UCL and samples 5 and 8
(mean 39 each) lie below the LCL, so the process is not under control.
For the R-chart, all the points lie within the UCL and LCL, so the process is under
control.
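The tabular-constant method used above (A2, D3, D4 for n = 5) can be sketched in Python to compute the limits and list which samples fall outside them, without drawing the charts. The constants are the rounded table values used in the solution.

```python
means = [49, 45, 48, 53, 39, 47, 46, 39, 51, 45]
ranges = [7, 5, 7, 9, 5, 8, 8, 6, 7, 6]
A2, D3, D4 = 0.58, 0, 2.11                 # table constants for n = 5, as used above

x_dbar = sum(means) / len(means)           # grand mean, 46.2
r_bar = sum(ranges) / len(ranges)          # mean range, 6.8
x_ucl, x_lcl = x_dbar + A2 * r_bar, x_dbar - A2 * r_bar
r_ucl, r_lcl = D4 * r_bar, D3 * r_bar

# 1-based sample numbers that fall outside the control limits
x_out = [i for i, m in enumerate(means, start=1) if not x_lcl <= m <= x_ucl]
r_out = [i for i, rng in enumerate(ranges, start=1) if not r_lcl <= rng <= r_ucl]
print(round(x_ucl, 3), round(x_lcl, 3), round(r_ucl, 3), x_out, r_out)
```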
Example 4. A company manufactures screws to a nominal diameter 0.500 ±
0.030 cm. Five samples of size 3 each were taken from the manufactured lot at different
lengths. The readings are as follows :
1 2 3
[Figure: x̄-chart for Example 4 with UCL = 0.5172, CL = 0.4988, LCL = 0.4804; horizontal axis: sample No.; vertical axis: sample mean x̄]
It is clear from the figure that all the values of x̄ lie within the UCL and LCL,
so the process is under control.
[Figure: R-chart for Example 4 with UCL = 0.0463, CL = 0.018, LCL = 0; horizontal axis: sample No.; vertical axis: sample range R]
It is clear from the figure that all the values of R lie within the UCL and LCL,
so the process is under control.
Example 5. The following are the mean lengths and ranges of lengths of a finished
product from 10 samples each of size 5. The specification limits for length are 200 ± 5
cm. Construct x̄ and R-charts and examine whether the process is under control.
Sample No. 1 2 3 4 5 6 7 8 9 10
Mean (x̄) 201 198 202 200 203 204 199 196 199 201
Range (R) 5 0 7 3 4 7 2 8 5 6
[Figure: x̄-chart for Example 5 with CL = 200; horizontal axis: sample No.; vertical axis: sample mean x̄]
It is clear from the figure that three points lie outside the control limits, so the
process is not in control.
[Figure: R-chart for Example 5 with CL = 4.7, LCL = 0; horizontal axis: sample No.; vertical axis: sample range R]
It is clear from the figure that all the values of R lie within the UCL and LCL, so the
process is under control.
Example 6. A bulb manufacturing company ABC samples the fused bulbs, taking
a sample of 5 every hour. These sample sets of five have been arranged in increasing
order as follows :
45 42 20 35 43 52 61 20 16 70 65 60
68 46 25 55 52 70 65 25 28 100 85 75
75 65 82 69 57 75 70 32 40 110 95 94
Construct x̄ and R-charts and examine whether the process is under control.
Solution.
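For sample 1,
$$\bar{x}_1 = \frac{45 + 68 + 75 + 77 + 88}{5} = \frac{353}{5} = 70.6$$
Similarly,
$$\bar{x}_2 = \frac{315}{5} = 63,\quad \bar{x}_3 = \frac{301}{5} = 60.2,\quad \bar{x}_4 = \frac{322}{5} = 64.4,$$
$$\bar{x}_5 = \frac{291}{5} = 58.2,\quad \bar{x}_6 = \frac{397}{5} = 79.4,\quad \bar{x}_7 = \frac{396}{5} = 79.2,$$
$$\bar{x}_8 = \frac{197}{5} = 39.4,\quad \bar{x}_9 = \frac{234}{5} = 46.8,\quad \bar{x}_{10} = \frac{555}{5} = 111,$$
$$\bar{x}_{11} = \frac{455}{5} = 91,\quad \bar{x}_{12} = \frac{478}{5} = 95.6$$
For sample 1, R1 = xmax − xmin = 88 − 45 = 43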
LCL = x̿ − A2R̄
= 71.57 − 0.577 × 57.67
= 71.57 − 33.27 = 38.3
Control limits for R-chart
CL = R̄ = 57.67
UCL = D4R̄
= 2.115 × 57.67 (∵ D4 = 2.115 for n = 5)
= 121.97
LCL = D3R̄
= 0 × 57.67 (∵ D3 = 0 for n = 5)
= 0
[Figure: x̄-chart for Example 6 with CL = 71.57, LCL = 38.3; horizontal axis: sample No.; vertical axis: sample mean x̄]
[Figure: R-chart for Example 6 with UCL = 121.97, CL = 57.67, LCL = 0; horizontal axis: sample No.; vertical axis: sample range R]
It is clear from the figure that all the values of R lie within the UCL and LCL, so
the process is under control.
Example 7. In a factory producing spark plugs, the number rejected found in
the inspection of 10 lots of size 100 each is given below:
Lot No.   No. rejected   Fraction rejected   Lot No.   No. rejected   Fraction rejected
1         4              0.040               6         4              0.040
2         7              0.070               7         5              0.050
3         8              0.080               8         8              0.080
4         2              0.020               9         6              0.060
5         3              0.030               10        10             0.100
Construct appropriate control chart and state whether the process is in control.
Solution. Since we are given the fraction rejected, the p-chart is suitable for the given
situation.
$$\bar{p} = \frac{\text{Total number rejected}}{\text{Total number of items inspected in all samples}} = \frac{57}{10 \times 100} = 0.057$$
CLp = p̄ = 0.057
(p-chart)
It is clear from the figure that all the values of the fraction rejected p lie within the UCL
and LCL, so the process is under control.
Example 8. Based on 15 subgroups each of size 200 taken at intervals of 45
minutes from a manufacturing process, the average fraction defective was found to be
0.068. Calculate the value of CL, UCL and LCL.
Solution. Since we are given average fraction defective, we will calculate the
control limits of p-chart.
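CLp = p̄ = 0.068
$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.068 + 3\sqrt{\frac{0.068(1 - 0.068)}{200}} = 0.068 + 0.053 = 0.121$$
$$LCL_p = \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.068 - 3\sqrt{\frac{0.068(1 - 0.068)}{200}} = 0.068 - 0.053 = 0.015.$$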
Sample No.   No. of defectives   Sample No.   No. of defectives
1            8                   9            10
2            10                  10           13
3            13                  11           18
4            9                   12           15
5            8                   13           12
6            10                  14           14
7            14                  15           9
8            6
Construct a control chart for fraction defective, and examine whether the process
is under control.
Solution.
Sample No.   No. of defectives (d)   Fraction defective p = d/100
1            8                       0.08
2            10                      0.10
3            13                      0.13
4            9                       0.09
5            8                       0.08
6            10                      0.10
7            14                      0.14
8            6                       0.06
9            10                      0.10
10           13                      0.13
11           18                      0.18
12           15                      0.15
13           12                      0.12
14           14                      0.14
15           9                       0.09
$$\bar{p} = \frac{169}{15 \times 100} = 0.113$$
Control limits for p-chart
CLp = p̄ = 0.113
$$UCL_p = \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.113 + 3\sqrt{\frac{0.113(1 - 0.113)}{100}} = 0.113 + 0.095 = 0.208$$
(p-chart)
It is clear from the figure that all the values of the fraction defective p lie within the UCL
and LCL, so the process is under control.
Example 10. The following data refers to visual defects found during the
inspection of the first 10 samples of size 50 each from a lot of two-wheelers manufactured
by an automobile company :
Sample No. 1 2 3 4 5 6 7 8 9 10
No. of defectives 4 3 2 3 4 4 4 1 3 2
Draw the p-chart and examine whether the process is under control.
Solution.
Sample No.   No. of defectives (d)   Fraction defective p = d/50
1            4                       0.08
2            3                       0.06
3            2                       0.04
4            3                       0.06
5            4                       0.08
6            4                       0.08
7            4                       0.08
8            1                       0.02
9            3                       0.06
10           2                       0.04
[Figure: p-chart for Example 10 with UCL = 0.1608, CL = 0.06, LCL = 0; horizontal axis: sample No.; vertical axis: fraction defectives]
It is clear from the figure that all the values of the fraction defective p lie within the UCL
and LCL, so the process is under control.
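The p-chart limits can be sketched in Python as below; the function name `p_chart_limits` is our own. With the Example 10 data it gives CL = 0.06, UCL ≈ 0.1608 and a negative lower limit that is set to zero, and no sample falls outside. Calling `p_chart_limits(0.068, 200)` reproduces the limits 0.121 and 0.015 of Example 8 after rounding.

```python
from math import sqrt


def p_chart_limits(p_bar, n):
    """Return (CL, UCL, LCL) = p-bar, p-bar +/- 3 sqrt(p-bar (1 - p-bar) / n); negative LCL -> 0."""
    sigma_p = sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, p_bar + 3 * sigma_p, max(0.0, p_bar - 3 * sigma_p)


n = 50
defectives = [4, 3, 2, 3, 4, 4, 4, 1, 3, 2]
p_bar = sum(defectives) / (n * len(defectives))     # 30 / 500 = 0.06
cl, ucl, lcl = p_chart_limits(p_bar, n)
fractions = [d / n for d in defectives]
out = [i for i, p in enumerate(fractions, start=1) if not lcl <= p <= ucl]
print(round(cl, 4), round(ucl, 4), lcl, out)        # 0.06 0.1608 0.0 []
```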
Example 11. Ten samples of hourly production of a mass produced items are
taken and the number of defectives in each sample are noted. On the basis of these data,
obtain the control limits of the control chart for fraction defectives.
Sample No. 1 2 3 4 5 6 7 8 9 10
Size of sample 148 160 155 156 161 167 164 160 156 173
No. of defectives 7 6 8 8 5 9 8 8 7 10
Solution. Here, the sample sizes are different, so the average sample size has to
be determined first :
Number of items examined = 1600
Number of samples = 10
$$\text{Average sample size } (\bar{n}) = \frac{1600}{10} = 160$$
Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
$$n\bar{p} = \frac{166}{15} = 11.067, \qquad \bar{p} = \frac{166}{1000 \times 15} = 0.011$$
Control limits for np-chart
CLnp = n p̄ = 11.067
(np-chart)
It is clear from the figure that all the numbers of defectives lie within the UCL and
LCL, so the process is under control.
Example 13. An inspection of 10 samples of size 400 each from 10 lots revealed
the following number of defectives 17, 15, 14, 26, 9, 4, 19, 12, 9, 15.
Draw the control chart for number of defectives and examine whether the process
is under control.
Solution. Here, n = 400 and k = 10.
$$n\bar{p} = \frac{\text{Total number of defectives}}{\text{Number of samples inspected}} = \frac{140}{10} = 14, \qquad \bar{p} = \frac{140}{400 \times 10} = 0.035$$
Control limits for np-chart
CLnp = 14
$$UCL_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})} = 14 + 3\sqrt{14(1 - 0.035)} = 14 + 11.027 = 25.027$$
$$LCL_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})} = 14 - 3\sqrt{14(1 - 0.035)} = 14 - 11.027 = 2.973$$
[Figure: np-chart for Example 13 with UCL = 25.027, CL = 14, LCL = 2.973; horizontal axis: sample No.; vertical axis: No. of defectives]
It is clear from the figure that the point corresponding to the 4th sample (26 defectives)
lies above the UCL, so the process is not under control.
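A Python sketch of the np-chart procedure (the function name is illustrative): it estimates n p̄ from the sample counts, applies n p̄ ± 3√(n p̄(1 − p̄)) with a negative LCL set to zero, and flags the samples outside the limits. For Example 13 it flags sample 4; with the Example 14 data it gives UCL = 6.2 and LCL = 0.

```python
from math import sqrt


def np_chart_limits(defectives, n):
    """Return (CL, UCL, LCL) for an np-chart from defective counts in samples of size n."""
    k = len(defectives)
    np_bar = sum(defectives) / k          # average number of defectives per sample
    p_bar = np_bar / n
    sigma = sqrt(np_bar * (1 - p_bar))
    return np_bar, np_bar + 3 * sigma, max(0.0, np_bar - 3 * sigma)


defectives = [17, 15, 14, 26, 9, 4, 19, 12, 9, 15]     # samples of size 400
cl, ucl, lcl = np_chart_limits(defectives, n=400)
out = [i for i, d in enumerate(defectives, start=1) if not lcl <= d <= ucl]
print(cl, round(ucl, 3), round(lcl, 3), out)            # 14.0 25.027 2.973 [4]
```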
Example 14. An inspection of 10 samples of size 100 each revealed the following
data :
Sample No.          1   2   3   4   5   6   7   8   9   10
No. of defectives   2   1   1   3   2   3   4   2   2   0
Draw the control chart for number of defectives (np-chart) and examine whether
the process is under control.
Solution. Here, n = 100, k = 10.
$$n\bar{p} = \frac{\text{Total number of defectives}}{\text{Number of samples inspected}} = \frac{20}{10} = 2, \qquad \bar{p} = \frac{20}{100 \times 10} = 0.02$$
Control limits for np-chart
CLnp = n p̄ = 2
$$UCL_{np} = n\bar{p} + 3\sqrt{n\bar{p}(1 - \bar{p})} = 2 + 3\sqrt{2(1 - 0.02)} = 2 + 4.2 = 6.2$$
$$LCL_{np} = n\bar{p} - 3\sqrt{n\bar{p}(1 - \bar{p})} = 2 - 4.2 < 0, \text{ so it is taken as } 0$$
[Figure: np-chart for Example 14 with UCL = 6.2, CL = 2, LCL = 0; horizontal axis: sample No.; vertical axis: No. of defectives]
It is clear from the figure that all the numbers of defectives lie within the UCL and
LCL, so the process is under control.
$$\bar{c} = \frac{2 + 3 + 4 + 0 + 5 + 6 + 7 + 4 + 3 + 2}{10} = \frac{36}{10} = 3.6$$
Control limits for c-chart
CLc = c̄ = 3.6
[Figure: c-chart for Example 15 with UCL = 9.292, CL = 3.6, LCL = 0; horizontal axis: sample No.; vertical axis: number of defects]
It is clear from the figure that all the values of the number of defects (c) lie within the
UCL and LCL, so the process is under control.
Example 16. The number of complaints received daily by an organization are as
follows :
Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Complaints 2 3 0 1 9 2 0 0 4 2 0 7 0 2 4
Draw a suitable control chart and examine whether the process is under control.
Solution. For the given problem, the suitable control chart is the c-chart. Let the
number of complaints be denoted by c.
$$\text{Here, } \bar{c} = \frac{\text{Total number of complaints}}{\text{Number of days}} = \frac{36}{15} = 2.4$$
[Figure: c-chart for Example 16 with UCL = 7.048, CL = 2.4, LCL = 0; horizontal axis: day's number; vertical axis: No. of complaints]
It is clear from the figure that the value of c corresponding to the 5th day (9 complaints) is not
within the control limits, so the process is out of control.
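The c-chart limits c̄ ± 3√c̄ (stated with Example 17) can be sketched in Python as follows; the function name `c_chart_limits` is our own, and a negative LCL is set to zero. For the complaints data it flags day 5 only; for the Example 15 counts it gives UCL ≈ 9.292.

```python
from math import sqrt


def c_chart_limits(counts):
    """Return (CL, UCL, LCL) = c-bar, c-bar +/- 3 sqrt(c-bar); a negative LCL is set to zero."""
    c_bar = sum(counts) / len(counts)
    return c_bar, c_bar + 3 * sqrt(c_bar), max(0.0, c_bar - 3 * sqrt(c_bar))


complaints = [2, 3, 0, 1, 9, 2, 0, 0, 4, 2, 0, 7, 0, 2, 4]
cl, ucl, lcl = c_chart_limits(complaints)
out = [day for day, c in enumerate(complaints, start=1) if not lcl <= c <= ucl]
print(cl, round(ucl, 3), lcl, out)      # 2.4 7.048 0.0 [5]
```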
Example 17. The following table shows the number of missing rivets observed
at the same time of the inspection of 12 aircrafts. Find the control limits for the number
of defects chart and comment on the state of control.
Aircraft Number 1 2 3 4 5 6 7 8 9 10 11 12
$$\bar{c} = \frac{168}{12} = 14$$
Control limits for c-chart
CLc = c̄ = 14
$$UCL_c = \bar{c} + 3\sqrt{\bar{c}} = 14 + 3\sqrt{14} = 14 + 11.225 = 25.23$$
$$LCL_c = \bar{c} - 3\sqrt{\bar{c}} = 14 - 3\sqrt{14} = 14 - 11.225 = 2.775$$
[Figure: c-chart for Example 17 with UCL = 25.23, CL = 14, LCL = 2.775; horizontal axis: aircraft no.; vertical axis: No. of missing rivets]
It is clear from the figure that all the values of missing rivets lie within the UCL
and LCL, so the process is under control.
EXERCISE 8.1
Sample No. 1 2 3 4 5 6 7 8 9 10
Mean (x̄) 15 17 15 18 17 14 18 15 17 16
Range (R) 7 7 4 9 8 7 12 4 11 5
Calculate the control limits of x̄ and R-charts and comment on the state of control.
3. A company manufactures a product which is packed in cans. It uses automatic
filling equipment. It takes a sample of 5 cans every hour and measures the filling (grams).
The results of the last 5 samples are :
1 2 3 4 5
Calculate the control limits of the $\bar{x}$- and R-charts and comment on the state of control.
Sample No. 1 2 3 4 5 6 7 8 9 10
Mean ($\bar{x}$) 11.2 11.8 10.8 11.6 11.0 9.6 10.4 9.6 10.6 10.0
Range (R) 7 4 8 5 7 4 8 4 7 9
Construct the $\bar{x}$- and R-charts and examine whether the process is under control.
(Given for n = 5: $A_2 = 0.577$, $D_3 = 0$, $D_4 = 2.115$)
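As an illustration (not part of the original exercise), the following Python sketch applies the standard formulas $CL_{\bar{x}} = \bar{\bar{x}}$, $UCL_{\bar{x}} = \bar{\bar{x}} + A_2\bar{R}$, $LCL_{\bar{x}} = \bar{\bar{x}} - A_2\bar{R}$ and $CL_R = \bar{R}$, $UCL_R = D_4\bar{R}$, $LCL_R = D_3\bar{R}$ to the data above, using the constants given for n = 5. The helper names `xbar_r_limits` and `outside` are hypothetical.

```python
def xbar_r_limits(means, ranges, a2, d3, d4):
    """Hypothetical helper: 3-sigma limits (CL, UCL, LCL) for x-bar and R charts via A2, D3, D4."""
    xbarbar = sum(means) / len(means)   # grand mean: centre line of the x-bar chart
    rbar = sum(ranges) / len(ranges)    # mean range: centre line of the R chart
    xbar_chart = (xbarbar, xbarbar + a2 * rbar, xbarbar - a2 * rbar)
    r_chart = (rbar, d4 * rbar, d3 * rbar)
    return xbar_chart, r_chart

def outside(values, limits):
    """Hypothetical helper: sample numbers (from 1) lying outside [LCL, UCL]; limits = (CL, UCL, LCL)."""
    _, ucl, lcl = limits
    return [i for i, v in enumerate(values, start=1) if not lcl <= v <= ucl]

# Data of the exercise above: 10 samples, each of size 5
means = [11.2, 11.8, 10.8, 11.6, 11.0, 9.6, 10.4, 9.6, 10.6, 10.0]
ranges = [7, 4, 8, 5, 7, 4, 8, 4, 7, 9]

xbar_chart, r_chart = xbar_r_limits(means, ranges, a2=0.577, d3=0, d4=2.115)
print("x-bar chart (CL, UCL, LCL):", [round(v, 4) for v in xbar_chart])
print("R chart     (CL, UCL, LCL):", [round(v, 4) for v in r_chart])
print("x-bar samples outside:", outside(means, xbar_chart))
print("R samples outside:", outside(ranges, r_chart))
```

The constants $A_2$, $D_3$, $D_4$ depend only on the sample size, so the same function serves any n once the matching constants are supplied.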
5. The following data show the values of the sample mean $\bar{x}$ and range R for 10 samples of size
5 each.
Sample No. 1 2 3 4 5 6 7 8 9 10
Mean ($\bar{x}$) 43 49 37 44 45 37 51 46 43 47
Range (R) 5 6 5 7 7 4 8 6 4 6
Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
No. of defectives 3 3 3 3 1 1 1 1 6 1 1 1 5 4 6
Day 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
No. of defectives 3 6 2 7 3 2 3 6 1 2 3 1 4 4 5
Calculate the central line, upper control limit and lower control limit for the p-chart.
8. A daily sample of 30 items was taken over a period of 14 days in order to establish
attributes control limits. If 21 defectives were found, what should be the upper and
lower control limits of the proportion of defectives ?
9. In a factory producing an item, the numbers of items rejected in the inspection of 20 lots
of size 100 each are given below :
Lot No. No. rejected Fraction rejected Lot No. No. rejected Fraction rejected
1 5 0.050 11 4 0.040
2 10 0.100 12 7 0.070
3 12 0.120 13 8 0.080
4 8 0.080 14 2 0.020
5 6 0.060 15 3 0.030
6 5 0.050 16 4 0.040
7 6 0.060 17 5 0.050
8 3 0.030 18 8 0.080
9 3 0.030 19 6 0.060
10 5 0.050 20 10 0.100
Construct an appropriate control chart and state whether the process is in control.
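For readers who wish to check their work, here is a minimal Python sketch (not from the original text) of a p-chart for the lot data above. It assumes the standard limits $\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n}$ with $\bar{p}$ = total rejected / total inspected and a negative LCL set to 0; the function name `p_chart_limits` is hypothetical.

```python
import math

def p_chart_limits(defectives, n):
    """Hypothetical helper: (CL, UCL, LCL) for a p-chart with constant sample size n, LCL floored at 0."""
    p_bar = sum(defectives) / (n * len(defectives))        # overall fraction rejected
    spread = 3 * math.sqrt(p_bar * (1 - p_bar) / n)        # 3 standard errors of a sample proportion
    return p_bar, p_bar + spread, max(0.0, p_bar - spread)

# Exercise 9: number rejected in each of 20 lots of size 100 (lots 1 to 20)
rejected = [5, 10, 12, 8, 6, 5, 6, 3, 3, 5,
            4, 7, 8, 2, 3, 4, 5, 8, 6, 10]
n = 100

cl, ucl, lcl = p_chart_limits(rejected, n)
fractions = [r / n for r in rejected]                      # fraction rejected per lot
out_lots = [lot for lot, p in enumerate(fractions, start=1) if not lcl <= p <= ucl]
print(f"CL = {cl:.4f}, UCL = {ucl:.4f}, LCL = {lcl:.4f}")
print("Lots outside the limits:", out_lots)
```

The sketch relies on every lot having the same size (100 here); with varying lot sizes the limits would have to be recomputed for each lot using its own n.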
11. The following data refer to the number of defectives found during inspection of the first
10 samples of size 100 each.
Sample No. 1 2 3 4 5 6 7 8 9 10
No. of defectives 4 8 11 3 11 7 7 16 12 6
Calculate the control limits for the np-chart and state whether the process is in control.
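A minimal Python sketch of the np-chart calculation for this exercise follows; it is an illustration, not the book's method. It assumes the standard limits $n\bar{p} \pm 3\sqrt{n\bar{p}(1-\bar{p})}$, where $n\bar{p}$ is the average number of defectives per sample, with a negative LCL set to 0. The function name `np_chart_limits` is hypothetical.

```python
import math

def np_chart_limits(defectives, n):
    """Hypothetical helper: (CL, UCL, LCL) for an np-chart with constant sample size n, LCL floored at 0."""
    np_bar = sum(defectives) / len(defectives)    # average number of defectives per sample
    p_bar = np_bar / n                            # average fraction defective
    spread = 3 * math.sqrt(np_bar * (1 - p_bar))  # 3-sigma width of a binomial count
    return np_bar, np_bar + spread, max(0.0, np_bar - spread)

# Exercise 11: defectives in 10 samples of size 100
defectives = [4, 8, 11, 3, 11, 7, 7, 16, 12, 6]
cl, ucl, lcl = np_chart_limits(defectives, n=100)
out = [i for i, d in enumerate(defectives, start=1) if not lcl <= d <= ucl]
print(f"CL = {cl:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
print("Samples outside the limits:", out)
```

Here the LCL stays slightly above 0, so no flooring is needed; sample 8 (16 defectives) comes closest to the UCL.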
12. Twenty samples, each of size 10, were inspected. The number of defectives found in each
of them is given below :
Sample No. 1 2 3 4 5 6 7 8 9 10
No. of defectives 0 1 0 3 9 2 0 7 0 1
Sample No. 11 12 13 14 15 16 17 18 19 20
No. of defectives 1 0 0 3 1 0 0 2 0 0
Sample No. 1 2 3 4 5 6 7 8 9 10
No. of defectives 4 8 11 3 11 7 7 16 12 6
Obtain the upper and lower control limits of the np-chart and state whether the process is in
control.
14. The following data refer to number of defectives found on 24 consecutive production
days in daily samples of 400 items :
Production day 1 2 3 4 5 6 7 8 9 10 11 12
No. of defectives 20 10 20 24 22 18 38 8 24 54 50 18
Production day 13 14 15 16 17 18 19 20 21 22 23 24
No. of defectives 24 30 16 28 20 8 22 22 52 6 20 22
Day 1 2 3 4 5 6 7 8 9 10
Complaints 2 3 4 0 5 6 7 4 3 2
Draw a control chart for the number of defects and comment on whether the process is in control.
17. The numbers of mistakes made by an accounts clerk are as follows :
Week No. 1 2 3 4 5 6 7 8 9 10
No. of mistakes 1 0 2 0 1 0 1 0 1 2
Week No. 11 12 13 14 15 16 17 18 19 20
No. of mistakes 3 3 1 0 0 7 1 0 1 0
Draw an appropriate control chart and state whether the clerk's mistakes are under
control.
Answers
1. For $\bar{x}$-chart : CL = 2.06415, UCL = 2.076, LCL = 2.0522
For R-chart : CL = 0.01675, UCL = 0.03819, LCL = 0
2. For $\bar{x}$-chart : CL = 16.2, UCL = 20.492, LCL = 11.908
For R-chart : CL = 7.4, UCL = 12.3, LCL = 0
Process is under control by both charts.
3. For $\bar{x}$-chart : CL = 999.32, UCL = 1001.772, LCL = 996.872
For R-chart : CL = 4.4, UCL = 9.306, LCL = 0
Process under control using both $\bar{x}$- and R-charts.
4. For $\bar{x}$-chart : CL = 10.66, UCL = 14.295, LCL = 7.025 ; process under control
For R-chart : CL = 6.3, UCL = 13.3245, LCL = 0 ; process under control.
5. For $\bar{x}$-chart : CL = 44.2, UCL = 47.564, LCL = 40.836; out of control
For R-chart : CL = 5.8, UCL = 12.267, LCL = 0; under control
6. CLp = 0.1537, UCLp = 0.17788, LCLp = 0.1295.
7. CL = 0.184, UCL = 0.236, LCL = 0.132.
8. CLp = 0.05, UCLp = 0.17, LCLp = 0.
9. CLp = 0.06, UCLp = 0.1311, LCLp = 0 ; under control.
10. CLp = 0.186, UCLp = 0.303, LCLp = 0.069 ; out of control.
11. CLnp = 8.5, UCLnp = 16.87, LCLnp = 0.13; under control.
12. CLnp = 1.5, UCLnp = 4.89, LCLnp = 0 ; out of control.
13. UCLnp = 16.87, LCLnp = 0.13 ; under control.
14. CLnp = 24, UCLnp = 38.25, LCLnp = 9.75 ; out of control.
15. CLc = 3.6, UCLc = 8.38, LCLc = 0 ; under control.
16. CLc = 3.6, UCLc = 9.292, LCLc = 0; under control.
17. CLc = 1.2, UCLc = 4.49, LCLc = 0 ; not under control.