PA Lec 8 2024


PROBABILITY ANALYSIS

Session 08

Dr. Aditya Kumar Sahu


◼ Quick Recap!

Learning Outcome

1. Discrete Probability Distributions:

❑ Bernoulli Distribution
❑ Binomial Distribution
❑ Case: Go Bananas!
❑ Geometric Distribution
❑ Poisson Distribution
❑ Hypergeometric Distribution

◼ Let the random variable X1 represent the number of PGP students arriving at Nescafe between 2 PM and 4 PM.
◼ Let the random variable X2 represent the number of IPM students arriving at Nescafe between 2 PM and 4 PM.

◼ Let Y be the number of IPM and PGP students jointly arriving at the IIM Rohtak Nescafe store in that period. Then Y = X1 + X2.

Discrete Probability Distributions

◼ In addition to tables and graphs, a formula that gives the probability function, f(x), for every value of x is often used to describe a probability distribution.

◼ Several discrete probability distributions are specified by formulas, including the:
❑ Discrete-uniform,
❑ Binomial,
❑ Poisson, and
❑ Hypergeometric distributions.

Distribution of Random Variable

Bernoulli Random Variable

• If an experiment consists of a single trial and the outcome of the trial can only be either a success* or a failure, then the trial is called a Bernoulli trial.

• The number of successes X in one Bernoulli trial, which can be 1 or 0, is a Bernoulli random variable.

• If p is the probability of success in a Bernoulli experiment, then P(X = 1) = p and P(X = 0) = 1 – p.

• $f(x) = p^{x}(1 - p)^{1 - x}$, for x = 0, 1
• E(X) = p and Var(X) = p(1 – p).

* The terms success and failure are simply statistical terms, and do not have positive or
negative implications. In a production setting, finding a defective product may be
termed a “success,” although it is not a positive result.
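The Bernoulli probability function and its moments can be checked numerically. The sketch below is illustrative only: the function name `bernoulli_pmf` and the value p = 0.3 are assumptions chosen for the example, not part of the lecture.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """f(x) = p^x (1 - p)^(1 - x), defined for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable takes only the values 0 and 1")
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # hypothetical probability of success

# Mean and variance computed directly from the probability function.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(f"E(X)   = {mean:.2f}")      # should agree with p
print(f"Var(X) = {variance:.2f}")  # should agree with p(1 - p)
```

With p = 0.3 the computed mean is 0.30 and the variance is 0.21, matching E(X) = p and Var(X) = p(1 – p).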
Binomial Probability Distribution
◼ Four Properties of a Binomial Experiment
1. The experiment consists of a sequence of n identical Bernoulli trials.
2. Two outcomes, success and failure, are possible on each trial.
3. The probability of a success, denoted by p, does not change from trial
to trial. (This is referred to as the stationarity assumption.)
4. The trials are independent.

Binomial Probability Distribution

• Our interest is in the number of successes occurring in the n trials.

• We let x denote the number of successes occurring in the n trials.

Binomial Probability Distribution
• Binomial Probability Function

$$f(x) = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$$

where:
x = the number of successes
p = the probability of a success on one trial
n = the number of trials
f(x) = the probability of x successes in n trials
n! = n(n – 1)(n – 2) . . . (2)(1)
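A minimal Python sketch of the binomial probability function follows. The name `binomial_pmf` and the values n = 10, p = 0.2 are hypothetical, chosen only to illustrate the formula. The script also compares the mean and variance computed from the probabilities with np and np(1 – p).

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of x successes in n independent trials with success probability p."""
    # comb(n, x) equals n! / (x! (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.2  # hypothetical number of trials and success probability

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))

print(f"sum of f(x) = {total:.6f}")
print(f"mean        = {mean:.6f}  (np       = {n * p:.6f})")
print(f"variance    = {variance:.6f}  (np(1-p)  = {n * p * (1 - p):.6f})")
```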

Mean, Variance, and Standard Deviation of the Binomial Distribution
• Expected Value
$$E(x) = \mu = np$$

• Variance
$$\mathrm{Var}(x) = \sigma^{2} = np(1 - p)$$

• Standard Deviation
$$\sigma = \sqrt{np(1 - p)}$$

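Geometric Distribution
(a special case of the Pascal, or negative binomial, distribution)

The probability that the first success occurs on trial x is

$$f(x) = p(1 - p)^{x - 1}, \quad x = 1, 2, 3, \ldots$$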

Mean, Variance, and Standard Deviation of the Geometric Distribution
• Expected Value
$$E(x) = \mu = \frac{1}{p}$$

• Variance
$$\mathrm{Var}(x) = \sigma^{2} = \frac{1 - p}{p^{2}}$$

• Standard Deviation
$$\sigma = \sqrt{\frac{1 - p}{p^{2}}} = \frac{\sqrt{1 - p}}{p}$$
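The geometric formulas can be checked by summing the probability function over a long but finite range of trials. This is a sketch under stated assumptions: `geometric_pmf`, the value p = 0.25, and the truncation at 2,000 trials are all illustrative choices, and the sums are numerical approximations of the infinite series.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x counts trials, so it must be at least 1")
    return p * (1 - p) ** (x - 1)


p = 0.25             # hypothetical probability of success
xs = range(1, 2001)  # truncation point; the remaining tail is negligible here

total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
variance = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)

print(f"sum of f(x) = {total:.6f}")
print(f"mean        = {mean:.6f}  (1/p        = {1 / p:.6f})")
print(f"variance    = {variance:.6f}  ((1-p)/p^2  = {(1 - p) / p**2:.6f})")
```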

The Poisson Distribution

• A Poisson-distributed random variable is often useful in estimating the number of occurrences over a specified interval of time or space.

• It is a discrete random variable that may assume an infinite sequence of values (x = 0, 1, 2, . . . ).

The Poisson Distribution

◼ Describes discrete occurrences over an interval.

◼ Is also used to describe the number of rare events in a fixed interval.

◼ Each occurrence is independent of every other occurrence.

◼ The number of occurrences in an interval can range from zero upward without limit.

◼ The probability of an occurrence is the same for any two intervals of equal length.

Poisson Distribution - Applications

◼ Arrivals at queuing systems
❑ airports -- people, airplanes, automobiles, baggage
❑ banks -- people, automobiles, loan applications
❑ computer file servers -- read and write operations

◼ Defects in manufactured goods
❑ number of defects per 1,000 feet of extruded copper wire
❑ number of blemishes per square foot of painted surface
❑ number of errors per typed page
Poisson Probability Distribution

◼ Poisson Probability Function

$$f(x) = \frac{\mu^{x} e^{-\mu}}{x!}$$

where:
x = the number of occurrences in an interval
f(x) = the probability of x occurrences in an interval
μ = mean number of occurrences in an interval
e ≈ 2.71828
x! = x(x – 1)(x – 2) . . . (2)(1)

◼ A property of the Poisson distribution is that the mean and variance are equal:

$$\mu = \sigma^{2}$$

Poisson Distribution - Formula

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$

where:
x = number of events in an area of opportunity
λ = expected number of events/average arrival rate
e = base of the natural logarithm system (2.71828...)
Poisson Distribution - Characteristics

◼ If X ~ Poisson(λ), then
◼ Mean: $\mu = E(X) = \lambda$
◼ Variance: $\sigma^{2} = \lambda$
◼ Standard Deviation: $\sigma = \sqrt{\lambda}$
Caselet 1.5

◼ A life insurance salesman sells, on average, 3 life insurance policies per week. Assuming a Poisson distribution, calculate the probability that in a given week he will sell more than 1 policy but fewer than 5 policies.
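One way to work Caselet 1.5 is sketched below. It reads "more than 1 but fewer than 5" as X = 2, 3, or 4 and sums the Poisson probabilities with mean 3. The function name `poisson_pmf` is illustrative, not from the lecture.

```python
from math import exp, factorial


def poisson_pmf(x: int, mean: float) -> float:
    """Probability of exactly x occurrences when the mean number of occurrences is `mean`."""
    return mean**x * exp(-mean) / factorial(x)


weekly_mean = 3  # average policies sold per week, from the caselet

# More than 1 but fewer than 5 policies means x = 2, 3, or 4.
prob = sum(poisson_pmf(x, weekly_mean) for x in (2, 3, 4))

print(f"P(1 < X < 5) = {prob:.4f}")
```

The three terms are about 0.2240, 0.2240, and 0.1680, giving a probability of roughly 0.6161.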
Caselet 1.6

◼ A mainframe computer in a university crashes on average 0.71 times in a semester.
a) What is the probability that it will crash at most two times in a given semester?
b) What is the probability that it will not crash at all in a given semester?
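Caselet 1.6 can be worked the same way, assuming crashes follow a Poisson distribution with mean 0.71 per semester. The helper name `poisson_pmf` is again an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(x: int, mean: float) -> float:
    """Probability of exactly x occurrences when the mean number of occurrences is `mean`."""
    return mean**x * exp(-mean) / factorial(x)


crash_mean = 0.71  # average crashes per semester, from the caselet

# (a) At most two crashes: P(X = 0) + P(X = 1) + P(X = 2).
p_at_most_two = sum(poisson_pmf(x, crash_mean) for x in range(3))

# (b) No crash at all: P(X = 0).
p_none = poisson_pmf(0, crash_mean)

print(f"(a) P(X <= 2) = {p_at_most_two:.4f}")
print(f"(b) P(X = 0)  = {p_none:.4f}")
```

The results are approximately 0.9646 for part (a) and 0.4916 for part (b).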

Hypergeometric Probability Function

◼ The binomial distribution is appropriate when you sample with replacement.
❑ The probability of success does not change from trial to trial.
❑ The trials are independent.

◼ Sampling without replacement: after an item is drawn, it is not put back for subsequent draws.
❑ The trials are not independent.
❑ The probability of success changes from trial to trial.

◼ Use the hypergeometric distribution in place of the binomial distribution when sampling without replacement.
❑ It gives the number of successes in a two-outcome experiment.
❑ The trials are not independent of one another.
◼ Consider a box full of production items, of which 10% are known to be defective.
Let success be labeled as the draw of a defective item.
◼ The probability of success may not be the same from trial to trial; it will depend
on the size of the population and whether the sampling was done with or
without replacement.
◼ Suppose the box consists of 20 items of which 10%, or 2, are defective. The
probability of success in the first draw is 0.10 (= 2/20).
◼ However, the probability of success in subsequent draws will depend on the
outcome of the first draw.
◼ For example, if the first item was defective, the probability of success in the
second draw will be 0.0526 (= 1/19), while if the first item was not defective, the
probability of success in the second draw will be 0.1053 (= 2/19).
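The draw-by-draw probabilities in this example can be reproduced with exact fractions. The variable names below are illustrative. The last line is an added check using the law of total probability: before the first draw is observed, the chance that the second item is defective is still 2/20.

```python
from fractions import Fraction

total_items = 20  # items in the box
defective = 2     # 10% of the box

# First draw.
p_first = Fraction(defective, total_items)

# Second draw, conditional on the outcome of the first (no replacement).
p_second_given_defective = Fraction(defective - 1, total_items - 1)
p_second_given_good = Fraction(defective, total_items - 1)

# Unconditional probability that the second item is defective.
p_second = (p_first * p_second_given_defective
            + (1 - p_first) * p_second_given_good)

print(f"P(defective on draw 1)               = {float(p_first):.4f}")
print(f"P(defective on draw 2 | 1 defective) = {float(p_second_given_defective):.4f}")
print(f"P(defective on draw 2 | 1 good)      = {float(p_second_given_good):.4f}")
print(f"P(defective on draw 2)               = {float(p_second):.4f}")
```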

◼ The probability of x successes in a random selection of n items is

$$f(x) = P(X = x) = \frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}}$$

❑ N is the population size; n is the sample size (number of trials).
❑ r is the number of elements in the population labeled success.
❑ x is the number of successes in the sample.

◼ The formula consists of three parts:
❑ $\binom{r}{x}$: the number of ways to select x successes from r population successes
❑ $\binom{N - r}{n - x}$: the number of ways to select n − x failures from N − r population failures
❑ $\binom{N}{n}$: the number of ways a sample of size n can be selected from a population of size N
Hypergeometric Probability Distribution

$$f(x) = \frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}} \quad \text{for } 0 \le x \le r$$

❑ Numerator, first factor: the number of ways x successes can be selected from a total of r successes in the population
❑ Numerator, second factor: the number of ways n – x failures can be selected from a total of N – r failures in the population
❑ Denominator: the number of ways n elements can be selected from a population of size N

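◼ $E(X) = \mu = n\left(\frac{r}{N}\right)$

◼ $\mathrm{Var}(X) = \sigma^{2} = n\left(\frac{r}{N}\right)\left(1 - \frac{r}{N}\right)\left(\frac{N - n}{N - 1}\right)$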

Caselet 1.7

◼ Wooden boxes are commonly used for the packaging and transportation of
mangoes. A convenience store in Morganville, New Jersey, regularly buys
mangoes from a wholesale dealer. For every shipment, the manager randomly
inspects five mangoes from a box containing 20 mangoes for damages due to
transportation. Suppose the chosen box contains exactly two damaged mangoes.
1. What is the probability that one out of five mangoes used in the inspection is
damaged?
2. If the manager decides to reject the shipment if one or more of the mangoes are
damaged, what is the probability that the shipment will be rejected?
3. Calculate the expected value, the variance, and the standard deviation of the
number of damaged mangoes used in the inspection.

◼ Inspect five mangoes from a box containing 20 mangoes with exactly two damaged mangoes, so N = 20, r = 2, and n = 5.

◼ What is the probability that one out of the five mangoes is damaged?

$$P(X = 1) = \frac{\binom{2}{1}\binom{20 - 2}{5 - 1}}{\binom{20}{5}} = \frac{2 \times 3060}{15504} = 0.3947$$

◼ If the manager decides to reject the shipment if one or more of the mangoes are damaged, what is the probability that the shipment will be rejected?

$$P(X = 0) = \frac{\binom{2}{0}\binom{20 - 2}{5 - 0}}{\binom{20}{5}} = \frac{8568}{15504} = 0.5526$$

$$P(X \ge 1) = 1 - P(X = 0) = 1 - 0.5526 = 0.4474$$

◼ Calculate the expected value, the variance, and the standard deviation.
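All three parts of Caselet 1.7 can be checked with a short script. This is a sketch, not the lecture's own solution: the function name `hypergeom_pmf` is illustrative, and the mean and variance lines apply the hypergeometric formulas E(X) = n(r/N) and Var(X) = n(r/N)(1 – r/N)(N – n)/(N – 1) with N = 20, r = 2, n = 5.

```python
from math import comb, sqrt


def hypergeom_pmf(x: int, N: int, r: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items that contains r successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)


N, r, n = 20, 2, 5  # box size, damaged mangoes, mangoes inspected (from the caselet)

# Part 1: exactly one damaged mango in the sample.
p_one_damaged = hypergeom_pmf(1, N, r, n)

# Part 2: shipment rejected if one or more damaged mangoes are found.
p_reject = 1 - hypergeom_pmf(0, N, r, n)

# Part 3: expected value, variance, and standard deviation.
mean = n * r / N
variance = n * (r / N) * (1 - r / N) * (N - n) / (N - 1)
std_dev = sqrt(variance)

print(f"P(X = 1)  = {p_one_damaged:.4f}")
print(f"P(X >= 1) = {p_reject:.4f}")
print(f"E(X)      = {mean:.4f}")
print(f"Var(X)    = {variance:.4f}")
print(f"SD(X)     = {std_dev:.4f}")
```

For part 3 this gives E(X) = 0.5, Var(X) ≈ 0.3553, and a standard deviation of about 0.596 damaged mangoes.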

Quick Revision
Practice: Bayes Theorem

◼ A consulting firm submitted a bid for a large research project. The firm’s
management initially felt they had a 50-50 chance of getting the project.
However, the agency to which the bid was submitted subsequently requested
additional information on the bid. Past experience indicates that for 75% of the
successful bids and 40% of the unsuccessful bids the agency requested additional
information.
◼ What is the prior probability of the bid being successful (that is, prior to the
request for additional information)?
◼ What is the conditional probability of a request for additional information given
that the bid will ultimately be successful?
◼ Compute the posterior probability that the bid will be successful given a request
for additional information.

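Let S1 = bid successful, S2 = bid not successful, and B = request received for additional information.

a. P(S1) = .50

b. P(B | S1) = .75

c. By Bayes' theorem,

$$P(S_1 \mid B) = \frac{P(S_1)\,P(B \mid S_1)}{P(S_1)\,P(B \mid S_1) + P(S_2)\,P(B \mid S_2)} = \frac{(.50)(.75)}{(.50)(.75) + (.50)(.40)} = \frac{.375}{.575} = .65$$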
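The same posterior can be computed step by step in code. The variable names are illustrative. The numbers are the ones given in the practice problem.

```python
# Prior probabilities of the two outcomes.
p_success = 0.50
p_failure = 1 - p_success

# Likelihood of a request for additional information under each outcome.
p_request_given_success = 0.75
p_request_given_failure = 0.40

# Total probability of receiving a request.
p_request = (p_success * p_request_given_success
             + p_failure * p_request_given_failure)

# Bayes' theorem: posterior probability of success given a request.
posterior_success = p_success * p_request_given_success / p_request

print(f"P(B)      = {p_request:.3f}")
print(f"P(S1 | B) = {posterior_success:.3f}")
```

The request raises the probability of a successful bid from the prior of 0.50 to about 0.65.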

Measures of Association Between Two Variables

• Thus far we have examined numerical methods used to summarize the data for one variable at a time.

• Often a manager or decision maker is interested in the relationship between two variables.

• Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.

Covariance

• The covariance is a measure of the linear association between two variables.

• Positive values indicate a positive relationship.

• Negative values indicate a negative relationship.

• The covariance is computed as follows:

For samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

For populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

Correlation Coefficient

◼ Correlation is a statistical measure that expresses the extent to which two variables are linearly related.
◼ Correlation and regression are both used to study the linear relationship between two variables.
◼ Correlation measures the strength and direction of that linear relationship, while regression describes how one variable changes, on average, with the other. Neither by itself establishes cause and effect.

Correlation Coefficient
• The correlation coefficient is computed as follows:

For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Correlation Coefficient

• The coefficient can take on values between -1 and +1.

• Values near -1 indicate a strong negative linear relationship.

• Values near +1 indicate a strong positive linear relationship.

• The closer the correlation is to zero, the weaker the relationship.
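The sample covariance and correlation formulas can be illustrated on a small data set. The five (x, y) pairs below are made up purely for the example, and the function names are hypothetical.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_xy = sample_covariance(xs, ys)
r_xy = sample_correlation(xs, ys)

print(f"s_xy = {s_xy:.4f}")
print(f"r_xy = {r_xy:.4f}")
```

For this data the covariance is 1.5 and the correlation is about 0.77, a fairly strong positive linear relationship.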

Chebyshev’s Theorem
• At least (1 − 1/z²) of the items in any data set will be within z standard
deviations of the mean, where z is any value greater than 1.
• Chebyshev’s theorem requires z > 1, but z need not be an integer.

Chebyshev’s Theorem
• At least 75% of the data values must be within z = 2 standard
deviations of the mean.
• At least 89% of the data values must be within z = 3 standard
deviations of the mean.
• At least 94% of the data values must be within z = 4 standard
deviations of the mean.
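The three percentages above follow directly from the bound 1 − 1/z². A minimal sketch (the function name `chebyshev_lower_bound` is illustrative):

```python
def chebyshev_lower_bound(z: float) -> float:
    """Minimum fraction of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.2%} of the data")

# z need not be an integer.
print(f"z = 1.5: at least {chebyshev_lower_bound(1.5):.2%} of the data")
```

The bounds for z = 2, 3, and 4 are 75%, about 88.9%, and 93.75%, which round to the 75%, 89%, and 94% quoted above.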

Continuous Random Variables

◼ Continuous random variable: a variable that can take on any value over a given interval.
❑ Continuous variables are measured, not counted.
❑ Examples:
   ◼ thickness of an item
   ◼ time required to complete a task
   ◼ temperature of a solution
   ◼ height, in inches
   ◼ weight of ice cream
Probability Density Function

◼ The probabilities associated with a continuous random variable X are determined by the probability density function (p.d.f.) of the random variable.

◼ The function, denoted f(x), has the following properties:
1. f(x) ≥ 0 for all x.
2. The probability that X will be between two numbers a and b is equal to the area under f(x) between a and b.
3. The total area under the entire curve of f(x) is equal to 1.0.
Continuous Probability Distributions

• A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.

• It is not meaningful to talk about the probability of the random variable assuming a particular value, because that probability is zero.

• Instead, we talk about the probability of the random variable assuming a value within a given interval.

Cumulative Probability Distribution

◼ The cumulative distribution function of a continuous random variable:
F(x) = P(X ≤ x)
     = area under f(x) between the smallest possible value of X and the point x.
Therefore,
P(a ≤ X ≤ b) = F(b) – F(a)
Note that, putting b = a,
P(a ≤ X ≤ a) = F(a) – F(a) = 0, i.e., P(X = a) = 0
Also note that
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)
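The relation P(a ≤ X ≤ b) = F(b) – F(a) can be illustrated with a distribution whose cumulative distribution function has a simple closed form. The sketch below assumes a uniform distribution on the interval from 0 to 10. That choice, the interval from 2 to 5, and the function name `uniform_cdf` are all hypothetical, picked only for the example.

```python
def uniform_cdf(x: float, low: float, high: float) -> float:
    """F(x) = P(X <= x) for X uniformly distributed on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


low, high = 0.0, 10.0  # hypothetical range of X
a, b = 2.0, 5.0        # hypothetical interval of interest

# Probability of an interval is a difference of two CDF values.
p_interval = uniform_cdf(b, low, high) - uniform_cdf(a, low, high)

# Setting b = a gives the probability of a single point, which is zero.
p_point = uniform_cdf(a, low, high) - uniform_cdf(a, low, high)

print(f"P({a} <= X <= {b}) = {p_interval:.2f}")
print(f"P(X = {a})        = {p_point:.2f}")
```

Here the interval probability is 0.30, the share of the range covered by the interval, and the point probability is 0.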
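Continuous Probability Distribution

[Figure: two plots of the density f(x) against x. The first marks the area F(a) under the curve up to the point a. The second marks the area under f(x) between a and b, which equals P(a ≤ X ≤ b) = F(b) – F(a).]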
Continuous Probability Distributions

◼ Uniform distribution
◼ Normal distribution
◼ The t distribution
◼ The chi-square distribution
◼ Exponential distribution
◼ F distribution
◼ ...

• Uniform Probability Distribution
• Normal Probability Distribution
• Exponential Probability Distribution

[Figure: sketches of f(x) against x for the uniform, normal, and exponential distributions.]

Continuous Probability Distributions

• The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2.

[Figure: uniform, normal, and exponential densities f(x), each marked with the interval from x1 to x2.]