03 - Probability Distributions and Estimation


Probability Distributions and Random Variables

Introduction
Probability is the measure of uncertainty and has always been an important aspect of the reliability assessment of industrial products and equipment.
Good product design is of course essential for products with high reliability. However, no matter how good the product design is, products deteriorate over time because they operate under stress or load in a real environment, often involving randomness.
Maintenance has therefore been introduced as an efficient way to assure a satisfactory level of reliability during the useful life of a physical asset.

Lecture Overview
Definitions
Mathematical Derivations
Examples / Tutorial

Probability definition?

Probability can be defined as the measure of uncertainty (used to represent the risk of uncertainty in engineering applications).
In other words, it can be used to quantify the likelihood or chance of an event occurring at a given time.
It can be interpreted as a degree of belief or as a relative frequency.

Random Variable
A random variable is defined as a function that assigns a real value to every possible outcome or event of an experiment or observation.
In many applications, in manufacturing engineering and elsewhere, the outcomes x1, x2, …, xn of events that constitute a sample over an interval of time (or space) take real numerical values. It is therefore convenient, and sometimes necessary, to express all events using numerical values on a real line. The functions that establish such transformations to a real line are called random variables.
Random variables can be classified into two types: discrete and continuous random variables.

Probability distribution

Definition: a probability distribution is a mathematical model that relates the value of a variable with the probability of occurrence of that value in the population.

The probability of an event is a number lying in the interval 0 ≤ p ≤ 1, with 0 corresponding to an event that never occurs and 1 to an event that is certain to occur.

For example, if we visualise the diameter of a piston ring as a random variable, because it takes on different values in the population according to some random mechanism, then the probability distribution of ring diameter gives the probability of occurrence of any value of ring diameter in the population.

Example

Suppose that an event E can happen in h ways out of n equally likely possible ways. Then the probability of occurrence of the event E is denoted by

p = P(E) = h / n.

The probability of non-occurrence of the event E is then denoted by

q = P(E^c) = (n − h) / n = 1 − h/n = 1 − p = 1 − P(E).

Example

If E1 and E2 are two events, the probability that event E2 occurs given that E1 has occurred is denoted by P(E2 | E1) (the conditional probability of E2 given that E1 has occurred). If the two events are mutually exclusive, then P(E1 ∩ E2) = 0. If E1 ∪ E2 denotes the event that either E1 or E2 or both occur, then

P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2).

An extension to n mutually exclusive events with respective probabilities p1, p2, …, pn gives the result that the probability of occurrence of the union of all the events is the sum of the individual probabilities, Spiegel (1992).

Discrete distribution

Discrete distribution: when the measured parameter can assume only certain values, such as the integers 0, 1, 2, …, the probability distribution is called a discrete distribution.

If a discrete random variable X can take the values x1, x2, …, xn with probabilities p1, p2, …, pn, where p1 + p2 + … + pn = 1 and each pi ≥ 0, then this defines a discrete distribution for X. The probability that X will take a particular value x is denoted by P(X = x) or P(x).

Probability Mass function

Given the probability (mass) function f(x), the mean is defined as

μ = Σ_{all x} x · f(x)

and the variance is

σ² = Σ_{all x} (x − μ)² · f(x) = Σ_{all x} x² · f(x) − μ²

The standard deviation is simply the square root of the variance,

σ = √σ²
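These sums are easy to compute directly. Below is a minimal Python sketch; the pmf values are illustrative, not taken from the lecture.

```python
# Minimal sketch: mean and variance of a discrete distribution from its pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical values of P(X = x)

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
variance_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2   # short-cut form
std_dev = variance ** 0.5

print(mean, variance, variance_alt, std_dev)
```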

Continuous distribution

A random variable X is continuous if its set of possible values is an entire interval of numbers (if A < B, then any number x between A and B is possible).

Continuous distribution: when a variable is measured and expressed on a continuous scale, its probability distribution is called a continuous distribution.

A probability distribution of X is then a function f(x) such that for any two numbers a and b,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Properties of a probability density function are
1)  f(x) ≥ 0
2)  ∫_{−∞}^{∞} f(x) dx = 1

Probability density function


For f(x) to be a probability density function (pdf):

f(x) ≥ 0 for all values of x.

The area of the region between the graph of f and the x-axis is equal to 1 (Area = 1).

The Cumulative Distribution Function F(x)

The cumulative distribution function F(x) for a continuous random variable X is defined for every number x by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy

For each x, F(x) is the area under the density curve to the left of x.

Using F(x) to Compute Probabilities

Let X be a continuous random variable with pdf f(x) and cdf F(x). Then for any number a,

P(X > a) = 1 − F(a)

and for any numbers a and b with a < b,

P(a ≤ X ≤ b) = F(b) − F(a)

Obtaining f(x) from F(x)

If X is a continuous random variable with pdf f(x) and cdf F(x), then at every number x for which the derivative F′(x) exists,

F′(x) = f(x).

Expected Value or Mean Value


The expected or mean value of a continuous random variable X with pdf f(x) is

μ_X = E(X) = ∫_{−∞}^{∞} x f(x) dx

Expected Value of h(X)


If X is a continuous random variable with pdf f(x) and h(X) is any function of X, then

E[h(X)] = μ_{h(X)} = ∫_{−∞}^{∞} h(x) f(x) dx

Standard Deviation and Variance

The variance of a continuous random variable X with pdf f(x) and mean μ is

σ²_X = V(X) = ∫_{−∞}^{∞} (x − μ)² f(x) dx = E[(X − μ)²]

The standard deviation is

σ_X = √V(X).

Short-cut Formula for Variance

V(X) = E(X²) − [E(X)]²

Important Distributions

Several distributions are used quite frequently in reliability analysis. They are:
Discrete distributions
The binomial distribution
The Poisson distribution
Continuous distributions
Normal distribution
Log-normal distribution
Exponential distribution
Weibull distribution
Gamma distribution

Binomial distribution
The binomial distribution with parameters n > 0 and 0 < p < 1 has the probability distribution

p(x) = C(n, x) p^x (1 − p)^(n−x),   x = 0, 1, …, n

The mean and variance of the binomial distribution are

Mean: μ = np
Variance: σ² = np(1 − p)
Standard deviation: σ = √(np(1 − p))

In the process of manufacturing a product, inspection may test whether the product is good or defective. The probability outcomes for analysing several products would follow a binomial distribution.

Binomial Example

Example 1. The probability of getting exactly 2 heads in 6 tosses of a fair coin is

p(2) = C(6, 2) (1/2)² (1/2)⁴ = [6! / (2! 4!)] (1/2)² (1/2)⁴ = 15/64 ≈ 0.23

Example 2. The probability of getting at least 4 heads in 6 tosses of a fair coin is

P(X ≥ 4) = C(6, 4) (1/2)⁴ (1/2)² + C(6, 5) (1/2)⁵ (1/2)¹ + C(6, 6) (1/2)⁶
         = 15/64 + 6/64 + 1/64 = 22/64 = 11/32 ≈ 0.34
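The two coin-toss results above can be checked quickly in Python. A minimal sketch using only the standard library; the helper binom_pmf is introduced here for illustration.

```python
from math import comb

n, p = 6, 0.5   # 6 tosses of a fair coin

def binom_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_exactly_2 = binom_pmf(2, n, p)                                  # 15/64 ≈ 0.23
p_at_least_4 = sum(binom_pmf(x, n, p) for x in range(4, n + 1))   # 22/64 ≈ 0.34

print(round(p_exactly_2, 3), round(p_at_least_4, 3))
```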

Binomial Exercise
To guard against spurious failures causing a plant outage, automatic protective systems are often designed with three protective channels. Any 2 out of 3 channels need to be in a failed state to initiate a system shutdown. Assuming that the reliability of each channel is 0.99, determine the probability of such an automatic system being in a failed state when an inspection is carried out on the system.

Solution

If the reliability R of a single channel is 0.99, then the probability p of it being in a failed state is (1 − 0.99) = 0.01. For the APS to be in a failed state on demand, two or more channels must have failed:

P(X = 2) = C(3, 2) p² (1 − p) = [3! / (2! 1!)] (0.01)² (0.99) = 2.97 × 10⁻⁴

P(X = 3) = C(3, 3) p³ = (0.01)³ = 1 × 10⁻⁶

Hence the probability of the system being in a failed state is

ps = P(X = 2) + P(X = 3) ≈ 2.98 × 10⁻⁴
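As a quick cross-check, the same 2-out-of-3 calculation can be done with the binomial pmf in a few lines of Python; a sketch assuming independent channel failures with p = 0.01.

```python
from math import comb

p = 1 - 0.99   # probability that a single channel is in a failed state
n = 3          # protective channels

# System is failed when 2 or more channels have failed.
p_system_failed = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2, n + 1))
print(p_system_failed)   # ≈ 2.98e-4, matching the hand calculation
```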

Conditions for Binomial distribution

The experiment consists of n repetitions or trials
Each trial can have only one of two possible outcomes
The probability of a given outcome is the same for each trial
The trials are independent (i.e. the probability of obtaining a given result does not depend upon the previous/other trials)

Poisson distribution

A useful distribution in statistical quality control is the Poisson distribution, defined as follows:

f(x) = e^(−λ) λ^x / x!,   x = 0, 1, …

The mean and variance of the Poisson distribution are

Mean = λ and Variance = λ

where the parameter λ (lambda) is greater than zero, and e is a constant equal to approximately 2.71828.
Note that the mean and variance of the Poisson distribution are equal.

Poisson Example
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following:
λ = 5, since 5 lions are seen per safari, on average.
x = 0, 1, 2, or 3, since we want the likelihood that tourists will see fewer than 4 lions; that is, the probability that they will see 0, 1, 2, or 3 lions.
e = 2.71828.

Poisson Solutions
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions.
We need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5).
To compute this sum, we use the Poisson formula:
P(X < 4; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)
P(X < 4; 5) = [(e⁻⁵)(5⁰) / 0!] + [(e⁻⁵)(5¹) / 1!] + [(e⁻⁵)(5²) / 2!] + [(e⁻⁵)(5³) / 3!]
P(X < 4; 5) = [0.006738] + [0.033690] + [0.084224] + [0.140375]
P(X < 4; 5) = 0.2650
The probability of seeing no more than 3 lions is 0.2650.
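The same sum follows directly from the Poisson formula; a minimal Python sketch, where the helper poisson_pmf is introduced here for illustration.

```python
from math import exp, factorial

lam = 5   # average number of lions per 1-day safari

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) random variable."""
    return exp(-lam) * lam**x / factorial(x)

p_fewer_than_4 = sum(poisson_pmf(x, lam) for x in range(4))   # P(X <= 3)
print(round(p_fewer_than_4, 4))   # ≈ 0.2650, matching the worked solution
```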

Poisson distribution
Characteristics of a Poisson experiment are:

The probability of an occurrence is the same over any two intervals of equal length
The occurrence or non-occurrence in any interval is independent of the occurrence or non-occurrence in any other interval

Normal Distribution

Most useful in modelling distributions of physical nature, e.g. measurement of the height of people in a population.
The probability density function is symmetrical and bell-shaped, with μ as mean and σ as standard deviation:

f(x) = [1 / (σ √(2π))] e^(−½ ((x − μ)/σ)²),   where −∞ < x < ∞.

Normal Distribution
Normal distributions are defined using 2 parameters, mean and standard deviation: N(μ, σ).
Bell shaped
Symmetrical
Mean, median and mode are equal
Location is determined by the mean, μ
Spread is determined by the standard deviation, σ
The random variable has an infinite theoretical range: −∞ to +∞
Area under the bell curve = 1.00

Spread of S.D
(Figure: normal curves with different standard deviations, showing how σ determines the spread.)

Standard Normal Distributions

The normal distribution with parameter values μ = 0 and σ = 1 is called a standard normal distribution. The random variable is denoted by Z. The probability density function is

f(z; 0, 1) = [1 / √(2π)] e^(−z²/2)

The cumulative distribution function is

Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} f(y; 0, 1) dy

Standard Normal Cumulative Areas
(Figure: standard normal curve with the area to the left of z shaded; shaded area = Φ(z).)

Standard Normal

By substituting z = (x − μ)/σ, we transform the normal distribution into the standard normal distribution.
Standard normal distribution mean = 0,
Standard normal distribution variance = 1.
Standard Cumulative Normal Distribution Table:

Z     0.00  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09
0.0   0.500 0.504 0.508 0.512 0.516 0.520 0.524 0.528 0.532 0.536
0.1   0.540 0.544 0.548 0.552 0.556 0.560 0.564 0.567 0.571 0.575
1.0   0.841 0.844 0.846 0.848 0.851 0.853 0.855 0.858 0.860 0.862
2.0   0.977 0.978 0.978 0.979 0.979 0.980 0.980 0.981 0.981 0.982
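Table values like these can be reproduced with the identity Φ(z) = ½[1 + erf(z/√2)]; a short Python sketch:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# First column of the table above (z = 0.0, 0.1, 1.0, 2.0).
for z in (0.0, 0.1, 1.0, 2.0):
    print(z, f"{std_normal_cdf(z):.3f}")   # 0.500, 0.540, 0.841, 0.977
```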

Example

If X is distributed normally with mean, μ, of 100 and standard deviation, σ, of 50, the Z value for X = 200 is

Z = (200 − 100) / 50 = 2.0

This says that X = 200 is two standard deviations (2 increments of 50 units) above the mean of 100.

Empirical Rule
μ ± 2σ covers about 95% of X values (95.44%)
μ ± 3σ covers about 99.7% of X values (99.72%)

Standard Normal Table

From the standard normal table we read the probability of being less than a desired value of Z (i.e., from negative infinity to Z), for example

P(Z < 2.0) = 0.9772

The Standardized Normal Table

The row shows the value of Z to the first decimal point; the column gives the value of Z to the second decimal point.
The value within the table gives the probability from Z = −∞ up to the desired Z value.
For example, the entry in row 2.0, column 0.00 gives P(Z < 2.00) = 0.9772.

Log-Normal distribution

It is a more versatile distribution than the Normal distribution as it has a range of shapes, and therefore it is often a better fit to reliability data, such as for populations with wear-out characteristics.
It does not have the disadvantage of the Normal distribution of extending below zero to −∞. The p.d.f. is given as

f(x) = [1 / (x σ √(2π))] e^(−½ ((ln x − μ)/σ)²)   for x > 0

Mean = e^(μ + σ²/2)

SD = √[(e^(σ²) − 1) e^(2μ + σ²)]
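For concreteness, here is a small Python sketch that evaluates these mean and SD formulas for illustrative (assumed) parameter values μ and σ:

```python
from math import exp, sqrt

mu, sigma = 1.0, 0.5   # illustrative log-normal parameters, not from the lecture

mean = exp(mu + sigma**2 / 2)
sd = sqrt((exp(sigma**2) - 1) * exp(2 * mu + sigma**2))

print(round(mean, 3), round(sd, 3))
```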

Exponential distribution

A continuous random variable X has an exponential distribution with parameter λ if the probability density function is

f(x) = λ e^(−λx),   x ≥ 0
f(x) = 0,           otherwise

The mean and variance of a random variable X having the exponential distribution are

Mean = 1/λ,   Variance = 1/λ²

Gamma distribution
A continuous random variable X has a gamma distribution if the probability density function is

f(x) = [1 / (β^α Γ(α))] x^(α−1) e^(−x/β),   x > 0
f(x) = 0,   otherwise

where the parameters satisfy α > 0, β > 0. The standard gamma distribution has β = 1.
For α > 0, Γ(α) is defined as

Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx

Mean: E(X) = αβ,   Variance: V(X) = αβ²

Weibull Distribution

The Weibull is a very flexible life distribution model with two parameters. It has probability density function given as

f(x) = (β / α^β) x^(β−1) e^(−(x/α)^β),   x > 0
f(x) = 0,   x ≤ 0

with parameters α > 0 (scale) and β > 0 (shape).

The mean and variance are

μ = α Γ(1 + 1/β)

σ² = α² [Γ(1 + 2/β) − (Γ(1 + 1/β))²]
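The gamma-function expressions for the Weibull mean and variance are easy to evaluate numerically; a sketch with assumed, illustrative scale and shape values:

```python
from math import gamma

alpha, beta = 100.0, 1.5   # illustrative scale and shape (e.g. characteristic life of 100 h)

mean = alpha * gamma(1 + 1 / beta)
variance = alpha**2 * (gamma(1 + 2 / beta) - gamma(1 + 1 / beta) ** 2)

print(round(mean, 2), round(variance, 2))
```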

Cumulative distribution function (cdf) F(t)

The cumulative distribution function (cdf) of a continuous random variable is defined by

F(t) = P(T ≤ t) = ∫_{−∞}^{t} f(s) ds

The cumulative distribution function equals 0 at −∞ and equals 1 at +∞. The relationship between the density and the distribution can also be expressed as

f(t) = dF(t)/dt.

Conditional probability

Based on the definition of conditional probability, the conditional probability density function f(t1 | t2) for a random variable T1 given another random variable T2 is given by

f(t1 | t2) = f(t1, t2) / f(t2)

where f(t1, t2) is the joint density function of T1 and T2, and f(t2) is the marginal density function of T2 given by

f(t2) = ∫ f(t1, t2) dt1.

Similarly, the conditional probability density for T2 given T1 = t1 is given by

f(t2 | t1) = f(t1, t2) / f(t1)

Joint density function

The joint density function can be obtained from a given joint cumulative distribution function by evaluating the partial derivative as follows:

f(t) = ∂F(t) / ∂t.

That is,

f(t1, t2, …, tn) = ∂ⁿ F(t1, t2, …, tn) / (∂t1 ∂t2 … ∂tn).

This concept simply offers a convenient way of modelling n random variables simultaneously.

Maximum Likelihood Estimator (MLE)

Suppose we have random variables T1, T2, …, Tn having a joint density, and we observe a random sample of observations t1, t2, …, tn from a population of interest with common density function

f(t1, t2, …, tn | θ)

where the form of f is known and θ is not known. Given observed values Ti = ti, where i = 1, 2, …, n, the likelihood L(θ) as a function of t1, t2, …, tn is defined by

L(θ; t1, t2, …, tn) = ∏_{i=1}^{n} f(ti | θ)

and we sometimes abbreviate L(θ; t1, t2, …, tn) to L(θ) for convenience.

Log-likelihood estimates

It is usually easier to maximize the natural logarithm of the likelihood rather than the likelihood function itself, Rice (1995). The log-likelihood is defined by

l(θ) = Σ_{i=1}^{n} ln f(ti | θ).

The maximum likelihood estimates may then be found from the log-likelihood by setting the derivative to zero. The solution sometimes involves numerical methods such as Newton-Raphson or quasi-Newton algorithms. Such methods are available in subroutine libraries such as NAG, Crowder et al. (1993). However, from a theoretical point of view, the maximum-likelihood estimation method ensures that most statistical problems of parameter estimation likely to arise in reliability contexts are easily dealt with.

Exponential MLE

For both complete and censored data the MLE of lambda is given as

λ̂ = r / T

where r is the number of failures and T is the cumulative test time (total time on test).

For the censored case, suppose r failures are observed at times t1, …, tr and the remaining n − r units are censored at time tr:

f(ti) = λ e^(−λ ti),   i = 1, 2, …, r
Pr(Ti > tr) = e^(−λ tr)   for each of the n − r censored units

L(t1, …, tr; λ) = λ^r exp(−λ Σ_{i=1}^{r} ti) [exp(−λ tr)]^(n−r) = λ^r exp(−λ [Σ_{i=1}^{r} ti + (n − r) tr])

And the natural logarithm of the likelihood function is

ln L(λ) = r ln λ − λ [Σ_{i=1}^{r} ti + (n − r) tr].   Therefore,

d ln L(λ)/dλ = r/λ − [Σ_{i=1}^{r} ti + (n − r) tr] = 0

Solving for lambda,

λ̂ = r / [Σ_{i=1}^{r} ti + (n − r) tr] = r / T

Exponential MLE

For an exponential distribution with complete data and r representing the number of failures,

L(λ) = ∏_{i=1}^{n} λ exp(−λ ti) = λ^r exp(−λ Σ_{i=1}^{n} ti)

Taking the logarithm,

ln L(λ) = r ln λ − λ Σ_{i=1}^{n} ti

d ln L(λ)/dλ = r/λ − Σ_{i=1}^{n} ti = 0

Solving for lambda,

λ̂ = r / Σ_{i=1}^{n} ti

This confirms that the MTTF for the exponential distribution can be estimated by taking the Total Time on Test and dividing by the number of failures.
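A minimal Python sketch of the estimator λ̂ = r / T for (possibly censored) exponential life data; the failure times, number of units and censoring time below are illustrative, not from the lecture.

```python
failure_times = [120.0, 340.0, 560.0, 800.0]   # r observed failure times (hours)
n_units = 10                                   # units placed on test
censor_time = 800.0                            # remaining units censored at t_r

r = len(failure_times)
total_time_on_test = sum(failure_times) + (n_units - r) * censor_time   # T

lam_hat = r / total_time_on_test       # MLE of the failure rate lambda
mttf_hat = total_time_on_test / r      # estimated MTTF = 1 / lam_hat

print(lam_hat, mttf_hat)
```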

Confidence Interval estimation


Confidence intervals give a plausible range of values for a population parameter.

Confidence intervals also give information about the precision of an estimate.

(When sampling variability is high, the confidence interval will be wide to reflect the uncertainty of the observation.)

Point and Interval Estimates

A point estimate is a single number
A confidence interval provides additional information about variability

(Figure: an interval running from the Lower Confidence Limit through the Point Estimate to the Upper Confidence Limit; the distance between the limits is the width of the confidence interval.)

Concepts of Confidence Intervals

A confidence interval is based on:
The value of a statistic (the mean, odds ratios, indices etc.)
The standard error (SE) of the sample
The desired width of the confidence interval (e.g., the 95% confidence interval or the 99% confidence interval)

General Format of Confidence Intervals

estimate ± (measure of how confident we want to be) × (standard error)

The estimate is the value of the statistic in the sample (e.g., mean, odds ratio, etc.)
The confidence multiplier comes from a Z table or a t table, depending on the sampling distribution of the statistic
The standard error is the standard error of the statistic

Point estimators

Characteristics

Unbiased: E(x̄) = μ
Efficiency: as n increases, s gets closer to σ
Standard error: σ / √n
Sampling error: the difference between the sample estimate and the population value

For a large sample, a confidence interval for a population mean is:

x̄ ± Z σ / √n

The mean, standard deviation, and n depend on the sample, and Z depends on the confidence level.

Confidence Interval for μ (σ Known)

Assumptions
Population standard deviation σ is known
Population is normally distributed
If population is not normal, use a large sample (n ≥ 30)

Confidence interval estimate:

x̄ ± Z σ / √n

(where Z is the normal distribution critical value)

Common Z levels of confidence

Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level    Z value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.08
99.9%               3.27
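Putting the pieces together, here is a short Python sketch of the σ-known interval x̄ ± Z σ/√n, using illustrative (assumed) sample values and the Z value from the table above:

```python
from math import sqrt

x_bar, sigma, n = 50.0, 8.0, 36   # illustrative sample mean, known sigma, sample size
z = 1.96                          # 95% confidence level

margin = z * sigma / sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(ci)   # approximately (47.39, 52.61)
```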

Confidence Interval for μ (σ Unknown)

If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S
This introduces extra uncertainty, since S is variable from sample to sample
So we use the t distribution instead of the normal distribution

Confidence Interval for μ (σ Unknown)

Assumptions
Population standard deviation, σ, is unknown
Population is normally distributed
If population is not normal, use a large sample

Use Student's t Distribution
Confidence interval estimate:

x̄ ± t_{n−1} S / √n

(where t is the critical value of the t distribution with n − 1 d.f.)

Example
Suppose we want to estimate the average age of kids that ride a particular roller coaster ride in Blackpool. We take a random sample of 8 kids exiting the ride and find that their ages are: 2, 3, 4, 5, 6, 6, 7, 7.
a. Calculate the sample mean.
b. Calculate the sample standard deviation.
c. Calculate the standard error of the mean.
d. Calculate the 99% confidence interval.

Answer (a, b)
a. To calculate the sample mean:

x̄ = (Σ_{i=1}^{8} Xi) / 8 = (2 + 3 + 4 + 5 + 6 + 6 + 7 + 7) / 8 = 40 / 8 = 5.0

b. To calculate the sample standard deviation:

s²_X = Σ_{i=1}^{8} (Xi − 5)² / (8 − 1) = [3² + 2² + 1² + 0² + 2(1²) + 2(2²)] / 7 = 24 / 7 ≈ 3.4

s_X = √3.4 ≈ 1.9

Answer (c, d)
c. Calculate the standard error of the mean:

s_x̄ = s_X / √n = 1.9 / √8 ≈ 0.67

d. Calculate the 99% confidence interval:

mean ± s_x̄ (t_{df, α/2}) = 5.0 ± 0.67 (3.50) = (2.65, 7.35)

where t comes from Student's t distribution and depends on the sample size through the degrees of freedom, n − 1.
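The whole example can be reproduced with the Python standard library; a minimal sketch, taking the t critical value 3.499 for 7 d.f. as given. Small differences from the slide result come from rounding s to 1.9 in the hand calculation.

```python
from statistics import mean, stdev
from math import sqrt

ages = [2, 3, 4, 5, 6, 6, 7, 7]
n = len(ages)

x_bar = mean(ages)            # 5.0
s = stdev(ages)               # sample SD with n - 1 denominator, ≈ 1.85
se = s / sqrt(n)              # standard error of the mean, ≈ 0.65
t_crit = 3.499                # 99% confidence, 7 degrees of freedom

ci = (x_bar - t_crit * se, x_bar + t_crit * se)
print(x_bar, round(s, 2), round(se, 2), tuple(round(v, 2) for v in ci))   # CI ≈ (2.71, 7.29)
```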

Confidence Interval

The confidence level tells us how sure we can be about our estimate.

It expresses how often the true value of the population parameter lies within the confidence interval.

The confidence level describes the uncertainty associated with a sampling method.

Revision: Possible Questions/Problems

Definitions: probability, random variables, confidence interval
Discrete and continuous probability distributions
Types of distribution: state the density function, mean and standard deviation of the Binomial, Exponential, Poisson and Normal distributions
Maximum Likelihood Estimator for the Exponential Distribution
Characteristics of a Normal distribution
Standard normal calculation using the table

QUESTIONS?

