Ang A., Tang W. Probability Concepts in Engineering 2ed 2007
PROBABILITY CONCEPTS
IN ENGINEERING
Emphasis on Applications to Civil and Environmental Engineering
WILSON H. TANG
Chair Professor,
Hong Kong University of Science & Technology
JOHN WILEY & SONS, INC.
*This title is the 2nd edition of Probability Concepts in Engineering Planning and Design, Vol. I: Basic Principles.
ASSOCIATE PUBLISHER Daniel Sayre
ACQUISITIONS EDITOR Jennifer Welter
SENIOR PRODUCTION EDITOR William A. Murray
MARKETING MANAGER Frank Lyman
COVER DESIGN Hope Miller
ILLUSTRATION COORDINATOR Mary Alma
MEDIA EDITOR Stefanie Liebman
COVER PHOTO
The modern-style Caiyuanba Bridge is a tie-arch bridge located in Chongqing, China over the Yangtze River. It
has a main arch span of 420 meters with two decks. The upper deck carries six lanes of traffic and two pedestrian
paths; the lower deck carries two monorail tracks. Both the girder and the box-arch ribs are constructed of steel.
The cover image was provided by T.Y. Lin International (San Francisco, California), designer of the main span
of the Caiyuanba Bridge. The authors and publisher wish to express their thanks to T.Y. Lin International for the
use of the image.
This book was set in Times Roman by TechBooks and printed and bound by Hamilton Printing Company. The
cover was printed by Phoenix Color.
To order books or for customer service, please call 1-800-CALL WILEY (225-5945).
ISBN-13 978-0-471-72064-5
ISBN-10 0-471-72064-X
10 9 8 7 6 5 4 3 2 1
Dedicated to Myrtle Mae and Bernadette
Preface
INTENDED AUDIENCE
The material in the book is intended for a first course on applied probability and statistics for
engineering students at the sophomore or junior level, or for self-study, stressing probabilistic
modeling and the fundamentals of statistical inference. The primary aim is to provide
an in-depth understanding of the fundamentals for their proper application in engineering.
Only knowledge of elementary calculus is required, and thus the material can be taught to
undergraduate engineering students at any level. It may be used for a course taught either
in the engineering departments or offered for engineers by the departments of mathematics
and statistics.
The book is self-contained and thus is also suitable for self-study by practicing engineers
who desire a reading and working knowledge of the basic concepts and tools of probability.
SUGGESTED SYLLABUS
One-semester course. A suggested outline for a one-semester (or one-quarter) course
is as follows: Chapter 1 (assigned as required reading with guidance from instructors)
through Chapter 5, stressing the modeling of probabilistic problems, plus Chapters 6 through
8, stressing the fundamentals of inferential statistics.
One-quarter course. For a one-quarter course, the same chapters may be covered with less
emphasis on selected sections (e.g., by discussing fewer types of useful probability distributions)
and with fewer illustrations in each chapter.
Senior-level course. For a course at the senior level, all the chapters, including the first
part of Chapter 9, may be covered in one semester.
The extensive variety of problems at the end of each chapter provides a wide choice of
class assignments and also gives readers the opportunity to gauge their own understanding.
INSTRUCTOR RESOURCES
These instructor resources are available on the Instructor section of the Web site at
www.wiley.com/college/ang. They are available only to instructors who adopt the text:
• Solutions Manual: Solutions to all the exercise problems in the text.
• Image Gallery: All figures and tables from the text, appropriate for use in PowerPoint
presentations.
These resources are password protected. Visit the Instructor section of the book Web site
to register for a password to access these materials.
MATHEMATICAL RIGOR
We have not emphasized mathematical rigor throughout the book; such rigor may be supplemented
with treatises on the mathematical theory of probability and statistics. We are
concerned mainly with the practical applications and relevance of probability concepts to
engineering. The necessary mathematical concepts are developed in the context of engineering
problems and through illustrations of probabilistic modeling of physical situations
and phenomena. In this regard, only the essential principles of mathematical theory are
discussed, and these principles are explained in non-abstract terms in order to stress their
relevance to engineering. This approach is essential for enhancing the appreciation and
recognition of the practical significance of probability concepts.
MOTIVATION
Uncertainties are unavoidable in the design and planning of engineering systems. Properly,
therefore, the tools of engineering analysis should include methods and concepts for evaluating
the effect of uncertainty on system performance and design. In this regard,
the principles of probability (and its allied fields of statistics and decision theory) offer the
mathematical basis for modeling uncertainty and analyzing its effects on engineering
design.
Probability and statistical decision theory have especially significant roles in all aspects
of engineering planning and design, including: (1) the modeling of engineering problems
and the evaluation of system performance under conditions of uncertainty; (2) the systematic
development of design criteria, explicitly taking into account the significance of uncertainty;
and (3) the logical framework for quantitative risk assessment and risk-benefit tradeoff
analysis relative to decision making. Our principal aim is to emphasize these wider roles of
probability and statistical decision theory in engineering, with special attention to problems
related to construction and industrial management; geotechnical, structural, and mechanical
design; hydrologic and water resources planning; energy and environmental problems;
ocean engineering; transportation planning; and problems of photogrammetric and geodetic
engineering.
The principal motivation for developing this revised edition of the book is our firm
belief that the principles of probability and statistics are of fundamental importance to all
branches of engineering, although the examples and exercise problems included in this text
are mostly from civil and environmental engineering. These principles are essential for
the quantitative analysis and modeling of uncertainties in the assessment of risk, which is
central to the modern approach to decision making under uncertainty.
The concepts and methods expounded in this book constitute only the basics necessary
for the proper treatment of uncertainties. These basic principles may need to be supplemented
with more advanced tools for specialized applications. See Volume II of Ang and
Tang (1984) for some of these advanced topics.
Over the years, we have received numerous compliments from former students and
professional colleagues regarding the way we elucidated the concepts and methods in the
first edition, particularly from those wishing to learn and apply the principles of probability and
statistics. In this regard, we are encouraged that the first edition of this book has contributed to
the education of several generations of engineering students, and of professional colleagues
through self-study. This second edition is also inspired by the
hope that it will continue to contribute to the education of future generations of
engineering students in the practical roles and significance of probability and statistics in
engineering, enhanced further nowadays by the general availability of personal computers
and associated commercial software.
VOLUME II
The first edition of this text was published in two volumes. For the second edition, only
Volume I (this text) is being revised. If you would like to obtain a copy of the original
Volume II, you may contact Professor Ang directly at ahang2@aol.com.
ACKNOWLEDGEMENTS
Finally, it is our pleasure to acknowledge the many constructive comments and suggestions
offered by the prepublication reviewers of our original manuscript, including:
C. H. Aikens, University of Tennessee
B. Bhattacharya, University of Delaware
V. Cariapa, Marquette University
A. Der Kiureghian, University of California, Berkeley
S. Ekwaro-Osire, Texas Tech University
B. Ellingwood, Georgia Institute of Technology
T. S. Hale, Ohio University
P. A. Johnson, Pennsylvania State University
J. Lee, University of Louisiana
M. Maes, University of Calgary
S. Mattingly, University of Texas, Arlington
P. O’Shaughnessy, The University of Iowa
C. Polito, Valparaiso University
J. R. Rowland, University of Kansas
Y. K. Wen, University of Illinois, Urbana-Champaign
as well as a number of other anonymous reviewers. Many of their suggestions have served
to improve the final manuscript. We also greatly appreciate the many compliments from
several of the reviewers, including the phrase “the authors seem to understand what Socrates
knew a long time ago... ‘Analytical tools that are understood have a higher probability of
being used.’” Relating our work to Socrates certainly represents the height of compliments.
Last but not least, our thanks to T. Hu, H. Lam, J. Zhang and L. Zhang for their assistance
in the solutions to some of the examples, and in the preparation of the Solutions Manual
for the problems in the book.
Contents
5.2.3 Problems Involving Aleatory and Epistemic Uncertainties
5.2.4 MCS Involving Correlated Random Variables
5.3 Concluding Summary
Problems
References and Softwares
► CHAPTER 6 Statistical Inferences from Observational Data
6.1 Role of Statistical Inference in Engineering
6.2 Statistical Estimation of Parameters
6.2.1 Random Sampling and Point Estimation
6.2.2 Sampling Distributions
8.2.3 Confidence Intervals in Regression
8.3 Correlation Analysis
8.3.1 Estimation of the Correlation Coefficient
8.3.2 Regression of Normal Variates
8.4 Linear Regression with Nonconstant Variance
8.5 Multiple Linear Regression
8.6 Nonlinear Regression
8.7 Applications of Regression Analysis in Engineering
8.8 Concluding Summary
Problems
References
► CHAPTER 9
Table A.6 Critical Values of the Anderson-Darling Goodness-of-Fit Test
B.4: The Multinomial Coefficient
B.5: Stirling’s Formula

► CHAPTER 1
Roles of Probability and Statistics in Engineering
► 1.1 INTRODUCTION
In dealing with real-world problems, uncertainties are unavoidable. As engineers, it is important
that we recognize the presence of all major sources of uncertainty in engineering.
The sources of uncertainty may be classified into two broad types: (1) those that are associated
with natural randomness; and (2) those that are associated with inaccuracies in our
prediction and estimation of reality. The former may be called the aleatory type, and
the latter the epistemic type. Irrespective of the type of uncertainty, probability and statistics
provide the proper tools for its modeling and analysis. In the ensuing chapters we will present
the fundamental principles of probability and statistics, and illustrate their applications in
engineering-type problems. The main aim of this work is to present the concepts and methods
of probability and statistics for the modeling and formulation of engineering problems
under uncertainty; this is in contrast to books that are devoted to statistical data analysis,
although the fundamentals of statistics are also presented here.
The effects of uncertainties on the design and planning of an engineering system are
important, to be sure; however, the quantification of such uncertainty and the evaluation of its
effects on the performance and design of the system should properly include the concepts
and methods of probability and statistics. Furthermore, under conditions of uncertainty,
the design and planning of engineering systems involve risk, which involves both probability
and the associated consequences, and the formulation of related decisions may be based on
quantitative risk-benefit trade-offs, which are also properly within the province of applied
probability and statistics. In this light, and with reference to problems containing randomness
and uncertainty, the significance of the concepts of probability and statistics in engineering
parallels that of the principles of physics, chemistry, and mechanics in the formulation
and solution of engineering problems.
In light of the above, we see that the role of probability and statistics is quite pervasive
in engineering; it ranges from the description of basic information to the development and
formulation of bases for design and decision making. Specific examples of such imperfect
information, and of applications in engineering design and decision-making problems, are
described in the following sections.
► 1.2 UNCERTAINTY IN ENGINEERING
Many phenomena or processes of concern to engineers, or that engineers must contend with,
contain randomness; that is, their outcomes are unpredictable (to some degree).
Such phenomena are characterized by field or experimental data that contain significant
variability representing the natural randomness of an underlying phenomenon; i.e., the
observed measurements differ from one experiment (or one observation) to another,
even if conducted or measured under apparently identical conditions. In other words, there
is a range of measured or observed values of the experimental results; moreover, within
this range certain values may occur more frequently than others. The variability inherent in
such data or information is statistical in nature, and the realization of a specific value (or
range of values) involves probability. The inherent variability in the observed or measured
data can be portrayed graphically in the form of a histogram or frequency diagram, such
as those shown in Figs. 1.1 through 1.23, all of which present information on physical
phenomena of particular relevance to civil and environmental engineering. Furthermore, if
two variables are involved, the joint variability may similarly be portrayed in a scattergram.
A histogram simply shows the relative frequencies of the different observed values of
a single variable. For a specific set of experimental data, the corresponding
histogram may be constructed as follows.
From the range of the observed data set, we may select a range on one axis (for a
two-dimensional graph) that is sufficient to cover the largest and smallest values among the
set of data, and divide this range into convenient intervals. The other axis can then represent
the number of observations within each interval, or the fraction of the total number of
observations. For example, consider the annual cumulative rainfall intensity in a watershed
area recorded over a period of 29 years, as presented in Table 1.1.
An examination of these data reveals that the observed rainfall intensities range
from 39.91 to 67.72 in. Therefore, choosing a uniform interval of 4 in. between 38 and
70 in., the number of observations within each interval and the corresponding fraction of
the total observations are calculated as summarized in Table 1.2.
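The tallying procedure just described can be sketched in Python. This is a minimal illustration under stated assumptions: the 29 values of Table 1.1 are not reproduced in this text, so the sample below is hypothetical, and the helper name `histogram_table` is ours. Each interval is taken as closed on the left and open on the right, except the last one, which also includes its upper end.

```python
def histogram_table(data, lower, upper, width):
    """Return (intervals, counts, fractions) for uniform intervals [a, a + width).

    Hypothetical helper. The last interval is also closed on the right, so a
    value equal to `upper` is still counted.
    """
    n_bins = round((upper - lower) / width)
    intervals = [(lower + k * width, lower + (k + 1) * width) for k in range(n_bins)]
    counts = [0] * n_bins
    for x in data:
        if not lower <= x <= upper:
            raise ValueError(f"{x} lies outside [{lower}, {upper}]")
        k = min(int((x - lower) // width), n_bins - 1)  # index of the interval holding x
        counts[k] += 1
    fractions = [c / len(data) for c in counts]  # fraction of the total observations
    return intervals, counts, fractions


if __name__ == "__main__":
    # Hypothetical annual rainfall intensities (in.), NOT the data of Table 1.1.
    sample = [39.91, 45.2, 48.7, 50.1, 51.3, 52.8, 55.0, 58.4, 61.9, 67.72]
    # Same binning as in the text: 4-in. intervals between 38 and 70 in.
    for (a, b), c, f in zip(*histogram_table(sample, 38, 70, 4)):
        print(f"{a:>4}-{b:<4} {c:>3} {f:.3f}")
```

Plotting the counts gives a histogram like Fig. 1.1a, and plotting the fractions gives one like Fig. 1.1b.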
The uniform intervals indicated in Table 1.2 may then be scaled on the abscissa, and
the corresponding number of observations (column 2 in Table 1.2) can be shown as a bar
on the vertical axis, as illustrated in the histogram of Fig. 1.1a for the rainfall intensity of
the watershed area. Alternatively, the vertical bar may be in terms of the fraction of the total
observations (column 3 in Table 1.2), as shown in Fig. 1.1b. Oftentimes,
there may be reasons to compare an empirical frequency diagram, such as a histogram, with
a theoretical frequency distribution (such as a probability density function, PDF, discussed
later in Chapter 3).
For this purpose, the area under the empirical frequency diagram must be equal to
unity; we obtain this by dividing each of the ordinates in a histogram by its total area. For
example, we obtain the empirical frequency function from Fig. 1.1a by dividing each of the
ordinates by 29 × 4 = 116, or, equivalently, from Fig. 1.1b by dividing each of the ordinates
by 4 × 1 = 4. In either case, we would obtain the empirical frequency function of Fig. 1.1c
for the rainfall intensity in the watershed area. The total area under the empirical frequency
function is then equal to 1.0, and thus the area over a given range may be used to estimate the
probability that the rainfall intensity falls within that range.
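The normalization can be checked numerically. The sketch below uses hypothetical interval counts that sum to 29 (they are not the actual entries of Table 1.2) and hypothetical helper names. It divides each count by 29 × 4 = 116, confirms that dividing the fractions by 4 gives the same ordinates, and estimates a probability as the area over a range.

```python
# Hypothetical counts for eight 4-in. intervals from 38 to 70 in. (they sum to 29);
# these are NOT the entries of Table 1.2.
LOWER, WIDTH = 38.0, 4.0
counts = [1, 2, 5, 8, 6, 4, 2, 1]


def empirical_density(counts, width):
    """Divide each count by (number of observations x interval width) so the total area is 1."""
    total_area = sum(counts) * width
    return [c / total_area for c in counts]


def prob_between(density, lower, width, a, b):
    """Estimate P(a <= X <= b) as the area under the step function between a and b."""
    area = 0.0
    for k, d in enumerate(density):
        left, right = lower + k * width, lower + (k + 1) * width
        overlap = max(0.0, min(b, right) - max(a, left))  # length of [a, b] inside this interval
        area += d * overlap
    return area


density = empirical_density(counts, WIDTH)      # ordinates / (29 x 4)
fractions = [c / sum(counts) for c in counts]   # ordinates of a relative-frequency histogram
# Dividing the fractions by the interval width (4) gives the same ordinates.
assert all(abs(f / WIDTH - d) < 1e-12 for f, d in zip(fractions, density))

print(f"total area = {sum(d * WIDTH for d in density):.3f}")
print(f"P(46 <= X <= 54) ~ {prob_between(density, LOWER, WIDTH, 46, 54):.3f}")
```

Because the function is a step function, the estimate for a range that does not align with the interval limits assumes the observations are spread uniformly within each interval. This is a simplification of the sketch, not a statement from the text.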
A large number of physical phenomena are represented in Figs. 1.1 through 1.23; these
are purposely collected here to demonstrate and emphasize the fact that most
engineering information contains significant variability. For example, the properties of
most construction materials vary widely; in Figs. 1.2 and 1.3 we present histograms
demonstrating the variability in the bulk density of soils and in the water-cement (w/c)
ratio of concrete specimens, respectively, whereas Figs. 1.4 and 1.5 show the yield
strength of reinforcing bars and the ultimate shear strength of steel fillet welds.
Figure 1.2 Bulk density of residual soils (after Winn et al., 2001).
Figure 1.4 Yield strength of reinforcing bars (data from Julian, 1957).
[Histogram: ultimate shear strength, ksi; n = 41, mean = 56.4, s = 6.2]
Figure 1.5 Ultimate shear strength of fillet welds (after Kulak, 1972).
Similarly, in the case of construction timber, we see in Fig. 1.6 examples of histograms
of the modulus of elasticity of Southern pine and Douglas fir timber, whereas in
Fig. 1.7 we show the histogram of the modulus of elasticity of grouted masonry. As expected,
there is wide variability in the moduli of elasticity of these two types of material; timber
is a natural organic material, whereas masonry is a highly heterogeneous mixture of cement
and natural sand.
Figure 1.8 Corrosion activity of steel in concrete (after Pruckner & Gjorv, 2002).
Figure 1.9 Trace lengths of discontinuities in rock mass (after Wu and Wang, 2002).
Furthermore, the efficiencies of pile groups are also highly variable as shown in
Figs. 1.12a and 1.12b, respectively, for pile groups in clay and sand.
Significant variabilities are also present in the loads on structures; these are illustrated
in Fig. 1.13 showing the wind-induced pressure fluctuations on tall buildings observed
during two typhoons (hurricanes), and in Fig. 1.14 is shown the variability of earthquake-
induced shear stresses in soils. In both of these figures, the dispersions are scaled in terms
of respective standard deviations.
[Figure 1.12(a), clay: probability density vs. pile group efficiency (approximately 0.3 to 2.1)]
In Fig. 1.19 we see the histograms of the measured roughness profiles of a rough road
and a smooth road, in terms of the respective rms (root-mean-square) values. The cost of
injuries from highway work zone accidents in the United States can be expected to vary
greatly; this is evidenced in Fig. 1.20.
It is of interest to observe that engineered structures can sometimes fail and cause
economic losses as well as loss of human lives. Figure 1.21 shows the statistics of dam
failures in the United States as a function of life (in years) after completion. Clearly, we
observe that most failures occur during the first year after completion of construction of
a dam.
Figure 1.19 Measured road roughness profiles (after Rouillard et al., 2000).
35%
30%
25%
20%
15%
10%
5%
0%
$0-$1 $1-$1,000 $1,000-$5,000 $5,000-$7,500
Cost of work zone accidents
Figure 1.20 Cost of work zone accidents (after Mohan & Gautam, 2002).
Figure 1.21 Statistics of dam failures in the United States (after van Gelder, 2000).
[Figure panel: Speculative development; average 1080 man-hours]
In some of the figures, e.g., Figs. 1.11, 1.12, 1.14, 1.19, 1.21, and 1.23, the theoretical
probability density functions (PDFs) are also shown; the significance of these theoretical
functions and their relations to the corresponding experimental frequency diagrams will be
discussed in greater depth in Chapters 3 and 7.
Figure 1.23 Distribution of bid prices in highway construction (after Cox, 1969).
When two (or more) variables are involved, each variable may have its own variability,
whereas there may also be joint variability of the two variables. Observed data of pairs of
values of the two variables can be portrayed in a two-dimensional graph in the form of a
scattergram of the observed data points. For example, in Fig. 1.24 is shown the scattergram
of the modulus of elasticity versus the corresponding strength of Douglas fir timber, whereas,
in Fig. 1.25 we see the scattergram of the tensile strength of concrete versus temperature.
In Fig. 1.26 we observe the scatter of the data points of the mean annual discharge of
a stream versus the corresponding drainage area near Honolulu. In Fig. 1.27 is shown the
scattergram of the plasticity index versus the liquid limit of soils, which is of fundamental
interest to geotechnical engineers.
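When pairs of values are available, a single number often used to summarize the linear joint variability seen in a scattergram is the sample correlation coefficient, whose estimation is treated in Chapter 8. The sketch below computes it from paired data. The function name and the illustrative data are hypothetical and are not taken from Figs. 1.24 through 1.29.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient of paired observations (x_i, y_i).

    Hypothetical helper. The 1/(n - 1) factors in the sample covariance and in
    the two sample variances cancel, so plain sums of deviations are used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (strength, modulus) pairs; not data from the figures.
strength = [4.1, 5.3, 6.0, 6.8, 7.5, 8.9]
modulus = [1.20, 1.35, 1.52, 1.49, 1.70, 1.88]
print(f"r = {sample_correlation(strength, modulus):.3f}")
```

Values near +1 or -1 indicate points clustered tightly about a straight line. Values near 0 indicate little linear association, although a nonlinear relation may still be present.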
Figure 1.24 Modulus of elasticity vs. strength of timber (after Littleford, 1967).
In Fig. 1.28, we show the scattergram of chlorophyll concentration versus phosphorus
concentration; this information is important to environmental engineers concerned
with the productivity of lakes. In Fig. 1.29 we see a typical scattergram of real estate
land value plotted against population density.
[Scattergram: tensile strength vs. temperature, °C (200 to 550); R = 0.79]
Figure 1.25 Tensile strength of concrete vs. temperature (after dos Santos et al., 2002).
Figure 1.26 Mean annual discharge vs. drainage area of streams (after Todd & Meyer, 1971).
Figure 1.27 Plasticity index vs. liquid limit of soils (after Winn et al., 2001).
Calculated maximum wind speeds may not agree perfectly with the observed values;
this is illustrated in Fig. 1.30, which shows the scattergram between the calculated and
corresponding observed maximum wind speeds. Also, in traffic engineering, the relationship
between daily conflicts and total entry volume at traffic intersections is illustrated
in Fig. 1.31, which shows the scatter of the data points.
Finally, in assessing the hazards from glacier lake outbursts, which can potentially
cause dangerous outburst floods and debris flow in mountainous regions, the area and mean
depth of a glacier lake can be used to estimate the volume of a lake. For this purpose, the
scattergram of Fig. 1.32 shows such a relationship between the mean depth and the area.
Figure 1.30 Calculated vs. observed wind speeds (after Matsui et al., 2002).
Figure 1.31 Traffic conflicts vs. volume (after Katamine, 2000).
[Plot: mean depth vs. area (m²), area axis from 10^3 to 10^7 m²]
Figure 1.32 Mean depth vs. area of glacier lakes (after Huggel et al., 2002).
A large number of histograms and scattergrams are shown above; the purpose of illustrating
such a large number is to demonstrate clearly that variability of engineering
information is invariably present and unavoidable in many areas of engineering applications.
We should emphasize that the variability exhibited in any histogram is due to randomness
in nature, and thus is an aleatory type of uncertainty. The following example shows how
such information may be handled in practice.
► EXAMPLE 1.1 Quite often, when we have a finite set of observational data (called a sample), it is of interest to estimate
the average of the sample (called the sample mean) and a measure of its variability or dispersion (called
the sample variance); the latter represents the aleatory uncertainty of the data set. Consider, for
example, the set of 29 observed annual rainfall intensities in the watershed area tabulated earlier in
Table 1.1. We can estimate the sample average of the observations simply as follows:

$$\bar{x} = \frac{1}{29}\sum_{i=1}^{29} x_i = 50.70 \text{ in.}$$

and the corresponding sample variance, which is essentially the average of the squared deviations from the mean
(we shall define the sample variance more thoroughly in Chapter 6), is

$$s_x^2 = \frac{1}{29-1}\sum_{i=1}^{29}\left(x_i - 50.70\right)^2 = 57.34 \text{ in.}^2$$
Finally, we obtain the corresponding sample standard deviation, $s_x = \sqrt{57.34} = 7.57$ in. This
sample standard deviation is, therefore, a measure of the dispersion of the annual rainfall intensity
and represents the corresponding randomness or aleatory uncertainty of the rainfall intensity in the
watershed area.
As the above average annual rainfall intensity is estimated from the set of 29 observations, there
is also epistemic uncertainty underlying the estimated average, called the sampling error. In this case,
this is the uncertainty underlying the estimated average annual rainfall intensity of 50.70 in. This
sampling error (defined in Chapter 6 as a function of the sample size) is equal to $7.57/\sqrt{29} = 1.41$ in.
In this example, we see that the variability (or randomness) in the observed data contains the
aleatory uncertainty, whereas the sampling error in the estimated average contributes to the epistemic
uncertainty. There are other sources of epistemic uncertainty as illustrated below in Examples 1.2
and 1.3. ◄
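The arithmetic of Example 1.1 can be reproduced with a short sketch. It assumes that the sample variance uses the divisor n - 1; the text defers the formal definition to Chapter 6. The helper names are hypothetical. Because the raw values of Table 1.1 are not reproduced here, only the summary figures quoted in the example (n = 29 and a sample variance of 57.34) are used.

```python
import math


def sample_stats(data):
    """Return (sample mean, sample variance, sample standard deviation).

    Hypothetical helper; the variance divisor n - 1 is an assumption of this sketch.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, var, math.sqrt(var)


def sampling_error(std_dev, n):
    """Standard error of the sample mean, s / sqrt(n)."""
    return std_dev / math.sqrt(n)


# Summary values quoted in Example 1.1 (n = 29, sample variance 57.34 in.^2).
s_x = math.sqrt(57.34)
print(f"s_x = {s_x:.2f} in., sampling error = {sampling_error(s_x, 29):.2f} in.")
```

Given the 29 rainfall values as a list, `sample_stats` would return the mean, variance and standard deviation in one call. The sampling error shrinks as 1/sqrt(n), which is why this epistemic component can be reduced by collecting more data, while the aleatory dispersion s_x cannot.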
► EXAMPLE 1.2 Consider the calculation of the deflection of a prismatic cantilever beam under the concentrated load
P, as shown in Fig. E1.2. For engineering purposes, the deflection $\Delta_B$ at the end B of the beam is usually
calculated on the basis of simple beam theory, which gives

$$\Delta_B = \frac{PL^3}{3EI} \qquad (1.1)$$

in which L is the length of the cantilever, E is the modulus of elasticity of the material, and I is the moment of inertia of the cross section.

$$\frac{\text{measured } \Delta_B}{\Delta_B \text{ by Eq. 1.1}} = 1.05;\ 0.95;\ 1.10;\ 0.98;\ 1.15;\ 0.97;\ 1.20;\ 1.00;\ 1.08;\ 1.12$$

These test results would yield the following sample mean and sample standard deviation (see
Chapter 6) for the ratio of the measured to the calculated deflection $\Delta_B$:
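The sample statistics of the ten ratios listed above can be computed with a short Python sketch. It assumes that the sample standard deviation uses the divisor n - 1, which is the convention `statistics.stdev` follows. It also reports the coefficient of variation as an extra. This is an illustration, not the book's own computation.

```python
import statistics

# Ratios of measured deflection to the deflection calculated with Eq. 1.1 (Example 1.2).
ratios = [1.05, 0.95, 1.10, 0.98, 1.15, 0.97, 1.20, 1.00, 1.08, 1.12]

mean = statistics.mean(ratios)   # sample mean of the ratio
s = statistics.stdev(ratios)     # sample standard deviation, divisor n - 1
cov = s / mean                   # coefficient of variation of the ratio

print(f"mean = {mean:.3f}, s = {s:.3f}, c.o.v. = {cov:.3f}")
```

A mean ratio different from 1.0 points to a systematic bias of Eq. 1.1 relative to the tests. The dispersion measures the scatter of the model's predictions. Both are sources of epistemic uncertainty of the kind the text attributes to Examples 1.2 and 1.3.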
► EXAMPLE 1.3 In Table E1.3, we show the data of the observed settlements of pile groups and the corresponding
calculated settlements of the same pile groups reported by Viggiani (2001). The calculated
settlements were obtained with a nonlinear model for predicting settlements of the respective pile
groups.
Using the data in Table E1.3, we develop later, in Example 8.3 of Chapter 8, the so-called linear
regression equation showing the relationship between the observed and calculated settlements as
follows:

$$E(Y \mid x) = 1.064\,x \qquad (1.2)$$

in which $E(Y \mid x)$ stands for the expected observed settlement Y of a pile group if the calculated
settlement is x. In Example 8.3, we also show that the conditional standard deviation about
the regression equation (i.e., the average dispersion along the regression line) is $s_{Y|x} = 7.784$ mm.
1.3 Design and Decision Making Under Uncertainty 19
TABLE E1.3 Data of Observed and Calculated Settlements of Pile Groups (in mm)
Equation 1.2 has the following significance: the equation can be used to determine the mean value
(average) of the actual settlement, $E(Y \mid x)$, if the calculated settlement is x. However, the conditional
standard deviation $s_{Y|x} = 7.784$ mm represents the error of the nonlinear model for predicting the
actual settlements, and thus is the epistemic uncertainty of the proposed calculational method.
Equation 1.2 may be used to determine the expected settlements of similar pile groups at
a site based on the settlements calculated by the same nonlinear model described in Viggiani (2001);
however, according to the regression equation, the calculated settlements will tend to be on the low side
and must be increased by a bias factor of 1.064. Moreover, there will be an average standard deviation
(dispersion) of the true settlement of 7.784 mm for a given value of x. Depending on the value of
the calculated settlement x, the corresponding coefficient of variation (c.o.v.) may also be estimated;
i.e., the c.o.v. would be $7.784/E(Y \mid x)$. For example, if the calculated settlement for a particular pile
group is x = 45 mm, we can expect the actual settlement to be 45 × 1.064 = 47.88 mm with a c.o.v.
of 7.784/47.88 = 0.16, which is the epistemic uncertainty underlying the model equation. ◄
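The prediction step at the end of Example 1.3 can be sketched as follows. The constants are the values quoted in the example. The function name is hypothetical. Treating 7.784 mm as the same for every x follows the example's use of an average dispersion about the regression line.

```python
BIAS_FACTOR = 1.064   # slope of Eq. 1.2: E(Y|x) = 1.064 x
S_Y_GIVEN_X = 7.784   # conditional standard deviation about the regression line, mm


def predicted_settlement(x_calc_mm):
    """Return (E(Y|x) in mm, c.o.v.) for a settlement x calculated by the nonlinear model."""
    if x_calc_mm <= 0:
        raise ValueError("calculated settlement must be positive")
    mean = BIAS_FACTOR * x_calc_mm
    return mean, S_Y_GIVEN_X / mean


mean, cov = predicted_settlement(45.0)   # the example's case, x = 45 mm
print(f"E(Y|x) = {mean:.2f} mm, c.o.v. = {cov:.2f}")
```

Because the standard deviation is held constant, the c.o.v. falls as the calculated settlement grows. Large predicted settlements are therefore relatively more certain under this model.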
Figure 1.30 provides a similar example for assessing the epistemic uncertainty of a
predictive model. All uncertainties, whether they are aleatory or epistemic, can be assessed
in statistical terms, and the evaluation of their significance on engineering planning and
design can be performed systematically and logically using the concepts and methods that
are embodied in the theory of probability.
a systematic basis for evaluating the degree of conservativeness; a resulting design that is
overly conservative may be excessively costly, whereas one with insufficient conservatism
may be inexpensive but will sacrifice performance or safety. The optimal decision ought to
be based on a trade-off between cost and benefit, in order to achieve a balance between cost
and system performance. As the available information and evaluative models are invariably
imperfect or insufficient, and thus contain uncertainties, the required trade-off analysis
ought to be performed within the context of probability and risk.
The situations described above are common to many problems in engineering; in the
following we describe several examples illustrating some of these problems. The examples
are idealized to simplify the presentations; nevertheless, they serve to illustrate the essence
of the decision-making aspects of engineering under conditions of uncertainty.
Figure 1.33 Density of compacted volcanic tuff subgrade (after Pettitt, 1967).
enough?”—a question that realistically requires the consideration of risk and the probability
of nonperformance or failure.
As a specific example, consider the design of an offshore drilling platform, which is
subject to occasional hurricane forces. In such a case, we recognize that aside from the fact
that the maximum wind and wave effects during a hurricane are random, as may be inferred
from Fig. 1.13, the occurrence frequency of hurricanes in a given region of the ocean is
also unpredictable. Hence, in determining the safety level for the design of the platform,
the probability of occurrence of strong hurricanes within the specified useful life of the
structure must be considered, in addition to the survival probability of the structure under
the highest wind and wave forces expected during the life of the structure. Consequently,
the level of hurricane force that should be specified in the design, and the required level
of protection that would be adequate during a hurricane, are decisions that may require a
trade-off between cost and level of protection in terms of risk or failure probability within
the lifetime of the platform.
Similarly, in considering the design of structural or machine components that are subjected
to repeated or cyclic loads, we recognize that the fatigue life (in number of load cycles
until fatigue failure) of a component is also highly variable, even under constant-amplitude
stress cycles, as illustrated in Fig. 1.34. For this reason, the failure-free life of a component
is difficult to predict and may be described only in terms of probability. Therefore, such
a component may be designed for a required operational life within a specified reliability
(probability of no fatigue failure). As expected, the fatigue life is a function of the applied
stress amplitudes; in general, the life decreases as the applied stress amplitude increases,
as we can see in Fig. 1.34. Consequently, if a desired failure-free operational life is
specified, a component may be designed to be massive so that the stress amplitudes will
be low, thus ensuring a high reliability of achieving the desired life; the resulting design
will, of course, require more material and higher expense. In contrast, if the component is
under-designed, high stresses will be induced, resulting in shorter life and more frequent
maintenance or replacement in order to maintain the required reliability within the operational
life. In this case, the optimal operational life may be determined by minimizing the
total expected life-cycle cost of a component, which would include the initial cost of the
component, the expected costs of maintenance and replacements (which are functions of
22 ► Chapter 1. Roles of Probability and Statistics in Engineering
50 r-
40
30
25
20
Smax
a 20
o 26
□ 32
a 38
• 44
Log N = 10.870 - 3,372 Log SR
r = 0.147
Figure 1.34 Fatigue life of welded beams vs. applied stress (after Fisher et al., 1970).
the specified reliability), and the expected cost associated with the loss of revenue incurred
during a repair (which is also a function of reliability). Having decided on the operational
design life and a stated reliability, the component may then be proportioned or designed
accordingly.
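The fitted line in Fig. 1.34 can be evaluated directly to see how strongly the fatigue life depends on the stress range. The sketch below is a minimal illustration, assuming the printed regression uses base-10 logarithms and that S_R is in ksi (the units are not legible in this copy). It returns only the central (fitted) life, not a life with a stated reliability; that would also require the scatter of the data about the line.

```python
import math

# Fitted S-N line read from Fig. 1.34 (after Fisher et al., 1970):
#   log10(N) = 10.870 - 3.372 * log10(S_R)
# Assumption: base-10 logarithms and S_R in ksi.
INTERCEPT = 10.870
SLOPE = 3.372


def fatigue_life(stress_range: float) -> float:
    """Cycles to failure N on the fitted line for a stress range S_R > 0."""
    if stress_range <= 0:
        raise ValueError("stress range must be positive")
    return 10 ** (INTERCEPT - SLOPE * math.log10(stress_range))


if __name__ == "__main__":
    for s_r in (10.0, 20.0, 30.0, 40.0):
        print(f"S_R = {s_r:4.0f}  ->  N = {fatigue_life(s_r):14,.0f} cycles")
```

Because the relation is a power law, doubling the stress range divides the fitted life by 2^3.372, roughly a factor of 10; this is why a more massive, lower-stress design buys a large increase in life.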
The dissolved oxygen concentration in the stream during any 7-consecutive-day period must be such that: (i) the probability of its being less than 4 mg/l for any one day is less than 0.20; and (ii) the probability of its being less than 2 mg/l for any one day is less than 0.1 and for two or more consecutive days is less than 0.05.
The strength level of the concrete will be considered satisfactory if the averages of all sets of three consecutive strength test results equal or exceed the required $f_c'$ and no individual strength test result falls below the required $f_c'$ by more than 500 psi. Each strength test result shall be the average of two cylinders from the same sample tested at 28 days or the specified earlier age.
The requirements stated above clearly imply the need for probability and statistics
in the quality assurance of concrete material. Similar requirements may be found for the
quality assurance of other construction materials.
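The concrete acceptance criterion quoted above is a rule on a sequence of test results and can be checked mechanically. The sketch below is one possible reading of it, assuming results are in psi and listed in testing order; the function names are hypothetical, and the 28-day (or earlier specified) test age is not modeled.

```python
from statistics import mean


def strength_test_results(cylinder_pairs):
    """Each strength test result is the average of two cylinders from one sample (psi)."""
    return [mean(pair) for pair in cylinder_pairs]


def concrete_acceptable(results, fc_required, tolerance=500.0):
    """Apply the two-part criterion to test results listed in testing order (psi).

    (1) the average of every set of three consecutive results >= fc_required
    (2) no individual result falls below fc_required by more than `tolerance`
    """
    if len(results) < 3:
        raise ValueError("need at least three strength test results")
    running_ok = all(mean(results[i:i + 3]) >= fc_required
                     for i in range(len(results) - 2))
    individual_ok = all(r >= fc_required - tolerance for r in results)
    return running_ok and individual_ok


if __name__ == "__main__":
    # Hypothetical results (psi) for a required f_c' of 4000 psi
    print(concrete_acceptable([4200, 4100, 3700, 4300, 4250], 4000))  # True
    # (3900 + 4100 + 3600) / 3 = 3867 < 4000, so the set fails
    print(concrete_acceptable([4200, 3900, 4100, 3600, 4400], 4000))  # False
```

A check of this kind says only whether a given record passes; the probability that a mix of given quality will pass is the statistical question the text alludes to.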
► REFERENCES
Ang, A. H-S., "Extended Reliability Basis of Structural Design under Uncertainties," Annals of Reliability and Maintainability, Vol. 9, AIAA, July 1970, pp. 642-649.
Ang, A. H-S., and De Leon, D., "Modeling and Analysis of Uncertainties for Risk-Informed Decisions in Infrastructure Engineering," Structure and Infrastructure Engineering, Vol. 1, No. 1, Taylor & Francis, March 2005, pp. 19-31.
Becker, D.E., Burwash, W.J., Montgomery, R.A., and Liu, Y., "Foundation Design Aspects of the Confederation Bridge," Canadian Geotechnical Journal, Vol. 35, October 1998.
Brandow, G.E., Hart, G., and Virdee, A., "1997 Design of Reinforced Masonry Structures," Concrete and Masonry Association of California and Nevada, 1997.
Cox, E.A., "Information Needs for Controlling Equipment Costs," Highway Research Record, No. 278, National Research Council, 1969, pp. 35-48.
Donovan, N.C., "A Stochastic Approach to the Seismic Liquefaction Problem," Proc. 1st Int. Conf. on Application of Statistics and Probability, Hong Kong University Press, 1972.
dos Santos, J.R., Branco, F.A., and de Brito, J., "Assessment of Concrete Structures Subjected to Fire—The FB Test," Magazine of Concrete Research, Vol. 54, June 2002.
Environmental Protection Agency, "Policy for Use of Probabilistic Analysis in Risk Assessment: Guiding Principles for Monte Carlo Analysis," EPA/630/R-97/001, May 1997.
Fisher, J.W., Frank, K.H., Hirt, M.A., and McNamee, M., "Effects of Weldments on the Fatigue Strength of Steel Beams," NCHRP Rept. No. 102, National Research Council, 1970.
Forbes, W.S., "A Survey of Progress in House Building," Building Technology and Management, Vol. 7(4), April 1969, pp. 88-91.
Galligan, W.L., and Snodgrass, D.V., "Machine Stress Rated Lumber: Challenge to Design," Journal of Structural Division, ASCE, Vol. 96, December 1970.
Huggel, C., Kaab, A., Haeberli, W., Teysseire, P., and Paul, F., "Remote Sensing Based Assessment of Hazards from Glacier Lake Outbursts: A Case Study in the Swiss Alps," Canadian Geotechnical Journal, Vol. 39, March 2002.
Jones, J.R., and Bachmann, R.W., "Prediction of Phosphorous and Chlorophyll Levels in Lakes," Journal of the Water Pollution Control Federation, Vol. 48, 1976.
Julian, O.G., "Synopsis of First Progress Report of Committee on Factors of Safety," Journal of Structural Division, ASCE, Vol. 83, July 1957, p. 1316.
Katamine, N.M., "Various Volume Definitions with Conflicts at Unsignalized Intersections," ASCE Journal of Transportation Engineering, Vol. 126, January/February 2000.
Kothandaraman, V., and Ewing, B.B., "A Probabilistic Analysis of Dissolved Oxygen-Biochemical Oxygen Demand Relationship in Streams," Journal of Water Resources Control Federation, Part 2, February 1969, pp. 73-90.
Kulak, G.L., "Statistical Aspects of Strength of Connection," Proc. ASCE Specialty Conf. on Safety and Reliability of Metal Structures, November 1972, pp. 83-105.
Lam Put, R., "Dynamic Response of a Tall Building to Random Wind Loads," Proc. 3rd Int. Conf. on Wind Effects on Buildings and Structures, Tokyo, September 1971.
Littleford, T.W., "A Comparison of Flexural Strength-Stiffness Relationships for Clear Wood and Structural Grades of Lumber," Information Report VP-X-30, Forest Products Lab., B.C., Canada, December 1967.
Loucks, D.P., and Lynn, W.R., "Probabilistic Models for Predicting Stream Quality," Water Resources Research, Vol. 2, No. 3, September 1966, pp. 593-605.
Matsui, M., Ishihara, L., and Hibi, K., "Directional Characteristics of Probability Distribution of Extreme Wind Speeds by Typhoon Simulation," Journal of Wind Engineering and Industrial Aerodynamics, Vol. 90, Elsevier Science, Ltd., 2002.
Mohan, S.B., and Gautam, P., "Cost of Highway Work Zone Injuries," Practice Periodical on Structural Design and Construction, Vol. 7, May 2002.
Mohseni, O., Erickson, T.R., and Stefan, H.G., "Upper Bounds for Stream Temperatures in the Contiguous United States," Journal of Environmental Engineering, Vol. 128, January 2002.
National Aeronautics and Space Administration, "Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners," August 2002.
National Institutes of Health, "Science and Judgment in Risk Assessment: Needs and Opportunities," Environmental Health Perspectives, Vol. 102, No. 11, November 1994.
National Research Council, "Science and Judgment in Risk Assessment," National Academy Press, Washington, DC, 1994.
Pettitt, J.H.D., "Statistical Analysis of Density Tests," Journal Highway Div., ASCE, Vol. HW2, November 1967.
Pruckner, F., and Gjorv, O.E., "Patch Repair and Macrocell Activity in Concrete Structures," ACI Materials Journal, Vol. 99, March-April 2002.
Rouillard, V., "Classification of Road Surface Profiles," Journal of Transportation Engineering, ASCE, Vol. 126, January/February 2000, pp. 41-45.
Thoft-Christensen, P., "Stochastic Modeling of the Diffusion Coefficient for Concrete," Reliability and Optimization of Structural Systems, Swets & Zeitlinger, Lisse, 2003.
Todd, D.K., and Meyer, C.F., "Hydrology and Geology of the Honolulu Aquifer," Journal of Hydraulics Div., ASCE, Vol. 97, February 1971.
United Kingdom Health and Safety Executive (HSE), "Use of Risk Assessment in Government Departments," U.K. Interdepartmental Liaison Group on Risk Assessment, 2000.
U.S. Department of Energy, "Characterization of Uncertainties in Risk Assessment with Special Reference to Probabilistic Uncertainty Analysis," EH-413-068/0496, April 1996.
van Gelder, P.H.A.J.M., "Statistical Methods for the Risk-Based Design of Civil Structures," Communications on Hydraulic and Geotechnical Engineering, Delft University of Technology, 2000.
Viggiani, C., "Analysis and Design of Piled Foundations," Rivista Italiana di Geotecnica, Vol. 35, 2001.
Viner, J.G., "Recent Developments in Roadside Crash Cushions," Journal of Transportation Engineering, ASCE, Vol. 98, February 1972, pp. 71-87.
Winn, K., Rahardjo, H., and Peng, S.C., "Characterization of Residual Soil in Singapore," Journal of Southeast Asian Geotechnical Society, Vol. 32, No. 1, April 2001.
Wu, F-Q., and Wang, S-J., "Statistical Model for Structure of Jointed Rock Mass," Geotechnique, Vol. 52, 2002.
Wynn, F.H., "Shortcut Modal Split Formula," Highway Research Record, National Research Council, 1969.
Zhang, L., Tang, W.H., and Ng, C.W.W., "Reliability of Axially Loaded Driven Pile Groups," ASCE Journal of Geotechnical and Environmental Engineering, Vol. 127(12), December 2001.
Fundamentals of Probability Models
► EXAMPLE 2.1 A contractor is planning the acquisition of construction equipment, including bulldozers, needed for
a new project in a remote area. Suppose that from his prior experience with similar bulldozers, he
estimated that there is a 50% chance that each bulldozer can remain operational for at least 6 months.
If he purchases three bulldozers for the new project, what is the probability that only 1 bulldozer will be left operational 6 months into the project?
First, we observe that at the end of 6 months, the possible number of operating bulldozers will be 0, 1, 2, or 3; therefore, this set of numbers constitutes the possibility space of the number of operational bulldozers after 6 months. However, this possibility space is not suitable for the question posed above, because these four outcomes are not equally likely. For this latter purpose, the possibility space must be derived from the possible status of each bulldozer after 6 months, as follows:
Denoting the condition of each bulldozer after 6 months as O for operational and N for nonoperational, the possible conditions of the three bulldozers would be:
OOO   OON   ONO   NOO
ONN   NON   NNO   NNN
Therefore, the pertinent possibility space consists of the eight possible outcomes as indicated
above. We observe also that since the condition of a bulldozer is equally likely to be operational
or nonoperational after 6 months, the eight possible outcomes are equally likely to occur. It is also
worth noting that among the eight possible outcomes, only one of them can be realized at the end of
6 months; this means that the different possibilities are mutually exclusive (we shall say more on this
point in Section 2.2.1).
Finally, among the eight possible outcomes, the realization of ONN, NON, or NNO is tantamount
to the event of interest, namely, “only one bulldozer is operational." Because each possible outcome
is equally likely to occur, the probability of the event within the above possibility space is 3/8. ◄
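The counting argument of Example 2.1 can be reproduced by listing the outcomes explicitly. This is a minimal sketch; it assumes, as the example implicitly does, that the bulldozers behave independently, so that the eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Each bulldozer is either operational ("O") or nonoperational ("N") after 6 months.
outcomes = ["".join(s) for s in product("ON", repeat=3)]

# With a 50% chance per bulldozer (and independent bulldozers, an assumption),
# the eight outcomes are equally likely, so P = favorable / total.
favorable = [o for o in outcomes if o.count("O") == 1]
probability = Fraction(len(favorable), len(outcomes))

print(outcomes)     # ['OOO', 'OON', 'ONO', 'ONN', 'NOO', 'NON', 'NNO', 'NNN']
print(favorable)    # ['ONN', 'NON', 'NNO']
print(probability)  # 3/8
```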
► EXAMPLE 2.2 In designing a left-turn lane for eastbound traffic at a highway intersection, such as shown in
Fig. E2.2, the probability of 5 or more cars waiting for left turns at any given time may be needed to
determine the required length of the left-turn lane.
[Figure E2.2: highway intersection with left-turn (L.T.) lane]
For the above purpose, suppose that over a period of 1 week, 60 observations were made at regular
time intervals (during periods of heavy traffic) of the number of eastbound motor vehicles waiting for
left turns at this intersection, with the following results:
No. of vehicles   No. of         Relative
waiting           observations   frequency
0                 4              4/60
1                 16             16/60
2                 20             20/60
3                 14             14/60
4                 3              3/60
5                 2              2/60
6                 1              1/60
7                 0              0
8                 0              0
Conceivably, the number of vehicles waiting for left turns, during heavy traffic hours, could be any
integer number; however, based on the above observations, it is not likely that there will be seven or
more vehicles waiting for left turns at any time.
On the basis of the above observations, the estimated relative frequency (in the third column
of the above table) may be used approximately as the probability of a particular number of cars
waiting for left turns. For example, the probability of the event “5 or more cars waiting for left turns” is approximately 2/60 + 1/60 = 3/60 = 0.05. The estimated probabilities based on relative frequencies are approximate because of “sampling error,” which may be significant when the estimate is based on a small number of observations; this is part of the epistemic uncertainty that we discussed in Chapter 1. The accuracy of the estimated probabilities will improve as the total number of observations (sample size) increases, as we shall discuss further in Chapter 6. ◄
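The relative-frequency estimate in Example 2.2 amounts to dividing each count by the total number of observations and summing over the event of interest. A minimal sketch using the counts from the table above, with exact fractions so the result matches the hand calculation:

```python
from fractions import Fraction

# Observed counts from the table: number of waiting vehicles -> number of observations
observations = {0: 4, 1: 16, 2: 20, 3: 14, 4: 3, 5: 2, 6: 1, 7: 0, 8: 0}
total = sum(observations.values())  # 60 observations

# Relative frequencies, used as approximate probabilities
rel_freq = {k: Fraction(n, total) for k, n in observations.items()}

# Event "5 or more cars waiting": sum over the mutually exclusive counts 5, 6, 7, 8
p_5_or_more = sum(f for k, f in rel_freq.items() if k >= 5)
print(total, p_5_or_more, float(p_5_or_more))  # 60 1/20 0.05
```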
EXAMPLE 2.3 The simply supported beam AB shown in Fig. E2.3 is carrying a load of 100 kg that may be placed
anywhere along the span of the beam. The reaction at the support A, RA, can be any value between
0 and 100 kg depending on the position of the load on the beam; in this case, therefore, every value between 0 and 100 kg is a possible value of RA, and this range constitutes its possibility space.
An event of interest may be that the reaction is in some specified interval; for example, (10 < RA
< 20 kg) or (RA > 50 kg). Therefore, if a particular value of RA is realized, the event (defined by an
interval) containing this value of RA has occurred, and we can speak of the probability that RA will,
or will not, be in a given interval. For example, if we assume that the 100 kg load is equally likely to
be placed anywhere along the beam span, then the probability that the value of RA will be in a given
interval is proportional to the length of the interval; for example,
$P(10 < R_A < 20) = 10/100 = 0.10$ and $P(R_A > 60) = 40/100 = 0.40$. ◄
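The interval probabilities of Example 2.3 can be checked by simulation. This sketch assumes, as the example does, that the load position is uniformly distributed along the span, and uses the statics of a simply supported beam, $R_A = W(L - x)/L$ with $x$ measured from support A. The span is normalized to 1 because the result does not depend on it; the function name is hypothetical.

```python
import random


def estimate_reaction_probs(n_trials=200_000, seed=1, load=100.0, span=1.0):
    """Monte Carlo estimate of P(10 < R_A < 20) and P(R_A > 60).

    Assumes the load position x (from support A) is uniform on [0, span];
    statics gives R_A = load * (span - x) / span.
    """
    rng = random.Random(seed)
    in_10_20 = 0
    over_60 = 0
    for _ in range(n_trials):
        x = rng.uniform(0.0, span)
        r_a = load * (span - x) / span
        if 10.0 < r_a < 20.0:
            in_10_20 += 1
        if r_a > 60.0:
            over_60 += 1
    return in_10_20 / n_trials, over_60 / n_trials


if __name__ == "__main__":
    p1, p2 = estimate_reaction_probs()
    print(f"P(10 < R_A < 20) ~ {p1:.3f}   P(R_A > 60) ~ {p2:.3f}")
```

The estimates approach 0.10 and 0.40 as the number of trials grows; with a finite number of trials they carry the same kind of sampling error discussed in Example 2.2.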
EXAMPLE 2.4 Consider the bearing capacity of the footing foundation for a building. Suppose that from prior
experience, it is the judgment of the foundation engineer that the bearing capacity of a footing at the
building site has a 95% probability of being at least 4000 psf (pounds per square foot). If 16 individual
footings are required for the building foundation, what is the probability that all the footings will
have at least 4000 psf bearing capacity? Conversely, what is the probability that at least one of the 16
footings would have its bearing capacity less than 4000 psf?
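In this case, the possibility space consists of $2^{16} = 65,536$ sample points, since each of the 16 footings either does or does not have a bearing capacity of at least 4000 psf. Suppose that each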
footing has a probability of 0.95 that its bearing capacity will be at least 4000 psf. Then if the bearing
capacities of the different footings are statistically independent of each other, the probability that all
the footings will have bearing capacities of at least 4000 psf is then $(0.95)^{16} = 0.440$.
In the second question, “at least one” is the complement of “none.” Therefore, the probability of at least one footing with bearing capacity less than 4000 psf is $1 - 0.440 = 0.560$. The concepts of
complement of an event and statistical independence will be discussed later in Sections 2.2 and 2.3,
respectively. ◄
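The arithmetic of Example 2.4 is short enough to verify directly. This sketch assumes, as the example states, that the footing capacities are statistically independent, so the probability that all are adequate is a product of the individual probabilities.

```python
p_single = 0.95     # P(a footing has capacity >= 4000 psf), from the example
n_footings = 16

n_sample_points = 2 ** n_footings             # each footing: adequate or not
p_all_adequate = p_single ** n_footings       # product rule, assuming independence
p_at_least_one_short = 1.0 - p_all_adequate   # "at least one" is the complement of "none"

print(n_sample_points, round(p_all_adequate, 3), round(p_at_least_one_short, 3))  # 65536 0.44 0.56
```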
From the foregoing examples, we can observe the following special characteristics of
probabilistic problems.
In Sections 2.2 and 2.3, we shall present the mathematical tools pertinent to and useful
for each of these purposes.
The potential winner in competitive bidding for a construction project will be among those
firms submitting bids for the project. In this case, the sample space is generally finite and
consists of all the firms submitting bids for the project, whereas each of the firms is a sample
point.
The number of days in a year with potentially measurable precipitation in Seattle is finite and conceivably will range from 0 to 365 days. Each possible number of days (0, 1, ..., 365) is a sample point, so the sample space contains the number of days in the year plus one, i.e., 366 sample points.
In determining the percentage of flights that arrive more than 15 minutes late at O'Hare International Airport, the set of flights landing at O'Hare in a 24-hour day is a finite sample space, and each of the flights is a sample point within this sample space.
Examples of discrete sample spaces with a countably infinite number of sample points are the following:
• The number of flaws in a given length of welding—there may be none or only a few
flaws in the weld, or the number of flaws could be very large. The actual number of
flaws in a weld could conceivably be infinite.
• The number of cars crossing a toll bridge until the next accident on the bridge over
a period of one year. An accident may possibly occur with the first car crossing the
bridge, or there may not be any accident in the year.
In a continuous sample space, the number of sample points is always infinite. For example:
• In considering the location on a toll bridge where a traffic accident may occur, each of
the possible locations is a sample point, and the sample space would be the continuum
of possible locations on the bridge.
• If the bearing capacity of a clay soil deposit is between 1.5 tsf (tons per square foot)
and 4.0 tsf, then any value within the range 1.5 to 4.0 is a sample point, and the entire
continuum of values in this range constitutes the sample space.
► EXAMPLE 2.5 Consider again a simply supported beam AB as shown in Fig. E2.5a.
(a) If a concentrated load of 100 kg can be placed only at any of the 1-meter interval points on the
beam, the sample space of the reaction RA will be as follows:
(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 kg)
Figure E2.5b Sample space of $R_A$ and $R_B$.
Figure E2.5c Sample space of $R_A$.
(c) If the 100-kg load can be placed anywhere along the length of the beam, the sample space of $R_A$ can be represented by the straight line between 0 and 100 (Fig. E2.5c), whereas the corresponding sample space of $R_A$ and $R_B$ is the diagonal line shown in Fig. E2.5d. In Fig. E2.5c, an event may be defined as $(20 < R_A < 40)$, whereas in Fig. E2.5d an event for $(R_A, R_B)$ may be the segment between (20, 80) and (40, 60).
Figure E2.5d Sample space of $(R_A, R_B)$.
Figure E2.5e Sample space of $R_A$ or $R_B$.
(d) Next, consider that the load can be 100 kg, 200 kg, or 300 kg and can be placed anywhere along the beam. In this case, the sample space of $R_A$, or $R_B$, contains all the values between 0 and 300 kg, as represented by the line shown in Fig. E2.5e, whereas the sample space of $(R_A, R_B)$ is represented by the three lines shown in Fig. E2.5f.
(e) Finally, if the load can be any value between 100 and 300 kg and can be placed anywhere along the beam, the sample space of $R_A$ or $R_B$ is also the straight line of Fig. E2.5e, whereas the sample space of $(R_A, R_B)$ would be the hatched area in Fig. E2.5g.
The event $(R_A > 200$ kg$)$ within the sample space of Fig. E2.5g is the triangular region shown in that figure, whereas the event $(R_A < 100;\ R_B > 100)$ is the trapezoid in the same figure. ◄
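The sample spaces in Example 2.5 follow from beam statics: for a load W at distance x from A on a span L, $R_A = W(L - x)/L$ and $R_B = Wx/L$, so every pair satisfies $R_A + R_B = W$. That is why each fixed load traces one line in the $(R_A, R_B)$ plane. This sketch assumes a 10-m span, which is what the 11 values listed in part (a) imply for load positions at 1-m intervals; the function name is hypothetical.

```python
SPAN = 10.0  # m; assumed, since 1-m load positions give the 11 values listed in (a)


def reactions(load, x, span=SPAN):
    """Support reactions (R_A, R_B) of a simply supported beam; x measured from A."""
    r_a = load * (span - x) / span
    r_b = load * x / span
    return r_a, r_b


# (a) 100-kg load at the 1-m points: the discrete sample space of R_A
sample_space_a = sorted({reactions(100.0, x)[0] for x in range(0, 11)})
print(sample_space_a)

# (d) loads of 100, 200, or 300 kg: every (R_A, R_B) lies on R_A + R_B = W,
# so the sample space of the pair consists of three lines.
for w in (100.0, 200.0, 300.0):
    for x in (0.0, 2.5, 7.5, 10.0):
        r_a, r_b = reactions(w, x)
        assert abs((r_a + r_b) - w) < 1e-9
```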
► EXAMPLE 2.6 From historical data of floods for a river, suppose the annual maximum flood levels above the mean river flow range from 1 m to 5 m. If the annual maximum flow is measured in increments of 0.1 m, then the sample space of the annual flow would contain the 41 sample points (1.0, 1.1, 1.2, ..., 4.8, 4.9, 5.0 m). The event of annual flood flow exceeding 3.0 m, therefore, would contain the 20 sample points defined by (3.1, 3.2, ..., 4.8, 4.9, 5.0 m).
On the other hand, if the annual maximum flow can be any level from 1 m to 5 m, then the sample
space of the annual flow would be the continuum of infinite values between 1 m and 5 m. Similarly,
the event of flood flow exceeding 3.0 m will be the continuum of values between 3 m and 5 m. ◄
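Counting the sample points of the discrete case in Example 2.6 is easy to get wrong by one; a direct enumeration confirms the counts. This minimal sketch generates the levels from integer tenths of a meter to avoid floating-point drift.

```python
# Generate levels from integer tenths of a meter: 1.0 m -> 10, ..., 5.0 m -> 50
levels = [t / 10 for t in range(10, 51)]        # 1.0, 1.1, ..., 5.0 m
exceeding_3 = [h for h in levels if h > 3.0]    # 3.1, ..., 5.0 m

print(len(levels), len(exceeding_3))  # 41 20
```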
Special Events
We define the following special events and adopt the corresponding notations indicated
below:
• Impossible event, denoted $\emptyset$, is the event with no sample point. It is, therefore, an empty set in a sample space.
• Certain event, denoted S, is the event containing all the sample points in a sample space; i.e., it is the sample space itself.
• Complementary event $\overline{E}$ of an event E contains all the sample points in S that are not in E.
Figure 2.2a Venn diagram with two events A and B.
Figure 2.2b Venn diagram with three events A, B, and C.
A Venn diagram with two (or more) events is illustrated in Fig. 2.2.
In many practical problems, the event of interest may be a combination of several
other events. For instance, in Example 2.1 the event of at least two bulldozers in operating
condition after 6 months may be of interest. This event would be the combination of two
or three bulldozers in operating condition. Such an event involves the union of the two
individual events.
There are only two ways that events may be combined, or an event may be derived from other events; namely, by the union or intersection. Consider two events $E_1$ and $E_2$. The union of $E_1$ and $E_2$, denoted $E_1 \cup E_2$, is the occurrence of $E_1$ or $E_2$ or both. (In set theory, or is used in an inclusive sense, which means and/or.) This means that $E_1 \cup E_2$ is another event containing all the sample points belonging to $E_1$ or $E_2$ or both.
The Venn diagram for the union of $E_1$ and $E_2$ would be the hatched region shown in Fig. 2.3. It follows, therefore, that the region outside of the hatched region within S is the complementary event $\overline{E_1 \cup E_2}$, i.e., the complement of the event $(E_1 \cup E_2)$.
Figure 2.3 Venn diagram for $E_1 \cup E_2$.
Figure 2.4 Venn diagram of $E_1E_2$.
The union of three or more events, as shown in Fig. 2.2b, means the occurrence of at least one of them, and is the subset of the sample points within the three hatched regions of the individual events A, B, and C.
The intersection of two events $E_1$ and $E_2$, denoted $E_1 \cap E_2$ or simply $E_1E_2$, is an event representing the joint occurrence of $E_1$ and $E_2$; in other words, $E_1E_2$ is the subset of sample points belonging to both $E_1$ and $E_2$. The Venn diagram of $E_1E_2$ would be the double-hatched region shown in Fig. 2.4.
Finally, the intersection of three or more events is the occurrence of all of them and would be the subset of sample points belonging to all the individual events.
Examples of Union of Events
• In describing the state of supply of construction materials, if $E_1$ represents the shortage of concrete, and $E_2$ represents the shortage of steel, then the union $E_1 \cup E_2$ is the shortage of concrete or steel, or both. In this case, the complementary event $\overline{E_1 \cup E_2}$ means no shortage of construction material, i.e., concrete and steel are both available, whereas $\overline{E_1} \cup \overline{E_2}$ means there is no shortage of concrete or there is no shortage of steel (observe the subtle difference).
• The transportation of cargoes between Chicago and New York may be by air, highway,
or railway. If the availability of each of these three modes of transportation is denoted,
respectively, as A, H, and R, the available means of transporting cargoes between Chicago and New York is $(A \cup H \cup R)$, i.e., cargoes may be shipped by air or highway or railway.
Examples of Intersection of Events
• Referring to the first example above, the event $E_1E_2$ would mean the shortage of both concrete and steel, and $\overline{E_1}\,\overline{E_2}$ means no shortage of either material.
• Referring to the second example above, the event AHR means that all three modes of transportation between Chicago and New York are available. Observe also that the event $A\overline{H}\,\overline{R}$ means that only air transportation is available.
EXAMPLE 2.7 Suppose there are two highway routes from city A to city B as shown in Fig. E2.7a. Let $E_1$ represent the event that Route 1 is open and $E_2$ that Route 2 is open to traffic. Then $E_1 \cup E_2$ means that Route 1 is open or Route 2 is open; in other words, at least one of the two routes is open.
The intersection, $E_1E_2$, means that both Routes 1 and 2 are open, whereas $E_1\overline{E_2}$ means that Route 1 is open but Route 2 is closed, and $\overline{E_1}\,\overline{E_2}$ means that both routes are closed, perhaps because of heavy snow.
Next consider the three cities A, B, and C with Route 1 connecting A to B and Route 2 connecting B to C as shown in Fig. E2.7b. Since $\overline{E_1}$ and $\overline{E_2}$, respectively, mean that Route 1 is closed and Route 2 is closed, the union $\overline{E_1} \cup \overline{E_2}$ means that Route 1 is closed or Route 2 is closed, which also means that it is not possible to go from city A to city C.
Finally, assume that there are two alternative routes from city A to city B, and only Route 3 from city B to city C, as shown in Fig. E2.7c. In this case, there are two possible ways to go from city A to city C; namely, $E_1E_3 \cup E_2E_3$. Alternatively, this can be expressed as the event $(E_1 \cup E_2)E_3$. Observe that the event $\overline{E_1}\,\overline{E_2} \cup \overline{E_3}$ means that there will be no transportation from city A to city C. ◄
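The last part of Example 2.7 can be checked by enumerating all open/closed states of the three routes. This sketch encodes "open" as `True` (an illustrative encoding) and confirms that $(E_1 \cup E_2)E_3$ and $\overline{E_1}\,\overline{E_2} \cup \overline{E_3}$ are exact complements in every state.

```python
from itertools import product


def connected(e1, e2, e3):
    """A-to-C travel possible: Route 1 or 2 open (A to B) and Route 3 open (B to C)."""
    return (e1 or e2) and e3


def cut_off(e1, e2, e3):
    """Complement as written in the example: Routes 1 and 2 both closed, or Route 3 closed."""
    return (not e1 and not e2) or (not e3)


states = list(product([True, False], repeat=3))  # True = route open; 8 states
for e1, e2, e3 in states:
    # the two descriptions must be exact complements in every state
    assert connected(e1, e2, e3) == (not cut_off(e1, e2, e3))

n_connected = sum(connected(*s) for s in states)
print(n_connected)  # 3
```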
► EXAMPLE 2.8 Consider the last case of Example 2.5, in which the load ranges from 100 kg to 300 kg and the sample space of the two reactions $(R_A, R_B)$ is as shown in Fig. E2.5g.
the events A and B would be the respective subsets of the point pairs shown in Figs. E2.8a and E2.8b.
The union A U B is the hatched region in Fig. E2.8c, whereas the intersection AB would be the hatched
region in Fig. E2.8d.
Note that in this example, Figs. E2.8a through E2.8d also serve as the corresponding Venn diagrams. ◄
If two events are mutually exclusive, then their intersection is an impossible event; i.e., $E_1E_2 = \emptyset$.
Examples of events that are naturally mutually exclusive include the following:
1. A car making a right turn and making a left turn at a street intersection.
2. Occurrence of flood and occurrence of drought of a river at a given time.
3. Collapse and no damage of a building under a strong earthquake.
Similarly, three or more events are mutually exclusive if the occurrence of one event precludes the occurrence of all the other events. Examples are the following:
1. If there are three competing locations for a single airport, then the three choices for the final location of the airport are mutually exclusive.
2. In Example 2.1, the possible numbers of bulldozers that will remain operational after 6 months (0, 1, 2, or 3) are mutually exclusive.
3. In Example 2.2, the possible numbers of vehicles waiting for left turns at the intersection are mutually exclusive.
However, in Example 2.7, the conditions of the different routes are not mutually exclusive, as the closing of one route does not necessarily preclude the closing of another route. Likewise, in Example 2.8, the events A and B are not mutually exclusive because both $R_A$ and $R_B$ can exceed 100 kg if the load is sufficiently high, say > 200 kg; also, the intersection AB is not a null set.
► EXAMPLE 2.9 Two construction companies a and b are bidding for projects. Define A as the event that Company a
wins a bid, and B as the event that Company b wins a bid. Let us sketch the Venn diagrams for the
sample spaces of the following:
1. Company a is submitting a bid for one project, and Company b is submitting its own bid for another
project. The Venn diagram would be as shown in Fig. E2.9a.
In this case, it is possible for both companies to win their respective bids as represented by the
intersection of A and B.
2. Companies a and b are submitting bids for the same project, and there are also other bidders for
the project. The corresponding Venn diagram would be as shown in Fig. E2.9b.
In this case, Company a or b may be awarded the project, or one of the other companies may be
awarded the project. If Company a wins its bid, then no other company, including Company b, can also be the winner of the award. That is, the occurrence of A precludes the occurrence of B; thus,
the events A and B are mutually exclusive, as shown in Fig. E2.9b where there is no overlapping
region between A and B. Moreover, the complement of $A \cup B$, i.e., $\overline{A \cup B}$, means that one of the other companies wins the award.
3. Company a and Company b are the only companies submitting competing bids for a single project.
The Venn diagram for this case would appear as shown in Fig. E2.9c.
In this particular case, since Company a and Company b are the only bidders for the single project,
and only one of them can be the winner, the events A and B are mutually exclusive, and are also
collectively exhaustive, or $A \cup B = S$. Thus, the sample space contains only the two sets A and B as
shown in Fig. E2.9c.
► EXAMPLE 2.10 There are three possible sites, denoted as Site a, Site b and Site c for the construction of a new airport
for a major city. Define the following:
$\cup$ = the union
$\cap$ = the intersection
$\supset$ = contains
$\subset$ = belongs to, or is contained in
$\overline{E}$ = the complement of E
With these notations, the mathematical rules governing the operations of sets are the
following:
Equality of Sets
Two sets are equal if and only if both sets contain exactly the same sample points. On this
basis, we observe that
$A \cup \emptyset = A$
Also,
$A \cap \emptyset = \emptyset$
Furthermore,
$A \cup A = A$
$A \cap A = A$
and, for the sample space S,
$A \cup S = S$
whereas
$A \cap S = A$
On Complementary Sets
With regard to an event E and its complement $\overline{E}$, we observe the following:
$E \cup \overline{E} = S$
whereas
$E \cap \overline{E} = \emptyset$
and
$\overline{\overline{E}} = E$
that is, the complement of the complementary event yields the original event.
Commutative Rule
The union and intersection of sets are commutative; that is, for two sets A and B,
$A \cup B = B \cup A$
Also,
$A \cap B = B \cap A$
Associative Rule
The union and intersection of sets are also associative; that is, for three sets A, B, and C,
$(A \cup B) \cup C = A \cup (B \cup C)$
Also,
$(AB)C = A(BC)$
Distributive Rule
Finally, the union and intersection of sets are distributive; that is, for three sets A, B, and C,
$(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$, or $AC \cup BC$
and also,
$(AB) \cup C = (A \cup C) \cap (B \cup C)$
We might observe that the above commutative, associative, and distributive rules for
sets are similar to the same algebraic rules for numbers. In particular, the operational rules governing the addition and multiplication of numbers apply (with certain equivalences) to the union and intersection of sets. With the equivalences of union for addition and intersection for multiplication (i.e., $\cup \to +$ and $\cap \to \times$), the rules of conventional algebra apply to operations on sets or events. Moreover, in accordance with the hierarchy of algebraic operations, intersection takes precedence over union of sets, unless parenthetically indicated otherwise.
It should be emphasized that the above equivalences are only valid in an operational sense; conventional algebraic operations such as addition and multiplication have no meaning for sets or events. Moreover, there are no equivalent operations for subtraction or division of sets. On the other hand, there are operations and operational rules that apply to sets that have no counterparts in conventional algebra. For example, for a set A, $A \cup A = A$ and $AA = A$, whereas for a number a, $a + a = 2a$ and $a \times a = a^2$.
Another case in point is the second distributive rule described above, which says that
$(A \cup C)(B \cup C) = AB \cup AC \cup BC \cup CC$
but
$BC \cup CC = C$
Similarly,
$AC \cup C = C$
Therefore,
$(A \cup C)(B \cup C) = AB \cup C$
whereas, for numbers,
$(a + c)(b + c) = ab + ac + bc + c^2 \neq ab + c$
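The set rules above can be verified exhaustively on a small finite sample space, using Python's `|` (union) and `&` (intersection) on `frozenset`. This sketch assumes a 4-point sample space purely for illustration; every one of its 16 subsets is tried as A, B, and C.

```python
from itertools import combinations

S = frozenset(range(4))  # small illustrative sample space (an assumption)


def all_events(space):
    """Every subset of a finite sample space, i.e., every possible event."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


events = all_events(S)  # 2**4 = 16 events
for A in events:
    for B in events:
        for C in events:
            # the two distributive rules (| is union, & is intersection)
            assert (A | B) & C == (A & C) | (B & C)
            assert (A | C) & (B | C) == (A & B) | C
            # absorption steps used above: BC u C = C and AC u C = C
            assert (B & C) | C == C and (A & C) | C == C

# The numeric "analogue" fails, e.g., for a = b = c = 1:
a = b = c = 1
print((a + c) * (b + c), a * b + c)  # 4 2
```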
Finally, another important rule that applies to sets but has no counterpart in conventional algebra is de Morgan's rule, described below.
De Morgan’s Rule
This rule relates to sets and their complements. For two sets, or events, $E_1$ and $E_2$, de Morgan’s rule says that
$\overline{E_1 \cup E_2} = \overline{E_1} \cap \overline{E_2}$
The general validity of this relation can be shown with the Venn diagrams in Fig. 2.6. The unhatched region in Fig. 2.6a is clearly $\overline{E_1 \cup E_2}$. The two Venn diagrams in Fig. 2.6b show, respectively, the complementary sets $\overline{E_1}$ and $\overline{E_2}$, the intersection of which is the double-hatched region in Fig. 2.6c. From Figs. 2.6a and 2.6c, we see the equality of the two sets $\overline{E_1 \cup E_2} = \overline{E_1} \cap \overline{E_2}$, thus verifying de Morgan’s rule.
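In more general terms, the de Morgan's rule is
$$\overline{E_1 \cup E_2 \cup \cdots \cup E_n} = \overline{E}_1 \cap \overline{E}_2 \cap \cdots \cap \overline{E}_n \qquad (2.3a)$$
Applying Eq. 2.3a to the complements $\overline{E}_1, \overline{E}_2, \ldots, \overline{E}_n$ gives
$$\overline{\overline{E}_1 \cup \overline{E}_2 \cup \cdots \cup \overline{E}_n} = E_1 E_2 \cdots E_n$$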
42 Chapter 2. Fundamentals of Probability Models
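Figure 2.6 Venn diagrams showing de Morgan's rule: (a) unhatched region $= \overline{E_1 \cup E_2}$; (b) the complementary sets $\overline{E}_1$ and $\overline{E}_2$; (c) double-hatched region $= \overline{E}_1 \cap \overline{E}_2$.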
Thence, taking the complements of both sides of the above equation, the de Morgan's rule
can also be stated as
$$\overline{E_1 E_2 \cdots E_n} = \overline{E}_1 \cup \overline{E}_2 \cup \cdots \cup \overline{E}_n \qquad (2.3b)$$
In light of Eqs. 2.3a and 2.3b, we can establish the following duality relation: The complement
of the unions and intersections of events is equal to the intersections and unions, respectively, of the
complements of the same events.
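De Morgan's rule in its general form (Eqs. 2.3a and 2.3b) can likewise be checked on randomly generated finite events. This Python sketch assumes a 20-point sample space and random events drawn with a fixed seed; the helper name `de_morgan_holds` is illustrative only.

```python
import random

def de_morgan_holds(events, sample_space):
    """Check Eq. 2.3a and Eq. 2.3b for a list of events (subsets of sample_space)."""
    complements = [sample_space - e for e in events]
    union_all = set().union(*events)
    intersection_all = set(sample_space).intersection(*events)
    # Eq. 2.3a: complement of the union equals the intersection of the complements.
    eq_2_3a = sample_space - union_all == set(sample_space).intersection(*complements)
    # Eq. 2.3b: complement of the intersection equals the union of the complements.
    eq_2_3b = sample_space - intersection_all == set().union(*complements)
    return eq_2_3a and eq_2_3b

rng = random.Random(1)                # fixed seed so the check is repeatable
S = set(range(20))                    # 20-point sample space (illustrative)
trials = [
    [{x for x in S if rng.random() < 0.5} for _ in range(rng.randint(2, 5))]
    for _ in range(200)
]
print(all(de_morgan_holds(events, S) for events in trials))  # True
```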
The following examples illustrate the above duality relation:
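► EXAMPLE 2.11 Consider a simple chain consisting of two links as shown in Fig. E2.11. Clearly, the chain will fail to
carry the load F if either link breaks; thus, if
$E_1$ = breakage of link 1
$E_2$ = breakage of link 2
then
$$\text{Failure of chain} = E_1 \cup E_2$$
No failure of the chain, therefore, is the complement $\overline{E_1 \cup E_2}$. However, no failure of the chain also
means that both links survive (no breakage); that is,
$$\text{No failure of chain} = \overline{E}_1 \cap \overline{E}_2$$
Therefore, we have
$$\overline{E_1 \cup E_2} = \overline{E}_1 \cap \overline{E}_2$$
[Figure E2.11: a two-link chain (Link 1, Link 2) carrying the load F.]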
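► EXAMPLE 2.12 The water supply for two cities C and D comes from the two sources A and B as shown in Fig. E2.12.
Water is transported by pipelines consisting of branches 1, 2, 3, and 4. Assume that either one of the
two sources, by itself, is sufficient to supply the water for both cities.
Denoting by $E_i$ the failure of branch $i$, a shortage of water in city C is the event $E_1E_2 \cup E_3$, so that no shortage of water in city C is
$$\overline{E_1E_2 \cup E_3} = (\overline{E}_1 \cup \overline{E}_2)\,\overline{E}_3$$
The last event above means that there is no failure in branch 1 or branch 2 and also no failure in branch
3. Similarly, a shortage of water in city D would be the event $E_1E_2 \cup E_3 \cup E_4$. Therefore, no shortage
of water in city D is
$$\overline{E_1E_2 \cup E_3 \cup E_4} = (\overline{E}_1 \cup \overline{E}_2)\,\overline{E}_3\,\overline{E}_4$$
which means that there is sufficient supply at the station, i.e., $(\overline{E}_1 \cup \overline{E}_2)$, and there are no failures in
both branches 3 and 4, represented by $\overline{E}_3\overline{E}_4$. ◄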
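The two no-shortage events of Example 2.12 can be verified by enumerating all 16 combinations of branch failures. This sketch encodes the shortage events exactly as stated in the example ($E_1E_2 \cup E_3$ for city C and $E_1E_2 \cup E_3 \cup E_4$ for city D); the function names are hypothetical.

```python
from itertools import product

def shortage_c(e1, e2, e3, e4):
    """Shortage in city C: branches 1 and 2 both fail, or branch 3 fails."""
    return (e1 and e2) or e3

def shortage_d(e1, e2, e3, e4):
    """Shortage in city D: branches 1 and 2 both fail, or branch 3 or 4 fails."""
    return (e1 and e2) or e3 or e4

# Every combination of branch states; True means that branch has failed.
states = list(product([False, True], repeat=4))

# Complement of each shortage event versus the de Morgan form given in the text.
c_ok = all((not shortage_c(*s)) == ((not s[0] or not s[1]) and not s[2])
           for s in states)
d_ok = all((not shortage_d(*s)) == ((not s[0] or not s[1]) and not s[2] and not s[3])
           for s in states)

# States with no shortage in city D: branches 3 and 4 intact, not both 1 and 2 failed.
no_shortage_d = [s for s in states if not shortage_d(*s)]
print(c_ok, d_ok, len(no_shortage_d))  # True True 3
```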
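Axiom 1: For every event $E$ in a sample space $S$,
$$P(E) \ge 0 \qquad (2.4)$$
Axiom 2: The probability of the certain event $S$ is
$$P(S) = 1.0 \qquad (2.5)$$
Axiom 3: Finally, for two events $E_1$ and $E_2$ that are mutually exclusive,
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) \qquad (2.6)$$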
Equations 2.4 through 2.6 constitute the basic axioms of probability theory. These are
essential assumptions and, therefore, cannot be violated. However, these axioms and the
resulting theory must be consistent with and useful for real-world problems. In this latter
regard, we may observe the following:
• The probability of an event, P(E), is a relative measure, i.e., relative to other events
in the same sample space. For this purpose, it is natural and convenient to assume
such a measure to be nonnegative as prescribed in Eq. 2.4.
• Because an event, E, is always defined within a prescribed sample space S, it is
convenient to normalize its probability relative to S (the certain event), as specified
in Eq. 2.5.
Therefore, on the basis of Eqs. 2.4 and 2.5, it follows that the probability of an event E is
bounded between 0 and 1.0; that is,
$$0 \le P(E) \le 1.0$$
With regard to the third axiom of Eq. 2.6, we may observe intuitively that from a relative
frequency standpoint, if an event $E_1$ occurs $n_1$ times among $n$ repetitions of an experiment,
and another event $E_2$ occurs $n_2$ times in the same $n$ repetitions, in which $E_1$ and $E_2$ cannot
occur simultaneously (they are mutually exclusive), then $E_1$ or $E_2$ will have occurred
$(n_1 + n_2)$ times among the $n$ repetitions of the experiment. Thence, on the basis of relative
frequency, we have (for large $n$)
$$P(E_1 \cup E_2) = \frac{n_1 + n_2}{n} = \frac{n_1}{n} + \frac{n_2}{n} = P(E_1) + P(E_2)$$
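The relative-frequency argument for Axiom 3 can be illustrated by simulation. In this sketch (an illustrative assumption, not an example from the text) a fair die is rolled n times, with the mutually exclusive events $E_1 = \{1, 2\}$ and $E_2 = \{6\}$; because the two events never occur together, the count for $E_1 \cup E_2$ is exactly $n_1 + n_2$.

```python
import random

def count_outcomes(n, seed=0):
    """Roll a fair die n times and count occurrences of
    E1 = {1, 2}, E2 = {6}, and E1 ∪ E2 (E1 and E2 are mutually exclusive)."""
    rng = random.Random(seed)
    n1 = n2 = n_union = 0
    for _ in range(n):
        roll = rng.randint(1, 6)
        in_e1 = roll in (1, 2)
        in_e2 = roll == 6
        n1 += in_e1                 # booleans count as 0 or 1
        n2 += in_e2
        n_union += in_e1 or in_e2
    return n1, n2, n_union

n = 100_000
n1, n2, n_union = count_outcomes(n)
# The counts add exactly, so the relative frequencies add as well.
print(n_union == n1 + n2)            # True
print(n1 / n, n2 / n, n_union / n)   # close to 1/3, 1/6, and 1/2
```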
It should be emphasized that the mathematical theory of probability provides the logical
bases for developing the relationships among probability measures. As expected, all such
relationships and any theoretical results are based on the three basic axioms stated in Eqs.
2.4 through 2.6.
2.3 Mathematics of Probability 45
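As can be seen in Fig. 2.7, $E_1$ and $\overline{E}_1E_2$ are mutually exclusive. Therefore, according
to Eq. 2.6,
$$P(E_1 \cup \overline{E}_1E_2) = P(E_1) + P(\overline{E}_1E_2)$$
But $E_1E_2 \cup \overline{E}_1E_2 = (E_1 \cup \overline{E}_1)E_2 = SE_2 = E_2$; and $E_1E_2$ and $\overline{E}_1E_2$ are clearly mutually
exclusive. Hence,
$$P(\overline{E}_1E_2) = P(E_2) - P(E_1E_2)$$
Since $E_1 \cup \overline{E}_1E_2 = E_1 \cup E_2$, substituting this result into the first equation gives Eq. 2.8:
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1E_2) \qquad (2.8)$$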
► EXAMPLE 2.13 A contractor is starting two new projects—jobs 1 and 2. There is some uncertainty on the scheduled
completion of each job; at the end of 1 year, the condition of completing each of the jobs may be
defined as follows:
A = definitely completed
B = completion questionable
C = definitely incomplete
Problems
1. Describe the sample space for the states of completion of the two jobs; i.e., identify all the possible
situations regarding the completion of both jobs 1 and 2 at the end of 1 year in terms of the notations
above; e.g., AA means both jobs will be definitely completed in 1 year.
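The sample space consists of the nine sample points
$$S = (AA, AB, AC, BA, BB, BC, CA, CB, CC)$$
The pertinent Venn diagram is shown in Fig. E2.13 where all the sample points are contained in S. If
$E_1$ is the event that job 1 will definitely be completed in 1 year, then
$$E_1 = (AA, AB, AC)$$
Similarly, the event that job 2 will definitely be completed in 1 year is $E_2 = (AA, BA, CA)$.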
2. Assuming that each state of completion of both jobs is equally likely at the end of 1 year (i.e., each
sample point has a probability of 1/9), what is the probability that at least one job will definitely be
completed at the end of 1 year?
In this case, the event of interest is the union $E_1 \cup E_2$. Observe first that the intersection $E_1E_2 = (AA)$;
therefore, according to Eq. 2.8,
$$P(E_1 \cup E_2) = \frac{3}{9} + \frac{3}{9} - \frac{1}{9} = \frac{5}{9}$$
We can also observe from Fig. E2.13 that $E_1 \cup E_2 = (AA, AB, AC, BA, CA)$; its probability is also
5/9, thus verifying the above result.
3. What is the probability that only one of the two jobs will definitely be completed at the end of 1 year? Denoting this event by E, it
contains the following sample points,
$$E = (AB, AC, BA, CA)$$
Its probability, therefore, is 4/9.
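Example 2.13 can be reproduced by listing the nine equally likely sample points explicitly. The sketch below uses `fractions.Fraction` to keep the probabilities exact and checks the addition rule (Eq. 2.8) against a direct count of sample points.

```python
from fractions import Fraction
from itertools import product

# A: definitely completed, B: completion questionable, C: definitely incomplete.
states = "ABC"
sample_space = [j1 + j2 for j1, j2 in product(states, repeat=2)]  # 9 points
p_point = Fraction(1, len(sample_space))   # equally likely: 1/9 each

def prob(event):
    """Probability of an event as (number of sample points) x 1/9."""
    return p_point * len(event)

E1 = {s for s in sample_space if s[0] == "A"}   # job 1 definitely completed
E2 = {s for s in sample_space if s[1] == "A"}   # job 2 definitely completed

p_union_rule = prob(E1) + prob(E2) - prob(E1 & E2)   # Eq. 2.8
p_union_direct = prob(E1 | E2)
only_one = (E1 | E2) - (E1 & E2)                      # exactly one job completed

print(p_union_rule, p_union_direct, prob(only_one))   # 5/9 5/9 4/9
```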
► EXAMPLE 2.14 For the purpose of designing the left turn (L.T.) lane in Example 2.2, the 60 observations of the number
of vehicles waiting for left turns at the intersection yielded the results shown in Table E2.14.
Define
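The different numbers of vehicles waiting for left turns at the intersection are obviously mutually
exclusive events. Then, using the relative frequencies in Table E2.14 to represent the corresponding
probabilities, we obtain, approximately (summing the relative frequencies for 3, 4, 5, and 6 vehicles),
$$\frac{14}{60} + \frac{3}{60} + \frac{2}{60} + \frac{1}{60} = \frac{20}{60}$$
whereas (summing the relative frequencies for 0 through 4 vehicles)
$$\frac{4}{60} + \frac{16}{60} + \frac{20}{60} + \frac{14}{60} + \frac{3}{60} = \frac{57}{60}$$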
Table E2.14
Number of vehicles waiting    Number of observations    Relative frequency
0                             4                         4/60
1                             16                        16/60
2                             20                        20/60
3                             14                        14/60
4                             3                         3/60
5                             2                         2/60
6                             1                         1/60
7                             0                         0
8                             0                         0
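The probabilities in Example 2.14 are sums of relative frequencies of mutually exclusive outcomes. This sketch encodes Table E2.14 as a dictionary and adds the relative frequencies exactly with `Fraction`; the helper name `prob` is illustrative.

```python
from fractions import Fraction

# Table E2.14: number of vehicles waiting -> number of observations (60 in total).
counts = {0: 4, 1: 16, 2: 20, 3: 14, 4: 3, 5: 2, 6: 1, 7: 0, 8: 0}
n_obs = sum(counts.values())

rel_freq = {k: Fraction(v, n_obs) for k, v in counts.items()}

def prob(values):
    """Probability of a union of mutually exclusive outcomes (Eq. 2.6 repeatedly)."""
    return sum((rel_freq[k] for k in values), Fraction(0))

p_3_to_6 = prob(range(3, 7))   # 20/60
p_0_to_4 = prob(range(0, 5))   # 57/60
print(n_obs, p_3_to_6, p_0_to_4)   # 60 1/3 19/20
```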
► EXAMPLE 2.15 In Example 2.8, two events associated with the reactions at A and B are defined as
These are represented by the respective subsets in the sample space of Fig. E2.15. For this illustration,
we may assume that the sample points are all equally likely to occur. This implies that the probability
of an event within this sample space is proportional to its “area” relative to the sample space.
Similarly, $P(B) = \ldots$
whereas
$P(AB) = \ldots$
Therefore, according to Eq. 2.8, we obtain
$$P(A \cup B) = P(A) + P(B) - P(AB)$$
from which we also obtain $P(A \cup B) = \dfrac{35{,}000}{40{,}000} = \dfrac{7}{8}$.
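Extending the addition rule, Eq. 2.8, to three events $E_1$, $E_2$, $E_3$, we would have
$$P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3) - P(E_1E_2) - P(E_1E_3) - P(E_2E_3) + P(E_1E_2E_3) \qquad (2.9)$$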
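The inclusion-exclusion pattern of Eq. 2.9 extends to any number of events, with alternating signs on the intersections of 1, 2, 3, … events. The following sketch, assuming equally likely points in a 12-point sample space and randomly drawn events, checks that general sum against the probability of the union computed directly.

```python
import random
from fractions import Fraction
from itertools import combinations

def union_prob_inclusion_exclusion(events, prob):
    """P(E1 ∪ ... ∪ En) by inclusion-exclusion; Eq. 2.9 is the case n = 3.
    events: list of sets of sample points; prob: function returning P(event)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for combo in combinations(events, k):
            total += sign * prob(set.intersection(*combo))
    return total

S = set(range(12))

def prob(event):
    """Equally likely sample points (an assumption of this check)."""
    return Fraction(len(event), len(S))

rng = random.Random(7)
E1, E2, E3 = ({x for x in S if rng.random() < 0.4} for _ in range(3))
lhs = prob(E1 | E2 | E3)
rhs = union_prob_inclusion_exclusion([E1, E2, E3], prob)
print(lhs == rhs)  # True
```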
► EXAMPLE 2.16 The major airline industry is subject to labor strikes by the pilots, mechanics, and the flight attendants
or by two or more of these labor groups. Using the following notations,
determine the probability of a labor strike in the major airline industry in the next 3 years. Assume the
following respective probabilities of strikes by the three individual groups: P(A) = 0.03; P(B) = 0.05;
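P(C) = 0.05, and that strikes by the different labor groups are statistically independent (see Sects.
2.3.2 and 2.3.3), which means, according to Eq. 2.15, that
$$P(AB) = P(A)P(B); \quad P(AC) = P(A)P(C); \quad P(BC) = P(B)P(C); \quad P(ABC) = P(A)P(B)P(C)$$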
SOLUTION A strike in the industry will occur in the next 3 years if any one or more of the three
labor groups go on strike during this period; therefore, we are interested in the union of A, B, and C,
whose probability according to Eq. 2.9 is
$$P(A \cup B \cup C) = 0.03 + 0.05 + 0.05 - (0.03)(0.05) - (0.03)(0.05) - (0.05)(0.05) + (0.03)(0.05)(0.05) = 0.1246$$
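Note that $\overline{A}$, $\overline{B}$, and $\overline{C}$ are also statistically independent; i.e., $P(\overline{A}\,\overline{B}\,\overline{C}) = P(\overline{A})P(\overline{B})P(\overline{C})$. Thus, with the de Morgan's rule,
we also obtain
$$P(A \cup B \cup C) = 1 - P(\overline{A}\,\overline{B}\,\overline{C}) = 1 - (0.97 \times 0.95 \times 0.95) = 0.1246$$ ◄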
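The two solutions of Example 2.16 can be compared numerically. This sketch uses the given probabilities and the stated independence assumption; `math.prod` requires Python 3.8 or later.

```python
from math import prod

p = {"A": 0.03, "B": 0.05, "C": 0.05}   # given strike probabilities
pa, pb, pc = p["A"], p["B"], p["C"]

# Eq. 2.9 with independence: P(XY) = P(X)P(Y) and P(ABC) = P(A)P(B)P(C).
p_union = pa + pb + pc - pa * pb - pa * pc - pb * pc + pa * pb * pc

# Complement route (de Morgan): 1 - P(not A)P(not B)P(not C).
p_union_complement = 1 - prod(1 - x for x in p.values())

print(round(p_union, 4), round(p_union_complement, 4))  # 0.1246 0.1246
```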
It may be well to emphasize that the conditional probability, as defined in Eq. 2.11, is
merely a generalization of the (unconditional) probability of an event. When we speak of
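$$P(E_1|E_2) + P(\overline{E}_1|E_2) = \frac{P(E_1E_2) + P(\overline{E}_1E_2)}{P(E_2)} = \frac{P[(E_1 \cup \overline{E}_1)E_2]}{P(E_2)} = \frac{P(E_2)}{P(E_2)} = 1.0$$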
Therefore,
$$P(\overline{E}_1|E_2) = 1 - P(E_1|E_2) \qquad (2.12)$$
which is a generalization of Eq. 2.7. It is important to recognize that in Eq. 2.12 the
conditioning event $E_2$ is the same reconstituted sample space on both sides of the equation.
For this reason, one must make sure, when applying Eq. 2.12, that the event (e.g., $E_1$) and
its complement refer to the same conditioning event $E_2$.
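Observe, for example, that in general
$$P(E_1|\overline{E}_2) \neq 1 - P(\overline{E}_1|E_2)$$
$$P(\overline{E}_1|E_2) \neq 1 - P(E_1|\overline{E}_2)$$
$$P(\overline{E}_1|\overline{E}_2) \neq 1 - P(E_1|E_2)$$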
► EXAMPLE 2.17 There are two highways from City A to City B as shown in Fig. E2.17.
Route 1 is on flat terrain, whereas Route 2 is a scenic route that goes through mountainous terrain.
During severe winter seasons, one or both routes may be closed to traffic because of heavy snowfalls.
Between the two routes, Route 2 is obviously more likely to be closed than Route 1 during the winter
months. Moreover, the condition of Route 1 during a severe snow storm may depend on whether or
not Route 2 is open to traffic. Denote by $E_1$ and $E_2$ the events that Routes 1 and 2, respectively, are open. Suppose, during a severe snow storm, the probabilities that the routes
will be open are, respectively,
$$P(E_1) = 0.75$$
whereas
$$P(E_2) = 0.50$$
and the probability that both routes will be open is
$$P(E_1E_2) = 0.40$$
Then, if Route 2 is open during a snow storm, the probability that Route 1 is also open is, according
to Eq. 2.11,
$$P(E_1|E_2) = \frac{P(E_1E_2)}{P(E_2)} = \frac{0.40}{0.50} = 0.80$$
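On the other hand, if Route 2 is closed during a severe snow storm, the probability that Route 1 is also
closed may be determined as follows:
$$P(\overline{E}_1|\overline{E}_2) = \frac{P(\overline{E}_1\overline{E}_2)}{P(\overline{E}_2)}$$
in which, by the de Morgan's rule and Eq. 2.8,
$$P(\overline{E}_1\overline{E}_2) = 1 - P(E_1 \cup E_2) = 1 - (0.75 + 0.50 - 0.40) = 0.15$$
and $P(\overline{E}_2) = 1 - 0.50 = 0.50$; hence, $P(\overline{E}_1|\overline{E}_2) = 0.15/0.50 = 0.30$. ◄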
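The calculations of Example 2.17 are reproduced below. The sketch also illustrates the caution attached to Eq. 2.12: with these data, the complement rule holds when both sides share the same conditioning event, and fails when they do not.

```python
p_e1, p_e2, p_e1e2 = 0.75, 0.50, 0.40           # P(E1), P(E2), P(E1E2)

p_e1_given_e2 = p_e1e2 / p_e2                     # Eq. 2.11: 0.80
p_union = p_e1 + p_e2 - p_e1e2                    # Eq. 2.8: 0.85
p_both_closed = 1 - p_union                       # de Morgan: P(not E1, not E2) = 0.15
p_e1bar_given_e2bar = p_both_closed / (1 - p_e2)  # 0.30

# Eq. 2.12 holds when both sides share the conditioning event (not E2 here) ...
p_e1_given_e2bar = (p_e1 - p_e1e2) / (1 - p_e2)   # P(E1 | not E2) = 0.70
# ... but not when the conditioning events differ:
p_e1bar_given_e2 = 1 - p_e1_given_e2              # P(not E1 | E2) = 0.20

print(round(p_e1_given_e2, 2), round(p_e1bar_given_e2bar, 2))        # 0.8 0.3
print(round(p_e1bar_given_e2, 2), round(1 - p_e1_given_e2bar, 2))    # 0.2 0.3
```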
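► EXAMPLE 2.18 Suppose that motor vehicles approaching a certain intersection are twice as likely to go straight ahead
as to make a right turn, and left turns are only half as likely as right turns.
As a vehicle approaches the intersection, the possible directions may be defined as follows:
$E_1$ = straight ahead
$E_2$ = turning right
$E_3$ = turning left
Since these three directions are mutually exclusive and collectively exhaustive, $P(E_1) + P(E_2) + P(E_3) = 1.0$; with $P(E_1) = 2P(E_2)$ and $P(E_3) = \frac{1}{2}P(E_2)$, this gives $P(E_1) = 4/7$, $P(E_2) = 2/7$, and $P(E_3) = 1/7$.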
At the intersection, if a vehicle is definitely making a turn, the probability that it will be a right turn
is (observe that the three alternative directions are mutually exclusive)
$$P(E_2|E_2 \cup E_3) = \frac{P[E_2(E_2 \cup E_3)]}{P(E_2 \cup E_3)} = \frac{P(E_2 \cup E_2E_3)}{P(E_2 \cup E_3)} = \frac{P(E_2)}{P(E_2) + P(E_3)} = \frac{2/7}{3/7} = \frac{2}{3}$$
On the other hand, if the vehicle is definitely making a turn at the intersection, the probability that it
will not turn right, according to Eq. 2.12, is
$$P(\overline{E}_2|E_2 \cup E_3) = 1 - P(E_2|E_2 \cup E_3) = 1 - \frac{2}{3} = \frac{1}{3}$$ ◄
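Example 2.18 can be checked with exact fractions. This sketch assumes, as the solution implies, that the three directions are mutually exclusive and collectively exhaustive, so the stated relative likelihoods can simply be normalized into probabilities.

```python
from fractions import Fraction

# Relative likelihoods: straight = 2 x right, left = 1/2 x right.
weights = {"straight": Fraction(2), "right": Fraction(1), "left": Fraction(1, 2)}
total = sum(weights.values())
p = {k: w / total for k, w in weights.items()}   # 4/7, 2/7, 1/7

# Mutually exclusive directions: P(right | turn) = P(right) / (P(right) + P(left)).
p_right_given_turn = p["right"] / (p["right"] + p["left"])
p_not_right_given_turn = 1 - p_right_given_turn  # Eq. 2.12

print(p["straight"], p["right"], p["left"], p_right_given_turn, p_not_right_given_turn)
# 4/7 2/7 1/7 2/3 1/3
```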
Statistical Independence
If the occurrence, or nonoccurrence, of an event does not affect the probability of occurrence
of another event, the two events are statistically independent. In other words, the probability
of occurrence of one event does not depend on the occurrence or nonoccurrence of another
event. Therefore, if two events $E_1$ and $E_2$ are statistically independent,
$$P(E_1|E_2) = P(E_1)$$
and
$$P(E_2|E_1) = P(E_2) \qquad (2.13)$$
It might be prudent to point out here the difference between events that are statistically
independent versus those that are mutually exclusive. The difference is profound, and there
should be no confusion: statistical independence between two events refers to the probability
of their joint occurrence, whereas two events are mutually exclusive when their joint
occurrence is impossible, as the occurrence of one event precludes the occurrence of the
other. In other words, $P(E_2|E_1) = 0$ if $E_1$ and $E_2$ are mutually exclusive. Finally, statistical
independence of two or more events pertains to the probability of the joint event, whereas
mutual exclusiveness refers to the definition of the events.
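From the definition of conditional probability, Eq. 2.11, we obtain the multiplication rule
$$P(E_1E_2) = P(E_1|E_2)P(E_2)$$
or
$$P(E_1E_2) = P(E_2|E_1)P(E_1) \qquad (2.14)$$
Therefore, with Eq. 2.13, if $E_1$ and $E_2$ are statistically independent events, the above multiplication rule becomes
$$P(E_1E_2) = P(E_1)P(E_2) \qquad (2.15)$$
Mathematically, statistical independence is generally defined in the form of Eq. 2.15 or 2.15a.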
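The distinction between statistical independence and mutual exclusiveness can be seen on a small sample space. This sketch uses two fair dice (an illustration, not an example from the text): "six on the first die" and "six on the second die" are independent but not mutually exclusive, whereas "six on the first die" and "one on the first die" are mutually exclusive and, having nonzero probabilities, therefore not independent.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two fair dice: 36 equally likely points

def P(event):
    return Fraction(len(event), len(S))

E1 = {s for s in S if s[0] == 6}   # first die shows 6
E2 = {s for s in S if s[1] == 6}   # second die shows 6
E3 = {s for s in S if s[0] == 1}   # first die shows 1

def independent(a, b):
    """Statistical independence in the form of Eq. 2.15."""
    return P(a & b) == P(a) * P(b)

def mutually_exclusive(a, b):
    """No sample point in common, so the joint occurrence is impossible."""
    return not (a & b)

print(independent(E1, E2), mutually_exclusive(E1, E2))  # True False
print(independent(E1, E3), mutually_exclusive(E1, E3))  # False True
```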
► EXAMPLE 2.19 Consider again the chain system of Example 2.11 consisting of two links as shown in Fig. E2.19
subjected to a force F = 300 kg.
Figure E2.19 A two-link chain (Link 1 and Link 2, loaded by F = 300 kg).
If the fracture strength of a link is less than 300 kg, it will fail by fracture. Suppose that the probability
of this happening to either of the two links is 0.05. Clearly, the chain will fail if one or both of the
two links should fail by fracture. To determine the probability of failure of the chain, define
$E_1$ = fracture of link 1
$E_2$ = fracture of link 2
Then, $P(E_1) = P(E_2) = 0.05$ and the probability of failure of the chain is
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_2|E_1)P(E_1) = 0.10 - 0.05\,P(E_2|E_1)$$
We observe that the solution requires the conditional probability $P(E_2|E_1)$, which is a function of the
mutual dependence between $E_1$ and $E_2$. If there is no dependence, i.e., they are statistically independent,
$P(E_2|E_1) = P(E_2) = 0.05$. In this case, the probability of failure of the chain is
$$P(E_1 \cup E_2) = 0.10 - 0.05 \times 0.05 = 0.0975$$
On the other hand, if there is complete or total dependence between $E_1$ and $E_2$, which means that if
one link fractures the other will also fracture, then $P(E_2|E_1) = 1.0$. In such a case, the probability of
failure of the chain becomes
$$P(E_1 \cup E_2) = 0.10 - 0.05 \times 1.0 = 0.05$$
In this latter case, we see that the failure probability of the chain system is the same as the failure
probability of a single link.
Therefore, we can state that the probability of failure of the chain system ranges between 0.05
and 0.0975. ◄
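The range of chain failure probabilities in Example 2.19 follows from a single expression in the conditional probability $P(E_2|E_1)$. The function below is a hypothetical helper that evaluates it for the two limiting cases discussed above and checks that intermediate degrees of dependence stay within that range.

```python
def chain_failure_prob(p_link, p_e2_given_e1):
    """P(E1 ∪ E2) = P(E1) + P(E2) - P(E2|E1)P(E1) for two links that each
    fracture with probability p_link (hypothetical helper)."""
    return 2 * p_link - p_e2_given_e1 * p_link

p = 0.05
independent_case = chain_failure_prob(p, p)     # P(E2|E1) = P(E2)
dependent_case = chain_failure_prob(p, 1.0)     # total dependence
print(round(independent_case, 4), round(dependent_case, 4))  # 0.0975 0.05

# Any positive dependence, P(E2) <= P(E2|E1) <= 1, gives a value in between.
in_range = all(0.05 - 1e-12 <= chain_failure_prob(p, q / 100) <= 0.0975 + 1e-12
               for q in range(5, 101))
```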
► EXAMPLE 2.20 The foundation of a tall building may fail because of inadequate bearing capacity or by excessive
settlement. Let B and S represent the respective modes of foundation failure, and assume P(B) = 0.001,
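P(S) = 0.008, and P(B|S) = 0.10, which is the conditional probability of bearing capacity failure given
that there is excessive settlement. Then, the probability of failure of the foundation is
$$P(B \cup S) = P(B) + P(S) - P(B|S)P(S) = 0.001 + 0.008 - 0.10 \times 0.008 = 0.0082$$
whereas the probability of excessive settlement without bearing capacity failure is
$$P(S\overline{B}) = P(\overline{B}|S)P(S) = [1 - P(B|S)]P(S) = (1.0 - 0.1)(0.008) = 0.0072$$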
In this problem, we might observe that the conditional probability $P(B|S)$ cannot be larger than 1/8.
The reason for this assertion is as follows:
$$P(B|S)P(S) = P(S|B)P(B)$$
so that
$$P(B|S) = P(S|B)\,\frac{P(B)}{P(S)} = P(S|B)\,\frac{0.001}{0.008} = \frac{1}{8}\,P(S|B)$$
Because $P(S|B) \le 1.0$, the value of $P(B|S)$ is limited to a maximum of 1/8. ◄
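The numbers in Example 2.20 can be reproduced as follows. This sketch takes foundation failure to be the union $B \cup S$, as in the problem statement, and also evaluates the upper bound on $P(B|S)$ implied by $P(S|B) \le 1$.

```python
p_b, p_s, p_b_given_s = 0.001, 0.008, 0.10    # P(B), P(S), P(B|S)

p_failure = p_b + p_s - p_b_given_s * p_s     # P(B ∪ S) = 0.0082
p_s_not_b = (1 - p_b_given_s) * p_s           # P(S and not B) = 0.0072

# From P(B|S)P(S) = P(S|B)P(B) with P(S|B) <= 1: P(B|S) <= P(B)/P(S).
max_p_b_given_s = p_b / p_s                   # 1/8

print(round(p_failure, 4), round(p_s_not_b, 4), round(max_p_b_given_s, 3))
# 0.0082 0.0072 0.125
```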
► EXAMPLE 2.21 Two rivers, a and b, flow through the neighborhood of a paper mill that is allowed to dispose of its waste
into both rivers. The dissolved oxygen (DO) level in the water downstream is an indication of the
degree of pollution of the rivers caused by the disposed waste from the mill. Let
A = unacceptable pollution level in River a
B = unacceptable pollution level in River b
Based on the records of the respective DO levels in the two rivers tested over the past year, it was
determined that on a given day, the likelihood of unacceptable water pollution in each of the rivers
is as follows:
P(A) = 20% and P(B) = 33%
whereas the probability that both rivers will have unacceptable levels of pollution on the same day is
0.10; i.e.,
P(AB) = 10%
On a given day, the probability that at least one of the rivers will have an unacceptable level of
pollution is
P(A U B) = 0.20 + 0.33 - 0.10 = 0.43
If River a is tested to have an unacceptable pollution level, the probability that River b will also have an
unacceptable pollution level will be
$$P(B|A) = \frac{P(AB)}{P(A)} = \frac{0.10}{0.20} = 0.50$$
whereas the probability that River a will have an unacceptable pollution level, given that River b
has been tested to have an unacceptable level, is
$$P(A|B) = \frac{P(AB)}{P(B)} = \frac{0.10}{0.33} = 0.30$$
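A related question: On any given day, what is the probability that only one of the two rivers will
have an unacceptable pollution level?
SOLUTION This means that one of the two rivers is polluted and the other is not; thus, the
probability is
$$P(A\overline{B} \cup \overline{A}B) = P(A\overline{B}) + P(\overline{A}B) = [P(A) - P(AB)] + [P(B) - P(AB)] = 0.10 + 0.23 = 0.33$$ ◄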
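The results of Example 2.21 follow directly from the three given probabilities, as this short sketch shows.

```python
p_a, p_b, p_ab = 0.20, 0.33, 0.10          # P(A), P(B), P(AB)

p_union = p_a + p_b - p_ab                  # at least one river polluted: 0.43
p_b_given_a = p_ab / p_a                    # 0.50
p_a_given_b = p_ab / p_b                    # about 0.30
p_only_one = (p_a - p_ab) + (p_b - p_ab)    # P(A, not B) + P(not A, B): 0.33

print(round(p_union, 2), round(p_b_given_a, 2), round(p_a_given_b, 2), round(p_only_one, 2))
# 0.43 0.5 0.3 0.33
```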
► EXAMPLE 2.22 The electrical power for a city is supplied by two generating plants—Plant a and Plant b. Each of
the plants has sufficient capacity to supply the average daily power requirement of the entire city.
However, during peak hours of a day, the capacities of both plants are needed; otherwise, there will
be brownouts in parts of the city. Denote the following events:
A = failure of Plant a
B = failure of Plant b
and assume
P(A) = 0.05
P(B) = 0.07
P(AB) = 0.01
If one of the two units should fail on a given day, what is the probability of failure of the other unit
on the same day?
SOLUTION The probability of failure of the second unit will depend on which unit fails first. For
example, if Plant a fails first, we have
$$P(B|A) = \frac{P(AB)}{P(A)} = \frac{0.01}{0.05} = 0.20$$
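whereas if Plant b fails first, $P(A|B) = P(AB)/P(B) = 0.01/0.07 \approx 0.14$. A brownout occurs if either plant fails, i.e., with probability $P(A \cup B) = 0.05 + 0.07 - 0.01 = 0.11$. The probability that a brownout is caused by the failure of Plant b alone is then
$$P(\overline{A}B|A \cup B) = \frac{P(\overline{A}B)}{P(A \cup B)} = \frac{P(\overline{A}|B)P(B)}{P(A \cup B)} = \frac{(1 - 0.14)(0.07)}{0.11} \approx 0.55$$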
Finally, the probability that a brownout will be caused by the failures of both plants would be
$$P(AB|A \cup B) = \frac{P[AB(A \cup B)]}{P(A \cup B)} = \frac{P(ABA \cup ABB)}{P(A \cup B)} = \frac{P(AB)}{P(A \cup B)} = \frac{0.01}{0.11} = 0.09$$ ◄
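The conditional probabilities of Example 2.22 can be computed together. This sketch treats a brownout as the event $A \cup B$ (at least one plant fails, so peak demand cannot be met). It uses the exact value of $P(A|B)$ rather than the rounded 0.14, which changes the third decimal of the "Plant b alone" result but not its rounded value.

```python
p_a, p_b, p_ab = 0.05, 0.07, 0.01           # P(A), P(B), P(AB)

p_b_given_a = p_ab / p_a                     # Plant a fails first: 0.20
p_a_given_b = p_ab / p_b                     # Plant b fails first: about 0.14
p_brownout = p_a + p_b - p_ab                # P(A ∪ B) = 0.11
p_b_only = (1 - p_a_given_b) * p_b / p_brownout   # P(not-A B | A ∪ B)
p_both = p_ab / p_brownout                   # P(AB | A ∪ B)

print(round(p_b_given_a, 2), round(p_a_given_b, 2), round(p_brownout, 2),
      round(p_b_only, 2), round(p_both, 2))
# 0.2 0.14 0.11 0.55 0.09
```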
► EXAMPLE 2.23 Suppose that before a section (say 1/10 km in length) of a newly constructed highway pavement
is accepted by the State Department of Highways, the thickness of a 25-cm pavement is inspected
for specification compliance by taking ultrasonic readings at every 1/10-km point of the constructed
pavement, as shown in Fig. E2.23. Each 1/10-km section will be accepted if the measured thickness
is at least 23 cm; otherwise, the entire section will be rejected or a penalty will be imposed.
[Figure E2.23: constructed pavement, with ultrasonic readings taken at the 1/10-km points.]
From past experience, 90% of constructed highway works by the same contractor were found
to be in compliance with specifications. The ultrasonic thickness determination is only 80% reliable;
i.e., there is a 20% chance that a conclusion based on the ultrasonics test may be erroneous.
Based on the contractor’s past construction record, we may assume that 90% of his constructed
pavement will have acceptable ultrasonics readings; hence, P(A) = 0.90.
The probability that a particular 1/10-km section of the pavement is well constructed and will
be accepted by the Highway Department is, therefore,
$$P(GA) = P(G|A)P(A) = (0.80)(0.90) = 0.72$$
The more pertinent and practical questions are perhaps the following:
1. What is the probability that a well constructed section of the pavement will be accepted by the
Highway Department on the basis of the ultrasonics test?
2. Conversely, what is the probability that a poorly constructed section will be rejected?
SOLUTIONS The probability that a well-constructed section of the pavement will be accepted is
$$P(A|G) = \frac{P(G|A)P(A)}{P(G)}$$
$E_i$, $i = 1, 2, \ldots, n$, and the probability of A will depend on which of the $E_i$'s has occurred. On
such an occasion, the probability of A would be composed of the conditional probabilities
(conditioned on each of the $E_i$'s) and weighted by the respective probabilities of the $E_i$'s.
Such problems require the theorem of total probability.
Before formally presenting the mathematical theorem, let us examine the following
example to illustrate the essential elements of the theorem.
► EXAMPLE 2.24 The flooding of a river in the spring season will depend on the accumulation of snow in the mountains
during the past winter season. The accumulation of snow may be described as heavy, normal, and
light. Clearly, if the snow accumulation in the mountains is heavy, the probability of flooding in the
following spring will be high, whereas, if the snow accumulation is light, this probability will be low.
Flooding, of course, may also be caused by rainfalls in the spring. With the following notations,
and by virtue of the multiplication rule, Eq. 2.14, we obtain the theorem of total probability
as
$$P(A) = P(A|E_1)P(E_1) + P(A|E_2)P(E_2) + \cdots + P(A|E_n)P(E_n)$$
► EXAMPLE 2.25 Hurricanes along the Gulf of Mexico and the eastern seaboard of the United States occur every year,
mostly in the summer and fall. These hurricanes are classified into five categories from C1 through
C5; to be classified as a hurricane, the wind speed must be at least 75 miles per hour (120 km/hr). The
frequencies of hurricanes, of course, would decrease with increasing category; for example, a Category C5
hurricane, with sustained wind >150 mph (242 km/hr), would very seldom occur.
We might observe first that the five categories of hurricanes, C1, C2, C3, C4, C5, are mutually
exclusive, as it is reasonable to assume that no two categories can occur at the same time; moreover, they cover
all possible hurricanes, so these five categories plus nonhurricane winds (denoted C0) are also
collectively exhaustive.
Assume that annually there can be at most one hurricane striking a particular area in the southern
coast of Louisiana along the Gulf of Mexico, and the annual occurrence probabilities of the different
hurricane categories are as follows:
P(C1) = 0.35; P(C2) = 0.25; P(C3) = 0.14; P(C4) = 0.05; P(C5) = 0.01
Structural damage to an engineered building in the reference area can be expected to occur depending
on the category of hurricane that the building will be subjected to. Suppose the conditional probabilities
of damage D to the building are as follows:
P(D|C1) = 0.05; P(D|C2) = 0.10; P(D|C3) = 0.25; P(D|C4) = 0.60; P(D|C5) = 1.00
Then, by the theorem of total probability,
$$P(D) = P(D|C1)P(C1) + P(D|C2)P(C2) + P(D|C3)P(C3) + P(D|C4)P(C4) + P(D|C5)P(C5)$$
$$= 0.05 \times 0.35 + 0.10 \times 0.25 + 0.25 \times 0.14 + 0.60 \times 0.05 + 1.00 \times 0.01 = 0.1175$$
Therefore, annually the probability of hurricane wind damage to the building is about 12%.
We might observe from the above calculations that the greatest contributions to the annual
damage probability are from hurricanes of Categories 3 and 4, $P(D|C3)P(C3) = 0.035$ and
$P(D|C4)P(C4) = 0.030$. Observe also that even though damage to the building will certainly occur
under a Category 5 hurricane, $P(D|C5) = 1.00$, the occurrence of such hurricanes is very rare,
annually $P(C5) = 0.01$, which means that it might occur (on the average) only about once in every
100 years. ◄
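The total-probability sum of Example 2.25 can be written as a small function. This sketch, like the calculation above, includes only the five hurricane categories, i.e., it takes the damage probability under nonhurricane winds (C0) to be zero; it also identifies the largest single contribution.

```python
def total_probability(conditional, prior):
    """Theorem of total probability: sum over categories of P(D|Ci) P(Ci)."""
    return sum(conditional[c] * prior[c] for c in prior)

prior = {"C1": 0.35, "C2": 0.25, "C3": 0.14, "C4": 0.05, "C5": 0.01}      # P(Ci)
p_damage = {"C1": 0.05, "C2": 0.10, "C3": 0.25, "C4": 0.60, "C5": 1.00}   # P(D|Ci)

p_d = total_probability(p_damage, prior)
contributions = {c: p_damage[c] * prior[c] for c in prior}
print(round(p_d, 4))                                 # 0.1175
print(max(contributions, key=contributions.get))     # C3
```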
► EXAMPLE 2.26 Figure E2.26 shows the eastbound directions of two interstate highways $I_1$ and $I_2$ merging into another
highway $I_3$. Interstates $I_1$ and $I_2$ have the same traffic capacities; however, the rush-hour traffic volume
on $I_2$ is about twice that on $I_1$, so that during rush hours, the probabilities of traffic congestion, denoted,
respectively, as $E_1$ and $E_2$, are as follows:
Also, when one route has excessive traffic, the chance of excessive traffic on the other route can
be expected to increase; assume that these conditional probabilities are
$P(E_1|E_2) = 0.40$, whereas, from Bayes' theorem (see Eq. 2.20), we must have
$P(E_2|E_1) = 0.80$
We might wish also to determine the probability of traffic congestion on the third route $I_3$, $P(E_3)$.
1. First, let us assume that the capacity of $I_3$ is the same as that of $I_1$ or $I_2$, and that when $I_1$ and $I_2$
are both carrying less than their respective traffic capacities, there is a 20% probability that $I_3$ will
experience excessive traffic; i.e., $P(E_3|\overline{E}_1\overline{E}_2) = 0.20$.
We would expect that the relevant probability will depend on the traffic conditions on $I_1$ and
$I_2$, which may be $E_1E_2$, $\overline{E}_1E_2$, $E_1\overline{E}_2$, or $\overline{E}_1\overline{E}_2$; observe that these four joint events are mutually
exclusive and collectively exhaustive. Their respective probabilities are then
Clearly, the traffic on $I_3$ will be congested when the traffic on $I_1$ or $I_2$ or both is excessive. Then we
obtain the total probability of traffic congestion on $I_3$ to be
2. Next, assume that the traffic capacity of $I_3$ is twice that of $I_1$ or $I_2$. In this case, if only one of
the two routes $I_1$ or $I_2$ has excessive traffic, the probability of congestion on $I_3$ will be 25%, i.e.,
$P(E_3|E_1\overline{E}_2) = P(E_3|\overline{E}_1E_2) = 0.25$. If both routes have excessive traffic, the probability that the
capacity of $I_3$ will be exceeded is 95%, meaning $P(E_3|E_1E_2) = 0.95$.
In the next example, we will consider a problem involving double conditional events
in association with the application of the total probability theorem.
► EXAMPLE 2.27 The town of Urbana is in the county of Champaign, Illinois, which is located in the plains area of
the midwestern United States. High-wind storms can occur in this area with maximum wind speeds
exceeding 60 mph (96 km/hr), and occasionally such storms can spawn tornadoes creating even higher
wind speeds.
Suppose we wish to assess the annual probability of structural damage to residential houses in
Urbana, and based on historical data were able to determine the following information:
High-wind storms in the county of Champaign occur about once every other year, and during
such a storm the probability that it will be accompanied by tornadoes is 0.25. The probability
that tornadoes occurring in the county of Champaign will strike the town of Urbana is 0.15.
The probability of severe damage to houses in Urbana during a wind storm (in the absence of
tornadoes) is 0.05; however, when tornadoes are spawned in the county but do not strike the
town of Urbana, the probability of severe damage to houses is 0.10, whereas if the tornadoes
strike the town, the probability of severe damage to one or more houses in Urbana will be a
certainty.
and also,
$$P(S) = 0.50; \quad P(T|S) = 0.25; \quad P(H|ST) = 0.15$$
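The information in Example 2.27 forms an event tree. In the sketch below, S, T, and H are read as a high-wind storm, tornadoes accompanying the storm, and a tornado striking Urbana, consistent with the given probabilities. It also assumes (this is not stated above) that severe damage does not occur in a year without a high-wind storm. Under these assumptions, the theorem of total probability over the three storm branches gives the annual probability of severe damage.

```python
# Event tree for Example 2.27. The reading of S, T, H and the assumption
# P(damage | no storm) = 0 are assumptions of this sketch.
p_storm = 0.50      # P(S)
p_tornado = 0.25    # P(T | S)
p_hit = 0.15        # P(H | ST)

# Conditional probabilities of severe damage on each branch (from the text).
p_damage = {
    "storm, no tornado": 0.05,
    "tornado, misses Urbana": 0.10,
    "tornado strikes Urbana": 1.00,
}
# Probability of each mutually exclusive branch (multiplication rule).
branch_prob = {
    "storm, no tornado": p_storm * (1 - p_tornado),
    "tornado, misses Urbana": p_storm * p_tornado * (1 - p_hit),
    "tornado strikes Urbana": p_storm * p_tornado * p_hit,
}

p_d = sum(p_damage[k] * branch_prob[k] for k in branch_prob)
print(round(p_d, 4))  # 0.0481
```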
► EXAMPLE 2.28 A tower may be subjected to earthquake loads which could be of high intensity (event H) or of long
duration (event L). It is estimated that if the load has long duration, the probability that its intensity
is high is 0.7. Also, if the load has high intensity, there is 20% probability that it will be of short
duration. Finally, the probability of having a long duration earthquake load is 0.3.
The designer estimated that the probability of failure when the tower is subjected to a short
duration-high intensity earthquake is 0.05, whereas, this probability is doubled if the earthquake is of
long duration but low intensity. Also, he is certain that the tower will fail if subjected to an earthquake
with both high intensity and long duration, and that it will survive with certainty if subjected to an
earthquake of low intensity and short duration.
SOLUTIONS From the problem statements, we observe the following probabilities: $P(H|L) = 0.7 \Rightarrow P(\overline{H}|L) = 0.3$; also, $P(\overline{L}|H) = 0.2$ and $P(L|H) = 0.8$. Furthermore, $P(L) = 0.3$ and $P(\overline{L}) = 0.7$. Note that $P(H)$ is not given, but it can be found from
$$P(H) = \frac{P(LH)}{P(L|H)} = \frac{P(H|L)P(L)}{P(L|H)} = \frac{0.7 \times 0.3}{0.8} = 0.2625$$
(a) Since $P(H|L) = 0.7 \neq 0$, H can happen given that L occurs. So H and L are not mutually
exclusive.
(b) $P(L|H) = 1 - P(\overline{L}|H) = 1 - 0.20 = 0.80$, but the unconditional probability $P(L) = 0.3 \neq P(L|H)$; hence, L is more likely to occur given the occurrence of H, so H and L are not statistically independent.
(c)
$$P(H \cup L) = P(H) + P(L) - P(HL) = P(H) + P(L) - P(L|H)P(H) = 0.2625 + 0.3 - 0.8 \times 0.2625 = 0.353$$
which is less than 1.0. Hence, H and L are not collectively exhaustive.
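The quantities in parts (a) to (c), and the four joint probabilities needed for part (d), can be computed as follows. The failure probabilities used here are those stated in the problem: 1.0 for high intensity with long duration, 0.05 for high intensity with short duration, 0.10 for low intensity with long duration, and 0 for low intensity with short duration.

```python
p_h_given_l = 0.7      # P(H | L)
p_l_given_h = 0.8      # P(L | H) = 1 - P(not L | H)
p_l = 0.3              # P(L)

p_h = p_h_given_l * p_l / p_l_given_h        # 0.2625
p_hl = p_l_given_h * p_h                     # P(HL) = 0.21
p_h_union_l = p_h + p_l - p_hl               # part (c): 0.3525

# Joint probabilities of the four mutually exclusive combinations (part (d)).
joint = {
    "HL": p_hl,
    "H not-L": p_h - p_hl,
    "not-H L": p_l - p_hl,
    "not-H not-L": 1 - p_h_union_l,
}
# Conditional failure probabilities from the problem statement.
p_fail = {"HL": 1.0, "H not-L": 0.05, "not-H L": 0.10, "not-H not-L": 0.0}
p_f = sum(p_fail[k] * joint[k] for k in joint)   # theorem of total probability

print(round(p_h, 4), round(p_h_union_l, 4), round(p_f, 4))  # 0.2625 0.3525 0.2216
```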
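(d) Let F denote the failure of the tower. The total probability of F is obtained by summing the contributions from each of the four events, namely $HL$, $H\overline{L}$, $\overline{H}L$, and $\overline{H}\,\overline{L}$, which are mutually exclusive and collectively exhaustive.
Hence, by applying the theorem of total probability,
$$P(F) = P(F|HL)P(HL) + P(F|H\overline{L})P(H\overline{L}) + P(F|\overline{H}L)P(\overline{H}L) + P(F|\overline{H}\,\overline{L})P(\overline{H}\,\overline{L})$$
$$= 1.0 \times 0.21 + 0.05 \times 0.0525 + 0.10 \times 0.09 + 0 \times 0.6475 = 0.222$$
in which $P(HL) = P(L|H)P(H) = 0.21$, $P(H\overline{L}) = P(\overline{L}|H)P(H) = 0.0525$, and $P(\overline{H}L) = P(\overline{H}|L)P(L) = 0.09$.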
$$P(E_i|A) = \frac{P(A|E_i)P(E_i)}{P(A)} \qquad (2.20)$$
which is the Bayes' theorem. In Eq. 2.20, if P(A) is expanded using the total probability
theorem, Eq. 2.20 becomes
$$P(E_i|A) = \frac{P(A|E_i)P(E_i)}{\sum_{j=1}^{n} P(A|E_j)P(E_j)} \qquad (2.20a)$$
► EXAMPLE 2.29 Aggregates for the construction of a reinforced concrete building are supplied by two companies,
Company a and Company b. Orders are for Company a to deliver 600 truck loads a day and for Company b to deliver 400
truck loads a day. From prior experience, it is expected that 3% of the material from
Company a will be substandard, whereas 1% of the material from Company b is substandard.
Define the following events:
A = aggregates supplied by Company a
B = aggregates supplied by Company b
E = substandard aggregates
Then,
$$P(A) = \frac{600}{600 + 400} = 0.60; \quad \text{and} \quad P(B) = \frac{400}{600 + 400} = 0.40$$
whereas $P(E|A) = 0.03$; and $P(E|B) = 0.01$
and the probability of substandard aggregates is
$$P(E) = P(E|A)P(A) + P(E|B)P(B) = 0.03 \times 0.60 + 0.01 \times 0.40 = 0.022$$
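Eq. 2.20a can be applied to the data of Example 2.29. The sketch below, an extension not worked in the text, computes $P(E)$ by total probability and then the posterior probabilities that a substandard load came from Company a or from Company b. The function name `bayes` is illustrative.

```python
def bayes(prior, likelihood):
    """Return P(E) and the posteriors P(Ei|E) via Eq. 2.20a:
    P(E|Ei)P(Ei) / sum_j P(E|Ej)P(Ej)."""
    evidence = sum(likelihood[k] * prior[k] for k in prior)   # total probability
    return evidence, {k: likelihood[k] * prior[k] / evidence for k in prior}

prior = {"a": 600 / (600 + 400), "b": 400 / (600 + 400)}   # P(A), P(B)
likelihood = {"a": 0.03, "b": 0.01}                         # P(E|A), P(E|B)

p_e, posterior = bayes(prior, likelihood)
print(round(p_e, 3), round(posterior["a"], 3), round(posterior["b"], 3))  # 0.022 0.818 0.182
```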
The Bayes’ theorem provides a valuable and useful tool for revising or updating a
calculated probability as additional data or information becomes available. The following
examples will serve to illustrate this concept, including how prior information (which may
be based on subjective judgments) can be combined with test results to update a calculated
probability.
► EXAMPLE 2.30 In order to ensure the quality of concrete material used in a reinforced concrete construction, concrete
cylinders are collected at random from concrete mixes delivered to the construction site by a mixing
plant. Past records of concrete from the same plant show that 80% of concrete mixes are good or of
satisfactory quality.
To further ensure that the concrete delivered on site is of good quality, the engineer requires that
one cylinder among those collected each day be tested (after 7 days of curing) for minimum compressive
strength. The test method is not perfect; its reliability is only 90%, meaning the probability that
a good-quality concrete cylinder will pass the test is 0.90, or that a poor-quality cylinder can pass the
test is 0.10. Define the following events:
G = good-quality concrete
$T_i$ = the $i$th cylinder tested passes the test
Then, if a concrete cylinder passes the test, the probability of good-quality concrete delivered on site
is updated as follows:
$$P(G|T_1) = \frac{P(T_1|G)P(G)}{P(T_1|G)P(G) + P(T_1|\overline{G})P(\overline{G})} = \frac{0.90 \times 0.80}{0.90 \times 0.80 + 0.10 \times 0.20} = 0.973$$
Therefore, with a positive test result, the probability of good-quality concrete used in the construction
is increased from 80% to 97.3%.
2.4 Concluding Summary 65
Now, suppose the engineer is not satisfied with just testing one cylinder, and requires that a
second cylinder be tested. If the second cylinder tested also gave a positive result, the probability that
the concrete is of good quality becomes
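$$P(G|T_2) = \frac{P(T_2|G)P(G)}{P(T_2|G)P(G) + P(T_2|\overline{G})P(\overline{G})} = \frac{0.90 \times 0.973}{0.90 \times 0.973 + 0.10 \times 0.027} = 0.997$$
in which $P(G) = 0.973$ is now the probability already updated by the first test result.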
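The above probability is updated sequentially. The updating may also be performed in a single step
by using the two test results together; in this latter case, denoting by $T_1T_2$ the event that both cylinders pass the test, and assuming that the two test results are statistically independent for a given quality of concrete,
$$P(T_1T_2|G) = P(T_1|G)P(T_2|G)$$
so that
$$P(G|T_1T_2) = \frac{0.90 \times 0.90 \times 0.80}{0.90 \times 0.90 \times 0.80 + 0.10 \times 0.10 \times 0.20} = 0.997$$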
However, if the test result of either one of the two cylinders tested was negative, the probability of
good-quality concrete would be updated as follows:
$$P(G|T_1\overline{T}_2) = \frac{P(T_1\overline{T}_2|G)P(G)}{P(T_1\overline{T}_2|G)P(G) + P(T_1\overline{T}_2|\overline{G})P(\overline{G})} = \frac{0.90 \times 0.10 \times 0.80}{0.90 \times 0.10 \times 0.80 + 0.10 \times 0.90 \times 0.20} = 0.80$$ ◄
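The sequential updating of Example 2.30 amounts to applying Bayes' theorem repeatedly, with each posterior becoming the next prior. In this sketch the single parameter `reliability` stands for both P(pass | good) = 0.90 and P(fail | poor) = 0.90, as stated in the example; the one-step update with two conditionally independent positive results is included for comparison.

```python
def update(p_good, passed, reliability=0.90):
    """One application of Bayes' theorem to P(G) after a single cylinder test.
    reliability = P(pass | good) = P(fail | poor), as stated in the example."""
    like_good = reliability if passed else 1 - reliability
    like_poor = 1 - reliability if passed else reliability
    numerator = like_good * p_good
    return numerator / (numerator + like_poor * (1 - p_good))

p0 = 0.80                              # prior P(G) from plant records
after_one = update(p0, True)           # 0.973
after_two = update(after_one, True)    # 0.997
mixed = update(after_one, False)       # one positive, one negative: 0.80

# Single-step update with both positive results (conditionally independent tests).
batch = 0.90**2 * p0 / (0.90**2 * p0 + 0.10**2 * (1 - p0))

print(round(after_one, 3), round(after_two, 3), round(mixed, 3), round(batch, 3))
# 0.973 0.997 0.8 0.997
```

Because the update is multiplicative in the likelihoods, the order of the two test results does not matter, and the sequential and single-step results coincide.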
In essence, the concepts and tools developed in this chapter constitute the essential
fundamentals necessary for correctly applying probability in engineering. In the ensuing
chapters, particularly Chapters 3 and 4, additional analytical tools will be developed based
on the fundamental concepts expounded in this chapter.
► PROBLEMS
2.1 Suppose the travel time between two major cities A and B by air is 6 or 7 hr if the flight is nonstop; however, if there is one stop, the travel time would be 9, 10, or 11 hr. A nonstop flight between A and B would cost $1200, whereas with one stop the cost is only $550. Then, between cities B and C, all flights are nonstop, requiring 2 or 3 hours at a cost of $300. Consider a passenger wishing to travel from city A to city C.
(a) What is the possibility space or sample space of his travel times from A to B? From A to C?
(b) What is the sample space of his travel cost from A to C?
(c) If T = travel time from city A to city C, and S = cost of travel from A to C, what is the sample space of T and S?
2.2 The settlement of a bridge pier, say Pier 1, is estimated to be between 2 and 5 cm. Similarly, the settlement of an adjacent pier, Pier 2, is estimated to be between 4 and 10 cm. There will, therefore, be a possibility of differential settlements between these two adjacent piers.
(a) What would be the sample space of this differential settlement?
(b) If the differential settlements in the above sample space are equally likely, what would be the probability that the differential settlement will be between 3 and 5 cm?
2.3 The direction of the prevailing wind at a particular building site is between due East (θ = 0°) and due North (θ = 90°). The wind speed V can be any value between 0 and ∞.
(a) Sketch the sample space for wind speed and direction.
(b) Denote the events:
E1 = (V > 35 kph)
E2 = (15 kph < V < 45 kph)
E3 = (θ < 30°)
Identify the events E1, E2, E3, and Ē1 within the sample space of Part (a).
(c) Use new sketches to identify the following events:
… draw down the water level in the tank by an amount equivalent to 5, 6, or 7 ft of the water in the tank.
[Figure: Cylindrical water tank.]
(a) What are the possible combinations of inflow and outflow of water in the tank in a given day?
(b) If the water level in the tank is 7 ft from the bottom at the start of a day, what are the possible water levels in the tank at the end of the day?
(c) If the amounts of inflow and outflow of water for the tank are both equally likely, what would be the probability that there would be at least 9 ft of water remaining in the tank at the end of the day?
2.5 A 20-ft cantilever beam is shown in the figure below. Load W1 = 200 lb, or W2 = 500 lb, or both, may be applied at the midpoint B or at the end of the beam C. The bending moment induced at the fixed support A, MA, will depend on the magnitudes of the loads at B and C.
[Figure: 20-ft cantilever beam, fixed at A, with load points B (midpoint) and C (free end); AB = BC = 10 ft.]
(c) Assume the following respective probabilities for the positions of the two loads:
P(W1 at B) = 0.25
P(W1 at C) = 0.60
P(W2 at B) = 0.30
P(W2 at C) = 0.50
Assuming that the positions of W1 and W2 are statistically independent, what are the respective probabilities associated with each of the possible values of MA?
(d) Determine the probabilities of the following events:
E1, E2, E3, E1 ∩ E2, E1 ∪ E2, and Ē2
2.6 Two cities 1 and 2 are connected by route A, and route B connects cities 2 and 3, as shown in the figure below. Let us denote the eastbound lanes as A1 and B1 and the westbound lanes as A2 and B2.
[Figure: Routes connecting three cities.]
Suppose the probability is 95% that each of the two lanes in route A will not require major resurfacing of the pavement for at least 2 years; the corresponding probability for a lane in route B is only 85%.
(a) Determine the probability that route A will require major resurfacing in the next 2 years. Do the same for route B. Assume that if one lane of a route needs major resurfacing, the chance that the other lane of the same route will also need resurfacing is 3 times its original probability.
(b) Assuming that the needs for resurfacing in routes A and B are statistically independent, what is the probability that the road between cities 1 and 3 will require major resurfacing in 2 years?
2.7 From past experience, it is known that, on average, 10% of welds performed by a particular welder are defective. Suppose this welder is required to do three welds in a day.
(a) What is the probability that none of the welds will be defective?
(b) What is the probability that exactly two of the welds will be defective?
(c) What is the probability that all the welds for a day are defective?
It is assumed that the condition of each weld is independent of the conditions of the other welds.
2.8 On a given day, casting of concrete structural elements at a construction project depends on the availability of material. The required material may be produced at the job site or delivered from a premixed concrete supplier. However, it is not always certain that these sources of material will be available. Furthermore, whenever it rains at the site, casting cannot be performed. On a given day, define the following events:
E1 = there will be no rain
E2 = production of concrete material at the job site is feasible
E3 = supply of premixed concrete is available
with the following respective probabilities:
P(E1) = 0.8; P(E2) = 0.7; P(E3) = 0.95; and P(E3|Ē2) = 0.6
whereas E2 and E3 are statistically independent of E1.
(a) Identify the following events in terms of E1, E2, and E3:
(i) A = casting of concrete elements can be performed on a given day.
(ii) B = casting of concrete elements cannot be performed on a given day.
(b) Determine the probability of the event B.
(c) If production of concrete material at the job site is not feasible, what is the probability that casting of concrete elements can still be performed on a given day?
2.9 A construction firm purchased 3 tractors from a certain company. At the end of the fifth year, let E1, E2, E3 denote, respectively, the events that tractors no. 1, 2, and 3 are still in good operational condition.
(a) Define the following events at the end of the 5th year, in terms of E1, E2, and E3, and their respective complements:
A = only tractor no. 1 is in good condition.
B = exactly one tractor is in good condition.
C = at least one tractor is in good condition.
(b) Past experience indicates that the chance of a given tractor manufactured by this company having a useful life longer than 5 years (i.e., in good condition at the end of the 5th year) is 60%. If one tractor needs to be replaced (not in good operational condition) at the end of the 5th year, the probability of replacement for one of the other two tractors is 60%; if two tractors need to be replaced, the probability of replacement of the remaining one is 80%.
Evaluate the probabilities of the events A, B, and C.
2.10 A contractor has two subcontractors for his excavation work. Experience shows that 60% of the time, subcontractor A was available to do a job, whereas subcontractor B was available 80% of the time. Also, the contractor is able to get at least one of these two subcontractors 90% of the time.
(a) What is the probability that both subcontractors will be available to do the next job?
(b) If the contractor learned that subcontractor A is not available for the job, what is the probability that the other subcontractor will be available?
(c) Suppose EA denotes the event that subcontractor A is available, and EB denotes the event that subcontractor B is available.
(i) Are EA and EB statistically independent?
(ii) Are EA and EB mutually exclusive?
(iii) Are EA and EB collectively exhaustive?
2.11 An underground site is being considered for the storage of hazardous waste. Within the next 100 years, there is a 1% chance that the hazardous material could leak outside of the storage containment. Two adjacent towns, A and B, rely on ground water for their water supply. The water to each town will be contaminated if there is a leakage in the waste storage and if there exists a continuous seam of sand between the storage containment and the given town. Observe that the presence of a continuous seam of sand would allow the contaminant to move freely and quickly reach a region.
Suppose there is a 2% chance of a continuous seam of sand from the storage site to A, and the probability of a continuous seam of sand to B is slightly higher and equals 3%. However, if a continuous seam of sand exists between the storage site X and town A, the probability of a continuous seam of sand between X and town B is increased to 20%. Assume that the event of leakage from the storage is independent of the presence of seams of sand. Consider the period over the next 100 years.
(a) What is the probability that water in town A will be contaminated?
(b) What is the probability that at least one of the two towns' water will be contaminated?
2.12 Towns A, B, and C lie along a river that may be subject to overflow (flooding), as shown in the figure below. The annual probabilities of flooding are 0.2, 0.3, and 0.1 for towns A, B, and C, respectively. The events of flooding in each of the towns A, B, and C are not statistically independent. If town C is flooded in a given year, the probability that town B will also be flooded that same year is increased to 0.6; and if both towns B and C are flooded in a given year, the probability that town A will also be flooded that year is increased to 0.8. However, if town C does not experience flooding in a given year, the probability that both towns A and B will also not suffer any flooding in that year is 0.9. In a given year, if all three towns are flooded, that year is regarded as a disaster year for the region. Suppose the flooding events between any two years are statistically independent. Answer the following:
(a) What is the probability that a given year in the region is a disaster year?
(b) If town B is flooded in a given year, what is the probability that town C is also flooded?
(c) What is the probability that at least one town is flooded in a given year?
2.13 Successful completion of a construction project depends on the supply of materials and labor as well as the weather condition. Consider a given project that can be successfully completed if either one of the following conditions prevails:
(a) Good weather, and at least one of labor or materials is adequately available.
(b) Bad weather, but both labor and materials are adequately available.
Define:
G = Good weather
Ḡ = Bad weather
L = Adequate labor supply
M = Adequate materials supply
C = Successful completion
Suppose P(L) = 0.7; P(G) = 0.6; and L is independent of both M and G. If the weather is good, adequate supply of materials is guaranteed, whereas the probability of adequate supply of material is only 50% if bad weather prevails.
(a) Formulate the event of successful completion in terms of G, L, and M.
(b) Determine the probability of successful completion. (Ans. 0.74)
(c) If the project was successfully completed, what is the probability that labor supply had been inadequate? (Ans. 0.243)
2.14 The numbers of accidents at rail-highway grade-crossings reported for a province over the last 10 years are summarized and classified as follows:
                        Type of Accident
Time of Occurrence      (R) Run into Train    (S) Struck by Train
Day (D)                 30                    60
Night (N)               20                    20
Suppose there are 1000 rail-highway grade-crossings in province XY.
(a) What is the probability that an accident will occur at a given crossing next year? (Ans. 0.013)
(b) If an accident is reported to have occurred in the daytime, what is the probability that it is a “struck by train” accident? (Ans. 2/3)
(c) Suppose that 50% of the “run into train” accidents are fatal and 80% of the “struck by train” accidents are fatal; what is the probability that the next accident will be fatal? (Ans. 0.685)
(d) Suppose
D = event that the next accident occurs in the daytime
R = event that the next accident is a “run into train” accident
(i) Are D and R mutually exclusive? Justify.
(ii) Are D and R statistically independent? Justify. (Ans. no)
2.15 The promising alternative energy sources currently under development are fuel cell technology and large-scale solar energy power. The probabilities that these two sources will be successfully developed and commercially viable in the next 15 years are, respectively, 0.70 and 0.85. The successful developments of these two energy sources are statistically independent. Determine the following:
(a) The probability that there will be energy supplied by these alternative sources in the next 15 years.
(b) The probability that only one of the two alternative energy sources will be commercially viable in the next 15 years.
2.16 An examination of the 10-year record of rainy days for a town reveals the following:
1. 30% of the days are rainy days.
2. There is a 50% chance that a rainy day will be followed by another rainy day.
3. There is a 20% chance that two consecutive rainy days will be followed by a third rainy day.
A house is scheduled for painting starting next Monday for a period of 3 days.
(a) Let
E1 = Monday is a rainy day
E2 = Tuesday is a rainy day
E3 = Wednesday is a rainy day
Express the events corresponding to the three probabilities indicated above (i.e., 1, 2, and 3) in terms of E1, E2, and E3.
(b) What is the probability that it will rain on both Monday and Tuesday?
(c) What is the probability that Wednesday will be the only dry day during the painting period?
(d) What is the probability that there will be at least one rainy day during the 3-day painting period?
2.17 Mr. X, who works in office building D, is selected for observation in a study of the parking problem on a college campus. Each day, assume that Mr. X will check the parking lots A, B, and C in that sequence, as shown in the figure below, and will park his car as soon as he finds an empty space. Assume also that there are only these three parking lots available and that no street parking is allowed; lots A and B are free, whereas lot C is metered.
[Figure: A study of campus parking.]
Suppose that from prior statistical observations, the probabilities of getting a space on each weekday morning in lots A, B, and C are 0.20, 0.15, and 0.80, respectively. However, if lot A is full, the probability that Mr. X will find a space in lot B is only 0.05. Also, if both lots A and B are full, Mr. X will only have a probability of 40% of getting a parking space in lot C.
Determine the following:
(a) The probability that Mr. X will not find free parking on a weekday morning.
(b) The probability that Mr. X will be able to park his car on campus on a weekday morning.
(c) If Mr. X successfully parked his car on campus on a weekday morning, what is the probability that his parking is free?
2.18 A building may fail by excessive settlement of the foundation or by collapse of the superstructure. Over the life of the building, the probability of excessive settlement of the foundation is estimated to be 0.10, whereas the probability of collapse of the superstructure is 0.05. Also, if there is excessive settlement of the foundation, the probability of superstructure collapse will be increased to 0.20.
(a) What is the probability that building failure will occur over its life?
(b) If building failure should occur during its life, what is the probability that both failure modes will occur?
(c) What is the probability that only one of the two failure modes will occur over the life of the building?
2.19 The automobile brake system consists of the following components: the master cylinder, the wheel cylinders, and the brake pads. Failure of any one or more of these components will constitute failure of the brake system. Within a period of 4 years or 50,000 miles without maintenance, the failure probabilities of the master cylinder, wheel cylinders, and brake pads are 0.02, 0.05, and 0.50, respectively. The probability that both the master cylinder and the wheel cylinders will fail within the same period or mileage is 0.01. The failure of the brake pads is statistically independent of the failures in the master and wheel cylinders.
(a) What is the probability that only the wheel cylinders will fail within 4 years or 50,000 miles?
(b) What is the probability that the brake system will fail within 4 years or 50,000 miles?
(c) When there is failure in the brake system, what is the probability that only one of the three components failed?
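Several of the problems above quote answers, which makes them useful checks on the data as printed. As one example, the answers given for Problem 2.14 can be reproduced from its accident table with the short Python sketch below. It is a minimal sketch, assuming (as the quoted answer to part (a) implies) that the 10-year record is spread evenly over the 1000 crossings and over the years, so that relative frequencies serve as probabilities; the variable names are illustrative only.

```python
from fractions import Fraction

# Accident counts from the Problem 2.14 table (10-year record).
# Keys are (time of occurrence, type of accident):
#   "D" = day, "N" = night; "R" = run into train, "S" = struck by train.
counts = {
    ("D", "R"): 30, ("D", "S"): 60,
    ("N", "R"): 20, ("N", "S"): 20,
}
total = sum(counts.values())  # 130 accidents in all
n_crossings, n_years = 1000, 10

# (a) Relative frequency of an accident per crossing per year
#     (assumes the record is spread evenly over crossings and years).
p_accident = Fraction(total, n_crossings * n_years)

# (b) P(S | D): restrict attention to the daytime row of the table.
day_total = counts[("D", "R")] + counts[("D", "S")]
p_s_given_d = Fraction(counts[("D", "S")], day_total)

# (c) Total probability theorem with the fatality rates from part (c):
#     P(fatal) = P(fatal | R) P(R) + P(fatal | S) P(S).
p_r = Fraction(counts[("D", "R")] + counts[("N", "R")], total)
p_s = 1 - p_r
p_fatal = Fraction(1, 2) * p_r + Fraction(4, 5) * p_s

# (d)(ii) D and R are independent only if P(DR) = P(D) P(R).
p_d = Fraction(day_total, total)
p_dr = Fraction(counts[("D", "R")], total)
independent = p_dr == p_d * p_r

print(f"(a) P(accident at a crossing next year) = {float(p_accident):.3f}")
print(f"(b) P(S | D) = {p_s_given_d}")
print(f"(c) P(fatal) = {float(p_fatal):.3f}")
print(f"(d)(ii) D and R independent: {independent}")
```

Running the script prints 0.013, 2/3, 0.685, and False, in agreement with the quoted answers to parts (a) to (c) and the "no" for part (d)(ii). Exact fractions are used so that the independence test compares P(DR) = 3/13 with P(D)P(R) = 45/169 without any rounding error.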
2.20 Let E1, E2, and E3 denote the events of excessive snowfall in the first, second, and third winters, respectively, from this fall. Statistical records of snowfalls indicate that during any winter, the probability of excessive snow is 0.10. However, if excessive snowfall occurred in the previous winter, the probability of excessive snowfall in the following winter is increased to 0.40, whereas if the preceding two winters are both subjected to excessive snowfalls, the probability of excessive snow in the following winter will be 0.20.
(a) From the information given above, determine the following:
P(E1), P(E2), P(E2|E1), P(E3|E1E2), P(E3|E2)
(b) What is the probability that excessive snowfall will occur in at least one of the next two winters?
(c) What is the probability that excessive snowfall will occur in each of the next three winters?
(d) If the preceding winter did not experience excessive snowfall, what is the probability that the subsequent winter will not suffer excessive snowfall? In other words, determine P(Ē2|Ē1).
Hint: Start out with the following relationship:
$$P(\bar{E}_1 \cup \bar{E}_2) = 1 - P\left(\overline{\bar{E}_1 \cup \bar{E}_2}\right) = 1 - P(E_1 E_2)$$
2.21 Weather records for a certain region indicate that there is a tendency for two hot summers to occur in a row. From the statistics for the region, the following information has been established: (i) the chance of any given summer being hot is 0.20; (ii) if the weather in a given summer is hot, the following summer will also be hot with a probability of 0.40; and (iii) the weather in any given year depends only on the weather in the previous year.
(a) What is the probability that there will be three hot summers in a row in the given region?
(b) If the summer this year is not hot, what is the probability that the summer next year will not be hot?
(c) What is the probability that there will be at least one hot summer in the next 3 years?
2.22 A metropolitan city derives its electrical power from three generating plants A, B, and C. Each plant has a generating capacity of 50 MW (megawatts). The chances that the generating plants will be shut down (for periodic maintenance, due to overload, accidents, etc.) in any given week are, respectively, 5%, 5%, and 10%. The operations of plants A and B are related in such a way that in case of a shutdown at one plant, there is a probability of 50% that the other plant will also shut down due to overload, whereas the operation of plant C is independent of the other two plants.
(a) During a severe storm, lightning knocked out the power lines from plant A, and repair of the lines will take at least 1 week. What is the chance that the city will suffer a complete blackout (no power) in the week caused by the storm?
(b) What is the probability that the city will have no power in any given week?
(c) What is the probability that the available power supply to the city is less than or equal to 100 MW in a given week?
2.23 The probability that a strong earthquake will cause damage to a certain structure has been estimated to be 0.02.
(a) What is the probability that the structure can survive three such strong earthquakes?
(b) What is the probability that the structure will be damaged during the second of two such strong earthquakes?
It may be assumed that damages to the structure caused by successive earthquakes are statistically independent.
2.24 A team of two engineers, A and B, was assigned to check a set of computations. The two work simultaneously but separately and independently. The probability of engineer A spotting a given error is 0.8, whereas that for B is 0.9.
(a) Suppose there is only one error in the computation. What is the probability that this error will be spotted by this team? (Ans. 0.98)
(b) If the error in part (a) was identified, what is the probability that it was discovered by A alone? (Ans. 0.082)
(c) Suppose there is an alternative team consisting of three engineers C1, C2, and C3, each of whom works separately and independently and has a probability of 0.75 of spotting a given error. Would this team of three engineers be selected instead, if the objective is to maximize the chance of spotting the error? Please justify. (Ans. yes)
(d) In part (a), suppose there were two errors in the computations. What is the probability that both errors will be spotted by the team of two engineers? Assume that the events of spotting between any two errors are statistically independent. (Ans. 0.960)
2.25 A self-standing antenna disk is supported by a lattice structure that is anchored to the ground at the base. During a wind storm, the disk may be damaged as a result of anchorage failure and/or failure of the lattice structure. Suppose the following information is known:
1. The probability of anchorage failure during a wind storm is 0.006.
2. If the anchorage should fail, the probability of failure of the lattice structure will be 0.40, whereas the probability of failure of the anchorage given that failure has occurred in the lattice structure is 0.30.
Determine the following:
(a) The probability of damage to the antenna disk during a wind storm.
(b) The probability that only one of the two potential failure modes will occur during a wind storm.
(c) If the disk is damaged during a wind storm, what is the probability that it was caused only by anchorage failure?
2.26 The probability of a severe fire (denoted event F) occurring in a new hospital is assumed to be low. The insurance company estimates that the probability of a fire occurring in a year is P(F) = 0.01. However, for additional safety, a very sensitive
fire alarm system was installed. This system will always sound an alarm (A) whenever there is a fire; but because of its high sensitivity, it may also cause false alarms with a probability of P(A|F̄) = 0.1. Assume that there is no possibility for more than one fire in a year.
(a) List the set of mutually exclusive and collectively exhaustive events.
(b) Calculate the probabilities for each of the events listed in (a).
(c) What is the probability that the alarm system will be triggered in one year?
(d) What is the probability of a real fire given that the alarm sounded?
2.27 The probability of contracting a certain disease (event D) during one's lifetime is very small, with probability P(D) = 0.001. However, if left untreated, the disease is always fatal. Fortunately, modern medical science has provided a diagnostic test T to detect the presence of the disease; however, the test is not always correct. If a person has the disease, there is only an 85% probability that the test will be positive, i.e., P(T|D) = 0.85. Also, there is a small probability of 2% that the test will be positive even when a patient does not have the disease, i.e., P(T|D̄) = 0.02.
(a) Identify the set of mutually exclusive and collectively exhaustive events.
(b) Calculate the probabilities for each of the events identified in (a).
(c) What is the probability of a positive test result?
(d) If a person's test result is positive, what is the probability that he/she has the disease?
2.28 A car is traveling from point M to point Q according to the route indicated in the figure below. The driver has to pass through three intersections, namely at A, B, and C, where traffic lights are installed. Information on the traffic lights relative to the driver is as follows:
1. The light at A is equally likely to be either red or green.
2. The light at B is equally likely to be either red or green; however, if the driver encounters green at A, he will meet a green light at B with a probability of 0.80.
3. The driver will encounter a red, a green, or a left turn only signal at C. The probability of meeting left turn only at C is 0.20. The signal lights at C are statistically independent of those at A and B, and left turns at C are permitted only when the left turn only signal is on.
(a) In terms of the following: GA = light signal at A is green; GB = light signal at B is green; and GLT = left turn only signal is on at C, define the following events:
E1 = stopped by red lights between M and N.
E2 = stopped by signal at C from M to Q.
E3 = stopped by red signal once between M and N.
(b) Compute the probability that the driver will be stopped by traffic signals at least once traveling from M to Q.
(c) Compute the probability that the driver will be stopped by traffic signals at most once going from M to N.
2.29 From previous records on winter weather for a town, the probability that it snows on a given day is 0.2. Records also reveal that low temperature (say, less than 0°F) occurs on 5% of the days, whereas windy days occur 10% of the time. If temperature is low, the probability of snow on that day is increased to 30%. Let S, C, and W denote the events of snow, low temperature, and wind, respectively, on a given day. The event W may be assumed to be statistically independent of S and C.
(a) What is the probability of having a doomsday (i.e., snow, low temperature, and wind all occurring on the same day)? (Ans. 0.0015)
(b) Suppose a construction project cannot progress if it is windy or cold, but it is not affected by snow. What is the probability that construction work will be stopped on a given day? (Ans. 0.145)
(c) On a day without snow, what is the probability that construction work cannot proceed? (Ans. 0.139)
(d) What is the probability of a nice winter day (i.e., no snow, no gusty wind, and no low temperature)?
(e) Suppose U denotes the event of uncomfortable wind chill condition on a given day. If either C or W (but not both) occurs, the probability of U is 50%, whereas, if both C and W occur, the probability of U is 100%, and U will not occur if neither C nor W occurs. What is the probability of the event U on a given winter day?
2.30 Lead and bacteria are two common sources of contamination in a water distribution system. Suppose 4% of the water distribution systems are contaminated by lead, and only 2% of the water distribution systems are contaminated by bacteria. Assume that the events of lead and bacterial contamination are statistically independent.
(a) Determine the probability that a water distribution system selected at random for inspection is contaminated. (Ans. 0.0592)
(b) If a system is indeed contaminated, what is the probability that it is caused by lead only? (Ans. 0.662)
2.31 The 15-ft diameter tank shown below rests on a concrete base. When there is water in the tank, it will be at the 10-ft or 20-ft level, and the probability of either level is 0.40. The weight of the tank is 100,000 lb and the weight of the water is 11,000 lb per foot of water level in the tank. The frictional resistance against the horizontal force is equal to the total weight of the tank and contents times the coefficient of friction, or
F = CW
where
C = the coefficient of friction
W = total weight of tank and contents
(a) If C is equally likely to be 0.10 or 0.20, identify the sample space of the frictional resistance to horizontal sliding of the tank.
(b) If the total maximum horizontal force during a wind storm is 15,000 lb, what is the probability that the tank will be displaced (slide) horizontally during the storm? The value of C is statistically independent of the total weight.
(c) Suppose that the total maximum horizontal force during a wind storm (that may cause sliding) may be 15,000 lb or 20,000 lb with respective probabilities of 60% and 40%; in such a case, what would be the probability of sliding of the tank? Assume that the maximum wind force is statistically independent of the frictional resistance.
2.32 A landfill containment system is shown in the figure below. A thick layer of clay (which is a highly impermeable material) was placed between the landfill and the surrounding soil stratum to prevent contaminants, carried by rainfall infiltration, from leaking from the landfill into the soil stratum. A layer of synthetic material called geomembrane was also placed above the clay material to provide additional protection against leakage of contaminants. Nevertheless, the quality of workmanship during construction may not be completely satisfactory. First, the clay might have been compacted poorly; second, the geomembrane might have holes punctured by sharp stones that were not detected during inspection. Moreover, extremely heavy rainfall could happen during the operation of the landfill, which could induce excessive pore pressure on the geomembrane/clay layers.
The engineer in this case believes that leakage will happen “during extremely heavy rainfall, and either the clay was not well compacted or there were holes in the geomembrane” (event I). Leakage could also occur “under ordinary rainfalls (i.e., without extremely heavy rainfall), but only when the clay was not well compacted and the geomembrane contained holes” (event II).
Let
W = event of well-compacted clay; this occurs with 90% probability
H = event of geomembrane containing holes; this is 30% likely
E = event of extremely heavy rainfall; the likelihood is 20%
The quality of construction has no effect on the future amount of rainfall. However, if the geomembrane contained holes, the probability of a well-compacted clay is reduced to 60%.
(a) Express event I in terms of the symbols defined above. Repeat for event II.
(b) Determine the probability of event I. Repeat for event II. (Ans. 0.056, 0.096)
(c1) Are W and H mutually exclusive? Are they statistically independent? Provide explanations to support your answers. (Ans. no, no)
(c2) Are the events I and II mutually exclusive? Are they collectively exhaustive? (Ans. yes, no)
(d) Determine the probability of leakage for this landfill containment system. (Ans. 0.152)
2.33 A community is concerned about the supply of energy for the coming winter. Suppose there are three major sources of energy for the community, namely electrical power, natural gas, and oil. Let E, G, and O denote, respectively, the shortages of these sources of energy for the next winter. Also, it is estimated that the respective probabilities of these shortages are as follows:
P(E) = 0.15; P(G) = 0.10; and P(O) = 0.20
Furthermore, if there is a shortage of oil, the probability that there will be a shortage of electrical energy will be doubled. The shortage of gas may be assumed to be statistically independent of shortages in oil and electricity.
(a) What is the probability that there will be shortages in all three sources of energy next winter?
(b) What is the probability that there will be shortages in at least one of the following sources next winter: gas and electricity?
(c) If there is a shortage of electricity next winter, what is the probability that there will also be shortages in both gas and oil?
(d) What is the probability that at least two of the three sources of energy will be in short supply next winter?
2.34 An aerial reconnaissance system consists of three remote sensing components A, B, and C such that the failure of any one of the components will constitute the failure of the system. Flying at normal altitudes, the probabilities of failure of the components over a period of 10 years are, respectively, 0.05, 0.03, and 0.02, whereas operating at ultrahigh altitudes, the corresponding failure probabilities would be 0.07, 0.08, and 0.03. For 60% of the time, the reconnaissance system will be operating at normal altitudes, and 40% of the time it will be used at ultrahigh altitudes.
The system is such that if component A should fail, the failure probability of component B will be twice its original
probability. On the other hand, the failure of component C is statistically independent of A or B.
If the reconnaissance system fails within 10 years, what is the probability that it was caused by the failure of component B?
2.35 In any given year, the winter in a Midwest city can be cold (C) and/or wet (W). On the average, 50% of the winters in this city are cold and 30% of the winters are wet. Moreover, 40% of the cold winters are also wet. An unpleasant winter (U) is one when the weather is either cold or wet or both.
(a) Are the events C and W statistically independent? Justify.
(b) What is the probability of an unpleasant winter in a given year?
(c) What is the probability that the winter in any given year will be cold but not wet?
(d) If the winter in a given year is indeed unpleasant, what is the probability that it will be both cold and wet?
2.36 A city commuter can take one of three possible routes A, B, or C to work. On a given weekday morning, during rush hours, the chances that routes A, B, and C will have traffic congestion, H, are, respectively, 60%, 60%, and 40%. Routes A and B are close to each other, such that if the traffic is congested in one route, the chance of congestion in the other route increases to 85%, whereas the condition in route C is unaffected by (i.e., independent of) the traffic conditions in routes A and B. Also, if the traffic in all three routes is congested, the chance that the commuter will be late, L, for work is 90%; otherwise it will be 30%.
(a) What is the probability that exactly one of the three routes will be congested on a weekday morning?
(b) What is the probability that the commuter will be late for work on a given weekday morning?
(c) If it is known that the traffic in route A is congested, how would this fact change the probability in (b)?
2.37 The damage in a structure after an earthquake can be classified as none (N), light (L), or heavy (H). For a new undamaged structure, the probability that it will suffer light and heavy damages after an earthquake is 0.2 and 0.05, respectively. However, if …
… cement material in the city can definitely handle a low demand (L), but if the demand is average (A) or high (H), the supply may be inadequate with probabilities of 0.10 and 0.50, respectively.
(a) What is the probability of shortage of cement material in a given month?
(b) If a shortage occurred in a month, what is the probability that the demand had been average?
(c) What is the probability of shortage of cement in at least 1 month over a 2-month period? Assume that the demand and supply of cement material are statistically independent between consecutive months.
2.39 Prefabricated wall panels are shipped to a construction project site. Suppose that one shipment is delivered daily to the site. Due to errors in the fabrication process of the panels, it is estimated that the number of defective wall panels in a shipment is 0, 1, or 2 with respective probabilities of 0.2, 0.5, and 0.3. When the wall panels are delivered, they are inspected by the supervisor at the construction site. The supervisor will accept the entire shipment if at most one panel is found to be defective; otherwise the shipment is rejected.
(a) Determine the probability that a shipment will be accepted on a given day.
(b) What is the probability that exactly one shipment will be rejected in a week consisting of 5 working days?
(c) The inspection procedure, however, is not perfect; it is only 80% likely that a defective panel will be correctly identified. Assuming that the identifications of any two defective panels are statistically independent, determine the probability that a shipment will be accepted on a given day.
2.40 A construction project is about 2 months away from the scheduled completion date. Based on the progress of the project to date, the contractor estimated that the project can be completed on schedule without difficulty if good weather continues for the next 2 months; whereas if the weather in the next 2 months is normal, the probability of on-time completion will be 90%. This probability will be reduced to 20% if bad weather prevails, in which case he has the option of launching a crash program to
a structure was already lightly damaged, its probability of getting improve the on-time completion probability to 80%. However,
heavy damage during the next earthquake is increased to 0.5. because of uncertainty in the labor market, there is only a 50-
(a) For a new structure, what is the probability that it will be 50 chance that he can successfully launch a crash program when
heavily damaged after two earthquakes? Assume that no repair needed. Assume that no crash program is necessary if the weather
was performed after the first earthquake. (Ans. 0.188) condition is good or normal. Suppose also that the weather bu
(b) If a structure is indeed heavily damaged after two earth reau predicts that the relative likelihoods of good, normal, and
quakes, what is the probability that the structure was either un bad weather conditions in the next 2 months are 1:2:2.
damaged or lightly damaged before the second earthquake? (Ans. (a) What is the probability that the construction project will be
0.733) completed on schedule?
(c) If the structure is restored to its undamaged condition after (b) If the project was not completed on schedule, what is the
each earthquake, what is the probability that the structure will probability that the weather condition had been normal?
ever experience heavy damage during three earthquakes? (Ans. 2.41 A country is examining its energy situation for the next
0.143) 10 years. Suppose this country depends on oil and natural gas
2.38 In a typical month, the demand for cement material in a as its principal sources of energy. Energy experts in the country
city may be low (L), average (A), or high (H) with respective estimated that there is a 40% probability that gas supply will
probabilities of 0.60, 0.30, and 0.10. The various suppliers of be low in the next 10 years, whereas the probability of low oil
supply is 20%. However, if oil supply is low, the probability that gas supply will also be low is increased to 50%.
The demand for energy in the next 10 years is also projected, and the experts estimated that the probabilities of low, normal, and high demand in the next 10 years will be 0.3, 0.6, and 0.1, respectively. Depending on the level of energy demand and the supplies of oil and gas in the next 10 years, an “energy crisis” might occur according to the following table.

Energy Demand   Oil and Gas Supply               Occurrence of Energy Crisis
Low             Both low                         Yes
Normal          At least one low                 Yes
High            Regardless of supply situation   Yes

(a) If there is normal energy demand in the next 10 years, what is the probability of an “energy crisis”?
(b) What is the probability of an “energy crisis” in the next 10 years?

2.42 In an offshore area in the Gulf of Mexico, the probabilities that the area will be hit by one or two hurricanes in a given year are, respectively, 20% and 5%. The probability of more than two hurricanes in a year is negligible. An oil production platform in this area consists of two substructures, a jacket truss and a deck. If a hurricane occurs only once in a given year, the annual probability that both substructures will not suffer any damage is 99%; this probability drops to 80% if two hurricanes should occur in a year.
(a) What is the probability that the platform (i.e., at least one of the substructures) will suffer some damage if the area is hit by one hurricane in a given year? Assume that damages to the two substructures are statistically independent. What would this probability be if two hurricanes should occur in the given year?
(b) What is the probability that the platform will suffer some hurricane damage next year?
(c) If no damage to the platform was found at the end of a year, what is the probability that no hurricanes occurred that year?

2.43 The common five-axle semitrailer trucks dominate the heavy truck traffic over a bridge that has been selected for structural monitoring and testing. For simplicity, the heavy vehicles are classified as empty (E), half-full (H), or fully loaded (F). Only one vehicle at a time can be observed on the bridge at any instant, but it can be in either of two lanes, lane A or lane B. The likelihood of a truck being in lane A is five times greater than in lane B. The lane in which a truck crosses the bridge is independent of the load it carries.
Of concern are stresses in a girder under lane A. If a fully loaded truck, F, crosses the bridge in lane A, a critical fatigue stress is certain to occur, whereas if the same truck were to cross in lane B, the likelihood of a critical stress is reduced to 60%. The likelihood that a half-loaded truck, H, will produce a critical stress is 40% when in lane A, but only 10% when in lane B. No other loading conditions will produce critical fatigue stresses in the girder.
(a) Determine the sample space of possible bridge loading conditions relative to the fatigue stresses in the girder, i.e., consisting of load and lane position.
(b) Given that a vehicle is in lane B, what is the probability that it is fully loaded (event F)?
(c) What is the probability of the occurrence of critical fatigue stresses in the girder?
(d) If a critical stress is observed, what is the probability that it was caused by a vehicle in lane A?

2.44 There are two possible modes of failure of a reinforced concrete (R/C) beam; namely, shear failure, which can occur suddenly without warning, and bending failure, which is preceded by large deflection. Experience indicates that 5% of all failures of R/C beams are by the shear mode and the rest by the bending mode. Laboratory tests show that 80% of all shear failures exhibit diagonal cracks at the end of a beam prior to failure, whereas only 10% of bending failures show such diagonal cracks.
(a) What is the probability that an R/C beam will show diagonal cracks before failure?
(b) After a severe earthquake, upon inspection of an R/C building, an engineer found cracks at the end of one of the beams in the building. Should he recommend immediate repair if the owner’s instruction is that immediate repair is necessary only if shear failure is likely to occur, or that failure probability exceeds 75%?

2.45 Traffic signals were installed at an intersection involving two one-way streets. Suppose 85% of the vehicles will decelerate when they see the amber light, whereas 10% will accelerate and 5% are “indecisive” and simply continue at the same speed. Five percent of those who accelerated will eventually run a red light, and only 2% of the “indecisive” drivers will be forced to run a red light. All of those who decelerated are able to stop before the red light.
(a) For a vehicle encountering the amber light at this intersection, what is the probability that it will run the red light?
(b) If a vehicle were found to have run a red light, what is the probability that the driver had accelerated?
(c) The likelihood of an accident resulting from a vehicle running the red light (referred to as a problem vehicle) is studied as follows. Suppose that 60% of the time, vehicles are waiting on the other street at the start of their green light cycle, ready to cross the intersection. Most of these drivers, say 80%, are cautious before they enter the intersection, whereas the rest are not cautious. Given the presence of a problem vehicle in the intersection zone, a cautious driver can avoid the problem vehicle 95% of the time, whereas 20% of the noncautious drivers will collide with the problem vehicle. What is the probability that a problem vehicle will lead to an accident?
(d) Suppose the annual traffic flow in one of these one-way streets is 100,000 vehicles and 5% of them would encounter the amber signal light. Estimate the number of accidents per year
in the intersection that would be traced back to vehicles running through a red light on that street.

2.46 A major construction company has three branches, A, B, and C, operating in different parts of the country. The chances that the branches will be profitable in any given year are 70%, 70%, and 60%, respectively. The operations of branches A and B are related such that if one makes a profit, the probability of the other branch also making a profit increases to 90%, whereas branch C is independent of both A and B. At the end of each year, if at least two branches are profitable, the chance that employees will receive a bonus is 80%; otherwise the chance of a bonus will only be 20%.
(a) What is the probability that exactly two branches will make profits in a given year?
(b) Determine the probability that the company employees will receive a bonus this year.
(c) If it is known for sure that branch A will end this year in the red (i.e., not making any profit), how would this fact change the probability of part (b)?

2.47 A contractor finds that some of his building projects involve difficult foundation conditions, and others are considered easy. His current projects are in three counties in Illinois, namely, Champaign, Ford, and Iroquois. The contractor has statistics from previous jobs in these counties that enable him to make the following observations with confidence:
1. The probability that any building project picked at random will have a difficult foundation problem is 2/3.
2. 1/3 of the projects are in Ford County.
3. 2/5 of his projects are in Iroquois County and involve difficult foundation conditions.
4. Of all his projects in Ford County, 50% involve difficult foundation conditions.
5. The events of a project in Champaign County and the foundation condition are statistically independent.
(a) What is the probability of having the next project in Ford County that has easy foundation conditions?
(b) What is the probability of having the next project in Champaign County with easy foundation conditions?
(c) If we know only that his next project will have easy foundation conditions, what is the probability that it is in Iroquois County?

2.48 The three major loadings on a nuclear power plant, as far as safety is concerned, are those due to severe earthquakes (E), loss of coolant accident (L), and thermal transients (T). For a typical plant, the chances of occurrence of E, L, and T in a given year are, respectively, 0.0001, 0.0002, and 0.00015. Also, severe earthquakes sometimes cause L due to pipe breaks; it is estimated that the chance of L increases to 10% if severe earthquakes occur in the same year. T is assumed to be independent of both E and L.
(a) What is the probability that in a given year, all three types of loadings will occur?
(b) If at least one of the major loadings occurs, the plant has to be shut down for a period of time and the utility company will suffer some loss of revenue due to power outage. What is the probability of this loss in a given year?
(c) If the utility company incurs this loss of revenue in a given year, what is the probability that severe earthquakes occurred that year?

2.49 At a construction project, the amount of material used in a day’s construction is either 100 units or 200 units, with corresponding probabilities of 0.60 and 0.40. If the amount of material required in a day is 100 units, the probability of shortage of material is 0.10, whereas if the amount of material required is 200 units, the probability of shortage of material is 0.30.
(a) What is the probability of shortage of material in a given day? (Ans. 0.18)
(b) If there is a shortage of material in a given day, what is the probability that the amount of material required that day is 100 units? (Ans. 1/3)

2.50 A space vehicle is designed to land on Mars. Assume that the ground condition on Mars is either hard or soft. If hard ground is encountered during landing, the vehicle will be successfully landed with probability 0.9, whereas, if soft ground is encountered, the corresponding probability of a successful landing is only 0.5. Based on the available information, it is judged that the chance of hitting hard ground is three times that of hitting soft ground.
(a) What is the probability of a successful landing? (Ans. 0.8)
(b) Suppose a stick can be projected to test the ground condition before landing. It will penetrate into soft ground with probability 0.9, and hard ground with probability of only 0.2. If the stick were observed to penetrate into the ground, what is the probability that the ground is hard? (Ans. 0.4)
(c) What is the probability of a successful landing if the stick penetrated the ground? (Ans. 0.66)

2.51 Cement and reinforcing steel (rebars) are essential materials for constructing a reinforced concrete building. During construction, the probabilities of encountering shortages of these materials (e.g., caused by strikes at the factories) are, respectively, 0.10 and 0.05. However, if cement is available, the probability of shortage of rebars is reduced to one-half of the original probability.
(a) What is the probability that there will be shortage of either (or both) construction material?
(b) What is the probability that only one of the two materials will be available?
(c) If there is shortage of material during construction, what is the probability that it will be shortage of reinforcing steel?
The construction materials referred to above must be transported from the factories to the construction site either by trucks or trains. Past records show that 60% of the materials are transported by trucks and the remaining 40% by trains. Also, the probability of on-time delivery by trucks is 0.75, whereas the corresponding probability by trains is 0.90.
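Parts (a) to (c) of Problem 2.51 hinge on a step the statement leaves implicit: only the unconditional rebar-shortage probability (0.05) and its value when cement is available are given, so the rebar-shortage probability given a cement shortage has to be backed out of the total probability theorem. The Python sketch below does that calculation. It assumes that "reduced to one-half of the original probability" means P(rebar shortage | cement available) = 0.05/2 = 0.025; the variable names are illustrative only.

```python
# Problem 2.51 (a)-(c). Notation: C = cement shortage, R = rebar shortage.
# Assumption: "reduced to one-half" means P(R | cement available) = 0.05 / 2.

p_c = 0.10                  # P(C)
p_r = 0.05                  # P(R), unconditional
p_r_given_not_c = p_r / 2   # P(R | not C), i.e., cement available

# Total probability theorem: P(R) = P(R | C) P(C) + P(R | not C) P(not C).
# Solve this for the unknown P(R | C).
p_r_given_c = (p_r - p_r_given_not_c * (1 - p_c)) / p_c

p_c_and_r = p_r_given_c * p_c   # P(C and R), shortage of both

# (a) shortage of either or both materials: P(C or R)
p_c_or_r = p_c + p_r - p_c_and_r

# (b) exactly one material available is the same as exactly one shortage
p_exactly_one = p_c_or_r - p_c_and_r

# (c) P(R | C or R) = P(R) / P(C or R), because R is contained in (C or R)
p_r_given_shortage = p_r / p_c_or_r

print(f"P(R | C)          = {p_r_given_c:.4f}")
print(f"(a) P(C or R)     = {p_c_or_r:.4f}")
print(f"(b) exactly one   = {p_exactly_one:.4f}")
print(f"(c) P(R | C or R) = {p_r_given_shortage:.3f}")
```

Under that reading, the implied P(R | C) is 0.275, far above the unconditional 0.05, which is why the two shortages cannot be treated as independent in parts (a) and (b).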
(d) What is the probability that materials to the construction site will be delivered on schedule?
(e) If there is delay in the transportation of construction materials to the site, what is the probability that it will be caused by truck transportation?

2.52 Water for a city is supplied from two sources, namely, Source A and Source B. During the summer season, the probability that the supply from Source A will be below normal is 0.30; the corresponding probability for Source B is 0.15. However, if Source A is below normal, the probability that Source B will also be below normal during the same summer season is increased to 0.30.
The probability of water shortage in the city will obviously depend on the supplies from the two sources. In particular, if only Source A is below normal supply, the probability of water shortage is 0.20, whereas if only Source B is below normal, the corresponding probability of shortage is 0.25. Obviously, if neither source is below normal, there would be no chance of shortage, whereas if both sources are below normal during the summer, the probability of water shortage in the city would be 0.80.
During a summer season, determine the following:
(a) The probability that there will be below-normal supply of water from either or both sources.
(b) The probability that only one of the two sources will be below normal supply.
(c) The probability of water shortage in the city during the summer season.
(d) If water shortage should occur in the city, what is the probability that it was caused by below-normal supplies from both sources?
(e) If there is no water shortage in the city during the summer, what is the probability that there was normal supply from Source A?

2.53 A consulting engineer must meet a deadline for a project consisting of two independent phases:
(1) Field work: If weather conditions are favorable, the probability that the field work will be completed on schedule is 0.90. Otherwise, the probability of on-schedule completion is reduced to 0.50. The probability of unfavorable weather is 0.60.
(2) Computations: Two independent computers are available to perform the required calculations. Each computer has a reliability of 70% (i.e., the probability of working is 0.70). If only one of the computers is working, the probability of completing the computations on time is 0.60, whereas, if both computers are working, this probability increases to 0.90. Furthermore, if neither computer is working, the engineer must perform his calculations using desk calculators, which are 100% reliable but will decrease the probability of completing the computations to 0.40.
(a) What is the probability that the field work will be completed on schedule?
(b) What is the probability that the computations will be completed on time?
(c) What is the probability that the engineer will meet his deadline for the project?

2.54 At a rock quarry, the time required to load crushed rocks onto a truck is equally likely to be either 2 or 3 minutes. Also, the number of trucks in a queue waiting to be loaded can vary considerably; data from 40 previous observations taken at random show the following:

No. of Trucks in Queue   No. of Observations   Relative Frequency
0                        7                     0.175
1                        5                     0.125
2                        12                    0.300
3                        11                    0.275
4                        4                     0.100
5                        1                     0.025
6                        0                     0.000
Total                    40

[Figure: A quarry site (label: Queue size).]

The time required to load a truck is statistically independent of the queue size.
(a) If there are two trucks in the queue when a truck arrives at the quarry, what is the probability that its “waiting time” will be less than 5 minutes?
(b) Before arriving at the quarry, and thus not knowing the size of the queue upon arrival, what is the probability that the waiting time of the truck will be less than 5 minutes?

2.55 A small old bridge is susceptible to damage from heavy trucks. Suppose the bridge has room for at most two trucks, one in each lane. The event of possible damage to the bridge when two trucks are present simultaneously is investigated next. Suppose 10% of the trucks are overloaded (i.e., above the legal load limit) and the event of overloading is statistically independent between trucks. The damage probability of the bridge is 30% when both trucks are overloaded; the probability is 5% if only one is overloaded and 0.1% when neither truck is overloaded.
(a) What is the probability of damage to the bridge while supporting two trucks?
(b) If the bridge is damaged, what is the probability that it was caused by an overloaded truck (or trucks)? [Hint: Determine first the probability that damage was not caused by overloaded truck(s).]
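Problem 2.55 is a direct application of the total probability theorem over the number of overloaded trucks on the bridge (two, one, or none, split binomially with p = 0.10 because overloading is independent between trucks), followed by the complement step suggested in the hint for part (b). A minimal Python sketch, using a hypothetical helper `damage_probability` that takes the overload fraction as a parameter:

```python
# Problem 2.55: damage to a two-lane bridge carrying two trucks at once.
# Each truck is overloaded independently with probability p_overload.

def damage_probability(p_overload: float) -> float:
    """P(damage), by total probability over the number of overloaded trucks."""
    p_both = p_overload ** 2
    p_one = 2 * p_overload * (1 - p_overload)
    p_none = (1 - p_overload) ** 2
    # Conditional damage probabilities from the problem statement:
    # 30% (both overloaded), 5% (one overloaded), 0.1% (neither overloaded).
    return 0.30 * p_both + 0.05 * p_one + 0.001 * p_none


p = 0.10
p_damage = damage_probability(p)                        # part (a)

# Hint for (b): P(damage and no overloaded truck), then take the complement.
p_damage_no_overload = 0.001 * (1 - p) ** 2
p_overload_given_damage = 1 - p_damage_no_overload / p_damage   # part (b)

print(f"(a) P(damage) = {p_damage:.5f}")
print(f"(b) P(overloaded truck(s) | damage) = {p_overload_given_damage:.3f}")
```

Keeping the overload fraction as an argument is a deliberate choice for part (c): the inspection alternative is simply `damage_probability(0.06)`, which can be compared directly with `0.5 * damage_probability(0.10)` for the strengthening alternative.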
(c) Return to part (a). Suppose the county board can allocate a sum of money for strengthening the bridge such that the probability of damage will be half that of the existing bridge. Alternatively, that sum of money could be used to increase the inspection frequency of trucks such that the fraction of overloaded trucks entering the bridge is decreased from 10% to 6%. Which alternative is better if the objective is to minimize the probability of damage to the bridge while supporting two trucks?

2.56 A geologic anomaly embedded underneath a site could induce geotechnical failure if the anomaly is sufficiently large and consists of undesirable soil properties. Suppose an engineer estimates that there is a 30% likelihood that an anomaly may be present at a given site on the basis of the geology in the region. An exploration program may be performed at the site to verify the presence of an anomaly.
One plan calls for the use of geophysical techniques. If an anomaly is present, such techniques will have a 50% probability of detecting the anomaly; otherwise, no signal will be registered.
(a) If the geophysical technique is used but it failed to detect any anomaly, what is the probability that the occurrence of an anomaly is still possible at the site? (Ans. 0.176)
(b) At this point, a more discriminating plan is used such that the probability of detecting an anomaly is as high as 80% if an anomaly is present. Suppose this new plan also did not detect any anomaly.
(i) How confident is the engineer now about his claim that the site is free of any anomaly? (Ans. 0.959)
(ii) A foundation system will be built at the site. The engineer estimates that the foundation should be 99.99% safe if there is no anomaly. However, if an anomaly exists, the reliability of the foundation is reduced to 80%. What is the probability of failure of this foundation system? (Ans. 0.008)
(iii) Suppose failure of the foundation system could bring a loss of one million dollars, whereas survival of the foundation system will not result in any loss. What is the expected loss associated with a failure in the foundation? How much of this expected loss can be saved if the site can be verified to be anomaly free? (Hint: Expected loss = probability of failure × failure loss.) (Ans. $8300, $8200)

2.57 Past records show that a batch of mixed concrete supplied by a certain manufacturer can be of good quality (G), average quality (A), or bad quality (B), with respective probabilities of 0.30, 0.60, and 0.10. Suppose that the probability of failure of a reinforced concrete component would be 0.001, 0.01, or 0.1 depending on whether the quality of concrete is good (G), average (A), or bad (B).
(a) What would be the probability of failure of a reinforced concrete structural component cast with a batch of concrete supplied by the manufacturer?
(b) A test may be performed to give more information on the quality of concrete supplied by the manufacturer. The probabilities of passing the test for good, average, and bad quality concrete are 0.90, 0.70, and 0.20, respectively. If a batch of concrete passed the test:
(i) What is the probability that it will be of good quality?
(ii) In this case, i.e., concrete passed the test, what would be the probability of failure of a structural component cast from this batch of concrete?

2.58 The maximum intensity of the next earthquake in a city may be classified (for simplicity) as low (L), medium (M), or high (H) with relative likelihoods of 15:4:1. Suppose also that buildings may be divided into two types: poorly constructed (P) and well constructed (W). About 20% of all the buildings in the city are known to be poorly constructed for earthquake resistance.
It is estimated that a poorly constructed building will be damaged with a probability of 0.10, 0.50, or 0.90 when subjected to a low-, medium-, or high-intensity earthquake, respectively. However, a well-constructed building will survive a low-intensity earthquake, although it may be damaged when subjected to a medium- or high-intensity earthquake with probability of 0.05 or 0.20, respectively.
(a) What is the probability that a well-constructed building will be damaged during the next earthquake?
(b) What proportion of the buildings in this city will be damaged during the next earthquake?
(c) If a building in the city is damaged after an earthquake, what is the probability that the building was poorly constructed?

2.59 A transit system consists of one-way trains running between four stations as shown in the figure below. The distances between stations are as indicated in the figure. The probabilities concerning origin and destination of passengers are summarized in the following matrix.

                Destination
Origin     1      2      3      4
1          0      0.1    0.3    0.6
2          0.6    0      0.3    0.1
3          0.5    0.1    0      0.4
4          0.8    0.1    0.1    0

For example, a passenger originating from Station 1 will get off at Station 2, 3, or 4 with probabilities 0.1, 0.3, and 0.6, respectively. Furthermore, the fractions of trips originating from Stations 1, 2, 3, and 4 are 0.25, 0.15, 0.35, and 0.25, respectively.
(a) What is the probability that a passenger will leave the train at Station 3? (Ans. 0.145)
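Parts (a) and (d) of Problem 2.59 need only the origin-destination matrix and the origin fractions; parts (b) and (c) also need the station distances from the figure, which are not reproduced here. The sketch below applies the total probability theorem for (a) and Bayes' theorem for (d), and it reproduces the stated answers 0.145 and 0.517. The function names are illustrative, not from the text.

```python
# Problem 2.59: origin-destination probabilities for the four-station line.
# Row i holds P(destination j | origin i), copied from the matrix above.
od_matrix = [
    [0.0, 0.1, 0.3, 0.6],
    [0.6, 0.0, 0.3, 0.1],
    [0.5, 0.1, 0.0, 0.4],
    [0.8, 0.1, 0.1, 0.0],
]
p_origin = [0.25, 0.15, 0.35, 0.25]   # P(origin i) for stations 1..4


def p_destination(j: int) -> float:
    """Total probability that a passenger leaves at station j (1-based)."""
    return sum(p_origin[i] * od_matrix[i][j - 1] for i in range(4))


def p_origin_given_destination(i: int, j: int) -> float:
    """Bayes' theorem: P(origin i | destination j), stations numbered from 1."""
    return p_origin[i - 1] * od_matrix[i - 1][j - 1] / p_destination(j)


print(f"(a) P(leave at 3) = {p_destination(3):.3f}")
print(f"(d) P(from 1 | leave at 3) = {p_origin_given_destination(1, 3):.3f}")
```

Because every row of the matrix sums to 1, the four destination probabilities from `p_destination` also sum to 1, which is a quick consistency check on the transcribed data.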
(b) What is the expected trip length for a passenger boarding at Station 1? (Note: “Expected value of X” = $\sum_{\text{all } i} x_i p_i$, where $p_i$ is the probability of the outcome $x_i$.) (Ans. 11.5 miles)
(c) What proportion of passenger trips will exceed 10 miles? (Ans. 0.5)
(d) What fraction of the passengers departing the train at Station 3 originated from Station 1? (Ans. 0.517)

2.60 Three machines A, B, and C produce, respectively, 60%, 30%, and 10% of the total number of items of a factory. From past records, the percentages of defective outputs of these machines are, respectively, 2%, 3%, and 4%. An item is selected at random and is found to be defective.
(a) What is the probability that the particular defective item was produced by machine A? (Ans. 0.48)
(b) What is the probability that the defective item was produced by either machine A or B? (Ans. 0.84)

2.61 An engineering Company E is submitting bids for two projects, A and B; the probabilities of winning are estimated to be, respectively, 0.50 and 0.30. Also, if the company wins one bid, its chance of winning the other bid is reduced to one-half of the original probability.
(a) What is the probability of Company E winning at least one of its bids?
(b) If Company E wins at least one of its bids, what is the probability that it will be for project A and not project B?
(c) If Company E is awarded only one project, what is the probability that it will be project A?
Furthermore, on the basis of past performance, it is estimated that the probability of Company E completing project A within a target time is 0.75, whereas, if another company is awarded the project, the probability of on-time completion of project A is only 0.50.
(d) What is the probability that project A will be completed on time?
(e) If project A is completed on time, what is the probability that it was done by Company E?

2.62 Defective concrete on a construction project can be caused by either poor aggregates or poor workmanship (such as selection and grading of material, pouring, curing) or both. The quality of aggregates is also affected by the quality of workmanship, and vice versa.
On a given project, the probability of poor aggregates is 0.20. The probability of poor workmanship if the aggregates are poor is 0.30. The probability of poor aggregates if there is poor workmanship is 0.15.
(a) What is the probability of poor workmanship on the project?
(b) What is the probability of at least one of the causes of defective concrete occurring on the project?
(c) Determine the probability that only one of the two possible causes will occur.
(d) In the above project, if there are only poor aggregates and not poor workmanship, the probability of defective concrete is 0.15. If there is only poor workmanship but good aggregates, the corresponding probability would be 0.20. However, if both causes are present, the probability of defective concrete would be 0.80; if neither cause is present, the corresponding probability is 0.05. Determine the probability of defective concrete on the project.
(e) If there is defective concrete on the project, what is the probability that it is caused by both poor aggregates and poor workmanship?

2.63 The structure shown in the figure below could be subject to settlement problems. The likelihood of having a settlement problem (event A) depends on the subsoil condition, in particular, whether a weak zone exists in the subsoil or not. If there is a small weak zone (event S), the probability of A is 0.2; if the weak zone is large (event L), the probability of A becomes 0.6; lastly, if no weak zone exists (event N), then the probability of A is only 0.05. Based on their experiences with the geology of the neighborhood and the soil information from the preliminary site exploration program, the engineers in this case believe that there is a 70% chance of no weak zone in the stratum underlying the structure; however, if there is a weak zone, it would be twice as likely to be small as large.

[Figure: the structure and its underlying stratum (labels: Weak zone? Large or small?).]

(a) What is the probability that the structure will have a settlement problem? (Ans. 0.135)
(b) Suppose an additional boring can be performed at the site to gather more information about the presence of the weak material. The engineers judge that if a large weak zone exists, it is 80% likely that the boring will encounter it; this encounter probability drops to 30% for a small weak zone. Obviously, the boring will not encounter any weak material if the weak zone does not exist at all. Suppose the additional boring failed to encounter any weak material:
i. What is the probability of the presence of a large weak zone? (Ans. 0.023)
ii. What is the probability of a small weak zone? (Ans. 0.163)
iii. In this case, what is the probability that the structure will have a settlement problem? (Ans. 0.087)

2.64 A dam is proposed to be built in a seismically active area as shown in the following figure:
Two regions, A and B, can be identified in the vicinity such that earthquakes in either area could cause damage to the proposed dam. Earthquakes occur independently between regions A and B. Suppose the annual probabilities of earthquake occurrences in regions A and B are 0.01 and 0.02, respectively. Moreover, the chance of two or more earthquakes occurring annually in each region is negligible.
(a) What is the probability of an earthquake occurring in the vicinity of the dam in a given year? (Ans. 0.03)
(b) If an earthquake occurred in A (but not in B), the likelihood of damage to the dam is 0.3; however, if an earthquake occurred in B (but not in A), the likelihood of damage is only 0.1. Furthermore, if earthquakes occurred in both regions, the dam would have a 50-50 chance of damage. What is the probability that the dam will be damaged in a given year? (Ans. 0.00502)
(c) Suppose the dam can be relocated close to the center of region A, such that earthquakes in region B will not cause any damage to the dam. However, the likelihood of damage due to an earthquake in region A will increase to 0.4. Should the dam be sited in this new location if the objective is to minimize the probability of incurring damage? Please substantiate your answer.
(d) Would the decision in part (c) change if the new site is also susceptible to: (i) a landslide caused by severe rainstorms with an annual probability of 0.002, and (ii) a 0.001 annual probability of subsidence due to poor supporting subsoil structures? Explain. Assume that the dam will be damaged during a landslide or subsidence. Also, the events of damage caused by earthquake, landslide, and subsidence are statistically independent.

2.65 Three oil companies, A, B, and C, are exploring for oil in an area. The probabilities that they will discover oil are, respectively, 0.40, 0.60, and 0.20. If B discovers oil, the probability that A will also discover oil is increased by 20%, whereas this does not affect the chance of C discovering oil (i.e., C is independent of B). Moreover, assume that C is also independent of A.
(a) What is the probability that oil will be discovered in the area by one or more of the three companies?
(b) If oil is discovered in the area, what is the probability that it will be discovered by Company C?
(c) What is the probability that only one of the three companies will discover oil in the area?

2.66 Delays at the airport are common phenomena for air travelers. The likelihood of delay often depends on the weather condition and time of the day. The following information is available at a local airport:
1. In the morning (am), flights are always on time if good weather prevails; however, during bad weather, half of the flights will be delayed.
2. For the rest of the day (pm), the chances of delay during good and bad weather are 0.3 and 0.9, respectively.
3. 30% of the flights are during am hours, whereas 70% of the flights are during pm hours.
4. Bad weather is more likely in the morning; in fact, 20% of the mornings are associated with bad weather, but only 10% of pm hours are subject to bad weather.
5. Assume only two kinds of weather, namely, good or bad.
Define events as follows:
A = am (morning)
P = pm (rest of the day)
D = Delay
G = Good weather
B = Bad weather
Answer the following:
(a) What fraction of the flights at this airport will be delayed? Observe that this is the same as the probability that a given flight will be delayed. (Ans. 0.282)
(b) If a flight is delayed, what is the probability that it is caused by bad weather? (Ans. 0.330)
(c) What fraction of the morning flights at this airport will be delayed? (Ans. 10%)

2.67 Defects in the articles produced by a manufacturing company can be the result of one of the following independent causes:
1. Malfunction of the machinery, which occurs 5% of the production time
2. Carelessness of workers, which occurs 8% of the production time
Defective articles are produced only by these two causes. Also, if only the machinery malfunctions, the probability of a defective article is 0.10, whereas whenever there is only carelessness of workers, the probability of a defective article is 0.20. However, when both causes occur, the probability of a defective article is 0.80.
(a) What is the probability of producing a defective article in the company?
(b) If a defective article is discovered, what is the probability that it was caused by carelessness of workers?

2.68 Leakage of contaminated material is suspected from a given landfill. Monitoring wells are proposed to verify if leakage has occurred. The locations of two wells, A and B, are shown in the following figure:
If leakage has occurred, it will be observed by Well A with 80% probability, whereas Well B is 90% likely to detect the leakage. Assume that neither well will register any contaminant if there is no leakage from the landfill. Before the wells are installed, the engineer believes that there is a 70% chance that leakage has occurred.
(a) Suppose Well A has been installed and no contaminants were observed. How likely will the engineer now believe that leakage has occurred? (Ans. 0.318)
(b) Suppose both wells have been installed. Assume that the events of detecting leakage between the wells are statistically independent.
(i) What is the probability that contaminants will be observed in at least one of the wells? (Ans. 0.686)
(ii) If contaminants were not observed by the wells, how confident is the engineer in concluding that no leakage has occurred? (Ans. 0.955)
(c) If the cost of installing Well A is the same as Well B, and the budget is sufficient for the installation of only one well, which well should be installed? Please justify. (Ans. B)

2.69 Drinking water may be contaminated by two pollutants. In a given community, the probability of its drinking water containing excessive amounts of pollutant A is 0.1, whereas that of pollutant B is 0.2. When pollutant A is excessive, it will definitely cause health problems; however, when pollutant B is excessive, it will cause health problems in only 20% of the population who have low natural resistance to that pollutant. Also, data from many similar communities reveal that the presence of these two pollutants in drinking water is not independent; half of those communities whose drinking water contains excessive amounts of pollutant A will also contain excessive amounts of pollutant B.
Suppose a resident is selected at random from this community; what is the probability that he or she will suffer health problems from drinking the water? Assume that a person's resistance to pollutant B is innate, which is independent of the event of having excessive pollutant in the drinking water.

2.70 A contractor submits bids for three highway jobs and two building jobs. The probability of winning each job is 0.6. Assume that winning among the jobs is statistically independent.
(a) What is the probability that the contractor will win at most one job? (Ans. 0.087)
(b) What is the probability that the contractor will win at least two jobs? (Ans. 0.913)
(c) What is the probability that he or she will win exactly one highway job, but none of the building jobs? (Ans. 0.046)
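The quoted answers to Problems 2.66 and 2.68 follow from the theorem of total probability and Bayes' theorem. The following is a minimal Python sketch, assuming (as the problem statements say) that time of day and weather fully determine the delay probabilities and that the two wells detect independently; the function names are illustrative only.

```python
def airport_delays():
    """Problem 2.66: return P(D), P(B | D), and P(D | A)."""
    p_time = {"am": 0.3, "pm": 0.7}          # item 3: share of flights
    p_bad = {"am": 0.2, "pm": 0.1}           # item 4: P(B | time of day)
    p_delay = {                              # items 1-2: P(D | time, weather)
        ("am", "good"): 0.0, ("am", "bad"): 0.5,
        ("pm", "good"): 0.3, ("pm", "bad"): 0.9,
    }
    p_d = 0.0           # total probability of a delay
    p_d_and_bad = 0.0   # joint probability of a delay and bad weather
    for t, pt in p_time.items():
        for w, pw in (("good", 1.0 - p_bad[t]), ("bad", p_bad[t])):
            joint = pt * pw * p_delay[(t, w)]
            p_d += joint
            if w == "bad":
                p_d_and_bad += joint
    # Delay probability for morning flights only (weather still uncertain).
    p_d_given_am = sum(
        pw * p_delay[("am", w)]
        for w, pw in (("good", 1.0 - p_bad["am"]), ("bad", p_bad["am"]))
    )
    return p_d, p_d_and_bad / p_d, p_d_given_am


def landfill_wells(prior=0.7, detect_a=0.8, detect_b=0.9):
    """Problem 2.68: posterior and detection probabilities."""
    # (a) P(leakage | Well A observes nothing); no leakage gives no signal.
    miss_a = prior * (1.0 - detect_a)
    post_after_a = miss_a / (miss_a + (1.0 - prior))
    # (b)(i) At least one well observes contaminants (requires leakage).
    p_any = prior * (1.0 - (1.0 - detect_a) * (1.0 - detect_b))
    # (b)(ii) P(no leakage | neither well observes contaminants).
    p_no_leak = (1.0 - prior) / (1.0 - p_any)
    return post_after_a, p_any, p_no_leak


print(airport_delays())   # compare with Ans. 0.282, 0.330, 10%
print(landfill_wells())   # compare with Ans. 0.318, 0.686, 0.955
```

Enumerating every (time, weather) pair keeps the total-probability sum explicit, so part (b) is just the bad-weather share of that same sum.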
CHAPTER
3
Analytical Models of
Random Phenomena
represents the possible states of a chain (as described in the paragraph above), then X = 0
means failure of the chain. In other words, a random variable is a mathematical device for
identifying events in numerical terms. Henceforth, in terms of the random variable, we can
speak of an event as (X = a), or (X < a) or (a < X < b).
More formally, a random variable may be considered as a mathematical function or rule that maps (or transforms) events in a sample space into the number system (i.e., the real line). The mapping is unique, and mutually exclusive events are mapped into nonoverlapping intervals on the real line, whereas intersecting events are represented by the respective overlapping intervals on the real line. In Fig. 3.1, the events $E_1$ and $E_2$ are mapped into the real line through the random variable X, and thus can be identified, respectively, as indicated below:

$$E_1 = (a < X \le b)$$
$$E_2 = (c < X \le d)$$
$$\overline{E_1 \cup E_2} = (X \le a) + (X > d)$$

and

$$E_1 E_2 = (c < X \le b)$$
Just as a sample space can consist of discrete sample points or a continuum of sample points,
a random variable may be discrete or continuous.
The advantages and purpose of identifying events in numerical terms should be obvious;
this will then permit us to conveniently represent events analytically, as well as graphically
display the events and their respective probabilities.
Accordingly, if $F_X(x)$ has a first derivative, then from Eq. 3.4, its PDF is

$$f_X(x) = \frac{dF_X(x)}{dx} \qquad (3.5)$$
We might emphasize again that the PDF, $f_X(x)$, is not a probability but a probability density; this is analogous to a mass density, which is not itself a mass. However, it may be emphasized that any function used to represent the probability distribution of a random variable must necessarily satisfy the axioms of probability theory, as described earlier in Sect. 2.3.1. Accordingly, the function must be nonnegative, and the probabilities associated with all the possible values of the random variable must add up to 1.0. In other words, if $F_X(x)$ is the CDF of X, then it must satisfy the following conditions:

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.0$$

Figure 3.2a Discrete probability distribution. Figure 3.2b Continuous probability distribution.

*We have adopted the notation of using a capital letter to denote a random variable, and the corresponding lowercase letter to denote its possible values.
► EXAMPLE 3.1 Let us again consider Example 2.1, which involves a discrete random variable. Using X as the random
variable whose values represent the number of operating bulldozers after 6 months, the events of
interest are mapped into the real line as shown in Fig. E3.1a. Thus, (X = 0), (X = 1), (X = 2), and
(X = 3) now represent the corresponding events of interest.
Assuming again that each of the three bulldozers is equally likely to be operating or nonoperating
after 6 months, i.e., probability of operating is 0.5, and that the conditions between bulldozers are
statistically independent, the PMF of X would be as shown in Fig. E3.1b.
Figure E3.1c CDF of X ($F_X(x)$ steps to 1/8, 1/2, 7/8, and 1.0 at x = 0, 1, 2, and 3, respectively).
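The PMF and CDF of Example 3.1 can be reproduced by enumerating all $2^3$ operating/nonoperating outcomes. This is a minimal Python sketch, assuming (as the example does) a probability of 0.5 of operating and independence between bulldozers; exact fractions avoid rounding, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def bulldozer_pmf(n=3, p=Fraction(1, 2)):
    """PMF of X = number of operating bulldozers, by enumerating every
    operating (1) / nonoperating (0) outcome of n independent machines."""
    pmf = {x: Fraction(0) for x in range(n + 1)}
    for outcome in product((0, 1), repeat=n):
        prob = Fraction(1)
        for state in outcome:
            prob *= p if state else 1 - p
        pmf[sum(outcome)] += prob
    return pmf

def cdf_from_pmf(pmf):
    """Accumulate the PMF into the CDF F_X(x) = P(X <= x)."""
    total = Fraction(0)
    cdf = {}
    for x in sorted(pmf):
        total += pmf[x]
        cdf[x] = total
    return cdf

pmf = bulldozer_pmf()
cdf = cdf_from_pmf(pmf)
print(pmf)
print(cdf)
```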
► EXAMPLE 3.2 For a continuous random variable, consider the 100-kg load in Example 2.5. If the load is equally likely to be placed anywhere along the span of the beam of 10 m, then the PDF of the load position X is uniformly distributed in $0 \le x \le 10$; that is,

$$f_X(x) = \begin{cases} c & 0 \le x \le 10 \\ 0 & \text{otherwise} \end{cases}$$

where c is a constant. As the area under the PDF must be equal to 1.0, the constant c = 1/10. Graphically, the PDF is shown in Fig. E3.2a.
The corresponding CDF of X would be

$$F_X(x) = \begin{cases} 0 & x < 0 \\ x/10 & 0 \le x \le 10 \\ 1.0 & x > 10 \end{cases}$$

The probability that the position of the load will be between 2 m and 5 m on the beam is

$$P(2 < X \le 5) = \int_2^5 \frac{1}{10}\,dx = 0.30$$

or, using the CDF,

$$P(2 < X \le 5) = F_X(5) - F_X(2) = \frac{5}{10} - \frac{2}{10} = 0.30$$
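The two routes in Example 3.2, integrating the PDF or differencing the CDF, can be compared numerically. This is a minimal Python sketch; the midpoint-rule integrator is an illustrative helper, not a method from the text.

```python
def uniform_pdf(x, a=0.0, b=10.0):
    """Uniform PDF on [a, b] (load position on the 10-m beam)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a=0.0, b=10.0):
    """Corresponding CDF: 0 below a, linear on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def integrate(f, lo, hi, n=10_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

p_integral = integrate(uniform_pdf, 2.0, 5.0)   # area under the PDF
p_cdf = uniform_cdf(5.0) - uniform_cdf(2.0)     # F_X(5) - F_X(2)
print(p_integral, p_cdf)
```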
► EXAMPLE 3.3 The useful life, T (in hours), of welding machines is not completely predictable, but may be described by the exponential distribution, with the following PDF:

$$f_T(t) = \lambda e^{-\lambda t}, \qquad t \ge 0$$
Central Values
As there is a range of possible values of a random variable, some central value of this range, such as the average value, would naturally be of special interest. In particular, because the different values of the random variable are associated with different probabilities or probability densities, the “weighted average” (i.e., weighted by the respective probability measures) would be of special interest; this weighted average is the mean value or the expected value of the random variable.
Therefore, if X is a discrete random variable with PMF $p_X(x_i)$, its mean value, denoted E(X), is

$$E(X) = \sum_{\text{all } x_i} x_i\, p_X(x_i) \qquad (3.7a)$$

whereas if X is a continuous random variable with PDF $f_X(x)$, its mean value is

$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx \qquad (3.7b)$$
Other quantities that may be used to designate the central value of a random variable include the mode, or modal value, and the median. The mode, which we shall denote as $\tilde{x}$, is the most probable value of a random variable; i.e., it is the value of the random variable with the largest probability or the highest probability density. The median of a random variable X, which we shall denote as $x_m$, is the value at which the CDF is 50%, and thus larger and smaller values are equally probable; that is,

$$F_X(x_m) = 0.50 \qquad (3.8)$$
In general, the mean, mode, and median of a random variable are different, especially if the
underlying PDF is skewed (nonsymmetric). However, if the PDF is symmetric and unimodal
(single mode), these three quantities will be the same.
Mathematical Expectation
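The notion of an expected value, or weighted average, as described in Eq. 3.7, can be generalized for a function of X. Given a function g(X), its expected value E[g(X)] can be obtained as a generalization of Eq. 3.7 as

$$E[g(X)] = \sum_{\text{all } x_i} g(x_i)\, p_X(x_i) \qquad (3.9a)$$

for discrete X, and

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx \qquad (3.9b)$$

for continuous X. In either case, E[g(X)] is known as the mathematical expectation of g(X) and is the weighted average of the function g(X).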
Measures of Dispersion
Again, as there is a range of possible values of a random variable associated with different probabilities or probability measures, $p_X(x_i)$ or $f_X(x)$, some measure of dispersion is needed to indicate how widely or narrowly the values of the random variable are dispersed. Of
special interest is a quantity that gives a measure of how closely or widely the values of the
variate are clustered around a central value. Intuitively, such a quantity must be a function
of the deviations from the central value. However, whether a deviation is above or below
the central value should be immaterial; therefore, the function should be an even function
of the deviations.
If the deviations are defined relative to the mean value, then a suitable measure of the dispersion is the variance. For a discrete random variable X with PMF $p_X(x_i)$, the variance of X is

$$\text{Var}(X) = \sum_{\text{all } x_i} (x_i - \mu_X)^2\, p_X(x_i) \qquad (3.10a)$$

in which $\mu_X = E(X)$. We might observe that this is the weighted average of the squared deviations from the mean, or with reference to Eq. 3.9 it is the mathematical expectation of $g(X) = (X - \mu_X)^2$. Therefore, according to Eq. 3.9b, for a continuous X with PDF $f_X(x)$, the variance of X is

$$\text{Var}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx \qquad (3.10b)$$

Expanding the square, we obtain

$$\text{Var}(X) = \int_{-\infty}^{\infty} (x^2 - 2\mu_X x + \mu_X^2) f_X(x)\,dx = E(X^2) - \mu_X^2 \qquad (3.11)$$

In Eq. 3.11, the term $E(X^2)$ is known as the mean square value of X.
Dimensionally, a more convenient measure of dispersion is the square root of the variance, which is the standard deviation $\sigma_X$; that is,

$$\sigma_X = \sqrt{\text{Var}(X)} \qquad (3.12)$$

We might observe that based solely on the value of the variance or standard deviation, it may be difficult to state the degree of dispersion; for this purpose, a measure of the dispersion relative to the central value would be appropriate. In other words, whether the dispersion is large or small is more meaningful if measured relative to the central value. For this reason, for $\mu_X > 0$, the coefficient of variation (c.o.v.),

$$\delta_X = \frac{\sigma_X}{\mu_X} \qquad (3.13)$$

is often a preferred and convenient nondimensional measure of dispersion or variability.
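► EXAMPLE 3.4 In Example 3.1, the PMF of the number of operating bulldozers after 6 months is shown in Fig. E3.1b. On this basis, we obtain the expected number of operating bulldozers after 6 months as

$$E(X) = 0\left(\tfrac{1}{8}\right) + 1\left(\tfrac{3}{8}\right) + 2\left(\tfrac{3}{8}\right) + 3\left(\tfrac{1}{8}\right) = 1.5$$

As the random variable is discrete, the mean value of 1.5 is not necessarily a possible value; in this case, we may only conclude that the mean number of operating bulldozers is between 1 and 2 at the end of 6 months.
The corresponding variance is

$$\text{Var}(X) = (0 - 1.5)^2\tfrac{1}{8} + (1 - 1.5)^2\tfrac{3}{8} + (2 - 1.5)^2\tfrac{3}{8} + (3 - 1.5)^2\tfrac{1}{8} = 0.75$$

and the standard deviation is

$$\sigma_X = \sqrt{0.75} = 0.866$$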
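Eqs. 3.7a, 3.9a, 3.10a, and 3.11 are all instances of one weighted-average operation, so a single expectation function covers them. This is a minimal Python sketch applied to the bulldozer PMF of Example 3.1; the function name `expectation` is illustrative.

```python
import math
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over all x_i of g(x_i) * p_X(x_i)  (Eq. 3.9a)."""
    return sum(g(x) * px for x, px in pmf.items())

# PMF of the number of operating bulldozers (Example 3.1).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = expectation(pmf)                                   # Eq. 3.7a
var = expectation(pmf, lambda x: (x - mean) ** 2)         # Eq. 3.10a
var_alt = expectation(pmf, lambda x: x ** 2) - mean ** 2  # Eq. 3.11
std = math.sqrt(var)                                      # Eq. 3.12

print(mean, var, round(std, 3))   # 3/2 3/4 0.866
```

Computing the variance both ways (Eq. 3.10a directly and Eq. 3.11 via the mean square value) is a convenient consistency check.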
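► EXAMPLE 3.5 In Example 3.3, the useful life, T, of welding machines is a random variable with an exponential probability distribution; the PDF and CDF are, respectively, as follows:

$$f_T(t) = \lambda e^{-\lambda t} \quad \text{and} \quad F_T(t) = 1 - e^{-\lambda t}, \qquad t \ge 0$$

The mean life of the machines is

$$\mu_T = E(T) = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}$$

The median life $t_m$ is obtained from

$$\int_0^{t_m} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda t_m} = 0.50$$

which gives $t_m = (\ln 2)/\lambda$. Therefore,

$$t_m = 0.693\,\mu_T$$

The variance of T is

$$\text{Var}(T) = \int_0^{\infty} \left(t - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda t}\,dt = \frac{1}{\lambda^2}$$

so that the standard deviation is $\sigma_T = 1/\lambda = \mu_T$.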
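The closed-form results of Example 3.5 can be checked by numerical integration. This is a minimal Python sketch; the rate `LAM = 0.5` per hour is a hypothetical value chosen only for illustration (the relations $\mu_T = 1/\lambda$, $t_m = 0.693\mu_T$, and $\text{Var}(T) = 1/\lambda^2$ hold for any positive rate), and the infinite upper limit is truncated where the remaining tail is negligible.

```python
import math

LAM = 0.5   # hypothetical rate (per hour); any positive value works

def pdf(t):
    """Exponential PDF f_T(t) = LAM * exp(-LAM * t), t >= 0."""
    return LAM * math.exp(-LAM * t)

def integrate(g, lo, hi, n=100_000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

UPPER = 40.0 / LAM   # truncate the infinite range; the tail mass is exp(-40)
mean = integrate(lambda t: t * pdf(t), 0.0, UPPER)
var = integrate(lambda t: (t - mean) ** 2 * pdf(t), 0.0, UPPER)
median = math.log(2.0) / LAM   # closed-form solution of 1 - exp(-LAM * t_m) = 0.5

print(mean, 1 / LAM, var, 1 / LAM**2, median / mean)
```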
► EXAMPLE 3.6 Suppose a construction company has an experience record showing that 60% of its projects were completed on schedule. If this record prevails, the probability of the number of on-schedule completions in the next six projects can be described by the binomial distribution (see Sect. 3.2.3) as follows:
If X is the number of projects completed on schedule among 6 future projects, then (see Eq. 3.30),

$$p_X(x) = \begin{cases} \dbinom{6}{x}(0.6)^x(0.4)^{6-x} & x = 0, 1, 2, \ldots, 6 \\ 0 & \text{otherwise} \end{cases}$$

where the factor $\dbinom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$ is known as the binomial coefficient. In the present problem, n = 6 and r = x.
The above PMF is shown graphically in Fig. E3.6.
The mean value of X is

$$E(X) = \sum_{x=0}^{6} x\,\frac{6!}{x!\,(6-x)!}(0.6)^x(0.4)^{6-x}$$
$$= (1)\frac{6!}{1!\,(6-1)!}(0.6)(0.4)^5 + (2)\frac{6!}{2!\,(6-2)!}(0.6)^2(0.4)^4 + \cdots + (6)\frac{6!}{6!\,(6-6)!}(0.6)^6(0.4)^0 = 3.60$$

Therefore, the expected number of projects among six that can be completed on schedule by the company is between three and four. The corresponding variance is

$$\text{Var}(X) = \sum_{x=0}^{6} (x - 3.60)^2\,\frac{6!}{x!\,(6-x)!}(0.6)^x(0.4)^{6-x}$$
$$= (0 - 3.60)^2\frac{6!}{0!\,(6-0)!}(0.6)^0(0.4)^6 + (1 - 3.60)^2\frac{6!}{1!\,(6-1)!}(0.6)(0.4)^5 + \cdots + (6 - 3.60)^2\frac{6!}{6!\,(6-6)!}(0.6)^6(0.4)^0$$
$$= 0.0531 + 0.2492 + 0.3539 + 0.0995 + 0.0498 + 0.3658 + 0.2687 = 1.44$$

which agrees with the binomial result $np(1-p) = 6(0.6)(0.4) = 1.44$. The standard deviation, therefore, is $\sigma_X = \sqrt{1.44} = 1.20$, and the c.o.v. is

$$\delta_X = \frac{1.20}{3.60} = 0.333$$

In this case, X = 4 has the highest probability of 0.311, as shown in Fig. E3.6. Therefore, the most probable number of projects that will be completed on schedule is four. ◄
Measure of Skewness
Another useful and important property of a random variable is the symmetry or asymmetry of its PDF or PMF, and the associated degree and direction of asymmetry. A measure of this asymmetry or skewness is the third central moment; i.e.,

$$E(X - \mu_X)^3 = \sum_{\text{all } x_i} (x_i - \mu_X)^3\, p_X(x_i) \quad \text{for discrete } X$$

and

$$E(X - \mu_X)^3 = \int_{-\infty}^{\infty} (x - \mu_X)^3 f_X(x)\,dx \quad \text{for continuous } X$$

A convenient nondimensional measure is the skewness coefficient,

$$\theta = \frac{E(X - \mu_X)^3}{\sigma_X^3} \qquad (3.14)$$

The Kurtosis
Finally, another useful property is the fourth central moment of a random variable, known as the kurtosis. Extending what we have presented above for the second and third central moments, the kurtosis is

$$E(X - \mu_X)^4 = \sum_{\text{all } x_i} (x_i - \mu_X)^4\, p_X(x_i) \quad \text{for discrete } X$$

and

$$E(X - \mu_X)^4 = \int_{-\infty}^{\infty} (x - \mu_X)^4 f_X(x)\,dx \quad \text{for continuous } X$$

► EXAMPLE 3.7 Let us calculate the skewness of the PMF of Example 3.6. As this is a discrete random variable, we calculate the skewness as follows:
From Example 3.6, we have $\mu_X = 3.60$ and $\sigma_X = 1.20$.

$$E(X - \mu_X)^3 = \sum_{i=0}^{6} (x_i - 3.60)^3\, p_X(x_i)$$
$$= (0 - 3.60)^3(0.004) + (1 - 3.60)^3(0.037) + (2 - 3.60)^3(0.138) + (3 - 3.60)^3(0.276)$$
$$\quad + (4 - 3.60)^3(0.311) + (5 - 3.60)^3(0.187) + (6 - 3.60)^3(0.047)$$
$$= -0.279$$

Therefore, the distribution of X is negatively skewed, with a skewness coefficient, according to Eq. 3.14, of

$$\theta = \frac{-0.279}{(1.20)^3} = -0.161$$
◄
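The moments in Examples 3.6 and 3.7 can be computed directly from the binomial PMF and compared with the binomial closed forms $np$, $np(1-p)$, and $np(1-p)(1-2p)$. This is a minimal Python sketch; with the unrounded PMF the third central moment is -0.288, whereas the three-decimal probabilities listed in Example 3.7 give about -0.279.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial PMF, C(n, x) p^x (1 - p)^(n - x)  (Eq. 3.30)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.6
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean) ** 2 * px for x, px in pmf.items())
third = sum((x - mean) ** 3 * px for x, px in pmf.items())
sigma = var ** 0.5
theta = third / sigma**3            # skewness coefficient, Eq. 3.14
mode = max(pmf, key=pmf.get)        # most probable value

# Same third moment with the three-decimal probabilities used in Example 3.7.
pmf_rounded = {x: round(px, 3) for x, px in pmf.items()}
third_rounded = sum((x - 3.6) ** 3 * px for x, px in pmf_rounded.items())

# Closed forms for the binomial: np, np(1-p), np(1-p)(1-2p).
print(mean, n * p, var, n * p * (1 - p), third, n * p * (1 - p) * (1 - 2 * p))
```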
► EXAMPLE 3.8 For the exponential distribution of the useful life of welding machines, T, of Example 3.5, the mean life of the machines is $\mu_T$. Then, the third central moment of the PDF is (using μ for $\mu_T$, so that $\lambda = 1/\mu$)

$$E(T - \mu)^3 = \int_0^{\infty} (t - \mu)^3 \left(\frac{1}{\mu}\right) e^{-t/\mu}\,dt = \frac{1}{\mu}\int_0^{\infty} (t^3 - 3t^2\mu + 3t\mu^2 - \mu^3)\, e^{-t/\mu}\,dt$$

Evaluating each term with $\int_0^{\infty} t^n e^{-t/\mu}\,dt = n!\,\mu^{n+1}$ gives

$$E(T - \mu)^3 = \frac{1}{\mu}\left(6\mu^4 - 6\mu^4 + 3\mu^4 - \mu^4\right) = 2\mu^3$$

We recall from Example 3.5 that the standard deviation of the exponential distribution is $\sigma_T = \mu_T$. Therefore, the skewness coefficient of this distribution, according to Eq. 3.14, is θ = 2.0. ◄
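A numerical check of Example 3.8: the sketch below evaluates the third central moment both term by term from the raw moments $E(T^n) = n!\,\mu^n$ and by direct numerical integration. The mean `MU = 3.0` is a hypothetical value; the skewness coefficient of 2.0 should not depend on it. The midpoint-rule helper is illustrative.

```python
import math

MU = 3.0   # hypothetical mean life mu_T; the skewness does not depend on it

def pdf(t):
    """Exponential PDF written in terms of the mean: (1/MU) exp(-t/MU)."""
    return math.exp(-t / MU) / MU

def raw_moment(n):
    """E(T^n) = n! * MU**n, since the integral of t^n e^(-t/MU) is n! MU^(n+1)."""
    return math.factorial(n) * MU**n

# Third central moment by expanding (t - MU)^3 term by term, as in Example 3.8.
m3_exact = raw_moment(3) - 3 * MU * raw_moment(2) + 3 * MU**2 * raw_moment(1) - MU**3

def integrate(g, lo, hi, n=100_000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

UPPER = 50.0 * MU   # truncation of the infinite range; the tail is negligible
m3_numeric = integrate(lambda t: (t - MU) ** 3 * pdf(t), 0.0, UPPER)
sd = math.sqrt(integrate(lambda t: (t - MU) ** 2 * pdf(t), 0.0, UPPER))
theta = m3_numeric / sd**3

print(m3_exact, 2 * MU**3, sd, theta)
```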
Several of these probability distribution functions are presented, and their special properties
are described and illustrated in this section.
where μ and σ are the parameters of the distribution. In this case, for the Gaussian distribution, these parameters are also the mean and standard deviation, respectively, of the random variable X. A popular and convenient short notation for this distribution is N(μ, σ), which we shall also adopt in specifying the normal PDF.
The significance of the two parameters, μ and σ, may be observed graphically from Fig. 3.5. In Fig. 3.5a, μ is held constant at 60 and σ varies with 10, 20, and 30, whereas in Fig. 3.5b, μ varies with values of 30, 45, and 60 for a constant σ = 10.
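Conversely, the value of the standard normal variate at a cumulative probability p would be denoted as

$$s_p = \Phi^{-1}(p)$$

The probability that a normal variate X with parameters μ and σ will assume values in an interval (a, b] is

$$P(a < X \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right] dx$$

Clearly, this is the area under the general normal PDF between a and b as shown in Fig. 3.8. Theoretically, the above probability can be obtained by performing the integral directly; however, it can be evaluated more readily using the tabulated values of Φ(s) given in Table A.1. For this purpose, we make the following change of variables: $s = (x - \mu)/\sigma$, so that $dx = \sigma\,ds$. Then

$$P(a < X \le b) = \frac{1}{\sqrt{2\pi}} \int_{(a-\mu)/\sigma}^{(b-\mu)/\sigma} e^{-s^2/2}\,ds$$

which we may recognize is the area under the PDF of N(0, 1) between $(a - \mu)/\sigma$ and $(b - \mu)/\sigma$. Therefore, according to Eq. 3.6, the above probability may be evaluated as

$$P(a < X \le b) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right) \qquad (3.21)$$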
► EXAMPLE 3.9 The drainage from a community during a storm is a normal random variable estimated to have a mean of 1.2 million gallons per day (mgd) and a standard deviation of 0.4 mgd; i.e., N(1.2, 0.4) mgd. If the storm drain system is designed with a maximum drainage capacity of 1.5 mgd, what is the underlying probability of flooding during a storm that is assumed in the design of the drainage system?
Flooding in the community will occur when the drainage load exceeds the capacity of the drainage system; therefore, the probability of flooding is

$$P(X > 1.5) = 1 - P(X \le 1.5) = 1 - \Phi\left(\frac{1.5 - 1.2}{0.4}\right) = 1 - \Phi(0.75) = 1 - 0.7734 = 0.227$$

(i) The probability that the drainage load during a storm will be between 1.0 mgd and 1.6 mgd:

$$P(1.0 < X \le 1.6) = \Phi\left(\frac{1.6 - 1.2}{0.4}\right) - \Phi\left(\frac{1.0 - 1.2}{0.4}\right) = \Phi(1.0) - \Phi(-0.5)$$
$$= 0.8413 - [1 - \Phi(0.5)] = 0.8413 - (1 - 0.6915) = 0.533$$

(ii) The 90-percentile drainage load from the community during a storm. This is the value of the random variable below which the cumulative probability is 0.90, which we would obtain as

$$P(X \le x_{0.90}) = \Phi\left(\frac{x_{0.90} - 1.2}{0.40}\right) = 0.90$$

Therefore,

$$\frac{x_{0.90} - 1.2}{0.40} = \Phi^{-1}(0.90) = 1.28$$

and $x_{0.90} = 1.2 + 0.40(1.28) = 1.71$ mgd.
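Eq. 3.21 and the three results of Example 3.9 can be reproduced with Python's standard-library `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of Φ and $\Phi^{-1}$ (in place of Table A.1). This is a minimal sketch; `normal_interval_prob` is an illustrative helper name.

```python
from statistics import NormalDist

STD = NormalDist()   # N(0, 1): STD.cdf is Phi and STD.inv_cdf is Phi^-1

def normal_interval_prob(a, b, mu, sigma):
    """Eq. 3.21: P(a < X <= b) for X ~ N(mu, sigma)."""
    return STD.cdf((b - mu) / sigma) - STD.cdf((a - mu) / sigma)

mu, sigma = 1.2, 0.4   # storm drainage, mgd (Example 3.9)
p_flood = 1.0 - STD.cdf((1.5 - mu) / sigma)            # P(X > 1.5)
p_between = normal_interval_prob(1.0, 1.6, mu, sigma)  # P(1.0 < X <= 1.6)
x90 = mu + sigma * STD.inv_cdf(0.90)                   # 90-percentile load

print(round(p_flood, 3), round(p_between, 3), round(x90, 2))
```

Small differences in the last digit relative to the text come from reading Φ off a table with z rounded to two decimals.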
► EXAMPLE 3.10 Statistical data of vehicular accidents show that the annual vehicle miles (i.e., miles per vehicle per year) driven between traffic accidents (of all severities) can be represented by a normal random variable with a mean of 15,000 miles per year and a c.o.v. of 25%.
The standard deviation is σ = 0.25(15,000) = 3750 miles per year. Therefore, the distribution of miles driven between accidents is N(15,000, 3750) miles per year.
Then, for a typical driver who drives 10,000 miles per year, the probability of him/her having an accident in a year is

$$P(X < 10{,}000) = \Phi\left(\frac{10{,}000 - 15{,}000}{3750}\right) = \Phi(-1.33) = 1 - \Phi(1.33) = 1 - 0.9082 = 0.092$$

which means that the probability of an accident for an individual driver is around 9% annually, or the probability of no accident is 91%. If the driver has driven 8000 miles in a given year without encountering any accident, what is the probability of his/her having an accident for the remainder of that year? In this case, we have a conditional probability as follows:

$$P(X < 10{,}000 \mid X > 8000) = \frac{P(8000 < X < 10{,}000)}{P(X > 8000)} = \frac{0.092 - \Phi\left(\frac{8000 - 15{,}000}{3750}\right)}{1 - \Phi\left(\frac{8000 - 15{,}000}{3750}\right)}$$
$$= \frac{0.092 - \Phi(-1.87)}{1 - \Phi(-1.87)} = \frac{0.092 - (1 - 0.9692)}{1 - (1 - 0.9692)} = \frac{0.061}{0.9692} = 0.063$$

Therefore, the probability of accident-free driving for the remainder of the year would be around 94%. ◄
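The conditional probability in Example 3.10 is a ratio of two normal probabilities. This is a minimal Python sketch using `statistics.NormalDist`; without the two-decimal rounding of z used in the text, the conditional probability comes out near 0.062 rather than 0.063.

```python
from statistics import NormalDist

miles = NormalDist(mu=15_000, sigma=0.25 * 15_000)   # N(15,000, 3750) miles/year

p_accident = miles.cdf(10_000)   # P(X < 10,000): accident within a year
# P(X < 10,000 | X > 8000) = P(8000 < X < 10,000) / P(X > 8000)
p_cond = (miles.cdf(10_000) - miles.cdf(8_000)) / (1.0 - miles.cdf(8_000))

print(round(p_accident, 3), round(p_cond, 3), round(1.0 - p_cond, 3))
```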
► EXAMPLE 3.11 In the fabrication of steel beams and columns by a manufacturer, there are unavoidable variabilities in the dimensions (for example, length) of the steel members. Suppose in a building construction, the erection of the beams and columns for the building frame would require that the actual lengths be within a tolerance of ±5 mm of the specified dimensions with a probability or reliability of 99.7%. What is the required precision, in terms of an allowable σ, of the production process?
The variability in the fabrication of steel beams may be assumed to be normally distributed with zero mean (denoting no systematic bias in the fabrication process) and a standard deviation σ, representing the precision of the process. In this case, the reliability of 99.7% is equivalent to ±3σ, as shown in Fig. 3.7 for the normal PDF. Therefore, to satisfy the required reliability, the fabrication error E must satisfy

$$P(-5 < E \le 5) = \Phi\left(\frac{5 - 0}{\sigma}\right) - \Phi\left(\frac{-5 - 0}{\sigma}\right) = 2\Phi\left(\frac{5}{\sigma}\right) - 1 = 0.997$$

or

$$\frac{5}{\sigma} = \Phi^{-1}(0.9985) = 2.97$$

from which we obtain σ = 1.68 mm. Hence, the required precision, σ, of the fabrication process is determined to be 1.68 mm. This is an illustration of the 6-σ rule in quality control. ◄
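The tolerance calculation of Example 3.11 inverts the symmetric relation $2\Phi(5/\sigma) - 1 = 0.997$. A minimal Python sketch using the standard-library inverse normal CDF:

```python
from statistics import NormalDist

tolerance = 5.0      # mm, allowed deviation either side of the specified length
reliability = 0.997  # required P(-5 < E <= 5)

z = NormalDist().inv_cdf((1.0 + reliability) / 2.0)   # Phi^-1(0.9985)
sigma_allow = tolerance / z                           # allowable sigma, mm
achieved = 2.0 * NormalDist().cdf(tolerance / sigma_allow) - 1.0   # back-check

print(round(z, 2), round(sigma_allow, 2), achieved)
```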
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\zeta x} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \lambda}{\zeta}\right)^2\right], \qquad x > 0 \qquad (3.22)$$

where λ = E(ln X) and $\zeta = \sqrt{\text{Var}(\ln X)}$ are the parameters of the distribution, which means that these parameters are, respectively, also the mean and standard deviation of ln X. The above PDF is illustrated graphically in Fig. 3.9 for various values of the parameter ζ. Observe that this distribution is strictly for positive values of the variate X.
We may observe from Fig. 3.9 that as the parameter ζ increases, the positive skewness of the PDF also increases.
In Example 4.2 of Chapter 4, we shall show that if X is lognormal with parameters λ and ζ, then ln X is normal with mean λ and standard deviation ζ; i.e., N(λ, ζ). Because of this logarithmic relationship with the normal distribution, probabilities associated with the lognormal distribution can also be determined conveniently from the table of standard normal probabilities, such as Table A.1. We show this with the following: On the basis of Eqs. 3.3 and 3.22, the probability that X will assume values in an interval (a, b] is

$$P(a < X \le b) = \Phi\left(\frac{\ln b - \lambda}{\zeta}\right) - \Phi\left(\frac{\ln a - \lambda}{\zeta}\right)$$
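The parameters λ and ζ of the lognormal distribution are related to the mean and standard deviation of the random variable X as follows:

$$E(X) = \int_0^{\infty} x\,\frac{1}{\sqrt{2\pi}\,\zeta x} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \lambda}{\zeta}\right)^2\right] dx$$

With $y = \ln x$ (so that $dx = e^y\,dy$) and completing the square in the exponent, this becomes

$$E(X) = \exp\left(\lambda + \tfrac{1}{2}\zeta^2\right)\left\{\frac{1}{\sqrt{2\pi}\,\zeta}\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\left(\frac{y - (\lambda + \zeta^2)}{\zeta}\right)^2\right] dy\right\}$$

We can recognize that the integral within the braces is the total area under the Gaussian PDF of N(λ + ζ², ζ), which is equal to 1.0. Hence,

$$\mu_X = E(X) = \exp\left(\lambda + \tfrac{1}{2}\zeta^2\right) \qquad (3.24a)$$

or

$$\lambda = \ln \mu_X - \tfrac{1}{2}\zeta^2 \qquad (3.24b)$$

Similarly, the mean square value of X is

$$E(X^2) = \frac{1}{\sqrt{2\pi}\,\zeta}\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\zeta^2}\left[y^2 - 2(\lambda + 2\zeta^2)y + \lambda^2\right]\right\} dy$$

By completing the square term in the exponential, the above integral yields

$$E(X^2) = \exp[2(\lambda + \zeta^2)]\left\{\frac{1}{\sqrt{2\pi}\,\zeta}\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\left(\frac{y - (\lambda + 2\zeta^2)}{\zeta}\right)^2\right] dy\right\}$$

Again, we can recognize that the quantity inside the braces is the total area under the PDF of N(λ + 2ζ², ζ), which is equal to 1.0. Thus, we have

$$E(X^2) = \exp[2(\lambda + \zeta^2)]$$

Thence, according to Eq. 3.11 and using Eq. 3.24a, we obtain the variance of X as

$$\text{Var}(X) = E(X^2) - \mu_X^2 = \mu_X^2\left(e^{\zeta^2} - 1\right)$$

from which

$$\zeta^2 = \ln\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right) = \ln\left(1 + \delta_X^2\right) \qquad (3.25)$$

We should note that if the c.o.v. of X, $\delta_X$, is not large, say ≤ 0.30, then $\ln(1 + \delta_X^2) \approx \delta_X^2$. In these cases,

$$\zeta \approx \delta_X \qquad (3.26)$$

The median, instead of the mean, is often used to designate the central value of a lognormal random variable. By the definition of Eq. 3.8, the median $x_m$ satisfies $P(X \le x_m) = 0.50$. For the lognormal distribution, this means

$$\Phi\left(\frac{\ln x_m - \lambda}{\zeta}\right) = 0.50$$

Thus,

$$\frac{\ln x_m - \lambda}{\zeta} = \Phi^{-1}(0.50) = 0$$

so that $\lambda = \ln x_m$. Substituting into Eq. 3.24a gives $\mu_X = x_m \exp\left(\tfrac{1}{2}\zeta^2\right)$, which means that the mean value of a lognormal variate is invariably larger than the corresponding median; i.e., $\mu_X > x_m$.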
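Eqs. 3.24a, 3.24b, and 3.25 convert between the mean and c.o.v. of X and the parameters (λ, ζ). The following minimal Python sketch implements that conversion and checks it by Monte Carlo simulation with the standard-library `random.lognormvariate(mu, sigma)`, whose arguments are the mean and standard deviation of ln X. The mean of 1.2 and standard deviation of 0.4 are illustrative inputs, and the function names are hypothetical.

```python
import math
import random
import statistics

def lognormal_params(mean, cov):
    """(lambda, zeta) from the mean and c.o.v. of X (Eqs. 3.25 and 3.24b)."""
    zeta2 = math.log(1.0 + cov**2)
    lam = math.log(mean) - 0.5 * zeta2
    return lam, math.sqrt(zeta2)

def lognormal_moments(lam, zeta):
    """Mean (Eq. 3.24a), variance, and median of X from (lambda, zeta)."""
    mean = math.exp(lam + 0.5 * zeta**2)
    var = mean**2 * (math.exp(zeta**2) - 1.0)
    median = math.exp(lam)
    return mean, var, median

# Illustrative values: mean 1.2 and standard deviation 0.4.
lam, zeta = lognormal_params(1.2, 0.4 / 1.2)
mean, var, median = lognormal_moments(lam, zeta)

# Monte Carlo check: lognormvariate(mu, sigma) draws X with ln X ~ N(mu, sigma).
rng = random.Random(1)
sample = [rng.lognormvariate(lam, zeta) for _ in range(100_000)]
print(mean, median, statistics.fmean(sample), statistics.stdev(sample))
```

The sample mean and standard deviation should land close to 1.2 and 0.4, and the median is visibly below the mean, as the text states.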
► EXAMPLE 3.12 In Example 3.9, if the distribution of storm drainage from the community is a lognormal random variable instead of normal, with the same mean and standard deviation, the probability of flooding during a storm would be evaluated as follows.
First, we obtain the parameters λ and ζ of the lognormal distribution as follows: From Eq. 3.25,

$$\zeta^2 = \ln\left[1 + \left(\frac{0.4}{1.2}\right)^2\right] = \ln(1.111) = 0.105$$

Thus, ζ = 0.324, and from Eq. 3.24b, $\lambda = \ln 1.2 - \tfrac{1}{2}(0.105) = 0.130$. The probability of flooding is then

$$P(X > 1.5) = 1 - \Phi\left(\frac{\ln 1.5 - 0.130}{0.324}\right) = 1 - \Phi(0.850) = 0.198$$

which may be compared with the probability of 0.227 from Example 3.9, illustrating the fact that the result depends on the underlying distribution of the random variable.
Also, with the lognormal distribution, we obtain the probability that the drainage will be between 1.0 mgd and 1.6 mgd:

$$P(1.0 < X \le 1.6) = \Phi\left(\frac{\ln 1.6 - 0.130}{0.324}\right) - \Phi\left(\frac{\ln 1.0 - 0.130}{0.324}\right) = \Phi(1.049) - \Phi(-0.401)$$

For the 90-percentile drainage load, $P(X \le x_{0.90}) = 0.90$, from which

$$\frac{\ln x_{0.90} - 0.130}{0.324} = \Phi^{-1}(0.90) = 1.28$$

Therefore, the 90% drainage is

$$x_{0.90} = e^{0.545} = 1.72 \text{ mgd}$$

With the normal distribution of Example 3.9, the 90% drainage is 1.71 mgd. ◄
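The lognormal calculations of Example 3.12 reduce to standard normal evaluations of $(\ln x - \lambda)/\zeta$. This is a minimal Python sketch using `statistics.NormalDist` in place of Table A.1; `lognormal_cdf` is an illustrative helper name.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal: Z.cdf is Phi, Z.inv_cdf is Phi^-1

# Parameters from the mean 1.2 mgd and standard deviation 0.4 mgd (Eqs. 3.25, 3.24b).
zeta = math.sqrt(math.log(1.0 + (0.4 / 1.2) ** 2))
lam = math.log(1.2) - 0.5 * zeta**2

def lognormal_cdf(x, lam, zeta):
    """P(X <= x) for a lognormal X: Phi((ln x - lambda) / zeta)."""
    return Z.cdf((math.log(x) - lam) / zeta)

p_flood = 1.0 - lognormal_cdf(1.5, lam, zeta)
p_between = lognormal_cdf(1.6, lam, zeta) - lognormal_cdf(1.0, lam, zeta)
x90 = math.exp(lam + zeta * Z.inv_cdf(0.90))   # 90-percentile drainage, mgd

print(round(zeta, 3), round(lam, 3), round(p_flood, 3), round(p_between, 3), round(x90, 2))
```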
► EXAMPLE 3.13 The time T between breakdowns of a major piece of equipment on an oil platform is described by a lognormal distribution with a median of 6 months and a c.o.v. of 0.30. In order to ensure a 95% probability that the equipment will be operational at any time, the interval between inspections and repairs can be determined as follows.
In this case, the parameters of the lognormal distribution are λ = ln 6 = 1.792 and ζ ≈ 0.30. Then, denoting by $t_0$ the time between inspection and repair, we would require

$$P(T > t_0) = 1 - \Phi\left(\frac{\ln t_0 - 1.792}{0.30}\right) = 0.95$$

or

$$\frac{\ln t_0 - 1.792}{0.30} = \Phi^{-1}(0.05) = -1.645, \quad \text{giving } t_0 = e^{1.299} = 3.66 \text{ months}$$

If the equipment is in operational condition at the time it is scheduled for regular maintenance and inspection, the probability that it will remain operational without breakdowns for another 2 months would be

$$P(T > 5.66 \mid T > 3.66) = \frac{P(T > 5.66 \cap T > 3.66)}{P(T > 3.66)} = \frac{P(T > 5.66)}{P(T > 3.66)}$$
$$= \frac{1 - \Phi\left(\frac{\ln 5.66 - 1.792}{0.30}\right)}{0.95} = \frac{1 - \Phi(-0.195)}{0.95} = \frac{1 - (1 - 0.577)}{0.95} = 0.61$$

Therefore, the probability is 61% (better than a 50% chance) that the equipment will remain operational without breaking down for 2 months beyond its scheduled regular maintenance. It may be observed that the probability of the equipment operating for 5.66 months from the beginning is P(T > 5.66) = 0.577, which is less than this conditional probability. ◄
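Example 3.13 first inverts the lognormal survival function to find $t_0$ and then forms a ratio of survival probabilities. A minimal Python sketch, assuming ζ ≈ 0.30 as in the text (Eq. 3.26); `survival` is an illustrative helper name.

```python
import math
from statistics import NormalDist

Z = NormalDist()
lam, zeta = math.log(6.0), 0.30   # median 6 months; zeta taken as the c.o.v.

def survival(t):
    """P(T > t) for the lognormal time between breakdowns."""
    return 1.0 - Z.cdf((math.log(t) - lam) / zeta)

t0 = math.exp(lam + zeta * Z.inv_cdf(0.05))   # solves P(T > t0) = 0.95
p_cond = survival(t0 + 2.0) / survival(t0)    # P(T > t0 + 2 | T > t0)

print(round(t0, 2), round(survival(t0 + 2.0), 3), round(p_cond, 2))
```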
In the two examples introduced above, we may model each of the problems as a
Bernoulli sequence as follows:
• Over the duration of the project, the operational conditions between equipment are statistically independent, and the probability of malfunction for every piece of equipment is the same; then, the conditions of the entire fleet of equipment constitute a Bernoulli sequence.
• If the annual maximum floods between any 2 years are statistically independent and
in each year the probability of the flood’s exceeding some specified level is constant,
then the annual maximum floods over a series of years can be modeled as a Bernoulli
sequence.
Values of the CDFs of Eq. 3.30a are tabulated in Table A.2 for specified integer values of n and given probability p.
We can get a better understanding of the basis of Eq. 3.30 by observing the following: By virtue of statistical independence, the probability of realizing a particular sequence of exactly x occurrences and (n - x) nonoccurrences of an event among n trials is $p^x(1 - p)^{n-x}$. However, the x occurrences of the event can be permuted among the n trials, so that the number of sequences with x occurrences is $\binom{n}{x}$; for example, if there are x malfunctions among a fleet of n pieces of equipment, the x malfunctions may occur in $\binom{n}{x}$ different sequences among the n machines. Thus, we obtain Eq. 3.30.
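The counting argument behind Eq. 3.30 can be verified by brute force: enumerate every occurrence/nonoccurrence sequence of n independent trials and add up the probabilities of those with exactly x occurrences. A minimal Python sketch; the values n = 5, x = 2, p = 0.1 used in the usage line are arbitrary illustrations.

```python
from itertools import product
from math import comb

def exactly_x_by_enumeration(x, n, p):
    """Count the 0/1 sequences of n independent trials with exactly x
    occurrences (1 = occurrence) and sum their probabilities."""
    count = 0
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == x:
            count += 1
            total += p**x * (1 - p) ** (n - x)   # same for every such sequence
    return count, total

def binomial_pmf(x, n, p):
    """Eq. 3.30: C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

count, total = exactly_x_by_enumeration(2, 5, 0.1)
print(count, comb(5, 2), total, binomial_pmf(2, 5, 0.1))
```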
► EXAMPLE 3.14 Five road graders are used in the construction of a highway project. The operational life T of each grader is a lognormal random variable with a mean life of 1500 hr and a c.o.v. of 30% (see Fig. E3.14). Assuming statistical independence among the conditions of the machines, the probability that two of the five machines will malfunction in less than 900 hr of operation can be evaluated as follows.
The parameters of the lognormal distribution are ζ ≈ 0.30 and $\lambda = \ln 1500 - \tfrac{1}{2}(0.3)^2 = 7.27$. Then, the probability that a machine will malfunction within 900 hr (see Fig. E3.14) is

$$p = P(T < 900) = \Phi\left(\frac{\ln 900 - 7.27}{0.30}\right) = \Phi(-1.56) = 0.0594$$

For the five machines taken collectively, the actual operational lives of the different machines may conceivably be as shown in Fig. E3.14; i.e., as illustrated in the figure, machines No. 1 and 4 have operational lives less than 900 hr, whereas machines No. 2, 3, and 5 have operational lives longer than 900 hr. The corresponding probability of this exact sequence is $p^2(1 - p)^3$. But the two malfunctioning machines may be any two of the five machines; therefore, the number of sequences with two malfunctioning machines among the five is $5!/(2!\,3!) = 10$. Consequently, if X is the number of road graders malfunctioning in 900 hr,

$$P(X = 2) = 10(0.0594)^2(1 - 0.0594)^3 = 0.0294$$

Also, the probability of malfunction among the five graders (i.e., there will be malfunctions in one or more machines) would be

$$P(X \ge 1) = 1 - P(X = 0) = 1 - (1 - 0.0594)^5 = 0.264$$

This last result involves the CDF of the binomial distribution, which is tabulated in Table A.2 for limited values of the parameters. Using Table A.2 with n = 5, x = 2, and p = 0.05, we obtain a value of 0.9988 from this table.*
◄
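Example 3.14 combines a lognormal probability for one machine with the binomial PMF for the fleet. A minimal Python sketch, assuming ζ ≈ 0.30 as in the text; because λ is not rounded to 7.27 here, p comes out near 0.060 instead of 0.0594, which shifts the later results slightly in the third decimal.

```python
import math
from statistics import NormalDist

zeta = 0.30                                   # c.o.v. used for zeta (Eq. 3.26)
lam = math.log(1500.0) - 0.5 * zeta**2        # Eq. 3.24b
p = NormalDist().cdf((math.log(900.0) - lam) / zeta)   # P(T < 900) for one grader

n = 5
p_two = math.comb(n, 2) * p**2 * (1 - p) ** (n - 2)    # Eq. 3.30 with x = 2
p_at_least_one = 1 - (1 - p) ** n                      # 1 - P(X = 0)

print(round(lam, 2), round(p, 4), round(p_two, 4), round(p_at_least_one, 3))
```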
In spite of its simplicity, the Bernoulli model is quite useful in many engineering
applications. There are numerous problems in engineering involving situations with only
two alternative possibilities. Aside from those we have described and illustrated above, other
problems that can be modeled as respective Bernoulli sequences include the following:
• In a series of piles driven into a soil stratum, each pile may or may not encounter
boulders or hard rock.
• In monitoring the daily water quality of a river on the downstream side of an industrial
plant, the water tested daily may or may not meet the pollution control standards.
• The individual items produced on an assembly line may or may not pass the inspection
to ensure product quality.
• In a seismically active region, a building may or may not be damaged annually.
In each of these cases, if the situation is repeated, the resulting series may be modeled as a
Bernoulli sequence.
We might emphasize that in modeling problems with the Bernoulli sequence, the individual trials must be discrete and statistically independent. In spite of this requirement, however, certain continuous problems may be modeled (approximately at least) with the Bernoulli sequence. For example, time and space problems, which are generally continuous, may be modeled with the Bernoulli sequence by discretizing time (or space) into appropriate intervals and admitting only two possibilities within each interval; what happens in each time (or space) interval then constitutes a trial, and the resulting series of a finite number of intervals is then a Bernoulli sequence. We illustrate this with the following example.
EXAMPLE 3.15 The annual rainfall (accumulated generally during the winter and spring) of each year in Orange
County, California, is a Gaussian random variable with a mean of 15 in. and a standard deviation of
4 in.; i.e., N(15, 4). Suppose the current water policy of the county is such that if the annual rainfall
is less than 7 in. for a given year, water rationing will be required during the summer and fall of that
year.
*Table A.2 is limited to specific values of n and p; for more general values of these parameters, it is more convenient to use computers to evaluate the required probability (see Examples 5.2 and 5.11 of Chapter 5).
108 Chapter 3. Analytical Models of Random Phenomena
Assuming $X$ is the annual rainfall, the probability of water rationing in Orange County in any given year is then
$$P(X < 7) = \Phi\left(\frac{7 - 15}{4}\right) = \Phi(-2.0) = 1 - \Phi(2.0) = 1 - 0.9772 = 0.0228$$
However, if the county wishes to reduce the probability of water rationing to half that of the current policy, the annual rainfall $x_r$ below which rationing has to be imposed would be determined as follows:
$$P(X < x_r) = \Phi\left(\frac{x_r - 15}{4}\right) = \frac{1}{2}(0.0228) = 0.0114$$
Thus,
$$\frac{x_r - 15}{4} = \Phi^{-1}(0.0114) = -\Phi^{-1}(0.9886) = -2.28$$
Hence, with the new policy, the annual rainfall below which rationing must be imposed is
$$x_r = 15 - (4)(2.28) = 5.88 \text{ in.}$$
Under the current water policy, and assuming that the annual rainfalls in different years are statistically independent, the probability that in the next 5 years there will be at least 1 year in which water rationing will be necessary would be determined as follows.
Denoting $N$ as the number of years when rationing would be imposed, the probability would be
$$P(N \ge 1) = 1 - P(N = 0) = 1 - \binom{5}{0}(0.0228)^0(0.9772)^5 = 1 - 0.8911 = 0.109$$
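The rainfall calculations can be reproduced with Python's standard `statistics.NormalDist` (Python 3.8 or later). This is a minimal sketch that, like the example, assumes statistically independent years; without intermediate rounding the new threshold comes out near 5.89 in. rather than 5.88 in.

```python
from statistics import NormalDist

rain = NormalDist(mu=15, sigma=4)       # annual rainfall, inches
p_current = rain.cdf(7)                 # P(X < 7): rationing probability in a year
x_r = rain.inv_cdf(p_current / 2)       # threshold that halves the rationing probability
p_5yr = 1 - (1 - p_current) ** 5        # at least one rationing year in 5 (independent years)
print(f"{p_current:.4f}  {x_r:.2f} in.  {p_5yr:.3f}")
```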
Generally, as $q = (1 - p) < 1.0$, the infinite series within the above parentheses yields $1/(1 - q)^2 = 1/p^2$. Hence, we obtain the return period
$$\bar{T} = \frac{1}{p} \qquad (3.32)$$
which means that the mean recurrence time between two consecutive occurrences of an
event is equal to the reciprocal of the probability of the event within one time interval.
It is well to emphasize that the return period is only an average duration between
consecutive occurrences of an event and should not be construed as the actual time between
the occurrences; the actual time is T, which is a random variable.
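To see numerically that the mean recurrence time equals $1/p$ (Eq. 3.32), the series $E(T) = \sum_t t\,p\,q^{t-1}$ can simply be truncated after many terms. This brute-force sum is only an illustration of the result, not how it is derived.

```python
def mean_recurrence_time(p: float, terms: int = 20_000) -> float:
    """Truncated series for E(T) = sum_{t>=1} t * p * (1 - p)**(t - 1)."""
    q = 1.0 - p
    return sum(t * p * q ** (t - 1) for t in range(1, terms + 1))

for p in (0.5, 0.05, 0.02):
    # Each truncated sum should agree closely with the return period 1/p
    print(f"p = {p}: series = {mean_recurrence_time(p):.6f}, 1/p = {1 / p:.6f}")
```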
EXAMPLE 3.16 Suppose that the building code for the design of buildings in a coastal region specifies the 50-yr wind
as the “design wind.” That is, a wind velocity with a return period of 50 years; or on the average, the
design wind may be expected to occur once every 50 yr.
In this case, the probability of encountering the 50-yr wind velocity in any 1 yr is $p = 1/50 = 0.02$. Then, the probability that a newly completed building in the region will be subjected to the design wind velocity for the first time in the fifth year after its completion is
$$P(T = 5) = (0.02)(0.98)^{5-1} = 0.0184$$
whereas the probability that the first such wind velocity will occur within 5 yr after completion of the building would be
$$P(T \le 5) = \sum_{t=1}^{5}(0.02)(0.98)^{t-1} = 0.02 + 0.0196 + 0.0192 + 0.0188 + 0.0184 = 0.096$$
We might point out that this latter event (the first occurrence of the wind velocity within 5 yr) is the same as the event of at least one 50-yr wind in 5 yr, which is also the complement of no 50-yr wind in 5 yr; thus, the desired probability may also be calculated as $1 - (0.98)^5 = 0.096$. However, the above is quite different from the event of experiencing exactly one 50-yr wind in 5 yr; the probability in this case is given by the binomial probability, which would be $\binom{5}{1}(0.02)(0.98)^4 = 0.092$.
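The three probabilities in this example can be checked side by side with a few lines of Python (standard library only); the code simply evaluates the geometric and binomial expressions used above.

```python
from math import comb

p = 1 / 50        # probability of the 50-yr wind in any one year
q = 1 - p

p_first_in_year5 = p * q**4                                    # geometric: P(T = 5)
p_first_within_5 = sum(p * q ** (t - 1) for t in range(1, 6))  # P(T <= 5)
p_at_least_one = 1 - q**5                                      # complement of no wind in 5 yr
p_exactly_one = comb(5, 1) * p * q**4                          # binomial: exactly one in 5 yr
print(round(p_first_in_year5, 4), round(p_first_within_5, 3), round(p_exactly_one, 3))
```

The first-occurrence and at-least-one results coincide, whereas the exactly-one result is smaller because it excludes two or more winds in 5 yr.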
► EXAMPLE 3.17 A fixed offshore platform, shown in Fig. E3.17, is designed for a wave height of 8 m above the mean
sea level. This wave height corresponds to a 5% probability of being exceeded per year. The return
period of the design wave height is therefore,
$$\bar{T} = \frac{1}{0.05} = 20 \text{ yr}$$
The probability that the platform will be subjected to the design wave height within the return period is therefore
$$P(T \le 20) = 1 - (1 - 0.05)^{20} = 1 - 0.3585 = 0.6415$$
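Observe next that the probability of no event occurring within its return period $\bar{T}$ is
$$P(\text{no occurrence in } \bar{T}) = (1 - p)^{\bar{T}}$$
where $p = 1/\bar{T}$. Expanding the above with the binomial theorem, we have
$$(1 - p)^{\bar{T}} = 1 - \bar{T}p + \frac{\bar{T}(\bar{T} - 1)}{2!}p^2 - \frac{\bar{T}(\bar{T} - 1)(\bar{T} - 2)}{3!}p^3 + \cdots$$
Furthermore, for large $\bar{T}$, or small $p$, we may recognize that the series on the right side is approximately equal to $e^{-\bar{T}p}$. Therefore, for large $\bar{T}$,
$$P(\text{no occurrence in } \bar{T}) \approx e^{-\bar{T}p} = e^{-1} = 0.368$$
and
$$P(\text{occurrence in } \bar{T}) \approx 1 - e^{-1} = 0.632$$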
In other words, for a rare event, defined as one with a long return period $\bar{T}$, the probability of the event's occurring within its return period is approximately 0.632. This result is a useful approximation even for return periods that are not very long; for instance, for $\bar{T} = 20$ time intervals, as in Example 3.17, the probability is
$$P(\text{occurrence in } \bar{T}) = 1 - \left(1 - \frac{1}{20}\right)^{20} = 1 - 0.3585 = 0.6415$$
which shows that the error in the above exponential approximation is less than 1.5%.
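The convergence toward 0.632 can be tabulated directly. This sketch evaluates the exact Bernoulli expression $1 - (1 - 1/\bar{T})^{\bar{T}}$ for a few return periods chosen only for illustration and compares it with the limit $1 - e^{-1}$.

```python
import math

def prob_within_return_period(T: int) -> float:
    """Bernoulli model: P(at least one event in T intervals) with p = 1/T."""
    return 1 - (1 - 1 / T) ** T

LIMIT = 1 - math.exp(-1)   # large-T limit, about 0.632

for T in (5, 10, 20, 50, 100, 1000):
    exact = prob_within_return_period(T)
    rel_err = (exact - LIMIT) / exact
    print(f"T = {T:4d}: exact = {exact:.4f}, limit = {LIMIT:.4f}, relative error = {rel_err:.2%}")
```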
► EXAMPLE 3.18 In Example 3.16, the probability that the building in the region will be subjected to the design wind for the third time in the tenth year is, according to Eq. 3.33,
$$P(T_3 = 10) = \binom{10 - 1}{3 - 1}(0.02)^3(0.98)^{10-3} = \binom{9}{2}(0.02)^3(0.98)^7 = 36(0.000008)(0.8681) = 0.00025$$
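whereas the probability that the third design wind will occur within 5 years would be
$$P(T_3 \le 5) = \sum_{t=3}^{5}\binom{t-1}{2}(0.02)^3(0.98)^{t-3} = (0.02)^3\left[1 + 3(0.98) + 6(0.98)^2\right] = 0.00008$$
◄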
► EXAMPLE 3.19 A steel cable is built up of a number of independent wires as shown in Fig. E3.19. Occasionally, the
cable is subjected to high overloads; on such occasions the probability of fracture of one of the wires
is 0.05, and the failure of two or more wires during a single overload is unlikely.
[Figure E3.19: Steel cable under load]
If the cable must be replaced when the third wire fails, the probability that the cable can withstand
at least five overloads can be determined as follows.
First, we observe that the third wire failure must occur at or after the sixth overloading. Hence, using Eq. 3.33, the required probability is
$$P(T_3 \ge 6) = 1 - P(T_3 \le 5) = 1 - \sum_{t=3}^{5}\binom{t-1}{2}(0.05)^3(0.95)^{t-3}$$
$$= 1 - \binom{2}{2}(0.05)^3(0.95)^0 - \binom{3}{2}(0.05)^3(0.95)^1 - \binom{4}{2}(0.05)^3(0.95)^2 = 1 - 0.00116 = 0.9988$$
◄
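A sketch of the negative binomial PMF in the form used above, $P(T_k = t) = \binom{t-1}{k-1}p^k q^{t-k}$, applied to Examples 3.18 and 3.19. The helper name `neg_binomial_pmf` is only illustrative; the code evaluates the sums directly.

```python
from math import comb

def neg_binomial_pmf(t: int, k: int, p: float) -> float:
    """P(T_k = t): the k-th occurrence falls on trial t of a Bernoulli sequence."""
    return comb(t - 1, k - 1) * p**k * (1 - p) ** (t - k)

# Example 3.18: third design wind (p = 0.02 per year)
p_third_in_10 = neg_binomial_pmf(10, 3, 0.02)
p_third_within_5 = sum(neg_binomial_pmf(t, 3, 0.02) for t in range(3, 6))

# Example 3.19: third wire fracture (p = 0.05 per overload); survive at least 5 overloads
p_survive_5 = 1 - sum(neg_binomial_pmf(t, 3, 0.05) for t in range(3, 6))
print(f"{p_third_in_10:.5f}  {p_third_within_5:.6f}  {p_survive_5:.4f}")
```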
$$\binom{20}{10}(0.5)^{10}(0.5)^{20-10} = 0.1762$$
The above solution is grossly approximate because it assumes that no more than one car will be making an L.T. in a 30-sec interval; obviously, two or more L.T.s are possible.
The solution would be improved if we selected a shorter time interval, say, a 10-sec interval. Then, the probability of an L.T. in each interval is $p = 60/360 = 0.1667$, and with the 10 min divided into 60 intervals,
$$\binom{60}{10}(0.1667)^{10}(0.8333)^{50} = 0.137$$
Further improvements can be made by subdividing time into shorter intervals. If the time $t$ is subdivided into $n$ equal intervals, then the binomial PMF would give
$$P(x \text{ occurrences in } t) = \binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
where $\lambda$ is the average number of occurrences of the event in time $t$. If the event can occur at any time (as in the case of left-turn traffic), the time $t$ would need to be subdivided into a large number of intervals, i.e., $n \to \infty$; then,
$$\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}$$
and all the other terms approach 1.0 as $n \to \infty$, except the term $\lambda^x/x!$. Therefore, in the limit, as $n \to \infty$, we have
$$P(x \text{ occurrences in } t) = \frac{\lambda^x}{x!}e^{-\lambda}$$
which is the Poisson distribution of Eq. 3.34, with $\lambda = \nu t$. On this basis, with $\nu = 1$ L.T. per minute, the probability of 10 L.T. in 10 min is then
$$P(X_{10} = 10) = \frac{(1 \times 10)^{10}}{10!}e^{-1 \times 10} = 0.125$$
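The limiting argument can also be seen numerically. With $\lambda = \nu t = 10$, dividing the 10 min into $n$ intervals and applying the binomial PMF approaches the Poisson value as $n$ grows; $n = 20$ and $n = 60$ correspond to the 30-sec and 10-sec intervals discussed above, and the larger values of $n$ are added only for illustration.

```python
from math import comb, exp, factorial

lam, x = 10.0, 10    # mean number of left turns in 10 min (nu = 1 per min, t = 10 min)

def binomial_approx(n: int) -> float:
    """P(x occurrences) when the 10 min is split into n intervals with p = lam / n."""
    p = lam / n
    return comb(n, x) * p**x * (1 - p) ** (n - x)

poisson = lam**x / factorial(x) * exp(-lam)
for n in (20, 60, 600, 6000):
    print(n, round(binomial_approx(n), 4))
print("Poisson limit:", round(poisson, 4))
```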
► EXAMPLE 3.20 Historical records of severe rainstorms in a town over the last 20 years indicated that there had been
an average number of four rainstorms per year. Assuming that the occurrences of rainstorms may
be modeled with the Poisson process, the probability that there would not be any rainstorms next
year is
$$P(X_1 = 0) = \frac{(4 \times 1)^0}{0!}e^{-4 \times 1} = 0.018$$
We note that although the average yearly number of rainstorms is four, the probability of actually experiencing four rainstorms in a year is less than 20% (see the table below). The probability of two or more rainstorms in the next year is
$$P(X_1 \ge 2) = 1 - \sum_{x=0}^{1}\frac{(4 \times 1)^x}{x!}e^{-4 \times 1} = 1 - 0.018 - 0.074 = 0.908$$
The different probabilities of the number of rainstorms in a year are tabulated below:
x     P(X_1 = x)     x     P(X_1 = x)
0     0.018          7     0.060
1     0.074          8     0.030
2     0.146          9     0.013
3     0.195          10    0.005
4     0.195          11    0.002
5     0.156          12    0.001
6     0.104          13    0.000
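The Poisson probabilities for Example 3.20 can be regenerated as below (standard library only). Because the code rounds each value itself, a few entries may differ from the printed table by 0.001.

```python
from math import exp, factorial

mean = 4 * 1          # 4 rainstorms per year times 1 year
pmf = [mean**x / factorial(x) * exp(-mean) for x in range(14)]
for x, px in enumerate(pmf):
    print(f"{x:2d}  {px:.3f}")

p_none = pmf[0]                       # P(X_1 = 0)
p_two_or_more = 1 - pmf[0] - pmf[1]   # P(X_1 >= 2)
```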
► EXAMPLE 3.21 In designing the left-turn bay at a state highway intersection, the vehicles making left turns at the intersection may be modeled as a Poisson process. If the cycle time of the traffic light for left turns is 1 min and the design criterion requires a left-turn lane that will be sufficient 96% of the time (which may be the criterion in some states in the United States), the lane length, in terms of car lengths, needed to accommodate an average of 100 left turns per hour may be determined as follows.
As stated above, the mean rate of left turns at the intersection is $\nu = 100/60$ per minute. Let us suppose the design length of the left-turn lane is $k$ car lengths. Then, during a 1-min cycle of the traffic light, the design criterion requires that the probability of no more than $k$ cars waiting for left turns must be at least 96%; therefore, we must have
$$P(X_1 \le k) = \sum_{x=0}^{k}\frac{(100/60)^x}{x!}e^{-100/60} \ge 0.96$$
If $k = 3$, $P(X_1 \le 3) = 0.912 < 0.96$, whereas if $k = 4$, $P(X_1 \le 4) = 0.972 > 0.96$.
Therefore, a left-turn bay of four car lengths at the intersection is sufficient to satisfy the design requirement. ◄
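The search for the smallest adequate bay length can be written as a simple loop over $k$. This sketch assumes, as the example does, Poisson arrivals with $\nu = 100/60$ per minute and a 1-min signal cycle.

```python
from math import exp, factorial

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(mean**x / factorial(x) * exp(-mean) for x in range(k + 1))

nu = 100 / 60       # left turns per minute
cycle = 1.0         # minutes per signal cycle
target = 0.96       # design criterion

k = 0
while poisson_cdf(k, nu * cycle) < target:   # smallest k meeting the criterion
    k += 1
print(k, round(poisson_cdf(k, nu * cycle), 3))
```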
► EXAMPLE 3.22 A structure is located in a region where tornado wind force must be considered in its design. Suppose
that from the records of tornadoes for the past 20 years, the mean occurrence rate of tornadoes in the
region is once every 10 years. Assume that the occurrence of tornadoes can be modeled as a Poisson
process.
If the structure is designed to withstand a tornado force with an allowable probability of damage of 20% per tornado, the probability that the structure will be damaged in the next 50 years is
$$P(D) = 1 - P(\bar{D}) = 1 - \sum_{n=0}^{\infty}\frac{(0.1 \times 50)^n}{n!}e^{-0.1 \times 50}(1 - 0.20)^n = 1 - e^{-0.1 \times 50}\left[\sum_{n=0}^{\infty}\frac{(0.8 \times 0.1 \times 50)^n}{n!}\right]$$
We may recognize that the infinite series inside the brackets is the exponential $e^{0.8 \times 0.1 \times 50}$. Hence, the probability of damage in 50 years is
$$P(D) = 1 - e^{-0.1 \times 50}e^{0.8 \times 0.1 \times 50} = 1 - e^{-1.0} = 0.632$$
Obviously, the above probability of damage is much too high. If the structure were to be upgraded in order to reduce the 50-year probability of damage to 5%, what should be the allowable damage probability $p$ against a tornado wind? In this case, we would have
$$P(D) = 1 - \sum_{n=0}^{\infty}\frac{(0.1 \times 50)^n}{n!}e^{-0.1 \times 50}(1 - p)^n = 1 - e^{-5p} = 0.05$$
Thus,
$$p = -\frac{\ln 0.95}{5} = 0.010$$
which means that the original structure should be upgraded to reduce its damage probability against a tornado wind to 0.010, or 1%.
Now suppose that the regional government plans to upgrade 10 similar structures to the above standard, i.e., that each structure should have no more than 0.05 probability of damage against tornado wind in 50 years. Then the probability that at most one of the ten upgraded structures will be damaged in the next 50 years would be evaluated as follows.
Assuming that the damages between structures are statistically independent, the solution involves the binomial probability in which the 50-year damage probability of each structure is 0.05. Denoting $X$ as the number of damaged structures in 50 years,
$$P(X \le 1) = \binom{10}{0}(0.95)^{10} + \binom{10}{1}(0.05)(0.95)^9 = 0.599 + 0.315 = 0.914$$
On the other hand, the probability that at least one of the ten structures will be damaged by tornadoes in the next 50 years is
$$P(X \ge 1) = 1 - (0.95)^{10} = 1 - 0.599 = 0.401$$
◄
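The tornado calculations can be scripted as follows. The sketch uses the closed form obtained above, namely that the Poisson sum weighted by $(1 - p)^n$ collapses to $e^{-\nu t p}$, and then the binomial probabilities for the ten structures, assumed independent as in the example.

```python
import math
from math import comb

nu, t = 0.1, 50                     # tornadoes per year, exposure in years
p_damage_per_tornado = 0.20
# Damaging tornadoes form a Poisson process with rate nu * p
p_damage_50 = 1 - math.exp(-nu * t * p_damage_per_tornado)

# Allowable per-tornado damage probability giving a 5% damage probability in 50 yr
p_allow = -math.log(1 - 0.05) / (nu * t)

# Ten upgraded structures, each with a 0.05 damage probability in 50 yr
n, p = 10, 0.05
p_at_most_one = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(2))
p_at_least_one = 1 - (1 - p) ** n
print(round(p_damage_50, 3), round(p_allow, 4), round(p_at_most_one, 3), round(p_at_least_one, 3))
```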
In the next example, we illustrate a space problem that may be modeled with the Poisson
process.
► EXAMPLE 3.23 A major steel pipeline is used to transport crude oil from an oil production platform to a refinery over a distance of 100 km. Even though the entire pipeline is inspected once a year and repaired as necessary, the steel material is subject to damaging corrosion. Assume that from past inspection records, the average number of such corrosion locations is determined to be 0.15 per km. In this case, if the occurrence of corrosion along the pipeline is modeled as a Poisson process with a mean occurrence rate of $\nu = 0.15$/km, the probability that there will be 10 locations of damaging corrosion between inspections is
$$P(X_{100} = 10) = \frac{(0.15 \times 100)^{10}}{10!}e^{-0.15 \times 100} = 0.049$$
whereas the probability of at least five corrosion sites between inspections would be
$$P(X_{100} \ge 5) = 1 - P(X_{100} \le 4) = 1 - \sum_{x=0}^{4}\frac{(0.15 \times 100)^x}{x!}e^{-0.15 \times 100}$$
$$= 1 - \left[\frac{15^0}{0!} + \frac{15^1}{1!} + \frac{15^2}{2!} + \frac{15^3}{3!} + \frac{15^4}{4!}\right]e^{-15} = 0.999$$
◄
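A quick check of the pipeline probabilities, taking the Poisson mean over the 100-km line to be $\lambda = 0.15 \times 100 = 15$ as in the example.

```python
from math import exp, factorial

def poisson_pmf(x: int, mean: float) -> float:
    """P(X = x) for a Poisson variable with the given mean."""
    return mean**x / factorial(x) * exp(-mean)

mean = 0.15 * 100                    # expected corrosion sites over 100 km
p_exactly_10 = poisson_pmf(10, mean)
p_at_least_5 = 1 - sum(poisson_pmf(x, mean) for x in range(5))
print(round(p_exactly_10, 3), round(p_at_least_5, 4))
```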
► EXAMPLE 3.24 In the last 50 years, suppose that there were two large earthquakes (with magnitudes M > 6) in
Southern California. If we model the occurrences of such large earthquakes as a Bernoulli sequence,
the probability of such large earthquakes in Southern California in the next 15 years would be evaluated
as follows.
First, we observe that the annual probability of occurrence of large earthquakes is $p = 2/50 = 0.04$. Then, the probability of at least one such earthquake in the next 15 years is
$$P(N \ge 1) = 1 - (1 - 0.04)^{15} = 1 - 0.542 = 0.458$$
The maximum intensity of seismic ground shaking at a given site can be measured in terms of $g$ (gravitational acceleration = 980 cm/sec²). Suppose that during an earthquake of $M > 6$, the ground shaking intensity $Y$ at a particular building site has a lognormal distribution with a median of 0.20g and a c.o.v. of 0.25. If the seismic capacity of a building is 0.30g, the probability that the building will suffer damage during an earthquake of magnitude $M > 6$ would be, with $\zeta = \sqrt{\ln(1 + 0.25^2)} = 0.246$,
$$P(Y > 0.30) = 1 - \Phi\left(\frac{\ln(0.30/0.20)}{0.246}\right) = 1 - \Phi(1.65) = 0.05$$
◄
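Both parts of this example can be evaluated as below. For the lognormal part, the sketch assumes the shape parameter is obtained from the c.o.v. as $\zeta = \sqrt{\ln(1 + \delta^2)} \approx 0.246$; using the common approximation $\zeta \approx \delta = 0.25$ instead would give a damage probability of about 0.052.

```python
import math
from statistics import NormalDist

# Bernoulli part: annual probability from 2 events in 50 years
p_annual = 2 / 50
p_15yr = 1 - (1 - p_annual) ** 15          # at least one large quake in 15 years

# Lognormal ground shaking (units of g)
median, cov, capacity = 0.20, 0.25, 0.30
zeta = math.sqrt(math.log(1 + cov**2))     # assumed: exact relation between zeta and c.o.v.
p_damage = 1 - NormalDist().cdf(math.log(capacity / median) / zeta)
print(round(p_15yr, 3), round(zeta, 3), round(p_damage, 3))
```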
We should emphasize that in both the Bernoulli sequence and the Poisson process, the occurrences of an event between trials (in the case of the Bernoulli model) and between intervals (in the Poisson model) are statistically independent. More generally, however, the occurrence of a given event in one trial (or interval) may affect the occurrence or nonoccurrence of the same event in subsequent trials (or intervals). In other words, the probability of occurrence of an event in a given trial may depend on earlier trials, and thus could involve conditional probabilities. If this conditional probability depends only on the immediately preceding trial (or interval), the resulting model is a Markov chain (or Markov process); the essential principles of the Markov chain are described in Ang and Tang, Vol. 2 (1984).*
*This reference is out of print, but is available by direct order from ahang2@aol.com.
► EXAMPLE 3.25 According to Benjamin (1968), the historical record of earthquakes in San Francisco from 1836 to
1961 shows that there were 16 earthquakes with ground motion intensity in MM-scale of VI or higher.
If the occurrence of such high-intensity earthquakes in the San Francisco-Bay Area can be assumed to
constitute a Poisson process, the probability that the next high-intensity earthquake will occur within
the next 2 years would be evaluated as follows.
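The mean occurrence rate of high-intensity earthquakes in the region is
$$\nu = \frac{16}{125} = 0.128 \text{ quake per year}$$
The time $T_1$ until the next such earthquake is then exponentially distributed, and
$$P(T_1 \le 2) = 1 - e^{-0.128 \times 2} = 1 - 0.774 = 0.226$$
The above is equivalent to the probability of the occurrence of such high-intensity earthquakes (one or more) in the next two years. With the Poisson model, this latter probability would be
$$P(X_2 \ge 1) = 1 - P(X_2 = 0) = 1 - e^{-0.128 \times 2} = 0.226$$
Again, this probability may also be evaluated with the Poisson distribution as
$$P(X_{10} = 0) = \frac{(0.128 \times 10)^0}{0!}e^{-0.128 \times 10} = 0.278$$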
The return period of an intensity VI earthquake in San Francisco, according to Eq. 3.37, is therefore
$$\bar{T}_1 = \frac{1}{\nu} = \frac{1}{0.128} = 7.8 \text{ yr}$$
In general, the probability of occurrence of large earthquakes within a given time $t$ is given by the CDF of $T_1$; in the present case, this is
$$P(T_1 \le t) = 1 - e^{-0.128t}$$
In particular, the probability of high-intensity earthquakes occurring within the return period of 7.8 years in the San Francisco area would be
$$P(T_1 \le 7.8) = 1 - e^{-0.128 \times 7.8} = 1 - e^{-1.0} = 0.632$$
◄
In fact, as illustrated above, for a Poisson process the probability of an event occurring (once or more) within its return period is always equal to $1 - e^{-1.0} = 0.632$. This may be compared with the probability of events with long return periods in the Bernoulli model, as discussed earlier in Sect. 3.2.3.
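The exponential waiting-time results of Example 3.25 can be reproduced with `math.exp` alone; the 125-year record (1836 to 1961) is taken from the example. The last line illustrates that, for a Poisson process, the probability of at least one event within the return period is $1 - e^{-1}$ whatever the rate.

```python
import math

nu = 16 / 125                              # high-intensity quakes per year (1836-1961)
p_within_2 = 1 - math.exp(-nu * 2)         # P(T1 <= 2) = P(X_2 >= 1)
p_none_in_10 = math.exp(-nu * 10)          # P(X_10 = 0)
return_period = 1 / nu                     # mean recurrence time, years
p_within_return = 1 - math.exp(-nu * return_period)   # equals 1 - e^-1 for any nu
print(round(p_within_2, 3), round(p_none_in_10, 3), round(return_period, 1), round(p_within_return, 3))
```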
Of course, the exponential distribution is also useful as a general-purpose probability function, as illustrated earlier in Example 3.3, where it was used to describe the useful life of welding machines. In general, the PDF of the exponential distribution is
► EXAMPLE 3.26 Suppose that four identical diesel engines are used to generate backup electrical power for the emergency control system of a nuclear power plant. Assume that at least two of the diesel-powered units are required to supply the needed emergency power; in other words, at least two of the four engines must start automatically during a sudden loss of outside electrical power. The operational life $T$ of each diesel engine may be modeled with the shifted exponential distribution, with a rated mean operational life of 15 years and a guaranteed minimum life of 2 years.
In this case, the reliability of the emergency backup system would clearly be of interest. For
example, the probability that at least two of the four diesel engines will start automatically during an
emergency within the first 4 years of the life of the system can be determined as follows.
First, the probability that any one of the engines will start without any problem within 4 years is, with $\nu = 1/(15 - 2) = 1/13$ per year,
$$P(T > 4) = e^{-(4 - 2)/13} = 0.8574$$
Then, denoting $N$ as the number of engines starting during an emergency, the reliability of the backup system within 4 years is
$$P(N \ge 2) = 1 - \binom{4}{0}(0.1426)^4 - \binom{4}{1}(0.8574)(0.1426)^3 = 1 - 0.0004 - 0.0099 = 0.990$$
Therefore, the reliability of the backup system within 4 years is 99%, even though the reliability
of each engine is only about 86%. ◄
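A sketch of the backup-system reliability calculation, assuming (as in the example) a shifted exponential life with a 2-yr minimum and a 15-yr mean, so that $\nu = 1/(15 - 2)$, and engines that start or fail independently of one another.

```python
import math
from math import comb

t0, mean_life, t = 2.0, 15.0, 4.0      # years: guaranteed minimum, rated mean, horizon
nu = 1 / (mean_life - t0)              # shifted exponential: mean = t0 + 1/nu
p_start = math.exp(-nu * (t - t0))     # P(T > 4): a given engine still starts

# Reliability of the backup system: at least 2 of 4 independent engines start
reliability = sum(comb(4, j) * p_start**j * (1 - p_start) ** (4 - j) for j in range(2, 5))
print(round(p_start, 4), round(reliability, 3))
```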
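where $\nu$ and $k$ are the parameters of the distribution, and $\Gamma(k)$ is the gamma function
$$\Gamma(k) = \int_0^\infty x^{k-1}e^{-x}\,dx$$
The incomplete gamma function ratio is
$$I(u, k) = \frac{\int_0^u y^{k-1}e^{-y}\,dy}{\Gamma(k)}$$
Then, the probability of $(a < X \le b)$ can be obtained as
$$P(a < X \le b) = \frac{\nu^k}{\Gamma(k)}\int_a^b x^{k-1}e^{-\nu x}\,dx$$
If we let $y = \nu x$, the above integral becomes
$$P(a < X \le b) = \frac{1}{\Gamma(k)}\left[\int_0^{\nu b} y^{k-1}e^{-y}\,dy - \int_0^{\nu a} y^{k-1}e^{-y}\,dy\right] = I(\nu b, k) - I(\nu a, k) \qquad (3.43a)$$
Therefore, in effect, the incomplete gamma function ratio is also the CDF of the gamma distribution.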
► EXAMPLE 3.27 The gamma distribution may be used to represent the distribution of the equivalent uniformly distributed load (EUDL) on buildings. For a particular building, if the mean EUDL is 15 psf (pounds per square foot) and the c.o.v. is 25%, the parameters of the appropriate gamma distribution are obtained as follows:
$$\delta = \frac{\sigma}{\mu} = \frac{\sqrt{k}/\nu}{k/\nu} = \frac{1}{\sqrt{k}}; \quad \text{thus,} \quad k = \frac{1}{\delta^2} = \frac{1}{(0.25)^2} = 16$$
and
$$\nu = \frac{k}{\mu} = \frac{16}{15} = 1.067$$
The design live load is generally specified (conservatively) to be on the high side. For instance, if the design EUDL is specified to be 25 psf, the probability that this design load will be exceeded, according to Eq. 3.43a, is
$$P(L > 25) = 1 - P(L \le 25) = 1 - I(25 \times 1.067, 16) = 1 - I(26.67, 16) = 1 - 0.990 = 0.010$$
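For integer $k$, the incomplete gamma function ratio can be evaluated without tables through the Poisson sum that also underlies the Erlang distribution: $I(\nu x, k) = 1 - \sum_{j=0}^{k-1}(\nu x)^j e^{-\nu x}/j!$. The sketch below applies that identity to the EUDL example; it is valid only for integer $k$ (here $k = 16$). For non-integer $k$, a library routine such as SciPy's `scipy.special.gammainc` (the regularized lower incomplete gamma function) would be the usual choice.

```python
import math

def gamma_cdf_integer_k(x: float, k: int, nu: float) -> float:
    """I(nu*x, k) for integer k: 1 - sum_{j<k} (nu*x)**j * exp(-nu*x) / j!."""
    y = nu * x
    return 1.0 - sum(y**j / math.factorial(j) * math.exp(-y) for j in range(k))

mean_eudl, cov = 15.0, 0.25          # psf and coefficient of variation
k = round(1 / cov**2)                # shape parameter: delta = 1/sqrt(k)
nu = k / mean_eudl                   # rate parameter: mean = k/nu
p_exceed = 1 - gamma_cdf_integer_k(25.0, k, nu)   # P(L > 25 psf)
print(k, round(nu, 3), round(p_exceed, 4))
```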
$$F_{T_k}(t) = P(T_k \le t) = P(X_t \ge k) = \sum_{x=k}^{\infty}\frac{(\nu t)^x}{x!}e^{-\nu t} = 1 - \sum_{x=0}^{k-1}\frac{(\nu t)^x}{x!}e^{-\nu t}$$
Taking the derivative of the above CDF, it can be shown that the PDF of $T_k$ is as follows:
$$f_{T_k}(t) = \frac{\nu(\nu t)^{k-1}}{(k-1)!}e^{-\nu t} \qquad \text{for } t \ge 0 \qquad (3.44)$$
The above gamma distribution with integer $k$ is also known as the Erlang distribution. In this case, the mean time until the $k$th occurrence of an event is
$$E(T_k) = k/\nu$$
and its variance is
$$\text{Var}(T_k) = k/\nu^2$$
We can see that for $k = 1$, i.e., for the time until the first occurrence of an event, Eq. 3.44 is reduced to the exponential distribution of Eq. 3.36.
► EXAMPLE 3.28 Suppose that fatal accidents on a particular highway occur on the average about once every 6 months. If we can assume that the occurrences of accidents on this highway constitute a Poisson process, with mean occurrence rate of $\nu = 1/6$ per month, the time until the occurrence of the first accident (or between two consecutive accidents) would be described by the exponential distribution, specifically with the following PDF:
$$f_{T_1}(t) = \frac{1}{6}e^{-t/6}$$
Figure E3.28 PDFs of times until the occurrences of first, second, and third accidents on a highway.
The time until the occurrence of the second accident (or the time between every other accident) on the same highway is described by the gamma distribution, with the PDF
$$f_{T_2}(t) = \frac{1}{6}\left(\frac{t}{6}\right)e^{-t/6}$$
whereas the time until the occurrence of the third accident would also be gamma distributed, with the PDF
$$f_{T_3}(t) = \frac{1}{6}\,\frac{(t/6)^2}{2!}\,e^{-t/6}$$
The above PDFs are illustrated graphically in Fig. E3.28, and the corresponding mean occurrence times of $T_1$, $T_2$, and $T_3$ are, respectively, 6, 12, and 18 months. ◄
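As a numerical sanity check on Eq. 3.44 and on the mean times of 6, 12, and 18 months, the sketch below integrates $t\,f_{T_k}(t)$ with a simple midpoint rule. The truncation at 400 months and the step count are arbitrary choices made only for this illustration.

```python
import math

def erlang_pdf(t: float, k: int, nu: float) -> float:
    """Eq. 3.44: PDF of the time T_k until the k-th occurrence of a Poisson process."""
    return nu * (nu * t) ** (k - 1) / math.factorial(k - 1) * math.exp(-nu * t)

def mean_by_midpoint_rule(k: int, nu: float, upper: float = 400.0, steps: int = 40_000) -> float:
    """Approximate E(T_k) = integral of t * f(t) dt over (0, upper); should be close to k/nu."""
    h = upper / steps
    return sum((i + 0.5) * h * erlang_pdf((i + 0.5) * h, k, nu) * h for i in range(steps))

nu = 1 / 6      # fatal accidents per month
means = [mean_by_midpoint_rule(k, nu) for k in (1, 2, 3)]
print([round(m, 3) for m in means])   # compare with k/nu = 6, 12, 18 months
```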
We might recognize that the exponential and gamma distributions are the continuous analogues, respectively, of the geometric and negative binomial distributions; that is, the geometric and negative binomial distributions govern the first and $k$th occurrence times of a Bernoulli sequence, whereas the exponential and gamma distributions govern the corresponding occurrence times of a Poisson process.
in which $\nu$, $k > 1.0$, and $\gamma$ are the three parameters of the distribution. We observe that Eq. 3.45 is reduced to Eq. 3.42 if $\gamma = 0$.
The mean and variance of $X$ are, respectively,
$$\mu_X = \frac{k}{\nu} + \gamma \quad \text{and} \quad \sigma_X^2 = \frac{k}{\nu^2}$$
► EXAMPLE 3.29 The three-parameter gamma distribution can be shown to give a better fit with statistical data when there
is significant skewness in the observed data. For instance, shown below in Fig. E3.29 is the histogram
of measured residual stresses in the flanges of steel H-sections. The mean, standard deviation, and
skewness coefficient of the measured ratios of residual stress/yield stress are, respectively, 0.3561,
0.1927, and 0.8230.
Clearly, because the data show significant skewness, a three-parameter distribution is necessary
in order to include the skewness for adequately fitting the histogram of the measured residual stresses.
As shown in Fig. E3.29, the three-parameter gamma PDF (solid curve) that includes the skewness
of 0.8230 has a much closer fit to the histogram than the normal or lognormal distributions which
are, of course, two-parameter distributions. This is further verified later in Example 7.10 with the K-S
goodness-of-fit test. ◄
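The text does not state how the three-parameter gamma PDF in Fig. E3.29 was fitted. One simple possibility, shown here purely as an assumption, is the method of moments: the skewness of the distribution is $2/\sqrt{k}$ regardless of the shift $\gamma$, so $k$, $\nu$, and $\gamma$ follow directly from the three sample moments. Pure moment matching gives a slightly negative lower bound ($\gamma \approx -0.11$) for these data, so the curve in the figure may well have been obtained differently.

```python
import math

mean, std, skew = 0.3561, 0.1927, 0.8230   # residual stress / yield stress data

# Method-of-moments fit (an assumption, not the documented procedure):
# skewness = 2/sqrt(k), standard deviation = sqrt(k)/nu, mean = k/nu + gamma
k = (2 / skew) ** 2
nu = math.sqrt(k) / std
gamma_shift = mean - k / nu
print(f"k = {k:.3f}, nu = {nu:.2f}, gamma = {gamma_shift:.4f}")
```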
The hypergeometric distribution arises when samples from a finite population, consisting
of two types of elements (e.g., “good” and “bad”), are being examined. It is the basic
distribution underlying many sampling plans used in connection with acceptance sampling
and quality control.
Consider a lot of $N$ items, among which $m$ are defective and the remaining $(N - m)$ items are good. If a sample of $n$ items is taken at random from this lot, the probability that there will be $x$ defective items in the sample is given by the hypergeometric distribution as follows:
$$P(X = x) = \frac{\binom{m}{x}\binom{N - m}{n - x}}{\binom{N}{n}} \qquad (3.46)$$
The above distribution is based on the following: In the lot of $N$ items, the number of samples of size $n$ is $\binom{N}{n}$; among these, the number of samples with $x$ defectives is $\binom{m}{x}\binom{N - m}{n - x}$. Therefore, assuming that the samples are equally likely to be selected, we obtain the hypergeometric distribution of Eq. 3.46.
► EXAMPLE 3.30 In a box of 100 strain gages, suppose we suspect that there may be four gages that are defective. If six of the gages from the box were used in an experiment, the probability that one defective gage was used in the experiment is evaluated as follows (in this case, we have $N = 100$, $m = 4$, and $n = 6$):
$$P(X = 1) = \frac{\binom{4}{1}\binom{100 - 4}{6 - 1}}{\binom{100}{6}} = 0.205$$
whereas the probability that none of the defective gages in the box was used in the experiment is
$$P(X = 0) = \frac{\binom{4}{0}\binom{100 - 4}{6}}{\binom{100}{6}} = 0.778$$
and the probability that at least one defective gage was used in the experiment would be
$$P(X \ge 1) = 1 - P(X = 0) = 1 - 0.778 = 0.222$$
◄
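Eq. 3.46 translates directly into code with `math.comb` (Python 3.8 or later). The sketch below reproduces the strain-gage probabilities of Example 3.30; the function name is only illustrative.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, m: int, n: int) -> float:
    """Eq. 3.46: P(X = x) for x defectives in a sample of n from N items with m defective."""
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

p1 = hypergeom_pmf(1, 100, 4, 6)     # exactly one defective gage used
p0 = hypergeom_pmf(0, 100, 4, 6)     # no defective gage used
p_at_least_one = 1 - p0
print(round(p1, 3), round(p0, 3), round(p_at_least_one, 3))
```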
► EXAMPLE 3.31 In a huge reinforced concrete construction project, 100 concrete cylinders are to be collected from
the daily concrete mixes delivered to the construction site. Furthermore, to ensure material quality,
the acceptance/rejection criterion requires that ten of these cylinders (selected at random) must be
tested for crushing strength after curing for 1 week, and nine of the ten cylinders tested must have a
required minimum strength. Is the acceptance/rejection criterion stringent enough?
Whether the acceptance/rejection criterion is too stringent, or not stringent enough, depends on
whether it is difficult or easy for poor-quality concrete mixes to go undetected. For example, if there
is d percent of defective concrete, then on the basis of the specified acceptance/rejection criterion, the
probability of rejection of the daily concrete mixes would be (denoting X as the number of defective
cylinders in the test)
P (rejection) = 1 —
whereas if d = 2%,
Therefore, if 5% of the concrete mixes were defective, it is likely (with 41% probability) that the
defective material will be discovered with the proposed acceptance/rejection criterion, whereas if 2%
of the concrete mixes were defective, the likelihood of the daily mixes being rejected is very low (with
0.009 probability).
Hence, if the contract requires concrete with less than 2% defectives, then the proposed acceptance/rejection criterion is not stringent enough; on the other hand, if material with 5% defectives is acceptable, then the proposed criterion may be satisfactory. ◄
Figure 3.11 The beta distribution with q = 2.0 and r = 6.0 for (2 < x < 12).
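in which $q$ and $r$ are the parameters of the distribution, and $B(q, r)$ is the beta function
$$B(q, r) = \int_0^1 x^{q-1}(1 - x)^{r-1}\,dx = \frac{\Gamma(q)\Gamma(r)}{\Gamma(q + r)}$$
If the values of $X$ are limited to between 0 and 1.0, i.e., $a = 0$ and $b = 1.0$, Eq. 3.47 becomes
$$f_X(x) = \frac{1}{B(q, r)}x^{q-1}(1 - x)^{r-1} \quad 0 \le x \le 1.0; \qquad = 0 \quad \text{otherwise} \qquad (3.47a)$$
which may be called the standard beta distribution.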
Figure 3.12 shows the standard beta PDFs with different values of $q$ and $r$. From this figure, we can observe that the beta PDF assumes different shapes depending on the values of the two parameters $q$ and $r$. In particular, we also observe that for $q < r$ the PDF is positively skewed, whereas for $q > r$ it is negatively skewed, and for $q = r$ the PDF is symmetric. With these characteristics, the beta distribution is quite versatile and can be used to fit a wide range of histograms of observed data.
Figure 3.12 The standard beta PDFs with different values of q and r.
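The probability associated with a beta distribution can be evaluated in terms of the incomplete beta function ratio, which is defined as
$$\beta_u(q, r) = \frac{1}{B(q, r)}\int_0^u y^{q-1}(1 - y)^{r-1}\,dy \qquad 0 \le u \le 1.0$$
and which satisfies
$$\beta_u(q, r) = 1 - \beta_{1-u}(r, q)$$
For a general beta distribution of Eq. 3.47, the probability between $x = x_1$ and $x = x_2$ may be evaluated as follows:
$$P(x_1 < X \le x_2) = \int_{x_1}^{x_2}\frac{(x - a)^{q-1}(b - x)^{r-1}}{B(q, r)(b - a)^{q+r-1}}\,dx$$
If we substitute
$$y = \frac{x - a}{b - a}$$
we also have
$$1 - y = \frac{b - x}{b - a} \quad \text{and} \quad dy = \frac{dx}{b - a}$$
Hence, the above integral becomes
$$P(x_1 < X \le x_2) = \frac{1}{B(q, r)}\left[\int_0^{(x_2 - a)/(b - a)} y^{q-1}(1 - y)^{r-1}\,dy - \int_0^{(x_1 - a)/(b - a)} y^{q-1}(1 - y)^{r-1}\,dy\right]$$
Therefore, denoting $u = (x_2 - a)/(b - a)$ and $v = (x_1 - a)/(b - a)$, the above probability can be evaluated in terms of the CDF of the standard beta distribution, Eq. 3.51, as
$$P(x_1 < X \le x_2) = \beta_u(q, r) - \beta_v(q, r)$$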
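A minimal numerical sketch of the incomplete beta function ratio and of the interval probability above, using a midpoint rule instead of tables. It assumes $q \ge 1$ and $r \ge 1$ so that the integrand stays bounded; in practice a library routine such as SciPy's `scipy.special.betainc` (the regularized incomplete beta function) would be preferable.

```python
import math

def beta_fn(q: float, r: float) -> float:
    """Beta function B(q, r) = Gamma(q) Gamma(r) / Gamma(q + r)."""
    return math.gamma(q) * math.gamma(r) / math.gamma(q + r)

def incomplete_beta_ratio(u: float, q: float, r: float, steps: int = 20_000) -> float:
    """beta_u(q, r), the standard beta CDF, by the midpoint rule (assumes q, r >= 1)."""
    if u <= 0.0:
        return 0.0
    h = u / steps
    total = sum(((i + 0.5) * h) ** (q - 1) * (1 - (i + 0.5) * h) ** (r - 1) for i in range(steps))
    return total * h / beta_fn(q, r)

def beta_interval_prob(x1: float, x2: float, a: float, b: float, q: float, r: float) -> float:
    """P(x1 < X <= x2) for a beta variable on [a, b], via y = (x - a)/(b - a)."""
    u, v = (x2 - a) / (b - a), (x1 - a) / (b - a)
    return incomplete_beta_ratio(u, q, r) - incomplete_beta_ratio(v, q, r)

# Figure 3.11 case: q = 2.0, r = 6.0 on (2, 12); the whole range has probability 1
print(round(beta_interval_prob(2, 12, 2, 12, 2.0, 6.0), 6))
```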
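The mean and variance of $X$ with a beta distribution between $a$ and $b$ are
$$\mu_X = a + \frac{q}{q + r}(b - a) \qquad (3.53)$$
$$\sigma_X^2 = \frac{qr}{(q + r)^2(q + r + 1)}(b - a)^2 \qquad (3.53a)$$
and the mode of $X$ (for $q, r > 1$) is
$$\tilde{x} = a + \frac{q - 1}{q + r - 2}(b - a) \qquad (3.55)$$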
► EXAMPLE 3.32 The duration required to complete an activity in a construction project has been estimated by the
subcontractor to be as follows:
Minimum duration = 5 days
Maximum duration = 10 days
Expected duration = 7 days
The coefficient of variation of the required duration is estimated to be 10%.
In this case, the beta distribution may be appropriate with $a = 5$ days and $b = 10$ days. The parameters of the distribution would be determined as follows. With Eq. 3.53, we have
$$5 + \frac{q}{q + r}(10 - 5) = 7$$
giving
$$r = 1.5q$$
Then, substituting this into the expression for the variance, Eq. 3.53a, we have
yielding
$$q = 3.26 \quad \text{and} \quad r = 4.89$$
The probability that the activity will be completed within 9 days is then given by
$$P(T \le 9) = \beta_u(3.26, 4.89)$$
in which $u = (9 - 5)/(10 - 5) = 0.8$. From tables of the incomplete beta function ratios (e.g., Pearson and Johnson, 1968) we obtain after suitable interpolation
► EXAMPLE 3.33 In evaluating the reliability of steel bridge components against fatigue damage, the stress range (i.e., the
maximum minus the minimum applied stresses) in each loading cycle is the principal load parameter.
For this purpose, it is reasonable to assume that the stress range has finite lower and upper bound
values.
As a specific example, consider the strain ranges measured at a particular interstate highway bridge in Illinois under heavy traffic as shown in Fig. E3.33. The corresponding stress ranges (in psi) can be obtained by multiplying the strain range, in micro in./in., in Fig. E3.33 by $E \times 10^{-6} = (30 \times 10^6 \text{ psi})(10^{-6}) = 30$ psi. On this basis, we may model the applied stress ranges, $S$, on the particular highway bridge with a beta PDF with parameters $q = 2.83$ and $r = 4.39$, a lower bound value of 0, and an upper bound value of 10,000 psi.
Figure E3.33 Histogram of measured strain range of a beam, Shaffer Creek Bridge, IL under heavy traffic (after Ruhl and Walker, 1975).
Suppose a steel wide-flange beam of the bridge is subjected to the beta-distributed stress ranges of Fig. E3.33. The fatigue life of metal structures may be described with the so-called S-N relation, $N S^m = c$, specified by two parameters $c$ and $m$. In the present case of the steel wide-flange beam, these S-N parameters are
$$c = 3.98 \times 10^8 \text{ cycles} \quad \text{and} \quad m = 2.75$$
The mean fatigue life when subjected to random stress ranges is given by
$$\bar{n} = \frac{c}{E(S^m)}$$
in which $E(S^m)$ is the $m$th moment of $S$; for a beta-distributed $S$, this is (see Ang, 1977)
$$E(S^m) = s^m\frac{\Gamma(m + q)\Gamma(q + r)}{\Gamma(q)\Gamma(m + q + r)}$$
In the case of the above wide-flange beam,
$$E(S^m) = 10^{2.75}\left[\frac{\Gamma(5.58)\Gamma(7.22)}{\Gamma(2.83)\Gamma(9.97)}\right] = 89.486$$
Thus, the mean fatigue life of the beam is
$$\bar{n} = \frac{3.98 \times 10^8}{89.486} = 4.45 \times 10^6 \text{ cycles}$$
As any pair of values of the random variables $X$ and $Y$ represents events, there are probabilities associated with given values of $x$ and $y$; accordingly, the probabilities for all possible pairs of $x$ and $y$ may be described with the joint distribution function, CDF, of the random variables $X$ and $Y$; namely,
$$F_{X,Y}(x, y) = P(X \le x, Y \le y) \qquad (3.56)$$
which is the CDF of the joint occurrences of the events identified by $X \le x$ and $Y \le y$. In order to comply with the fundamental axioms of probability, the above CDF must satisfy the following:
$$F_{X,Y}(-\infty, -\infty) = 0; \quad F_{X,Y}(-\infty, y) = 0; \quad F_{X,Y}(x, -\infty) = 0$$
$$F_{X,Y}(x, +\infty) = F_X(x); \quad F_{X,Y}(+\infty, y) = F_Y(y); \quad F_{X,Y}(+\infty, +\infty) = 1.0$$
If the random variables $X$ and $Y$ are continuous, the joint probability distribution may also be described with the joint PDF, $f_{X,Y}(x, y)$, defined as follows:
$$f_{X,Y}(x, y)\,dx\,dy = P(x < X \le x + dx, \; y < Y \le y + dy)$$
3.3 Multiple Random Variables 133
Then,
$$P(a < X \le b, \; c < Y \le d) = \int_a^b\int_c^d f_{X,Y}(u, v)\,dv\,du$$
which is the volume under the surface $f_{X,Y}(x, y)$ as shown in Fig. 3.13.
If the random variables $X$ and $Y$ are statistically independent, meaning the events $X = x_i$ and $Y = y_j$ are statistically independent, then
$$p_{X,Y}(x_i, y_j) = p_X(x_i)\,p_Y(y_j)$$
If the random variables $X$ and $Y$ are continuous, the conditional PDF of $X$ given $Y$ is
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \qquad (3.64)$$
from which we also have
$$f_{X,Y}(x, y) = f_{X|Y}(x|y)\,f_Y(y)$$
or
$$f_{X,Y}(x, y) = f_{Y|X}(y|x)\,f_X(x)$$
However, if $X$ and $Y$ are statistically independent, i.e., $f_{X|Y}(x|y) = f_X(x)$ and $f_{Y|X}(y|x) = f_Y(y)$, then the joint PDF becomes
$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$$
Finally, through the theorem of total probability, we obtain the marginal PDFs,
$$f_X(x) = \int_{-\infty}^{\infty} f_{X|Y}(x|y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \qquad (3.67)$$
Figure 3.14 Joint and marginal PDFs of two continuous random variables.
Similarly,
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \qquad (3.68)$$
The characteristics of a joint PDF for two random variables, X and Y, and the associated
marginal PDFs, are portrayed graphically in Fig. 3.14.
► EXAMPLE 3.34 Based on a survey of construction labor and its productivity, the work duration (in number of hours per day) and the average productivity (in terms of percent efficiency) were recorded as tabulated below. For simplicity, the data for work duration are recorded as 6, 8, 10, and 12 hr, whereas the average productivity is recorded as 50%, 70%, and 90%. Denoting X = duration and Y = productivity, the recorded data are as follows:
The above data can be portrayed graphically as the joint PMF of X and Y, as shown in Fig. E3.34a.
The marginal PMF of X, the work duration, is
$$p_X(x) = \sum_{y = 50,\,70,\,90} p_{X,Y}(x, y)$$
Similarly, the marginal PMF of Y, for productivity, would be as shown in Fig. E3.34c.
Finally, if the work duration is 8 hr/day, the probability that the average productivity will be 90% is given by the conditional probability of Eq. 3.61a as
$$P(Y = 90 \mid X = 8) = \frac{p_{X,Y}(8, 90)}{p_X(8)}$$
Therefore, the probability of achieving 90% efficiency for an 8 hr/day work duration is less than 42%.
The probabilities of other productivity levels for an 8-hr work day are shown in the conditional PMF
of Fig. E3.34d. ◄
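The marginal and conditional PMF calculations of Example 3.34 can be sketched in a few lines. The joint PMF values below are hypothetical placeholders chosen only so that they sum to 1. They are not the survey data of the example. Substituting the tabulated data would reproduce PMFs like those in Figs. E3.34c and E3.34d. The names `JOINT_PMF`, `marginal`, and `conditional_y_given_x` are illustrative.

```python
# Hypothetical joint PMF p_{X,Y}(x, y); NOT the survey data of Example 3.34.
# X = work duration (hr/day), Y = average productivity (% efficiency).
JOINT_PMF = {
    (6, 50): 0.05, (6, 70): 0.05, (6, 90): 0.10,
    (8, 50): 0.05, (8, 70): 0.10, (8, 90): 0.15,
    (10, 50): 0.10, (10, 70): 0.10, (10, 90): 0.05,
    (12, 50): 0.10, (12, 70): 0.10, (12, 90): 0.05,
}


def marginal(joint: dict, axis: int) -> dict:
    """Sum the joint PMF over the other variable (axis 0 gives p_X, axis 1 gives p_Y)."""
    result: dict = {}
    for pair, p in joint.items():
        key = pair[axis]
        result[key] = result.get(key, 0.0) + p
    return result


def conditional_y_given_x(joint: dict, x) -> dict:
    """p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x)."""
    p_x = marginal(joint, 0)[x]
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}


if __name__ == "__main__":
    print(marginal(JOINT_PMF, 0))                 # p_X for the placeholder data
    print(conditional_y_given_x(JOINT_PMF, 8))    # p_{Y|X}(y | 8)
```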
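► EXAMPLE 3.35 An example of a joint PDF of two continuous random variables X and Y is the bivariate normal density function given by
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}$$
in which ρ is the correlation coefficient between X and Y (as defined in Sect. 3.3.2). We can show that the above joint PDF may also be written as
$$f_{X,Y}(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2\right] \cdot \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2}\left(\frac{y-\mu_Y-\rho(\sigma_Y/\sigma_X)(x-\mu_X)}{\sigma_Y\sqrt{1-\rho^2}}\right)^2\right]$$
Then, in light of Eq. 3.64, we see that the conditional PDF of Y given X = x is
$$f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2}\left(\frac{y-\mu_Y-\rho(\sigma_Y/\sigma_X)(x-\mu_X)}{\sigma_Y\sqrt{1-\rho^2}}\right)^2\right]$$
and the marginal PDF of X is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2\right]$$
both of which are Gaussian. In particular, we can observe that the conditional PDF is normal with a mean value of
$$E(Y|X=x) = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x-\mu_X)$$
and variance of
$$\operatorname{Var}(Y|X=x) = \sigma_Y^2\,(1-\rho^2)$$
Similarly, the conditional PDF of X given Y = y is
$$f_{X|Y}(x|y) = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu_X-\rho(\sigma_X/\sigma_Y)(y-\mu_Y)}{\sigma_X\sqrt{1-\rho^2}}\right)^2\right]$$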
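The factorization used in Example 3.35, $f_{X,Y}(x,y) = f_X(x)\,f_{Y|X}(y|x)$, can be spot-checked numerically. The sketch below codes the bivariate normal density directly from its definition in this example. It then compares that density with the product of the normal marginal of X and the normal conditional of Y given X. The parameter values in the check are arbitrary illustrations, not data from the text.

```python
import math


def normal_pdf(z: float, mean: float, sd: float) -> float:
    """Univariate normal (Gaussian) density."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (math.sqrt(2.0 * math.pi) * sd)


def bivariate_normal_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Joint PDF f_{X,Y}(x, y) of the bivariate normal distribution."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm


def conditional_y_given_x(y, x, mu_x, mu_y, sd_x, sd_y, rho):
    """f_{Y|X}(y|x): normal, mean mu_Y + rho (sd_Y/sd_X)(x - mu_X), sd sd_Y sqrt(1 - rho^2)."""
    mean = mu_y + rho * (sd_y / sd_x) * (x - mu_x)
    sd = sd_y * math.sqrt(1.0 - rho * rho)
    return normal_pdf(y, mean, sd)


if __name__ == "__main__":
    params = dict(mu_x=10.0, mu_y=5.0, sd_x=2.0, sd_y=1.5, rho=0.6)  # illustrative values
    x, y = 11.0, 6.2
    joint = bivariate_normal_pdf(x, y, **params)
    product = normal_pdf(x, params["mu_x"], params["sd_x"]) * conditional_y_given_x(y, x, **params)
    print(joint, product)  # the two values agree to rounding error
```

Setting `rho=0.0` makes the conditional density collapse to the marginal of Y. The joint density then becomes the product of the two marginals, which matches the independence case discussed earlier.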
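When there are two random variables X and Y, there may be a relationship between the variables. In particular, the presence or absence of a linear statistical relationship is determined as follows. First, we observe the joint second moment of X and Y,
$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{X,Y}(x,y)\,dx\,dy$$
the covariance of X and Y,
$$\operatorname{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E(XY) - E(X)E(Y)$$
and the normalized covariance, known as the correlation coefficient,
$$\rho = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
Observe that ρ is dimensionless; its values range between −1.0 and +1.0, i.e.,
$$-1 \le \rho \le +1 \qquad (3.73)$$
which we can verify as follows. Schwarz's inequality (Kaplan, 1953; Hardy, Littlewood, and Pólya, 1959) says
$$\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_X)(y-\mu_Y)\,f_{X,Y}(x,y)\,dx\,dy\right]^2 \le \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_X)^2 f_{X,Y}(x,y)\,dx\,dy \cdot \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(y-\mu_Y)^2 f_{X,Y}(x,y)\,dx\,dy$$
in which the left-hand side is $[\operatorname{Cov}(X,Y)]^2$. By Eq. 3.67, the first integral on the right reduces to
$$\int_{-\infty}^{\infty}(x-\mu_X)^2 f_X(x)\,dx = \sigma_X^2$$
and, by Eq. 3.68, the second reduces to
$$\int_{-\infty}^{\infty}(y-\mu_Y)^2 f_Y(y)\,dy = \sigma_Y^2$$
Hence, we have
$$[\operatorname{Cov}(X,Y)]^2 \le \sigma_X^2\,\sigma_Y^2$$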
Figure 3.15 Significance of correlation coefficient ρ.
or
$$\rho^2 \le 1.0$$
thus verifying Eq. 3.73.
The significance of the correlation coefficient ρ is illustrated graphically in Fig. 3.15. In particular, when ρ = +1.0 or −1.0, the random variables X and Y are linearly related, as shown in Figs. 3.15(a) and 3.15(b), respectively. When ρ = 0, values of (X, Y) pairs will appear as in Fig. 3.15(c). For intermediate values of ρ, values of (X, Y) pairs will appear as shown in Fig. 3.15(d); the "scatter" in the data points will decrease as |ρ| increases. Finally, Figs. 3.15(e) and 3.15(f) show that when the relation between X and Y is nonlinear, ρ can be 0 even though there is a perfect functional relationship between the variables.
Therefore, the magnitude of the correlation coefficient, |ρ| (between 0 and 1), is a statistical measure of the degree of linear interrelationship between two random variables.
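The definitions of covariance and ρ, and the observations drawn from Fig. 3.15, can be illustrated with a small discrete sketch. The three joint PMFs below are hypothetical. In the first two, all the probability lies on a straight line, giving ρ = +1 and ρ = −1. The third sets Y = X² with X symmetric about 0. This is a perfect but nonlinear relationship, and ρ = 0 for it.

```python
import math


def correlation(joint: dict) -> float:
    """rho = Cov(X, Y) / (sigma_X sigma_Y) for a joint PMF given as {(x, y): p}."""
    mean_x = sum(x * p for (x, _), p in joint.items())
    mean_y = sum(y * p for (_, y), p in joint.items())
    cov = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in joint.items())
    var_x = sum((x - mean_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - mean_y) ** 2 * p for (_, y), p in joint.items())
    return cov / math.sqrt(var_x * var_y)


# Hypothetical joint PMFs: X takes the values -2, ..., 2 with equal probability 0.2.
xs = [-2, -1, 0, 1, 2]
increasing_line = {(x, 3 * x + 1): 0.2 for x in xs}   # Y = 3X + 1 (linear, increasing)
decreasing_line = {(x, -0.5 * x): 0.2 for x in xs}    # Y = -X/2 (linear, decreasing)
parabola = {(x, x * x): 0.2 for x in xs}              # Y = X^2 (nonlinear)

if __name__ == "__main__":
    for name, pmf in [("Y = 3X + 1", increasing_line),
                      ("Y = -X/2", decreasing_line),
                      ("Y = X^2", parabola)]:
        print(name, round(correlation(pmf), 6))
```

In the parabola case, the positive and negative deviations of X cancel exactly in the covariance. The printed ρ is therefore zero apart from rounding, even though Y is fully determined by X.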
Note: It is also worth pointing out that although ρ is a measure of the degree of linear relationship between two variables, this does not necessarily imply a causal effect between the variables. Two random variables X and Y may both depend on another variable (or variables),
140 Chapter 3. Analytical Models of Random Phenomena
such that the values of X and Y may be highly correlated, but the values of one variable may have no direct effect on the values of the other variable. For instance, the flood flow of a river and the productivity of a construction crew may be correlated because both depend on the weather conditions. However, the flood flow may have no direct influence on the productivity of the construction crew, or vice versa. For another illustration, consider Example 3.36 from the field of mechanics.
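► EXAMPLE 3.36 A cantilever beam, shown in Fig. E3.36 below, is subjected to two random loads, $S_1$ and $S_2$. The loads are statistically independent, with respective means and standard deviations of $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$. The shear force Q and bending moment M at the fixed support of the beam are both functions of the two loads, as follows:
$$Q = S_1 + S_2$$
$$M = aS_1 + 2aS_2$$
Although the two loads $S_1$ and $S_2$ are statistically independent, Q and M will be correlated; this correlation can be evaluated as follows:
$$E(QM) = E[(S_1 + S_2)(aS_1 + 2aS_2)] = aE(S_1^2) + 3aE(S_1S_2) + 2aE(S_2^2)$$
But $E(S_1S_2) = E(S_1)E(S_2)$, and $E(S_1^2) = \sigma_1^2 + \mu_1^2$, $E(S_2^2) = \sigma_2^2 + \mu_2^2$. Thus,
$$\operatorname{Cov}(Q,M) = E(QM) - E(Q)E(M) = a(\sigma_1^2 + 2\sigma_2^2)$$
and, with $\sigma_Q = \sqrt{\sigma_1^2 + \sigma_2^2}$ and $\sigma_M = a\sqrt{\sigma_1^2 + 4\sigma_2^2}$,
$$\rho_{Q,M} = \frac{\sigma_1^2 + 2\sigma_2^2}{\sqrt{(\sigma_1^2 + \sigma_2^2)(\sigma_1^2 + 4\sigma_2^2)}}$$
which, for $\sigma_1 = \sigma_2$, gives
$$\rho_{Q,M} = \frac{3}{\sqrt{10}} = 0.948$$
indicating a strong correlation between the shear force Q and bending moment M at the support. This correlation arises because both Q and M are functions of the same loads $S_1$ and $S_2$; however, there is no causal relation between Q and M. ◄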
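The closed-form result of Example 3.36 can be cross-checked by simulation. The sketch below evaluates $\rho_{Q,M}$ from the expression derived in the example and compares it with a Monte Carlo estimate. Several inputs are assumptions made only for illustration: the normal load distributions, the numerical means and standard deviations, and the value of the length a. For independent loads, the exact ρ depends only on σ₁ and σ₂ (with a > 0), so these choices should not change the result.

```python
import math
import random


def rho_qm(sigma1: float, sigma2: float) -> float:
    """Correlation of Q = S1 + S2 and M = a*S1 + 2a*S2 (a > 0) for independent S1, S2."""
    cov_over_a = sigma1**2 + 2.0 * sigma2**2        # Cov(Q, M) / a
    var_q = sigma1**2 + sigma2**2                    # Var(Q)
    var_m_over_a2 = sigma1**2 + 4.0 * sigma2**2      # Var(M) / a^2
    return cov_over_a / math.sqrt(var_q * var_m_over_a2)


def simulate_rho(mu1, sigma1, mu2, sigma2, a=1.0, n=200_000, seed=1):
    """Monte Carlo estimate of rho_{Q,M}; normal loads are an illustrative assumption."""
    rng = random.Random(seed)
    qs, ms = [], []
    for _ in range(n):
        s1 = rng.gauss(mu1, sigma1)
        s2 = rng.gauss(mu2, sigma2)
        qs.append(s1 + s2)
        ms.append(a * s1 + 2.0 * a * s2)
    mq, mm = sum(qs) / n, sum(ms) / n
    cov = sum((q - mq) * (m - mm) for q, m in zip(qs, ms)) / n
    sq = math.sqrt(sum((q - mq) ** 2 for q in qs) / n)
    sm = math.sqrt(sum((m - mm) ** 2 for m in ms) / n)
    return cov / (sq * sm)


if __name__ == "__main__":
    print(rho_qm(1.0, 1.0), 3 / math.sqrt(10))         # equal sigmas give 3/sqrt(10)
    print(simulate_rho(10.0, 2.0, 10.0, 2.0, a=3.0))   # should be close to the value above
```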
Problems ◄ 141
► PROBLEMS
3.1 The durations (in days) of two activities A and B in a construction project are denoted as $T_A$ and $T_B$, whose PMFs are given graphically below.
Assume that $T_A$ and $T_B$ are statistically independent, and activity B will begin as soon as activity A has been completed. Determine and plot the PMF of the total time T required to complete both activities.
3.2 The profit (in thousand dollars) of a construction project is described by the following PDF:
(a) What is the probability that the contractor will lose money on this job? (Ans. 0.2)
(b) Suppose the contractor declares that he has made money on this project. What is the probability that his profit was more than forty thousand dollars? (Ans. 0.1875)
3.3 The storm runoff X (in cubic meters per second, cms) from a subdivision can be modeled by a random variable with the following probability density function:
$$f_X(x) = \begin{cases} c\,x & \text{for } 0 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$$
(a) Determine the constant c and sketch the PDF. (Ans. 1/6)
(b) The runoff is carried by a pipe with a capacity of 4 cms. Overflow will occur when the runoff exceeds the pipe capacity.
If overflow occurs after a storm, what is the probability that the runoff in this storm is less than 5 cms? (Ans. 0.714)
(c) An engineer considers replacing the current pipe by a larger pipe having a capacity of 5 cms. Suppose there is a probability of 60% that the replacement would be completed prior to the next storm. What is the probability of overflow in the next storm? (Ans. 0.1)
3.4 A severe snow storm is defined as a storm whose snowfall exceeds 10 inches. Let X be the amount of snowfall in a severe snow storm. The cumulative distribution function (CDF) of X in a given town is
F_X(x) = …   for x > 10
       = 0   for x < 10
(a) Determine the median of X.
(b) What is the expected amount of snowfall in the town in a severe snow storm?
(c) Suppose a disastrous snow storm is defined as a storm with over 15 inches of snowfall. What percentage of the severe snow storms are disastrous?
(d) Suppose the probability that the town will experience 0, 1, and 2 severe snow storms in a year is 0.5, 0.4, and 0.1, respectively. Determine the probability that the town will not experience a disastrous snow storm in a given year. Assume the amounts of snowfall between storms are statistically independent.
3.5 Suppose X is a random variable defined as
$$X = \frac{\text{final cost of project}}{\text{estimated cost of project}}$$
which has a PDF as follows:
f_X(x) = …   for x < 1
       = …   for 1 ≤ x ≤ a
       = …   for x > 1.5
(a) Determine the value of a. (Ans. 1.5)
(b) What is the probability that the final cost of a project will exceed its estimated cost by 25%? (Ans. 0.4)
(c) Determine the mean value and standard deviation of X. (Ans. 1.216, 0.143)
3.6 The duration of a rainstorm at a given location is described by the following PDF:
3.7 The annual maximum snow load X (in lb/ft²) on buildings with a flat roof in a northern U.S. location can be modeled by a random variable with the following CDF:
F_X(x) = 0       for x < 0
       = 1 − …   for x ≥ 0
(a) An engineer recommends a design snow load of 30 lb/ft² for a building. What is the probability of roof failure (design load being exceeded) in a given year? What is the probability that failure will occur for the first time in the fifth year?
(b) If roof failure should occur during 2 or more years over the next 10 yr, the design engineer will face a penalty. What is his chance of going through the next 10 yr without a penalty?
3.8 The CDF of the daily progress in a tunnel excavation project is described graphically below.
(a) What is the probability that the progress in tunnel excavation will be between 2 and 8 m on a given day?
(b) Determine the median length of tunnel excavated in 1 day.
(c) Plot the probability density function (PDF) of the daily progress in tunnel excavation.
(d) Determine the mean length of tunnel excavated in 1 day.
3.9 The maximum load S (in tons) on a structure is modeled by a continuous random variable S whose CDF is given as follows:
$$F_S(s) = \begin{cases} 0 & \text{for } s < 0 \\ -\dfrac{s^3}{864} + \dfrac{s^2}{48} & \text{for } 0 \le s \le 12 \\ 1 & \text{for } s > 12 \end{cases}$$
(b) What is the probability that the platform will not be subjected to waves exceeding its design value during its lifetime of 30 yr? (Ans. 0.860)
the amount of factory production. Suppose the event of poor air quality occurs as a Poisson process with a mean rate of once per month. During each time period when the air quality becomes substandard, its pollutant concentration may reach a hazardous level with a 10% probability. The pollutant concentrations between any two periods of poor air quality may be assumed to be statistically independent.
(a) What is the probability of at most two periods of poor air quality during the next 4-1/2 months? (Ans. 0.174)
(b) What is the probability that the air quality would reach a hazardous level during the next 3 months? (Ans. 0.259)
3.18 A country is subject to natural hazards such as floods, earthquakes, and tornadoes. Suppose earthquakes occur according to a Poisson process with a mean rate of 1 in 10 yr; tornado occurrences are also Poisson-distributed with a mean rate of 0.3 per year. There can be either one or no flood each year; hence the occurrence of a flood each year follows a Bernoulli sequence, and the mean return period of floods is 5 yr. Assume floods, earthquakes, and tornadoes can occur independently.
(a) If no hazards occur during a given year, it is referred to as a “good” year. What is the probability of a “good” year?
(b) What is the probability that 2 of the next 5 yr will be good years? (Ans. 0.287)
(c) What is the probability of only one incidence of natural hazard in a given year? (Ans. 0.349)
3.19 The occurrence of traffic accidents at an intersection may be modeled as a Poisson process, and based on historical records the average rate of accidents is once every 3 yr.
(a) What is the probability that there will be no accident at the intersection for a period of 5 yr?
(b) Suppose that in every accident at the intersection, there is a 5% probability of fatality. Based on the above Poisson model, what is the probability of traffic fatality at this intersection over a period of 3 yr?
3.20 A highway traffic condition during a blizzard is hazardous. Suppose one traffic accident is expected to occur in each 50 miles of highway on a blizzard day. Assume that the occurrence of accidents along the highway is modeled by a Poisson process. For a stretch of highway that is 20 miles long, consider the following:
(a) What is the probability that at least one accident will occur on a given blizzard day? (Ans. 0.33)
(b) Suppose there are five blizzard days this winter. What is the probability that two out of these five blizzard days are accident free? Assume that accident occurrences between blizzard days are statistically independent. (Ans. 0.16)
3.21 The occurrence of accidents at a busy intersection may be described by a Poisson process with an average rate of three accidents per year.
(a) Determine the probability of exactly one accident over a 2-month period. Would this be the same as the probability of exactly two accidents in a 4-month period? Explain.
(b) If fatalities are involved in 20% of the accidents, what is the probability of fatalities occurring at this intersection over a period of 2 months? Assume that events of fatalities between accidents are statistically independent.
3.22 A town is bordered by two rivers as shown in the following figure. Levees A and B were constructed to protect the town from high water in the rivers. The levees were designed for floods with return periods of 5 and 10 yr, respectively. Assume that the events of flooding from the two rivers are statistically independent.
(a) Determine the probability that the town will encounter flooding in a given year. (Ans. 0.28)
(b) What is the probability that the town will be flooded in at least 2 of the next 5 yr? (Ans. 0.43)
(c) Suppose the townspeople desired to reduce the annual probability of flooding to at most 15%. Levee A may be improved to withstand floods with return periods of 10 or 20 yr with an investment of 5 or 20 million dollars, whereas levee B may be improved for floods with return periods of 20 or 30 yr with a corresponding investment of 10 or 20 million dollars. What is the optimal course of action? (Ans. A to 10 years and B to 20 years)
3.23 The exterior of a building consists of one hundred 3 m × 5 m glass panels. Past records indicate that on the average one flaw is found in every 50 m² of this type of glass panel; also, a panel containing two or more flaws will eventually cause breakage problems and have to be replaced. The occurrence of flaws may be assumed to be a Poisson process.
(a) What is the probability that a given panel will be replaced? (Ans. 0.037)
(b) Replacement of glass panels is usually expensive. If each replacement costs $5,000, what is the expected cost for replacements of the glass panels in the building? (Ans. $18,500)
(c) A higher-grade glass panel, which costs $100 more each, has on the average one flaw in every 80 m². Should you recommend using the higher-grade panel, if the objective is to minimize the total expected cost of the glass panels (initial cost and replacement cost)?
3.24 The truck traffic on a certain highway can be described as a Poisson process with a mean arrival rate of 1 truck per minute. The weight of each truck is random, and the probability that a truck is overloaded is 10%.
(a) What is the probability that there will be at least two trucks passing a weigh station on this highway in a 5-min period? (Ans. 0.96)
(b) What is the probability that at most one of the next five trucks stopping at the weigh station will be overloaded? (Ans. 0.92)
(c) Suppose the weigh station will close for 30 min during lunch hour. What is the probability of overloaded trucks passing the station during the lunch break? (Ans. 0.95)
3.25 The occurrence of tornadoes in a county can be modeled as a Poisson process. Twenty tornadoes have touched down in the county within the last twenty years. If there is at least one occurrence of tornadoes in a year, that year is classified as a “tornado year.”
(a) What is the probability that next year will be a “tornado year”?
(b) What is the probability that there will be 2 “tornado years” within the next 3 yr?
(c) On the average, over a 10-yr period,
(i) How many tornadoes are expected to occur?
(ii) How many “tornado years” are expected to occur?
3.26 Strong earthquakes occur according to a Poisson process in a metropolitan area with a mean rate of once in 50 yr. There are three bridges in the metropolitan area. When a strong earthquake occurs, there is a probability of 0.3 that a given bridge will collapse. Assume the events of collapse between bridges during a strong earthquake are statistically independent; the events of bridge collapse between earthquakes are also statistically independent.
(a) What is the probability of at most one strong earthquake occurring in this metropolitan area within the next 20 yr? (Ans. 0.938)
(b) During a strong earthquake, what is the probability that exactly one of the three bridges will collapse? (Ans. 0.441)
(c) What is the probability of “no bridge collapse from strong earthquakes” during the next 20 yr? (Ans. 0.769)
3.27 One of the hazards to an existing underground pipeline is due to improperly conducted excavations. Consider a system consisting of 100 miles of pipeline. Suppose the number of excavations along this pipeline over the next year follows a Poisson process with a mean rate of 1 per 50 miles. Forty percent of the excavations are expected to result in damage to the pipeline. Assume the events of damage between excavations are statistically independent.
(a) What is the probability that there will be at least two excavations along the pipeline next year?
(b) Suppose that two excavations will be performed. What is the probability that the pipeline will be damaged?
(c) What is the probability that the pipeline will not be damaged from excavations next year?
3.28 Flaws in welding may be assumed to occur according to a Poisson process with a mean rate of 0.1 per foot of weld.
(a) Suppose a typical structural connection requires 30 inches of weld and acceptance of such a connection requires the weld to be flawless. What is the probability that a welded connection will be acceptable? (Ans. 0.779)
(b) For a welding job consisting of three similar structural connections, what is the probability that at least two connections will be acceptable? (Ans. 0.875)
(c) What is the probability that there is altogether only one flaw in the three structural connections? (Ans. 0.354)
3.29 The following is the 10-yr record of floods between 1994 and 2003 in Town A.
Year   Number of Floods   Year   Number of Floods
1994   1                  1999   0
1995   0                  2000   2
1996   1                  2001   0
1997   1                  2002   0
1998   0                  2003   1
The occurrences of floods in the town may be modeled as a Poisson process.
(a) On the basis of the above historical flood data, determine the probability that there will be between one and three floods in Town A over the next 3 yr.
(b) A sewage treatment plant is located on high ground in the town. The probability that it will be inundated during a flood is 0.02. What is the probability that the treatment plant will not be inundated for a period of 5 yr?
3.30 Highway traffic accidents can be classified into either injury (I) or noninjury (N) accidents. In a given year, the occurrence rates of these two types of accidents along a stretch of highway are 0.01 and 0.05 per mile, respectively. Assume that the occurrence of each type of accident along the highway follows a Poisson process. Consider a highway that runs between two cities that are 50 miles apart.
(a) Determine the probability that there will be exactly two noninjury accidents in a given year.
(b) Determine the probability that there will be at least three accidents in a given year.
(c) Suppose that exactly two accidents occurred last year. What is the probability that both of them involved injuries?
3.31 The occurrence of thunderstorms in Peoria, Illinois, may be assumed to follow a Poisson process during each of the two seasons, namely,
I. Winter (October to March)
II. Summer (April to September)
A 21-yr record reveals that a total of 173 thunderstorms have taken place during the winter seasons, whereas 840 thunderstorms have occurred during the summer seasons.
(a) Estimate the mean rate of occurrence of thunderstorms per month for the
(i) winter season
(ii) summer season (Ans. (i) 1.37, (ii) 6.67)
(b) What is the probability that there will be a total of four thunderstorms during the 2 months of March and April next year? (Ans. 0.056)
(c) What is the probability that there will be no December thunderstorms during 2 out of the next 5 yr? (Ans. 0.267)
3.32 Geomembrane is often used to provide an effective impervious barrier in a waste containment lining system. The geomembrane has to be sewn together to cover the entire site; defects can thus occur along the seams. Consider a landfill construction project that requires 3000 m of seams. The quality of the seaming operation is such that defects will occur along the seams at a mean rate of one per 200 m. The geomembrane layer is inspected after the installation, and those defects that are detected will be repaired. However, some of the defects will not be detected during the inspection; they will remain and can cause unsatisfactory performance of the lining system. Suppose the current inspection procedure fails to detect 20% of the defects.
(a) What is the mean rate of defects along the seams that remain in the system after an inspection?
(b) Assume that the defects that remain undetected occur according to a Poisson process. What is the probability that there will be more than two defects remaining in the lining system?
(c) Consider a similar but smaller project involving only 1000 m of seams. However, defects in the geomembrane seams are very undesirable for this project. It is required to achieve a 95% probability that the geomembrane lining system will be free of defects after the inspection. Assume that the quality of the seaming operation is the same as for the uninspected seam (i.e., the same mean rate of defects before inspection), but the inspection effort can be improved to reduce the percentage of undetected defects. What is the allowable fraction of undetected defects for this improved inspection procedure?
3.33 The delay time for a given flight is exponentially distributed with a mean of 0.5 hr. Ten passengers on this flight need to take a subsequent connecting flight. The scheduled connection time is either 1 or 2 hr depending on the final destination. Suppose three and seven passengers are associated with these connection times, respectively.
(a) Suppose John is one of the ten passengers needing a connection. What is the probability that he will miss his connection? (Ans. 0.053)
(b) Suppose he met Mike on the plane, and Mike also needs to make a connection. However, Mike is going to another destination and thus has a different connection time from John’s. What is the probability that both John and Mike will miss their connections? (Ans. 0.018)
(c) A friend of John’s, named Mary, happens to live close to the airport where John makes his connection. She would like to take this opportunity to meet John at the airport. Suppose she has already waited for 30 min beyond John’s scheduled arrival time. What is the probability that John will miss his connection so that they could have a leisurely dinner together? Assume John’s scheduled connection time is 1 hr in part (c). (Ans. 0.368)
3.34 Suppose rebars from a supplier are suspected to contain 2% that are below specification. 1000 rebars were delivered by the supplier for the construction of a reinforced concrete structure. To ensure the quality of the rebars, the construction company randomly selected 20 rebars and tested them for specification compliance. Based on the suspected 2% defective rebars, answer the following:
(a) What is the probability that all the 20 rebars tested will pass the test?
(b) What is the probability that at least two of the rebars tested will fail the test?
(c) How many of the rebars delivered need to be tested without a single failure if the construction company wishes to have a 90% probability of assurance of the quality of the rebars from the supplier?
3.35 The traffic on the one-way main street shown below may be satisfactorily described by a Poisson process with an average rate of arrival ν = 10 cars per minute. A driver (indicated by the box) on the side street is waiting to cross the main street. He will cross as soon as he finds a gap of 15 sec.
(a) Determine the probability, p, that a gap will be longer than 15 sec.
(b) What is the probability that the driver will cross at the fourth gap?
(c) Determine the mean number of gaps he has to wait until crossing the main street.
(d) What is the probability that he will cross within the first four gaps?
3.36 Suppose cracks exist in a certain material, and the crack length [in micrometers (μm)] of a randomly selected crack is normally distributed with an average length of 71 and a variance of 6.25.
(a) What percentage of cracks are over 74 μm long?
(b) What percentage of cracks exceeding 72 μm are over 77 μm?
3.37 A project is described by the simple activity network shown below, consisting of directed branches representing activities, and nodes representing the beginning/termination of activities.
Activities A and B are independent activities and are scheduled to start simultaneously at day 0. Node “a” represents completion of both activities A and B, and node “b” represents completion of the project. Activity C can start only when both A and B are completed. All durations (in days) required for the respective activities are normal random variables as defined in each of the branches in the figure below.
Activity Network
At node “a,” activity C is scheduled to start at day 60; however, if there is delay, it cannot start until 15 days beyond its scheduled starting date (e.g., because of reallocation of resources to other projects).
(a) What is the probability that activity C will start on schedule, i.e., 60 days after the starting date of the project?
(b) What is the probability that the project will be completed on target, i.e., 150 days after the starting date?
3.38 The current traffic volume at an airport (number of takeoffs and landings) during the peak hour of each day is a normal variate with a mean of 200 planes and a standard deviation of 60 planes.
(a) If the present runway capacity (for landings and takeoffs) is 350 planes per hour, what is the current probability of traffic congestion at this airport? Assume that there is one peak hour per day. (Ans. 0.0062)
(b) If the mean traffic volume is increasing linearly at the annual rate of 10% of the current volume, with the c.o.v. remaining constant, what would be the probability of congestion at the airport 10 yr hence?
(c) If the projected growth of traffic volume is correct, what airport capacity will be required 10 yr from now in order to maintain the present service condition, i.e., to maintain the current probability of congestion?
3.39 An airport has two runways: one is North-South (N-S), and the other is East-West (E-W). Because of the prevailing winds, the N-S runway is used 80% of the days, and the E-W one the remainder of the time. The selection of the runway for a given day is based on the wind at the beginning of the day, and once selected will not be changed for the entire day. The airport has a single peak period each day; this is the hour between 4:00 and 5:00 pm. During this peak hour, the volume of air traffic varies from day to day and may be described with a normal variate N(100, 10). The N-S runway is considered overcrowded if more than 120 planes use it during this period, whereas the E-W runway is considered overcrowded if more than 115 planes use it during the peak period.
(a) What is the probability that the N-S runway is going to be overcrowded on a given day, if at the beginning of the day this runway is selected to be used?
(b) What would be the probability of overcrowding if, at the beginning of a given day, the E-W runway were selected to be used?
(c) What is the probability of overcrowding at the airport, if there is no advance information as to which runway will be used on a given day?
3.40 A foundation engineer estimates that the settlement of a proposed structure will not exceed 2 in. with 95% probability. From a record of performance of many similar structures built on similar soil conditions, he finds that the coefficient of variation of the settlement is about 20%. If a normal distribution is assumed for the settlement of the proposed structure, what is the probability that the proposed structure will settle more than 2.5 in.? (Ans. 0.00047)
3.41 A student has submitted a concrete cylinder to the “strength contest” in Engineering Open House. Suppose the strength of her concrete cylinder is normally distributed as N(80, 20) in kips. She was scheduled to be the last contestant for load testing. Immediately prior to her cylinder being tested, the two highest strengths in the contest thus far are 100 and 70 kips.
(a) What is the probability that she will be the second-place winner?
(b) Suppose her cylinder is being tested, and it has not shown any sign of distress at a load of 90 kips. What is the probability that she will win first place?
(c) Suppose she submitted a similar cylinder to another contest. Her boyfriend used an alternative procedure to make his concrete cylinder, such that the strength is expected to be only 1% higher than hers, but the c.o.v. is 50% higher. Who is more likely to score a higher strength in that contest? Justify.
3.42 An offshore platform is built to withstand ocean wave forces.
(a) The annual maximum wave height of the ocean waves (above mean sea level) is a random variable with a Gaussian distribution having a mean height of 4.0 m and a c.o.v. of 0.80. What is the probability that the wave height will exceed 6 m in a given year?
(b) If the platform were to be designed for a wave height (above mean sea level) such that it will not be exceeded by ocean waves over a period of 3 yr with a probability of 80%, what should be the platform height above the mean sea level? Wave height exceedances between years are statistically independent.
(c) Suppose that ocean waves exceeding 6 meters will occur according to a Poisson process and that each of these waves could potentially cause damage to the platform with a probability of 0.40. What is the probability that there will be no damage to the platform in 3 yr? Damages to the platform between waves are statistically independent.
3.43 The daily SO₂ concentration in the air for a given city is normally distributed with a mean of 0.03 ppm and a c.o.v. of 40%. Assume statistical independence between the SO₂ concentrations for any 2 days. Suppose the criteria for the clean air standard require that:
1. The weekly average SO₂ concentration should not exceed 0.04 ppm.
2. The SO₂ concentration should not exceed 0.075 ppm on more than 1 day during a given week.
Determine which one of these two criteria is more likely to be violated in this city. Substantiate your answer with calculated probabilities.
3.44 The daily flow rate of contaminant from an industrial plant is modeled by a normal random variable with a mean value of 10 units and a c.o.v. of 20%. When the contaminant flow rate exceeds 14 units on a given day, it is considered excessive. Assume that the contaminant flow rates on any 2 days are statistically independent.
(a) What is the probability of having an excessive contaminant flow rate on a given day? (Ans. 0.02275)
(b) Regulation requires the measurement of the contaminant flow rate for 3 days. The plant will be charged with a violation if an excessive contaminant flow rate is observed during the 3-day period. What is the probability that the plant will not be charged with a violation? (Ans. 0.933)
(c) Suppose there is a proposal to change the regulation such that the contaminant flow rate will be measured for 5 days and the plant will be charged with a violation if an excessive contaminant flow rate is observed on more than one of the 5 days. Will the plant be better off with the proposed change? Justify your answer. (Ans. yes)
(d) Return to Part (b). Although the plant cannot reduce the standard deviation of the daily contaminant flow rate, it can reduce the mean daily contaminant flow rate by improving the chemical process. Suppose the plant decides to limit the probability of violation to 1%. What should be the daily mean contaminant flow rate? (Ans. 8.58)
3.45 The time between severe earthquakes in a given region follows a lognormal distribution with a coefficient of variation of 40%. The expected time between severe earthquakes is 80 yr.
(a) Determine the parameters of this lognormally distributed recurrence time T. (Ans. 4.308, 0.385)
(b) Determine the probability that a severe earthquake will occur within 20 yr of the previous one.
(c) Suppose the last severe earthquake in the region took place 100 yr ago. What is the probability that a severe earthquake will occur over the next year?
3.46 The daily average concentration of pollutants in a stream follows a lognormal distribution with a mean of 60 mg/l and a c.o.v. of 20%.
(a) What is the probability that the average concentration of pollutant in the stream will exceed 100 mg/l (a critical level) on a given day?
(b) Suppose the pollutant concentrations on different days are statistically independent. What is the probability that the critical level of pollutant concentration will not be reached during a given week?
3.47 Because of spatial irregularities, the depth, H, from the ground surface to the rock stratum may be modeled as a lognormal random variable with a median depth of 20 m and a c.o.v. of 30%. In order to provide satisfactory support, a steel pile must be embedded 0.5 m into the rock, as indicated in the figure below.
(a) What is the probability that a 25-m-long pile will not anchor satisfactorily in the rock stratum?
(b) If a 25-m pile has been driven 24 m and has not encountered rock, what is the probability that an additional 2 m of pile welded to the original length will be sufficient to anchor this pile satisfactorily in the rock stratum?
3.48 The capacity of a pile supporting a transmission tower is modeled with a lognormal random variable with a mean of 100 tons and a c.o.v. of 20%.
(a) What is the probability that the pile will survive a load of 100 tons?
(b) During the installation of the pile, a pile load test indicated that the pile can support a load of at least 75 tons. What is now the probability that the pile will survive a 100-ton load?
(c) If the pile has survived a recent hurricane during which the load transmitted to the pile is estimated to be 90 tons, what is the probability that the pile can carry a 100-ton load?
3.49 The maximum wind velocity in a tornado at a given city follows a lognormal distribution with a mean of 90 mph and a c.o.v. of 20%.
(a) What is the probability that the maximum wind velocity will exceed 120 mph during the next tornado?
(b) Determine the design tornado wind velocity whose return period is 100 yr. Assume one tornado will strike this city each year.
3.50 Statistical data on breakdowns of computer XXX show that the duration of trouble-free operation of the machine can be described by a gamma distribution with a mean of 40 days and a standard deviation of 10 days. The computer is occasionally taken out for maintenance in order to ensure operational condition at any time with a 95% probability.
Problems 149
(a) How often should the computer be scheduled for maintenance? (Hint: Should it be shorter or longer than the mean of 40 days?)
(b) If the computer is in good operational condition at the time it is scheduled for regular maintenance, but no maintenance was performed, what is the probability that it will break down within another week beyond its regular maintenance schedule?
(c) Three XXX computers were acquired at the same time by an engineering consulting firm. The computers are operating under the same environment, workload, and regular maintenance schedule. The breakdown times of the computers, however, may be assumed to be statistically independent. What is the probability that at least one of the three machines will break down within the first scheduled maintenance time?
3.51 The capacity of a building to withstand earthquake forces without damage has a gamma distribution with a mean of 2500 tons and a c.o.v. of 35%.
(a) If the building has survived a previous earthquake with a force of 1500 tons without damage, what is the probability that it can withstand a future earthquake with a force of 3000 tons?
(b) The occurrences of earthquakes with a force of 2000 tons constitute a Poisson process with an expected occurrence rate of once every 20 yr. What is the probability that there will be no damage to the building over a life of 50 yr?
(c) In a complex of five similar buildings, each with the same earthquake resistance capacity as described above, what is the probability that at least four of them will not be damaged under an earthquake force of 2000 tons? Assume that the occurrences of damage to the different buildings are statistically independent.
3.52 In a harbor, a merchant ship may need to wait for docking at a quay for loading and unloading operations. The loading/unloading operation of each ship takes either 2 or 3 days, with relative likelihoods of 1 to 3, and the operation times of different ships are statistically independent. There is only one loading crane and there are three mooring berths in the quay. Assume the following probabilities for the queue size:
Queue Size (no. of ships)    Probability
0                            0.1
1                            0.3
2                            0.4
3                            0.2
(a) What is the probability that a merchant ship arriving at the harbor will have to wait longer than 5 days before loading and unloading can commence at the quay, if there are two ships in the queue and the loading/unloading operations are just starting for the first ship in the queue?
(b) What is the probability that the total waiting time of the ship in (a) will be longer than 5 days if the queue size is not known when it arrives at the harbor?
(c) Finally, suppose that the loading/unloading operation time for a ship is a beta-distributed random variable with minimum and maximum operation times of 1.5 days and 4 days, respectively, and shape parameters of 3.00 and 4.50. The operation times of different ships are statistically independent. In this case, what would be the answer to Part (b)? For this part, consider the computer-based methods of Chapter 5.
3.53 A bridge connects two cities A and B. The bridge has a capacity to handle 1000 vehicles per hour. Normally, on a given day, the peak volume of traffic using the bridge can be modeled by a beta distribution with lower and upper bound traffic of 600 and 1100 vehicles per hour, a mean hourly traffic of 750 vehicles, and a c.o.v. of 0.20. A jamming condition will develop on the bridge if its capacity is exceeded by the traffic volume, or if an accident should occur on the bridge. The probability of an accident on the bridge is estimated to be 0.02 during any period of peak traffic. It may be assumed that peak traffic volume and the occurrence of an accident are statistically independent.
(a) What is the probability that the bridge will be jammed on a given day?
(b) If the bridge is jammed, what is the probability that it was caused by an accident on the bridge?
3.54 Statistics show that 20% of freshman students at an engineering school fail after 1 yr. In a class of 30 students, what is the probability that among eight students selected at random, two of them will fail after 1 yr?
3.55 A highway contractor orders the delivery of ten road graders from an equipment rental company. Records show that the trouble-free operational time of a road grader has a mean of 35 days with a c.o.v. of 0.25. Assume that the distribution of the trouble-free operational time can be represented by a gamma distribution.
(a) What is the probability that a road grader will be operational without problem for 40 days?
(b) If the rental company has an inventory of 50 road graders, and assuming that 10% of them will have a trouble-free operational life of less than 40 days, what is the probability that two among the ten graders will have operational lives of less than 40 days?
3.56 An office building is planned and designed with a lateral load-resisting structural system for earthquake resistance in a seismic zone. The seismic capacity (in terms of force factor) of the proposed system is assumed to have a lognormal distribution with a median of 6.5 and a standard deviation of 1.5. The ground motion expected to be generated by the maximum possible earthquake at the building site will have an equivalent force factor of 5.5.
(a) What is the estimated probability of damage to the office building when subjected to the maximum possible earthquake?
(b) If the building should survive (without any damage) a previous moderate earthquake with a force factor of 4.0, what would be its future failure probability under the maximum possible earthquake?
(c) The future occurrences of the maximum possible earthquakes may be modeled by a Poisson process with a return period of 500 yr. If the damage effects between earthquakes are
statistically independent, what is the probability of the proposed building surviving a life of 100 yr without damage?
(d) Suppose that the office building will be part of a complex of five identical structures designed with the same earthquake resistance. What is the probability that at least four of the five buildings will survive a life of 100 yr without damage? Survivals among the buildings may be assumed to be statistically independent.
3.57 A water distribution network is shown in the figure below. For a given rate of flow through the network, the performance at a given node is measured by the head pressure at the node. Satisfactory performance at a node requires that its pressure be within a normal range, between 6 and 14 units. Suppose the pressure at node A is a lognormal random variable with a mean value of 10 units and a c.o.v. of 20%.
[...]
(II) Increase the original probability of satisfactory pressure at B (i.e., within the normal range) to 95%.
Which option is better in terms of minimizing the probability of unsatisfactory water service to the city? Explain. (Ans. I)
3.58 The daily water levels (normalized to the respective full condition) of two reservoirs A and B are denoted by two random variables X and Y having the following joint PDF:
$$f(x, y) = \frac{6}{5}\left(x + y^2\right), \quad 0 < x < 1;\ 0 < y < 1$$
(a) Determine the marginal density function of the daily water level for reservoir A.
(b) If reservoir A is half full on a given day, what is the probability that the water level will be more than half full?
(c) Is there any statistical correlation between the water levels in the two reservoirs?
3.59 The joint PMF of precipitation, X (in.), and runoff, Y (cfs) (discretized here for simplicity), due to storms at a given location is as follows:
X = 1    X = 2    X = 3
CHAPTER 4
Functions of Random Variables
► 4.1 INTRODUCTION
In this chapter, we introduce functions of one or more random variables. Engineering problems often involve the determination of functional relations between a dependent variable and one or more basic or independent variables. If any one of the independent variables is random, the dependent variable will likewise be random; its probability distribution, as well as its moments, will be functionally related to, and may be derived from, those of the basic random variables. As a simple example, the deflection, D, of a cantilever beam of length L subjected to a concentrated load, P, applied at the end of the cantilever (as discussed earlier in Example 1.2) is functionally related to the load P and the modulus of elasticity E of the beam material as follows:
$$D = \frac{PL^3}{3EI}$$
in which I is the moment of inertia of the beam cross section. Clearly, if P and E are both random variables (as E is, for example, for construction timber), with respective PDFs $f_P(p)$ and $f_E(e)$, we can expect the deflection D also to be a random variable, with a PDF $f_D(d)$ that can be derived from the PDFs of P and E. Moreover, the moments (such as the mean and variance) of D can also be derived as functions of the respective moments of P and E. In this chapter, we shall develop and illustrate the relevant concepts and procedures for these purposes.
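To make the idea concrete, the following Python sketch samples P and E and pushes each pair through $D = PL^3/(3EI)$. All numbers (beam length, moment of inertia, and the normal models assumed for P and E) are hypothetical choices for illustration only, not data from the text.

```python
import random
import statistics

# Hypothetical beam geometry (not from the text): length in m, moment of inertia in m^4
L = 3.0
I = 8.0e-5

def deflection(p, e):
    """Tip deflection of a cantilever with an end load: D = P L^3 / (3 E I)."""
    return p * L**3 / (3.0 * e * I)

rng = random.Random(1)
n = 50_000
# Hypothetical input models: P ~ N(10 kN, 2 kN) and E ~ N(1.0e7 kN/m^2, 1.0e6 kN/m^2)
d_samples = [deflection(rng.gauss(10.0, 2.0), rng.gauss(1.0e7, 1.0e6)) for _ in range(n)]

print(f"D at mean inputs: {deflection(10.0, 1.0e7):.4f} m")
print(f"sample mean of D: {statistics.fmean(d_samples):.4f} m")
print(f"sample std of D:  {statistics.stdev(d_samples):.4f} m")
```

The sample mean of D comes out slightly above D evaluated at the mean inputs, because D depends on 1/E, a convex function; obtaining such moments analytically rather than by sampling is one of the aims of this chapter.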
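Consider first a function of a single random variable X:
$$Y = g(X) \qquad (4.1)$$
In this case, when Y = y, $X = g^{-1}(y)$, in which $g^{-1}$ is the inverse function of g. If the inverse function $g^{-1}(y)$ has a single root, and therefore is single valued, then $(Y = y) = [X = g^{-1}(y)]$, and for a discrete X the PMF of Y is
$$p_Y(y) = p_X[g^{-1}(y)] \qquad (4.2)$$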
152 ► Chapter 4. Functions of Random Variables
We can demonstrate Eq. 4.2 graphically for the variable Y as a function of the discrete random variable X with the following function:
$$Y = X^2 \quad \text{for } x > 0$$
and the PMF of X is as shown in the figure below on the left.
[Figure: PMF of X, $p_X(x)$ (left), and derived PMF of Y, $p_Y(y)$ (right)]
In this case, we can see that according to Eq. 4.2, when y = 1, x = 1 and $p_Y(1) = p_X(1) = 0.25$, whereas when y = 4, $p_Y(4) = p_X(2) = 0.50$, and when y = 9, $p_Y(9) = p_X(3) = 0.25$. At all other values of Y the PMF is zero; e.g., $p_Y(5) = p_X(\sqrt{5}) = p_X(2.24) = 0$. Graphically, the PMF of Y is shown in the figure above on the right.
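The single-root rule of Eq. 4.2 amounts to a lookup at $g^{-1}(y)$. The Python sketch below hard-codes the PMF of X from the illustration above and uses $g^{-1}(y) = \sqrt{y}$ for $Y = X^2$; the function name `pmf_y` is ours, chosen for illustration.

```python
import math

# PMF of X from the illustration above
pmf_x = {1: 0.25, 2: 0.50, 3: 0.25}

def pmf_y(y):
    """p_Y(y) = p_X[g^-1(y)] for Y = X**2 with x > 0 (Eq. 4.2)."""
    x = math.sqrt(y)              # the single-valued inverse g^-1(y)
    return pmf_x.get(x, 0.0)      # zero wherever X carries no probability mass

print([pmf_y(y) for y in (1, 4, 9, 5)])   # [0.25, 0.5, 0.25, 0.0]
```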
Also, it follows that
$$P(Y \le y) = P[X \le g^{-1}(y)] \quad \text{if } g(x) \text{ is an increasing function of } x$$
whereas if g(x) is a decreasing function of x, $P(Y \le y) = P[X \ge g^{-1}(y)]$.
Thus, when y increases with x, the CDF of Y is
$$F_Y(y) = F_X[g^{-1}(y)] \qquad (4.3)$$
Therefore, for discrete X, we have
$$F_Y(y) = \sum_{\text{all } x_i \le g^{-1}(y)} p_X(x_i) \qquad (4.4)$$
In this latter case, for a continuous X, we recall from calculus that by making a change in the variable of integration, Eq. 4.5 becomes
$$f_Y(y) = -f_X(g^{-1})\,\frac{dg^{-1}}{dy}$$
4.2 Derived Probability Distributions 153
but $dg^{-1}/dy$ is negative. Therefore, properly, the derived PDF of Y is
$$f_Y(y) = f_X(g^{-1}) \left|\frac{dg^{-1}}{dy}\right| \qquad (4.6)$$
► EXAMPLE 4.1 Let us first illustrate the transformation of a discrete distribution as indicated in Eq. 4.2. Consider a cantilever beam with a length L that is subjected to a concentrated load F applied at the free end of the beam, as shown in the figure below.
Suppose that the load F consists of a number of boxes, each of the same weight of one unit; the number of boxes loaded on the beam, x, varies from 0 to n and follows a binomial distribution in which the probability of a box being loaded is p; this means that the load F has a binomial PMF as follows:
$$p_F(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$
in which x is the number of boxes loaded on the beam. Under a load of x boxes, the bending moment at the fixed support of the cantilever beam is
$$m = x \cdot L$$
and the inverse function is $g^{-1} = x = m/L$. Then, according to Eq. 4.2, the distribution of the bending moment is
$$p_M(m) = \binom{n}{m/L} p^{m/L} (1-p)^{n-m/L}, \quad m = 0, L, 2L, \ldots, nL$$
Since m = xL, we can easily see that the distribution of the bending moment, M, at the fixed end of the beam has the same binomial PMF as the applied load F. ◄
► EXAMPLE 4.2 Consider a normal variate X with parameters μ and σ; i.e., N(μ, σ), with PDF
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$$
Let $Y = \dfrac{X - \mu}{\sigma}$. Using Eq. 4.6, we determine the PDF of Y as follows.
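First, we observe that the inverse function is $g^{-1}(y) = \sigma y + \mu$, and $dg^{-1}/dy = \sigma$. Then, according to Eq. 4.6, the PDF of Y is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{\sigma y + \mu - \mu}{\sigma}\right)^2\right] \sigma = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} y^2\right)$$
which is the PDF of the standard normal distribution, N(0, 1).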
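As a numerical check of Example 4.2, the Python sketch below evaluates Eq. 4.6 directly, $f_X(\sigma y + \mu)\,\sigma$, and compares it with the standard normal PDF. The values μ = 12.0 and σ = 3.5 are arbitrary; the two printed columns should agree for any μ and σ > 0.

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def pdf_standardized(y, mu, sigma):
    """Eq. 4.6 with g^-1(y) = sigma*y + mu and |dg^-1/dy| = sigma."""
    return normal_pdf(sigma * y + mu, mu, sigma) * sigma

mu, sigma = 12.0, 3.5          # arbitrary illustrative parameters
for y in (-2.0, 0.0, 1.3):
    print(y, pdf_standardized(y, mu, sigma), normal_pdf(y, 0.0, 1.0))
```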
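► EXAMPLE 4.3 Suppose a random variable X has a lognormal distribution with parameters λ and ζ. According to Eq. 4.6, we derive the PDF of the function Y = ln X as follows.
In this case, the PDF of X is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\zeta x} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \lambda}{\zeta}\right)^2\right], \quad x > 0$$
and the inverse function is $g^{-1}(y) = e^y$, with $dg^{-1}/dy = e^y$. Therefore,
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\zeta e^y} \exp\left[-\frac{1}{2}\left(\frac{y - \lambda}{\zeta}\right)^2\right] |e^y| = \frac{1}{\sqrt{2\pi}\,\zeta} \exp\left[-\frac{1}{2}\left(\frac{y - \lambda}{\zeta}\right)^2\right]$$
which means that the distribution of Y = ln X is normal with a mean of λ and a standard deviation of ζ; i.e., N(λ, ζ). This result also shows that
$$E(\ln X) = \lambda$$
and
$$\mathrm{Var}(\ln X) = \zeta^2$$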
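The same kind of check works for Example 4.3: with $g^{-1}(y) = e^y$, Eq. 4.6 gives $f_X(e^y)\,e^y$, which should coincide with the N(λ, ζ) density. The values λ = 2.0 and ζ = 0.4 in this Python sketch are illustrative only.

```python
import math

def lognormal_pdf(x, lam, zeta):
    """PDF of a lognormal X with parameters lam and zeta (x > 0)."""
    return math.exp(-0.5 * ((math.log(x) - lam) / zeta) ** 2) / (math.sqrt(2.0 * math.pi) * zeta * x)

def normal_pdf(y, mu, sigma):
    """PDF of N(mu, sigma)."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def pdf_of_log(y, lam, zeta):
    """Eq. 4.6 for Y = ln X: g^-1(y) = e^y and |dg^-1/dy| = e^y."""
    x = math.exp(y)
    return lognormal_pdf(x, lam, zeta) * x

lam, zeta = 2.0, 0.4
for y in (1.2, 2.0, 2.9):
    print(y, pdf_of_log(y, lam, zeta), normal_pdf(y, lam, zeta))
```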
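The inverse function $g^{-1}(y)$ may not be single valued; i.e., for a given value y there may be multiple values of $g^{-1}(y)$. For instance, if $g^{-1}(y) = x_1, x_2, \ldots, x_k$, then we would have
$$(Y = y) = \bigcup_{i=1}^{k} (X = x_i)$$
And if X is discrete, the PMF of Y is
$$p_Y(y) = \sum_{i=1}^{k} p_X(x_i) \qquad (4.7)$$
whereas if X is continuous, denoting the k roots by $g_i^{-1}(y)$, the PDF of Y is
$$f_Y(y) = \sum_{i=1}^{k} f_X(g_i^{-1}) \left|\frac{dg_i^{-1}}{dy}\right| \qquad (4.8)$$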
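Eq. 4.7 amounts to grouping: every x whose image is y adds its probability to $p_Y(y)$. A minimal Python sketch, using a hypothetical PMF of X that is symmetric about zero so that $Y = X^2$ has two roots for each y > 0:

```python
from collections import defaultdict

def pmf_of_function(pmf_x, g):
    """PMF of Y = g(X) for a discrete X (Eq. 4.7): sum p_X(x_i) over all x_i with g(x_i) = y."""
    pmf_y = defaultdict(float)
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

# Hypothetical PMF of X (not from the text)
pmf_x = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
pmf_y = pmf_of_function(pmf_x, lambda x: x * x)
print(pmf_y)   # {4: 0.2, 1: 0.4, 0: 0.4}
```

Because the dictionary is keyed by the value of g(x), the same function also handles the single-root case of Eq. 4.2.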
► EXAMPLE 4.4 The strain energy in a linearly elastic bar subjected to an axial force S is given by the equation
$$U = \frac{L}{2AE} S^2$$
where:
L = length of the bar
A = cross-sectional area of the bar
E = modulus of elasticity of the material
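Using $c = L/2AE$, we can rewrite
$$U = cS^2$$
In this case, the inverse function has two roots, which are
$$s_1 = \sqrt{u/c} \quad \text{and} \quad s_2 = -\sqrt{u/c}$$
with derivatives
$$\frac{ds_1}{du} = \frac{1}{2\sqrt{cu}} \quad \text{and} \quad \frac{ds_2}{du} = -\frac{1}{2\sqrt{cu}}$$
Now, if S is a lognormal variate with parameters λ and ζ, the PDF of U, according to Eq. 4.8, is
$$f_U(u) = \left[f_S\left(\sqrt{u/c}\right) + f_S\left(-\sqrt{u/c}\right)\right] \frac{1}{2\sqrt{cu}}$$
But $f_S(s) = 0$ for $s < 0$; therefore,
$$f_U(u) = f_S\left(\sqrt{u/c}\right) \frac{1}{2\sqrt{cu}} = \frac{1}{\sqrt{2\pi}\,(2\zeta)\,u} \exp\left[-\frac{1}{2}\left(\frac{\ln u - \ln c - 2\lambda}{2\zeta}\right)^2\right]$$
which shows that U is also lognormal, with parameters $\ln c + 2\lambda$ and $2\zeta$.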
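The closed form of Example 4.4 can be checked against a direct evaluation of Eq. 4.8 that keeps only the positive root $s = \sqrt{u/c}$. The Python sketch also confirms that U is lognormal with parameters $\ln c + 2\lambda$ and $2\zeta$; the values of c, λ, and ζ are hypothetical.

```python
import math

def lognormal_pdf(x, lam, zeta):
    """PDF of a lognormal variate with parameters lam and zeta (x > 0)."""
    return math.exp(-0.5 * ((math.log(x) - lam) / zeta) ** 2) / (math.sqrt(2.0 * math.pi) * zeta * x)

def strain_energy_pdf(u, c, lam, zeta):
    """Eq. 4.8 for U = c*S**2 with lognormal S: only the root s = +sqrt(u/c) has density."""
    s = math.sqrt(u / c)
    ds_du = 1.0 / (2.0 * math.sqrt(c * u))    # |ds/du| at the positive root
    return lognormal_pdf(s, lam, zeta) * ds_du

# Hypothetical values: c = L/(2AE) and the lognormal parameters of the axial force S
c, lam, zeta = 0.002, 3.0, 0.2
for u in (0.5, 1.0, 2.0):
    direct = strain_energy_pdf(u, c, lam, zeta)
    closed_form = lognormal_pdf(u, math.log(c) + 2.0 * lam, 2.0 * zeta)
    print(u, direct, closed_form)
```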
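► EXAMPLE 4.5 In Example 4.4, if the applied force S is a standard normal variate with the PDF N(0, 1), the PDF of the strain energy U, according to Eq. 4.8, would become as follows:
$$f_U(u) = \left[f_S\left(\sqrt{u/c}\right) + f_S\left(-\sqrt{u/c}\right)\right] \frac{1}{2\sqrt{cu}}$$
In this case, because of the symmetry of the PDF of S about 0, $f_S(-s) = f_S(s)$. Hence,
$$f_U(u) = 2 f_S\left(\sqrt{u/c}\right) \frac{1}{2\sqrt{cu}} = \frac{1}{\sqrt{2\pi c u}} \exp\left(-\frac{u}{2c}\right), \quad u > 0$$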
which, for c = 1, is the chi-square distribution with one degree of freedom (see Chapter 6); in general, U/c has this distribution. Graphically, this PDF would be as shown in Fig. E4.5.
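A quick simulation supports Example 4.5. Integrating the PDF of U gives $P(U \le u) = P(|S| \le \sqrt{u/c}) = \mathrm{erf}\left(\sqrt{u/2c}\right)$; the Python sketch compares this with the empirical CDF of $U = cS^2$ for a standard normal S. The value c = 1.5 and the sample size are arbitrary choices.

```python
import math
import random

c = 1.5                    # hypothetical value of L/(2AE)
rng = random.Random(7)
n = 200_000
u_samples = [c * rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]

def cdf_u(u):
    """P(U <= u) = P(|S| <= sqrt(u/c)) = erf(sqrt(u / (2c))) for S ~ N(0, 1)."""
    return math.erf(math.sqrt(u / (2.0 * c)))

def empirical_cdf(u):
    """Fraction of simulated U values not exceeding u."""
    return sum(x <= u for x in u_samples) / n

for u in (0.5, 1.5, 4.0):
    print(u, round(empirical_cdf(u), 4), round(cdf_u(u), 4))
```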
► EXAMPLE 4.6 The overall height of earth dams must provide sufficient freeboard above the maximum reservoir level in order to prevent waves from washing over the top of the dam. The determination of the overall height must include the wind tide and wave height.
In particular, the wind tide, in feet above the still-water level, is given by
$$Z = \frac{F}{1400d} V^2$$
where
V = wind speed in miles per hour
F = fetch, or length of water surface over which the wind blows, in feet
d = average depth of reservoir along the fetch, in feet.
Suppose the wind speed, V, is exponentially distributed with a mean wind speed of i.e., its
PDF is
=0 v <0
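Then, according to Eq. 4.8, the distribution of the wind tide, Z, is determined as follows.
Denoting $a = F/1400d$, we have $Z = aV^2$, and the inverse functions are
$$v = \pm\sqrt{z/a}$$
with
$$\frac{dv}{dz} = \pm\frac{1}{2\sqrt{az}}$$
Then, Eq. 4.8 yields
$$f_Z(z) = \left[f_V\left(\sqrt{z/a}\right) + f_V\left(-\sqrt{z/a}\right)\right] \frac{1}{2\sqrt{az}} = \frac{1}{2\sqrt{az}}\, f_V\left(\sqrt{z/a}\right), \quad z > 0$$
because $f_V(v) = 0$ for $v < 0$.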
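For an exponential V with mean $\bar v$ (a symbol introduced here for illustration), Eq. 4.8 gives $f_Z(z) = f_V(\sqrt{z/a})/(2\sqrt{az})$ for z > 0, which integrates to $P(Z \le z) = P(V \le \sqrt{z/a}) = 1 - \exp(-\sqrt{z/a}/\bar v)$. The Python sketch below checks this by simulation; the fetch, depth, and mean wind speed are hypothetical values chosen only for illustration.

```python
import math
import random

# Hypothetical inputs (not from the text): fetch F and depth d in feet, mean wind speed in mph
F, d, v_mean = 500.0, 50.0, 15.0
a = F / (1400.0 * d)                       # Z = a V^2

rng = random.Random(3)
n = 200_000
# expovariate takes the rate, i.e., 1/mean
z_samples = [a * rng.expovariate(1.0 / v_mean) ** 2 for _ in range(n)]

def cdf_z(z):
    """P(Z <= z) = P(V <= sqrt(z/a)) for an exponential V with mean v_mean."""
    return 1.0 - math.exp(-math.sqrt(z / a) / v_mean)

def empirical_cdf(z):
    """Fraction of simulated wind tides not exceeding z."""
    return sum(s <= z for s in z_samples) / n

for z in (0.5, 1.6, 5.0):
    print(z, round(empirical_cdf(z), 4), round(cdf_z(z), 4))
```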
A dependent variable may be a function of two or more random variables, in which case it is also a random variable, and its probability distribution will depend on those of the independent random variables through the specified functional relationship.
Consider first the case of a function of two random variables X and Y:
$$Z = g(X, Y) \qquad (4.9)$$
If X and Y are both discrete random variables, we would have
$$p_Z(z) = \sum_{g(x_i, y_j) = z} p_{X,Y}(x_i, y_j) \qquad (4.10)$$
Sum of Discrete Variates—Consider first the sum of two discrete random variables
$$Z = X + Y$$
in which case Eq. 4.10 becomes
$$p_Z(z) = \sum_{x_i + y_j = z} p_{X,Y}(x_i, y_j) = \sum_{\text{all } x_i} p_{X,Y}(x_i, z - x_i) \qquad (4.11)$$
Sum of Independent Poisson Processes—As an example of the sum of two discrete random
variables, consider the sum of two statistically independent variates X and Y that are Poisson-distributed with respective parameters ν and μ, as follows:
$$p_X(x) = \frac{(\nu t)^x}{x!} e^{-\nu t}$$
and
$$p_Y(y) = \frac{(\mu t)^y}{y!} e^{-\mu t}$$
According to Eq. 4.11, and since X and Y are statistically independent, we have
$$p_Z(z) = \sum_{\text{all } x} p_X(x)\, p_Y(z - x) = e^{-(\nu + \mu)t} \sum_{x=0}^{z} \frac{(\nu t)^x (\mu t)^{z-x}}{x!\,(z - x)!}$$
But, by the binomial theorem, the above sum over all x equals $[(\nu + \mu)t]^z / z!$. Hence, we obtain the PMF of Z,
$$p_Z(z) = \frac{[(\nu + \mu)t]^z}{z!} e^{-(\nu + \mu)t}$$
which means that Z also has a Poisson distribution, with parameter (ν + μ). Generalizing this result, we can infer that the sum of n independent Poisson processes is also a Poisson
process; specifically, if
$$Z = \sum_{i=1}^{n} X_i$$
where each $X_i$ has a Poisson PMF with parameter $\nu_i$, the PMF of Z is also Poisson, with parameter
$$\nu_Z = \sum_{i=1}^{n} \nu_i$$
It is important to point out, however, that the difference of two independent Poisson processes is not a Poisson process. That is, if X and Y are individually Poisson distributed, the PMF of Z = X − Y is not a Poisson distribution; for one thing, Z can take negative values.
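The Poisson-sum result can be confirmed numerically by evaluating Eq. 4.11 as a discrete convolution and comparing it with a Poisson PMF whose mean is the sum of the two means. The means 1.5 and 2.5 (playing the roles of νt and μt) are illustrative values.

```python
import math

def poisson_pmf(k, mean):
    """Poisson PMF with the given mean (nu * t)."""
    return mean**k * math.exp(-mean) / math.factorial(k)

def pmf_of_sum(z, mean_x, mean_y):
    """Eq. 4.11 for independent X and Y: sum over x of p_X(x) * p_Y(z - x)."""
    return sum(poisson_pmf(x, mean_x) * poisson_pmf(z - x, mean_y) for x in range(z + 1))

for z in range(6):
    print(z, round(pmf_of_sum(z, 1.5, 2.5), 6), round(poisson_pmf(z, 4.0), 6))
```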
► EXAMPLE 4.7 A toll bridge serves three suburban residential districts A, B, and C, as shown in Fig. E4.7. During peak hours of each day, the estimated average volumes of vehicular traffic from the three districts are, respectively, 2, 3, and 4 vehicles per minute. If the peak-hour traffic from each district can be assumed to be a Poisson process, with respective parameters $\nu_A = 2$, $\nu_B = 3$, and $\nu_C = 4$, the total peak-hour traffic at the toll bridge would also be a Poisson process, with parameter
$$\nu = 2 + 3 + 4 = 9 \text{ vehicles/min}$$
The probability of more than nine vehicles crossing the bridge in 1 minute would be
$$P(X_1 > 9) = 1 - \sum_{x=0}^{9} \frac{(9 \times 1)^x}{x!} e^{-9 \times 1} = 1 - 0.5874 = 0.413$$
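The finite sum in Example 4.7 is easy to evaluate directly. A minimal Python sketch using the combined rate of 9 vehicles/min over 1 minute:

```python
import math

nu = 2 + 3 + 4        # combined rate, vehicles per minute
t = 1.0               # duration in minutes
mean = nu * t

# P(X_1 <= 9) as a finite sum of Poisson terms
p_at_most_9 = sum(mean**x * math.exp(-mean) / math.factorial(x) for x in range(10))
p_more_than_9 = 1.0 - p_at_most_9
print(round(p_at_most_9, 4), round(p_more_than_9, 3))   # 0.5874 0.413
```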
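Now, if the basic random variables X and Y are continuous, the CDF of Z would be
$$F_Z(z) = \iint_{g(x,y) \le z} f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{g^{-1}} f_{X,Y}(x, y)\,dx\,dy$$
where $g^{-1} = g^{-1}(z, y)$. Changing the variable of integration from x to z, the above integral becomes
$$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z} f_{X,Y}(g^{-1}, y) \left|\frac{\partial g^{-1}}{\partial z}\right| dz\,dy \qquad (4.12)$$
Taking the derivative of Eq. 4.12 with respect to z, we obtain the PDF of Z,
$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(g^{-1}, y) \left|\frac{\partial g^{-1}}{\partial z}\right| dy \qquad (4.13)$$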
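Alternatively, taking the inverse for y, i.e., $g^{-1} = g^{-1}(x, z)$, we also have for the PDF of Z,
$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, g^{-1}) \left|\frac{\partial g^{-1}}{\partial z}\right| dx \qquad (4.13a)$$
In particular, for the linear function Z = aX + bY, Eq. 4.13 gives
$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|a|} f_{X,Y}\left(\frac{z - by}{a}, y\right) dy \qquad (4.14)$$
or, with Eq. 4.13a,
$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|b|} f_{X,Y}\left(x, \frac{z - ax}{b}\right) dx \qquad (4.14a)$$
If X and Y are statistically independent variates, the above Eq. 4.14 becomes
$$f_Z(z) = \frac{1}{|a|} \int_{-\infty}^{\infty} f_X\left(\frac{z - by}{a}\right) f_Y(y)\,dy \qquad (4.15)$$
or
$$f_Z(z) = \frac{1}{|b|} \int_{-\infty}^{\infty} f_X(x)\, f_Y\left(\frac{z - ax}{b}\right) dx \qquad (4.15a)$$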
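Eq. 4.15 can also be evaluated numerically when no closed form is at hand. The Python sketch below applies a midpoint rule to the integral for Z = X + Y (a = b = 1) with two independent uniform (0, 1) variates, a hypothetical test case whose exact answer is the triangular density $f_Z(z) = z$ on (0, 1) and $2 - z$ on (1, 2).

```python
def uniform_pdf(x):
    """PDF of a uniform variate on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def pdf_linear_sum(z, f_x, f_y, a=1.0, b=1.0, lo=-5.0, hi=5.0, n=20_000):
    """Eq. 4.15 by the midpoint rule: (1/|a|) * integral of f_X((z - b*y)/a) * f_Y(y) dy.

    The limits lo and hi must enclose the region where f_Y is non-zero.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += f_x((z - b * y) / a) * f_y(y)
    return total * h / abs(a)

for z in (0.25, 1.0, 1.5):
    print(z, round(pdf_linear_sum(z, uniform_pdf, uniform_pdf), 3))
```

Passing other values of a and b evaluates the PDF of the general linear function aX + bY in the same way.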
► EXAMPLE 4.8 Figure E4.8 shows a frame building subjected to earthquake forces. The building mass m is assumed to be concentrated at the roof level. When subjected to ground shaking during an earthquake, the building will vibrate about its original (at rest) position, inducing velocity components X and Y of the mass, with a resultant velocity $Z = \sqrt{X^2 + Y^2}$.
Assuming that X and Y are statistically independent standard normal variates, N(0, 1), the probability distribution of the resultant kinetic energy of the building mass during an earthquake would be determined as follows.
The resultant kinetic energy is
$$W = \tfrac{1}{2} m Z^2 = \tfrac{1}{2} m X^2 + \tfrac{1}{2} m Y^2 = U + V$$
From the results of Example 4.5 (with c = m/2), the PDFs of U and V are, respectively,
$$f_U(u) = \frac{1}{\sqrt{\pi m u}}\, e^{-u/m}, \quad u > 0 \qquad \text{and} \qquad f_V(v) = \frac{1}{\sqrt{\pi m v}}\, e^{-v/m}, \quad v > 0$$
Then, according to Eq. 4.15a (with a = b = 1), and assuming U and V are statistically independent (based on independence between the velocity components X and Y), we obtain the PDF of W as follows:
$$f_W(w) = \int_0^w f_U(u)\, f_V(w - u)\,du = \frac{e^{-w/m}}{\pi m} \int_0^w \frac{du}{\sqrt{u(w - u)}}$$
Substituting u = wt, the above integral becomes $\int_0^1 t^{-1/2}(1 - t)^{-1/2}\,dt$, which is the beta function $B(\tfrac{1}{2}, \tfrac{1}{2}) = \pi$. Hence,
$$f_W(w) = \frac{1}{m}\, e^{-w/m}, \quad w \ge 0$$
which is an exponential distribution with mean m; equivalently, 2W/m has a chi-square distribution with two degrees of freedom (see Chapter 6).
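A simulation sketch of Example 4.8, assuming the kinetic energy is $W = \tfrac{1}{2}mZ^2 = \tfrac{1}{2}m(X^2 + Y^2)$ with X and Y independent standard normal velocity components. If W is exponential with mean m, then $P(W \le w) = 1 - e^{-w/m}$; the mass value below is an arbitrary choice.

```python
import math
import random

m = 2.0                      # hypothetical roof mass (any consistent units)
rng = random.Random(9)
n = 200_000
# W = (1/2) m (X^2 + Y^2), with X and Y independent N(0, 1) velocity components
w_samples = [0.5 * m * (rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2) for _ in range(n)]

def cdf_w(w):
    """CDF of an exponential variate with mean m: 1 - exp(-w/m)."""
    return 1.0 - math.exp(-w / m)

def empirical_cdf(w):
    """Fraction of simulated energies not exceeding w."""
    return sum(s <= w for s in w_samples) / n

print(sum(w_samples) / n)     # close to m
for w in (0.5, 2.0, 5.0):
    print(w, round(empirical_cdf(w), 4), round(cdf_w(w), 4))
```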
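Consider now the sum Z = X + Y of two statistically independent Gaussian variates $X = N(\mu_X, \sigma_X)$ and $Y = N(\mu_Y, \sigma_Y)$. With a = b = 1, Eq. 4.15 gives
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\,dy = \frac{1}{2\pi \sigma_X \sigma_Y} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2}\left[\left(\frac{z - y - \mu_X}{\sigma_X}\right)^2 + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\} dy$$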
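Completing the square in y in the exponent and carrying out the resulting Gaussian integral over y, we obtain
$$f_Z(z) = \frac{1}{\sqrt{2\pi}\, \sigma_Z} \exp\left[-\frac{1}{2}\left(\frac{z - \mu_Z}{\sigma_Z}\right)^2\right]$$
which is the PDF of a Gaussian variate with mean
$$\mu_Z = \mu_X + \mu_Y$$
and variance
$$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$$
With the same procedure, we can show that Z = X − Y is also Gaussian, with a mean of $\mu_X - \mu_Y$ and the same variance $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$.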
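We might point out that the above results remain valid for correlated variates that are jointly Gaussian, except that the variance must then include the covariance between X and Y; i.e., $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\,\mathrm{Cov}(X, Y)$ for $Z = X \pm Y$.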
On the basis of the above results, we can infer inductively that if
$$Z = \sum_{i=1}^{n} a_i X_i$$
where the $a_i$ are constants and the $X_i$ are statistically independent Gaussian variates $N(\mu_{X_i}, \sigma_{X_i})$, then Z is also Gaussian, with mean
$$\mu_Z = \sum_{i=1}^{n} a_i \mu_{X_i} \qquad (4.10p)$$
and variance
$$\sigma_Z^2 = \sum_{i=1}^{n} a_i^2 \sigma_{X_i}^2 \qquad (4.10q)$$
The last result above shows that any linear function of independent Gaussian variates is also a Gaussian variate. The relationships of Eqs. 4.10p and 4.10q for the mean and variance, however, are not limited to Gaussian variates. We shall observe later in Sect. 4.3.1 that these equations are, in fact, valid for linear functions of any statistically independent random variables, irrespective of their distributions.
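The mean and variance rules for a linear function are straightforward to apply in code. The Python sketch below computes them for a hypothetical combination $Z = 2X_1 - X_2 + 0.5X_3$ and cross-checks them by simulation; the function name and all numbers are illustrative.

```python
import random
import statistics

def linear_combination_normal(coeffs, means, stdevs):
    """Mean and standard deviation of Z = sum(a_i * X_i) for independent X_i."""
    mean_z = sum(a * m for a, m in zip(coeffs, means))
    var_z = sum((a * s) ** 2 for a, s in zip(coeffs, stdevs))
    return mean_z, var_z ** 0.5

# Hypothetical example: Z = 2 X1 - X2 + 0.5 X3
coeffs, means, stdevs = [2.0, -1.0, 0.5], [10.0, 4.0, 6.0], [1.0, 2.0, 2.0]
print(linear_combination_normal(coeffs, means, stdevs))   # (19.0, 3.0)

# Monte Carlo check of the same two moments
rng = random.Random(11)
zs = [sum(a * rng.gauss(m, s) for a, m, s in zip(coeffs, means, stdevs)) for _ in range(100_000)]
print(statistics.fmean(zs), statistics.stdev(zs))
```

As the text notes, the moment formulas hold for independent variates of any distribution; only the conclusion that Z is Gaussian requires Gaussian inputs.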
► EXAMPLE 4.9 The capacity of the storm drain of a city is a normal variate with a mean of 1.5 million gallons per day (mgd) and a standard deviation of 0.3 mgd; i.e., the capacity is N(1.5, 0.30). The storm drain serves two independent drainage sources within the city that are also normally distributed as follows: drainage source A = N(0.70, 0.20) mgd, and drainage source B = N(0.50, 0.15) mgd.
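The probability that the storm drain capacity D will be exceeded during a storm would be determined as follows.
Since D, A, and B are independent normal variates, the difference S = D − (A + B) is also normal, with mean and variance, according to Eqs. 4.10p and 4.10q,
$$\mu_S = 1.5 - (0.70 + 0.50) = 0.30 \text{ mgd}$$
and
$$\sigma_S^2 = (0.30)^2 + (0.20)^2 + (0.15)^2 = 0.1525, \quad \sigma_S = 0.39 \text{ mgd}$$
Hence, the probability that the capacity will be exceeded is
$$P(S < 0) = \Phi\left(\frac{0 - 0.30}{0.39}\right) = \Phi(-0.77) = 0.221$$
Suppose from prior storms it has been shown that the existing storm drain can carry at least 1.2 mgd. Given that this information is reliable, what is the probability that the storm drain will be able to carry a total drainage of 1.9 mgd during a severe storm?
Clearly, this probability is conditioned on the information that the capacity of the storm drain is at least 1.2 mgd. Hence, we have
$$P(D > 1.9 \mid D > 1.2) = \frac{P(D > 1.9)}{P(D > 1.2)} = \frac{1 - P(D \le 1.9)}{1 - P(D \le 1.2)} = \frac{1 - \Phi\left(\frac{1.9 - 1.5}{0.3}\right)}{1 - \Phi\left(\frac{1.2 - 1.5}{0.3}\right)} = \frac{1 - \Phi(1.33)}{1 - \Phi(-1.00)} = \frac{1 - 0.908}{1 - (1 - 0.8413)} = 0.11$$
Therefore, there is only a small likelihood (about 11%) that the storm drain can carry a drainage of 1.9 mgd.
Now, if another drainage source C = N(0.8, 0.2) mgd is to be added to the existing storm drain, how much must the mean capacity of the current storm drain be increased in order to maintain the same probability of exceedance (i.e., 0.221) as the existing system? Assume that the standard deviation of the new storm drain will remain at 0.3 mgd.
Denoting the new capacity as D', we must have, for S' = D' − (A + B + C),
$$\mu_{S'} = \mu_{D'} - (0.70 + 0.50 + 0.80) = \mu_{D'} - 2.0$$
and
$$\sigma_{S'} = \sqrt{(0.30)^2 + (0.20)^2 + (0.15)^2 + (0.20)^2} = 0.44 \text{ mgd}$$
Therefore,
$$P(S' < 0) = \Phi\left(\frac{0 - (\mu_{D'} - 2.0)}{0.44}\right) = 0.221$$
or
$$\frac{2.0 - \mu_{D'}}{0.44} = \Phi^{-1}(0.221) = -0.77$$
from which $\mu_{D'} = 2.0 + 0.77 \times 0.44 = 2.34$ mgd; i.e., the mean capacity must be increased by about 0.84 mgd.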
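The first and third parts of Example 4.9 can be reproduced with Python's standard `statistics.NormalDist`. This sketch carries unrounded intermediate values, so the required mean capacity comes out near 2.337 mgd, consistent with a hand calculation using the rounded values z = −0.77 and σ = 0.44 (2.0 + 0.77 × 0.44 ≈ 2.34).

```python
from statistics import NormalDist

# (mean, standard deviation) in mgd, from Example 4.9
D = (1.5, 0.30)
A = (0.70, 0.20)
B = (0.50, 0.15)
C = (0.80, 0.20)

def prob_capacity_exceeded(capacity, sources):
    """P(S < 0) for S = capacity - sum(sources), all independent normal variates."""
    mean_s = capacity[0] - sum(m for m, _ in sources)
    sd_s = (capacity[1] ** 2 + sum(s ** 2 for _, s in sources)) ** 0.5
    return NormalDist(mean_s, sd_s).cdf(0.0)

p_existing = prob_capacity_exceeded(D, [A, B])
print(round(p_existing, 3))                  # 0.221

# Mean of the new capacity D' (std kept at 0.3) giving the same probability once C is added
sources = [A, B, C]
sd_new = (0.30 ** 2 + sum(s ** 2 for _, s in sources)) ** 0.5
z = NormalDist().inv_cdf(p_existing)         # about -0.77
mu_new = sum(m for m, _ in sources) - z * sd_new
print(round(mu_new, 3))
```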
► EXAMPLE 4.10 Products can be transported by rail or trucks between New York and Los Angeles; both modes of
transportation go through the city of Chicago. The mean travel times between the major cities for
each mode of transportation are indicated in Fig. E4.10.
Figure E4.10 Rail and truck routes between New York and Los Angeles
The c.o.v.s of the travel times for the two modes of transportation are 15% and 20%, respectively, for
rail and truck. Assume that the travel times between any two cities are statistically independent normal
variates. The means and standard deviations of the travel times for the two modes of transportation
are, respectively,
and
If the loading and unloading times in Chicago are 10 hr for trucks and 15 hr for rail, the probabilities
that the actual travel times will exceed 85 hr between New York and Los Angeles are as follows.
The loading/unloading times in Chicago are assumed to be deterministic (c.o.v. = 0) and must be added to the respective mean travel times. Therefore, for trucks, $P(T_T > 85) = 1 - 0.691 = 0.309$, whereas for rail,
$$P(T_R > 85) = 1 - P(T_R \le 85) = 1 - \Phi\left(\frac{85 - 75}{6.49}\right) = 1 - \Phi(1.54) = 1 - 0.938 = 0.062$$
Therefore, the probability that merchandise shipped by rail from New York to Los Angeles will arrive within 85 hr is 0.938, whereas by truck the corresponding probability is 0.691. ◄
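For the rail route, the exceedance probability follows from a normal model with a mean total time of 75 hr (including the 15 hr in Chicago) and a standard deviation of 6.49 hr, as used in the calculation above. A minimal Python check:

```python
from statistics import NormalDist

# Total rail time: mean 75 hr (including 15 hr of handling in Chicago), std 6.49 hr
rail = NormalDist(mu=75.0, sigma=6.49)
p_on_time = rail.cdf(85.0)
p_late = 1.0 - p_on_time
print(round(p_on_time, 3), round(p_late, 3))   # 0.938 0.062
```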
► EXAMPLE 4.11 The integrity of the columns is essential to the safety of a high-rise building. The total load acting on
the columns may include the effects of the dead load D (primarily the weight of the structure), the live
load L (that includes human occupancy, furniture, movable equipment, etc.), and the wind load W.
These individual load effects on the building columns may be assumed to be statistically inde
pendent Gaussian variates with the following respective means and standard deviations:
The individual columns were designed with a mean strength equal to 1.5 times the total mean load that they carry, and the strength may also be assumed to be Gaussian, with a c.o.v. of 15%. The strength of each column, R, is clearly independent of the applied load. A column will be overstressed when the applied load S exceeds the strength R; the probability of this event (R < S) is, therefore,
$$P(R < S) = P(R - S < 0) = \Phi\left(\frac{0 - (1.5 \times 14.1 - 14.1)}{\sqrt{(0.15 \times 1.5 \times 14.1)^2 + (1.1)^2}}\right) = \Phi\left(\frac{-7.05}{3.36}\right) = \Phi(-2.10) = 1 - 0.982 = 0.018$$
If we wish to decrease the probability of the event (R < S), and thus increase the safety of the column, we need to increase the strength of the column. ◄
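The reliability calculation in Example 4.11 reduces to the safety margin M = R − S. The Python sketch below uses the total mean load of 14.1 and its standard deviation of 1.1 that appear in the calculation above, together with the 1.5 design factor and the 15% c.o.v. of strength.

```python
from statistics import NormalDist

mean_s, sd_s = 14.1, 1.1            # total load effect S
mean_r = 1.5 * mean_s               # design rule: mean strength = 1.5 x mean load
sd_r = 0.15 * mean_r                # c.o.v. of strength = 15%

mean_m = mean_r - mean_s            # safety margin M = R - S
sd_m = (sd_r ** 2 + sd_s ** 2) ** 0.5
p_over = NormalDist(mean_m, sd_m).cdf(0.0)   # P(M < 0) = P(R < S)
print(round(mean_m, 2), round(sd_m, 2), round(p_over, 3))   # 7.05 3.36 0.018
```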
► EXAMPLE 4.12 The framing of a house may be done by subassembling the components in a plant and then delivering them to the site for framing. While this subassembly of the components is being fabricated, the preparation of the site, which includes everything from the excavation of the foundation through the construction of the foundation walls, can proceed at the same time. The different activities and their
sequence may be represented with the activity network shown in Fig. E4.12; the required work
durations for the respective activities are also shown in the table below.
Assume that the required completion times for the different activities are statistically independent
Gaussian variates, with the respective means and standard deviations shown in the above table. Clearly,
to start framing the house, the foundation walls must be completed and the assembled components
must be delivered to the site. The probability that framing of the house can start within 8 days after
work started on the job can be determined as follows.
Denote the durations of the activities listed in the above table as $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$, respectively, and let
$$T_1 = X_1 + X_2 + X_3$$
and
$$T_2 = X_4 + X_5$$
in which $T_1$ and $T_2$ are also statistically independent. Then, the required probability is
$$P(F) = P[(T_1 \le 8) \cap (T_2 \le 8)] = P(T_1 \le 8)\, P(T_2 \le 8)$$
in which
$$P(T_1 \le 8) = \Phi\left(\frac{8 - (2 + 1 + 3)}{\sqrt{(1.0)^2 + (0.5)^2 + (1.0)^2}}\right) = \Phi\left(\frac{2}{1.5}\right) = \Phi(1.33) = 0.908$$
and
$$P(T_2 \le 8) = \Phi\left(\frac{8 - (5 + 2)}{\sqrt{(1.0)^2 + (0.5)^2}}\right) = \Phi(0.89) = 0.813$$
Hence, $P(F) = 0.908 \times 0.813 = 0.738$.
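The two path probabilities in Example 4.12 can be computed with a small helper for sums of independent normal durations, using the means and standard deviations that appear in the calculation above. The helper name is ours. Because the sketch keeps unrounded z-values, it gives about 0.909, 0.814, and 0.740, which can differ in the third decimal from a hand calculation with z rounded to two places.

```python
from statistics import NormalDist

def prob_done_by(deadline, means, stdevs):
    """P(sum of independent normal durations <= deadline)."""
    mean = sum(means)
    sd = sum(s * s for s in stdevs) ** 0.5
    return NormalDist(mean, sd).cdf(deadline)

p1 = prob_done_by(8.0, [2.0, 1.0, 3.0], [1.0, 0.5, 1.0])   # T1 = X1 + X2 + X3
p2 = prob_done_by(8.0, [5.0, 2.0], [1.0, 0.5])             # T2 = X4 + X5
p_start = p1 * p2                                          # T1 and T2 are independent
print(round(p1, 3), round(p2, 3), round(p_start, 3))
```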
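Products and Quotients of Random Variables—If the function is the product of two random variables, say
$$Z = XY$$
then
$$x = z/y \quad \text{and} \quad \frac{\partial x}{\partial z} = \frac{1}{y}$$
and Eq. 4.13 yields the PDF of Z as
$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|y|}\, f_{X,Y}\left(\frac{z}{y}, y\right) dy \qquad (4.18)$$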
From a practical standpoint, the product (and quotient) of lognormal random variables is of special interest. In particular, we observe that the product or quotient of statistically independent lognormal variates is also lognormal. This can be shown as follows.
Suppose
$$Z = \prod_{i=1}^{n} X_i$$
where the $X_i$'s are statistically independent lognormal variates with respective parameters $\lambda_{X_i}$ and $\zeta_{X_i}$; then
$$\ln Z = \sum_{i=1}^{n} \ln X_i$$
But each $\ln X_i$ is normal (as shown in Example 4.3), with mean $\lambda_{X_i}$ and variance $\zeta_{X_i}^2$; hence, ln Z is the sum of normals and, therefore, is also normal, with mean and variance as follows:
$$\lambda_Z = E(\ln Z) = \sum_{i=1}^{n} \lambda_{X_i} \qquad (4.20)$$
and
$$\zeta_Z^2 = \mathrm{Var}(\ln Z) = \sum_{i=1}^{n} \zeta_{X_i}^2$$
We might also recall from Eq. 3.27 that $\lambda_{X_i} = \ln x_{m,i}$, in which $x_{m,i}$ is the median of $X_i$; therefore, $\lambda_Z$ is also
$$\lambda_Z = \sum_{i=1}^{n} \ln x_{m,i} \qquad (4.20a)$$
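The normal-sum argument for lognormal products can be checked by simulation. The Python sketch below draws three independent lognormal factors with hypothetical medians and ζ values (via `random.Random.lognormvariate`, whose arguments are the mean and standard deviation of the underlying log) and compares the sample moments of ln Z with the sum of the λ values (Eq. 4.20a) and the root-sum-of-squares of the ζ values.

```python
import math
import random
import statistics

# Hypothetical medians and zeta values of three independent lognormal factors
medians = [20.0, 1.5, 0.8]
zetas = [0.20, 0.15, 0.10]

lam_z = sum(math.log(m) for m in medians)          # Eq. 4.20a
zeta_z = math.sqrt(sum(z * z for z in zetas))      # variances of the ln X_i add

rng = random.Random(5)
n = 100_000
log_products = [
    math.log(math.prod(rng.lognormvariate(math.log(m), z) for m, z in zip(medians, zetas)))
    for _ in range(n)
]
print(round(lam_z, 4), round(zeta_z, 4))
print(round(statistics.fmean(log_products), 4), round(statistics.stdev(log_products), 4))
```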
► EXAMPLE 4.13 The annual operational cost, C, for a waste treatment plant is a function of the weight of solid waste, W, the unit cost factor, F, and an efficiency coefficient, E, as follows:
$$C = \frac{WF}{\sqrt{E}}$$
where W, F, and E are statistically independent lognormal variates with the following respective
medians and coefficients of variation (c.o.v.):
As C is a function of the product and quotient of lognormal variates, its probability distribution is
also lognormal, which we can show as follows:
and
<c = y(0.20)2 + (0.15)2 + (1/2 x 0.125)2 = 0.26
On the basis of the above, the probability that the annual cost of operating the waste treatment plant
will exceed $35,000 is
In 35.000- 10.13
P(C > 35,000) = 1 - P(C < 35,000) = 1 - <I>
026
= 1 - 0(1.28) = 1 - 0.900 = 0.100
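A quick numerical check of this example, as a sketch: it takes $\lambda_C = 10.13$ as given (the table of medians is not reproduced here) and uses the same $\zeta \approx$ c.o.v. approximation. Without rounding $\zeta_C$ to 0.26, the result comes out near 0.098 rather than 0.100.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lam_c = 10.13   # lambda_C as given in the example
# zeta of each factor taken as its c.o.v.; the square root on E halves its contribution
zeta_c = math.sqrt(0.20 ** 2 + 0.15 ** 2 + (0.5 * 0.125) ** 2)
p_exceed = 1.0 - std_normal_cdf((math.log(35_000.0) - lam_c) / zeta_c)
```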
► EXAMPLE 4.14 The structure and foundation of the high-rise building shown in Fig. E4.14 are both designed to withstand wind-induced pressures with the respective pressure capacities as follows:
Superstructure, $R_s$ = 40 psf (lb/ft²)
Foundation, $R_f$ = 30 psf
The peak wind pressure $P_w$ on the building during a wind storm is given by
$$P_w = 1.165 \times 10^{-3}\, C V^2$$
where:
$V$ = maximum wind speed, in fps
$C$ = drag coefficient
168 > Chapter 4. Functions of Random Variables
During a 50-year wind storm, the maximum wind speed $V$ may be assumed to be a lognormal variate with a mean speed of $\mu_V$ = 100 fps and a c.o.v. $\delta_V$ = 0.25. The drag coefficient $C$ is also lognormal with a mean of $\mu_C$ = 1.80 and $\delta_C$ = 0.30.
Clearly, the distribution of the wind-induced pressure, $P_w$, is also lognormal. Taking $\zeta \approx \delta$ for each variate, so that $\lambda = \ln \mu - \tfrac{1}{2}\zeta^2$, its parameters for the 50-year wind are
$$\lambda_{P_w} = \ln(1.165 \times 10^{-3}) + \lambda_C + 2\lambda_V = -6.755 + \left[\ln 1.80 - \tfrac{1}{2}(0.30)^2\right] + 2\left[\ln 100 - \tfrac{1}{2}(0.25)^2\right] = -6.755 + 0.543 + 2(4.574) = 2.936$$
and, because $V$ enters squared, its $\zeta$ is doubled:
$$\zeta_{P_w} = \sqrt{(0.30)^2 + (2 \times 0.25)^2} = 0.583$$
Whenever the maximum wind pressure exceeds the design pressure capacity of the superstructure or the foundation, there may be damage to the corresponding substructure. These probabilities are, respectively, as follows.
For the superstructure,
$$P(P_w > 40) = 1 - P(P_w \le 40) = 1 - \Phi\left(\frac{\ln 40 - 2.936}{0.583}\right) = 1 - \Phi(1.29) = 1 - 0.901 = 0.099$$
and for the foundation,
$$P(P_w > 30) = 1 - P(P_w \le 30) = 1 - \Phi\left(\frac{\ln 30 - 2.936}{0.583}\right) = 1 - \Phi(0.80) = 1 - 0.788 = 0.212$$
In this case, because the wind resistances of the superstructure and foundation are deterministic, and the foundation has the lower capacity, the probability of damage to the high-rise system when subjected to the 50-year wind would be
$$P(D) = P[(P_w > 40) \cup (P_w > 30)] = P(P_w > 30) = 0.212$$
Finally, if the occurrences of 50-year wind storms constitute a Poisson process with a mean rate of 1/50 = 0.02 per year, and damages to the building between storms are statistically independent, the probability of wind damage, $D$, to the high-rise building for a period of 20 years would be
$$P(D \text{ in 20 years}) = 1 - \sum_{n=0}^{\infty} [1 - P(D)]^n \frac{(0.02 \times 20)^n}{n!} e^{-0.02 \times 20} = 1 - e^{-0.4\,P(D)} = 1 - e^{-0.4(0.212)} = 0.081$$
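The whole chain of this example (lognormal parameters, the two exceedance probabilities, and the Poisson-weighted 20-year result) can be reproduced with the sketch below. It keeps the example's $\zeta \approx \delta$ approximation and also checks that the truncated Poisson series equals the closed form $1 - e^{-\nu t\,P(D)}$.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lam_approx(mean, cov):
    """lambda = ln(mean) - zeta^2 / 2, taking zeta ~ c.o.v. as in the example."""
    return math.log(mean) - 0.5 * cov ** 2

lam_pw = math.log(1.165e-3) + lam_approx(1.80, 0.30) + 2.0 * lam_approx(100.0, 0.25)
zeta_pw = math.sqrt(0.30 ** 2 + (2.0 * 0.25) ** 2)   # V**2 doubles zeta_V

def p_exceed(capacity):
    """P(P_w > capacity) for the lognormal wind pressure."""
    return 1.0 - std_normal_cdf((math.log(capacity) - lam_pw) / zeta_pw)

p40, p30 = p_exceed(40.0), p_exceed(30.0)
p_damage = p30                         # the 30-psf foundation governs

# Poisson storms: rate 1/50 per year over t = 20 years
mean_storms = 20.0 / 50.0
series = sum((1.0 - p_damage) ** n * mean_storms ** n / math.factorial(n)
             for n in range(60)) * math.exp(-mean_storms)
p_20yr = 1.0 - series
p_20yr_closed = 1.0 - math.exp(-mean_storms * p_damage)
```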
The Central Limit Theorem—One of the most significant theorems in probability theory, known as the central limit theorem, pertains to the limiting distribution of the sum of a large number of random variables. Stated loosely, the theorem says that the sum of a large number of individual random components, none of which is dominant, tends to the Gaussian distribution as the number of component variables increases, regardless of the distributions of the individual components. Therefore, if a physical process is the combined totality of a large number of individual effects, then according to the central limit theorem, the process would tend to be Gaussian.
The rigorous proof of the theorem is beyond our scope of interest; however, the essence of the theorem may be demonstrated as follows. Take, for example, the sum
$$S = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$$
where the $X_i$'s are statistically independent and identically distributed random variables with PMF
$$P(X_i = -1) = P(X_i = 1) = \tfrac{1}{2}$$
and
$$P(X_i = x) = 0, \quad \text{otherwise}$$
According to the central limit theorem, the sum $S$ will approach the Gaussian distribution $N(0, 1)$ as $n \to \infty$. This is demonstrated in Fig. 4.1 for the above initial PMF of the $X_i$'s as $n$ increases from 2 to 20.
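Because the number of $+1$'s among the $X_i$ is binomial$(n, 1/2)$, the PMF of $S$ can be computed exactly, and its largest CDF gap from $\Phi$ shows the convergence numerically. This sketch assumes the $\pm 1$ PMF and the $1/\sqrt{n}$ scaling written above; the function names are illustrative.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pmf_of_s(n):
    """Exact PMF of S = (X_1 + ... + X_n) / sqrt(n), each X_i = -1 or +1 with probability 1/2."""
    # With k of the X_i equal to +1, the sum is 2k - n, where k is binomial(n, 1/2).
    return [((2 * k - n) / math.sqrt(n), math.comb(n, k) / 2 ** n) for k in range(n + 1)]

def max_cdf_gap(n):
    """Largest |F_S(s) - Phi(s)|; for a step CDF it suffices to check both sides of every jump."""
    below, worst = 0.0, 0.0
    for s, p in pmf_of_s(n):
        above = below + p
        worst = max(worst, abs(below - std_normal_cdf(s)), abs(above - std_normal_cdf(s)))
        below = above
    return worst

gaps = {n: max_cdf_gap(n) for n in (2, 5, 10, 20)}
```

For $n = 2$ the gap is exactly 0.25 (the jump at $s = 0$), and it shrinks steadily as $n$ grows, while the mean and variance of $S$ stay at 0 and 1.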
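By virtue of the above central limit theorem, we can infer that the product (or quotient) of independent factors, none of which is dominating, will tend to approach the lognormal distribution. That is, regardless of the distributions of the $X_i$'s, the distribution of the product $Z = \prod_{i=1}^{n} X_i$ approaches the lognormal distribution as $n$ increases, because $\ln Z = \sum_{i=1}^{n} \ln X_i$ is a sum that tends to the Gaussian distribution.
$$Y_i = \ln X_i$$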
With a sample of 10,000, we generate the histogram of $Y_1$ and the corresponding statistics as shown below.
mean($Y_1$) = 0.136;
std($Y_1$) = 1.276;
skewness($Y_1$) = −1.207
For n = 5, the sum $A = \sum_{i=1}^{5} Y_i$ yields the histogram and corresponding statistics as follows:
mean($A$) = 0.571;
std($A$) = 2.873;
skewness($A$) = −0.465
and for n = 100, for the sum $B = \sum_{i=1}^{100} Y_i$, we obtain the histogram and corresponding statistics as follows:
mean($B$) = 11.670;
std($B$) = 12.755;
skewness($B$) = −0.054
From the above, we see that as n increases from 1 to 5 to 100, the sum $\sum_{i=1}^{n} Y_i$ becomes more symmetric, as indicated by the skewness coefficient, which approaches zero for n = 100. We can also show that the excess kurtosis (the kurtosis minus 3) approaches zero for increasing n, indicating that the sum approaches the Gaussian distribution. Therefore, the product $\prod_{i=1}^{n} X_i$ approaches the lognormal distribution. ◄
► EXAMPLE 4.16 Finally, consider the sum of n uniformly distributed random variables $X_i$, each ranging between 0 and 2; i.e.,
$$S = \sum_{i=1}^{n} X_i$$
For n = 2, 5, 10, and 100, the respective histograms of $S$ (all normalized with a mean of 100), and the corresponding superimposed normal PDFs with the same means and standard deviations, are shown below:
All the histograms shown above are generated with 1000 or 10,000 repetitions (sample size). The convergence to the normal PDF for increasing n is clearly demonstrated.
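The same experiment can be sketched with Python's `random` module. Rather than plotting histograms, this version tracks the sample mean, variance, and excess kurtosis of $S$; for a sum of n independent uniform(0, 2) variates the exact values are $n$, $n/3$, and $-6/(5n)$, so the excess kurtosis should drift toward the Gaussian value of zero. The seed and the repetition count are arbitrary choices.

```python
import random
import statistics

def sample_sums(n, reps, rng):
    """reps realizations of S = X_1 + ... + X_n, each X_i uniform on (0, 2)."""
    return [sum(rng.uniform(0.0, 2.0) for _ in range(n)) for _ in range(reps)]

def excess_kurtosis(xs):
    """Sample kurtosis minus 3 (zero for a normal distribution)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0

rng = random.Random(2007)          # fixed seed for a reproducible run
results = {}
for n in (2, 5, 10, 100):
    s = sample_sums(n, 10_000, rng)
    # Exact values for comparison: mean n, variance n/3, excess kurtosis -6/(5n)
    results[n] = (statistics.fmean(s), statistics.pvariance(s), excess_kurtosis(s))
```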
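Generalization—The function of two random variables described above in Eq. 4.13 can be generalized to derive the distribution of the function of n random variables. In particular, if
$$Z = g(X_1, X_2, \ldots, X_n) \qquad (4.22)$$
then, generalizing Eq. 4.13, we have
$$F_Z(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{g^{-1}} f_{X_1, \ldots, X_n}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \qquad (4.23)$$
in which $g^{-1} = g^{-1}(z, x_2, \ldots, x_n)$. Changing the variable of integration from $x_1$ to $z$, we have
$$F_Z(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{z} f_{X_1, \ldots, X_n}(g^{-1}, x_2, \ldots, x_n) \left|\frac{\partial g^{-1}}{\partial z}\right| dz\, dx_2 \cdots dx_n$$
from which we obtain
$$f_Z(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1, \ldots, X_n}(g^{-1}, x_2, \ldots, x_n) \left|\frac{\partial g^{-1}}{\partial z}\right| dx_2 \cdots dx_n \qquad (4.24)$$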
This book is out of print but is available for direct order from ahang2@aol.com.
smallest value will have its own probability distribution, which may be an exact distribution or an asymptotic distribution.
► EXAMPLE 4.17 Consider the initial variate $X$ with the exponential PDF as follows:
$$f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
whose CDF is $F_X(x) = 1 - e^{-\lambda x}$. Therefore, the CDF of the largest value from samples of size n, according to Eq. 4.25, is
$$F_{Y_n}(y) = (1 - e^{-\lambda y})^n$$
and the corresponding PDF is
$$f_{Y_n}(y) = n\lambda (1 - e^{-\lambda y})^{n-1} e^{-\lambda y}$$
Graphically, the above PDF and CDF of $Y_n$ are shown (for $\lambda$ = 1.0) in Fig. E4.17 for different sample sizes n from 1 to 100.
From Fig. E4.17, we can see that the PDF as well as the CDF shift to the right with increasing n. Also, as expected, the mode of the largest value increases with increasing n.
Expanding the above $F_{Y_n}(y)$ by the binomial series expansion, we obtain the series
$$(1 - e^{-\lambda y})^n = 1 - n e^{-\lambda y} + \frac{n(n-1)}{2!} e^{-2\lambda y} - \cdots$$
For large n the above series approaches the double exponential $\exp(-n e^{-\lambda y})$. Hence, for large n the CDF of the largest value from an exponential population approaches the double exponential
$$F_{Y_n}(y) = \exp(-n e^{-\lambda y})$$
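How quickly the exact CDF $(1 - e^{-\lambda y})^n$ approaches the double exponential $\exp(-n e^{-\lambda y})$ can be checked numerically. This sketch compares the two on a grid of $y$ values (the grid range and step are arbitrary choices) and reports the largest absolute difference for a few sample sizes.

```python
import math

def exact_cdf(y, n, lam=1.0):
    """CDF of the largest of n independent exponential(lam) values: (1 - e^{-lam y})^n."""
    return (1.0 - math.exp(-lam * y)) ** n

def double_exponential_cdf(y, n, lam=1.0):
    """Large-n approximation exp(-n e^{-lam y})."""
    return math.exp(-n * math.exp(-lam * y))

def max_gap(n, lam=1.0, y_max=20.0, steps=4000):
    """Largest absolute difference between the two CDFs on a grid 0 <= y <= y_max."""
    ys = (y_max * i / steps for i in range(steps + 1))
    return max(abs(exact_cdf(y, n, lam) - double_exponential_cdf(y, n, lam)) for y in ys)

gaps = {n: max_gap(n) for n in (5, 10, 100)}
```

The gap falls roughly in proportion to $1/n$; for n = 100 it is already below 0.005.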
The Asymptotic Distributions—In Example 4.17, we observed that in the case of the exponential initial distribution, the CDF of the largest value from samples of size n approaches the double exponential distribution as n increases; in this case, this double exponential distribution is the asymptotic distribution of the largest value. This characteristic is also shown in Fig. 4.2, illustrating the convergence of the exact distribution of $Y_n$ to the asymptotic double exponential distribution as $n \to \infty$.
The characteristics illustrated in Fig. 4.2 for the initial exponential distribution actually apply also to other initial distributions; i.e., the distribution of an extreme value converges asymptotically in distribution as n increases. According to Gumbel (1954, 1960), there are three types of such asymptotic distributions (although not exhaustive), depending on the tail behavior of the initial PDFs; namely:
Type I: the double exponential form
Type II: the exponential form
Type III: the exponential form with an upper (or lower) bound
The extreme value from an initial distribution with an exponentially decaying tail (in the direction of the extreme) will converge asymptotically to the Type I limiting form. This was illustrated earlier in Example 4.17 and demonstrated graphically in Fig. 4.2. For an initial variate with a PDF that decays with a polynomial tail, the distribution of its extreme value will converge to the Type II limiting form, whereas if the extreme value is limited, the corresponding extremal distribution will converge asymptotically to the Type III asymptotic form.
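The Gumbel Distribution—The CDF of the Type I asymptotic form for the largest value, well known as the Gumbel distribution (Gumbel, 1958), is
$$F_{Y_n}(y) = \exp\left[-e^{-\alpha_n (y - u_n)}\right] \qquad (4.29a)$$
and its PDF is
$$f_{Y_n}(y) = \alpha_n e^{-\alpha_n (y - u_n)} \exp\left[-e^{-\alpha_n (y - u_n)}\right] \qquad (4.29b)$$
in which
$u_n$ = the most probable value of $Y_n$
$\alpha_n$ = an inverse measure of the dispersion of values of $Y_n$
Moreover, the mean and variance of the largest value, $Y_n$, and smallest value, $Y_1$, are related to the respective parameters as follows (see Ang and Tang, Vol. 2, 1984):
$$\mu_{Y_n} = u_n + \frac{\gamma}{\alpha_n} \qquad (4.30a)$$
$$\sigma_{Y_n}^2 = \frac{\pi^2}{6\alpha_n^2} \qquad (4.30b)$$
in which $\gamma$ = 0.5772 is Euler's constant; whereas for the smallest value, the corresponding mean and variance are
$$\mu_{Y_1} = u_1 - \frac{\gamma}{\alpha_1} \quad \text{and} \quad \sigma_{Y_1}^2 = \frac{\pi^2}{6\alpha_1^2}$$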
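Eqs. 4.29a, 4.30a, and 4.30b translate directly into a few helper functions. This is a minimal sketch; the function names are illustrative, and Euler's constant is written out to full double precision.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant in Eqs. 4.30a and 4.30b

def gumbel_largest_moments(u_n, alpha_n):
    """Mean and standard deviation of the Type I largest value (Eqs. 4.30a and 4.30b)."""
    return u_n + EULER_GAMMA / alpha_n, math.pi / (math.sqrt(6.0) * alpha_n)

def gumbel_smallest_moments(u_1, alpha_1):
    """Mean and standard deviation of the Type I smallest value."""
    return u_1 - EULER_GAMMA / alpha_1, math.pi / (math.sqrt(6.0) * alpha_1)

def gumbel_largest_cdf(y, u_n, alpha_n):
    """Eq. 4.29a; at the mode y = u_n it equals exp(-1), about 0.368."""
    return math.exp(-math.exp(-alpha_n * (y - u_n)))
```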
► EXAMPLE 4.18 Consider an initial variate $X$ with the standard normal distribution, i.e., $N(0, 1)$, with the PDF
$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
In this case, the tail of the PDF is clearly exponential; hence, the asymptotic distribution of the largest value is of the double exponential form (Type I); specifically, the CDF of $Y_n$ is
$$F_{Y_n}(y) = \exp\left[-e^{-\alpha_n (y - u_n)}\right]$$
and the PDF is
$$f_{Y_n}(y) = \alpha_n e^{-\alpha_n (y - u_n)} \exp\left[-e^{-\alpha_n (y - u_n)}\right]$$
with parameters
$$u_n = \sqrt{2 \ln n} - \frac{\ln \ln n + \ln 4\pi}{2\sqrt{2 \ln n}}$$
and
$$\alpha_n = \sqrt{2 \ln n}$$
Explanation of the derivation of the above parameters may be found in Ang and Tang, Vol. 2 (1984).
The mean and standard deviation of the largest value can then be obtained, respectively, from Eqs.
4.30a and 4.30b.
If the initial variate $X$ has a general Gaussian distribution, $N(\mu, \sigma)$, we observe from Example 4.2 that $(X - \mu)/\sigma$ will be $N(0, 1)$. Then, the asymptotic distribution of the largest value from $(X - \mu)/\sigma$ will be of the double exponential form with the parameters $u_n$ and $\alpha_n$ obtained above. Therefore, if $Y'_n$ is the largest value from the initial Gaussian variate $X$, then it follows that
$$Y'_n = \sigma Y_n + \mu$$
Hence, the CDF of $Y'_n$ is of the same double exponential form as $Y_n$, with the parameters
$$u'_n = \sigma u_n + \mu$$
$$\alpha'_n = \alpha_n / \sigma$$
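The parameters of Example 4.18 and their rescaling for a general $N(\mu, \sigma)$ can be computed as follows. This is a sketch with an illustrative function name. The asymptotic form is only an approximation for moderate n; convergence of the normal maximum to the Type I form is known to be slow.

```python
import math

def normal_largest_params(n, mu=0.0, sigma=1.0):
    """Type I parameters (u_n', alpha_n') for the largest of n values from N(mu, sigma).

    Standard-normal parameters from Example 4.18, then u' = sigma*u + mu and alpha' = alpha/sigma.
    """
    root = math.sqrt(2.0 * math.log(n))
    u_std = root - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * root)
    return sigma * u_std + mu, root / sigma

u100, a100 = normal_largest_params(100)
mean100 = u100 + 0.5772156649015329 / a100   # Eq. 4.30a
```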
In the case of the smallest value from samples of size n, the corresponding distributions, PDF and CDF, would shift to the left as n increases. Similar to the largest value, the
distribution of the smallest value will also converge (in distribution) to one of the three
types of asymptotic distributions depending on the tail behavior (in the direction of the
smallest value) of the PDF of the initial variate.
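► EXAMPLE 4.19 Consider an initial variate with the standard Gaussian distribution $N(0, 1)$. The tail behavior in the lower end of this PDF is obviously also exponential; thus, the CDF of the smallest value from this initial variate will also converge to the Type I asymptotic form as follows:
$$F_{Y_1}(y) = 1 - \exp\left[-e^{\alpha_1 (y - u_1)}\right]$$
and the corresponding PDF is
$$f_{Y_1}(y) = \alpha_1 e^{\alpha_1 (y - u_1)} \exp\left[-e^{\alpha_1 (y - u_1)}\right]$$
with $\alpha_1 = \sqrt{2 \ln n}$ and, by the symmetry of $N(0, 1)$ about zero, $u_1 = -u_n$, where $u_n$ is the parameter obtained in Example 4.18.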
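► EXAMPLE 4.20 If the initial variate $X$ is described by the Rayleigh distribution with PDF
$$f_X(x) = \frac{x}{\alpha^2} e^{-\frac{1}{2}(x/\alpha)^2}, \quad x \ge 0$$
in which $\alpha$ = the modal value. Although the tail behavior of this PDF may not be obvious, it is exponential (see Ang and Tang, Vol. 2, 1984), and thus the distribution of the largest value will converge to the Type I asymptotic form with parameters
$$u_n = \alpha\sqrt{2 \ln n}$$
and
$$\alpha_n = \frac{\sqrt{2 \ln n}}{\alpha}$$
Then, according to Eqs. 4.30a and 4.30b, the mean and standard deviation of $Y_n$ are, respectively,
$$\mu_{Y_n} = \alpha\sqrt{2 \ln n} + \frac{0.5772\,\alpha}{\sqrt{2 \ln n}} \quad \text{and} \quad \sigma_{Y_n} = \frac{\pi\alpha}{\sqrt{6}\,\sqrt{2 \ln n}}$$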
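A sketch of the Rayleigh case. The check relies on the usual defining property of the Type I parameter $u_n$, namely that the initial variate exceeds it with probability $1/n$; here $\exp[-u_n^2/(2\alpha^2)] = 1/n$ holds exactly. The function names are illustrative.

```python
import math

def rayleigh_largest(n, alpha):
    """Type I parameters and moments for the largest of n Rayleigh values with mode alpha."""
    root = math.sqrt(2.0 * math.log(n))
    u_n = alpha * root
    alpha_n = root / alpha
    mean = u_n + 0.5772156649015329 / alpha_n       # Eq. 4.30a
    std = math.pi / (math.sqrt(6.0) * alpha_n)      # square root of Eq. 4.30b
    return u_n, alpha_n, mean, std

def rayleigh_exceedance(x, alpha):
    """1 - F_X(x) = exp(-x^2 / (2 alpha^2)) for the Rayleigh variate."""
    return math.exp(-x * x / (2.0 * alpha * alpha))

u_n, alpha_n, mean_yn, std_yn = rayleigh_largest(100, 1.0)
```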
► EXAMPLE 4.21 If the initial variate $X$ is lognormal, with parameters $\lambda_X$ and $\zeta_X$, then $\ln X$ is normal with parameters $\mu = \lambda_X$ and $\sigma = \zeta_X$. Then, according to the results of Example 4.18, the largest value of $\ln X$ will converge asymptotically to the Type I distribution with parameters
$$u_n = \zeta_X\left(\sqrt{2 \ln n} - \frac{\ln \ln n + \ln 4\pi}{2\sqrt{2 \ln n}}\right) + \lambda_X \quad \text{and} \quad \alpha_n = \frac{\sqrt{2 \ln n}}{\zeta_X}$$
Therefore, according to the above logarithmic transformation, the largest value of the initial lognormal variate $X$ will converge to the Type II asymptotic form with parameters
$$v_n = e^{u_n} \quad \text{and} \quad k = \alpha_n$$
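The Weibull Distribution—For the Type III asymptotic form, the asymptotic distribution of the smallest value is of greater interest. In engineering, it is well known as the Weibull distribution (Weibull, 1951), which was proposed by Weibull for modeling the fracture strength of materials. Its CDF is given by
$$F_{Y_1}(y) = 1 - \exp\left[-\left(\frac{y - \varepsilon}{w_1 - \varepsilon}\right)^k\right], \quad y \ge \varepsilon \qquad (4.34)$$
where
$w_1$ = the characteristic smallest value (at which $F_{Y_1} = 1 - e^{-1}$)
$k$ = the shape parameter
$\varepsilon$ = the lower bound value of $y$
The mean and standard deviation of $Y_1$ are related to the above parameters as follows:
$$\mu_{Y_1} = \varepsilon + (w_1 - \varepsilon)\,\Gamma\!\left(1 + \frac{1}{k}\right) \quad \text{and} \quad \sigma_{Y_1} = (w_1 - \varepsilon)\left[\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma^2\!\left(1 + \frac{1}{k}\right)\right]^{1/2} \qquad (4.35)$$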
► EXAMPLE 4.22 Suppose the lower-bound fracture strength of a welded joint is 4.0 ksi. If the actual strength of the joint, $Y_1$, is modeled with a Type III smallest asymptotic distribution, Eq. 4.34, with parameters $w_1$ = 15.0 ksi and $k$ = 1.75, the probability that the strength of the joint will be at least 16.5 ksi is
$$P(Y_1 > 16.5) = \exp\left[-\left(\frac{16.5 - 4}{15 - 4}\right)^{1.75}\right] = 0.286$$
The mean and standard deviation of the joint strength are, according to Eq. 4.35, respectively,
$$\mu_{Y_1} = 4 + (15 - 4)\,\Gamma\!\left(1 + \frac{1}{1.75}\right) = 4 + 11\,\Gamma(1.5714) = 4 + 11(0.8906) = 13.80 \text{ ksi}$$
and
$$\sigma_{Y_1} = (15 - 4)\left[\Gamma\!\left(1 + \frac{2}{1.75}\right) - \Gamma^2\!\left(1 + \frac{1}{1.75}\right)\right]^{1/2} = 11\left[\Gamma(2.1429) - \Gamma^2(1.5714)\right]^{1/2} = 11\left[1.0691 - (0.8906)^2\right]^{1/2} = 5.78 \text{ ksi}$$
The values of the respective gamma functions indicated above were evaluated through MATLAB. ◄
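The same numbers can be obtained in Python, where `math.gamma` stands in for the MATLAB gamma function. This is a sketch of Eqs. 4.34 and 4.35 with illustrative function names.

```python
import math

def weibull_smallest_sf(y, w1, k, eps):
    """P(Y1 > y) = exp{-[(y - eps)/(w1 - eps)]^k}, from Eq. 4.34."""
    return math.exp(-(((y - eps) / (w1 - eps)) ** k))

def weibull_smallest_moments(w1, k, eps):
    """Mean and standard deviation of Y1, from Eq. 4.35."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return eps + (w1 - eps) * g1, (w1 - eps) * math.sqrt(g2 - g1 ** 2)

p_at_least = weibull_smallest_sf(16.5, w1=15.0, k=1.75, eps=4.0)
mean_y1, std_y1 = weibull_smallest_moments(w1=15.0, k=1.75, eps=4.0)
```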
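Mean and Variance of a Linear Function—Consider first the moments of the linear function
$$Y = aX + b$$
Its mean is $E(Y) = aE(X) + b$, and its variance is
$$\mathrm{Var}(Y) = E\left[(aX + b - a\mu_X - b)^2\right] = a^2 \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx = a^2\,\mathrm{Var}(X) \qquad (4.38)$$
Next, for the sum $Y = a_1X_1 + a_2X_2$, the mean is
$$E(Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (a_1x_1 + a_2x_2)\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = a_1 \int_{-\infty}^{\infty} x_1 f_{X_1}(x_1)\,dx_1 + a_2 \int_{-\infty}^{\infty} x_2 f_{X_2}(x_2)\,dx_2$$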
4.3 Moments of Functions of Random Variables 181
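We can recognize that the last two integrals above are, respectively, $E(X_1)$ and $E(X_2)$; hence, we have for the sum of two random variables
$$E(Y) = a_1E(X_1) + a_2E(X_2) \qquad (4.39)$$
$$\mathrm{Var}(Y) = a_1^2\,\mathrm{Var}(X_1) + a_2^2\,\mathrm{Var}(X_2) + 2a_1a_2\,\mathrm{Cov}(X_1, X_2) \qquad (4.40)$$
and, similarly, for the difference $Y = a_1X_1 - a_2X_2$,
$$E(Y) = a_1E(X_1) - a_2E(X_2) \qquad (4.41)$$
$$\mathrm{Var}(Y) = a_1^2\,\mathrm{Var}(X_1) + a_2^2\,\mathrm{Var}(X_2) - 2a_1a_2\,\mathrm{Cov}(X_1, X_2) \qquad (4.42)$$
If the variates $X_1$ and $X_2$ are statistically independent, $\mathrm{Cov}(X_1, X_2) = 0$; thus, Eqs. 4.40 and 4.42 become
$$\mathrm{Var}(Y) = a_1^2\,\mathrm{Var}(X_1) + a_2^2\,\mathrm{Var}(X_2)$$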
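The results we obtained above in Eqs. 4.39 through 4.42 can be extended to a general linear function of n random variables, such as
$$Y = \sum_{i=1}^{n} a_i X_i$$
in which the $a_i$'s are constants. For this general case, we obtain the following. The mean value of $Y$ is
$$E(Y) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_{X_i} \qquad (4.43)$$
and the variance is
$$\mathrm{Var}(Y) = \sum_{i=1}^{n} a_i^2 \sigma_{X_i}^2 + \sum_{i=1}^{n}\sum_{j \ne i} a_i a_j \rho_{ij}\,\sigma_{X_i}\sigma_{X_j} \qquad (4.44)$$
in which $\rho_{ij}$ is the correlation coefficient between $X_i$ and $X_j$, as defined in Eq. 3.72.
Moreover, if we have another linear function of the $X_i$'s,
$$Z = \sum_{i=1}^{n} b_i X_i$$
then the covariance between $Y$ and $Z$ is $\mathrm{Cov}(Y, Z) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i b_j \rho_{ij}\,\sigma_{X_i}\sigma_{X_j}$, with $\rho_{ii} = 1$.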
► EXAMPLE 4.23 The maximum load on a column of a high-rise reinforced concrete building may be composed of the dead load ($D$), the live load ($L$), and the earthquake-induced load ($E$). The total maximum load carried by the column would be $T = D + L + E$. Suppose the statistics of the individual load components are as follows:
$\mu_D$ = 2000 tons, $\sigma_D$ = 210 tons
$\mu_L$ = 1500 tons, $\sigma_L$ = 350 tons
$\mu_E$ = 2500 tons, $\sigma_E$ = 450 tons
If the three loads are statistically independent, i.e., $\rho_{ij} = 0$, the mean and standard deviation of the total load $T$, according to Eqs. 4.43 and 4.44, are
$$\mu_T = 2000 + 1500 + 2500 = 6000 \text{ tons}; \quad \sigma_T = \sqrt{210^2 + 350^2 + 450^2} = 607.5 \text{ tons}$$
However, the dead load, $D$, and the earthquake load, $E$, may be correlated, say with a correlation coefficient of $\rho_{DE} = 0.5$, whereas the live load $L$ is uncorrelated with $D$ and $E$. Then, the corresponding variance would be, according to Eq. 4.44,
$$\sigma_T^2 = 210^2 + 350^2 + 450^2 + 2(0.50)(210)(450) = 463{,}600 \text{ tons}^2$$
Assuming that all the variables are Gaussian, and therefore that the difference $C - T$ is also Gaussian, the probability of overloading the column would be
$$P(C - T < 0) = \Phi\left(\frac{0 - 4000}{1618}\right) = \Phi(-2.47) = 1 - \Phi(2.47) = 1 - 0.9932 = 0.007$$
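Eqs. 4.43 and 4.44 amount to summing $a_i a_j \rho_{ij} \sigma_i \sigma_j$ over a correlation matrix. The sketch below applies that to the load statistics of this example, both for independent loads and for $\rho_{DE} = 0.5$. It does not recompute the overloading probability, because the capacity statistics for $C$ are not restated in this section.

```python
import math

def linear_combination_moments(coeffs, means, sds, corr):
    """Mean and variance of Y = sum a_i X_i (Eqs. 4.43 and 4.44).

    corr[i][j] is the correlation coefficient rho_ij, with corr[i][i] = 1.
    """
    mean = sum(a * m for a, m in zip(coeffs, means))
    n = len(coeffs)
    var = sum(coeffs[i] * coeffs[j] * corr[i][j] * sds[i] * sds[j]
              for i in range(n) for j in range(n))
    return mean, var

a = [1.0, 1.0, 1.0]                          # T = D + L + E
mu = [2000.0, 1500.0, 2500.0]                # tons
sd = [210.0, 350.0, 450.0]                   # tons
independent = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rho_de = [[1, 0, 0.5], [0, 1, 0], [0.5, 0, 1]]   # D and E correlated

mean_t, var_ind = linear_combination_moments(a, mu, sd, independent)
_, var_cor = linear_combination_moments(a, mu, sd, rho_de)
```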
► EXAMPLE 4.24 In the event of an earthquake of intensity $I$ = 5, the average economic losses in the central area of a city are estimated to be as follows, with respective standard deviations:
• Property losses—$\mu_P$ = \$2.5 million; $\sigma_P$ = \$1.5 million
• Loss of business—$\mu_B$ = \$6.0 million; $\sigma_B$ = \$2.5 million
• Cost of injuries—$\mu_J$ = \$4.0 million; $\sigma_J$ = \$2.0 million
Assume that the loss of lives is negligible, and among the three cost items above the loss of business is positively correlated with the property losses, with a correlation coefficient of 0.70, i.e., $\rho_{BP}$ = 0.70, whereas the cost of injuries is uncorrelated with the other losses.
Assume also that each of the average earthquake losses indicated above varies with $I^2$, and the possible intensities in the area during an earthquake are $I$ = 4, 5, and 6 with relative likelihoods of 2:1:1, respectively, whereas the standard deviation is invariant with $I$.
The total loss during an earthquake is
$$T = P + B + J$$
so that $E(T \mid I = 5) = 2.5 + 6.0 + 4.0 = 12.5$, and, scaling with $I^2$,
$$E(T \mid I = 4) = 12.5\left(\frac{4}{5}\right)^2 = 8.00; \quad E(T \mid I = 6) = 12.5\left(\frac{6}{5}\right)^2 = 18.00$$
Hence, with probabilities 0.50, 0.25, and 0.25 for $I$ = 4, 5, and 6, the mean total loss is $E(T) = 0.50(8.00) + 0.25(12.5) + 0.25(18.00) = \$11.63$ million. For any given intensity, the variance of the total loss is $\sigma_T^2 = 1.5^2 + 2.5^2 + 2.0^2 + 2(0.70)(1.5)(2.5) = 17.75$; hence, the standard deviation of the total loss is $\sigma_T = \sqrt{17.75}$ = \$4.21 million ◄
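A small sketch of the bookkeeping in this example. Like the text, it treats the standard deviation as conditional on a given intensity. If the randomness of $I$ itself were also counted, the law of total variance would add the spread of the conditional means, so the unconditional variance would be larger than 17.75.

```python
import math

# Mean losses at intensity I = 5 ($ million); means scale with I**2, standard deviations do not
base_means = {"P": 2.5, "B": 6.0, "J": 4.0}
sds = {"P": 1.5, "B": 2.5, "J": 2.0}
rho_bp = 0.70                                    # the only correlated pair
intensity_probs = {4: 0.50, 5: 0.25, 6: 0.25}    # relative likelihoods 2:1:1

mean_given_i = {i: sum(base_means.values()) * (i / 5) ** 2 for i in intensity_probs}
mean_total = sum(p * mean_given_i[i] for i, p in intensity_probs.items())

var_given_i = sum(s ** 2 for s in sds.values()) + 2 * rho_bp * sds["B"] * sds["P"]
sd_given_i = math.sqrt(var_given_i)
```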
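For a nonlinear function of a single random variable,
$$Y = g(X)$$
approximate moments may be obtained by expanding $g(X)$ in a Taylor series about the mean value $\mu_X$:
$$g(X) = g(\mu_X) + (X - \mu_X)\frac{dg}{dX} + \frac{1}{2}(X - \mu_X)^2\frac{d^2g}{dX^2} + \cdots$$
where the derivatives are evaluated at $\mu_X$. Truncating the series at the linear term,
$$g(X) \simeq g(\mu_X) + (X - \mu_X)\frac{dg}{dX}$$
we obtain the first-order approximate mean and variance of $Y$ as
$$E(Y) \simeq g(\mu_X) \qquad (4.46)$$
and
$$\mathrm{Var}(Y) \simeq \mathrm{Var}(X - \mu_X)\left(\frac{dg}{dX}\right)^2 = \mathrm{Var}(X)\left(\frac{dg}{dX}\right)^2 \qquad (4.47)$$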
We should observe that if the function $g(X)$ is approximately linear (i.e., not highly nonlinear) over the entire range of $X$, Eqs. 4.46 and 4.47 should yield good approximations of the exact mean and variance of $g(X)$ (Hald, 1952). Moreover, when $\mathrm{Var}(X)$ is small relative to $g(\mu_X)$, the above approximations should be adequate even when the function $g(X)$ is nonlinear.
The above first-order approximations may be successively improved by including the higher-order terms of the Taylor series expansion; for example, if we include the second-order term in the series, we can show that the corresponding second-order approximations are as follows:
$$E(Y) \simeq g(\mu_X) + \frac{1}{2}\mathrm{Var}(X)\frac{d^2g}{dX^2} \qquad (4.48)$$
and
$$\mathrm{Var}(Y) \simeq \mathrm{Var}(X)\left(\frac{dg}{dX}\right)^2 - \frac{1}{4}\left[\mathrm{Var}(X)\right]^2\left(\frac{d^2g}{dX^2}\right)^2 + E(X - \mu_X)^3\,\frac{dg}{dX}\frac{d^2g}{dX^2} + \frac{1}{4}E(X - \mu_X)^4\left(\frac{d^2g}{dX^2}\right)^2 \qquad (4.49)$$
We observe that the second-order approximate variance of $Y$ as indicated in Eq. 4.49 involves the third and fourth central moments of the original variate $X$, which are seldom evaluated in practical applications.
For practical purposes, the mean value is of first importance; thus, we may use the second-order mean of the function $Y$, Eq. 4.48, with its first-order variance of Eq. 4.47. In this way, we obtain an improved mean value of $Y$ without involving more than the mean and variance of $X$.
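The recommended combination (the first-order variance of Eq. 4.47 with the second-order mean of Eq. 4.48) can be sketched for any smooth $g$ by estimating the derivatives with central differences. The step-size rule here is an assumption, not from the text. The check uses $g(X) = X^2$, for which Eq. 4.48 is exact because $E(X^2) = \mu_X^2 + \mathrm{Var}(X)$.

```python
def approx_moments(g, mean_x, var_x, h=None):
    """First-order mean and variance (Eqs. 4.46, 4.47) and second-order mean (Eq. 4.48).

    Derivatives of g at mean_x are estimated with central differences of step h.
    """
    if h is None:
        h = 1e-4 * max(abs(mean_x), 1.0)
    g0 = g(mean_x)
    dg = (g(mean_x + h) - g(mean_x - h)) / (2.0 * h)
    d2g = (g(mean_x + h) - 2.0 * g0 + g(mean_x - h)) / h ** 2
    mean_first = g0
    var_first = var_x * dg ** 2
    mean_second = g0 + 0.5 * var_x * d2g
    return mean_first, var_first, mean_second

m1, v1, m2 = approx_moments(lambda x: x ** 2, mean_x=3.0, var_x=0.04)
```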
► EXAMPLE 4.25 The maximum impact pressure (in psf) of ocean waves on coastal structures may be determined by
where U is the random horizontal velocity of the advancing wave, with a mean of 4.5 fps and a c.o.v.
of 20%. The other parameters are all constants as follows:
The first-order mean and standard deviation of $p_m$, according to Eqs. 4.46 and 4.47, are
For an improved mean value, we evaluate the second-order mean with Eq. 4.48 as follows:
$$E(p_m) \simeq 3750.7 + \frac{1}{2}(0.20 \times 4.5)^2\,\frac{d^2p_m}{dU^2}$$
► EXAMPLE 4.26 Hypothetically, suppose the average numbers of commercial airplanes arriving over Chicago O'Hare Airport during the peak hour from various major cities in the United States, and the corresponding standard deviations, are as follows:
City | Mean arrivals | Standard deviation
New York | 10 | 4
Miami | 6 | 2
Los Angeles | 10 | 5
Washington, DC | 12 | 4
San Francisco | 8 | 4
Dallas | 10 | 3
Seattle | 5 | 2
Other U.S. cities | 15 | 6
The total average number of arrivals is 76 planes, and the standard deviation of the total arrivals
(assuming the arrivals from the different cities are statistically independent) is 11.22 planes.
Now, suppose that the holding time, T (in minutes), is an empirical function of the total arrivals
as follows:
T = 3/N~A, in minutes
in which NA is the total number of arrivals during the peak hour. Then, by first-order approximation,
the mean holding time would be
ar = 4.65 min
With Eq. 4.48, we obtain the corresponding second-order mean holding time:
In this case, the first-order mean is fairly accurate; it is almost the same as the second-order
mean. ◄
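For a function of multiple random variables,
$$Y = g(X_1, X_2, \ldots, X_n)$$
the Taylor series expansion about the mean values gives
$$Y = g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n}) + \sum_{i=1}^{n} (X_i - \mu_{X_i})\frac{\partial g}{\partial X_i} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (X_i - \mu_{X_i})(X_j - \mu_{X_j})\frac{\partial^2 g}{\partial X_i\,\partial X_j} + \cdots$$
where the derivatives are all evaluated at $\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n}$.
If we truncate the above series at the linear terms, i.e.,
$$Y \simeq g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n}) + \sum_{i=1}^{n} (X_i - \mu_{X_i})\frac{\partial g}{\partial X_i}$$
we obtain the first-order approximate mean and variance of $Y$ as
$$E(Y) \simeq g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n}) \qquad (4.50)$$
and
$$\mathrm{Var}(Y) \simeq \sum_{i=1}^{n} \sigma_{X_i}^2\left(\frac{\partial g}{\partial X_i}\right)^2 + \sum_{i=1}^{n}\sum_{j \ne i} \rho_{ij}\,\sigma_{X_i}\sigma_{X_j}\,\frac{\partial g}{\partial X_i}\frac{\partial g}{\partial X_j} \qquad (4.51)$$
We observe that if $X_i$ and $X_j$ are uncorrelated (or statistically independent) for all $i$ and $j$, i.e., $\rho_{ij} = 0$, then Eq. 4.51 becomes
$$\mathrm{Var}(Y) \simeq \sum_{i=1}^{n} \sigma_{X_i}^2\left(\frac{\partial g}{\partial X_i}\right)^2 \qquad (4.51a)$$
Similarly, if the second-order terms are retained, the second-order approximate mean is
$$E(Y) \simeq g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n}) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \rho_{ij}\,\sigma_{X_i}\sigma_{X_j}\,\frac{\partial^2 g}{\partial X_i\,\partial X_j} \qquad (4.52)$$
in which $\rho_{ii} = 1$ and the derivatives are evaluated at the mean values of the $X_i$'s. Again, if $X_i$ and $X_j$ are uncorrelated, Eq. 4.52 becomes
$$E(Y) \simeq g(\mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n}) + \frac{1}{2}\sum_{i=1}^{n} \sigma_{X_i}^2\,\frac{\partial^2 g}{\partial X_i^2} \qquad (4.52a)$$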
► EXAMPLE 4.27 According to the Manning equation, the velocity of uniform flow, in fps, in an open channel is
$$V = \frac{1.49}{n} R^{2/3} S^{1/2}$$
where:
$S$ = slope of the energy line, in %
$R$ = the hydraulic radius, in ft
$n$ = the roughness coefficient of the channel
For a rectangular open channel with concrete surface, assume the following mean values and corresponding c.o.v.s:
Variable | Mean | c.o.v.
S | 1% | 0.10
R | 2 ft | 0.05
n | 0.013 | 0.30
Assuming that the above random variables are statistically independent, the first-order mean and variance of the velocity $V$ are, respectively,
$$\mu_V \simeq \frac{1.49}{0.013}(2)^{2/3}(1)^{1/2} = 182 \text{ fps}$$
and
$$\sigma_V^2 \simeq (0.10 \times 1)^2\left[\frac{1.49}{2 \times 0.013}(2)^{2/3}(1)^{-1/2}\right]^2 + (0.05 \times 2)^2\left[\frac{2 \times 1.49}{3 \times 0.013}(1)^{1/2}(2)^{-1/3}\right]^2 + (0.30 \times 0.013)^2\left[-1.49(2)^{2/3}(1)^{1/2}(0.013)^{-2}\right]^2 = 82.79 + 36.80 + 2979.2 = 3098.8$$
so that $\sigma_V \approx$ 55.7 fps; the uncertainty in the roughness coefficient $n$ dominates. With Eq. 4.52a, the second-order mean is $E(V) \simeq 182 + 16.1 = 198$ fps. The first-order approximate mean velocity is thus about 8% lower than the corresponding second-order mean velocity. ◄
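The Manning calculation can be reproduced with analytic partial derivatives of $V = (1.49/n)R^{2/3}S^{1/2}$. This sketch evaluates the three first-order variance terms of Eq. 4.51a and the second-order mean of Eq. 4.52a at the tabulated means.

```python
mu_s, mu_r, mu_n = 1.0, 2.0, 0.013                 # means (S in %, R in ft)
sd_s, sd_r, sd_n = 0.10 * 1.0, 0.05 * 2.0, 0.30 * 0.013

def velocity(s, r, n):
    """Manning equation, V = (1.49/n) R^(2/3) S^(1/2)."""
    return 1.49 / n * r ** (2 / 3) * s ** 0.5

# Analytic first and second partial derivatives at the means
dv_ds = 1.49 / mu_n * mu_r ** (2 / 3) * 0.5 * mu_s ** -0.5
dv_dr = 1.49 / mu_n * (2 / 3) * mu_r ** (-1 / 3) * mu_s ** 0.5
dv_dn = -1.49 / mu_n ** 2 * mu_r ** (2 / 3) * mu_s ** 0.5
d2v_ds2 = 1.49 / mu_n * mu_r ** (2 / 3) * (-0.25) * mu_s ** -1.5
d2v_dr2 = 1.49 / mu_n * (-2 / 9) * mu_r ** (-4 / 3) * mu_s ** 0.5
d2v_dn2 = 2 * 1.49 / mu_n ** 3 * mu_r ** (2 / 3) * mu_s ** 0.5

mean_1st = velocity(mu_s, mu_r, mu_n)                              # Eq. 4.50
terms = [(sd_s * dv_ds) ** 2, (sd_r * dv_dr) ** 2, (sd_n * dv_dn) ** 2]
var_1st = sum(terms)                                               # Eq. 4.51a
mean_2nd = mean_1st + 0.5 * (sd_s ** 2 * d2v_ds2 + sd_r ** 2 * d2v_dr2
                             + sd_n ** 2 * d2v_dn2)                # Eq. 4.52a
```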
$$S = \frac{M}{Z} + \frac{P}{A}$$
where:
$M$ = applied bending moment
$P$ = applied axial force
$A$ = cross-sectional area of the beam
$Z$ = section modulus of the beam
$M$, $Z$, and $P$ are random variables with respective means and c.o.v.s as follows:
Assume that $M$ and $P$ are correlated with a correlation coefficient of $\rho_{M,P}$ = 0.75, whereas $Z$ is statistically independent of $M$ and $P$. We determine the mean and standard deviation of the applied stress $S$ in the beam by first-order approximation as follows:
Mean value:
$$\mu_S \simeq \frac{\mu_M}{\mu_Z} + \frac{\mu_P}{A} = \frac{45{,}000}{100} + \frac{5000}{50} = 550 \text{ psi}$$
and variance: $\sigma_S^2 \simeq 10{,}900$ psi², so that $\sigma_S$ = 104.40 psi. Then,
$$\zeta_S^2 = \ln\left(1 + \frac{104.40^2}{550^2}\right) = 0.0354; \quad \zeta_{S_c}^2 \simeq \left(\frac{110}{800}\right)^2 \simeq (0.14)^2$$
and
$$\lambda_S = \ln 550 - \tfrac{1}{2}(0.0354) = 6.29; \quad \lambda_{S_c} = \ln 800 - \tfrac{1}{2}(0.14)^2 = 6.67$$
The safety factor of the beam is defined as $\theta = S_c/S$. As $S_c$ and $S$ are both lognormal variates, the safety factor $\theta$ is also lognormal with the parameters
$$\lambda_\theta = \lambda_{S_c} - \lambda_S = 6.67 - 6.29 = 0.38$$
and
$$\zeta_\theta = \sqrt{0.0354 + (0.14)^2} = 0.23$$
The beam will be overstressed when $\theta < 1.0$; therefore, the probability of this event is
$$P(\theta < 1.0) = \Phi\left(\frac{\ln 1.0 - \lambda_\theta}{\zeta_\theta}\right) = \Phi\left(\frac{0 - 0.38}{0.23}\right) = 1 - \Phi(1.65) = 1 - 0.950 = 0.050$$
That is, there is a 5% chance that the beam will be overstressed under the applied load.
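The lognormal safety-factor calculation can be sketched as below. It starts from the first-order mean and variance of $S$ found above and, like the text, treats $S$ and $S_c$ as independent lognormals. Unlike the text, it uses the exact $\zeta^2 = \ln(1 + \delta^2)$ for the capacity instead of $(0.14)^2$; the answer is essentially unchanged.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_s, var_s = 550.0, 10_900.0      # applied stress S (psi), first-order values
mean_c, sd_c = 800.0, 110.0          # capacity S_c (psi)

zeta_s_sq = math.log(1.0 + var_s / mean_s ** 2)
zeta_c_sq = math.log(1.0 + (sd_c / mean_c) ** 2)   # exact form; the text uses (0.14)^2
lam_s = math.log(mean_s) - 0.5 * zeta_s_sq
lam_c = math.log(mean_c) - 0.5 * zeta_c_sq

lam_theta = lam_c - lam_s                          # theta = S_c / S
zeta_theta = math.sqrt(zeta_s_sq + zeta_c_sq)      # S and S_c independent
p_over = std_normal_cdf((math.log(1.0) - lam_theta) / zeta_theta)
```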
► EXAMPLE 4.29 A two-span bridge across a 300-m-wide river is to be constructed with a midspan pier about 150 m from one bank of the river. In order to locate the center position of the pier, a base line $B$ is established along one bank as shown in Fig. E4.29. The position of the pier is to be determined by intersecting the lines of sight from Stations a and b, with the angle at Station a fixed at 90 degrees.
The proposed position of the center point of the pier is to be located 150 m from the base line, which has been measured to have a mean distance of $\bar{B}$ = 200 m and a standard error of $\sigma_B$ = 2 cm. Also, the angle $\theta_1$ measured from Station b has a mean measurement of 36°52′ and a standard error of 2′.
The first-order approximate mean and standard error of the required distance $D$ from the base line to the pier location are as follows (in measurement theory, the standard deviation of the mean measurement is called the "standard error"):
$$\bar{D} \simeq \bar{B}\tan\bar{\theta}_1 = 200\tan 36°52' = 149.98 \text{ m}$$
and
$$\sigma_D^2 \simeq \sigma_B^2(\tan\theta_1)^2 + (B\sec^2\theta_1)^2\sigma_{\theta_1}^2 = (0.02)^2(\tan 36°52')^2 + (200\sec^2 36°52')^2(5.818 \times 10^{-4})^2 = 0.0333 \text{ m}^2$$
in which $\sigma_{\theta_1}$ = 2′ = 5.818 × 10⁻⁴ rad; hence, $\sigma_D \approx$ 0.18 m.
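A numerical check of the survey example, as a sketch. It converts the angle and its standard error to radians and applies the first-order formulas with $B$ and $\theta_1$ treated as independent.

```python
import math

b_mean, b_se = 200.0, 0.02                       # base line B and its standard error (m)
theta = math.radians(36.0 + 52.0 / 60.0)         # 36 deg 52 min
theta_se = math.radians(2.0 / 60.0)              # 2 minutes of arc, in radians

d_mean = b_mean * math.tan(theta)                # first-order mean of D = B tan(theta)
dD_dB = math.tan(theta)
dD_dtheta = b_mean / math.cos(theta) ** 2        # B sec^2(theta)
d_var = (b_se * dD_dB) ** 2 + (theta_se * dD_dtheta) ** 2   # B and theta independent
d_se = math.sqrt(d_var)
```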
Role of Monte Carlo Simulations—As we have observed in Sect. 4.2, the PDF of a function of a single variable can be derived analytically without much difficulty. However, in the case of a function of multiple random variables, the derivation of its probability distribution can be a formidable task, with some exceptions, such as the sum of multiple Gaussian variates or the product/quotient of multiple lognormal variates, as we saw in Sect. 4.2.2. Except for these special cases, we may have to resort to Monte Carlo simulations or other numerical methods to generate, at least approximately, the probability distribution of the function. Take, for example, the sum of two lognormal random variables; the distribution of the sum is neither normal nor lognormal. Also, the probability distribution of the product of two normal random variables is neither lognormal nor normal. In such cases, if information on the probability distribution of the function is required, a numerical method or Monte Carlo simulation is clearly necessary and provides a practical tool. The basics of Monte Carlo simulation are presented and illustrated in Chapter 5.
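A minimal sketch of the lognormal-sum case, with hypothetical parameters $\lambda = 0$ and $\zeta = 1$ for two independent variates. The exact moments show that the sum is not normal (its skewness is far from zero) and not lognormal (a lognormal with the same mean and c.o.v. would have a smaller skewness, $3\delta + \delta^3$). A seeded simulation using `random.lognormvariate` then estimates quantities with no closed form, such as $P(S > 10)$.

```python
import math
import random
import statistics

ZETA = 1.0                      # hypothetical: X1, X2 lognormal with lambda = 0, zeta = 1
w = math.exp(ZETA ** 2)

def lognormal_skewness(w):
    """Skewness of a lognormal variate, written in terms of w = exp(zeta^2)."""
    return (w + 2.0) * math.sqrt(w - 1.0)

# Exact moments of S = X1 + X2
mean_s = 2.0 * math.exp(0.5 * ZETA ** 2)
var_s = 2.0 * w * (w - 1.0)
skew_s = lognormal_skewness(w) / math.sqrt(2.0)   # skewness of a sum of two iid variates

# A lognormal with the same mean and c.o.v. as S would have skewness 3*cov + cov^3
cov_s = math.sqrt(var_s) / mean_s
skew_matched = 3.0 * cov_s + cov_s ** 3

# Monte Carlo estimates for S (seeded for reproducibility)
rng = random.Random(42)
samples = [rng.lognormvariate(0.0, ZETA) + rng.lognormvariate(0.0, ZETA)
           for _ in range(200_000)]
mc_mean = statistics.fmean(samples)
mc_p_exceed_10 = sum(s > 10.0 for s in samples) / len(samples)
```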
► PROBLEMS
4.1 Suppose an engineering variable Y is an exponential function of a random variable X as follows:
Y = e^X
and X is normally distributed as N(2, 0.4). Derive the PDF of Y and show that it follows a lognormal distribution. Ans. f_Y(y) = …
4.2 The absolute velocity (X) of particles in a gas follows a Maxwell distribution, with the PDF
f_X(x) = …, if x > 0
f_X(x) = 0, otherwise
where a is a constant. Determine the PDF f_Y(y) for the particle kinetic energy Y = ½mX², where m is the mass of a particle.
4.3 The sources of electrical power for a region are nuclear, fossil, and hydroelectric. The respective generating capacities of these sources can be described as independent Gaussian random variables as follows (in megawatt power):
Nuclear: N(100, 15)
Fossil: N(200, 40)
Hydro: N(400, 100)
(a) Determine the total power supply for the region; i.e., define the probability distribution of the power supply with the corresponding mean and standard deviation.
Problems 191
(b) The power demand of the region during normal weather is 400 megawatt, whereas during extreme weather the demand would be 600 megawatt. In any given year, normal weather is twice as likely to occur as extreme weather. With the generating capacity of (a), what is the probability that power shortage for the region (i.e., demand exceeding supply) will occur during the year?
(c) If power shortage should occur during a given year, what is the probability that it will occur during normal weather?
(d) Suppose that the three power sources are assigned to supply the following respective percentages of the total power demand:
Nuclear = 15% of demand; Fossil = 30% of demand; and Hydro = 55% of demand
During normal weather, what is the probability that at least one of the three sources will be unable to supply their respective allocations? Assume statistical independence.
4.4 Bob and John are traveling from city A to city D. Bob decides to take the upper route (through B), whereas John takes the lower route (through C) as shown in the following figure:
where A, B, and C are, respectively, the thickness (in m) of the three layers of soil. Suppose A, B, and C are modeled as independent normal random variables as follows:
A ~ N(5, 1)
B ~ N(8, 2)
C ~ N(7, 1)
(a) Determine the probability that the settlement will exceed 4 cm. (Ans. 0.347)
(b) If the total thickness of the three layers is known exactly as 20 m, and furthermore, thickness A and B are correlated with a correlation coefficient equal to 0.5, determine the probability that the settlement will exceed 4 cm. (Ans. 0.282)
4.7 A city is located downstream of the confluence point of two rivers as shown below. The annual maximum flood peak in River-1 has an average of 35 m³/sec with the standard deviation of 10 m³/sec, whereas in River-2 the mean peak flow rate is 25 m³/sec and the standard deviation is 10 m³/sec. The annual maximum peak flow rates in both rivers are normally distributed with a correlation coefficient of 0.5. Presently, the channel which runs through the city can accommodate up to 100 m³/sec flow rate without flooding the city.
the total number of fibers across the crack. Suppose C = N(30, 5); each F_i is N(5, 3) and N is a discrete random variable with the following PMF:
P_N(n)
Fi 30 0.30
t2 20 0.20
t3 40 0.30
of the passengers will arrive at the shopping center during rush hours in less than 1 hour?
(d) A passenger starting from Town B has an appointment at 3:00 pm at the shopping center. If the bus left Town B at 2:00 pm but has not arrived at the shopping center at 2:45 pm, what is the probability that he or she will arrive on time for his/her appointment?
at least 400 ft³? (Hint: assume the inflow rate into each pipe is constant during that minute.)
(c) Suppose the municipal waste water is projected to increase at a rate of 3 cfs per year, and pipe 5 has a capacity of 70 cfs. If the design criterion is that the probability of overflow at pipe 5 (total inflow exceeds capacity) after a storm should be less than 0.05, how many years will the current pipe 5 remain adequate, i.e., before a larger size pipe is needed?
4.10 The settlement of each footing shown in the figure below follows a normal distribution with a mean of 2 in. and a coefficient of variation of 30%. Suppose the settlements between two adjacent footings are correlated with a correlation coefficient of 0.7; that is, the differential settlement is
D = S1 − S2
where S1 and S2 are the settlements of footings 1 and 2, respectively.
(a) Determine the mean and variance of D.
(b) What is the probability that the magnitude of the differential settlement, i.e., |D|, will be less than 0.5 in.?
4.12 The flows in two tributaries 1 and 2 combine to form stream 3 as shown in the figure below. Suppose the concentration of pollutants in tributary 1 is X, which is normally distributed as N(20, 4) parts per unit volume (puv), whereas that in tributary 2 is Y, which is N(15, 3) puv. The flow rate in tributary 1 is 600 cubic feet per sec (cfs) and that in 2 is 400 cfs. Hence, the concentration of pollutants in the stream is
(600X + 400Y)/(600 + 400) = 0.6X + 0.4Y
and a c.o.v. of 40%, whereas the total demand is expected to follow a lognormal distribution with a mean of 1.5 million gallons with a c.o.v. of 10%. What is the probability of a water shortage in this city over the next month?
4.15 A large parabolic antenna is designed against wind load. During a wind storm, the maximum wind-induced pressure on the antenna, P, is computed as
P = ½CRV²
where C = drag coefficient; R = air mass density in slugs/ft³; V = maximum wind speed in ft/sec; and P = pressure in lb/ft². C, R, and V are statistically independent lognormal variates with the following respective means and c.o.v.'s:
μ_C = 1.80; δ_C = 0.20
μ_R = 2.3 × 10⁻³ slugs/ft³; δ_R = 0.10
μ_V = 120 ft/sec; δ_V = 0.45
(a) Determine the probability distribution of the maximum wind pressure P and evaluate its parameters.
(b) What is the probability that the maximum wind pressure will exceed 30 lb/ft²?
(c) The actual wind resistance capacity of the antenna is also a lognormal random variable with a mean of 90 lb/ft² and a c.o.v. of 0.15. Failure in the antenna will occur whenever the maximum applied wind pressure exceeds its wind resistance capacity. During a wind storm, what is the probability of failure of the antenna?
(d) If the occurrences of wind storms in (c) constitute a Poisson process with a mean occurrence rate of once every 5 years, what is the probability of failure of the antenna in 25 years?
(e) Suppose five antennas were built and installed in a given region. What is the probability that at least two of the five antennas will not fail in 25 years? Assume that failures between antennas are statistically independent.
4.16 A pile is designed to have a mean capacity of 20 tons; however, because of variabilities, the pile capacity is lognormally distributed with a coefficient of variation (c.o.v.) of 20%. Suppose the pile is subject to a maximum lifetime load that is also lognormally distributed with a mean of 10 tons and a c.o.v. of 30%. Assume that pile load and capacity are statistically independent.
(ii) Will the random variable T follow a normal, or lognormal, or some other probability distribution? Provide an explanation.
4.17 A corner column of a building is supported on a pile group consisting of two piles. The capacity of each pile is the sum of two independent contributions, namely, the frictional resistance, F, along the pile length and the end bearing resistance, B, at the pile tip. Suppose F and B are both normally distributed with mean values of 20 and 30 tons and coefficients of variation of 0.2 and 0.3, respectively. The capacities between the two piles are correlated with a correlation coefficient of ρ = 0.8.
(a) Determine the mean and c.o.v. of the capacity of one pile.
(b) Determine the mean and c.o.v. of the total capacity of the pile group.
(c) If the maximum load applied on the pile group is also normally distributed with a mean value of 50 tons and a c.o.v. of 0.3, what is the probability of failure of this pile group?
4.18 Consider a tower that has leaned to an inclination of 18°. The tower continues to lean at an annual increment A, which is normally distributed with a mean value of 0.1° and a c.o.v. of 30%.
(a) Assume that the incremental amounts of additional leaning between years are statistically independent. Determine the probability that the final inclination of the tower after 16 years will exceed 20°.
(b) Suppose it is believed that the maximum inclination, M, before the tower collapses (i.e., the leaning capacity of the tower) is itself a normal random variable with a mean of 20° and a c.o.v. of 1%.
(i) What is the probability that the tower will not collapse within the next 16 years?
(ii) An alternative assumption, instead of statistical independence between years as in part (a), is that the tower will have the same incremental leaning in each year in the future. What then is the probability that the tower will not collapse after 16 years?
4.19 Five cylindrical tanks are used to store oil; each of the tanks is shown below. The weight, W, of each tank and its contents is 200 kips. When subjected to an earthquake, the horizontal inertial force may be calculated as
F = (W/g)a
where:
g = acceleration of gravity = 32.2 ft/sec²
a = maximum horizontal acceleration of the earthquake
(a) Determine the probability of failure of the pile.
(b) A number of piles may be tied together to form a pile group
to resist an external load. Suppose the capacity of the pile group
is the sum of the capacities of the individual piles. Consider a
pile group that consists of two single piles as described above.
Because of proximity, the capacities between the two piles are
correlated with a correlation coefficient of 0.8. Let T denote the
capacity of this pile group.
(i) Determine the mean value and c.o.v. of T.
Problems 195
During an earthquake, the frictional force at the base of a tank keeps it from sliding. The coefficient of friction between the tank and the base support is a lognormal variate with a median value of 0.40 and a c.o.v. of 0.20. Also, during an earthquake assume that the maximum ground acceleration is a lognormal variate with a mean of 0.30g and a c.o.v. of 0.25.
(a) What is the probability that during an earthquake a tank will slide from its base support?
(b) Of the five tanks, what is the probability that none of them will slide during an earthquake, assuming that the conditions between the tanks are statistically independent?
4.20 The time T in minutes for each car to clear the toll station at Lion Rock Tunnel is exponentially distributed with a mean value of 5 sec. What is the probability that a line of 50 cars waiting to pay toll at this tunnel can be completely served in less than 3.5 min?
4.21 Consider a long cantilever structure shown in the figure below, which is loaded by a force F at a distance X from the fixed end A. Suppose F and X are independent lognormal random variables with mean values of 0.2 kips and 10 ft., respectively. The c.o.v.s are 20% and 30%, respectively.
(a) Determine the probability that the induced bending moment at A will exceed 3 ft-kip. (Ans. 0.093)
(b) Suppose there are 50 forces, each acting at a random location along the beam. Each force has a lognormal distribution with a mean of 0.2 kip and c.o.v. of 20%, and the location of each force is also lognormal with a mean of 10 ft. from A and a c.o.v. of 30%. Assume statistical independence between individual values of the 50 forces and also between individual locations of the 50 forces. Determine the probability that the overall induced bending moment at A will exceed 120 ft-kip. (Ans. 5.5 × 10⁻⁵)
4.22 The salaries of Assistant Engineers (AEs) at a large engineering firm are uniformly distributed between $30,000 and $50,000 per year.
(a) What is the probability that a randomly selected AE at this firm will have an annual salary exceeding $40,000?
(b) If 50 AEs are selected at random from this firm, find the probability that their average salary will exceed $40,000.
(c) Why do the answers from (a) and (b) differ significantly? Elaborate.
4.23 Suppose the traffic on a long-span bridge consists of two types of vehicles: cars and trucks. The weight of each car has a mean of 4 kips and standard deviation of 1 kip, whereas the weight of a truck has a mean of 20 kips and standard deviation of 5 kips. Assume that the weight of each vehicle is lognormally distributed and the weights between vehicles are statistically independent. There are currently 100 cars and 30 trucks on the bridge.
(a) Determine the mean and c.o.v. of the total vehicle weight on the bridge.
(b) Estimate the probability that the total vehicle weight exceeds 1200 kips. State any assumption that you use.
(c) Suppose the total dead load of the bridge is normally distributed with a mean of 1200 kips and a c.o.v. of 10%.
(i) What is the probability that the total vehicle load will exceed the total dead load?
(ii) What is the probability that the sum of total dead and vehicle load will exceed 2500 kips?
4.24 "5-star" brand cement is shipped in batches, in which each batch contains 40 bags. Previous records show that the weight of a randomly selected bag of this brand of cement has a mean of 2.5 kg and a standard deviation of 0.1 kg, but its exact PDF is unknown.
(a) What is the mean weight of one batch of 5-star brand cement?
(b) Suppose the shipping company charges a penalty if a batch exceeds its mean weight by more than 1 kg. What is the probability that a batch of 5-star brand cement will receive a penalty?
(c) Suppose the standard deviation of each bag is changed to 1 kg but all other parameters remain the same. What is the probability of a penalty now? Comment on whether or not the increased standard deviation is desirable.
4.25 From an extensive survey of live floor loads of office buildings, it was determined that the sustained EUDL (equivalent uniformly distributed load) intensity may be modeled with a lognormal distribution with a mean EUDL of 12 psf and a c.o.v. of 30%. Over the life of a building, the live floor load may fluctuate because of changes in occupancy and use of the floor space.
(a) Assuming an average rate of occupancy change of once every 2 years, determine the exact distribution of the lifetime maximum live load EUDL for office buildings over a life of 50 years.
(b) Develop the corresponding asymptotic form for the distribution of the lifetime maximum EUDL, and evaluate the parameters for the 50-year life.
4.26 The PDF of an initial variate with a gamma distribution is
$$f_X(x) = \frac{\nu(\nu x)^{k-1}}{\Gamma(k)}\, e^{-\nu x}; \quad x > 0;\ k > 1$$
where ν and k are the parameters and Γ(k) is the gamma function.
(a) Determine the CDF and PDF of the largest value from a sample of size n.
(b) Determine the appropriate asymptotic form for the distribution of the largest value as n → ∞.
4.27 Axle loads of trailer trucks higher than 18 tons may be modeled with a shifted exponential distribution as follows:
196 Chapter 4. Functions of Random Variables
Suppose that a culvert is subjected to 1355 of the above axle loads (> 18 tons) in a year. Assume that the axle loads are statistically independent and that the average traffic volume will remain constant over the years.
(a) Determine the mean and c.o.v. of the maximum axle load on the culvert over periods of 1, 5, 10, and 25 years.
(b) For a culvert life of 20 years, determine the probability that it will be subjected to an axle load of over 80 tons.
(c) Determine the "design axle load" corresponding to an allowable exceedance probability of 10% over the life of 20 years.
4.28 The daily level of dissolved oxygen (DO) concentration in a stream is assumed to be Gaussian distributed with a mean of 3.00 mg/l and a standard deviation of 0.50 mg/l. The DO concentrations between days may be assumed to be statistically independent.
(a) Determine the asymptotic distribution of the minimum daily DO in a month (30 days); also in a year (365 days).
(b) Evaluate the probability that the daily DO concentration will be less than 0.5 mg/l in a month; in a year.
(c) Determine the mean and c.o.v. of the minimum daily DO concentration over a period of a month; over a period of a year.
4.29 A structure is designed with a capacity to withstand a wind velocity of 150 kph; i.e., any wind velocity up to 150 kph will not cause damage to the structure. In a hurricane-prone area, the maximum wind velocity during a hurricane may be modeled as an extreme Type I asymptotic distribution with a mean velocity of 100 kph and a c.o.v. of 0.45.
(a) For a structure with the above wind-resistant capacity, what would be the probability of wind damage during a hurricane?
(b) If the permissible damage probability were to be reduced to 1/10 of the original design, as described in (a) above, what should be the wind-resistant capacity of the revised design (in kph)?
(c) If the occurrence of hurricanes in the area is modeled as a Poisson process with a return period of 200 years, what is the probability of damage to the original structure over a life of 100 years? Also, what would be the corresponding damage probability to the revised structure? Assume that damages between hurricanes are statistically independent.
(d) Suppose that three structures with the original design were built in the area, each with the same wind damage probability. What is the probability that at least one of the structures will be damaged over the life of 100 years?
4.30 The wind gust at a building site is of concern during the erection of a steel building. Suppose the daily maximum wind velocity at a particular site can be modeled as a Gaussian random variable with a mean velocity of 40 mph and a c.o.v. of 25%.
(a) If the erection of a building can be completed in a period of 3 months, what is the most probable maximum wind velocity that can be expected during the erection of the building?
(b) If the shoring system used in the erection can withstand a wind gust velocity of 70 mph, what is the probability that the shoring system will be inadequate to withstand the maximum wind velocity during the erection of the building?
(c) What would be the mean and c.o.v. of the maximum wind velocity during the erection of the building?
4.31 In Problem 4.30, suppose the daily maximum wind at the building site is lognormally distributed (instead of normally) with the same mean of 40 mph and c.o.v. of 25%. In this case, if the erection of the building requires 3 months to complete:
(a) What is the most probable maximum wind velocity that can be expected during the erection of the building?
(b) What would be the probability that the shoring system will be inadequate to withstand the maximum wind velocity during the erection of the building? The capacity of the shoring system is 70 mph.
4.32 The hydraulic head loss in a pipe may be determined by the Darcy-Weisbach equation as follows:
$$\text{head loss} = \frac{f L V^2}{2 D g}$$
where:
L = length of a pipe
V = flow velocity of water in a pipe
D = pipe diameter
f = coefficient of friction
g = gravitational acceleration = 32.2 ft/sec²
Suppose a pipe has the following properties:

Variable    Mean Value    Coefficient of Variation
L           100 ft        0.05
D           12 in.        0.15
f           0.02          0.25
V           9.0 fps       0.2

(a) By first-order approximation, determine the mean and standard deviation of the hydraulic head loss of the pipe.
(b) Evaluate also the corresponding mean head loss by second-order approximation.
4.33 By simple beam theory, the maximum deflection of a prismatic cantilever beam under a concentrated load P applied at the end of the beam is given by
$$D = \frac{P L^3}{3 E I}$$
where:
L = length of the beam
E = modulus of elasticity of the material
I = moment of inertia of the cross section of the beam
Suppose a 15-ft rectangular wood beam with nominal cross-sectional dimensions of 6" × 12" (actual mean dimensions are
5.5" × 11.5") is subjected to a concentrated load P with a mean of 500 lb and a c.o.v. of 20%. The modulus of elasticity E of the wood has a mean value of 3,000,000 psi and a c.o.v. of 0.25. We may assume that the c.o.v. of each cross-sectional dimension is 0.05. The variables P, E, and I are statistically independent.
(a) Determine the first-order mean and standard deviation of the end deflection D of the cantilever wood beam. Observe that the moment of inertia of a rectangular cross section is $I = \frac{1}{12} b h^3$, in which b and h are correlated with $\rho_{bh} = 0.80$.
(b) Evaluate also the mean deflection of the beam by second-order approximation.
4.34 The following is an engineering formula for Z:
$$Z = X\, Y^2\, W^{1/2}$$
where:
X: uniformly distributed between 2 and 4
Y: beta distribution between 0 and 3 with parameters q = 1 and r = 2
W: exponentially distributed with a median of 1
and X, Y, and W are statistically independent.
(a) Determine the mean values and variances of X, Y, and W, respectively. (Ans. 3, 1/3; 1, 1/2; 1.443, 2.081)
(b) Estimate the mean, variance, and c.o.v. of Z using the first-order approximation. (Ans. 3.6, 29.7, 1.51)
4.35 A clay liner is constructed to reduce the amount of pollutants migrating from the landfill to the surrounding environment, as shown in the figure below.
Suppose X is the flow rate of pollutants from the landfill without the installation of the clay liner; L is the effectiveness of the clay liner, such that the flow rate of pollutants ejected to the environment is given by
$$Y = (1 - L)\, X$$
X follows a lognormal distribution with a mean of 100 units per year and a c.o.v. of 20%. Acceptable performance requires that Y should not exceed 8 units per year.
(a) Suppose the quality of construction of the clay liner can be assured to have an effectiveness of 0.95. What is the probability of acceptable performance? (Ans. 0.993)
(b) Suppose the contractor cannot guarantee the effectiveness to be 0.95. Instead, he offers to provide a liner with a mean effectiveness of 0.96 but with a standard deviation of 0.01.
(i) Estimate the mean and variance of Y using the first-order approximate method. (Ans. 4, 1.64)
(ii) Determine if this offer from the contractor will yield a larger probability of acceptable performance. You may assume Y to be lognormal with values of mean and variance estimated in part (i).
4.36 Manning's formula frequently used for determining the flow capacity of a circular storm sewer is
$$Q_c = 0.463\, n^{-1} D^{2.67} S^{0.5}$$
in which $Q_c$ is the flow rate (in ft³/sec); n is the roughness coefficient; D is the sewer diameter (in ft.); and S is the pipe slope (ft/ft).
Due to manufacturing imprecision and construction error, the sewer diameter and pipe slope, together with the roughness coefficient, are subject to uncertainty. Consider a section of sewer in a storm sewer system. The statistical properties of n, D, and S in the above Manning's formula are given in the following table.

Variable     Mean     Coefficient of Variation
n            0.015    0.10
D (ft.)      3.0      0.02
S (ft/ft)    0.005    0.05

Suppose that all three random variables are statistically independent.
(a) Using the first-order approximate formula, estimate the mean and variance of the flow carrying capacity of the sewer.
(b) Calculate the percentage of contribution of each random variable to the total uncertainty of the sewer flow capacity and identify the relative importance of each random variable. (Ans. n: 74.2%; D: 21.2%; S: 4.6%)
(c) Determine also the second-order mean of the sewer flow capacity.
(d) Assuming that the sewer flow capacity follows the lognormal distribution with the mean and variance obtained from part (a), determine the reliability that the sewer capacity can accommodate an inflow of 30 ft³/sec. (Ans. 0.996)
4.37 The amount of rainfall R in each rainstorm is independently lognormally distributed with a mean of 1 in. and a c.o.v. of 100%.
(a) Suppose 50 rainstorms have occurred last year, and other contributions to rainfall are negligible. Estimate the probability that last year's annual rainfall exceeded 60 in.
(b) Suppose the amount of water collected from a single rainstorm is predicted by the following equation:
$$V = N R \ln(R + 1)$$
where V is the volume of water in thousand gallons and N is a random variable (independent of R), with a mean value of 1 and a c.o.v. of 50% representing the error of the prediction model.
(i) Determine the mean and c.o.v. of V using the first-order approximate formulas.
(ii) Are the approximations good in this case? Why?
(iii) Evaluate also the second-order mean of V.
4.38 The natural frequency, ω, of a single-degree-of-freedom spring-mass mechanical system is
$$\omega = \sqrt{\frac{K}{M}}$$
where K is the stiffness and M is the mass of the system.
(a) If the mean value and standard deviation of K are, respectively, 400 and 200, and M = 100, determine the first-order approximate mean value and standard deviation of the natural frequency of the system.
(b) Now, if M is also a random variable with a standard deviation of 20, determine the corresponding mean and standard deviation of the natural frequency.
(c) Determine the second-order approximate mean value of the natural frequency.
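For readers who want to check the first-order answers quoted for Problems 4.34 and 4.36, the first-order (mean-value) approximation can be sketched in Python. This is only an illustrative sketch, not the book's solution: the helper `first_order` and its central-difference estimate of the partial derivatives are assumptions of this example, the moments of X, Y, and W are those quoted in Problem 4.34(a), and part (d) of Problem 4.36 uses the usual lognormal relations ζ² = ln(1 + δ²) and λ = ln μ − ζ²/2.

```python
import math

def first_order(g, means, sds, rel_step=1e-6):
    """First-order mean and variance of Y = g(X1, ..., Xn) for independent Xi.

    Partial derivatives are estimated by central differences at the means.
    Returns (mean, variance, percent contribution of each Xi to the variance).
    """
    mu_y = g(means)
    terms = []
    for i, (m, s) in enumerate(zip(means, sds)):
        step = rel_step * (abs(m) if m != 0 else 1.0)
        up, dn = list(means), list(means)
        up[i] += step
        dn[i] -= step
        dg_dx = (g(up) - g(dn)) / (2.0 * step)
        terms.append((dg_dx * s) ** 2)
    var_y = sum(terms)
    return mu_y, var_y, [100.0 * t / var_y for t in terms]

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Problem 4.34: Z = X * Y**2 * W**0.5, with the moments of part (a)
rate_w = math.log(2.0)                      # exponential W with median 1
mu_z, var_z, _ = first_order(
    lambda v: v[0] * v[1] ** 2 * math.sqrt(v[2]),
    means=[3.0, 1.0, 1.0 / rate_w],
    sds=[math.sqrt(1.0 / 3.0), math.sqrt(0.5), 1.0 / rate_w],
)
cov_z = math.sqrt(var_z) / mu_z

# Problem 4.36: Manning's formula Q = 0.463 * n**-1 * D**2.67 * S**0.5
mu_q, var_q, shares_q = first_order(
    lambda v: 0.463 / v[0] * v[1] ** 2.67 * math.sqrt(v[2]),
    means=[0.015, 3.0, 0.005],
    sds=[0.015 * 0.10, 3.0 * 0.02, 0.005 * 0.05],
)
# Part (d): Q taken as lognormal with the first-order mean and variance
zeta_q = math.sqrt(math.log(1.0 + var_q / mu_q ** 2))
lam_q = math.log(mu_q) - 0.5 * zeta_q ** 2
reliability = 1.0 - std_normal_cdf((math.log(30.0) - lam_q) / zeta_q)

print(f"4.34(b): mean {mu_z:.2f}, variance {var_z:.1f}, c.o.v. {cov_z:.2f}")
print("4.36(b): % contributions of n, D, S:", [round(x, 1) for x in shares_q])
print(f"4.36(d): reliability {reliability:.3f}")
```

The printed values should round to the answers quoted in the problems. Central differences are used so that the same helper applies to any function g without deriving the partial derivatives by hand; for these smooth power-law functions the analytic derivatives would give the same result.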
Computer-Based Numerical and Simulation Methods in Probability
► 5.1 INTRODUCTION
The main thrust of this chapter is to present numerical and simulation methods for solving probabilistic problems that are otherwise difficult or practically impossible to solve by analytical methods. These numerical methods must necessarily rely on high-speed computers for their applications. The objective here is to expand the usefulness of probabilistic modeling of engineering problems containing uncertainties. As in the previous three chapters, Chapters 2 through 4, the probability concepts are also illustrated with numerical examples in the present chapter. However, the pertinent solutions would now require computer programs that may be coded using commercial software (such as MATLAB); also, there are no preprogrammed software packages that can be used for the required solutions. Some of the coded programs for solving the example problems may not be optimal but are intended only to be illustrative.
In many practical engineering situations, the problems (even deterministic ones) may be complicated and not amenable to analytic solutions. In such situations, numerical methods are necessary and often provide the only practical and effective approach. When the problems involve random variables, or require consideration of probability, the numerical process may include repeated simulations through Monte Carlo sampling techniques. When we consider problems involving aleatory and epistemic uncertainties separately, there is even greater need for Monte Carlo simulations.
For example, when the distributions of the individual basic variables are not all Gaussian, the probability distribution of the sum of two or more basic variables may not be Gaussian, as we can observe from Sect. 4.2. Similarly, in the case of products or quotients of several random variables, unless all the random variables are lognormally distributed, the probability distribution of the function (product or quotient) may not be lognormal, as in Sect. 4.2. Moreover, when the function is a nonlinear function of several variates, regardless of distributions, the distribution of the function is often difficult or practically impossible to determine analytically. In these, and in other cases, we have to resort to the application of numerical methods to obtain practical solutions, even if approximations are involved.
We must emphasize that the purpose of this chapter is to illustrate the solution process of available numerical tools to enhance the application of probabilistic and statistical models in engineering. However, the descriptions of the numerical tools, which are generally well established, are beyond the scope of the chapter.
200 Chapter 5. Computer-Based Numerical and Simulation Methods in Probability
Also, random number generators for a number of distributions are available in MATLAB with
its companion Statistics Toolbox; MATHCAD and MATHEMATICA have similar random
number generators for essentially the same distributions.
In any Monte Carlo simulation, the sample size, n, or number of random numbers
generated for each distribution, must be prescribed. The accuracy of the solution obtained
through Monte Carlo simulation will improve with the sample size. This accuracy is often
important and may be measured by the coefficient of variation (c.o.v.) of the solution.
Later, in Eq. 6.27 of Chapter 6, we shall show that in estimating the probability p with a sample of size n, the sample mean P is an unbiased estimate of the probability p, and the sample variance of P is given by Eq. 6.28a as
$$\sigma_P^2 = \frac{P(1 - P)}{n}$$
Therefore, the c.o.v. of P estimated with a sample of size n in a Monte Carlo simulation is
$$\text{c.o.v.}(P) = \sqrt{\frac{1 - P}{nP}} \qquad (5.1)$$
Shooman (1968) has shown, based on an approximation, the same result in determining the
probability of system performance through MCS.
To obtain the actual c.o.v. of a Monte Carlo solution, we have to repeatedly perform
the same simulation process, say with N repeated simulations, each with the same sample
size n, and obtain the mean and standard deviation of the N simulations from which the
c.o.v. of the solution can be estimated. This is illustrated specifically in Example 5.5.
Using Eq. 5.1, we can also evaluate the error (in percent) of a Monte Carlo solution with a given sample size n,
$$\text{error (in \%)} = 200\sqrt{\frac{1 - P}{nP}} \qquad (5.2)$$
Conversely, Eq. 5.2 may be used to determine the sample size, n, required in a Monte Carlo simulation for a specified tolerable % error in the estimated P.
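Equations 5.1 and 5.2 are simple enough to evaluate directly. The following minimal Python sketch (the function names are hypothetical, chosen only for this illustration) computes the c.o.v. and percent error of a Monte Carlo estimate and inverts Eq. 5.2 to obtain the sample size for a target error; the numbers P = 0.155 and n = 100,000 are those of Example 5.5.

```python
import math

def mcs_cov(p, n):
    """c.o.v. of a Monte Carlo estimate P of a probability (Eq. 5.1)."""
    return math.sqrt((1.0 - p) / (n * p))

def mcs_error_percent(p, n):
    """Percent error of the estimate (Eq. 5.2), i.e., 200 times Eq. 5.1."""
    return 200.0 * mcs_cov(p, n)

def mcs_sample_size(p, error_percent):
    """Smallest n giving the tolerable percent error, by inverting Eq. 5.2."""
    return math.ceil((1.0 - p) / p * (200.0 / error_percent) ** 2)

# P = 0.155 estimated with n = 100,000 (as in Example 5.5)
print(f"c.o.v.         = {mcs_cov(0.155, 100_000):.4f}")
print(f"error (%)      = {mcs_error_percent(0.155, 100_000):.2f}")
print(f"n for 1% error = {mcs_sample_size(0.155, 1.0)}")
```

With P ≈ 0.155, an error of about 1% would call for roughly 2.2 × 10⁵ samples. Because the required n grows roughly as 1/P for small P, very small probabilities are expensive to estimate by direct sampling.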
The programs developed for some of the examples are intended to be illustrative and may not necessarily be optimal.
► EXAMPLE 5.1 Suppose employees of a company have to travel by air between two major cities; the air travel time (in hours) between the cities is estimated to require a minimum of 4 hr and a maximum of 8 hr, depending on the weather and traffic conditions. From data based on experiences of its company employees, a beta distribution with parameters q = 2.50 and r = 5.50 is deemed to be appropriate to model the probability distribution of the actual travel time between the two cities.
The probability that the actual travel time T will be less than 6 hr, P(T < 6), may be calculated conveniently through MATLAB using the incomplete beta function of the Statistics Toolbox. Therefore, according to Eq. 3.52, with
u = (6 − 4)/(8 − 4) = 0.5
we obtain P(T < 6) = betainc(0.5, 2.50, 5.50) = 0.872.
Note: betainc(x, a, b) is the MATLAB syntax, or statement, for the incomplete beta function ratio.
The probability that the travel time will be between 4.5 hr and 5.5 hr would be
P(4.5 < T < 5.5) = betainc(0.375, 2.50, 5.50) − betainc(0.125, 2.50, 5.50) = 0.570
Instead of the beta distribution, the gamma distribution was also suggested as a plausible alternative distribution for the travel time between the two cities. Assuming that the gamma distribution will have the same mean and standard deviation as the above beta distribution, which are μ_T = 5.32 hr and σ_T = 0.50 hr, the corresponding parameters of the gamma distribution are ν = 21.73 and k = 115.60. Then using MATLAB again, we obtain the probability that the actual travel time will be less than 6 hr as (see Eq. 3.43a)
Note: gammainc(νx, k) is the MATLAB syntax, or statement, for the incomplete gamma function ratio. The probability that the travel time will be between 4.5 hr and 5.5 hr is now
Solution by Numerical Integration. Alternatively, the probability P(T < 6) may be evaluated by numerical integration. This can be accomplished conveniently by quadrature integration using MATLAB as follows:
Create an “M-File”:
function f=bfunc(x)
% beta PDF of the travel time T on [4, 8], with q = 2.5 and r = 5.5
f=(beta(2.5,5.5)).^(-1) .* (x-4).^1.5 .* (8-x).^4.5 ./ 4.^7;
and save it as bfunc.m. Then perform the quadrature integration between 4 and 6 in the command window of MATLAB with the command:
quad(@bfunc,4,6)
5.2 Numerical and Simulation Methods 203
obtaining a result of 0.872 for the probability P(T < 6). The probability of T between 4.5 and 5.5 hours would be obtained with the command:
quad(@bfunc,4.5,5.5)
giving a result of 0.570.
We observe that these results are the same as those obtained above for the beta distribution. By
replacing beta with gamma in the above M-File, similar results for the gamma distribution can be
obtained. ◄
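The same numerical integration can be sketched outside MATLAB. The Python version below is an illustration only: a fixed-step composite Simpson rule stands in for MATLAB's `quad`, and `beta_pdf` encodes the same expression as the bfunc M-file, with the beta function B(q, r) obtained from `math.gamma`.

```python
import math

Q, R = 2.5, 5.5          # beta parameters q and r of Example 5.1
LO, HI = 4.0, 8.0        # minimum and maximum travel times (hr)
BETA_QR = math.gamma(Q) * math.gamma(R) / math.gamma(Q + R)   # B(q, r)

def beta_pdf(t):
    """PDF of the travel time T on [LO, HI]; same expression as bfunc."""
    return ((t - LO) ** (Q - 1) * (HI - t) ** (R - 1)
            / (BETA_QR * (HI - LO) ** (Q + R - 1)))

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

p_less_6 = simpson(beta_pdf, 4.0, 6.0)
p_45_55 = simpson(beta_pdf, 4.5, 5.5)
print(f"P(T < 6)         = {p_less_6:.3f}")
print(f"P(4.5 < T < 5.5) = {p_45_55:.3f}")
```

The results should agree with the values 0.872 and 0.570 obtained above. A fixed step is adequate here because the integrand is bounded and smooth inside the interval; an adaptive rule such as the one behind `quad` would mainly save function evaluations.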
► EXAMPLE 5.2 This problem illustrates the ease of generating the CDF of the binomial distribution using MATLAB
or MATHCAD.
In a car assembly plant, suppose that 50 cars are produced daily. There are 15 parts in a car that
are critical for a vehicle to function properly. The parts are randomly selected for inspection, and
according to the inspection record, the average percentage of defective parts is 6%. The conditions of
the 15 parts in a car may be assumed to be statistically independent.
If three or more of the critical parts installed in a car are defective, the car will malfunction within
a month or 1000 miles. The probability that a customer will purchase one of the malfunctioning cars,
therefore, is
$$P(X \ge 3) = 1 - P(X < 3) = 1 - P(X \le 2) = 1 - \text{binocdf}(2, 15, 0.06) = 0.0571$$
where binocdf(2, 15, 0.06) is the MATLAB statement for the CDF of the binomial distribution with parameters n = 15, x = 2, and p = 0.06.
Also, the probability that this assembly plant will produce no more than 4 defective vehicles per
day, among the 50, is
$$P(X \le 4) = \text{binocdf}(4, 50, 0.0571) = 0.8445$$
We might observe that the parameters in these problems are beyond the limits of Table A.2. MATHCAD
may be used in place of MATLAB to obtain the same results. ◄
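The two binomial probabilities of Example 5.2 can also be reproduced with a few lines of Python. In this sketch `binocdf` is a hypothetical helper that plays the role of the MATLAB function of the same name; the first result is rounded to 0.0571 before it is reused, as in the example.

```python
from math import comb

def binocdf(x, n, p):
    """Binomial CDF P(X <= x), playing the role of MATLAB's binocdf(x, n, p)."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(x + 1))

# P(X >= 3): three or more of the 15 critical parts are defective (p = 0.06)
p_malfunction = 1.0 - binocdf(2, 15, 0.06)

# P(no more than 4 malfunctioning cars among the 50 produced in a day)
p_at_most_4 = binocdf(4, 50, round(p_malfunction, 4))

print(f"P(X >= 3) = {p_malfunction:.4f}")
print(f"P(X <= 4) = {p_at_most_4:.4f}")
```

Summing the probability mass function directly is practical here because n is small; for very large n a library routine or a normal or Poisson approximation would be preferable.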
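► EXAMPLE 5.3 Consider the sum of three random variables X1, X2, and X3 that are statistically independent. All three variables are individually lognormally distributed with the following respective means and c.o.v.s:
$$\mu_{X_1} = 500,\ \delta_{X_1} = 0.50;\quad \mu_{X_2} = 600,\ \delta_{X_2} = 0.60;\quad \mu_{X_3} = 700,\ \delta_{X_3} = 0.70$$
from which the parameters of the lognormal distributions are
$$\zeta_{X_1} = \sqrt{\ln(1 + 0.50^2)} = 0.47;\quad \zeta_{X_2} = \sqrt{\ln(1 + 0.60^2)} = 0.55;\quad \zeta_{X_3} = \sqrt{\ln(1 + 0.70^2)} = 0.63$$
$$\lambda_{X_1} = \ln 500 - \tfrac{1}{2}(0.47)^2 = 6.10;\quad \lambda_{X_2} = \ln 600 - \tfrac{1}{2}(0.55)^2 = 6.25;\quad \lambda_{X_3} = \ln 700 - \tfrac{1}{2}(0.63)^2 = 6.35$$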
In this case, the probability distribution of the sum S = X1 + X2 + X3 will be neither normal nor lognormal. In fact, its distribution would be difficult to determine analytically. Therefore, if the probability distribution of S is required, Monte Carlo simulation would be a practical solution. With 10,000 repetitions (i.e., sample size n = 10,000), we generate, using MATHCAD, the histogram of values of S as shown below, with the following statistics of S:
The mean value, the standard deviation, and the skewness of S are, respectively,
mean(S) = 1.799 × 10³; Stdev(S) = 644.869; skew(S) = 1.316
As the skewness coefficient is not zero, the distribution of the sum, S, is clearly not normal. Also, the 75th-percentile and 90th-percentile values of S are, respectively,
S₇₅ = 2.121 × 10³ and S₉₀ = 2.628 × 10³
Similarly, using MATLAB, we obtain by MCS with 10,000 repetitions the following histogram for S:
The corresponding mean value, standard deviation, and skewness coefficient of S are, respectively,
whereas the 50% (median), 75%, and 90% values of S are, respectively,
X1=lognrnd(6.10,0.47,10000,1);
X2=lognrnd(6.25,0.55,10000,1);
X3=lognrnd(6.35,0.63,10000,1);
S=X1+X2+X3;
hist(S,40)        % plots the histogram of S
mean(S)
std(S)
skewness(S)
prctile(S,50)     % the 50% value of S
prctile(S,75)
prctile(S,90)
The solution for this example may also be performed using Microsoft EXCEL; the simulation can be
performed in a spreadsheet through the following steps:
1. Go to Tools, select Add-in, then select Analysis Tool Pak and Analysis Tool Pak-VBA.
2. Since Excel does not have a built-in random number generator for the lognormal distribution, one can first generate values for a normal random variable Y and then obtain the corresponding values of the lognormal random variable from the relation X = exp(Y).
3. Create three columns of normally distributed random numbers using Tools-Data Analysis-Random Number Generation. Choose Normal and input the mean and standard deviation of (6.10, 0.47); (6.25, 0.55); (6.35, 0.63), respectively, from previously calculated parameters. Call these Y1, Y2, and Y3.
4. Create another set of three columns of X1, X2, and X3 by mapping the respective columns of Y1, Y2, and Y3 through the function X = exp(Y).
5. Create a fourth column of random numbers as X1+X2+X3 using EXCEL's ordinary arithmetic formulas. Copy the formula from the first cell to the last.
6. Use EXCEL's Tools-Data Analysis-Descriptive Statistics on the data created in step 5 to find the mean, standard deviation, and other pertinent statistics.
7. Use EXCEL’s Tools-Data Analysis-Histogram on the data created in Step 5 to construct the
histogram. Select the appropriate interval (bin) as needed.
The respective statistics of the sum of the three lognormal random variables are
The histogram of the sum S based on a sample size of 10,000 is shown in the following.
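For comparison with the MATHCAD, MATLAB, and EXCEL runs, the same simulation can be sketched with Python's standard library. This is an illustration under stated assumptions: `random.lognormvariate(mu, sigma)` takes the parameters λ and ζ of ln X (as `lognrnd` does), the rounded parameters of Example 5.3 are used, and the seed is arbitrary, so the statistics will be close to, but not identical with, those quoted above.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is reproducible; any seed will do
N = 10_000       # sample size (number of repetitions), as in Example 5.3

# (lambda, zeta) of ln X1, ln X2, ln X3 from Example 5.3
PARAMS = [(6.10, 0.47), (6.25, 0.55), (6.35, 0.63)]

# Each repetition draws X1, X2, X3 independently and records S = X1 + X2 + X3
s = [sum(random.lognormvariate(lam, zeta) for lam, zeta in PARAMS)
     for _ in range(N)]

mean_s = statistics.fmean(s)
std_s = statistics.stdev(s)
# approximate moment coefficient of skewness: mean cubed deviation / std**3
skew_s = sum((v - mean_s) ** 3 for v in s) / (N * std_s ** 3)

cuts = statistics.quantiles(s, n=20)    # cut points at 5%, 10%, ..., 95%
s50, s75, s90 = cuts[9], cuts[14], cuts[17]

print(f"mean {mean_s:.0f}, std {std_s:.1f}, skewness {skew_s:.3f}")
print(f"S50 {s50:.0f}, S75 {s75:.0f}, S90 {s90:.0f}")
```

The sample skewness is the statistic that fluctuates most from run to run, because it is dominated by the few largest values in the long right tail of the lognormal variates.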
► EXAMPLE 5.4 Suppose the random variables in Example 5.3 are now Gaussian with the following respective means
and standard deviations:
Histogram of P
Alternatively, using MATLAB, we also obtain through MCS with 10,000 repetitions the following histogram of P:
The corresponding mean value, standard deviation, and skewness coefficient of P are, respectively,
For simulation using the EXCEL spreadsheet, the steps are as follows:
1. Go to Tools, select Add-in, then select Analysis Tool Pak and Analysis Tool Pak-VBA.
2. Generate 3 columns of normally distributed random numbers using Tools-Data Analysis-Random Number Generation (choose Normal and input the parameters: mean and standard deviation). Call these X1, X2, and X3.
3. Create a 4th column of random numbers as X1 * X3/X2 using EXCEL's ordinary arithmetic formulas. Copy the formula from the first cell to the last.
4. Use EXCEL's Tools-Data Analysis-Descriptive Statistics on the data created in step 3 to find the mean, standard deviation, and other pertinent statistics.
5. Use EXCEL's Tools-Data Analysis-Histogram on the data created in step 3 to construct the histogram. Select the appropriate interval (bin) as needed.
The statistics of the product X1 * X3/X2 based on 10,000 repetitions are as follows:
► EXAMPLE 5.5 In this example, we will consider a problem involving two independent random variables with different
probability distributions. As a specific problem, consider the capacity of a pile group, R, and the applied
load, S, that are, respectively, Gaussian and lognormal variates as follows:
R = N(50, 15) tons;  S = LN(30, 0.33) tons
i.e., R is Gaussian with a mean of 50 tons and a standard deviation of 15 tons, whereas S is lognormal
with a median of 30 tons and a c.o.v. of 0.33.
The probability of failure of the pile group under the applied load is
$$p_F = P(R - S < 0)$$
Because the distribution of (R − S) is clearly neither normal nor lognormal, Monte Carlo simulation is an effective means to evaluate the above probability of failure. For this purpose, we generate n sample values of R from the given Gaussian distribution and the same number n of sample values of S from the lognormal distribution, and we calculate for each ith pair of sample values the difference (r_i − s_i). The ratio of the number of pairs with (r_i − s_i) < 0 to the total sample size n can then be used to represent the probability of failure. That is,
$$p_F = \frac{1}{n}\sum_{i=1}^{n} I\left[(r_i - s_i) < 0\right]$$
in which I[·] equals 1 when the condition is satisfied and 0 otherwise.
The result obtained with MATHCAD for n = 100,000 yields the following failure probability:
$$p_F = 0.155$$
According to Eq. 5.1, the c.o.v. of the above p_F obtained with a sample size of n = 100,000 would be 0.0074, or 0.74%.
Using MATLAB with the same sample size of n = 100,000, we obtain the failure probability
$$p_F = 0.1566$$
The two estimates agree closely, as expected for such a small c.o.v.; by Eq. 5.1, the c.o.v. of the estimate decreases as the sample size n increases.
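A minimal Python sketch of the pile-group simulation follows, using only the standard library. It converts the median and c.o.v. of S to the lognormal parameters λ = ln(median) and ζ = √ln(1 + δ²), and it assumes Eq. 5.1 has the usual binomial form c.o.v. = √[(1 − p)/(n p)], which reproduces the 0.0074 quoted above for p = 0.155 and n = 100,000. The function names and random seed are illustrative choices.

```python
import math
import random


def lognormal_params(median, cov):
    """Return (lambda, zeta) of ln S for a lognormal S with given median and c.o.v."""
    return math.log(median), math.sqrt(math.log(1.0 + cov ** 2))


def pile_failure_probability(n=100_000, seed=2007):
    """Estimate pF = P(R - S < 0) with R ~ N(50, 15) and S ~ LN(30, 0.33) tons."""
    rng = random.Random(seed)
    lam_s, zeta_s = lognormal_params(30.0, 0.33)
    failures = 0
    for _ in range(n):
        r = rng.gauss(50.0, 15.0)
        s = rng.lognormvariate(lam_s, zeta_s)
        if r - s < 0:
            failures += 1
    return failures / n


def mcs_cov(p, n):
    """c.o.v. of an MCS probability estimate p from n trials (assumed form of Eq. 5.1)."""
    return math.sqrt((1.0 - p) / (n * p))


pf = pile_failure_probability()
print(f"pF = {pf:.4f}, c.o.v. = {mcs_cov(pf, 100_000):.4f}")
```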
► EXAMPLE 5.6 This problem is the same as that of Example 4.12, except that the individual work durations are now
beta-distributed instead of Gaussian as in Example 4.12. Figure E5.6 shows the same activity network
as that of Fig. E4.12.
210 Chapter 5. Computer-Based Numerical and Simulation Methods in Probability
The framing of a house may be done by subassembling the components in a plant and then delivering them to the building site for framing. While this subassembly of the components is being fabricated, the preparation of the site, which includes the excavation of the foundation through the construction of the foundation walls, can proceed simultaneously. The different activities and their respective sequences may be represented with the activity network shown in Fig. E5.6; the required work durations for the respective activities are shown in the table below, with the respective distribution parameters.
Assume that the required completion times for the different activities are statistically independent
beta-distributed variates, with the respective means and standard deviations as indicated in columns
3 and 4 in the above table. With the shape parameters of q = 2.00 and r = 3.00, the corresponding
lower and upper bound values of the respective beta distributions are as indicated in the above table.
Clearly, to start framing the house, the foundation walls must be completed and the assembled
components must also be delivered to the site. The probability that framing of the house can start
within 8 days after work started on the job can be determined as follows.
With the durations of the respective activities listed in the table above, the total durations for the two sequences of activities are
$$T_1 = X_1 + X_2 + X_3$$
and
$$T_2 = X_4 + X_5$$
in which T_1 and T_2 are also statistically independent. As the individual variates X_1, X_2, X_3, X_4, and X_5 are beta-distributed, the distributions of the above sums T_1 and T_2 are difficult to determine analytically. MCS provides a practical alternative. The required MCS results by MATHCAD using 100,000 repetitions are shown below, yielding the following:
$$P(T_1 < 8) = 0.772$$
and
$$P(T_2 < 8) = 0.736$$
As T_1 and T_2 are statistically independent, the probability that framing can start within 8 days is, therefore,
$$P(\text{framing}) = 0.899 \times 0.802 = 0.721$$
Both MATHCAD and MATLAB generate random numbers only for the standard beta distribution, for x between 0 and 1 with shape parameters q and r. The corresponding beta-distributed random numbers for X between a and b with the same parameters can be obtained as X = a + (b − a)x.
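The linear transformation above can be sketched in Python with the standard library's `random.betavariate(alpha, beta)`, which samples the standard beta distribution on [0, 1]. The bounds a = 2 and b = 6 days below are hypothetical, since the table of Example 5.6 is not reproduced here; q = 2.00 and r = 3.00 are the shape parameters given in the text. For beta(q, r) on [a, b] the mean is a + (b − a)q/(q + r), which gives a simple check.

```python
import random


def scaled_beta(rng, q, r, a, b):
    """Beta(q, r) variate on [a, b]: X = a + (b - a) * x, x ~ standard beta on [0, 1]."""
    return a + (b - a) * rng.betavariate(q, r)


rng = random.Random(5)
# Hypothetical bounds a = 2 and b = 6 days; q = 2.00 and r = 3.00 as in Example 5.6.
samples = [scaled_beta(rng, 2.0, 3.0, 2.0, 6.0) for _ in range(20_000)]
mean = sum(samples) / len(samples)
print(mean)   # should be close to a + (b - a) * q / (q + r) = 3.6
```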
► EXAMPLE 5.7 In Example 4.25 of Ang and Tang (1984), it was shown that the annual floods (maximum discharges), F, of the Wabash River in Mount Carmel, Illinois, may be modeled by the Type I largest asymptotic distribution with the following parameters: $u_n = 120 \times 10^3$ cfs and $\alpha_n = 0.015 \times 10^{-3}$.
Suppose that a bridge is planned to be constructed across the river with a center pier. The pier foundation is designed against scouring during high floods. The flood capacity of the pier foundation against scouring, R, may be assumed to be a lognormal random variable with a median capacity of 400 × 10^3 cfs and a c.o.v. of 0.30. The probability of failure of the center pier due to scouring during high floods is
$$p_F = P(R < F)$$
The required probability of failure, p_F, can be evaluated effectively by MCS. We describe below the MATHCAD statements to perform the MCS with 10,000 repetitions, showing the failure probability of the pier to be 0.037.
Special note: A MATHCAD program is developed to generate the required random numbers with a Type I asymptotic distribution of largest values. The Type I largest random numbers x with parameters α and β can be generated as follows:
$$F_X(x) = \exp\left[-e^{-\alpha(x - \beta)}\right]$$
At probability F_X(x) = u, where u = value of a uniformly distributed random number,
$$x = \beta - \frac{1}{\alpha}\ln\left[\ln\left(\frac{1}{u}\right)\right]$$
Similarly, we obtain the solution of the same problem with the following MATLAB program, obtaining the failure probability of the pier as p_F = 0.0359.
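The inverse-CDF relation in the special note above, together with a lognormal capacity, gives the following minimal Python sketch of Example 5.7. Discharges are expressed in units of 10^3 cfs, so u_n = 120 and α_n = 0.015; the sample size, seed, and function names are illustrative, and the result should fall near the 0.036 to 0.037 reported above rather than match it exactly.

```python
import math
import random


def gumbel_largest(rng, alpha, beta):
    """Type I largest variate by inverse CDF: x = beta - (1/alpha) * ln[ln(1/u)]."""
    u = rng.random()
    while u == 0.0:            # ln(1/u) is undefined at u = 0
        u = rng.random()
    return beta - math.log(math.log(1.0 / u)) / alpha


def pier_failure_probability(n=100_000, seed=7):
    """Estimate pF = P(R < F); discharges in units of 10**3 cfs."""
    rng = random.Random(seed)
    alpha_n, u_n = 0.015, 120.0                     # Type I parameters of F
    lam_r = math.log(400.0)                         # R: median 400 (x 10**3 cfs)
    zeta_r = math.sqrt(math.log(1.0 + 0.30 ** 2))   # R: c.o.v. 0.30
    failures = 0
    for _ in range(n):
        flood = gumbel_largest(rng, alpha_n, u_n)
        capacity = rng.lognormvariate(lam_r, zeta_r)
        if capacity < flood:
            failures += 1
    return failures / n


print(pier_failure_probability())
```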
► EXAMPLE 5.8 Suppose the load effect of heavy vehicles, each weighing X tons, on a bridge support is $X^{1.33}$ tons (including impact). If the number of heavy vehicles on the bridge is a random variable N with a Poisson distribution with a mean of 25, and the average vehicle weight on the bridge is a Gaussian random variable N(12, 4) tons, the total vehicle load effect on the bridge support, Y, would be given by
$$Y = N X^{1.33}$$
where
$$P(N = n) = \frac{\lambda^n}{n!}e^{-\lambda}, \quad \lambda = 25, \qquad X = N(12, 4)$$
Y is also a random variable; its distribution would be difficult to determine analytically. We resort to MCS with 1000 repetitions using MATHCAD, yielding the following histogram for the vehicle load effect, Y:
From the 1000 sample values, the corresponding mean, standard deviation, and skewness coefficient
of Y are estimated to be as follows:
Using MATLAB, we also obtain the following histogram through MCS with 1000 repetitions.
Histogram of Y (horizontal axis: values of Y, from 0 to 2500)
The corresponding mean, standard deviation, and skewness coefficient of Y are, respectively:
mean(Y) = 689.23 tons   Stdev(Y) = 348.41 tons   skew(Y) = 0.666
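The following Python sketch reproduces the structure of this simulation with the standard library. Python's `random` module has no Poisson sampler, so `poisson` below uses Knuth's multiplication method, which is adequate for a mean of 25. The text does not say how negative samples of the Gaussian weight X are handled (about 0.13% of N(12, 4) draws are negative); clamping them to zero is an assumption made here so that X^1.33 stays real.

```python
import math
import random
import statistics


def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def vehicle_load_effect(n=1000, seed=8):
    """Simulate Y = N * X**1.33 with N ~ Poisson(25) and X ~ N(12, 4) tons."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        vehicles = poisson(rng, 25.0)
        weight = max(rng.gauss(12.0, 4.0), 0.0)   # assumption: clamp negatives to 0
        samples.append(vehicles * weight ** 1.33)
    return samples


y = vehicle_load_effect(n=20_000)
print(statistics.mean(y), statistics.stdev(y))
```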
► EXAMPLE 5.9 Consider the problem that was illustrated in Example 4.14 involving a high-rise building consisting of
a structure-foundation system designed to withstand wind-induced pressures. However, let us modify
the problem by changing the distributions for some of the variables as follows:
Assume that the wind-induced pressure capacities of the superstructure and foundation are, respectively, normal variates:
R_S = N(70, 15) psf and R_F = N(60, 20) psf
The peak wind-induced pressure, P_W, on the building during a wind storm is
$$P_W = 1.165 \times 10^{-3}\, C V^2, \text{ in psf}$$
where:
C = the drag coefficient, a Gaussian variate N(1.80, 0.50).
V = maximum wind speed, a Type I extremal variate with a modal speed of 100 fps and a c.o.v. of 30%; i.e., the equivalent extremal parameters are α = 0.037 and u = 100.
The probability distribution of P_W is clearly not any of the standard distributions; for this determination we resort to Monte Carlo simulation. With 1000 repetitions (or sample size) we generate the histogram of P_W with the following MATHCAD program:
V := 100 − (1/0.037)·ln(ln(1/u))
P := 1.165·10^−3·C·V^2
With the results of the above simulations, obtained with a sample of size n = 1000, we estimate the sample mean and sample standard deviation of P_W using Eqs. 6.1 and 6.3 of Chapter 6, as well as its skewness coefficient, yielding, respectively:
mean(P) = 31.09
Stdev(P) =24.18
skew(P) =2.73
Moreover, by Monte Carlo simulation (with 1000 repetitions), we also obtain the failure probabilities
of the superstructure and the foundation, using MATHCAD, respectively, as follows:
PF := Σx / 1000   PF = 0.072
F := rnorm(1000, 60, 20)
v := (F − P) < 0
PF := Σ_{i=0}^{999} v_i / 1000
With MATLAB, the corresponding results (for sample size n = 1000) of Example 5.9 are as follows:
For the superstructure, the failure probability is p_FS = 0.072;
whereas for the foundation, the failure probability is prf =OAT1\
and the probability of failure of the superstructure-foundation system is pF = 0.149.
mean(V) = 115.3406
std(V) = 35.1357
C = normrnd(1.80, 0.50, 1000, 1);
Pw = 1.165*10^-3.*C.*V.^2;
mean(Pw) = 30.1943
std(Pw) = 22.4987
skewness(Pw) = 2.793
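A compact Python sketch of Example 5.9 follows. It draws V from the Type I distribution by the same inverse-CDF relation used in Example 5.7 (α = 0.037, mode 100 fps), forms P_W = 1.165 × 10⁻³ C V², and counts superstructure, foundation, and system (union) failures from the same samples, so that the common load P_W is shared by both components. The sample size and seed are arbitrary; the results are sample estimates and will differ slightly from the MATHCAD and MATLAB values.

```python
import math
import random


def type1_largest(rng, alpha, mode):
    """Type I largest variate: mode - (1/alpha) * ln[ln(1/u)], u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return mode - math.log(math.log(1.0 / u)) / alpha


def wind_failure_probabilities(n=50_000, seed=9):
    """Return (p_FS, p_FF, p_F system) for the structure-foundation system."""
    rng = random.Random(seed)
    super_count = found_count = system_count = 0
    for _ in range(n):
        v = type1_largest(rng, 0.037, 100.0)      # wind speed V, fps
        c = rng.gauss(1.80, 0.50)                 # drag coefficient C ~ N(1.80, 0.50)
        pw = 1.165e-3 * c * v ** 2                # peak pressure P_W, psf
        super_fails = rng.gauss(70.0, 15.0) < pw  # R_S ~ N(70, 15) psf
        found_fails = rng.gauss(60.0, 20.0) < pw  # R_F ~ N(60, 20) psf
        super_count += super_fails
        found_count += found_fails
        system_count += super_fails or found_fails   # union of the two failures
    return super_count / n, found_count / n, system_count / n


p_fs, p_ff, p_sys = wind_failure_probabilities()
print(p_fs, p_ff, p_sys)
```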
► EXAMPLE 5.10 In Example 4.9, suppose the storm drain of the city is a normal variate with a mean capacity of 1.5 million gallons per day (mgd) and a standard deviation of 0.3 mgd; i.e., the capacity is D = N(1.5, 0.30). The storm drain serves two independent drainage sources within the city that are gamma distributed (instead of normal as in Example 4.9) with the same respective means and standard deviations as in Example 4.9; i.e., drainage source A is gamma-distributed with k = 12 and ν = 17.5, and drainage source B is gamma-distributed with k = 11 and ν = 22.22. These parameter values are consistent with those of Chapter 3, which defined mean = k/ν and variance = k/ν². In the present case, with the indicated parameter values, the means and standard deviations of drainage sources A and B are, respectively, approximately the same as those of Example 4.9; namely, μ_A = 0.70 mgd, μ_B = 0.50 mgd, and σ_A = 0.20 mgd, σ_B = 0.15 mgd.
The probability that the storm drain capacity will be exceeded during a storm can be determined as follows:
$$P(D < A + B) = P\{[D - (A + B)] < 0\}$$
Because D is a normal variate but A and B are independent gamma-distributed variates, the distribution of the function S = [D − (A + B)] is difficult to determine analytically. Its distribution may be determined by Monte Carlo simulation as indicated below.
The probability that the drain capacity will be exceeded is found by MCS with the MATHCAD
program below to be
P(S < 0) = 0.202
mean(A) = 0.685   mean(B) = 0.495
Stdev(A) = 0.197   Stdev(B) = 0.149
skew(A) = 0.605   skew(B) = 0.653
x := S < 0
pF := Σ_{i=0}^{9999} x_i / 10000
pF = 0.202
MATLAB also generates the one-parameter gamma distribution. The two-parameter gamma
distributed random numbers can be generated with the following MATLAB program.
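hist(A,20)
mean(A) = 0.6847
std(A) = 0.1944
skewness(A) = 0.4493
u = unifrnd(0,1,1000,11);   % 1000 x 11 uniform random numbers (k = 11)
X = log(u);
Y = X';
Z = sum(Y(:,1:1000));       % sums each column of the 11 x 1000 array: 1000 sums of 11 ln(u)
B = -(1/22.22)*Z;           % 1000 gamma-distributed values for B (k = 11, nu = 22.22)
hist(B,20)
mean(B) = 0.4931
std(B) = 0.1545
skewness(B) = 0.6220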
Using the MATLAB program below, the probability that the storm drain capacity will be exceeded is p_F = 0.196.
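The MATLAB technique above, summing k columns of ln(u) and scaling by −1/ν, works because a gamma variate with integer shape k and rate ν is the sum of k independent exponential variates, each equal to −ln(u)/ν. A minimal Python sketch of the same idea, applied to the drain problem, follows; the helper `erlang` is a hypothetical name, and Python's built-in `random.gammavariate(k, 1/nu)` (shape, scale) could be used instead.

```python
import math
import random


def erlang(rng, k, nu):
    """Gamma variate, integer shape k and rate nu: -(1/nu) * sum of k values ln(u)."""
    total = 0.0
    for _ in range(k):
        total += math.log(1.0 - rng.random())   # 1 - random() lies in (0, 1]
    return -total / nu


def drain_exceedance_probability(n=20_000, seed=10):
    """Estimate P(D < A + B) with D ~ N(1.5, 0.30) and gamma sources A and B."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n):
        a = erlang(rng, 12, 17.5)     # source A: k = 12, nu = 17.5
        b = erlang(rng, 11, 22.22)    # source B: k = 11, nu = 22.22
        d = rng.gauss(1.5, 0.30)      # drain capacity D, mgd
        if d - (a + b) < 0:
            exceed += 1
    return exceed / n


p_exceed = drain_exceedance_probability()
print(p_exceed)
```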
In this case, a repeated Monte Carlo simulation may be performed as shown below. A MATHCAD program was developed to calculate the failure probability automatically and repeatedly for different assumed mean values of D, between μ_D = 1.70 and 3.0, until the failure probability was < 0.202. The results obtained with this program indicate that the required upgraded mean capacity should be μ_D = 2.35 mgd.
Hence, the mean capacity of the existing storm drain must be increased by 2.35 − 1.5 = 0.85 mgd.
The loop continues while p > 0.202 and breaks when p < 0.202, with p = (number of failures)/10000, yielding
V = 2.35   p = 0.201
By MATLAB, the corresponding upgraded storm drain capacity is determined to be V = 2.36 mgd. This is obtained with the following MATLAB program.
B = -(1/22.22)*z;            % generates vector of 1000 gamma-distributed values for B
w = unifrnd(0,1,1000,16);
a = log(w);
b = a';
c = sum(b(:,1:1000));
C = -(1/20)*c;               % generates vector of 1000 gamma-distributed values for C
for v = 1.7:0.01:3.0         % designates range of V
    D = normrnd(v,0.30,1000,1);
    E = D-(A'+B'+C');
    F = (E<=0);
    G = sum(F);
    pF = (1/1000)*G          % calculates failure probability of upgraded capacity
    if pF <= 0.196           % first mean capacity v that meets the target
        V = v
        break
    end
end
Execution of the above program yields the required upgraded drain capacity V = 2.36 mgd. The
corresponding calculated failure probability is pF = 0.192, which is less than 0.196 of the original
system. ◄
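The search for the upgraded capacity can be sketched in Python as follows. Two points are assumptions: the third drainage source C is taken as gamma with k = 16 and ν = 20, as read from the MATLAB program above (16 log-uniform columns scaled by −1/20), and, unlike that program, this sketch reuses one set of simulated loads and standard normal draws for every trial mean (common random numbers), so the estimated p_F decreases steadily as μ_D increases. The target 0.196 is the failure probability used in the MATLAB program.

```python
import math
import random


def erlang(rng, k, nu):
    """Gamma variate with integer shape k and rate nu: -(1/nu) * sum of k ln(u)."""
    return -sum(math.log(1.0 - rng.random()) for _ in range(k)) / nu


def required_mean_capacity(target=0.196, n=20_000, seed=11):
    """Smallest mu_D on the grid 1.70, 1.71, ..., 3.00 with P(D < A + B + C) <= target."""
    rng = random.Random(seed)
    loads = [erlang(rng, 12, 17.5) + erlang(rng, 11, 22.22) + erlang(rng, 16, 20.0)
             for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # standard normal draws for D
    for step in range(131):
        mu_d = round(1.70 + 0.01 * step, 2)
        failures = sum(1 for load, zi in zip(loads, z) if mu_d + 0.30 * zi < load)
        if failures / n <= target:
            return mu_d, failures / n
    return None   # target not reached on the grid


result = required_mean_capacity()
print(result)
```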
► EXAMPLE 5.11 In an apartment building, the number of occupants in the building during a fire would be highly variable. Suppose the total number of occupants living in the apartment building is 150; however, during a fire the number of people in the building may be as few as 50. Therefore, during a fire the actual number would be a random variable that may be described with a beta distribution ranging between 50 and 150, with parameters q = 1.50 and r = 3.00. Assume also that during a fire the probability of serious injury to an individual in the building is 0.001. Therefore, assuming that injuries between individuals are statistically independent, the probability of x number of injuries during a fire among n occupants is given by the binomial distribution (see Sect. 3.2) as follows:
$$P(X = x \mid N = n) = \binom{n}{x}(0.001)^x(1 - 0.001)^{n - x}$$
However, the number of occupants exposed to the fire, n, is a beta-distributed random variable as described above. Therefore, the required probability may be evaluated as
$$P(X = x) = \sum_{n=50}^{150} P(X = x \mid N = n)\,P(N = n) = \sum_{n=50}^{150}\binom{n}{x}(0.001)^x(1 - 0.001)^{n - x}\,P(N = n)$$
in which N is beta-distributed within the range of (50, 150) with parameters q = 1.50 and r = 3.00. Clearly, a numerical solution is necessary to evaluate the above probability for a given x. In this case, the solution involves the summation of discrete probabilities (including discretization of the beta PDF). For this purpose, a MATLAB program is developed below to perform the required solution.
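A minimal Python sketch of the same total-probability summation follows. The discretization of the beta PDF is an assumption: P(N = n) is taken proportional to the beta kernel t^(q−1)(1 − t)^(r−1) at t = (n − 50)/100 and then normalized over n = 50, ..., 150; a finer scheme (for example, differences of the beta CDF) would give slightly different numbers.

```python
from math import comb


def injury_pmf(x, p=0.001, lo=50, hi=150, q=1.5, r=3.0):
    """P(X = x) = sum over n of Binomial(x; n, p) * P(N = n), n = lo..hi.

    Assumed discretization: P(N = n) is proportional to the beta kernel
    t**(q - 1) * (1 - t)**(r - 1) at t = (n - lo) / (hi - lo), normalized.
    """
    ns = range(lo, hi + 1)
    kernel = [((n - lo) / (hi - lo)) ** (q - 1) * (1 - (n - lo) / (hi - lo)) ** (r - 1)
              for n in ns]
    total = sum(kernel)
    prob = 0.0
    for n, w in zip(ns, kernel):
        if x <= n:
            prob += comb(n, x) * p ** x * (1 - p) ** (n - x) * (w / total)
    return prob


for x in range(4):
    print(x, injury_pmf(x))
```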
► EXAMPLE 5.12 In environmental engineering, waste treatment is a necessary process for controlling the ultimate BOD level of a waste. If a waste is not treated, the ultimate BOD level of the waste may be predicted from the BOD measured at an earlier stage of degradation. Suppose that at time t = 5 days, the BOD was measured to be 200 mg/L. On this basis, the ultimate BOD level that the waste will eventually reach is
$$L_0 = \frac{(\mathrm{BOD})_5}{1 - \exp(-5k)}$$
if no further treatment is performed on the waste beyond t = 5 days. The parameter k is the reaction rate constant; it depends on the temperature, type of waste, and the degradation process. Assume that k may be modeled as a normal variate, N(0.22, 0.03) per day.
Moreover, because of variability in the measurement of the BOD level, (BOD)_5 may be assumed to be N(200, 10) in mg/L. Clearly, the distribution of L_0 will not be normal; moreover, its distribution will not be any of the standard distributions. We may determine the distribution and statistics of the ultimate level of BOD, L_0, through MCS as shown below using MATHCAD.
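The MCS for L_0 can be sketched in Python as below, with k ~ N(0.22, 0.03) per day and (BOD)_5 ~ N(200, 10) mg/L. Independence of k and (BOD)_5 is an assumption (no correlation is stated), and the seed and sample size are arbitrary. As a rough check, the nominal value 200/[1 − exp(−5 × 0.22)] is about 300 mg/L.

```python
import math
import random
import statistics


def ultimate_bod_samples(n=20_000, seed=12):
    """Simulate L0 = (BOD)5 / [1 - exp(-5k)] for independent k and (BOD)5."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        bod5 = rng.gauss(200.0, 10.0)    # (BOD)5 ~ N(200, 10) mg/L
        k = rng.gauss(0.22, 0.03)        # reaction rate k ~ N(0.22, 0.03) per day
        samples.append(bod5 / (1.0 - math.exp(-5.0 * k)))
    return samples


l0_samples = ultimate_bod_samples()
print(statistics.mean(l0_samples), statistics.stdev(l0_samples))
```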
► EXAMPLE 5.13 Suppose an engineering system has been designed with a median safety factor, θ. The underlying probability of failure due to inherent variability, δ_θ, is
where:
θ = the median safety factor; assume θ = 2.5
δ_θ = the c.o.v. of θ; assume δ_θ = 0.25
However, because of epistemic uncertainty, θ is also a random variable; suppose its distribution is lognormal, LN(2.5, 0.15), i.e., with a median of 2.5 and a c.o.v. of 0.15. Because θ is a random variable, p_F is also a random variable; its distribution will not be one of the standard forms. We may generate the values and histogram of p_F through Monte Carlo simulation. The MATHCAD statements for the MCS with a sample size of 10,000 are shown below.
For each of the 10,000 values of θ, we calculate the above p_F. The result is a vector of 10,000 values of p_F, from which we can construct the histogram of p_F, as shown below for log(p_F), as well as estimate its mean, median, standard deviation, and skewness, and determine also its 90% value as follows:
Histogram of log(pF)
We may emphasize that the aleatory uncertainty gives rise to the probability of failure p_F, whereas the epistemic uncertainty is reflected in the range and distribution of p_F. The distribution of p_F provides a decision maker the option to select conservative values of p_F, such as the 90% value p_F,90 = 1.842 × 10^−3.
mean(pF) = 8.414 × 10^−4
median(pF) = 1.218 × 10^−4
Stdev(pF) = 2.833 × 10^−3
skew(pF) = 10.578
mean = 8.46e−004, standard deviation = 0.0029, and skewness coefficient = 17.61.
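The sketch below assumes the lognormal-format relation p_F = Φ(−ln θ / δ_θ) for the aleatory failure probability. This assumed form gives p_F ≈ 1.24 × 10⁻⁴ at θ = 2.5 and δ_θ = 0.25, close to the median 1.218 × 10⁻⁴ reported above, but treat it as a reconstruction rather than the authors' stated formula. It uses `statistics.NormalDist` (Python 3.8+) for Φ and `statistics.quantiles` for the 90% value.

```python
import math
import random
import statistics

STANDARD_NORMAL = statistics.NormalDist()


def failure_probability(theta, delta=0.25):
    """Assumed aleatory model: pF = Phi(-ln(theta) / delta)."""
    return STANDARD_NORMAL.cdf(-math.log(theta) / delta)


def epistemic_pf_samples(n=10_000, seed=13):
    """Sample theta ~ LN(median 2.5, c.o.v. 0.15) and return the pF for each."""
    rng = random.Random(seed)
    zeta = math.sqrt(math.log(1.0 + 0.15 ** 2))
    return [failure_probability(rng.lognormvariate(math.log(2.5), zeta))
            for _ in range(n)]


pf = epistemic_pf_samples()
median_pf = statistics.median(pf)
p90 = statistics.quantiles(pf, n=10)[-1]   # 90% value
print(statistics.fmean(pf), median_pf, p90)
```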
► EXAMPLE 5.14 Suppose the probability of failure of an engineering system due to inherent variability (aleatory uncertainty) is
$$p_F = P(R - S < 0)$$
where:
R = N(40, 15)
S = LN(25, 0.35), i.e., median of 25 and c.o.v. of 0.35
Clearly, p_F can be easily evaluated by MCS but would be difficult to evaluate analytically, as (R − S) is not of any standard distribution. Moreover, because of possible errors in the estimation of μ_R (mean of R) and s_m (median of S), there are epistemic uncertainties in the estimates of these parameters; assuming no bias and respective c.o.v.s of 0.10 and 0.15, we may denote the distributions of these parameters (as multiples of their nominal values) to be N(1.0, 0.10) and N(1.0, 0.15), respectively. Because of these epistemic uncertainties in μ_R and s_m, the true failure probability will be a random variable.
For each pair of values of μ_R and s_m, we calculate the above p_F, and for the ranges of the random values of μ_R and s_m we obtain a vector of p_F.
The solution requires repeated iterations by a double-loop MCS: the outer loop generates the random pairs of μ_R and s_m, whereas for each given pair the inner loop generates the random vectors of R and S and calculates p_F. This requires programming to perform the repeated iterations automatically; the results obtained using MATHCAD with a sample size of n = 10,000 are as follows:
The mean, standard deviation, and skewness coefficient of the failure probability are, respectively:
Also, the 90% value of p_F is p_F,90 = 0.366. Again, for a conservative value of the failure probability, we may use p_F,90 = 0.366 instead of the mean p_F. The corresponding histogram for p_F is as follows:
p_F,50 = 0.226
p_F,75 = 0.298
p_F,90 = 0.366
The following MATLAB program also yields the solutions for this example.
x = R-S;
y = (x<=0);
n = sum(y);
pf = n/1000;
pF(i,1) = pf;   % a vector of 1000 values of pF
Execution of the above program yields a vector of 1000 values of p_F, with the corresponding histogram as follows:
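The double-loop MCS can be sketched in Python as follows. Two modeling details are assumptions consistent with, but not stated by, the text: the epistemic factors N(1.0, 0.10) and N(1.0, 0.15) multiply the nominal values μ_R = 40 and s_m = 25, and the standard deviation of R (15) and the c.o.v. of S (0.35) are held fixed. The outer and inner sample sizes are kept small for speed, so the percentiles are rougher than the MATHCAD values.

```python
import math
import random
import statistics


def double_loop_pf(outer=300, inner=2000, seed=14):
    """Outer loop samples epistemic mu_R and s_m; inner loop estimates pF by MCS."""
    rng = random.Random(seed)
    zeta_s = math.sqrt(math.log(1.0 + 0.35 ** 2))   # S has c.o.v. 0.35
    pf_values = []
    for _ in range(outer):
        mu_r = 40.0 * rng.gauss(1.0, 0.10)   # assumed: factor N(1.0, 0.10) on mean of R
        s_m = 25.0 * rng.gauss(1.0, 0.15)    # assumed: factor N(1.0, 0.15) on median of S
        lam_s = math.log(s_m)
        failures = 0
        for _ in range(inner):
            r = rng.gauss(mu_r, 15.0)               # R ~ N(mu_R, 15)
            s = rng.lognormvariate(lam_s, zeta_s)   # S ~ LN(s_m, 0.35)
            if r - s <= 0:
                failures += 1
        pf_values.append(failures / inner)
    return pf_values


pfs = double_loop_pf()
print(statistics.fmean(pfs), statistics.median(pfs), statistics.quantiles(pfs, n=10)[-1])
```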
► EXAMPLE 5.15 In Example 5.9, we can expect significant (epistemic) uncertainties in the specification of the mean value of the drag coefficient, μ_C, and in the estimation of the mean wind velocity, μ_V. Moreover, there may also be imperfection (consisting of bias and random error) in the equation for calculating the wind-induced pressure P_W. Let us suppose the following epistemic uncertainties, expressed in terms of the respective c.o.v.s.
And there is no bias in the equation for P_W, but there is uncertainty (in terms of c.o.v.) of 0.20. Therefore, the total epistemic uncertainty in the calculated wind-induced pressure P_W may be aggregated as
Histogram of Pw
P := N ← rnorm(1000, 1.0, 0.30)
     M ← rnorm(1000, 1.0, 0.15)
     L ← rnorm(1000, 1.0, 0.20)
     for i ∈ 0..999
         C ← rnorm(1000, 1.8·N_i, 0.50)
         u ← runif(1000, 0, 1)
         V ← 100·M_i − ln(ln(1/u))/0.037
         P ← 1.165·10^−3·(C·V²)·L_i
         continue
     P
Clearly, therefore, the calculated failure probability of the superstructure (as well as of the foundation) will depend on the value of the mean wind pressure. Assuming that there are no epistemic uncertainties in R_S and R_F, we perform repeated MCS for 1000 sample values of the mean wind pressure (see the MATHCAD program below), obtaining the probability distributions of the failure probabilities of the superstructure and the foundation with the following respective statistics:
Also, from the same MCS, we obtain the histograms of the failure probabilities of the superstructure and foundation, respectively, as shown below.
On the basis of the respective histograms, we may determine the corresponding 90-percentile failure probabilities of the superstructure and foundation; these may be referred to as the values with a 10% probability of error (i.e., of being exceeded).
For the superstructure, p_F,90 = 0.244, and for the foundation, p_F,90 = 0.312.
As in Example 5.9, the failures of the superstructure and of the foundation are not statistically independent because of the common load P_W. By MCS, the failure probability of the structure-foundation system, which is the union of the superstructure and foundation failures, is found to be as follows:
The corresponding 90% value of the failure probability of the structure-foundation system is 0.361.
P ← 1.165 × 10^−3 (C·V²·L_j)
S ← rnorm(1000, 70, 15)
F ← rnorm(1000, 60, 20)
x ← (S < P)
pFs_j ← Σ_{i=0}^{999} x_i / 1000
y ← (F < P)
y_i ← x_i ∨ y_i
pFSS_j ← Σ_{i=0}^{999} y_i / 1000
continue
pFSS
The MATHCAD program for calculating the three failure probabilities (for the superstructure, the foundation, and the structure-foundation system) is described above. We should point out that in this program the proper input/output must be specified prior to each execution; i.e., pFs for the superstructure, pFf for the foundation, and pFSS for the structure-foundation system failure probabilities. As illustrated above, the program will yield pFSS.
The MCS solutions for Example 5.15 are obtained also by MATLAB. The corresponding MATLAB
program to generate the distribution of Pw and calculate the failure probabilities of the superstructure
and the foundation, as well as of the structure-foundation system, is as follows:
The results obtained with a sample size of 1000 can be summarized as follows:
The above procedure can be extended to three or more random variables, say for k random variables X_1, X_2, ..., X_k with known joint PDF $f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k)$ or joint CDF $F_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k)$. For such dependent random variables, the joint PDF can be expressed as the product of the marginal PDF of X_1 and successive conditional PDFs,
$$f_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = f_{X_1}(x_1)\, f_{X_2|X_1}(x_2|x_1) \cdots f_{X_k|X_1, \ldots, X_{k-1}}(x_k|x_1, \ldots, x_{k-1})$$
and the corresponding marginal and conditional CDFs are
$$F_{X_1}(x_1),\quad F_{X_2|X_1}(x_2|x_1),\quad \ldots,\quad F_{X_k|X_1, \ldots, X_{k-1}}(x_k|x_1, \ldots, x_{k-1})$$
To generate the random numbers for each of the k random variables, we first generate k vectors of uniformly distributed random numbers between 0 and 1, U_1, U_2, ..., U_k, each of size n. Then, x_1 = F_{X_1}^{-1}(u_1), and the kth vector of n random numbers, for X_k, is generated by
$$x_k = F^{-1}_{X_k|X_1, \ldots, X_{k-1}}(u_k \mid x_1, \ldots, x_{k-1}) \tag{5.5}$$
Besides the general procedure described above that applies to a set of random variables
with any prescribed joint distribution, there are also specialized techniques that apply to
particular types of joint distributions, such as the multivariate joint normal distribution as
illustrated below in Example 5.16 for the bivariate normal and Example 5.17 for the bivariate
lognormal distributions.
► EXAMPLE 5.16 Consider the generation of correlated random numbers for two random variables, X and Y, with a bivariate normal distribution. Following Example 5.10 in Ang and Tang (1984), the joint PDF of a bivariate distribution can be expressed as
$$f_{X,Y}(x, y) = f_{Y|X}(y|x)\, f_X(x)$$
where f_{Y|X}(y|x) is the conditional normal PDF of Y given X as defined in Chapter 3, with mean μ_{Y|X} = μ_Y + ρ(σ_Y/σ_X)(x − μ_X) and standard deviation σ_{Y|X} = σ_Y√(1 − ρ²), and f_X(x) is the marginal normal PDF of X with mean μ_X and standard deviation σ_X. X and Y are correlated with a correlation coefficient ρ.
Suppose the parameters of the two random variables are defined as follows:
Px = <50; ax = 20
μ_Y = 120; σ_Y = 25; and ρ = 0.75
The following MATLAB program generates the correlated bivariate normal random numbers with a sample size of n = 100,000:
The generated random numbers, therefore, reproduce the statistics of the original or prescribed bivariate normal distribution.
Also, the respective skewness coefficients of the marginal histograms are as follows, with both close to zero:
confirming that the random numbers generated for X and for Y|x are, respectively, very close to being normally distributed.
The marginal histograms for X and Y|x are, respectively:
To further confirm the above MCS results, we calculate the probability P(X < Y) analytically and by MCS, yielding the following for the above joint distribution:
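The conditional method for the bivariate normal case can be sketched in Python as below: X is drawn from its marginal, then Y from the conditional normal with mean μ_Y + ρ(σ_Y/σ_X)(x − μ_X) and standard deviation σ_Y√(1 − ρ²). The MCS estimate of P(X < Y) is compared with the exact value Φ[(μ_Y − μ_X)/√(σ_X² + σ_Y² − 2ρσ_Xσ_Y)], since Y − X is normal. μ_X is passed as a parameter, and the value 100 used below is a placeholder, not the value of Example 5.16.

```python
import math
import random
import statistics


def bivariate_normal(n, mu_x, sd_x, mu_y, sd_y, rho, seed=16):
    """Generate (X, Y) pairs: X from f_X, then Y from f_{Y|X}(y | x)."""
    rng = random.Random(seed)
    sd_cond = sd_y * math.sqrt(1.0 - rho ** 2)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sd_x)
        mu_cond = mu_y + rho * (sd_y / sd_x) * (x - mu_x)
        xs.append(x)
        ys.append(rng.gauss(mu_cond, sd_cond))
    return xs, ys


def pearson(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def exact_p_x_less_than_y(mu_x, sd_x, mu_y, sd_y, rho):
    """P(X < Y) for jointly normal X, Y, using Y - X ~ normal."""
    sd_diff = math.sqrt(sd_x ** 2 + sd_y ** 2 - 2.0 * rho * sd_x * sd_y)
    return statistics.NormalDist().cdf((mu_y - mu_x) / sd_diff)


MU_X_PLACEHOLDER = 100.0   # hypothetical; not the mu_X of Example 5.16
params = (MU_X_PLACEHOLDER, 20.0, 120.0, 25.0, 0.75)
xs, ys = bivariate_normal(50_000, *params)
p_mcs = sum(x < y for x, y in zip(xs, ys)) / len(xs)
print(pearson(xs, ys), p_mcs, exact_p_x_less_than_y(*params))
```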
► EXAMPLE 5.17 Next, consider the generation of random numbers of two random variables, X and Y, that are jointly lognormally distributed. As we saw in Example 4.3, if X is a lognormal variate, ln(X) will be normally distributed. Therefore, we can generate random numbers for ln(X) and ln(Y) with a bivariate normal distribution as described in Example 5.16, and then determine the respective lognormal random numbers for X and Y:
We illustrate this numerically as follows. Assume the following:
For a sample size of n = 10,000, the following MATLAB statements generate the bivariate lognormal
random numbers with the properties prescribed above.
Histogram of X (horizontal axis: values of X)
Let us now suppose that X is the supply of construction material and Y is the demand for the material in a construction project. It is reasonable to assume that the supply and demand are positively correlated; i.e., the supply will be higher or lower depending on the demand. We assume a correlation coefficient of 0.75 between X and Y. In this case, we may be interested in the probability of shortage of material for the project (supply will be less than the demand), i.e., P(X < Y). We do this through MCS with the following MATLAB statements for a sample size of n = 10,000:
f = (X < Y);   % 1 where the supply is less than the demand
s = sum(f);
P = s/10000
yielding the pertinent probability as P(X < Y) = 0.0563
It might be of interest to compare this probability with the corresponding probability if the
demand and supply were uncorrelated or statistically independent. In this latter case, we can show
that the probability would be P(X < Y) = 0.1901, showing, therefore, that the assumed correlation of
75% between X and Y significantly affects the probability of shortage. ◄
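A minimal Python sketch of the bivariate lognormal generation follows: correlated normal values for ln X and ln Y are generated (in standardized form, equivalent to the conditional method of Example 5.16) and then exponentiated. The medians (120 for supply, 100 for demand) and log-standard deviations (0.2) are hypothetical placeholders, not the values assumed in the text, and the correlation 0.75 is applied to ln X and ln Y rather than to X and Y themselves, a simplification. The exact P(X < Y) = Φ[(λ_Y − λ_X)/√(ζ_X² + ζ_Y² − 2ρζ_Xζ_Y)] serves as a check, and the uncorrelated case shows the same qualitative effect as in the text: positive correlation lowers the shortage probability when supply usually exceeds demand.

```python
import math
import random
import statistics


def bivariate_lognormal(n, lam_x, zeta_x, lam_y, zeta_y, rho_ln, seed=17):
    """X = exp(U), Y = exp(W), with (U, W) bivariate normal, corr(U, W) = rho_ln."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        u = lam_x + zeta_x * z1
        w = lam_y + zeta_y * (rho_ln * z1 + math.sqrt(1.0 - rho_ln ** 2) * z2)
        xs.append(math.exp(u))
        ys.append(math.exp(w))
    return xs, ys


def exact_shortage(lam_x, zeta_x, lam_y, zeta_y, rho_ln):
    """P(X < Y) = P(ln X - ln Y < 0) for jointly lognormal X, Y."""
    sd = math.sqrt(zeta_x ** 2 + zeta_y ** 2 - 2.0 * rho_ln * zeta_x * zeta_y)
    return statistics.NormalDist().cdf((lam_y - lam_x) / sd)


# Hypothetical supply X (median 120) and demand Y (median 100), zeta = 0.2 each.
LAM_X, LAM_Y, ZETA = math.log(120.0), math.log(100.0), 0.2
xs, ys = bivariate_lognormal(50_000, LAM_X, ZETA, LAM_Y, ZETA, 0.75)
p_corr = sum(x < y for x, y in zip(xs, ys)) / len(xs)
p_indep = exact_shortage(LAM_X, ZETA, LAM_Y, ZETA, 0.0)
print(p_corr, exact_shortage(LAM_X, ZETA, LAM_Y, ZETA, 0.75), p_indep)
```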
► EXAMPLE 5.18 Let us now illustrate the general procedure described earlier for the bivariate normal distribution of
Example 5.16, in which the statistics of X and Y are as follows:
Execution of the above MATLAB statements yields the following statistics for X and Y|x:
which are very close to those of Example 5.16. The respective histograms of X and Y are as follows:
which are both close to zero. As expected, we observe that the above statistics agree closely with
those of Example 5.16.
► EXAMPLE 5.19 For a general bivariate distribution, consider the following PDF of X and Y:
(xy + y2)
AixUU) =
4(x + 4)
Because the product of the marginal PDFs of X and Y is not equal to the joint bivariate PDF, the two variates are not statistically independent.
The above results can be confirmed using the Symbolic Math Toolbox in the following MATLAB program; the program also generates the required inverse functions of the respective CDFs. The appropriate program for these purposes is as follows.
With the inverse functions generated in the program above, the following MATLAB program will then generate the random numbers of X, Y, and Y|x:
u1 = unifrnd(0,1,10000,1);
x = -2+2.*(1+3.*u1).^(1/2);    % generates a vector of 10,000 values of X
u2 = unifrnd(0,1,10000,1);
y = -1+(1+24.*u2).^(1/2);      % generates a vector of 10,000 values of Y
Yx = -1/2.*x+1/2.*(x.^2+64.*u2+16.*u2.*x).^(1/2);   % vector of 10,000 values of Y|x
Executing the second program above, we obtain the following statistics for X, Y, and Y|x:
From these statistics, we can see that the statistics of Y|x are different from those of Y, indicating that there is some dependency between X and Y. The correlation coefficient of −0.054 indicates that there is a weak correlation between X and Y. The corresponding histograms can be displayed as shown below:
► EXAMPLE 5.20 Consider next a more complicated bivariate joint PDF as follows:
A.rU. y) = |
1 < v < 2
1 (x \ 1 ( x~ \
A(x) = + 1); /’x( v) - - I — + V )
3\4 /
2/ 1 \ 2/ 1 \
A(y) = t - + i ;
3 \ V" / 3\ y/
Then, using the respective inverse functions obtained above, the following MATLAB program generates the vectors of X, Y, and Y|x.
u = unifrnd(0,1,10000,1);
x = -2+2.*(1+3.*u).^(1/2);                 % generates a vector of 10000 values of X
v = unifrnd(0,1,10000,1);
y = (3/4).*v+(1/4).*(9.*v.^2+16).^(1/2);   % generates a vector of 10000 values of Y
Yx=-(l/4) .*
x+l/2+(l/2) . % generates a vector of 10,000
*v+(l/8) .* v .*x+(l/8) . values of Y\x
(4. x.a2 + *
** 16. x*-8. v.
x-4.
**
x.
v. 2+16+32. v+16 .
a*
v .A2. *
*v.A2+8 . * x+v .A2.
*x.A2).A(l/2)
In this case, we also find the correlation coefficient between X and Y to be −0.036, indicating that there is some slight dependency between X and Y.
The respective histograms are also portrayed below.
► EXAMPLE 5.21 Finally, consider the following joint PDF for two random variables X and Y:
1 , 3x
Fx.y(x-y) = 77Tb’A'' +aT)
162
and the associated marginal distributions for X and Y, and the conditional distribution of Y|x, are, respectively,
f_X(x) = (1/18)(x² + 3);  f_Y(y) = (1/18)(y² + 3);
1 1 ( V V“ 4- )
Fx(x) = — (A-3 + 9x); Fr(y) = —(y3 + 9y); /’V|.r(y|x) = -4——j—
54 54 3(x2 4- 9)
On the basis of the above statistics, we can see that there is dependency between X and Y; specifically, the statistics for Y are different from those of Y|x. Also, there is a correlation coefficient of −0.150 between X and Y, indicating some dependency.
The corresponding histograms are displayed, respectively, below:
With the results generated above, let us examine the probability that the product XY is less than 3:
f = (x.*Yx < 3);   % calculates (XY < 3) for each pair of values of X and Y|x
S = sum(f);
P = S/10000        % yielding P(X·Y|x < 3) = 0.412
whereas if X and Y were statistically independent, the corresponding MATLAB program would be
g = (x.*y < 3);
s = sum(g);
p = s/10000        % yielding P(XY < 3) = 0.454
The above results, therefore, indicate that the correlation affects the threshold probability.
► PROBLEMS
The problems in this chapter require solutions by Monte Carlo simulation or numerical procedures. For this purpose, personal computers (PCs) with appropriate software are necessary. Commercially available software includes MATHCAD, MATLAB, and MATHEMATICA; MS Excel + VISUAL BASIC may also be used. To understand and apply the software languages used in this chapter, some familiarity with one or more of these software packages (particularly MATHCAD or MATLAB) will be helpful.
Many of the problems in Chapters 3 and 4 may also be solved by numerical procedures
or Monte Carlo simulations, either with the probability distributions as specified in Chapters
3 and 4 or with revised distributions as appropriate. Specifically, consider the following:
5.1 This problem is similar to that of Example 4.10 of Chapter 4. The integrity of the columns in a high-rise building is essential for the safety of the building. The total load acting on the columns may include the effects of the dead load D (primarily the weight of the structure), the live load L (that includes human occupancy, furniture, movable equipment, etc.), and the wind load W.
These individual load effects on the building columns may be assumed to be statistically independent variates with the following respective probability distributions and corresponding parameters:
Distribution of D is normal with μ_D = 4.2 tons and σ_D = 0.3 ton
Distribution of L is lognormal with μ_L = 6.5 tons and σ_L = 0.8 ton
Distribution of W is Type I extreme with μ_W = 3.4 tons and σ_W = 0.7 ton
The total combined load effect S on each of the columns is
S = D + L + W
The individual columns were designed with a mean strength that is equal to 1.5 times the total mean load that it carries, and may be assumed to be a lognormal variate with a c.o.v. of 15%. The strength of each column R is clearly independent of the applied load. A column will be over-stressed when the applied load S exceeds the strength R; determine the probability of occurrence of this event (R < S).
5.2 The annual operational cost for a waste treatment plant is a function of the weight of solid waste, W, the unit cost factor, F, and an efficiency coefficient, E, as follows:
C = WF/√E
where W, F, and E are statistically independent variates. Assume that the respective probability distributions are as follows, with the following medians and coefficients of variation (c.o.v.):
Variable   Distribution          Median               c.o.v.
W          lognormal             2000 tons per year   20%
F          beta (from 0 to 50)   $20 per ton          15%
E          normal                1.6                  12.5%
As C is a function of the product and quotient of the three variates with the respective distributions indicated above, its probability distribution is difficult to determine analytically. By Monte Carlo simulation, with a sample size of n = 10,000, determine the probability that the annual cost of operating the waste treatment plant will exceed $35,000.
Problems 243
The result may be compared with that of Example 4.13 in (c) The number of passengers originating from Town B is twice
which the random variables were all assumed to be lognormally that of the number originating from Town A. What percentage
distributed. of the passengers will arrive at the shopping center during rush
hours in less than 1 hour?
5.3 In Problem 4.15. the amount of water supply available for a
(d) A passenger starting from Town B has an appointment at
city in the next month is lognormally distributed with a mean of
3:00 pm at the shopping center. If the bus left Town B at 2:00 pm
1 million gallons and a c.o.v. of 40%, whereas the total demand
but has not arrived at the shopping center at 2:45 pm, what is
is expected to follow a lognormal distribution with a mean of
the probability that he or she will arrive on time for his or her
1.5 million gallons with a c.o.v. of 10%.
appointment?
(a) Evaluate by MCS the probability of a water shortage in this
In Problem 4.9, the distributions of the travel times were all
city over the next month. The result should be the same as, or
Gaussian random variables. The results obtained above may be
close to. that of Example 4.15.
compared with those of Problem 4.9.
(b) Suppose the available water supply for the city in the next
month is beta distributed with parameters q = 2.00 and r = 4.00. 5.5 The settlement of each footing shown in the figure below
whereas the total demand is still lognormally distributed. The follows a normal distribution with a mean of 2 in. and a coeffi
means and c.o.v.s remain the same. Evaluate the end values a cient of variation of 30%. Suppose the settlements between two
and b of the beta distribution, and estimate by MCS the proba adjacent footings are correlated with a correlation coefficient of
bility of a water shortage in the city in the next month. 0.7; that is, the differential settlement is
(c) In part (a), suppose the mean supply of I million gallons D = |51-52|
can actually vary between 0.80 and 1.1 million gallons in the
next month (equivalent to an cpistemic uncertainty with a c.o.v. where S| and S2 are the settlements of footings 1 and 2, respec-
of 0.09). Determine by MCS the statistics of the probability of
water shortage in the city over the next month.
5.4 A shuttle bus operates from a shopping center, travels to
Towns A and B sequentially, and then returns to the shopping
center as shown in the figure below.
246 Chapter 6. Statistical Inferences from Observational Data
[Figure: Statistical inference → Statistical estimation: mean $\mu \approx \bar{x} = \frac{1}{n}\sum x_i$; variance $\sigma^2 \approx s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$]
• Efficiency refers to the variance of the estimator: for a given set of data, an estimator $\hat{\theta}_1$ is said to be more efficient than another estimator $\hat{\theta}_2$ if the variance of $\hat{\theta}_1$ is smaller than that of $\hat{\theta}_2$.
and for the sample variance,
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad (6.2)$$
Accordingly, $\bar{x}$ and $s^2$ are, respectively, the point estimates of the population mean $\mu$ and population variance $\sigma^2$.
We might point out that the sample variance of Eq. 6.2 is an "unbiased" estimate of $\sigma^2$; observe, however, that this is not quite the average of the sample values of $(x_i - \bar{x})^2$ for $i = 1, 2, \ldots, n$. The average would require division by $n$ instead of $(n - 1)$, which would give a biased estimator of $\sigma^2$.
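The bias can be seen in a small Monte Carlo experiment of the kind used in Chapter 5. The Python sketch below (standard library only) is illustrative: the Gaussian population with σ = 2, the sample size n = 5, the seed, and the 20,000 repetitions are arbitrary assumptions, not data from the text. Averaged over many samples, the (n − 1) divisor of Eq. 6.2 should come out close to σ² = 4, whereas the divisor n should come out close to (n − 1)σ²/n = 3.2.

```python
import random
import statistics

random.seed(1)                       # fixed seed so the run is reproducible
SIGMA, N, TRIALS = 2.0, 5, 20_000    # illustrative population s.d., sample size, repetitions

unbiased, biased = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations from xbar
    unbiased.append(ss / (N - 1))               # Eq. 6.2: divide by (n - 1)
    biased.append(ss / N)                       # plain average: divide by n

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
print(f"divisor n-1: {mean_unbiased:.3f}   divisor n: {mean_biased:.3f}")
```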
By expanding the squared terms of Eq. 6.2, it can be shown that Eq. 6.2 may also be expressed as
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right] \qquad (6.3)$$
6.2 Statistical Estimation of Parameters 249
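Distribution: PDF; mean and variance in terms of the parameters
Gamma: $f_X(x) = \dfrac{\nu(\nu x)^{k-1}e^{-\nu x}}{\Gamma(k)}$, $x \ge 0$; $\mu_X = k/\nu$, $\sigma_X^2 = k/\nu^2$
Shifted (3-parameter) gamma: $f_X(x) = \dfrac{\nu[\nu(x-\gamma)]^{k-1}e^{-\nu(x-\gamma)}}{\Gamma(k)}$, $x \ge \gamma$; $\mu_X = k/\nu + \gamma$, $\sigma_X^2 = k/\nu^2$
Gaussian (normal): $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^2\right]$, $-\infty < x < \infty$; $\mu_X = \mu$, $\sigma_X^2 = \sigma^2$
Lognormal: $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\zeta x}\exp\left[-\dfrac{1}{2}\left(\dfrac{\ln x-\lambda}{\zeta}\right)^2\right]$, $x > 0$; $\mu_X = \exp\left(\lambda + \tfrac{1}{2}\zeta^2\right)$, $\sigma_X^2 = \mu_X^2\left(e^{\zeta^2}-1\right)$
Rayleigh: $f_X(x) = \dfrac{x}{\alpha^2}\exp\left[-\dfrac{1}{2}\left(\dfrac{x}{\alpha}\right)^2\right]$, $x \ge 0$; $\mu_X = \sqrt{\pi/2}\,\alpha$, $\sigma_X^2 = \left(2 - \dfrac{\pi}{2}\right)\alpha^2$
Uniform: $f_X(x) = \dfrac{1}{b-a}$, $a \le x \le b$; $\mu_X = \dfrac{a+b}{2}$, $\sigma_X^2 = \dfrac{1}{12}(b-a)^2$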
► EXAMPLE 6.1 Consider the data of the crushing strength of concrete tested for 25 cylinders listed in Table E6.1. Applying Eqs. 6.1 and 6.3, we obtain the sample mean and sample variance, respectively,
$$\bar{x} = \frac{1}{25}\sum_{i=1}^{25}x_i = \frac{140}{25} = 5.6 \text{ ksi}$$
and
$$s^2 = \frac{1}{24}\left[\sum_{i=1}^{25}x_i^2 - 25(5.6)^2\right] = \frac{1}{24}(794.52 - 784) = 0.44$$
Therefore, the estimates of the population mean and standard deviation are $\bar{x} = 5.6$ ksi and $s = \sqrt{0.44} = 0.66$ ksi.
Now, if the probability distribution of the crushing strength is modeled with the Gaussian distribution, then its parameters (see Table 6.1) would be $\mu = 5.6$ ksi and $\sigma = \sqrt{0.44} = 0.66$ ksi. However, if the crushing strength of concrete is assumed to be gamma distributed, then according to Table 6.1, the parameters $\nu$ and $k$ would be obtained from
$$k/\nu = 5.6 \quad\text{and}\quad k/\nu^2 = 0.44$$
which give $\nu = 5.6/0.44 = 12.73$ and $k = 5.6\nu = 71.27$.
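The arithmetic of Example 6.1 can be checked with a short Python sketch. It works only from the two sums quoted above (Σxᵢ = 140 and Σxᵢ² = 794.52; Table E6.1 itself is not reproduced here), applies Eq. 6.3, and then solves the gamma moment relations k/ν = x̄ and k/ν² = s² for ν and k.

```python
import math

n = 25
sum_x = 140.0      # sum of the 25 crushing strengths (ksi), from Example 6.1
sum_x2 = 794.52    # sum of their squares (ksi^2), from Example 6.1

xbar = sum_x / n                         # Eq. 6.1
s2 = (sum_x2 - n * xbar**2) / (n - 1)    # Eq. 6.3
s = math.sqrt(s2)

# Gaussian model: the parameters are the moments themselves
mu, sigma = xbar, s

# Gamma model: k/nu = mean and k/nu^2 = variance, so nu = mean/variance and k = nu * mean
nu = xbar / s2
k = nu * xbar
print(f"xbar={xbar:.2f} ksi, s2={s2:.4f}, s={s:.3f} ksi, nu={nu:.2f}, k={k:.2f}")
```

Because the code keeps the unrounded s² ≈ 0.438, it gives ν ≈ 12.78 and k ≈ 71.5; with the rounded s² = 0.44 used in the text, the same relations give ν ≈ 12.73 and k ≈ 71.3.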
► EXAMPLE 6.2 Data for the fatigue life of 75 S-T aluminum yielded the following sample mean and sample variance of the fatigue life (in terms of loading cycles):
$$\bar{x} = 26.75 \quad\text{and}\quad s^2 = 360.0$$
Substituting the sample mean and sample variance, respectively, for the true mean and variance, we obtain the estimated values of the lognormal parameters $\lambda$ and $\zeta$ (Table 6.1) as follows (denoting $\hat{\lambda}$ and $\hat{\zeta}$ as the respective estimators):
$$\exp\left(\hat{\lambda} + \tfrac{1}{2}\hat{\zeta}^2\right) = 26.75$$
and
$$(26.75)^2\left(e^{\hat{\zeta}^2} - 1\right) = 360.0$$
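The two moment equations of Example 6.2 can be solved in closed form: dividing the second by the square of the first gives e^{ζ̂²} − 1 = s²/x̄², so ζ̂² = ln(1 + s²/x̄²) and λ̂ = ln x̄ − ζ̂²/2. A minimal Python sketch of that solution:

```python
import math

xbar, s2 = 26.75, 360.0    # sample mean and variance of fatigue life (Example 6.2)

zeta2 = math.log(1.0 + s2 / xbar**2)   # from xbar^2 * (exp(zeta^2) - 1) = s^2
zeta = math.sqrt(zeta2)
lam = math.log(xbar) - 0.5 * zeta2     # from exp(lambda + zeta^2 / 2) = xbar
print(f"zeta = {zeta:.3f}, lambda = {lam:.3f}")
```

Substituting these values back into the two equations reproduces x̄ and s² exactly, which is a quick check on the algebra.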
observed set of sample values $x_1, x_2, \ldots, x_n$? This is the rationale underlying the maximum likelihood method of point estimation.
It is reasonable to assume that the likelihood of obtaining a particular sample value $x_i$ is proportional to the value of the PDF at $x_i$. Then, assuming random sampling, the likelihood of obtaining a set of n independent observations $x_1, x_2, \ldots, x_n$ is
$$L(x_1, x_2, \ldots, x_n;\theta) = f(x_1;\theta)\,f(x_2;\theta)\cdots f(x_n;\theta) = \prod_{i=1}^{n}f(x_i;\theta) \qquad (6.4)$$
which is the likelihood function of observing the set of sample values $x_1, x_2, \ldots, x_n$. We may then define the maximum likelihood estimator $\hat{\theta}$ as the value of $\theta$ that maximizes the likelihood function of Eq. 6.4; thus it may be obtained from the following equation (if the likelihood function, L, is differentiable with respect to the parameter $\theta$):
$$\frac{\partial L(x_1, x_2, \ldots, x_n;\theta)}{\partial\theta} = 0 \qquad (6.5)$$
Because the likelihood function of Eq. 6.4 is a product function, it is more convenient to maximize the logarithm of the likelihood function; i.e.,
$$\frac{\partial \log L(x_1, x_2, \ldots, x_n;\theta)}{\partial\theta} = 0 \qquad (6.6)$$
The solution for $\hat{\theta}$ from Eq. 6.6, of course, is the same as that obtained through Eq. 6.5.
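For PDFs that are defined with two or more parameters, the likelihood function becomes
$$L(x_1, x_2, \ldots, x_n;\theta_1, \ldots, \theta_m) = \prod_{i=1}^{n}f(x_i;\theta_1, \ldots, \theta_m) \qquad (6.7)$$
where $\theta_1, \ldots, \theta_m$ are the m parameters to be estimated. In this case, the maximum likelihood estimators of these parameters may be obtained from the solution to the following set of simultaneous equations:
$$\frac{\partial L(x_1, \ldots, x_n;\theta_1, \ldots, \theta_m)}{\partial\theta_j} = 0; \quad j = 1, 2, \ldots, m \qquad (6.8)$$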
The maximum likelihood estimate (MLE) of a parameter possesses many of the desirable properties of an estimator described earlier. In particular, for large sample size n, the maximum likelihood estimator is often considered to yield the "best" estimate of a parameter, in the sense that asymptotically it has the minimum variance (Hoel, 1962).
► EXAMPLE 6.3 The times between successive arrivals of vehicles at a toll booth were observed as follows:
1.2, 3.0, 6.3, 10.1, 5.2, 2.4, and 7.1 sec
Suppose the inter-arrival times of vehicles at the toll booth are modeled with an exponential distribution with the PDF
$$f_T(t) = \frac{1}{\lambda}e^{-t/\lambda}; \quad t \ge 0$$
in which $\lambda$ is the parameter of the distribution and is also the mean inter-arrival time.
According to Eq. 6.4, the likelihood function for observing the seven inter-arrival times is
$$L(t_1, t_2, \ldots, t_7;\lambda) = \prod_{i=1}^{7}\frac{1}{\lambda}\exp(-t_i/\lambda) = \lambda^{-7}\exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7}t_i\right)$$
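Taking logarithms,
$$\log L = -7\log\lambda - \frac{1}{\lambda}\sum_{i=1}^{7}t_i; \quad\text{and for the present example,}\quad \log L = -7\log\lambda - \frac{35.30}{\lambda}$$
Then
$$\frac{\partial \log L}{\partial\lambda} = -\frac{7}{\lambda} + \frac{35.30}{\lambda^2} = 0$$
from which we obtain the MLE of $\lambda$ as
$$\hat{\lambda} = \frac{35.30}{7} = 5.04 \text{ sec}$$
From the above result, we may infer generally that the MLE for the parameter $\lambda$, from a sample of size n, is
$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}t_i = \bar{t}$$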
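The closed-form result can be confirmed numerically. The Python sketch below maximizes the exponential log-likelihood log L(λ) = −n log λ − Σtᵢ/λ directly and compares the maximizer with the sample mean; the golden-section search and its bracket [0.1, 50] sec are illustrative choices, not part of the text.

```python
import math

times = [1.2, 3.0, 6.3, 10.1, 5.2, 2.4, 7.1]   # inter-arrival times (sec), Example 6.3

def log_likelihood(lam, data):
    """Natural-log likelihood of an exponential model with mean lam (log of Eq. 6.4)."""
    return -len(data) * math.log(lam) - sum(data) / lam

def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

lam_numeric = golden_max(lambda lam: log_likelihood(lam, times), 0.1, 50.0)
lam_closed = sum(times) / len(times)    # closed-form MLE: the sample mean
print(f"numerical MLE = {lam_numeric:.4f} sec, sample mean = {lam_closed:.4f} sec")
```

Both values come out at about 5.04 sec, as in the text.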
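► EXAMPLE 6.4 Tri-axial laboratory tests are performed on five specimens of saturated sand. Each specimen is subjected to cyclic vertical loads with stress amplitudes of ±200 psf. The numbers of load cycles applied until each of the specimens failed were observed to be as follows: 25, 20, 28, 33, and 26 cycles.
If the number of load cycles to failure of saturated sand is modeled with a lognormal distribution, the two parameters $\lambda$ and $\zeta$ may be estimated by the maximum likelihood method as follows.
According to Eq. 6.7, the likelihood function for the case of a sample of size n would be
$$L(x_1, \ldots, x_n;\lambda,\zeta) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\zeta x_i}\exp\left[-\frac{1}{2}\left(\frac{\ln x_i - \lambda}{\zeta}\right)^2\right] = \left(\frac{1}{\sqrt{2\pi}\,\zeta}\right)^{n}\left(\prod_{i=1}^{n}\frac{1}{x_i}\right)\exp\left[-\frac{1}{2\zeta^2}\sum_{i=1}^{n}(\ln x_i - \lambda)^2\right]$$
Then,
$$\frac{\partial \ln L}{\partial\lambda} = 0; \quad\text{yielding}\quad \frac{1}{\zeta^2}\sum_{i=1}^{n}(\ln x_i - \lambda) = 0$$
and
$$\frac{\partial \ln L}{\partial\zeta} = 0; \quad\text{yielding}\quad -\frac{n}{\zeta} + \frac{1}{\zeta^3}\sum_{i=1}^{n}(\ln x_i - \lambda)^2 = 0$$
From the two equations above, we obtain the MLE for the two parameters, respectively, as
$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}\ln x_i \quad\text{and}\quad \hat{\zeta}^2 = \frac{1}{n}\sum_{i=1}^{n}(\ln x_i - \hat{\lambda})^2$$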
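Applied to the present example, with the observed number of load cycles for the five specimens, we obtain the corresponding MLE as follows:
$$\hat{\lambda} = \frac{1}{5}(\ln 25 + \ln 20 + \ln 28 + \ln 33 + \ln 26) = 3.26$$
and
$$\hat{\zeta}^2 = \frac{1}{5}\sum_{i=1}^{5}(\ln x_i - 3.26)^2 = 0.027$$
or
$$\hat{\zeta} = 0.164$$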
For this example, the same parameters estimated by the method of moments would be as follows.
The sample mean is $\bar{x} = \frac{1}{5}(25 + 20 + 28 + 33 + 26) = 26.40$ cycles, and the sample variance is
$$s^2 = \frac{1}{4}\left[(25 - 26.4)^2 + (20 - 26.4)^2 + (28 - 26.4)^2 + (33 - 26.4)^2 + (26 - 26.4)^2\right] = 22.30$$
Thus, the sample standard deviation is $s = 4.72$ cycles.
Using the relationships given in Table 6.1 for the lognormal distribution, we have
$$26.40 = \exp\left(\lambda + \tfrac{1}{2}\zeta^2\right)$$
and
$$22.30 = (26.40)^2\left(e^{\zeta^2} - 1\right)$$
from which $\zeta = 0.177$ and $\lambda = 3.26$.
We see that the two methods yield the same estimates for the parameter $\lambda$, but the estimates for the parameter $\zeta$ differ by about 8% between the two methods. ◄
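Both sets of estimates in Example 6.4 can be reproduced with a minimal Python sketch (standard library only). The MLE uses the divisor n derived above; the method of moments uses the sample variance with divisor (n − 1) and the lognormal relations of Table 6.1.

```python
import math
import statistics

cycles = [25.0, 20.0, 28.0, 33.0, 26.0]   # load cycles to failure, Example 6.4
n = len(cycles)
logs = [math.log(x) for x in cycles]

# Maximum likelihood (divisor n, as derived above)
lam_mle = sum(logs) / n
zeta_mle = math.sqrt(sum((y - lam_mle) ** 2 for y in logs) / n)

# Method of moments with the lognormal relations of Table 6.1
xbar = statistics.mean(cycles)
s2 = statistics.variance(cycles)           # divisor n - 1
zeta2_mom = math.log(1.0 + s2 / xbar**2)
zeta_mom = math.sqrt(zeta2_mom)
lam_mom = math.log(xbar) - 0.5 * zeta2_mom

print(f"MLE: lambda={lam_mle:.3f}, zeta={zeta_mle:.3f}")
print(f"MoM: lambda={lam_mom:.3f}, zeta={zeta_mom:.3f}")
```

The unrounded MLE is ζ̂ ≈ 0.163; the text's 0.164 follows from rounding ζ̂² to 0.027 before taking the square root. Either way, the moment estimate ζ ≈ 0.177 is about 8% larger, while both λ estimates round to 3.26.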
Estimation of Proportion
In many engineering problems requiring the probability of occurrence or nonoccurrence of an event, the appropriate probability measure may be estimated as the proportion of experimental or field observations of the event; for example, the annual occurrence probability of hurricane-intensity wind in a coastal region, the proportion of vehicular traffic making left turns at a major intersection, or the proportion of embankment material meeting a specified compaction standard.
In such situations, the required probability may be estimated as the proportion of occurrences (of an event) in a Bernoulli sequence. Suppose that we have a sequence of n independent trials or random variables $X_1, X_2, \ldots, X_n$, where every $X_i$ is a two-valued random variable; i.e., $X_i = 1$ or $0$, denoting the occurrence or nonoccurrence of the event in the ith trial. Then, the sequence $X_1, \ldots, X_n$ constitutes a random sample of size n.
$$\hat{p} = \frac{1}{n}\sum_{i=1}^{n}X_i \qquad (6.9)$$
In other words, the estimate of p is the proportion of occurrences of an event among the sequence of n trials.
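$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \qquad (6.10)$$
$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n}E(X_i) = \frac{1}{n}(n\mu) = \mu \qquad (6.11)$$
which means that the expected value of the sample mean is equal to the population mean, and therefore $\bar{X}$ is an unbiased estimator of the population mean $\mu$.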
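Observing that $X_1, X_2, \ldots, X_n$ are statistically independent and identically distributed as X, the variance of $\bar{X}$ is, therefore,
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{\sigma^2}{n}$$
If X is Gaussian (or, by the central limit theorem, approximately for large n), $\bar{X}$ is Gaussian with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. It follows, therefore, that the distribution of $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is N(0, 1).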
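sample size n is small. More accurately, the random variable $\dfrac{\bar{X}-\mu}{S/\sqrt{n}}$ will have the Student's t-distribution (Freund, 1962) with (n − 1) degrees-of-freedom. The PDF of the t-distribution with f degrees-of-freedom is
$$f_T(t) = \frac{\Gamma[(f+1)/2]}{\sqrt{\pi f}\,\Gamma(f/2)}\left(1 + \frac{t^2}{f}\right)^{-(f+1)/2}; \quad -\infty < t < \infty \qquad (6.13)$$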
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \qquad (6.14)$$
$$E(S^2) = \sigma^2 \qquad (6.15)$$
On the basis of Eq. 6.15, the sample variance of Eq. 6.14 is, therefore, an unbiased estimator of the population variance, $\sigma^2$, as we asserted earlier for Eq. 6.2.
The variance of $S^2$ is given by (see Hald, 1952):
$$\mathrm{Var}(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right) \qquad (6.16)$$
where $\mu_4 = E(X-\mu)^4$ is the fourth central moment of the population X. We may observe that as n increases, the variance of $S^2$ decreases.
For sampling in Gaussian populations, i.e., when X is a Gaussian random variable, the distribution of the sample variance can be determined as follows.
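We may rewrite Eq. 6.14 as
$$(n-1)S^2 = \sum_{i=1}^{n}\left[(X_i-\mu) - (\bar{X}-\mu)\right]^2 = \sum_{i=1}^{n}(X_i-\mu)^2 - n(\bar{X}-\mu)^2$$
Dividing both sides by $\sigma^2$, we obtain
$$\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 - \left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right)^2 \qquad (6.17)$$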
where $X_i$ and $\bar{X}$ are both Gaussian. We may recognize that the first term on the right-hand side of Eq. 6.17 is the sum of squares of n independent standard normal variates; it can be shown, by generalizing the result of Example 4.7, that this has a chi-square distribution with n degrees-of-freedom (denoted as $\chi^2_n$). Similarly, the second term on the right-hand side of Eq. 6.17 has a chi-square distribution with one degree-of-freedom. Moreover, according to Hoel (1962), the sum of two independent chi-square variates with p and q degrees-of-freedom, respectively, also has a chi-square distribution with (p + q) degrees-of-freedom. On these bases (and because $\bar{X}$ and $S^2$ are statistically independent for samples from a Gaussian population), the distribution of $(n-1)S^2/\sigma^2$ is a chi-square distribution with (n − 1) degrees-of-freedom (d.o.f.); i.e., $\chi^2_{n-1}$.
$$f_C(c) = \frac{1}{2^{f/2}\,\Gamma(f/2)}\,c^{(f/2)-1}e^{-c/2}; \quad c \ge 0 \qquad (6.18)$$
The PDFs of Eq. 6.18 are shown in Fig. 6.5 for different degrees-of-freedom f. We may observe from this figure that as f increases, the $\chi^2_f$ distribution tends to approach the normal distribution by virtue of the central limit theorem.
The mean and variance of the $\chi^2_f$ distribution are, respectively,
$$\mu_C = f$$
and
$$\sigma_C^2 = 2f; \quad\text{or}\quad \sigma_C = \sqrt{2f}$$
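The distributional result for (n − 1)S²/σ² can be spot-checked by simulation. In this minimal Python sketch, the sample size n = 10, σ = 3.0, the population mean of 50, the seed, and the number of replications are all illustrative assumptions; the simulated statistic should have a mean near f = n − 1 = 9 and a variance near 2f = 18.

```python
import random
import statistics

random.seed(7)
N, SIGMA, REPS = 10, 3.0, 20_000     # illustrative sample size, population s.d., replications

c_values = []
for _ in range(REPS):
    sample = [random.gauss(50.0, SIGMA) for _ in range(N)]   # Gaussian population (mean arbitrary)
    xbar = sum(sample) / N
    s2 = sum((x - xbar) ** 2 for x in sample) / (N - 1)      # Eq. 6.14
    c_values.append((N - 1) * s2 / SIGMA**2)                 # the chi-square statistic

f = N - 1
c_mean = statistics.fmean(c_values)
c_var = statistics.variance(c_values)
print(f"mean {c_mean:.2f} (theory {f}), variance {c_var:.2f} (theory {2 * f})")
```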
The information developed in Sect. 6.2.2 will be useful in our presentation of hypothesis
testing and in determining the confidence intervals, which are the topics of Sects. 6.3 and
6.4, respectively.
$H_0$: $\mu = \mu_0$
$H_A$: $\mu > \mu_0$
that involve the upper tail of a distribution. Similarly, the one-sided test involving the lower tail of a distribution would be
$H_0$: $\mu = \mu_0$
$H_A$: $\mu < \mu_0$
Step 2—Identify the proper test statistic and its distribution.
The appropriate test statistic, and its probability distribution, will depend on
the population parameter that is being tested.
Step 3—Based on a sample of observed data, estimate the test statistic.
Step 4—Specify the level of significance.
Because the test statistic is a random variable, and its value is estimated from a
finite sample of observations, there is a probability of error that a wrong hypothesis
may be selected. In this regard, there are two possible types of errors:
Type I error: rejecting the null hypothesis $H_0$ when in fact it is true.
Type II error: accepting the null hypothesis $H_0$ when in fact it is false.
The probability of a Type I error, usually denoted $\alpha$, is called the level of significance. This level should be selected with some care, although its selection is largely subjective. In practice, values of $\alpha$ between 1% and 5% are often selected as the levels of significance.
Although there is also a probability of a Type II error, usually denoted $\beta$, this error is seldom evaluated in practice. Therefore, we shall concentrate only on the Type I error.
Step 5—Define the region for rejection of the null hypothesis.
Based on the probability distribution of the test statistic, we can define the region for rejecting the null hypothesis corresponding to the specified level of significance $\alpha$. In the case of a two-sided test, the region may be divided equally between the two tail regions of the distribution, whereas in a one-sided test, the region will be in the upper tail region or the lower tail region, as illustrated in Fig. 6.6.
The region of rejection is defined by the appropriate critical value(s) of the test statistic corresponding to the specified level of significance $\alpha$, as indicated in Fig. 6.6. Such critical values will depend on the appropriate probability distribution of the test statistic. The complementary region is the region of nonrejection of the null hypothesis (for practical purposes, this is equivalent to the region of acceptance of the null hypothesis).
► EXAMPLE 6.5 (Test of the Mean with Known Variance) Suppose the specification for the yield strength of rebars required a mean value of 38 psi. It is, therefore, essential that the population of rebars to be used in the construction of a reinforced concrete structure has the required mean strength. From the rebars delivered to the construction site by the supplier, the engineer ordered that a sample of 25 rebars be randomly selected and tested for yield strengths. The sample mean from the 25 tests yielded a value of 37.5 psi. It is known that the standard deviation of rebar strength from the supplier is 3.0 psi.
Since the engineer would be concerned only with rebars having mean yield strength lower than 38 psi, a one-sided test is appropriate. Specifically, the appropriate null and alternative hypotheses are as follows:
$H_0$: $\mu = 38$ psi
and
$H_A$: $\mu < 38$ psi
In this case, we may use the sample mean $\bar{X}$ to formulate the test statistic
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
6.3 Testing of Hypotheses - 261
As $\bar{X}$ is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, the distribution of Z is N(0, 1). From the 25 sampled data, the estimated value of Z is
$$z = \frac{37.5 - 38.0}{3.0/\sqrt{25}} = -0.833$$
At a 5% level of significance, $\alpha = 5\%$, the critical value of z is found from Table A.1, yielding $z_\alpha = \Phi^{-1}(0.05) = -\Phi^{-1}(0.95) = -1.645$. Since the estimated statistic, −0.833, is outside the region of rejection (i.e., it is within the region of acceptance), the null hypothesis is accepted. Therefore, the rebars from the supplier satisfy the required yield strength of the specification and are acceptable. ◄
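The decision rule of Example 6.5 fits in a few lines of Python. This is a minimal sketch that uses the standard library's `statistics.NormalDist` for Φ⁻¹ in place of Table A.1.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 38.0, 3.0, 25, 37.5   # Example 6.5 data (psi)
alpha = 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))   # estimated test statistic Z
z_crit = NormalDist().inv_cdf(alpha)        # lower-tail critical value, about -1.645
reject = z < z_crit                         # one-sided test with H_A: mu < 38 psi
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```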
► EXAMPLE 6.6 (Test of the Mean with Unknown Variance) In Example 6.5, the standard deviation, $\sigma$, of the population of rebars is known to be 3.0 psi. In many cases, the population standard deviation may not be known and must also be estimated from the available sampled data. In Example 6.5, suppose the 25 tests yielded the following results: $\bar{x} = 37.5$ psi and $s = 3.5$ psi. The appropriate test statistic is then $T = (\bar{X} - \mu_0)/(S/\sqrt{n})$, which has the t-distribution with (n − 1) d.o.f.; its estimated value is
$$t = \frac{37.5 - 38.0}{3.5/\sqrt{25}} = -0.714$$
With f = 25 − 1 = 24 d.o.f., we obtain the critical value of t from Table A.3 at the 5% significance level to be $t_\alpha = -1.711$. Therefore, the value of the test statistic, −0.714 > −1.711, is outside the region of rejection; hence, the null hypothesis is accepted, and the rebars from the supplier remain acceptable for yield strength. ◄
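The same test with the estimated standard deviation, as a minimal Python sketch. The Python standard library has no t-distribution, so the critical value is taken from Table A.3 as quoted in the text rather than computed.

```python
import math

mu0, n, xbar, s = 38.0, 25, 37.5, 3.5   # Example 6.6 data (psi); s estimated from the sample
t_crit = -1.711                          # t at probability 0.05 with 24 d.o.f. (Table A.3)

t = (xbar - mu0) / (s / math.sqrt(n))    # estimated test statistic T
reject = t < t_crit                      # one-sided test with H_A: mu < 38 psi
print(f"t = {t:.3f}, critical value = {t_crit}, reject H0: {reject}")
```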
► EXAMPLE 6.7 (Test of the Variance) Continuing with the same Example 6.5, let us now test the population variance of $\sigma^2 = 9.0$. For this purpose, let us suppose that the sample size was increased to 41 and the tests yielded the results: $\bar{x} = 37.60$ psi and $s = 3.75$ psi. The null and alternative hypotheses for this case are
$H_0$: $\sigma^2 = 9.0$
$H_A$: $\sigma^2 > 9.0$
For this test, the appropriate test statistic would be
$$C = \frac{(n-1)S^2}{\sigma_0^2}$$
which has the chi-square distribution with (n − 1) d.o.f. For a significance level of $\alpha = 2.5\%$, we obtain from Table A.4 with f = (41 − 1) = 40 d.o.f. the critical value $c_{0.975} = 59.34$. Since $c = 40(3.75)^2/9.0 = 62.50$, which is greater than $c_{0.975}$ and therefore in the region of rejection, the null hypothesis is rejected and the alternative hypothesis must be accepted, which implies that $\sigma > 3.0$ psi. ◄
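The variance test of Example 6.7 as a minimal Python sketch; the critical value c₀.₉₇₅ = 59.34 for 40 d.o.f. is the Table A.4 value quoted above.

```python
n, s, sigma0_sq = 41, 3.75, 9.0   # Example 6.7: sample size, sample s.d. (psi), hypothesized variance
c_crit = 59.34                     # chi-square value at probability 0.975, 40 d.o.f. (Table A.4)

c = (n - 1) * s**2 / sigma0_sq     # test statistic C = (n - 1) S^2 / sigma0^2
reject = c > c_crit                # upper-tail test with H_A: sigma^2 > 9.0
print(f"c = {c:.2f}, critical value = {c_crit}, reject H0: {reject}")
```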
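that $K = \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is a standard normal random variable with the distribution N(0, 1). On this basis, we may form the statement
$$P\left(k_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le k_{1-\alpha/2}\right) = 1 - \alpha$$
where $k_{\alpha/2} = -k_{1-\alpha/2}$ and $k_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)$ are the lower and upper critical values, as shown in Fig. 6.7.
The above probability statement may be rewritten as
$$P\left(\bar{X} + k_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu < \bar{X} + k_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$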
6.4 Confidence Intervals (Interval Estimation) 263
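from which we obtain the (1 − α) confidence interval for the population mean, $\mu$, as
$$\langle\mu\rangle_{1-\alpha} = \left(\bar{x} + k_{\alpha/2}\frac{\sigma}{\sqrt{n}};\ \bar{x} + k_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \qquad (6.19)$$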
► EXAMPLE 6.8 Let us consider the yield strength of rebars that was described in Example 6.5. However, we now wish to determine the 95% confidence interval of the true mean yield strength of rebars. We recall that the essential information is as follows: the standard deviation of the yield strength is known to be 3.0 psi; the number of rebars that were tested is 25; and the estimated sample mean is 37.5 psi.
We determine the 95% confidence interval of the mean yield strength $\mu$ as follows.
First, we determine the lower critical value $k_{\alpha/2} = k_{0.025} = -\Phi^{-1}(0.975) = -1.96$ and the upper critical value $k_{1-\alpha/2} = k_{0.975} = \Phi^{-1}(0.975) = 1.96$. Thus, with Eq. 6.19,
$$\langle\mu\rangle_{0.95} = \left(37.5 - 1.96\frac{3.0}{\sqrt{25}};\ 37.5 + 1.96\frac{3.0}{\sqrt{25}}\right) = (36.32;\ 38.68) \text{ psi}$$
The above interval shows that, with 95% confidence, the true mean $\mu$ lies between 36.32 psi and 38.68 psi. ◄
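A minimal sketch of Eq. 6.19 applied to Example 6.8, using `statistics.NormalDist` for Φ⁻¹ instead of Table A.1 (an implementation choice, not the text's procedure).

```python
import math
from statistics import NormalDist

xbar, sigma, n, conf = 37.5, 3.0, 25, 0.95     # Example 6.8 (psi)
k = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # k_{1-alpha/2} = Phi^-1(0.975), about 1.96
half = k * sigma / math.sqrt(n)
lower, upper = xbar - half, xbar + half        # Eq. 6.19 with k_{alpha/2} = -k_{1-alpha/2}
print(f"95% interval: ({lower:.2f}; {upper:.2f}) psi")
```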
where $t_{\alpha/2,n-1}$ and $t_{1-\alpha/2,n-1}$ are, respectively, the lower and upper critical values at probabilities of α/2 and (1 − α/2) of the t-distribution with (n − 1) d.o.f., as indicated in Fig. 6.8. These critical values are tabulated in Table A.3 for given d.o.f. Observe that because of symmetry of the t-distribution about the origin, $t_{\alpha/2,n-1} = -t_{1-\alpha/2,n-1}$, from which we determine the (1 − α) confidence interval for the population mean as
$$\langle\mu\rangle_{1-\alpha} = \left(\bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}};\ \bar{x} + t_{1-\alpha/2,n-1}\frac{s}{\sqrt{n}}\right) \qquad (6.20)$$
We might point out that for large sample size n, e.g., n > 50, the t-distribution approaches the standard normal distribution, as we saw earlier in Sect. 6.2.2. Therefore, in such cases, the confidence interval obtained with Eq. 6.20 can be expected to be close to that obtained with Eq. 6.19.
► EXAMPLE 6.9 In Example 6.6, the sample of 25 rebars tested for yield strengths yielded the following information: $\bar{x} = 37.5$ psi and $s = 3.5$ psi. In this case, since the population variance is unknown and must also be estimated from the sampled data, the appropriate statistic is $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}}$, which has the t-distribution. For a 95% confidence interval, we obtain the lower and upper critical values from Table A.3 to be $t_{0.025,24} = -2.064$ and $t_{0.975,24} = 2.064$, respectively. Therefore, with Eq. 6.20, the 95% confidence interval for the mean yield strength is
$$\langle\mu\rangle_{0.95} = \left(37.5 - 2.064\frac{3.5}{\sqrt{25}};\ 37.5 + 2.064\frac{3.5}{\sqrt{25}}\right) = (36.06;\ 38.94) \text{ psi}$$
Comparing this result with that of Example 6.8, we see that this 95% confidence interval is wider than that of Example 6.8. However, if the sample size n is increased to 120 from 25 (with the same sample mean and standard deviation), the corresponding 95% confidence interval would become
$$\langle\mu\rangle_{0.95} = \left(37.5 - 1.980\frac{3.5}{\sqrt{120}};\ 37.5 + 1.980\frac{3.5}{\sqrt{120}}\right) = (36.87;\ 38.13) \text{ psi}$$
◄
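Eq. 6.20 as a small Python function. The t critical values are passed in from Table A.3, because the standard library has no t-distribution; the 1.980 used for n = 120 is the value quoted in the text. The function name `t_interval` is illustrative only.

```python
import math

def t_interval(xbar, s, n, t_upper):
    """Two-sided interval of Eq. 6.20; t_upper = t_{1-alpha/2,n-1} and t_{alpha/2,n-1} = -t_upper."""
    half = t_upper * s / math.sqrt(n)
    return xbar - half, xbar + half

ci_25 = t_interval(37.5, 3.5, 25, 2.064)    # t_{0.975,24} from Table A.3
ci_120 = t_interval(37.5, 3.5, 120, 1.980)  # critical value quoted in the text for n = 120
print(ci_25, ci_120)
```

The narrower second interval comes almost entirely from the √n in the denominator; the critical value itself only drops from 2.064 to 1.980.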
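where (1 − α) is the specified confidence level and $k_{1-\alpha}$ is the critical value given by $k_{1-\alpha} = \Phi^{-1}(1-\alpha)$. Rearranging the terms in the above equation, we have
$$P\left(\mu > \bar{X} - k_{1-\alpha}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$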
from which we obtain the (1 − α) lower confidence limit for the population mean $\mu$ as
$$\langle\mu\rangle_{1-\alpha} = \bar{x} - k_{1-\alpha}\frac{\sigma}{\sqrt{n}} \qquad (6.21)$$
However, if the population variance, $\sigma^2$, is not known, and the sample variance $s^2$ must be used in its place, then the appropriate statistic would be $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}}$, which has a t-distribution with (n − 1) d.o.f. Then the (1 − α) lower confidence limit would be
$$\langle\mu\rangle_{1-\alpha} = \bar{x} - t_{1-\alpha,n-1}\frac{s}{\sqrt{n}} \qquad (6.22)$$
in which $t_{1-\alpha,n-1}$ is the critical value at the probability of (1 − α) of the t-distribution with (n − 1) d.o.f.
► EXAMPLE 6.10 Test results for 100 randomly selected specimens of 1-cm diameter A36 steel yielded the following sample mean and sample standard deviation of the yield strength of the material: $\bar{x} = 2200$ kgf (kilogram force) and $s = 220$ kgf. For specification purposes, the manufacturer is required to specify the 95% lower confidence limit of the mean yield strength. As the sample size n = 100 is reasonably large, we may assume that the population standard deviation, $\sigma$, is fairly well represented by the sample standard deviation s = 220 kgf.
Then, with (1 − α) = 0.95, i.e., α = 0.05, we have from Table A.1
$$k_{0.95} = \Phi^{-1}(0.95) = 1.65$$
and, with Eq. 6.21, the 95% lower confidence limit of the mean yield strength is
$$\langle\mu\rangle_{0.95} = 2200 - 1.65\frac{220}{\sqrt{100}} = 2163.7 \text{ kgf}$$
◄
► EXAMPLE 6.11 In Example 6.10, if the same sample mean and sample standard deviation were obtained from a sample of only 15 specimens, i.e., n = 15, the corresponding 95% lower confidence limit may be determined with Eq. 6.22 as follows.
From Table A.3, we obtain the critical value at the probability of (1 − α) = 0.95, with f = 14 d.o.f., $t_{0.95,14} = 1.761$. Thus, the corresponding 95% lower confidence limit of the population mean would be
$$\langle\mu\rangle_{0.95} = 2200 - 1.761\frac{220}{\sqrt{15}} = 2100 \text{ kgf}$$
◄
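Eqs. 6.21 and 6.22 differ only in the critical value used, so one helper covers both Examples 6.10 and 6.11. In this minimal sketch the name `lower_limit` is hypothetical, and the critical values are the table values quoted in the text.

```python
import math

def lower_limit(xbar, spread, n, crit):
    """One-sided (1 - alpha) lower confidence limit: xbar - crit * spread / sqrt(n)."""
    return xbar - crit * spread / math.sqrt(n)

# Example 6.10: n = 100, s = 220 kgf taken as sigma, k_0.95 = 1.65 (Table A.1), Eq. 6.21
ll_100 = lower_limit(2200.0, 220.0, 100, 1.65)
# Example 6.11: n = 15, t_{0.95,14} = 1.761 (Table A.3), Eq. 6.22
ll_15 = lower_limit(2200.0, 220.0, 15, 1.761)
print(f"n = 100: {ll_100:.1f} kgf   n = 15: {ll_15:.1f} kgf")
```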
Conversely, there are situations in which the upper confidence limits of the population mean are of interest. For example, in determining the wind load for the design of a building, the upper limit of the mean wind force would be relevant; similarly, in assessing the flood of a river, the upper limit of the mean maximum flow of the river would be of concern. In such cases, the upper confidence limit on $\mu$ is of interest.
On the same basis as that of Eq. 6.21, we can show that the (1 − α) upper confidence limit (if the population variance $\sigma^2$ is known) is
$$\langle\mu\rangle^{1-\alpha} = \bar{x} + k_{1-\alpha}\frac{\sigma}{\sqrt{n}} \qquad (6.23)$$
and, if $\sigma^2$ is unknown and must be estimated by $s^2$,
$$\langle\mu\rangle^{1-\alpha} = \bar{x} + t_{1-\alpha,n-1}\frac{s}{\sqrt{n}} \qquad (6.24)$$
We may emphasize that the confidence intervals given in Eqs. 6.19 and 6.20 and the one-sided confidence limits of Eqs. 6.21 through 6.24 for the population mean $\mu$ are exact if the underlying population is Gaussian. However, for practical purposes, these results may also be applied to non-Gaussian populations, especially if the sample sizes are reasonably large (e.g., n > 20), by virtue of the central limit theorem; for this reason, irrespective of the distribution of the underlying populations, the preceding equations may be used to determine (approximately, at least) the confidence intervals or one-sided confidence limits of the population mean $\mu$.
► EXAMPLE 6.12 Recorded data (from Linsley and Franzini, 1964) of 25 storms and associated runoffs on the Monocacy River at Jug Bridge, Maryland, are presented in Table E6.12.

Table E6.12
Storm   Precipitation X (in.)   Runoff Y (in.)
 1      1.11                    0.52
 2      1.17                    0.40
 3      1.79                    0.97
 4      5.62                    2.92
 5      1.13                    0.17
 6      1.54                    0.19
 7      3.19                    0.76
 8      1.73                    0.66
 9      2.09                    0.78
10      2.75                    1.24
11      1.20                    0.39
12      1.01                    0.30
13      1.64                    0.70
14      1.57                    0.77
15      1.54                    0.59
16      2.09                    0.95
17      3.54                    1.02
18      1.17                    0.39
19      1.15                    0.23
20      2.57                    0.45
21      3.57                    1.59
22      5.11                    1.74
23      1.52                    0.56
24      2.93                    1.12
25      1.16                    0.64

Denoting
X = precipitation
Y = runoff
we obtain the respective sample means and sample variances as follows:
$$\bar{x} = \frac{53.89}{25} = 2.16 \text{ in.}; \quad s_X^2 = \frac{1}{24}\left[153.39 - 25(2.16)^2\right] = 1.53 \quad\text{or}\quad s_X = 1.24 \text{ in.}$$
and
$$\bar{y} = \frac{20.05}{25} = 0.80 \text{ in.}; \quad s_Y^2 = \frac{1}{24}\left[24.68 - 25(0.80)^2\right] = 0.36 \quad\text{or}\quad s_Y = 0.60 \text{ in.}$$
With the above sample information, we obtain the 99% confidence interval for the mean precipitation $\mu_X$ as follows.
From Table A.3, with f = 24 d.o.f., we obtain the critical values $t_{0.005,24} = -2.797$ and $t_{0.995,24} = 2.797$, giving the confidence interval for the mean precipitation as
$$\langle\mu_X\rangle_{0.99} = \left(2.16 - 2.797\frac{1.24}{\sqrt{25}};\ 2.16 + 2.797\frac{1.24}{\sqrt{25}}\right) = (1.47;\ 2.85) \text{ in.}$$
For the runoff, we would be more interested in the upper confidence limit of the mean runoff $\mu_Y$. Thus, with Eq. 6.24, we obtain the 99% upper confidence limit as follows.
From Table A.3, with f = 24 d.o.f., we obtain the critical value $t_{0.99,24} = 2.492$, and the upper confidence limit for the mean runoff as
$$\langle\mu_Y\rangle^{0.99} = 0.80 + 2.492\frac{0.60}{\sqrt{25}} = 1.10 \text{ in.}$$
◄
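The statistics of Example 6.12 can be recomputed directly from Table E6.12. In this minimal sketch the t critical values are the Table A.3 values quoted above; everything else comes from the tabulated data.

```python
import math
import statistics

# Table E6.12 (inches), storms 1 to 25
precip = [1.11, 1.17, 1.79, 5.62, 1.13, 1.54, 3.19, 1.73, 2.09, 2.75, 1.20, 1.01, 1.64,
          1.57, 1.54, 2.09, 3.54, 1.17, 1.15, 2.57, 3.57, 5.11, 1.52, 2.93, 1.16]
runoff = [0.52, 0.40, 0.97, 2.92, 0.17, 0.19, 0.76, 0.66, 0.78, 1.24, 0.39, 0.30, 0.70,
          0.77, 0.59, 0.95, 1.02, 0.39, 0.23, 0.45, 1.59, 1.74, 0.56, 1.12, 0.64]
n = len(precip)

x_mean, x_sd = statistics.mean(precip), statistics.stdev(precip)   # stdev uses divisor n - 1
y_mean, y_sd = statistics.mean(runoff), statistics.stdev(runoff)

# 99% two-sided interval for mu_X (Eq. 6.20), with t_{0.995,24} = 2.797 from Table A.3
half_x = 2.797 * x_sd / math.sqrt(n)
ci_x = (x_mean - half_x, x_mean + half_x)
# 99% upper confidence limit for mu_Y (Eq. 6.24), with t_{0.99,24} = 2.492 from Table A.3
upper_y = y_mean + 2.492 * y_sd / math.sqrt(n)
print(f"x: {x_mean:.3f}, {x_sd:.3f}   y: {y_mean:.3f}, {y_sd:.3f}")
print(f"99% CI for mu_X: ({ci_x[0]:.2f}; {ci_x[1]:.2f}) in.; 99% upper limit for mu_Y: {upper_y:.2f} in.")
```

Working from the unrounded statistics (x̄ ≈ 2.156 in., s_X ≈ 1.246 in.), the interval comes out as about (1.46; 2.85) in. and the upper limit as 1.10 in. The small difference in the lower bound arises because the text rounds x̄ and s_X to two decimals before substituting.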
in which w is the prescribed length of the half width expressed in terms of the number of
standard deviations of the sample mean.
We might observe from Eq. 6.25 that the required sample size n increases as the prescribed half width w decreases (in proportion to 1/w²); it also increases with the specified level of confidence (1 − α).
► EXAMPLE 6.13 In a traffic survey, the speeds of vehicles are measured by laser guns for the purpose of determining the mean vehicle speed on a particular city street. It is known that the standard deviation of vehicle speeds on city streets with the same posted speed limit is 3.58 kph. If we wish to determine the mean vehicle speed to within ±1 kph (kilometer per hour) with 99% confidence, what should be the sample size of our observations?
If we prescribe w = 1.0 kph, and from Table A.1 we obtain $k_{0.995} = \Phi^{-1}(0.995) = 2.58$, then with Eq. 6.25 the required sample size is
$$n = \left(\frac{2.58 \times 3.58}{1.0}\right)^2 = 85.3, \quad\text{i.e., } n = 86$$
However, if we wish to reduce the half width to w = 0.50 kph, then the required sample size for the same level of confidence would be increased to
$$n = \left(\frac{2.58 \times 3.58}{0.50}\right)^2 = 341.2, \quad\text{i.e., } n = 342$$
◄
In Eq. 6.25, we assumed that the population variance is known. If the population variance is unknown, it must be estimated with $s^2$, and the appropriate statistic would be
$$T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$
which has the t-distribution. In this case, the sample size n necessary for a (1 − α) confidence interval with a prescribed half width of w can be shown to be
$$n = \left(\frac{s}{w}\right)^2\left(t_{1-\alpha/2,n-1}\right)^2 \qquad (6.26)$$
in which $t_{1-\alpha/2,n-1}$ is the value of the variate at the probability of (1 − α/2) in the t-distribution of Table A.3 with (n − 1) d.o.f.
From Eq. 6.26, we see that $t_{1-\alpha/2,n-1}$ is itself a function of n; hence, in this case, the solution for n will require trial and error.
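The two sample-size rules can be sketched in Python under labeled assumptions. First, Eq. 6.25 is taken in the form n = (k₁₋α/₂ σ/w)², which reproduces Example 6.13. Second, since the Python standard library has no t-quantile function, the hypothetical helpers `t_cdf` and `t_quantile` invert a numerically integrated Eq. 6.13 (Simpson's rule plus bisection). The trial-and-error solution of Eq. 6.26 is then a search for the smallest n that satisfies it; the example reuses 3.58 kph as if it were a sample estimate s, purely for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(t, f):
    """Student's t PDF with f degrees of freedom (Eq. 6.13)."""
    const = math.exp(math.lgamma((f + 1) / 2) - math.lgamma(f / 2)) / math.sqrt(f * math.pi)
    return const * (1 + t * t / f) ** (-(f + 1) / 2)

def t_cdf(t, f, steps=1000):
    """P(T <= t) for t >= 0: 0.5 plus a Simpson's-rule integral of the PDF over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, f) + t_pdf(t, f)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, f)
    return 0.5 + total * h / 3

def t_quantile(p, f):
    """Hypothetical helper: t value at probability p (0.5 < p < 1), found by bisection."""
    lo, hi = 0.0, 10.0
    while t_cdf(hi, f) < p:          # widen the bracket for heavy tails (small f)
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, f) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def n_known_sigma(sigma, w, conf):
    """Eq. 6.25 in the assumed form n = (k_{1-alpha/2} * sigma / w)^2, rounded up."""
    k = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((k * sigma / w) ** 2)

def n_unknown_sigma(s, w, conf, n_max=100_000):
    """Eq. 6.26 by trial and error: smallest n with n >= (s * t_{1-alpha/2,n-1} / w)^2."""
    p = 1 - (1 - conf) / 2
    # t exceeds the normal value for every finite d.o.f., so the Eq. 6.25 size is a safe start
    for n in range(max(2, n_known_sigma(s, w, conf)), n_max):
        if n >= (s * t_quantile(p, n - 1) / w) ** 2:
            return n
    raise ValueError("no sample size found below n_max")

print(n_known_sigma(3.58, 1.0, 0.99), n_unknown_sigma(3.58, 1.0, 0.99))
```

With the exact Φ⁻¹(0.995) = 2.5758 rather than the tabulated 2.58, the known-σ sizes come out as 86 and 341 rather than 86 and 342. Treating 3.58 kph as an estimate s instead of a known σ raises the requirement at w = 1.0 kph to 89 observations, because t₀.₉₉₅,ₙ₋₁ exceeds k₀.₉₉₅.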
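The confidence interval for the proportion p (the occurrence probability) may be developed as follows. Referring to Eq. 6.9, we first observe that
$$E(\hat{p}) = p \quad\text{and}\quad \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$$
For large n, $\hat{p}$ is approximately Gaussian; replacing p by $\hat{p}$ in the variance, the (1 − α) confidence interval for p is approximately
$$\langle p\rangle_{1-\alpha} = \left(\hat{p} + k_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}};\ \hat{p} + k_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) \qquad (6.29)$$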
► EXAMPLE 6.14 In order to control the quality of the compaction of the subgrade in a highway pavement project, 50 soil specimens were prepared and tested. Three of the 50 specimens were found to have compaction below the CBR requirement.
On the basis of the test results, we estimate the proportion of the subgrade having satisfactory compaction (i.e., satisfying the CBR requirement) to be
$$\hat{p} = \frac{47}{50} = 0.94$$
That is, 94% of the subgrade for the highway pavement may be expected to be well compacted. Furthermore, according to Eq. 6.29, with $k_{0.025} = -\Phi^{-1}(0.975) = -1.96$ and $k_{0.975} = \Phi^{-1}(0.975) = 1.96$, the 95% confidence interval of the proportion of well-compacted subgrade is
$$\langle p\rangle_{0.95} = \left(0.94 - 1.96\sqrt{\frac{0.94(0.06)}{50}};\ 0.94 + 1.96\sqrt{\frac{0.94(0.06)}{50}}\right) = (0.874;\ 1.006)$$
Because a proportion cannot exceed 1.0, the interval is effectively (0.874; 1.0). ◄
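The interval of Example 6.14 recomputed with Eq. 6.29, as a minimal Python sketch. Eq. 6.29 is a normal approximation, and the sketch caps the upper limit at 1.0 because that approximation can spill past the feasible range when p̂ is close to 1.

```python
import math
from statistics import NormalDist

n, below_spec, conf = 50, 3, 0.95              # Example 6.14
p_hat = (n - below_spec) / n                   # Eq. 6.9: 47/50 = 0.94
k = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96
half = k * math.sqrt(p_hat * (1 - p_hat) / n)  # Eq. 6.29 (normal approximation)
lower, upper = p_hat - half, p_hat + half
upper_capped = min(upper, 1.0)                 # a proportion cannot exceed 1
print(f"({lower:.3f}; {upper:.3f}), reported as ({lower:.3f}; {upper_capped:.1f})")
```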
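$$\langle\sigma^2\rangle_{1-\alpha} = \left(\frac{(n-1)s^2}{c_{1-\alpha/2,n-1}};\ \frac{(n-1)s^2}{c_{\alpha/2,n-1}}\right) \qquad (6.30)$$
where $s^2$ is the sample variance of Eq. 6.3, and $c_{\alpha/2,n-1}$ and $c_{1-\alpha/2,n-1}$ are, respectively, the values of the $\chi^2$ random variable at probabilities of α/2 and (1 − α/2) with (n − 1) d.o.f., which are tabulated in Table A.4.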
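Equation 6.30 is a two-sided confidence interval. Corresponding one-sided confidence limits may also be developed, which are as follows. For the lower (1 − α) confidence limit,
$$\langle\sigma^2\rangle_{1-\alpha} = \frac{(n-1)s^2}{c_{1-\alpha,n-1}} \qquad (6.31)$$
and for the upper (1 − α) confidence limit,
$$\langle\sigma^2\rangle^{1-\alpha} = \frac{(n-1)s^2}{c_{\alpha,n-1}} \qquad (6.32)$$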
For the population variance, we may be more concerned with either the lower bound or the
upper bound value, depending on the situation at hand. Accordingly, one or the other of
Eqs. 6.31 and 6.32 may be more useful than Eq. 6.30.
► EXAMPLE 6.15 In Example 6.12, we estimated the sample variance of the runoff on the Monocacy River, from 25 storms, to be 0.36 in.². With Eq. 6.32 (and $c_{0.05,24} = 13.848$ from Table A.4), the upper 95% confidence limit of the population variance for the runoff of the river is
$$\langle\sigma_Y^2\rangle^{0.95} = \frac{(25-1)(0.36)}{13.848} = 0.624 \text{ in.}^2$$
Therefore, the upper 95% confidence limit of the standard deviation would be $\sqrt{0.624} = 0.790$ in. ◄
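Eq. 6.32 for Example 6.15 as a minimal Python sketch, with c₀.₀₅,₂₄ = 13.848 taken from Table A.4 as in the text.

```python
import math

n, s2 = 25, 0.36      # Example 6.15: number of storms, sample variance of runoff (in.^2)
c_05_24 = 13.848      # chi-square value at probability 0.05 with 24 d.o.f. (Table A.4)

var_upper = (n - 1) * s2 / c_05_24    # Eq. 6.32: upper 95% limit of the variance
sd_upper = math.sqrt(var_upper)       # corresponding limit on the standard deviation (in.)
print(f"{var_upper:.3f} in.^2, {sd_upper:.3f} in.")
```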
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n}D_i \qquad (6.34)$$
with expected value $E(\bar{D}) = \delta$, indicating that Eq. 6.34 is an unbiased estimator of the true distance $\delta$, whereas the variance of $\bar{D}$ is estimated as $\mathrm{Var}(\bar{D}) = s^2/n$, in which
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2$$
which is the sample variance of the set of sample measurements.
In measurement theory, the standard deviation of $\bar{D}$, i.e., $\sigma_{\bar{D}} = s/\sqrt{n}$, is known as the standard error.
*Observed measurements will generally contain systematic and random errors. It is assumed here that all
measurements have been adjusted for systematic errors.
6.5 Measurement Theory 271
The independent random variables $D_1, D_2, \ldots, D_n$ are assumed to be Gaussian; this is supported by observations of measured distances, such as, for example, Fig. 6.9. It follows that the random variable $\dfrac{\bar{D}-\delta}{S/\sqrt{n}}$ has a t-distribution with (n − 1) d.o.f.; therefore, the (1 − α) confidence interval for $\delta$ is
$$\langle\delta\rangle_{1-\alpha} = \left(\bar{d} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}};\ \bar{d} + t_{1-\alpha/2,n-1}\frac{s}{\sqrt{n}}\right) \qquad (6.35)$$
in which $t_{1-\alpha/2,n-1}$ is the value of the random variable T with the PDF of Eq. 6.13 at the probability of (1 − α/2) with (n − 1) d.o.f., which can be found in Table A.3; because of symmetry of the t-distribution, $t_{\alpha/2} = -t_{1-\alpha/2}$.
► EXAMPLE 6.16 The straight-line distance between two geodetic stations A and B is measured with an electronic
ranging instrument called a tellerometer. Ten repeated and independent measurements were taken of
the distance as follows:
d1 = 45,479.4 m    d6 = 45,479.2 m
d2 = 45,479.6 m    d7 = 45,479.6 m
d3 = 45,479.3 m    d8 = 45,479.5 m
d4 = 45,479.5 m    d9 = 45,479.3 m
d5 = 45,479.8 m    d10 = 45,479.1 m
The estimated distance is, therefore,
$$\bar{d} = \frac{1}{10}\sum_{i=1}^{10} d_i = \frac{1}{10}(45{,}479.4 + \cdots + 45{,}479.1) = 45{,}479.43 \text{ m}$$
and the variance of the measured distances is
$$s^2 = \frac{1}{9}\sum_{i=1}^{10}\left(d_i - \bar{d}\right)^2 = 0.0445 \text{ m}^2; \quad \text{or } s = 0.21 \text{ m}$$
Therefore, the standard error of the estimated distance is
$$\sigma_{\bar{D}} = \frac{s}{\sqrt{n}} = \frac{0.21}{\sqrt{10}} = 0.0664 \text{ m}$$
We also determine the 90% confidence interval of the true distance δ with Eq. 6.35 as follows: with f = n − 1 = 9 d.o.f., we obtain from Table A.3 $t_{0.95,9} = 1.833$; then,
$$\langle\delta\rangle_{0.90} = \left(45{,}479.43 - 1.833\frac{0.21}{\sqrt{10}};\ 45{,}479.43 + 1.833\frac{0.21}{\sqrt{10}}\right) = (45{,}479.31;\ 45{,}479.55) \text{ m}$$
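The arithmetic of Example 6.16 can be checked with Python's standard `statistics` module. This is a minimal sketch in which the value t = 1.833 is taken from Table A.3 rather than computed. Because it does not round s to 0.21 m before dividing by √10, its standard error (about 0.0668 m) is slightly larger than the 0.0664 m above, while the interval still rounds to (45,479.31; 45,479.55) m.

```python
import math
import statistics

# Ten independent tellerometer readings of distance AB (m), Example 6.16
d = [45479.4, 45479.6, 45479.3, 45479.5, 45479.8,
     45479.2, 45479.6, 45479.5, 45479.3, 45479.1]

n = len(d)
d_bar = statistics.mean(d)        # estimated distance (mean of the readings)
s2 = statistics.variance(d)       # sample variance with divisor n - 1
std_error = math.sqrt(s2 / n)     # standard error s / sqrt(n)

t_095_9 = 1.833                   # t_{0.95, 9} from Table A.3
half_width = t_095_9 * std_error
ci_90 = (d_bar - half_width, d_bar + half_width)  # Eq. 6.35

print(f"d_bar = {d_bar:.2f} m, s^2 = {s2:.4f} m^2, std. error = {std_error:.4f} m")
print(f"90% interval: ({ci_90[0]:.2f}; {ci_90[1]:.2f}) m")
```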
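For a function of one or more distances (or other geometrical dimensions), the function may be evaluated on the basis of the mean measurements; i.e., if ζ is a function of k distances $\delta_1, \delta_2, \ldots, \delta_k$,
$$\zeta = g(\delta_1, \delta_2, \ldots, \delta_k) \quad (6.36)$$
where the true distances $\delta_1, \delta_2, \ldots, \delta_k$ must necessarily be estimated by the respective mean measurements $\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_k$. The estimator of ζ may then be obtained through first-order approximation (Eq. 4.50) as
$$\hat{\zeta} = g(\bar{D}_1, \bar{D}_2, \ldots, \bar{D}_k) \quad (6.37)$$
and its mean value is
$$E(\hat{\zeta}) \simeq g(\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_k) = \bar{\zeta} \quad (6.38)$$
and, based on Eq. 4.51a, the variance is
$$\sigma_{\hat{\zeta}}^2 \simeq \sum_{i=1}^{k}\left(\frac{\partial g}{\partial \bar{D}_i}\right)^2 \sigma_{\bar{D}_i}^2 \quad (6.39)$$
in which the partial derivatives are evaluated at the mean measurements and $\sigma_{\bar{D}_i}^2 = s_i^2/n_i$. Assuming $\hat{\zeta}$ to be Gaussian with mean $\bar{\zeta}$ according to Eq. 6.38, and standard error $\sigma_{\hat{\zeta}}$ as given by Eq. 6.39, we obtain the (1 − α) confidence interval for ζ as
$$\langle\zeta\rangle_{1-\alpha} = \left(\bar{\zeta} + k_{\alpha/2}\,\sigma_{\hat{\zeta}};\ \bar{\zeta} + k_{1-\alpha/2}\,\sigma_{\hat{\zeta}}\right) \quad (6.40)$$
in which $k_{1-\alpha/2}$ is the value of the standard normal variate at the cumulative probability (1 − α/2), and $k_{\alpha/2} = -k_{1-\alpha/2}$.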
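► EXAMPLE 6.17 Consider a rectangular tract of land as shown in Fig. E6.17. Each of the dimensions of the rectangle is measured several times as indicated below.
Dimension    No. of Measurements    Mean Value    Sample Variance
D            9                      60 m          0.81 m²
B            4                      70 m          0.64 m²
C            4                      30 m          0.32 m²
We determine the 95% confidence interval of the area of the tract of land as follows. The area of the rectangle is A = (B + C)D; hence, by Eq. 6.39, the variance of its estimator is
$$\sigma_{\hat{A}}^2 \simeq \bar{d}^2\,\sigma_{\bar{B}}^2 + \bar{d}^2\,\sigma_{\bar{C}}^2 + (\bar{b} + \bar{c})^2\,\sigma_{\bar{D}}^2 = (60)^2\left(\frac{0.64}{4}\right) + (60)^2\left(\frac{0.32}{4}\right) + (100)^2\left(\frac{0.81}{9}\right)$$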
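Eqs. 6.38 through 6.40 can be applied mechanically to any smooth function g. The sketch below assumes independent measurements (as Eq. 6.39 does) and, as an illustrative choice, approximates the partial derivatives by central differences instead of differentiating by hand; the function names are hypothetical. With the data of Example 6.17 it gives an estimated area of 6000 m², a standard error of 42 m², and a 95% interval of roughly (5918; 6082) m².

```python
import math
from statistics import NormalDist


def first_order_ci(g, means, var_of_means, conf=0.95, h=1e-6):
    """First-order estimate of zeta = g(...) with a two-sided normal interval.

    means        -- mean measurements d_1, ..., d_k (Eq. 6.38 evaluates g here)
    var_of_means -- variances of the mean estimators, s_i^2 / n_i
    Partial derivatives in Eq. 6.39 are approximated by central differences.
    """
    zeta_bar = g(*means)
    var = 0.0
    for i, v in enumerate(var_of_means):
        up = list(means)
        dn = list(means)
        up[i] += h
        dn[i] -= h
        dg = (g(*up) - g(*dn)) / (2.0 * h)   # dg/dD_i at the means
        var += dg ** 2 * v                    # one term of Eq. 6.39
    se = math.sqrt(var)
    k = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # k_{1-alpha/2}
    return zeta_bar, se, (zeta_bar - k * se, zeta_bar + k * se)  # Eq. 6.40


def area(b, c, d):
    """Area of the tract in Example 6.17, A = (B + C) D."""
    return (b + c) * d


# Means of (B, C, D) and variances of the means s^2/n from the table above
a_bar, se, ci = first_order_ci(area, [70.0, 30.0, 60.0],
                               [0.64 / 4, 0.32 / 4, 0.81 / 9])
print(f"A = {a_bar:.0f} m^2, std. error = {se:.1f} m^2, "
      f"95% interval = ({ci[0]:.0f}; {ci[1]:.0f}) m^2")
```

Because A is linear in each dimension separately, the central differences here are exact up to rounding; for strongly nonlinear g, analytic derivatives or a smaller step may be preferable.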
► PROBLEMS
6.1 The foundation for a building is designed to rest on 100 piles based on the individual pile capacity of 80 tons. Nine test piles were driven at random locations into the supporting soil stratum and loaded until failure of each pile occurred. The results are as follows:
Test Pile    Pile Capacity (tons)
1            82
2            75
3            95
4            90
5            88
6            92
7            78
8            85
9            80
(a) Estimate the mean and standard deviation of the individual pile capacity to be used at the site.
(b) At the 5% significance level, should the piles be accepted based on the results of the nine test piles? That is, perform a one-sided hypothesis test, with the null hypothesis that the mean pile capacity is 80 tons.
(c) Establish the 98% confidence interval for the mean pile capacity, assuming that the standard deviation of the population is known, σ = s.
(d) Determine the 98% confidence interval for the mean pile capacity on the basis of unknown variance.
6.2 The average speed of vehicles on a highway is being studied.
(a) Suppose observations on 50 vehicles yielded a sample mean of 65 mph. Assume that the standard deviation of vehicle speed is known to be 6 mph. Determine the two-sided 99% confidence interval of the mean speed.
(b) In part (a), how many additional vehicles' speeds should be observed such that the mean speed can be estimated to within ±1 mph with 99% confidence?
(c) Suppose John and Mary are assigned to collect data on the speed of vehicles on this highway. After each person has separately observed 10 vehicles, what is the probability that John's sample mean will exceed Mary's sample mean by 2 mph?
(d) Repeat part (c) if each person has separately observed 100 vehicles instead.
6.3 Suppose the annual maximum stream flow of a given river has been observed for 10 years, yielding the following statistics:
sample mean = x̄ = 10,000 cfs
sample variance = s² = 9 × 10⁶ (cfs)²
(a) Establish the two-sided 90% confidence interval on the mean annual maximum stream flow. Assume a normal population. Ans. (8261, 11739)
(b) If it is desired to estimate the mean annual maximum stream flow to within ±1,000 cfs with 90% confidence, how many additional years of observation will be required? Assume the sample (not the true value) variance based on the new set of data will be approximately 9 × 10⁶ (cfs)². (Ans. 17)
6.4 Five piles have been load tested until failure; the load measured at failure denotes the actual capacity of the given pile. The following table summarizes the data from the load tests:
Pile Test No.    Actual Capacity A    Predicted Capacity P    N = A/P
1                20.5                 13.6                    to be determined
2                18.5                 20.4
3                10.0                 8.8
4                15.3                 14.3
5                26.2                 22.8
Observe that the capacity of each pile has also been predicted by a theoretical model as indicated in the table above. The factor N is simply the ratio of the actual pile capacity to the predicted pile capacity; i.e., N = A/P.
(a) Complete the table by calculating the respective value of N for each test pile.
(b) Determine the sample mean and variance of N. (Ans. 1.154, 0.048)
(c) Determine the 95% confidence interval of the mean value of N. Ans. (0.881, 1.427)
(d) In order to estimate the mean value of N to ±0.02 with 90% confidence, how many additional piles should be tested? Assume that the variance of N is known and equal to 0.045 for this part. (Ans. 300)
(e) Assume N is a normal random variable whose mean value and variance are given exactly by the corresponding sample values from part (b). Consider a new site where a pile has been designed and its capacity is predicted by the model to be 15 tons. What is the probability that the pile will fail under a load of 12 tons? (Ans. 0.0537)
6.5 Concrete placed on a structure was subsequently cored after 28 days, and the following results were obtained of the compressive strengths from five test specimens:
4142, 3405, 3402, 4039, 3372 psi
(a) Determine the 90% two-sided confidence interval of the mean concrete strength.
(b) Suppose the confidence interval established in part (a) is too wide, and the engineer would like to have a confidence interval to be ±300 psi of the computed sample mean concrete strength. Generally, more specimens of concrete would be needed to keep the same confidence level. However, without additional samples,
Problems. < 275
what is the confidence level associated with the specified interval based on the five measurements given above?
(c) If the required minimum compressive strength is 3500 psi, perform a one-sided hypothesis test at the 2% significance level.
6.6 At a weigh station, the weights of trailer trucks were observed before crossing a highway bridge.
(a) Suppose observations on 30 trucks yielded a sample mean of 12.5 tons. Assume that the standard deviation of truck weights is known to be 3 tons. Determine the two-sided 99% confidence interval of the mean weight of trailer trucks on the particular highway.
(b) In part (a), how many additional trucks should be observed such that the mean truck weight can be estimated to within ±1.0 ton with 99% confidence?
6.7 The distribution of ocean wave height, H, may be modeled with the Rayleigh PDF as
$$f_H(h) = \frac{h}{\alpha^2}\exp\left[-\frac{1}{2}\left(\frac{h}{\alpha}\right)^2\right], \quad h \ge 0$$
in which α is the parameter of the distribution. Suppose that the following measurements of the wave height were observed:
1.50, 2.80, 2.50, 3.20, 1.90, 4.10, 3.60, 2.60, 2.90, 2.30 m
Estimate the parameter α by the method of maximum likelihood (MLE).
6.8 The daily dissolved oxygen concentration (DO) for a location A downstream from an industrial plant has been recorded for the past 10 consecutive days as tabulated below:
Day    DO (mg/l)
1      1.8
2      2.0
3      2.1
4      1.7
5      1.2
6      2.3
7      2.5
8      2.9
9      1.9
10     2.2
(a) Suppose the minimum concentration of DO required by the Environmental Protection Agency is 2.0 mg/l. Perform a hypothesis test to determine whether the stream quality satisfies the EPA standard at the significance level of 5%.
(c) Determine the 95% confidence interval for the true mean DO concentration.
6.9 The height of a radio tower may be determined by measuring the horizontal distance L from the center of the tower base to the instrument, and the vertical angle β, as shown in the figure below.
Determining height of tower.
(a) The distance L is measured three times with the following readings:
124.30, 124.20, and 124.40 ft
Determine the estimated distance and its standard error. (Ans. 124.30 ft; 0.0577 ft)
(b) The angle β is measured five times, with the following readings:
40°24.6′, 40°25.0′, 40°25.5′, 40°24.7′, 40°25.2′
Determine the estimated angle and its standard error. (Ans. 40°25′; 0.164′)
(c) Estimate the height of the tower, H. Assume that the instrument is 3 ft high with a standard deviation of 0.01 ft.
(d) Evaluate the standard error of the estimated height of the tower, $\sigma_{\hat{H}}$.
(e) Determine the 98% confidence interval of the actual height of the tower.
6.10 Each of the inner and outer radii of a circular ring, as shown in the following figure, was measured five times with the following readings:
(a) Determine the best estimates of the outer and inner radii, and the corresponding standard errors.
(b) The hatched area between the two concentric circles can be estimated based on the mean values of the estimated outer and inner radii; namely, the area is
$$A = \pi\left(r_1^2 - r_2^2\right)$$
(a) Determine the estimated (mean) distances $b_1$ and $b_2$, and the angle θ.
(b) Evaluate the respective standard errors.
(c) Estimate the area, A, of the triangular lot, and evaluate the associated standard error by first-order approximation.
(d) Determine the 90% confidence interval of the area.
6.12 The lengths a and b of the triangular lot, shown below, were measured independently as follows:
a, m     b, m
80.5     121.4
79.9     120.6
80.2     120.3
80.1     120.1
79.6     120.2
80.8
Car No.    Observed Mileage
1          35 mpg
2          40
3          37
4          42
5          32
6          43
7          38
8          32
9          41
10         34
(a) Estimate the sample mean and sample standard deviation of the actual mileage of this particular make of car.
(b) Suppose that the stated mileage of this particular model of cars is 35 mpg; perform a hypothesis test to verify the stated mileage with a significance level of 2%.
(c) Determine the corresponding 95% confidence interval of the actual mileage.
CHAPTER 7
Determination of Probability Distribution Models
7.1 INTRODUCTION
The probability distribution model appropriate to describe a random phenomenon is generally not known; i.e., the functional form of the probability distribution is not defined. Under certain circumstances, the basis or properties of the physical process underlying the random phenomenon may suggest the form of the required distribution. For example, if a process is composed of the sum of many individual effects, the Gaussian distribution may be appropriate on the basis of the central limit theorem, whereas if the extremal conditions of a physical process are of interest, one of the asymptotic extreme-value distributions may be a suitable model (discussed in Sect. 7.4).
In many cases, the required probability distribution may need to be determined empirically based on the available observational data. For example, if the frequency diagram for a set of data can be constructed, the required distribution model may be inferred by visually comparing a particular PDF with the corresponding frequency diagram (see Chapter 1 for examples). Alternatively, the available data may be plotted on probability papers prepared for specific distributions (as described below in Sect. 7.2); if the data points plot approximately with a linear trend on one of these papers, the distribution associated with this paper may be an appropriate distribution model.
An assumed or prescribed prior probability distribution, perhaps determined empirically as described above or developed on theoretical grounds, may be verified, or disproved, in the light of available data using certain statistical tests, known as goodness-of-fit tests for distribution. Moreover, when two or more distributions appear to be plausible probability distribution models, such tests can be used to discriminate the relative validity of the different distributions. Two such goodness-of-fit tests are commonly used for these purposes: the chi-square (χ²) test and the Kolmogorov-Smirnov (K-S) test. A third test, the Anderson-Darling test, is particularly useful when the tails of a distribution are important.
In practice, the choice of the appropriate distribution model may be dictated by mathematical tractability or convenience. For example, because of the mathematical simplifications possible with the normal distribution and the wide availability of probability information (such as probability tables) for this distribution, the normal or lognormal distribution is frequently used to model nondeterministic problems, at times even when there is no clear-cut basis for such a model. Probabilistic information derived on the basis of such prescribed distribution models could be useful, particularly when the information is needed only for relative purposes. However, when the form of a distribution is important,
7.2 Probability Papers 279
particularly when ample data are available, the methods described in this chapter should
provide the tools needed for its determination.
For a set of N observations $x_1, x_2, \ldots, x_N$, arranged in increasing order, the mth value is plotted at the cumulative probability of $m/(N+1)$.
This plotting position applies to all probability papers; its theoretical basis is discussed in Gumbel (1954). There are also other plotting positions, such as $(m - \tfrac{1}{2})/N$, which was advocated by Hazen (1930) and has been used widely; however, this plotting position has certain theoretical weaknesses. In particular, when there are N observations, the plotting position $(m - \tfrac{1}{2})/N$ would yield a return period of 2N for the largest observation instead of the correct value of N (Gumbel, 1954). Still other plotting positions have been suggested (e.g., Kimball, 1946); however, none seems to have the theoretical attributes and computational simplicity of $m/(N+1)$.
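To make the return-period comparison concrete, the short Python sketch below (function names are illustrative) computes both sets of plotting positions and the return period 1/(1 − p) implied for the largest of N observations. With $m/(N+1)$ the largest value is assigned a return period of N + 1, which is close to N for large samples, whereas Hazen's $(m - \tfrac{1}{2})/N$ assigns 2N.

```python
def plotting_positions(n):
    """Cumulative probabilities m/(N + 1), m = 1, ..., N, for sorted data."""
    return [m / (n + 1) for m in range(1, n + 1)]


def hazen_positions(n):
    """Hazen's alternative plotting positions (m - 1/2)/N."""
    return [(m - 0.5) / n for m in range(1, n + 1)]


def return_period(p):
    """Mean recurrence interval implied by a non-exceedance probability p."""
    return 1.0 / (1.0 - p)


N = 20
t_std = return_period(plotting_positions(N)[-1])   # equals N + 1
t_hazen = return_period(hazen_positions(N)[-1])    # equals 2N
print(f"largest of {N}: m/(N+1) gives T = {t_std:.1f}; Hazen gives T = {t_hazen:.1f}")
```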
The utility of a probability paper is to provide a graphical portrayal of the frequency curve produced by a set of observational data. The linearity (or linear trend), or lack of a linear trend, of the set of sample data points plotted on a particular probability paper may be used as a basis for inferring or determining whether the distribution of the underlying population is the same as that of the probability paper. On this basis, therefore, probability papers may be used to establish or explore the possible distribution(s) of the underlying population. In the following sections, we shall describe the construction of several probability papers. In particular, in Sects. 7.2.2 and 7.2.3 we illustrate the construction and application of two commonly used probability papers, namely, the normal and the lognormal probability papers. In Sect. 7.2.4 we describe the construction of probability papers for other distributions.
280 Chapter 7. Determination of Probability Distribution Models
The normal (or Gaussian) probability paper is constructed on the basis of the standard
normal CDF as follows:
• One axis, in arithmetic scale, represents the values of the random variable X, as
illustrated in Fig. 7.1.
• On the other axis, perpendicular to the first, are two parallel scales: one, in arithmetic scale, represents values of the standard normal variate, s, whereas the other shows the cumulative probabilities, Φ(s), corresponding to the indicated values of s, as also shown in Fig. 7.1.
A normal variate X with probability distribution N(μ, σ) is represented on this paper by a straight line passing through the point at X = μ and Φ(s) = 0.50, with a slope of $(x_p - \mu)/s$, which is equal to the standard deviation σ, where $x_p$ is the value of the variate at probability p and s is the corresponding standard normal variate. In particular, at p = 0.84, s ≈ 1; hence, the slope is $(x_{0.84} - \mu)$.
Any set of data may be plotted on the normal probability paper; however, if the resulting graph of the data points, plotted with the plotting positions described earlier in Sect. 7.2.1, shows a lack of linearity (or linear trend), this would indicate that the underlying distribution of the variate is not Gaussian. Conversely, if the data points plotted on this normal probability paper show a linear trend, the straight line drawn through these data points represents a specific normal distribution for the data set, at least within the range of the observations.
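If a reproducible alternative to drawing the line by eye is wanted, one option is an ordinary least-squares fit of the sorted data against the standard normal variates s at the plotting positions $m/(N+1)$: the value of the line at s = 0 estimates μ and its slope estimates σ. The sketch below does this with the standard library only. Least squares is a substitute chosen here, not the graphical procedure of the text, and the demonstration values are synthetic (exact quantiles of N(77, 4.6)), not the data of Table E7.1.

```python
from statistics import NormalDist


def fit_line(s, x):
    """Ordinary least-squares fit x = intercept + slope * s."""
    n = len(s)
    s_bar = sum(s) / n
    x_bar = sum(x) / n
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxy = sum((si - s_bar) * (xi - x_bar) for si, xi in zip(s, x))
    slope = sxy / sxx
    return x_bar - slope * s_bar, slope


def normal_paper_fit(data):
    """Estimate (mu, sigma) from a straight line on normal probability paper.

    The m-th smallest value is paired with s_m = Phi^{-1}(m/(N + 1)); the
    line's value at s = 0 estimates mu and its slope dx/ds estimates sigma.
    """
    x = sorted(data)
    n = len(x)
    std = NormalDist()
    s = [std.inv_cdf(m / (n + 1)) for m in range(1, n + 1)]
    return fit_line(s, x)


# Synthetic check: 26 values placed exactly at N(77, 4.6) quantiles
demo = [NormalDist(77, 4.6).inv_cdf(m / 27) for m in range(1, 27)]
mu, sigma = normal_paper_fit(demo)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
```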
► EXAMPLE 7.1 Data for the fracture toughness of steel plates are given in Table E7.1; the measured toughness has
been rearranged in increasing order. The fracture toughness and corresponding cumulative probability of the data are plotted on the normal probability paper as shown in Fig. E7.1.
Values of the fracture toughness $K_{Ic}$ are plotted against the corresponding cumulative probabilities $m/(N+1)$, with N = 26. The straight line drawn through the data points (by eye), as shown in Fig. E7.1, represents the normal distribution of the observed fracture toughness data, from which we obtain the mean value $\mu_{K_{Ic}} = 77$ ksi√in. From the same straight line, we also observe that the value of $K_{Ic}$ at the probability of 84% is 81.6 ksi√in; thus, the standard deviation is $\sigma_{K_{Ic}} = 81.6 - 77 = 4.6$ ksi√in.
TABLE E7.1 Fracture Toughness of Steel Base Plates (after Kies et al., 1965)
m    K    m/(N + 1)    m    K    m/(N + 1)
Observational data from a lognormal population should plot with a linear trend on
the lognormal probability paper, so that a straight line can be drawn through the data
points. From this straight line, the median $x_m$ is simply the value of the variate on this line corresponding to the cumulative probability of 0.50, whereas the parameter ζ (or, approximately, the c.o.v.) is given by the slope of the line, i.e.,
$$\zeta = \frac{\ln(x/x_m)}{s}$$
in which (x, s) is any point on the line, s being the standard normal variate. Conversely, if the data points of the mth value among N observations, plotted at a cumulative probability of $m/(N+1)$, do not yield a linear trend on this probability paper, then the underlying population may not be lognormal.
► EXAMPLE 7.2 Data for the fracture toughness of MIG welds are shown in Table E7.2. Values of the toughness,
arranged in increasing order, are shown in columns 2 and 5 in Table E7.2, and the corresponding plotting positions $m/(N+1)$ are shown in columns 3 and 6.
TABLE E7.2 Fracture Toughness of MIG Welds (data from Kies et al., 1965)
m    K    m/(N + 1)    m    K    m/(N + 1)
1 54.4 0.05 2 62.6 0.10
3 63.2 0.15 4 67.0 0.20
5 70.2 0.25 6 70.5 0.30
7 70.6 0.35 8 71.4 0.40
9 71.8 0.45 10 74.1 0.50
11 74.1 0.55 12 74.3 0.60
13 78.8 0.65 14 81.8 0.70
15 83.0 0.75 16 84.4 0.80
17 85.3 0.85 18 86.9 0.90
19 87.3 0.95
We can observe from Fig. E7.2 that the data points plotted on this probability paper yield a
linear trend. Accordingly, the straight line drawn through the data points represents the lognormal
distribution for the MIG welds with a median value of 74 ksi√in and a c.o.v. of 12%, as indicated in Fig. E7.2. ◄
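As with the normal paper, the by-eye line can be replaced by a least-squares fit, now of ln x against s. The following minimal sketch uses the MIG weld data of Table E7.2. Because least squares is a substitute for the graphical fit, its estimates (a median of about 74 and ζ of about 0.14) are close to, but not identical with, the values read from Fig. E7.2.

```python
import math
from statistics import NormalDist

# Fracture toughness of MIG welds, Table E7.2, in increasing order
mig = [54.4, 62.6, 63.2, 67.0, 70.2, 70.5, 70.6, 71.4, 71.8, 74.1,
       74.1, 74.3, 78.8, 81.8, 83.0, 84.4, 85.3, 86.9, 87.3]


def lognormal_paper_fit(data):
    """Least-squares line ln x = ln x_m + zeta * s on lognormal paper.

    Returns (median x_m, zeta); s is the standard normal variate at the
    plotting position m/(N + 1).
    """
    y = [math.log(v) for v in sorted(data)]
    n = len(y)
    s = [NormalDist().inv_cdf(m / (n + 1)) for m in range(1, n + 1)]
    s_bar = sum(s) / n
    y_bar = sum(y) / n
    zeta = (sum((si - s_bar) * (yi - y_bar) for si, yi in zip(s, y))
            / sum((si - s_bar) ** 2 for si in s))
    ln_median = y_bar - zeta * s_bar   # value of ln x where s = 0
    return math.exp(ln_median), zeta


median, zeta = lognormal_paper_fit(mig)
print(f"median = {median:.1f}, zeta = {zeta:.3f}")
```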
► EXAMPLE 7.3 Measured data of the precipitation and runoff on the Monocacy River that were described earlier in Example 6.12 are rearranged in increasing order as shown in Table E7.3. The corresponding data points are plotted on the respective lognormal probability papers as shown in Fig. E7.3a for precipitation, X, and in Fig. E7.3b for the runoff, Y.
On the basis of these two graphs, we may conclude from Fig. E7.3a that the distribution of the precipitation is not lognormal, whereas, from Fig. E7.3b, the runoff could reasonably be modeled with the lognormal distribution with parameters $\lambda_Y = \ln 0.66 = -0.42$ and $\zeta_Y = 0.79$.
A probability paper may be constructed for any probability distribution. For a given distribution, the corresponding probability paper should be such that values of a variate and the respective cumulative probabilities will yield a straight line on the appropriate paper. Conversely, any straight line on a specific probability paper represents a particular distribution with given values of the parameters. For this purpose, a probability paper should be constructed such that it is independent of the values of the parameters of the distribution. This can be accomplished by defining a standard variate, if one exists, appropriate for the distribution.
In the last two sections above, we illustrated the construction and application of the
normal and lognormal probability papers; in the following examples, we illustrate the
construction and application of other probability papers.
Construction of the Exponential Probability Paper—Consider now the construction
of the probability paper for the shifted exponential probability distribution. The PDF of this
distribution is
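$$f_X(x) = \lambda e^{-\lambda(x-a)}, \quad x \ge a; \qquad f_X(x) = 0, \quad x < a$$
where λ is the parameter, and a is the minimum value of X. In this case, the standard variate is S = λ(X − a), and its PDF is then, according to Eq. 4.6,
$$f_S(s) = e^{-s}, \quad s \ge 0; \qquad f_S(s) = 0, \quad s < 0$$
with the corresponding CDF $F_S(s) = 1 - e^{-s}$ for s ≥ 0.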
TABLE 7.1 Grid Line Positions for Shifted Exponential Probability Paper
On the basis of the above, we construct the exponential probability paper as follows:
• On one (e.g., the horizontal) axis, scale the values of the standard variate s (in arithmetic scale); on the same or a parallel axis, mark the corresponding cumulative probabilities according to the above $F_S(s)$.
• On the other (perpendicular) axis, mark the values of the original variate X (in arithmetic scale).
For illustration, specific values of s and the corresponding $F_S(s)$ are calculated as summarized in Table 7.1. Grid lines are then drawn for given $F_S(s)$ at the indicated values of s shown in Table 7.1. The resulting paper is shown in Fig. 7.2.
A straight line, with positive slope, on this paper represents a particular shifted exponential distribution, in which the intercept on the x-axis (where s = 0) is the value of a, and its slope is 1/λ.
► EXAMPLE 7.4 Sample values from an exponential population should plot with a linear trend on the exponential
probability paper. To illustrate this, consider the hypothetical set of data in Table E7.4 for a random
variable X. The mth values of X and the corresponding plotting positions $m/(N+1)$ are shown plotted in Fig. E7.4. From the straight line drawn visually through the data points in Fig. E7.4, we obtain estimates of the minimum value a = 150 and the slope 1/λ = 2000/2.69 = 743; thus, the parameter λ = 0.001346.
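Because $F_S(s) = 1 - e^{-s}$, the grid line for a cumulative probability F sits at $s = -\ln(1 - F)$. The sketch below (hypothetical function names) computes such grid positions and, as an optional alternative to fitting by eye, a least-squares line whose value at s = 0 gives a and whose slope gives 1/λ. The check uses synthetic points placed exactly on the line a = 150, λ = 0.001346 of Example 7.4, not the values of Table E7.4.

```python
import math


def exp_grid(F):
    """Standard variate s = -ln(1 - F) at which the grid line for F is drawn."""
    return -math.log(1.0 - F)


def exp_paper_fit(data):
    """Least-squares line x = a + s / lam on exponential probability paper.

    Returns (a, lam): a is the value of x at s = 0, and lam is the
    reciprocal of the slope dx/ds.
    """
    x = sorted(data)
    n = len(x)
    s = [exp_grid(m / (n + 1)) for m in range(1, n + 1)]
    s_bar = sum(s) / n
    x_bar = sum(x) / n
    slope = (sum((si - s_bar) * (xi - x_bar) for si, xi in zip(s, x))
             / sum((si - s_bar) ** 2 for si in s))
    return x_bar - slope * s_bar, 1.0 / slope


# Grid positions for a few cumulative probabilities
for F in (0.5, 0.9, 0.99):
    print(f"F_S(s) = {F}: s = {exp_grid(F):.3f}")

# Synthetic check: ten points exactly on the line a = 150, lambda = 0.001346
demo = [150 + exp_grid(m / 11) / 0.001346 for m in range(1, 11)]
a_hat, lam_hat = exp_paper_fit(demo)
print(f"a = {a_hat:.1f}, lambda = {lam_hat:.6f}")
```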
where u and α are the parameters. For this distribution, the standard variate is
$$S = \alpha(X - u)$$
The result is the Type I extremal probability paper, which is also known as the Gumbel probability paper. A straight line on this paper represents a particular Type I asymptotic extreme value distribution; the value of X on this line at s = 0, or $F_S(s) = 0.368$, is the value u, whereas the slope of the straight line is 1/α, as illustrated in Fig. 7.3.
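For the Type I (largest) distribution the standard CDF is $F_S(s) = \exp(-e^{-s})$, so the grid line for probability F is drawn at $s = -\ln(-\ln F)$; at s = 0 this gives F = e⁻¹ ≈ 0.368, the point used above to read u. A minimal sketch follows (hypothetical function names), again with an optional least-squares fit in place of drawing the line by eye. The check uses synthetic points on the line u = 5.7, α = 2.0 (the values read in Example 7.5), not the earthquake data of Table E7.5.

```python
import math


def gumbel_grid(F):
    """Standard variate s = -ln(-ln F) for the Type I CDF exp(-e^-s)."""
    return -math.log(-math.log(F))


def gumbel_paper_fit(data):
    """Least-squares line x = u + s / alpha on Gumbel probability paper.

    Returns (u, alpha): u is the value of x at s = 0 (F = 0.368), and
    alpha is the reciprocal of the slope dx/ds.
    """
    x = sorted(data)
    n = len(x)
    s = [gumbel_grid(m / (n + 1)) for m in range(1, n + 1)]
    s_bar = sum(s) / n
    x_bar = sum(x) / n
    slope = (sum((si - s_bar) * (xi - x_bar) for si, xi in zip(s, x))
             / sum((si - s_bar) ** 2 for si in s))
    return x_bar - slope * s_bar, 1.0 / slope


# The grid line s = 0 corresponds to F = exp(-1)
print(f"F at s = 0: {math.exp(-1.0):.3f}")

# Synthetic check with 31 points exactly on the line u = 5.7, alpha = 2.0
demo = [5.7 + gumbel_grid(m / 32) / 2.0 for m in range(1, 32)]
u_hat, alpha_hat = gumbel_paper_fit(demo)
print(f"u = {u_hat:.2f}, alpha = {alpha_hat:.2f}")
```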
► EXAMPLE 7.5 In Table E7.5 we show data on the occurrence of the annual largest magnitude earthquakes for
31 years observed between 1932 and 1962 in California (Epstein and Lomnitz, 1966). Also shown
in this table are the corresponding plotting positions that are used to plot the data on the Gumbel
probability paper as shown in Fig. E7.5. From this figure, we can determine the parameters: at s = 0, u = 5.7, and the slope of the graph gives 1/α = 0.50, or α = 2.00.
$$\sum_{i=1}^{k}\frac{(n_i - e_i)^2}{e_i} < c_{1-\alpha,\,f} \quad (7.1)$$
in which $c_{1-\alpha,f}$ is the critical value of the χ² distribution with f d.o.f. at the cumulative probability of (1 − α), the assumed theoretical distribution is an acceptable model at the significance level α. Otherwise, if Eq. 7.1 is not satisfied, the assumed distribution model is not substantiated by the observed data at the α significance level. Values of $c_{1-\alpha,f}$ are tabulated in Table A.4.
In applying the chi-square test for goodness-of-fit, it is generally necessary for satisfactory results to have (if possible) k ≥ 5 and $e_i \ge 5$.
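The left side of Eq. 7.1 is a simple sum over the k intervals. The Python sketch below (hypothetical names) evaluates it and compares it with a critical value that the user reads from Table A.4. The frequencies in the usage lines are made-up illustrative numbers, not data from the text; with k = 5 intervals and no parameters estimated from the data, f = k − 1 = 4, for which $c_{0.95,4} = 9.488$.

```python
def chi_square_statistic(observed, expected):
    """Sum of (n_i - e_i)^2 / e_i over the k intervals (left side of Eq. 7.1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per interval")
    return sum((n - e) ** 2 / e for n, e in zip(observed, expected))


def chi_square_accepts(observed, expected, c_crit):
    """True if Eq. 7.1 holds, i.e. the model is acceptable at the chosen level."""
    return chi_square_statistic(observed, expected) < c_crit


# Hypothetical frequencies in k = 5 intervals (not data from the text)
n_obs = [18, 22, 30, 20, 10]
e_theo = [20.0, 20.0, 25.0, 20.0, 15.0]
stat = chi_square_statistic(n_obs, e_theo)
print(f"statistic = {stat:.3f}, accept = {chi_square_accepts(n_obs, e_theo, 9.488)}")
```

Remember that f must be reduced by one for each parameter estimated from the same data, as in Examples 7.7 and 7.8.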
► EXAMPLE 7.6 Severe thunderstorms have been recorded at a given station over a period of 66 years. During this
period, the frequencies of severe thunderstorms observed are as follows:
occurrences.
7.3 Testing Goodness-of-Fit of Distribution Models 291
we have
$$\sum_{i}\frac{(n_i - e_i)^2}{e_i} = 0.068 < 5.99$$
Therefore, the Poisson distribution is suitable for modeling the annual occurrences of severe thunderstorms at the station, at the 5% significance level. ◄
► EXAMPLE 7.7 Consider the histogram of the crushing strength of concrete cubes shown in Fig. E7.7. Also shown
in the same figure are the normal and lognormal PDFs with the same mean and standard deviation
as estimated from the observed data set. Visually, it appears that the two theoretical distributions are
equally valid to model the crushing strength of concrete.
In this case, the chi-square test can be used to discriminate the relative goodness of fit between the
two candidate distributions. For this purpose, eight intervals of the crushing strength are considered
as indicated in Table E7.7.
The two parameters of both the normal and lognormal distributions were estimated from the sample data; consequently, the number of d.o.f. in both cases is f = 8 − 3 = 5. Therefore, at the significance level of α = 5%, we obtain from Table A.4 $c_{0.95,5} = 11.07$. Comparing this with the sums of the last two columns in Table E7.7, we see that both the normal and the lognormal distributions are suitable for modeling the crushing strength of concrete; however, the lognormal model is superior to the normal model, as indicated by the values of $\sum (n_i - e_i)^2/e_i$ for the two distributions shown in columns 5 and 6 of Table E7.7.
Figure E7.7 Histogram of crushing strength of concrete cubes. (Data from Cusens and Wettern,
1959.)
TABLE E7.7 Computations for Chi-Square Tests of Normal and Lognormal Distributions
Interval    Observed          Theoretical Frequency, e_i    (n_i − e_i)²/e_i
(ksi)       Frequency, n_i    Normal      Lognormal         Normal      Lognormal
► EXAMPLE 7.8 From a sample of 320 observations, the histogram of residual stresses of wide flange steel beams is
shown in Fig. E7.8. Superimposed on the same figure are the PDFs of three theoretical distribution
models: the normal, lognormal, and shifted (or 3-parameter) gamma distributions. From the data set, the first three moments were estimated to be μ = 0.3561, σ = 0.1927, and θ = 0.8230 (skewness coefficient). Accordingly, the normal and lognormal distributions are assumed to have the estimated values of μ and σ, whereas the shifted gamma is assumed with the three estimated moments.
By visual inspection of Fig. E7.8, we can observe that the normal distribution is clearly not suitable. However, the lognormal and the shifted gamma distributions appear to fit the histogram well. In order to verify the relative validities of the three distribution models, we perform the chi-square test for goodness-of-fit with the calculations shown in Table E7.8.
Figure E7.8 Chi-square tests to discriminate three distribution models for residual stresses.
From Table E7.8, we see that among the three distributions, the shifted gamma distribution gives the lowest value of $\sum (n_i - e_i)^2/e_i$. Also, at the significance level of 1% and a d.o.f. of f = 9 − 4 = 5, we obtain from Table A.4 the critical value $c_{0.99,5} = 15.09$ for the normal and lognormal distributions, whereas for the shifted gamma distribution f = 9 − 5 = 4 and $c_{0.99,4} = 13.28$. Therefore, according to the
chi-square test, only the shifted gamma distribution (among the three distributions) is approximately
valid at the 1% significance level for modeling the probability distribution of residual stresses in
wide-flange beams. ◄
It may be emphasized that because there is some arbitrariness in the choice of the
significance level a, the chi-square goodness-of-fit test (as well as the Kolmogorov-Smirnov
and the Anderson-Darling methods, described subsequently in Sects. 7.3.2 and 7.3.3) may
not provide absolute information on the validity of a specific distribution. For example, it
is conceivable that a distribution acceptable at one significance level may be unacceptable
at another significance level: this can be illustrated with the shifted gamma distribution of
Example 7.8, in which the distribution is valid at the 1% significance level but will not be
valid at the 5% level.
In spite of this arbitrariness in the selection of the significance level, however, such statistical goodness-of-fit tests remain useful, especially for determining the relative goodness-of-fit of two or more theoretical distribution models, as illustrated in Examples 7.7 and 7.8. Moreover, these tests should be used only to help verify the validity of a theoretical model that has been selected on the basis of other prior considerations, such as through the application of appropriate probability papers, or even visual inspection of an appropriate PDF with the available histogram.
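For a sample of size n, we rearrange the set of observed data in increasing order. From this ordered set of sample data, we develop a stepwise experimental cumulative frequency function as follows:
$$S_n(x) = \begin{cases} 0, & x < x_1 \\ k/n, & x_k \le x < x_{k+1} \\ 1, & x \ge x_n \end{cases} \quad (7.2)$$
in which $x_1, x_2, \ldots, x_n$ are the values of the ordered set of observed data. The K-S test is based on the maximum difference between $S_n(x)$ and the CDF $F(x)$ of the proposed theoretical distribution,
$$D_n = \max_x \left| F(x) - S_n(x) \right| \quad (7.3)$$
Theoretically, $D_n$ is a random variable. For a significance level α, the K-S test compares the observed maximum difference, $D_n$ of Eq. 7.3, with the critical value $D_n^{\alpha}$, which is defined for significance level α by
$$P\left(D_n \le D_n^{\alpha}\right) = 1 - \alpha \quad (7.4)$$
The critical values $D_n^{\alpha}$ at various significance levels α are tabulated in Table A.5 for various sample sizes n. If the observed $D_n$ is less than the critical value $D_n^{\alpha}$, the proposed theoretical distribution is acceptable at the specified significance level α; otherwise, the assumed theoretical distribution would be rejected.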
The K-S test has some advantage over the chi-square test. With the K-S test, it is not necessary to divide the observed data into intervals; hence, the problems associated with small $e_i$ and/or a small number of intervals k in the chi-square test do not arise.
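Because $S_n(x)$ of Eq. 7.2 is a step function, the maximum in Eq. 7.3 can only occur immediately before or at one of the ordered observations. The sketch below (hypothetical names) uses that fact; the theoretical CDF is passed in as a function, for example `NormalDist(mu, sigma).cdf` from Python's `statistics` module, and the critical value $D_n^{\alpha}$ must still be read from Table A.5. The three-point data set is only an illustration.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """D_n = max over x of |F(x) - S_n(x)|, with S_n the step function of Eq. 7.2.

    S_n jumps from (k - 1)/n to k/n at the k-th ordered value x_k, so the
    largest gap is found by checking both sides of every jump.
    """
    x = sorted(data)
    n = len(x)
    d = 0.0
    for k, xk in enumerate(x, start=1):
        f = cdf(xk)
        d = max(d, abs(f - (k - 1) / n), abs(k / n - f))
    return d


def ks_accepts(data, cdf, d_crit):
    """True if D_n is below the critical value D_n^alpha read from Table A.5."""
    return ks_statistic(data, cdf) < d_crit


# Tiny illustration: three values against the standard normal CDF
d3 = ks_statistic([-1.0, 0.0, 1.0], NormalDist().cdf)
print(f"D_3 = {d3:.4f}")
```

For Example 7.9, one would pass the 26 ordered toughness values of Table E7.1 with `NormalDist(77, 4.6).cdf` and compare the result with $D_{26}^{0.05} = 0.265$.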
► EXAMPLE 7.9 The data for the fracture toughness of steel plates in Example 7.1 were plotted on a normal probability paper as shown in Fig. E7.1. The data appear to yield a linear trend producing a straight line corresponding to a normal distribution N(77, 4.6). Let us now perform a K-S test to evaluate the
appropriateness of the proposed normal distribution model in light of the available data, at the 5%
significance level.
With the tabulated data that have been rearranged in increasing order in Table E7.1, we illustrate the calculations of Eq. 7.2 for the empirical cumulative frequency and the corresponding theoretical CDF for N(77, 4.6), as shown in Table E7.9. The two cumulative frequencies are plotted as shown in Fig. E7.9. From Fig. E7.9, or Table E7.9, we can observe that the maximum discrepancy between the two cumulative frequencies is $D_n = 0.16$ and occurs at $x = K_{Ic} = 77$ ksi√in.
In this case, the sample size is n = 26; therefore, at the 5% significance level, we obtain the critical value from Table A.5 to be $D_{26}^{0.05} = 0.265$. Since the maximum discrepancy $D_n = 0.16$ is less than 0.265, the normal distribution N(77, 4.6) is an acceptable model at the 5% significance level.
Figure E7.9 Cumulative frequencies for the K-S test of fracture toughness.
► EXAMPLE 7.10 In Example 7.8, we used the chi-square test to test the goodness-of-fit of three candidate distributions (the normal, lognormal, and shifted gamma) for the residual stresses in wide flange steel beams. With the same data, let us now use the K-S test to determine the goodness-of-fit of the same three distributions.
For the K-S test, we constructed Fig. E7.10 to display the empirical cumulative frequency function of the observed data and the respective CDFs of the normal, lognormal, and shifted gamma distributions. The respective maximum discrepancies for the three theoretical distributions, in accordance with Eq. 7.3, are found to be as follows:
Figure E7.10 Empirical cumulative frequency and the CDFs of three distribution models
In the K-S test, the cumulative probability is plotted on an arithmetic scale. We may observe that both the proposed theoretical CDF and the empirical CDF are relatively flat at the tails of the distribution. Hence, the maximum deviation in the K-S test will seldom occur in the tails of a distribution. In the chi-square test, the empirical frequencies at the tails must generally be grouped together. As a result, neither test is likely to reveal a discrepancy between the empirical and theoretical frequencies at the tails of the proposed distribution.
The Anderson-Darling (A-D) goodness-of-fit test was introduced by Anderson and Darling
(1954) to place more weight or discriminating power at the tails of the distribution. This can
be important when the tails of a selected theoretical distribution are of practical significance.
The procedure for applying the A-D method can be described with the following steps:
1. Arrange the observed data in increasing order: $x_1, x_2, \ldots, x_i, \ldots, x_n$, with $x_n$ as the largest value.
2. Evaluate the CDF of the proposed distribution, $F_X(x_i)$, at $x_i$ for $i = 1, 2, \ldots, n$.
3. Calculate the Anderson-Darling (A-D) statistic
$$A^2 = -\left[\sum_{i=1}^{n}(2i-1)\left\{\ln F_X(x_i) + \ln\left[1 - F_X(x_{n+1-i})\right]\right\}\right]\Big/ n \; - \; n \qquad (7.5)$$
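For the normal distribution, the critical value at significance level $\alpha$ is
$$c_\alpha = a_\alpha\left(1 + \frac{b_0}{n} + \frac{b_1}{n^2}\right) \qquad (7.6)$$
in which values of $a_\alpha$, $b_0$, and $b_1$ are given in Table A.6a for a prescribed significance level $\alpha$. The adjusted A-D statistic for a sample size n is
$$A^* = A^2\left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right) \qquad (7.7)$$
The proposed distribution is acceptable at the significance level $\alpha$ if $A^* < c_\alpha$; otherwise, it is rejected.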
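The A-D steps for the normal distribution can be sketched in Python as follows. This is a minimal sketch under several assumptions:
- $\mu$ and $\sigma$ are estimated from the sample with divisor n − 1 (via `statistics.stdev`).
- The user supplies the constants $a_\alpha$, $b_0$, $b_1$ from Table A.6a.
- All function names are hypothetical.

The logarithms in Eq. 7.5 are undefined if a computed $F_X(x_i)$ is exactly 0 or 1, so the code would fail in that case.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ad_statistic(cdf_values: Sequence[float]) -> float:
    """A^2 of Eq. 7.5; cdf_values[i-1] = F_X(x_i) for data sorted ascending."""
    n = len(cdf_values)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf_values[i - 1]   # F_X(x_i)
        f_high = cdf_values[n - i]  # F_X(x_{n+1-i})
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -total / n - n


def ad_normal_adjusted(a2: float, n: int) -> float:
    """Adjusted statistic A* of Eq. 7.7 (normal distribution)."""
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def ad_normal_critical(a_alpha: float, b0: float, b1: float, n: int) -> float:
    """Critical value c_alpha of Eq. 7.6; a_alpha, b0, b1 from Table A.6a."""
    return a_alpha * (1.0 + b0 / n + b1 / n ** 2)


def ad_normal_test(data: Sequence[float], a_alpha: float, b0: float,
                   b1: float) -> Tuple[float, float, bool]:
    """Return (A*, c_alpha, accepted) with mu and sigma estimated from the data."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    a2 = ad_statistic([normal_cdf(x, mu, sigma) for x in xs])
    a_star = ad_normal_adjusted(a2, n)
    c_alpha = ad_normal_critical(a_alpha, b0, b1, n)
    return a_star, c_alpha, a_star < c_alpha
```

With the figures of Example 7.11, `ad_normal_adjusted(0.903, 26)` gives $A^* \approx 0.932$. `ad_normal_critical(1.0348, -1.013, -0.93, 26)` returns about 0.993, close to the $c_{0.01} = 0.994$ quoted there.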
For the exponential distribution, the critical value $c_\alpha$ at a specified significance level $\alpha$ is given in Table A.6b. The corresponding adjusted A-D statistic, $A^*$, is given by
$$A^* = A^2\left(1 + \frac{0.6}{n}\right) \qquad (7.8)$$
For the gamma distribution, the critical value $c_\alpha$ depends on the value of the parameter k as given in Table A.6c. Moreover, the adjusted A-D statistic also depends on the parameter k as follows:
$$A^* = A^2\left(1.0 + \frac{0.6}{n}\right) \quad \text{for } k = 1; \qquad A^* = A^2 + \frac{0.2 + 0.3/k}{n} \quad \text{for } k \ge 2 \qquad (7.9)$$
For the extremal distributions of the Gumbel and Weibull types, the critical values $c_\alpha$ at specified significance levels $\alpha$ are given in Table A.6d. In this case, the adjusted A-D statistic is given by
$$A^* = A^2\left(1 + \frac{0.2}{\sqrt{n}}\right) \qquad (7.10)$$
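The adjustment formulas of Eqs. 7.7 to 7.10 can be collected in a single helper, sketched below. The family labels are hypothetical strings chosen for illustration. Only the adjusted statistic is computed; the critical values must still be read from Tables A.6a to A.6d.

```python
import math


def ad_adjusted(a2: float, n: int, family: str, k: float = 1.0) -> float:
    """Adjusted A-D statistic A* (Eqs. 7.7 to 7.10).

    family is a hypothetical label: "normal", "exponential", "gamma",
    "gumbel", or "weibull"; k is the gamma shape parameter."""
    if family == "normal":                  # Eq. 7.7 (lognormal: use ln x)
        return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    if family == "exponential":             # Eq. 7.8
        return a2 * (1.0 + 0.6 / n)
    if family == "gamma":                   # Eq. 7.9
        if k == 1:
            return a2 * (1.0 + 0.6 / n)
        if k >= 2:
            return a2 + (0.2 + 0.3 / k) / n
        raise ValueError("Eq. 7.9 gives adjustments only for k = 1 and k >= 2")
    if family in ("gumbel", "weibull"):     # Eq. 7.10
        return a2 * (1.0 + 0.2 / math.sqrt(n))
    raise ValueError(f"unsupported family: {family}")
```

For instance, `ad_adjusted(0.512, 31, "gumbel")` evaluates to about 0.530, the adjusted value used in Example 7.12.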
► EXAMPLE 7.11 The steel toughness data of Example 7.1 were previously fitted with a normal distribution, and its goodness-of-fit was validated in Example 7.9 by the K-S test. With the same data, we will now perform a validation of the normal distribution with the A-D test at the 5% significance level. Table E7.11 summarizes the results of the calculations according to the procedure outlined earlier. The proposed normal model is N(76.99, 4.709), with the parameters $\mu$ and $\sigma$ estimated from the sample of size 26. From the results in Table E7.11, we calculate the A-D statistic as
$$A^2 = -\frac{-699.476}{26} - 26 = 0.903$$
TABLE E7.11 Computations for the Anderson-Darling Test of the Normal Distribution
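Then, from Eq. 7.7, the adjusted statistic is $A^* = 0.903\left(1 + \frac{0.75}{26} + \frac{2.25}{26^2}\right) = 0.932$. Since $A^* > 0.727$, the critical value at the 5% significance level, the normal distribution is not acceptable at the 5% significance level. However, at the 1% significance level, the constants $a_\alpha$, $b_0$, and $b_1$ from Table A.6a are 1.0348, −1.013, and −0.93, respectively. Hence, the critical value is $c_{0.01} = 0.994$. Since $A^* < c_{0.01}$, the normal distribution would be acceptable at the 1% significance level. ◄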
We might point out that the A-D test described above for the normal distribution also applies to the lognormal distribution. The logarithm of a lognormal variate is normally distributed. We therefore take the logarithms of the sample values of the variate and apply the same A-D test as for the normal distribution. All the computations for a normal distribution remain the same, except that the sample values of the variate are replaced by their logarithms. For example, in Table E7.11, $x_i$ must be replaced by the corresponding $\ln(x_i)$.
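Following the log-transform argument above, a minimal sketch of the A-D statistic for a lognormal model works on $\ln x_i$ throughout. It assumes the parameters of ln X are estimated from the logged sample with divisor n − 1; the function names are hypothetical. The adjusted statistic and the critical value then follow Eqs. 7.7 and 7.6, exactly as in the normal case.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ad_statistic_lognormal(data: Sequence[float]) -> float:
    """A^2 of Eq. 7.5 for a lognormal model, evaluated on ln(x_i).

    The mean and standard deviation of ln X are estimated from the logged
    sample, so the normal-distribution procedure applies unchanged."""
    logs = sorted(math.log(x) for x in data)  # ln(x_i) replaces x_i
    lam, zeta = mean(logs), stdev(logs)       # parameters of ln X
    f = [std_normal_cdf((y - lam) / zeta) for y in logs]
    n = len(f)
    total = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
                for i in range(1, n + 1))
    return -total / n - n
```

Only the standardized logarithms enter the calculation. Rescaling all the data by a constant factor therefore leaves $A^2$ unchanged, which is a convenient sanity check.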
► EXAMPLE 7.12 As shown in Fig. E7.5, the annual largest earthquake magnitudes in California observed between 1932 and 1962 show a linear trend on the Gumbel probability paper. On this basis, the Gumbel distribution with the parameters $u = 5.7$ and $\alpha = 2.0$ is a viable model for the annual maximum earthquakes in California. We will now perform an A-D test for goodness-of-fit of this distribution at the 5% significance level.
The required calculations are summarized in Table E7.12.
Therefore, according to Eq. 7.5, the A-D statistic is
$$A^2 = -\frac{-976.86}{31} - 31 = 0.512$$
and, from Eq. 7.10, the adjusted statistic is
$$A^* = 0.512\left(1.0 + \frac{0.2}{\sqrt{31}}\right) = 0.530$$
TABLE E7.12 Computations for the Anderson-Darling Test of the Gumbel Distribution
From Table A.6d, we obtain the critical value $c_\alpha = 0.757$ at the 5% significance level. Since $A^* < c_\alpha$, the Gumbel distribution is an acceptable distribution according to the A-D test at the 5% significance level. ◄
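Finally, we should also emphasize the invariance of the asymptotic extremal forms. If the initial distribution is one of these forms, the distribution for increasing n will remain of the same asymptotic form. That is, the form of the distribution of the extremes remains invariant with increasing n, although the parameters will change with n. In particular, if Y is the annual maximum with a Gumbel distribution, the maximum in n years will also be Gumbel distributed. We show this as follows.
Suppose
$$F_Y(y) = \exp\left[-e^{-\alpha(y-u)}\right]$$
Assuming the annual maxima are statistically independent, the CDF of the maximum in n years, $Y_n$, is
$$F_{Y_n}(y) = \left[F_Y(y)\right]^n = \exp\left[-n\,e^{-\alpha(y-u)}\right] = \exp\left[-e^{-\alpha(y-u_n)}\right]$$
which is again a Gumbel distribution, with parameters
$$u_n = u + \frac{\ln n}{\alpha}, \qquad \alpha_n = \alpha \qquad (7.11)$$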
► EXAMPLE 7.13 In Example 7.5, we determined that the annual largest-magnitude earthquakes follow the Gumbel (Type I) extreme value distribution with parameters $u = 5.7$ and $\alpha = 2.00$. On the basis of Eq. 7.11, the distribution of the 10-year maximum magnitude will remain a Gumbel distribution with parameters
$$u_n = 5.7 + \frac{\ln 10}{2.0} = 6.85 \quad \text{and} \quad \alpha_n = 2.0$$
On this basis, we can say that the most probable largest-magnitude earthquake in California in the next ten years will be 6.85 on the Richter scale. Similarly, in the next 25 years, the most probable largest earthquake will be of magnitude
$$u_n = 5.7 + \frac{\ln 25}{2.0} = 7.31$$
By the same token, extrapolating from 10 years to 25 years, we also obtain
$$u_n = 6.85 + \frac{\ln 2.5}{2.0} = 7.31$$
◄
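A short sketch can confirm the invariance result numerically. Raising the annual Gumbel CDF to the power n gives the same values as a Gumbel CDF with $u_n = u + (\ln n)/\alpha$ and unchanged $\alpha$. As in the derivation, the annual maxima are assumed statistically independent, and the helper names are hypothetical.

```python
import math
from typing import Tuple


def gumbel_cdf(y: float, u: float, alpha: float) -> float:
    """Type I (largest) CDF: exp[-exp(-alpha * (y - u))]."""
    return math.exp(-math.exp(-alpha * (y - u)))


def gumbel_max_params(u: float, alpha: float, n: float) -> Tuple[float, float]:
    """Parameters (u_n, alpha_n) of the largest of n independent values, Eq. 7.11."""
    return u + math.log(n) / alpha, alpha


# Numerical check of the invariance: [F_Y(y)]^n equals the Gumbel CDF with u_n.
u10, a10 = gumbel_max_params(5.7, 2.0, 10)
for y in (6.0, 7.0, 8.0):
    assert math.isclose(gumbel_cdf(y, 5.7, 2.0) ** 10, gumbel_cdf(y, u10, a10))
```

`gumbel_max_params(5.7, 2.0, 10)` returns $u_n \approx 6.85$, and n = 25 gives about 7.31, matching Example 7.13.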
► EXAMPLE 7.14 In Example 4.21, the fracture strength of a welded joint was modeled by the Weibull distribution, as defined in Eq. 4.32. This is the Type III asymptotic distribution of smallest values, with parameters $w = 15$ ksi, $k = 1.75$, and minimum strength $\varepsilon = 4$ ksi. Now, suppose that we have a structural member with five welded joints of the same type. Then, according to Eq. 4.25, the CDF of the lowest fracture strength $X_1$ among the five welded joints in the structural member would be
$$F_{X_1}(x) = 1 - \exp\left[-5\left(\frac{x - 4}{15 - 4}\right)^{1.75}\right], \quad x \ge 4$$
which remains a Weibull distribution. However, with n = 5, the probability that the lowest fracture strength of the structural member will be at least 16.5 ksi becomes
$$P(X_1 > 16.5) = \exp\left[-5\left(\frac{16.5 - 4}{15 - 4}\right)^{1.75}\right] = 0.002$$
This is a much lower probability than the 0.286 for a single joint shown in Example 4.21. ◄
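The minimum-of-n calculation can be sketched as follows. The sketch assumes that the strengths of the n joints are statistically independent. It also assumes that the single-joint exceedance probability is $\exp[-((x - \varepsilon)/(w - \varepsilon))^k]$, the form implied by the numbers in the example. The function name is hypothetical.

```python
import math


def weibull_min_exceedance(x: float, w: float, k: float, eps: float, n: int = 1) -> float:
    """P(X_1 > x) for the smallest of n independent Type III values.

    Single value: P(X > x) = exp[-((x - eps) / (w - eps))**k]; for the
    smallest of n, the exponent is multiplied by n."""
    if x <= eps:
        return 1.0  # no strength below the lower bound eps
    return math.exp(-n * ((x - eps) / (w - eps)) ** k)
```

With w = 15, k = 1.75, and ε = 4, the function gives about 0.286 for a single joint and about 0.0019 for n = 5, consistent with the example.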
tests, however, depend on the prescribed level of significance, the choice of which is largely
subjective. Nevertheless, these tests are statistically useful, particularly for discriminating
the relative validities of two or more candidate distribution models.
Finally, it is well to emphasize that when extreme values are involved, one of the
asymptotic forms of extremal distribution may be appropriate. The invariance property of
the asymptotic forms is also of significance in such applications.
PROBLEMS
7.1 The data for the fracture toughness of steel base plates have been compiled as shown in Table E7.1. In Example 7.1, these data were plotted on a normal probability paper.
(a) Plot the same set of data on a lognormal probability paper, and draw a straight line through the plotted data points if a linear trend can be observed.
(b) Estimate the median and c.o.v. of the fracture toughness on the basis of the straight line of Part (a).
(c) Perform a chi-square test to evaluate the validity of the exponential distribution at the 5% significance level.
(d) Alternatively, perform also a Kolmogorov-Smirnov test of the same distribution at the same 5% significance level.
Bar #   Ultimate Strain, U (%)   Bar #   Ultimate Strain, U (%)   Bar #   Ultimate Strain, U (%)
  1            19.4                6            17.9               11            16.1
  2            16.0                7            17.8               12            16.8
  3            16.6                8            18.8               13            17.0
  4            17.3                9            20.1               14            18.1
  5            18.4               10            19.1               15            18.6
7.5 The number of vehicles arriving at a toll booth, per minute, were observed as follows:
0, 3, 1, 2, 0, 1, 1, 1, 2, 0, 1, 4, 3, 1, 1, 0, 0, 1, 0, 2, 2, 0, 1, 0, 0
(a) Assuming that the arrival of vehicles at the toll booth is a Poisson process, estimate the mean arrival rate.
(b) Perform a chi-square test to determine the validity of the Poisson distribution at the 1% significance level.
7.6 The PDF of the Rayleigh distribution of a random variable X is
$$f_X(x) = \frac{x}{\alpha^2}\exp\left[-\frac{1}{2}\left(\frac{x}{\alpha}\right)^2\right], \quad x \ge 0; \qquad f_X(x) = 0, \quad x < 0$$
in which the parameter $\alpha$ is the mode, or most probable value, of X.
(a) Construct the probability paper of the above Rayleigh distribution. What does the slope of a straight line on this paper represent?
(b) A set of data for strain range induced by vehicle loads on highway bridge members is tabulated below. Plot the set of data on the Rayleigh probability paper constructed in Part (a) above.
(c) In light of the results of Parts (a) and (b) above, what conclusion can you draw regarding the suitability of the Rayleigh distribution for modeling the live-load stress-range in highway bridges? To verify or invalidate the Rayleigh distribution, perform a K-S test at the 1% significance level.
(d) Determine the most probable value of the strain-range (if possible) from Part (b).
Measured Strain Range (micro-in./in.)
 48.4    52.7    42.4
 47.1    44.5   146.2
 49.5    84.8   115.2
116.0    52.6    43.0
 84.1    53.6   103.6
 99.3    33.5    64.7
108.1    43.8    69.8
 47.3    56.3    44.0
 93.7    34.5    36.2
 36.3    62.8    50.6
122.5   180.5   167.0
(Data courtesy of W. H. Walker.)
7.7 Data on the rate of oxygenation, K, in streams have been obtained for the Cincinnati Pool, Ohio River, at 20°C and summarized as follows. (Data from Kothandaraman, 1968.)
Range of K (per day)   No. of Observations
0.000-0.049                    1
0.050-0.099                   11
0.100-0.149                   20
0.150-0.199                   23
0.200-0.249                   15
0.250-0.299                   11
0.300-0.349                    2
(a) If a normal distribution is proposed to model the oxygenation rate at the Cincinnati Pool, Ohio River, estimate the mean and standard deviation of the distribution.
(b) Perform a chi-square test for the goodness-of-fit of the proposed distribution at the 5% significance level.
7.8 A random variable X with a triangular distribution between a and a + r is described by the following PDF:
$$f_X(x) = \frac{2(x - a)}{r^2}, \quad a \le x \le a + r; \qquad f_X(x) = 0 \quad \text{otherwise}$$
(a) Determine the appropriate standard variate S for the triangular distribution.
(b) Construct the corresponding probability paper. What do the values of X at $F_S(0)$ and $F_S(1.0)$ on this paper mean?
(c) Suppose the following sample values were observed for X:
36   32   34   71
18   69   45   66
56   71   53   58
64   50   55   53
72   28   62   48
75
Plot the above set of sample values on the triangular probability paper constructed in Part (b) above, and from this plot estimate the minimum and maximum values of X.
7.9 The shear strengths [in kips per square foot (ksf)] of 13 undisturbed samples of clay from the Chicago subway project are tabulated below (data from Peck, 1940):
Shear Strength of Clay, ksf
0.35   0.42   0.49   0.70   0.96
0.40   0.43   0.58   0.75
0.41   0.48   0.68   0.87
(a) Plot the above data on a lognormal probability paper, and draw a straight line through the data points (if a linear trend can be observed).
(b) Estimate the parameters of the distribution from the straight line in Part (a) above.
(c) Test the goodness-of-fit of the lognormal distribution using the Kolmogorov-Smirnov test at the 2% significance level. Do the same with the Anderson-Darling test at the 2.5% significance level.
7.10 Passenger cars coming to an intersection must stop at a stop sign and must wait for a gap sufficiently long to cross or make a turn. The acceptance gap, G, measured in seconds, varies from driver to driver; some drivers are more alert or more risk-taking, whereas others are more careful or slow-moving. Observations at several similar intersections in a city were recorded as follows:
Acceptance Gap, G (sec)   Number of Observations
0.5-1.5                          0
1.5-2.5                          6
2.5-3.5                         34
3.5-4.5                        132
4.5-5.5                        179
5.5-6.5                        218
6.5-7.5                        183
7.5-8.5                        146
8.5-9.5                         69
9.5-10.5                        30
10.5-11.5                        3
11.5-12.5                        0
(a) Plot a histogram for the above data of the observed acceptance gap.
(b) From the above histogram, determine whether the normal or lognormal distribution is a better model for the acceptance gap, G.
(c) From the observed data, estimate the sample mean and sample standard deviation of the acceptance gap. For this purpose, the average gap length may be used for each of the intervals of G in the first column of the tabulated data.
(d) Perform chi-square goodness-of-fit tests for both the normal and lognormal distributions at the 1% significance level, and on this basis determine which of the two distributions is more appropriate to model the acceptance gap G.
7.11 Measured data for the annual rainfall intensity in a watershed area were presented in Table 1.1 of Chapter 1. These data are repeated below as follows:
Observed Rainfall Intensity, in.
43.30   54.49   58.71
53.02   47.38   42.96
63.52   40.78   55.77
45.93   45.05   41.31
48.26   50.37   58.83
50.51   54.91   48.21
49.57   51.28   44.67
43.93   39.91   67.72
46.77   53.29   43.11
59.12   67.59
(a) Based on the histogram shown in Fig. 1.1 of Chapter 1, a lognormal or a gamma PDF may be a plausible distribution to model the rainfall intensity of the watershed area. Plot the data on a lognormal probability paper.
(b) Prescribe a gamma distribution and estimate its parameters by the method of moments.
(c) Determine which of the two distributions is better suited to model the pertinent rainfall intensity by performing chi-square goodness-of-fit tests for both distributions.
7.12 Data for the observed settlements of piles and the corresponding calculated settlements are compiled in Table E8.3 of Chapter 8. Based on these data, we can calculate the ratios of the observed to the corresponding calculated settlements; the results are as follows:
Ratios of Observed Settlement to Calculated Settlement
0.12   0.97   0.86   1.14   0.94
2.37   0.88   0.92   1.01   0.99
1.02   1.04   0.99   0.87   0.52
0.94   1.06   1.38   1.04   1.18
1.00   0.86   0.82   0.84   1.09
The ratio of the observed to the calculated settlements is a measure of the accuracy of the calculational method. From the above data, we can observe that this ratio has considerable variability.
(a) Assuming that the ratio is a Gaussian random variable, plot the above data on a normal probability paper and observe if there is a linear trend of the data points.
(b) If a linear trend is observed, draw a straight line through the data points and estimate the mean and standard deviation from the straight line. Perform a chi-square goodness-of-fit test for the normal distribution at the 5% significance level. Also, do the same with the Anderson-Darling test.
(c) Otherwise, if no linear trend can be observed in Part (b), plot the same data on another probability paper, such as the lognormal paper, and determine the suitability of this alternative distribution to model the relevant ratio, including a goodness-of-fit test at the 2% significance level to verify its suitability.
7.13 In Example 8.8 of Chapter 8, the data tabulated in Table E8.8 include the mean depth of glacier lakes in the Swiss Alps, which we summarize below as follows:
Mean Depth of Glacier Lakes, m
 2.9   12.0   33.3
 5.0   13.6   27.9
 4.7   28.6   46.9
 7.1   18.6   50.0
10.4   34.3   83.3
Obviously, the mean depths of glacier lakes are highly variable. Assuming that the above data are representative of glacier lakes in general, determine the appropriate distribution to model the mean depth of glacier lakes. Verify also the selected distribution with a goodness-of-fit test.
REFERENCES
Allen, D. E., "Statistical Study of the Mechanical Properties of Reinforcing Bars," Building Research Note, No. 85, National Research Council, Ottawa, Canada, April 1972.
Anderson, T. W., and Darling, D. A., "A Test of Goodness-of-Fit," Jour. of the American Statistical Association, Vol. 49, 1954, pp. 765-769.
Cusens, A. R., and Wettern, J. H., "Quality Control in Factory-Made Precast Concrete," Civil Engineering and Public Works Review, Vol. 54, 1959.
D'Agostino, R. B., and Stephens, M. A., Goodness-of-Fit Techniques, Marcel Dekker, Inc., New York and Basel, 1986.
Epstein, B., and Lomnitz, C., "A Model for the Occurrence of Large Earthquakes," Nature, August 1966, pp. 954-956.
Gumbel, E. J., "Statistical Theory of Extreme Values and Some Practical Applications," Applied Mathematics Series 33, National Bureau of Standards, Washington, D.C., February 1954.
Hazen, A., Flood Flows, A Study in Frequency and Magnitude, J. Wiley & Sons, Inc., New York, 1930.
Hoel, P. G., Introduction to Mathematical Statistics, 3rd ed., J. Wiley & Sons, Inc., New York, 1962.
Kies, J. A., Smith, H. L., Romine, H. E., and Bernstein, M., "Fracture Testing of Weldments," ASTM Special Publ. No. 381, 1965, pp. 328-356.
Kimball, B. F., "Assignment of Frequencies to a Completely Ordered Set of Sample Data," Transactions, American Geophysical Union, Vol. 27, 1946, pp. 843-846.
Kothandaraman, V., "A Probabilistic Analysis of Dissolved Oxygen-Biochemical Oxygen Demand Relationship in Streams," Ph.D. Dissertation, University of Illinois at Urbana-Champaign, 1968.
Lockhart, R. A., and Stephens, M. A., "Goodness-of-Fit Tests for the Gamma Distribution," Technical Report, Department of Mathematics and Statistics, Simon Fraser University, British Columbia, Canada, 1985.
Pearson, E. S., and Hartley, H. O., Biometrika Tables for Statisticians, Vol. 2, Cambridge University Press, New York, 1972.
Peck, R. B., "Sampling Methods and Laboratory Tests for Chicago Subway Soils," Proc. Purdue Conf. on Soil Mechanics and Its Applications, Lafayette, IN, 1940.
Pettitt, A. N., "Testing the Normality of Several Independent Samples Using the Anderson-Darling Statistic," Jour. Roy. Stat. Soc. C, Vol. 26, 1977, pp. 156-161.
Stephens, M. A., "Goodness-of-Fit for the Extreme Value Distribution," Biometrika, Vol. 64, 1977, pp. 583-588.
Regression and Correlation Analyses
► 8.1 INTRODUCTION
When there are two (or more) variables, there may be some relationship between (or among)
the variables. In the presence of randomness, the relationship between the two variables
will not be unique; given the value of one variable, there is a range of possible values of
the other variable. The relationship between the variables, therefore, requires probabilistic
description. Describing this probabilistic relationship in terms of the mean and variance of one random variable as a function of the value of the other variable is known as regression analysis. When the analysis is limited to linear
mean-value functions, it is called linear regression. More generally, however, regression
may be nonlinear. The linear or nonlinear relationship obtained from a regression analysis
does not necessarily represent any causal relation between the variables; i.e., there may not
be any cause-and-effect relationship between the variables. Such a relationship, however,
may be used to predict the value or statistics of one variable based on the value of the other
control variable.
For linear regression, the degree of linearity in the relationship between two random
variables may be measured by the statistical correlation, in particular, by the correlation
coefficient as defined in Sect. 3.3.2. When the correlation coefficient is high, close to ± 1.0.
one can expect high confidence in being able to predict the value of one variable based on
information about the value of the other (control) variable. Evaluation of the correlation
coefficient from a set of observed data is through correlation analysis.
In this chapter, we start our discussion of linear regression with constant variance, including the determination of the conditional variance and confidence interval of a regression
equation, and the related correlation analysis. This is then extended to linear regression with
nonconstant variance, nonlinear regression, and multiple regression analyses, concluding
with illustrations of engineering applications of regression analyses.
variable, say X = x, does not give perfect information on the other variable Y. Conceivably,
the range of values of Y is governed by a probability distribution. We may observe also
from Fig. 8.1 that the mean value of Y increases with increasing values of X, and if this
relationship is linear, we have a linear regression; i.e.,
$$E(Y|X = x) = \alpha + \beta x \qquad (8.1)$$
where $\alpha$ and $\beta$ are constants, known as the regression coefficients, which are the intercept and slope, respectively, of the straight line. Equation 8.1 is known as the regression equation, and it represents the regression of Y on X. The regression coefficients $\alpha$ and $\beta$ must necessarily be estimated from the available data.
From the scatter of the data points in Fig. 8.1, we would expect a variance of Y that may depend on the value of X, i.e., a conditional variance of Y given X = x, or Var(Y|X = x). In general, this conditional variance may vary with x. However, let us first consider the case in which Var(Y|X = x) is constant.
We can observe from the scattergram of the plotted data points that there could conceivably be many straight lines that might represent the mean value of Y as a linear function of X, depending on the values of the regression coefficients. The "best" straight line may be the one that passes through the data points with the least cumulative error. To obtain this particular straight line, consider each data point $(x_i, y_i)$ in Fig. 8.1. The difference between the observed value $y_i$ and the value from a candidate straight line, $y_i' = \alpha + \beta x_i$, is $|y_i - y_i'|$. For a sample of observed data pairs of size n, i.e., $[(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)]$, the total error for all the data points can be measured by the total cumulative squared error; i.e.,
$$\Delta^2 = \sum_{i=1}^{n}(y_i - y_i')^2 = \sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 \qquad (8.2)$$
Then, we may obtain the straight line with the least squared error by minimizing $\Delta^2$ of Eq. 8.2 with respect to $\alpha$ and $\beta$. This yields the following equations for obtaining $\alpha$ and $\beta$:
$$\frac{\partial \Delta^2}{\partial \alpha} = \sum_{i=1}^{n} -2(y_i - \alpha - \beta x_i) = 0$$
and
$$\frac{\partial \Delta^2}{\partial \beta} = \sum_{i=1}^{n} -2x_i(y_i - \alpha - \beta x_i) = 0$$
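The above procedure is known as the method of least squares. From it we obtain the least-squares estimates of $\alpha$ and $\beta$ from the sample of size n as follows:
$$\hat\alpha = \frac{1}{n}\sum_{i=1}^{n} y_i - \frac{\hat\beta}{n}\sum_{i=1}^{n} x_i = \bar y - \hat\beta\,\bar x \qquad (8.3)$$
and
$$\hat\beta = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar x\,\bar y}{\sum_{i=1}^{n} x_i^2 - n\,\bar x^2} \qquad (8.4)$$
where
$\bar x$, $\bar y$ = the sample means of X and Y, respectively
n = the sample size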
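assumed to be constant with x, an unbiased estimate of this variance from a sample of size n is
$$s_{Y|x}^2 = \frac{\Delta^2}{n-2} = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat\alpha - \hat\beta x_i)^2 \qquad (8.6)$$
and the corresponding conditional standard deviation is
$$s_{Y|x} = \sqrt{s_{Y|x}^2} \qquad (8.6a)$$
$$r^2 = 1 - \frac{s_{Y|x}^2}{s_Y^2} \qquad (8.7)$$
where $s_Y$ is the sample standard deviation of Y. We shall see in Sect. 8.3.1, Eq. 8.12, that r is closely related to the correlation coefficient $\rho$.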
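A minimal Python sketch of the least-squares calculations follows. It uses Eqs. 8.3 and 8.4 for the coefficients and Eq. 8.6 (divisor n − 2) for the conditional variance. It computes $r^2 = 1 - s_{Y|x}^2/s_Y^2$, which we take to be the form of Eq. 8.7; this is an assumption inferred from Eq. 8.12. The function name and the returned dictionary keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def linear_regression(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Least-squares fit of E(Y|X=x) = alpha + beta*x, constant variance."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    sxx = sum(xi * xi for xi in x) - n * xbar ** 2
    beta = sxy / sxx                      # Eq. 8.4
    alpha = ybar - beta * xbar            # Eq. 8.3
    # Residual sum of squares Delta^2 (Eq. 8.2) and its unbiased variance (Eq. 8.6)
    delta2 = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2_cond = delta2 / (n - 2)
    s2_y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return {
        "alpha": alpha,
        "beta": beta,
        "s_y_given_x": math.sqrt(s2_cond),   # Eq. 8.6a
        "r2": 1.0 - s2_cond / s2_y,          # assumed form of Eq. 8.7
    }
```

For x = (1, 2, 3, 4) and y = (2.1, 3.9, 6.2, 7.8), the sketch gives $\hat\beta = 1.94$, $\hat\alpha = 0.15$, and $s_{Y|x}^2 = 0.041$.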
The statistic
$$\frac{\hat y_i - \mu_{Y|x_i}}{s_{Y|x}\sqrt{\dfrac{1}{n} + \dfrac{(x_i - \bar x)^2}{\sum_{j=1}^{n}(x_j - \bar x)^2}}}$$
will have the t-distribution with (n − 2) d.o.f. (Hald, 1952). On this basis, we may obtain the (1 − α) confidence interval for the regression equation at several selected values of $X = x_i$ as
$$\langle\mu_{Y|x_i}\rangle_{1-\alpha} = \hat y_i \pm t_{1-\alpha/2,\,n-2}\; s_{Y|x}\sqrt{\frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_{j=1}^{n}(x_j - \bar x)^2}} \qquad (8.8)$$
with $t_{1-\alpha/2,\,n-2}$ for (n − 2) d.o.f. as given in Table A.3. Among the confidence intervals of Eq. 8.8 at the selected discrete values of $x_i$, the interval will be narrowest at $x_i = \bar x$, the mean value of X. Connecting these discrete points along the regression line should yield the appropriate confidence interval of the regression equation.
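The confidence interval of Eq. 8.8 is easy to evaluate once the summary quantities are known. In this sketch the t quantile is passed in (for example, from Table A.3) rather than computed, so only the standard library is needed. The parameter `sxx` denotes $\sum (x_j - \bar x)^2$, and the function name is hypothetical.

```python
import math
from typing import Tuple


def mean_ci(y_hat: float, x0: float, n: int, xbar: float, sxx: float,
            s_y_given_x: float, t_value: float) -> Tuple[float, float]:
    """Confidence interval on E(Y|X=x0) from Eq. 8.8.

    y_hat   : regression estimate alpha + beta * x0
    sxx     : sum of (x_j - xbar)**2 over the sample
    t_value : t_{1-a/2, n-2}, e.g. read from Table A.3
    """
    half_width = t_value * s_y_given_x * math.sqrt(1.0 / n + (x0 - xbar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width
```

Example 8.3 supplies $\hat y = 106.44$ at $x_i = 100$ mm, $s_{Y|x} = 7.784$ mm, and $t_{0.975,23} = 2.069$. With these values, the function returns roughly (98.38, 114.50) mm.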
► EXAMPLE 8.1 Observed data for blow counts, N, and corresponding measured unconfined compressive strength, q,
of very stiff clay are given in the first two columns of Table E8.1. The sample of 10 pairs of data is
also plotted in Fig. E8.1.
On the basis of the calculations in Table E8.1 we obtain the respective sample means of N and
q as
The tabulated results can be organized conveniently using spreadsheets as in Example 8.3.
Using the calculations in Table E8.1, the estimated correlation coefficient according to
Eq. 8.9 is
1 492.77 - 10 x 18.7 x 2.123
9 V95.12-/E22
To determine the 95% confidence interval for the regression equation of q on N, let us use the selected values $N_i$ = 4, 11, 19, and 34. With $t_{0.975,8} = 2.306$ from Table A.3, we obtain
$$\text{At } N_i = 4: \quad \langle\mu_{q|N}\rangle_{0.95} = 0.477 \pm 2.306 \times 0.195\sqrt{\frac{1}{10} + \frac{(4 - 18.7)^2}{\sum_{j}(N_j - 18.7)^2}}$$
The above confidence intervals at the selected discrete values of N are also displayed in Fig. E8.1. These may then be used to construct the 95% confidence interval for the linear regression equation of q on N, shown as the lower and upper dashed curves in Fig. E8.1. ◄
a better statistical measure of the linear relationship between two random variables X and Y is the correlation coefficient, which was defined in Eq. 3.81 as $\rho_{X,Y} = \mathrm{Cov}(X, Y)/(\sigma_X \sigma_Y)$, where Cov(X, Y) is the covariance between X and Y. We shall see later in Eq. 8.12 that $\rho_{X,Y}$ is related to $s_{Y|x}$. In essence, the correlation coefficient measures the goodness-of-fit of the linear regression equation in light of the set of sampled data. The accuracy of the predicted mean value of Y for a given value of X will then depend on the correlation coefficient.
$$\hat\rho_{X,Y} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i - \bar x)(y_i - \bar y)}{s_X s_Y} = \frac{1}{n-1}\,\frac{\sum_{i=1}^{n} x_i y_i - n\,\bar x\,\bar y}{s_X s_Y} \qquad (8.9)$$
where $\bar x$, $\bar y$, $s_X$, and $s_Y$ are, respectively, the sample means and sample standard deviations of X and Y. According to Eq. 3.82, the value of $\rho$ ranges from −1 to +1. If the estimated $\hat\rho_{X,Y}$ is close to ±1.0, there is a strong linear relationship between X and Y. Fig. 8.2 illustrates this with the linear regression line between the compression index and void ratio of soils. Conversely, if $\hat\rho_{X,Y}$ is very small or close to zero (uncorrelated), this would indicate a lack of linear relationship between X and Y. Fig. 8.3 illustrates such a case between the modulus of rupture and the modulus of elasticity of laminated wood.
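From Eqs. 8.4 and 8.9, we can show that the estimated correlation coefficient is
$$\hat\rho = \hat\beta\,\frac{s_X}{s_Y} \qquad (8.10)$$
or, equivalently,
$$\hat\beta = \hat\rho\,\frac{s_Y}{s_X} \qquad (8.10a)$$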
Figure 8.2 Compression Index vs. void ratio of soil. (After Nishida, 1956.)
Figure 8.3 Modulus of rupture vs. modulus of elasticity of laminated wood. (After Galligan and
Snodgrass, 1970.)
Equation 8.10a gives a useful relationship between the estimates of the correlation coefficient $\rho$ and the slope of the regression line $\beta$. Furthermore, by substituting Eq. 8.10a into Eq. 8.6, we obtain
$$s_{Y|x}^2 = \frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar y)^2 - \hat\beta^2\sum_{i=1}^{n}(x_i - \bar x)^2\right] = \frac{n-1}{n-2}\,s_Y^2\,(1 - \hat\rho^2) \qquad (8.11)$$
from which
$$\hat\rho^2 = 1 - \frac{n-2}{n-1}\,\frac{s_{Y|x}^2}{s_Y^2} \qquad (8.12)$$
which we see is equal to $r^2$ of Eq. 8.7 for large n. We can therefore conclude that a higher value of $|\hat\rho|$ means a greater reduction in the conditional variance associated with the linear regression equation. This in turn gives a more accurate prediction of Y based on the regression of Y on X.
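Equations 8.9 and 8.11 can be checked against each other numerically. The sketch below computes $\hat\rho$ from Eq. 8.9 and the conditional variance implied by Eq. 8.11. For any data set, the latter should agree with the residual variance of Eq. 8.6. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample correlation coefficient, Eq. 8.9."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * mean(x) * mean(y)
    return sxy / ((n - 1) * stdev(x) * stdev(y))


def cond_variance_from_rho(rho: float, s_y: float, n: int) -> float:
    """s_{Y|x}^2 implied by Eq. 8.11: (n-1)/(n-2) * s_Y^2 * (1 - rho^2)."""
    return (n - 1) / (n - 2) * s_y ** 2 * (1.0 - rho ** 2)
```

For x = (1, 2, 3, 4) and y = (2.1, 3.9, 6.2, 7.8), for example, both routes give $s_{Y|x}^2 = 0.041$.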
We may recall from Example 3.38 that if two random variables X and Y are jointly normally distributed, the conditional mean and variance of Y given X = x are as follows:
$$E(Y|X = x) = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X) \qquad (8.13)$$
and
$$\mathrm{Var}(Y|X = x) = \sigma_Y^2(1 - \rho^2) \qquad (8.14)$$
in which $\rho$ is the correlation coefficient between the two variates. The results of Eqs. 8.13 and 8.14 clearly show that if two variates are jointly normal, the regression of Y on X is linear with constant conditional variance, i.e., independent of x. The same holds for the regression of X on Y, whose conditional variance is independent of y. Specifically, for the regression of Y on X, the linear equation of Eq. 8.13 is of the form of Eq. 8.1, with a slope of $\beta = \rho\,\sigma_Y/\sigma_X$ and an intercept of $\alpha = \mu_Y - \beta\mu_X$.
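For jointly normal variates, the regression coefficients follow directly from the population parameters. The minimal sketch below (hypothetical function name) returns $\alpha$, $\beta$, and the constant conditional variance of Eq. 8.14.

```python
from typing import Tuple


def normal_regression(mu_x: float, mu_y: float, sigma_x: float, sigma_y: float,
                      rho: float) -> Tuple[float, float, float]:
    """(alpha, beta, Var(Y|X=x)) for jointly normal X and Y, Eqs. 8.13-8.14."""
    beta = rho * sigma_y / sigma_x              # slope of Eq. 8.13
    alpha = mu_y - beta * mu_x                  # intercept
    cond_var = sigma_y ** 2 * (1.0 - rho ** 2)  # Eq. 8.14, independent of x
    return alpha, beta, cond_var
```

Because the regression line passes through $(\mu_X, \mu_Y)$, evaluating $\alpha + \beta\mu_X$ returns $\mu_Y$. This is a quick check on any set of inputs.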
► EXAMPLE 8.2 In Example 6.12 of Chapter 6, we presented the precipitation and corresponding runoff data recorded during the 25 rainstorms on the Monocacy River. In hydrology, it is of interest to be able to predict the runoff of the river on the basis of the precipitation. For this purpose, the regression of the runoff on precipitation is relevant. Denoting Y for runoff and X for precipitation, we tabulate the calculations necessary to evaluate the regression coefficients in Table E8.2.
From the results of Table E8.2, we obtain the sample means and sample variances of X and Y, respectively, as follows:
$$s_{Y|x}^2 = \frac{\Delta^2}{n-2} = \frac{1.735}{25-2} = 0.075$$
and its conditional standard deviation is $s_{Y|x} = \sqrt{0.075} = 0.274$ in. In this case, using Eq. 8.9, we obtain the correlation coefficient as
$$\hat\rho = \frac{1}{24}\,\frac{59.24 - 25 \times 2.16 \times 0.80}{\sqrt{1.533 \times 0.362}} = 0.898$$
Assuming that the runoff at a given precipitation is a normal variate, we may assess the probability of a specified runoff at a given precipitation. For instance, we can find the probability that the runoff of the Monocacy River will exceed 2 in. during a rainstorm with 4 in. of precipitation, as follows.
For a precipitation of X = 4 in., the mean runoff would be
Therefore, the normal distribution of the runoff Y when the precipitation is X = 4 in. is N(1.6, 0.274) in. Hence, the probability that the runoff will exceed 2 in. is
$$P(Y > 2\,|\,X = 4) = 1 - \Phi\left(\frac{2 - 1.6}{0.274}\right) = 1 - 0.928 = 0.072$$
We can also establish the 95% confidence interval as follows. For this purpose, let us select five discrete values of the precipitation at $x_i$ = 1.0, 2.16, 3.0, 4.0, and 5.0 in. Then, according to Eq. 8.8, the respective confidence intervals at these five values of $x_i$ are, individually, with $t_{0.975,23} = 2.069$ from Table A.3:
$$\text{At } x_i = 1.0: \quad \langle\mu_{Y|x_i}\rangle_{0.95} = 0.295 \pm 2.069 \times 0.274\sqrt{\frac{1}{25} + \frac{(1.0 - 2.16)^2}{153.44 - 25 \times 2.16^2}} = (0.138 \to 0.452)\ \text{in.}$$
$$\text{At } x_i = 2.16: \quad \langle\mu_{Y|x_i}\rangle_{0.95} = 0.800 \pm 2.069 \times 0.274\sqrt{\frac{1}{25} + \frac{(2.16 - 2.16)^2}{153.44 - 25 \times 2.16^2}}$$
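The exceedance probability in Example 8.2 can be reproduced with the standard normal CDF written through `math.erf`. The sketch assumes, as the example does, that Y given X = x is normal. Its mean comes from the regression line and its standard deviation $s_{Y|x}$ is constant. The function names are hypothetical.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exceedance_prob(y0: float, mean_y: float, s_y_given_x: float) -> float:
    """P(Y > y0 | X = x) when Y given x is N(mean_y, s_y_given_x)."""
    return 1.0 - std_normal_cdf((y0 - mean_y) / s_y_given_x)


s_y_given_x = math.sqrt(1.735 / (25 - 2))  # Eq. 8.6 with the Example 8.2 sum
```

`exceedance_prob(2.0, 1.6, 0.274)` returns about 0.072, as in the example.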
► EXAMPLE 8.3 Table E8.3 shows a set of data of observed settlements of pile groups (Col. 3) under the respective loads, reported by Viggiani (2001). The same table (Col. 4) also shows the corresponding settlements calculated using a nonlinear model proposed by Viggiani (2001). We may perform the regression of the observed settlement, Y, on the calculated settlement, X; the calculations are summarized in Table E8.3.
From Table E8.3, we obtain the sample means and sample standard deviations of Y and X, respectively, as follows:
$$\bar y = \frac{600.36}{25} = 24.014\ \text{mm} \qquad \bar x = \frac{563.34}{25} = 22.534\ \text{mm}$$
According to Eqs. 8.3 and 8.4, we obtain the corresponding regression coefficients for the regression of Y on X as follows:
$$\hat\beta = \frac{43{,}842.89 - 25 \times 24.014 \times 22.534}{41{,}185.44 - 25 \times 22.534^2} = 1.064 \quad \text{and} \quad \hat\alpha = 24.014 - 1.064 \times 22.534 = 0.038$$
The data given in Columns 3 and 4 of Table E8.3 are plotted in Fig. E8.3.
Therefore, the linear regression equation of Y on X is
$$E(Y|X = x) = 0.038 + 1.064x$$
8.3 Correlation Analysis 317
which shows a very high correlation between the observed and calculated settlements. The conditional standard deviation of Y given X, according to Eq. 8.6a, is
$$s_{Y|x} = 7.784\ \text{mm}$$
This $s_{Y|x}$ is much smaller than the unconditional standard deviation, $s_Y = 37.44$ mm.
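Eqs. 8.3 and 8.4 require only five summary quantities. The following is a minimal sketch that uses the sums quoted above, $\sum x_i y_i = 43{,}842.89$ and $\sum x_i^2 = 41{,}185.44$. Because it does not round the sample means, its intercept may differ from the hand calculation in the third decimal. The name `regression_from_sums` is hypothetical.

```python
def regression_from_sums(n, x_bar, y_bar, sum_xy, sum_x2):
    """Least-squares alpha and beta of E(Y|x) = alpha + beta*x (Eqs. 8.3-8.4 form)."""
    beta = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Example 8.3 summary values from Table E8.3 (means left unrounded)
alpha, beta = regression_from_sums(25, 563.34 / 25, 600.36 / 25, 43_842.89, 41_185.44)
print(f"E(Y|x) = {alpha:.3f} + {beta:.3f} x")
```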
We may also construct the 95% confidence interval of the regression line following Eq. 8.8. For this purpose, we first evaluate the corresponding 95% confidence interval at the following selected discrete values of $x_i$: 25, 50, 100, 150, and 180 mm. From Table A.3, $t_{0.975,23} = 2.069$.
At $x_i = 25$ mm:
$$(\mu_{Y|x})_{0.95} = 26.600 \pm 2.069 \times 7.784\sqrt{\frac{1}{25} + \frac{(25 - 22.534)^2}{41{,}185.4 - 25 \times 22.534^2}} = (23.38 \rightarrow 29.82)\ \text{mm}$$

At $x_i = 50$ mm:
$$(\mu_{Y|x})_{0.95} = 53.238 \pm 2.069 \times 7.784\sqrt{\frac{1}{25} + \frac{(50 - 22.534)^2}{41{,}185.4 - 25 \times 22.534^2}} = (49.08 \rightarrow 57.39)\ \text{mm}$$

At $x_i = 100$ mm:
$$(\mu_{Y|x})_{0.95} = 106.44 \pm 2.069 \times 7.784\sqrt{\frac{1}{25} + \frac{(100 - 22.534)^2}{41{,}185.4 - 25 \times 22.534^2}} = (98.38 \rightarrow 114.50)\ \text{mm}$$

At $x_i = 150$ mm:
$$(\mu_{Y|x})_{0.95} = 159.64 \pm 2.069 \times 7.784\sqrt{\frac{1}{25} + \frac{(150 - 22.534)^2}{41{,}185.4 - 25 \times 22.534^2}} = (147.06 \rightarrow 172.22)\ \text{mm}$$

At $x_i = 180$ mm:
$$(\mu_{Y|x})_{0.95} = 191.56 \pm 2.069 \times 7.784\sqrt{\frac{1}{25} + \frac{(180 - 22.534)^2}{41{,}185.4 - 25 \times 22.534^2}} = (176.19 \rightarrow 206.92)\ \text{mm}$$
By connecting two lines through the respective lower-bound and upper-bound values calculated above, we obtain the 95% confidence interval of the regression line, shown as dashed lines in Fig. E8.3. ◄
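The five intervals above can be generated in a loop. This sketch assumes the regression E(Y|x) = 0.038 + 1.064x and the values $s_{Y|x} = 7.784$ mm and $t_{0.975,23} = 2.069$ used in the text. The center is computed from the rounded coefficients, so the interval at x = 25 mm is centered at 26.638 mm rather than the 26.600 mm shown above. The other four intervals agree with the text to within about 0.01 mm.

```python
import math

# Example 8.3 inputs quoted in the text
N, X_BAR, SUM_X2 = 25, 22.534, 41_185.4
S_COND, T_CRIT = 7.784, 2.069      # s_{Y|x} in mm and t_{0.975,23}
ALPHA, BETA = 0.038, 1.064         # fitted regression coefficients

def band(x):
    """95% confidence interval of E(Y|X=x) at a single x (Eq. 8.8 form)."""
    y_hat = ALPHA + BETA * x
    sxx = SUM_X2 - N * X_BAR ** 2
    half = T_CRIT * S_COND * math.sqrt(1.0 / N + (x - X_BAR) ** 2 / sxx)
    return y_hat - half, y_hat + half

bounds = {x: band(x) for x in (25, 50, 100, 150, 180)}
for x, (lo, hi) in bounds.items():
    print(f"x = {x:>3} mm: ({lo:.2f} -> {hi:.2f}) mm")
```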
where σ is an unknown constant and g(x) is a predetermined function of x. Again, for linear regression of Y on X, we have
$$E(Y \mid X = x) = \alpha + \beta x$$
in which the regression coefficients α and β may be different from those of Eq. 8.1. In this case, it is reasonable to assume that data points in regions of small conditional variance should carry higher "weights" than those in regions of large conditional variance. On this premise, we assign weights inversely proportional to the conditional variance:
$$w_i = \frac{1}{\operatorname{Var}(Y \mid X = x_i)} = \frac{1}{\sigma^2 g^2(x_i)}$$
Then we can show that the total squared error is
$$\Delta^2 = \sum_{i=1}^{n} w_i (y_i - \alpha - \beta x_i)^2$$
from which the least-squares estimates of the regression coefficients α and β become
$$\hat\alpha = \frac{\sum w_i' y_i - \hat\beta \sum w_i' x_i}{\sum w_i'} \qquad (8.16)$$
and
$$\hat\beta = \frac{\sum w_i' \sum w_i' x_i y_i - \sum w_i' x_i \sum w_i' y_i}{\sum w_i' \sum w_i' x_i^2 - \left(\sum w_i' x_i\right)^2} \qquad (8.17)$$
in which
$$w_i' = \sigma^2 w_i = \frac{1}{g^2(x_i)}$$
An unbiased estimate of σ² from a sample of size n is then
$$\hat\sigma^2 = s^2 = \frac{1}{n-2}\sum_{i=1}^{n} w_i' (y_i - y_i')^2$$
in which $y_i' = E(Y \mid X = x_i)$ is given by the linear regression equation with the regression coefficients estimated with Eqs. 8.16 and 8.17. The conditional standard deviation of Y is then
$$s_{Y|x} = g(x)\sqrt{\frac{\sum w_i' (y_i - y_i')^2}{n-2}}$$
► EXAMPLE 8.4 Data of the observed maximum settlements and the maximum differential settlements for 18 storage
tanks in Libya are plotted as shown in Fig. E8.4. From this figure, we can observe that the scatter of
the differential settlement appears to increase linearly with the maximum settlement. Accordingly, we
may assume that the conditional standard deviation of the maximum differential settlement Y increases
linearly with the maximum settlement X; that is, $g(x) = x$, $\operatorname{Var}(Y \mid X = x) = \sigma^2 x^2$, and $w_i' = 1/x_i^2$.
The details of the calculations are tabulated in Table E8.4, from which we obtain the sample means and sample standard deviations of X and Y, respectively, as follows:
$$\bar{x} = \frac{29.7}{18} = 1.65\ \text{cm} \quad\text{and}\quad \bar{y} = \frac{19.9}{18} = 1.11\ \text{cm}$$
Also, with Eqs. 8.16 and 8.17, we obtain the regression coefficients
$$\hat\beta = \frac{22.38 \times 12.42 - 15.84 \times 11.31}{22.38 \times 18 - 15.84^2} = 0.65 \quad\text{and}\quad \hat\alpha = \frac{11.31 - 0.65 \times 15.84}{22.38} = 0.045$$
Therefore, the linear regression of Y on X is $E(Y \mid x) = 0.045 + 0.65x$.
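The weighted least-squares estimates of Eqs. 8.16 and 8.17 are straightforward to program. The following is a minimal sketch with the Table E8.4 data and g(x) = x, so that $w_i' = 1/x_i^2$. The hand calculation rounds the weights to two decimals and this sketch does not, so the coefficients can differ slightly in the second or third decimal. The residual term follows the estimate of σ² given earlier. `weighted_fit` is a hypothetical helper name.

```python
import math

# Table E8.4: maximum settlement x_i and maximum differential settlement y_i (cm)
x = [0.3, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 1.4, 1.5, 1.6, 1.6, 2.0, 2.4, 2.6, 2.9, 2.9, 3.7, 1.5]
y = [0.2, 0.7, 0.5, 1.1, 0.3, 0.6, 0.6, 1.0, 1.0, 1.0, 1.3, 1.5, 1.3, 2.3, 1.9, 2.3, 1.7, 0.6]

def weighted_fit(x, y, g):
    """Weighted least squares with w_i' = 1/g(x_i)**2 (Eqs. 8.16-8.17)."""
    w = [1.0 / g(xi) ** 2 for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swx2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    beta = (sw * swxy - swx * swy) / (sw * swx2 - swx ** 2)
    alpha = (swy - beta * swx) / sw
    # estimate of sigma from the weighted residuals, divided by n - 2
    resid = sum(wi * (yi - alpha - beta * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    s = math.sqrt(resid / (len(x) - 2))
    return alpha, beta, s

alpha, beta, s = weighted_fit(x, y, g=lambda v: v)
print(f"E(Y|x) = {alpha:.3f} + {beta:.3f} x,  s_Y|x = {s:.3f} * x")
```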
Figure E8.4 Settlements of tanks in Libya. (Data from Lambe and Whitman, 1969.)
With the above intervals at the selected discrete values of $x_i$, we can construct the 90% confidence interval for the regression equation of the maximum differential settlement on the maximum settlement, as displayed in Fig. E8.4.
Table E8.4
(x_i = maximum settlement; y_i = maximum differential settlement)

Tank i   x_i (cm)   y_i (cm)   w_i     w_i x_i   w_i y_i   w_i x_i y_i   w_i x_i^2   w_i (y_i - y'_i)^2
1        0.3        0.2        11.11   3.33      2.22      0.67          1.0         0.0178
2        0.7        0.7        2.04    1.43      1.43      1.00          1.0         0.0816
3        0.8        0.5        1.56    1.25      0.78      0.62          1.0         0.0066
4        0.8        1.1        1.56    1.25      1.72      1.37          1.0         0.4465
5        0.9        0.3        1.23    1.11      0.37      0.33          1.0         0.1339
6        1.0        0.6        1.00    1.00      0.60      0.60          1.0         0.0090
7        1.1        0.6        0.83    0.91      0.50      0.55          1.0         0.0212
8        1.4        1.0        0.51    0.71      0.51      0.71          1.0         0.0010
9        1.5        1.0        0.44    0.67      0.44      0.66          1.0         0.0002
10       1.6        1.0        0.39    0.63      0.39      0.62          1.0         0.0028
11       1.6        1.3        0.39    0.63      0.51      0.81          1.0         0.0180
12       2.0        1.5        0.25    0.50      0.38      0.75          1.0         0.0060
13       2.4        1.3        0.17    0.42      0.22      0.53          1.0         0.0158
14       2.6        2.3        0.15    0.38      0.35      0.90          1.0         0.0479
15       2.9        1.9        0.12    0.34      0.23      0.66          1.0         0.0001
16       2.9        2.3        0.12    0.34      0.28      0.80          1.0         0.0164
17       3.7        1.7        0.07    0.27      0.12      0.44          1.0         0.0394
18       1.5        0.6        0.44    0.67      0.26      0.40          1.0         0.0776
to be linear, the resulting analysis is called multiple linear regression. The basic analysis of
multiple linear regression is simply a generalization of the regression analysis developed in
Sect. 8.2 for two variables.
Suppose the dependent variable of interest is Y and that it is a linear function of k random variables $X_1, X_2, \ldots, X_k$. It follows that the mean value of Y at $X_1 = x_{i1}, X_2 = x_{i2}, \ldots, X_k = x_{ik}$; $i = 1, 2, \ldots, n$, is
$$y_i' = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} \qquad (8.20)$$
for each i, where $\beta_0, \beta_1, \ldots, \beta_k$ are constant regression coefficients that must be estimated from the observed data $(x_{i1}, x_{i2}, \ldots, x_{ik})$. The conditional variance of Y for given values $x_{i1}, x_{i2}, \ldots, x_{ik}$ is assumed either to be constant for any i, i.e.,
$$\operatorname{Var}(Y \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = \sigma^2 \qquad (8.21)$$
or to be a known function of $(x_{i1}, x_{i2}, \ldots, x_{ik})$, i.e.,
$$\operatorname{Var}(Y \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = \sigma^2 g^2(x_{i1}, x_{i2}, \ldots, x_{ik}) \qquad (8.22)$$
Regression analysis is then performed to estimate $\beta_0, \beta_1, \ldots, \beta_k$ and $\sigma^2$ from a set of observed data of size n, namely, $(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)$, $i = 1, 2, \ldots, n$.
Equation 8.20 may be written conveniently in matrix form as follows:
$$y' = X\beta \qquad (8.20a)$$
where y' is the vector $y' = \{y_1', y_2', \ldots, y_n'\}$, in which each $y_i'$ is given by Eq. 8.20; β is the vector of regression coefficients, $\beta = \{\beta_0, \beta_1, \ldots, \beta_k\}$; and X is an n by k + 1 matrix (i.e., n rows and k + 1 columns):
$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$$
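Let us now concentrate on the case in which the conditional variance is constant. The total squared error for a set of n data points is then
$$\Delta^2 = \sum_{i=1}^{n} (y_i - y_i')^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2 \qquad (8.23)$$
On the basis of least squares, we minimize Δ² to obtain the following set of linear equations for the estimates of $\beta_j$, $j = 0, 1, 2, \ldots, k$:
$$\frac{\partial \Delta^2}{\partial \beta_0} = -2\sum_{i=1}^{n} \left[y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\right] = 0$$
$$\frac{\partial \Delta^2}{\partial \beta_1} = -2\sum_{i=1}^{n} x_{i1}\left[y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\right] = 0 \qquad (8.24)$$
$$\vdots$$
$$\frac{\partial \Delta^2}{\partial \beta_k} = -2\sum_{i=1}^{n} x_{ik}\left[y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\right] = 0$$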
In matrix form, Eq. 8.24 becomes $X^T X \hat\beta = X^T y$ (8.24a). Premultiplying both sides of Eq. 8.24a by the inverse of the matrix $X^T X$, we obtain the least-squares estimates of the regression coefficients as
$$\hat\beta = (X^T X)^{-1} X^T y \qquad (8.25)$$
in which $y_i'$ is given by Eq. 8.26a. The reduction in the original variance of Y may also be evaluated with r of Eq. 8.7 by substituting $s_{Y|x_1, x_2, \ldots, x_k}$ for $s_{Y|x}$.
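Equations 8.24a and 8.25 translate directly into a short routine. The sketch below is written in plain Python so that it is self-contained. Instead of explicitly inverting $X^T X$, it solves the normal equations by Gaussian elimination, which gives the same estimates in exact arithmetic. The data are hypothetical and noise-free, so the known coefficients should be recovered. The matrix entries printed in Example 8.5 are rounded to three figures and are probably too coarse to reproduce that example's coefficients this way. Both function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares_coefficients(rows, y):
    """Estimate beta_0..beta_k from the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + [float(v) for v in r] for r in rows]   # column of ones multiplies beta_0
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2
rows = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (3, 2)]
y = [1 + 2 * a - 3 * b for a, b in rows]
beta = least_squares_coefficients(rows, y)
print([round(b, 6) for b in beta])
```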
► EXAMPLE 8.5 An important factor in determining the frost depth for highway pavement design is the mean annual
temperature for the site under consideration. The mean annual temperatures that have been recorded
at 10 different weather stations in West Virginia are summarized in Table E8.5a.
At locations in the state where temperature records are not available, the mean annual temperature
at a construction site in West Virginia may be predicted on the basis of its elevation and latitude based
on the information recorded in Table E8.5a.
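For this purpose, the following multiple linear equation may be assumed:
$$E(Y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
in which $x_1$ = elevation (ft) and $x_2$ = latitude (degrees).
Table E8.5a Mean Annual Temperatures in West Virginia. (Data from Moulton and Schaub, 1969.)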
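To evaluate the regression coefficients, we form the following matrices from the data of Table E8.5a:
$$X^T X = \begin{bmatrix} 10 & 15{,}350 & 388 \\ 15{,}350 & 3.36\times10^{7} & 5.92\times10^{5} \\ 388 & 5.92\times10^{5} & 15{,}061 \end{bmatrix} \quad\text{and}\quad X^T y = \begin{Bmatrix} 520.7 \\ 772{,}104 \\ 20{,}208 \end{Bmatrix}$$
Hence,
$$\hat\beta = (X^T X)^{-1}(X^T y) = \begin{Bmatrix} 121.05 \\ -0.0034 \\ -1.644 \end{Bmatrix}$$
and the regression coefficients are $\hat\beta_0 = 121.05$, $\hat\beta_1 = -0.0034$, and $\hat\beta_2 = -1.644$. Thus, the multiple linear regression of Y on $X_1$ and $X_2$ is
$$E(Y \mid x_1, x_2) = 121.05 - 0.0034x_1 - 1.644x_2$$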
With the above regression equation, we summarize the results of the calculations in Table E8.5b.
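From these results, we also obtain the following sample means:
$$\bar{x}_1 = \frac{15{,}350}{10} = 1535\ \text{ft}; \quad \bar{x}_2 = \frac{388.01}{10} = 38.80^\circ; \quad \bar{y} = \frac{520.7}{10} = 52.07^\circ\text{F}$$
The conditional standard deviation of Y given $x_1$ and $x_2$ is
$$s_{Y|x_1,x_2} = \sqrt{\frac{3.80}{10 - 2 - 1}} = 0.74^\circ\text{F}$$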
8.6 Nonlinear Regression 325
For Gary, West Virginia, which is located at an elevation of 1426 ft and a latitude of 37.37° N, the expected mean annual temperature would be
$$E(Y \mid 1426, 37.37) = 121.05 - 0.0034 \times 1426 - 1.644 \times 37.37 \approx 54.8^\circ\text{F}$$
from which we obtain the 10th percentile, $y_{0.10} = 54.80 + 0.74\,\Phi^{-1}(0.10) = 54.80 - 0.74 \times 1.28 = 53.9^\circ$F. ◄
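The prediction for Gary and its 10th percentile can be reproduced as follows. Like the text, this sketch assumes that Y is normal about the regression value with $s_{Y|x_1,x_2} = 0.74$°F. With the rounded coefficients, the mean comes out at about 54.77°F, marginally below the 54.80°F quoted. The function names are hypothetical.

```python
from statistics import NormalDist

# Example 8.5 coefficients (x1 = elevation in ft, x2 = latitude in degrees)
B0, B1, B2 = 121.05, -0.0034, -1.644
S_COND = 0.74  # conditional standard deviation, deg F

def mean_temperature(elev_ft, lat_deg):
    """Regression estimate E(Y | x1, x2)."""
    return B0 + B1 * elev_ft + B2 * lat_deg

def percentile(p, elev_ft, lat_deg):
    """p-fractile of Y, assuming Y is normal about the regression value."""
    return mean_temperature(elev_ft, lat_deg) + S_COND * NormalDist().inv_cdf(p)

mean_gary = mean_temperature(1426, 37.37)
y10_gary = percentile(0.10, 1426, 37.37)
print(f"mean = {mean_gary:.2f} F, 10th percentile = {y10_gary:.2f} F")
```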
Multiple Correlations
In multiple regression, the dependent variable Y is a function of two or more independent (control) variables. Y may therefore be correlated with each of the independent variables, e.g., between Y and $X_j$, with corresponding correlation coefficient $\rho_{Y,X_j}$. Moreover, each pair of independent variables may also be mutually correlated, with a correlation coefficient $\rho_{X_i,X_j}$ between $X_i$ and $X_j$. From a set of observed data of sample size n, the estimates of the respective correlation coefficients between Y and $X_j$, following Eq. 8.9, would be
$$\hat\rho_{Y,X_j} = \frac{1}{n-1}\,\frac{\sum_{i=1}^{n} x_{ij} y_i - n\bar{x}_j\bar{y}}{s_{X_j}\, s_Y}$$
with certain undetermined coefficients that must be evaluated on the basis of the observed
data. The simplest type of nonlinear functions for the regression of Y on X is
► EXAMPLE 8.6 Data of the average all-day parking cost (in the 1960s) in the central business districts of 15 U.S. cities
were collected as listed in Table E8.6, and the data are plotted against the city population. The data points in the scattergram of Fig. E8.6a clearly show a nonlinear trend. However, if we use $x' = \ln x$ in place of x, the scattergram becomes the one shown in Fig. E8.6b, which shows a reasonably linear trend. On this basis, we may model the average all-day parking cost as a linear function of ln x, that is, with the following nonlinear regression equation:
where
From the calculations summarized in Table E8.6, we also obtain the following sample means:
$$\bar{x}' = \frac{94.97}{15} = 6.33 \quad\text{and}\quad \bar{y} = \frac{11.57}{15} = 0.771$$
Then, with Eqs. 8.3 and 8.4, we obtain the regression coefficients,
which is displayed in Fig. E8.6b, whereas the corresponding nonlinear regression of Y on X is shown
in Fig. E8.6a.
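A regression on ln x is ordinary linear regression on the transformed variable $x' = \ln x$. The data of Table E8.6 are not reproduced here, so this sketch uses hypothetical points lying exactly on y = 0.5 + 0.2 ln x to show that the fit recovers the coefficients. `fit_log_x` is a hypothetical name. The centered sums used below are algebraically equivalent to the forms of Eqs. 8.3 and 8.4.

```python
import math

def fit_log_x(x, y):
    """Least-squares fit of E(Y|x) = alpha + beta*ln(x) via the variable x' = ln x."""
    xp = [math.log(v) for v in x]
    n = len(xp)
    xb, yb = sum(xp) / n, sum(y) / n
    sxy = sum((a - xb) * (b - yb) for a, b in zip(xp, y))
    sxx = sum((a - xb) ** 2 for a in xp)
    beta = sxy / sxx
    return yb - beta * xb, beta

# Hypothetical check data lying exactly on y = 0.5 + 0.2 ln x
xs = [10.0, 100.0, 1000.0, 5000.0]
ys = [0.5 + 0.2 * math.log(v) for v in xs]
alpha, beta = fit_log_x(xs, ys)
print(f"E(Y|x) = {alpha:.3f} + {beta:.3f} ln x")
```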
The conditional standard deviation of Y given $x' = \ln x$ is
Figure E8.6a Data points of parking cost vs. population (in arithmetic scale) and regression equation of Y on X. (Data from Wynn, 1969.)
Figure E8.6b Data points of parking cost vs. population (in semilog scale) and regression of Y on ln x. (Data from Wynn, 1969.)
EXAMPLE 8.7 To predict the average dissolved oxygen (DO) concentration in a pool from the average pool temperature, T, measured data for DO (in mg/l) and the corresponding pool temperature T (in °C) were obtained as summarized in Table E8.7. In this case, an exponential model may be an appropriate functional relation between DO and the temperature T; that is,
$$DO = \alpha e^{\beta T}$$
By taking the natural logarithm of both sides of the above equation, we have
$$\ln DO = \ln\alpha + \beta T$$
From the calculations summarized in Table E8.7, we obtain the following sample means:
$$\bar{T} = \frac{279.7}{11} = 25.43 \quad\text{and}\quad \overline{\ln DO} = \frac{11.68}{11} = 1.06$$
and the respective sample standard deviations, $s_T = 1.883$ and $s_{\ln DO} = 0.281$.
which is shown in the semilogarithmic graph of Fig. E8.7, superimposed on the data points. The
equivalent nonlinear regression of DO on T would be
The estimated correlation coefficient associated with the above linear regression equation of ln DO on T is, according to Eq. 8.9,
$$\hat\rho = \frac{1}{10}\left(\frac{292.62 - 11 \times 25.43 \times 1.06}{1.883 \times 0.281}\right) = -0.74$$
and the conditional standard deviation, by Eq. 8.10, is
$$s_{\ln DO|T} = 0.159$$
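The regression equation of ln DO on T did not survive in the text above. However, the slope, the intercept and the correlation coefficient can be recomputed from the quoted summary values. This sketch assumes the same forms of Eqs. 8.3, 8.4 and 8.9 used elsewhere in this chapter. Because the means are rounded, the results are approximate. They reproduce the central values 1.275 and 1.110 used below to within about 0.003.

```python
import math

# Summary values quoted in Example 8.7 (n = 11)
n = 11
T_bar, lnDO_bar = 25.43, 1.06
sum_T2, sum_T_lnDO = 7149.0, 292.62
s_T, s_lnDO = 1.883, 0.281

# Slope and intercept of E(ln DO | T) = ln(alpha) + beta*T (Eqs. 8.3-8.4 form)
beta = (sum_T_lnDO - n * T_bar * lnDO_bar) / (sum_T2 - n * T_bar ** 2)
ln_alpha = lnDO_bar - beta * T_bar
alpha = math.exp(ln_alpha)          # back-transform: DO = alpha * exp(beta * T)

# Correlation coefficient in the form of Eq. 8.9
rho = (sum_T_lnDO - n * T_bar * lnDO_bar) / ((n - 1) * s_T * s_lnDO)

print(f"E(ln DO|T) = {ln_alpha:.3f} + ({beta:.4f}) T  ->  DO = {alpha:.1f} exp({beta:.4f} T)")
print(f"rho = {rho:.2f}")
print(f"E(ln DO|T=23.5) = {ln_alpha + beta * 23.5:.3f}")
```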
We may also construct the confidence interval of the linear regression of ln DO on T at the following discrete values of the temperature: T = 23.5, 25.0, 26.5, and 28.0°C, with $t_{0.975,9} = 2.262$.

At T = 23.5°C:
$$(\mu_{\ln DO|T})_{0.95} = 1.275 \pm 2.262 \times 0.159\sqrt{\frac{1}{11} + \frac{(23.5 - 25.43)^2}{7149 - 11 \times 25.43^2}} = (1.12 \rightarrow 1.43)$$

At T = 25.0°C:
$$(\mu_{\ln DO|T})_{0.95} = 1.110 \pm 2.262 \times 0.159\sqrt{\frac{1}{11} + \frac{(25.0 - 25.43)^2}{7149 - 11 \times 25.43^2}} = (1.00 \rightarrow 1.22)$$

At T = 26.5°C:
$$(\mu_{\ln DO|T})_{0.95} = 0.945 \pm 2.262 \times 0.159\sqrt{\frac{1}{11} + \frac{(26.5 - 25.43)^2}{7149 - 11 \times 25.43^2}}$$
Figure E8.7 Dissolved oxygen (DO) and temperature (T) relationship. (Data from Butts,
Schnepper, and Evans, 1970.)
► EXAMPLE 8.8 In mountainous regions, the hazard of glacier lake outburst is of major concern. The maximum
discharge and runout distance of an outburst is a function of the volume of a glacier lake. Through
remote sensing data, the area of a lake, A, can be determined and from empirical data on mean depths,
D, of glacier lakes, the required volume of a lake may be estimated. Available data for the Swiss Alps
are tabulated in Table E8.8 (after Huggel et al., 2002).
Table E8.8
(log A is computed with A in m²; log D' is the value given by the fitted regression line)

Lake               A (km²)   D (m)   log A    log D    log A log D   (log A)²   (log D)²   log D'   (log D - log D')²
Ice Cave Lake      0.0035    2.9     3.5441   0.4624   1.6388        12.56      0.2138     0.5428   0.0065
Gruben Lake 5      0.01      5       4.0000   0.699    2.796         16.00      0.4886     0.732    0.001
Crusoe-Baby Lake   0.017     4.7     4.2304   0.6721   2.8432        17.8962    0.4517     0.8276   0.0242
Gruben Lake 3      0.021     7.1     4.3222   0.8512   3.6791        18.6814    0.7245     0.8657   0.0002
Gruben Lake 1      0.023     10.4    4.3617   1.017    4.4358        19.0244    1.0343     0.8821   0.0182
MT' Lake           0.0416    12      4.6191   1.0792   4.9849        21.336     1.1646     0.9889   0.0082
Lac d'Arsine       0.059     13.6    4.7708   1.1335   5.4077        22.7605    1.2848     1.0519   0.0066
Nostetuko Lake     0.2622    28.6    5.4186   1.4564   7.8916        29.3612    2.1211     1.3207   0.0184
Between Lake       0.4       18.8    5.6021   1.2742   7.1382        31.3835    1.6236     1.3968   0.015
Abmachimai Lake    0.565     34.3    5.752    1.5353   8.831         33.0855    2.3571     1.4591   0.0058
Gjanupsvatn        0.6       33.3    5.7782   1.5224   8.7967        33.3876    2.3177     1.4699   0.0028
Quongzonk Co       0.753     27.9    5.8768   1.4456   8.4955        34.5368    2.0898     1.5109   0.0043
Laguna Paron       1.6       46.9    6.2041   1.6712   10.3683       38.4908    2.7929     1.6467   0.0006
Summit Lake        5         50      6.699    1.699    11.3816       44.8766    2.8866     1.8521   0.0234
Phontom Lake       6         83.3    6.7781   1.9206   13.018        45.9426    3.6887     1.8849   0.0013
$$\log D = \alpha + \beta \log A$$
which we can observe is linear in log-log space. The details of the calculations are summarized in
Table E8.8.
From Table E8.8, we obtain the sample means of lake areas and depths as
$$\bar{A} = \frac{15.355}{15} = 1.024\ \text{km}^2 = 1.024 \times 10^6\ \text{m}^2 \quad\text{and}\quad \bar{D} = \frac{378.8}{15} = 25.25\ \text{m}$$
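and the respective sample standard deviations of log A and log D are
$$s_{\log A} = \sqrt{\frac{1}{14}\left[419.323 - 15\left(\frac{77.957}{15}\right)^2\right]} = 1.006$$
$$s_{\log D} = \sqrt{\frac{1}{14}\left[25.240 - 15\left(\frac{18.439}{15}\right)^2\right]} = 0.429$$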
In terms of the original variables, the mean depth, D, and area, A, the corresponding nonlinear relation is
$$D = 0.118A^{0.415}$$
The graph of the data points and the above linear regression equation of log D on log A are shown in
Fig. E8.8.
The correlation coefficient associated with the regression of log D on log A is, according to Eq. 8.9,
$$\hat\rho = \frac{1}{14}\left(\frac{101.71 - 15\left(\frac{77.957}{15}\right)\left(\frac{18.439}{15}\right)}{1.006 \times 0.429}\right) = 0.97$$
and the corresponding conditional standard deviation of log D for given log A is
$$s_{\log D|\log A} = \sqrt{\frac{\sum_{i=1}^{15} (\log D_i - \log D_i')^2}{15 - 2}} = 0.102$$
We may also construct the 95% confidence bounds for the regression of log D on log A at the following
selected values of A:
Figure E8.8 Relationship between log D and log A for Glacier Lakes. (After Huggel et al., 2002.)
At $A = 5 \times 10^4$ m² (log A = 4.699):
$$(\mu_{\log D})_{0.95} = 1.022 \pm 2.160 \times 0.102\sqrt{\frac{1}{15} + \frac{(4.699 - 5.197)^2}{419.323 - 15 \times 5.197^2}} = (0.958 \rightarrow 1.086)$$
or $(\mu_D)_{0.95} = (9.08 \rightarrow 12.19)$ m.

At $A = 5 \times 10^5$ m² (log A = 5.699):
$$(\mu_{\log D})_{0.95} = 1.437 \pm 2.160 \times 0.102\sqrt{\frac{1}{15} + \frac{(5.699 - 5.197)^2}{419.323 - 15 \times 5.197^2}} = (1.373 \rightarrow 1.501)$$
or $(\mu_D)_{0.95} = (23.60 \rightarrow 31.70)$ m.

At $A = 5 \times 10^6$ m² (log A = 6.699):
$$(\mu_{\log D})_{0.95} = 1.852 \pm 2.160 \times 0.102\sqrt{\frac{1}{15} + \frac{(6.699 - 5.197)^2}{419.323 - 15 \times 5.197^2}} = (1.747 \rightarrow 1.957)$$

These bounds can be used to establish the 95% confidence interval of the regression line, shown as dashed lines in Fig. E8.8. ◄
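The log-space bounds of Example 8.8 and their back-transformation to depths can be sketched as follows. The sketch assumes the fitted line log D = log 0.118 + 0.415 log A (base-10 logarithms, A in m²) and the values $s_{\log D|\log A} = 0.102$ and $t_{0.975,13} = 2.160$ quoted above. As in the text, the depth bounds are obtained by raising 10 to the log-space bounds. `depth_band` is a hypothetical name.

```python
import math

# Example 8.8 values quoted in the text (base-10 logs, A in m^2)
N, LOGA_BAR, SUM_LOGA2 = 15, 5.197, 419.323
S_COND, T_CRIT = 0.102, 2.160          # s_{logD|logA} and t_{0.975,13}
A0, B = math.log10(0.118), 0.415       # log D = log 0.118 + 0.415 log A

def depth_band(area_m2):
    """95% bounds on E(log D | log A) and the corresponding depths in metres."""
    la = math.log10(area_m2)
    mean = A0 + B * la
    sxx = SUM_LOGA2 - N * LOGA_BAR ** 2
    half = T_CRIT * S_COND * math.sqrt(1.0 / N + (la - LOGA_BAR) ** 2 / sxx)
    lo, hi = mean - half, mean + half
    return (lo, hi), (10 ** lo, 10 ** hi)

log_bounds, depth_bounds = depth_band(5e4)
print(log_bounds, depth_bounds)
```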
8.7 Applications of Regression Analysis in Engineering 333
The form of the nonlinear functions assumed in Eq. 8.28 can be generalized as follows:
We now observe that by converting each of the polynomial terms in Eq. 8.30 into the respective transformed variables, $Z_j = g_j(x)$, Eq. 8.30 becomes
Figure 8.4 S-N relation for fatigue of mild steel. (Data courtesy of W. H. Munse.)
in which the constants a and b can be evaluated as estimates of the regression coefficients
based on the set of measured data. The above regression equation, therefore, yields the
so-called S-N relation of the form
$$N S^b = a$$
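Taking logarithms turns a relation of the form $N S^b = a$ into the linear form log N = log a − b log S. The constants a and b can therefore be estimated by regressing log N on log S. The regression equation itself is not reproduced in the text above, so this is a sketch under that assumption. It is checked with hypothetical noise-free data generated with $a = 10^{12}$ and b = 3. `fit_sn` is a hypothetical name.

```python
import math

def fit_sn(stress, cycles):
    """Estimate a and b in N * S**b = a from log N = log a - b log S (base-10 logs)."""
    u = [math.log10(s) for s in stress]
    v = [math.log10(n) for n in cycles]
    k = len(u)
    u_bar, v_bar = sum(u) / k, sum(v) / k
    slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
             / sum((ui - u_bar) ** 2 for ui in u))
    intercept = v_bar - slope * u_bar
    return 10 ** intercept, -slope        # a = 10**intercept, b = -slope

# Hypothetical noise-free data generated with a = 1e12 and b = 3
S = [100.0, 150.0, 200.0, 300.0]
N = [1e12 / s ** 3 for s in S]
a, b = fit_sn(S, N)
print(f"a = {a:.4g}, b = {b:.4f}")
```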
There are also situations in which the mathematical form of a required relationship among
the principal variables may be postulated from heuristic considerations; regression analysis
may then be applied to assess the validity (statistically) of the mathematical equation or to
evaluate the values of the parameters on the basis of the observed data. For example, Smeed (1968) postulated that the peak flow of traffic [in passenger car units (pcu)] into the center of a city is
$$Q = a f A^{1/2}$$
where f = the fraction of the city center that is occupied by roadways; A = the area of the city center in ft²; and a = a constant that depends on the speed of the traffic and the efficiency of the road system. This equation is based on the hypothesis that the volume of traffic (in pcu) that can enter the central area is proportional to the circumference of the central area. Data from 35 cities, including 20 from the UK, are plotted on a log-log graph in Fig. 8.5.
The least-squares linear regression of log(Q/f) on log A yields a slope of 0.53 for the regression line. Also, from the regression line of Fig. 8.5, the constant a can be evaluated as the value of (Q/f) at A = 1.
There are also situations in engineering in which it is difficult to measure a quantity (or
variable) of interest directly, but may be obtained indirectly through its relationship with
another variable. For example, in determining the maximum stress at the extreme fiber of
a steel beam, it is difficult to measure the stress directly; however, through the stress-strain
relationship of the beam material, the maximum stress can be determined by measuring
the corresponding strain at the extreme fiber of the beam. Also, some engineering variables
Figure 8.6 Compression index vs. void ratio of soil. (After Nishida, 1956.)
can be measured more readily and economically than others; for example, the initial void
ratios of clay samples can be measured inexpensively in a laboratory, whereas the direct
measurement of the compression index of the same soil would be much more costly and
would require considerable effort and time. In such a case, if an empirical relation is
established between the void ratio and the compression index of soils, such as the relation
illustrated in Fig. 8.6, we can simply measure the void ratio and predict the compression
index of a soil by applying the appropriate regression equation.
Another example in this regard is the determination of the fully cured 28-day concrete
strength; obviously, this will require testing the concrete specimens after 28 days. At the
rate that construction progresses nowadays, 28 days would be too long; methods of early
determination of concrete strength are highly desirable for quality assurance and have been
suggested, such as using an accelerated strength obtained through an accelerated curing
process. A linear regression has been developed, e.g., by Malhotra and Zoldners (1969), as
shown in Fig. 8.7, between the 28-day strength and the accelerated strength of concrete,
based on data from nine construction jobs across Canada.
In traffic engineering, Heathington and Tutt (1971) have developed linear relationships
between short-interval observed traffic volume and long-term traffic volume in six cities in
Texas; some of these results are shown in Fig. 8.8. Obviously, there is considerable benefit
in using short-term observations for predicting long-term traffic conditions.
Multiple linear regression also finds many applications in engineering. For example, Martin et al. (1963) used multiple linear regression to determine the expected number of trips generated, T, per dwelling unit in a community as a function of automobile ownership $X_1$, population density $X_2$, distance from the central business district $X_3$, and family income $X_4$, as follows:
Figure 8.7 Relation between 28-day strength and accelerated strength of concrete. (After Malhotra
and Zoldners, 1969.)
Figure 8.8 panels: (a) Five-minute peak volume in inbound direction; (b) One-hour peak volume in inbound direction (×10²); (c) Five-minute volume in both directions during inbound peak five minutes; (d) Hourly volume in both directions during inbound peak hour (×10³).
Table 8.1

Independent Variables   Regression Equation                                     s_{Y|x1,...,xk}   r (Eq. 8.7)
X1, X2, X3, X4          y' = 4.33 + 3.89x1 - 0.005x2 - 0.128x3 - 0.012x4         0.87              0.837
X1, X2                  y' = 3.80 + 3.79x1 - 0.003x2                             0.87              0.835
X2, X4                  y' = 5.49 - 0.0089x2 + 0.227x4                           1.02              0.764
X1                      y' = 2.88 + 4.60x1                                       0.89              0.827
X2                      y' = 7.22 - 0.013x2                                      1.10              0.718
X4                      y' = 3.07 + 0.44x4                                       1.20              0.655
X3                      y' = 3.55 + 0.74x3                                       1.30              0.575
Alternative multiple linear regressions, using fewer independent variables, were also
performed for this problem. The respective results of these different regression analyses are
summarized in Table 8.1. We can observe from the values of r (see Eq. 8.7) in Table 8.1
that the reduction in the unconditional variance of Y generally increases with the number
of independent variables included in the regression analysis.
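Table 8.1 can be checked for internal consistency. If Eq. 8.7 has the form $r^2 = 1 - s_{Y|x}^2 / s_Y^2$ (an assumption here), each row implies a value of the unconditional standard deviation $s_Y$. Every regression in the table is of the same Y, so these values should nearly coincide. The sketch below computes them.

```python
import math

# (independent variables, s_{Y|x...}, r) as listed in Table 8.1
table_8_1 = [
    ("X1,X2,X3,X4", 0.87, 0.837),
    ("X1,X2", 0.87, 0.835),
    ("X2,X4", 1.02, 0.764),
    ("X1", 0.89, 0.827),
    ("X2", 1.10, 0.718),
    ("X4", 1.20, 0.655),
    ("X3", 1.30, 0.575),
]

# Assuming r^2 = 1 - s_{Y|x}^2 / s_Y^2, each row implies s_Y = s_{Y|x} / sqrt(1 - r^2)
implied_sy = {name: s / math.sqrt(1.0 - r * r) for name, s, r in table_8_1}
for name, sy in implied_sy.items():
    print(f"{name:>12}: implied s_Y = {sy:.2f}")
```

All seven rows imply an $s_Y$ of roughly 1.58 to 1.59. This supports reading the third column as the conditional standard deviation and shows why a larger r accompanies a smaller $s_{Y|x}$.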
Nonlinear regression is also widely used in engineering. Besides the applications illustrated earlier in Examples 8.6 through 8.8, Fig. 8.9 shows an application involving the logarithmic transformation, in which data points of the average stress per cycle of repeated loading are plotted against the logarithm of the number of cycles to failure of concrete beams. Superimposed
on the data points is the linear regression for determining the S-N relation for predicting
the expected fatigue life of concrete beams. In this case, because of the large variability of
concrete strengths, a wide scatter in the data points can be observed.
The application of a double logarithmic transformation is illustrated in Fig. 8.10, where
data points of the logarithm of river flows and the logarithm of distance downstream are
plotted, yielding a linear regression equation for the transformed variables.
Another application of the double logarithmic transformation is shown in Fig. 8.11 in
which data points of the maximum sustained wind speed and the radial distance from the
center of a hurricane are plotted, with the corresponding linear regression line superimposed.
Figure 8.10 River flow vs. distance downstream (axis: relative distance, x/x0). (After Shull and Gloyna, 1969.)
Polynomial functions are also often used in nonlinear regression. For example, a third-
degree polynomial function is fitted, through regression analysis, to the data points of vehicle
speeds and corresponding traffic densities in Fig. 8.12.
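Polynomial regression is multiple linear regression on the transformed variables $z_j = x^j$, as noted for Eq. 8.30. The following is a self-contained sketch for a third-degree polynomial. The data are hypothetical points lying exactly on y = 2 + 1.5x − 0.4x² + 0.02x³, not the speed-density data of Fig. 8.12. Both function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def polynomial_fit(x, y, degree):
    """Least-squares coefficients c_0..c_degree of y = sum_j c_j * x**j."""
    z = [[xi ** j for j in range(degree + 1)] for xi in x]   # transformed variables z_j = x**j
    p = degree + 1
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(p)] for i in range(p)]
    zty = [sum(row[i] * yi for row, yi in zip(z, y)) for i in range(p)]
    return solve_linear(ztz, zty)

# Hypothetical data lying on y = 2 + 1.5x - 0.4x^2 + 0.02x^3
xs = [float(v) for v in range(9)]
ys = [2 + 1.5 * v - 0.4 * v ** 2 + 0.02 * v ** 3 for v in xs]
coeffs = polynomial_fit(xs, ys, degree=3)
print([round(c, 6) for c in coeffs])
```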
Numerous other examples of applications of regression analysis can be observed also
in Chapter 1.
Figure 8.11 Surface hurricane wind profile. (After Goldman and Ushijima, 1974.)
Problems ◄ 339
PROBLEMS
8.1 A tensile load test was performed on an aluminum specimen. (a) We may assume that the force-elongation relation over the
The applied tensile force and the corresponding elongation of the range of the applied loads is linear. On this basis, determine the
specimen at various stages of the test are recorded as shown in least-squares estimate of the Young’s modulus of elasticity of
the table on p. 340. the aluminum specimen, which is the slope of the stress-strain
340 Chapter 8. Regression and Correlation Analyses
curve. The cross-sectional area of the specimen is 0.10 in2, and (c) Suppose that the total traffic count on a given day has been
the length of the specimen is 10 in. observed to be 55,000 vehicles. What is the probability that the
peak-hour traffic volume on that day exceeded 7000 vehicles per
Tensile Force Elongation hour? (Hint: Use an appropriate regression analysis.)
8.5 A survey of passenger car weight (kip) and corresponding Dissolved Oxygen Time of Travel
gasoline mileage (miles per gallon) gives the following: DO (ppm) T (days)
0.28 0.5
Car Gasoline Mileage (mpg) Weight (kip) 0.29 1
1 25 2.5 0.29 1.6
2 17 4.2 0.18 1.8
3 20 3.6 0.17 2.6
4 21 3.0 0.18 3.2
0.1 3.8
0.12 4.7
These four cars represent a random sample of the entire passen
ger car population.
(a) Suppose passenger car weight (in kips) is normally dis (a) Determine the least-squares regression of DO on T, i.e.. the
tributed and (by analysis using probability paper) its mean and regression equation for predicting the dissolved oxygen concen
standard deviation are estimated to be 3.33 and 1.04. respec tration based on the travel time of the water downstream.
tively. Find the probability that another car picked at random (b) Evaluate the correlation coefficient between DO and 7'. and
from the population will weigh more than 4.5 kips. also the conditional standard deviation of DO for given T.
(b) Using linear regression analysis with constant variance, an (c) Determine the 95% confidence interval of the regression line.
swer the following: 8.8 The actual concrete strength, Y, in a structure is generally
higher than that measured on a specimen. X, from the same batch
If you buy a car that weighs 2.3 kips, what is the probability of concrete. Data show that a regression equation for predicting
that it will have a gasoline mileage of more than 28 mpg? the actual concrete strength is
(c) Develop the 95% confidence interval of the resulting regres E(F|x) = 1.12x + 0.05 (ksi); 0.1 < x < 0.5
sion equation, and sketch the interval on the graph.
and
8.6 The error incurred in a given type of measurement by
a surveyor appears to be affected by the surveyor’s years Var(F|x) = 0.0025 (ksi)2
of experience. The following is the data observed for five
surveyors. Assume that Y follows a normal distribution for a given value
of x.
Years of Measurement (a) For a given job, in which the measured strength is 0.35 ksi,
Surveyor Experience, Y Error M in. what is the probability that the actual strength will exceed the
requirement of 0.3 ksi?
1 3 1.5 (b) Suppose the engineer has lost the data on the measured
2 5 0.8 strength on the concrete specimen. However, he recalls that it
3 10 1.0 is either 0.35 or 0.40 with the relative likelihood of I to 4. What
4 20 0.8 is the probability that the actual strength will exceed the require
5 25 0.5 ment of 0.3 ksi?
(c) Suppose the measured values of concrete strength at two sites
A and B are 0.35 and 0.4 ksi, respectively. What is the proba
On the basis of the above information, answer the following and bility that the actual strength for the concrete structure at site A
state your assumptions: will be higher than that at site B? You may assume that the pre
(a) For a surveyor with 15 years of experience, what is the
dicted actual concrete strength between the sites are statistically
probability that his measurement error will be less than 1 in.? independent.
(Ans. 0.713)
(b) For a 65-year-old surveyor who has 30 years of experience, 8.9 Data on the construction costs for three houses in a given
can you estimate the probability that his measurement error will residential area are as follows:
be less than 1 in.? Please elaborate.
8.7 Dissolved oxygen (DO) concentration in ppm (parts per mil Floor Area (1000 sq. ft.) Cost ($1,000)
lion) in a stream is found to decrease with the time of travel, T.
1.05 63
downstream (Thayer and Krutchkoff. 1966). The data shown in
1.83 92
the next table below represent a set of measurements of the DO
3.14 204
versus T for a particular stream:
342 Chapters. Regression and Correlation Analyses
Plot the above data on a piece of x-y paper. (c) Sketch the regression line and determine its 95% confidence
(a) Determine by linear regression the relation between the con interval; sketch also this interval.
struction cost of houses as a function of the floor area. Sketch
8.12 Suppose a survey of the effect of a fare increase on the loss
this on the graph.
of ridership for mass transit systems in the United States reveals
(b) Estimate the standard deviation of construction cost for the
the data tabulated below.
given floor area.
(c) How good is the linear relation between cost and floor area? % Fare Increase, X % Loss in Ridership, Y
The answer depends on the correlation coefficient. 5 1.5
(d) If you wish to build a house with 2500 sq. ft. of floor area, 12
35
what is the probability that the construction cost will not exceed 20 7.5
$180,000 (based on the information given above)? Assume that 15 6.3
the cost for a given floor area is normally distributed. 4 1.2
8.10 In order to determine the reliability of the rated mileage 6 1.7
(in mpg) of cars, 6 different makes of cars were driven for the 18 7.2
same distance over combined city and highway roads, with the 23 8
following results: 38 11.1
8 3.6
Make of Rated Mileage, Actual Mileage, 12 3.7
Car mpg mpg 17 6.6
A 20 16 17 4.4
B 25 19 13 4.5
C 30 25 7 2.8
D 30 22 23 8
E 25 18
F 15 12 (a) Plot the above data for the percentage loss in ridership versus
the percentage fare increase in an x-v graph.
(b) Perform a linear regression analysis for predicting the ex
(a) Find the linear regression equation for determining the mean pected percentage loss in ridership as a function of the percentage
actual mileage for the given rated mileage of a car: i.e., develop fare increase for a mass transit system in the United States.
the equation for (c) Evaluate the correlation coefficient between X and Y. and
E(Y\X = x) estimate the constant conditional standard deviation of the loss
in ridership for a given fare increase.
in which (d) Determine the 90% confidence interval of the regression
Y = the actual mileage, in mpg; and equation of Part (b) above.
X — the rated mileage, in mpg. 8.13 Seismic damage to an urban area will depend on the inten
(b) Evaluate the conditional standard deviation of Y for a given sity of a given earthquake. Based on a regression analysis of past
X = x, ,s'y|V, and the correlation coefficient between X and Y. earthquake damage data, the expected damage in a given area
(c) Suppose two cars, models Q and R, have rated mileage of during an earthquake of intensity / was determined to be
22 mpg and 24 mpg, respectively. What is the probability that the
E(D\I) = 10.5+ 15/
model Q car will have better actual mileage than that of model
R car? in which £) = damage loss in $ million. The corresponding con
(d) Plot the data points on an x-v graph, and sketch in the regres ditional standard deviation was found to be ,vO|/ = 30, which is
sion line. Develop and sketch also the 90% confidence interval constant for all /.
of the regression line. (a) On the assumption that the seismic damage loss in the area,
D, is a Gaussian variate, what is the probability that the damage
8.11 In Example 8.3, we show the data of observed settlements
loss will exceed $150 million if an earthquake of intensity / = 6
of pile groups and the corresponding calculated settlements ob
should occur?
tained with a nonlinear model (proposed by Viggiani, 2001)
(b) If damage-causing earthquakes in the area are limited to in
as shown earlier in Table E8.3 for varying levels of loading
tensities of / = 6, 7, and 8 with relative likelihoods of occurrence
(Columns 2, 3, and 4).
of 0.6, 0.3, and 0.1. respectively, what would be the expected
(a) Perform the linear regression of the corresponding calcu
seismic damage loss of the urban area in the next earthquake?
lated settlements on the observed settlements, i.e., E(X|y), and
evaluate the conditional standard deviation 5X|V. 8.14 Several simply supported timber beams were tested exper
(b) Evaluate the correlation coefficient between X and Y. imentally under varying loads P to determine its deflections £).
The measured deflections at the midspan of the test beams were as follows:
P (tons)    D (cm)
8.4         4.8
6.7         2.9
4.0         2.0
10.2        5.5
(a) On the basis of the above test results, determine the linear regression of the deflection on the load, and the associated conditional standard deviation (assume this is constant under all loads).
(b) Develop also the corresponding 90% confidence interval of the above regression equation.
(c) What would be the mean deflection of the beam under a load P = 8 tons? Also, assuming normal distribution, what would be the 75-percentile deflection under this load?
8.15 Test data on the deformation and Brinell hardness of a certain type of steel have been obtained as follows:
D (deformation, mm):             6    11    13    22    28    35
H (Brinell hardness, kg/mm²):   68    65    53    44    37    32
(a) Estimate the correlation coefficient between deformation and Brinell hardness of the particular steel.
(b) Assuming that the Brinell hardness varies linearly with the deformation, determine the regression for E(H|D = d) and the associated constant conditional standard deviation \(s_{H|d}\).
(c) Suppose that the Brinell hardness corresponding to a given deformation may be modeled as a normal variate. What is the probability that the Brinell hardness for a deformation of 20 mm will be between 40 and 50 kg/mm²?
8.16 As expected, the stopping distance, D, for a car depends on the speed of travel, V, and on the condition of the road surface. However, there is also variability in the stopping distance at a given speed even under the same road condition. From several road tests on a dry pavement, the following were observed:
Car #    Stopping Distance, D (ft)    Travel Speed, V (mph)
1        46                           25
2        6                            5
3        110                          60
4        46                           30
5        16                           10
6        75                           45
7        16                           15
8        76                           40
9        90                           45
10       32                           20
(a) Evaluate the correlation coefficient between V and D. On this basis, can we say that there is a reasonable linear relationship between the stopping distance and the speed of travel?
(b) Plot the above stopping distance versus travel speed on an x-y graph.
(c) From the above graph, a nonlinear function may be suggested to model the stopping distance-speed relationship as follows:
\[ E(D \mid V = v) = a + bv + cv^2 \]
Estimate the regression coefficients a, b, and c, and evaluate also the conditional standard deviation \(s_{D|V=v}\).
(d) Determine the expected stopping distance for a car traveling at a speed of 50 mph. However, if the driver wants a 90% probability of stopping the car traveling at 50 mph, what distance should he allow?
8.17 Data on the daily consumption of water in the Midwest have been collected for seven towns with varying population as follows:
Town #    Population, X    Total Consumption (10^6 gal/day)
1         12,000           1.2
2         40,000           5.2
3         60,000           7.8
4         90,000           12.8
5         120,000          18.5
6         135,000          22.3
7         180,000          31.5
(a) Based on the above observed data, develop the regression of the per capita water consumption on the total population; i.e., estimate the regression coefficients α and β in the linear equation
\[ E(Y \mid x) = \alpha + \beta x \]
in which Y is the per capita water consumption.
(b) Estimate the constant conditional standard deviation \(s_{Y|x}\).
(c) Determine the 98% confidence interval of the above regression equation.
(d) An engineer is interested in studying the daily consumption of water in a town A, with a population of 100,000. If he assumes that the per capita consumption of water is a normal variate for a given population, what is the probability (on the basis of the above regression equation) that the demand for water in the town will exceed 17,000,000 gal/day?
8.18 The rate of oxygenation from the atmospheric reaeration process for a stream depends on the mean velocity of the stream flow and the average depth of the stream bed. Data for 12 streams have been recorded as shown in the next table (after Thayer and Krutchkoff, 1966).
Suppose the following nonlinear relationship has been suggested for the mean oxygenation rate:
\[ E(X \mid V, H) = \alpha V^{\beta_1} H^{\beta_2} \]
Mean Oxygenation Rate, X (ppm/day)    Mean Velocity, V (fps)    Mean Depth, H (ft)
2.272                                 3.07                      3.27
1.44                                  3.69                      5.09
0.981                                 2.1                       4.42
0.496                                 2.68                      6.14
0.743                                 2.78                      5.66
1.129                                 2.64                      7.17
0.281                                 2.92                      11.41
3.361                                 2.47                      2.12
2.794                                 3.44                      2.93
1.568                                 4.65                      4.54
0.455                                 2.94                      9.5
0.389                                 2.51                      6.29
With the data given in this table, determine the regression coefficients α, β1, and β2, and evaluate the corresponding conditional standard deviation.
8.19 The peak-hour traffic volume and the 24-hour daily traffic volume on a toll bridge have been recorded for 14 days as follows:
Peak-Hour Traffic Vol., X (in 10^3)    24-Hour Traffic Vol., Y (in 10^4)
1.4                                    1.6
2.2                                    2.3
2.4                                    2
2.7                                    2.2
2.9                                    2.6
3.1                                    2.6
3.6                                    2.1
4.1                                    3
3.4                                    3
4.3                                    3.8
5.1                                    5.1
5.9                                    4.2
6.4                                    3.8
4.6                                    4.2
Assume that the conditional standard deviation varies in a quadratic form with x from the origin.
(a) Determine the regression of Y on X.
(b) Estimate the prediction error about the regression line; i.e., \(s_{Y|x}\).
(c) Determine the 98% confidence interval for the regression equation of Part (a).
(d) If the peak-hour traffic volume on a certain morning was observed to be 3500 vehicles, what is the probability that more than 30,000 vehicles will be crossing the toll bridge on that day?
REFERENCES
Butts, T. A., Schnepper, D. H., and Evans, R. L., “Statistical Assessment of DO in Navigation Pool,” Jour. of Sanitary Engineering, ASCE, Vol. 96, April 1970.
Galligan, W. L., and Snodgrass, D. V., “Machine Stress Rated Lumber: Challenge to Design,” Jour. of the Structural Div., ASCE, Vol. 96, December 1970.
Goldman, J. L., and Ushijima, T., “Decrease in Hurricane Winds after Landfall,” Jour. of the Structural Div., ASCE, Vol. 100, January 1974.
Hald, A., Statistical Theory with Engineering Applications, J. Wiley & Sons, New York, 1952.
Heathington, K. W., and Tutt, P. R., “Traffic Volume Characteristics on Urban Freeway,” Transportation Engineering Jour., ASCE, Vol. 97, February 1971.
Huggel, C., Kaab, A., Haeberli, W., Teysseire, P., and Paul, F., “Remote Sensing Based Assessment of Hazards from Glacier Lake Outbursts: A Case Study in the Swiss Alps,” Canadian Geotechnical Jour., Vol. 39, March 2002.
Lambe, T. W., and Whitman, R. V., Soil Mechanics, John Wiley & Sons, Inc., New York, 1969, p. 375.
Malhotra, V. M., and Zoldners, N. G., “Some Field Experience in the Use of an Accelerated Method of Estimating 28-Day Strength of Concrete,” Jour. of American Concrete Institute, November 1969.
Martin, B. V., Memmott, F. W., and Bone, A. J., “Principles and Techniques of Predicting Future Demand for Urban Area Transportation,” Research Rept. No. 38, Dept. of Civil Engineering, MIT, Cambridge, MA, January 1963.
Meadows, D. H., Meadows, D. L., Randers, J., and Behrens, W. W., The Limits to Growth, Universe Books, New York, 1972.
Miller, A. J., “The Amount of Traffic Which Can Enter a City Center During Peak Periods,” Transportation Science, ORSA, Vol. 4, 1970, pp. 409–411.
Moulton, L. K., and Schaub, J. H., “Estimation of Climatic Parameters for Frost Depth Predictions,” Jour. of Transportation Engineering, ASCE, Vol. 85, November 1969.
Murdock, J. W., and Kesler, C. E., “Effects of Range of Stress on Fatigue Strength of Plain Concrete Beams,” Jour. of American Concrete Inst., Vol. 30, August 1958.
Nishida, Y. K., “A Brief Note on Compression Index of Soil,” Jour. of the Soil Mechanics and Foundation Div., ASCE, Vol. SM3, July 1956.
Payne, H. J., “Freeway Traffic Control and Surveillance Model,” Transportation Engineering Jour., ASCE, Vol. 99, November 1973.
Shull, R. D., and Gloyna, E. F., “Transport of Dissolved Water in Rivers,” Jour. of the Sanitary Engineering Div., ASCE, Vol. 95, December 1969.
Smeed, R. J., “Traffic Studies and Urban Congestion,” Jour. Transportation Economics and Policy, Vol. 2, No. 1, 1968, pp. 33–70.
Thayer, R. P., and Krutchkoff, R. G., “A Stochastic Model for Pollution and Dissolved Oxygen in Streams,” Water Resources Research Center, Virginia Polytechnic Inst., Blacksburg, VA, 1966.
Viggiani, C., “Analysis and Design of Piled Foundations,” Rivista Italiana di Geotecnica, Vol. 35, 2001.
Wynn, F. H., “Shortcut Modal Split Formula,” Highway Research Record, No. 283, Highway Research Board, National Research Council, 1969.
The Bayesian Approach
9.1 INTRODUCTION
In engineering, we often need to use whatever information is available in formulating a sound basis for making decisions. This may include observed data (field as well as experimental), information derived from theoretical models, and expert judgments based on experience. The various sources and types of information must often be combined, regardless of their respective qualities. Moreover, available information may need to be updated as new information or data are acquired. When the available information is in statistical form, or contains variability (which, as we saw in Chapter 1, is invariably the case with engineering information), the proper tools for combining and updating the available information are embodied in the Bayesian approach. In this chapter, we shall present the fundamentals of the Bayesian approach to typical engineering problems involving probability and statistics.
In Chapter 1, we identified two broad types of uncertainty: the aleatory uncertainty that is associated with the inherent variability of information, and the epistemic uncertainty that is associated with the imperfections in our knowledge or ability to make predictions. The aleatory uncertainty gives rise to a calculated probability, whereas the epistemic uncertainty leads to a lack of confidence in the calculated probability. In this regard, the Bayesian approach can be relevant in two ways: (1) to systematically update the existing aleatory and epistemic uncertainties as additional information or data for each type of uncertainty become available; and (2) to provide an alternative basis for combining the two types of uncertainty for the purpose of decision making or formulating bases for design (see Ang and Tang, 1984).
An essential topic concerns the estimation of the parameters of an underlying probability model; we shall see that the Bayesian approach provides another logical basis for estimating the parameters, which is different from the classical approach that we presented in Chapter 6. There is also a role for the Bayesian approach in the regression and correlation analyses between two (or more) random variables.
9.2 Basic Concepts—The Discrete Case 347
Equation 9.1a, therefore, gives the posterior probability mass function of Θ. (In general, we shall use a single prime (') and a double prime ('') to denote the prior and posterior information, respectively.)
The expected value of Θ is then commonly used as the Bayesian estimator of the parameter; that is,
\[ \theta'' = E(\Theta \mid \varepsilon) = \sum_{i=1}^{k} \theta_i \, P''(\Theta = \theta_i) \qquad (9.2) \]
We may point out that in Eqs. 9.1 and 9.2, the observational data ε and any judgmental information are both used and combined in a systematic way to estimate the underlying parameter θ.
In the Bayesian framework, the significance of judgmental information is reflected also in the calculation of the relevant probabilities. In the preceding case, where subjective judgments were used in the estimation of the parameter θ, such judgments would be reflected in the calculation of the probability associated with the basic random variable X; for example, P(X < a) is obtained through the theorem of total probability using the posterior PMF of Eq. 9.1a. That is,
\[ P(X < a) = \sum_{i=1}^{k} P(X < a \mid \Theta = \theta_i)\, P''(\Theta = \theta_i) \qquad (9.3) \]
This represents the updated probability of the event (X < a) based on all the available information. It may be emphasized that in Eq. 9.3 the uncertainty associated with the error of estimating the parameter [as reflected in P''(Θ = θ_i)] is combined formally and systematically with the inherent variability of the random variable X. In contrast, such a combination is difficult in the classical statistical approach (presented in Chapter 6), in which the parameter uncertainty is described in terms of confidence intervals.
To further clarify the general concepts introduced above, consider the following
examples.
► EXAMPLE 9.1 Reinforced concrete piles for a building foundation could be subject to defects resulting from poor construction quality. Some of the common defects would include insufficient bonding, inadequate length, cracks, and voids in the concrete. An engineer would like to estimate the proportion of piles that are defective in a given project that may consist of hundreds of piles. Suppose that from the engineer’s experience with a range of construction quality for various pile foundation contractors in the region of the project site, he estimated (judgmentally) that the proportion of defective piles, p, for the site would range from 0.2 to 1.0 with 0.4 as the most likely value; more specifically, p is described by the prior PMF as shown in Fig. E9.1a. The values of p are discretized at 0.2 intervals to simplify the illustration.
On the basis of this prior PMF, which is based entirely on the engineer’s judgment, the estimated probability of a defective pile would be (by virtue of the total probability theorem)
\[ P(\text{defective pile}) = \sum_{i} p_i \, P'(p = p_i) = E'(p) = 0.44 \]
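In order to supplement his judgment, the engineer ordered a pile to be selected for inspection. The outcome of the inspection shows that the pile is defective. Based on the result of this single inspection, the PMF of p would be revised according to Eq. 9.1a, obtaining the posterior PMF as follows:
\[ P''(p = 0.2) = \frac{P(\varepsilon \mid p = 0.2)\, P'(p = 0.2)}{\sum_{i} P(\varepsilon \mid p = p_i)\, P'(p = p_i)} = 0.136 \]
in which ε denotes the event that the inspected pile is defective, so that P(ε | p = p_i) = p_i and the denominator equals the prior estimate 0.44. Similarly, we obtain the posterior probabilities for the other values of p as follows:
P''(p = 0.4) = 0.364, P''(p = 0.6) = 0.204, P''(p = 0.8) = 0.182, P''(p = 1.0) = 0.114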
Figure E9.1a Prior PMF of p.
Figure E9.1b PMF of p after one inspection.
350 Chapter 9. The Bayesian Approach
In Fig. E9.1b, we see that because a defective pile was discovered from the single inspection, the probabilities for the higher values of p_i are increased from those of the prior distribution, resulting in a higher estimate for p, namely, p'' = E(p | ε) = 0.55, whereas the prior estimate was 0.44. Observe that the inspection of a single pile, indicating a defective pile, does not imply that all piles will be defective; instead, the inspection result merely serves to increase the estimated proportion of defective piles by 0.11 (from 0.44 to 0.55). Figure E9.1c illustrates how the PMF of p changes with an increasing number n of consecutive defective piles observed; the distribution shifts toward p = 1.0 as n → ∞.
Figure E9.1d shows the corresponding Bayesian estimate for p; we observe that after a sequence of six consecutive defective piles, the estimate for p is 0.90. If a long sequence of defective piles is observed, the Bayesian estimate of p approaches 1.0, a result that tends to the classical estimate; in such a case, there is an overwhelming amount of observed data to supersede any prior judgment. Ordinarily, however, where observational data are limited, judgment would be important and is reflected properly in the Bayesian estimation process. We may point out that if a person has a strong feeling about certain values of the parameter, it would imply a narrow prior distribution. In this case, it would require a large amount of observed information to override his or her prior judgment.
Now suppose that each of the main columns in the building is supported on a group of three piles. The column will suffer settlement problems if all the piles in the group are defective. Consider the case right after a test pile was found to be defective. Based on the posterior PMF of Fig. E9.1b, and using Eq. 9.3 with X denoting the number of defective piles in the group of three, the probability that a main column will have a settlement problem is
\[ P(X = 3) = P(X = 3 \mid p = 0.2)\,P''(p = 0.2) + P(X = 3 \mid p = 0.4)\,P''(p = 0.4) + \cdots + P(X = 3 \mid p = 1.0)\,P''(p = 1.0) \]
\[ = (0.2)^3(0.136) + (0.4)^3(0.364) + (0.6)^3(0.204) + (0.8)^3(0.182) + (1)^3(0.114) \]
\[ = 0.276 \] ◄
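The updating steps of Example 9.1 can be reproduced in a few lines. This is an illustrative sketch only: the prior PMF of Fig. E9.1a is not printed in the text, so the prior values used here (0.30, 0.40, 0.15, 0.10, 0.05) are an assumption. They were back-calculated from the posterior probabilities and the prior estimate of 0.44 quoted in the example. The helper names `update`, `mean`, and `estimate_after` are hypothetical.

```python
# Discrete Bayesian updating for Example 9.1 (Eqs. 9.1a, 9.2 and 9.3).
# ASSUMPTION: the prior PMF of Fig. E9.1a is not printed in the text; these
# values are back-calculated from the posterior PMF and the prior estimate 0.44.
P_VALUES = [0.2, 0.4, 0.6, 0.8, 1.0]
PRIOR = [0.30, 0.40, 0.15, 0.10, 0.05]


def update(prior, likelihoods):
    """Posterior PMF: P''(p_i) is proportional to P(e | p_i) * P'(p_i)."""
    joint = [lik * q for lik, q in zip(likelihoods, prior)]
    total = sum(joint)  # normalizing constant, equal to P(e)
    return [j / total for j in joint]


def mean(values, pmf):
    """Eq. 9.2: the Bayesian estimator is the mean of the PMF."""
    return sum(v * q for v, q in zip(values, pmf))


def estimate_after(n_defective):
    """Estimate of p after n consecutive defective piles (P(e | p_i) = p_i each time)."""
    pmf = PRIOR
    for _ in range(n_defective):
        pmf = update(pmf, P_VALUES)
    return mean(P_VALUES, pmf)


# One inspected pile found defective.
posterior = update(PRIOR, P_VALUES)
p_prior = mean(P_VALUES, PRIOR)      # prior estimate of p
p_post = mean(P_VALUES, posterior)   # posterior estimate p''

# Eq. 9.3 with X = number of defective piles in a group of three,
# assuming independent piles: P(X = 3 | p_i) = p_i ** 3.
p_settlement = sum(p**3 * q for p, q in zip(P_VALUES, posterior))

print("posterior PMF:", [round(q, 3) for q in posterior])
print("prior / posterior estimate:", round(p_prior, 2), round(p_post, 2))
print("P(X = 3):", round(p_settlement, 3))
print("estimates after n = 0..6 defectives:",
      [round(estimate_after(n), 2) for n in range(7)])
```

With unrounded posterior probabilities, P(X = 3) comes out at about 0.275. The estimate after six consecutive defective piles is about 0.89, in line with the 0.90 read from Fig. E9.1d.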
► EXAMPLE 9.2 A traffic engineer is interested in the average rate of accidents ν at an improved road intersection. Suppose that from his previous experience with similar road design configurations and traffic conditions, he deduced that the expected accident rate would be between 1 and 3 per year, with an average of 2, and postulated the prior PMF shown in Fig. E9.2. The occurrence of accidents is assumed to be a Poisson process.
During the first month after completion of the intersection, one accident occurred.
(a) In the light of this observation, i.e., one accident in the first month, revise the estimate for ν.
(b) Using the result of part (a), determine the probability of no accident in the next 6 months.
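SOLUTIONS (a) Let ε be the event that an accident occurred in one month. The posterior probabilities according to Eq. 9.1a are, therefore,
\[ P''(\nu = 1) = \frac{P(\varepsilon \mid \nu = 1)\, P'(\nu = 1)}{\sum_{j=1}^{3} P(\varepsilon \mid \nu = j)\, P'(\nu = j)} = 0.166 \]
Note that the probability of the observed event for a given value of ν is proportional to the exponential PDF evaluated at an occurrence time of 1 month, i.e., \(P(\varepsilon \mid \nu) \propto \nu e^{-\nu/12}\) with ν in accidents per year. Similarly,
P''(ν = 2) = 0.411
P''(ν = 3) = 0.423
Hence, the updated value of ν is ν'' = E(ν | ε) = (0.166)(1) + (0.411)(2) + (0.423)(3) = 2.26 accidents per year.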
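(b) Let A be the event of no accidents in the next 6 months. For a Poisson process, \(P(A \mid \nu) = e^{-0.5\nu}\) (t = 0.5 year). Then
\[ P(A) = \sum_{j=1}^{3} P(A \mid \nu = j)\, P''(\nu = j) = e^{-0.5}(0.166) + e^{-1.0}(0.411) + e^{-1.5}(0.423) \approx 0.346 \] ◄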
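The following is a short sketch of the arithmetic in Example 9.2. It takes the posterior PMF of ν directly from part (a), and the variable names are illustrative. P(A | ν) = e^(−νt) with t = 0.5 year is the usual Poisson probability of zero occurrences, and Eq. 9.3 weights it by the posterior PMF.

```python
import math

# Posterior PMF of the accident rate nu (accidents per year), from part (a).
NU = [1.0, 2.0, 3.0]
POSTERIOR = [0.166, 0.411, 0.423]

# Eq. 9.2: updated estimate of nu.
nu_hat = sum(v * q for v, q in zip(NU, POSTERIOR))

# Part (b): A = no accident in the next t = 0.5 year.
# Poisson process: P(A | nu) = exp(-nu * t); Eq. 9.3 averages it over P''(nu).
t = 0.5
p_no_accident = sum(math.exp(-v * t) * q for v, q in zip(NU, POSTERIOR))

print(f"nu'' = {nu_hat:.2f} accidents per year")
print(f"P(no accident in 6 months) = {p_no_accident:.3f}")
```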
In Sect. 9.2, the possible values of the parameter Θ (such as p in Example 9.1 and ν in Example 9.2) were limited to a discrete set of values; this was purposely assumed to simplify the presentation of the concepts underlying the Bayesian method of estimation.
In many situations, however, the value of a parameter could be a continuum of possible values. Hence, it would be appropriate to assume the parameter to be a continuous random variable in the Bayesian estimation. In this case, we develop the corresponding results, analogous to Eqs. 9.1 through 9.3, as follows.
Let Θ be the random variable for the parameter of a distribution, with a prior PDF f'(θ) as shown in Fig. 9.2. The prior probability that Θ will be between θ_i and θ_i + Δθ is f'(θ_i)Δθ.
9.3 The Continuous Case 353
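By Bayes' theorem, the posterior probability that Θ lies in this interval, given the observed outcome ε, is
\[ f''(\theta_i)\,\Delta\theta = \frac{P(\varepsilon \mid \theta_i)\, f'(\theta_i)\,\Delta\theta}{\sum_{i=1}^{k} P(\varepsilon \mid \theta_i)\, f'(\theta_i)\,\Delta\theta} \]
where P(ε | θ_i) = P(ε | θ_i < Θ ≤ θ_i + Δθ). In the limit, the above becomes
\[ f''(\theta) = \frac{P(\varepsilon \mid \theta)\, f'(\theta)}{\int_{-\infty}^{\infty} P(\varepsilon \mid \theta)\, f'(\theta)\, d\theta} \qquad (9.4) \]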
The term P(ε | θ) is the conditional probability, or likelihood, of observing the experimental outcome ε assuming that the value of the parameter is θ. Hence, P(ε | θ) is a function of θ and is commonly referred to as the likelihood function of θ, denoted L(θ). The denominator is independent of θ; it is simply a normalizing constant required to ensure that f''(θ) is a proper PDF. Equation 9.4, therefore, may be expressed also as
\[ f''(\theta) = k\,L(\theta)\,f'(\theta) \qquad (9.5) \]
where
\(k = \left[\int_{-\infty}^{\infty} L(\theta)\, f'(\theta)\, d\theta\right]^{-1}\) is the normalizing constant
L(θ) = the likelihood of observing the experimental outcome ε assuming a given θ.
We observe from Eq. 9.5 that both the prior distribution and the likelihood function contribute to the posterior distribution of Θ. In this way, as in the discrete case, the significance of judgment and of any observational data is combined properly and systematically: the former through f'(θ) and the latter embedded in L(θ).
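Analogous to the discrete case, Eq. 9.2, the expected value of Θ is commonly used as the point estimator of the parameter. Hence, the updated estimate of the parameter θ, in the light of the observational data ε, is given by
\[ \theta'' = E(\Theta \mid \varepsilon) = \int_{-\infty}^{\infty} \theta\, f''(\theta)\, d\theta \qquad (9.6) \]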
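The uncertainty in the estimation of the parameter can be included in the calculation of the probability associated with a value of the underlying random variable. For example, if X is a random variable whose distribution has the parameter θ, then
\[ P(X < a) = \int_{-\infty}^{\infty} P(X < a \mid \theta)\, f''(\theta)\, d\theta \qquad (9.7) \]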
Physically, Eq. 9.7 is the average probability of (X < a) weighted by the posterior probabilities of the parameter θ.
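When the integrals in Eqs. 9.4 to 9.7 have no convenient closed form, one simple option is to replace them with sums over a fine grid of θ values. This is not the method used in the text. The sketch below assumes the parameter lies in a known finite interval [lo, hi] and uses the midpoint rule, and its function names are hypothetical. As a check, it is applied to a uniform prior and the likelihood L(p) = p of a single defective pile, the situation treated in Example 9.3, where the exact posterior mean is 2/3.

```python
# Grid (midpoint-rule) approximation of Eqs. 9.5 to 9.7 on a finite interval.
def grid_posterior(prior_pdf, likelihood, lo, hi, n=20_000):
    """Return grid points, spacing d, and posterior density values f''(theta)."""
    d = (hi - lo) / n
    thetas = [lo + (i + 0.5) * d for i in range(n)]
    unnorm = [likelihood(t) * prior_pdf(t) for t in thetas]
    k = 1.0 / (sum(unnorm) * d)  # normalizing constant k of Eq. 9.5
    return thetas, d, [k * u for u in unnorm]


def posterior_mean(thetas, d, post):
    """Eq. 9.6: theta'' = integral of theta * f''(theta) d theta."""
    return sum(t * f for t, f in zip(thetas, post)) * d


def total_probability(cond_prob, thetas, d, post):
    """Eq. 9.7: integral of P(event | theta) * f''(theta) d theta."""
    return sum(cond_prob(t) * f for t, f in zip(thetas, post)) * d


# Uniform prior on p, one defective pile observed, so L(p) = p.
thetas, d, post = grid_posterior(lambda p: 1.0, lambda p: p, 0.0, 1.0)
p_hat = posterior_mean(thetas, d, post)
# Probability that all three piles in a group are defective: P(X = 3 | p) = p**3.
p_all_three = total_probability(lambda p: p**3, thetas, d, post)
print(round(p_hat, 4), round(p_all_three, 4))
```

A grid keeps the sketch dependency-free. For multi-parameter problems a grid grows quickly, and sampling methods are usually preferred.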
► EXAMPLE 9.3 Let us consider again the problem of Example 9.1, in which the proportion of defective piles at the site is of concern; this time, however, we will assume that the probability p is a continuous random variable. If there is no (prior) factual information on p, a uniform prior distribution may be assumed (known as the diffuse prior), namely,
\[ f'(p) = 1.0, \qquad 0 \le p \le 1 \]
On the basis of a single inspection of one pile, revealing that the pile is defective, the likelihood function is simply the probability of the event ε = {the one pile selected for inspection is defective}, which is p. Hence, the posterior distribution of p, according to Eq. 9.5, is
\[ f''(p) = k\,p\,(1.0) = 2p, \qquad 0 \le p \le 1 \]
where k = 2 follows from normalization, and the Bayesian estimate of p is
\[ p'' = E(p \mid \varepsilon) = \int_0^1 p \cdot 2p\, dp = 0.667 \]
If a sequence of n piles were inspected, out of which r piles were found to be defective, then the likelihood function is the probability of observing r defective piles among the n piles inspected. If the proportion of defective piles, or equivalently the probability of each pile being defective, is p, and statistical independence is assumed between the piles, the likelihood function would be defined by the binomial PMF; i.e.,
\[ L(p) = \binom{n}{r} p^r (1-p)^{n-r}, \qquad 0 \le p \le 1 \]
With the uniform prior, the posterior PDF is then f''(p) = kL(p), where
\[ k = \left[ \int_0^1 \binom{n}{r} p^r (1-p)^{n-r}\, dp \right]^{-1} \]
Hence, the Bayesian estimate of p is
\[ p'' = E(p \mid \varepsilon) = \frac{\int_0^1 p \binom{n}{r} p^r (1-p)^{n-r}\, dp}{\int_0^1 \binom{n}{r} p^r (1-p)^{n-r}\, dp} = \frac{\int_0^1 p^{r+1} (1-p)^{n-r}\, dp}{\int_0^1 p^r (1-p)^{n-r}\, dp} \]
Repeated integration by parts of the above integrals yields
\[ p'' = \frac{r+1}{n+2} \]
From the above result, we may observe that as the number of inspections n increases (with the ratio r/n remaining constant), the Bayesian estimate for p approaches the classical estimate; that is,
\[ \frac{r+1}{n+2} \to \frac{r}{n} \quad \text{for large } n \] ◄
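The closed-form result (r + 1)/(n + 2) can be checked against the ratio of integrals, since both integrals are complete beta functions. This sketch evaluates them through `math.lgamma` to avoid overflow for large n. The (n, r) pairs are arbitrary illustrative values, not data from the text.

```python
from math import exp, lgamma


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_estimate(n, r):
    """p'' for a uniform prior after r defectives in n inspections.

    The ratio of integrals in Example 9.3 equals B(r + 2, n - r + 1) / B(r + 1, n - r + 1).
    """
    return exp(log_beta(r + 2, n - r + 1) - log_beta(r + 1, n - r + 1))


for n, r in [(1, 1), (5, 1), (50, 10), (500, 100)]:
    print(f"n={n:4d} r={r:4d}  ratio={bayes_estimate(n, r):.4f}  "
          f"(r+1)/(n+2)={(r + 1) / (n + 2):.4f}  r/n={r / n:.4f}")
```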
► EXAMPLE 9.4 An engineer is designing a temporary structure subjected to wind load on a newly developed island in the Pacific. Of interest is the probability p that the annual maximum wind speed will not exceed 120 km/hr. Records for the annual maximum wind speed on the island are available only for the last 5 years, and among these, the 120 km/hr wind was exceeded only once. An island in this region has a longer record of wind speeds; however, it is at some distance away. After a comparative study of the geographical conditions on the two islands, the engineer inferred from this longer record that the average value of p for the newly developed island is 2/3 with a c.o.v. of 27%. Since p is bounded between 0 and 1.0, the following beta distribution (consistent with the above statistics) is also assumed for the prior distribution:
\[ f'(p) = 20\,p^3 (1-p), \qquad 0 \le p \le 1 \]
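In this case, the likelihood that the annual maximum wind speed will exceed 120 km/hr in 1 out of 5 years is
\[ L(p) = \binom{5}{4} p^4 (1-p) \]
The posterior distribution is then
\[ f''(p) = k\,L(p)\,f'(p) \]
where
\[ k = \left[ \int_0^1 \binom{5}{4} p^4 (1-p) \cdot 20\,p^3 (1-p)\, dp \right]^{-1} = 3.6 \]
Thus,
\[ f''(p) = 360\,p^7 (1-p)^2, \qquad 0 \le p \le 1 \]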
Here, the prior PDF is equivalent to the assumption of one exceedance in 4 years, whereas the resulting posterior distribution is tantamount to two exceedances in 9 years. In fact, the above posterior distribution is the same as that obtained for a case in which two exceedances were observed in nine years with a diffuse (i.e., uniform) prior distribution. This example should also serve to illustrate a property of the Bayesian approach, namely, that information from sources other than the observed data can be useful in the estimation process.
The relation between the likelihood function and the prior and posterior distributions of the parameter p is illustrated in Fig. E9.4. Observe that the posterior distribution is “sharper” than either the prior distribution or the likelihood function. This implies that more information is “contained” in the posterior distribution than in either the prior or the likelihood function. ◄
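In Example 9.4, the beta prior and the binomial likelihood combine into another beta density, so the update reduces to adding the observed counts to the exponents. The sketch below uses illustrative helper names. It confirms the prior statistics (mean 2/3, c.o.v. about 27%), the normalizing constants 20 and 360, and the posterior exponents.

```python
from math import exp, lgamma, sqrt


def beta_mean_cov(a, b):
    """Mean and coefficient of variation of a beta(a, b) density on [0, 1]."""
    m = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return m, sqrt(var) / m


def beta_const(a, b):
    """1 / B(a, b): makes p**(a-1) * (1-p)**(b-1) integrate to 1 on [0, 1]."""
    return exp(lgamma(a + b) - lgamma(a) - lgamma(b))


# Prior f'(p) = 20 p^3 (1 - p): a beta density with a = 4, b = 2.
a_prior, b_prior = 4, 2
# Five years of records: 4 years without exceedance (prob. p), 1 with (prob. 1 - p).
years_ok, years_exceeded = 4, 1
# Binomial likelihood times beta prior gives another beta density.
a_post, b_post = a_prior + years_ok, b_prior + years_exceeded

print("prior mean, c.o.v.:", beta_mean_cov(a_prior, b_prior))
print("prior constant:", round(beta_const(a_prior, b_prior), 6))
print("posterior: p^%d (1-p)^%d" % (a_post - 1, b_post - 1))
print("posterior constant:", round(beta_const(a_post, b_post), 6))
print("posterior mean, c.o.v.:", beta_mean_cov(a_post, b_post))
```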
► EXAMPLE 9.5 The occurrences of earthquakes may be modeled as a Poisson process with a mean occurrence rate ν. Suppose that the historical record for a region A shows that n_0 earthquakes have occurred in the past t_0 years. The corresponding likelihood function is then given by
\[ L(\nu) = P(\varepsilon \mid \nu) = \frac{(\nu t_0)^{n_0}}{n_0!} e^{-\nu t_0} \]
If there is no other information for estimating ν, a diffuse prior may be assumed; this implies that f'(ν) is independent of the values of ν and thus can be absorbed into the normalizing constant k. Then the posterior PDF of ν becomes
\[ f''(\nu) = k\,L(\nu) = k\,\frac{(\nu t_0)^{n_0}}{n_0!} e^{-\nu t_0}, \qquad \nu \ge 0 \]
Upon normalization, k = t_0. The resulting f''(ν) may be compared with the gamma density function of Eq. 3.44 (for the random variable ν), thus indicating that the posterior distribution of ν follows the gamma distribution. The probability of the event (E = n earthquakes in the next t years in region A) is then given by Eq. 9.7 as follows:
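\[ P(E) = \int_0^\infty P(E \mid \nu)\, f''(\nu)\, d\nu = \int_0^\infty \frac{(\nu t)^n}{n!} e^{-\nu t} \cdot \frac{t_0 (\nu t_0)^{n_0}}{n_0!} e^{-\nu t_0}\, d\nu \]
\[ = \left( \int_0^\infty \frac{(t+t_0)\,[\nu (t+t_0)]^{n+n_0}}{(n+n_0)!} e^{-\nu (t+t_0)}\, d\nu \right) \frac{(n+n_0)!}{n!\, n_0!}\, \frac{t^n\, t_0^{\,n_0+1}}{(t+t_0)^{n+n_0+1}} \]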
Since the integrand inside the parentheses is a gamma density function, the integral is equal to 1.0. Hence,
\[ P(E) = \frac{(n+n_0)!}{n!\, n_0!}\, \frac{t^n\, t_0^{\,n_0+1}}{(t+t_0)^{n+n_0+1}} \] ◄
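The closed-form P(E) of Example 9.5 can be checked by evaluating the integral of Eq. 9.7 numerically. In this sketch the record (n0 = 3 earthquakes in t0 = 50 years) and the forecast period t = 10 years are hypothetical values chosen only for illustration. The function names are also illustrative.

```python
from math import exp, factorial, lgamma, log


def prob_n_quakes(n, t, n0, t0):
    """Closed form: (n+n0)!/(n! n0!) * t^n t0^(n0+1) / (t+t0)^(n+n0+1), computed in logs."""
    log_p = (lgamma(n + n0 + 1) - lgamma(n + 1) - lgamma(n0 + 1)
             + n * log(t) + (n0 + 1) * log(t0) - (n + n0 + 1) * log(t + t0))
    return exp(log_p)


def prob_n_quakes_numeric(n, t, n0, t0, steps=50_000):
    """Eq. 9.7 by midpoint integration of P(E | nu) f''(nu) over nu."""
    upper = 40.0 * (n0 + 1) / t0  # far into the upper tail of the gamma posterior
    d = upper / steps
    total = 0.0
    for i in range(steps):
        nu = (i + 0.5) * d
        p_given_nu = (nu * t) ** n * exp(-nu * t) / factorial(n)  # Poisson PMF
        posterior = t0 * (nu * t0) ** n0 * exp(-nu * t0) / factorial(n0)  # gamma PDF
        total += p_given_nu * posterior * d
    return total


# Hypothetical record: n0 = 3 earthquakes in t0 = 50 years; forecast t = 10 years.
n0, t0, t = 3, 50.0, 10.0
for n in range(4):
    print(n, round(prob_n_quakes(n, t, n0, t0), 5),
          round(prob_n_quakes_numeric(n, t, n0, t0), 5))
```

As expected for a probability mass function, the closed-form values summed over n = 0, 1, 2, ... add up to 1.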
An interesting application of the Bayesian updating process is in the inspection and detection of material defects (Tang, 1973). Fatigue and fracture failures in metal structures are frequently the result of unchecked propagation of flaws or cracks in the joints (welds) or base metals. Periodic inspection and repair can be used to minimize the risk of fracture failure by limiting the existing flaw sizes. Methods of detecting flaws, such as nondestructive testing (NDT), are often used. However, such detection methods are invariably imperfect; consequently, not all flaws may be detected during an inspection.
The probability of detecting a flaw generally increases with the flaw size and the detection power of the device. An example of a detectability curve for the ultrasonic method is shown in Fig. 9.3. Hence, even when a structure is inspected and all detected flaws are repaired, it is difficult to ensure that there are no flaws larger than some specified size.
Figure 9.3 Detectability versus actual flaw depth. (Data from Packman et al., 1968.)
Suppose that an NDT device is used to inspect a set of welds in a structure and all detected flaws are fully repaired. On the basis of this assumption, the flaws that remain in the weld would be those that were not detected. Let X be the flaw size and D the event that a flaw is detected. The probability that a flaw size (for example, depth) will be between x and x + dx given that the flaw was not detected is, therefore,
\[ f_X(x \mid \bar{D})\,dx = \frac{P(\bar{D} \mid x)\, f'_X(x)\,dx}{\int_{-\infty}^{\infty} P(\bar{D} \mid x)\, f'_X(x)\,dx} \qquad (9.8) \]
in which \(f'_X(x)\) is the distribution of the flaw size prior to inspection and repair, whereas \(f_X(x \mid \bar{D})\) is the distribution of the flaw size after inspection and repair have been performed. Also, \(P(\bar{D} \mid x) = 1 - P(D \mid x)\), where P(D | x) is simply the probability of detecting a flaw with depth x, which is the function defined by the detectability curve, such as that shown in Fig. 9.3. Comparing Eq. 9.8 with Eq. 9.5, we observe that Eq. 9.8 is of the same form as Eq. 9.5, with the following equivalences:
f''(θ) ↔ \(f_X(x \mid \bar{D})\)
L(θ) ↔ \(P(\bar{D} \mid x)\)
f'(θ) ↔ \(f'_X(x)\)
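► EXAMPLE 9.6 As an illustration, suppose that the initial (prior) distribution of the flaw depths X (in inches) in a series of welds has a triangular shape described as follows (see Fig. E9.6):
\[ f'_X(x) = \begin{cases} 208.3x, & 0 \le x \le 0.06 \\ 20 - 125x, & 0.06 \le x \le 0.16 \\ 0, & \text{otherwise} \end{cases} \]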
Assume also that the NDT device used in the inspection has the detectability curve shown in Fig. 9.3; mathematically, this curve is given by
\[ P(D \mid x) = \begin{cases} 0, & x < 0 \\ 8x, & 0 \le x \le 0.125 \\ 1.0, & x > 0.125 \end{cases} \]
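Substituting the appropriate expressions for each interval of X into Eq. 9.8, we obtain the updated PDF of flaw depths as follows:
\[ f_X(x \mid \bar{D}) = \begin{cases} 0, & x < 0 \\ k(1 - 8x)(208.3x), & 0 \le x \le 0.06 \\ k(1 - 8x)(20 - 125x), & 0.06 \le x \le 0.125 \\ 0, & x > 0.125 \end{cases} \]
which, upon normalization (k ≈ 2.38), becomes
\[ f_X(x \mid \bar{D}) = \begin{cases} 0, & x < 0 \\ 495x - 3964x^2, & 0 \le x \le 0.06 \\ 47.6 - 678x + 2379x^2, & 0.06 \le x \le 0.125 \\ 0, & x > 0.125 \end{cases} \]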
The above prior, likelihood, and posterior functions are plotted in Fig. E9.6. It can be observed that the likelihood function, which is the “complementary function” of Fig. 9.3, behaves as a filter: it cuts off flaws larger than 0.125 in., and it also eliminates many of the remaining larger flaws. Thus, after the inspection and repair process, the distribution of flaw depths is shifted toward smaller flaw sizes. ◄
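The following is a numerical check of Example 9.6 that applies Eq. 9.8 with a simple midpoint-rule integration. The triangular prior used here is an assumption inferred from the posterior expressions in the text: it rises linearly to 12.5 at x = 0.06 in. and falls to zero at x = 0.16 in. The normalizing constant should come out near 2.38, and the mean flaw depth should drop after inspection and repair.

```python
def prior_pdf(x):
    """Triangular prior f'_X(x) of flaw depth x (in.); ASSUMED reconstruction."""
    if 0.0 <= x <= 0.06:
        return 208.3 * x
    if 0.06 < x <= 0.16:
        return 20.0 - 125.0 * x
    return 0.0


def p_detect(x):
    """Detectability curve P(D | x) used in Example 9.6."""
    if x < 0.0:
        return 0.0
    return min(8.0 * x, 1.0)


def integrate(f, lo, hi, n=32_000):
    """Midpoint-rule integral of f over [lo, hi]."""
    d = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * d) for i in range(n)) * d


def unnormalized(x):
    """Numerator of Eq. 9.8: P(D-bar | x) * f'_X(x)."""
    return (1.0 - p_detect(x)) * prior_pdf(x)


k = 1.0 / integrate(unnormalized, 0.0, 0.16)


def posterior_pdf(x):
    """Updated flaw-depth PDF f_X(x | D-bar)."""
    return k * unnormalized(x)


prior_mean = integrate(lambda x: x * prior_pdf(x), 0.0, 0.16)
post_mean = integrate(lambda x: x * posterior_pdf(x), 0.0, 0.16)
print(f"k = {k:.3f}")
print(f"mean flaw depth: prior {prior_mean:.4f} in., posterior {post_mean:.4f} in.")
```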
If the experimental outcome ε in Eq. 9.4 is a set of observed values (x_1, x_2, ..., x_n), representing a random sample (see Sect. 6.2.1) from a population X with an underlying PDF f_X(x), the probability of observing this particular set of values, assuming that the parameter of the distribution is θ, is
\[ P(\varepsilon \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta)\, dx \]
Then, if the prior PDF of Θ is f'(θ), the corresponding posterior density function becomes, according to Eq. 9.4,
\[ f''(\theta) = \frac{\left[\prod_{i=1}^{n} f_X(x_i \mid \theta)\, dx\right] f'(\theta)}{\int_{-\infty}^{\infty} \left[\prod_{i=1}^{n} f_X(x_i \mid \theta)\, dx\right] f'(\theta)\, d\theta} = k\,L(\theta)\,f'(\theta) \qquad (9.9) \]
where the likelihood function L(θ) is the product of the PDF of X evaluated at x_1, x_2, ..., x_n, or
\[ L(\theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta) \qquad (9.10) \]
Using the posterior PDF of Θ, Eq. 9.9, in Eq. 9.6, we therefore obtain the Bayesian estimator of the parameter θ. It is interesting to observe that the likelihood function of Eq. 9.10 is the same as that given earlier in Eq. 6.7 in connection with the classical method of maximum likelihood estimation. Furthermore, if a diffuse prior distribution is assumed (e.g., as in Eq. 9.13 below), then the mode of the posterior distribution, Eq. 9.9, would give the maximum likelihood estimator.
In the case of a Gaussian population with known standard deviation σ, the likelihood function for the parameter μ, according to Eq. 9.10, is
\[ L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right] = \prod_{i=1}^{n} N_\mu(x_i, \sigma) \]
where N_μ(x_i, σ) denotes the PDF of μ with mean value x_i and standard deviation σ. It can be shown (e.g., Tang, 1971) that the product of m normal PDFs with respective means μ_i and standard deviations σ_i is also a normal PDF with mean and variance
\[ \mu = \frac{\sum_{i=1}^{m} \mu_i/\sigma_i^2}{\sum_{i=1}^{m} 1/\sigma_i^2} \qquad (9.11) \]
\[ \sigma^2 = \frac{1}{\sum_{i=1}^{m} 1/\sigma_i^2} \qquad (9.12) \]
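If no prior information is available (i.e., with a diffuse prior), the posterior PDF of μ is
\[ f''(\mu) = k\,L(\mu) \qquad (9.13) \]
where k is necessarily equal to 1.0 upon normalization. Therefore, without prior information, the posterior distribution of μ is Gaussian with a mean value equal to the sample mean x̄ and standard deviation σ/√n.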
Using the expected value of μ as the Bayesian estimator, we obtain, in accordance with Eq. 9.6,
\[ \mu'' = E(\mu \mid \varepsilon) = \bar{x} \]
That is, the sample mean x̄ is the point estimate of the population mean. We recognize that this is the same as the classical estimate of Eq. 6.1. Therefore, in the absence of prior information, the Bayesian and classical methods give the same estimates for the population mean. Conceptually, however, the Bayesian basis for this estimate differs from that of the classical approach. Whereas Eq. 9.13 says that the posterior distribution of μ is Gaussian with mean x̄ and standard deviation σ/√n, the classical approach (of Sect. 6.2) says that the sample mean X̄ is a Gaussian random variable with mean μ and standard deviation σ/√n.
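Eqs. 9.11 and 9.12 say that multiplying normal PDFs amounts to adding their precisions (reciprocal variances) and taking a precision-weighted average of their means. The sketch below uses a hypothetical helper `combine_normals` and made-up sample values. It applies this rule to the diffuse-prior case and confirms that the posterior of μ has mean x̄ and standard deviation σ/√n.

```python
from math import sqrt
from statistics import fmean


def combine_normals(means, sds):
    """Eqs. 9.11 and 9.12: parameters of the normal PDF formed by a product of normal PDFs."""
    weights = [1.0 / s**2 for s in sds]                    # precisions 1 / sigma_i^2
    var = 1.0 / sum(weights)                               # Eq. 9.12
    mu = var * sum(w * m for w, m in zip(weights, means))  # Eq. 9.11
    return mu, sqrt(var)


# Hypothetical data: n observations from a Gaussian population with known sigma.
sigma = 2.0
sample = [9.1, 10.4, 11.0, 9.7, 10.3]

# Diffuse prior: f''(mu) is the product of N_mu(x_i, sigma), i = 1, ..., n.
mu_post, sd_post = combine_normals(sample, [sigma] * len(sample))
print(round(mu_post, 4), round(fmean(sample), 4))
print(round(sd_post, 4), round(sigma / sqrt(len(sample)), 4))
```

The same helper also covers the Gaussian-prior case: pass the prior mean and standard deviation as one more term alongside x̄ and σ/√n.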
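If, instead, prior information on μ is available in the form of a Gaussian prior PDF f'(μ) = N_μ(μ', σ'), the posterior PDF becomes
\[ f''(\mu) = k\,L(\mu)\,f'(\mu) \]
which is a product of two normal PDFs. Again, it can be shown from Eq. 9.11 that f''(μ) is also Gaussian with mean
\[ \mu'' = \frac{\bar{x}/(\sigma/\sqrt{n})^2 + \mu'/(\sigma')^2}{1/(\sigma/\sqrt{n})^2 + 1/(\sigma')^2} = \frac{\bar{x}(\sigma')^2 + \mu'(\sigma^2/n)}{(\sigma')^2 + (\sigma^2/n)} \qquad (9.14) \]
and, from Eq. 9.12, variance
\[ (\sigma'')^2 = \frac{(\sigma')^2 (\sigma^2/n)}{(\sigma')^2 + (\sigma^2/n)} \qquad (9.15) \]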
That is, the Bayesian estimate of the mean value is an average of the prior mean μ' and the sample mean x̄, weighted inversely by the respective variances, as indicated in Eq. 9.14. As expected, the posterior mean approaches the sample mean for large sample size n. However, it will also approach the sample mean if the basis for the judgment is relatively weak (i.e., large σ') or if the population shows small scatter (i.e., small σ). In these cases, a given sampling plan will become more effective.
Equation 9.14 is an example of how prior information is combined systematically with observed data, in the present case for estimating the mean value μ.
It is important to observe that the posterior variance of μ, as given by Eq. 9.15, is always less than either (σ')² or σ²/n; that is, the variance of the posterior distribution is always less than that of the prior distribution or of the likelihood function.
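On the basis of the posterior distribution of μ, i.e., N_μ(μ'', σ''), with Eqs. 9.14 and 9.15, we may also determine the probability that μ is between a and b by
\[ P(a < \mu \le b) = \Phi\left( \frac{b - \mu''}{\sigma''} \right) - \Phi\left( \frac{a - \mu''}{\sigma''} \right) \]
Similarly, the PDF of the underlying random variable X, including the uncertainty in the estimated parameter θ, follows from the total probability theorem (as in Eq. 9.7):
\[ f_X(x) = \int_{-\infty}^{\infty} f_X(x \mid \theta)\, f''(\theta)\, d\theta \qquad (9.16) \]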
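In the case of a Gaussian variate X, with known σ and with μ estimated from sample data,
\[ f_X(x) = \int_{-\infty}^{\infty} f_X(x \mid \mu)\, f''(\mu)\, d\mu \]
where f_X(x | μ) = N_x(μ, σ), and f''(μ) is given by Eq. 9.13. Again it can be shown (e.g., Tang, 1971) that the last integral above yields the normal PDF
\[ f_X(x) = N_x\left( \mu'', \sqrt{\sigma^2 + (\sigma'')^2} \right) \qquad (9.17) \]
in which μ'' = x̄ and σ'' = σ/√n when f''(μ) is given by Eq. 9.13.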
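Eq. 9.17 can be confirmed numerically by evaluating the integral for f_X(x) on a grid and comparing it with the closed-form normal PDF. This sketch uses Python's standard `statistics.NormalDist`. The function names are hypothetical, and the numbers (σ = 164, μ'' = 1535, σ'' = 51.9) are those used in Example 9.7 with a diffuse prior.

```python
from math import sqrt
from statistics import NormalDist


def predictive_pdf_numeric(x, sigma, mu_post, sd_post, n=20_000, width=10.0):
    """Midpoint-rule value of f_X(x) = integral of N_x(mu, sigma) N_mu(mu'', sigma'') d mu."""
    posterior = NormalDist(mu_post, sd_post)
    lo = mu_post - width * sd_post
    d = 2.0 * width * sd_post / n
    total = 0.0
    for i in range(n):
        mu = lo + (i + 0.5) * d
        total += NormalDist(mu, sigma).pdf(x) * posterior.pdf(mu) * d
    return total


def predictive_pdf_closed(x, sigma, mu_post, sd_post):
    """Eq. 9.17: N_x(mu'', sqrt(sigma^2 + sigma''^2))."""
    return NormalDist(mu_post, sqrt(sigma**2 + sd_post**2)).pdf(x)


for x in (1400.0, 1535.0, 1700.0):
    print(x, predictive_pdf_numeric(x, 164.0, 1535.0, 51.9),
          predictive_pdf_closed(x, 164.0, 1535.0, 51.9))
```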
► EXAMPLE 9.7 A toll bridge was recently opened to traffic. Records of rush-hour traffic during the last 10 work days (the past 2 weeks) showed a sample mean of 1535 vehicles per hour (vph). Suppose that rush-hour traffic has a normal distribution with a standard deviation of 164 vph. Based on this observed information, the posterior distribution of the mean rush-hour traffic μ is, according to Eq. 9.13, N(1535, 164/√10), or N(1535, 51.9) vph. The point estimate of μ, therefore, is 1535 vph.
The probability that μ will be between 1500 and 1600 vph is given by
\[ P(1500 < \mu \le 1600) = \Phi\left( \frac{1600 - 1535}{51.9} \right) - \Phi\left( \frac{1500 - 1535}{51.9} \right) = \Phi(1.252) - \Phi(-0.674) \approx 0.895 - 0.250 = 0.645 \]
Of greater interest are probabilities associated with the rush-hour traffic (rather than its mean) on a given work day. Suppose that, under the present toll collection procedure, problems could arise if the rush-hour traffic exceeds 1700 vph on a given day. Then the probability that this will occur on any given day, based on Eq. 9.17, is
$$P(X > 1700) = 1 - \Phi\left(\frac{1700 - 1535}{\sqrt{(164)^2 + (51.9)^2}}\right) = 1 - \Phi(0.959) = 0.169$$
In other words, on about 17% of the work days the present toll collection system will be inadequate during rush hours. Observe that the error in the estimation of μ has been included in computing this probability.
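The predictive calculation above can be reproduced with Python's standard-library `statistics.NormalDist`. This sketch assumes the diffuse-prior case, in which the predictive PDF of X is normal with mean x̄ and standard deviation √(σ² + σ²/n). `prob_exceed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def prob_exceed(threshold, sample_mean, sigma, n):
    """P(X > threshold) from the predictive normal PDF of X.

    The predictive standard deviation sqrt(sigma^2 + sigma^2/n) adds the
    estimation error of the mean (sigma/sqrt(n)) to the basic scatter sigma.
    """
    pred_sd = math.sqrt(sigma ** 2 + sigma ** 2 / n)
    return 1.0 - NormalDist(sample_mean, pred_sd).cdf(threshold)

# Example 9.7: x-bar = 1535 vph, sigma = 164 vph, n = 10 work days
print(round(prob_exceed(1700, 1535, 164, 10), 3))   # 0.169
```

If the estimation error is ignored and σ = 164 vph is used alone, the same threshold gives a somewhat smaller exceedance probability. That difference is the effect the text points out.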
Now suppose that, before the toll bridge was opened to traffic, simulation modeling was performed to predict the rush-hour traffic on the bridge. Based on the simulation results alone, the mean rush-hour traffic on a work day was estimated to be 1500 ± 100 vph with 90% confidence. How can this information be combined with the observed traffic flow in the estimation of μ?
Assuming a Gaussian prior, the half-width of the central 90% interval is 1.645σ', so σ' = 100/1.645 = 60.8 vph, and the prior distribution of the mean rush-hour traffic μ is N(1500, 60.8) vph. Then, applying Eqs. 9.14 and 9.15, the posterior distribution of μ is also Gaussian with
$$\mu'' = \frac{1535(60.8)^2 + 1500(51.9)^2}{(60.8)^2 + (51.9)^2} = 1520 \text{ vph}$$
and
$$\sigma'' = \sqrt{\frac{(60.8)^2(51.9)^2}{(60.8)^2 + (51.9)^2}} = 39.5 \text{ vph}$$
Therefore, by incorporating the result of simulation, the estimated mean rush-hour traffic is 1520 vph
and the corresponding standard deviation is 39.5 vph. ◄
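Two numerical steps in this example can be checked as below. The first converts "1500 ± 100 with 90% confidence" into a prior standard deviation. The second evaluates the probability that μ lies between 1500 and 1600 vph under the data-only posterior N(1535, 51.9). The sketch assumes that the ± range is a symmetric central 90% interval of the Gaussian prior.

```python
from statistics import NormalDist

std_normal = NormalDist()                  # standard normal N(0, 1)

# Central 90% interval 1500 +/- 100: half-width = z_0.95 * sigma'
z95 = std_normal.inv_cdf(0.95)             # about 1.645
prior_sd = 100.0 / z95
print(round(prior_sd, 1))                  # 60.8

# P(1500 < mu <= 1600) under the data-only posterior N(1535, 51.9)
post = NormalDist(1535.0, 51.9)
p_interval = post.cdf(1600.0) - post.cdf(1500.0)
print(round(p_interval, 3))                # 0.645
```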
► EXAMPLE 9.8 Five repeated measurements of the elevation (relative to a fixed datum) of the top of a bridge pier under construction were made, as follows:
20.45 m
20.38 m
20.51 m
20.42 m
20.46 m
Assume that the measurement error is Gaussian with zero mean and a standard deviation of
0.08 m.
(a) Estimate the actual elevation of the pier based on the given measurements.
(b) Suppose that the elevation of the pier was previously measured by another surveying crew; the
elevation was estimated to be 20.42 ± 0.02 m (that is, the mean measurement was 20.42 m with a
standard error of 0.02 m). Estimate the elevation of the pier, taking advantage of this additional prior
information.
SOLUTION The estimation of an actual dimension δ in surveying and photogrammetry is equivalent to the estimation of the mean value of a random variable (see Sect. 6.2.3). Measurement error is invariably assumed to be Gaussian with zero mean; this tacitly means that a set of measurements constitutes a sample from a normal population. Therefore, the results derived in Sect. 9.4.2 are applicable to the estimation of geometric quantities in surveying and photogrammetry.
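(a) The sample mean of the five measurements is
$$\bar{x} = \frac{20.45 + 20.38 + 20.51 + 20.42 + 20.46}{5} = 20.444 \text{ m}$$
With no prior information, the posterior distribution of δ is, according to Eq. 9.13, N(20.444, 0.08/√5) = N(20.444, 0.036) m; hence, the estimated elevation is 20.444 m.
(b) Treating the previous survey as the prior distribution N(20.42, 0.020) m, Eqs. 9.14 and 9.15 give the posterior mean and standard deviation of δ as
$$\delta'' = \frac{20.444(0.020)^2 + 20.42(0.036)^2}{(0.020)^2 + (0.036)^2} = 20.426 \text{ m}$$
and
$$\sigma''_{\delta} = \sqrt{\frac{(0.036)^2(0.020)^2}{(0.036)^2 + (0.020)^2}} = 0.017 \text{ m}$$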
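A short script reproducing Example 9.8 follows. It assumes independent measurement errors with the stated 0.08-m standard deviation and treats the earlier survey as a normal prior N(20.42, 0.02). The variable names are ours.

```python
import math
from statistics import mean

readings = [20.45, 20.38, 20.51, 20.42, 20.46]   # elevations, m
sigma_err = 0.08                                  # s.d. of one measurement, m

# (a) data only: posterior N(x_bar, sigma/sqrt(n))
x_bar = mean(readings)
se = sigma_err / math.sqrt(len(readings))         # standard error of x_bar
print(round(x_bar, 3), round(se, 3))              # 20.444 0.036

# (b) combine with the earlier survey, treated as the prior N(20.42, 0.02)
prior_mean, prior_sd = 20.42, 0.02
w_data = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)   # weight on x_bar (Eq. 9.14)
post_mean = w_data * x_bar + (1 - w_data) * prior_mean
post_sd = math.sqrt(prior_sd ** 2 * se ** 2 / (prior_sd ** 2 + se ** 2))
print(round(post_mean, 3), round(post_sd, 3))     # 20.426 0.017
```

The earlier survey receives roughly three-quarters of the weight because its standard error (0.020 m) is smaller than that of the new mean (about 0.036 m).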
► EXAMPLE 9.9 The annual maximum flow of a stream has been recorded for the last 5 years as follows:
Based on extensive data from nearby streams, it was determined that the annual maximum stream flow may be modeled by a lognormal distribution. Assume that the parameter ζ of the lognormal distribution may be estimated from the above five sample values as follows. The natural logarithms of the above data values are, respectively, 3.07, 2.96, 3.15, 3.00, and 2.90, from which we obtain the sample mean x̄ = 3.016 and the sample standard deviation ζ = 0.097.
Without any prior information, the posterior distribution of λ, according to Eq. 9.13, is N(x̄, ζ/√n), or N(3.016, 0.097/√5) = N(3.016, 0.043). However, if prior information is available, it can be incorporated through the prior distribution of λ. For example, suppose that f'(λ) is assumed to be N(2.9, 0.06); then, from Eqs. 9.14 and 9.15, the posterior distribution f''(λ) will be normal with
$$\lambda'' = \frac{3.016(0.06)^2 + 2.9(0.043)^2}{(0.06)^2 + (0.043)^2} = 2.98$$
and
$$\sigma''_{\lambda} = \sqrt{\frac{(0.06)^2(0.043)^2}{(0.06)^2 + (0.043)^2}} = 0.035$$
That is, in this latter case, the posterior distribution of λ is N(2.98, 0.035).
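Example 9.9 can be reproduced from the five log-values alone. Following the text, the sketch treats the sample standard deviation of the logarithms as the known ζ, which is an approximation, and uses the prior N(2.9, 0.06) for λ.

```python
import math
from statistics import mean, stdev

log_flows = [3.07, 2.96, 3.15, 3.00, 2.90]   # ln of the five annual maxima
n = len(log_flows)
lam_hat = mean(log_flows)                    # sample estimate of lambda
zeta = stdev(log_flows)                      # treated as the known zeta
se = zeta / math.sqrt(n)                     # s.d. of the data-only posterior
print(round(lam_hat, 3), round(zeta, 3), round(se, 3))   # 3.016 0.097 0.043

# Normal prior on lambda, N(2.9, 0.06), combined via Eqs. 9.14 and 9.15
prior_mean, prior_sd = 2.9, 0.06
post_mean = (lam_hat * prior_sd ** 2 + prior_mean * se ** 2) / (prior_sd ** 2 + se ** 2)
post_sd = math.sqrt(prior_sd ** 2 * se ** 2 / (prior_sd ** 2 + se ** 2))
print(round(post_mean, 2), round(post_sd, 3))            # 2.98 0.035
```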
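Table 9.1 (first columns: basic random variable and the conjugate distribution of its parameter)
Exponential: $f_X(x) = \lambda e^{-\lambda x}$; parameter λ; conjugate distribution: gamma, $f_\Lambda(\lambda) = \dfrac{v(v\lambda)^{k-1}e^{-v\lambda}}{\Gamma(k)}$
Normal (with known σ): $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^2\right]$; parameter μ; conjugate distribution: normal, $f_M(\mu) = \dfrac{1}{\sqrt{2\pi}\,\sigma_\mu}\exp\left[-\dfrac{1}{2}\left(\dfrac{\mu-\mu_\mu}{\sigma_\mu}\right)^2\right]$
Normal (μ and σ unknown): parameters μ and σ; conjugate distribution: Gamma-Normal, with parameters m, u, n, and v, in which, conditional on σ, μ is normal with mean m and standard deviation σ/√n
Poisson: $p_X(x) = \dfrac{(\mu t)^x}{x!}e^{-\mu t}$; parameter μ; conjugate distribution: gamma, $f_M(\mu) = \dfrac{v(v\mu)^{k-1}e^{-v\mu}}{\Gamma(k)}$
Lognormal (with known ζ): $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\zeta x}\exp\left[-\dfrac{1}{2}\left(\dfrac{\ln x-\lambda}{\zeta}\right)^2\right]$; parameter λ; conjugate distribution: normal, N(μ_λ, σ_λ)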
► EXAMPLE 9.10 The occurrence of flaws in a welded joint may be modeled by a Poisson process with a mean occurrence rate of μ flaws per meter of weld. Actual observations performed with a powerful device (assume that it would not miss any significant flaw) detected five flaws in a 9.2-m weld. However, previous experience with similar types of welds shows that the mean flaw rate varies with the quality of workmanship on a specific job. Based on this information, the engineer believes that the mean flaw rate in the present job has a mean of 0.5 flaw/m with a c.o.v. of 40%. Determine the mean and c.o.v. of μ for this type of weld, using the observed data as well as the information from prior experience.
Since the number of flaws in a given weld length is described by the Poisson distribution, it is convenient, according to Table 9.1, to prescribe its conjugate gamma distribution as the prior distribution for the parameter μ. From the information given above, and using the mean and variance of the gamma distribution from Sect. 3.2.8 (see also Column 4 of Table 9.1), we have the mean of μ
$$E'(\mu) = \frac{k'}{v'} = 0.5$$
and the c.o.v. of μ
$$\delta'(\mu) = \frac{\sqrt{k'/v'^2}}{k'/v'} = \frac{1}{\sqrt{k'}} = 0.40$$
Thus, the prior parameters of the gamma distribution are k' = 6.25 and v' = 12.5.
It follows that the posterior distribution of μ is also gamma distributed. From the relationship given in Table 9.1 (Column 5) between the prior and posterior statistics and the sample data, we evaluate the parameters k'' and v'' of the posterior gamma distribution as follows:
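$$k'' = k' + x = 6.25 + 5 = 11.25, \qquad v'' = v' + t = 12.5 + 9.2 = 21.7$$
Hence, the posterior mean and c.o.v. of μ are
$$E''(\mu) = \frac{k''}{v''} = \frac{11.25}{21.7} = 0.518 \text{ flaw/m}, \qquad \delta''(\mu) = \frac{1}{\sqrt{11.25}} = 0.30$$ ◄

Table 9.1 (continued; Column 4: mean and variance of the conjugate distribution; Column 5: posterior parameters in terms of the prior parameters and the sample statistics)
Normal (with known σ): $E(\mu) = \mu_\mu$, $\mathrm{Var}(\mu) = \sigma_\mu^2$; $\mu''_\mu = \dfrac{\mu'_\mu(\sigma^2/n) + \bar{x}(\sigma'_\mu)^2}{\sigma^2/n + (\sigma'_\mu)^2}$, $(\sigma''_\mu)^2 = \dfrac{(\sigma'_\mu)^2(\sigma^2/n)}{(\sigma'_\mu)^2 + \sigma^2/n}$
Normal (μ and σ unknown), Gamma-Normal: $E(\mu) = m$, $\mathrm{Var}(\mu) = \dfrac{u}{n}\,\dfrac{v}{v-2}$, $\mathrm{Var}(\sigma) = \dfrac{v u}{v-2} - E^2(\sigma)$ for v > 2; $n'' = n' + n$, $m'' = \dfrac{n'm' + n\bar{x}}{n''}$, $v'' = v' + v + 1$, $u'' = \dfrac{(v'u' + n'm'^2) + (v s^2 + n\bar{x}^2) - n''m''^2}{v''}$, where for the sample v = n − 1
Poisson, gamma: $E(\mu) = k/v$, $\mathrm{Var}(\mu) = k/v^2$; $k'' = k' + x$, $v'' = v' + t$ (x occurrences observed in t)
Lognormal (with known ζ), normal: $E(\lambda) = \mu_\lambda$, $\mathrm{Var}(\lambda) = \sigma_\lambda^2$; $\mu''_\lambda = \dfrac{\mu'_\lambda(\zeta^2/n) + (\sigma'_\lambda)^2\,\overline{\ln x}}{\zeta^2/n + (\sigma'_\lambda)^2}$, $(\sigma''_\lambda)^2 = \dfrac{(\sigma'_\lambda)^2(\zeta^2/n)}{(\sigma'_\lambda)^2 + \zeta^2/n}$, with $\overline{\ln x} = \frac{1}{n}\sum \ln x_i$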
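The gamma bookkeeping in Example 9.10 is sketched below. It matches moments to set up the prior, then applies k'' = k' + x and v'' = v' + t. `gamma_prior_from_moments` is a hypothetical helper that inverts E(μ) = k/v and δ(μ) = 1/√k.

```python
import math

def gamma_prior_from_moments(mean, cov):
    """Gamma parameters (k, v) from mean = k/v and c.o.v. = 1/sqrt(k)."""
    k = 1.0 / cov ** 2
    v = k / mean
    return k, v

k1, v1 = gamma_prior_from_moments(0.5, 0.40)   # prior: 0.5 flaw/m, c.o.v. 40%
print(round(k1, 2), round(v1, 2))              # 6.25 12.5

# Poisson observation: x = 5 flaws in t = 9.2 m of weld
x, t = 5, 9.2
k2, v2 = k1 + x, v1 + t                        # k'' = k' + x, v'' = v' + t
print(round(k2 / v2, 3), round(1 / math.sqrt(k2), 2))   # 0.518 0.3
```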
► EXAMPLE 9.11 Suppose the maximum water elevation of a river, H, during a flood can be described by an exponential distribution as follows:
$$f_H(h) = \lambda e^{-\lambda h}, \qquad h \ge 0$$
No official record of flood occurrences exists at a given site. However, the local inhabitants recall that only two floods have occurred in recent years, both with elevations of at least 5 ft. Assume that the elevations of different floods are statistically independent, and that there is no other information for estimating the parameter λ.
(a) Determine the distribution of λ, and its mean value and c.o.v., based on the given information.
(b) What is the probability that the next flood will exceed 5 ft?
(c) After a period of time, three floods occurred at the given site, and the maximum water elevations were recorded as 3, 4, and 5 ft, respectively. With this new information, what would be the updated distribution of λ, its mean value, and c.o.v.?
SOLUTIONS (a) Since the observed information is an exceedance of a given value (rather than an exact value), the conjugate approach to Bayesian updating is not applicable here. We need to start with the basic updating equation, Eq. 9.5. The likelihood function in this case is the probability, as a function of λ, that the water elevation exceeds 5 ft in two floods. For one flood, the probability of the water elevation exceeding 5 ft is
$$p = P(H > 5) = 1 - F_H(5) = e^{-5\lambda}$$
Hence, with ε = two floods exceeding 5 ft, the likelihood function becomes
$$L(\lambda) = P(\varepsilon \mid \lambda) = p^2 = \left(e^{-5\lambda}\right)^2 = e^{-10\lambda}$$
By assuming a diffuse prior distribution in Eq. 9.5, the updated distribution for λ is
$$f''(\lambda) = k e^{-10\lambda} = 10 e^{-10\lambda}, \qquad \lambda \ge 0$$
Hence, f''(λ) is another exponential distribution, with a parameter of 10. The corresponding mean and c.o.v. are 0.1 and 1, respectively. Note that this is also a gamma distribution with parameters v = 10 and k = 1 (see Table 9.1).
(b) The probability that the next flood will exceed 5 ft is determined by integrating over all possible values of λ, yielding the updated probability
$$P''(H > 5) = \int_0^\infty P(H > 5 \mid \lambda)\, f''(\lambda)\, d\lambda = \int_0^\infty e^{-5\lambda}\left(10 e^{-10\lambda}\right) d\lambda = \frac{10}{15}\int_0^\infty 15 e^{-15\lambda}\, d\lambda = 0.667$$
(c) Since the basic random variable H, the flood level, is exponentially distributed, the gamma distribution (with v' = 10 and k' = 1) obtained in Part (a) serves as a suitable prior distribution for the parameter λ in this part of the updating. Applying the formula in Table 9.1, the posterior distribution of λ also follows the gamma distribution, with
$$k'' = k' + n = 1 + 3 = 4, \qquad v'' = v' + \sum h_i = 10 + (3 + 4 + 5) = 22$$
whose mean and c.o.v. are, respectively, 4/22 = 0.182 and 1/√4 = 0.5. ◄
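Example 9.11 can also be checked in closed form. The sketch below relies on the standard identity E[e^{−λh}] = (v/(v + h))^k for the gamma PDF of Table 9.1. That identity gives the integral in part (b) directly. `exceed_prob` is a hypothetical name.

```python
import math

def exceed_prob(h, k, v):
    """P(H > h) averaged over a gamma(k, v) distribution of lambda.

    Uses E[exp(-lambda*h)] = (v / (v + h))**k for the gamma PDF
    f(lambda) = v (v lambda)^(k-1) exp(-v lambda) / Gamma(k).
    """
    return (v / (v + h)) ** k

# (a) two floods known only to exceed 5 ft: L(lambda) = exp(-10 lambda),
#     so with a diffuse prior the posterior is gamma with k = 1, v = 10.
k_a, v_a = 1, 10
print(round(exceed_prob(5, k_a, v_a), 3))   # 0.667

# (c) three recorded floods of 3, 4 and 5 ft: k'' = k' + n, v'' = v' + sum(h)
levels = [3, 4, 5]
k_c, v_c = k_a + len(levels), v_a + sum(levels)
print(k_c, v_c, round(k_c / v_c, 3), 1 / math.sqrt(k_c))   # 4 22 0.182 0.5
```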
θ2. If θ1 and θ2 are statistically independent, f'(θ1, θ2) can be expressed as f'(θ1) f'(θ2) to facilitate the assessment of the prior joint distribution. However, as will be shown later in an example, θ1 and θ2 may not remain statistically independent after the observed information is incorporated through the likelihood function. Again, k is the normalizing constant that ensures the posterior joint distribution f''(θ1, θ2) is a proper PDF. It is determined from
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f''(\theta_1, \theta_2)\, d\theta_1\, d\theta_2 = 1$$
or
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} k\, L(\theta_1, \theta_2)\, f'(\theta_1, \theta_2)\, d\theta_1\, d\theta_2 = 1 \qquad (9.19)$$
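With the joint PDF of θ1 and θ2 defined, the respective marginal distributions of θ1 and θ2 can be determined according to Eqs. 3.67 and 3.68 as
$$f''(\theta_1) = \int_{-\infty}^{\infty} f''(\theta_1, \theta_2)\, d\theta_2 \qquad (9.20a)$$
$$f''(\theta_2) = \int_{-\infty}^{\infty} f''(\theta_1, \theta_2)\, d\theta_1 \qquad (9.20b)$$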
► EXAMPLE 9.12 Boulders are frequently found embedded in a certain type of soil stratum. The occurrence of these boulders may be modeled as a Poisson process in a spatial domain, with a mean rate μ per unit volume of the soil deposit. To gain information on the density of boulders in a given deposit, an engineer has driven a rod (of negligible cross-sectional area) to a depth of 40 ft into the stratum. If the rod has not encountered any boulder, what can be said about the mean rate μ of boulders in the soil deposit?
It may be obvious that the size of the boulders would also play a role in interpreting the observed result. To simplify the problem, we may assume that the boulders are spherical and all have the same radius R at this site. However, R is highly variable between sites. Prior to the penetration test, the radius R of the boulders is believed to be uniformly distributed between 1 and 4 ft, whereas no other information is available on the parameter μ. We would like to determine the updated distributions of μ and R in light of the penetration test.
The ordinate of the prior PDF of the radius R is a constant 1/3 over the range from 1 to 4 ft. By assuming a diffuse prior distribution for μ, f'(μ) may simply be absorbed into the normalizing constant. The event of not encountering any boulder along the rod is equivalent to not finding any boulder whose center is within a distance R = r of the rod. Hence, the likelihood function is the probability of zero occurrences in a cylindrical volume of πr²(40). Since the centers of the boulders occur according to a Poisson process with a mean rate of μ per unit volume of soil, the likelihood function is
$$L(\mu, r) = e^{-40\pi r^2 \mu}$$
so that f''(μ, r) = k(1/3)e^{−40πr²μ} for μ ≥ 0 and 1 ≤ r ≤ 4 ft. The normalizing constant follows from
$$\frac{1}{k} = \frac{1}{3}\int_1^4 \int_0^\infty e^{-40\pi r^2 \mu}\, d\mu\, dr = \frac{1}{3}\int_1^4 \frac{dr}{40\pi r^2} = \frac{1}{40\pi(4)}$$
that is, k = 40π(4) = 160π.
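Hence,
$$f''(r) = \int_0^\infty \frac{160\pi}{3} e^{-40\pi r^2 \mu}\, d\mu = \frac{4}{3r^2}, \qquad 1 \le r \le 4$$
and
$$E''(R) = \int_1^4 r \cdot \frac{4}{3r^2}\, dr = \frac{4}{3}\ln 4 = 1.85 \text{ ft}$$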
The prior and the updated posterior marginal distributions of the radius R are as shown in Fig. E9.12a. Therefore, based on the observed event of "no encounter by a 40-ft rod," the expected boulder radius reduces from 2.5 ft to 1.85 ft.
Similarly, the marginal distribution of the mean rate μ can be determined, through a transformation of variable, as follows:
$$f''(\mu) = \int_1^4 \frac{160\pi}{3} e^{-40\pi r^2 \mu}\, dr = \frac{160\pi}{3\sqrt{40\mu}}\left[\Phi\left(4\sqrt{80\pi\mu}\right) - \Phi\left(\sqrt{80\pi\mu}\right)\right], \qquad \mu > 0$$
which is a decreasing function, as shown in Fig. E9.12b, from which we obtain the corresponding updated expected mean boulder rate of 0.0035 per cubic foot. To confirm a lower frequency and a smaller size of boulders in this soil stratum, further observations of no encounter by one or more rods with deeper penetration would be needed. ◄
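The integrals in Example 9.12 can be checked numerically. For fixed r, the posterior in μ is exponential with rate 40πr², so the sketch below integrates out μ analytically and applies Simpson's rule over r. The `simpson` helper and the other names are ours, not from the text.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

DEPTH = 40.0              # rod penetration, ft
R_LO, R_HI = 1.0, 4.0     # uniform prior range of the boulder radius, ft

def rate(r):
    """Poisson 'volume' 40*pi*r^2, so that L(mu, r) = exp(-rate(r) * mu)."""
    return DEPTH * math.pi * r ** 2

def unnorm_r(r):
    # prior ordinate 1/3 times the integral of exp(-rate(r)*mu) over mu >= 0
    return (1.0 / (R_HI - R_LO)) / rate(r)

norm = simpson(unnorm_r, R_LO, R_HI)          # equals 1/k = 1/(160*pi)

def post_r(r):
    """Posterior marginal PDF of R; analytically 4/(3 r^2)."""
    return unnorm_r(r) / norm

e_r = simpson(lambda r: r * post_r(r), R_LO, R_HI)
# Given r, mu is exponential with mean 1/rate(r); average over f''(r)
e_mu = simpson(lambda r: post_r(r) / rate(r), R_LO, R_HI)
print(round(1 / (norm * math.pi)), round(e_r, 2), round(e_mu, 4))   # 160 1.85 0.0035
```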
Geologic anomalies in a soil deposit are often causes of geotechnical failures. Hence, the detection of these anomalies through a site exploration program is important. As demonstrated in Example 9.12, Bayesian methods can systematically combine observed site exploration results with available judgmental information to infer the characteristics of relevant anomalies, such as their occurrence probability or frequency and their size distributions. Further literature on this subject can be found in Tang et al. (1983, 1986, 1990, 1991, and 1993).
► EXAMPLE 9.13 A set of 10 tests of the bending strength of Douglas fir lumber shows a sample mean of 6 ksi and a sample standard deviation of 1.2 ksi. Prior to these tests, and based on previous experience with this kind of material, the engineer believed that the mean bending strength of this species of wood has a mean of 5 ksi and a c.o.v. of 16.3%, and that the standard deviation of the bending strength has a mean of 1.253 ksi and a c.o.v. of 52.3%. Assume that the bending strength follows a normal distribution N(μ, σ).
The engineer now wishes to update these prior estimates of the parameters μ and σ by combining the observed test results with his previous estimates. For simplicity, we will apply the conjugate approach to the updating process. Since both μ and σ need to be estimated, the joint Gamma-Normal distribution may be used as a convenient prior for this purpose, according to Table 9.1. The Gamma-Normal distribution is defined by four parameters, namely, m, u, n, and v; for a sample, n is the sample size and v = n − 1.
The first step is to evaluate the prior parameters consistent with the engineer's prior experience. In summary, the prior information is as follows: E'(μ) = 5; δ'(μ) = 0.163; E'(σ) = 1.253; and δ'(σ) = 0.523. Using the formulas in Column 4 of Table 9.1, we evaluate the values of m', u', n', and v' to be 5, 1, 3, and 4, respectively.
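In the next step, we use the formulas in Column 5 of Table 9.1 to combine the prior and the observed information. Recall that the observed statistics are n = 10, x̄ = 6, s = 1.2, and v = n − 1 = 9. Hence, we have
$$n'' = n' + n = 3 + 10 = 13$$
and
$$m'' = \frac{n'm' + n\bar{x}}{n''} = \frac{(3)(5) + (10)(6)}{13} = 5.769$$
Also,
$$v'' = v' + v + 1 = 4 + 9 + 1 = 14$$
and
$$u'' = \frac{(v'u' + n'm'^2) + (v s^2 + n\bar{x}^2) - n''m''^2}{v''} = \frac{(4 + 75) + (12.96 + 360) - 432.69}{14} = 1.376$$
Finally, the posterior statistics may be evaluated using the formulas in Column 4 of Table 9.1, yielding
$$E''(\mu) = m'' = 5.769 \text{ ksi}, \qquad \delta''(\mu) = \frac{\sqrt{(u''/n'')\,v''/(v''-2)}}{m''} = \frac{\sqrt{(1.376/13)(14/12)}}{5.769} = 0.061$$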
The updated mean of 5.769 ksi lies between the prior estimate and the sample mean. It may be observed that the uncertainty level, represented by the c.o.v. of μ, is reduced from 16.3% to 6.1%. For the standard deviation σ, further manipulation with the formulas in Column 4 of Table 9.1 yields an updated mean and standard deviation of 1.242 and 0.257 ksi, respectively. Hence, the uncertainty level (c.o.v.) of σ is also reduced, from 52.3% to 20.7%. ◄
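The Gamma-Normal calculations of Example 9.13 are sketched below. The moment formulas are those of the normal-gamma model, in which 1/σ² is gamma distributed with shape v/2 and rate vu/2. In particular, E(σ) = √(vu/2) Γ((v − 1)/2)/Γ(v/2) is our assumption for the Column 4 entry. We chose it because it returns the prior statistics quoted in the example. The function names are hypothetical.

```python
import math

def gamma_normal_moments(m, u, n, v):
    """Return E(mu), c.o.v. of mu, E(sigma), c.o.v. of sigma (needs v > 2)."""
    var_mu = (u / n) * v / (v - 2)
    # E(sigma) = sqrt(v u / 2) * Gamma((v - 1)/2) / Gamma(v/2), via log-gamma
    e_sigma = math.sqrt(v * u / 2) * math.exp(math.lgamma((v - 1) / 2) - math.lgamma(v / 2))
    var_sigma = v * u / (v - 2) - e_sigma ** 2
    return m, math.sqrt(var_mu) / m, e_sigma, math.sqrt(var_sigma) / e_sigma

def gamma_normal_update(m1, u1, n1, v1, x_bar, s, n):
    """Posterior (m'', u'', n'', v'') from a normal sample of size n."""
    v = n - 1
    n2 = n1 + n
    m2 = (n1 * m1 + n * x_bar) / n2
    v2 = v1 + v + 1
    u2 = ((v1 * u1 + n1 * m1 ** 2) + (v * s ** 2 + n * x_bar ** 2) - n2 * m2 ** 2) / v2
    return m2, u2, n2, v2

prior = (5.0, 1.0, 3, 4)                     # m', u', n', v' from the example
print([round(q, 3) for q in gamma_normal_moments(*prior)])   # [5.0, 0.163, 1.253, 0.523]

post = gamma_normal_update(*prior, x_bar=6.0, s=1.2, n=10)
print([round(q, 3) for q in gamma_normal_moments(*post)])
```

The posterior line prints about [5.769, 0.061, 1.241, 0.206]. The small differences from the 1.242 ksi and 20.7% quoted in the example are presumably due to rounding in the hand calculation.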
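$$E(Y \mid x) = \alpha + \beta x \qquad (9.22)$$
in which the parameters α, β, and σ² are estimated from a set of sampled data. Based on the method of least squares, the formulas for estimating these parameters are
$$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \qquad (9.24)$$
and
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \qquad (9.25)$$
$$E(Y \mid x) = \hat{\alpha} + \hat{\beta} x \qquad (9.26)$$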
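Eqs. 9.24 and 9.25 translate directly into code. The following is a plain-Python sketch; `least_squares` is a hypothetical name, and the check uses made-up points lying exactly on y = 1 + 2x.

```python
def least_squares(xs, ys):
    """Least-squares estimates (alpha, beta) for E(Y|x) = alpha + beta*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    beta = sxy / sxx                       # Eq. 9.24
    alpha = y_bar - beta * x_bar           # Eq. 9.25
    return alpha, beta

# Hypothetical data lying exactly on y = 1 + 2x
print(least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))   # (1.0, 2.0)
```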
and
By combining the likelihood function with an assumed prior distribution of the parameters according to Eq. 9.18, the updated distribution of the parameters is
$$f''(\alpha, \beta, \sigma^2) = k\,L(\alpha, \beta, \sigma^2)\, f'(\alpha, \beta, \sigma^2) \qquad (9.29)$$
which can subsequently be used to determine the updated distribution of the mean-value function, θ_x = E(Y|x), as follows:
in which N_Y(α + βx, σ) denotes a normal PDF for Y with mean (α + βx) and standard deviation σ. The above formulation is generally difficult to carry out analytically, although the use of compatible prior distributions could be helpful. Numerical procedures would generally be required to obtain the pertinent results. A set of useful results is presented in Tang (1980) for the case of data-based prediction (i.e., no prior information on the regression coefficients, including σ), as follows:
$$\mathrm{Var}(\theta_x) = \frac{n-1}{n-3}\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{(n-1)s_x^2}\right]\hat{\sigma}^2 \qquad (9.33)$$
in which α̂, β̂, and σ̂² are estimated from the observed statistics according to Eqs. 9.24 through 9.27, and s_x² is the sample variance of the x_i. Observe that the (epistemic) uncertainty in the regression line depends on the degree of extrapolation, as measured by (x − x̄)²/s_x², and on the number of data points n used in evaluating the regression line. A larger extrapolation or a smaller sample size will result in a larger variance Var(θ_x). The factor (n − 1)/(n − 3) accounts for the contribution of the uncertainty in the estimate of the basic scatter σ²; its effect diminishes as the sample size increases. Finally, these uncertainties in the parameters can be combined with the basic scatter in evaluating the overall uncertainty in the predicted value of Y for a given value of x. Indeed, by using a formula presented in Raiffa and Schlaifer (1961) for a general normal regression process, Tang (1980) has shown that
$$\mathrm{Var}(Y \mid x) = \frac{n-1}{n-3}(1 + \gamma)\hat{\sigma}^2 \qquad (9.34)$$
in which γ stands for the quantity within the square brackets in Eq. 9.33. It represents the uncertainty contributed by the variance in the mean-value function or regression equation. In the extreme case, when n goes to infinity, Var(Y|x) approaches σ², as expected, since the epistemic uncertainty vanishes. The expected value of Y remains that given by the regression equation of Eq. 9.32.
► EXAMPLE 9.14 Consider the precipitation–runoff data presented in Example 8.2, for which the values of α, β, and σ² have been determined to be α = −0.14, β = 0.435, and σ² = 0.075. Hence, for a storm with precipitation x (in.), the runoff will have a mean
$$E(Y \mid x) = -0.14 + 0.435x$$
As an example, for a storm with 4 in. of precipitation, the runoff would have a mean of
$$E(Y \mid x = 4) = -0.14 + (0.435)(4) = 1.60 \text{ in.}$$
and a variance of
$$\mathrm{Var}(Y \mid x = 4) = \frac{24}{22}\left(1 + \frac{1}{25}\left[1 + \frac{25(4 - 2.16)^2}{(24)(36.8)}\right]\right)(0.075) = 0.0854 \text{ in}^2$$
If the value of the basic scatter σ² can be assumed to be equal to 0.075, there is no contribution from the uncertainty in σ². Hence, Var(Y|x) in Eq. 9.34 becomes (1 + γ)(0.075), yielding 0.078 for the variance of Y at x = 4. Moreover, the runoff Y will then be normally distributed. The probability that the runoff will exceed 2 in. in this storm with 4 in. of precipitation may be determined as
$$P(Y > 2 \mid x = 4) = 1 - \Phi\left(\frac{2 - 1.60}{\sqrt{0.078}}\right) = 1 - \Phi(1.43) = 0.076$$
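The numbers of Example 9.14 can be reproduced with the sketch below. It assumes that the bracketed term of Eq. 9.33 is 1/n + (x − x̄)²/((n − 1)s_x²), and it takes x̄ = 2.16 and s_x² = 36.8 as read from the example's arithmetic. `regression_prediction` is a hypothetical function. Its `known_sigma` switch drops the factor (n − 1)/(n − 3) when σ² is treated as known.

```python
import math
from statistics import NormalDist

def regression_prediction(x, alpha, beta, sigma2, n, x_bar, s2_x, known_sigma=False):
    """Mean of Y at x, Var(theta_x) (Eq. 9.33) and Var(Y|x) (Eq. 9.34).

    gamma = 1/n + (x - x_bar)^2 / ((n - 1) s2_x) is the bracketed term;
    the factor (n-1)/(n-3) is dropped when sigma^2 is taken as known.
    """
    gamma = 1.0 / n + (x - x_bar) ** 2 / ((n - 1) * s2_x)
    factor = 1.0 if known_sigma else (n - 1) / (n - 3)
    mean = alpha + beta * x
    var_theta = factor * gamma * sigma2        # epistemic part only
    var_y = factor * (1.0 + gamma) * sigma2    # adds the basic scatter
    return mean, var_theta, var_y

args = dict(alpha=-0.14, beta=0.435, sigma2=0.075, n=25, x_bar=2.16, s2_x=36.8)
m, v_theta, v_y = regression_prediction(4.0, **args)
print(round(m, 2), round(v_theta, 4), round(v_y, 4))   # 1.6 0.0036 0.0854

_, _, v_known = regression_prediction(4.0, known_sigma=True, **args)
p = 1.0 - NormalDist(m, math.sqrt(v_known)).cdf(2.0)
print(round(v_known, 3), round(p, 3))                  # 0.078 0.076
```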
which limited data on Y are available. To supplement these data, or to obtain a preliminary estimate of the mean value of Y, one may explore whether a regression relation exists between Y and another variable. As an example, consider the runoff Y in Example 9.14 for a storm with a precipitation of 4 in. Suppose only two runoff values have been observed for storms with 4 in. of precipitation. In this case, Bayesian regression analysis can be used to provide the prior mean and variance of E(Y|x = 4), which can then be updated with the directly measured runoffs for storms with 4 in. of precipitation. The updating here reduces to the case of updating the mean value of the random variable Y at x = 4.
► EXAMPLE 9.15 If the runoff values observed for two storms with precipitation of 4 in. each at a given location are
1.5 in. and 2.5 in., respectively, what are the updated estimates of the mean and variance of the mean
runoff from storms with 4 in. of precipitation?
Let θ4 denote the mean runoff for a storm with a precipitation of 4 in. From the Bayesian regression analysis of the runoff–precipitation data in Example 8.2 or 9.14, it was shown that
$$E(\theta_4) = -0.14 + (0.435)(4) = 1.60 \text{ in.}$$
and variance
$$\mathrm{Var}(\theta_4) = \frac{24}{22}\cdot\frac{1}{25}\left[1 + \frac{25(4 - 2.16)^2}{(24)(36.8)}\right](0.075) = 0.0036 \text{ in}^2$$
by using Eqs. 9.32 and 9.33. These two values will be the prior statistics of θ4. For the directly measured values, we have ȳ = 2. Suppose the runoff Y at x = 4 in. is normally distributed, with its variance assumed to be known and equal to 0.075; then the updated statistics of θ4 can be determined from Eqs. 9.14 and 9.15 as
$$E''(\theta_4) = \frac{(2)(0.0036) + (1.60)(0.075/2)}{(0.0036) + (0.075/2)} = 1.64 \text{ in.}$$
and
$$\mathrm{Var}''(\theta_4) = \frac{(0.0036)(0.075/2)}{(0.0036) + (0.075/2)} = 0.0033 \text{ in}^2$$
We may observe that the updated mean is closer to 1.60 than to 2, indicating that in this case the directly measured runoff values carry less weight than the regression equation. ◄
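Example 9.15 is Eqs. 9.14 and 9.15 written in terms of variances, with the regression result as the prior. The short script below reproduces it; the variable names are ours.

```python
# Prior on theta_4 from the regression: mean 1.60 in., variance 0.0036 in^2
prior_mean, prior_var = 1.60, 0.0036
runoffs = [1.5, 2.5]                      # directly observed runoffs at x = 4 in.
sigma2 = 0.075                            # known variance of Y at x = 4 in.

y_bar = sum(runoffs) / len(runoffs)
data_var = sigma2 / len(runoffs)          # variance of y_bar
post_mean = (y_bar * prior_var + prior_mean * data_var) / (prior_var + data_var)
post_var = prior_var * data_var / (prior_var + data_var)
print(round(post_mean, 2), round(post_var, 4))   # 1.64 0.0033
```

The prior variance (0.0036) is about a tenth of the variance of ȳ (0.0375), which is why the regression value dominates.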
in which
$$\mu_{Y \mid x_i} = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(x_i - \mu_X) \qquad (9.36)$$
Invoking the Bayesian updating equation, the updated distribution of the correlation coefficient is given by
$$f''(\rho) = k\,L(\rho)\,f'(\rho) \qquad (9.38)$$
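Eq. 9.38 rarely has a closed form, so the posterior of ρ is usually evaluated on a grid. The sketch below is a minimal illustration, not the procedure used for Example 9.16. It rests on several assumptions. The likelihood is taken as the product of conditional normal densities of y_i given x_i, with mean from Eq. 9.36 and standard deviation σ_Y√(1 − ρ²). The parameters μ_X, μ_Y, σ_X, and σ_Y are treated as known. The data set is made up, and the prior is uniform on (−1, 1). `posterior_rho` is a hypothetical name.

```python
import math

def posterior_rho(xs, ys, mu_x, mu_y, sd_x, sd_y, prior, n_grid=1999):
    """Grid approximation of f''(rho) = k L(rho) f'(rho).

    Y given X = x_i is taken as normal with mean from Eq. 9.36 and
    standard deviation sd_y * sqrt(1 - rho^2); prior is a callable f'(rho).
    Returns the grid, normalized posterior ordinates, and posterior mean.
    """
    step = 2.0 / (n_grid + 1)
    grid = [-1.0 + step * (i + 1) for i in range(n_grid)]   # excludes +/-1
    log_post = []
    for rho in grid:
        s = sd_y * math.sqrt(1.0 - rho ** 2)
        ll = 0.0
        for x, y in zip(xs, ys):
            m = mu_y + rho * sd_y / sd_x * (x - mu_x)        # Eq. 9.36
            ll += -math.log(s) - 0.5 * ((y - m) / s) ** 2
        log_post.append(ll + math.log(prior(rho)))
    top = max(log_post)                                      # avoid underflow
    dens = [math.exp(lp - top) for lp in log_post]
    k = 1.0 / (sum(dens) * step)                             # normalizing constant
    dens = [k * d for d in dens]
    mean = sum(r * d for r, d in zip(grid, dens)) * step
    return grid, dens, mean

# Made-up standardized data with a strong positive trend
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.0]
ys = [0.8 * x + e for x, e in zip(xs, noise)]
grid, dens, mean = posterior_rho(xs, ys, 0.0, 0.0, 1.0, 1.0, prior=lambda r: 0.5)
print(round(mean, 2))
```

A different prior f'(ρ) is passed in as another callable, which is how the three prior cases of Example 9.16 could be compared.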
► EXAMPLE 9.16 The data in Example 8.2 show that there is significant correlation between the precipitation and runoff at the Monocacy River. Suppose prior information on the correlation coefficient is also available. Determine the corresponding updated distribution of the correlation coefficient. Numerical calculations based on Eqs. 9.36 through 9.38 are used for this task. Three separate prior distributions were considered to describe the information on the correlation coefficient before the precipitation–runoff data were collected. They are:
Results for the updated (posterior) distributions corresponding to each of these three cases are shown in Fig. E9.16. The statistics of the correlation coefficient before and after incorporating the observed data are summarized in Table E9.16. Since the observed data consist of 25 pairs of measured values and the data closely follow a linear trend, the posterior distribution of ρ is dominated by the observed data; in this case, therefore, the different prior assumptions have minimal influence on the updated distribution and statistics.
► PROBLEMS
9.1 A new structure is subjected to proof testing. Assume that the maximum proof load is specified at a reasonably high level, so that the calculated probability of the structure surviving the maximum proof load is 0.90. However, it is felt that this calculation is only 70% reliable; there is a 25% chance that the true probability may be 0.50, and even a 5% chance that it may be only 0.10.
(a) What is the expected probability of survival before the proof test?
(b) If only one structure is proof tested, and it survives the maximum proof load, determine the updated distribution of the survival probability.
(c) What is the expected probability of survival after the proof test?
(d) If three structures were proof tested, and two of the structures survived whereas one failed under the maximum proof load, determine the updated expected probability of survival.
9.2 A new waste-treatment process has just been developed. In order to evaluate its effectiveness, the treatment process is installed for a trial period. Each day the output from the treatment process is inspected to see if it satisfies the specified standard. Suppose that the outputs on different days are statistically independent, and that there is a probability p that the daily output will be acceptable. If the prior PMF is as shown in the figure below, determine the posterior distribution of p with each of the following observations.
(a) The output on the first day of the trial period is of unacceptable quality.
(b) For a 3-day trial period, the quality is unacceptable on only one day.
(c) For a 3-day trial period, the first two days are satisfactory, whereas the quality is unacceptable on the third day.
In each case, determine also the Bayesian estimate for p. (Ans. 0.536, 0.617, 0.617.)
(b) Suppose that the model predicted HAHF; what is the probability that the condition of the improved intersection will actually be HAHF?
(c) If the model predicted LALF, what are the updated relative likelihoods of the four possible conditions?
9.4 An instrument is used to check the accuracy of a set of measurements. However, it can record only three readings, namely, x = 1, 2, or 3. The reading x = 2 implies that the previously measured value is within a tolerable error, whereas x = 1 and x = 3 denote that the measurement is on the low and high side, respectively. Suppose the distribution of the underlying random variable X is as follows:
$$p_X(x_i) = \begin{cases} \dfrac{1-m}{2} & \text{for } x_i = 1 \\ m & \text{for } x_i = 2 \\ \dfrac{1-m}{2} & \text{for } x_i = 3 \end{cases}$$
(a) Based purely on the data, what is the posterior distribution of μ_T?
(b) The passenger is now on a plane from San Francisco to Los Angeles. By coincidence, the passenger sitting next to him has also been keeping track of the flight time. From his record of 10 previous trips, he obtained an average of 60 min. Assume that these two passengers have never before taken a plane together. With this additional information, what would be the updated distribution of μ_T?
(c) What is the probability that their current flight will take more than 80 min?
9.14 Six measurements were made of an angle, as follows:
32°04′   32°05′
31°59′   31°57′
32°01′   32°00′
Assume that the measurement error is Gaussian with zero mean, and that the standard deviation of each measurement can be represented by the sample standard deviation of the six measurements above.
(a) Estimate the angle.
(b) Subsequently, the engineer discovered that the angle had been measured before and recorded as 32°00′ ± 2′. Estimate the actual angle using both sets of measurements.
9.15 A distance L is measured independently by three surveyors with three sets of instruments. The respective measurements are 2.15, 2.20, and 2.18 km. Suppose the ratio of the standard errors of the three measurements is 1:2:3. Estimate the actual distance L on the basis of the three sets of measurements. Assume that measurement error is Gaussian with zero mean.
9.16 In designing reinforced concrete structural members to resist ultimate loads, a capacity reduction factor φ is often used. Suppose that the structural member is a beam element designed for pure flexure. The conventional value of φ is 0.9. However, a committee is investigating the effect on the probability of failure of beams under ultimate load if φ is increased to 0.95. Twelve beams are designed using φ = 0.95, and each of them is subjected to the design ultimate load in the laboratory. It is desired to estimate the probability of failure, p. Suppose that prior experience shows that p has a mean of 0.1 and a standard deviation of 0.06. The laboratory test results show that one of the 12 beams tested failed under the ultimate load. Suggest a suitable prior distribution, and determine the mean and variance of p from these data.
9.17 A newly developed construction material was tested for its fire resistance. Results from testing two samples show that one had serious fire damage after being exposed to only 2 hr of fire, whereas the other lasted for 3 hr. Suppose the duration of fire, T, for damage of a material follows a normal distribution N(μ, 0.5). Before the tests, the mean duration until damage, μ, for this material was expected to be 3 hr with a c.o.v. of 20%, based on a theoretical study.
(a) Determine the updated mean and c.o.v. of the mean duration until damage for this material.
(b) With the above observed information, what is the probability that a wall constructed using this material will be damaged in less than 1 hr of fire?
9.18 A contractor is trying out a new markup policy in preparing his bids. With his new strategy, he expects that the probability of winning a given job, p, is 0.5 with a variance of 0.05.
(a) Determine a suitable PDF for the probability p.
(b) Suppose the contractor has submitted bids for four jobs using this new markup strategy. The outcome of the first two jobs revealed that he has won one but lost the other. What would be the updated distribution of p?
(c) What is the probability that the contractor will win the remaining two jobs?
9.19 The time between breakdowns of a certain type of construction equipment follows an exponential distribution with mean 1/λ, where the mean rate of failure λ was rated by the manufacturer to have a mean of 0.5 per year and a c.o.v. of 20%. A contractor owns two pieces of this construction equipment. The operational times until breakdown of the two pieces of equipment were subsequently observed to be 12 and 18 months, respectively. Determine the updated mean and c.o.v. of the parameter λ by using the conjugate approach.
9.20 Cracks are observed on a concrete bridge deck after an earthquake. In order to assess the condition of the bridge, the sizes of the cracks are measured. Suppose six cracks were measured, and their lengths are as follows:
Crack Number   Crack Length (cm)
1              3
2              5
3              at most 6
4              4
5              at least 4
6              8
Suppose the length of a crack follows the exponential distribution with parameter λ as follows:
$$f_X(x) = \lambda e^{-\lambda x}$$
Assume that there is no other information on the crack length. Determine the distribution (PDF) of the parameter λ.
9.21 An engineer has moved to a new town. He discovered that the town lies within a tornado zone. The information he has gathered on tornadoes in this town is as follows:
(i) Two tornadoes were observed within the last 20 yr.
(ii) The maximum wind speeds of these two tornadoes were 190 and 160 kph, respectively.
(iii) From an independent source, it is believed that there is a 95% probability that the mean maximum wind speed for tornadoes in the region will be between 145 and 175 kph.
(iv) The standard deviation of maximum wind speed may be (c) Determine the updated distribution of//. (Ans. = 0.023h;
assumed to be 15 kph. 4<h< 10)
(v) The occurrence of tornadoes may be assumed to follow a (d) What is the probability that the hazardous zone will not ex
Poisson process. ceed 4 km in the next accident? (An.v: 0.571)
(vi) The maximum wind speed in a tornado may be assumed to
9.23 Grouted soil nails are often used to strengthen soil slopes
follow a normal distribution.
against stability failure. However, full grouting along the entire
Assume that no other information is available. The engi
neer intends to live in this town for 5 years, and he estimates that his house should be strong enough to withstand a 120 kph wind without damage. Answer the following questions:
(a) What is the probability that the town will be hit by a tornado during the next 5 years?
(b) What is the probability that the maximum wind speed of the next tornado will exceed 220 kph?
(c) What is the probability that his house will be undamaged during the next 5-yr period?
9.22 A given type of accident during the operation of a nuclear power plant may involve the leakage of radioactive material, which could be hazardous to the immediate neighborhood. Let X be the distance (in km) from the power plant beyond which health hazards will be below an acceptable level in the event of an accident. X is uniformly distributed, with PDF

$$f_X(x) = \frac{1}{h}, \qquad 0 < x < h$$

in which h is the parameter denoting the maximum possible dispersion distance of the radioactive material. Based on theoretical studies, a prior distribution of h has been established as

$$f'_H(h) = 0.003\,h^2, \qquad 0 < h < 10$$

(a) Determine the expected distance of the hazardous zone based on the given information. (Ans. 3.75)
(b) Suppose an accident has occurred and the hazardous zone was observed to extend to a distance of 4 km. What is the range of h now?
length of the nail is generally not achieved in certain cases. On a given project, an engineer estimates the average fraction of length grouted to be within 0.75 to 0.95 with 95% confidence, based on his previous experience with the particular geologic formation and construction crew. To improve his estimate, the engineer tested a sample of four soil nails for the given job. The average grouted length measured is 0.89, while the standard deviation of the measured grouted length is 0.06.
(a) Determine the updated mean and c.o.v. of the average grouted length of soil nails for the site. State any assumptions used.
(b) Final approval of the construction job requires inspection and testing by the regulatory agency. Suppose a sample of two nails will be selected and tested by the agency. Acceptance requires that each of the following two conditions be satisfied:
1. The sample average grouted length must be at least 0.85.
2. The minimum grouted length of the nails tested must not be lower than 0.83.
(c) Which condition does the engineer think is more stringent (less likely) to be met for this site?
9.24 For Problem 9.13, suppose that the standard deviation of T is not known, and the sample standard deviation from the 5 trips is 10 minutes. Based solely on the observed data, determine the posterior statistics of $\mu_T$ and $\sigma_T$.
9.25 Consider the data given in Problem 10 of Chapter 8 for the rated and actual mileage of 6 cars. By using the Bayesian regression approach, what is the probability that a car with a rated mileage of 24 mpg will have an actual mileage of less than 18 mpg? How would this probability change if the value of the basic scatter $\sigma^2$ can be assumed equal to 1.73 mpg?
► REFERENCES
Ang, A. H-S., and Tang, W. H., Probability Concepts in Engineering Planning and Design, Vol. II: Decision, Risk, and Reliability, John Wiley & Sons, New York, 1984.
Benjamin, J. R., "Probabilistic Model for Seismic Force Design," Jour. of the Structural Division, ASCE, Vol. 94, May 1968, pp. 1175-1196.
Halim, I. S., and Tang, W. H., "Bayesian Method for Characterization of Geological Anomaly," Proceedings, ISUMA 1990, The First International Symposium on Uncertainty Modeling and Analysis, Maryland, 1990, pp. 585-594.
Halim, I. S., and Tang, W. H., "Reliability of Undrained Clay Slope Considering Geologic Anomaly," Proceedings, ICASP, Mexico City, 1991, pp. 776-783.
Halim, I. S., and Tang, W. H., "Site Exploration Strategy for Geologic Anomaly Characterization," Jour. of Geotechnical Engineering, ASCE, Vol. 119, No. GT2, February 1993, pp. 195-213.
Lee, P. M., Bayesian Statistics: An Introduction, Edward Arnold Publisher, London, 1989.
Packman, P. F., Pearson, H. S., Owens, J. S., and Marchese, G. B., "The Applicability of a Fracture Mechanics-Nondestructive Testing Design Criteria," Technical Report AFML-TR-68-32, Air Force Materials Laboratory, Wright-Patterson Air Force Base, Ohio, May 1968.
Raiffa, H., and Schlaifer, R., Applied Statistical Decision Theory, Division of Research, Harvard Business School, Boston, MA, 1961.
Tang, W. H., "A Bayesian Evaluation of Information for Foundation Engineering Design," Proc., 1st International Conference on Applications of Statistics and Probability, Hong Kong Univ. Press, Sept. 1971, pp. 173-185.
Tang, W. H., "Bayesian Frequency Analysis," Jour. of the Hydraulics Division, ASCE, Vol. 106, No. HY7, 1980, pp. 1203-1218.
Tang, W. H., and Saadeghvaziri, A., "Updating Distribution of Anomaly Size and Fraction," in Recent Advances in Engineering Mechanics and Their Impact on Civil Engineering Practice, Vol. II, edited by W. F. Chen and A. D. M. Lewis, 1983, pp. 895-898.
Tang, W. H., and Quek, S. T., "Statistical Model of Boulder Size and Fraction," Jour. of Geotechnical Engineering, ASCE, Vol. 1, 1986.
Appendix A. Probability Tables
TABLE A.1 Standard Normal Probabilities: $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} e^{-y^2/2}\, dy$
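Values of $\Phi(x)$ such as those tabulated in Table A.1 can be reproduced with a short sketch that relies on the standard identity $\Phi(x) = \tfrac{1}{2}[1 + \operatorname{erf}(x/\sqrt{2})]$ and Python's built-in `math.erf`; the function name `std_normal_cdf` is illustrative only.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x) = (1/sqrt(2*pi)) * integral from -inf to x of exp(-y^2/2) dy,
    evaluated through the identity Phi(x) = [1 + erf(x / sqrt(2))] / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (0.0, 1.0, 1.645, 1.96, -1.0):
    print(f"x = {x:6.3f}   Phi(x) = {std_normal_cdf(x):.4f}")
```

Negative arguments need no separate table, since $\Phi(-x) = 1 - \Phi(x)$.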
n=2 2 3 3 3 4 4 4 4 5 5 5 5 5
p x=0 1 0 1 2 0 1 2 3 0 1 2 3 4
0.20 0.6400 0.9600 0.5120 0.8960 0.9920 0.4096 0.8192 0.9728 0.9984 0.3277 0.7373 0.9421 0.9933 0.9997
0.25 0.5625 0.9375 0.4219 0.8438 0.9844 0.3164 0.7383 0.9492 0.9961 0.2373 0.6328 0.8965 0.9844 0.9990
0.30 0.4900 0.9100 0.3430 0.7840 0.9730 0.2401 0.6517 0.9163 0.9919 0.1681 0.5282 0.8369 0.9692 0.9976
0.35 0.4225 0.8775 0.2746 0.7182 0.9571 0.1785 0.5630 0.8735 0.9850 0.1160 0.4284 0.7648 0.9460 0.9947
0.40 0.3600 0.8400 0.2160 0.6480 0.9360 0.1296 0.4752 0.8208 0.9744 0.0778 0.3370 0.6826 0.9130 0.9898
0.45 0.3025 0.7975 0.1664 0.5748 0.9089 0.0915 0.3910 0.7585 0.9590 0.0503 0.2562 0.5931 0.8688 0.9815
0.50 0.2500 0.7500 0.1250 0.5000 0.8750 0.0625 0.3125 0.6875 0.9375 0.0313 0.1875 0.5000 0.8125 0.9687
0.55 0.2025 0.6975 0.0911 0.4253 0.8336 0.0410 0.2415 0.6090 0.9085 0.0185 0.1312 0.4069 0.7438 0.9497
0.60 0.1600 0.6400 0.0640 0.3520 0.7804 0.0256 0.1792 0.5248 0.8704 0.0102 0.0870 0.3174 0.6630 0.9222
0.65 0.1225 0.5775 0.0429 0.2818 0.7254 0.0150 0.1265 0.4370 0.8215 0.0053 0.0540 0.2352 0.5716 0.8840
0.70 0.0900 0.5100 0.0270 0.2160 0.6570 0.0081 0.0837 0.3483 0.7599 0.0024 0.0308 0.1631 0.4718 0.8319
0.75 0.0625 0.4375 0.0156 0.1563 0.5781 0.0039 0.0508 0.2617 0.6836 0.0010 0.0156 0.1035 0.3672 0.7627
0.80 0.0400 0.3600 0.0080 0.1040 0.4880 0.0016 0.0272 0.1808 0.5904 0.0003 0.0067 0.0579 0.2627 0.6723
0.85 0.0225 0.2775 0.0034 0.0608 0.3859 0.0005 0.0120 0.1095 0.4780 0.0001 0.0022 0.0266 0.1648 0.5563
0.90 0.0100 0.1900 0.0010 0.0280 0.2710 0.0001 0.0037 0.0523 0.3439 0 0.0005 0.0086 0.0815 0.4095
0.95 0.0025 0.0975 0.0001 0.0073 0.1426 0 0.0005 0.0140 0.1855 0 0 0.0012 0.0226 0.2262
0.99 0 0.0199 0 0 0.0297 0 0 0.0006 0.0394 0 0 0 0.0010 0.0490
(Continued)
TABLE A.2 (Continued)
n=6 6 6 6 6 6 8 8 8 8 8 8 8 8
p x=0 1 2 3 4 5 0 1 2 3 4 5 6 7
0.01 0.9415 0.9985 1 1 1 1 0.9227 0.9973 0.9999 1 1 1 1 1
0.05 0.7351 0.9672 0.9978 0.9999 1 1 0.6634 0.9428 0.9942 0.9996 1 1 1 1
0.10 0.5314 0.8857 0.9842 0.9987 0.9999 1 0.4305 0.8131 0.9619 0.995 0.9996 1 1 1
0.15 0.3771 0.7765 0.9527 0.9941 0.9996 1 0.2725 0.6572 0.8948 0.9786 0.9971 0.9998 1 1
0.20 0.2621 0.6554 0.9011 0.983 0.9984 0.9999 0.1678 0.5033 0.7969 0.9437 0.9896 0.9988 0.9999 1
0.25 0.178 0.5339 0.8306 0.9624 0.9954 0.9998 0.1001 0.3671 0.6785 0.8862 0.9727 0.9958 0.9996 1
0.30 0.1176 0.4202 0.7443 0.9295 0.9891 0.9993 0.0576 0.2553 0.5518 0.8059 0.942 0.9887 0.9987 0.9999
0.35 0.0754 0.3191 0.6471 0.8826 0.9777 0.9982 0.0319 0.1691 0.4278 0.7064 0.8939 0.9747 0.9964 0.9998
0.40 0.0467 0.2333 0.5443 0.8208 0.959 0.9959 0.0168 0.1064 0.3154 0.5941 0.8263 0.9502 0.9915 0.9993
0.45 0.0277 0.1636 0.4415 0.7447 0.9308 0.9917 0.0084 0.0632 0.2201 0.477 0.7396 0.9115 0.9819 0.9983
0.50 0.0156 0.1094 0.3438 0.6563 0.8906 0.9844 0.0039 0.0352 0.1445 0.3633 0.6367 0.8555 0.9648 0.9961
0.55 0.0083 0.0692 0.2553 0.5585 0.8364 0.9723 0.0017 0.0181 0.0885 0.2604 0.523 0.7799 0.9368 0.9916
0.60 0.0041 0.041 0.1792 0.4557 0.7667 0.9533 0.0007 0.0085 0.0498 0.1737 0.4059 0.6846 0.8936 0.9832
0.65 0.0018 0.0223 0.1174 0.3529 0.6809 0.9246 0.0002 0.0036 0.0253 0.1061 0.2936 0.5722 0.8309 0.9681
0.70 0.0007 0.0109 0.0705 0.2557 0.5798 0.8824 0.0001 0.0013 0.0113 0.058 0.1941 0.4482 0.7447 0.9424
0.75 0.0002 0.0046 0.0376 0.1694 0.4661 0.822 0 0.0004 0.0042 0.0273 0.1138 0.3215 0.6329 0.8999
0.80 0.0001 0.0016 0.017 0.0989 0.3446 0.7379 0 0.0001 0.0012 0.0104 0.0563 0.2031 0.4967 0.8322
0.85 0 0.0004 0.0059 0.0473 0.2235 0.6229 0 0 0.0002 0.0029 0.0214 0.1052 0.3428 0.7275
0.90 0 0.0001 0.0013 0.0159 0.1143 0.4686 0 0 0 0.0004 0.005 0.0381 0.1869 0.5695
0.95 0 0 0.0001 0.0022 0.0328 0.2649 0 0 0 0 0.0004 0.0058 0.0572 0.3366
0.99 0 0 0 0 0.0015 0.0585 0 0 0 0 0 0 0.0027 0.0773
(Continued)
TABLE A.2 (Continued for n = 10)
p x=0 1 2 3 4 5 6 7 8 9
0.50 0.001 0.0107 0.0547 0.1719 0.377 0.623 0.8281 0.9453 0.9893 0.999
0.55 0.0003 0.0045 0.0274 0.102 0.2616 0.4956 0.734 0.9004 0.9767 0.9975
0.60 0.0001 0.0017 0.0123 0.0548 0.1662 0.3669 0.6177 0.8327 0.9536 0.994
0.65 0 0.0005 0.0048 0.026 0.0949 0.2485 0.4862 0.7384 0.914 0.9865
0.70 0 0.0001 0.0016 0.0106 0.0473 0.1503 0.3504 0.6172 0.8507 0.9718
(Continued)
TABLE A.2 (Continued for n = 15)
p x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0.01 0.8601 0.9904 0.9996 1 1 1 1 1 1 1 1 1 1 1 1
0.05 0.4633 0.829 0.9638 0.9945 0.9994 0.9999 1 1 1 1 1 1 1 1 1
0.10 0.2059 0.549 0.8159 0.9444 0.9873 0.9978 0.9997 1 1 1 1 1 1 1 1
0.15 0.0874 0.3186 0.6042 0.8227 0.9383 0.9832 0.9964 0.9994 0.9999 1 1 1 1 1 1
0.20 0.0352 0.1671 0.398 0.6482 0.8358 0.9389 0.9819 0.9958 0.9992 0.9999 1 1 1 1 1
0.25 0.0134 0.0802 0.2361 0.4613 0.6865 0.8516 0.9434 0.9827 0.9958 0.9992 0.9999 1 1 1 1
0.30 0.0047 0.0353 0.1268 0.2969 0.5155 0.7216 0.8689 0.95 0.9848 0.9963 0.9993 0.9999 1 1 1
0.35 0.0016 0.0142 0.0617 0.1727 0.3519 0.5643 0.7548 0.8868 0.9578 0.9876 0.9972 0.9995 0.9999 1 1
0.40 0.0005 0.0052 0.0271 0.0905 0.2173 0.4032 0.6098 0.7869 0.905 0.9662 0.9907 0.9981 0.9997 1 1
0.45 0.0001 0.0017 0.0107 0.0424 0.1204 0.2608 0.4522 0.6535 0.8182 0.9231 0.9745 0.9937 0.9989 0.9999 1
0.50 0 0.0005 0.0037 0.0176 0.0592 0.1509 0.3036 0.5 0.6964 0.8491 0.9408 0.9824 0.9963 0.9995 1
0.55 0 0.0001 0.0011 0.0063 0.0255 0.0769 0.1818 0.3465 0.5478 0.7392 0.8796 0.9576 0.9893 0.9983 0.9999
0.60 0 0 0.0003 0.0019 0.0093 0.0338 0.095 0.2131 0.3902 0.5968 0.7827 0.9095 0.9729 0.9948 0.9995
0.65 0 0 0.0001 0.0005 0.0028 0.0124 0.0422 0.1132 0.2452 0.4357 0.6481 0.8273 0.9383 0.9858 0.9984
0.70 0 0 0 0.0001 0.0007 0.0037 0.0152 0.05 0.1311 0.2784 0.4845 0.7031 0.8732 0.9647 0.9953
0.75 0 0 0 0 0.0001 0.0008 0.0042 0.0173 0.0566 0.1484 0.3135 0.5387 0.7639 0.9198 0.9866
0.80 0 0 0 0 0 0.0001 0.0008 0.0042 0.0181 0.0611 0.1642 0.3518 0.602 0.8329 0.9648
0.85 0 0 0 0 0 0 0.0001 0.0006 0.0036 0.0168 0.0617 0.1773 0.3958 0.6814 0.9126
0.90 0 0 0 0 0 0 0 0 0.0003 0.0022 0.0127 0.0556 0.1841 0.451 0.7941
0.95 0 0 0 0 0 0 0 0 0 0.0001 0.0006 0.0055 0.0362 0.171 0.5367
0.99 0 0 0 0 0 0 0 0 0 0 0 0 0.0004 0.0096 0.1399
(Continued)
TABLE A.2 (Continued for n = 20)
p x=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0.25 0.003 0.024 0.091 0.225 0.415 0.617 0.786 0.898 0.959 0.986 0.996 0.999 1 1 1 1 1 1 1 1
0.30 8E-04 0.008 0.036 0.107 0.238 0.416 0.608 0.772 0.887 0.952 0.983 0.995 0.999 1 1 1 1 1 1 1
0.35 2E-04 0.002 0.012 0.044 0.118 0.245 0.417 0.601 0.762 0.878 0.947 0.98 0.994 0.999 1 1 1 1 1 1
0.40 0 5E-04 0.004 0.016 0.051 0.126 0.25 0.416 0.596 0.755 0.873 0.944 0.979 0.994 0.998 1 1 1 1 1
0.45 0 1E-04 9E-04 0.005 0.019 0.055 0.13 0.252 0.414 0.591 0.751 0.869 0.942 0.979 0.994 0.999 1 1 1 1
0.50 0 0 2E-04 0.001 0.006 0.021 0.058 0.132 0.252 0.412 0.588 0.748 0.868 0.942 0.979 0.994 0.999 1 1 1
0.55 0 0 0 3E-04 0.002 0.006 0.021 0.058 0.131 0.249 0.409 0.586 0.748 0.87 0.945 0.981 0.995 0.999 1 1
0.60 0 0 0 0 3E-04 0.002 0.007 0.021 0.057 0.128 0.245 0.404 0.584 0.75 0.874 0.949 0.984 0.996 1 1
0.65 0 0 0 0 0 3E-04 0.002 0.006 0.02 0.053 0.122 0.238 0.399 0.583 0.755 0.882 0.956 0.988 0.998 1
0.70 0 0 0 0 0 0 3E-04 0.001 0.005 0.017 0.048 0.113 0.228 0.392 0.584 0.763 0.893 0.965 0.992 0.999
0.75 0 0 0 0 0 0 0 2E-04 9E-04 0.004 0.014 0.041 0.102 0.214 0.383 0.585 0.775 0.909 0.976 0.997
0.80 0 0 0 0 0 0 0 0 1E-04 6E-04 0.003 0.01 0.032 0.087 0.196 0.37 0.589 0.794 0.931 0.989
0.85 0 0 0 0 0 0 0 0 0 0 2E-04 0.001 0.006 0.022 0.067 0.17 0.352 0.595 0.824 0.961
0.90 0 0 0 0 0 0 0 0 0 0 0 1E-04 4E-04 0.002 0.011 0.043 0.133 0.323 0.608 0.878
0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3E-04 0.003 0.016 0.076 0.264 0.642
0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.001 0.017 0.182
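The entries of Table A.2 behave as cumulative binomial probabilities $P(X \le x)$ for n trials with success probability p (for instance, $(1-0.2)^2 = 0.64$ for n = 2, x = 0, p = 0.20). Assuming that interpretation, a minimal sketch using `math.comb` recomputes a few entries; the name `binomial_cdf` is hypothetical.

```python
from math import comb


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p): sum over k = 0..x of C(n, k) p^k (1 - p)^(n - k)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


# Spot-check a few tabulated entries (the table rounds to 3 or 4 decimals)
print(round(binomial_cdf(1, 2, 0.20), 4))    # 0.96
print(round(binomial_cdf(2, 5, 0.50), 4))    # 0.5
print(round(binomial_cdf(19, 20, 0.90), 3))  # 0.878
```

The last check corresponds to the p = 0.90 row of the n = 20 table, column x = 19.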
TABLE A.4 (Continued) Critical Values of the $\chi^2$ Distribution at Probability Level $(1 - \alpha) = p$
TABLE A.5 Critical Values of $D_n^{\alpha}$ at Significance Level $\alpha$ in the K-S Test
Significance Level $\alpha$   $b_0$   $b_1$
Excerpted from D'Agostino and Stephens, 1986 (see References in Chapter 7).
Significance Level $\alpha$   $c_\alpha$
0.25 0.736
0.20 0.816
0.15 0.916
0.10 1.062
0.05 1.321
0.025 1.591
0.01 1.959
0.005 2.244
0.0025 2.534
TABLE A.6c Critical Values of $c_\alpha$ at Significance Level $\alpha$ of the A-D Test for the Gamma
Distribution (Parameters Estimated from Sample Data)
Significance Level $\alpha$
k 0.25 0.10 0.05 0.025 0.01 0.005
Excerpted from Lockhart and Stephens, 1985 (see References in Chapter 7).
Significance Level $\alpha$   $c_\alpha$
0.25 0.474
0.10 0.637
0.05 0.757
0.025 0.877
0.01 1.038
Appendix B. Combinatorial Formulas
In probability problems involving discrete and finite sample spaces, the definition of events
and of the underlying sample spaces entails the enumeration of sets or subsets of sample
points. For this purpose, the techniques of combinatorial analysis are often useful. We
summarize here some of the basic elements of combinatorial analysis.
► EXAMPLES (1) If an engineering design involves three parameters $\phi_1$, $\phi_2$, $\phi_3$, and there are, respectively, 2, 3, and 4 possible values of these parameters, the number of feasible designs is

$$N(3 \mid 2, 3, 4) = 2 \times 3 \times 4 = 24$$

(2) In a 3-dimensional Cartesian coordinate system with x, y, z axes, if 10 discrete values are specified for each of the three axes, e.g., x = 0, 1, 2, ..., 9, then the total number of coordinate positions is

$$N(3 \mid 10, 10, 10) = 10^3 = 1{,}000$$
◄
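The multiplication rule in these two examples can be confirmed by brute-force enumeration. The sketch below lists every combination with `itertools.product` and compares the count with the product formula; `count_designs` is a hypothetical helper name, and the parameter values are represented simply by integer indices.

```python
from itertools import product
from math import prod


def count_designs(value_counts):
    """N(k | n1, ..., nk) = n1 * n2 * ... * nk (basic multiplication rule)."""
    return prod(value_counts)


# Example (1): three parameters with 2, 3 and 4 candidate values
designs = list(product(range(2), range(3), range(4)))
print(len(designs), count_designs([2, 3, 4]))  # 24 24

# Example (2): 10 discrete values on each of the x, y and z axes
print(count_designs([10, 10, 10]))  # 1000
```

Explicit enumeration is only practical for small cases; the formula is what scales.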
Ordered Sequences
In a set of n distinct elements, the number of ordered sequences or arrangements, each with k elements, is

$$(n)_k = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!} \qquad \text{(B.2)}$$
We can observe that for the first position in the sequence of k elements, there are n elements available to occupy it; but there are only (n − 1) elements available for the second position, since one of the n elements has been used for the first position, and only (n − 2) elements are available for the third position, and so on. Hence, by virtue of the basic relation,
Eq. B.1,
$$\binom{n}{k} = \frac{(n)_k}{k!} \qquad \text{(B.3)}$$
It may be emphasized that in Eq. B.2 the ordering of the k elements is significant (i.e., different orderings of the same elements constitute different sequences or arrangements), whereas in Eq. B.3 the order is irrelevant. In a set of k elements, the positions of the elements can be permuted k! times; hence, by virtue of Eq. B.2, we obtain Eq. B.3 as the number of different k-element subsets (disregarding order).
Equation B.3 is defined only for $k \le n$. Using Eq. B.2 in Eq. B.3, we also have

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \qquad \text{(B.3a)}$$
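Equations B.2, B.3, and B.3a can be checked side by side. In this sketch, `falling_factorial` (a hypothetical helper) builds $(n)_k$ term by term, exactly as in the position-by-position argument, and the results are compared with Python's `math.perm` and `math.comb` (available since Python 3.8).

```python
from math import comb, factorial, perm


def falling_factorial(n: int, k: int) -> int:
    """(n)_k = n (n-1) ... (n-k+1): n choices for position 1, n-1 for position 2, ..."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


n, k = 10, 3
# Eq. B.2: ordered sequences of k elements taken from n
print(falling_factorial(n, k), perm(n, k), factorial(n) // factorial(n - k))  # 720 720 720
# Eq. B.3 / B.3a: dividing by the k! orderings of each subset gives the binomial coefficient
print(falling_factorial(n, k) // factorial(k), comb(n, k))  # 120 120
```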
Equation B.3 or B.3a is known as the binomial coefficient because it is precisely the coefficient in the binomial expansion of $(x+y)^n$, namely,

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \qquad \text{(B.4)}$$
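A quick way to see Eq. B.4 at work is to evaluate both sides for a few integer arguments; integer arithmetic keeps the comparison exact. This is an illustrative sketch, and `binomial_expansion` is a hypothetical name.

```python
from math import comb


def binomial_expansion(x: int, y: int, n: int) -> int:
    """Right-hand side of Eq. B.4: sum over k = 0..n of C(n, k) x^k y^(n - k)."""
    return sum(comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))


for x, y, n in [(2, 3, 4), (1, 1, 10), (-1, 5, 6)]:
    print((x + y) ** n, binomial_expansion(x, y, n))
```

Setting x = y = 1 shows that the binomial coefficients for a given n sum to $2^n$, the total number of subsets of an n-element set.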
This is the same as sampling without replacement, except that the order is disregarded; therefore, the number of samples is $\binom{n}{r}$.

► EXAMPLES (1) Among 25 concrete cylinders marked 1, 2, ..., 25, the total number of possible samples of five cylinders each is

$$\binom{25}{5} = \frac{25!}{5!\,20!} = 53{,}130$$
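The count in Example (1) can be reproduced directly. The sketch also shows the reasoning of Eq. B.3: the number of ordered samples, $(25)_5$, divided by the $5!$ orderings of each sample.

```python
from math import comb, factorial, perm

# Distinct 5-cylinder samples from 25 labeled cylinders, order disregarded
via_comb = comb(25, 5)
via_factorials = factorial(25) // (factorial(5) * factorial(20))
via_ordered = perm(25, 5) // factorial(5)  # (25)_5 / 5!
print(via_comb, via_factorials, via_ordered)  # 53130 53130 53130
```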
Among the n elements, the first group of $k_1$ elements can be selected in $\binom{n}{k_1}$ ways. The second group of $k_2$ elements can be selected from the remaining $(n - k_1)$ elements in $\binom{n - k_1}{k_2}$ ways, and so on. The total number of ways of dividing the n elements into r
► EXAMPLES (1) In a given seismic region, suppose six earthquakes of intensities (in MM scale) V, VI, and VII may occur in the next 10 years. The number of different sequences in which three quakes of intensity V, two of intensity VI, and one of intensity VII can occur is

$$\frac{6!}{3! \times 2! \times 1!} = 60$$
(2) Suppose a construction company wishes to acquire three types of construction equipment, consisting of two bulldozers, four road graders, and six backhoes, for a total fleet of 12 pieces of equipment for a project. The number of different sequences in which the company may acquire the fleet of equipment is

$$\frac{12!}{2! \times 4! \times 6!} = 13{,}860$$
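The group-by-group argument above and the closed-form count used in both examples give the same number. This sketch computes it both ways; `multinomial` and `multinomial_sequential` are hypothetical helper names.

```python
from math import comb, factorial, prod


def multinomial(*group_sizes: int) -> int:
    """n! / (k1! k2! ... kr!), where n = k1 + k2 + ... + kr."""
    n = sum(group_sizes)
    return factorial(n) // prod(factorial(k) for k in group_sizes)


def multinomial_sequential(*group_sizes: int) -> int:
    """Same count built as in the text: choose the first group from n,
    the second from the n - k1 that remain, and so on."""
    remaining, ways = sum(group_sizes), 1
    for k in group_sizes:
        ways *= comb(remaining, k)
        remaining -= k
    return ways


print(multinomial(3, 2, 1), multinomial_sequential(3, 2, 1))  # 60 60
print(multinomial(2, 4, 6), multinomial_sequential(2, 4, 6))  # 13860 13860
```

For the equipment example, the sequential form is $\binom{12}{2}\binom{10}{4}\binom{6}{6} = 66 \times 210 \times 1 = 13{,}860$.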
*Feller, W., An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd Ed., J. Wiley and Sons, New York, 1957.
Appendix C. Derivation of the Poisson Distribution
In Chapter 3, Sect. 3.2.6, we saw that the Poisson distribution describes the probability
mass function (PMF) of the number of occurrences of an event within a specified interval
of time or space. It is the result of an underlying counting process X(t), known as a Poisson
process, which is a model of the random occurrences of an event in time (or space) t.
The Poisson process model is based on the following assumptions:
1. At any instant of time (or point in space), there can be at most one occurrence of an event; in other words, the probability of $n \ge 2$ occurrences of the event over a small interval $\Delta t$ is of order $o(\Delta t)$.
2. The occurrences of an event in nonoverlapping time (or space) intervals are statistically independent; this is the assumption of independent increments.
3. The probability of the occurrence of an event in the interval $(t, t + \Delta t)$ is proportional to $\Delta t$; that is,
We should recognize that Eq. C.1 applies for $x \ge 1$. For $x = 0$, Eq. C.1 becomes

$$\frac{dp_0(t)}{dt} = -\nu\, p_0(t) \qquad \text{(C.2)}$$
If the counting process starts at t = 0, the initial conditions associated with Eqs. C.1 and C.2 are

$$p_0(0) = 1.0 \quad \text{and} \quad p_x(0) = 0 \quad (x \ge 1)$$
The solution of Eq. C.2 with the first of the above initial conditions yields, for x = 0,

$$p_0(t) = e^{-\nu t}$$
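The solution for x = 0 can be checked numerically, together with the higher states. Eq. C.1 itself falls outside this excerpt, so the sketch below assumes it takes the standard Poisson-process form $dp_x/dt = -\nu p_x(t) + \nu p_{x-1}(t)$ for $x \ge 1$, which reduces to Eq. C.2 at x = 0. It integrates the system with a classical fourth-order Runge-Kutta scheme from the initial conditions above and compares the result with the Poisson PMF $(\nu t)^x e^{-\nu t}/x!$. The rate ν = 0.5 and horizon t = 4 are hypothetical values, and the function names are illustrative.

```python
import math


def poisson_ode_rhs(p, nu):
    """dp0/dt = -nu*p0 (Eq. C.2); dpx/dt = -nu*px + nu*p(x-1) (assumed form of Eq. C.1)."""
    return [-nu * p[0]] + [-nu * p[x] + nu * p[x - 1] for x in range(1, len(p))]


def integrate_rk4(nu, t_end, x_max, steps=2000):
    """Classical RK4 with p0(0) = 1 and px(0) = 0 for x = 1..x_max.
    Truncating at x_max is harmless: state x depends only on states <= x."""
    p = [1.0] + [0.0] * x_max
    h = t_end / steps
    for _ in range(steps):
        k1 = poisson_ode_rhs(p, nu)
        k2 = poisson_ode_rhs([pi + h / 2 * ki for pi, ki in zip(p, k1)], nu)
        k3 = poisson_ode_rhs([pi + h / 2 * ki for pi, ki in zip(p, k2)], nu)
        k4 = poisson_ode_rhs([pi + h * ki for pi, ki in zip(p, k3)], nu)
        p = [pi + h / 6 * (a + 2 * b + 2 * c + d)
             for pi, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


nu, t = 0.5, 4.0  # hypothetical occurrence rate and time horizon
numeric = integrate_rk4(nu, t, x_max=6)
for x, px in enumerate(numeric):
    exact = (nu * t) ** x * math.exp(-nu * t) / math.factorial(x)
    print(f"x = {x}: ODE {px:.6f}   Poisson PMF {exact:.6f}")
```

The two columns agree to the printed precision, which is consistent with the Poisson PMF being the solution of the system under the stated initial conditions.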
For $x \ge 1$, the solutions to Eq. C.1 are
Index
Event(s), 27-50, 52-54, 57, 58, 62, 63, 83, 85, 105, 106, 108-113, 117-120, 132, 254
  certain, 34
  collectively exhaustive, 38
  complementary, 34
  conditioned, 50
  impossible, 32, 34
  independent, 52, 53
  intersection of, 35, 82
  joint, 52, 60
  mutually exclusive, 37, 82
  special, 34
  union of, 34-37, 39, 40, 42, 46
Expected value, see mean value
Exponential, 118, 249
  distribution, shifted, 121, 249, 285
  form, double, 175-177, 287
  probability paper, 284-287
Extremal distribution, 176, 297, 300
Extreme values, 172, 198, 300
  exact probability, 173
F
First-order approximation, 184-189, 214
Fisher-Tippett distribution, 178
Frequency, 289, 291-293, 296
Frequency distribution, 4
  empirical, 4, 7-13
Function
  of multiple random variables, 157-168
  of single random variable, 151-156
G
Gamma
  distribution, 122-124, 249, 357, 366, 368
  shifted, 124, 249, 292, 293, 296
  three-parameter, see shifted
  function, 122, 128
  function, incomplete, 122, 202
Gaussian (or normal)
  distribution, 96, 255, 297, 298, 360, 366, 371
  bivariate, 137, 232, 235
  standard, 97-100, 154, 176, 256, 264
  probability paper, 280, 281
General probability papers, 284
Geometric distribution, 108-110, 249
Goodness-of-fit, 289, 292, 295, 297, 299, 301
  tests, 278, 289, 293, 296
Gumbel
  distribution, 176, 299, 300, 301
  probability paper, 287-289, 299
H
Half width, 267, 268
Histogram, 3, 4, 6, 8-10, 125, 170-172, 204-208, 212-215, 234-231, 233-241
Hypothesis test, 258-261
  with known variance, 260
  with unknown variance, 261
I
Independence, statistical, 52, 106, 354, 369
Inference, statistical, 245-247
inherent variability, see aleatory uncertainty
Intersection of events, 34, 35, 36, 37, 39-42, 82
Interval estimation, 262-264, 268, 269, 273
J
Joint distributions, 132, 232-241, 369
Judgment, subjective, 30, 64, 347, 348
Judgmental information, 347, 348, 371
K
Kolmogorov-Smirnov (K-S) test, 293-296
  critical values, 395
Kurtosis, 94, 257
L
Least squares, 308, 319, 322
Likelihood function, 252, 253, 353, 354, 356, 359-362, 368, 369, 373, 375
Linear
  function, 161, 180, 181, 307, 322
  mean-value function, 306
  regression, 306-308, 313, 318-320, 321-323, 326-328, 334, 335, 337, 372
    equation, 18, 309, 311-313, 316, 319, 326, 329, 331, 333, 337, 339, 372
    multiple, 321-324, 335, 337
  trend, 278-280, 282, 283, 286
Lognormal
  distribution, 100-104, 106, 118
  probability paper, 279, 281-284
M
Marginal distribution, 133, 236, 238, 240, 370
Markov chain (or process), 118
MATHCAD, 200, 201, 203, 206, 208, 210-213, 215, 217-219, 223-225, 227, 242, 244
MATLAB, 179, 199-203, 209, 211, 216, 217, 219, 220, 230, 235, 242, 244
Mathematical expectation, 89, 180
Maximum likelihood, 251
  estimator, 252, 255, 360
Mean-value, 17, 19, 89, 141, 307
  function, 321, 372-374
Median, 89, 103
Mixed probability distribution, 84
Mode or modal value, 89, 91
Moments, 94, 151, 180, 184, 248, 273
  of function of random variables, 180-189
Monte Carlo simulation (MCS), 180, 190, 199-201, 203, 206, 208, 215, 218, 223-231, 231-241
MS Excel, 200, 205, 208
Multiple random variables, 132, 133, 135, 137, 139, 157
Multiplication rule, 52, 53, 59, 65
Mutually exclusive events, 37-39, 82
N
Nonlinear function, 180, 190, 199, 325, 326, 333
Normal distribution, see Gaussian
Null hypothesis, 259-261
Numerical method (or procedure), 190, 199-201, 373, 376
O
Occurrence
  first, 109, 111, 118, 123
  joint, 35, 52, 132
  kth, 111, 123
Occurrence rate, 113, 116-119, 123, 290, 356, 366
One-sided test, 259, 260, 261
Outcome, 28, 30, 81, 105, 349, 353
P
Parameter
  estimated, 246, 262, 347
  of asymptotic distributions, 176-179
  prior, 366, 371
Now extensively revised with new illustrative problems and new and expanded topics,
this Second Edition will help you develop a thorough understanding of probability and
statistics and the ability to formulate and solve real-world problems in engineering. The
authors present each basic principle using different examples, and give you the opportunity
to enhance your understanding with practice problems. The text is ideally suited for
students, as well as those wishing to learn and apply the principles and tools of statistics
and probability through self-study.
© WILEY
www.wiley.com/college/ang