Statistical Programming: Simulation and Number Generation
Contents
5 Mathematical Functions
5.1 Introduction
5.1.1 Logarithmic function
5.1.2 Trigonometric functions
5.1.3 Power laws
5.1.4 Polynomial functions
5.2 Probability Functions
5.2.1 Continuous probability distributions
6 Sampling Distributions
6.1 Chi-Square Distributions
6.2 Student's t Distributions
6.3 Fisher's F Distributions
7 Confidence Intervals
7.0.1 Confidence Intervals for the Mean (Known Variance)
7.0.2 Confidence Intervals for the Mean (Unknown Variance)
Learning Objectives:
Pre-Requisites:
Homework
There will be approximately 12 homework assignments. You are encouraged
to work in groups, but remember to submit your own unique work.
For each assignment, turn in a hard copy of your R source file along with a
brief write-up of the solutions (do not submit raw output). Also submit a
copy of your R source file via e-mail (cyomari@yahoo.com). No late homework
will be accepted unless it is due to an excused absence.
• Late homework will not be accepted, except for acts of God, e.g. med-
ical emergencies or death in the family.
Please have your cell phones, iPods, Kindles, etc. turned off before entering
the classroom. If a cell phone rings, you text, or you use electronic devices
during a lecture or exam, you will suffer an automatic 10% reduction of your
final grade. Only one warning will be given. Using cell phones in class is
not proper university conduct.
Computer: During lectures or exams you may have the SAS or R
windows open, as well as a web browser directed to the class website ONLY.
The time in the computer lab is reserved for learning and following along
with the lecture. Opening web-based e-mail or websites other than the class
website will result in an automatic 10% reduction of your final grade.
Only one warning will be given.
Generating Random
Numbers
In this lecture, the focus will be on the generation of random numbers uni-
formly distributed on the interval (0, 1). The aim is to generate a stream
u1 , u2 , · · · of iid U (0, 1) variables. The algorithms used in practical imple-
mentations are all deterministic in nature, typically using a recursion, and
can therefore at best only mimic the behaviour of a truly U (0, 1) sequence
of random numbers. For this reason we call the output of a generator a
sequence of pseudo-random numbers. The generation of an iid U (0, 1)
sequence on a computer is a central theme in the area of simulation, as the
simulation of random variables from general distributions requires the abil-
ity to produce U (0, 1) random numbers.
One of the reasons for the widespread use of simulation methods is the fact that
the vast majority of practically important problems are not amenable to
analytical solutions. Another reason is that it is relatively easy to mas-
ter (at least) the basic steps of simulation techniques. Without any doubt,
the power of simulation methods increases enormously when simulations are
combined with analytical calculations.
Upper case letters (like X, Y, Z) are used to denote random variables. Lower
case letters (like x1 , y10 , z) with or without subscripts are mostly used to
denote realisations of random variables.
(c) it has been found that the generated numbers exhibit both bias and
dependence.
Most of today’s random number generators are not based on physical de-
vices, but on simple algorithms that can be easily implemented on a com-
puter, are fast, require little storage space, and can readily reproduce a given
sequence of random numbers.
a, the multiplier;
c, the increment;
m, the modulus, with m > a and m > c.
To obtain the desired sequence of random numbers u1, u2, · · · , uN, first generate
a sequence of integers x1, x2, · · · , xN in the range 0, 1, 2, · · · , m − 1,
starting from an initial value x0, known as the seed, using the recursion
x(n+1) = (a xn + c) mod m,
and then set un = xn / m.
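As an illustration, a linear congruential generator can be written as a short R function; lcg and its arguments are names chosen here for illustration only, not a built-in R facility.
lcg <- function(n, a, c, m, seed) {
  # returns n pseudo-random numbers in [0, 1) from x[i] = (a*x[i-1] + c) mod m
  x <- numeric(n)
  x[1] <- (a * seed + c) %% m
  if (n > 1) for (i in 2:n) x[i] <- (a * x[i - 1] + c) %% m
  x / m        # scale the integers 0, ..., m-1 to the unit interval
}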
This method generates random numbers on the interval [0, 1) rather than
[0, 1]. (when writing ranges, mathematicians use square brackets to indicate
that the endpoint is included in the range and round brackets to indicate
that it is not. So these ranges correspond to 0 ≤ · · · < 1 and 0 ≤ · · · ≤ 1.)
In fact, it is clear that it can only give numbers of the form i/m where
i = 0, 1, 2, · · · , m − 1. In practice this doesn’t matter, assuming m is large
enough, so we won’t distinguish between the two in this chapter.
The second objection to LCGs might be that the ui 's can take on only
the rational values 0, 1/m, 2/m, · · · , (m − 1)/m; in fact, the ui 's might ac-
tually take on only a fraction of these values, depending on the specification
of the constants m, a, c, and x0, as well as on the nature of the floating-
point division by m. Thus there is no possibility of getting a value of ui
between, say, 0.1/m and 0.9/m, whereas this should occur with probability
0.8/m > 0. As we shall see, the modulus m is usually chosen to be very
large, say 10^9 or more, so that the points in [0, 1] where the ui 's can fall are
very dense; for m ≥ 10^9, there are at least a billion possible values.
Example 1.3.1. Use this algorithm with parameter values a = 11, c = 37,
m = 100 and seed x0 = 85 to generate 5 random numbers from the U (0, 1)
distribution. How suitable do you think this method would be for generating
random numbers?
Solution. The first few numbers in the sequence are calculated as follows:
x1 = (11 × 85 + 37) mod 100 = 72
x2 = (11 × 72 + 37) mod 100 = 29
x3 = (11 × 29 + 37) mod 100 = 56
x4 = (11 × 56 + 37) mod 100 = 53
x5 = (11 × 53 + 37) mod 100 = 20
These are then divided by m to generate the random numbers, which are:
0.72, 0.29, 0.56, 0.53, 0.20, · · ·
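These values can be reproduced with the lcg() sketch given earlier (again, an illustrative function, not part of base R):
lcg(5, a = 11, c = 37, m = 100, seed = 85)
# [1] 0.72 0.29 0.56 0.53 0.20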
To see how suitable this sequence is for practical use, we need to see how
many numbers are generated and whether these appear to be random.
In fact this sequence only generates 50 different numbers out of the 100
in the range 0 to 99 because it starts repeating after x50 = x0 = 85.
 i   xi   ui        i   xi   ui        i   xi   ui        i   xi   ui
 0    7   --        5   10   0.625    10    9   0.563    15    4   0.250
 1    6   0.375     6    5   0.313    11    0   0.000    16    7   0.438
 2    1   0.063     7   12   0.750    12    3   0.188    17    6   0.375
 3    8   0.500     8   15   0.938    13    2   0.125    18    1   0.063
 4   11   0.688     9   14   0.875    14   13   0.813    19    8   0.500
The sequence begins to repeat after 16 random numbers, since x16 = x0 = 7.
Note that every integer from 0 to 15 appears exactly once in each cycle, so
this particular choice of parameters attains the full period m = 16.
Question 1.3.1. Use the algorithm with parameter values a = 10, c = 23,
m = 59 and seed x0 = 11 to generate 3 random numbers from the U (0, 1)
distribution.
Question 1.3.4. What are the disadvantages of using truly random num-
bers, as opposed to pseudo-random numbers in Monte Carlo simulations?
Answer.
Question 1.3.5. Describe the three integers one has to choose in order to
form a linear congruential generator (LCG) and set out a recursive relation-
ship, with subsequent scaling, for generating a sequence of pseudo-random
numbers in the range −1 to 1.
General approach to
generating random variates
2.1 Introduction
A simulation that has any random aspect at all must involve sampling, or
generating, random variates from probability distributions. These distribu-
tions are often specified as a result of fitting some appropriate distributional
form, e.g., exponential, gamma, or Poisson, to observed data. In this lecture
we assume that a distribution has already been specified somehow (including
the values of its parameters), and we address the issue of how we can
generate random variates from this distribution in order to run the simulation model.
For example, the queueing-type model required generation of inter-arrival
and service times to drive the simulation through time, and the inventory
model needed randomly generated demand sizes at the times when a de-
mand occurred.
There are many techniques for generating random variates, and the partic-
ular algorithm used must, of course, depend on the distribution from which
we wish to generate; however nearly all these techniques can be classified
according to their theoretical basis.
The main general methods for the generation of random variates are:
1. Inverse-transform method.
2. Acceptance-rejection method.
Special algorithms exist for the generation of random variates from impor-
tant distributions. For instance, the Box-Muller algorithm and the polar
algorithm are used for generation of random variates from the standard nor-
mal distribution.
If a random variable U is uniformly distributed over the interval [0, 1], then
the random variable X = F −1 (U ) has the distribution function F ( x ), since:
P(X ≤ x) = P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x)
The last step follows because if U ∼ U (0, 1) then P(U ≤ u) = u for any
value of u in the range (0, 1). (recall that ∼ is read “is distributed as”).
(1). Generate u from the U(0, 1) distribution.
(2). Return x = F⁻¹(u), which is the variate desired from the given distribution.
The difficulty with this technique lies in finding the inverse transform such
that F −1 (U ) = x. If this inverse function can be established, we only need to
generate various uniform random numbers and perform the inverse function
on them to obtain random deviates from the distribution desired.
The algorithm for generating a random variable from F can thus be written
as follows (Generation of a Discrete Random Variable):
1. Generate U ∼ U(0, 1).
2. Return the smallest value x such that U ≤ F(x).
Example 2.2.1. Describe how you would generate random variates from a
Binomial(3, 0.6) distribution X, using a sequence of random numbers ui .
It should be clear that the probabilities of each outcome are correct in the
above example. We will now discuss the general situation.
Let X be a discrete random variable which can take only the values x1, x2, · · · , xN,
with probabilities p1, p2, · · · , pN. Its cumulative distribution function is
F(x) = Σ_{i: xi ≤ x} pi,
where the summation runs over the range of indexes i such that xi ≤ x. If
x < x1 then F(x) = 0.
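As a sketch of this discrete inverse-transform idea in R, using the Binomial(3, 0.6) distribution of Example 2.2.1 (the variable names are illustrative):
p   <- dbinom(0:3, size = 3, prob = 0.6)   # probabilities of the values 0, 1, 2, 3
cdf <- cumsum(p)                           # F(0), F(1), F(2), F(3)
u   <- runif(1)
x   <- (0:3)[which(u <= cdf)[1]]           # smallest value x with u <= F(x)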
(a) Find the value of the constant k, so that f ( x ) is a well defined proba-
bility density function.
(b) Use the inverse transformation method to generate three random ob-
servations from f ( x ) by hand.
(c) Write an R function to generate 1000 observations from f ( x ).
Note: If needed, use the following uniform random numbers in the
given order:
r0 = 0.0257, r1 = 0.2331, r2 = 0.3452, r3 = 0.2589.
Example 2.2.3. Using the first 5 numbers in the first column of random
U (0, 1) numbers given in the Tables, generate 5 values from a Poisson(10)
distribution.
Bernoulli Distribution
f(x) = p^x (1 − p)^(1−x),   x = 0, 1.
1. Generate U ∼ U(0, 1).
2. If U ≤ p, return X = 1; otherwise return X = 0.
Binomial Distribution
Recall that a binomial random variable X can be viewed as the total number
of successes in n independent Bernoulli experiments, each with success
probability p. Denoting the result of the ith trial by Xi = 1 (success) or
Xi = 0 (failure), we can write X = X1 + X2 + · · · + Xn with the {Xi} iid Ber(p)
random variables. The simplest generation algorithm can thus be written
as follows (Generation of a Binomial Random Variable):
1. Generate X1, X2, · · · , Xn iid Ber(p), for example as above.
2. Return X = X1 + X2 + · · · + Xn.
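A minimal R sketch of this algorithm; rbinom_bernoulli is an illustrative name rather than a standard function:
rbinom_bernoulli <- function(n, p) sum(runif(n) < p)   # sum of n Bernoulli(p) trials
x <- rbinom_bernoulli(n = 10, p = 0.6)                 # one Binomial(10, 0.6) variate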
Geometric Distribution
f(x) = p(1 − p)^(x−1),   x = 1, 2, · · ·
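One common way to generate from this distribution, sketched here under the inverse-transform idea (rgeom_it is an illustrative name), uses the fact that P(X > k) = (1 − p)^k:
rgeom_it <- function(p) ceiling(log(runif(1)) / log(1 - p))   # number of trials up to the first success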
Proof: The cdf is
F(x) = 0 for x < 0,
F(x) = ∫_0^x 2t dt = x² for 0 ≤ x ≤ 1,          (2.3)
F(x) = 1 for x > 1.
Applying X = F⁻¹(U), we have
X = F⁻¹(U) = √U,   0 ≤ u ≤ 1.
Recall:
f(x) = d/dx F(x),    F(x) = ∫_0^x f(t) dt.
Then
F(x) = ∫_{t=0}^{x} f(t) dt = ∫_0^x 3t² dt.
Hence
F(x) = x³.
Since F(x) is the cumulative distribution function of f(x), we can set F(x)
equal to a uniform random number U. The inverse transformation is therefore
x = F⁻¹(U) = U^(1/3).
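A small R sketch of this inverse transformation for f(x) = 3x² on (0, 1):
u <- runif(1000)
x <- u^(1/3)      # inverse-transform variates from f(x) = 3x^2
hist(x)           # the histogram should rise roughly like 3x^2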
In general, finding the inverse transform requires solving the equation F(x) = u
with respect to x. Even in the case where F⁻¹ exists in an explicit form, the
inverse-transform method may not necessarily be the most efficient random
variable generation method.
In the next subsection we present algorithms for generating variables from
commonly used continuous distributions. Of the numerous algorithms available we have
tried to select those which are reasonably efficient and relatively simple to
implement.
Exponential Distribution
For the Exp(λ) distribution, F(x) = 1 − e^(−λx) for x > 0, so
F⁻¹(u) = −(1/λ) log(1 − u).
Noting that U ∼ U(0, 1) implies 1 − U ∼ U(0, 1), we obtain the following
algorithm (Generation of an Exponential Random Variable):
1. Generate U ∼ U(0, 1).
2. Return X = −(1/λ) log U.
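A one-line R sketch of this algorithm, here with λ = 2:
lambda <- 2
x <- -log(runif(1000)) / lambda   # exponential variates; works because 1 - U and U have the same distribution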
There are many alternative procedures for generating variables from the exponential
distribution. The interested reader is referred to Luc Devroye's
book Non-Uniform Random Variate Generation, Springer-Verlag, 1986. (The
entire book can be downloaded for free.)
Example 2.2.6. Given a random number U uniformly distributed over
[0, 1], give an expression in terms of U for another variable X whose density
function is f ( x ) = 5 exp(−5x ). Justify the expression.
Answer. Let X = −(log U)/5. Then, for x > 0, the cumulative distribution
function is
F(x) = P(X ≤ x) = P(log U ≥ −5x) = P(U ≥ e^(−5x)) = 1 − e^(−5x),
and differentiating gives f(x) = 5 exp(−5x), as required.
Use these to generate four pseudo random values from the exponential dis-
tribution with probability density function
f ( x ) = 5e−5x , x > 0.
[4]
Weibull Distribution
Generate a random variate from the Weibull(α, β) distribution.
Example 2.2.7. Use the U (0, 1) random numbers 0.238 and 0.655 to gen-
erate two observations from the Weibull distribution with parameters c =
(1/β)α = 0.002, and α = 1.1.
Solution. Using the inverse transform method, we need to set
U = 1 − e^(−c x^α),
i.e.,
−c x^α = log(1 − u),
i.e.,
x = (log(1 − u) / (−c))^(1/α).
Using the equation above with the parameters c = 0.002 and α = 1.1, we
get the two Weibull observations; a short R computation is sketched below.
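A sketch of the calculation in R (cc is used instead of c to avoid masking R's c() function):
u <- c(0.238, 0.655)
alpha <- 1.1; cc <- 0.002
(-log(1 - u) / cc)^(1 / alpha)    # approximately 87.0 and 300.7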
3. Describe the three integers one has to choose in order to form a linear
congruential generator (LCG) and set out a recursive relationship,
with subsequent scaling, for generating a sequence of pseudo-random
numbers in the range −1 to 1. (3)
specified distribution.
P(X = 0) = 0.1353 so F(0) = 0.1353
P(X = 1) = 0.2707 so F(1) = 0.4060 ← 0.167, 0.236
P(X = 2) = 0.2707 so F(2) = 0.6767
P(X = 3) = 0.1804 so F(3) = 0.8571 ← 0.778
P(X = 4) = 0.0902 so F(4) = 0.9473
P(X = 5) = 0.0361 so F(5) = 0.9834 ← 0.968
and so on.
5.
6. (a) Let X be any continuous random variable, and let F ( x ) be its cu-
mulative distribution function. Suppose that U is a continuous
random variable which follows a uniform distribution on the inter-
val (0, 1), and define the new random variable Y by Y = F −1 (U ),
where F −1 () is the inverse function of F (). By considering the
cumulative distribution function of Y, or otherwise, show that Y
has the same distribution as X. (5)
Solution. FY (y) = P(Y ≤ y) = P( F −1 (U ) ≤ y) = P(U ≤
F (y)). But U is uniform (0, 1), and so P(U ≤ u) = u for
0 ≤ u ≤ 1. Hence, FY (y) = P(U ≤ F (y)) = F (y), the same
cdf as X. Therefore, Y has the same distribution as X.
(b) The following values are a random sample of numbers from a
uniform distribution on the interval (0, 1).
Sample is 1, 1, 4, 4.
(ii) Pareto: f(x) = 18/x³, x > 3. (5)
Solution. For the Pareto distribution,
F_X(x) = ∫_3^x 18/t³ dt = [−9/t²]_3^x = 1 − 9/x²   (x ≥ 3).
Setting u = F_X(x) = 1 − 9/x² gives x = 3/√(1 − u). The
random variates therefore are 3.365, 4.144, 8.624, 10.882.
(iii) Standard normal: f(x) = (1/√(2π)) exp(−x²/2), −∞ < x < ∞. (4)
Solution. Using Normal tables, the four values correspond-
ing to ui are xi = Φ−1 (ui ), i.e., −0.82, −0.06, 1.17, 1.43.
7. Try to solve for the inverse function of the normal density. Highlight
the difficulties in obtaining a neat algebraic solution.
f(x) = 1 − e^(−x/β)
P(X = x) = f(x) = p(1 − p)^(x−1),   x = 1, 2, 3, · · ·
(b) Using a), can you devise a strategy to generate a random number
from a geometric distribution?
Explain your steps carefully.
Note: You can only submit your lab assignment printed or hand written
with appropriate graphs printed out from R. Emails are not accepted. Also,
note that the way you present your solutions is considered a part of your
skills and a portion of the marks is devoted to that.
C = max over all x of f(x)/h(x),   and   g(x) = f(x)/(C h(x)) for all x,
so that 0 < g(x) ≤ 1.
Therefore
C = 48/32 = 3/2.
C is closely related to the efficiency of the procedure, as it represents the
number of values which must be generated on average in order to end up
with a single acceptable value x.
3. Look at the ratio f(x)/h(x) and find the maximum value it takes over the
range. (Usually this will involve calculus.) Call this maximum C.
When f(x)/h(x) is divided by C you get f(x)/(C h(x)). This is g(x).
4. Now generate some values yi from the distribution h but accept them
only if ui ≤ g(yi ), i.e., with probability g(yi ). Here the ui ’s are random
values from U (0, 1).
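A compact R sketch of these steps for the density f(x) = 6x(1 − x) on (0, 1) (a density used again in a later exercise), taking h to be the U(0, 1) density so that C = max f(x)/h(x) = f(0.5) = 1.5:
C <- 1.5
g <- function(x) 6 * x * (1 - x) / C        # f(x) / (C h(x)), always between 0 and 1
n <- 1000
accepted <- numeric(0)
while (length(accepted) < n) {
  y <- runif(1)                             # candidate value from h
  u <- runif(1)                             # uniform for the acceptance decision
  if (u <= g(y)) accepted <- c(accepted, y) # keep y with probability g(y)
}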
Use these to generate four pseudorandom values from the exponential dis-
tribution with probability density function;
f ( x ) = 5e−5x , x>0
Steps are:
(i) Describe carefully how you would use the acceptance/rejection method
to generate discrete pseudo-random values from this distribution using
as your reference distribution the discrete uniform distribution that is
equally likely to take any of the values 0, 1, 2 or 3. (You are not
required to describe how the uniform values are generated and you
may assume that you have continuous uniform values available for the
acceptance/rejection decisions.) [4]
(iii) Outline a more efficient alternative method that could be used to gen-
erate the pseudo-random numbers for the numbers of claims. [3]
(ii) Determine the proportion of the simulations from the U (−1, 1) distri-
bution that will be rejected in the long run. [1]
f(x) = θ e^(−θx)
(ii) Show that this distribution has mean 500, and variance 500,000. [2]
(iv) Comment on the ease with which we could use the inversion method
to generate random numbers from this distribution. [1]
For generating values from the 2-parameter Pareto: 0.74 0.82 0.16
For making the acceptance-rejection decision: 0.26 0.59 0.71
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)),   −∞ < x < ∞,
where µ is the mean (or expectation) and σ2 the variance of the distribution.
The Box-Muller algorithm takes two independent U(0, 1) values U1 and U2 and returns
X = √(−2 log U1) cos(2π U2),   Y = √(−2 log U1) sin(2π U2).
So in fact the algorithm gives you two values each time (X and Y). But you do
have to feed in two values (U1 and U2) as well. It might not be obvious, but
X and Y are independent.
(Note that you work in radians if you use 2πUi - if you work in degrees
then you need to use 360Ui .)
The polar method proceeds as follows:
1. Generate U1, U2 ∼ U(0, 1) independently.
2. Set v1 = 2U1 − 1, v2 = 2U2 − 1 and s = v1² + v2².
3. If s > 1 go to step 1.
4. Return
z1 = v1 √(−2 ln s / s),   z2 = v2 √(−2 ln s / s).
Various routines for generating random variates can be found in the book
“Numerical Recipes in C: the art of scientific computing”, by W.H. Press,
S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Cambridge University
Press, 1992.
Example 2.4.1. Using the random numbers 0.802 and 0.450 from U (0, 1),
generate a pair of standard normal variates (a) using the Box-Muller algo-
rithm (b) using the polar method. How would you use these methods to
generate values from a N (3, 8) distribution?
Solution. Our z values are:
p
z1 = −2 log 0.802 cos(2π × 0.450) = −0.632
p
z2 = −2 log 0.802 sin(2π × 0.450) = 0.205
You could have used the two initial values the other way round in which case
you would get different answers.
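A sketch of part (a) in R, also rescaling to N(3, 8) on the assumption that 8 denotes the variance:
u1 <- 0.802; u2 <- 0.450
z1 <- sqrt(-2 * log(u1)) * cos(2 * pi * u2)   # about -0.632
z2 <- sqrt(-2 * log(u1)) * sin(2 * pi * u2)   # about  0.205
3 + sqrt(8) * c(z1, z2)                       # corresponding N(3, 8) values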
2. State how you would use the polar method to generate pairs of un-
correlated pseudo-random numbers from the standard normal distri-
bution.(2)
Generation of random
variates from a specified
distribution using R
3.1 Introduction
Most computer languages already contain a built-in pseudo-random number
generator. The user is typically requested only to input the initial seed, X0 ,
and upon invocation, the random number generator produces a sequence of
independent uniform (0, 1) random variables.
runif
runif(100)
runif(500)
We can change the defaults: runif(n, a, b), where n is how many numbers you
want to generate, a is the lower bound, and b is the upper bound.
Uniform random numbers are equally likely to be any number between the
bounds. This means that when we draw a histogram of our simulated ran-
dom numbers, it should be almost flat.
set.seed(20)
x=runif(200);mean(x)
set.seed(40)
x=runif(200);mean(x)
set.seed(40)
x=runif(200);mean(x)
set.seed(20)
x=runif(200);mean(x)
y=runif(200);mean(y)
Notice how the second and third means are the same (they use the same seed), the fourth repeats the first, but the final mean, y, is different because it continues the seed-20 stream without resetting the seed.
Example 3.1.1. Generate a sample of size 1000 from a uniform (0, 1) dis-
tribution (runif(1000)) and look at the following;
x<-runif(1000)
hist(x)
hist(log(x))
hist(log(x/(1-x)))
A seed can be specified for the random number generator. This is important
to allow replication of results (e.g., while testing and debugging).
The arguments specify the number of variables to be created and the range
over which they are distributed (by default the unit interval).
Note that for this, you will input the probability value a to find x.
For X ∼ Binomial(n, p), n defines the number of trials and p denotes
the probability of success. For a help menu about the binomial distribution
functions type the command help(qbinom).
Note that, by default, the pbinom function reports the probability for P(X ≤
x) (the first argument). When working with discrete density functions, be
careful about off-by-one errors. You can verify your answer by summing the
pdf: sum(dbinom(8:10, 10, p)) for 1-pbinom(7, size=10, prob=p).
Question 3.2.1. Using the R commands for the Binomial distribution, an-
swer the following questions. To answer these questions, you will need to
show the derivation and write down the R commands that you used for com-
putations.
The firing rates of 16 neurons are recorded in a particular region of the brain
of a monkey under each of two experimental conditions. The number of
neurons in which the firing rate is greater in Condition 1 than in Condition
2 is counted. Suppose that in the population of neurons the probability of
a higher firing rate for Condition 1 is 0.38.
(a) What is the probability that exactly 7 of the 16 neurons have higher
firing rates for Condition 1?
(b) What is the probability that 8 or fewer have higher firing rates for
Condition 1?
(c) What is the probability that more than 8 have higher firing rates for
Condition 1?
(d) What is the probability that 8 or more have higher firing rates for
Condition 1?
(e) What is the probability that from 5 to 7 have higher firing rates for
Condition 1?
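One way to compute these probabilities in R, with n = 16 and p = 0.38, is sketched below:
n <- 16; p <- 0.38
dbinom(7, n, p)                      # (a) exactly 7
pbinom(8, n, p)                      # (b) 8 or fewer
1 - pbinom(8, n, p)                  # (c) more than 8
1 - pbinom(7, n, p)                  # (d) 8 or more
pbinom(7, n, p) - pbinom(4, n, p)    # (e) from 5 to 7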
Question 3.2.2. Problem Suppose there are twelve multiple choice ques-
tions in an English class quiz. Each question has five possible answers, and
only one of them is correct. Find the probability of having four or less
correct answers if a student attempts to answer every question at random.
Solution Since only one out of five possible answers is correct, the prob-
ability of answering a question correctly by random is 1/5=0.2. We can find
the probability of having exactly 4 correct answers by random attempts as
follows.
> dbinom(4, size=12, prob=0.2)
[1] 0.1329
To find the probability of having four or less correct answers by random
attempts, we apply the function dbinom with x = 0, 1, ..., 4.
> dbinom(0, size=12, prob=0.2) + dbinom(1, size=12, prob=0.2) +
+ dbinom(2, size=12, prob=0.2) + dbinom(3, size=12, prob=0.2) +
+ dbinom(4, size=12, prob=0.2)
[1] 0.9274
Alternatively, we can use the cumulative probability function for the binomial
distribution, pbinom:
> pbinom(4, size=12, prob=0.2)
[1] 0.9274
(i) What is the probability that the intern on call will be able to
sleep through the night?
We simply evaluate the Poisson distribution for the case where
k = 0, and λ = 4. The R-code is: >dpois(0,4)
(ii) What is the probability that the intern will be swamped (defined
as more than six births to be attended to)?
We will use the cumulative distribution function. The R-code is:
1-ppois(6,4)
4.
5. The employees of a firm that does asbestos cleanup are being tested
for indications of asbestos in their lungs. the firm is asked to send four
employees who have positive indication of asbestos on to a medical
center for further testing. Assume 40% of the employees have positive
indications of asbestos in their lungs. Let x be the number of employees
that are tested before finding the four who do have asbestos in their
lungs.
If you input only x, the default values for mean and standard de-
viation are m = 0 and s = 1, and therefore, the command dnorm(x)
finds the density function for a standard normal random variable at
value x.
Note that for this, you will input the probability value p to find x p .
For a help about the normal distribution functions type the help command
help(qnorm).
Example 3.2.1. Using the R commands for the Normal distribution, an-
swer the following questions. To answer these questions, you will need to
show the derivation and write down the R commands that you used for com-
putations.
(b) If the acceptable range of hardness was (70 − c, 70 + c), for what value
of c would 95% of all specimens have acceptable hardness?
Start with making 4 plots, each showing the probability density func-
tion on the interval from 0 to 10, for each of these functions. Hints:
Look for the functions dnorm, dunif, dchisq, and df. You can place
all plots in the same figure using the command
> par(mfrow=c(2,2))
Then, make three similar histograms, but where you take the mean
6. Repeat the previous exercise for each of the three other distributions
mentioned in the first exercise. What do you observe? Can you connect
this to previous theoretical statistical knowledge?
8. Calculate the probability for each of the following events: (a) A stan-
dard normally distributed variable is larger than 3. (b) A normally
distributed variable with mean 35 and standard deviation 6 is larger
than 42. (c) Getting 10 out of 10 successes in a binomial distribution
with probability 0.8. (d) X < 0.9 when X has the standard uniform
distribution. (e) X > 6.5 in a χ2 distribution with 2 degrees of free-
dom.
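The corresponding R commands are sketched below:
1 - pnorm(3)                        # (a) standard normal greater than 3
1 - pnorm(42, mean = 35, sd = 6)    # (b) N(35, 6^2) greater than 42
dbinom(10, size = 10, prob = 0.8)   # (c) 10 successes out of 10
punif(0.9)                          # (d) X < 0.9, standard uniform
1 - pchisq(6.5, df = 2)             # (e) X > 6.5, chi-squared with 2 df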
(a) Find the 95th percentile of the F distribution with (5, 2) degrees
of freedom.
(b) Find the 2.5th and 97.5th percentiles of the Student t distribution
with 5 degrees of freedom.
(c) Find the 95th percentile of the Chi-Squared distribution with 7
degrees of freedom.
(d) Assume that the test scores of a college entrance exam fits a
normal distribution. Furthermore, the mean test score is 72, and
the standard deviation is 15.2. What is the percentage of students
scoring 84 or more in the exam?
(e) Suppose the mean checkout time of a supermarket cashier is three
minutes. Find the probability of a customer checkout being com-
pleted by the cashier in less than two minutes. Ans: pexp(2,
rate=1/3)
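R commands for the remaining parts are sketched below:
qf(0.95, df1 = 5, df2 = 2)            # (a) 95th percentile of F(5, 2)
qt(c(0.025, 0.975), df = 5)           # (b) 2.5th and 97.5th percentiles of t(5)
qchisq(0.95, df = 7)                  # (c) 95th percentile of chi-squared(7)
1 - pnorm(84, mean = 72, sd = 15.2)   # (d) proportion scoring 84 or more
pexp(2, rate = 1/3)                   # (e) checkout completed in under two minutes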
Chapter 4
Monte Carlo Methods and Importance Sampling
History and definition: The term “Monte Carlo” was apparently first
used by Ulam and von Neumann as a Los Alamos code word for the stochas-
tic simulations they applied to building better atomic bombs. Their meth-
ods, involving the laws of chance, were aptly named after the international
gaming destination; the moniker stuck and soon after the War a wide range
of sticky problems yielded to the new techniques. Despite the widespread
use of the methods, and numerous descriptions of them in articles and mono-
graphs, it is virtually impossible to find a succint definition of “Monte Carlo
method” in the literature. Perhaps this is owing to the intuitive nature of
the topic which spawns many definitions by way of specific examples. Some
authors prefer to use the term “stochastic simulation” for almost everything,
reserving “Monte Carlo” only for Monte Carlo Integration and Monte Carlo
Tests (cf. Ripley 1987). Others seem less concerned about blurring the dis-
tinction between simulation studies and Monte Carlo methods.
Be that as it may, a suitable definition can be good to have, if for nothing
other than to avoid the awkwardness of trying to define the Monte Carlo
method by appealing to a whole bevy of examples of it. Since I am (so
Elizabeth claims!) unduly influenced by my advisor’s ways of thinking, I
like to define Monte Carlo in the spirit of definitions she has used before. In
particular, I use:
Definition: Monte Carlo is the art of approximating an ex-
pectation by the sample mean of a function of simulated random
variables
We will find that this definition is broad enough to cover everything that
has been called Monte Carlo, and yet makes clear its essence in very familiar
terms: Monte Carlo is about invoking laws of large numbers to approximate
expectations.
While most Monte Carlo simulations are done by computer today, there
were many applications of Monte Carlo methods using coin-flipping, card-
drawing, or needle-tossing (rather than computer generated pseudo-random
numbers) as early as the turn of the century—long before the name Monte
Carlo arose.
4.0.1 Introduction
Two major classes of numerical problems that arise in statistical inference
are optimization problems and integration problems. Indeed, numerous ex-
amples (see Rubinstein, 1981, Gentle, 2002, or Robert, 2001) show that it is
not always possible to analytically compute the estimators associated with
a given paradigm (maximum likelihood, Bayes, method of moments, etc.).
Thus, whatever the type of statistical inference, we are often led to con-
sider numerical solutions. The previous chapter introduced a number of
methods for the computer generation of random variables with any given
distribution and hence provides a basis for the construction of solutions to
our statistical problems. A general solution is indeed to use simulation, of
either the true or some substitute distributions, to calculate the quantities of
interest. In the setup of decision theory, whether it is classical or Bayesian,
this solution is natural since risks and Bayes estimators involve integrals
with respect to probability distributions.
Lastly, numerical integration tools cannot easily face the highly (or even
moderately) multidimensional integrals that are the rule in statistical prob-
lems. Devising specific integration tools for those problems would be too
costly, especially because we can take advantage of the probabilistic nature
of those integrals.
where χ denotes the set where the random variable X takes its values; that is,
we wish to approximate the expectation
E_f[h(X)] = ∫_χ h(x) f(x) dx.          (3.1)
The principle of the Monte Carlo method for approximating (3.1) is to generate
a sample (X1, . . . , Xn) from the density f and propose as an approximation
the empirical average
h̄_n = (1/n) Σ_{j=1}^n h(Xj),
which converges to E_f[h(X)] by the strong law of large numbers.
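A minimal sketch of this principle in R, taking f to be the N(0, 1) density and h(x) = x², so that the expectation being approximated equals 1:
h <- function(x) x^2
x <- rnorm(10000)        # sample from f
mean(h(x))               # Monte Carlo estimate of E[h(X)], close to 1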
Random: 23 64 18 96 71 46 54 8 11 81 75 39 28 43 52
Solution. Table 1 gives the demand distribution:
Demand:      8    9    10   11   12   13   14   15   16   17
Probability: 0.05 0.09 0.10 0.15 0.13 0.08 0.11 0.14 0.08 0.07
Table 2 assigns an interval of two-digit random numbers to each demand value
in proportion to its probability:
Demand  Probability  Cumulative Probability  Random Number Interval
8       0.05         0.05                    00 - 04
9       0.09         0.14                    05 - 13
10      0.10         0.24                    14 - 23
11      0.15         0.39                    24 - 38
12      0.13         0.52                    39 - 51
13      0.08         0.60                    52 - 59
14      0.11         0.71                    60 - 70
15      0.14         0.85                    71 - 84
16      0.08         0.93                    85 - 92
17      0.07         1.00                    93 - 99
The following is the schedule of simulated demand for the first 15 days,
in the order of the given random numbers;
Random numbers: 23 64 18 96 71 46 54 8 11 81 75 39 28 43 52
Day: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Demand: 10 14 10 17 15 12 13 9 9 15 15 12 11 12 13
From table 2, read the demand against each random number.
Example 4.0.2. Mill bakery maintains sufficient stock of its ‘Ever sweet’
cake and the daily demand is as under:
Daily Demand 0 10 20 30 40 50 60 70 80
Probability 0.02 0.16 0.23 0.15 0.13 0.12 0.10 0.06 0.03
Using the sequence of random numbers simulate the demand for the next
12 days. If the proprietor of the bakery decides to make 40 cakes everyday,
then calculate the stock position at the end of the 12th day. Also calculate
the daily average demand for the cakes.
Random: 36 29 84 57 19 79 46 67 08 81 87 94
Solution. The table below shows the required columns to simulate the daily
average demand for the cakes.
Demand  Probability  Cumulative Probability  Random Number Interval  Random Numbers
0       0.02         0.02                    00 - 01                 -
10      0.16         0.18                    02 - 17                 08
20      0.23         0.41                    18 - 40                 36, 29, 19
30      0.15         0.56                    41 - 55                 46
40      0.13         0.69                    56 - 68                 57, 67
50      0.12         0.81                    69 - 80                 79
60      0.10         0.91                    81 - 90                 84, 81, 87
70      0.06         0.97                    91 - 96                 94
80      0.03         1.00                    97 - 99                 -
The following is the schedule of simulated demand for the first 12 days.
Random Numbers 36 29 84 57 19 79 46 67 08 81 87 94
Day 1 2 3 4 5 6 7 8 9 10 11 12
Demand 20 20 60 40 20 50 30 40 10 60 60 70
Stock Position 20 40 20 20 40 30 40 40 70 50 30 0
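The same simulation can be sketched in R with sample() in place of the random-number table (set.seed is used only for reproducibility):
demand <- c(0, 10, 20, 30, 40, 50, 60, 70, 80)
prob   <- c(0.02, 0.16, 0.23, 0.15, 0.13, 0.12, 0.10, 0.06, 0.03)
set.seed(1)
d <- sample(demand, size = 12, replace = TRUE, prob = prob)
stock <- cumsum(40 - d)   # stock position with 40 cakes baked per day (negative values would mean unmet demand)
d; stock; mean(d)         # simulated demand, stock position, and daily average demand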
Assignment I
Simulate the system for 15 days and find the average number of latecomers
per day, using the following random numbers.
94, 22, 12, 64, 94, 60, 85, 0, 1, 91, 44, 43, 47, 39, 38, 77
Chapter 5
Mathematical Functions
5.1 Introduction
For the kinds of functions you will meet in statistical computing there are
only three mathematical rules that you need to learn: these are concerned
with powers, exponents, and logarithms. In the expression x^b the explanatory
variable x is raised to the power b. In e^x the explanatory variable x appears
as the exponent of e = 2.71828. The inverse of e^x is the logarithm of x,
denoted by log(x); note that all our logs are to the base e, so for us
log(x) is the same as ln(x).
y = a ln(bx )
x<-seq(0,10,0.1)
Note that the plot function can be used in an alternative way, specifying
the Cartesian coordinates of the line: using plot(x,y) rather than the
formula plot(y~x).
x<-seq(0,2*pi,2*pi/100)
y1<-cos(x)
y2<-sin(x)
plot(y1~x, type="l", main="cosine curve")
plot(y2~x, type="l", main="sine curve")
y3<-tan(x)
plot(y3~x, type="l", ylim=c(-3,3), main="tangent curve")
Depending on the value of the power, b, the relationship can take one of
five forms. In the trivial case of b = 0 the function is y = a (a horizontal
straight line).
The four more interesting shapes are as follows:
x<-seq(0,1,0.01)
y<-x^0.5
plot(x, y, type="l", main="0<b<1")
y<-x
plot(x, y, type="l", main="b=1")
y<-x^2
plot(x, y, type="l", main="b>1")
y<-1/x
plot(x, y, type="l", main="b<0")
log(y) = log(a x^b) = log(a) + b log(x),
so that on log-log axes the intercept is log(a) and the slope is b. These
are often called allometric relationships because when b ≠ 1 the proportion
of x that becomes y varies with x.
x<-seq(0,10,0.1)
y1<-2+5*x-0.2*x^2
Making the negative coefficient of the x2 term larger produces a curve with
a hump as follows:
y2<-2+5*x-0.4*x^2
y3<-2+4*x-0.6*x^2+0.04*x^3
y4<-2+4*x+2*x^2-0.6*x^3+0.04*x^4
Gamma function
The gamma function Γ(t) is an extension of the factorial function t! to
positive real numbers:
Γ(t) = ∫_0^∞ x^(t−1) e^(−x) dx
t<-seq(0.2, 4, 0.01)
plot(t, gamma(t), type="l")
abline(h=1, lty=2)
Factorial function
The factorial function gives the number of permutations of n items.
n! = n(n − 1)(n − 2) · · · × 3 × 2 × 1
x<-0:6
plot(x, factorial(x), type= "s", main= "factorial x", log= "y")
Binomial coefficients
These show the number of ways there are of selecting x items out of n items
when the item can be one of just two types (e.g. either male or female, black
or white, solvent or insolvent):
(n choose x) = n! / (x!(n − x)!)
So with n = 8 and x = 3, we get:
(8 choose 3) = 8! / (3!(8 − 3)!) = (8 × 7 × 6)/(3 × 2) = 56
and in R.
choose(8,3)
[1] 56
R provides four types of function for each probability distribution: the probability
density or mass function (d prefix); the cumulative probability (p); the quantiles
of the distribution (q); and random numbers generated from the distribution (r).
Each letter can be prefixed to the R function names in the table below (e.g. dbeta).
curve(pnorm(x),-3,3)
arrows(-1,0,-1,pnorm(-1),col="red")
arrows(-1,pnorm(-1),-3,pnorm(-1),col="green")
pnorm(-1)
[1] 0.1586653
d/dx (F(x)) = f(x)
You can see at once that the slope is never negative. The slope starts out
very shallow up to about x = −2, increases up to a peak (at x = 0 in this
example) then gets shallower, and becomes very small indeed above about
x = 2.
Here is what the density function of the normal (dnorm) looks like:
curve(dnorm(x),-3,3)
For a discrete random variable, like the Poisson or the binomial, the proba-
bility density function is straightforward: it is simply a histogram with the
y axis scaled as probabilities rather than counts, and the discrete values of
x (0, 1, 2,· · · ) on the horizontal axis. But for a continuous random variable,
the definition of the probability density function is more subtle: it does not
have probabilities on the y axis, but rather the derivative (the slope) of the
cumulative probability function at a given value of X.
Normal distribution
When the distribution has mean 0 and standard deviation 1 (the standard
normal distribution) the equation becomes:
f(z) = (1/√(2π)) e^(−z²/2)
Suppose we have measured the heights of 100 people. The mean height was
170 cm and the standard deviation was 8 cm. We can then answer questions
such as the following. What is the probability that a randomly selected
individual will be: (a) shorter than 160 cm; (b) taller than 185 cm?
For (a), the z value is (160 − 170)/8 = −1.25, so
pnorm(-1.25)
[1] 0.1056498
For (b), the z value is (185 − 170)/8 = 1.875, and pnorm(1.875) gives the
probability that someone will be less than or equal to 185 cm (that is what
the function pnorm has been written to provide). All we need to do is to
work out the complement of this:
1-pnorm(1.875)
[1] 0.03039636
hist(runif(1000)*10, main="")
What about the distribution of sample means, based on taking just 5 uni-
formly distributed random numbers?
means<-numeric(1000)
for(i in 1:1000){
means[i]<-mean(runif(5)*10)
}
hist(means, ylim= c(0,1600))
mean(means)
sd(means)
x<-seq(0,10,0.1)
The probability density has an integral of 1.0 (that’s the area beneath the
normal curve), but we had 10,000 samples. To scale the normal probability
density function to our particular case, however, depends on the height of
yvals<-rnorm(100,24,4)
mean(yvals)
[1] 24.2958
sd(yvals)
[1] 3.5725
If you want to generate random numbers with an exact mean and stan-
dard deviation, then do the following:
ydevs<-rnorm(100,0,1)
Now compensate for the fact that the mean is not exactly 0 and the stan-
dard deviation is not exactly 1 by expressing all the values as departures
from the sample mean scaled in units of the sample’s standard deviations:
ydevs<-(ydevs-mean(ydevs))/sd(ydevs)
Check that the mean is zero and the standard deviation is exactly 1:
mean(ydevs)
sd(ydevs)
The mean is as close to zero as makes no difference, and the standard devi-
ation is one. Now multiply this vector by your desired standard deviation
and add to your desired mean value to get a sample with exactly the means
and standard deviation required:
yvals<-24+ydevs*4
mean(yvals)
sd(yvals)
Sampling Distributions
Recall that if X has the standard normal distribution, then X² has a special
gamma distribution, which is referred to as the chi-square distribution,
and this accounts for the important role that the chi-square distribution
plays in problems of sampling from normal populations. A chi-square
distribution is often denoted by “χ² distribution”.
The mean and the variance of the chi-square distribution with v degrees
of freedom are v and 2v, and its moment generating function is given by
M_X(t) = (1 − 2t)^(−v/2)   for t < 1/2.
Theorem 6.1.1. If X has the standard normal distribution, then X² has
the Chi-squared distribution with v = 1 degree of freedom.
More generally, let us show that
Theorem 6.1.2. If X1, X2, · · · , Xn are independent random variables having
standard normal distributions, then
Y = Σ_{i=1}^n Xi²
has the Chi-squared distribution with v = n degrees of freedom.
Proof. Since a detailed proof of part 1 would go beyond the scope of this
course, we shall prove it only for n = 2, and we shall assume the independence
of X̄ and S² in our proof of part 2.
Now, if we divide each term by σ² and substitute (n − 1)S² for Σ_{i=1}^n (Xi − X̄)²,
it follows that
Σ_{i=1}^n ((Xi − µ)/σ)² = (n − 1)S²/σ² + ((X̄ − µ)/(σ/√n))².
With regard to the three terms of this identity, we know that the one on the
left-hand side of the equation is a random variable having a chi-square distribution
with n degrees of freedom. Also, the second term on the right-hand
side of the equation is a random variable having a chi-square distribution
with 1 degree of freedom. It follows that
(n − 1)S²/σ²
is a random variable having a chi-square distribution with n − 1 degrees of
freedom.
Example 6.1.1. Plot the Chi-square density function with degrees of free-
dom, d f = 6.
Solution. We will plot the function from x = 0 to x = 30.
> x=seq(0,30,0.1)
> plot(x,dchisq(x,6),main="Chi-distribution with df=6",type="l",col="blue")
> plot(x,dchisq(x,2),main="Chi-distribution with df=2",type="l",col="red")
The function dchisq(x,6) returns the Chi-square density function with
df = 6.
Question 6.1.1. Find the critical value, χ2α (k ), of the Chi-square distribu-
tion with 6 degrees of freedom and significance level α = 0.05.
Solution. We obtain:
> qchisq(0.05,6,lower.tail=F)
12.59159
Then, under H0,
X² = (n − 1)S²/σ0²
has a χ² distribution with n − 1 degrees of freedom.
H0: σ² = σ0² against H1: σ² > σ0²   (upper-tailed test)
or
H0: σ² = σ0² against H1: σ² < σ0²   (lower-tailed test)
or
H0: σ² = σ0² against H1: σ² ≠ σ0²   (two-tailed test)
Checking in the tables, in the row headed 6 d.f. and the column headed
with an upper-tail area of 0.05, we see the number 12.5916. Thus,
P(Σ_{i=1}^6 Zi² > 12.5916) = 0.05   or   P(Σ_{i=1}^6 Zi² ≤ 12.5916) = 0.95,
and b = 12.5916.
Example 6.1.4. Suppose that the thickness of a part used in a semicon-
ductor is its critical dimension and that the process of manufacturing these
parts is considered to be under control if the true variation among the thick-
ness of the parts is given by a standard deviation not greater than σ = 0.60
thousandth of an inch. To keep a check on the process, random samples of
size n = 20 are taken periodically, and it is regarded to be “out of control”
if the probability that s2 will take on a value greater than or equal to the
observed sample value is 0.01 or less (even though σ = 0.60). What can
one conclude about the process if the standard deviation of such a periodic
random sample is s = 0.84 thousandth of an inch?
Solution. The process will be declared “out of control” if (n − 1)s²/σ², with
n = 20 and σ = 0.60, exceeds χ²_{0.01, 19} = 36.191. Since
(n − 1)s²/σ² = 19(0.84)²/(0.60)² = 37.24
exceeds 36.191, the process is declared to be out of control.
1-pchisq(4.5,df=2)
0.1053992
It follows that
E(Y) = E(Σ_{i=1}^n Zi²) = Σ_{i=1}^n E(Zi²) = Σ_{i=1}^n 1 = n;
(X̄ − µ)/(σ/√n)
f(t) = [Γ((v + 1)/2) / (√(πv) Γ(v/2))] (1 + t²/v)^(−(v+1)/2)   for −∞ < t < ∞
Theorem 6.2.1. If X̄ and S² are the mean and the variance of a random
sample of size n from a normal population with the mean µ and the variance
σ², then
T = (X̄ − µ)/(S/√n)
has the t-distribution with n − 1 degrees of freedom.
Although the mean of the t distribution does not exist when n = 1, the
mean does exist for any value of n > 1. Of course, whenever the mean does
exist, its value is 0 because of the symmetry of the t distribution.
Figure (**) displays the pdfs of two t distributions. They can be distinguished
by virtue of the fact that the variance of t(ν) decreases as ν increases.
It may strike you that t pdfs closely resemble normal pdfs. In fact,
the standard normal pdf is a limiting case of the t pdfs:
Theorem 6.2.2. Let Fν denote the cdf of t(ν) and let Φ denote the cdf of
Normal(0, 1). Then
lim_{ν→∞} Fν(t) = Φ(t).
pt(-1.5, df=14)
0.07791266
If Y1 ∼ χ²(ν1) and Y2 ∼ χ²(ν2) are independent, then
F = (Y1/ν1) / (Y2/ν2)
is a random variable having an F distribution. Because Yi ≥ 0, the set of
possible values of F is F(S) = [0, ∞). We are interested in the distribution
of F:
f(x) = [Γ((v1 + v2)/2) / (Γ(v1/2) Γ(v2/2))] (v1/v2)^(v1/2) x^(v1/2 − 1) (1 + (v1/v2)x)^(−(v1+v2)/2)
If Z ∼ N(0, 1) is independent of Y2 ∼ χ²(ν2), then
T = Z / √(Y2/ν2) ∼ t(ν2)
and
T² = Z² / (Y2/ν2) = (Y1/1) / (Y2/ν2) ∼ F(1, ν2),
where Y1 = Z² ∼ χ²(1).
Y2 /ν2 Y2 /ν2
More generally,
Theorem 6.3.1. If T ∼ t(ν), then T 2 ∼ F (1, ν).
The R function pf returns values of the cdf of any specified F distribution.
For example, if F ∼ F(2, 27), then P(F > 2.5) is
1-pf(2.5,df1=2,df2=27)
0.1008988
Theorem 6.3.2. If s1² and s2² are the variances of independent random
samples of sizes n1 and n2 from normal populations with variances σ1² and
σ2², then
F = (s1²/σ1²) / (s2²/σ2²) = (σ2² s1²) / (σ1² s2²)
is a random variable having an F distribution with n1 − 1 and n2 − 1 degrees
of freedom.
We may be interested in comparing the variance of two normal distribu-
tions, and testing to determine whether or not they are equal. This problem
is often encountered when comparing the precision of two measuring instru-
ments, the variation in quality characteristics of manufactured product, or
the variation in scores for two testing procedures.
Suppose X11 , X12 , · · · , X1n1 and X21 , X22 , · · · , X2n2 are independent random
samples from normal distributions with unknown means and Var(X1i) = σ1²,
Var(X2i) = σ2², where σ1² and σ2² are unknown. Suppose we are interested
in testing the null hypothesis H0: σ1² = σ2² against the alternative hypothesis
H1: σ1² > σ2². Now, σ1² and σ2² can be estimated by S1² and S2² respectively.
We would reject H0: σ1² = σ2² in favour of H1: σ1² > σ2² if S1² is much
larger than S2², i.e., reject H0: σ1² = σ2² if
s1²/s2² > k.
The rejection region becomes F_comp > F_{α, n1−1, n2−1}, i.e., k = F_{α, n1−1, n2−1}.
Example 6.3.1. Consider two random samples, X1 and X2 of sizes 10 and
20 with sample variances given as 0.0003 and 0.0001 respectively. Assuming
that the populations, from which the samples have been drawn, are normal,
determine whether the variance of the first population is significantly greater
than the second one. Take α = 0.05.
Solution. Let σ12 and σ22 denote the variances for the first and second popu-
lation from which the samples were taken. Then the hypothesis to be tested
is H0 : σ12 = σ22 against H1 : σ12 > σ22 .
Therefore,
F = S1²/S2² = 0.0003/0.0001 = 3.
The tabulated value is F_{9, 19, 0.05} = 2.42. Since F_calc = 3 exceeds 2.42, we
reject H0.
Conclusion: The variance of the first population is significantly greater than
that of the second.
Note:
1. If X1, X2, X3, · · · , Xn is a random sample of size n from a normal distribution
with mean µ and variance σ², then the sample mean
X̄ = (1/n) Σ_{i=1}^n Xi
is normally distributed with mean µ and variance σ²/n.
2. If χ1² and χ2² are independent chi-square random variables with v1 and
v2 degrees of freedom respectively, then
F = (χ1²/v1) / (χ2²/v2)
is said to have an F-distribution with v1 numerator degrees of freedom
and v2 denominator degrees of freedom.
Example 6.3.2. For the two samples on yields of potatoes we had n1 = 10,
S1² = 12, n2 = 12, S2² = 26.17. Is there reason to suspect that the samples
came from normal populations with different variances? Take α = 5% and
α = 1%.
(a). Is it reasonable to assume that the variances of the height of boys and
girls of this age are the same?
(b). What is the 95% confidence interval for the difference in mean heights
of boys and girls of this age?
Confidence Intervals
So far, we’ve talked about estimates for parameters that give a single number
as the best guess for that parameter given the data we have. Sometimes it
is useful to instead estimate an interval for the possible values of the param-
eter and put a probability on how confident we are that the true parameter
value falls inside this interval.
If Ln and Un are statistics computed from the sample, then (Ln, Un) is a
100(1 − α)% confidence interval for the parameter θ if
P(Ln < θ < Un) = 1 − α.
where the probability is computed using the standard normal cdf. With a
little algebraic manipulation, we can convert this to
P(X̄n − z_{α/2} σ/√n < µ < X̄n + z_{α/2} σ/√n) = 1 − α.
Using R: To calculate the critical value zα/2 , notice that this is just another
word for the 1 − α/2 quantile. The R command to get this value is:
qnorm(1-0.5 * alpha)
For the most common choice of α = 0.05 (again, a 95% confidence inter-
val), we get that z0.025 ≈ 1.96, for a 99% confidence interval, we would get
the critical value z0.005 ≈ 2.58.
Example 7.0.1. You want to estimate the average snowfall in the Mt.Kenya
front this year, and you take snowfall measurements at 40 different locations
along the front. Experience from previous years indicates that there is a
variance of 36 inches between measurements at these locations. You compute
the average snowfall for the year is 620 inches. What is a 95% confidence
interval for the average snowfall?
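A sketch of the calculation in R, taking the stated 36 to be the known variance:
xbar <- 620; n <- 40; sigma2 <- 36; alpha <- 0.05
z <- qnorm(1 - alpha/2)                  # 1.96
xbar + c(-1, 1) * z * sqrt(sigma2 / n)   # approximately (618.1, 621.9)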
t-statistic and its distribution were discovered by Ronald Fisher.)1 The pdf
of the t-distribution is very similar to the Gaussian. It is centered at zero
and has a symmetric “hill” shape, but it does have “heavier tails”, meaning that
it goes to zero more slowly than the Gaussian. The t-distribution has one
parameter, the degrees of freedom m, and as the degrees of freedom grow large
the distribution approaches the Gaussian. The random variable Tn above
has a t-distribution with degrees of freedom equal to m = n − 1. In terms
of notation, this is written Tn ∼ t(n − 1).
Back to confidence intervals, we can also get critical values of the t-distribution,
denoted t_{α/2}, which satisfy P(|Tn| ≤ t_{α/2}) = 1 − α. For example, with
n = 10 and α = 0.05, the R command
qt(1-0.025, df=9)
returns the result 2.26. Notice that this gives a wider confidence
interval than the Gaussian case above did. If n were to grow larger, our
confidence interval would shrink and eventually approach the 1.96 value of the
Gaussian.
Example 7.0.2. Repeat the Snowball analysis above, but this time you do
not rely on previous estimates of the snowfall variance. You compute the
1 It is called Student's t-distribution because Gosset used the pseudonym “Student” so
that he wouldn't get in trouble with his employer (the Guinness brewery) for publishing his
work.
t.test(data)
> t.test(data)
t.test(data, conf.level=0.9)
We apply it to the data set data above, assuming we know ahead that
the variance is 2.
> data=c(6.2, 6.6, 7.1, 7.4, 7.6, 7.9, 8, 8.3, 8.4, 8.5, 8.6, 8.8, 8.8,
+ 9.1, 9.2, 9.4, 9.4, 9.7, 9.9, 10.2, 10.4, 10.8, 11.3, 11.9)
>
>
> norm.interval=function(data, variance=var(data), conf.level=0.95){
+ z=qnorm((1-conf.level)/2, lower.tail=FALSE)
+ xbar=mean(data)
+ sdx=sqrt(variance/length(data))
+ c(xbar-z*sdx, xbar+z*sdx)
+ }
> norm.interval(data, 2)
# "good" or "excellent"
# (something new)
#let’s select a 95% confidence interval
alpha = 0.05
zstar = -qnorm(alpha/2)
Other Distributions Used in Hypothesis Testing
The main distributions used in hypothesis testing are: the chi-squared, for
testing hypotheses involving count data; Fisher's F, in analysis of variance
(ANOVA) for comparing two variances; and Student's t, in small-sample
work for comparing two parameter estimates. These distributions tell us
the size of the test statistic that could be expected by chance alone when
nothing was happening (i.e. when the null hypothesis was true). Given the
rule that a big value of the test statistic tells us something is happening,
and hence that the null hypothesis is false, these distributions define what
constitutes a big value of the test statistic.
par(mfrow=c(1,2))
x<-seq(0,30,0.25)
plot(x, pchisq(x, 3, 7.25), type= "l", ylab= "p(x)", xlab="x")
plot(x, pchisq(x, 5, 10), type= "l", ylab="p(x)", xlab="x")
The cumulative probability on the left has 3df and non-centrality parame-
ter (12 + 1.52 + 22 = 7.25), while the distribution on the right has 4df and
non-centrality 10 (note the longer left-hand tail at low probabilities). Chi-
squared is also used to establish confidence intervals for sample variances.
The quantity
(n − 1)s²/σ²
is the degrees of freedom (n − 1) multiplied by the ratio of the sample variance
s² to the unknown population variance σ². This follows a chi-squared
distribution, so we can establish a 95% confidence interval for σ² as follows:
(n − 1)s²/χ²_{1−α/2} ≤ σ² ≤ (n − 1)s²/χ²_{α/2}
8*10.2/qchisq(0.975, 8)
[1] 4.65367
8*10.2/qchisq(0.025, 8)
[1] 37.43582
which means that we can be 95% confident that the population variance
lies in the range 4.65 ≤ σ2 ≤ 37.44.
F distribution, and you will often want to use the quantile function qf to look up
critical values of F. You specify, in order, the probability for your one-tailed
test (this will usually be 0.95), then the two degrees of freedom: numerator
first, then denominator. So the 95% value of F with 2 and 18 d.f. is:
qf(0.95, 2, 18)
[1] 3.55457
This is what the density function of F looks like for 2 and 18 d.f. and 6
and 18 d.f.
x<-seq(0.05,4,0.05)
plot(x, df(x,2,18), type="l", ylab="f(x)", xlab="x")
plot(x, df(x,6,18), type="l", ylab="f(x)", xlab="x")
You see that the rule of thumb (critical F = 4) quickly becomes much
too large once the d.f. in the numerator (on the x-axis) is larger than 2. The
lower (solid) line shows the critical values of F when the denominator has
30 d.f. and the upper(dashed) line shows the case in which the denominator
has 10 d.f.
x̄ ± t_{α,(n−1)} s_x̄
where:
x̄ is the sample mean;
t_{α,(n−1)} is the critical value of t at a given level of significance with (n − 1) degrees of freedom;
s_x̄ = s/√n, where s is the sample standard deviation and is taken as an estimate of σ;
n is the sample size.
Example 8.0.1. In order to revise the accident insurance rates for automobiles,
an insurance company wants to assess the damage caused to cars
by accidents at a speed of 15 miles per hour. A sample of 16 new cars was
selected at random and the company crashed each one at the speed of 15
miles per hour. The cars so damaged were repaired and it was found that
the average repair amount was $2500 with a standard deviation of $950.
µ = x̄ ± t_{α,(n−1)} s_x̄
x̄ = 2500, n = 16, s = 950, α = 0.05
Check Ans: 1993.90, 3006.10
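A sketch of this interval in R from the summary statistics:
xbar <- 2500; s <- 950; n <- 16
xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)   # approximately (1993.9, 3006.1)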
Null hypothesis: H0 : µ1 ≤ 30
Alternative hypothesis: H1 : µ1 > 30
Since our computed value of t = 2.45 is higher than the critical value
of t = 2.02, we cannot accept the null hypothesis. Hence, the claim is not
considered correct.
Exercise:
A claim is made that DeKUT University students have an IQ of 120. To
test this claim, a random sample of 10 students was taken and their IQ score
was recorded as follows
105, 110, 120, 125, 100, 130, 120, 115, 125, 130
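One way to examine this claim in R is with a one-sample t-test of µ = 120 (a sketch, not the required written solution):
iq <- c(105, 110, 120, 125, 100, 130, 120, 115, 125, 130)
t.test(iq, mu = 120)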
(ii). The sample size taken from each population is small (n < 30), but the
samples do not have to be equal in size.
(i). The number of degrees of freedom is the sum of the degrees of freedom
for each sample: where n1 is the sample size from population 1, and n2
is the sample size from population 2, the number of degrees of freedom
is
df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2
(ii). The two standard deviations s1 and s2, calculated from the two samples
of sizes n1 and n2 respectively, are pooled together to form a single estimate
(s_p) of the population standard deviation, where s_p is calculated
as:
s_p = √[((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]
where the test statistic is
t = (x̄1 − x̄2) / (s_p √(1/n1 + 1/n2)).
The calculated t-statistic is compared with the critical t-score from the table
at a given level of significance and (n1 + n2 − 2) degrees of freedom, and a
decision is made whether to accept or reject the null hypothesis.
Example 8.0.3. The college administration wants to find out if there is
any significant difference in the average amount of money carried by male
and female students at KeMU Nakuru campus on any given day. A random
sample of 8 male students and 10 female students was selected and the
amount of money they each had was recorded. From each sample, the
sample means and sample standard deviation were found as follows:
Male students Female students
n1 = 8 n2 = 10
x̄1 = £20.5 x̄2 = £17.00
s1 = £2.00 s2 = £1.50
We want to test if there is any significant difference in the average amount
of money carried by male and female students at 5% level of significance.
Solution. The null hypothesis states that there is no significant difference
between the average amounts of cash carried by male and female students
at the college;
Null hypothesis H0 : µ1 = µ2
Alt. hypothesis H1 : µ1 ̸= µ2 (A two tailed test)
where
s_p = √[(7(2.00)² + 9(1.50)²)/16] = 1.74,   so s_p² = 3.01.
Hence,
t = (20.5 − 17) / √(3.01(1/8 + 1/10)) = 3.5/0.823 = 4.25
The critical t-score from the t table for a two-tailed test with α = 0.05 and
d f = (n1 + n2 − 2) = (8 + 10 − 2) = 16 is given as;
t0.05,16 = 2.12
Since our calculated value of t = 4.25 is larger than critical t value of 2.12,
we cannot accept the null hypothesis.
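The same calculation can be sketched in R directly from the summary statistics (no raw data are available):
n1 <- 8; n2 <- 10; xbar1 <- 20.5; xbar2 <- 17; s1 <- 2; s2 <- 1.5
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)   # pooled variance, about 3.02
tstat <- (xbar1 - xbar2) / sqrt(sp2 * (1/n1 + 1/n2))         # about 4.25
tcrit <- qt(0.975, df = n1 + n2 - 2)                         # about 2.12
c(tstat = tstat, tcrit = tcrit)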
Past Examination Papers
(a) Briefly explain the working of the following conditional statements and
loops in R programming, giving the default syntax for each.
(i). repeat()
(ii). while()
(iii). for()
(iv). ifelse()
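For reference, the default syntax of each construct is sketched below (illustrative snippets only, not the exam answer):
i <- 1
repeat {                  # repeat: loops until an explicit break
  if (i > 3) break
  i <- i + 1
}
while (i > 0) i <- i - 1  # while: repeats while the condition is TRUE
for (k in 1:3) print(k)   # for: iterates over the elements of a vector
ifelse(c(1, -2, 3) > 0, "positive", "non-positive")  # ifelse: vectorised conditional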
> i <- 0
> repeat {
+   i <- i + 1
+   if (i == 3) break
+ }
> i
[2 marks]
(f) Using the algorithm with parameter values a = 10, c = 23, m = 59
and seed x0 = 11 to generate 5 random numbers from the U (0, 1). [5
marks]
(a). Write the expected output of the following R code line. [2 marks]
(b). (i) Explain the main advantage of the Polar method compared with the
Box-Muller method for generating pairs of uncorrelated pseudo-random
values from a standard normal distribution.
[2 marks]
(ii) State briefly the method you would use to generate pseudo-random
numbers on a computer for each of the following statistical distribu-
tions:
(ii) Describe the three integers one has to choose in order to form a lin-
ear congruential generator (LCG) and set out a recursive relationship
with subsequent scaling, for generating a sequence of pseudo-random
numbers in the range −1 to 1. [5 marks]
(c). Given a random number U uniformly distributed over [0, 1], give an ex-
pression in terms of U for another variable X whose density function is
f ( x ) = 5 exp(−5x ). Justify the expression. [4 marks]
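A sketch of the inversion idea for this density: F(x) = 1 − exp(−5x), so X = −ln(1 − U)/5, or equivalently −ln(U)/5 since 1 − U is also U(0, 1):

u <- runif(1000)
x <- -log(u) / 5     # exponential with rate 5 by inversion
mean(x)              # should be close to the theoretical mean 1/5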
(b). (i) Show that the random variable X = −0.5 ln U has an exponential
distribution with mean 0.5 where U is uniformly distributed over the
range (0, 1).
[3 marks]
(ii) Explain how the above method can be used to simulate a path of
Poisson process with intensity λ = 2. [4 marks]
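A minimal sketch of the idea: a Poisson process with rate λ = 2 has independent Exp(2) inter-arrival times, each of which can be generated as −0.5 ln U; cumulative sums of these give the event times of a simulated path:

set.seed(1)
u <- runif(20)
inter <- -0.5 * log(u)     # Exp(2) inter-arrival times, mean 0.5
arrival <- cumsum(inter)   # event times of the simulated path
arrival[arrival <= 5]      # events falling in the interval (0, 5]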
(ii) X < 0.9 when X has the uniform distribution between 5 and 9. (1 mark)
(e) (i) Write a short R program using loops to compute the first 100 Fibonacci
numbers.
a_1 = 1, a_2 = 1, a_n = a_{n-1} + a_{n-2}
(3 marks)
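A minimal sketch for part (e)(i); note that from roughly the 79th term onward the values exceed what double precision can represent exactly:

fib <- numeric(100)
fib[1] <- 1
fib[2] <- 1
for (n in 3:100) fib[n] <- fib[n - 1] + fib[n - 2]
fib[1:10]   # first ten Fibonacci numbers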
(ii) Using a for loop write the R code to compute the product of the values
in the vector vec. Print the final product and also the intermediate
result during the iteration of the for loop. (3 marks)
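A sketch for part (ii), using a hypothetical vector vec since none is specified:

vec <- c(2, 5, 1, 4)
prod_val <- 1
for (v in vec) {
  prod_val <- prod_val * v
  print(prod_val)    # intermediate product at each iteration
}
prod_val             # final product (40 for this vec)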
(f) (i) Write a for loop in R to compute the sum of the squares of all integers
from 2 to 100. (2 marks)
Calculate the following probabilities and also write the corresponding R code
to generate the probabilities.
(a) Suppose a random variable X has a Uniform distribution with a = 5 and
b = 9.
(c) The random variable X is normally distributed with mean 79 and variance
144.
(d) Write the R code to simulate the number of accidents for each year for 15
years, when the average rate is 2.8 accidents per year, assuming a Poisson
model for the number of accidents each year. (1 mark)
(e) Write an R function to compute the degree of skewness for any vector of
numbers x.
(4 marks)
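A sketch for part (e), taking the third standardised moment as the measure of skewness (other definitions apply small-sample corrections):

skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / sd(x)^3
}
skewness(rexp(1000))   # roughly 2 for an exponential sample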
(a) Write the lines of R code required to compute the amount of interest to be
paid on a loan of $1000 held for 3 years at an interest rate of 4% compounded
annually.
(3 marks)
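A sketch for part (a), using compound interest A = P(1 + r)^t:

principal <- 1000
rate <- 0.04
years <- 3
amount <- principal * (1 + rate)^years
amount - principal     # interest paid, about 124.86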
(b) Suppose, starting at his 25th birthday, Michael deposits 5,000 at the begin-
ning of every year into a retirement annuity that pays 9% interest per year,
compounded annually. He wants to retire when his annuity first reaches or
exceeds Shs 1 million. Write R code to compute in how many years he will
be able to retire with this plan. (4 marks)
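A sketch for part (b), assuming each deposit is made at the start of the year and interest is credited at the end of the year:

balance <- 0
years <- 0
while (balance < 1e6) {
  balance <- (balance + 5000) * 1.09   # deposit, then a year of 9% interest
  years <- years + 1
}
years                                  # 34 under these assumptions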
(c) Create a vector ‘values’ of length 100 with random values between 1 and
100. Use a for loop and an if statement to print only the values greater than
3. (3 marks)
(d) It is necessary to simulate samples from a distribution with density function
f ( x ) = 6x (1 − x ) 0 < x < 1.
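One option for part (d) is acceptance-rejection: f(x) = 6x(1 − x) is bounded by 1.5 on (0, 1), so a U(0, 1) proposal can serve as the envelope. A minimal sketch (the function name is illustrative):

sim_f <- function(n) {
  out <- numeric(n)
  i <- 0
  while (i < n) {
    y <- runif(1)                        # proposal from U(0, 1)
    u <- runif(1)
    if (u <= 6 * y * (1 - y) / 1.5) {    # accept with probability f(y)/1.5
      i <- i + 1
      out[i] <- y
    }
  }
  out
}
hist(sim_f(2000))   # symmetric hump on (0, 1)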
x <- 3
for (i in 1:3) {
  show(x)
  if (x %% 2 == 1) {
    x <- x * 4
  } else {
    x <- 3/x + 5
  }
}
show(x)
(10 marks)
QUESTION FIVE (20 Marks)
(a) The continuous random variable X has probability density function f ( x ) and
cumulative distribution function F ( x ). Explain carefully how the inversion
method (i.e. the inverse c.d.f. method) of simulation could be used to
simulate observations from this distribution. What restrictions on F ( x ) are
required in order to make this method of simulation work? (4 marks)
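A minimal sketch of the idea: when F is continuous and strictly increasing, X = F^(-1)(U) with U ~ U(0, 1) has distribution function F. The F used below, F(x) = x² on (0, 1), is purely for illustration:

u <- runif(1000)
x <- sqrt(u)   # inverse of F(x) = x^2
hist(x)        # histogram follows the density 2x on (0, 1)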
(b) The following values are a random sample of numbers from a uniform dis-
tribution on the interval (0, 1).
Use these values to generate 4 random variates from each of the following
distributions, carefully explaining the method you use in each case.
(iii) P( X = x ) = (e^(−2) 2^x)/x!, x = 0, 1, · · · . (5 marks)
(i) f (0.5)
(ii) F (2.5)
(c) Using a for-loop compute the product of the values in the vector. Print the
final product, but also print the intermediate result during the iteration of
the for-loop.
(3 marks)
(d) Create a vector ‘values’ of length 10 with random values between 1 and 10.
Use a for-loop and an if-statement to print only the values greater than 3.
(3 marks)
(e) Write the lines of R code required to compute the amount of interest to
be paid on a loan of $25000 held for 3 years at an interest rate of 4%
compounded monthly.
(3 marks)
(f) The continuous random variable X has probability density function f ( x ) and
cumulative distribution function F ( x ). Explain carefully how the inversion
method (i.e. the inverse c.d.f. method) of simulation could be used to
simulate observations from this distribution. What restrictions on F ( x ) are
required in order to make this method of simulation work? (3 marks)
(g) Consider the following function
f ( x ) = k ( x + 1)³ if −1 ≤ x ≤ 1, and 0 otherwise
(i) Find the value of the constant k so that f ( x ) is a well defined proba-
bility density function. (1 mark)
(ii) Use the inverse transformation method to generate three random ob-
servations from f ( x ) by hand. (3 marks)
(3 marks)
Explain the effect of each command line and the graph that is produced. (4 marks)
(c) Let x <- matrix(6:9, nrow=2, byrow=T) and y <- matrix(2:5, nrow=2,
byrow=T) be 2 by 2 matrices. Write the R-output for each command line
when executed.
(iii) > x %*% y
(iv) > t(x) + y
(12 marks)
x <- 3
for (i in 1:3) {
  show(x)
  if (x %% 2 == 1) {
    x <- x * 4
  } else {
    x <- 3/x + 5
  }
}
show(x)
(10 marks)
QUESTION FIVE (20 Marks)
(a) The following numbers are a random sample of real numbers from a uniform
distribution on the range 0 to 1.
Use these values to generate four random variates from each of the following
distributions, explaining carefully the method you use in each case.
(i) P( X = x ) = C(3, x) (1/5)^x (4/5)^(3−x), x = 0, 1, 2, 3. (6 marks)
(ii) f_X( x ) = 2x exp(−x²), x ≥ 0. (6 marks)
(b) A small post office opens at 9.00 a.m. The time (minutes) until the first
customer arrives, and the times (minutes) between the arrivals of any two
consecutive customers thereafter, are all independent exponential random
variables with expected value 2.
The post office has just one server. As customers arrive, they join a queue
to be served. As soon as the server has finished serving one customer, she
starts to serve the first customer in the queue (if there is a queue). If a
customer arrives when no previous customer is being served and there is no
queue, then the server begins to serve that customer immediately on his or
her arrival. The service times (minutes) for individual customers are inde-
pendent uniform random variables on the range 1.5 to 2.5.
Use pairs of the uniform random numbers given in part (i), in the order
given, to simulate the arrival and service times for each of the first two cus-
tomers to arrive at this post office. Justify your method of simulation, and
record the simulated arrival times of these customers and the time at which
the server finishes serving them.
(8 marks)
(i) f (0.5)
(ii) F (2.5)
(c) Using a for-loop compute the product of the values in the vector. Print the
final product, but also print the intermediate result during the iteration of
the for-loop.
(3 marks)
(d) Create a vector ‘values’ of length 10 with random values between 1 and 10.
Use a for-loop and an if-statement to print only the values greater than 3.
(3 marks)
(e) Write the lines of R code required to compute the amount of interest to
be paid on a loan of $25000 held for 3 years at an interest rate of 4%
compounded monthly.
(3 marks)
(f) The continuous random variable X has probability density function f ( x ) and
cumulative distribution function F ( x ). Explain carefully how the inversion
method (i.e. the inverse c.d.f. method) of simulation could be used to
simulate observations from this distribution. What restrictions on F ( x ) are
required in order to make this method of simulation work? (3 marks)
(g) Consider the following function
f ( x ) = k ( x + 1)³ if −1 ≤ x ≤ 1, and 0 otherwise
(i) Find the value of the constant k so that f ( x ) is a well defined proba-
bility density function. (1 mark)
(ii) Use the inverse transformation method to generate three random ob-
servations from f ( x ) by hand. (3 marks)
(3 marks)
Explain the effect of each command line and the graph that is produced. (4 marks)
(c) Let x <- matrix(6:9, nrow=2, byrow=T) and y <- matrix(2:5, nrow=2,
byrow=T). Write the R-output for each command line when executed.
(i) > t(x) + t(y)
(ii) > crossprod(x, y)
(iii) > x %*% y
(iv) > t(x) + y
Step 2 LET X = 0
Step 7 RETURN X.
(i) Study this algorithm and show that it gives the correct probability of
obtaining the value X = 0. (2 marks)
(ii) Use it to generate two values from a Poisson distribution with mean
λ = 2, using the following pseudo-random values from a uniform dis-
tribution on (0, 1): 0.564, 0.505, 0.756, 0.610, 0.046. (8 marks)
Use these values to generate four random variates from each of the following
distributions, explaining carefully the method you use in each case.
(i) P( X = x ) = C(3, x) (1/5)^x (4/5)^(3−x), x = 0, 1, 2, 3. (6 marks)
(ii) f_X( x ) = 2x exp(−x²), x ≥ 0. (6 marks)
Use the Box-Muller formulae to generate five simulated aptitude test marks
in the case where µ = 5000 and σ = 1000, using the following pairs of
pseudo-random numbers from a uniform (0, 1) distribution: (0.558, 0.585)
and (0.080, 0.512).
(4 marks)
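A sketch of the Box-Muller step in R, using Z1 = √(−2 ln u1) cos(2πu2) and Z2 = √(−2 ln u1) sin(2πu2), then X = µ + σZ; each pair of uniforms yields two simulated marks:

box_muller <- function(u1, u2) {
  r <- sqrt(-2 * log(u1))
  c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))
}
z <- c(box_muller(0.558, 0.585), box_muller(0.080, 0.512))
5000 + 1000 * z   # simulated aptitude marks from the two pairs given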
(c) You are given that f ( x ) = (1/9) x2 for 0 ≤ x ≤ 3. You are to simulate three
observations from the distribution using the inversion method. The following
three random numbers were generated from the uniform distribution on [0,1]:
Using the three simulated observations, estimate the mean of the distribu-
tion.
(4 marks)
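For this density, F(x) = x³/27 on [0, 3], so inversion gives X = 3U^(1/3). The three uniform values referred to above are not shown here, so the values below are placeholders only:

u <- c(0.25, 0.50, 0.75)   # hypothetical uniforms in place of the ones given
x <- 3 * u^(1/3)           # simulated observations by inversion
mean(x)                    # estimate of the mean of the distribution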
(ii) List four methods for the generation of random variates. (2 marks)
(b) You are given that f ( x ) = (1/9) x² for 0 ≤ x ≤ 3.
You are to simulate three observations from the distribution using the inver-
sion method. The following three numbers were generated from the uniform
distribution on [0, 1]:
Using the three simulated observations, estimate the mean of the distribu-
tion.
(4 marks)
(c) Write the expected output of the following R code:
(i) z <- 0
    repeat {
      z <- z + 1
      print(z)
      if (z > 5) break()
    }
    z
(2 marks)
(ii) x <- 1
     while (x < 5) { x <- x + 1; if (x == 3) break; print(x) }
(2 marks)
(iii) x <- 1
      while (x < 5) { x <- x + 1; if (x == 3) next; print(x) }
(2 marks)
(c) The number of claims ( X ) from a health insurance policy over a one year
period has the following probabilities:
(i) Describe how you would use the acceptance/rejection method to gen-
erate discrete pseudo-random samples from this distribution, by using
as reference the discrete uniform distribution that has probability 0.2
for each of the values 0, 1, 2, 3 and 4. You may assume that you can
also generate samples from the Bernoulli distribution, i.e., a sample
that is equal to 1 with probability p and 0 with probability 1 − p, for
any specified value of p in the range (0,1). (4 marks)
(d) Create a vector ‘values’ of length 100 with random values between 5 and 20.
Use a for-loop and an if-statement to print only the values greater than 10.
(3 marks)
(e) Compute the amount of interest to be paid on a loan of Ksh.50,000 held for
3 years at an interest rate of 12% compounded monthly. Write the lines of
R code required to compute the loan balance on a monthly basis and display
the same. (5 marks)
To answer these questions, you will need to compute the values for the re-
spective distributions and write down the R commands that you would use
for the same computations.
(a) Suppose the mass of shire horses is assumed to have a normal distribution
with mean 1000kg and standard deviation 50kg. Calculate the probability
that the mass of a shire horse
(b) Suppose that T has the Weibull distribution with parameters α = 0.2 and
β = 10. Find (i) the probability that T takes a value between 20 and 30,
and (ii) the 95th percentile of T. (4 marks)
(c) Suppose that 10% of a certain type of components last more than 600 hours
of operation. For n = 200 components, let X denote the number of those
that last more than 600 hours. Find (approximately):
(i) P( X ≤ 30), (ii) P(15 ≤ X ≤ 25) and (iii) P( X = 25). (6 marks)
(d) Suppose the mean checkout time of a supermarket cashier is three minutes.
Find the probability of a customer checkout being completed by the cashier
in less than two minutes, three minutes and four minutes (i.e. what per-
centage of “waiting times” are less than two, three and four minutes?). (2 marks)
Using the above numbers, generate a random sample from a chi-square distri-
bution with 10 degrees of freedom. (6 marks)
(b)
(i) Let n be an integer and suppose that X1 , X2 · · · , Xn are independent random
variables each having an exponential distribution with parameter λ. Show
that Z = X1 + X2 + · · · + Xn has a Gamma distribution with parameter n
and λ.
(2 marks)
(ii) By using this result, generate a random sample from a Gamma distribution
with mean 30 and variance 300 using the following 5 digit pseudo-random
numbers.
63293 43937 08513
(4 marks)
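A sketch of (b)(ii): matching the mean n/λ = 30 and variance n/λ² = 300 gives n = 3 and λ = 0.1, and each 5-digit number is treated as a U(0, 1) value by placing a decimal point in front (an assumption about the intended convention):

u <- c(0.63293, 0.43937, 0.08513)
x <- -log(u) / 0.1   # three Exp(0.1) variates by inversion
sum(x)               # one simulated value from the Gamma(3, 0.1) distribution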
(c) You are given a stream of independent uniform [0,1] random variables U1 , U2 , U3 · · · .
Describe how to use these random variables to generate random variates with
the following distributions:
(i) The Weibull distribution with probability density function
f ( x ) = 30x² exp(−10x³) if x > 0, and 0 otherwise
(4 marks)
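For this Weibull density, F(x) = 1 − exp(−10x³), so inversion gives X = (−ln(1 − U)/10)^(1/3); a minimal sketch:

u <- runif(1000)
x <- (-log(1 - u) / 10)^(1/3)
mean(x)   # compare with the theoretical mean 10^(-1/3) * gamma(4/3)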
(ii) You are to simulate four observations from a binomial distribution with two
trials and probability of success 0.30. The following random numbers are
generated from the uniform distribution on [0, 1]: