Statistical Programming: Simulation and Number Generator

This document outlines a course on statistical programming in R. It covers generating random numbers from various distributions, Monte Carlo simulation methods, mathematical functions, sampling distributions, confidence intervals, and hypothesis testing. Random number generation includes uniform, discrete, and continuous distributions. Key algorithms discussed are the inverse transform method, acceptance-rejection method, and Box-Muller algorithm for normal variates. The course aims to teach students how to write efficient R programs to import, manipulate, analyze data using descriptive and inferential statistics, and fit statistical models through programming assignments, exams, and recommended textbooks.


Contents

1 Generating Random Numbers
  1.1 Random number generators
  1.2 Uniform random number generation
  1.3 Linear Congruential Generators

2 General approach to generating random variates
  2.1 Introduction
  2.2 The Inverse Transformation Method
    2.2.1 Inverse Transform method for Discrete variates
    2.2.2 Exercise Questions
  2.3 Acceptance-Rejection method
  2.4 Generating Normal Variates
    2.4.1 Box-Muller algorithm
    2.4.2 Polar Algorithm
    2.4.3 Approximate Method
    2.4.4 Disadvantages of using truly random, as opposed to pseudo-random, numbers
    2.4.5 Revision Quiz

3 Generation of random variates from a specified distribution using R
  3.1 Introduction
  3.2 Probability Distributions
    3.2.1 Discrete Distributions
    3.2.2 Poisson distribution using R
    3.2.3 Normal distribution using R

4 Monte Carlo Methods and Importance Sampling
  4.0.1 Introduction
  4.0.2 Classical Monte Carlo Integration
  4.0.3 Simulation Procedures

5 Mathematical Functions
  5.1 Introduction
    5.1.1 Logarithmic function
    5.1.2 Trigonometric functions
    5.1.3 Power laws
    5.1.4 Polynomial functions
  5.2 Probability Functions
    5.2.1 Continuous probability distributions

6 Sampling Distributions
  6.1 Chi-Square Distributions
  6.2 Student's t Distributions
  6.3 Fisher's F Distributions

7 Confidence Intervals
  7.0.1 Confidence Intervals for the Mean (Known Variance)
  7.0.2 Confidence Intervals for the Mean (Unknown Variance)

8 Other distributions used in hypothesis testing
  8.0.1 Chi-square distribution
  8.0.2 Fisher's F distribution

9 Past Examination Papers

STA 2305: Statistical Programming I


By Omari C.O
Course Objective: At the end of this course, students should be able to
program in R the kinds of mathematics that find most frequent application
in scientific work and statistical modelling.

Learning Objectives:

1. Be able to write efficient transparent programs in R

2. Produce clear and effective graphical descriptions of data

3. Import, export, and manipulate datasets

4. Analyze data using descriptive and inferential statistics

5. Analyze data through model fitting

Course Outline: Basic knowledge of high-level programming languages
such as C++, S-Plus and R. Computer arithmetic. Algorithms for the mean
and standard deviation. Error analysis. Pseudo-random number generators.
Generation of random variates: discrete and continuous distributions.
Monte Carlo simulation. Power of a test, confidence intervals. Mathematical
functions, classical tests.

Pre-Requisites:

1. STA 2202 Introduction to computer interactive statistics

2. ICS 2108 Programming Methodology

Co-Requisite: STA 2201 Probability and Statistics III

Further Reading Books

There is no required textbook for this course. However, the following books
are recommended as references:

1. Crawley, Statistics: An Introduction Using R, John Wiley & Sons, 2010.

2. John Chambers, Software for Data Analysis: Programming with R,
Springer, 2008.

3. Brian Everitt and Torsten Hothorn, A Handbook of Statistical Analyses
Using R, Chapman & Hall/CRC, 2006.

4. Owen Jones, Robert Maillardet and Andrew Robinson, Introduction to
Scientific Programming and Simulation Using R, Chapman & Hall/CRC,
2009.

Homework
There will be approximately 12 homework assignments. You are encouraged
to work in groups, but remember to submit your own unique work.
For each assignment, turn in a hard copy of your R source file along with a
brief write-up of the solutions (do not submit raw output). Also submit via
e-mail (cyomari@yahoo.com) a copy of your R source file. No late homework
will be accepted unless it is due to an excused absence.

Homework assignments will be made and collected throughout the semester.
You may work together on these assignments, but you must formulate and
write your own solutions. No credit will be given for answers without
supporting work.

• All homework is due at the beginning of class.

• Late homework will not be accepted, except for acts of God, e.g. medical
emergencies or a death in the family.

• On R assignments, you must turn in all input programs and R output.

• Annotate your output (either with a word processor or by hand) so
that the grader can determine that you have accomplished the goals
of the assignment.

Honesty: Academic honesty is fundamental to the activities and principles
of a university. All members of the academic community must be confident
that each person's work has been responsibly and honorably acquired,
developed and presented. Any effort to gain an advantage not given to all
students is dishonest, whether or not the effort is successful. The academic
community regards academic dishonesty as an extremely serious matter,
with serious consequences that range from probation to expulsion. When
in doubt about plagiarism, paraphrasing, quoting, or collaboration, consult
the course instructor.

Cell Phones: It is your responsibility as a student of this university to
have your cell phones, iPods, Kindles, etc. switched OFF before entering the
classroom. If a cell phone rings, you text, or you use electronic devices during
a lecture or exam, you will suffer an automatic 10% reduction of your
final grade. Only one warning will be given. Using cell phones in class is
not proper university conduct.

Computer: During the lecture or exams you may have the SAS or R
windows open, as well as a web browser directed to the class website ONLY.
The time in the computer lab is reserved for learning and following along
with the lecture. Opening web-based e-mail, or websites other than the class
website, will result in an automatic 10% reduction of your final grade.
Only one warning will be given.

E-mail protocol: I am happy to receive queries by e-mail. However, please
note that the subject line must contain STA 2305; otherwise the e-mail will
be automatically categorized as spam and deleted. Also, if many messages
on the same theme are received, I will respond in class and will not make
individual replies. Restrict messages to questions about course content and
requests for an appointment only. If you have requests for special
consideration, questions about your standing in the course, or other matters
requiring discussion, you must see me personally.
Chapter 1

Generating Random Numbers

1.1 Random number generators


This lecture deals with the execution of random experiments via the com-
puter, also called stochastic simulation. In a typical simulation, ran-
domness is introduced into simulation models via independent uniformly
distributed random variables, also called random numbers. The basic
building block of any simulation study is the ability to generate random
numbers that are distributed according to some distribution function FX .
Given the complexity of certain systems, the distribution function FX will
not always be explicitly known.

In this lecture, the focus will be on the generation of random numbers
uniformly distributed on the interval (0, 1). The aim is to generate a stream
u1, u2, ... of iid U(0, 1) variables. The algorithms used in practical
implementations are all deterministic in nature, typically using a recursion,
and can therefore at best only mimic the behaviour of a truly U(0, 1)
sequence of random numbers. For this reason we call the output of a
generator a sequence of pseudo-random numbers. The generation of an
iid U(0, 1) sequence on a computer is a central theme in the area of
simulation, as the simulation of random variables from general distributions
requires the ability to produce U(0, 1) random numbers.

With the advent of high-speed personal computers (PCs), simulation became
one of the most valuable tools of the modeling profession.


One of the reasons for the widespread use of simulation methods is the fact
that the vast majority of practically important problems are not amenable to
analytical solutions. Another reason is that it is relatively easy to master
(at least) the basic steps of simulation techniques. Without any doubt,
the power of simulation methods increases enormously when simulations are
combined with analytical calculations.

The following standard terminology is used throughout this lecture.

A random variate – a realization of a random variable generated by a
computer.

A random number – a random variate generated from the distribution U(0, 1)
(the uniform distribution on the interval [0, 1]).

Upper-case letters (like X, Y, Z) are used to denote random variables. Lower-case
letters (like x1, y10, z), with or without subscripts, are mostly used to
denote realisations of random variables.

A simulation of any system or process in which there are inherently random
components requires a method of generating or obtaining numbers that
are random, in some sense. For example, the queueing and inventory models
required inter-arrival times, service times, demand sizes, etc., that were
“drawn” from some specified distributions, such as exponential or Erlang.
So as to avoid speaking of “generating random variables,” which would not
be strictly correct since a random variable is defined in mathematical
probability theory as a function satisfying certain conditions, we will adopt more
precise terminology and speak of “generating random variates”.

This entire lecture is devoted to methods of generating random variates
from the uniform distribution on the interval [0, 1]; this distribution is
denoted by U(0, 1). Random variates generated from the U(0, 1) distribution
will be called random numbers. The prominent role of the U(0, 1)
distribution stems from the fact that random variates from all other
distributions (normal, gamma, binomial, etc.) and realizations of various
random processes (e.g., a nonstationary Poisson process) can be obtained by
transforming U(0, 1) random numbers in a way determined by the desired
distribution or process.

1.2 Uniform random number generation


In the early days of simulation, randomness was generated by manual tech-
niques such as coin flipping, dice rolling, card shuffling, and roulette spin-
ning. It was believed that only mechanical devices could yield “truly” ran-
dom numbers. Although mechanical devices are still widely used in gambling
and lotteries, these methods were abandoned by the computer-simulation
community for several reasons:

(a) mechanical methods were too slow for general use;

(b) the generated sequences cannot be reproduced; and

(c) it has been found that the generated numbers exhibit both bias and
dependence.

Most of today’s random number generators are not based on physical
devices, but on simple algorithms that can be easily implemented on a
computer, are fast, require little storage space, and can readily reproduce a
given sequence of random numbers.

Importantly, a good random number generator captures all the important
statistical properties of a truly random sequence, even though the sequence is
generated by a deterministic algorithm.

Producing a number by some algorithm takes away some of the randomness
that it should possess. A truly “random” number or occurrence can
be produced only by some physical phenomenon, such as white noise. For
this reason, numbers produced by algorithms are correctly referred to as
pseudo-random numbers.

We are looking to generate random numbers on the interval [0, 1].

Properties of Uniformly Distributed Numbers

A good arithmetic random-number generator should possess the following
properties:

1. Above all, the numbers generated should appear to be distributed
uniformly on [0, 1] and should not exhibit any correlation with each
other; otherwise, the simulation results may be completely invalid.

2. From a practical standpoint, we would naturally like the generator to
be fast and avoid the need for a lot of storage.

3. We would like to be able to reproduce a given stream of random
numbers exactly, for at least two reasons. First, this can sometimes make
debugging or verification of the computer program easier. More
important, we might want to use identical random numbers in simulating
different systems in order to obtain a more precise comparison.

4. We would like the generator to be portable, i.e., to produce the same
sequence of random numbers (at least up to machine accuracy) for all
standard compilers and computers.

5. The generator should be able to produce a different set of random
numbers and to reproduce a given series of numbers depending on where it
starts the sequence.

6. The method should not degenerate to repeatedly produce a constant
value.

7. The generator should not produce a zero.

Several methods have been developed to generate sequences of pseudo-random
numbers. The first three methods (Mid-square, Mid-product and Fibonacci)
are of historical significance and have limited, undesirable characteristics.

The most successful computer process for generating pseudo-random numbers
is called a linear congruential generator (LCG). This generates random
numbers using an initial integer value called a seed and a recursive
formula.

The commonly used computer packages have pseudo-random functions built
in, and the statistical and actuarial tables include pseudo-random numbers
from the U(0, 1) and N(0, 1) distributions.

1.3 Linear Congruential Generators


Many random-number generators in use today are linear congruential
generators (LCGs), introduced by Lehmer (1951).

To use an LCG we need to choose three positive integers:

a, the multiplier;
c, the increment;
m, the modulus, where m > a and m > c.

To obtain the desired sequence of random numbers u1, u2, ..., uN, first
generate a sequence of integers x1, x2, ..., xN in the range 0, 1, 2, ..., m − 1,
starting from an initial value x0, known as the seed.

The sequence of integers x1, x2, ... is generated using the recursive
formula:

    xi = (a xi−1 + c) (mod m),  i = 1, 2, ..., N    (1.1)

where m (the modulus), a (the multiplier), c (the increment) and x0 (the
seed or starting value) are nonnegative integers. Thus, Eq. (1.1) says that to
obtain xi, divide a xi−1 + c by m and let xi be the remainder of this division.
Therefore 0 ≤ xi ≤ m − 1, and to obtain the desired random numbers Ui
(for i = 1, 2, ...) on [0, 1], we let Ui = xi/m. In addition to nonnegativity,
the integers m, a, c and x0 should satisfy 0 < m, a < m, c < m and x0 < m.

This method generates random numbers on the interval [0, 1) rather than
[0, 1]. (When writing ranges, mathematicians use square brackets to indicate
that the endpoint is included in the range and round brackets to indicate
that it is not, so these ranges correspond to 0 ≤ u < 1 and 0 ≤ u ≤ 1
respectively.)

In fact, it is clear that the method can only give numbers of the form i/m
where i = 0, 1, 2, ..., m − 1. In practice this doesn't matter, provided m is
large enough, so we won't distinguish between the two intervals in this chapter.
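The recursion in Eq. (1.1) is short enough to implement directly. The R function below is a minimal sketch of our own (the name lcg is not from the text); called with the parameters of Example 1.3.1 below (a = 11, c = 37, m = 100, seed 85) it reproduces the sequence 0.72, 0.29, 0.56, 0.53, 0.20 derived there.

```r
# Minimal LCG sketch: x_i = (a*x_{i-1} + c) mod m, scaled to u_i = x_i/m.
lcg <- function(n, a, c, m, seed) {
  u <- numeric(n)
  x <- seed
  for (i in 1:n) {
    x <- (a * x + c) %% m   # the recursion of Eq. (1.1)
    u[i] <- x / m           # scale the integer to [0, 1)
  }
  u
}

lcg(5, 11, 37, 100, 85)   # 0.72 0.29 0.56 0.53 0.20
```

In practice one would use R's built-in generators (runif) rather than a hand-rolled LCG; the sketch only illustrates the arithmetic.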

If c = 0, then the procedure is called a multiplicative congruential
generator. If c ≠ 0, then it is called a mixed generator. Surprisingly,
mixed generators have not performed well in practice. Hence, we shall only
consider the multiplicative congruential generator:

    xi = (a xi−1) (mod m),  i = 1, 2, 3, ...
Two objections could be raised against LCGs. The first objection is one
common to all (pseudo) random-number generators, namely, that the xi's
defined by Eq. (1.1) are not really random at all. In fact, one can show by
mathematical induction that for i = 1, 2, ...

    xi = ( a^i x0 + c (a^i − 1) / (a − 1) ) (mod m)

so that every xi is completely determined by m, a, c, and x0. However, by
careful choice of these four parameters we try to induce behaviour in the xi's
that makes the corresponding Ui's appear to be iid U(0, 1) random variates
when subjected to a variety of tests.

The second objection to LCGs might be that the Ui's can take on only
the rational values 0, 1/m, 2/m, ..., (m − 1)/m; in fact, the Ui's might
actually take on only a fraction of these values, depending on the specification
of the constants m, a, c, and x0, as well as on the nature of the floating-point
division by m. Thus there is no possibility of getting a value of Ui
between, say, 0.1/m and 0.9/m, whereas this should occur with probability
0.8/m > 0. As we shall see, the modulus m is usually chosen to be very
large, say 10^9 or more, so that the points in [0, 1] where the Ui's can fall are
very dense; for m ≥ 10^9, there are at least a billion possible values.

Example 1.3.1. Use this algorithm with parameter values a = 11, c = 37,
m = 100 and seed x0 = 85 to generate 5 random numbers from the U (0, 1)
distribution. How suitable do you think this method would be for generating
random numbers?

Solution. The first few numbers in the sequence are calculated as follows:

x1 = (11 × 85 + 37) mod 100 = 972 mod 100 = 72
x2 = (11 × 72 + 37) mod 100 = 829 mod 100 = 29
x3 = (11 × 29 + 37) mod 100 = 356 mod 100 = 56
x4 = (11 × 56 + 37) mod 100 = 653 mod 100 = 53
x5 = (11 × 53 + 37) mod 100 = 620 mod 100 = 20

These are then divided by m to generate the random numbers, which are:
0.72, 0.29, 0.56, 0.53, 0.20, ...

To see how suitable this sequence is for practical use, we need to see how
many numbers are generated and whether these appear to be random.

In fact this sequence only generates 50 different numbers out of the 100
in the range 0 to 99 because it starts repeating after x50 = x0 = 85.

We also need to see how “random” these numbers are.



Example 1.3.2. Consider the LCG defined by m = 16, a = 5, c = 3
and x0 = 7. The table below gives xi and ui (to three decimal places) for
i = 1, 2, ..., 19. Note that x17 = x1 = 6, x18 = x2 = 1, and so on. That
is, from i = 17 through 32, we shall obtain exactly the same values of xi
(and hence ui) that we did from i = 1 through 16, and in exactly the same
order. (We do not seriously suggest that anyone use this generator, since m
is so small; it only illustrates the arithmetic of LCGs.)

i xi ui i xi ui i xi ui i xi ui
0 7 – 5 10 0.625 10 9 0.563 15 4 0.250
1 6 0.375 6 5 0.313 11 0 0.000 16 7 0.438
2 1 0.063 7 12 0.750 12 3 0.188 17 6 0.375
3 8 0.500 8 15 0.938 13 2 0.125 18 1 0.063
4 11 0.688 9 14 0.875 14 13 0.813 19 8 0.500

The LCG xi = (5xi−1 + 3) (mod 16) with x0 = 7

The “looping” behaviour of the LCG in the example above is inevitable.
By the definition in Eq. (1.1), whenever xi takes on a value it has had
previously, exactly the same sequence of values is generated, and this cycle
repeats itself endlessly. The length of a cycle is called the period of a
generator. For LCGs, xi depends only on the previous integer xi−1, and since
0 ≤ xi ≤ m − 1, it is clear that the period is at most m; if it is in fact m,
the LCG is said to have full period. (The LCG in the example above has full
period.) Clearly, if a generator has full period, any choice of the initial seed
x0 from {0, 1, ..., m − 1} will produce the entire cycle in the same order. If,
however, a generator has less than full period, the cycle length could in fact
depend on the particular value of x0 chosen, in which case we should really
refer to the period of the seed for this generator.

Since large-scale simulation projects can use millions of random numbers, it
is manifestly desirable to have LCGs with long periods.
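Such period claims can be checked empirically by iterating the recursion until the seed recurs. The helper below is a sketch of our own (not part of the course notes); it confirms the full period 16 of Example 1.3.2 and the period 50 quoted in the solution of Example 1.3.1.

```r
# Count the steps an LCG takes to return to its seed state (the period
# through that seed), or NA if the seed never recurs within m steps.
lcg_period <- function(a, c, m, seed) {
  x <- (a * seed + c) %% m
  steps <- 1
  while (x != seed && steps <= m) {
    x <- (a * x + c) %% m
    steps <- steps + 1
  }
  if (x == seed) steps else NA
}

lcg_period(5, 3, 16, 7)      # Example 1.3.2: full period, 16
lcg_period(11, 37, 100, 85)  # Example 1.3.1: repeats after 50 steps
```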

Example 1.3.3. Let us consider a simple multiplicative example with a = 5
and m = 17. Starting from a suitable seed x0, the following sequence of
pseudo-random numbers is produced:

5, 8, 6, 13, 14, 2, 10, 16, 12, 9, 11, 4, 3, 15, 7, 1, ...

The sequence begins to repeat after 16 random numbers. Note that every
number between 1 and 16 is in the sequence, but neither zero nor 17 ever
appears, since the number zero would repeat forever. This particular choice
of a, m and x0 yields a period of 16. The numbers “appear” to occur at
random, but they obviously repeat this sequence forever.

Question 1.3.1. Use the algorithm with parameter values a = 10, c = 23,
m = 59 and seed x0 = 11 to generate 3 random numbers from the U (0, 1)
distribution.

Question 1.3.2. For the following multiplicative LCGs, compute zi for


enough values of i ≥ 1 to cover the entire cycle:

zi = (11zi−1 )( mod 16), z0 = 1

Question 1.3.3. Without actually computing any zi ’s, determine which of


the following mixed LCGs have full period

(a). zi = (13zi−1 + 13)( mod 16)

(b). zi = (zi−1 + 12)( mod 13)

Question 1.3.4. What are the disadvantages of using truly random numbers,
as opposed to pseudo-random numbers, in Monte Carlo simulations?

Answer.

• Truly random numbers are generated from physical processes. This
necessitates a hardware attachment to the computer, which is inconvenient.

• Another drawback is that these numbers cannot be reproduced by
re-running the same process.

• They are slow to generate for large-scale simulations.

Question 1.3.5. Describe the three integers one has to choose in order to
form a linear congruential generator (LCG) and set out a recursive
relationship, with subsequent scaling, for generating a sequence of pseudo-random
numbers in the range −1 to 1.

Answer. The three integers to be chosen are:

• The multiplier – denoted here by a.

• The increment – denoted here by c (it is often set to 0 to speed up the
generation process).

• The modulus – denoted here by m, where m > a and m > c. The
generator will produce a series of pseudo-random integers in the range
0 to m − 1, so the modulus is usually set as a large number.

The recursive relationship is xn+1 = (a xn + c) mod m for n = 0, 1, 2, ...

We then set Xn = 2xn/m − 1 for n = 1, 2, ..., which rescales the output to
the range −1 to 1.
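The recursion and scaling just described can be sketched in R as follows (our own code; with the parameters of Example 1.3.1, the values 0.72, 0.29, 0.56, 0.53, 0.20 become 0.44, −0.42, 0.12, 0.06, −0.60):

```r
# LCG output rescaled from {0, ..., m-1} to the range -1 to 1
# via the subsequent scaling X_n = 2*x_n/m - 1.
lcg_signed <- function(n, a, c, m, seed) {
  out <- numeric(n)
  x <- seed
  for (i in 1:n) {
    x <- (a * x + c) %% m
    out[i] <- 2 * x / m - 1
  }
  out
}

lcg_signed(5, 11, 37, 100, 85)  # 0.44 -0.42 0.12 0.06 -0.60
```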

Lab 3: Generating Random Numbers

By Omari C.O

1. Use R, or some other programming system, to write a program to
implement a generator using a multiplicative congruential method with
m = 2^13 − 1 and a = 17. Generate 500 numbers xi. Compute the
correlation of the pairs of successive numbers xi+1 and xi. Plot the
pairs. On how many lines do the points lie? Now, let a = 85. Generate
500 numbers, compute the correlation of the pairs, and plot them.
Now, look at the pairs xi+2 and xi. Compute their correlation.

2. Write an R function to use a multiplicative congruential method with
m = 2^31 − 1 and a = 16807 (the “minimal standard”).

3. Write an R function to use a multiplicative congruential method with
m = 2^31 − 1 and a = 950706376.

Answer. (The code below illustrates the method with a = 171 and
m = 30269 rather than the parameters asked for in the questions.)

random.number <- numeric(10)  # this will store the pseudorandom output
random.seed <- 27218
for (j in 1:10) {
  random.seed <- (171 * random.seed) %% 30269
  random.number[j] <- random.seed / 30269
}
random.number
Chapter 2

General approach to generating random variates

2.1 Introduction
A simulation that has any random aspect at all must involve sampling, or
generating, random variates from probability distributions. These
distributions are often specified as a result of fitting some appropriate
distributional form, e.g., exponential, gamma, or Poisson, to observed data.
In this lecture we assume that a distribution has already been specified somehow
(including the values of its parameters), and we address the issue of how to
generate random variates from it in order to run the simulation model.
For example, the queueing-type model required generation of inter-arrival
and service times to drive the simulation through time, and the inventory
model needed randomly generated demand sizes at the times when a demand
occurred.

There are many techniques for generating random variates, and the
particular algorithm used must, of course, depend on the distribution from which
we wish to generate; however, nearly all these techniques can be classified
according to their theoretical basis.

The main general methods for the generation of random variates are:

1. Inverse transform method,

2. Acceptance-rejection method.


Special algorithms exist for the generation of random variates from important
distributions. For instance, the Box-Muller algorithm and the polar
algorithm are used for generation of random variates from the standard
normal distribution.
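As a preview of Section 2.4, the Box-Muller algorithm can be sketched in R as follows (our own illustration, not the course's code): each pair of independent U(0,1) numbers (u1, u2) is transformed into a pair of independent N(0,1) variates.

```r
# Box-Muller sketch: z1 = r*cos(2*pi*u2), z2 = r*sin(2*pi*u2),
# where r = sqrt(-2*log(u1)).
box_muller <- function(n) {
  m <- ceiling(n / 2)       # number of (u1, u2) pairs needed
  u1 <- runif(m)
  u2 <- runif(m)
  r <- sqrt(-2 * log(u1))
  z <- c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))
  z[1:n]                    # trim to exactly n variates
}
```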

In this lecture it is always assumed that a sequence of random numbers
is available. Many standard computer programs are capable of generating
such random numbers, in most cases using an LCG.

2.2 The Inverse Transformation Method


In this section we discuss a general method for generating one-dimensional
random variables from a prescribed distribution, namely the inverse-transform
method. The inverse transformation technique deals with the cumulative
distribution function, F(x), of the distribution to be simulated.

Let X be a random variable with cdf F(x). Since F is a nondecreasing
function, the inverse function F^{-1}(y) may be defined as

    F^{-1}(y) = inf{x : F(x) ≥ y},  0 ≤ y ≤ 1.    (2.1)

If a random variable U is uniformly distributed over the interval [0, 1], then
the random variable X = F^{-1}(U) has the distribution function F(x), since:

    P(X ≤ x) = P[F^{-1}(U) ≤ x] = P[U ≤ F(x)] = F(x)

The last step follows because if U ∼ U(0, 1) then P(U ≤ u) = u for any
value of u in the range (0, 1). (Recall that ∼ is read “is distributed as”.)

Thus, to generate a random variate x with cdf F, draw U ∼ U(0, 1) and
set X = F^{-1}(U). The inverse-transform method is given by the following
two-step algorithm:

(1). Generate a random variate U from U(0, 1).

(2). Return x = F^{-1}(u), which is the variate desired from the given
distribution.

(Note that when programmers are writing a subroutine to calculate a special
function, they use the RETURN command to give the calculated value.)

The difficulty with this technique lies in finding the inverse transform such
that F^{-1}(U) = x. If this inverse function can be established, we only need to
generate various uniform random numbers and apply the inverse function
to them to obtain random deviates from the desired distribution.
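As a concrete continuous example (ours, not from the text): for the exponential distribution with rate λ, F(x) = 1 − e^(−λx), so F^{-1}(u) = −log(1 − u)/λ, and the two-step algorithm becomes:

```r
# Exp(rate) variates by the inverse-transform method.
rexp_inv <- function(n, rate = 1) {
  u <- runif(n)        # step (1): U ~ U(0,1)
  -log(1 - u) / rate   # step (2): X = F^{-1}(U)
}
```

Since 1 − U is itself U(0,1), the shortcut X = −log(U)/λ works equally well.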

2.2.1 Inverse Transform method for Discrete variates


(Drawing From a Discrete Distribution) Let X be a discrete random
variable with P(X = xi) = pi, i = 1, 2, ..., with ∑i pi = 1. The cdf of X is
given by F(x) = ∑_{i: xi ≤ x} pi.

The algorithm for generating a random variable from F can thus be written
as follows (Generation of a Discrete Random Variable):

1. Generate U ∼ U(0, 1).

2. Find the smallest positive integer k such that U ≤ F(xk), and return
X = xk.

Example 2.2.1. Describe how you would generate random variates from a
Binomial(3, 0.6) distribution X, using a sequence of random numbers ui .

Solution. X can only take the values 0, 1, 2 or 3. The distribution function
is given by:

    F(0) = 0.064,  F(1) = 0.352,  F(2) = 0.784,  F(3) = 1

Suppose you have a random number u; then return:

    x = 0  if 0 ≤ u ≤ 0.064
    x = 1  if 0.064 < u ≤ 0.352
    x = 2  if 0.352 < u ≤ 0.784
    x = 3  if 0.784 < u ≤ 1
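The table lookup above can be written compactly in R (our own sketch): pbinom supplies the cdf values F(0), ..., F(3), and findInterval counts how many of them lie at or below u, which (ties at the cut points aside) is exactly the required variate.

```r
# Inverse transform for Binomial(3, 0.6) via the cdf table F(0..3).
rbin3 <- function(n) {
  F <- pbinom(0:3, size = 3, prob = 0.6)  # 0.064, 0.352, 0.784, 1
  u <- runif(n)
  findInterval(u, F)  # smallest k with u <= F(k), up to boundary ties
}
```

The built-in sample(0:3, n, replace = TRUE, prob = dbinom(0:3, 3, 0.6)) achieves the same thing directly.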

It should be clear that the probabilities of each outcome are correct in the
above example. We will now discuss the general situation.

Let X be a discrete random variable which can take only the values
x1, x2, ..., xN, where x1 < x2 < ... < xN. The distribution of the random
variable X is given by:

    P(X = xi) = pi,  for i = 1, 2, ..., N;

where pi > 0 for all i, and p1 + p2 + · · · + pN = 1.

The distribution function of the random variable X is therefore:

    F(x) = P(X ≤ x) = ∑_{i: xi ≤ x} pi

where the summation runs over the range of indexes i such that xi ≤ x. If
x < x1, the sum is empty and F(x) = 0.

The following algorithm generates random variates x from this distribution:

1. Generate u from U(0, 1).

2. Find the positive integer i such that F(xi−1) < u ≤ F(xi).

3. Return x = xi.

It is obvious that the algorithm can return only variates x from the range
x1, x2, ..., xN. The probability that a particular value x = xn is returned is
given by:

    P[value returned is xn] = P[F(xn−1) < U ≤ F(xn)] = F(xn) − F(xn−1) = pn

as required.
Example 2.2.2. Consider the following function:
(
k ( x + 1)3 , if − 1 ≤ x ≤ 1
f (x) =
0 Otherwise.

(a) Find the value of the constant k, so that f ( x ) is a well defined proba-
bility density function.
(b) Use the inverse transformation method to generate three random ob-
servations from f ( x ) by hand.
(c) Write an R function to generate 1000 observations from f ( x ).
Note: If needed, use the following uniform random numbers in the
given order:
r0 = 0.0257, r1 = 0.2331, r2 = 0.3452, r3 = 0.2589.
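A sketch of part (c) in R, assuming parts (a) and (b) yield k = 1/4, hence F(x) = (x + 1)^4/16 and F^{-1}(u) = 2u^{1/4} − 1 (the name `rfx` is our own; `rf` itself is already taken by base R's F-distribution sampler):

```r
# Part (c): f(x) = (x + 1)^3 / 4 on [-1, 1] (assuming k = 1/4),
# so F(x) = (x + 1)^4 / 16 and F^{-1}(u) = 2 * u^(1/4) - 1.
rfx <- function(n) {
  u <- runif(n)
  2 * u^(1/4) - 1
}

set.seed(42)
obs <- rfx(1000)
range(obs)   # all values lie in [-1, 1]
hist(obs, breaks = 30, freq = FALSE)
```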

Example 2.2.3. Using the first 5 numbers in the first column of random
U (0, 1) numbers given in the Tables, generate 5 values from a Poisson(10)
distribution.

Bernoulli Distribution

If X ∼ Ber( p), its pmf is of the form

f(x) = p^x (1 − p)^{1−x}, x = 0, 1,

where p is the success probability. Applying the inverse-transform method,


one readily obtains the following generation algorithm: (Generation of a
Bernoulli Random Variable)

1. Generate U ∼ U (0, 1)

2. If U ≤ p, return X = 1; otherwise return X = 0.
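In R the Bernoulli algorithm is a one-liner (`rbern` is our own name; base R would use `rbinom(n, 1, p)`):

```r
# Bernoulli(p) by inverse transform: return 1 if U <= p, else 0.
rbern <- function(n, p) as.integer(runif(n) <= p)

set.seed(1)
x <- rbern(10000, p = 0.3)
mean(x)   # close to p = 0.3
```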

Binomial Distribution

If X ∼ Bin(n, p) then its pmf is of the form


 
f(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, ..., n,

where C(n, x) = n!/(x!(n − x)!) is the binomial coefficient.

Recall that a binomial random variable X can be viewed as the total number of successes in n independent Bernoulli experiments, each with success probability p. Denoting the result of the ith trial by X_i = 1 (success) or X_i = 0 (failure), we can write X = X_1 + X_2 + ... + X_n with the {X_i} iid Ber(p) random variables. The simplest generation algorithm can thus be written as follows: (Generation of a Binomial Random Variable)

1. Generate iid random variables X1 , · · · , Xn from Ber( p).

2. Return X = ∑_{i=1}^{n} X_i as a random variable from Bin(n, p).

It is worthwhile to note that if Y ∼ Bin(n, p), then n − Y ∼ Bin(n, 1 − p). Hence, to enhance efficiency (so that the success probability used is always at most 1/2), one may elect to generate X from Bin(n, p) according to:

X = Y_1, where Y_1 ∼ Bin(n, p), if p ≤ 1/2;
X = n − Y_2, where Y_2 ∼ Bin(n, 1 − p), if p > 1/2.
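The sum-of-Bernoullis algorithm can be sketched in R as follows (`rbinom_sum` is our own name; base R's `rbinom` implements a faster method):

```r
# Binomial(n, p) as a sum of n iid Bernoulli(p) variables.
rbinom_sum <- function(size, n, p) {
  replicate(size, sum(runif(n) <= p))
}

set.seed(1)
x <- rbinom_sum(10000, n = 10, p = 0.3)
mean(x)   # close to n * p = 3
```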

Geometric Distribution

If X ∼ G ( p), then its pmf is of the form

f(x) = p(1 − p)^{x−1}, x = 1, 2, ...

The random variable X can be interpreted as the number of trials required


until the first success occurs, in a series of independent Bernoulli trials with
success parameter p. Note that P(X > m) = (1 − p)^m.

We now present an algorithm which is based on the relationship between


the exponential and geometric distributions. Let Y ∼ Exp(λ), with λ such that q = 1 − p = e^{−λ}. Then X = ⌊Y⌋ + 1 has a G(p) distribution, where ⌊·⌋ denotes the integer part. Namely, for x = 1, 2, ...,

P(X > x) = P(⌊Y⌋ > x − 1) = P(Y ≥ x) = e^{−λx} = (1 − p)^x.

Hence, to generate a random variable from G(p), we first generate a random variable from the exponential distribution with λ = −ln(1 − p), take the integer part of the value obtained, and add 1.

The classic example of the inverse transform method is generating random


variates from an exponential(λ) distribution. (This is because the exponen-
tial cdf is so easy to invert.) If U ∼ uniform(0,1), and X = − ln U/λ, then
X ∼ exponential(λ). So it’s easy to generate exponential random variates.
Then, to get geometric and truncated exponential random variates, all
you have to do is take ⌊ X ⌋ and { X }, the integer and fractional parts of X,
respectively! More explicitly, the integer part ⌊ X ⌋ has a geometric(1 − e−λ )
distribution, and the fractional part { X } has the truncated exponential (λ)
distribution on (0, 1).

Proof.

P(⌊X⌋ = k, {X} ≤ x) = P(k ≤ X < k + x) = e^{−λk} − e^{−λ(k+x)} = e^{−λk}(1 − e^{−λx}).

If we fix k and substitute 1 for x we have

P(⌊X⌋ = k, {X} ≤ 1) = P(⌊X⌋ = k) = e^{−λk}(1 − e^{−λ}) = (e^{−λ})^k (1 − e^{−λ}).

Thus ⌊X⌋ ∼ geometric(1 − e^{−λ}).



If we fix x and sum over all possible values of k, we have

P({X} ≤ x) = (1 − e^{−λx}) ∑_{k=0}^{∞} e^{−λk} = (1 − e^{−λx}) / (1 − e^{−λ}).

Now, F(x) = 1 − e^{−λx} is the cdf of an exponential(λ) distribution, and 1 − e^{−λ} is the probability that an exponential(λ) random variable is no greater than 1. Thus {X} has the truncated exponential(λ) distribution: an exponential(λ) distribution whose values are not allowed to be greater than 1.
(Generation of a Geometric Random Variable)

1. Generate Y ∼ Exp(− ln(1 − p))

2. Return X = 1 + ⌊Y ⌋; as a random variable from G( p).
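A sketch of this algorithm in R (`rgeom1` is our own name):

```r
# Geometric(p) via the exponential link: Y ~ Exp(lambda) with
# lambda = -log(1 - p); then X = floor(Y) + 1 ~ G(p).
# (Base R's rgeom counts failures instead, so it starts at 0.)
rgeom1 <- function(n, p) {
  lambda <- -log(1 - p)
  y <- -log(runif(n)) / lambda   # Exp(lambda) by inverse transform
  floor(y) + 1
}

set.seed(1)
x <- rgeom1(10000, p = 0.25)
mean(x)   # close to 1/p = 4
```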

Inverse Transform Method for Continuous Variates


Example 2.2.4. Generate a random variable from the pdf
f(x) = 2x for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.  (2.2)

The cdf is

F(x) = 0 for x < 0;  F(x) = ∫_0^x 2t dt = x^2 for 0 ≤ x ≤ 1;  F(x) = 1 for x > 1.  (2.3)

Applying X = F^{-1}(U), we have

X = F^{-1}(U) = √U, 0 ≤ U ≤ 1.

Therefore, to generate a random variable X from the pdf, first generate a


random variable U from U (0, 1), and then take its square root.
Example 2.2.5. Generate a random variate from
f(x) = 3x^2 for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere.

Recall:

f(x) = (d/dx) F(x),  F(x) = ∫_0^x f(t) dt.

Then

F(x) = ∫_0^x f(t) dt = ∫_0^x 3t^2 dt.

Hence;
F ( x ) = x3 .
Since F(x) is the cumulative distribution function of f(x), we can replace F(x) by a uniform random number U. The inverse transformation is therefore

x = F^{-1}(U) = U^{1/3}.

Hence, to generate a random deviate from the probability density f(x) = 3x^2, a random deviate U from the uniform distribution is generated, and the desired deviate is x = U^{1/3}.
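The cube-root recipe is easy to check by simulation; a minimal R sketch:

```r
# Example 2.2.5 check: F(x) = x^3 on [0, 1], so X = U^(1/3).
set.seed(1)
u <- runif(1000)
x <- u^(1/3)
c(min(x), max(x), mean(x))   # values in [0, 1], mean close to E[X] = 3/4
```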
The problem with the inverse transform technique is that for many distri-
butions, the inverse function, F −1 (U ), does not exist or it is so complicated
as to be impractical. When this is the case, either approximations or other
techniques must be used.

In general, the inverse-transform method requires that the underlying cdf,


F, exist in a form for which the corresponding inverse function F −1 can
be found analytically or algorithmically. Applicable distributions are, for
example, the exponential, uniform, Weibull, logistic, and Cauchy distribu-
tions. Unfortunately, for many other probability distributions, it is either
impossible or difficult to find the inverse transform, that is, to solve
Z x
F(x) = f (t) dt = u
−∞

with respect to x. Even in the case where F −1 exists in an explicit form, the
inverse-transform method may not necessarily be the most efficient random
variable generation method.
In the next subsection we present algorithms for generating variables from commonly used continuous distributions. Of the numerous algorithms available, we have tried to select those which are reasonably efficient and relatively simple to implement.

Exponential Distribution We start by applying the inverse-transform


method to the exponential distribution. If X ∼ exp(λ), then its cdf F is
given by
F ( x ) = 1 − e−λx , x > 0.

Hence, solving u = F(x) in terms of x gives

F^{-1}(u) = −(1/λ) ln(1 − u).

Noting that U ∼ U(0, 1) implies 1 − U ∼ U(0, 1), we obtain the following algorithm. (Generation of an Exponential Random Variable)

1. Generate U ∼ U (0, 1)

2. Return X = −(1/λ) ln U as a random variable from Exp(λ).
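In R the algorithm is a one-liner; a minimal sketch (`rexp1` is our own name, chosen to avoid clashing with base R's `rexp`):

```r
# Exponential(lambda) via X = -log(U)/lambda (since 1 - U is also U(0,1)).
rexp1 <- function(n, lambda) -log(runif(n)) / lambda

set.seed(1)
x <- rexp1(10000, lambda = 5)
mean(x)   # close to 1/lambda = 0.2
```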

There are many alternative procedures for generating variables from the exponential distribution. The interested reader is referred to Luc Devroye's book Non-Uniform Random Variate Generation, Springer-Verlag, 1986. (The entire book can be downloaded for free.)
Example 2.2.6. Given a random number U uniformly distributed over
[0, 1], give an expression in terms of U for another variable X whose density
function is f ( x ) = 5 exp(−5x ). Justify the expression.
Answer. Let X = −(log U)/5. Then, for x > 0, the cumulative distribution function is

F(x) = P(X ≤ x) = P[log U ≥ −5x]
     = P(U ≥ e^{−5x})
     = 1 − P(U < e^{−5x})
     = 1 − e^{−5x}.

By differentiating this expression, we get

f ( x ) = 5 exp(−5x ) as required.

Alternatively, use the inverse distribution function method:

F(x) = ∫_0^x 5 e^{−5y} dy = 1 − e^{−5x}.

We set F(x) = U and solve for x, giving X = −(log(1 − U))/5; since 1 − U is also U(0, 1), X = −(log U)/5 has the same distribution.


Question 2.2.1. The following pseudo-random numbers come from a uni-
form distribution on the interval (0, 1):

0.249, 0.667, 0.361, 0.963



Use these to generate four pseudo random values from the exponential dis-
tribution with probability density function

f ( x ) = 5e−5x , x > 0.

[4]
Weibull Distribution Generate a random variate from the Weibull (α, β)
distribution.

A random variable X has a Weibull distribution (with parameters α > 0 and β > 0) if its probability density function is given by:

f(x) = α β^{−α} x^{α−1} exp[−(x/β)^α] for 0 ≤ x < ∞, and f(x) = 0 otherwise.

The distribution function of X is given by:

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(τ) dτ = 1 − exp[−(x/β)^α], x ≥ 0.

Example 2.2.7. Use the U(0, 1) random numbers 0.238 and 0.655 to generate two observations from the Weibull distribution with parameters c = (1/β)^α = 0.002 and α = 1.1.
Solution. Using the inverse transform method, we need to set

u = 1 − e^{−c x^α},

i.e.,

−c x^α = log(1 − u),

i.e.,

x = [−(1/c) log(1 − u)]^{1/α}.
Using the equation above with the parameters c = 0.002 and α = 1.1, we
get;

u = 0.238 gives x = 86.96


u = 0.655 gives x = 300.73

To find the functional inverse of F ( x ) we have to solve the equation


u = F ( x ) for the variable x. Therefore, the inverse function F −1 (u) is the
solution of the equation:

u = 1 − exp [−( x/β)α ]

and is given by:


F −1 (u) = β [− ln(1 − u)]1/α
Thus, to generate a random variate x from a Weibull distribution we can
use the following algorithm:

(1). Generate a random variate u from U (0, 1).


(2). Return x = F^{-1}(u) = β[−ln(1 − u)]^{1/α}.
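A sketch of this Weibull algorithm in R (`rweib` is our own name), with a check against Example 2.2.7, where c = (1/β)^α = 0.002 and α = 1.1 correspond to β = (1/0.002)^{1/1.1}:

```r
# Weibull(alpha, beta) via F^{-1}(u) = beta * (-log(1 - u))^(1/alpha).
rweib <- function(n, alpha, beta) {
  u <- runif(n)
  beta * (-log(1 - u))^(1 / alpha)
}

# Example 2.2.7: c = 0.002 and alpha = 1.1 give beta = (1/0.002)^(1/1.1).
beta <- (1 / 0.002)^(1 / 1.1)
round(beta * (-log(1 - c(0.238, 0.655)))^(1 / 1.1), 2)   # 86.96, 300.73
```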

The main disadvantage of the inverse transform method is the necessity


to have an explicit expression for the inverse of the distribution function
F ( x ). For instance, to generate a random variate from the standard normal
distribution using the inverse transform method we need the inverse of the
distribution function:

F(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.

However, no explicit solution to the equation u = F ( x ) can be found in this


case.
Question 2.2.2. Generate 3 random variates from an exponential distribution with mean 0.5, using the three random numbers generated in Question 1.3.1, i.e., 15/59, 55/59, and 42/59.

2.2.2 Exercise Questions


1. Explain the disadvantage of using random numbers, as opposed to
pseudo-random numbers, for simulation. (2)

Solution. A truly random number generator cannot generate long


sequences of numbers as efficiently as pseudo random number gener-
ators. One can mitigate this problem by generating in advance long
sequences of truly random numbers for later use, but this generally
requires huge storage space/hardware enhancement of the computer.
If truly random numbers are generated at the time of use (i.e., not read
out of computer memory), then the sequence cannot be reproduced.

2. What is a linear congruential generator? (2)

A Linear Congruential Generator (LCG) is a process of generating


pseudo random numbers using an initial integer value, called the seed,
and a recursive formula.

A typical recursive formula is of the form x_n = (a x_{n−1} + c) mod m, where a and c are fixed non-negative integers smaller than the third integer m. Given a seed x_0, one can go on generating successive integers in the range 0 to m − 1 using this formula. Usually m is chosen as a very large number, and x_n/m is a pseudo-random number between 0 and (m − 1)/m.

3. Describe the three integers one has to choose in order to form a linear congruential generator (LCG) and set out a recursive relationship, with subsequent scaling, for generating a sequence of pseudo-random numbers in the range 0 to 1. (3)

4. (i) The continuous random variable X has probability density func-


tion f ( x ) and cumulative distribution function F ( x ). Explain
carefully how the inversion method (i.e. the inverse c.d.f. method)
of simulation could be used to simulate observations from this
distribution. What restrictions on F ( x ) are required in order to
make this method of simulation work? (4)
(ii) The following numbers are a random sample of real numbers from
the uniform distribution on the interval 0 to 1:

0.167 0.236 0.778 0.968.

Use these values to generate four random variates from each of


the following distributions:
(a) f X ( x ) = exp(− x ), x≥0 (4)
(b) f X ( x ) = 4x (1 − x2 ), 0≤x≤1 (7)
e −2 2 x
(c) P( X = x ) = , x = 0, 1, · · · (5)
x!
Solution.
(i) First generate by any available method a pseudo-random
number between 0 and 1; call it u.

Now set F ( x ) = u, and solve this equation to find x =


F −1 (u). This value x is a pseudo-random member of the

specified distribution.

If this is to work, F must be easily invertible, either alge-


braically or numerically.
(ii) (a) F(x) = 1 − e^{−x}. If u = F(x) = 1 − e^{−x}, then x = −ln(1 − u).

For the given four numbers, using them as u, we find

x = 0.183, 0.269, 1.505, 3.442.

Note: If u is U(0, 1), so is (1 − u); so x = −ln u could be used.

(b) F(x) = ∫_0^x (4t − 4t^3) dt = [2t^2 − t^4]_0^x = 2x^2 − x^4 (for 0 ≤ x ≤ 1). If u = 2x^2 − x^4, then x^4 − 2x^2 + u = 0; i.e., (x^2 − 1)^2 − 1 + u = 0, or x^2 − 1 = −√(1 − u) (taking the negative square root to obtain x < 1), which gives x = √(1 − √(1 − u)). This gives x = 0.295, 0.355, 0.727, 0.906.
(c) For the Poisson distribution, tables can be used to set up
the cumulative distribution (e.g. Examination Tables) or
the c.d.f. can be calculated by hand. When λ = 2, we
have:

P(X = 0) = 0.1353 so F(0) = 0.1353
P(X = 1) = 0.2707 so F(1) = 0.4060 ← u = 0.167, 0.236
P(X = 2) = 0.2707 so F(2) = 0.6767
P(X = 3) = 0.1804 so F(3) = 0.8571 ← u = 0.778
P(X = 4) = 0.0902 so F(4) = 0.9473
P(X = 5) = 0.0361 so F(5) = 0.9834 ← u = 0.968

and so on.

Any value of u up to 0.1353 corresponds to x = 0; u from 0.1353 to 0.4060 to x = 1; and so on. So we find 1, 1, 3, 5 as the random sample from the Poisson distribution with mean 2.

F needs to be worked out as far into the tail of the dis-


tribution as necessary to use all the given values of u.


6. (a) Let X be any continuous random variable, and let F ( x ) be its cu-
mulative distribution function. Suppose that U is a continuous
random variable which follows a uniform distribution on the inter-
val (0, 1), and define the new random variable Y by Y = F −1 (U ),
where F −1 () is the inverse function of F (). By considering the
cumulative distribution function of Y, or otherwise, show that Y
has the same distribution as X. (5)
Solution. FY (y) = P(Y ≤ y) = P( F −1 (U ) ≤ y) = P(U ≤
F (y)). But U is uniform (0, 1), and so P(U ≤ u) = u for
0 ≤ u ≤ 1. Hence, FY (y) = P(U ≤ F (y)) = F (y), the same
cdf as X. Therefore, Y has the same distribution as X.
(b) The following values are a random sample of numbers from a
uniform distribution on the interval (0, 1).

0.205 0.476 0.879 0.924

Use these values to generate 4 random variates from each of the


following distributions, carefully explaining the method you use
in each case.
(i) Geometric: P(X = x) = (1/2)^x, x = 1, 2, ... (6)

Solution.

x:        1      2      3      4
P(X = x): 1/2    1/4    1/8    1/16
P(X ≤ x): 0.5000 0.7500 0.8750 0.9375

u1 = 0.205 which is < 0.5 hence x1 = 1


u2 = 0.476, so also x2 = 1.
u3 = 0.879, 0.8750 < u3 < 0.9375, hence x3 = 4
u4 = 0.924, so also x4 = 4

Sample is 1, 1, 4, 4.
(ii) Pareto: f(x) = 18/x^3, x > 3. (5)

Solution. For the Pareto distribution, F_X(x) = ∫_3^x 18/t^3 dt = [−9/t^2]_3^x = 1 − 9/x^2 (x ≥ 3). Setting u = F_X(x) = 1 − 9/x^2 gives x = 3/√(1 − u). The random variates therefore are 3.365, 4.144, 8.624, 10.882.

(iii) Standard normal: f(x) = (1/√(2π)) exp(−x^2/2), −∞ < x < ∞ (4)

Solution. Using Normal tables, the four values corresponding to u_i are x_i = Φ^{-1}(u_i), i.e., −0.82, −0.06, 1.17, 1.43.

7. Try to solve for the inverse function of the normal density. Highlight
the difficulties in obtaining a neat algebraic solution.

Lab 4: Random Numbers and Simulations in R


By Omari C.O
Let X be a continuous r.v. with density f and cdf F. Let us assume that F is strictly increasing on some interval I, with f(x) = 0 for x outside I, so that F^{-1} is well defined on I.

We have shown in class that if U is a uniform r.v. and we define Y = F^{-1}(U), then the cdf of Y is F. In other words, X and Y have the same distribution. This lab aims to illustrate this property.

1. Derive the inverse function for the function:

f ( x ) = 1 − e− x/β

2. (a) Use the runif command in R to create a vector, U, of 1000


random numbers from a uniform distribution [0, 1].
(b) Use a histogram to look at the (empirical) distribution of the
vector you have generated in a). Does it look uniform?

3. The goal is to generate random numbers from an exponential distri-


bution with parameter λ, where λ is any (given) positive number.
Let us recall that the density of an exponential r.v. with parameter λ
is (
λ exp(−λx ) if x ≥ 0,
f (x) =
0 otherwise

(a) Find the corresponding cdf and show that F −1 ( x ) = − λ1 log(1 −


x ).
(b) Use the vector U defined in 2(a) to create a new vector Y = F^{-1}(U); that is, each element Y_i of the new vector can be obtained by applying the function F^{-1} to the corresponding element U_i (use λ = 2). What is the distribution of each Y_i?

(c) Use a histogram to verify your claim.

4. The goal is to devise a strategy to generate random numbers from


a geometric distribution with parameter p, where p is any number
between 0 and 1.
Recall that the density of a geometric (this is a discrete distribution)
r.v., X, with parameter p is

P ( X = x ) = f ( x ) = p (1 − p ) x −1 , x = 1, 2, 3, · · ·

(a) Show that P( X = x ) = P( x − 1 ≤ E < x ) where E is an expo-


nential r.v. with parameter λ = − log(1 − p).

Hint: Express this probability as FE ( x ) − FE ( x − 1), where FE is the cdf of


E.

(b) Using a), can you devise a strategy to generate a random number
from a geometric distribution?
Explain your steps carefully.

Note: You can only submit your lab assignment printed or hand written
with appropriate graphs printed out from R. Emails are not accepted. Also,
note that the way you present your solutions is considered a part of your
skills and a portion of the marks is devoted to that.

2.3 Acceptance-Rejection method


Acceptance-rejection sampling is based on the idea that, if it is too difficult
to generate points at random from under the graph of f , then a reasonable
substitute might be to generate points from a larger area, which includes
the region under the graph of f , then to discard any points which are not
acceptable.

We first have to find a simpler density function h( x ), from which it is rel-


atively straightforward to draw a sample, such that f ( x )/h( x ) is bounded
for all x.

Having found a suitable density function h(x), define:

C = max over all x of f(x)/h(x),  and  g(x) = f(x)/(C h(x)) for all x,

so that 0 < g(x) ≤ 1.

The interpretation is that, having generated a point ( x, y) at random from


under the graph of Ch( x ), the value g( x ) represents the probability that
this point also falls under the graph of f .
Example 2.3.1. Find C if f(x) = (3/32)(1 − x)(x − 5) for 1 ≤ x ≤ 5 and h(x) is the density of a U(1, 5) random variable.

Solution. Since h(x) = 1/4 in the range 1 ≤ x ≤ 5, we need to find the maximum value of f(x) = (3/32)(1 − x)(x − 5) in that range. Differentiating gives f′(x) = (3/32)(6 − 2x) and hence x = 3 at the maximum. (In fact this should be obvious by symmetry. It should also be obvious that this point really is a maximum rather than just a turning point.)

The maximum value itself is therefore f(3) = 12/32, and we must have C × 1/4 = 12/32. Therefore

C = 48/32 = 3/2.
C is closely related to the efficiency of the procedure: it is the average number of candidate values which must be generated in order to end up with a single accepted value x.

Now use the following algorithm:


1. Generate a random variate u from U (0, 1).

2. Generate a random variate y from the distribution with density h( x ).

3. If u > g(y) then go to step 1, otherwise return x = y.


In order to generate the random variate y in step 2, it may be necessary to
generate another random variate v from U (0, 1).

Note: To come up with a good acceptance-rejection algorithm, you need


to carry out the following steps:
1. Write down the PDF you require. This is f ( x ).

2. Think of a function with the same range and a similar shape to f ( x ),


from which you know how to generate random values. This is h( x ).

3. Look at the ratio f(x)/h(x) and find the maximum value it takes over the range. (Usually this will involve calculus.) Call this maximum C. When f(x)/h(x) is divided by C you get f(x)/(C h(x)). This is g(x).
4. Now generate some values yi from the distribution h but accept them
only if ui ≤ g(yi ), i.e., with probability g(yi ). Here the ui ’s are random
values from U (0, 1).
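The four steps can be collected into one generic R function. The sketch below (all names are our own) replays Example 2.3.1, where h is the U(1, 5) density and C = 3/2:

```r
# Acceptance-rejection: draw candidates y from h, accept each with
# probability g(y) = f(y) / (C * h(y)).  rh draws from h.
r_accept_reject <- function(n, f, h, rh, C) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rh(n)                       # candidates from h
    u <- runif(n)
    keep <- u <= f(y) / (C * h(y))   # accept with probability g(y)
    out <- c(out, y[keep])
  }
  out[1:n]
}

# Example 2.3.1: f(x) = (3/32)(1 - x)(x - 5) on [1, 5], h = U(1, 5), C = 3/2.
set.seed(1)
f <- function(x) (3 / 32) * (1 - x) * (x - 5)
h <- function(x) rep(1 / 4, length(x))
x <- r_accept_reject(2000, f, h, function(n) runif(n, 1, 5), C = 3 / 2)
mean(x)   # close to 3, the centre of the symmetric density
```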

Example 2.3.2. (a). The following pseudo-random numbers came from a


uniform distribution on the interval (0, 1):

0.249, 0.667, 0.361, 0.963

Use these to generate four pseudorandom values from the exponential dis-
tribution with probability density function;

f ( x ) = 5e−5x , x>0

Answer. Let f(x) = 5e^{−5x}, x > 0. Then

F(x) = ∫_0^x 5 e^{−5t} dt = 1 − e^{−5x},  so  x = −(1/5) ln(1 − F(x)),

where F(x) ∼ U(0, 1). Treating the given numbers as values of F(x):

F(x): 0.249  0.667  0.361  0.963
x:    0.057  0.220  0.090  0.659
(b) The beta distribution has density function f(x) = 6x(1 − x), 0 ≤ x ≤ 1.

Show that f(x) ≤ 1.5 for all 0 ≤ x ≤ 1.

Hence devise a method based on acceptance-rejection sampling for generating observations from this beta distribution.

Answer. The maximum of f is f(1/2) = 1.5, so f(x) ≤ 1.5 = g(x), where g is a constant envelope. The corresponding proposal pdf is h(x) = g(x)/(area under g) = 1.5/1.5 = 1, i.e., the U(0, 1) density.

Steps are:

1. Obtain a sample x_1 from h(x), i.e., from U(0, 1).

2. Obtain a random number R from U(0, 1).

3. If R ≤ f(x_1)/g(x_1), accept x_1 as a sample from f(x).

4. Otherwise discard x_1 and return to step 1.
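The steps above can be sketched in R, vectorised over many candidates at once:

```r
# Beta example: f(x) = 6x(1 - x) on [0, 1], envelope C = 1.5, h = U(0, 1).
set.seed(1)
n  <- 5000
x1 <- runif(n)                            # step 1: candidates from h
R  <- runif(n)                            # step 2: decision uniforms
x  <- x1[R <= 6 * x1 * (1 - x1) / 1.5]    # step 3: accept if R <= f(x1)/(C h(x1))
length(x) / n   # acceptance rate, close to 1/C = 2/3
mean(x)         # close to E[X] = 1/2
```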


Question 2.3.1. A general insurance actuary uses a model in which the
number of claims made each year by motor insurance policyholders conform
to the following discrete distribution:
Number of Claims Probability
0 0.80
1 0.10
2 0.05
3 0.05
4 or more NIL

(i) Describe carefully how you would use the acceptance/rejection method
to generate discrete pseudo-random values from this distribution using
as your reference distribution the discrete uniform distribution that is
equally likely to take any of the values 0, 1, 2 or 3. (You are not
required to describe how the uniform values are generated and you
may assume that you have continuous uniform values available for the
acceptance/rejection decisions.) [4]

(ii) What proportion of values generated by the method in (i) will be


accepted? Comment on your answer. [3]

(iii) Outline a more efficient alternative method that could be used to gen-
erate the pseudo-random numbers for the numbers of claims. [3]

Question 2.3.2. A researcher wishes to generate pseudo-random values


from a statistical distribution that has probability density function:
f(x) = (√3/4) (1 + x²/2)^{−3/2}, −2 ≤ x ≤ 2
He knows that the shape of this distribution is similar to the standard normal
distribution, which has probability density function:
ϕ(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞

He has therefore decided to use the acceptance-rejection method.

(i) Show that the highest value attained by the ratio f(x)/ϕ(x) over the range −2 ≤ x ≤ 2 is c = (1/12) e² √(2π). [6]
(ii) Hence determine an algorithm (which you should write out clearly)
that the researcher could use to generate the required sequence of
pseudo-random numbers. (You may assume that he has an unlimited
supply of random numbers available from both the standard normal
and standard uniform distributions.) [5]
Question 2.3.3. You have been given a sequence of pseudorandom numbers
from the two-sided exponential distribution with density function
f(x) = (1/2) e^{−|x|}, −∞ < x < +∞
and you have been asked to use these to generate a pseudorandom sequence
from a standard normal distribution using the acceptance-rejection method.
(i) Write out the steps in the algorithm you would use. [4]
(ii) Determine the proportion of values from the two-sided exponential
distribution that will be rejected using this algorithm. [2]
(iii) Write down the names of two other methods that could be used to
generate a similar sequence using a standard electronic calculator. [1]
Question 2.3.4. It is required to simulate observations from a random
variable X with density function:
f ( x ) = 0.6 − 0.3x2 , −1 < x < 1
using the acceptance-rejection method on a U (−1, 1) distribution.
(i) (a) Use the pseudo-random numbers given below, which are uniform
on (0,1), to generate a set of three observations from the U (−1, 1)
using the inverse transform method:
0.461, 0.966, 0.924

(b) Use the acceptance-rejection method on the observations from


a U(-1,1) obtained in part (i)(a) to generate as many simulated
values from X as possible. For the decision step, use the following
additional pseudorandom numbers:
0.843, 0.427, 0.683
[6]

(ii) Determine the proportion of the simulations from the U (−1, 1) distri-
bution that will be rejected in the long run. [1]

(iii) Give a reason why this method is preferable to a method based on inverting the cumulative distribution function of X. [1]
Question 2.3.5. Exam-style question 1 (Subject 103, April 2003, Question
4)
(i) Given a pseudo-random number U uniformly distributed over [0, 1] ,
obtain an expression in terms of U and θ for a non-negative pseudo-
random variable X which has density function:

f ( x ) = θe−θx

(ii) A sequence of simulated observations is required from the density func-


tion:
g(x) = k(θ) e^{−θx}/(1 + x), x > 0,
where θ is a non-negative parameter and k (θ ) is a constant of integra-
tion not involving x.

(a) Describe a procedure that applies the Acceptance-Rejection method


to obtain the required observations.
(b) Derive an expression involving θ and k (θ ) for the expected num-
ber of pseudo-random variables required to generate a single ob-
servation from the density g using this method. [6]

Question 2.3.6. An insurer is using the random variable X to model its


claim amounts. The probability density function of X is:

f(x) = βx/(500 + x)^5, x ≥ 0,

where β is a constant.
(i) Find the value of β. [2]

(ii) Show that this distribution has mean 500, and variance 500,000. [2]

(iii) Find the distribution function of the distribution. [2]

(iv) Comment on the ease with which we could use the inversion method
to generate random numbers from this distribution. [1]

(v) By using a 2-parameter Pareto distribution with α = 3 and λ = 500 as


a bounding distribution, generate as many random numbers as possible
from this three-parameter Pareto distribution. You should use the
random numbers from a uniform (0, 1) distribution given below (in
the order given). [6]

For generating values from the 2-parameter Pareto: 0.74 0.82 0.16
For making the acceptance-rejection decision: 0.26 0.59 0.71

2.4 Generating Normal Variates


Many simulations require the generation of normally distributed variates.
The scaling property of the normal distribution means that it is enough to
be able to generate a normal variate Z with mean 0 and variance 1, since
any variate X ∼ N (µ, σ2 ) can then be obtained as X = µ + σZ.

As neither the inverse transform method nor acceptance-rejection sampling


is suitable for the normal distribution, methods based on the distribution’s
particular properties are used.

If X ∼ N(µ, σ²), its pdf is given by

f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)), −∞ < x < ∞,

where µ is the mean (or expectation) and σ2 the variance of the distribution.

Since inversion of the normal cdf is numerically inefficient, the inverse-


transform method is not very suitable for generating normal random vari-
ables, and some other procedures must be devised instead. We consider
only generation from N (0, 1) (standard normal variables), since any ran-
dom Z ∼ N (µ, σ2 ) can be represented as Z = µ + σX, where X is from
N(0, 1). One of the earliest methods for generating variables from N(0, 1) is the following, due to Box and Muller.

2.4.1 Box-Muller algorithm


The following algorithm can be used for generating a pair z_1, z_2 of independent random variates from the standard normal distribution. (Generation of a Normal Random Variable, Box and Muller Approach)

1. Generate two independent random variables/numbers U1 and U2 from


U (0, 1).

2. Return two independent standard normal variables, X and Y, via

X = (−2 ln U1 )1/2 cos(2πU2 ),


Y = (−2 ln U1 )1/2 sin(2πU2 ).

So in fact the algorithm gives you two values each time (X and Y), but you do have to feed in two values (U_1 and U_2) as well. It might not be obvious, but X and Y are independent.

(Note that you work in radians if you use 2πU_i; if you work in degrees then you need to use 360U_i.)
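A sketch of the Box-Muller algorithm in R (`box_muller` is our own name), followed by a check against the first part of Example 2.4.1:

```r
# Box-Muller: two independent U(0,1) numbers in, two independent
# N(0,1) variates out.
box_muller <- function(n) {
  u1 <- runif(n); u2 <- runif(n)
  r  <- sqrt(-2 * log(u1))
  c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))   # 2n standard normals
}

# By-hand check with u1 = 0.802, u2 = 0.450:
r <- sqrt(-2 * log(0.802))
round(c(r * cos(2 * pi * 0.450), r * sin(2 * pi * 0.450)), 3)   # -0.632  0.205
```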

A disadvantage of the Box-Muller algorithm is the necessity to compute cos(·) and sin(·), which is time consuming for a computer. The following algorithm uses essentially the same transformations as the Box-Muller algorithm. However, it avoids computation of cos(·) and sin(·) by using the acceptance-rejection method.

2.4.2 Polar Algorithm


1. Generate random numbers u_1 and u_2 from U(0, 1).

2. Set v_1 = 2u_1 − 1, v_2 = 2u_2 − 1, and s = v_1² + v_2².

Note that v_1 and v_2 are U(−1, 1).

3. If s > 1, go to step 1.

4. Return the pair

z_1 = v_1 √(−2 ln s / s),  z_2 = v_2 √(−2 ln s / s).

Analogously to the Box-Muller algorithm, the Polar algorithm generates a


pair of independent random variates from the standard normal distribution.
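In R the Polar algorithm might look like this (`polar_pair` is our own name; the `repeat` loop is the acceptance-rejection step):

```r
# Polar (Marsaglia) method: accept (v1, v2) uniformly inside the unit
# disc, then transform without needing sin or cos.
polar_pair <- function() {
  repeat {
    v1 <- 2 * runif(1) - 1
    v2 <- 2 * runif(1) - 1
    s  <- v1^2 + v2^2
    if (s > 0 && s < 1) break   # if s > 1 (or s = 0), try again
  }
  fac <- sqrt(-2 * log(s) / s)
  c(v1 * fac, v2 * fac)         # two independent N(0,1) variates
}

set.seed(1)
z <- replicate(2000, polar_pair())   # 2 x 2000 matrix of standard normals
c(mean(z), sd(z))                    # close to 0 and 1
```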

2.4.3 Approximate Method


Where the exact distribution of the observations is not important, a technique often employed is to generate a sequence u_1, u_2, ..., u_12 of U(0, 1) variates and set z = ∑_{i=1}^{12} u_i − 6. This has mean 0 and variance 1 and, by the central limit theorem, is approximately normally distributed.
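This approximation is one line of R per variate; a vectorised sketch:

```r
# Approximate standard normal: z = u1 + ... + u12 - 6.
set.seed(1)
z <- colSums(matrix(runif(12 * 10000), nrow = 12)) - 6
c(mean(z), var(z))   # close to 0 and 1
```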

Various routines for generating random variates can be found in the book
“Numerical Recipes in C: the art of scientific computing”, by W.H. Press,
S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Cambridge University
Press, 1992.
Example 2.4.1. Using the random numbers 0.802 and 0.450 from U (0, 1),
generate a pair of standard normal variates (a) using the Box-Muller algo-
rithm (b) using the polar method. How would you use these methods to
generate values from a N (3, 8) distribution?
Solution. Our z values are:

z_1 = √(−2 log 0.802) cos(2π × 0.450) = −0.632
z_2 = √(−2 log 0.802) sin(2π × 0.450) = 0.205

You could have used the two initial values the other way round, in which case you would get different answers.

Using the polar method, we have:


v_1 = 2 × 0.802 − 1 = 0.604,  v_2 = 2 × 0.450 − 1 = −0.1
⇒ s = 0.604² + (−0.1)² = 0.374816

So our z values are:

z_1 = √(−2 log 0.374816 / 0.374816) × 0.604 = 1.382

and:

z_2 = √(−2 log 0.374816 / 0.374816) × (−0.1) = −0.229.

These are our values from the standard normal distribution.

If we want values from a N(3, 8) distribution, then we need to find values x_1 and x_2 where:

x = 3 + z√8,

using the two z values we have just calculated.

2.4.4 Disadvantages of using truly random, as opposed to


pseudo-random, numbers
Numbers which are truly random can be generated by physical processes
(using various electronic devices). In 1955 the RAND Corporation pub-
lished a table of a million random digits obtained with the help of such a
device.

Loaded in a computer's main memory, these truly random numbers are as convenient to use as pseudo-random numbers generated by an LCG. However, the table may be too short: it is not unusual to find that Monte Carlo simulations require several billion random numbers. On the other hand, a combination of LCGs can produce a sequence of pseudo-random numbers which, despite still having a finite period, is as good as an infinite one for all practical purposes.

A special device to generate truly random numbers might be attached to the computer. Although such a device would allow for generation of an arbitrarily long sequence of random numbers, it would still not be entirely satisfactory, since it would be impossible to reproduce exactly the same series of random numbers again. The only alternative would be to store the original sequence for later re-use.

A great advantage of LCGs is the reproducibility of pseudo-random numbers. If the same seed is specified at the beginning of two runs then exactly the same sequence of pseudo-random numbers is produced. The advantage of needing to keep only a single seed, rather than a sequence of possibly millions of integers, is clear.

Finally, only a single routine is required for generation of pseudo-random numbers, whereas in the case of truly random numbers we need either a lengthy table or a hardware enhancement to the computer.

2.4.5 Revision Quiz


1. You are using simulation to find the mean and variance of a random
variable that has the density function,
f(x) = k / (1 + e^(−x)),   −1 < x < 1
(i) Determine the value of the constant k

(ii) Generate five pseudo-random values from this distribution based on the U (0, 1) values 0.017, 0.757, 0.848, 0.531 and 0.321.

(iii) You have generated 1,000 such numbers and found that ∑x = 165.681 and ∑x² = 339.275. Use these statistics to find a 95% confidence interval for the mean of the distribution and a point estimate for the standard deviation.

2. State how you would use the polar method to generate pairs of uncorrelated pseudo-random numbers from the standard normal distribution. (2)

3. You are given the following samples from an exponential distribution with mean 5: 10.6101, 2.7768, 11.8926, 0.1976, 6.6885, 6.4656. Transform these numbers suitably to produce six samples from the normal distribution with mean 2 and variance 4. [5]

4. (i) Explain the disadvantages of using truly random, as opposed to pseudo-random, numbers. [3]
(ii) List four methods for the generation of random variates. [2]

• The stored table of random numbers generated by a physical process may be too short; a combination of linear congruential generators (LCGs) can produce a sequence which is infinite for practical purposes.

It might not be possible to reproduce exactly the same series of random numbers again with a truly random number generator unless these are stored. An LCG will generate the same sequence of numbers with the same seed.

Truly random numbers would require either a lengthy table or a hardware enhancement, compared with a single routine for pseudo-random numbers.
• Inverse Transform method.
Acceptance-Rejection Method
Box-Muller algorithm (from the standard normal distribution)
Polar algorithm (from the standard normal distribution)

5. (i) Take the following values as a random sample from a U (0, 1) distribution, i.e. a uniform distribution on the range 0 to 1.

0.1423 0.3372 0.6772 0.9192

Use these values to generate four random variates from each of the following distributions, explaining carefully the method you use in each case.
(a) The standard Normal distribution. (6)
Solution. The inverse cumulative distribution function method can be used with tables of the standard Normal cdf Φ( x ). The values of z are such that Φ(z) = u, and for the four values of u the corresponding values of z are −1.07, −0.42, +0.46, +1.40.
(b) The Normal distribution with expected value 2 and variance
0.81. (3)
Solution. These can be transformed to N (2, 0.81) by w = µ + σz, i.e. w = 2 + 0.9z, to give 1.037, 1.622, 2.414, 3.260.
(c) The chi-squared distribution with one degree of freedom. (3)
Solution. The chi-squared distribution with one degree of freedom is the square of N (0, 1), so take the squared values z² from (a): 1.14, 0.18, 0.21, 1.96.
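The same answers can be reproduced with R's built-in quantile function qnorm; a sketch:

```r
u <- c(0.1423, 0.3372, 0.6772, 0.9192)

z <- qnorm(u)      # (a) approx -1.07, -0.42, 0.46, 1.40
w <- 2 + 0.9 * z   # (b) N(2, 0.81) variates
x <- z^2           # (c) chi-squared(1) variates
```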
Chapter 3

Generation of random variates from a specified distribution using R

3.1 Introduction
Most computer languages already contain a built-in pseudo-random number
generator. The user is typically requested only to input the initial seed, X0 ,
and upon invocation, the random number generator produces a sequence of
independent uniform (0, 1) random variables.

A similar kind of operation (though using a different formula, and with a much longer cycle) is used internally by R to provide pseudo-random numbers automatically with the function:

runif

Syntax: runif(n, min = a, max = b)

Execution of this command produces n pseudo-random uniform numbers on the interval [a, b]. The default values are a = 0 and b = 1, so runif(n) on its own creates numbers between 0 and 1. The seed is selected internally.

runif(100)

runif(500)


We can change the defaults with runif(n, a, b), where n is how many numbers you want to generate, a is the lower bound and b is the upper bound.

runif(50, 0, 10) # 50 random variables between 0 and 10


runif(50, 10, 20) # 50 random variables between 10 and 20

Uniform random numbers are equally likely to be any number between the
bounds. This means that when we draw a histogram of our simulated ran-
dom numbers, it should be almost flat.

Sometimes it is helpful to be able to generate the same set of random numbers again. For example, when debugging (error fixing) code you might want to re-use the set of random numbers that produced an error. When you start R it automatically chooses a starting point (a seed) for the generator, and every number after that is determined by the generator's current state. If we pre-select one of the billions of possible starting points ourselves, we can reproduce exactly the same sequence later.

We can do this using set.seed(seed number)

set.seed(20)
x=runif(200);mean(x)

set.seed(40)
x=runif(200);mean(x)

set.seed(40)
x=runif(200);mean(x)
set.seed(20)
x=runif(200);mean(x)
y=runif(200);mean(y)

Notice how the second and third means are the same (seed 40 both times) and the first and fourth are the same (seed 20), but the mean of y is different because the generator has moved on since set.seed(20).
Example 3.1.1. Generate a sample of size 1000 from a uniform (0, 1) dis-
tribution (runif(1000)) and look at the following;

(a) histogram of the original values

(b) histogram of their natural logarithms

(c) histogram of the logit transform of the values



x<-runif(1000)
hist(x)
hist(log(x))
hist(log(x/(1-x)))

Objective: To derive and compute probability functions, cumulative probability functions and quantiles using the R software and apply them in practice problems.

3.2 Probability Distributions


R has a wide range of built-in probability distributions, for each of which
four functions are available: the probability density function (which has a d
prefix); the cumulative probability (p); the quantiles of the distribution (q);
and random numbers generated from the distribution (r). Each letter can
be prefixed to the R function names in the table below (e.g. dbeta).

R function   Distribution                     Parameters

beta         beta                             shape1, shape2
binom        binomial                         sample size, probability
cauchy       Cauchy                           location, scale
exp          exponential                      rate (optional)
chisq        chi-squared                      degrees of freedom
f            Fisher's F                       df1, df2
gamma        gamma                            shape, rate (or scale)
geom         geometric                        probability
hyper        hypergeometric                   m, n, k
lnorm        lognormal                        mean, standard deviation (of the log)
logis        logistic                         location, scale
nbinom       negative binomial                size, probability
norm         normal                           mean, standard deviation
pois         Poisson                          mean
signrank     Wilcoxon signed rank statistic   sample size n
t            Student's t                      degrees of freedom
unif         uniform                          minimum, maximum (optional)
weibull      Weibull                          shape, scale
wilcox       Wilcoxon rank sum                m, n
Random variables are commonly needed for simulation and analysis.

A seed can be specified for the random number generator. This is important
to allow replication of results (e.g., while testing and debugging).

3.2.1 Discrete Distributions


Uniform random variables
x=runif(n, min=0, max=1)

The arguments specify the number of variables to be created and the range over which they are distributed (by default the unit interval).

Binomial distribution using R


The R commands for calculating binomial probability, cumulative probabil-
ity and quantile are as follows:

• To find P( X = x ) for X ∼ Binomial (n, p), the command is dbinom(x,n,p).

You will need to input all three values.

• To find the cumulative probability P( X ≤ x ) for X ∼ Binomial (n, p),


the command is pbinom(x,n,p).

• To find the quantile (inverse cumulative probability) x such that P( X ≤ x ) = a for X ∼ Binomial (n, p), the command is qbinom(a,n,p). Note that for this, you will input the probability value a to find x.

• To generate random numbers for X ∼ Binomial (n, p), the command is rbinom(num,n,p), where num is the number of random variates to be generated.

For X ∼ Binomial (n, p), n defines the number of trials and p denotes the probability of success. For a help menu about the binomial distribution functions type the command help(qbinom).

Note that, by default, the pbinom function reports the probability P( X ≤ x ) (the first argument). When working with discrete distributions, be careful about off-by-one errors. You can verify an answer by summing the pdf: sum(dbinom(8:10, 10, p)) should agree with 1 - pbinom(7, size=10, prob=p).

Question 3.2.1. Using the R commands for the Binomial distribution, an-
swer the following questions. To answer these questions, you will need to
show the derivation and write down the R commands that you used for com-
putations.

The firing rates of 16 neurons are recorded in a particular region of the brain
of a monkey under each of two experimental conditions. The number of
neurons in which the firing rate is greater in Condition 1 than in Condition
2 is counted. Suppose that in the population of neurons the probability of
a higher firing rate for Condition 1 is 0.38.

(a) What is the probability that exactly 7 of the 16 neurons have higher
firing rates for Condition 1?

(b) What is the probability that 8 or fewer have higher firing rates for
Condition 1?

(c) What is the probability that more than 8 have higher firing rates for
Condition 1?

(d) What is the probability that 8 or more have higher firing rates for
Condition 1?

(e) What is the probability that from 5 to 7 have higher firing rates for
Condition 1?
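One possible set of R commands for these parts (a sketch, taking n = 16 and p = 0.38 from the question):

```r
# X ~ Binomial(16, 0.38): number of neurons with higher firing rate in Condition 1
dbinom(7, 16, 0.38)                        # (a) P(X = 7)
pbinom(8, 16, 0.38)                        # (b) P(X <= 8)
1 - pbinom(8, 16, 0.38)                    # (c) P(X > 8)
1 - pbinom(7, 16, 0.38)                    # (d) P(X >= 8)
pbinom(7, 16, 0.38) - pbinom(4, 16, 0.38)  # (e) P(5 <= X <= 7)
```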

Question 3.2.2. Suppose there are twelve multiple choice questions in an English class quiz. Each question has five possible answers, and only one of them is correct. Find the probability of having four or fewer correct answers if a student attempts to answer every question at random.
Solution. Since only one out of five possible answers is correct, the probability of answering a question correctly at random is 1/5 = 0.2. We can find the probability of having exactly 4 correct answers by random attempts as follows.

> dbinom(4, size=12, prob=0.2)
[1] 0.1329

To find the probability of having four or fewer correct answers by random attempts, we apply the function dbinom with x = 0, . . . , 4.

> dbinom(0, size=12, prob=0.2) + dbinom(1, size=12, prob=0.2) +
+   dbinom(2, size=12, prob=0.2) + dbinom(3, size=12, prob=0.2) +
+   dbinom(4, size=12, prob=0.2)
[1] 0.9274

Alternatively, we can use the cumulative probability function for the binomial distribution, pbinom.

> pbinom(4, size=12, prob=0.2)
[1] 0.92744

Answer. The probability of four or fewer questions answered correctly at random in a twelve-question multiple choice quiz is 92.7%.

3.2.2 Poisson distribution using R


The Poisson distribution is sometimes referred to as the distribution of "rare events": p (the probability of a single Bernoulli trial being successful) is usually very small. This is counterbalanced by the fact that there are many opportunities for the event to happen, so overall the event is not that rare.
1. An average of four babies are born each night on a particular maternity
ward.

(i) What is the probability that the intern on call will be able to
sleep through the night?
We simply evaluate the Poisson distribution for the case where
k = 0 and λ = 4. The R code is: > dpois(0, 4)
(ii) What is the probability that the intern will be swamped (defined
as more than six births to be attended to)?
We will use the cumulative distribution function. The R-code is:
1-ppois(6,4)

2. Suppose that a coin is tossed 100 times. We would expect, on average,


to get about 50 heads. Using our binomial probability function,

(i) Find the probability of getting between 45 and 55.


(ii) What is the chance of getting between 40 and 60 heads in 100
tosses?
(iii) What is the chance of getting between 35 and 65 heads in 100
tosses?
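A sketch of the binomial calculations, assuming "between a and b" is read as inclusive:

```r
# X ~ Binomial(100, 0.5): number of heads in 100 tosses
pbinom(55, 100, 0.5) - pbinom(44, 100, 0.5)  # (i)   P(45 <= X <= 55)
pbinom(60, 100, 0.5) - pbinom(39, 100, 0.5)  # (ii)  P(40 <= X <= 60)
pbinom(65, 100, 0.5) - pbinom(34, 100, 0.5)  # (iii) P(35 <= X <= 65)
```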

3. Generate 1000 random numbers from each of the following distributions and draw the histogram.

(i) Bin(n, p) for n = 20, 20, 40 and p = 0.5, 0.3, 0.9.


(ii) Geo( p) for p = 0.9, 0.5, 0.3.
(iii) Poisson(λ) with λ = 1, 4, 10.
(iv) Hypergeo( N1 , N2 , m) with N1 = 100, N2 = 50, m = 20, and
N1 = 1000, N2 = 1000, m = 40.

4.

5. The employees of a firm that does asbestos cleanup are being tested
for indications of asbestos in their lungs. The firm is asked to send four
employees who have positive indication of asbestos on to a medical
center for further testing. Assume 40% of the employees have positive
indications of asbestos in their lungs. Let x be the number of employees
that are tested before finding the four who do have asbestos in their
lungs.

(i) Name an appropriate probability distribution for x, including its


parameters. (Look up the help file for this distribution in R to
review how R defines the parameters.)
(ii) Use R to calculate the probability that six employees who do not
have asbestos in their lungs must be tested before finding four
who do have asbestos in their lungs
(iii) Use R to calculate the probability that the firm must test a total
of 15 employees to find four who have asbestos in their lungs.
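A sketch for this question; note that R's negative binomial counts the failures before the size-th success (check help(dnbinom) against the definition of x above):

```r
# size = 4 positives required, prob = 0.4 of a positive indication
dnbinom(6, size = 4, prob = 0.4)   # (ii)  exactly 6 negatives before the 4th positive
dnbinom(11, size = 4, prob = 0.4)  # (iii) 15 tested in total = 11 negatives + 4 positives
```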

6. Let X be the number of heads in 100 tosses of a fair coin. Use R to calculate the probability of getting

(i) at least 50 heads (that is, 50 or more)


(ii) exactly 50 heads
(iii) between 40 and 60 heads, inclusive.

3.2.3 Normal distribution using R


The R commands for calculating normal density, cumulative probability and
quantile are as follows:

• To find the density f(x) for X ∼ Normal(m, s²), the command is dnorm(x, mean=m, sd=s).

If you input only x, the default values for mean and standard de-
viation are m = 0 and s = 1, and therefore, the command dnorm(x)
finds the density function for a standard normal random variable at
value x.

• To find the cumulative probability P(X ≤ x) for X ∼ Normal(m, s²), the command is pnorm(x, mean=m, sd=s).

• To find the quantile (inverse cumulative probability) x_p such that P(X ≤ x_p) = p for X ∼ Normal(m, s²), the command is qnorm(p, mean=m, sd=s).

Note that for this, you will input the probability value p to find x_p.

For help about the normal distribution functions, type the command help(qnorm).
Example 3.2.1. Using the R commands for the Normal distribution, an-
swer the following questions. To answer these questions, you will need to
show the derivation and write down the R commands that you used for com-
putations.

The Rockwell hardness of a metal is determined by pressing a hardened


point into the surface of the metal and then measuring the depth of pene-
tration of the point. Suppose the Rockwell hardness of a particular alloy is
normally distributed with mean 70 and standard deviation 3 (assume that
Rockwell hardness is measured on a continuous scale).

(a) If a specimen is acceptable only if its hardness is between 67 and


75, what is the probability that a randomly chosen specimen has an
acceptable hardness?

(b) If the acceptable range of hardness was (70 − c, 70 + c), for what value
of c would 95% of all specimens have acceptable hardness?
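A sketch of the corresponding R computations, with X ∼ N(70, 3²):

```r
# (a) P(67 < X < 75)
pnorm(75, mean = 70, sd = 3) - pnorm(67, mean = 70, sd = 3)

# (b) need P(70 - c < X < 70 + c) = 0.95, i.e. c = z_{0.975} * sigma
c_val <- qnorm(0.975) * 3   # about 5.88
```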

Lab Exercise 7: Distribution in R


By Omari C.O
Write the answers to the questions below in a plain text file, consisting of R
commands and comments (lines starting with ‘#’), so that the file can run
in R with the “source” function, and email the file to cyomari@dkut.ac.ke
by Wednesday 26 November at the latest.

In a number of exercises, we will look at how we can use simulation to


learn about statistical issues, and solve concrete problems.

1. We will use as examples 4 distributions which have most or all of their


probability density in the interval between 0 and 10:

• The Normal distribution with mean 5 and standard deviation 2.



• The Uniform distribution on the interval from 0 to 10.


• The chi-square distribution with 3 degrees of freedom.
• The F distribution with 20 and 3 as the first and second degrees
of freedom, respectively.

Start with making 4 plots, each showing the probability density func-
tion on the interval from 0 to 10, for each of these functions. Hints:
Look for the functions dnorm, dunif, dchisq, and df. You can place
all plots in the same figure using the command

> par(mfrow=c(2,2))

2. For each of the 4 distributions, simulate 10000 random values, and


make a histogram of these values. (Use breaks=“Scott” as a parameter
to get more detail in the histogram). How do they compare to the
density functions of the previous exercise?

3. For each of the 4 distributions from exercise 1, compute the probability of observing a value above 10. Then answer this exercise approximately by simulation: simulate 10000 values and find which proportion of these values is above 10. Run the simulations a couple of times to get an impression of the variability of the result.

4. For each of the 4 distributions, do the following: Simulate 10000 values, and compute their mean and variance. Compare with the mean
and variance of the probability distribution, as computed from its pa-
rameters. Use www.wikipedia.org, or your probability and statistics
II notes, or your favorite statistics book, to find the formulas for the
mean and variance in terms of the parameters of the distribution. How
good is the approximation? Repeat the simulation a couple of times,
to get an impression of the variability of the result.

5. We continue to investigate the variability of the mean: Simulate 10000


values of the mean of three values from the chi-square distribution with
3 degrees of freedom. Hint: Construct first a matrix with 10000 rows
and 3 columns, where the entries are 30000 simulated values from the
chi-square distribution with 3 degrees of freedom. Then, apply the
mean function to the rows, using the apply function. Make a his-
togram of the simulated values.

Then, make three similar histograms, but where you take the mean

over 10 values, 30 values, and finally 100 values, respectively. Hint:


First, construct a matrix with 10000 rows and 10 columns, where the
entries are 100000 simulated values from the chi-square distribution
with 3 degrees of freedom. Apply the mean function to the rows, and
make a histogram. Second, construct a matrix with 10000 rows and
30 columns, where the entries are 300000 simulated values from the
chi-square distribution with 3 degrees of freedom. Apply the mean
function to the rows, and make a histogram. Third: construct a matrix with 10000 rows and 100 columns, where the entries are 1000000 simulated values from the chi-square distribution with 3 degrees of
freedom. Apply the mean function to the rows, and make a histogram.

Can you explain the difference between the histograms?

6. Repeat the previous exercise for each of the three other distributions
mentioned in the first exercise. What do you observe? Can you connect
this to previous theoretical statistical knowledge?

7. Simulating data in R: Using the tools provided by R simulate 500 data


values from the following distributions:

(a) Binomial distribution with parameters n = 100 and p = 0.5,


(b) Binomial distribution with parameters n = 100 and p = 0.1,
(c) Poisson distribution with parameters λ = 1,
(d) Poisson distribution with parameters λ = 0.2,
(e) Poisson distribution with parameters λ = 100,
(f) Normal distribution with mean zero and standard deviation one.
(g) Normal distribution with mean 100 and standard deviation 20.
(h) Uniform (−2, 2).
(i) Cauchy with parameter (100)
(j) Gamma distribution with parameters shape = 2 and scale = 1

8. Calculate the probability for each of the following events: (a) A stan-
dard normally distributed variable is larger than 3. (b) A normally
distributed variable with mean 35 and standard deviation 6 is larger
than 42. (c) Getting 10 out of 10 successes in a binomial distribution
with probability 0.8. (d) X < 0.9 when X has the standard uniform
distribution. (e) X > 6.5 in a χ2 distribution with 2 degrees of free-
dom.
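A sketch of the corresponding one-line computations:

```r
1 - pnorm(3)            # (a) P(Z > 3), Z standard normal
1 - pnorm(42, 35, 6)    # (b) P(X > 42), X ~ N(35, 36)
dbinom(10, 10, 0.8)     # (c) P(X = 10), X ~ Binomial(10, 0.8)
punif(0.9)              # (d) P(X < 0.9), X ~ U(0, 1)
1 - pchisq(6.5, 2)      # (e) P(X > 6.5), X ~ chi-squared(2)
```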

9. A rule of thumb is that 5% of the normal distribution lies outside an


interval approximately ±2s about the mean. To what extent is this
true? Where are the limits corresponding to 1%, 0.5%, and 0.1%?
What is the position of the quartiles measured in standard deviation
units?

10. For a disease known to have postoperative complication frequency of


20%, a surgeon suggests a new procedure. He tests it on 10 patients
and there are no complications. What is the probability of operating
on 10 patients successfully with the traditional method?

11. Simulated coin-tossing can be done using rbinom instead of sample.


How exactly would you do that?

12. Write the R code to compute:

(a) Find the 95th percentile of the F distribution with (5, 2) degrees
of freedom.
(b) Find the 2.5th and 97.5th percentiles of the Student t distribution
with 5 degrees of freedom.
(c) Find the 95th percentile of the Chi-Squared distribution with 7
degrees of freedom.
(d) Assume that the test scores of a college entrance exam fits a
normal distribution. Furthermore, the mean test score is 72, and
the standard deviation is 15.2. What is the percentage of students
scoring 84 or more in the exam?
(e) Suppose the mean checkout time of a supermarket cashier is three
minutes. Find the probability of a customer checkout being com-
pleted by the cashier in less than two minutes. Ans: pexp(2,
rate=1/3)
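One possible set of answers as R one-liners (a sketch; part (e) repeats the answer already given in the question):

```r
qf(0.95, 5, 2)           # (a) 95th percentile of F(5, 2)
qt(c(0.025, 0.975), 5)   # (b) 2.5th and 97.5th percentiles of t(5)
qchisq(0.95, 7)          # (c) 95th percentile of chi-squared(7)
1 - pnorm(84, 72, 15.2)  # (d) proportion scoring 84 or more
pexp(2, rate = 1/3)      # (e) P(checkout < 2 minutes), mean 3 minutes
```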
Chapter 4

Monte Carlo Methods and Importance Sampling

History and definition: The term “Monte Carlo” was apparently first
used by Ulam and von Neumann as a Los Alamos code word for the stochas-
tic simulations they applied to building better atomic bombs. Their meth-
ods, involving the laws of chance, were aptly named after the international
gaming destination; the moniker stuck and soon after the War a wide range
of sticky problems yielded to the new techniques. Despite the widespread
use of the methods, and numerous descriptions of them in articles and mono-
graphs, it is virtually impossible to find a succinct definition of “Monte Carlo
method” in the literature. Perhaps this is owing to the intuitive nature of
the topic which spawns many definitions by way of specific examples. Some
authors prefer to use the term “stochastic simulation” for almost everything,
reserving “Monte Carlo” only for Monte Carlo Integration and Monte Carlo
Tests (cf. Ripley 1987). Others seem less concerned about blurring the dis-
tinction between simulation studies and Monte Carlo methods.
Be that as it may, a suitable definition can be good to have, if for nothing
other than to avoid the awkwardness of trying to define the Monte Carlo
method by appealing to a whole bevy of examples of it. Since I am (so
Elizabeth claims!) unduly influenced by my advisor’s ways of thinking, I
like to define Monte Carlo in the spirit of definitions she has used before. In
particular, I use:
Definition: Monte Carlo is the art of approximating an ex-
pectation by the sample mean of a function of simulated random
variables
We will find that this definition is broad enough to cover everything that


has been called Monte Carlo, and yet makes clear its essence in very familiar
terms: Monte Carlo is about invoking laws of large numbers to approximate
expectations.

While most Monte Carlo simulations are done by computer today, there
were many applications of Monte Carlo methods using coin-flipping, card-
drawing, or needle-tossing (rather than computer generated pseudo-random
numbers) as early as the turn of the century—long before the name Monte
Carlo arose.

4.0.1 Introduction
Two major classes of numerical problems that arise in statistical inference
are optimization problems and integration problems. Indeed, numerous ex-
amples (see Rubinstein, 1981, Gentle, 2002, or Robert, 2001) show that it is
not always possible to analytically compute the estimators associated with
a given paradigm (maximum likelihood, Bayes, method of moments, etc.).

Thus, whatever the type of statistical inference, we are often led to con-
sider numerical solutions. The previous chapter introduced a number of
methods for the computer generation of random variables with any given
distribution and hence provides a basis for the construction of solutions to
our statistical problems. A general solution is indeed to use simulation, of
either the true or some substitute distributions, to calculate the quantities of
interest. In the setup of decision theory, whether it is classical or Bayesian,
this solution is natural since risks and Bayes estimators involve integrals
with respect to probability distributions.

Note that the possibility of producing an almost infinite number of ran-


dom variables distributed according to a given distribution gives us access
to the use of frequentist and asymptotic results much more easily than in the
usual inferential settings, where the sample size is most often fixed. One can
therefore apply probabilistic results such as the Law of Large Numbers or
the Central Limit Theorem, since they allow assessment of the convergence
of simulation methods (which is equivalent to the deterministic bounds used
by numerical approaches).

A main difficulty with numerical integration methods such as integrate is


that they often fail to spot the region of importance for the function to be
integrated. In contrast, simulation methods naturally target this region by

exploiting the information provided by the probability density associated


with the integrals.

Lastly, numerical integration tools cannot easily face the highly (or even
moderately) multidimensional integrals that are the rule in statistical prob-
lems. Devising specific integration tools for those problems would be too
costly, especially because we can take advantage of the probabilistic nature
of those integrals.

4.0.2 Classical Monte Carlo Integration


The generic problem is to evaluate the integral

E_f [h(X)] = ∫_χ h(x) f(x) dx                  (4.1)

where χ denotes the set where the random variable X takes its values. The principle of the Monte Carlo method for approximating (4.1) is to generate a sample (X1, . . . , Xn) from the density f and propose as an approximation the empirical average

h̄_n = (1/n) ∑_{j=1}^{n} h(X_j).
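A minimal sketch of this estimator in R, with h(x) = x² and f the standard normal density chosen as an illustration (so the true value is E[X²] = 1):

```r
# Classical Monte Carlo estimate of E_f[h(X)]
set.seed(123)
h <- function(x) x^2
x <- rnorm(100000)   # sample (X_1, ..., X_n) from f
mean(h(x))           # empirical average, close to E[X^2] = 1
```

Increasing n shrinks the Monte Carlo error at rate 1/√n, by the central limit theorem.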

4.0.3 Simulation Procedures


When, in a given situation, the elements of chance prevail, the process can be treated using probability techniques. In such situations of random behaviour, the Monte Carlo simulation method explained below can be used.

Monte Carlo Simulation


The method that is used for devising a model of a stochastic system is
known as Monte Carlo Simulation Method. This technique is useful
for predicting the nature of behaviour of a system. Monte Carlo’s simula-
tion procedure is essentially based on the element of chance. It is therefore
called ‘Probabilistic simulation’. Accordingly, the data involved in a prob-
lem is simulated by using random number generators.

The basic characteristics of Monte Carlo’s simulation technique are:

(i). Supposition of a model in the form of a probability distribution of the


concerned variable.

(ii). It is a mechanical process based on random number generators.

The procedure of simulation on the basis of the Monte Carlo technique can be explained with the help of the following examples:
Example 4.0.1. A milk dairy’s records of sales of one litre milk packets
during 100 days are as follows:
Demand: 8 9 10 11 12 13 14 15 16 17
Number of Days: 5 9 10 15 13 8 11 14 8 7
Using the following random numbers simulate the demand for the first 15
days.

Random: 23 64 18 96 71 46 54 8 11 81 75 39 28 43 52
Solution. Table 1

Demand: 8 9 10 11 12 13 14 15 16 17
Probability: 0.05 0.09 0.10 0.15 0.13 0.08 0.11 0.14 0.08 0.07
Table 2

Demand   Probability   Cumulative    Random Number   Random
                       Probability   Interval        Numbers
8        0.05          0.05          0 − 4           −
9        0.09          0.14          5 − 13          8, 11
10       0.10          0.24          14 − 23         23, 18
11       0.15          0.39          24 − 38         28
12       0.13          0.52          39 − 51         46, 39, 43
13       0.08          0.60          52 − 59         54, 52
14       0.11          0.71          60 − 70         64
15       0.14          0.85          71 − 84         71, 81, 75
16       0.08          0.93          85 − 92         −
17       0.07          1.00          93 − 99         96
The random numbers have to be placed in the appropriate intervals, and on that basis the corresponding demand is determined. The range of the entire system of two-digit random numbers is from 0 to 99. The length of each interval is fixed on the basis of the probability: 0.05 denotes a length of 5 units, hence the first interval is 0 − 4 (both limits inclusive). In that manner, all the other intervals are determined.

The following is the schedule of simulated demand for the first 15 days,
in the order of the given random numbers;

Random numbers: 23 64 18 96 71 46 54 8 11 81 75 39 28 43 52
Day: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Demand: 10 14 10 17 15 12 13 9 9 15 15 12 11 12 13
From table 2, read the demand against each random number.
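The table-lookup step can be sketched in R (variable names ours); findInterval places each two-digit random number in its cumulative-probability interval:

```r
demand <- 8:17
prob   <- c(0.05, 0.09, 0.10, 0.15, 0.13, 0.08, 0.11, 0.14, 0.08, 0.07)
rn     <- c(23, 64, 18, 96, 71, 46, 54, 8, 11, 81, 75, 39, 28, 43, 52)

# Lower limits of the intervals: 0, 5, 14, 24, 39, 52, 60, 71, 85, 93
# (round() guards against floating-point error in the cumulative sums)
lower <- round(c(0, head(cumsum(prob) * 100, -1)))
sim   <- demand[findInterval(rn, lower)]
sim   # 10 14 10 17 15 12 13 9 9 15 15 12 11 12 13
```

The same approach, with the Table 2 probabilities replaced, reproduces the schedules in both examples.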
Example 4.0.2. Mill bakery maintains sufficient stock of its ‘Ever sweet’
cake and the daily demand is as under:

Daily Demand 0 10 20 30 40 50 60 70 80
Probability 0.02 0.16 0.23 0.15 0.13 0.12 0.10 0.06 0.03
Using the sequence of random numbers simulate the demand for the next
12 days. If the proprietor of the bakery decides to make 40 cakes everyday,
then calculate the stock position at the end of the 12th day. Also calculate
the daily average demand for the cakes.
Random: 36 29 84 57 19 79 46 67 08 81 87 94
Solution. The table below shows the required columns to simulate the daily
average demand for the cakes.
Demand Probability Cumulative Random Number Random
Probability Interval Number
0 0.02 0.02 00 − 01 −
10 0.16 0.18 2 − 17 8
20 0.23 0.41 18 − 40 36, 29, 19
30 0.15 0.56 41 − 55 46
40 0.13 0.69 56 − 68 57, 67
50 0.12 0.81 69 − 80 79
60 0.10 0.91 81 − 90 84, 81, 87
70 0.06 0.97 91 − 96 94
80 0.03 1.00 97 − 99 −
The following is the schedule of simulated demand for the 12 days.

Random Numbers 36 29 84 57 19 79 46 67 08 81 87 94
Day 1 2 3 4 5 6 7 8 9 10 11 12
Demand 20 20 60 40 20 50 30 40 10 60 60 70
Stock Position 20 40 20 20 40 30 40 40 70 50 30 0

The total demand is 480

Average Daily Demand = 480/12 = 40 cakes per day.
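The table lookup above can be sketched in R. This is a minimal illustration (the variable names are mine, not from the text) that maps each two-digit random number to its demand level via the cumulative-probability intervals and tracks the stock position:

```r
# Sketch: reproduce the simulated-demand table in R.
# Interval boundaries come from the cumulative probabilities scaled to 0-99.
demand_levels <- seq(0, 80, by = 10)
probs <- c(0.02, 0.16, 0.23, 0.15, 0.13, 0.12, 0.10, 0.06, 0.03)
breaks <- c(0, cumsum(probs) * 100)   # 0, 2, 18, 41, 56, 69, 81, 91, 97, 100
rn <- c(36, 29, 84, 57, 19, 79, 46, 67, 8, 81, 87, 94)
demand <- demand_levels[findInterval(rn, breaks)]
stock <- cumsum(40 - demand)          # 40 cakes produced per day
mean(demand)                          # average daily demand: 40
```

Each random number falls in exactly one interval, so findInterval() reproduces the "read the demand against each random number" step; the final element of stock is the stock position at the end of the 12th day (0 here).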



Assignment I

1. In a management institution, the first lecture starts at 10.00am. The


following is the probability distribution regarding the number of stu-
dents who are late comers for the first lecture each day.

Number of students coming late: 3 6 9 12 15 18


Probability: 0.4 0.3 0.2 0.05 0.03 0.02

Simulate the system for 15 days and find the average number of late
comers per day, using the following random numbers.

94, 22, 12, 64, 94, 60, 85, 0, 1, 91, 44, 43, 47, 39, 38, 77
Chapter 5

Mathematical Functions

5.1 Introduction
For the kinds of functions you will meet in statistical computing there are
only three mathematical rules that you need to learn: these are concerned
with powers, exponents, and logarithms. In the expression x^b the explana-
tory variable is raised to the power b. In e^x the explanatory variable appears
as a power, in this special case of e = 2.71828, of which x is the exponent.
The inverse of e^x is the logarithm of x, denoted by log(x); note that all
our logs are to the base e, so for our purposes writing log(x) is the same as
writing ln(x).

It is also useful to remember a handful of mathematical facts that are useful
for working out behaviour at the limits. We would like to know what
happens to y when x gets very large (e.g. x → ∞) and what happens to y
when x goes to 0 (i.e. what the intercept is, if there is one). These are the
most important rules:

• Anything to the power zero is 1: x^0 = 1

• One raised to any power is still 1: 1^x = 1

• Infinity plus 1 is infinity: ∞ + 1 = ∞

• One over infinity (the reciprocal of infinity, ∞^−1) is zero: 1/∞ = 0

• A number bigger than 1 raised to the power infinity is infinity: 1.2^∞ = ∞.

• A fraction (e.g. 0.99) raised to the power infinity is zero: 0.99^∞ = 0.


• Negative powers are reciprocals:

x^(−b) = 1/x^b

• Fractional powers are roots:

x^(1/3) = ∛x

• The base of natural logarithms, e, is 2.71828, so e^∞ = ∞.

• Last, but perhaps most usefully:

e^(−∞) = 1/e^∞ = 1/∞ = 0

There are built-in functions in R for logarithmic, probability and
trigonometric functions.
Recall: STA 2205 Introduction to computer interactive statistics.

5.1.1 Logarithmic function


The logarithmic function is given by

y = a ln(bx )

Here the logarithm is to base e. The exponential function, in which the
response y is the antilogarithm of the continuous explanatory variable x, is
given by

y = ae^(bx)
Both these functions are smooth functions, and to draw smooth functions
in R you need to generate a series of 100 or more regularly spaced x values
between min(x) and max(x):

x<-seq(0,10,0.1)

In R the exponential function is exp and the natural log function ln is
log. Let a = b = 1. To plot the exponential and logarithmic functions with
these values together, write

y<-exp(x)
plot(y~x, type="l", main= "Exponential")
y<-log(x)
plot(y~x, type="l", main= "Logarithmic")

Note that the plot function can be used in an alternative way, specifying
the cartesian coordinates of the line: plot(x,y) rather than the formula
plot(y~x).

These functions are most useful in modelling processes of exponential growth
and decay.

5.1.2 Trigonometric functions


Here are the cosine (base/hypotenuse), sine (perpendicular/hypotenuse)
and tangent (perpendicular/base) functions of x (measured in radians) over
the range 0 to 2π.

Recall that the full circle is 2π radians, so

1 radian = 360/(2π) = 57.29578 degrees

x<-seq(0,2*pi,2*pi/100)

y1<-cos(x)
y2<-sin(x)
plot(y1~x, type="l", main= "cosine curve")
plot(y2~x, type="l", main= "sine curve")
y3<-tan(x)
plot(y3~x, type="l", ylim= c(-3,3), main= "tangent curve")

The tangent of x has discontinuities, shooting off to positive infinity at
x = π/2 and again at x = 3π/2. Restricting the range of values plotted on
the y axis (here from −3 to +3) therefore gives a better picture of the shape
of the tan function. Note that R joins the plus infinity and minus infinity
‘points’ with a straight line at x = π/2 and at x = 3π/2 within the frame
of the graph defined by ylim.

5.1.3 Power laws


There is an important family of two-parameter mathematical functions of
the form

y = ax^b

known as power laws.

Depending on the value of the power, b, the relationship can take one of
five forms. In the trivial case of b = 0 the function is y = a (a horizontal
straight line).
The four more interesting shapes are as follows:
x<-seq(0,1,0.01)
y<-x^0.5
plot(x, y, type="l", main="0<b<1")
y<-x
plot(x, y, type="l", main="b=1")
y<-x^2
plot(x, y, type="l", main="b>1")
y<-1/x
plot(x, y, type="l", main="b<0")

These functions are useful in a wide range of disciplines. The parameters a
and b are easy to estimate from data because the function is linearized by a
log-log transformation:

log(y) = log(ax^b) = log(a) + b log(x)

So that on log-log axes the intercept is log(a) and the slope is b. These
are often called allometric relationships because when b ≠ 1 the proportion
of x that becomes y varies with x.
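As a sketch of this estimation idea (the data here are simulated, not from the text), one can generate y = ax^b with noise and recover a and b with lm() on log-log axes:

```r
# Simulate a power law with a = 2, b = 0.75 and multiplicative noise,
# then estimate the parameters by least squares on log-log axes.
set.seed(1)
x <- seq(1, 10, 0.1)
y <- 2 * x^0.75 * exp(rnorm(length(x), sd = 0.05))
fit <- lm(log(y) ~ log(x))
a_hat <- exp(coef(fit)[[1]])   # intercept on the log scale is log(a)
b_hat <- coef(fit)[[2]]        # slope is b
```

With this little noise, the estimates land very close to the true a = 2 and b = 0.75.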

5.1.4 Polynomial functions


Polynomial functions are functions in which x appears several times, each
time raised to a different power. They are useful for describing curves with
humps, inflections or local maxima like the following:

x<-seq(0,10,0.1)
y1<-2+5*x-0.2*x^2

Making the negative coefficient of the x^2 term larger produces a curve with
a more pronounced hump, as follows:

y2<-2+5*x-0.4*x^2

Cubic polynomials can show points of inflection, as in the following:

y3<-2+4*x-0.6*x^2+0.04*x^3

Finally, polynomials containing powers of x up to x^4 are capable of producing
curves with local maxima, as in the following:

y4<-2+4*x+2*x^2-0.6*x^3+0.04*x^4

Inverse polynomials are an important class of functions which are suitable
for setting up generalized linear models with gamma errors and inverse link
functions:

1/y = a + bx + cx² + dx³ + · · · + zx^n.
Various shapes of function are produced, depending on the order of the
polynomial (the maximum power) and the signs of the parameters.

Gamma function
The gamma function Γ(t) is an extension of the factorial function t! to
positive real numbers:

Γ(t) = ∫₀^∞ x^(t−1) e^(−x) dx

It looks like this:

t<-seq(0.2, 4, 0.01)
plot(t, gamma(t), type="l")
abline(h=1, lty=2)

Note that Γ(t) is equal to 1 at both t = 1 and t = 2. For integer values
of t, Γ(t + 1) = t!.
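A quick numerical check of that identity, as a sketch using R's built-in gamma() and factorial():

```r
# For integer t, Gamma(t + 1) equals t!.
t <- 0:6
all.equal(gamma(t + 1), factorial(t))   # TRUE
gamma(5)                                # 24, i.e. 4!
```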

5.2 Probability Functions


There are many specific probability distributions in R (normal, Poisson,
binomial, etc.). But first we look at the base mathematical functions that
deal with elementary probability.

Factorial function
The factorial function gives the number of permutations of n items.

n! = n(n − 1)(n − 2) · · · × 3 × 2 × 1

The R function is factorial and we can plot it for values of x from 0 to
6 using the step option type="s" in plot, with a logarithmic scale on the
y axis, log="y".

x<-0:6
plot(x, factorial(x), type= "s", main= "factorial x", log= "y")

The other important base function for probability calculations in R is the
choose function, which calculates binomial coefficients.

These show the number of ways there are of selecting x items out of n items
when the item can be one of just two types (e.g. either male or female, black
or white, solvent or insolvent).
 
(n choose x) = n!/(x!(n − x)!)

So with n = 8 and x = 3, we get:

(8 choose 3) = 8!/(3!(8 − 3)!) = (8 × 7 × 6)/(3 × 2) = 56

and in R:

choose(8,3)
[1] 56

The following is a graph of the number of ways of selecting from 0 to 8
males out of 8 individuals.

plot(0:8, choose(8, 0:8), type="s", main="binomial coefficients")

5.2.1 Continuous probability distributions


R has a wide range of built-in probability distributions, for each of which
four functions are available: the probability density function (which has a d
prefix); the cumulative probability (p); the quantiles of the distribution (q);
and random numbers generated from the distribution (r). Each letter can
be prefixed to the R function names in the table below (e.g. dbeta)

R function   Distribution                 Parameters

beta         beta                         shape1, shape2
binom        binomial                     sample size, probability
cauchy       Cauchy                       location, scale
exp          exponential                  rate (optional)
chisq        chi-squared                  degrees of freedom
f            Fisher's F                   df1, df2
gamma        gamma                        shape, rate (or scale)
geom         geometric                    probability
hyper        hypergeometric               m, n, k
lnorm        lognormal                    mean, standard deviation
logis        logistic                     location, scale
nbinom       negative binomial            size, probability
norm         normal                       mean, standard deviation
pois         Poisson                      mean
signrank     Wilcoxon signed rank         sample size n
t            Student's t                  degrees of freedom
unif         uniform                      minimum, maximum (opt)
weibull      Weibull                      shape, scale
wilcox       Wilcoxon rank sum            m, n
The cumulative probability function is a straightforward notion: it is an
s-shaped curve showing, for any value of x, the probability of obtaining a
sample value that is less than or equal to x. Here is what it looks like for
normal distribution:

curve(pnorm(x),-3,3)
arrows(-1,0,-1,pnorm(-1),col="red")
arrows(-1,pnorm(-1),-3,pnorm(-1),col="green")

The value of x (−1) leads up to the cumulative probability (red arrow)


and the probability associated with obtaining a value of this size (−1) or
smaller is on the y-axis (green arrow). The value on the y axis is 0.1586653:

pnorm(-1)
[1] 0.1586653

The probability density is the slope of the curve (its ‘derivative’):

d/dx F(x) = f(x)

You can see at once that the slope is never negative. The slope starts out
very shallow up to about x = −2, increases up to a peak (at x = 0 in this
example) then gets shallower, and becomes very small indeed above about
x = 2.

Here is what the density function of the normal (dnorm) looks like:

curve(dnorm(x),-3,3)

For a discrete random variable, like the Poisson or the binomial, the
probability density function is straightforward: it is simply a histogram with the
y axis scaled as probabilities rather than counts, and the discrete values of
x (0, 1, 2,· · · ) on the horizontal axis. But for a continuous random variable,
the definition of the probability density function is more subtle: it does not
have probabilities on the y axis, but rather the derivative (the slope) of the
cumulative probability function at a given value of X.
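This slope relationship can be checked numerically; the sketch below (my own illustration) compares a finite-difference derivative of pnorm() with dnorm():

```r
# The density is the slope of the cdf: a central difference applied to
# pnorm() should agree closely with dnorm().
x <- seq(-3, 3, 0.01)
h <- 1e-6
slope <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)
max(abs(slope - dnorm(x)))   # very small
```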

Normal distribution
When the distribution has mean 0 and standard deviation 1 (the standard
normal distribution) the equation becomes:

f(z) = (1/√(2π)) e^(−z²/2)
Suppose we have measured the heights of 100 people. The mean height was
170cm and the standard deviation was 8cm. We can answer three sorts
of questions about the data that follows: What is the probability that a
randomly selected individual will be:

(i) shorter than 160cm?

(ii) taller than 185cm?

(iii) between 165cm and 180cm?



First, we standardize the values using the statistic:

z = (y − ȳ)/s

For less than 160cm:

z = (160 − 170)/8 = −1.25
Now we need to find the probability of a value of the standard normal taking
a value of −1.25 or smaller. This is the area under the left hand tail (the
integral) of the density function. The function we need for this is pnorm:

pnorm(-1.25)
[1] 0.1056498

For taller than 185cm:

z = (185 − 170)/8 = 1.875
pnorm(1.875)
[1] 0.9696036

This is the probability that someone will be less than or equal to 185cm
(that is what the function pnorm has been written to provide). All we need
to do is to work out the complement of this:

1-pnorm(1.875)
[1] 0.03039636

Finally, for the probability of being between 165cm and 180cm:

z1 = (165 − 170)/8 = −0.625  and  z2 = (180 − 170)/8 = 1.25
pnorm(1.25)-pnorm(-0.625)
[1] 0.6283647

5.2.2 The central limit theorem

If you take repeated samples from a population with finite variance and
calculate their averages, then the averages will be normally distributed.
This is called the central limit theorem.

We can take five uniformly distributed random numbers between 0 and
10 and work out the average. The average will be low when we get, say,
2, 3, 1, 2, 1 and big when we get 9, 8, 9, 6, 8. Typically, of course, the average
will be close to 5. Let's do this 10,000 times and look at the distribution
of the 10,000 means. The data are rectangularly (uniformly) distributed on
the interval 0 to 10, so the distribution of the raw data should be flat-topped:

hist(runif(10000)*10, main="")

What about the distribution of sample means, based on taking just 5
uniformly distributed random numbers?

means<-numeric(10000)
for(i in 1:10000){
means[i]<-mean(runif(5)*10)
}
hist(means, ylim= c(0,1600))

How close is this to a normal distribution? One test is to draw a normal
distribution with the same parameters on top of the histogram. But
what are these parameters? The normal is a two-parameter distribution that
is characterized by its mean and its standard deviation. We can estimate
these two parameters from our sample of 10,000 means (your values will be
slightly different because of the randomization):

mean(means)
sd(means)

Now we use these two parameters in the probability density function of
the normal distribution (dnorm), to create a normal curve with our particular
mean and standard deviation. To draw the smooth line of the normal
curve, we need to generate a series of values for the x-axis; inspection of the
histograms suggests that sensible limits would be from 0 to 10 (the limits
we chose for our uniformly distributed random numbers). A good rule of
thumb is that for a smooth curve you need at least 100 values, so let’s try
this:

x<-seq(0,10,0.1)

The probability density has an integral of 1.0 (that's the area beneath the
normal curve), but we had 10,000 samples. To scale the normal probability
density function to our particular case, we therefore multiply it by a factor
that depends on the height of the highest bar (about 1500 in this case).
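Putting the pieces together, here is a sketch of the whole comparison (the seed and rescaling factor are my choices, not from the text):

```r
# Histogram of 10,000 means of 5 uniform(0,10) numbers, with a normal
# curve rescaled from density units to count units.
set.seed(42)
means <- replicate(10000, mean(runif(5) * 10))
h <- hist(means, ylim = c(0, 1600), main = "")
xv <- seq(0, 10, 0.1)
yv <- dnorm(xv, mean(means), sd(means))
lines(xv, yv * max(h$counts) / max(yv))   # scale the curve to the highest bar
```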

Generating random numbers with exact mean and standard deviation
If we use a random number generator like rnorm then, naturally, the sample
you generate will not have exactly the mean and standard deviation that
you specify, and two runs will produce vectors with different means and
standard deviations. Suppose we want 100 normal random numbers with a
mean of exactly 24 and a standard deviation of precisely 4:

yvals<-rnorm(100,24,4)
mean(yvals)
[1] 24.2958
sd(yvals)
[1] 3.5725

If you want to generate random numbers with an exact mean and stan-
dard deviation, then do the following:

ydevs<-rnorm(100,0,1)

Now compensate for the fact that the mean is not exactly 0 and the standard
deviation is not exactly 1 by expressing all the values as departures
from the sample mean scaled in units of the sample's standard deviation:

ydevs<-(ydevs-mean(ydevs))/sd(ydevs)

Check that the mean is zero and the standard deviation is exactly 1:

mean(ydevs)
sd(ydevs)

The mean is as close to zero as makes no difference, and the standard devi-
ation is one. Now multiply this vector by your desired standard deviation
and add your desired mean value to get a sample with exactly the mean
and standard deviation required:

yvals<-24+ydevs*4
mean(yvals)

sd(yvals)
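The recipe above can be wrapped in a small helper; the function name is mine, not a built-in:

```r
# Scale-and-shift a standardized sample so the mean and sd are exact.
rnorm_exact <- function(n, mean, sd) {
  z <- rnorm(n)
  mean + sd * (z - mean(z)) / sd(z)
}
y <- rnorm_exact(100, 24, 4)
c(mean(y), sd(y))   # 24 and 4, up to floating-point rounding
```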

Comparing data with a normal distribution

We are concerned with the task of comparing a histogram of real data with
a smooth normal distribution with the same mean and standard deviation,
in order to look for evidence of non-normality (e.g. skew or kurtosis).
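The text does not fix a method here; one common check, sketched below with simulated data of my own, is a normal quantile-quantile plot:

```r
# A deliberately skewed sample: its ordered values bend away from the
# straight reference line in a normal Q-Q plot.
set.seed(1)
y <- rexp(200)
qqnorm(y)
qqline(y)
```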
Chapter 6

Sampling Distributions

A number of important probability distributions can be derived by consider-
ing various functions of normal random variables. These distributions play
important roles in statistical inference. They are rarely used to describe
data; rather, they arise when analyzing data sampled from a normal dis-
tribution. For this reason, they are sometimes called sampling distributions.

6.1 Chi–Square Distributions


Suppose that Z1, · · · , Zn ∼ Normal(0, 1) and consider the continuous
random variable

Y = Z1² + · · · + Zn².

Because each Zi² ≥ 0, the set of possible values of Y is Y(S) = [0, ∞). We
are interested in the distribution of Y.

The distribution of Y belongs to a family of probability distributions called
the chi-squared family. This family is indexed by a single real-valued param-
eter, ν ∈ [1, ∞), called the degrees of freedom parameter. We will denote a
chi-squared distribution with ν degrees of freedom by χ²(ν).

Recall that if X has the standard normal distribution, then X² has the spe-
cial gamma distribution, which is referred to as the chi–square distribution,
and this accounts for the important role that the chi–square distribution
plays in problems of sampling from normal populations. A chi–square dis-
tribution is often denoted by “χ² distribution”.

A random variable X has a chi–square distribution, and it is referred to
as a chi–square random variable, if and only if its probability density is
given by

f(x) = (1/(2^(v/2) Γ(v/2))) x^((v−2)/2) e^(−x/2)  for x > 0,

and f(x) = 0 elsewhere.

The parameter v is referred to as the number of degrees of freedom, or simply
the degrees of freedom. The chi-square distribution plays a very important
role in sampling theory.

The mean and the variance of the chi–square distribution with v degrees
of freedom are v and 2v, and its moment generating function is given by

M_X(t) = (1 − 2t)^(−v/2)  for t < 1/2.
Theorem 6.1.1. If X has the standard normal distribution, then X 2 has
the Chi-squared distribution with v = 1 degree of freedom.
More generally, let us show that

Theorem 6.1.2. If X1, X2, · · · , Xn are independent random variables having
standard normal distributions, then

Y = ∑_{i=1}^n Xi²

has the chi–squared distribution with v = n degrees of freedom.


Proof. Using the moment–generating function given above with v = 1, we
find that

M_{Xi²}(t) = (1 − 2t)^(−1/2)

and it follows that

M_Y(t) = ∏_{i=1}^n (1 − 2t)^(−1/2) = (1 − 2t)^(−n/2).

This moment generating function is readily identified as that of the chi–
square distribution with v = n degrees of freedom.
Note: The chi–square distribution is a special case of the gamma distri-
bution. In the gamma distribution, substitute α = v/2 and β = 2 to
obtain a chi–square distribution. Hence, the mean and the variance of the
chi-square distribution are given by µ = v and σ² = 2v.
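A numerical check of this note, comparing the two densities on a grid:

```r
# Chi-square with v df is gamma with shape = v/2 and scale = 2.
x <- seq(0.1, 20, 0.1)
v <- 6
all.equal(dchisq(x, df = v), dgamma(x, shape = v / 2, scale = 2))   # TRUE
```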

Theorem 6.1.3. If X1, X2, · · · , Xn are independent random variables having
chi–square distributions with v1, v2, · · · , vn degrees of freedom, then

Y = ∑_{i=1}^n Xi

has the chi-square distribution with v1 + v2 + · · · + vn degrees of freedom.


Theorem 6.1.4. If X1 and X2 are independent random variables, X1 has a
chi–square distribution with v1 degrees of freedom, and X1 + X2 has a chi–
square distribution with v > v1 degrees of freedom, then X2 has a chi–square
distribution with v − v1 degrees of freedom.
Theorem 6.1.5. If X̄ and S² are the mean and the variance of a random
sample of size n from a normal population with the mean µ and the standard
deviation σ, then

1. X̄ and S² are independent.

2. the random variable (n − 1)S²/σ² has a chi–square distribution with n − 1
degrees of freedom.

Proof. Since a detailed proof of part 1 would go beyond the scope of this
course, we shall prove it only for n = 2, and we shall assume the independence
of X̄ and S² in our proof of part 2.

To prove part 2, we begin with the identity

∑_{i=1}^n (Xi − µ)² = ∑_{i=1}^n (Xi − X̄)² + n(X̄ − µ)².

Now, if we divide each term by σ² and substitute (n − 1)S² for
∑_{i=1}^n (Xi − X̄)², it follows that

∑_{i=1}^n ((Xi − µ)/σ)² = (n − 1)S²/σ² + ((X̄ − µ)/(σ/√n))².

With regard to the three terms of this identity, we know that the one on the
left-hand side of the equation is a random variable having a chi–square distri-
bution with n degrees of freedom. Also, the second term on the right–hand
side of the equation is a random variable having a chi–square distribution

with 1 degree of freedom. Now, since X̄ and S² are assumed to be indepen-
dent, it follows that the terms on the right–hand side of the equation are
independent, and we conclude that

(n − 1)S²/σ²

is a random variable having a chi–square distribution with n − 1 degrees of
freedom.
Example 6.1.1. Plot the Chi-square density function with degrees of freedom,
df = 6.
Solution. We will plot the function from x = 0 to x = 30.

> x=seq(0,30,0.1)
> plot(x,dchisq(x,6),main="Chi-distribution with df=6",type="l",col="blue")
> plot(x,dchisq(x,2),main="Chi-distribution with df=2",type="l",col="red")
The function dchisq(x,6) returns the Chi-square density function with
df = 6.
Question 6.1.1. Find the critical value, χ²_α(k), of the Chi-square distribu-
tion with 6 degrees of freedom and significance level α = 0.05.
Solution. We obtain:

> qchisq(0.05,6,lower.tail=F)
12.59159

Thus χ²_{0.05}(6) = 12.59159.


The function qchisq(0.05,6,lower.tail=F) returns the critical value q
for which P(X ≥ q) = 0.05, where X follows a Chi-square distribution with
df = 6. That is, it returns the value q for which the area under the density
curve to the right of q is 0.05.
Question 6.1.2. Suppose that X1, X2, · · · , Xn form a random sample from
a normal distribution with mean µ and variance σ². Find the distribution
of

n(X̄ − µ)²/σ²
Question 6.1.3. If a random variable X has a χ² distribution with n degrees
of freedom, then the distribution of X^(1/2) is called a chi (χ) distribution
with n degrees of freedom. Determine the mean of the distribution.

Assume that we have a random sample X1, X2, · · · , Xn from a normal dis-
tribution with unknown mean µ and unknown variance σ². Suppose we
wish to test

H0 : σ² = σ0²  for some fixed value σ0²

vs

H1 : σ² ≠ σ0².

Then, under H0,

X² = (n − 1)S²/σ0²

has a χ² distribution with n − 1 degrees of freedom.

Suppose X1, X2, · · · , Xn is a random sample from a normal distribution with
E(Xi) = µ and Var(Xi) = σ². To test the hypothesis

(i) H0 : σ² = σ0² against H1 : σ² > σ0² (upper-tail)

or

(ii) H0 : σ² = σ0² against H1 : σ² < σ0² (lower-tail)

or

(iii) H0 : σ² = σ0² against H1 : σ² ≠ σ0² (two-tailed test)

calculate the test statistic

X² = (n − 1)S²/σ0²

For (i), reject H0 if the calculated X² is greater than χ²_{α, n−1}; for (ii),
reject H0 if the calculated X² is less than χ²_{1−α, n−1} (lower tail); and for
(iii), reject H0 if the calculated X² is greater than χ²_{α/2, n−1} or less than
χ²_{1−α/2, n−1}.

Example 6.1.2. A machine engine part produced by a company is claimed
to have diameter variance no larger than 0.0002 (diameter measured in
inches). A random sample of 10 parts gave a sample variance of 0.0003.
Test, at the 5% level, H0 : σ² = 0.0002 against H1 : σ² > 0.0002.

Solution. Assume the measured diameters are normally distributed. The
test statistic is

X² = (n − 1)S²/σ0² = 9(0.0003)/0.0002 = 13.5

The tabulated value is χ²_{0.05, 9} = 16.919. The computed value 13.5 is
smaller than the tabulated value 16.919; hence, we do not reject H0.

Conclusion: There is not enough evidence to indicate that σ² exceeds 0.0002.
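The same arithmetic can be sketched in R, with the numbers taken from the example:

```r
# Chi-square test of H0: sigma^2 = 0.0002 against H1: sigma^2 > 0.0002.
n <- 10; s2 <- 0.0003; sigma0_sq <- 0.0002
X2 <- (n - 1) * s2 / sigma0_sq                        # 13.5
crit <- qchisq(0.05, df = n - 1, lower.tail = FALSE)  # 16.919
X2 > crit                                             # FALSE: do not reject H0
```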


Example 6.1.3. If Z1, Z2, · · · , Z6 denotes a random sample from the stan-
dard normal distribution, find a number b such that

P(∑_{i=1}^6 Zi² ≤ b) = 0.95

Solution. ∑_{i=1}^6 Zi² has a χ² distribution with 6 degrees of freedom.

Checking in the tables, in the row headed 6 d.f. and the column headed
with an upper-tail area of 0.05, we see the number 12.5916. Thus,

P(∑_{i=1}^6 Zi² > 12.5916) = 0.05  or  P(∑_{i=1}^6 Zi² ≤ 12.5916) = 0.95

and b = 12.5916.
Example 6.1.4. Suppose that the thickness of a part used in a semicon-
ductor is its critical dimension and that the process of manufacturing these
parts is considered to be under control if the true variation among the thick-
nesses of the parts is given by a standard deviation not greater than σ = 0.60
thousandth of an inch. To keep a check on the process, random samples of
size n = 20 are taken periodically, and the process is regarded to be “out of
control” if the probability that s² will take on a value greater than or equal
to the observed sample value is 0.01 or less (even though σ = 0.60). What
can one conclude about the process if the standard deviation of such a
periodic random sample is s = 0.84 thousandth of an inch?
Solution. The process will be declared “out of control” if (n − 1)s²/σ², with
n = 20 and σ = 0.60, exceeds χ²_{0.01, 19} = 36.191. Since

(n − 1)s²/σ² = 19(0.84)²/(0.60)² = 37.24

exceeds 36.191, the process is declared out of control. Of course, it is as-
sumed here that the sample may be regarded as a random sample from a
normal population.
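Equivalently, as a sketch in R, one can compare the upper-tail probability to 0.01:

```r
# P(chi-square_19 >= observed value) under sigma = 0.60.
X2 <- 19 * 0.84^2 / 0.60^2                    # 37.24
p <- pchisq(X2, df = 19, lower.tail = FALSE)
p < 0.01                                      # TRUE: declared out of control
```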
Question 6.1.4. An experimenter was convinced that his measuring equip-
ment possessed variability which resulted in a standard deviation of 2. Six-
teen measurements resulted in a value of S² = 6.1. Do the data disagree
with his claim? Take α = 0.01. What would you conclude if you chose
α = 0.05?
Question 6.1.5. Show that if X1, X2, · · · , Xn are independent random
variables having the chi-square distribution with v = 1 and Yn = X1 + X2 +
· · · + Xn, then the limiting distribution of

Z = (Yn/n − 1)/√(2/n)

as n → ∞ is the standard normal distribution.
The following fact is quite useful:

Theorem 6.1.6. If Z1, · · · , Zn ∼ Normal(0, 1) and Y = Z1² + · · · + Zn², then
Y ∼ χ²(n).
In theory, this fact allows one to compute the probabilities of events defined
by values of Y, e.g., P(Y > 4.5). In practice, this requires evaluating the
cdf of χ²(ν), a function for which there is no simple formula. Fortunately,
there exist efficient algorithms for numerically evaluating these cdfs. The R
function pchisq returns values of the cdf of any specified chi-squared distri-
bution. For example, if Y ∼ χ²(2), then P(Y > 4.5) is

1-pchisq(4.5,df=2)
0.1053992

Finally, if Zi ∼ Normal(0, 1), then

E(Zi²) = Var(Zi) + (E(Zi))² = 1.

It follows that

E(Y) = E(∑_{i=1}^n Zi²) = ∑_{i=1}^n E(Zi²) = ∑_{i=1}^n 1 = n;

thus, if Y ∼ χ²(n), then E(Y) = n.



6.2 Student’s t Distributions


Recall that for random samples from a normal population with the mean µ
and the variance σ², the random variable X̄ has a normal distribution with
the mean µ and the variance σ²/n; in other words,

(X̄ − µ)/(σ/√n)

has the standard normal distribution.

This is an important result, but the major difficulty in applying it is that in
most realistic applications the population standard deviation σ is unknown.
This makes it necessary to replace σ with an estimate, usually the value of
the sample standard deviation S. Thus, the theory that follows leads to the
exact distribution of

(X̄ − µ)/(S/√n)

for random samples from normal populations.

If Y and Z are independent random variables, Y has a chi–square distribu-
tion with v degrees of freedom, and Z has the standard normal distribution,
then the distribution of the continuous random variable

T = Z/√(Y/v)

is given by

f(t) = (Γ((v + 1)/2) / (√(πv) Γ(v/2))) (1 + t²/v)^(−(v+1)/2)  for −∞ < t < ∞

and it is called the t-distribution with v degrees of freedom.¹


Definition 6.2.1. The distribution of T is called a t distribution with v
degrees of freedom. We will denote this distribution by t(ν).
1 The t distribution was introduced originally by W.S. Gosset, who published his sci-
entific writings under the pen name “Student,” since the company for which he worked, a
brewery, did not permit publication by employees. Thus, the t distribution is also known
as the Student–t distribution, or Student’s t distribution.

Theorem 6.2.1. If X̄ and S² are the mean and the variance of a random
sample of size n from a normal population with the mean µ and the variance
σ², then

T = (X̄ − µ)/(S/√n)

has the t-distribution with n − 1 degrees of freedom.

Proof. check this ...

Although the mean of the t distribution does not exist when n = 1, the
mean does exist for any value of n > 1. Of course, whenever the mean does
exist, its value is 0 because of the symmetry of the t distribution.

It can be shown that if X has a t distribution with n degrees of freedom


(n > 2), then
Var( X ) = n/(n − 2).
Example 6.2.1. In 16 one-hour runs, the gasoline consumption of an engine
averaged 16.4 gallons with a standard deviation of 2.1 gallons. Test the claim
that the average gasoline consumption of this engine is 12.0 gallons per hour.

Solution. Substituting n = 16, µ = 12.0, x̄ = 16.4, and s = 2.1 into the
formula for t, we get

t = (x̄ − µ)/(s/√n) = (16.4 − 12.0)/(2.1/√16) = 8.38.

Since v = 15 and the probability of getting a value of T greater than 2.947 is
0.005, the probability of getting a value greater than 8 must be negligible.
Thus, it would seem reasonable to conclude that the true average hourly
gasoline consumption of the engine exceeds 12.0 gallons.
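The same computation can be sketched in R; pt() gives the upper-tail probability directly:

```r
# One-sample t statistic for the gasoline example.
n <- 16; xbar <- 16.4; mu0 <- 12.0; s <- 2.1
tstat <- (xbar - mu0) / (s / sqrt(n))        # about 8.38
pt(tstat, df = n - 1, lower.tail = FALSE)    # negligible
```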
The standard normal distribution is symmetric about the origin; i.e., if Z ∼
Normal(0, 1), then −Z ∼ Normal(0, 1). It follows that T = Z/√(Y/ν) and
−T = −Z/√(Y/ν) have the same distribution. Hence, if p is the pdf of T,
then it must be that p(t) = p(−t). Thus, t pdfs are symmetric about the
origin, just like the standard normal pdf.

Figure (**) displays the pdfs of two t distributions. They can be distin-
guished by virtue of the fact that the variance of t(ν) decreases as ν in-
creases. It may strike you that t pdfs closely resemble normal pdfs. In fact,
the standard normal pdf is a limiting case of the t pdfs:

Theorem 6.2.2. Let Fν denote the cdf of t(ν) and let Φ denote the cdf of
Normal(0, 1). Then

lim_{ν→∞} Fν(t) = Φ(t)

for every t ∈ (−∞, ∞).


Thus, when ν is sufficiently large (ν > 40 is a reasonable rule of thumb),
t(ν) is approximately Normal(0, 1) and probabilities involving the former
can be approximated by probabilities involving the latter.

In R it is just as easy to calculate t(ν) probabilities as it is to calculate
Normal(0, 1) probabilities. The R function pt returns values of the cdf of
any specified t distribution. For example, if T ∼ t(14), then P(T ≤ −1.5) is

pt(-1.5, df=14)
0.07791266

6.3 Fisher’s F Distributions


Let Y1 ∼ χ²(ν1) and Y2 ∼ χ²(ν2) be independent random variables having
chi-square distributions with ν1 and ν2 degrees of freedom; then

F = (Y1/ν1)/(Y2/ν2)

is a random variable having an F distribution. Because Yi ≥ 0, the set of
possible values of F is F(S) = [0, ∞). We are interested in the distribution
of F:

f(x) = (Γ((v1 + v2)/2) / (Γ(v1/2) Γ(v2/2))) (v1/v2)^(v1/2) x^(v1/2 − 1) (1 + (v1/v2)x)^(−(v1+v2)/2)

for x > 0 and f(x) = 0 elsewhere.


Definition 6.3.1. The distribution of F is called an F distribution with ν1
and ν2 degrees of freedom. We will denote this distribution by F(ν1, ν2).
It is customary to call ν1 the “numerator” degrees of freedom and ν2 the
“denominator” degrees of freedom.
The figure (***) displays the pdfs of several F distributions.

There is an important relation between t and F distributions. To antici-
pate it, suppose that Z ∼ Normal(0, 1) and Y2 ∼ χ²(ν2) are independent
random variables. Then Y1 = Z² ∼ χ²(1), so

T = Z/√(Y2/ν2) ∼ t(ν2)

and

T² = Z²/(Y2/ν2) = (Y1/1)/(Y2/ν2) ∼ F(1, ν2).
More generally,
Theorem 6.3.1. If T ∼ t(ν), then T 2 ∼ F (1, ν).
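A numerical check of Theorem 6.3.1 via quantiles: the square of the two-sided t critical value equals the corresponding F(1, ν) critical value (ν = 14 is an arbitrary choice of mine):

```r
# t(v) squared matches F(1, v): compare matching quantiles.
v <- 14
qt(0.975, df = v)^2          # about 4.60
qf(0.95, df1 = 1, df2 = v)   # the same value
```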
The R function pf returns values of the cdf of any specified F distribution.
For example, if F ∼ F(2, 27), then P(F > 2.5) is

1-pf(2.5,df1=2,df2=27)
0.1008988
Theorem 6.3.2. If s1² and s2² are the variances of independent random
samples of sizes n1 and n2 from normal populations with the variances σ1²
and σ2², then

F = (s1²/σ1²)/(s2²/σ2²) = (σ2² s1²)/(σ1² s2²)

is a random variable having an F-distribution with n1 − 1 and n2 − 1 degrees
of freedom.
We may be interested in comparing the variance of two normal distribu-
tions, and testing to determine whether or not they are equal. This problem
is often encountered when comparing the precision of two measuring instru-
ments, the variation in quality characteristics of manufactured product, or
the variation in scores for two testing procedures.

Suppose X11 , X12 , · · · , X1n1 and X21 , X22 , · · · , X2n2 are independent random
samples from normal distributions with unknown means and Var( X1i ) = σ12 ,
Var( X2i ) = σ22 , where σ12 and σ22 are unknown. Suppose we are interested
to test the null hypothesis, H0 : σ12 = σ22 against the alternative hypothesis,
H1 : σ12 > σ22 now, σ12 and σ22 can be estimated by S12 and S22 respectively.

We would reject H0 : σ12 = σ22 in favour of H1 : σ12 > σ22 if S12 is much
larger than S22 , i.e., reject H0 : σ12 = σ22 if

F = (s1²/σ1²) / (s2²/σ2²) = (σ2² s1²) / (σ1² s2²) > k        (6.1)

where k depends upon the probability distribution of the statistic s1²/s2².

Note that (n1 − 1)S1²/σ1² and (n2 − 1)S2²/σ2² are independent chi-square random
variables. Therefore Eq. (6.1) has an F distribution with n1 − 1 numerator
degrees of freedom and n2 − 1 denominator degrees of freedom. Under the
null hypothesis σ1² = σ2², F = S1²/S2² has an F distribution with n1 − 1 numerator
degrees of freedom and n2 − 1 denominator degrees of freedom.

The rejection region becomes Fcomp > Fα,n1 −1, n2 −1 , i.e., k = Fα,n1 −1, n2 −1 .
Example 6.3.1. Consider two random samples, X1 and X2 of sizes 10 and
20 with sample variances given as 0.0003 and 0.0001 respectively. Assuming
that the populations, from which the samples have been drawn, are normal,
determine whether the variance of the first population is significantly greater
than the second one. Take α = 0.05.
Solution. Let σ12 and σ22 denote the variances for the first and second popu-
lation from which the samples were taken. Then the hypothesis to be tested
is H0 : σ12 = σ22 against H1 : σ12 > σ22 .

The test statistic is


F = S1²/S2² , based on v1 = 9 and v2 = 19 degrees of freedom.

Therefore,

F = S1²/S2² = 0.0003/0.0001 = 3
The tabulated value is F9,19,0.05 = 2.42.

Fcalc = 3.

Since Fcalc > F9,19,0.05 = 2.42, we reject the null hypothesis.

Conclusion: The variance of the first population is significantly greater than
that of the second.
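
The calculation in this example can be checked in R (a sketch, not part of the original solution):

```r
# Example 6.3.1 in R: F test of H0: sigma1^2 = sigma2^2 vs H1: sigma1^2 > sigma2^2.
F.calc <- 0.0003 / 0.0001               # observed F statistic = 3
F.crit <- qf(0.95, df1 = 9, df2 = 19)   # upper 5% critical value, about 2.42
p.value <- 1 - pf(F.calc, 9, 19)        # tail probability of the observed F
F.calc > F.crit                         # TRUE: reject H0
```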
Note:
1. If X1 , X2 , X3 · · · Xn is a random sample of size n from a normal distri-
bution with mean of µ and a variance of σ2 , then

X̄ = (1/n) Σ Xi  (summing over i = 1, …, n)

is normally distributed with a mean of µ and a variance of σ²/n.


2. Suppose X1 , X2 , X3 · · · Xn is as defined in 1. Then Zi = (Xi − µ)/σ,
i = 1, 2, · · · , n, are independent standard normal random variables and

Σ Zi² = Σ ((Xi − µ)/σ)²  (summing over i = 1, …, n)

has a chi-square distribution with n degrees of freedom.


3. Let Z be a standard normal random variable and let χ²v be a chi-
square random variable with v degrees of freedom. Then if Z and χ²v
are independent,

T = Z / √(χ²v /v)

has a t-distribution with v degrees of freedom.
4. Let χ²1 and χ²2 be chi-square random variables with v1 and v2 degrees
of freedom, respectively. Then if χ²1 and χ²2 are independent,

F = (χ²1 /v1 ) / (χ²2 /v2 )

is said to have an F distribution with v1 numerator degrees of freedom
and v2 denominator degrees of freedom.
Example 6.3.2. For the two samples on yields of potatoes we had n1 = 10,
S1² = 12, n2 = 12, S2² = 26.17. Is there reason to suspect that the samples
came from normal populations with different variances? Take α = 5% and
α = 1%.
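
One way this example could be worked in R (a sketch under the usual convention of putting the larger sample variance in the numerator; the two-sided critical values come from qf):

```r
# Example 6.3.2 (sketch): two-sided comparison of variances.
# Larger sample variance in the numerator: S2^2 = 26.17 with n2 = 12,
# so the degrees of freedom are df1 = 11 and df2 = 9.
F.calc <- 26.17 / 12               # observed ratio, about 2.18
qf(0.975, df1 = 11, df2 = 9)       # critical value for alpha = 5% (two-sided)
qf(0.995, df1 = 11, df2 = 9)       # critical value for alpha = 1% (two-sided)
```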

Question 6.3.1. In a large warehouse, there are 13 bags of maize delivered


from Embu district. The variance of their masses was 8.4 kg². On the same
day there were 21 bags of maize delivered from Meru district; the variance
of their masses was 2.4 kg².

Do these data provide evidence of a significant difference in the variance


of the masses of bags of maize received from the two districts? (Assume
masses are normally distributed). Take α = 5% and α = 1%.
Question 6.3.2. Consider two random samples X1 and X2 of sizes 9 and
5 with sample variances 115 and 24 respectively. Assuming that the pop-
ulations, from which the samples have been drawn, are normal, determine
whether the samples could have come from a population with a common
variance.
Question 6.3.3. A random sample of 10 children was selected from children
who are the same age in Utopia high school. Their heights were as follows
(cm):
Boys: 172 155 157 152
Girls: 160 152 147 155 153 151

(a). Is it reasonable to assume that the variances of the height of boys and
girls of this age are the same?

(b). What is the 95% confidence interval for the difference in mean heights
of boys and girls of this age?

(c). Based on (b), is it reasonable to assume µB = µG ?


Chapter 7

Confidence Intervals

So far, we’ve talked about estimates for parameters that give a single number
as the best guess for that parameter given the data we have. Sometimes it
is useful to instead estimate an interval for the possible values of the param-
eter and put a probability on how confident we are that the true parameter
value falls inside this interval.

Consider a random sample X1 , X2 , . . . , Xn that comes from some distri-


bution F with parameter θ. A 100(1 − α)% confidence interval for θ is a
pair of statistics Ln , Un , such that

P( Ln < θ < Un ) = 1 − α.

A common choice is α = 0.05, which gives a 95% confidence interval.

7.0.1 Confidence Intervals for the Mean (Known Variance)


The most common case for confidence intervals is when we have a Gaussian
random sample, that is, Xi ∼ N (µ, σ²), and we want to estimate the mean.
Remember when we discussed the Central Limit Theorem that we defined
a normalized sample mean statistic:
Zn = (X̄n − µ) / (σ/√n)
Because the Xi are iid Gaussian, we know that X̄n ∼ N (µ, σ2 /n). Therefore,
the normalized sample mean has a standard normal distribution, i.e., Zn ∼
N (0, 1). We want to find a critical value zα/2 that satisfies

P(−zα/2 < Zn < zα/2 ) = 1 − α,

CHAPTER 7. CONFIDENCE INTERVALS 87

where the probability is computed using the standard normal cdf. With a
little algebraic manipulation, we can convert this to
P( X̄n − zα/2 σ/√n < µ < X̄n + zα/2 σ/√n ) = 1 − α.

This means that our confidence interval is given by


Ln = X̄n − zα/2 σ/√n ,    Un = X̄n + zα/2 σ/√n

Using R: To calculate the critical value zα/2 , notice that this is just another
word for the 1 − α/2 quantile. The R command to get this value is:

qnorm(1-0.5 * alpha)

For the most common choice of α = 0.05 (again, a 95% confidence inter-
val), we get that z0.025 ≈ 1.96, for a 99% confidence interval, we would get
the critical value z0.005 ≈ 2.58.
Example 7.0.1. You want to estimate the average snowfall in the Mt. Kenya
front this year, and you take snowfall measurements at 40 different locations
along the front. Experience from previous years indicates that there is a
variance of 36 inches² between measurements at these locations. You compute
that the average snowfall for the year is 620 inches. What is a 95% confidence
interval for the average snowfall?
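
The example can be worked directly in R (a sketch of the known-variance formula above):

```r
# Example 7.0.1 (sketch): 95% CI for the mean with known variance.
n <- 40; xbar <- 620; sigma <- sqrt(36)   # variance 36 => sigma = 6
z <- qnorm(1 - 0.05/2)                    # critical value, about 1.96
c(xbar - z * sigma/sqrt(n), xbar + z * sigma/sqrt(n))
```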

7.0.2 Confidence Intervals for the Mean (Unknown Vari-


ance)
The problem with the above estimation of confidence intervals is that we
have to know the true value of the variance. Typically, the true value of σ² is
not known, and the best we can do is estimate it using the sample variance
Sn². What happens if we replace the standard deviation σ with the sample
statistic Sn = √(Sn²)? Instead of Zn , we get the following random
variable:

Tn = (X̄n − µ) / (Sn /√n).
This variable will no longer have a N(0,1) distribution. However, a fel-
low named William Gossett figured out the formula for this distribution in
1908. (Actually he had a slightly different statistic; the modern form of the
t-statistic and its distribution were discovered by Ronald Fisher.)1 The pdf
for the t-distribution is very similar to the Gaussian. It is centered at zero
with a symmetric “hill” shape, but it has “heavier tails”, meaning that
it goes to zero more slowly than the Gaussian. As n gets large, the t-distribution
approaches the Gaussian. The t-distribution has one parameter, the degrees
of freedom m. The random variable Tn above
has a t-distribution with degrees of freedom equal to m = n − 1. In terms
of notation, this is written Tn ∼ t(n − 1).

Back to confidence intervals, we can also get critical values of the t-distribution,
denoted tα/2 , which satisfy

P(−tα/2 < Tn < tα/2 ) = 1 − α

Equivalent to the Gaussian case above, we construct our confidence interval


as
Ln = X̄n − tα/2 Sn /√n ,    Un = X̄n + tα/2 Sn /√n
Using R: The commands in R for the t-distribution are dt (gives the pdf),
pt (gives the cdf), and qt (gives the quantiles). Each command takes a
required parameter df for the degrees of freedom. To calculate the critical
value tα/2 , remember that this is just another word for the 1 − α/2 quantile.
The R command to get this value is:

qt(1-0.5 * alpha, df=n-1)

where n is your sample size used to compute the mean statistic. As an


example, if the sample size was 10 and α = 0.05 (again, a 95% confidence
interval), we would call

qt(1-0.025, df=9)

which returns the result 2.26. Notice that this gives a wider confidence
interval than the Gaussian case above did. If n were to grow larger, our
confidence interval would shrink and eventually match the 1.96 value of the
Gaussian.
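
This shrinking of the critical value toward 1.96 can be seen directly (a short sketch added for illustration):

```r
# Two-sided 95% t critical values approach the normal value 1.96
# as the degrees of freedom grow.
sapply(c(5, 10, 30, 100, 1000), function(df) qt(0.975, df))
qnorm(0.975)
```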
Example 7.0.2. Repeat the snowfall analysis above, but this time you do
not rely on previous estimates of the snowfall variance. You compute the
1 It is called Student’s t-distribution because Gossett used the pseudonym “Student” so
that he wouldn’t get in trouble with his employer (Guinness brewery) for publishing his
work.

variance in your measurements to be Sn² = 34 inches². How did the confidence


interval change?
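
A sketch of the computation for this example, using the t-based interval above:

```r
# Example 7.0.2 (sketch): 95% CI with the variance estimated as Sn^2 = 34.
n <- 40; xbar <- 620; s <- sqrt(34)
tcrit <- qt(1 - 0.05/2, df = n - 1)   # slightly larger than 1.96
c(xbar - tcrit * s/sqrt(n), xbar + tcrit * s/sqrt(n))
```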
Using R to Find Confidence Intervals

Consider the following vector:


data=c(6.2, 6.6, 7.1, 7.4, 7.6, 7.9, 8, 8.3, 8.4, 8.5, 8.6, 8.8, 8.8, 9.1, 9.2, 9.4,
9,4, 9.7, 9.9, 10.2, 10.4, 10.8, 11.3, 11.9)

t.test(data)
> t.test(data)

One Sample t-test

data: data t = 25.4909, df = 24, p-value < 2.2e-16

alternative hypothesis: true mean is not equal to 0

95 percent confidence interval:


7.980892 9.387108
sample estimates:
mean of x
8.684
The t.test command can also be used to find confidence intervals with
levels of confidence different from 95%. We do so by specifying the desired
level of confidence using the conf.level option.

t.test(data, conf.level=0.9)

> t.test(data, conf.level=0.9)

One Sample t-test

data: data t = 25.4909, df = 24, p-value < 2.2e-16

alternative hypothesis: true mean is not equal to 0

90 percent confidence interval:


8.101154 9.266846

sample estimates: mean of x


8.684
The R command prop.test can be used similarly to construct confidence in-
tervals for the normal approximation to the binomial.

prop.test(83, 100, 0.75)


> prop.test(83, 100, 0.75)

1-sample proportions test with continuity correction

data: 83 out of 100, null probability 0.75

X-squared = 3, df = 1, p-value = 0.08326

alternative hypothesis: true p is not equal to 0.75

95 percent confidence interval:


0.7389130 0.8950666
sample estimates:
p
0.83
R does not have a built-in command to find confidence intervals for the mean of nor-
mal data when the variance is known, because this arises rarely in practice.
The following command lines create a new function norm.interval.

We apply it to the data set data above, assuming we know ahead that
the variance is 2.
> data=c(6.2, 6.6, 7.1, 7.4, 7.6, 7.9, 8, 8.3, 8.4, 8.5, 8.6, 8.8, 8.8,
+ 9.1, 9.2, 9.4, 9,4, 9.7, 9.9, 10.2, 10.4, 10.8, 11.3, 11.9)
>
>
> norm.interval=function(data, variance=var(data), conf.level=0.95){
+ z=qnorm((1-conf.level)/2, lower.tail=FALSE)
+ xbar=mean(data)
+ sdx=sqrt(variance/length(data))
+ c(xbar-z*sdx, xbar+z*sdx)
+ }

> norm.interval(data, 2)

[1] 8.129638 9.238362


Similar calculations, or a similar function, could be developed for confidence
intervals for the variance of a normal distribution. We illustrate this for the
variance of the data set data.
> var.interval=function(data, conf.level=0.95){
+ df=length(data)-1
+ chilower=qchisq((1-conf.level)/2, df)
+ chiupper=qchisq((1-conf.level)/2, df, lower.tail=FALSE)
+ v=var(data)
+ c(df*v/chiupper, df*v/chilower)
+ }
> var.interval(data)

[1] 1.768963 5.615092

Confidence intervals for population proportion


######################
## Confidence interval for a proportion (from R book)
######################

# Example 7.2: presidential job performance.
# Out of a random sample of 1,013 likely voters,
# 466 voters rated the president's job performance as
# "good" or "excellent". Find the confidence interval.

# n = 1013 is large enough such that we can use the
# normal approximation
n = 1013
phat = 466/n                  # estimate of the proportion
SE = sqrt(phat*(1-phat)/n)    # standard error of the proportion

# let's select a 95% confidence interval
alpha = 0.05

# find the corresponding z-value: this is the number of standard
# deviations out from the mean that we go for a 95% confidence interval
zstar = -qnorm(alpha/2)

# now we construct the confidence interval
> c(phat - zstar * SE, phat + zstar * SE)
[1] 0.4293281 0.4907114
Chapter 8

Other distributions used in


hypothesis testing

The main distributions used in hypothesis testing are: the chi-squared, for
testing hypotheses involving count data; Fisher’s F, in analysis of variance
(ANOVA) for comparing two variances; and Student’s t, in small-sample
work for comparing two parameter estimates. These distributions tell us
the size of the test statistic that could be expected by chance alone when
nothing was happening (i.e. when the null hypothesis was true). Given the
rule that a big value of the test statistic tells us something is happening,
and hence that the null hypothesis is false, these distributions define what
constitutes a big value of the test statistic.

8.0.1 Chi-square distribution


It is a special case of the gamma distribution, characterized by a single pa-
rameter, the number of degrees of freedom. The mean is equal to the degrees
of freedom ν, and the variance is equal to 2ν. The density function looks
like this:

f(x) = [1 / (2^(ν/2) Γ(ν/2))] x^(ν/2−1) e^(−x/2) ,

where Γ is the gamma function. The chi-squared is important because many
quadratic forms follow the chi-squared distribution under the assumption
that the data follow the normal distribution. In particular, the sample vari-
ance is a scaled chi-squared variable. Likelihood ratio statistics are also
approximately distributed as a chi-squared.

When the cumulative probability is used, an optional third argument can be

CHAPTER 8. OTHER DISTRIBUTIONS USED IN HYPOTHESIS TESTING94

provided to describe non-centrality. If a chi-squared variable is the sum of
squares of independent normal random variables, then the non-centrality
parameter is equal to the sum of the squared means of the normal variables.
Here are the cumulative probability plots for a non-centrality parameter (ncp)
based on three normal means (of 1, 1.5 and 2), and another with ncp = 10:

par(mfrow=c(1,2))
x<-seq(0,30,0.25)
plot(x, pchisq(x, 3, 7.25), type= "l", ylab= "p(x)", xlab="x")
plot(x, pchisq(x, 5, 10), type= "l", ylab="p(x)", xlab="x")

The cumulative probability on the left has 3 d.f. and non-centrality parame-
ter 7.25 (1² + 1.5² + 2² = 7.25), while the distribution on the right has 5 d.f.
and non-centrality 10 (note the longer left-hand tail at low probabilities). Chi-
squared is also used to establish confidence intervals for sample variances.
The quantity

(n − 1)s² / σ²

is the degrees of freedom (n − 1) multiplied by the ratio of the sample vari-
ance s² to the unknown population variance σ². This follows a chi-squared
distribution, so we can establish a 95% confidence interval for σ² as follows:

(n − 1)s² / χ²(1−α/2)  ≤  σ²  ≤  (n − 1)s² / χ²(α/2)

Suppose the sample variance s2 = 10.2 on 8 df. Then the interval on σ2 is


given by:

8*10.2/qchisq(0.975, 8)
[1] 4.65367
8*10.2/qchisq(0.025, 8)
[1] 37.43582

which means that we can be 95% confident that the population variance
lies in the range 4.65 ≤ σ2 ≤ 37.44.

8.0.2 Fisher’s F distribution


This is the famous ratio test that occupies the penultimate column of every
ANOVA table. The ratio of treatment variance to error variance follows the

F distribution, and you will often want to use the quantile function qf to look up
critical values of F. You specify, in order, the probability of your one-tailed
test (this will usually be 0.95), then the two degrees of freedom: numerator
first, then denominator. So the 95% value of F with 2 and 18 d.f. is:

qf(0.95, 2, 18)
[1] 3.55457

This is what the density function of F looks like for 2 and 18 d.f. and 6
and 18 d.f.

x<-seq(0.05,4,0.05)
plot(x, df(x,2,18), type= "l", ylab="f(x)", xlab="x")
plot(x, df(x,6,18), type= "l", ylab="f(x)", xlab="x")

The F distribution is a two-parameter distribution defined by the density


function:
f(x) = [r Γ((r + s)/2) / (s Γ(r/2) Γ(s/2))] · (rx/s)^(r/2 − 1) / [1 + (rx/s)]^((r+s)/2)
where r is the degrees of freedom in the numerator and s is the degrees of
freedom in the denominator. The distribution is named after R.A. Fisher,
the father of analysis of variance and principal developer of quantitative ge-
netics. It is central to hypothesis testing, because of its use in assessing the
significance of the differences between two variances. The test statistic is
calculated by dividing the larger variance by the smaller variance. The two
variances are significantly different when the ratio is larger than the critical
value of Fisher’s F. The degrees of freedom in the numerator and in the
denominator allow the calculation of the critical value of the test statistic.
When there is a single degree of freedom in the numerator, the distribution
is equal to the square of Student’s t: F = t². Thus, while the rule of thumb
for the critical value of t is 2, the rule of thumb for F is t² = 4. To see
how well the rule of thumb works, we can plot critical F against d.f. in the
numerator:

df<-seq(1, 30, 0.1)

plot(df, qf(0.95, df, 30), type="l", ylab= "critical F")
lines(df, qf(0.95, df, 10), lty=2)

You see that the rule of thumb (critical F = 4) quickly becomes much
too large once the d.f. in the numerator (on the x-axis) is larger than 2. The
lower (solid) line shows the critical values of F when the denominator has
30 d.f. and the upper (dashed) line shows the case in which the denominator
has 10 d.f.

The shape of the density function of the F distribution depends on the


degrees of freedom in the numerator.

Hypothesis Testing (Small Sample; n < 30 and the Population


Variance is Unknown)
Estimating the Population Mean (small sample)
The interval estimate for the population mean for small sample is given by

x̄ ± tα,(n−1) sx̄

where:

x̄ is the sample mean,
tα,(n−1) is the critical value of t at the given level of significance with (n − 1) degrees of freedom,
sx̄ = s/√n, where s is the sample standard deviation, taken as an estimate of σ,
n is the sample size.

Example 8.0.1. In order to revise the accident insurance rates for auto-
mobiles, an insurance company wants to assess the damage caused to cars
by accidents at a speed of 15 miles per hour. A sample of 16 new cars was
selected at random and the company crashed each one at the speed of 15
miles per hour. The cars so damaged were repaired and it was found that
the average repair amount was $2500 with standard deviation of $950.

Assuming that the population distribution of costs of repairs under these
conditions is normal, estimate the true average damage repair cost in
dollars to all cars due to a crash at 15 miles per hour. Assume α = 0.05.

The interval estimate is x̄ ± tα,(n−1) sx̄ , with
x̄ = 2500, n = 16, s = 950, α = 0.05.
Check Ans: 1993.90, 3006.10
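
A sketch of this computation in R; it reproduces the check answer up to the rounding of the tabulated t value:

```r
# Example 8.0.1 (sketch): 95% CI for the mean repair cost, small sample.
n <- 16; xbar <- 2500; s <- 950
tcrit <- qt(1 - 0.05/2, df = n - 1)   # about 2.131
c(xbar - tcrit * s/sqrt(n), xbar + tcrit * s/sqrt(n))
```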

Inferences Concerning a Population Mean (Small Samples)


x̄ − µ
t= √
s/ n
Example 8.0.2. A gas station repair shop claims it can do a lubrication
job and oil change in 30 minutes. The Consumer Protection Department
wants to test this claim. A sample of six cars was sent to the station for
an oil change and lubrication. The job took an average of 34 minutes with a
standard deviation of 4 minutes. Test the claim at α = 0.05.
Solution. It is a one-tailed test because the claim will not be rejected if
the average time taken on a car for the oil change and the lube job is con-
siderably less than 30 minutes. It will only be rejected if this job time is
considerably more than 30 minutes.

The null and alternative hypotheses are stated as follows:

Null hypothesis: H0 : µ1 ≤ 30
Alternative hypothesis: H1 : µ1 > 30

Since the sample size is small, a t-test will be used.


x̄ − µ
t= √
s/ n
where; x̄ = 34, µ = 30, s = 4, n = 6.
Substituting these values, we get;
t = (x̄ − µ)/(s/√n) = (34 − 30)/(4/√6) = 4/1.63 = 2.45
The critical value of t at α = 0.05 and d f = n − 1 = 5, for a one-tailed test
is given as 2.02.

Since our computed value of t = 2.45 is higher than the critical value
of t = 2.02, we cannot accept the null hypothesis. Hence, the claim is not
considered correct.
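
The same test in R (a sketch, not part of the original solution):

```r
# Example 8.0.2 (sketch): one-sample, one-tailed t test.
n <- 6; xbar <- 34; mu0 <- 30; s <- 4
t.calc <- (xbar - mu0) / (s / sqrt(n))   # about 2.45
t.crit <- qt(0.95, df = n - 1)           # about 2.02
t.calc > t.crit                          # TRUE: reject H0
```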
Exercise:
A claim is made that DeKUT University students have an IQ of 120. To
test this claim, a random sample of 10 students was taken and their IQ score
was recorded as follows

105, 110, 120, 125, 100, 130, 120, 115, 125, 130

At α = 0.05 level of significance test the validity of this claim.


Hint: x̄ = Σxi /n = 118 and s = √(Σ(xi − x̄)²/(n − 1)) = 10.33.
The test statistic:

t = (x̄ − µ)/(s/√n)
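
A sketch of the whole exercise in R, confirming the hint values:

```r
# IQ exercise (sketch): one-sample t test of H0: mu = 120.
iq <- c(105, 110, 120, 125, 100, 130, 120, 115, 125, 130)
xbar <- mean(iq)                              # 118
s <- sd(iq)                                   # about 10.33
t.calc <- (xbar - 120) / (s / sqrt(length(iq)))
qt(0.975, df = length(iq) - 1)                # two-tailed critical value
```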

Comparing Two Population Means (Small Independent Samples)


A t-test can be used for comparison of two population means in order to
establish whether there is any significant difference between the two popu-
lation means, provided the following conditions are met:

(i). Each population is approximately normally distributed.

(ii). The sample size taken from each population is small (n < 30), but the
samples do not have to be equal in size.

(iii). The two samples are independent (unrelated).

(iv). The population standard deviations are unknown
but are nearly equal to each other.

The t-statistic for comparison of two population means is similar to the
z-statistic for comparison of two population means, but
two additional elements are considered when using the t-test. These are:

(i). The number of degrees of freedom is the sum of the degrees of freedom
for each sample. Where n1 is the sample size from population 1, and n2
is the sample size from population 2, the number of degrees
of freedom would be expressed as

d f = (n1 − 1) + (n2 − 1)
= n1 + n2 − 2

(ii). The two standard deviations s1 and s2 calculated from the two samples
of size n1 and n2 respectively, are pooled together to form a single esti-
mate (s p ) of the population standard deviation, where s p is calculated
as:

sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

Then the test statistic t is calculated using the following formula:


t = (x̄1 − x̄2 ) / √[ sp² (1/n1 + 1/n2 ) ]

where;

x̄1 −mean of the first sample


x̄2 −mean of the second sample
n1 −the size of the first sample
n2 −the size of the second sample
s p −pooled estimate of the population standard deviation as calculated above.

The calculated t-statistic is compared with the critical t-score from the table
at a given level of significance and (n1 + n2 − 2) degrees of freedom and a
decision is made whether to accept or reject a null hypothesis.
Example 8.0.3. The college administration wants to find out if there is
any significant difference in the average amount of money carried by male
and female students at KeMU Nakuru campus on any given day. A random
sample of 8 male students and 10 female students was selected and the
amount of money they each had was recorded. From each sample, the
sample means and sample standard deviation were found as follows:
Male students Female students
n1 = 8 n2 = 10
x̄1 = £20.5 x̄2 = £17.00
s1 = £2.00 s2 = £1.50
We want to test if there is any significant difference in the average amount
of money carried by male and female students at 5% level of significance.
Solution. The null hypothesis states that there is no significant difference
between the average amounts of cash carried by male and female students
at the college;

Null hypothesis H0 : µ1 = µ2
Alt. hypothesis H1 : µ1 ̸= µ2 (A two tailed test)

The test statistic;


t = (x̄1 − x̄2 ) / √[ sp² (1/n1 + 1/n2 ) ]

where;

sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)
    = ((8 − 1)(2)² + (10 − 1)(1.5)²) / (8 + 10 − 2)
    = (28 + 20.25) / 16
    = 3.01

Hence,

t = (20.5 − 17) / √[ 3.01 (1/8 + 1/10) ] = 3.5/0.823 = 4.25

The critical t-score from the t table for a two-tailed test with α = 0.05 and
d f = (n1 + n2 − 2) = (8 + 10 − 2) = 16 is given as;

t0.05,16 = 2.12

Since our calculated value of t = 4.25 is larger than critical t value of 2.12,
we cannot accept the null hypothesis.
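
The same pooled two-sample calculation in R (a sketch, not part of the original solution):

```r
# Example 8.0.3 (sketch): pooled two-sample t statistic.
n1 <- 8; n2 <- 10; xbar1 <- 20.5; xbar2 <- 17; s1 <- 2; s2 <- 1.5
sp2 <- ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)    # about 3.01
t.calc <- (xbar1 - xbar2) / sqrt(sp2 * (1/n1 + 1/n2)) # about 4.25
t.crit <- qt(0.975, df = n1 + n2 - 2)                 # about 2.12
abs(t.calc) > t.crit                                  # TRUE: reject H0
```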

A Paired Difference t-Test (Dependant Samples)


Example 8.0.4. The light bulbs in an industrial warehouse have been found
to have a mean lifetime of 1030.0 hours with a standard deviation of 90.0
hours. The warehouse manager has been approached by a representative
of extend-a-bulb, a company that makes a device intended to increase bulb
life. The manager is concerned that the average lifetime of extend-a-bulb-
equipped bulbs might not be any greater than the 1030 hours historically
experienced. In a subsequent test, the manager tests 40 bulbs equipped with
the device and finds their mean life to be 1061.6 hours. Does extend-a-bulb
really work?
Solution. Formulate the null and alternative hypotheses.
The warehouse manager’s concern that extend-a-bulb-equipped bulbs might
not be any better than those in the past leads to a directional test.
Accordingly, the null and alternative hypotheses are:

Null hypothesis H0 : µ ≤ 1030.0 hours (extend-a-bulb is no better than present bulbs)
Alt. hypothesis H1 : µ > 1030.0 hours (extend-a-bulb really does increase bulb life)
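
The example stops at the hypotheses. A sketch of how the test could be completed in R, under the assumption that the historical σ = 90 is treated as known (n = 40 is large, so a z statistic is appropriate):

```r
# Example 8.0.4 (sketch): completing the directional test with a z statistic.
n <- 40; xbar <- 1061.6; mu0 <- 1030; sigma <- 90
z.calc <- (xbar - mu0) / (sigma / sqrt(n))   # about 2.22
z.crit <- qnorm(0.95)                        # about 1.645 at alpha = 0.05
z.calc > z.crit                              # TRUE: reject H0
```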
Chapter 9

Past Examination Papers

DEDAN KIMATHI UNIVERSITY OF TECHNOLOGY


University Examinations 2014/2015
SECOND YEAR FIRST SEMESTER EXAMINATION FOR THE
DEGREE OF BACHELOR OF SCIENCE IN ACTUARIAL
SCIENCE

STA 2305 Statistical Programming I


DATE: 28TH APRIL 2015 TIME: 8.30 AM – 10.30 AM
Instructions: Answer QUESTION ONE and any other TWO
QUESTIONS.

QUESTION ONE (30 MARKS) (COMPULSORY)

(a) Explain the importance of setting the seed when generating pseudo-random
numbers in R.
[1 mark]

(b) Briefly explain the working of the following conditional statements and
loops in R programming, giving the default syntax for each.

(i). repeat()
(ii). while()
(iii). for()
(iv). ifelse()

CHAPTER 9. PAST EXAMINATION PAPERS 102

(v). if() [5 marks]

(c) In each of the following R code lines, determine the final value of
answer.
(i) > answer <- 0
> for (j in 1:5) answer <- answer + j
(ii) > answer <- 0
> for (j in 1:5) answer <- c(answer ,j)
[4 marks]
(d) Write the R command to execute the following statements
(i) Calculate the probability that a random variable with a χ2 dis-
tribution with three degrees of freedom is larger than 5.2. [2 marks]
(ii) Compute the 0.99 quantile of the standard normal distribution. [1 mark]
(iii) Generate 20 random numbers from an exponential distribution
with mean 3. [2 marks]
(e) A normal random variable has mean 22 and variance 25. Calculate the
following probabilities and write the R code that computes the same.
(i) lies between 16.2 and 27.5
(ii) is greater than 29
(iii) is less than 17
(iv) is less than 15 or greater than 25. [8 marks]
(f) Write the expected output for the following R code.

> i <- 0
> repeat {
+   i <- i+1
+   if (i==3) break
+ }
> i

[2 marks]
(g) Using the linear congruential generator with parameter values a = 10, c = 23, m = 59
and seed x0 = 11, generate 5 random numbers from the U (0, 1) distribution. [5 marks]

QUESTION TWO (20 Marks)

(a). Write the expected output of the following R code line. [2 marks]

> y=log(c(3, 0.5, 2, 4))


> ifelse(y<0, NA, y)

(b). A student attempts a multiple choice exam (options A to F for each


question), but having done no revision work, selects his answers to each
question by rolling a fair die (A = 1, B = 2, etc.). If the exam contains
100 questions, calculate the probabilities and write the R programming
commands for computing the following

(i) probability of obtaining a mark below 20.


(ii) probability of obtaining a score of 40.
(iii) all the probabilities from P(0) to P(19). [6 marks]

(c). Write a function in R for calculating the median of a single sample


using the ifelse statements. [6 marks]

(d). A communication satellite is designed to have a mean lifetime of 6


years. If the actual time of failure is an exponentially distributed
random variable, calculate the probabilities and write the R command
to compute the probability that the satellite will

(i) fail sooner than five years


(ii) survive beyond 10 years
(iii) fail during the sixth year. [6 marks]

QUESTION THREE(20 Marks)


(a). Write the program flow through this program and present your output in a
table including all the possible steps and the values taken by the variables
in each line.

# program: threexplus1.r
1  x <- 3
2  for (i in 1:3) {
3    show(x)
4    if (x %% 2 == 0) {
5      x <- x/2
6    } else {
7      x <- 3*x + 1
8    }
9  }
10 show(x)

[10 marks]

(b). (i) Explain the main advantage of the Polar method compared with the
Box- Muller method for pairs of uncorrelated pseudo-random values
from a standard normal distribution.
[2 marks]

(ii) State briefly the method you would use to generate pseudo-random
numbers on a computer for each of the following statistical distribu-
tions:

(a) the continuous uniform distribution on the range (0, 1],


(b) the chi square distribution with 6 degree of freedom,
(c) the distribution of X given X > 10, where X has the exponential
distribution with mean 5.
[6 marks]
(c). [2 marks]

QUESTION FOUR (20 marks)


(a). Suppose a bank offers annual interest of 3% on balances of less than Ksh.5,000,
3.25% on balances of Ksh.5,000 or more but less than Ksh.10,000, and 3.5%
for balances of Ksh.10,000 or more. Write an R program code that calcu-
lates an investor’s new balance after one year. [8 marks]
(b). (i) What are the disadvantages of using truly random numbers, as op-
posed to pseudo-random numbers in Monte Carlo Simulations? [3
marks]

(ii) Describe the three integers one has to choose in order to form a lin-
ear congruential generator (LCG) and set out a recursive relationship
with subsequent scaling, for generating a sequence of pseudo-random
numbers in the range −1 to 1. [5 marks]
(c). Given a random number U uniformly distributed over [0, 1], give an ex-
pression in terms of U for another variable X whose density function is
f ( x ) = 5 exp(−5x ). Justify the expression. [4 marks]

QUESTION FIVE (20 Marks)


(a). Consider whether the following statements are true or false.

(i) runif(3) has dimension 3

(ii) log(runif(5)) has all positive elements

(iii) in repeated samples mean(runif(1000)) is within 0.03 of 1/2.

(iv) in repeated samples sd(runif(1000)) is within 0.01 of √(1/12).

[4 marks]

(b). (i) Show that the random variable X = −0.5 ln U has an exponential
distribution with mean 0.5 where U is uniformly distributed over the
range (0, 1).
[3 marks]

(ii) Explain how the above method can be used to simulate a path of a Poisson process with intensity λ = 2. [4 marks]

(iii) The first seven numbers produced by a linear congruential generator are given below:

0.803, 0.626, 0.401, 0.020, 0.178, 0.354, 0.024

Use these to generate three pseudo-random numbers from the Poisson distribution with mean 2 using the method given in (ii). [3 marks]
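A hedged R sketch of the method in (ii) applied to numbers like those in (iii): exponential inter-arrival gaps −0.5 ln U are accumulated, and the simulated Poisson value is the number of arrivals within one unit of time (the function name is illustrative):

```r
# Count arrivals of a rate-2 Poisson process in (0, 1], using
# Exp(2) inter-arrival times built from uniform variates.
poisson.count <- function(u) {
  gaps <- -0.5 * log(u)    # Exp(rate = 2) inter-arrival times
  sum(cumsum(gaps) <= 1)   # number of arrivals in one time unit
}
poisson.count(c(0.803, 0.626, 0.401, 0.020, 0.178, 0.354, 0.024))
```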

DEDAN KIMATHI UNIVERSITY OF TECHNOLOGY


University Examinations 2013/2014
SECOND YEAR SECOND SEMESTER EXAMINATION FOR THE
DEGREE OF BACHELOR OF SCIENCE IN ACTUARIAL SCIENCE

STA 2305 STATISTICAL PROGRAMMING I

DATE: DECEMBER 2013 TIME: 2 HOURS


Instructions: Answer QUESTION ONE and any other TWO
QUESTIONS.
QUESTION ONE (30 MARKS) (COMPULSORY)
(a) In each case below, write down the response (if any) that you would see in
the R console window, if the given commands were typed into the console.
(i) > for (i in seq(22, 30, 3)) print(i %% 7) (2 marks)
(ii) > x <- seq(2,20,3)
> x[x>15 | x < 6] (2 marks)
(iii) (y <- matrix(1:8, nrow=2)); ifelse(y>3 & y<7, 1, 0). (2 marks)
(b) Consider the following probability mass function of a discrete random variable X:

P(X = 2) = 0.15, P(X = 3) = 0.20, P(X = 5) = 0.25, P(X = 7) = 0.20, P(X = 11) = 0.20.

Using the following pseudo-random numbers from the Uniform(0, 1) distribution, generate nine samples from this distribution: 0.011, 0.757, 0.438, 0.258, 0.981, 0.518, 0.400, 0.351, 0.672. (4 marks)
(c) Using the R commands for the respective distributions, answer the following
questions. To answer these questions, you will need to write down the R
commands that you would use for computations.
(i) Getting 0 out of 10 successes in a binomial distribution with probability 0.8. (1 mark)

(ii) X < 0.9 when X has the uniform distribution between 5 and 9. (1 mark)

(iii) X > 6.5 in a χ2 distribution with 2 degrees of freedom. (2 marks)

(d) Write the expected output of the following R program. (2 marks)

> z <- 0
> while(z<5) {
> z <- z+2
> print(z)
> }

(e) (i) Write a short R program using loops to compute the first 100 Fibonacci
numbers.
a1 = 1, a2 = 1, an = an−1 + an−2
(3 marks)
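A minimal sketch of the loop the question asks for:

```r
# First 100 Fibonacci numbers via a simple for loop.
fib <- numeric(100)
fib[1] <- 1
fib[2] <- 1
for (n in 3:100) fib[n] <- fib[n - 1] + fib[n - 2]
fib[1:8]   # 1 1 2 3 5 8 13 21
```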

(ii) Using a for loop write the R code to compute the product of the values
in the vector vec. Print the final product and also the intermediate
result during the iteration of the for loop. (3 marks)
(f) (i) Write a for loop in R to compute the sum of the squares of all integers
from 2 to 100. (2 marks)

(ii) Suppose that $10,000 is invested in a long-term Commercial Deposit that pays 8% interest compounded annually. Write a while loop to determine the number of years that it would take the money to triple (to equal or exceed $30,000), and the resulting balance after this number of years. (4 marks)
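A sketch of such a while loop:

```r
# Years for $10,000 at 8% compound interest to reach $30,000.
balance <- 10000
years <- 0
while (balance < 30000) {
  balance <- balance * 1.08
  years <- years + 1
}
c(years, round(balance, 2))   # 15 years; balance about 31721.69
```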
(g) Explain the main advantage of the Polar method compared with the Box-
Muller method for generating pairs of uncorrelated pseudo-random values
from a standard normal distribution. (2 marks)

QUESTION TWO (20 Marks)

Calculate the following probabilities and also write the corresponding R code
to generate the probabilities.
(a) Suppose a random variable X, has a Uniform distribution with a = 5 and
b = 9.

(i) P(5.5 ≤ X ≤ 8) (2 marks)



(ii) P( X > 7) (2 marks)

(b) Let X be an exponential random variable with λ = 0.5.

(i) P( X > 6) (2 marks)

(ii) P(5 ≤ X ≤ 6) (2 marks)

(c) The random variable X is normally distributed with mean 79 and variance
144.

(i) P( X ≤ 70) (2 marks)

(ii) P(64 ≤ X ≤ 96) (3 marks)

(iii) P( X > 85) (2 marks)

(d) Write the R code to simulate the number of accidents for each year for 15
years, when the average rate is 2.8 accidents per year, assuming a poisson
model for the number of accidents each year. (1 mark)
(e) Write an R function to compute the degree of skewness for any vector of
numbers x.
(4 marks)
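A sketch for part (e) using the moment-based coefficient of skewness m3 / m2^(3/2); other textbook definitions differ by a sample-size correction factor:

```r
# Sample skewness: third central moment over the 3/2 power
# of the second central moment.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3/2)
}
skewness(c(1, 2, 3, 4, 100))   # strongly right-skewed, so positive
```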

QUESTION THREE (20 Marks)

(a) Write the lines of R code required to compute the amount of interest to be
paid on a loan of $1000 held for 3 years at an interest rate of 4% compounded
annually.
(3 marks)
(b) Suppose, starting at his 25th birthday, Michael deposits 5,000 at the begin-
ning of every year into a retirement annuity that pays 9% interest per year,
compounded annually. He wants to retire when his annuity first reaches or
exceeds Shs 1 million. Write an R code to compute in how many years will
he be able to retire with this plan? (4 marks)
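A hedged sketch for the annuity question, assuming deposits are made at the start of each year so interest accrues on each deposit immediately:

```r
# Years of 5,000-per-year deposits at 9% until the fund first
# reaches or exceeds one million.
fund <- 0
years <- 0
while (fund < 1e6) {
  fund <- (fund + 5000) * 1.09   # deposit, then a year's interest
  years <- years + 1
}
years
```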
(c) Create a vector ‘values’ of length 100 with random values between 1 and
100. Use a for loop and an if statement to print only the values greater than
3. (3 marks)
(d) It is necessary to simulate samples from a distribution with density function

f ( x ) = 6x (1 − x ) 0 < x < 1.

(i) Use the acceptance-rejection technique to construct a complete algorithm for generating samples from f by first generating samples from the distribution with density h(x) = 2(1 − x). (5 marks)

(ii) Calculate how many samples from h would on average be needed to generate one realisation from f. (1 mark)

(iii) Explain whether the acceptance-rejection method in (i) would be more efficient if the uniform distribution were used instead. (4 marks)
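One way to set out the algorithm in (i) as R code (a sketch; since f(x)/h(x) = 6x(1−x)/(2(1−x)) = 3x is bounded by c = 3, a candidate Y drawn from h is accepted with probability 3Y/(3) = Y):

```r
# Acceptance-rejection for f(x) = 6x(1-x) on (0,1) with
# envelope h(x) = 2(1-x).  H(x) = 2x - x^2 = 1 - (1-x)^2,
# so the inverse cdf of h is 1 - sqrt(1 - u).
rf.one <- function() {
  repeat {
    y <- 1 - sqrt(1 - runif(1))    # candidate from h by inversion
    if (runif(1) <= y) return(y)   # accept with probability y
  }
}
xs <- replicate(5000, rf.one())
mean(xs)   # near 1/2, the mean of the Beta(2, 2) density f
```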

QUESTION FOUR (20 Marks)

(a) (i) Consider the random number generator

x j = 171 x j−1 mod 30269


u j = x j /30269, j = 1, 2, · · ·

where x0 is an initial seed.

Write an R function called myunif() which takes x0 and n as input arguments and returns a vector of length n containing the values u1, u2, u3, · · · , un.
(4 marks)
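A sketch of such a function (these constants, 171 and 30269, match the first component of the Wichmann-Hill generator):

```r
# Multiplicative congruential generator:
#   x_j = 171 * x_{j-1} mod 30269,  u_j = x_j / 30269.
myunif <- function(x0, n) {
  u <- numeric(n)
  x <- x0
  for (j in 1:n) {
    x <- (171 * x) %% 30269
    u[j] <- x / 30269
  }
  u
}
myunif(20018, 3)
</imports>
```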

(ii) Differentiate between multiplicative and mixed congruential generators, giving their recursive formulae. Generate four pseudo-random numbers using the multiplicative congruential generator with b = 171 and m = 32865 and an initial seed of 20018. (6 marks)
(b) Trace the flow through the following program and present your output in a table that includes all the steps and the values taken by the variables at each line.

1 x <- 3
2 for (i in 1:3) {
3 show(x)
4 if (x %% 2 == 1) {
5 x <- x*4
6 } else {
7 x <- 3/x + 5
8 }

9}
10 show(x)

(10 marks)
QUESTION FIVE (20 Marks)

(a) The continuous random variable X has probability density function f ( x ) and
cumulative distribution function F ( x ). Explain carefully how the inversion
method (i.e. the inverse c.d.f. method) of simulation could be used to
simulate observations from this distribution. What restrictions on F ( x ) are
required in order to make this method of simulation work? (4 marks)
(b) The following values are a random sample of numbers from a uniform dis-
tribution on the interval (0, 1).

0.167 0.236 0.778 0.968.

Use these values to generate 4 random variates from each of the following
distributions, carefully explaining the method you use in each case.

(i) f ( x ) = exp(− x ), x≥0 (4 marks)

(ii) f ( x ) = 4x (1 − x2 ), 0≤x≤1 (7 marks)

(iii) P(X = x) = e^(−2) 2^x / x!,  x = 0, 1, · · · . (5 marks)

Instructions: Answer QUESTION ONE and any other TWO


QUESTIONS.

QUESTION ONE (30 MARKS) (COMPULSORY)


(a) Monte Carlo methods depend crucially on the ability to generate pseudo
random numbers in the interval (0, 1). Make a short list of what you
consider to be the most important properties of a good pseudo random
number generator. (3 marks)
(b) Suppose X ∼ N (2, 0.25). Denote by f and F the density and distribution
function of X respectively. Write the R code to calculate

(i) f (0.5)

(ii) F (2.5)

(iii) F −1 (0.95) (recall that F −1 is the quantile function)

(iv) Pr (1 ≤ X ≤ 3). (8 marks)

(c) Using a for-loop compute the product of the values in the vector. Print the
final product, but also print the intermediate result during the iteration of
the for-loop.
(3 marks)
(d) Create a vector ‘values’ of length 10 with random values between 1 and 10.
Use a for-loop and an if-statement to print only the values greater than 3.
(3 marks)
(e) Write the lines of R code required to compute the amount of interest to
be paid on a loan of $25000 held for 3 years at an interest rate of 4%
compounded monthly.
(3 marks)
(f) The continuous random variable X has probability density function f ( x ) and
cumulative distribution function F ( x ). Explain carefully how the inversion
method (i.e. the inverse c.d.f. method) of simulation could be used to
simulate observations from this distribution. What restrictions on F ( x ) are
required in order to make this method of simulation work? (3 marks)
(g) Consider the following function

f(x) = k(x + 1)^3 for −1 ≤ x ≤ 1, and f(x) = 0 otherwise.

(i) Find the value of the constant k so that f(x) is a well-defined probability density function. (1 mark)

(ii) Use the inverse transformation method to generate three random observations from f(x) by hand. (3 marks)

(iii) Write an R function to generate 1000 observations from f(x). Use the following uniform random numbers in the given order:

r0 = 0.0257, r1 = 0.2331, r2 = 0.2589

(3 marks)

QUESTION TWO (20 Marks)


(a) Write a function in R that takes a vector (x1, x2, · · · , xn) and calculates both ∑ xi and ∑ xi². (Use the function sum.) (4 marks)
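A one-line sketch (returning the two results in a list is one possible choice; the names are illustrative):

```r
# Return both the sum and the sum of squares of a numeric vector.
sums <- function(x) list(sum = sum(x), sum.sq = sum(x^2))
sums(1:3)   # sum = 6, sum.sq = 14
```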
(b) Consider the following R code
x <- (-10):10
n <- length(x)
y <- rnorm(n, x, 4)
plot(x, y)
abline(0, 1)

Explain the effect of each command line and the graph that is produced.(4
marks)
(c) Let x <- matrix(6:9, nrow=2, byrow=T) and y <- matrix(2:5, nrow=2, byrow=T) be 2 by 2 matrices. Write the R output for each command line when executed.

(i) > t(x) + t(y)

(ii) > crossprod(x, y)

(iii) > x %*% y

(iv) > t(x) + y

(v) > solve(x %*% y)

(12 marks)

QUESTION THREE (20 Marks)


Using the R commands for the respective distributions, answer the following
questions. To answer these questions, you will need to write down the R
commands that you would use for computations.
(i) Simulate 20 observations from the binomial distribution with n = 15 and
p = 0.2.
(2 marks)
(ii) Find the 20th percentile of the gamma distribution with α = 2 and θ = 10.
(2 marks)
(iii) Find P( T > 2) for T ∼ t8 . (2 marks)
(iv) Plot the Poisson mass function with λ = 4 over the range x = 0, 1, · · · , 15.
(3 marks)
(v) Simulate 100 observations from the normal distribution with µ = 50 and
σ = 4. Plot the empirical cdf Fn ( x ) for this sample and overlay the true
CDF F ( x ). (4 marks)
(vi) Simulate 25 flips of a fair coin where the results are “heads” and “tails”.
(2 marks)
(vii) Generate 50 random numbers from a Cauchy distribution with θ = 1. (2 marks)
(viii) X > 6.5 in a χ2 distribution with 2 degrees of freedom. (3 marks)

QUESTION FOUR (20 Marks)


(a) (i) Consider the random number generator

x j = 171 x j−1 mod 30269


u j = x j /30269, j = 1, 2, · · ·

where x0 is an initial seed.

Write an R function called myunif() which takes x0 and n as input arguments and returns a vector of length n containing the values u1, u2, u3, · · · , un.
(4 marks)

(ii) Differentiate between multiplicative and mixed congruential generators, giving their recursive formulae. Generate four pseudo-random numbers using the multiplicative congruential generator with b = 171 and m = 32865 and an initial seed of 20018. (6 marks)
(b) Trace the flow through the following program and present your output in a table that includes all the steps and the values taken by the variables at each line.

1 x <- 3
2 for (i in 1:3) {
3 show(x)
4 if (x %% 2 == 1) {
5 x <- x*4
6 } else {
7 x <- 3/x + 5
8 }
9 }
10 show(x)

(10 marks)
QUESTION FIVE (20 Marks)
(a) The following numbers are a random sample of real numbers from a uniform
distribution on the range 0 to 1.

0.3612 0.6789 0.3552 0.2898

Use these values to generate four random variates from each of the following
distributions, explaining carefully the method you use in each case.
(i) P(X = x) = (3 choose x) (1/5)^x (4/5)^(3−x), x = 0, 1, 2, 3. (6 marks)

(ii) f_X(x) = 2x^3 exp(−(1/2)x^4), x ≥ 0. (6 marks)
(b) A small post office opens at 9.00 a.m. The time (minutes) until the first
customer arrives, and the times (minutes) between the arrivals of any two
consecutive customers thereafter, are all independent exponential random
variables with expected value 2.

The post office has just one server. As customers arrive, they join a queue
to be served. As soon as the server has finished serving one customer, she
starts to serve the first customer in the queue (if there is a queue). If a
customer arrives when no previous customer is being served and there is no
queue, then the server begins to serve that customer immediately on his or
her arrival. The service times (minutes) for individual customers are inde-
pendent uniform random variables on the range 1.5 to 2.5.

Use pairs of the uniform random numbers given in part (a), in the order given, to simulate the arrival and service times for each of the first two customers to arrive at this post office. Justify your method of simulation, and record the simulated arrival times of these customers and the time at which the server finishes serving them.
(8 marks)

DEDAN KIMATHI UNIVERSITY OF TECHNOLOGY


University Examinations 2014/2015
SECOND YEAR SECOND SEMESTER EXAMINATION FOR THE
DEGREE OF BACHELOR OF SCIENCE IN ACTUARIAL
SCIENCE
STA 2305 Statistical Programming I
DATE: 3RD FEBRUARY 2015 TIME: 11.00 AM − 1.00 PM
Instructions: Answer QUESTION ONE and any other TWO
QUESTIONS.
QUESTION ONE (30 MARKS) (COMPULSORY)
(a) Monte Carlo methods depend crucially on the ability to generate pseudo
random numbers in the interval (0, 1). Make a short list of what you
consider to be the most important properties of a good pseudo random
number generator. (3 marks)
(b) Suppose X ∼ N (2, 0.25). Denote by f and F the density and distribution
function of X respectively. Write the R code to calculate

(i) f (0.5)

(ii) F (2.5)

(iii) F −1 (0.95) (recall that F −1 is the quantile function)

(iv) Pr (1 ≤ X ≤ 3). (8 marks)

(c) Using a for-loop compute the product of the values in the vector. Print the
final product, but also print the intermediate result during the iteration of
the for-loop.
(3 marks)
(d) Create a vector ‘values’ of length 10 with random values between 1 and 10.
Use a for-loop and an if-statement to print only the values greater than 3.
(3 marks)
(e) Write the lines of R code required to compute the amount of interest to
be paid on a loan of $25000 held for 3 years at an interest rate of 4%
compounded monthly.
(3 marks)

(f) The continuous random variable X has probability density function f ( x ) and
cumulative distribution function F ( x ). Explain carefully how the inversion
method (i.e. the inverse c.d.f. method) of simulation could be used to
simulate observations from this distribution. What restrictions on F ( x ) are
required in order to make this method of simulation work? (3 marks)
(g) Consider the following function

f(x) = k(x + 1)^3 for −1 ≤ x ≤ 1, and f(x) = 0 otherwise.

(i) Find the value of the constant k so that f(x) is a well-defined probability density function. (1 mark)

(ii) Use the inverse transformation method to generate three random observations from f(x) by hand. (3 marks)

(iii) Write an R function to generate 1000 observations from f(x). Use the following uniform random numbers in the given order:

r0 = 0.0257, r1 = 0.2331, r2 = 0.2589

(3 marks)

QUESTION TWO (20 Marks)


(a) Write a function in R that takes a vector (x1, x2, · · · , xn) and calculates both ∑ xi and ∑ xi². (Use the function sum.) (4 marks)
(b) Consider the following R code
x <- (-10):10
n <- length(x)
y <- rnorm(n, x, 4)
plot(x, y)
abline(0, 1)

Explain the effect of each command line and the graph that is produced.(4
marks)
(c) Let x <- matrix(6:9, nrow=2, byrow=T) and y <- matrix(2:5, nrow=2, byrow=T) be 2 by 2 matrices. Write the R output for each command line when executed.

(i) > t(x) + t(y)

(ii) > crossprod(x, y)

(iii) > x %*% y

(iv) > t(x) + y

(v) > solve(x %*% y) (12 marks)

QUESTION THREE (20 Marks)


State briefly the method you would use to generate pseudo-random numbers on a computer from each of the following statistical distributions, and, using the R commands for the respective distributions, write the R code you would use to generate 100 numbers from each distribution:
(a) the continuous uniform distribution on the range [0,1). (2 marks)
(b) the continuous uniform distribution on the range [1900,2500). (2 marks)
(c) the standard normal distribution. (2 marks)
(d) the exponential distribution with mean 2. (2 marks)
(e) the chi square distribution with 6 degrees of freedom. (2 marks)
(f) the distribution with density function φ(x) = √(2/π) e^(−(1/2)x²), x > 0. (2 marks)
(g) the distribution of X | X > 0 where X has a Poisson distribution with mean
5.
(2 marks)
(h) the distribution of X | X > 100000 where X has an exponential distribution
with mean 100,000. (2 marks)
(i) the Cauchy distribution with θ = 1. (2 marks)
(j) the 20th percentile of the gamma distribution with α = 2 and θ = 10. (2
marks)
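Hedged one-line sketches for most of the parts above (one plausible reading of each; θ in (i) is taken as the Cauchy location parameter and θ in (j) as the gamma scale parameter):

```r
runif(100)                           # (a) uniform on [0, 1)
runif(100, 1900, 2500)               # (b) uniform on [1900, 2500)
rnorm(100)                           # (c) standard normal
rexp(100, rate = 1/2)                # (d) exponential with mean 2
rchisq(100, df = 6)                  # (e) chi-square with 6 df
abs(rnorm(100))                      # (f) half-normal: |Z| has density sqrt(2/pi) exp(-x^2/2)
qpois(runif(100, ppois(0, 5), 1), 5) # (g) Poisson(5) conditioned on X > 0, by inversion
100000 + rexp(100, rate = 1e-5)      # (h) memoryless: X | X > 100000 for Exp(mean 100000)
rcauchy(100, location = 1)           # (i) Cauchy with theta = 1
qgamma(0.20, shape = 2, scale = 10)  # (j) 20th percentile of the gamma
```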

QUESTION FOUR (20 Marks)


(a) (i) Briefly explain why it is important for the values in a sequence of
pseudo-random numbers to be uncorrelated. Give an example to illus-
trate your answer.
(4 marks)

(ii) Differentiate between a multiplicative and mixed congruential gener-


ators giving their recursive formula. Generate four pseudo-random
numbers using the multiplicative congruential generator with b = 171
and m = 32865 with an initial seed of 20018. (6 marks)
(b) The following algorithm can be programmed on a computer to generate
pseudo random numbers ( X ) from a Poisson distribution with mean λ.

Step 1 INPUT LAMBDA

Step 2 LET X = 0

Step 3 LET SUM = 0

Step 4 LET U = RANDOM VALUE FROM U (0, 1) DISTRIBUTION

Step 5 LET SUM = SUM − LN(U)/LAMBDA

Step 6 IF SUM < 1 THEN LET X = X + 1: GOTO Step 4

Step 7 RETURN X.

(i) Study this algorithm and show that it gives the correct probability of
obtaining the value X = 0. (2 marks)

(ii) Use it to generate two values from a Poisson distribution with mean
λ = 2, using the following pseudo-random values from a uniform dis-
tribution on (0, 1): 0.564, 0.505, 0.756, 0.610, 0.046. (8
marks)

QUESTION FIVE (20 Marks)


(a) The following numbers are a random sample of real numbers from a uniform
distribution on the range 0 to 1.

0.3612 0.6789 0.3552 0.2898

Use these values to generate four random variates from each of the following
distributions, explaining carefully the method you use in each case.
(i) P(X = x) = (3 choose x) (1/5)^x (4/5)^(3−x), x = 0, 1, 2, 3. (6 marks)

(ii) f_X(x) = 2x^3 exp(−(1/2)x^4), x ≥ 0. (6 marks)

(b) The marks, denoted by X, obtained by candidates taking an aptitude test can be modelled by assuming that X has a normal distribution with mean µ and variance σ², where σ is a positive constant.

Use the Box-Muller formulae to generate simulated aptitude test marks in the case where µ = 5000 and σ = 1000, using the following pairs of pseudo-random numbers from a uniform (0, 1) distribution: (0.558, 0.585) and (0.080, 0.512). (4 marks)
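A sketch of the Box-Muller transform in R (each pair (u1, u2) yields two independent N(0, 1) values, which are then rescaled to the stated mean and standard deviation; the function name is illustrative):

```r
# Box-Muller: u1 gives the radius, u2 the angle.
box.muller <- function(u1, u2) {
  r <- sqrt(-2 * log(u1))
  c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))
}
z <- c(box.muller(0.558, 0.585), box.muller(0.080, 0.512))
marks <- 5000 + 1000 * z   # rescale N(0,1) values to N(5000, 1000^2)
</imports>
```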
(c) You are given that f ( x ) = (1/9) x2 for 0 ≤ x ≤ 3. You are to simulate three
observations from the distribution using the inversion method. The following
three random numbers were generated from the uniform distribution on [0,1]:

0.008, 0.729, 0.125

Using the three simulated observations, estimate the mean of the distribu-
tion.
(4 marks)

DEDAN KIMATHI UNIVERSITY OF TECHNOLOGY


University Examinations 2015/2016
SECOND YEAR SECOND SEMESTER EXAMINATION FOR THE
DEGREE OF BACHELOR OF SCIENCE IN ACTUARIAL
SCIENCE
STA 2305 Statistical Programming I
DATE: 14th December 2015 TIME: 11.00 AM − 1.00 PM
Instructions: Answer QUESTION ONE and any other TWO
QUESTIONS.
QUESTION ONE (30 MARKS) (COMPULSORY)
(a) (i) Explain the disadvantages of using truly random numbers as opposed
to pseudo random numbers, in Monte Carlo simulations. (2 marks)

(ii) List four methods for the generation of random variates. (2 marks)
(b) You are given that f(x) = (1/9)x² for 0 ≤ x ≤ 3.

You are to simulate three observations from the distribution using the inver-
sion method. The following three numbers were generated from the uniform
distribution on [0, 1]:

0.008 0.729 0.125

Using the three simulated observations, estimate the mean of the distribu-
tion.
(4 marks)
(c) Write the expected output of the following R code:

(i) z <- 0
repeat {
z <- z + 1
print(z)
if (z > 5) break
}
z (2 marks)

(ii) x <- 1
while(x < 5) x <- x+1; if (x == 3) break; print(x); (2 marks)

(iii) x <- 1
while(x < 5) x <- x+1; if (x == 3) next; print(x); (2 marks)

(c) The number of claims ( X ) from a health insurance policy over a one year
period has the following probabilities:

P( X = 0) = 0.70, P( X = 1) = 0.15, P( X = 2) = 0.08,


P( X = 3) = 0.05, P( X = 4) = 0.02, P( X > 4) = 0.

(i) Describe how you would use the acceptance/rejection method to gen-
erate discrete pseudo-random samples from this distribution, by using
as reference the discrete uniform distribution that has probability 0.2
for each of the values 0, 1, 2, 3 and 4. You may assume that you can
also generate samples from the Bernoulli distribution, i.e., a sample
that is equal to 1 with probability p and 0 with probability 1 − p, for
any specified value of p in the range (0,1). (4 marks)

(ii) What proportion of values generated as above will be accepted? Explain. (2 marks)

(iii) Describe an alternative method that could be used to generate pseudo-random samples from the above distribution. (2 marks)

(d) Create a vector ‘values’ of length 100 with random values between 5 and 20.
Use a for-loop and an if-statement to print only the values greater than 10.
(3 marks)
(e) Compute the amount of interest to be paid on a loan of Ksh.50,000 held for
3 years at an interest rate of 12% compounded monthly. Write the lines of
R code required to compute the loan balance on a monthly basis and display
the same. (5 marks)

QUESTION TWO (20 Marks)



To answer these questions, you will need to compute the values for the re-
spective distributions and write down the R commands that you would use
for the same computations.
(a) Suppose the mass of shire horses is assumed to have a normal distribution
with mean 1000kg and standard deviation 50kg. Calculate the probability
that the mass of a shire horse

(i) is less than 975kgs.

(ii) is more than 975kgs.

(iii) is between 940kg and 1030kg.

(iv) exceeds 1030kg. (8 marks)

(b) Suppose that T has the Weibull distribution with parameters α = 0.2 and
β = 10. Find (i) the probability that T takes a value between 20 and 30,
and (ii) the 95th percentile of T. (4 marks)
(c) Suppose that 10% of a certain type of components last more than 600 hours
of operation. For n = 200 components, let X denote the number of those
that last more than 600 hours. Find (approximately):
(i) P( X ≤ 30), (ii) P(15 ≤ X ≤ 25) and (iii) P( X = 25). (6 marks)
(d) Suppose the mean checkout time of a supermarket cashier is three minutes.
Find the probability of a customer checkout being completed by the cashier
in less than two minutes, three minutes and four minutes. (i.e. what per-
centage of “waiting times” are less than two, three and four minutes?) (2
marks)

QUESTION THREE (20 Marks)


(a)
(i) Explain what is meant by the term “linear congruential generator” and state, with reasons, three criteria that are important when devising a linear congruential generator. (4 marks)
(ii) Describe the three integers one has to choose in order to form a linear
congruential generator (LCG) and set out a recursive relationship, with
subsequent scaling, for generating a sequence of pseudo random numbers in
the range −1 to 1.
(3 marks)

(iii) Differentiate between multiplicative and mixed congruential generators, giving their recursive formulae. Generate four pseudo-random numbers using the multiplicative congruential generator with b = 181 and m = 3285 and an initial seed of 2015. (6 marks)
(b)
(i) Given a random number U uniformly distributed over [0, 1], give an expression in terms of U for another random variable X whose density function is f(x) = 5 exp(−5x). Justify the expression. (2 marks)
(ii) It is required to generate a sequence of simulated observations from the
density function
k exp(−5x )
g( x ) = , x > 0,
1+x
where k is a constant not involving x. Describe a procedure that applies the
Acceptance Rejection method to obtain pseudo random samples from this
distribution. Is it necessary to determine the constant k in order to use this
method?
(5 marks)

QUESTION FOUR (20 Marks)


To answer part (a) of this question, you will need to compute the values for
the respective distributions and write down the R commands that you would
use for the same computations.
(a) Keith’s Florists has 15 delivery trucks, used mainly to deliver flowers and flower arrangements in Nairobi. Of these 15 trucks, 6 have brake problems. A sample of 5 trucks is randomly selected. Calculate the probability that 2 of those tested have defective brakes, and write R code to implement the same. (5 marks)
(b) Write the R code to generate a sample of 1000 exponential random variables
with rate parameter equal to 2, and calculate the mean of those variables.
(3 marks)
(c) Consider the following function in R:

myfct <- function(x1, x2=5) {
z1 <- x1/x1
z2 <- x2*x2
myvec <- c(z1, z2)
return(myvec)
}

Write the expected output of the following R codes


(i) myfct (2 marks)

(ii) myfct(x1=2, x2=5) (2 marks)


(iii) myfct(x1=2) (1 mark)
(d) A loan of amount Ksh.5,000 is payable in 5 equal annual instalments at an effective annual rate of interest of 10% p.a. Calculate the annual instalment amount and construct the amortization table for the loan. Write an R function to generate the amortization table for the loan. (7 marks)
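A sketch of one way to answer part (d) (function and column names are illustrative): the level payment solves the annuity equation, and each year's interest/principal split is then tabulated.

```r
# Amortization schedule: level payment, then year-by-year split
# of each payment into interest and principal repaid.
amortize <- function(principal, rate, n) {
  payment <- principal * rate / (1 - (1 + rate)^(-n))
  balance <- principal
  sched <- data.frame(year = 1:n, interest = NA_real_,
                      principal = NA_real_, balance = NA_real_)
  for (year in 1:n) {
    interest <- balance * rate
    balance <- balance - (payment - interest)
    sched[year, 2:4] <- c(interest, payment - interest, balance)
  }
  sched
}
amortize(5000, 0.10, 5)   # annual payment is about Ksh.1318.99
```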

QUESTION FIVE (20 Marks)


(a) Five pseudo-random numbers (3 digit) are generated as below:

271, 399, 240, 704, 285

Using the above numbers, generate a random sample from the chi-square distribution with 10 degrees of freedom. (6 marks)
(b)
(i) Let n be an integer and suppose that X1, X2, · · · , Xn are independent random variables, each having an exponential distribution with parameter λ. Show that Z = X1 + X2 + · · · + Xn has a Gamma distribution with parameters n and λ. (2 marks)
(2 marks)
(ii) By using this result, generate a random sample from a Gamma distribution
with mean 30 and variance 300 using the following 5 digit pseudo-random
numbers.
63293 43937 08513
(4 marks)
(c) You are given a stream of independent uniform [0, 1] random variables U1, U2, U3, · · · .
Describe how to use these random variables to generate random variates with
the following distributions:
(i) The Weibull distribution with probability density function

f(x) = 30x² exp(−10x³) for x > 0, and f(x) = 0 otherwise.

(4 marks)

(ii) You are to simulate four observations from a binomial distribution with two trials and probability of success 0.30. The following random numbers were generated from the uniform distribution on [0, 1]:

0.91 0.21 0.72 0.48

Determine the number of simulated observations for which the number of successes equals zero. (4 marks)
