Analysis of Vaccine Efficacy

Download as pdf or txt
Download as pdf or txt
You are on page 1of 37

Analysis of Vaccine Ecacy

By

Manjeet Sidhu ms648@kent.ac.uk

MA867: Dissertation

Submitted in partial fulllment of the requirements of the degree of MSc Statistics in the School of Mathematics, Statistics and Actuarial Science at University of Kent, 2012

Project supervised by: Dr Owen Lyne

Analysis of Vaccine Ecacy


Manjeet Sidhu School of Mathematics, Statistics and Actuarial Science University of Kent 2012

ABSTRACT

ii

Table of Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Background research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 The SIR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Dening the threshold parameter . . . . . . . . . . . . . . . . . . . . . . 3.1.1 3.2 3.3 3.2.1 Calculating the beta values in Matlab . . . . . . . . . . . . . . . .

1 2 4 5 8

Final size probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Calculating the nal size probabilities in Matlab . . . . . . . . . . 11 Formation of the likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 12 15

4 Application on epidemic data . . . . . . . . . . . . . . . . . . . . . . . 4.1 4.2

Simulated data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Real life inuenza data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 22

5 Analysis of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 5.2 5.3

Relationship between z and R . . . . . . . . . . . . . . . . . . . . . . . 22 Inuence of R on the severity of the epidemic . . . . . . . . . . . . . . . 23 Eect of the L and G on the epidemic . . . . . . . . . . . . . . . . . . 24

6 Vaccination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 6.1 All or nothing vaccine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 31 33 34

iii

List of Figures, Matrices and Data sets


The list below is sorted in order of appearance within the dissertation. Figure 3.1.1: calculating the beta values in Matlab (A3) Figure 3.2.1: calculating the nal size probabilities in Matlab (A5) Data set 4.1: simulated epidemic data Matrix 4.1: log-likelihood matrix for the simulated data Figure 4.1: 2d and 3d contour plots of the log-likelihood surface for the simulated data Data set 4.2: inuenza epidemic data Matrix 4.2: log-likelihood matrix for the inuenza data Figure 4.2: 2d and 3d contour plots of the log-likelihood surface for the inuenza data Figure 5.2: plot showing the distribution of infectives under a large threshold value Figure 5.3: pie chart comparing the eect of a globally-driven epidemic with a locallydriven epidemic Figure 6.1: calculating probabilities of successful vaccinations in Matlab (A18)

iv

Chapter 1 Introduction
Throughout this project we will attempt to analyse numerous data sets based on infectives in order to examine and control the behaviour of epidemics for a closed population of small households i.e. a population of households that cannot contract a disease from an external source, and so the model that we decide to adopt is of great importance. Chapter 2 discusses the background regarding epidemic models, and certain parameters of which we will be incorporating within our investigation such as that of the threshold, within and between household contact rates and the infectious period. In chapter 3, we discuss in detail the SIR (susceptible infected removed) model by Ball et al. (1997) and its assumptions . Within this chapter we focus particularly on the threshold parameter which is the primary focus of this dissertation since its existence broadly denes an epidemic and we will also learn that its magnitude correlates with the severity of the epidemic in question. Due to the importance of this parameter, there are numerous components and computations that we will be required to produce beforehand and this chapter explains these parameters in detail with information on how they are derived and programmed within matlab. Providing an accurate estimate of the threshold parameter is crucial for our investigation and so in chapter 3 we also discuss how to construct the likelihood based on multinomial probabilities of infection. Chapter 4 involves maximizing model parameters given specic data sets, in particular we will consider two data sets; one based on simulations of a single household size and the other involving real life data on inuenza involving households of various sizes. Chapter 5 attempts to further analyse the parameters of importance and examines their eect on the way in which the epidemic behaves, and nally we introduce vaccination strategies into our model by considering dierent types of vaccine in Chapter 6. Here we focus particularly on an all or nothing vaccine in an attempt to discover that given the vaccines ecacy, the proportion of individuals within a population that it is necessary for us to vaccinate in order to prevent an epidemic from occurring, which relates directly to the post-vaccination threshold parameter.

Chapter 2 Background research


One of the key aspirations of examining epidemic behaviour is to provide us with knowledge of how to prevent the outbreak of an infectious disease within a population of humans. The celebrated threshold theorem facilitates this procedure since it enables us to analyse the epidemic within its earlier stages providing the population is large, hence allowing us to apply possible modications that inuence the application of the epidemic in order to prevent or at least maintain its impact on the population. This theorem is represented in the form of a threshold parameter which denes an epidemic occurrence based on whether its value is greater than one. Ideally we will attempt to adopt vaccination strategies and determine the proportion of individuals in a population of households that must be vaccinated in order to prevent a dierent post-vaccination threshold parameter from exceeding this value. The beginning of the previous paragraph indicates the importance of modelling the epidemic behaviour of an infectious disease and in fact this is well explained in Addy et al. (1991). Its opening paragraph states that Epidemic models can be used as mathematical tools for the analysis of the transmission of infectious diseases. An epidemic model oers a convenient summary to infectious disease data, but a more important use is to provide understanding of the biological and sociological mechanisms of disease transmission. And so by researching the history of epidemic models, we begin to develop a broader understanding of the purpose of our investigation. Rushton and Mautner (1995) were the rst to consider epidemic models with a two level mixing structure (explained later in the chapter), however it was Bartoszynski (1972) whose paper discussed a discrete epidemic model which was the rst paper to implement epidemic models amongst a population comprising of small households. Numerous papers have since appeared which have considered models involving small households such as Greenwood and Reed-Frost chain-binomial model by Bailey (1975). Becker and Dietz (1995) and Ball et al. (1997) discuss in detail, the SIR epidemic model that we shall be implementing within this dissertation and hence these authors provide crucial information of relevance. Of course each epidemic model is unique and arrives with its own assumptions, however the common assumption made regarding the majority of epidemic models is that the population being examined (epidemic stage) or treated (vaccination stage) is large but nite, locally and globally. A population that is locally large refers to a population that is sectionalized into groups such as age, gender, location and medical history, and so 2

each of these must be large which of course induces a globally large population. It is important to note that due to the nature of the SIR model that we shall be considering and in fact any modern models that treat households as the unit for analysis, locally large populations are clearly unrealistic due to household structures and this will be explained further in Chapter 3. As stated within this chapter, it was in fact Bushton and Mautner (1995) who was the rst to consider a two-level mixture structure for epidemic modelling. Regarding the SIR model, by the mixing behaviour we refer to contacts between dierent households and contacts within the same household. Both levels of mixing comprise of a separate contact rate that impacts the threshold parameter in its own unique way. These rates of contacts cause individuals to become infected for a certain amount of time, which is commonly referred to as the infectious period. Many writers interested in epidemics have attempted to experiment with multiple distributions (both continuous and discrete) for this infectious period and in particular, Addy el al. (1991) suggests a gamma distribution with parameters a = 2 and b = 2.05, whose product represents the average length of an infectious period in days for any given individual that has been infected within the population (4.1 days). Ball and Lyne (2002) also adopt this distribution and hence we shall also be applying this distribution to the infectious period.

Chapter 3 The SIR model


As stated earlier, it is of particular interest for us to incorporate the SIR (Susceptible Infective Removed) households model of Ball et al. (1997), which includes the model of Becker and Dietz (1995). The model is designed especially for a population of small households of which is unable to contract the disease from a source outside of its realm. The process involved in a SIR epidemic begins with a susceptible individual (an individual that is able to contract the disease) becoming infected, either globally with rate G where N denotes the total population, or locally with rate L , both at points N of a homogeneous poisson stochastic process. By a global infection, we refer to contacts between dierent households and a local infection is dened as contacts within the same household. Naturally we expect the local rate to be greater than the global rate i.e. L > G , since a susceptible is much more likely to contract an infectious disease from N a resident of the same household that he/she is a habitant of, as opposed to an infective residing in a dierent household. This is one of the main assumptions of this model, thus a valid and realistic assumption in practice. Referring back to the stage process of the SIR epidemic, once the individual has become infected, they shall remain infected until their infectious period TI reaches termination. As cited in Addy et al. (1991), Ball (1986) develops a stochastic epidemic model that allows for any distribution for the length of the infectious period, provided that its Laplace transform can be specied, hence the distribution of the infectious period is arbitrary but must be specied, and as mentioned towards the end of the previous chapter, we will consider a gamma distribution for TI . Regardless of the distribution, once their infectious period has expired, the individual is removed from the model and immune to any further infection. The process of a SIR epidemic begins with a xed number of initial infectives and terminates when there are no longer any infected individuals present within the population. Upon its termination, conclusions can be drawn based on the nal size of the epidemic. In order for the model to be relevant, it is necessary to outline the assumptions that coexist with its application. The model assumes independence between the contact parameters and the infectious periods of each infective and also assumes that there is no latent period i.e. that once an individual becomes infected, they can immediately begin 4

to infect other susceptibles within the population. The latter assumption eliminates the necessity of incorporating an infection rate (not to be confused with the infectious period which in this case can be seen as a removal rate) within our model and hence simplies its implementation. Previous studies have shown that the exclusion of the latent period does not eect the way in which the epidemic behaves; its nal outcome is invariant to the inclusion or exclusion of the latent period. We stated earlier in Chapter 2 that under most epidemic models, if the population was split into groups such as households or workplaces, then each of these must be large; which is referred to as a locally large population. Although this is a requirement in most deterministic models, clearly within the current household setting, locally large populations are unrealistic due to the population being split into a community of households and these tend to be quite small in terms of its number of occupants. Hence, recent work regarding epidemiology has involved relaxing this assumption, see for example Andersson (1999).

3.1

Dening the threshold parameter

During the preliminary phase of the epidemic and under the assumption that the number of households within the population is large and that the number of initial infectives at the beginning of the epidemic is small, it is likely that each global contact is with an individual inhabiting in a household comprising completely of susceptible individuals. It is this branching process that allows us to estimate the initial growth of the epidemic and introduce threshold behaviour into the discussion since we expect every contact to result in a new infective. Note that the branching process tends to break down in the case of local populations not being large due to the probability of a previous contact in the early stages not being close to one, and so we tend to relax this assumption. Within its current context, one of the most imperative aspects of epidemiology is the celebrated threshold theorem. The theorem dened by Ball and Lyne (2002) states that a global epidemic occurs if and only if as the total number of households m tends to innity (m ), the epidemic infects innitely many households. The application of the theorem for the current household setting is expressed through the threshold parameter R (A15 & A16), which basically denes an occurrence of an epidemic nature. Deriving the threshold parameter requires numerous computations involving other meaningful parameters, of which we will now begin to explain. Let us suppose that within our closed population consists a number of households of size n (where n = 1,2 .. until a nite number), which we denote as mn . Clearly the total number of households within the population can be denoted as m = n=1 mn and the total number of individuals can be dened as N = n=1 nmn . These denitions of totals enable us to express the proportion of households of size n as n = mn m 5

(A1) and the proportion of individuals residing in a single household of size n as n = n nmn = n k (A2). The latter parameter can also be regarded as the probability N k=1 k that a global contact is with an individual residing in a household of size n, and will also contribute in assisting us to dene the threshold parameter. By considering n,a (A12) to be the mean nal size (including initial infectives) of a single household epidemic where a denotes the number of individuals in a household infected at the beginning of the process and n denotes the number of susceptibles in the same household (the household size minus the initial infectives). From (2.25) and (2.26) of Ball (1986),
n

n,a = n + a
k=0

n k (L k)n+ak k

(3.1)

In (3.1), () (A13) where > 0, denotes the moment generating function of the infectious period TI = E(eTI ). By denoting TI as y, we have

e
0

y a1 e b dy = ba (a)

1 ( 1+b )a
y

y a1 e b where y Gamma(a, b) with corresponding f (y) = ba (a)

In order to determine the k (A3), Ball (1986) suggests a recursive procedure by rearranging (3.2) and attempting to formulate a trend for the beta values for k = 0,1,2,3,4.. which will be required to determine the mean nal size.
k

i=0

k i (L i)ki = k i

(3.2)

k=0 k=1 k=2

0 0 1 0 2 0

0 (L .0)0 = 0 = 0 = 0 0 (L .0)1 + 0 (L .0)2 +


1 1 2 1

1 (L .1)0 = 1 = 1 = 1 1 (L .1)1 +
2 2

2 (L .2)0 = 2 = 2 = 2 [
3 3

2 1

1 (L .1)1 ]

k = 3 3 0 (L .0)3 + 3 1 (L .1)2 + 3 2 (L .2)1 + 0 1 2 = 3 = 3 [ 3 1 (L .1)2 + 3 2 (L .2)1 ] 1 2

3 (L .3)0 = 3

k = 4 4 0 (L .0)4 + 4 1 (L .1)3 + 4 2 (L .2)2 + 4 3 (L .3)1 + 1 2 3 0 = 4 = 4 [ 4 1 (L .1)3 + 4 2 (L .2)2 + 4 3 (L .3)1 ] 1 2 3

4 4

4 (L .4)0 = 4

Thus, generally
k1

k = k
i=1

k i (L .i)ki i

(3.3)

We now have all the components required for expressing the threshold parameter,

R = G E(TI )
n=1

n n

(3.4)

Note that a parameter dened as the reproductive ratio R0 is considered one of the most important concepts in disease epidemiology. However, in the household context, this reproductive ratio tends to be a poor indicator of threshold as households tend to be small. In our case R = R0 where R0 = G E(TI ). Let us now begin to describe the derivation of (3.4) and its alternative representations by considering the case where an initial infective globally contacts a completely susceptible household of size n. Suppose Tn denote the total number of individuals infected with the inclusion of the initial infective and An denote the sum of their infectious periods, often referred to as the severity of the epidemic. Let Rn be the total number of global contacts occurring from the process, which has a poisson distribution with a mean of G An and so, Rn G An . By denoting R as the number of infected households developing from this epidemic, the threshold parameter is simply the expectation of this parameter and so

R = E(R) =
n=1

n E(Rn ) = G
n=1

n E(An )

which simplies to (3.4) since E(An ) = E(Tn )E(TI ) and n = E(Tn ). As described graphically on page 123 by Ball and Lyne (2002), if R > 1 then the probability of an outbreak of a disease is denitely above zero. In fact, the larger the value of the threshold parameter, the greater the magnitude of the epidemic (regardless of the contact rate values which have no inuence in this scenario). The probability of the epidemic is 1 pa where a is the initial infectives and p is the root of f (s) = E(sR ) = s. In order to compute the threshold parameter and other parameters of importance, we will be using the mathematical package Matlab. Each parameter will have its own Matlab le and the function will be called when its use is required. Let us begin by describing the layout and features of this program. We will be using Matlab R2011b throughout this dissertation. The program allows users to create functions as m les. Let us now describe a basic function in Matlab. 7

function myf = myfunction(x) % function that takes scalar x as the argument n = 5; % set n to 5 myf = zeros(1,n); % create vector of n zeros for i = 1:n % create for loop from 1 to n myf(i) = i * x % multiply x by the current iteration in the loop end The for loop is a loop that will be required to construct most of the parameters within our model. In the example above, the loop begins from 1 until n and the value at each iteration i is multiplied by a scalar x to determine the input for the myf vector. Some functions will require these loops to be inside of other loops which are referred to as nested loops. These m les are displayed as a list by default on the left window providing the user is inside of the correct directory. The main window in the center is the command window where users utilise these functions by providing the correct arguments, and also the output is displayed within this window. An example of this output for the corresponding function above is shown on the next page. >> myfunction(5) ans = 5 10 15 20 25

Certain features of Matlab will be benecial for the purpose of this investigation such as the calling of functions; within a function that requires the usage of another function, one can call this function to save time and facilitate in simplifying certain parameters. Now let us consider how one could make use of Matlab in order to compute the beta values that we have recently determined in the form of an equation.

3.1.1

Calculating the beta values in Matlab

Earlier we derived (3.3) to calculate the beta values required for the mean nal size. This is computed in Matlab in Figure 3.1.1.

function beta = beta_k(a,b,lambdaL,k) p = zeros(1,k-1); for i = 1:k-1 p(i) = phi(a,b,lambdaL,i); end beta = zeros(1,k); beta(1) = 1; betasum = zeros(1,k-1); for i = 2:k for j = 1:i-1 betasum(j) = nchoosek(i,j)*beta(j)*p(j)^(i-j); end beta(i) = i - sum(betasum); end

Figure 3.1.1: calculating the beta values in Matlab (A3) The input arguments for this function are the gamma parameters (a and b), the local lambda contact rate (lambdaL) and the number of susceptibles in a single household (k). The phi function is called and its values are stored in the p vector. The beta vector is then created and 1 is set to one and hence the i loop begins from 2. A double loop is then formed for the summation required within the beta equation and its values are stored in the betasum vector. Finally the beta vector is updated by using the current iteration within the loop (i) and subtracting the summation of the current step (j) within the betasum vector. (A3) provides further comments within the le. This function is then called upon in the mean nal size le which calculates the mean nal size for a single household epidemic, as seen in equation (3.1). The function is then modied to cope with the specialised case of one initial infective (A11) and it is this function that is utilised in the threshold parameter le. Let us produce an example with a = 2, b = 2.05, L = 1 and k = 5. ans = 1.0000 1.7850 2.7594 3.7633 4.7654

The output shown are the beta values for k = 1..5 i.e. 4 = 3.7633.

3.2

Final size probabilities

By considering a completely disease-free household of size n, the nal size probabilities refer to the probabilities of a particular number of susceptibles being infected by the end of the epidemic. Its relevance is fundamental in our discussion as it provides a framework that enables us to utilise maximum likelihood approaches in order to estimate L (local rate) and (escape probability) which are required in the calculation of the threshold parameter. The local contact rate enters R through the mean nal size (in the specialised case involving n 1 susceptibles and one initial infective) and the escape probability is required to calculate the proportion of individuals infected by the epidemic z (A20), as shown in (3.5). n n 1 (1 )k nk nk,k (3.5) z= n n k n=1 k=1 which is utilised in (3.6) and with some simple rearranging can be used to determine G , providing us with all the components required to calculate R (since no estimation is necessary to determine the other parameters in the threshold parameter equation). = exp(G zE(TI )) = G = log zE(TI ) (3.6)

Hence the calculation of the nal size probabilities allows us to estimate L and , which helps us determine n,a , z and G . is inputted directly into (3.6) whereas L relates to (3.6) via z which requires L to determine the mean nal size. Let us now revert back to the discussion involving the nal size probabilities and how to derive each of them by considering Equation 4 in Addy et al. (1991) and applying our notation and also adapting the equation to t with the context of our model regarding households. By doing this, we arrive with (3.7).
j

1=
w=0

j n Pw /(L (n j))w nj , j 0 w

(3.7)

We proceed in a similar manner as that of the previous section where we determined the beta values; by that meaning a recursive procedure is adopted.
n j = 0 .. 1 = ( (n0))0 n0 = P0 = n L
n (0)P0 0

j = 1 .. 1 = ( (n1))0 n1 L

n (1)P0 0

n (1)P1 1

(L (n1))1 n1

0 = n1

Pn

n P1 (L (n1))1 n1

10

0 n = P1 = (1 n1 )(L (n 1))1 n1 = (1 )(L (n 1))1 n1

Pn

By continuing this procedure, we derive at a general formula for the probability of i infectives in a household of size n, Pin = (1
n Pw )(L (n k))k nk w nk (L (n k)) w=0 k1 k w

(3.8)

As explained in Addy et al. (1991), these nal size probabilities fail to take into account the numerous ways of an individual becoming infected. Hence the probabilities in (3.8) must be multiplied by a binomial coecient as there are n dierent ways that i indii viduals may be infected in a household of size n. Thus we have Pin = n P n i i (3.9)

3.2.1

Calculating the nal size probabilities in Matlab

By using Equation 4 from Addy el al. (1991) and applying adjustments to suit our model context, we derived an equation for the nal size probabilities as shown in (3.8) and (3.9). The way this is computed inside of Matlab is shown in Figure 3.2.1. function fs2 = final_size(a,b,pi,lambdaL,n) fs = zeros(1,n+1); fs(1) = pi^n; p = zeros(1,n); for i=1:n p(i) = phi(a,b,lambdaL,n-i); end for i=1:n fssum = zeros(1,i); for w=0:i-1 fssum(w+1) = (nchoosek(i,w)*fs(w+1))/(((p(i))^w)*(pi)^(n-i)); end fs(i+1) = (1 - sum(fssum))*((p(i))^(i))*((pi)^(n-i)); end fs2 = zeros(1,n+1); for i=0:n fs2(i+1) = nchoosek(n,i)*fs(i+1); end Figure 3.2.1: calculating the nal size probabilities in Matlab (A5) 11

Here the input arguments for the function are the gamma parameters (a and b), the escape probability (pi), the local contact rate (lambdaL) and the maximum number of individuals in a household (n). The function begins with the creation of the f s vector and the rst element is set to n . The phi function is once again called, however this time the argument is set as n i. A double loop is then performed to create the f ssum vector and the relevant calculations are performed and the values are stored in f s vector. In order to take into account the possible ways of becoming infected, we derived (3.9) and this is denoted by the f s2 vector which simply multiplies the relevant entry in the f s vector by the corresponding binomial coecient. An example of its use is shown below with a = 2, b = 2.05, = 0.5, L = 1 and n = 5. ans = 0.0313 0.0018 0.0002 0.0001 0.0003 0.9663

The rst entry is the probability of zero individuals becoming infected in a household of size ve, up until the probability of all ve becoming infected. Clearly in this case there is a large probability that given the parameter values, all individuals within this household will become infected at one point in the epidemic. Note that the probabilities do indeed sum to one. This function is then converted into a matrix form which is then called to create the log-likelihood. In terms of the example provided, this would be the nal row of a 5x6 matrix (each row representing households from size 1 to 5 and each column representing the probabilities of becoming infected from 0 to 5). In the next section, we will dene this matrix as P. Here is the output of such an example, ans = 0.5000 0.2500 0.1250 0.0625 0.0313 0.5000 0.0537 0.0144 0.0049 0.0018 0 0.6963 0.0099 0.0011 0.0002 0 0 0.8507 0.0018 0.0001 0 0 0 0.9297 0.0003 0 0 0 0 0.9663

3.3

Formation of the likelihood

By simultaneously analysing infectious data based on households and applying our knowledge from the previous section, we can adopt a maximum likelihood approach in order to estimate key parameters for that particular data set. The data consists of 12

the number of individuals in a household of a certain size that are infected by the end of the epidemic. Let Xi denote the number of i infectives in a single household of size n (i n) and (n) Pi be the probability of i infectives in the same household containing n susceptibles. Thus, the likelihood is product of multinomial probabilities and can be constructed as
n (n)

L=
i=0 (n)

(Pi )Xi
(n)

(n)

(n)

(3.10)

The data enters the likelihood via Xi and Pi is the parameter in question which essentially will be maximized. In fact it is this parameter that has nested the parameters of interest, being L and . Utilising the log-likelihood oers simplicity as summations are generally easier to work with than products and so applying a log-transformation to the likelihood results in
n

logL =
i=0

Xi log(Pi )

(n)

(n)

(3.11)

Since Matlab operates solely with matrices and vectors, and will be utilised to construct and subsequently maximise the likelihood, it is deemed appropriate to incorporate matrix notation allowing for the theory presented to relate directly with our coding. Thus a vector for the data for a single household of size n can be represented by (n) (n) (n) (n) (n) (n) [X0 , X1 , .., Xn ] with corresponding probabilities [P0 , P1 , .., Pn ]. Note that the probabilities must sum to one, providing all the possible number of ways an individual can be infected has been accounted for. In order to account for all household sizes (as opposed to a single household), we can construct a matrix where each row represents a dierent household size from 1 to n where n represents the household with the largest size in the data set. The matrix of the numbers of infectives is shown below as X and the matrix representing the nal size probabilities is shown as P. (1) (1) X0 X1 0 0 (2) (2) (2) X0 X1 X2 0 X= . . . . . . . . (n) (n) (n) X0 X1 .. Xn (1) (1) P0 P1 0 0 (2) (2) (2) P0 P1 P2 0 P= . . . . . . . . (n) (n) (n) P0 P1 .. Pn

Hence by incorporating (3.11) and applying it on to the matrices above, we simply have 13

the summation of products of the entries of X and the logarithm of the corresponding entry in P i.e. X(i,j) log(P(i,j) ) where j could represent the possible household sizes (1...n) and i could denote the number of possible infectives in a household of size j (i j n). However since we are adopting matrix notation, here i represents the ith row of the relevant matrix and j denotes the jth row of the same matrix i.e. X1,2 log(P1,2 ) represents the corresponding elements of the relevant matrix relating to the rst row (households of size one) and the second column (all possible households with one infective). This is constructed in Matlab as shown in (A9) and modications are applied in order to adapt the function for the matlab minimization procedure resulting in (A10). So for example, if the data consisted of housesholds of size 3, the likelihood involving matrices becomes X1,1 log(P1,1 ) + X1,2 log(P1,2 ) (households of size one) + X2,1 log(P2,1 ) + X2,2 log(P2,2 ) + X2,3 log(P2,3 ) (households of size two) + X3,1 log(P3,1 ) + X3,2 log(P3,2 ) + X3,3 log(P3,3 ) + X3,4 log(P3,4 ) (households of size three)

14

Chapter 4 Application on epidemic data


The theory presented in this paper has enabled us to create a foundation to build upon involving the practical aspect of our discussion. The purpose of this chapter will be to utilise the Matlab les created to attempt to analyse numerous data sets of an epidemic nature. The analysis will involve tting a likelihood to the data based on a grid of parameters, attempting to maximise contact parameters using the simplex method and examining the eect that these contact rates have on the threshold parameter. Note that as explained previously, we will us assume that the infectious period TI follows a gamma distribution where a = 2 and b = 2.05.

4.1

Simulated data

By making use of the simhouses(A17) le provided by Dr Owen Lyne, we are able to simulate infectious data by specifying the gamma parameters and the contact rates as well as the vector of households sizes. Let us observe an example involving 300 households of size 5 ([0 0 0 0 300] represents the houses vector). It is important to note that numerous simulations will be required in order to provide precise infectious data since not all epidemics may occur but for the purpose of presentation, let us perform twenty simulations where G = 0.1 and L = 0.1 with minimum epidemic size as 1. The number of individuals infected in the population by the end of the epidemic is as follows. Note that the total population here is 1500 (300 * 5) which is the maximum number of infectives; this is a regular occurrence of infectives when we experiment with extremely large values of contact rates, especially the global rate which is far more inuential as we will learn. [3 34 55 1 3 9 1 616 604 5 1 56 3 17 395 271 568 6 52 8] Here we can observe that there are cases where the epidemic does not begin at all and other cases where many individuals become infected (e.g. 616, 604 395, 271 and 568). The data matrix showing the distributions of infections per household per epidemic for one of the simulations (616) is presented on the next page. Note that multiplying each number by their respective household sizes and taking the sum results in a total of 616 infectives.

15

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 123 31 24 16 41 65 Data set 4.1: simulated epidemic data The last row represents the households of size ve so it seems natural that all other rows contain zero elements. Again, the columns refer to the number of individuals infected and so the rst column informs us that 123 households had zero people infected, 31 had one person infected, etc. The log-likelihood matrix for the simulated data is shown as Matrix 4.1. The dimensions of the matrix represent the number of plausible values provided for the parameters and thus for presentation purposes, we will be experimenting only with ten values using the linspace command in Matlab, for L and the same for . -0.8317 -0.8114 -0.8089 -0.8255 -0.8644 -0.9318 -1.0399 -1.2157 -1.5399 -2.6377 -0.7144 -0.6522 -0.6012 -0.5606 -0.5307 -0.5124 -0.5089 -0.5274 -0.5909 -0.8788 -0.7322 -0.6613 -0.6012 -0.5515 -0.5121 -0.4844 -0.4712 -0.4800 -0.5336 -0.8115 -0.7762 -0.7017 -0.6382 -0.5850 -0.5421 -0.5109 -0.4943 -0.4998 -0.5501 -0.8247 -0.8255 -0.7492 -0.6839 -0.6290 -0.5844 -0.5515 -0.5332 -0.5370 -0.5857 -0.8587 -0.8743 -0.7970 -0.7307 -0.6747 -0.6292 -0.5953 -0.5760 -0.5789 -0.6267 -0.8988 -0.9209 -0.8430 -0.7760 -0.7194 -0.6732 -0.6387 -0.6188 -0.6210 -0.6682 -0.9398 -0.9648 -0.8865 -0.8191 -0.7620 -0.7154 -0.6804 -0.6601 -0.6620 -0.7087 -0.9799 -1.0061 -0.9274 -0.8597 -0.8023 -0.7554 -0.7201 -0.6995 -0.7010 -0.7475 -1.0184 -1.0448 -0.9659 -0.8980 -0.8403 -0.7932 -0.7577 -0.7369 -0.7382 -0.7844 -1.0551

Matrix 4.1: log-likelihood matrix for the simulated data Here the largest value in the matrix (-0.4712) represents approximately which values of L (0.1111) and (0.8267) correspond to the maximum of the log-likelihood; the point at which the log-likelihood function reaches its optimum value. Interpretation can be simplied by providing a visual representation in the form of contour plots. With L presented across the x-axis and labelling the y-axis, the contour plots for Matrix 4.1 are shown in Figure 4.1.

16

Figure 4.1: 2d and 3d contour plots of the log-likelihood surface for the simulated data This gure enables us to deduce that the maximum likelihood estimate for L lies approximately between 0.06 and 0.16 and that for = 0.75 0.9. Although the contour plots provide an insight to the intervals of which the parameter lie within and assist us with where to commence with our parameter search, clearly they are not reliable enough to specify a true parameter value since this may lie anywhere within the contour, and so we turn to Matlab and make use of fminsearch which minimizes the negative log-likelihood (we simply multiply the log-likelihood by -1) using the simplex method. Note instead we use the function fminsearch2 (A7) that has been created which ensures that the data matrix is valid and that the starting values for parameters cannot go outside of their corresponding ranges. More specically, must lie within 0 and 1 since in the case where is zero, this results in no members of the population escaping global infection which causes innitely many infectives, and it cannot be equal to one as in this case each member escapes global infection resulting in no infectives. The eect that the latter case has upon the log-likelihood is well presented by the 3d contour plot in Figure 4.1. In fact, with regards to both of these specialised cases, the log-likelihood is not truly dened outside of this parameter region and hence as tends to 0 or 1, the log-likelihood tends to . Also L cannot be negative but may however be zero since this simply implies that there are no infections within the same household. >> simdata=[0 0 0 0 0 0;0 0 0 0 0 0;0 0 0 0 0 0; 0 0 0 0 0 0;123 31 24 16 41 65] >> fminsearch2([0.825,0.1],2,2.05,simdata)

17

ans = 0.8354 0.1056

And so our estimates for L and are 0.1056 and 0.8354, respectively, which means that the local rate of infection was approximately 0.1056 and that the probability of an individual within this epidemic avoiding global infection was approximately 83.54%. These estimates are bounded by their corresponding ranges as explained by Figure 4.1 and are close to the values resulting from the matrix analysis, hence we have achieved consistency within our results. As explained previously, we also require an estimate of the global contact rate in order to calculate the threshold parameter. Since we now have an estimate of , we are in a position to compute z and this assists us in providing an estimate of G . >> lambda_g(2,2.05,0.8354,0.1056,[0 0 0 0 300]) ans = 0.1074 Hence the estimates produced for L and G are similar to that of the values utilised to simulate the data set (0.1), which provides reassurance regarding our procedures. Finally we can produce an estimate of the threshold parameter using either one of the R les created, each producing approximately (due to rounding issues) the same value for R . The le r1 (A15) takes as input arguments L and G , and r2 (A16) requires values for and L . >> r2(2,2.05,0.8354,0.1056,[0 0 0 0 300]) ans = 1.3682 The r2 matlab le calculates G based on values of L and and hence we can conclude here that R = 1.3682, which is of course greater than 1. Although simulating data has been a technique that has proven to be eective due to modern technology consisting of faster processing speeds of computers, it is important to consider a case involving real life data in order to relate our work directly to the real world, since the purpose of studying epidemic behaviour is to adopt vaccination strategies for the benet of society. 18

4.2

Real life inuenza data

In this section we look at an example involving real time epidemiology. Addy el al, (1991) focuses on inuenza data in Tecumseh, Michigan from 1976-1981 involving 1,414 individuals and the data matrix X is displayed as Data set 4.2. Recall that Xi represents (1) the number of i infectives within that particular household size i.e. X0 is the number of households of size one that contained zero infectives by the end of the epidemic. (1) X0 X (2) 0 (3) X = X0 (4) X0 (5) X0 X1 (2) X1 (3) X1 (4) X1 (5) X1
(1)

0 0 0 0 110 (2) X2 0 0 0 149 (3) (3) X2 X3 0 0 = 72 (4) (4) (4) X2 X3 X4 0 60 (5) (5) (5) (5) 13 X2 X3 X4 X5

23 0 0 0 0 27 13 0 0 0 23 6 7 0 0 20 16 8 2 0 9 5 2 1 1

Data set 4.2: inuenza epidemic data Addys paper develops a stochastic epidemic model for a heterogeneous population by adopting a gamma distribution for the infectious period where TI (2, 2.05), and involves deriving estimates of (in our notation) L and . By utilising the matlab les created, we will attempt to match these estimates and also produce an estimate of G , which will provide us with an accurate estimate the crucial threshold parameter. Let us begin by deriving the log-likelihood matrix using a suitable grid of values. Addys work provides us with an insight to the type of parameter values that the likelihood region should be conned within. Using this knowledge, we can attempt to t the loglikelihood with L = 0 0.5 and = 0.5 0.99 (since cannot be equal to 1 as this results in no members of the population becoming infected). Thus by using the linspace matlab command, we can experiment with ten possible values for the parameters within the specied intervals and so we obtain a log-likelihood matrix as shown in Matrix 4.2. -0.8298 -0.7384 -0.6619 -0.5997 -0.5520 -0.5204 -0.5094 -0.5294 -0.6128 -1.0127 -0.8611 -0.7631 -0.6791 -0.6077 -0.5488 -0.5029 -0.4726 -0.4646 -0.4997 -0.7429 -0.8993 -0.7993 -0.7131 -0.6395 -0.5781 -0.5296 -0.4965 -0.4854 -0.5173 -0.7570 -0.9361 -0.8351 -0.7479 -0.6732 -0.6108 -0.5612 -0.5270 -0.5148 -0.5454 -0.7839 -0.9701 -0.8686 -0.7808 -0.7055 -0.6425 -0.5923 -0.5575 -0.5447 -0.5747 -0.8126 -1.0014 -0.8995 -0.8113 -0.7357 -0.6723 -0.6217 -0.5866 -0.5734 -0.6030 -0.8405 -1.0302 -0.9280 -0.8396 -0.7637 -0.7001 -0.6493 -0.6139 -0.6004 -0.6298 -0.8671 -1.0568 -0.9545 -0.8659 -0.7898 -0.7259 -0.6750 -0.6394 -0.6258 -0.6550 -0.8921 -1.0815 -0.9791 -0.8903 -0.8141 -0.7501 -0.6990 -0.6633 -0.6496 -0.6787 -0.9156 -1.1046 -1.0020 -0.9132 -0.8369 -0.7727 -0.7215 -0.6857 -0.6719 -0.7009 -0.9377

Matrix 4.2: log-likelihood matrix for the inuenza data 19

Once again, the largest value within this matrix (-0.4646) represents approximately the maximum of the log-likelihood and the corresponding row and column refers to that particular combination of values for and L , respectively. Here the 8th row corresponds to = 0.8811 and the 2nd column denes L as 0.0556. Similarly we can produce contour plots of the log-likelihood surface which will better explain where the maximum lies. Figure 4.2 displays these plots where the x-axis represents L and the y-axis . The surface within the 3d contour plot of course represents the shape of log-likelihood surface.

Figure 4.2: 2d and 3d contour plots of the log-likelihood surface for the inuenza data Evidently the maximum lies approximately within the points where L = 0.025 0.625 and = 0.83 0.89 but it is not clear where the maximum lies exactly due to the nature of contour plots. Hence we can use the fminsearch (more specically fminsearch2) technique in Matlab to determine accurate estimates for the parameters. Our knowledge from the contour plots enables us to provide sensible starting values for the minimisation procedure. Recall that maximising the log-likelihood is equivalent to minimising the negative log-likelihood and since fminsearch adopts the simplex method to minimise a function, we provide it with the negative of the log-likelihood. >> addydata=[110 23 0 0 0 0;149 27 13 0 0 0;72 23 6 7 0 0; 60 20 16 8 2 0;13 9 5 2 1 1] >> fminsearch2([0.86,0.0325],2,2.05,addydata)

ans = 0.8674 0.0446 20

Thus we obtain parameter values = 0.8674 and L = 0.0446, which satisfy the analy sis deduced from the contour plots in Figure 4.2. Clearly it was not possible to obtain exact estimates from these plots since only ten possible parameter values within the interval were provided in the form of a vector. However, using this procedure, we have now obtained accurate parameter estimates and by matching the estimates to Addys paper, we can deduce that the code created is precise and working correctly. Although Addys paper does not discuss global rates and threshold parameters, knowing provides us with an estimate of G and together with L , we possess all the components required for the threshold parameter. Using our Matlab le and by specifying the obtained parameters above, we can produce an estimate of the global contact rate. >> lambda_g(2,2.05,0.8674,0.0446,[133 189 108 106 31]) ans = 0.1955 The value for G is larger than that for the simulated data (0.1074) despite this consisting of 616 infectives, as opposed to the inuenza data which only produced infectives. This is due to the household structure which caused more individuals to be infected at a lower global rate of infection. And nally we adopt usage of one of the two R les to obtain the threshold parameter. The code includes two ways of calculating the threshold parameter. The rst method (r1) uses arguments L and G and the second (r2) requires L and . Both les will produce approximately the same output for R providing the parameter values match one another (for example L = 0.0446, = 0.8674 and G = 0.1955). The function r2 calculates G based on values of L and and so we have, >> r2(2,2.05,0.8674,0.0446,[133 189 108 106 31]) ans = 1.1302 Thus regarding the inuenza data, we can deduce that R = 1.1302. Although this value is greater than one, it is smaller than the value produced involving the simulated data set and we will begin to examine why this is the case in the next chapter.

21

Chapter 5 Analysis of results


In this chapter, we will attempt to analyse our results and address topics such as the representation of the threshold parameter and how this may eect other parameters and vice versa. We will examine the relationship between the threshold parameter and the proportion of individuals infected by the end of an epidemic as well as the eect that each of the contact rates possess on the epidemic data. A summary of our results from the previous section is displayed in Table 1 below. Simulated Data Inuenza Data 0.1056 0.0446 0.8354 0.8674 0.4084 0.1775 0.1074 0.1955 1.3682 1.1302

L z G R

Table 1: summary of results of estimating the model parameters

5.1

Relationship between z and R

Let us commence by comparing our estimate of the proportion of individuals that are infected by the epidemic z. This parameter has much importance since it represents the probability that an individual chosen at random has been infected by the epidemic. Earlier in (3.5) we dened how to calculate this proportion and we are now able to compute this value. Focusing on the simulated data set, we simply have
Number of infectives Total population

(311)+(242)+(163)+(414)+(655) (3005)

616 1500

= 0.4107

Our ndings regarding obtaining maximum likelihood estimates of L and resulted in z = 0.4084, which is very close to the true z value for the simulated data set. This indicates that our estimation methods have been precise. For the inuenza data set, we have

22

1(23+27+23+20+9) + 2(13+6+16+5) + 3(7+8+2) + 4(2+1) + 5(1) 1(133)+2(189)+3(108)+4(106)+5(31)

250 1414

= 0.1768

Once again, we achieve accurate results with our likelihood methods since z for the inuenza data was approximately 0.1775, and we are now in a position to compare the relationship between the proportion z and the parameter of threshold R Recall that the threshold parameter is the mean number of infected households resulting from a typical infected household. So naturally we expect the proportion of individuals infected z to increase as R rises since as the global contact rate of infection G increases, the probability of escaping global infection will decrease. Now, obviously a larger G results in an increase of R and a larger probability causes z to decline. Thus, as the proportion z increases, so will the threshold parameter R . Our ndings agree with this statement since for the simulated data, z = 0.4107 with R = 1.3682 and for the inuenza data, z = 0.1768 with R = 1.1302.

5.2

Inuence of R on the severity of the epidemic

We can now begin to examine whether R eects the severity of the epidemic by considering simulating data with contact parameters that produce a large value for the threshold parameters. By making use of the simhouses function, we can simulate data involving 7500 households with household vector [250 500 1000 500 250] where the household sizes are [1 2 3 4 5] respectively. In order to obtain a large threshold value, we must provide the simulation code with particularly large contact rates so let us suppose we simulate epidemic data where G = 0.2 and L = 0.1 and thus, R = 1.6568. Below shows the number of infections for typical simulation with twenty repetitions. [4067, 9, 4182, 4253, 4158, 1, 2, 3, 1, 4259, 3984, 4246, 1, 9, 4202, 4198, 2, 1, 1, 1] Clearly this large threshold value has caused the distribution of the epidemic to be bimodal; by that meaning either a large number of infectives is generated or there are hardly any infectives at all. And hence we can deduce that the severity of the epidemic is determined solely on the threshold parameter. Figure 5.2 shows a visual representation of the nal size of this epidemic.

23

Figure 5.2: plot showing the distribution of infectives under a large threshold value

5.3

Eect of the L and G on the epidemic

In order to examine the extent of the eect that the contact rates have on the epidemic data, we can attempt to simulate two dierent data sets by xing one parameter and increasing the other. Once again by making use of simhouses, we can simulate data involving 7500 households with household vector [250 500 1000 500 250] where the household sizes are [1 2 3 4 5] respectively. Both data sets involve approximately 2500 infectives. The following data sets G and L are generated. 203 47 0 0 0 0 322 114 64 0 0 0 G = 564 169 142 125 0 0 218 70 50 71 91 0 84 35 24 24 28 55 214 36 0 0 0 0 373 22 105 0 0 0 0 L = 670 26 18 286 0 292 9 3 5 191 0 142 4 0 2 1 101

Data set G emanates from a globally-driven epidemic with G = 0.15 and L = 0.1 and L is locally-driven where L = 0.5 and G = 0.1. Seemingly, G tends to distribute infections across the dierent types of household sizes more in comparison with L. With regards to L, if a household becomes infected, it appears to be quite likely that every individual residing in that household is totally infected. In fact, we can calculate this as a proportion
Completely infected households Infected households

719 809

= 0.889

24

Thus under this particular local-driven epidemic, approximately 89% of the infected households are completely infected. In comparison to a globally-driven outbreak, where the proportion is only 52.5% and so in this case, more households are partly infected. Figure 5.3 explains this scenario in the form of a pie chart.

Figure 5.3: pie chart comparing the eect of a globally-driven epidemic with a locally-driven epidemic Here we can observe that under a globally-driven epidemic, there is a greater proportion of households that are partly infected (29%), in comparison with a locally-driven epidemic where only 3% of the households are partly infected. In terms of infected households, the locally-driven epidemic is clearly dominated by completely infected households. Another conclusion that one can draw from Figure 5.3 is that the locally-driven data set has approximately 12% more households that remained not infected by the end of the epidemic, which implies that G causes more households to become infected within the population, in comparison with L .

25

Chapter 6 Vaccination
One of the main purposes of examining the epidemic behaviour of a specic disease within a population of small households is to adopt vaccination strategies that enable us to determine the proportion of individuals that ultimately need to be vaccinated in order to prevent an epidemic from occurring. This is achieved by dening a new post-vaccination threshold parameter Rv and applying vaccination schemes in order to restrict this parameter from being greater than one and thus preventing an epidemic. Also of importance is the eectiveness of the vaccine, which we denote as and how one can determine this parameter based on likelihood procedures involving probabilities of successful vaccinations. In this chapter we will discuss certain types of vaccine, namely fully and partially effective vaccines and we will emphasize particularly on an all or nothing vaccine and discuss how one may incorporate this into our study regarding epidemiology.

Fully eective vaccines


This type of vaccine involves vaccinating a certain amount of susceptible individuals within a population fully. Let us suppose that we vaccinate v individuals in a household of size n then each of these individuals are completely immune from the disease i.e. the vaccines ecacy is equal to one and we have a perfect vaccine and hence we simply deduct v vaccinees from a household comprising of n individuals in order to determine new nal size probabilities. Recall from (3.4) that, nmn )n = N 1 G E(TI ) nk n k ( R = G E(TI ) n n = G E(TI ) N n=1 n=1 k=1 By denoting Rv as the post-vaccination threshold parameter where nk denotes the new number of household sizes with k = 1, 2..m, we have
m m

Rv = N

G E(TI )
k=1

nk n k

26

Partially eective vaccines


Here susceptible individuals are vaccinated partially. Hence they are not completely immune to infection and their probability of becoming infected relies on the eectiveness of the vaccine in question . This implies that the vaccine does not necessarily prevent infection, but rather mitigates its severity, which is of course a more complicated vaccine strategy since extra care would be required in terms of handling the partiality factor. Becker and Starczak (1997) stated that if xnv denotes the proportion of households with n individuals of which v are vaccinated (where v n), the probability of a global contact here is now n xnv (in comparison with the pre-vaccination case where this prob ability was n ). Suppose n k vaccinations succeed where k denotes the number of susceptibles in a household, then n k Bin (v, ) and hence the probability of n k v successful vaccinations is therefore nk nk (1 )vn+k (A18). The post-vaccination parameter Rv becomes slightly more complicated since it needs to take into account these probabilities and also the conditional probability that a global k contact is with a susceptible given there are k susceptibles which is n and so we obtain
n n

Rv = G E(TI )
n=1

n
v=0

xnv
k=nv

v nk

nk

k (1 )vn+k k n

(6.1)

Note that our primary objective here is to determine the proportion of individuals that need to be vaccinated such that Rv 1 and the way in which this reduction occurs may depend on the distribution of the vaccine; how the vaccine is allocated (see Figure 4 in Ball and Lyne (2002)). Now, most household owners will tend to vaccinate all of their children rather than a specic amount and so in the most realistic case, the proportion of individuals vaccinated in a household v would be either 0 or 1, but one cannot have knowledge of whether this consists of the home owner and/or their partner who both may only want their children to take the vaccine. In the simpler case of the overall proportion relating to the proportion vaccinated in a household, clearly these values should be close to one another; they most likely will not be equal due to the limited number of people in a household i.e. a household of size 5 can only have v = [0, 0.2, 0.4, 0.6, 0.8, 1] but the overall proportion can be any number from 0 1. If we let Cv denote the proportion of individuals vaccinated (often referred to as the vaccination coverage) under partially eective vaccination then 0 Cv 1 and
n

Cv =
n=1

n
v=0

xnv

v n

And so the probability of a contacted individual being vaccinated is Cv and unvaccinated 1 Cv . Ideally we would want Cv to be as small as possible but this of course depends 27

on other parameters such as . If Rv > 1 then the proportion of individuals that we 1 will need to vaccinate in order to prevent an epidemic would be Cv = 1 Rv i.e. consider the case where Rv = 1 which is what we want to achieve, here Cv = 0 and so since the epidemic does not take o, there is no purpose in vaccinating anybody in the population and so as Rv 1 then Cv 0. In the case where Rv then Cv 1, which implies that if the threshold value is extremely large, we will need to vaccinate everybody in order to prevent the epidemic from occurring.

6.1

All or nothing vaccine

It is of particular interest for us to adopt an all or nothing partially eective vaccine. Its application oers either complete immunity resulting in vaccinated individuals not being able to contract the disease (whether locally or globally) from a source within the population, which occurs with probability . In relation to the SIR model, these members of the population share similarities with individuals that are apart of the removal stage of the epidemic since they are also immune to (further) infection. However, rather than omitting these immune members from the process, we retain them within our model allowing us to analyse the eectiveness of the vaccine. The other only possible outcome for this particular vaccine is that of no immunity whatsoever which leaves vaccinated individuals completely vulnerable to infection and hence has no eect on anything regarding our model, which occurs with probability 1 . The probabilities of n k successful vaccinations can be constructed in Matlab as shown in Figure 6.1. function prob = v_prob(a,b,pi,lambdaL,n,v,epsilon) prob = zeros(1,v+1); for i = 0:v prob(i+1) = nchoosek(v,i)*(epsilon^(i))*(1-epsilon)^(v-i); end

Figure 6.1: calculating probabilities of successful vaccinations in Matlab (A18) In order to examine the eect that the all or nothing vaccine has on the nal size probabilities, let us consider two extreme cases for and examine how the nal size probabilities may change once v individuals in a household of size n have been vaccinated. For 0 1, let us suppose that we vaccinate v out of k susceptibles in a household consisting of n individuals where = 0 denotes a useless vaccine and = 1 refers to a perfect vaccine. In order for us to be able to determine the ecacy parameter for a particular data set, we will attempt to incorporate this parameter within our model and examine how this may eect the probability of infection i.e. the nal size 28

probabilities in (3.8) and (3.9). Suppose Pi denotes the probability of i infectives in a household of size n containing v vaccinated household members, let us consider two extreme cases. (1) Pi
(n,v) (n,v)

( = 0) = Pi

(n0)

or (2) Pi

(n,v)

( = 1) = Pi

(nv)

And so case (1) has no eect on the nal size probabilities of a household of size n whereas case (2) accounts for all of the individuals that have been successfully vaccinated within a household and subtracts this value from the household size to determine new vaccinated nal size probabilities for a household e.g. if = 1 and we attempt to vaccinate 3 individuals in a household of size 5, the nal size probabilities for this (5,3) (53) (2) (2) (2) (2) particular household would be Pi = Pi = Pi = [P0 P1 P2 ]. Clearly such a household would need to take into account all of the possibilities of the amount of individuals that may be vaccinated from 0-5. Note that zero household members vaccinated results in no change to the probability, and ve vaccinated leaves a household completely immune from infection resulting in no chance of infection and hence the probability of i infectives where i > 0 would be zero. Let us now represent our example in matrix notation where n = 5 and v = 3. (33) P0 0 0 0 . . (43) (43) P1 0 0 0 . P0 (53) (53) (53) P0 P1 P2 0 0 0 Note here that probability matrix does not take into account households of size 1 or 2 since the household size must be greater than or equal to the number of vaccinated individuals in a household (in this case 3). Recall that the rows must sum to one and (33) so P0 = 1 which simply means that a household of size three that has had three individuals vaccinated can only have zero people infected with probability one due to complete immunity within the household, all other probabilities become zero. Thus generally if v denotes the number of vaccinated individuals then P0
(vv)

((n1)v) ((n1)v) P1 P0 (nv) (nv) P0 P1

0 0 0 (n1)v) .. P(n1)v 0 0 (nv) .. .. Pnv 0 ..

Here we have considered the case where the vaccine is fully eective and one must begin to examine how these probabilities change based on values of that are within the interval [0,1]. Of course can be anything between these two extreme values and so each of these probabilities must take into account possible values of the vaccines ecacy.

29

Chapter 7 Conclusion
By utilising the SIR model, we developed an understanding of the epidemic behaviour of infectious diseases throughout a community of households, which enabled us to determine previously dened model parameters. The threshold parameter R was the primary focus regarding epidemics as it denes whether an epidemic occurs and so it was important for this value to exceed one in order to interpret and analyse the outbreak. By producing simulations that resulted in a particularly large value of R , we deduced that the severity of the epidemic increased and that a bimodal distribution for people infected was developed. Of particular importance within our model were the L and G rates of infection that dened whether contacts occurred within the same household or between dierent households. Our analysis showed us that as L , the proportion of infected households that were totally infected 1. In comparison with a globally-driven epidemic, we deduced that infections are distributed more evenly across the dierent household sizes and that more households become partly infected. In order to estimate these model parameters for given data sets, we utilised a maximum likelihood approach involving the number of infectives in a household and the probability of infected individuals in that particular household structure; the nal size probabilities. These probabilities contained model parameters L and which were maximized in order to obtain G and nally R . These estimations enabled us to draw the analysis above and thus, understand the true purpose of the key model parameters within the SIR households model. Finally we discussed vaccination in order to attempt to determine the proportion of individuals that we would need to vaccinate in order to prevent an epidemic from occurring. By examining data involving vaccinated individuals, this could have been achieved by determining the amount of individuals that produced a new post-vaccination parameter Rv = 1. We focused on an all or nothing partially eective vaccine and examined the probabilities of successful vaccinations given v vaccinees in a household of size n, and how these may eect our nal size probabilities and in eect, the likelihood; which given data, one could utilise to maximise the vaccines ecacy .

30

Chapter 8 Future work


Finally let us now consider possible extensions to our investigation regarding epidemics and ways in which one may draw further analysis. Here we utilised the SIR model but there are many other possible ways in which epidemic data can be modelled. A similar model is that of an SIS (susceptible infected susceptible) epidemic, which involves infected individuals becoming susceptible once their infectious period terminates, as opposed to the SIR model which removes the infective from the model. In terms of the population being examined, this could be sectionalized into numerous groups such as the age an individual or even the average age in a household, and examine if age eects the probability that the individual becomes infected. Naturally the same could be achieved for the gender and medical history. In the latter case, we would expect that if a susceptible has a previous medical history, this could make them more likely to contract the infectious disease from an infective. This attributes could also be examined by considering the eect they may have on model parameters i.e. given a large local rate, are elderly people more likely to contract a disease due to the conception that older individuals tend not to socialise in outside activities, in comparison to younger people, or possibly does the infectious period of 4.1 days change in accordance with age. The geographical location could also be considered in order to examine if for example, rural areas where households are generally quite small in size, may result in a less chance of local infection, or whether relationship status or job type eects the probability of infection i.e. jobs involving oce work may result in an individual being less likely to contract a disease as opposed to employment involving more social interaction. Andersson (1997) discusses epidemics in social structures. In terms of vaccination, the way in which the vaccine is allocated could be explored. The most realistic case involves vaccinating all individuals in a household or possibly the household size minus one (since this may be a home owner vaccinating their children). Here we adopted an all or nothing vaccine, Ball and Lyne (2006) explore numerous other vaccine action models such as non-random vaccine response and discrete vaccine response models. These dierent types of vaccines could be utilised in order to consider the eect they have on the threshold parameter in our discussion. Ideally, we would attempt to apply vaccination strategies for infectious data involving vaccination (see as Ball and Lyne (2006)), providing us with the ability to determine the vaccines ecacy and one could maximize this parameter, which is a characteristic of a successful vaccine.

31

References
[1] F.G. Ball and O.D. Lyne (2002). Epidemics among a population of households, pp. 115-142. [2] C.L. Addy, I.M. Longini, Jr. and M. Haber (1991). A generalized stochastic model for the analysis of infectious disease nal size data, Biometrics, 47, pp. 961-974. [3] F.G. Ball (1986). A unied approach to the distribution of total size and total area under the trajectory of infectives in epidemic models, Adv. Appl. Prob. 18, pp. 289-310. [4] R. Bartoszynski (1972). On a certain model of an epidemic, Appl. Math. 13, pp. 139-151. [5] F.G. Ball and O.D. Lyne (2006). Optimal vaccination schemes for epidemics among a population of households, with application to variola minor in Brazil, Statistical Methods in Medical Research, 15, pp. 481-497. [6] N.G. Becker and D.N. Starczak (1997). Optimal vaccination strategies for a community of households, Math, Biosci. 139, pp. 117-132. [7] F.G. Ball, D. Mollison and G. Scalia-Tomba (1997). Epidemics with two levels of mixing, Ann. Appl. Prob. 7, pp. 46-89 [8] N.G. Becker and K. Dietz (1995). The eect of the household distribution on transmission and control of highly infectious diseases, Math. Biosci. 127, pp. 207-219 [9] N.T.J. Bailey (1975). The Mathematical Theory of Infectious Diseases and its Applications, 2nd edn. Grin, London. [10] H. Andersson (1997). Epidemics in a population with social structures, Math. Biosci. 140, pp. 79-84. [11] S. Rushton and A.J. Mautner (1955). The deterministic model of a simple epidemic for more than one community, Biometrika 42, pp. 126-132. [12] H. Andersson (1999). Epidemic models and social networks, The Mathematical Scientist, 24, pp. 128-147.

32

Appendix
A list explaining the purpose of each le in the electronic appendices. (A1) alpha n.m - computes n (A2) alphatilde n.m - computes n (A3) beta k.m - computes k (A4) contourplot.m - produces contour plots of the likelihood (A5) nal size.m - computes Pin (A6) nal size2.m - converts the probabilities into matrix form (A7) fminsearch2.m - modied version of fminsearch (A8) lambda g.m - computes G (A9) likelihood.m - constructs the likelihood matrix (A10) likelihood2.m - modies the likelihood for fminsearch2 (A11) mu n.m - computes n (A12) mu na.m - computes n,a (A13) phi.m - computes (A14) pi.m - computes (A15) r1.m - computes R for values of L and G (A16) r2.m - computes R by calculating G based on L and (A17) simhouses.m - simulates epidemic data (A18) v prob.m - computes n k successful vaccinations (A19) v r.m - computes Rv (A20) z.m - computes z

33

You might also like