The BUGS Project Examples Volume 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

Contents

Examples Volume 2
Dugongs: a nonconjugate, nonlinear model Orange trees: a hierarchical, nonlinear model
Multivariate Orange trees: a hierarchical, nonlinear model

Biopsies: latent class model Eyes: normal mixture model Hearts: a mixture model for count data Air: covariate measurement error Cervix: case-control study with errors in covariates Jaw: repeated measures analysis of variance Birats: a bivariate Normal hierarchical model Schools: muliivariate hierarchical model of examination results Ice: non-parametric smoothing in an age-cohort model Beetles: logistic, probit and extreme value models Alli: multinomial logistic model Endo: conditional inference in case-control studies Stagnant: a change point problem Asia: an expert system References: Sorry - an on-line version of the references is currently unavailable. Please refer to the existing Examples documentation available from http://www.mrc-bsu.cam.ac.uk/bugs.

[1]

Examples Volume II

Dugongs

Dugongs: nonlinear growth curve


Carlin and Gelfand (1991) present a nonconjugate Bayesian analysis of the following data set from Ratkowsky (1983):
Dugong 1 2 3 4 5 .... 26 27 ______________________________________________________ Age (X) 1.0 1.5 1.5 1.5 2.5 .... 29.0 31.5 Length (Y) 1.80 1.85 1.87 1.77 2.02 .... 2.27 2.57

The data are length and age measurements for 27 captured dugongs (sea cows). Carlin and Gelfand (1991) model this data using a nonlinear growth curve with no inflection point and an asymptote as Xi tends to infinity: Yi ~ Normal(i, ), i = 1,...,27 i = - Xi , > 1; 0 < < 1 Standard noninformative priors are adopted for , and , and a uniform prior on (0,1) is assumed for . However, this specification leads to a non conjugate full conditional distribution for which is also non log-concave. The graph and corresponding BUGS code is given below

alpha

beta

gamma

U3

tau

mu[i]

x[i]

sigma

Y[i] for(i IN 1 : N)

[2]

Dugongs model { for( i in 1 : N ) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha - beta * pow(gamma,x[i]) } alpha ~ dnorm(0.0, 1.0E-6) beta ~ dnorm(0.0, 1.0E-6) gamma ~ dunif(0.5, 1.0) tau ~ dgamma(0.001, 0.001) sigma <- 1 / sqrt(tau) U3 <- logit(gamma) }

Examples Volume II

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates
mean 1.861 2.652 0.9729 0.8623 0.0992 sd 0.2678 0.07094 0.07649 0.03259 0.01496 MC_error 0.01189 0.003378 0.001806 0.001393 1.831E-4 val2.5pc 1.321 2.532 0.8251 0.7894 0.07513 median 1.865 2.646 0.9711 0.8658 0.09742 val97.5pc 2.37 2.808 1.129 0.9145 0.1339 start 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000

U3 alpha beta gamma sigma

model { for( i in 1 : N ) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha - beta * pow(gamma,x[i]) } alpha ~ dnorm(0.0, 1.0E-6) beta ~ dnorm(0.0, 1.0E-6) logit(gamma) <- U3 tau ~ dgamma(0.001, 0.001) sigma <- 1 / sqrt(tau) U3 ~ dnorm(0, 1.0E-4) }
list(alpha = 1, beta = 1, tau = 1, U3 = 0) mean sd MC_error val2.5pc median val97.5pc start sample

[3]

Examples Volume II
U3 alpha beta gamma sigma 1.912 2.665 0.9753 0.8684 0.09871 0.2609 0.07564 0.07757 0.02941 0.01474 0.01072 0.002835 0.00325 0.00123 2.373E-4 1.415 2.544 0.8274 0.8046 0.07482 1.904 2.655 0.9752 0.8704 0.09716 2.459 2.848 1.132 0.9212 0.1321 2001 2001 2001 2001 2001 9000 9000 9000 9000 9000

Dugongs

[4]

Otrees

Examples Volume II

Orange Trees: Non-linear growth curve


This dataset was originally presented by Draper and Smith (1981) and reanalysed by Lindstrom and Bates (1990). The data Yij consist of trunk circumference measurements recorded at time xj, j=1,...,7 for each of i = 1,..., 5 orange trees. We consider a logistic growth curve as follows: Yij ij ~ Normal(ij, c) = i1 _______________ 1 + i2 exp( i3 xj )

i1 = log( i1) i2 = log( i2 + 1) i3 = log( i3) The BUGS code is as follows model { for (i in 1:K) { for (j in 1:n) { Y[i, j] ~ dnorm(eta[i, j], tauC) eta[i, j] <- phi[i, 1] / (1 + phi[i, 2] * exp(phi[i, 3] * x[j])) } phi[i, 1] <- exp(theta[i, 1]) phi[i, 2] <- exp(theta[i, 2]) - 1 phi[i, 3] <- -exp(theta[i, 3]) for (k in 1:3) { theta[i, k] ~ dnorm(mu[k], tau[k]) } } tauC ~ dgamma(1.0E-3, 1.0E-3) sigmaC <- 1 / sqrt(tauC) varC <- 1 / tauC for (k in 1:3) { mu[k] ~ dnorm(0, 1.0E-4) tau[k] ~ dgamma(1.0E-3, 1.0E-3) sigma[k] <- 1 / sqrt(tau[k]) } }

[5]

Examples Volume II

Otrees

Data ( click to open ) Inits ( click to open ) Results


The hybrid Metropolis algorithm is used to sample the theta parameters in this model. The step length used for this algorithm adapts for the first 4000 iterations and these samples are discarded from the summary statistics. A further 1000 update burn-in followed by 10000 updates gave the following parameter estimates:
mu[1] mu[2] mu[3] sigma[1] sigma[2] sigma[3] sigmaC mean 5.257 2.211 -5.869 0.2332 0.1383 0.1012 8.065 sd 0.1279 0.1277 0.1091 0.1357 0.1147 0.08341 1.244 MC_error 0.002334 0.004119 0.004242 0.00204 0.003672 0.002777 0.03079 val2.5pc 5.002 1.965 -6.113 0.08448 0.02607 0.02317 6.014 median 5.256 2.209 -5.861 0.204 0.1078 0.07675 7.93 val97.5pc 5.505 2.469 -5.676 0.5494 0.4207 0.3234 10.92 start 5001 5001 5001 5001 5001 5001 5001 sample 10000 10000 10000 10000 10000 10000 10000

The current point Metropolis algorithm is used to sample the theta parameters in this model. The Gaussian proposal distribution used for this algorithm adapts for the first 4000 iterations and these samples are discarded from the summary statistics. A further 1000 update burn-in followed by 10000 updates gave the following parameter estimates:
mu[1] mu[2] mu[3] sigma[1] sigma[2] sigma[3] sigmaC theta[1,1] theta[1,2] theta[1,3] theta[2,1] theta[2,2] theta[2,3] theta[3,1] theta[3,2] theta[3,3] theta[4,1] theta[4,2] theta[4,3] theta[5,1] theta[5,2] theta[5,3] mean 5.254 2.22 -5.861 0.2245 0.1342 0.1098 8.025 5.079 2.134 -5.851 5.395 2.207 -5.825 5.079 2.187 -5.908 5.441 2.269 -5.816 5.291 2.299 -5.907 sd 0.1242 0.1252 0.1143 0.1235 0.1219 0.09349 1.216 0.08832 0.1542 0.149 0.05096 0.1245 0.1015 0.09932 0.1351 0.1494 0.04836 0.1395 0.1021 0.06828 0.1351 0.1075 MC_error 0.004513 0.007917 0.008563 0.00357 0.005743 0.005828 0.03895 0.007158 0.01001 0.0126 0.003465 0.008209 0.007943 0.008296 0.008393 0.01298 0.003287 0.009928 0.008087 0.005157 0.009323 0.008937 val2.5pc 5.002 1.994 -6.098 0.07706 0.02447 0.02354 6.03 4.949 1.823 -6.19 5.3 1.962 -6.028 4.945 1.915 -6.286 5.347 2.024 -6.008 5.174 2.05 -6.125 median 5.258 2.216 -5.86 0.1963 0.1009 0.08214 7.89 5.066 2.136 -5.849 5.393 2.205 -5.828 5.06 2.188 -5.89 5.442 2.256 -5.825 5.284 2.295 -5.903 val97.5pc 5.488 2.469 -5.657 0.5306 0.4428 0.3591 10.77 5.326 2.423 -5.583 5.505 2.46 -5.624 5.356 2.447 -5.666 5.543 2.566 -5.591 5.438 2.589 -5.7 start 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 5001 sample 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000

[6]

OtreesMVN

Examples Volume II

Orange Trees: Non-linear growth curve


We repeat the Otrees example, replacing the 3 independent univariate Normal priors for each ik , k=1,2,3 by a multivariate Normal prior i ~ MNV( , ) model { for (i in 1:K) { for (j in 1:n) { Y[i, j] ~ dnorm(eta[i, j], tauC) eta[i, j] <- phi[i, 1] / (1 + phi[i, 2] * exp(phi[i, 3] * x[j])) } phi[i, 1] <- exp(theta[i, 1]) phi[i, 2] <- exp(theta[i, 2]) - 1 phi[i, 3] <- -exp(theta[i, 3]) theta[i, 1:3] ~ dmnorm(mu[1:3], tau[1:3, 1:3]) } mu[1:3] ~ dmnorm(mean[1:3], prec[1:3, 1:3]) tau[1:3, 1:3] ~ dwish(R[1:3, 1:3], 3) sigma2[1:3, 1:3] <- inverse(tau[1:3, 1:3]) for (i in 1 : 3) {sigma[i] <- sqrt(sigma2[i, i]) } tauC ~ dgamma(1.0E-3, 1.0E-3) sigmaC <- 1 / sqrt(tauC) }

Data ( click to open ) Inits ( click to open ) Results


A 4000 iteration Metropolis adaptive phase plus 1000 update burn in followed by a further 10000 updates gave the parameter estimates:

[7]

Examples Volume II
mean 5.265 2.2 -5.88 0.2581 0.2679 0.2296 7.853 sd 0.1351 0.1656 0.141 0.1145 0.1291 0.1101 1.19 MC_error 0.001577 0.002555 0.002287 0.001681 0.002343 0.001523 0.02499 val2.5pc 4.992 1.874 -6.171 0.1268 0.1191 0.1085 5.923 median 5.263 2.197 -5.877 0.231 0.2368 0.2036 7.715 val97.5pc 5.537 2.522 -5.614 0.558 0.5925 0.5048 10.53 start 5001 5001 5001 5001 5001 5001 5001 sample 10000 10000 10000 10000 10000 10000 10000

OtreesMVN

mu[1] mu[2] mu[3] sigma[1] sigma[2] sigma[3] sigmaC

[8]

Biopsies

Examples Volume II

Biopsies: discrete variable latent class model


Spiegelhalter and Stovin (1983) presented data on repeated biopsies of transplanted hearts, in which a total of 414 biopsies had been taken at 157 sessions. Each biopsy was graded on evidence of rejection using a 4 category scale of none (O), minimal (M), mild (+) and moderatesevere (++). Part of the data is shown below.

Combination Multinomial response Session frequency _______________________________________________________ O O (2, 0, 0, 0) 12 M M O (1, 2, 0, 0) 10 + + O (1, 0, 2, 0) 17 ++ ++ ++ (0, 0, 0, 3) 5

The sampling procedure may not detect the area of maximum rejection, which is considered the true underlying state at the time of the session and denoted ti --- the underlying probability distribution of the four true states is denoted by the vector p. It is then assumed that each of the observed biopsies are conditionally independent given this truestate with the restriction that there are no`false positives': i.e. one cannot observe a biopsy worse than the true state. We then have the sampling model bi ~ Multinomial(eti, ni) ti ~ Categorical(p) where bi denotes the multinomial response at session i where ni biopsies have been taken, and ejk is the probability that a true state ti = j generates a biopsy in state k.The no-false-positive restriction means that e12 = e13 = e14 = e23 = e24 = e34 = 0. Spiegelhalter and Stovin (1983) estimated the parameters ej and p using the EM algorithm, with some smoothing to avoid zero estimates. The appropriate graph is shown below, where the role of the true state ti is simply to pick the appropriate row from the 4 x 4 error matrix e. Here the probability vectors ej (j = 1,...,4) and p are assumed to have uniform priors on the unit simplex, which correspond to Dirichlet priors with all parameters being 1. The BUGS code for this model is given below. No initial values are provided for the latent states, since the forward sampling procedure will find a configuration of starting values that is compatible with the expressed constraints. We also note the apparent ``cycle'' in the graph created by the expression nbiops[i] <- sum(biopsies[i,]). This will lead Such ``cycles'' are [9]

Examples Volume II

Biopsies

permitted provided that they are only data transformation statements, since this does not affect the essential probability model. model { for (i in 1 : ns){ nbiops[i] <- sum(biopsies[i, ]) true[i] ~ dcat(p[]) biopsies[i, 1 : 4] ~ dmulti(error[true[i], ], nbiops[i]) } error[2,1 : 2] ~ ddirch(prior[1 : 2]) error[3,1 : 3] ~ ddirch(prior[1 : 3]) error[4,1 : 4] ~ ddirch(prior[1 : 4]) p[1 : 4] ~ ddirch(prior[]); # prior for p }

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates
mean 0.5875 0.4125 0.342 0.03729 0.6207 0.09933 0.02225 0.2037 0.6747 0.1529 0.3109 0.3892 0.1471 sd 0.0663 0.0663 0.04584 0.01782 0.04782 0.04218 0.02302 0.06101 0.07271 0.04962 0.0549 0.0437 0.0298 MC_error 0.001731 0.001731 7.001E-4 2.503E-4 7.253E-4 5.187E-4 3.867E-4 9.381E-4 0.001124 0.001503 0.00144 6.675E-4 3.433E-4 val2.5pc 0.4557 0.286 0.256 0.009585 0.5222 0.03382 5.186E-4 0.1013 0.5228 0.04877 0.216 0.3055 0.094 median 0.5884 0.4116 0.3403 0.03488 0.622 0.09397 0.01488 0.1984 0.6792 0.1551 0.3066 0.3879 0.1448 val97.5pc 0.714 0.5444 0.4363 0.07774 0.7107 0.1968 0.08594 0.3374 0.8044 0.2459 0.4323 0.4775 0.2106 start 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000

error[2,1] error[2,2] error[3,1] error[3,2] error[3,3] error[4,1] error[4,2] error[4,3] error[4,4] p[1] p[2] p[3] p[4]

[10]

Eyes

Examples Volume II

Eyes: Normal Mixture Model


Bowmaker et al (1985) analyse data on the peak sensitivity wavelengths for individual microspectrophotometric records on a small set of monkey's eyes. Data for one monkey (S14 in the paper) are given below (500 has been subtracted from each of the 48 measurements).
29.0 35.4 38.5 43.8 50.6 30.0 35.9 38.6 43.9 51.2 32.0 36.1 39.4 45.3 51.4 33.1 36.3 39.6 46.2 51.5 33.4 36.4 40.4 48.8 51.6 33.6 36.6 40.8 48.7 52.8 33.7 37.0 42.0 48.9 52.9 34.1 37.4 42.8 49.0 53.2 34.8 37.5 43.0 49.4 35.3 38.3 43.5 49.9

Part of the analysis involves fitting a mixture of two normal distributions with common variance to this distribution, so that each observation yi is assumed drawn from one of two groups. Ti = 1, 2 be the true group of the i th observation, where group j has a normal distribution with mean j and precision . We assume an unknown fraction P of observations are in group 2, 1 - P in group 1. The model is thus yi ~ Normal(Ti, ) Ti ~ Categorical(P). We note that this formulation easily generalises to additional components to the mixture, although for identifiability an order constraint must be put onto the group means. Robert (1994) points out that when using this model, there is a danger that at some iteration, all the data will go into one component of themixture, and this state will be difficult to escape from --this matches our experience. obert suggests a re-parameterisation, a simplified version of which is to assume 2 = 1 + , > 0. 1, , , P, are given independent ``noninformative" priors, including a uniform prior for P on (0,1). The appropriate graph and the BUGS code are given below.

[11]

Examples Volume II
name: mean theta 0.0 type: precision stochastic 1.0E-6 density: dnorm upper bound 0 upper bound

Eyes

theta

lambda[1]

P[1:2]

alpha[]

lambda[2]

T[i]

tau

mu[i]

sigma y[i] for(i IN 1 : N)

model { for( i in 1 : N ) { y[i] ~ dnorm(mu[i], tau) mu[i] <- lambda[T[i]] T[i] ~ dcat(P[]) } P[1:2] ~ ddirch(alpha[]) theta ~ dnorm(0.0, 1.0E-6)I(0.0, ) lambda[2] <- lambda[1] + theta lambda[1] ~ dnorm(0.0, 1.0E-6) tau ~ dgamma(0.001, 0.001) sigma <- 1 / sqrt(tau) }

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates

[12]

Eyes
mean 0.6014 0.3986 536.8 548.9 3.805 sd 0.08981 0.08981 1.023 1.388 0.726 MC_error 0.002305 0.002305 0.03708 0.03856 0.03322 val2.5pc 0.4267 0.2299 535.0 546.0 2.932 median 0.602 0.398 536.7 548.9 3.652 val97.5pc 0.7701 0.5733 539.0 551.3 6.014 start 1001 1001 1001 1001 1001

Examples Volume II
sample 10000 10000 10000 10000 10000

P[1] P[2] lambda[1] lambda[2] sigma

[13]

Examples Volume II

Hearts

Hearts: a mixture model for count data


The table below presents data given by Berry (1987) on the effect of a drug used to treat patients with frequent premature ventricular contractions (PVCs) of the heart.
PVC's per minute number (i) Pre-drug (xi) Post-drug (yi) Decrease __________________________________________________ 1 6 5 1 2 9 2 7 3 17 0 17 . .. .. .. 11 9 13 4 12 51 0 51

Farewell and Sprott (1988) model these data as a mixture distribution of Poisson counts in which some patients are "cured" by the drug, whilst others experience varying levels of response but remain abnormal. A zero count for the post-drug PVC may indicate a "cure", or may represent a sampling zero from a patient with a mildly abnormal PVC count. The following model thus is assumed: xi ~ Poisson(i) for all patients yi ~ Poisson( i) for all uncured patients P(cure) = To eliminate nuisance parameters li, Farewell and Sprott use the conditional distribution of yi given ti = xi + yi. This is equivalent to a binomial likelihood for yi with denominator ti and probability p = b /(1+b) (see Cox and Hinkley, 1974 pp. 136-137 for further details of the conditional distribution for Poisson variables). Hence the final mixture model may be expressed as follows: P(yi = 0 | ti ) = + (1 ) (1 p) ti P(yi | ti ) = (1 ) (ti! / (yi! (tiyi)!)) (pyi (1 p) (ti yi)

yi = 1,2,...,ti

The BUGS code for this model is given below: model { for (i in 1 : N) { y[i] ~ dbin(P[state1[i]], t[i]) state[i] ~ dbern(theta) [14]

Hearts state1[i] <- state[i] + 1 t[i] <- x[i] + y[i] prop[i] <- P[state1[i]]

Examples Volume II

} P[1] <- p P[2] <- 0 logit(p) <- alpha alpha ~ dnorm(0,1.0E-4) beta <- exp(alpha) logit(theta) <- delta delta ~ dnorm(0, 1.0E-4)

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates
mean -0.4809 0.6427 0.3144 0.5717 sd 0.2795 0.1812 0.6177 0.1391 MC_error 0.002701 0.001765 0.006344 0.001417 val2.5pc -1.044 0.3521 -0.8919 0.2907 median -0.4767 0.6208 0.3124 0.5775 val97.5pc 0.0652 1.067 1.553 0.8253 start 1001 1001 1001 1001 sample 10000 10000 10000 10000

alpha beta delta theta

[15]

Examples Volume II

Air

Air: Berkson measurement error


Whittemore and Keller (1988) use an approximate maximum likelihood approach to analyse the data shown below on reported respiratory illness versus exposure to nitrogen dioxide (NO2) in 103 children. Stephens and Dellaportas (1992) later use Bayesian methods to analyse the same data.
Bedroom NO2 level in ppb (z) Respiratory illness (y) <20 20--40 40+ Total ________________________________________________ Yes 21 20 15 56 No 27 14 6 47 Total 48 34 21 103

A discrete covariate zj (j = 1,2,3) representing NO2 concentration in the child's bedroom classified into 3 categories is used as a surrogate for true exposure. The nature of the measurement error relationship associated with this covariate is known precisely via a calibration study, and is given by xj = + zj + j where = 4.48, = 0.76 and j is a random element having normal distribution with zero mean and variance 2 (= 1/) = 81.14. Note that this is a Berkson (1950) model of measurement error, in which the true values of the covariate are expressed as a function of the observed values. Hence the measurement error is independent of the latter, but is correlated with the true underlying covariate values. In the present example, the observed covariate zj takes values 10, 30 or 50 for j = 1, 2, or 3 respectively (i.e. the mid-point of each category), whilst xj is interpreted as the "true average value" of NO2 in group j. The response variable is binary, reflecting presence/absence of respiratory illness, and a logistic regression model is assumed. That is yj ~ Binomial(pj, nj) logit(pj) = 1 + 2 xj where pj is the probability of respiratory illness for children in the jth exposure group. The regression coefficients 1 and 2 are given vague independent normal priors. The graphical model is shown below:

[16]

Air

Examples Volume II

tau

mu[j] theta[1] X[j]

theta[2]

p[j]

n[j]

y[j] for(j IN 1 : J)

model { for(j in 1 : J) { y[j] ~ dbin(p[j], n[j]) logit(p[j]) <- theta[1] + theta[2] * X[j] X[j] ~ dnorm(mu[j], tau) mu[j] <- alpha + beta * Z[j] } theta[1] ~ dnorm(0.0, 0.001) theta[2] ~ dnorm(0.0, 0.001) }

Data Inits

( click to open ) ( click to open )

Results
A 1000 update burn in followed by a further 10000 updates gave the parameter estimates

[17]

Examples Volume II
mean 12.92 27.21 40.85 -0.9628 0.04927 sd 7.877 7.473 8.721 1.0 0.04276 MC_error 0.4227 0.1946 0.3502 0.08808 0.003759 val2.5pc -3.775 13.05 24.18 -4.233 0.004071 median 13.3 27.01 40.84 -0.7183 0.03857 val97.5pc 26.96 42.63 58.37 0.2104 0.1951 start 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000

Air

X[1] X[2] X[3] theta[1] theta[2]

Re-parameterised model with centred covariates: model { for( j in 1 : J ) { y[j] ~ dbin(p[j],n[j]) logit(p[j]) <- theta0+ theta[2] * (X[j] - mean(mu[])) X[j] ~ dnorm(mu[j],tau) mu[j] <- alpha + beta * Z[j] } theta0 ~ dnorm(0.0,0.001) theta[2] ~ dnorm(0.0,0.001) theta[1] <- theta0 - theta[2] * mean(mu[]) }

Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates, with over-relaxation.
mean 13.27 27.28 41.03 -0.9269 0.04688 sd 8.04 7.455 8.468 0.7985 0.02995 MC_error 0.4047 0.1798 0.2267 0.05205 0.001802 val2.5pc -3.199 12.69 25.39 -3.068 0.003793 median 13.57 27.2 40.83 -0.7581 0.04105 val97.5pc 28.24 42.06 58.25 0.206 0.1188 start 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000

X[1] X[2] X[3] theta[1] theta[2]

[18]

Cervix

Examples Volume II

Cervix: case - control study with errors in covariates


Carroll, Gail and Lubin (1993) consider the problem of estimating the odds ratio of a disease d in a case-control study where the binary exposure variable is measured with error. Their example concerns exposure to herpes simplex virus (HSV) in women with invasive cervical cancer (d=1) and in controls (d=0). Exposure to HSV is measured by a relatively inaccurate western blot procedure w for 1929 of the 2044 women, whilst for 115 women, it is also measured by a refined or "gold standard'' method x. The data are given in the table below. They show a substantial amount of misclassification, as indicated by low sensitivity and specificity of w in the "complete'' data, and Carroll, Gail and Lubin also found that the degree of misclassification was significantly higher for the controls than for the cases (p=0.049 by Fisher's exact test).

d x w Count __________________________ Complete data __________________________ 1 0 0 13 1 0 1 3 1 1 0 5 1 1 1 18 0 0 0 33 0 0 1 11 0 1 0 16 0 1 1 16 _________________________ Incomplete data _________________________ 1 0 318 1 1 375 1 0 701 1 1 535

They fitted a prospective logistic model to the case-control data as follows di ~ Bernoulli(pi) i = 1,...,2044 logit(pi) = 0C + xi i = 1,...,2044 where is the log odds ratio of disease. Since the relationship between d and x is only directly observable in the 115 women with "complete'' data, and because there is evidence of differential measurement error, the following parameters are required in order to estimate the logistic model

[19]

Examples Volume II 1,1 1,2 2,1 2,2 q = = = = = P(w=1 | x=0, d=0) P(w=1 | x=0, d=1) P(w=1 | x=1, d=0) P(w=1 | x=1, d=1) P(x=1)

Cervix

The differential probability of being exposed to HSV (x=1) for cases and controls is calculated as follows

= =

P(x=1 | d=1) P(d=1 | x=1) P(x=1) ----------------------------P(d=1) 1 ----------------------------------------------------1 + (1 + exp 0C + ) / (1 + exp 0C) P(x=1 | d=0) P(d=0 | x=1) P(x=1) ----------------------------P(d=0) 1 1-q -----------------------------------------------------------1 + (1 + exp 0C ) / (1 + exp 0C) q 1-q -------q

= =

The BUGS code is given below. The role of the variables x1 and d1 is to pick the appropriate value of (the incidence of w) for any given true exposure status x and disease status d. Since x and d take the values 0 or 1, and the subscripts for take values 1 or 2, we must first add 1 to each x[i] and d[i] in the BUGS code before using them as index values for . BUGS does not allow subscripts to be functions of variable quantities --- hence the need to create x1and d1 for use as subscripts. In addition, note that 1 and 2 were not simulated directly in BUGS, but were calculated as functions of other parameters. This is because the dependence of 1 and 2 on d would have led to a cycle in the graphical model which would no longer define a probability distribution. model { for (i in 1 : N) { x[i] ~ dbern(q) # incidence of HSV logit(p[i]) <- beta0C + beta * x[i] # logistic model d[i] ~ dbern(p[i]) # incidence of cancer x1[i] <- x[i] + 1 d1[i] <- d[i] + 1 [20]

Cervix

Examples Volume II w[i] ~ dbern(phi[x1[i], d1[i]]) # incidence of w } q ~ dunif(0.0, 1.0) # prior distributions beta0C ~ dnorm(0.0, 0.00001); beta ~ dnorm(0.0, 0.00001); for(j in 1 : 2) { for(k in 1 : 2){ phi[j, k] ~ dunif(0.0, 1.0) } } # calculate gamma1 = P(x=1|d=0) and gamma2 = P(x=1|d=1) gamma1 <- 1 / (1 + (1 + exp(beta0C + beta)) / (1 + exp(beta0C)) * (1 - q) / q) gamma2 <- 1 / (1 + (1 + exp(-beta0C - beta)) / (1 + exp(-beta0C)) * (1 - q) / q) }

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates
mean -0.9265 0.4371 0.5969 0.3177 0.2138 0.5694 0.7623 0.4943 sd 0.207 0.05431 0.06438 0.05363 0.08148 0.06434 0.06328 0.04017 MC_error 0.01102 0.002994 0.003332 0.002669 0.004308 0.002928 0.003054 0.002071 val2.5pc -1.357 0.3286 0.4727 0.2142 0.07201 0.4461 0.6371 0.4135 median -0.9144 0.4372 0.5948 0.3187 0.209 0.5696 0.7643 0.494 val97.5pc -0.5501 0.5451 0.731 0.4168 0.3849 0.691 0.8789 0.5728 start 1001 1001 1001 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000 10000 10000 10000

beta0C gamma1 gamma2 phi[1,1] phi[1,2] phi[2,1] phi[2,2] q

Re-parameterised model with centred covariates: model { for (i in 1 : N) { x[i] ~ dbern(q) # incidence of HSV logit(p[i]) <- beta0 + beta * (x[i] - mean(w[])) # logistic model d[i] ~ dbern(p[i]) # incidence of cancer x1[i] <- x[i] + 1 d1[i] <- d[i] + 1 w[i] ~ dbern(phi[x1[i], d1[i]]) # incidence of w } q ~ dunif(0.0, 1.0) # prior distributions [21]

Examples Volume II beta0 ~ dnorm(0.0, 0.00001); beta ~ dnorm(0.0, 0.00001); for(j in 1 : 2) { for(k in 1 : 2){ phi[j, k] ~ dunif(0.0, 1.0) } } # calculate gamma1 = P(x=1|d=0) and gamma2 = P(x=1|d=1) gamma1 <- 1 / (1 + (1 + exp(beta0C + beta)) / (1 + exp(beta0C)) * (1 - q) / q) gamma2 <- 1 / (1 + (1 + exp(-beta0C - beta)) / (1 + exp(-beta0C)) * (1 - q) / q) beta0C <- beta0 - mean(w[]) * beta }

Cervix

Inits ( click to open )


beta0C gamma1 gamma2 phi[1,1] phi[1,2] phi[2,1] phi[2,2] q mean -0.921 0.4389 0.5964 0.318 0.221 0.5664 0.7585 0.4953 sd 0.2036 0.05766 0.0635 0.05831 0.08396 0.0666 0.06472 0.04198 MC_error 0.0114 0.003321 0.003451 0.003108 0.004839 0.003198 0.003465 0.002285 val2.5pc -1.327 0.3274 0.4721 0.2003 0.0738 0.4325 0.6282 0.4138 median -0.9178 0.4396 0.5967 0.319 0.2146 0.5682 0.7603 0.4953 val97.5pc -0.5276 0.5513 0.719 0.4263 0.3988 0.6918 0.8797 0.5773 start 1001 1001 1001 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000 10000 10000 10000

[22]

Birats

Examples Volume II

Birats: a bivariate normal hierarchical model


We return to the Rats example, and illustrate the use of a multivariate Normal (MVN) population distribution for the regression coefficients of the growth curve for each rat. This is the model adopted by Gelfand etal (1990) for these data, and assumes a priori that the intercept and slope parameters for each rat are correlated. For example, positive correlation would imply that initially heavy rats (high intercept) tend to gain weight more rapidly (steeper slope) than lighter rats. The model is as follows Yij ~ Normal(ij, c) ij = 1i + 2i xj i ~ MVN( , ) where Yij is the weight of the ith rat measured at age xj, and i denotes the vector ( 1i, 2i). We assume 'non-informative' independent univariate Normal priors for the separate components 1 and 2. A Wishart(R, ) prior was specified for , the population precision matrix of the regression coefficients. To represent vague prior knowledge, we chose the the degrees of freedom for this distribution to be as small as possible (i.e. 2, the rank of ). The scale matrix was specified as
R= | 200, | 0, 0 | 0.2 |

This represents our prior guess at the order of magnitude of the covariance matrix -1 for i (see Classic BUGS manual (version 0.5) section on Multivariate normal models), and is equivalent to the prior specification used by Gelfand et al. Finally, a non-informative Gamma(0.001, 0.001) prior was assumed for the measurement precision c.

[23]

Examples Volume II

Birats

mean[]

mu.beta[1:2]

R[1:2, 1:2]

Omega[ , ]

prec[ , ]

beta[i, 1:2]

tauC mu[i, j] x[j]

sigma

Y[i, j] for(j IN 1 : T) for(i IN 1 : N)

model { for( i in 1 : N ) { beta[i , 1 : 2] ~ dmnorm(mu.beta[], R[ , ]) for( j in 1 : T ) { Y[i, j] ~ dnorm(mu[i , j], tauC) mu[i, j] <- beta[i, 1] + beta[i, 2] * x[j] } } mu.beta[1 : 2] ~ dmnorm(mean[], prec[ , ]) R[1 : 2 , 1 : 2] ~ dwish(Omega[ , ], 2) tauC ~ dgamma(0.001, 0.001) sigma <- 1 / sqrt(tauC)

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates

[24]

Birats
mean 106.6 6.185 6.136 sd 2.35 0.1062 0.4781 MC_error 0.0335 0.001351 0.009095 val2.5pc 101.8 5.981 5.283 median 106.6 6.185 6.1 val97.5pc 111.2 6.397 7.137 start 1001 1001 1001

Examples Volume II
sample 10000 10000 10000

mu.beta[1] mu.beta[2] sigma

[25]

Examples Volume II

Schools

Schools: ranking school examination results using multivariate hierarcical models


Goldstein et al. (1993) present an analysis of examination results from inner London schools. They use hierarchical or multilevel models to study the between-school variation, and calculate school-level residuals in an attempt to differentiate between `good' and `bad' schools. Here we analyse a subset of this data and show how to calculate a rank ordering of schools and obtain credible intervals on each rank. Data Standardized mean examination scores (Y) were available for 1978 pupils from 38 different schools. The median number of pupils per school was 48, with a range of 1--198. Pupil-level covariates included gender plus a standardized London Reading Test (LRT) score and a verbal reasoning (VR) test category (1, 2 or 3, where 1 represents the highest ability group) measured when each child was aged 11. Each school was classified by gender intake (all girls, all boys or mixed) and denomination (Church of England, Roman Catholic, State school or other); these were used as categorical school-level covariates. Model We consider the following model, which essentially corresponds to Goldstein et al.'s model 1. Yij ij ~ Normal(ij, ij) = 1j + 2j LRTij + 3j VR1ij + 1 LRTij2 + 2 VR2ij + 3 Girlij + 4 Girls' schoolj + 5 Boys' schoolj + 6 CE schoolj + 7 RC schoolj + 8 other schoolj log ij = + LRTij

where i refers to pupil and j indexes school. We wish to specify a regression model for the variance components, and here we model the logarithm of ij (the inverse of the between-pupil variance) as a linear function of each pupil's LRT score. This differs from Goldstein et al.'s model which allows the variance 2ij to depend linearly on LRT. However, such a parameterization may lead to negative estimates of 2ij. Prior distributions The fixed effects k (k=1,...,8), and were assumed to follow vague independent Normal distributions with zero mean and low precision = 0.0001. The random school-level coefficients [26]

Schools

Examples Volume II

kj (k = 1,2,3) were assumed to arise from a multivariate normal population distribution with unknown mean and covariance matrix . A non-informative multivariate normal prior was then specified for the population mean , whilst the inverse covariance matrix = -1 was assumed to follow a Wishart distribution. To represent vague prior knowledge, we chose the the degrees of freedom for this distribution to be as small as possible (i.e. 3, the rank of ). The scale matrix R was specified as
0.1 0.005 0.005 0.005 0.01 0.005 0.005 0.005 0.01

which represents our prior guess at the order of magnitude of . The BUGS code is given below: model { for(p in 1 : N) { Y[p] ~ dnorm(mu[p], tau[p]) mu[p] <- alpha[school[p], 1] + alpha[school[p], 2] * LRT[p] + alpha[school[p], 3] * VR[p, 1] + beta[1] * LRT2[p] + beta[2] * VR[p, 2] + beta[3] * Gender[p] + beta[4] * School.gender[p, 1] + beta[5] * School.gender[p, 2] + beta[6] * School.denom[p, 1] + beta[7] * School.denom[p, 2] + beta[8] * School.denom[p, 3] log(tau[p]) <- theta + phi * LRT[p] sigma2[p] <- 1 / tau[p] LRT2[p] <- LRT[p] * LRT[p] } min.var <- exp(-(theta + phi * (-34.6193))) # lowest LRT score = -34.6193 max.var <- exp(-(theta + phi * (37.3807))) # highest LRT score = 37.3807 # Priors for fixed effects: for (k in 1 : 8) { beta[k] ~ dnorm(0.0, 0.0001) } theta ~ dnorm(0.0, 0.0001) phi ~ dnorm(0.0, 0.0001) # Priors for random coefficients: for (j in 1 : M) { alpha[j, 1 : 3] ~ dmnorm(gamma[1:3 ], T[1:3 ,1:3 ]); alpha1[j] <- alpha[j,1] } # Hyper-priors: [27]

Examples Volume II gamma[1 : 3] ~ dmnorm(mn[1:3 ], prec[1:3 ,1:3 ]); T[1 : 3, 1 : 3 ] ~ dwish(R[1:3 ,1:3 ], 3)

Schools

Data ( click to open )


Note that school is a 1978 x 3 matrix taking value 1 for all pupils in school 1, 2 for all pupils in school 2 and so on. For computational convenience, Y, mu and tau are indexed over a single dimension p = 1,...,1978 rather than as pupil i within school j as used in equations above. The appropriate school-level coefficients for pupil p are then selected using the school indicator in row p of the data array --- for example alpha[school[p],1].

Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates

beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8] gamma[1] gamma[2] gamma[3] phi theta

mean 2.64E-4 0.4219 0.1725 0.125 0.06201 -0.2769 0.1441 -0.1667 -0.6778 0.03135 0.9597 -0.002605 0.5801

sd 9.842E-5 0.06225 0.04834 0.1377 0.1038 0.1875 0.1061 0.1733 0.09568 0.01019 0.08626 0.002829 0.03205

MC_error 2.429E-6 0.00344 0.001427 0.006001 0.004941 0.007158 0.004271 0.006393 0.005593 1.396E-4 0.004849 3.159E-5 3.518E-4

val2.5pc 7.499E-5 0.301 0.07689 -0.1558 -0.1475 -0.6584 -0.05912 -0.4943 -0.8668 0.01139 0.7947 -0.008146 0.5163

median 2.638E-4 0.4206 0.1722 0.1276 0.06323 -0.2728 0.1428 -0.1675 -0.6783 0.03139 0.959 -0.00259 0.5803

val97.5pc 4.558E-4 0.5466 0.2682 0.3939 0.2624 0.08729 0.36 0.1846 -0.4862 0.05137 1.129 0.002927 0.6414

start 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001

sample 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000

Estimating the ranks The school-specific intercept j1 measures the 'residual effect' for school j after adjusting for pupil- and school-level covariates. This might represent an appropriate quantity by which to rank schools' performance. We compute the ranks in BUGS using the "rank" option of the "Statistics" menu, which we set for the variable alpha at the same time as we set the "sample monitor" option. Since the rank is a function of stochastic nodes, its value will change at every iteration. Hence we may obtain a posterior distribution for the rank of alpha[, k] which may be summarized by posterior histograms as shown below:

[28]

Ice

Examples Volume II

Ice: non-parametric smoothing in an age-cohort model


Breslow and Clayton (1993) analyse breast cancer rates in Iceland by year of birth (K = 11 cohorts from 1840-1849 to 1940-1949) and by age (J =13 groups from 20-24 to 80-84 years). Due to the number of empty cells we consider a single indexing over I = 77 observed number of cases, giving data of the following form.

i agei yeari casesi person-yearsi _____ ________________________________________ 1 1 6 2 41380 2 1 7 0 43650 ... ... 77 13 5 31 13600

In order to pull in the extreme risks associated with small birth cohorts, Breslow and Clayton first consider the exchangeable model casesi ~ Poisson(i) log i = log person-yearsi + agei + yeari k ~ Normal( 0, )

Autoregressive smoothing of relative risks They then consider the alternative approach of smoothing the rates for the cohorts by assuming an auto-regressive model on the 's, assuming the second differences are independent normal variates. This is equivalent to a model and prior distribution casesi ~ Poisson(i) log i = log person-yearsi + agei + yeari 1 ~ Normal( 0, 0.000001 ) 2 | 1 ~ Normal( 0, 0.000001 ) k | 1,...,k-1 ~ Normal( 2 k-1 k-2, )

k>2

We note that 1 and 2 are given "non-informative" priors, but retain a term in order to provide the appropriate likelihood for . For computational reasons Breslow and Clayton impose constraints on their random effects k in order that their mean and linear trend are zero, and counter these constraints by introducing a [29]

Examples Volume II

Ice

linear term b x yeari and allowing unrestrained estimation of j. Since we allow free movement of the 's we dispense with the linear term, and impose a "corner" constraint 1 =0 . model { for (i in 1:I) { cases[i] ~ dpois(mu[i]) log(mu[i]) <- log(pyr[i]) + alpha[age[i]] + beta[year[i]] } betamean[1] <- 2 * beta[2] - beta[3] Nneighs[1] <- 1 betamean[2] <- (2 * beta[1] + 4 * beta[3] - beta[4]) / 5 Nneighs[2] <- 5 for (k in 3 : K - 2) { betamean[k] <- (4 * beta[k - 1] + 4 * beta[k + 1]- beta[k - 2] - beta[k + 2]) / 6 Nneighs[k] <- 6 } betamean[K - 1] <- (2 * beta[K] + 4 * beta[K - 2] - beta[K - 3]) / 5 Nneighs[K - 1] <- 5 betamean[K] <- 2 * beta[K - 1] - beta[K - 2] Nneighs[K] <- 1 for (k in 1 : K) { betaprec[k] <- Nneighs[k] * tau } for (k in 1 : K) { beta[k] ~ dnorm(betamean[k], betaprec[k]) logRR[k] <- beta[k] - beta[5] tau.like[k] <- Nneighs[k] * beta[k] * (beta[k] - betamean[k]) } alpha[1] <- 0.0 for (j in 2 : Nage) { alpha[j] ~ dnorm(0, 1.0E-6) } d <- 0.0001 + sum(tau.like[]) / 2 r <- 0.0001 + K / 2 tau ~ dgamma(r, d) sigma <- 1 / sqrt(tau) }

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 100000 updates gave the parameter estimates [30]

Ice
mean -1.075 -0.7717 -0.4721 -0.2016 0.1588 0.319 0.4829 0.6512 0.8466 1.059 0.05286 sd 0.2503 0.1584 0.08179 0.03908 0.04625 0.06949 0.08673 0.1066 0.1281 0.1811 0.04374 MC_error 0.008578 0.005506 0.002555 6.68E-4 0.001162 0.002112 0.002982 0.003936 0.00484 0.006206 0.001335 val2.5pc -1.619 -1.107 -0.651 -0.278 0.04592 0.164 0.3022 0.4366 0.5911 0.7041 0.006732 median -1.043 -0.755 -0.463 -0.2018 0.1683 0.3282 0.4896 0.6566 0.8513 1.06 0.04159 val97.5pc -0.6951 -0.5203 -0.338 -0.1166 0.2269 0.4369 0.6469 0.8613 1.094 1.415 0.1625 start 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001

Examples Volume II
sample 100000 100000 100000 100000 100000 100000 100000 100000 100000 100000 100000

logRR[1] logRR[2] logRR[3] logRR[4] logRR[6] logRR[7] logRR[8] logRR[9] logRR[10] logRR[11] sigma

[31]

Examples Volume II

Beetles

Beetles: choice of link function


Dobson (1983) analyses binary dose-response data published by Bliss (1935), in which the numbers of beetles killed after 5 hour exposure to carbon disulphide at N = 8 different concentrations are recorded:
Concentration (xi) Number of beetles (ni) Number killed (ri) ______________________________________________________ 1.6907 59 6 1.7242 60 13 1.7552 62 18 1.7842 56 28 1.8113 63 52 1.8369 59 52 1.8610 62 61 1.8839 60 60

We assume that the observed number of deaths ri at each concentration xi is binomial with sample size ni and true rate pi. Plausible models for pi include the logistic, probit and extreme value (complimentary log-log) models, as follows pi = exp( + xi) / (1 + exp( + xi) pi = Phi( + xi) pi = 1 - exp(-exp( + xi)) The corresponding graph is shown below:

[32]

Beetles

Examples Volume II

alpha

beta

alpha.star

x[i]

p[i]

n[i]

r[i] for(i IN 1 : N)

rhat[i]

model { for( i in 1 : N ) { r[i] ~ dbin(p[i],n[i]) logit(p[i]) <- alpha.star + beta * (x[i] - mean(x[])) rhat[i] <- n[i] * p[i] } alpha <- alpha.star - beta * mean(x[]) beta ~ dnorm(0.0,0.001) alpha.star ~ dnorm(0.0,0.001) }

Data ( click to open ) Inits ( click to open ) Results


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates Logit model

[33]

Examples Volume II
mean -60.79 34.31 3.56 9.932 22.47 33.87 50.03 53.21 59.14 58.68 sd 5.147 2.893 0.9488 1.677 2.091 1.751 1.646 1.102 0.7338 0.4241 MC_error 0.05624 0.03171 0.009435 0.01549 0.01736 0.0152 0.01661 0.01191 0.008143 0.004761 val2.5pc -71.29 28.91 1.997 6.909 18.36 30.32 46.66 50.86 57.52 57.72 median -60.67 34.24 3.463 9.851 22.46 33.89 50.06 53.28 59.22 58.74 val97.5pc -51.17 40.23 5.634 13.43 26.63 37.25 53.12 55.17 60.38 59.36 start 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000

Beetles

alpha beta rhat[1] rhat[2] rhat[3] rhat[4] rhat[5] rhat[6] rhat[7] rhat[8]

Probit model
mean -35.04 19.79 3.442 10.76 23.48 33.81 49.6 53.27 59.6 59.17 sd 2.646 1.488 1.014 1.692 1.916 1.626 1.648 1.17 0.7542 0.3729 MC_error 0.02934 0.01657 0.0106 0.01684 0.01865 0.01706 0.01865 0.01353 0.008725 0.004308 val2.5pc -40.46 16.9 1.743 7.632 19.79 30.58 46.27 50.76 57.88 58.28 median -34.98 19.75 3.348 10.7 23.47 33.83 49.63 53.33 59.67 59.23 val97.5pc -29.93 22.84 5.693 14.23 27.24 36.96 52.73 55.38 60.84 59.72 start 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000

alpha beta rhat[1] rhat[2] rhat[3] rhat[4] rhat[5] rhat[6] rhat[7] rhat[8]

Extreme value (cloglog) model


mean -39.77 22.15 5.623 11.28 20.91 30.32 47.74 54.08 61.02 59.92 sd 3.221 1.788 1.119 1.581 1.891 1.666 1.74 1.231 0.5304 0.09563 MC_error 0.02839 0.01573 0.01006 0.01461 0.0183 0.01688 0.01713 0.01134 0.004795 9.349E-4 val2.5pc -46.41 18.81 3.63 8.351 17.29 26.98 44.21 51.48 59.75 59.66 median -39.68 22.1 5.551 11.21 20.89 30.33 47.77 54.15 61.12 59.95 val97.5pc -33.74 25.85 8.055 14.52 24.66 33.56 51.01 56.25 61.77 60.0 start 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 sample 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000

alpha beta rhat[1] rhat[2] rhat[3] rhat[4] rhat[5] rhat[6] rhat[7] rhat[8]

[34]

Endo

Examples Volume II

Endo: conditional inference in case-control studies


Breslow and Day (1980) analyse a set of data from a case-control study relating endometrial cancer with exposure to estrogens. 183 pairs of cases and controls were studied, and the full data is shown below.
Status of control Status of case Not exposed Exposed ____________________________________ Not exposed n00 = 121 n01 = 7 Exposed n10 = 43 n11 = 12

We denote estrogen exposure as xij for the ith case-control pair, where j=1 for a case and j=2 for a control. The conditional likelihood for the log (odds ratio) is then given by i exp xi1 / (exp xi1 + exp xi2) We shall illustrate three methods of fitting this model. It is convenient to denote the fixed disease status as a variable Yi1 = 1, Yi2 = 0. First, Breslow and Day point out that for case-control studies with a single control per case, we may obtain this likelihood by using unconditional logistic regression for each case-control pair. That is Yi1 ~ Binomial(pi,2) logit pi = (xi1 xi2) Second, the Classic BUGS manual (version 0.5) section on Conditional likelihoods in casecontrol studies discusses fitting this likelihood directly by assuming the model Yi. pij log eij ~ Multinomial(pi., 1) = eij / j eij = xij

Finally, the Classic BUGS manual (version 0.5) shows how the multinomial-Poisson transformation can be used. In general, this will be more efficient than using the multinomiallogistic parameterisation above, since it avoids the time-consuming evaluation of j eij. However, in the present example this summation is only over J=2 elements, whilst the multinomial-Poisson parameterisation involves estimation of an additional intercept parameter for each of the 183 strata. Consequently the latter is less efficient than the multinomial-logistic in this case.

[35]

Examples Volume II

Endo

We note that all these formulations may be easily extended to include additional subject-specific covariates, and that the second and third methods can handle arbitrary numbers of controls per case. In addition, the Bayesian approach allows the incorporation of hierarchical structure, measurement error, missing data and so on. All these techniques are illustrated in the code given below, which includes a transformation of the original summary statistics into full data. In this example, all but the second conditionallikelihood approach are commented out. model { # transform collapsed data into full for (i in 1 : I){ Y[i,1] <- 1 Y[i,2] <- 0 } # loop around strata with case exposed, control not exposed (n10) for (i in 1 : n10){ est[i,1] <- 1 est[i,2] <- 0 } # loop around strata with case not exposed, control exposed (n01) for (i in (n10+1) : (n10+n01)){ est[i,1] <- 0 est[i,2] <- 1 } # loop around strata with case exposed, control exposed (n11) for (i in (n10+n01+1) : (n10+n01+n11)){ est[i,1] <- 1 est[i,2] <- 1 } # loop around strata with case not exposed, control not exposed (n00) for (i in (n10+n01+n11+1) :I ){ est[i,1] <- 0 est[i,2] <- 0 } # PRIORS beta ~ dnorm(0,1.0E-6) ; # LIKELIHOOD for (i in 1 : I) { # loop around strata # METHOD 1 - logistic regression # Y[i,1] ~ dbin( p[i,1], 1) # logit(p[i,1]) <- beta * (est[i,1] - est[i,J])

[36]

Endo

Examples Volume II # METHOD 2 - conditional likelihoods Y[i, 1 : J] ~ dmulti( p[i, 1 : J],1) for (j in 1:2){ p[i, j] <- e[i, j] / sum(e[i, ]) log( e[i, j] ) <- beta * est[i, j] } # METHOD 3 fit standard Poisson regressions relative to baseline #for (j in 1:J) { # Y[i, j] ~ dpois(mu[i, j]); # log(mu[i, j]) <- beta0[i] + beta*est[i, j]; } #beta0[i] ~ dnorm(0, 1.0E-6) }

Data ( cklick to open ) Inits ( cklick to open ) Results


A 5000 update burn in followed by a further 10000 updates gave the parameter estimates
beta mean 1.871 sd 0.4123 MC_error val2.5pc 0.009414 1.111 median 1.844 val97.5pc start 2.761 5001 sample 10000

[37]

Examples Volume II

Stagnant

Stagnant: a changepoint problem (and an illustration of how NOT to do MCMC!)


Carlin, Gelfand and Smith (1992) analyse data from Bacon and Watts (1971) concerning a changepoint in a linear regression.
i xi Yi i xi Yi i xi Yi ______________________________________________________ 1 1.39 1.12 11 0.12 0.60 21 0.44 0.13 2 1.39 1.12 12 0.12 0.59 22 0.59 0.01 3 1.08 0.99 13 0.01 0.51 23 0.70 0.13 4 1.08 1.03 14 0.11 0.44 24 0.70 0.14 5 0.94 0.92 15 0.11 0.43 25 0.85 0.30 6 0.80 0.90 16 0.11 0.43 26 0.85 0.33 7 0.63 0.81 17 0.25 0.33 27 0.99 0.46 8 0.63 0.83 18 0.25 0.30 28 0.99 0.43 9 0.25 0.65 19 0.34 0.25 29 1.19 0.65 10 0.25 0.67 20 0.34 0.24

Note the repeated x's. We assume a model with two straight lines that meet at a certain changepoint xk --- this is slightly different from the model of Carlin, Gelfand and Smith (1992) who do not constrain the two straight lines to cross at the changepoint. We assume Yi i ~ Normal(i, ) = + J[i] (xi - xk) J[i]=1 if i <= k J[i]=2 if i > k

giving E(Y) = at the changepoint, with gradient 1 before, and gradient 2 after the changepoint. We give independent "noninformative'' priors to , 1, 2 and . Note: alpha is E(Y) at the changepoint, so will be highly correlated with k. This may be a very poor parameterisation. Note way of constructing a uniform prior on the integer k, and making the regression parameter depend on a random changepoint. model { [38]

Stagnant for( i in 1 : N ) { Y[i] ~ dnorm(mu[i],tau) mu[i] <- alpha + beta[J[i]] * (x[i] - x[k]) J[i] <- 1 + step(i - k - 0.5) punif[i] <- 1/N } tau ~ dgamma(0.001,0.001) alpha ~ dnorm(0.0,1.0E-6) for( j in 1 : 2 ) { beta[j] ~ dnorm(0.0,1.0E-6) } k ~ dcat(punif[]) sigma <- 1 / sqrt(tau)

Examples Volume II

Data

( click to open )

Inits for chain 1 Inits for chain 2( click to open )


Traces of two chains shows complete dependence on starting values
alpha 1.2 1.0 0.8 0.6 0.4 0 2500 5000 iteration k 20.0 15.0 10.0 5.0 0 2500 5000 iteration 7500 7500

Results are hopeless - no mixing at all.


Note: alpha is E(Y) at the changepoint, so will be highly correlated with k. This may be a very poor parameterisation. TRY USING CONTINUOUS PARAMETERISATION [39]

Examples Volume II model { for(i in 1 : N) { Y[i] ~ dnorm(mu[i], tau) mu[i] <- alpha + beta[J[i]] * (x[i] - x.change) J[i] <- 1 + step(x[i] - x.change) } tau ~ dgamma(0.001, 0.001) alpha ~ dnorm(0.0,1.0E-6) for(j in 1 : 2) { beta[j] ~ dnorm(0.0,1.0E-6) } sigma <- 1 / sqrt(tau) x.change ~ dunif(-1.3,1.1) }

Stagnant

Data

( click to open )

Inits for chain 1 Inits for chain 2( click to open ) Results


alpha 0.8 0.6 0.4 0.2 0.0 0 2500 5000 iteration x.change 1.0 0.5 0.0 -0.5 0 2500 5000 iteration 7500 7500

[40]

Stagnant
alpha beta[1] beta[2] sigma x.change mean 0.537 -0.4184 -1.014 0.0221 0.02597 sd 0.02569 0.01511 0.01747 0.003271 0.03245 MC_error 0.001316 6.303E-4 5.38E-4 3.919E-5 0.001668 val2.5pc 0.4895 -0.4468 -1.049 0.0168 -0.03754 median 0.535 -0.419 -1.013 0.02171 0.02868 val97.5pc 0.5881 -0.3876 -0.9799 0.02952 0.0839 start 1001 1001 1001 1001 1001

Examples Volume II
sample 20000 20000 20000 20000 20000

Not wonderful mixing, but reasonable


Good fit to data , (monitor mu and use as predicted values) use 'model fit' in Compare tool Strong correlation of alpha and changepoint alpha x.change -0.932941

[41]

Examples Volume II

Asia

Asia: expert system


Evidence propagation Lauritzen and Spiegelhalter (1988) introduce a fictitious "expert system" representing the diagnosis of a patient presenting to a chest clinic, having just come back from a trip to Asia and showing dyspnoea (shortness-of-breath). The BUGS code is shown below and the conditional probabilities used are given in Lauritzen and Spiegelhalter (1988). Note the use of max to do the logical-or. The dcat distribution is used to sample values with domain (1,2) with probability distribution given by the relevant entries in the conditional probability tables. model { smoking ~ dcat(p.smoking[1:2]) tuberculosis ~ dcat(p.tuberculosis[asia,1:2]) lung.cancer ~ dcat(p.lung.cancer[smoking,1:2]) bronchitis ~ dcat(p.bronchitis[smoking,1:2]) either <- max(tuberculosis,lung.cancer) xray ~ dcat(p.xray[either,1:2]) dyspnoea ~ dcat(p.dyspnoea[either,bronchitis,1:2]) }

Data ( click to open ) Inits ( click to open ) Results


mean 1.811 1.185 1.101 1.628 1.089 1.223 sd 0.3918 0.3885 0.3011 0.4833 0.2854 0.4161 MC_error 0.001409 0.001287 0.001006 0.001764 9.782E-4 0.00135 val2.5pc 1.0 1.0 1.0 1.0 1.0 1.0 median 2.0 1.0 1.0 2.0 1.0 1.0 val97.5pc 2.0 2.0 2.0 2.0 2.0 2.0 start 10001 10001 10001 10001 10001 10001 sample 100000 100000 100000 100000 100000 100000

bronchitis either lung.cancer smoking tuberculosis xray

[42]

You might also like