Communications in Statistics—Theory and Methods, 34: 1743–1754, 2005
Copyright © Taylor & Francis, Inc.
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1081/STA-200066364
Bayesian Analysis of Contingency Tables
MIGUEL A. GÓMEZ-VILLEGAS AND
BEATRIZ GONZÁLEZ PÉREZ
Departamento de Estadística e Investigación Operativa I, Universidad
Complutense de Madrid, Madrid, Spain
The display of data by means of contingency tables is used in different
approaches to statistical inference, for example, to address the test of homogeneity
of independent multinomial distributions. We develop a Bayesian procedure to
test simple null hypotheses versus bilateral alternatives in contingency tables.
Given independent samples from two binomial distributions and taking a mixed prior
distribution, we calculate the posterior probability that the proportion of successes
in the first population is the same as in the second. This posterior probability
is compared with the p-value of the classical method, obtaining a reconciliation
between both results, classical and Bayesian. The results obtained are generalized
to r × s tables.
Keywords Bayesian statistics; Chi-square tests; Contingency tables; p-values.
Mathematics Subject Classification 62F15; 62H17.
1. Introduction
The r × s table is used for discussing different approaches to statistical inference.
For example, suppose that independent random samples are drawn from two large
populations, and that each of their members is classified as a "success" or a "failure". The
first sample is of size n1 and produces a successes and b failures; the second is of size
n2 and produces c successes and d failures. The situation is displayed in Table 1.
In this situation a quantitative measure is required of the strength of the evidence that the
data give in support or rejection of the hypothesis that the proportion of successes
in the first population, p1, is equal to the proportion of successes in the second
population, p2. This problem, apparently simple, has given rise to
an extensive literature since Karl Pearson introduced his now classical χ² test
to assess goodness of fit (see Pearson, 1900). This is one of the simplest
natural problems with which to demonstrate clear differences between the classical and Bayesian
approaches, and also between different types of classical analysis.

Received April 7, 2004; Accepted February 3, 2005.
Address correspondence to Miguel A. Gómez-Villegas, Departamento de Estadística
e Investigación Operativa I, Facultad de CC. Matemáticas, Universidad Complutense de
Madrid, Madrid 28040, Spain; E-mail: ma.gv@mat.ucm.es
Table 1
Data in the 2 × 2 table

            Successes   Failures   Total
Sample 1    a           b          n1
Sample 2    c           d          n2
Total       m1          m2         N
There are, of course, a number of variations on this problem. Some important Bayesian references
are given next.
Howard (1998) advocates the more frequent use of one-sided tests and
approaches the problem from a Bayesian viewpoint. He considers as hypotheses
of interest H1: p2 < p1 and H2: p1 < p2, and gives a quantitative measure of the
strength of the evidence in support of the more likely hypothesis. He assumes that p1
and p2 will not be exactly equal, and that neither will be 0 or 1. Given independent
samples from two binomial distributions, he notes that the posterior probability that
p2 < p1 can be estimated from the standard (uncorrected) χ² significance level. He
has to assume independent Jeffreys priors for the two populations, that is to say,

π(p1, p2) ∝ p1^{-1/2} (1 − p1)^{-1/2} p2^{-1/2} (1 − p2)^{-1/2},

in order to get this result. Besides, he introduces a conjugate family of priors which
incorporates dependence between beliefs about the two populations.
In this same line of work, with one-sided hypotheses like p1 > p2, other
Bayesian approaches to the problem of comparing two proportions in a 2 × 2
table can be mentioned: log-odds-ratio methods and inverse-root-sine methods,
which calculate, for beta priors, the posterior probability that Λ1 − Λ2 > 0, where
Λi = log[pi/(1 − pi)] and Λi = arcsin(√pi), i = 1, 2, respectively, as measures of the
degree to which the two populations are homogeneous (see Lee, 1997, pp. 152–154).
Quintana (1998) postulates a nonparametric Bayesian model for assessing
homogeneity in r × s contingency tables with fixed right margin totals. The vectors
of classification probabilities are assumed to be a sample from a distribution F ,
and the prior distribution of F is assumed to be a Dirichlet process, centered on a
probability measure and with weight c. He also assumes a prior distribution for c
and proposes a Bayes factor.
Lindley (1988) gives a probability model for the formation of genotypes from
two alleles. The alleles are A and a, and the genotypes are AA, Aa, and aa (the
standard notation). The model can be expressed in terms of two parameters,
α = (1/2) log(4 p1 p3 / p2²) and β = (1/2) log(p1/p3). A Bayesian test of the hypothesis that α = 0
versus α ≠ 0, based on a Bayes factor, is considered, where α = 0 is the null
hypothesis of Hardy–Weinberg equilibrium, H0: (p1, p2, p3) = (p², 2p(1 − p), (1 − p)²), p being the
proportion of A's.
We consider testing equality of proportions of independent multinomial
distributions when the common proportions are known. Our general approach
to the problem of homogeneity consists in working directly with the simple null
hypothesis and calculating its posterior probability. To do this, we will follow the
method used by Gómez-Villegas and Sanz (2000) and Gómez-Villegas et al. (2002),
based on assigning an initial probability π0 to the null hypothesis and distributing
the remaining probability over the points of the alternative with a prior density
π(p1, p2). Posterior probabilities of the null hypothesis are calculated with respect
to a mixture of a point prior on the null and an independent Dirichlet prior on the
proportions. With this procedure, in the context of a point null hypothesis,
it is possible to obtain a reconciliation between the classical p-value and the Bayesian
posterior probability of the null hypothesis.

Table 2
Pearson's example

            Successes   Failures   Total
Sample 1    3           15         18
Sample 2    7           5          12
Total       10          20         30
Section 2 formulates the problem precisely and calculates an exact
expression for the posterior probability that the proportion of successes in the first
population is the same as in the second, and equal to a known common value p0.
Section 3 reaches a reconciliation between the classical and Bayesian results, and
the Pearson (1947) data (see Table 2) are used to illustrate the procedure. Section 4
generalizes the results of Secs. 2 and 3 to an r × s table. Section 5 presents a summary
of conclusions.
2. Formulation of the Problem and Posterior Probability
Consider Xi, i = 1, 2, independent binomial random variables, B(ni, pi), and
suppose that we wish to test

H0: p1 = p2 = p0 versus H1: (p1, p2) ≠ (p0, p0),   (1)

where p0 is a known value and the alternative hypothesis means that at least one of
p1, p2 is different from p0, that is to say, (p1, p2) ≠ (p0, p0). Moreover, suppose that
our prior opinion about (p1, p2) is given by the density π(p1, p2). Hence, in order
to test (1), a mixed prior distribution is needed. We propose

π∗(p1, p2) = π0 I_{H0}(p1, p2) + (1 − π0) π(p1, p2) I_{H1}(p1, p2),

π0 being the prior probability assigned to the null hypothesis.
Then, the posterior probability of the null hypothesis, when the data of Table 1
have been observed, is

P(H0 | a, c) = p0^{a+c} (1 − p0)^{b+d} π0 / [ p0^{a+c} (1 − p0)^{b+d} π0 + (1 − π0) ∫0^1 ∫0^1 p1^a (1 − p1)^b p2^c (1 − p2)^d π(p1, p2) dp2 dp1 ].
A possible initial distribution consists in assigning independent uniform prior
distributions, also called independent Laplace distributions, that is to say,

π(p1, p2) = I_(0,1)(p1) I_(0,1)(p2).

In this situation, we obtain

P(H0 | a, c) = [1 + (1 − π0)/π0 · λ]^{-1},   (2)

where λ = p0^{-m1} (1 − p0)^{-m2} [Γ(a + 1)Γ(b + 1)/Γ(a + b + 2)] [Γ(c + 1)Γ(d + 1)/Γ(c + d + 2)].
A more general assignment consists in using independent beta prior
distributions,

π(p1, p2) = [Γ(α + β)/(Γ(α)Γ(β))]² p1^{α−1} (1 − p1)^{β−1} p2^{α−1} (1 − p2)^{β−1},

where (p1, p2) ∈ (0, 1)², α, β > 0.
Then, the posterior probability of the null hypothesis is obtained evaluating
expression (2) at

λ = p0^{-m1} (1 − p0)^{-m2} [Γ(α + β)/(Γ(α)Γ(β))]² [Γ(a + α)Γ(b + β)/Γ(a + b + α + β)] [Γ(c + α)Γ(d + β)/Γ(c + d + α + β)].
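As an illustration of expression (2), the following Python sketch (not part of the original article; SciPy is assumed to be available, and the function names lambda_stat and posterior_H0 are illustrative) evaluates the posterior probability of H0 for independent Beta(α, β) priors; α = β = 1 recovers the uniform (Laplace) case above.

    # A minimal sketch of expression (2): P(H0 | a, c) = [1 + (1 - pi0)/pi0 * lambda]^{-1},
    # for independent Beta(alpha, beta) priors on p1 and p2.
    from math import exp, log
    from scipy.special import gammaln  # log-Gamma, to avoid overflow with large counts

    def lambda_stat(a, b, c, d, p0, alpha=1.0, beta=1.0):
        """lambda = p0^{-m1} (1-p0)^{-m2} B(a+alpha, b+beta) B(c+alpha, d+beta) / B(alpha, beta)^2."""
        m1, m2 = a + c, b + d  # column totals of Table 1
        log_B0 = gammaln(alpha) + gammaln(beta) - gammaln(alpha + beta)
        log_B1 = gammaln(a + alpha) + gammaln(b + beta) - gammaln(a + b + alpha + beta)
        log_B2 = gammaln(c + alpha) + gammaln(d + beta) - gammaln(c + d + alpha + beta)
        return exp(-m1 * log(p0) - m2 * log(1 - p0) + log_B1 + log_B2 - 2 * log_B0)

    def posterior_H0(a, b, c, d, p0, pi0, alpha=1.0, beta=1.0):
        """Posterior probability of H0: p1 = p2 = p0, as in expression (2)."""
        lam = lambda_stat(a, b, c, d, p0, alpha, beta)
        return 1.0 / (1.0 + (1.0 - pi0) / pi0 * lam)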
The posterior probability calculated in expression (2) depends on π0,
the initial prior probability assigned to the null hypothesis H0: p1 = p2 = p0. Now,
consider the more realistic precise hypotheses

H0ε: d((p0, p0), (p1, p2)) ≤ ε versus H1ε: d((p0, p0), (p1, p2)) > ε,   (3)

with an appropriate metric d and a value of ε > 0 sufficiently small. Applying the
method of Gómez-Villegas and Sanz (2000) and Gómez-Villegas et al. (2002), we can
use π(p1, p2), our opinion about (p1, p2), and calculate π0 by averaging,

π0 = ∫_{B((p0, p0), ε)} π(p1, p2) dp2 dp1,   (4)

where B((p0, p0), ε) = {(p1, p2) ∈ (0, 1) × (0, 1) : d((p0, p0), (p1, p2)) ≤ ε} is the
sphere of center (p0, p0) and radius ε.
In this way, the prior probability assigned to H0ε by means of π(p1, p2) and the prior
probability π0 assigned to H0 are the same.
Different ways of specifying d((p0, p0), (p1, p2)) can be considered. One of
them could be to take an arbitrary value of ε and divide it into two values
ε1 and ε2, perhaps ε1 = ε2 = ε/2, and then build the distance starting from
|pi − p0| < εi, i = 1, 2. Another way could be to consider

B((p0, p0), ε) = {(p1, p2) ∈ (0, 1) × (0, 1) : (p1 − p0)² + (p2 − p0)² ≤ ε²}.
By means of this second procedure, the posterior probability obtained in (2) can
be expressed in terms of ε. In this article the results are obtained first as a function
of π0, and afterwards are specified in terms of ε employing expression (4). In
particular, it is possible to calculate the value of ε in (3) such that π0 = 1/2. It can be
observed that if our prior opinion about (p1, p2) is the uniform distribution given
by the density π(p1, p2) = 1, (p1, p2) ∈ (0, 1)², then the value of π0 that is
obtained with expression (4), for ε sufficiently small, is π0 = πε², the area of the
sphere of radius ε.
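A small numerical check of this relation (a sketch of ours, only valid while the sphere stays inside the unit square):

    # pi0 = pi * eps^2, the prior mass that expression (4) assigns to H0_eps under a
    # uniform prior, for eps small enough that B((p0, p0), eps) lies inside (0, 1) x (0, 1).
    from math import pi, sqrt

    def pi0_from_eps(eps):
        return pi * eps ** 2        # area of the sphere of radius eps

    def eps_from_pi0(pi0):
        return sqrt(pi0 / pi)       # inverse relation

    print(pi0_from_eps(1 / sqrt(2 * pi)))   # 0.5, the choice eps = 1/sqrt(2*pi) used below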
Note that, in general, H0: p1 = p2 = p0 in (1) is not a natural null hypothesis. For
this reason we consider first a value of p0 and afterwards take a sphere of radius ε
about this value. Besides, in general, when we wish to test (1), the value of p0 is
unknown. In spite of this, (1) has a clear theoretical interest because it can be used as
an auxiliary test to develop a Bayesian procedure, with the proposed methodology,
when p0 is unknown or has a known functional form.
Suppose that we wish to test (1) with p0 = 1/2 and that our prior opinion about
(p1, p2) is given by the uniform density π(p1, p2) = 1, (p1, p2) ∈ (0, 1)².
Thereby, the posterior probability of the null hypothesis is

P(H0 | a, c) = [1 + (1 − π0)/π0 · λ]^{-1},   (5)

where λ = 2^N Γ(a + 1)Γ(b + 1)Γ(c + 1)Γ(d + 1) / [Γ(a + b + 2)Γ(c + d + 2)].
It can be observed that values of ε which correspond to π0 > λ/(λ + 1) give
P(H0 | a, c) > 1/2. Moreover, P(H0 | a, c) = 1/(λ + 1) for the value of ε such that π0 = 1/2.
For the data of Table 2 we obtain λ = 6.7265 and, if ε = 1/√(2π), then π0 = 1/2 and
P(H0 | a, c) = 0.1294, so that H0 is rejected. Moreover, to accept H0 with the data of
Pearson's example, we need ε > 0.53905, or π0 > 0.8706. Therefore, for the data of Table 2,
we can observe that there is a wide range of values of ε, namely ε < 0.53905, for which
H0 is rejected.
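These figures can be reproduced with a few lines of Python (a sketch of ours, with SciPy assumed; it is independent of the earlier listings):

    # Pearson's example (Table 2) with p0 = 1/2 and uniform priors.
    from math import exp, log, pi, sqrt
    from scipy.special import gammaln

    a, b, c, d, p0 = 3, 15, 7, 5, 0.5          # so n1 = 18, n2 = 12, N = 30
    log_lam = (-(a + c) * log(p0) - (b + d) * log(1 - p0)
               + gammaln(a + 1) + gammaln(b + 1) - gammaln(a + b + 2)
               + gammaln(c + 1) + gammaln(d + 1) - gammaln(c + d + 2))
    lam = exp(log_lam)
    print(lam)                                  # about 6.7265
    print(1 / (1 + lam))                        # about 0.1294 = P(H0 | a, c) when pi0 = 1/2
    print(lam / (lam + 1))                      # about 0.8706: pi0 must exceed this to accept H0
    print(1 / sqrt(2 * pi))                     # about 0.3989: the eps giving pi0 = pi * eps^2 = 1/2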
3. Comparison with the Classical Method
From the classical viewpoint, instead of considering the observed data (a, c) as fixed
values and letting (p1, p2) vary, the point (p0, p0) of the null hypothesis
is fixed, and then the probability of observing a point in some extreme region of the
alternative hypothesis which includes (a, c) is calculated; that is to say, instead of
calculating the posterior probability of the null hypothesis, the p-value is calculated.
(The idea is basically that either H0 is false, or an event with very small probability has
occurred.)
As usual, if we use, as a measure of the evidence in support of H1, the discrepancy
between the observed values and the expected values when H0 is true, then, in
terms of Pearson's χ² statistic, the test statistic is the random variable

χ² = a²/(n1 p0) + b²/(n1(1 − p0)) + c²/(n2 p0) + d²/(n2(1 − p0)) − N.   (6)
The sampling distribution of χ² when H0 is true is χ²₂, the chi-square distribution with
two degrees of freedom. Then, if the value of χ² at the observed data point (a0, c0) is
χ²(a0, c0) = t, and the experiment were repeated independently,
once again sampling n1 subjects randomly from population 1 and n2 subjects
randomly from population 2 with (p0, p0) fixed, the probability that we would get a
new value of χ² as big as or larger than t can be calculated. Therefore, {χ² ≥ t} is
a possible critical region, and

p = P(χ² ≥ t | (p0, p0)) = P(χ²₂ ≥ t) = e^{−t/2}

is the p-value.
With this procedure, the decision to accept or reject H0 depends on the
size of the p-value; namely, H0 is rejected when p < p∗, with p∗ ∈ (0, 1) a value
sufficiently small.
Now, we are going to suppose that we wish to test (1) with p0 = 1/2 by means of
the previous classical method.
In this situation, the test statistic is the random variable

χ² = 2[(a² + b²)/n1 + (c² + d²)/n2] − N,

and the evidence used is the p-value,

p = exp{N/2 − (a0² + b0²)/n1 − (c0² + d0²)/n2}.
For the data of Table 2 we obtain t = 8.33333 and a p-value p = 0.015504.
Observe that H0 is rejected for p∗ = 0.05, but for p∗ = 0.01 there is not enough
evidence to reject it, and in that sense H0 is accepted.
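The classical side of the same example can be checked as follows (a sketch; scipy.stats is assumed, and p = exp(−t/2) because the reference distribution is chi-square with two degrees of freedom):

    # Pearson's statistic (6) with p0 = 1/2 for the data of Table 2, and its p-value.
    from math import exp
    from scipy.stats import chi2

    a, b, c, d = 3, 15, 7, 5
    n1, n2, N, p0 = a + b, c + d, a + b + c + d, 0.5
    t = (a**2 / (n1 * p0) + b**2 / (n1 * (1 - p0))
         + c**2 / (n2 * p0) + d**2 / (n2 * (1 - p0)) - N)
    print(t)                      # 8.3333...
    print(chi2.sf(t, df=2))       # 0.015504..., identical to exp(-t/2)
    print(exp(-t / 2))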
To compare the proposed Bayesian method with Pearson's classical χ² method,
which uses the value given in expression (6) as the test statistic, it would be
interesting if there existed a functional dependence between both statistics, λ and χ²,
or between the posterior probability and the p-value, p; that is to say, χ² = g(λ)
for some increasing function g: R⁺ → R⁺. However, for 2 × 2 tables with n1 = 18
and n2 = 12, it can be observed that for the data of Pearson's example, (a, c) =
(3, 7), the value of λ in expression (5) is 6.72 and the value of χ² in
expression (6) is 8.33333, whereas if (a, c) = (9, 1), then λ = 7.45 and χ² = 8.33333.
Hence, such a functional dependence is not possible. Furthermore, it can be observed
that, in contrast to χ², λ distinguishes between the two previous situations.
Notwithstanding, it can be verified that there exists a non-monotonic function,
h: R⁺ → R⁺, for which χ² = h(λ) (see Fig. 1). Therefore, the classical test admits a
representation in terms of λ.
Now, the objective is to obtain some kind of reconciliation between the classical
and the Bayesian approaches; that is to say, it would be convenient for the same
number to admit both interpretations. To do this, we consider the following equation,

[1 + (1 − π0)/π0 · λ]^{-1} = p/(2p∗),
from which the value of π0 can be obtained,

π0 = [1 + (1/λ)(2p∗/p − 1)]^{-1} = pλ / (pλ + 2p∗ − p),   (7)

which satisfies that P(H0 | a, c) > 1/2 when p > p∗. Therefore, using the value of π0
obtained in expression (7), the same conclusion would be reached with
both methods.

Figure 1. Bar diagram of (χ²(a, c), λ(a, c)) for 2 × 2 tables with n1 = 18 and n2 = 12.
There is a non-monotonic function, h: R⁺ → R⁺, such that χ² = h(λ).
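Expression (7) is a one-line computation; the sketch below (ours, with an illustrative function name) reproduces the π0 values reported for Pearson's example in the next paragraphs:

    # pi0 from expression (7): the prior probability that equates the posterior
    # probability of H0 with p/(2 p*); it lies in (0, 1) only when 2 p* > p.
    def pi0_reconciling(p, p_star, lam):
        return p * lam / (p * lam + 2 * p_star - p)

    for p_star in (0.5, 0.1, 0.05, 0.01):
        print(p_star, pi0_reconciling(0.015504, p_star, 6.7265))
    # about 0.09578, 0.36113, 0.5524, 0.9587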
Notwithstanding, this reconciliation is too strict, since the value obtained in
expression (7) depends on the data. In this sense, we do not claim that the
procedure for obtaining the accord has to be equating both expressions, but rather
that the use of a value close to the result of this equalization can furnish, when this is
possible, an approximately equal numeric value from both viewpoints. The desirable
reconciliation would formulate the accord so that if, for example, p∗ ∈ (0.05, 0.1),
then π0 ∈ (ℓ1, ℓ2) for some ℓ1, ℓ2 ∈ (0, 1), ℓ1 < ℓ2.
It can be noted that 0 < π0 < 1 only when 2p∗ > p. Moreover, fixing p∗,
0 < p∗ < 1, for any p-value p, 0 < p < 2p∗, there is an initial prior probability
π0, 0 < π0 < 1, assigned to the null hypothesis of test (1) for p0 = 1/2, assuming our
initial opinion about (p1, p2) is uniform, that allows both results, the classical and
the Bayesian, to be equal. It can also be observed that if p∗ = 1/2, then, whatever the
p-value p is, such a π0 always exists and verifies that P(H0 | a, c) = p.

For the data of Table 2, if p∗ = 1/2, the value of π0 that reconciles the classical
p-value, p = 0.015504, with the Bayesian posterior probability is π0 = 0.09578.
If p∗ = 0.1 we obtain π0 = 0.36113 and reject with a posterior probability of 0.07752.
Table 3
Summary of results for Pearson's example

Table 2:    χ² = 8.333333    p = 0.015504    λ = 6.7265    (λ + 1)^{-1} = 0.1294    λ(λ + 1)^{-1} = 0.8706    ε = 0.53905

             π0 = pλ(pλ + 2p∗ − p)^{-1}    ε          p(2p∗)^{-1}
p∗ = 0.5     0.09578                       0.17461    0.015504
p∗ = 0.1     0.36113                       0.33904    0.07752
p∗ = 0.05    0.5524                        0.41933    0.15503
p∗ = 0.01    0.9587                        0.6085     0.77523
If p∗ = 0.05 we get π0 = 0.5524 and reject with 0.15503. For p∗ = 0.01 we get
π0 = 0.9587 and accept with 0.77523.
The obtained results are summarized in Table 3.
We can observe that the value of π0, and accordingly the value of ε, which
achieves the agreement between the classical and the Bayesian results in the
terms set out above, decreases when p∗ increases. Besides, for the data of
Pearson's example, the values of ε for which this agreement is achieved when p∗ ∈
(0.01, 0.5) are such that ε ≤ 0.6085.
It has already been indicated that the accord between the classical and Bayesian
results obtained by means of expression (7) is too strict. However, it gives an
idea of what the value of ε, when it exists, must be in order for this reconciliation
between both methods to be possible.
To eliminate the dependence on the data, we have generated all of the possible
2 × 2 tables with n1 and n2 fixed and known. In the situation we are studying,
the sample sizes are n1 = 18 and n2 = 12, and a total of 247 possible tables have been
generated. Pearson's data occupies position 95 in the ascending sort carried
out according to the values of λ (see Fig. 1). For every one of these tables, we
carry out the same study previously carried out for the data of
Pearson's example.
By means of an easy data analysis, we can check that there are values of
p∗, for example p∗ = 0.5, p∗ = 0.1, p∗ = 0.05, or p∗ = 0.01, such that we can find
an interval of values of π0, I = I(p∗, n1 = 18, n2 = 12), which verifies that the
result obtained with the proposed Bayesian method for test (1), with p0 = 1/2 and
π(p1, p2) = 1, (p1, p2) ∈ (0, 1)², using a value π0 ∈ I, is the same as the result obtained
with Pearson's classical χ² test (see the summary of results in Table 4).
Hence, there exists an accord between both methods. Notwithstanding, there are
also values of p∗, for example p∗ = 0.015, for which this is not possible.
The obtained results are summarized in Table 4.
Table 4
Summary of results for 2 × 2 tables with n1 = 18 and n2 = 12

p∗ ∈                 ε ∈                  π0 ∈
(0.46, 0.513)        (0.221, 0.23)        (0.153, 0.167)
(0.087, 0.143)       (0.353, 0.4)         (0.391, 0.506)
(0.045, 0.052)       (0.453, 0.462)       (0.643, 0.673)
(0.0095, 0.0138)     (0.5528, 0.5675)     (0.893, 0.914)
Moreover, it can be verified that the value of π0, and thereby the value of ε,
for which the previous reconciliation between both methods is possible, decreases
when p∗ increases. Also, it can be checked that the value of π0 computed by means
of expression (7) does not always exist, and when it exists this value does not always
belong to the interval of values that allows the reconciliation between both methods
to be achieved.
In the general situation with fixed n1, n2, and p∗, if we denote

ℓ1 = ℓ1(p∗, n1, n2) = max_{(a,c): p > p∗} λ(λ + 1)^{-1},
ℓ2 = ℓ2(p∗, n1, n2) = min_{(a,c): p ≤ p∗} λ(λ + 1)^{-1},

and p∗ satisfies that ℓ1 < ℓ2, then there exists an interval of values of π0,
I = I(p∗, n1, n2) = (ℓ1, ℓ2), such that the result obtained with the developed
Bayesian method for test (1), using a value of π0 ∈ I, is the same conclusion as that
obtained with Pearson's classical χ² method.
It is clear that the existence of values of p∗ which satisfy the sufficient condition
that ensures the accord between both methods depends on the increasing tendency
that we can observe (see Fig. 1) in the functional relationship that exists between
both statistics, χ² = h(λ), although this relationship is not strictly monotonic.
Therefore, the reconciliation is possible in that sense.
4. r × s Tables
In the following, we generalize the previously obtained results to the situation
of r × s tables. To do this, we suppose that independent random samples are drawn
from r sufficiently large populations, and that each of their members belongs to one and
only one of the s classes C1, …, Cs. Sample number i, i = 1, …, r, is of size ni
and yields nij individuals in category Cj, j = 1, …, s.
The situation is displayed in Table 5.
Let Xi, i = 1, …, r, be independent multinomial random variables,
MB(ni, pi), with pi = (pi1, …, pis) ∈ Θ, where Θ = {p = (p1, …, ps) ∈ (0, 1)^s : Σ_{j=1}^s pj = 1} ⊂
R^{s−1}. In this situation, we are going to suppose that we wish to test

H0: p1 = ⋯ = pr = p0 versus H1: (p1, …, pr) ≠ (p0, …, p0),   (8)

where p0 = (p01, …, p0s) ∈ Θ is a known value and the alternative hypothesis means
that at least one of the pi is different from p0.
Table 5
Data in the r × s table

            Class 1   Class 2   ...   Class s   Total
Sample 1    n11       n12       ...   n1s       n1
Sample 2    n21       n22       ...   n2s       n2
...         ...       ...       ...   ...       ...
Sample r    nr1       nr2       ...   nrs       nr
Total       m1        m2        ...   ms        N
Consider that our prior opinion about (p1, …, pr) is given by means of the density
π(p1, …, pr) = ∏_{i=1}^r πi(pi). Therefore, a mixed prior distribution is needed to test (8), namely

π∗(p1, …, pr) = π0 I_{H0}(p1, …, pr) + (1 − π0) π(p1, …, pr) I_{H1}(p1, …, pr),

π0 being the prior probability assigned to the null hypothesis.
Then, the posterior probability of the null hypothesis, when the data of Table 5
have been observed, is

∏_{j=1}^s p0j^{Σ_{i=1}^r nij} π0 / [ ∏_{j=1}^s p0j^{Σ_{i=1}^r nij} π0 + (1 − π0) ∏_{i=1}^r ∫_Θ ∏_{j=1}^s pij^{nij} πi(pi) dpi ].
Consider αi = (αi1, …, αis), with αij > 0 for all j = 1, …, s and all i = 1, …, r.
If we assign to each pi a Dirichlet prior distribution with parameter αi, D(αi),
i = 1, …, r (see Ghosh and Ramamoorthi, 2003, Ch. 3), namely

πi(pi) = [Γ(Σ_{j=1}^s αij) / ∏_{j=1}^s Γ(αij)] ∏_{j=1}^s pij^{αij − 1},   pi = (pi1, …, pis) ∈ Θ, i = 1, …, r,
then this posterior probability is

[ 1 + (1 − π0)/π0 ∏_{j=1}^s p0j^{−mj} ∏_{i=1}^r ∫_Θ [Γ(Σ_{j=1}^s αij) / ∏_{j=1}^s Γ(αij)] ∏_{j=1}^s pij^{nij + αij − 1} dpi ]^{−1}.
Therefore, the posterior probability of the null hypothesis, when the data of
Table 5 have been observed, can be expressed in the following way,

[1 + (1 − π0)/π0 · λ]^{−1},   (9)

where

λ = ∏_{j=1}^s p0j^{−mj} ∏_{i=1}^r { Γ(Σ_{j=1}^s αij) ∏_{j=1}^s Γ(nij + αij) / [∏_{j=1}^s Γ(αij) Γ(ni + Σ_{j=1}^s αij)] }.
We can note that if we assign a uniform prior distribution on Θ to each pi,
i = 1, …, r, then the posterior probability of the null hypothesis can be obtained
evaluating expression (9) at

λ = ∏_{j=1}^s p0j^{−mj} Γ(s)^r ∏_{i=1}^r [∏_{j=1}^s Γ(nij + 1) / Γ(ni + s)].
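A sketch of expression (9) (ours, with NumPy and SciPy assumed; the function name is illustrative): the counts n form an r × s matrix, alpha the matching matrix of Dirichlet parameters, and alpha of all ones gives the uniform-prior case just described.

    import numpy as np
    from scipy.special import gammaln

    def posterior_H0_rxs(n, p0, pi0, alpha=None):
        """Posterior probability (9) of H0: p1 = ... = pr = p0 for an r x s table."""
        n = np.asarray(n, dtype=float)
        r, s = n.shape
        alpha = np.ones((r, s)) if alpha is None else np.asarray(alpha, dtype=float)
        m = n.sum(axis=0)                                   # column totals m_1, ..., m_s
        ni = n.sum(axis=1)                                  # row totals n_1, ..., n_r
        log_lam = (-(m * np.log(p0)).sum()
                   + gammaln(alpha.sum(axis=1)).sum()       # prod_i Gamma(sum_j alpha_ij)
                   + gammaln(n + alpha).sum()               # prod_ij Gamma(n_ij + alpha_ij)
                   - gammaln(alpha).sum()                   # prod_ij Gamma(alpha_ij)
                   - gammaln(ni + alpha.sum(axis=1)).sum()) # prod_i Gamma(n_i + sum_j alpha_ij)
        lam = np.exp(log_lam)
        return 1.0 / (1.0 + (1.0 - pi0) / pi0 * lam)

    # 2 x 2 check against Sec. 2: Pearson's data, p0 = (1/2, 1/2), pi0 = 1/2
    print(posterior_H0_rxs([[3, 15], [7, 5]], p0=[0.5, 0.5], pi0=0.5))   # about 0.1294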
The posterior probability calculated in expression (9) depends on π0, the initial
prior probability that we assign to the null hypothesis, H0: p1 = ⋯ = pr = p0.
In the following, if we denote P0 = (p0, …, p0) ∈ Θ^r ⊂ R^{r(s−1)} and P = (p1, …, pr)
∈ Θ^r ⊂ R^{r(s−1)}, then H0: P = P0 is the null hypothesis of test (8). Now, we are going
to consider the more realistic precise hypotheses

H0ε: d(P0, P) ≤ ε versus H1ε: d(P0, P) > ε,

with an appropriate metric d and a value of ε > 0 sufficiently small.
We propose to use B(P0, ε) = {P ∈ Θ^r : Σ_{i=1}^r Σ_{j=1}^{s−1} (pij − p0j)² ≤ ε²}. Then,
applying the method of Gómez-Villegas and Sanz (2000) and Gómez-Villegas et al.
(2002), we can use π(p1, …, pr) = π(P), our opinion about P, to calculate π0 by
averaging, π0 = ∫_{B(P0, ε)} π(P) dP. We can observe that if a uniform prior
distribution on Θ is assigned to each pi, i = 1, …, r, then

π0 = [π^{r(s−1)/2} / Γ(r(s−1)/2 + 1)] ε^{r(s−1)},

the volume of the sphere of radius ε in R^{r(s−1)}, for ε sufficiently small.
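For completeness, this prior mass is just the ball volume in dimension r(s − 1) (a sketch, valid for ε small enough that the ball stays inside Θ^r):

    from math import gamma, pi

    def pi0_ball(eps, r, s):
        k = r * (s - 1)                                 # dimension of the parameter space
        return pi ** (k / 2) / gamma(k / 2 + 1) * eps ** k

    print(pi0_ball(1 / (2 * pi) ** 0.5, 2, 2))          # 0.5, recovering the 2 x 2 case of Sec. 2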
From a classical viewpoint and considering Pearson's χ² test statistic,

χ² = Σ_{i=1}^r Σ_{j=1}^s nij² / (ni p0j) − N,

if we denote by t the value of χ² calculated at the point formed by the observed
data of Table 5, that is to say, χ²(nij0, i = 1, …, r, j = 1, …, s) = t, then
{χ² ≥ t} is a possible critical region and the p-value is

p = P(χ² ≥ t | p0) = P(χ²_{r(s−1)} ≥ t).
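The classical quantities for an r × s table can be computed in the same way as in Sec. 3 (a sketch with NumPy and SciPy assumed; the degrees of freedom are r(s − 1) because p0 is completely specified under H0):

    import numpy as np
    from scipy.stats import chi2

    def pearson_p_value(n, p0):
        """Pearson's statistic for Table 5 with known p0, and its p-value."""
        n = np.asarray(n, dtype=float)
        p0 = np.asarray(p0, dtype=float)
        r, s = n.shape
        ni = n.sum(axis=1, keepdims=True)               # row totals n_i
        t = (n**2 / (ni * p0)).sum() - n.sum()          # chi^2 = sum_ij n_ij^2/(n_i p_0j) - N
        return t, chi2.sf(t, df=r * (s - 1))

    print(pearson_p_value([[3, 15], [7, 5]], [0.5, 0.5]))   # (8.3333..., 0.015504...)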
Therefore, to search for a reconciliation between both results, the classical and
the Bayesian, we can follow the same kind of reasoning developed in Sec. 3, since
expression (9) has the same form as expression (2).
In conclusion, with fixed ni, i = 1, …, r, and p∗, if we denote

ℓ1 = ℓ1(p∗, n1, …, nr) = max_{(nij): p > p∗} λ(λ + 1)^{-1},   (10)
ℓ2 = ℓ2(p∗, n1, …, nr) = min_{(nij): p ≤ p∗} λ(λ + 1)^{-1},   (11)

and p∗ satisfies that ℓ1 < ℓ2, then there is an interval of values of π0,
I = I(p∗, n1, …, nr) = (ℓ1, ℓ2), such that the result obtained with the proposed
Bayesian method for test (8), using a value of π0 ∈ I, is the same conclusion as that
obtained when we use Pearson's classical χ² method.
Therefore, the accord is possible in this sense.
5. Conclusions
The posterior probability of the null hypothesis of homogeneity of independent
multinomial populations in r × s tables, when p0 is known, for a mixed prior
distribution that assigns an initial probability π0 to H0: p1 = ⋯ = pr = p0 and
distributes the remaining probability in a continuous way over the points of the
alternative hypothesis by means of a Dirichlet prior density, can be expressed as

[1 + (1 − π0)/π0 · λ]^{-1},
where λ is a statistic that measures the strength of the evidence in support of
the more likely hypothesis, χ² = h(λ) is the test statistic of Pearson's classical χ²
method, and h: R⁺ → R⁺ is a non-monotonic function with an increasing tendency.

Fixing ni ∈ N, i = 1, …, r, and p∗ ∈ (0, 1), the condition ℓ1 < ℓ2, where ℓ1 and ℓ2 are defined
in expressions (10) and (11), respectively, is a sufficient condition for the
reconciliation between both methods to be possible. That is to say, if p∗ satisfies that
ℓ1 < ℓ2, then for some value of ε such that π0 = π0(ε) ∈ (ℓ1, ℓ2), the p-value, p,
verifies that p > p∗ and [1 + (1 − π0)/π0 · λ]^{-1} > 1/2, or that p ≤ p∗ and
[1 + (1 − π0)/π0 · λ]^{-1} ≤ 1/2, whatever the point (nij0, i = 1, …, r, j = 1, …, s)
formed by the observed data of Table 5 is.
The existence of values of p∗ that satisfy such a sufficient condition depends on the
functional relationship, in terms of h, that exists between the statistics λ and χ².
Thereby, the reconciliation between both methods is possible in that sense.
For example, for 2 × 2 tables with n1 = 18 and n2 = 12, when p∗ = 0.1 the
accord is obtained for ε ∈ (0.353, 0.4).
The generalization of the previous results to the problem of testing the
homogeneity of independent multinomial populations when p0 is unknown, or when p0
has a known functional form, is possible following a similar reasoning.
We are studying some robustness properties of the Bayes procedure for the
ε-contaminated class of priors and we have partial results.
References
Ghosh, J. K., Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Barcelona: Springer.
Gómez-Villegas, M. A., Maín, P., Sanz, L. (2002). A suitable Bayesian approach in
testing point null hypothesis: some examples revisited. Commun. Statist. Theor. Meth.
31(2):201–217.
Gómez-Villegas, M. A., Sanz, L. (2000). ε-contaminated priors in testing point null
hypothesis: a procedure to determine the prior probability. Statist. Probab. Lett.
47:53–60.
Howard, J. V. (1998). The 2 × 2 table: a discussion from a Bayesian viewpoint. Statist.
Sci. 13(4):351–367.
Lee, P. M. (1997). Bayesian Statistics: An Introduction. London: Arnold.
Lindley, D. V. (1988). Statistical inference concerning Hardy-Weinberg equilibrium. Bayesian
Statist. 3:307–326.
Pearson, E. S. (1947). The choice of statistical tests illustrated on the interpretation of data
classed in a 2 × 2 table. Biometrika 34:139–167.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in
the case of a correlated system of variables is such that it can be reasonably supposed
to have arisen from random sampling. Phil. Mag. 5(50):157–175.
Quintana, F. A. (1998). Nonparametric Bayesian analysis for assessing homogeneity in k × l
contingency tables with fixed right margin totals. J. Amer. Statist. Assoc. Theor. Meth.
93(443):1140–1149.