Showing posts with label PNAS. Show all posts
Showing posts with label PNAS. Show all posts

Monday, 19 August 2019

Updated correction: The harmonic mean p-value for combining dependent tests

 To view a PDF version of this announcement click here 
Important: Please update to version 3.0 of the harmonicmeanp R package to address critical errors in the main function p.hmp.

I would like to update the correction I issued on July 3, 2019 to cover a second error I discovered that affects the main function of the R package,  p.hmp. There are two errors in the original paper:
  1. The paper (Wilson 2019 PNAS 116: 1195-1200) erroneously stated that the test \(\overset{\circ}{p}_\mathcal{R} \leq \alpha_{|\mathcal{R}|}\,w_\mathcal{R}\) controls the strong-sense family-wise error rate asymptotically when it should read \(\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}\).
  2. The paper incorrectly stated that one can produce adjusted p-values that are asymptotically exact, as intended in the original Figure 1, by transforming the harmonic mean p-value with Equation 4 before adjusting by a factor \(1/w_{\mathcal{R}}\). In fact the harmonic mean p-value must be multiplied by \(1/w_{\mathcal{R}}\) before transforming with Equation 4.
The p.hmp function prior to version 3.0 of the R package was affected by these errors.

In the above,
  • L is the total number of individual p-values.
  • \(\mathcal{R}\) represents any subset of those p-values.
  • \(\overset{\circ}{p}_\mathcal{R} = \left(\sum_{i\in\mathcal{R}} w_i\right)/\left(\sum_{i\in\mathcal{R}} w_i/p_i\right)\) is the HMP for subset \(\mathcal{R}\).
  • \(w_i\) is the weight for the ith p-value. The weights must sum to one: \(\sum_{i=1}^L w_i=1\). For equal weights, \(w_i=1/L\).
  • \(w_\mathcal{R}=\sum_{i\in\mathcal{R}}w_i\) is the sum of weights for subset \(\mathcal{R}\).
  • \(|\mathcal{R}|\) gives the number of p-values in subset \(\mathcal{R}\).
  • \(\alpha_{|\mathcal{R}|}\) and \(\alpha_{L}\) are significance thresholds provided by the Landau distribution (Table 1).
Version 2.0 (released July 2019) of the harmonicmeanp R package only addressed the first error, which is why I am now releasing version 3.0 to address both errors. Compared to version 1.0, the main function p.hmp has been updated to take an additional argument, L, which sets the total number of p-values. If argument L is omitted, a warning is issued and L is assumed to equal the length of the first argument, p, preserving earlier behaviour. Please update the R package to version 3.0.

The tutorial, available as a vignette in the R package and online, is affected quantitatively by both errors, and has been extensively updated for version 3.0.

The second error affects only one line of the corrected paper (issued July 2019). I have updated it to address the second error and two typos in Figure legends 1 and 2: http://www.danielwilson.me.uk/files/wilson_2019_annotated_corrections.v2.pdf. You will need Adobe Reader to properly view the annotations and the embedded corrections to Figures 1 and 2.

I would like to deeply apologise to users for the inconvenience the two errors have caused.

More information follows under the headings:

Why does this matter?

The family-wise error rate (FWER) controls the probability of falsely rejecting any null hypotheses, or groups of null hypotheses, when they are true. The strong-sense FWER maintains control even when some null hypotheses are false, thereby offering control across much broader and more relevant scenarios than the weak-sense FWER.

The ssFWER is not controlled at the expected rate if:

  1. The more lenient threshold \(\alpha_{|\mathcal{R}|}\) is used rather than the corrected threshold \(\alpha_L\), both derived via Table 1 of the paper from the desired ssFWER \(\alpha\).
  2. Raw p-values are transformed with Equation 4 before adjusting by a factor \(w_{\mathcal{R}}^{-1}\), rather than adjusting the raw p-values by a factor \(w_{\mathcal{R}}^{-1}\) before transforming with Equation 4.
The p.hmp function of the R package suffers both issues in version 1.0, and the second issue in version 2.0. Please update to version 3.0.

Tests in which significance is marginal or non-significant at \(\alpha=0.05\) are far more likely to be affected in practice.

Regarding error 1, individual p-values need to be assessed against the threshold \(\alpha_{L}/L\) when the HMP is used, not the more lenient \(\alpha_{1}/L\) nor the still more lenient \(\alpha/L\) (assuming equal weights). This shows that there is a cost to using the HMP compared to Bonferroni correction in the evaluation of individual p-values (and indeed small groups of p-values). For one billion tests \(\left(L=10^9\right)\) and a desired ssFWER of \(\alpha=0.01\), the fold difference in thresholds from Table 1 would be \(\alpha/\alpha_L=0.01/0.008=1.25\).

However, it remains the case that HMP is more powerful than Bonferroni for assessing the significance of large groups of hypotheses. This is the motivation for using the HMP, and combined tests in general, because the power to find significant groups of hypotheses will be higher than the power to detect significant individual hypotheses when the total number of tests (L) is large and the aim is to control the ssFWER.

How does it affect the paper?

I have submitted a request to correct the paper to PNAS. It is up to the editors whether to agree to this request. A copy of the published paper, annotated with the requested corrections, is available here: http://www.danielwilson.me.uk/files/wilson_2019_annotated_corrections.v2.pdf. Please use Adobe Reader to properly view the annotations and the embedded corrections to Figures 1 and 2.

Where did the errors come from?

Regarding the first error, page 11 of the supplementary information gave a correct version of the full closed testing procedure that controls the ssFWER (Equation 37). However, it went on to erroneously claim that "one can apply weighted Bonferroni correction to make a simple adjustment to Equation 6 by substituting \(\alpha_{|\mathcal{R}|}\) for \(\alpha\)." This reasoning would only be valid if the subsets of p-values to be combined were pre-selected and did not overlap. However, this would no longer constitute a flexible multilevel test in which every combination of p-values can be tested while controlling the ssFWER. The examples in Figures 1 and 2 pursued multilevel testing, in which the same p-values were assessed multiple times in subsets of different sizes, and in partially overlapping subsets of equal sizes. For the multilevel test, a formal shortcut to Equation 37, which makes it computationally practicable to control the ssFWER, is required. The simplest such shortcut procedure is the corrected test
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$
One can show this is a valid multilevel test because if $$\overset{\circ}{p}_\mathcal{R}\leq\alpha_L\,w_\mathcal{R}$$ then $$\overset{\circ}{p}=\left(w_\mathcal{R}\,\overset{\circ}{p}^{-1}_\mathcal{R}+w_{\mathcal{R}^\prime}\,\overset{\circ}{p}^{-1}_{\mathcal{R}^\prime}\right)^{-1} \leq w^{-1}_\mathcal{R}\,\overset{\circ}{p}_\mathcal{R}\leq\alpha_L$$an argument that mirrors the logic of Equation 7 for direct interpretation of the HMP (an approximate procedure), which is not affected by this correction.

The second error, which was also caused by carelessness on my part, occurred in the main text in the statement "(Equivalently, one can compare the exact p-value from Eq. 4 with \(\alpha\,w_{\mathcal{R}}\).)" I did not identify it sooner because the corrected version of the paper no longer uses Equation 4 to transform p-values in Figure 1.

How do I update the R package?

The R package is maintained at https://cran.r-project.org/package=harmonicmeanp.

In R, test whether you have version 3.0 of the package installed as follows:
packageVersion("harmonicmeanp")

The online binaries take a few days or weeks to update, so to ensure you install the most recent version of the package install from source by typing:
install.packages("harmonicmeanp", dependencies=TRUE, type="source")

You can additionally specify the CRAN repository for example:
install.packages("harmonicmeanp", dependencies=TRUE, type="source", repos="https://cran.wu.ac.at")

After installation, check again the version number:
stopifnot(packageVersion("harmonicmeanp")>=3.0)

What if I have already reported results?

I am very sorry for inconvenience caused in this case.

As long as the 'headline' test was significant with p.hmp under R package versions 1.0 or 2.0, then the weak-sense FWER can be considered to have been controlled. The 'headline' test is the test in which all p-values are included in a single combined test. The headline test is not affected by either error, because \(|\mathcal{R}|=L\) and \(w_{\mathcal{R}}=1\). The headline test controls the weak-sense FWER, and therefore so does a two-step procedure in which subsets are only deemed significant when the headline test is significant (Hochberg and Tamhane, 1987, Multiple Comparison Procedures, p. 3, Wiley).

If the headline test was not significant, re-running the analysis with version 3.0 will not produce significant results either because the stringency is greater for controlling the strong-sense FWER. If the headline test was significant, you may wish to reanalyse the data with version 3.0 to obtain strong-sense FWER control, because this was the criterion the HMP procedure was intended to control.

If some results that were significant under version 1.0 or 2.0 of the R package are no longer significant, you may conclude they are not significant or you may report them as significant subject to making clear that only the weak-sense FWER was controlled.

More information

For more information please leave a comment below, or get in touch via the contact page.

Saturday, 6 July 2019

Correction: The harmonic mean p-value for combining dependent tests

Important: this announcement has been superceded. Please see updated correction

I would like to issue the following correction to users of the harmonic mean p-value (HMP), with apologies: The paper (Wilson 2019 PNAS 116: 1195-1200) erroneously states that the following asymptotically exact test controls the strong-sense family-wise error rate for any subset of p-values \(\mathcal{R}\):
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{|\mathcal{R}|}\,w_\mathcal{R}$$
when it should read
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$
where:
  • L is the total number of individual p-values.
  • \(\mathcal{R}\) represents any subset of those p-values.
  • \(\overset{\circ}{p}_\mathcal{R} = \left(\sum_{i\in\mathcal{R}} w_i\right)/\left(\sum_{i\in\mathcal{R}} w_i/p_i\right)\) is the HMP for subset \(\mathcal{R}\).
  • \(w_i\) is the weight for the ith p-value. The weights must sum to one: \(\sum_{i=1}^L w_i=1\). For equal weights, \(w_i=1/L\).
  • \(w_\mathcal{R}=\sum_{i\in\mathcal{R}}w_i\) is the sum of weights for subset \(\mathcal{R}\).
  • \(|\mathcal{R}|\) gives the number of p-values in subset \(\mathcal{R}\).
  • \(\alpha_{|\mathcal{R}|}\) and \(\alpha_{L}\) are significance thresholds provided by the Landau distribution (Table 1).
In version 2.0 of the harmonicmeanp R package, the main function p.hmp is updated to take an additional argument, L, which sets the total number of p-values. If argument L is omitted, a warning is issued and L is assumed to equal the length of the first argument, p, preserving previous behaviour. Please update the R package.

An updated tutorial is available as a vignette in the R package and online here: http://www.danielwilson.me.uk/harmonicmeanp/hmpTutorial.html

Why does this matter?

The family-wise error rate (FWER) controls the probability of falsely rejecting any null hypotheses, or groups of null hypotheses, when they are true. The strong-sense FWER maintains control even when some null hypotheses are false, thereby offering control across much broader and more relevant scenarios.

Using the more lenient threshold \(\alpha_{|\mathcal{R}|}\) rather than the corrected threshold \(\alpha_L\), both derived via Table 1 of the paper from the desired ssFWER \(\alpha\), means the ssFWER is not controlled at the expected rate.

Tests with small numbers of p-values are far more likely to be affected in practice. In particular, individual p-values should be assessed against the threshold \(\alpha_{L}/L\) when the HMP is used, not the more lenient \(\alpha_{1}/L\) nor the still more lenient \(\alpha/L\) (assuming equal weights). This shows that there is a cost to using the HMP compared to Bonferroni correction in the evaluation of individual p-values. For one billion tests \(\left(L=10^9\right)\) and a desired ssFWER of \(\alpha=0.01\), the fold difference in thresholds from Table 1 would be \(\alpha/\alpha_L=0.01/0.008=1.25\).

However, it remains the case that HMP is much more powerful than Bonferroni for assessing the significance of groups of hypotheses. This is the motivation for using the HMP, and combined tests in general, because the power to find significant groups of hypotheses will be much higher than the power to detect significant individual hypotheses when the total number of tests (L) is large and the aim is to control the ssFWER.

How does it affect the paper?

I have submitted a request to correct the paper to PNAS. It is up to the editors whether to agree to this request. A copy of the published paper, annotated with the requested corrections, is available here: http://www.danielwilson.me.uk/files/wilson_2019_annotated_corrections.pdf. Please use Adobe Reader to properly view the annotations and the embedded corrections to Figures 1 and 2.

Where did the error come from?

Page 11 of the supplementary information gave a correct version of the full closed testing procedure that controls the ssFWER (Equation 37). However, it went on to erroneously claim that "one can apply weighted Bonferroni correction to make a simple adjustment to Equation 6 by substituting \(\alpha_{|\mathcal{R}|}\) for \(\alpha\)." This reasoning would only be valid if the subsets of p-values to be combined were pre-selected and did not overlap. However, this would no longer constitute a flexible multilevel test in which every combination of p-values can be tested while controlling the ssFWER. The examples in Figures 1 and 2 pursued multilevel testing, in which the same p-values were assessed multiple times in subsets of different sizes, and in partially overlapping subsets of equal sizes. For the multilevel test, a formal shortcut to Equation 37, which makes it computationally practicable to control the ssFWER, is required. The simplest such shortcut procedure is the corrected test
$$\overset{\circ}{p}_\mathcal{R} \leq \alpha_{L}\,w_\mathcal{R}$$
One can show this is a valid multilevel test because if $$\overset{\circ}{p}_\mathcal{R}\leq\alpha_L\,w_\mathcal{R}$$ then $$\overset{\circ}{p}=\left(w_\mathcal{R}\,\overset{\circ}{p}^{-1}_\mathcal{R}+w_{\mathcal{R}^\prime}\,\overset{\circ}{p}^{-1}_{\mathcal{R}^\prime}\right)^{-1} \leq w^{-1}_\mathcal{R}\,\overset{\circ}{p}_\mathcal{R}\leq\alpha_L$$an argument that mirrors the logic of Equation 7 for direct interpretation of the HMP, which is not affected by this correction.

More information

For more information please leave a comment below, or get in touch via the contact page.

Friday, 9 October 2015

PLoS Biology: Staphylococcus aureus invading the blood are less toxic

Toxicity in nose, blood and skin bacteria.
Collaborative work with Ruth Massey's group at the University of Bath taking forward a study of within-host evolution of Staphylococcus aureus during infection has been published in PLoS Biology. Previously we reported in PNAS that in one patient, bacteria causing a serious bloodstream infection differed by just eight mutations from a persistently carried nose population. We identified one of those mutations as playing a potentially causative role in transforming the nose bacteria into a form capable of bloodstream infection - a regulatory protein called rsp. To investigate further, Ruth applied a number of tests to characterize bacteria taken prior to and during infection. In this new paper, we report the surprising result that the bloodstream isolates show reduced toxicity and that rsp is the responsible for this change.

The notion that isolates responsible for serious human infection are less toxic challenges some long-held beliefs about the mechanism of disease in Staphylococcus aureus infections. Most models of disease assume a straightforward relationship between increased toxicity and greater virulence - the propensity to cause, or severity of, disease.

To test her observation, Ruth collaborated with groups from New York and Cambridge to investigate whether the pattern observed in one patient held more generally across 134 Staphylococcus aureus belonging to the notorious USA300 strain. It did.

Curiously, bacteria isolated from the skin and from superficial infections were equally toxic to nose bacteria. These findings raise new questions about the role of toxicity in colonization, transmission and serious infections of Staphylococcus aureus. One possibility that we wish to investigate further is whether toxicity might be required for the usual transmission of Staphylococcus aureus populations in the nose, skin or superficial infections (such as impetigo), whereas loss of toxicity may promote transition to deep tissue and bloodstream infections by evading immune defences.

Monday, 5 March 2012

PNAS paper on staphylococcal evolution during infection

Today in PNAS Early Edition my colleagues and I have a paper published reporting the genome evolution of Staphylococcus aureus during the transition from prolonged nasal carriage to invasive disease. Since Staph. aureus, a major bacterial cause of life-threatening infections, is carried without symptoms by a quarter of healthy adults, a natural question is to ask what genetic changes - if any - accompany the transition to invasive disease. The opportunity to pursue this question arose from a detailed epidemiological investigation of asymptomatic Staph. aureus nasal carriage set up by colleagues of mine including Derrick Crook and Kyle Knox. The study has recruited over 1,000 participants in Oxfordshire since it began running in October 2008. One participant developed a bloodstream infection that was indistinguishable from the strain of Staph. aureus persistently carried in the nose for the previous 13 months. Members of the Modernising Medical Microbiology consortium, led by Derrick and Rory Bowden, sequenced the genomes of 68 bacterial colonies isolated from the nasal and blood samples from this participant, and 101 colonies from nasal samples from two other participants that did not go on to develop disease. Bernadette Young and Tanya Golubchik analyzed the genome evolution of these bacterial populations, discovering an unusual pattern in the mutations that occurred between nasal carriage and invasive disease: mutations that led to prematurely truncated proteins were significantly over-represented, including one in a gene previously associated with virulence in bacteria. To know more, read the full open access article.