p-value

In statistical hypothesis testing, the p-value (probability value) is, for a given statistical model, the probability that, when the null hypothesis is true, a statistical summary of the data (such as the absolute value of the mean difference between two compared groups) would be greater than or equal to the result actually observed.^[1] The use of p-values in statistical hypothesis testing is common in many fields of research^[2] such as physics, economics, finance, political science, psychology,^[3] biology, criminal justice, criminology, and sociology.^[4] The misuse of p-values is a controversial topic in metascience.^[5]
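
The definition can be made concrete with a small exact permutation test. The Python sketch below assumes, for illustration, that under the null hypothesis the group labels are exchangeable, so every way of splitting the pooled observations into two groups of the original sizes is equally likely; the function name `permutation_p_value` is hypothetical. The p-value is then the fraction of splits whose absolute mean difference is greater than or equal to the observed one.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b, tol=1e-12):
    """Exact permutation p-value for the statistic |mean(a) - mean(b)|.

    Under H0 every split of the pooled data into groups of the original
    sizes is treated as equally likely (exchangeability assumption).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    at_least_as_extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # tol guards against floating-point noise when a split ties the observed value
        if abs(mean(a) - mean(b)) >= observed - tol:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total

print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # prints 0.1
```

For the groups [1, 2, 3] and [4, 5, 6], only 2 of the 20 possible splits are as extreme as the observed difference of 3, so p = 0.1. Full enumeration is feasible only for small samples; larger samples would typically use randomly drawn permutations instead.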

Italicization, capitalization, and hyphenation of the term vary. For example, AMA style uses "P value", APA style uses "p value", and the American Statistical Association uses "p-value".^[6]

Definition and interpretation

Definition

The p-value is the probability, under the null hypothesis, of obtaining a real-valued test statistic at least as extreme as the one obtained.

Consider an observed test statistic $t$ from an unknown distribution $T$. Then the p-value $p$ is the probability one would have assigned, before seeing the data, to observing a test-statistic value at least as "extreme" as $t$ if the null hypothesis $H_{0}$ were true. That is:

$p=\Pr(T\geq t\mid H_{0})$ for a one-sided right-tail test,
$p=\Pr(T\leq t\mid H_{0})$ for a one-sided left-tail test,
$p=2\min\{\Pr(T\geq t\mid H_{0}),\Pr(T\leq t\mid H_{0})\}$ for a two-sided test. If the distribution of $T$ is symmetric about zero, then $p=\Pr(|T|\geq |t|\mid H_{0})$.
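
The three formulas can be evaluated directly once the null distribution of $T$ is known. The sketch below assumes, for illustration only, that $T$ follows the standard normal distribution under $H_0$ (as in a z-test); `tail_p_values` is a hypothetical helper name.

```python
import math

def tail_p_values(t):
    """Return (right-tail, left-tail, two-sided) p-values for an observed t,
    assuming T ~ N(0, 1) under H0."""
    right = 0.5 * math.erfc(t / math.sqrt(2))   # Pr(T >= t | H0) = 1 - Phi(t)
    left = 0.5 * math.erfc(-t / math.sqrt(2))   # Pr(T <= t | H0) = Phi(t)
    two_sided = 2 * min(right, left)
    return right, left, two_sided

right, left, two_sided = tail_p_values(1.96)
print(f"right={right:.4f} left={left:.4f} two-sided={two_sided:.4f}")
# prints: right=0.0250 left=0.9750 two-sided=0.0500
```

Because the standard normal distribution is symmetric about zero, the two-sided value here coincides with $\Pr(|T|\geq |t|\mid H_{0})$, which is `math.erfc(abs(t) / math.sqrt(2))`.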

Interpretations

The p-value as a statistic for performing significance tests

In a significance test, the null hypothesis $H_{0}$ is rejected if the p-value is less than or equal to a predefined threshold $\alpha$, called the alpha level or significance level. $\alpha$ is not derived from the data; the researcher sets it before examining the data. $\alpha$ is commonly set to 0.05, although lower alpha levels are sometimes used. In 2018, a group of statisticians led by Daniel Benjamin proposed adopting 0.005 as the standard threshold for statistical significance worldwide, at least for claims of new discoveries.^[7]
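
The decision rule can be sketched as follows. The example assumes a two-sided z-test with an invented observed statistic of 2.5 and compares the conventional threshold of 0.05 with the 0.005 threshold proposed by Benjamin and colleagues; the helper names are hypothetical.

```python
import math

def two_sided_z_p_value(z):
    """Pr(|Z| >= |z|) for Z ~ N(0, 1), the null distribution assumed here."""
    return math.erfc(abs(z) / math.sqrt(2))

def reject_null(p, alpha):
    """Reject H0 when p <= alpha; alpha is fixed before the data are examined."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return p <= alpha

p = two_sided_z_p_value(2.5)  # roughly 0.0124
for alpha in (0.05, 0.005):
    print(f"alpha={alpha}: reject H0? {reject_null(p, alpha)}")
# prints:
# alpha=0.05: reject H0? True
# alpha=0.005: reject H0? False
```

The same p-value leads to rejection at $\alpha = 0.05$ but not at $\alpha = 0.005$, which is why the threshold has to be fixed in advance rather than chosen after the result is known.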

Distribution

When the null hypothesis is true, if it takes the form $H_{0}:\theta =\theta _{0}$ and the underlying random variable is continuous, then the probability distribution of the p-value is uniform on the interval [0,1]. By contrast, if the alternative hypothesis is true, the distribution depends on the sample size and on the true value of the parameter being studied.^[2]^[8]
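
Both claims can be checked by simulation. The sketch below assumes normally distributed data with a known standard deviation of 1 and a two-sided one-sample z-test of $H_0: \mu = 0$; the effect size 0.3, the sample sizes, the seeds, and the helper names are arbitrary choices for illustration. Under $H_0$ the share of p-values at or below any cutoff $c$ should be close to $c$, as a uniform distribution implies; under the alternative, small p-values become more common as the sample size grows.

```python
import math
import random

def simulate_p_values(mu, n, reps, seed):
    """Two-sided one-sample z-test p-values for H0: mu = 0.

    Data are drawn from N(mu, 1) with sigma = 1 treated as known
    (an assumption made for this illustration)."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # z statistic for H0: mu = 0, sigma = 1
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))
    return p_values

def share_at_most(p_values, cutoff):
    """Fraction of p-values less than or equal to cutoff."""
    return sum(p <= cutoff for p in p_values) / len(p_values)

null_ps = simulate_p_values(mu=0.0, n=10, reps=10_000, seed=1)
alt_small = simulate_p_values(mu=0.3, n=10, reps=10_000, seed=2)
alt_large = simulate_p_values(mu=0.3, n=50, reps=10_000, seed=3)

for label, ps in [("H0 true", null_ps), ("mu=0.3, n=10", alt_small),
                  ("mu=0.3, n=50", alt_large)]:
    print(label, [round(share_at_most(ps, c), 3) for c in (0.05, 0.25, 0.5)])
```

The printed shares are Monte Carlo estimates, so they vary slightly with the seed; under $H_0$ they should sit near 0.05, 0.25, and 0.5.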

The distribution of p-values for a group of studies is sometimes called a p-curve.^[9] A p-curve can be used to assess the reliability of the scientific literature, for example by detecting publication bias or p-hacking.^[9]^[10]
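
A p-curve in this sense can be tabulated by binning the statistically significant p-values. The sketch below follows a common layout of five bins of width 0.01 below 0.05; the bin layout, the cutoff, the example values, and the function name `p_curve` are assumptions for illustration, not the exact procedure of the cited papers.

```python
def p_curve(p_values, alpha=0.05, bins=5):
    """Share of significant p-values (0 < p < alpha) in equal-width bins.

    Bins are half-open intervals [k*w, (k+1)*w) with w = alpha / bins;
    values lying exactly on an edge are subject to floating-point rounding."""
    significant = [p for p in p_values if 0.0 < p < alpha]
    if not significant:
        return [0.0] * bins
    width = alpha / bins
    counts = [0] * bins
    for p in significant:
        counts[min(int(p / width), bins - 1)] += 1
    return [c / len(significant) for c in counts]

reported = [0.001, 0.004, 0.012, 0.035, 0.049, 0.2]  # invented example values
print(p_curve(reported))  # prints [0.4, 0.2, 0.0, 0.2, 0.2]
```

A curve whose shares fall from left to right (right-skewed) is the shape expected when a real effect is present, while shares that rise toward 0.05 are the pattern associated with p-hacking. With only five significant values, as here, the output illustrates the tabulation rather than supporting any conclusion.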

References

[1] Wasserstein, Ronald L.; Lazar, Nicole A. (7 March 2016). "The ASA's Statement on p-Values: Context, Process, and Purpose". The American Statistician. 70 (2): 129–133. doi:10.1080/00031305.2016.1154108.
[2] Bhattacharya, Bhaskar; Habtzghi, DeSale (2002). "Median of the p value under the alternative hypothesis". The American Statistician. 56 (3): 202–206. doi:10.1198/000313002146.
[3] Wetzels, R.; Matzke, D.; Lee, M. D.; Rouder, J. N.; Iverson, G. J.; Wagenmakers, E.-J. (2011). "Statistical Evidence in Experimental Psychology: An Empirical Comparison Using 855 t Tests". Perspectives on Psychological Science. 6 (3): 291–298. PMID 26168519. doi:10.1177/1745691611406923.
[4] Babbie, E. (2007). The Practice of Social Research (11th ed.). Belmont, California: Thomson Wadsworth.
[5] Munafò, Marcus R.; Nosek, Brian A.; Bishop, Dorothy V. M.; Button, Katherine S.; Chambers, Christopher D.; Percie du Sert, Nathalie; Simonsohn, Uri; Wagenmakers, Eric-Jan; Ware, Jennifer J.; Ioannidis, John P. A. (January 2017). "A manifesto for reproducible science". Nature Human Behaviour. 1: 0021. doi:10.1038/s41562-016-0021. Retrieved 9 May 2019.
[6] ASA House Style
[7] Benjamin, Daniel J.; Berger, James O.; Johannesson, Magnus; et al. (1 September 2017). "Redefine statistical significance". Nature Human Behaviour. 2 (1): 6–10. PMID 30980045. doi:10.1038/s41562-017-0189-z. eISSN 2397-3374.
[8] Hung HM, O'Neill RT, Bauer P, Köhne K (March 1997). "The behavior of the P-value when the alternative hypothesis is true". Biometrics. 53 (1): 11–22. JSTOR 2533093. PMID 9147587. doi:10.2307/2533093.
[9] Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (March 2015). "The extent and consequences of p-hacking in science". PLOS Biology. 13 (3): e1002106. PMC 4359000. PMID 25768323. doi:10.1371/journal.pbio.1002106.
[10] Simonsohn U, Nelson LD, Simmons JP (November 2014). "p-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results". Perspectives on Psychological Science. 9 (6): 666–681. PMID 26186117. S2CID 39975518. doi:10.1177/1745691614553988.

Further reading

Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
Elderton, William Palin (1902). "Tables for Testing the Goodness of Fit of Theory to Observation". Biometrika. 1 (2): 155–163. doi:10.1093/biomet/1.2.155.
Fisher, Ronald (1925). Statistical Methods for Research Workers. Edinburgh, Scotland: Oliver & Boyd. ISBN 978-0-05-002170-5.
Fisher, Ronald A. (1971) [1935]. The Design of Experiments (9th ed.). Macmillan. ISBN 978-0-02-844690-5.
Fisher, R. A.; Yates, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research. London, England.
Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, Mass.: Belknap Press of Harvard University Press. ISBN 978-0-674-40340-6.
Hubbard, Raymond; Bayarri, M. J. (November 2003). P Values are not Error Probabilities. Archived from the original on 4 September 2013. A working paper that explains the difference between Fisher's evidential p-value and the Neyman–Pearson Type I error rate α.
Hubbard, Raymond; Armstrong, J. Scott (2006). "Why We Don't Really Know What Statistical Significance Means: Implications for Educators". Journal of Marketing Education. 28 (2): 114–120. doi:10.1177/0273475306288399. Archived from the original on 18 May 2006.
Hubbard, Raymond; Lindsay, R. Murray (2008). "Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing". Theory & Psychology. 18 (1): 69–88. doi:10.1177/0959354307086923. Archived from the original on 21 October 2016. Retrieved 21 July 2019.
Stigler, S. (December 2008). "Fisher and the 5% level". Chance. 21 (4): 12. doi:10.1007/s00144-008-0033-3.
Dallal, Gerard E. (2012). The Little Handbook of Statistical Practice.
Biau, D. J.; Jolles, B. M.; Porcher, R. (March 2010). "P value and the theory of hypothesis testing: an explanation for new researchers". Clin Orthop Relat Res. 468 (3): 885–892. PMC 2816758. PMID 19921345. doi:10.1007/s11999-009-1164-4.
Reinhart, Alex (2015). Statistics Done Wrong: The Woefully Complete Guide. No Starch Press. p. 176. ISBN 978-1593276201.
Denworth L (October 2019). "A Significant Problem: Standard scientific methods are under fire. Will anything change?". Scientific American. 321 (4): 62–67 (63). "The use of p values for nearly a century [since 1925] to determine statistical significance of experimental results has contributed to an illusion of certainty and [to] reproducibility crises in many scientific fields. There is growing determination to reform statistical analysis... Some [researchers] suggest changing statistical methods, whereas others would do away with a threshold for defining 'significant' results."
Benjamini, Yoav; De Veaux, Richard D.; Efron, Bradley; Evans, Scott; Glickman, Mark; Graubard, Barry I.; He, Xuming; Meng, Xiao-Li; Reid, Nancy; Stigler, Stephen M.; Vardeman, Stephen B.; Wikle, Christopher K.; Wright, Tommy; Young, Linda J.; Kafadar, Karen (2021). "The ASA President's Task Force Statement on Statistical Significance and Replicability". Annals of Applied Statistics. 15 (3): 1084–1085. doi:10.1214/21-AOAS1501.

External links

Free online p-values calculators for various specific tests (chi-square, Fisher's F-test, etc.).
Understanding p-values, including a Java applet that illustrates how the numerical values of p-values can give quite misleading impressions about the truth or falsity of the hypothesis under test.
StatQuest: P Values, clearly explained on YouTube
StatQuest: P-value pitfalls and power calculations on YouTube
