Abstract
We report a new method to estimate the predictive performance of polygenic models for risk prediction and use it to assess predictive performance for ten complex traits or common diseases. Using estimates of effect-size distribution and heritability derived from current studies, we project that although 45% of the variance of height has been attributed to SNPs, a model trained on one million people may explain only 33.4% of the variance of the trait. Models based on current studies allow identification of 3.0%, 1.1% and 7.0% of the population at twofold or higher than average risk for type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling the sample sizes could raise these percentages to 18.8%, 6.1% and 12.2%, respectively. The utility of polygenic models for risk prediction will depend on the sample sizes achievable for the training data set, the underlying genetic architecture and the inclusion of information on other risk factors, including family history.
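The abstract links two quantities: the trait variance a polygenic model can explain given the size of its training sample, and the share of the population that the model places at twofold or higher than average risk. The Python sketch below illustrates both under simplifying assumptions; it is not the method reported in this article. The function names, the toy two-component effect-size distribution, the normal approximation to the association test statistic, the genome-wide threshold `alpha=5e-8`, the assumption of independent loci, and the log-normal relative-risk model are all illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def discovery_power(n, var_explained, alpha=5e-8):
    """Power of a two-sided 1-df association test for a single SNP.

    Assumption: the test statistic is approximately normal with mean
    sqrt(n * var_explained) and unit variance, a large-sample approximation
    for a SNP explaining a small fraction of a standardized quantitative trait.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = math.sqrt(n * var_explained)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def projected_variance_explained(n, components, alpha=5e-8):
    """Expected trait variance explained by SNPs reaching significance at alpha.

    `components` is a hypothetical discretized effect-size distribution given
    as (number_of_loci, variance_explained_per_locus) pairs; loci are assumed
    independent. The result is optimistic: it credits each discovered SNP with
    its true effect, ignoring estimation error and false positives.
    """
    return sum(m * v * discovery_power(n, v, alpha) for m, v in components)


def fraction_at_relative_risk(sigma2, threshold=2.0):
    """Fraction of the population at >= `threshold` times average risk.

    Assumption: the log relative risk given by the polygenic score is normal
    with variance `sigma2`, and relative risk is scaled to average 1, i.e.
    RR = exp(Z - sigma2 / 2) with Z ~ N(0, sigma2).
    """
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    sigma = math.sqrt(sigma2)
    cutoff = (math.log(threshold) + sigma2 / 2) / sigma
    return 1 - STD_NORMAL.cdf(cutoff)


if __name__ == "__main__":
    # Hypothetical architecture: 1,000 loci explaining 1e-4 each plus
    # 5,000 loci explaining 1e-5 each (total SNP heritability 0.15).
    arch = [(1000, 1e-4), (5000, 1e-5)]
    for n in (100_000, 300_000, 1_000_000):
        print(n, round(projected_variance_explained(n, arch), 4))

    # Independent normal log-risk components add their variances, so another
    # risk factor with log-RR variance 0.2 combines with a score of 0.3 as 0.5.
    for s2 in (0.3, 0.5):
        print(s2, round(fraction_at_relative_risk(s2), 4))
```

Varying `n` in the loop shows that the projected variance rises steeply only once the noncentrality for a class of loci approaches the significance threshold, which is why the gain from larger training samples depends on the genetic architecture. The projections in the article rest on effect-size distributions estimated from current studies, not on toy numbers like these.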
Acknowledgements
This research was supported by the intramural program of the US National Cancer Institute.
Author information
Contributions
N.C. led the development of the statistical methods and drafted the manuscript. J.-H.P. contributed to the development of the methods and performed the illustrative analyses. B.W. implemented the simulation studies. J.S., P.H. and S.J.C. contributed to the design of various analyses and the interpretation of results. N.C., B.W., J.S., P.H., S.J.C. and J.-H.P. reviewed and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Tables 1–4, Supplementary Figures 1–3, Supplementary Note (PDF 329 kb)
About this article
Cite this article
Chatterjee, N., Wheeler, B., Sampson, J. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet 45, 400–405 (2013). https://doi.org/10.1038/ng.2579
DOI: https://doi.org/10.1038/ng.2579