Abstract
Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods, both independently and in combination, to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness of these methods in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
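The compression idea described above, clustering individuals into groups so that the model works with fewer random effects, can be illustrated with a minimal sketch. This is not the TASSEL implementation. The function names `average_linkage_groups` and `compress_kinship`, the greedy average-linkage merging rule, and the use of the mean pairwise kinship as the group-level kinship are all assumptions made for illustration, as is the toy kinship matrix.

```python
from itertools import combinations


def _mean_kinship(kinship, group_a, group_b):
    """Mean individual-level kinship over all pairs (i in group_a, j in group_b)."""
    total = sum(kinship[i][j] for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))


def average_linkage_groups(kinship, n_groups):
    """Greedily merge the two groups with the highest mean kinship
    until only n_groups remain (hypothetical clustering rule)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[i] for i in range(len(kinship))]
    while len(groups) > n_groups:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = max(
            combinations(range(len(groups)), 2),
            key=lambda pair: _mean_kinship(kinship, groups[pair[0]], groups[pair[1]]),
        )
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


def compress_kinship(kinship, groups):
    """Build a group-by-group kinship matrix from the individual one
    (assumed summary: mean over all member pairs, self-pairs included)."""
    return [[_mean_kinship(kinship, ga, gb) for gb in groups] for ga in groups]


# Toy kinship matrix for four individuals (made-up values).
K = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
groups = average_linkage_groups(K, 2)
print(groups)  # [[0, 1], [2, 3]]
print(compress_kinship(K, groups))
```

In this toy case the 4 x 4 individual kinship matrix shrinks to a 2 x 2 group matrix, which is the sense in which compression reduces the effective sample size. Under P3D as the abstract describes it, variance components would then be estimated once and reused rather than re-computed; that step is not shown here.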
Acknowledgements
This study was supported by the US National Science Foundation (NSF)–Plant Genome Program (DBI-0321467, 0703908 and 0820619), NSF–Plant Genome Comparative Sequencing Program (DBI-06638566), US National Institutes of Health (1R21AR055228-01A1), National Heart, Lung, and Blood Institute (U 01 HL72524, HL54776 and 5U01HL072524-06), US Department of Agriculture Research Service (53-K06–5-10 and 58–1950-9–001), USDA–Cooperative State Research, Education and Extension Service National Research Initiative (2006-35300-17155), Morris Animal Foundation (D04CA-135), WALTHAM Centre for Pet Nutrition, Cornell Advanced Technology in Biotechnology and the Collaborative Research Program in the Cornell Veterinary College. The authors would like to thank K. Zhao for providing the source code to compute kinship and L. Rigamer Lirette, A.L. Ingham and S. Myles for editing of the manuscript.
Author information
Contributions
Z.Z. conceptualized the study, performed the data analyses and wrote the manuscript. E.E., M.A.G. and J.Y. participated in the data analyses and wrote the manuscript. P.J.B. implemented the two new methods in the TASSEL software package. C.L., H.K.T., D.K.A. and J.M.O. provided the human data and supervised its analyses. R.J.T. provided the dog data and supervised its analyses. E.S.B. designed and supervised the project. All authors edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5 and Supplementary Note (PDF 1425 kb)
About this article
Cite this article
Zhang, Z., Ersoz, E., Lai, CQ. et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42, 355–360 (2010). https://doi.org/10.1038/ng.546
DOI: https://doi.org/10.1038/ng.546