Papers by Pierre-antoine Gourraud
Stem cells translational medicine, 2015
Summary: The development of a California-based induced pluripotent stem cell (iPSC) bank based on human leukocyte antigen (HLA) haplotype matching represents a significant challenge and a valuable opportunity for the advancement of regenerative medicine. However, previously published models of iPSC banks have neither addressed the admixed nature of populations like that of California nor evaluated the benefit to the population as a whole. We developed a new model for evaluating an iPSC haplobank based on demographic and immunogenetic characteristics reflecting California. The model evaluates haplolines or cell lines from donors homozygous for a single HLA-A, HLA-B, HLA-DRB1 haplotype. We generated estimates of the percentage of the population matched under various combinations of haplolines derived from six ancestries (black/African American, American Indian, Asian/Pacific Islander, Hispanic, and white/not Hispanic) and data available from the U.S. Census Bureau, the California Instit...
Tissue antigens, 2014
Genetic matching for loci in the human leukocyte antigen (HLA) region between a donor and a patient in hematopoietic stem cell transplantation (HSCT) is critical to outcome; however, methods for HLA genotyping of donors in unrelated stem cell registries often yield results with allelic and phase ambiguity and/or do not query all clinically relevant loci. We present and evaluate a statistical method for in silico imputation of HLA alleles and haplotypes in large ambiguous population data from the Be The Match® Registry. Our method builds on haplotype frequencies estimated from registry populations and exploits patterns of linkage disequilibrium (LD) across HLA haplotypes to infer high resolution HLA assignments. We performed validation on simulated and real population data from the Registry with non-trivial ambiguity content. While real population datasets caused some predictions to deviate from expectation, validations still showed high percent recall for imputed results with aver...
Transplant Immunology, 2005
We review the most classical questions addressed by the analysis of population data in immunogenetics. Basic genetic definitions are recalled. Questions related to the population data itself (structure, missing values, nomenclature, sampling) are developed first; population genetics questions (relevance of genetic parameters such as phenotype, genotype, allele and haplotype frequencies; genetic distances; linkage disequilibrium measures; methods and practical computing) are then illustrated with immunogenetic polymorphisms. This article presents the essentials of population immunogenetics through examples and key references. We underline the importance of the population dimension in statistical analysis: the structure of linkage disequilibrium and genetic diversity between populations may affect the power of a study or the interpretation of correlations between markers, genes and diseases, making population genetics both theoretical and very practical.
Transplant Immunology, 2005
Complex polygenic and multifactorial diseases remain a challenge for human geneticists. Here we aim to recall the basic definitions of multifactorial diseases and the related genetic concepts underlying classical methods. Knowledge of the pathophysiological process and the genetic information available condition the design of a study. The choice of methodology, between the candidate gene approach and the genome scan approach, and between linkage and association studies, is the most important step. Linkage analysis and association studies are usually considered complementary approaches for a given disease. For this reason, in this article, we present the most important classical methodologies in the genetic epidemiology of complex disorders. References and examples are given as illustration.
Human Immunology, 2015
We have estimated human leukocyte antigen (HLA) haplotype frequencies using the maximum likelihood mode, which accommodates typing ambiguities. The results of the frequency distribution of the 7,015 haplotypes obtained are presented here. These include a total of 114 HLA-A, 185 HLA-B, and 76 HLA-DRB1 unique alleles at each locus. Across all populations, although the most common individual HLA alleles were HLA-A*02:01 (29.0%), HLA-B*07:02 (11.4%), and HLA-DRB1*07:01 (15.9%), the most frequent haplotype was found to be HLA-A*01:01-B*08:01-DRB1*03:01.
PLoS ONE, 2014
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.
Frontiers in Genetics, 2015
Imputation is a commonly used technique that exploits linkage disequilibrium to infer missing genotypes in genetic datasets, using a well-characterized reference population. While there is agreement that the reference population has to match the ethnicity of the query dataset, it is common practice to use the same reference to impute genotypes for a wide variety of phenotypes. We hypothesized that using a reference composed of samples with a different phenotype than the query dataset would introduce imputation bias. To test this hypothesis we used GWAS datasets from Amyotrophic Lateral Sclerosis (ALS), Parkinson Disease (PD), and Crohn's Disease (CD). First, we masked and then performed imputation of 100 disease-associated markers and 100 non-associated markers from each study. Two references for imputation were used in parallel: one consisting of healthy controls and another consisting of patients with the same disease. We assessed the discordance (imprecision) and bias (inaccuracy) of imputation by comparing predicted genotypes to those assayed by SNP-chip. We also assessed the bias on the observed effect size when the predicted genotypes were used in a GWAS study. When healthy controls were used as reference for imputation, a significant bias was observed, particularly in the disease-associated markers. Using cases as reference significantly attenuated this bias. For nearly all markers, the direction of the bias favored the non-risk allele. In GWAS studies of the three diseases (with healthy reference controls from the 1000 genomes as reference), the mean OR for disease-associated markers obtained by imputation was lower than that obtained using original assayed genotypes. We found that the bias is inherent to imputation as using different methods did not alter the results.
In conclusion, imputation is a powerful method to predict genotypes and estimate genetic risk for GWAS. However, a careful choice of reference population is needed to minimize biases inherent to this approach.
Human Immunology, 2015
High-resolution haplotype frequency estimations and descriptive metrics are becoming increasingly popular for accurately describing human leukocyte antigen diversity. In this study, we compared sample sets of publicly available haplotype frequencies from different populations to characterize the consequences of unequal sample size on haplotype frequency estimation. We found that for low sample sizes (a few thousand), haplotype frequencies were overestimated, affecting all descriptive metrics of the underlying distribution, such as most frequent haplotype, the number of haplotypes, and the mean/median frequency. This overestimation was a result of random sample fluctuation and truncation of the tail end of the frequency distribution that comprises the least frequent haplotypes. Finally, we simulated balanced datasets through resampling and contrasted the disparities of descriptive metrics among equal and unequal datasets. This simulation resulted in the global description of the most frequent human leukocyte antigen haplotypes worldwide.
Multiple Sclerosis Journal, 2014
Many genetic risk variants are now well established in multiple sclerosis (MS), but the impact on clinical phenotypes is unclear. We investigated the impact of established MS genetic risk variants on MS phenotypes in well-characterized MS cohorts. Norwegian MS patients (n = 639) and healthy controls (n = 530) were successfully genotyped for 61 established MS-associated single nucleotide polymorphisms (SNPs). Data including and excluding Major Histocompatibility Complex (MHC) markers were summed to an MS Genetic Burden (MSGB) score. Study replication was performed in a cohort of white American MS patients (n = 1997) and controls (n = 708). The total human leukocyte antigen (HLA) and the non-HLA MSGB scores were significantly higher in MS patients than in controls, in both cohorts (P < 10⁻²²). MS patients, with and without cerebrospinal fluid (CSF) oligoclonal bands (OCBs), had a higher MSGB score than the controls; the OCB-positive patients had a slightly higher MSGB than the OCB-negative patients. An early age at symptom onset (AAO) also correlated with a higher MSGB score, in both cohorts. The MSGB score was associated with specific clinical MS characteristics, such as OCBs and AAO. This study underlines the need for well-characterized, large cohorts of MS patients, and the usefulness of summarizing multiple genetic risk factors of modest effect size in genotype-phenotype analyses.
Immunological Reviews, 2012
Human Immunology, 2005
Human leukocyte antigen (HLA) matching remains a key issue in the outcome of transplantation. In hematopoietic stem cell transplantation with unrelated donors, the matching for compatible donors is based on the HLA phenotype information. In familial transplantation, the matching is achieved at the haplotype level because donor and recipient share the block-transmitted major histocompatibility complex region. We present a statistical method based on the HLA haplotype inference to refine the HLA information available in an unrelated situation. We implement a systematic statistical inference of the haplotype combinations at the individual level. It computes the most likely haplotype pair given the phenotype and its probability. The method is validated on 301 phase-known phenotypes from CEPH families (Centre d'Etude du Polymorphisme Humain). The method is further applied to 85,933 HLA-A B DR typed unrelated donors from the French Registry of hematopoietic stem cells donors (France Greffe de Moëlle). The average value of prediction probability is 0.761 (SD 0.199) ranging from 0.26 to 1. Correlations between phenotype characteristics and predictions are also given. Homozygosity (OR = 2.08 [2.02-2.14]; p < 10⁻³) and linkage disequilibrium (p < 10⁻³) are the major factors influencing the quality of prediction. Limits and relevance of the method are related to limits of haplotype estimation. Relevance of the method is discussed in the context of HLA matching refinement.
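As an illustration of the kind of inference this abstract describes (the most likely haplotype pair given a phenotype, with its probability), here is a minimal Python sketch. It is not the authors' implementation. It assumes Hardy-Weinberg proportions, unambiguous allele-level typing, and known haplotype frequencies. The function name `most_likely_haplotype_pair` and the toy frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def most_likely_haplotype_pair(phenotype, hap_freqs):
    """Return (best_pair, posterior_probability) for an unphased phenotype.

    phenotype: one set of observed alleles per locus (1 or 2 alleles each).
    hap_freqs: dict mapping a haplotype (tuple of alleles, one per locus)
               to its population frequency (assumed known).
    Assumes Hardy-Weinberg proportions; this is an illustrative sketch only.
    """
    weights = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        # A pair is compatible if, at every locus, its two alleles
        # reproduce exactly the alleles observed in the phenotype.
        compatible = all(
            {a, b} == set(obs) for a, b, obs in zip(h1, h2, phenotype)
        )
        if compatible:
            f = hap_freqs[h1] * hap_freqs[h2]
            # Heterozygous pairs can arise in two parental orders.
            weights[(h1, h2)] = f if h1 == h2 else 2 * f
    total = sum(weights.values())
    if total == 0:
        return None, 0.0  # no known haplotype pair explains the phenotype
    best = max(weights, key=weights.get)
    return best, weights[best] / total


# Toy, made-up frequencies for a three-locus example.
toy_freqs = {
    ("A1", "B8", "DR3"): 0.10,
    ("A2", "B7", "DR15"): 0.05,
    ("A1", "B7", "DR15"): 0.01,
    ("A2", "B8", "DR3"): 0.02,
}
toy_phenotype = ({"A1", "A2"}, {"B7", "B8"}, {"DR3", "DR15"})
best_pair, probability = most_likely_haplotype_pair(toy_phenotype, toy_freqs)
print(best_pair, round(probability, 3))
```

In this toy case two haplotype pairs are compatible with the phenotype, and the posterior probability of the best one is its Hardy-Weinberg weight divided by the summed weight of both.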
European Journal of Human Genetics, 2004
Haplotype frequency estimation in population data is an important problem in genetics and different methods including expectation maximisation (EM) methods have been proposed. The statistical properties of EM methods have been extensively assessed for data sets with no missing values. When numerous markers and/or individuals are tested, however, it is likely that some genotypes will be missing. Thus, it is of interest to investigate the behaviour of the method in the presence of incomplete genotype observations. We propose an extension of the EM method to handle missing genotypes, and we compare it with commonly used methods (such as ignoring individuals with incomplete genotype information or treating a missing allele as any other allele). Simulations were performed, starting from data sets of haematopoietic stem cell donors genotyped at three HLA loci. We deleted some data to create incomplete genotype observations in various proportions. We then compared the haplotype frequencies obtained on these incomplete data sets using the different methods to those obtained on the complete data. We found that the method proposed here provides better estimations, both qualitatively and quantitatively, but increases the computation time required. We discuss the influence of missing values on the algorithm's efficiency and the advantages and disadvantages of deleting incomplete genotypes. We propose guidelines for missing data handling in routine analysis.
Brain, 2013
Brain magnetic resonance imaging is widely used as a diagnostic and monitoring tool in multiple sclerosis and provides a non-invasive, sensitive and reproducible way to track the disease. Topological characteristics relating to the distribution and shape of lesions are recognized as important neuroradiological markers in the diagnosis of multiple sclerosis, although these have been much less well characterized quantitatively than have traditional measures such as T2 hyperintense or T1 hypointense lesion volumes. Here, we used voxel-level 3 T magnetic resonance imaging T1-weighted scans to reconstruct the 3D topology of lesions in 284 subjects with multiple sclerosis and tested whether this is a heritable phenotype. To this end, we extracted the genotypes from a published genome-wide association study on these same individuals and searched for genetic associations with lesion load, shape and topological distribution. Lesion probability maps were created to identify frequently affected areas and to assess the overall distribution of T1 lesions in the subject population as a whole. We then developed an original algorithm to cluster adjacent lesional voxels (cluxels) in each subject and tested whether cluxel topology was significantly associated with any single-nucleotide polymorphism in our data set. To focus on patterns of lesion distribution, we computed the first 10 principal components. Although principal component 1 correlated with lesion load, none of the remaining orthogonal components correlated with any other known variable. We then conducted genome-wide association studies on each of these and found 31 significant associations (false discovery rate < 0.01) with principal component 8, which represents a mode of variation of lesion topology in the population.
The majority of the loci can be linked to genes related to immune cell function and to myelin and neural growth; some (SYK, MYT1L, TRAPPC9, SLITKR6 and RIC3) have been previously associated with the distribution of white matter lesions in multiple sclerosis. Finally, we used a bioinformatics approach to identify a network of 48 interacting proteins showing genetic associations (P < 0.01) with cluxel topology in multiple sclerosis. This network also contains proteins expressed in immune cells and is enriched in molecules expressed in the central nervous system that contribute to neural development and regeneration. Our results show how quantitative traits derived from brain magnetic resonance images of patients with multiple sclerosis can be used as dependent variables in a genome-wide association study. With the widespread availability of powerful computing and the availability of genotyped populations, integration of imaging and genetic data sets is likely to become a mainstream tool for understanding the complex biological processes of multiple sclerosis and other brain disorders. [Figure: Top-scoring sub-network (module).]
Arthritis & Rheumatism, 2006
The HLA-DRB1 gene was reported to be associated with anticitrullinated protein/peptide autoantibody (ACPA) production in rheumatoid arthritis (RA) patients. A new classification of HLA-DRB1 alleles, reshaping the shared epitope (SE) hypothesis, was recently found relevant in terms of RA susceptibility and structural severity.
Annals of Neurology, 2011
Objective: Multiple sclerosis (MS) is a multifactorial neurologic disease characterized by modest but tractable heritability. Genome Wide Association Studies (GWAS) have identified and/or validated multiple polymorphisms in approximately 16 genes associated with susceptibility. We aimed at investigating the aggregation of genetic MS-risk markers in individuals by comparing multi- and single-case families.
Brain : a journal of neurology, Jan 28, 2015
The aims of this study were: (i) to determine to what degree multiple sclerosis-associated loci discovered in European populations also influence susceptibility in African Americans; (ii) to assess the extent to which the unique linkage disequilibrium patterns in African Americans can contribute to localizing the functionally relevant regions or genes; and (iii) to search for novel African American multiple sclerosis-associated loci. Using the ImmunoChip custom array we genotyped 803 African American cases with multiple sclerosis and 1516 African American control subjects at 130 135 autosomal single nucleotide polymorphisms. We conducted association analysis with rigorous adjustments for population stratification and admixture. Of the 110 non-major histocompatibility complex multiple sclerosis-associated variants identified in Europeans, 96 passed stringent quality control in our African American data set and of these, >70% (69) showed over-representation of the same allele among...
Annals of neurology, 2014
We present a precision medicine application developed for multiple sclerosis (MS): the MS BioScreen. This new tool addresses the challenges of dynamic management of a complex chronic disease; the interaction of clinicians and patients with such a tool illustrates the extent to which translational digital medicine (that is, the application of information technology to medicine) has the potential to radically transform medical practice. We introduce 3 key evolutionary phases in displaying data to health care providers, patients, and researchers: visualization (accessing data), contextualization (understanding the data), and actionable interpretation (real-time use of the data to assist decision making). Together, these form the stepping stones that are expected to accelerate standardization of data across platforms, promote evidence-based medicine, support shared decision making, and ultimately lead to improved outcomes.
PLoS ONE, 2010
Background: In Northern European descended populations, genetic susceptibility for multiple sclerosis (MS) is associated with alleles of the human leukocyte antigen (HLA) Class II gene DRB1. Whether other major histocompatibility complex (MHC) genes contribute to MS susceptibility is controversial.
Brain : a journal of neurology, Jan 22, 2015
Immunological hallmarks of multiple sclerosis include the production of antibodies in the central nervous system, expressed as presence of oligoclonal bands and/or an increased immunoglobulin G index (the level of immunoglobulin G in the cerebrospinal fluid compared to serum). However, the underlying differences between oligoclonal band-positive and -negative patients with multiple sclerosis and reasons for variability in immunoglobulin G index are not known. To identify genetic factors influencing the variation in the antibody levels in the cerebrospinal fluid in multiple sclerosis, we have performed a genome-wide association screen in patients collected from nine countries for two traits, presence or absence of oligoclonal bands (n = 3026) and immunoglobulin G index levels (n = 938), followed by a replication in 3891 additional patients. We replicate previously suggested association signals for oligoclonal band status in the major histocompatibility complex region for the rs9271640*...
Journal of Allergy and Clinical Immunology, 2014
IgE is a key mediator of allergic inflammation, and its levels are frequently increased in patients with allergic disorders. We sought to identify genetic variants associated with IgE levels in Latinos. We performed a genome-wide association study and admixture mapping of total IgE levels in 3334 Latinos from the Genes-environments…