SNP genotyping and data quality control
SNP genotyping was performed in two phases of the study over a period of approximately 18 months. In the initial pilot phase, 300 donor/recipient pairs were genotyped for the selected panel of SNP ancestry-informative markers (AIMs)¹ using the Sequenom iPLEX platform², complemented by Thermo Fisher’s TaqMan assays. In the second phase, genome-wide genotyping on the Illumina HumanOmniExpress BeadChip, which provides over 719,000 SNPs, became feasible. However, to maintain the integrity of the initial study protocol, we continued to use the 500-SNP AIMs panel¹: 304 of these SNPs were genotyped on the Illumina array, and the remaining 196 were imputed with the IMPUTE2 software package⁴ using the 1000 Genomes reference populations³.
Because of the time lapse between the two genotyping phases, quality control (QC) was performed separately for each phase before all samples were merged for the combined analysis. In the pilot phase, none of the genotyped SNPs deviated from Hardy-Weinberg equilibrium proportions. However, six of the 500 SNP AIMs failed both Sequenom and TaqMan genotyping, leaving 494 SNPs for analysis. In addition, 41 donor/recipient pairs with a sample call rate < 94% were excluded from further analysis.
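The pilot-phase filters can be sketched in Python. This is a minimal illustration under stated assumptions, not the study’s pipeline: genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for a failed call; a donor/recipient pair is dropped if either member falls below the call-rate threshold (the text does not say how pairs were scored); and Hardy-Weinberg deviation is checked with a 1-df chi-square test, which is one common choice among several (exact tests are also widely used). All function names are hypothetical.

```python
import math

# Assumed coding: genotypes are alternate-allele counts (0, 1, 2); None = failed call.


def call_rate(genotypes):
    """Fraction of non-missing calls in one sample's genotype list."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def pairs_passing_call_rate(pairs, threshold=0.94):
    """Keep donor/recipient pairs in which both samples reach the threshold.

    `pairs` maps a hypothetical pair ID to (donor_genotypes, recipient_genotypes).
    Dropping the whole pair when either member fails is an assumption.
    """
    return {
        pair_id: (donor, recipient)
        for pair_id, (donor, recipient) in pairs.items()
        if call_rate(donor) >= threshold and call_rate(recipient) >= threshold
    }


def hwe_chisq_pvalue(n_hom_ref, n_het, n_hom_alt):
    """1-df chi-square test of Hardy-Weinberg proportions for one biallelic SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip empty expected classes (monomorphic SNPs) to avoid division by zero.
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))


# Example: the pair with a low-call-rate recipient is dropped at the 94% threshold.
pairs = {
    "pair_A": ([0, 1, 2, 1], [0, None, 2, 1]),
    "pair_B": ([0, 1, 2, 1], [1, 1, 0, 2]),
}
print(sorted(pairs_passing_call_rate(pairs)))  # ['pair_B']
print(hwe_chisq_pvalue(25, 50, 25))  # 1.0 (counts exactly match HWE proportions)
```

The same `call_rate` helper applies unchanged to the 98% sample and SNP thresholds used in the second phase; only the `threshold` argument differs.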
In the second phase, SNP and sample QC for the samples genotyped on the HumanOmniExpress BeadChip followed the guidelines of Laurie et al.⁵ and used the PLINK software⁶. The genotyping lab excluded seven samples with a call rate < 98%, and we excluded 6,583 SNPs that did not meet the same 98% call-rate threshold; none of these SNPs were among the study AIMs. An X-chromosome homozygosity test (Supplemental Figure S1) comparing the genetically predicted gender with the gender reported to the CIBMTR identified twelve discordant samples. Further sex-chromosome testing at the genotyping lab confirmed gender concordance for two of the twelve samples, and the remaining ten were excluded from further analysis. To investigate possible sample switches at the lab, we imputed HLA types from the genotyped SNPs and compared the imputed with the reported HLA types for the gender-discordant samples. This identified two sample pairs that were likely switched at the lab.
To quantitatively validate SNP imputation and sample QC for the study SNPs, we re-genotyped 54 pilot-phase samples (27 donor/recipient pairs) on the HumanOmniExpress BeadChip. On the genotyped subset of the study AIMs¹, genotypes from the two phases were concordant at a rate of 98.01% across all 54 QC samples. On the imputed subset, the average concordance between imputed and actual genotypes in the 54 validation samples was 94.1%.
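Both validation figures reduce to counting matching genotype calls. A minimal sketch, assuming hard-called genotypes (imputed dosages or probabilities would first need to be converted to best-guess calls) and skipping SNPs missing in either dataset: `pooled_concordance` divides all matches by all comparisons, while `mean_concordance` averages per-sample rates, as in the “average concordance” reported for the imputed subset. The text does not say which form underlies the 98.01% figure, so both are shown; all names are hypothetical.

```python
import math

# Genotypes are hard calls (e.g. 0, 1, 2); None = missing. Imputed dosages are
# assumed to have been converted to best-guess calls beforehand.


def _match_counts(calls_a, calls_b):
    """Return (matches, comparisons) over SNPs called in both lists."""
    matches = comparisons = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        comparisons += 1
        matches += a == b
    return matches, comparisons


def mean_concordance(samples_a, samples_b):
    """Average of per-sample concordance rates over samples present in both dicts."""
    rates = []
    for sample_id in sorted(samples_a.keys() & samples_b.keys()):
        m, c = _match_counts(samples_a[sample_id], samples_b[sample_id])
        if c:
            rates.append(m / c)
    return sum(rates) / len(rates) if rates else math.nan


def pooled_concordance(samples_a, samples_b):
    """All matches divided by all comparisons, across shared samples."""
    total_m = total_c = 0
    for sample_id in samples_a.keys() & samples_b.keys():
        m, c = _match_counts(samples_a[sample_id], samples_b[sample_id])
        total_m += m
        total_c += c
    return total_m / total_c if total_c else math.nan


# Hypothetical example: phase-1 calls vs. phase-2 calls for two samples.
phase1 = {"a": [0, 1, 2, 0], "b": [2, None]}
phase2 = {"a": [0, 1, 2, 0], "b": [1, 1]}
print(mean_concordance(phase1, phase2))    # 0.5 (a: 4/4, b: 0/1)
print(pooled_concordance(phase1, phase2))  # 0.8 (4 of 5 comparisons)
```

The two forms diverge when samples contribute different numbers of comparable SNPs, as sample “b” does here, so it matters which one a reported rate uses.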
References
1. Paschou P, Drineas P, Lewis J, et al. Tracing sub-structure in the European American population with PCA-informative markers. PLoS Genet. 2008;4(7):e1000114.
2. Gabriel S, Ziaugra L, Tabbaa D. SNP genotyping using the Sequenom MassARRAY iPLEX platform. Curr Protoc Hum Genet. 2009;Chapter 2: Unit 2.12.
3. Auton A, Brooks LD, Durbin RM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68-74.
4. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44(8):955-959.
5. Laurie CC, Doheny KF, Mirel DB, et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet Epidemiol. 2010;34(6):591-602.
6. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-575.