Quality Control of Genotyped Data

To check for gender errors, excess homozygosity, cryptic relatedness, and ethnic outliers within our study population (N=414), and to exclude subjects of non-European descent, we selected a set of best-performing genome-wide SNPs. To that end, only SNPs with a minor allele frequency (MAF) > 0.1, genotyping missingness < 2%, and in Hardy-Weinberg equilibrium (HWE, p > 1 × 10⁻⁵) were kept. These were then pruned for redundancy due to linkage disequilibrium (LD) using a cut-off of r² = 0.2 (i.e. SNPs showing pairwise LD > 0.2 were filtered out), leaving 80 979 SNPs. With these best-performing SNPs the following were ascertained in Plink v1.07 [1]: gender errors (none); cryptic relatedness (pi_hat > 0.2; none); and excess homozygosity, defined as an F-statistic deviating > 3 standard deviations (SDs) from the sample mean (three subjects, whom we excluded from further analyses).

To identify subjects of non-European origin, we compared the genotyped data in the discovery phase with HapMap 3 using principal components analysis and removed six outliers of non-European descent, leaving only individuals clustering within CEU (see graph below, in which white dots represent the current sample, labeled “CSF”, and the six outliers clearly deviate towards the MEX and GIH populations). After additionally removing seven samples with > 5% missing genotype data, 398 individuals remained for further analyses.

We then conducted imputation on this cleaned set of 398 participants as described in the main document (MACH output was converted to Plink format), using 10 332 781 SNPs of 1000 Genomes (1000G) Phase I version 3 as the reference dataset. Of those SNPs, 1 103 560 failed the r² threshold of 0.3, leaving 9 229 221 SNPs to which the final QC standards were applied: < 2% genotyping missingness, HWE p-value > 1 × 10⁻⁶, and MAF > 0.05. For the X-chromosome imputation, a previous version of 1000 Genomes (1000G Interim Phase I) was used. Thus, a total of 5 767 231 SNPs (585 655 genotyped) were used for the association analyses.
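The SNP selection thresholds and the sample and SNP counts reported above can be restated as a minimal sketch. The names `SnpStats`, `passes_selection_qc` and `remaining_samples` are hypothetical and are not part of Plink; the code only mirrors the thresholds and counts given in this section and is not the pipeline that was actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpStats:
    """Per-SNP summary statistics (hypothetical container for illustration)."""
    maf: float           # minor allele frequency
    missing_rate: float  # fraction of samples with a missing genotype
    hwe_p: float         # Hardy-Weinberg equilibrium test p-value


def passes_selection_qc(snp, min_maf=0.1, max_missing=0.02, min_hwe_p=1e-5):
    """Apply the pre-pruning thresholds used to select the best-performing SNPs."""
    return (
        snp.maf > min_maf
        and snp.missing_rate < max_missing
        and snp.hwe_p > min_hwe_p
    )


def remaining_samples(n_start, exclusions):
    """Subtract each (reason, count) exclusion from the starting sample size."""
    n = n_start
    for _reason, count in exclusions:
        n -= count
    return n


# Sample attrition as reported in the text.
sample_exclusions = [
    ("excess homozygosity", 3),
    ("non-European ancestry", 6),
    ("> 5% missing genotypes", 7),
]
n_final = remaining_samples(414, sample_exclusions)

# Imputed SNPs that pass the r^2 threshold of 0.3.
n_imputed_pass_r2 = 10_332_781 - 1_103_560
```

Running the sketch reproduces the 398 individuals and 9 229 221 SNPs stated in the text, which serves as a quick consistency check on the reported numbers.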

mRNA quantification procedures, genome-wide expression profiling and Quality Control

For isolation and purification of mRNA from whole blood, the PAXgene extraction kit (Qiagen) was used. PAXgene tubes were stored at -20 °C and RNA was isolated within 6 months after sample collection according to the manufacturer’s instructions, including an optional DNase digestion step. Total mRNA was quantified using a RiboGreen assay (Invitrogen Quant-iT™ RiboGreen). Quality of total RNA was determined using an Agilent 2100 Bioanalyzer. An RNA integrity number (RIN) threshold of 7 was used for selection of RNA samples. Genome-wide RNA expression profiling was performed with Illumina HumanRef-12 version 3 arrays using Illumina’s standard protocol at the facility of the Southern California Genotyping Consortium (SCGC).

Quality control was performed by assessment of hierarchical clustering, box plots, density distribution plots and pair-wise correlations. A gene filter was applied to select those genes that were expressed in at least one sample, with the detection p-value generated by BeadStudio set to 0.01, reducing the number of probes from 48 803 to 25 361. Our gene expression dataset consisted of three batches based on Illumina serial numbers: 5168638 (11 samples), 5298416 (176 samples), and 5590177 (46 samples). Normalization between batches was performed with ComBat [2].
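The gene filter described above can be illustrated with a minimal sketch. It assumes that "expressed" means a detection p-value strictly below 0.01 in at least one sample; the function name `expressed_probes` and the toy probe IDs and p-values are hypothetical and are not taken from the study data.

```python
def expressed_probes(detection_p, threshold=0.01):
    """Return probe IDs detected (p < threshold) in at least one sample.

    detection_p maps a probe ID to its list of per-sample detection p-values.
    """
    return [
        probe
        for probe, pvals in detection_p.items()
        if any(p < threshold for p in pvals)
    ]


# Toy detection p-values for three probes across three samples.
toy_detection_p = {
    "probe_A": [0.20, 0.004, 0.50],  # detected in one sample: kept
    "probe_B": [0.30, 0.08, 0.90],   # never detected: dropped
    "probe_C": [0.001, 0.002, 0.0],  # detected in all samples: kept
}
kept = expressed_probes(toy_detection_p)

# Batch sizes as reported in the text, keyed by Illumina serial number.
batch_sizes = {"5168638": 11, "5298416": 176, "5590177": 46}
n_expression_samples = sum(batch_sizes.values())
```

The three batches sum to 233 expression samples; batch membership of this kind is the grouping that a batch-correction method such as ComBat takes as input.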

Association Testing of Candidate Polymorphisms

We used a recently developed machine learning method to reconstruct the 5-HTTLPR polymorphism [3]. Prediction of the 5-HTTLPR polymorphism resulted in frequencies of 0.41 and 0.59 for the short (S) and long (L) alleles, respectively. The genotype distribution was in Hardy-Weinberg equilibrium (P > 0.05) with N=70 S/S, N=200 S/L and N=144 L/L.
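The allele frequencies and the Hardy-Weinberg statement above follow directly from the genotype counts, as the following sketch shows. The text does not say which HWE test was used; a Pearson chi-square test with one degree of freedom is assumed here for illustration, and the function names are hypothetical.

```python
import math


def allele_frequencies(n_ss, n_sl, n_ll):
    """Return (freq_S, freq_L) from S/S, S/L and L/L genotype counts."""
    n_alleles = 2 * (n_ss + n_sl + n_ll)
    freq_s = (2 * n_ss + n_sl) / n_alleles
    return freq_s, 1.0 - freq_s


def hwe_chi_square(n_ss, n_sl, n_ll):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions."""
    n = n_ss + n_sl + n_ll
    p, q = allele_frequencies(n_ss, n_sl, n_ll)
    observed = (n_ss, n_sl, n_ll)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Genotype counts reported in the text.
freq_s, freq_l = allele_frequencies(70, 200, 144)
chi2, p_value = hwe_chi_square(70, 200, 144)
```

With the reported counts, the S and L frequencies round to 0.41 and 0.59, and the observed counts lie very close to their Hardy-Weinberg expectations, so the test is far from significant.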

Next, we identified SNPs associated with psychiatric traits using results from GWASs of bipolar disorder (BPD), schizophrenia (SCZ), a SCZ-BPD joint analysis, major depressive disorder (MDD), attention-deficit/hyperactivity disorder (ADHD), and autism, given the hypothesized involvement of the monoamine system in these disorders [4-9]. Independent SNPs (r² < 0.2) with a stage 1 P < 5 × 10⁻⁸ or a combined stage 1 and 2 P < 5 × 10⁻⁸, as well as replications of suggestive signals (P < 5 × 10⁻⁵) at a Bonferroni-corrected α = 0.05, were set apart for quantitative association testing with each of the MMs and their ratios using the same covariates as for the genome-wide analyses. To verify the inclusiveness of our method, we cross-checked that we had included all common variants listed as associated with the major psychiatric disorders in GWASs in a recent review on the genetic architectures of psychiatric disorders [10]. We then added the SNPs listed in Table 3 of that paper for the psychiatric disorders not mentioned above and created the list below (N=51 SNPs).

Candidate polymorphisms in monoamine-related pathway genes were then selected based upon a recent enumeration of all known polymorphisms and mutations in these genes [11] and a literature search (combining each of the following listed enzymes with “gen*”): TH (Tyrosine 3-Hydroxylase) [11, 12], TPH1 (Tryptophan 5-Hydroxylase Isoform 1) [11, 13], TPH2 (Tryptophan 5-Hydroxylase Isoform 2) [11], DDC (AADC, Dopa Decarboxylase) [11], DAT (dopamine transporter) [14], DBH (Dopamine Beta Hydroxylase) [11], PNMT (Phenylethanolamine N-Methyltransferase) [11], MAO-A (Monoamine Oxidase A) [11], COMT (Catechol O-methyltransferase) [11], NET (norepinephrine transporter) [14], BDNF (brain-derived neurotrophic factor) [15], and DTNBP1 (dystrobrevin binding protein-1) [16]. We created a list of these SNPs available in our genotyped and imputed datasets for which rs-numbers are available (N=85 SNPs, see list below).

Of the 135 SNPs selected (51 + 85 - 1, because rs6265 was the only SNP associated with both MM metabolism and a psychiatric trait and therefore appears in both lists), 85 were available in our dataset. A linear, additive model including covariates as described above was fitted for this total of 86 candidate polymorphisms (the 5-HTTLPR and the 85 available SNPs) in Plink v1.07 [1].
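A linear, additive model of the kind described above regresses the quantitative trait on the allele dosage (0, 1 or 2 copies of the tested allele) plus covariates. The sketch below is a plain ordinary least squares fit written for illustration only; it is not Plink's implementation, the function names are hypothetical, and the single covariate (age) and the noise-free toy data are assumptions, since the covariates themselves are defined in the main document.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_additive_model(dosages, covariates, phenotype):
    """OLS fit of phenotype ~ intercept + allele dosage (0/1/2) + covariates.

    Returns the coefficients as [intercept, beta_snp, beta_cov1, ...].
    """
    rows = [
        [1.0, float(g)] + [float(c) for c in cov]
        for g, cov in zip(dosages, covariates)
    ]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Noise-free toy data generated as y = 2.0 + 0.5 * dosage + 0.1 * age.
toy_dosages = [0, 1, 2, 1, 0, 2]
toy_ages = [20, 25, 30, 35, 40, 45]
toy_phenotype = [2.0 + 0.5 * g + 0.1 * a for g, a in zip(toy_dosages, toy_ages)]

intercept, beta_snp, beta_age = fit_additive_model(
    toy_dosages, [[a] for a in toy_ages], toy_phenotype
)
```

The coefficient on the dosage term (`beta_snp`) is the per-allele effect that an additive model reports; on the toy data the fit recovers the generating values.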

Finally, we followed up the dopamine turnover linkage peak detected in a non-human primate [17]. We tested association with HVA and its ratios of SNPs in the 15 Mb vervet locus syntenic to human chromosome 10:5-20 Mb (shown in Figure 2 of that paper [17]). To that end, using the University of California, Santa Cruz (UCSC) genome browser (hg19), the SNPs encompassing this region were selected and the same QC criteria and covariates as mentioned for the genome-wide analysis were applied, resulting in 40 770 SNPs (rs2904802 to rs1926820 on chr10:5-20 Mb).
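Selecting the SNPs that fall within the chromosome 10 interval amounts to a simple coordinate filter, sketched below. The function names, the toy SNP identifiers and their positions are hypothetical; the sketch also assumes inclusive interval bounds, which the text does not specify.

```python
def in_region(chrom, pos, region_chrom="10", start=5_000_000, end=20_000_000):
    """True if a variant lies within the region (bounds assumed inclusive)."""
    return chrom == region_chrom and start <= pos <= end


def select_region_snps(snps, **region):
    """Return IDs of (snp_id, chrom, pos) tuples that fall inside the region."""
    return [
        snp_id for snp_id, chrom, pos in snps if in_region(chrom, pos, **region)
    ]


# Toy variants; IDs and positions are made up for illustration.
toy_snps = [
    ("snp_a", "10", 4_999_999),   # just upstream of the region
    ("snp_b", "10", 5_000_000),   # on the lower bound
    ("snp_c", "10", 12_345_678),  # inside the region
    ("snp_d", "10", 20_000_001),  # just downstream of the region
    ("snp_e", "9", 10_000_000),   # wrong chromosome
]
selected = select_region_snps(toy_snps)

# Length of the interval in megabases.
region_length_mb = (20_000_000 - 5_000_000) / 1_000_000
```

The interval spans 15 Mb, matching the size of the syntenic vervet locus described in the text.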

List of SNPs based on Psychiatric Disorder GWASs and Meta-Analyses (in alphabetical order by psychiatric disorder).

SNP / Gene / Psychiatric Disorder
rs11136000 / CLU / AD
rs11767557 / EPHA1 / AD
rs2075650 / APOE, TOMM40 / AD
rs3764650 / ABCA7 / AD
rs3818361 / CR1 / AD
rs3851179 / PICALM / AD
rs3865444 / CD33 / AD
rs610932 / MS4A cluster / AD
rs744373 / BIN1 / AD
rs9349407 / CD2AP / AD
rs1229984 / ADH1B / ALC
rs671 / ALDH2 / ALC
rs6943555 / AUTS2 / ALC
rs12518194 / intergenic / ASD
rs1896731 / intergenic / ASD
rs4141463 / MACROD2 / ASD
rs4307059 / intergenic / ASD
rs4327572 / intergenic / ASD
rs7704909 / intergenic / ASD
rs1064395 / NCAN / BIP
rs10994397 / ANK3 / BIP
rs12576775 / ODZ4 / BIP
rs4765913 / CACNA1C / BIP
rs9371601 / SYNE2 / BIP
rs7296288 / intergenic / MDD
rs1051730 / CHRNA3 / NIC
rs1329650 / LOC100188947 / NIC
rs3733829 / EGLN2, CYP2A6 / NIC
rs10503253 / CSMD1 / SCZ
rs11191580 / NT5C2 / SCZ
rs11819869 / AMBRA1 / SCZ
rs12807809 / NRGN / SCZ
rs12966547 / CCDC68 / SCZ
rs13211507 / MHC / SCZ
rs1625579 / MIR137 / SCZ
rs16887244 / LSM1 / SCZ
rs17512836 / TCF4 / SCZ
rs17662626 / PCGEM1 / SCZ
rs2021722 / TRIM26 / SCZ
rs2312147 / VRK2 / SCZ
rs548181 / intergenic / SCZ
rs7004633 / intergenic / SCZ
rs7004635 / MMP16 / SCZ
rs7914558 / CNNM2 / SCZ
rs9960767 / TCF4 / SCZ
rs10994359 / ANK3 / SCZ-BIP
rs1344706 / ZNF804A / SCZ-BIP
rs2239547 / ITIH3–ITIH4 / SCZ-BIP
rs4765905 / CACNA1C / SCZ-BIP
rs3025343 / DBH / SMOC
rs6265 / BDNF / SMOI

AD = Alzheimer’s disease; ALC = Alcohol dependence; ASD = Autism spectrum disorder; BIP = Bipolar Disorder; MDD = Major Depressive Disorder; NIC = Nicotine consumption; SCZ = Schizophrenia; SCZ-BIP = SCZ-BIP joint analysis; SMOC = Smoking Cessation; SMOI = Smoking Initiation

SNP list of Monoamine Related Pathway Genes

SNP / Gene
rs1799833 / TH
rs34510659 / TH
rs7950050 / TH
rs6356 / TH
rs28934579 / TH
rs11564716 / TH
rs6357 / TH
rs28934581 / TH
rs28934580 / TH
rs11826260 / TH
rs36097848 / TH
rs3842724 / TH
rs1800033 / TH
rs503964 / TPH1
rs490895 / TPH1
rs34115267 / TPH2
rs17110563 / TPH2
rs7305115 / TPH2
rs2887148 / TPH2
rs2887147 / TPH2
rs4290270 / TPH2
rs7488262 / TPH2
rs11575290 / DDC
rs11575291 / DDC
rs11575292 / DDC
rs11575302 / DDC
rs6262 / DDC
rs6263 / DDC
rs11575376 / DDC
rs11575377 / DDC
rs11575542 / DDC
rs13306306 / DBH
rs3025380 / DBH
rs2797848 / DBH
rs1108580 / DBH
rs5319 / DBH
rs5320 / DBH
rs5321 / DBH
rs5322 / DBH
rs35465867 / DBH
rs5323 / DBH
rs1330630 / intergenic
rs5324 / DBH
rs3025400 / DBH
rs5325 / DBH
rs4531 / DBH
rs13306303 / DBH
rs77905 / DBH
rs3025421 / DBH
rs6271 / DBH
rs5639 / PNMT
rs5640 / PNMT
rs5641 / PNMT
rs5642 / PNMT
rs5643 / PNMT
rs1800464 / MAOA
rs1799835 / MAOA
rs1800465 / MAOA
rs1803987 / intergenic
rs7065428 / MAOA
rs1803986 / MAOA
rs1800466 / MAOA
rs11544670 / COMT
rs6270 / COMT
rs4633 / COMT
rs6267 / COMT
rs740602 / COMT
rs13306281 / COMT
rs5031015 / COMT
rs11544669 / COMT
rs769223 / COMT
rs4986871 / COMT
rs4818 / COMT
rs8192488 / COMT
rs4680 / COMT
rs13306279 / COMT
rs769224 / COMT
rs165631 / COMT
rs2619538 / DTNBP1
rs3918342 / intergenic
rs1421292 / intergenic
rs4537731 / intergenic
rs11030101 / BDNF
rs16917204 / BDNF-AS1
rs6265 / BDNF

1. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81(3): 559-575.

2. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007; 8(1): 118-127.

3. Lu AT, Bakker S, Janson E, Cichon S, Cantor RM, Ophoff RA. Prediction of serotonin transporter promoter polymorphism genotypes from single nucleotide polymorphism arrays using machine learning methods. Psychiatr Genet 2012; 22(4): 182-188.

4. Neale BM, Medland SE, Ripke S, Asherson P, Franke B, Lesch KP et al. Meta-analysis of genome-wide association studies of attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry 2010; 49(9): 884-897.

5. Sklar P, Ripke S, Scott LJ, Andreassen OA, Cichon S, Craddock N et al. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet 2011; 43(10): 977-983.

6. Ripke S, Sanders AR, Kendler KS, Levinson DF, Sklar P, Holmans PA et al. Genome-wide association study identifies five new schizophrenia loci. Nat Genet 2011; 43(10): 969-976.

7. Wray NR, Pergadia ML, Blackwood DH, Penninx BW, Gordon SD, Nyholt DR et al. Genome-wide association study of major depressive disorder: new results, meta-analysis, and lessons learned. Mol Psychiatry 2012; 17(1): 36-48.

8. Wang K, Zhang H, Ma D, Bucan M, Glessner JT, Abrahams BS et al. Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature 2009; 459(7246): 528-533.

9. Anney R, Klei L, Pinto D, Regan R, Conroy J, Magalhaes TR et al. A genome-wide scan for common alleles affecting risk for autism. Hum Mol Genet 2010; 19(20): 4072-4082.

10. Sullivan PF, Daly MJ, O'Donovan M. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat Rev Genet 2012; 13(8): 537-551.

11. Haavik J, Blau N, Thony B. Mutations in human monoamine-related neurotransmitter pathway genes. Hum Mutat 2008; 29(7): 891-902.

12. Jonsson E, Sedvall G, Brene S, Gustavsson JP, Geijer T, Terenius L et al. Dopamine-related genes and their relationships to monoamine metabolites in CSF. Biol Psychiatry 1996; 40(10): 1032-1043.

13. Andreou D, Saetre P, Werge T, Andreassen OA, Agartz I, Sedvall GC et al. Tryptophan hydroxylase gene 1 (TPH1) variants associated with cerebrospinal fluid 5-hydroxyindole acetic acid and homovanillic acid concentrations in healthy volunteers. Psychiatry Res 2010; 180(2-3): 63-67.

14. Jonsson EG, Nothen MM, Gustavsson JP, Neidt H, Bunzel R, Propping P et al. Polymorphisms in the dopamine, serotonin, and norepinephrine transporter genes and their relationships to monoamine metabolite concentrations in CSF of healthy volunteers. Psychiatry Res 1998; 79(1): 1-9.

15. Jonsson EG, Saetre P, Edman-Ahlbom B, Sillen A, Gunnar A, Andreou D et al. Brain-derived neurotrophic factor gene variation influences cerebrospinal fluid 3-methoxy-4-hydroxyphenylglycol concentrations in healthy volunteers. J Neural Transm 2008; 115(12): 1695-1699.

16. Andreou D, Saetre P, Kahler AK, Werge T, Andreassen OA, Agartz I et al. Dystrobrevin-binding protein 1 gene (DTNBP1) variants associated with cerebrospinal fluid homovanillic acid and 5-hydroxyindoleacetic acid concentrations in healthy volunteers. Eur Neuropsychopharmacol 2011; 21(9): 700-704.

17. Freimer NB, Service SK, Ophoff RA, Jasinska AJ, McKee K, Villeneuve A et al. A quantitative trait locus for variation in dopamine metabolism mapped in a primate model using reference sequences from related species. Proc Natl Acad Sci U S A 2007; 104(40): 15811-15816.