Supplemental Material

Supplemental Methods:

Wave / SCZ / BD / Controls / Total / Array
1 / 203 / 0 / 198 / 401 / Affymetrix 5.0
2 / 401 / 0 / 244 / 645 / Affymetrix 6.0
3 / 483 / 10 / 895 / 1388 / Affymetrix 6.0
4 / 1024 / 826 / 1198 / 3048 / Affymetrix 6.0
Total / 2111 / 836 / 2535 / 5482

Supplementary Table 1. Subject numbers and genotyping array by collection wave.

Pathway analysis methods

Five gene set analyses were conducted:
1) the top PGC-SCZ associated regions (p < 1 × 10⁻⁵)¹ in the new and full SCZ samples;
2) synaptic genes² in BD, SCZ (new and full samples), and SCZ-BD combined;
3) MIR137 targets, implicated in the recent PGC-SCZ report¹ and defined in the same manner, in the new and full SCZ samples;
4) calcium channel genes (CACNA1A, CACNA1B, CACNA1C, CACNA1E, CACNA1F, CACNA1G, CACNA1H, CACNA1I, CACNA1S, CACNA2D1, CACNA2D2, CACNA2D3, CACNA2D4, CACNB1, CACNB2, CACNB4, CACNG1, CACNG3, CACNG4, CACNG5, CACNG6, CACNG7, CACNG8) in the BD and combined SCZ-BD groups; and
5) all Gene Ontology (GO) categories³ in the combined SCZ-BD group.

The PLINK “clump” function was used to define regions of association at four p-value thresholds (0.005, 0.001, 0.0005, 0.0001) for each set of results tested for PGC-SCZ and synaptic gene enrichment; results are reported for the p = 0.005 threshold. Pathway analyses of the calcium channel genes and GO categories used the top 200 associated regions of the relevant results files. Regions falling within the MHC (chr6: 25-35 Mb) were removed from all pathway analyses. Genomic regions encompassing the top results, together with surrounding SNPs in linkage disequilibrium, were tested for overlap with gene pathways using INRICH.⁴ Significance was assessed with 10,000 permutations, in which randomly selected genomic regions matched for gene and SNP density were tested for pathway enrichment in the same way.
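The permutation logic can be sketched in a few lines. This is a simplified illustration, not INRICH itself: here the null regions keep the lengths of the observed regions but are placed uniformly at random, whereas INRICH matches them for gene and SNP density. The test statistic (number of regions overlapping at least one gene in the set), the MHC overlap rule, and all function names are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

# (chromosome, start, end) in base pairs, inclusive.
Interval = Tuple[str, int, int]

# MHC window excluded from the pathway analyses (chr6: 25-35 Mb).
MHC: Interval = ("6", 25_000_000, 35_000_000)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def drop_mhc(regions: List[Interval]) -> List[Interval]:
    """Remove association regions that overlap the MHC window (assumed rule)."""
    return [r for r in regions if not overlaps(r, MHC)]


def count_hits(regions: List[Interval], gene_set: List[Interval]) -> int:
    """Number of regions overlapping at least one gene in the set."""
    return sum(any(overlaps(r, g) for g in gene_set) for r in regions)


def random_region(rng: random.Random, length: int,
                  chrom_sizes: Dict[str, int]) -> Interval:
    """Place an interval of the given length uniformly on the genome.

    Simplification: no matching for gene or SNP density."""
    chroms = list(chrom_sizes)
    chrom = rng.choices(chroms, weights=[chrom_sizes[c] for c in chroms])[0]
    start = rng.randint(1, chrom_sizes[chrom] - length + 1)
    return (chrom, start, start + length - 1)


def permutation_enrichment(regions: List[Interval], gene_set: List[Interval],
                           chrom_sizes: Dict[str, int], n_perm: int = 10_000,
                           seed: int = 1) -> Tuple[int, float]:
    """Observed hit count and add-one-corrected empirical p-value."""
    regions = drop_mhc(regions)
    observed = count_hits(regions, gene_set)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        null = [random_region(rng, end - start + 1, chrom_sizes)
                for _, start, end in regions]
        if count_hits(null, gene_set) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With the add-one correction, the smallest p-value attainable from 10,000 permutations is 1/10,001, roughly 1 × 10⁻⁴.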

To test for MIR137 target enrichment, the set-screen test⁵ was performed and the resulting gene-wise p-values were transformed to z-scores. The mean z-score of the MIR137 target set was compared with the mean z-scores of 100 randomly selected gene sets of the same size.
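The comparison against random gene sets can be sketched as follows, starting from gene-wise p-values that are assumed to have already been produced by the set-screen test (which is not reimplemented here). The one-sided p-to-z transform, drawing random sets only from non-target genes, the add-one empirical p-value, and the function names are all illustrative assumptions.

```python
import random
import statistics
from typing import Dict, List, Tuple

_NORMAL = statistics.NormalDist()


def p_to_z(p: float, eps: float = 1e-15) -> float:
    """One-sided transform: smaller p-values give larger positive z."""
    p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf away from 0 and 1
    return _NORMAL.inv_cdf(1.0 - p)


def set_mean_z(gene_p: Dict[str, float], genes: List[str]) -> float:
    return statistics.fmean([p_to_z(gene_p[g]) for g in genes])


def random_set_comparison(gene_p: Dict[str, float], target: List[str],
                          n_random: int = 100, seed: int = 1) -> Tuple[float, float]:
    """Mean z of the target set and its empirical p versus random same-size sets."""
    observed = set_mean_z(gene_p, target)
    target_set = set(target)
    background = [g for g in gene_p if g not in target_set]
    rng = random.Random(seed)
    null = [set_mean_z(gene_p, rng.sample(background, len(target)))
            for _ in range(n_random)]
    exceed = sum(m >= observed for m in null)
    return observed, (exceed + 1) / (n_random + 1)
```

With only 100 random sets, the smallest empirical p-value this procedure can return is 1/101.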

Supplementary References

1. Ripke S, Sanders AR, Kendler KS, Levinson DF, Sklar P, Holmans PA et al. Genome-wide association study identifies five new schizophrenia loci. Nat Genet 2011; 43(10): 969-976.

2. Lips ES, Cornelisse LN, Toonen RF, Min JL, Hultman CM, Holmans PA et al. Functional gene group analysis identifies synaptic gene groups as risk factor for schizophrenia. Mol Psychiatry 2011.

3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25(1): 25-29.

4. Lee PH, O'Dushlaine C, Purcell SM. InRich: interval-based enrichment analysis for genome-wide association studies.

5. Moskvina V, O'Dushlaine C, Purcell S, Craddock N, Holmans P, O'Donovan MC. Evaluation of an approximation method for assessment of overall significance of multiple-dependent tests in a genomewide association study. Genet Epidemiol 2011; 35(8): 861-866.

Polygenic scoring methods

To generate the discovery SNP set from the PGC-SCZ data, we kept only SNPs with MAF > 0.02 and very high imputation quality (INFO > 0.9), and removed SNPs in linkage disequilibrium (r² > 0.1) with other retained SNPs. We also excluded SNPs in the MHC region (chr6: 25-35 Mb) because of its complex LD structure and broad association signal. Association tests were computed for each SNP (correcting for ancestry-based covariates), and the logistic regression beta coefficients were used as weights.

The testing dataset consisted of 2,111 SCZ cases or 836 BD cases, together with 2,535 controls. Quantitative scores were computed for each subject at a series of p-value thresholds (pT), each score using only the SNPs with p-values < pT in the training (discovery) dataset; pT was varied from 0.01 to 1.0. For each SNP set defined by pT, a subject's score was the sum, across all selected SNPs, of the subject's dosage of the test allele multiplied by the training-dataset beta for that allele. For each SNP set, we evaluated the significance of the case-control score difference using logistic regression (including principal component covariates), and estimated the proportion of variance explained (R²) as the Nagelkerke R² of the model with polygenic score plus covariates minus the Nagelkerke R² of the model with ancestry covariates alone.
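The scoring rule and the incremental R² can be written out directly. This is a minimal sketch, assuming dosages and training statistics are held in dictionaries keyed by SNP ID, and that the log-likelihoods come from logistic regressions fitted elsewhere (for example the `llf` and `llnull` attributes of a fitted statsmodels `Logit` model). Both Nagelkerke R² values are taken against the same intercept-only model; the function names are hypothetical.

```python
import math
from typing import Dict


def polygenic_score(dosages: Dict[str, float], betas: Dict[str, float],
                    train_p: Dict[str, float], p_threshold: float) -> float:
    """Sum of test-allele dosage x training beta over SNPs with training p < pT."""
    return sum(dosages[snp] * betas[snp]
               for snp, p in train_p.items()
               if p < p_threshold and snp in dosages)


def nagelkerke_r2(ll_model: float, ll_null: float, n: int) -> float:
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def score_r2(ll_score_plus_cov: float, ll_cov_only: float,
             ll_null: float, n: int) -> float:
    """Variance explained by the score beyond the ancestry covariates."""
    return (nagelkerke_r2(ll_score_plus_cov, ll_null, n)
            - nagelkerke_r2(ll_cov_only, ll_null, n))


# Discovery thresholds listed in Supplementary Table 3.
P_THRESHOLDS = [0.01, 0.05] + [round(0.1 * k, 1) for k in range(1, 11)]
```

Scoring every subject at every threshold is then a double loop over `P_THRESHOLDS` and subjects, followed by one pair of logistic fits (with and without the score) per threshold.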

Supplemental Results:

Supplementary Figure 1. QQ-plot for four main analyses using directly genotyped SNPs.

SCZ, full sample (black): λ = 1.04 (λ1000 = 1.019); SCZ and BD combined (green): λ = 1.04 (λ1000 = 1.022); SCZ, new subjects (blue): λ = 1.04 (λ1000 = 1.017); BD (purple): λ = 1.05 (λ1000 = 1.042).
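The inflation factors above follow the usual genomic control definitions: λ is the median 1-df association χ² divided by its null median (about 0.455), and λ1000 rescales λ to an equivalent study of 1,000 cases and 1,000 controls. The sketch assumes that standard rescaling formula was used; the function names are illustrative.

```python
import statistics
from typing import Sequence

# Median of a 1-df chi-square: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats: Sequence[float]) -> float:
    """Observed median 1-df chi-square over its null expectation."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale
```

For example, plugging the rounded λ = 1.04 and the full SCZ sample sizes from Supplementary Table 1 (2,111 cases, 2,535 controls) into `lambda_1000` gives about 1.017; reproducing the caption's values exactly would require the unrounded λ and the exact sample size of each analysis.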

Supplementary Figures 2, A-D. Manhattan plots for the main association analyses. Standard -log10(p-value) plots for each of the four main analyses using HapMap3-imputed markers: A) SCZ new samples; B) BD; C) SCZ full sample; D) SCZ and BD combined. Markers with p < 5 × 10⁻⁵ (above the dashed line) are enlarged for emphasis.

Supplementary Figure 3A. Region plot of the MHC region for the SCZ full sample association results. Only directly genotyped SNPs from 25-35 Mb are shown. The purple diamond marks the most strongly associated SNP, rs886424 (p = 4.54 × 10⁻⁸; OR = 0.68).

Supplementary Figure 3B. Region plot of the MHC region (chr6: 25-35 Mb) for the SCZ full sample results conditioned on the top SNP, rs886424, in directly genotyped data. The most significant SNP after conditioning, rs9500976, has p = 3.5 × 10⁻⁴.

Supplementary Figure 4. Region plot for MHC-specific imputation results. The genomic window imputed (and displayed here) was 29.3-33.9 Mb on chromosome 6.

Heterogeneity in full sample SCZ results

Meta-analysis using PLINK yields two indices of heterogeneity: I² and a Q statistic p-value. I² ranges from 0 to 100 and indexes the extent of heterogeneity, whereas the Q statistic p-value tests for the presence or absence of heterogeneity.¹ Meta-analysis of results across the four collection waves yielded somewhat high I² values (relative to the genome-wide distribution of these measures) and significant Q statistic p-values for several of the top SNPs in the full sample SCZ analyses (below). Effect sizes by collection wave are also shown.
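Under the standard fixed-effect, inverse-variance definitions, Cochran's Q is the weighted sum of squared deviations of the per-wave betas from the pooled beta, compared with a χ² distribution on k - 1 degrees of freedom, and I² = 100 × (Q - df)/Q, truncated at 0. The sketch below assumes these are the definitions behind PLINK's output; the function names are illustrative. As a consistency check, with four waves (df = 3) an I² of 67.9 implies Q = 3/(1 - 0.679) ≈ 9.35, whose χ² p-value is approximately 0.025, matching the value reported below for rs886424.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus half-integer Poisson-like terms.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def cochran_q_i2(betas: Sequence[float],
                 ses: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q, Q p-value, I^2 in percent) for per-wave effects (needs >= 2 waves)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = 100.0 * (q - df) / q if q > df else 0.0
    return q, chi2_sf(q, df), i2
```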

We investigated the sources of this heterogeneity in terms of the potentially relevant variables available to us: collection wave, county of origin, sex, and ancestry. The small number of subjects in the first collection wave naturally results in greater variability in effect sizes (and larger standard errors), but it is actually the third collection wave that shows some evidence of a distinct (smaller) effect size. However, a sign test indicates that the direction of effect is consistent across the top results. Direct tests of heterogeneity across counties of origin showed no evidence of heterogeneity, and the top SNPs directly genotyped in all subjects did not differ significantly by sex. Additionally, although most subjects were of Swedish ancestry, a minority (7.5-11.1% across waves; N = 485) were genetically identified as having Finnish ancestry. Excluding these subjects did not substantively change the heterogeneity values for the index SNP (rs886424: I² = 67.9, Q p-value = 0.025) or for other SNPs in the MHC region, indicating that this population difference is not the root cause of the heterogeneity.
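A sign test of direction consistency reduces to an exact binomial tail probability. The supplement does not state which SNPs or comparisons entered the test, so the inputs here are placeholders and the helper names are hypothetical. The one-sided form asks how likely at least the observed number of concordant directions would be if each direction were a fair coin flip.

```python
from math import comb
from typing import Sequence


def concordant_count(wave_betas: Sequence[float],
                     overall_betas: Sequence[float]) -> int:
    """SNPs whose wave-specific effect has the same sign as the overall effect.

    A beta of exactly 0 is treated as negative."""
    return sum((w > 0) == (o > 0) for w, o in zip(wave_betas, overall_betas))


def sign_test_p(n_concordant: int, n_total: int) -> float:
    """One-sided exact sign test: P(X >= n_concordant), X ~ Binomial(n_total, 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total
```

For instance, if all 10 of 10 compared effects pointed the same way, `sign_test_p(10, 10)` would be 1/1024.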

In summary, the heterogeneity of the MHC regional association was not explained by county of origin, ancestry, or sex, and we observed only a modest effect of collection wave. Notably, heterogeneity was also observed and reported in the original International Schizophrenia Consortium (2009) report of the MHC association.

1. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J. Assessing the heterogeneity in meta-analysis: Q statistic or I² index? Psychological Methods 2006; 11(2): 193-206.

Supplementary Table 2. GO pathways with nominally significant enrichment (p<.05) in the SCZ-BD combined results. Target size = number of genes in the GO pathway. Number = number of these genes represented in the top 200 results.

Discovery p-value threshold (pT) / SCZ R² / SCZ target p-value / BD R² / BD target p-value
0.01 / 0.0337 / < 2 × 10⁻¹⁶ / 0.0089 / 1.78 × 10⁻⁵
0.05 / 0.0345 / < 2 × 10⁻¹⁶ / 0.0076 / 7.12 × 10⁻⁵
0.1 / 0.0358 / < 2 × 10⁻¹⁶ / 0.0107 / 2.29 × 10⁻⁶
0.2 / 0.0343 / < 2 × 10⁻¹⁶ / 0.0120 / 5.78 × 10⁻⁷
0.3 / 0.0349 / < 2 × 10⁻¹⁶ / 0.0140 / 6.37 × 10⁻⁶
0.4 / 0.0317 / < 2 × 10⁻¹⁶ / 0.0072 / 1.98 × 10⁻⁴
0.5 / 0.0288 / < 2 × 10⁻¹⁶ / 0.0067 / 5.92 × 10⁻⁴
0.6 / 0.0272 / < 2 × 10⁻¹⁶ / 0.0069 / 4.88 × 10⁻⁴
0.7 / 0.0273 / < 2 × 10⁻¹⁶ / 0.0076 / 2.69 × 10⁻⁴
0.8 / 0.0276 / < 2 × 10⁻¹⁶ / 0.0078 / 2.24 × 10⁻⁴
0.9 / 0.0278 / < 2 × 10⁻¹⁶ / 0.0078 / 2.23 × 10⁻⁴
1.0 / 0.0276 / < 2 × 10⁻¹⁶ / 0.0075 / 2.73 × 10⁻⁴

Supplementary Table 3. Variance explained for the Swedish SCZ and BD samples using the PGC-SCZ discovery sample.

Supplementary Figure 5. Deletions and duplications in the new samples across all groups at 16p11.2. Positions are given in hg19 coordinates, and locations of segmental duplications are shown.

Supplementary Figure 6. Four deletions in SCZ subjects and an atypical, partial deletion in one control subject at 22q11.21 in addition to three duplications in controls. The hg19 coordinates are shown above the CNVs, and segmental duplications are displayed below.

Supplementary Figure 7. Fourteen duplications in BD, 17 in SCZ, and 11 in controls at 9q34.3. The hg19 coordinates are displayed; no segmental duplications are observed in this region.

CNV region / SCZ deletions / BD deletions / control deletions / SCZ duplications / BD duplications / control duplications
1q21.1 / 4 / 2 / 1 / 1 / 1 / 1
2p21-16.3 / 3 / 0 / 2 / 1 / 0 / 0
3q29 / 2 / 0 / 0 / 1 / 0 / 1
7q36.3 / 0 / 0 / 0 / 1 / 0 / 1
15q11.2 / 0 / 0 / 0 / 2 / 0 / 1
15q13.2 / 2 / 1 / 1 / 8 / 2 / 5
16p11.2 / 1 / 0 / 0 / 9 / 1 / 1
16p13.11 / 0 / 0 / 1 / 7 / 3 / 6
17q25.1 / 0 / 0 / 0 / 8 / 4 / 8
21q11.2 / 6 / 1 / 4 / 2 / 3 / 4
22q11 / 4 / 0 / 1 / 0 / 0 / 3

Supplementary Table 4. Deletions and duplications > 100 kb in previously associated CNV regions in the new subjects, by group. Subject numbers by group: SCZ = 1,505, BD = 834, control = 2,087.

Supplementary Table 5. Copy number polymorphism (CNP) deletions in the 1000 Genomes CEU subjects tagged (r² > 0.5) by SNPs with p < 5 × 10⁻⁴ in the full SCZ association analysis. When multiple SNPs tagged a CNP, the SNP with the greatest r² and lowest p-value was retained. Positions are given in hg18 coordinates.
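The tag-SNP selection rule can be made explicit. Because the caption does not say how conflicts between "greatest r²" and "lowest p-value" were resolved, this sketch assumes r² was compared first and the p-value only broke ties; the record type and function name are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class TagHit(NamedTuple):
    cnp: str   # CNP identifier
    snp: str   # tagging SNP
    r2: float  # LD with the CNP in the 1000 Genomes CEU subjects
    p: float   # p-value in the full SCZ association analysis


def best_tag_per_cnp(hits: Iterable[TagHit], r2_min: float = 0.5,
                     p_max: float = 5e-4) -> Dict[str, TagHit]:
    best: Dict[str, TagHit] = {}
    for h in hits:
        if h.r2 <= r2_min or h.p >= p_max:
            continue  # fails the tagging or association filter
        current = best.get(h.cnp)
        # Higher r2 wins; lower p-value breaks ties (assumed ordering).
        if current is None or (-h.r2, h.p) < (-current.r2, current.p):
            best[h.cnp] = h
    return best
```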
