Supplemental Methods:

Supplemental Methods I: Genotyping

A 79-SNP marker panel was genotyped for each sample immediately upon sample receipt at CIDR. The panel includes 63 high-performing autosomal markers that are widely dispersed across the genome and have high minor allele frequencies in all HapMap populations, along with 10 X chromosome and 6 Y chromosome markers. Genotypes were generated using Illumina GoldenGate chemistry. These data were used to verify gender, confirm expected duplicates, identify unexpected duplicates, predict sample performance on the GWAS array, and provide an internal genotype ‘barcode’ for each sample. This genotyping barcode was compared against the final GWAS dataset for each sample (62 of the 79 SNPs are also assayed on the 370Duo array) to confirm correct association between the final phenotype and genotype datasets. All mitochondrial and Y chromosome SNPs and a subset of X chromosome SNPs were manually reviewed and reclustered as needed.
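The barcode comparison described above can be illustrated with a minimal sketch. The function name, the dictionary layout, and the two-letter genotype strings are assumptions made for illustration; they are not taken from the CIDR pipeline.

```python
def barcode_concordance(panel_calls, array_calls):
    """Fraction of shared, non-missing SNPs whose genotypes agree.

    panel_calls, array_calls: dict mapping SNP id -> genotype string such as
    "AG", or None when the genotype is missing (hypothetical data layout).
    Returns None when no SNP is called on both platforms.
    """
    shared = [
        snp for snp in panel_calls
        if panel_calls[snp] is not None and array_calls.get(snp) is not None
    ]
    if not shared:
        return None
    # Compare unordered allele pairs so that "AG" and "GA" count as a match.
    matches = sum(
        1 for snp in shared
        if sorted(panel_calls[snp]) == sorted(array_calls[snp])
    )
    return matches / len(shared)


# Hypothetical example: three SNPs called on both platforms, two of them agree.
panel = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": None}
array = {"rs1": "GA", "rs2": "CC", "rs3": "CT"}
concordance = barcode_concordance(panel, array)
```

A low concordance between a sample's barcode and its array genotypes would suggest a sample mix-up; the cutoff used to flag such samples is not stated in the text and is left to the reader.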

CEPH Utah and Yoruba HapMap samples were placed in unique positions on each DNA plate, 1 per set of 3 columns processed together in the laboratory. Forty-five blind duplicate samples were provided by the study investigators. Each genotyped plate contained a mixture of cases and controls, and there was no evidence for plate-specific genotype effects.

Of the attempted SNPs, 2,530 (0.73%) did not have genotypes released by CIDR to the investigators due to technical failures (defined by the genotyping lab as call rate less than 85% or more than 1 HapMap sample replicate error). Four LCL samples were identified as XO mosaics, which is a relatively common artifact due to the cell immortalization process. Autosomal genotypes appeared unaffected; therefore, all genotypes except for those on the X chromosome were released for these four samples.
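The technical-failure rule stated above (call rate below 85%, or more than one HapMap replicate error) can be written as a simple predicate. This is a sketch only; the function name, argument names, and the example records are hypothetical.

```python
def snp_released(call_rate, hapmap_replicate_errors,
                 min_call_rate=0.85, max_replicate_errors=1):
    """Return True if a SNP passes the lab's technical-failure criteria.

    A SNP fails when its call rate is below min_call_rate or when it shows
    more than max_replicate_errors HapMap sample replicate errors.
    """
    return (call_rate >= min_call_rate
            and hapmap_replicate_errors <= max_replicate_errors)


# Hypothetical records: (SNP id, call rate, HapMap replicate errors).
records = [
    ("snpA", 0.999, 0),  # released
    ("snpB", 0.840, 0),  # withheld: call rate below 85%
    ("snpC", 0.990, 2),  # withheld: more than one replicate error
    ("snpD", 0.850, 1),  # released: both values sit exactly at the limits
]
released = [snp for snp, rate, errors in records if snp_released(rate, errors)]
```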

Supplemental Methods II: Level Two Quality Assessment by the Investigators

The genotypic data from the cases and controls were used to detect cryptically related individuals. The proportion of alleles shared identical by descent (IBD) for all pairs of individuals was estimated using PLINK (Purcell et al. 2007). Six pairs of siblings were identified between the two PD studies (PROGENI and GenePD), and the individual with the lower call rate in each pair was removed from further analyses. Ten controls identified as being siblings or the offspring of another control were similarly dropped from further analyses. One control was removed after being identified as the parent of a PD case. Whole genome amplified (WGA) DNA (n=28 samples) had lower call rates, particularly near the telomeres, and for a subset of the SNPs the minor allele frequency estimates from WGA DNA differed significantly from those obtained from other sources of DNA (p < 1 x 10^-7 for 65 markers). Therefore, all WGA samples were removed from the dataset for analysis in the current study.
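The rule of dropping the lower-call-rate member of each related pair can be sketched as follows. The input is assumed to be a list of pairs with an estimated IBD proportion, such as the PI_HAT value PLINK reports for pairwise IBD estimation. The 0.4 threshold, the sample identifiers, and the call rates are hypothetical; the text does not state the cutoff that was used.

```python
def select_samples_to_drop(pairs, call_rate, min_pi_hat=0.4):
    """Pick one sample to remove from each closely related pair.

    pairs: iterable of (id_a, id_b, pi_hat) tuples.
    call_rate: dict mapping sample id -> genotype call rate.
    min_pi_hat: hypothetical relatedness cutoff (an assumption, not from the text).
    """
    dropped = set()
    for id_a, id_b, pi_hat in pairs:
        if pi_hat < min_pi_hat:
            continue  # not closely related under the assumed cutoff
        if id_a in dropped or id_b in dropped:
            continue  # this pair is already broken by an earlier removal
        # Remove the member with the lower call rate; ties remove id_b.
        dropped.add(id_a if call_rate[id_a] < call_rate[id_b] else id_b)
    return dropped


# Hypothetical example with two related pairs and one unrelated pair.
pairs = [("P1", "G1", 0.49), ("P2", "G2", 0.05), ("P3", "G3", 0.52)]
rates = {"P1": 0.998, "G1": 0.991, "P2": 0.990,
         "G2": 0.980, "P3": 0.985, "G3": 0.999}
to_drop = select_samples_to_drop(pairs, rates)
```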

The genotypic data from the cases and controls were then reviewed using clustering algorithms to assess the potential for population stratification. The proportion of alleles shared identical by state (IBS) for all pairs of individuals was computed using PLINK. The multidimensional scaling (MDS) algorithm implemented in PLINK was then performed using data from four HapMap populations (CEPH Caucasian, Yoruba, Han Chinese, Japanese) to identify clusters using the first four dimensions. Three individuals were identified who clustered substantially closer to the HapMap Yoruba samples than the CEPH Caucasian samples and were therefore thought to be African American rather than Caucasian (n=3 controls). An additional 21 individuals (n=7 cases; n=14 controls) did not cluster tightly with the HapMap CEPH Caucasian samples and demonstrated admixture with either Asian or African populations by clustering toward the HapMap Asian or Yoruba samples. When the MDS algorithm was repeated after removing these individuals, none of the first 10 MDS components were significantly associated with disease status in the final sample.
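As an illustration of the pairwise IBS measure that feeds the clustering step, the sketch below computes the proportion of alleles shared identical by state between two individuals. Genotypes are assumed to be coded as minor-allele counts (0, 1, 2) with None for missing calls; this coding and the function name are assumptions, and the sketch does not reproduce PLINK's MDS step itself.

```python
def ibs_proportion(genotypes_a, genotypes_b):
    """Proportion of alleles shared identical by state across SNPs.

    At each SNP where both individuals are called, the number of shared
    alleles is 2 - |a - b| (2 for matching genotypes, 1 when the counts
    differ by one, 0 for opposite homozygotes). Returns None when no SNP
    is called in both individuals.
    """
    shared_alleles = 0
    compared_snps = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue  # skip SNPs with a missing call in either individual
        shared_alleles += 2 - abs(a - b)
        compared_snps += 1
    if compared_snps == 0:
        return None
    return shared_alleles / (2 * compared_snps)


# Hypothetical genotype vectors for two individuals.
ibs = ibs_proportion([0, 1, 2, None, 2], [0, 2, 0, 1, 2])
```

One minus this proportion can serve as a pairwise distance, and a matrix of such distances is the kind of input a multidimensional scaling routine works from.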

When a genotype for a duplicated sample was called for only one of the two samples, that genotype was included in the final dataset. In instances when the genotypes were available for both samples and differed, the genotypic data for that SNP was coded as missing.
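The duplicate-resolution rule above maps directly onto a small function. The sketch assumes each genotype is represented as a string, with None for a missing call; the function name is hypothetical.

```python
def merge_duplicate_calls(call_a, call_b):
    """Combine the genotype calls from two duplicate samples at one SNP.

    - Called in only one sample: keep that call.
    - Called in both and identical: keep the call.
    - Called in both and different: code the genotype as missing (None).
    """
    if call_a is None:
        return call_b
    if call_b is None:
        return call_a
    return call_a if call_a == call_b else None


# Hypothetical per-SNP calls for a duplicated sample.
first = ["AA", None, "AG", None]
second = ["AA", "CT", "GG", None]
merged = [merge_duplicate_calls(a, b) for a, b in zip(first, second)]
```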

The Fisher exact test was employed to assess differential rates of missing genotypes by PD classification (case or control) or gender (male or female). SNPs with significantly different rates of missing genotypes (p<0.0001) were removed from further analysis (case/control n=75; gender n=271). SNPs demonstrating significant deviation from Hardy-Weinberg equilibrium (p<0.000001) in the control sample were removed from further analysis (n=906). SNPs on the X chromosome outside of the pseudo-autosomal region for which more than 1 male was called as heterozygous were removed from analysis (n=794), as were autosomal SNPs that had significantly different frequencies of heterozygotes (absolute difference > 0.10) between the sexes (n=49). SNPs with more than 2 Mendelian errors in the HapMap trio samples were removed from analyses (n=345).
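The differential-missingness test can be sketched as a two-sided Fisher exact test on a 2x2 table of missing versus called genotypes in two groups. The implementation below sums the probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided p-value; whether the authors' software used this exact definition is an assumption. The counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    For differential missingness: a, b = missing and called genotypes in
    group 1 (e.g. cases); c, d = missing and called genotypes in group 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x missing genotypes in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical SNP: 30 of 850 case genotypes missing, 5 of 850 control genotypes.
p_missing = fisher_exact_two_sided(30, 820, 5, 845)
remove_snp = p_missing < 0.0001  # threshold stated in the text
```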

Supplemental Methods III: Broader Inclusion Analysis

A broader set of individuals, encompassing 902 cases (PROGENI, n=491; GenePD, n=411) and 881 controls, was also analyzed (see Supplemental Table 1). These additional samples included 40 cases and 14 controls of Hispanic or Asian descent and 19 cases from whole genome amplified samples. African Americans have a substantially lower rate of disease and therefore were not included in the expanded data set. To account for potential population stratification introduced by including Asian and Hispanic samples, the first two principal components estimated by MDS analysis were incorporated into the logistic regression. Results were largely similar to those obtained in the primary sample (see Supplemental Tables 2A and 2B).
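The covariate adjustment described above can be sketched with a small logistic regression fitted by Newton's method, using minor-allele dosage plus two MDS components as predictors. This is a self-contained illustration under assumed inputs, not the authors' analysis code; all function names and data values are hypothetical, and any further covariates used in the actual models are not represented.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_logistic(predictors, outcomes, iterations=25):
    """Fit logistic regression by Newton's method; returns [intercept, betas...]."""
    rows = [[1.0] + list(r) for r in predictors]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        gradient = [0.0] * k
        hessian = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, outcomes):
            eta = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j in range(k):
                gradient[j] += (y - p) * row[j]
                for m in range(k):
                    hessian[j][m] += p * (1.0 - p) * row[j] * row[m]
        step = solve_linear_system(hessian, gradient)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical data: minor-allele dosage and two MDS components per person.
mds1 = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
mds2 = [0.05, 0.05, -0.05, -0.05, 0.1, -0.1, -0.1, 0.1]
case_dosage = [0, 1, 1, 2, 1, 0, 2, 1]
control_dosage = [0, 0, 1, 0, 1, 0, 0, 1]
X = ([[d, a, b] for d, a, b in zip(case_dosage, mds1, mds2)]
     + [[d, a, b] for d, a, b in zip(control_dosage, mds1, mds2)])
y = [1] * 8 + [0] * 8
coefficients = fit_logistic(X, y)
odds_ratio_per_minor_allele = math.exp(coefficients[1])
```

Under an additive model the exponentiated dosage coefficient is the odds ratio per copy of the minor allele, adjusted for the two MDS components.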

Supplemental Methods IV: Quality Assessment of Fung et al. dataset

Eight individuals had X chromosome data that did not corroborate the gender listed in dbGaP. Two of these, however, had their correct gender listed on the Coriell website (ND00691, ND01701) and were therefore kept in the analytic dataset. We contacted Coriell about the true status of the remaining individuals (ND00171, ND00410, ND00740, ND01424, ND01507, and ND01552), and Coriell confirmed the listed data and recommended not using any of these individuals in analyses. When asked about the individuals that were genotyped but no longer available on the website, Coriell also stated that two samples had failed quality control and should be removed from analyses (ND01568 and ND01708). Three more individuals were removed from analyses because they were self-described as either African American or Hispanic (ND05016, ND01060, and ND04404).

Supplemental Table 1: Sample demographics

Characteristic / PD Cases (n=902): PROGENI (n=491) / PD Cases (n=902): GenePD (n=411) / Controls: NINDS Coriell Repository (n=881)
Average age at onset (cases) or at enrollment (controls) / 62.1 ± 10.4 / 61.0 ± 11.8 / 54.6 ± 13.1
% Male / 60.5% / 57.9% / 40.0%
% with parent reported with PD / 35.4% / 22.9% / 0%

Supplemental Table 2A: Additive Model: Results with top SNPs in the expanded sample of 902 PD cases and 881 controls

# / Gene Region (a) / Chr / SNP / Minor Allele / Position (b) / MAF case / MAF control / Odds Ratio (c) / p-value
1 / GAK/DGKQ / 4 / rs1564282 / T / 842313 / 0.132 / 0.087 / 1.68 / 8.8 x 10^-6
1 / GAK/DGKQ / 4 / rs11248051 / T / 848332 / 0.133 / 0.087 / 1.68 / 7.4 x 10^-6
1 / GAK/DGKQ / 4 / rs11248060 / T / 954359 / 0.145 / 0.098 / 1.64 / 8.6 x 10^-6
2 / COX6CP2/PTPN1 / 20 / rs4811072 / G / 48519524 / 0.287 / 0.238 / 1.36 / 2.0 x 10^-4
2 / COX6CP2/PTPN1 / 20 / rs1997791 / G / 48529835 / 0.293 / 0.238 / 1.39 / 6.8 x 10^-5
3 / LOC729075 / 4 / rs2654735 / C / 112618062 / 0.395 / 0.461 / 0.78 / 4.0 x 10^-4
3 / LOC729075 / 4 / rs1806506 / A / 112686700 / 0.391 / 0.457 / 0.76 / 1.8 x 10^-4
3 / LOC729075 / 4 / rs11729080 / A / 112723321 / 0.136 / 0.186 / 0.68 / 9.7 x 10^-5
4 / LOC727725/ZMAT4 / 8 / rs4736788 / T / 40947586 / 0.232 / 0.277 / 0.73 / 1.7 x 10^-4
4 / LOC727725/ZMAT4 / 8 / rs10094981 / C / 40950451 / 0.232 / 0.278 / 0.73 / 1.7 x 10^-4
5 / HRNBP3 / 17 / rs898528 / T / 74678398 / 0.300 / 0.365 / 0.76 / 1.8 x 10^-4
6 / LAMP1 / 13 / rs12871648 / C / 113018663 / 0.370 / 0.314 / 1.34 / 1.1 x 10^-4
7 / LTBP1 / 2 / rs4670322 / G / 33309246 / 0.318 / 0.255 / 1.38 / 4.9 x 10^-5
8 / gene desert / 10 / rs11592212 / C / 110407383 / 0.084 / 0.052 / 1.79 / 6.1 x 10^-5
9 / SNCA/GPRIN3/MMRN1 / 4 / rs4106153 / C / 90463499 / 0.171 / 0.220 / 0.70 / 8.2 x 10^-5
9 / SNCA/GPRIN3/MMRN1 / 4 / rs1504489 / T / 90477611 / 0.371 / 0.425 / 0.77 / 3.7 x 10^-4
9 / SNCA/GPRIN3/MMRN1 / 4 / rs356229 / G / 90825620 / 0.433 / 0.366 / 1.37 / 1.8 x 10^-5
9 / SNCA/GPRIN3/MMRN1 / 4 / rs356188 / G / 90910560 / 0.177 / 0.228 / 0.70 / 4.3 x 10^-5
9 / SNCA/GPRIN3/MMRN1 / 4 / rs3775478 / G / 91061863 / 0.103 / 0.070 / 1.67 / 6.6 x 10^-5
10 / PRDM13/MCHR2 / 6 / rs4431442 / G / 100320236 / 0.329 / 0.262 / 1.40 / 2.5 x 10^-5
11 / VPS8 / 3 / rs10937194 / G / 186201412 / 0.190 / 0.240 / 0.70 / 3.5 x 10^-5
12 / CGRRF1/SAMD4A / 14 / rs4901519 / C / 54088930 / 0.117 / 0.155 / 0.65 / 3.9 x 10^-5
13 / C17orf69/PLEKHM1/MAPT / 17 / rs11012 / A / 40869224 / 0.142 / 0.194 / 0.69 / 1.0 x 10^-4
13 / C17orf69/PLEKHM1/MAPT / 17 / rs1724425 / T / 41137530 / 0.384 / 0.446 / 0.75 / 9.3 x 10^-5
14 / LEKR1 / 3 / rs12638253 / C / 158108785 / 0.447 / 0.522 / 0.77 / 2.2 x 10^-4
15 / POU6F2 / 7 / rs9655034 / T / 39258636 / 0.482 / 0.415 / 1.32 / 8.3 x 10^-5
16 / TMEM108 / 3 / rs1197313 / T / 134583142 / 0.393 / 0.445 / 0.77 / 4.5 x 10^-4
17 / LOC728328/PCTK2 / 12 / rs7312607 / C / 95350301 / 0.488 / 0.434 / 1.32 / 1.1 x 10^-4
18 / FGF12 / 3 / rs9859577 / T / 193571219 / 0.126 / 0.172 / 0.69 / 1.7 x 10^-4
19 / LOC652429/TMEM132B / 12 / rs2108521 / C / 124901417 / 0.225 / 0.275 / 0.73 / 1.1 x 10^-4

a Genes taken from the NCBI mRNA reference sequences collection (RefSeq); b from NCBI Build 36 reference; a “gene desert” is defined here as being more than 500 kb away from any gene listed in RefSeq; c odds ratios were computed for the minor allele

Supplemental Table 2B: Recessive Model: Results with top SNPs in the expanded sample of 902 PD cases and 881 controls

# / Gene Region (a) / Chr / SNP / Minor Allele / Position (b) / MAF case / MAF control / Odds Ratio (c) / p-value
1 / C17orf69 / 17 / rs1724422 / G / 41133096 / 0.401 / 0.460 / 0.60 / 9.6 x 10^-5
1 / C17orf69 / 17 / rs1724425 / T / 41137530 / 0.384 / 0.446 / 0.57 / 2.9 x 10^-5
2 / FSHR / 2 / rs7578654 / C / 49363677 / 0.469 / 0.424 / 1.74 / 1.7 x 10^-5
3 / PIK3CD / 1 / rs4240910 / C / 9673044 / 0.514 / 0.450 / 1.62 / 6.7 x 10^-5
4 / LOC728284/F11 / 4 / rs2889188 / G / 187552805 / 0.265 / 0.231 / 2.66 / 1.0 x 10^-5
5 / gene desert / 1 / rs11184419 / C / 105439944 / 0.334 / 0.360 / 0.53 / 4.4 x 10^-5
5 / gene desert / 1 / rs4128942 / C / 105447249 / 0.332 / 0.360 / 0.53 / 3.9 x 10^-5
6 / GPXP2 / 21 / rs969988 / A / 27474523 / 0.463 / 0.418 / 1.68 / 8.7 x 10^-5
7 / LOC643954/HS3ST5 / 6 / rs1519686 / T / 114553816 / 0.246 / 0.223 / 2.65 / 5.2 x 10^-5
8 / SYT4/RIT2 / 18 / rs4890430 / A / 38951528 / 0.447 / 0.387 / 1.66 / 1.7 x 10^-4
9 / FIGN / 2 / rs2083482 / A / 164146021 / 0.504 / 0.464 / 1.57 / 1.9 x 10^-4
10 / SYN3 / 22 / rs1159220 / T / 31410753 / 0.388 / 0.424 / 0.58 / 7.3 x 10^-5
10 / SYN3 / 22 / rs3788483 / C / 31414345 / 0.390 / 0.426 / 0.58 / 7.9 x 10^-5
11 / LOC283398/TMCC3 / 12 / rs10859725 / C / 93468003 / 0.206 / 0.173 / 2.88 / 1.0 x 10^-4
12 / CDC2L6 / 6 / rs6912010 / A / 111003337 / 0.277 / 0.247 / 2.16 / 1.1 x 10^-4
13 / LOC728637/ACSL6 / 5 / rs1355095 / G / 131276668 / 0.164 / 0.209 / 0.28 / 2.6 x 10^-4
14 / LOC651011/OXSM/NGLY1 / 3 / rs9310784 / C / 25905208 / 0.140 / 0.154 / 0.19 / 5.2 x 10^-5

a Genes taken from the NCBI mRNA reference sequences collection (RefSeq); b from NCBI Build 36 reference; a “gene desert” is defined here as being more than 500 kb away from any gene listed in RefSeq; c odds ratios were computed for the minor allele