SUPPLEMENTARY MATERIALS AND METHODS

Journal of Neurology

A PATIENT WITH PMP22-RELATED HEREDITARY NEUROPATHY AND DBH-GENE RELATED DYSAUTONOMIA

Anna Bartoletti-Stella · Giacomo Chiaro · Giovanna Calandra-Buonaura · Manuela Contin · Cesa Scaglione · Giorgio Barletta · Annagrazia Cecere · Paolo Garagnani · Paolo Tieri · Alberto Ferrarini · Silvia Piras · Claudio Franceschi · Massimo Delledonne · Pietro Cortelli* and Sabina Capellari*

*Correspondence: Sabina Capellari, IRCCS Istituto delle Scienze Neurologiche di Bologna, Dipartimento di Scienze Biomediche e Neuromotorie, Università di Bologna, Via Altura 1/8, 40139 Bologna, Italia. Tel: +39 051 4966115; Fax: +39 051 4966208; E-mail: . Pietro Cortelli, IRCCS Istituto delle Scienze Neurologiche di Bologna, Dipartimento di Scienze Biomediche e Neuromotorie, Università di Bologna, Via Altura 1/8, 40139 Bologna, Italia. Tel: +39 051 4966292; Fax: +39 051 4966208; E-mail: .

Library construction, capture and sequencing

Genomic DNA of the proband and his family (mother, father and healthy brother) was extracted from blood using the Maxwell 16 Blood DNA Purification Kit and automated extractor (Promega, Madison, WI). DNA libraries were constructed using the Illumina TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA) following the manufacturer's instructions. Libraries were quality checked using the Agilent DNA 1000 Kit (Agilent, Santa Clara, CA) on an Agilent 2100 Bioanalyzer and were quantified by qPCR using the KAPA Library Quantification Kit (Kapa Biosystems). Libraries were pooled in equimolar concentration, and exome enrichment was performed using Illumina’s TruSeq Exome Enrichment Kit (Illumina, San Diego, CA) according to the manufacturer's instructions. Two 20-h biotinylated bait-based hybridizations were performed, each followed by streptavidin magnetic bead binding, a washing step and an elution step. A 10-cycle PCR enrichment was performed after the second elution, and the enriched libraries were quality checked and quantified as described above for the pre-enrichment libraries (Agilent DNA 1000 Kit on an Agilent 2100 Bioanalyzer; qPCR with the KAPA Library Quantification Kit). Sequencing was performed on an Illumina HiSeq 1000 sequencer using the TruSeq SBS v3-HS kit (200 cycles; Illumina, San Diego, CA) and the TruSeq PE Cluster v3-cBot-HS kit (Illumina, San Diego, CA), generating 100-bp paired-end reads.
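As an illustration of the equimolar pooling step, the following minimal Python sketch computes the volume of each library needed to contribute the same molar amount to the pool. The concentrations, the target amount and the function name equimolar_pool_volumes are hypothetical and are not taken from the study; the sketch only relies on the identity 1 nM = 1 fmol/µL.

```python
def equimolar_pool_volumes(conc_nM, target_fmol_each):
    """Return the volume (in microliters) of each library that contributes
    target_fmol_each femtomoles to the pool.

    Because 1 nM equals 1 fmol per microliter, volume = amount / concentration.
    """
    volumes = {}
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {name}")
        volumes[name] = target_fmol_each / conc
    return volumes


# Hypothetical qPCR-derived concentrations (nM); not values from the study.
libraries = {"proband": 12.0, "mother": 8.0, "father": 10.0, "brother": 16.0}
volumes = equimolar_pool_volumes(libraries, target_fmol_each=40.0)

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

More concentrated libraries contribute smaller volumes, so that every sample is represented by the same number of molecules before enrichment.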

Data analysis

Raw reads were filtered by removing low-quality reads (>10% undetermined bases and >50 bp with a quality score Q < 7). Adapters were clipped using Scythe v0.980, and low-quality ends with an average quality score < 20 over a 20-nt window were trimmed from the 3’ ends of reads using Sickle v0.940 (https://github.com/ucdavis-bioinformatics). Filtered reads were mapped to the human reference genome (build hg19) using the Burrows-Wheeler Aligner (BWA) 0.6.2-r126. The BWA-aligned reads were processed with Picard tools to mark PCR duplicates. The Genome Analysis Toolkit (GATK) 2.6-5 was then used to remove duplicates and to perform local realignment and base quality score recalibration, following the developers' best practices. Multisample variant calling was performed using the UnifiedGenotyper module in GATK with the -glm BOTH parameter set. The annotated VCF files were then filtered using the GATK VariantFiltration module, and variant calls that failed to pass the following filters were eliminated from the call set: (i) MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1); (ii) QUAL < 30.0 || QD < 5.0 || DP < 5; (iii) FS > 60.0 for SNPs or FS > 100 for indels; (iv) cluster size 10. Variant annotation and pedigree analysis were performed using SNP and Variation Suite (SVS) 7.7.5 (Golden Helix).
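The logic of the MQ0, quality/depth and strand-bias (FS) filter expressions listed above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the GATK VariantFiltration code: the function name failed_filters, the filter labels and the example calls are hypothetical, and the cluster-size filter is omitted because its window length is not stated in the text.

```python
def failed_filters(v, is_snp):
    """Return the labels of the hard filters tripped by one variant call.

    v is a dict holding the annotations QUAL, QD, DP, FS and MQ0.
    The labels are illustrative names, not taken from the study.
    """
    failed = []
    # Many mapping-quality-zero reads, both in absolute and relative terms.
    if v["MQ0"] >= 4 and v["DP"] > 0 and v["MQ0"] / (1.0 * v["DP"]) > 0.1:
        failed.append("HARD_TO_VALIDATE")
    # Low call quality, low quality-by-depth, or low depth.
    if v["QUAL"] < 30.0 or v["QD"] < 5.0 or v["DP"] < 5:
        failed.append("LOW_QUALITY")
    # Strand bias, with a looser threshold for indels than for SNPs.
    fs_max = 60.0 if is_snp else 100.0
    if v["FS"] > fs_max:
        failed.append("STRAND_BIAS")
    return failed


# Hypothetical calls used only to exercise each branch.
good_snp = {"QUAL": 250.0, "QD": 12.0, "DP": 40, "FS": 3.2, "MQ0": 0}
low_qd_snp = {"QUAL": 45.0, "QD": 2.1, "DP": 30, "FS": 1.0, "MQ0": 0}
biased = {"QUAL": 300.0, "QD": 10.0, "DP": 50, "FS": 80.0, "MQ0": 0}
mq0_snp = {"QUAL": 100.0, "QD": 8.0, "DP": 20, "FS": 2.0, "MQ0": 5}

print(failed_filters(good_snp, is_snp=True))
print(failed_filters(low_qd_snp, is_snp=True))
print(failed_filters(biased, is_snp=True))
print(failed_filters(biased, is_snp=False))
print(failed_filters(mq0_snp, is_snp=True))
```

A call is retained only when the returned list is empty. The same FS value of 80 fails as a SNP but passes as an indel, reflecting the two thresholds.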

Interactome

The reconstruction and analysis of a protein-protein interaction (PPI) network (or interactome) related to one or more genes make it possible to explore a given gene set at the functional level of protein interactions [30,31]. Network-based approaches use information from experimentally validated PPI data to search for genes whose protein products interact with each other and which may jointly contribute to disease risk [30,32], or to prioritize gene sets [33,34]. Compared with other kinds of analysis, such as pathway-based (enrichment) analysis [35], network-based approaches are better able to detect genes that work across pathways and are less biased by prior knowledge. They are also able to identify network structures and central proteins by using pairwise information (e.g. protein A interacting with protein B) reconstructed at the system level (e.g. the network of interactions of all given proteins). The level 0 interactome (L0) refers to the network of interactions among and within seed proteins only. Here, we consider the six exome sequencing-derived genes involved in the biosynthesis of catecholamines, DBH, TH, DDC, VAMP2, SLC6A3 and PAH (hereinafter called seed genes/proteins, Supplementary Table S5), and the 48 candidate genes/proteins carrying de novo variants (two genes) or homozygous/heterozygous variants (46 genes), selected by two or more software tools (Supplementary Table S6). Experimentally validated PPI data were retrieved from the Agile Protein Interaction DataAnalyzer (APID) [36], accessed through the network analysis platform Cytoscape [37] and the APID2NET plugin [38], and then analyzed with NetworkAnalyzer [39]. The level 0 interactome was empty, i.e. there was no experimental evidence of interactions among the 6 seed genes and the 48 candidate genes. At the time of the analysis, a substantial portion (32 out of 54, ~60%) of the seed + candidate genes was not even present in the main PPI databases (Supplementary Table S7). This means that, at the functional level of PPI, there is no evidence that the seed and candidate genes work together.
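The level 0 interactome described above can be illustrated with a minimal Python sketch: given a gene set and a list of pairwise interactions, L0 keeps only the interactions whose two partners both belong to the set, and a coverage check reports which genes appear in the interaction data at all. The function names, the placeholder gene identifiers and the toy interaction list are hypothetical; the sketch does not reproduce the APID/Cytoscape workflow or any data from the study.

```python
def level0_interactome(genes, interactions):
    """Return the interactions whose two partners are both in `genes`.

    Each interaction is stored as a frozenset so that (A, B) and (B, A)
    count as the same undirected edge; self-interactions are allowed.
    """
    genes = set(genes)
    return {frozenset((a, b)) for a, b in interactions
            if a in genes and b in genes}


def coverage(genes, interactions):
    """Split `genes` into those present in and absent from the PPI data."""
    present = {protein for pair in interactions for protein in pair}
    found = sorted(set(genes) & present)
    missing = sorted(set(genes) - present)
    return found, missing


# Placeholder identifiers and toy interactions, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
interactions = [
    ("GENE_A", "PARTNER_1"),
    ("GENE_B", "PARTNER_2"),
    ("PARTNER_1", "PARTNER_2"),
]

l0 = level0_interactome(genes, interactions)
found, missing = coverage(genes, interactions)
print("L0 edges:", len(l0))
print("present in PPI data:", found)
print("absent from PPI data:", missing)

# Adding a direct interaction between two genes of the set makes L0 non-empty.
l0_extended = level0_interactome(genes, interactions + [("GENE_B", "GENE_A")])
print("L0 edges after adding a direct interaction:", len(l0_extended))
```

In the toy data, every listed interaction involves at least one partner outside the gene set, so L0 is empty even though two of the four genes are present in the interaction list.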

Variant Confirmation

DBH gene variants (Table 2) were validated by PCR and Sanger sequencing. DNA extracted from blood was amplified by PCR with Taq DNA Polymerase (Roche Life Science). PCR primers were designed with Primer3 (frodo.wi.mit.edu/primer3) using NCBI37/hg19 as the reference sequence (Supplementary Table S8). PCR products were purified with ExoSAP-IT (Affymetrix). The ABI BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems) was used for sequencing reactions.

Supplementary References

[30] Huang W, Wang P, Liu Z, Zhang L (2009) Identifying disease associations via genome-wide association studies. BMC Bioinformatics 10 Suppl 1:S68.

[31] Barabási AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5:101-113.

[32] Tieri P, Zhou X, Zhu L, Nardini C (2014) Multi-omic landscape of rheumatoid arthritis: re-evaluation of drug adverse effects. Front Cell Dev Biol 2:59.

[33] Berger SI, Posner JM, Ma'ayan A (2007) Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinformatics 8:372.

[34] Wu C, Zhu J, Zhang X (2012) Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes. BMC Bioinformatics 13:182.

[35] Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, Pelletier D, et al (2009) Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet 18:2078-2090.

[36] Prieto C, De Las Rivas J (2006) APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Res 34:W298-302.

[37] Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27:431-432.

[38] Hernandez-Toro J, Prieto C, De Las Rivas J (2007) APID2NET: unified interactome graphic analyzer. Bioinformatics 23:2495-2497.

[39] Assenov Y, Ramírez F, Schelhorn SE, Lengauer T, Albrecht M (2008) Computing topological parameters of biological networks. Bioinformatics 24:282-284.