Supplementary data

Spatio-temporal heterogeneity characterize the genetic landscape of pheochromocytoma and defines early events in tumorigenesis.

Crona et al.

Patients and samples

Tissue collection

Sample preparation and nucleic acid extraction

Sample Sequencing

Targeted Deep Sequencing

PCR amplification. fragment separation and Sanger Sequencing.

High throughput sequencing of selected PCR amplicons.

SNP array genotyping and raw data analysis.

Pipeline for resolving absolute ploidy and purity in Pheochromocytoma

Determination of LogR intensities and B allele frequencies

Copy number classification

Determination of clonal proportions

Inferring allele specific copy number status by establishing relationships between logR and absolute ploidy

Results

SNP array Genotyping

Results and validation of absolute ploidy and clonal fractions

Estimation of sample tumour cellularity

Comparison of robustness of the determination of variant frequencies

References

1

Supplementary Figure 1

Overview of the workflow executed in the current study.

1

Patients and samples

Tissue collection

All tumours samples were obtained from surgical specimens that had been collected between 1988-2013 at the Unit of Endocrine Surgery, Uppsala Academic Hospital, Uppsala, Sweden. Inclusion criteria were diagnosis of pheochromocytoma or paraganglioma confirmed by routine histopathological investigation. Tumour specimens were dissected directly upon resection and snap frozen in liquid nitrogen. Tissue samples were then stored in low temperature freezers that maintained an average temperature of less than -69°C. Peripheral blood was collected at the time of patient admission for surgery and stored below -20°C.

Sample preparation and nucleic acid extraction

Tumour cell content was quantified by light microscopy that was performed by two independent observers. Selected samples were macro-dissected in order to reduce the proportion of non-tumour cells. RNA and Genomic DNA were extracted using DNeasy Blood & Tissue and/or AllPrep DNA/RNA kits (Cat. No. 69506 and 80204, Qiagen, Hilden, Germany) as instructed by the manufacturer. Reverse transcription was performed using RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, Waltham, MA, USA). Nucleic acid quality and concentrations were assessed using Nanodrop spectrophotometer (ThermoFischer Scientific, MA, USA).

Sample Sequencing

Targeted Deep Sequencing

Genomic DNA from 131 PPGL samples were analysed with targeted next generation sequencing. For this purpose a aTruseq Custom Amplicon (Illumina Inc. CA. USA) targeted enrichment kit was designed to cover the protein coding sequences of theSDHA.SDHB.SDHC.SDHD.SDHAF2.VHL.EPAS1.RET (Exons 8. 10-11 and 13-16).NF1.TMEM127.MAX and H-RAS (Exons 2 and 3)genes. Genomic enrichment and library preparation experiments were performed by the SciLifeLab core facility ( as instructed by the manufacturer (Part# 15027983). The generated 175bp paired end libraries were indexed.pooled and sequenced twice on a MiSEQ instrument (IlluminaInc) as instructed by the manufacturer (Part# 15027617). Read mapping to the human reference sequence HG19 and subsequent variant calling were performed using MiSEQ Reporter v2.1.43 (IlluminaInc) using default settings. Briefly reads were mapped using a Smith-Waterman algorithm (reference sequence HG19) and variant called using the Genome Analysis Toolkit (GATK). Results from generated .bam and .vcf files were further analysed and visualized using CLC Genomics Workbench 5.51 (CLCbio. Aarhus. Denmark). Variant were processed by (1); excluding variants not available in both sequencing runs. (2); excluding synonymous variants without a probable splice site effect.(3); annotation for overlapping information in; Single Nucleotide Polymorphism database (dbSNP) build 137. Catalogue of Somatic Mutations in Cancer (COSMIC) (1).database of Human Gene Mutation Data (HGMD) (2) and Leiden Open (source) Variant Database (LOVD).The impact of non-synonymous amino acid substitution was further assessed in silico using PolymorphismPhenotyping v2 (Polyphen2) (3) and Sorting Intolerant from Tolerant (SIFT) (4). Variants were classified as Benign. Pathogenic or Unknown (Variant of Unknown Significance. VUS). Variant frequencies were determined from.bam files generated by MiSEQ Reporter v2.1.43 using theCLC Genomics Workbench 7.5 quality based variant calling algorithm (lower variant threshold 5%. default settings). The mean of the variant frequencies generated in sequencing runs 1 and 2 were calculated and used for further analysis.

Variants predicted as pathogenic were validated with sanger sequencing using genomic DNA andcDNA when appropriate. Primer sequences can be obtained by request.

PCR amplification.fragment separation and Sanger Sequencing.

Variants located outside the targeted regions could cause truncation of the NF1 protein through aberrant RNA splicing (5) and selecting tumours with LOH overlapping the NF1 locus have been shown to enrich for NF1 mutated tumours (5, 6).

Primer sequences of NF1 cDNA (NM_001042492.2) were obtained from Welander et al. (5). Amplified PCR products were sequenced using automated Sanger sequencing (Beckman Coulter genomics.Takeley. UK). PCR amplified cDNA from tumours and three normal controls were separated on 2.5% agarose gel. Those in which alternative lengths of amplicons could not be excluded samples were selected for sequencing.

High throughput sequencing of selected PCRamplicons.

Genomic DNA from the tumours of two patients were selected for additional validation; patient 8 with a 30bp deletion that was not detected using TSCA libraries and Patient 29 with low allele frequency. Both variants occurred in exon 1 of the VHL gene and were amplified by PCR using custom primer sequences designed in house using NCBI primer. Nextera (Illuminainc) paired end sequencing libraries were prepared from 1 ng of amplified dsDNA using an enzymatic fragmentation approach. Briefly amplified DNA was fragmented and tagmented by transposomes. A limited-cycle PCR reaction amplified the insert DNA and added index sequences. The libraries were purified using Ampure XP beads (Beckman coulter genomics.CA. USA).normalized and pooled into a single suspension. The sample pool was sequenced on a single sequencing run on the MiSEQ (IlluminaInc) instrument. Library preparation and sequencing were performed by SciLifeLab core facility (Stockholm node). Read mapping and variant calling were performed using CLC Genomics Workbench 5.5 mapping algorithm and quality based variants caller with default settings.

Semi-Quantitative PCR

Experiments were performed in triplicates using SsoAdvanced SYBR Green Supermix (Bio-Rad laboratories, Hercules, CA, USA) on a CFX96 Real Time system (Bio-Rad laboratories). Expression of VEGFA was analysed and normalized to HPRT1expression. ΔΔCT values were calculated for further analyses.

SNP array genotyping and raw data analysis.

Genomic DNA from a total of 138 tumour samples were analysed with Illuminainfinium dual flourencenceSingle Nucleotide Polymorphism arrays HumanOmniExpressExome-8v1_A. Omni1-Quad or Omni2.5-Quad (Illumina. CA. USA) containing 730’525. 1.140.419 and 2.379.855 probes respectively. DNA amplification.tagging and hybridization were performed according to the manufacturer’s instructions and scanned on a HiScanSQ instrument (Illumina). The experiments were carried out at Science for Life Laboratory core facilities.SNP&SEQ Technology Platform in Uppsala.Sweden ( according to the manufacturer’s guidelines. Generated fluorescent raw signals were imported into IlluminaBeadStudio (version 7.5 build 2011) software and normalized against reference data (ICF) provided by the manufacturer. Nexus copy number beadstudio plugin was used to export total relative fluorescent intensities (logR) as well as the relative intensity (BAF) for each individual probe.

Pipeline for resolving absolute ploidy and purity in Pheochromocytoma

We utilized a workflow consisting of multiple commercially and publicly available bioinformatics suites and algorithms as outlined schematically in figure S1.

Determination of LogR intensities and B allele frequencies

Further analysis of SNP array results were performed using Nexus 7.5 (Biodiscovery.CA. USA). Brieflydata was normalized for GC waves by array specific quantile systematic correction algorithms that were provided by the manufacturer. A circular binary segmentation algorithm SNP-RANK was used to call regions with copy number events and/or allelic imbalances (significance threshold 10-13). Samples with Nexus QC score >0.13 were excluded. Marginal events (defined as <100 probes) were excluded from further analysis in order to reduce false predictions. Generated segments were reviewed by two independent observers and consensus was reached by integrating LogR and BAF. BAF values 0.5 were transposed (to 1-BAF) and homozygous SNP calls excluded.The median valuewas calculated from both the upper and lower band. Segment coordinates and the corresponding median logR and BAF (mean of the two bands) were then exported for further analysis.

Copy number classification

In order to determine the underlying ploidy and clonal fraction of specific segments we used a two-step approach. First all segments were analysed using a maximisation of parsimony model. In regions where multiple different combinations of clonal fractions and ploidy may explain the observed data the model assumes that the solution with the smallest number of deviations from diploid have the highest probability of being correct. Regions with no relative change in LogR and with BAF 0.5 were considered heterozygous copy number neutral (ploidy AB). Regions with aberrant LogR/BAF calls were assigned classifications: relative decrease in logR and BAF >0.54 (technical detection limit) were considered a heterozygous loss (ploidy B/A) whereas a relative increase in logR and BAF >0.54 were considered as a copy number gain (ploidy AAB/BBA). Segmentations showing decrease in logR with BAF 0.54 were considered a homozygous loss (ploidy 0). Increase in logR and BAF <0.54 were considered a balanced copy number gain (ploidy AABB). Segments with logR of 0 and BAF >0.54 were considered as copy number neutral loss of heterozygosity (ploidy AA/BB). Ploidy classifiers A. B. AB. AAB. BAA. AABB will be denoted as simple calls in the remains of this manuscript.

Determination of clonal proportions

Ploidy and BAF valueswere used to calculate the proportion of cells (p) harbouring the particular SCNA(7). Formula (1) was derivedwhere(a) denotes ploidy of allele A and (b) denotes ploidy of allele B.

(1)

Segments showing p=1 were considered germline events and were considered to be outside of the scope of this manuscript and removed from the subsequent analyses.(8)

Inferring allele specific copy number status by establishing relationships between logR and absolute ploidy

We hypothesized that a minority of the generated segments could harbour complex combinations of multiple different chromosomal aberrations with different clonal fractions. In order to identify such segments we sought to compare each segment to an empirical dataset. LogR and copy number have been previously described to have a distinct relationship(9).We ruled that our dataset of segments would have the robustness to generate a similar relationship. This was based on (I) the expected robustness of the maximization of the theory of parsimony approach (especially as majority of our samples had limited number of SCNA) and (II) the validation of our results from copy number estimation provided by the existing literature.

Absolute copy number for simple calls were calculated using ploidy and clonal fraction in accordance with Formulae(2). CN represents the total copy number,pthe fraction of aberrant cells and Ptthe ploidity of the segment.

(2)

For all individual samples LogR (X-axis) and copy number (Y-axis) were then plotted to establish sample specific standard curves.Segments plotted as outliers on the standard curve were excluded and the remaining values included for logarithmic regression that was used to generate functions of the form shown in Formulae(3)describing the relationship between Log R and total copy number. Similar array specific curves were also produced using all the segments generated on a specific array.

(3)

For each sample the function from standard curve was used to calculate total copy number from LogR for all segments.Allele specific copy number status of each segment could then be estimated through multiplying the calculated total copy numberwith the corresponding BAF in accordance with Formulae (4) and (5).

(4)

(5)

Allele specific copy number was then used to calculate the fraction of cells in the sample carrying each aberration (clonal fraction) using Formulae (1.)For each sample.thetumor purity was estimated to be equal to the highest clonal fraction of the segments in the sample. Segments with clonal fraction within 10 percentage pointsof the highest clonal fraction in a given sample were classified tobe fully clonal while segments with lower clonal fractions were considered to carry subclonal events.

1

Results

Table S1. Sequence coverage of targeted NGS
MiSEQ batch / MiSEQ run / Coverage, mean number of reads / Percentage of targets at X-fold coverage
1X / 5X / 10X / 40X / 80X
1 / 1 / 201.94 / 98.1 / 97.7 / 97.1 / 91.3 / 82.5
1 / 2 / 197.3 / 98.1 / 97.6 / 97.2 / 90.8 / 80.1
2 / 1 / 280.3 / 98.5 / 98.1 / 97.3 / 92.1 / 84.7
2 / 2 / 284.4 / 98.5 / 97.9 / 97.3 / 92.1 / 84.7

Supplementary Table 1

Mean Coverage and percentage of targets with X-fold coverage from MiSEQ runs 1 and 2.

1

Supplementary Figure 2

Summary of the utilized sequencing workflow.All but one mutation previously detected by Sanger sequencing could be confirmed. There were no additional variants discovered neither in sequencing rounds number 2 or 3. Analysing the 30bp deletion not detected by targeted NGS showed that this particular variant was located at the overlap of two amplicons (data not shown) and could be validated using customized enrichment.NEXTERA libraries and MiSEQ sequencing. In conclusion all variants could be validated using two principally different sequencing chemistries.

Supplementary Figures 3A-B

Sequencing reads and chromatograms from tumour and normal tissueshowing(A) validation ofH-RAS,RET and VHL mutations. Figure 3B presents data from patient 29 that had previously been determined as wild type in the VHL locus by Sanger Sequencing analysis. Targeted NGS detected a pathogenic VHL mutation c.[193T>G]. Multiple samples from geographically different regions from the same primary tumour was analysed using Sanger sequencing revealing a secondary peak corresponding to the mutated G. The mutation was validated to have a 14.6% (sequence depth 58746) allele frequency using Nextera libraries sequenced on a Miseq instrument.

Supplementary Figure 4

Validation of NF1 mutations that was determined to cause a splice site disruptionby in silico analysis.cDNA from tumour tissue were amplified by PCR and separated on agarose gel that revealed aberrant fragments lengths for all three cases. Results were validated by Sanger Sequencing that showed effect on protein structure: c.288+1G>T,p.Arg69_Gly96del, c2252-2A>G, p.Gly751fsand c.4836-2delA, p.1613_1871del.

SNP array Genotyping

Genotyping was performed with a success rate (defined as a genotyping call rate of >98% for individual SNPs) of 95%. Median quality score determined by Nexus Copy numberwas 0.029 (range 0.1-0.015). A discrete asymmetry between the two fluorescent dyes was observed. Increasing difference between the two bands proportional to the distance from 0.5 were noted with the highest differences observed when the bands were in proximity of 1 and 0. Normalization with established bioinformatics scripts did not compensate for this asymmetry(10).The performance of the workflow was tested using BAF derived from either the upper and lower bands or from each individual single band only. BAF calculations from both bands were considered to have the most robust performance and were selected for further analysis.

Results and validation of absolute ploidy and clonal fractions

A total of 1191 segments were generated. 1064 affected one allele and 127 were bi allelic.Absolute ploidy and purity of two samples could not be resolved even after considering the interpretations generated by ASCAT and TAPS hence both samples were excluded from further analysis. Sample and array specific curves were created from the remaining samples and logarithmic regression was applied to generate quantitative relationships between LogR and absolute copy number. A small number segments were outliers and were classified to have unknown copy number state. The remaining segments were plotted and the extracted function of the standard curve were in line with that of previous studies(9).

Supplementary Figure 5.Standard curve showing relationship between copy number and Log R ratio generated by all segments included samples and arrays.Middle lineequals mean value.Coloured area denotes onestandard error.

Estimation of sample tumour cellularity

Correlation between theestimations of sample purity was found betweenstudy workflow and ASCAT (figure S6. Pearson correlations test R=0.830). Compared to ASCAT (considered as golden standard) the in house pipeline tended to overestimate sample purity. Troubleshooting revealed that by analysingBAF derived from the lower band only the study workflowshowed a stronger correlation to ASCAT. However as this increased the intra sample divergence of clonal fractions we decided to use BAF derived from both bands.Histopathological investigation showed low sensitivity to detect samples with low purity as determined from SNP array data (Figure S7).

Supplementary Figure6

Correlation of study observations and ASCAT analysis in estimating the total fraction of tumour cells within each individual tumour sample. ASCAT could not determine the tumour purity for two samples. Pearson correlation test R=0.830.

Supplementary Figure 7

Correlation of results from histopathological investigation and ASCAT in estimating the total fraction of tumour cells.Pearson correlation test R=0.658.

Supplementary Figure 8

Correlation between TAPS and the study workflow in the estimation of clonal fractions for specific segments. Data from sixteen samples were selected because of high levels of aneuploidy and/or subclonal fractionsin order to validate the study workflow.

Comparison of robustness of the determination of variant frequencies

Only samples in which both SNP array and deep sequencing analysis had been performed on the identical batch of genomic DNA were considered for integrative analysis. In order to determine the overlap between the two methods we selected variants present in dbSNP and plotted the corresponding variant frequency (VAF) generated from deep sequencing to the median transposed BAF observed at the corresponding location by SNP array. Pearson correlation testconfirmed satisfactory overlap between the two methods (Figure S9A.Pearson correlation R=0.945). Relative differences between the two methods were plotted against deep sequencing read coverage. As expected this confirmed that variants with high read coverage showed less variability than its counterparts (Figure S9B).