Supplementary Files
Genomic and epigenomic co-evolution in follicular lymphomas
Markus Loeffler1*, Markus Kreuz1*, Andrea Haake2*, Dirk Hasenclever1*, Heiko Trautmann3*, Christian Arnold4, Karsten Winter5, Karoline Koch6, Wolfram Klapper6, René Scholtysik7, Maciej Rosolowski1, Steve Hoffmann8, Ole Ammerpohl2, Monika Szczepanowski6, Dietrich Herrmann3, Ralf Küppers7, Christiane Pott3, Reiner Siebert2
on behalf of the Haematosys-Project
*these authors contributed equally to this work
1Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Germany; 2Institute of Human Genetics, Christian-Albrechts-University Kiel, Germany; 3Second Medical Department, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany; 4Interdisciplinary Centre for Bioinformatics (IZBI), University of Leipzig, Germany; 5Translational Centre for Regenerative Medicine (TRM-Leipzig), Germany; 6Hematopathology Section, Christian-Albrechts-University, Kiel, Germany; 7Institute of Cell Biology (Cancer Research), Faculty of Medicine, University of Duisburg-Essen, Essen, Germany; 8Transcriptome Bioinformatics, LIFE Research Center for Civilization Diseases, University of Leipzig, Germany
1. Materials
Supplementary Table 1a: Summary of patient characteristics
Characteristic / All / Core set (cases with IGHV sequences)
Number of patients / n=33 (25 pairs; 6 trios; 2 quadruples) / n=19 (17 pairs; 2 trios)
Number of samples / n=76 / n=40
Number of pair-wise comparisons* / n=55 / n=23
Sex / n=15 (45%) male; n=18 (55%) female / n=11 (58%) male; n=8 (42%) female
Diagnosis: FL I/II / n=58 / n=33
Diagnosis: FL IIIa / n=2 / n=1
Diagnosis: FL NOS / n=16 / n=6
Age at biopsy (median, range) / 59 [27-88] / 54 [27-74]
Time between paired probes in months (median, range) / 24 [0-101]** / 29 [6-101]
IGHV sequencing: number of samples measured / n=40 (53%) / n=40 (100%)
IGHV sequencing: number of pair-wise comparisons* / n=23 (42%); n=9 validated using NGS / n=23 (100%); n=9 validated using NGS
Methylation analysis: number of samples measured / n=76 (100%) / n=40 (100%)
Methylation analysis: number of pair-wise comparisons* / n=55 (100%) / n=23 (100%)
NGS analysis: number of samples measured / n=69 (91%) / n=40 (100%)
NGS analysis: number of pair-wise comparisons* / n=50 (91%) / n=23 (100%)
SNP 6.0 analysis: number of samples measured / n=35 (46%) / n=16 (40%)
SNP 6.0 analysis: number of pair-wise comparisons* / n=19 (35%) / n=9 (39%)
* Patients with 2 samples result in 1 pair-wise comparison, trios in 3 (primary vs. first relapse, primary vs. second relapse, and first vs. second relapse), and quadruples in 6 pair-wise comparisons.
**7 pairs with time between samples less than 4 months were excluded from integrated correlation analyses (see section 4F).
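The counting rule in the first footnote is the binomial coefficient C(k, 2) for a patient with k samples. The short Python sketch below recomputes the patient, sample and comparison totals of Supplementary Table 1a from the pair/trio/quadruple composition; the function and variable names are ours and are used only for illustration.

```python
from math import comb

def pairwise_comparisons(samples_per_patient):
    """Sum C(k, 2) over patients, where k is the number of samples of a patient."""
    return sum(comb(k, 2) for k in samples_per_patient)

# All patients: 25 pairs, 6 trios, 2 quadruples (from Supplementary Table 1a).
all_patients = [2] * 25 + [3] * 6 + [4] * 2
# Core set: 17 pairs, 2 trios.
core_set = [2] * 17 + [3] * 2

# patients, samples, pair-wise comparisons
print(len(all_patients), sum(all_patients), pairwise_comparisons(all_patients))  # 33 76 55
print(len(core_set), sum(core_set), pairwise_comparisons(core_set))              # 19 40 23
```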
2. Methods
DNA extraction:
DNA extraction from tissue was done using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol with minor modifications. For DNA extraction, 15-20 sections of 20 µm each of frozen tissue were used. The modifications were as follows: for lysis, 360 µl ATL buffer and 40 µl proteinase K were used, and for precipitation, 400 µl AL buffer and 400 µl ethanol. Finally, the DNA was eluted in a total volume of 400 µl AE buffer (200 µl twice). DNA extraction from cells in DMSO was done using the Gentra Puregene Blood Isolation Kit (Qiagen). According to the manufacturer's instructions, elution was done in ddH2O. Quality control of the DNA was performed by agarose gel electrophoresis and showed a discrete band visible at a size of 20 kb. Quantification and determination of purity (A260/280 > 1.8) were carried out using a Nanodrop photometer (Thermo Scientific, Braunschweig, Germany).
Detection and sequencing of immunoglobulin gene rearrangements:
To identify each patient's clonal immunoglobulin heavy chain (IGHV) gene rearrangement, PCR amplification was performed according to the BIOMED-2 IGH Tube A protocol,1 including six consensus forward primers binding to framework region 1 (VH-FR1) in combination with one consensus reverse primer for all JH-segments. For each reaction, 200 ng DNA from fresh-frozen lymph node specimens was used.
Clonal expansion and Sanger sequencing of clonal VDJ rearrangements:
Clonal IGH VH-JH PCR products from tumor samples were subcloned into pCR4-TOPO-TA vectors (Life Technologies, Carlsbad, CA) according to the manufacturer's instructions and expanded in bacterial colonies. We picked and sequenced between 8 and 59 individual colonies per tumor sample (median 36 colonies) via colony-PCR using M13 primers on a 3500 Genetic Analyzer (Life Technologies). Sanger sequencing was conducted with the BigDye Terminator v1.1 Cycle Sequencing Kit (Life Technologies).
454 sequencing of clonal VDJ rearrangements:
To perform 454 sequencing analysis of the rearranged IGHV loci, barcoded amplicons were prepared for NGS analysis by adding 5' linker sequences to IGHV-FR1 gene segment family primers and the consensus JH-primer from the original primer sets published by the BIOMED-2 / EuroClonality consortium.1 All amplifications were performed using a two-step PCR in a total volume of 50 µl. The first-round PCR, using 200 ng genomic DNA and 2.5 U FastStart High Fidelity polymerase (Roche) for 35 cycles, was followed by a second amplification step using a 1/500 dilution of the first-round PCR product as a template. During this second PCR step, adaptors including multiplex-identifier (MID) and sequencing adapter sequences for emulsion-PCR and 454 sequencing were added to both ends of the amplicons, applying universal-tailed fusion primers for bi-directional sequencing according to the manufacturer's protocol. Parallel pyrosequencing was performed on a GS-Junior (Roche Diagnostics, Mannheim, Germany) following the manufacturer's instructions. Between 1120 and 19311 reads per sample (median 8600) were evaluable. Base calls and quality scores were extracted using the GS-Data-Analysis Software package (Version 2.5; Roche Diagnostics).
Sequencing of candidate genes:
Four different sequencing approaches were taken:
I) To investigate somatic mutations in CREBBP, TNFRSF14, TP53, CDKN2A, EP300, MLL2 and MEF2B, all coding exons of these genes were analyzed.
II) For the genes RHOH, PAX5, IRF4, CIITA, REL and PIM1, which are putative targets of the SHM machinery, we analyzed the region 2.5 kb downstream of the transcription start sites (TSS).
III) The genes BCL2, BCL6 and MYC are associated with somatic mutations in coding regions as well as aberrant SHM, so both regions of these three genes were investigated.
IV) Finally, for detection of somatic mutations in EZH2 and MYD88, we analyzed ±75 bp around the known mutational hotspots (Tyr641 in EZH2 and L265P in MYD88).
In total, we sequenced 176 regions from 18 genes with high coverage, spanning 90,164 bases. The regions analyzed are described in Suppl. Table 2. To achieve an on-target coverage of 1000-10000 reads, the target regions were enriched using the RainDance Technology. RainDance amplification and next generation sequencing were performed as a custom service at Atlas Biolabs.
Supplementary Table 2: Candidate genes for mutation analysis
I) Potential driver mutations – all exons were analyzed
gene / No. exons / chromosome / start (Hg19) / end (Hg19)
CREBBP / 31 / 16 / 3,775,053 / 3,930,123
TNFRSF14 / 7 / 1 / 2,487,802 / 2,495,269
TP53 / 10 / 17 / 7,571,717 / 7,590,865
CDKN2A / 4 / 9 / 21,967,748 / 21,975,124
EP300 / 31 / 22 / 41,488,611 / 41,576,083
MLL2 / 54 / 12 / 49,412,755 / 49,449,109
MEF2B / 13 E / 19 / 19,256,373 / 19,303,402
II) Aberrant somatic hypermutation – 2.5 kb from transcription start site were analyzed
gene / region / chromosome / start (Hg19) / end (Hg19)
RHOH / approx. 2.5kb / 4 / 40,198,527 / 40,201,027
PAX5 / approx. 2.5kb / 9 / 37,031,976 / 37,034,476
IRF4 / approx. 2.5kb / 6 / 391,752 / 394,252
CIITA / approx. 2.5kb / 12 / 10,971,055 / 10,973,555
REL / approx. 2.5kb / 2 / 61,108,752 / 61,111,252
PIM1 / approx. 2.5kb / 6 / 37,137,922 / 37,140,422
III) Potential driver mutations and aberrant somatic hypermutation
gene / region / chromosome / start (Hg19) / end (Hg19)
BCL6 / approx. 2.5kb / 3 / 187,439,162 / 187,463,475
BCL2 / approx. 2.5kb / 18 / 60,790,576 / 60,987,380
MYC / approx. 2.5kb / 8 / 128,747,765 / 128,750,815
IV) Known mutation position
gene / position / chromosome / start (Hg19) / end (Hg19)
MYD88 / L265P / 3 / 38,179,966 / 38,184,514
EZH2 / Tyr641 / 7 / 148,504,461 / 148,581,443
For validation of selected SNVs detected by next generation sequencing in CREBBP, TNFRSF14, TP53, CDKN2A, EP300, MLL2 and MEF2B, Sanger sequencing using an ABI Sequencer 3100 (Applied Biosystems) was performed using the primers presented in Supplementary Table 3.
Supplementary Table 3: Primer sequences
Gene / analyzed region / fwd-primer (Sequence 5'-3') / rev-primer (Sequence 5'-3') / Temp [°C] / Amplicon length / Chrom (HG19) / Start (HG19) / End (HG19)
EZH2* / TYR-641 / tttgtccccagtccattttc / tggcaattcatttccaatca / 55 / 267 bp / 7 / 148508598 / 148508873
TNFRSF14 / Exon 1 / TCCTCTGCTGGAGTTCATCC / CATGGGGAAGAGATCTGTGG / 60 / 209 bp / 1 / 2488044 / 2488252
TNFRSF14 / Exon 2 / ATCTCCCAATGCCTGTCCT / AGAAGGGGGCAAGAGTGTCT / 60 / 202 bp / 1 / 2489135 / 2489336
TNFRSF14 / Exon 3 / TAGCTGGTGTCTCCCTGCTT / GGCTGTGCTGGCCTCTTAC / 60 / 250 bp / 1 / 2489677 / 2489926
TNFRSF14 / Exon 4 / TCCACGTACCCCTCTCAGC / GAAATGGGAGGGGTGTCC / 60 / 228 bp / 1 / 2491224 / 2491451
TNFRSF14 / Exon 6 / CTCCCTGAGGCTGAGTGAAC / GGTGACAGAGCTCCAAGAGG / 60 / 277 bp / 1 / 2493043 / 2493319
TNFRSF14 / Exon 8 / AAAATGAACCCGAGAACCTG / AGGTGGACAGCCTCTTTCAG / 60 / 267 bp / 1 / 2494514 / 2494780
CREBBP / Exon 13 / CATCCTCTGGGGTTGTGAAG / CATGAAATGTGCATTCTGGA / 55 / 401 bp / 16 / 3823635 / 3824033
CREBBP / Exon 14 / TCCATTTCTGGTAGGGACAGGTGC / GGCCCAAAAACAGCAGAGACAGA / 60 / 463 bp / 16 / 3820539 / 3821001
CREBBP / Exon 15 / TTGTAGGTTGCATGAGCAGC / CAGGGATACCCATGGCAG / 55 / 356 bp / 16 / 3819081 / 3819436
CREBBP / Exon 22-23 / GGACGCACACACAGACTTCTAC / AACCAAAGAACAATGGGGAC / 60 / 621 bp / 16 / 3794816 / 3795436
CREBBP / Exon 25 / GGTGTGCAGAAGCACCTTG / GAAGGCTCACAGGCTCCTC / 65 / 306 bp / 16 / 3789484 / 3789789
CREBBP / Exon 26 / aatgacagagcaagaccctg / TTAAAATACCCATTATTTCACGG / 55 / 315 bp / 16 / 3788474 / 3788788
CREBBP / Exon 27 / TAACTCCTTAAAGGCAGGGC / AAAAGGCACACAAATATCCTCC / 55 / 300 bp / 16 / 3786584 / 3786883
CREBBP / Exon 28 / CATGGGACTCTGCCACAC / GACACCACCACAGGAAGGAC / 60 / 388 bp / 16 / 3785931 / 3786318
CREBBP / Exon 29 / TGACCTACTTTGGCCTGAGC / ACTTCCCTCCCACCACAGAC / 65 / 377 bp / 16 / 3781671 / 3782047
CREBBP / Exon 30 / CTATTCTGCAGGCTGGGTG / AAAGGGACAGGATGCTTCG / 60 / 442 bp / 16 / 3781127 / 3781568
CREBBP / Exon 31 / CCTGTACCGGGTGAACATCAAC / GCTGCCTCCGTAACATTTCTCG / 60 / 677 bp / 16 / 3778459 / 3779135
CREBBP / Exon 31 / CCAAGTACGTGGCCAATCAG / ACCGCACCTGGTTACTAAGG / 65 / 717 bp / 16 / 3778015 / 3778731
TP53 / Exon 5-6 / TAGTGGGTTGCAGGAGGTG / tcaaataagCAGCAGGAGAAAG / 65 / 594 bp / 17 / 7578076 / 7578669
TP53 / Exon 12 / TGGGGTAAGGGAAGATTACG / TTCTGACGCACACCTATTGC / 58 / 399 bp / 17 / 7572815 / 7573213
CDKN2A / Exon 1 / AGTTAAGGGGGCAGGAGTG / GGCTCCTCAGTAGCATCAGC / 60 / 246 bp / 9 / 21994174 / 21994419
EP300 / Exon 4 / gaaatagcacattatgactcctacca / tccctggctgtaaaaattgc / 60 / 363 bp / 22 / 41523440 / 41523802
EP300 / Exon 14 / ttctgttctgaattgctgtcttg / atggaaatggcccagaagta / 55 / 558 bp / 22 / 41545721 / 41546278
EP300 / Exon 17 / tggtaactaatttcaaatgcacttttt / tggctatactgtttggaatgtga / 60 / 243 bp / 22 / 41550963 / 41551205
EP300 / Exon 26 / gaactcattatgtgacctgacttttt / tgttacgtaagaactaaaatgaggaaa / 60 / 295 bp / 22 / 41565449 / 41565743
EP300 / Exon 27 / caacttgtggtttaaaatgtagcc / ccagatctattgtcagcacctg / 65 / 285 bp / 22 / 41566333 / 41566617
MLL2 / Exon 3 / gcgtggtactgatgcttgtg / cagcccttatcccatttcct / 60 / 293 bp / 12 / 49448271 / 49448563
MLL2 / Exon 5 / ggctgacactgaggctcttt / tctcatttgccctatgacca / 60 / 235 bp / 12 / 49447723 / 49447957
MLL2 / Exon 6 / gcaatgtgctgaggcttaca / tcctgcccttccattcctac / 60 / 247 bp / 12 / 49447239 / 49447485
MLL2 / Exon 10 / aggagcatcgtgttgttgtg / GGAGACAGGCGAGATGCT / 65 / 490 bp / 12 / 49445745 / 49446234
MLL2 / Exon 10 / CCGCCACCTGAGGAATTG / GTGGGGAAGCAGGTGAGTC / 63 / 463 bp / 12 / 49445338 / 49445800
MLL2 / Exon 10 / GTGTCACGCCTGTCTCCAC / GCATAGGCATGGCTCCTC / 63 / 366 bp / 12 / 49445126 / 49445491
MLL2 / Exon 10 / TGAGGAGCCGCAACTCTG / CTCCTCAGGGGGCTTTTC / 55 / 424 bp / 12 / 49444856 / 49445279
MLL2 / Exon 11 / GGGGACAGTGACCCTGAGT / CCCCCACTACCTTCCCTATG / 65 / 298 bp / 12 / 49444181 / 49444478
MLL2 / Exon 14 / tgactctggtcgcaaatcag / attccccagcctacacctct / 65 / 242 bp / 12 / 49441712 / 49441953
MLL2 / Exon 23 / ctccttgactgccccaca / ccatcaaataacttgccagctc / 65 / 243 bp / 12 / 49437342 / 49437584
MLL2 / Exon 27 / acaggtgggagtggtctgaa / cagatggagggaaaggacaa / 65 / 232 bp / 12 / 49436287 / 49436518
MLL2 / Exon 29 / gcctgccaagtcttctctga / cagttcccacgctaatccat / 65 / 152 bp / 12 / 49435663 / 49435814
MLL2 / Exon 31 / GTTACCCCTCGCTTCCAGTC / GCCCAAAATGGCTGTTGAT / 60 / 385 bp / 12 / 49433851 / 49434235
MLL2 / Exon 31 / TTCACTTTCCCTCAGGCAGT / ggagcgatatagggggctta / 60 / 481 bp / 12 / 49433467 / 49433947
MLL2 / Exon 32 / tgggcttattcctcttctctttt / ccactatcccttgccactct / 60 / 242 bp / 12 / 49433192 / 49433433
MLL2 / Exon 33 / gggccaggatattgaaggtt / atccatcccccttggtttac / 60 / 234 bp / 12 / 49432959 / 49433192
MLL2 / Exon 34 / ttccagGCAACTGGTAGGAG / GTGGGGTGTTGGATGAAGAC / 65 / 493 bp / 12 / 49432286 / 49432778
MLL2 / Exon 34 / GCTGCTGATGCCTCTGAAC / CTGAAAGCTGCTGCTTCTTCT / 65 / 496 bp / 12 / 49431337 / 49431832
MLL2 / Exon 34 / GCATCTGGGGATGAGCTAGA / tggctatgttaccagctgagg / 65 / 575 bp / 12 / 49430884 / 49431457
MLL2 / Exon 35 / cgcagatattcactggagca / gggtgtgactgggaaagaaa / 58 / 237 bp / 12 / 49428543 / 49428779
MLL2 / Exon 38 / tcctgacacccagcttcttt / tctgggtgctaggctgaagt / 60 / 293 bp / 12 / 49427816 / 49428108
MLL2 / Exon 39 / GCACACTAATCTCATGGCAGA / GGATTGCCACCTGTCCTAGA / 65 / 500 bp / 12 / 49427228 / 49427728
MLL2 / Exon 39 / GAAGCCTCGGACCTGATTC / CCTTGCTGTTGGTGCTGTT / 65 / 484 bp / 12 / 49426885 / 49427369
MLL2 / Exon 39 / AGGGCCTTATGGGACACAG / GGCCCATCTGCTGCTGTT / 63 / 396 bp / 12 / 49426559 / 49426955
MLL2 / Exon 39 / TCTCCTCAGCAACAACAGCA / AGGCTGATCCCCTAAGGAAA / 65 / 480 bp / 12 / 49426053 / 49426532
MLL2 / Exon 39 / GCAGCTAGGCAGTGGATCAT / GTGGGGTCTGGCGTACTG / 65 / 374 bp / 12 / 49425764 / 49426137
MLL2 / Exon 39 / AAGGAGTCCTGGCCAAAAAC / GCAGCAGCAGGTGAGACC / 60 / 484 bp / 12 / 49425400 / 49425883
MLL2 / Exon 39 / ACCTCAGGGGCCAACCTT / GTTCCTGGTGCCCCTATTG / 65 / 300 bp / 12 / 49425154 / 49425453
MLL2 / Exon 40 / ggctctgaggaggagggtag / ctatcctgggatgggaccag / 60 / 233 bp / 12 / 49424632 / 49424864
MLL2 / Exon 48 / tacagggcaccctcctacag / ATGTCTCGCGGTACCTTGTC / 60 / 463 bp / 12 / 49420663 / 49421125
MLL2 / Exon 48 / CCTTGCGACCTGACAAGGTA / ACAGGGCCCCTTGATCTTAT / 60 / 371 bp / 12 / 49420323 / 49420693
MLL2 / Exon 50 / ctttggcctaaccccaaaaa / gaccagaggatccctgtcaa / 60 / 249 bp / 12 / 49418299 / 49418547
MLL2 / Exon 51 / cagaggaggtgggtggtatg / gccagctcatacCTGCTCTT / 60 / 368 bp / 12 / 49416361 / 49416728
MLL2 / Exon 52-53 / agaagggaaaggcaggagaa / aggaggaggagctgctttgt / 55 / 491 bp / 12 / 49415780 / 49416270
MLL2 / Exon 54 / gcattgattctgccctcttc / CAATGGCTGCTTCTGTCTGG / 60 / 390 bp / 12 / 49415295 / 49415684
MEF2B / Exon 5 / ggcagacagaggagaggtgt / tcaggtcagtcccttgccta / 60 / 246 bp / 19 / 19261413 / 19261658
MEF2B / Exon 6 / acaccaccccacattcatct / taaagcacgtcagccacaaa / 55 / 389 bp / 19 / 19259911 / 19260299
MEF2B / Exon 10 / gggtgtgggcctcagttt / taaccacccccagtgacagt / 55 / 248 bp / 19 / 19257252 / 19257499
MEF2B / Exon 11 / gaaggcttaaggagatgtccag / gtgcgcagtaccagggatg / 60 / 249 bp / 19 / 19256995 / 19257243
CREBBP / Exon 31 / CACAGCAGCCCAGCACAC / TTGTTGATGTTCACCCGGTA / 60 / 256 bp / 16 / 3779112 / 3779367
* described in (Pellissery et al, 2010)2
Temp indicates the annealing temperature in the PCR.
SNP array analysis:
SNP array experiments were performed according to the standard protocol for Affymetrix GeneChip SNP 6.0 arrays (Affymetrix). Briefly, a 500 ng sample of DNA was digested with StyI and NspI, ligated to adaptors, amplified by PCR, fragmented with DNAse I, and biotin-labeled. The labeled samples were hybridized to Affymetrix GeneChip SNP 6.0 arrays, followed by washing, staining and scanning. The complete dataset comprised 35 FL samples in total (16 cases included within the core set) and 33 lab-specific euploid samples (17 females and 16 males) for controls.
DNA methylation analysis:
Bisulfite conversion of the DNA was performed using the "Zymo EZ DNA methylation Kit" (Zymo Research, Orange, CA) according to the manufacturer's instructions with the modification described in the Infinium Assay Methylation Protocol Guide (Illumina, San Diego, CA). All further analysis steps were performed according to the "Infinium II Assay Lab Setup and Procedures" and the "Infinium Assay Methylation Protocol Guide". The processed DNA samples were hybridized to the HumanMethylation 27 BeadChips (Illumina, San Diego, CA). This array was developed to assay 27,578 CpG sites selected from more than 14,000 genes. Raw hybridization signals were processed using Bead Studio software (version 3.1.3.0, Illumina) applying the default settings.
3. Bioinformatic and statistical analyses
Detecting selection by mutation analysis in the IGHV region sequences
The objective of the mutation analysis was to compare the ratio of the observed number of replacement mutations to the observed number of silent mutations with the ratio expected under the assumption of no selection. This comparison was made for both the structural regions of the heavy chain known as "framework regions" (FWR) and the "complementarity-determining regions" (CDR).
For each tumour sample the analysis consisted of the following steps:
(1) All tumour IGHV sequences were aligned with their most likely germline sequences using the IMGT/HighV-QUEST online tool.3
(2) The number of different replacement (R) and silent (S) mutations in the set of clonally related sequences was determined for FWR and CDR resulting in counts Rfwr, Sfwr, Rcdr, Scdr.
(3) A model4,5 for SHM assuming no selection, accounting for micro-sequence specificity of SHM targets and transition bias, was used to determine expected counts given the total number of observed mutations, resulting in numbers eRfwr, eSfwr, eRcdr, eScdr. Steps (2) and (3) were performed using the web server
(4) The web tool provides p-values for the null hypothesis of no selection (separately for FWR and CDR) using the so-called focused binomial test. P-values do not lend themselves to meaningful meta-analyses. In addition, we wanted quantitative comparisons of the strength of selection within tumour pairs from the same patient. Therefore, we defined logRSoddsratio = log( (R/S) / (eR/eS) ) as a quantitative measure of selection strength (compare Yaari et al, 2012)6. This quantity compares the observed R/S ratio to the one expected under the null hypothesis of no selection. The logarithm transforms the measure to its natural scale such that the estimates are approximately normally distributed.
(5) The logRSoddsratio can be estimated using the numbers from steps (2) and (3). In line with the 'focused binomial test' outlined in Uduman et al, 2011,4 and Hershberg et al, 2008,5 we gain power by assuming that silent mutations are neutral with respect to selection and thus Sfwr/Scdr = eSfwr/eScdr (we assume the mutation model that generates the expectations). Under this assumption, logRSoddsratio = log( (R/S) / (eR/eS) ) = log( (R/(Sfwr + Scdr)) / (eR/(eSfwr + eScdr)) ), where R and eR refer to the region under study (FWR or CDR). 95% confidence intervals can be obtained by sampling from the posterior distribution Dirichlet(eRfwr/E + Rfwr, eRcdr/E + Rcdr, (eSfwr + eScdr)/E + (Sfwr + Scdr)) [with E = eRfwr + eSfwr + eRcdr + eScdr]. P-values dual to these CIs are in good concordance with the p-values of the focused binomial test.
(6) Standard methods7 for fixed and random effect meta-analyses and forest-plots are used to analyse logRSoddsratios across samples.
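Steps (4) and (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the natural logarithm is assumed, the counts in the example are invented, the function names are ours, and the Dirichlet draws are generated from independent Gamma variates (the normalising constant cancels in the ratio of two components). The point estimate requires non-zero observed and expected counts.

```python
import math
import random

def log_rs_odds_ratio(r, s_total, e_r, e_s_total):
    """log((R/S) / (eR/eS)) with silent counts pooled over FWR and CDR."""
    return math.log((r / s_total) / (e_r / e_s_total))

def posterior_interval(obs, exp, n_draws=20000, level=0.95, seed=1):
    """Credible intervals for the FWR and CDR logRSoddsratio.

    obs, exp: dicts with keys 'Rfwr', 'Sfwr', 'Rcdr', 'Scdr' holding the
    observed and expected counts. Draws come from
    Dirichlet(eRfwr/E + Rfwr, eRcdr/E + Rcdr, (eSfwr+eScdr)/E + (Sfwr+Scdr)).
    """
    e_total = sum(exp.values())
    s_obs = obs["Sfwr"] + obs["Scdr"]
    s_exp = exp["Sfwr"] + exp["Scdr"]
    alpha = {
        "fwr": exp["Rfwr"] / e_total + obs["Rfwr"],
        "cdr": exp["Rcdr"] / e_total + obs["Rcdr"],
        "S": s_exp / e_total + s_obs,
    }
    rng = random.Random(seed)
    draws = {"fwr": [], "cdr": []}
    for _ in range(n_draws):
        # Gamma(alpha_i, 1) variates; dividing by their sum would give a
        # Dirichlet draw, but the sum cancels in the ratios used below.
        g = {k: rng.gammavariate(a, 1.0) for k, a in alpha.items()}
        for region in ("fwr", "cdr"):
            e_r = exp["R" + region]
            draws[region].append(math.log((g[region] / g["S"]) / (e_r / s_exp)))
    lo_idx = int((1 - level) / 2 * n_draws)
    hi_idx = n_draws - 1 - lo_idx
    return {k: (sorted(v)[lo_idx], sorted(v)[hi_idx]) for k, v in draws.items()}

# Hypothetical counts for one sample (not taken from the study data).
obs = {"Rfwr": 10, "Sfwr": 5, "Rcdr": 8, "Scdr": 2}
exp = {"Rfwr": 12.0, "Sfwr": 4.0, "Rcdr": 6.0, "Scdr": 3.0}

point_fwr = log_rs_odds_ratio(obs["Rfwr"], 7, exp["Rfwr"], 7.0)
point_cdr = log_rs_odds_ratio(obs["Rcdr"], 7, exp["Rcdr"], 7.0)
intervals = posterior_interval(obs, exp)
print(point_fwr, intervals["fwr"])
print(point_cdr, intervals["cdr"])
```

A negative value indicates fewer replacement mutations than expected without selection (negative selection), a positive value more (positive selection).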
We further wanted to distinguish evolution in times before tumor onset and after tumor initiation. Therefore, three computations were performed for each sample, each time using a different rooting sequence. Supplementary Figure 1 illustrates the three types of reference sequences:
- The first rooting sequence was a consensus germline sequence constructed from all germline sequences assigned to the tumour sequences of the patient using the IMGT/V-QUEST online tool. Bases which differed among these germline sequences were substituted by "N" (any base) in the consensus germline sequence.
- The second rooting sequence consisted of bases common to all sequences from the primary tumour and the relapse tumour of the same patient. At positions where we observed different bases among these tumour sequences we inserted the corresponding base from the consensus germline sequence (the first reference sequence) into this common rooting sequence.
- The third rooting sequence was constructed from bases common to the sequences found in each single tumour sample. Positions which varied among these sequences were filled with the corresponding base from the germline consensus sequence.
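The construction of the three rooting sequences can be illustrated with a short Python sketch. It assumes that all sequences are already aligned to the same length; the function names and the toy sequences are hypothetical and serve only to show the rule "keep bases common to all sequences, otherwise fall back to the germline consensus".

```python
def consensus_germline(germline_seqs):
    """First rooting sequence: positions where germline sequences differ become 'N'."""
    return "".join(
        col[0] if len(set(col)) == 1 else "N"
        for col in zip(*germline_seqs)
    )

def rooting_sequence(tumour_seqs, germline_consensus):
    """Second/third rooting sequence: bases common to all given tumour sequences;
    variable positions are filled with the germline consensus base."""
    return "".join(
        col[0] if len(set(col)) == 1 else germ
        for col, germ in zip(zip(*tumour_seqs), germline_consensus)
    )

# Toy aligned sequences (hypothetical, 4 bases long).
germlines = ["ACGT", "ACGA"]
primary = ["ACTT", "ACTA"]
relapse = ["AGTT", "ACTT"]

germline_root = consensus_germline(germlines)                    # "ACGN"
common_root = rooting_sequence(primary + relapse, germline_root) # "ACTN"
primary_root = rooting_sequence(primary, germline_root)          # "ACTN"
relapse_root = rooting_sequence(relapse, germline_root)          # "ACTT"
print(germline_root, common_root, primary_root, relapse_root)
```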
Supplementary Figure 1. An illustration of the chronological position of the three types of rooting sequences used for detecting selection in each sample.
Using three different rooting sequences allowed us to investigate how strongly selection acted on the observed sequences
a) at any time since the VHDHJH recombination of the B-cell from which the primary and the relapse tumours originated (evaluation with respect to the germline rooting sequence),
b) since the time of the last common somatic mutation in the precursor of the primary tumour and the relapse (evaluation with respect to the common tumor rooting sequence), and
c) since the last somatic mutation which was common to the sequences of the investigated tumour sample (evaluation with respect to the tumor specific rooting sequence), respectively.
NGS candidate genes
Sequence data of all 69 samples were mapped to hg19 using the segemehl algorithm8 with default parameters. Samtools mpileup (version 0.1.18)9 was applied to each sample with parameter "-d" set to 25000, thus allowing a maximum coverage of 25000 reads per position. For further analysis, positions with effective enrichment were selected: all positions within enriched genomic regions (see Supplementary Table 2) with coverage >1000 reads and coverage of high-quality (HQ) reads >500 (Phred quality score Q≥13) were selected for further investigation. Median HQ-base coverage over all target positions and lymphoma samples was 5343 bases (range: 4149-6653). Over all analyzed samples, >99.3% of the enriched genomic positions showed both a coverage >1000 and a HQ coverage >500.
Prior to variant calling, the numbers of reference and alternative alleles were summarized for each position, separately for the forward and reverse strands. This was repeated for high-quality bases (Q≥13).
For the variant calling, the proportion of the most frequent alternative allele was analyzed for each position in each lymphoma sample. A variant was called if ≥10% of all HQ-bases showed a concordant alternative allele.
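The position selection and the calling rule can be written down compactly. The following Python sketch is one possible reading of the criteria above (coverage >1000, HQ coverage >500, most frequent alternative allele in ≥10% of HQ bases); the function name, argument layout and example counts are assumptions made for illustration.

```python
def call_variant(total_cov, hq_counts, ref_base,
                 min_cov=1000, min_hq_cov=500, min_fraction=0.10):
    """Return (alt_base, fraction) if a variant is called at this position, else None.

    total_cov: number of reads covering the position (all qualities).
    hq_counts: dict base -> number of high-quality (Q>=13) bases.
    """
    hq_cov = sum(hq_counts.values())
    # Position selection: coverage >1000 and HQ coverage >500.
    if total_cov <= min_cov or hq_cov <= min_hq_cov:
        return None
    alt_counts = {b: n for b, n in hq_counts.items() if b != ref_base}
    if not alt_counts:
        return None
    # Most frequent alternative allele among the HQ bases.
    alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
    fraction = alt_n / hq_cov
    return (alt_base, fraction) if fraction >= min_fraction else None

# Hypothetical pileup counts at three positions with reference base "A".
called = call_variant(6000, {"A": 4500, "G": 600, "T": 10}, "A")
below_threshold = call_variant(6000, {"A": 5000, "G": 100}, "A")
low_coverage = call_variant(900, {"A": 500, "G": 300}, "A")
print(called, below_threshold, low_coverage)
```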
To achieve high specificity and avoid false-positive variant calls, additional quality filters were applied. Variants showing a proportion of low-quality reads of >40% were rejected. In addition, variants with a high allelic imbalance between forward and reverse strand for reference and mutant alleles were removed. To this end, for each variant the |logOR| was calculated from the numbers of reference and mutated alleles on the forward and reverse strands. To avoid zeros, 0.5 was added to each count. Variants with |logOR| ≥ 5 were removed from further analysis.
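As an illustration of the strand-imbalance filter, the Python sketch below computes the log odds ratio of the 2×2 table (reference/mutant allele by forward/reverse strand) with 0.5 added to each cell. The natural logarithm, the function names and the example counts are assumptions; the proportion of low-quality reads is passed in as a precomputed value because its exact definition is not spelled out here.

```python
import math

def strand_log_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev, pseudo=0.5):
    """Log odds ratio of allele (ref/alt) versus strand (forward/reverse)."""
    a, b, c, d = (x + pseudo for x in (ref_fwd, ref_rev, alt_fwd, alt_rev))
    return math.log((a * d) / (b * c))

def passes_quality_filters(ref_fwd, ref_rev, alt_fwd, alt_rev,
                           low_quality_fraction,
                           max_low_quality=0.40, max_abs_log_or=5.0):
    """Apply both filters: low-quality proportion >40% and |logOR| >= 5 are rejected."""
    if low_quality_fraction > max_low_quality:
        return False
    log_or = strand_log_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return abs(log_or) < max_abs_log_or

# Hypothetical counts: a strand-balanced variant and one seen on a single strand only.
balanced = passes_quality_filters(2000, 2000, 300, 300, low_quality_fraction=0.05)
one_strand = passes_quality_filters(2000, 2000, 600, 0, low_quality_fraction=0.05)
noisy = passes_quality_filters(2000, 2000, 300, 300, low_quality_fraction=0.50)
print(balanced, one_strand, noisy)  # True False False
```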
After filtering, all variants were annotated with dbSNP build 135 for overlap with known single nucleotide polymorphisms. All variants overlapping positions with known SNPs were excluded from further analysis. In addition, functional annotation was added to all variants using vcfCodingSnps version 1.5.
Supplementary Figure 2. A) Histogram of the differences of the allele frequencies for detected mutations on the logit scale. The histogram indicates 3 peaks representing mutations present in both samples or exclusively in either the primary (PT) or the relapse tumor (RT). Thresholds of -2 and 2 (indicated by red vertical lines) are applied to distinguish these 3 groups. B) Allele frequency of mutated alleles for paired primary and relapse tumors. Red dots indicate mutations selected as differential between primary and relapse sample using the thresholds described in A).
To compare mutations between paired samples of the same patient, the frequency of the mutant allele was compared for each variant identified (see Supplementary Table 5). If a variant was called in only one sample, the allele frequency in the second sample was calculated from the raw data. When the differences of the allele frequencies are compared on the logit scale for all non-SNP positions, 3 groups appear (see tri-modal histogram in Supplementary Figure 2a). Thresholds of 2 and -2 allow concordant variants to be distinguished from variants present in only one of the lymphoma samples (see Supplementary Figure 2b).
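The classification of a variant by its logit-scale allele frequency difference can be sketched as follows. Several details are assumptions made for this illustration only: the difference is taken as relapse minus primary, frequencies of exactly 0 or 1 are clipped by a small epsilon so that the logit stays finite, and differences exactly at the threshold are treated as concordant.

```python
import math

def logit(p, eps=1e-4):
    """Logit of an allele frequency, clipped to [eps, 1 - eps] (assumed handling of 0 and 1)."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def classify_variant(af_primary, af_relapse, threshold=2.0):
    """Assign a variant to 'PT only', 'RT only' or 'both' using thresholds of -2 and 2."""
    diff = logit(af_relapse) - logit(af_primary)
    if diff > threshold:
        return "RT only"
    if diff < -threshold:
        return "PT only"
    return "both"

# Hypothetical allele frequencies for three variants of one tumor pair.
print(classify_variant(0.30, 0.35))   # both
print(classify_variant(0.30, 0.0))    # PT only
print(classify_variant(0.005, 0.25))  # RT only
```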
The proportion of discordant variants within pairs was determined separately for candidate genes and for the regions 2.5 kb downstream of the TSS (non-IG SHM targets) and used as a summary measure of divergence. A schematic overview of the analysis pipeline for the sequencing data is shown in Supplementary Figure 3.