Appendix:

Genetic Diversity of Newly Diagnosed Follicular Lymphoma

Asmann, YW et al

METHODS

Patients and tissue samples:

Patients with FL were diagnosed and classified according to World Health Organization criteria. Fresh frozen tumors and paired peripheral blood samples (referred to as “normal”) were obtained from the Mayo Lymphoma Molecular Epidemiology Resource (MER). The characteristics of the 8 FL patients are listed in Table 1 in the main text. Exome sequencing of these 8 FL tumor-normal pairs was performed at the Mayo Clinic Advanced Genomics Technology Center. The coding regions of the genome were captured using the SureSelect Target Enrichment System version 2.0 (Agilent; Santa Clara, CA), which targets ~36 Mb of coding exons, and 100-bp paired-end sequencing was carried out on an Illumina HiSeq 2000 sequencer. Whole-genome mate-pair libraries with 5-kb inserts were prepared from both normal and tumor samples and sequenced as 50-bp paired-end reads on the Illumina HiSeq 2000. For RNA sequencing of the 8 FL tumors, total RNA was extracted using Exiqon’s miRCURY RNA Isolation Kit, mRNA was purified, and sequencing libraries were constructed according to the Illumina TruSeq protocol. Library fragments of 150-250 bp were selected for sequencing, and 50-bp paired-end sequencing was performed.

Exome Sequencing and Variant Calling:

The quality of the raw sequence reads was checked with FastQC (http://seqanswers.com/wiki/FastQC); the paired-end reads were then aligned to Human Reference Genome Build 37 using BWA [1], and locally realigned and base-quality recalibrated using GATK [2]. Within the TREAT analytic workflow [3], single nucleotide variants (SNVs) were called using SNVMix [4] and small insertions and deletions (INDELs) using GATK. The identified variants were annotated using both SeattleSeq (http://snp.gs.washington.edu/SeattleSeqAnnotation134/) and SIFT [5]. For the current study, we included only the following variants in our analyses: (i) frame-shift and splice-site INDELs; (ii) nonsense and splice-site SNVs; and (iii) non-synonymous (missense) SNVs. Tumor-specific (somatic) INDELs and SNVs were identified as follows. First, we required a minimum sequencing depth of 8 reads at each variant site in both the tumor and normal samples. Somatic INDELs had to be supported by at least 3 reads and be present in the tumor but not in the paired normal sample. Somatic SNVs were defined by a chi-square test p value ≤ 0.01 at the variant site, computed from the read depths in the tumor and normal samples and the numbers of reads supporting the alternative allele in each. In addition, we filtered out polymorphic positions with a variant allele frequency (VAF) > 0.01 in the 5,500 subjects of the Exome Project (http://exome.gs.washington.edu/) or a minor allele frequency (MAF) > 0.05 in dbSNP version 134.
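The somatic filtering rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TREAT implementation: the function names are hypothetical, the chi-square test is assumed to be a Pearson test without continuity correction on a 2x2 table of alternative versus reference reads (reference count taken as depth minus alternative count), the extra "enriched in tumor" check is an added assumption, and a missing population frequency is assumed to mean the site is absent from that resource.

```python
import math

MIN_DEPTH = 8          # minimum read depth in both tumor and normal
MIN_INDEL_READS = 3    # minimum tumor reads supporting an INDEL
SNV_P_CUTOFF = 0.01    # chi-square p value threshold for somatic SNVs
MAX_ESP_AF = 0.01      # population allele frequency cutoff (Exome Project)
MAX_DBSNP_MAF = 0.05   # minor allele frequency cutoff (dbSNP 134)


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p value (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a degenerate table carries no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def has_min_depth(tumor_depth, normal_depth):
    return tumor_depth >= MIN_DEPTH and normal_depth >= MIN_DEPTH


def is_somatic_snv(tumor_depth, tumor_alt, normal_depth, normal_alt):
    if not has_min_depth(tumor_depth, normal_depth):
        return False
    # Rows: tumor, normal; columns: alternative reads, reference reads.
    p = chi2_2x2_pvalue(tumor_alt, tumor_depth - tumor_alt,
                        normal_alt, normal_depth - normal_alt)
    # Assumption: also require the alternative allele to be enriched in the tumor.
    enriched = tumor_alt / tumor_depth > normal_alt / normal_depth
    return p <= SNV_P_CUTOFF and enriched


def is_somatic_indel(tumor_depth, tumor_alt, normal_depth, normal_alt):
    return (has_min_depth(tumor_depth, normal_depth)
            and tumor_alt >= MIN_INDEL_READS
            and normal_alt == 0)


def passes_population_filter(esp_af=None, dbsnp_maf=None):
    """Drop polymorphic sites; None means the site is absent from that resource."""
    if esp_af is not None and esp_af > MAX_ESP_AF:
        return False
    if dbsnp_maf is not None and dbsnp_maf > MAX_DBSNP_MAF:
        return False
    return True
```

For example, a site with 15 of 40 tumor reads and 0 of 40 normal reads supporting the alternative allele passes (`is_somatic_snv(40, 15, 40, 0)` is `True`), whereas 20 of 40 versus 18 of 40 does not, as expected for a germline heterozygous site.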

RT-PCR and Sanger Sequencing Validation of Somatic CRIPAK Mutations:

Frozen tumors from 20 FL and 31 DLBCL patients were obtained from the Mayo Specialized Program of Research Excellence (SPORE) Molecular Epidemiology Resource. Total RNA was extracted from the tissues using the QIAGEN RNeasy Mini Kit and reverse transcribed into cDNA. The single 1,341-nt exon of the CRIPAK gene, plus 250 bp of the 5’ UTR and 50 bp of the 3’ UTR, was amplified by polymerase chain reaction (PCR) (forward primer: 5’ GGGCATCTCGTTCCTCAGAT 3’; reverse primer: 5’ AGCACCAGGCTAACAAATCAGTCC 3’). The PCR-amplified cDNA was then sequenced using Sanger technology.
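As a quick sanity check on the CRIPAK primer pair, the short Python sketch below computes each primer's GC content, the reverse complement of the reverse primer (the sequence it matches on the sense strand of the cDNA), and an approximate amplicon length. The amplicon estimate is illustrative only: it assumes the primers sit exactly at the boundaries of the 250-bp 5’ UTR and 50-bp 3’ UTR flanks described above.

```python
FORWARD = "GGGCATCTCGTTCCTCAGAT"
REVERSE = "AGCACCAGGCTAACAAATCAGTCC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Approximate amplicon: 5' UTR flank + the single exon + 3' UTR flank
# (assumes the primers sit exactly at the flank boundaries).
EXPECTED_AMPLICON_BP = 250 + 1341 + 50

for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(f"{name}: {len(primer)} nt, GC = {gc_fraction(primer):.2f}")
print("reverse primer on sense strand:", reverse_complement(REVERSE))
print("approximate amplicon length:", EXPECTED_AMPLICON_BP, "bp")
```

Running it reports GC contents of 0.55 (forward) and 0.50 (reverse), the sense-strand sequence GGACTGATTTGTTAGCCTGGTGCT for the reverse primer, and an amplicon of roughly 1,641 bp.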

Regulatory Network Analysis of Frequently Mutated Genes in NHL:

Regulatory network analysis of multiple genes was performed using the shortest-path algorithm in MetaCore (GeneGo Inc.) and the gene-gene relationships annotated in the MetaCore Knowledge Database (6.11 build 41105). Network statistics were calculated, and hub genes were defined as network nodes whose number of connections (edges) exceeds 25% of the total number of nodes in the network.
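The hub-gene rule can be illustrated with a small Python sketch. It assumes the network is given as a list of gene-gene edges and treats it as undirected with duplicate edges counted once; MetaCore relationships are directional, so this is a simplification, and the function name is hypothetical.

```python
from collections import defaultdict


def hub_genes(edges, nodes=(), fraction=0.25):
    """Return nodes whose degree exceeds `fraction` of the total node count.

    The network is treated as undirected; duplicate edges are counted once.
    `nodes` lets isolated genes (no edges) count toward the total.
    """
    neighbors = defaultdict(set)
    for node in nodes:
        neighbors.setdefault(node, set())  # register isolated nodes
    for a, b in edges:
        if a == b:
            continue  # self-loops do not add connections to other genes
        neighbors[a].add(b)
        neighbors[b].add(a)
    threshold = fraction * len(neighbors)
    return sorted(n for n, nbrs in neighbors.items() if len(nbrs) > threshold)


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("F", "G"), ("G", "H")]
print(hub_genes(edges))
```

In this toy network of 8 genes the threshold is 2 connections, so only gene A (4 connections) is reported as a hub.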

Protein-Protein Interaction Distance:

Protein-protein interaction (PPI) relationships were retrieved from the Human Protein Reference Database (HPRD) [6]. For a given protein set $G$ with $N$ proteins, the pair-wise shortest distance $D_G$ in the retrieved PPI network was defined as the average shortest distance over all pairs of proteins in $G$, and was computed as $D_G = \frac{2}{N(N-1)} \sum_{i<j} d(p_i, p_j)$, where $d(p_i, p_j)$ is the length of the shortest path between proteins $p_i$ and $p_j$. To construct a null model assessing how the observed statistic differs from random cases, we sampled 10,000 random protein sets of $N$ proteins each, computed the corresponding values of $D$, and compared $D_G$ for the actual set $G$ with the distribution of $D$ values from the random samplings.
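The distance statistic and its random-sampling null model can be sketched in Python as below. This is an illustration under assumptions, not the authors' code: the PPI network is assumed to be a symmetric adjacency dictionary containing every protein as a key, shortest paths are unweighted (breadth-first search), pairs with no connecting path are skipped, and the empirical p value is a one-sided test of whether the set is closer than random, with the usual +1 correction.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from `source` by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(adj, proteins):
    """Average shortest distance over all pairs in `proteins`.

    Assumption: pairs with no connecting path are skipped.
    """
    proteins = list(proteins)
    total, pairs = 0, 0
    for i, source in enumerate(proteins):
        dist = bfs_distances(adj, source)  # one BFS per protein
        for target in proteins[i + 1:]:
            if target in dist:
                total += dist[target]
                pairs += 1
    return total / pairs if pairs else float("inf")


def permutation_test(adj, proteins, n_samples=10_000, seed=0):
    """Compare the observed distance with random protein sets of equal size.

    Returns (observed, empirical one-sided p value for "closer than random").
    """
    rng = random.Random(seed)
    universe = sorted(adj)
    observed = mean_pairwise_distance(adj, proteins)
    hits = sum(
        mean_pairwise_distance(adj, rng.sample(universe, len(proteins))) <= observed
        for _ in range(n_samples)
    )
    return observed, (hits + 1) / (n_samples + 1)
```

On the path network 1-2-3-4-5-6, for example, the set {1, 2, 3} has pairwise distances 1, 2 and 1, so its average distance is 4/3.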

RNA-Seq Data Analysis and Fusion Transcript Detection:

Gene-level mRNA expression was calculated using HTSeq (http://seqanswers.com/wiki/HTSeq) after BWA alignment of the paired-end reads to both the reference genome (Build 37) and exon junctions. Fusion transcripts and their isoforms were identified using the SnowShoes-FTD algorithm [7]. We required that the two fusion partner genes be on different chromosomes, or at least 50,000 bp apart if on the same chromosome, and that each fusion transcript be supported by at least 3 encompassing read pairs and 2 unique reads spanning the fusion junction. In addition, we allowed up to ten isoforms between two fusion partners.
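The fusion-calling criteria above can be expressed as a small Python filter. This sketch does not reproduce SnowShoes-FTD's data model; the `FusionCandidate` class is hypothetical, the 50,000-bp distance is assumed to be the gap between the two gene intervals, and "unique" spanning reads are assumed to mean reads with distinct alignment start positions.

```python
from dataclasses import dataclass, field

MIN_GENE_GAP_BP = 50_000
MIN_ENCOMPASSING_PAIRS = 3
MIN_UNIQUE_SPANNING_READS = 2


@dataclass
class FusionCandidate:
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    encompassing_pairs: int
    spanning_read_starts: list = field(default_factory=list)


def partners_far_enough(c):
    if c.chrom_a != c.chrom_b:
        return True
    # Gap between the two gene intervals (negative if they overlap).
    gap = max(c.start_a, c.start_b) - min(c.end_a, c.end_b)
    return gap >= MIN_GENE_GAP_BP


def passes_fusion_filters(c):
    # Assumption: reads with the same start position are duplicates.
    unique_spanning = len(set(c.spanning_read_starts))
    return (partners_far_enough(c)
            and c.encompassing_pairs >= MIN_ENCOMPASSING_PAIRS
            and unique_spanning >= MIN_UNIQUE_SPANNING_READS)
```

Under these assumptions, a candidate on two different chromosomes with 3 encompassing pairs and spanning reads starting at two distinct positions passes, while the same candidate whose spanning reads all share one start position is rejected.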

Detection of Somatic Copy Number Variants in FL Tumors:

Exon-level copy number variants (CNVs) were detected in the paired tumor-normal exome sequencing data using PatternCNV, an algorithm developed in-house. PatternCNV is based on the observation that in exome sequencing data the distribution of mapped reads, although not uniform among different exons within a sample, is consistent for each exon across different samples when there are no CNV events in the region [8]. In addition, genome-level CNVs were identified from the paired tumor-normal mate-pair DNA sequencing data using an extended version of PatternCNV.
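The underlying idea, that per-exon read counts become comparable across samples once normalized, can be illustrated with a simple tumor-versus-normal log2 ratio. This is not PatternCNV, which exploits the consistency of each exon's coverage across samples [8]; the sketch below only median-centers per-exon log2 ratios for a single pair, and the pseudocount and ±0.4 thresholds are hypothetical.

```python
import math
from statistics import median


def exon_log2_ratios(tumor_cov, normal_cov, pseudocount=1.0):
    """Per-exon log2(tumor/normal) coverage ratios, median-centred.

    Median centring assumes most exons are copy-neutral, which also
    absorbs differences in total sequencing depth between the two samples.
    """
    raw = [math.log2((t + pseudocount) / (n + pseudocount))
           for t, n in zip(tumor_cov, normal_cov)]
    center = median(raw)
    return [r - center for r in raw]


def call_exons(tumor_cov, normal_cov, gain=0.4, loss=-0.4):
    """Label each exon 'gain', 'loss' or 'neutral' (thresholds are hypothetical)."""
    calls = []
    for r in exon_log2_ratios(tumor_cov, normal_cov):
        if r >= gain:
            calls.append("gain")
        elif r <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


print(call_exons([100, 200, 25, 600], [100, 200, 50, 300]))
```

Here the third exon has half the normal coverage and the fourth has double, so the calls are neutral, neutral, loss, gain.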

Detection of Somatic Copy Neutral Structural Variants:

Large structural variants (SVs), including copy-neutral SVs such as translocations and inversions, were detected in the paired tumor-normal whole-genome mate-pair DNA sequencing data using SnowShoes-SV (Asmann et al., manuscript submitted), an in-house algorithm based on discordant mapping of read pairs. SnowShoes-SV performs an exhaustive search for SVs; false positives were filtered out using both the paired normal samples and a pool of Mayo Biobank control subjects, as well as the alignment features of the candidate SV regions.
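The discordant read-pair principle, together with filtering against the paired normal, can be sketched in Python as follows. This is a simplified illustration, not SnowShoes-SV: the `ReadPair` class is hypothetical, library-specific mate-pair orientation is reduced to a same-strand check, pairs with short spans are simply treated as concordant, and the 8-kb span limit, 10-kb binning and 3-pair support threshold are hypothetical. The Mayo Biobank control pool and alignment-feature filters are not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p, max_span=8_000):
    """Label one mapped pair; max_span (above a ~5-kb insert) is hypothetical."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"  # translocation-type signal
    if p.strand1 == p.strand2:
        return "same_strand"  # inversion-type signal
    if abs(p.pos2 - p.pos1) > max_span:
        return "long_span"  # deletion-type signal
    return "concordant"


def cluster_key(p, label, bin_bp=10_000):
    """Bin both ends so that pairs supporting the same event share a key."""
    ends = sorted([(p.chrom1, p.pos1 // bin_bp), (p.chrom2, p.pos2 // bin_bp)])
    return (label, *ends[0], *ends[1])


def discordant_clusters(pairs):
    counts = Counter()
    for p in pairs:
        label = classify_pair(p)
        if label != "concordant":
            counts[cluster_key(p, label)] += 1
    return counts


def somatic_sv_clusters(tumor_pairs, normal_pairs, min_support=3):
    """Tumor clusters with enough support and no supporting pair in the normal."""
    tumor = discordant_clusters(tumor_pairs)
    normal = discordant_clusters(normal_pairs)
    return {k: n for k, n in tumor.items() if n >= min_support and normal[k] == 0}
```

A cluster of inter-chromosomal pairs seen only in the tumor is kept, while the same cluster is discarded as germline as soon as one supporting pair appears in the paired normal.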

RESULTS

Sequencing Statistics: The exomes of the 8 FL tumor-normal pairs were sequenced at depths of 107-164 million 100-bp paired-end reads per sample, with ~45% of reads on target, yielding ≥10-fold coverage of 90% of the targeted regions in all samples. The tumor and paired normal mate-pair libraries were sequenced at 127-206 million 50-bp paired-end reads, with 58-62% of reads mapped to the genome. The eight tumor RNA samples were sequenced at depths of 136-186 million 50-bp paired-end reads per sample, with 58-64% of reads mapped to known genes and exon junctions.
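The exome figures can be related with a back-of-the-envelope estimate of mean on-target depth: bases sequenced on target divided by the ~36-Mb target size given in the Methods. The sketch assumes the 107-164 million figures count individual reads (if they are read pairs, the estimates double) and ignores duplicate reads.

```python
def mean_on_target_depth(n_reads, read_length_bp, on_target_fraction, target_size_bp):
    """Expected mean depth = bases sequenced on target / size of the target."""
    return n_reads * read_length_bp * on_target_fraction / target_size_bp


TARGET_BP = 36e6  # ~36 Mb SureSelect v2 target (Methods)
for n_reads in (107e6, 164e6):
    depth = mean_on_target_depth(n_reads, 100, 0.45, TARGET_BP)
    print(f"{n_reads / 1e6:.0f} M reads -> ~{depth:.0f}x mean on-target depth")
```

Under these assumptions the mean on-target depth is roughly 134x to 205x. Because exome coverage is uneven across targets, a high mean depth is consistent with, but does not by itself guarantee, ≥10-fold coverage of 90% of the targeted regions.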

The Diversity of Mutational Landscape in FL: The 8 FL patients were clinically diverse (Table 1, main text). Four grade 1-2 tumors were classified as indolent, and two grade 3A tumors plus two grade 1-2 tumors that subsequently transformed were classified as aggressive. Two of the patients received no initial treatment after diagnosis (observation only); three patients remain event-free after 46, 79, and 100 months, while two patients had subsequent transformation of their tumors. Interestingly, the two grade III FL tumors, from patients #7 and #8, harbor the most genomic abnormalities, with: (i) the highest number of genes with point mutations (SNVs and short INDELs; Figure 1b middle panel, and Supplement File S1); (ii) the highest number of genes affected by copy number aberrations (Figure 1b upper panel, and Supplement File S2); and (iii) a substantially higher number of large structural variants than the other six tumors (Figure 1b lower panel, and Supplement Files S3, S7). The genomic diversity of these tumors appeared to parallel the clinical diversity of the patients.

As shown in Figure 1c and Supplement Files S3 and S7, we identified several recurrent alterations, including the well-characterized t(14;18) translocation in 1 of the 2 grade 3A tumors and in 5 of the 6 grade 1-2 tumors. A chr1q amplification was observed in 4 of the 8 samples (Figures 1a and 1c; Supplement Files S3, S7). In addition, recurrent point mutations were found in previously reported lymphoma genes. The histone methyltransferase gene MLL2 was mutated in 3 of 8 patients, and the histone acetyltransferase gene CREBBP in 2 of 8 patients. Histone cluster genes and HLA genes were also mutated in 2 and 3 of 8 cases, respectively. We also identified recurrent point mutations in CRIPAK (cysteine-rich PAK1 inhibitor) and copy number deletions of the tumor suppressor gene DMBT1 (deleted in malignant brain tumors 1), each in 2 cases (Supplement File S2). The mutational landscape of individual tumors is discussed in detail below.
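The recurrence counts above can be cross-checked against the per-patient descriptions that follow. The Python sketch below transcribes the genes named for each patient (point mutations plus the DMBT1 and NOTCH2 deletions and the MYC amplification), groups HLA and histone-cluster genes as the text does, and counts each group at most once per patient. Translocations and chromosome-arm events are not included, and the grouping rule by gene-name prefix is an assumption made for illustration.

```python
from collections import Counter

# Genes altered in each tumor, transcribed from the patient descriptions.
ALTERED_GENES = {
    1: {"MLL2", "BCL2", "HIST1H2AM", "DMBT1"},
    2: {"TNFRSF14", "HLA-B", "CRIPAK"},
    3: set(),
    4: {"DMBT1", "MYC"},
    5: {"MLL2", "CREBBP", "HIST1H2BD", "CRIPAK", "TBL1XR1"},
    6: set(),
    7: {"CREBBP", "HLA-DRB1", "NOTCH2"},
    8: {"MLL2", "TP53", "HLA-B", "HLA-DRB1", "FAS"},
}


def gene_group(gene):
    """Collapse HLA and histone-cluster genes into single groups."""
    if gene.startswith("HLA-"):
        return "HLA"
    if gene.startswith("HIST"):
        return "histone cluster"
    return gene


def recurrence(altered, min_patients=2):
    """Number of patients per gene group, keeping groups seen in >= min_patients."""
    counts = Counter()
    for genes in altered.values():
        counts.update({gene_group(g) for g in genes})  # once per patient
    return {g: n for g, n in counts.items() if n >= min_patients}


print(recurrence(ALTERED_GENES))
```

The result reproduces the counts stated in the text: MLL2 and HLA genes in 3 patients each, and CREBBP, histone cluster genes, CRIPAK and DMBT1 in 2 patients each.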

Patient Description and Observed Genomic Alterations:

Patient #1: this female patient was diagnosed at age 56 with grade 1, stage III follicular lymphoma. She received no treatment before or after surgery/biopsy and has remained treatment-free for 100 months. The tumor from this patient had the t(14;18) translocation, a frame-shifting short insertion in MLL2, a missense mutation in BCL2, a missense mutation in the histone H2A family gene HIST1H2AM, and a deletion of the tumor suppressor gene DMBT1. This tumor also had a chr7 trisomy.

Patient #2: this male patient was diagnosed with grade 2, stage III FL at the young age of 39. He was enrolled in the RESORT trial for initial treatment and later received maintenance rituximab (R) therapy. So far, the patient has been event-free for 79 months. This tumor carried the t(14;18) translocation. A nonsense mutation was observed in the TNFRSF14 gene, as well as two frame-shifting INDELs in the HLA-B gene. This tumor also had a frame-shifting small deletion in the CRIPAK gene.

Patient #3: this male patient was diagnosed with grade 2, stage III FL at age 56. The patient was initially placed under observation without treatment and began rituximab monotherapy 8 months after diagnosis. The patient subsequently entered a vaccine trial and was later managed with rituximab monotherapy. This tumor had the t(14;18) translocation; however, we did not observe mutations in known lymphoma genes.

Patient #4: this male patient was diagnosed at age 55 with grade 2, stage II FL and received R-CHOP as initial treatment because of tumor-related small bowel obstruction. The patient had an asymptomatic FL II relapse 68 months after diagnosis and subsequently enrolled in an ibrutinib trial. The notable alterations in this tumor were the t(14;18) translocation, the DMBT1 deletion, and MYC amplification. This tumor also had a chr8 trisomy.

Patient #5: this female patient was diagnosed at age 52 with a grade 1, stage III tumor. She initially received R-CVP, which was ineffective. A re-biopsy after cycle one showed DLBCL, and the patient was subsequently treated with R-CHOP, then R-ICE and an autologous stem-cell transplant. This tumor had the t(14;18) translocation as well as the chr1q amplification. We also identified a frame-shifting small deletion in MLL2, a missense mutation in CREBBP, a nonsense mutation in the histone H2B family gene HIST1H2BD, and one frame-shifting and two missense mutations in the CRIPAK gene. In addition, a missense mutation was observed in TBL1XR1 (transducin beta-like 1 X-linked receptor 1), a gene that has been reported to be mutated in primary central nervous system lymphoma [9] and is involved in the TBL1XR1/TP63 fusion in FL, DLBCL, and T-cell lymphoma [10] [11]. This tumor also had a chr12 trisomy.

Patient #6: this male patient was diagnosed with grade 2, stage III FL at age 66 and was initially treated with CVP. The patient relapsed after 16 months and subsequently received 6 additional regimens, including R-CHOP and an autologous stem cell transplant. In addition, the tumor transformed at relapse to FL 3A and later to FL 3B. This is the only non-grade-III tumor without the t(14;18) translocation. However, the tumor does have the chr1q amplification in addition to trisomies of chr18 and chr21. We did not detect point mutations in genes known to be mutated in lymphoma. Interestingly, multiple fusion transcripts were identified in this tumor (Supplement File S5, and Supplement File S6 Figure B): C17orf68 → NXN, LOC100132273 → CCDC117, and TFG → GPR128. NXN (nucleoredoxin) is an interesting fusion partner because it is a redox-dependent negative regulator of the Wnt signaling pathway [12]. The TFG → GPR128 fusion is a known germline fusion previously detected in both lymphoma patients and healthy subjects [13], and our paired tumor-normal DNA mate-pair sequencing data also support it as a germline DNA fusion. TFG → GPR128 was also the only recurrent fusion found when we screened public RNA-Seq data from 12 FL and 92 DLBCL tumors (dbGaP study accession number: phs000235.v3.p1) [14, 15].

Patient #7: this male patient was diagnosed with grade 3A, stage III/ FL at age 41 and was treated on the lenalidomide/R-CHOP clinical trial. This patient remains event-free at 46 months. This is one of the two grade III tumors profiled, and it had the second-highest number of point mutations, CNAs, and structural variants. It had the t(14;18) translocation, the chr1q amplification, and chr13/chr15/chr17 deletions. Frame-shifting INDELs were observed in the CREBBP and HLA-DRB1 genes. This tumor also had a NOTCH2 gene deletion. We detected two fusion transcripts in the transcriptome sequencing data (Supplement File S5, and Supplement File S6 Figure B): AK7 → CBL and VCPIP1 → MYBL1. The partner genes involved in these two fusions are intriguing. The CBL oncogene (Casitas B-lineage lymphoma proto-oncogene) encodes an E3 ubiquitin-protein ligase that has been shown to induce mouse pre-B and pro-B cell lymphomas [16], and the MYBL1 gene (v-myb myeloblastosis viral oncogene homolog (avian)-like 1) encodes a strong transcriptional activator that might have a role in the proliferation and/or differentiation of B-lymphoid cells [17].

Patient #8: this male patient was diagnosed with grade 3A, stage III FL at age 54 and received R-CHOP initially. The FL relapsed after 35 months, and the patient was subsequently treated with R monotherapy. This tumor does not harbor the common t(14;18) translocation but instead had a large number of other structural variants. The chr1q of this tumor was amplified, and chr1p had both amplifications and deletions. The chr1 abnormalities also resulted in a high number of large SVs with and without inversions. Furthermore, chr17 had a p-arm deletion and a q-arm amplification. In addition to the large number of structural variants, this stage III tumor also had the highest number of both point mutations and CNAs among the 8 FL tumors profiled. The observed point mutations include a frame-shifting small deletion in MLL2, a missense mutation in TP53, frame-shifting INDELs in both HLA-B and HLA-DRB1, and a nonsense mutation in the FAS (Fas cell surface death receptor) gene, which has previously been reported to be mutated in lymphoma [18] [19]. Notably, one copy of TP53 in this tumor was deleted and the remaining copy carried the missense mutation. We also detected an expressed fusion gene, MAP4 → GNL3, in the RNA-Seq data from this tumor (Supplement File S5, and Supplement File S6 Figure B). The fusion partner GNL3 (guanine nucleotide binding protein-like 3) is known to interact with TP53 and MDM2 [20] and may play an important role in tumorigenesis.