METHODS

Patient samples

DNA samples for whole exome sequencing (WES) were obtained from normal tissue (oral mucosa) and tumor BM. The study was approved by the ethical research and animal care committee from the Institute of Health Carlos III (CEI PI 32_2009). DNA was extracted by the phenol–chloroform method and the quality and quantity of purified DNA was assessed by fluorometry (Qubit, Invitrogen) and gel electrophoresis.

WES and bioinformatic analysis

The preparation of shotgun libraries from the leukemic and non-leukemic genomic DNA followed the “SureSelect Human All Exon Kit” protocol (Agilent Technologies, Santa Clara, CA), covering 50 Mb of coding exons (approximately 1.60% of the genome). One to three µg of high molecular weight genomic DNA from each sample was fragmented by acoustic shearing on a Covaris S2 instrument. Fractions of 150–300 bp were ligated to Illumina adapters and PCR-amplified for 6 cycles. For exon enrichment, 300 ng of library were hybridized to SureSelect Human All Exon Capture kit for 24 h at 65ºC. Biotinylated hybrids were captured, and the enriched libraries were completed by amplifying with 12 cycles of PCR. The resulting purified DNA library was applied to an Illumina flow cell for cluster generation and sequenced using the Illumina Genome Analyzer IIx generating 2x76 base pair, paired-end reads by following the manufacturer protocols.

Exome variant analysis and biological impact predictions were performed by RUbioSeq software (1) using default parameters for somatic variation analysis. In detail, sequencing data were first checked by FastQC for quality control checks on raw sequence data and then aligned to the human reference genome (GRCh37) using Burrows-Wheeler alignment (BWA) (2). That reads unmapped by BWA were realigned using BFAST (3). Somatic variants were identified using the Unified Genotyper v2 available at the GATK (4). For variant calling we used GATK Unified Genotyper v2 applying the ‘‘Discovery’’ genotyping mode and default parameters for filtering. Thus, SNPs available at dbSNP 132 (hg19) and those reported by the 1000 Genomes Project, as well as changes present in the matched normal DNA and silent changes, were filtered out from VCF output files (5). The GATK QUAL field was employed for ranking selected somatic variants. Biological impact predictions for detected variants were obtained from Ensembl Variant Effect Predictor (6). In order to estimate the damaging character of the missense mutations, we used two different algorithms: SIFT and PolyPhen-2. A workflow of WES and the bioinformatic analysis is provide in Supplementary Figure 3.

We used targeted re-sequencing to validate candidate variants.Ion semiconductor sequencing on the Ion Torrent PGM® was performed using the HaloPlex target enrichment system (Agilent). Briefly, patients DNAs were digested by a cocktail of restriction enzymes and hybridized to specific probes incorporating Ion Torrent specific sequences motifs. Hybridized molecules were captured with magnetic beads, amplified using indexed primers, template onto Ion Sphere Particles (Ion One Touch 200 Template Kit v2 DL), and, finally, sequenced with the Ion PGM Sequencing 200 Kit on an Ion 316 chip. Data were analyzed with Variant Caller v2.2.3-31149 using default parameters (all Life Technologies, Carlsbad, CA).

Whole-transcriptome Sequencing and bioinformatics analyses

Total RNA was extracted from BM leukemia cells and from CD34+ cells (normal control) by TRIzol (Life Technologies, Carlsbad, CA), following the manufacturers protocol; 250-1000 ng of total RNA were used for the synthesis of cDNA libraries with TruSeq RNA Sample Prep Kit v2 (Illumina, San Diego, CA) according to the manufacturers recommendations. Sequencing by synthesis was performed on HiScanSQ sequencer (Illumina) at 75bp in paired end. Sequenced reads were quality-checked with FastQC. The 75-nt paired-end reads were aligned to the human genome (GRCh37/hg19) with TopHat-2.0.4 (7) (using Bowtie 0.12.7 (8) and Samtools 0.1.16 (9)), allowing two mismatches and five multihits. Fusion transcripts were obtained with TopHat-Fusion (10). Transcripts differential expression between samples was calculated with Cuffdiff (Cufflinks 1.3.0 (7)), using the human genome annotation data set Homo_sapiens.GRCh37.65 from Ensembl.LUC7L2 expression was validate by quantitative real-time RT-PCR starting from total RNA from the patient sample, which was reverse-transcribed using random hexamer primers with the TaqMan® Gold RT-PCR Kit (Hs00255388_m1 - Applied Biosystems, Foster City, CA). The Calibrated Normalized Relative Quantity, taking into account target- and run-specific amplification efficiencies, was calculated using endogenous GAPDH expression with the Qbase software application. PIM3-SCO2 fusion transcripts were amplified (PIM3-2F: CGCGACATTAAGGACGAAAA / SCO2-2R: GCCCTGCCTTGACAAAAG) and sequenced with BigDye terminator v3.1 Cycle Sequencing kit (PE Applied Biosystems, Foster City, CA) on an Applied Biosystems 310 analyzer.

Ingenuity Pathway Ananlysis

Genes found to be mutated among the cases were subjected to Ingenuity Pathways Analysis (IPA) (Ingenuity Systems) and used as a starting point for building biological networks ( IPA uses the proteins from the highest-scoring network to extract a connectivity pathway that relates candidate proteins to each other based on their interactions.

Supplementary Figure 1: Integrative Genome Viewer image from RNA-seq data. We found an altered pattern of splicing in the mRNA species transcribed from the RUNX1 gene. At the 3’ splice site of RUNX1 intron 5, the un-spliced reads were almost 3 times more frequent in the mutated patient than the normal control.

Supplementary Figure 2: Functional classification of genes differently expressed between CNL cells and normal control (CD34+), according to IPA, revealed an enrichment of categories like cell signaling; cell death and survival; and gene expression (B-H p-value <0.05).

Supplementary Figure 3: Sequencing workflow and bioinformatics pipeline for identification of somatic mutations. After alignment of the sequencing reads from DNA samples of CNL and normal cells to the human reference genome, a series of filters were applied to discard reads that were not usable for the downstream purpose of somatic mutation discovery. Sequence variants fulfilling the additional criteria were subjected to Sanger sequencing validation.

Supplementary Figure 4: Sanger sequencing chromatograms of LUC7L2(c.C79T/p.R93*)mutation in the CNL patient.

References

1.Rubio-Camarillo M, Gomez-Lopez G, Fernandez JM, Valencia A, Pisano DG. RUbioSeq: a suite of parallelized pipelines to automate exome variation and bisulfite-seq analyses. Bioinformatics.

2.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009; 25(14): 1754-60.

3.Homer N, Merriman B, Nelson SF. BFAST: an alignment tool for large scale genome resequencing. PLoS One 2009; 4(11): e7767.

4.DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet; 43(5): 491-8.

5.Jones S, Wang TL, Shih Ie M, Mao TL, Nakayama K, Roden R et al. Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science; 330(6001): 228-31.

6.McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics; 26(16): 2069-70.

7.Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 2012; 7(3): 562-78.

8.Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009; 10(3): R25.

9.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009; 25(16): 2078-9.

10.Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol 2011; 12(8): R72.