SUPPLEMENTARY INFORMATION

MATERIALS AND METHODS

Patient samples

Blood and bone marrow (BM) cells from both the donor and the recipient were obtained from the South Australian Cancer Research Biobank. Mesenchymal stromal cells (MSC) were cultured from BM aspirates as a source of germline control DNA. Whole exome sequencing (WES) was performed on three donor samples (MSC, AML diagnosis and AML relapse) and one recipient sample (AML diagnosis). In addition, paired diagnostic and remission samples from 12 other patients with DNMT3A-mutant AML were available for targeted sequencing. None of these 12 patients had therapy-related AML or an antecedent diagnosis of a hematological neoplasm.

Whole Exome Sequencing (WES) and Targeted Massively Parallel Sequencing

WES was performed using a Roche NimbleGen capture kit and sequencing on the Illumina HiSeq2500. Briefly, 1 µg of genomic DNA was sheared to a mean fragment size of 200 bp using the Covaris S220 before conversion to barcoded DNA libraries using a TruSeq DNA LT Sample Preparation Kit (Illumina, San Diego, CA USA). After purification, libraries were quantified by Agilent Bioanalyzer HS DNA assay and combined in equal amounts into pools of 6 prior to solution-phase capture using the SeqCap EZ Exome Library v3.0 (Roche NimbleGen, Madison, WI USA). The three donor samples (MSC, diagnosis, relapse) and the one recipient sample (diagnosis) were sequenced together with other, unrelated samples on five Illumina HiSeq2500 flowcells (v3 SBS chemistry, 2 × 100 bp paired-end), with 6 samples multiplexed per lane. All samples except the MSC sample were sequenced on two flowcells. The MSC sample yielded 30 million sequenced fragments, while the other three samples (in the order listed above) yielded 121, 86 and 71 million fragments, respectively.
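Combining Bioanalyzer-quantified libraries "in equal amounts" is usually done on a molar basis. The sketch below illustrates that arithmetic, assuming the common approximation of about 660 g/mol per base pair of double-stranded DNA and using the mean size of the final library. The function names, the 100 fmol target and the example concentrations and sizes are hypothetical and are not values from this study.

```python
# Hypothetical helpers for equimolar pooling; not the authors' documented protocol.
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/µl) into nanomolar (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def equimolar_volumes_ul(libraries: dict[str, tuple[float, float]],
                         fmol_per_library: float) -> dict[str, float]:
    """Volume (µl) of each library that contributes the same number of femtomoles.

    `libraries` maps a library name to (concentration in ng/µl, mean size in bp).
    Because 1 nM equals 1 fmol/µl, the required volume is simply fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nm(conc, size)
        for name, (conc, size) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical readings for a pool of 6 libraries, all ~330 bp after adapter ligation.
    concs = [5.0, 8.2, 6.1, 4.4, 7.0, 9.3]
    pool = {f"lib{i}": (c, 330.0) for i, c in enumerate(concs, start=1)}
    for name, vol in equimolar_volumes_ul(pool, fmol_per_library=100.0).items():
        print(f"{name}: {vol:.2f} µl")
```

Working in femtomoles rather than nanograms keeps libraries with different fragment sizes equally represented in the pool.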

Targeted massively parallel sequencing was performed with a custom 29-gene panel of myeloid genes, covering all coding regions (Supplementary Table 3), using an Ion Torrent AmpliSeq approach. Briefly, the targeted gene libraries were generated from 10 ng of genomic DNA using the Ion AmpliSeq Library Kit v2.0 and the custom primer pool, as per the manufacturer’s protocol (Life Technologies, Guilford, CT USA). After adapter ligation and a 5-cycle PCR amplification incorporating barcodes, the libraries were quantified by Agilent Bioanalyzer HS DNA assay and combined in equal amounts into a pool of 12 samples. The library pool was diluted to 6 pM and templated onto Ion Sphere Particles (ISPs) by emulsion PCR using the automated Ion OneTouch2 system with the Ion P1 Template OT2 200 Kit (Life Technologies). The templated ISPs were sequenced on an Ion P1 chip (Ion P1 Sequencing 200 Kit v3 chemistry) on the Ion Proton.
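Diluting the pool to 6 pM is a single C1 × V1 = C2 × V2 calculation. In the minimal sketch below, only the 6 pM target comes from the text; the pool concentration (0.1 nM) and the 100 µl final volume are hypothetical, and the helper names are illustrative.

```python
# Hypothetical dilution helper (C1 * V1 = C2 * V2); volumes are illustrative only.

def pm_from_nm(conc_nm: float) -> float:
    """Convert nanomolar to picomolar."""
    return conc_nm * 1000.0


def dilution_volumes_ul(stock_pm: float, target_pm: float,
                        final_volume_ul: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in µl to reach target_pm in final_volume_ul."""
    if not 0 < target_pm <= stock_pm:
        raise ValueError("target must be positive and no higher than the stock")
    stock_ul = target_pm * final_volume_ul / stock_pm
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical pool measured at 0.1 nM, diluted to 6 pM in 100 µl.
    stock, diluent = dilution_volumes_ul(pm_from_nm(0.1), 6.0, 100.0)
    print(f"{stock:.1f} µl pool + {diluent:.1f} µl diluent")
```

With these assumed inputs, 6 µl of a 100 pM pool made up to 100 µl gives 6 pM.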

Sequence analysis

The WES reads were mapped to the human reference genome (hg19) using bwa sampe (v0.6.2). Sorting and indexing were carried out using samtools (v0.1.12a), followed by duplicate marking using Picard (v1.71). The resulting mean coverage over the NimbleGen capture regions was 34.1×, 96.5×, 94.2× and 76.1× for the donor MSC, donor diagnosis, donor relapse and recipient diagnosis samples, respectively.
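Mean coverage over capture regions is the summed read depth across all targeted bases divided by the number of targeted bases, with uncovered bases counting as zero. The sketch below is one way to compute it, assuming per-base depths are already available (for example from `samtools depth`, whose 1-based positions would need converting to the 0-based positions used here) and BED-style 0-based, half-open target intervals. The methods do not state whether duplicates or low-quality bases were excluded, so this is not necessarily how the figures above were derived.

```python
# Hypothetical sketch: mean depth over target intervals (BED-style, 0-based half-open).

def merge_intervals(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals so no base is counted twice."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


def mean_target_coverage(depth, targets):
    """depth maps (chrom, 0-based position) -> read depth; missing positions count as 0."""
    total_depth = 0
    total_bases = 0
    for chrom, start, end in merge_intervals(targets):
        for pos in range(start, end):
            total_depth += depth.get((chrom, pos), 0)
        total_bases += end - start
    return total_depth / total_bases if total_bases else 0.0


if __name__ == "__main__":
    # Toy data: two overlapping targets on chr1 covering positions 0-5.
    depth = {("chr1", 0): 10, ("chr1", 1): 10, ("chr1", 5): 4}
    print(mean_target_coverage(depth, [("chr1", 0, 4), ("chr1", 2, 6)]))
```

Merging overlapping targets first matters because capture designs often contain overlapping probes, which would otherwise inflate the denominator.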

The Genome Analysis Toolkit (GATK; v2.5.2 to v2.8.1) was used for local realignment around indels and base quality score recalibration, and its UnifiedGenotyper was used for multi-sample variant calling, following the Broad Institute’s “best practices pipeline” for the GATK v2 series. Variants were annotated using the ACRF Cancer Genome Facility’s custom annotation pipeline based on SnpEff and SnpSift (v3.3) [18]. Annotation information was taken from Ensembl (v73) [19], dbSNP (v137), the 1000 Genomes Project (integrated phase 1, v3) [20], the Exome Sequencing Project (6500SI-V2) [21], COSMIC (v67) [22] and GERP scores [23], as well as other public databases.

Variants were first subjected to rudimentary filtering, using two conditions to remove those unlikely to be of interest. First, a variant had to be rare (<0.5%) both in the 1,092 individuals of the 1000 Genomes Project and in the 4,300 European-American and 2,203 African-American individuals of the Exome Sequencing Project. Second, a variant was retained only if it showed evidence of evolutionary conservation in mammalian (GERP ≥ 2) or other vertebrate (PhastCons ≥ 0.9) species, if its predicted functional impact was potentially non-trivial (i.e. not synonymous coding and not classified by SnpEff as a “modifier”), or if it was a known somatic mutation in the COSMIC database. Finally, variants passing these criteria were compared with an in-house collection of 51 exomes from patients with non-hematological malignancies, some of which were sequenced concurrently with the four exomes considered here; variants occurring more than once (in the heterozygous state) in this collection were discarded. Together, these filters reduced the number of sites for further consideration to 7,480.
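The three filtering steps can be expressed as one predicate per variant. In this minimal sketch the thresholds (<0.5%, GERP ≥ 2, PhastCons ≥ 0.9, at most one in-house occurrence) come from the text, while the `Variant` record, its field names and the treatment of a missing frequency or score are assumptions; it is not the ACRF annotation pipeline itself. `impact` is assumed to hold the SnpEff impact class (HIGH, MODERATE, LOW or MODIFIER).

```python
# Hypothetical re-expression of the filtering rules; field names are illustrative.
from dataclasses import dataclass
from typing import Optional

RARE_MAX_FREQ = 0.005          # must be <0.5% in every reference population
GERP_MIN = 2.0                 # mammalian conservation
PHASTCONS_MIN = 0.9            # vertebrate conservation
MAX_INHOUSE_OCCURRENCES = 1    # discard if seen more than once in 51 in-house exomes


@dataclass
class Variant:
    af_1kg: Optional[float]     # 1000 Genomes allele frequency (None = not observed)
    af_esp_ea: Optional[float]  # ESP European-American frequency
    af_esp_aa: Optional[float]  # ESP African-American frequency
    gerp: Optional[float]
    phastcons: Optional[float]
    is_synonymous: bool
    impact: str                 # SnpEff impact class
    in_cosmic: bool
    inhouse_count: int          # heterozygous occurrences in the in-house exomes


def is_rare(v: Variant) -> bool:
    freqs = (v.af_1kg, v.af_esp_ea, v.af_esp_aa)
    return all(f is None or f < RARE_MAX_FREQ for f in freqs)


def is_conserved(v: Variant) -> bool:
    return ((v.gerp is not None and v.gerp >= GERP_MIN) or
            (v.phastcons is not None and v.phastcons >= PHASTCONS_MIN))


def is_nontrivial(v: Variant) -> bool:
    return not v.is_synonymous and v.impact != "MODIFIER"


def passes_filters(v: Variant) -> bool:
    if not is_rare(v):
        return False
    if not (is_conserved(v) or is_nontrivial(v) or v.in_cosmic):
        return False
    return v.inhouse_count <= MAX_INHOUSE_OCCURRENCES
```

Treating a missing population frequency as "not observed" (and therefore rare) is a design choice of this sketch; the original pipeline may have handled absent annotations differently.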

The Ion Torrent targeted sequencing data were processed with the Torrent Suite™ software v4.0.1 using the AmpliSeq workflow, which automates the generation of sequence reads, trimming of adapter sequences and removal of poor-quality reads. Variants were called using the Torrent Variant Caller plugin (4.0-5, 72041) with the Somatic Mutation default settings, except for ‘SNP minimum allele frequency’ (0.5%) and ‘Indel min allele frequency’ (1.25%). Variants were annotated using SnpEff, COSMIC and local in-house databases as described above.
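The two non-default settings amount to type-specific minimum allele frequency thresholds. The sketch below only illustrates applying those thresholds to a call's allele counts; it does not reproduce the Torrent Variant Caller, which applies many other filters. The variant-type rules, the inclusive (≥) comparison and the function names are assumptions, and multi-nucleotide substitutions are simply left out.

```python
# Hypothetical illustration of the 0.5% SNP / 1.25% indel allele frequency thresholds.
MIN_AF = {"snp": 0.005, "indel": 0.0125}


def variant_type(ref: str, alt: str) -> str:
    """Crude classification from REF/ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions, not covered by this sketch


def passes_min_af(ref: str, alt: str, alt_reads: int, depth: int) -> bool:
    """True if the observed allele frequency meets the threshold for its variant type."""
    vtype = variant_type(ref, alt)
    if vtype not in MIN_AF or depth == 0:
        return False
    return alt_reads / depth >= MIN_AF[vtype]
```

The lower SNP threshold reflects that indels, particularly in homopolymers, are the dominant error mode on Ion Torrent platforms, so a stricter indel cut-off is a common choice.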

Additional cases of DNMT3A-mutant AML

DNMT3A mutation load in paired diagnostic and remission samples from 12 AML patients with DNMT3A R882H/C mutations was measured using a custom Sequenom MassARRAY assay (Sequenom, Inc., San Diego, CA USA). Allele loads of concurrent mutations in isocitrate dehydrogenase 1 and 2 (IDH1/2), Kirsten rat sarcoma viral oncogene homolog (KRAS) and nucleophosmin (NPM1) were measured using a custom Sequenom assay, Sanger sequencing and a restriction fragment length polymorphism assay, respectively.
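Comparing allele loads between paired diagnostic and remission samples reduces to two ratios. This minimal sketch assumes the assay reports a quantitative signal (such as a peak area) for the mutant and wild-type alleles and that allele load is the mutant fraction of the total; the function names and example values are hypothetical and do not correspond to any patient in Supplementary Table 2.

```python
# Hypothetical sketch: allele load from two allele signals, and persistence at remission.

def allele_load(mutant_signal: float, wildtype_signal: float) -> float:
    """Mutant allele fraction = mutant / (mutant + wild-type)."""
    total = mutant_signal + wildtype_signal
    if total <= 0:
        raise ValueError("no signal for either allele")
    return mutant_signal / total


def residual_fraction(diagnosis_load: float, remission_load: float) -> float:
    """Fraction of the diagnostic allele load still present at remission."""
    if diagnosis_load <= 0:
        raise ValueError("diagnostic allele load must be positive")
    return remission_load / diagnosis_load


if __name__ == "__main__":
    diag = allele_load(40.0, 60.0)   # hypothetical peak areas at diagnosis
    rem = allele_load(20.0, 80.0)    # hypothetical peak areas at remission
    print(f"diagnosis {diag:.2f}, remission {rem:.2f}, retained {residual_fraction(diag, rem):.2f}")
```

A residual fraction near 1 would indicate that the mutation persists at remission at close to its diagnostic level, while a value near 0 indicates clearance.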

Study oversight

The research was approved by the Royal Adelaide Hospital Human Research Ethics Committee and all patients gave written informed consent.
Supplementary Table 1. Gene mutations in the two brothers.

Gene / Genome (hg19) / mRNA Transcript / Protein
DNMT3A / chr2:g.25457242C>T / NM_022552; c.2645G>A / p.Arg882His
NPM1 / chr5:g.170837544_170837547dupTCTG / NM_002520; c.860_863dupTCTG / p.Trp288Cysfs*12
FLT3 / chr13:g.28592642C>A / NM_004119; c.2503G>T / p.Asp835Tyr
IDH1 / chr2:g.209113112C>T / NM_005896; c.395G>A / p.Arg132His
NOTCH4 / chr6:g.32178533C>T / NM_004557; c.2861G>A / p.Cys954Tyr
WT1 / chr11:g.32413566G>A / NM_024424; c.1180C>T / p.Arg394Trp
SMC1A / chrX:g.53423420T>C / NM_006306; c.2680A>G / p.Ile894Val

Supplementary Table 2. Mutation allele loads of 12 DNMT3A-mutant AML patients who achieved complete remission after induction chemotherapy.


Supplementary Table 3. AmpliSeq 29 Gene Panel list. The entire coding region of each gene was encompassed by massively parallel sequencing for mutation detection.

Genes
ASXL1
BAP1
BRAF
CBL
CEBPA
DNMT3A
EGFR
EZH2
GATA2
IDH1
IDH2
JAK1
JAK2
KIT
KRAS
MET
MPL
MYD88
NOTCH1
NPM1
NRAS
PTPN11
RUNX1
SF3B1
SRP72
SRSF2
TET2
U2AF1
XPO1

REFERENCES

18. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012; 6: 80-92.

19. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids Res 2014; 42: D749-755.

20. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56-65.

21. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 2012; 337: 64-69.

22. Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet 2008; Chapter 10: Unit 10.11.

23. Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005; 15: 901-913.
