Supplemental Results
Sequencing Output and Quality Assessment
Depending on the sequencing chemistry and software version used, we obtained 17-120 million reads per lane and 56-192 million reads per sample from 36 bp or 76 bp paired-end runs on a GAIIx. With the SBS v5 sequencing kits, the SCS 2.9 software and a read length of 76 bp, we obtained on average 100 million reads per lane, which equals 7.6 Gb per lane. Of those reads, approximately 90% could be mapped to the reference genome. On average, 10% of those reads were not unique or mapped to more than one position, leaving around 80% uniquely mapped unique reads. Mapping of those reads to the target region (here: “whole exome”, see details in Material and Methods) showed that 50-80% of uniquely mapped unique reads were “on target”, depending on the enrichment kit and the read length used. When the “on target” region was extended by +/-200 bp, around 90% of reads were “on target” irrespective of the enrichment kit or read length used. With the SBS v5 chemistry and the SCS 2.9 software, we obtained an average coverage “on target” of ~100x per lane. On average, 97% of bases were covered at least 1x, 93% at least 5x, 89% at least 10x and 75% at least 25x. One primary MM with its corresponding PBMC sample showed worse uniformity for unknown reasons (~70% of bases covered at least 1x) and was therefore excluded from this calculation. The transition/transversion (Ti/Tv) ratios were around 3.0 in all samples, indicating a low rate of false positive calls and no significant technical bias. All technical parameters are listed in detail for each sample in S_Table 2.
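As an illustration of two of the metrics reported above, the following minimal Python sketch reproduces the per-lane yield arithmetic (100 million reads of 76 bp give 7.6 Gb) and shows how a Ti/Tv ratio can be computed from a list of reference/alternate base pairs. The function names and the input format are assumptions made for this example; they do not describe the software actually used in the study.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def lane_yield_gb(reads: int, read_length_bp: int) -> float:
    """Sequenced bases per lane in gigabases (reads x read length)."""
    return reads * read_length_bp / 1e9


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) base pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue  # not a substitution, ignore
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    print(lane_yield_gb(100_000_000, 76))
    # Toy input (hypothetical calls): three transitions, one transversion.
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]))
```

In this sketch a substitution counts as a transition only if both bases are purines or both are pyrimidines; every other substitution is counted as a transversion.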
SNV-Filtering and Validation
To detect somatic mutations that might be relevant for MM pathogenesis, we developed a bioinformatics pipeline that narrowed down the 55,363 raw SNVs called in a total of 16 samples (six MM cell lines and five primary MM plus corresponding PBMC samples) to 330 putative cancer-relevant SNVs (Figure 1). In a first step, we excluded 47,869 SNVs that had previously been detected in healthy individuals (according to dbSNP and 1000 Genomes). Subsequently, we annotated the SNVs with SeattleSeq for parameters such as missense and nonsense mutations and excluded those SNVs that did not lead to an amino acid exchange, which left 5,856 non-synonymous SNVs. As we were interested in tumor-related SNVs, we then subtracted those SNVs that also occurred in the corresponding PBMC sample, reducing the number to 4,130 non-synonymous SNVs. Since the MM cell lines could not be matched with corresponding normal controls, they may include a large number of “passenger” mutations. To extract the tumor-relevant information from the cell lines, we therefore matched the genes that were mutated in our dataset (five primary MM and six cell lines) with the genes that were mutated in the 38 published primary MM (ref. 5). We excluded SNVs in genes that were mutated only in a cell line but in no primary sample, as well as genes that were mutated only in a published sample but in none of our samples. Notably, 480 of the 1,429 mutated genes published by Chapman et al. were also mutated in our dataset. For our discovery approach, we therefore focussed on those genes that were mutated in our dataset (five primary MM and six cell lines) and in at least one primary MM (five own and 38 published, ref. 5). This approach left only 913 SNVs (S_Table 4). Lastly, we applied three additional “functional” predictors to increase the probability that the identified SNVs lead to functional changes of the respective protein, resulting in a final list of 330 SNVs in 193 genes (S_Table 5).
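The filtering cascade described above can be sketched as a sequence of simple list filters. The Python example below is only an illustration: the `Snv` record, its field names and the `published_primary_genes` argument are hypothetical, and the final step with the three “functional” predictors is not modelled.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Snv:
    gene: str
    sample: str
    from_cell_line: bool      # cell lines have no matched PBMC control
    known_polymorphism: bool  # reported in dbSNP or 1000 Genomes
    non_synonymous: bool      # leads to an amino acid exchange
    in_matched_normal: bool   # also called in the corresponding PBMC sample


def filter_snvs(snvs: List[Snv], published_primary_genes: Set[str]) -> List[Snv]:
    """Apply the filter steps in the order given in the text."""
    # Step 1: drop variants already known from healthy individuals.
    novel = [s for s in snvs if not s.known_polymorphism]
    # Step 2: keep only non-synonymous variants.
    coding = [s for s in novel if s.non_synonymous]
    # Step 3: subtract variants present in the matched normal sample.
    somatic = [s for s in coding if not s.in_matched_normal]
    # Step 4: keep genes mutated in at least one primary MM,
    # either in our own primary samples or in the published set.
    own_primary_genes = {s.gene for s in somatic if not s.from_cell_line}
    primary_genes = own_primary_genes | published_primary_genes
    return [s for s in somatic if s.gene in primary_genes]


if __name__ == "__main__":
    toy = [
        Snv("GENE_A", "P1", False, False, True, False),   # kept: own primary
        Snv("GENE_A", "CL1", True, False, True, False),   # kept: gene hit in P1
        Snv("GENE_B", "CL1", True, False, True, False),   # dropped: cell line only
        Snv("GENE_C", "CL2", True, False, True, False),   # kept: published primary
        Snv("GENE_D", "P1", False, True, True, False),    # dropped: known variant
        Snv("GENE_E", "P1", False, False, False, False),  # dropped: synonymous
        Snv("GENE_F", "P2", False, False, True, True),    # dropped: in PBMC
    ]
    for snv in filter_snvs(toy, published_primary_genes={"GENE_C"}):
        print(snv.gene, snv.sample)
```

Because only SNVs from our own dataset enter the function, genes that are mutated exclusively in published samples never appear in the output, which mirrors the second exclusion criterion in the text.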
Validation by Sanger sequencing was performed on randomly selected SNVs at various steps of the analysis following SeattleSeq annotation (4,130 SNVs). In total, 199 of 213 SNVs (93.4%) could be confirmed (primers listed in S_Table 3), indicating only a very low technical bias, as already suggested by the good Ti/Tv ratio. This number of validated SNVs meets the guidelines of the International Cancer Genome Consortium.
Tyrosine Kinase Catalytic Domains of RTKs are Frequently Affected by Somatic Mutations in MM
To gain more information on the potential biological impact of the non-synonymous mutations in RTKs listed in Table 3A, we searched for the affected domains using the NCBI Nucleotide database and the STRING database. For example, the somatic point mutations in NTRK2 affected the tyrosine kinase catalytic domain in one primary MM of our discovery set and the immunoglobulin domain in the cell line L363 (S_Figure 1). IGF1R also showed a somatic point mutation in the tyrosine kinase catalytic domain in L363, whereas the IGF1R mutation in the primary tumor of the validation set (ref. 5) affected the region between the second and third fibronectin type 3 (FN3) domains. Interestingly, the somatic point mutations in EGFR that were detected in MM.1S and in a primary MM (ref. 5) also appeared to target the tyrosine kinase catalytic domain, while the furin-like (FU) domain, which abuts the receptor-ligand domain, harbored a somatic point mutation in the cell line AMO1. Both point mutations in ERBB3, found in the cell lines AMO1 and L363, were close to the second ligand-binding domain: one lay between the second receptor-ligand domain and the third FU domain, and the other within the third FU domain. No point mutation but a chromosomal deletion in ERBB3 was observed in a primary MM sample (ref. 5). Finally, we detected two point mutations in NTRK1. The mutation detected in MM.1S affected the FU domain that follows the first receptor-ligand domain, and the mutation observed in one primary MM was located in the FN3 domain that directly follows the second receptor-ligand domain. In summary, three out of five RTKs were affected by mutations in the tyrosine kinase domain in at least one of the samples investigated, and the remaining mutations mostly affected the furin-like domains or the region between two furin-like domains, close to the ligand-binding domain.