Supplementary Text – Analysis of germline sequence data for TCGA endometrial cancer samples
The VCF files and clinical data for 543 uterine corpus endometrial cancer cases were downloaded from the Genomic Data Commons (GDC) Data Portal (1) (Project #1502: Genetic variation in candidate cancer genes and cancer). The whole exomes of these endometrial cancer cases had been sequenced using either the Roche SeqCap EZ Human Exome Library v3.0 or the Agilent SureSelect Human All Exon 38Mb v2 kit, and primary data analysis was performed with the Washington University School of Medicine (WUSM) pipeline, which produced files in VCF format (1). Quality scores for single nucleotide variants and insertion-deletion variants were available only from the VarScan calling method (2). Variants were then annotated using SnpEff (3) and filtered for location within all genes reviewed in this article, with the exception of EPCAM (where large deletions are implicated in MSH2 inactivation). Somatic variants and low-quality variants, i.e. variants with base coverage below 12, variants with an alternate allele fraction below 20%, and variants showing strand bias, were discarded. All variants were reviewed in the Integrative Genomics Viewer (IGV) (4) to exclude sequencing artefacts. In addition, all variants initially assigned to PMS2 were assessed for possible location in the pseudogene PMS2CL, on the basis of position and of pseudogene-specific sequence within the same read as the variant. Minor allele frequency (MAF) was annotated using the Exome Aggregation Consortium (ExAC, non-TCGA) version 3 database (5), and variants with MAF less than 1% were retained. The impact of splicing variants was predicted using MaxEntScan (6), with variants resulting in a markedly reduced score interpreted as likely to affect native splicing. Variant pathogenicity was assigned with reference to the designations noted in the ClinVar repository (7). Pathogenicity of variants in the MMR and BRCA1/2 genes was as assigned by the ClinGen expert panels for MMR (InSiGHT (8)) and BRCA1/2 (ENIGMA), or followed the classification criteria of these groups.
Classification of variants in other genes took into account variant effect and other clinical information commonly used in variant classification.
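The variant-level filters described above can be expressed as a single predicate applied to each call. The Python sketch below is illustrative only: it assumes each call has already been parsed into a record carrying gene, read depth, alternate-read count, a somatic flag, a strand-bias flag and an ExAC MAF, and the names `Variant`, `passes_filters` and `TARGET_GENES` are hypothetical. The thresholds (coverage of at least 12, alternate allele fraction of at least 20%, MAF below 1%) and the exclusion of EPCAM come from the text; treating variants absent from ExAC as rare is an assumption. IGV review, the PMS2CL assessment and MaxEntScan scoring are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text
MIN_DEPTH = 12            # discard variants with base coverage below 12
MIN_ALT_FRACTION = 0.20   # discard variants with alternate allele fraction below 20%
MAX_MAF = 0.01            # retain only variants with population MAF below 1%

# Hypothetical target set: genes named in this text. EPCAM is deliberately absent
# because its relevance lies in large deletions, not single-variant calls.
TARGET_GENES = {
    "MSH2", "MSH6", "PMS2", "MUTYH", "BRCA1", "BRCA2",
    "PALB2", "RAD51C", "RAD51D", "ATM", "BRIP1",
}


@dataclass
class Variant:
    """Hypothetical record for one parsed variant call."""
    gene: str
    depth: int             # total reads covering the site
    alt_reads: int         # reads supporting the alternate allele
    strand_bias: bool      # assumed pre-computed flag from the caller
    somatic: bool          # True if the call is somatic rather than germline
    maf: Optional[float]   # ExAC MAF; None if the variant is absent from the database


def passes_filters(v: Variant) -> bool:
    """Return True if a germline call survives the location, quality and MAF filters."""
    if v.gene not in TARGET_GENES:
        return False
    if v.somatic or v.strand_bias:
        return False
    if v.depth < MIN_DEPTH:   # also guards the division below against depth 0
        return False
    if v.alt_reads / v.depth < MIN_ALT_FRACTION:
        return False
    # Assumption: a variant absent from ExAC is treated as rare (MAF = 0)
    maf = 0.0 if v.maf is None else v.maf
    return maf < MAX_MAF


# Illustrative calls with made-up values (not data from this study)
calls = [
    Variant("MSH6", depth=40, alt_reads=19, strand_bias=False, somatic=False, maf=0.0002),
    Variant("PMS2", depth=10, alt_reads=5, strand_bias=False, somatic=False, maf=None),    # coverage < 12
    Variant("BRCA2", depth=30, alt_reads=4, strand_bias=False, somatic=False, maf=None),   # alt fraction ~13%
    Variant("ATM", depth=50, alt_reads=25, strand_bias=False, somatic=False, maf=0.05),    # MAF >= 1%
    Variant("EPCAM", depth=40, alt_reads=20, strand_bias=False, somatic=False, maf=None),  # not a target gene
]
kept = [v.gene for v in calls if passes_filters(v)]
print(kept)  # ['MSH6']
```

Ordering the cheap categorical checks (gene, somatic status, strand bias) before the numeric ones keeps the predicate easy to audit against the stated criteria.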
We also assessed how our findings overlapped with those of Lu et al (9), who conducted a two-stage analysis of germline variants across 12 cancer types in TCGA, including endometrial cancer. In the discovery stage, Lu et al analysed rare truncating variants in 624 genes across all cancer types (including 258 endometrial cancer patients), and a subsequent burden analysis prioritized 32 genes for the validation stage (including 295 endometrial cancer patients). Of the genes highlighted in this review, only MSH6, PMS2, MUTYH, BRCA1, BRCA2, PALB2, RAD51C, RAD51D, ATM and BRIP1 were included in both the discovery and validation stages, and were thus analysed in all endometrial cancer patients. Patient IDs for each stage were not reported, which prevented direct comparison of their sequence calls with those from our analysis. However, overlap in reporting was good for the genes that Lu et al assessed in the full endometrial cancer sample set: 15 of the 16 variants reported in our study (Table 1) were also identified by Lu et al (9).
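Because patient IDs were not available, a comparison of this kind can only be made on variant identity rather than on patient-variant pairs. The Python sketch below shows one way to compute such an overlap, assuming both call sets use the same reference genome build and the same normalized representation of indels; `VariantKey` and `overlap` are hypothetical names, and the coordinates are placeholders, not variants from this study or from Lu et al.

```python
from typing import NamedTuple, Set, Tuple


class VariantKey(NamedTuple):
    """Genomic identity of a variant, independent of which patient carries it."""
    chrom: str
    pos: int
    ref: str
    alt: str


def overlap(ours: Set[VariantKey], theirs: Set[VariantKey]) -> Tuple[int, int]:
    """Return (number of our variants also present in `theirs`, total number of our variants)."""
    shared = ours & theirs
    return len(shared), len(ours)


# Toy, hypothetical coordinates for illustration only
ours = {
    VariantKey("chr2", 1000, "C", "T"),
    VariantKey("chr7", 2000, "G", "A"),
    VariantKey("chr13", 3000, "AT", "A"),
    VariantKey("chr17", 4000, "C", "G"),
}
theirs = {
    VariantKey("chr2", 1000, "C", "T"),
    VariantKey("chr7", 2000, "G", "A"),
    VariantKey("chr13", 3000, "AT", "A"),
    VariantKey("chr11", 5000, "T", "C"),
}

found, total = overlap(ours, theirs)
missed = ours - theirs   # our variants not reported in the other call set
print(f"{found}/{total} of our variants also reported ({found / total:.0%})")
# prints: 3/4 of our variants also reported (75%)
```

Listing the unmatched variants (`missed`) alongside the count makes it straightforward to inspect each discordant call individually, for example to check whether it falls in a gene analysed in only one stage.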
1.
2. Koboldt DC, Zhang Q, Larson DE, et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 2012;22:568-576.
3. Cingolani P, Platts A, Coon M, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012;6:80-92.
4. Robinson JT, Thorvaldsdottir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24-26.
5. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015;526:68-74.
6. Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 2004;11:377-394.
7. Landrum MJ, Lee JM, Riley GR, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 2014;42:D980-D985.
8. Thompson BA, Spurdle AB, Plazzer JP, et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat Genet 2014;46:107-115.
9. Lu C, Xie M, Wendl MC, et al. Patterns and functional implications of rare germline variants across 12 cancer types. Nat Commun 2015;6:10086.