A Gene Expression Signature Reflecting STK11 Mutation in Lung Adenocarcinoma

Supplementary Materials

A sensitive NanoString-based assay to score STK11 (LKB1) pathway disruption in lung adenocarcinoma

Chen et al., 1/16/2016

Table of Contents

Supplementary Methods

Plasmid Vectors

Transfection and Immunoprecipitation

Western Blotting

Patient Cohorts and Public Datasets

Comparing the sequenced STK11 mutation status and STK11 gene signature phenotype

NanoString™ data analysis

Immunohistochemistry

Supplementary Results

Expression of stable, but catalytically inactive, STK11 variants may obscure STK11 pathway mutations

Development, refinement and validation of a 122-gene STK11 mutation signature

Supplementary Figure Legends

Supplementary References

Supplementary Methods

Plasmid Vectors

The pcDNA3-FLAG-STK11 vector was purchased from Addgene (Plasmid #8590; Addgene, Cambridge, MA, USA). pNTAPb and the pNTAPb-Mef2a affinity-tagged negative control were purchased from Agilent Technologies (Kit #240103, Santa Clara, CA, USA) as components of the InterPlay N-terminal Mammalian TAP System. The STK11 gene insert was amplified from the pcDNA3-FLAG-STK11 plasmid using a T7 promoter primer and a custom primer (5'-ATACTCGAGCTGCTGCTTGCAGGC-3'), excised using EcoRI and XhoI restriction enzymes, and cloned into the corresponding sites of the pNTAPb vector. STK11 mutants were generated by PCR using the following primers: D194Y forward (5'-ACCCTCAAAATCTCCTACCTGGGCGTGGC-3'), D194Y reverse (5'-GCCACGCCCAGGTAGGAGATTTTGAGGGT-3'), P281fs*6 forward (5'-GACTGTGGCCCCCGCTCTCTGACCTG-3'), P281fs*6 reverse (5'-CAGGTCAGAGAGCGGGGGCCACAGTC-3'), F354L forward (5'-AGGACGAGGACCTCTTGGACATCGAGGATG-3'), F354L reverse (5'-CATCCTCGATGTCCAAGAGGTCCTCGTCCT-3').
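
Each forward/reverse mutagenic primer pair listed above should anneal to opposite strands at the same site, so the reverse primer is expected to be the reverse complement of the forward primer. The short Python check below is only a sanity check on the sequences as printed here; the function names are illustrative and are not part of the original methods.

```python
# Mutagenic primer pairs exactly as listed in the text (5'->3').
PRIMERS = {
    "D194Y": ("ACCCTCAAAATCTCCTACCTGGGCGTGGC",
              "GCCACGCCCAGGTAGGAGATTTTGAGGGT"),
    "P281fs*6": ("GACTGTGGCCCCCGCTCTCTGACCTG",
                 "CAGGTCAGAGAGCGGGGGCCACAGTC"),
    "F354L": ("AGGACGAGGACCTCTTGGACATCGAGGATG",
              "CATCCTCGATGTCCAAGAGGTCCTCGTCCT"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def check_primer_pairs(primers=PRIMERS):
    """Map each mutant name to True if its reverse primer is the
    reverse complement of its forward primer."""
    return {name: reverse_complement(fwd) == rev
            for name, (fwd, rev) in primers.items()}


if __name__ == "__main__":
    for name, ok in check_primer_pairs().items():
        print(name, "complementary" if ok else "MISMATCH")
```

A mismatch reported by this check would point to a transcription error in one of the two primers rather than to a problem with the mutagenesis design itself.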

Transfection and Immunoprecipitation

Cells were cultured in RPMI supplemented with 10% fetal bovine serum plus 1% penicillin/streptomycin. H1299 cells were cultured in 150 mm plates to approximately 70-80% confluency and transfected with 13 μg of vector (pNTAPb; Agilent Technologies), Mef2a control, or one of three STK11 variants (D194Y, P281fs*6, or F354L), plus 41 μg ssDNA mixed with 130 μL Lipofectamine-2000 in a total of 30 mL of serum-free media. This mixture was left on the cells for 4 hours and then replaced with complete media. After 48 hours, cells were harvested and immunoprecipitated using streptavidin beads following the manufacturer's protocol (InterPlay Mammalian TAP System, Agilent Technologies, Santa Clara, CA, USA), stopping after the streptavidin elution, with the exception that NETN (0.5% v/v Nonidet P-40, 20 mM Tris pH 8.0, 50 mM NaCl, 50 mM NaF, 100 μM Na3VO4, 1 mM DTT, and 50 μg/mL PMSF) was used in place of both the manufacturer's lysis buffer and streptavidin binding buffer. The proteins were eluted in 30 μL of 2X Laemmli buffer and the entire eluate was run on an SDS-PAGE gel for western blotting.

A549 cells were cultured as above in 100 mm plates to 70-80% confluency. They were then transfected with 1 μg of vector (pNTAPb), WT STK11, or one of the STK11 variants (D194Y or F354L). Additional plates received co-transfections of WT STK11 with each of the STK11 variants at ratios of 1 μg variant plus 1 μg WT, 1 μg variant plus 5 μg WT, or 1 μg variant plus 10 μg WT. All transfections were brought up to a total of 11 μg DNA using ssDNA mixed with 60 μL Lipofectamine-2000 in a total of 15 mL of serum-free media. This mixture was left on the cells for 4 hours and then replaced with complete media. After 48 hours, cells were harvested and western blotted as described above.
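
Because every A549 transfection was brought to the same 11 μg total with carrier ssDNA, the amount of carrier differs by condition. The sketch below works through that arithmetic. It assumes that the second plasmid in each co-transfection is WT STK11 and that the carrier simply makes up the difference; the condition labels and function name are illustrative only.

```python
TOTAL_DNA_UG = 11.0  # total DNA per A549 transfection, as stated in the text


def carrier_ssdna_ug(plasmid_amounts_ug, total_ug=TOTAL_DNA_UG):
    """Return the micrograms of carrier ssDNA needed to reach total_ug."""
    used = sum(plasmid_amounts_ug)
    if used > total_ug:
        raise ValueError("plasmid DNA exceeds the total DNA per transfection")
    return total_ug - used


# Plasmid amounts (ug) per condition; labels are illustrative.
CONDITIONS = {
    "single plasmid": [1.0],
    "variant + WT (1:1)": [1.0, 1.0],
    "variant + WT (1:5)": [1.0, 5.0],
    "variant + WT (1:10)": [1.0, 10.0],
}

if __name__ == "__main__":
    for label, amounts in CONDITIONS.items():
        print(f"{label}: {carrier_ssdna_ug(amounts):g} ug ssDNA")
```

Under these assumptions the 1:10 co-transfection needs no carrier at all, which is consistent with the 11 μg total.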

Western Blotting

Frozen cell pellets were obtained from the Cell Core facility of Moffitt's Lung Cancer Center of Excellence. All lines were authenticated by short tandem repeat genotyping and maintained free of Mycoplasma. As previously described [1], cell lysates were normalized for protein content (30 μg) and separated using SDS-PAGE. Proteins were visualized using horseradish peroxidase-conjugated secondary antibodies and enhanced chemiluminescence (ECL; Amersham Biosciences, GE Life Sciences, Pittsburgh, PA, USA). Antibodies used include an STK11 mouse monoclonal antibody (clone Ley 37D/G6, catalogue number sc-32245, Santa Cruz Biotechnology, Santa Cruz, CA, USA), a MO25 rabbit polyclonal antibody (M7195, Sigma-Aldrich, St. Louis, MO), a STRAD goat polyclonal antibody (sc-55052, Santa Cruz Biotechnology), a threonine 172 phospho-AMPKα rabbit monoclonal antibody (2535, Cell Signaling Technology, Boston, MA), an AMPKα1 rabbit polyclonal antibody (2795, Cell Signaling), and a β-actin mouse monoclonal antibody (A5441, Sigma-Aldrich).

Patient Cohorts and Public Datasets

This study includes data from 442 Moffitt LUAD [2] patients for whom we have overall survival data (MLOS cohort; see Tables S2 and S3, Supplementary Data, for demographics) and who consented to the Moffitt Cancer Center's Total Cancer Care (TCC™) protocol, either at the Moffitt Cancer Center or at one of 18 TCC affiliates, between April 2006 and August 2010. This multi-institutional protocol has no exclusion or inclusion criteria and is open to all patients willing to permit access to self-reported demographics, clinical data, medical records, and tissue samples. These prospectively enrolled patients are followed for life. For the MLOS cohort, we have determined the genomic status of STK11, KRAS, TP53 and EGFR by Sanger-based exon sequencing in work that is in press [2]. From the 442 patients, surgical tissue blocks from 150 patients who were treated at Moffitt were used to create a smaller LUAD cohort for which we have complete treatment and outcome data derived from patient chart reviews (MLCOM cohort; see Tables S4-S6 for additional information about the cohort). For these patients, tissue blocks were used to create a tissue microarray for IHC and subsequently to isolate RNA for NanoString analysis. All patient-related work was approved by the University of South Florida Institutional Review Board.

A novel LUAD cohort, MLOS, developed at Moffitt via TCC™, was also used to validate the STK11 signature. A manuscript describing this cohort is currently in press [2]. Briefly, the cohort of 442 samples, for which both microarray and STK11 mutation data are available, was normalized with IRON against the median sample. Principal component analysis identified RNA quality as the first component. Differences in RNA quality were corrected computationally by training a partial least-squares (PLS) model against the RIN score, removing the first component of the model, and then reconstructing the full data matrix using all but the removed eigenvector. STK11 mutations and copy number were assayed by targeted exome sequencing and quantitative PCR, respectively.
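
The RNA-quality correction described above can be illustrated with a single-response PLS deflation step. The following pure-Python sketch is an assumption-laden illustration, not the pipeline actually used: the function name and the matrix layout (samples as rows, genes as columns) are hypothetical. It extracts the first PLS component of the expression matrix against the RIN score, subtracts that component, and restores the per-gene means.

```python
from math import sqrt


def remove_first_pls_component(X, y):
    """Remove the first PLS component of X (samples x genes) trained
    against y (one RIN score per sample); return the corrected matrix."""
    n, m = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]

    # PLS weight vector: direction in gene space that covaries most with RIN.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0.0:  # nothing covaries with RIN; leave the data unchanged
        return [list(row) for row in X]
    w = [v / norm for v in w]

    # Sample scores on that direction, and the matching gene loadings.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t)
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]

    # Deflate: subtract the rank-one RIN component, then restore gene means.
    return [[Xc[i][j] - t[i] * p[j] + col_means[j] for j in range(m)]
            for i in range(n)]
```

In this toy formulation, a gene whose expression is a purely linear function of RIN is flattened to its mean, while variation orthogonal to the removed component is left in place.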

Five public datasets were used: 1) the Molecular Classification of Lung Adenocarcinoma (MCLA) microarray study [3], 2) The Cancer Genome Atlas (TCGA) LUAD RNAseq study [4], and 3-5) three microarray studies from the Gene Expression Omnibus [5] (GSE30219 [6], GSE37745 [7] and GSE14814 [8]) for which STK11 mutation status was unknown. Data from a 442-patient LUAD cohort (herein referred to as MLOS, the Moffitt LUAD Overall Survival cohort) that consented to the Moffitt Cancer Center's Total Cancer Care protocol (TCC™, see Table S1) were also used. All datasets were normalized using IRON against the median sample of their respective dataset. GSE37745 was de-batched using COMBAT, using PA025 and PA117 as batch, with conformed histology and gender as covariates. Histology was conformed to remove minor inconsistencies in abbreviations and wording. The MCLA dataset was combined with its Dana-Farber Cancer Institute (DFCI) sister samples from GSE14814 [8]; bad samples were discarded, and the combined data were normalized with IRON and then de-batched with COMBAT. We will refer to this combined dataset as the MCLA+ dataset. Batch assignments and discarded samples are given in the online supplementary metadata for this manuscript. These datasets resulted in 307, 106, and 482 adenocarcinoma samples for GSE30219, GSE37745, and MCLA+, respectively.

Normalized RNASeqV2 RSEM files were downloaded from The Cancer Genome Atlas (TCGA) [9] for both the adenocarcinoma [4] and squamous lung projects, then merged into a single larger cohort. Additional normalization was performed using IRON v2.1.5 (iron_generic --rnaseq) to correct for minor differences in dynamic range between samples. A batch effect was identified due to a likely process change in early February 2011. Two batches were assigned, using a dateofcreation cutoff of 2014-02-01. Gene expression was then de-batched using COMBAT, with histology and gender as covariates. Mutation calls were downloaded from TCGA and merged into the gene expression metadata, with synonymous mutations treated as WT. After discarding recurrences, the final adenocarcinoma cohort consisted of 488 samples (407 STK11 WT, 64 coding mutants, 11 splice site mutants, 6 mutation status unknown).

Comparing the sequenced STK11 mutation status and STK11 gene signature phenotype

We calculated measures of test performance including true positives, true negatives, false positives, false negatives, sensitivity, specificity, negative predictive value, false positive rate, and false discovery rate. For TCGA samples, the first principal component (t[1]) of the STK11 signature was used to assign a WT or STK11 phenotype to each sample (WT: t[1] ≤ 0, STK11: t[1] > 0). Phenotypes for the MLOS cohort samples were assigned in a similar fashion (WT: t[1] ≤ 2.1, STK11: t[1] > 2.1). Splice site mutations were treated as mutant for two-state analysis purposes.
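
The thresholding and performance measures described above follow standard confusion-matrix definitions, with sequencing as the reference standard. The sketch below shows one way to compute them; the function names and the toy inputs used to exercise it are illustrative, and only the thresholds (0 for TCGA, 2.1 for MLOS) come from the text.

```python
def assign_phenotype(scores, threshold):
    """Call a sample signature-positive ('STK11') when its first principal
    component score t[1] exceeds the threshold, otherwise 'WT'."""
    return ["STK11" if s > threshold else "WT" for s in scores]


def performance(sequenced_mutant, phenotypes):
    """Compare signature phenotypes with sequenced mutation status.

    sequenced_mutant: list of bool (True = STK11 mutation found)
    phenotypes: list of 'STK11' / 'WT' calls from assign_phenotype
    """
    tp = tn = fp = fn = 0
    for is_mutant, call in zip(sequenced_mutant, phenotypes):
        positive = call == "STK11"
        if is_mutant and positive:
            tp += 1
        elif is_mutant and not positive:
            fn += 1
        elif not is_mutant and positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "NPV": ratio(tn, tn + fn),
        "FPR": ratio(fp, fp + tn),
        "FDR": ratio(fp, fp + tp),
    }
```

For the TCGA cohort the call would be `assign_phenotype(scores, 0.0)`, and for the MLOS cohort `assign_phenotype(scores, 2.1)`; samples with unknown mutation status would need to be excluded before calling `performance`.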

NanoString™ data analysis

A NanoString™ panel was created from the refined STK11 signature, consisting of 122 genes. The assays were performed with 150-ng aliquots of RNA using the NanoString nCounter Analysis system. Generic CLIA-certified codesets were obtained directly from NanoString and gene-specific oligonucleotides were obtained from IDT (see Tables S7-9). After codeset hybridization overnight, the samples were washed and immobilized to a cartridge using the NanoString nCounter Prep Station. Cartridges were scanned in the nCounter Digital Analyzer at 555 fields of view for the maximum level of sensitivity. Mean background signal for each chip was calculated as the arithmetic mean of all negative controls for that chip. Background-subtracted signals were then calculated for each chip by subtracting the negative control mean from the gene abundances. For each sample, the geometric mean of 13 housekeeping genes was used for normalization. Five of the 18 intended housekeeping genes were not used (GIGYF2, ORMDL1, PRDM4, TRIM39, USP4), due to the presence of samples for which non-positive background-subtracted signal was observed. Each chip was then scaled by (100 / geometric housekeeping mean), floored to 1, and then transformed into log2 space.
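
The normalization steps above (background subtraction, housekeeping scaling, flooring, and log2 transformation) can be sketched for a single sample as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary-based input, and the probe names in the usage note are hypothetical, and the sketch raises an error on non-positive housekeeping signal rather than reproducing the gene exclusion described in the text.

```python
from math import exp, log, log2


def normalize_sample(counts, negative_controls, housekeeping, target=100.0):
    """Normalize raw counts for one sample.

    counts: dict mapping probe name -> raw count
    negative_controls: names of negative-control probes in counts
    housekeeping: names of housekeeping genes used for scaling
    Returns a dict of log2 normalized values for all non-control probes.
    """
    # 1) Background = arithmetic mean of the negative controls.
    background = sum(counts[n] for n in negative_controls) / len(negative_controls)

    # 2) Subtract background from every gene probe.
    signal = {g: c - background for g, c in counts.items()
              if g not in negative_controls}

    # 3) Geometric mean of housekeeping genes (requires positive signal).
    hk = [signal[g] for g in housekeeping]
    if any(v <= 0 for v in hk):
        raise ValueError("non-positive housekeeping signal; drop that gene")
    geomean = exp(sum(log(v) for v in hk) / len(hk))

    # 4) Scale by target / geomean, floor at 1, and take log2.
    scale = target / geomean
    return {g: log2(max(v * scale, 1.0)) for g, v in signal.items()}
```

For example, with negative controls of 4 and 6 counts (background 5) and housekeeping counts of 205 and 55, the housekeeping geometric mean after background subtraction is 100, so the scale factor is 1 and a gene with 5 raw counts is floored to 1 before the log2 transform.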

Immunohistochemistry

A tissue microarray (TMA) was constructed from available diagnostic paraffin blocks from 150 Moffitt LUAD patients (MLCOM cohort). Slides from potential donor blocks were stained with hematoxylin-eosin and examined by a board-certified clinical pathologist. Appropriate blocks were released for study and representative tumor areas (and a subset of normal tissue areas) were marked. Donor tissue cores with a diameter of 0.6 mm were punched and arrayed into a recipient paraffin block using a tissue arrayer (Beecher Instruments, Silver Spring, MD, USA). The TMA included 150 cores from primary adenocarcinomas (though some were lost during staining), 58 cores of adjacent normal lung tissue, 14 cores from non-lung tissue controls (normal and cancer) and 10 samples of lung cancer cell lines of known STK11 status (which were used to demonstrate the specificity of STK11 staining).

TMA slides were cut into 4 µm sections and stained with a mouse STK11 monoclonal antibody (sc-32245, Santa Cruz Biotechnology, Santa Cruz, CA, USA) by the Moffitt Pathology Core (staining details are available upon request). The stained TMA was reviewed by a board-certified clinical pathologist (author SGB) blinded to the molecular data. Normal tissue cores were examined to determine staining criteria. The staining of tumor tissue was scored as either negative or positive, with positive values ranging from +1 to +4. For statistical analysis, all positive staining cores were grouped together, as previously described [10].

Supplementary Results

Expression of stable, but catalytically inactive, STK11 variants may obscure STK11 pathway mutations

Previous work has identified three alternatively spliced isoforms of STK11: 1) the full-length 50-kDa isoform, 2) a 48-kDa isoform with an alternative C-terminus that is expressed in the testis [11], and 3) an oncogenic, but kinase-inactive, 42-kDa isoform that is mainly expressed in normal heart and skeletal muscle [12] and that is reported to be expressed (along with other shortened STK11 isoforms) at high levels in tumors with mutations in STK11 codons 1 and 2 [13]. Such isoforms could potentially obscure standard IHC assessment of STK11 pathway status. Thus, we sought to examine the expression of STK11 isoforms in established cell lines and tumors. Forty-two cell lines and fifty-six tumor samples with known STK11 status were examined by STK11 western blotting (Figure 1). The full-length 50-kDa STK11 band was present in STK11 wildtype (WT) cell lines, as expected, with H2170 cells being the exception. The 48-kDa isoform was observed in only two cell lines (H292 and HCC4006, both STK11 WT) and was co-expressed with the 50-kDa band. In H2170, H520, H1395, Calu-6 and H1581 cells, intense faster-migrating bands were observed in the 42-kDa size range, which were generally co-expressed with the 50-kDa isoform. Full-length STK11 protein expression was absent in all STK11 mutant cell lines, as expected. However, in contrast to our expectations from the literature [12, 13]: 1) we did not see the 42-kDa band in H460 cells [12], 2) the 42-kDa band was present in only one cell line with a known mutation in exons 1 or 2 (H1395), and 3) the 42-kDa band was present in four cell lines considered STK11 WT and was co-expressed with the full-length 50-kDa isoform in three of the four (we did not see both bands in STK11 WT H358 cells as previously reported [12]).

In patient tumor samples, STK11 western blotting revealed a wide range of protein isoform expression for both STK11 WT and variant tumors (Figure 1B). Expression of both the 42- and 48-kDa isoforms was much more common in tumors than in cell lines, and in many tumors the shorter isoforms were expressed at higher levels than the full-length protein. We did not observe expression of the 42-kDa isoform to be correlated with STK11 mutations as previously reported [13].

Development, refinement and validation of a 122-gene STK11 mutation signature

Supplementary Table S1 lists the cell lines, STK11 mutation status, histology and source of microarray data used to identify an STK11 signature. An initial 130-probeset STK11 mutation signature was generated using gene expression data from 53 NSCLC cell line samples taken from ArrayExpress [14] accession E-MTAB-783, with additional metadata from the Sanger Cell Line Project [15], as reported [16]. A second STK11 mutation signature was later generated using NSCLC cell line gene expression data from the Cancer Cell Line Encyclopedia (CCLE) [17]. CEL files were normalized using IRON [18] against the median sample. Principal component analysis (PCA) was performed, and samples were identified that did not cluster with other samples of the same conformed site of origin (SOO). For 51 of these samples, literature and other notations in the metadata were used to support reclassification of the originally reported SOO to a new conformed SOO that agreed with the gene expression metadata. Twenty outlier samples, for which no justification could be found for altering their reported SOO, were discarded due to large disagreement between gene expression and reported SOO. The remaining 971 samples were then de-batched using COMBAT [19], using the batch reported in the metadata, with conformed SOO as a covariate.

STK11 mutation calls were assembled from CCLE [17], COSMIC [20], the Roche Cancer Genome Database (RCGDB) [21, 22], and personal communication (Luc Girard, UTSW). Synonymous mutations, as well as those occurring within introns, were considered to be phenotypically wild type (WT). Where multiple sources disagreed between mutant and WT, the mutant assignment was used. Assignment of consensus WT was more difficult, since reporting of WT does not necessarily indicate that the sample is truly WT, only that no mutations were specified by the given source. Literature sources and RCGDB were determined to be the most sensitive sources of mutation calls, so consensus WT was assigned if the sample was declared to be WT from either of these two sources, and did not have a mutation assignment from any other source. This resulted in a final gene expression cohort of 24 STK11 mutant and 55 STK11 WT cell lines.
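
The consensus rule described above can be written as a small decision function. The sketch below is illustrative only: the source keys, the mutation class labels, and the "unknown" outcome for samples meeting neither condition are assumptions made for the example, as is the choice to count a synonymous or intronic call from a sensitive source as a WT declaration.

```python
# Mutation classes treated as phenotypically WT (per the text).
SILENT_CLASSES = {"synonymous", "intronic"}
# Sources considered most sensitive for mutation calls (per the text).
SENSITIVE_SOURCES = {"literature", "RCGDB"}


def consensus_status(calls):
    """Combine per-source calls into 'mutant', 'WT', or 'unknown'.

    calls: dict mapping source name -> None (no information), 'WT',
           or a mutation class string such as 'missense' or 'synonymous'.
    """
    effective = {}
    for source, call in calls.items():
        if call is None:
            continue
        is_wt = call == "WT" or call in SILENT_CLASSES
        effective[source] = "WT" if is_wt else "mutant"

    # Any mutant call wins when sources disagree.
    if "mutant" in effective.values():
        return "mutant"
    # WT requires an explicit WT declaration from a sensitive source.
    if any(effective.get(s) == "WT" for s in SENSITIVE_SOURCES):
        return "WT"
    return "unknown"
```

For instance, `consensus_status({"CCLE": "WT", "COSMIC": "missense"})` returns `"mutant"`, while a sample reported as WT only by CCLE remains `"unknown"` under this sketch.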

Gene expression signal intensities were converted to log2 intensities, and the averages for both the STK11 mutant and WT groups were calculated for each probeset. The following strict filters were then applied: the maximum of the mutant and WT average log2 intensities must be greater than 5; the two averages must differ by at least one (≥ 2-fold change); both the Student's t-test and Mann-Whitney U-test p-values must be < 0.0001; and the Hellinger distance must be ≥ 1/3. Probesets without any gene annotation, those mapping to only pseudogenes, and those mapping to multiple genes on non-adjacent chromosome bands were removed. This resulted in a second STK11 mutation signature composed of 82 probesets. Merging the first (E-MTAB-783) and second (CCLE) signatures yielded a combined signature of 182 probesets.
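
The per-probeset filters above can be combined into a single predicate. The sketch below is a hedged illustration: the text does not state how the Hellinger distance was estimated, so this example assumes a normal distribution fitted to each group, and it takes the two p-values as inputs (they could be computed with, for example, `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`). All function and parameter names are hypothetical.

```python
from math import exp, sqrt
from statistics import mean, stdev


def hellinger_normal(mu1, sd1, mu2, sd2):
    """Hellinger distance between N(mu1, sd1^2) and N(mu2, sd2^2)."""
    var_sum = sd1 ** 2 + sd2 ** 2
    if var_sum == 0.0:  # both groups constant
        return 0.0 if mu1 == mu2 else 1.0
    # Bhattacharyya coefficient for two univariate normals.
    bc = sqrt(2.0 * sd1 * sd2 / var_sum) * exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))
    return sqrt(max(0.0, 1.0 - bc))


def passes_filters(mut_log2, wt_log2, p_ttest, p_mwu,
                   min_intensity=5.0, min_diff=1.0,
                   max_p=1e-4, min_hellinger=1.0 / 3.0):
    """Apply the intensity, fold-change, p-value, and Hellinger filters
    to one probeset, given log2 intensities for each group."""
    m_mut, m_wt = mean(mut_log2), mean(wt_log2)
    h = hellinger_normal(m_mut, stdev(mut_log2), m_wt, stdev(wt_log2))
    return (max(m_mut, m_wt) > min_intensity
            and abs(m_mut - m_wt) >= min_diff
            and p_ttest < max_p
            and p_mwu < max_p
            and h >= min_hellinger)
```

Each group needs at least two values for `stdev` to be defined; the annotation-based exclusions (unannotated, pseudogene-only, and multi-locus probesets) are a separate step not shown here.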

Three published Affymetrix-based gene expression databases [3, 6, 7] representing nearly 1000 non-small cell lung cancer (NSCLC) patients were used to further validate and refine the STK11 signature. Principal component analysis (PCA) was performed and the loading coefficients of the first principal component for each gene were used as a filter to determine which genes would translate well from cell lines to patient tissues. Figure S1 demonstrates that the vast majority of the genes (represented by 182 microarray probesets) have the same sign for the first principal component loading coefficients in both the cell lines and patient samples, likely demonstrating similar biological variation in both settings. Only a handful of probesets (representing genes CYP1B1, MECOM, PDLIM5 and TYMP, which are plotted as triangles in Figure S1) consistently exhibited the opposite sign in tumors and in cell lines and were excluded from the signature. This resulted in a refined signature of 122 genes.
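
The sign-concordance filter described above can be sketched as follows. The example assumes that the first principal component has already been oriented the same way in every dataset (the sign of a principal component is otherwise arbitrary) and that "consistently" means discordant in every tumor dataset in which the probeset is present. The function name and data layout are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def consistently_discordant(cell_line_loadings, tumor_loadings_by_dataset):
    """List probesets whose first-PC loading has the opposite sign to the
    cell lines in every tumor dataset that contains the probeset.

    cell_line_loadings: dict probeset -> loading in cell lines
    tumor_loadings_by_dataset: dict dataset name -> dict probeset -> loading
    """
    flagged = []
    for probe, ref in cell_line_loadings.items():
        ref_sign = sign(ref)
        tumor_signs = [sign(loadings[probe])
                       for loadings in tumor_loadings_by_dataset.values()
                       if probe in loadings]
        if ref_sign != 0 and tumor_signs and all(s == -ref_sign for s in tumor_signs):
            flagged.append(probe)
    return flagged
```

Probesets returned by this function would be the candidates for exclusion; a probeset that is discordant in only some datasets is retained under this reading of the rule.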

The ability of the STK11 signature to classify LUAD tumors from two large datasets with available expression and mutation/copy number data was tested. This analysis used the TCGA LUAD RNAseq dataset [4], including 488 patient samples, and a second, recently described [2] microarray-based expression dataset from 442 LUAD patients (herein the MLOS cohort, described in Table S1). Figures S1D and S1E demonstrate that the loading coefficients of the first PC of most genes in the signature have the same sign as in cell lines. Figures 2A and 2B demonstrate that the first principal component strongly separates STK11 WT and mutant tumors. Various measures of agreement with sequenced STK11 mutation status were then calculated (Table 1) and demonstrate that the sensitivity of the signature relative to determined mutations is good (0.95-0.97), but specificity (0.73-0.74) is poor when sequencing is taken as the standard. These are the expected results, since there may be many tumors with WT STK11 DNA that downregulate the STK11 protein or pathway via other mechanisms [10, 23].