Detailed Materials and Methods Section
This document contains a more detailed methods section than that provided in
Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu, Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF Mischel PS (2006) Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. PNAS
METHODS:
Patient samples:
120 glioblastoma patient samples with sufficient high quality RNA were available for molecular analysis. Tumor samples were obtained at the time of surgery in accordance with a UCLA IRB-approved protocol. The diagnosis was confirmed by frozen section and reviewed by subsequent at least two neuropathologists. Dataset 1, consisted of 55 glioblastomas (21). Dataset 2 consisted of the next 65 glioblastoma samples available.
Microarray Data
Gene expression profiling with Affymetrix high density oligonucleotide microarrays was performed and analyzed as previously described(21). 10 mg of total RNA was reverse transcribed, second strand synthesis performed, and in vitro transcription performed using biotinylated nucleotides using the ENZO BioArray High Yield Kit (Enzo Diagnostics, Farmingdale, NY). 15 mg of fragmented cRNA probe was hybridized to Affymetrix HG-U133A arrays (Affymetrix) and stained with streptavidin-phycoerythrin. Microarrays were scanned in the Genearray Scanner (Hewlett Packard) to a normalized target intensity of 2500. Microarray Suite 5.0 was used to define absent/present calls and generate cel files using the default settings. All grid placements and microarrays were visually reviewed for accuracy and array defects. Data files (cel) were uploaded into the dCHIP program (http://www.dchip.org/) and normalized to the median intensity array. Quantification was performed using model-based expression and the perfect match minus mismatch method implemented in dCHIP.
Software Tutorial and Data Availability
We provide the entire statistical code used for generating the weighted gene co-expression network results so that the reader can reproduce all of our findings. A weighted gene co-expression network analysis tutorial and the data files can be found at the following webpage:
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ASPMgene
Generation of weighted gene coexpression network:
Genes with expression levels that are highly correlated are biologically interesting, since they imply common regulatory mechanisms or participation in similar biological processes. To construct a network from microarray gene-expression data, we begin by calculating the Pearson correlations for all pairs of genes in the network. Because microarray data can be noisy and the number of samples is often small, we weight the Pearson correlations by taking their absolute value and raising them to the power ß. This step effectively serves to emphasize strong correlations and punish weak correlations on an exponential scale. These weighted correlations, in turn, represent the connection strengths between genes in the network. By adding up these connection strengths for each gene, we produce a single number (called connectivity, or k) that describes how strongly that gene is connected to all other genes in the network. The weighted network construction was performed using R as described in (22). Briefly, the absolute value of the Pearson correlation coefficient was calculated for all pair-wise comparisons of gene-expression values across all microarray samples. The Pearson correlation matrix was then transformed into an adjacency matrix A, i.e. a matrix of connection strengths using a power function. Thus, the connection strength aij between gene expressions xi and xj is defined as . The resulting weighted network represents an improvement over unweighted networks based on dichotomizing the correlation matrix, since a) the continuous nature of the gene co-expression information is preserved and b) the results of weighted network analyses are highly robust with respect to the choice of the parameter ß, whereas unweighted networks display sensitivity to the choice of the cutoff. Gene expression networks, like virtually all types of biological networks, have been found to exhibit an approximate scale free topology. To choose a particular power ß, we used the scale-free topology criterion on the 8000 most varying genes but our findings are highly robust with respect to the choice of ß. We chose a power ß=6, which is large enough so that the resulting network exhibited approximate scale free topology (model fitting index R-squared=0.95). The network connectivity ki of the i-th gene expression profile xi is the sum of the connection strengths with all other genes in the network, i.e.
The next step in network construction is to identify groups of genes with similar patterns of connection strengths by searching for genes with high topological overlap (22, 63). The use of topological overlap serves as a filter to exclude spurious or isolated connections during network construction.
To calculate the topological overlap for a pair of genes, we compare them in terms of their connection strengths with all other genes in the network. For a network represented by an adjacency matrix , a well-known formula for defining topological overlap for weighted networks is given by:
where, denotes the number of nodes to which both i and j are connected, and denotes the connectivity. Since the module identification is computationally intensive, only the 3600 most connected genes were considered for module detection. Since module genes tend to have high connectivity this step does not lead to a big loss in information. Using the topological overlap dissimilarity measure (1 – topological overlap) in average linkage hierarchical clustering, five gene coexpression modules were detected in the 55 training set samples. As we demonstrate in R tutorial that can be found on our webpage, module identification is fairly robust with respect to the dissimilarity measure; using the standard gene expression dissimilarity based on 1 minus the absolute value of the Pearson correlation produces roughly the same modules.
After identifying modules of co-expressed genes, each module in effect becomes a new network, and a new measure of connectivity (intramodular connectivity, or kin), is defined as the sum of a gene's connection strengths with all other genes in its module.
Scale-free networks
Scale free networks can be found in numerous domains and they are essential for understanding gene and protein interactions. The following references may serve as introductory reading on scale free networks: Barabási and Albert (1999), Albert and Barabasi (2000), Albert et al (2000), Barabasi and Oltvai (2004), Jeong et al (2001).
Computing the module eigengene
To compute the module eigengene, we decomposed the standardized gene-expression profile of each module via the singular value decomposition (X=UDVT). The first column V1 of V corresponded to the module eigengene (25).
Using GBM survival time to define a measure of prognostic gene significance
For each of the 120 glioblastoma samples (55 in dataset 1, 65 in dataset 2), patient survival information was available. Since some of the survival times were censored, we used a Cox proportional hazards model to regress survival time on individual gene expression profiles. To identify a measure of prognostic gene significance for each gene, we used a Cox proportional hazards regression model to compute a hazard ratio and a p-value. We defined the (prognostic) gene significance of a gene by minus the logarithm of its univariate Cox regression p-value.
To relate intramodular connectivity to prognostic gene significance we Spearman correlation coefficients and the corresponding p-values (cor.test function in the R software).
Breast cancer dataset
We used the Agilent microarray data that were used to find prognostic genes for breast cancer recurrence in a published dataset (23). The processed data and a corresponding network tutorial can be found at our webpage.
Using hierarchical clustering involving all genes, we identified one sample (S54) in the metastasis group as a data outlier and excluded it from further analysis. Since we were mainly interested in studying the preservation of the GBM module color assignment in the breast cancer network, we mapped the 3600 most connected GBM genes into the Agilent microarray data. Carrying out a distinct network analysis of the breast cancer data is beyond the scope of this article and will be reported elsewhere. We assigned colors to the breast cancer gene expression profiles according to the module assignment in the GBM 55 data. Next we constructed a weighted gene co-expression network using the power beta=6.Using the recurrence free survival time, we defined the gene significance of the i-th gene expression profile xi as minus log (with base 10) of the Spearman correlation test p-value, i.e. GSi=log10(Spearman p-value).
Statistical Robustness
Our results regarding the relationship between connectivity and prognostic gene significance are highly robust with regard to how the connectivity measure is defined (robust to choice of the power beta, the type of network used: weighted or unweighted) and with respect to how the gene significance measure was defined (see our online network tutorials).
Pathway analysis of hub genes:
The Ingenuity Pathways Knowledge Base (IPKB) was used to identify to the subnetwork of potential interactions (64). This subnetwork was further partitioned into smaller subnetworks by optimizing the local network density using the specificity connectivity of each focus gene (the relative percentage of its network connections to other focus genes). The initiation and growth of pathways proceeded from genes with the highest specificity of connections, and expanded to include up to 35 genes per network by iteratively selecting additional highly-specific network neighbors.
Genomic and functional analysis in glioblastoma cell lines:
The isogenic U87MG expressing PTEN, EGFR and EGFRvIII in varying combinations have been previously reported (2). In brief, cell lines were grown in duplicate cultures under serum free conditions for 48 hours, and RNA was isolated using the Qiagen RNeasy Mini Kit Gene. Expression analysis using Affymetrix HG-U133A arrays was performed and analyzed, as described above.
EGFR inhibitor treatment:
The EGFR tyrosine kinase inhibitor Erlotinib (Tarceva, OSI-774) was kindly provided by Genentech, Inc. (South San Francisco, CA). 1x105 U87MG and U87-EGFRvIII cells were seeded respectively in 100mm culture dishes and maintained in DMEM medium supplemented with 10% FBS. Cells were incubated in 5% CO2, 95% humidity incubator for 3 days to reach 50-70% confluency. Then all cells were switched to serum-free medium. The next day U87-EGFRvIII cells were treated by 5uM OSI-774 while U87MG and U87-EGFRvIII control group receiving the equivalent vehicle. 24 hrs later, cell total RNA was isolated by Qiagen RNeasy Mini Kit.
RT-PCR was performed for the following genes: ASPM, PRC1, AURKB, MELK, PTTG1 and TOP2A. RNA expression level was detected by semi-quantitatively. The PCR Primers used are: ASPM: forward ATCTCAAACGCCATCAGG, reverse CATTTTACGTTGCTTCCATTT; PRC1: forward AGCTCCACGATGCTGAGATT, reverse ACTATTGGCCGTAGCATTGG ; AURKB: forward CTATCGCCGCATCGTCAAGGTG, reverse GCAGCCGTTCCGAGGGGTTAT; MELK: forward CTCCGCCCCTCAGGTTCTTTTTCT, reverse AGCCACCTGTCCCAATAGTTTCAT; PTTG1: forward ATGCCCCACCAGCCTTACCTA, reverse GCTTGGCTGTTTTTGTTTGAG; TOP2A: forward CATTGGCTGTGGTATTGTAGAAAG, reverse GGCCCCCTGCATCATTGG.
siRNA mediated inhibition of ASPM
GM1600 low passage patient derived glioblastoma cells were established from a glioblastoma patient resection and cultured as previously described. Cells were maintained in IMEM medium supplied with 10% fetal bovine serum (FBS).
ASPM SiRNA sequences were designed by Dharmacon siDESIGN tool. (http://www.dharmacon.com/sidesign/default.aspx?source=0). 5’ overhang sequences for BglII and SalI were added for cloning. The ASPM SiRNA sequences were: forward gatccccCTGGTTCAGTGGTATTAAAttcaagagaTTTAATACCACTGAACCAGtttttggaac, reverse tcgagttccaaaaaCTGGTTCAGTGGTATTAAAtctcttgaaTTTAATACCACTGAACCAGggg. The ASPM SiRNA sequence was manually scrambled for generating scrambled ASPM siRNA as a control, and the mock sequence was blasted with NCBI nucleotide database yielding no significant alignments with other sequences. The mock SiRNA sequences were: forward gatccccTGTAGATACGTAGCTATGTttcaagagaACATAGCTACGTATCTACAtttttggaac, reverse tcgagttccaaaaaTGTAGATACGTAGCTATGTtctcttgaaACATAGCTACGTATCTACAggg. The SiRNA DNA oligos were synthesized and PAGE purified by IDT, Inc. The forward and reverse oligos were annealed at the presence of 50mM NaCl and cloned into retroviral vector LTRH1 (a gift from Dr. Medzhitov) BglII and SalI cloning site. The constructs were retro-virally infected into CD and HEK293T cells. Infected cells were sparsely seeded into 100mm dishes. Cells were maintained in 5% CO2, 95% humidity incubator until individual clones grew up. Cell clones were individually picked and expanded. Total RNA were isolated by Qiagen RNeasy Mini Kit and screened by semi-qRT-PCR for ASPM RNA expression level. The ASPM primer sequences used in RT-PCR were: forward ATCTCAAACGCCATCAGG, reverse CATTTTACGTTGCTTCCATTT.
For proliferation assays, 1500 cells/well in 8 replicates were seeded into 96 well plates. Cells were fixed and stained by 0.25% crystal violet in methanol every day or every other day. Stained plates were densitometry scored by AlphaImager 2200 software and plot in Microsoft Excel.
Neurosphere cell culture and transfection
Cerebral cortex was isolated from E12 mice. Cells were dissociated and cultured at 50,000 cells/ml in neurosphere formation medium (Neural Basal medium (Invitrogen) with B27 (GIBCO BRL), bFGF (Peprotech), EGF (Chemicon), heparin (Sigma-Aldrich) and penicillin-streptomycin (Gemini Bioproducts)) for a week. Growth factors were added every three days. Neurospheres were dissociated and plated onto poly-L-ornithine (Sigma)/ fibronectin coated 6-well plates in Neural Basal medium with 2% FBS (GIBCO BRL). 6 hours later, the serum medium was removed and replaced with neurosphere formation medium without heparin and penicillin-streptomycin. 24 hours later, cells were transfected with 100nM siRNA targeting Aspm and 100nM control siRNA targeting firefly luciferase using lipofectAMINE 2000 (Invitrogen). The cells were incubated with reagents for 6 hours and passaged for secondary neurosphere formation assay.
Secondary neurosphere formation assay
Cells were lifted off the plate with TriplExpress (GIBCO BRL), and then placed into neurosphere formation medium at 1000 cells/ml and 100 cells/ml. Neurospheres were propagated for 1 week and the number and the size of the secondary neurospheres formed were measured using Microcomputer Imaging Device program (MCID).
siRNA synthesis
siRNA was synthesized using the Silencer siRNA Construction Kit (Ambion). Three contructs were generated targeting different coding regions of Aspm, and one of them was demonstrated to knockdown Aspm efficiently and was chosen for further analysis.
Aspm target sequence: AATACAAAGCACGTACAGGAT
Semiquantitative RT-PCR
Total RNA was isoalted using TRIzol (GIBCO BRL), and 1μg RNA was converted to cDNA by reverse transcriptase (Imrpon). The amount of cDNA was measured by RT-PCR using prmers for GADPH. After adjustment, equal amount of cDNA for each sample was subjected to PCR analysis using gene-specific primers.
Some references on scale free networks