Supplemental Methods:
Primer Design. New primers were matched by Tm, GC content, and amplicon size so that all behaved comparably. Compared with our initial KSHV qPCR array (1), the individual amplicons were larger (~200 bp) and multiple primer pairs were placed over each open reading frame. We showed earlier that amplicons < 400 bp all had similarly high efficiency (2); only amplicons above 400 bp showed a length-dependent decrease in sensitivity under fast qPCR cycling conditions. The 200 bp amplicon size allowed us to easily visualize, clone, and sequence the PCR amplicons as an added quality control step. All primers were located within the originally predicted open reading frames (ORFs) in KSHV (3), preferentially towards the 3’ end.
Primer Validation. Since all qPCR primer pairs were located within ORFs and none crossed splice sites, we used total DNA from the BC-1 cell line as a positive control (Figure 3B). The results are depicted as a heat map. All primer pairs amplified the positive control and none amplified the water-containing non-template control. Namalwa RNA was used to test whether any of the KSHV qPCR primer pairs cross-reacted with human mRNAs or EBV mRNAs. Only primer pair PP2 K5 at position 26,200 did, and it was therefore excluded from further analysis. Some KSHV qPCR primer pairs had higher sensitivity than others, i.e. they yielded a positive signal even at 1/125 of the input DNA. Therefore, the qPCR efficiency for each primer pair was calculated individually and used to derive a primer efficiency-corrected CT, called CT’, as previously described (4). Also, rather than assuming the ideal amplification efficiency of 2 to calculate fold changes, we henceforth use 1.8, which more closely reflects the amplification efficiencies in this array.
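The effect of using 1.8 instead of 2 as the per-cycle amplification factor can be illustrated with a minimal Python sketch. It is not the CT’ procedure of reference (4), which is not reproduced here. It only shows the textbook estimate of efficiency from the slope of a dilution series and the resulting fold-change calculation. The 5-fold dilution steps and all CT values are hypothetical and chosen for illustration only.

```python
import math


def amplification_efficiency(dilutions, cts):
    """Estimate the per-cycle amplification factor from a dilution series.

    Fits CT = intercept + slope * log10(relative input) by least squares
    and returns 10 ** (-1 / slope). Perfect doubling per cycle gives 2.0.
    """
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)


def fold_change(delta_ct, efficiency=1.8):
    """Fold difference in template implied by a CT difference."""
    return efficiency ** delta_ct


# Hypothetical 5-fold dilution series down to 1/125 of the input DNA.
dilutions = [1, 1 / 5, 1 / 25, 1 / 125]
# Hypothetical CT values for one primer pair (not measured data).
cts = [25.0, 27.7, 30.5, 33.2]

efficiency = amplification_efficiency(dilutions, cts)
print(f"estimated efficiency: {efficiency:.2f}")

# The same 3-cycle difference read with the ideal and the array-wide factor.
print(f"fold change at E = 2.0: {fold_change(3, 2.0):.2f}")
print(f"fold change at E = 1.8: {fold_change(3, 1.8):.2f}")
```

With these made-up values, a 3-cycle difference corresponds to a smaller fold change at 1.8 than at 2, which is why the choice of base matters when CT differences are converted to fold changes.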
Analytical Power and Reproducibility. To increase the analytical power of our analyses, we selected high-quality samples. First, we excluded any cases with incomplete clinical data. Second, we excluded cases with too little overall RNA. Third, we excluded cases with too little KSHV latent mRNA, as these are likely to represent biopsies in which only a few cells were cancer cells. To arrive at the final high-quality data set, we removed additional samples that had detectable but much lower KSHV latent mRNA levels compared to the mean of the set. This reduced the standard deviation (SD), i.e. the biological variability, in the latent mRNA level, as measured by two primers in the 3’ end of the latency (LANA/vCYC/vFLIP) transcript, to 1.5 CT units for the validation data set. Assuming that the LANA promoter is constitutively active at similar levels in all KS tumor cells, this implies that the number of infected cells and/or the copy number per cell varied less than 3 fold, which is close to the variability of cell line-based biological replicates. The technical variability, i.e. pipetting error, is < 0.5 CT units for quadruplicate measurements owing to the fully automated processing and set-up of the qPCR array. Hence, we are able to identify any difference in mRNA levels between two clusters that exceeds 6 fold (2 SD = 3 x 2 fold) with > 97% confidence.
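The numbers in this paragraph can be checked with a short Python sketch. It converts the SD of 1.5 CT units into fold differences using the amplification factor of 1.8 and computes the one-sided normal probability at 2 SD. Reading the > 97% confidence as a one-sided normal tail is an assumption made here; the text does not state how the figure was derived.

```python
import math

EFFICIENCY = 1.8  # per-cycle amplification factor used for this array
SD_CT = 1.5       # biological SD of the latent mRNA level, in CT units


def ct_to_fold(delta_ct, efficiency=EFFICIENCY):
    """Convert a CT difference into a fold difference in template."""
    return efficiency ** delta_ct


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


fold_1sd = ct_to_fold(SD_CT)      # fold difference spanned by 1 SD
fold_2sd = ct_to_fold(2 * SD_CT)  # fold difference spanned by 2 SD (3 CT units)
one_sided_conf = normal_cdf(2.0)  # assumed one-sided confidence at 2 SD

print(f"1 SD = {fold_1sd:.2f} fold")
print(f"2 SD = {fold_2sd:.2f} fold")
print(f"one-sided confidence at 2 SD = {one_sided_conf:.3f}")
```

Under these assumptions, 1 SD stays below 3 fold, 2 SD lands just under 6 fold, and the one-sided probability at 2 SD is slightly above 97%, in line with the thresholds quoted above.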
Further Statistical Analysis Using R. We used the Shapiro-Wilk test to evaluate whether individual data groups were normally distributed. The Wilcoxon signed-rank test was used to evaluate changes in KSHV viral load between samples for individual genes, Pearson correlation analysis to establish correlations between viral loads, CD4 counts, and viral mRNA levels, and unsupervised mixed factor analysis to evaluate overall gene expression patterns. Principal component analysis and agglomerative hierarchical clustering using Ward’s algorithm were conducted as published by Husson et al. (5). Fisher’s exact test or ANOVA was used to evaluate whether a particular clinical variable or the expression of any one mRNA was overrepresented in any one of the clusters. We used the q-value method to adjust for multiple comparisons. Note that any comparison among clusters based on normalized expression (ddCT) does not take into account the overall mRNA abundance. An mRNA that is barely detectable in KS samples with extended transcription but not detectable at all in the latent subtype would generate the same p-value as an mRNA that is highly and consistently present in one KS subtype and downregulated, but still detectable, in the other KS subtype.
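The caveat about overall mRNA abundance can be illustrated with a small Python sketch (the analysis itself was run in R; this is not the authors' code). It computes a one-way ANOVA F statistic for two hypothetical mRNAs whose normalized values differ only by a constant offset of 10 cycles. Because the F statistic is unchanged by such an offset and the degrees of freedom are identical, both mRNAs would yield the same p-value. All values and cluster labels below are made up for illustration.

```python
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical normalized CT values; higher means less mRNA.
# mRNA A: close to the detection limit in both clusters.
low_extended = [14.0, 14.5, 13.5, 14.2]
low_latent = [16.0, 16.4, 15.8, 16.2]

# mRNA B: same pattern, but 10 cycles more abundant.
high_extended = [v - 10 for v in low_extended]
high_latent = [v - 10 for v in low_latent]

f_low = anova_f([low_extended, low_latent])
f_high = anova_f([high_extended, high_latent])

print(f"F for the barely detectable mRNA: {f_low:.3f}")
print(f"F for the abundant mRNA:          {f_high:.3f}")
```

The two F statistics agree up to floating-point rounding, so a ranking of mRNAs by p-value alone cannot distinguish the abundant transcript from the barely detectable one.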
References
1. Fakhari FD, Dittmer DP. 2002. Charting latency transcripts in Kaposi's sarcoma-associated herpesvirus by whole-genome real-time quantitative PCR. Journal of virology 76:6213-6223.
2. Hilscher C, Vahrson W, Dittmer DP. 2005. Faster quantitative real-time PCR protocols may lose sensitivity and show increased variability. Nucleic acids research 33:e182.
3. Russo JJ, Bohenzky RA, Chien MC, Chen J, Yan M, Maddalena D, Parry JP, Peruzzi D, Edelman IS, Chang Y, Moore PS. 1996. Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8). Proceedings of the National Academy of Sciences of the United States of America 93:14862-14867.
4. Lock EF, Ziemiecke R, Marron J, Dittmer DP. 2010. Efficiency clustering for low-density microarrays and its application to QPCR. BMC bioinformatics 11:386.
5. Husson F, Le S, Pages J. 2011. Exploratory Multivariate Analysis by example using R. CRC Press, Boca Raton.