High-throughput analysis of alternative splicing by RT-PCR

Roscoe Klinck, Benoit Chabot and Sherif Abou Elela*

Laboratoire de génomique fonctionnelle de l’Université de Sherbrooke and département de microbiologie et d’infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1H 5N4, Canada

*Correspondence to

1. Abstract

As the importance of alternative splicing in biology becomes increasingly evident, the drive to explore biological systems with ever-higher precision and depth continues apace. At the RNA level, microarray and, more recently, deep sequencing (RNA-Seq) techniques have emerged to meet this challenge, but the underlying requirement for data validation, most commonly by RT-PCR, can be a limiting factor. We have developed a rapid and simple technique based on endpoint PCR for the precise characterization of defined alternative splicing events. We have adapted this technique to a high-throughput automated platform, which can routinely perform and analyze up to three thousand PCR reactions per day. This allows large-scale validation of genome-wide microarray or RNA-Seq data, and analysis of sequence databases for the discovery and annotation of tissue-specific alternative splicing. The key features of this method are the computational design and data analysis protocols, and the use of microcapillary electrophoresis for the detection and characterization of alternative splicing events.

2. Theoretical background

2.1 Endpoint PCR for the detection of alternative splicing events

The regulation of gene expression by alternative splicing distinguishes itself from transcriptional regulation of expression by including an isoform ratio parameter. That is, in addition to the overall expression level of a given gene, the relative ratio of individual isoforms can have a marked biological significance. This has been well exemplified for the MAPT gene, where the disruption of the 4R-tau:3R-tau isoform ratio can lead to fronto-temporal dementia and parkinsonism [1]. Thus in practice, meaningful annotation of alternative splicing involves first an assessment of whether or not the splicing event is producing more than one isoform in the RNA sources under study, and second, a measure of the ratio of the isoforms produced. We have named these two properties the “active status” and the “percent splicing index (psi or Ψ)”, respectively. An alternative splicing event is considered “active” if both of its isoforms can be detected by PCR in two samples being compared. For example, a cassette exon alternative splicing event has one isoform which includes the exon and one which excludes it, giving rise to long and short isoforms respectively. If both the long and short forms are detected in at least one of the RNA sources under study, then that alternative splicing event is referred to as “active”. The ratio of the long and short forms of the active alternative splicing event in each RNA source is represented by the Ψ value, which is defined as the concentration of the long form over the sum of the long and short form concentrations [2].
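The two properties defined above can be illustrated with a minimal sketch. The function names are hypothetical, and the "active" test assumes the reading in which both isoforms must be detected together in at least one RNA source.

```python
def percent_splicing_index(long_conc, short_conc):
    """Return psi = long / (long + short), or None if neither isoform is detected."""
    total = long_conc + short_conc
    if total <= 0:
        return None
    return long_conc / total


def is_active(samples):
    """samples: list of (long_conc, short_conc) pairs, one per RNA source.

    Assumption: an event is active when both isoforms are detected
    together in at least one RNA source.
    """
    return any(long_c > 0 and short_c > 0 for long_c, short_c in samples)


# Example: long form at 3.0 nM and short form at 1.0 nM
psi = percent_splicing_index(3.0, 1.0)
```

Expressed as a percentage, the Ψ value in this example is 75, since three quarters of the detected product is the long form.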

The term endpoint PCR is used to describe standard PCR amplification, in which the reaction mixtures are thermocycled to completion, typically over 35 to 40 cycles. Although these reactions are the simplest to perform, they cannot typically be used for quantification purposes. This is primarily because, as the reaction progresses, the amplification ceases to be purely exponential owing to the progressive depletion of reagents and the accumulation of inhibitors in the reaction mixture. A number of quantitative PCR (qPCR) methods have been developed to enable target quantification. All rely on the real-time detection of amplified products during the thermocycling, typically at the onset of the exponential phase.

The analysis of alternatively spliced mRNAs by RT-PCR is however an application which is particularly well suited to endpoint PCR analysis. A primer pair designed to target cDNA sequences flanking an alternative splicing event (Figure 1A) simultaneously amplifies both isoforms, and within certain experimental constraints discussed below, would be expected to preserve the relative ratio of the isoforms regardless of reagent depletion or inhibition. When this amplification is followed by the separation and quantification of the amplicons using capillary electrophoresis (see below), the isoform ratio can be determined experimentally. Thus, the simplest form of PCR amplification can provide valuable information regarding the relative concentrations of alternatively spliced mRNA isoforms. Conversely, the simultaneous detection of amplicons by qPCR requires either the design of isoform-specific, fluorescently labeled probes, with a different fluorophore for each isoform in a multiplex qPCR reaction, or the execution of two separate reactions, both of which are labour intensive and costly.

While factors such as primer initiation efficiency, reagent depletion and reaction inhibition are identical for the short and long amplicons and can thus be neglected, a number of other factors can affect the absolute validity of the ratio as measured by the endpoint method. These include the relative lengths of the amplicons, the quality of the initial RNA (or cDNA), RT efficiency, and the Taq polymerase elongation efficiency. Although we have developed design and experimental strategies to mitigate the effects of these factors, they cannot be ignored completely. The effect of target concentration has been extensively tested. Although it is known that PCR can behave unpredictably below a certain concentration (the so-called Monte Carlo effect [3]), in general the isoform ratios obtained are insensitive to concentration changes over a 1000-fold concentration range. We have also extensively tested the effect of amplicon size, in particular whether shorter amplicons tend to outcompete longer amplicons, and have concluded that competition is not a factor for amplicons under 500 bp. Given that the average human exon length is 170 nucleotides [4], this does not impose a significant constraint on the design. Thus, when applied to large-scale studies in multiple samples [2, 5-7], the Ψ values determined by this endpoint PCR method are robust and highly reproducible, and can be used to detect changes as low as 5 percentage points in Ψ value between sample sets.

2.2 Computational identification of alternative splicing events

Alternative splicing events (ASEs) can be inferred from EST and mRNA sequence information. There are a number of databases of transcript, and hence alternative splicing, information. Examples include the highly curated RefSeq database [8] and the more inclusive AceView database [9]. Gene transcript sequences can be examined for alternative splicing events either individually or in a database-wide mode. We have developed a Perl script which scans flat files from selected databases and outputs the location, type and size in nucleotides of all alternative splicing events. These parameters can be adjusted to optimize the output for our detection system in terms of size range and event complexity. For our cancer association studies (section 2.7), we have limited our analysis to simple cassette exon and alternative 5’ or 3’ splice site events, and have excluded intron retention events to avoid misinterpretation due to genomic DNA contamination of the samples. Nonetheless, nearly 80% of the alternative splicing events documented in the RefSeq database can be optimally detected using our standard design constraints. Specific events not conforming to these general constraints can be considered on a case-by-case basis. Table 1 summarizes the alternative splicing statistics for the most recent releases (as of September 2009) of the RefSeq databases for human and mouse.

Table 1: Summary of alternative splicing in RefSeq databases.

Category / Human RefSeq 36.3 / Mouse RefSeq 37
Total Genes / 22969 / 20230
Total ASEs* / 4507 / 2404
Detectable ASEs / 3461 / 1901
Excluded (intron retention) / 430 / 215
Excluded (> 2 products) / 570 / 256
Excluded (other) / 46 / 32

*alternative splicing events
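As a quick arithmetic check on Table 1, the following sketch confirms that the detectable and excluded categories sum to the total number of ASEs, and computes the detectable fraction for each species. The dictionary keys are hypothetical labels for the table rows.

```python
# Values copied from Table 1; key names are illustrative labels only.
TABLE_1 = {
    "human": {"total": 4507, "detectable": 3461,
              "intron_retention": 430, "multi_product": 570, "other": 46},
    "mouse": {"total": 2404, "detectable": 1901,
              "intron_retention": 215, "multi_product": 256, "other": 32},
}


def detectable_fraction(row):
    """Return detectable / total after checking that the categories add up."""
    excluded = row["intron_retention"] + row["multi_product"] + row["other"]
    if row["detectable"] + excluded != row["total"]:
        raise ValueError("categories do not sum to the total number of ASEs")
    return row["detectable"] / row["total"]


human_pct = round(100 * detectable_fraction(TABLE_1["human"]), 1)
mouse_pct = round(100 * detectable_fraction(TABLE_1["mouse"]), 1)
```

With these figures the detectable fraction is 76.8% for human and 79.1% for mouse.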

2.3 Primer design

The coordinates of the alternative splicing events are then used for the design of PCR primers flanking each event. An implementation of the Primer3 software package [10] is used for this process, allowing batch design and in silico validation of each primer pair. Ideally, the design generates amplicons in the size range of 115 to 500 bp, differing by at least 10% in length. Primer Tm, GC content and self-hybridization are taken into account during the design, and each pair is tested against the transcribed human genome for potential off-target effects.
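The amplicon size constraints can be sketched as a simple filter applied to each candidate design. The function name is hypothetical, and the 10% difference is assumed here to be measured relative to the longer amplicon, which the text does not specify.

```python
MIN_BP = 115         # shortest acceptable amplicon
MAX_BP = 500         # longest acceptable amplicon
MIN_REL_DIFF = 0.10  # minimum length difference (assumed relative to the long form)


def amplicons_ok(short_bp, long_bp):
    """Check the size range and the minimum relative length difference."""
    in_range = MIN_BP <= short_bp <= long_bp <= MAX_BP
    differ_enough = (long_bp - short_bp) >= MIN_REL_DIFF * long_bp
    return in_range and differ_enough


# The FGFR2 amplicons of Figure 1 (205 and 472 bp) satisfy both constraints.
fgfr2_ok = amplicons_ok(205, 472)
```

In practice such a filter would be applied after Primer3 has proposed a primer pair, using the amplicon sizes predicted for each isoform.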

2.4 Capillary electrophoresis

The use of microcapillary electrophoresis (µCE) for the detection and quantification of amplicons directly in the PCR reactions is one of the essential features of our approach. Compared with standard agarose gel electrophoresis, µCE has the distinct advantages of speed, resolution and sensitivity. Moreover, the direct digitization of the electropherograms greatly facilitates subsequent data analysis, storage and the comparison of results between laboratories. Our system uses the HT DNA 5K microcapillary chip assay in conjunction with the LabChip-90 instrument (Caliper LifeSciences, Hopkinton, MA). The instrument samples 10 nl directly from the PCR reaction plate, and generates a digital electropherogram in less than 30 seconds. Each detected amplicon is compared with external size standards, run every 12-24 samples, and with two internal size and concentration markers at 15 and 5000 bp. Thus the electropherograms provide both size and relative quantification data for each PCR reaction (Figure 1B). In our high-throughput application, the digitized data are transferred directly to a database for analysis.

2.5 Data analysis

The digitized data from each PCR reaction, containing the size and relative concentration of each amplified product, are then analyzed. By comparison with the primer design, the detected amplified products can be classified as either ‘expected’ or ‘unexpected’. The former are used for the calculation of the Ψ value, and the latter, possible evidence of novel splicing events or experimental artifacts, are stored for subsequent analysis and troubleshooting. Thus, each alternative splicing event in each RNA source is assigned a Ψ value which is used in all further analyses. A series of Perl scripts performs these assignment and classification tasks, and the data and results can be displayed as shown in Figure 1C. By performing the same set of experiments on biological replicates (e.g. similar tissues from different patients), statistical analysis can be performed on populations of Ψ values. Likewise, populations of different tissue types can be compared. These calculations and graphical representations can be performed using the R package, an example of which is shown in Figure 1D.
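The classification step can be sketched as follows. This is not the authors' Perl implementation; the function name, the input layout and the 5% size tolerance are assumptions made for illustration, and the marker peaks are assumed to have been removed beforehand.

```python
def classify_peaks(peaks, expected_sizes, tolerance=0.05):
    """Split detected peaks into expected and unexpected products.

    peaks: list of (size_bp, concentration) pairs, markers already removed.
    expected_sizes: dict mapping isoform name to designed amplicon size in bp.
    tolerance: allowed relative size deviation (an assumed value).
    """
    assigned = {}
    unexpected = []
    for size, conc in peaks:
        match = None
        for name, expected in expected_sizes.items():
            if abs(size - expected) <= tolerance * expected:
                match = name
                break
        if match is None:
            unexpected.append((size, conc))
        else:
            assigned[match] = assigned.get(match, 0.0) + conc
    return assigned, unexpected


# Hypothetical electropherogram for a 205/472 bp design
peaks = [(207, 4.0), (470, 12.0), (350, 0.5)]
assigned, unexpected = classify_peaks(peaks, {"short": 205, "long": 472})
psi = assigned["long"] / (assigned["long"] + assigned["short"])
```

Here the 350 bp signal matches neither designed amplicon and is set aside as unexpected, while the two expected products give a Ψ value of 0.75.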

2.6 Validation of microarray and RNA-Seq data

Results from alternative splicing sensitive microarray experiments are commonly validated by RT-PCR [11, 12]. The high-throughput approach described here is well suited to this, allowing rapid validation of tens to thousands of alternative splicing events. This technique was recently used to validate microarray results on SMN knockdown in mouse [13].

Recent advances in sequencing technologies have the potential to detect and quantify alternative splicing events [14]. In practice, however, only highly expressed genes currently provide sufficient numbers of exon-exon junction reads for reliable quantification of splicing ratios. The read capacity will need to be increased significantly, i.e. by orders of magnitude, to capture sufficient junction reads to allow reliable calculation of Ψ values for the majority of alternative splicing events. In the meantime, one promising application is to use RNA-Seq as a discovery tool for active alternative splicing events, followed by RT-PCR validation of specific subsets.

2.7 Tissue-specific annotation from sequence databases

Because the high-throughput PCR method provides rapid access to Ψ values, it can be applied to existing sequence databases, such as the extensive expressed sequence tag (EST) resources, for the tissue-specific annotation of alternative splicing. We have devised a two-stage screening routine for this purpose, and have applied it to study breast and ovarian cancer-specific alternative splicing events [7], using the RefSeq transcript database as a starting point. In the first, “detection”, stage the active status of each alternative splicing event was determined in the tissues of interest. The second, “validation”, stage assessed the tissue specificity of each event using multiple biological replicates. Primers were designed for all of the documented alternative splicing events in the human RefSeq database (release 36.2), representing 2168 primer pairs. These reactions were performed on representative tissue pools from breast and ovary, and a set of active alternative splicing events for each tissue type was identified. The validation screen was then used to monitor changes in Ψ values between tissue types. To establish the statistical significance of Ψ changes, each alternative splicing event was analyzed in a set of similar tissues and compared to sets of other tissues. Based on previous tissue comparisons in breast and ovary [2, 6, 7], the number of samples from each cohort required to obtain the appropriate statistical power (P values < 0.001 or better for Bonferroni-corrected t-tests) varied from 5 to 20 depending on the alternative splicing event under consideration.
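The multiple-testing correction mentioned above can be illustrated with a short sketch. It assumes that a raw P value has already been obtained for each event from a t-test on the two populations of Ψ values; the function name, event labels and P values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.001):
    """Apply a Bonferroni correction and flag events below alpha.

    p_values: dict mapping event name to raw P value.
    Returns a dict mapping event name to (adjusted P value, significant flag).
    """
    n_tests = len(p_values)
    results = {}
    for event, p in p_values.items():
        adjusted = min(1.0, p * n_tests)  # Bonferroni: multiply by number of tests
        results[event] = (adjusted, adjusted < alpha)
    return results


# Hypothetical raw P values for three events
raw = {"ASE_1": 1e-6, "ASE_2": 4e-4, "ASE_3": 0.5}
corrected = bonferroni_significant(raw)
```

In this example only ASE_1 remains below the 0.001 threshold after correction, which illustrates why larger cohorts are needed for events with modest Ψ differences.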

3. Protocol

3.1 Primer design

Primers are designed to target the exons flanking alternative splicing events, such that all expected amplicons will be between 115 and 500 bp, and differ from one another by at least 10% of their size. Primer3 [10] can be used for the design process; the full list of program parameters used can be obtained by contacting the authors. Primers are verified for potential off-target amplification using an adaptation of NCBI BLAST [15] on an in-house transcriptome database.

3.2 RNA preparation

Total RNA is extracted from tissues or cells using a TissueLyser apparatus (Qiagen) and either TRIzol (Invitrogen) or RNeasy kits (Qiagen) in tube or 96-well plate format, following the manufacturers’ protocols. The concentration and quality of the RNA are assessed using an RNA assay on a 2100 Bioanalyzer (Agilent Technologies) µCE instrument. The instrument software generates a RIN (RNA integrity number) between 1 and 10 for each sample. Samples with a RIN of 5 or above are considered acceptable.

3.3 RT and QC of cDNA

Reverse transcription of total RNA is carried out using either Omniscript (Qiagen) at 37 °C or Transcriptor (Roche) at 55 °C, following the manufacturers’ protocols. Between 25 ng and 2 µg of total RNA is incubated with the RT reagents, 20 U RNaseOUT ribonuclease inhibitor (Invitrogen), 1 µM oligo(dT)21 and 0.9 µM random hexamers (IDT), in a reaction volume of 20 µl. The cDNA is subjected to quality control using SYBR green qPCR of three reference genes, MRPL19, PUM1 and GAPDH (primer sequences available on request). cDNA preparations giving qPCR threshold cycle (Ct) values in the range of 14 to 25 for these three genes are considered acceptable.
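The cDNA acceptance criterion can be written as a simple check. The function name and input layout are hypothetical, and the sketch assumes that all three reference genes must fall within the stated Ct range.

```python
REFERENCE_GENES = ("MRPL19", "PUM1", "GAPDH")
CT_MIN, CT_MAX = 14.0, 25.0  # acceptable Ct range from the protocol


def cdna_passes_qc(ct_values):
    """ct_values: dict mapping reference gene name to its measured Ct.

    Assumption: a sample passes only if every reference gene was measured
    and its Ct lies within the acceptable range.
    """
    return all(
        gene in ct_values and CT_MIN <= ct_values[gene] <= CT_MAX
        for gene in REFERENCE_GENES
    )


# Hypothetical measurements for two cDNA preparations
good_sample = cdna_passes_qc({"MRPL19": 21.3, "PUM1": 22.8, "GAPDH": 16.1})
poor_sample = cdna_passes_qc({"MRPL19": 27.9, "PUM1": 24.0, "GAPDH": 18.5})
```

The second preparation fails because the MRPL19 Ct lies above the upper limit.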

3.4 PCR reactions and amplicon detection

Ten µl reactions are typically formulated in 384-well plates from 20 ng cDNA (1 µl), 10 mM dNTP (0.2 µl, Roche), 10X PCR buffer (1 µl, Invitrogen), 1.5 mM MgCl2 (0.3 µl, Invitrogen), 5 U/µl Taq polymerase (0.04 µl, Invitrogen Platinum Taq), 1.2 µM primers (5 µl, diluted in water), and water (2.46 µl). For amplification, an initial incubation of 2 minutes at 95 °C is followed by 35 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 60 s, and completed by 2 minutes at 72 °C. The 384-well (or 96-well) plates are transferred directly to a LabChip-90 microcapillary electrophoresis instrument (Caliper LifeSciences) and an HT DNA 5K chip assay is run following the manufacturer’s protocol. Caliper has recently replaced the LabChip-90 with the LabChip-GX instrument; the same assay can be run on this instrument. Data from the microcapillary instrument are output in the form of text files which contain instrument-derived peak size (in bp) and concentration data for all fluorescent signals detected, and numerical data to plot the electropherograms. We have written a suite of Perl scripts to retrieve these data from the instruments and merge them in our database with the design data containing the expected amplicon sizes. Signal assignment routines are run, and Ψ values are calculated and stored for each PCR experiment. Data can be represented as shown in Figures 1A-D. Computational scripts for the identification of novel splicing events have also been developed; here, data from two independent reactions characterizing the same putative alternative splicing event can be analyzed for consistency in the unexpected amplified products.
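The reaction formulation lends itself to a short master mix calculation. The sketch below assumes that cDNA and primers are dispensed per well, that the remaining components are pooled, and that a 10% overage is prepared; these choices and the names used are illustrative rather than part of the protocol.

```python
# Per-reaction volumes in µl, taken from the formulation above.
MASTER_MIX_UL = {
    "dNTP": 0.2,
    "10X PCR buffer": 1.0,
    "MgCl2": 0.3,
    "Taq polymerase": 0.04,
    "water": 2.46,
}
PER_WELL_UL = {"cDNA": 1.0, "primers": 5.0}  # assumed to be added individually


def reaction_volume():
    """Total volume of one reaction in µl."""
    return sum(MASTER_MIX_UL.values()) + sum(PER_WELL_UL.values())


def master_mix(n_reactions, overage=0.10):
    """Scale the pooled components for n_reactions plus a fractional overage."""
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in MASTER_MIX_UL.items()}


plate_mix = master_mix(384)  # volumes for one 384-well plate
```

The per-reaction volumes sum to the stated 10 µl, of which 4 µl comes from the pooled master mix.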

4. Example of an experiment

The design and experimental data for a set of 96 alternative splicing events on 8 RNA sources can be viewed by following the link for “Sample alternative splicing annotation data”.

5. Troubleshooting

Table 2 lists the most common problems encountered with our approach, and suggestions for mitigating their effect.

Table 2: Troubleshooting

Symptom / Possible cause / Action
No or low PCR amplification / RNA quality / Use only total RNA with RIN > 5
No or low PCR amplification / Primer design / Redesign primers
No or low PCR amplification / PCR reaction evaporation (only marker peaks present in electropherogram), caused by an improperly sealed PCR reaction plate or occurring during µCE / Dilute reaction and reanalyze, or rerun amplification
No or low PCR amplification / Low expression / Increase cDNA concentration
Presence of unexpected amplification products / RNA quality / Use only total RNA with RIN > 5
Presence of unexpected amplification products / Genomic DNA contamination / Confirm with reference to genome sequence, treat sample with DNase
Presence of unexpected amplification products / Primer dimers, concatamers / Avoid design of amplification products below 100 bp
Presence of unexpected amplification products / Heterodimers / No action; these generally elute at higher molecular weight
Presence of unexpected amplification products / Novel AS event / Confirm with independent primers spanning the same region, purify and sequence
Presence of unexpected amplification products / µCE instrument failed to automatically identify markers at 15 bp or 5000 bp / Manual assignment of marker peaks
Presence of unexpected amplification products / Direct repeat RT artifact / Analyze putative splice sites for consensus sequence and presence of direct repeats, perform RT at higher temperature (e.g. using Transcriptor reverse transcriptase (Roche)) [16]

Figure legends

Figure 1. PCR annotation of alternative splicing. A complete dataset can be viewed at A. Alternative splicing events (ASEs) are identified from transcript databases or datasets, and PCR primers are designed to flank each event. A screenshot from our database for 2 transcripts of fibroblast growth factor receptor 2 (FGFR2) is shown. Schematic detail of the ASE is shown, indicating the relative positions of the forward (green) and reverse (red) primers. B. Electropherogram obtained following endpoint PCR amplification. Microcapillary electrophoresis is performed on each reaction to detect the expected short (205 bp) and long (472 bp) amplicons. Size and concentration are measured for each amplicon signal and these are used to calculate the Ψ value as indicated at right. Marker peaks at 15 and 5000 bp are labeled M; residual primers are detectable immediately to the left of the M15 peak, and a heterodimer peak is detectable to the left of the long amplicon. C. Graphic display of the Ψ values. Left, Ψ value heatmap for 7 ASEs (rows) in 8 samples (columns). The top row of the heatmap shows the ASE depicted in A for FGFR2. Right, amplicon ratio representation for the FGFR2 ASE in 8 samples, showing the relative concentrations of the short (red bars) and long (green bars) amplicons; each row represents a different RNA source. Total detected molarity is shown to the right of each row; values greater than 5 nM are generally considered acceptable. D. Scaled heatmap representation of Ψ values for 656 ASEs (columns) in 25 normal human ovary tissues and 21 serous ovarian cancer specimens. See Venables et al. [6] for full details of the clustering methods used.