Splicing and Disease.

Emanuele Buratti and Francisco E Baralle*

International Centre of Genetic Engineering and Biotechnology (ICGEB), Trieste, Italy.

*address correspondence to: Prof. Francisco E. Baralle, Padriciano 99, 34149 Trieste, Italy, Phone: +39-040-3757337, Fax: +39-040-3757361, E-mail: .

Keywords: splicing, mutation, RNA, 5’ splice site, 3' splice site, bioinformatics

Key points

An increasing number of diseases are now recognized to be caused by the selection of ‘wrong’ splice sites

The selection of ‘wrong’ splice sites can be caused by mutations in the DNA or by changes in trans-acting factors

Aberrant splicing is best studied in monogenic diseases, but is increasingly found in complex diseases

ABSTRACT

The observation that impaired splicing may cause human disease dates from the earliest days of splicing research, in particular from the pioneering studies on haemoglobin genes. In the last fifteen years or so, however, increased knowledge of the process itself, coupled with great advances in diagnostic screening techniques, has greatly expanded that initial awareness. It is now clear that splicing mutations can occur in virtually any human intron-containing gene and that the resulting splicing alterations may cause disease. The pathological penetrance of these mutations may vary depending on the individual genetic background. Up to now, the most studied examples have been classical genetic diseases linked to alterations in the splicing regulation of a single gene. However, it is increasingly clear that splicing alterations play equally important roles in the origin and progression of complex diseases, such as tumour formation or neurological defects. The aim of this review is to provide basic pointers on splicing alterations and disease, focusing in particular on an overview of the consequences of genomic variations.

INTRODUCTION

In order to ensure accurate gene expression, the pre-mRNA splicing process has the task of removing intervening sequences (or introns) from eukaryotic precursor messenger RNAs (pre-mRNAs) [1]. Apart from joining consecutive exons together, this process is capable of selective removal or inclusion of exonic and intronic sequences in mRNA, generating several transcripts from a single gene, often in a cell type-specific, developmental, and even gender-dependent manner [2-8]. This process is called alternative splicing and, during the course of evolution, has achieved a very high degree of complexity, as recently reviewed in a number of publications [9,10]. This complexity is aimed at maintaining proper exon/intron recognition and is one of the essential factors that influence the shape of human genes [11]. In keeping with this, many recent reports have consistently highlighted the observation that even apparently neutral changes in the sequence composition of exons may alter splicing, revealing evolutionary mechanisms aimed at maintaining proper splicing regulatory pathways [12-14].

Both constitutive and alternative splicing pathways are carried out by a large ribonucleoprotein complex named the spliceosome [1,15]. Assembly of this very sophisticated cellular machinery [16,17] at every exon-intron or intron-exon junction is controlled by conserved but rather degenerate sequence elements that include the 5' splice sites (5'ss), the 3' splice sites (3'ss) and, upstream of the 3'ss, the polypyrimidine tract and the branch point sequence (BPS) (Fig. 1) (see chapter 5, Luhrmann). Because of their degeneracy, however, these consensus splicing signals contain only approximately half of the information necessary for accurate splice-site selection [18]. The remaining information is provided by auxiliary signals in introns and exons, termed splicing regulatory elements (SREs), that can also be referred to as enhancer and silencer sequences depending on their effect on exon recognition [19-21] (see chapter 4, Hertel). In most cases, these sequences work by interacting with trans-acting factors, whose number is steadily increasing over time [22,23]. In parallel, a considerable degree of splicing regulation can also occur in a protein-free fashion, with low-molecular-weight ligands [24], snoRNAs [25], processed snoRNAs ([25] and PubMed ID 20053671), and modifications of RNA secondary structure [26,27] all being capable of affecting this process. Finally, as the spliceosome and transcription machineries are tightly linked, splicing can be influenced by pre-mRNA processing kinetics and transcription [28,29], cellular stress [30], and extracellular signals [31]. As a result, splicing mutations may affect not only RNA processing but also transcription [32] and downstream gene expression pathways, including translation, largely by creating or eliminating exons containing upstream open reading frames [33,34]. The combination of all the factors influencing splicing contributes to what is now commonly known as the “splicing code” [35-37].
As expected from all this complexity, mutations in any of these cis-elements or factors can dramatically alter splicing efficiency, lead to aberrant splicing, and eventually to human disease, particularly in large genes with many introns [38,39].

Splicing and Disease

In recent years the topic of splicing and disease has been reviewed several times, most recently by Tazi et al. [40], Cooper et al. [41], and Baralle et al. [42]. All these reviews, with different emphasis on particular topics, have surveyed the general field in the light of the latest discoveries concerning the splicing process. The reader is therefore referred to these publications for a general, up-to-date overview of the subject.

At the same time, it is interesting to note that the huge amount of new information being published every year on the relationship between splicing and disease has resulted in the appearance of reviews that focus on particular kinds of disease. For example, starting from the initial overview by Venables [43] on potential connections between splicing and cancer, several reviews have followed up on this specific subject [44-48] (see also the book by Venables: Venables, J.P. (2006) Alternative splicing in cancer. Transworld Research Network, Kerala, India). In general, the mechanisms through which aberrant alternative splicing can bring about a tumorigenic transformation involve rather expected events, such as the production of protein isoforms with oncogenic properties (or with impaired anti-oncogenic properties). The genes involved in these cases predominantly encode factors that control processes such as apoptosis, cell-cycle regulation, and angiogenesis. One important factor that is also emerging from these studies on human cancers is the central role played by alterations in the expression of the splicing factors themselves, rather than by mutations in the splicing regulatory regions of individual genes, as recently reviewed by Grosso et al. [49]. Of particular interest is the recent identification of one of the best-studied splicing factors (SF2/ASF) as a potential proto-oncogene upon its overexpression in rodent fibroblasts [50]. In keeping with this conclusion, the same study has shown that SF2/ASF is overexpressed in a variety of human tumours. The mechanisms through which transformation comes about are still under study. The study by Karni et al. [50] has identified several likely targets whose alternative splicing patterns can be adversely affected by up-regulation of SF2/ASF, such as the tumour suppressor BIN1 and the kinases MNK2 and S6K1.
Moreover, it was previously reported that SF2/ASF expression levels can powerfully affect the alternative splicing of the Ron oncogene [51]. Other well-known splicing factors whose expression levels are altered in cancer cells are hnRNP A1, Tra-2, YB-1, and a host of other factors that were previously known just for their splicing regulatory abilities [52]. Many of these connections, especially with regard to their potential functional significance in tumour origin and progression, still need to be tested further. Even so, the rapid development of therapeutic approaches aimed at correcting splicing defects, and the need to find new targets for the treatment of this type of disease, will certainly drive the field swiftly forward in the near future.

All these observations have increased the use of microarray analysis of transcript alterations as "biomarkers" for the diagnosis and prognosis of particular types of cancer. For example, attempts have been made to classify Hodgkin lymphoma tumours [53], ovarian cancers [54], leukemia cell lines [55], and human breast cancer cells [56] according to the splicing variations found in their transcripts.

It should also be pointed out that human tumours are not the only complex diseases in which the importance of splicing is being carefully evaluated. In particular, the potential role played by splicing alterations in neurological diseases has also gained a lot of attention [57,58]. For example, considerable effort has recently been devoted to clarifying the role of Nova, a neuron-specific splicing factor (see chapter 3, Allain, for details). This factor regulates synapse formation during the development of the human brain by controlling the alternative splicing of several neurotransmitter receptors, adhesion molecules, cation exchangers, and scaffold proteins [59,60]. Another recent addition to the list of splicing factors involved in neurodegeneration is TDP-43, a protein that was previously described to control CFTR exon 9 splicing [61] but has recently been identified as the major accumulating protein in patients affected by Frontotemporal Lobar Degeneration and Amyotrophic Lateral Sclerosis [62].

Therapeutic approaches

Current possibilities for modifying aberrant splicing patterns have also been the subject of several recent reviews [63-67]. The strategies used to modify splicing profiles are quite varied. The most successful approaches up to now have involved the use of antisense oligonucleotides that target splicing control regions. These oligonucleotides can be used to inhibit the inclusion of unwanted exons and/or promote the production of a truncated but functional protein [64,68] (see chapter 42, Aartsma Rus). Antisense oligonucleotides can also be modified to contain a complementary targeting region and an effector region that can recruit or mimic splicing factor activities [69,70]. Alternatively, one promising research line that is gaining a lot of attention is the use of small molecules that interfere with cellular signalling pathways and thereby selectively modify the activity of splicing regulatory proteins, either by altering their cellular distribution or by changing their phosphorylation state [71] (see chapter 43, Denise Cooper). Screening methods have been developed to identify small molecules from chemical libraries that regulate a given splicing event (see chapter 42, Peter Stoilov). Other approaches have also been described that use siRNAs to specifically knock down aberrant splicing isoforms, exploit trans-splicing strategies (SMaRT) [72], or use modified U7 and U1 snRNP molecules either to block aberrant splice site sequences (i.e. acting as antisense oligonucleotides) or to reverse missplicing by carrying compensatory mutations in the 5' end of their U1 snRNA sequence [73-76] (see chapter 40, Daniel Schumperli). For the interested reader, it should be noted that many of these specific approaches are discussed in Section II.E of this book.

Generation of aberrant transcripts

From a biomedical point of view, one of the most important aspects to be considered by researchers is what kind of aberrant transcript may be generated by typical splicing-affecting mutations. A schematic diagram of the possible consequences of mutations in the basic splicing regulatory elements is shown in Fig. 1. Mutations in enhancer or silencer elements that lead to their disruption (or creation) can have the same consequences as those described for the basic regulatory elements, with the addition that the creation of enhancers or the loss of silencers can lead to increased levels of exon inclusion (Fig. 2).

Exon skipping.

In general, the vast majority of mutations in 5'ss, 3'ss and regulatory elements result in the skipping of the affected exon [77] (Fig. 1A). Although the skipping of an exon is by itself a straightforward outcome, it should be noted that quite often the skipping event is not confined to the exon carrying the splicing mutation but can also extend to neighbouring exons (either upstream or downstream). This suggests that the importance of the genomic milieu should never be underestimated [78]. It also underlines that bioinformatic predictions have to be validated experimentally.

Cryptic splice site activation.

Cryptic splice site activation usually occurs when the natural donor or acceptor site is inactivated or weakened by a particular mutation. In this case, depending on the local sequence context, one or more splice sites are used that would normally be ignored by the splicing machinery (Fig. 1B). These events result in either the addition or the subtraction of nucleotide sequences from the original exon. In these cases there is a 2 in 3 chance of disrupting the reading frame, which introduces aberrant translation stop codons in the final transcript that can cause either degradation of the mRNA through nonsense-mediated decay (NMD) or synthesis of a truncated protein. Furthermore, even when the reading frame remains unchanged, the addition or removal of a number of amino acid residues from the resulting protein may well prove harmful with regard to its biological properties or regulation.
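The 2 in 3 figure follows from simple arithmetic on codon length: the reading frame is preserved only when the number of nucleotides added or removed is a multiple of three. The short Python sketch below illustrates this under the simplifying assumption (ours, not a measured distribution) that every non-zero length change up to a given maximum is equally likely; the function names are hypothetical and chosen only for illustration.

```python
def frame_is_preserved(length_change: int) -> bool:
    """Return True if adding (positive) or removing (negative) this many
    nucleotides leaves the downstream reading frame unchanged."""
    return length_change % 3 == 0


def fraction_disrupting(max_shift: int = 300) -> float:
    """Fraction of non-zero length changes in [-max_shift, max_shift]
    that shift the reading frame, assuming all sizes are equally likely."""
    shifts = [d for d in range(-max_shift, max_shift + 1) if d != 0]
    disrupted = sum(1 for d in shifts if not frame_is_preserved(d))
    return disrupted / len(shifts)


if __name__ == "__main__":
    # A cryptic site 6 nt downstream keeps the frame; one 4 nt upstream does not.
    print(frame_is_preserved(6), frame_is_preserved(-4))
    print(fraction_disrupting(300))
```

In real genes the distribution of distances between authentic and cryptic sites is not uniform, so the observed proportion of frame-disrupting events may deviate from this idealized value.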

A bioinformatics analysis of several hundred cryptic splice site activation events [79,80] has confirmed that cryptic splice sites are, on average, intrinsically stronger than their mutated authentic counterparts but generally weaker than the authentic, wild-type sites [81]. However, in ~10-15% of cases, the wild-type authentic splice site was weaker than the corresponding cryptic site. This indicates that there are additional signals in the pre-mRNA that repress the use of cryptic sites, and several observations have confirmed this hypothesis. First of all, at the bioinformatics level, the analysis of auxiliary sequences between authentic and aberrant splice sites showed that one particular type of silencer, termed PESS for putative exonic splicing silencer [19-21], was particularly informative for predicting aberrant splice-site activation [82]. Secondly, in genes such as FGB it has been reported that an SF2/ASF binding sequence that does not normally participate in the recognition of the constitutively spliced exon 7 can nonetheless profoundly influence the activation and type of cryptic splice site sequences used by the splicing machinery following inactivation of the wild-type donor site [83].

There are two important databases that collect disease-related cryptic splice site activation events following either acceptor or donor site inactivation: DBASS3 and DBASS5 [79,80]. Both databases are freely available online. Finally, an in silico tool (Cryp-Skip, also available online) has recently been developed to predict the potential occurrence of cryptic splice site activation versus exon skipping following the introduction of mutations in any given donor or acceptor site [84] (see chapter XXX and Table XXX).

Intron retention.

Intron retention events are usually defined as the retention of entire intronic sequences in the final processed mRNA (Fig. 1C). The frequency of normal intron retention events in the human genome has recently been estimated to be around 15% in a set of more than 21000 annotated genes [85]. In many cases the biological role of these events is currently unknown, although it is known that they preferentially occur in the untranslated regions of the RNA [85,86]. Their potential regulatory role, however, is established by some well-described examples, such as the generation of the P element and Msl2 transcripts in Drosophila [87,88], the developmental regulation of the proinsulin messenger RNA in chicken embryos [89], the generation of a novel adhesion molecule in rat testis [90], and the control of the expression levels of Apolipoprotein E in the central nervous system [91]. As expected, aberrant intron retention events following the introduction of mutations in splicing regulatory elements have also been shown to be associated with human diseases such as pheochromocytomas [92], long QT syndrome [93], Leigh syndrome [94], Arthrogryposis multiplex congenita (AMC) [95], and B lineage human cancers [96].

Pseudoexon inclusion.

The term "pseudoexon" usually refers to any nucleotide sequence between 50 and 300 nucleotides in length with apparently viable 5'ss and 3'ss at either end. Because of the degeneracy of the splicing code, many such sequences are expected to be present in most human genes. Indeed, in the hprt gene it has been estimated that pseudoexon sequences largely outnumber the "real" exons [97] (see chapter 4, Hertel). The evidence available up to now has pointed to several factors that can help the spliceosome discriminate real exons from these false targets. First of all, inclusion of many of these sequences is actively inhibited by the presence of intrinsic defects [97], the presence of silencer elements [20,98,99], or the formation of inhibitory RNA secondary structures [100].
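To give a feeling for why sequences fitting this definition are so common, the following Python sketch performs a deliberately naive scan of an intronic DNA sequence. It is an illustration only and rests on a strong simplifying assumption: a splice site is treated as "apparently viable" if it merely carries the near-invariant AG (3'ss) or GT (5'ss) dinucleotide, whereas real predictions score the full consensus and the surrounding regulatory elements. The function name and parameters are hypothetical.

```python
def candidate_pseudoexons(intron_seq: str, min_len: int = 50, max_len: int = 300):
    """Return (start, end) intervals (0-based, end exclusive) that are
    immediately preceded by AG and immediately followed by GT, and whose
    length lies between min_len and max_len nucleotides."""
    seq = intron_seq.upper()
    # A candidate exon starts right after an upstream AG dinucleotide...
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # ...and ends right before a downstream GT dinucleotide.
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    candidates = []
    for start in starts:
        for end in ends:
            if min_len <= end - start <= max_len:
                candidates.append((start, end))
    return candidates


if __name__ == "__main__":
    # Toy intron: one AG, 60 filler nucleotides, one GT.
    toy = "CC" + "AG" + "C" * 60 + "GT" + "CC"
    print(candidate_pseudoexons(toy))
```

On any realistic intron such a dinucleotide-only scan returns a very large number of intervals, which is consistent with the need for the additional discriminating factors described above. The quadratic pairing of sites is acceptable for a sketch but would need indexing for long sequences.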

Nonetheless, the number of reported pseudoexon events involved in human disease is steadily increasing. Usually, this is due to the de novo creation of the classical splicing consensus sequences: donor, acceptor, and branch site sequences (Fig. 1D). The second most frequent mechanism that leads to pseudoexon activation involves the creation or deletion of splicing regulatory sequences. Finally, in two individual cases pseudoexon inclusion has been described to arise from the rearrangement of genomic regions, either through a gross deletion that brought two viable donor and acceptor sites close to each other [101] or through a genomic inversion that activated exons in what would normally have been the antisense genomic strand [102].

Unexpected splicing outcomes following disruption of classical splicing sequences.

It should also be noted, however, that these possibilities do not rule out other kinds of outcome, such as those schematically reported in Fig. 3. In this case, it has been observed that disease-associated inactivating mutations in the 3' acceptor sequences of the TP and XPA genes not only cause skipping of the affected exon but also determine a shift in the donor/acceptor site usage of the preceding exon [103,104]. This kind of "atypical" outcome is not confined to 3'ss sequences, as donor site inactivation in the COL1A1 and CLN6 genes has yielded very similar results [105,106]. For this reason, in order to accurately determine aberrant splicing events it is always advisable to use the full range of diagnostic possibilities (most of which are fully described in other chapters of this book).