Functional RNA classes: a matter of time?
Oscar C. Bedoya-Reina and Chris P. Ponting
MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU
Corresponding author:
tel: +441316 518500, fax: +441316 518800, e-mail:
Heading
We know little about the functions of long noncoding RNAs relative to our accumulated knowledge of protein-mediated mechanisms. Mukherjee et al. (2016) propose RNA classes, each with similar kinetics of synthesis, processing and turnover, whose members they predict often share functional properties.
The terminology we use for long noncoding RNAs (lncRNAs) rather says it all. Instead of using informative terms that refer to function, we always define them by what they are not (protein-coding or short), and frequently by their site and orientation of transcription relative to protein-coding genes. We do so because we lack useful experimental tools and insightful evolutionary or structural concepts that accurately predict molecular function. Indeed, it would be prudent to consider that a lncRNA lacks function until robust and reproducible experimental evidence to the contrary becomes available. Were high-throughput functional transcriptomic approaches available, they might classify an RNA according to whether it acts in cis (locally, near the RNA’s site of synthesis) or more distally, in trans, and in which cellular compartment, such as the cytoplasm or nucleus, it acts. In an attempt to break this functional classification deadlock, Mukherjee et al. [1] placed human mRNAs, lncRNAs and transcribed pseudogenes into 7 classes without recourse to existing annotations. They propose that such a classification helps to pinpoint those lncRNAs that are functional and hints at what molecular activities they confer.
The classification was built from six experimental measures relating to the synthesis, processing, translation or degradation of RNAs, and their localisation with respect to the cytoplasm or nucleus in HEK293 cells. These six kinetic and spatial parameters were inferred for each of 15,120 RNAs: 84% mRNAs, 13% lncRNAs and 3% transcribed pseudogenes (Figure 1A & B). The kinetics of mRNA biogenesis and turnover showed broad agreement with previous observations [2]. Furthermore, as expected from some previous studies, noncoding RNAs tended to differ from most mRNAs in being synthesised and spliced more slowly [3], and degraded more rapidly [4-6] (Figure 1C,D), which accounts for their lower average cellular abundance [7]. Relative cytoplasmic or nuclear localisation differed little between mRNAs and lncRNAs, in contrast to results from some [8-9] but not all [10] previous studies.
Among the 7 classes, four (c1 to c4) were dominated by protein-coding mRNAs and one (c7) by lncRNAs; the remaining two classes (c5 and c6) were evenly split between mRNAs and lncRNAs (Figure 1B). The authors explain that similarities between two RNAs’ metabolism and localisation result in their placement together in the same class and could reflect their coordinated regulation by particular RNA processing factors. More controversially, they propose that this classification might predict the downstream molecular mechanisms of these RNAs. This is less well supported: it is unclear how similarities between two RNAs regarding their subcellular localisation, and synthesis and processing kinetics, relate to equivalence of molecular mechanism. This is particularly so when comparing an RNA encoding a protein with one that does not, and considering that other parameters relating to the kinetics of molecular complexes are likely to more effectively predict molecular mechanism.
Nevertheless, the classification argues persuasively that some lncRNAs with modest abundance could be stable and act in trans, and thus be able to participate in multiple molecular transactions. It further highlights a small number of lncRNAs in classes c1-c4 (220, 11% of all lncRNAs) that instead could be protein-coding, and 1,014 proposed mRNAs in classes c6-c7 that might be noncoding. The larger number of lncRNAs in class c7, which exhibit slow synthesis and processing together with rapid degradation, are perhaps less deserving of detailed experimental scrutiny. This is because they are more likely to reflect transcriptional noise [11] or to be noncoding products of transcriptional events initiated divergently from nucleosome-depleted regions [12].
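The proportions quoted for the lncRNAs can be cross-checked with simple arithmetic. The sketch below uses only figures given in the text (15,120 RNAs, 13% of them lncRNAs, and 220 lncRNAs in classes c1 to c4); the variable names are illustrative, and because the 13% is itself a rounded value the lncRNA total is only approximate.

```python
# Figures taken from the text; variable names are illustrative only.
total_rnas = 15_120
lncrna_fraction = 0.13       # 13% of the profiled RNAs are lncRNAs (rounded)
lncrnas_in_c1_c4 = 220       # lncRNAs placed in the mRNA-dominated classes

# Approximate number of lncRNAs profiled.
approx_lncrnas = total_rnas * lncrna_fraction

# Share of all lncRNAs that fall in classes c1 to c4.
share_in_c1_c4 = lncrnas_in_c1_c4 / approx_lncrnas

print(f"approximately {approx_lncrnas:.0f} lncRNAs profiled")
print(f"{share_in_c1_c4:.1%} of lncRNAs lie in classes c1 to c4")
```

The result, a little over 11%, agrees with the proportion quoted above.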
The 7 classes fail to separate lncRNAs cleanly with respect to evolution. Most lncRNAs evolved similarly to 3’UTRs of protein-coding genes, although those in the c5 and c7 classes showed even lower constraint. Overall, lncRNA exons contain only a small fraction (~5%) of sequence that has been subject to evolutionary constraint [13], and within multi-exon transcripts this constraint is concentrated on splicing enhancers [14-15]. Whether the classes contain variable fractions of multi-exon transcripts, and whether these show, against a well-calibrated evolutionary model, constraint with respect to splicing remain to be determined.
The 7 Mukherjee et al. [1] classes also do not predict the histone H3 lysine-4 mono- or tri-methylation (H3K4me1 or H3K4me3) status of lncRNA promoters. These histone marks are relevant because the only other lncRNA categorisation that is not based on chromosomal location exploits their ratio to define two classes of lncRNA loci [16]. These two classes do not merely reflect a relatively arbitrary data feature. They predict molecular mechanism, separating enhancer-associated lncRNAs that act in cis in a tissue-restricted manner from promoter-associated lncRNAs that act more distally, in trans. The classes’ features and applicability have subsequently been replicated in a study investigating multiple cell types across B cell development [17].
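A ratio-based assignment of this kind can be sketched in a few lines. The example below is a minimal illustration only: the function name, the pseudocount, the threshold of zero on the log2 ratio and the signal values are all assumptions made for the sketch, not parameters or data taken from [16].

```python
import math


def classify_locus(h3k4me1, h3k4me3, pseudocount=1.0):
    """Label a lncRNA locus from promoter H3K4me1 and H3K4me3 signal.

    Hypothetical rule: a positive log2(H3K4me1 / H3K4me3) ratio marks the
    locus as enhancer-associated, otherwise as promoter-associated.
    The pseudocount (an assumption) avoids division by zero.
    """
    log_ratio = math.log2((h3k4me1 + pseudocount) / (h3k4me3 + pseudocount))
    if log_ratio > 0:
        return "enhancer-associated"
    return "promoter-associated"


# Invented signal values (H3K4me1, H3K4me3) for two hypothetical loci.
loci = {"locus_A": (40.0, 5.0), "locus_B": (3.0, 60.0)}
labels = {name: classify_locus(me1, me3) for name, (me1, me3) in loci.items()}

for name, label in labels.items():
    print(name, label)
```

In practice the threshold would need to be calibrated against the observed distribution of ratios in each cell type rather than fixed in advance.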
The most useful lncRNA classification will be that which is most generally applicable and predictive of mechanism. The two H3K4me-based classes are distinguished in four respects. The expression of enhancer-associated lncRNAs is typically more restricted, and lower, than that of promoter-associated lncRNAs; moreover, it is correlated with enhanced levels of expression from neighbouring protein-coding genes. Finally, the two classes are also distinguished by their evolution, because the upstream and exon sequences of promoter-associated lncRNAs exhibit significant levels of constraint, whereas those of enhancer-associated lncRNAs do not. Because evolutionary properties are invariant with respect to cellular context, these two lncRNA classes are largely mutually exclusive [16-17]. By contrast, the Mukherjee et al. [1] categorisation of RNAs may not hold universally, because it was derived solely using the HEK293 cell line. Time will tell whether categorisations based on RNA metabolism profiles and derived from different cell types, including primary cells, even across different species, are largely congruent.
In time, studies will determine whether, as Mukherjee et al. propose, their classification assists in predicting which lncRNAs are functional and to what cellular process, such as transcription, translation or splicing, they contribute. By function we mean that a cellular or organismal phenotype is observed when the lncRNA is altered, and importantly also that this phenotype depends on the RNA sequence rather than on its underlying DNA [18-19]. Some molecular events involving lncRNAs, when perturbed, will not be sufficiently consequential to be scrutinised by selection and thus will be devoid of function [20]. For these, changes in rates of transcription, processing, binding or turnover will fail to induce cellular phenotypic variation. For other lncRNAs, disruption of molecular transactions will cause phenotypic change. Experimental and evolutionary evidence suggests that such lncRNAs are rare and adopt diverse mechanisms. Consequently, any classification that prioritises experiments and accurately predicts which lncRNAs are functional would indeed be very timely.
Acknowledgements. This work was supported by the Medical Research Council and the Wellcome Trust. We thank Wilfried Haerty (Earlham Institute) for helpful comments.
Figure legend
Figure 1. Mukherjee et al. [1] placed human mRNAs, lncRNAs and transcribed pseudogenes into 7 categories. (A) mRNAs are more common in categories c1-c4 and lncRNAs in categories c5-c7. (B) Categories c1-c4 are dominated by protein-coding mRNAs of various types, and category c7 by lncRNAs. (C) In general, noncoding RNAs (green) are synthesised and spliced more slowly, and degraded more rapidly, than mRNAs (orange). (D) Different categories were defined based on six experimental measures relating to the synthesis, processing, translation or degradation of RNA, and their localisation with respect to the cytoplasm or nucleus in HEK293 cells.
References
[1] Mukherjee, N. et al. Nat. Struct. Mol. Biol. xxx, xxx–xxx (2016)
[2] Rabani, M. et al. Nat. Biotech. 29, 436–442 (2011)
[3] Tilgner, H. et al. Genome Res. 22, 1616–1625 (2012)
[4] Houseley, J., Tollervey, D. Cell 136, 763–776 (2009)
[5] Tani et al. Genome Res. 22, 947–56 (2012)
[6] Tuck, A.C., Tollervey, D. Cell 154, 996–1009 (2013)
[7] Cabili, M. et al. Genes Dev. 25, 1915–27 (2011)
[8] Cabili, M. et al. Genome Biol. 16, 20 (2015)
[9] Derrien, T. et al. Genome Res. 22, 1775–89 (2012)
[10] Van Heesch et al. Genome Biol. 15, R6 (2014)
[11] Struhl, K. Nat. Struct. Mol. Biol. 14, 103–5 (2007)
[12] Andersson, R. et al. Nat. Commun. 5, 5336 (2014)
[13] Ponjavic, J., Ponting, C.P., Lunter, G. Genome Res. 17, 556–65 (2007)
[14] Schüler, A., Ghanbarian, A.T., Hurst, L.D. Mol. Biol. Evol. 31, 3164–83 (2014)
[15] Haerty, W., Ponting, C.P. RNA 21, 333–46 (2015)
[16] Marques, A.C. et al. Genome Biol. 14, R131 (2013)
[17] Brazão, T.F. et al. Blood 128, e10–9 (2016)
[18] Bassett, A.R. et al. eLife 3, e03058 (2014)
[19] Paralkar, V.R. et al. Mol. Cell 62, 104–10 (2016)
[20] Doolittle, W.F. et al. Genome Biol. Evol. 6, 1234–7 (2014)