- Introduction
Computational intelligence techniques use algorithms to analyze patterns in RNA sequence data and to derive predictive results from them. Many such techniques are available. They typically apply heuristics in a stepwise formulation: classification, grouping and clustering, reduction of the result set using a defined similarity score, and finally derivation of the results.
RNA sequencing is a difficult task that involves a complex, stepwise analysis. The accuracy of the analysis depends on several factors, beginning with three steps: extraction of the RNA, fragmentation of the extracted RNA, and sequencing. These three steps are essentially non-computational and are carried out one after another to produce the sequenced RNA that the algorithms then analyze. Computational intelligence is applied to the sequenced RNA patterns to check that they meet the required quality, and further analyses are then performed to obtain the final result.
Computational biology is an interdisciplinary area devoted to interpreting and analyzing biological information through computing techniques. It draws on extensive research in biology, computer science, mathematics and statistics. This combined approach is applied step by step, in the form of computer algorithms, to sequence biological data, arrange genome content and predict the structure of macromolecules such as RNA. New techniques and tools for computational intelligence in biological data analysis emerge regularly. Continued advances in biological data collection give algorithm designers both the opportunity and the challenge of analyzing complex, high-volume data.
Traditional computing methods are limited in functional scope when faced with such complex, large, multi-dimensional and noisy data. Because of this limitation, they cannot provide accurate results after analysis. In addition, the processes involved in traditional algorithms and methods are often manual and time consuming.
Computational intelligence mechanisms are automated techniques that combine elements of learning, adaptation, evolution, and logic and predicates to devise analysis procedures. The complexity is captured in algorithmic steps, and biological data is taken directly from input sources such as sensors and other input systems. These techniques are flexible enough in their information processing to handle large volumes of real-life data containing noise, ambiguity and missing values. Problem solving in bioinformatics generally involves searching for useful regularities or patterns in huge amounts of multi-dimensional data. This is the reason behind the development of advanced pattern analysis approaches, as traditional methods often become intractable in such situations.
- Review Report Body
The computational intelligence techniques and tools used for RNA sequence data analysis are reviewed in the following paragraphs.
Shanrong Zhao, Li Xi, Jie Quan, Hualin Xi, Ying Zhang, David von Schack, Michael Vincent and Baohong Zhang (2016) addressed next-generation sequencing for transcriptome profiling. Next-generation sequencing lowers the cost of sequencing, but large-scale RNA sequencing generates massive amounts of data that must be handled. Their approach combines multiple computational algorithms, and the tools that run these algorithms in sequence are automated. The tools are open source, which makes them easier for others to extend. Advances in RNA-seq data analysis together with web 2.0 technologies improve the functional efficiency of the approach. The implementation, QuickRNASeq, is a pipeline for large-scale RNA sequence data analysis and visualization. It analyzes RNA sequence data in three steps. In the first step, each sample is processed individually in a computation-intensive fashion. The second step gathers the results of the individual samples and generates a report. The third step covers data interpretation and presentation of the final RNA sequence analysis result.
Wagle, Prerana, Nikolić, Milo and Frommolt, Peter (2015) proposed 'QuickNGS', a computational intelligence tool for next-generation sequencing (NGS) in molecular biology. The tool offers multiple algorithmic approaches for basic data analysis and can analyze data from multiple NGS projects at the same time. It uses parallel computing resources backed by a database. A comprehensive analysis of 10 RNA sequence samples can be completed within a few minutes. The tool can also take large numbers of samples from multiple projects at once and analyze them to produce the RNA sequence report.
Sun, Yongmei, Li Xing, Wu, Di, Pan Qi, Ji Yuefeng, Ren Hong and Ding Keyue (2016) proposed a software tool called 'RED' (RNA Editing Site Detector), which identifies RNA editing sites by integrating multiple rule-based and statistical filters. Detected RNA editing sites are visualized at the genome and site levels in a display window with a graphical user interface (GUI). The tool improves performance by integrating the MySQL database engine for high-throughput storage and query processing. It identified the presence and absence of experimentally validated C-to-U RNA editing sites and was compared with REDItools, a command-line tool for high-throughput investigation of RNA editing. RED also offers better sensitivity, is easy to use, is platform-independent Java-based software, and can be applied to RNA sequence data without accompanying DNA sequence data.
Wolf Matthias, Koetschan Christian and Muller Tobias (2014) proposed a tool called 4SALE for synchronous sequence and secondary structure alignment and editing. The tool makes it possible to align RNA sequences and their individual secondary structures synchronously and automatically. They subsequently introduced a scaled-down Graphical User Interface (GUI) version of 4SALE for big data analysis. The tool is widely accepted for the discovery of phylogenetic information.
Wintermans, Bastiaan, Brandt Bernd, Vandenbroucke-Grauls Christina and Budding Andries (2015) developed TreeSeq, a tool that uses a quaternary tree search structure for the analysis of biological sequence data. Its main strength is rapid search for sequences of interest in large data sets. The tool was used to screen a gut microbiota metagenomic dataset and a whole genome sequencing (WGS) dataset of a strain of Klebsiella pneumoniae for antibiotic resistance. It is reported to be thirty times faster than BLAST while still giving accurate results.
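The idea of a quaternary tree over nucleotide sequences can be illustrated with a minimal sketch. It is not the TreeSeq implementation: the class names `Node` and `QuaternaryTree`, the exact-match lookup and the example sequences are assumptions made for illustration only. Each node has at most four children, one per base, so a lookup costs time proportional to the query length rather than to the number of stored sequences.

```python
from typing import Dict, List

ALPHABET = "ACGT"  # four possible branches per node, hence "quaternary"


class Node:
    """One node of the tree; children are keyed by nucleotide."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.labels: List[str] = []  # names of sequences that end here


class QuaternaryTree:
    """Hypothetical exact-match index over DNA/cDNA sequences."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, sequence: str, label: str) -> None:
        node = self.root
        for base in sequence.upper():
            if base not in ALPHABET:
                raise ValueError(f"unsupported base: {base!r}")
            node = node.children.setdefault(base, Node())
        node.labels.append(label)

    def search(self, sequence: str) -> List[str]:
        """Return the labels stored for an exact match, else an empty list."""
        node = self.root
        for base in sequence.upper():
            nxt = node.children.get(base)
            if nxt is None:
                return []
            node = nxt
        return list(node.labels)


# Illustrative data only; these are not real resistance-gene fragments.
tree = QuaternaryTree()
tree.insert("ACGTAC", "probe_1")
tree.insert("ACGTTT", "probe_2")
print(tree.search("ACGTAC"))
print(tree.search("ACGTAA"))
```

A real screening tool would also need to handle ambiguous bases and inexact matches, which this sketch deliberately leaves out.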
Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M and Siatkowski I (2015) studied high-throughput sequencing of RNA. Their work applies statistical and computational methods to the analysis and management of biological sequence data. They provide a comprehensive comparison of five normalization methods related to sequencing depth and suggest a common workflow for selecting the optimal normalization procedure for any given dataset. Statistically, the workflow calculates bias and variance values for control genes. These values indicate which normalization method suits the dataset under study and which methods can be used interchangeably.
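The selection idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' workflow: it compares two well-known depth normalizations (total count and upper quartile) instead of the five in the paper, it scores them by the variance of a single control gene and omits the bias term, and the function names and the toy count table are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List

Counts = Dict[str, List[float]]  # gene name -> raw counts, one per sample


def total_count_factors(counts: Counts) -> List[float]:
    """Scale factor per sample: its total count relative to the mean total."""
    n = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n)]
    avg = mean(totals)
    return [t / avg for t in totals]


def upper_quartile_factors(counts: Counts) -> List[float]:
    """Scale factor per sample: upper quartile of its non-zero counts."""
    n = len(next(iter(counts.values())))
    quartiles = []
    for i in range(n):
        col = sorted(row[i] for row in counts.values() if row[i] > 0)
        quartiles.append(col[int(0.75 * (len(col) - 1))])
    avg = mean(quartiles)
    return [q / avg for q in quartiles]


def control_gene_variance(counts: Counts, factors: List[float],
                          control: str) -> float:
    """Variance of the control gene across samples after normalization."""
    normalized = [c / f for c, f in zip(counts[control], factors)]
    return pvariance(normalized)


# Toy data (assumption): "ctrl" is truly constant, "big" dominates sample 2.
counts = {
    "ctrl": [100, 100],
    "g1": [10, 10],
    "g2": [50, 50],
    "big": [1000, 5000],
}
for name, fn in [("total count", total_count_factors),
                 ("upper quartile", upper_quartile_factors)]:
    print(name, control_gene_variance(counts, fn(counts), "ctrl"))
```

In this toy table the single dominant gene inflates the total of the second sample, so total-count scaling distorts the control gene while upper-quartile scaling leaves it stable. The method with the lower control-gene variance would be preferred for this dataset.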
Vila-Casadesús Maria, Gironella Meritxell and Lozano Juan Jose (2016) developed an R package named miRComb, which combines miRNA expression data with hybridization information to find potential miRNA-mRNA interactions. A pipeline produces the main output. The results can be used to generate a large number of testable hypotheses for other authors in this domain. Computationally, the package first filters the large number of miRNA-mRNA interactions obtained from existing miRNA target prediction databases and then presents the result in a standardized form such as a PDF report.
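The filtering step can be pictured with a small sketch. It is written in Python rather than R and does not use or reproduce the miRComb API. It assumes, for illustration, that a candidate interaction is kept when the pair appears in a target prediction set and the miRNA and mRNA expression profiles are negatively correlated; the function names, the threshold `max_r` and the sample data are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def candidate_pairs(
    mirna_expr: Dict[str, List[float]],
    mrna_expr: Dict[str, List[float]],
    predicted_targets: Set[Tuple[str, str]],
    max_r: float = -0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Keep predicted pairs whose expression profiles are anti-correlated."""
    hits = []
    for mirna, mrna in sorted(predicted_targets):
        if mirna not in mirna_expr or mrna not in mrna_expr:
            continue  # no expression data for this pair
        r = pearson(mirna_expr[mirna], mrna_expr[mrna])
        if r <= max_r:
            hits.append((mirna, mrna, r))
    return hits


# Made-up expression values across four samples.
mirna_expr = {"miR-a": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {"GENE1": [8.0, 6.0, 4.0, 2.0], "GENE2": [1.0, 2.0, 3.0, 4.0]}
predicted = {("miR-a", "GENE1"), ("miR-a", "GENE2"), ("miR-b", "GENE1")}
for mirna, mrna, r in candidate_pairs(mirna_expr, mrna_expr, predicted):
    print(mirna, mrna, round(r, 3))
```

Only the pair whose mRNA falls as the miRNA rises survives the filter. A production analysis would add multiple-testing correction, which is omitted here.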
Yejun Wang, MacKenzie Keith D and White Aaron P (2015) developed an empirical method, combined with empirical tests, to determine transcript features such as transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. They obtained 2764 TSSs and 1467 TTSs for 1331 and 844 different genes, respectively. The results show that directional RNA sequencing can be used to detect transcriptional borders at acceptable resolution. The computational algorithms in their method are based on transcript border detection, statistical models and an operon organization pipeline. The technique can be applied to study TSSs, TTSs, operons, promoters and untranslated regions from RNA sequence data in other bacteria.
Boley Nathan, Stoiber Marcus H, Booth Benjamin W, Wan Kenneth H, Hoskins Roger A, Bickel Peter J, Celniker Susan E and Brown James (2014) described an automated pipeline for genome annotation that integrates RNA sequence and gene boundary data sets. The tool implementing this technique is called the Generalized RNA Integration Tool, or GRIT. It analyzes the collected gene expression and site sequence data. The annotation method is optimized by pipelining the individual analysis steps, and the report is produced automatically once the analysis is complete.
Irla Marta, Neshat Armin, Brautaset Trygve, Ruckert Christian, Kalinowski Jorn and Wendisch Volker F (2015) applied a technique that uses two different cDNA library preparation methods. One method characterizes the whole transcriptome, and the other enriches for the 5'-ends of primary transcripts. The computational algorithms work from two corresponding baselines: one estimates the whole transcript, and the other considers primary transcripts only. The exact TSS positions were used to determine conserved sequence motifs for translation start sites. Finally, the analysis results give the operon structure.
Sturgill David, Malone John H, Sun Xia, Smith Harold E, Rabinow Leonard, Samson Marie-Laure and Oliver Brian (2013) proposed a technique that applies a series of quantitative and qualitative filters through computer algorithms. Diagnosed errors are eliminated, and the method is then applied to simulated RNA sequence data. The method is commonly used on RNA sequence data to identify known alternative splicing events. The software package based on their method is called the Splicing Analysis Kit (Spanki). The package is readily available for download from various web portals. Its main advantage is a better understanding of the error profiles in RNA sequence data, which in turn improves the inferences that can be drawn from this new technology.
An J, Lai J, Sajjanhar A, Lehman ML and Nelson CC (2014) developed a user-friendly plant miRNA tool called miRPlant. Tested on 16 plant miRNA datasets from four different plant species, it gave results about 10 percent more accurate than miRDeep-P, one of the most popular plant miRNA prediction tools. The tool has a graphical user interface for data input and output, which gives users a more interactive way of working. Its visual presentation is also good, with color-based output for the sequence patterns.
Trapnell C, Pachter L and Salzberg SL (2009) worked with RNA-seq, a protocol for sequencing the messenger RNA in a cell that generates millions of short sequence fragments in a single run. These fragments can be used to measure levels of gene expression and to identify novel splice variants of genes. Earlier software for aligning RNA-seq data to a genome relied on known splice junctions and could not identify novel ones. Their read-mapping algorithm, TopHat, is designed to align reads from an RNA-seq experiment to a reference genome without relying on known splice sites. The pipeline is reported to be much faster than previous systems for RNA sequence data analysis, and a standard desktop computer is sufficient to run it. The software is freely available and open source.
Goecks J, Nekrutenko A, Taylor J and the Galaxy Team (2010) proposed a web-based software platform for genomic research. The platform automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Its pages are interactive, web-based documents that carry the supporting documentation and can describe the complete computational analysis performed on the platform.
Langmead B, Trapnell C, Pop M and Salzberg SL (2009) developed Bowtie, an ultrafast, memory-efficient program for aligning short DNA sequence reads to large genomes. It aligns more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. The program uses a quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve greater alignment speed. The program is open source software for gene research in the biological sciences.
Bullard JH, Purdom E, Hansen KD and Dudoit S (2010) studied data from sequencing technologies such as the Illumina Genome Analyzer, which are used to investigate a wide range of biological and medical problems. Their work integrates statistical and computational methods to draw meaningful and accurate conclusions from the massive and complex datasets that sequencing generates. The test strategies start from gene-level counts. The results are affected by features of the sequencing platform, such as varying gene lengths, the base-calling calibration method and flow-cell preparation. They also proposed quantile-based normalization procedures and demonstrated an improvement in detection. Because the characterization of genes in RNA sequencing remains incomplete, they suggested further research to advance their proposed approach.
Maji Ranjan Kumar, Sarkar Arijita, Khatua Sunirmal, Dasgupta Subhasis and Ghosh Zhumur (2014) stated that the TopHat v2.0.8 tool gives highly accurate results and performs the computational analysis quickly. They examined CPU usage, memory footprint and execution time during spliced alignment and designed PVT, a pipelined version of TopHat that removes redundant computational steps during spliced alignment. PVT then breaks the job into a pipeline with multiple stages to improve resource utilization. As a result, the tool reduces execution and processing time while maintaining functional efficiency and the accuracy of the results.
Torres García W, Zheng S, Sivachenko A, Vegesna R, Wang Q et al. (2014) developed PRADA (Pipeline for RNA-sequencing Data Analysis), a tool that is flexible, modular and highly scalable. It is a scalable software platform that provides several types of information through multi-faceted analysis starting from raw paired-end RNA-seq data, including gene expression levels and quality metrics. It also supports both unsupervised and supervised detection of fusion transcripts. The algorithms implemented in PRADA use a dual-mapping strategy that increases sensitivity and refines the analytical endpoints.
Bacci, G, Bazzicalupo M, Benedetti A and Mengoni, A (2014) presented StreamingTrim, a read-trimming software tool written in Java. The tool analyzes the quality of sequences in FASTQ files and searches for low-quality zones in a very conservative way. Its main aim is to trim amplicon library data while retaining as much taxonomic information as possible. The tool comes with a graphical user interface that makes it easy to use. StreamingTrim reads and analyzes the sequences one by one from the input FASTQ file without keeping anything in memory. It can run on a desktop or laptop computer. The trimmed sequences are written to an output file, which makes the results easy to reuse later.
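The one-record-at-a-time processing described above can be sketched in a few lines. This is not the StreamingTrim algorithm, which is a Java tool with its own search for low-quality zones. The sketch assumes four-line FASTQ records with Phred+33 quality encoding and uses a simple rule that cuts low-quality bases from the 3' end; the function names and the thresholds `min_q` and `min_len` are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header line, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield one four-line FASTQ record at a time; nothing is accumulated."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def stream_trim(src: TextIO, dst: TextIO, min_q: int = 20,
                min_len: int = 1) -> int:
    """Trim each record as it is read and write it out immediately."""
    kept = 0
    for header, seq, qual in read_fastq(src):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            dst.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept


# In-memory demo; 'I' encodes Phred 40 and '#' encodes Phred 2.
demo_in = io.StringIO("@r1\nACGTAC\n+\nIIII##\n@r2\nACGT\n+\n####\n")
demo_out = io.StringIO()
print(stream_trim(demo_in, demo_out), "read(s) kept")
print(demo_out.getvalue(), end="")
```

Because the reader is a generator and each record is written as soon as it is trimmed, memory use stays constant regardless of file size, which is the property the paragraph above attributes to streaming tools of this kind.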