Paradigm Shift in the Approaches for Gene Annotation

TA Thanaraj, AJ Robinson, J Muilu, J-J Riethoven

Briefings in Bioinformatics, 1(4):324-330, 2000

  1. Introduction

  2. Introduction to the Scope of the Symposium

Limitations in the representativeness of protein and EST sequence collections can limit the reliability of predictions;

Evolution of function and sequence may not be as tightly linked as is sometimes believed, making accurate predictions using homology inferences difficult;

A more general problem is that the methods for gene prediction were trained and optimized on short to medium-length DNA sequences containing single genes, rather than on the much longer genomic sequences now being produced.

  3. Gene Annotation Methods

Basic methodologies for identifying coding regions in genes:

•Signal based

•Content based (codon usage)

•Similarity based
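As a rough illustration of the first two approaches, the sketch below pairs a signal-based scan (ATG start and in-frame stop codon) with a content-based codon usage score. It is a minimal sketch under simplifying assumptions, not a method from the paper: it handles only the forward strand and intron-free sequence, and the function names (`find_orfs`, `codon_frequencies`, `coding_score`) are hypothetical. Similarity-based methods, which compare the sequence against protein or EST databases, are not shown.

```python
from collections import Counter
from math import log

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Signal-based step: return (start, end) pairs for forward-strand ORFs
    that begin at ATG and end at the first in-frame stop codon.
    `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def codon_frequencies(coding_seqs):
    """Content-based step: estimate codon usage from known coding sequences,
    with add-one smoothing over all 64 codons."""
    counts = Counter()
    for s in coding_seqs:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    total = sum(counts.values()) + 64
    bases = "ACGT"
    return {a + b + c: (counts[a + b + c] + 1) / total
            for a in bases for b in bases for c in bases}


def coding_score(seq, freqs):
    """Mean log-odds per codon of the usage model against a uniform
    background of 1/64; codons with ambiguous bases score zero."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(log(freqs.get(c, 1 / 64) * 64) for c in codons) / len(codons)
```

In such a scheme, candidate ORFs from `find_orfs` would be ranked by `coding_score`, with a positive score suggesting codon usage closer to the training set than to a uniform background.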

Statistical and mathematical techniques for structural element prediction:

•Decision tree approaches

•Discriminant analysis

•Other statistical approaches (e.g. hidden Markov models)
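To make the hidden Markov model entry concrete, the following sketch runs Viterbi decoding over a toy two-state model that labels each base as coding or noncoding. All probabilities here are invented for illustration and are not taken from the paper or from any real gene finder; practical models use many more states (exons, introns, splice sites) and parameters trained on annotated sequence.

```python
from math import log

# Hypothetical toy parameters: "coding" is assumed to be GC-rich.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq):
    """Return the most probable state label for each base of `seq`
    (A/C/G/T only), computed in log space to avoid underflow."""
    seq = seq.upper()
    if not seq:
        return []
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            pointers[s] = best
            score[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][base])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With these toy parameters, a long AT-rich stretch followed by a long GC-rich stretch is split into a noncoding segment and a coding segment, because the emission gain outweighs the cost of one state switch.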

  4. Large-scale Genome Annotation Efforts

It is now apparent that the bottleneck in genomics is no longer in sequencing the genomes, but lies in their annotation.

Diverse sources of data and methods need to be combined.

Visualization tools are required for manual examination of the automatic annotation.

Integrating human expertise to assess the validity and authenticity of all computational results goes a long way toward improving the quality of gene annotation.

Current Problems

Re-annotation of the Mycoplasma pneumoniae genome uncovered numerous errors.

Finding new drug targets with incomplete or incorrect annotation is very difficult.

Manual methods are time-consuming, and text abstraction is extremely difficult given such a varied vocabulary.

Problems still arise in gene finding because of errors in the results of software analysis.

  5. The Immediate Future

Beyond the identification of coding regions, more effort needs to be directed at identifying the more difficult regions in genomes, such as promoters and regulatory regions.

Issues such as identifying alternative transcripts that involve multiple choices at the level of promoters or splicing will become prominent.

Comparative genome analysis will play a significant role in genome annotation, leading to a reduction in the number of predicted genes with no supporting evidence.

Created by : Jih-Wei Huang

Date : Aug. 15, 2002
