COMPARATIVE GENOMICS WORKSHOP

‘RESSOURCEMENT’

BASIC RESOURCES

● General

NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein databases; BLAST similarity search programs.

TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.

TIGR http://plantta.jcvi.org/index.shtml Annotated Arabidopsis, rice, etc. genomes. TIGR Gene Indices: analysis of public EST data (contig assembly, analysis of expression patterns).

EcoliHub http://ecolihub.org/ & EcoliWiki http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki

ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames

ExPASy Compute pI/Mol Wt Tool http://www.expasy.ch/tools/pi_tool.html

ExPASy AACompIdent Tool http://www.expasy.ch/tools/aacomp/ Identification of a protein from its amino acid composition.

Primer3 http://frodo.wi.mit.edu/primer3/ Primer design site
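
The six-frame translation offered by the ExPASy Translate Tool can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's own code: it assumes the standard genetic code and an unambiguous A/C/G/T sequence, and the function names (reverse_complement, translate, six_frames) are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as '*', and trailing bases that do not fill a codon are ignored.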

● Multiple sequence alignment, phylogeny

Computational Approaches in Comparative Genomics http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 On-line textbook by EV Koonin & MY Galperin.

Multalin Sequence Alignment http://multalin.toulouse.inra.fr/multalin/ Aligns sequences (output in color) and makes phylogenetic trees.

ClustalW and Phylogenetic Trees http://www.ebi.ac.uk/Tools/msa/clustalw2/ Aligns protein sequences and makes phylogenetic trees.

T-Coffee & M-Coffee http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi Tools for combining and comparing multiple sequence alignments.

MEGA http://www.megasoftware.net/ The MEGA phylogeny program, downloads and manual.

iTOL Interactive tree of life

● Transmembrane and organellar targeting predictions

TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices.

TargetP http://www.cbs.dtu.dk/services/ Prediction of protein localization.

Predotar http://urgi.versailles.inra.fr/predotar/predotar.html Prediction of protein localization.

iPSORT http://hc.ims.u-tokyo.ac.jp/iPSORT/ Prediction of protein localization.

WoLF PSORT http://wolfpsort.org/ Prediction of protein localization.

Signal-3L http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# Signal peptide prediction.

COSMOSS Ambiguous Targeting Predictor http://www.cosmoss.org/bm/ATP

● Long-range homology searches

PSI-BLAST (Position-Specific Iterated BLAST)

Phyre http://www.sbg.bio.ic.ac.uk/phyre/ Protein Homology/analogY Recognition Engine.

PSIPRED GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/ Protein structure prediction server.

FFAS03 http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl Fold & Function Assignment System.

COMPASS http://prodata.swmed.edu/compass/compass.php COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance.

● Protein structures

TargetDB http://targetdb.pdb.org/ Gives experimental progress and status of targets selected for structure determination.

MMDB http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure Molecular Modeling DataBase with >40,000 structures, linked to the rest of the NCBI databases.

● Conserved domains and motifs

COGs http://www.ncbi.nlm.nih.gov/COG/ Clusters of Orthologous Groups (COGs), delineated by comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient conserved domain.

CDD http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml NCBI Conserved Domain Database

PFAM http://pfam.sanger.ac.uk/ Protein FAMily database

PRODOM http://prodom.prabi.fr/prodom/current/html/home.php PROtein DOMain families database

ENZYME & METABOLIC PATHWAY RESOURCES

Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature database (linked to the SWISS-PROT protein database, BRENDA, KEGG, etc.)

IntEnz http://www.ebi.ac.uk/intenz/ Integrated relational Enzyme database

BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.

KEGG http://www.genome.ad.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways, and compound structures that can be captured.

IUBMB http://www.chem.qmul.ac.uk/iubmb/ and the subsection on Reaction schemes http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ The website of the International Union of Biochemistry and Molecular Biology. Searchable database of enzymes and enzyme nomenclature; some high-quality information on pathways, etc.

Thermodynamics of Enzyme-Catalyzed Reactions http://xpdb.nist.gov/enzyme_thermodynamics/

EcoSal http://www.ecosal.org/ A new, continually updated Web resource based on the ASM Press publication Escherichia coli and Salmonella: Cellular and Molecular Biology. EcoSal is a comprehensive archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge of metabolic pathways.

BioCyc, EcoCyc & MetaCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and Metabolism; MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.

AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.

MetaCrop http://pgrc-35.ipk-gatersleben.de/pls/htmldb_pgrc/f?p=112:1:2126740855612167 Summarizes diverse information about around 40 metabolic pathways in crop plants

COMPARATIVE GENOMICS (‘PHYLOGENOMICS’) RESOURCES

SEED http://www.theseed.org/wiki/Main_Page Database containing hundreds of genomes and many valuable tools.

NMPDR http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome National Microbial Pathogen Data Resource.

STRING http://string.embl.de/ Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high throughput experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.

PHYDBAC http://igs-server.cnrs-mrs.fr/phydbac/ PHYDBAC displays phylogenomic profiles (fusions, co-occurrence, co-localization in genome) of bacterial protein sequences. Analyzing the annotation of a protein’s phylogenomic neighbors helps generate hypothetical functions for the query protein(s).

FusionDB http://igs-server.cnrs-mrs.fr/FusionDB/main.html FusionDB is a database of bacterial and archaeal gene fusion events.

MBGD http://mbgd.genome.ad.jp/ Microbial Genome DataBase for comparative genomics.

Signature Genes http://www.nmpdr.org/FIG/wiki/rest.cgi/NmpdrPlugin/search?Class=SigGenes Locates genes associated with one set of organisms but not with another set.

Phylogenetic Profiler http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=PhylogenProfiler&page=phyloProfileForm Phylogenetic Profiler for Single Genes.

E. coli Phenotypic Landscape http://ecoliwiki.net/tools/chemgen/
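
The idea behind the Signature Genes and Phylogenetic Profiler entries above (finding genes present in one set of organisms but absent from another) amounts to set operations on presence/absence profiles. The sketch below is a minimal illustration only: the function name signature_genes, the gene family labels and the genome labels are all hypothetical, and it does not reproduce how either web tool is implemented.

```python
def signature_genes(profiles, include, exclude):
    """Return gene families present in every 'include' genome and in no 'exclude' genome.

    profiles maps a gene family name to the set of genomes that contain it.
    """
    include, exclude = set(include), set(exclude)
    return sorted(
        family
        for family, genomes in profiles.items()
        if include <= genomes and not (exclude & genomes)
    )


if __name__ == "__main__":
    # Toy presence/absence data (hypothetical, for illustration only).
    toy_profiles = {
        "famA": {"genome1", "genome2"},
        "famB": {"genome1", "genome2", "genome3"},
        "famC": {"genome1"},
    }
    print(signature_genes(toy_profiles, ["genome1", "genome2"], ["genome3"]))
```

In practice the profiles would come from ortholog or protein family assignments, and a tolerance for missing calls is often useful; both are left out here for brevity.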

MICROARRAY DATABASES AND ANALYSIS RESOURCES

● General

GEO http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus

● Arabidopsis

Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html Good tools for getting an overview of expression, and for finding co-responses.

ATTED http://atted.jp/ A simple site to use to look for co-expression patterns; it shows gene networks, not just lists of correlated genes.

GeneCAT http://genecat.mpg.de/ Gene Co-expression Analysis Toolbox for Arabidopsis, rice, poplar, and barley

Diurnal http://diurnal.cgrb.oregonstate.edu/ Circadian/Diurnal gene expression data for an individual or set of Arabidopsis, rice, or poplar genes

PRIMe http://prime.psc.riken.jp/ Server with tools for metabolomics, transcriptomics, and integrated analysis of different omics data.

PED http://bioinfo.ucr.edu/projects/Unknowns/external/express.html Plant Gene Expression Database

Botany Array Resource http://bbc.botany.utoronto.ca/ Tools for finding co-responses, electronic Northerns.
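
Several of the sites above report co-expression or co-response between genes. A common way to score this is the Pearson correlation between expression profiles measured across the same samples. The sketch below assumes that measure; the individual sites may use other statistics. The function names (pearson, coexpressed), the threshold and the toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def coexpressed(expression, bait, threshold=0.9):
    """Return genes whose profile correlates with the bait gene at or above threshold."""
    return sorted(
        gene
        for gene, profile in expression.items()
        if gene != bait and pearson(expression[bait], profile) >= threshold
    )


if __name__ == "__main__":
    # Toy expression values across four samples (hypothetical).
    toy = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    print(coexpressed(toy, "geneA"))
```

Real microarray data are usually log-transformed and normalized before correlations are computed, which this sketch omits.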

● Bacteria

MicrobesOnline http://www.microbesonline.org/ A comprehensive database that includes correlated gene expression in E. coli and other bacteria

EcoGene http://ecogene.org/ A rich resource on E. coli that includes microarray data on the major changes in gene expression observed in various experiments.

GenExpDB http://chase.ou.edu/oubcf/ E. coli Community Gene Expression DataBase

● Yeast

SPELL http://imperio.princeton.edu:3000/yeast Co-response search tool for yeast

● Mammals

BioGPS http://biogps.gnf.org/#goto=welcome

PLANT PHENOME DATABASES

RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database, phenotypic data in transposon-insertional mutants.

SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.

Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of chloroplast genes.

BAPDB http://bioweb.ucr.edu/bapdb/ Bioassay And Phenotype DataBase

PLANT PROTEOME DATABASES

PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase

SUBA II http://www.plantenergy.uwa.edu.au/suba2/ SUB-cellular location database for Arabidopsis proteins (includes GFP and MS-MS data)

UNKNOWN GENE/ENZYME DATABASES

POND http://bioweb.ucr.edu/scripts/unknownsDisplay.pl Plant Unknown-eome DB (POND) – Arabidopsis Unknown-eome

ORENZA http://www.orenza.u-psud.fr/ ORphan ENZyme Activities database (lists 1,200 orphan enzymes)

ADOMETA http://vitkuplab.cu-genome.org/html/adometa/adometa.html ADoption of Orphan METabolic Activities (Orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).

GREP http://bio200082.bio.tcd.ie/cgi-bin/GREP/index.cgi Generator of Reaction Equations & Pathways. Looks for reported and putative enzyme reaction equations; especially designed for finding metabolic pathways for orphan metabolites (compounds known to be present in at least one living organism, but whose synthetic/degradation pathways are unknown).

LITERATURE MINING RESOURCES

PubMed Central http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc

HighWire Press http://highwire.stanford.edu/

Google Scholar http://scholar.google.com/

eTBlast http://etest.vbi.vt.edu/etblast3/

iHOP http://www.ihop-net.org/UniPub/iHOP/ Information Hyperlinked Over Proteins

Updated 6/7/11