COMPARATIVE GENOMICS WORKSHOP
‘RESSOURCEMENT’
BASIC RESOURCES
● General
NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein databases; BLAST similarity search programs.
TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.
TIGR http://plantta.jcvi.org/index.shtml Annotated Arabidopsis, rice, etc. genomes. TIGR Gene Indices: analysis of public EST data (contig assembly, analysis of expression patterns).
EcoliHub http://ecolihub.org/ & EcoliWiki http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki
ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames
ExPASy Compute pI/Mol Wt Tool http://www.expasy.ch/tools/pi_tool.html
ExPASy AACompIdent Tool http://www.expasy.ch/tools/aacomp/ Identification of a protein from its amino acid composition.
Primer3 http://frodo.wi.mit.edu/primer3/ Primer design site
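The six-frame translation offered by tools such as the ExPASy Translate Tool listed above can be illustrated with a minimal sketch. It assumes the standard genetic code, writes stop codons as "*" and any codon containing an ambiguous base as "X", and drops trailing bases that do not fill a codon. The function names are hypothetical and are not part of any ExPASy interface.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (an assumption: organellar and other variant codes differ).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to protein strings."""
    seq = dna.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, six_frame_translation("ATGGCCTAA")["+1"] gives "MA*", the forward frame that starts at the first base.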
● Multiple sequence alignment, phylogeny
Computational Approaches in Comparative Genomics http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 On-line textbook by EV Koonin & MY Galperin.
Multalin Sequence Alignment http://multalin.toulouse.inra.fr/multalin/ Aligns sequences (output in color) and makes phylogenetic trees.
ClustalW and Phylogenetic Trees http://www.ebi.ac.uk/Tools/msa/clustalw2/ Aligns protein sequences and makes phylogenetic trees.
T-Coffee & M-Coffee http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi Tools for combining and comparing multiple sequence alignments.
MEGA http://www.megasoftware.net/ The MEGA phylogeny program, downloads and manual.
iTOL Interactive tree of life
● Transmembrane and organellar targeting predictions
TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices.
TargetP http://www.cbs.dtu.dk/services/ Prediction of protein localization.
Predotar http://urgi.versailles.inra.fr/predotar/predotar.html Prediction of protein localization.
iPSORT http://hc.ims.u-tokyo.ac.jp/iPSORT/ Prediction of protein localization.
WoLF PSORT http://wolfpsort.org/ Prediction of protein localization.
Signal-3L http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# Signal peptide prediction.
COSMOSS Ambiguous Targeting Predictor http://www.cosmoss.org/bm/ATP
● Long-range homology searches
PSI-BLAST Position-Specific Iterated BLAST
Phyre http://www.sbg.bio.ic.ac.uk/phyre/ Protein Homology/analogY Recognition Engine.
PSIPRED GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/ Protein structure prediction server.
FFAS03 http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl Fold & Function Assignment System.
COMPASS http://prodata.swmed.edu/compass/compass.php COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance.
● Protein structures
TargetDB http://targetdb.pdb.org/ Gives experimental progress and status of targets selected for structure determination.
MMDB http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure Molecular Modeling DataBase with >40,000 structures, linked to the rest of the NCBI databases.
● Conserved domains and motifs
COGs http://www.ncbi.nlm.nih.gov/COG/ Clusters of Orthologous Groups (COGs), delineated by comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient conserved domain.
CDD http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml NCBI Conserved Domain Database
PFAM http://pfam.sanger.ac.uk/ Protein FAMily database
PRODOM http://prodom.prabi.fr/prodom/current/html/home.php PROtein DOMain families database
ENZYME & METABOLIC PATHWAY RESOURCES
Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature database (linked to SWISS-PROT protein database, BRENDA, KEGG, etc.)
IntEnz http://www.ebi.ac.uk/intenz/ Integrated relational Enzyme database
BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.
KEGG http://www.genome.ad.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways, and compound structures that can be captured.
IUBMB http://www.chem.qmul.ac.uk/iubmb/ and the subsection on Reaction schemes http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ The website of the International Union of Biochemistry and Molecular Biology. Searchable database on enzymes and enzyme nomenclature; some high-quality information on pathways, etc.
Thermodynamics of Enzyme-Catalyzed Reactions http://xpdb.nist.gov/enzyme_thermodynamics/
EcoSal http://www.ecosal.org/ A new, continually updated Web resource based on the ASM Press publication Escherichia coli and Salmonella: Cellular and Molecular Biology. EcoSal is a comprehensive archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge of metabolic pathways.
BioCyc, EcoCyc & MetaCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and Metabolism; MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.
AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.
MetaCrop http://pgrc-35.ipk-gatersleben.de/pls/htmldb_pgrc/f?p=112:1:2126740855612167 Summarizes diverse information about around 40 metabolic pathways in crop plants
COMPARATIVE GENOMICS (‘PHYLOGENOMICS’) RESOURCES
SEED http://www.theseed.org/wiki/Main_Page Database containing hundreds of genomes and many valuable tools.
NMPDR http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome National Microbial Pathogen Data Resource.
STRING http://string.embl.de/ Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high throughput experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.
PHYDBAC http://igs-server.cnrs-mrs.fr/phydbac/ PHYDBAC displays phylogenomic profiles (fusions, co-occurrence, co-localization in genome) of bacterial protein sequences. Analyzing the annotation of a protein’s phylogenomic neighbors helps generate hypothetical functions for the query protein(s).
FusionDB http://igs-server.cnrs-mrs.fr/FusionDB/main.html FusionDB is a database of bacterial and archaeal gene fusion events.
MBGD http://mbgd.genome.ad.jp/ Microbial Genome DataBase for comparative genomics.
Signature Genes http://www.nmpdr.org/FIG/wiki/rest.cgi/NmpdrPlugin/search?Class=SigGenes Locates genes associated with one set of organisms but not with another set.
Phylogenetic Profiler http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=PhylogenProfiler&page=phyloProfileForm Phylogenetic Profiler for Single Genes.
E. coli Phenotypic Landscape http://ecoliwiki.net/tools/chemgen/
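The idea behind a signature-gene search (genes associated with one set of organisms but not with another) can be illustrated with plain set operations on presence/absence profiles. This is a minimal sketch of one strict reading of that idea, not the algorithm used by any of the tools listed above. It assumes each gene family has already been mapped to the set of genomes that contain it; the function name and the toy data are hypothetical.

```python
def signature_genes(profiles, include, exclude):
    """Return gene families found in every 'include' genome
    and in none of the 'exclude' genomes.

    profiles: dict mapping a gene family name to the set of
              genome names in which it occurs (assumed input).
    """
    include = set(include)
    exclude = set(exclude)
    return sorted(
        gene
        for gene, genomes in profiles.items()
        if include <= genomes and not (exclude & genomes)
    )


# Hypothetical toy profiles, for illustration only.
toy_profiles = {
    "geneA": {"genome1", "genome2", "genome3"},
    "geneB": {"genome1", "genome2"},
    "geneC": {"genome3"},
}
```

With the toy data, signature_genes(toy_profiles, ["genome1", "genome2"], ["genome3"]) returns ["geneB"]. Real searches usually relax the all-or-none criterion, because gene calls and homology assignments are imperfect.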
MICROARRAY DATABASES AND ANALYSIS RESOURCES
● General
GEO http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus
● Arabidopsis
Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html Good tools for getting an overview of expression, and for finding co-responses.
ATTED http://atted.jp/ A simple site to use to look for co-expression patterns; it shows gene networks, not just lists of correlated genes.
GeneCAT http://genecat.mpg.de/ Gene Co-expression Analysis Toolbox for Arabidopsis, rice, poplar, and barley
Diurnal http://diurnal.cgrb.oregonstate.edu/ Circadian/Diurnal gene expression data for an individual or set of Arabidopsis, rice, or poplar genes
PRIMe http://prime.psc.riken.jp/ Server for metabolomics and transcriptomics, with tools for integrated analysis of different omics data.
PED http://bioinfo.ucr.edu/projects/Unknowns/external/express.html Plant Gene Expression Database
Botany Array Resource http://bbc.botany.utoronto.ca/ Tools for finding co-responses, electronic Northerns.
● Bacteria
MicrobesOnline http://www.microbesonline.org/ A comprehensive database that includes correlated gene expression in E. coli and other bacteria
EcoGene http://ecogene.org/ A rich resource on E. coli that includes microarray data on the major changes in gene expression observed in various experiments.
GenExpDB http://chase.ou.edu/oubcf/ E. coli Community Gene Expression DataBase
● Yeast
SPELL http://imperio.princeton.edu:3000/yeast Co-response search tool for yeast
● Mammals
BioGPS http://biogps.gnf.org/#goto=welcome
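Several of the resources above report co-expression (co-response) between genes. A common way to score it is the Pearson correlation between two genes' expression profiles across the same experiments; the individual databases may use other measures or additional normalization. The sketch below assumes that approach, with hypothetical function names and made-up expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, expression):
    """Rank all other genes by correlation with the query gene.

    expression: dict mapping a gene name to a list of expression
                values, one per experiment (assumed input layout).
    """
    query_profile = expression[query]
    scores = [
        (gene, pearson(query_profile, values))
        for gene, values in expression.items()
        if gene != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical expression values, for illustration only.
toy_expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],
    "gene3": [4.0, 3.0, 2.0, 1.0],
}
```

Calling rank_coexpressed("gene1", toy_expression) lists gene2 first (perfectly correlated) and gene3 last (perfectly anti-correlated).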
PLANT PHENOME DATABASES
RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database, phenotypic data in transposon-insertional mutants.
SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.
Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of chloroplast genes.
BAPDB http://bioweb.ucr.edu/bapdb/ Bioassay And Phenotype DataBase
PLANT PROTEOME DATABASES
PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase
SUBA II http://www.plantenergy.uwa.edu.au/suba2/ SUB-cellular location database for Arabidopsis proteins (includes GFP and MS-MS data)
UNKNOWN GENE/ENZYME DATABASES
POND http://bioweb.ucr.edu/scripts/unknownsDisplay.pl Plant Unknown-eome DB (POND) – Arabidopsis Unknown-eome
ORENZA http://www.orenza.u-psud.fr/ ORphan ENZyme Activities database (lists 1,200 orphan enzymes)
ADOMETA http://vitkuplab.cu-genome.org/html/adometa/adometa.html ADoption of Orphan METabolic Activities (Orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).
GREP http://bio200082.bio.tcd.ie/cgi-bin/GREP/index.cgi Generator of Reaction Equations & Pathways. Looks for reported and putative enzyme reaction equations; designed especially for finding metabolic pathways for orphan metabolites (compounds known to be present in at least one living organism, but whose synthetic/degradation pathways are unknown).
LITERATURE MINING RESOURCES
PubMed Central http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc
HighWire Press http://highwire.stanford.edu/
Google Scholar http://scholar.google.com/
eTBlast http://etest.vbi.vt.edu/etblast3/
iHOP http://www.ihop-net.org/UniPub/iHOP/ information Hyperlinked Over Proteins
Updated 6/7/11