BMIF 310: Foundations of Bioinformatics

Instructor: David L. Tabb, PhD

In this course, students will be introduced to the algorithms and concepts fundamental to the field of bioinformatics. The experimental problems these algorithms address will be examined alongside the software that implements them.

Prerequisites

Ideally, students will have prior exposure to computer programming, though software development is not a requirement of the class. Students who expect to develop software tools (ranging from Perl scripts to number-crunching code) in support of their research will likely benefit most from this class, though users of publicly available web utilities will also find it useful.

Graded Elements

Students will be evaluated on the basis of two scored elements, each comprising 50% of the final grade:

  • A brief quiz at the start of each class will test each student’s understanding of material presented in the previous class and any assigned readings.
  • Students will write a report on a project and present their work to the class at the close of the semester. Example projects include a literature review on a bioinformatics topic or a newly developed algorithm for one of the areas covered in the course. Project plans must be approved by the course director no later than one month before the final class.

Overview of Topics

Introduction

  • Biochemistry basics: nucleic acids, proteins, lipids, carbohydrates
  • Molecular biology basics: cells and organelles, transcription and translation, mutation and damage repair, cellular signaling, etc.
  • Molecular underpinnings of example diseases
  • Types of data in molecular biology: DNA electropherograms, sequences, microarrays, gels, mass spectrometry, NMR, X-ray crystallography, etc.
  • Defining bioinformatics and differentiating it from computational biology

Sequence Analysis

  • Sequence alignment: Dot plots, Needleman-Wunsch, Smith-Waterman, Lipman-Pearson, BLAST
  • Multiple sequence alignment: ClustalW / phylograms / cladograms
  • Hidden Markov Models (HMMs) for motif detection
  • Protein families and domains: InterPro and Blocks
  • PAM and BLOSUM substitution matrices
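
As a small illustration of the dynamic programming behind the pairwise alignment topics above, here is a minimal Needleman-Wunsch (global alignment) sketch in Python. It assumes a simple match/mismatch score and a linear gap penalty; the default values (1, -1, -1) and the function name needleman_wunsch are hypothetical choices for illustration. In practice a PAM or BLOSUM matrix would supply the substitution scores.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with a linear gap penalty (illustrative scoring).

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Simple match/mismatch scoring stands in for a substitution matrix.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap    # a[i-1] opposite a gap in b
            left = score[i][j - 1] + gap  # b[j-1] opposite a gap in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Smith-Waterman (local alignment) differs mainly in flooring each cell at zero and tracing back from the highest-scoring cell rather than the corner.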

Genome Bioinformatics

  • Phred: assessing error rates from sequencing electropherograms
  • Phrap: building sequence contigs from sequencing reads
  • History of NCBI
  • Polymorphism detection
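
Phred reports each base call's error probability P as a quality score on a logarithmic scale, Q = -10 log10(P), so Q = 20 means a 1-in-100 chance that the call is wrong and Q = 30 means 1 in 1,000. A minimal Python sketch of the conversion in both directions follows. The function names, and the "expected errors per read" summary (the sum of per-base error probabilities), are illustrative assumptions and not part of Phred itself.

```python
import math
from typing import Iterable


def phred_quality(error_prob: float) -> float:
    """Phred quality Q = -10 * log10(P), where P is the probability the base call is wrong."""
    if not 0 < error_prob <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(error_prob)


def error_probability(quality: float) -> float:
    """Inverse transform: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities: Iterable[float]) -> float:
    """Hypothetical helper: expected number of miscalled bases in a read (sum of per-base P)."""
    return sum(error_probability(q) for q in qualities)


print(round(phred_quality(0.001)))  # 30
print(error_probability(20))        # 0.01
```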

Microarray Bioinformatics

  • Fundamentals of cDNA arrays
  • Clustering genes: Quality Threshold Clustering
  • MIAME: standards for communication of microarray data
  • LIMS development
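
Quality Threshold (QT) clustering guarantees that no cluster's diameter exceeds a user-chosen threshold. For every unassigned point it grows a candidate cluster by repeatedly adding the point that keeps the diameter smallest. It then keeps the largest candidate, removes those points, and repeats. The Python sketch below assumes Euclidean distance between expression profiles; gene expression studies often use a correlation-based distance instead. As a simplification, it keeps going until every point is assigned, so leftover points become singleton clusters. The names qt_cluster and _candidate are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _candidate(seed: int, remaining: List[int], points: List[Point], threshold: float) -> List[int]:
    """Grow a candidate cluster from seed, greedily adding the point that keeps the diameter smallest."""
    cluster = [seed]
    pool = [i for i in remaining if i != seed]
    while pool:
        # The current diameter is fixed, so minimizing the new diameter means choosing
        # the point whose farthest distance to any current member is smallest.
        best, best_d = None, math.inf
        for i in pool:
            d = max(math.dist(points[i], points[c]) for c in cluster)
            if d < best_d:
                best, best_d = i, d
        if best_d > threshold:
            break  # adding any further point would exceed the diameter threshold
        cluster.append(best)
        pool.remove(best)
    return cluster


def qt_cluster(points: List[Point], threshold: float) -> List[List[int]]:
    """Quality Threshold clustering; returns clusters as sorted lists of point indices."""
    remaining = list(range(len(points)))
    clusters: List[List[int]] = []
    while remaining:
        candidates = [_candidate(s, remaining, points, threshold) for s in remaining]
        largest = max(candidates, key=len)  # ties go to the earliest seed
        clusters.append(sorted(largest))
        remaining = [i for i in remaining if i not in largest]
    return clusters


profiles = [(0.0,), (0.5,), (1.0,), (10.0,), (10.4,)]
print(qt_cluster(profiles, threshold=1.0))  # [[0, 1, 2], [3, 4]]
```

Unlike k-means, QT does not require the number of clusters in advance. The trade-off is that it builds a candidate from every remaining point on each pass, which makes it considerably more expensive.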

Proteome Bioinformatics

  • Protein structure inference
  • Predicting migration in 2D gel electrophoresis
  • Finding peaks in MALDI-TOF profiles
  • Statistical models for MS/MS peptide identification
  • MIAPE: standards for communication of proteomics data
  • Searching for biomarkers

Systems Bioinformatics

  • Genetic regulatory networks
  • Functional annotation of genes
  • Gene Ontology (GO) terms
  • ANNs, SVMs, and CART decision trees