Computer Science 91.510

Computational Methods in Molecular Biology

Spring 2004

Instructor: Georges Grinstein

Office: Olsen 301 Phone: 978-934-3627

Email:

Office Hours: 1:30 – 2:20 PM Tu-Th, and by appt.

Course Time and Place: Wed 5:30 – 8:20 PM in Olsen 410

Course web page:

Material: This is an advanced course in computer science, focusing on current problems in genomics. Our emphasis will be analytic, on discovering appropriate combinatorial algorithm problems and the techniques to solve these problems. Primary topics will include DNA sequence assembly, DNA/protein sequence comparison, phylogenetic trees, RNA and protein folding, microarray analysis, and their applications to human health.

The course will use retinol-binding protein 4 (RBP4) as a model gene/protein. RBP4 is a member of the lipocalin family. It is a small, abundant carrier protein.

We will study it in a variety of contexts including

  • sequence alignment
  • gene expression
  • protein structure
  • phylogeny
  • homologs in various species

We will also use the Pol protein of HIV-1 as an example.

Homeworks and Software: The most important homework will be readings in the primary textbooks, related chapters in secondary reference books, related papers, and solving the laboratory problems. In each case, public domain or other software will be made available to run the algorithms on public or other similar data sets.

Grading: Grades will be assigned based on the following formula:

  • Presentations – 10%
  • Laboratory, problem, and algorithm web page (portfolio/notebook) – 20%
  • Final Project – 40%
  • Final Exam – 30%
  • Extra credit: find a mistake in a database or in an algorithm in some public domain software
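
As a rough illustration of how the weights above might combine into a single score, here is a minimal sketch. It assumes each component is scored from 0 to 100 and that extra credit is a flat point bonus; the syllabus does not say how extra credit is counted, so the `extra_credit` parameter, the component names, and the function name are all hypothetical.

```python
# Weights taken from the grading list above; they sum to 100%.
WEIGHTS = {
    "presentations": 0.10,
    "web_page": 0.20,       # laboratory, problem, and algorithm web page
    "final_project": 0.40,
    "final_exam": 0.30,
}


def final_score(scores, extra_credit=0.0):
    """Return the weighted average of component scores (each 0-100).

    extra_credit is a hypothetical flat bonus in points; the syllabus
    does not specify how extra credit is applied.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total + extra_credit


# Made-up scores, for illustration only.
example = {
    "presentations": 90,
    "web_page": 80,
    "final_project": 85,
    "final_exam": 70,
}
print(final_score(example))
```

The final project carries the largest weight, so in this sketch a 10-point change in the project score moves the total by 4 points, compared with 1 point for the same change in the presentation score.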

Presentation: I will be lecturing on foundational material in computational biology and algorithms. The following week, to augment this theoretical material, each student (presumably as part of a pair) will be required to make a presentation on the modern algorithms in currently available software tools for some class of problems. This presentation will emphasize both what the software does, via demonstration, and the associated algorithmic issues (with pseudocode). Each presentation will be available on your web page with links to appropriate systems and resources, as well as the slides used in the presentation and documented pseudocode. So each class, except for the first few, will consist of an advanced presentation related to last week’s topic, followed by my presentation on the topic at hand.

Web page: You will place all results from the labs, problems, exercises, algorithms designed (pseudocode) or implemented (code) on your web page. This web page is how I will be able to evaluate you.

Project: This is your opportunity to study and apply all aspects of the course topics in depth. You will discover a novel gene (by April 30) and the corresponding phylogenetic tree (by May 13). The final results will be presented at a class poster session (as a PPT presentation) and written up as a major report. Electronic versions of both, along with supplementary information including figures, history, references, datasets, and custom software, will also be made available on your web page.

Exercises: I will be making up some exercises for practice both from the algorithmic and biological viewpoints. Simply place the answers to these on your web pages. These are required to pass the course but will count toward your grade.

Notes and guidelines:

1. This is an advanced course in algorithms, focused on applications to Molecular Biology. It is targeted to advanced Masters and Doctoral students. Computer Science students should not take this course if they do not have a good knowledge of algorithms (or have done badly in an algorithms course); ideally they should have taken the equivalent of the graduate course. Biomedical engineering students (and life science or medical school students) are expected to have a good knowledge of biology, genetics and biochemistry.

2. I suggest that you form pairs with one computational science student and one life science student for presentations and projects, so as to help each other get a more balanced view.

3. Please check the WWW page for the course regularly. All course handouts and materials are available there, along with the latest announcements.

4. Begin developing a professional web page related to the course. Place notes, figures, and datasets there. Your final project will be placed there.

5. Because a primary goal of the course is to teach professionalism, any academic dishonesty will be viewed as evidence that this goal has not been achieved, and will be grounds for receiving a grade of F. (See CS and University procedures and guidelines on academic dishonesty.)

Textbooks: There are two reference textbooks for this course. Some of the material we will cover also appears in the secondary references, which are listed in order of priority. Additional readings will be assigned, with papers available on the course web page.

Required Textbooks:

  • Pevsner, Bioinformatics and Functional Genomics, Wiley-Liss Publishers, John Wiley and Sons, 2003. ISBN 0-471-21004-8
  • Setubal and Meidanis, Introduction to Computational Molecular Biology, Brooks Cole Publishing Company, 1997. ISBN 0-534-95262-3

Primary additional textbooks (not required):

  • Durbin, Eddy, Krogh, and Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998. Seventh printing 2002; Paperback ISBN 0-521-62971-3
  • Kohane, Kho and Butte, Microarrays for an Integrative Genomics, A Bradford Book, MIT Press, 2003. ISBN 0-262-11271-X

Secondary Textbooks (Algorithms):

  • Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, 2001. ISBN 0-87969-597-8
  • Jiang, Xu, and Zhang, Current Topics in Computational Molecular Biology, A Bradford Book, MIT Press, 2002. ISBN 0-262-10092-4

Secondary Textbooks (Bioinformatics – Biological Viewpoint):

  • Krane and Raymer, Fundamental Concepts of Bioinformatics, 2003. Paperback ISBN 0-8053-4633-3
  • Campbell and Heyer, Discovering Genomics, Proteomics, and Bioinformatics, Benjamin Cummings, 2003. Paperback ISBN 0-8053-4722-4

Additional References - Algorithms:

  • Salzberg, Searls and Kasif, Computational Methods in Molecular Biology, Elsevier, 2002. ISBN 0-444-50-204-1
  • Waterman, Introduction to Computational Molecular Biology: Maps, Sequences and Genomes, Chapman & Hall, CRC Press, 1995, CRC reprint 2000. ISBN 0-412-99391-0
  • Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge Univ. Press, 1997. ISBN 0-521-58519-8
  • Baldi and Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 2001. ISBN 0-262-0256-X

Additional References – Molecular Biology:

  • Thompson, Hellack, Braver and Durica, Primer of Genetic Analysis: A Problems Approach, Cambridge University Press, 1997. Reprinted 2000. Paperback ISBN 0-521-47312-8
  • Clark and Russell, Molecular Biology Made Simple and Fun, Cache River Press, 1997. Paperback ISBN 0-9627422-9-5
  • Frank-Kamenetskii, Unraveling DNA: The Most Important Molecule of Life, Perseus Books, 1997. Paperback ISBN 0-201-15884-2
  • Baldi and Hatfield, DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling, Cambridge University Press, 2002. ISBN 0-521-80022-6
  • Knudsen, A Biologist’s Guide to Analysis of Microarray Data, Wiley-Liss, 2002. ISBN 0-471-22490-1
  • Warrington, Todd and Wong, Microarrays and Cancer Research, BioTechniques Press, Eaton Publishing, 2002. Paperback ISBN 1-881299-51-1

Others:

  • Bishop and Rawlins, DNA and Protein Sequence Analysis, Oxford University Press, 1997.
  • Baxevanis and Ouellette, Bioinformatics, Wiley, 1998.
  • Sankoff and Kruskal, Time Warps, String Edits, and Macromolecules, CSLI Publications, 1999 (reprint).
  • Watson, Gilman, Witkowski, and Zoller, Recombinant DNA, Scientific American Press, 1992.

Recordings: the lectures will be recorded and posted at:


Schedule (P=Pevsner; SM=Setubal&Meidanis)

Week / Lecture / Book Chapter / Topic
Jan 28 / Introduction to bioinformatics / P1, P2, SM1 / Sequence analysis
Feb 4 / Pairwise alignment: algorithms and matrices / P3, SM2
Feb 9 / BLAST and related programs / P4, SM3
Feb 19 / Advanced database searching / P5, SM3
Feb 23 / Gene expression / P6 / Functional genomics
Mar 1 / Gene expression: microarray data analysis / P7
Mar 8 / Protein families & proteomics / P8, SM3
Mar 22 / Protein structure / P9, SM8
Mar 29 / Multiple sequence alignment / P10, SM3
Apr 5 / Molecular phylogeny: principles / P11, SM6
Apr 12 / Molecular phylogeny: making trees / P11
Genome Analysis: Fragment Assembly; Physical Mappings of DNA; Genome Rearrangements; Systematics / P12, SM4, SM5, SM7 / Genomics
Apr 21* / Completed genomes: viruses, prokaryotes and fungi / P13, P14, P15, SM4, SM5, SM7
Apr 26 / Functional analysis of pathways: yeast / P15
May 3 / Eukaryotic genomes: from parasites to primates / P16
May 10 / Human genome and disease / P17, P18
Final Project Presentations / Final Exam

NOTE:

Send me an email with a message (no more than one page) stating

1) your Computer Science, Mathematics, Biology and Chemistry backgrounds;

2) your goals and research interests;

3) what you hope to learn from taking this course; and

4) the amount of time you expect to spend on this course.