Virtual Classrooms and E-Learning: Bringing Cheminformatics Training Into Academic and Industrial Settings

TJ O'Donnell

O'Donnell Associates

Norah E. MacCuish and John D. MacCuish

Mesa Analytics & Computing, LLC

Introduction

The use of computers in chemistry has grown quickly over the past few decades, to the point where it is ubiquitous in the pharmaceutical industry. Cheminformatics can be thought of as deriving from roots in physical chemistry, but it is an integrative discipline, incorporating ideas from physical chemistry, organic chemistry and computer science. After 25 years, it may now be considered a discipline in its own right: several textbooks are devoted exclusively to cheminformatics, and many universities offer courses in it.

We have begun a project that integrates the teaching of concepts with real-world software applications. The initial goal of the project is to create modules for graduate courses in cheminformatics. These modules could be extended to offer an introduction or training to chemists in industry, and could also provide educational assistance to students at the undergraduate or even earlier levels. During the first phase of the project, we demonstrated its feasibility by creating a web-based module that introduces the concepts of fingerprints, clustering, and sub-structure commonality analysis. We also made contact with several universities throughout the U.S. that expressed a need for tools like the ones we will develop.

In addition to Mesa Analytics & Computing’s software, products from a group of participating vendors are also included. Together, these offer the full range of state-of-the-art cheminformatics and modeling. As an example of one vendor participant’s contribution, we show the integration of ChemAxon’s Marvin tools into a prototype module.

Web-based Courses

We have begun the second phase of our research. Our goal is to create web-based modules that cover distinct topics in cheminformatics, such as molecular representations, fingerprinting, clustering, databases, and 3D modeling. We have coordinated our efforts with professors at several universities that currently offer courses in cheminformatics. Rather than providing an entire course, including HTML pages of information and links, we have chosen to create interactive CGI and Java tools with only a small amount of explanatory text. This approach relies on the teachers to provide the bulk of the text-based materials and to use our tools in the ways that integrate best with their current courses. They will also provide feedback on how well our modules work in their courses, which will allow us to continually improve the modules and extend them into other areas. This approach should also work well for potential users in industry and at undergraduate and earlier educational institutions.

Interactive Modules

While we are using modern techniques to deliver instruction on the internet, our approach is based on traditional methods of education. Our modules correspond to chapters in a book, or perhaps even an entire university course. It is exciting to speculate that entire courses might someday be devoted to the single topics of fingerprints and clustering, the use of databases in cheminformatics or 3D modeling of chemical interactions.

Our use of interactive web-based methods, using CGI and Java, corresponds to the traditional use of laboratories to augment classroom education. As in a laboratory setting, students using our modules will be directed to accomplish certain goals but will still be free to experiment with ways of using the particular computer tools at their disposal.

Demonstration

To better explain our approach, we demonstrate our first prototype module. It summarizes aspects of fingerprinting, clustering, and the identification of sub-structure commonalities among a group of similar chemical structures using Mesa’s ChemTattoo®. The module uses a web browser as the client to display text and images and to interact with the student, and a web server to process the student’s input and return the appropriate results.

Fingerprints

Fingerprints are computed using software from Mesa that uses MDLI’s MACCS 320 keys [1]. The fingerprints can be used to group molecular structures or to identify interesting and important sub-structural fragments contained in the set of input structures. The structures can be input from a variety of sources on the student’s computer: uploaded as SMILES or SDF files, sketched with the ChemAxon Marvin Sketcher, or pasted in as a list of SMILES. This variety of input methods is typical of real-world work in cheminformatics. It also introduces the concepts of SMILES, sketchers, and connection table files, and shows how they can all represent the same molecular structures. The Marvin Viewer can be used to verify that the structures were input correctly after uploading, pasting, or sketching. Finally, the student requests that the fingerprints be computed. She is then shown typical output from the program, including a graphical representation of the bitstring fingerprint, as seen in Figure 1.
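As a minimal illustration of what a bitstring fingerprint is, the Python sketch below packs a set of "on" key indices into a fixed-length bit vector and renders it as a 0/1 string. It assumes the keys have already been matched against a structure (for example by a toolkit that implements the MACCS keys); the function names and the key indices used are hypothetical and do not represent Mesa’s software or real MACCS key numbers.

```python
# Hypothetical sketch: key indices are placeholders, not real MACCS key numbers.
N_KEYS = 320  # size of the MACCS 320 key set named in the text


def make_fingerprint(on_keys, n_keys=N_KEYS):
    """Pack 0-based indices of matched keys into an integer bit vector."""
    fp = 0
    for k in on_keys:
        if not 0 <= k < n_keys:
            raise ValueError(f"key index {k} outside 0..{n_keys - 1}")
        fp |= 1 << k
    return fp


def to_bitstring(fp, n_keys=N_KEYS):
    """Render the bit vector as a '0'/'1' string, key 0 first."""
    return "".join("1" if (fp >> i) & 1 else "0" for i in range(n_keys))
```

For example, `to_bitstring(make_fingerprint({2, 5, 17}))[:8]` gives `"00100100"`: of the first eight keys, only keys 2 and 5 are on. Storing the fingerprint as an integer keeps bitwise comparisons between structures cheap.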

Figure 1. Fingerprint generation and graphic representation.

Clustering

Once the fingerprints are computed, students can choose to cluster the input structures. Figure 2 shows the clustering module setup page. The clustering results are presented both as typical text output and as a graphical display. The graphical display includes a dendrogram (in this example we use hierarchical clustering) with which the student can interact to view the contents of each cluster using MarvinView. In addition, an interactive graph of hierarchical level-selection statistics is displayed to demonstrate the trade-offs inherent in selecting the final clustering result. Figure 3 shows both views.
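The sketch below shows one way a dendrogram like the one in Figure 3 can be built from fingerprints: naive agglomerative clustering on Tanimoto distance (1 minus Tanimoto similarity) with complete linkage. Both the similarity measure and the linkage rule are assumptions chosen for illustration; the text does not state which ones the module uses. Fingerprints are Python integers used as bit vectors, and all function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two integer bit vectors: |A & B| / |A | B|."""
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union


def complete_link(fps):
    """Naive agglomerative clustering (complete linkage, Tanimoto distance).

    Returns the merge history as (distance, merged_cluster) tuples in merge order;
    each cluster is a frozenset of indices into `fps`.
    """
    def dist(i, j):
        return 1.0 - tanimoto(fps[i], fps[j])

    clusters = [frozenset([i]) for i in range(len(fps))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters is the largest member distance.
            d = max(dist(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((d, merged))
    return history


def cut(n, history, threshold):
    """Clusters left after applying only the merges at distance <= threshold.

    Relies on complete-linkage merge distances never decreasing.
    """
    clusters = {frozenset([i]) for i in range(n)}
    for d, merged in history:
        if d > threshold:
            break
        # The two children are the current clusters contained in `merged`.
        clusters = {c for c in clusters if not c <= merged} | {merged}
    return clusters
```

Cutting the merge history at different distance thresholds gives different final clusterings; tabulating `len(cut(n, history, d))` for each merge distance `d` is a simple way to see the kind of trade-off that a level-selection plot visualizes. The naive search is roughly cubic in the number of structures, which is acceptable for a classroom-sized set but not for large libraries.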

Figure 2. Clustering module setup page with MarvinSketch

Figure 3. Interactive clustering dendrogram (left); level selection and ambiguity plot (right).

Sub-Structure Commonalities

Another way to characterize the set of input structures is to identify the sub-structural features that are common among them. We use ChemTattoo®, which is analogous to the Stigmata program [2], a public-domain contributed program that uses Daylight CIS fingerprints. ChemTattoo® finds the features, as defined by the MACCS 320 keys (or any predefined SMARTS-based key set), that are common among a group of similar structures; the resulting set of common features is known as a modal fingerprint. These features can then be displayed, and where they overlap, the frequency of the intersecting features can be enumerated and identified visually by coloring the atoms and bonds of the structure depictions in MarvinView. The criterion for what counts as a common feature can also be relaxed via a threshold. Figure 4 shows an example of ChemTattoo results visualized with MarvinView.
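A modal fingerprint with a relaxation threshold can be sketched as follows: a bit is kept if at least a given fraction of the input fingerprints have it set, so a threshold of 1.0 keeps only the features shared by every structure. The second function illustrates counting, per atom, how many common features cover it, which is the kind of number that atom coloring can convey. It assumes a hypothetical `key_matches` mapping from key index to matched atom indices, such as a substructure search might return; this is an illustration, not ChemTattoo’s implementation.

```python
from collections import Counter


def modal_fingerprint(fps, n_bits, threshold=1.0):
    """Keep bit i if at least `threshold` (a fraction in (0, 1]) of `fps` set it.

    threshold=1.0 keeps only features common to every structure.
    """
    if not fps:
        raise ValueError("need at least one fingerprint")
    modal = 0
    for i in range(n_bits):
        count = sum((fp >> i) & 1 for fp in fps)
        if count >= threshold * len(fps):
            modal |= 1 << i
    return modal


def atom_overlap_counts(modal, key_matches):
    """Count, per atom, how many modal-fingerprint keys cover it.

    `key_matches` is a hypothetical {key index: set of matched atom indices}
    mapping, e.g. from a substructure search; higher counts mark atoms where
    common features overlap.
    """
    counts = Counter()
    for key, atoms in key_matches.items():
        if (modal >> key) & 1:
            counts.update(atoms)
    return dict(counts)
```

With `fps = [0b1011, 0b0011, 0b0111]`, a threshold of 1.0 keeps only bits 0 and 1, while lowering it to 0.3 keeps all four bits, since bits 2 and 3 each occur in one of the three structures.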

Figure 4. ChemTattoo results with MarvinView display

Slides and Demo

The following slides summarize the description presented here and provide screen images of typical results obtained with this module. During the presentation, we also give a live demonstration of the module, with the web server and web client running on a laptop computer. While we envision providing a web server on the internet (or a university intranet), our client-server model does not require that configuration and can easily be adapted for use on a single computer.
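The client-server split described here does not depend on a remote host. As a sketch, assuming Python’s standard `http.server` in place of the CGI scripts the module actually uses, the handler below accepts a pasted SMILES list from a browser form field named `smiles` (a hypothetical name) and binds to the loopback address, so that browser and server can run on the same laptop. No chemical validation of the SMILES is performed.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_smiles_list(text):
    """Split pasted text into (smiles, name) pairs.

    One structure per line: SMILES first, optional name after whitespace.
    Blank lines are skipped; no chemical validation is done.
    """
    records = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue
        name = parts[1].strip() if len(parts) > 1 else ""
        records.append((parts[0], name))
    return records


class ModuleHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a CGI script."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        records = parse_smiles_list(form.get("smiles", [""])[0])
        body = f"Received {len(records)} structures\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve on the loopback interface only; blocks until interrupted."""
    with HTTPServer(("127.0.0.1", port), ModuleHandler) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and pointing a local browser form at `http://127.0.0.1:8000/` exercises the whole round trip on one machine; binding to a public interface instead is the only change needed to put the same handler on a network.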

Acknowledgement

Our research results are based upon work supported by the National Science Foundation Small Business Innovation Research (SBIR) Program under Grant No. 0450457. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

[1] J.L. Durant, B.A. Leland, D.R. Henry, and J.G. Nourse, "Reoptimization of MDL Keys for Use in Drug Discovery", Journal of Chemical Information and Computer Sciences, 42(6), 2002, 1273-1280.

[2] N.E. Shemetulskis, D. Weininger, C.J. Blankley, J.J. Yang, and C. Humblet, "Stigmata: An Algorithm To Determine Structural Commonalities in Diverse Datasets", Journal of Chemical Information and Computer Sciences, 36(4), 1996, 862-871.