Linking Biomolecular Structure and Function

LINKING BIOMOLECULAR STRUCTURE AND FUNCTION

THROUGH INTEGRATED COMPUTER AND PHYSICAL MODELING

MICHAEL H. PATRICK

Department of Medical Genetics, University of Wisconsin-Madison

TIM HERMAN

Center for BioMolecular Modeling, Milwaukee School of Engineering

Understanding biology at the molecular level is, for many students, a daunting challenge because it is abstract and not tangible: they are asked to make inferences about systems with which they have no experience and to provide answers to questions they have never asked. We have developed an inquiry-driven approach to help make the molecular world real and relevant to students, including those whose interest and direction may lie outside the sciences. At the core of this approach is the integrated use of computer visualization software and unique, 3-dimensional physical models of proteins, nucleic acids, and other biomolecules created by state-of-the-art rapid prototyping technologies, based on atomic coordinates of solved structures. These are used to make predictions about structure-function relationships that can be tested experimentally. A multiyear research project is in progress to evaluate the efficacy of this approach at the college and pre-college levels.

Going from the observable to the unobservable

Science involves constructing an understanding of something that is unknown; people who become scientists are attracted by that. For many students, however, the study of science can be frustrating and incomprehensible; some may even find it to be a useless exercise utterly irrelevant to their lives. This is exacerbated by the learning mode in many classrooms: short-term memorization of facts, facts that students fail to connect with fundamental concepts or with how their world works. Indeed, it has been suggested that one of the problems with our educational system is that we reward students for providing answers to questions they have never asked.

The similarity between learning science and doing science is that both involve constructing understanding and new knowledge based on something already known; and, for both, a certain amount of abstraction is involved. Abstract reasoning seems easier, however, if it is based upon past experiences and/or familiar objects or phenomena. Consider, for example, the flow of genetic information from genotype to phenotype, a fundamental concept in biology. This concept is central to understanding the basis for both the continuity and diversity of species and to understanding modern biotechnology. At the level of the organism, this flow of genetic information is readily observed and understood through classical means of inferring genotypes from observable traits. In the classroom, there are several outstanding activities using plants, animals, and microbes that promote constructing this empirical understanding. Moreover, structure-function relationships, while sometimes subtle, are nevertheless directly observable at the level of organisms because the relevant anatomy and physiology are directly observable and comparable among species.

To most students, however, abstraction involving the molecular world is difficult to impossible, because this world is not “real”: it is not related to things they already know or could infer from experience and observation. If it is not “real”, it is not surprising that it seems irrelevant to them and, to many, arcane if not magic. Moreover, it is usually presented in a language, chemistry, in which students are usually not at all fluent. The result is often that connecting molecular structure with function, and therefore understanding what might be called the “molecular logic of life”, does not occur.

This problem is equally serious at both the precollege and the undergraduate levels. According to a major study conducted by the National Institute for Science Education at UW-Madison, "Introductory college-level science, mathematics, engineering, and technology courses act as curriculum 'pressure points': they shape student career trajectories, influence science literacy, and promote equity."[1] One of the problems, however, is that these introductory-level courses usually do not connect the science being taught with other sciences or with specific career goals. And in upper-level courses, instructors often engage their students in the same laboratory exercises they experienced as undergraduates [2].

Because of the revolution experienced in the biological sciences in the last half of this century, the interface between the biological and the physical sciences has blurred. Hybrid fields have emerged as biological problems under investigation are studied with thinking and tools that are drawn from chemistry, physics, mathematics and computer science. However, many undergraduate courses in biology do not reflect this change. This is due, in part, to an inadequate background of students in the physical sciences, exacerbated by insufficient connections between biology and the physical sciences. This problem in turn resonates with inadequate learning materials and opportunities for students to interact with them (e.g. laboratories which are not only anachronistic in their content, but also cookbook in their format).

Pedagogical approaches that address some of these problems have been identified by NISE and others [3], including projects funded by the NSF Systemic Chemistry Initiative [4]. They include the use of:

student-focused active learning
guided inquiry/open-ended laboratories
interdisciplinary connections
topic-oriented approach
information technology

Molecular structure and function and how we teach it

One of the great epiphanies of 20th century science was discovering how a molecule’s function arises from its structure, and realizing how information is conveyed through molecular shape. This is at the core of molecular and cell biology and modern biotechnology. Seeing the 3-dimensional structure of a biological molecule for the first time can be a revelation for the scientist and the student. The Watson-Crick DNA structure, for example, offers immediate insight into how the molecule copies itself and thus, how heredity works. Armed with equal insight regarding protein structure, students might more easily appreciate the flow of genetic information at the molecular level, as codified by Francis Crick in his famous Central Dogma: how the information encoded digitally in a one-dimensional chemical array in DNA becomes structural, 3-dimensional information in the form of a protein that carries out some function necessary for life.

At the same time that our understanding of the molecular world has experienced tremendous growth, our methods for bringing this understanding to our students have not. We still rely almost entirely on static, two-dimensional drawings of proteins and protein active sites to present this material to students. A survey of popular undergraduate biochemistry texts, for example, reveals many similarities in their treatment of the enzyme lysozyme. Two texts used a well-known drawing of this enzyme with its bound substrate. In one text, this figure is accompanied by the paragraph [5]:

In another textbook, a computer rendering of a "space-filled" view of lysozyme is shown [6].

While in both cases the text that accompanies these figures is clear, precise and accurate, it is nevertheless unrealistic to think that students can truly appreciate the significance of these statements while viewing these two-dimensional figures.

Modeling molecular structure

Until recently, graphical displays of the type shown above were all that was available. However, new technology now allows this information to be conveyed in a more exciting and meaningful way. To understand the functional consequences of a three-dimensional structure, students must be provided with a three-dimensional model of that structure to analyze. The most readily available model is that generated by various computer graphics programs. Although the image that is created on the computer screen is in fact two-dimensional, various shading, depth cueing and kinetic depth effects can produce an image that takes on three-dimensional character as soon as the user begins to rotate the model on the screen. Although these computer visualization programs were originally developed for UNIX-based computer workstations, public-domain versions of this software now exist that can be downloaded from the Internet and used free of charge (e.g. RasMol, Protein Explorer, Swiss PDB Viewer). They are powerful programs that will run on any standard desktop computer (PC or Mac).

An even more engaging kind of three-dimensional protein model that can be used to enhance a molecular structure/function curriculum is a physical model. Physical models have several advantages over computer-generated models, especially in the initial phase of a "guided-inquiry approach" in which students are encouraged to think about a molecular structure and formulate questions that they would like to explore further in the laboratory or with the aid of a molecular visualization computer program.

Until recently, physical models of proteins and other molecular structures were not readily available to educators. However, with the creation of the Center for BioMolecular Modeling, it is now possible to apply rapid prototyping technology to produce accurate three-dimensional models of any protein whose structure has been determined (and whose coordinates have been deposited in the Protein Data Bank). A variety of different rapid prototyping technologies can be used to construct these models at scales ranging from 0.1 to 1.0 Å per cm, and in several formats (alpha carbon backbone, ball and stick, space-filled, and surface models showing electrostatic potentials). It is also possible to "zoom in" on a small region of a protein to produce an accurate model of the alpha carbon backbone and amino acid side chains that constitute an enzyme active site.

Physical modeling: background

The field of structural biology has enjoyed phenomenal growth and development over the past three decades. By 1970, only 11 protein structures had been solved. These initial structures were of proteins, like hemoglobin, that were easily purified in large amounts and then conveniently crystallized. Since that time, advances in molecular biology and protein crystallization, along with improvements in computer hardware and software, have transformed this field from a small fraternity of scientists working on a few obscure problems into a mainstream approach that impacts the work of virtually all biological scientists. The number of protein structures solved during this time has been increasing at an exponential rate, from 15-20 structures per year in the 1980s to approximately 100 per month in 1994. The Protein Data Bank is projected to contain over 30,000 structures by the year 2004.

The construction of a model representing the electron density of a protein has always been a critical exercise in any protein structure determination. Initially, this was done by tracing electron density contour maps onto thin sheets of Plexiglas that were then stacked up to give the initial impression of a three-dimensional model. This approach was later refined in the use of a “Richards box”, in which the protein model was constructed from appropriately scaled parts by fitting them into the electron density that was observed through the use of a half-silvered mirror that reflected the model into the density. Still later, a device invented by Byron Rubin (“Byron’s Bender”) made it possible to bend steel wire into a backbone model of a protein, using the phi/psi angles of each alpha carbon atom.

The development in the late 1970s of FRODO, the first software package to generate three-dimensional computer models of proteins from electron density data, eliminated in large part the need for crystallographers to continue the laborious and painstaking job of physical model construction. In subsequent years, these software packages have been enhanced and refined, along with dramatically increased computer power, resulting in the molecular visualization programs briefly described above.

The preceding description of the development of structural biology and molecular visualization over the past 30 years pertains primarily to practicing structural biologists. However, as structural biology has become a mainstream approach in virtually all the biosciences, a much larger group of biochemists and molecular biologists is now experiencing the need to visualize and analyze these new structures. It is with this latter group of biological scientists that physical models are of value. Physical models accurately portray not only the alpha carbon backbone of the protein, but also critical amino acid side chains as well as important bound substrates or inhibitors. The physical model requires neither a molecular graphics workstation nor personnel experienced in using such systems.

Criteria that make physical models appealing to researchers are also ones that make them useful to educators:

First, the physical model is both tactile and interactive. It can be viewed from the top, bottom and side in quick succession, faster than even the most adept computer graphics user. In other words, it is the ideal portable graphical display.
Second, unlike the computer-generated model, the physical model is always “on”. It is “on” as it sits on the lab bench, where it can be used for impromptu discussions of new experiments, demonstrations to visitors, and by teachers and students who just want to muse and speculate about a molecular structure.

At the same time that structural biology was experiencing a period of rapid growth and development, a similar phenomenon was occurring in an engineering discipline known as Solid Freeform Fabrication. As in structural biology, new developments in both software and hardware combined to revolutionize this area of manufacturing. Computer-assisted design (CAD) software was developed to allow engineers to quickly and accurately design three-dimensional objects in the computer environment. At the same time, rapid prototyping technologies were developed which used the output from the CAD software packages to drive equipment that rapidly constructed a physical model of the part. Today, five different prototyping technologies are widely used in the design and manufacture of everything from automobile engines to soda bottles and child-proof caps for prescription drug bottles.

How we design and build physical molecular models

All of the available rapid prototyping technologies have been employed in the construction of different molecular models. For more information on the models, the technology and how they are used in research and education, please visit the web site for the Center for BioMolecular Modeling.

Building a model of any molecule requires a file containing the x-ray crystallographic data that describe the x, y, z coordinates of every atom in the molecule. For molecules whose structure has been determined, these data are stored as pdb files at the Protein Data Bank, the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data. These files are freely accessible for downloading and displaying on any desktop computer using molecular visualization software such as RasMol, Chime, or Protein Explorer, which are also freely available.
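To make the idea of a coordinate file concrete, here is a minimal Python sketch that reads the x, y, z position of each atom from ATOM records laid out in the fixed columns of the PDB format. It is an illustration only, not the software used at the Center. The function names, the helper that writes sample records, and the coordinates themselves are all assumptions made for this example.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA" for an alpha carbon
    residue: str   # three-letter residue name
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # coordinates in angstroms
    y: float
    z: float


def parse_atoms(pdb_text):
    """Read ATOM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip HEADER, REMARK, HETATM, END and other records
        atoms.append(Atom(
            name=line[12:16].strip(),
            residue=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def make_atom_line(serial, name, residue, chain, res_seq, x, y, z):
    """Hypothetical helper: lay out one ATOM record in the same columns."""
    return (f"ATOM  {serial:>5d} {name:<4s} {residue:>3s} {chain}{res_seq:>4d}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates for one residue; not taken from any real entry.
sample = "\n".join([
    "HEADER    ILLUSTRATIVE COORDINATES ONLY",
    make_atom_line(1, "N", "VAL", "A", 1, 6.204, 16.869, 4.854),
    make_atom_line(2, "CA", "VAL", "A", 1, 6.913, 17.759, 4.607),
    make_atom_line(3, "C", "VAL", "A", 1, 8.504, 17.378, 4.797),
    "END",
])

atoms = parse_atoms(sample)
for atom in atoms:
    print(atom.name, atom.residue, atom.res_seq, atom.x, atom.y, atom.z)
```

Reading by column position rather than splitting on whitespace is a deliberate choice here, since neighboring fields in a pdb file can run together without a separating space.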

The next step is to convert the pdb file into a “build” file that translates the atomic coordinates into a set of instructions that directs a rapid prototyping machine to create a macroscopic physical model: in effect, a kind of computer-assisted design (CAD) file. This is done in several different ways depending on the type of representation of the structure we wish to model. Shown below are three RasMol images of the protein, beta globin, each of which can be converted into a physical model:

a backbone representation, which shows the orientation of the amino acid backbone (peptide bond-alpha carbon-peptide bond) in space in order to see regions of secondary structure (in this case, alpha helices),
a stick representation, which adds stick-like representation of all atoms in the backbone and side chains, using CPK coloring, and finally
a space-filled representation, showing the van der Waals radii of all atoms in the protein.

To make a backbone model, we scale up the atomic dimensions to the macroscopic level and eliminate all atoms but the alpha carbons. We then connect these together by “pipes” to make a continuous zig-zag solid structure, the vertices of which are the alpha carbon atoms. The stick model is produced in a similar fashion, with the addition of open-ended pipes representing the atoms of the side chains. Since the usual rapid prototyping machines do not build in color, post-build painting of the models by hand is necessary. Finally, a much different-looking type of model is created by representing each atom by a sphere proportional to its van der Waals radius. The result is a “space-fill” model in which any given atom is in contact with several others.
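The backbone construction can be sketched in a few lines of Python: keep only the alpha carbons, multiply their coordinates by a scale factor, and join consecutive alpha carbons with a pipe. This is a hedged illustration rather than the actual build-file software. The function names, the scale factor of 0.5 cm per angstrom, and the sample coordinates are assumptions chosen for the example.

```python
import math


def backbone_pipes(atoms, cm_per_angstrom):
    """Return scaled alpha-carbon vertices and the pipes joining them.

    atoms: iterable of (atom_name, x, y, z) in angstroms, in chain order.
    cm_per_angstrom: assumed model scale (model centimeters per angstrom).
    """
    vertices = [
        (x * cm_per_angstrom, y * cm_per_angstrom, z * cm_per_angstrom)
        for name, x, y, z in atoms
        if name == "CA"  # discard every atom except the alpha carbons
    ]
    # Each pipe runs from one alpha carbon to the next one along the chain.
    pipes = list(zip(vertices, vertices[1:]))
    return vertices, pipes


def pipe_length(pipe):
    """Length of one pipe in model centimeters."""
    start, end = pipe
    return math.dist(start, end)


# Made-up coordinates for three residues, spaced 3.8 angstroms apart,
# the typical distance between neighboring alpha carbons.
sample_atoms = [
    ("N", -1.2, 0.8, 0.0), ("CA", 0.0, 0.0, 0.0), ("C", 1.3, 0.8, 0.0),
    ("N", 2.5, 0.2, 0.0), ("CA", 3.8, 0.0, 0.0), ("C", 3.9, 1.5, 0.0),
    ("N", 3.7, 2.6, 0.0), ("CA", 3.8, 3.8, 0.0), ("C", 5.2, 4.3, 0.0),
]

vertices, pipes = backbone_pipes(sample_atoms, cm_per_angstrom=0.5)
for pipe in pipes:
    print(f"pipe of length {pipe_length(pipe):.2f} cm")
```

A stick model would extend the same idea by adding pipes for side-chain bonds, and a space-fill model would replace the pipes with one scaled sphere per atom.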

The recent acquisition of a fifth rapid prototyping machine, the Z Corp model 402 color printer, allows us to incorporate color directly into the build process. This is the technology used to create the 3D Translation Kit. The models shown in the photos of the Kit not only demonstrate the color but also show representations different from those described above, using proprietary software designed for the Center for BioMolecular Modeling by Roger Sayles.

For example, instead of representing the backbone as small zig-zag pipes, we can assign an arbitrary radius to the alpha carbon such that the polypeptide backbone is a “string of beads”. For nucleic acids, a similar operation is carried out by assigning an arbitrary radius to the phosphate in order to create a linear chain of nucleotides.
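A short Python sketch shows what assigning an arbitrary radius involves. One bead is placed on each alpha carbon (or on each phosphorus atom for a nucleic acid), and the chain holds together as a single solid object only if neighboring beads touch or overlap. The names, the atom-name table, and the radii below are assumptions for illustration, not the Center's software.

```python
import math

# Assumed atom names marking one bead per residue or nucleotide.
BEAD_ATOM = {"protein": "CA", "nucleic": "P"}


def bead_chain(atoms, kind, bead_radius):
    """Return one (center, radius) bead per selected atom, in chain order.

    atoms: iterable of (atom_name, x, y, z) in angstroms.
    kind: "protein" or "nucleic".
    bead_radius: the arbitrary radius, in angstroms, given to every bead.
    """
    wanted = BEAD_ATOM[kind]
    return [((x, y, z), bead_radius)
            for name, x, y, z in atoms if name == wanted]


def beads_touch(beads):
    """True if every bead touches or overlaps its neighbor along the chain."""
    return all(
        math.dist(c1, c2) <= r1 + r2
        for (c1, r1), (c2, r2) in zip(beads, beads[1:])
    )


# Made-up alpha carbons spaced 3.8 angstroms apart.
sample_atoms = [("CA", 0.0, 0.0, 0.0), ("C", 1.3, 0.8, 0.0),
                ("CA", 3.8, 0.0, 0.0), ("CA", 3.8, 3.8, 0.0)]

print(beads_touch(bead_chain(sample_atoms, "protein", 2.0)))  # beads overlap
print(beads_touch(bead_chain(sample_atoms, "protein", 1.0)))  # gaps remain
```

With alpha carbons roughly 3.8 Å apart, the bead radius has to be at least about 1.9 Å for the string of beads to form a continuous object.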

A somewhat similar-looking model is built by yet a different prototyping technique to create a “surface model”. Here, we use a program that essentially rolls a ball of arbitrary diameter across the entire surface of the molecule. The representation is that of peaks and valleys, essentially a physical topographic map of the molecule’s surface. We are currently developing software that will color the peaks according to their electrostatic potential, thereby showing where concentrations of positive and negative charges are located.
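The rolling-ball idea can be approximated with a simple point-sampling scheme, in the spirit of the Shrake-Rupley method. This is a stand-in technique, not the program described above. Candidate positions for the center of the ball are sampled around each atom, and only those where the ball does not collide with any other atom are kept. The function names, the radii (1.7 Å for the atoms, 1.4 Å for the ball), and the number of sample points are assumptions for illustration.

```python
import math


def sphere_points(n):
    """Roughly evenly spaced points on a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((ring * math.cos(theta), ring * math.sin(theta), z))
    return points


def probe_centers(atoms, probe_radius, n_points=200):
    """Positions where the center of the rolling ball can sit.

    atoms: list of (x, y, z, radius) in angstroms.
    A candidate position is kept only if the ball, touching one atom,
    does not overlap any other atom.
    """
    unit = sphere_points(n_points)
    allowed = []
    for i, (ax, ay, az, radius) in enumerate(atoms):
        reach = radius + probe_radius
        for ux, uy, uz in unit:
            p = (ax + reach * ux, ay + reach * uy, az + reach * uz)
            blocked = any(
                math.dist(p, (bx, by, bz)) < other_radius + probe_radius
                for j, (bx, by, bz, other_radius) in enumerate(atoms)
                if j != i
            )
            if not blocked:
                allowed.append(p)
    return allowed


# Two overlapping atoms with assumed radii of 1.7 angstroms.
one_atom = [(0.0, 0.0, 0.0, 1.7)]
two_atoms = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]

print(len(probe_centers(one_atom, probe_radius=1.4)))
print(len(probe_centers(two_atoms, probe_radius=1.4)))
```

The points that survive trace the path of the ball's center. A larger ball bridges narrow crevices and gives a smoother surface, while a smaller one follows the valleys more closely.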