Analyser and viewer of protein inter-residue contacts

V. Bojović, B. Lučić2, K. Skala1, I. Grubišić1

1 Centre for Informatics and Computing, 2 NMR Center

Ruđer Bošković Institute

Bijenička cesta 54, Zagreb, HR-10000, Croatia

Email:

This service was developed to provide more efficient insight into the nature of hydrophobic and hydrophilic forces in proteins. It visualizes the details and specificity of contacts between amino acid residues in proteins. In addition, contributions of inter-residue contacts can be visualized according to selected physical and chemical properties of amino acid residues (the current version is based on the hydrophobicity scale introduced by J. Kyte and R. F. Doolittle, J. Mol. Biol., vol. 157, p. 105-132, 1982).

When one amino acid residue is chosen, the distances between its side chain and the side chains of all amino acid residues within a sphere of the chosen radius can be visualized.

The current version of the database included in the server contains only 100 proteins crucial for research on modeling of protein folding rates, together with some antimicrobial polypeptides, but it will be extended with new sequences. Visualization is available in PNG, POV-Ray and VRML formats, and output in PDB format is also available.

Application can be found at

I. Introduction

The Protein Data Bank (PDB) archive is the single worldwide repository containing information about sequences, 3D structures and function of a large number of biological molecules, including proteins and nucleic acids [1]. The archive data are stored in precisely formatted files available for download.

To gain more efficient insight into the nature of hydrophobic interactions in proteins, we constructed a database containing proteins for which folding rates have been experimentally determined, initially based on the study by Huang et al. [2]. It is optimized for fast searching through 3D space, with only specific atoms included in the search procedure.

II. Implementation

A. Protein structural data extraction from PDB

The first step in the development of such a system was the creation of a relational model of the PDB database.

When visualizing PDB data, only the parts of the PDB file that describe the tertiary structure are needed by the visualization program.

The coordinate section of PDB files contains the information relevant for visualization. The most relevant records for this work are “MODEL”, “ENDMDL”, “ATOM” and “TER” [3].

The “MODEL” record specifies the model serial number when multiple models of the same structure are presented in a single coordinate entry, as often happens with structures determined by NMR. In that case each model starts with the reserved word “MODEL”, which includes the model number, and ends with the “ENDMDL” record [10]. For visualization purposes, we use only the first model. The “ATOM” records present atomic coordinates for standard amino acids and nucleotides. The “ATOM” record format includes the atom serial number, atom name, residue name, chain identifier, residue sequence number, coordinates in angstroms, occupancy, temperature factor, element symbol and charge. Sometimes non-polymer residues are included in “ATOM” records if they are located close to the polymer chain. These non-polymer residues are not included in this version of the relational database model.
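The record handling described above can be illustrated with a short Python sketch. It is not the Perl loader used by the service; it only assumes the standard fixed-column layout of PDB “ATOM” records, and the function name and dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_first_model_atoms(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect the ATOM records of the first model from PDB-format lines.

    Illustrative sketch only: column slices follow the fixed-width PDB
    coordinate format; names and keys are not taken from the service.
    """
    atoms: List[Dict[str, object]] = []
    for line in lines:
        record = line[0:6].strip()
        if record == "ENDMDL":
            break  # stop after the first model (e.g. of an NMR ensemble)
        if record != "ATOM":
            continue  # skip MODEL, TER, HETATM and header records
        atoms.append({
            "serial": int(line[6:11]),
            "atom_name": line[12:16].strip(),
            "res_name": line[17:20].strip(),
            "chain": line[21],
            "res_seq": int(line[22:26]),
            "x": float(line[30:38]),
            "y": float(line[38:46]),
            "z": float(line[46:54]),
            "occupancy": float(line[54:60]),
            "temp_factor": float(line[60:66]),
            "element": line[76:78].strip(),  # empty if the line is short
        })
    return atoms
```

Files without “MODEL” records contain no “ENDMDL” line, so in that case all “ATOM” records are read.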

During the search through the structure some atoms are not considered, because we search only for contacts between side chains. This means that the backbone will not be visible.

To make searching through PDB files easier, we created a relational database model using the PostgreSQL database engine, which runs on a Debian GNU/Linux server where our application is hosted and available at [9]. A simplified schema of the database, shown in Fig. 1, has five relations represented by gray blocks. All relations are populated using our software written in Perl.

B. Database schema

Fig 1. Database schema

“TITLE” records from PDB files and PDB codes are stored in the “Proteins” relation. Each protein has one or more primary structures (Fig. 1), which are stored in the “Primary structure” relation; chain identifiers of “HETATM” records are not included. Of all atoms in the ATOM section of the PDB files, only those belonging to amino acids are stored in the database; other residue types are omitted.

The “Tertiary structure” relation uses the ID from the “Primary structure” relation as a foreign key. The tertiary structure table contains a flag that marks which atoms of a residue belong to the side chain, which speeds up searching. All information about coloring methods and atomic covalent radii is also stored there.
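How such a side-chain flag could be computed when the table is populated is sketched below in Python. The set of backbone atom names (N, CA, C, O and the terminal OXT) is an assumption based on standard PDB atom naming, not a detail taken from the loader; the special case for glycine follows the convention described in the web application section. The function name is hypothetical.

```python
# Assumption: standard PDB names of backbone heavy atoms.
BACKBONE_ATOMS = frozenset({"N", "CA", "C", "O", "OXT"})


def is_side_chain(atom_name: str, res_name: str) -> bool:
    """Return True if the atom should be flagged as a side-chain atom.

    Glycine has no heavy side-chain atoms, so its C-alpha atom stands in
    for the side chain; all other glycine atoms are treated as backbone.
    """
    name = atom_name.strip().upper()
    if res_name.strip().upper() == "GLY":
        return name == "CA"
    return name not in BACKBONE_ATOMS
```

Storing the result as a column means the distance query can filter on a single boolean instead of comparing atom names at search time.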

The “Connections” relation is used to draw covalent chemical bonds between atoms in the “Tertiary structure” relation, using atomic radii and het_dictionary data. It also contains the coordinates for VRML output, to speed up the search process.
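One common way to derive bonds from atomic radii is a distance test: two atoms are taken as bonded when their separation does not exceed the sum of their covalent radii plus a small tolerance. The Python sketch below illustrates that heuristic only. The radii are approximate values, the 0.4 Å tolerance is an assumption, the use of het_dictionary data is not reproduced, and all names are hypothetical.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative values only).
COVALENT_RADII = {"H": 0.37, "C": 0.77, "N": 0.75, "O": 0.73, "S": 1.02}
BOND_TOLERANCE = 0.4  # assumed slack in angstroms, not from the paper


def find_bonds(atoms, tolerance=BOND_TOLERANCE):
    """Return index pairs (i, j), i < j, of atoms close enough to be bonded.

    Each atom is a dict with an "element" symbol and an "xyz" tuple.
    Elements missing from COVALENT_RADII raise KeyError.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        limit = COVALENT_RADII[a["element"]] + COVALENT_RADII[b["element"]] + tolerance
        if math.dist(a["xyz"], b["xyz"]) <= limit:
            bonds.append((i, j))
    return bonds
```

The pairwise loop is quadratic in the number of atoms, which is acceptable when bonds are computed once and stored, as the “Connections” relation does.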

The “Results” relation stores temporary search results to speed up the search process, because once the search of the contact space is done, atom connections within the chosen radii are also included in the final picture.
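The five relations described above can be summarized in a small schema sketch. The service uses PostgreSQL; the Python example below uses the standard-library sqlite3 module only so that it runs without a database server. All table and column names are hypothetical, and the color and VRML columns mentioned in the text are omitted.

```python
import sqlite3

# Hypothetical schema; names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE proteins (
    id INTEGER PRIMARY KEY,
    pdb_code TEXT NOT NULL UNIQUE,
    title TEXT
);
CREATE TABLE primary_structure (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES proteins(id),
    chain_id TEXT NOT NULL,
    sequence TEXT NOT NULL
);
CREATE TABLE tertiary_structure (
    id INTEGER PRIMARY KEY,
    primary_structure_id INTEGER NOT NULL REFERENCES primary_structure(id),
    res_seq INTEGER NOT NULL,
    res_name TEXT NOT NULL,
    atom_name TEXT NOT NULL,
    x REAL NOT NULL,
    y REAL NOT NULL,
    z REAL NOT NULL,
    temp_factor REAL,
    covalent_radius REAL,
    is_side_chain INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE connections (
    id INTEGER PRIMARY KEY,
    atom_a_id INTEGER NOT NULL REFERENCES tertiary_structure(id),
    atom_b_id INTEGER NOT NULL REFERENCES tertiary_structure(id)
);
CREATE TABLE results (
    id INTEGER PRIMARY KEY,
    search_id TEXT NOT NULL,
    atom_id INTEGER NOT NULL REFERENCES tertiary_structure(id),
    distance REAL NOT NULL
);
-- The side-chain flag is indexed because every search filters on it.
CREATE INDEX idx_tertiary_side_chain
    ON tertiary_structure (primary_structure_id, is_side_chain);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the five illustrative relations in the given database."""
    conn.executescript(SCHEMA)
```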


C. Web application

The web application is written in PHP and JavaScript using the jQuery framework. The database used by this application is PostgreSQL.

Currently, the database contains data for only 100 proteins. Although small, this number is sufficient for development purposes and for specific analyses of the relationships between structural information and protein folding/unfolding rates. Once completed, the database is expected to grow rapidly.

After selecting the protein of interest on the main page by clicking the View link, the user can choose which amino acid residue of the chosen chain will be visualized, together with its relevant surrounding residues according to the selected options. These options are the inner and outer radius of the surrounding contact space, the color scheme and the drawing method.

The search through the protein structure works as follows: when a residue in the selected chain is chosen, the web application sends a query to the database and displays the results (the environment) of the selected amino acid residue.

The query considers residues from other chains as well as residues in the same chain whose sequence number differs from that of the selected residue by more than 1. This means that close neighbours (residues connected by a peptide bond to the selected residue) are excluded from the search. When calculating distances only side chain atoms are used, except in the case of glycine. Because the side chain of glycine is a single hydrogen atom, which is usually absent from PDB files, we use its C-alpha atom instead.
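The selection rule can be sketched in Python as follows. The service performs this step as a database query; the sketch works on an in-memory list instead. It assumes that an atom is kept when its smallest distance to any side-chain atom of the selected residue lies between the inner and outer radius, which is one possible reading of the distance criterion. The function name and dictionary keys are hypothetical.

```python
import math


def find_contact_atoms(atoms, chain, res_seq, inner, outer):
    """Return (atom, distance) pairs around the selected residue.

    Each atom is a dict with "chain", "res_seq", "side_chain" (bool) and
    "xyz" (tuple). Assumption: distance is the minimum over the selected
    residue's side-chain atoms.
    """
    selected = [a for a in atoms
                if a["chain"] == chain and a["res_seq"] == res_seq and a["side_chain"]]
    if not selected:
        return []
    hits = []
    for atom in atoms:
        if not atom["side_chain"]:
            continue  # backbone atoms never take part in the search
        if atom["chain"] == chain and abs(atom["res_seq"] - res_seq) <= 1:
            continue  # the residue itself and its peptide-bonded neighbours
        distance = min(math.dist(atom["xyz"], s["xyz"]) for s in selected)
        if inner <= distance <= outer:
            hits.append((atom, distance))
    return hits
```

Residues in other chains are never excluded by the sequence-number test, since equal residue numbers in different chains do not imply a peptide bond.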

When the search is finished, the data are exported in PDB format, in which only side chain atoms are stored. Files exported in the graphical formats (VRML and POV-Ray) also include connectivity data represented as cylinders. In the current version, the user can choose between three drawing methods and four coloring methods.

The available drawing methods are “Space fill”, “Surface” and “Nice bond” [4]. Each atom is represented by a single ball in all three drawing methods available in this version of the web software. The connectivity data are taken from the “Connections” relation and are represented as green cylinders. The “Surface” and “Nice bond” methods use blob spheres, while the “Space fill” method uses normal POV-Ray spheres.

Each coloring method in this software uses the “Tertiary structure” relation. The temperature factor coloring is based on normalized values of the temperature factor attribute.

The “Kyte-Doolittle” color scheme uses the hydropathy index [5], which is also normalized to the range between 0 and 1.
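A minimal Python sketch of such a normalization is given below. It assumes simple min-max scaling, which the text does not specify, and uses the published Kyte-Doolittle hydropathy values, which span -4.5 (arginine) to 4.5 (isoleucine). The function names are hypothetical; the same `normalize` helper could be applied to temperature factors with the minimum and maximum taken over the structure.

```python
# Kyte-Doolittle hydropathy index per residue (three-letter codes).
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def normalize(value: float, low: float, high: float) -> float:
    """Min-max scaling of value from [low, high] to [0, 1] (assumed method)."""
    return (value - low) / (high - low)


def hydropathy_color_value(res_name: str) -> float:
    """Map a residue to a value in [0, 1]; 0 is most hydrophilic."""
    low = min(KYTE_DOOLITTLE.values())
    high = max(KYTE_DOOLITTLE.values())
    return normalize(KYTE_DOOLITTLE[res_name.strip().upper()], low, high)
```

With this scaling, arginine maps to 0 and isoleucine to 1; how the normalized value is turned into an actual color is not covered here.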

In all drawing methods used in this application, every atom is displayed as a ball with its own radius [6].

The radius is multiplied by a factor of 2 for the “Nice bond” method, or by 5 for the “Surface” method. For the “Space fill” method the multiplication factor is constant and a different graphic object is used, as mentioned before.

Each color scheme is precomputed and stored in the “Tertiary structure” relation to speed up the extraction process.

After the application sends the query to the database, it fetches the query results and exports coordinates, radii and colors as POV-Ray objects, which are saved to files, automatically rendered to a picture and displayed to the user. VRML output is also provided for users who need another way of viewing the structure.
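The export step can be illustrated with a Python sketch that writes plain POV-Ray `sphere` and `cylinder` objects, roughly corresponding to the “Space fill” method with green bond cylinders. The blob-based “Surface” and “Nice bond” objects are not reproduced; only their radius factors of 2 and 5 appear. The “Space fill” factor is not given in the text, so the default of 1.0 is a placeholder, as is the cylinder radius, and all function names are hypothetical.

```python
# Radius multipliers stated in the text for the blob-based methods.
RADIUS_FACTOR = {"Nice bond": 2.0, "Surface": 5.0}


def scaled_radius(radius, method, space_fill_factor=1.0):
    """Scale an atomic radius for the drawing method.

    space_fill_factor is a placeholder; the paper gives no value for it.
    """
    if method == "Space fill":
        return radius * space_fill_factor
    return radius * RADIUS_FACTOR[method]


def povray_sphere(center, radius, rgb):
    """Format one atom as a POV-Ray sphere statement."""
    return ("sphere { <%.3f, %.3f, %.3f>, %.3f "
            "pigment { color rgb <%.3f, %.3f, %.3f> } }"
            % (center[0], center[1], center[2], radius, rgb[0], rgb[1], rgb[2]))


def povray_cylinder(start, end, radius=0.1, rgb=(0.0, 1.0, 0.0)):
    """Format one bond as a POV-Ray cylinder (green by default)."""
    return ("cylinder { <%.3f, %.3f, %.3f>, <%.3f, %.3f, %.3f>, %.3f "
            "pigment { color rgb <%.3f, %.3f, %.3f> } }"
            % (start[0], start[1], start[2], end[0], end[1], end[2],
               radius, rgb[0], rgb[1], rgb[2]))


def export_scene(atoms, bonds, method="Space fill"):
    """Return POV-Ray object statements for atoms and bonds.

    atoms: dicts with "xyz", "radius" and "rgb"; bonds: index pairs.
    """
    lines = [povray_sphere(a["xyz"], scaled_radius(a["radius"], method), a["rgb"])
             for a in atoms]
    lines += [povray_cylinder(atoms[i]["xyz"], atoms[j]["xyz"]) for i, j in bonds]
    return "\n".join(lines)
```

The returned text contains objects only; a renderable scene file would also need a camera and at least one light source.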

If the user needs textual output (with TAB-separated columns), it is available for download on the same page as the POV-Ray source, although we created this textual format for development purposes only.

III. Visual Illustration of Obtained Results

This section contains examples of visualized data. No comparison with other programs is made because we were not able to find any other visualization program that performs this kind of analysis. We found some software capable of displaying the surroundings of a selected residue, but none capable of displaying the side chains surrounding the side chain of a chosen residue.

Example pictures

The pictures show different coloring and drawing methods applied to the same protein. The structure is PDB entry 1l8w: Crystal Structure of Lyme Disease Variable Surface Antigen VlsE of Borrelia Burgdorferi [7].

In all pictures, the displayed atoms are those that satisfy the search distance criteria. All of them belong to side chains of surrounding residues, except the C-alpha atoms of glycine.

Covalent bonds are represented by green sticks, as mentioned before, so that their color differs from the blue used for the selected residue. Covalent bond sticks are included to keep the structure connected when printing on a 3D printer, and to give the user a better visual identification of structural details.

Fig 2. Surrounding of Glycine 60 from the chain A (PDB: 1l8w), within the distance from 2 to 12 Å, using the “Space fill” method and the “Kyte-Doolittle” color scheme.

Fig 3. Surrounding of Glycine 60 from the chain A (PDB: 1l8w), within the distance from 2 to 12 Å, using the “Nice bond” method and the “Atom color” scheme.

Fig 4. Surrounding of Glycine 60 from the chain A (PDB: 1l8w), within the distance from 2 to 10 Å, using the “Space fill” method and the “Chain color” scheme.

IV. Conclusion

This application provides more efficient insight into 3D inter-residue contacts. By clicking on a single residue on the web page, the user can visualize its relevant surrounding residues according to the selected options. Results are shown in PNG graphical format and can be downloaded freely in POV-Ray, VRML and PDB formats from the result page. Based on the visualizations obtained with this server, we will design and calculate structural parameters useful for describing the protein folding process in more detail than recently published ones such as contact order and contact maps [8]. More detailed insight into inter-residue 3D contacts should enable us to improve quantitative relationships between protein structure and important folding parameters, such as protein folding rates.

Acknowledgement

We thank Sanja Tomić, who provided us with valuable information on the data stored in the PDB database.

We would also like to note that this project was supported by Grants 098-0982562-2567 (Scientific Visualisation Methods) and 098-1770495-2919 (Developing methods for modeling properties of bioactive molecules and proteins) awarded by the Ministry of Science, Education and Sport of the Republic of Croatia.

References

[1] RCSB Protein Data Bank, “About the PDB Archive and the RCSB PDB” [online]. Available: [accessed 10 February 2009], 2008.

[2] J. T. Huang, J. Tian, “Amino acid sequence predicts folding rate of middle-size two-state proteins”, Proteins, vol. 63, p. 551-554, 2006.

[3] RCSB Protein Data Bank, “Coordinate Section” [online], [accessed 20 January 2010], 2010.

[4] D. Zucić, “Nice Bonds” [online], [accessed 25 January 2009], 2006.

[5] J. Kyte and R. F. Doolittle, “A Simple Method for Displaying the Hydropathic Character of a Protein”, J. Mol. Biol., vol. 157, p. 105-132, 1982.

[6] WebElements Periodic Table of the Elements, “Covalent radius” [online], [accessed 25 January 2010], 2010.

[7] C. Eicken, V. Sharma, T. Klabunde, M. B. Lawrenz, “Crystal Structure of Lyme Disease Variable Surface Antigen VlsE of Borrelia Burgdorferi”, J. Biol. Chem., vol. 277, p. 21691-21696, 2002.

[8] T. R. Weikl, “Loop-Closure Principles in Protein Folding”, Arch. Biochem. Biophys., vol. 489, p. 67-75, 2008.

[9] V. Bojović, “Analyser and viewer of protein inter-residue contacts” [online], [accessed 21 February 2010], 2009.

[10] V. Bojović, B. Lučić, K. Skala, “Protein Data Bank Graphics Generator on Grid”, MIPRO 2009, 32nd International Convention, p. 341, 2009.