Virtual Drug Screening stream Spring 2011

Lab: Virtual Screening GOLD 2

Objective

In this lab you will be screening a small molecule library (~1000 compounds) to search for ligands that may bind to human DHFR (Homo sapiens) and to its homolog the bacterial DHFR (Bacillus anthracis). You will compare the predicted binding modes and use Lipinski’s rule to determine if the molecules are drug-like.

Background

Identifying new drug leads using traditional methods is an expensive and time-consuming process. Pharmaceutical firms often look for drug leads by physically testing hundreds of thousands of chemical compounds in an assay system. If we know which protein we want a ligand to bind to, and we know what that protein looks like, computers can be used to model the binding of these compounds. If this computer modeling or “virtual screening” is accurate, it can greatly facilitate drug design.

We are using dihydrofolate reductase (DHFR) as the drug target for our virtual screening. We know that the anti-cancer drug methotrexate acts by binding to the enzyme dihydrofolate reductase and preventing the production of tetrahydrofolic acid. Because tetrahydrofolate is required by rapidly dividing cells to synthesize DNA precursors, inhibition of DHFR is useful in cancer therapy. However, it is also a good target for treatment of bacterial infection from the myriad different strains and species that are pathogenic. In particular, Anthrax has been a high profile target because of its potential use as a bioterrorist weapon. For this lab you will perform virtual screening of a library of small molecules to identify compounds that may inhibit the Anthrax version of DHFR and compare it to the human version. Ideally, you are looking for a compound that will inhibit the Anthrax version and not the human version so that it could be administered as a therapy to humans with minimal side effects.

The small molecules you will be screening are part of a collection of compounds available from ChemBridge (http://chembridge.com/chembridge/). This company provides libraries for both computational and wet lab based high-throughput screening. Any ‘hits’ identified using the computer can be ordered individually or in batches and tested in the lab. More than 5 million small molecules are potentially available and their online inventory is searchable by compound ID or chemical structure. You will use the program GOLD to identify molecules that may inhibit DHFR and then use the online database to obtain information on the hits that you identify.

But merely identifying small molecules that may bind to DHFR is not enough. You want to find compounds that have potential for becoming new drugs. This means that they need to be orally active and have good absorption, distribution, metabolism, and excretion (ADME) profiles. An orally active drug needs to be able to: be absorbed through the intestinal lining, be carried in blood (aqueous), and pass through the cell membrane into the cytosol. One method of assessing a compound’s ability to be useful as a drug is Lipinski’s Rule of Five.

Lipinski’s Rule of Five

Lipinski’s Rule of Five is an empirically derived rule of thumb to evaluate if a small molecule is likely to be orally active as a drug (drug-likeness). It states that most orally active drugs have been shown to possess similar molecular properties and do not violate more than one of the following criteria:

1.  No more than 5 hydrogen bond donors.

2.  No more than 10 hydrogen bond acceptors.

3.  Molecular weight less than 500 g/mol.

4.  Partition coefficient (logP) less than 5.

You should be familiar with the concepts of hydrogen bond donors and acceptors and molecular weight. The last property is a measure of how soluble the compound is in water and lipids (does it partition to the aqueous or non-aqueous phase?). The partition coefficient plays a major role in determining where drugs are distributed within the body after absorption and in how rapidly they are metabolized and excreted. A drug must be hydrophobic enough to pass through the lipid bilayer, but hydrophilic enough to be transported in the blood and to be distributed to the cytosol or whatever the target site may be.
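As a minimal sketch, the rule as stated above can be written as a small Python check. The function names and the example property values are hypothetical and are used only for illustration; the real values for your hits come from the online database.

```python
def lipinski_violations(h_donors, h_acceptors, mol_weight, logp):
    """Count how many of the four criteria listed above a compound violates."""
    checks = [
        h_donors > 5,        # more than 5 hydrogen bond donors
        h_acceptors > 10,    # more than 10 hydrogen bond acceptors
        mol_weight >= 500,   # molecular weight not less than 500 g/mol
        logp >= 5,           # partition coefficient (logP) not less than 5
    ]
    return sum(checks)


def is_drug_like(h_donors, h_acceptors, mol_weight, logp):
    """A compound passes if it violates no more than one criterion."""
    return lipinski_violations(h_donors, h_acceptors, mol_weight, logp) <= 1


# Hypothetical property values, for illustration only
print(is_drug_like(h_donors=2, h_acceptors=6, mol_weight=342.4, logp=3.1))
print(is_drug_like(h_donors=7, h_acceptors=12, mol_weight=610.0, logp=2.0))
```

The first example violates none of the criteria and passes; the second violates three and fails.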

GOLD LESSON 2

This week you will be screening a library of ~1000 compounds to identify small molecules that are predicted to bind to human DHFR (PDB code: 1u72) and to bacterial DHFR (PDB code: 3dat). To make the results both rapid and accurate, the screening will be carried out in 2 separate runs.

The GOLD jobs that you ran last week were performed in a fast mode, which takes less computational time but is also less accurate. This week we want more accuracy but still want to keep computation time low, so we will use the following strategy. The initial run will be done on the whole library of 1000 compounds using a fast but lower accuracy approach. The top 15% (150 compounds) will be output from the first run and will then be submitted to a more rigorous docking experiment. Then the top 10% (15 compounds) of these will be saved for the analysis.
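The arithmetic behind this two-stage funnel can be checked with a short Python sketch. The function name is hypothetical; the percentages are the ones given above.

```python
def funnel(library_size, percents):
    """Return how many compounds survive each stage of a screening funnel."""
    kept = []
    current = library_size
    for percent in percents:
        current = current * percent // 100  # integer arithmetic avoids rounding surprises
        kept.append(current)
    return kept


# 1000 compounds, keep the top 15% after run 1, then the top 10% after run 2
print(funnel(1000, [15, 10]))
```

The first number is the value you will later enter for save_best_ligands in the first run.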

Note: you will be running 4 jobs for this lab. Start the first two today because they will take overnight to complete. Then come back the next day or two to run the last two jobs (they will take 3-5 hrs). Then you will need to come back and analyze the results and make PyMol images.

Timeline

Day 1: Setup of first runs = 1 hour – 1.5 hrs

Day 2 or 3: Setup of second runs = 30 min – 1 hour

Day 3 or later: Analysis = several hours

GENERATING A HYPOTHESIS

Structural Comparison: In order to compare the two proteins that we will dock into, use WinSCP or Secure Shell (SSH) to log on to ddfe.cm.utexas.edu, transfer the files over to your local computer, and open them both in the same PyMol window.

protein_1u72.mol2

protein_3dat.mol2

Do they look particularly different? It may help to show them as cartoon. What is the RMS difference between the two when you align them?
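For reference, the RMS difference is the root-mean-square distance between matched atom pairs after the two structures have been superimposed. The following is a minimal Python sketch of that formula only. It assumes the coordinates are already superimposed and paired up, which PyMol does for you during alignment; the function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # squared distance between one matched pair of atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two made-up atom pairs, each displaced by exactly 1 angstrom along x
print(rmsd([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]))
```

A value near zero means the matched atoms sit almost on top of each other; larger values mean the backbones diverge.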

BLASTP Pairwise Comparison: In order to better understand those residues within the active site involved in binding you can make a pairwise comparison of the primary amino acid sequences of these two proteins using the BLASTP program. To do this you will need the two sequence files in the /LabVirtualScreening2files directory (copy these to your local desktop):

1U72.fasta.txt

3dat.fasta.txt

·  Go to the site: http://blast.ncbi.nlm.nih.gov/

·  Click on Protein Blast

·  Check the ‘Align 2 or more sequences’ box

·  Paste the protein sequence of 1u72 in the top panel (or use the Upload File)

·  Do not include the top line that has the 1U72 title in it; paste just the sequence of amino acid letters

·  Paste the protein sequence of 3dat in the bottom panel (or use the Upload File)

·  Hit ‘Search’

·  Wait for it to give you the results (it should look similar to what is shown below)

·  Copy and paste the Alignments text of the pairwise alignment and save it for your report. You may have to adjust the left and right margins on your page.

·  It should look something like this:

>lcl|54413 unnamed protein product

Length=159

Score = 60.8 bits (146), Expect = 1e-14, Method: Compositional matrix adjust.

Identities = 51/183 (27%), Positives = 89/183 (48%), Gaps = 28/183 (15%)

Query 4 LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSIPEK 63

++ I A++ + IG +PW L + +F+R T V IMG+ TW SI

Sbjct 2 ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTLNKPV------IMGRHTWESI--- 50

Query 64 NRPLKG…

RPL G…

Sbjct 51 GRPLPG…
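To make the Identities and Gaps figures in the sample output less mysterious, here is a minimal Python sketch of how such counts can be taken from a pair of aligned strings. The function names and the short toy alignment are hypothetical. The percentages in the sample output above are consistent with rounding down to a whole number, so the sketch does the same; that is an inference from the sample, not a statement of how BLASTP is implemented.

```python
def alignment_counts(query, subject, gap="-"):
    """Return (identities, gaps, alignment length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    gaps = sum(1 for q, s in zip(query, subject) if gap in (q, s))
    return identities, gaps, len(query)


def whole_percent(count, length):
    """Percentage rounded down to a whole number."""
    return 100 * count // length


# Toy alignment, not taken from the DHFR sequences
print(alignment_counts("MKT-LV", "MRTALV"))

# Counts copied from the sample output above: 51, 89 and 28 out of 183
print(whole_percent(51, 183), whole_percent(89, 183), whole_percent(28, 183))
```

The last line reproduces the 27%, 48% and 15% shown in the sample output.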

Based on the structural comparison and the BLASTP observation – what is your hypothesis about the 1,000 ligands that we will dock into these two different structures?

VIRTUAL SCREENING RUNS:

In your DDFE directory, make a directory called LabVS2.

Then make two directories within LabVS2 called: human1u72 and bacterial3dat.

Editing HOSTS files

Move the gold.hosts file that you used in LabVS1 to the local computer (the left side of the window). Open it in WordPad and change it to run on 2 processors instead of 1. Each person will be on a different blade, which is represented by the ‘X’. With the help of a mentor, find an empty blade on the upper deck (#8-15) by using the command $ganglia cpu_user. Then enter this number in place of the ‘X’ in your gold.hosts file. Running each person on a different blade will prevent our jobs from running slowly. Change the two lines from:

compute-0-X.local 1

no_of_processes 1

To:

compute-0-X.local 2

no_of_processes 2

Now put a copy of the edited gold.hosts file into each of the directories (one in human1u72 and one in bacterial3dat). If a directory is missing this file, the job there will run slower!
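If you prefer to generate the two lines rather than type them, the following Python sketch builds the text of a gold.hosts file for a given blade. The function name is hypothetical, and the check on the blade number simply reflects the upper deck range (#8-15) mentioned above.

```python
def make_gold_hosts(blade, processes=2):
    """Return the two-line gold.hosts text for one blade."""
    if blade not in range(8, 16):
        # Assumption from this handout: only upper deck blades 8-15 are used
        raise ValueError("blade must be between 8 and 15")
    return (
        f"compute-0-{blade}.local {processes}\n"
        f"no_of_processes {processes}\n"
    )


# Example for blade 9; use the blade number your mentor helps you choose
print(make_gold_hosts(9), end="")
```

The same text would then be saved as gold.hosts in both human1u72 and bacterial3dat.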

****************************************************************************************************

1st Run for human1u72 (overnight)

Now, go to the /chem204/LabVirtualScreening2files directory and copy the following files to your new human1u72 directory (you may need to copy them to your local computer first, then back over to your directory on the DDFE): protein_1u72.mol2, mtxfor1u72.mol2.

You also need to get your gold2.conf from LabVS1. You will now need to rename this configuration file and make changes to it for Lab VS2. Remember to use a text editor like WordPad, not Microsoft Word, to edit the files.

Rename gold2.conf to gold1u72run1.conf and edit the following lines:

cavity_file = mtx.mol2 → cavity_file = mtxfor1u72.mol2

The ligand database will be much larger than last time. So, instead of having each of us copy it to our directory, it will be left in the /LabVirtualScreening2files/Library directory and we will all use the same file. We will reference the file’s location in the configuration file:

ligand_data_file /home/chem204/LabVirtualScreening2files/Library/CB-kin_UT_1.sdf 10

Change the bestranking name:

bestranking_list_filename = bestranking1u72run1.lst

Change the output name:

concatenated_output = GoutCBkin1u72run1.sdf

Change number of poses saved for each ligand:

clean_up_option save_top_n_solutions 1

You need to ADD A LINE to limit the number of ligands written to the output (if you don’t do this you will get a 1,000-ligand output file!). Put this line anywhere in the ‘SAVE OPTIONS’ section:

clean_up_option save_best_ligands 150
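The ‘name = value’ edits above can also be scripted. The following Python sketch rewrites only lines of that form and leaves everything else untouched; the function name is hypothetical, and the starting values shown for the last two settings are placeholders, not the real contents of gold2.conf. Lines written without an equals sign in this handout (ligand_data_file and the clean_up_option lines) are not handled by this sketch and would still be edited by hand.

```python
def update_settings(conf_text, changes):
    """Return conf_text with each 'name = value' line named in changes rewritten."""
    remaining = dict(changes)
    edited = []
    for line in conf_text.splitlines():
        name, sep, _old_value = line.partition("=")
        name = name.strip()
        if sep and name in remaining:
            edited.append(f"{name} = {remaining.pop(name)}")
        else:
            edited.append(line)  # leave every other line untouched
    if remaining:
        # A requested setting was never found; better to stop than guess.
        raise KeyError(f"settings not found: {sorted(remaining)}")
    return "\n".join(edited) + "\n"


# Hypothetical three-line excerpt; OLD_NAME values are placeholders
excerpt = (
    "cavity_file = mtx.mol2\n"
    "bestranking_list_filename = OLD_NAME.lst\n"
    "concatenated_output = OLD_NAME.sdf\n"
)
run1_changes = {
    "cavity_file": "mtxfor1u72.mol2",
    "bestranking_list_filename": "bestranking1u72run1.lst",
    "concatenated_output": "GoutCBkin1u72run1.sdf",
}
print(update_settings(excerpt, run1_changes), end="")
```

Raising an error when a setting is missing is a deliberate choice here: a silently skipped edit would let the job run with the wrong cavity or output file.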

Now save your edited file and use SFTP to send it back to your directory on the cluster. There should be 4 files present.

Have a mentor check your files.

You can now start the first GOLD run by opening a terminal window and navigating to the human1u72 directory. Use the commands ‘ls’ and ‘cd’ to get around. Then type:

goldremoteP 2 gold1u72run1.conf &

This job will take ~10-12 hours to run, so it can be launched and left to run. Make sure you enter the ampersand (&) at the end of the command line, as this runs the job in the background and allows you to log out while the job continues. Use the “Refresh” icon in the SFTP window to see that GOLD has created some new files in your directory and is running properly. If the PID file is still present, the job is not done. You can also use the ganglia cpu_user command to see if your blade is still running.

Proceed to 1st Run for bacterial3dat (it can be run in parallel, at the same time).

****************************************************************************************************

1st Run for bacterial3dat (overnight)

(start about the same time as 1st Run for 1u72)

Go back to the /chem204/LabVirtualScreening2files directory and copy the following files to your new bacterial3dat directory: protein_3dat.mol2, mtxfor3dat.mol2.

Also copy the gold1u72run1.conf (you will now need to rename this configuration file and make changes to it for this protein docking).

Rename gold1u72run1.conf to gold3datrun1.conf and edit the following lines:

cavity_file = mtxfor1u72.mol2 → cavity_file = mtxfor3dat.mol2

Change the protein file:

protein_datafile = protein_3dat.mol2

Make sure the reference to the ligand file is there:

ligand_data_file /home/chem204/LabVirtualScreening2files/Library/CB-kin_UT_1.sdf 10

Change the bestranking name:

bestranking_list_filename = bestranking3datrun1.lst

Change the output name:

concatenated_output = GoutCBkin3datrun1.sdf

Change number of poses saved for each ligand:

clean_up_option save_top_n_solutions 1

You need to ADD A LINE to limit the number of ligands written to the output (if you don’t do this you will get a 1,000-ligand output file!). Put this line anywhere in the ‘SAVE OPTIONS’ section:

clean_up_option save_best_ligands 150

Have a mentor check your files.

Now save your edited file and use SFTP to send it back to your directory on the cluster. There should be 4 files present. You can now start the first GOLD run by opening a terminal window and navigating to the bacterial3dat directory. Use the commands ‘ls’ and ‘cd’ to get around. Then type:

goldremoteP 2 gold3datrun1.conf &

This job will take ~10-12 hours to run, so it can be launched and left to run. Make sure you enter the ampersand (&) at the end of the command line, as this runs the job in the background and allows you to log out while the job continues. Use the “Refresh” icon in the SFTP window to see that GOLD has created some new files in your directory and is running properly. If the PID file is still present, the job is not done. You can also use the ganglia cpu_user command to see if your blade is still running.

You can then log off the remote computer and check back on it later.

****************************************************************************************************

2nd Run for human1u72 (3-5 hrs)

When the first run has finished, check the bestranking1u72run1.lst file to verify that all 1000 ligands were tested. Also open the GoutCBkin1u72run1.sdf in PyMol to see that there are 150 ligands present in the output. These are GOLD’s best docking poses for the ligands that you ran in the first job.
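As an alternative to counting ligands by eye, the number of molecules in an SDF file can be counted from its text, because each record in an SDF file ends with a line containing only $$$$. The Python sketch below shows the idea on a made-up two-record string; the function name is hypothetical.

```python
def count_sdf_records(sdf_text):
    """Count molecules in SDF text by counting the '$$$$' record terminators."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Made-up stand-in for file contents: two very short "records"
toy_sdf = "ligand_A\n...\nM  END\n$$$$\nligand_B\n...\nM  END\n$$$$\n"
print(count_sdf_records(toy_sdf))
```

Reading GoutCBkin1u72run1.sdf into a string and passing it to this function should give 150 if the first run saved the expected number of ligands.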

Now create a new directory in the human1u72 folder called 1u72run2

Copy the gold1u72run1.conf into the directory and rename it gold1u72run2.conf

Increase the accuracy of the run by changing:

autoscale = 1

Instead of copying all the necessary files and taking up space, we can just reference them in the parent directory. To do this we put ‘../’ before the file name, which means ‘go one directory up’.

cavity_file = ../mtxfor1u72.mol2

protein_datafile = ../protein_1u72.mol2
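To see what ‘../’ resolves to, the following Python sketch joins a run directory with a relative reference and normalizes the result. The home directory shown is hypothetical; only the LabVS2/human1u72/1u72run2 part follows this handout.

```python
import posixpath


def resolve(run_dir, reference):
    """Return the path that a relative reference points to from run_dir."""
    return posixpath.normpath(posixpath.join(run_dir, reference))


# Hypothetical location of the second-run directory on the cluster
run_dir = "/home/student/LabVS2/human1u72/1u72run2"
print(resolve(run_dir, "../protein_1u72.mol2"))
print(resolve(run_dir, "../mtxfor1u72.mol2"))
```

Both references land in the human1u72 directory, one level above 1u72run2, which is where the files were placed for the first run.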

The input for the second GOLD run will be GoutCBkin1u72run1.sdf. Instead of moving it into the directory, we will reference its location from the configuration file.