Molecular Structure and Modeling

Chemistry 164, Spring, 2007

Creation of SYBYL mol2 Files

This document outlines three ways of creating a SYBYL mol2 file for a small molecule. The process is illustrated with methyl formate. Be sure to check the results of your coding.

I) Direct creation of a mol2 file with a text editor


The process is illustrated with the molecular structure and the ASCII (i.e. txt) file given below. A copy of the file that serves as a template is provided on the Chemistry 164 Web page. Note the spacing.

@<TRIPOS>MOLECULE
methyl formate
8 7 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
1 C1 0.0000 0.0000 0.0000 C.3 1 <1> 0.0000
2 H1 1.0959 0.0000 0.0000 H 1 <1> 0.0000
3 O1 -0.4707 1.3292 0.0000 O.3 1 <1> 0.0000
4 C2 0.6910 2.1276 0.0000 C.2 1 <1> 0.0000
5 O2 1.7986 1.5993 0.0000 O.2 1 <1> 0.0000
6 H2 0.6050 3.2201 0.0000 H 1 <1> 0.0000
7 H3 -0.3648 -0.5173 0.8946 H 1 <1> 0.0000
8 H4 -0.3648 -0.5173 -0.8946 H 1 <1> 0.0000
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 7 1
4 1 8 1
5 3 4 1
6 4 5 2
7 4 6 1

A) Create the header (first 6 lines) of the file. It has the following information:

1) @<TRIPOS>MOLECULE

2) the name of the molecule or a comment

3) the number of atoms in the molecule, N, followed by the number of bonds and three zeros.

4) SMALL

5) NO_CHARGES

6) a blank line
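As a minimal sketch of step A, the six header lines can be assembled in Python. The function name mol2_header is hypothetical and is not part of SYBYL or Spartan; the atom and bond counts are those of the methyl formate example.

```python
def mol2_header(name, n_atoms, n_bonds):
    """Return the six header lines of step A as a list of strings."""
    return [
        "@<TRIPOS>MOLECULE",
        name,                            # molecule name or a comment
        f"{n_atoms} {n_bonds} 0 0 0",    # atom count, bond count, three zeros
        "SMALL",
        "NO_CHARGES",
        "",                              # the blank sixth line
    ]

header = mol2_header("methyl formate", 8, 7)
print("\n".join(header))
```

Counting the atoms and bonds in the program rather than by hand is one way to avoid a header that disagrees with the tables that follow it.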

B) Determine the Cartesian coordinates in Ångstrom of each atom in the molecule.

This will be an exercise in geometry. Unfortunately, bond lengths, bond angles, and dihedral (torsional) angles rather than Cartesian coordinates are usually provided in the microwave and electron diffraction literature. You need to select one atom in the molecule as the origin. This choice is completely arbitrary, although in many cases an astute choice will simplify the calculations. You will need to number the atoms in the molecule. Refer to the figure above. The methyl carbon is atom 1 in the illustrative example.
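As an illustration of the geometry involved, the short Python sketch below places the first three atoms of methyl formate. It assumes the methyl carbon is at the origin, the first hydrogen lies on the positive x-axis, and the ester oxygen lies in the xy plane; the bond lengths and the angle are the values used for methyl formate elsewhere in this document. The variable names are illustrative only.

```python
import math

r_CH = 1.0959        # C1-H1 bond length in Angstrom
r_CO = 1.4101        # C1-O1 bond length in Angstrom
angle_HCO = 109.5    # H1-C1-O1 bond angle in degrees

c1 = (0.0, 0.0, 0.0)     # atom 1: the origin
h1 = (r_CH, 0.0, 0.0)    # atom 2: along the positive x-axis

# atom 3: in the xy plane, at the bond angle measured from the x-axis
theta = math.radians(angle_HCO)
o1 = (r_CO * math.cos(theta), r_CO * math.sin(theta), 0.0)

print("O1: %.4f %.4f %.4f" % o1)
```

The result should agree with the O1 line of the example file to within rounding. Atoms that require a dihedral angle need a three-dimensional version of the same construction.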

C) Enter the coordinate information into the file.

1) The first line contains @<TRIPOS>ATOM

2) Next enter a line for each atom with the following information:

a) the atom number, an integer identifying the atom. The atom number is used in the connectivity table in the file.

b) a second label containing an atomic symbol and an integer. Tripos uses this second enumeration scheme to indicate the number of atoms of each element. For example, the molecule has two carbon atoms so the first is C1 and the second C2.

c) the Cartesian x, y, and z coordinates in Ångstrom in the next three fields,

d) the SYBYL atom type. For the simple cases that we shall consider, enter H for hydrogen, C.2 for an sp2 hybridized carbon, C.3 for an sp3 hybridized carbon, etc.

e) 1 <1> 0.0000 in the remainder of the line. SYBYL uses the information in these fields to divide a molecule into groups. Spartan does not use this information.
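One way to check an atom line against the field list above is to split it on whitespace and name the pieces. The sketch below does this in Python; the function name parse_atom_line and the dictionary keys are hypothetical, and it assumes exactly the nine whitespace-separated fields shown in the example file.

```python
def parse_atom_line(line):
    """Split one atom line into the fields a) through e) described above."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 fields, found %d" % len(fields))
    return {
        "number": int(fields[0]),                      # a) atom number
        "label": fields[1],                            # b) element symbol plus counter
        "xyz": tuple(float(v) for v in fields[2:5]),   # c) x, y, z in Angstrom
        "type": fields[5],                             # d) SYBYL atom type
        "group": fields[6:9],                          # e) the trailing 1 <1> 0.0000
    }

atom = parse_atom_line("3 O1 -0.4707 1.3292 0.0000 O.3 1 <1> 0.0000")
print(atom["label"], atom["type"], atom["xyz"])
```

A line with a missing or extra field raises an error instead of being silently misread.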

D) Enter the connectivity table for the molecule.

The first line in this section contains the command @<TRIPOS>BOND. This line is followed by a line for each bond in the molecule. Each of these connectivity lines contains four pieces of information:

1) the number of the bond,

2) the atom number of the first atom in the bond. The choice is not arbitrary. Note that atoms with the highest multiplicity (number of bonds) are used.

3) the atom number of the second atom in the bond, and

4) the type of bond (1 for single, 2 for double, and 3 for triple).
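A simple way to check both the coordinates and the connectivity table is to recompute every bond length from the coordinates. The Python sketch below does this for the methyl formate example; the data are copied from the example file, and the function name bond_lengths is hypothetical.

```python
import math

# atom number -> (label, (x, y, z)), copied from the example file
atoms = {
    1: ("C1", (0.0000, 0.0000, 0.0000)),
    2: ("H1", (1.0959, 0.0000, 0.0000)),
    3: ("O1", (-0.4707, 1.3292, 0.0000)),
    4: ("C2", (0.6910, 2.1276, 0.0000)),
    5: ("O2", (1.7986, 1.5993, 0.0000)),
    6: ("H2", (0.6050, 3.2201, 0.0000)),
    7: ("H3", (-0.3648, -0.5173, 0.8946)),
    8: ("H4", (-0.3648, -0.5173, -0.8946)),
}

# (bond number, first atom, second atom, bond type)
bonds = [(1, 1, 2, 1), (2, 1, 3, 1), (3, 1, 7, 1), (4, 1, 8, 1),
         (5, 3, 4, 1), (6, 4, 5, 2), (7, 4, 6, 1)]

def bond_lengths(atoms, bonds):
    """Return (bond number, label 1, label 2, bond type, length) for each bond."""
    rows = []
    for number, i, j, kind in bonds:
        (label_i, p_i), (label_j, p_j) = atoms[i], atoms[j]
        rows.append((number, label_i, label_j, kind, math.dist(p_i, p_j)))
    return rows

lengths = bond_lengths(atoms, bonds)
for number, a, b, kind, length in lengths:
    print("%d %s-%s type %d %.4f" % (number, a, b, kind, length))
```

A bond length that is far from the expected value usually points to a mistyped coordinate or a wrong atom number in the connectivity table.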

E) Save the file with the extension mo2.

Microsoft only recognizes 3-letter extensions so in the Windows world, the extension must be mo2 rather than mol2 used with UNIX machines. If you are using a PC, make sure that the file is saved with the correct extension.

II) Creation of a Z matrix file with a text editor and conversion of it to a mol2 file

A second route first involves the creation of a Z matrix, which is discussed in your text. This has the benefit that the input is in terms of bond lengths, bond angles, and dihedral angles. Spartan does not read files in the Z matrix format, so the Z matrix file must be converted to a mol2 file using a file-conversion utility such as Babel. Gaussian uses the Z-matrix file type, so this example uses the gjf (Gaussian input file) format. The text of the gjf file for methyl formate is given below.

$RunGauss
%chk=/scr3/mstahl/
#rhf 3-21g opt scf=direct maxdisk=10000000 Test

comment line

0 1
C
H 1 r2
O 1 r3 2 a3
C 3 r4 1 a4 2 d4
O 4 r5 3 a5 1 d5
H 4 r6 3 a6 1 d6
H 1 r7 3 a7 2 d7
H 1 r8 3 a8 2 d8
Variables:
r2= 1.0959
r3= 1.4101
a3= 109.5
r4= 1.4096
a4= 105.0
d4= 0.00
r5= 1.2272
a5= 120.00
d5= 0.00
r6= 1.0959
a6= 120.00
d6= 180.0
r7= 1.0959
a7= 109.5
d7= 120.0
r8= 1.0959
a8= 109.5
d8= 240.0

A) In creating the Gaussian gjf file, simply copy the first seven lines of the example. The information in them will not be used in the conversion discussed below.

B) The seven-line header is followed by one line per atom in the molecule. These lines define the coordinate system for each atom in the molecule. In working up the example, I used the same numbering scheme as in the preparation of the SYBYL mol2 file. The atoms are numbered implicitly by the order in which they are entered.

1) the first atom. In drawing a geometric figure, one starts with a point. The locations of all other atoms in the structure are based on the choice of the origin. One only provides the atomic symbol for the first atom.

2) the second atom. Two atoms only define a distance. In this line, provide the atomic symbol for the second atom, the number of the atom to which it is attached, and a designation of the parameter for the bond length, r2 (r for bond length and 2 for the second atom).

3) the third atom. Three points define an angle. In this line, provide the atomic symbol for the additional atom, the number of the atom to which it is attached, and the number of the third atom that defines the bond angle. Note that the line also contains the labels r3 for the bond length and a3 for the bond angle.

4) the fourth and all other atoms. Four atoms do not necessarily lie in a plane, so the dihedral angle must now be provided. In these lines, provide the atomic symbol for the additional atom, the number of the atom to which it is attached, the number of the atom that defines the bond angle, and finally the number of the atom that defines the reference plane. Note that atom numbers for these three reference atoms are given. The first defines the local origin; the second, the positive x-axis; and the third, the positive y-axis. The location of the positive z-axis follows from this, as a right-handed coordinate system is always assumed. Three parameters (rn, an, and dn) are also specified, where n indicates the number of the added atom. They represent the bond length, the bond angle, and the dihedral angle that the added atom makes with respect to the plane defined by the three reference atoms.

C) The final section of the file contains the values of the structural parameters defined above.

D) Save the file with the extension gjf.

E) Finally open the file in gjf format using GaussView and save the structure as a mol2 file.
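The construction described in steps B and C can also be carried out directly. The Python sketch below converts the methyl formate Z matrix to Cartesian coordinates, with the parameter values substituted into each row. It is an illustration under stated assumptions and not the method used by Gaussian, GaussView, or Babel: the first atom is placed at the origin, the second on the positive x-axis, and the third in the xy plane. All function names are hypothetical.

```python
import math

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def unit(p):
    n = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    return (p[0] / n, p[1] / n, p[2] / n)

def place(b, a, d, r, theta_deg, phi_deg):
    """Place an atom at distance r from b, angle theta with a, dihedral phi with d."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    bc = unit(sub(b, a))             # local axis along the a -> b bond
    n = unit(cross(sub(a, d), bc))   # normal to the plane through d, a, b
    m = cross(n, bc)                 # completes the local right-handed frame
    local = (-r * math.cos(theta),
             r * math.sin(theta) * math.cos(phi),
             r * math.sin(theta) * math.sin(phi))
    return tuple(b[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
                 for k in range(3))

def zmatrix_to_cartesian(rows):
    """rows: (symbol, bond atom, r, angle atom, a, dihedral atom, d); atoms count from 1."""
    xyz = []
    for i, row in enumerate(rows):
        if i == 0:                   # first atom: the origin
            xyz.append((0.0, 0.0, 0.0))
        elif i == 1:                 # second atom: along the positive x-axis
            _, b, r = row
            p = xyz[b - 1]
            xyz.append((p[0] + r, p[1], p[2]))
        elif i == 2:                 # third atom: in the xy plane
            _, b, r, a, theta_deg = row
            pb, t = xyz[b - 1], math.radians(theta_deg)
            u = unit(sub(xyz[a - 1], pb))
            perp = (-u[1], u[0], 0.0)
            xyz.append(tuple(pb[k] + r * (math.cos(t) * u[k] + math.sin(t) * perp[k])
                             for k in range(3)))
        else:                        # all later atoms need a dihedral angle
            _, b, r, a, theta_deg, d, phi_deg = row
            xyz.append(place(xyz[b - 1], xyz[a - 1], xyz[d - 1], r, theta_deg, phi_deg))
    return xyz

methyl_formate = [
    ("C",),
    ("H", 1, 1.0959),
    ("O", 1, 1.4101, 2, 109.5),
    ("C", 3, 1.4096, 1, 105.0, 2, 0.0),
    ("O", 4, 1.2272, 3, 120.0, 1, 0.0),
    ("H", 4, 1.0959, 3, 120.0, 1, 180.0),
    ("H", 1, 1.0959, 3, 109.5, 2, 120.0),
    ("H", 1, 1.0959, 3, 109.5, 2, 240.0),
]

coords = zmatrix_to_cartesian(methyl_formate)
for row, p in zip(methyl_formate, coords):
    print("%s %8.4f %8.4f %8.4f" % ((row[0],) + p))
```

With these placement conventions, the output should agree with the coordinates in the mol2 example of section I to within rounding, which makes it a useful cross-check on hand-calculated coordinates.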

III) The Spartan drawing tool

One can draw the molecule in Spartan, modify the structural parameters, and save the result as a mol2 file. This approach will not work for cyclic polyatomic molecules, where the adjustment of one structural parameter affects the values of others.

IV) The xyz format

The document closes with a discussion of the xyz format, which is included for completeness. This final approach parallels the first. One calculates the Cartesian coordinates in Ångstrom of the atoms in the molecule and, with the aid of a text editor, converts these numbers into an xyz file. One then uses Babel to convert the xyz file into a SYBYL mol2 file. Although many Web sites will provide the structure of molecules in the xyz format, most commonly used programs such as Spartan and Gaussian do not read files in this format. Rasmol can read these files but does not have a module for exporting them in another format. An example of the xyz format is given below for the case of a hypothetical square planar form of CH4.

5
CH4
C 0.0 0.0 0.0
H 1.0 0.0 0.0
H 0.0 1.0 0.0
H -1.0 0.0 0.0
H 0.0 -1.0 0.0

The file has the following format:

A) The first line provides the number of atoms in the molecule. No decimal point; the input is an integer.

B) The second line provides the molecular formula of the substance.

C) For a molecule with N atoms, there are N lines of the third type. Each of these lines contains the following information for each of the N atoms. Each item in the line is separated by a space.

1) the atomic symbol (upper case) of the atom,

2) the x coordinate in Ångstrom,

3) the y coordinate in Ångstrom, and

4) the z coordinate in Ångstrom.
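The layout just described is simple enough to read with a few lines of Python. The sketch below parses the square planar CH4 example; the function name parse_xyz is hypothetical, and the code assumes the count line, the formula line, and the atom lines follow one another with no blank lines between them.

```python
def parse_xyz(text):
    """Return (second line, list of (symbol, x, y, z)) from xyz-format text."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])              # A) the number of atoms, an integer
    comment = lines[1].strip()           # B) the molecular formula
    atoms = []
    for line in lines[2:2 + n_atoms]:    # C) one line per atom
        symbol, x, y, z = line.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the first line")
    return comment, atoms

ch4_text = """5
CH4
C 0.0 0.0 0.0
H 1.0 0.0 0.0
H 0.0 1.0 0.0
H -1.0 0.0 0.0
H 0.0 -1.0 0.0
"""

comment, atoms = parse_xyz(ch4_text)
print(comment, len(atoms))
```

The count check catches the common mistake of adding or deleting an atom line without updating the first line.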

Note that the xyz file does not provide any information about connectivity. Babel may infer the correct connectivity, but this cannot be guaranteed. (It does in this case.) If the final file will be used in a quantum mechanical calculation, connectivity doesn’t matter, as bonds are a human creation. In that case, convert the xyz file into a pdb file, which Spartan can read without difficulty. If the structure will be used in a molecular mechanics calculation, both the connectivity and the atom types must be correct. Although the xyz format is common because of its simplicity, that same simplicity often makes it useless. Its common appearance on Web sites is regrettable.
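To see why inferred connectivity cannot be guaranteed, it helps to look at how such a guess might be made. The Python sketch below uses a common distance heuristic: two atoms are taken to be bonded when they are closer than the sum of their covalent radii plus a tolerance. This is not necessarily what Babel does; the radii are approximate literature values and the tolerance of 0.45 Ångstrom is an assumption made for this illustration. The function name infer_bonds is hypothetical.

```python
import math
from itertools import combinations

# Approximate covalent radii in Angstrom (assumed values for illustration)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}
TOLERANCE = 0.45   # assumed slack added to the sum of the radii

def infer_bonds(atoms):
    """Return (i, j) atom-number pairs, counting from 1, judged to be bonded."""
    bonds = []
    for i, j in combinations(range(len(atoms)), 2):
        (sym_i, p_i), (sym_j, p_j) = atoms[i], atoms[j]
        cutoff = COVALENT_RADII[sym_i] + COVALENT_RADII[sym_j] + TOLERANCE
        if math.dist(p_i, p_j) <= cutoff:
            bonds.append((i + 1, j + 1))
    return bonds

# the hypothetical square planar CH4 from the xyz example
ch4 = [
    ("C", (0.0, 0.0, 0.0)),
    ("H", (1.0, 0.0, 0.0)),
    ("H", (0.0, 1.0, 0.0)),
    ("H", (-1.0, 0.0, 0.0)),
    ("H", (0.0, -1.0, 0.0)),
]

print(infer_bonds(ch4))
```

A rule of this kind recovers only which atoms are joined. It says nothing about bond type or atom type, and a distorted or crowded geometry can add or drop bonds, so the result still has to be checked by hand.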

c164_mol2.doc, WES, 1 Jan. 2003; revised, 11 Jan. 2007, 4 Feb. 2008