Role of methyl groups in dynamics and evolution of biomolecules

Supplementary Information

Supplementary Table 1: Proteins Used in Methyl Content Analysis

Protein / PDB ID or UniProt ID
Ribosomal
40S ribosomal protein S26 / P62854
40S ribosomal protein S11 / P62280
40S ribosomal protein S17 / P08708
60S ribosomal protein L38 / P63173
39S ribosomal protein L1 / Q9BYD6
39S ribosomal protein L10 / Q7Z7H8
39S ribosomal protein L12 / P52815
39S ribosomal protein L18 / Q9H0U6
39S ribosomal protein L2 / Q5T653
39S ribosomal protein L24 / Q96A35
39S ribosomal protein L28 / Q13084
39S ribosomal protein L37 / Q9BZE1
39S ribosomal protein L45 / Q9BRJ2
39S ribosomal protein L9 / Q9BYD2
28S ribosomal protein S10 / P82664
28S ribosomal protein S15 / P82914
28S ribosomal protein S7 / Q9Y2R9
60S ribosomal protein L14 / P50914
60S ribosomal protein L26 / P61254
60S ribosomal protein L30 / P62888
60S acidic ribosomal protein P0-like / Q8NHW5
60S acidic ribosomal protein P1 / P05386
40S ribosomal protein S20 / P60866
40S ribosomal protein S21 / P63220
Enzymes
Glucose Oxidase / 1GPE
Proline dehydrogenase / NP_057419
RAS (RAB1A) / 2fol
RHO (RHO A) / 1A2B
MAPK / 1erk
JAK1 (Janus Kinase 1) / NP_002218
STAT-1 / 1bf5
Methylmalonate-semialdehyde dehydrogenase / Q02252
Alkaline phosphatase / P05186
Aminomethyltransferase / P48728
Acylamino-acid-releasing enzyme / P13798
Arsenite methyltransferase / Q9HBK9
ATP synthase subunit beta / P06576
Probable phospholipid-transporting ATPase IC / O43520
Gamma-butyrobetaine dioxygenase / O75936
BR serine/threonine-protein kinase 1 / Q8TDC3
Tyrosine-protein kinase BTK / Q06187
Carbonic anhydrase 1 / P00915
Calcium/calmodulin-dependent protein kinase type 1 / Q14012
Soluble calcium-activated nucleotidase 1 / Q8WVQ1
Caspase-6 / P55212
Inhibitor of nuclear factor kappa-B kinase subunit alpha / O15111
Creatine kinase B-type / P12277
2',3'-cyclic-nucleotide 3'-phosphodiesterase / P09543
Bifunctional coenzyme A synthase / Q13057
Carboxypeptidase A1 / P15085
Citrate synthase / O75390
Aspartyl-tRNA synthetase / P14868
Dipeptidase 1 / P16444
Epoxide hydrolase 2 / P34913
Prothrombin / P00734
Fatty acid synthase / P49327
Fas-activated serine/threonine kinase / Q14296
Fructose-1,6-bisphosphatase 1 / P09467
NADPH:adrenodoxin oxidoreductase / P22570
Fibroblast growth factor receptor 4 / P22455
Glucose-6-phosphatase / P35575
Glucose-6-phosphate 1-dehydrogenase / P11413
Glycyl-tRNA synthetase / P41250
Glucokinase / P35557
Guanine deaminase / Q9Y2T3
Glutamine synthetase / P15104
Alanine aminotransferase 1 / P24298
ECM
Collagen alpha-1(I) / P02452
Collagen alpha-1(II) / P02458
Collagen alpha-1(III) / P02461
Collagen alpha-1(IV) / P02462
Laminin subunit alpha-1 / P25391
Laminin subunit beta-1 / P07942
Laminin subunit gamma-1 / P11047
Fibronectin / P02751
Integrin alpha-1 / P18614
Integrin beta-1 / A5Z1X6
Cadherin-1 / P12830
E-selectin / P16581
Elastin / P15502
Cytoskeletal
Tubulin alpha-1 / Q71U36
Tubulin beta-1 / Q9H4B7
G-Actin / 1j6z
Vinculin / P18206
Myoglobin / NP_005359
Clathrin Heavy Chain / 1xi5
Growth factors
GDNF / 1agq
TGF-Beta / 1kla
BMP2 / 3bmp
VEGF / 2vpf
Neurotrophin-3 / 1b8k
CNTF / P26441
Somatotropin / P01241
NGF / 1bet
Plasma Proteins
Hemoglobin / 1gzx
Glucagon / 1d0r
Prolactin / 1r25
Insulin / 1aph
Heat Shock Protein 60kDa / P10809
Interferon Gamma / P01579
von Willebrand Factor / 1AO3
Fibrinogen A chain / P02671
Fibrinogen B chain / P02675
Plasminogen / P00747
Serotransferrin / P02787
Fibronectin / P02751
Immunoglobulin / 1IGT
Ubiquitin
Albumin / 1e7h

Details of the proteins used in the analysis of methyl content per residue. Proteins are grouped in the table by functional ‘class’, and each is identified by either its UniProt sequence ID or its PDB identifier.
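As a minimal sketch of how a methyl content per residue could be computed from a protein sequence, the Python below counts side-chain methyl groups using standard amino acid chemistry (Ala 1, Val 2, Leu 2, Ile 2, Met 1, Thr 1). The function name and the example peptide are hypothetical and are not taken from the analysis itself; the exact procedure used for the table above may differ.

```python
# Side-chain methyl groups per standard amino acid (one-letter codes).
# Residues not listed here carry no side-chain methyl group.
METHYLS_PER_RESIDUE = {"A": 1, "V": 2, "L": 2, "I": 2, "M": 1, "T": 1}


def methyl_content_per_residue(sequence):
    """Return the mean number of side-chain methyl groups per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    total = sum(METHYLS_PER_RESIDUE.get(aa, 0) for aa in seq)
    return total / len(seq)


# Hypothetical five-residue peptide, for illustration only:
# M (1) + A (1) + L (2) + V (2) + K (0) = 6 methyls over 5 residues.
print(methyl_content_per_residue("MALVK"))
```

In practice the sequence would come from the UniProt or PDB entry listed for each protein, and the per-protein values could then be averaged within each functional class.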

Supplementary Figure 1: Detailed Lysozyme Simulation.

Lysozyme and its methyl distribution. 35 GLU (proton donor) and 52 ASP (the nucleophile required to generate the intermediate) are key residues in the active site. The ligand (GlcNAc, or NAG) lies in a cleft in the center of the protein. In his original papers (PNAS 73, 4261-4265, 1976 and PNAS 74, 2629-2633, 1977), Scheraga identified key contacts, which he calls sites A through D. The normal mode considered key to ligand entry and exit is thought to be dominated by 38 PHE and 97 LYS. The orientation of the (GlcNAc)3 ligand is taken from PDB structure 3A3Q (J. Biochem. 146, 651-657, 2009).

In the top left figure, the backbone of lysozyme is shown as a tube, methyl-containing amino acids are colored by tau (0 to 100 ps), the active site residues 35 GLU and 52 ASP are shown in yellow, and the hinge residues (38 PHE and 97 LYS) are shown in purple. The (GlcNAc)3 ligand is shown as smaller spheres colored by atom (cyan, blue, and red). Colored methyl-containing side chains are rendered transparent for clarity. In the top right figure, only those methyls that are “activated” at 125 K (12 MET, 17 LEU, 55 ILE, 56 LEU, 105 MET) are shown, as balls colored by their tau values at 300 K (0 to 100 ps), with the same coloring scheme as above for the active site, hinge, and ligand atoms. The bottom left plot shows the distance from the center of mass (COM) of each methyl-containing group to the COM of the active site ligand ((GlcNAc)3), the COM of the hinge residues, or the COM of the catalytic residues. The bottom right plot uses the same definitions but includes only those methyls that are “activated” at 125 K; these are close to the hinge region yet not close to the ligand. Note that the hinge residues are on opposite sides of the protein, and the distance is calculated from the COM of this pair of residues.
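The COM-to-COM distances plotted in the bottom panels can be illustrated with a short Python sketch. It assumes each group (a methyl-containing side chain, the ligand, the hinge pair, or the catalytic pair) is supplied as a list of (element, x, y, z) tuples in ångströms. The function names, the mass table, and the coordinates in the example are hypothetical and do not come from the simulation.

```python
import math

# Approximate atomic masses (u) for elements common in proteins and sugars.
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Return the mass-weighted mean position of (element, x, y, z) tuples."""
    total = sx = sy = sz = 0.0
    for element, x, y, z in atoms:
        m = MASS[element]
        total += m
        sx += m * x
        sy += m * y
        sz += m * z
    if total == 0.0:
        raise ValueError("atom group is empty")
    return (sx / total, sy / total, sz / total)


def com_distance(group_a, group_b):
    """Return the Euclidean distance between the COMs of two atom groups."""
    return math.dist(center_of_mass(group_a), center_of_mass(group_b))


# Placeholder coordinates, for illustration only.
side_chain = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0)]
reference = [("O", 1.0, 0.0, 4.0)]
print(com_distance(side_chain, reference))
```

For the hinge, the atoms of 38 PHE and 97 LYS would be passed together as one group, which matches the description above of a single COM for the pair of residues.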

The mere presence of these residues in and around both the hinge and catalytic residues does not by itself establish a direct effect on the biochemical function of the protein. Scores of studies that have driven the protein engineering field discuss catalytic rates in terms of the underlying organic chemistry, the effect of hydrogen bond networks, the role of water, the role of ions, allosteric mutations, etc. However, methyls may provide the lubrication that allows for dynamical changes at the active site and hinges, and may also supply the entropy required to drive folding and to maintain an active fold of the protein. Methyl-containing residues are hydrophobic, yet they are dynamically more active than all other residues. They allow a “surface”-like dynamical contribution to the entropy, similar to that experienced by hydrophilic surface residues. RNA, in contrast, does not have methyl groups or well-defined functional roles, which supports our theory that methyls were an element that allowed proteins to dominate the biochemistry (enzymes and structural proteins) of modern organisms.

Supplementary Figure 2: Highly mobile methyl group justification.

In support of our assertion that highly mobile methyl groups are more relevant than the full set of methyl groups, we analyzed previously published NMR studies of calmodulin and ubiquitin. The axial structure factor (S²axis) is a measure of the certainty with which the NMR instrument resolves a methyl group location; i.e., a broad peak indicates that the rotation is faster than the instrument resolution. (Error bars depict the standard deviation.)

This result follows the expectation that the farther a methyl group is from the peptide backbone, the more freely it is able to rotate. Faster rotation is indicative of greater plastification of the structure; we therefore expect these methyl groups to be even more enriched in proteins seeking to maximize plasticity. In this work, the methyl groups on leucine, the delta methyl group on isoleucine, and the methyl group on methionine are referred to as ‘highly mobile’ methyl groups.
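The definition above can be turned into a simple count, sketched here in Python. The sketch assumes that both leucine methyls count as highly mobile (one reading of "methyl groups on leucine") and that only the delta methyl of isoleucine does; the function name and the example peptide are hypothetical.

```python
# Highly mobile methyls per residue under the definition in the text:
# leucine (both methyls, an assumption), isoleucine delta methyl, methionine.
HIGHLY_MOBILE = {"L": 2, "I": 1, "M": 1}

# All side-chain methyl groups per standard amino acid, for comparison.
ALL_METHYLS = {"A": 1, "V": 2, "L": 2, "I": 2, "M": 1, "T": 1}


def count_methyls(sequence):
    """Return (highly_mobile, total) side-chain methyl counts for a sequence."""
    seq = sequence.strip().upper()
    mobile = sum(HIGHLY_MOBILE.get(aa, 0) for aa in seq)
    total = sum(ALL_METHYLS.get(aa, 0) for aa in seq)
    return mobile, total


# Hypothetical six-residue peptide, for illustration only.
mobile, total = count_methyls("MALVIK")
print(mobile, total)
```

Dividing either count by the sequence length gives a per-residue value that can be compared across the functional classes in Supplementary Table 1.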