Supplementary Material for:

Effect of Training Data Size and Noise Level on Support Vector Machines Virtual Screening of Genotoxic Compounds from Large Compound Libraries

Pankaj Kumar,1 X.H. Ma,1 X.H. Liu,1 Jia Jia,1 Han Bucong,1 Y. Xue,2 Z.R. Li,2 S.Y. Yang,3 Y.C. Wei,3 and Y.Z. Chen*1,3

1Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117546

2College of Chemistry, Sichuan University, Chengdu, 610064, P. R. China

3State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610064, P. R. China

*Corresponding Author: Y.Z. Chen. Tel.: 65-6516-6877. Fax: 65-6774-6756. E-mail:

Supplementary Table S1 Molecular descriptors used in this work

Descriptor Class / Number of descriptors in class / Descriptors
Simple molecular properties / 18 / Molecular weight, Numbers of rings, rotatable bonds, H-bond donors, and H-bond acceptors, Element counts
Molecular connectivity and shape / 28 / Molecular connectivity indices, Valence molecular connectivity indices, Molecular shape Kappa indices, Kappa alpha indices, Flexibility index
Electro-topological state / 97 / Electrotopological state indices, Atom type electrotopological state indices, Wiener Index, Centric Index, Altenburg Index, Balaban Index, Harary Number, Schultz Index, Petitjean R2 Index, Petitjean D2 Index, Mean Distance Index, Petitjean I2 Index, Information Wiener, Balaban RMSD Index, Graph Distance Index
Quantum chemical properties / 31 / Polarizability index, Hydrogen bond acceptor basicity (covalent HBAB), Hydrogen bond donor acidity (covalent HBDA), Molecular dipole moment, Absolute hardness, Softness, Ionization potential, Electron affinity, Chemical potential, Electronegativity index, Electrophilicity index, Most positive charge on H, C, N, O atoms, Most negative charge on H, C, N, O atoms, Most positive and negative charge in a molecule, Sum of squares of charges on H, C, N, O and all atoms, Mean of positive charges, Mean of negative charges, Mean absolute charge, Relative positive charge, Relative negative charge
Geometrical properties / 25 / Length vectors (longest distance, longest third atom, 4th atom), Molecular van der Waals volume, Solvent accessible surface area, Molecular surface area, van der Waals surface area, Polar molecular surface area, Sum of solvent accessible surface areas of positively charged atoms, Sum of solvent accessible surface areas of negatively charged atoms, Sum of charge weighted solvent accessible surface areas of positively charged atoms, Sum of charge weighted solvent accessible surface areas of negatively charged atoms, Sum of van der Waals surface areas of positively charged atoms, Sum of van der Waals surface areas of negatively charged atoms, Sum of charge weighted van der Waals surface areas of positively charged atoms, Sum of charge weighted van der Waals surface areas of negatively charged atoms, Molecular rugosity, Molecular globularity, Hydrophilic region, Hydrophobic region, Capacity factor, Hydrophilic-Hydrophobic balance, Hydrophilic Integy Moment, Hydrophobic Integy Moment, Amphiphilic Moment
Other established sets of descriptors / 323 / BCUT descriptors, Kier Molecular Flexibility Index, WHIM Descriptors, 3D-MoRSE descriptors, GETAWAY descriptors, Moran topological autocorrelation descriptors
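
As an illustration of how one of the topological descriptors in Table S1 is defined, the Wiener index of a molecule is the sum of the shortest-path bond counts between all pairs of non-hydrogen atoms. The sketch below is illustrative only and is not the software used in this work; the function name wiener_index and the adjacency-list representation of the hydrogen-suppressed molecular graph are assumptions made for the example.

```python
from collections import deque


def wiener_index(adjacency):
    """Wiener index of a connected, hydrogen-suppressed molecular graph.

    `adjacency` (hypothetical input format) maps each atom label to the
    list of atoms bonded to it.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    # Each unordered atom pair was counted twice (once from each end).
    return total // 2


# Carbon skeletons of two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

The more compact, branched isomer has the smaller value, which is the kind of structural information that distance-based indices in the table encode.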

