Supplementary Materials
Elaborate Ligand-Based Modeling RevealsNew Human Neutrophil Elastase Inhibitors
Maha Habash,a Ahmed H. Abdelazeem,b, c Mutasem O. Taha,d*
aDepartment of Pharmaceutical Chemistry and Pharmacognosy, Faculty of Pharmacy, Applied Science University, Amman, Jordan.
bDepartment of Medicinal Chemistry, Faculty of Pharmacy, Beni-Suef University, Beni-Suef, Egypt.
cDepartment of Pharmaceutical Chemistry, College of Pharmacy, Taif University, Taif, KSA.
dDrug Discovery Unit, Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
* To whom correspondence should be addressed: Drug Discovery Unit, Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan.
Tel.: 00962 6 5355000
Fax: 00962 6 5339649
E-mail address: (Mutasem O. Taha, Ph.D.).
SM-1. Conformational Analysis
The molecular flexibilities of the collected compounds were taken into account by considering each compound as a collection of conformers representing different areas of the conformational space accessible to the molecule within a given energy range. Accordingly, the conformational space of each inhibitor (1-115, Table A) was explored adopting the ‘‘BEST’’ option within CATALYST. Default parameters were employed in the conformation generation procedure, i.e., a conformational ensemble was generated with an energy threshold of 20 kcal/ mol from the local minimized structure which has the lowest energy level and a maximum limit of 250 conformers per molecule(Smellie (a) et al., 1995: Smellie (b) et al., 1995; CATALYST, 2005; Discovery Studio , 2010).
SM-2. CATALYST pharmacophore generation algorithm
CATALYST-HYPOGEN enables automatic pharmacophore construction by using a collection of at least 16 molecules with bioactivities spanning over 3.5 orders of magnitude(Smellie (a) et al., 1995: Smellie (b) et al., 1995; CATALYST, 2005; Discovery Studio, 2010).
The fact that we have a list of 115HNE inhibitors of evenly spread bioactivities over 3.5 orders of magnitude prompted us to employ HYPOGEN algorithm to identify as many binding modes assumed by HNE inhibitors as possible.
HYPOGEN evaluates large number of potential binding pharmacophores through perturbations to hypotheses that survived the constructive and subtractive phases of the modeling algorithm (Li et al.,2000). The configuration (Config., see section SM-3 below) cost calculated for each modeling run reflects the number of evaluated pharmacophoric models. To guarantee thorough analysis of all models, it is recommended that the Config. cost of any HYPOGEN run not to exceed 17 (corresponding to 217 hypotheses to be assessed by CATALYST)(Sutter et al., 2000).
HYPOGEN identifies a 3D array of a maximum of five chemical features common to active training molecules, which provides a relative alignment for each input molecule consistent with their binding to a proposed common receptor site. The chemical features considered can be hydrogen bond donors and acceptors (HBDs and HBAs, respectively), aliphatic and aromatic hydrophobes (Hbic features), positive and negative charges, positive and negative ionizable groups and aromatic planes(Li et al., 2000).
All 115 molecules with their associated conformational models were regrouped into a spreadsheet. The biological data of the inhibitors were reported with an “Uncertainty” value of 3, which means that the actual bioactivity of a particular inhibitor is assumed to be situated somewhere in an interval ranging from one-third to three-times the reported bioactivity value of that inhibitor (Li et al., 2000; Guneret al., 2004).CATALYST requires the uncertainty parameter for two purposes: (i) as means to classify the training compounds into most-active, moderate and inactive categories, which is essential for the three modeling phases of CATALYST, and (ii) as means to define the tolerance sizes of the binding features' spheres in the resulting pharmacophore. The default value of this parameter is 3. Typically, CATALYST requires informative training sets that include at least 16 compounds of evenly spread bioactivities over at least three and a half logarithmic cycles(Li et al., 2000; Guner et al., 2004).
Pharmacophore modeling employing CATALYST proceeds through three successive phases: the constructive phase, subtractive phase and optimization phase. During the constructive phase, CATALYST generates common conformational alignments among the most-active training compounds. Only molecular alignments based on a maximum of five chemical features are considered. The program identifies a particular compound as being within the most active category if it satisfies equation (A).(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).
(MAct x UncMAct) - (Act / UncAct) > 0.0 ………….. (A)
Where “MAct” is the activity of the most active compound in the training set, “Unc” is the uncertainty of the compounds and “Act” is the activity of the training compounds (Table A).
In the subsequent subtractive phase, CATALYST eliminates some hypotheses that fit inactive training compounds. A particular training compound is defined as being inactive if it satisfies equation (B).(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).
Log (Act) – log (MAct) > BS …………. (B)
Where, “BS” is the bioactivity spread (equals 3.5 by default,Table A). However, in the optimization phase, CATALYST applies fine perturbations in the form of vectored feature rotation, adding new feature and/or removing a feature, to selected hypotheses that survived the subtractive phase, in an attempt to find new models of enhanced bioactivity/mapping correlations. CATALYST selects the highest-ranking models (10 by default) and presents them as the optimal pharmacophore hypotheses resulting from the particular automatic modeling run.
SM-3. Pharmacophore Assessment in CATALYST
When generating hypotheses, CATALYST attempts to minimize a cost function consisting of three terms: Weight cost, error cost and configuration cost.(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).In a successful automatic modeling run, CATALYST ranks the generated models according to their total costs. CATALYST also calculates the cost of the null hypothesis, which presumes that there is no relationship in the data and that experimental activities are normally distributed about their mean. Accordingly, the greater the difference from the cost of the null hypothesis, the more likely that the hypothesis does not reflect a chance correlation.(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).CATALYST-HYPOGEN implements additional approach to assess the quality of resulting pharmacophores, namely, the Cat-Scramble approach. This validation procedure is based on Fisher’s randomization test.In this test, a 90% confidence level was selected, which instructs CATALYST to scramble the bioactivities of training compounds to generate 19 random spreadsheets. Subsequently, CATALYST-HYPOGEN is challenged to use these random spreadsheets to generate hypotheses using exactly the same features and parameters used in generating pharmacophore models from unscrambled bioactivity data. Success in generating pharmacophores of comparable cost criteria to those produced by the original unscrambled data reduces the confidence in the training compounds and their respective pharmacophore models.(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).Weight cost is a value that increases as the feature weight in a model deviates from an ideal value of 2. The deviation between the estimated activities of the training set and their experimentally determined values adds to the error cost. The activity of any compound can be estimated from a particular hypothesis through equation (C).(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).
Log (Estimated Activity) = I + Fit ………………… (C)
Where, I = the intercept of the regression line obtained by plotting the log of the biological activity of the training set compounds against the Fit values of the training compounds. The Fit value for any compound is obtained automatically employing equation (D).(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).
Fit = Σ mapped hypothesis features × W [1−Σ(disp/tol) 2]……..…. (D)
Where, Σ mapped hypothesis features represents the number of pharmacophore features that successfully superimpose (i.e., map or overlap with) corresponding chemical moieties within the fitted compound, W is the weight of the corresponding hypothesis feature spheres. This value is fixed to 1.0 in CATALYST-generated models. disp is the distance between the center of a particular pharmacophoric sphere (feature centroid) and the center of the corresponding superimposed chemical moiety of the fitted compound; tol is the radius of the pharmacophoric feature sphere (known as Tolerance, equals to 1.6 Å by default). Σ(disp/tol)2 is the summation of (disp/tol)2 values for all pharmacophoric features that successfully superimpose corresponding chemical functionalities in the fitted compound.(Abu Hammad et al., 2007; Fischer, 1966; Jacobsson et al., 2003; Poptodorov, 2006; Sprague et al., 1997).
The third term, i.e., the configuration cost, penalizes the complexity of the hypothesis. This is a fixed cost, which is equal to the entropy of the hypothesis space. The more the numbers of features (a maximum of five) in a generated hypothesis, the higher is the entropy with subsequent increase in this cost. The overall cost (total cost) of a hypothesis is calculated by summing over the three cost factors. However, error cost is the main contributor to total cost.
SM-4. ROC analysis
It was necessary to prepare valid evaluation structural database (testing set) that contains an appropriate list of decoy compounds in combination with diverse list of known active compounds. The decoy list was prepared as described by Verdonk and co-workers (Kirchmair et al., 2008; Verdonk et al., 2004; Triballeau et al., 2005). Briefly, the decoy compounds were selected based on three basic one-dimensional (1D) properties that allow the assessment of distance (D) between two molecules (e.g., i and j), namely: (1) the number of hydrogen-bond donors (NumHBD); (2) number of hydrogen-bond acceptors (NumHBA) and (3) count of nonpolar atoms (NP, defined as the summation of Cl, F, Br, I, S and C atoms in a particular molecule). For each active compound in the testing set, the distance to the nearest other active compound is assessed using their Euclidean Distance (equation E):
The minimum distances are then averaged over all active compounds (Dmin). Subsequently, for each active compound in the testing set an average of 30 decoys were randomly chosen from the ZINC database(Irwin and Shoichet, 2005). The decoys were selected in such a way that they did not exceed Dmin distance from their corresponding active compound.
Moreover, to further diversify the actives members, i.e., to avoid close similarity among actives in the testing set, any active compound having zero distance () from other active compound (s) in the testing set were excluded. Active testing compounds were defined as those possessing HNE affinities (Ki values) ranging from 5.83 nM to 880.65 nM. The testing set included 19 active compounds and 621 ZINC compounds.
The testing set (640 compounds) was screened by each pharmacophore for ROC analysis employing the "Best flexible search" option implemented in CATALYST, while the conformational spaces of the compounds were generated employing the "Fast conformation generation option" implemented in CATALYST. Compounds missing one or more features were discarded from hit lists. The in silico hits were scored employing their fit values (best fit values) as calculated by equation (D). Subsequently, hit lists were used to construct ROC curves for corresponding pharmacophores
ROC curve analysis describes the sensitivity (Se or true positive rate, equation F) for any possible change in the number of selected compounds (n) as a function of (1-Sp). Sp is defined as specificity or true negative rate (equation G)(Jacobsson et al., 2003; Kirchmair et al., 2008).
where, TP is the number of active compounds captured by the virtual screening method (true positives), FN is the number of active compounds discarded by the virtual screening method, TN is the number of discarded decoys (presumably inactives), while FP is the number of captured decoys (presumably inactives).
A ROC curve is plotted by setting the score of the active molecule as the first threshold. Afterwards, the number of decoys within this cutoff is counted and the corresponding Se and Sp pair is calculated. This calculation is repeated for the active molecule with the second highest score and so forth, until the scores of all actives are considered as selection thresholds(Jacobsson et al., 2003; Kirchmair et al., 2008). The ROC curve representing ideal distributions, where no overlap between the scores of active molecules and decoys exists, proceeds from the origin to the upper-left corner until all the actives are retrieved and Se reaches the value of 1. Thus, the ideal ROC curve continues as a horizontal straight line to the upper-right corner where all actives and all decoys are retrieved, which corresponds to Se = 1 and Sp = 0. In contrast to that, the ROC curve for a set of actives and decoys with randomly distributed scores tends towards the Se = 1-Sp line asymptotically with increasing number of actives and decoys(Jacobsson et al., 2003; Kirchmair et al., 2008).
The success of a particular virtual screening workflow can be judged from the following criteria:
1) Area under the ROC curve (AUC): In an optimal ROC curve an AUC value of 1 is obtained; however, random distributions cause an AUC value of 0.5. Virtual screening that performs better than a random discrimination of actives and decoys retrieve an AUC value between 0.5 and 1(Jacobsson et al., 2003; Kirchmair et al., 2008)
2) Overall accuracy (ACC): describes the percentage of correctly classified molecules by the screening protocol (equation H). Testing compounds are assigned a binary score value of zero (compound not captured) or one (compound captured) (Jacobsson et al., 2003; Kirchmair et al., 2008)
where, N is the total number of compounds in the testing database, A is the number of true actives in the testing database.
3) Overall specificity (SPC): describes the percentage of discarded inactives by the particular virtual screening workflow. Inactive test compounds are assigned a binary score value of zero (compound not captured) or one (compound captured)(Jacobsson et al., 2003; Kirchmair et al., 2008).
4) Overall true positive rate (TPR or overall sensitivity): describes the fraction percentage of captured actives from the total number of actives. Active test compounds are assigned a binary score value of zero (compound not captured) or one (compound captured).
5) Overall false negative rate (FNR or overall percentage of discarded actives): describes the fraction percentage of active compounds discarded by the virtual screening method. Discarded active test compounds are assigned a binary score value of zero (compound not captured) or one (compound captured).
SM-5.Addition of Exclusion Volumes
In HIPHOP-REFINE the user defines how many molecules must map the selected pharmacophore hypothesis completely or partially through controling the Principal and Maximum Omitted Features (MaxOmitFeat) parameters. Active compounds are normally assigned a MaxOmitFeat parameter of zero and Principal value of 2 to instruct the software to consider all their chemical moieties to fit them against all the pharmacophoric features of the particular hypothesis. On the other hand, inactive compounds are allowed to miss one (or more) features by assigning them a MaxOmitFeat of 1 (or 2) and Principal value of zero. Moderately active compounds are normally assigned a principal value of 1 and a MaxOmitFeat of 0 or 1 to encode their intermediate status(Clement and Mehl, 2000).
A subset of training compounds was carefully selected for HIPHOP-REFINE modeling. It was decided to consider Ki of 2690nM as an arbitrary activity/inactivity threshold. Accordingly, inhibitors of Ki values ≤ 2690 nM were regarded as ‘‘actives’’ and were assigned principal and MaxOmitFeat values of 2 and 0, respectively, while less active inhibitors were assigned principal values of one or zero and were carefully evaluated to assess whether their lower potencies are attributable to missing one or more pharmacophoric features, i.e., compared to active compounds (MaxOmitFeat = 1 or 2), or related to possible steric clashes within the binding pocket (MaxOmitFeat = 0). HIPHOP-REFINE was configured to allow a maximum of 100 exclusion spheres to be added to the generated pharmacophoric hypotheses.
Table AThe structures of Human Neutrophil Elastase (HNE) inhibitors utilized in modeling
Compound / R / Ki (nM) / Reference1 / / 264 / (Ohmoto et al., 2001)
2 / / 42.8 / ibid
3 / (HCl salt) / 111 / ibid
4 / / 12.2 / ibid
Compound / R / Ki(nM) / ibid
5 / / 40.8 / ibid
6 / / 4.73 / ibid
7 / / 0.85 / ibid
8 / / 1.87 / ibid
9 / / 5.09 / ibid
10 / / 10.4 / ibid
11 / / 8.10 / ibid
12 / / 5.00 / ibid
13 / / 1.47 / ibid
14 / / 6.04 / ibid
15 / / 6.63 / ibid
16 / / 14.7 / ibid
17 / / 4.01 / ibid
18 / / 1.67 / ibid
19 / / 4.27 / ibid
20 / / 4.79 / ibid
21 / / 8.45 / ibid
22 / / 2.50 / ibid
23 / / 3.45 / ibid
Compound / X / Ki(nM) / ibid
24 / H / 3.41 / ibid
25 / 2-Me / 1.7 / ibid
26 / 3-Me / 4.10 / ibid
27 / 4-Me / 1.6 / ibid
28 / 2-OMe / 3.2 / ibid
29 / 4-OMe / 1.50 / ibid
30 / 2-CF3 / 4.90 / ibid
31 / 4-CF3 / 11.0 / ibid
Compound / X / Ki(nM) / ibid
32 / F / 2.00 / ibid
33 / Cl / 2.20 / ibid
34 / CN / 1.90 / ibid
35 / OCF3 / 9.90 / ibid
36 / OEt / 8.90 / ibid
37 / OBn / 4.60 / ibid
38 / OH / 4.20 / ibid
39 / NMe2 / 6.40 / ibid
Compound / X / Y / KI(nM) / Reference
40 / OMe / Me / 5.00 / ibid
41 / OMe / OMe / 5.10 / ibid
42 / F / Me / 3.10 / ibid
43 / F / OMe / 3.30 / ibid
44 / F / F / 1.70 / ibid
45 / Me / Me / 4.60 / ibid
Compound / R / Ki(nM) / Reference
46 / / 2.25 / (Inoue et al., 2009)
47 / / 7.10 / ibid
48 / / 7.99 / ibid
49 / / 4.14 / ibid
50 / / 9.76 / ibid
51 / / 7.99 / ibid
52 / / 12.72 / ibid
53 / / 12.42 / ibid
54 / / 10.35 / ibid
55 / / 13.90 / ibid
56 / / 6.80 / ibid
57 / / 5.03 / ibid
58 / / 18.34 / ibid
59 / / 8.87 / ibid
60 / / 18.33 / ibid
Compound / / R1 / Ki (nM) / Reference
61 / / / 6.21 / ibid
62 / / / 5.92 / ibid
63 / / / 1.95 / ibid
64 / / / 1.95 / ibid
65 / / CF3 / 8.28 / ibid
66 / / CF3 / 295.77 / ibid
67 / / CF3 / 8.58 / ibid
Compound / B5 / P2 / X / P6 / Ki(nM) / Reference
68 / / / N / / 41.32 / (Shreder et al., 2009)
69 / / / C / / 115.16 / ibid
70 / / / N / / 6300 / ibid
71 / / / N / / 609.68 / ibid
72 / / / N / / 57.58 / ibid
73 / / / N / / 58.94 / ibid
74 / / / N / / 480.97 / ibid
75 / / / N / / 108.39 / ibid
76 / / / N / / 264.9 / ibid
77 / / / N / / 4064.52 / ibid
78 / / / N / / 5487.10 / ibid
79 / / / N / / 74.52 / ibid
80 / / / N / / 43.35 / ibid
Compound / Series / B5 / B7 / Ki(nM) / Reference
81 / A / / / 63 / ibid
82 / B / / / 53.52 / ibid
83 / C / / / 50.13 / ibid
84 / A / / / 5.49 / ibid
85 / B / / / 5.83 / ibid
86 / C / / / 5.76 / ibid
87 / A / / / 13.55 / ibid
88 / B / / / 18.97 / ibid
89 / C / / / 18.97 / ibid
90 / A / / / 3590.32 / ibid
91 / C / / / 1761.29 / ibid
92 / A / / / 169.35 / ibid
93 / B / / / 121.94 / ibid
94 / C / / / 48.10 / ibid
95 / A / / / 142.26 / ibid
96 / B / / / 223.55 / ibid
97 / C / / / 94.84 / ibid
98 / A / / / 81.29 / ibid
99 / B / / / 94.84 / ibid
100 / C / / / 74.52 / ibid
101 / A / / / 101.61 / ibid
102 / B / / / 880.65 / ibid
103 / C / / / 575.81 / ibid
104 / A / / / 10161.29 / ibid
D E
Compound / Series-R GROUP / KI(nM) / Reference
D / E
105 / / ------ / 15.58 / ibid
106 / ------ / (R)-3-CO2H / 12.87 / ibid
107 / / ------ / 23.03 / ibid
108 / / ------ / 18.97 / ibid
109 / / ------ / 10.84 / ibid
110 / / ------ / 18.29 / ibid
111 / / ------ / 14.23 / ibid
112 / / ------ / 14.90 / ibid
113 / / ------ / 10.84 / ibid
114 / / ------ / 16.26 / ibid
115 / / ------ / 14.23 / ibid
Table B Training subsets employed in exploring the pharmacophoric space of HNE inhibitors.
subsets / Most activea / Moderate active / Least activeb / No. of compoundsI / 7,17,18,28,29,37,38
43, 46,57,62,63 / 4,10,69,74,76,87,91,93,103,106,108,109 / 70,77,78,90,104 / 29
II / 19,27,30,33,37,42,46 / 69,74,76,77,90,91,108 / 70,78,104 / 17
III / 13,17,19,27,30,33,44,63,65 / 5,68,69,73,77,79,80,112,114 / 70,78,104 / 21
IV / 46,57,59,63,64 / 1,2,68,72,73,75,76,77,78 ,80,83,88,101 / 70 / 19
V / 80,86,89,109,113,114 / 69,71,72,73,74,75,76,79 / 70,77,78,104 / 18
VI / 85,88,105,106 / 69,71,73,74,75,76,79,82,98 / 70,77,78 / 16
VII / 68,80,84,86,89,108, 109,110,111 / 69,72,73,75 / 70,77,78,104 / 17
a,bPotency categories as defined by equations (A) and (B) under SM-2 in Supplementary Materials.