COMPUTATIONAL APPROACH-BASED STRUCTURE-ACTIVITY RELATIONSHIP OF BACE1 INHIBITORS USING MONTE CARLO ALGORITHM
Md Lutful Islam1,2*#, Z. A. Usmani2, Gulabchand K. Gupta1
1Department of Computer Science and Engineering, Shri Jagdish Prasad Jhabarmal Tibrewala University, Vidyanagari, Jhunjhunu, Rajasthan – 333 001
2Department of Computer Engineering, M.H. Saboo Siddik College of Engineering, 8, Saboo Siddik Polytechnic Road, Byculla, Mumbai 400008
Corresponding author: Md Lutful Islam:
Email: ,
Phone: +91-9869972799
#Research scholar
Abstract
In the current manuscript, a Monte Carlo algorithm-based quantitative structure-activity relationship (QSAR) study was performed on BACE1 inhibitors to identify the chemical functionalities important for therapeutic application in Alzheimer’s disease (AD). The molecular descriptors were derived from the simplified molecular-input line-entry system (SMILES) representation of the chemical structures. The dataset was collected from BindingDB and divided into training, calibration, test and validation sets. The training set was used to develop the models, whereas the remaining sets were used to evaluate the predictive ability of the models. Two models were generated, one without and one with the influence of cyclic rings on the inhibitory activity. Analysis of the statistical parameters showed that both models are statistically robust. Comparison of the statistical parameters revealed that cyclic rings have an impact on inhibitory activity. Several SMILES attributes were found to increase or decrease biological activity, which suggests that both models have a mechanistic interpretation. Therefore, an understanding of the models can guide the design and discovery of potential BACE1 inhibitors for AD.
Key words: QSAR, CORAL, Monte Carlo, BACE1, Alzheimer’s disease
1. Introduction
Alzheimer's disease (AD) is an incurable neurodegenerative disease that alters the mental condition of patients and is highly prevalent in old age [1-4]. AD is the most common cause of senile dementia, which is characterised by loss of memory, disorientation, difficulty in speaking or writing, loss of reasoning skills, and misunderstandings, among other symptoms [5]. AD is the top cause of disability in later life. As per reports by the World Health Organization (WHO), an estimated 5.4 million people of all ages in the United States had AD in 2016, whereas more than 40 million people worldwide were living with AD or a related dementia in 2016. The prevalence of AD is increasing exponentially, such that the number of affected people is expected to double by 2030 and more than triple by 2050 [6]. Moreover, other risk factors associated with the development of AD have been identified, including hypertension, dyslipidemia, metabolic syndrome and diabetes [7-9]. To date, there are no effective curative agents for this devastating disease, and treatment is limited to the management of symptoms [1].
Beta-site amyloid precursor protein cleaving enzyme 1 (BACE1) has been recognised as a significant target for AD intervention, as its inhibition would halt the formation of extracellular aggregates of amyloid-β protein (Aβ) at the very beginning of β-amyloid precursor protein (APP) processing [1]. Slowing the development of AD by inhibiting Aβ formation at an early stage is therefore an ideal approach. Although the exact mechanism of action of BACE1 is not entirely known, it is thought to be involved in myelin sheath formation and other biological processes, most of them associated with the processing of neuregulin-1 [10]. Experimentally, it has been shown that inhibition of the BACE1 enzyme could be clinically feasible with few mechanistic side effects [11-13]. Therefore, BACE1 can be used as a potential target to identify promising inhibitors for the management of AD.
The primary goal of the current research was to derive a correlation between chemical structure, expressed in terms of descriptors, and the corresponding biological activity with the help of pharmacoinformatics techniques such as quantitative structure-activity relationship (QSAR) modelling. QSAR is a statistically validated mathematical relationship between experimentally measured micro- or nano-level properties and the chemical structures of inhibitors or activators of a selected receptor. Therefore, for QSAR model development using the Monte Carlo algorithm, the SMILES (simplified molecular-input line-entry system) representation of BACE1 inhibitors was adopted to identify the chemical functionalities important for designing promising BACE1 inhibitors for therapeutic application in AD.
2. Materials and methods
2.1 Dataset
In order to construct the QSAR models, a set of 1400 BACE1 inhibitors with inhibition constants (Ki) was collected from the Binding Database (BindingDB). From this dataset, all duplicate compounds were deleted, and molecules with no activity or without a definite activity value were also eliminated. Finally, 411 molecules were considered for model development. The whole dataset was converted into SMILES format for input using the OpenBabel software (http://openbabel.org/wiki/Main_Page). The inhibitory activities were converted into pKi as

$$pK_i = \log\left(\frac{1}{K_i} \times 10000\right)$$

and pKi was considered as the endpoint of the QSAR. The dataset was randomly divided into a training set, a calibration set, a test set and a validation set. The training set was used for model development, whereas the calibration and test sets were used to evaluate the predictive capability of the model. The validation set was used for a further assessment of the model.
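The endpoint conversion and the random split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logarithm is taken as base 10, the seed and the compound identifiers are hypothetical, and the subset sizes (203, 69, 69 and 70) are the ones reported in Table 1. The function names `pki_from_ki` and `random_split` are introduced here for illustration and are not part of CORAL or OpenBabel.

```python
import math
import random


def pki_from_ki(ki):
    """Convert an inhibition constant Ki into pKi = log((1/Ki) * 10000).

    Assumption: base-10 logarithm, with Ki expressed in the same unit
    for every compound.
    """
    if ki <= 0:
        raise ValueError("Ki must be positive")
    return math.log10((1.0 / ki) * 10000.0)


def random_split(items, sizes, seed=0):
    """Shuffle the items and cut them into consecutive subsets of the given sizes."""
    if sum(sizes) != len(items):
        raise ValueError("subset sizes must add up to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for reproducibility
    subsets, start = [], 0
    for size in sizes:
        subsets.append(shuffled[start:start + size])
        start += size
    return subsets


# Hypothetical identifiers for the 411 curated compounds.
compound_ids = list(range(411))
training_set, calibration_set, test_set, validation_set = random_split(
    compound_ids, [203, 69, 69, 70]
)
```

Fixing the seed makes the split reproducible, which matters when two models are to be compared on the same subsets.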
2.2 Optimal Descriptor
The SMILES representations of the molecular structures were used for descriptor generation. The optimal descriptor, the descriptor of correlation weights (DCW), is a mathematical function of the correlation weights (CW) of the SMILES attributes. In order to calculate the DCW, the Monte Carlo method was used. The DCW was calculated without and with the influence of cyclic rings using the following equations (1) and (2), respectively
(1)
(2)
where T and Nepoch are the threshold and the number of epochs, respectively, used in the Monte Carlo optimization. Sk, SSk and SSSk represent one, two and three SMILES symbols, respectively. NOSP, HALO, BOND and PAIR are descriptors derived from the presence or absence of different chemical elements and bonds. α, β, γ, x, y and t are discrete coefficients that take the values 0 or 1. Details of the above descriptors, with examples, are given by Worachartcheewan et al. [14]. C3, C4, C5, C6 and C7 denote three-, four-, five-, six- and seven-membered rings, respectively. The goal of the optimization was to maximise the correlation coefficient between the optimal descriptor and the endpoints of the training set molecules.
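To make the role of the local SMILES attributes concrete, the following minimal Python sketch splits a SMILES string into one-, two- and three-symbol attributes (Sk, SSk, SSSk) and sums their correlation weights. It is a simplified illustration, not CORAL's implementation: the tokenizer, the function names and the toy weights are assumptions, the global descriptors (NOSP, HALO, BOND, PAIR) and the ring terms (C3 to C7) are omitted, and attributes without an assigned weight are assumed to contribute 0.

```python
import re

# Simplified tokenizer (assumption): two-letter halogens, '@@' and two-digit
# ring closures are kept as single symbols; everything else is one character.
_TOKEN = re.compile(r"Cl|Br|@@|%\d{2}|.")


def smiles_attributes(smiles):
    """Return the Sk, SSk and SSSk attributes of a SMILES string as tuples."""
    tokens = _TOKEN.findall(smiles)
    one = [(a,) for a in tokens]
    two = list(zip(tokens, tokens[1:]))
    three = list(zip(tokens, tokens[1:], tokens[2:]))
    return one, two, three


def dcw(smiles, cw, alpha=1, beta=1, gamma=1):
    """Sum the correlation weights of the local SMILES attributes.

    alpha, beta and gamma (0 or 1) switch the Sk, SSk and SSSk terms on or off.
    Attributes missing from `cw` are given a weight of 0 (assumption).
    """
    one, two, three = smiles_attributes(smiles)

    def total(attributes):
        return sum(cw.get(a, 0.0) for a in attributes)

    return alpha * total(one) + beta * total(two) + gamma * total(three)


# Toy correlation weights, invented for illustration only.
toy_cw = {("O",): 3.5, ("=",): -2.0, ("C", "="): 1.0}
toy_value = dcw("C=O", toy_cw)  # 3.5 - 2.0 + 1.0 = 2.5
```

Setting a coefficient to 0 removes the corresponding attribute class from the descriptor, which is how the discrete coefficients select which SMILES features enter a model.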
To calculate the CWs, the Monte Carlo method was applied so as to obtain optimal statistical results for the test set molecules. To identify the preferable threshold (T*) and number of epochs (N*), T and Nepoch were varied over the ranges 1-10 and 1-30, respectively. The endpoints of the training set were calculated using the following equation
$$\text{Endpoint} = C_0 + C_1 \times DCW(T^*, N^*) \qquad (3)$$
where ‘Endpoint’ is the inhibitory activity (pKi) and C0 and C1 are constants.
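The optimization loop can be illustrated with a deliberately simplified random-search sketch in Python. It is not the procedure implemented in CORAL. The assumptions are: an attribute found in fewer than `threshold` training molecules is treated as rare and given a weight of 0; each epoch perturbs every active weight once and keeps the change only if the training-set correlation improves; and the endpoint is modelled as a straight line in the descriptor, Endpoint = C0 + C1 × DCW. All function names and the toy data are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def descriptor(attributes, cw):
    """DCW of one molecule: sum of the weights of its attributes."""
    return sum(cw.get(a, 0.0) for a in attributes)


def optimise_weights(molecules, endpoints, threshold, n_epochs, step=0.5, seed=0):
    """Greedy Monte Carlo search for weights that maximise the correlation."""
    rng = random.Random(seed)
    counts = {}
    for attributes in molecules:
        for a in set(attributes):
            counts[a] = counts.get(a, 0) + 1
    # Rare attributes (below the threshold) are left out, i.e. weight 0.
    active = sorted(a for a, c in counts.items() if c >= threshold)
    cw = {a: 1.0 for a in active}

    def score():
        return pearson_r([descriptor(m, cw) for m in molecules], endpoints)

    best = score()
    for _ in range(n_epochs):
        for a in active:
            old = cw[a]
            cw[a] = old + rng.uniform(-step, step)  # random trial move
            trial = score()
            if trial > best:
                best = trial   # keep the improvement
            else:
                cw[a] = old    # otherwise undo the move
    return cw, best


def fit_line(xs, ys):
    """Least-squares constants C0 and C1 for Endpoint = C0 + C1 * DCW."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values are constant")
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - c1 * mx, c1


# Toy training data: attribute lists and endpoints invented for illustration.
molecules = [["C", "C", "O"], ["C", "N"], ["C", "=", "O"], ["N", "N", "O"], ["C", "C", "C"]]
endpoints = [5.1, 6.0, 4.2, 7.3, 4.0]
weights, best_r = optimise_weights(molecules, endpoints, threshold=1, n_epochs=8)
c0, c1 = fit_line([descriptor(m, weights) for m in molecules], endpoints)
```

In a full workflow, a search of this kind would be repeated for each T in 1-10 and each Nepoch in 1-30, and the pair giving the best statistics on the held-out molecules would be retained as (T*, N*).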
2.3 Validation
In order to select the best statistical model, it is essential to validate the models in terms of predictive ability and reliability. In the current study, the models were validated using several statistical parameters, including the correlation coefficient (R2), the standard error of estimation (s), the modified correlation coefficient (r2m) and the Fisher F-ratio (F).
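The validation statistics can be computed from observed and predicted endpoints as sketched below in Python. The definitions are assumptions based on commonly used forms, since the text does not spell them out: R2 as the squared Pearson correlation, s with n − 2 degrees of freedom, F for a one-descriptor linear model, and r2m in the form r2 × (1 − sqrt(|r2 − r0^2|)), where r0^2 comes from a regression through the origin. The function names are hypothetical.

```python
import math


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    if soo == 0 or spp == 0:
        raise ValueError("observed and predicted values must not be constant")
    return sop ** 2 / (soo * spp)


def standard_error(obs, pred):
    """Standard error of estimation with n - 2 degrees of freedom (assumed)."""
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(rss / (len(obs) - 2))


def f_ratio(r2, n):
    """Fisher F-ratio for a linear model with a single descriptor."""
    return r2 / (1.0 - r2) * (n - 2)


def rm_squared(obs, pred):
    """Modified r2 in the assumed form r2 * (1 - sqrt(|r2 - r0^2|))."""
    r2 = r_squared(obs, pred)
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mo = sum(obs) / len(obs)
    r02 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(obs, pred)) / sum(
        (o - mo) ** 2 for o in obs
    )
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


# Consistency check against the training set of Model 1 in Table 1
# (n = 203, R2 = 0.615): the one-descriptor F-ratio evaluates to about 321.
training_f = f_ratio(0.615, 203)
```

With the reported training-set values of Model 1, this F-ratio formula reproduces the tabulated F = 321, which supports reading F as the Fisher F-ratio of a single-descriptor model.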
3. Results and discussion
A set of more than 1400 BACE1 inhibitors with inhibition constants (Ki) was collected from the Binding Database, and duplicates and molecules with no or indefinite biological activity were subsequently deleted. Finally, 411 BACE1 inhibitors were considered for QSAR model development using the freely available online CORAL software (http://www.insilico.eu/coral/), which is based on the Monte Carlo optimization method. The molecular descriptors were generated from the SMILES representations of the molecular structures, and QSAR models were then developed and validated. For the development of the QSAR models, descriptors were generated using two approaches, namely without and with considering the impact of the cyclic rings of the molecules on inhibitory activity. The best (T*, N*) were found to be (3, 8) and (3, 1) without and with consideration of rings, respectively. The training set was used for model generation, while the remaining sets, i.e., the calibration, test and validation sets, were used to validate the models.
Table 1: Statistical parameters of Models 1 and 2
Model 1 (without considering the influence of cyclic rings):
Training set / n = 203; R2 = 0.615; s = 0.719; F = 321; r2m = 0.603
Calibration set / n = 69; R2 = 0.661; s = 0.769; F = 131
Test set / n = 69; R2 = 0.638; s = 0.748; F = 118
Validation set / n = 70; R2 = 0.531; s = 0.759; F = 51
Model 2 (considering the influence of cyclic rings):
Training set / n = 203; R2 = 0.689; s = 0.646; F = 444; r2m = 0.571
Calibration set / n = 69; R2 = 0.695; s = 0.732; F = 152
Test set / n = 69; R2 = 0.610; s = 0.745; F = 105
Validation set / n = 70; R2 = 0.549; s = 0.721; F = 49
The QSAR models were generated using two approaches, namely without and with considering the influence of cyclic rings on the inhibitory activity of the training set molecules. Accordingly, Model 1 was developed without any influence of cyclic rings on the inhibitory activity, whereas Model 2 was established with the contribution of cyclic rings to the inhibitory activity. Both models, along with their statistical parameters, are given in Table 1. The data in Table 1 indicate that both models are statistically validated and able to predict the activity of molecules beyond the training set. The values of R2 and r2m for the training set (above 0.6 and 0.5, respectively, for both models) indicate that both models predict the biological activity reasonably close to the experimental activity. The observed and predicted activities as per Models 1 and 2 are depicted in Figures 1 and 2, respectively.
Figure 1: Experimental and predicted inhibitory activity as per Model 1
Figure 2: Experimental and predicted inhibitory activity as per Model 2
It can be seen that the R2 values of the training, calibration and validation sets are higher for Model 2 than for Model 1 (0.689 vs 0.615, 0.695 vs 0.661 and 0.549 vs 0.531, respectively), although the R2 of the test set is slightly lower (0.610 vs 0.638). This suggests that the cyclic rings of the molecules in the dataset have an impact on inhibitory activity. In order to check how closely the experimental and predicted activities of the training set agree, radar plots were generated for the training set molecules of Models 1 and 2 and are shown in Figure 3. The substantial overlap between the observed and predicted inhibitory activity values in the radar plots indicates that both models are adequately able to predict the biological activity of the training set molecules.
Figure 3: Radar plot showing fitness of predicted and experimental activity of training set molecules of Models 1 and 2
To extract chemical information, the molecular fragments derived from SMILES were interpreted and analysed using the correlation weights from the QSAR models. The molecular features were grouped into the following categories: i) features with positive correlation weights in all runs, which increase the endpoint value; ii) features with negative correlation weights in all runs, which decrease the endpoint value; and iii) features with both positive and negative correlation weights across runs. That is, a correlation weight > 0 implies that the molecular feature has a positive influence on the inhibitory activity, and a correlation weight < 0 implies a negative influence. If the correlation weights of a molecular feature are a mix of values greater than and less than 0 across runs, its impact on the inhibitory activity is undefined.
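The three categories above amount to a simple sign test on the correlation weights obtained in repeated runs. A minimal Python sketch follows; the function name, the labels and the example weights are hypothetical and serve only to illustrate the rule.

```python
def classify_attribute(weights_per_run):
    """Classify a SMILES attribute from its correlation weights over several runs."""
    if not weights_per_run:
        raise ValueError("at least one run is required")
    if all(w > 0 for w in weights_per_run):
        return "increase"   # positive in every run: promotes the endpoint
    if all(w < 0 for w in weights_per_run):
        return "decrease"   # negative in every run: lowers the endpoint
    return "undefined"      # mixed signs (or zeros): no stable role


# Invented weights from three hypothetical runs, for illustration only.
example_runs = {
    "attribute_a": [3.5, 2.9, 4.1],
    "attribute_b": [-2.0, -1.5, -2.6],
    "attribute_c": [1.2, -0.4, 0.8],
}
roles = {name: classify_attribute(ws) for name, ws in example_runs.items()}
```

Requiring the same sign in every run is a conservative rule: an attribute whose weight changes sign between runs is not assigned a role.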
In the current research, one of the molecules from the training set was considered to explore the impact of molecular features on pKi. In the case of Model 1, it was observed that the SMILES attributes ‘O…..’, ‘N…..’, ‘=…..2…..’, ‘[…..(…..N…..’, ‘=…..O…..(…..’, ‘C…..(…..[…..’ and ‘[…..2…..=…..’ contributed positively to the inhibitory activity, with correlation weights of 3.498, 3.751, 8.004, 4.752, 7.603, 4.502 and 11.001, respectively. On the other hand, the SMILES attributes ‘@@…..’, ‘=…..’, ‘O…..1…..’, ‘C…..@…..’, ‘C…..(…..[…..’ and ‘=…..C…..1…..’ were found to contribute negatively to the inhibitory activity, with CW values of -4.003, -2.005, -5.000, -3.249, -4.003 and -5.002, respectively. In the case of Model 2, the attributes ‘O…..’, ‘N…..’, ‘[…..C…..’, ‘O…..(…..’, ‘N…..1…..’, ‘[…..(…..N…..’, ‘(…..N…..(…..’, ‘O…..=…..C…..’ and ‘C…..=…..2…..’ had positive correlation weights, indicating a positive impact on pKi, with CW values of 2.745, 2.999, 2.496, 3.250, 5.996, 3.506, 4.999, 5.501 and 8.004, respectively. The SMILES attributes that contributed negatively in Model 2 were ‘=…..’, ‘1…..’, ‘(…..’, ‘C…..1…..’, ‘O…..1…..’, ‘[…..2…..’, ‘=…..C…..1…..’, ‘O…..1…..C…..’ and ‘C…..O…..1…..’, with correlation weights of -2.503, -3.004, -2.503, -5.004, -3.999, -5.005, -3.999, -3.997 and -5.001, respectively. The absence of halogen atoms (‘HALO00000000’) was found to decrease the value of pKi in both models. The attribute ‘BOND10100000’ was found to enhance the inhibitory activity in both Models 1 and 2. The attribute ‘NOSP11000000’, that is, the presence of ‘N’ and ‘O’ and the absence of ‘S’ and ‘P’, contributed negatively in Model 1 but positively in Model 2.
Therefore, the above discussion shows that the models developed from BACE1 inhibitors with the inhibition constant as the endpoint have a mechanistic interpretation, in that contributions to the activity can be attributed to specific atoms and/or molecular fragments.
4. Conclusions
A set of BACE1 inhibitors with known inhibition constants was used to generate predictive QSAR models using the online CORAL software. The molecules were represented in SMILES format to derive the descriptors, and statistically robust QSAR models were then developed with and without considering the influence of cyclic rings on the inhibitory activity. Several statistical parameters were calculated to evaluate the predictive ability of the models. Linear and radar plots of the experimental and predicted inhibitory activities both showed a good fit for the molecules. The statistical parameters indicated that cyclic rings influence the inhibition of BACE1. Both models were interpreted mechanistically in terms of the SMILES attributes that contribute to the inhibition of BACE1. The models developed in the current study can be used to predict the inhibitory activity of BACE1 inhibitors prior to experimental testing. Finally, it can be concluded that the reported models can be used to design and discover promising BACE1 inhibitors for therapeutic application in Alzheimer’s disease.