Supporting Materials for

Predictive models for identifying the binding activity of structurally diverse chemicals to human pregnane X receptor

Cen Yin 1, Xianhai Yang 2,*, Mengbi Wei 1 and Huihui Liu 1,*

1 Jiangsu Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province, China.

2 Nanjing Institute of Environmental Sciences, Ministry of Environmental Protection, Jiang-wang-miao Street, Nanjing, 210042, China.

* Corresponding Author. Tel./fax: +86 02584315827 and +86 02585287057. E-mail address: and .


Table S4 Statistical parameters of the developed classification models.

Model / Descriptors / k / Dataset / TP / TN / FN / FP / Sn / Sp / Q
1 / RBN, ChiA_G / 3 / training set / 399 / 1059 / 162 / 423 / 0.711 / 0.715 / 0.714
1 / RBN, ChiA_G / 3 / validation set / 144 / 350 / 43 / 144 / 0.770 / 0.709 / 0.725
2 / nCIC, ChiA_Dz(Z) / 3 / training set / 397 / 1080 / 164 / 402 / 0.708 / 0.729 / 0.723
2 / nCIC, ChiA_Dz(Z) / 3 / validation set / 134 / 360 / 53 / 134 / 0.717 / 0.729 / 0.725
3 / nCIC, ChiA_Dz(i) / 3 / training set / 404 / 1082 / 157 / 400 / 0.720 / 0.730 / 0.727
3 / nCIC, ChiA_Dz(i) / 3 / validation set / 141 / 363 / 46 / 131 / 0.754 / 0.735 / 0.740
4 / nCIR, DLS_cons / 3 / training set / 395 / 1051 / 166 / 431 / 0.704 / 0.709 / 0.708
4 / nCIR, DLS_cons / 3 / validation set / 132 / 364 / 55 / 130 / 0.706 / 0.737 / 0.728
5 / TRS, ChiA_G / 3 / training set / 413 / 1047 / 148 / 435 / 0.736 / 0.706 / 0.715
5 / TRS, ChiA_G / 3 / validation set / 137 / 347 / 50 / 147 / 0.733 / 0.702 / 0.711
6 / Rperim, ChiA_G / 3 / training set / 408 / 1059 / 153 / 423 / 0.727 / 0.715 / 0.718
6 / Rperim, ChiA_G / 3 / validation set / 143 / 346 / 44 / 148 / 0.765 / 0.700 / 0.718
7 / RCI, ChiA_D / 3 / training set / 410 / 1067 / 151 / 415 / 0.731 / 0.720 / 0.723
7 / RCI, ChiA_D / 3 / validation set / 131 / 347 / 56 / 147 / 0.701 / 0.702 / 0.702
8 / RCI, ChiA_Dz(i) / 3 / training set / 407 / 1071 / 154 / 411 / 0.725 / 0.723 / 0.723
8 / RCI, ChiA_Dz(i) / 3 / validation set / 139 / 351 / 48 / 143 / 0.743 / 0.711 / 0.720
9 / NRS, ChiA_Dz(Z) / 3 / training set / 408 / 1077 / 153 / 405 / 0.727 / 0.727 / 0.727
9 / NRS, ChiA_Dz(Z) / 3 / validation set / 131 / 354 / 56 / 140 / 0.701 / 0.717 / 0.712
10 / NRS, ChiA_Dz(i) / 3 / training set / 414 / 1074 / 147 / 408 / 0.738 / 0.725 / 0.728
10 / NRS, ChiA_Dz(i) / 3 / validation set / 140 / 351 / 47 / 143 / 0.749 / 0.711 / 0.721
11 / NNRS, ChiA_Dz(i) / 3 / training set / 414 / 1063 / 147 / 419 / 0.738 / 0.717 / 0.723
11 / NNRS, ChiA_Dz(i) / 3 / validation set / 142 / 352 / 45 / 142 / 0.759 / 0.713 / 0.725
12 / nR06, ChiA_D / 3 / training set / 421 / 1057 / 140 / 425 / 0.750 / 0.713 / 0.723
12 / nR06, ChiA_D / 3 / validation set / 137 / 348 / 50 / 146 / 0.733 / 0.704 / 0.712
13 / nR06, ChiA_Dz(i) / 3 / training set / 423 / 1041 / 138 / 441 / 0.754 / 0.702 / 0.717
13 / nR06, ChiA_Dz(i) / 3 / validation set / 145 / 349 / 42 / 145 / 0.775 / 0.706 / 0.725
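The Sn, Sp and Q columns of Table S4 can be recomputed from the TP, TN, FN and FP counts. The sketch below assumes the usual definitions, which the supplement does not state explicitly: Sn = TP / (TP + FN) (sensitivity), Sp = TN / (TN + FP) (specificity) and Q = (TP + TN) / (TP + TN + FN + FP) (overall accuracy). The class and function names are illustrative only; the counts are those of model 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for one model on one data set."""
    tp: int  # actives predicted active
    tn: int  # inactives predicted inactive
    fn: int  # actives predicted inactive
    fp: int  # inactives predicted active

    @property
    def sensitivity(self) -> float:
        # Assumed definition of Sn: fraction of actives recovered.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Assumed definition of Sp: fraction of inactives recovered.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Assumed definition of Q: fraction of all chemicals classified correctly.
        total = self.tp + self.tn + self.fn + self.fp
        return (self.tp + self.tn) / total


def summarize(counts: ConfusionCounts) -> tuple:
    """Return (Sn, Sp, Q) rounded to three decimals, as in Table S4."""
    return tuple(
        round(value, 3)
        for value in (counts.sensitivity, counts.specificity, counts.accuracy)
    )


# Counts copied from Table S4, model 1 (RBN, ChiA_G).
model1_training = ConfusionCounts(tp=399, tn=1059, fn=162, fp=423)
model1_validation = ConfusionCounts(tp=144, tn=350, fn=43, fp=144)

if __name__ == "__main__":
    print("training  ", summarize(model1_training))
    print("validation", summarize(model1_validation))
```

Under these assumed definitions the recomputed values for model 1 agree with the tabulated ones (0.711 / 0.715 / 0.714 for the training set and 0.770 / 0.709 / 0.725 for the validation set).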

Table S5 Description of the descriptors involved in the classification models.

ID / Name / Description / Block
1 / RBN / number of rotatable bonds / Constitutional indices
2 / nCIC / number of rings (cyclomatic number) / Ring descriptors
3 / nCIR / number of circuits / Ring descriptors
4 / TRS / total ring size / Ring descriptors
5 / Rperim / ring perimeter / Ring descriptors
6 / RCI / ring complexity index / Ring descriptors
7 / NRS / number of ring systems / Ring descriptors
8 / NNRS / normalized number of ring systems / Ring descriptors
9 / nR06 / number of 6-membered rings / Ring descriptors
10 / ChiA_G / average Randic-like index from geometrical matrix / 3D matrix-based descriptors
11 / ChiA_D / average Randic-like index from topological distance matrix / 2D matrix-based descriptors
12 / ChiA_Dz(Z) / average Randic-like index from Barysz matrix weighted by atomic number / 2D matrix-based descriptors
13 / ChiA_Dz(i) / average Randic-like index from Barysz matrix weighted by ionization potential / 2D matrix-based descriptors
14 / DLS_cons / DRAGON consensus drug-like score / Drug-like indices

Table S6 Description of the descriptors involved in the QSAR model of logEC20 and the corresponding VIF values.

ID / Name / Description / Block / VIF
1 / ChiA_RG / average Randic-like index from reciprocal squared geometrical matrix / 3D matrix-based descriptors / 1.90
2 / CATS2D_08_AL / CATS2D Acceptor-Lipophilic at lag 08 / CATS 2D / 1.76
3 / ATSC6p / Centred Broto-Moreau autocorrelation of lag 6 weighted by polarizability / 2D autocorrelations / 1.79
4 / GATS7s / Geary autocorrelation of lag 7 weighted by I-state / 2D autocorrelations / 1.10
5 / F07[C-S] / Frequency of C - S at topological distance 7 / 2D Atom Pairs / 1.10
6 / NNRS / normalized number of ring systems / Ring descriptors / 1.22
7 / MEcc / molecular eccentricity / Geometrical descriptors / 1.20
8 / F03[N-O] / Frequency of N - O at topological distance 3 / 2D Atom Pairs / 1.26
9 / R1u+ / R maximal autocorrelation of lag 1, unweighted / GETAWAY descriptors / 1.57
10 / H7p / H autocorrelation of lag 7, weighted by polarizability / GETAWAY descriptors / 2.50
11 / B09[N-O] / Presence/absence of N - O at topological distance 9 / 2D Atom Pairs / 1.31
12 / RDF065u / Radial Distribution Function - 065, unweighted / RDF descriptors / 3.96
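The supplement does not say how the VIF column was calculated. Under the common definition, the variance inflation factor of descriptor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing descriptor j on all the other descriptors; equivalently, VIF_j is the j-th diagonal element of the inverse of the descriptor correlation matrix. The following is a minimal standard-library sketch of that second form. The function names and the toy descriptor values are hypothetical and are not taken from the paper's data set.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of descriptor columns (lists of equal length)."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        if norm == 0.0:
            raise ValueError("constant descriptor: correlation is undefined")
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are perfectly collinear: VIF is undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each descriptor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical toy descriptors, for illustration only.
uncorrelated = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
correlated = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.0, 3.0, 4.0, 6.0]]

if __name__ == "__main__":
    print(variance_inflation_factors(uncorrelated))  # VIF near 1: no collinearity
    print(variance_inflation_factors(correlated))    # large VIF: strong collinearity
```

A VIF close to 1 indicates that a descriptor is nearly uncorrelated with the others. Commonly used rules of thumb flag values above 5 or 10 as problematic multicollinearity, and every value tabulated above is below 4.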

Table S7 Description of the descriptors involved in the QSAR model of logEC50 and the corresponding VIF values.

ID / Name / Description / Block / VIF
1 / F10[F-F] / Frequency of F-F at topological distance 10 / 2D Atom Pairs / 1.18
2 / DLS_01 / modified drug-like score from Lipinski (4 rules) / Drug-like indices / 2.61
3 / nRNHR / number of secondary amines (aliphatic) / Functional group counts / 1.41
4 / N% / percentage of N atoms / Constitutional indices / 1.43
5 / F08[N-Cl] / Frequency of N - Cl at topological distance 8 / 2D Atom Pairs / 1.12
6 / F10[N-O] / Frequency of N - O at topological distance 10 / 2D Atom Pairs / 1.59
7 / C-015 / =CH2 / Atom-centred fragments / 1.25
8 / CATS2D_06_DP / CATS2D Donor-Positive at lag 06 / CATS 2D / 2.12
9 / R6s+ / R maximal autocorrelation of lag 6, weighted by I-state / GETAWAY descriptors / 2.13
10 / Mor27u / signal 27, unweighted / 3D-MoRSE descriptors / 1.31
11 / DLS_05 / modified drug-like score from Zheng et al. (2 rules) / Drug-like indices / 1.24
12 / CATS2D_06_DD / CATS2D Donor-Donor at lag 06 / CATS 2D / 1.22
13 / nR12 / number of 12-membered rings / Ring descriptors / 1.08
14 / H0u / H autocorrelation of lag 0, unweighted / GETAWAY descriptors / 1.93
15 / nCH2RX / number of CH2RX / Functional group counts / 1.20
16 / CATS2D_08_AL / CATS2D Acceptor-Lipophilic at lag 08 / CATS 2D / 1.42
17 / B05[O-Cl] / Presence/absence of O - Cl at topological distance 5 / 2D Atom Pairs / 1.60
18 / CATS2D_06_AP / CATS2D Acceptor-Positive at lag 06 / CATS 2D / 2.86
19 / N-070 / Ar-NH-Al / Atom-centred fragments / 1.18
20 / Mor16m / signal 16, weighted by mass / 3D-MoRSE descriptors / 2.61
21 / H-053 / H attached to C0(sp3) with 2X attached to next C / Atom-centred fragments / 1.41


Fig. S1 Relationship between the percentage of active chemicals and the number of ring systems. The dashed line represents the percentage of active chemicals in the whole data set.

Fig. S2 Plot of ChiA_Dz(i) values versus activity classes of chemicals.
