SUPPORTING INFORMATION

The complexation of metal ions with various organic ligands in water:

Prediction of stability constants by QSPR ensemble modelling

Vitaly Solov’ev *[1], a, Natalia Kireeva a, b, Svetlana Ovchinnikova a, b, and Aslan Tsivadze a

a Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninskiy prospect, 31, 119071, Moscow, Russia

b Moscow Institute of Physics and Technology, Institutsky per., 9, 141700, Dolgoprudny, Russia

Table SI 1 The stability constant logK for the complexation of metal ions with organic ligands in water: predictive performances of the eMLR consensus models in 5CV a

no. / metal ion / RMSE (with AD) / R²det (with AD) / npred (with AD) / nmodel / RMSE (without AD) / R²det (without AD)
1 / Li+ / 0.49 / 0.712 / 26 / 300 – 1649 / 0.49 / 0.798
2 / Na+ / 0.63 / 0.682 / 39 / 153 – 1409 / 0.69 / 0.608
3 / K+ / 0.61 / 0.813 / 47 / 1119 – 1528 / 0.65 / 0.780
4 / Be2+ / 0.96 / 0.872 / 30 / 507 – 983 / 2.18 / 0.434
5 / Al3+ / 1.44 / 0.884 / 48 / 627 – 1074 / 2.44 / 0.651
6 / Ga3+ / 2.21 / 0.908 / 28 / 794 – 1206 / 2.78 / 0.886
7 / In3+ / 2.30 / 0.926 / 46 / 736 – 997 / 3.19 / 0.871
8 / VO2+ / 2.05 / 0.882 / 85 / 892 – 1434 / 2.22 / 0.846
9 / Fe3+ / 1.94 / 0.943 / 84 / 1267 – 1647 / 2.40 / 0.918
10 / Th4+ / 1.40 / 0.908 / 29 / 1533 – 1914 / 1.33 / 0.919
11 / NpO2+ / 0.89 / 0.809 / 23 / 310 – 2050 / 1.25 b / 0.818
12 / Am3+ / 1.59 / 0.942 / 17 / 1361 – 2243 / 2.62 / 0.867

a npred is the number of predicted logK values; here npred ≤ n (see n in Table 1) because some ligands were discarded by AD. nmodel is the number of individual models in the different CMs of the 5CV subsets.

b Without outlier compound 356 (see SDfile: Li_Na_K_Be_Al_Ga_In_V_Fe_Th_Np_Am.SDF).
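For readers who wish to recompute the "with AD" and "without AD" statistics from their own predictions, the following is a minimal sketch and not the authors' code. It assumes that RMSE is the root-mean-square error and that the determination coefficient is 1 - SS_res/SS_tot. The function names, the toy values, and the boolean `inside_ad` flags are hypothetical.

```python
import math


def rmse(y_exp, y_pred):
    """Root-mean-square error between experimental and predicted values."""
    sq = [(e - p) ** 2 for e, p in zip(y_exp, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def r2_det(y_exp, y_pred):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def evaluate(y_exp, y_pred, inside_ad):
    """Return (RMSE, R2det, count) with and without the AD filter."""
    # Keep only the ligands flagged as lying inside the applicability domain.
    kept = [(e, p) for e, p, ok in zip(y_exp, y_pred, inside_ad) if ok]
    exp_ad = [e for e, _ in kept]
    pred_ad = [p for _, p in kept]
    return {
        "with_AD": (rmse(exp_ad, pred_ad), r2_det(exp_ad, pred_ad), len(kept)),
        "without_AD": (rmse(y_exp, y_pred), r2_det(y_exp, y_pred), len(y_exp)),
    }


# Hypothetical toy data: five ligands, the last one outside the AD.
y_exp = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 8.0]
inside_ad = [True, True, True, True, False]

stats = evaluate(y_exp, y_pred, inside_ad)
print(stats["with_AD"])
print(stats["without_AD"])
```

In this toy case the ligand outside the AD is dropped, so the count falls from 5 to 4, which mirrors why npred ≤ n in the tables.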

Table SI 2 The stability constant logK for the complexation of metal ions with organic ligands in water: predictive performances of the SVM models in 5CV a

no. / metal ion / SMF type b / RMSE (with AD) / R²det (with AD) / npred (with AD) / RMSE (without AD) / R²det (without AD)
1 / Li+ / IAB(2 – 8) / 0.61 / 0.558 / 26 / 0.60 / 0.692
2 / Na+ / IAB(2 – 8)t / 0.55 / 0.761 / 39 / 0.54 / 0.766
3 / K+ / IAB(2 – 15)a / 0.73 / 0.733 / 47 / 0.72 / 0.735
4 / Be2+ / IAB(2 – 8)t / 1.04 / 0.851 / 30 / 1.44 / 0.752
5 / Al3+ / IAB(2 – 8)t / 1.95 / 0.788 / 48 / 1.88 / 0.793
6 / Ga3+ / IAB(2 – 8) / 2.31 / 0.899 / 28 / 2.93 / 0.874
7 / In3+ / IAB(2 – 15)t / 2.45 / 0.916 / 46 / 2.70 / 0.908
8 / VO2+ / IAB(2 – 15)a / 2.72 / 0.793 / 85 / 2.82 / 0.751
9 / Fe3+ / IAB(2 – 8)t / 2.21 / 0.926 / 84 / 2.68 / 0.898
10 / Th4+ / IAB(2 – 8)a / 2.12 / 0.790 / 29 / 2.11 / 0.797
11 / NpO2+ / IAB(2 – 15)t / 1.06 / 0.731 / 23 / 1.93 / 0.555
12 / Am3+ / IAB(2 – 8)t / 2.33 / 0.876 / 17 / 2.12 / 0.913

a npred is the number of predicted logK values; here npred ≤ n (see n in Table 1) because some ligands were discarded by AD.

b SMF type: see the notation in the Methods section (descriptors).

Table SI 3 The stability constant logK for the complexation of metal ions with organic ligands in water: predictive performances of the ASNN models in 5CV a

no. / metal ion / SMF type b / RMSE (with AD) / R²det (with AD) / npred (with AD) / RMSE (without AD) / R²det (without AD)
1 / Li+ / IAB(2 – 15) / 0.54 / 0.647 / 26 / 0.60 / 0.693
2 / Na+ / IAB(2 – 8)t / 0.54 / 0.762 / 39 / 0.54 / 0.761
3 / K+ / IAB(2 – 15)t / 0.59 / 0.827 / 47 / 0.58 / 0.828
4 / Be2+ / IAB(2 – 8) / 1.04 / 0.849 / 30 / 1.49 / 0.736
5 / Al3+ / IAB(2 – 8)t / 1.70 / 0.839 / 48 / 1.67 / 0.836
6 / Ga3+ / IAB(2 – 15)t / 1.86 / 0.934 / 28 / 2.66 / 0.895
7 / In3+ / IAB(2 – 8)t / 2.92 / 0.881 / 46 / 3.23 / 0.868
8 / VO2+ / IAB(2 – 15)a / 2.38 / 0.841 / 85 / 2.54 / 0.799
9 / Fe3+ / IAB(2 – 15)t / 1.90 / 0.945 / 84 / 2.22 / 0.930
10 / Th4+ / IAB(2 – 8)a / 2.22 / 0.768 / 29 / 2.19 / 0.780
11 / NpO2+ / IAB(2 – 15)a / 1.34 / 0.570 / 23 / 1.34 / 0.785
12 / Am3+ / IAB(2 – 15)t / 1.67 / 0.936 / 17 / 1.75 / 0.941

a npred is the number of predicted logK values; here npred ≤ n (see n in Table 1) because some ligands were discarded by AD.

b SMF type: see the notation in the Methods section (descriptors).

Table SI 4 The statistical parameters of the best individual eMLR models and optimal descriptor types according to the training subsets of the 5CV procedure a

no. / metal ion / SMF type b / s / Q²
1 / Li+ / IAB(2 – 14)e / 0.07 – 0.18 / 0.953 – 0.984
2 / Na+ / IAB(3 – 11)e / 0.18 – 0.29 / 0.924 – 0.960
3 / K+ / IAB(3 – 12)te / 0.19 – 0.30 / 0.940 – 0.958
4 / Be2+ / IAB(2 – 12) / 0.20 – 0.82 / 0.900 – 0.979
5 / Al3+ / IAB(2 – 10) / 0.50 – 0.88 / 0.931 – 0.971
6 / Ga3+ / IAB(2 – 11)e / 0.33 – 1.6 / 0.941 – 0.972
7 / In3+ / IAB(2 – 13)e / 0.50 – 1.7 / 0.950 – 0.992
8 / VO2+ / IAB(2 – 9) / 0.42 – 1.0 / 0.960 – 0.982
9 / Fe3+ / IAB(2 – 14) / 0.62 – 1.2 / 0.953 – 0.970
10 / Th4+ / IAB(2 – 14) / 0.18 – 1.0 / 0.950 – 0.988
11 / NpO2+ / IAB(2 – 7)t / 0.18 – 0.87 / 0.902 – 0.991
12 / Am3+ / IAB(3 – 6)t / 0.29 – 1.2 / 0.970 – 0.994

a Statistical parameters of the eMLR models: standard deviation (s) and squared LOO cross-validation correlation coefficient (Q²).

b SMF type: see the notation in the Methods section (descriptors).
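As an illustration of the leave-one-out (LOO) statistic named in footnote a, the sketch below computes a squared LOO cross-validation coefficient for a one-descriptor linear fit. It is a simplified stand-in, not the eMLR procedure: it assumes the common definition 1 - PRESS/SS_tot with SS_tot taken about the mean of all observed values, and the function names and toy data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Squared LOO coefficient, assumed here to be 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model with the i-th point left out, then predict it.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical toy data: one descriptor value and one logK per ligand.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
log_k = [2.1, 3.9, 6.2, 7.8, 10.1]

q2 = q2_loo(descriptor, log_k)
print(round(q2, 3))
```

A multi-descriptor model would replace `fit_line` with a multiple linear regression; the leave-one-out loop itself stays the same.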

[1] Corresponding author. Address: Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninsky pr. 31, 119071 Moscow, Russia. Tel: +7 903 564 32 29.

E-mail address: (V. Solovev)