Supplementary Information for

Design of a Fragment Library that Maximally Represents Available Chemical Space.

M. N. Schulz, J. Landström, K. Bright, R. E. Hubbard

Contents:

  1. Description of calculations
  2. Table S1: list of SMARTS strings used to define compounds from the Input Libraries with unwanted functionality, with a brief description of each chemical feature and the number of compounds from each supplier that contain it.
  3. Table S2: diagrams of the protocols and links to the Pipeline Pilot protocols, annotated with the software version each requires.
  4. Table S3: number of compounds in the low MW list that fail the property filters for inclusion in the Fragment Library.
  5. Table S4: properties of the Fragment Sets generated by the different protocols for the three Fragment Libraries.

Pipeline Pilot calculations:

The following is derived from the Pipeline Pilot documentation and describes the properties calculated for the Fragment Libraries and Fragment Sets:

No. HB Acceptors (HA)

Hydrogen Bond Acceptors are defined as heteroatoms (Oxygen, Nitrogen, Sulfur, or Phosphorus) with one or more lone pairs, excluding atoms with positive charges, amide and pyrrole-type Nitrogens, and aromatic Oxygen and Sulfur atoms in heterocyclic rings.

No. HB Donors (HD)

Hydrogen Bond Donors are defined as heteroatoms (Oxygen, Nitrogen, Sulfur, or Phosphorus) with one or more attached Hydrogen atoms.
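
The two definitions above can be read as simple per-atom rules. The following is a minimal sketch of those rules only, not the Pipeline Pilot implementation: the `Atom` record and its fields (`attached_h`, `lone_pairs`, `is_amide_n`, and so on) are hypothetical and are assumed to have been filled in beforehand by some structure perception step that is not shown.

```python
from dataclasses import dataclass

# Elements named in the definitions of HA and HD.
HETEROATOMS = {"O", "N", "S", "P"}


@dataclass
class Atom:
    """Hypothetical, pre-perceived atom record (illustration only)."""
    element: str
    attached_h: int = 0
    formal_charge: int = 0
    lone_pairs: int = 0
    is_aromatic: bool = False        # aromatic atom in a heterocyclic ring
    is_amide_n: bool = False
    is_pyrrole_type_n: bool = False


def is_donor(atom: Atom) -> bool:
    # A donor is a heteroatom carrying at least one hydrogen.
    return atom.element in HETEROATOMS and atom.attached_h >= 1


def is_acceptor(atom: Atom) -> bool:
    # An acceptor is a heteroatom with at least one lone pair ...
    if atom.element not in HETEROATOMS or atom.lone_pairs < 1:
        return False
    # ... that is not positively charged,
    if atom.formal_charge > 0:
        return False
    # ... is not an amide or pyrrole-type nitrogen,
    if atom.element == "N" and (atom.is_amide_n or atom.is_pyrrole_type_n):
        return False
    # ... and is not an aromatic oxygen or sulfur.
    if atom.element in {"O", "S"} and atom.is_aromatic:
        return False
    return True


def count_donors(atoms: list[Atom]) -> int:
    return sum(is_donor(a) for a in atoms)


def count_acceptors(atoms: list[Atom]) -> int:
    return sum(is_acceptor(a) for a in atoms)


# Heavy atoms of acetamide, CH3-C(=O)-NH2, annotated by hand.
acetamide = [
    Atom("C", attached_h=3),
    Atom("C"),
    Atom("O", lone_pairs=2),
    Atom("N", attached_h=2, lone_pairs=1, is_amide_n=True),
]
print(count_acceptors(acetamide), count_donors(acetamide))
```

Under these rules both counts are numbers of heteroatoms, so the NH2 group contributes one donor rather than two, and the amide nitrogen is not counted as an acceptor.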

Solubility

This is calculated using the multiple linear regression model based on Electrotopological State (E-State) indices published by Tetko et al. [J. Chem. Inf. Comput. Sci., 2001, 41, 1488-1493, “Estimation of Aqueous Solubility of Chemical Compounds Using E-State Indices”]. The solubility is expressed as logS, where S is the solubility in mol/L.
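
Because logS is a base-10 logarithm of a molar solubility, converting it back to a concentration is a single exponentiation. The sketch below shows only that unit conversion, not the regression model itself; the function names and the example values are illustrative assumptions.

```python
def solubility_mol_per_l(log_s: float) -> float:
    """Convert logS back to a molar solubility S in mol/L."""
    return 10.0 ** log_s


def solubility_g_per_l(log_s: float, mol_weight: float) -> float:
    """Solubility in g/L (numerically equal to mg/mL), given MW in g/mol."""
    return solubility_mol_per_l(log_s) * mol_weight


# Hypothetical fragment: logS = -2.0 and MW = 200 g/mol.
print(solubility_mol_per_l(-2.0))
print(solubility_g_per_l(-2.0, 200.0))
```

A change of one logS unit therefore corresponds to a tenfold change in molar solubility.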

ALogP

Calculated using the Ghose and Crippen ALogP algorithm.

Surface Area and Volume

Molecular_SurfaceArea and Molecular_PolarSurfaceArea: the total surface area and the polar surface area of each molecule, calculated using a 2D approximation.

Table S1 - list of SMARTS strings used to define compounds from the Input Libraries with unwanted functionality, a brief description of the chemical feature and the number of compounds from each supplier that contained each feature.

SMARTS string / Functionality / Specs / Asinex / Maybridge
[OX2H]c1ccccc1[OX2H] / Catechol / 209 / 417 / 31
[CX4](-[OX2])(-[OX2])(-[OX2]) / Ortho ester / 33 / 0 / 2
[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8] / Nitro / 23314 / 34477 / 6529
O-O / Peroxide / 3 / 0 / 0
[NX2]=[OX1] / Nitroso / 63 / 155 / 73
[CX3]=([NX2]-OH) / Oxime / 551 / 1066 / 688
[CX3]-[CX3](=O)-[CX3] / Aliphatic ketone / 2681 / 5660 / 1125
[CX4](-[OX2])(-[OX2]) / Acetal / 7065 / 19437 / 1468
c1nsnc1 / Thiadiazole / 180 / 876 / 320
C=C[H2] / Methylene / 37903 / 82066 / 4168
[#6][C!H0]=O / Aldehyde / 1210 / 1523 / 7
c1(NH2)cccs1 / Aminothiophene / 13033 / 8013 / 677
C1CN1 / Aziridine / 89 / 42 / 10
NC(=S)N / Thiourea / 11669 / 18077 / 4433
c1ncns1 / Thiadiazole / 91 / 271 / 118
[NX3,NX4+][CX3](=[OX1])[SX2,SX1-] / thiolactone / 2610 / 1576 / 96
[CX4](-[SX2])(-[SX2]) / Dithioacetal / 503 / 58 / 308
[CX3;!R](=[SX1])[NH2] / Thioamide / 323 / 465 / 384
c1(NH2)ccsc1 / Aminothiophene / 2051 / 2133 / 916
S-H / Thiol / 1590 / 2328 / 595
[NX3+] / tertiary amine / 24954 / 39219 / 8190
N=C=O / N=C=O / 1 / 1 / 0
S(=O)(=O)[OX2] / Sulphoxide / 1153 / 570 / 543
c2ccc1nonc1c2 / benzoxadiazole / 116 / 248 / 264
N=C=S / N=C=S / 9 / 11 / 0
c1scnn1 / Thiadiazole / 5249 / 10349 / 535
[CX4](-[NX3])(-[OX2]) / Aminal / 5037 / 6032 / 648
[$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])] / Imine / 13930 / 19678 / 2133
S-S / Disulfide / 410 / 565 / 49
[CX3]=[CX3][Cl,Br,F,I] / aliphatic C=C-Halogen / 1533 / 2032 / 389
C(=O)[F,Cl,Br,I] / acyl halide / 21 / 37 / 0
O=C1CSCO1 / Oxathiolane / 6 / 2 / 0
[C;!R](=O)-[S;!R] / acyclic C(=O)-S / 138 / 105 / 320
C1CO1 / Epoxide / 175 / 234 / 0
N-[NH2] / hydrazide / 652 / 1074 / 690
[CH2,H3;!r][CH2!r][CH2!r][CH2,H3;!r] / aliphatic chain / 11047 / 11413 / 847
[NX4+] / quaternary amine / 19080 / 86993 / 4647
[#6]C(=O)OC(=O)[#6] / anhydride / 32 / 0 / 0
O=C-C=[A;!O;!N;!S;!a] / R=C-C=O / 69559 / 139968 / 10466
[*;!#1;!#6;!#7;!#8;!#9;!#16;!#17] / not H,C,N,O,F,S,Cl / 40808 / 27118 / 3042
[N;!R]=[C;!R]=[N;!R] / acyclic N=C=N / 1 / 1 / 4
[NX3,NX4+][CX3](=[OX1])[OX2,OX1-] / N-C-O acetal / 2472 / 4065 / 2229
C-[CX3](=O)[CX4,CX3,CX2][Cl,Br,F,I] / Halo-acetophenone / 376 / 184 / 109
[C;!R](=S)-[O;!R] / acyclic C(=S)-O / 51 / 41 / 62
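
To reuse Table S1 in another toolkit, each row can be split into its five fields. The following is a minimal sketch that assumes the rows are available as plain text in the " / "-separated layout used above; the `UnwantedFeature` record and `parse_row` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class UnwantedFeature(NamedTuple):
    smarts: str
    label: str
    specs: int
    asinex: int
    maybridge: int


def parse_row(row: str) -> UnwantedFeature:
    # Split from the right: the last three fields are always the supplier
    # counts, so the SMARTS string on the left is kept whole.
    smarts, label, specs, asinex, maybridge = row.rsplit(" / ", 4)
    return UnwantedFeature(
        smarts.strip(), label.strip(), int(specs), int(asinex), int(maybridge)
    )


# Three rows copied from Table S1.
rows = [
    "[OX2H]c1ccccc1[OX2H] / Catechol / 209 / 417 / 31",
    "C1CN1 / Aziridine / 89 / 42 / 10",
    "NC(=S)N / Thiourea / 11669 / 18077 / 4433",
]
features = [parse_row(r) for r in rows]

# Feature that flags the most Specs compounds among the rows parsed.
most_common = max(features, key=lambda f: f.specs)
print(most_common.label, most_common.specs)
```

The counts are per feature, so a compound that contains two unwanted features appears in two rows; summing a supplier column does not give the number of distinct compounds removed.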

Table S2 - diagram of protocols and link to Pipeline Pilot protocols, annotated as to which are for the different versions of the software.

Protocol / Download / Minimum Pipeline Pilot Version to run
Initial Library Selection / / Student Edition 6.1.5.0 and Professional Edition 8.0.1.500
Cluster All / / Student Edition 6.1.5.0
Cluster Fragments / / Student Edition 6.1.5.0
SIM within Cluster / / Student Edition 6.1.5.0
Substructure Count / / Student Edition 6.1.5.0
Substructure Map / / Student Edition 6.1.5.0
Iterative Removal (light) / / Student Edition 6.1.5.0
Iterative Removal (advanced, including Master Library selection) / / Professional Edition 8.0.1.500
Unwanted / /

Table S3 - number of compounds in the low MW list that fail the properties filters to be included in the Fragment Library.

Library (total compounds) / MW < 100 / No. O + No. N > 6 / No. HB Donor > 3 / AlogP > 3 / NRot > 5 / PSA > 80
Specs (8829) / 38 (0.4%) / 424 (5%) / 57 (0.6%) / 5016 (57%) / 640 (7%) / 3977 (45%)
Maybridge (5050) / 0 (0%) / 213 (4%) / 84 (2%) / 2482 (49%) / 263 (5%) / 2954 (58%)
Asinex (13450) / 11 (0.1%) / 1183 (9%) / 108 (0.8%) / 4870 (36%) / 1148 (8%) / 9004 (67%)
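
The column headings of Table S3 can be read as six failure conditions applied to each compound. The sketch below is an illustration under stated assumptions, not the Pipeline Pilot protocol: the thresholds are taken from the headings, while the property keys (`mw`, `n_oxygen`, `hb_donors`, and so on) and the example compound are hypothetical, with all properties assumed to be precomputed.

```python
# Failure conditions as given in the Table S3 column headings.
# The dictionary keys used to look up properties are hypothetical names.
FAIL_RULES = {
    "MW < 100": lambda p: p["mw"] < 100,
    "No. O + No. N > 6": lambda p: p["n_oxygen"] + p["n_nitrogen"] > 6,
    "No. HB Donor > 3": lambda p: p["hb_donors"] > 3,
    "AlogP > 3": lambda p: p["alogp"] > 3,
    "NRot > 5": lambda p: p["n_rot"] > 5,
    "PSA > 80": lambda p: p["psa"] > 80,
}


def failed_filters(props: dict) -> list[str]:
    """Return the names of every filter that the compound fails."""
    return [name for name, rule in FAIL_RULES.items() if rule(props)]


def percent_failing(n_fail: int, n_total: int) -> float:
    """Percentage of a library that fails a filter, to one decimal place."""
    return round(100.0 * n_fail / n_total, 1)


# Made-up compound with precomputed properties.
example = {
    "mw": 185.2, "n_oxygen": 2, "n_nitrogen": 1,
    "hb_donors": 1, "alogp": 3.4, "n_rot": 2, "psa": 49.3,
}
print(failed_filters(example))

# Specs, AlogP > 3 column: 5016 of 8829 compounds.
print(percent_failing(5016, 8829))
```

A compound can fail more than one filter, which is why the percentages in a row of Table S3 can sum to more than 100%.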

Table S4 - properties of the Fragment Sets generated by the different protocols for the three Fragment Libraries

Library / MW / AC / FC / ALogP / HA
Asinex / 216.5 ± 33.35 / 15.3 ± 2.30 / -0.23 ± 0.47 / 1.11 ± 1.03 / 3.1 ± 0.99
Cluster All / 209.6 ± 30.02 / 14.8 ± 2.24 / -0.07 ± 0.29 / 1.38 ± 0.93 / 2.6 ± 1.00
Cluster Frag / 201.8 ± 30.87 / 14.3 ± 2.36 / -0.09 ± 0.38 / 1.24 ± 0.92 / 2.7 ± 1.10
SIM in Cluster / 218.6 ± 30.61 / 15.4 ± 2.28 / -0.09 ± 0.37 / 1.21 ± 0.99 / 3.0 ± 1.06
Substruct Count / 228.0 ± 25.08 / 16.8 ± 1.50 / -0.05 ± 0.38 / 1.79 ± 0.63 / 2.7 ± 0.99
Substruct Map / 241.8 ± 23.60 / 17.6 ± 1.27 / -0.03 ± 0.25 / 1.91 ± 0.73 / 2.6 ± 0.90
Iterative / 196.7 ± 30.87 / 13.8 ± 2.24 / -0.09 ± 0.28 / 1.46 ± 0.94 / 2.3 ± 0.93
Maybridge / 203.4 ± 30.71 / 14.3 ± 2.17 / -0.18 ± 0.42 / 1.36 ± 0.89 / 2.9 ± 1.03
Cluster All / 194.1 ± 28.72 / 13.5 ± 2.16 / -0.08 ± 0.39 / 1.50 ± 0.85 / 2.6 ± 1.13
Cluster Frag / 191.5 ± 28.58 / 13.2 ± 2.17 / -0.10 ± 0.40 / 1.42 ± 0.94 / 2.6 ± 1.07
SIM in Cluster / 204.5 ± 33.14 / 14.2 ± 2.28 / -0.10 ± 0.44 / 1.60 ± 0.81 / 2.8 ± 1.07
Substruct Count / 212.2 ± 29.58 / 15.7 ± 1.95 / -0.08 ± 0.44 / 1.80 ± 0.71 / 2.5 ± 1.03
Substruct Map / 232.3 ± 31.34 / 16.8 ± 1.88 / -0.04 ± 0.26 / 1.78 ± 0.67 / 2.5 ± 0.96
Iterative / 189.2 ± 26.90 / 13.1 ± 1.97 / -0.09 ± 0.32 / 1.52 ± 0.72 / 2.6 ± 0.91
Specs / 205.4 ± 37.69 / 14.5 ± 2.63 / -0.11 ± 0.42 / 1.35 ± 0.94 / 2.8 ± 1.03
Cluster All / 180.1 ± 37.56 / 12.7 ± 2.87 / 0.01 ± 0.40 / 1.32 ± 0.91 / 2.5 ± 1.18
Cluster Frag / 179.5 ± 36.87 / 12.7 ± 2.80 / 0.01 ± 0.38 / 1.33 ± 0.86 / 2.5 ± 1.12
SIM in Cluster / 201.9 ± 38.42 / 14.3 ± 2.79 / -0.04 ± 0.43 / 1.48 ± 0.92 / 2.8 ± 1.18
Substruct Count / 220.7 ± 28.24 / 16.3 ± 1.74 / 0.00 ± 0.40 / 1.95 ± 0.62 / 2.7 ± 1.07
Substruct Map / 237.7 ± 26.34 / 17.3 ± 1.46 / -0.01 ± 0.22 / 2.10 ± 0.65 / 2.6 ± 0.93
Iterative / 174.9 ± 31.58 / 12.2 ± 2.23 / -0.11 ± 0.32 / 1.38 ± 0.96 / 2.1 ± 0.85

Library / HD / RB / PSA / LogS / ArB
Asinex / 0.8 ± 0.76 / 2.42 ± 1.30 / 57.9 ± 14.35 / -2.30 ± 0.76 / 7.09 ± 3.57
Cluster All / 0.7 ± 0.71 / 2.41 ± 1.29 / 53.2 ± 16.73 / -2.37 ± 0.81 / 6.98 ± 3.22
Cluster Frag / 0.7 ± 0.72 / 2.14 ± 1.18 / 53.3 ± 16.31 / -2.26 ± 0.88 / 6.98 ± 3.58
SIM in Cluster / 0.7 ± 0.68 / 2.49 ± 1.31 / 57.3 ± 15.76 / -2.35 ± 0.85 / 7.46 ± 4.06
Substruct Count / 0.9 ± 0.75 / 2.43 ± 1.11 / 51.7 ± 16.13 / -3.03 ± 0.25 / 12.54 ± 1.99
Substruct Map / 0.6 ± 0.85 / 1.68 ± 0.86 / 48.3 ± 15.94 / -2.61 ± 0.57 / 8.47 ± 4.08
Iterative / 0.6 ± 0.70 / 2.08 ± 1.35 / 47.3 ± 16.61 / -2.39 ± 0.73 / 6.71 ± 3.15
Maybridge / 0.7 ± 0.75 / 1.94 ± 1.27 / 56.6 ± 14.78 / -2.34 ± 0.69 / 7.10 ± 3.50
Cluster All / 0.7 ± 0.74 / 1.74 ± 1.16 / 52.8 ± 16.16 / -2.37 ± 0.76 / 7.17 ± 3.60
Cluster Frag / 0.6 ± 0.74 / 1.76 ± 1.24 / 54.2 ± 16.75 / -2.32 ± 0.68 / 6.90 ± 3.89
SIM in Cluster / 0.7 ± 0.79 / 1.82 ± 1.27 / 55.8 ± 14.77 / -2.40 ± 0.78 / 7.41 ± 3.98
Substruct Count / 0.8 ± 0.77 / 1.87 ± 1.00 / 50.3 ± 16.51 / -2.84 ± 0.35 / 12.30 ± 2.02
Substruct Map / 0.7 ± 0.79 / 1.58 ± 0.76 / 49.6 ± 16.86 / -2.75 ± 0.52 / 8.94 ± 4.28
Iterative / 0.5 ± 0.68 / 1.76 ± 1.17 / 50.0 ± 14.61 / -2.31 ± 0.60 / 7.27 ± 2.54
Specs / 0.8 ± 0.71 / 2.18 ± 1.45 / 54.1 ± 15.35 / -2.33 ± 0.74 / 6.90 ± 3.31
Cluster All / 0.7 ± 0.69 / 1.44 ± 1.30 / 49.8 ± 17.84 / -2.12 ± 0.86 / 6.77 ± 4.13
Cluster Frag / 0.7 ± 0.71 / 1.59 ± 1.35 / 48.9 ± 17.60 / -2.11 ± 0.84 / 6.54 ± 3.92
SIM in Cluster / 0.7 ± 0.67 / 1.90 ± 1.43 / 52.3 ± 16.13 / -2.30 ± 0.88 / 7.51 ± 3.98
Substruct Count / 0.8 ± 0.78 / 2.21 ± 1.11 / 49.3 ± 16.67 / -3.01 ± 0.26 / 12.73 ± 2.03
Substruct Map / 0.7 ± 0.81 / 1.77 ± 0.93 / 47.2 ± 17.05 / -2.79 ± 0.44 / 8.76 ± 3.92
Iterative / 0.4 ± 0.62 / 1.54 ± 1.26 / 42.7 ± 16.13 / -2.17 ± 0.77 / 6.02 ± 2.98

Library / R / ArR / RA
Asinex / 1.9 ± 0.63 / 1.3 ± 0.69 / 1.5 ± 0.52
Cluster All / 1.8 ± 0.69 / 1.2 ± 0.60 / 1.5 ± 0.54
Cluster Frag / 1.8 ± 0.76 / 1.2 ± 0.69 / 1.5 ± 0.55
SIM in Cluster / 2.0 ± 0.70 / 1.4 ± 0.76 / 1.6 ± 0.58
Substruct Count / 2.4 ± 0.53 / 2.2 ± 0.43 / 2.1 ± 0.35
Substruct Map / 3.0 ± 0.52 / 1.5 ± 0.74 / 2.2 ± 0.43
Iterative / 1.6 ± 0.64 / 1.2 ± 0.60 / 1.4 ± 0.54
Maybridge / 1.8 ± 0.73 / 1.3 ± 0.67 / 1.4 ± 0.55
Cluster All / 1.7 ± 0.74 / 1.3 ± 0.67 / 1.4 ± 0.55
Cluster Frag / 1.6 ± 0.76 / 1.2 ± 0.73 / 1.3 ± 0.57
SIM in Cluster / 1.9 ± 0.73 / 1.3 ± 0.75 / 1.4 ± 0.56
Substruct Count / 2.4 ± 0.50 / 2.2 ± 0.42 / 2.0 ± 0.44
Substruct Map / 2.8 ± 0.78 / 1.6 ± 0.77 / 2.1 ± 0.48
Iterative / 1.6 ± 0.61 / 1.3 ± 0.48 / 1.4 ± 0.49
Specs / 1.8 ± 0.71 / 1.2 ± 0.63 / 1.4 ± 0.52
Cluster All / 1.7 ± 0.78 / 1.2 ± 0.77 / 1.3 ± 0.51
Cluster Frag / 1.6 ± 0.74 / 1.2 ± 0.72 / 1.3 ± 0.51
SIM in Cluster / 1.9 ± 0.73 / 1.4 ± 0.75 / 1.4 ± 0.55
Substruct Count / 2.4 ± 0.52 / 2.2 ± 0.43 / 2.0 ± 0.33
Substruct Map / 2.9 ± 0.68 / 1.5 ± 0.71 / 2.1 ± 0.44
Iterative / 1.5 ± 0.70 / 1.0 ± 0.54 / 1.2 ± 0.45