Supplementary Material

Compilation and Physicochemical Classification Analysis of a Diverse hERG Inhibition Database

Remigijus Didziapetris1,2, Kiril Lanevskij1,2,*

1VšĮ „Aukštieji algoritmai“, A. Mickevičiaus 29, LT-08117 Vilnius, Lithuania

2ACD/Labs, Inc., 8 King Street East, Suite 107, Toronto, Ontario, Canada M5C 1B5

*Corresponding author. Address for correspondence: Traidenio 34, LT-08116 Vilnius, Lithuania

Telephone: +370 5 262 3408; Fax: +370 5 262 3728.

E-mail:

Description of fields in the hERG inhibition database

Name: the name of the compound. For congeneric series, this includes a short description of the series and the ID number of the compound in the original publication.

SMILES: the compound structure provided in SMILES notation format.

I_hERG: assigned binary classification of hERG inhibition (1/0):

  • 1 if IC50 (Ki) ≤ 10 µM
  • 0 if IC50 (Ki) > 10 µM
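The classification rule above can be written as a one-line check. This is only an illustrative sketch; the function name `classify_herg` and the constant name are hypothetical and do not come from the database or the article.

```python
THRESHOLD_UM = 10.0  # activity cutoff stated above, in µM


def classify_herg(value_um: float) -> int:
    """Return the I_hERG label for an IC50 or Ki value given in µM.

    1 = inhibitor (value <= 10 µM), 0 = non-inhibitor (value > 10 µM).
    """
    return 1 if value_um <= THRESHOLD_UM else 0
```

For example, `classify_herg(0.5)` gives 1 and `classify_herg(25.0)` gives 0; a value of exactly 10 µM falls in class 1.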

Type: indicates what kind of metric is provided in the Value field:

  • IC50: half-inhibitory concentration determined in a patch-clamp or radioligand displacement assay
  • Ki: inhibition constant determined in a radioligand displacement assay
  • Kimin: the lower limit of Ki value
  • IC50min: the lower limit of IC50 value
  • IC50max: the upper limit of IC50 value
  • IC50 (%): IC50 value estimated from single-point data (percentage inhibition at a fixed ligand concentration [L]) using the following equation:

Since this is a very rough estimate, only entries with a resulting IC50 (%) above 50 µM or below 2 µM were recorded in the database.
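The single-point estimate and the filtering rule can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the widely used one-site relation with a Hill slope of 1, IC50 ≈ [L] × (100 − %inhibition) / %inhibition, which may differ from the equation used in the article. The function names and the treatment of the 2 µM and 50 µM bounds as inclusive are likewise assumptions.

```python
def ic50_from_percent(percent_inhibition: float, conc_um: float) -> float:
    """Estimate IC50 (µM) from percentage inhibition at one concentration.

    Assumes a one-site binding model with a Hill slope of 1 (an assumption,
    not necessarily the equation used in the article).
    """
    if not 0.0 < percent_inhibition < 100.0:
        raise ValueError("percent inhibition must lie strictly between 0 and 100")
    return conc_um * (100.0 - percent_inhibition) / percent_inhibition


def keep_estimate(ic50_um: float, low_um: float = 2.0, high_um: float = 50.0) -> bool:
    """Keep only estimates far from the 10 µM cutoff.

    Whether the bounds are inclusive is an assumption made for this sketch.
    """
    return ic50_um <= low_um or ic50_um >= high_um
```

Under this assumed relation, 20% inhibition at 10 µM gives an estimate of 40 µM, which lies between the two bounds and would be discarded, whereas 90% inhibition at 10 µM gives about 1.1 µM and would be kept.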

Value: quantitative hERG activity value of the given Type.

Assay: a brief description of the hERG inhibition assay used to derive the given activity value:

  • Patch-clamp conventional (cell line indicated in parentheses: HEK293, CHO, XO, or ND if not specified)
  • Patch-clamp automated (cell line indicated in parentheses: HEK293, CHO, or ND if not specified)
  • Patch-clamp ND – patch-clamp with unspecified details
  • Electrophysiology (myocytes)
  • Binding (reference ligand indicated in parentheses: dofetilide, astemizole, or MK-499).

This field may also contain two additional notes:

  • “assumed” – exact assay details were not reported in the article, but could reasonably be inferred to be identical to those described in related publications by the same laboratory
  • “confirmed by authors” – the authors of the publication provided some of the missing information upon request

Code: the assigned assay code as outlined in Table 1 of the article.

Reference: the literature reference number. The full list of references is provided as a separate data sheet alongside the main database.

Set: indicates whether the compound was part of the Modeling set (used as training or internal validation data in different modeling runs) or the External validation set.

n (congeneric): the number of compounds in the congeneric series, or 1 for non-congeneric compounds (see Data & Methods section of the article).

w (adjusted): weight adjustment factor (see Data & Methods for details); can be 0.5, 1, or 2.

weight: the final weight of the compound used in modeling, weight = w (adjusted) / n (congeneric).
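As a quick illustration of how the three weighting fields relate, the final weight can be recomputed from the other two. The function name `final_weight` and the input checks are hypothetical additions for this sketch; only the formula itself comes from the field description above.

```python
ALLOWED_W = (0.5, 1.0, 2.0)  # permitted values of w (adjusted)


def final_weight(w_adjusted: float, n_congeneric: int) -> float:
    """Return weight = w (adjusted) / n (congeneric)."""
    if w_adjusted not in ALLOWED_W:
        raise ValueError("w (adjusted) must be 0.5, 1, or 2")
    if n_congeneric < 1:
        raise ValueError("n (congeneric) must be at least 1")
    return w_adjusted / n_congeneric
```

For instance, a compound from a congeneric series of 4 with w (adjusted) = 2 receives a final weight of 0.5, while a non-congeneric compound with w (adjusted) = 1 keeps a weight of 1.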

logP: logarithm of the octanol/water partition coefficient
pKa1(Acid): the strongest acidic pKa
pKa1(Base): the strongest basic pKa
pKa2(Base): the second strongest basic pKa

logP and pKa were calculated using the ACD/LogP GALAS and ACD/pKa GALAS algorithms implemented in ACD/Percepta software. In the few cases where these produced very unreliable estimates, they were replaced by ACD/LogP Classic and ACD/pKa Classic predictions (marked in italics), or by experimental values if available (marked in bold).

MW: Molecular weight

TPSA: Topological Polar Surface Area

NAR: Number of Aromatic Rings

FRB: Fraction of Rotatable Bonds

p (predicted): predicted probability of the compound being a hERG inhibitor with IC50 ≤ 10 µM as an averaged output of ten models: