Online Resource 2: Gene Set Narrative with Citations

Delineating the Hemostaseome

The Coagulation Pathway

The coagulation pathway proper is executed by the regulated action of 14 genes encoding protein components that work in synchrony to sequentially activate an interacting network of proteins, ultimately resulting in the generation of an insoluble fibrin clot (see Kyoto Encyclopedia of Genes and Genomes, KEGG, http://www.genome.jp/kegg; pathway hsa04610 for details). The extrinsic coagulation pathway is activated when vascular damage leads to the expression of tissue factor (factor III, gene symbol[1] F3) on the damaged endothelial cell surface. Factor VII (F7) binds the exposed tissue factor and is then activated by proteolysis; this complex then activates coagulation factors IX (F9) and X (F10). Activated factor X forms the prothrombinase complex with factor V (F5), which converts prothrombin (factor II, F2) to thrombin; thrombin cleaves fibrinogen (FGA, FGB, FGG) to produce the fibrin monomer. The fibrin monomer is cross-linked by factor XIII (F13A1, F13B), yielding a fibrin polymer which, in combination with platelets, forms the clot that limits bleeding and initiates wound healing. The coagulation pathway can also be initiated through the action of the contact system, the intrinsic pathway, in which factor XII (F12) activates factor XI (F11), which in turn activates factor IX (F9). Factor IX, bound together with factor VIII (F8) and von Willebrand factor (VWF) on the platelet surface, forms the ‘tenase’ complex that activates factor X; factor X, together with factor V, then triggers the generation of thrombin and fibrin. The key elements of the coagulation pathway are the primary effectors, all of which are associated with well-known hemostatic conditions. The genes encoding factors VIII and IX are both located on the X chromosome and are responsible for the sex-linked hemophilias (A and B).
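
The convergence of the extrinsic and intrinsic pathways described above can be made explicit with a small directed graph. The following is a minimal sketch rather than a model of the full cascade: the node names are gene symbols from the text, the cofactors (F5, F8, VWF) are deliberately left out as nodes, and the names ACTIVATES, FIBRIN and reachable are hypothetical choices made for this illustration only.

```python
from collections import deque

# Activation steps as stated in the paragraph above, keyed by gene symbol.
# Assumption: cofactors (F5, F8, VWF) are omitted and "FIBRIN" stands in
# for the product of fibrinogen (FGA, FGB, FGG) cleavage.
ACTIVATES = {
    "F3": ["F7"],         # tissue factor binds factor VII, which is then activated
    "F7": ["F9", "F10"],  # tissue factor/factor VII complex activates IX and X
    "F12": ["F11"],       # contact system: factor XII activates factor XI
    "F11": ["F9"],        # factor XI activates factor IX
    "F9": ["F10"],        # factor IX (with factor VIII) activates factor X
    "F10": ["F2"],        # factor X (with factor V) activates prothrombin
    "F2": ["FIBRIN"],     # thrombin cleaves fibrinogen to fibrin
}


def reachable(start):
    """Return every node downstream of `start`, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in ACTIVATES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


extrinsic = reachable("F3")   # initiated by tissue factor
intrinsic = reachable("F12")  # initiated by the contact system
shared = extrinsic & intrinsic
print(sorted(shared))
```

Under these simplifications, the two initiating routes share every step from factor IX onward, which is consistent with the observation that both routes end in thrombin generation and fibrin formation.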

Two common variants of coagulation factor genes for which significant, independent thrombotic risk has been well established are factor V Leiden (Bertina et al. 1994) and the prothrombin gene variant G20210A (Poort et al. 1996). The factor V Leiden variant has been identified as the SNP with the greatest risk association with venous thrombosis, being found in 15-25% of patients with deep vein thrombosis (DVT) and generally in ~1-5% of populations of European descent (Aiach and Emmerich 2006), with a prevalence of 4 to 6% in the United States (Price and Ridker 1997). The prothrombin (F2) gene variant (G20210A), which localizes to the 3’-untranslated region of the F2 gene (Poort et al. 1996), is found in 1-3% of subjects of European descent and in 6-18% of patients with venous thrombosis, corresponding to an overall two- to five-fold increase in thrombotic risk among carriers (Franco and Reitsma 2001). These examples illustrate two well-characterized hereditary contributors to hemostatic disease but, as illustrated in Tables 1-8, a surprisingly large number of genetic variants in the genes of the Hemostaseome have now been described and shown, primarily through case-control studies, to be associated with hemostatic disease.
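
The relationship between carrier frequencies in patients and in the general population and a fold-increase in risk can be illustrated with a simple odds ratio. The sketch below is only a rough arithmetic check: the function names odds and odds_ratio are hypothetical, and the pairing of 6% carriers among patients with 2% carriers among controls is an assumed example drawn from within the ranges quoted above, not a figure taken from the cited studies.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_cases, p_controls):
    """Odds ratio for carrying a variant, given carrier proportions
    among cases and among controls."""
    return odds(p_cases) / odds(p_controls)


# Assumed illustrative pairing: 6% of venous thrombosis patients versus
# 2% of the general population carrying the prothrombin G20210A variant.
example = odds_ratio(0.06, 0.02)
print(round(example, 2))  # 3.13
```

With these assumed inputs the odds ratio is close to 3, which falls within the two- to five-fold range reported for carriers; other pairings drawn from the quoted ranges give different values.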

Online Resource 1, Supplementary Table 1A lists the gene members of the coagulation pathway and summarizes their disease-associated phenotypes, genomic locations, gene sizes, exon numbers and the disease-related characteristics germane to their selection as members of the Hemostaseome. A broad range of phenotypes has been delineated for these genes. Thus, deficiency or loss-of-function mutations result in bleeding disorders in which the nature of the bleeding diathesis may differ depending upon which gene is involved. Single nucleotide polymorphisms can alter protein function or protein levels in a manner that is potentially dysregulatory. For example, the factor V Leiden mutation alters the amino acid critical for protein C cleavage of activated factor V (Va), resulting in the failure to inactivate factor Va appropriately. Factor V Leiden homozygotes have a 30- to 140-fold greater risk of thrombosis than individuals bearing the wild-type DNA sequence, whereas heterozygotes exhibit a five-fold higher risk of thrombosis than healthy controls (Aiach and Emmerich 2006). Genetic variation of a different type is also associated with hemostatic disease: increased coagulation factor protein levels are associated with thrombosis, while decreased protein levels (associated with deficiency states) result in bleeding disorders. Online Resource 3, Supplementary Table 2A summarizes the biochemical phenotypes resulting from elevated protein levels for the members of the coagulation pathway proper and their association with hemostatic disease. Although protein levels overlap between healthy subjects and those with thrombotic disease, multiple studies have consistently shown statistically significant differences in protein levels and some have calculated odds ratios (Online Resource 3, Supplementary Tables 2A-2H). Thus, conventional mutations as well as regulatory mutations and copy number variations (CNVs) should be considered in any comprehensive assessment of the hereditary factors underlying hemostatic disease. In Supplementary Table 2A, studies are summarized with protein measurements annotated alongside the specific disease association and literature citation. For each gene encoding a member of the coagulation pathway proper, elevated protein levels have been reported to be associated with thrombotic disease.

Table 1 summarizes these findings for members of the coagulation pathway proper. Genes for which elevated protein levels have been reported to be associated with disease are annotated “Yes” in the table. Factor X (F10) is an exception, with “Estro” noted; in this case, elevated protein levels are only observed in female subjects taking oral contraceptives (Davies et al. 1976; Katayama et al. 2009; Mink et al. 1972); oral contraceptives are an established thrombotic risk factor (WHO cardiovascular disease report, 1997; 1998; Burkman et al. 2004). Where protein levels have been observed to be decreased or deficient (as in the context of inactivating mutations that result in bleeding disorders) and where a deficiency phenotype is described in human subjects in Online Mendelian Inheritance in Man (OMIM™; http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim), “Yes” is noted in Table 1 under protein level deficiency. In only a single case listed in Table 1, that of tissue factor (F3), has a human deficiency phenotype not been described; in this case, “Mice” is indicated in Table 1 because a deficiency phenotype has been described in murine knock-out studies that deleted this gene [86% of the embryos did not survive embryogenesis although the 14% that did survive appeared normal (Toomey et al. 1997)]. Haploinsufficiency of the tissue factor gene (F3) may also be poorly tolerated in humans, which may be the reason why it has not been described in clinical cases.

Copy number variations (CNVs) have the potential to impact upon protein levels through the duplication or deletion of gene copies, resulting in either increases or decreases in protein levels with their attendant clinical phenotypes. Studies reported over the past five years that have investigated the extent of CNVs in individual human genomes have already identified a large number of these variants within the genes of the Hemostaseome. For two genes involved directly in coagulation, factor VII (F7, Wang et al. 2007) and factor VIII (F8, Levy et al. 2007), predicted complete gene duplications have been reported as CNV gains (Table 1, CG, complete gain). Genes shown by array studies to have completely deleted copies (Table 1, CL, complete loss) include factors V (F5, Conrad et al. 2009), VII (F7, Wong et al. 2007), X (F10, Wong et al. 2007) and XIII (F13B, Redon et al. 2006; refer to their Supplementary Table 16 for all loci cited herein). For the 13q34 CNV loss reported by Wong et al. (2007), the adjacent genes F7 and F10 were both expected to be deleted. Although intronic CNVs are common and frequently occur among the genes of the Hemostaseome, it is likely that they will have little or no impact upon gene function and so these have not been annotated here. However, partial CNVs that either delete or duplicate exons were considered since they are certainly capable of influencing gene function, especially where they involve the first exon. Indeed, a partial loss CNV that deleted a differentially expressed exon in the factor VIII gene has been reported (Table 1, F8, PL-1A, Matsuzaki et al. 2009), as has a partial gain involving two exons in the factor IX gene (Table 1, F9, PG-2, Shaikh et al. 2009).
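
For automated analysis, the CNV annotation codes used in Table 1 (CG, CL, PL-1A, PG-2) lend themselves to a simple structured representation. The following is a minimal sketch and not the scheme used to build the tables: the names CnvAnnotation and parse_cnv_code are hypothetical, and the reading of the trailing "A" as marking an event that involves a differentially expressed or alternatively spliced exon is an assumption inferred from the PL-1A example above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Code prefixes as they appear in Table 1 of the text.
KINDS = {
    "CG": "complete gain",
    "CL": "complete loss",
    "PG": "partial gain",
    "PL": "partial loss",
}

# Prefix, then optionally "-<exon count>" and an optional trailing "A".
_CODE = re.compile(r"^(CG|CL|PG|PL)(?:-(\d+)(A?))?$")


@dataclass(frozen=True)
class CnvAnnotation:
    kind: str                # e.g. "partial loss"
    exons: Optional[int]     # number of exons involved; None for complete events
    alt_transcripts: bool    # assumed meaning of the trailing "A"


def parse_cnv_code(code):
    """Parse a code such as 'CG', 'CL', 'PL-1A' or 'PG-2'."""
    match = _CODE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized CNV code: {code!r}")
    prefix, exons, flag = match.groups()
    is_partial = prefix in ("PG", "PL")
    if is_partial and exons is None:
        raise ValueError(f"partial CNV code lacks an exon count: {code!r}")
    if not is_partial and exons is not None:
        raise ValueError(f"complete CNV code carries an exon count: {code!r}")
    return CnvAnnotation(
        kind=KINDS[prefix],
        exons=int(exons) if exons is not None else None,
        alt_transcripts=(flag == "A"),
    )


for code in ("CG", "CL", "PL-1A", "PG-2"):
    print(code, parse_cnv_code(code))
```

Keeping the event type, exon count and transcript flag as separate fields would allow, for instance, all partial losses to be selected without re-parsing the table text.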

Given that deficiency phenotypes are associated with bleeding disorders for these 14 genes, whereas elevations in protein levels are associated with thrombotic disease, the need to interrogate genomes for copy number variations in order to assess hereditary variation and its contribution to hemostatic disease accurately is evident. With respect to the more conventional variants, known single nucleotide polymorphisms (SNPs), including regulatory mutations, and insertion and deletion polymorphisms associated with altered gene function for this gene set are compiled in HGMD®. For each gene, Table 1 also lists the number of variants identified in the public version of HGMD® (August, 2010). Variant totals are listed for each gene to illustrate the large number of variants that must be assessed for a genome analysis to be considered comprehensive. Gross deletions have often been described in individuals with hemostatic disease but are probably structurally equivalent to CNV losses, the distinction being that gross deletions were associated with disease in the original scientific report whereas CNV studies do not include clinical status. Another distinction to be drawn is that CNVs often occur at polymorphic frequencies and are associated with an increased likelihood of disease, whereas disease-causing deletions may be very rare. As with CNVs, regulatory mutations have the potential to increase or decrease protein levels. However, regulatory variants occur infrequently, accounting for ~1-2% of disease-associated gene lesions (Cooper et al. 2010). From this brief initial survey of established variants in the published literature, it is apparent that a comprehensive assessment of an individual human genome sequence in relation to variants associated with hemostatic disease will be a substantial undertaking, one that will also require the attendant information to be structured so as to facilitate comprehensive and automated analysis.

The Contact System including the Kallikrein-Kinin pathway

The contact system facilitates the initiation of coagulation via factor XII (F12) in contact with the plasma membranes of blood and endothelial cells and, in addition, executes functions in fibrinolysis (clot breakdown) as well as inflammation. The critical protein effectors of this system, primarily responsible for its activation, are factor XII (F12), prekallikrein (KLKB1) and high-molecular-weight kininogen (KNG1). Factor XII and prekallikrein are zymogens that are activated by proteolysis into active serine proteases. High-molecular-weight kininogen is a cofactor for these reactions as well as an inflammatory mediator. Bradykinin, a nonapeptide derived from high-molecular-weight kininogen, is released through proteolytic cleavage by plasma kallikrein. Plasminogen (PLG) is converted to plasmin through proteolysis by tissue plasminogen activator (PLAT) or by urokinase-type (urinary) plasminogen activator (PLAU); plasmin, a serine protease, dissolves fibrin clots (fibrinolysis).

Unlike deficiency of the coagulation factors noted above, deficiency of factor XII does not result in a clinical bleeding disorder; rather, factor XII deficiency has been associated, at least in one case, with thrombosis (Lessiani et al. 2009). However, in the majority of cases presented in OMIM™, deficiencies are described as asymptomatic biochemical deficiency phenotypes detected by presurgical laboratory testing. By contrast, elevated factor XII protein levels have been associated with acute coronary syndrome (Altieri et al. 2005) and have been noted in women taking conjugated equine estrogens (Katayama et al. 2009). In addition to deficiencies characterized in subjects with abnormal presurgical blood test results (activated partial thromboplastin time, aPTT), several missense mutations have been described for F12 that result in hereditary angioedema type III due to enhanced kinin production. As depicted in Table 2, elevated protein levels associated with disease are observed for coagulation factor XII (F12), for high-molecular-weight kininogen (KNG1) in women taking conjugated estrogens (Table 2, Estro; Katayama et al. 2009), and for tissue plasminogen activator (PLAT), where elevated protein levels have been associated with cardiovascular disease (Cortellaro et al. 1993; Johansson et al. 2000; Thogersen et al. 1998; Van Dreden et al. 2009). Finally, elevated protein levels for PLAU have been observed in megakaryocytes in Quebec platelet disorder (Veljkovic et al. 2009). Details of protein level changes are annotated in Online Resource 3, Supplementary Table 2B. For each of these genes, with the exception of PLAU, a deficiency phenotype has been characterized in human subjects, albeit typically a biochemical phenotype rather than a clinical one, while CNVs have also been observed. Plasminogen (PLG) is the only member of this set for which a complete gene gain CNV has been reported (Table 2, CG, Redon et al. 2006), although a tandem duplication was recently identified as the causal mutation in Quebec platelet disorder (Table 2, PLAU, TD, Paterson et al. 2010). Complete gene loss CNVs are described for coagulation factor XII (Table 2, F12, CL, Itsara et al. 2009; Jakobsson et al. 2008) and for high-molecular-weight kininogen (Table 2, KNG1, CL, Gusev et al. 2009; Redon et al. 2006); a partial gene loss copy number variant has been described for tissue plasminogen activator that deletes six internal exons and is expected to impact alternatively spliced transcripts (Table 2, PLAT, PL-6A, Mills et al. 2006). Probably as a consequence of the paucity of recognizable clinical phenotypes, but also perhaps due to reduced clinical severity for this subset of the Hemostaseome, many fewer mutations have been identified. HGMD® reports a total of 54 mutations for these five genes, the majority of which are missense or nonsense mutations (Table 2).

Regulators of the Coagulation Pathway

A large number of genes encode proteins that regulate coagulation; these proteins restrict coagulation to areas where the vasculature has been compromised. Online Resource 1, Supplementary Table 1C, lists these genes, the phenotypes responsible for their selection as members of the Hemostaseome, their genomic positions, size in kilobases (kb), number of exons and other relevant information germane to the function of these genes in hemostasis or in hemostatic disease. Online Resource 3, Supplementary Table 2C, provides information on the association of changes in protein levels and hemostatic diseases for this gene set while Table 3 summarizes this information for each gene.

ADAMTS13 encodes a disintegrin-like metalloprotease with a thrombospondin type 1 motif (better known as the von Willebrand factor-cleaving protease); it is responsible for degradation of the large VWF multimers. Deficiency has been identified in multiple pedigrees and is responsible for congenital thrombotic thrombocytopenic purpura, a disorder characterized by thrombocytopenia and hemolytic anemia. Carboxypeptidase B2 (CPB2), more commonly known as thrombin-activatable fibrinolysis inhibitor (TAFI), reduces fibrinolysis by cleaving the C-terminal residues of fibrin that bind and activate plasminogen. Elevated protein levels of carboxypeptidase B2 are associated with stroke in patients younger than 40 years (Biswas et al. 2008) and with cardiovascular disease (de Bruijne et al. 2009). A deficiency phenotype has been characterized in a mouse knock-out model in which fibrinolysis was shown to be reduced in both heterozygotes and homozygotes (Swaisgood et al. 2002); no human deficiency phenotype has, however, been described and only a few variants have been reported (Table 3). A partial loss copy number variant has been reported that spans 10 exons and is expected to affect alternatively spliced transcripts (Table 3, CPB2, PL-10A, Matsuzaki et al. 2009). The hyaluronan-binding protein (HABP2), also known as factor VII-activating protease, binds hyaluronic acid, a component of the extracellular matrix and connective tissue; when activated, it has been demonstrated to cleave fibrinogen (FGA and FGB, but not FGG) and fibronectin (Choi-Miura et al. 2001) in addition to factor VII (Romisch et al. 2001). Variants are associated with carotid stenosis (Willeit et al. 2003) and venous thromboembolism (Hoppe et al. 2005). A deficiency phenotype for HABP2 has not yet been characterized in either human or mouse. Elevated HABP2 protein levels are observed in women taking conjugated equine estrogens (Katayama et al. 2009) and reduced protein levels have been observed in Budd-Chiari syndrome, a thrombotic state resulting in the occlusion of the hepatic veins (Hoekstra et al. 2010). The histidine-rich glycoprotein (HRG) has been associated with thrombophilia and binds both plasminogen and heparin, thereby inhibiting both fibrinolysis and heparin action. Elevated protein levels of HRG have been associated with childhood thrombosis (Ehrenforth et al. 1999). In addition, a recent report identified common variants of large effect in F12, KNG1 and HRG that are responsible for the variance observed in the activated partial thromboplastin time, a standard laboratory assessment of coagulability (Houlihan et al. 2010). A deficiency phenotype has been characterized in humans, where a secretion defect of HRG was associated with a case of right transverse sinus thrombosis (Shigekiyo et al. 1998). A copy number variant has been characterized for HRG, with a complete gene loss identified (Table 3, CL, Redon et al. 2006).