Supplemental Information

DNA sequence analysis of the Salmonella serogroup O13 rfb gene cluster

i) Sugar biosynthetic genes

a) Fucose biosynthesis genes. GDP-L-fucose is synthesized by an extension of the GDP-D-mannose pathway. ManB (phosphomannomutase) and ManC (GDP-mannose pyrophosphorylase) produce GDP-D-mannose; Gmd then converts GDP-mannose to a 4-keto-6-deoxy intermediate, and Fcl converts this intermediate to GDP-L-fucose. A third fucose pathway gene, gmm, encodes GDP-mannosyl hydrolase, which removes GDP-mannose from the GDP-fucose biosynthetic pathway and is thought to be involved in the regulation of other sugar pathways in E. coli and S. enterica [57]. Proteins encoded by orf3, orf4, orf5, orf6 and orf7 shared high amino acid identity with Gmd, Fcl, Gmm, ManC and ManB, respectively, from the O antigen gene clusters of Salmonella serogroups O30 and O50 (92%-99%) and E. coli O128, O127 and O86 (77%-94%) (Table 2), and with the corresponding proteins encoded by the colanic acid gene cluster of Salmonella serogroups A, B and D (61%-98%). We propose that orf3, orf4, orf5, orf6 and orf7 encode the same enzymes for biosynthesis of the GDP-L-Fuc of the Salmonella O13 O antigen, and name them gmd, fcl, gmm, manC and manB, respectively. Four of these five fucose biosynthesis genes had a high G+C content (56.0%-60.9%), whereas the remaining gene, manC, had a G+C content of 38.9%, similar to that of the rest of the genes in the O13 rfb cluster. Because the five corresponding genes of the colanic acid gene cluster of Salmonella serogroup B have G+C contents of 57% to 63% [64], the four high-G+C genes in the O13 O antigen cluster were likely derived from the fucose biosynthesis genes of a colanic acid gene cluster.
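The origin argument above rests on comparing the G+C content of individual genes, where % G+C = 100 × (G + C) / (A + C + G + T). The sketch below computes this value for a nucleotide sequence and then splits the O13 values from Supplemental Table 2 into high- and low-G+C groups. The function names and the 50% cutoff are hypothetical choices made for illustration, not the method used in the original analysis.

```python
from typing import Dict, List, Tuple


def gc_content(seq: str) -> float:
    """Return % G+C of a nucleotide sequence, counting only A, C, G and T.

    Ambiguity codes (N, R, Y, ...) are ignored rather than counted.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# % G+C of the O13 fucose biosynthesis genes, copied from Supplemental Table 2.
O13_GC: Dict[str, float] = {
    "gmd": 56.00,
    "fcl": 59.94,
    "gmm": 56.5,
    "manC": 38.85,
    "manB": 60.92,
}


def split_by_gc(gc_by_gene: Dict[str, float],
                cutoff: float = 50.0) -> Tuple[List[str], List[str]]:
    """Partition genes into high- and low-G+C groups at a hypothetical cutoff."""
    high = sorted(g for g, gc in gc_by_gene.items() if gc >= cutoff)
    low = sorted(g for g, gc in gc_by_gene.items() if gc < cutoff)
    return high, low


if __name__ == "__main__":
    print(gc_content("ATGGCGCA"))  # 62.5
    print(split_by_gc(O13_GC))     # (['fcl', 'gmd', 'gmm', 'manB'], ['manC'])
```

With any cutoff between roughly 39% and 56%, manC falls alone in the low-G+C group, matching the pattern described in the text.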

b) GalNAc biosynthesis gene. Genes for the synthesis of GalNAc were expected in the Salmonella serogroup O13 gene cluster. orf1 was identified as gne by its high level of amino acid identity to Gne proteins from E. coli, Yersinia enterocolitica serogroup O8 and Haemophilus influenzae (Table 1). In these organisms, gne encodes UDP-GlcNAc C4 epimerase, which catalyzes the conversion of UDP-GlcNAc to UDP-GalNAc [17, 60, 65]. orf1 of O13 was therefore assigned the same function in the synthesis of GalNAc for the Salmonella O13 O antigen and was named gne. Salmonella O50 also has a gne gene, located close to its O antigen gene cluster upstream of galF [49]; however, the O50 Gne showed low amino acid identity (16.3%-19.9%) to Gne from Salmonella serogroup O13.
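The percent amino acid identity values quoted in this section depend on how an alignment is scored, and the text does not state which program or denominator was used. The sketch below assumes two protein sequences that have already been aligned (gaps written as "-") and shows two common conventions: dividing identical positions by the ungapped columns, or by the full alignment length. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-",
                     count_gaps: bool = False) -> float:
    """Percent identity between two sequences that are already aligned.

    count_gaps=False: identical positions / columns where neither sequence has a gap.
    count_gaps=True:  identical positions / full alignment length (gapped columns included).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must be the same length")
    pairs = list(zip(aln_a.upper(), aln_b.upper()))
    ungapped = [(a, b) for a, b in pairs if a != gap and b != gap]
    identical = sum(1 for a, b in ungapped if a == b)
    denominator = len(pairs) if count_gaps else len(ungapped)
    if denominator == 0:
        raise ValueError("nothing to compare")
    return 100.0 * identical / denominator


if __name__ == "__main__":
    a = "MKT-LLVG"
    b = "MKSALLVG"
    print(round(percent_identity(a, b), 1))                   # 85.7 (6 of 7 ungapped columns)
    print(round(percent_identity(a, b, count_gaps=True), 1))  # 75.0 (6 of 8 columns)
```

The same pair of sequences gives noticeably different values under the two conventions, which is one reason identity figures from different studies are not always directly comparable.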

ii) Transferase genes. The O13 rfb gene cluster contained three ORFs predicted to encode putative glycosyltransferases. orf2, orf10 and orf11 shared varying degrees of similarity with glycosyltransferases from a range of other organisms (Table 1) and were named wfbG, wfbH and wfbI, respectively. Based on the Carbohydrate-Active enZYmes database (CAZy; www.cazy.org) classification [61], wfbG encodes a retaining glycosyltransferase of family GT4. This transferase shared 53% amino acid similarity with a plasmid-encoded galactosyl transferase of Shigella dysenteriae (Klena et al. 1992), and 49% and 47% amino acid identity with the products of wbdH, which encodes a putative galactosyl transferase, from E. coli O111 and Salmonella serogroup O35, respectively [57]. When wfbG was compared with 41 other known Salmonella transferase genes, amino acid identity was limited (2.9%-14.9%), except for wbgM (23.9%) from serogroup O50, which also encodes a galactosyl transferase. We therefore propose that wfbG encodes a putative galactosyl transferase. wfbH encodes a putative inverting glycosyltransferase of family GT2. The closest homologs of this transferase were WbiP from E. coli O127 (71% amino acid identity) and WcmC from E. coli O86 (61%), sugar transferases responsible for the formation of β-D-Gal-(1→3)-α-D-GalNAc linkages [17]. Because the same linkage is present in the O antigen of Salmonella serogroup O13, wfbH is proposed to encode the transferase that forms this linkage in serogroup O13. wfbI encodes an inverting glycosyltransferase of family GT11, whose known function is α-1,2-fucosyltransferase activity. WfbI shared 61% and 51% amino acid identity with WbiQ and WcmD, the α-1,2-fucosyltransferases of E. coli O127 and O86, respectively [17]. We therefore propose that wfbI encodes an α-1,2-fucosyltransferase.

iii) O antigen transport and polymerase genes. Wzx and Wzy proteins can be difficult to identify by sequence searches because sequence identity levels are often low, reflecting the unique specificity of these proteins [48]. A topology of multiple predicted transmembrane domains with a large periplasmic loop is characteristic of O antigen polymerases [62]. orf9 from O13 was predicted to encode a hydrophobic membrane protein with 10 transmembrane domains and a periplasmic loop of 44 amino acid residues. orf9 had higher homology to known Wzy proteins from several E. coli serotypes than to those from Salmonella serogroups, with 62% amino acid identity to the Wzy protein of E. coli O127 [17, 58, 63] but only 8.3%-16.8% identity to the Wzy proteins of 12 Salmonella serogroups. We predict that orf9 of the O13 O antigen gene cluster is the O antigen polymerase gene, wzy. orf8 from O13 is predicted to encode an integral inner membrane protein with 11 transmembrane segments. Like other genes in the O13 gene cluster, orf8 was most similar to the corresponding gene in E. coli O127 (35% amino acid identity) [17]. Its amino acid identity to the products of characterized Salmonella wzx genes ranged from 8.8% to 29%, the highest being for serogroup O50. We predict that orf8 from O13 is the O antigen transport gene, wzx.
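The text does not state which program predicted the 10 and 11 transmembrane segments of Wzy and Wzx. As an illustration of the underlying idea only, the sketch below uses the classic Kyte-Doolittle hydropathy scan as a stand-in: a 19-residue sliding window is averaged along the protein, and each run of windows with a mean above 1.6 is counted as one candidate membrane-spanning segment. The window size and threshold are the values commonly associated with this method; the function names and the toy sequence are hypothetical.

```python
from typing import List

# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full-length sliding window along the sequence.

    Residues missing from the scale (e.g. X) score 0.0, an assumption of this sketch.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    if len(values) < window:
        return []
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window one residue
        profile.append(running / window)
    return profile


def count_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean hydropathy exceeds the threshold."""
    segments = 0
    in_segment = False
    for value in hydropathy_profile(seq, window):
        if value > threshold:
            if not in_segment:
                segments += 1
                in_segment = True
        else:
            in_segment = False
    return segments


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 20 + "K" * 10 + "L" * 20 + "K" * 10
    print(count_tm_segments(toy))  # 2
```

A hydropathy scan only locates hydrophobic stretches; it does not assign loops to the periplasmic or cytoplasmic side, so a feature such as the 44-residue periplasmic loop of Wzy requires a dedicated topology predictor.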

References

60. Bengoechea, J. A., E. Pinta, T. Salminen, C. Oertelt, O. Holst, J. Radziejewska-Lebrecht, Z. Piotrowska-Seget, R. Venho, and M. Skurnik. 2002. Functional characterization of Gne (UDP-N-acetylglucosamine-4-epimerase), Wzz (chain length determinant), and Wzy (O-antigen polymerase) of Yersinia enterocolitica serotype O:8. J. Bacteriol. 184:4277-4287.

61. Coutinho, P. M., E. Deleury, G. J. Davies, and B. Henrissat. 2003. An evolving hierarchical family classification for glycosyltransferases. J. Mol. Biol. 328:307-317.

62. Morona, R., M. Mavris, A. Fallarino, and P. A. Manning. 1994. Characterization of the rfc region of Shigella flexneri. J. Bacteriol. 176:733-747.

63. Paton, A. W., and J. C. Paton. 1999. Molecular characterization of the locus encoding biosynthesis of the lipopolysaccharide O antigen of Escherichia coli serotype O113. Infect. Immun. 67:5930-5937.

64. Stevenson, G., R. Lan, and P. R. Reeves. 2000. The colanic acid gene cluster of Salmonella enterica has a complex history. FEMS Microbiol. Lett. 191:11-16.

65. Zhang, L., J. Radziejewska-Lebrecht, D. Krajewska-Pietrasik, P. Toivanen, and M. Skurnik. 1997. Molecular and chemical characterization of the lipopolysaccharide O-antigen and its role in the virulence of Yersinia enterocolitica serotype O:8. Mol. Microbiol. 23:63-76.

Supplemental Table 1. S. enterica serogroup O13 O antigen biosynthetic proteins

ORF     Size (aa)  Similar protein  Organism                       % identity  Putative function              Accession no.
orf1    340        Gne              E. coli O127                   60%         UDP-glucose-4-epimerase        AAR90883
                                    E. coli O86                    60%                                        AAV85952
                                    Y. enterocolitica O8           58%                                        AAC60777
orf2    375        Rfp              S. dysenteriae                 50%         glycosyl transferase           AAC60480
                   WbdH             E. coli O111                   46%                                        AAD46728
                   WbdH             S. enterica O35                44%                                        AAK83009
orf3    373        Gmd              S. enterica O30                99%         GDP-mannose dehydratase        AAV34512
                                    S. enterica O50                98%                                        AAV34519
                                    E. coli O127                   93%                                        AAR90885
orf4    322        Fcl              S. enterica O30                98%         GDP-fucose synthetase          AAV34513
                                    S. enterica O50                94%                                        AAV34520
                                    Salmonella serogroups A, B, D  99%
                                    E. coli O128                   90%                                        AAO37692
                                    E. coli O127                   87%                                        AAR90886
orf5    167        Gmm              S. enterica O30                94%         GDP-mannosyl hydrolase         AAV34514
                                    S. enterica O50                92%                                        AAV34521
                                    Salmonella serogroups A, B, D  93-95%
                                    S. enterica O35                42%                                        AAK83011
                                    E. coli O127                   78%
orf6    483        ManC                                                        GDP-mannose pyrophosphorylase
orf7    458        ManB                                                        phosphomannomutase
orf8    412        Wzx              E. coli O127                   35%         O antigen transporter          AAR90890
                                    S. enterica O50                29%                                        AAV34524
orf9    389        Wzy              E. coli O127                   62%         O antigen polymerase           AAR90892
orf10   246        WbiP             E. coli O127                   71%         glycosyl transferase           AAR90893
                   WcmC             E. coli O86                    61%                                        AAV85962
                   WbgO             S. enterica O50                33%                                        AAV34525
orf11   287        WbiQ             E. coli O127                   61%         fucosyl transferase            AAR90894
                   WcmD             E. coli O86                    51%                                        AAV85963

Supplemental Table 2. Comparison of fucose biosynthesis genes from serogroup O13

                        O13     O30     O35     O50     LT2*    O86     O127    O128    O41     O125    YeIb    YeO8

Gmd
  Length (aa)           373     373     373     373     374     374     374     375     -       -       374     373
  % G+C                 56.00   55.41   37.80   55.23   56.60   52.50   52.67   51.47   -       -       41.0    40.93
  % aa identity to O13  -       99.2    85.3    98.7    98.1    93.3    92.8    93.6    -       -       79.6    77.7

Fcl
  Length (aa)           322     322     No Fcl  239     322     322     286     322     -       -       322     322
  % G+C                 59.94   59.94   -       57.74   59.83   56.21   55.59   55.80   -       -       41.51   41.30
  % aa identity to O13  -       98.1    -       94.1    98.4    90.7    86.7    90.1    -       -       74.5    74.2

Gmm
  Length (aa)           167     167     150     167     158     168     167     168     -       -       -       -
  % G+C                 56.5    56.49   30.22   55.69   60.34   52.18   52.50   51.79   -       -       -       -
  % aa identity to O13  -       94.0    42.0    92.2    91.9    77.8    77.8    77.2    -       -       -       -

ManC
  Length (aa)           483     483     482     483     481     483     483     483     468     456     469     466
  % G+C                 38.85   39.13   36.38   39.34   60.43   36.37   35.16   37.41   37.46   37.57   42.0    42.78
  % aa identity to O13  -       91.7    63.3    92.8    60.7    79.1    79.3    80.5    54.1    55.0    55.2    55.2

ManB
  Length (aa)           458     457     459     461     457     463     458     460     457     457     458     457
  % G+C                 60.92   60.98   34.2    60.3    61.71   50.47   52.77   53.77   54.05   54.34   38.79   37.42
  % aa identity to O13  -       95.4    66.6    92.1    95.8    86.0    86.5    87.1    88.4    87.5    72.5    71.1
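As an arithmetic consistency check, the identity ranges quoted in section i) a) can be recomputed from the amino acid identity rows of Supplemental Table 2. The sketch below copies those values for the Salmonella O30 and O50, LT2* and E. coli O86, O127 and O128 columns and reports the lowest and highest value per group, rounded to whole percentages. Treating the LT2* column as the colanic acid cluster comparison is an assumption, because the footnote for the asterisk is missing; the dictionary and function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# % amino acid identity to the O13 proteins, copied from Supplemental Table 2.
IDENTITY_TO_O13: Dict[str, Dict[str, float]] = {
    "Gmd":  {"O30": 99.2, "O50": 98.7, "LT2": 98.1, "O86": 93.3, "O127": 92.8, "O128": 93.6},
    "Fcl":  {"O30": 98.1, "O50": 94.1, "LT2": 98.4, "O86": 90.7, "O127": 86.7, "O128": 90.1},
    "Gmm":  {"O30": 94.0, "O50": 92.2, "LT2": 91.9, "O86": 77.8, "O127": 77.8, "O128": 77.2},
    "ManC": {"O30": 91.7, "O50": 92.8, "LT2": 60.7, "O86": 79.1, "O127": 79.3, "O128": 80.5},
    "ManB": {"O30": 95.4, "O50": 92.1, "LT2": 95.8, "O86": 86.0, "O127": 86.5, "O128": 87.1},
}


def identity_range(columns: Iterable[str]) -> Tuple[float, float]:
    """Lowest and highest identity across all five proteins for the given columns."""
    cols = list(columns)  # materialize so a generator argument is not exhausted
    values = [row[c] for row in IDENTITY_TO_O13.values() for c in cols]
    return min(values), max(values)


if __name__ == "__main__":
    groups = {
        "Salmonella O30, O50": ["O30", "O50"],
        "E. coli O86, O127, O128": ["O86", "O127", "O128"],
        "LT2": ["LT2"],
    }
    for label, cols in groups.items():
        low, high = identity_range(cols)
        print(f"{label}: {low:.0f}%-{high:.0f}%")
```

Rounded, the three groups give 92%-99%, 77%-94% and 61%-98%, in agreement with the ranges stated in the text.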