General genome features of R. massiliae

The R. massiliae chromosome has an average G+C content of 32.5%, consistent with the nucleotide composition of the other sequenced SFG Rickettsia (from 32.3% to 32.5% for R. conorii, R. sibirica, R. felis, R. akari and R. rickettsii). The inversion of the GT-excess curve, thought to coincide with the origin of replication (Andersson et al. 1998; McLeod et al. 2004; Ogata et al. 2001; Ogata et al. 2006), occurs close to the hemE gene, near base number 1. The R. massiliae plasmid is significantly poorer in G+C (31.4%) than both the R. massiliae chromosome and the R. felis plasmid (33.6%). Its nucleotide composition is most similar to that of the chromosomes of R. bellii (31.6% G+C) and R. canadensis (31.1% G+C).
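The two composition measures used above can be illustrated with a minimal Python sketch. It assumes that the GT-excess curve is the cumulative count of (G+T) minus (A+C) along the published strand; this definition, the function names and the toy sequence are illustrative assumptions, not the procedure used for the analysis. Which extreme of the curve marks the origin depends on the strand that is read, so both are reported as candidates.

```python
from itertools import accumulate
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def cumulative_gt_excess(seq: str) -> List[int]:
    """Cumulative (G+T) - (A+C) along the sequence (assumed definition)."""
    step = {"G": 1, "T": 1, "A": -1, "C": -1}
    return list(accumulate(step.get(base, 0) for base in seq.upper()))


def turning_points(curve: List[int]) -> Tuple[int, int]:
    """Return the 0-based indices of the global minimum and maximum.

    The curve changes direction at these positions, which makes them
    candidate locations for the replication origin and terminus.
    """
    indices = range(len(curve))
    return min(indices, key=curve.__getitem__), max(indices, key=curve.__getitem__)


# Toy sequence: an A/C-rich half followed by a G/T-rich half.
toy = "ACACGTGT"
curve = cumulative_gt_excess(toy)
print(gc_content(toy), curve, turning_points(curve))
```

On a real chromosome the curve would normally be computed per window rather than per base to smooth local noise; the per-base version is kept here for brevity.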

As expected from their close relatedness (identity >98% in coding sequence), the R. massiliae and R. conorii chromosomes exhibit a high level of colinearity (Figure 1). Several nested inverted segments are located in a 145 Kb region of the R. massiliae chromosome close to the predicted site of replication termination (between positions 675,066 bp and 828,993 bp in Figure S11). This region is reportedly a hot spot for rearrangements, as independent inversions were also identified in R. akari, R. rickettsii and the ancestor of R. typhi and R. prowazekii (Blanc et al. 2007; McLeod et al. 2004). Furthermore, a 37 Kb region is inverted relative to R. conorii near the predicted origin of replication (between positions 1,296,229 bp and 1,360,898 bp in Figure S11). This rearrangement is not shared by the other rickettsiae and therefore must have occurred specifically in the R. massiliae lineage.

The R. massiliae genome contains 574 identified repeated elements, most (98%) of which belong to the Repeated Palindromic Element (RPE) families (Claverie and Ogata 2003; Ogata et al. 2001). The repeat sequences are located exclusively on the chromosome, where they make up 3.7% (51 Kb) of the sequence. Of these repeats, 97.7% are conserved in colinear position in the R. conorii genome, including those inserted within the coding sequences of functional genes (Ogata et al. 2001). Only 11 R. massiliae repeats (including 6 truncated units) have no counterpart in the R. conorii genome (Figure S11). Phylogenetic analyses of the 5 full length repeats provided no evidence for recent duplications from other copies elsewhere in the genome: the phylogenetic distances between the orphan repeats and their paralogs were not smaller than the distances between orthologous repeats in R. massiliae and R. conorii (Figure S12). In fact, the orphan repeats are located within larger (71 to 5,470 bp) segments that are missing from, and were probably deleted in, the R. conorii genome. Thus, the vast majority if not all of the repeated elements found in the R. massiliae genome were presumably present before the divergence from R. conorii.

Proteome comparisons

We detected 1,180 protein genes on the chromosome, including 212 genes fragmented into 421 identified open reading frames (i.e. split genes as defined by Ogata et al. 2001). Of these genes, 1,017 (86%) exhibited homologs outside Rickettsia in the Genpep database and 929 (79%) were assigned putative functions. In addition, 33 tRNA genes, 1 set of rRNA genes, and 3 other structural RNA genes (tmRNA, 5S RNA and M1 RNA) were predicted. As in the other Rickettsiales (Andersson et al. 1999), the 16S rRNA gene was separated from the 23S and 5S rRNA genes. Despite the high level of genomic colinearity, R. massiliae contains 8.7% more predicted genes on its chromosome than R. conorii, which possesses 1,086 genes including 157 fragmented genes (according to the revised annotation by Blanc et al. 2007). Consistently, the R. massiliae chromosome is 7.3% larger than that of R. conorii (1,268 Kb). The two chromosomes share 859 full length protein genes. In addition to this conserved set, R. massiliae and R. conorii possess respectively 109 and 70 specific intact genes that are either fragmented or absent in the other genome (Table S1). Eighty-five genes (including 26 fragmented genes) are located on DNA segments that are missing in the R. conorii genome (Figure S11). Conversely, 21 R. conorii genes (including 7 fragmented genes) are found on genomic sequences missing in R. massiliae. The most remarkable of the R. massiliae-specific fragments is a 54.6 Kb region containing 44 genes, including 14 tra-related genes (see below). R. massiliae also contains three antitoxin and two toxin genes that are absent or fragmented in R. conorii. Conversely, R. conorii possesses a full length toxin gene that is fragmented in R. massiliae.
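The gene counts quoted above are mutually consistent, which can be checked with a few lines of arithmetic. The Python sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Counts quoted in the text for the two chromosomes.
massiliae_total, massiliae_fragmented = 1180, 212
conorii_total, conorii_fragmented = 1086, 157
shared_intact = 859

# Intact (full length) genes are total genes minus fragmented genes.
massiliae_intact = massiliae_total - massiliae_fragmented
conorii_intact = conorii_total - conorii_fragmented

# Species-specific intact genes are the intact genes outside the shared set.
massiliae_specific = massiliae_intact - shared_intact
conorii_specific = conorii_intact - shared_intact

# Relative excess of predicted genes in R. massiliae, in percent.
extra_genes_pct = 100 * (massiliae_total - conorii_total) / conorii_total

# Share of R. massiliae genes with homologs outside Rickettsia and
# with an assigned putative function, in percent.
homolog_pct = 100 * 1017 / massiliae_total
function_pct = 100 * 929 / massiliae_total

print(massiliae_intact, conorii_intact)          # 968 929
print(massiliae_specific, conorii_specific)      # 109 70
print(round(extra_genes_pct, 1))                 # 8.7
print(round(homolog_pct), round(function_pct))   # 86 79
```

The 968 intact R. massiliae genes obtained here are the set examined in the reciprocal best hit analysis.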

Most of the 968 intact protein genes on the R. massiliae chromosome exhibit a reciprocal best BLASTP hit in at least one of the 10 other sequenced Rickettsia genomes. Furthermore, the homologous sequences were generally organized in colinear order. Although reciprocal best hit and colinearity are common criteria used to infer orthologous relationships (sequences separated by a speciation event), closer examination of some specific genes demonstrated that orthology did not hold for certain genes satisfying the two rules (e.g. the tra cluster genes; see below). Twenty-six gene products exhibited no reciprocal best BLAST hit in the other Rickettsia species (BLASTP E-value<1e-5). They encode 21 transposases, 3 proteins of unknown function, an archaeal-type ATPase and a protein partially similar to the DNA polymerase III alpha chain. Thus, the R. massiliae chromosome appears to encode only a limited number of novel functions with respect to the other Rickettsia. We also identified six fragmented genes that are intact in all of the ten other sequenced genomes. They encode two proteins involved in tRNA modification, the dihydrouridine synthase (Dus) and the queuosine synthetase (QueA), as well as two ABC transporter subunits, one of the 5 copies of the VirB6 protein (VirB6-5) and the HemY protein involved in heme biosynthesis. In addition, 18 R. massiliae fragmented genes were found without orthologous counterparts. These are homologous to a proline/betaine transporter, a tetratricopeptide-repeat containing protein, an autotransporter-domain protein of the sca family (sca17), a type I restriction system subunit, 3 unknown proteins and 11 other transposases.
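The reciprocal best hit criterion can be sketched as follows. This is a minimal Python illustration, assuming that the BLASTP results for each direction are already parsed into (query, subject, E-value, bit score) tuples and that the best hit is the one with the highest bit score among hits with an E-value below 1e-5. The tuple layout, function names and gene identifiers are hypothetical and do not reproduce the pipeline used for the analysis.

```python
from typing import Dict, Iterable, Set, Tuple

# (query id, subject id, E-value, bit score); layout assumed for illustration.
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to its highest-scoring subject below the E-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue  # hit too weak to be considered
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], max_evalue: float = 1e-5
) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical gene identifiers: m* in genome A, c* in genome B.
a_vs_b = [
    ("m1", "c1", 1e-50, 200.0),
    ("m1", "c2", 1e-20, 90.0),
    ("m2", "c2", 1e-30, 120.0),
    ("m3", "c1", 1e-3, 30.0),  # above the E-value cutoff, ignored
]
b_vs_a = [
    ("c1", "m1", 1e-50, 200.0),
    ("c2", "m1", 1e-20, 90.0),
    ("c2", "m2", 1e-30, 120.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))  # [('m1', 'c1'), ('m2', 'c2')]
```

As the text notes for the tra cluster genes, a pair returned by such a procedure is only a candidate ortholog: passing the reciprocal best hit test, even together with colinearity, does not by itself establish orthology.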

The relatively high number of specific transposase genes and pseudogenes (36 sequences) suggests that R. massiliae underwent a recent expansion of transposons, a phenomenon also encountered in the chromosomes of R. bellii (39 sequences) and R. felis (66 sequences) but not in the other sequenced Rickettsia genomes. Interestingly, the R. massiliae, R. bellii and R. felis genomes contain several tra genes that probably encode components of a type IV secretion system (T4SS) for conjugal DNA transfer (Ogata et al. 2006; Ogata et al. 2005). The recently released genome of Orientia tsutsugamushi (Cho et al. 2007), a remote relative of Rickettsia species, also exhibits very high numbers of transposase genes (>400 genes) and tra genes (359 genes). Thus, there may be a link between the relative abundance of transposase genes in the Rickettsiales genomes and an active process of conjugation enabling the acquisition of foreign DNA. Some of the R. massiliae transposons disrupted genes that are presumably functional in other Rickettsia species. For example, the transposon comprising the two annotated ORFs RMA_1000 and RMA_1001 (5 identical copies in the R. massiliae genome) is inserted into a fragmented glycosyltransferase gene (formed by the ORFs RMA_0999, RMA_1002 and RMA_1003) that is intact in R. felis (RF_0407) and R. bellii (RB_0756). Other examples are the transposase genes RMA_0585 (6 identical copies in the genome) and RMA_0459 (11 identical copies in the genome), which respectively disrupted a putative RND efflux transporter gene [ORFs RMA_0582 to RMA_0588 in R. massiliae; intact in R. bellii (RB_0647)] and an anonymous gene [ORFs RMA_0458 and RMA_0460 in R. massiliae; intact in R. felis (RF_0525) and R. bellii (RB_0314)].

Plasmid

The R. massiliae plasmid is predicted to contain 12 protein genes and a transposase pseudogene (RMA_p09). The similarity between the R. massiliae and R. felis plasmid sequences is fragmentary (Figure S9). Seven R. massiliae plasmid genes have homologs in the R. felis plasmid. The conserved genes encode two unknown proteins (RMA_p12 and RMA_p14), a transposase (RMA_p08), a recombinase (RMA_p07), a tetratricopeptide-repeat containing protein (RMA_p09), a protein similar to the plasmid stability protein parA (RMA_p15), and a protein (RMA_p01) similar to the product of the U gene. The R. felis plasmid proteins are always the most similar homologs in databases, except for parA, which is most closely related to its homolog on the pCL1 plasmid of Chlorobium limicola. A combination of these conserved genes might be sufficient to create an artificial replicon in Rickettsia. The remaining non-conserved genes encode 2 unknown proteins, 2 transposases (including the split transposase gene), spoT22 and a leucine-rich protein. In contrast to the R. felis plasmid, which harbors genes putatively involved in plasmid transfer (traGF, traDF, traDTi and traATi-like proteins), the R. massiliae plasmid does not encode any identifiable conjugative function. However, the presence of the tra gene cluster in the chromosome may enable the mobilization of the plasmid through conjugation.

REFERENCES

Andersson, S.G., D.R. Stothard, P. Fuerst, and C.G. Kurland. 1999. Molecular phylogeny and rearrangement of rRNA genes in Rickettsia species. Mol Biol Evol 16: 987-995.

Andersson, S.G., A. Zomorodipour, J.O. Andersson, T. Sicheritz-Ponten, U.C. Alsmark, R.M. Podowski, A.K. Naslund, A.S. Eriksson, H.H. Winkler, and C.G. Kurland. 1998. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396: 133-140.

Blanc, G., H. Ogata, C. Robert, S. Audic, K. Suhre, G. Vestris, J.-M. Claverie, and D. Raoult. 2007. Reductive Genome Evolution from the Mother of Rickettsia. PLoS Genetics 3: e14.

Cho, N.-H., H.-R. Kim, J.-H. Lee, S.-Y. Kim, J. Kim, S. Cha, S.-Y. Kim, A.C. Darby, H.-H. Fuxelius, J. Yin et al. 2007. The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes. PNAS 104: 7981-7986.

Claverie, J.M. and H. Ogata. 2003. The insertion of palindromic repeats in the evolution of proteins. Trends Biochem Sci 28: 75-80.

McLeod, M.P., X. Qin, S.E. Karpathy, J. Gioia, S.K. Highlander, G.E. Fox, T.Z. McNeill, H. Jiang, D. Muzny, L.S. Jacob et al. 2004. Complete genome sequence of Rickettsia typhi and comparison with sequences of other rickettsiae. J Bacteriol 186: 5842-5855.

Ogata, H., S. Audic, P. Renesto-Audiffren, P.-E. Fournier, V. Barbe, D. Samson, V. Roux, P. Cossart, J. Weissenbach, J.-M. Claverie et al. 2001. Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293: 2093-2098.

Ogata, H., B. La Scola, S. Audic, P. Renesto, G. Blanc, C. Robert, P.E. Fournier, J.M. Claverie, and D. Raoult. 2006. Genome sequence of Rickettsia bellii illuminates the role of amoebae in gene exchanges between intracellular pathogens. PLoS Genetics 2: e76.

Ogata, H., P. Renesto, S. Audic, C. Robert, G. Blanc, P.E. Fournier, H. Parinello, J.M. Claverie, and D. Raoult. 2005. The Genome Sequence of Rickettsia felis Identifies the First Putative Conjugative Plasmid in an Obligate Intracellular Parasite. PLoS Biol 3: e248.