Online Supplementary Material

Group I-intron trans-splicing and mRNA editing events in mitochondria of placozoan animals

Gertraud Burger, Yifei Yan, Pasha Javadi and B. Franz Lang

Robert Cedergren Centre for Bioinformatics and Genomics, Département de Biochimie, Université de Montréal, 2900 Boulevard Edouard-Montpetit, C.P. 6128, Montréal (Québec), H3T 1J4, Canada

Methods

Sequence analysis

We reanalyzed the mitochondrial genome sequences of four placozoan strains (GenBank accession numbers DQ112541 (T. adhaerens), DQ889456 (BZ10101), DQ889458 (BZ2423), and DQ889457 (BZ49)). Genes coding for proteins and structural RNAs, as well as group I and group II introns, were identified using MFannot, developed in-house (N. Beck, P. Rioux, and B.F. Lang, unpublished). This tool integrates similarity search tools for the detection of coding regions and exons (Blast, Exonerate, HMMER[1-3]) and RNAweasel for the detection of introns and structured RNAs[4]. RNAweasel itself is based on Erpin[5], which provides E-values for the statistical evaluation of solutions[6]. In a few instances, exon-intron boundaries were adjusted manually to reconcile intron secondary structure with protein or rRNA sequence conservation, based on multiple alignments constructed with Muscle[7]. Multiple sequence alignments were inspected with GDE[8]. The records with corrected annotations were generated automatically with Mf2sqn (N. Beck and B.F. Lang, unpublished) and are deposited in GenBank under version 2 of the accession numbers NC_0008151, NC_008832, NC_008834, and NC_008833. The 3D structure of the bovine Cox1 protein (1OCR[9]; http://www.rcsb.org/pdb/explore/explore.do?structureId=1OCR) was visualized using the UCSF Chimera software[10] (version 1, build 2199).

Phylogenetic analysis

The dataset contains 13 mitochondrion-encoded proteins (Cox1, 2, 3, Cob, Atp6, 9, and Nad1, 2, 3, 4, 4L, 5, 6) and includes sequences from the four Placozoa, all Porifera and Cnidaria for which complete mtDNA sequences are available, five moderately fast-evolving bilaterian species, and an outgroup of six holozoan plus fungal species. Including substantially more of the extremely fast-evolving Bilateria increases long-branch attraction (LBA) artefacts, i.e., other fast-evolving species such as glass sponges are attracted to Bilateria (for an example see[11]). Protein collections were managed and automatically aligned, trimmed, and concatenated with Mams (developed in-house; B.F. Lang and P. Rioux, unpublished), using Muscle[12] for alignment and Gblocks[13] for removal of ambiguously aligned residues. The final dataset contains 56 taxa and 3004 amino acid positions. Phylogenetic analyses of protein sequences were performed by Bayesian inference (PhyloBayes[14]) using the CAT+Gamma model with four discrete rate categories. Chains were run for 2500 cycles (corresponding to ~200 000 generations), and the first 1700 cycles were removed as burn-in. Convergence was assessed by running four independent chains, which provided consistent results. The robustness of internal branches was evaluated based on 100 bootstrap replicates.
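The burn-in and chain-comparison step can be illustrated with a minimal Python sketch. It is not the procedure used in the analysis: the text does not state which convergence statistic was applied, so the maximum difference in bipartition frequencies between chains is an assumption here, and the function names and toy chains (`toy_chain`, splits `AB` and `AC`) are hypothetical. Only the cycle counts (2500 total, 1700 discarded) come from the text.

```python
from collections import Counter

TOTAL_CYCLES = 2500  # cycles per chain, as stated in the text
BURN_IN = 1700       # cycles discarded as burn-in, as stated in the text


def split_frequencies(samples, burn_in):
    """Frequency of each bipartition among the post-burn-in samples of one chain."""
    kept = samples[burn_in:]
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def max_diff(chains, burn_in):
    """Largest spread in bipartition frequency across chains (assumed statistic)."""
    freqs = [split_frequencies(chain, burn_in) for chain in chains]
    all_splits = set().union(*freqs)
    return max(
        (
            max(f.get(s, 0.0) for f in freqs) - min(f.get(s, 0.0) for f in freqs)
            for s in all_splits
        ),
        default=0.0,
    )


# Hypothetical toy data: each sampled tree is a set of bipartitions,
# each bipartition a frozenset of taxon labels.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})


def toy_chain(n_ab):
    """Chain whose post-burn-in samples contain split AB n_ab times, AC otherwise."""
    retained = TOTAL_CYCLES - BURN_IN
    return (
        [frozenset({AC})] * BURN_IN
        + [frozenset({AB})] * n_ab
        + [frozenset({AC})] * (retained - n_ab)
    )


chains = [toy_chain(n) for n in (800, 800, 760, 800)]
retained_cycles = TOTAL_CYCLES - BURN_IN
spread = max_diff(chains, BURN_IN)
print(retained_cycles, round(spread, 3))
```

With the settings reported above, 800 cycles per chain remain after burn-in; a small spread between chains indicates that the independent runs support the same bipartitions.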


Supplementary Figures and Tables

Figure S1: Schematic phylogenetic trees of animals with reported placements of Placozoa.

Alternative phylogenetic positions of Placozoa (arrows) as frequently found in literature are mapped on two trees of animals.

Figure S2: Comparison of cox1 gene structure across placozoans.

The exons are aligned by coding content. Introns are indicated by arrows and identified by numbers that correspond to the nucleotide position after which they are inserted in the T. adhaerens coding sequence (see also Table S1). The two trans-splicing group I introns are i-731 and i-824. Two introns are variable across species: i-643 occurs only in T. adhaerens and BZ2423 and is located in cox1_a; i-966 is restricted to BZ49 and is located in cox1_c.

Figure S3: Multiple sequence alignment of Cox1 proteins.

Alignment of the central portion of the re-annotated Cox1 protein of four placozoans and nine other eukaryotic species (GenBank accession numbers in parentheses): Reclinomonas, Reclinomonas americana (AAD11923.1); Rhodomonas, Rhodomonas salina (AGG17762.1|AF288090_38); Arabidopsis, Arabidopsis thaliana (NP_085587.1); Amoebidium, Amoebidium parasiticum (AAN04062.1); Monosiga, Monosiga brevicollis (AAN28355.1); Bos, Bos taurus (ABV70623.1); Homo, Homo sapiens (ACA22152.1); Geodia, Geodia (AAP59167.1); Metridium, Metridium senile (AAC04630.1); BZ10101, BZ2423, BZ49 and T. adhaerens (this paper). The start position of each depicted protein section is indicated at the beginning of the sequence. The numbering on top of the alignment indicates the amino acid positions of the placozoan proteins. A horizontal line highlights the four conserved residues that, in placozoans, are specified by a mini-exon that was missed in the initial gene annotation. At position 291, the otherwise invariant histidine of the HHM motif is a tyrosine in placozoans (according to conceptual translation of the gene sequence). We provide evidence that an RNA-editing event converts the UAU (Tyr) codon to CAU (His). Insertion points of placozoan introns are marked by arrows below the alignment. For intron nomenclature see Table S1.
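The codon change described for position 291 can be checked with a short Python sketch. It only illustrates the arithmetic of the edit (a single U-to-C substitution at the first codon position); the helper names `single_base_changes` and `translate` are hypothetical, and the codon table is deliberately restricted to the tyrosine and histidine codons, which have the same meaning in all known genetic codes.

```python
# Minimal codon table: only the Tyr and His codons are needed here.
CODON_TABLE = {"UAU": "Tyr", "UAC": "Tyr", "CAU": "His", "CAC": "His"}


def translate(codon):
    """Translate one RNA codon using the restricted table above."""
    return CODON_TABLE[codon]


def single_base_changes(before, after):
    """List (1-based codon position, old base, new base) for each difference."""
    return [
        (i + 1, old, new)
        for i, (old, new) in enumerate(zip(before, after))
        if old != new
    ]


genomic_codon = "UAU"  # codon predicted from the gene sequence
edited_codon = "CAU"   # codon after the proposed RNA-editing event

changes = single_base_changes(genomic_codon, edited_codon)
print(translate(genomic_codon), "->", translate(edited_codon), changes)
```

The comparison shows that one substitution at the first codon position is sufficient to restore the conserved histidine.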

Table S1. Intron insertion sites in placozoan mitochondrial genes¹

Gene / Position in coding sequence (T. adhaerens) / Total
cox1 / nt 386 / nt 643 / nt 720 / nt 731² / nt 824² / nt 871 / nt 966 / nt 1134 / Total
T. adha. / i1 (IB) / i2 (II) / i3 (IB) / i4 (IB) / i5 (IB) / i6 (IB) / - / i7 (IB) / 7
BZ10101 / i1 (IB) / - / i2 (IB) / i3 (IB) / i4 (IB) / i5 (IB) / - / i6 (IB) / 6
BZ2423 / i1 (IB) / i2 (II) / i3 (IB) / i4 (IB) / i5 (IB) / i6 (IB) / - / i7 (IB) / 7
BZ49 / i1 (IB) / - / i2 (IB) / i3 (IB) / i4 (IB) / i5 (IB) / i6 (II) / i7 (IB) / 7
nad5 / nt 788 / Total
T. adha. / i1 (IB) / 1
BZ10101 / i1 (IB) / 1
BZ2423 / - / 0
BZ49 / i1 (IB) / 1
rnl_b / nt 777 / nt 1794 / Total
T. adha. / - / i1 (II) / 1
BZ10101 / i1 (II) / i2 (II) / 2
BZ2423 / - / i1 (II) / 1
BZ49 / - / - / 0

1 Newly identified introns are marked grey; intron insertions are after the indicated nucleotide position in the cox1 coding region; intron group in brackets.

2 Trans-spliced group I introns.

Table S2. cox1 exon annotations in T. adhaerens¹

Exon in DQ112541 / Exon in this report / Position (nt) / Length in amino acids²
Exon 1 / Exon 1 / 25202-25587 / 129
Exon 2 / Exon 2 / 26044-26300 / 86
Exon 3 / Exon 3 / 29255-29331 / 26
- / Exon 4 / 31942-31952 / 4
Exon 4 / Exon 5 / 41414-41506 / 31
Exon 5 / Exon 6 / 10037-9991 / 15
Exon 6 / Exon 7 / 8996-8734 / 88
Exon 7 / Exon 8 / 8045-7575 / 155

1 Grey, newly identified exon; bold, positions corrected in this report. Note that the sequence in the corrected GenBank entry starts upstream of rnl (following conventions), whereas that of the original entry DQ112541 starts downstream of nad4.

2 When a codon is split by an intron, the corresponding amino acid is assigned to the exon that includes two nucleotides of the split codon.
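The exon coordinates of Table S2 and the intron positions of Table S1 can be cross-checked with a short Python sketch. It assumes that coordinates are inclusive and that exons listed with a start larger than the end lie on the reverse strand; the variable and function names are illustrative only.

```python
from itertools import accumulate

# cox1 exon coordinates in T. adhaerens, copied from Table S2 ("This report").
EXONS = [
    ("Exon 1", 25202, 25587),
    ("Exon 2", 26044, 26300),
    ("Exon 3", 29255, 29331),
    ("Exon 4", 31942, 31952),
    ("Exon 5", 41414, 41506),
    ("Exon 6", 10037, 9991),   # start > end: assumed reverse strand
    ("Exon 7", 8996, 8734),
    ("Exon 8", 8045, 7575),
]


def exon_length(start, end):
    """Length in nucleotides, assuming inclusive coordinates on either strand."""
    return abs(end - start) + 1


lengths = [exon_length(start, end) for _, start, end in EXONS]

# Running total of coding nucleotides at the end of each exon.
cumulative = list(accumulate(lengths))

# An intron follows every exon except the last one.
intron_positions = cumulative[:-1]

for (name, _, _), length, total in zip(EXONS, lengths, cumulative):
    print(f"{name}: {length} nt, coding sequence ends at nt {total}")
print("Intron insertion sites:", intron_positions)
```

The resulting insertion sites (after nt 386, 643, 720, 731, 824, 871, and 1134) agree with the positions listed for T. adhaerens cox1 in Table S1; the 11-nt exon 4 corresponds to the mini-exon flanked by the intron at position 720 and the trans-spliced intron at position 731.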

References

1. Altschul, S.F., et al. (1990) Basic local alignment search tool. J Mol Biol 215, 403-410

2. Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics 14, 755-763

3. Slater, G.S., and Birney, E. (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31

4. Lang, B.F., et al. (2007) Mitochondrial introns: a critical view. Trends Genet 23, 119-125

5. Gautheret, D., and Lambert, A. (2001) Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol 313, 1003-1011

6. Lambert, A., et al. (2005) Computing expectation values for RNA motifs using discrete convolutions. BMC Bioinformatics 6, 118

7. Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113

8. Smith, S.W., et al. (1994) The genetic data environment: an expandable GUI for multiple sequence analysis. Comput Appl Biosci 10, 671-675

9. Tsukihara, T., and Yoshikawa, S. (1998) Crystal structural studies of a membrane protein complex, cytochrome c oxidase from bovine heart. Acta Crystallogr A 54, 895-904

10. Goddard, T.D., et al. (2007) Visualizing density maps with UCSF Chimera. J Struct Biol 157, 281-287

11. Haen, K.M., et al. (2007) Glass sponges and bilaterian animals share derived mitochondrial genomic features: a common ancestry or parallel evolution? Mol Biol Evol 24, 1518-1527

12. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792-1797

13. Talavera, G., and Castresana, J. (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56, 564-577

14. Lartillot, N., and Philippe, H. (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21, 1095-1109