SUPPLEMENTARY MATERIALS AND METHODS
Choanoflagellate Culture Conditions
Cultures of the loricate choanoflagellates Diaphanoeca grandis (Ellis 1930) and Stephanoeca diplocostata (Ellis 1929) were obtained from Barry Leadbeater (University of Birmingham, U.K.). Each species was cultured in artificial seawater medium (36.5 g L⁻¹ Marin Salts (Dr. Biener Aquarientechnik, Wartenberg, Germany) in ddH2O). The artificial seawater was vacuum-filtered through a 0.22µm Steritop GP Express Plus filter (Millipore, Massachusetts, U.S.A.) into a sterile 1L screw-top glass bottle (Schott Duran). The filtered artificial seawater was then sterilized by autoclaving. New cultures were split under a Labcaire PCR 6 Workstation hood (Labcaire Systems, Avon, U.K.) to reduce the risk of contamination by foreign microorganisms. Cultures were grown in 100ml, 250ml and 500ml glass bottles with plastic screw-tops (Schott Duran). Starting cultures containing 50-200ml were topped up with sterile artificial seawater to 80% of the volume of the culture vessel. Cultures were split every 3-5 weeks. Up to three grains of dry-autoclaved white long grain rice were added to provide nutrition for the prey bacteria in the cultures. For both species, cultures were maintained at 13.5°C in an incubator.
RNA preparation
RNA was extracted from cultures of S. diplocostata using a TRIzol (Invitrogen) based method (as employed in [1]). No antibiotic or filtration purification methods were employed in case they interfered with normal choanoflagellate gene expression and, in particular, with transcription of biomineralization-related genes. Each RNA sample was tested for concentration and integrity using a 2100 Bioanalyser (Agilent Technologies, Waldbronn, Germany). The RNA 6000 Nano Chip and 2100 Expert software (Agilent Technologies, Waldbronn, Germany) were used to generate an electropherogram of the RNA samples as per the manufacturer’s instructions. Degraded RNA samples were rejected. RNA samples were pooled to give 55µg of total culture RNA, of which approximately 10µg was S. diplocostata total RNA.
cDNA Library Preparation
As the RNA samples were inevitably contaminated with large amounts of rRNA and RNA from prey bacteria present in the cultures, two rounds of poly(A) mRNA enrichment were performed using the Dynabeads mRNA Purification Kit (Invitrogen). Subsequent rRNA contamination was determined by running 1µl of the enriched mRNA on a 2100 Bioanalyser using an RNA 6000 Picochip (Agilent Technologies, Waldbronn, Germany). 180ng of enriched mRNA (<10% rRNA contamination) was then used to construct a 454 transcriptome library as outlined in the cDNA Rapid Library Preparation Method (Roche). Library quality was assessed by running 1µl of the library on a DNA High Sensitivity Labchip (Agilent), and the number of viable library molecules per µl was determined using the KAPA 454 qPCR library quantification kit (Kapabiosystems) on a Step-One qPCR machine (Applied Biosystems).
454 Sequencing
For full-scale emulsification PCR a ratio of 1.3 molecules per bead was employed. Eight SV oil tubes from the GS Titanium SV emPCR Kit (Lib-L) were used to generate sufficient enriched templated beads for 454 sequencing. Approximately 2 × 10^6 enriched templated beads were subjected to 454 pyrosequencing on half of a picotitre plate on the GS FLX sequencer (Roche) using the GS FLX Titanium Chemistry according to the manufacturer's protocol.
Assembly method
Post-run sequence outputs were viewed in gsRunBrowser in order to verify their metrics and confirm that the sequencing was successful. An assembly for the S. diplocostata sequence data was generated using the Newbler v2.3 software (Roche).
Bioinformatic Analysis
The S. diplocostata EST contigs were filtered to remove those contigs ≤10 bp in length, resulting in the removal of 528 contigs from the dataset. Custom-written BioPerl scripts were used to classify the source organism for each of the remaining 25,797 contigs: first via tBLASTx [2] against a local copy of the NCBI’s non-redundant (nr) database (October 2010 release), accepting hits with an E-value < 0.01, and secondly by taking the best hit for each contig and interrogating the Entrez nucleotide database to find its taxonomic identity within the Genbank taxonomy.
In this way, a probable taxonomic identity was assigned to over half of the contigs (13,716), allowing the identification of 3,376 contigs with top hits to choanoflagellates. A stand-alone copy of InterProScan v 4.6 [3] was used to obtain InterPro and GO annotation for the contigs (InterPro database version 27.0).
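The length filter and best-hit selection described above can be sketched as follows. The original analysis used custom BioPerl scripts that are not reproduced here, so this Python version is only an illustrative stand-in. The names `Hit`, `filter_by_length` and `best_hits`, and the toy records, are hypothetical, and the sketch assumes the tBLASTx output has already been parsed into (contig, subject, E-value) records.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record layout)."""
    contig: str
    subject: str
    evalue: float


def filter_by_length(contigs: Dict[str, str], max_removed_len: int = 10) -> Dict[str, str]:
    """Drop contigs whose length is <= max_removed_len; keep the rest."""
    return {name: seq for name, seq in contigs.items() if len(seq) > max_removed_len}


def best_hits(hits: Iterable[Hit], max_evalue: float = 0.01) -> Dict[str, Hit]:
    """Keep, per contig, the hit with the lowest E-value below max_evalue."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # not significant at the chosen threshold
        current = best.get(hit.contig)
        if current is None or hit.evalue < current.evalue:
            best[hit.contig] = hit
    return best


# Toy data, for illustration only.
toy_contigs = {"c1": "ACGT" * 5, "c2": "ACGTACGT", "c3": "A" * 11}
toy_hits = [
    Hit("c1", "subjA", 1e-30),
    Hit("c1", "subjB", 1e-5),
    Hit("c3", "subjC", 0.5),
]

kept = filter_by_length(toy_contigs)
top = best_hits(h for h in toy_hits if h.contig in kept)
```

Contigs left without an entry in `top` correspond to those reported as having no significant similarity; the taxonomy lookup for each best hit is a separate step not shown here.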
Diaphanoeca grandis Genomic DNA Extraction and Analysis
Cultures of D. grandis were treated with a combination of 2.4ng/ml ampicillin (Sigma), 1.2ng/ml kanamycin (Sigma) and 1.2ng/ml streptomycin-penicillin (Gibco) for 36 hours in order to reduce the amount of bacterial contamination. 50ml of culture was then filtered through a 20µm nylon mesh (Small Parts Inc., Florida, USA) and 15ml of the filtrate collected for gDNA extraction, in a further attempt to remove a portion of the natural bacterial contamination present in the cultures. Approximately 20µg of gDNA was extracted from those cultures of D. grandis that were observed to have the highest amount of choanoflagellate material compared to bacterial contamination. DNA was extracted by a CTAB buffer based method [4].
The extracted gDNA was sequenced using 120bp paired-end reads with Illumina HiSeq2000 sequencing (Illumina Inc.). The sequence reads produced were assembled into contigs with ABySS v1.2.5 [5] using the default settings. The assembled genomic dataset was analyzed further by tBLASTx [2] to detect sequence similarity to individual genes. A wider taxonomic assignment was conducted using the metagenomic analysis program PhymmBL v3.2 [6] to classify contigs as being of bacterial or choanoflagellate origin. The choanoflagellate reference dataset comprised choanoflagellate sequences available in the EMBL/Genbank WGS genomes and non-redundant nucleotide sequence databases. The prokaryotic reference dataset used was the bacterial/archaeal genome database included with PhymmBL v3.2. Contigs were arbitrarily divided into those <1kb and those >1kb in size. These two datasets were used as separate queries and for both query datasets the default PhymmBL settings were used.
SUPPLEMENTARY RESULTS
Stephanoeca diplocostata EST Dataset
RNA samples extracted from S. diplocostata cultures were oligo-dT bead treated to enrich for poly(A) tagged eukaryotic mRNA. The success of this process was tested using the Agilent Bioanalyser Picochip and Pico assay software. By comparison to control eukaryotic RNA samples, the first round of enrichment produced a large reduction in contamination and a second round of poly(A) selection produced a marked reduction in rRNA content, with almost all of the rRNA peaks disappearing but a broad mRNA peak being retained. The remaining rRNA was measured at 7% of the total sample, below the 10% threshold recommended for 454 EST sequencing. The total choanoflagellate RNA after two rounds of poly(A) enrichment amounted to 180ng.
The results of the EST sequencing and assembly are summarized in Table S1. The 454 Titanium sequencing produced 0.261Gb of sequence. The average read length was 329 bases (standard deviation ±110 bases, median read length=347 bases) with a maximum read length of 659 bases, roughly in keeping with the predicted metrics for this sequencing platform. The Q40 score (a base identification of 99.99% accuracy) was 94.4%. The Newbler assembly of the reads produced 26,325 contigs of mean length 962bp. The longest contig was 12.6kb long.
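For readers less familiar with quality scores, the relationship between a Phred-scaled score Q and base-call accuracy can be written out explicitly. The sketch below assumes the standard Phred definition, in which the error probability is 10^(-Q/10); the function names are illustrative and are not part of any sequencing software.

```python
def phred_to_error_probability(q: float) -> float:
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy implied by a Phred-scaled quality score."""
    return 1 - phred_to_error_probability(q)


# Accuracy implied by a few commonly quoted scores.
accuracies = {q: phred_to_accuracy(q) for q in (20, 30, 40)}
```

Under this definition Q20, Q30 and Q40 correspond to one expected error in 100, 1,000 and 10,000 bases respectively.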
tBLASTx Analysis
The tBLASTx search of the EST dataset against the full EMBL/Genbank non-redundant nucleotide database was used to assign (a) similarity and (b) taxonomic identity to each contig. Hits to selected taxonomic groups are expressed as the total number and as a percentage of the contigs with hits in Table S2. It should be noted that these are only top hits and in the vast majority of cases equally or only marginally less significant hits to M. brevicollis were also returned.
The tBLASTx findings in Table S2 demonstrate the success of the poly(A) enrichment procedures in reducing the levels of bacterial, archaeal, viral and rRNA contamination. The enrichment is estimated to have reduced prokaryotic content from approximately 80% in the starting material to 13.5% of the final contigs. The true number of bacterial contigs may be even lower, given the proliferation of prokaryotic-to-eukaryotic lateral gene transfer found in choanoflagellates [7,8]. Of the tBLASTx hits, 241 were to known ribosomal RNA sequences, with 15% (37) eukaryotic rRNA and the remaining 85% (204) bacterial. Again this demonstrates the success of the mRNA enrichment procedures. All predicted eukaryotic rRNA contigs produced perfect (E-value=0.0) hits to the S. diplocostata 18S and 23S sequences from the EMBL/Genbank databases.
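The percentages quoted here can be re-derived from the counts in Table S2 and from the RNA amounts given in the methods. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative. Treating all non-choanoflagellate RNA in the 55µg pool as contamination is an assumption made only for this check.

```python
CONTIGS_WITH_HITS = 13716  # contigs returning a hit at E-value < 0.01

# Top-hit counts for selected groups, copied from Table S2.
TOP_HIT_COUNTS = {
    "Fungi/Metazoa (Opisthokonts)": 9352,
    "Metazoa": 5457,
    "Choanoflagellates": 3376,
    "Bacteria": 1850,
}


def percent(part: float, whole: float, ndigits: int = 1) -> float:
    """Express part as a percentage of whole, rounded to ndigits places."""
    return round(100 * part / whole, ndigits)


table_percentages = {
    group: percent(count, CONTIGS_WITH_HITS)
    for group, count in TOP_HIT_COUNTS.items()
}

# rRNA hits: 37 eukaryotic and 204 bacterial out of 241.
rrna_eukaryotic = percent(37, 241, 0)
rrna_bacterial = percent(204, 241, 0)

# Starting material: 55 ug of pooled RNA, about 10 ug of it from S. diplocostata.
starting_non_choanoflagellate = percent(55 - 10, 55, 0)
```

Note that the denominators differ: the table percentages are relative to contigs with hits, whereas the starting-material figure is relative to RNA mass.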
There is no evidence for contamination from other eukaryotes. There were no large numbers of hits to one species (apart from M. brevicollis) and an absence of non-choanoflagellate housekeeping genes. Only 62 top hits were to human sequences, each having low E-values, indicating that RNA samples were not contaminated by lab workers during RNA extraction or cDNA library construction.
Approximately 24% of top hits were to M. brevicollis sequences, with a further 0.6% coming from other choanoflagellate sequences in the EMBL/Genbank database (note that this analysis pre-dates the submission of the S. rosetta genome to Genbank). These hits included housekeeping genes that would be expected to be conserved within clades, e.g. ribosomal RNA, alpha tubulin [9]. M. brevicollis represented the single species with the largest number of hits, the majority of which had highly significant E-values confirming the successful sequencing of loricate choanoflagellate genes.
The metazoans were the largest clade producing top hits, with 39.8% of hits. This is largely due to the database searched by tBLASTx containing over 50 fully sequenced animal species versus only one fully sequenced choanoflagellate species, Monosiga brevicollis [10], and the many more animals with smaller scale sequence depositions into EMBL/Genbank. The high number of hits to metazoan sequences (and to opisthokont sequences, 68.2%) once again confirms the opisthokont affinity of loricate choanoflagellates and the evolutionary relationship between the choanoflagellates and metazoans [11–13].
A further notable finding of the tBLASTx analysis of the S. diplocostata EST dataset is that there are a large number of hits to sequences from other, distantly related eukaryotic groups. The most prominent of these are the stramenopiles (5.5%) and Viridiplantae (archaeplastids) (6.8%). Given the low levels of sampling in these groups with respect to large-scale sequencing projects, bias due to sequence availability cannot fully explain these results. One explanation is gene loss in the close relatives of loricate choanoflagellates from the eukaryotic last common ancestor. Gene loss has been observed in the non-loricate choanoflagellates [14] and metazoans [15]. Another explanation for the tBLASTx results is eukaryotic-eukaryotic lateral gene transfer, known to be a prominent feature of choanoflagellate genomes [16,17].
Diaphanoeca grandis Genome Dataset
The Illumina sequencing of genomic DNA from D. grandis cultures provided 329,237,297bp of sequence data. The sequenced reads were assembled into 921,181 contigs (sequence dataset available from the authors on request). However, these were mainly short contigs (N50=725bp, mean contig length=357.41bp). Local tBLASTx searches detected 100% matches to known sequences from D. grandis [1,9], confirming that D. grandis genomic material had been successfully sequenced. PhymmBL analysis found that the vast majority (>98%) of contigs greater than 1kb in length, and all contigs >10kb in length, were of bacterial origin. The dataset did not contain sufficient choanoflagellate genes, nor genes of sufficient completeness, to merit further large-scale taxonomic or protein domain analysis.
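The N50 and mean contig length quoted above are standard assembly summary statistics. The sketch below shows one common way of computing them from a list of contig lengths, assuming the usual definition of N50 as the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The function names and toy lengths are illustrative, and assemblers may differ in details such as tie handling.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Contig length at which the cumulative sum (longest first) reaches half the total."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for non-negative lengths; kept for completeness


def mean_length(total_bp: int, n_contigs: int) -> float:
    """Mean contig length in bp, rounded to two decimal places."""
    return round(total_bp / n_contigs, 2)


toy_n50 = n50([10, 9, 8, 7, 6, 5, 4, 3, 2])  # toy lengths, not real data
reported_mean = mean_length(329_237_297, 921_181)  # totals quoted in the text
```

Dividing the quoted total of 329,237,297bp by the 921,181 contigs gives the stated mean contig length.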
The partial genome sequence data did allow detection of contigs with significant similarity to parts of the SdSITa sequence using tBLASTx searches (see Results). PCR primers were designed from the longest contig sequence (see Table S4) and the amplified PCR product (DgSITa) was cloned, sequenced and used for all further analyses (see Materials and Methods, Results).
REFERENCES
1 Steenkamp, E. T., Wright, J. & Baldauf, S. L. 2006 The protistan origins of animals and fungi. Molecular Biology and Evolution 23, 93-106.
2 Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389-3402.
3 Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. & Lopez, R. 2005 InterProScan: protein domains identifier. Nucleic Acids Research 33, W116-W120.
4 Doyle, J. & Doyle, J. 1987 A rapid DNA isolation method for small quantities of fresh tissues. Phytochemical Bulletin 19, 11-15.
5 Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M. & Birol, I. 2009 ABySS: a parallel assembler for short read sequence data. Genome Research 19, 1117-1123.
6 Brady, A. & Salzberg, S. L. 2009 Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods 6, 673-676.
7 Torruella, G., Suga, H., Riutort, M., Peretó, J. & Ruiz-Trillo, I. 2009 The evolutionary history of lysine biosynthesis pathways within eukaryotes. Journal of Molecular Evolution 69, 240-248.
8 Sun, G. & Huang, J. 2011 Horizontally acquired DAP pathway as a unit of self-regulation. Journal of Evolutionary Biology 24, 587-595.
9 Carr, M., Leadbeater, B. S. C., Hassan, R., Nelson, M. & Baldauf, S. L. 2008 Molecular phylogeny of choanoflagellates, the sister group to Metazoa. Proceedings of the National Academy of Sciences of the United States of America 105, 16641-16646.
10 King, N. et al. 2008 The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788.
11 Nitsche, F., Carr, M., Arndt, H. & Leadbeater, B. S. C. 2011 Higher Level Taxonomy and Molecular Phylogenetics of the Choanoflagellatea. Journal of Eukaryotic Microbiology 58, 452-462.
12 Ruiz-Trillo, I., Roger, A. J., Burger, G., Gray, M. W. & Lang, B. F. 2008 A phylogenomic investigation into the origin of metazoa. Molecular Biology and Evolution 25, 664-672.
13 Torruella, G., Derelle, R., Paps, J., Lang, B. F., Roger, A. J., Shalchian-Tabrizi, K. & Ruiz-Trillo, I. 2011 Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single copy protein domains. Molecular Biology and Evolution 29, 531-544.
14 Sebe-Pedros, A., de Mendoza, A., Lang, B. F., Degnan, B. M. & Ruiz-Trillo, I. 2010 Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Molecular Biology and Evolution 28, 1241-1254.
15 Chauve, C., Doyon, J.-P. & El-Mabrouk, N. 2008 Gene family evolution by duplication, speciation, and loss. Journal of Computational Biology 15, 1043-1062.
16 Sun, G., Yang, Z., Ishwar, A. & Huang, J. 2010 Algal genes in the closest relatives of animals. Molecular Biology and Evolution 27, 2879-2889.
17 Nedelcu, A. M., Miles, I. H., Fagir, A. M. & Karol, K. 2008 Adaptive eukaryote-to-eukaryote lateral gene transfer: stress-related genes of algal origin in the closest unicellular relatives of animals. Journal of Evolutionary Biology 21, 1852-1860.
Supplemental Information Tables
Table S1. Summary statistics for the EST sequencing results. Statistics refer to the EST dataset after assembly by Newbler.
% A / 21
% T / 25
% G / 25
% C / 22
No. of contigs / 26,325
Mean contig length (bp) / 962
Median contig length (bp) / 720
Max. contig size (bp) / 12,628
N50 (bp) / 1,300
Table S2. Top hits to contigs from the tBLASTx search against the EMBL/Genbank database. Classifications are as per the Entrez taxonomy (as of October 2010). The total number of contigs in the search query was 25,797. The number of contigs that provided a hit was 13,716 and the number of contigs with no significant similarity at E-value <0.01 was 12,081.
Group/Species / Number of contigs with top hit to this taxon / % of Total Contigs with hits
Fungi/Metazoa (i.e. Opisthokonts) / 9352 / 68.2
Metazoa / 5457 / 39.8
Eumetazoa / 5210 / 38
Homo sapiens / 62 / 0.5
Choanoflagellates / 3376 / 24.6
Monosiga brevicollis / 3286 / 24
Fungi / 517 / 3.8
Amoebozoa / 230 / 1.7
Euglenozoa / 74 / 0.5
Viridiplantae / 926 / 6.8
Haptophyta / 4 / 0.03
Stramenopiles / 749 / 5.5
Diatoms / 304 / 2.2
Rhizaria / 4 / 0.03
Alveolata / 223 / 1.6
Bacteria / 1850 / 13.5
Archaea / 60 / 0.4
Viruses / 29 / 0.2
Table S3 BLAST and InterProScan results from SIT-like S. diplocostata EST contigs. Sequences are deposited within EMBL/Genbank BioProject PRJEB1282.
EMBL/Genbank Accession No. / No. of Reads / HMMPfam Domain [Region] Score / tBLASTx Top Hit / E-Value / PsiBLAST Top Hit / E-Value
HAAH01000001 / 322 / PF03842 Silicon transporter [52-491] 7.3e-76 / P. tricornutum SIT2-2 GI:215398379 / 9.00E-27 / P. tricornutum SIT3 GI:215398382 / 7.00E-66
HAAH01000002 / 64 / PF03842 Silicon transporter [11-193] 3.4e-30 / S. acus SIT GI:227460943 / 7.00E-11 / S. acus SIT3 GI:227460944 / 1.00E-23
HAAH01000003 / 9 / PF03842 Silicon transporter [2-259] 2.3e-36 / N. pelliculosa SIT1 GI:82527174 / 1.00E-17 / P. tricornutum SIT3 GI:215398382 / 2.00E-31
HAAH01000004 / 324 / PF03842 Silicon transporter [45-260] 1.1e-36 / S. acus SIT GI:227460943 / 2.00E-24 / P. tricornutum Cell Surface Receptor Protein GI:219116172 / 6.00E-28
HAAH01000005 / 331 / PF03842 Silicon transporter [56-494] 3.9e-76 / P. tricornutum SIT2-2 GI:215398379 / 1.00E-27 / P. tricornutum SIT3 GI:215398382 / 1.00E-66
HAAH01000006 / 64 / PF03842 Silicon transporter [9-233] 3.2e-35 / S. acus SIT GI:227460943 / 9.00E-11 / P. tricornutum SIT3 GI:215398382 / 4.00E-28
Table S4 Primer Sequences designed for the amplification of S. diplocostata SIT-like (SdSIT) and D. grandis SIT-like (DgSIT) genes. All sequences are given 5’-3’. Primers were synthesized by Sigma.
Primer / Sequence
DgSITa_R / GGCATGAGCACGGTGTAGTACGC
DgSITa_F / AACAATGGAACAACCCTCCATGGG
SdSIT_24102_F / CCATCATCTAGAAGATCCTCAAAG
SdSIT_24102_R / CGTATTTAAGTAATGAAACGATAGTGT
SdSIT_00527_F1 / CACCCGACCACAAGGACCAG
SdSIT_00527_F2 / ACAATGGATAAGAGCCACATCC
SdSIT_00527_R1 / GTGGAAATAATAAAGATTTAATGAGAGTAC
SdSIT_00527_R2 / AAAGATTTAATGAGAGTACAACAATTACCC
SdSIT_10214_F / AACATGGAGAAAAGCCACG
SdSIT_R / GGCTGGTGCAGGTCAAATGGT
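As a quick sanity check on primer sequences such as those in Table S4, basic properties like length, GC content and the reverse complement can be computed directly from the 5'-3' strings. This is a minimal sketch and not part of the original primer design procedure; the function names are hypothetical and only unambiguous A/C/G/T bases are handled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Two primers copied from Table S4.
primers = {
    "DgSITa_F": "AACAATGGAACAACCCTCCATGGG",
    "DgSITa_R": "GGCATGAGCACGGTGTAGTACGC",
}

summary = {
    name: {"length": len(seq), "gc": gc_fraction(seq)}
    for name, seq in primers.items()
}
```

The reverse complement is useful when checking a reverse primer against the sense strand of a contig, since the reverse primer binds the opposite strand.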
Table S5 Successful primer combinations and their resulting products.
Product Name / Forward Primer / Reverse Primer
SdSITa / SdSIT_24102_F / SdSIT_24102_R
SdSITb.1 / SdSIT_00527_F1 / SdSIT_00527_R2
SdSITb.2 / SdSIT_00527_F2 / SdSIT_00527_R1
SdSITc / SdSIT_10214_F / SdSIT_R
DgSITa / DgSITa_F / DgSITa_R
Table S6 Choanoflagellate SIT similarity and protein domain search results. All analyses were conducted using default settings. BLAST searches were done against the EMBL/Genbank non-redundant databases.
Gene (EMBL/Genbank Accession No.) / tBLASTx Top Hit (E-value) / PsiBLAST Top Hit (E-value) / InterProScan HMMPfam Domain [Region] Score
SdSITa (HE981735) / P. tricornutum SIT2-2 GI:215398379 (2e-31) / P. tricornutum SIT2-2 GI:215398382 (4e-68) / Silicon Transporter PF03842 [42-491] 9.4e-76
SdSITb (HE981736) / P. tricornutum SIT2-2 GI:215398379 (3e-27) / P. tricornutum SIT2-2 GI:215398382 (5e-68) / Silicon Transporter PF03842 [40-475] 2.8e-76
SdSITc (HE981737) / S. acus SIT GI:227460943 (2e-24) / P. tricornutum SIT2-2 GI:215398382 (2e-67) / Silicon Transporter PF03842 [40-474] 4.4e-75
DgSITa (HE981738) / C. fusiformis SIT5 GI:3283037 (3e-17) / C. fusiformis SIT3 GI:3283034 (7e-22) / Silicon Transporter PF03842 [3-180] 2.2e-30
Table S7 Results of WoLF PSORT analysis of S. diplocostata SITs. For all three available eukaryotic subcellular location databases, the majority prediction was localization to the plasma membrane.
Gene / Prediction vs. Animal Database / Prediction vs. Plant Database / Prediction vs. Fungal Database
SdSITa / 31 Plasma Membrane; 1 Golgi Membrane / 11 Plasma Membrane; 2 E.R.; 1 Vacuole / 23 Plasma Membrane; 3 E.R.
SdSITb / 32 Plasma Membrane / 11 Plasma Membrane; 2 E.R.; 1 Vacuole / 26 Plasma Membrane; 1 E.R.
SdSITc / 32 Plasma Membrane / 11 Plasma Membrane; 2 E.R.; 1 Vacuole / 26 Plasma Membrane; 1 E.R.
Table S8 Significant tBLASTx hits to SdSIT genes. These 156 sequences were used in a ClustalX alignment for the purposes of identifying conserved protein motifs and functionally relevant residues (charged or hydroxylated).
EMBL/Genbank Gene Identifier Number / Group / Species
gi|215398382 / Pennate Diatom / Phaeodactylum tricornutum
gi|219116172 / Pennate Diatom / Phaeodactylum tricornutum
gi|3283034 / Pennate Diatom / Cylindrotheca fusiformis
gi|1480867 / Pennate Diatom / Cylindrotheca fusiformis
gi|3283030 / Pennate Diatom / Cylindrotheca fusiformis
gi|3283038 / Pennate Diatom / Cylindrotheca fusiformis
gi|3283036 / Pennate Diatom / Cylindrotheca fusiformis
gi|3283032 / Pennate Diatom / Cylindrotheca fusiformis
gi|227460944 / Pennate Diatom / Synedra acus
gi|82527177 / Pennate Diatom / Nitzschia alba
gi|219128344 / Pennate Diatom / Phaeodactylum tricornutum
gi|219126028 / Pennate Diatom / Phaeodactylum tricornutum
gi|82527195 / Centric Diatom / Thalassiosira pseudonana
gi|82527191 / Centric Diatom / Skeletonema costatum
gi|82527197 / Centric Diatom / Thalassiosira pseudonana
gi|224004538 / Centric Diatom / Thalassiosira pseudonana
gi|82527193 / Centric Diatom / Thalassiosira pseudonana
gi|224003147 / Centric Diatom / Thalassiosira pseudonana
gi|224002056 / Centric Diatom / Thalassiosira pseudonana
gi|82527175 / Pennate Diatom / Fistulifera pelliculosa
gi|82527161 / Pennate Diatom / Phaeodactylum tricornutum
gi|82527185 / Pennate Diatom / Nitzschia sp. KKT-2005
gi|82527183 / Pennate Diatom / Nitzschia alba
gi|82527179 / Pennate Diatom / Nitzschia alba
gi|82527181 / Pennate Diatom / Nitzschia alba
gi|82527169 / Pennate Diatom / Fistulifera pelliculosa
gi|82527167 / Pennate Diatom / Fistulifera pelliculosa
gi|94983079 / Centric Diatom / Thalassiosira pseudonana
gi|94983081 / Centric Diatom / Thalassiosira pseudonana
gi|94983087 / Centric Diatom / Porosira glacialis
gi|94983177 / Centric Diatom / Thalassiosira weissflogii
gi|94983155 / Centric Diatom / Thalassiosira weissflogii
gi|94983169 / Centric Diatom / Minidiscus trioculatus
gi|82527199 / Centric Diatom / Bacterosira sp. CCMP991
gi|94983211 / Centric Diatom / Thalassiosira weissflogii
gi|94983089 / Centric Diatom / Porosira pseudodenticulata
gi|94983085 / Centric Diatom / Porosira glacialis
gi|94983191 / Centric Diatom / Thalassiosira rotula
gi|94983153 / Centric Diatom / Minidiscus trioculatus
gi|94983141 / Centric Diatom / Thalassiosira nodulolineata
gi|82527201 / Centric Diatom / Thalassiosira weissflogii
gi|94983171 / Centric Diatom / Bacterosira sp. CCMP991
gi|82527163 / Pennate Diatom / Fistulifera pelliculosa
gi|82527173 / Pennate Diatom / Fistulifera pelliculosa
gi|94983193 / Centric Diatom / Thalassiosira rotula
gi|94983143 / Centric Diatom / Thalassiosira nodulolineata
gi|82527165 / Pennate Diatom / Fistulifera pelliculosa
gi|94983229 / Centric Diatom / Bacterosira bathyomphala
gi|94983165 / Centric Diatom / Thalassiosira minima
gi|94983181 / Centric Diatom / Thalassiosira sp. CCMP1065
gi|94983133 / Centric Diatom / Thalassiosira gessneri
gi|94983093 / Centric Diatom / Lauderia annulata
gi|94983235 / Centric Diatom / Skeletonema menzellii
gi|94983111 / Centric Diatom / Cyclotella cf. meneghiniana
gi|94983167 / Centric Diatom / Thalassiosira minima
gi|94983097 / Centric Diatom / Thalassiosira punctigera
gi|94983129 / Centric Diatom / Thalassiosira gessneri
gi|94983103 / Centric Diatom / Cyclotella striata
gi|94983227 / Centric Diatom / Bacterosira bathyomphala
gi|94983091 / Centric Diatom / Porosira pseudodenticulata
gi|94983149 / Centric Diatom / Thalassiosira sp. CCMP353
gi|94983223 / Centric Diatom / Detonula pumila
gi|94983209 / Centric Diatom / Thalassiosira weissflogii
gi|82527171 / Pennate Diatom / Fistulifera pelliculosa
gi|94983175 / Centric Diatom / Thalassiosira weissflogii
gi|94983131 / Centric Diatom / Thalassiosira gessneri
gi|94983233 / Centric Diatom / Skeletonema subsalsum
gi|94983095 / Centric Diatom / Thalassiosira punctigera
gi|94983189 / Centric Diatom / Thalassiosira rotula
gi|94983105 / Centric Diatom / Cyclotella striata
gi|94983151 / Centric Diatom / Thalassiosira sp. CCMP353
gi|94983219 / Centric Diatom / Shionodiscus ritscheri
gi|94983199 / Centric Diatom / Thalassiosira pacifica
gi|94983107 / Centric Diatom / Cyclotella cf. meneghiniana
gi|94983299 / Centric Diatom / Stephanodiscus minutulus
gi|94983101 / Centric Diatom / Cyclotella cryptica
gi|94983275 / Centric Diatom / Stephanodiscus neoastraea
gi|94983237 / Centric Diatom / Skeletonema japonicum
gi|94983249 / Centric Diatom / Cyclostephanos tholiformis
gi|94983243 / Centric Diatom / Stephanodiscus binderanus
gi|82527189 / Centric Diatom / Skeletonema costatum
gi|94983245 / Centric Diatom / Stephanodiscus parvus
gi|94983145 / Centric Diatom / Thalassiosira nodulolineata
gi|94983225 / Centric Diatom / Detonula pumila
gi|94983289 / Centric Diatom / Stephanodiscus hantzschii
gi|94983119 / Centric Diatom / Cyclotella cf. meneghiniana
gi|94983293 / Centric Diatom / Stephanodiscus hantzschii
gi|94983217 / Centric Diatom / Thalassiosira sp. CC03-04
gi|94983203 / Centric Diatom / Thalassiosira pacifica
gi|94983241 / Centric Diatom / Stephanodiscus agassizensis
gi|94983291 / Centric Diatom / Stephanodiscus sp. Y98-1
gi|94983099 / Centric Diatom / Thalassiosira pseudonana
gi|94983221 / Centric Diatom / Shionodiscus ritscheri
gi|94983259 / Centric Diatom / Stephanodiscus minutulus
gi|94983127 / Centric Diatom / Cyclotella distinguenda
gi|94983163 / Centric Diatom / Thalassiosira antarctica
gi|94983247 / Centric Diatom / Stephanodiscus parvus
gi|94983161 / Centric Diatom / Thalassiosira antarctica
gi|94983121 / Centric Diatom / Cyclotella cf. meneghiniana
gi|94983231 / Centric Diatom / Skeletonema grethae
gi|94983123 / Centric Diatom / Cyclotella distinguenda
gi|94983285 / Centric Diatom / Stephanodiscus minutulus
gi|94983301 / Centric Diatom / Stephanodiscus minutulus
gi|94983257 / Centric Diatom / Stephanodiscus minutulus
gi|94983173 / Centric Diatom / Thalassiosira oceanica
gi|94983255 / Centric Diatom / Stephanodiscus niagarae
gi|94983269 / Centric Diatom / Stephanodiscus niagarae
gi|94983281 / Centric Diatom / Stephanodiscus reimerii
gi|94983125 / Centric Diatom / Cyclotella distinguenda
gi|94983277 / Centric Diatom / Stephanodiscus neoastraea
gi|94983297 / Centric Diatom / Stephanodiscus yellowstonensis
gi|94983265 / Centric Diatom / Discostella cf. pseudostelligera
gi|94983279 / Centric Diatom / Stephanodiscus reimerii
gi|94983239 / Centric Diatom / Stephanodiscus agassizensis
gi|94983267 / Centric Diatom / Discostella stelligera
gi|94983287 / Centric Diatom / Cyclostephanos sp. WTC16
gi|94983251 / Centric Diatom / Cyclostephanos invisitatus
gi|94983283 / Centric Diatom / Stephanodiscus reimerii
gi|94983215 / Centric Diatom / Thalassiosira sp. CC03-04
gi|94983253 / Centric Diatom / Cyclostephanos invisitatus
gi|82527187 / Centric Diatom / Paralia sulcata
gi|94983295 / Centric Diatom / Stephanodiscus yellowstonensis
gi|94983271 / Centric Diatom / Discostella pseudostelligera
gi|94983261 / Centric Diatom / Cyclotella bodanica
gi|76594269 / Centric Diatom / Chaetoceros muellerii
gi|94983273 / Centric Diatom / Discostella pseudostelligera
gi|20799543 / Pennate Diatom / Synedra acus var. radians
gi|94983263 / Centric Diatom / Cyclotella bodanica
gi|94983139 / Centric Diatom / Thalassiosira anguste-lineata
gi|94983157 / Centric Diatom / Thalassiosira aestivalis
gi|94983117 / Centric Diatom / Cyclotella atomus
gi|94983137 / Centric Diatom / Thalassiosira anguste-lineata
gi|94983187 / Centric Diatom / Thalassiosira sp. CCMP1093
gi|94983179 / Centric Diatom / Thalassiosira weissflogii
gi|94983213 / Centric Diatom / Thalassiosira weissflogii
gi|94983115 / Centric Diatom / Cyclotella sp. L1844
gi|94983135 / Centric Diatom / Thalassiosira anguste-lineata
gi|146746039 / Pennate Diatom / Achnanthes exigua
gi|94983183 / Centric Diatom / Thalassiosira sp. CCMP1065
gi|94983083 / Centric Diatom / Thalassiosira pseudonana
gi|94983159 / Centric Diatom / Thalassiosira aestivalis
gi|94983205 / Centric Diatom / Thalassiosira pacifica
gi|94983207 / Centric Diatom / Thalassiosira pacifica
gi|125661882 / Pennate Diatom / Rhopalodia gibba
gi|94983147 / Centric Diatom / Thalassiosira eccentrica
gi|148250143 / Pennate Diatom / Synedra vaucheriae
gi|125661884 / Pennate Diatom / Epithemia zebra
gi|94983109 / Centric Diatom / Cyclotella meneghiniana
gi|94983185 / Centric Diatom / Thalassiosira sp. CCMP1093
gi|94983195 / Centric Diatom / Thalassiosira punctigera
gi|82919490 / Chrysophyte (Non-diatom Stramenopile) / Ochromonas ovalis
gi|94983197 / Centric Diatom / Thalassiosira punctigera
gi|94983113 / Centric Diatom / Cyclotella meneghiniana
gi|148250145 / Pennate Diatom / Nitzschia communis
gi|70797601 / Pennate Diatom / Pseudo-nitzschia multiseries
gi|71152682 / Pennate Diatom / Pseudo-nitzschia pungens
Table S9 Stramenopile sequences used in the maximum likelihood and Bayesian analyses of S. diplocostata SITa-c and D. grandis SITa. Taxonomy and SIT classifications are based on the EMBL/Genbank annotations.