Materials and methods

Classification of the North Atlantic V6-tag data

The phylogenetic diversity of the V6-sequence tags of Sogin et al. (2006) was computed with a custom Perl script and is summarized in Figure 1.
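A per-taxon summary of the kind shown in Figure 1 amounts to tallying classified tags by lineage. The following is a minimal Python sketch, not the original Perl script. It assumes that each classified tag has already been reduced to a semicolon-separated lineage string ordered from domain downwards. The function name `summarize_by_rank` and that input format are hypothetical.

```python
from collections import Counter


def summarize_by_rank(lineages, rank_index):
    """Percentage of classified tags assigned to each taxon at one rank.

    lineages: iterable of semicolon-separated lineage strings (hypothetical
    format), ordered from domain downwards.
    rank_index: 0 for domain, 1 for the next rank, and so on.
    Lineages too short to reach rank_index are pooled as "unclassified".
    """
    counts = Counter()
    for lineage in lineages:
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        taxon = ranks[rank_index] if rank_index < len(ranks) else "unclassified"
        counts[taxon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() lists taxa from most to least abundant
    return {taxon: 100.0 * n / total for taxon, n in counts.most_common()}
```

Under the assumed format, `summarize_by_rank(lineages, 1)` would give the percentage of tags in each second-level taxon of a sample.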

Each set of sequences, trimmed as described in the original paper, was searched with BLAST (Altschul et al. 1990) against the V6RefDB database (Sogin et al. 2006). Only hits with an e-value below 0.001 were considered significant.
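The significance filter can be sketched as follows. This is a minimal sketch that assumes the BLAST results were saved in the 12-column tabular format produced by BLAST+ with `-outfmt 6`. The original analysis does not state which BLAST output format was parsed, so that choice, and the helper name `significant_hits`, are assumptions.

```python
import csv
from collections import defaultdict

EVALUE_CUTOFF = 1e-3  # hits with e-value >= 0.001 are treated as not significant

# Default column order of BLAST+ tabular output (-outfmt 6); assumed format.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(lines, cutoff=EVALUE_CUTOFF):
    """Group significant hits by query tag, keeping BLAST's best-first order.

    Returns {query_id: [(subject_id, evalue, bitscore), ...]}.
    """
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        if evalue < cutoff:
            hits[rec["qseqid"]].append((rec["sseqid"], evalue, float(rec["bitscore"])))
    return dict(hits)
```

Because BLAST reports the hits for each query from most to least significant, the first entry in each list is the best hit. This is the entry whose accession number is used in the next step.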

The accession number in the FASTA header of the best hit was then used to retrieve the full-length sequence, which was searched against the RDP-II (Ribosomal Database Project) database (Maidak et al. 2001; Cole et al. 2005) to obtain its phylogenetic lineage.

Whenever the best hit was to a sequence that could not be classified, the script scanned the top five BLAST hits and retrieved the highest-scoring sequence that could be classified by this method. If no classifiable sequence was found among the top five hits, or if the e-value of the next candidate hit reached the significance threshold of 0.001, the tag was discarded from the analysis. With this method, between 77% and 96% of the tags in each sample were classified.
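The fallback rule and the classified fraction can be sketched as follows. This is a minimal sketch, not the original script. The RDP-II lineage lookup is represented by a hypothetical callable `lookup_lineage` that returns a lineage string, or `None` when the sequence cannot be classified. The hits for each tag are assumed to be `(accession, evalue)` pairs sorted best-first.

```python
from typing import Callable, Optional

TOP_N = 5             # number of BLAST hits scanned per tag
EVALUE_CUTOFF = 1e-3  # significance threshold


def classify_tag(hits, lookup_lineage: Callable[[str], Optional[str]],
                 top_n=TOP_N, cutoff=EVALUE_CUTOFF):
    """Return the lineage of the best classifiable hit among the top_n hits.

    hits: list of (accession, evalue) pairs, best hit first.
    lookup_lineage: hypothetical stand-in for the RDP-II lookup.
    Returns None if the tag should be discarded.
    """
    for accession, evalue in hits[:top_n]:
        if evalue >= cutoff:
            break  # hits are sorted, so all remaining hits are non-significant
        lineage = lookup_lineage(accession)
        if lineage is not None:
            return lineage
    return None


def classify_sample(sample_hits, lookup_lineage):
    """Classify every tag in a sample.

    sample_hits: {tag: hits}; tags without any BLAST hit map to [].
    Returns (lineages of classified tags, fraction of tags classified).
    """
    lineages = {}
    for tag, hits in sample_hits.items():
        lineage = classify_tag(hits, lookup_lineage)
        if lineage is not None:
            lineages[tag] = lineage
    fraction = len(lineages) / len(sample_hits) if sample_hits else 0.0
    return lineages, fraction
```

The `fraction` value corresponds to the per-sample classified proportion reported above (77% to 96%), provided that every tag of the sample is passed in, including tags with no hits.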

Determination of intergenic regions and ribosomal operon copy numbers

For each genome, the average intergenic spacer length was calculated by dividing the number of non-coding nucleotides by the total number of predicted ORFs, ribosomal RNA genes and tRNA genes.
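The calculation can be sketched as follows. This is a minimal sketch under stated assumptions. Gene features are given as hypothetical 1-based, inclusive `(start, end)` coordinate pairs. Overlapping features are merged so that no base is counted twice. Features spanning the origin of a circular chromosome are not handled. The original does not specify how overlaps were treated, so the merging step is an assumption.

```python
def noncoding_nucleotides(genome_length, features):
    """Count genome positions not covered by any feature.

    features: (start, end) pairs, 1-based and inclusive; reversed pairs
    (minus-strand notation) are normalized. Overlaps are merged.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in features):
        if cur_end is None or start > cur_end + 1:
            # a new, non-overlapping block begins; close the previous one
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current block
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return genome_length - covered


def average_intergenic_spacer(genome_length, orfs, rrnas, trnas):
    """Non-coding nucleotides divided by the number of ORFs, rRNA and tRNA genes."""
    features = list(orfs) + list(rrnas) + list(trnas)
    if not features:
        raise ValueError("genome has no annotated genes")
    return noncoding_nucleotides(genome_length, features) / len(features)
```

For example, consider a 100 bp toy genome with ORFs at (1, 10) and (5, 20), an rRNA gene at (41, 50) and a tRNA gene at (60, 55). Its genes cover 36 bp, which leaves 64 non-coding bases. Dividing by 4 genes gives an average spacer of 16.0 bp.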

Ribosomal operon copy numbers were retrieved either from completed genome projects or from the Ribosomal RNA Operon Copy Number Database (rrndb; Klappenbach et al. 2001).

Draft genome sequences were used for this analysis only if they consisted of fewer than 10 scaffolds. For these drafts, additional ribosomal RNA operons were identified by searching the ends of the scaffolds. Because rRNA operons are repetitive, most of them are expected to fall in sequence gaps.
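The draft-genome filter and the extraction of scaffold ends can be sketched as follows. This is a minimal sketch, not the procedure actually used. The window length `END_WINDOW` is a hypothetical parameter, because the original does not state how much of each scaffold end was searched. The extracted windows would then be searched for rRNA sequence with a tool of choice.

```python
MAX_SCAFFOLDS = 10  # drafts with 10 or more scaffolds are excluded
END_WINDOW = 2000   # hypothetical: bases taken from each scaffold end


def read_fasta(lines):
    """Minimal FASTA parser yielding (name, sequence) pairs."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the first word of the header as the scaffold name
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_ends(scaffolds, window=END_WINDOW, max_scaffolds=MAX_SCAFFOLDS):
    """Return {label: end sequence} to search for rRNA, or None if excluded.

    scaffolds: list of (name, sequence) pairs for one draft genome.
    Windows of scaffolds shorter than 2 * window overlap.
    """
    if len(scaffolds) >= max_scaffolds:
        return None
    ends = {}
    for name, seq in scaffolds:
        ends[f"{name}_start"] = seq[:window]
        ends[f"{name}_end"] = seq[-window:]
    return ends
```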

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215:403-410

Cole JR et al. (2005) The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Research 33:D294-D296

Klappenbach JA, Saxman PR, Cole JR, Schmidt TM (2001) rrndb: the Ribosomal RNA Operon Copy Number Database. Nucleic Acids Research 29:181-184

Maidak BL et al. (2001) The RDP-II (Ribosomal Database Project). Nucleic Acids Research 29:173-174

Sogin ML et al. (2006) Microbial diversity in the deep sea and the underexplored "rare biosphere". Proceedings of the National Academy of Sciences of the United States of America 103:12115-12120
