SOM:

S1. Detailed overview of the samples used in this study.

Indicated are the year and month in which each sample was collected, the BATS cruise number, and the depth from which the sample was collected. Red sample fields indicate samples that were completely analyzed and are included in the master dataset of this study. Blue fields indicate samples that were collected but for which either 16S rRNA genes could not be amplified or the T-RFLP analysis failed, even after repeated attempts.

S2. Predicted and observed fragment lengths of representative clones from different phylogenetic groups.

S3. ANOSIM analysis of different groups of BATS samples. A clear distinction between upper euphotic zone (UEZ; surface, 40 m and 80 m samples) and upper mesopelagic (UMP; 200 m, 250 m and 300 m samples) microbial communities was observed in the ANOSIM analysis (R: 0.78). The deep chlorophyll maximum community (DCM; 120 m and 160 m samples) was intermediate between UEZ and UMP (DCM-UEZ R: 0.39; DCM-UMP R: 0.36). The samples in the ordination (see also Figure 2) are colored according to the group to which they were assigned in the statistical analysis. For all pairs, a significance level of 0.1% was observed in the ANOSIM analysis. A total of 999 permutations, out of a very large number of possible permutations, were tested, and none of the permuted statistics was greater than or equal to the global R of 0.596.
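
For readers unfamiliar with ANOSIM, the following is a minimal sketch of how an R statistic and its permutation significance level are commonly computed. It is not the software used in this study. The function names, the toy inputs and the (hits + 1) / (permutations + 1) convention for the significance level are assumptions made for illustration. Under that convention, 999 permutations with no permuted R at or above the observed R give 1/1000, that is, 0.1%.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to len(values); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, labels):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim_test(dist, labels, n_perm=999, seed=0):
    """Return (observed R, permutation significance level).

    Assumed convention: the observed arrangement counts as one of the
    permutations, so the smallest attainable level is 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here dist is a symmetric matrix of pairwise community dissimilarities (a list of lists) and labels assigns each sample to a group such as UEZ, DCM or UMP. Pairwise R values like those reported above would be obtained by restricting both inputs to the samples of two groups at a time.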

S4. Adjusted composite plot showing the deep chlorophyll maximum distribution pattern of marine Microthrix. Plotted is the relative fluorescence of the fragments from Bsh1236I and MspI, adjusted in a similar way as the triplets for patterns of other organisms. The BsuRI fragment was not detected because its size, as predicted from clones, was below the size range (50-1000 bp) analyzed. Because of this, abundance might be overestimated. DM indicates the month of the deep mixing event.

S5. Depth profile of SAR324 clade organisms. All species-level subclades showed low abundance in the euphotic zone and increased in abundance in the mesopelagic. Subclade SAR276 was the only one with significant relative abundance above 80 m. Plotted are averages of relative abundances of the complete dataset.

S6. Phylogenetic reconstruction of SAR86 clade 16S rRNA gene sequences.

Neighbor-joining trees. a) Tree of selected full-length sequences (>1200 bp) from the database. Bootstrap values over 50% of 1000 replicates are shown. b) The phylogenetic tree shown under a) with shorter sequences from the clone libraries added using the ARB parsimony tool.

S7. 16S rRNA evolutionary tree of the SAR116 clade of Alphaproteobacteria. Shown are the different subclades and their respective triplets (BsuRI/Bsh1236I/MspI). Naming follows conventions established by Suzuki et al. (2001); however, this analysis resolved subclade I into subgroups Ia and Ib, subclade II into three different subgroups (IIa, IIb and IIc), and subclade III into subgroups IIIa and IIIb. Further, two new subclades (named IV and V) were delineated. Dominant triplets are indicated for all subclades except subclade IIb, where no unifying triplet could be detected.

S8. Phylogenetic reconstruction of SAR116 clade 16S rRNA gene sequences.

Neighbor-joining trees of selected full-length sequences (>1200 bp) from the database. Bootstrap values over 50% of 1000 replicates are shown.

S9. Supplementary discussion: unidentified T-RFLP fragments.
