Apidologie Supplementary Material

Mitochondrial DNA variation of Apis mellifera iberiensis: further insights from a large scale study using sequence data of the tRNAleu-cox2 intergenic region

Julio Chávez-Galarza1,2, Lionel Garnery3,4, Dora Henriques1,2, Cátia J. Neves1, Wahida Loucif-Ayad5, J. Spencer Johnston6, M. Alice Pinto1*

1Mountain Research Centre (CIMO), Polytechnic Institute of Bragança, Campus de Sta. Apolónia, 5300-253 Bragança, Portugal

2Centre of Molecular and Environmental Biology (CBMA), University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal

3Laboratoire Evolution, Génomes et Spéciation, CNRS, Bât13, Avenue de la Terrasse, 91198, Gif-sur-Yvette, France

4Université de Versailles, Saint Quentin en Yvelines, 45 Avenue des Etats-Unis, 78, Versailles, France

5Laboratory of Applied Animal Biology, University Badji-Mokhtar, Annaba, Algeria

6Dept. of Entomology, Texas A&M University, College Station, Texas 77843-2475, USA

*Corresponding author

Architecture of the tRNAleu-cox2 intergenic region

The architecture of the tRNAleu-cox2 intergenic region has been extensively described in the literature and was recently reviewed in the BeeBook (Meixner et al. 2013). Briefly, this region encompasses the 3' end of the tRNAleu gene, the P and Q elements, and the 5' end of the cox2 gene (see Figure S1). Its length is determined by the size and composition of the non-coding fragment formed by the P and Q elements. The P element varies between ~53 and 68 bp, whereas the Q element varies between ~194 and 196 bp. The P element exhibits three forms, known as P0, P, and P1. The Q element can be repeated in tandem one to five times, although the number of repeats is not lineage specific (Garnery et al. 1993; De la Rúa et al. 1998; Franck et al. 1998, 2001; Alburaki et al. 2011; Rortais et al. 2011). The Q element can be divided into three parts (Q1, Q2, and Q3), which show a high level of similarity with the 3' end of the cox1 gene, the tRNAleu gene, and the P element, respectively. While the described complexity is mostly captured by the DraI test, further variation (nucleotide substitutions and short indels) can only be revealed by sequence data.
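
As a rough illustration of how the P and Q elements drive the length of the non-coding fragment, the sketch below combines the approximate element sizes quoted above with the number of tandem Q copies. The function name and the simple additive model are assumptions made for illustration; the sizes are approximate ("~") in the text, and the flanking tRNAleu and cox2 portions are not included.

```python
# Approximate element sizes quoted in the text (bp). The "~" in the source
# means these bounds are indicative only.
P_RANGE = (53, 68)    # P element
Q_RANGE = (194, 196)  # one copy of the Q element


def noncoding_length_range(q_copies, has_p=True):
    """Return (min, max) length in bp of the P + Q non-coding fragment.

    Hypothetical helper: assumes the fragment is simply P plus q_copies
    tandem Q elements, each within the quoted size range.
    """
    if not 1 <= q_copies <= 5:
        raise ValueError("the text reports one to five Q copies")
    p_min, p_max = P_RANGE if has_p else (0, 0)
    q_min, q_max = Q_RANGE
    return (p_min + q_copies * q_min, p_max + q_copies * q_max)


for n in range(1, 6):
    print(n, noncoding_length_range(n))
```

Under these assumptions the fragment spans roughly 247-264 bp with one Q copy and 1023-1048 bp with five, which gives a sense of the length polymorphism visible before digestion.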

Discrimination of lineages and African sub-lineages

Discrimination of the A, M, and C mtDNA lineages and of the African sub-lineages (AI, AII, AIII, and Z) has primarily been based on DraI sites located at the 3' end of tRNAleu and the 5' end of the first Q element, as well as on indels located in the P element. Two forms of the P element, P0 and P1, are typical of lineage A. The P1 form is characterized by a 15-bp deletion at the 3' end of the P element (d1 in Figure S1), whereas P0 does not exhibit any large deletion (Figure S1). The P0 form is carried by sub-lineages AI, AII, and Z (Franck et al. 2001; Alburaki et al. 2011), whereas P1 is carried by sub-lineage AIII (De la Rúa et al. 1998; Garnery et al. 1998). Sub-lineage AII is differentiated from sub-lineage AI by the absence of the DraI site at the 5' end of the first Q element, whereas sub-lineage Z presents an additional DraI site in the middle of the first Q element. Lineage M is mainly distinguished by a 13-bp deletion in the middle of the P element (d in Figure S1) and by two DraI sites in the first Q element, which are also present in sub-lineage Z. Lineage C, on the other hand, is characterized by the absence of the P element together with the presence of a single Q element. In summary, the length of the intergenic region can be highly variable among and within lineages, depending on the combination of the number of Q elements and the form of the P element.
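
The diagnostic features described above can be summarized as a small decision procedure. The following is a simplified sketch, not a validated classifier: the class name, field names, and the "unresolved" fallback are hypothetical, and the rules only encode the features named in this section (presence of the P element, the d and d1 deletions, and the two DraI sites in the first Q element).

```python
from dataclasses import dataclass


@dataclass
class RegionFeatures:
    """Hypothetical summary of one tRNAleu-cox2 intergenic sequence."""
    has_p_element: bool          # False when the P element is absent
    q_copies: int                # number of tandem Q elements
    deletion_d: bool = False     # 13-bp deletion in the middle of P
    deletion_d1: bool = False    # 15-bp deletion at the 3' end of P (P1 form)
    drai_q_5prime: bool = False  # DraI site at the 5' end of the first Q
    drai_q_middle: bool = False  # DraI site in the middle of the first Q


def classify(f):
    """Return a lineage or sub-lineage label, or 'unresolved'."""
    if not f.has_p_element:
        # Lineage C: no P element and a single Q element.
        return "C" if f.q_copies == 1 else "unresolved"
    if f.deletion_d:
        # Lineage M: deletion d plus two DraI sites in the first Q.
        both_sites = f.drai_q_5prime and f.drai_q_middle
        return "M" if both_sites else "unresolved"
    if f.deletion_d1:
        return "AIII"  # P1 form
    # No large deletion: P0 form, carried by AI, AII, and Z.
    sites = (f.drai_q_5prime, f.drai_q_middle)
    p0_rules = {
        (True, False): "AI",
        (False, False): "AII",  # lacks the 5' DraI site
        (True, True): "Z",      # additional site in the middle of Q
    }
    return p0_rules.get(sites, "unresolved")


print(classify(RegionFeatures(has_p_element=False, q_copies=1)))
```

Combinations not covered by the description fall through to "unresolved" rather than being forced into a lineage.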

Nomenclature system

A simple nomenclature system, which combined an upper-case letter denoting the evolutionary lineage (A, M, C) with an Arabic numeral determined by haplotype discovery in DraI surveys (e.g. A1 was the first African haplotype identified), was proposed earlier for naming haplotypes produced by the DraI test (Garnery et al. 1993). The haplotypes distinguished by the DraI test result from a combination of the length polymorphism exhibited by the pre-digested fragment, visualized in a standard agarose gel, with the DraI polymorphism that produces variable band patterns, visualized in a high-resolution gel (polyacrylamide or wide-range).
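
To make the notion of a DraI band pattern concrete, the sketch below performs an in silico digest of a linear sequence and returns the fragment lengths. It relies on the known DraI recognition sequence TTTAAA, cut bluntly between TTT and AAA, which is not stated in this document; the function name and the toy sequence are hypothetical and do not represent a real haplotype.

```python
DRAI_SITE = "TTTAAA"  # DraI recognition sequence (blunt cut: TTT^AAA)
CUT_OFFSET = 3        # cut falls after the third T of the site


def drai_fragment_lengths(seq):
    """Return fragment lengths (bp), 5' to 3', of a linear sequence cut by DraI."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(DRAI_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(DRAI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequence with two DraI sites, for illustration only.
toy = "GG" + "TTTAAA" + "CCCC" + "TTTAAA" + "G"
print("/".join(str(n) for n in drai_fragment_lengths(toy)))  # 5/10/4
```

A substitution that creates or destroys a TTTAAA site changes the number of fragments, whereas an indel between two sites changes only the size of one fragment, which is why small indels may or may not be resolved on a gel.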

While the DraI test has enabled the identification of numerous haplotypes, the increasing availability of sequence data has revealed further variation produced by nucleotide substitutions and short indels, which demanded a refinement of the nomenclature system. Accordingly, Franck et al. (2000) suggested inserting a lower-case letter after the Arabic numeral (e.g. A1b) to distinguish sequence variants. However, as described below, these guidelines have often been overlooked, leading to haplotype misnaming. To accommodate the complexity uncovered by the growing body of sequence data, and in an attempt to standardize the nomenclature system, Rortais et al. (2011) recently restated the criteria for naming DraI haplotypes, as follows: (1) haplotypes with the same DraI band pattern but a different number of Q elements should be labelled with the same numeral followed by the symbols ', '', or ''' to discriminate haplotypes with three, four, and five Q elements, respectively (e.g. M4', M21'', M56'''); (2) haplotypes exhibiting a novel DraI band pattern and carrying three, four, or five Q elements should be assigned new numerals followed by the symbols ', '', or ''', consistent with the number of Q elements (three, four, and five, respectively); (3) haplotypes with one or two Q elements should be differentiated by numerals (e.g. M13, M8); and (4) haplotypes with similar DraI band patterns but exhibiting slight variations (very short indels or nucleotide substitutions), only detected by sequence data, should be assigned a lower-case letter following the numeral (e.g. M4a, M8a').
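
The way the four criteria compose into a haplotype label can be sketched as a small formatting function. This is an illustrative sketch only: the function and parameter names are hypothetical, and the order of the components (lineage, numeral, optional lower-case letter, prime symbols) is inferred from the examples quoted above, such as M8a'.

```python
# Prime symbols for three, four, and five Q elements; haplotypes with one
# or two Q elements carry no prime and are told apart by their numerals.
PRIMES = {1: "", 2: "", 3: "'", 4: "''", 5: "'''"}


def haplotype_name(lineage, numeral, q_copies, variant=""):
    """Compose a label such as M4', M21'', or M8a' (hypothetical helper)."""
    if lineage not in ("A", "M", "C"):
        raise ValueError("lineage must be A, M, or C")
    if q_copies not in PRIMES:
        raise ValueError("one to five Q elements are expected")
    if variant and not (len(variant) == 1 and variant.isalpha()
                        and variant.islower()):
        raise ValueError("variant must be a single lower-case letter")
    return f"{lineage}{numeral}{variant}{PRIMES[q_copies]}"


print(haplotype_name("M", 4, 3))        # M4'
print(haplotype_name("M", 21, 4))       # M21''
print(haplotype_name("M", 56, 5))       # M56'''
print(haplotype_name("M", 8, 3, "a"))   # M8a'
```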

The problem with the existing nomenclature system is that detection of indel variants in a high-resolution gel depends on the size of the fragment, and Rortais et al. (2011) did not establish fragment size thresholds. Here, we further refine the nomenclature system by proposing two additional criteria for labelling novel haplotypes. First, a novel haplotype, identified by an Arabic numeral, should be assigned when a fragment in one of the following five size classes carries an indel of the indicated size: (1) 27-65 bp with a one-nucleotide indel, (2) 66-110 bp with a two-nucleotide indel, (3) 111-200 bp with a three-nucleotide indel, (4) 201-300 bp with a four-nucleotide indel, and (5) >301 bp with a > five-nucleotide indel. In cases where the size of the indel is below those thresholds, and in cases where nucleotide substitutions occur, the haplotypes should be considered novel variants, identified by a lower-case letter, as before. It should be noted that these variants can only be detected by sequencing. Second, haplotype frequencies should be taken into account when designating novel haplotypes: the Arabic numeral alone should be assigned to the most frequent haplotype, whereas less frequent haplotypes are distinguished by an appended letter (e.g. A1 should be followed, in order of decreasing frequency, by A1b, A1c, etc.). These guidelines were adopted for naming the haplotypes of the 742 newly sequenced individuals (Table S1) and for revising previously published haplotypes of A, M, and C ancestry (Table S2).
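
The first criterion amounts to a lookup of the indel size against the size class of the affected fragment. The sketch below is one possible reading, with two assumptions flagged in the comments: an indel that reaches the size listed for its class is treated as warranting a new numeral, and the open-ended last class is read as fragments of 301 bp or more with indels of five nucleotides or more (the text writes ">301 bp" and "> five"). The function name and return values are hypothetical.

```python
# (min_bp, max_bp, min_indel_nt); max_bp None means open-ended.
SIZE_CLASSES = [
    (27, 65, 1),
    (66, 110, 2),
    (111, 200, 3),
    (201, 300, 4),
    # ASSUMPTION: the text gives ">301 bp" and "> five" nucleotides; the
    # boundary values (301 bp, 5 nt) are included here for continuity.
    (301, None, 5),
]


def label_kind(fragment_bp, indel_nt):
    """Return 'numeral' for a novel haplotype or 'letter' for a variant.

    indel_nt = 0 stands for a difference caused by substitutions only.
    """
    for min_bp, max_bp, min_indel in SIZE_CLASSES:
        if fragment_bp >= min_bp and (max_bp is None or fragment_bp <= max_bp):
            return "numeral" if indel_nt >= min_indel else "letter"
    raise ValueError("fragments shorter than 27 bp are not covered")


print(label_kind(150, 3))  # numeral: 3-nt indel in a 111-200 bp fragment
print(label_kind(250, 3))  # letter: below the 4-nt threshold for 201-300 bp
```

The thresholds grow with fragment size because a given indel shifts a long fragment proportionally less on a gel than a short one.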

Revision of haplotype names

Following the aforementioned nomenclature criteria, the 182 novel haplotypes identified in this study, as well as 30 GenBank haplotypes, were assigned a new name (see Tables S1 and S2 for haplotype descriptions and accession numbers). Additionally, fragment sizes of 13 haplotypes that were named by others from DraI band patterns, and for which there were no sequence data in GenBank (e.g. A3 reported by Garnery et al. 1993, 1995; Franck et al. 2001), were amended using our sequence data (see haplotypes marked by an asterisk in Table S2).

The haplotypes A3, A10, and A16, which have three Q elements (Garnery et al. 1993, 1995; Franck et al. 1998), were renamed A3', A10', and A16' (Table S2). The haplotypes A15 (De la Rúa et al. 1998) and A47 (Muñoz et al. 2013) exhibit DraI band patterns similar to those originally proposed for A14 (Franck et al. 2001). Because they differ in size owing to a variable number of Q elements (A14, A15, and A47 have two, three, and four Q elements, respectively), A15 and A47 were renamed A14' (three Q elements) and A14'' (four Q elements), respectively. The DraI band sizes of A4 reported from sequence data (Franck et al. 2001; Collet et al. 2006) were not congruent with those described earlier (Garnery et al. 1993; Franck et al. 1998). Specifically, while Franck et al. (2001) and Collet et al. (2006) reported DraI band patterns of 47/107/191/483 and 47/108/192/483, respectively, those reported by Garnery et al. (1993) and Franck et al. (1998) were 47/108/193/483. Haplotype A4 identified by Collet et al. (2006) was maintained because it is more similar to the original proposal, while that of Franck et al. (2001) was renamed A4a. Similarly, haplotype A1 and its variants deposited in GenBank were not concordant with the original description (Garnery et al. 1993, 1995). The A1b of Franck et al. (2001) should be renamed A1 because its band pattern exhibits the features originally described by Garnery et al. (1993) and Franck et al. (1998). Sequences reported by Collet et al. (2006) and Branchiccela et al. (2014) showed differences in size and nucleotide substitutions inconsistent with the original proposal, and were therefore reassigned as A1b and A1e, respectively.
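
The A4 comparison can be checked with simple arithmetic on the three band patterns quoted above. The sketch below uses the sum of absolute band-size differences as a similarity measure; this metric is an assumption chosen for illustration, since the text does not state how similarity to the original proposal was assessed.

```python
def parse_pattern(text):
    """Turn a band pattern such as '47/108/193/483' into a list of sizes."""
    return [int(part) for part in text.split("/")]


def band_distance(pattern_a, pattern_b):
    """Sum of absolute band-size differences (illustrative metric only)."""
    a, b = parse_pattern(pattern_a), parse_pattern(pattern_b)
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of bands")
    return sum(abs(x - y) for x, y in zip(a, b))


ORIGINAL_A4 = "47/108/193/483"  # Garnery et al. (1993); Franck et al. (1998)
reported = {
    "Franck et al. (2001)": "47/107/191/483",
    "Collet et al. (2006)": "47/108/192/483",
}

for source, pattern in reported.items():
    print(source, band_distance(pattern, ORIGINAL_A4))
```

With this measure the pattern of Franck et al. (2001) differs from the original by 3 bp in total and that of Collet et al. (2006) by 1 bp, in line with the decision to keep the latter as A4.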

As was done for lineage A, several haplotypes of M ancestry were renamed (Table S2). Haplotype M4 has been reported in numerous studies (Garnery et al. 1993; Franck et al. 1998, 2001; Collet et al. 2006; Rortais et al. 2011; Pinto et al. 2014), although sequence data were only deposited in GenBank by Franck et al. (2001), Collet et al. (2006), Rortais et al. (2011), and Pinto et al. (2014). The sequences of Franck et al. (2001) and Pinto et al. (2014) are very similar (differing only in a C/T transition at the 5' end of cox2) and match the original DraI band pattern (142/652/131/422), whereas the M4 sequences deposited by Collet et al. (2006) and Rortais et al. (2011) are very different. Considering the low frequency of the sequence reported by Franck et al. (2001) and the high frequency of the sequence found by Pinto et al. (2014) and in this study (Table S3), we propose to rename the M4a of Pinto et al. (2014) as M4 and the M4 of Franck et al. (2001) as M4a. The M4 reported by Collet et al. (2006) is similar to the M17 of Rortais et al. (2011) and should therefore be renamed M17b. Additionally, some M4 variants named by Pinto et al. (2014) were re-analyzed and renamed as M17 haplotype variants (M17b to M17f, M17h). The M4i of Pinto et al. (2014) exhibits a new band pattern and was therefore renamed M71 (Table S2). Rortais et al. (2011) proposed a final description of M4'; however, the sequence data uploaded to GenBank are not congruent with the DraI band pattern of M4'. The correct M4' sequence is presented in this study. Finally, we detected incongruences in the fragment sizes and DraI band patterns reported for haplotypes M10, M11, M12, M13, M34, M55', M62, and M63. These names were amended according to their sequence data, and the sequence data were deposited in GenBank (see accession numbers in Table S2).

Although haplotypes of C ancestry are very rare in Iberia (Cánovas et al. 2008; Pinto et al. 2012, 2013; Chávez-Galarza et al. 2015; this study), we also revised the names of C haplotypes that did not comply with the aforementioned criteria. While haplotypes C2 (Franck et al. 2001) and C3 (Perrier et al. 2003) have been reported, their sequences are not available in GenBank. Haplotypes C2 and C11, reported by Techer et al. (2015) and Solórzano et al. (2009), respectively, should be reassigned to C2j (Muñoz and De la Rúa 2012) because the three sequences match perfectly. Haplotype C2d was renamed C2 because of its high frequency (Muñoz et al. 2009). The band patterns of haplotypes C2e (Muñoz et al. 2009), C2k (Razpet et al. unpublished), and C31 (Magnus et al. 2011) were similar to those of C3, and these haplotypes were therefore renamed C3, C3a, and C3b, respectively.

References

Branchiccela, B., Aguirre, C., Parra, G., Estay, P., Zunino, P., Antúnez, K. (2014) Genetic changes in Apis mellifera after 40 years of Africanization. Apidologie 45, 752-756.

Magnus, R.M., Tripodi, A.D., Szalanski, A.L. (2011) Mitochondrial DNA diversity of honey bees, Apis mellifera L. (Hymenoptera: Apidae) from queen breeders in the United States. J. Apic. Sci. 55, 37-46.

Muñoz, I., Dall'Olio, R., Lodesani, M., De la Rúa, P. (2009) Population genetic structure of Coastal Croatian honeybees. Apidologie 40, 617-626.

Muñoz, I., De la Rúa, P. (2012) Temporal analysis of genetic diversity in a honey bee mating area of an island population (La Palma, Canary Islands, Spain). J. Apic. Sci. 56, 41-49.

Özdil, F., Yildiz, M.A., Hall, H.G. (2009) Molecular characterization of Turkish honey bee populations (Apis mellifera) inferred from mitochondrial DNA RFLP and sequence results. Apidologie 40, 570-576.

Perrier, C., Strange, J., Langella, O., Sheppard, W.S., Garnery, L. (2003) Diversité génétique, introgressions mitochondriales et nucléaires dans une population d’abeilles des Landes de Gascogne. Actes du BRG 4, 79-100.

Solórzano, C.D., Szalanski, A.L., Kence, M., McKern, J.A., Austin, J.W., Kence, A. (2009) Phylogeography and population genetics of honey bees (Apis mellifera) from Turkey based on COI-COII sequence data. Sociobiology 53, 237-246.

Techer, M.A., Clémencet, J., Turpin, P., Volbert, N., Reynaud, B., Delatte, H. (2015) Genetic characterization of the honeybee (Apis mellifera) population of Rodrigues Island, based on microsatellite and mitochondrial DNA. Apidologie 46, 445-454.