Dating the seven B. mori paralog pairs that moved out of the Z chromosome

If a retrotransposition event occurred before the divergence of Drosophila melanogaster and Bombyx mori, when the B. mori Z chromosome had not yet originated, we cannot determine whether the movement of the retrogene is linked to the sex chromosome or to the autosomes. Therefore, we used BLASTP searches and phylogenetic trees to date each retrotransposition. We also compared the exon numbers of the best D. melanogaster BLASTP hits of the B. mori retrogene and parental gene. We considered a movement to predate the divergence of the two species only if the D. melanogaster ortholog of the parental gene is a multi-exon gene and the D. melanogaster ortholog of the retrogene is a single-exon gene.
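The exon-number criterion above amounts to a simple decision rule. The following Python sketch is a hypothetical encoding of it (the names `Pair` and `predates_split` are illustrative only), applied to the best hits and exon structures reported for the seven pairs below. It captures only the exon-number check; the phylogenetic-tree evidence is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    name: str
    retro_hit: str           # best D. melanogaster BLASTP hit of the retrogene
    retro_hit_single: bool   # True if that hit is a single-exon gene
    parent_hit: str          # best D. melanogaster BLASTP hit of the parental gene
    parent_hit_single: bool  # True if that hit is a single-exon gene


def predates_split(p: Pair) -> bool:
    """Hypothetical encoding of the exon-number criterion: the movement is taken
    to predate the B. mori / D. melanogaster split only if the two genes have
    different best hits, the retrogene's hit is single-exon, and the parental
    gene's hit is multi-exon."""
    return (p.retro_hit != p.parent_hit
            and p.retro_hit_single
            and not p.parent_hit_single)


# Best hits and exon structure as described for each pair in this section.
PAIRS = [
    Pair("Pair 1", "Bteb2-PA", True, "luna-PB", False),
    Pair("Pair 2", "CG15669", False, "CG15669", False),
    Pair("Pair 3", "CG42739", False, "CG42739", False),
    Pair("Pair 4", "CG16944", False, "CG16944", False),
    Pair("Pair 5", "CG17051", False, "CG17051", False),
    Pair("Pair 6", "fd59A", False, "croc", True),
    Pair("Pair 7", "CG7758", False, "CG7758", False),
]

excluded = [p.name for p in PAIRS if predates_split(p)]
kept = [p.name for p in PAIRS if not predates_split(p)]
print("excluded:", excluded)  # -> excluded: ['Pair 1']
print("kept:", kept)
```

Pairs 2 to 5 and 7 fail the rule because both genes share one best hit, and Pair 6 fails because its exon pattern is reversed, so only Pair 1 is flagged as predating the split.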

We excluded from our final analysis in Table 2 the one case (Pair 1) in which we could unambiguously date the retrotransposition to before the split between D. melanogaster and B. mori.

Below, we describe in detail all cases of gene duplication by retrotransposition out of the Z chromosome.

Pair 1 BGIBMGA001422-BGIBMGA002047

In this pair, the retrogene and the parental gene are BGIBMGA001422 and BGIBMGA002047, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA001422). Its best hit is Bteb2-PA, which corresponds to a single-exon gene. We also searched the same database with the B. mori parental gene protein (BGIBMGA002047). Its best hit is luna-PB, which corresponds to a multi-exon gene. In addition, we constructed a phylogenetic tree to confirm their evolutionary relationships. Both the tree and the correspondence of exon numbers suggest that the retrogene originated before the split of B. mori and D. melanogaster, when the Z chromosome had not yet originated. Therefore, we cannot determine whether this paralog pair experienced an “out of Z” movement, and we excluded it from our dataset of retrogenes that do not have orthologs in other species.
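Each pair description relies on the best hit of a BLASTP search. The text does not specify how the best hit was chosen; the Python sketch below assumes it is the hit with the highest bit score in BLAST+ tabular output (`-outfmt 6`, twelve default columns). The helper `best_hits` and the toy input, with invented placeholder IDs and scores, are hypothetical and for illustration only.

```python
import csv
import io

# The twelve default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> dict[str, tuple[str, float]]:
    """Return {query: (subject, bitscore)}, keeping the highest-bit-score hit per query."""
    best: dict[str, tuple[str, float]] = {}
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        query = row["query"]
        # Replace the stored hit only if this one scores strictly higher.
        if query not in best or score > best[query][1]:
            best[query] = (row["subject"], score)
    return best


# Toy input with invented placeholder IDs and scores, for illustration only.
example = (
    "queryA\thitX\t40.0\t200\t120\t3\t1\t200\t5\t204\t1e-20\t95.1\n"
    "queryA\thitY\t55.0\t210\t94\t2\t1\t210\t1\t210\t1e-40\t160.3\n"
    "queryB\thitZ\t30.0\t150\t105\t4\t10\t160\t20\t170\t1e-05\t45.0\n"
)
hits = best_hits(example)
print(hits)  # -> {'queryA': ('hitY', 160.3), 'queryB': ('hitZ', 45.0)}
```

Because D. melanogaster hits are reported per isoform (for example Bteb2-PA and luna-PB), subject IDs may need to be mapped to genes before comparing the best hits of a retrogene and its parental gene.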

Pair 2 BGIBMGA004945-BGIBMGA000597

In this pair, the retrogene and the parental gene are BGIBMGA004945 and BGIBMGA000597, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA004945) and parental gene protein (BGIBMGA000597). Both proteins have the same best hit, CG15669 (as shown in the phylogenetic tree above), which corresponds to a multi-exon gene. The phylogenetic tree and the correspondence of exon numbers confirmed the evolutionary relationships shown by the BLASTP search: the D. melanogaster ortholog of BGIBMGA000597 is CG15669, and the single-exon retrogene, BGIBMGA004945, does not have an ortholog in D. melanogaster. Therefore, we kept this pair in our dataset of retrogenes that do not have orthologs in other species.

Pair 3 BGIBMGA005416-BGIBMGA000657

In this pair, the retrogene and the parental gene are BGIBMGA005416 and BGIBMGA000657, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA005416) and parental gene protein (BGIBMGA000657). Both proteins have the same best hit, CG42739 (as shown in the phylogenetic tree above), which corresponds to a multi-exon gene. The phylogenetic tree and the correspondence of exon numbers confirmed the evolutionary relationships shown by the BLASTP search: the D. melanogaster ortholog of BGIBMGA000657 is CG42739, and the single-exon retrogene, BGIBMGA005416, does not have an ortholog in D. melanogaster. Therefore, we kept this pair in our dataset of retrogenes that do not have orthologs in other species.

Pair 4 BGIBMGA003829-BGIBMGA002005

In this pair, the retrogene and the parental gene are BGIBMGA003829 and BGIBMGA002005, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA003829) and parental gene protein (BGIBMGA002005). Both proteins have the same best hit, CG16944 (as shown in the phylogenetic tree above), which corresponds to a multi-exon gene. The phylogenetic tree and the correspondence of exon numbers confirmed the evolutionary relationships shown by the BLASTP search: the D. melanogaster ortholog of BGIBMGA002005 is CG16944, and the single-exon retrogene, BGIBMGA003829, does not have an ortholog in D. melanogaster. Therefore, we kept this pair in our dataset of retrogenes that do not have orthologs in other species.

Pair 5 BGIBMGA010016-BGIBMGA000679

In this pair, the retrogene and the parental gene are BGIBMGA010016 and BGIBMGA000679, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA010016) and parental gene protein (BGIBMGA000679). Both proteins have the same best hit, CG17051 (as shown in the phylogenetic tree above), which corresponds to a multi-exon gene. The phylogenetic tree and the correspondence of exon numbers confirmed the evolutionary relationships shown by the BLASTP search: the D. melanogaster ortholog of BGIBMGA000679 is CG17051, and the single-exon retrogene, BGIBMGA010016, does not have an ortholog in D. melanogaster. Therefore, we kept this pair in our dataset of retrogenes that do not have orthologs in other species.

Pair 6 BGIBMGA004141-BGIBMGA000635

In this pair, the retrogene and the parental gene are BGIBMGA004141 and BGIBMGA000635, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA004141). Its best hit is fd59A (as shown in the phylogenetic tree above), which corresponds to a multi-exon gene. We also searched the same database with the B. mori parental gene protein (BGIBMGA000635). Its best hit is croc (as shown in the phylogenetic tree above), which corresponds to a single-exon gene. Although the parental gene and the retrogene have different best hits in D. melanogaster, their exon numbers do not show the correspondence expected for an ancestral retrotransposition: the best hit of the retrogene is multi-exon, whereas the best hit of the parental gene is single-exon. Because the retrogene's best hit (fd59A) is not a single-exon gene, it most likely did not originate from the same retrotransposition event as the silkworm retrogene. Therefore, we kept this pair in our dataset of retrogenes that do not have orthologs in other species.

Pair 7 BGIBMGA012075-BGIBMGA003865

In this pair, the retrogene and the parental gene are BGIBMGA012075 and BGIBMGA003865, respectively. We searched the D. melanogaster protein database with BLASTP using the B. mori retrogene protein (BGIBMGA012075) and parental gene protein (BGIBMGA003865). Both proteins have the same best hit, CG7758 (as shown in the phylogenetic tree above), which corresponds to a multi-exon gene. The phylogenetic tree and the correspondence of exon numbers confirmed the evolutionary relationships shown by the BLASTP search: the D. melanogaster ortholog of BGIBMGA003865 is CG7758, and the single-exon retrogene, BGIBMGA012075, does not have an ortholog in D. melanogaster. Therefore, we kept this pair in our dataset of retrogenes that do not have orthologs in other species.