Supplemental Text 1. Sequence annotation of 10 BAC clones from wheat chromosome 3B: Gene prediction, description and synteny with rice.

We conducted gene prediction analysis for the remaining 18.5% of the sequence (non-TE and non-repeated DNA), using different search programs (see Supplemental Method 1 for the detailed annotation method). Genes of known and unknown functions, as well as putative genes, were defined based on predictions and the existence of rice or other Triticeae homologs. Hypothetical genes were identified based on prediction programs only. Pseudogenes were not well predicted, and frameshifts needed to be introduced within the CDS structure to better fit a putative function based on BLASTX (mainly with rice). Truncated pseudogenes (genes disrupted by large insertions or deletions) and highly degenerated CDS sequences were considered as gene-relics.

Taken together, all these types of gene sequence information (GSI) account for only 1.0% of the sequence and are present in seven BAC clones (one or two genes per clone), while the remaining three BAC clones (TA3B95C9, TA3B95G2, TA3B63N2) contain no genes (indicated in Figure 1A and detailed in Supplemental Text 1, Supplemental Table 3 and Supplemental Table 4).

Six genes (of known and unknown functions) and two putative genes were detected on five of the BAC clones (indicated in Figure 1A and detailed in Supplemental Table 3): BAC clone TA3B63B13 contains two genes of known function, one of which was incompletely sequenced (located at the end of the BAC clone); BAC clone TA3B81B7 contains one putative gene; BAC clone TA3B95F5 contains one putative gene and two genes of unknown function; BAC clone TA3B63C11 contains one gene of known function; and BAC clone TA3B63E4 contains one incompletely sequenced gene of unknown function.

In addition to genes (of known or unknown functions) and putative genes, the search for sequence homologies between the whole 18.5% non-TE and non-repeated DNA sequences and the rice genome sequence allowed us to detect several conserved sequences between wheat and rice. As summarized in Supplemental Table 3, one pseudogene and four gene-relics were detected in the BAC clones TA3B54F7 (one pseudogene), TA3B63B7 (two gene-relics), TA3B81B7 (one gene-relic) and TA3B63C11 (one gene-relic). They could not be predicted with the CDS prediction program (FGENESH), as they show frameshifts, stop mutations, TE insertions and/or large indels, and are probably no longer functional (Supplemental Table 2). Three of these five truncated genes (pseudogene and gene-relics) resulted from TE insertions (Supplemental Table 3).

Wheat chromosome 3B is homologous to rice chromosome 1. For orthology and synteny analysis, we considered rice chromosome 1 and its duplicated segments that are found on other chromosomes (Guyot and Keller 2004 and TIGR site). Three BAC clones (TA3B63B13, TA3B81B7, TA3B95F5) have one or two orthologous rice genes that can be mapped on rice chromosome 1 and were considered as confirmed in their synteny (Table 1). It is interesting to note that the two genes of known function, separated by 88,114 bp on the BAC clone TA3B63B13 (Figure 1A), have their respective orthologs separated by 22,816 bp on rice chromosome 1. Thus, this intergenic region is approximately four-fold larger in wheat than in rice, a size difference that has arisen since their divergence from a common ancestor. Three other BAC clones (TA3B54F7, TA3B63C11 and TA3B63E4) also have homologs on rice chromosome 1, but the best match was observed with genes mapped on other rice chromosomes (Supplemental Table 3). BAC clone TA3B63B7 shows, for its putative gene and pseudogene, homologies with rice genes located on rice chromosomes other than chromosome 1 (Supplemental Table 3).
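The size difference quoted for the TA3B63B13 intergenic region can be checked directly from the two distances given in the text. The short sketch below is only an illustration of that arithmetic; the function and variable names are hypothetical and are not part of the annotation pipeline.

```python
# Intergenic distances quoted in the text (in bp).
WHEAT_INTERGENIC_BP = 88_114  # between the two known genes on BAC clone TA3B63B13
RICE_INTERGENIC_BP = 22_816   # between their orthologs on rice chromosome 1


def fold_difference(larger_bp: int, smaller_bp: int) -> float:
    """Return the ratio of two physical distances (hypothetical helper)."""
    if smaller_bp <= 0:
        raise ValueError("distance must be a positive number of base pairs")
    return larger_bp / smaller_bp


ratio = fold_difference(WHEAT_INTERGENIC_BP, RICE_INTERGENIC_BP)
print(f"wheat/rice intergenic size ratio: {ratio:.2f}")
```

The ratio is slightly below four, which is consistent with describing the wheat interval as about four times the size of the rice interval.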

No GSI or orthologous rice regions could be assigned to the three remaining BAC clones (TA3B95C9, TA3B95G2, TA3B63N2).

Finally, 10 hypothetical genes were identified, based on gene prediction only, in the BAC clones TA3B54F7 (one), TA3B63B13 (two), TA3B81B7 (one), TA3B95F5 (four) and TA3B63C11 (two) (Supplemental Table 4).
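As a simple consistency check, the per-clone counts of hypothetical genes listed above can be tallied against the stated total of 10. The sketch below uses only the counts given in the text; the dictionary name is hypothetical.

```python
# Hypothetical genes per BAC clone, as listed in the text.
hypothetical_genes_per_bac = {
    "TA3B54F7": 1,
    "TA3B63B13": 2,
    "TA3B81B7": 1,
    "TA3B95F5": 4,
    "TA3B63C11": 2,
}

# Sum the per-clone counts and compare with the total stated in the text.
total_hypothetical = sum(hypothetical_genes_per_bac.values())
print(f"hypothetical genes: {total_hypothetical} "
      f"across {len(hypothetical_genes_per_bac)} BAC clones")
```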

Sources:

Guyot, R., and B. Keller, 2004 Ancestral genome duplication in rice. Genome 47: 610–614.

Jurka, J., 2000 Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16: 418–420.

Jurka, J., P. Klonowski, V. Dagman and P. Pelton, 1996 CENSOR: a program for identification and elimination of repetitive elements from DNA sequences. Comput. Chem. 20: 119–121.

McCarthy, E. M., and J. F. McDonald, 2003 LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19: 362–367.

Sonnhammer, E. L., and R. Durbin, 1995 A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167: GC1–GC10.

Charles et al. Supplemental_Text-1