Supplementary Data

Table S1 Sequence reads in the D. pseudoobscura assembly

Insert size / Standard deviation / vector / All Reads / Average read length* / Passed reads / Passed paired reads / Assembled reads / Assembled paired reads
2.7 kb / 320 bp / PUC18 / 556,542 / 652 bp / 492,385 / 469,110 / 421,159 / 375,680
3.4 kb / 355 bp / PUC18 / 1,537,595 / 673 bp / 1,328,155 / 1,247,810 / 1,128,898 / 990,666
6.3 kb / 765 bp / PUC18 / 1,045,744 / 541 bp / 786,937 / 726,586 / 649,836 / 542,530
40 kb / 3,000 bp / Fosmid / 45,567 / 583 bp / 31,048 / 25,468 / 25,426 / 18,658
130 kb / 30,000 bp / BAC / 34,684 / 480 bp / 12,611 / 7,822 / 8,969 / 4,442
Total / - / - / 3,220,132 / 623 bp / 2,651,136 / 2,476,796 / 2,234,288 / 1,931,976

*average total Phred 20 base pairs per read.

Table S2 Sequence statistics

Sequence unit / Total number / Average length (kb) / N50 length (kb) / Total length (Mb)
Contigs / 8,288 / 16.3 / 51.9 / 134.6
Scaffolds* / 755 / 179.0 / 996 / 139.3
Ultra-scaffolds† / 16 / 7,900 / 11,600 / 130.8

*scaffolds include more than one contig

†ordered and orientated groups of anchored scaffolds
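Table S2 reports N50 lengths without defining the statistic. The sketch below assumes the usual assembly convention: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (standard definition, assumed)."""
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that takes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Made-up lengths, for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic is weighted by length, a few long sequences can give an N50 well above the average length, as in the contig and scaffold rows of the table.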

Table S3 Comparison of chromosome lengths in D. pseudoobscura and D. melanogaster

Chromosome (Muller element) / D. pseudoobscura* length (Mb) / D. melanogaster length (Mb) / % Difference
A / 25.8 (XL) / 21.8 (X) / 18
B / 28.1 (4) / 22.2 (2L) / 27
C / 19.8 (3) / 20.3 (2R) / -2
D / 25.4 (XR) / 23.4 (3L) / 9
E / 31.7 (2) / 27.9 (3R) / 14
Total / 130.8 / 115.6 / 13

* D. pseudoobscura chromosome lengths are based on anchored scaffolds greater than 100 kb; chromosome arm designations are shown in parentheses.
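The % Difference column matches the D. pseudoobscura excess expressed relative to the D. melanogaster length. That choice of denominator is inferred from the tabulated values and is not stated in the table. A minimal sketch using the lengths from Table S3 (the function and variable names are illustrative):

```python
# (D. pseudoobscura Mb, D. melanogaster Mb) per Muller element, from Table S3.
lengths_mb = {
    "A": (25.8, 21.8),
    "B": (28.1, 22.2),
    "C": (19.8, 20.3),
    "D": (25.4, 23.4),
    "E": (31.7, 27.9),
    "Total": (130.8, 115.6),
}


def percent_difference(dpse_mb, dmel_mb):
    """Length difference as a percentage of the D. melanogaster length (assumed denominator)."""
    return 100.0 * (dpse_mb - dmel_mb) / dmel_mb


for element, (dpse, dmel) in lengths_mb.items():
    print(element, round(percent_difference(dpse, dmel)))
```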

Table S4. Description of conserved linkage breakpoint sequences in D. pseudoobscura.

Muller’s Element / Number, Gaps* / Number, No Gaps† / Length (bp), Gaps: Mean (Minimum-Maximum) / Length (bp), No Gaps: Mean (Minimum-Maximum) / Base Composition (% A/T), Gaps: Mean (Minimum-Maximum) / Base Composition (% A/T), No Gaps: Mean (Minimum-Maximum)
A / 69 / 141 / 27,741 (3,774-130,231) / 6,037 (266-84,750) / 57.0 (51.8-64.6) / 58.7 (46.3-73.1)
B / 49 / 86 / 21,545 (839-106,143) / 5,509 (272-39,527) / 59.2 (53.4-70.6) / 60.3 (43.4-72.3)
C / 43 / 168 / 30,342 (1,075-243,579) / 4,830 (243-51,633) / 58.3 (48.7-67.5) / 59.2 (45.2-72.7)
D / 49 / 92 / 20,278 (1,196-77,374) / 5,133 (43-40,305) / 57.3 (52.1-67.8) / 60.2 (52.7-69.1)
E / 62 / 161 / 36,731 (1,372-417,949) / 6,576 (101-139,728) / 57.7 (52.8-67.9) / 59.7 (41.5-73.1)
F / 4 / 3 / 33,432 (6,909-103,537) / 7,392 (2,750-10,897) / 63.8 (61.3-67.2) / 64.9 (63.7-66.8)
All elements / 276 / 651 / 27,823 (839-417,949) / 5,668 (43-139,728) / 57.9 (48.7-70.6) / 59.5 (41.5-73.1)

* “Gaps” indicates that there are sequencing gaps in the breakpoint sequence. † “No gaps” indicates that there are no sequencing gaps in the breakpoint sequence.

Table S5. Numbers of Interbreakpoint Matches.

Muller’s Element / BPn* / BP1 (%) / BPmax (%) / BPmax Seq ID / BPmaxL / Mean Match No.
A / 210 / 149 (71.0) / 85 (40.5) / BP_007_008_A / 10,541 / 18.7
B / 135 / 90 (66.7) / 56 (41.5) / BP_081_082_B / 56,284 / 15.7
C / 205 / 128 (62.4) / 86 (42.0) / BP_202_203_C / 1,488 / 28.0
D / 141 / 105 (74.5) / 58 (41.1) / BP_104_105_D / 20,515 / 18.0
E / 223 / 151 (67.5) / 107 (48.0) / BP_201_202_E / 313,415 / 29.7

*BPn, number of breakpoints; BP1, number of breakpoints that match at least one other breakpoint; BPmax, maximum number of breakpoints matched by a single breakpoint; BPmax Seq ID, the identification number of the breakpoint with the BPmax; BPmaxL, length of the breakpoint sequence with the BPmax; Mean Match No., average number of matches for the BPn breakpoints on each element

Table S6: 44 genes in the 5% false discovery rate set showing excess radical charge substitutions

CG number / Gene name
CG13012
CG11213 / Chorion protein 38
CG10570
CG10123 / Topoisomerase 3alpha
CG10587
CG11594
CG11880
CG11966
CG1630 / Inositol 1,4,5-triphosphate kinase 2 (IP3K2)
CG14913
CG15398
CG13109 / taiman (tai)
CG15116
CG1555 / cinnabar
CG13185
CG15005
CG13654
CG2175 / defective chorion 1 (dec-1)
CG2658 / period
CG18292
CG17104
CG3124
CG31122
CG31258
CG31177
CG32685
CG3525 / easily shocked
CG32627
CG31634
CG4752
CG4354 / slow border cells (slbo)
CG4835
CG32447
CG4201 / immune response deficient 5 (ird5)
CG8945
CG9676
CG9554 / eyes absent (eya)
CG9505
CG8733 / Cyp305a1
CG7886

Table S7: Top 27 genes showing excess radical polarity change

CG number / Gene name
CG12212 / pebbled
CG11006
CG12885
CG10811 / Eukaryotic-initiation-factor-4G (eIF-4G)
CG13960
CG17390
CG13627
CG1470 / Guanyl cyclase beta-subunit at 100B (GycBeta100B)
CG32697
CG31342
CG31151
CG3338
CG4167 / Heat shock gene 67Ba
CG6238 / slingshot (ssh)
CG4913 / ENL/AF9-related (ear)
CG6384 / Centrosomal protein 190kD
CG6775 / rugose (rg)
CG8498
CG8817 / lilliputian (lilli)
CG7793 / Son of sevenless
CG8715 / lingerer (lig)
CG8595 / Toll-7 - immune response
CG9007
CG8013 / Su(z)12
CG7546
CG7177
CG6889 / taranis (tar)

Table S8. GFF format description of the 142 cis-regulatory sequences from the literature used in the analysis of conservation of CREs. (1st page only – see spreadsheet file)

2L	FlyBase_ARGS	protein_bind	3470706	3470715	.	+	.	gene_cg "CG8846"; sym "Thor_protein_bind_1"; citation "FBrf0128398"; gene_sym "Thor";
2L	FlyBase_ARGS	protein_bind	3470718	3470730	.	+	.	gene_cg "CG8846"; sym "Thor_protein_bind_2"; citation "FBrf0128398"; gene_sym "Thor";
2L	FlyBase_ARGS	reg_element	5295919	5295932	.	+	.	gene_cg "CG14029"; sym "vri_reg_element_1"; citation "FBrf0122968"; comment "E box CACGTG motif, putative CLK/CYC-binding site; 4 copies tested for activation by CLK"; gene_sym "vri";
2L	FlyBase_ARGS	reg_element	5296504	5296517	.	+	.	gene_cg "CG14029"; sym "vri_reg_element_2"; citation "FBrf0122968"; comment "E box CACGTG motif, putative CLK/CYC-binding site"; gene_sym "vri";
2L	FlyBase_ARGS	reg_element	5296522	5296535	.	+	.	gene_cg "CG14029"; sym "vri_reg_element_3"; citation "FBrf0122968"; comment "E box CACGTG motif, putative CLK/CYC-binding site"; gene_sym "vri";
2L	FlyBase_ARGS	reg_element	5297243	5297256	.	+	.	gene_cg "CG14029"; sym "vri_reg_element_4"; citation "FBrf0122968"; comment "E box CACGTG motif, putative CLK/CYC-binding site"; gene_sym "vri";
2L	FlyBase_ARGS	reg_element	11164187	11164209	.	-	.	gene_cg "CG16874"; sym "Vm32E_reg_element_14"; citation "FBrf0093392"; comment "Sequences from -135 to -113 are essential for expression in ventral columnar follicle cells."; gene_sym "Vm32E";
2L	FlyBase_ARGS	reg_element	11997198	11997208	.	+	.	gene_cg "CG5279"; sym "Rh5_reg_element_16"; citation "FBrf0093641"; comment "P3/RCS-1 site; putative @ey@ protein binding site."; gene_sym "Rh5";
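Records of this kind can be read programmatically. The sketch below is a minimal parser, assuming the spreadsheet rows follow the standard nine-column, tab-separated GFF layout with `key "value";` attributes. The example record is the first Thor entry written in that assumed layout, and `parse_gff_line` is a hypothetical helper, not part of any published pipeline.

```python
import re

# Matches attribute pairs of the form: key "value";
# Quoted values may themselves contain semicolons (as some comments do).
ATTRIBUTE_PATTERN = re.compile(r'(\w+) "([^"]*)";')


def parse_gff_line(line):
    """Split one tab-separated GFF record into a dictionary (illustrative helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, found {len(fields)}")
    seqid, source, feature, start, end, score, strand, phase, attr_text = fields
    return {
        "seqid": seqid,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": dict(ATTRIBUTE_PATTERN.findall(attr_text)),
    }


example = (
    "2L\tFlyBase_ARGS\tprotein_bind\t3470706\t3470715\t.\t+\t.\t"
    'gene_cg "CG8846"; sym "Thor_protein_bind_1"; '
    'citation "FBrf0128398"; gene_sym "Thor";'
)
record = parse_gff_line(example)
print(record["attributes"]["gene_sym"], record["end"] - record["start"] + 1)
```

Matching quoted values with a regular expression, rather than splitting the attribute column on semicolons, keeps comments that contain a semicolon in one piece.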

Table S9. Coordinates of Conserved Linkage Breakpoints (1st page only – see spreadsheet file)

CLB_L, conserved linkage block to the left; CLB_R, conserved linkage block to the right; Scaffold, genomic scaffold with the breakpoint sequence; Beg, beginning of the interspecific breakpoint sequence within the scaffold; End, end of the interspecific breakpoint sequence within the scaffold; Chr, Muller’s element carrying the breakpoint; InterBP Match, number of inter-breakpoint matches found for the particular breakpoint sequence; Motif, indicates whether the conserved breakpoint motif was found in the interspecific breakpoint sequence, based on a composite BLASTN analysis of the breakpoint motif across all Muller’s elements.

CLB_L / CLB_R / Scaffold / Beg / End / Chr / InterBP Match / Motif
1 / 2 / Contig1045_Contig3832 / 37658 / 40792 / A / 0 / No
2 / 3 / Contig1045_Contig3832 / 44480 / 46153 / A / 0 / No
3 / 4 / Contig1045_Contig3832 / 60640 / 91520 / A / 6 / No
4 / 5 / Contig1045_Contig3832 / 99315 / 100964 / A / 0 / No
5 / 6 / Contig1045_Contig3832 / 111409 / 114002 / A / 0 / No
6 / 7 / Contig1045_Contig3832 / 134643 / 147289 / A / 53 / Yes
7 / 8 / Contig1045_Contig3832 / 217925 / 228265 / A / 84 / Yes
8 / 9 / Contig1045_Contig3832 / 235293 / 244240 / A / 1 / No
9 / 10 / Contig1045_Contig3832 / 265345 / 283521 / A / 63 / Yes
10 / 11 / Contig1045_Contig3832 / 326630 / 343606 / A / 5 / No
11 / 12 / Contig1045_Contig3832 / 415969 / 417815 / A / 0 / No
12 / 13 / Contig1045_Contig3832 / 510592 / 523648 / A / 0 / No
13 / 14 / Contig1045_Contig3832 / 704752 / 706592 / A / 0 / No
14 / 15 / Contig1045_Contig3832 / 710068 / 711770 / A / 0 / No
15 / 16 / Contig1045_Contig3832 / 735853 / 736443 / A / 0 / No
16 / 17 / Contig1045_Contig3832 / 738699 / 739193 / A / 0 / No
17 / 18 / Contig1045_Contig3832 / 756783 / 766360 / A / 48 / Yes
18 / 19 / Contig1045_Contig3832 / 783339 / 783955 / A / 0 / No
19 / 20 / Contig1045_Contig3832 / 850496 / 971732 / A / 34 / No
20 / 21 / Contig1045_Contig3832 / 972046 / 982591 / A / 19 / No
21 / 22 / Contig1045_Contig3832 / 1102259 / 1114787 / A / 14 / Yes
22 / 23 / Contig1239_Contig7917 / 6675 / 13074 / A / 0 / No
23 / 24 / Contig1277_Contig4006 / 213336 / 231588 / A / 10 / No
24 / 25 / Contig1277_Contig4006 / 356772 / 367736 / A / 0 / No
25 / 26 / Contig1277_Contig4006 / 370479 / 373577 / A / 5 / No
26 / 27 / Contig1277_Contig4006 / 388374 / 393922 / A / 34 / Yes
27 / 28 / Contig1277_Contig4006 / 397490 / 406702 / A / 32 / No
28 / 29 / Contig1321_Contig0723 / 35695 / 100147 / A / 14 / No
29 / 30 / Contig1321_Contig0723 / 106662 / 126870 / A / 1 / No
30 / 31 / Contig1773_Contig0628 / 111331 / 119954 / A / 21 / No
31 / 32 / Contig1773_Contig0628 / 176347 / 188954 / A / 41 / Yes
32 / 33 / Contig1773_Contig0628 / 346993 / 348893 / A / 0 / No
33 / 34 / Contig1773_Contig0628 / 351577 / 352489 / A / 37 / Yes

Supporting Online Materials Figure Legends

Fig. S1. Distribution of D. pseudoobscura / D. melanogaster length ratios for orthologous introns and intergenic regions. The plot shows the frequencies of log10 length ratios for 31,314 orthologous pairs of introns and 6,875 orthologous pairs of intergenic distances. Dashed lines show the trimmed mean (mean of all values between ±0.5) of each distribution; the red line indicates a log10 ratio of zero, where measurements in both species are the same. The intron mean is very close to zero (-0.0079), indicating no net change of intron length on average, although the distribution is asymmetrical. The intergenic distance ratio mean is 0.0700, which corresponds to a 17% average increase in sequence length.
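The conversion from a mean log10 length ratio to a percentage change can be checked directly. A minimal sketch, assuming the ratio is D. pseudoobscura length divided by D. melanogaster length as in the legend title (the function name is illustrative):

```python
def log10_ratio_to_percent_change(log_ratio):
    """Convert a log10 length ratio into a percentage change in length."""
    return (10 ** log_ratio - 1) * 100


intergenic = log10_ratio_to_percent_change(0.0700)   # trimmed mean for intergenic distances
introns = log10_ratio_to_percent_change(-0.0079)     # trimmed mean for introns
print(f"intergenic: {intergenic:.1f}%  introns: {introns:.1f}%")
```

A log10 ratio of 0.0700 back-transforms to a ratio of about 1.17, which is the 17% increase quoted in the legend; the intron value back-transforms to a change of less than 2%.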

Fig. S2. Frequencies of different syntenic block lengths between D. pseudoobscura and D. melanogaster. Syntenic block length is measured as the number of syntenic genes, broken out by chromosome. The minimum block length was one gene, the maximum was 69 genes, and the mean was 10.7 genes (83 kb).

Fig. S3. Alignment of chromosome arms between D. pseudoobscura and D. melanogaster. The D. pseudoobscura - D. melanogaster alignments were based on an all-against-all BLASTZ alignment of the D. melanogaster and D. pseudoobscura genome sequences using the default parameters (Schwartz et al. 2003). To reduce the number of false alignments, two filtering steps were applied to the HSPs of the global alignment. First, alignments inconsistent with the synteny map based on coding sequences described above were excluded (~15%). Second, where D. melanogaster genomic regions aligned to multiple regions in D. pseudoobscura, only the alignments with the highest Smith-Waterman score were kept (~10% of the alignments passing the first filtering step). As a result, each base from D. melanogaster aligns to only one base in the D. pseudoobscura genome. For each chromosome arm, the green and purple tiers together indicate the fraction of aligned bases that are identical (purple alone indicates the fraction expected by chance given the base composition of the two chromosome arms), the red tier indicates the fraction of mismatched bases, the yellow tier indicates D. melanogaster bases aligned to deleted bases in D. pseudoobscura, and the blue tier indicates D. melanogaster bases unaligned in our synteny-filtered BLASTZ alignment.
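The two filtering steps can be sketched as follows. This is a simplified illustration, not the procedure used for the figure: it treats each D. melanogaster region as a discrete key rather than handling partially overlapping intervals, and the `Hsp` record, its field names, and the toy scores are all hypothetical.

```python
from collections import namedtuple

# Hypothetical HSP record: region labels, an alignment score, and a flag saying
# whether the HSP agrees with the coding-sequence synteny map.
Hsp = namedtuple("Hsp", "dmel_region dpse_region score syntenic")


def filter_hsps(hsps):
    """Drop synteny-inconsistent HSPs, then keep the best-scoring HSP per D. melanogaster region."""
    best = {}
    for hsp in hsps:
        if not hsp.syntenic:            # step 1: synteny filter
            continue
        current = best.get(hsp.dmel_region)
        if current is None or hsp.score > current.score:
            best[hsp.dmel_region] = hsp  # step 2: highest score wins
    return list(best.values())


toy_hsps = [
    Hsp("dmel_1", "dpse_7", 900, True),
    Hsp("dmel_1", "dpse_3", 1200, True),
    Hsp("dmel_1", "dpse_9", 5000, False),  # high score but off the synteny map
    Hsp("dmel_2", "dpse_4", 300, True),
]
for kept in filter_hsps(toy_hsps):
    print(kept.dmel_region, "->", kept.dpse_region)
```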

Fig. S4. Fraction of identical base pairs within 50 kb windows of the D. melanogaster – D. pseudoobscura filtered BLASTZ alignment. The D. melanogaster sequence was divided into 50 kb windows with 10 kb overlaps. The number of identical base pairs in each 50 kb window is plotted as a fraction of the window length for each D. melanogaster chromosome. The X-axis is the D. melanogaster chromosome coordinate in Mb.

Fig. S5. Fraction of identical base pairs within the aligned region of 50 kb windows of the D. melanogaster – D. pseudoobscura filtered BLASTZ alignment. The D. melanogaster sequence was divided into 50 kb windows with 10 kb overlaps. The fraction of aligned base pairs that are identical in each 50 kb window is plotted for each D. melanogaster chromosome. For example, if 30 kb of a 50 kb window could be aligned and 15 kb of that aligned sequence consisted of identical base pairs, the fraction would be 0.5. The X-axis is the D. melanogaster chromosome coordinate in Mb.
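The windowing and the two identity fractions plotted in Figs. S4 and S5 can be sketched as follows. The sketch assumes that "10 kb overlaps" means consecutive windows share 10 kb, so window starts advance by 40 kb; the function names are illustrative.

```python
def sliding_windows(chrom_length, window=50_000, overlap=10_000):
    """Yield (start, end) windows; consecutive windows share `overlap` bases (assumed meaning)."""
    step = window - overlap
    start = 0
    while start < chrom_length:
        yield start, min(start + window, chrom_length)
        start += step


def identity_fractions(identical_bp, aligned_bp, window_bp):
    """Return (identical / window length, identical / aligned length)."""
    per_window = identical_bp / window_bp
    per_aligned = identical_bp / aligned_bp if aligned_bp else None  # undefined with no alignment
    return per_window, per_aligned


print(list(sliding_windows(120_000)))
# Worked example from the Fig. S5 legend: 30 kb aligned, 15 kb identical, 50 kb window.
print(identity_fractions(15_000, 30_000, 50_000))
```

For the legend's example the second value is 0.5, as stated for Fig. S5, while the first value, which uses the whole window as the denominator as in Fig. S4, is 0.3.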

Fig. S6. History of inter- and intraspecific inversions in the D. melanogaster and D. pseudoobscura lineages. The phylogeny shows the branches where interspecific inversions shuffled gene order in the two species. Five intraspecific gene arrangements are shown with the four intraspecific inversions that converted one arrangement into another. The cytological map is shown for each of the gene arrangements, Tree Line, Santa Cruz, Hypothetical, Standard, and Arrowhead. The D. pseudoobscura gene arrangement names are derived from the geographic locations where the chromosome was first observed (Dobzhansky and Sturtevant 1938).

Fig. S7. Repeat Family Alignment. Top: Alignment of repeat family 1. The consensus for the sequence is shown above the 128 bp repeat. The thirteen copies of repeat 1 are shown below the consensus where a dot indicates that the base is the same as the consensus. Bottom: Alignment of repeat family 2. The consensus for the sequence is shown above the 315 bp repeat. The three copies of repeat 2 are shown below the consensus where a dot indicates that the base is the same as the consensus. The abbreviations and locations of the repeats are given in Figure 3.

Fig. S8. Distribution for the inter-breakpoint match fraction for five chromosomal arms (Muller’s elements A through E) in D. melanogaster and D. pseudoobscura. The match fraction is defined as the percentage of breakpoints within a chromosomal arm that are matched by a query breakpoint. The analysis is given separately for the chromosomal arms of D. melanogaster (Dm) and D. pseudoobscura (Dp).

Fig. S9. Neighbor-joining (Saitou and Nei 1987) phylogeny of 91 breakpoint motifs from breakpoints, coding regions, and noncoding regions. A different symbol denotes the derivation of each motif sequence: breakpoints (triangles), coding regions (diamonds), and noncoding regions (circles).

Fig. S10. Amino-acid identity frequencies for aligned D. pseudoobscura – D. melanogaster orthologous genes. 10,987 gene pairs were individually aligned using CLUSTALW (Thompson et al. 1997; Thompson et al. 1994) with default parameter settings. The percentage of identical amino acids was calculated for each alignment, and the frequency of alignments within each percent identity bin was calculated as a percentage of the total number of pairs. Yellow bars indicate all 10,987 gene pairs (mean 77.7%), green bars show the results for 761 male-specific genes (mean 66.0%) as designated by testis-specific ESTs, red bars show the results for 246 transcription factor genes (mean 82.76%), and blue bars show the results for 58 nervous system genes (mean 84.2%).
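The per-alignment identity calculation can be sketched as follows. The legend does not say whether gapped columns count toward the denominator; this sketch assumes they are excluded, which is one common convention. The function name and the toy sequences are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap (assumed denominator)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Toy pairwise alignment, not a real ortholog pair.
print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))
```

Counting gap columns in the denominator instead would lower the values, so the choice matters when comparing against the means quoted in the legend.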

Fig. S11. Number of gene features used for each category in the typical gene alignment pictured in Figure 5 of the main text. X-axis, category of gene feature analyzed. Y-axis, number of gene features used in the analysis for each category.


Supplemental information references:

Dobzhansky, T. and A.H. Sturtevant. 1938. Inversions on the Chromosomes of Drosophila pseudoobscura. Genetics 23: 28.

Saitou, N. and M. Nei. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406-425.

Schwartz, S., W.J. Kent, A. Smit, Z. Zhang, R. Baertsch, R.C. Hardison, D. Haussler, and W. Miller. 2003. Human-mouse alignments with BLASTZ. Genome Res 13: 103-107.

Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876-4882.

Thompson, J.D., D.G. Higgins, and T.J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
