Supplementary Materials

Map validation using clone hybridization data. We validated our map, and the protocol used to obtain it, against available, independent data that we had not used while constructing the map: we asked how well our map agrees with the high-quality FISH clone data of the National Cancer Institute (NCI). Those data were obtained by experimental in situ hybridization of individual clones (short regions of DNA) on chromosomal bands at the 850-band resolution (Kirsch et al., 2000, Nature Genet. 24: 339-340; Trask et al., 1998, Hum. Mol. Genet. 7: 2007-2020), using a method that is acknowledged to be particularly accurate (see ref. 51, whose authors gave results from those hybridizations a much higher weight than data from other centers). Our de novo compositional methods are independent of such results, since we hybridized only fractions of total DNA (under conditions of no interference by repeated sequences), not clones.

We used the NCI results (UCSC annotation database for hg17, noting the ‘labs’ field; Kent et al., 2002, Genome Res. 12: 996-1006) to find the proportion of NCI clones that fell, in our map, into the same band as reported for one or more of the NCI hybridizations. Since Furey and Haussler’s map (51) relied primarily on those NCI hybridizations, less on other clone hybridizations, and not on any compositional data (see also below), we regarded the performance of their map as an upper limit to the agreement we might expect. We found that 939 (71.7%) of the 1309 clones that did not straddle band boundaries were in the reported band in our map, while the (expectedly higher) proportion of the NCI clones that fell into the reported bands of Furey and Haussler’s map was 92.0%. If band-straddling clones are treated as mismatches, the percentage of matching clones was 68.6% for our map and 79.4% for their map. It should also be noted that some NCI results were ambiguous, in that two hybridizations of the same clone reported different bands. In view of such limits, the validation of our map seems very satisfactory. When we used less stringent criteria, also taking into account other centers’ assignments of the same clones, we obtained stronger agreement (e.g., 20 of the 22 NCI clones for chromosome 21 had a matching assignment in both maps).
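The agreement percentages above come down to a simple count over clones. The following is a minimal sketch of such a count, not the procedure actually used: the `Clone` record, the `concordance` function, and the toy clones are all hypothetical, and each clone is assumed to overlap at least one band of the sequence map. A clone counts as a match if the single band it occupies in the map is among the bands reported for it by hybridization; clones that overlap more than one band are either excluded or counted as mismatches.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    """Hypothetical record for one FISH-mapped clone."""
    name: str
    reported_bands: frozenset  # band(s) reported by the hybridization(s)
    map_bands: tuple           # band(s) the clone overlaps in the sequence map


def concordance(clones, straddlers_as_mismatch=False):
    """Return (matches, total, percent) for a collection of clones.

    A clone overlapping more than one map band "straddles" a boundary.
    Straddlers are skipped by default, or counted as mismatches when
    straddlers_as_mismatch is True.
    """
    matches = total = 0
    for clone in clones:
        if len(clone.map_bands) > 1:
            if straddlers_as_mismatch:
                total += 1  # counted in the denominator, never as a match
            continue
        total += 1
        if clone.map_bands[0] in clone.reported_bands:
            matches += 1
    percent = round(100.0 * matches / total, 1) if total else float("nan")
    return matches, total, percent


# Toy data (invented for illustration; not NCI results).
toy = [
    Clone("cloneA", frozenset({"21q21.1"}), ("21q21.1",)),
    Clone("cloneB", frozenset({"21q22.3"}), ("21q22.2",)),
    Clone("cloneC", frozenset({"21q22.11", "21q22.12"}), ("21q22.12",)),
    Clone("cloneD", frozenset({"21q21.3"}), ("21q21.2", "21q21.3")),
]

print(concordance(toy))
print(concordance(toy, straddlers_as_mismatch=True))
```

With the counts given in the text, the same arithmetic gives 100 × 939 / 1309 ≈ 71.7% for the non-straddling clones.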

Compositional band mapping needs finished sequences. A full, finished sequence was indispensable for some of our band localizations, and the draft sequence might even have cast doubt on whether the relation between GC and bands is reliable. For example, the three high-resolution bands of the p arm of chromosome Y would have been difficult or impossible to reconcile with the draft sequence. The outermost GC-rich segment corresponding to the telomeric R band Yp11.32 was still completely missing in the draft (compare Pavlicek et al., 2002, FEBS Letters 511: 165-169 with ref. 35), and only two of the three bands of Yp had been sequenced. At that time, the most telomeric region of the available sequence was consistently GC-poor, so that it would have been difficult to force three bands (R/G/R) into the incomplete Yp sequence.

Comparison of sequence lengths with band lengths on the ideogram. Even though they predated the first draft sequence by 7 years, the original digitized ideograms of Francke (45) can be reinterpreted as an early, experimentally based quantitative map of bands to sequence that met modern standards of precision. Each chromosome arm can be aligned with a continuous sequence for that arm (if the gaps are neglected), albeit with some uncertainty as to how far unsequenced centromeric or telomeric heterochromatin extends inwards from the two ends. The locations of the bands can then be marked on the sequence, although one must allow for large and possibly systematic errors, because real chromosomes observed under the microscope were transformed to an idealized linear representation, and because it is not easy to predict how compaction varies along a chromosome during condensation.
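The alignment of an arm's ideogram with its sequence can be illustrated by a purely linear rescaling. The sketch below is an assumption made for illustration, not the procedure used for the map: the function `bands_to_sequence`, its parameters, and the example band lengths are hypothetical. It supposes uniform compaction along the arm, which is exactly the simplification that the caveats above warn about, and it lets the user set aside a stretch of ideogram length at either end as unsequenced heterochromatin.

```python
def bands_to_sequence(bands, arm_bp, skip_start=0.0, skip_end=0.0):
    """Map ideogram bands onto sequence coordinates by linear scaling.

    bands      : list of (name, length) in ideogram units, ordered along the arm
    arm_bp     : length of the arm's (gap-free) sequence in base pairs
    skip_start : ideogram length at the start assumed to be unsequenced
    skip_end   : ideogram length at the end assumed to be unsequenced

    Returns a list of (name, start_bp, end_bp). Bands lying entirely in
    an unsequenced stretch are dropped; partly covered bands are clipped.
    """
    total = sum(length for _, length in bands)
    covered = total - skip_start - skip_end
    if covered <= 0:
        raise ValueError("no ideogram length left to align with the sequence")
    scale = arm_bp / covered  # base pairs per ideogram unit (uniform compaction)

    placed = []
    position = 0.0
    for name, length in bands:
        lo = max(position, skip_start)
        hi = min(position + length, total - skip_end)
        position += length
        if hi <= lo:
            continue  # band falls entirely within an unsequenced end
        placed.append((name,
                       round((lo - skip_start) * scale),
                       round((hi - skip_start) * scale)))
    return placed


# Invented band lengths for a hypothetical arm of 5 Mb.
example = [("band1", 1.0), ("band2", 3.0), ("band3", 1.0)]
for row in bands_to_sequence(example, 5_000_000):
    print(row)
```

Any systematic variation in compaction along the chromosome would shift the true boundaries away from these linearly interpolated positions, so the output is best read as a first approximation.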