Supplementary Table 1. Extraction and initial characterization of DNA from ancient mammalian remains.
Sample / Age (kya) / Sample (g) / DNA (ng) / Mitochondrial PCR
Ruminant incisor / 69 / 1.75 / 139 / Negative
Horse long bone / 40 – 50 / 2.1 / 42.7 / Positive
Wolf mandible / 40 – 50 / 1.6 / 163 / Positive
Supplementary Table 2. BLAST and BLAT analysis of sequences from ancient mammalian DNA extracts. Sequences generated by GS20 sequencing and Sanger sequencing were aligned to the cow, dog, horse and human reference genomes and to the non-redundant nucleotide database (nt) using BLAST. Sequences were assigned according to criteria described above. For faster alignment of the large quantity of short reads, Solexa data were aligned to the four reference genomes using BLAT. * Includes sequences that hit multiple database sequences with similar or identical scores.
Supplementary Table 3. Annotation of amplified ancient DNA and comparison with reference genome-wide data. Alignments to the reference genome were intersected with genomic annotations in the UCSC genome browser using GALAXY (Giardine et al. 2005). For each sample, the number of reads and the total number of bases overlapping each category of annotation are shown. For each reference genome used, the total % coverage for each annotation type is shown for comparison.
Supplementary Table 4. Complexity of amplified GS20 and Solexa sequences. Sequences were clustered by self-alignment as described in the methods. The number of reads in each cluster size is shown. For example, 6400 reads from the horse extract were observed only once, 152 reads from the horse extract were observed in clusters of size 2 (i.e. 76 pairs of identical reads).
Cluster size / Horse (GS20) / SheepOrGoat (GS20) / Wolf (GS20) / Wolf (Solexa)
Unique / 6400 (97.5%) / 9789 (93.7%) / 11719 (95.9%) / 2589953 (95.4%)
2 / 152 (2.3%) / 588 (5.6%) / 444 (3.6%) / 103904 (3.8%)
3 / 9 (0.1%) / 69 (0.7%) / 42 (0.3%) / 13005 (0.5%)
4 / 0 / 0 / 16 (0.1%) / 3652 (0.1%)
5 / 0 / 0 / 0 / 1425 (0.1%)
>5 / 0 / 0 / 0 / 2654 (0.1%)
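The arithmetic behind the table can be checked with a short sketch. It takes the Horse (GS20) column as given above and derives, for each cluster size, the number of clusters and the percentage of reads. The function and variable names are illustrative only and are not taken from the original analysis, which clustered reads by self-alignment.

```python
# Horse (GS20) column of Supplementary Table 4: cluster size -> number of reads.
horse_reads_by_cluster_size = {1: 6400, 2: 152, 3: 9}


def cluster_summary(reads_by_size):
    """Return clusters and percentage of reads for each cluster size."""
    total_reads = sum(reads_by_size.values())
    summary = {}
    for size, reads in sorted(reads_by_size.items()):
        summary[size] = {
            "reads": reads,
            # Reads in clusters of size n form reads / n clusters.
            "clusters": reads // size,
            "percent": round(100 * reads / total_reads, 1),
        }
    return summary


summary = cluster_summary(horse_reads_by_cluster_size)
for size, row in summary.items():
    print(size, row)
```

For cluster size 2 this gives 76 clusters, matching the 76 pairs of identical reads mentioned in the caption.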
Supplementary Figure 1. Distribution of amplified ancient DNA sequences
Supplementary Figure 2. A. Read length distributions for capillary sequencing and GS20 sequencing from ancient DNA extracts. Reads are grouped by length into 5bp bins. Continuous lines show the proportion of reads in each size bin for all sequenced reads (left axis). Vertical bars show the number of endogenous ancient reads in each size bin (right axis). Solexa sequencing produces reads of a constant length (40bp). B. Average read lengths for all sequences and endogenous ancient sequences from each sample.
Supplementary Figure 3. Mismatch frequency in aligned ancient DNA sequences from GS20 sequencing
Supplementary Figure 4. Mismatch frequencies per nucleotide position in aligned Solexa sequences. A. The total frequency of mismatches at each position in all ancient wolf Solexa sequences aligned to the dog genome. B. The total frequency of mismatches at each position in modern human DNA control sequences aligned to the human genome. (These sequences are from a modern human DNA BAC sequence that is included in each Solexa sequencing run for quality control purposes.) Both ancient and modern control DNA show an increase in all mismatches at the ends of Solexa sequences due to sequencing errors. Ancient wolf DNA sequences, but not modern control sequences, exhibit a high frequency of C > T mismatches at the start of reads. This is consistent with previously published observations of C > T damage accumulating at the ends of ancient DNA sequences (Briggs et al. 2007).
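A per-position mismatch profile of the kind plotted in this figure could be tabulated roughly as follows. This is a minimal sketch that assumes gapless read-to-reference pairs of equal length. The function name and the toy sequences are hypothetical, chosen only to illustrate the counting, and are not data from the study.

```python
from collections import Counter


def mismatch_profile(pairs):
    """Count mismatch types per read position.

    pairs: iterable of (reference, read) strings, aligned without gaps.
    Returns (counts, totals), where counts[pos] is a Counter of
    "ref>read" substitutions and totals[pos] is the number of bases compared.
    """
    counts = {}
    totals = Counter()
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            totals[pos] += 1
            if r != q:
                counts.setdefault(pos, Counter())[f"{r}>{q}"] += 1
    return counts, totals


# Hypothetical toy alignments: two reads carry C>T at the first position.
toy_pairs = [("CACGT", "TACGT"), ("CGGTA", "TGGTA"), ("CATGC", "CATGA")]
counts, totals = mismatch_profile(toy_pairs)
c_to_t_at_start = counts[0]["C>T"] / totals[0]
print(f"C>T frequency at position 1: {c_to_t_at_start:.2f}")
```

Keeping the substitution type separate from the position makes it possible to distinguish a damage signal (C > T enriched at read starts) from a general rise in all mismatch types towards read ends.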
Supplementary Figure 5. ClustalW alignment of concatenated sequences from Deer, Sheep, Goat, Cow and unknown ancient Ruminant.
....|....| ....|....| ....|....| ....|....| ....|....|
10 20 30 40 50
cow CCAGCACGAG AGGTCTGAAA GCCTCAGCTT GGATGTCTCC AAGAGACAAG
bt2 CCAGCACGAG AGGTCTGAAA GCCTCAGCTT GGATGTCTCC AAGAGACAAG
ancient CCAGCACAAG AGGTCTGAAA GCCTCAGCTT GGATGTCTCC AAGAAACAAG
deer CCAGCACAAG GGCTCTGAAA GCCTCAGCTT GTATGTCTCC AAGAGACAAG
goat CCAGCACAAG AGGTCTGAAA GCCTCAGCTT GTATGTCTCC AAGAGACAAG
sheep CCAGCACAAG AGGTCTGAAA GCCTCAGCTT GTATGTCTCC AAGAGACAAG
Clustal Co ******* ** * ******* ********** * ******** **** *****
....|....| ....|....| ....|....| ....|....| ....|....|
60 70 80 90 100
cow GACTGGGCTG GGCAGCAGAC AAGCCCGAAA GGAAATAAGG GCAGGTTCTG
bt2 GACTGGGCTG GGCAGCAGAC AAGCCCGAAA GGAAATAAGG GCAGGTTCTG
ancient GACTGGGCTG GGCAGCAGAC AAGCCCGAAA GGAAATAAGG GCAGGTTCTG
deer GACTGGGCTG GGCAGCAGAC AAGCCCGAAA GGAAATAAGG GCAGGTTCTG
goat GACTTGGCTG GGCAGCAGAC AAGCCTGAAA GGAAATAAGG GCAGGTTCTG
sheep GACTTGGCTG GGCAGCAGAC AAGCCTGAAA GGAAATAAGG GCAGGTTCTG
Clustal Co **** ***** ********** ***** **** ********** **********
....|....| ....|....| ....|....| ....|....| ....|....|
110 120 130 140 150
cow CCTTGGCAAA CAGAGTTGGA GGCAAATTAA AATTAACTAA CATTCCCCAG
bt2 CCTTGGCAAA CAGAGTTGGA GGCAAATTAA AATTAACTAA CATTCCCCAG
ancient CCTTGGCAAA CAGAGTTGGA GGCAAATTAA AATTAACTAA CATT-CCCAG
deer CCTTGGCAAA CAGAGTTGGA GGCAAATTAA AATTAACTAA CATTCCCCAG
goat CCTTGGCAAT TAGAGTTGGA GGCAAATTAA AATTAACTAA CATTTCCCAG
sheep CCTTGGCAAT TAGAGTTGGA GGCAAATTAA AATTAACTAA CATTTCCCAA
Clustal Co ********* ********* ********** ********** **** ****
....|....| ....|....| ....|....| ....|....| ....|....|
160 170 180 190 200
cow GCCATGATGT TGACAAGTGT GGGACATGAT AG----TCAA ACCGGGCTCT
bt2 GCCATGATGT TGACAAGTAT GGGACATGAT AG----TCAA ACCGGGCTCT
ancient GCCATGATGT TGACAAGTGT GGGACATGAT AG----TCAA ACCCGGCTCT
deer GCCGTGATGT TGACAAGTGT GGGACATGAT AG----TCAA ACCCGGCTCT
goat GCCATGAAGT TGACAAGTGT GGGTCATGAT AGATAGTCAA ACCCAGCTCT
sheep GCCATGATGT TGACAAGTGT GGGGCATGAT AGATAGTCAA ACCCAGCTCT
Clustal Co *** *** ** ******** * *** ****** ** **** *** *****
....|....| ....|....| ....|....| ....|....| ....|....|
210 220 230 240 250
cow GCTTGACTCT AAGGCCTGAA TTCCCAAAGT CACCAGTGCA GATTCATTGG
bt2 GCTTGACTCT AAGGCCTGAA TTCCCAAAGT CACCAGTGCA GATTCATTGG
ancient GCTTGACTCT AAGGCCTAAA TTCCCAAAGT CACCAGTGCA GATTCATTGG
deer GCCTGATTCT AGGGCCTGAA TTCCCAAAGT CACCAGC-CA GATTCATTGG
goat GCCTGACTCT AAGGCCTGAA TTCCCAAAGT CACCAGCGCA GATTCATTGG
sheep GCCTGACTCT AAGGCCTGAT TTCCCAAAGT CACCAGCGCA GATTCATTGG
Clustal Co ** *** *** * ***** * ********** ****** ** **********
....|....| ....|....| ....|....| ....|....| ....|....|
260 270 280 290 300
cow CACAGATTTG CTGAAGTCAG GCCCCAGGAC AGCTATAGTC ATGCACCTGC
bt2 CACAGATTTG CTGAAGTCAG GCCCCAGGAC AGCTATAGTC ATGCACCTGC
ancient CACAGATTTG CTGAAGTCAG GCCCCAGGAC AGCTATAGTC ATGCACCTGC
deer CACAGATTTG CTGAAGTCAG ACCCCAGGAT AGCTATAGTC ATGCACCTGG
goat CACAGGTTTG CTGAAGTCAG GCCCCAGGAC AGCTGTAGTC ACGCACCTGG
sheep CACAGATTTG CAGAAGTCAG GCCCCAGGAC AGCTGTAGTC ACGCACCTGG
Clustal Co ***** **** * ******** ******** **** ***** * *******
....|....| ....|....| ....|....| ....|....| ....|....|
310 320 330 340 350
cow TCCTGTTTAC ACTTGTCAGG CTGCCTCCTG AGGGTGGGCA GGAGGAGGGG
bt2 TCCTGTTTAC ACTTGTCAGG CTGCCTCCTG AGGGTGGGCA GGAGGAGGGG
ancient TCCTGTTTAT ACTTGTCAGG CTGCCGCCTG AGGGTGGGCA GGAGGAAAAG
deer ACCTGTTTAC ACTTGTCAGG CTGCCGCCGG AGGGTGGGCA GGAGGAGGGG
goat ACCTATTTAC ACTTGTCAGG CTGCCGCCTG AGGGTGGGCA GGAGGAGGGG
sheep ACCTATTTAC ACTTGTCAGG CTGCCGCCTG AGGGTGGGCA GGAGGAGGGG
Clustal Co *** **** ********** ***** ** * ********** ****** *
....|....| ....|....| ....|....| ....|....| ....|....|
360 370 380 390 400
cow TGGGCAGCAG GCTGGTCTCA CTTCAGCTCC ACTCCAGCCC CTCATATTTA
bt2 TGGGCAGCAG GCTGGTCTCA CTTCAGGTCC ACTCCAGCCC CTCATATTTA
ancient TGGGCAGCAA GCTGGTCTCA CTTCAGGTCC ACTCCAGCCC CTCATATTTA
deer TGGGCAGCAG GCTGGTCTCA CTTCAGCTCC ACTCCAGCCC CTCATGTT-A
goat TGGGCAGCAG GCTGGTCTCA CTTCAGCTCC ACTCCAGCCC CTCATGTTTA
sheep TGGGCAGCAG GCCGGTCTCA CTTCAGCTCC ACTCCAGCCC CTCATGTTTA
Clustal Co ********* ** ******* ****** *** ********** ***** ** *
....|....| ....|....| ....|....| ....|....| ....|....|
410 420 430 440 450
cow TCAGGAACTC CTGGGGTTGG CTGAGAGTCA TCTGAGGCTA AGCTGACCCA
bt2 TCAGGAACTC CTGGGGTTGG CTGAGAGTCA TCTGAGGCTA AGCTGACCCA
ancient TCAGGAACTC CTGGGGTTGG CTGCGAGTCA TCTGAGGCTA AGCTGACCCA
deer TCAGGAACGC CTGGGGTTGG CTGAGAGTCA TCTGAGGCTA AGCTGGCCCA
goat TCAGGAACTC CTGGGGTTGG CTGAGAGTCA TCTGAGGCTA AGCTGGCCCA
sheep TCAGGAACTC CTGGGGTTGG CTGAGAGTCA TCTGAGGCTA AGCTGGCCCA
Clustal Co ******** * ********** *** ****** ********** ***** ****
....|....| ....|....| ....|....| ....|....| ....|....|
460 470 480 490 500
cow GAAGGGCATG TTGGAGATGT TCAGAACCAG GCTGGCTGAG AACTGATACT
bt2 GAAGGGCATG TTGGAGATGT TCAGAACCAG GCTGGCTGAG AACTGATACT
ancient GAAGGGCATG TTGGAGATGT TCAGAACCAG GCTGGCTGAG AACTGATACT
deer GAAGGACATG TTGGAGATGT TCAGAACCAG GCTGGCTGAG AACTGATACT
goat GAAGGACATG TTGGAGATGT TCAGAACCAG GCTGGCTGAG AACTGATACT
sheep GAAGGACATG TTGGAGATGT TCAGAACCAG GCTGGCTGAG AACTGATACT
Clustal Co ***** **** ********** ********** ********** **********
....|....| ....|....| ....|....| ....|....| ....|....|
510 520 530 540 550
cow GCCAGATGGT GCAGCAGCCA AAATATTTAT GAGATGCTAA GCTCGCAGCC
bt2 GCCAGATGGT GCAGCAGCCA AAATATTTAT GAGATGCTAA GCTCGCAGCC
ancient GCCAGATGGT GCAGCAGCCA AAATATTTAT GAGATGCTAA GCTCGCAGCC
deer GCCAGATGGT ACAGCAGCCA AAATATTTAT GAGATGCTAA GGTCGCAGCC
goat GCCAGATGGT ACAGCAGCCA CAATATTTAT GAAATGCTAA GCTCGCAGCC
sheep GCCAGATGGT ACAGCAGCCA CAATATTTAT GAGATGCTAA GCTCGCAGCC
Clustal Co ********** ********* ********* ** ******* * ********
....|....| ....|....| ....|....| ....|....| ....|....|
560 570 580 590 600
cow C--GTTT-AT TATCTAAGAG GTCCTGGGGC TCTGTCTTTC TGAAACAGTC
bt2 C--GTTT-AT TATCTAAGAG GTCCTGGGGC TCTGTCTTTC TGAAACAGTC
ancient CCTGTTT-AT TATCTAAGAG GTCCTGGGGC TCTGTCTTTC TGAAACAGTC
deer CCTGTTTTAT TATCTAAGAG GTCCTGGGGC TCTGTCTTTC TGAAACAGTC
goat C-TGTTT-AT TATCTAAGAG GTCCTGGGGC TCTGTCTTTC TGAAACAGTC
sheep C-TGTTT-AT TATCTAAAAG GTCCTGGGGC TCTGTCTTTC TGAAACAGTC
Clustal Co * **** ** ******* ** ********** ********** **********
....|....| ....|....| ....|....| ....|....| ....|....|
610 620 630 640 650
cow TGTGCCAGCC CATGTGCTGA GATGCTGGCA GGGATGTCTG GAGACAGACA
bt2 TGTGCCAGCC CATGTGCTGA GATGCTGGCA GGGATGTCTG GAGACAGACA
ancient TGTGCCAGCC CATGTGCTGA GATGCTGGCA GGGATGTCTG GAGACAGACA
deer TCTGCCAGCC CATGTGCTGA GATGCTGGCA GGGATGTCTG GAGACAGGCA
goat TCTGCCAGCC CATGTGCTGA GATGATGGCA GGGATGTCTG GAGACAGACA
sheep TCTGCCAGCC CATGTGCTGA GATGATGGCA GGAATGTCTG GAGACAGACA
Clustal Co * ******** ********** **** ***** ** ******* ******* **
....
cow CAAA
bt2 CAAA
ancient CAAA
deer CAAA
goat CAAA
sheep CAAA
Clustal Co ****
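The consensus row of the alignment marks columns in which all six sequences agree. A minimal sketch of that rule, applied to the first ten columns of the alignment above, is given here. The function name is illustrative, and the sketch covers only the "*" symbol for fully conserved columns.

```python
def conservation_line(rows):
    """Return '*' for columns where every row has the same character, else ' '."""
    return "".join("*" if len(set(col)) == 1 else " " for col in zip(*rows))


# First ten alignment columns (positions 1-10) from Supplementary Figure 5.
first_block = {
    "cow": "CCAGCACGAG",
    "bt2": "CCAGCACGAG",
    "ancient": "CCAGCACAAG",
    "deer": "CCAGCACAAG",
    "goat": "CCAGCACAAG",
    "sheep": "CCAGCACAAG",
}

line = conservation_line(list(first_block.values()))
print(repr(line))
```

Position 8 is the only non-conserved column in this block: cow and bt2 carry G, while the ancient ruminant, deer, goat and sheep carry A.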
Supplementary Figure 6. ClustalW alignment of sequences obtained from mitochondrial PCR of the ancient incisor extract using bovine primers, together with selected bovine species. Both strands of the PCR product were sequenced directly by standard capillary sequencing using the forward and reverse PCR primers. The best scoring alignment to each bovine species was identified by BLAST of the incisor sequence to the NR database. Mismatches are indicated by grey boxes. Scores for alignment to the incisor sequence are shown for comparison. The most closely related sequence identified was Steppe Bison (Bison Priscus).
Species / Accession No. / Pos / Sequence / Score
Incisor / - / - / AGTACATGAAATTATTAATCGTACATAGCACATTATGTCAAATCCACTCTTGACAA / -
Bison Priscus / AY748599 / 297-352 / AGTACATAAAATTATTAATCGTACATAGCACATTATGTCAAATCCACTCTTGACAA / 98
Bos Grunniens / DQ139203 / 209-264 / AGTACATGAAATTATTAATCGTACATAGCACATTATGTCAAACTCACTCCTGACAA / 94
Bos Taurus / AB177775 / 265-320 / AGTACATGAAGTTATTAATCGTACATAGCACATTATGTCAAATTCACTCCTGACAA / 94
Bison Bison / AY748669 / 297-352 / AGTACATAAAGTTATTAATTGTACATAGCACATTATGTCAAATCTACTCTTGGCAA / 91
Bison Bonasus / DQ141678 / 18-72 / AATACAT-AAATTATTAATTGTACATAACATATTATGTCAAGTCCATTCTTGGTAA / 85
Bos Indicus / AB268574 / 275-330 / AATACATACAATTATTAACCGTACATAGTACATTATGTCAAATCCATCCTCAACAA / 83
Bos Primigenius / DQ915558 / 164-219 / AATACATACAATTATTGACCGTACATAGTACATTATGTCAAATCCATTCTTGATAG / 83
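The mismatches highlighted in the figure can be located with a simple column-by-column comparison. The sketch below compares the incisor sequence with the Bison Priscus sequence listed above. The helper name is illustrative, and the sketch makes no assumption about how the Score column was computed, since the caption does not say.

```python
incisor = "AGTACATGAAATTATTAATCGTACATAGCACATTATGTCAAATCCACTCTTGACAA"
bison_priscus = "AGTACATAAAATTATTAATCGTACATAGCACATTATGTCAAATCCACTCTTGACAA"


def mismatch_positions(a, b):
    """Return 1-based positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = mismatch_positions(incisor, bison_priscus)
print(len(incisor), positions)
```

The two sequences differ at a single position of the 56 aligned bases, in line with Bison Priscus being reported as the closest match.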
Supplementary Figure 7. Quality score comparison for GS20 and Solexa sequencing technologies. The graph shows the average quality scores over the first 100 bases for all sequences from each GS20 sequencing run, and over the first 40bp for all reads from the wolf Solexa sequencing run. Quality scores for each sample are generated using the proprietary data processing for each sequencing technology, and are claimed to be equivalent to the phred scores used for capillary sequencing. A known flaw in GS20 sequencing is the miscalling of homopolymeric repeats. The quality of Solexa sequencing decreases substantially with read length (see also analysis of mismatches in Solexa sequencing below). Capillary sequencing of cloned amplified wolf DNA was performed in two directions. The resulting assembled sequences have an average quality score of ~65, with the benefit coming from sequencing each base twice.
Supplementary Figure 8. Mismatch frequency in duplicated Solexa reads. The graph shows the average frequency of mismatches per position for the 51,952 pairs of duplicated reads identified in Solexa sequences from the ancient wolf sample. The frequency of mismatches per position increases towards the end of the reads. This mismatch profile is similar to that observed in alignments of wolf Solexa sequences to the dog genome, and of human control Solexa sequences to the human genome (see Supplementary Figure 4). This observation suggests that Solexa sequencing error, rather than emulsion PCR amplification, is responsible for the majority of sequence differences.
Supplementary Note.
Mammalian mitochondrial PCR primers
Sheep/Goat forward: 5’-CACAGACTTCCCACTCCACAA-3’
Sheep reverse: 5’-ACTCGTTTGCATGTTTAAGACAG-3’
Goat reverse: 5’-GTGTAGGCGAGCGGTGTAAT-3’
Wolf forward: 5’-CCT GAG GTA AGA ACC AGA TGC C-3’
Wolf reverse: 5’-GCATATCACTTAGTCCAATAAGGG-3’
Horse forward: 5’-TTT GAC TTG GAT GGG GTA TG-3’
Horse reverse: 5’-AAT GGC CTA TGT ACG TCG TG-3’
Adapter sequences
GS20 Adapter-1: 5’-CCA TCT CAT CCC TGC GTG TCC CAT CTG TTC CCT CCC TGT CTC AG-3’ and 5’-CTG AGA CAG GGA-3’
GS20 Adapter-2: 5’-CCT ATC CCC TGT GTG CCT TGC CTA TCC CCT GTT GCG TGT CTC AG-3’ and 5’-CTG AGA CAC GCA-3’
Solexa P1 adapter: 5’-TGA TAC GGC GAC CAC CGA GAT CTA CAC TCT TTC CCT ACA CGA CGC TCT TCC GAT CT-3’ and 5’-AGA TCG GAA GAG-3’
Solexa P2 adapter: 5’-CAA GCA GAA GAC GGC ATA CGA GCT CTT CCG ATC T-3’ and 5’-AGA TCG GAA GAG-3’
Primer sequences
GS20 forward primer: 5’-CCA TCT CAT CCC TGC GTG TCC CAT CTG TTC CCT CCC TGT CTC AG-3’
GS20 reverse primer: 5’-CCT ATC CCC TGT GTG CCT TGC CTA TCC CCT GTT GCG TGT CTC AG-3’
Solexa PCR forward primer: 5’-TGA TAC GGC GAC CAC CGA GAT CTA CAC TCT TTC CCT ACA CGA CGC TCT TCC GAT CT-3’
Solexa PCR reverse primer: 5’-CAA GCA GAA GAC GGC ATA CGA GCT CTT CCG ATC T-3’
Briggs, A.W., Stenzel, U., Johnson, P.L., Green, R.E., Kelso, J., Prufer, K., Meyer, M., Krause, J., Ronan, M.T., Lachmann, M. et al. 2007. Patterns of damage in genomic DNA sequences from a Neandertal. Proceedings of the National Academy of Sciences of the United States of America 104: 14616-14621.
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J. et al. 2005. Galaxy: a platform for interactive large-scale genome analysis. Genome Research 15: 1451-1455.