Title: Digital Genotyping and Haplotyping with Polymerase Colonies

Rob Mitra et al.

Harvard Medical School, Lipper Center for Computational Genetics, 200 Longwood Ave., Boston, MA 02115.

Abstract: The polymerase colony (polony) technology is a method that amplifies multiple individual DNA molecules by PCR in a thin acrylamide gel poured on the surface of a glass microscope slide. Each DNA molecule included in the reaction produces a polony of double-stranded DNA, one strand of which is covalently attached to the gel. Here, we show that we can genotype these polonies by performing single base extensions with dye-labeled nucleotides, and we demonstrate the accurate quantitation of two allelic variants using this technology. We also show that polony technology can be used to directly determine the phase, or haplotype, of two single nucleotide polymorphisms (SNPs). We correctly determined the genotype and phase of three different pairs of SNPs. In one case, the distance between the two SNPs is 45 kilobases, the largest distance achieved to date without separating the chromosomes by cloning or somatic cell fusion. The results indicate that polony genotyping and haplotyping may play an important role in understanding genetic variation.

Introduction

One goal of genomic science is to find genetic variation that predicts susceptibility to disease. Individuals identified as being at risk could then change their diet, lifestyle, or environment to reduce their chances of developing disease. For patients who have already developed disease, genetic markers could guide the choice of therapy to increase the likelihood of a successful outcome.

Most researchers who study DNA variation focus on single nucleotide polymorphisms (SNPs), as these are the most common variations in the human population. By studying candidate genes and performing genome-wide linkage studies, scientists are trying to home in on the “causative SNP”, the SNP that alters gene function and increases the risk of disease. However, two recent studies[Drysdale, 2000 #1; Hoehe, 2000 #24] suggest that, for some genes, there may be no single SNP responsible for altering protein function or expression, and thereby causing disease, but instead multiple SNPs that interact to alter function or expression[Davidson, 2000 #20]. Furthermore, this alteration occurs only when these SNPs are present on the same chromosome, so one must determine the haplotype of these SNPs to find a correlation with the observed phenotypes. In these cases, then, we have traded the notion of a causative SNP for that of a causative haplotype.

How can one determine the haplotype, or phase, of a pair of SNPs? Currently, the most common approach is to first genotype the SNPs to acquire unphased data from multiple related individuals and then to infer the haplotype computationally[Stephens, 2001 #4; Clark, 1998 #1; Excoffier, 1995 #2; Hawley, 1995 #3; Hoehe, 2000 #24; Niu, 2002 #6]. The development of this methodology has greatly increased the power of both linkage studies and candidate gene studies. However, computational inference of haplotypes has been estimated to be only 75-95% accurate[Stephens, 2001 #4; Niu, 2002 #6;Tishkoff, 2000 #13], making it an unlikely candidate for use in a clinical setting and presenting challenges when used as a research tool. Recent findings suggest a way to improve the accuracy of haplotype inference. Daly et al.[Daly, 2001 #27] and others[Patil, 2001 #31; Gabriel, 2002 #28] have shown that SNPs tend to be inherited in larger haplotype blocks than previously thought and that there are relatively few different variants of each block. This observation has sparked a public effort to characterize all common haplotypes in the human population[Couzin, 2002 #32]. Prior knowledge of common haplotype blocks may make it easier to infer the phase of SNPs that lie within the same haplotype block[Zhang, 2002 #18]. However, even with this knowledge, it will be difficult to accurately predict the haplotype of two SNPs that lie in different haplotype blocks, because such SNPs will not typically be in linkage disequilibrium.
This situation may be quite common, because the genome contains about 100 kilobases per gene, a gene and its regulatory region can be spread over 60-100 kilobases, and the average haplotype block is only 22 kilobases in European and Asian populations and 11 kilobases in Yoruban and African-American populations[Gabriel, 2002 #28]. This point is well illustrated by two known mutations, R347->H and A970->D, in the CFTR gene[Clain, 2001 #17]. These mutations are separated by 65 kilobases of genomic sequence, a distance much larger than the length of most haplotype blocks. Yet, when present in cis, they interact to produce much more severe symptoms of cystic fibrosis than when present in trans, or when only one mutation is present. If haplotypes are to be used in the clinic as a prognostic marker, a direct molecular haplotyping technology is necessary.
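To make the limitation of computational inference concrete, here is a minimal sketch of expectation-maximization (EM) estimation of haplotype frequencies for two biallelic SNPs from a sample of unphased genotypes, in the spirit of the likelihood approach of [Excoffier, 1995 #2] but not a reimplementation of any cited program. Genotypes are coded as minor-allele counts (0, 1, 2); the coding, function names, and sample are illustrative assumptions. Only double heterozygotes are ambiguous, and for them the method yields a probability of cis phase rather than a direct observation.

```python
from itertools import product

# The four possible two-SNP haplotypes, alleles coded 0 (major) or 1 (minor).
HAPLOTYPES = list(product((0, 1), repeat=2))


def phase_options(g1, g2):
    """Possible haplotype pairs for a genotype given as minor-allele counts (0, 1, 2)."""
    if g1 == 1 and g2 == 1:
        # Double heterozygote: cis (1,1)/(0,0) or trans (1,0)/(0,1).
        return [((1, 1), (0, 0)), ((1, 0), (0, 1))]
    split1 = (1, 0) if g1 == 1 else (g1 // 2, g1 // 2)
    split2 = (1, 0) if g2 == 1 else (g2 // 2, g2 // 2)
    return [((split1[0], split2[0]), (split1[1], split2[1]))]


def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate haplotype frequencies from unphased two-SNP genotypes by EM."""
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g1, g2 in genotypes:
            options = phase_options(g1, g2)
            # E-step: weight each possible phase by the product of its haplotype frequencies.
            weights = [freqs[a] * freqs[b] for a, b in options]
            total = sum(weights)
            for (a, b), w in zip(options, weights):
                share = w / total if total > 0 else 1.0 / len(options)
                counts[a] += share
                counts[b] += share
        # M-step: expected haplotype counts divided by the number of chromosomes.
        n_chrom = 2 * len(genotypes)
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs


def cis_probability(freqs):
    """Posterior probability that a double heterozygote carries (1,1)/(0,0)."""
    cis = freqs[(1, 1)] * freqs[(0, 0)]
    trans = freqs[(1, 0)] * freqs[(0, 1)]
    return cis / (cis + trans) if cis + trans > 0 else 0.5


# Illustrative input only: 10 double homozygotes of each type plus 5 double heterozygotes.
sample = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
freqs = em_haplotype_freqs(sample)
print(freqs, cis_probability(freqs))
```

The phase of each double heterozygote is never observed; it is assigned from population frequencies, so an individual carrying a rare phase is systematically miscalled.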

Current methods for the direct determination of haplotypes have clear limitations. Allele-specific PCR[Michalatos-Beloin, 1996 #14] and single molecule PCR[Ruano, 1990 #2] require significant optimization and cannot routinely determine the phase of SNPs separated by more than 10-15 kilobases. Atomic force microscopy[Woolley, 2000 #21] is an interesting alternative, but it is unclear how easily this technology can be scaled up, and it requires expensive equipment not commonly found in a molecular biology laboratory. Methods that clone chromosomal fragments into BACs or that generate somatic cell hybrids[Patil, 2001 #31;Douglas, 2001 #22] are not cost-effective when one is interested in phasing a small number of SNPs for a large number of samples[Douglas, 2001 #22], as would be necessary for a clinical diagnostic.

Here, we present a method to determine haplotypes using polymerase colony (polony) technology, in which a large number of individual DNA molecules are cloned, amplified, and analyzed on a glass microscope slide[Mitra, 1999 #3]. We determined the phase of three different pairs of SNPs, one pair of which was 45 kilobases apart; in principle, distances of tens or hundreds of megabases (whole chromosomes) are possible. This technology requires very little DNA as input: we show that a buccal swab provides enough DNA to perform hundreds of reactions. We also provide evidence that a large number of polony assays can be performed on a single microscope slide, reducing the cost per assay. This method may become an important tool to accurately determine haplotypes in a cost-effective manner.

As a prerequisite to determining haplotypes, it was necessary to demonstrate that polony technology could be used to determine genotypes. Polony genotyping also has many applications of its own, such as detecting loss of heterozygosity[Zhou, 2001 #7], quantifying allelic imbalance[Yan, 2002 #9], and detecting rare somatic mutations in a background of wild-type DNA[Lizardi, 1998 #34]. Therefore, in addition to demonstrating haplotyping, we present data that demonstrate the utility of polony genotyping for these applications.
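Because each polony arises from a single template molecule, allele quantitation reduces to counting polonies. The sketch below assumes that each polony has one Cy3 and one Cy5 SBE intensity (one dye per allele); the ratio cutoff of 2, the signal floor, and the function names are hypothetical choices, not the calling rules used in this study. It reports the allele fraction with a Wilson score confidence interval, a standard binomial interval that remains sensible when one allele is rare.

```python
import math


def call_allele(cy3, cy5, min_ratio=2.0, min_signal=0.0):
    """Call a polony from its two SBE intensities; None if the call is ambiguous."""
    if max(cy3, cy5) <= min_signal:
        return None  # no usable signal in either channel
    if cy5 >= min_ratio * cy3:
        return "cy5"
    if cy3 >= min_ratio * cy5:
        return "cy3"
    return None


def allele_fraction(intensities, z=1.96, min_ratio=2.0):
    """Fraction of called polonies carrying the Cy5 allele, with a Wilson score interval."""
    calls = [call_allele(c3, c5, min_ratio) for c3, c5 in intensities]
    n_cy5 = calls.count("cy5")
    n = n_cy5 + calls.count("cy3")  # ambiguous polonies are excluded
    if n == 0:
        raise ValueError("no polonies could be called")
    p = n_cy5 / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

The interval narrows as more polonies are counted, which is why the number of polonies scored, rather than signal intensity, sets the precision of an allelic-imbalance or rare-mutation estimate.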

Materials and Methods

Polony Amplification

Template (50 to 100,000 molecules) was added to the polony amplification mixture [10 mM Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 1.5 mM MgCl2, 200 µM dNTPs, 10 U JumpStart Taq (Sigma), 5.91% acrylamide, 0.09% bis-acrylamide, 0.5 µM forward primer (with acrydite group), 0.5 µM reverse primer, 0.1% Tween 20, 0.2% BSA]. Next, ammonium persulfate and TEMED were added to the mixture, each at a final concentration of 0.083%. A 15 µm thick gel was poured on a glass microscope slide that was partially covered with a Teflon coating (Erie Scientific). The Teflon coating served as a spacer between the glass surface of the slide and a glass coverslip (20 mm x 30 mm, no. 2; Fisher Scientific). The gel was allowed to polymerize under argon for 30 minutes. The coverslip was overlaid with mineral oil and the slide was cycled using the following program: denaturation (2 minutes at 94°C); 40 cycles (30 s at 94°C, 30 s at 56°C, 1 min at 72°C); and extension (2 min at 72°C). After cycling, the mineral oil was removed by rinsing the slides in hexane.

Polony amplification for the haplotyping reactions was performed as above, except that four primers were used (two forward and two reverse primers) at a concentration of 0.25 µM each.
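A back-of-the-envelope sketch of how the 15 µm gel poured under a 20 mm x 30 mm coverslip keeps template molecules apart. It assumes that the gel footprint equals the coverslip, that all template molecules end up in that gel, and that they are scattered uniformly at random. Because the gel is far thinner than the typical spacing, it is treated as two-dimensional, where the expected nearest-neighbour distance of a Poisson process with density λ is 1/(2√λ). Function names are illustrative.

```python
import math


def gel_volume_ul(thickness_um=15.0, width_mm=20.0, length_mm=30.0):
    """Gel volume under the coverslip; 1 mm^3 equals 1 microliter."""
    return (thickness_um / 1000.0) * width_mm * length_mm


def mean_nn_spacing_um(n_molecules, width_mm=20.0, length_mm=30.0):
    """Expected nearest-neighbour distance (micrometres) for randomly placed templates.

    Treats the thin gel as 2-D: for a Poisson process of density lam per mm^2,
    the mean distance to the nearest neighbour is 1 / (2 * sqrt(lam)).
    """
    lam = n_molecules / (width_mm * length_mm)
    return 1000.0 / (2.0 * math.sqrt(lam))


print(f"gel volume: {gel_volume_ul():.1f} ul")
for n in (50, 100_000):
    print(f"{n} templates: mean spacing ~ {mean_nn_spacing_um(n):.0f} um")
```

Under these assumptions the gel holds about 9 µl, and the mean template spacing falls from roughly 1.7 mm at 50 molecules to roughly 39 µm at 100,000 molecules.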

The polony protocol was modified to amplify polonies in gels that contained the cleavable crosslinker DATD (see Results). Instead of polymerizing the acrylamide gel with the template and PCR reagents present, we polymerized the gel first and later diffused the DNA template molecules and PCR reagents into it. In this protocol, we made the gel mix [Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 1.5 mM MgCl2, 7.6% acrylamide, 0.36% DATD, 0.036% bis-acrylamide, 0.5 µM acrydite-modified reverse primer, 0.1% Tween 20, 0.2% BSA] and then added ammonium persulfate and TEMED, each to a final concentration of 0.083%. We poured a 15 µm thick gel on a glass microscope slide that was partially covered with a Teflon coating (Erie Scientific). The Teflon coating served as a spacer between the glass surface of the slide and a glass coverslip (20 mm x 30 mm, no. 2; Fisher Scientific). The gel was allowed to polymerize under argon for 30 minutes. The slides were washed in deionized water, allowed to dry, and stored under vacuum until use. To perform polony amplification, we covered the polymerized gel with PCR amplification mix [500 to 5 x 10^4 molecules/µl template, Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 0.2% BSA, 0.1% Tween 20, 0.5 µM primer PR1pcr2.1-R, 200 µM dNTPs, 0.335 units/µl JumpStart Taq] for 2 minutes and then removed the excess fluid. The gel was overlaid with 35 µl of mineral oil and covered with a coverslip. The slides were cycled as follows: denaturation (2 minutes at 94°C); 44 cycles (30 s at 94°C, 45 s at 56°C, 90 s at 72°C). After amplification, the DATD crosslinker was cleaved by treating the slides with 100 mM NaIO4 for 15 minutes at room temperature. Next, we washed the slides in deionized water for 5 minutes, in inactivation buffer [50 mM ethanolamine, 100 mM Tris-HCl pH 9.0, 0.1% SDS] for 30 minutes at room temperature, and in deionized water for 5 minutes.
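The two gel formulations can be compared with the conventional polyacrylamide descriptors %T (total monomer plus crosslinker, w/v) and %C (crosslinker as a percentage of %T). This sketch computes them from the w/v percentages given in the two recipes, counting DATD and bis-acrylamide together as crosslinker by weight; that is an assumption, since the two crosslinkers differ in molecular weight, so a weight-based %C does not directly compare crosslink density.

```python
def gel_composition(acrylamide_pct, *crosslinker_pcts):
    """Return (%T, %C) from w/v percentages of monomer and crosslinker(s).

    %T = monomer + all crosslinkers (w/v); %C = crosslinker share of %T.
    """
    crosslinker = sum(crosslinker_pcts)
    total = acrylamide_pct + crosslinker
    return total, 100.0 * crosslinker / total


standard = gel_composition(5.91, 0.09)          # bis-acrylamide only
cleavable = gel_composition(7.6, 0.36, 0.036)   # DATD plus bis-acrylamide
print("standard gel: %%T=%.2f  %%C=%.2f" % standard)
print("DATD gel:     %%T=%.2f  %%C=%.2f" % cleavable)
```

By this measure the standard gel is 6% T with 1.5% C, while the DATD gel is about 8% T with about 5% C.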

Denaturing Polony Gels

After polony amplification, the unattached DNA strand was removed by incubating the slides in 70°C denaturing buffer [70% formamide, 1x SSC] and electrophoresing in 0.5x TBE with 42% urea for 1 hour at 5-10 V/cm. The slides were then washed twice for 4 minutes in wash buffer 1 [10 mM Tris-HCl pH 7.5, 50 mM KCl, 2 mM EDTA, 0.01% Triton X-100].

For polony haplotyping reactions, after the first SBE, the extended primers were removed by washing in 70°C denaturing buffer, and the slides were washed twice for 5 minutes in dH2O.

Single Base Extension (SBE) Reactions

The SBE reactions used in the polony haplotyping and genotyping experiments in this study were carried out using fluorescent deoxynucleotides. The acrylamide gel was covered with a frame seal chamber (MJ Research), and annealing mix [0.25 µM SBE primer, 6x SSPE, 0.01% Triton X-100] was added over the gel. The slides were heated at 94°C for 2 minutes, then at 56°C for 15 minutes. We removed unannealed primer by washing the slides twice for 4 minutes in wash buffer 1 and then equilibrated the slides in 1x Klenow buffer [10 mM Tris-HCl pH 7.5, 10 mM MgCl2]. Next, we covered the gel with 40 microliters of extension mix [1x Klenow buffer, Klenow exo- polymerase X units/µl, E. coli single-stranded binding protein, 1 µM Cy3- or Cy5-labeled deoxynucleotide] for two minutes and then washed the slides in wash buffer 1. The slides were scanned on a scanning confocal microscope designed for microarrays (ScanArray 5000, GSI Lumonics).

SBE reactions with dideoxynucleotides were performed as follows. The acrylamide gel was covered with a frame seal chamber (MJ Research), and annealing mix [0.25 µM sequencing primer, 6x SSPE, 0.01% Triton X-100] was added over the gel. The slides were heated at 94°C for 2 minutes, then at 56°C for 15 minutes. Unannealed primer was removed by washing the slides twice for 4 minutes in wash buffer 1. The slides were equilibrated in 1x AmpliTaq FS buffer [10 mM Tris-HCl pH 8.0, 50 mM KCl, 1.5 mM MgCl2]. Next, the gel was covered with 40 microliters of extension mix [1x AmpliTaq FS buffer, 2 µM FITC-12-ddUTP, 2 µM ROX-ddCTP, 2 µM Cy5-ddATP, 2 µM Cy3-ddGTP, AmpliTaq FS X units/µl, E. coli single-stranded binding protein]. The gel was covered with a frame seal chamber and heated to 55°C for 4 minutes. The slides were washed in wash buffer 1 and scanned on a scanning confocal microscope designed for microarrays (ScanArray 5000, GSI Lumonics).

Image Analysis

Images of polony gels were acquired in TIFF format. The images were filtered using a Wiener filter and a median filter to remove speckle and noise. The background was subtracted, and polonies were computationally identified using the ImageQuantNT software package, which quantified the fluorescent intensity of each polony and output the data as a text file. Overlapping polonies were identified using a MATLAB script, HAPCALL, which is available at http://arep.med.harvard.edu. For the polony genotyping experiment in which the relative abundances of two alleles were measured, the images were smoothed, polonies were identified, and their genotypes were determined using the MATLAB script polony_call.m, also available at
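The filtering and segmentation steps can be sketched with NumPy alone (version 1.20 or later, for `sliding_window_view`). This is an illustrative stand-in, not the ImageQuantNT or MATLAB pipeline used here: the Wiener filter is omitted, background is estimated crudely as the median of the smoothed image, and the threshold is a hypothetical user-chosen value. Polonies are taken as 4-connected groups of above-threshold pixels, each reported with its pixel count, centroid, and integrated background-subtracted intensity.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(img, size=3):
    """Median filter with edge padding (removes isolated bright speckles)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def find_polonies(img, threshold, size=3):
    """Label 4-connected above-threshold regions; return labels and per-polony stats."""
    smoothed = median_filter(np.asarray(img, dtype=float), size)
    signal = np.clip(smoothed - np.median(smoothed), 0.0, None)  # crude background subtraction
    mask = signal > threshold
    labels = np.zeros(mask.shape, dtype=int)
    polonies = []
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # pixel already belongs to a labeled polony
        label = len(polonies) + 1
        labels[start] = label
        queue, pixels = deque([start]), []
        while queue:  # breadth-first flood fill over the mask
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = label
                    queue.append((nr, nc))
        rows, cols = map(list, zip(*pixels))
        polonies.append({
            "label": label,
            "pixels": len(pixels),
            "centroid": (float(np.mean(rows)), float(np.mean(cols))),
            "intensity": float(signal[rows, cols].sum()),
        })
    return labels, polonies
```

A 3 x 3 median filter removes single-pixel speckles outright while only trimming the corners of genuine polonies, which is why it suits the speckle described above.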

Oligonucleotides and Patient DNA

All primers used to perform the polony amplification reactions were designed using Primer3 software[Rozen, 1998 #41]. We found it necessary to set the following parameters to obtain good results: PRIMER_OPT_SIZE=25, PRIMER_MIN_SIZE=19, PRIMER_MAX_SIZE=30, PRIMER_OPT_TM=70, PRIMER_MIN_TM=64, PRIMER_MAX_TM=73, PRIMER_MAX_DIFF_TM=5, PRIMER_MIN_GC=45, PRIMER_MAX_GC=80. For some experiments, the parameter PRIMER_PRODUCT_SIZE_RANGE=90-100 was used. All other parameters were set to default values. The names and sequences of the oligonucleotides used to amplify polonies are as follows. Locus containing SNP DK438: Primer DK438AP.1.FM 5’ QCATTGAGTCCTTACTGTGCACACAGCTC 3’; Primer DK438AP.1.R 5’ GGGGGAAATCCACTGAGCTAAATTGC 3’. Locus containing SNP DK445-2: Primer DK445-2AP.1.F 5’ GGTCCCCACCTAGGCCTCTGTGTTA 3’; Primer DK445-2AP.1.RM 5’ QTGAGTCCCTCAAACCCCTTTCTTCTG 3’. Locus containing SNP DK331: Primer DK331AP.1.FM 5’ QTGTTGGTATGGCAGAATGTAGCATGG 3’; Primer DK331AP.1.R 5’ GGCGGTGAGAAAAGGTTTTAATGG 3’. Locus containing SNP C/T –13910: Primer IN13L126PS2F 5’ GGCCTCTGCGCTGGCAATACAG 3’; Primer In13l126ps2RM 5’ QCCTCGTGGAATGCAGGGCTCAA 3’. Locus containing SNP G/A –22018: Primer In9L125ps3FM 5’ QGATGTCCTTAAAAACAGCATTCTCAGC 3’; Primer In9L125ps3r 5’ CCATGTTGGCCAGGCTGGTCTC 3’. Model templates for SBE quantitation: Primer PR1-RM 5’ QCTGCCCCGGGTTCCTCATTCTCT 3’; Primer PR1pcr2.1-R 5’ CCATGTAAGCCCACTGCAAGCTACC 3’; Primer PR1-F 5’ CCACTACGCCTCCGCTTTCCTCTC 3’.

The following oligonucleotides were used as primers for the single base extension reactions: Primer Seq 438 5’ GAGCTAAATTGCACATAACTTAGTAACAGGCTTA 3’; Primer Seq 445-2 5’ ACCTAGGCCTCTGTGTTAGTCTGTTTTCA 3’; Primer Seq 331 5’ ACCTAGGCCTCTGTGTTAGTCTGTTTTCA 3’; Primer In9L102ps2R 5’ GGGACAAAGGTGTGAGCCACCG 3’; Primer SeqIN13ps2 5’ GGCCTCTGCGCTGGCAATACAGATAAGATAATGTAG 3’; Primer Hybe 010129-1GA 5’ TATGGGCAGTCGGTGATAGAGTGGTGGA 3’.
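To reproduce the Primer3 settings listed above, the sketch below writes a Primer3 input record in Boulder-IO format (TAG=VALUE lines, terminated by a line containing only "="), using release 1.x tag names; later Primer3 releases renamed several tags (for example, SEQUENCE became SEQUENCE_TEMPLATE), so check the manual for the version in use. It also checks a primer against the length and GC limits, assuming that the leading Q in the listed sequences marks the acrydite modification rather than a base. Function names are illustrative.

```python
# Primer3 settings reported in the text; every other tag keeps its default.
SETTINGS = {
    "PRIMER_OPT_SIZE": 25, "PRIMER_MIN_SIZE": 19, "PRIMER_MAX_SIZE": 30,
    "PRIMER_OPT_TM": 70, "PRIMER_MIN_TM": 64, "PRIMER_MAX_TM": 73,
    "PRIMER_MAX_DIFF_TM": 5, "PRIMER_MIN_GC": 45, "PRIMER_MAX_GC": 80,
}


def boulder_record(seq_id, template, product_size_range=None):
    """One Primer3 Boulder-IO record using release 1.x tag names."""
    tags = {"PRIMER_SEQUENCE_ID": seq_id, "SEQUENCE": template, **SETTINGS}
    if product_size_range is not None:
        tags["PRIMER_PRODUCT_SIZE_RANGE"] = "%d-%d" % product_size_range
    return "".join(f"{tag}={value}\n" for tag, value in tags.items()) + "=\n"


def strip_modification(primer):
    # Assumption: a leading 'Q' denotes the 5' acrydite group, not a base.
    return primer.upper().lstrip("Q")


def gc_percent(primer):
    seq = strip_modification(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def within_limits(primer):
    """True if the primer meets the length and GC limits in SETTINGS."""
    seq = strip_modification(primer)
    return (SETTINGS["PRIMER_MIN_SIZE"] <= len(seq) <= SETTINGS["PRIMER_MAX_SIZE"]
            and SETTINGS["PRIMER_MIN_GC"] <= gc_percent(primer) <= SETTINGS["PRIMER_MAX_GC"])


print(gc_percent("GGGGGAAATCCACTGAGCTAAATTGC"))  # primer DK438AP.1.R -> 50.0
```

Tm limits are not checked here because a faithful check would need Primer3's own melting-temperature model.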

Patient DNA used to haplotype SNPs DK438, DK445-2, and DK331 was obtained from the Coriell Institute. Patient DNA used to haplotype the SNPs G/A –22018 and C/T –13910 was purified from buccal swabs using the MasterAmp buccal swab DNA extraction kit (Epicentre).

Results

Principles Underlying Polony Haplotyping. Our approach is shown in Figure 1. One hundred to five hundred genome equivalents of patient DNA are diluted into a mixture of acrylamide monomer, bis cross-linker, and PCR reagents. Two pairs of primers are included in this mixture, one pair flanking the first SNP of interest (Figure 1, inset) and the other pair flanking the second SNP of interest, and this mixture is used to pour a thin (15 micrometer) acrylamide gel on a glass microscope slide. Because the concentration of patient DNA is so low, the chromosomes are well separated from each other on the surface of the slide. PCR is then performed using a thermal cycler designed to accommodate slides. Each chromosome is amplified at two loci, and the acrylamide matrix prevents the amplification products from diffusing very far. As a result, double-stranded DNA accumulates around the chromosome, forming two overlapping polonies, each amplified from a different region of the same chromosome. A key feature of this protocol is the use of modified primers in the PCR reaction that covalently attach one strand of the amplified DNA to the acrylamide matrix[Rehman, 1999 #196]. This allows the unattached strand to be removed from all polonies by heating and washing the slide, leaving single-stranded templates for the subsequent single base extension (SBE) reactions that determine the genotypes of the two SNPs. After all polonies have been genotyped, the phase of the SNPs is determined by identifying overlapping polonies.
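A minimal sketch of the final phasing step, assuming each polony has already been reduced to a centroid and an SBE allele call at its locus. It is not the HAPCALL script described in Materials and Methods: centroid distance stands in for true pixel overlap, the distance cutoff is a hypothetical parameter, and one locus-2 polony may be paired with more than one locus-1 polony. For a double heterozygote, the two most frequent allele combinations should correspond to the two haplotypes, while chance overlaps between unrelated molecules add a background of the other combinations.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Polony:
    x_um: float   # centroid position on the slide
    y_um: float
    allele: str   # base called by SBE at this locus


def overlapping_pairs(locus1, locus2, max_dist_um):
    """Pair each locus-1 polony with the nearest locus-2 polony within max_dist_um."""
    pairs = []
    for p in locus1:
        nearest, best = None, max_dist_um
        for q in locus2:
            d = math.hypot(p.x_um - q.x_um, p.y_um - q.y_um)
            if d <= best:
                nearest, best = q, d
        if nearest is not None:
            pairs.append((p.allele, nearest.allele))
    return pairs


def call_haplotypes(pairs):
    """Take the two most frequent allele combinations as the two haplotypes."""
    counts = Counter(pairs)
    return [hap for hap, _ in counts.most_common(2)], counts
```

Returning the full tally alongside the call keeps the minority combinations visible, so the background from chance overlaps can be compared with the haplotype signal.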

Proof of Principle. To perform the protocol described above, it was first necessary to establish that (i) multiple polonies could be amplified from a single molecule of DNA and (ii) single base extension reactions[Pastinen, 1996 #4; Pastinen, 1997 #36; Syvanen, 1994 #35; Dubiley, 1999 #38] could be performed on DNA covalently attached to the acrylamide gel. To confirm that two polonies could be amplified from a single DNA molecule, we first cut a circular plasmid template (Figure 2c) with EcoRI to linearize it. We then amplified this linear template in a polony reaction using two sets of PCR primers, each pair chosen to amplify a different region of the template molecule (designated regions A and B in Figure 2). After amplification, the polonies were made single-stranded by heating the slide and washing away the unattached DNA strand. Next, two dye-labeled oligonucleotides were hybridized to the gel (Figure 2a). The oligonucleotide complementary to a sequence in region A was labeled with Cy5 (red), and the oligonucleotide complementary to a sequence in region B was labeled with Cy3 (green). In a separate control reaction, we cleaved the circular plasmid with two restriction endonucleases, EcoRI and NcoI, so that regions A and B were no longer on the same DNA molecule (Figure 2b). When the singly cut plasmid was used as the template for polony amplification, numerous overlapping polonies could be identified after hybridization, as evident from the large number of yellow polonies in Figure 2a. The doubly cut plasmid produced few overlapping polonies. The polonies that did overlap did so only near their edges and resulted from two separate DNA molecules lying near each other when the gel was poured. These results demonstrate that a single DNA molecule can give rise to two overlapping polonies.
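The doubly cut control can be put on a rough quantitative footing. If unrelated templates are scattered at random (a Poisson assumption), the chance that a given region-A polony has at least one region-B polony from a different molecule within touching distance is 1 - exp(-λπ(2r)²), where λ is the density of region-B polonies and r a typical polony radius. Both λ and r are hypothetical inputs; the text does not report them.

```python
import math


def chance_overlap_fraction(density_per_mm2, polony_radius_um):
    """Probability that a polony touches at least one unrelated polony by chance.

    Two discs of radius r overlap when their centres are closer than 2r; for
    centres placed as a Poisson process of density lam, the probability of at
    least one such neighbour is 1 - exp(-lam * pi * (2r)^2).
    """
    d_mm = 2.0 * polony_radius_um / 1000.0
    return 1.0 - math.exp(-density_per_mm2 * math.pi * d_mm ** 2)
```

Because this fraction grows with template density and polony size, diluting the template keeps chance overlaps rare relative to true same-molecule overlaps.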
We determined the amplification efficiencies of the two primer pairs to be 85% and 81% (see Materials and Methods).
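If these efficiencies are read as per-molecule probabilities that each locus yields a detectable polony, and the two loci amplify independently (both assumptions), the expected fraction of template molecules giving an informative pair of overlapping polonies is their product. The function name is illustrative.

```python
def locus_outcomes(eff_a, eff_b):
    """Expected fractions of molecules by which loci amplified, assuming independence."""
    return {
        "both": eff_a * eff_b,
        "a_only": eff_a * (1 - eff_b),
        "b_only": eff_b * (1 - eff_a),
        "neither": (1 - eff_a) * (1 - eff_b),
    }


outcomes = locus_outcomes(0.85, 0.81)
print({k: round(v, 4) for k, v in outcomes.items()})
```

With 85% and 81%, roughly 69% of molecules would be expected to amplify at both loci under these assumptions; molecules that amplify at only one locus contribute genotype information but no phase.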

We next characterized the specificity of single base extension (SBE) on acrylamide-immobilized DNA. We used a single dye-labeled deoxynucleotide or dideoxynucleotide to extend primer:template duplexes by one base in a DNA polymerase-catalyzed reaction. For each nucleotide tested, we performed four reactions to determine the specificity of the SBE reaction for the correct base relative to all possible mismatches. The results are shown in Table 1. SBE reactions with both fluorescent deoxynucleotides and dideoxynucleotides showed good discrimination for the correct base. We chose to use fluorescent deoxynucleotides in our SBE reactions because they performed somewhat better and are less expensive than fluorescent dideoxynucleotides.
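One possible summary statistic for this four-reaction design is the ratio of the correct-base signal to the strongest mismatch signal. The sketch below is illustrative only; it is not necessarily the metric reported in Table 1, and no values from that table are reproduced here.

```python
def discrimination_ratio(signals, correct_base):
    """Signal for the correct base divided by the strongest mismatch signal.

    signals maps each base ('A', 'C', 'G', 'T') to the fluorescence measured in
    the SBE reaction that offered only that labeled nucleotide.
    """
    mismatches = [v for base, v in signals.items() if base != correct_base]
    worst = max(mismatches)
    return signals[correct_base] / worst if worst > 0 else float("inf")
```

Using the worst mismatch rather than the average makes the statistic conservative: a genotype call is only as reliable as its single most confusable alternative base.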