Large Scale SNP Scanning on Human Chromosome Y and DNA Pooling Study by Using Unlabeled Probes

Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah 84132

Abstract

High-throughput SNP scanning is an important tool for genome studies. Genotyping of known mutations and scanning for unknown ones using high-resolution melting analysis and unlabeled probes is simple, rapid, and inexpensive, requiring only PCR, an unlabeled oligonucleotide, LCGreen Plus, and melting instrumentation. This method works on the single-sample HR-1, the 384-sample LightScanner and the LightCycler. We have used synthetic PCR constructs to demonstrate the detection of all possible SNP base changes. LCGreen Plus was included in the PCR reaction, and high-resolution melting analysis was performed five minutes after amplification. In all cases heterozygotes were easily identified because the heteroduplex formed by the probe oligonucleotide and the mismatched amplicon strand altered the shape of the melting curve. Chromosome Y is an effective and simple target for evolution studies. Thirty-five SNP markers, distributed along the human Y chromosome, have been characterized in 192 individuals from south India on a 384-well LightScanner. DNA pooling is a practical way to reduce the cost of large-scale evolution or association studies. Pooling allows the population allele frequencies to be measured using far fewer PCR reactions and genotyping assays than are required when genotyping individuals one by one. We have developed an unlabeled probe/high-resolution melting methodology, together with analysis software, to determine SNP frequencies in a pooled DNA sample.
Different ratios of complementary and mismatched amplicon strands from 0% to 100% were mixed and melted, and the quantification software was calibrated using this model system. We repeated this analysis using two genomic DNA samples, one homozygous wild type and one homozygous for a G to T mutation in the cystic fibrosis gene. When the two samples were mixed in different ratios and analyzed using this methodology, the software correctly determined the fraction of the mutant allele in the mixture to an accuracy of 2% over the range of 0% to 100% of one allele. This method was also applied to a pool of ninety-six human genomic DNA samples, which had previously been genotyped individually at eight SNP markers on chromosome Y. The analysis software determined the allele frequencies to within 2% accuracy across a range of frequencies from 3% to 23%. This method is simple, fast and inexpensive for the determination of SNP allele fraction.

Introduction

Single nucleotide polymorphisms (SNPs) are the most common source of human genetic variation. Genotyping large numbers of SNPs in linkage, association and evolution studies will aid in the understanding of complex traits, including many common human diseases and drug responses, and of human evolution (1). These applications require reliable and economical methods for high-throughput SNP genotyping.

SNP genotyping methods include gel-based and non-gel-based genotyping. Single-strand conformation polymorphism analysis (2) is one of the most widely used gel-based methods for mutation detection. Oligonucleotide Ligation Assay (OLA) and mini sequencing (3) are also gel-based genotyping techniques. Gel-based genotyping methods are still widely used in many labs for small numbers of samples, though they are labor intensive and require experience and technical skill for analysis. Non-gel-based high-throughput genotyping techniques are developing rapidly. Pyrosequencing, which detects the light released when nucleotides are incorporated during primer extension, and DNA microarray genotyping can genotype large numbers of SNPs simultaneously. High-throughput genotyping methods using fluorescently labeled oligonucleotides include TaqMan (4), Hybridization probe (5), Simple probe (6), Invader assay (7) and allele-specific ligation (8).

Previously, we developed a non-gel-based genotyping technique that does not need fluorescently labeled probes (). This technique uses melting of unlabeled oligonucleotide probes and PCR products in the presence of a high-resolution double-stranded DNA dye, LCGreen Plus. The unlabeled probe is an oligonucleotide blocked at the 3’ end, and the target is amplified by asymmetric PCR. In this paper, we use this unlabeled probe technique for high-throughput genotyping and for DNA pooling as used in genome-wide association studies. The PCR may be performed on any 384-well thermal cycler, and the melting is carried out on an inexpensive instrument, the LightScanner.

Chromosome Y evolution is a good candidate for large-scale SNP genotyping with unlabeled probes. Thirty-five SNPs were genotyped in 192 samples from south India by this method.

Genome-wide association studies are necessary to identify genes underlying certain complex diseases. Many genetic diseases have yet to be mapped on the human genome, for reasons that include multiple loci and incomplete penetrance. To pinpoint these loci to particular regions of the chromosomes, association studies, which compare allele frequencies between affected individuals (probands) and controls, must be performed across the entire human genome. With approximately 0.4 cM between markers, 10,000 microsatellite markers would be necessary to fully saturate the genome (). For a study of 1000 probands and 1000 controls, 20 million genotypings would be required (). DNA pooling could greatly reduce the genotyping burden and speed up the initial gene mapping studies. Techniques previously used for analyzing SNP allele fraction by DNA pooling include amplification and cleavage at the SNP site (), primer extension (), amplification with allele-specific primers (), detection of conformational changes (), hybridization of PCR products to microarrays (), DHPLC () and Pyrosequencing (). The allele frequency estimates of these techniques are accurate to within about 2-5%.
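The genotyping burden quoted above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the pooled case assumes a single pool of probands and a single pool of controls with no replicates, which is an assumption and not a design stated in this paper.

```python
def genotyping_count(markers: int, samples: int) -> int:
    """Number of assays when every sample is typed at every marker."""
    return markers * samples


MARKERS = 10_000   # markers at about 0.4 cM spacing (figure from the text)
PROBANDS = 1_000   # from the text
CONTROLS = 1_000   # from the text

# Individual genotyping: every proband and control at every marker.
individual = genotyping_count(MARKERS, PROBANDS + CONTROLS)

# Assumption: one proband pool and one control pool, no replicate pools.
pooled = genotyping_count(MARKERS, 2)

print(individual)            # 20000000
print(pooled)                # 20000
print(individual // pooled)  # 1000
```

Under these assumptions pooling reduces the number of assays by a factor equal to the number of individuals per pool; replicate pools or replicate measurements would lower that factor proportionally.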

We have developed high-throughput genotyping technology using unlabeled probes and demonstrate it by studying Chromosome Y evolution, genotyping 35 SNPs in 192 samples from south India. We also use this method to measure SNP allele fraction for the cystic fibrosis mutation (G542X) in pooled DNA amplified from samples of known genotype. The technique is fast, easy to design and inexpensive, with sensitivity and accuracy between 1% and 2%.

Method

DNA samples of chromosome Y

The 192 DNA samples used in the analysis were collected in Tamil Nadu, south India, and obtained from Brian Mowry’s lab at the Queensland Centre for Mental Health Research, Wacol, Brisbane, Australia. All samples are from control individuals, collected as part of a larger study of complex disease, and are not associated with known disease phenotypes. DNA was extracted from established cell lines using standard protocols.

Genotyping SNP Markers on Chromosome Y

The protocols for genotyping many of the 237 polymorphic sites analyzed on chromosome Y have been published (Underhill et al. 2000, 2001; Hammer et al. 2001). Thirty-five SNP markers were chosen for the south Indian evolution study and are listed in Table 1.

Multiplex PCR

Multiplex PCR, four to six markers deep, was used for the first round of PCR. It was performed on the Peltier Thermal Cycler PTC-200 (MJ Research) in 96-well plates. The reaction contained 1.5uM Mg++, 0.4U Taq polymerase, 2mM dNTP, 0.5uM forward and reverse primers for the four to six multiplexed markers, and 12.5ng human genomic DNA. The cycling conditions were 94C for 3 minutes followed by 25 cycles of 94C for 15 seconds, 52C for 15 seconds and 72C for 15 seconds. One-thousandth of the multiplex PCR product was then used as template for nested asymmetric PCR to amplify an individual marker with its own specific probe. There were two alleles for each marker, and probes matching both alleles were used for the SNP typing. The nested reaction contained 2.0uM Mg++, 0.4U Taq polymerase, 2mM dNTP, 0.05uM forward primer, 0.5uM reverse primer, 0.5uM probe phosphorylated at the 3’ end, and 1/1000 of the multiplex PCR product. The PCR was performed on the same thermal cycler in 384-well plates under the following conditions: 94C for 2 minutes followed by 25 cycles of 94C for 5 seconds, 52C for 5 seconds and 72C for 10 seconds.

Data analyses of SNP genotyping

After PCR, melting curve analysis was performed on the LightScanner. The temperature was raised from 50C to 90C at a rate of 0.1C/second in Automatic mode. The melt takes five to seven minutes (a full 40C ramp at 0.1C/second is 400 seconds). The software CTWTool-1-18-03 was used to analyze the melting curve data. Probes for the two genotypes were run side by side, and the genotype was determined by comparing the melting curves of the two probes.

Genomic DNA allele fraction and DNA pooling

Human genomic DNA of cystic fibrosis wild type (CFTR 542 G) and homozygous mutant (CFTR 542 T) genotypes was used for the allele fraction study. The cystic fibrosis mutation G542X is a single base change in exon 11 (G to T). The two genotypes were mixed in ratios from 0% to 100% in 10% increments, and in 2% increments from 0 to 10%, 20 to 30%, 45 to 55%, 70 to 80% and 90 to 100%. A wild type probe with a 3’ end phosphate (-P) was used for the allele fraction test: 5’–CAATATAGTTCTTGGAGAAGGTGGAATC-P-3’. The primers and asymmetric PCR conditions were described by Zhou et al ().
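The mixing series described above can be enumerated as follows. This is a minimal sketch with hypothetical names; it assumes that each 2% series starts at the lower end of its range (so the 45 to 55% range yields 45, 47, ..., 55), which the text does not state explicitly.

```python
# Coarse series: 0% to 100% in 10% increments (from the text).
coarse = set(range(0, 101, 10))

# Fine series: 2% increments inside the ranges listed in the text.
# Assumption: each fine series starts at the lower bound of its range.
fine_ranges = [(0, 10), (20, 30), (45, 55), (70, 80), (90, 100)]
fine = {pct for lo, hi in fine_ranges for pct in range(lo, hi + 1, 2)}

# Percent mutant allele in each distinct mixture, in ascending order.
mixtures = sorted(coarse | fine)

print(mixtures)
print(len(mixtures))
```

The denser sampling near 0%, 50% and 100% places more calibration points where small changes in allele fraction are of most interest.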

Ninety-six samples of genotyped human genomic DNA were pooled together. 50ng of pooled DNA was used to determine the population allele frequency with the unlabeled probe technique. By comparing the estimated and actual allele fractions we were able to determine the sensitivity of this technique.

Plots of the derivative melting curves of the pooled samples, –dF/dT, were generated from the melting curve analysis by the software.

Software for allele fraction (Bob Palais)

Background fluorescence was removed from the raw fluorescence vs. temperature data using an exponential model for the background, fit to the slope of the raw fluorescence curve in two temperature regions, one below and one above the probe melting temperatures for both genotypes. The resulting melting curves were normalized to the 0-100% range and differentiated using the polynomial least-squares fit (Savitzky-Golay) method. Allele fraction was determined by linear interpolation between the peak heights of the unknown sample, melted in the presence of unlabeled probes matching each genotype, and the peak heights of pure samples and of neighboring calibration standards with known, synthetically prepared allele fractions melted under the same conditions. The values obtained were combined in an equally weighted average.
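The final interpolation and averaging step can be sketched as below. This is not the authors' software: the function names, the clamping at the ends of the calibrated range, and the numbers in the usage example are all assumptions made for illustration, and the background removal, normalization and Savitzky-Golay differentiation steps are taken as already done.

```python
from bisect import bisect_left


def interpolate_fraction(peak, standards):
    """Estimate allele fraction from one probe's derivative peak height.

    standards: (allele_fraction, peak_height) pairs from calibration
    mixtures; peak height must be monotonic in allele fraction.
    """
    ordered = sorted(standards, key=lambda s: s[1])  # sort by peak height
    heights = [h for _, h in ordered]
    # Assumption: clamp to the calibrated range instead of extrapolating.
    if peak <= heights[0]:
        return ordered[0][0]
    if peak >= heights[-1]:
        return ordered[-1][0]
    i = bisect_left(heights, peak)
    (f0, h0), (f1, h1) = ordered[i - 1], ordered[i]
    # Linear interpolation between the two neighboring standards.
    return f0 + (f1 - f0) * (peak - h0) / (h1 - h0)


def allele_fraction(peak_a, peak_b, standards_a, standards_b):
    """Equally weighted average of the estimates from the two probes."""
    est_a = interpolate_fraction(peak_a, standards_a)
    est_b = interpolate_fraction(peak_b, standards_b)
    return (est_a + est_b) / 2


# Made-up calibration data: fraction of allele A vs. peak height.
# The probe matching A rises with the fraction; the other probe falls.
standards_a = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.4)]
standards_b = [(0.0, 2.0), (0.1, 1.8), (0.2, 1.6)]
estimate = allele_fraction(0.1, 1.9, standards_a, standards_b)
print(round(estimate, 3))
```

Sorting the standards by peak height lets the same routine serve both probes, whether the peak grows or shrinks as the allele fraction increases.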

Results

SNP marker selection

Over the past 15 years, DNA polymorphisms have been widely used to reconstruct human evolutionary history. Mitochondrial DNA was originally used for this purpose, because the high mutation rate produced numerous polymorphisms, and the absence of recombination facilitated their interpretation. Thirty-five SNP markers that represent a set of sequence variants from the south Indian population were chosen to carry out the genotyping.

Multiplex PCR

For human genetic studies, such as searches for genetic disease genes and tumor suppressor genes and studies of human evolution, the supply of human genomic DNA is usually limited. Multiplex PCR amplifies different loci at the same time from the same amount of human DNA, and so saves large quantities of human genomic DNA. Multiplex PCR can simultaneously amplify up to ten different amplicons. In this paper we focus on using unlabeled probes to genotype SNPs, so the multiplex PCR is only performed “six-times deep.” The purpose of multiplex PCR is to enrich the loci that need to be genotyped. The multiplex enrichment is followed by nested asymmetric PCR, which makes probing the genotypes of individual loci easy (Figure 1).

Asymmetric PCR and unlabeled probes

There are many different techniques to detect mutations or SNPs through the use of probes. TaqMan, Hybridization probes and Simple probes are the most common. These techniques need one or two fluorescent labels at the end of the probe. Zhou et al have developed a technique to determine mutations using probes without fluorescent labels (). The key to this technique is the use of asymmetric PCR with an unlabeled probe, and melting of the product in the presence of the high-resolution double-stranded DNA dye, LCGreen Plus. Asymmetric PCR amplifies one strand much more than the complementary strand. The probe is complementary to this excess strand and has its 3’ end blocked to prevent extension. After PCR, when hybridization is promoted, the probe and the amplicon strand that shares its sequence compete to anneal to the opposite strand. As a result, the derivative melting curve of a symmetric PCR product shows the amplicon peak but not the probe peak. After asymmetric PCR, annealing of forward and reverse amplicon strands is limited by the lower concentration of one strand, leaving an excess of the other strand as single-stranded DNA. These single strands anneal to the unlabeled probe. The derivative melting curve of an asymmetric PCR product therefore has a lower temperature peak where probes melt from single amplicon strands, and a higher temperature peak where amplicons melt. Melting a 384-well plate takes only five minutes.

Genotype determination

All the SNP markers of chromosome Y have two alleles, and we have typed both possibilities. Because the Y chromosome is present in a single copy, genomic DNA carries only one allele of each marker, so the probe melting curve always shows only one peak, either a 100% perfect match peak or a 100% mismatch peak, in contrast with 50% of each in the case of heterozygous DNA from other chromosomes. The melting temperature of the probe that is perfectly complementary to the sample genotype is 3 to 5C higher than that of the probe with one base mismatched. Hence, the probe displaying the higher melting temperature determines the genotype by complementarity (Figure 1). The matched and mismatched probe melting curves may be easily distinguished visually by a human technician, or by automatic clustering or classification. Different genotypes can also be distinguished from the amplicon melting curves if the SNP is a small deletion (Figure 2) or an A, T vs. C, G change (Figure 3). This allows for a double confirmation of the genotyping.
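The decision rule described above, in which the probe with the higher melting temperature identifies the allele, can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, and the 1C ambiguity threshold is an illustrative value chosen to sit below the 3 to 5C separation reported in the text, not a parameter of the analysis software.

```python
def call_genotype(tm_by_probe, min_delta=1.0):
    """Call the allele whose matching probe melts at the higher temperature.

    tm_by_probe: dict mapping allele -> probe melting temperature (C),
    with exactly two entries, one per allele-specific probe.
    min_delta: hypothetical threshold (C); smaller Tm differences are
    reported as ambiguous by returning None.
    """
    if len(tm_by_probe) != 2:
        raise ValueError("expected melting temperatures for exactly two probes")
    ranked = sorted(tm_by_probe.items(), key=lambda kv: kv[1], reverse=True)
    (allele_hi, tm_hi), (_, tm_lo) = ranked
    if tm_hi - tm_lo < min_delta:
        return None  # too close to call; flag for review or sequencing
    return allele_hi


# Made-up example: the G probe melts 4C higher, so the sample carries G.
print(call_genotype({"G": 68.0, "A": 64.0}))  # G
print(call_genotype({"G": 65.0, "A": 65.3}))  # None
```

Returning None for small differences gives a natural way to pick out ambiguous samples for confirmation by sequencing.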

To confirm the genotyping obtained with the unlabeled probe technique, we chose samples from each SNP marker for sequencing as follows. For a SNP marker without any variation, we chose the most ambiguous sample. For a SNP marker with variation, we chose the most ambiguous sample of each genotype. The sequencing results were in 100% agreement with the genotyping by unlabeled probes. Twenty-four of the most important markers for the south Indian population were tested on 192 samples by SNP short technique. Only one operator error was made out of 4,608 genotypes.

High-throughput genotyping with unlabeled probes is fast, taking about five minutes after PCR. The unlabeled probe is an oligonucleotide 20 to 30 bases long with the 3’ end blocked. It is very stable: it can be stored at room temperature for a few years with no light sensitivity or other degradation. Unlabeled probe design does not need to take GC content into account, which gives it more flexibility than the TaqMan probe, the Hybridization probe and the Simple probe. The cost of an unlabeled probe is significantly lower than that of a fluorescently labeled probe. The data analysis is very simple as well. On chromosome Y, we have genotyped 35 SNP markers on 192 samples, for a total of 6,720 genotypes.