Mismatch Repair Detection for SNP Discovery
Background
Random sequencing approaches have led to the identification of a tremendous number of single nucleotide polymorphisms (SNPs) in the human genome. Unfortunately, only a small fraction of the disease-causing variations in regulatory and coding regions (cSNPs) have been identified through this approach. At the same time, there is increasing evidence that a significant number of disease-causing SNPs are likely to be found at modest frequency in the population within coding and regulatory elements. Identifying such disease-causing SNPs requires a large-scale targeted discovery effort: to have 95% confidence of identifying an allele with a 1% frequency, 150 individuals (300 chromosomes) need to be sequenced.
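The figure of 150 individuals can be reproduced with a short calculation. The sketch below is only an illustration under stated assumptions that the text does not spell out: individuals are diploid, chromosomes are sampled independently, and an allele counts as identified if it is present on at least one sampled chromosome. The function name `min_individuals` is hypothetical.

```python
import math


def min_individuals(allele_freq: float, confidence: float, ploidy: int = 2) -> int:
    """Smallest number of individuals needed to see an allele at least once.

    Assumes independent sampling of chromosomes and perfect detection
    (illustrative assumptions, not taken from the source text).
    """
    # The allele is missed on every chromosome with probability
    # (1 - allele_freq) ** chromosomes; require this to be <= 1 - confidence.
    chromosomes = math.log(1.0 - confidence) / math.log(1.0 - allele_freq)
    # Convert chromosomes to whole individuals, rounding up.
    return math.ceil(chromosomes / ploidy)


if __name__ == "__main__":
    # 1% allele frequency, 95% confidence, diploid samples
    print(min_individuals(0.01, 0.95))
```

With a 1% allele frequency and 95% confidence, roughly 298 chromosomes are required, which rounds up to 150 diploid individuals.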
ParAllele has developed a streamlined approach that uses Mismatch Repair Detection (MRD) to enrich variant alleles and thereby discover all polymorphisms with a frequency greater than 1%. For more details about MRD, please visit our publications page.
SNP discovery using the Mismatch Repair Detection (MRD) process
Figure 1: DNA from a heterogeneous population is pooled in a single tube, from which PCR amplicons are generated. If this mixture were simply sequenced, rare SNPs such as the 1% to 5% frequency SNP shown in the top trace would be lost in the noise of sequencing. An engineered bacterial mutation sorter strain is used to sort the DNA fragments into two pools: those that carry a variation and those that do not. After sorting, the same 1% to 5% frequency SNP dominates the variation-containing pool, so the minor allele appears as the dominant signal on the sequencing trace.
The Mismatch Repair Detection SNP discovery approach is highly scalable. ParAllele has completed several collaborations for complete SNP discovery in coding regions as well as in conserved noncoding regions. Studies typically include more than 1,000 fragments, all enriched in a single assay.
Performance of a Mismatch Repair Detection BRCA2 SNP Discovery Project
Pools of DNA containing BRCA2 SNPs at a number of known frequencies were enriched (see Figure 2, below).
MRD detects all SNPs with population frequencies higher than 1%.
Figure 2: Seven variations present in four pools at different known frequencies. If a variation was detected in a given pool, its frequency is plotted in the positive direction; if it was not detected, its frequency is plotted in the negative direction. All SNPs with frequencies higher than 1% were detected.
The Power of Mismatch Repair Detection
MRD dramatically reduces the sequencing requirement.
Figure 3: A comparison between the predicted performance of Mismatch Repair Detection discovery and targeted Sanger resequencing, illustrating the power of MRD variation enrichment. To confidently find all variation in the 1% to 10% frequency range, resequencing of 150 samples is required. MRD SNP discovery accomplishes the same by sequencing just one reference sample and one enriched sample (forward and reverse). The neutral theory distribution of SNPs in the genome predicts that there will be as many SNPs between 1% and 10% frequency as between 10% and 50% frequency.
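The neutral-theory statement in the Figure 3 caption can be checked with a rough calculation. The sketch below assumes the standard neutral site frequency spectrum, in which the expected density of variants at derived-allele frequency x is proportional to 1/x, folded so that frequencies refer to the minor allele. This model and the function name `expected_snp_weight` are assumptions made for illustration; the text does not state which formulation was used.

```python
import math


def expected_snp_weight(lo: float, hi: float) -> float:
    """Relative expected number of SNPs with minor allele frequency in [lo, hi].

    Integrates the folded neutral density 1/x + 1/(1 - x), whose
    antiderivative is ln(x / (1 - x)). Valid for 0 < lo <= hi <= 0.5.
    The result is unnormalized, so only ratios are meaningful.
    """
    def logit(x: float) -> float:
        return math.log(x / (1.0 - x))

    return logit(hi) - logit(lo)


rare = expected_snp_weight(0.01, 0.10)    # 1% to 10% minor allele frequency
common = expected_snp_weight(0.10, 0.50)  # 10% to 50% minor allele frequency

if __name__ == "__main__":
    print(f"rare/common ratio: {rare / common:.2f}")
```

Under these assumptions the two frequency bands carry nearly equal weight (a ratio of roughly 1.09), consistent with the claim that about as many SNPs fall between 1% and 10% as between 10% and 50%.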