Supplementary Data

Theoretical description of enrichment method

The schematic procedure for the targeted (single-locus) enrichment method is described in Figure 1. The overall enrichment is achieved through a combination of the uniqueness of the overhang sequences and the specificity of the “capture-hook” method. In Table S1, the theoretical estimates for target and non-target sequences are based on 100% efficiencies for gDNA digestion by RE (Bsm AI), ligation of adapters, and exonuclease digestion, and are intended to provide only a ‘back of the envelope’ understanding of what each step in the protocol contributes towards the enrichment.

Theoretical enrichment calculations

  • For a human female diploid genome (6.406 x 10⁹ bp), the average Bsm AI–digested fragment is 512 bp (= 4⁵/2 for the two 5-base recognition sequences, GTCTC and GAGAC), assuming 50% AT. The resulting fragments will have 4-base overhangs on the two 5’ ends, with sequences determined only by the local sequence at the sites of cleavage; there are 256 (= 4⁴) different 4-base combinations for each end. A specific Bsm AI fragment having 2 unique ends would be found once in every 65,536 fragments (4⁴ x 4⁴).
  • Two adapters, having 5’-AATG and 5’-CTGT overhangs, are ligated to the FMR1 fragment and other fragments having complementary overhangs. The number of fragments having only one ligated adapter from either one of these 2 adapters is approximately equal to Total Fragments x (4 / 4⁴). The number of fragments having 2 ligated adapters is equal to Total Fragments x [4 / (4⁴ x 4⁴)].
  • Of the fragments having 2 ligated adapters, about 75% have 1 or 2 Bsm AI recognition sequences and 25% have no Bsm AI recognition sequence. The FMR1 fragment, having 2 ligated adapters and no Bsm AI recognition sequence, would be safe from Bsm AI digestion during or after the ligation step. The number of fragments having 2 ligated adapters and no Bsm AI recognition sequence is equal to Total Fragments / (4⁴ x 4⁴).
  • DNA fragments with 0 or 1 adapter (non-cyclized) are eliminated by exonucleases, whereas fragments with two ligated adapters are resistant to exonucleases. The number of fragments after the complete Exo-treatment is equal to Total Fragments / (4⁴ x 4⁴).
  • After the capture-hook selection step, the total FMR1 enrichment is higher than the maximum of 65,536 estimated in Table S1: 125,000 – 685,000 (Table 1).

Stage | Fragments | Ligated Adapters | Fragment Calculation | FMR1 Fragments | FMR1 Fragment Fraction | Maximum Enrichment | DNA (ng)
After Bsm AI | 12,511,718 | 0 | Total frag (TF) = 6.406 x 10⁹ / (4⁵/2) | 2 | 1.59 x 10⁻⁷ | 1 | 20000
After ligation | 12,315,458 | 0 | | 2 | 1.59 x 10⁻⁷ | 1 | 20000
 | 195,496 | 1 | TF x 4 / 4⁴ | | | |
 | 764 | 2 | TF x 4 / (4⁴ x 4⁴) | | | |
After Exo | 764 | 2 | TF x 4 / (4⁴ x 4⁴) | 2 | 2.62 x 10⁻³ | 16,384 | 1.22
After Bsm AI + Exo | 191 | 2 | TF / (4⁴ x 4⁴) | 2 | 1.05 x 10⁻² | 65,536 | 0.30

Table S1. Theoretical enrichment calculations
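
The arithmetic behind Table S1 can be checked with a short script. The following is a minimal sketch, assuming 100% efficiency at every step as stated for the table; the function and variable names are illustrative and are not part of the original analysis. Values are left unrounded, so they can differ from the table in the last digit.

```python
# Sketch of the Table S1 arithmetic (idealized, 100% efficiency at every step).
# All names here are illustrative assumptions, not taken from the original analysis.

GENOME_BP = 6.406e9        # human female diploid genome size, bp (from the text)
OVERHANG_COMBOS = 4 ** 4   # 256 possible 4-base overhangs per fragment end
FMR1_COPIES = 2            # FMR1 fragments per diploid female genome (Table S1)
INPUT_DNA_NG = 20000       # starting DNA mass in ng (Table S1)


def table_s1():
    """Return the idealized fragment counts and enrichment for each stage."""
    avg_fragment_bp = 4 ** 5 / 2                      # 512 bp: GTCTC or GAGAC
    total = GENOME_BP / avg_fragment_bp               # TF, after Bsm AI digestion
    one_adapter = total * 4 / OVERHANG_COMBOS         # TF x 4 / 4^4
    two_adapters = total * 4 / OVERHANG_COMBOS ** 2   # TF x 4 / (4^4 x 4^4)
    no_site = total / OVERHANG_COMBOS ** 2            # 25% of two_adapters

    stages = {}
    for name, remaining in [
        ("After Bsm AI", total),
        ("After ligation", total),        # ligation alone removes nothing
        ("After Exo", two_adapters),
        ("After Bsm AI + Exo", no_site),
    ]:
        enrichment = total / remaining
        stages[name] = {
            "fragments": remaining,
            "fmr1_fraction": FMR1_COPIES / remaining,
            "max_enrichment": enrichment,
            "dna_ng": INPUT_DNA_NG / enrichment,
        }
    stages["After ligation"]["one_adapter"] = one_adapter
    stages["After ligation"]["two_adapters"] = two_adapters
    return stages


if __name__ == "__main__":
    for name, row in table_s1().items():
        print(f"{name:20s} fragments={row['fragments']:,.0f} "
              f"fraction={row['fmr1_fraction']:.2e} "
              f"enrichment={row['max_enrichment']:,.0f} "
              f"DNA={row['dna_ng']:.2f} ng")
```

Because every divisor is a power of 4, the enrichment factors come out as exact ratios (16,384 and 65,536) regardless of the genome size used.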

Bisulfite conversion of control templates

Both methylated and unmethylated 20 CGG or 30 CGG fragments were treated with bisulfite to convert unmethylated C residues to T residues following amplification (Qiagen EpiTect Plus Bisulfite Kit). The CCG repeat on the native template strand has one C that is converted and one methyl-C that is not converted, allowing determination of the bisulfite conversion efficiency by calculating how much of the unmethylated C is converted. Thus, primers were designed to amplify the converted lower template strand. Primers for 20 CGG are: 5’-GAACTCACCACTACTACAACA-3’; 5’-TGAGCAGGTTGGAGGTTTAG-3’. Primers for 30 CGG are: 5’-CTACAAAAATAAACRTTCTAACCCTC-3’; 5’-TGTAGGTTTTTTTTAGTTTTTTTAGTGTYGGG-3’. All of the PCR products were end-repaired and prepared as SMRT libraries with the blunt adapter, 5’-pATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT, following the standard SMRT library preparation protocol (DNA Template Prep Kit 2.0, PacBio).
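
The expected outcome for a CCG repeat can be illustrated in silico. The following is a minimal sketch, assuming complete conversion of every unmethylated C and, for the methylated control, methylation of every C in a CpG context; the function name and the 3-repeat test sequence are illustrative only. It models the treated strand itself, not the complementary strand produced by PCR.

```python
def bisulfite_convert(seq: str, cpg_methylated: bool) -> str:
    """Idealized bisulfite conversion of one strand (read 5' to 3').

    Every C becomes T unless it sits in a CpG context and CpG sites are
    treated as methylated. Assumes 100% conversion efficiency.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        # A C is in CpG context when the next base on the same strand is G.
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            converted.append("T")   # unmethylated C is converted
        else:
            converted.append(base)  # methyl-C and non-C bases are unchanged
    return "".join(converted)


if __name__ == "__main__":
    repeat = "CCG" * 3  # hypothetical short stretch of the lower strand
    print(bisulfite_convert(repeat, cpg_methylated=True))
    print(bisulfite_convert(repeat, cpg_methylated=False))
```

In the methylated case each CCG unit keeps its CpG cytosine and loses only the first C, which is the unmethylated position used to gauge conversion efficiency.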

Bismark (Krueger and Andrews 2011) and Perl were used to analyze the non-conversion rate for C in CG and CH (non-CG) contexts after bisulfite treatment. CCS (circular consensus sequence) reads from each library were used for the bisulfite non-conversion analysis. Adapters were removed from the reads, which were then submitted to Bismark to output the converted and unconverted C at each site in every CCS read. Alignment parameters were set to “--score_min L,0,-0.4 -rfg 6,1 -rdg 6,1”. The non-conversion rate was calculated from the output file (Table S2, Supplementary Figure S1A). Heat maps (Supplementary Figure S1B) were generated to show the converted C (unmethylated) and unconverted C (methylated C) in the CGG repeat region.
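
The non-conversion rate is the number of unconverted C calls divided by all C calls in a given context. The following is a minimal sketch of that calculation, not the Perl script used here, which is not reproduced in this supplement. It assumes the per-read calls are available as Bismark methylation call strings, in which Z/z mark unconverted/converted C in CpG context, X/x in CHG context, and H/h in CHH context; the function name and the example strings are hypothetical.

```python
from collections import Counter


def non_conversion_rates(call_strings):
    """Compute non-conversion rates in CG and CH contexts.

    call_strings: iterable of Bismark-style methylation call strings,
    one per read. Upper case = unconverted C, lower case = converted C.
    Returns a dict with the fraction of unconverted C per context.
    """
    counts = Counter()
    for calls in call_strings:
        counts.update(calls)

    def rate(unconverted, converted):
        total = unconverted + converted
        return unconverted / total if total else float("nan")

    return {
        "CG": rate(counts["Z"], counts["z"]),
        # CH pools the CHG (X/x) and CHH (H/h) contexts.
        "CH": rate(counts["X"] + counts["H"], counts["x"] + counts["h"]),
    }


if __name__ == "__main__":
    example_reads = ["Z.zh.H", "zzX.x"]  # made-up call strings for illustration
    print(non_conversion_rates(example_reads))
```

Pooling CHG and CHH into a single CH rate mirrors the CG versus non-CG split described above; a per-site rate would apply the same ratio to the calls at one reference position.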

Figure S1. A) Non-conversion rate at each CpG site in the CGG repeat region. B) Heat map of 500 sampled CCS reads showing methylation status in the CGG repeat region for each library.

Enriched templates from Whole Genome Amplified DNA (WGA-DNA)

WGA-DNA was produced from 10 ng gDNA of EBV-transformed lymphoblast cells using the REPLI-g kit (Qiagen) and its suggested protocol for a 16 hr reaction at 30 °C to produce 50 µg amplified DNA. The WGA-DNA, purified with AMPure beads to remove primers, proteins and small molecules, was used for the enrichment process as described above.

Figure S2. Direct comparison of IPD ratio – 1 between the native gDNA derived templates and SssI methylated control templates with the same CGG repeat sequences.

References

Krueger F, Andrews SR (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27(11): 1571-1572
