Notes on Chromatin Conformation Capture
And Related Techniques
-3C is a high-throughput molecular biology technique used to analyze the organization of chromosomes in a cell’s natural state
-Studying this organization is important for understanding the regulation of gene expression, DNA replication, repair, and recombination
- One example is chromosomal folding to bring an enhancer and associated transcription factors within close proximity of a gene, as was first shown in the beta-globin locus
3C technique has five experimental steps
-Step 1 Cross-linking: Addition of formaldehyde results in the cross-linking of DNA segments to proteins and the cross-linking of proteins with each other
- This leads to cross-linking of interacting DNA segments (e.g. enhancers and promoters)
- A cross-link is a bond that links one polymer chain to another; these bonds can be covalent or ionic
- For example, disulfide and isopeptide bonds are natural cross-links that occur in organisms
- Cross-linking agents stabilize protein-protein interactions in a cell’s natural state so that they can be detected, since such interactions are often too weak or transient to be easily detected on their own
- Formaldehyde is a common reagent of choice whose cross-linking effects can be reversed by incubation at 70 °C
-Step 2 Restriction digest: A restriction enzyme is added in excess to the cross-linked DNA, separating the non-cross-linked DNA from the cross-linked chromatin
- The selection of the restriction enzyme depends on the locus being analyzed
- Frequent cutters (4 bp recognition sites) are used to study smaller loci (<10-20 kb), while larger loci call for enzymes with longer recognition sites (6 bp)
- The idea is that the restriction enzyme will not cut within the two interacting sequences (this will be due to the choice of restriction enzyme) so that the cross-linked DNA remains intact, while everything else is chopped into small pieces
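A minimal sketch of the digest in step 2, done in silico on a toy sequence. It assumes HindIII (recognition site AAGCTT, cut after the first A), models the top strand only, and uses a made-up sequence; the function name `digest` is hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position of the cut inside the recognition site
    (1 for HindIII, which cuts A^AGCTT). Only the top strand is modelled,
    so the sticky-end overhang is not represented.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Toy sequence with two HindIII sites (illustrative only).
toy_sequence = "GGGAAGCTTCCCCAAGCTTGG"
hindiii_fragments = digest(toy_sequence, "AAGCTT", 1)
print(hindiii_fragments)
```

Joining the fragments back together gives the original sequence, which is a quick sanity check that no bases were lost at the cut sites.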
-Step 3 Intramolecular ligation: Using very low concentrations of DNA favors the ligation of relevant DNA fragments with the corresponding junctions, instead of the ligation of random fragments
- The sticky ends produced at the ends of the relevant cross-linked sequences by the restriction enzymes can be ligated together using DNA ligase (similar to the way that Okazaki fragments are ligated together on the lagging strand during DNA replication), which is done through the formation of two covalent phosphodiester bonds
- This forms a circular fragment around the formaldehyde cross-link if both ends ligate together (if only one end ligates, the resulting strand is linear, with a central restriction site corresponding to the site of ligation and specific restriction ends as well)
- However, randomly cross-linked fragments will also be ligated together, due to incomplete restriction digestion, which represents about 20-30% of all junctions. This number can be decreased by reducing the cross-linking stringency in the first step so that only very close DNA segments cross-link (i.e. those that are interacting)
- Furthermore, one end of a fragment can ligate with the other end of the same fragment (i.e. self-circularization), preventing the ligation of relevant cross-linked DNA to each other and this contributes up to 30% of all junctions formed
-Step 4 Reverse cross-links: High temperature will result in the reversal of cross-links formed in step 1
- The resulting linear DNA fragment has specific restriction ends as well as a central restriction site corresponding to the site of ligation
- The pool of these fragments is collectively referred to as the 3C library
-Step 5 Quantitation: PCR uses primers against the site of ligation to semi-quantitatively assess the frequencies of a restriction fragment of interest
- Quantitative PCR using TaqMan probes (3C-qPCR) provides a more quantitative measurement of the fragment of interest
- Quantitative PCR monitors the amplification of a targeted DNA molecule in real time, using either non-specific fluorescent dyes that intercalate with any double-stranded DNA or sequence-specific fluorescent probes such as TaqMan probes
- Because relevant ligation products are more abundant in the library than randomly ligated ones, they reach the detection threshold in fewer cycles and can therefore be picked out using qPCR
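The quantitation in step 5 can be illustrated with the usual threshold-cycle (Ct) arithmetic. This is only a sketch: it assumes that the junction and the control amplicon amplify with the same per-cycle efficiency (perfect doubling by default), and the Ct values and the function name `relative_ligation_frequency` are invented for illustration.

```python
def relative_ligation_frequency(ct_junction, ct_control, efficiency=2.0):
    """Abundance of a ligation junction relative to a control amplicon.

    A lower Ct means the product crossed the fluorescence threshold
    earlier, i.e. there was more starting template. With equal
    efficiencies the ratio is efficiency ** (ct_control - ct_junction).
    """
    return efficiency ** (ct_control - ct_junction)


# Hypothetical Ct values, not taken from any experiment.
ct_loading_control = 22.0
ct_by_junction = {"enhancer-promoter": 27.0, "intervening fragment": 30.0}

relative = {
    name: relative_ligation_frequency(ct, ct_loading_control)
    for name, ct in ct_by_junction.items()
}
for name, value in relative.items():
    print(f"{name}: {value:.5f}")
```

In this toy case a difference of three cycles corresponds to an eightfold difference in junction abundance, which is why small errors in Ct translate into large errors in the estimated interaction frequency.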
Circularized Chromosome Conformation Capture (4C)
-This technique has an advantage over 3C in that only the sequence of one of the sites of interest needs to be known – this fragment is known as the “bait”
-Steps 1-4: These steps are identical to 3C. The idea is that we should produce a bunch of one promoter sequence that is ligated to one or more unknown enhancer sequences (for example)
-Step 5a Second restriction digest: After the reversal of the cross-links, the restriction fragments are subjected to another round of restriction digest, this time with a frequent cutter that will result in smaller fragments with restriction ends that differ from the central restriction site
-Step 5b Self-circularization: Self-circularization of the DNA fragments is more favored now that they are not bound to other proteins or fragments
- Intramolecular ligation occurs to induce the formation of circular fragments which become the 4C library
-Step 5c Inverse PCR and quantitation: Primers are designed against the outer restriction sites of the “bait” sequence, which results in the amplification of the small unknown captured fragment
- Large-scale sequencing can be used to sequence the 4C library – custom microarrays can also be made using probes designed against the adjacent upstream and downstream regions of all genomic sites of the restriction enzyme used in step 2
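The logic of inverse PCR in step 5c can be sketched with strings: the primers sit inside the known bait and point outward, so on a circle the product runs out of the bait, through the unknown captured fragment, and back into the bait. The sequences below are made up, only the top strand is modelled, and `inverse_pcr_amplicon` is a hypothetical helper.

```python
def inverse_pcr_amplicon(circle, fwd_primer, rev_site):
    """Top-strand amplicon from a circular template, or None.

    `fwd_primer` is given as it appears on the top strand; `rev_site`
    is the top-strand sequence where the reverse primer anneals (the
    real primer would be its reverse complement).
    """
    start = circle.find(fwd_primer)
    if start == -1:
        return None
    # Open the circle at the forward primer so the product reads linearly.
    rotated = circle[start:] + circle[:start]
    end = rotated.find(rev_site, len(fwd_primer))
    if end == -1:
        return None
    return rotated[: end + len(rev_site)]


bait = "CCGTACGGATTCAGGCTA"      # known sequence (illustrative)
captured = "TTTTGGGGTTTT"        # unknown partner (illustrative)
circle = bait + captured         # circularized 4C fragment

amplicon = inverse_pcr_amplicon(circle, "AGGCTA", "CCGTAC")
print(amplicon)
```

The amplicon starts and ends in bait sequence and carries the captured fragment in between, which is what makes it possible to identify partners without knowing their sequence in advance.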
Carbon-Copy Chromosome Conformation Capture (5C)
-The 5C technique expands from 3C and allows for the parallel analysis of interactions between many selected loci
-Steps 1-4: Same as in 3C
-Step 5 Ligation-mediated amplification and quantitation
- Multiplex ligation-mediated amplification (LMA) is performed on the 3C library using multiplex primers that consist of universal primer sequences (such as T7 and T3) and sequences matching the ligation junctions (all fragments in the 3C library should still have the same central restriction site at the ligation junction)
- The primers anneal to the 3C fragments and are joined by a DNA ligase; the ligated primers then serve as templates that are amplified to generate the 5C library
- The use of universal primer sequences means these 5C fragments can be analyzed on microarrays and the small size of the 5C fragments is also compatible with analysis using high-throughput sequencing
- Multiplex PCR permits multiple targets to be amplified with only a single primer pair
- Each probe consists of two oligonucleotides that recognize adjacent sequences in the DNA; when both bind, a ligase is used to join the two oligonucleotides together
- Then, a primer that requires sequences from both probes is used for PCR so that amplification only occurs upon ligation
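A toy model of ligation-mediated amplification as described above. It is deliberately simplified: the oligos are written in the same sense as the template (base pairing is not modelled), the universal tails are placeholder sequences rather than real T7 or T3 primers, and all names are hypothetical.

```python
FWD_TAIL = "ACGTACGTAC"  # placeholder universal tail, not a real primer
REV_TAIL = "TGCATGCATG"  # placeholder universal tail, not a real primer


def ligate_on_template(template, fwd_oligo, rev_oligo):
    """Return the ligated product if both oligos sit side by side on the template.

    fwd_oligo = (universal tail, target sequence)
    rev_oligo = (target sequence, universal tail)
    """
    fwd_tail, fwd_target = fwd_oligo
    rev_target, rev_tail = rev_oligo
    if fwd_target + rev_target in template:
        return fwd_tail + fwd_target + rev_target + rev_tail
    return None


def is_amplifiable(product):
    """Universal primers need both tails, so only ligated products amplify."""
    return (
        product is not None
        and product.startswith(FWD_TAIL)
        and product.endswith(REV_TAIL)
    )


fwd = (FWD_TAIL, "GGCCTTA")
rev = ("AGCTTCCGG", REV_TAIL)

junction_present = ligate_on_template("TTGGCCTTAAGCTTCCGGAA", fwd, rev)
junction_absent = ligate_on_template("TTGGCCTTAAGCTTAAATTT", fwd, rev)
print(is_amplifiable(junction_present), is_amplifiable(junction_absent))
```

Because amplification depends on both tails being present in one molecule, a single universal primer pair can amplify every ligated probe in the mix.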
A decade of 3C technologies: insights into nuclear organization
-3C technologies are based on the remarkably simple idea that digestion and religation of fixed chromatin in cells, followed by the quantification of ligation junctions, allows for the determination of DNA contact frequencies and insight into chromosome topology
-The first step is to establish a representation of the 3D organization of the DNA
- Chromatin is fixed using a fixative agent, most often formaldehyde
- Next, the fixed chromatin is cut with a restriction enzyme recognizing 6 bp (such as HindIII) or with more frequent cutters
- Then, the sticky ends of the cross-linked DNA fragments are religated under diluted conditions to promote intramolecular ligations (i.e. between cross-linked fragments)
- DNA fragments that are far away on the linear template, but that colocalize in space (e.g. enhancers and promoters) can be ligated to each other
- In this way, a one-dimensional linear DNA segment serves as a template of the 3D nuclear structure
-To establish the 3D conformation of a locus or chromosome, one must measure the number of ligation events between non-neighboring sites
- In 3C, this is done by quantitative or semiquantitative amplification of selected ligation junctions
- Primers are designed near and toward the ends of all restriction fragments of interest and the amplification efficiencies of different primer combinations are compared in a matrix of ligation frequencies that serve as proxies for pairwise interaction frequencies
-In the original study by Dekker et al. (2002), from this matrix, the average 3D conformation of yeast chromosome III was determined, showing that it forms a contorted ring
- The method was then adapted for mammalian systems to show that chromatin loops exist in vivo between regulatory DNA elements and their target genes via studies of the beta-globin locus
-With 3C, it is also possible to pick up enhancers that were previously unknown to regulate a gene
- A survey of the spatial environment of the CFTR gene identified a number of cell type-specific DNA-DNA interactions
- Some sequences showed enhancer activity in a reporter assay suggesting they may activate CFTR expression
-Figure 2 shows how enhancer looping at the mouse beta-globin locus was demonstrated
- The relative cross-linking frequencies of several sequences were found to be particularly high in fetal liver cells, where the gene is expressed, while the cross-linking frequencies were low everywhere in fetal brain, where the gene is silent
- This suggests that the sequences with high cross-linking frequencies are regulatory in nature (and, in particular, are enhancers)
-Enhancer activity on gene expression can be blocked by insulator sequences
-Insulators are bound by proteins such as CTCF, and 3C technology has been used to demonstrate that the function of certain insulators depends on their spatial organization
- CTCF sites form chromatin loops by contacting each other in the beta-globin locus
- CTCF sites also recruit additional factors such as cohesin, which may facilitate DNA loop formation
-A number of recent studies have pointed to the existence of loops between the start and end of a gene
- 3C experiments in mouse liver cells showed that ribosomal DNA promoters have an increased propensity to interact with terminator sequences and that these loops are associated with increased rDNA expression
- A mechanistic explanation is that gene looping facilitates reloading of RNA polymerase and thereby increases expression throughput
- In yeast, loops form on genes when they are active or poised, but not when they are repressed
-Technical issues that arise when interpreting 3C data
- Any two sequences nearby on the linear chromosome are close in space and therefore sequences over hundreds of kilobases frequently cross-link and ligate to the anchor, independently of the chromatin’s 3D conformation
- To appreciate loops visualized by 3C-based technologies, one needs to find the anchor interacting with a distant sequence more frequently than with intervening sequences
- Therefore, 3C methods intrinsically rely on quantitative rather than qualitative measurements using qPCR for the quantitative detection of a given ligation junction
- At most alleles, cross-linking will result in larger chromatin aggregates with many DNA fragments together within which all DNA ends compete with each other for ligation to the anchor fragment
- Therefore, even a very stable enhancer-promoter interaction will only occasionally result in the corresponding ligation junction and, since every diploid cell contributes at most two ligation junctions of interest, 3C PCR requires faithful and quantitative amplification of very rare ligation junctions from many genome equivalents
- Consequently, 3C-qPCR is notoriously difficult and requires strict controls and careful experimental design
- The advent of genome-scale methods such as microarrays and high-throughput sequencing has enabled the development of more unbiased methods that offer a solution for assessing the relative abundance of long-range DNA-DNA contacts
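The looping criterion noted under the technical issues above (the anchor must contact a distant sequence more often than the intervening sequences) can be written as a simple check. This is a rough heuristic of my own for illustration, not a published peak caller; the profile values, the `min_fold` threshold, and the function name are all assumptions.

```python
def candidate_loops(profile, min_fold=2.0):
    """Flag fragments that rise above the lowest intervening signal.

    `profile` is a list of (distance_from_anchor_kb, ligation_frequency)
    sorted by increasing distance. A fragment is flagged when its
    frequency is at least `min_fold` times the lowest frequency seen
    at any fragment closer to the anchor.
    """
    hits = []
    lowest_so_far = None
    for distance_kb, freq in profile:
        if lowest_so_far is not None and freq > 0 and freq >= min_fold * lowest_so_far:
            hits.append(distance_kb)
        lowest_so_far = freq if lowest_so_far is None else min(lowest_so_far, freq)
    return hits


# Hypothetical profiles: one with a distal peak, one with plain decay.
with_peak = [(5, 100), (20, 40), (40, 10), (60, 35), (80, 5)]
plain_decay = [(5, 100), (20, 40), (40, 10), (60, 6), (80, 5)]

print(candidate_loops(with_peak))
print(candidate_loops(plain_decay))
```

A profile that simply decays with distance is never flagged, which reflects the point that high signal close to the anchor says nothing about looping.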
Chromosome conformation capture-on-chip (4C) technology
-4C-seq uses next-generation sequencing (NGS) to analyze contacting sequences and is similar to using microarrays to analyze the contacts of a selected genomic site with all of the genomic fragments on the array
-It is a “one versus all” strategy because a single viewpoint is defined and the genome is screened for sequences that contact this selected site
-In 4C technology, the ligated 3C template is processed with a second round of DNA digestion and ligation to create small DNA circles (some of which contain 3C ligation junctions)
- Using view-point-specific primers (that bind to our sequence of interest), inverse PCR specifically amplifies all sequences contacting this chromosomal site and can be analyzed by microarrays or NGS methods
- The latter is cheaper and enables more accurate quantification of DNA interaction frequencies and has a larger dynamic range
- The idea is that the viewpoint-specific primers also contain the Illumina sequencing primers so that the PCR products can be sequenced without further processing
- The reads contain the primer and the ligation junction, and after trimming the primer sequence, the remainder of the reads are aligned to the genome
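A minimal sketch of the read handling just described: strip the viewpoint primer from each read and keep the remainder for alignment. The primer and reads are invented, alignment to the genome is not shown, and the helper name is hypothetical.

```python
from collections import Counter


def trim_viewpoint_primer(read, primer):
    """Return the captured sequence that follows the primer.

    Reads that do not start with the viewpoint primer are discarded
    (None), since they cannot come from the selected viewpoint.
    """
    if not read.startswith(primer):
        return None
    return read[len(primer):]


VIEWPOINT_PRIMER = "GGTCATAAGCTT"  # hypothetical primer ending at the restriction site

reads = [
    "GGTCATAAGCTTACCGTTAGC",
    "GGTCATAAGCTTACCGTTAGC",
    "GGTCATAAGCTTTTGACCAAT",
    "CCCCCCCCCCCCACCGTTAGC",  # wrong primer, dropped
]

captured_counts = Counter()
for read in reads:
    insert = trim_viewpoint_primer(read, VIEWPOINT_PRIMER)
    if insert is not None:
        captured_counts[insert] += 1

print(captured_counts)
```

In a real pipeline the trimmed sequences would be passed to a read aligner, and the counts per restriction fragment would form the 4C contact profile.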
-4C was first used to investigate the DNA interaction profiles of a tissue-specific gene embedded in an inactive chromosomal region (beta-globin) and a house-keeping gene (Rad23a) present in an active gene-rich region
- Rad23a made contacts with active regions on its own chromosome and on other chromosomes, and these contacts were largely conserved between the two tissues
- The tissue-specific gene, however, made contacts with other active regions in erythroid cells, while in fetal brains (where it is not expressed), inactive regions were contacted
-4C studies have also shown that coregulated genes preferentially meet at dedicated transcription sites in the nucleus implying that genes dynamically move to specific nuclear locations for transcription, rather than the transcription machinery moving to genes
-Another 4C study focused on dosage compensation of the mammalian X chromosome
- Using an allele-specific 4C strategy, it was shown that the active and inactive X chromosomes adopt distinct topologies, and that the noncoding RNA Xist, which drives X inactivation, is not required to maintain gene silencing
-4C technology is preferred to assess the DNA contact profile of individual genomic sites but is limited to the description of long-range contacts with larger regions elsewhere on the chromosomes (in cis) or on other chromosomes (in trans), rather than local interactions between a gene and its enhancer 50 kb away due to lack of resolution
-Most 4C strategies use restriction enzymes with a 6-nucleotide recognition sequence that cut once every few kilobases, creating fragments that are much larger than the average regulatory sequences (which are less than several hundred base pairs)
-Using more frequent cutters (i.e. those that recognize 4-nt sequences) can potentially pick up more local interactions
Chromosome conformation capture carbon copy (5C technology)
-“Many versus many” technology allowing concurrent determination of interactions between multiple sequences
-In 5C, the 3C template is hybridized to a mix of oligonucleotides, each of which partially overlaps a different restriction site in the genomic region of interest
- Pairs of oligonucleotides that correspond to interacting fragments are juxtaposed on the 3C template and can be ligated together
- Since all 5C oligos carry one of two universal primer sequences at their 5’ ends, all ligation products can subsequently be amplified simultaneously in a multiplex PCR reaction and then analyzed through high-throughput sequencing
-The resolution of the technique is determined by the spacing between neighboring oligonucleotides on the linear chromosome template
-It can never reach the resolution of 4C, as not every unique end of a restriction fragment will allow the design of a 5C oligonucleotide
-However, it provides a matrix of interaction frequencies for many pairs of sites allowing the reconstruction of the average 3D conformation of larger genomic regions
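The “many versus many” output can be pictured as a matrix built from counted ligation products. The sketch below assumes the products have already been sequenced and counted per oligo pair; the site names and counts are invented, and `interaction_matrix` is a hypothetical helper.

```python
def interaction_matrix(pair_counts, forward_sites, reverse_sites):
    """Arrange ligation-product counts as rows (forward) by columns (reverse).

    Pairs that were never observed get a count of 0.
    """
    return [
        [pair_counts.get((fwd, rev), 0) for rev in reverse_sites]
        for fwd in forward_sites
    ]


# Hypothetical counts of sequenced 5C products per (forward, reverse) oligo pair.
pair_counts = {
    ("promoter", "enhancer_1"): 42,
    ("promoter", "enhancer_2"): 3,
    ("gene_body", "enhancer_1"): 7,
}
forward_sites = ["promoter", "gene_body"]
reverse_sites = ["enhancer_1", "enhancer_2"]

matrix = interaction_matrix(pair_counts, forward_sites, reverse_sites)
for site, row in zip(forward_sites, matrix):
    print(site, row)
```

Each cell is a proxy for the interaction frequency of one pair of restriction fragments, and the density of rows and columns is set by how many oligos could be designed across the region.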
Interpretation of chromosome capture experiments: further considerations
-The resolution of all 3C-based methods is limited by the choice of the first restriction enzyme
-For a six-cutter like HindIII, there are ~800,000 sites in the mouse genome, so the average resolution across the genome will be ~4 kb
-Local distribution of restriction sites can vary between different genomic regions, resulting in different resolutions at different genomic locations
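The resolution figures above follow from simple arithmetic. The sketch assumes a random sequence with equal base frequencies for the expected spacing, and a mouse genome size of roughly 2.7 Gb, which is my assumption and not a number from these notes.

```python
def expected_site_spacing(recognition_length):
    """Expected distance between sites for an n-bp recognition sequence.

    In a random sequence with equal base frequencies a given n-mer
    occurs on average once every 4 ** n bases.
    """
    return 4 ** recognition_length


def average_fragment_size(genome_size_bp, n_sites):
    """Mean fragment length if n_sites cuts are spread over the genome."""
    return genome_size_bp / n_sites


four_cutter_spacing = expected_site_spacing(4)
six_cutter_spacing = expected_site_spacing(6)

# Assumed genome size (~2.7 Gb); ~800,000 HindIII sites as stated in the notes.
MOUSE_GENOME_BP = 2_700_000_000
hindiii_mean_fragment = average_fragment_size(MOUSE_GENOME_BP, 800_000)

print(four_cutter_spacing, six_cutter_spacing, hindiii_mean_fragment)
```

Both estimates for the six-cutter land in the 3 to 4 kb range, while a four-cutter is expected to cut about sixteen times more often. These are genome-wide averages, so the local resolution can be much better or worse.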
-An additional factor that can influence the results is the presence of repeats in the genome
- For 3C, this is a relatively minor problem because one can be quite flexible in the selection of primers for PCR
- For NGS-based methods, this can be more challenging, especially in 4C-seq and 5C, which rely on the sequence directly adjacent to the restriction site
- This can be partially circumvented by increasing the length of the sequencing reads, which gives higher mapping specificity
-A key characteristic of 3C-based methods is the very high capture probability between neighboring fragments, due to their close spatial proximity
- Moving further away from a given fragment leads to exponential decrease of the capture probability until it reaches a baseline level
- The rapid decline in contact probability makes it so that specific ligation junctions between two given sites far apart on the chromosome will be rare
- This makes 3C unsuitable for the analysis of long-range contacts
- For far cis and trans DNA contacts, 3C and HiC data sets are not reproducible at the single fragment resolution, but are highly reproducible over genomic windows
- When a long-range interaction within or between chromosomes is described, this is often a statistical definition, meaning two regions have a higher probability for making contacts compared with other regions at a similar distance on the same chromosome
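The point about reproducibility over genomic windows can be illustrated by binning sparse contacts. The two replicates below are invented so that no single fragment is shared between them, yet the windowed counts agree; the window size and the function name are arbitrary choices for the example.

```python
from collections import defaultdict


def bin_contacts(contacts, window_bp):
    """Sum contact counts into fixed-size genomic windows.

    `contacts` is a list of (position_bp, count); the result maps each
    window start coordinate to the total count inside that window.
    """
    binned = defaultdict(int)
    for position, count in contacts:
        window_start = (position // window_bp) * window_bp
        binned[window_start] += count
    return dict(binned)


# Hypothetical far-cis contacts from two replicate experiments.
replicate_1 = [(1_002_000, 1), (1_050_000, 1), (3_010_000, 1)]
replicate_2 = [(1_020_000, 1), (1_075_000, 1), (3_090_000, 1)]

shared_fragments = {pos for pos, _ in replicate_1} & {pos for pos, _ in replicate_2}
binned_1 = bin_contacts(replicate_1, 100_000)
binned_2 = bin_contacts(replicate_2, 100_000)

print(len(shared_fragments), binned_1, binned_2)
```

Larger windows trade resolution for reproducibility, which is why long-range contacts are usually described for regions rather than for individual restriction fragments.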
-A further issue to consider is the number of contacts a given gene appears to have
- In 4C, a single locus can be engaged in tens or hundreds of contacts (depending on the threshold)
- These contacts are collected from many cells and will not all be present in the same cell, implying that the large number of contacts reflects cell-to-cell differences in genome topology
- During mitosis, each chromosome probably adopts one of a limited number of energetically favored conformations that will position a given gene next to a few other genes
-The dynamics of chromatin structure and cell-to-cell variation cannot be captured by 3C-based methods, so it cannot be determined whether two different interactions of A with B and C occur simultaneously or sequentially and/or whether they are mutually exclusive