How Many Genes Are There?

There is wide variation in genome size (Fig 3.1), ranging from 1 x 10^6 bp in mycoplasmas to 1 x 10^11 bp in some flowering plants, a difference of 100,000-fold.

In general, eukaryotes have larger genomes than prokaryotes.

There are several reasons for this:

1. Genes are interrupted

2. There are multiple genes for a particular protein (gene families)

3. Large blocks of repeated DNA sequences

4. Large blocks of DNA with no protein coding function

In general, it takes a greater number of genes to produce a more complex organism (Fig 3.1, 3.2, and 3.3).

However, there is great disparity between different eukaryotes, e.g. amphibians and plants differ by 300- to 1000-fold (Fig 3.3). This is referred to as the C-value paradox, because it is not reasonable to assume that organisms within a phylogenetic group need very different amounts of DNA to produce them. The C-value is the size of the haploid genome.

Because of their interrupted structure, eukaryotic genes are often difficult to identify. A primary means of identifying a gene is to test whether it shows homology to a gene in another organism, whether or not the function of the gene is known in either organism.

Another way to identify genes is to analyze the expressed genes, i.e. the mRNA.

Genome sequencing will therefore ultimately answer the question of how many genes there are. Until then, an estimate can be obtained through reassociation kinetics.

Reassociation is the reverse of denaturation; it occurs through random collision between complementary polynucleotide strands. Reassociation can be followed, for example, by UV absorbance, which falls as the strands pair (the reverse of the hyperchromic increase seen on denaturation).

Reassociation of simple genomes follows a sigmoidal curve (Fig 3.5) that spans a roughly 100-fold range of Cot values.

The point at which the reaction is half complete is defined by the product of the DNA concentration x time at the midpoint, or Cot1/2. The Cot1/2 value is inversely proportional to the rate constant of the reaction, so greater Cot1/2 values reflect a slower rate of reassociation.
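The inverse relationship between Cot1/2 and the rate constant can be checked numerically. The sketch below assumes ideal second-order kinetics, in which the fraction of DNA still single-stranded is 1 / (1 + k x Cot). The function names and the example rate constant are illustrative assumptions, not values from the text.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded, assuming ideal second-order kinetics."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the DNA has reassociated (Cot1/2 = 1/k)."""
    return 1.0 / k


def cot_at_fraction_reassociated(fraction, k):
    """Cot needed for the given fraction (strictly between 0 and 1) to have reassociated."""
    return fraction / ((1.0 - fraction) * k)


k = 0.25  # hypothetical rate constant, chosen so that Cot1/2 = 4

# Ratio of the Cot values at 90% and 10% reassociation: the width of the curve.
span = cot_at_fraction_reassociated(0.9, k) / cot_at_fraction_reassociated(0.1, k)

print(cot_half(k), fraction_single_stranded(cot_half(k), k), round(span))
```

Under this assumption the reaction goes from 10% to 90% complete over an 81-fold increase in Cot whatever the value of k, which is consistent with the roughly 100-fold range described for simple genomes.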

Do not study the derivation of the equation (Fig 3.4), which is very poorly done in the book.

Larger genomes reassociate at greater Cot values because any individual DNA sequence is more dilute, making it less likely that complementary sequences will collide. For example, a reassociation reaction containing 12 pg of DNA holds 3000 copies of a 0.004 pg genome but only 4 copies of a 3 pg genome.
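The dilution argument is simple arithmetic, sketched here with the numbers from the example above. The function name is an illustrative choice.

```python
def genome_copies(total_dna_pg, genome_size_pg):
    """Number of genome copies present in a given mass of DNA."""
    return total_dna_pg / genome_size_pg


total_dna_pg = 12.0  # DNA in the reassociation reaction (from the text)

small_genome_copies = genome_copies(total_dna_pg, 0.004)  # 0.004 pg genome
large_genome_copies = genome_copies(total_dna_pg, 3.0)    # 3 pg genome

# Each sequence of the small genome is this many times more concentrated.
relative_concentration = small_genome_copies / large_genome_copies

print(round(small_genome_copies), round(large_genome_copies), round(relative_concentration))
```

The ratio of copy numbers equals the ratio of genome sizes, which is why Cot1/2 can be used as a measure of genome size.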

Most eukaryotic genomes show very complex reassociation kinetics.

1) They do not reassociate over a 100-fold range, but over a million-fold range.

2) The reassociation can be considered as several separate components (Fig 3.6).

3) The Cot1/2 value of each component is either less than or greater than that of a simple genome.

4) Comparison to simple genomes allows complexity to be given a value.

Cot1/2 (test genome) / Cot1/2 (E. coli) = Complexity of test genome / 4.2 x 10^6 bp

5) Analyzed this way, each component behaves as if it were DNA of a specific size. This size is referred to as the kinetic complexity.

6) Another way to analyze a component is as a fraction of the genome, e.g. the hypothetical genome in Fig 3.6 is 7 x 10^8 bp, and the amount of DNA in each component can be calculated from the fraction of the DNA that reassociates in that component. This value is termed the chemical complexity.

7) Comparing the kinetic and chemical complexities (chemical complexity divided by kinetic complexity) shows that the components have different repetition frequencies.

8) It is important to realize that these are average values.
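The calculations in points 4 to 7 can be sketched as follows. Only the E. coli complexity (4.2 x 10^6 bp) and the 7 x 10^8 bp genome size come from the text; the component fraction and both Cot1/2 values are assumed for illustration, and the component Cot1/2 is assumed to be that of the isolated component.

```python
E_COLI_COMPLEXITY_BP = 4.2e6  # E. coli genome size used as the standard in the text


def kinetic_complexity(cot_half_component, cot_half_ecoli):
    """Complexity (bp) from the ratio of Cot1/2 to that of the E. coli standard."""
    return cot_half_component / cot_half_ecoli * E_COLI_COMPLEXITY_BP


def chemical_complexity(genome_size_bp, fraction_of_genome):
    """Amount of DNA (bp) contained in one kinetic component."""
    return genome_size_bp * fraction_of_genome


def repetition_frequency(chemical_bp, kinetic_bp):
    """Average number of copies of each sequence in the component."""
    return chemical_bp / kinetic_bp


genome_size_bp = 7e8       # hypothetical genome of Fig 3.6 (from the text)
fraction = 0.30            # assumed share of the genome in this component
cot_half_component = 0.57  # assumed Cot1/2 of the isolated component
cot_half_ecoli = 4.0       # assumed Cot1/2 of E. coli under the same conditions

kinetic = kinetic_complexity(cot_half_component, cot_half_ecoli)
chemical = chemical_complexity(genome_size_bp, fraction)
copies = repetition_frequency(chemical, kinetic)

print(f"kinetic: {kinetic:.3g} bp, chemical: {chemical:.3g} bp, copies: {copies:.0f}")
```

With these assumed inputs the component contains far more DNA than its kinetic complexity accounts for, so each sequence must be present in a few hundred copies on average. A component whose two complexities agree would be unique (nonrepetitive) DNA.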

The properties of the different components can also be analyzed by isolating them separately and determining a denaturation curve for each. The unique DNA component denatures similarly to native DNA, suggesting it is highly base paired. In contrast, the repetitive component denatures at lower temperatures, suggesting it is not entirely base paired and contains extensive regions of mismatching (Fig 3.7).

Genes lie in the nonrepetitive component. This was discovered using mRNA (cDNA) as a tracer in a reassociation kinetics experiment (Fig 3.9).

Please read the sections on gene number and essential genes (pgs 75-77). They are interesting reading, but you will not be tested on them.

Expressed genes also have differing complexities (Fig 3.11). The slow component shows that ~13,000 mRNAs are expressed in chicken oviduct cells. Most somatic eukaryotic cells give estimates of between 10,000 and 20,000 expressed genes. (See Fig 3.10, which shows this number is in agreement with estimates from whole genome sequencing.)

Simple sequence DNA (Genes VII Chapter 4, pg 106-114)

Reassociation kinetics shows that eukaryotic genomes contain short, repetitive DNA sequences observed as the component with low Cot1/2 value (Chapt. 3).

The amount ranges from 10% of the genome in mammals to 50% in Drosophila virilis (one of the largest arthropod genomes, where simple sequence DNA was first discovered).

It is found in large and small clusters in the genome.

The repetition and clustering create a DNA fraction with distinctive physical properties, which can be used to purify it.

Because of its distinct base composition and high state of methylation, simple sequence DNA often has a buoyant density different from that of the rest of the genome, allowing it to be separated by CsCl centrifugation. DNA separated in this way is called satellite DNA (Fig 1.14).

Other forms of repetitive DNA do not differ in buoyant density from the rest of the genome; these are referred to as cryptic satellite DNA.
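The link between base composition and buoyant density can be illustrated with the standard empirical relation for DNA in CsCl, density = 1.660 + 0.098 x (G+C fraction) g/cm^3. The sketch below uses that relation only; it ignores the effect of methylation mentioned above, and both G+C fractions are hypothetical values chosen for illustration.

```python
def buoyant_density(gc_fraction):
    """Approximate buoyant density (g/cm^3) of DNA in CsCl from its G+C fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 1.660 + 0.098 * gc_fraction


main_band_gc = 0.42  # hypothetical G+C fraction of the bulk of the genome
satellite_gc = 0.30  # hypothetical G+C fraction of a simple sequence fraction

main_band_density = buoyant_density(main_band_gc)
satellite_density = buoyant_density(satellite_gc)

print(f"main band: {main_band_density:.3f}, satellite: {satellite_density:.3f}")
```

A fraction whose G+C content differs from the genome average forms its own band in the gradient. A repetitive fraction with the same G+C content as the main band would not separate, which corresponds to the cryptic case.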

Satellite DNA is localized at centromeres (Fig 4.15), a region used during meiosis and mitosis for controlling chromosome movement. In interphase chromosomes this DNA is found in heterochromatin, a region that is in a tightly packaged form, compared with euchromatin, where expressed genes are localized and which is in an unpackaged state. The localization is determined by in situ hybridization (cytological hybridization): chromosomes are isolated, fixed to a microscope slide, denatured, and then hybridized to a radioactively labeled probe; the radioactivity is then detected on film by autoradiography.

Satellite DNA is not transcribed and does not appear in RNA.

The two preceding observations (centromeric localization and lack of transcription) suggest that satellite DNA serves a structural function, e.g. in chromosome migration. Its exact function is not known.

Satellite DNA differs in organization among different species.

Arthropods have very short repeated sequences (satellites I, II, and III; Fig 4.16).

Mammals have half and quarter repeats (Fig 4.17 and 4.18).

Another form of satellite DNA, called minisatellite DNA, consists of short repeated units (5 to 50 repeats) that show a high degree of allelic polymorphism (individual-to-individual variation). It is also referred to as VNTR (variable number of tandem repeats) DNA.

VNTR DNA is common in mammals.

The function of VNTR DNA is unknown, but its existence has huge practical implications for genetic mapping, an application also called “DNA fingerprinting” (explain the technique and why it is useful; Fig 4.22).
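VNTR alleles differ in the number of tandem copies of the repeat unit, and therefore in length. The sketch below counts the copies in two made-up alleles; the repeat unit, the flanking sequence, and the copy numbers are all hypothetical and chosen only to illustrate the idea.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` found in `sequence`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping one unit at a time while the unit repeats without a gap.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


unit = "GGAGGTGGGCAGGAAG"  # hypothetical repeat unit
flank = "ACGT" * 5         # hypothetical unique flanking sequence

allele_a = flank + unit * 5 + flank   # allele with 5 copies
allele_b = flank + unit * 12 + flank  # allele with 12 copies

print(longest_tandem_run(allele_a, unit), longest_tandem_run(allele_b, unit))
print(len(allele_b) - len(allele_a))  # length difference between the two alleles
```

Because the flanking sequence is the same in both alleles, a fragment cut or amplified from the flanks differs in length only by the number of repeat units, and that length difference is what makes the alleles distinguishable.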

Satellite DNA is difficult to sequence (explain why).

Satellite DNA is a large problem for genome sequencing projects (explain why).