Second Progress Report on the Analysis of the Physarum Sequences Obtained Through the Pilot

Second progress report on the analysis of the Physarum sequences obtained through the pilot assay, by Gerard Pierron

To better understand what is found in the 22,616 Physarum “traces” or “reads”, a few words are needed on the way the DNA was prepared by Marianne Bénard. Amoebae were grown in axenic liquid cultures, harvested and homogenized to isolate nuclei. About 10% of the cells were resistant to the homogenization and were lysed together with the isolated nuclei, bringing in a number of mitochondria. After proteinase K treatment, the DNA was purified on CsCl gradients (small tubes, vertical rotor, traces of ethidium bromide). The genomic DNA was then taken out with a syringe by side puncturing under UV, slightly below the main DNA band, where the GC-rich extrachromosomal rDNA is found. The final DNA preparation therefore contained a significant amount of rDNA while, in contrast, most of the AT-rich mitochondrial DNA, which bands above the genomic DNA, was eliminated.

Mitochondrial DNA content in the 22,616 reads

Blasting the fully sequenced Physarum mitochondrial genome (AF027295) against the 22,616 Physarum traces indicates that only 29 reads contain mitochondrial DNA, as mentioned before by Sandy Clifton. This is a very small number, since mtDNA represents roughly 7-10% of total DNA and could have accounted for up to about 2,000 reads. When a trace matches the mitochondrial genome (e.g. 818410639 or aab31b01.b1), one might also expect its mate (aab31b01.g1) to match. This is true for the 24 highest scores and gives an opportunity to measure insert sizes. I measured 4 inserts and found sizes of 3,596, 3,595, 3,561 and 3,411 bp, respectively.
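The arithmetic behind this paragraph can be sketched as follows. The constant and function names are illustrative only; the numbers are those quoted above, and the "depletion" ratio is simply expected reads divided by observed reads, not a figure taken from the analysis itself.

```python
from statistics import mean

TOTAL_READS = 22_616                 # traces in the pilot assay
OBSERVED_MT_READS = 29               # reads matching AF027295
MT_FRACTION_RANGE = (0.07, 0.10)     # mtDNA as a fraction of total DNA (7-10%)
INSERT_SIZES_BP = [3596, 3595, 3561, 3411]  # the four measured inserts


def expected_reads(total_reads, fraction):
    """Reads expected if cloning sampled total DNA without bias."""
    return total_reads * fraction


expected_low, expected_high = (
    expected_reads(TOTAL_READS, f) for f in MT_FRACTION_RANGE
)
# How many times fewer mitochondrial reads were seen than expected.
depletion_low = expected_low / OBSERVED_MT_READS
depletion_high = expected_high / OBSERVED_MT_READS
mean_insert_bp = mean(INSERT_SIZES_BP)

print(f"expected mitochondrial reads: {expected_low:.0f} to {expected_high:.0f}")
print(f"depletion: about {depletion_low:.0f}- to {depletion_high:.0f}-fold")
print(f"mean insert size: {mean_insert_bp:.0f} bp")
```

Under these numbers the unbiased expectation is roughly 1,600 to 2,300 reads, which is the basis for the "up to about 2,000" figure.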

The graphical representation (below) of the hits along the mitochondrial genome indicates, however, an uneven distribution. None of the hits is found in the first 20 kb, whereas 3 clones overlap around 22 kb. There are 3 reads for which the mate is apparently absent from the mitochondrial genome (I traced a line between the mates). For read 1, the mate contains a TP1 sequence (transposon 1 of N. Hardman), suggesting a nuclear localisation, or perhaps a cloning artefact. The read itself, which is 943 nt long, contains mitochondrial DNA up to nt 531 and no further. Read 2 is short (440 bp) and fully mitochondrial, but its mate does not contain any known sequence. Finally, read 3 puzzles me. Its mate does contain mitochondrial DNA, although with “intervening” sequences, and should have shown up in the figure. The reason why it does not is completely unclear to me.
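Reads whose mate is missing from the hit list can be picked out mechanically. The sketch below assumes that the `.b1` and `.g1` suffixes always mark the two ends of one clone, as in the aab31b01 example; the helper names and the second clone name in the example list are hypothetical.

```python
def mate_name(read_name):
    """Return the name of the opposite end read of the same clone."""
    stem, _, suffix = read_name.rpartition(".")
    mates = {"b1": "g1", "g1": "b1"}  # assumed naming convention
    if suffix not in mates:
        raise ValueError(f"unrecognised read suffix in {read_name!r}")
    return f"{stem}.{mates[suffix]}"


def hits_without_mate(hit_names):
    """List hits whose mate is not itself among the hits."""
    hits = set(hit_names)
    return sorted(name for name in hits if mate_name(name) not in hits)


# aab31b01 is taken from the text; zzz00a00 is a made-up clone name.
example_hits = ["aab31b01.b1", "aab31b01.g1", "zzz00a00.b1"]
print(hits_without_mate(example_hits))
```

Applied to the full list of mitochondrial hits, such a filter would flag reads of the kind discussed here (reads 1 to 3) for manual inspection.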

Question: Is anything known about a differential stability in E. coli of mitochondrial DNA clones originating from the region between 0 and 20 kb? Or is this a bias due to the small size of the sample?

Ribosomal DNA content in the 22,616 reads

To detect ribosomal DNA in the pilot assay, I blasted the 6,191-bp sequence (VO1159) containing the 5.8S and 26S rRNA genes against the reads. The repetitive nature of this DNA is immediately seen in the form of many overlapping reads.

This time the distribution of the reads along the sequence is relatively even, with somewhat more clones on the left side, which corresponds to the 5.8S rRNA. Many reads are about 900 bp long and are 99% identical to the known sequence. The mismatches are located at the ends of the traces. I measured 3 inserts that were 3.78, 3.65 and 4.18 kb long. If one considers a 1-kb window, as on the graph, about 25 copies are present in the trace archive. If this is true all along the rDNA palindrome, this gives an estimate of 30 x 25 = 750 traces containing rDNA, i.e. 750/22,616 x 100 = 3.3%. The amount of rDNA in the Physarum genome is not exactly known, but is in the range of 1-2%, suggesting a slight preferential cloning and/or an enrichment during CsCl gradient DNA purification.
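The estimate above can be reproduced in a few lines. The names are illustrative; the factor of 30 is taken as given from the calculation in the text, and the enrichment ratio is just the estimated percentage divided by the assumed 1-2% genomic content.

```python
TOTAL_READS = 22_616
READS_PER_KB_WINDOW = 25      # about 25 reads per 1-kb window on the graph
N_WINDOWS = 30                # factor used in the text's "30 x 25"
GENOMIC_RDNA_PERCENT = (1.0, 2.0)  # assumed rDNA share of the genome


def rdna_estimate(reads_per_window, n_windows, total_reads):
    """Return (estimated rDNA reads, percentage of all reads)."""
    reads = reads_per_window * n_windows
    return reads, 100 * reads / total_reads


rdna_reads, rdna_percent = rdna_estimate(
    READS_PER_KB_WINDOW, N_WINDOWS, TOTAL_READS
)
# Fold enrichment relative to the lower and upper genomic estimates.
enrichment = [rdna_percent / p for p in GENOMIC_RDNA_PERCENT]

print(f"{rdna_reads} rDNA reads, {rdna_percent:.1f}% of all reads")
print(f"enrichment: {enrichment[1]:.1f}- to {enrichment[0]:.1f}-fold")
```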

Genomic DNA: Transposon-like elements

The amount of DNA sequenced in the pilot assay is 14,000 kb. This is about 5-10% of the haploid genome, i.e. approximately 0.1x coverage. Therefore most single-copy genes should be absent from the reads. Indeed, none of the known Physarum actin, profilin, or histone H4 sequences were found by BLAST. On the other hand, repetitive DNA sequences should show up.
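A rough check of these figures is sketched below. The genome sizes are only those implied by the 5-10% range, and the uncovered fraction uses the standard Poisson (Lander-Waterman) approximation, which is an assumption added here and not part of the original analysis.

```python
import math

SEQUENCED_KB = 14_000
TOTAL_READS = 22_616
GENOME_FRACTION_RANGE = (0.05, 0.10)  # sequenced DNA / haploid genome


def uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)


mean_read_length_bp = SEQUENCED_KB * 1000 / TOTAL_READS
# Haploid genome sizes implied by the 5-10% range, in kb.
implied_genome_kb = [SEQUENCED_KB / f for f in GENOME_FRACTION_RANGE]
uncovered = [uncovered_fraction(c) for c in GENOME_FRACTION_RANGE]

print(f"mean read length: {mean_read_length_bp:.0f} bp")
print(f"implied genome size: {implied_genome_kb[1]:.0f} to {implied_genome_kb[0]:.0f} kb")
print(f"bases not covered: {uncovered[1]:.0%} to {uncovered[0]:.0%}")
```

With 90% or more of the bases unsampled under this model, the absence of any given single-copy gene from the BLAST results is the expected outcome.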

About one third of the Physarum genome is known to re-associate rapidly upon denaturation and to be hyper-methylated. In part, this “repeated compartment” is composed of scrambled versions of an 8.3-kb LTR-retrotransposon-like sequence (TP1) that has inserted into itself, generating 20-50-kb islands of repeated, hyper-methylated DNA, as shown by Norman Hardman. Blasting the TP1 element against the traces generated about 500 hits. Another LTR-retrotransposon, of 1.68 kb (TP2), has been described by N. Hardman and, despite its short size, generated another 300 hits when blasted against the reads. Considering the 0.1x coverage, it is likely that these sequences are represented more than 1,000 times in the Physarum genome. Neither of these 2 elements appears to be present in Dictyostelium, although a TP1 homolog is present in Arabidopsis. In any case, the number of traces containing homologs of retroviral-like proteins such as Gag, Pol (reverse transcriptase) or polyprotein is much higher than the TP1 + TP2 hits, in the range of 2,500, indicating other families of repeated elements in Physarum.
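To put the hit counts in proportion, the sketch below expresses them as a share of all reads and scales them to 1x coverage. The names are illustrative, and the scaled values are read-equivalents, not copy numbers: turning them into copy numbers would also require the element length, the read length and the degree of fragmentation, which are not worked out here.

```python
TOTAL_READS = 22_616
COVERAGE = 0.1  # approximate coverage of the pilot assay

# Approximate hit counts quoted in the text.
HITS = {"TP1": 500, "TP2": 300, "retroviral-like proteins": 2500}


def percent_of_reads(hits, total_reads):
    """Share of all reads matching a given element, in percent."""
    return 100 * hits / total_reads


def hits_at_full_coverage(hits, coverage):
    """Reads expected at 1x coverage, by simple linear scaling."""
    return hits / coverage


summary = {
    name: (percent_of_reads(n, TOTAL_READS), hits_at_full_coverage(n, COVERAGE))
    for name, n in HITS.items()
}

for name, (percent, scaled) in summary.items():
    print(f"{name}: {percent:.1f}% of reads, ~{scaled:.0f} reads at 1x")
```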

Clearly, much information on the Physarum genome can be obtained from this pilot assay. In the next report, I will review some data obtained on the analysis of unique sequences matching sequences previously known from Physarum and some others that have a counterpart in Dictyostelium.