Primary Primer Construction

See Primer Design Figure

A. Forward Primer

1.  Take the first 20 bases of the predicted ORF

2.  Confirm that the sequence begins with ATG; if it does not, edit the start codon to ATG

3.  Add the tag sequence (5’ GGAGGCTCTTCA 3’) to the 5’ end of the primer

B. Reverse Primer

1.  Take the last 20 bases of the ORF (do not include the stop codon)

2.  Take the reverse complement of the sequence

3.  Add the tag sequence (5’ AGCTGGGTTCTA 3’) to the 5’ end of the primer
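The construction steps in A and B can be sketched in a few lines of Python. This is an illustration only, not the script used for the project: the function names are hypothetical, the ORF is assumed to be supplied 5’ to 3’ on its coding strand, and any terminal stop codon is assumed to be TAA, TAG or TGA. The reverse tag ends in CTA, the reverse complement of TAG, so the tag itself supplies the stop codon that step B.1 leaves out.

```python
# Tag sequences as given in steps A.3 and B.3.
FWD_TAG = "GGAGGCTCTTCA"
REV_TAG = "AGCTGGGTTCTA"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard bacterial stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def forward_primer(orf):
    """Tag + first 20 bases of the ORF, with the start codon forced to ATG."""
    orf = orf.upper()
    return FWD_TAG + "ATG" + orf[3:20]


def reverse_primer(orf):
    """Tag + reverse complement of the last 20 bases, stop codon excluded."""
    orf = orf.upper()
    if orf[-3:] in STOP_CODONS:
        orf = orf[:-3]
    return REV_TAG + reverse_complement(orf[-20:])


# Toy ORF (not a real gene): GTG start, 24 coding bases, TAA stop.
toy_orf = "GTG" + "ACGT" * 6 + "TAA"
print(forward_primer(toy_orf))  # GGAGGCTCTTCAATGACGTACGTACGTACGTA
print(reverse_primer(toy_orf))  # AGCTGGGTTCTAACGTACGTACGTACGTACGT
```

Both primers come out 32 bases long: the 12-base tag plus 20 gene-specific bases.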

Primary Primer Design

To reduce the cost of the PCR primers, a nested primer strategy was used: the gene-specific (primary) primers had a 20-base overlap with the gene and a 12-base overlap with the secondary primers, which contained the attB sequences. Key features of the primer design are:

1.  A sequence predicted to be a strong ribosome binding site (RBS), GGAGGC, lies upstream of the predicted start codon for the target ORF. Starting translation from this RBS would allow the predicted protein to be expressed from the cloned DNA without any N-terminal or C-terminal extension. However, there are no stop codons upstream of the start codon in the reading frame of the protein, so the protein can be tagged at the N-terminus if translation begins at a start codon upstream of the attB1 sequence (unpublished).

2.  All start codons were changed to ATG and, between the RBS of the primer and the start codon, we included the sequence TCTTC. Together with the GC at the end of the RBS, this sequence generates a SapI recognition sequence (GCTCTTC) between the ribosome binding site and the start codon. SapI cleaves at the ATG and leaves a consistent 3’-TAC-5’ overhang. SapI digestion enables replacement of the 5’ transcription and translation signals by restriction enzyme/ligase methods, should this be desired.

3.  TAG was inserted as a uniform stop codon to allow C-terminal tagging of the protein by readthrough of this codon in amber suppressor strains.

4.  Bsp1407I restriction sites were included in both forward and reverse primers to enable excision of the insert.
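The relationship between the forward tag, the RBS and the SapI site in features 1 and 2 can be checked with a short sketch. It assumes SapI's cut positions as given in standard enzyme references (one base past the recognition sequence on the top strand and four bases past it on the bottom strand); that detail is not stated in this text. The function name is hypothetical and only a site on the top strand is considered.

```python
SAPI_SITE = "GCTCTTC"     # SapI recognition sequence (from feature 2)
FWD_TAG = "GGAGGCTCTTCA"  # forward tag: RBS GGAGGC + TCTTC + one spacer base


def sapi_overhang(top_strand):
    """Return the 3 top-strand bases left single-stranded by SapI, or None.

    Assumes the cut falls 1 base past the site on the top strand and
    4 bases past it on the bottom strand, so the overhang covers the
    three bases in between. Sites on the bottom strand are ignored.
    """
    top_strand = top_strand.upper()
    site = top_strand.find(SAPI_SITE)
    if site < 0:
        return None
    site_end = site + len(SAPI_SITE)
    overhang = top_strand[site_end + 1:site_end + 4]
    return overhang if len(overhang) == 3 else None


# Toy primer: forward tag followed by an ATG start and filler bases.
primer = FWD_TAG + "ATGACGTACGTACGTACGTA"
print(FWD_TAG[:6])             # GGAGGC
print(sapi_overhang(primer))   # ATG
```

The top-strand bases in the overhang are ATG; the matching bottom-strand overhang on the upstream fragment is therefore 3’-TAC-5’, as described in feature 2.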

Secondary Primers

Secondary forward primer 5’ GGGGACAAGTTTGTACAAAAAAGCAGGCTTAGGAGGCTCTTCAATG 3’

Secondary reverse primer 5’ GGGGACCACTTTGTACAAGAAAGCTGGGTTCTA 3’
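As a consistency check on the nested design, the following sketch measures how many bases at the 3’ end of each secondary primer match the 5’ end of a primary primer, and looks for the Bsp1407I site. The primary primers used here are toy examples with filler gene bases, the function name is hypothetical, and TGTACA is taken as the Bsp1407I recognition sequence from standard enzyme references rather than from this text.

```python
FWD_TAG = "GGAGGCTCTTCA"
REV_TAG = "AGCTGGGTTCTA"
SEC_FWD = "GGGGACAAGTTTGTACAAAAAAGCAGGCTTAGGAGGCTCTTCAATG"
SEC_REV = "GGGGACCACTTTGTACAAGAAAGCTGGGTTCTA"
BSP1407I_SITE = "TGTACA"  # assumption: recognition sequence from enzyme references


def three_prime_overlap(secondary, primary):
    """Length of the longest 3' end of `secondary` that matches the 5' end of `primary`."""
    for n in range(min(len(secondary), len(primary)), 0, -1):
        if secondary.endswith(primary[:n]):
            return n
    return 0


# Toy primary primers: tag + 20 gene-specific bases (filler sequence).
toy_fwd = FWD_TAG + "ATG" + "ACGTACGTACGTACGTA"
toy_rev = REV_TAG + "ACGT" * 5

print(three_prime_overlap(SEC_FWD, toy_fwd))  # 15
print(three_prime_overlap(SEC_REV, toy_rev))  # 12
print(BSP1407I_SITE in SEC_FWD, BSP1407I_SITE in SEC_REV)  # True True
```

The reverse pair overlaps by the 12-base tag. The forward pair overlaps by 15 bases because the secondary forward primer, as listed, also carries the ATG that follows the tag.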

Determination of the ORFs to clone

6204 ORFs were identified in the original annotation, referred to below as the Primary annotation. We reviewed the Primary annotation for two reasons. First, the original work was done by three different groups and we wanted to reassure ourselves that the criteria used were relatively uniform. Second, the genome sequence of Agrobacterium tumefaciens, an organism closely related to S. meliloti, was published after the original annotation (Wood et al. 2001; Goodner et al. 2001) and was available for comparison. Reannotation used the Comprehensive Microbial Resource (CMR; http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl) on The Institute for Genomic Research’s web site (http://www.tigr.org/). The Primary annotation and an automated annotation, referred to as the TIGR annotation, are both listed in the CMR. Comparisons of the two annotations were displayed using the Region View tool (http://www.tigr.org/tigr-scripts/CMR2/Display_Region_Form.spl?db_data_id=40157&db=ntsm01).

The Primary annotation was used as the basis for the list of ORFs to clone. If the same ORF was listed in the TIGR annotation, it was accepted without further review. If a region contained an ORF in both the Primary and TIGR annotations but the start positions did not agree, the ORF was reevaluated by using BLAST to find similar sequences in the GenBank database, and the set of these sequences was examined manually. A major difference between the two annotations was that the TIGR annotation listed a large number of “ORFs” whose putative protein products were designated “conserved hypothetical protein” or “hypothetical protein”. The “conserved hypothetical protein” group included several proteins similar to the putative protein product of an ORF found in the genome of A. tumefaciens, which strengthens the assignment of the predicted polypeptide as a legitimate protein; these were examined very closely. Our reevaluation of the genomic sequence identified over 100 ORFs that we included in the cloning effort. Also included in the list of ORFs to be cloned are the additional ORFs discovered by Djordjevic et al. (2003) in their analysis of the S. meliloti proteome and a few genes where it is known that the S. meliloti genome contains mutant versions of genes active in other, closely related S. meliloti strains.
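The review rules in the preceding paragraph amount to a small decision procedure, sketched below. This is a simplified illustration, not the procedure actually run: the record type, the function name and the returned labels are hypothetical, and the two calls passed in are assumed to have already been paired as the same ORF (for example, by shared strand and stop position). The handling of an ORF found only in the Primary annotation is an assumption based on that annotation being the basis of the list.

```python
from typing import NamedTuple, Optional


class OrfCall(NamedTuple):
    """One annotated ORF (hypothetical record layout)."""
    start: int   # coordinate of the first base of the start codon
    stop: int    # coordinate of the last base of the stop codon
    strand: str  # "+" or "-"


def triage(primary: Optional[OrfCall], tigr: Optional[OrfCall]) -> str:
    """Decide how much review a paired Primary/TIGR ORF call needs."""
    if primary is None and tigr is None:
        raise ValueError("at least one annotation must call the ORF")
    if primary is not None and tigr is not None:
        if primary.start == tigr.start:
            return "accept without further review"
        return "reevaluate start by BLAST and manual inspection"
    if primary is not None:
        # Assumption: Primary-only calls stay on the list.
        return "keep (Primary annotation is the basis list)"
    return "examine closely (TIGR-only call)"


same = OrfCall(100, 400, "+")
shifted = OrfCall(130, 400, "+")
print(triage(same, same))     # accept without further review
print(triage(same, shifted))  # reevaluate start by BLAST and manual inspection
print(triage(None, shifted))  # examine closely (TIGR-only call)
```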

The CMR is fully described in: J.D. Peterson, L.A. Umayam, T.M. Dickinson, E.K. Hickey and O. White. The Comprehensive Microbial Resource. Nucleic Acids Research, 29:1 (2001), 123-125.

* Toulouse webpage: http://sequence.toulouse.inra.fr/meliloti.html

* TIGR Region View webpage: http://www.tigr.org/tigr-scripts/CMR2/Display_Region_Form.spl?db_data_id=40157&db=ntsm01

* NCBI webpage: http://www.ncbi.nlm.nih.gov/