Project number 2: To work with, and think about, binding sites for DNA-binding proteins (DUE December 1st by 11:59 pm)

Zheng et al. published a paper in 2004 in which they used a couple of different methods to identify genes and operons in the genome sequence of E. coli strain MG1655 that are regulated by CAP. They found that about 180 promoter regions were activated by CAP•cAMP and about 20 were repressed by CAP•cAMP. The end result of this work is shown in Tables 1, 2 and 3 of the paper (which you can download from the class website).

This project has two main parts.

1a. Collect 10 CAP binding-site sequences from Tables 1 and/or 2. Align them with each other and generate a consensus sequence. Show this work and include the names of the genes you used. (8 pts)
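If you want to check the consensus you built by hand, here is a minimal Python sketch of a majority-rule consensus. It assumes your sites are already aligned and all the same length. Writing weakly conserved positions in lowercase is a hypothetical convention for this sketch (IUPAC ambiguity codes are a common alternative), and the example sites are made up for illustration, not taken from Zheng et al.

```python
from collections import Counter


def consensus(aligned_sites):
    """Majority-rule consensus of equal-length, pre-aligned DNA sites.

    A base is uppercase when it occurs in more than half of the sites at
    that position, lowercase otherwise (hypothetical convention). Ties go
    to whichever base appears first in the column.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must be aligned to the same length")
    result = []
    for column in zip(*(site.upper() for site in aligned_sites)):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count > len(column) / 2 else base.lower())
    return "".join(result)


# Toy, made-up sites for illustration only (not from Zheng et al.).
print(consensus(["AATGTGA", "AATGTGC", "TTTGTGA"]))  # AATGTGA
```

With your real sites, lowercase letters mark the positions where no single base dominates, which is worth commenting on in your write-up.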

1b. Take your aligned sequences and enter them into the WebLogo program ( to create a WebLogo like the one below.

How does yours compare to the one below (Logo’s CAP consensus)? (4 pts)
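To compare logos sensibly, it helps to know what the stack heights mean: in a DNA logo, each stack's height is the information content of that column in bits (at most 2), and each letter's height is its frequency times that value. The sketch below computes both for your aligned sites, assuming equal-length sequences. WebLogo can also apply a small-sample correction, which this sketch leaves out, so with only 10 sites these values will run a little higher than WebLogo's.

```python
import math
from collections import Counter


def column_information(aligned_sites):
    """Information content (bits) of each column of equal-length DNA sites.

    R = 2 - H, where H = -sum(f * log2(f)) over the bases seen in the column.
    No small-sample correction is applied.
    """
    n = len(aligned_sites)
    info = []
    for column in zip(*(site.upper() for site in aligned_sites)):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


def letter_heights(aligned_sites):
    """Height of each base in each stack: its frequency times the column's R."""
    n = len(aligned_sites)
    heights = []
    columns = zip(*(site.upper() for site in aligned_sites))
    for column, r in zip(columns, column_information(aligned_sites)):
        heights.append({b: (c / n) * r for b, c in Counter(column).items()})
    return heights
```

Columns near 2 bits are the fully conserved positions that should stand out as tall stacks in your logo; columns near 0 are essentially unconserved. Comparing which columns are tall in your logo and in the reference logo is a concrete way to answer the question.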


2a. Find a CAP site from Table 1 or Table 2 in Zheng et al. Choose one whose gene lacks an obvious reason to need CAP regulation; that is, don't pick a sugar catabolism gene.

Next, locate the CAP binding site in the promoter region of its gene, and find the -35, -10 and +1 sites as well as the start codon of the first gene it regulates. Map these out on the gene sequence as shown in the example below. Indicate whether your gene shows type I or type II activation by CAP. (12 pts)
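One way to generate -35 and -10 candidates is to score every pair of hexamers against the sigma70 consensus (TTGACA and TATAAT) and keep the pair with the fewest mismatches. The Python sketch below does that, assuming a spacer of 15 to 19 bp between the hexamers (17 is optimal; the window is an assumption of this sketch). It returns 0-based start positions in the string you give it, and the demo uses a synthetic sequence rather than a real promoter.

```python
MINUS35 = "TTGACA"  # sigma70 -35 consensus hexamer
MINUS10 = "TATAAT"  # sigma70 -10 consensus hexamer


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_promoter(seq, spacers=range(15, 20)):
    """Lowest-mismatch (-35 start, -10 start, total mismatches), 0-based.

    The spacer is the number of bases between the end of the -35 hexamer
    and the start of the -10 hexamer; `spacers` must be increasing.
    Returns None if seq is too short to hold a promoter.
    """
    seq = seq.upper()
    best = None
    for i35 in range(len(seq) - len(MINUS35) + 1):
        score35 = mismatches(seq[i35:i35 + 6], MINUS35)
        for spacer in spacers:
            i10 = i35 + 6 + spacer
            if i10 + 6 > len(seq):
                break  # larger spacers will not fit either
            score = score35 + mismatches(seq[i10:i10 + 6], MINUS10)
            if best is None or score < best[2]:
                best = (i35, i10, score)
    return best


# Synthetic test sequence: perfect hexamers separated by a 17-bp spacer.
demo = "C" * 10 + "TTGACA" + "G" * 17 + "TATAAT" + "C" * 10
print(best_promoter(demo))  # (10, 33, 0)
```

Treat the result as a candidate, not an answer. The lac promoter's -35 (TTTACA) and -10 (TATGTT) differ from the consensus at three positions in total, so it is best to restrict the search to the stretch where the CAP-site position from the tables says the promoter should be. The +1 is typically a purine roughly 7 bases downstream of the end of the -10 hexamer.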

2b. Write a few sentences on why you think your gene/operon is regulated by CAP•cAMP. Be sure to include some information on what the gene/operon does (more than the short description in Table 1/2). (6 pts)

An example, if you chose lacZ (which you won't, because it is a sugar catabolism gene):

1. Go to the KEGG genes database (

2. Type the species and gene name into the search box like this: eco:lacZ.

3. This brings you to a KEGG page with information about your gene. Near the bottom is the gene's sequence from start to finish. Unfortunately, you need more than this: you need the promoter region, which is upstream of the gene. To get it, enter 400 in the "upstream" box; that should be more than enough (I used 300 in this case).

Hit the "NT seq" button.

------

4. The sequence is now at hand. Your gene is shown in blue at the bottom, the DNA between your gene and the next gene upstream is in black, and the upstream gene (here lacI, pointing in the same direction as lacZ) is shown in blue at the top. If the upstream gene is more than 400 bp from your gene, you won't see it unless you add more DNA in step 3. Green text upstream means the next gene is in the opposite orientation.

------

5. Lastly, copy this DNA sequence into Word and find and highlight the CAP site(s), -35, -10, +1 and start codon, as outlined on the first page. Don't just paste your CAP sequence into Word's Find box and hit Find: if the CAP binding site is broken by a line return (paragraph return), Word won't find it. It is best to look for it by eye.

TTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGC

(features in the line above: lacI stop codon, CAP site)

ACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAA

(features in the line above: -35, -10, +1)

TTTCACACAGGAAACAGCTATGACCATGATTACGGATTCACTGGCCGTCGTTTTACAACGTCGT

(feature in the line above: lacZ start codon)

GACTGGGAAAACCCT
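Steps 3 and 5 can also be sketched in Python. This is a minimal illustration, not the KEGG workflow. The hypothetical `gene_with_upstream` shows what the "upstream" box does: for a gene on the minus strand, the upstream region lies at higher genome coordinates and has to be reverse-complemented. The hypothetical `find_cap_sites` scans for the commonly cited CAP consensus TGTGA-N6-TCACA while ignoring line breaks and allowing mismatches, which addresses both reasons a plain Find in Word can fail. The function names, the 1-based coordinate convention and the two-mismatch cutoff are assumptions of this sketch. Paste in only the sequence lines: the cleaner keeps every A, C, G and T it sees, so a label such as "CAP" would be read as bases.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Commonly cited CAP consensus core; N = any base. It is its own reverse
# complement, so scanning one strand finds sites in either orientation.
CAP_CONSENSUS = "TGTGANNNNNNTCACA"


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gene_with_upstream(genome, start, end, strand, upstream=400):
    """Gene plus `upstream` bases on its 5' side, written 5'->3' on its strand.

    start/end: hypothetical 1-based inclusive coordinates with start < end.
    """
    if strand == "+":
        return genome[max(0, start - 1 - upstream):end]
    if strand == "-":
        return reverse_complement(genome[start - 1:end + upstream])
    raise ValueError("strand must be '+' or '-'")


def clean(text):
    """Keep only A/C/G/T, so line breaks, spaces and numbers are ignored."""
    return re.sub(r"[^ACGT]", "", text.upper())


def find_cap_sites(text, max_mismatches=2):
    """Return (index, window, mismatches) for CAP-like 16-mers in `text`."""
    seq = clean(text)
    k = len(CAP_CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mm = sum(c != "N" and c != b for c, b in zip(CAP_CONSENSUS, window))
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits


# First lacZ example line, deliberately split in the middle of the CAP site.
lac_text = """TTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGA
GTTAGCTCACTCATTAGGC"""
print(find_cap_sites(lac_text))
```

The hits printed for this example include (40, 'TGTGAGTTAGCTCACT', 1): the lac CAP site, found despite the line break and despite differing from the consensus at one position.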

Note: Table 2 gives the positions of the CAP sites relative to the +1 site. From that information you should be able to find reasonable -35 and -10 candidates; realize that they may differ a bit from the standard sigma70 consensus (TTGACA at -35 and TATAAT at -10), as the ones for lacZYA do. BUT Table 1 gives the positions of the CAP sites relative to the start of the gene, i.e., relative to the “A” of “ATG” if the start codon is ATG (there are sometimes GTG and TTG starts, in which case the first base of the start codon is the reference).
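Because Tables 1 and 2 use different reference points, a small Python sketch of the bookkeeping may help. It assumes the reference base itself is numbered +1 (the +1 site for Table 2, the first base of the start codon for Table 1), that there is no position 0, and that CAP-site centers are given as half-integers such as -61.5. The type I/type II call uses the usual landmarks: type II sites are centered near -41.5, overlapping the -35 region, while type I sites lie farther upstream, near -61.5, -71.5, -81.5 or -91.5. The -50 cutoff between them is a hypothetical rule of thumb, not a value from the paper, and all function names are illustrative.

```python
def index_of_position(ref_index, position):
    """0-based index in your sequence string of a promoter coordinate.

    ref_index is the 0-based index of the base numbered +1. There is no
    position 0, so -1 is the base immediately upstream of +1.
    """
    if position == 0:
        raise ValueError("there is no position 0 in this numbering")
    if position < 0:
        return ref_index + position
    return ref_index + position - 1


def center_relative_to_plus1(center, start_index, plus1_index):
    """Re-express a start-codon-relative center (Table 1) relative to +1.

    start_index and plus1_index are 0-based indices of the first start-codon
    base and the +1 base. Only valid when the center lies upstream of both,
    where the numbering has no gap from the missing position 0.
    """
    shifted = center + (start_index - plus1_index)
    if center >= 0 or shifted >= 0:
        raise ValueError("center must lie upstream of both reference bases")
    return shifted


def activation_type(center):
    """Rough type I / type II call from a CAP-site center relative to +1."""
    # Hypothetical cutoff: type II sites sit near -41.5, type I at -61.5 or beyond.
    return "type II" if center > -50 else "type I"


print(index_of_position(100, -35))  # 65
print(center_relative_to_plus1(-91.5, 130, 100))  # -61.5
print(activation_type(-61.5))  # type I
```

For example, the lac CAP site is centered at -61.5, the textbook type I arrangement.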