ICAR Workshop: July 2, 2009
Putting TAIR to work for you:
Hands-on workshop for beginning and advanced users
Part 2: Practice makes perfect . . . 10 fun problems to help you become better friends with TAIR
------
Participants can:
a) Choose the most relevant one(s) and try them
b) Ask any other questions related to their specific challenges at TAIR
c) Leave now, but come and visit us at the Curation Booth and/or send us an e-mail at
------
Data File and Answers:
------
1. You have the results of a microarray experiment that you have already analyzed and cleaned up. You have found the genes whose expression is significantly altered in the csm1-1 (cold-sensitive mutant 1-1) background relative to wild type. (Use: TAIR workshop - part 2 - data file)
a. What percentage of genes in the set is annotated to produce proteins located in the chloroplast?
(HINT: use Search -> GO Annotations -> Bar chart of genes)
(i) in your data set (21%) (ii) in the whole genome (13%)
Explanation: Go to the GO annotations bulk data retrieval tool: Upload the data file: TAIR tutorial - part 2 - data file. Click on "Functional Categorization." Then, select "Gene Bar Chart" and hit "Draw." Open a new window with the GO annotations bulk data retrieval tool. Click on "Whole Genome Categorization." Then, select "Gene Bar Chart" and hit "Draw." Compare the two sets of graphs.
To look for statistically over-represented terms, etc., you can use a program at GO:
b. What percentage of annotations in the set of genes analyzed is associated with “transport”?
(HINT: use Search -> GO Annotations -> Pie chart of annotations)
(i) in your data set (5%) (ii) in the whole genome (4%)
Explanation: Go to the GO annotations bulk data retrieval tool: Upload the data file: TAIR tutorial - part 2 - data file. Click on "Functional Categorization." Then, select "Annotation Pie Chart" and hit "Draw." Open a new window with the GO annotations bulk data retrieval tool. Click on "Whole Genome Categorization." Then, select "Annotation Pie Chart" and hit "Draw." Compare the two sets of charts.
To look for statistically over-represented terms, etc., you can use a program at GO:
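Optional: as a rough illustration of what an over-representation test computes, the sketch below applies a one-sided hypergeometric test in Python. The gene counts are hypothetical placeholders chosen only to match the 21% and 13% figures from part a; they are not taken from the workshop data file. The function name is ours and is not part of any TAIR or GO tool.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the genome, K: genome genes carrying the term,
    n: genes in the data set, k: data-set genes carrying the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts (assumptions, not real data):
# 42 of 200 genes (21%) in the set vs. 3510 of 27000 (13%) in the genome.
p_value = hypergeom_upper_tail(k=42, n=200, K=3510, N=27000)
print(f"one-sided p-value: {p_value:.2e}")
```

A small p-value suggests the term is more frequent in the set than chance alone would explain. Dedicated enrichment tools also correct for testing many terms at once, which this sketch does not do.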
c. You then do a microarray experiment in your csm1-1 mutant to see how the expression of the significantly altered genes changes 2 hours after a cold treatment. (Use: TAIR workshop - part 2 - data file)
* Your data file contains the gene identifiers and one column that has the relative expression values on a 0-centered scale. The data are in column 1.
Which metabolic pathways have two or more reactions that are up-regulated or down-regulated?
(HINT: use OMICs viewer)
UP = abscisic acid biosynthesis, superpathway of carotenoid biosynthesis, xanthophyll cycle, etc.
DOWN = phosphatidylcholine biosynthesis I, superpathway of choline biosynthesis, phospholipid biosynthesis, etc.
Explanation: Go to the AraCyc OMICs viewer tool: and upload the data file: TAIR workshop - part 2 - data file.
Leave the default values:
- Relative
- 0-centered scale
- Gene names and/or identifiers
In the "Data column (numerator in ratios)" slot, enter "1" and leave the default color scheme.
Optional: Select "Generate a table of individual pathways exceeding threshold" and enter a numeric value, e.g. 4, to get a list of pathways where at least one element has a gene that is up-regulated or down-regulated by 4-fold or more.
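Optional: the thresholding idea can be sketched in a few lines of Python. This assumes a tab-delimited file with no header row, the gene identifier in the first field, and a single 0-centered value after it. It applies the cutoff directly to that value; how the OMICs viewer itself interprets the threshold is not specified in this handout. The identifiers and values below are made up for illustration.

```python
import csv
import io


def read_expression(handle):
    """Parse tab-delimited rows of (gene identifier, 0-centered value)."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) >= 2:
            rows.append((fields[0], float(fields[1])))
    return rows


def split_by_threshold(rows, threshold):
    """Return (up, down) gene lists whose values reach +/- threshold."""
    up = [gene for gene, value in rows if value >= threshold]
    down = [gene for gene, value in rows if value <= -threshold]
    return up, down


# Hypothetical stand-in for the data file (placeholder IDs and values).
example = io.StringIO("GENE_A\t4.2\nGENE_B\t-5.0\nGENE_C\t0.3\n")
up, down = split_by_threshold(read_expression(example), threshold=4)
print("up:", up)
print("down:", down)
```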
------
2. AT4G35790 is “involved in” the response to cold stress.
a. What evidence code(s) are associated with this annotation? What paper(s) support this annotation?
(HINT - look at the annotations section of the locus page)
IMP: Li et al. (2004)
IEP: Kawamura et al. (2003)
Explanation: Go to the Locus page for At4g35790 using the Quick search bar and “Gene” or “Exact name search”:
In the Annotation section, click on the “Annotation Detail” button at the bottom of the page to get to the screen that shows the evidence codes and references:
b. Is its absolute expression level higher 30 min after cold treatment or 24 hours after cold treatment (4°C)? At both time points, is it higher in the shoot or the root? What is the fold change in expression level in the roots at 24 hours?
(HINT - look at the external links section of the locus page – eFP browser)
24 hours; in the root at both time points; 1.51-fold change
Explanation:
Use the eFP Browser link on the Locus page, and make sure to select “Abiotic Stress” as the Data Source on the drop-down menu.
On the cold stress track (row 2), the absolute expression is higher at 24 hours than at 30 min. At both time points, the expression is higher in the roots than in the shoots. All of these numbers can be obtained by hovering over the appropriate time point and plant organ. For example, the expression level in roots at 24 hours post cold treatment is 502. If you switch the “Mode” to “Relative,” you can see that the fold change in the roots is 1.51. You can also get these numbers as a table by using a button at the bottom of the page.
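Optional: the two numbers quoted above can be related with a line of arithmetic. The sketch below assumes that the reported fold change is simply the treated level divided by the control level; the handout does not state how the eFP Browser derives it, so treat the result as an estimate only.

```python
# Values quoted in the explanation above.
treated_level = 502.0   # roots, 24 hours after cold treatment ("Absolute" mode)
fold_change = 1.51      # roots, 24 hours ("Relative" mode)

# Assumption: fold change = treated level / control level.
implied_control_level = treated_level / fold_change
print(f"implied control level: {implied_control_level:.1f}")
```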
------
3. a. You are mapping a cold-sensitive mutant in the interval between markers PRHA and JM411 on Chromosome 4. Find all of the other PCR-based markers to narrow this region. How many are there? What restriction enzyme can you use for CAPS marker JM142?
(HINT: use Marker Search)
There are 40.
AluI can be used for JM142.
Explanation: Use the marker search page: In the section marked “Restrict by Features,” click on the "All PCR" box. In the section marked "Restrict by Map Location" enter the following:
Chromosome: 4; Map type: AGI; and then the two marker names into the two slots and choose "marker" next to their names. Then click on the "Submit Query" link. Once you get the results, click on “JM142” and the restriction enzyme is listed on the genetic marker page:
b. You narrow the mapping region to between markers SM103_365,6 and SM58_108,7. Are there any gene(s) in this region that have a GO annotation related to cold? If so, what are they?
(HINT: use Gene Search)
2 genes: AT4G35790 and AT4G36020
Explanation: Use the gene search page: In the section marked: "Search by Associated Keyword", enter the text "cold" and change the qualifier to "contains." In the section marked "Restrict by Map Location" enter the following:
Chromosome: 4; Map type: AGI; and then the two marker names (SM103_365,6 and SM58_108,7) into the two slots and choose "marker" next to their names. Then click on the "Submit Query" link.
c. If there are any cold-related genes in the region, get the genomic sequences for the representative gene models. Which models are these?
(HINT: use Bulk Data Retrieval)
AT4G35790.1 and AT4G36020.1
Explanation: Go to “Downloads -> Bulk Data Retrieval” and choose “Sequences.”
Paste in your locus identifiers, select “AGI genomic locus sequences” for your “Dataset” and click the button next to “Get sequences for only the gene model/splice form matching my query.” In your FASTA file headers, you can see the numbers of the representative gene models.
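Optional: if you retrieve many sequences, the gene model numbers can be pulled out of the FASTA headers with a short script. The sketch below assumes only that each header line contains an identifier of the form ATxGnnnnn.n somewhere after the ">"; the rest of the header text and the sequences shown are placeholders, not real TAIR output.

```python
import re

# AGI gene model identifier, e.g. AT4G35790.1 (chromosomes 1-5, C, or M).
MODEL_ID = re.compile(r"AT[1-5CM]G\d{5}\.\d+", re.IGNORECASE)


def gene_models_from_fasta(text):
    """Return the gene model identifiers found in FASTA header lines."""
    models = []
    for line in text.splitlines():
        if line.startswith(">"):
            match = MODEL_ID.search(line)
            if match:
                models.append(match.group(0).upper())
    return models


# Placeholder FASTA text (hypothetical headers and sequences).
example_fasta = """\
>AT4G35790.1 | hypothetical header text
ACGTACGTAC
>AT4G36020.1 | hypothetical header text
TTGACCATGA
"""
models = gene_models_from_fasta(example_fasta)
print(models)
```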
------
4. You are mapping a mutant with a lobed leaves phenotype.
a. Find the gene on chromosome 2 between 15700 and 24800 kb that has this phenotype.
(HINT - use Gene Search)
AT2G37630 (AS1 – Asymmetric Leaves 1)
Explanation: Use the gene search page:
Change the entry from “Gene Name” to “phenotype” in the first drop-down menu under “Search by Name or Phenotype,” enter “lobed leaves” in the text box, and change the option to “contains.”
Scroll down to 'Restrict by map locations' and select chromosome 2, AGI map, then enter 15700 and 24800 in the text boxes and make sure the units are “kb”. An alternative way to answer this question is to use the 'Seed/Germplasm' search option from the SEARCH drop-down menu.
b. How many other mutants with lobed leaves does the ABRC stock center have? Don’t they have some great images associated with them? (HINT: Use the Seed/Germplasm Search)
20. And, yes, they do have some great images associated with them! Check out CS3602 for one good example!
Explanation: Use the seed/germplasm search page:
In the section called Search by Name, Phenotype or Stock Number, enter “lobed leaves” in the text box across from “phenotype.” Scroll down and click on “is ABRC stock” and then “Submit Query.” Every little camera icon indicates that there is a photo of the mutant.
------
5. a. You are about to start working on a new set of 3 genes. Before starting any experiments you would like to double-check if the gene structures are correctly annotated at TAIR. The names of these genes are At4g22760, AT1G01010 and AT1G73400. (HINT: Find the appropriate tracks in GBrowse and make sure to look a little upstream and downstream of the current gene model)
i. Which one of the 3 genes is correctly annotated? AT1G01010 (based on current data)
ii. Which one gene needs an exon extension based on proteomics and cDNA data? AT1G73400
Explanation: Find the gene in GBrowse. Increase the interval shown by changing it to “Show 2 kb.” At the bottom of the page, turn on the following tracks under “Expression”: AtPeptide and AtProteome and, under “Sequence Similarity,” turn “All on.” The AT1G73400 gene is truncated and can be extended at the 5' end based on an Arabidopsis cDNA. A Brassica EST also supports extension beyond the current transcription start site, although in this case the indicated splice sites are not canonical (thus an alternative gene model is not required). In addition, proteomics data (AtPeptide, Castellana et al. 2008 track) support the addition of a second splice variant that includes an alternatively spliced 5' exon, although the translation start site is maintained in both splice variants.
iii. Which one gene requires splitting into two based on sequence similarity? At4g22760
Explanation: Find the gene in GBrowse and turn on the same tracks as above. Arabidopsis cDNAs, ESTs, and Brassica ESTs support a division of this gene into two different models.
5. b. You would like to know whether gene AT1G05240 contains a predicted TATA box in its promoter region. Does it? How many other genes in TAIR have the exact same TATA box within 500 bp of their upstream sequence?
(HINT 1: Use GBrowse to find the TATA box and use PPDB to obtain the exact sequence
HINT 2: Use PATMATCH)
Explanation: By selecting the PlantPromoterDB track in GBrowse, you will find a glyph that says 'TATA' immediately upstream of AT1G05240. By clicking on this glyph, you are redirected to the PPDB site (PlantPromoter database) where the exact sequence of this TATA box is shown. It is ATCTATAAAAG. You can now select this sequence, go to Patmatch in TAIR (select this from the TOOLS drop-down menu):
Search for exact matches of this TATA box sequence in all “TAIR9 Loci Upstream Sequences -- 500 bp (DNA).” You will find 38 sequences containing the queried sequence.
If you look in all intergenic regions of TAIR9, you will find 128 sequences containing the queried sequence.
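Optional: an exact-match search of this kind is easy to sketch in Python. The code below is not Patmatch and makes no claim about how Patmatch works internally; it simply reports which sequences contain the motif, optionally checking the reverse complement as well. The three "upstream" sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sequences_with_motif(records, motif, both_strands=True):
    """Return the names of sequences that contain an exact match to motif."""
    patterns = {motif.upper()}
    if both_strands:
        patterns.add(reverse_complement(motif))
    return [
        name
        for name, seq in records.items()
        if any(pattern in seq.upper() for pattern in patterns)
    ]


# Hypothetical upstream sequences (placeholders, not TAIR9 data).
records = {
    "upstream_1": "GGGATCTATAAAAGCCC",   # motif on the forward strand
    "upstream_2": "TTTCTTTTATAGATGGG",   # motif on the reverse strand
    "upstream_3": "ACGTACGTACGT",        # no match
}
hits_both = sequences_with_motif(records, "ATCTATAAAAG")
hits_forward = sequences_with_motif(records, "ATCTATAAAAG", both_strands=False)
print(hits_both)
print(hits_forward)
```

Whether you count matches on one strand or both changes the totals, so check which strand setting was used before comparing your numbers with the 38 and 128 reported above.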
------
6. Are there any genes that might perform the same activity as AT3G30775 . . .
a. In Rattus norvegicus and S. pombe, based on molecular function annotations? (HINT: use Annotation term on Locus Page and "GO Database")
Prodh in R. norvegicus
SPCC70.03c in S. pombe
Explanation: On the locus page, click on the term called “proline dehydrogenase activity.” On the term page that comes up there is a link called “GO database.” When you click on it, you go to AmiGO. On the top of the page, under the term name, there is a link to “26 gene product associations.” Click on it and search for the appropriate organisms. Or, you can set the “Species Filter” option to retrieve data only from the desired organisms.
b. In Sorghum bicolor and Vitis vinifera based on putative orthology / plant gene families? (HINT: Use GBrowse)
Sb01g029660.1 in Sorghum bicolor
GSVIVT00028134001 in V. vinifera
Explanation: View the gene in GBrowse. Turn on the track under “Orthologs and Gene Families” called “Plant Gene Families (Phytozome)” and look for the appropriate organisms.
------
7. a. How many splice variants does AT3G07360 have? (3)
(HINT: Use Locus page)
Explanation: Go to the locus page for AT3G07360 using the Quick search and “Gene” or “Exact name search.”
i. Which one encodes the longest coding region?
AT3G07360.1 – The representative gene model is always the longest coding region
ii. What differences in biochemical activity might you expect to find between the different splice variants if the U-box is required for enzyme activity?
(HINT: Use gene model pages)
Only AT3G07360.1 has the predicted enzymatic domain. If the other splice forms were expressed and tested in vitro, they would be unlikely to have this enzymatic activity.
Explanation: Go to the Locus page for AT3G07360 and see the list of how many gene models (splice variants) are present. (Use glyph). Click on each gene model / splice variant and go to the gene model page. In the protein data section you can see the protein domains that are predicted to be present in that specific splice variant. Different domains may be present in the different splice variants. In this case, only the AT3G07360.1 gene model is predicted to encode a U-box.
b. How many splice variants does At1g02880 have? (4)
i. Which one encodes the longest coding region? (At1g02880.3)
ii. What differences in biochemical activity might you expect to find between the different splice variants?
(HINT: Look at Locus and protein pages)
All 4 variants have the predicted enzymatic domain.
(see explanation above)
------
8. Your mutant has elevated levels of zeinoxanthin.
a. What candidate gene(s) might act upstream of it? Which of these have experimental support for their protein function?
(HINT: use quick search bar -> “Metabolic pathways” or use quick search bar -> AraCyc)
LUT5 – experimental data: reaction blocked in mutant
B-OHase 1 – experimental data: reaction blocked in mutant and functional complementation
Explanation: Use the quick search bar, switch the option to “Metabolic pathways,” type in zeinoxanthin, and submit the request. On the compound detail page:
you can see that this compound is present in the lutein biosynthesis pathway. Click on this link. On the pathway page, there are 2 enzymes predicted to be immediately upstream of zeinoxanthin. Mousing over each enzyme gives the evidence code and type of support.
b. What candidate gene(s) might act downstream of it?
Downstream: LUT1 – experimental data: reaction blocked in mutant
c. What is the molecular formula of zeinoxanthin?
C40H56O
This information is displayed on the compound detail page:
------
9. a. You want a loss-of-function (preferably null) mutant of AT2G37630. Which seed stock should you order?
(HINT: look at the polymorphism and germplasm sections of the locus page. Also, use GBrowse to look at the locations of polymorphisms and T-DNAs)
Loss-of-function mutant: as1-1 (at locus AT2G37630)
Explanation: Go to the locus page for AT2G37630:
From the polymorphism and germplasm sections of the locus page, you can see that as1-1 is a mutant that gives a strong phenotype. Click on as1-1 to go to the polymorphism page:
This shows that it is an X-ray-induced frameshift mutation. The germplasm can be ordered in this section.
If there were no characterized lines, from the Locus page, click on the link to GBrowse, or click on the gene model image. Open the track called “T-DNA/ Transposons” in the “Variation” section. You can move this track to put it right next to the protein coding model. SALK_023987 would be a good bet because you can see from GBrowse that it is in a coding exon. Warning: not all SALK insertions are correctly linked to exons or introns, so it is always wise to sequence the borders of the insertion yourself.
b. You decide to order a second insertion mutant and you know that you need to sequence to confirm the predicted site of the insertion shown on the website. So, you want to design primers that flank the predicted insertion site. Download a decorated fasta file with the exons in capital letters and the predicted insertion sites in blue.