Woods Hole – Zebrafish Genetics and Development

Bioinformatics/Genomics Lab

Ian Woods

Note: This document “wh_informatics_practical.doc” and supporting materials can be downloaded from my website:

Setting the stage: These tasks each pertain to the mutation that we (virtually) mapped in lab. The curved body axis and U-shaped somites observed in these mutants are hallmarks of disrupted slow muscle development, and similar phenotypes are observed in mutants with defects in Hedgehog signaling.

General descriptions of the tasks are provided below. Specific protocols can be found following this introductory section. Each of you should choose (at least) one task to accomplish, and collaboration is highly encouraged.

Task 0: High resolution mapping, sequencing, and expression

Overview: From a rough map position, refine the critical interval via (virtual) high resolution mapping with additional markers. Query the critical interval in the zebrafish genome for potential candidate genes. Locate expression patterns online for these candidates. Design primers to sequence candidate genes for the mutagenic lesion or for additional SNPs to use in mapping.

Task 1: Create a transgenic reporter line by cloning candidate enhancer/promoter sequences

Overview: Identify the translational start site of a gene of interest. Obtain ~6kb of sequence upstream of this site. Design PCR primers that will amplify this region, and clone it in-frame with GFP in a tol2 expression vector. Identify BACs for use in creating reporter constructs via homologous recombination. Identify evolutionarily conserved sequences from other organisms to uncover potential regulatory regions around your gene of interest.

Task 2: Expression, Rescue, and Phenocopying

Overview: Identify the zebrafish ortholog of your favorite gene. Find its location in the genome, locate the translational start site (ATG), and identify the exon-intron boundaries. Design two 25-mer morpholino sequences that target (1) the ATG and (2) an exon-intron boundary. Identify an orthologous gene in another fish species for use in rescue experiments to control for morpholino specificity. Align this sequence with your morpholinos to determine the degree of potential activity. Obtain a full-length clone of the zebrafish gene (via RT-PCR or clone collections) for use in overexpression experiments or expression analyses via in situ hybridization. Identify potential CRISPR targets within your gene.

Task 3: Batch BLAST and parsing with Python to identify zebrafish transcripts related to a specific signaling pathway

Overview: Mine OMIM (Online Mendelian Inheritance in Man) for genes related to Hedgehog signaling. Obtain amino acid sequences for these genes, and identify putative zebrafish orthologs for these proteins via BLAST. Use a simple script to parse the BLAST results to see where the genes are located in the zebrafish genome. Finally, find out where a few of these genes are expressed (via ZFIN).

Requirements: Terminal, Python (both native on MacOSX)

Task 4: Visualization of enriched motifs in putative promoter / enhancer regions.

Overview: From a file of unidentified sequences derived from a transcriptome profiling experiment, identify the best matching Ensembl transcript via local BLAST, batch download potential promoter sequences for each of these transcripts, search through these promoter sequences for enriched motifs, and visualize the location of the motifs on the promoters (just a bit advanced).

Requirements: Terminal, Python, Matplotlib (all native on MacOSX)

Protocols:

Task 0: High resolution mapping, sequencing, and expression

1. The mutation we mapped in lab is flanked by SSLP/Z markers Z11119 and Z15270. Your first job will be to view the region of the genome that is flanked by these two markers. Within this region, you can identify candidate genes and find additional markers that can be used to refine your map position, and thereby narrow the critical interval in the genome in which to look for the gene that is disrupted in the mutation.

Start at ZFIN.

Enter ‘Z15270’ in the box at the top. On the page that follows, hit the link, and then hit the link to GenBank. On the GenBank page, click FASTA and copy the sequence onto the clipboard of the computer. To find the location of this marker in the genome, we’ll go to the zebrafish genome browser hosted by EMBL:

Follow the link for ‘BLAT’, paste the sequence of this marker into the window, select ‘Danio_rerio’ from the species menu, and click ‘RUN’. On the following page, click the link for the best matching chromosome region. This takes you to a view of the genome, centered on this map marker. Note the physical location of this marker (the numbers in the genome window). Zoom out a bit to get a sense of the genomic region.

Repeat the above steps for Z11119. How many hits in the genome do you obtain for Z11119? What does this mean? Choose the ‘best’ alignment. Where in the genome is your mutation likely located (answer in terms of numbers)?

2. Now let’s look at a candidate gene near one of your map markers. Find primer sequences for one of these genes (calca). From your browser window for Z15270 in #1, locate calca and click on it. This takes you to the Ensembl page for this gene. Click on the ZFIN link, which takes you to the ZFIN page for this zebrafish sequence. Scroll down to the RefSeq link under ‘Sequence Information’ and follow it. Locate the ‘FASTA’ link and click it, which takes you to a page where the sequence is located. Copy this sequence to the clipboard on your computer.

Now go to one of many websites for primer design:

Paste in your sequence and select a length of 500-550 (the comfortable limit for sequencing PCR products). Hit ‘Pick Primers’ and retrieve your primer sequences. The next step would be to amplify gDNA from wildtype and mutant embryos via PCR, sequence the PCR products, identify sequence differences, and use this information to test for linkage between this gene and your mutation. We’ll go over how to do that in more detail below.

3. You collect hundreds of mutants for use in a high-resolution mapping panel, and test them for linkage to numerous markers from your region. You find that the SSLP Z15270 is the marker that is most tightly linked to your mutation, but some recombinants remain. Query the zebrafish genome assembly to see a model of your region of interest (the assembly is pretty good on a large scale, but can be misleading in a local region). Go to the Ensembl website.

Find the genomic location of Z15270 as above. Click on the “Configure this page” link on the left hand side of the page. Here you’ll find all sorts of ‘tracks’ you can turn on and off to show different kinds of information. Try turning some additional features on. Save and close the configuration window by hitting the checkmark in the upper right, and zoom out in the browser as far as is allowed.

4. Exploring the genomic region – what do these genes do? Click on some of the genes found in the region, taking you to the gene record page. Find and click the ‘orthologues’ link on the left hand side of the page for each gene. What kind of gene is PDE3B?

5. Go back to the genomic view. Can you get a link to ZFIN for any of these genes? Click on rras2, and follow through to ZFIN. Follow the link for Expression Data. Your mutant has defects in muscle specification – is the expression pattern of rras2 consistent with a role in muscle?

6. You decide to sequence rras2 in wildtype and mutant embryos to see if (1) you can find a SNP to map, which could rule this gene out via recombination, and (2) you can find a change in the mutant sequence that might cause a loss-of-function phenotype. Design primers that will amplify a 600 bp PCR product that contains the first exon of rras2.

Find the rras2 entry in ZFIN (you are probably already there in step #5). Go to the ZFIN homepage:

Click on Genes/Markers/Clones and enter rras2. On the ZFIN gene page, scroll down and follow the link to the RefSeq RNA record. Scroll down and note the coordinates of the coding sequence (CDS) in the entry. Copy the coding sequence onto the clipboard.

Go to the UCSC genome browser (you can also do this on the Ensembl browser, but the UCSC interface is a bit friendlier for this task):

Click on the BLAT tab, and paste in your sequence. Select “Zebrafish” from the Genome pulldown menu, and click “Submit”. Follow the link for “details” on the first BLAT hit. Scroll up and down to check your results – what do the different color-codings mean in your sequence?

Select about 600 b of genomic sequence from which to design primers, then head to the primer3 website:

Paste in your sequence, choose a size range of 500-600b (about the limit of a sequence trace from a PCR template), and click “Pick Primers”.
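Once you have candidate primers, a few quick checks by hand can catch obvious problems. The following is a minimal sketch, not part of the Primer3 workflow: the example 20-mer is hypothetical (it is not Primer3 output), and the melting temperature uses the simple rule of thumb Tm = 2(A+T) + 4(G+C), which is only a rough approximation compared with the thermodynamic model used by primer design software.

```python
# Rough primer sanity checks; a sketch only, not a substitute for Primer3.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def rule_of_thumb_tm(seq):
    """Approximate Tm in degrees C: 2 per A/T plus 4 per G/C (rough estimate)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 20-mer used purely for illustration.
example_primer = "ATGGCTGGCTGGAAGGACGG"

print("Reverse complement:", reverse_complement(example_primer))
print("GC content (%):", gc_percent(example_primer))
print("Approximate Tm (C):", rule_of_thumb_tm(example_primer))
```

The reverse complement is handy when you need to write a reverse primer from the top-strand sequence shown in a genome browser.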

7. You PCR-amplify genomic DNA from wildtype and mutant embryos, and sequence the PCR products. The sequencing results are as follows:

wildtype_rras2_exon1

AGGCGGGAGTGTGAGCGCGCGCCCCCTCGCGCCCGCCGCGCGCACTGCCAGCACTGATTAGCCGTATCTTCCCCTCATCTTGCAGCACAGGCAGTCAGTCAGTGCCTGGTAGCGATTTGGACGAGGGCGTATGGACTTGAAGCAGCAGTGTATGCATTTCCCACAGACTGTGGTCGTACTTTTCTCCTGTCGGACGGATTACCACTGAGTTGACACATAGCCCAAAAGCCGCTTCGCATTTTTTCCGCTGCATTTCTCTAACTGAAGGCCTGTCACAGAGTAAAGTGGCTCGGTGTGCGTGTGTTTAGACAGCGGAGCGAGAGCAGCAGTGTGTCCCCGATGGCTGGCTGGAAGGACGGCTCAGTGCAGGAGAAATATCGCCTGGTGGTCGTCGGAGGTGGTGGCGTCGGAAAATCAGCGTTAACCATCCAGTTTATCCAGGTAAGCGGATACATGGCGGAATGTTATGTGGTTTTCGGCCCTTTAAAAAGATGTGAGGGTGTTGAGGAGAAATGCGTGGATCTTGCTCACAGAAATGGGGACCCCATGAGCGGAAAAGGGGGTTCAGGAATCCAAGCTAGGCCTGCGACACTTTAAACC

mutant_rras2_exon1

AGGCGGGAGTGTGAGCGCGCGCCCCCTCGCGCCCGCCGCGCGCACTGCCAGCACTGATTAGCCGTATCTTCCCCTCATCTTGCAGCACAGGCAGTCAGTCAGTGCCTGGTAGCGATTTGGACGAGGGCGTATGGACTTGAAGCAGCAGTGTATGCATTTCCCACAGACTGTGGTCGTACTTTTCTCCTGTCGGACGGATTACCACTGAGTTGACACATAGCCCAAAAGCCGCTTCGCATTTTTTCCGCTGCATTTCTCTAACTGAAGGCCTGTCACAGAGTAAAGTGGCTCGGTGTGCGTGTGTTTAGACAGCGGAGCGAGAGCAGCAGTGTGTCCCCGATGGCTGGCTGGAAGGACGGCTCAGTGCAGGAGAAATATCGCCTGGTGGTCGTCGGAGGTGGTGGCGTCGGAAAATCAGCGTTAACCATCCAGTTTATCCAGGTAAGCGGATACATGGCGGAATGTTATGTGGTTTTCGGCCCTTTAAAAAGATGTGAGGGTGTTGAGGAGAAATGAGTGGATCTTGCTCACAGAAATGGGGACCCCATGAGCGGAAAAGGGGGTTCAGGAATCCAAGCTAGGCATGCGACACTTTAAACC

You wish to know if these sequences harbor any polymorphisms, and whether you can use these polymorphisms to facilitate your high resolution mapping. Align the two sequences via BLAST2:

Follow the link for ‘nucleotide blast’, and check the box for ‘Align Two or More Sequences’. Note the points at which the two sequences differ.
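For two reads of identical length that differ only by substitutions, the same comparison can be sketched in a few lines of Python. This is an illustration rather than a replacement for BLAST: it assumes there are no insertions or deletions (BLAST handles those, this does not), and the two short strings are excerpts copied from the wildtype and mutant sequences above. To compare the full reads, paste them in place of the excerpts.

```python
def find_mismatches(seq_a, seq_b):
    """List (1-based position, base in seq_a, base in seq_b) for each difference.

    Assumes a gap-free comparison: both sequences must be the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences differ in length; use an aligner such as BLAST.")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


# Short excerpts from the wildtype and mutant rras2 reads shown above.
wt_excerpt = "GAGAAATGCGTGGATCTTGCTC"
mut_excerpt = "GAGAAATGAGTGGATCTTGCTC"

for position, wt_base, mut_base in find_mismatches(wt_excerpt, mut_excerpt):
    print(f"Position {position}: {wt_base} (wildtype) -> {mut_base} (mutant)")
```

Positions are reported relative to the start of whatever strings you pass in, so for the excerpts they are offsets within the excerpt, not within the full read.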

Next, you’d like to see if the polymorphisms can be distinguished via restriction digest. Paste about 40b of wildtype and mutant sequence flanking the SNP into the dCAPS website, leaving the “mismatches” field blank.

Are there enzymes available that will cut wildtype but not mutant sequence (or vice versa)? If a SNP does not create a restriction site polymorphism, try entering “1” in the mismatch field – what does this accomplish?
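The idea behind this check can be sketched as a simple search for recognition sites in each allele. The enzyme table below is a tiny hand-picked set of common six-cutters chosen for illustration (dCAPS searches a far larger collection and can also design mismatched primers), and the two strings are excerpts copied from the 3' end of the wildtype and mutant reads above.

```python
# A small, hand-picked set of recognition sites, for illustration only.
# All of these sites are palindromic, so scanning one strand is sufficient.
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "StuI": "AGGCCT",
    "SphI": "GCATGC",
}


def enzymes_cutting(seq):
    """Return the set of enzymes whose recognition site occurs in seq."""
    seq = seq.upper()
    return {name for name, site in ENZYME_SITES.items() if site in seq}


def differential_enzymes(wt_seq, mut_seq):
    """Return (enzymes cutting only wildtype, enzymes cutting only mutant)."""
    wt_cutters = enzymes_cutting(wt_seq)
    mut_cutters = enzymes_cutting(mut_seq)
    return sorted(wt_cutters - mut_cutters), sorted(mut_cutters - wt_cutters)


# Excerpts flanking the SNP near the 3' end of the two reads above.
wt_flank = "AATCCAAGCTAGGCCTGCGACACTTTAAACC"
mut_flank = "AATCCAAGCTAGGCATGCGACACTTTAAACC"

wt_only, mut_only = differential_enzymes(wt_flank, mut_flank)
print("Cuts wildtype only:", wt_only)
print("Cuts mutant only:", mut_only)
```

An enzyme that appears in only one of the two lists is a candidate for a restriction-based genotyping assay, provided it does not also cut elsewhere in your PCR product.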

8. Finally, do the SNPs result in changes in the coding sequence for rras2? Try BLASTing the mutant sequence (from #7 above) vs. the amino acid sequence (from the GenBank/NCBI page from #6), using ‘Align Two or More Sequences’ and BLASTX.
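What BLASTX does conceptually is translate the nucleotide sequence and compare the result with the protein. A minimal sketch of that translation step, using the standard genetic code, follows. It makes several assumptions: the reading frame is taken to begin at the ATG shown (which should be checked against the CDS annotation in the RefSeq record), the wildtype string is a short excerpt from the read above, and the two variant strings are hypothetical examples invented to show a silent and a missense change. They are not the SNPs observed in the mutant.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; any trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_changes(wt_dna, mut_dna):
    """List (1-based residue, wildtype aa, mutant aa) where translations differ."""
    pairs = zip(translate(wt_dna), translate(mut_dna))
    return [(i + 1, a, b) for i, (a, b) in enumerate(pairs) if a != b]


# Excerpt from the wildtype read above, starting at the assumed start codon.
wt_cds_start = "ATGGCTGGCTGGAAGGACGGCTCAGTGCAG"
# Hypothetical variants for illustration only (not the observed SNPs).
silent_variant = "ATGGCTGGCTGGAAGGACGGCTCAGTGCAA"
missense_variant = "ATGGCTGGCTGGAAGGACGGCTCAGTGCAC"

print("Wildtype peptide:", translate(wt_cds_start))
print("Silent variant changes:", amino_acid_changes(wt_cds_start, silent_variant))
print("Missense variant changes:", amino_acid_changes(wt_cds_start, missense_variant))
```

A SNP that falls outside the translated region, for example in an intron, will not show up in this comparison at all, which is itself informative.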

Task 1: Create a transgenic reporter line by cloning candidate enhancer/promoter sequences

1. Eventually you identify the mutation as a lesion in the gene scube2. You wish to analyze the morphogenetic movements of cells expressing this gene during development in live embryos. To accomplish this, you decide to make a GFP reporter line that reflects the endogenous expression of this gene. First you decide to try a quick-and-dirty approach: you plan to clone genomic sequences upstream of the translational start site (ATG) of this gene and put them into a tol2 GFP expression vector.

Locate this gene in the genome and retrieve the coding sequence: go to the ZFIN homepage, and enter scube2 in the search box at the top of the page.

Follow the “gene” link to the ZFIN record for this gene, and scroll down the page. Where (which chromosome) does ZFIN say this gene is located?

2. Next, you want to retrieve the nucleotide sequence of this gene to (1) compare it with the genomic sequence, and (2) identify the translational start site. Scroll down the ZFIN page until you find the link for “RNA”. Follow this to the RefSeq record for this gene. Scroll down to the sequence information at the bottom of the page. Where does the coding sequence (CDS) begin and end within the complete mRNA transcript? Find the ATG in the nucleotide sequence. Beginning at the ATG, copy about 100 b of nucleotide sequence to the clipboard and head to the Ensembl Genome Browser for Zebrafish.


Enter ‘scube2’ into the search box. On the resulting page, click on “Location.” In which direction is the gene transcribed (i.e., which strand is the coding strand)?

By high-resolution genetic mapping, you localized the SSLP Z15270 to be 0.1 cM from the mutation in scube2. Z15270 is on chromosome 7 at about 27,488,000. The genetic map length of the zebrafish genome is 3000 cM total, and the total physical length of the genome is 1.7 x 10^9 bp. Is the actual physical (basepair) distance between Z15270 and scube2 surprising? What factors might account for any differences in expected distance?
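The expected distance can be worked out with simple arithmetic, sketched here using only the numbers given above. The calculation assumes that recombination is uniform across the genome, which is exactly the assumption the question asks you to scrutinize.

```python
GENETIC_MAP_LENGTH_CM = 3000   # total genetic map length, from the text above
GENOME_SIZE_BP = 1.7e9         # total physical genome length, from the text above


def expected_bp(distance_cm):
    """Expected physical distance for a genetic distance, assuming a uniform
    genome-wide average of base pairs per centimorgan."""
    return distance_cm * GENOME_SIZE_BP / GENETIC_MAP_LENGTH_CM


average_bp_per_cm = expected_bp(1.0)
expected_distance = expected_bp(0.1)

print(f"Genome-wide average: about {average_bp_per_cm:,.0f} bp per cM")
print(f"Expected distance for 0.1 cM: about {expected_distance:,.0f} bp")
```

Compare this estimate with the distance you measure in the genome browser between Z15270 and scube2.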

Zoom in and move the window so that the first exon encompasses the entire view (you can do so by drawing a rectangle around the first exon or by pressing the < and > buttons). Resize the window to include about 5 kb of upstream sequence (just add 5000 to the righthand number in the location box). Would grabbing 5 kb of upstream sequence be a good idea to make a reporter construct for scube2? Why or why not?
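Which number you change in the location box depends on the strand: "upstream" means lower coordinates for a forward-strand gene and higher coordinates for a reverse-strand gene. The sketch below makes that explicit. The coordinates in the example are hypothetical placeholders, not the real position of the scube2 first exon, so substitute the values you read from the browser.

```python
def upstream_window(exon_start, exon_end, strand, flank=5000):
    """Return (start, end) covering the first exon plus `flank` bp upstream.

    strand is +1 for a forward-strand gene and -1 for a reverse-strand gene.
    Coordinates are 1-based and start <= end, as in a genome browser.
    """
    if strand == 1:
        # Upstream lies at lower coordinates; do not run off the chromosome start.
        return max(1, exon_start - flank), exon_end
    if strand == -1:
        # Upstream lies at higher coordinates.
        return exon_start, exon_end + flank
    raise ValueError("strand must be +1 or -1")


# Hypothetical first-exon coordinates, for illustration only.
example_start, example_end = 100000, 100300

print("Forward-strand gene:", upstream_window(example_start, example_end, 1))
print("Reverse-strand gene:", upstream_window(example_start, example_end, -1))
```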

You decide to retrieve all intergenic sequence and test various parts of it for enhancer activity. First, resize the browser window to just include this intergenic sequence. Click the link for “export data” on the left hand side of the page. Pull down ‘soft’ repeat masking in the genomic FASTA options, and hit next. Then click the ‘text’ link to get the sequence.

Copy the DNA on to the clipboard, then go to the Primer3 website to design primers, trying to get as much of the input sequence as possible into the PCR product.

To clone this bit of DNA, you would add appropriate restriction enzyme (or Gateway, or SLIC, or PIPE, or Gibson) sequences to the primers, PCR amplify, and hop into your favorite GFP expression vector.

3. You successfully make this vector and inject it into 1-cell stage embryos. The GFP expression in injected fish (a.k.a. ‘transient transgenics’) is promising – the pattern of GFP expression in a few fish roughly matches what is observed via in situ hybridization. In addition, many other tissues express GFP. Encouraged by this result, you raise the embryos to adulthood and cross them to identify founders. You identify ten founders, but none of your lines express GFP in a pattern consistent with the in situ data: expression in some tissues is absent, and many tissues express GFP where the gene is not normally expressed. How might you explain these results?

You decide to make a new reporter line by BAC recombination: you will obtain a large (~200kb) chunk of genomic DNA that contains this gene, and replace the first exon of your target gene with GFP. Why might this strategy result in GFP expression that more accurately recapitulates the endogenous expression pattern?

You can use at least two approaches to identify a BAC that contains your favorite gene: (1) directly from the Ensembl genome browser, (2) via a BLAST search at NCBI.

3a. Go to the Ensembl home page for zebrafish:

Enter “scube2” in the search box and click “Go.” Follow the link for “Location”. Look at the “Location” pane in the browser page – what is written in the blue bar in the center of the page? If a region of the assembly is represented by a sequenced BAC, there will be a GenBank accession number (e.g., AL845363) in this blue bar. By contrast, if the region is represented by whole-genome shotgun traces, you will see something like “Zv9_scaffold12345” in the middle bar.

Turn on the BAC ends track (if not already on) by clicking “Configure this page” (Simple Features) on the left hand side. Check the boxes for CHXXX and DKEYXXX (where X = a series of numbers), and hit the check mark in the upper right corner. Zoom out until you can see connected BAC ends (represented by horizontal blue bars). Are there any good options for BACs that contain the scube2 coding sequence and putative regulatory regions?

3b. Another way to search for a BAC is via a BLAST query at NCBI/GenBank. Retrieve the GenBank accession number for scube2 again from ZFIN, then go to the NCBI BLAST homepage:

Click “nucleotide blast”, enter the accession number in the search box, select “nr” from the pulldown menu, and type “Danio rerio” in the organism box. Hit BLAST. On the results page, genome sequence will be annotated as “Zebrafish DNA sequence from clone….” Are there any BAC clones that cover the entirety of the scube2 sequence? You next decide to align the coding sequence with one BAC sequence to check for overlap. Note the accession number of the BAC, and go to the BLAST2 page:

=> select ‘nucleotide blast’ and click the ‘Align two sequences’ box

Enter the accession number for the coding sequence in the top box, and for the BAC in the bottom box, and hit “Align”. Where does the coding sequence (i.e., the query) begin and end in the BAC sequence? Hit the ‘Dot Matrix’ view for a graphical look.
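Because a spliced coding sequence aligns to genomic DNA as a series of separate exon-sized hits, it can be convenient to summarize the overall span from a tabular hit table. The sketch below assumes the standard 12-column tab-delimited BLAST layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The identifiers and coordinates in the example are hypothetical, made up purely to show the parsing; they are not real scube2 or BAC results.

```python
def subject_span(tabular_text, subject_id):
    """Return (lowest, highest) subject coordinate covered by hits to subject_id.

    Expects 12-column tab-delimited BLAST output. Minus-strand hits, where
    subject start > subject end, are handled by taking the min and max.
    Returns None if no hit matches subject_id.
    """
    coords = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if fields[1] != subject_id:
            continue
        coords.extend((int(fields[8]), int(fields[9])))
    return (min(coords), max(coords)) if coords else None


# Hypothetical hit table: three exon-like hits on the minus strand of one clone.
example_rows = [
    ["cds_query", "bac_clone", "100.00", "120", "0", "0", "1", "120", "90500", "90381", "1e-60", "222"],
    ["cds_query", "bac_clone", "99.00", "200", "2", "0", "121", "320", "85200", "85001", "1e-95", "350"],
    ["cds_query", "other_clone", "88.00", "80", "9", "1", "10", "89", "400", "479", "1e-10", "60"],
    ["cds_query", "bac_clone", "100.00", "150", "0", "0", "321", "470", "60150", "60001", "1e-75", "278"],
]
example_table = "\n".join("\t".join(row) for row in example_rows)

print("Span on bac_clone:", subject_span(example_table, "bac_clone"))
```

If the span sits close to either end of the BAC, the clone may lack flanking regulatory sequence on that side.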

The next steps would involve creating a targeting vector for homologous recombination. In this case, you could use ET recombination (or another method) to replace the first exon with GFP (or whatever you’d like), and also modify the BAC with tol2 transposon terminal inverted repeat sequences. BACs can be ordered from two sources, depending on the library: