Group / Library / Total reads / Aligned to the genome with <5 matches / 3P tags / Unique 3P tag positions
Core 3P-Seq libraries / 24 hpf / 22,448,667 / 13,179,736 / 7,423,152 / 231,725
72 hpf / 20,914,473 / 17,818,974 / 9,813,481 / 314,462
Adult / 18,445,044 / 14,743,023 / 7,470,435 / 509,715
Ovary / 17,901,955 / 13,858,697 / 9,233,032 / 586,782
Brain / 16,447,759 / 12,783,999 / 7,905,266 / 616,960
Testis / 17,606,640 / 12,658,140 / 7,621,157 / 870,893
Pre-MZT / 17,002,299 / 11,900,412 / 6,805,611 / 954,595
Post-MZT / 17,219,169 / 10,983,711 / 6,281,147 / 589,195
Additional 3P-Seq libraries / Pre-MZT #2 / 19,613,993 / 13,712,359 / 7,711,764 / 1,020,116
Pre-MZT #3 / 64,220,780 / 44,943,690 / 24,534,399 / 1,736,839
1 cell / 20,234,734 / 12,787,110 / 8,736,478 / 271,239
4 hpf / 17,610,292 / 12,841,981 / 7,423,152 / 476,249

Supplemental Tables

Supplemental Table 1. 3P-Seq read statistics.

A unique 3P tag position is a set of start and end coordinates mapped to by at least one 3P tag.
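The distinction between 3P tags and unique 3P tag positions can be illustrated with a minimal sketch. It assumes that each tag is represented as a (chromosome, strand, start, end) tuple; the table note only mentions start and end coordinates, so including chromosome and strand in the key, as well as the function name, is an assumption made for illustration.

```python
from collections import Counter


def summarize_tags(tags):
    """Return (number of 3P tags, number of unique 3P tag positions).

    tags: iterable of (chrom, strand, start, end) tuples, one per 3P tag.
    The tuple layout is a hypothetical representation, not the authors' format.
    """
    tags = list(tags)
    # Each distinct coordinate tuple is one unique position, however many tags map to it.
    position_counts = Counter(tags)
    return len(tags), len(position_counts)


example_tags = [
    ("chr1", "+", 100, 130),
    ("chr1", "+", 100, 130),  # same position as the first tag
    ("chr1", "+", 101, 130),
    ("chr2", "-", 50, 80),
]
n_tags, n_positions = summarize_tags(example_tags)
```

In this toy input, four tags collapse to three unique positions.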

Supplemental Table 2: Poly(A) sites identified in this study and their assigned categories (available as a separate Excel file).

Supplemental Table 3: 3P-Seq–based 3'UTR annotations (available as a separate Excel file). The table contains, for each protein-coding gene cluster analyzed, all the identified 3'UTRs, along with their classifications and the number of supporting 3P tags.

Supplemental Table 4: The longest 3'UTR assigned to each gene cluster (available as a separate Excel file)

For each cluster of overlapping known or predicted transcript models with a single annotated stop codon, and at least one predicted 3'UTR, the longest transcript was selected and reported together with the longest 3'UTR supported by at least 10% of the reads in at least one library. All the exons in the reconstituted transcript are reported along with the boundaries of the coding sequence.
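The 10% support rule can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that "10% of the reads" refers to the fraction of the gene's 3P tags in a given library, and the dictionary layout and function name are hypothetical.

```python
def longest_supported_utr(utrs, min_fraction=0.10):
    """Pick the longest 3'UTR supported by >= min_fraction of tags in any library.

    utrs: list of dicts with 'length' (nt) and 'tags' (library name -> tag count).
    Assumption: support is measured relative to all tags of the gene in that library.
    """
    libraries = {lib for utr in utrs for lib in utr["tags"]}
    totals = {lib: sum(utr["tags"].get(lib, 0) for utr in utrs) for lib in libraries}
    supported = [
        utr
        for utr in utrs
        if any(
            totals[lib] > 0 and utr["tags"].get(lib, 0) / totals[lib] >= min_fraction
            for lib in libraries
        )
    ]
    # Return None when no isoform passes the support threshold.
    return max(supported, key=lambda utr: utr["length"]) if supported else None


example_utrs = [
    {"length": 500, "tags": {"ovary": 90, "brain": 50}},
    {"length": 1500, "tags": {"ovary": 9, "brain": 50}},
    {"length": 3000, "tags": {"ovary": 1, "brain": 0}},
]
chosen = longest_supported_utr(example_utrs)
```

Here the 3,000-nt isoform is never above 10% in any library, so the 1,500-nt isoform, which reaches 50% in the brain, is selected.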

Supplemental Table 5: Extensions of the amino acid sequences of annotated genes (available as a separate Excel file)

The extensions are based on translation of the sequence spanning the annotated stop of the gene and the first stop codon that appears in the same frame upstream of the longest 3'UTR end. Only genes in which the annotated end of the transcript occurs within the coding sequence were considered. Only sequences extended by at least three amino acids are listed. Cases where the extended sequence overlapped a gap in the genome assembly were excluded from the analysis.
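A minimal sketch of the extension procedure is given below. It assumes that the input is the DNA sequence starting in frame immediately after the annotated end of the coding sequence and running to the end of the longest 3'UTR, and that assembly gaps appear as N bases; the function name and these input conventions are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def protein_extension(downstream, min_aa=3):
    """Translate in frame until the first stop codon.

    Returns the extension peptide, or None if no in-frame stop is found,
    the extension is shorter than min_aa, or a non-ACGT codon (e.g. a gap) is hit.
    """
    downstream = downstream.upper().replace("U", "T")
    peptide = []
    for i in range(0, len(downstream) - 2, 3):
        aa = CODON_TABLE.get(downstream[i:i + 3])
        if aa is None:  # codon overlaps a gap or ambiguous base
            return None
        if aa == "*":
            return "".join(peptide) if len(peptide) >= min_aa else None
        peptide.append(aa)
    return None  # reached the 3'UTR end without an in-frame stop


extension = protein_extension("GCTAAAGGTTAACCC")
```

The example translates GCT, AAA and GGT and then stops at TAA, giving a three-residue extension.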

Supplemental Table 6: Correlations between transcript levels estimated using 3P-Seq and RNA-Seq

3P-Seq Sample / RNA-Seq sample (source) / Spearman correlation
Pre-MZT / 2 cells (ERP000400) / 0.655
Post-MZT / 6 hpf (ERP000400) / 0.693
24 hpf / 24 hpf (ERP000016) / 0.628
72 hpf / 72 hpf (ERP000016) / 0.525
Ovary / Ovary (ERP000016) / 0.773
Brain / Average of male brain and female brain (ERP000016) / 0.716
Adult / Average of adult male and adult female brain (ERP000016) / 0.545
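For reference, a Spearman correlation such as those listed above is the Pearson correlation of the ranks of the two measurements. The sketch below is a plain-Python illustration with tied values given their average rank; the software actually used for the table is not stated, and the per-gene values in the example are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene values: 3P tag counts and RNA-Seq expression estimates.
tag_counts = [5, 40, 12, 300]
rnaseq_levels = [0.8, 9.5, 2.1, 51.0]
rho = spearman(tag_counts, rnaseq_levels)
```

Because only ranks enter the calculation, the measure is insensitive to the different scales of tag counts and RNA-Seq expression estimates.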

Supplemental Table 7. Number of genes with differences in poly(A) site usage (available as a separate Excel file)

For each pair of samples, the number of distinct genes with conspicuous differences in usage of alternative poly(A) sites is shown.

Supplemental Table 8. qRT-PCR Primer sequences.

Gene / Constant region primers / Alternative region primers
rilpl2 / CGTTTCGACCGAAATGATACGTCTGC; TCCCGTCTCTCCTCACTCAGTTCC / TGCTGTAGGCAGCAATGTGTAGCA; GCCAGTGGTAATCAATCGACAGCAGG
dgcr8 / AGTCTGCTCAACCTGGCAGT; TTTGCACTGGAAATGAATGC / ACTGCAGAGAAAAGGCAAGC; CTGAAACGATGGGGACAACT
pum2 / AGTGGGCTTTGATGAAGGTG; GGAAACCACCTCTCTGGTGA / CATTTTGACACGTGGCAGTT; ATGTCCATGGAGGCAGGTAG
syt13 / ACTGAACCTGGCCAGCCCCG; GCAGGCCAGTGCCGGATGTC / ACCAACCATTCCCCAAGCCAAACT; TCCCTGCAGTTGTCAGGCATTTGT
peli1 / TGGAGTCAAATCCCACTTCC; ACTGGCCCCTGGAAGATTAG / GGCTTCCAGAGATCGTTCAG; GGTGGTCAGGATGGAGAAAA

Supplemental Figures

Supplemental Fig. 1. Pipeline for annotation of Poly(A) sites.

Supplemental Fig. 2. Enrichment of U- and GU-rich sequences around cleavage sites in zebrafish, human and C. elegans. In each case, the frequencies within 50 nt of a poly(A) site were normalized by the median frequency across all positions. The cleavage position is marked by a black arrow. (A) Normalized frequencies of U around the poly(A) sites in zebrafish, human and C. elegans. Human poly(A) sites were all the annotated 3' ends of protein-coding genes in Ensembl v66; C. elegans poly(A) sites were taken from (Jan et al., 2011). (B) Normalized frequencies of GU, CU and AU around zebrafish poly(A) sites. (C) Normalized frequencies of GU dinucleotides around zebrafish, human and worm poly(A) sites.
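The normalization described in this legend can be sketched as follows, assuming that the input is a list of equal-length RNA windows aligned on the cleavage site. The function name and input format are hypothetical, and the sketch does not handle a median of zero.

```python
from statistics import median


def normalized_base_frequency(windows, base="U"):
    """Per-position frequency of `base`, divided by the median frequency over positions.

    windows: equal-length sequences aligned on the cleavage site (assumed input format).
    """
    n = len(windows)
    length = len(windows[0])
    freqs = [sum(w[i] == base for w in windows) / n for i in range(length)]
    med = median(freqs)  # a zero median would need special handling
    return [f / med for f in freqs]


toy_windows = ["UAUA", "UAAU", "UUAA"]
profile = normalized_base_frequency(toy_windows)
```

In the toy input, U occurs at the first position in every window and in one third of windows elsewhere, so the first position has a normalized frequency of about 3 and the others about 1.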

Supplemental Fig. 3. MicroRNA target site density in zebrafish and human 3'UTRs. Frequencies of 7mer miRNA target sites (non-overlapping 7mer-m8 or 7mer-A1, defined as in (Bartel, 2009)) in the non-repetitive fraction of the zebrafish and human 3'UTRs. Each point is a microRNA family conserved between zebrafish and human. The families are color-coded based on the number of A or U bases in the 7mer-m8 site. For several examples, the microRNA name and the 7mer-m8 site sequence are shown.
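As an illustration of the site definitions, the sketch below derives the 7mer-m8 site (complementary to miRNA nucleotides 2-8) and the 7mer-A1 site (complementary to nucleotides 2-7, followed by an A) from a miRNA sequence and counts non-overlapping matches in a 3'UTR. The greedy left-to-right scan is one simple way to avoid counting overlapping sites twice and is an assumption, as are the function names; the miR-1 sequence is used only as an example.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna):
    """Return the (7mer-m8, 7mer-A1) target site sequences for a miRNA (5'->3')."""
    mirna = mirna.upper().replace("T", "U")
    site_m8 = reverse_complement(mirna[1:8])        # pairs with nucleotides 2-8
    site_a1 = reverse_complement(mirna[1:7]) + "A"  # pairs with 2-7, then an A
    return site_m8, site_a1


def count_sites(utr, mirna):
    """Count non-overlapping 7mer sites with a greedy left-to-right scan."""
    utr = utr.upper().replace("T", "U")
    sites = seed_sites(mirna)
    count, i = 0, 0
    while i <= len(utr) - 7:
        if utr[i:i + 7] in sites:
            count += 1
            i += 7  # skip past the matched site so overlaps are not recounted
        else:
            i += 1
    return count


MIR1 = "UGGAAUGUAAAGAAGUAUGUAU"
m8, a1 = seed_sites(MIR1)
n_sites = count_sites("GGACAUUCCAGG", MIR1)  # an 8mer site, counted once
```

Dividing the count by the length of the non-repetitive 3'UTR sequence would give a site density comparable across species.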

Supplemental Fig. 4. Correlations of the number of 3P tags with expression levels and 3'UTR lengths. (A) Correlation between the number of 3P tags assigned to a gene in the ovary and expression levels computed using RNA-Seq data from the same tissue (SRA accession ERP000016). (B) Correlation between the number of 3P tags assigned to a gene in the brain sample and the average 3'UTR length in the brain sample. The 3'UTR length was computed by averaging the lengths of all the 3'UTRs expressed in the ovary. Only genes with a single annotated or predicted stop codon were used.

Supplemental Fig. 5. Alternative polyadenylation events analyzed with qRT-PCR. (A-E) Annotated gene models and 3P-tag clusters in ovary, pre-MZT and post-MZT embryos. Blue arrows and black arrows indicate the positions of the proximal and the distal qRT-PCR primer pairs, respectively. The RNA-Seq track shows a composite of reads from ten developmental stages and tissues (SRA accession ERP000016). (F) qRT-PCR results for peli1.

Supplemental Fig. 6. Differences in sequence composition between proximal and distal Poly(A) sites

(A) Nucleotide composition frequencies around proximal poly(A) sites that were used in the ovary. All the genes with only one annotated or predicted stop codon and a change of at least 30% in the usage of at least one poly(A) site between ovary and brain were used. (B) Nucleotide composition frequencies around distal poly(A) sites that were used in the brain. (C) Frequencies of matches of different motifs defined by polya_svm (Cheng et al., 2006). The motif logo for the CUE2 element is shown on the right. Only highly similar matches of motifs were considered (75th percentile of all possible positive scores). (D) Frequencies of different PAS-related motifs 10 to 40 bases upstream of proximal and distal poly(A) sites. (E) The fraction of 3P tags that could be assigned to intronic poly(A) sites in different libraries. Intronic poly(A) sites were poly(A) sites that were intronic with respect to at least one annotated isoform.

Supplemental Fig. 7: Alternative polyadenylation of CPA-related genes

The shown 3'UTR structure is as annotated in Ensembl. The height of the density plot indicates the number of 3P tags ending at each position in each library. RNA-Seq data is a composite of expression from ten samples (SRA accession ERP000016).

Supplemental Fig. 8: Further characterization of the uridine-rich motif associated with stable transcripts in the early embryo

(A) Comparison of the maximal number of uridines in a window of 15 consecutive bases up to 200 bases upstream of cleavage sites in stable and unstable transcripts. (B) Frequencies of different non-uridine bases at specific positions in the window containing the maximal number of uridines up to 200 bases upstream of the stable Poly(A) sites. Only maximal windows with at least 10 uridines were considered. The first base in the window was uridine by definition.
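The window statistic in panel (A) can be sketched as below. The sketch assumes that the input is the sequence upstream of the cleavage site, with the cleavage site at its 3' end, and it considers only windows that begin with a uridine, in line with the statement that the first base of the window is a uridine by definition. The function name and the tie-breaking rule (the 5'-most window wins) are assumptions.

```python
def max_uridine_window(upstream, window=15, span=200):
    """Find the window with the most uridines within `span` bases of the cleavage site.

    upstream: sequence ending at the cleavage site (assumed orientation).
    Returns (uridine count, window sequence), or None if no window starts with U.
    """
    region = upstream.upper().replace("T", "U")[-span:]
    best = None
    for start in range(len(region) - window + 1):
        if region[start] != "U":
            continue  # windows are defined to begin with a uridine
        segment = region[start:start + window]
        count = segment.count("U")
        if best is None or count > best[0]:
            best = (count, segment)
    return best


toy_upstream = "A" * 20 + "UUUUUUUUUUAUUUU" + "G" * 10
result = max_uridine_window(toy_upstream)
```

For the toy sequence, the best window contains 14 uridines and a single A, which would pass the threshold of at least 10 uridines used in panel (B).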

Supplemental References

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.

Cheng, Y., Miura, R.M., and Tian, B. (2006). Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics 22, 2320-2325.

Jan, C.H., Friedman, R.C., Ruby, J.G., and Bartel, D.P. (2011). Formation, regulation and evolution of Caenorhabditis elegans 3'UTRs. Nature 469, 97-101.