Supplementary Data Guide

Supplementary Tables

Supplementary Table S1 – List of the genes associated with clusters of uTSSs in S2 cells, embryos 0-16h and mES cells.

Experimental Methods

Cell Culture Conditions, Proliferation assay and RNAi

Luciferase Reporter assays

4sU RNA-seq

ChIP

Start-seq

MNase-seq

Sequencing and Data Analysis

Annotated and unannotated TSS Calling

TSS clustering based on Promoter Pol II half-lives upon Trp-treatment

Generation of Modified Transcript Annotations

Start-seq data analysis

Identification of Start-seq reads with non-templated 3’end residues

Sequencing, mapping and data analysis of ChIP-seq

4sU RNA-seq sequencing, mapping and data analysis

MNase-seq data analysis

Gene Ontology Analysis

Heatmaps and Metagene analysis

Sequence Content and Motif Analysis

Publicly Available Data

Supplemental References

Supplementary Tables

Supplementary Table S1 – List of the genes associated with clusters of uTSSs in S2 cells, embryos 0-16h and mES cells.

Experimental Methods

Cell Culture Conditions, Proliferation assay and RNAi

All Drosophila S2 cell culture was conducted at 26°C using cells from the DGRC in M3 medium supplemented with bactopeptone, yeast extract and 10% FBS. For all experiments, Spt5 RNAi was performed for 48h, and cells were harvested at a consistent density of 4-6 × 10^6 cells/ml, using the method described previously (Henriques et al. 2013). For the proliferation assay, cells were released from the flask, and viable (trypan blue-negative) cells were counted on the indicated days in culture. Cell numbers were used to calculate the total fold expansion over the entire time course. To assess RNAi depletion efficiency, protein extracts of Drosophila S2 cells were prepared in 2x Laemmli buffer on the indicated days of culture and probed with Spt5 (guinea pig, 1:3000) or TFIIS (rabbit, 1:1000) antibodies. Western blots shown in Supplementary Fig. S4A are representative of analyses performed in multiple experiments.

mESCs were derived from NELF-B wt/wt; CreER+/- and NELF-B Fl/Fl; CreER+/- mice on a C57Bl/6 background, following standard protocols (Williams et al. 2015). Where indicated, cells were treated with 100 nM 4OHT (Sigma) to recombine out the floxed NELF-B alleles (NELF-KO). mESCs were cultured at 37°C in 5% CO2 and passaged every 2 days. mESCs were grown in 2i conditions and maintained without feeders in knockout DMEM (KO-DMEM) with 15% knockout serum replacement (KOSR, Invitrogen), 1 mM sodium pyruvate (Millipore), 1% NEAA, 1% BME, 1% Pen/Strep, 1% Glutamax, 1000 U/ml ESGRO, 1 μM MEK inhibitor (PD0325901, Stemgent) and 3 μM GSK3 inhibitor (CHIR99021, Stemgent).

Bone marrow-derived macrophages were prepared from female 8- to 12-week-old C57BL/6 mice and maintained in L-cell conditioned medium at 37°C with 5% CO2 for the 7-day expansion, as described previously (Scruggs et al. 2015). All animal experiments were approved by the NIEHS Institutional Animal Care and Use Committee and were performed according to NIH guidelines for the care and use of laboratory animals.

Luciferase Reporter assays

S2 cells were transfected with modified pSTARR-seq Fly vectors (gift from A. Stark) in which firefly luciferase expression was driven by the sequence ±500 bp from the uTSS. Each enhancer drove luciferase expression to a level more than 2-fold higher than that of an enhancerless control construct. The vector pRL-ubi63E, expressing Renilla luciferase from the promoter of Drosophila ubiquitin 63E, was co-transfected to control for transfection efficiency. Twenty-four hours after transfection, cells were harvested, and lysates were assayed for firefly and Renilla luciferase activity using the Dual Luciferase Reporter Assay System (Promega). Results were plotted as the ratio of firefly to Renilla luminescence.
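The normalization and the >2-fold criterion amount to simple ratio arithmetic, sketched below in Python. This is a minimal illustration, not the analysis used here: it assumes the firefly/Renilla ratio is computed per well and then averaged, and that each construct is compared with the enhancerless control; the helper names and readings are hypothetical.

```python
from statistics import mean
from typing import Sequence


def normalized_activity(firefly: Sequence[float], renilla: Sequence[float]) -> float:
    """Mean firefly/Renilla ratio across paired wells (assumed per-well normalization)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return mean(f / r for f, r in zip(firefly, renilla))


def fold_over_control(test_ratio: float, control_ratio: float) -> float:
    """Activity of a uTSS construct relative to the enhancerless control."""
    return test_ratio / control_ratio


# Hypothetical luminescence readings (arbitrary units), three wells each.
ctrl = normalized_activity([1200, 1100, 1300], [10000, 9800, 10500])
utss = normalized_activity([5200, 4900, 5600], [9900, 10100, 10300])
fold = fold_over_control(utss, ctrl)
print(f"fold over enhancerless control: {fold:.2f}; >2-fold: {fold > 2}")
```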

4sU RNA-seq

Newly transcribed RNA from 5 independent replicates of Control and Spt5-depleted S2 cells, or from two distinct clones of Control and NELF-KO mES cells grown in 2i, was labeled for 10 minutes using 500 μM 4-thiouridine (Sigma, T4509). Total RNA was extracted with Trizol (Qiagen) and treated for 15 minutes with amplification-grade DNase I (Invitrogen) according to the manufacturer’s instructions. To purify metabolically labeled RNA, 300 μg of total RNA was used for the biotinylation reaction. Total RNA was separated into newly transcribed and untagged pre-existing RNA as previously described (Cleary et al. 2005; Windhager et al. 2012). Specifically, 4sU-labeled RNA was biotinylated using EZ-Link Biotin-HPDP (Pierce) dissolved in dimethylformamide (DMF) at a concentration of 1 mg/ml. Biotinylation was performed in labeling buffer (10 mM Tris pH 7.4, 1 mM EDTA) containing 0.2 mg/ml Biotin-HPDP for 2 h at 25°C. Unbound Biotin-HPDP was removed by chloroform/isoamyl alcohol (24:1) extraction using MaXtract (high density) tubes (Qiagen). RNA was precipitated with a 1:10 volume of 5 M NaCl and 2.5 volumes of ethanol and pelleted at 20,000 × g for 20 minutes. The pellet was washed with ice-cold 75% ethanol, centrifuged again at 20,000 × g for 5 minutes and resuspended in 1 ml RPB buffer (300 mM NaCl, 10 mM Tris pH 7.5, 1 mM EDTA). Biotinylated RNA was captured using Streptavidin MagneSphere Paramagnetic Particles (Promega). Before incubation with biotinylated RNA, streptavidin beads were washed 4 times with wash buffer (50 mM NaCl, 10 mM Tris pH 7.5, 1 mM EDTA) and blocked with 1% polyvinylpyrrolidone (Sigma) for 10 minutes with rotation. Biotinylated RNA was then incubated with 600 μl of beads with rotation for 30 min at 25°C. Beads were magnetically immobilized and washed 5 times with 4TU wash buffer (1 M NaCl, 10 mM Tris pH 7.5, 1 mM EDTA, 0.1% Tween 20). Unlabeled pre-existing RNA present in the supernatant was discarded. 4sU-labeled RNA was eluted twice with 75 μl of freshly prepared 100 mM dithiothreitol (DTT).
RNA was recovered from the eluates by ethanol precipitation as described above.

For library preparation, RNA quality was assessed using a Bioanalyzer Nano chip (Agilent). Ribosomal RNA was removed prior to library construction by hybridization to ribo-depletion beads carrying biotinylated capture probes (Ribo-Zero, Epicentre). RNA was then fragmented, and libraries were prepared with the TruSeq Stranded Total RNA Gold Kit (Illumina) using random hexamer priming. ERCC spike-ins were used for normalization (Williams et al. 2015; Lovén et al. 2012).

ChIP

For ChIP-seq, Drosophila S2 cells or macrophages were crosslinked for 10 minutes with 1% formaldehyde. ChIP material was prepared independently for each cell type as described previously (Muse et al. 2007). To ensure proper normalization (Orlando et al. 2014) between Pol II ChIP-seq of Control and Spt5-depleted cells, S2 cell and macrophage ChIP material were pooled in a 10:1 ratio (Drosophila to mouse), and immunoprecipitations were carried out with 12 μl of anti-Rpb3 (Drosophila) and 25 μl of total anti-Pol II antibody (H-224, Santa Cruz Biotechnology #SC-9001, mouse) per 7.5 × 10^6 cells. For the remaining ChIP-seq libraries, separate immunoprecipitations were performed with 22.5 μl anti-Cohesin (gift from D. Dorsett), 40 μl anti-H3K4me1 (Millipore #07-436), 30 μl anti-H3K4me3 (Millipore #07-473), 15 μl anti-H3K27ac (Abcam ab4729) and 45 μl anti-H3K36me3 (Abcam ab9050). Immunoprecipitated material was purified using the QIAquick PCR purification kit, and ChIP-seq libraries were prepared using the NEXTflex ChIP-seq kit (Bioo Scientific) according to the manufacturer’s instructions.

Start-seq

For Start-seq, Control and Spt5-depleted cells were grown as described above. Start-RNAs were prepared from two biological replicates (Spt5-dep.) as described before (Nechaev et al. 2010). In brief, approximately 5 × 10^7 S2 cells were collected by centrifugation. After washing with ice-cold 1x PBS, cells were swelled in 10 ml of Swelling Buffer (10 mM Tris pH 7.5, 10 mM NaCl, 2 mM MgCl2, 3 mM CaCl2, 300 mM sucrose, 0.5% Igepal, 5 mM dithiothreitol, 1 mM PMSF, protease inhibitors and SUPERase-IN RNase inhibitor (Ambion)) by incubation for 15 minutes on ice, followed by 14 strokes with a loose pestle. The dounced cells were spun for 5 minutes at 500 × g, the supernatant (cytoplasm) was discarded, and the pellet was resuspended in 30 ml of Swelling Buffer and spun as above. The supernatant was discarded, and the nuclear pellet was resuspended in 1 ml of Swelling Buffer, aliquoted and stored at -80°C. Libraries were prepared according to the TruSeq Small RNA Kit (Illumina). To normalize samples, 15 synthetic capped RNAs were spiked into the Trizol preparation at a specific quantity per 10^6 cells, as described previously (Henriques et al. 2013).

MNase-seq

Mouse ES cells were trypsinized from plates and crosslinked for 1 minute with 1% formaldehyde. Cells were washed once with ice-cold PBS, resuspended in 7 ml RSB (10 mM Tris pH 7.5, 10 mM NaCl, 5 mM MgCl2) and incubated on ice for 10 minutes. Then 7 ml RSB + 0.5% IGEPAL CA-630 was added, and cells were dounced 10 times with a loose pestle. Nuclei were pelleted at 1,000 × g for 5 minutes, and 5 × 10^6 nuclei were resuspended in 400 μl digest buffer (15 mM Tris pH 8, 60 mM KCl, 15 mM NaCl, 1 mM CaCl2, 0.25 M sucrose, 0.5 mM DTT) and incubated with 62.5 U MNase (Worthington) for 5 minutes. Stop solution (400 μl; 1% SDS, 0.1 M sodium bicarbonate, 20 mM EDTA) was added, and digested nuclei were incubated at 65°C for 90 minutes. Then 24 μl Tris pH 7.6, 3 μl Proteinase K (Life Technologies) and 3 μl GlycoBlue (Life Technologies) were added, and digestions were incubated for 12 hours at 65°C. DNA was extracted, and mono-nucleosome fragments were gel-purified from fractions digested to ~60% mono-nucleosome-, ~30% di-nucleosome- and ~10% tri-nucleosome-sized fragments. MNase-seq libraries were prepared as described elsewhere (Gilchrist et al. 2010).

Sequencing and Data Analysis

Annotated and unannotated TSS Calling

We used TSScall for rapid annotation of TSSs across the entire fly and mouse genomes. This calling approach is based on previously described methodologies (Scruggs et al. 2015; Nechaev et al. 2010). In short, TSScall first divides a genome into a series of bins. Using a reference annotation, it first generates windows at TSSs in the reference and then creates windows in other regions of the genome where Start-seq coverage is present but no TSS is annotated. A TSS is called within the bin with the highest total read count, at the individual nucleotide position with the highest number of reads. In unannotated windows, this process is performed iteratively: after a TSS is called, all reads within a distance threshold are removed from the window, and calling repeats until no reads are left. In mES cells, TSSs were called using default parameters, except that the read threshold was set to 10 (FDR < 0.001). For the fly datasets, given the smaller genome size, the call method was set to global, the annotation search window to 100 and the annotation join distance to zero, with a read threshold of 5. For mES cells, TSSs were further grouped by distance such that any two TSSs within 1000 bp of each other were placed in the same group, and the TSS with the highest read count was taken forward as the group representative. TSScall and a script for selecting group representatives are available at github.com/lavenderca/TSScall/. A total of 45,921,177 (S2 cells), 6,373,904 (embryos, 0-16h) and 160,320,881 (mES cells) mappable reads yielded the final TSS lists used in the analysis, comprising 33,876, 24,519 and 172,817 sites, respectively. The datasets used for S2 cell TSS calling were the 4 untreated datasets published here (TSS call files) and the 2 control datasets from a previous study (Nechaev et al. 2010). The 2 embryo datasets were previously published (Nechaev et al. 2010), as were the mES cell Control and NELF-KO datasets (N=6) (Williams et al. 2015).
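The iterative calling in unannotated windows and the distance-based grouping of mES TSSs can be illustrated with the Python sketch below. It is a simplified, hypothetical re-implementation for one strand of one chromosome, not TSScall's own code (see the repository above for that). Two points are our assumptions: how the read threshold is applied (here a call is kept only if its position has at least `read_threshold` reads), and ties, which go to the smaller coordinate.

```python
from typing import Dict, List, Tuple


def call_tss_iteratively(counts: Dict[int, int], distance: int,
                         read_threshold: int) -> List[Tuple[int, int]]:
    """Call TSSs in one unannotated window of strand-specific 5' read counts.

    Repeatedly takes the nucleotide with the most reads, records it, then
    removes all reads within `distance` nt of it, until no reads remain.
    """
    remaining = {pos: n for pos, n in counts.items() if n > 0}
    calls = []
    while remaining:
        # Highest read count wins; ties broken by the smaller coordinate.
        best = min(remaining, key=lambda p: (-remaining[p], p))
        if remaining[best] >= read_threshold:
            calls.append((best, remaining[best]))
        for pos in [p for p in remaining if abs(p - best) <= distance]:
            del remaining[pos]
    return calls


def group_representatives(tsss: List[Tuple[int, int]],
                          max_gap: int = 1000) -> List[Tuple[int, int]]:
    """Chain TSSs lying within `max_gap` bp of a neighbour into groups and
    keep the TSS with the most reads from each group."""
    reps: List[Tuple[int, int]] = []
    group: List[Tuple[int, int]] = []
    for pos, n in sorted(tsss):
        if group and pos - group[-1][0] > max_gap:
            reps.append(max(group, key=lambda t: t[1]))
            group = []
        group.append((pos, n))
    if group:
        reps.append(max(group, key=lambda t: t[1]))
    return reps


window = {100: 50, 102: 12, 180: 3, 400: 20, 405: 8}
calls = call_tss_iteratively(window, distance=50, read_threshold=5)
print(calls)
print(group_representatives(calls + [(1500, 9)]))
```

Chaining sorted neighbours is what makes "any two TSSs within 1000 bp" transitive: three TSSs at 0, 900 and 1800 bp end up in one group.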

TSS clustering based on Promoter Pol II half-lives upon Trp-treatment

Protein-coding and enhancer TSSs were clustered into 5 groups by k-medoids clustering using CLARA (Clustering Large Applications) in R. TSSs were defined as described under “Annotated and unannotated TSS Calling”. Paused Pol II levels in S2 cells were calculated from normalized datasets around eTSSs and mRNA TSSs with at least 5 reads (±50 bp from the TSS) in the Control (DMSO/0 minutes) dataset. Half-lives were calculated from the level at each time point relative to the Control treatment, and the median half-life of all TSSs within each cluster group was then determined. In total, 8,389 mRNA TSSs and 1,492 eTSSs were used.
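The text does not state how a half-life is derived from the relative levels. One common choice, assumed here, is first-order decay, L(t)/L(0) = exp(-kt) with t½ = ln 2 / k. The Python sketch below fits k by least squares on the log-ratios with the line forced through the origin and then takes the median per cluster. The function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def half_life(times_min: Sequence[float], levels: Sequence[float],
              control_level: float) -> float:
    """Half-life (min) of promoter Pol II, assuming first-order decay.

    Fits ln(L(t)/L(0)) = -k * t through the origin; returns ln(2) / k.
    """
    xs, ys = [], []
    for t, level in zip(times_min, levels):
        if t > 0 and level > 0:  # log undefined for empty time points
            xs.append(t)
            ys.append(math.log(level / control_level))
    if not xs:
        raise ValueError("need at least one post-treatment time point with signal")
    k = -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.log(2) / k if k > 0 else math.inf


def median_half_life_by_cluster(hl: Dict[str, float],
                                cluster: Dict[str, int]) -> Dict[int, float]:
    """Median half-life of all TSSs assigned to each cluster."""
    by_cluster: Dict[int, List[float]] = {}
    for tss, value in hl.items():
        by_cluster.setdefault(cluster[tss], []).append(value)
    return {c: median(v) for c, v in by_cluster.items()}
```

For example, a TSS whose signal falls from 100 to about 70.7, 50 and 25 at 2.5, 5 and 10 minutes yields a half-life of 5 minutes.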

Generation of Modified Transcript Annotations

All transcript annotations for D. melanogaster r5.57 were downloaded from flybase.org in GTF format and filtered so that only “exon” entries for the feature types considered for re-annotation remained. For Mus musculus, RefSeq annotations were downloaded from the UCSC Table Browser (February 2016) and filtered to minimize overlapping search spaces. Annotations from chrY, chrM and random chromosomes were also excluded. Each transcript was assigned a unique “gene_id” value, except that transcripts grouped together and represented by a single member in TSS-based analyses received identical values. The start location of each transcript was adjusted to the observed TSS when this resulted in truncation, rather than extension, of the model. If the observed TSS fell within an intron, all preceding exons were removed, and the transcript start was set to the beginning of the next downstream exon.
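The truncation rule can be illustrated with a short Python sketch. `adjust_start` is a hypothetical helper, not the script used for this study. It assumes 1-based, inclusive exon coordinates as in GTF. A TSS that would extend the model leaves it unchanged, and so does a TSS lying outside the model entirely; the text does not address that second case, so the behavior is our assumption. Minus-strand transcripts are handled by mirroring coordinates.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, as in GTF


def adjust_start(exons: List[Exon], strand: str, tss: int) -> List[Exon]:
    """Move a transcript's 5' end to an observed TSS, truncating only."""
    if strand == "-":
        # Mirror coordinates so the 5' end becomes the smallest coordinate,
        # reuse the plus-strand logic, then mirror back.
        mirrored = [(-end, -start) for start, end in exons]
        adjusted = adjust_start(mirrored, "+", -tss)
        return sorted((-end, -start) for start, end in adjusted)
    ordered = sorted(exons)
    if tss <= ordered[0][0] or tss > ordered[-1][1]:
        return ordered  # would extend the model, or TSS outside it: unchanged
    for i, (start, end) in enumerate(ordered):
        if start <= tss <= end:
            # TSS inside an exon: trim that exon and drop upstream exons.
            return [(tss, end)] + ordered[i + 1:]
        if i + 1 < len(ordered) and end < tss < ordered[i + 1][0]:
            # TSS inside an intron: start at the next downstream exon.
            return ordered[i + 1:]
    return ordered


print(adjust_start([(100, 200), (300, 400)], "+", 250))  # intronic TSS
```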

Start-seq data analysis

For Control and Spt5-depleted cells, Start-seq libraries from 2 independent biological replicates were generated. Paired-end reads for all samples were trimmed of adapter sequence and low-quality 3’ ends using cutadapt 1.2.1, discarding pairs containing reads shorter than 20 nt (-m 20 -q 10). A single nucleotide was then removed from the 3’ end of all trimmed reads to allow successful alignment with Bowtie 0.12.8. Remaining pairs were aligned in paired-end mode to an index of fly tRNA and rRNA sequences. Mappable pairs were excluded from further analysis, and unmapped pairs were subsequently aligned to an index containing the sequences of the spike-in RNAs. Pairs that still remained unmapped were aligned to the dm3 genome assembly. Identical parameters were used in each of these alignments: up to 2 mismatches, a maximum fragment length of 1000 nt, and uniquely mappable, multi-mapped and unmappable pairs routed to separate output files (-m1, -v2, -X1000, --max, --un). End 1 and end 2 reads of pairs mapping uniquely to dm3, representing startRNA 5’ and 3’ ends, respectively, were separated. Strand-specific counts of their 5’ mapping positions were determined genome-wide at single-nucleotide resolution and expressed in bedGraph format, with the “plus” and “minus” strand labels swapped in each 3’ bedGraph to correct for the forward/reverse orientation of Illumina paired-end reads. Counts of pairs mapping uniquely to each spike-in RNA were determined for each sample. Mean counts per spike-in were calculated for the mock-treated samples, and least-squares linear regression was performed against each individual sample, with the y-intercept forced to 0. For the paired Control and Spt5 RNAi samples, the resulting slopes agreed well between replicates (Control: 1.0643, 0.9425; Spt5-dep.: 0.4301, 0.4427) and were therefore used as multiplicative normalization factors, applied individually to each bedGraph.
Combined bedGraphs were generated by summing counts per nucleotide of both replicates for each condition.
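The spike-in normalization can be sketched as follows. For a regression y = b·x with the intercept fixed at zero, the least-squares slope is b = Σxy / Σx². The text does not say which variable is the response. This Python sketch assumes the mock-treated mean is y and the individual sample is x, so that multiplying the sample's counts by b places them on the reference scale. The helper names and the (chrom, position, strand) keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[str, int, str]  # (chrom, position, strand): hypothetical bedGraph key


def mean_per_spike(*replicate_counts: Sequence[float]) -> List[float]:
    """Mean count of each spike-in RNA across the mock-treated replicates."""
    return [sum(vals) / len(vals) for vals in zip(*replicate_counts)]


def spike_in_factor(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Least-squares slope of reference ~ sample with the intercept fixed at 0."""
    num = sum(x * y for x, y in zip(sample, reference))
    den = sum(x * x for x in sample)
    return num / den


def scale_bedgraph(counts: Dict[Pos, float], factor: float) -> Dict[Pos, float]:
    """Apply a multiplicative normalization factor to every position."""
    return {pos: n * factor for pos, n in counts.items()}


def combine_replicates(*bedgraphs: Dict[Pos, float]) -> Dict[Pos, float]:
    """Sum normalized counts per nucleotide across replicates."""
    total: Counter = Counter()
    for bg in bedgraphs:
        total.update(bg)
    return dict(total)


reference = mean_per_spike([10, 20, 30], [12, 18, 30])
factor = spike_in_factor([20, 40, 60], reference)  # sample has ~2x the spike-in reads
print(round(factor, 3))
```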

Sample / Data Type / Total read pairs / Uniquely mapped pairs (Percentage of total) / Agreement between replicates (Spearman’s rho)
Control / Start-seq / 73,114,753 / 53.49% / 0.975
Spt5-dep. / Start-seq / 78,881,214 / 54.43% / 0.975
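The section does not specify which units were correlated to obtain the replicate agreement values. Per-TSS or per-bin read counts are a common choice, and that is assumed here. Spearman's rho is the Pearson correlation of ranks, with tied values sharing their average rank. The pure-Python sketch below computes it; in practice a library routine such as SciPy's `spearmanr` gives the same statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of two equal-length series."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


# Hypothetical per-TSS counts from two replicates.
print(spearman_rho([5, 40, 12, 0, 7], [6, 35, 15, 1, 9]))
```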

Identification of Start-seq reads with non-templated 3’end residues

To identify Start-RNAs with non-templated 3’end residues, we used the dataset and approach described previously (Henriques et al. 2013). Briefly, reads that initially failed to align with the Bowtie parameters above were evaluated. To map the increase in oligo-adenylated reads upon exosome depletion, all unaligned reads (using standard Bowtie parameters) were trimmed at the 3’ end to remove terminal A nucleotides. Reads from which at least 3 As were trimmed and that retained at least 18 nt were aligned to the genome (reads with >26 nt remaining after trimming were further trimmed at the 5’ end to 26mers) and counted as for uniquely aligned Start-RNAs.
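The read-filtering rule above can be sketched in Python. `trim_oligo_a` is a hypothetical helper, not the original pipeline. It assumes upper-case read sequences with the RNA 3' end at the read's 3' end. Like the rule as described, it strips every terminal A, including any that may be genomically encoded.

```python
from typing import Optional


def trim_oligo_a(read: str, min_a: int = 3, min_len: int = 18,
                 max_len: int = 26) -> Optional[str]:
    """Strip terminal As from the 3' end and apply the length filters.

    Returns None unless at least `min_a` As were removed and at least
    `min_len` nt remain; longer remainders are trimmed from the 5' end
    so that only the 3'-most `max_len` nt are kept for alignment.
    """
    stripped = read.rstrip("A")
    if len(read) - len(stripped) < min_a or len(stripped) < min_len:
        return None
    if len(stripped) > max_len:
        stripped = stripped[-max_len:]  # 5' trimming to a 26mer
    return stripped


print(trim_oligo_a("G" + "C" * 29 + "AAAA"))  # 26 Cs: the leading G is trimmed
```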

Sequencing, mapping and data analysis of ChIP-seq

For Control and Spt5-depleted cells, Pol II ChIP-seq libraries from 2 independent biological replicates were generated. Two replicates each of Cohesin, H3K4me1, H3K4me3, H3K27ac and H3K36me3 ChIP-seq libraries were also prepared and sequenced in at least one lane using a paired-end 75-bp cycle run on the Illumina NextSeq 500 system with standard sequencing protocols to achieve appropriate sequence coverage. Raw sequences were aligned at full length against the dm3 version of the fly genome using Bowtie version 0.12.8 (Langmead et al. 2009), allowing a maximum of 2 mismatches (-m1 -v2). The yield of uniquely mappable reads for each set of biological replicates is listed below.

Sample / Data Type / Total reads / Uniquely mapped reads (Percentage of total) / Agreement between replicates (Spearman’s rho)
Control / Pol II ChIP-seq / 50,526,791 / 47.62% / 0.998
Spt5-dep. / Pol II ChIP-seq / 57,082,173 / 41.01% / 0.997
H3K4me1 / ChIP-seq / 193,183,511 / 68.13% / 0.998
H3K4me3 / ChIP-seq / 208,811,275 / 66.48% / 0.999
H3K27ac / ChIP-seq / 183,095,870 / 66.74% / 0.997
H3K36me3 / ChIP-seq / 210,961,156 / 66.94% / 0.995
Cohesin / ChIP-seq / 247,722,719 / 64.00% / 0.995

The genomic location and strand of mapped reads were compiled using custom scripts and visually examined in bedGraph format using the UCSC genome browser. ChIP-seq hit locations were filtered based on fragment length, and duplicates were removed. The ChIP-seq datasets were binned in 50 bp windows for visualization in bedGraph files. Pol II ChIP-seq data from Control and Spt5-depleted cells were depth and spike normalized such that the counts of uniquely mappable reads were equivalent across all samples. We used the mouse BMDM spike-ins to perform inter-sample normalization and to quantify the effects of Spt5 depletion on transcription. Under the assumption that read counts from the mouse should be constant across samples, read counts from each fly sample were normalized by the sum of the reads detected at mouse promoters (±150 bp from the TSS) within the same sample. ChIP-seq heatmaps (e.g. Fig. 2) depict normalized reads in 50 bp bins at the indicated distances from the TSS.
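The binning and the mouse spike-in step can be sketched in Python. This is a minimal illustration that assumes read positions from a single chromosome and counts each mouse read once, even if it falls near several TSSs. `scale` is an optional constant that we add to keep values in a readable range (for example, the mean mouse promoter total across samples). The text states only that fly counts were divided by the mouse promoter sum. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable, Sequence

BIN = 50  # bp, matching the bedGraph and heatmap bins


def bin_counts(read_positions: Iterable[int], bin_size: int = BIN) -> Counter:
    """Count reads per fixed-width bin, keyed by the bin's start coordinate."""
    return Counter((pos // bin_size) * bin_size for pos in read_positions)


def reads_near_tss(read_positions: Iterable[int], tss_positions: Sequence[int],
                   flank: int = 150) -> int:
    """Count reads lying within ±flank bp of any TSS (each read counted once)."""
    tss_sorted = sorted(tss_positions)
    count = 0
    for pos in read_positions:
        i = bisect_left(tss_sorted, pos - flank)  # first TSS >= pos - flank
        if i < len(tss_sorted) and tss_sorted[i] <= pos + flank:
            count += 1
    return count


def spike_normalize(fly_bins: Dict[int, float], mouse_promoter_reads: int,
                    scale: float = 1.0) -> Dict[int, float]:
    """Divide fly bin counts by the mouse promoter read total of the same sample."""
    return {b: n * scale / mouse_promoter_reads for b, n in fly_bins.items()}


fly = bin_counts([10, 20, 75, 130, 140])
mouse_total = reads_near_tss([100, 300, 1000, 5050], [200, 5000])
print(spike_normalize(fly, mouse_total))
```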

4sU RNA-seq sequencing, mapping and data analysis

For 4sU RNA-seq, five independent biological replicates were generated for each condition using Control or Spt5-depleted S2 cells, while two replicates were generated for mES cells using Control and NELF-KO cells. 4sU RNA-seq libraries were sequenced on the Illumina HiSeq system; in addition, the mESC libraries were also sequenced using a paired-end 75-bp cycle run on the Illumina NextSeq 500 system, with standard sequencing protocols to achieve appropriate sequence coverage. Read pairs were filtered, requiring a mean quality score greater than or equal to 20 for both mates, and then mapped to the dm3 or mm9 reference genome using TopHat 2.0.4 with Bowtie1 as the underlying aligner, reporting up to 10 alignments per read pair (-g 10). Mean fragment sizes and standard deviations were determined using Picard Tools 1.86 CollectInsertSizeMetrics, based on a Bowtie 0.12.8 alignment (-k1 -v2 -X10000) of a subset of five million reads to an index of FlyBase 5.57 or mm9 transcripts, and passed to TopHat using the --mate-inner-dist and --mate-std-dev parameters. In a separate Bowtie 0.12.8 alignment (-m1 -v2 -X10000), the number of read pairs aligning to each ERCC spike-in was determined. A total of 110,254,937 and 94,396,375 4sU RNA-seq fragments were successfully aligned for the Control and Spt5-depleted samples, respectively. For mES cells, a total of 168,890,649 and 206,528,956 4sU RNA-seq fragments were successfully aligned for the Control and NELF-KO samples, respectively. Read counts were calculated per gene in a strand-specific manner with htseq-count 0.6.0, based on the modified transcript annotations described above, and differentially expressed genes were identified using DESeq 1.18.1 (Anders and Huber 2010) under R 3.1.1. 4sU RNA-seq size factors were determined from ERCC counts. At an adjusted p-value threshold of <0.001, 7,406 genes were identified as differentially expressed upon Spt5 depletion in S2 cells.
In mES cells, 297 genes were identified as differentially expressed upon NELF-KO at an adjusted p-value threshold of <0.05. UCSC Browser tracks displaying mean read coverage were generated from the combined replicates per condition, normalized as in the differential expression analysis.
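Deriving size factors from ERCC counts can be done with DESeq's median-of-ratios estimator restricted to the spike-in rows. The Python sketch below shows that calculation under this assumption. It is an illustration, not the R code used here. Each sample's factor is the median, over spike-ins, of its count divided by that spike-in's geometric mean across samples. Spike-ins with a zero count in any sample are skipped, because their geometric mean is zero.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[float]]) -> List[float]:
    """Median-of-ratios size factors from spike-in counts.

    `counts` maps each ERCC spike-in to its counts across samples
    (same sample order in every list).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # geometric mean would be zero
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


ercc = {"ERCC-00002": [10, 20], "ERCC-00003": [30, 60], "ERCC-00004": [5, 10]}
print(size_factors(ercc))  # the second sample has twice the spike-in reads
```

Dividing each gene's counts by its sample's factor puts samples on the spike-in scale, which is what allows global changes in nascent transcription to remain visible after normalization.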