Supplementary Materials and Methods for Sienski et al.

Computational analyses

Mapping of short Illumina reads

The Illumina short reads were quality-controlled and mapped to the Drosophila melanogaster genome (dm3), with chromosome Uextra excluded, using bowtie 0.12.7 (Langmead et al. 2009). We allowed up to one mismatch for RNA-seq, ChIP-seq and small RNA-seq after immunoprecipitation (IP) libraries. No mismatches were allowed when analyzing total small RNA-seq libraries. Genome-aligned reads were then mapped to FlyBase and Repbase transposon consensus sequences. Only reads mapping to a single transposon in our list were considered; for reads mapping twice within one TE, we applied a weighting scheme. For all analyses except small RNA-seq, the number of reads mapped to each TE was normalized to its length and to the total number of genome-aligned reads (RPKM value, Reads Per Kilobase of exon model per Million mapped reads; Mortazavi et al. 2008). Small RNA-seq libraries after IP were normalized to their depth (ppm value, parts per million), and total small RNA-seq libraries were normalized to their microRNA content.
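The three normalizations described above reduce to simple scalings. The following is a minimal sketch of that arithmetic, not the authors' pipeline. The function names are hypothetical, TE read counts are typed as floats so that fractionally weighted reads could be passed in (the weighting scheme itself is not specified and is not implemented here), and the microRNA normalization is ASSUMED to be reads per million microRNA-mapped reads, since the text does not give the exact scale factor.

```python
def rpkm(te_reads: float, te_length_bp: int, genome_mapped_reads: int) -> float:
    """Reads per kilobase of TE consensus per million genome-aligned reads."""
    # reads / (length in kb * total reads in millions) == reads * 1e9 / (length_bp * total)
    return te_reads * 1e9 / (te_length_bp * genome_mapped_reads)


def ppm(reads: float, library_depth: int) -> float:
    """Parts per million: reads scaled to library depth (small RNA-seq after IP)."""
    return reads * 1e6 / library_depth


def per_million_mirna(reads: float, mirna_reads: int) -> float:
    """ASSUMPTION: 'normalized to microRNA content' is read here as reads per
    million microRNA-mapped reads; the exact factor is not given in the source."""
    return reads * 1e6 / mirna_reads


if __name__ == "__main__":
    # 500 reads on a 5 kb TE in a library of 10 million genome-aligned reads
    print(rpkm(500, 5000, 10_000_000))
```

For example, 500 reads on a 5 kb TE in a library with 10 million genome-aligned reads gives an RPKM of 10.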

mRNA-seq

We sequenced mRNAs in a strand-specific manner from OSCs upon different siRNA-mediated knockdowns (two different siRNAs per gene, except for Piwi). For the computational analyses, we extracted high-quality bases from every read (nt 6–45) and mapped them to the Drosophila genome. Uniquely aligned reads were used to quantify gene expression and differential expression with the HOMER software (analyzeRepeats.pl rna dm3 -count exons -condenseGenes -rpkm -strand +; getDiffExpression.pl -repeats -DESeq) (Heinz et al. 2010). To compute TE expression, we aligned genome-mapped reads to the TE consensus sequences and calculated expression values as reads per kilobase of transcript per million genome-mapped reads (RPKM). For further analysis, we averaged values across biological replicates and filtered TEs by expression level. First, we considered only TEs with RPKM ≥ 1 in our control libraries (GFP-KD and Luc-KD). Second, we retained only TEs reaching RPKM ≥ 10 in at least one of the depletion conditions presented in the study. Finally, we classified TEs as regulated by Piwi if their expression changed at least 5-fold between control and Piwi depletion (n = 11). The remaining TEs (n = 36) were classified as not regulated by Piwi.
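The filtering and classification steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code. It ASSUMES that the RPKM ≥ 1 filter must hold in every control library, that the control level for the fold change is the mean of GFP-KD and Luc-KD, and that "regulated by Piwi" means at least 5-fold higher expression upon Piwi depletion. The condition label "Piwi-KD" and the input layout are hypothetical.

```python
from statistics import mean


def classify_tes(rpkm_by_condition, controls=("GFP-KD", "Luc-KD"),
                 piwi="Piwi-KD",  # hypothetical condition label
                 min_control=1.0, min_any_depletion=10.0, min_fold=5.0):
    """Split TEs into Piwi-regulated and non-regulated lists.

    rpkm_by_condition: {te_name: {condition: [replicate RPKM values]}}
    """
    regulated, non_regulated = [], []
    for te, conditions in rpkm_by_condition.items():
        # Average biological replicates within each condition
        avg = {cond: mean(values) for cond, values in conditions.items()}
        # Filter 1 (ASSUMED: all controls): RPKM >= 1 in the control libraries
        if any(avg[c] < min_control for c in controls):
            continue
        # Filter 2: RPKM >= 10 in at least one depletion condition
        depletions = [c for c in avg if c not in controls]
        if not any(avg[c] >= min_any_depletion for c in depletions):
            continue
        # Fold change of Piwi depletion over the mean control level
        # (control level is >= 1 here, so no division by zero)
        fold = avg[piwi] / mean(avg[c] for c in controls)
        (regulated if fold >= min_fold else non_regulated).append(te)
    return regulated, non_regulated
```

A TE at RPKM 2 in both controls and 25 upon Piwi depletion (12.5-fold) would be classified as regulated; one at 12 in controls and 14 upon depletion would pass both filters but be classified as non-regulated.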

ChIP-seq

We sequenced DNA fragments immunoprecipitated with antibodies against H3K9me3 or RNA Polymerase II from OSCs upon different siRNA-mediated knockdowns. We extracted high-quality bases from every read (nt 6–45) and mapped them to the Drosophila genome to generate genome-wide occupancy maps.
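A minimal sketch of the two steps named above, read trimming to nt 6–45 and a binned occupancy map, is shown below. It is illustrative only. The bin size, the per-million scaling, and the input format of (chromosome, start) tuples are hypothetical choices not stated in the source, and the genome mapping itself (bowtie) is not reproduced.

```python
from collections import Counter


def trim_read(seq: str, first: int = 6, last: int = 45) -> str:
    """Keep 1-based positions first..last inclusive (nt 6-45 gives 40 nt)."""
    return seq[first - 1:last]


def occupancy_map(alignments, bin_size: int = 100) -> dict:
    """Count aligned read starts per genomic bin, scaled to reads per million.

    alignments: iterable of (chromosome, 0-based start) for mapped reads.
    bin_size and the per-million scaling are hypothetical choices.
    """
    counts = Counter()
    total = 0
    for chrom, start in alignments:
        counts[(chrom, start // bin_size)] += 1
        total += 1
    scale = 1e6 / total if total else 0.0
    return {key: n * scale for key, n in counts.items()}
```

Binning read starts is the simplest form of an occupancy track; extending reads to the estimated fragment length would be a common refinement but is not described in the source.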

Total small RNA-seq

The small RNA cloning procedure introduces four random nucleotides at the 3′ end of the 5′ linker and at the 5′ end of the 3′ linker, which reduces ligation biases (Jayaprakash et al. 2011). Reads were first stripped of the 3′ adaptor, and the four random nucleotides at each end of the read were then removed. Only reads longer than 19 nt were retained, to increase mapping specificity. Contaminants and degradation products of abundant cellular RNAs were removed, including reads mapping to rRNA, mitochondrial RNA, microRNAs (all from FlyBase) and Drosophila C virus.
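The read preprocessing order described above (adaptor stripping, removal of four random nucleotides from each end, size selection) can be sketched per read as follows. This is a simplified illustration: it ASSUMES an exact match to a full adaptor sequence (real adaptor trimmers also handle partial adaptors at the read end and mismatches), the adaptor passed in is a caller-supplied placeholder, and contaminant removal, which requires alignment, is not shown.

```python
from typing import Optional


def preprocess_small_rna(read: str, adapter: str, n_random: int = 4,
                         min_len: int = 20) -> Optional[str]:
    """Return the trimmed small RNA insert, or None if it fails filters.

    ASSUMPTION: exact, full-length adaptor match only (simplification).
    min_len=20 encodes 'longer than 19 nt'.
    """
    # 1. Strip the 3' adaptor and everything after it
    idx = read.find(adapter)
    insert = read[:idx] if idx != -1 else read
    # 2. Remove the random nucleotides at the 5' and 3' ends
    if len(insert) < 2 * n_random:
        return None
    core = insert[n_random:len(insert) - n_random]
    # 3. Keep only inserts longer than 19 nt
    return core if len(core) >= min_len else None
```

Applying the size filter after removing the random nucleotides follows the order given in the text, so the length threshold refers to the biological insert rather than to the raw read.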

Small RNA-seq after IP

Analyses were performed as above, with a few differences. After adaptor removal, reads with identical sequences, including identical random nucleotides at the 5′ and 3′ ends, were collapsed because of the limited library complexity. The random nucleotides were then removed from the 5′ and 3′ ends of every read, and the analysis proceeded as above.
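The collapsing step can be illustrated with a short sketch. It assumes the input is adaptor-stripped reads that still carry the random nucleotides; reads that are identical including those nucleotides are counted once, whereas reads with the same insert but different random nucleotides remain separate molecules. The function name and the reuse of the 20-nt minimum from the total small RNA-seq pipeline are illustrative assumptions.

```python
from collections import Counter


def collapse_then_trim(inserts, n_random: int = 4, min_len: int = 20) -> Counter:
    """Collapse identical adaptor-stripped reads, then trim random nucleotides.

    Returns a Counter of trimmed insert -> number of distinct original molecules.
    """
    unique_reads = set(inserts)  # identical sequences, incl. random nt, collapse
    counts = Counter()
    for seq in unique_reads:
        if len(seq) < 2 * n_random:
            continue
        core = seq[n_random:len(seq) - n_random]
        if len(core) >= min_len:
            counts[core] += 1
    return counts
```

Collapsing before trimming is what makes the random nucleotides useful here: they act as a molecular barcode that distinguishes PCR duplicates from independently ligated copies of the same small RNA.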

Data visualization

To prepare meta-plots of the average signal distribution around TE insertions, we used the HOMER software (annotatePeaks.pl dm3 -size 20000 -hist 50 -bedGraph -noadj -fragLength 0) (Heinz et al. 2010) with the TE insertion list described previously (Sienski et al. 2012), restricted to TEs regulated by Piwi. Heat maps generated in this study were prepared using Java TreeView (Saldanha 2004).
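The HOMER parameters above define a 20 kb window (±10 kb) around each insertion, averaged in 50-bp bins. The idea behind such a meta-plot can be sketched as below; this is not HOMER's implementation. It assumes a per-base signal list for each chromosome, ignores insertion strand orientation, and skips insertions whose window would run past a chromosome end. All names are hypothetical.

```python
def metaplot(signal, insertions, half_window: int = 10_000, bin_size: int = 50):
    """Average per-base signal in fixed bins around insertion sites.

    signal: {chromosome: list of per-base values}
    insertions: iterable of (chromosome, 0-based position)
    Strand orientation is ignored in this simplified sketch.
    """
    n_bins = 2 * half_window // bin_size
    sums = [0.0] * n_bins
    used = 0
    for chrom, pos in insertions:
        track = signal[chrom]
        start = pos - half_window
        # Skip insertions whose window falls off the chromosome
        if start < 0 or pos + half_window > len(track):
            continue
        for b in range(n_bins):
            lo = start + b * bin_size
            sums[b] += sum(track[lo:lo + bin_size]) / bin_size
        used += 1
    return [s / used for s in sums] if used else sums
```

With the defaults this returns 400 values (50-bp bins spanning -10 kb to +10 kb), one average per bin across all usable insertions.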

Spreading-index

To assess spreading of the H3K9me3 mark, we first quantified the average H3K9me3 signal in 100-nt bins within 10 kb upstream and 10 kb downstream of the gypsy insertions (100 bins on each side). We fitted an exponential decay curve to this profile and calculated the spreading index, defined as the distance from the TE insertion at which the H3K9me3 signal reaches 50% of its maximum.
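If the decay is modeled as S(d) = S0·exp(−k·d), with S0 the maximum at the insertion (d = 0), then setting S(d½) = S0/2 gives d½ = ln 2 / k. The sketch below ASSUMES this two-parameter form (the exact equation used is not stated) and, as a swapped-in technique, estimates k by linear least squares on log S rather than by nonlinear fitting. It treats one flank at a time; how upstream and downstream profiles were combined is not specified. Bin-center distances are a further assumption.

```python
import math


def spreading_index(distances, signal) -> float:
    """Distance at which a fitted S(d) = S0 * exp(-k * d) falls to S0 / 2.

    ASSUMPTION: two-parameter exponential model, fitted by least squares on
    log(signal) (log-linear fit); bins with non-positive signal are skipped.
    """
    pts = [(d, math.log(s)) for d, s in zip(distances, signal) if s > 0]
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    k = -sxy / sxx  # slope of log S versus d is -k
    if k <= 0:
        raise ValueError("signal does not decay with distance")
    return math.log(2) / k


# 100-nt bins over 10 kb, distances taken at bin centers (assumption)
bin_centers = [(i + 0.5) * 100 for i in range(100)]
```

For a profile decaying with k = 0.001 per nt, the spreading index is ln 2 / 0.001 ≈ 693 nt. A log-linear fit weights low-signal bins more heavily than a nonlinear fit would, so with noisy data the two can give different estimates.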

References

Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. 2010. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38: 576–589.

Jayaprakash AD, Jabado O, Brown BD, Sachidanandam R. 2011. Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res 39: e141.

Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628.

Saldanha AJ. 2004. Java Treeview--extensible visualization of microarray data. Bioinformatics 20: 3246–3248.

Sienski G, Dönertas D, Brennecke J. 2012. Transcriptional silencing of transposons by Piwi and maelstrom and its impact on chromatin state and gene expression. Cell 151: 964–980.