Supplementary figures and legends

Figure S1. Distribution of the H3K9me3 mark in the genome of three cell types

(A) Input-normalized level of the H3K9me3 mark in three cell types of 10 dpp animals on four types of genomic partitions (TSSs, exons, introns, and intergenic space). For germ cells, the data were obtained from FACS-sorted spermatogonia. For liver and testis somatic cells, values of four and two ChIP-seq replicates were averaged, respectively. The error bars show standard error. (B) Input-normalized level of the H3K9me3 mark in testicular somatic and germ cells (spermatogonia) on five TE classes. (C) Correlation of H3K9me3 levels between independent ChIP-seq experiments performed on different spermatogonia samples: FACS- and MACS-sorted spermatogonia from Miwi2 KO and control animals. The correlation was analyzed for all 1 kb genomic windows and, separately, for all TE families. For this analysis, only TE families that had at least 5,000 reads mapped in the input sample were considered (291 families in total), and only genomic windows with more than 50 reads in the input sample were considered.

Figure S2. H3K9me3 in spermatogonia of Miwi2 knock-out and control animals

Input-normalized level of the H3K9me3 mark on genomic partitions (A) and transposon classes (B) in FACS-sorted spermatogonia isolated from Miwi2 knock-out and control (Miwi2 heterozygote) mice. (C) Enrichment of H3 ChIP signal on select TE families in Miwi2 KO and heterozygous control. The bars show input-normalized H3 ChIP signal on bodies of TEs belonging to six families of the LINE and LTR classes. Reads mapped to all annotated genomic instances of TEs were summed and normalized to account for differences in library depth (see Materials and Methods). (D) Differences in H3K9me3 levels in LINE and LTR families in MACS-sorted spermatogonia of Miwi2 knock-out animals relative to control littermates. 5' repeats of selected LINE elements are displayed separately as black dots. Fold changes observed in two independent experiments are averaged, and the error bars show standard error. (E) Distribution of the H3K9me3 mark along the L1-T consensus sequence in spermatogonia of Miwi2 KO and control animals. (F) Distribution of the H3K9me3 mark along the L1-Gf consensus sequence in spermatogonia of Miwi2 KO and control animals. (G) Correlation of changes in H3K9me3 levels on TE families upon Miwi2 deficiency in independent ChIP-seq experiments performed on FACS- and MACS-sorted spermatogonia. The x-axis shows the change in input-normalized level of the H3K9me3 mark on TE families observed upon Miwi2 knock-out in FACS-sorted testicular germ cells. The y-axis shows fold changes observed in two independent experiments on MACS-sorted testicular germ cells. We considered only TE families that had at least 5,000 reads mapped in the input sample (291 families in total).
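As an illustration of what "input normalized" can mean in practice, the sketch below scales ChIP and input read counts to reads per million and takes their ratio. This is a minimal sketch under stated assumptions, not the procedure from Materials and Methods: the function name, the reads-per-million scaling, and the example numbers are all hypothetical.

```python
def input_normalized_enrichment(chip_reads, input_reads, chip_depth, input_depth):
    """Ratio of depth-scaled ChIP signal to depth-scaled input signal.

    Assumption: both libraries are scaled to reads per million (RPM)
    before the ratio is taken. The actual normalization used in the
    study is described in its Materials and Methods.
    """
    if input_reads <= 0 or chip_depth <= 0 or input_depth <= 0:
        raise ValueError("input reads and library depths must be positive")
    chip_rpm = chip_reads * 1e6 / chip_depth      # ChIP reads per million
    input_rpm = input_reads * 1e6 / input_depth   # input reads per million
    return chip_rpm / input_rpm


# Hypothetical counts for one TE family in one sample.
enrichment = input_normalized_enrichment(
    chip_reads=200, input_reads=100, chip_depth=1_000_000, input_depth=2_000_000
)
```

Scaling by library depth before taking the ratio keeps a deeply sequenced ChIP library from appearing enriched simply because it contains more reads.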

Figure S3. Different strategies for normalization of H3K9me3 ChIP-seq signal on TE families

Differences in H3K9me3 levels in LINE and LTR families in spermatogonia of Miwi2 knock-out animals relative to control (heterozygous) littermates. 5' repeats of selected LINE elements are displayed separately as black dots. In addition to normalization to the ChIP input sample, two alternative normalization strategies were implemented: 1) normalization to the average H3K9me3 enrichment in 100 kb windows that have a high level of H3K9me3 signal (more than 2-fold enrichment in spermatogonia of control mice); 2) normalization to the average H3K9me3 enrichment on major satellite repeats, which have a very high level of the H3K9me3 mark. Both normalization strategies show that H3K9me3 levels are decreased on the 5' portions of three L1 families (L1-A, L1-T, and L1-Gf) in Miwi2 KO spermatogonia sorted by two methods. For MACS-sorted cells, fold changes observed in two independent experiments are averaged, and the error bars show standard error.

Figure S4. H3K9me3 signal in sequences flanking LTR IAPEz insertions and validation of ChIP-seq results in somatic cells

(A) Metaplots of input-normalized level of the H3K9me3 mark in spermatogonia of Miwi2 KO and control animals over 25-kb genomic regions flanking all annotated IAPEz insertions. Only uniquely mapped reads were considered. Dashed lines show the distance at which the signal dropped 2-fold from the peak value. (B, C) ChIP-qPCR was used to measure H3K9me3 signal on major satellite, IAPLTR1a, and L1-A in somatic testicular (B) and liver (C) cells from 10-day-old Miwi2 knock-out and heterozygous animals. Somatic testicular cells were purified by MACS. H3K9me3 levels are normalized to input and to signal on B1 SINE. Bars represent fold enrichment of the average of two independent ChIP experiments on each sample, and error bars show standard deviations.

Figure S5. MIWI2-bound piRNAs targeting LTR and LINE families

(A) Relationship between the amount of piRNA in MIWI2 complexes in prospermatogonia (E16.5), TE transcript abundance in normal testis, and TE derepression upon Miwi2 deficiency. The x-axis shows the level of expression of selected TE families in control (Miwi2 heterozygous) testis of 10-day-old mice as measured by RNA-seq. The y-axis shows the fold change of expression between Miwi2 knock-out and heterozygous animals as measured by RNA-seq. The amount of MIWI2-associated piRNAs derived from each TE family corresponds to the size of the bubbles. (B) Strand distribution of piRNAs in immunopurified MIWI2 complexes in prospermatogonia of E16.5 wild-type animals.

Figure S6. Derepression of TE in Miwi2-deficient spermatogonia

(A) RT-qPCR was used to measure expression of L1-T in testis of 10 dpp Miwi2 KO and control animals. DNase-treated total RNA from two separate sets of heterozygous and KO littermate total testes was used for oligo(dT)-primed reverse transcription. Primers targeting the ORF1 and ORF2 regions of the L1-T transcript were used for qPCR. The values were normalized to actin. Error bars show standard deviations. (B) Relationship between levels of H3K9me3 in LINE families in germ cells relative to somatic cells and the effect of Miwi2 knock-out on LINE expression. The x-axis shows the fold difference of H3K9me3 levels on different LINE families between germ cells and somatic testicular cells. The y-axis shows the fold difference in abundance of LINE transcripts between Miwi2 knock-out and control cells. (C) Relationship between L1-A element divergence and derepression in Miwi2 KO. All genomic copies of L1-A were binned in groups based on their divergence from the consensus sequence. Numbers of genomic L1 copies in each group are indicated above the boxes. The y-axis shows the fold difference in transcript abundance in testes of Miwi2 knock-out animals vs. heterozygous littermates. Boxes span the 25th to 75th percentiles, and the thick line inside each box is the median. The whiskers extend to 1.5 times the interquartile range (IQR) beyond the box, or to the most extreme data point if that point lies within 1.5 IQR of the box.
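The box plot convention described in panel (C) can be sketched as follows. This is an illustrative reconstruction only: the quartile interpolation method (the "inclusive" method of Python's statistics module), the function name, and the example values are assumptions, and the plotting software used for the figure may compute quartiles slightly differently.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR convention."""
    data = sorted(values)
    # Assumption: linear interpolation between data points ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical fold-difference values for one divergence bin.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this example the value 100 lies beyond the upper fence, so the upper whisker stops at 9 and 100 is reported as an outlier.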

Figure S7. Distribution of the H3K9me3 mark on the downstream flanking sequences of L1-A elements

Level of the H3K9me3 mark on the 1-kb downstream flanks of individual L1-A copies (y-axis) in relation to the length of each insertion (x-axis) in spermatogonia of Miwi2 heterozygous (A) and knock-out (C) mice. Only uniquely mapped ChIP-seq reads were considered. The dots correspond to individual L1-A copies that had at least one read mapped to their flanks in both ChIP and input libraries (9,855 insertions in control and 10,072 in KO). (B) and (D) show box plot representations of the data in (A) and (C), respectively, for two categories of insertions: shorter than 2 kb and longer than 5 kb. The whiskers extend to 1.5 times the interquartile range (IQR) beyond the box, or to the most extreme data point if that point lies within 1.5 IQR of the box.

Figure S8. Distribution of the H3K9me3 mark on the flanking regions of L1-F elements

(A), (C), (E), (G) Level of the H3K9me3 mark in the upstream and downstream 1-kb flanks of individual L1-F copies (y-axis) in relation to the length of each insertion (x-axis) in spermatogonia of Miwi2 KO and heterozygous mice. Only uniquely mapped ChIP-seq reads were considered. The dots correspond to individual L1-F copies that had at least one read mapped to their flanks in both the ChIP and input libraries. (B), (D), (F), (H) Box plot representation of the data distribution shown in the other panels, for two categories of insertions: shorter than 2 kb and longer than 5 kb. Boxes span the 25th to 75th percentiles, and the lines inside the boxes are the medians. The whiskers extend to 1.5 times the interquartile range (IQR) beyond the box, or to the most extreme data point if that point lies within 1.5 IQR of the box.

Table S1. Statistics for ChIP-seq libraries

List of ChIP-seq libraries generated in the study together with genome mapping statistics. Total: total number of reads in the library; unique mappers: reads with only one unique valid alignment to the mouse genome (mm10) with zero mismatches; unique + multi mappers: reads with up to 10,000 valid alignments to the mouse genome with zero mismatches; complexity: the number of distinct sequences divided by the total number of reads (only uniquely mapped reads and corresponding sequences were considered for evaluation of complexity).
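The complexity metric defined above (distinct sequences divided by total reads, over uniquely mapped reads) can be computed as in the following sketch. The function name and the toy reads are hypothetical; filtering to uniquely mapped reads is assumed to have been done upstream.

```python
def library_complexity(uniquely_mapped_reads):
    """Number of distinct sequences divided by the total number of reads."""
    reads = list(uniquely_mapped_reads)
    if not reads:
        raise ValueError("at least one read is required")
    return len(set(reads)) / len(reads)


# Four reads, three distinct sequences.
complexity = library_complexity(["ACGT", "ACGT", "TTGA", "GGCC"])
```

A value close to 1 indicates that few reads are exact duplicates, while a low value suggests heavy duplication in the library.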

Table S2. Evaluation of mapping threshold for genomic alignment of ChIP-seq reads

Four libraries (ChIP-seq and chromatin input libraries from Miwi2 KO and heterozygous animals) were truncated to 1 million reads and aligned to the mouse genome (mm10) using Bowtie 0.12.7, allowing zero mismatches and an unlimited number of valid alignments per read. The table lists the numbers of reads and distinct sequences that were discarded when a given threshold for the maximum number of genomic alignments was applied. Five thresholds were tested: 1000, 5000, 10000, 15000, and 25000. The table also lists the numbers of reads that have at least one alignment within a genomic region annotated as simple repeat in the RepeatMasker track of the UCSC Genome Browser.
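The threshold evaluation described above can be sketched as a simple tally. This is an illustrative sketch only: the input structure (a mapping from each distinct sequence to its read count and number of genomic alignments), the function name, and the toy data are hypothetical, and treating "more than the threshold" as the discard condition is an assumption.

```python
def discarded_at_thresholds(seq_stats, thresholds):
    """For each threshold, count the reads and distinct sequences discarded.

    seq_stats maps a distinct sequence to (read_count, n_genomic_alignments).
    Assumption: a sequence is discarded when its alignment count exceeds
    the threshold.
    """
    result = {}
    for threshold in thresholds:
        over = [reads for reads, n_aln in seq_stats.values() if n_aln > threshold]
        result[threshold] = (sum(over), len(over))  # (reads, distinct sequences)
    return result


# Hypothetical data: three distinct sequences with differing multiplicity.
toy_stats = {"seqA": (5, 1), "seqB": (3, 1200), "seqC": (2, 30000)}
discarded = discarded_at_thresholds(toy_stats, (1000, 5000, 10000, 15000, 25000))
```

Tabulating both reads and distinct sequences shows whether a threshold mostly removes a few highly abundant repeat sequences or many different ones.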

Table S3. Statistics for RNA-seq libraries

List of RNA-seq libraries generated in the study together with genome and transcriptome mapping statistics. Total: total number of reads in the library; rRNA: number of reads mapped to rRNA consensus sequence with up to 3 mismatches; non-rRNA: number of reads that did not align to rRNA; transcriptome: number of non-rRNA reads that have at least one valid alignment to the mouse transcriptome with up to 3 mismatches; unique mappers: non-rRNA reads with only one unique valid alignment to the mouse genome (mm10) with zero mismatches; unique + multi mappers: non-rRNA reads with up to 10,000 valid alignments to the mouse genome with zero mismatches; complexity: the number of distinct sequences divided by the total number of reads (only uniquely mapped reads and corresponding sequences were considered for evaluation of complexity).

Table S4. Intronic location among truncated L1 insertions

Table S5. Individual L1-A insertions analyzed in Fig. 4E

Shown are the chromosomal location and the length of each L1 insertion, H3K9me3 enrichment in FACS-sorted spermatogonia of 10 dpp control animals, the ratio of H3K9me3 in KO vs control animals, KO/Het ratio of both H3K9me3 and H3K4me2/3 levels measured by ChIP-qPCR, and p-values for the difference between Het and KO for each individual insertion.

Table S6. Three genes located in vicinity of L1 insertions have altered expression levels in Miwi2 KO testes

Out of 353 genes whose TSSs are positioned within 25 kb of a full-length (6,000 bp or longer) L1 insertion, three genes were differentially expressed in testes of 10-day-old Miwi2 KO and control (heterozygous littermate) mice. Average expression: mean normalized effective count of reads (eXpress estimation); Miwi2 Het testis: normalized effective count of reads in control (Miwi2 heterozygous) testis; Miwi2 KO testis: normalized effective count of reads in Miwi2 KO testis; foldChange: fold change in expression between control and KO; log2FoldChange: fold change on log2 scale; p value: p value of differential expression (DESeq estimation); adjusted p value: p value adjusted for multiple testing using the Benjamini-Hochberg algorithm.
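For readers unfamiliar with the Benjamini-Hochberg adjustment named in the last column, the sketch below shows the standard step-up procedure on a list of p values. It is a generic illustration with hypothetical inputs, not the DESeq code path used to produce the table.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for four genes.
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each p value is multiplied by the number of tests divided by its rank, and the running minimum guarantees that adjusted values never decrease as the raw p values increase.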

Table S7. Primer sequences for ChIP-qPCR on both TE consensuses and individual loci

Table S8. Primer sequences for DNA methylation analysis

Table S9. Primer sequences for qPCR analysis of gene expression
