SUPPLEMENTAL INFORMATION

Expression and regulation of lincRNAs during T cell development and differentiation

Gangqing Hu1,3, Qingsong Tang1,3, Suveena Sharma2,3, Fang Yu2, Thelma M. Escobar2, Stefan A. Muljo2, Jinfang Zhu2,#, Keji Zhao1,#

1. Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health (NIH), Bethesda, MD 20892

2. Laboratory of Immunology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892

3. These authors contributed equally to this work.

Contact Information (#):

Jinfang Zhu, Laboratory of Immunology, NIAIDBethesda, MD 20892. Tel: (301) 402-6662; Fax: (301)-480-7352; Email:

Keji Zhao, Systems Biology Center, NHLBIBethesda, MD 20892. Tel: (301) 496-2098; Fax: (301) 480-0961; Email:

Contents:

Supplemental Figures 1-7

Supplemental Tables 1-9

Supplemental Figure 1:LincRNA identification in T cells.(a)Summary of T cell subsets on which total and/or polyA+RNA-Seq was performed with biological duplicates unless stated otherwise. Left (total RNA-Seq): CD4-CD8 double negative (DN) cells (DN1, DN2, DN3 and DN4), double positive (DP) cells (CD4+CD8+CD3low and CD4+CD8intCD69+), single positive (SP) CD4 and CD8 T cellsfrom thymi and thymus-derived regulatory T cells (tTreg) from lymph nodes of C57BL/6 mice. Right (polyA+ RNA-Seq): TH1, TH2, TH17 and induced regulatory T (iTreg cells) by in vitro differentiation of naïve CD4+ T cells for 4 hrs, 8 hrs, 12 hrs, 24 hrs, 48 hrs, 72 hrs, 1 week and 2 weeks; total RNA-Seq was also performed for naïve CD4+ T cells, TH cells at two weeks, andotherTHcell subsets as summarized in Supplemental Table 2.(b) Flow chart for identification of lincRNA clusters from RNA-Seq data: 1) call RNA-Seq read enriched islands from intergenic regions using SICER (window = 100 bps, Gap = 200 bps, E-value = 100); 2) keep islands shared by duplicates; 3) pool islands from all samples, independently done for data sets fromtotal RNA-Seqand from PolyA+ RNA-Seq; 4) cluster neighboring islands based on similarityin expression profiles across different samples (r > 0.8);5) merge clusters from the two types ofRNA-Seq data; 6) define promotersof a cluster by the overlapping with H3K4me3 islands. (c) Boxplot of the ratio of cytoplasmic read density to nuclear read density averaged from duplicates for lincRNAs and for mRNAs in TH2 cells at 2 weeks, with two representative lincRNA clusters being indicated (assessed by total RNA-Seq).

Supplemental Figure 2: LincRNA and protein-coding gene pairs co-regulated by STATs tend to be co-expressed in TH cells. Cumulative distribution of Pearson correlation coefficient (r) of gene expressions between a lincRNA and an mRNA during the time course of differentiation from naïve CD4+ T cells into four types of TH cells for three groups of lincRNA-mRNA pairs defined based on their expression changes in STAT4-deficient TH1 (a) cells or in STAT6-deficient TH2 cells (b) against wide type TH1 (a) or TH2 (b) cells: <nchg, nchg> (both not changed), <up, up> (both up-regulated), and <dwn, dwn> (both down-regulated). ** P-value < 1E-15 (Kolmogorov-Smirnov test)

Supplemental Figure 3: Inference of lincRNA function based on co-expression analysis. (a)Examples of scatter plot of gene expressions assessed by normalized polyA+read densitiesfrom naïve CD4+ cells to different TH cell subsets of all time points between lincRNA and protein coding genes. (b)Each lincRNA cluster wasassociated with three groups of protein coding genes based on their similarities in gene expressions across all THsubsets to this lincRNA: r > 0.6, r < -0.6and -0.6≤ r ≤ 0.6. The percentage of genes located within 100K bps of the lincRNA was determined for each group. X-axis refers to the % of genes near a lincRNA (within 100K bps) for a particular group of protein coding genes. Y-axis refers to the % of groups that contains lincRNA-nearby genes at a rate less than the one specified in the X-axis. * P-value < 0.01(Kolmogorov-Smirnov test).(c) Heat map of hierarchy analysis of p-values (binomial test) from Gene Ontology (GO) enrichment analysis for groups of genes that show modestly positive (r > 0.6) or negative correlations(r < -0.6) with a lincRNA (each column).Each row representsa GO term.The p-value of enrichment is shown in a color key with red indicating the most significance.

Supplemental Figure 4: Expressions of ribosome-biogenesis-related lincRNAs and mRNAs during T cell differentiation.(a) Heat map of hierarchy clustering analysis of gene expressions of 151 lincRNAs associated with ribosome biogenesis as defined in Supplemental Fig. 3cat multiple time points from naïve CD4 T cells to different TH cells, assessed by polyA+ RNA-Seq.Each row representsa lincRNA of which the expression values were transformed into z-scores. (b) Heat map of hierarchy clustering analysis of gene expressions of53protein-coding genes annotated as“ribosome biogenesis”across different TH cell subsets as in (a).

Supplemental Figure 5: Th2-specific LincR-Ccr2-5'AS is transcribed in the antisense strand of Ccr2. (a) Genome browser image showing the distribution of reads from the Watson and Crick strands determined by strand-specific total RNA-Seq across the LincR-Ccr2-5'AS cluster in TH2 cells at two weeks. (b) Heatmap for the expression level of LincR-Ccr2-5'ASat different developmental stages of T cell subsets with duplicates.

Supplemental Figure 6: Shared targets of LincR-Ccr2-5'AS and GATA-3 are highly enriched in immune function.(a)Cartoons showing the four potential modes of regulation of the shared target genes by LincR-Ccr2-5'ASand GATA-3 in TH2 cells with genome browser images of RNA-Seq data for a specific example for each group: (i) both LincR-Ccr2-5'ASand GATA-3 activate the target gene; (ii) both LincR-Ccr2-5'ASand GATA-3 repress the target gene; (iii) GATA-3 activates but LincR-Ccr2-5'ASrepresses the target gene; (iv) GATA-3 represses but LincR-Ccr2-5'ASactivates the target gene. (b) Upper panel: 3D bar graphs for the ratios of observed (Obs) to expected (Exp) number of genes grouped based on their responses to LincRNA-Ccr2-5'AS knockdown and to Gata3 knockout in TH2 cells: increased (Incr), decreased (Decr), and no change (Nchg) in the treatment cells. Lower panel: 3D bar graphs for the percentages of TH2-preferred genes (TH2-pref) for the same group of genes as in the upper panel. (c) List of genes responsive to both LincRNA-Ccr2-5'AS knockdown and to Gata3 knockout in TH2 cells and included in GO terms “cell cycle phase” and “immune system process”, with color indicating the group where the gene is from, defined in (a).

Supplemental Figure 7: Visualization of lincRNA structures and expressionswith UCSC genome browser.(a)Visualization of RNA-Seq read distribution across certain genomic region by uploading a BEDGraph file from GEO (GSE48138) for interested T cell subset (replicate 1 of TH2 cell at two weeks in this example) as a custom track to the online UCSC genome browser ( “manage custom tracks”);y-axis representsthe number of RNA-Seq reads per 100bp window, normalized by total reads number (in millions) and then by window size. (b) Visualization of lincRNA clusters (LincR-Gata3-3' in this example) by uploading the GFF file for lincRNA coordinates from GSE48138 to the UCSC genome browser as a custom track. (c)Visualization of spliced transcripts (cufflinks assembly) by uploadingGFF file compiled for spliced transcripts from GSE48138 to the UCSC genome browser as a custom track. (d) Gene annotations from UCSC, RefSeq and Ensemble, which are among many of theuseful built-in features of the online UCSC genome browser. The figure was not edited except for adding labels a-d to maintain the original appearance captured from the online UCSC genome browser at the time of manuscript preparation.

SUPPLEMENTAL TABLES

Supplemental Table 1: A compilation of lincRNAs during T cell development and differentiation.

The spreadsheet file includes proposed names of lincRNAs identified in this study, their expressions across different T cells subsets, their likelihoods of being polyadenylated, their ratios of cytoplasmic to nuclear fractions in TH2 cells at two weeks, and their comparison to public lincRNA annotation.

0h / 24 hours / 72 hours / 2 weeks
R1 / R2 / R1 / R2 / R1 / R2 / R1 / R2
Naïve CD4 T / X / X
TH1 / X / X / X / X / X
TH2 / X / X / X / X / X
TH17 / X / X / X / X / X
iTreg / X / X / X / X

Supplemental Table 2: THcell subsets analyzed by both total RNA-Seq and polyA+ RNA-Seq.

All T cell subsets at the eight time points (4 hrs, 8 hrs, 12hrs, 24 hrs, 48 hrs, 72 hrs, 1 w, 2w) during the differentiation from naïve CD4+ T cell to distinctive TH cells were sequenced with polyA+ RNA, with duplicates (R1 and R2). Marked (X) here are T cell subsets from which total RNA-Seq was also performed.

Category / # / FDR
KEGG pathway / Systemic lupus erythematosus / 36 / 1.0E-04
Cytokine-cytokine receptor interaction / 62 / 8.6E-04
Allograft rejection / 22 / 1.3E-02
Hematopoietic cell lineage / 27 / 3.2E-02
Biological Process / Regulation of leukocyte activation / 44 / 3.0E-03
Negative regulation of cell proliferation / 55 / 2.1E-02
Regulation of apoptosis / 111 / 2.4E-02
Regulation of macromolecular biosynthetic process / 386 / 3.9E-02
Regulation of adaptive immune response / 21 / 4.0E-02
GO Molecular Function / Chemokine receptor activity / 15 / 6.5E-05

Supplemental Table 3: KEGG pathway and GO enrichment analyses for protein-coding genes located next to lincRNA genes

Protein coding genes residing within 100K bps of anyof the 1,524 lincRNA genes from this studywere analyzed.

Gene symbol / Chromosome / Distance to LincR-Ccr2-5'AS (K bps) / r
Ccr3 / chr9 / 11 / 0.72
Ccr2 / chr9 / 26 / 0.86
Ccr5 / chr9 / 45 / 0.69
Ccr1 / chr9 / 74 / 0.76
Xcr1 / chr9 / 196 / 0.64
Cxcr6 / chr9 / 242 / 0.64
Ccr7 / chr11 / - / -0.66

Supplemental Table 4: Chemokine receptor genes located to the genomic region surrounding the LincR-Ccr2-5'AS gene.

The seven chemokine receptor genes are annotated in the chemokine-mediated signaling pathwayand show modestly positive (r > 0.6) or negative (r < -0.6) correlations with LincR-Ccr2-5'AS in gene expression at multiple time points during the differentiation from naïve CD4+ T cells to different TH cells (defined in Fig. 2d). Six of the seven genes are located nearby the LincR-Ccr2-5'AS cluster.

sh36 / sh40 / Expression change in KD
Gene symbol / FC (Ctrl/KD) / FDR / FC (Ctrl/KD) / FDR
Ccr1 / 1.56 / 1.9E-03 / 2.57 / 2.7E-09 / Decrease
Ccr3 / 2.53 / 2.3E-05 / 4.3 / 1.2E-07 / Decrease
Ccr2 / 2.38 / 4.7E-14 / 2.42 / 8.6E-14 / Decrease
Ccr5 / 1.59 / 3.6E-04 / 11.7 / 1.1E-47 / Decrease

Supplemental Table 5: Expression responses of Ccr genesto LincR-Ccr2-5'AS knockdown in TH2 cells.

Supplemental Table 6: Th2-preferred genes affected by LincR-Ccr2-5'AS knockdown in TH2 cells.

Biological processes / P-value
Down-regulated genes / nuclear division / 2.6E-12
mitosis / 2.6E-12
organelle fission / 1.4E-11
cell division / 6.3E-10
cell cycle process / 2.1E-08
cell cycle / 2.6E-08
cell cycle phase / 2.4E-06
protein localization to kinetochore / 2.3E-05
microtubule-based process / 3.1E-05
regulation of store-operated calcium channel activity / 8.9E-05
Up-regulated genes / immune system process / 3.6E-27
regulation of immune system process / 3.4E-21
regulation of response to stimulus / 1.1E-19
positive regulation of immune system process / 1.9E-19
response to stimulus / 2.9E-19
regulation of cell activation / 4.6E-18
immune response / 9.0E-18
regulation of multicellular organismal process / 2.5E-17
biological regulation / 4.3E-17
regulation of leukocyte activation / 8.5E-17

Supplemental Table 7: GO enrichment analysis for genes responsive to LincR-Ccr2-5'AS knockdown in TH2 cells.

Biological process / q-value
Cell cycle phase / 1.59E-05
Immune system process / 1.82E-05
G-protein coupled receptor signaling pathway / 3.13E-05
Nuclear division / 2.60E-05
Mitosis / 2.08E-05
Cell cycle process / 3.46E-05
Organelle fission / 7.15E-05
Regulation of multicellular organismal process / 9.81E-05
Single-organism cellular process / 1.20E-04
Response to stimulus / 1.28E-04

Supplemental Table 8: GO enrichment analysis for genes responsive toboth LincR-Ccr2-5'AS knockdown and Gata3knockout in TH2 cells.

Targeted bya TF / Targeted & positively regulated bya TF / Targeted & negatively regulated bya TF
T-bet / STAT4 / GATA-3 / STAT6 / T-bet / STAT4 / GATA-3 / STAT6 / T-bet / STAT4 / GATA-3 / STAT6
LincRNA
(1524) / 209
(13.7%) / 862
(56.6%) / 365
(24.0%) / 856
(56.2%) / 11
(5.3%) / 126
(14.6%) / 102
(27.9%) / 104
(12.1%) / 4
(1.9%) / 73
(8.5%) / 37
(10.1%) / 90
(10.5%)
mRNA
(16303) / 2196
(13.5%) / 12996
(79.7%) / 5058
(31.0%) / 13503
(82.8) / 71
(3.2%) / 1938
(14.9%) / 545
(10.8%) / 1199
(8.9%) / 36
(1.6%) / 1020
(7.8%) / 544
(10.8%) / 1133
(8.4%)

Supplemental Table 9: LincRNA and mRNAtargets of TH cell specific TFs.

A gene is targeted by a TF if the TF binds to the promoter and/or gene body region of the gene. A gene is positively or negatively regulated by a TF if the absence of the TF significantly down-regulates or up-regulates the expression of the gene.The binding data of T-bet and STAT4 and gene expression data forTbx21-/- and Stat4-/- were from TH1 cells, while the binding data of GATA-3 and STAT6 and the gene expression data forGata3-/- and Stat6-/- were from TH2 cells. Genes without any mapped RNA-Seq tags from the TF-deficient cells or the control cells were excluded from the calculation whenever applied.

1