A Systematic Heritability Analysis of the Human Whole Blood Transcriptome

Supplementary Materials

Tianxiao Huan1,2, Chunyu Liu1,2, Roby Joehanes1,2,3, Xiaoling Zhang1, Brian H. Chen1,2, Andrew D. Johnson1, Chen Yao1,2, Paul Courchesne1,2, Christopher J. O'Donnell1, Peter J. Munson3, and Daniel Levy1,2

1The National Heart, Lung and Blood Institute’s Framingham Heart Study, Framingham, MA

2Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Bethesda, MD

3Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for Information Technology, National Institutes of Health, Bethesda, MD

Correspondence should be addressed to:

Daniel Levy, MD

Framingham Heart Study

Population Sciences Branch

National Heart, Lung, and Blood Institute

73 Mt. Wayte Avenue, Suite 2

Framingham, MA 01702

Email:

Phone: 508-935-3458

Fax: 508-872-2678

Running title: Heritability of the Blood Transcriptome

Key words: heritability, eQTL, transcriptome, gene expression

Word count of the abstract: 245

Word count of the body without references or tables: 5637

Word count of the body including references and tables: 9814

Number of figures: 5

Number of tables: 4
Supplementary Notes

Heritability estimation in relation to cohorts and sample size

It is well known that heritability estimates are affected by sample size and family structure (Tenesa and Haley 2013). Previous studies reported heritability of gene expression in smaller samples (n ≈ 1,000) with relatively simple family structures (Emilsson et al. 2008; Göring et al. 2007; Grundberg et al. 2012; Price et al. 2011; Stranger et al. 2007). For example, Grundberg et al. reported heritability of gene expression in twins (Grundberg et al. 2012), and Price et al. reported heritability of gene expression in a population sample via identity-by-descent (IBD) in related or unrelated individuals (Price et al. 2011).

We found that 40% of gene transcripts are heritable (h² > 0). The average heritability of gene expression is 0.07 across all 18,000 genes and 0.13 across the 7,161 genes with h² > 0. Several studies reported that 40-70% of gene transcripts are heritable with h² > 0 (Emilsson et al. 2008; Göring et al. 2007; Price et al. 2011; Stranger et al. 2007). Dixon et al. reported an average heritability of 0.2 for global gene expression (Dixon et al. 2007), and the average heritability for transcripts with h² > 0 ranged between 0.15 and 0.30 in other studies (Emilsson et al. 2008; Price et al. 2011; Stranger et al. 2007). These heritability estimates are considerably larger than ours.

We further compared heritability estimates at the individual transcript level between our study and the study by Göring et al. (Göring et al. 2007). We used the following formula to assess whether heritability estimates were smaller in the FHS than in the study by Göring et al.:

$$\Delta = \frac{h^2_{\mathrm{FHS}} - h^2_{\mathrm{Goring}}}{h^2_{\mathrm{Goring}}} \times 100\%$$

where $h^2_{\mathrm{FHS}}$ and $h^2_{\mathrm{Goring}}$ are the heritability estimates for the same gene transcript in the FHS and in the study by Göring et al. (Göring et al. 2007), respectively. Supplementary Table S3 shows that, using the rule Δ < -10%, 62% of the 838 genes with h² > 0.2 had lower heritability estimates in the FHS than in the study by Göring et al. The Pearson correlation between the heritability estimates from the two studies was only 0.3. However, as heritability increased, a smaller proportion of genes had lower heritability estimates in the FHS than in the study by Göring et al., and the estimates from the two studies became more concordant: the Pearson correlation between them increased from 0.3 to 0.85 as the heritability threshold increased from 0.2 to 0.8.
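The transcript-level comparison can be sketched in Python as below. This is a minimal illustration, assuming the relative difference Δ = 100 × (h²_FHS − h²_Göring) / h²_Göring and assuming that genes are filtered on the Göring et al. estimate when an h² threshold is applied (the text does not say which study's estimate was used). The helper names `relative_change` and `compare_by_threshold` are hypothetical, and the toy values are illustrative only, not study data.

```python
import numpy as np

def relative_change(h2_fhs, h2_ref):
    """Delta in percent: 100 * (h2_FHS - h2_ref) / h2_ref (assumed form of the formula)."""
    h2_fhs = np.asarray(h2_fhs, float)
    h2_ref = np.asarray(h2_ref, float)
    return 100.0 * (h2_fhs - h2_ref) / h2_ref

def compare_by_threshold(h2_fhs, h2_ref, thresholds=(0.2, 0.4, 0.6, 0.8), cutoff=-10.0):
    """Return (threshold, n_genes, fraction with Delta < cutoff, Pearson r) per threshold.

    Assumption: genes are kept when the reference-study h2 exceeds the threshold.
    """
    h2_fhs = np.asarray(h2_fhs, float)
    h2_ref = np.asarray(h2_ref, float)
    rows = []
    for t in thresholds:
        keep = h2_ref > t
        n = int(keep.sum())
        delta = relative_change(h2_fhs[keep], h2_ref[keep])
        frac_lower = float(np.mean(delta < cutoff)) if n else float("nan")
        # A correlation on two or fewer points is uninformative, so report NaN
        r = float(np.corrcoef(h2_fhs[keep], h2_ref[keep])[0, 1]) if n > 2 else float("nan")
        rows.append((t, n, frac_lower, r))
    return rows

# Toy example (made-up values, not FHS or Göring et al. estimates)
for row in compare_by_threshold([0.10, 0.40, 0.72, 0.85], [0.30, 0.50, 0.70, 0.90]):
    print(row)
```

With real data, the two input vectors would hold the matched per-transcript estimates from both studies.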

The overall distributions of h² for genes in the FHS offspring cohort and the third generation cohort are similar (Supplementary Figure S2). The average heritability is slightly higher in the offspring cohort (mean h² = 0.087, median h² = 0.052) than in the third generation cohort (mean h² = 0.078, median h² = 0.044).

To investigate whether sample size and family structure affect heritability estimates, we randomly selected n = 50, 100, and 200 nuclear families from the full sample. We estimated heritability using two family structures: 1) offspring only (that is, only the offspring in these 50, 100, and 200 nuclear families) and 2) both offspring and parents (that is, the entire nuclear families). To minimize random fluctuation, we repeated the random sampling of nuclear families 500 times. We randomly selected 10 genes in each of seven h² ranges (70 genes in total), h² = (0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 0.5), (0.5, 0.6), and (0.6, 1), to represent the full range of transcript heritability. Supplementary Figure S3 displays h² estimates for these 70 genes from different numbers of nuclear families. The standard errors of h² averaged 0.16, 0.11, and 0.07 when sampling 50, 100, and 200 full families, and 0.18, 0.12, and 0.08 when sampling only the offspring of 50, 100, and 200 nuclear families. These results indicate that larger sample sizes and more complete family structures yield more precise heritability estimates.
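The effect of sample size and family structure on precision can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the LME analysis used here: it simulates new nuclear families under a purely additive model with no shared environment (instead of subsampling real FHS pedigrees), and it swaps in two classical moment estimators, mid-parent-offspring regression for the offspring-plus-parents design and twice the full-sib intraclass correlation for the offspring-only design. The true h² (0.4), two offspring per family, and all function names are illustrative assumptions.

```python
import numpy as np

def simulate_families(n_families, h2, n_offspring, rng):
    """Simulate nuclear families under a purely additive model (unit phenotypic variance)."""
    sd_a, sd_e = np.sqrt(h2), np.sqrt(1.0 - h2)
    # Two parents per family: additive genetic value plus independent residual
    a_par = rng.normal(0.0, sd_a, size=(n_families, 2))
    p_par = a_par + rng.normal(0.0, sd_e, size=(n_families, 2))
    # Offspring inherit the mid-parent genetic value plus Mendelian segregation (variance h2/2)
    mid_a = a_par.mean(axis=1, keepdims=True)
    a_off = mid_a + rng.normal(0.0, np.sqrt(h2 / 2.0), size=(n_families, n_offspring))
    p_off = a_off + rng.normal(0.0, sd_e, size=(n_families, n_offspring))
    return p_par, p_off

def midparent_h2(p_par, p_off):
    """Offspring + parents: slope of mean-offspring phenotype on mid-parent phenotype."""
    x, y = p_par.mean(axis=1), p_off.mean(axis=1)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def sib_h2(_p_par, p_off):
    """Offspring only (parents ignored): h2 = 2 x full-sib ICC, one-way ANOVA estimator."""
    n, k = p_off.shape
    fam_means = p_off.mean(axis=1)
    msb = k * np.sum((fam_means - p_off.mean()) ** 2) / (n - 1)
    msw = np.sum((p_off - fam_means[:, None]) ** 2) / (n * (k - 1))
    return 2.0 * (msb - msw) / (msb + (k - 1) * msw)

def resampling_se(estimator, n_families, h2=0.4, n_offspring=2, n_reps=500, seed=1):
    """Empirical SE and mean of an h2 estimator over n_reps simulated samples."""
    rng = np.random.default_rng(seed)
    est = np.array([estimator(*simulate_families(n_families, h2, n_offspring, rng))
                    for _ in range(n_reps)])
    return float(est.std(ddof=1)), float(est.mean())

if __name__ == "__main__":
    for n in (50, 100, 200):
        se_full, _ = resampling_se(midparent_h2, n)
        se_sibs, _ = resampling_se(sib_h2, n)
        print(f"{n:3d} families: SE(parents + offspring) = {se_full:.3f}, "
              f"SE(offspring only) = {se_sibs:.3f}")
```

Under these assumptions, the empirical standard error shrinks roughly in proportion to 1/√n as the number of families grows, and the offspring-only estimator is less precise than the design that includes parents. The exact values depend on the simulated model and are not expected to match the FHS results.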

Heritability estimation in relation to cell types

Previous heritability estimates were reported for gene expression measured in different tissues [19, 20]. In the FHS, gene expression was measured in RNA derived from whole blood, which contains multiple cell types. To gain insight into how the cellular composition of whole blood affects heritability estimates, we compared heritability estimates for transcripts (n = 18,000) with and without adjustment for cell types (i.e., white blood cells [WBCs], neutrophils, lymphocytes, monocytes, platelets, eosinophils, and basophils) as additional covariates in the LME models. Cell count proportions in whole blood were measured in 2,138 individuals, and we imputed cell count proportions for all FHS individuals using partial least squares (PLS) regression (Boardman et al. 1981). The imputed proportions were highly consistent with the measured proportions (Joehanes R, PhD, in preparation). The average h² was unchanged (0.072) before and after accounting for cell counts.
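The supplement does not specify the PLS configuration. As a minimal sketch, assuming one single-response PLS model (PLS1, fitted with the NIPALS algorithm) per cell type, trained on individuals with measured cell counts and then applied to everyone, the imputation step could look like the following. The number of components and the helper names (`pls1_fit`, `pls1_predict`, `impute_cell_proportions`) are hypothetical; the component count would typically be chosen by cross-validation.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS regression via NIPALS; returns (x_mean, y_mean, coef)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centered copies, deflated in the loop
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # weight: direction of maximal covariance with y
        t = Xr @ w                   # latent score
        tt = t @ t
        p = Xr.T @ t / tt            # X loading
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X and y before the next component
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    coef = W @ np.linalg.solve(P.T @ W, np.array(Q))  # B = W (P'W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X_new):
    """Predict responses for new rows of the predictor matrix."""
    x_mean, y_mean, coef = model
    return y_mean + (np.asarray(X_new, float) - x_mean) @ coef

def impute_cell_proportions(expr_measured, cells_measured, expr_all, n_components=10):
    """Hypothetical helper: one PLS1 model per cell-type column, fitted on the subset
    with measured counts, then used to predict proportions for all individuals."""
    cells_measured = np.asarray(cells_measured, float)
    return np.column_stack([
        pls1_predict(pls1_fit(expr_measured, cells_measured[:, j], n_components), expr_all)
        for j in range(cells_measured.shape[1])
    ])
```

With as many components as predictors, PLS1 reduces to ordinary least squares, which is a convenient correctness check; fewer components regularize the fit when, as with expression data, predictors far outnumber samples.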

Furthermore, for each of three cell types (WBCs, neutrophils, and lymphocytes), we selected individuals in the top 25% and bottom 25% and compared heritability estimates between the top and bottom quartiles. We used LME models implemented in lmekin() (Abecasis et al. 2001) to test the associations between cell types and transcripts. Transcripts associated with a cell type at Bonferroni-corrected p < 0.05 were used to compare heritability between samples from the top and bottom quartiles of that cell type. WBC count affected heritability estimates the most: the average h² was higher in individuals in the top quartile of WBC count (mean h² = 0.13) than in those in the bottom quartile (mean h² = 0.10; Student's t-test, p = 5.7e-11). Similarly, heritability estimates were slightly higher in individuals in the top quartile of lymphocyte proportion (mean h² = 0.10) than in those in the bottom quartile (mean h² = 0.09; Student's t-test, p = 1.5e-4). The average h² did not differ between individuals in the top and bottom quartiles of neutrophil proportion (mean h² = 0.10; p = 0.6).
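The selection and testing steps can be outlined with NumPy. This is an illustrative sketch, not the analysis code: it assumes quartile boundaries are the empirical 25th and 75th percentiles, that the Bonferroni threshold is 0.05 divided by the number of transcripts tested, and that the t-test compares per-transcript h² values between the two groups using a pooled-variance (Student's) statistic. The function names are hypothetical.

```python
import numpy as np

def quartile_masks(values):
    """Boolean masks for individuals in the bottom and top 25% of a cell-type measure."""
    v = np.asarray(values, float)
    q1, q3 = np.quantile(v, [0.25, 0.75])
    return v <= q1, v >= q3

def bonferroni_select(pvalues, alpha=0.05):
    """Indices of transcripts significant after Bonferroni correction (p < alpha / m)."""
    p = np.asarray(pvalues, float)
    return np.flatnonzero(p < alpha / p.size)

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return float(t), na + nb - 2
```

The p-value would then come from a t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind`, which computes the same pooled statistic by default.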

Summary of eQTL results

The eQTL data were generated from 5,257 individuals with both gene expression profiling and genotype dosage data imputed using the 1000 Genomes SNP set. In total, we identified 2,206,283 cis and 160,183 trans eQTLs for 12,407 genes at FDR < 0.05. Details of the eQTL analysis will be reported in a separate paper (Joehanes et al., in preparation).
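The supplement does not state which FDR procedure was used. Assuming the Benjamini-Hochberg step-up procedure, a common choice for genome-wide eQTL scans, selecting eQTLs at FDR < 0.05 could be sketched as follows; `benjamini_hochberg` is a hypothetical helper name.

```python
import numpy as np

def benjamini_hochberg(pvalues, fdr=0.05):
    """Boolean mask of tests declared significant by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvalues, float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with its critical value i * fdr / m
    passed = p[order] <= fdr * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()  # largest rank meeting its critical value
        keep[order[: k + 1]] = True        # declare all tests up to that rank significant
    return keep

print(benjamini_hochberg([0.02, 0.025, 0.04]))
```

Because the rule keeps every test up to the largest passing rank, a p-value that misses its own critical value (0.02 above, against 0.05/3) can still be declared significant.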

Supplementary Figures

Supplementary Figure S1: Heritability distribution of the transcriptome in the Framingham cohorts. A) Heritability (h²) distribution of transcripts without SNP-probes (n = 6,059; see Methods); B) Summary of genes in different h² subcategories

Supplementary Figure S2: Heritability distribution of transcripts in Framingham offspring cohort and the third generation cohort.

Supplementary Figure S3: Heritability estimates of transcripts in relation to sample size and family structure. The x-axis indicates the 70 genes, 10 genes in each of seven h² ranges: (0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 0.5), (0.5, 0.6), and (0.6, 1). The y-axis is the h² estimate.

Supplementary Figure S4: Effect size of cis/trans eQTLs in the discovery and replication sets. Cis eQTLs were defined as eQTLs located within 1 Mb on either side of the transcription start site (TSS) of the associated gene, and trans eQTLs were defined as eQTLs located on a different chromosome from the associated gene. In this study, we excluded putative trans eQTLs located on the same chromosome as the associated gene but more than 1 Mb from its TSS.
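The cis/trans definitions in the Figure S4 caption translate into a simple rule for each SNP-gene pair. In this sketch, `classify_eqtl` is a hypothetical helper; it assumes base-pair positions on the same genome build and treats the 1 Mb boundary as inclusive, which the caption does not specify. Pairs on the same chromosome but more than 1 Mb from the TSS return `None`, mirroring their exclusion from the trans set.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb on either side of the TSS

def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss, window=CIS_WINDOW_BP):
    """Return 'cis', 'trans', or None for a SNP-gene pair (boundary assumed inclusive)."""
    if snp_chrom != gene_chrom:
        return "trans"                # different chromosome
    if abs(snp_pos - gene_tss) <= window:
        return "cis"                  # within 1 Mb of the TSS
    return None                       # same chromosome, > 1 Mb away: excluded

print(classify_eqtl("1", 1_500_000, "1", 1_000_000))  # SNP 500 kb from the TSS
```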


Supplementary Figure S5: Comparison of t statistics of cis (A) and trans (B) eQTLs at FDR < 1e-4 before and after accounting for cell count effects. We compared the BETA/SE ratios (t values) of eQTLs from regression models adjusted for cell types with those from models not adjusted for cell types. The t values were similar with or without adjustment for cell types; the Pearson correlations of t values were 0.99 and 0.98 for cis and trans eQTLs, respectively.

Supplementary Figure S6: Relationship between Heritability Estimates and Proportion of Genes with cis/trans eQTLs. A) Proportion of transcripts with cis eQTLs at different heritability levels; B) Proportion of transcripts with trans eQTLs at different heritability levels. (Based on eQTLs for the 6,059 genes without SNP-probes; see Methods.)

Supplementary Figure S7: Proportion of Transcript Variance Explained by Peak eQTLs. A) Proportion of transcript variance explained by a single peak cis eQTL; B) Proportion of transcript variance explained by a single peak trans eQTL. (Based on eQTLs for the 6,059 genes without SNP-probes; see Methods.)
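For a single SNP, the proportion of transcript variance explained is the R² of a simple linear regression of expression on allele dosage. The sketch below computes it directly and, for comparison, from summary statistics via 2p(1 − p)β²/Var(y), which assumes an additive effect and Hardy-Weinberg equilibrium. Neither formula is stated in the supplement, and the function names are hypothetical.

```python
import numpy as np

def variance_explained(dosage, expression):
    """R^2 of a simple linear regression of expression on allele dosage (squared Pearson r)."""
    g = np.asarray(dosage, float)
    y = np.asarray(expression, float)
    r = np.corrcoef(g, y)[0, 1]
    return float(r ** 2)

def variance_explained_from_beta(beta, maf, var_y):
    """Same quantity from summary statistics, assuming an additive SNP in HWE."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / var_y

# Simulated example: allele frequency 0.3, per-allele effect 0.5, unit residual variance
rng = np.random.default_rng(0)
g = rng.binomial(2, 0.3, size=20_000)
y = 0.5 * g + rng.normal(size=20_000)
print(variance_explained(g, y), variance_explained_from_beta(0.5, 0.3, y.var()))
```

Both values should be close to 0.105 / 1.105 ≈ 0.095 for this simulated SNP.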

Supplementary Tables

Supplementary Table S1: Technical covariates utilized for gene expression data normalization

Supplementary Table S2: Pedigree structure formed by Framingham Heart Study participants

Supplementary Table S3: Comparison of h² estimates in the FHS and in Göring et al. (2007)

Supplementary Table S4: Gene Expression Heritability and Their Peak cis eQTL

Supplementary Table S5: Gene Expression Heritability and Their Peak trans eQTL

Supplementary Table S6: Gene Expression Heritability and Their Peak long-range cis eQTL

Supplementary Table S7: Differentially expressed genes for body mass index at FDR<0.05

References

Abecasis GR, Cardon LR, Cookson WO, Sham PC, Cherny SS (2001) Association analysis in a variance components framework. Genet Epidemiol 21 Suppl 1: S341-S346

Boardman AE, Hui BS, Wold H (1981) The partial least squares-fix point method of estimating interdependent systems with latent variables. Commun Stat Theory Methods 10: 613-639

Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M, Lathrop GM, Abecasis GR, Cookson WO (2007) A genome-wide association study of global gene expression. Nat Genet 39: 1202-1207

Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, Carlson S, Helgason A, Walters GB, Gunnarsdottir S, Mouy M, Steinthorsdottir V, Eiriksdottir GH, Bjornsdottir G, Reynisdottir I, Gudbjartsson D, Helgadottir A, Jonasdottir A, Styrkarsdottir U, Gretarsdottir S, Magnusson KP, Stefansson H, Fossdal R, Kristjansson K, Gislason HG, Stefansson T, Leifsson BG, Thorsteinsdottir U, Lamb JR, Gulcher JR, Reitman ML, Kong A, Schadt EE, Stefansson K (2008) Genetics of gene expression and its effect on disease. Nature 452: 423-428

Göring HHH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, Jowett JBM, Abraham LJ, Rainwater DL, Comuzzie AG, Mahaney MC, Almasy L, MacCluer JW, Kissebah AH, Collier GR, Moses EK, Blangero J (2007) Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nat Genet 39: 1208-1216

Grundberg E, Small KS, Hedman AK, Nica AC, Buil A, Keildson S, Bell JT, Yang TP, Meduri E, Barrett A, Nisbett J, Sekowska M, Wilk A, Shin SY, Glass D, Travers M, Min JL, Ring S, Ho KR, Thorleifsson G, Kong A, Thorsteindottir U, Ainali C, Dimas AS, Hassanali N, Ingle C, Knowles D, Krestyaninova M, Lowe CE, Di Meglio P, Montgomery SB, Parts L, Potter S, Surdulescu G, Tsaprouni L, Tsoka S, Bataille V, Durbin R, Nestle FO, O'Rahilly S, Soranzo N, Lindgren CM, Zondervan KT, Ahmadi KR, Schadt EE, Stefansson K, Smith GD, McCarthy MI, Deloukas P, Dermitzakis ET, Spector TD, Multiple Tissue Human Expression Resource (MuTHER) Consortium (2012) Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet 44: 1084-1089

Price AL, Helgason A, Thorleifsson G, McCarroll SA, Kong A, Stefansson K (2011) Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet 7

Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavare S, Deloukas P, Dermitzakis ET (2007) Population genomics of human gene expression. Nat Genet 39: 1217-1224

Tenesa A, Haley CS (2013) The heritability of human disease: estimation, uses and abuses. Nat Rev Genet 14: 139-149
