Collection file for Supplementary Figures 1 - 3

Estimation of number and size of QTL effects in forest tree traits

David Hall, Henrik R. Hallingbäck, Harry X. Wu*

* Corresponding author

Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden

e-mail:

Supplementary Figure 1

Estimates of the number of QTL (nqtl) derived from simulated data for genomes comprising 25 (1a, b, c), 50 (2a, b, c) or 100 (3a, b, c) causal loci. The a-column shows nqtl-estimates (white boxplots) for QTL-mapping populations of 500 individuals at different numbers of detected QTL, compared with the corresponding numbers of causal segregating loci (red boxplots). The b-column shows nqtl-estimates (grey boxplots) for association mapping populations of 500 individuals at different numbers of detected QTL. The c-column shows nqtl-estimates (grey boxplots) for association mapping populations at different sample sizes. The designed numbers of causal loci are shown as red lines, and the number of estimates underlying each boxplot is given below the box. Five extreme nqtl-estimate outliers are omitted from the figure to keep the boxplots legible. These outliers occur in: (1) subplot 1c for sample size 100 at 8986101; (2 & 3) subplot 2c for sample size 100 at 985424 and for sample size 250 at 5289018; (4 & 5) subplot 3a for two detected QTL at 14066 and 2779.

Supplementary Figure 2

Estimates of the number of QTL (nqtl) at different population sample sizes originating from a simulated association mapping population where the studied trait is influenced by 50 causal loci. The leftmost subplot shows estimates based on QTL declared significant by a conventional significance threshold (Bonferroni-corrected p < 0.05), while the rightmost subplot shows the corresponding estimates where QTL were declared significant using a more liberal threshold (Bonferroni-corrected p < 0.2).
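As a rough illustration of how the two thresholds in this caption relate, the sketch below computes Bonferroni-corrected per-marker cutoffs by dividing the family-wise level by the number of tests. The marker count and the function name are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Hypothetical number of tested markers (an assumption, not from the manuscript).
n_markers = 10_000

conventional = bonferroni_threshold(0.05, n_markers)  # family-wise level 0.05
liberal = bonferroni_threshold(0.2, n_markers)        # family-wise level 0.2

print(f"conventional per-marker cutoff: {conventional:.2e}")
print(f"liberal per-marker cutoff:      {liberal:.2e}")
```

Under this correction the liberal cutoff is four times the conventional one regardless of the number of markers, so it can only declare the same or more QTL significant.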

Supplementary Figure 3

Comparison of commonly used single-locus association mapping software, SNPassoc (González et al., 2007) and TASSEL (Bradbury et al., 2007), with a multi-locus method (emBayesB) that is based on an expectation-maximization algorithm and was originally intended for genomic selection (Shepherd et al., 2010). All analyses use the full simulated association dataset described in the manuscript.

Comparison between two association methods (TASSEL and SNPassoc, GLM models) and a genomic selection method (emBayesB). In the two top panels, the grey dashed line is the Bonferroni-corrected significance level of 0.05 and the red dashed line is a more conservative significance level. The third panel shows single-locus analyses by TASSEL that include a genetic variance component associated with a kinship matrix to account for population structure (MLM model; Loiselle et al., 1995). The bottom two panels show results from the emBayesB method, which provides effect size estimates (second-to-last panel) and the probability of a SNP being in LD with causal variation (bottom panel; grey dashed line P = 0.05, red dashed line P = 0.5). Red dots are the causal segregating SNPs.

References:

Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23(19): 2633-2635.

González JR, Armengol L, Solé X, Guinó E, Mercader JM, Estivill X, Moreno V. 2007. SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23(5): 654-655.

Loiselle BA, Sork VL, Nason J, Graham C. 1995. Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). American Journal of Botany 82: 1420-1425.

Shepherd R, Meuwissen T, Woolliams J. 2010. Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers. BMC Bioinformatics 11(1): 529.