Authors: Jun Chen1, 2, Yan-Fei Zeng1*, Wan-Jin Liao3, Peng-Cheng Yan4, Jian-Guo Zhang1, 2*

A novel set of single-copy nuclear gene markers in white oak and implications for species delimitation

1State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation, State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China

2Collaborative Innovation Center of Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China

3MOE Key Laboratory for Biodiversity Science and Ecological Engineering, Beijing Normal University, Beijing 100875, China

4Beijing Key Laboratory of Cloud Computing Key Technology and Application, Beijing Computing Center, Beijing, China

*Corresponding author:

Yan-Fei Zeng

Email: , Telephone: 86-10-62888786, Fax number: 86-10-62872015

Address: NO. 1 Dongxiaofu, Xiangshan Road, Haidian District, Beijing, China 100091

Jian-Guo Zhang

Email: , Telephone: 86-10-62889601, Fax number: 86-10-62872015

Address: NO. 1 Dongxiaofu, Xiangshan Road, Haidian District, Beijing, China 100091


Supplementary Information

Construction of EST database

For RNA preparation, seeds of Q. mongolica were collected from a natural population in Ning’an, Heilongjiang Province, China. After the seeds were germinated in the lab, total RNA was extracted from the roots of a seedling using TRIzol reagent (Gibco BRL), following the manufacturer’s recommendations. Beads with Oligo(dT) were used to isolate poly(A) mRNA, and random hexamer primers were used for synthesizing cDNA. Short cDNA fragments were purified and then subjected to end repair, adenylation, and sequencing adapter ligation. The cDNA library, which consisted of the selected fragments, was then sequenced using Illumina HiSeq™ 2000 to obtain short sequences for each cDNA. The Trinity (Li et al. 2010) program was used to assemble the transcriptome de novo. Reads with a certain length of overlap were combined to form contigs. Using paired-end reads, we detected contigs from the same transcript and the distances between these contigs. The contigs were then connected by Trinity, and sequences that could not be extended on either end were defined as Unigenes.
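The idea of combining reads that share an overlap into contigs can be illustrated with a minimal sketch. This is not the algorithm implemented in Trinity; it is a simple greedy suffix-prefix merge, and the function names, the example reads, and the `min_overlap` threshold are all hypothetical choices made for illustration.

```python
def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no such overlap reaches `min_overlap` bases.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_merge(reads, min_overlap=3):
    """Merge the best-overlapping pair repeatedly until no pair overlaps.

    Toy illustration only: real assemblers handle sequencing errors,
    reverse complements, and paired-end information.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            # Nothing can be extended further on either end.
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads, each sharing a 4-base overlap with the next.
example_reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_merge(example_reads))
```

In this sketch, the sequences returned at the end are those that cannot be extended on either end, which loosely mirrors how Unigenes are defined above.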

References

Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, et al. (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20(2):265-72.


Fig. S1 Schematic summary of the strategy used to develop markers for phylogeny using ESTs from Q. mongolica, Q. robur and selected ESTs from the research of Hubert et al. (2014)

Fig. S2 Dendrogram of one individual for each of 14 oak species from three sections based on 21 SCNG loci. The dendrogram was computed using a Bayesian approach implemented in *BEAST; numbers are posterior probabilities; posterior probabilities of 0.50 or less were removed

Fig. S3 Position and characterization of insertion/deletion in locus CL5191


Fig. S4 Position and characterization of insertion/deletion in locus Q452

Fig. S5 Position and characterization of insertion/deletion in locus Q543
