Comparative transcriptome analyses of a mangrove tree Sonneratia caseolaris and its non-mangrove relatives, Trapa bispinosa and Duabanga grandiflora
Jianfang Li a, Yuchen Yang a, Shuhuan Yang a, Zhang Zhang a, Sufang Chen a, Cairong Zhong b, Renchao Zhou a,*, Suhua Shi a,*
Methods of transcriptome analysis
Quality control and de novo assembly
Before assembly, low-quality reads were filtered out from the three datasets of S. caseolaris, Trapa bispinosa and Duabanga grandiflora using customized Perl scripts. The high-quality reads of each species were then de novo assembled using the Trinity program (trinityrnaseq_r20140413) with a minimum k-mer coverage of 2 and a minimum length of 200 bp (Grabherr et al., 2011). Following the method described by Yang et al. (2015a, 2015b), highly similar transcripts were clustered using TGICL-2.1 (Pertea et al., 2003) and CD-HIT (Fu et al., 2012), and the reassembled sequences (unigenes) with low coverage and depth were removed from each dataset to increase the reliability of the assembly.
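The filtering criteria of the customized Perl scripts are not given here, so the following Python sketch is only an illustration of one common approach to read-level quality filtering, not the scripts actually used. It assumes four-line FASTQ records with Phred+33 encoding; the thresholds (`min_phred`, `max_low_fraction`, `max_n_fraction`) and all function names are hypothetical.

```python
from typing import Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].rstrip(), lines[i + 1].rstrip(), lines[i + 3].rstrip()


def is_high_quality(seq: str, qual: str,
                    min_phred: int = 20,            # hypothetical threshold
                    max_low_fraction: float = 0.5,  # hypothetical threshold
                    max_n_fraction: float = 0.1,    # hypothetical threshold
                    phred_offset: int = 33) -> bool:
    """Return True if the read passes the assumed quality criteria."""
    if not seq or len(seq) != len(qual):
        return False
    # Count bases whose Phred score falls below the minimum.
    low = sum(1 for c in qual if ord(c) - phred_offset < min_phred)
    # Count ambiguous bases.
    n_count = seq.upper().count("N")
    return (low / len(seq) <= max_low_fraction
            and n_count / len(seq) <= max_n_fraction)


def filter_reads(lines: List[str]) -> List[FastqRecord]:
    """Keep only the records that pass is_high_quality."""
    return [rec for rec in parse_fastq(lines) if is_high_quality(rec[1], rec[2])]
```

In this sketch a read is discarded when too many of its bases are low quality or ambiguous; the retained records would then be passed on to the assembler.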
Functional annotations and Gene Ontology (GO) classification
All retained unigenes of the three datasets were functionally annotated against the Swiss-Prot database with an e-value cutoff of 1e-6 using AgBase (McCarthy et al., 2011). Gene Ontology (GO) classification was then carried out with the online program WEGO (Ye et al., 2006).
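As an illustration of how an e-value cutoff of 1e-6 can be applied to similarity search results, the Python sketch below keeps the best hit per unigene. It assumes a BLAST-style tab-separated table with the e-value in the 11th column; the output format produced by AgBase is not described here, and the function name and the choice to treat hits at exactly the cutoff as passing are assumptions.

```python
from typing import Dict, Iterable, Tuple


def best_hits(rows: Iterable[str],
              max_evalue: float = 1e-6) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, e-value)}, keeping the lowest e-value hit per query.

    Assumes tab-separated rows with query in column 1, subject in column 2
    and e-value in column 11 (BLAST tabular layout).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        # Skip blank lines and comment lines.
        if not row.strip() or row.startswith("#"):
            continue
        fields = row.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Discard hits weaker than the cutoff.
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

Unigenes without any hit at or below the cutoff are simply absent from the returned dictionary and would remain unannotated.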
Positively selected gene (PSG) identification in Sonneratia caseolaris
As described in Yang et al. (2015a), open reading frames (ORFs) comprising at least 100 amino acids were extracted from each unigene of the three datasets using TransDecoder (trinityrnaseq_r20140413; Grabherr et al., 2011). Putative orthologs were extracted among five species, Sonneratia caseolaris, S. alba, Trapa bispinosa, Duabanga grandiflora and Eucalyptus grandis, using OrthoMCL (Fischer et al., 2011). From the identified orthologs, putative positively selected genes (PSGs) along the branch of S. caseolaris, which was set as the “foreground branch”, were detected using the branch-site model in the codeml module of PAML (Yang, 2007; Yang and Nielsen, 2002; Zhang et al., 2005). Model A with ω fixed to 1 (the null model) was compared with Model A without ω fixed to 1 (the alternative model) for PSG determination. The likelihood ratio test (LRT) was employed to assess the reliability of the results with a p-value cutoff of 0.05, and Bonferroni’s multiple testing correction was performed to control for false positives, with a threshold of less than 0.05.
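The LRT and Bonferroni steps can be sketched in Python as follows. This is a minimal illustration rather than the pipeline actually used: it assumes that the log-likelihoods of the null and alternative models have already been taken from the codeml output, and that the test statistic 2ΔlnL is compared against a chi-square distribution with one degree of freedom, which is a common convention for this test. The function names and the input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of the likelihood ratio test, assuming a chi-square with 1 d.f."""
    # Test statistic: twice the log-likelihood difference, floored at zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 d.f. the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni_significant(lnl: Dict[str, Tuple[float, float]],
                           alpha: float = 0.05) -> List[str]:
    """Return genes whose Bonferroni-adjusted LRT p-value is below alpha.

    lnl maps a gene name to (lnL of null model, lnL of alternative model).
    """
    n_tests = len(lnl)
    significant = []
    for gene, (lnl_null, lnl_alt) in lnl.items():
        # Bonferroni adjustment: multiply by the number of tests, cap at 1.
        p_adj = min(1.0, lrt_pvalue(lnl_null, lnl_alt) * n_tests)
        if p_adj < alpha:
            significant.append(gene)
    return significant
```

Here each ortholog group contributes one test, so the number of tests equals the number of entries in the input dictionary.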
References for supplementary methods
Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, D.S., Stoeckert Jr, C.J., 2011. Using OrthoMCL to Assign Proteins to OrthoMCL-DB Groups or to Cluster Proteomes Into New Ortholog Groups. Current Protocols in Bioinformatics. 6.12.1-6.12.19.
Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W., 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 28, 3150-3152.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., Palma, F.D., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644-652.
McCarthy, F.M., Gresham, C.R., Buza, T.J., Chouvarine, P., Pillai, L.R., Kumar, R., Ozkan, S., Wang, H., Manda, P., Arick, T., Bridges, S.M., Burgess, S.C., 2010. AgBase: supporting functional modeling in agricultural organisms. Nucleic Acids Res. 10.1093.
Pertea, G., Huang, X., Liang, F., Antonescu, V., Sultana, R., Karamycheva, S., Lee, Y., White, J., Cheung, F., Parvizi, B., Tsai, J., Quackenbush, J., 2003. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 19, 651-652.
Yang, Y., Yang, S., Li, J., Deng, Y., Zhang, Z., Xu, S., Guo, W., Zhong, C., Zhou, R., Shi, S., 2015a. Transcriptome analysis of the Holly mangrove Acanthus ilicifolius and its terrestrial relative, Acanthus leucostachyus, provides insights into adaptation to intertidal zones. BMC Genomics. 16, 605.
Yang, Y., Yang, S., Li, J., Li, X., Zhong, C., Huang, Y., Zhou, R., Shi, S., 2015b. De novo assembly of the transcriptomes of two yellow mangroves, Ceriops tagal and C. zippeliana, and one of their terrestrial relatives, Pellacalyx yunnanensis. Mar Genomics. 23, 33-36.
Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24, 1586-1591.
Yang, Z., Nielsen, R., 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 19, 908-917.
Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., Bolund, L., Wang, J., 2006. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 34, 293-297.
Zhang, J., Nielsen, R., Yang, Z., 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 22, 2472-2479.