
Supplemental Information

Construction and evaluation of the barley FLcDNA libraries

The malting barley cultivar Haruna Nijo (Hordeum vulgare L. cv. Haruna Nijo) was used as the RNA source for cDNA library construction. To obtain samples at various developmental stages, plants were grown in a field at Okayama University, Kurashiki, Japan, and flower, flag leaf, and seed samples were collected from these plants and stored in liquid N2. For the abiotic stress and plant hormone treatments, plants were grown in a hydroponic nutrient solution (Sato et al. 2009) in a growth chamber at 20 °C under a 16 h light/8 h dark photoperiod. After 3 days of culture, the nutrient solution was replaced with a solution containing the specific chemicals, or the plants were placed under the specific conditions, described in Table S1. Shoot and root tissues were then collected separately for RNA extraction.

Total RNA was prepared from the barley samples either as described previously (Sato et al. 2009) or with an RNeasy kit (Qiagen, Valencia, CA, USA). Purification of poly(A)+ mRNA and construction of the FLcDNA libraries were performed by DNAFORM, Inc. (Yokohama, Japan). The libraries were constructed using the CAP-trapper method (Carninci et al. 2000), an efficient method for constructing FLcDNA libraries. To distinguish cDNAs from different sources (i.e., different tissues, developmental stages, or treatments) within a pooled library, the first-strand cDNA from each source was labeled with a distinct sequence tag within the adaptor. After tag ligation, the second strand of the cDNA was synthesized, the cDNA was ligated into the pFLC-III vector, and the resulting plasmids were transformed into Escherichia coli strain DH10.

The cDNAs were pooled into three FLcDNA libraries, designated Pool 1, Pool 2, and Pool 3. All three libraries were subjected to a normalization step in which the cDNA was hybridized with its own RNA as a driver to maximize the number of independent clones in each library (Carninci et al. 2000). For the Pool 1 library, samples from a normal flag leaf, a normal shoot, a germinated shoot, a shoot treated at low temperature (4 °C), and a shoot maintained in the dark at 20 °C were separately flash frozen in liquid N2. mRNAs were purified from each sample, and first-strand cDNAs were synthesized. These individual cDNAs were tag-labeled, pooled, and normalized, and the resulting cDNAs were used to construct the Pool 1 clone library. The pooled, normalized, and subtracted Pool 2 library was constructed as follows: shoot and root samples from aluminum-, NaCl-, ABA-, or JA-treated plants and from plants grown under dry conditions were separately flash frozen in liquid N2. After cDNA synthesis, subtraction was performed using the pooled RNAs of the Pool 1 library as drivers to enrich the library for stress-specific mRNAs; the resulting library was designated Pool 2. For the Pool 3 library, young spikes of various lengths (1 to 6 cm), early flowers, adult flowers (10, 20, and 30 days before heading), and germinating seeds (sampled before germination and up to 5 days after germination) were separately flash frozen in liquid N2, and the cDNAs synthesized from them were used to construct the Pool 3 clone library.

The average insert sizes were 1.6 kb (Pool 1), 1.5 kb (Pool 2), and 1.7 kb (Pool 3), as estimated by restriction digestion of plasmids purified from 96 randomly chosen clones per pool, with fragment sizes determined by agarose gel electrophoresis. These sizes are comparable to those reported for other plant FLcDNA libraries (Seki et al. 2002; Kikuchi et al. 2003; Ogihara et al. 2004; Wang et al. 2008; Sato et al. 2009).

For each library (Pools 1, 2, and 3), 57,600 clones were randomly picked and grown in liquid LB medium. Plasmid DNA was isolated by alkaline lysis, filtered through a MultiScreen NA filter (Millipore Corp., Billerica, MA, USA) to remove debris, and then purified with a MultiScreen FB filter (Millipore). Both ends of each cDNA clone were sequenced (500 to 600 nucleotides per read) with a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems Inc., Foster City, CA, USA) and analyzed on a capillary sequencer (ABI3700; Applied Biosystems). Based only on high-quality base calls (QV > 20), the average read length was 538 bases at the 5′ terminus and 514 bases at the 3′ terminus.
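The text does not specify exactly how read length "based only on high-quality base scores" was computed. The minimal sketch below assumes the simplest reading: count the bases whose Phred quality value exceeds 20 in each read, then average over reads. The function names and the toy quality values are hypothetical, and a trimming-based definition (for example, the longest contiguous high-quality stretch) would give different numbers.

```python
def high_quality_length(quals, min_qv=20):
    """Number of bases in one read whose Phred quality value is greater than min_qv."""
    return sum(1 for q in quals if q > min_qv)


def mean_high_quality_length(reads, min_qv=20):
    """Average high-quality length over a collection of per-base quality lists.

    Assumption: 'read length' = count of bases with QV > min_qv, not a trimmed span.
    """
    lengths = [high_quality_length(quals, min_qv) for quals in reads]
    return sum(lengths) / len(lengths)


# Toy per-base quality values for two reads (illustrative only).
reads = [[30, 30, 10, 25], [21, 20, 19]]
print(mean_high_quality_length(reads))  # 2.0
```

Note that a base with QV exactly 20 is not counted, matching the strict "QV > 20" wording.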

Chloroplast-derived and contaminating sequences were excluded from further analysis. A blastn search against the barley (Hordeum vulgare) chloroplast genome sequence (GenBank: EF115541.1) with thresholds of >50% coverage and >90% identity detected only 28 chloroplast-derived clones. Possible contaminant sequences were detected by blastn searches against NCBI's RefSeq database; sequences with significant hits (E < 0.01) only to bacterial or animal sequences in a specific library (P < 0.01) were removed. Blastn searches against the TIGR HordeumRepeats and GrainGenes TREP databases detected transposable elements in 1,534 clones.
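The chloroplast filter can be sketched as a pass over BLAST+ tabular output. This is an illustration under stated assumptions, not the pipeline actually used: it assumes the search was run with `-outfmt "6 qseqid sseqid pident qstart qend qlen"`, that coverage is measured per HSP as the aligned query span divided by query length, and that a clone is flagged if any single HSP passes both thresholds. The function name and clone IDs are hypothetical.

```python
def chloroplast_clones(blast_rows, min_coverage=0.5, min_identity=90.0):
    """Return query IDs with at least one HSP above both thresholds.

    Assumes rows from BLAST+ tabular output requested as
    -outfmt "6 qseqid sseqid pident qstart qend qlen".
    Coverage is computed per HSP as aligned query span / query length (an assumption).
    """
    flagged = set()
    for row in blast_rows:
        qseqid, _sseqid, pident, qstart, qend, qlen = row.rstrip("\n").split("\t")
        coverage = (abs(int(qend) - int(qstart)) + 1) / int(qlen)
        # Strict inequalities mirror the ">50% coverage and >90% identity" thresholds.
        if coverage > min_coverage and float(pident) > min_identity:
            flagged.add(qseqid)
    return flagged


rows = [
    "clone1\tEF115541.1\t99.2\t1\t400\t500",   # 80% coverage, 99.2% identity
    "clone2\tEF115541.1\t95.0\t10\t100\t600",  # about 15% coverage
    "clone3\tEF115541.1\t85.0\t1\t500\t500",   # identity too low
]
print(chloroplast_clones(rows))  # {'clone1'}
```

Summing coverage across several HSPs of the same query would be a reasonable alternative; the per-HSP rule is simply the most conservative reading of the thresholds.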

In each pooled library, the proportion of sequences carrying each tag ranged from 22.9% to 27.0%, indicating an almost even representation of sequences from each sub-library. In total, 163,168 and 145,949 sequences were generated from the 5′ and 3′ ends, respectively, of all sequenced clones. These sequences, with a total length of more than 92.7 Mb, correspond to approximately 1.85% of the estimated barley genome size of 4,873 Mb (Arumuganathan and Earle 1991).
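The tag-evenness check amounts to tallying, within one pooled library, how many reads carry each sub-library tag and converting the counts to percentages. A minimal sketch, assuming each read has already been assigned a tag label; the tag names and counts below are hypothetical, not the study's data.

```python
from collections import Counter


def tag_percentages(read_tags):
    """Percentage of reads in a pooled library that carry each sub-library tag."""
    counts = Counter(read_tags)
    total = sum(counts.values())
    return {tag: 100.0 * n / total for tag, n in counts.items()}


# Hypothetical tag labels and counts for one pooled library.
tags = ["tagA"] * 23 + ["tagB"] * 27 + ["tagC"] * 25 + ["tagD"] * 25
pct = tag_percentages(tags)
print(min(pct.values()), max(pct.values()))  # 23.0 27.0
```

Reporting the minimum and maximum percentage per library, as the text does, is a compact way to show that no single sub-library dominates the pool.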

From the 309,117 sequences obtained from the 5′ and 3′ ends of the Haruna Nijo cDNAs, we first removed sequences that might have been derived from other species, presumably through contamination. The remaining 302,548 ESTs were assembled into contigs with the transcript assembly program TGICL (Pertea et al. 2003). The contigs were then grouped into clusters connected by mate-pair information (the 5′ and 3′ end reads of the same clone) using in-house Perl scripts. The minimum clustering criteria were 95% identity over a 40-bp overlap, and we rejected clusters in which fewer than 50% of mate pairs were involved in a contig-contig connection. The sequences formed 31,565 clusters (24,442 groups and 7,123 singlet sequences). On average, a contig contained 5.6 ESTs and a cluster contained 2.1 contigs.
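Grouping contigs through mate pairs is naturally expressed as a union-find over contigs, where each clone whose 5′ and 3′ reads fall in different contigs proposes a link. The in-house Perl scripts are not described in detail, so this Python sketch is only an illustration: the 50% rule is implemented under a hypothetical interpretation (a link is kept only if at least half of the mate pairs touching either contig connect the two), and the function name and contig IDs are invented. The TGICL assembly parameters (95% identity, 40-bp overlap) act upstream and are not modeled here.

```python
from collections import Counter, defaultdict


def cluster_contigs(mate_pairs, contigs, min_support=0.5):
    """Group contigs into clusters linked by clone mate pairs (union-find).

    mate_pairs: iterable of (contig_of_5prime_read, contig_of_3prime_read), one per clone.
    contigs: all contig IDs, including singlets, so every contig lands in some cluster.
    min_support: HYPOTHETICAL rule; a link is kept only if at least this fraction
    of the mate pairs touching either contig connect the two.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs_touching = Counter()  # clones with at least one read in each contig
    links = Counter()           # clones whose two reads fall in two different contigs
    for five, three in mate_pairs:
        for c in {five, three}:
            pairs_touching[c] += 1
        if five != three:
            links[frozenset((five, three))] += 1

    for pair, n_link in links.items():
        a, b = tuple(pair)
        # Mate pairs touching a or b, counting the linking clones only once.
        n_total = pairs_touching[a] + pairs_touching[b] - n_link
        if n_link / n_total >= min_support:
            parent[find(b)] = find(a)

    clusters = defaultdict(list)
    for c in contigs:
        clusters[find(c)].append(c)
    return sorted(sorted(members) for members in clusters.values())


mate_pairs = [("A", "B"), ("A", "B"), ("A", "A"), ("C", "D"),
              ("C", "C"), ("C", "C"), ("C", "C"), ("E", "E")]
print(cluster_contigs(mate_pairs, ["A", "B", "C", "D", "E"]))
# [['A', 'B'], ['C'], ['D'], ['E']]
```

Here A and B are merged because two of the three mate pairs touching them connect the two contigs, whereas the single C-D link is supported by only one of four mate pairs and is rejected.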

We counted the number of ESTs classified into each Gene Ontology (GO) category for each tag condition (data not shown) and found biases in EST counts for some condition-gene function relationships. For example, ESTs derived from normal tissues (flag leaf and shoot) showed distribution biases in categories such as biological process, cellular homeostasis, cellular component organization and biogenesis, photosynthesis, and RNA binding. The flag leaf cDNA contained abundant transcripts for Rubisco small subunits (hits to seven different entries) and other photosynthesis-related proteins. Under the low-temperature (4 °C) condition, transcripts for ice recrystallization inhibition protein 1 and late embryogenesis abundant (LEA) protein accumulated; these proteins are thought to be associated with tolerance to water stress resulting from cold shock and desiccation, respectively (Goyal et al. 2005). At the early-flowering stage, we found many transcripts encoding pectate lyase and pectinesterase family proteins, which have been implicated in pollen germination (Marin-Rodriguez et al. 2002). We also found abundant transcripts encoding anther-specific proteins in the young spike, and alpha-amylases and other sugar-metabolism enzymes in the malt. These results indicate that, despite normalization, the FLcDNA libraries retain some bias in gene representation that reflects differences in the original transcript abundances.

To select clones for full sequencing, the clones were first clustered by blastn analysis of their 3′ end sequences. Then, based on the 5′ end sequences of the clones in each cluster, the longest clones were selected as candidates for full sequencing. Finally, 24,783 clones randomly selected from these candidates and the singleton clones were fully sequenced.
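The selection procedure can be summarized as "keep the longest clone per 3′-end cluster, add the singletons, then sample at random". A minimal sketch under stated assumptions: each clone is represented as a tuple with a 3′ cluster ID (None for a singleton) and a hypothetical `est_len` value standing in for the insert length inferred from the 5′ end; the function name and seeded sampler are illustrative choices, not the original scripts.

```python
import random


def select_for_full_sequencing(clones, n_select, seed=0):
    """Pick the longest clone per 3'-end cluster, add singletons, then sample.

    clones: iterable of (clone_id, cluster_id, est_len); cluster_id is None for
    singletons. est_len is a HYPOTHETICAL length estimate derived from the 5' end.
    """
    best = {}        # cluster_id -> (clone_id, est_len) of the longest clone so far
    singletons = []
    for clone_id, cluster_id, est_len in clones:
        if cluster_id is None:
            singletons.append(clone_id)
        elif cluster_id not in best or est_len > best[cluster_id][1]:
            best[cluster_id] = (clone_id, est_len)
    pool = sorted(cid for cid, _ in best.values()) + sorted(singletons)
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(pool, min(n_select, len(pool)))


clones = [("a", "c1", 1500), ("b", "c1", 1800), ("c", "c2", 900), ("d", None, 1200)]
print(sorted(select_for_full_sequencing(clones, 3)))  # ['b', 'c', 'd']
```

Sorting the candidate pool before sampling makes the result depend only on the seed, not on input order.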

References

Arumuganathan K and Earle E (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9:208-218

Carninci P, Shibata Y, Hayatsu N, Sugahara Y, Shibata K, Itoh M, Konno H, Okazaki Y, Muramatsu M and Hayashizaki Y (2000) Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res 10:1617-1630

Goyal K, Walton LJ and Tunnacliffe A (2005) LEA proteins prevent protein aggregation due to water stress. Biochem J 388:151-157

Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, et al (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376-379

Marin-Rodriguez MC, Orchard J and Seymour GB (2002) Pectate lyases, cell wall degradation and fruit softening. J Exp Bot 53:2115-2119

Ogihara Y, Mochida K, Kawaura K, Murai K, Seki M, Kamiya A, Shinozaki K, Carninci P, Hayashizaki Y, Shin-I T, et al (2004) Construction of a full-length cDNA library from young spikelets of hexaploid wheat and its characterization by large-scale sequencing of expressed sequence tags. Genes Genet Syst 79:227-232

Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, et al (2003) TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19:651-652

Sato K, Shin-I T, Seki M, Shinozaki K, Yoshida H, Takeda K, Yamazaki Y, Conte M and Kohara Y (2009) Development of 5006 full-length cDNAs in barley: a tool for accessing cereal genomics resources. DNA Res 16:81-89

Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, et al (2002) Functional annotation of a full-length Arabidopsis cDNA collection. Science 296:141-145

Wang J, Shi ZY, Wan XS, Sheu GZ and Zhang JL (2008) The expression pattern of a rice proteinase inhibitor gene OsP18-1 implies its role in plant development. J Plant Physiol 165:1519-1529