Supplemental Table 1. The ID and sequence of each of the 37,132 oligos that matched a single, unique gene model in the rice genome.
Supplemental Table 2. The ID and expression level in the six organs or cell types for each of the 3,990 oligos that may cross-hybridize with more than one gene model in the rice genome.
Supplemental Table 3. List of rice gene models that were expressed in at least one of the six selected organ or cell types.
Supplemental Table 4. List of rice gene models that were expressed in all six selected organ or cell types.
Supplemental Table 5. List of all best-matched gene model pairs between the rice and Arabidopsis genomes.
Supplemental Table 6. List of adjacent rice genes that can be organized into a co-expression group.
Supplemental Figure 1. A sample rice oligo array hybridization image.
This image is from slide A of the two-slide rice whole-genome oligo (70-mer) microarray set, hybridized with probes derived from total RNA samples of heading-stage panicles (red spots, Cy5 channel) and cultured cells (green spots, Cy3 channel). The two-slide microarray set contains 41,122 70-mer oligos representing 41,754 rice gene models, including 16,504 gene models supported by available full-length cDNAs, 5,968 EST-supported gene models, and 19,282 predicted gene models without previous experimental support.
Supplemental Figure 2. GO category distribution of the gene models specifically enriched in tillering-stage shoots, tillering-stage roots, heading-stage panicles, and filling-stage panicles. The GO category distribution of seedling-enriched gene models is not shown because there were too few of them.
Supplemental Figure 3. Overlap of expressed best-matched gene model pairs between corresponding organs of rice and Arabidopsis. Panicle 1, heading-stage panicle; panicle 2, filling-stage panicle.
Supplemental Figure 4. The distribution of signal intensities of the negative and positive controls.
(A) and (B): Expression data from slide A of rice shoots (A) and of Arabidopsis flowers (B). Expression of a given oligo is scored as positive when its signal exceeds the value below which 90% of the negative-control signals fall; with this cutoff, 1-3% of the positive controls were scored as false negatives. α: type I error; β: type II error.
Supplemental Methods
Oligo Microarray
Our initial oligo design was based on the phase II rice genome assembly available in October 2002, which was significantly improved over the initial draft version (Yu et al., 2002), and on the available full-length cDNAs (Kikuchi et al., 2003) and ESTs (TIGR Database). Our initial data set contained 28,444 japonica cDNAs with complete open reading frames (Kikuchi et al., 2003), together with 15,558 unigenes and 31,776 ESTs taken from public databases (TIGR Database) and from the Beijing Genome Institute (Zhou et al., 2003). All of these sequences were aligned to the Syngenta japonica genome (Goff et al., 2002), and whenever two alignments overlapped by at least 100 bp, the smaller cDNA was eliminated. This gave a data set of 33,764 non-redundant cDNAs, including 18,137 full-length cDNAs (Kikuchi et al., 2003) and 15,627 EST clusters. The same redundancy-limiting rule was applied to the 55,624 FgeneSH predictions (Yu et al., 2005; Salamov and Solovyev, 2000), except that full-length cDNAs were always kept when they were redundant with a prediction. Finally, we obtained a non-redundant set of 61,123 gene models.
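The redundancy-limiting rule can be illustrated with a minimal sketch. It assumes that each cDNA or prediction is reduced to a hypothetical `Alignment` record (chromosome, inclusive start and end, full-length flag), that "smaller" means the shorter aligned genomic span, and that redundancy is resolved greedily from the longest model down; the source does not say how ties or chains of overlapping models were handled, so those choices are assumptions. The two calls mirror the two stages described above: cDNAs among themselves, then cDNAs plus predictions with full-length cDNAs protected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record: one cDNA or gene prediction aligned to the genome."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    full_length: bool = False  # True for full-length cDNAs

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_bp(a: Alignment, b: Alignment) -> int:
    """Number of genomic bases shared by two alignments (0 on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def remove_redundant(alignments, min_overlap=100, protect_full_length=False):
    """Greedy filter: of any pair overlapping >= min_overlap bp, keep the longer one.

    With protect_full_length=True, full-length cDNAs are considered first, so they
    are never discarded in favour of a longer, non-full-length model.
    """
    def priority(a):
        return (protect_full_length and a.full_length, a.length)

    kept = []
    for cand in sorted(alignments, key=priority, reverse=True):
        if all(overlap_bp(cand, k) < min_overlap for k in kept):
            kept.append(cand)
    return kept


# Hypothetical toy input.
cdnas = [
    Alignment("fl1", "chr1", 1000, 3000, full_length=True),
    Alignment("est1", "chr1", 2800, 3500),  # 201 bp overlap with fl1 and shorter: dropped
    Alignment("est2", "chr1", 5000, 5600),
]
nr_cdnas = remove_redundant(cdnas)
predictions = [Alignment("pred1", "chr1", 900, 3600), Alignment("pred2", "chr2", 100, 900)]
gene_models = remove_redundant(nr_cdnas + predictions, protect_full_length=True)
```

In the toy run, `pred1` is longer than `fl1` but is still discarded, because full-length cDNAs win whenever they are redundant with a prediction.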
Using standard criteria, including avoidance of oligo cross-hybridization, melting temperature (Tm), and GC content (Sengupta and Tompa, 2002), we designed 58,404 70-mer oligos to represent the entire non-redundant gene model set described above. When several candidate oligos met these criteria, we picked those that also mapped perfectly to the Syngenta japonica genome (Goff et al., 2002), i.e., oligos that perfectly match both the indica and the japonica genes. After the release of the complete gene-centric map (Yu et al., 2005), we re-mapped these oligos to the new version of the genome and found that a set of 41,122 oligos matched 41,754 non-transposable-element gene models. This set includes 37,132 oligos that each matched only one gene model in the rice genome when a 70% identity threshold was applied (Supplemental Table 1). These 37,132 oligos matched 15,149 full-length cDNA-matched gene models, 5,345 EST-supported gene models, and 16,638 predicted gene models without previous experimental support. The remaining 3,990 oligos of the 41,122-oligo set matched more than one rice gene model with 70% or higher identity (Supplemental Table 2). All analyses in this study used the 37,132 cross-hybridization-free oligos and their gene models. The only exception is the number of experimentally detectable gene models shown in Figure 1A, for which we extrapolated the number of expressed gene models among the 3,990 potentially cross-hybridizing oligos from the gene model composition covered by this subset of oligos.
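The split into cross-hybridization-free and potentially cross-hybridizing oligos can be sketched as follows. This assumes that percent identities between each 70-mer and every gene model it aligns to have already been computed (the alignment tool is not named in the text); the function name and the toy IDs are hypothetical.

```python
def classify_oligos(hits, threshold=70.0):
    """Split oligos into unique and potentially cross-hybridizing sets.

    hits maps oligo ID -> {gene model ID: percent identity of the alignment}.
    An oligo is unique if exactly one gene model reaches the identity threshold;
    oligos with no gene model at or above the threshold are left out of both sets.
    """
    unique, cross = {}, {}
    for oligo, models in hits.items():
        matched = sorted(m for m, ident in models.items() if ident >= threshold)
        if len(matched) == 1:
            unique[oligo] = matched[0]
        elif len(matched) > 1:
            cross[oligo] = matched
    return unique, cross


# Hypothetical identities.
hits = {
    "olig1": {"gm1": 100.0, "gm2": 55.0},
    "olig2": {"gm3": 100.0, "gm4": 82.0},
    "olig3": {"gm5": 64.0},
    "olig4": {"gm6": 70.0},
}
unique, cross = classify_oligos(hits)
```

Here `olig1` and `olig4` fall into the unique set (Supplemental Table 1 in the study), `olig2` into the cross-hybridizing set (Supplemental Table 2), and `olig3` into neither.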
In summary, the oligo set contains 41,122 oligos representing 41,754 annotated rice gene models of the complete gene-centric version of the indica rice genome (Yu et al., 2005), including 16,504 full-length cDNA-supported gene models, 5,968 EST-supported gene models, and 19,282 predicted gene models without experimental support. We aligned these Oryza sativa L. ssp. indica genome-mapped oligos to the Syngenta Oryza sativa L. ssp. japonica genome (Goff et al., 2002) and to the IRGSP (International Rice Genome Sequencing Project) japonica genome, and found that 92% of them aligned to known and predicted japonica gene models with more than 90% identity. Of the small fraction of oligos that did not align, 78% corresponded to predicted gene models without experimental support. This indicates that the oligo set is suitable for both Oryza sativa L. ssp. indica and japonica cultivars, and may be suitable for most rice subspecies.
The oligo set was synthesized by Qiagen/Operon (Supplemental Table 1), and all oligos were randomized with respect to their genome location before printing onto poly-lysine-coated microscope slides in the DNA microarray laboratory at Yale University and at the Institute of Human Genetics, University of Aarhus. Thus, there is no correlation between probe position on the arrays and the relative positions of gene models on the chromosomes. The whole oligo set was printed on two slides (A and B). The average oligo spot diameter on the arrays is 100 µm. Twelve distinct negative control oligos were each printed 13 times at well-spaced locations on both slides A and B, and 13 additional distinct negative control oligos were included on slide B. Each slide included a total of 29,634 oligo spots, with 156 negative control spots on slide A and 169 on slide B. None of these negative controls has a match in the rice genome sequences.
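As a consistency check on the negative control counts above (that each of the 13 slide-B-only controls was printed once is inferred from the totals rather than stated explicitly):

```latex
\begin{aligned}
N_{\text{neg},A} &= 12 \times 13 = 156 \\
N_{\text{neg},B} &= 12 \times 13 + 13 \times 1 = 169
\end{aligned}
```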
The 70-mer oligo set for the Arabidopsis genome was designed and synthesized by Qiagen/Operon. The slides were printed in the same DNA microarray laboratory at Yale University. Each slide included 26,092 oligo spots, with 192 negative control spots. The negative controls do not match any Arabidopsis genome sequence (Ma et al., 2005).
RNA Isolation, Probe Labeling, and Hybridization
RNA preparation, fluorescent labeling of the probes, slide hybridization, washing, and scanning were performed as described previously (Ma et al., 2001, 2002). Three independent biological samples of each organ were used for RNA preparation. Each RNA preparation was used to generate both Cy3- and Cy5-labeled probes, which were hybridized against oppositely labeled probes derived from the common cultured-cell control RNA sample. Each experiment thus yielded three high-quality replicate data sets (correlation coefficient above 0.90 between replicates), one from each independent biological sample after processing for the dye-swap effect.
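The text does not describe how the dye-swap effect was removed or which correlation coefficient was used. A minimal sketch, assuming the common approach of averaging the forward log2 ratio with the negated reverse log2 ratio, and assuming Pearson correlation for the replicate check, could look like this; all function names are hypothetical.

```python
import math
from statistics import mean


def dye_swap_log_ratio(forward, reverse):
    """Combine a dye-swap pair into one log2(organ / reference) value per spot.

    forward: Cy5/Cy3 ratios with the organ labeled in Cy5 (organ / reference).
    reverse: Cy5/Cy3 ratios with the organ labeled in Cy3 (reference / organ).
    Averaging cancels a constant dye bias that affects both hybridizations alike.
    """
    return [(math.log2(f) - math.log2(r)) / 2 for f, r in zip(forward, reverse)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length data sets."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicates_pass(replicates, min_r=0.90):
    """True if every pair of replicate data sets correlates above min_r."""
    n = len(replicates)
    return all(pearson(replicates[i], replicates[j]) > min_r
               for i in range(n) for j in range(i + 1, n))


# Hypothetical ratios for three spots from one biological sample.
rep1 = dye_swap_log_ratio([4.0, 1.0, 0.5], [0.25, 1.0, 2.0])  # [2.0, 0.0, -1.0]
```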
Data Processing and Normalization
Spot intensities were quantified using Axon GenePix Pro 3.0 image analysis software. The net intensities (after background subtraction) for each channel and the channel ratios were measured using the GenePix Pro 3.0 median-of-intensity and ratio methods, and they were normalized using the corresponding GenePix default normalization factor. To merge the replicate GenePix Pro 3.0 output data files (.gpr files) consistently, we developed a computer program. In this program, replicate data were normalized to minimize variation caused by experimental procedures. We further normalized intensities across experiments based on the median of all genes.
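The cross-experiment median normalization can be sketched as scaling each experiment so that its median net intensity equals a common target. Which reference value was used is not stated; taking the median of the per-experiment medians is an assumption made here, and `median_normalize` is a hypothetical name.

```python
from statistics import median


def median_normalize(experiments, target=None):
    """Scale each experiment so that its median intensity equals a common target.

    experiments: dict of experiment name -> list of net (background-subtracted) intensities.
    target: common median; defaults to the median of the per-experiment medians
            (an assumption, not taken from the source).
    """
    medians = {name: median(vals) for name, vals in experiments.items()}
    if target is None:
        target = median(medians.values())
    return {name: [v * target / medians[name] for v in vals]
            for name, vals in experiments.items()}


# Hypothetical intensities that differ only by a global scale factor.
exps = {"root": [100, 200, 400], "shoot": [50, 100, 200], "panicle": [300, 600, 1200]}
normalized = median_normalize(exps)  # every experiment becomes [100.0, 200.0, 400.0]
```

A single multiplicative factor per experiment preserves within-experiment ratios, which is why a global scaling like this is usually applied before cross-organ comparisons.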
To determine the threshold for expression, we followed a commonly used strategy (Rinn et al., 2003; Kim et al., 2003) with minor adjustments. First, negative controls were used to estimate non-specific hybridization signals. Based on the distribution of the more than 150 negative controls on each slide, we set the cutoff for non-specific hybridization at the level below which 90% of the negative-control signals fall. At this cutoff, the false negative rate for positive controls was about 1-3% in both rice and Arabidopsis (Supplemental Figure 4). Second, we considered a gene model to be expressed only if the majority (two or three out of three) of its spots from the replicate experiments showed experimentally detectable expression as defined by the first criterion. Third, an outlier-searching algorithm incorporated into the program identified spots that showed a large difference between replicates as outliers and eliminated them from the analysis.
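The first two criteria can be sketched as follows. The percentile definition (nearest rank) and the treatment of signals exactly at the cutoff as not detected are assumptions; the outlier-searching algorithm is not described in enough detail to reproduce, so it is omitted. Function names and the toy signals are hypothetical.

```python
import math


def negative_control_cutoff(neg_signals, fraction=0.90):
    """Nearest-rank percentile of one slide's negative-control signals.

    Returns the smallest control value such that at least `fraction` of the
    controls are at or below it.
    """
    s = sorted(neg_signals)
    k = math.ceil(fraction * len(s))
    return s[k - 1]


def false_negative_rate(pos_signals, cutoff):
    """Fraction of positive-control spots that do not exceed the cutoff."""
    return sum(1 for x in pos_signals if x <= cutoff) / len(pos_signals)


def is_expressed(replicate_signals, cutoffs, min_detected=2):
    """Majority rule: detected (signal > its slide's cutoff) in >= min_detected replicates."""
    detected = sum(1 for sig, cut in zip(replicate_signals, cutoffs) if sig > cut)
    return detected >= min_detected


cutoff = negative_control_cutoff(list(range(1, 11)))  # hypothetical controls 1..10 -> 9
fnr = false_negative_rate([8, 30, 45, 60], cutoff)    # 0.25
```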
To identify differentially expressed genes among the five organ types, we used the ANOVA F-test with the null hypothesis that each gene has the same expression level in all organ types. The three biological replicates of each organ type were used for modeling and testing. Differential expression was determined by the ANOVA F-test with Bonferroni multiple-comparison correction at p < 0.05, using the MAANOVA package for R (Wu et al., 2003). We defined a gene as specifically enriched in a given organ only if its expression level in that organ was significantly higher than its levels in all the other organs.
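The study used MAANOVA, whose models for two-color arrays are more elaborate than a plain one-way ANOVA. As a simplified, hypothetical illustration of the test logic only, the sketch below computes a one-way ANOVA F statistic from per-organ replicate values of one gene, converts it to an upper-tail p-value through the regularized incomplete beta function, and applies a Bonferroni correction, assuming the correction is applied across all genes tested. The post hoc comparisons needed for the "specifically enriched" call are not shown.

```python
import math
from statistics import mean


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    return regularized_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_statistic(groups):
    """One-way ANOVA F statistic and degrees of freedom; one list of replicates per organ."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = k - 1, n - k
    return (ss_between / df1) / (ss_within / df2), df1, df2


def bonferroni_significant(pvalues, alpha=0.05):
    """Bonferroni correction across m tests: significant if p < alpha / m."""
    m = len(pvalues)
    return [p < alpha / m for p in pvalues]


# Hypothetical replicate values for one gene in three organs.
f, df1, df2 = f_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # F = 27.0 with (2, 6) df
p = f_pvalue(f, df1, df2)
```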
Calculation of Chromosomal Regions with Co-expressed Adjacent Gene Models
We used the method reported by Spellman and Rubin (2002) to identify co-regulated adjacent gene models. For each block window size, we calculated the average correlation coefficient among the expression data of the gene models in the window. This value was compared with the values obtained from 1,000,000 random samples of the same number of gene models to calculate the p-value. We carried out this analysis for block sizes of 2 to 20 gene models and found a block size of 11 to be optimal. We then calculated the number of genes showing co-expression at p-value thresholds of 0.001, 0.005, and 0.01 with a block size of 11 gene models. To calculate the distribution of co-expressed gene models within a given length of chromosome, the genomic DNA lengths of all genes in a group whose adjacent gene models show a similar expression pattern were summed, whereas the genomic DNA length was calculated separately for each gene model that did not fall into such a group based on comparison of its expression pattern with those of its adjacent gene models.
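The window test can be sketched as a permutation test. This assumes Pearson correlation between expression profiles, an average over all pairs in the window, and the add-one p-value estimate (hits + 1) / (permutations + 1); to keep the toy example fast it uses a window of 4 genes and 2,000 permutations rather than the 11-gene window and 1,000,000 random samples used in the study. Function names and data are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_r(profiles):
    """Average correlation over all pairs of profiles in a block."""
    return mean(pearson(a, b) for a, b in combinations(profiles, 2))


def window_pvalue(profiles, start, size, n_perm=10_000, seed=0):
    """Permutation p-value for co-expression of `size` adjacent genes from `start`.

    profiles must be in chromosomal order; the null distribution comes from
    random sets of `size` genes drawn from anywhere in the genome.
    """
    observed = mean_pairwise_r(profiles[start:start + size])
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_perm)
               if mean_pairwise_r(rng.sample(profiles, size)) >= observed)
    # Add-one estimate so that the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 4 co-expressed adjacent genes followed by 26 unrelated genes.
gen = random.Random(1)
trend = [1.0, 2.0, 3.0, 4.0, 5.0]
coexpressed = [[s * (v + 1.0) for v in trend] for s in (1.0, 2.0, 0.5, 3.0)]
others = [[gen.gauss(0.0, 1.0) for _ in trend] for _ in range(26)]
profiles = coexpressed + others
obs, p = window_pvalue(profiles, start=0, size=4, n_perm=2000)
```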
Supplemental References
Goff, S.A., Ricke, D., Lan, T., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller, P., Varma, H., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.
Kikuchi, S., Satoh, K., Nagata, T., Kawagashira, N., Doi, K., Kishimoto, N., Yazaki, J., Ishikawa, M., Yamada, H., Ooka, H., et al. 2003. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376-379.
Kim, H., Snesrud, E.C., Haas, B., Cheung, F., Town, C.D. and Quackenbush, J. 2003. Gene expression analysis of Arabidopsis chromosome 2 using a genomic DNA amplicon microarray. Genome Res. 13: 327-340.
Ma, L., Gao, Y., Li, J., Chen, Z., Li, J., Zhao, H. and Deng, X.W. 2002. Genomic evidence for COP1 as a repressor of light-regulated gene expression and development in Arabidopsis. Plant Cell 14: 2383-2398.
Ma, L., Li, J., Qu, L., Hager, J., Chen, Z., Zhao, H. and Deng, X.W. 2001. Light control of Arabidopsis development entails coordinated regulation of genome expression and cellular pathways. Plant Cell 13: 2589-2607.
Ma, L., Sun, N., Liu, X., Jiao, Y., Zhao, H. and Deng, X.W. 2005. Organ-specific genome expression atlas during Arabidopsis development. Plant Physiology, in press.
Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P., Gerstein, M. and Snyder, M. 2003. The transcriptional activity of human Chromosome 22. Genes Dev. 17: 529-540.
Salamov, A.A. and Solovyev, V.V. 2000. Ab initio Gene Finding in Drosophila Genomic DNA. Genome Res. 10: 516-522.
Sengupta, R. and Tompa, M. 2002. Quality control in manufacturing oligo arrays: a combinatorial design approach. J. Comput. Biol. 9: 1-22.
Spellman, P.T. and Rubin, G.M. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J. Biol. 1: 1-8.
Wu, H., Kerr, K., Cui, X. and Churchill, G.A. 2003. MAANOVA: A software package for the analysis of spotted cDNA microarray experiments. In The Analysis of Gene Expression Data: Methods and Software, G. Parmigiani, E.S. Garrett, R.A. Irizarry, and S.L. Zeger, eds (Heidelberg: Springer), pp. 313-341.
Yu, J., Hu, S., Wang, J., Wong, G.K.S., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92.
Yu, J., Wang, J., Lin, W., Li, S., Li, H., Zhou, J., Ni, P., Dong, W., Hu, S., Zeng, C., et al. 2005. The genome sequence of indica and japonica rice. PLoS Biology 3: e38.
Zhou, Y., Tang, J., Walker, M.G., Zhang, X., Wang, J., Hu, S., Xu, H., Deng, Y., Dong, J., Ye, L., et al. 2003. Gene identification and expression analysis of 86,136 expressed sequence tags (EST) from the rice genome. Geno. Prot. & Bioinfo. 1: 26-42.