Supplementary Table 1. Sequence data used.

Species / # of mRNAs / # of ESTs
Oryza sativa / 34,887 (a) / 285,019
Oryza spp. (b) / 10 / 5,868
Triticum aestivum (wheat) / 1,934 / 565,328
Zea mays (maize) / 10,754 / 415,235
Hordeum vulgare (barley) / 1,006 / 391,861
Saccharum officinarum (sugarcane) / 123 / 246,301
Sorghum bicolor (sorghum) / 86 / 190,949
Arabidopsis thaliana (thale cress) / 59,734 / 322,651
(a) 32,127 FLcDNAs are included.
(b) Oryza species other than O. sativa.

Note: rice full-length cDNAs (as of October 1, 2004) and other sequences (as of September 1, 2004) were retrieved from the International Nucleotide Sequence Databases.

Supplementary Table 2. Features of O. sativa and A. thaliana transcripts.

Feature / O. sativa (with mRNAs) / O. sativa (predictions) / A. thaliana (with mRNAs)
Number of exons / 106,447 / 33,937 / 101,396
Mean exon number / 5.19 / 4.89 / 5.40
Number of single exon loci / 4,473 (21.8%) / 1,235 (17.8%) / 3,583 (19.1%)
Mean exon length (bp) / 333 / 292 / 263
Mean first exon length (bp) / 412 / 379 / 323
Mean internal exon length (bp) / 177 / 207 / 159
Mean last exon length (bp) / 648 / 363 / 476
Mean intron length (bp) / 423 / 490 / 168
Mean mRNA length (bp) / 1,728 / 1,428 / 1,428
Mean pre-mRNA length (bp) / 3,501 / 3,334 / 2,160
Exon coverage on the genome / 35.4 Mbp / 9.9 Mbp / 26.8 Mbp
Transcribed genomic regions / 71.8 Mbp / 23.1 Mbp / 40.5 Mbp
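
As a rough consistency check on the means above, the mean pre-mRNA length should be close to the mean mRNA length plus the mean number of introns (mean exon number minus one) times the mean intron length. The Python sketch below applies that approximation to the three columns. It is an illustration, not the method used to build the table, and the column labels are informal names chosen here. Because a product of means is not the mean of the products, only approximate agreement is expected.

```python
# Approximation: pre-mRNA length ~ mRNA length + (exon number - 1) * intron length.
# All numbers are copied from Supplementary Table 2; the dictionary keys are
# informal labels chosen for this sketch.
columns = {
    "O. sativa (with mRNAs)": {"exons": 5.19, "intron": 423, "mrna": 1728, "pre_mrna": 3501},
    "O. sativa (predictions)": {"exons": 4.89, "intron": 490, "mrna": 1428, "pre_mrna": 3334},
    "A. thaliana (with mRNAs)": {"exons": 5.40, "intron": 168, "mrna": 1428, "pre_mrna": 2160},
}


def estimated_pre_mrna(exons, intron, mrna):
    """Estimate the mean pre-mRNA length (bp) from the other three means."""
    return mrna + (exons - 1.0) * intron


for name, v in columns.items():
    est = estimated_pre_mrna(v["exons"], v["intron"], v["mrna"])
    print(f"{name}: estimated {est:.0f} bp, reported {v['pre_mrna']} bp")
```

Under this approximation the estimates fall within about 10 bp of the reported values in all three columns.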

Supplementary Table 3. Classification of transposable elements in the genome and mRNAs.

Genome
TIGR code (1) / Copy No. / Coverage (bp) / Coverage (%) (2)
Class I / Ty1-copia / TERT001 / 6,612 / 5,604,084 / 1.51
Ty3-gypsy / TERT002 / 25,426 / 30,662,518 / 8.27
LINE / TERT003 / 477 / 183,043 / 0.05
p-SINE1 / TEMT011 / 3,620 / 500,981 / 0.14
Other class I / TERTOOT / 21,264 / 18,680,085 / 5.04
Class II / Ac/Ds / TETN001 / 1,598 / 225,023 / 0.06
CACTA, En/Spm / TETN002 / 17,651 / 15,060,940 / 4.06
MULE / TETN003 / 3,700 / 799,682 / 0.22
MLE / TETN004 / 326 / 78,309 / 0.02
Stowaway / TEMT002 / 317 / 27,335 / 0.01
Tourist / TEMT001 / 16,149 / 3,834,704 / 1.03
Other class II / TETN005, TETNOOT / 42,301 / 6,492,629 / 1.75
Other TE / 130,256 / 21,721,885 / 5.86
Total / 269,697 / 103,871,218 / 28.01
mRNA
TIGR code (1) / Copy No. / Coverage (bp) / Coverage (%) (3)
Class I / Ty1-copia / TERT001 / 91 / 35,202 / 0.07
Ty3-gypsy / TERT002 / 227 / 70,338 / 0.14
LINE / TERT003 / 5 / 1,150 / 0.00
p-SINE1 / TEMT011 / 66 / 7,401 / 0.02
Other class I / TERTOOT / 224 / 100,203 / 0.20
Class II / Ac/Ds / TETN001 / 14 / 2,637 / 0.01
CACTA, En/Spm / TETN002 / 105 / 22,732 / 0.05
MULE / TETN003 / 52 / 13,544 / 0.03
MLE / TETN004 / 46 / 5,690 / 0.01
Stowaway / TEMT002 / 3 / 235 / 0.00
Tourist / TEMT001 / 149 / 25,308 / 0.05
Other class II / TETN005, TETNOOT / 357 / 52,752 / 0.11
Other TE / 1,264 / 165,199 / 0.34
Total / 2,603 / 502,391 / 1.03

(1) For the TIGR codes, see

(2) Fraction in the genome

(3) Fraction in the total mRNAs
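
The genome rows of the table can be checked against the reported total with a few lines of Python. The sketch below sums the per-element coverage values copied from the table. It then derives the denominator implied by the reported total and its percentage, and recomputes a row percentage from it. The implied genome size is an inference made here for illustration, not a figure stated in the table.

```python
# Genome coverage (bp) per row of Supplementary Table 3, copied from the table.
genome_coverage_bp = {
    "Ty1-copia": 5_604_084,
    "Ty3-gypsy": 30_662_518,
    "LINE": 183_043,
    "p-SINE1": 500_981,
    "Other class I": 18_680_085,
    "Ac/Ds": 225_023,
    "CACTA, En/Spm": 15_060_940,
    "MULE": 799_682,
    "MLE": 78_309,
    "Stowaway": 27_335,
    "Tourist": 3_834_704,
    "Other class II": 6_492_629,
    "Other TE": 21_721_885,
}
reported_total_bp = 103_871_218
reported_total_pct = 28.01

# The sum of the rows should reproduce the "Total" row.
total_bp = sum(genome_coverage_bp.values())

# Denominator implied by the reported total (an inference, not a table value).
implied_genome_bp = reported_total_bp / (reported_total_pct / 100.0)


def coverage_pct(bp):
    """Percentage of the implied genome length covered by `bp` bases."""
    return 100.0 * bp / implied_genome_bp


print(f"sum of rows: {total_bp:,} bp (reported total: {reported_total_bp:,} bp)")
print(f"implied genome length: {implied_genome_bp / 1e6:.1f} Mbp")
for name, bp in genome_coverage_bp.items():
    print(f"{name}: {coverage_pct(bp):.2f}%")
```

The same check can be repeated for the mRNA rows by swapping in the mRNA coverage values and the mRNA total.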

Supplementary Table 4. Features of annotated non-protein-coding (np) RNAs.

Feature / Multi-exon / Single-exon / Total
npRNA / 108 (100%) / 23 (100%) / 131 (100%)
Mean length (bp) / 1186 / 965 / N.A.*
Mean exon number / 2.8 / 1.0 / N.A.*
EST support / 47 (43.5%) / 5 (21.7%) / 52 (39.7%)
polyadenylation signal / 18 (16.7%) / 19 (82.6%) / 37 (28.2%)
genomic polyadenosine / 2 (1.9%) / 0 (0%) / 2 (1.5%)

*Not available.

Supplementary Table 5. Putative rice antisense npRNAs and their sense genes.

as-npRNA / Chr / Sense gene / Sense gene description
(A) Antisense to known protein genes:
Os02g0180800 / 2 / Os02g0180700 / Cinnamoyl-CoA reductase (EC 1.2.1.44)
Os03g0118500 / 3 / Os03g0118600 / Dihydrodipicolinate reductase-like protein
Os03g0127100 / 3 / Os03g0127200 / NAM protein
Os05g0577000 / 5 / Os05g0576900 / PIN1-like auxin transport protein
Os06g0514700 / 6 / Os06g0514600 / Cyclophilin-RNA interacting protein
Os07g0653300 / 7 / Os07g0653200 / BLE2 protein
Os07g0654800 / 7 / Os07g0654700 / BLE2 protein
Os08g0103700 / 8 / Os08g0103900 / NAM-like protein
Os08g0103600 / BTB/POZ domain containing protein
Os12g0114900 / 12 / Os12g0115000 / Lipid transfer protein LPT II
Os12g0132900 / 12 / Os12g0133000 / Major facilitator superfamily antiporter
Os07g0524300 / 7 / Os07g0524400 / Nucleolin (Protein C23)
(B) Antisense to domain-containing protein genes:
Os02g0684800 / 2 / Os02g0684900 / Zn-finger, FYVE type domain containing protein
Os01g0494300 / 1 / Os01g0494400 / Retrotransposon gag protein family protein
Os09g0429300 / 9 / Os09g0429200 / Ionotropic glutamate receptor family protein
Os08g0538100 / 8 / Os08g0538200 / Plant protein of unknown function family protein
Os06g0664000 / 6 / Os06g0663900 / Protein kinase domain containing protein
Os09g0471300 / 9 / Os09g0471400 / Protein kinase domain containing protein
Os10g0142700 / 10 / Os10g0142600 / Protein kinase domain containing protein
Os11g0173600 / 11 / Os11g0173700 / Protein kinase domain containing protein
Os05g0323400 / 5 / Os05g0323300 / BED finger domain containing protein
Os04g0588500 / 4 / Os04g0588600 / ABC transporter domain containing protein
Os09g0278900 / 9 / Os09g0279000 / ENT domain containing protein
Os11g0697100 / 11 / Os11g0697200 / Eukaryotic protein of unknown function DUF889 family protein
Os04g0172600 / 4 / Os04g0172500 / RNase H domain containing protein
Os01g0119800 / 1 / Os01g0119700 / Ubiquitin domain containing protein
Os06g0477600 / 6 / Os06g0477500 / Viral coat and capsid protein family protein
Os06g0555900 / 6 / Os06g0556000 / Amino acid carrier fragment
(C) Antisense to hypothetical protein genes:
Os01g0646400 / 1 / Os01g0646500 / Conserved hypothetical protein
Os03g0442800 / 3 / Os03g0442900 / Conserved hypothetical protein
Os06g0134200 / 6 / Os06g0134100 / Conserved hypothetical protein
Os11g0204500 / 11 / Os11g0204400 / Conserved hypothetical protein
Os12g0256600 / 12 / Os12g0256500 / Conserved hypothetical protein
Os01g0810700 / 1 / Os01g0810600 / Hypothetical protein
Os02g0228600 / 2 / Os02g0228700 / Hypothetical protein
Os02g0779500 / 2 / Os02g0779600 / Hypothetical protein
Os02g0792100 / 2 / Os02g0792200 / Hypothetical protein
Os02g0289300 / 2 / Os02g0289400 / Hypothetical protein (single-exon)
Os04g0308200 / 4 / Os04g0308000 / Hypothetical protein
Os05g0137800 / 5 / Os05g0137900 / Hypothetical protein
Os05g0294800 / 5 / Os05g0294700 / Hypothetical protein
Os05g0115200 / 5 / Os05g0115300 / Hypothetical protein
Os06g0516800 / 6 / Os06g0516900 / Hypothetical protein
Os07g0590700 / 7 / Os07g0590800 / Hypothetical protein
Os08g0384700 / 8 / Os08g0384800 / Hypothetical protein
Os08g0391300 / 8 / Os08g0391200 / Hypothetical protein
Os08g0555600 / 8 / Os08g0555700 / Hypothetical protein
Os09g0309900 / 9 / Os09g0310000 / Hypothetical protein
Os09g0321500 / 9 / Os09g0321600 / Hypothetical protein
Os09g0469500 / 9 / Os09g0469600 / Hypothetical protein
Os10g0479100 / 10 / Os10g0479000 / Hypothetical protein
Os11g0286500 / 11 / Os11g0286400 / Hypothetical protein
Os12g0255000 / 12 / Os12g0255100 / Hypothetical protein
Os12g0545600 / 12 / Os12g0545500 / Hypothetical protein
Os12g0199200 / 12 / Os12g0199300 / Hypothetical protein (single-exon)
Os04g0601200 / 4 / Os04g0601300 / Hypothetical protein

Supplementary Table 6. Isoacceptor tRNA gene copy number and the relative synonymous codon usage (RSCU).

Amino acid / Codon / Gene number / RSCU
Gly / GGU / 0 / 0.80
GGC / 28 / 1.58
GGA / 10 / 0.78
GGG / 9 / 0.84
Val / GUU / 17 / 0.94
GUC / 16 / 1.21
GUA / 4 / 0.40
GUG / 10 / 1.46
Lys / AAA / 12 / 0.64
AAG / 20 / 1.36
Asn / AAU / 0 / 0.89
AAC / 29 / 1.11
Gln / CAA / 21 / 0.73
CAG / 10 / 1.27
His / CAU / 0 / 0.90
CAC / 26 / 1.10
Glu / GAA / 16 / 0.71
GAG / 25 / 1.29
Asp / GAU / 1 / 0.94
GAC / 31 / 1.06
Tyr / UAU / 2 / 0.76
UAC / 19 / 1.24
Cys / UGU / 1 / 0.65
UGC / 17 / 1.35
Phe / UUU / 0 / 0.72
UUC / 20 / 1.28
Ile / AUU / 18 / 1.00
AUC / 0 / 1.39
AUA / 5 / 0.61
Met / AUG / 56
Trp / UGG / 18
Arg / AGA / 12 / 0.93
AGG / 11 / 1.40
CGU / 24 / 0.61
CGC / 0 / 1.50
CGA / 4 / 0.49
CGG / 8 / 1.07
Leu / CUU / 15 / 1.02
CUC / 0 / 1.76
CUA / 10 / 0.46
CUG / 9 / 1.46
UUA / 4 / 0.38
UUG / 19 / 0.93
Ser / AGU / 0 / 0.66
AGC / 20 / 1.22
UCU / 12 / 0.97
UCC / 4 / 1.24
UCA / 17 / 0.97
UCG / 8 / 0.94
Thr / ACU / 11 / 0.88
ACC / 8 / 1.24
ACA / 15 / 0.97
ACG / 5 / 0.91
Pro / CCU / 14 / 0.94
CCC / 0 / 0.85
CCA / 14 / 0.99
CCG / 9 / 1.21
Ala / GCU / 20 / 0.84
GCC / 1 / 1.31
GCA / 10 / 0.75
GCG / 12 / 1.10

Note. - Most abundant isoacceptor tRNAs and codons are written in boldface.
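
For readers unfamiliar with the measure, RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The values for one amino acid therefore sum to its number of synonymous codons, as the Gly values above do (0.80 + 1.58 + 0.78 + 0.84 = 4.00). The Python sketch below implements that standard definition. The codon counts are hypothetical, chosen only so that the output reproduces the Gly row, and the function name is an assumption of this example rather than anything from the original analysis.

```python
from collections import defaultdict


def rscu(codon_counts, codon_to_aa):
    """Relative synonymous codon usage for each codon.

    codon_counts must list every synonymous codon of each amino acid
    (with a count of 0 if unused), because the degeneracy is taken from
    the codons present in the input.
    """
    totals = defaultdict(int)      # total codon count per amino acid
    degeneracy = defaultdict(int)  # number of synonymous codons per amino acid
    for codon, count in codon_counts.items():
        aa = codon_to_aa[codon]
        totals[aa] += count
        degeneracy[aa] += 1

    values = {}
    for codon, count in codon_counts.items():
        aa = codon_to_aa[codon]
        expected = totals[aa] / degeneracy[aa]  # count under equal usage
        values[codon] = count / expected if expected else 0.0
    return values


# Hypothetical glycine codon counts (not data from the paper).
gly_counts = {"GGU": 200, "GGC": 395, "GGA": 195, "GGG": 210}
gly_map = {codon: "Gly" for codon in gly_counts}

gly_rscu = rscu(gly_counts, gly_map)
for codon, value in gly_rscu.items():
    print(f"{codon}: {value:.2f}")
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates one used less often.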

Supplementary Table 7. The top 40 InterPro hits in O. sativa and A. thaliana.

Rank / IPR ID / Name / # of O. sativa proteins
1 / IPR011009 / Protein kinase-like / 1277
2 / IPR000719 / Protein kinase / 1221
3 / IPR002290 / Serine/threonine protein kinase / 1150
4 / IPR001245 / Tyrosine protein kinase / 1114
5 / IPR008271 / Serine/threonine protein kinase, active site / 842
6 / IPR001611 / Leucine-rich repeat / 666
7 / IPR008941 / TPR-like / 557
8 / IPR001810 / Cyclin-like F-box / 398
9 / IPR002885 / Pentatricopeptide repeat / 391
10 / IPR009057 / Homeodomain-like / 365
11 / IPR007090 / Leucine-rich repeat, plant specific / 354
12 / IPR001841 / Zn-finger, RING / 351
13 / IPR008938 / ARM repeat fold / 322
14 / IPR001128 / Cytochrome P450 / 303
15 / IPR002182 / NB-ARC / 291
16 / IPR000767 / Disease resistance protein / 274
17 / IPR008940 / Protein prenyltransferase / 273
18 / IPR002401 / E-class P450, group I / 255
19 / IPR000504 / RNA-binding region RNP-1 (RNA recognition motif) / 249
20 / IPR003593 / AAA ATPase / 244
21 / IPR000379 / Esterase/lipase/thioesterase / 237
22 / IPR001680 / WD-40 repeat / 233
23 / IPR003591 / Leucine-rich repeat, typical subtype / 233
24 / IPR001005 / Myb, DNA-binding / 229
25 / IPR011046 / WD40-like / 224
26 / IPR002110 / Ankyrin / 201
27 / IPR010983 / EF-Hand-like / 187
28 / IPR009007 / Peptidase aspartic / 169
29 / IPR002048 / Calcium-binding EF-hand / 167
30 / IPR001440 / TPR repeat / 155
31 / IPR010255 / Haem peroxidase / 148
32 / IPR002016 / Haem peroxidase, plant/fungal/bacterial / 146
33 / IPR002213 / UDP-glucuronosyl/UDP-glucosyltransferase / 146
34 / IPR008994 / Nucleic acid-binding OB-fold / 144
35 / IPR001878 / Zn-finger, CCHC type / 141
36 / IPR001092 / Basic helix-loop-helix dimerisation region bHLH / 138
37 / IPR003612 / Plant lipid transfer/seed storage/trypsin-alpha amylase inhibitor / 138
38 / IPR001687 / ATP/GTP-binding site motif A (P-loop) / 137
39 / IPR001650 / Helicase, C-terminal / 134
40 / IPR001410 / DEAD/DEAH box helicase / 132
Rank / IPR ID / Name / # of A. thaliana proteins
1 / IPR011009 / Protein kinase-like / 1075
2 / IPR000719 / Protein kinase / 1042
3 / IPR002290 / Serine/threonine protein kinase / 1008
4 / IPR001245 / Tyrosine protein kinase / 984
5 / IPR008271 / Serine/threonine protein kinase, active site / 731
6 / IPR001810 / Cyclin-like F-box / 606
7 / IPR008941 / TPR-like / 603
8 / IPR001611 / Leucine-rich repeat / 539
9 / IPR002885 / Pentatricopeptide repeat / 463
10 / IPR009057 / Homeodomain-like / 452
11 / IPR001841 / Zn-finger, RING / 430
12 / IPR008938 / ARM repeat fold / 364
13 / IPR007090 / Leucine-rich repeat, plant specific / 329
14 / IPR008940 / Protein prenyltransferase / 308
15 / IPR003593 / AAA ATPase / 306
16 / IPR001005 / Myb, DNA-binding / 297
17 / IPR006527 / F-box protein interaction domain / 256
18 / IPR000504 / RNA-binding region RNP-1 (RNA recognition motif) / 251
19 / IPR001128 / Cytochrome P450 / 246
20 / IPR011046 / WD40-like / 237
21 / IPR001680 / WD-40 repeat / 234
22 / IPR008994 / Nucleic acid-binding OB-fold / 224
23 / IPR002401 / E-class P450, group I / 222
24 / IPR000379 / Esterase/lipase/thioesterase / 217
25 / IPR010983 / EF-Hand-like / 205
26 / IPR011043 / Galactose oxidase, central / 181
27 / IPR003591 / Leucine-rich repeat, typical subtype / 172
28 / IPR011011 / FYVE/PHD zinc finger / 169
29 / IPR002048 / Calcium-binding EF-hand / 168
30 / IPR011050 / Pectin lyase-like / 164
31 / IPR000767 / Disease resistance protein / 159
32 / IPR001687 / ATP/GTP-binding site motif A (P-loop) / 155
33 / IPR002182 / NB-ARC / 154
34 / IPR001092 / Basic helix-loop-helix dimerisation region bHLH / 149
35 / IPR001410 / DEAD/DEAH box helicase / 149
36 / IPR011424 / C1-like / 146
37 / IPR001650 / Helicase, C-terminal / 144
38 / IPR002110 / Ankyrin / 142
39 / IPR001440 / TPR repeat / 142
40 / IPR006566 / FBD / 142

Supplementary Table 8. InterPro IDs of potential frequent hitters excluded from functional descriptions.

PS00001 (IPR000042) / N-glycosylation site
PS00002 (IPR002179) / Glycosaminoglycan attachment site
PS00003 (IPR002032) / Tyrosine sulfation site
PS00004 (IPR001833) / cAMP/cGMP-dependent protein kinase, phosphorylation site
PS00005 (IPR001495) / Protein kinase C, phosphorylation site
PS00006 (IPR000430) / Casein kinase II phosphorylation site
PS00007 (IPR000220) / Tyrosine kinase phosphorylation site
PS00008 (IPR000338) / N-myristoylation site
PS00009 (IPR000134) / Amidation site
PS00010 (IPR000152) / Aspartic acid and asparagine hydroxylation site
PS00015 (IPR001430) / Bipartite nuclear targeting sequence
PS00016 (IPR001918) / Cell attachment region
PS00029 (IPR002158) / Leucine zipper
PS50079 (IPR001472) / Bipartite nuclear localization signal
PS50099 (IPR000694) / Proline-rich region
PS50101/PS00017 (IPR001687) / ATP/GTP-binding site motif A
PR01217 (IPR002965) / Proline-rich extensin
PR00019/PF00560 (IPR001611) / Leucine-rich repeat