ANNOTATATION OF GENE LISTS

As we have seen from the SAM session, we can visualize the table containing the significant probesets typing:

>

The problem of this list is that it is not very informative on the biological point of view. We sure would like to know more about each of the significant probesets.

In this session, we will try to build a HTML output carrying relevant biological annotation for the significant probesets. We will also present the results ordered by fold change.

Preliminary operations.

  1. Open R-Gui;
  2. Load the workspace containing the SAM analysis objects.

Obtaining and modifying SAM results.

We can create a data.frame where we can store the SAM results:

> samsig<

let’s visualize the first 5 rows of this new data.frame:

>samsig[1:5,]

Row d.value stdev rawp q.value R.fold

206716_at 6242 -21.86692 0.2683784 0 0 0.01642527

209443_at 8936 -20.88361 0.1948048 0 0 0.06041301

204254_s_at 3781 -20.61441 0.1205610 0 0 0.17826224

206054_at 5580 -19.61923 0.2375107 0 0 0.03757413

220281_at 19645 -19.42689 0.2467351 0 0 0.03544170

The columns containing the q-values and the fold changes are the fifth and the sixth. So, we can create a new data.frame containing only the probeset IDs, the q-values and the fold changes:

> res<-samsig[,5:6]

You can notice that there are fold changes < 1 (0.-something). These fold changes indicate down-regulation of the expression in the RCC samples, compared to NORM samples. The reason of this, is that in SAM the fold changes have been calculated dividing the average expression of RCC by the average expression of NORM:

RCC / NORM.

It is more comfortable to visualize the down-regulated genes with negative fold change. For instance, a gene having fold change 0.50 would be expressed by the fold change -2. This is more readable.

Everything we need to do, is to transform the fold changes < 1 in -1/fold change:

> dim(res)

[1] 217 2

The data.frame “res” contains 217 rows and 2 columns.

> for(i in 1:217) {

+ if(res[i,2]<1)

+ res[i,2] <- -1/res[i,2]

+ }

For each row of the data.frame “res” (217), we will recalculate the new fold change if this is < 1.

In this very simple function, we use the variable i as a counter that will increase of one unit at the end of each cycle, till the value 217.

Let’s visualize the first 5 rows of the new res object:

> res[1:5,]

q.value R.fold

206716_at 0 -60.881782

209443_at 0 -16.552726

204254_s_at 0 -5.609713

206054_at 0 -26.614057

220281_at 0 -28.215350

We can notice that now the fold changes expressing down-regulation are negative numbers, but the probesets are ordered by increasing q-value, while we want to order the results by decreasing fold change.

We can sort the column containing fold changes and generate and index carrying the information of the position of each row of the data.frame according to the new order.

To do this, we can use the function sort() specifying that we want the decreasing order (dec=T) and that we want to generate the index (index.return=T).

> sorted<-sort(res[,2], dec=T, index.return=T)

The new object named “sorted” contains two slots: sorted$x contains the ordered fold changes, and the slot sorted$ix contains the index.

Now we can re-order all the rows of “res” using sorted$ix:

> res<-res[sorted$ix,]

Biological annotation and output creation.

In order to build up the annotation for our list of genes and to create the output table we need to use two libraries: hgu133a and annaffy.

> library(hgu133a)

>library(annaffy)

To see which columns of data can be included, we use the aaf.handler() function:

> aaf.handler()

[1] "Probe" "Symbol" "Description" "Function"

[5] "Chromosome" "Chromosome Location" "GenBank" "LocusLink"

[9] "Cytoband" "UniGene" "PubMed" "Gene Ontology"

[13] "Pathway"

In this example we will use all the annotation available, but many times we might decide to use only few of them.

> anncols <- aaf.handler()[c(1:13)]

Once we have defined the annotation columns (anncols), we can build the annotation table using the function aafTableAnn():

> anntable <- aafTableAnn(row.names(res), "hgu133a", anncols)

Now we can create separate tables carrying the q-values and the fold changes information:

> qvaltable <- aafTable("q-value"=res[,1])

> fctable <- aafTable("fold change"=res[,2], signed=T)

Using the function merge() it is possible to push together the qvaltable and the fctable. In the same way, we can merge also the annotation table (anntable).

> tb<-merge(qvaltable, fctable)

> tb<-merge(anntable, tb)

Finally, we can create the output table in HTML format. We need to specify the name of the file (“SAMout.html”).

> saveHTML(tb, "SAMout.html", title = "Significant Genes with annotation")

Visualizing the genomic position of the genes.

We can produce a plot to graphically visualize the position of our significant genes on the chromosomes.

First, we need to load the package geneplotter.

> library(geneplotter)

We can now build an object containing the location of all the genes represented on the Affymetrix HG-U133A chipset using the function buildChromLocation().

>chrloc<-buildChromLocation(“hgu133a”)

Now we can plot chrloc:

> cPlot(chrloc)

The relative position of the genes on the chromosomes is visualized in white.

Now we want to highlight our 217 significant probesets.

The probesets are stored as row.names of the data.frame res.

> gn<-row.names(res)

Finally, we can color our genes in red, for instance:

> cColor(gn, "red", chrloc)

Thank you for your attention!

Dario

For contacts:

Dario Greco

Institute of Biotechnology - University of Helsinki

Building Cultivator II

P.O.Box 56 Viikinkaari 4

FIN-00014 Finland

Office: +358 9 191 58951

Fax: +358 9 191 58952

Mobile: +358 44 023 5780

Email: