Simulated Example for Illustrating

Weighted Gene Co-Expression Network Analysis

Steve Horvath, Bin Zhang

Correspondence to

This is a companion R tutorial for explaining the simulated example in Zhang and Horvath (2005). To cite this tutorial, please use the following reference.

Zhang B, Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology.

See also our webpage at

http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/

The example is constructed in a way

·  to show why the scale free topology criterion may be useful.

·  to show the advantage of weighted versus unweighted networks (soft versus hard thresholding of the correlation coefficient matrix)

·  show the observed relationship between the cluster coefficient CC and connectivity k in both weighted and unweighted networks

The module structure should be fairly realistic.

We define a measure of gene significance (denoted GS) and aim to correlate the intramodular connectivity k in the brown module with gene significance.

Here we present a simulated network for highlighting the differences between hard- and soft thresholding. The model assumes that the network is comprised of a brown, a blue and a turquoise module with module sizes n_1=100, n_2=200, and n3=300, respectively. Further, we assume that 500 grey (non-module) genes are present. For the genes of a given module, the correlation matrix was simulated as follows.

Description of module generation.

Here we present a simulated network model that has some of the properties observed in real networks. An explanation of this model can also be found in the article Zhang and Horvath (2005). The example exhibits a nearly optimal (simulated) signal for weighted and unweighted networks constructed using the scale free topology criterion. The network was simulated to consist of a brown, a blue and a turquoise module

with n1=100, n2=200, and n3=300 genes, respectively. Further, it contained 500 grey (non-module) genes. For the genes of the brown module, we simulated an external gene significance measure GS(i)=vsignal(i) and vsignal(i)=(1-0.3 i/n1)^5 where 1< i < n1. One goal of the analysis is to study how the Spearman correlation between gene significance and intramodular connectivity in the brown module depends on different

hard and soft thresholds.


The similarity matrix between genes of a given module contained a

signal and a noise part, see the figure below. Specifically, for i< 0.95* n1

and j<0.95*n1, we assumed that the similarity was given by S(i,j)=min(vsignal(i),vsignal(j)). The remaining 5 percent of (noise) genes were assumed to have a moderate similarity with the true signal genes. Specifically, for i>0.95* n1 and j < 0.95*n1 (or for i <0.95* n1 and j>0.95* n1), we set S(i,j)=vnoise(i)* vnoise(j) where vnoise(i)=(0.85+0.1* i/n)^5. Further, we assumed that the noise genes had a high similarity between each other: for i>0.95*n1 and j>0.95*n1 the similarity

matrix was set to S(i,j)=0.95^5.

The whole network similarity, which includes noise, is visualized below. Note that that the structure of the whole network similarity matrix is given by the 1100 times 1100 block-diagonal matrix S = BlockDiag(S1,S2,S3,0).

Noise epsilon(i,j) was added to the entries S(i,j) such that the result remained in the unit interval [0,1]. Specifically, epsilon(i,j)=(1-S(i,j)) U(i,j)^5 where the U(i,j) followed

independent uniform distributions on the the unit interval [0,1].

# The following function creates a module (value is the module correlation matrix and other quantities). Visually one can verify that the resulting modules look fairly realistic in a TOM plot. The input is the module size, the proportion of true signal (i.e. 1-prop1 is the proportion of noise genes), and a power parameter which is unrelated to the power parameters studied below.

if(exists("modulecreation") )rm(modulecreation); modulecreation=function(size1,prop1=.95,power1=5) {

# noise "background"

seq1=seq(1,size1)/size1

vnoise=( .85+(.95-.85)*seq1)^power1

SIMILARITY1=outer(vnoise,vnoise,FUN="*")

#this specifies the red noise rectangle of highly connected noise genes

SIMILARITY1[c((prop1*size1):size1), c((prop1*size1):size1)] =.95

# this specificies the signal part of the similarity matrix

vsignal=(1-.3*seq1)^power1

# signal part

SIMILARITY1[c(1:(size1*prop1)) ,c(1:(size1*prop1)) ]= outer(vsignal,vsignal,FUN="pmin")[c(1:(size1*prop1)) , c(1:(size1*prop1)) ]

diag(SIMILARITY1)=1

list(ModuleCOR= SIMILARITY1, GS= vsignal, k=apply(SIMILARITY1,2,sum),

CC=ClusterCoef.fun, lastColumn = SIMILARITY1[,size1])

}

library(class)

library(cluster)

library(sma) # this is needed for plot.mat below

library(scatterplot3d)

source("C:/Documents and Settings/SHorvath/My Documents/ADAG/JunDong/YEAST/YEASTtutorialApril2005/NetworkFunctions.txt")

#Memory

# check the maximum memory that can be allocated

memory.size(TRUE)

# increase the available memory

memory.limit(size=3448)

set.seed(1)

# module 1

n1=100;

par(mfrow=c(3,2))

mod1= modulecreation(n1)

module1=mod1[[1]];GS1= mod1[[2]];k1= mod1[[3]];CC1=mod1[[4]]

seq1=mod1$lastColumn

# This allows you to look at the constructed module

par(mfrow=c(1,1),mar=c(0,0,0,0))

reverse1=rank(-c(1:n1))

image(1-t(module1[reverse1,]))

# This bar reports the Gene significance measure of the genes.

signifBar = cbind(1-GS1, 1-GS1,1-GS1,1-GS1)

par(mfrow=c(1,1), mar=c(0, 0, 0, 0))

heatmap(as.matrix(t(signifBar)),Rowv=NA,Colv=NA,scale="none", xlab="", ylab="", margins = c(0,0))

# module 2

n2=200;mod1= modulecreation(n2)

module2=mod1[[1]];GS2= mod1[[2]];k2= mod1[[3]];CC2=mod1[[4]]

seq2=mod1$lastColumn

# module 3

n3=300;mod1= modulecreation(n3)

module3=mod1[[1]];GS3= mod1[[2]];k3= mod1[[3]];CC3=mod1[[4]]

seq3=mod1$lastColumn

# grey genes

n4=500

nTotal=n1+n2+n3+n4

seqTotal= seq(1:nTotal)/(nTotal+1)

Similarity1= matrix(0 , nrow=nTotal, ncol=nTotal)

index1=c(1:n1);index2=c((n1+1):(n1+n2)); index3=c((n1+n2+1):(n1+n2+n3))

index4=c((n1+n2+n3+1):(n1+n2+n3+n4))

Similarity1[index1,index1]=module1; Similarity1[index2,index2]=module2

Similarity1[index3,index3]=module3

# now we add noise to the correlation matrix to make it more realistic

# the noise is added in a way to ensure that the entries lie between 0 and 1.

Similarity1=Similarity1+(1-Similarity1)*matrix( runif(nTotal*nTotal)^5,nrow=nTotal,ncol=nTotal)

# here we symmetrize the similarity matrix

Similarity1=( t(Similarity1)+ Similarity1)/2

diag(Similarity1)=1

# here we verify that the entries lie between 0 and 1

summary(c(as.dist(Similarity1 )))

# true underlying module label

truemodule=rep(c("brown", "blue","turquoise","grey"),c(n1,n2,n3,n4))

# This is a color-coded picture of the similarity matrix

reverse1=rank(-c(1:nTotal))

par(mfrow=c(1,1),mar=c(0,0,0,0))

image(1-t(Similarity1[reverse1,]))

# The following shows the simulated module membership.

barplot(rep(1,nTotal),col=as.character(truemodule),space=0,border= as.character(truemodule),offset=0)

# SCALE FREE TOPOLOGY CHECK FOR SOFT AND HARD Thresholding

# no.breaks1=specifies the number of points in a scale free plot.

# Unfortunately R^2 depends on this to some extent --> opportunity for research

no.breaks1=9

powers1=seq(1,9,by=1)

TableSoft=data.frame(matrix(NA,nrow=length(powers1),ncol=4))

names(TableSoft)=c("powers", "ScaleFreeRsquared", "Slope", "TruncatedRsquared")

par(mfrow=c(3,3), mar= c(2, 2, 2, 2) )

for (i in c(1:length(powers1)) ){

TableSoft[i,1]=powers1[i]

ADJsoft=abs(Similarity1)^powers1[i];diag(ADJsoft)=0;ksoft=apply(ADJsoft,2,sum)

TableSoft[i, 2:4]=ScaleFreePlot1(ksoft, truncated=T,AF1= paste("beta=",as.character(powers1[i])), no.breaks=no.breaks1)

}

thresholds1= seq(from=.1,to=.9,by=0.1)

TableHard=data.frame(matrix(NA,nrow=length(thresholds1),ncol=4))

names(TableHard)=c("thresholds", "ScaleFreeRsquared", "Slope", "TruncatedRsquared")

par(mfrow=c(3,3), mar= c(2, 2, 2, 2) )

for (i in c(1:length(thresholds1)) ){

TableHard[i,1]=thresholds1[i]

ADJhard=abs(Similarity1)>thresholds1[i];diag(ADJhard)=0;khard=apply(ADJhard,2,sum)

TableHard[i, 2:4]=ScaleFreePlot1(khard, truncated=T,AF1= paste("tau=",as.character(thresholds1[i])) ,no.breaks=no.breaks1)

}

par(mfrow=c(1,1),mar= c(5, 4, 4, 2) +0.1)

plot(1:length(powers1),

-sign(TableHard[,3])*TableHard[,2], type="n",ylab="Scale Free Topology R^2",xlab="Adjacency Parameter Number", ylim=range(min(c(-sign(TableSoft[,3])*TableSoft[,1], -sign(TableHard[,3])*TableHard[,1]),na.rm=T),1) )

text(1:length(thresholds1), -sign(TableHard[,3])*TableHard[,2], labels= thresholds1, col="black")

points(1:length(powers1), -sign(TableSoft[,3])*TableSoft[,2], type="n")

text(1:length(powers1), -sign(TableSoft[,3])*TableSoft[,2], labels= as.character(powers1), col="red")

abline(h=.73)

# Based on where the "kink" occurs, the scale free topology

# criterion would lead us to choose the following parameters

# for the soft and hard adjacency function

# R^2>0.73

beta1=3 # this is the the power adjacency function parameter in power(s,beta)

tau1=.6 # this parameter is hard threshold parameter in the signum function.

# By playing around with these parameters, you will find that the

# findings are highly robust with respect to beta1, which is an attractive property.


# Creating Scale Free Topology Plots

par(mfrow=c(2,1))

ADJsoft=abs(Similarity1)^beta1;diag(ADJsoft)=0;ksoft=apply(ADJsoft,2,sum)

ScaleFreePlot1(ksoft, truncated=T,AF1= paste("beta=",as.character(beta1)), ,no.breaks=no.breaks1)

ADJhard=I(Similarity1>tau1)+0.0; diag(ADJhard)=0;khard=apply(ADJhard,2,sum)

ScaleFreePlot1(khard,truncated=T,AF1=paste("tau=",as.character(tau1)) ,no.breaks=no.breaks1 )


# Now we define the gene significance variable for the brown module

# We permute the GS measures in the other modules to make the example realistic

GS=c(GS1,sample(GS2)/2,sample(GS3)/2, sample(GS1,n4,replace=T)/2 )

# The following code assumes that we can identify the true underlying brown module.

# In reality this is not the case, which is why we revisit this issue below.

datconnectivitiesSoft=data.frame(matrix(NA,nrow=sum(truemodule=="brown"),ncol=length(powers1)))

names(datconnectivitiesSoft)=paste("kWithinPower",powers1,sep="")

for (i in c(1:length(powers1)) ) {

datconnectivitiesSoft[,i]=apply(abs(Similarity1[truemodule=="brown", truemodule=="brown"])^powers1[i],1,sum)}

SpearmanCorrelationsSoft=signif(cor(GS[ truemodule=="brown"], datconnectivitiesSoft, method="s",use="p"))

datconnectivitiesHard=data.frame(matrix(NA,nrow=sum(truemodule=="brown"),ncol=length(thresholds1)))

names(datconnectivitiesHard)=paste("kWithinPower",thresholds1,sep="")

for (i in c(1:length(thresholds1)) ) {

datconnectivitiesHard[,i]=apply(abs(Similarity1[truemodule=="brown", truemodule=="brown"])>thresholds1[i],1,sum)}

SpearmanCorrelationsHard=signif(cor(GS[ truemodule=="brown"], datconnectivitiesHard, method="s",use="p"))

par(mfrow=c(2,2))

plot(powers1, SpearmanCorrelationsSoft,type="n", main="Soft")

text(powers1, SpearmanCorrelationsSoft,labels=powers1,col="red")

abline(v=beta1,col="red")

plot(thresholds1, SpearmanCorrelationsHard,type="n",

main="Hard" ,ylim= range( c(SpearmanCorrelationsSoft, SpearmanCorrelationsHard),na.rm=T) )

text(thresholds1, SpearmanCorrelationsHard,labels=thresholds1,col="black")

abline(v=tau1,col="red")

plot(-sign(TableHard[,3])*TableHard[,2], SpearmanCorrelationsHard,type="n",ylim=range(c(SpearmanCorrelationsHard, SpearmanCorrelationsSoft),na.rm=T),

xlim=range(c(-sign(TableHard[,3])*TableHard[,2], -sign(TableSoft[,3])*TableSoft[,2]),na.rm=T))

text(-sign(TableHard[,3])*TableHard[,2] , SpearmanCorrelationsHard,label=as.character(thresholds1))

points(-sign(TableSoft[,3])*TableSoft[,2]

, SpearmanCorrelationsSoft, pch=as.character(powers1),col="red")

Message: in this example, soft thresholding leads to a signal that is very robust to the choice of the AF parameter. In contrast, the hard thresholding can lead to very different findings.

## TOM PLOT

# The following code computes the topological overlap matrix based on

# adjacency matrix.

dissGTOM1soft=TOMdist1(ADJsoft)

collect_garbage()

dissGTOM1hard=TOMdist1(ADJhard)

collect_garbage()

# resulting clustering tree will be used to define gene modules.

hierGTOM1soft <- hclust(as.dist(dissGTOM1soft),method="average");

hierGTOM1hard <- hclust(as.dist(dissGTOM1hard),method="average");

par(mfrow=c(1,2))

plot(hierGTOM1soft,labels=F,sub="",xlab="")

plot(hierGTOM1hard,labels=F,sub="",xlab="")

# here we chose the height cut-off such that many "true" brown module genes are #included in the designated brown module.

h1soft= .94

h1hard= .91

colorh1soft=as.character(modulecolor2(hierGTOM1soft,h1= h1soft, minsize1=20))

#colorh1soft=ifelse(colorh1soft=="grey","brown",colorh1soft)

table(colorh1soft)

>table(colorh1soft)

colorh1soft

blue brown grey turquoise

206 89 435 370

colorh1hard=as.character(modulecolor2(hierGTOM1hard,h1= h1hard,minsize1=20))

#colorh1hard=ifelse(colorh1hard=="grey","brown",colorh1hard)

table(colorh1hard)

>table(colorh1hard)

colorh1hard

blue brown grey turquoise

218 87 422 373

par(mfrow=c(3,2))

plot(hierGTOM1soft,labels=F,sub="",xlab="")

plot(hierGTOM1hard,labels=F,sub="",xlab="")

hclustplot1(hierGTOM1soft,truemodule,title1="True module colors")

hclustplot1(hierGTOM1hard,truemodule,title1="True module colors")

hclustplot1(hierGTOM1soft, colorh1soft,title1="Identified module colors")

hclustplot1(hierGTOM1hard, colorh1hard,title1="Identified module colors")

table(colorh1soft,truemodule)

table(colorh1hard,truemodule)

> table(colorh1soft,truemodule)

truemodule

colorh1soft blue brown grey turquoise

blue 200 0 6 0

brown 0 89 0 0

grey 0 11 424 0

turquoise 0 0 70 300

> table(colorh1hard,truemodule)

truemodule

colorh1hard blue brown grey turquoise

blue 189 0 29 0

brown 0 80 6 1

grey 11 20 384 7

turquoise 0 0 81 292

Message: soft thresholding leads to far better cluster retrieval.

par(mfrow=c(1,1))

# if you have patience and enough stack memory, try to run this code

#TOMplot1(dissGTOM1soft , hierGTOM1soft, colorh1soft)

#TOMplot1(dissGTOM1hard , hierGTOM1hard, colorh1hard)

# Instead, I will restrict the TOM plot to the most connected genes.

restrict1= ksoft>40

table(restrict1)

>table(restrict1)

restrict1

FALSE TRUE

563 537

dissGTOM1softrest=TOMdist1(ADJsoft[restrict1,restrict1])

collect_garbage()

dissGTOM1hardrest=TOMdist1(ADJhard[restrict1,restrict1])

hierGTOM1softrest <- hclust(as.dist(dissGTOM1softrest),method="average");

hierGTOM1hardrest <- hclust(as.dist(dissGTOM1hardrest),method="average");

# SOFT TOM PLOT

par(mfrow=c(1,1))

TOMplot1(dissGTOM1softrest , hierGTOM1softrest, truemodule[restrict1])


# HARD TOM PLOT

TOMplot1(dissGTOM1hardrest , hierGTOM1hardrest, colorh1hard[restrict1])


# We also propose to use classical multi-dimensional scaling plots

# for visualizing the network. Here we chose 2 scaling dimensions

cmd1=cmdscale(as.dist(dissGTOM1soft),2)

cmd2=cmdscale(as.dist(dissGTOM1hard),2)

par(mfrow=c(1,2))

plot(cmd1, col=as.character(colorh1soft), main="Soft MDS plot",xlab="Scaling Axis 1", ylab="Scaling Axis 2")

plot(cmd2, col=as.character(colorh1hard), main="Hard MDS plot",xlab="Scaling Axis 1", ylab="Scaling Axis 2")

These MDS plots also look fairly realistic, compare with Zhang and Horvath (2005).


# Just to be sure let’s re-define the adjacency matrix and connectivities used.

ADJhard=I(Similarity1>tau1)+0.0; diag(ADJhard)=0;khard=apply(ADJhard,2,sum)

ADJsoft=abs(Similarity1)^beta1;diag(ADJsoft)=0;ksoft=apply(ADJsoft,2,sum)

ADJhardbrown=I(abs(Similarity1[colorh1hard=="brown", colorh1hard=="brown"])

>tau1)+0.0; diag(ADJhardbrown)=0;khardbrown=apply(ADJhardbrown,2,sum)

ADJsoftbrown=abs(Similarity1[colorh1soft=="brown", colorh1soft=="brown"])^beta1;diag(ADJsoftbrown)=0;ksoftbrown=apply(ADJsoftbrown,2,sum)

# This computes the intramodular connectivities

AllDegreesSoft=DegreeInOut(ADJsoft,colorh1soft)

AllDegreesHard=DegreeInOut(ADJhard,colorh1hard)

# Here we compute the soft and the hard cluster coefficients

cluster.coefSoft= ClusterCoef.fun(ADJsoft)

cluster.coefHard= ClusterCoef.fun(ADJhard)

# Now we plot cluster coefficient versus connectivity

# for all genes. We use the true module coloring

par(mfrow=c(2,1))

plot(ksoft, cluster.coefSoft,col=truemodule,ylab="Cluster Coefficient",xlab="Connectivity",main="Soft")

#whichmodule="turquoise"

#plot(AllDegreesSoftkWithin[truemodule==whichmodule],

#cluster.coefSoft[truemodule==whichmodule],col=whichmodule,ylab="Cluster #Coefficient",xlab="Connectivity",main=whichmodule)

plot(khard, cluster.coefHard,col=truemodule,ylab="Cluster Coefficient",xlab="Connectivity",main="Hard")

Comment: for the high connectivity genes in each module CC is inversely related to K in unweighted networks (hard thresholding). In contrast, there is a slightly positive (or constant) relationship among hub genes in weighted networks.

#Gene significance analysis

# Now we study how the gene significance is related to connectivity in the brown module

datconnectivitiesSoft=data.frame(matrix(666,nrow=sum(colorh1soft=="brown"),ncol=length(powers1)))

names(datconnectivitiesSoft)=paste("kWithinPower",powers1,sep="")

for (i in c(1:length(powers1)) ) {

datconnectivitiesSoft[,i]=apply(abs(Similarity1[colorh1soft=="brown", colorh1soft=="brown"])^powers1[i],1,sum)}

SpearmanCorrelationsSoft=signif(cor(GS[ colorh1soft=="brown"], datconnectivitiesSoft, method="s",use="p"))

datconnectivitiesHard=data.frame(matrix(666,nrow=sum(colorh1hard=="brown"),ncol=length(thresholds1)))

names(datconnectivitiesHard)=paste("kWithinPower",thresholds1,sep="")

for (i in c(1:length(thresholds1)) ) {

datconnectivitiesHard[,i]=apply(abs(Similarity1[colorh1hard=="brown", colorh1hard=="brown"])>thresholds1[i],1,sum)}

SpearmanCorrelationsHard=signif(cor(GS[ colorh1hard=="brown"], datconnectivitiesHard, method="s",use="p"))

par(mfrow=c(2,2))

plot(powers1, SpearmanCorrelationsSoft,type="n" ,xlab="Power (beta)" , main="Soft")

text(powers1, SpearmanCorrelationsSoft,labels=powers1,col="red")

abline(v=beta1,col="red")

plot(thresholds1, SpearmanCorrelationsHard,type="n" ,ylim= range( c(SpearmanCorrelationsHard),na.rm=T) ,xlab="Thresholds (tau)" ,main="Hard")

text(thresholds1, SpearmanCorrelationsHard,labels=thresholds1,col="black")

abline(v=tau1,col="red")

plot(-sign(TableHard[,3])*TableHard[,2], SpearmanCorrelationsHard,type="n" ,

ylim=range(c(SpearmanCorrelationsHard, SpearmanCorrelationsSoft),na.rm=T), xlim=range(c(-sign(TableHard[,3])*TableHard[,2], -sign(TableSoft[,3])*TableSoft[,2]),na.rm=T),xlab="Signed Scale Free R^2",ylab="Correlation(k, Gene Significance)" )

text(-sign(TableHard[,3])*TableHard[,2], SpearmanCorrelationsHard,label=as.character(thresholds1))

points(-sign(TableSoft[,3])*TableSoft[,2], SpearmanCorrelationsSoft, type="n")

text(-sign(TableSoft[,3])*TableSoft[,2], SpearmanCorrelationsSoft,labels=as.character(powers1),col="red")

Again, we find that hard thresholding leads to results that are not robust.

Further the SFT criterion works quite well. No surprise here: this is how the simulated example was constructed.

# Now we want to see how the correlation between kWithin and gene significance

# changes for different hard threshold values tau within the BROWN module.

whichmodule="brown"

# the following data frame contains the intramodular connectivities

# corresponding to different hard thresholds

datconnectivitiesHard=data.frame(matrix(666,nrow=sum(colorh1hard==whichmodule),ncol=length(thresholds1)))

names(datconnectivitiesHard)=paste("kWithinTau",thresholds1,sep="")

for (i in c(1:length(thresholds1)) ) {

datconnectivitiesHard[,i]=apply(abs(Similarity1[colorh1hard==whichmodule, colorh1hard==whichmodule])>=thresholds1[i],1,sum)}

SpearmanCorrelationsHard=signif(cor(GS[ colorh1hard==whichmodule], datconnectivitiesHard, method="s",use="p"))

# Now we define the new connectivity measure omega based on the TOM matrix

# It simply considers TOM as adjacency matrix...

datOmegaINHard=data.frame(matrix(666,nrow=sum(colorh1hard==whichmodule),ncol=length(thresholds1)))

names(datOmegaINHard)=paste("omegaWithinHard",thresholds1,sep="")

for (i in c(1:length(thresholds1)) ) {

datconnectivitiesHard[,i]=apply(

1-TOMdist1(abs(Similarity1[colorh1hard==whichmodule, colorh1hard==whichmodule])>thresholds1[i]),1,sum)}

SpearmanCorrelationsOmegaHard=as.vector(signif(cor(GS[ colorh1hard==whichmodule], datconnectivitiesHard, method="s",use="p")))

# Now we define the intramodular cluster coefficient

datCCinHard=data.frame(matrix(666,nrow=sum(colorh1hard==whichmodule),ncol=length(thresholds1)))

names(datCCinHard)=paste("CCinHard",thresholds1,sep="")

for (i in c(1:length(thresholds1)) ) {

datCCinHard[,i]= ClusterCoef.fun(abs(Similarity1[colorh1hard==whichmodule, colorh1hard==whichmodule])>thresholds1[i])}

SpearmanCorrelationsCCinHard=as.vector(signif(cor(GS[ colorh1hard==whichmodule], datCCinHard, method="s",use="p")))


# Now we compare the performance of the connectivity measures (k.in,

# omega.in, cluster coefficience) across different hard thresholds when it comes to predicting

# prognostic genes in the brown module

dathelpHard=data.frame(signedRsquared=-sign(TableHard[,3])*TableHard[,2], corGSkINHard =as.vector(SpearmanCorrelationsHard), corGSwINHard= as.vector(SpearmanCorrelationsOmegaHard),corGSCCHard=as.vector(SpearmanCorrelationsCCinHard))

# Now we focus on soft thresholding

# the following data frame contains the intramodular connectivities

# corresponding to different soft powers

datconnectivitiesSoft=data.frame(matrix(666,nrow=sum(colorh1soft==whichmodule),ncol=length(powers1)))

names(datconnectivitiesSoft)=paste("kWithinTau",powers1,sep="")

for (i in c(1:length(powers1)) ) {

datconnectivitiesSoft[,i]=apply(abs(Similarity1[colorh1soft==whichmodule, colorh1soft==whichmodule])^powers1[i],1,sum)}

SpearmanCorrelationsSoft=signif(cor(GS[ colorh1soft==whichmodule], datconnectivitiesSoft, method="s",use="p"))

# Now we define the new connectivity measure omega based on the TOM #matrix

# It simply considers TOM as adjacency matrix...

datOmegaINSoft=data.frame(matrix(666,nrow=sum(colorh1soft==whichmodule),ncol=length(powers1)))

names(datOmegaINSoft)=paste("omegaWithinSoft",powers1,sep="")

for (i in c(1:length(powers1)) ) {

datconnectivitiesSoft[,i]=apply(

1-TOMdist1(abs(Similarity1[colorh1soft==whichmodule, colorh1soft==whichmodule])^powers1[i]),1,sum)}

SpearmanCorrelationsOmegaSoft=as.vector(signif(cor(GS[ colorh1soft==whichmodule], datconnectivitiesSoft, method="s",use="p")))

# Now we define the intramodular cluster coefficient

datCCinSoft=data.frame(matrix(666,nrow=sum(colorh1soft==whichmodule),ncol=length(powers1)))

names(datCCinSoft)=paste("CCinSoft",powers1,sep="")

for (i in c(1:length(powers1)) ) {

datCCinSoft[,i]= ClusterCoef.fun(abs(Similarity1[colorh1soft==whichmodule, colorh1soft==whichmodule])^powers1[i])}

SpearmanCorrelationsCCinSoft=as.vector(signif(cor(GS[ colorh1soft==whichmodule], datCCinSoft, method="s",use="p")))


# Now we compare the performance of the connectivity measures (k.in,

# omega.in, cluster coefficience) across different soft powers when it comes to predicting

# prognostic genes in the brown module

dathelpSoft=data.frame(signedRsquared=-sign(TableSoft[,3])*TableSoft[,2], corGSkINSoft =as.vector(SpearmanCorrelationsSoft), corGSwINSoft= as.vector(SpearmanCorrelationsOmegaSoft),corGSCCSoft=as.vector(SpearmanCorrelationsCCinSoft))

par(mfrow=c(1,1))

matplot(thresholds1,dathelpHard,type="l",lty=1,lwd=3,col=c("black","red","blue","green"),ylab="",xlab="tau",main="Hard")

legend(0.65,0.2, c("signed R^2","r(GS,k.in)","r(GS,omega.in)","r(GS,cc.in)"), col=c("black","red","blue","green"), lty=1,lwd=3,ncol = 1, cex=1)

abline(v=tau1,col="red")

Comment: in unweighted networks, the standard connectivity works as well as the TOM based connectivity omega.in.

But note that the correlations are highly dependent on the threshold tau, which means the findings are not robust.
matplot(powers1,dathelpSoft,type="l",lty=1,lwd=3,col=c("black","red","blue","green"),ylab="",xlab="beta",main="Soft")

legend(6.8,0.65, c("signed R^2","r(GS,k.in)","r(GS,omega.in)","r(GS,cc.in)"), col=c("black","red","blue","green"), lty=1,lwd=3,ncol = 1, cex=1)

abline(v=beta1,col="red")

We find that the weighted network is much more robust (high correlations for most choices of beta). Note that omega.in works very well in weighted networks. The cluster coefficient cc.in works well for low values of beta. Also see Figure 16 in Zhang and Horvath 2005. This suggests that in weighted networks the cluster coefficient may serve as yet another node centrality (connectivity) measure.

THE END

14