Simulated Example for Illustrating
Weighted Gene Co-Expression Network Analysis
Steve Horvath, Bin Zhang
Correspondence to
This is a companion R tutorial for explaining the simulated example in Zhang and Horvath (2005). To cite this tutorial, please use the following reference.
Zhang B, Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology.
See also our webpage at
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/
The example is constructed in a way
· to show why the scale free topology criterion may be useful.
· to show the advantage of weighted versus unweighted networks (soft versus hard thresholding of the correlation coefficient matrix)
· show the observed relationship between the cluster coefficient CC and connectivity k in both weighted and unweighted networks
The module structure should be fairly realistic.
We define a measure of gene significance (denoted GS) and aim to correlate the intramodular connectivity k in the brown module with gene significance.
Here we present a simulated network for highlighting the differences between hard- and soft thresholding. The model assumes that the network is comprised of a brown, a blue and a turquoise module with module sizes n_1=100, n_2=200, and n3=300, respectively. Further, we assume that 500 grey (non-module) genes are present. For the genes of a given module, the correlation matrix was simulated as follows.
Description of module generation.
Here we present a simulated network model that has some of the properties observed in real networks. An explanation of this model can also be found in the article Zhang and Horvath (2005). The example exhibits a nearly optimal (simulated) signal for weighted and unweighted networks constructed using the scale free topology criterion. The network was simulated to consist of a brown, a blue and a turquoise module
with n1=100, n2=200, and n3=300 genes, respectively. Further, it contained 500 grey (non-module) genes. For the genes of the brown module, we simulated an external gene significance measure GS(i)=vsignal(i) and vsignal(i)=(1-0.3 i/n1)^5 where 1< i < n1. One goal of the analysis is to study how the Spearman correlation between gene significance and intramodular connectivity in the brown module depends on different
hard and soft thresholds.
The similarity matrix between genes of a given module contained a
signal and a noise part, see the figure below. Specifically, for i< 0.95* n1
and j<0.95*n1, we assumed that the similarity was given by S(i,j)=min(vsignal(i),vsignal(j)). The remaining 5 percent of (noise) genes were assumed to have a moderate similarity with the true signal genes. Specifically, for i>0.95* n1 and j < 0.95*n1 (or for i <0.95* n1 and j>0.95* n1), we set S(i,j)=vnoise(i)* vnoise(j) where vnoise(i)=(0.85+0.1* i/n)^5. Further, we assumed that the noise genes had a high similarity between each other: for i>0.95*n1 and j>0.95*n1 the similarity
matrix was set to S(i,j)=0.95^5.
The whole network similarity, which includes noise, is visualized below. Note that that the structure of the whole network similarity matrix is given by the 1100 times 1100 block-diagonal matrix S = BlockDiag(S1,S2,S3,0).
Noise epsilon(i,j) was added to the entries S(i,j) such that the result remained in the unit interval [0,1]. Specifically, epsilon(i,j)=(1-S(i,j)) U(i,j)^5 where the U(i,j) followed
independent uniform distributions on the the unit interval [0,1].
# The following function creates a module (value is the module correlation matrix and other quantities). Visually one can verify that the resulting modules look fairly realistic in a TOM plot. The input is the module size, the proportion of true signal (i.e. 1-prop1 is the proportion of noise genes), and a power parameter which is unrelated to the power parameters studied below.
if(exists("modulecreation") )rm(modulecreation); modulecreation=function(size1,prop1=.95,power1=5) {
# noise "background"
seq1=seq(1,size1)/size1
vnoise=( .85+(.95-.85)*seq1)^power1
SIMILARITY1=outer(vnoise,vnoise,FUN="*")
#this specifies the red noise rectangle of highly connected noise genes
SIMILARITY1[c((prop1*size1):size1), c((prop1*size1):size1)] =.95
# this specificies the signal part of the similarity matrix
vsignal=(1-.3*seq1)^power1
# signal part
SIMILARITY1[c(1:(size1*prop1)) ,c(1:(size1*prop1)) ]= outer(vsignal,vsignal,FUN="pmin")[c(1:(size1*prop1)) , c(1:(size1*prop1)) ]
diag(SIMILARITY1)=1
list(ModuleCOR= SIMILARITY1, GS= vsignal, k=apply(SIMILARITY1,2,sum),
CC=ClusterCoef.fun, lastColumn = SIMILARITY1[,size1])
}
library(class)
library(cluster)
library(sma) # this is needed for plot.mat below
library(scatterplot3d)
source("C:/Documents and Settings/SHorvath/My Documents/ADAG/JunDong/YEAST/YEASTtutorialApril2005/NetworkFunctions.txt")
#Memory
# check the maximum memory that can be allocated
memory.size(TRUE)
# increase the available memory
memory.limit(size=3448)
set.seed(1)
# module 1
n1=100;
par(mfrow=c(3,2))
mod1= modulecreation(n1)
module1=mod1[[1]];GS1= mod1[[2]];k1= mod1[[3]];CC1=mod1[[4]]
seq1=mod1$lastColumn
# This allows you to look at the constructed module
par(mfrow=c(1,1),mar=c(0,0,0,0))
reverse1=rank(-c(1:n1))
image(1-t(module1[reverse1,]))
# This bar reports the Gene significance measure of the genes.
signifBar = cbind(1-GS1, 1-GS1,1-GS1,1-GS1)
par(mfrow=c(1,1), mar=c(0, 0, 0, 0))
heatmap(as.matrix(t(signifBar)),Rowv=NA,Colv=NA,scale="none", xlab="", ylab="", margins = c(0,0))
# module 2
n2=200;mod1= modulecreation(n2)
module2=mod1[[1]];GS2= mod1[[2]];k2= mod1[[3]];CC2=mod1[[4]]
seq2=mod1$lastColumn
# module 3
n3=300;mod1= modulecreation(n3)
module3=mod1[[1]];GS3= mod1[[2]];k3= mod1[[3]];CC3=mod1[[4]]
seq3=mod1$lastColumn
# grey genes
n4=500
nTotal=n1+n2+n3+n4
seqTotal= seq(1:nTotal)/(nTotal+1)
Similarity1= matrix(0 , nrow=nTotal, ncol=nTotal)
index1=c(1:n1);index2=c((n1+1):(n1+n2)); index3=c((n1+n2+1):(n1+n2+n3))
index4=c((n1+n2+n3+1):(n1+n2+n3+n4))
Similarity1[index1,index1]=module1; Similarity1[index2,index2]=module2
Similarity1[index3,index3]=module3
# now we add noise to the correlation matrix to make it more realistic
# the noise is added in a way to ensure that the entries lie between 0 and 1.
Similarity1=Similarity1+(1-Similarity1)*matrix( runif(nTotal*nTotal)^5,nrow=nTotal,ncol=nTotal)
# here we symmetrize the similarity matrix
Similarity1=( t(Similarity1)+ Similarity1)/2
diag(Similarity1)=1
# here we verify that the entries lie between 0 and 1
summary(c(as.dist(Similarity1 )))
# true underlying module label
truemodule=rep(c("brown", "blue","turquoise","grey"),c(n1,n2,n3,n4))
# This is a color-coded picture of the similarity matrix
reverse1=rank(-c(1:nTotal))
par(mfrow=c(1,1),mar=c(0,0,0,0))
image(1-t(Similarity1[reverse1,]))
# The following shows the simulated module membership.
barplot(rep(1,nTotal),col=as.character(truemodule),space=0,border= as.character(truemodule),offset=0)
# SCALE FREE TOPOLOGY CHECK FOR SOFT AND HARD Thresholding
# no.breaks1=specifies the number of points in a scale free plot.
# Unfortunately R^2 depends on this to some extent --> opportunity for research
no.breaks1=9
powers1=seq(1,9,by=1)
TableSoft=data.frame(matrix(NA,nrow=length(powers1),ncol=4))
names(TableSoft)=c("powers", "ScaleFreeRsquared", "Slope", "TruncatedRsquared")
par(mfrow=c(3,3), mar= c(2, 2, 2, 2) )
for (i in c(1:length(powers1)) ){
TableSoft[i,1]=powers1[i]
ADJsoft=abs(Similarity1)^powers1[i];diag(ADJsoft)=0;ksoft=apply(ADJsoft,2,sum)
TableSoft[i, 2:4]=ScaleFreePlot1(ksoft, truncated=T,AF1= paste("beta=",as.character(powers1[i])), no.breaks=no.breaks1)
}
thresholds1= seq(from=.1,to=.9,by=0.1)
TableHard=data.frame(matrix(NA,nrow=length(thresholds1),ncol=4))
names(TableHard)=c("thresholds", "ScaleFreeRsquared", "Slope", "TruncatedRsquared")
par(mfrow=c(3,3), mar= c(2, 2, 2, 2) )
for (i in c(1:length(thresholds1)) ){
TableHard[i,1]=thresholds1[i]
ADJhard=abs(Similarity1)>thresholds1[i];diag(ADJhard)=0;khard=apply(ADJhard,2,sum)
TableHard[i, 2:4]=ScaleFreePlot1(khard, truncated=T,AF1= paste("tau=",as.character(thresholds1[i])) ,no.breaks=no.breaks1)
}
par(mfrow=c(1,1),mar= c(5, 4, 4, 2) +0.1)
plot(1:length(powers1),
-sign(TableHard[,3])*TableHard[,2], type="n",ylab="Scale Free Topology R^2",xlab="Adjacency Parameter Number", ylim=range(min(c(-sign(TableSoft[,3])*TableSoft[,1], -sign(TableHard[,3])*TableHard[,1]),na.rm=T),1) )
text(1:length(thresholds1), -sign(TableHard[,3])*TableHard[,2], labels= thresholds1, col="black")
points(1:length(powers1), -sign(TableSoft[,3])*TableSoft[,2], type="n")
text(1:length(powers1), -sign(TableSoft[,3])*TableSoft[,2], labels= as.character(powers1), col="red")
abline(h=.73)
# Based on where the "kink" occurs, the scale free topology
# criterion would lead us to choose the following parameters
# for the soft and hard adjacency function
# R^2>0.73
beta1=3 # this is the the power adjacency function parameter in power(s,beta)
tau1=.6 # this parameter is hard threshold parameter in the signum function.
# By playing around with these parameters, you will find that the
# findings are highly robust with respect to beta1, which is an attractive property.
# Creating Scale Free Topology Plots
par(mfrow=c(2,1))
ADJsoft=abs(Similarity1)^beta1;diag(ADJsoft)=0;ksoft=apply(ADJsoft,2,sum)
ScaleFreePlot1(ksoft, truncated=T,AF1= paste("beta=",as.character(beta1)), ,no.breaks=no.breaks1)
ADJhard=I(Similarity1>tau1)+0.0; diag(ADJhard)=0;khard=apply(ADJhard,2,sum)
ScaleFreePlot1(khard,truncated=T,AF1=paste("tau=",as.character(tau1)) ,no.breaks=no.breaks1 )
# Now we define the gene significance variable for the brown module
# We permute the GS measures in the other modules to make the example realistic
GS=c(GS1,sample(GS2)/2,sample(GS3)/2, sample(GS1,n4,replace=T)/2 )
# The following code assumes that we can identify the true underlying brown module.
# In reality this is not the case, which is why we revisit this issue below.
datconnectivitiesSoft=data.frame(matrix(NA,nrow=sum(truemodule=="brown"),ncol=length(powers1)))
names(datconnectivitiesSoft)=paste("kWithinPower",powers1,sep="")
for (i in c(1:length(powers1)) ) {
datconnectivitiesSoft[,i]=apply(abs(Similarity1[truemodule=="brown", truemodule=="brown"])^powers1[i],1,sum)}
SpearmanCorrelationsSoft=signif(cor(GS[ truemodule=="brown"], datconnectivitiesSoft, method="s",use="p"))
datconnectivitiesHard=data.frame(matrix(NA,nrow=sum(truemodule=="brown"),ncol=length(thresholds1)))
names(datconnectivitiesHard)=paste("kWithinPower",thresholds1,sep="")
for (i in c(1:length(thresholds1)) ) {
datconnectivitiesHard[,i]=apply(abs(Similarity1[truemodule=="brown", truemodule=="brown"])>thresholds1[i],1,sum)}
SpearmanCorrelationsHard=signif(cor(GS[ truemodule=="brown"], datconnectivitiesHard, method="s",use="p"))
par(mfrow=c(2,2))
plot(powers1, SpearmanCorrelationsSoft,type="n", main="Soft")
text(powers1, SpearmanCorrelationsSoft,labels=powers1,col="red")
abline(v=beta1,col="red")
plot(thresholds1, SpearmanCorrelationsHard,type="n",
main="Hard" ,ylim= range( c(SpearmanCorrelationsSoft, SpearmanCorrelationsHard),na.rm=T) )
text(thresholds1, SpearmanCorrelationsHard,labels=thresholds1,col="black")
abline(v=tau1,col="red")
plot(-sign(TableHard[,3])*TableHard[,2], SpearmanCorrelationsHard,type="n",ylim=range(c(SpearmanCorrelationsHard, SpearmanCorrelationsSoft),na.rm=T),
xlim=range(c(-sign(TableHard[,3])*TableHard[,2], -sign(TableSoft[,3])*TableSoft[,2]),na.rm=T))
text(-sign(TableHard[,3])*TableHard[,2] , SpearmanCorrelationsHard,label=as.character(thresholds1))
points(-sign(TableSoft[,3])*TableSoft[,2]
, SpearmanCorrelationsSoft, pch=as.character(powers1),col="red")
Message: in this example, soft thresholding leads to a signal that is very robust to the choice of the AF parameter. In contrast, the hard thresholding can lead to very different findings.
## TOM PLOT
# The following code computes the topological overlap matrix based on
# adjacency matrix.
dissGTOM1soft=TOMdist1(ADJsoft)
collect_garbage()
dissGTOM1hard=TOMdist1(ADJhard)
collect_garbage()
# resulting clustering tree will be used to define gene modules.
hierGTOM1soft <- hclust(as.dist(dissGTOM1soft),method="average");
hierGTOM1hard <- hclust(as.dist(dissGTOM1hard),method="average");
par(mfrow=c(1,2))
plot(hierGTOM1soft,labels=F,sub="",xlab="")
plot(hierGTOM1hard,labels=F,sub="",xlab="")
# here we chose the height cut-off such that many "true" brown module genes are #included in the designated brown module.
h1soft= .94
h1hard= .91
colorh1soft=as.character(modulecolor2(hierGTOM1soft,h1= h1soft, minsize1=20))
#colorh1soft=ifelse(colorh1soft=="grey","brown",colorh1soft)
table(colorh1soft)
>table(colorh1soft)
colorh1soft
blue brown grey turquoise
206 89 435 370
colorh1hard=as.character(modulecolor2(hierGTOM1hard,h1= h1hard,minsize1=20))
#colorh1hard=ifelse(colorh1hard=="grey","brown",colorh1hard)
table(colorh1hard)
>table(colorh1hard)
colorh1hard
blue brown grey turquoise
218 87 422 373
par(mfrow=c(3,2))
plot(hierGTOM1soft,labels=F,sub="",xlab="")
plot(hierGTOM1hard,labels=F,sub="",xlab="")
hclustplot1(hierGTOM1soft,truemodule,title1="True module colors")
hclustplot1(hierGTOM1hard,truemodule,title1="True module colors")
hclustplot1(hierGTOM1soft, colorh1soft,title1="Identified module colors")
hclustplot1(hierGTOM1hard, colorh1hard,title1="Identified module colors")
table(colorh1soft,truemodule)
table(colorh1hard,truemodule)
> table(colorh1soft,truemodule)
truemodule
colorh1soft blue brown grey turquoise
blue 200 0 6 0
brown 0 89 0 0
grey 0 11 424 0
turquoise 0 0 70 300
> table(colorh1hard,truemodule)
truemodule
colorh1hard blue brown grey turquoise
blue 189 0 29 0
brown 0 80 6 1
grey 11 20 384 7
turquoise 0 0 81 292
Message: soft thresholding leads to far better cluster retrieval.
par(mfrow=c(1,1))
# if you have patience and enough stack memory, try to run this code
#TOMplot1(dissGTOM1soft , hierGTOM1soft, colorh1soft)
#TOMplot1(dissGTOM1hard , hierGTOM1hard, colorh1hard)
# Instead, I will restrict the TOM plot to the most connected genes.
restrict1= ksoft>40
table(restrict1)
>table(restrict1)
restrict1
FALSE TRUE
563 537
dissGTOM1softrest=TOMdist1(ADJsoft[restrict1,restrict1])
collect_garbage()
dissGTOM1hardrest=TOMdist1(ADJhard[restrict1,restrict1])
hierGTOM1softrest <- hclust(as.dist(dissGTOM1softrest),method="average");
hierGTOM1hardrest <- hclust(as.dist(dissGTOM1hardrest),method="average");
# SOFT TOM PLOT
par(mfrow=c(1,1))
TOMplot1(dissGTOM1softrest , hierGTOM1softrest, truemodule[restrict1])
# HARD TOM PLOT
TOMplot1(dissGTOM1hardrest , hierGTOM1hardrest, colorh1hard[restrict1])
# We also propose to use classical multi-dimensional scaling plots
# for visualizing the network. Here we chose 2 scaling dimensions
cmd1=cmdscale(as.dist(dissGTOM1soft),2)
cmd2=cmdscale(as.dist(dissGTOM1hard),2)
par(mfrow=c(1,2))
plot(cmd1, col=as.character(colorh1soft), main="Soft MDS plot",xlab="Scaling Axis 1", ylab="Scaling Axis 2")
plot(cmd2, col=as.character(colorh1hard), main="Hard MDS plot",xlab="Scaling Axis 1", ylab="Scaling Axis 2")
These MDS plots also look fairly realistic, compare with Zhang and Horvath (2005).
# Just to be sure let’s re-define the adjacency matrix and connectivities used.
ADJhard=I(Similarity1>tau1)+0.0; diag(ADJhard)=0;khard=apply(ADJhard,2,sum)
ADJsoft=abs(Similarity1)^beta1;diag(ADJsoft)=0;ksoft=apply(ADJsoft,2,sum)
ADJhardbrown=I(abs(Similarity1[colorh1hard=="brown", colorh1hard=="brown"])
>tau1)+0.0; diag(ADJhardbrown)=0;khardbrown=apply(ADJhardbrown,2,sum)
ADJsoftbrown=abs(Similarity1[colorh1soft=="brown", colorh1soft=="brown"])^beta1;diag(ADJsoftbrown)=0;ksoftbrown=apply(ADJsoftbrown,2,sum)
# This computes the intramodular connectivities
AllDegreesSoft=DegreeInOut(ADJsoft,colorh1soft)
AllDegreesHard=DegreeInOut(ADJhard,colorh1hard)
# Here we compute the soft and the hard cluster coefficients
cluster.coefSoft= ClusterCoef.fun(ADJsoft)
cluster.coefHard= ClusterCoef.fun(ADJhard)
# Now we plot cluster coefficient versus connectivity
# for all genes. We use the true module coloring
par(mfrow=c(2,1))
plot(ksoft, cluster.coefSoft,col=truemodule,ylab="Cluster Coefficient",xlab="Connectivity",main="Soft")
#whichmodule="turquoise"
#plot(AllDegreesSoftkWithin[truemodule==whichmodule],
#cluster.coefSoft[truemodule==whichmodule],col=whichmodule,ylab="Cluster #Coefficient",xlab="Connectivity",main=whichmodule)
plot(khard, cluster.coefHard,col=truemodule,ylab="Cluster Coefficient",xlab="Connectivity",main="Hard")
Comment: for the high connectivity genes in each module CC is inversely related to K in unweighted networks (hard thresholding). In contrast, there is a slightly positive (or constant) relationship among hub genes in weighted networks.
#Gene significance analysis
# Now we study how the gene significance is related to connectivity in the brown module
datconnectivitiesSoft=data.frame(matrix(666,nrow=sum(colorh1soft=="brown"),ncol=length(powers1)))
names(datconnectivitiesSoft)=paste("kWithinPower",powers1,sep="")
for (i in c(1:length(powers1)) ) {
datconnectivitiesSoft[,i]=apply(abs(Similarity1[colorh1soft=="brown", colorh1soft=="brown"])^powers1[i],1,sum)}
SpearmanCorrelationsSoft=signif(cor(GS[ colorh1soft=="brown"], datconnectivitiesSoft, method="s",use="p"))
datconnectivitiesHard=data.frame(matrix(666,nrow=sum(colorh1hard=="brown"),ncol=length(thresholds1)))
names(datconnectivitiesHard)=paste("kWithinPower",thresholds1,sep="")
for (i in c(1:length(thresholds1)) ) {
datconnectivitiesHard[,i]=apply(abs(Similarity1[colorh1hard=="brown", colorh1hard=="brown"])>thresholds1[i],1,sum)}
SpearmanCorrelationsHard=signif(cor(GS[ colorh1hard=="brown"], datconnectivitiesHard, method="s",use="p"))
par(mfrow=c(2,2))
plot(powers1, SpearmanCorrelationsSoft,type="n" ,xlab="Power (beta)" , main="Soft")
text(powers1, SpearmanCorrelationsSoft,labels=powers1,col="red")
abline(v=beta1,col="red")
plot(thresholds1, SpearmanCorrelationsHard,type="n" ,ylim= range( c(SpearmanCorrelationsHard),na.rm=T) ,xlab="Thresholds (tau)" ,main="Hard")
text(thresholds1, SpearmanCorrelationsHard,labels=thresholds1,col="black")
abline(v=tau1,col="red")
plot(-sign(TableHard[,3])*TableHard[,2], SpearmanCorrelationsHard,type="n" ,
ylim=range(c(SpearmanCorrelationsHard, SpearmanCorrelationsSoft),na.rm=T), xlim=range(c(-sign(TableHard[,3])*TableHard[,2], -sign(TableSoft[,3])*TableSoft[,2]),na.rm=T),xlab="Signed Scale Free R^2",ylab="Correlation(k, Gene Significance)" )
text(-sign(TableHard[,3])*TableHard[,2], SpearmanCorrelationsHard,label=as.character(thresholds1))
points(-sign(TableSoft[,3])*TableSoft[,2], SpearmanCorrelationsSoft, type="n")
text(-sign(TableSoft[,3])*TableSoft[,2], SpearmanCorrelationsSoft,labels=as.character(powers1),col="red")
Again, we find that hard thresholding leads to results that are not robust.
Further the SFT criterion works quite well. No surprise here: this is how the simulated example was constructed.
# Now we want to see how the correlation between kWithin and gene significance
# changes for different hard threshold values tau within the BROWN module.
whichmodule="brown"
# the following data frame contains the intramodular connectivities
# corresponding to different hard thresholds
datconnectivitiesHard=data.frame(matrix(666,nrow=sum(colorh1hard==whichmodule),ncol=length(thresholds1)))
names(datconnectivitiesHard)=paste("kWithinTau",thresholds1,sep="")
for (i in c(1:length(thresholds1)) ) {
datconnectivitiesHard[,i]=apply(abs(Similarity1[colorh1hard==whichmodule, colorh1hard==whichmodule])>=thresholds1[i],1,sum)}
SpearmanCorrelationsHard=signif(cor(GS[ colorh1hard==whichmodule], datconnectivitiesHard, method="s",use="p"))
# Now we define the new connectivity measure omega based on the TOM matrix
# It simply considers TOM as adjacency matrix...
datOmegaINHard=data.frame(matrix(666,nrow=sum(colorh1hard==whichmodule),ncol=length(thresholds1)))
names(datOmegaINHard)=paste("omegaWithinHard",thresholds1,sep="")
for (i in c(1:length(thresholds1)) ) {
datconnectivitiesHard[,i]=apply(
1-TOMdist1(abs(Similarity1[colorh1hard==whichmodule, colorh1hard==whichmodule])>thresholds1[i]),1,sum)}
SpearmanCorrelationsOmegaHard=as.vector(signif(cor(GS[ colorh1hard==whichmodule], datconnectivitiesHard, method="s",use="p")))
# Now we define the intramodular cluster coefficient
datCCinHard=data.frame(matrix(666,nrow=sum(colorh1hard==whichmodule),ncol=length(thresholds1)))
names(datCCinHard)=paste("CCinHard",thresholds1,sep="")
for (i in c(1:length(thresholds1)) ) {
datCCinHard[,i]= ClusterCoef.fun(abs(Similarity1[colorh1hard==whichmodule, colorh1hard==whichmodule])>thresholds1[i])}
SpearmanCorrelationsCCinHard=as.vector(signif(cor(GS[ colorh1hard==whichmodule], datCCinHard, method="s",use="p")))
# Now we compare the performance of the connectivity measures (k.in,
# omega.in, cluster coefficience) across different hard thresholds when it comes to predicting
# prognostic genes in the brown module
dathelpHard=data.frame(signedRsquared=-sign(TableHard[,3])*TableHard[,2], corGSkINHard =as.vector(SpearmanCorrelationsHard), corGSwINHard= as.vector(SpearmanCorrelationsOmegaHard),corGSCCHard=as.vector(SpearmanCorrelationsCCinHard))
# Now we focus on soft thresholding
# the following data frame contains the intramodular connectivities
# corresponding to different soft powers
datconnectivitiesSoft=data.frame(matrix(666,nrow=sum(colorh1soft==whichmodule),ncol=length(powers1)))
names(datconnectivitiesSoft)=paste("kWithinTau",powers1,sep="")
for (i in c(1:length(powers1)) ) {
datconnectivitiesSoft[,i]=apply(abs(Similarity1[colorh1soft==whichmodule, colorh1soft==whichmodule])^powers1[i],1,sum)}
SpearmanCorrelationsSoft=signif(cor(GS[ colorh1soft==whichmodule], datconnectivitiesSoft, method="s",use="p"))
# Now we define the new connectivity measure omega based on the TOM #matrix
# It simply considers TOM as adjacency matrix...
datOmegaINSoft=data.frame(matrix(666,nrow=sum(colorh1soft==whichmodule),ncol=length(powers1)))
names(datOmegaINSoft)=paste("omegaWithinSoft",powers1,sep="")
for (i in c(1:length(powers1)) ) {
datconnectivitiesSoft[,i]=apply(
1-TOMdist1(abs(Similarity1[colorh1soft==whichmodule, colorh1soft==whichmodule])^powers1[i]),1,sum)}
SpearmanCorrelationsOmegaSoft=as.vector(signif(cor(GS[ colorh1soft==whichmodule], datconnectivitiesSoft, method="s",use="p")))
# Now we define the intramodular cluster coefficient
datCCinSoft=data.frame(matrix(666,nrow=sum(colorh1soft==whichmodule),ncol=length(powers1)))
names(datCCinSoft)=paste("CCinSoft",powers1,sep="")
for (i in c(1:length(powers1)) ) {
datCCinSoft[,i]= ClusterCoef.fun(abs(Similarity1[colorh1soft==whichmodule, colorh1soft==whichmodule])^powers1[i])}
SpearmanCorrelationsCCinSoft=as.vector(signif(cor(GS[ colorh1soft==whichmodule], datCCinSoft, method="s",use="p")))
# Now we compare the performance of the connectivity measures (k.in,
# omega.in, cluster coefficience) across different soft powers when it comes to predicting
# prognostic genes in the brown module
dathelpSoft=data.frame(signedRsquared=-sign(TableSoft[,3])*TableSoft[,2], corGSkINSoft =as.vector(SpearmanCorrelationsSoft), corGSwINSoft= as.vector(SpearmanCorrelationsOmegaSoft),corGSCCSoft=as.vector(SpearmanCorrelationsCCinSoft))
par(mfrow=c(1,1))
matplot(thresholds1,dathelpHard,type="l",lty=1,lwd=3,col=c("black","red","blue","green"),ylab="",xlab="tau",main="Hard")
legend(0.65,0.2, c("signed R^2","r(GS,k.in)","r(GS,omega.in)","r(GS,cc.in)"), col=c("black","red","blue","green"), lty=1,lwd=3,ncol = 1, cex=1)
abline(v=tau1,col="red")
Comment: in unweighted networks, the standard connectivity works as well as the TOM based connectivity omega.in.
But note that the correlations are highly dependent on the threshold tau, which means the findings are not robust.
matplot(powers1,dathelpSoft,type="l",lty=1,lwd=3,col=c("black","red","blue","green"),ylab="",xlab="beta",main="Soft")
legend(6.8,0.65, c("signed R^2","r(GS,k.in)","r(GS,omega.in)","r(GS,cc.in)"), col=c("black","red","blue","green"), lty=1,lwd=3,ncol = 1, cex=1)
abline(v=beta1,col="red")
We find that the weighted network is much more robust (high correlations for most choices of beta). Note that omega.in works very well in weighted networks. The cluster coefficient cc.in works well for low values of beta. Also see Figure 16 in Zhang and Horvath 2005. This suggests that in weighted networks the cluster coefficient may serve as yet another node centrality (connectivity) measure.
THE END
14