Recently, I have been taken up very much by the modular analysis developed by Damien Chaussabel and colleagues (see original paper or their recent review paper) for analysis of whole blood data.
A recent R package called tmod from Dr. January Weiner provides a very easy and elegant implementation of this concept and data. The vignette is well written and the author is very responsive to requests and suggestions. Currently, the evidence plot is based on the receiver operating curve. The next release will contain an option to produce an evidence plot that is the equivalent to enrichment score plot of the popular GSEA software from the Broad Institute.
I tweaked this function a bit to allow for different weighting of the genes in the module and to allow it to return the indices of the leading edge subset (i.e. core enrichment genes) which can be useful to generate heatmaps etc. tmod’s evidencePlot has the advantage of not requiring the logFC as an additional parameter.
getLeadingEdgeSubset <- function(symbol, lfc, mod, alpha=1, plot=T){ stopifnot( length(symbol) == length(lfc) ) genes <- getGenes(mod, mset="all")$Genes genes <- unique(unlist(strsplit(genes, split=","))) inMod <- sapply(symbol, function(x) any(x %in% genes)) ## "symbol" can be a list to allow multi-mapping probes N <- length(symbol) N.hit <- sum(inMod) runningES <- NA runningES[ inMod ] <- abs( lfc[ inMod ]^alpha )/sum(abs(lfc[ inMod ]^alpha)) runningES[ !inMod ] <- -1/( N - N.hit ) runningES <- cumsum(runningES) rank.at.max <- which.max(runningES) leadingEdge <- which(inMod[ 1:rank.at.max ]) if(plot){ data(tmod) title.main <- paste0(tmod$MODULES[ mod, "Title" ], " (", mod, ")", collapse="; ") plot(1:N, runningES, type="l", ylab="Enrichment Score (ES)", xlab="Ordered rank of genes", main=title.main) mtext(side=3, at=which(inMod), text="|", line=-1, cex=0.5) arrows(rank.at.max, 0, rank.at.max, max(runningES), length=0, col="blue") mtext(side=3, text=paste0("# genes in module:", length(genes), " ", "# genes found:", length(unique(unlist(symbol[inMod]))), " (", N.hit, " probes)\n"), line=-1) legend("bottom", bty="n", legend=paste0("Rank at max:", rank.at.max, "\n# probes in leading edge subset:", length(leadingEdge))) } return(leadingEdge) }
Here is a simple example taken from the tmod vignette:
library(limma) library(tmod) data(Egambia) design <- cbind(Intercept=rep(1, 30), TB=rep(c(0,1), each= 15)) E <- as.matrix(Egambia[,-c(1:3)]) fit <- eBayes( lmFit(E, design)) tt <- topTable(fit, coef=2, number=Inf, genelist=Egambia[,1:3] ) rm(design, E, fit) evidencePlot(tt$GENE_SYMBOL, "LI.M37.0") ## from tmod version 0.24 onwards
core <- getLeadingEdgeSubset(tt$GENE_SYMBOL, tt$logFC, "LI.M37.0", alpha=1)
I set alpha=1 as default to match Broad Institute’s default value. Setting alpha=0 will produce exactly the same graph as evidencePlot from tmod. Setting the alpha=2 will overweight the genes in the module and sometimes useful for finding smaller set of core genes.
You can then take on these indices (“w”), pull out the genes and visualize them as such
top <- Egambia[ core, -c(1:3)] rownames(top) <- paste0( tt$GENE_SYMBOL[ core ], " (", core, ")") status <- data.frame(status=ifelse( grepl("TB", colnames(top)), "TB", "Healthy")) library(NMF) library(gplots) aheatmap(top, annCol=status, color=bluered(10), Rowv=NA, Colv=NA, scale="row", main="leading edge subset from module LI.M37.0")