Leading edge subset from Gene Set Enrichment Analysis (GSEA) using blood transcription modules

Recently, I have been taken up very much by the modular analysis developed by Damien Chaussabel and colleagues (see original paper or their recent review paper) for analysis of whole blood data.

A recent R package called tmod from Dr. January Weiner provides a very easy and elegant implementation of this concept and data. The vignette is well written and the author is very responsive to requests and suggestions. Currently, the evidence plot is based on the receiver operating curve. The next release will contain an option to produce an evidence plot that is the equivalent to enrichment score plot of the popular GSEA software from the Broad Institute.

I tweaked this function a bit to allow for different weighting of the genes in the module and to allow it to return the indices of the leading edge subset (i.e. core enrichment genes) which can be useful to generate heatmaps etc. tmod’s evidencePlot has the advantage of not requiring the logFC as an additional parameter.

getLeadingEdgeSubset <- function(symbol, lfc, mod, alpha=1, plot=T){

  stopifnot( length(symbol) == length(lfc) )

  genes <- getGenes(mod, mset="all")$Genes
  genes <- unique(unlist(strsplit(genes, split=",")))
  inMod <- sapply(symbol, function(x) any(x %in% genes))  ## "symbol" can be a list to allow multi-mapping probes

  N     <- length(symbol)
  N.hit <- sum(inMod)

  runningES <- NA
  runningES[  inMod ] <-  abs( lfc[ inMod ]^alpha )/sum(abs(lfc[ inMod ]^alpha))
  runningES[ !inMod ] <-  -1/(  N - N.hit )
  runningES <- cumsum(runningES)

  rank.at.max <- which.max(runningES)
  leadingEdge <- which(inMod[ 1:rank.at.max ])

 if(plot){

    data(tmod)
    title.main <- paste0(tmod$MODULES[ mod, "Title" ], " (", mod, ")", collapse="; ")

    plot(1:N, runningES, type="l",
         ylab="Enrichment Score (ES)", xlab="Ordered rank of genes", main=title.main)

    mtext(side=3, at=which(inMod), text="|", line=-1, cex=0.5)

    arrows(rank.at.max, 0, rank.at.max, max(runningES), length=0, col="blue")

    mtext(side=3, text=paste0("# genes in module:", length(genes), "                                       ",
                              "# genes found:", length(unique(unlist(symbol[inMod]))), " (", N.hit, " probes)\n"), line=-1)

    legend("bottom", bty="n",
           legend=paste0("Rank at max:", rank.at.max, "\n# probes in leading edge subset:", length(leadingEdge)))
  }

  return(leadingEdge)
}

Here is a simple example taken from the tmod vignette:

library(limma)
library(tmod)

data(Egambia)
design <- cbind(Intercept=rep(1, 30), TB=rep(c(0,1), each= 15))
E <- as.matrix(Egambia[,-c(1:3)])
fit <- eBayes( lmFit(E, design))
tt <- topTable(fit, coef=2, number=Inf, genelist=Egambia[,1:3] )
rm(design, E, fit)

evidencePlot(tt$GENE_SYMBOL, "LI.M37.0")  ## from tmod version 0.24 onwards

Rplot

core <- getLeadingEdgeSubset(tt$GENE_SYMBOL, tt$logFC, "LI.M37.0", alpha=1)

Rplot04

 

I set alpha=1 as default to match Broad Institute’s default value. Setting alpha=0 will produce exactly the same graph as evidencePlot from tmod. Setting the alpha=2 will overweight the genes in the module and sometimes useful for finding smaller set of core genes.

You can then take on these indices (“w”), pull out the genes and visualize them as such

top <- Egambia[ core, -c(1:3)]
rownames(top) <- paste0( tt$GENE_SYMBOL[ core ], " (", core, ")")
status <- data.frame(status=ifelse( grepl("TB", colnames(top)), "TB", "Healthy"))

library(NMF)
library(gplots)

aheatmap(top, annCol=status,
         color=bluered(10), Rowv=NA, Colv=NA, scale="row",
         main="leading edge subset from module LI.M37.0")

 

Rplot05


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s