Alternative: Score by separation

The expression values of gene A in Figure 18.2 overlap substantially between the two conditions, whereas for gene B the overlap is smaller. Can we rank genes by how well the two conditions can be separated from each other, tolerating only a small amount of overlap? Pepe et al. (10) use this concept of separation and suggest pAUC-scores. A high pAUC-score indicates that the expression values in one condition are well, albeit not necessarily perfectly, separated from the values in the other condition. For small pAUC-values there is essentially no separation. Note that separability is a different concept from a difference in averages.
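For orientation, the quantity behind the score can be written down compactly. This is a sketch in assumed notation, not a quotation from Pepe et al. (10): ROC(t) stands for the true-positive rate reached at false-positive rate t, and t_0 is the largest false-positive rate one is willing to tolerate.

```latex
\[
  \mathrm{pAUC}(t_0) \;=\; \int_{0}^{t_0} \mathrm{ROC}(t)\,\mathrm{d}t ,
  \qquad 0 \le \mathrm{pAUC}(t_0) \le t_0 .
\]
```

Since ROC(t) never exceeds 1, the score is bounded by t_0. The bound is reached when the true-positive rate is already 1 at a false-positive rate of 0, that is, when the two conditions are perfectly separated.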

Currently, the pAUC-score is not implemented in R. You might want to read more about pAUC-values and their interpretation in Pepe et al. (10). To do the computations described in that publication, you can use the following R function. Its argument x is a vector of false-positive rates, A holds the expression values of one gene, and B is the binary vector of class labels. The function works for up-regulation only. To explore down-regulation, reverse the class labels in your binary vector y to 1-y.

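    pauc <- function(x,A,B){
      u <- sort(unique(A),decreasing=TRUE)   # candidate thresholds, largest first
      t <- numeric(length(u))                # false-positive rate per threshold
      r <- numeric(length(u))                # true-positive rate per threshold
      for (i in seq_along(u)){
        t[i] <- sum(A[B==0]>u[i])/sum(1-B)
        r[i] <- sum(A[B==1]>u[i])/sum(B)
      }
      roc <- numeric(length(x))
      for (i in seq_along(x)){
        z <- which(t<=x[i])                  # thresholds within the allowed false-positive rate
        z <- z[length(z)]                    # keep the last (least strict) one
        roc[i] <- r[z]
      }
      roc                                    # ROC curve evaluated at x
    }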


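You need to choose a maximal false-positive rate, say 10%, which becomes the upper integration limit (0.1 in the code). Finally, calculate pAUC-scores by applying the function integral to each row of the expression matrix X, that is, to each gene: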

    integral <- function(a,b){integrate(pauc,0,0.1,A=a,B=b)$value}
    score6.up <- apply(X,1,integral,y)      # up-regulation
    score6.down <- apply(X,1,integral,1-y)  # down-regulation
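To see what these scores look like, here is a minimal sketch on made-up data. The matrix X, the label vector y and the gene names geneUp and geneDown are hypothetical and chosen only so that the result can be checked by hand. The functions pauc and integral are restated from the text so that the example runs on its own.

```r
# pauc and integral as described in the text, restated for a self-contained run.
pauc <- function(x, A, B) {
  u <- sort(unique(A), decreasing = TRUE)   # candidate thresholds
  t <- numeric(length(u))                   # false-positive rates
  r <- numeric(length(u))                   # true-positive rates
  for (i in seq_along(u)) {
    t[i] <- sum(A[B == 0] > u[i]) / sum(1 - B)
    r[i] <- sum(A[B == 1] > u[i]) / sum(B)
  }
  roc <- numeric(length(x))
  for (i in seq_along(x)) {
    z <- which(t <= x[i])
    roc[i] <- r[z[length(z)]]
  }
  roc
}
integral <- function(a, b) integrate(pauc, 0, 0.1, A = a, B = b)$value

# Hypothetical toy data: 2 genes (rows), 6 samples (columns),
# the first three samples in condition 0 and the last three in condition 1.
y <- c(0, 0, 0, 1, 1, 1)
X <- rbind(geneUp   = c(1, 2, 3, 4, 5, 6),   # perfectly separated, higher in condition 1
           geneDown = c(6, 5, 4, 3, 2, 1))   # perfectly separated, lower in condition 1

score6.up   <- apply(X, 1, integral, y)      # geneUp: 0.1, geneDown: 0
score6.down <- apply(X, 1, integral, 1 - y)  # geneUp: 0,   geneDown: 0.1
```

With an upper limit of 0.1 the largest attainable score is 0.1, which both toy genes reach in their respective direction. Real expression data will usually fall somewhere between 0 and 0.1.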

For the combination of up- and down-regulation, compute the two variants and take the maximum of the two scores for each gene.
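Taking the maximum per gene can be sketched with pmax. The score values and the name score6 for the combined score are hypothetical and serve only as an illustration.

```r
# Hypothetical pAUC-scores for three genes (illustrative values only).
score6.up   <- c(gene1 = 0.02, gene2 = 0.00, gene3 = 0.10)
score6.down <- c(gene1 = 0.01, gene2 = 0.09, gene3 = 0.00)

# Element-wise maximum: one combined score per gene.
score6 <- pmax(score6.up, score6.down)

# Order the genes from best to worst separation.
ranking <- order(score6, decreasing = TRUE)
names(score6)[ranking]   # "gene3" "gene2" "gene1"
```

pmax works element by element, unlike max, which would collapse both vectors into a single number.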
