Find genes correlated to a reference gene

Like most biologists, you will have a favorite gene, and you want to find all genes whose expression values correlate with those of your pet gene. You might think that clustering all expression data and then looking up your gene in a red-green colored diagram is the best way to do this. We think it is not. In spite of the pitfalls already mentioned, we believe that a scoring approach is the safer way. Clustering is a minefield, and you only need to enter it if you do not want to focus on a pet gene but aim for a global view of the correlation structure in the data.

Consider the Golub et al. (11) example above. How can you identify genes with a high correlation to gene MGST1? Set the expression values of gene MGST1 as the reference vector and apply to each gene a correlation score based either on numerical values (Pearson) or on ranks (Spearman). With reference vector refvec, type:

score7 <- twilight.pval(x, refvec, method="pearson")  # or method="spearman"
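
To see what a correlation score against a reference vector measures, independent of any permutation testing, the following is a minimal base R sketch. It runs on simulated toy data, not on the Golub et al. data. The row name "MGST1", the gene names gene1 to gene4, and the objects pearson and spearman are assumptions made for illustration only. In a real analysis, refvec would be the row of your expression matrix that holds the pet gene.

```r
# Toy expression matrix: genes in rows, samples in columns (simulated values).
set.seed(1)
x <- matrix(rnorm(5 * 10), nrow = 5,
            dimnames = list(c("MGST1", paste0("gene", 1:4)), NULL))

# Make the hypothetical gene1 follow MGST1 closely, so one true partner exists.
x["gene1", ] <- x["MGST1", ] + rnorm(10, sd = 0.1)

# Reference vector: the expression values of the pet gene.
refvec <- x["MGST1", ]

# Correlate every gene (row) with the reference vector.
pearson  <- apply(x, 1, cor, y = refvec, method = "pearson")   # numerical values
spearman <- apply(x, 1, cor, y = refvec, method = "spearman")  # ranks

# Genes ordered by decreasing Pearson correlation with the reference gene.
sort(pearson, decreasing = TRUE)
```

The reference gene itself always scores 1, so it heads the sorted list. The Spearman variant uses only the ranks of the values and is therefore less sensitive to single outlying measurements.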

