Regression approaches including B-splines

To date, these methods have been used mainly for unreplicated time course data under a single biological condition. Zhao et al. (40) outlined a regression model to search for genes with a transcriptional response to a stimulus. Their regression function related each gene's profile to a vector of covariates that included dummy variables for the stimulus categories, time, and other characteristics of the sample. The model contained gene-specific parameters as well as parameters capturing heterogeneity across arrays, and the mean vector was estimated using the technique described in Liang and Zeger (41). Zhao et al. further focused on the single-pulse model (SPM), which is designed for the setting in which cells are released from cell cycle arrest. Xu et al. (42) described an application of the same kind of regression model to a time course study of Huntington's disease.
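
The covariate structure described above (dummy variables for stimulus categories plus time, with gene-specific coefficients) can be illustrated with a minimal sketch. It assumes a shared design for all genes and swaps in ordinary least squares for the Liang and Zeger estimation used by Zhao et al.; it also omits their array-heterogeneity parameters and the SPM. The helper names `design_matrix` and `fit_all_genes` are hypothetical.

```python
import numpy as np


def design_matrix(stimulus, time):
    """Intercept, one dummy per non-reference stimulus category, and time.

    The lowest stimulus label is treated as the reference category.
    """
    stimulus = np.asarray(stimulus)
    time = np.asarray(time, dtype=float)
    levels = np.unique(stimulus)
    dummies = [(stimulus == lev).astype(float) for lev in levels[1:]]
    return np.column_stack([np.ones_like(time), *dummies, time])


def fit_all_genes(Y, stimulus, time):
    """Ordinary least-squares fit of every gene on a shared design matrix.

    Assumption: arrays are independent with constant variance (unlike the
    original model). Y has shape (genes, arrays). Returns per-gene
    coefficients and t-statistics, each of shape (genes, n_covariates).
    """
    Y = np.asarray(Y, dtype=float)
    X = design_matrix(stimulus, time)
    n, p = X.shape
    # Solve for all genes at once: each column of Y.T is one gene's profile.
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # (p, genes)
    resid = Y.T - X @ beta                           # (arrays, genes)
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)      # residual variance per gene
    xtx_inv_diag = np.diag(np.linalg.inv(X.T @ X))   # (p,)
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))     # (p, genes)
    return beta.T, (beta / se).T
```

Genes could then be ranked by the absolute t-statistic of a stimulus dummy. Because the sketch treats arrays as independent and homoscedastic, it is only a rough stand-in for the model of Zhao et al.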

Several researchers have suggested using B-splines to model gene profiles. In Bar-Joseph et al. (48), the expression profile of each gene under each of two biological conditions was represented by a continuous curve fitted with B-splines. For each gene, a global difference between the two curves and an ad hoc likelihood-based p-value were then calculated. Luan and Li (44) and Hong and Li (45) also used B-splines to model profiles. Luan and Li (44) adopted the shape-invariant model (46, 47) for guide genes and modeled the common periodic function shared by all periodically expressed genes with a B-spline basis; periodically expressed genes were then identified using a false discovery rate (FDR) procedure. The approaches of Bar-Joseph et al. (48) and Luan and Li (44) were both illustrated on yeast cell cycle datasets, which contain relatively many time points. They do not seem suitable for short time courses, where there are too few time points to estimate a flexible spline fit reliably.

Similarly, Hong and Li (45) proposed a B-spline-based approach to identify differentially expressed genes in the two-sample case. They modeled the expected profile as a linear combination of B-spline basis functions and used a Markov chain Monte Carlo EM (MCEM) algorithm to estimate the gene-specific parameters and the hyperparameters of their hierarchical model. Differentially expressed genes were selected using empirical Bayes log posterior odds and a posterior-probability-based FDR. They showed that their method performed better than the traditional ANOVA model. As with the methods above, the approach of Hong and Li (45) seems better suited to longer time course data.
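
The idea of representing each condition's profile as a linear combination of B-spline basis functions, and then comparing the two fitted curves through a single global difference, can be sketched as follows. This is a simplified stand-in, not the method of Bar-Joseph et al. (48) or Hong and Li (45): it assumes a clamped cubic basis with user-chosen interior knots, ordinary least-squares coefficients, and an integrated squared difference as the global distance, and it computes no p-value. The names `bspline_basis`, `fit_curve`, and `global_difference` are hypothetical.

```python
import numpy as np


def bspline_basis(x, interior_knots, degree=3, lo=None, hi=None):
    """Evaluate a clamped B-spline basis at points x (Cox-de Boor recursion).

    Returns a matrix of shape (len(x), n_basis), where
    n_basis = len(interior_knots) + degree + 1.
    """
    x = np.asarray(x, dtype=float)
    lo = float(x.min()) if lo is None else float(lo)
    hi = float(x.max()) if hi is None else float(hi)
    # Clamped knot vector: each boundary knot repeated degree + 1 times.
    t = np.concatenate([[lo] * (degree + 1),
                        np.asarray(interior_knots, dtype=float),
                        [hi] * (degree + 1)])
    # Degree-0 functions: indicators of the half-open intervals [t_i, t_{i+1}).
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Assign x == hi to the last non-empty interval so the basis is defined there.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == hi, last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are 0.
    for k in range(1, degree + 1):
        B_next = np.zeros((len(x), len(t) - 1 - k))
        for i in range(len(t) - 1 - k):
            d1 = t[i + k] - t[i]
            d2 = t[i + k + 1] - t[i + 1]
            left = (x - t[i]) / d1 * B[:, i] if d1 > 0 else 0.0
            right = (t[i + k + 1] - x) / d2 * B[:, i + 1] if d2 > 0 else 0.0
            B_next[:, i] = left + right
        B = B_next
    return B


def fit_curve(times, values, interior_knots, degree, lo, hi):
    """Least-squares B-spline coefficients for one gene under one condition."""
    B = bspline_basis(times, interior_knots, degree, lo, hi)
    coef, *_ = np.linalg.lstsq(B, np.asarray(values, dtype=float), rcond=None)
    return coef


def global_difference(times_a, y_a, times_b, y_b, interior_knots,
                      degree=3, n_grid=200):
    """Integrated squared difference between the two fitted spline curves."""
    lo = min(np.min(times_a), np.min(times_b))
    hi = max(np.max(times_a), np.max(times_b))
    coef_a = fit_curve(times_a, y_a, interior_knots, degree, lo, hi)
    coef_b = fit_curve(times_b, y_b, interior_knots, degree, lo, hi)
    grid = np.linspace(lo, hi, n_grid)
    basis = bspline_basis(grid, interior_knots, degree, lo, hi)
    sq = (basis @ (coef_a - coef_b)) ** 2
    # Trapezoid rule over the evaluation grid.
    return float(np.sum((sq[1:] + sq[:-1]) / 2.0 * np.diff(grid)))
```

With a cubic basis and one interior knot there are five coefficients per curve, so each condition needs at least five suitably spread time points before `np.linalg.lstsq` has a unique solution. This makes concrete why such spline fits suit long series better than short ones. For real analyses, SciPy's `scipy.interpolate.BSpline` offers a tested basis implementation.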
