This chapter discusses the statistical analysis of microarray time course data, with a focus on developmental time course experiments. The methods reviewed here are generally suitable for experiments based on the most widely used kinds of microarray platforms, including single-color, fluorescently labeled, high-density short oligonucleotide arrays on silicon chips (1), radiolabeled cDNA arrays on nylon membranes, or two-color, fluorescently labeled cDNA (2, 3) or long oligonucleotide arrays on glass slides.

Microarray time course experiments typically involve gene expression measurements for thousands of genes over relatively few time points, under one (e.g. wildtype) or more biological conditions (e.g. mutant 1, mutant 2,...). The number of time points can be 3-10 for shorter and 11-20 for longer time courses. The time points at which mRNA samples are taken are usually determined by the investigator's judgement concerning the biological events of interest and are frequently irregularly spaced, although for periodic time course experiments, equally-spaced times are standard. Measurements of mRNA abundance will be based on mRNA extracted from cell lines, tissue samples or whole organisms, and in what follows we will use the general term units for the source of the mRNA. The major advantage of microarray time course studies is that they give us the ability to monitor the temporal behavior of a biological process of interest through the measurement of expression levels of thousands of genes simultaneously. This can be a powerful experimental design for identifying patterns of gene expression in the units of interest.

Time course experiments can be classified into two main categories, which we term periodic and developmental. Periodic time courses include natural biological processes whose temporal profiles follow regular patterns. Examples are cell cycles (4-6) and circadian rhythms (7), for which we expect regulated genes to have periodic expression patterns. In the literature, periodic time course experiments are frequently unreplicated, that is, they arise as a single series of microarray experiments, experimenters perhaps preferring to use scarce resources obtaining a finer temporal resolution, rather than repeating measurements at times already observed. In what we term developmental time course experiments, gene expression levels are measured at successive times during a developing process, for example, during the natural growth and development of the units, or following a treatment applied to them. In such cases there are usually few prior expectations concerning the form of the temporal profiles. Here there are commonly two to five replicate series, but sometimes there is no replication.

Copyright ©2004-2005 by the authors.

We now summarize several microarray time course experiments. Tomancak et al. (8) conducted a study of Drosophila embryogenesis using microarray time course experiments as a control for RNA in situ hybridization. Canton S fly embryos were collected, transferred to an incubator and aged. At hours 1 to 12 post egg laying, the embryos were dechorionated and quick-frozen, yielding 12 time point samples. The same procedure was repeated on three different days, producing three replicates. Himanen et al. (9) reported a study on a lateral root induction system of Arabidopsis thaliana to characterize the early molecular regulation induced by auxin. Seeds of A. thaliana were germinated on media containing the auxin transport inhibitor N-1-naphthylphthalamic acid (NPA). After germination, the seeds were moved to media containing only the auxin 1-naphthalene acetic acid (NAA), and samples of the root segments were collected at four time points: 0, 2, 4, and 6 h after the transfer from NPA to NAA. There were two biological replicates at each time point, and cDNA microarray experiments with a reference design were performed. They identified 906 genes differentially expressed over time and grouped these genes into six major clusters. In Qi et al. (10), bone marrow-derived mesodermal progenitor cells (MPCs) were obtained from three donors, and the gene expression profiles for MPCs or MPCs induced to the osteoblast or chondroblast lineage for 1, 2, and 7 days were monitored using cDNA microarrays. They identified 41 transcription factors differentially expressed over time, in addition to some known signaling genes, hormones, and growth factors involved in osteogenesis. A fourth example of a developmental time course study is Schwamborn et al. (11). These authors studied the transcriptional response of the human astrocytoma cells U373 to tumor necrosis factor α (TNFα). Again, cDNA microarray experiments were performed. Samples from both TNFα-treated and untreated U373 cells were collected at 1, 2, 4, 8, and 12 h post treatment, and each time point had three biological replicates. The temporal profiles under these two treatments were compared. More than 880 genes were shown to be responsive to TNFα. In Tepperman et al. (12), gene expression samples were collected at six time points, and the gene profiles of wildtype (wt) and the phytochrome B (phyB) null mutant A. thaliana were compared to identify genes regulated by phyB in response to continuous monochromatic red light (Rc) during the induction of seedling de-etiolation. The study of transcriptional response to corticotrophin-releasing factor (CRF) in Peeters et al. (13) provides an example of unreplicated time course experiments with four different treatments: DMSO, ovine CRF in DMSO, R121919 in DMSO, and CRF plus R121919 in DMSO. Samples were collected at 0, 0.5, 1, 2, 4, 8, and 24 h after these four treatments were applied to mouse AtT-20 cells, and were hybridized to Affymetrix chips.

Following standard practice in statistics (14), we further categorize time course experiments into longitudinal and cross-sectional. Longitudinal experiments are those in which the mRNA samples at different times are extracted from the same unit, be it cell line, tissue or individual. This allows joining of ordinate values corresponding to observations on the same unit at different times, either by straight lines or fitted curves, to give the unit's time course for each gene, which will also be called the temporal pattern or profile. By contrast, cross-sectional time course experiments are those in which the mRNA samples at different times are extracted from different sources (units). With cross-sectional data, the individual data points can also be joined across time, using averages when there are replicate measurements, but the interpretation of the resulting curve is different. It will not correspond to any particular unit, but will be thought of as a population curve. In practice there will be experiments exhibiting features of both longitudinal and cross-sectional types, e.g. Tomancak et al. (8). There are more cross-sectional microarray time course experiments published to date than longitudinal ones, for example, Tepperman et al. (12) and Himanen et al. (9) cited above. This is probably because it is often infeasible to carry out longitudinal experiments because of the limited availability of mRNA from individual organisms such as laboratory mice. However, Qi et al. (10) is an example of a longitudinal study.
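The distinction between the two kinds of curve can be made concrete with a small sketch. The numbers, unit names, and function names below are invented for illustration and are not taken from any of the studies cited above; the point is only that longitudinal data support one profile per unit, whereas cross-sectional data support a single population curve built from per-time averages.

```python
from statistics import mean

# Hypothetical log-expression values for ONE gene (made-up numbers).
time_points = [1, 2, 7]  # e.g. days; assumed for illustration

# Longitudinal layout: each unit is measured at every time point,
# so values can be joined across time within a unit.
longitudinal = {
    "unit_A": [0.1, 0.8, 1.5],
    "unit_B": [0.3, 0.9, 1.9],
    "unit_C": [0.2, 1.0, 1.6],
}

def unit_profiles(times, data):
    """Return one (time, value) profile per unit (longitudinal data only)."""
    return {unit: list(zip(times, values)) for unit, values in data.items()}

# Cross-sectional layout: replicates at each time come from different
# units, so there is no meaningful pairing of values across times.
cross_sectional = {
    1: [0.1, 0.3, 0.2],
    2: [0.8, 0.9, 1.0],
    7: [1.5, 1.9, 1.6],
}

def population_curve(data):
    """Average the replicates at each time point to get a population curve."""
    return [(t, mean(values)) for t, values in sorted(data.items())]

profiles = unit_profiles(time_points, longitudinal)
curve = population_curve(cross_sectional)
```

In this sketch `profiles` holds three separate curves, one per unit, while `curve` holds a single curve that does not correspond to any particular unit.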

In this chapter we review methods for the design and analysis of microarray time course experiments. After discussing design issues in Section 20.2, we turn to methods for identifying the genes of interest to the experimenter in Section 20.3, be they genes which change over time, or genes which change differently over time between two or more biological conditions. Depending on one's perspective, this task can be viewed as a 'filtering' of the genes to remove those which are not of interest, before turning to a different kind of analysis such as clustering, or it can be seen as identifying a small to moderate list of genes for validation and further characterization. We use the microarray time course data of the study in Drosophila embryogenesis in Tomancak et al. (8) to illustrate the concept of moderation and compare some of the statistics we describe below. We then review the literature on clustering gene expression microarray time course data in Section 20.4, and end with a few comments about alignment of time series in Section 20.5.

We close this introduction with a few remarks on why the analysis of microarray time course data is special, and not adequately covered by the enormous literature that already exists on the analysis of time series (see e.g. 15 and references therein). There are three principal reasons. One is the fact that microarray time series are usually so short that we cannot consider applying methods typically used to analyze time series data, for example ARMA, Fourier or wavelet methods, as in Diggle's and other time series books. A second reason is that there are typically thousands of genes and hence thousands of short time series, all sharing the common experimental conditions. It is natural to think of analyses which have elements in common for all genes, such as the empirical Bayes (EB) methods described below. While there is some literature on EB methods in time series, we know of none involving thousands of series, as is the case here. Finally, an important aspect of microarray gene expression data is the clustering of genes, here based on their temporal profiles. As far as we know, this problem has not arisen in the traditional time series literature, at least not in the form we meet it here. A far more relevant body of literature is that on longitudinal data analysis, for when the data are longitudinal, this is precisely the right context. When the data are not longitudinal, our context is that of many related regression models.
