Bioinformatics Considerations

SAGE is a sampling method: it estimates the expression level of each gene, and thereby profiles the transcriptome, by measuring the frequency of unique short sequence tags derived from the corresponding mRNA transcript. Although the method is a very effective approach for profiling the expression of mRNA populations, results can be significantly biased by sampling errors, sequencing errors, and the nonuniqueness and nonrandomness of tag sequences. Moreover, transcripts are often present at a low copy number. To obtain a valid estimate of transcriptome size from SAGE libraries, the number of tags sampled must be inversely proportional to the lowest abundance level, which is not always known. These mathematical considerations constrain the design of a SAGE experiment.[17] Several statistical tests, each with its limitations, have been published for the pairwise comparison of SAGE libraries and should be consulted before a SAGE experiment is designed. Computer programs incorporating different statistical methods are also available free of charge to facilitate data handling and analysis.[18]
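The inverse relationship between sampling depth and the lowest abundance level can be made concrete with a short calculation. The sketch below is illustrative only and assumes that each sequenced tag is an independent draw from the mRNA population, so that a transcript with relative abundance p is missed in N tags with probability (1 - p)^N. The function names and the 95% confidence default are hypothetical choices, not part of any published SAGE software. The last function is a generic two-proportion z-test, included only as a stand-in for a pairwise library comparison; it is not one of the SAGE-specific tests referred to above.

```python
import math


def detection_probability(abundance: float, n_tags: int) -> float:
    """Chance of seeing at least one tag from a transcript whose
    relative abundance is `abundance`, after sampling `n_tags` tags.
    Assumes independent draws: P = 1 - (1 - abundance) ** n_tags."""
    return -math.expm1(n_tags * math.log1p(-abundance))


def tags_needed(abundance: float, confidence: float = 0.95) -> int:
    """Smallest number of tags giving at least `confidence` probability
    of observing the transcript once. For small abundances this is
    approximately -ln(1 - confidence) / abundance, i.e. inversely
    proportional to the abundance."""
    if not (0.0 < abundance < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("abundance and confidence must lie in (0, 1)")
    return math.ceil(math.log1p(-confidence) / math.log1p(-abundance))


def two_proportion_z_test(count_a: int, total_a: int,
                          count_b: int, total_b: int) -> tuple[float, float]:
    """Generic two-sided z-test for a difference in tag proportions
    between two libraries (normal approximation, pooled variance).
    Returns (z, p_value)."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    if pooled <= 0.0 or pooled >= 1.0:
        return 0.0, 1.0  # no variation to test
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    for abundance in (1e-3, 1e-4, 1e-5):
        print(abundance, tags_needed(abundance))
    print(two_proportion_z_test(50, 50_000, 10, 50_000))
```

Under these assumptions, a transcript making up one in 100,000 of the mRNA population calls for on the order of 300,000 tags to be detected with 95% probability, about ten times the depth needed for a transcript at one in 10,000. Because the normal approximation is unreliable for the small tag counts typical of rare transcripts, the z-test serves here only to show the shape of a pairwise comparison.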

To enhance the utility of SAGE data, the NCI's Cancer Genome Anatomy Project (CGAP) has created an informatics tool, "SAGE Genie," a web site for the analysis and presentation of SAGE data. SAGE Genie provides an automatic link between gene names and SAGE transcript levels, accounting for alternative transcription and many potential errors, and offers an invaluable means to archive and analyze the expression profile of any given gene in any biological context.[19]

