Array selection

The ChIP-chip procedure has now been successfully carried out for many factors and in a number of different organisms. Ideally, the microarray used in a ChIP-chip experiment will contain the entire sequence of the genome of interest, as this provides the most comprehensive platform for monitoring all potential regulatory sequences. This has been feasible for organisms with small genomes such as Saccharomyces cerevisiae, where ChIP-chip studies on individual transcription factors were first carried out (5, 14). These experiments used intergenic arrays containing PCR products representing all the non-coding sequence in the yeast genome and operated under the presumption that this sequence would contain all of the regulatory sites. Shortly after these initial studies, separate large-scale reports in S. cerevisiae emerged, including the mapping of yeast transcription factors involved in the cell cycle (15, 16), revealing the power of ChIP-chip in elucidating transcriptional circuitries. These whole-genome studies were feasible because the yeast genome is small and relatively compact: its intergenic regions are short and contain most of the regulatory regions.

After the success of the technique in yeast, the natural progression was to scale up to mammalian systems. Ideally, when setting out to do ChIP-chip, one would use arrays that tile across both the coding and non-coding regions of the genome. Until recently, however, the lack of reliable and stable genomic sequence for higher eukaryotic organisms has been a major limiting factor. In addition, researchers have had to grapple with several challenges associated with more complex genome sequence; these include repetitive elements, poor annotation, and large spans of intergenic and intronic sequence.

Despite these limitations, several reports of ChIP-chip carried out in higher organisms have been published (Table 13.1). The first reports of mammalian ChIP-chip came in 2002 (7, 11, 16, 17). Two groups independently investigated binding of the cell-cycle-specific transcription factor E2F (7, 17). These initial studies overcame the challenges of the more complex mammalian genome mentioned above by using arrays containing limited regions of the genome. One group fabricated a CpG island microarray containing 7776 distinct DNA elements that were selected based on their high GC content (7). Many regulatory elements reside in CpG islands (18), so such an array would presumably be a useful means to study transcription factor binding. However, CpG islands do not guarantee the inclusion of all promoter elements, nor does this approach take into consideration binding outside of promoter sites. Despite these drawbacks, 68 target sites were identified, representing genes involved in the cell cycle as well as in DNA damage and recombination, two processes not previously known to be regulated by E2F.

Table 13.1. ChIP-chip studies to date.

Type of array                                  Transcription factor

Whole genome                                   Gal4, Ste12
                                               106 general TFs
                                               Nine G1/S-specific TFs
                                               Ste12, Tec1
                                               203 general TFs
                                               Fhl1, Ifh1
Whole genome (ORF & intergenic) tiling array   Histone H4

Homo sapiens
                                               E2F (7, 17)
                                               HNF1α, HNF4α, HNF6 and Pol II
Genomic tiling                                 NF-κB (p65) (9, 19)
                                               Sp1, c-Myc, p53

The second group chose to look at E2F binding in promoter regions using an alternative approach. They constructed a microarray containing the proximal promoter regions of 1444 human genes, around 1200 of which were cell-cycle regulated, a choice made because the E2F family of transcription factors is known to regulate the cell cycle (17). The array contained PCR products designed to amplify 700 bp upstream and 200 bp downstream of the putative transcription start site of each gene. Although successful in identifying targets of E2F, the study was less than comprehensive, as the approach was biased toward cell-cycle-specific genes and limited to promoter regions identified by potentially error-prone computer algorithms. Despite their limitations, both studies mapping E2F binding represent pioneering efforts to globally map transcription factor binding sites.
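The promoter-array design described above, a window running from 700 bp upstream to 200 bp downstream of a putative transcription start site, can be sketched as follows. This is only an illustration: the function name `promoter_window`, the 0-based half-open coordinate convention, and the strand handling are assumptions, not details reported by the study (17).

```python
def promoter_window(tss, strand, upstream=700, downstream=200):
    """Return (start, end) of a promoter window around a TSS.

    Coordinates are 0-based, half-open (an assumed convention).
    'Upstream' is relative to the direction of transcription, so the
    window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the chromosome start so windows never go negative.
    return max(0, start), end


# Hypothetical TSS positions, for illustration only.
plus_window = promoter_window(10_000, "+")
minus_window = promoter_window(10_000, "-")
```

With these defaults each window spans 900 bp. The window is only as good as the transcription start site annotation behind it, which is the source of the error the text mentions.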

A third group used a limited genomic tiling array in which all sequences of a region of interest were represented on the array. The binding profile of the transcription factor GATA-1 within the β-globin locus was mapped using an array composed of 1-kb PCR fragments representing the entire 75-kb locus (11). This study was comprehensive in that no interaction within the locus could go undetected for lack of content on the array. Indeed, a new binding site for the well-studied GATA-1 factor was discovered.
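As a rough illustration of the tiling idea, the sketch below divides a region into consecutive fragments so that every base falls on some array element. The function name `tile_region`, the non-overlapping layout, and the coordinates are hypothetical; the actual fragment boundaries used in the study (11) are not given here.

```python
def tile_region(start, end, tile_size=1000):
    """Split [start, end) into consecutive, non-overlapping tiles.

    Coordinates are 0-based, half-open (an assumed convention).
    The last tile is shortened if the region length is not a
    multiple of tile_size.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    return [(s, min(s + tile_size, end)) for s in range(start, end, tile_size)]


# A 75-kb locus tiled with 1-kb fragments, as in the text.
tiles = tile_region(0, 75_000)
print(len(tiles))  # 75
```

Because the tiles are contiguous, coverage of the region is complete by construction, which is what distinguishes this design from the CpG island and promoter arrays.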

Building on the tiling approach, researchers generated a tiling array of an entire human chromosome. The binding of the NF-κB family member p65 along human chromosome 22 was elucidated using an array that had nearly complete coverage of the non-repetitive sequence of the chromosome (9, 19). This microarray was composed of nearly 22 000 PCR fragments representing both the coding and non-coding portions of the chromosome, making it suitable for surveying transcription factor binding in an unbiased and comprehensive fashion. Human chromosome 22 is 34 Mb in length, and although this represents only about 1% of the genome, it was an important analysis of transcription factor binding nonetheless. It revealed that p65 bound not only at the expected 5' proximal promoter sites but also in intronic regions, in intergenic regions, and near novel transcribed regions. This study was the first mapping of a transcription factor on an entire chromosome and revealed the importance of tiling arrays in identifying potential regulatory regions in the genome.
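The figures quoted above can be checked with a little arithmetic. In the sketch below, the chromosome length and fragment count come from the text, while the total genome size of roughly 3.1 Gb is an assumption introduced here for illustration.

```python
CHR22_LENGTH_BP = 34_000_000       # from the text
N_FRAGMENTS = 22_000               # "nearly 22 000" in the text
GENOME_LENGTH_BP = 3_100_000_000   # assumption: approximate haploid human genome

# Fraction of the genome represented by chromosome 22.
fraction_of_genome = CHR22_LENGTH_BP / GENOME_LENGTH_BP

# Chromosome length per array element. Repetitive sequence was left off
# the array, so the fragments themselves would be shorter on average.
bp_per_fragment = CHR22_LENGTH_BP / N_FRAGMENTS

print(f"{fraction_of_genome:.1%} of the genome")
print(f"about {bp_per_fragment:.0f} bp of chromosome per fragment")
```

Under these assumptions the chromosome accounts for roughly 1% of the genome, consistent with the text, and each fragment corresponds to about 1.5 kb of chromosome.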

These findings were further supported by two subsequent publications, one mapping the binding sites of Sp1, Myc and p53 using Affymetrix chromosome 21 and 22 tiling microarrays (10) and the other mapping CREB binding on the previously mentioned chromosome 22 array (20). Both groups reported chromosome-wide binding distributions for their factors similar to those observed for p65. Taken together, these studies suggest that the complexity of global transcription factor binding, and its contribution to gene regulation, is perhaps underappreciated and can only be elucidated with whole-genome tiling arrays.
