Chapter 10: Nucleic Acid Platform Technologies
Oliver Rando, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605
Although the oligonucleotides on the array, the labeled probes, and the details of the analysis differ depending on the experimental question being asked, most microarray experiments involve six steps:
1. Design a microarray.
2. Print or purchase a microarray.
3. Isolate and amplify the DNA or RNA probe material.
4. Label the DNA or RNA with fluorescent groups.
5. Hybridize the labeled probes to the microarray.
6. Analyze the microarray hybridization results.
For the remainder of the chapter, we discuss each of these steps in turn. Protocols are provided for printing a microarray in-house (Protocol 1) and for amplifying DNA and RNA following isolation (Protocols 2–4). Several techniques for attaching fluorescent moieties to nucleic acids are provided (Protocols 5–8). Protocol 9 explains how to block the positive charges of the polylysines bound to homemade microarrays. Finally, Protocol 10 is a detailed, generalized protocol for hybridizing labeled probes to a microarray and for scanning, formatting, and storing the resulting data. A brief guide to microarray analysis is contained in Chapter 8.
A great variety of microarrays is commercially available, and for most researchers an off-the-shelf product will suffice. If not, customized microarrays can be ordered from several manufacturers or designed (and printed) in-house. The appropriate design of a microarray depends on the application for which it will be used. There are, however, some basic design principles common to most applications. We consider two basic types of microarray: the gene expression microarray and the tiling microarray. Specialized formats, such as splicing arrays and resequencing arrays, have been designed for various organisms, and we point interested readers to the relevant literature (Hacia 1999; Mockler et al. 2005; Blencowe 2006; Hughes et al. 2006; Calarco et al. 2007; Cowell and Hawthorn 2007; Gresham et al. 2008).
Designing oligonucleotides for microarrays requires expertise in bioinformatics, but some basic design properties are easily understood. First, choose an appropriate oligonucleotide length. Most oligonucleotide microarrays today are printed with oligonucleotides 50–70 nucleotides long. Choosing an optimal length requires weighing several factors, including signal strength, specificity, cost, and efficiency of synthesis. Shorter oligonucleotides are more likely to cross-hybridize to many different regions of a given genome and often have low melting temperatures, making hybridization technically problematic. A detailed analysis of oligonucleotide length versus sensitivity and specificity can be found in Hughes et al. (2001). A reasonable rule of thumb is that, for most applications, 60-nucleotide oligonucleotides provide the best balance between these competing constraints.
The second consideration in oligonucleotide design is specificity: in general, each oligonucleotide on the microarray should be complementary to only one RNA species in the organism of interest. Commonly, BLAST searches (see Chapter 8) are used to screen the sequence of each oligonucleotide against the relevant genomic sequence to identify potential multiple hits in the genome of interest. Ideally, rather than applying a fixed BLAST score cutoff, the user should select, for each gene, the oligonucleotide whose second-best match in the genome has the lowest score.
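The selection rule above can be sketched in Python. This is a minimal illustration, assuming BLAST was run with tabular output (`-outfmt 6`), in which the query ID is the first column and the bit score is the 12th, and assuming that each oligonucleotide's top hit is its intended target. The function names and the `oligo_to_gene` mapping are hypothetical.

```python
from collections import defaultdict


def second_hit_scores(blast_lines):
    """Map each oligo ID to the bit score of its second-best BLAST hit.

    Assumes tabular BLAST output (-outfmt 6): query ID in column 1,
    bit score in column 12. Oligos with a single hit get 0.0.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        hits[fields[0]].append(float(fields[11]))
    scores = {}
    for oligo, bits in hits.items():
        bits.sort(reverse=True)  # best hit (assumed to be the target) first
        scores[oligo] = bits[1] if len(bits) > 1 else 0.0
    return scores


def pick_most_specific(oligo_to_gene, blast_lines):
    """For each gene, keep the candidate whose second-best hit scores lowest."""
    scores = second_hit_scores(blast_lines)
    best = {}
    for oligo, gene in oligo_to_gene.items():
        score = scores.get(oligo, 0.0)
        if gene not in best or score < best[gene][1]:
            best[gene] = (oligo, score)
    return {gene: oligo for gene, (oligo, _) in best.items()}
```

An oligonucleotide with no BLAST hits at all also scores 0.0 here; since it should at least match its own target, such a case usually points to a problem worth checking by hand.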
The third design consideration is optimization of the hybridization properties. Several features can be optimized, but the most important is the melting temperature (Tm) of the oligonucleotide: the Tms of all of the oligonucleotides on the array should fall within as narrow a window as is feasible. Other characteristics, such as entropy (i.e., the complexity of the sequence), GC content, and self-complementarity, should be optimized as well. Software tools are available to help with oligonucleotide design, ranging from publicly available tools like ArrayOligoSelector (see below) to commercial services provided by microarray companies (for further details of oligonucleotide design, see Chapters 7 and 8).
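To check how narrow the Tm window is, one can estimate a Tm for every oligonucleotide and look at the spread. The sketch below uses a common empirical salt-adjusted approximation for longer oligonucleotides, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 600/N, where N is the oligonucleotide length. Dedicated design tools generally use nearest-neighbor thermodynamics instead, and the 0.1 M Na+ default is an arbitrary assumption.

```python
import math


def gc_percent(seq):
    """Percentage of G + C in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq, na_molar=0.1):
    """Salt-adjusted empirical Tm estimate (degrees C) for oligos longer than ~20 nt."""
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(seq) - 600.0 / len(seq))


def tm_spread(seqs, na_molar=0.1):
    """Width of the Tm window (max - min) across a set of oligos."""
    tms = [approx_tm(s, na_molar) for s in seqs]
    return max(tms) - min(tms)
```

For a 60-mer with 50% GC at 0.1 M Na+, `approx_tm` gives about 75.4 °C, and each additional percentage point of GC raises the estimate by 0.41 °C, which is why keeping GC content uniform also keeps the Tm window narrow.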
As an example, consider a case in which an investigator wishes to design gene expression microarrays for a non-model organism with a recently sequenced genome. The first step in designing gene expression microarrays is, of course, to identify the genes. As this is routinely performed in any genome sequencing effort, we will assume that genes have already been predicted using standard tools (Zhang 2002; Ashurst and Collins 2003; Brent 2005; Solovyev et al. 2006).
Once coding regions are identified, oligonucleotides need to be designed for each gene. Many programs exist for oligonucleotide selection, including ArrayOligoSelector (Bozdech et al. 2003), a commonly used program that is freely available at http://arrayoligosel.sourceforge.net/ (for further details, see Chapters 7 and 8).
ArrayOligoSelector is designed to analyze a complete genome and design oligonucleotides of a user-defined length. For every oligonucleotide, ArrayOligoSelector calculates scores for uniqueness, sequence complexity, self-annealing, and GC content. Uniqueness is a measure of the theoretical difference between the binding energy of an oligonucleotide to its perfect match and its binding energy to the next most homologous sequence in the genome. The sequence complexity score allows the user to filter out oligonucleotides with homopolymeric tracts, which may otherwise cause hybridization problems. The self-annealing score measures the secondary structure an oligonucleotide can form by annealing to itself, another potential source of hybridization problems. Finally, it is important to minimize variation in GC content among the oligonucleotide sequences, both to minimize Tm variation and to minimize variability in fluorescence intensity among the spots.
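Simplified versions of these checks are easy to compute directly. The sketch below is not ArrayOligoSelector's scoring, which is based on binding energies; it substitutes plain sequence proxies: GC percentage, the longest homopolymer run (for sequence complexity), and the longest stretch whose reverse complement also occurs in the same oligonucleotide (a crude stand-in for self-annealing). The thresholds in `passes_filters` are hypothetical placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_percent(seq):
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def longest_self_complement(seq):
    """Longest k such that some k-mer of seq has its reverse complement in seq.

    Equivalent to the longest substring shared by seq and revcomp(seq).
    """
    rc = revcomp(seq)
    best = 0
    for k in range(1, len(seq) + 1):
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if any(rc[i:i + k] in kmers for i in range(len(rc) - k + 1)):
            best = k
        else:
            break  # no shared k-mer means no shared longer substring either
    return best


def passes_filters(seq, gc_range=(40.0, 60.0), max_run=6, max_self=10):
    """Hypothetical thresholds; tune them for the genome at hand."""
    seq = seq.upper()
    return (gc_range[0] <= gc_percent(seq) <= gc_range[1]
            and longest_homopolymer(seq) <= max_run
            and longest_self_complement(seq) <= max_self)
```

Note that a perfectly palindromic oligonucleotide, such as a repeated `ACGT`, is its own reverse complement and fails the self-complementarity check even though its GC content and homopolymer runs look ideal.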
When running ArrayOligoSelector, the following parameters can be specified: oligonucleotide length, GC content, number of oligonucleotides per gene, sequences to mask, and uniqueness cutoff. Common oligonucleotide lengths are 60 or 70 nucleotides. The GC content setting depends on the GC content of the genome in question and is typically set to the average for the genome's coding regions. The number of oligonucleotides per gene depends on whether the microarray is to be purchased or printed in-house. If oligonucleotides are purchased for in-house printing, cost and printable spot density will preclude using more than one or two oligonucleotides per gene. If commercial arrays are to be used, the number of spots available will determine the number of oligonucleotides chosen per gene. Masking sequences are not commonly specified, but if a problematic short repeat element is present in the genome, it is sometimes valuable to mask it out of the microarray oligonucleotides. The uniqueness cutoff is typically left blank, in which case the default value is used (for additional information, see the ArrayOligoSelector manual).
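Two of these settings lend themselves to a short sketch: deriving the GC target from the coding sequences, and keeping a fixed number of candidates per gene, here the ones closest to that target. This illustrates the logic only; it is not how ArrayOligoSelector ranks candidates internally, and the `candidates` dictionary layout is hypothetical.

```python
def gc_count(seq):
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def pooled_gc_percent(coding_seqs):
    """GC percentage over all coding sequences pooled together (length-weighted)."""
    total = sum(len(s) for s in coding_seqs)
    return 100.0 * sum(gc_count(s) for s in coding_seqs) / total


def pick_per_gene(candidates, target_gc, n_per_gene=1):
    """candidates: {gene: [oligo, ...]} -> {gene: up to n oligos closest to target_gc}."""
    chosen = {}
    for gene, oligos in candidates.items():
        ranked = sorted(oligos, key=lambda s: abs(100.0 * gc_count(s) / len(s) - target_gc))
        chosen[gene] = ranked[:n_per_gene]
    return chosen
```

Setting `n_per_gene` to 1 or 2 reflects the constraint on in-house printing described above; with a commercial array, it would be raised to fill the available spots.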
The output of ArrayOligoSelector can be filtered by the experimenter. For example, it is often desirable to use oligonucleotides located toward the 3′ ends of genes because reverse-transcriptase-based labeling is more efficient in this region.
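A filter of this kind might look like the sketch below. It assumes each candidate's start position on the transcript and the transcript length are known (the `Candidate` record is a hypothetical layout, not ArrayOligoSelector's output format), and the 500-nucleotide cutoff is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    oligo_id: str
    start: int              # 0-based start on the sense strand of the transcript
    length: int
    transcript_length: int


def distance_to_3prime(c):
    """Nucleotides between the oligo's 3'-most base and the transcript 3' end."""
    return c.transcript_length - (c.start + c.length)


def keep_3prime(candidates, max_distance=500):
    """Keep candidates within max_distance nt of the 3' end, closest first."""
    kept = [c for c in candidates if distance_to_3prime(c) <= max_distance]
    return sorted(kept, key=distance_to_3prime)
```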
Oligonucleotide design for tiling microarrays is more straightforward than for gene expression microarrays because tiling presumes that essentially all of the oligonucleotides within a region of interest will be included on the microarray. The simplest tiling design involves choosing, say, nucleotides 1–50 of the region to be tiled as spot 1, nucleotides 21–70 as spot 2, and so forth. Once all of the oligonucleotides have been designed, BLAST can be used to find oligonucleotides with multiple identical matches in the genome of interest, and these oligonucleotides can be removed. More subtle tiling designs allow a small amount of wiggle in oligonucleotide location; thus, spot 2 might cover nucleotides 16–65, spot 3 nucleotides 45–94, and so forth. This flexibility allows hybridization properties, such as Tm and GC content, to be matched more closely than in a simple tiling design.
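Both designs can be sketched in a few lines of Python. Positions here are 0-based, so the text's spot 1 (nucleotides 1–50) is `region[0:50]`. For the wiggled version, this sketch assumes GC content as a cheap proxy for Tm and picks, within ±`wiggle` nucleotides of each nominal start, the window whose GC percentage is closest to a target; the default parameters mirror the 50-mer, 20-nucleotide-step example in the text but are otherwise arbitrary.

```python
def gc_percent(seq):
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def simple_tiling(region, oligo_len=50, step=20):
    """Return (start, oligo) pairs at a fixed step across the region."""
    last = len(region) - oligo_len
    return [(s, region[s:s + oligo_len]) for s in range(0, last + 1, step)]


def wiggled_tiling(region, oligo_len=50, step=20, wiggle=5, target_gc=None):
    """Like simple_tiling, but let each start shift by up to +/- wiggle nt
    to bring the oligo's GC percentage closer to target_gc."""
    if target_gc is None:
        target_gc = gc_percent(region)  # default target: the region's own GC
    last = len(region) - oligo_len
    probes = []
    for nominal in range(0, last + 1, step):
        window = range(max(0, nominal - wiggle), min(last, nominal + wiggle) + 1)
        # Closest GC first; ties go to the start nearest the nominal position.
        best = min(window, key=lambda s: (abs(gc_percent(region[s:s + oligo_len]) - target_gc),
                                          abs(s - nominal)))
        probes.append((best, region[best:best + oligo_len]))
    return probes
```

Keeping `wiggle` below half of `step` ensures that the shifted start positions stay in order along the region.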
Once oligonucleotides have been designed, microarrays can be printed commercially. Alternatively, oligonucleotides can be synthesized commercially or by an in-house core facility, and then printed in-house using a spotting robot (see Protocol 1).
Most of the sample preparation procedures for microarrays follow standard protocols, many of which can be found elsewhere in this manual. For example, comparative genomic hybridization (CGH) analyses use genomic DNA isolated from samples of interest, as described in Chapter 1. Gene expression or splicing studies use either total RNA or mRNA, purified as described in Chapter 6. Protein localization analysis starts with material isolated via chromatin immunoprecipitation (ChIP), as described in Chapter 20. Typically, however, the intended assays for many of these protocols are based on blotting techniques or quantitative PCR readouts, which often require only nanograms of material. Microarray labeling, on the other hand, typically requires several micrograms of nucleic acid; therefore, an amplification step is necessary before labeling.
Because amplification of the nucleic acid that will be used to generate labeled microarray probes occurs before the hybridization step, it must not bias representation of any particular sequences in the genome. Thus, unbiased (or minimally biased) whole-genome amplification protocols are a key component of many microarray applications.
Three amplification protocols are included in this chapter: two for DNA (Protocols 2 and 3) and one for RNA (Protocol 4). There are several practical items to keep in mind while performing an amplification protocol. First, when trying an amplification method for the first time, start with a large and easily obtainable pool of material (e.g., liver RNA) and amplify an aliquot of the original pool. Microarray comparisons between the amplified material and the original bulk pool then provide a valuable readout of amplification biases. A perfect amplification would result in a yellow array with no spots showing any differences between the bulk and amplified material. Second, avoid contaminating your sample with anything that might contain DNA or RNA. Even tiny amounts of foreign nucleic acids will be amplified, contaminating your sample and corrupting your microarray experiments. Always wear gloves and use filter pipette tips when performing the isolation and amplification protocols. Finally, always include a control amplification with water only (no DNA or RNA) to ensure that the reagents are not contaminated with amplifiable material.
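The bulk-versus-amplified comparison can be summarized numerically as well as by eye. The sketch below assumes you already have background-subtracted intensities for each spot in both channels (the dictionary inputs and the two-fold cutoff are illustrative assumptions). It reports the spread of log2(amplified/bulk) ratios and lists spots beyond the cutoff; in the ideal case the spread is zero and no spots are listed.

```python
import math
import statistics


def amplification_bias(amplified, bulk, fold=2.0):
    """Summarize log2(amplified / bulk) across spots measured in both channels."""
    ratios = {
        spot: math.log2(amplified[spot] / bulk[spot])
        for spot in amplified
        if spot in bulk and amplified[spot] > 0 and bulk[spot] > 0
    }
    cutoff = math.log2(fold)
    values = list(ratios.values())
    return {
        "n_spots": len(values),
        "sd_log2": statistics.pstdev(values) if values else float("nan"),
        "biased": sorted(s for s, r in ratios.items() if abs(r) > cutoff),
    }
```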
Less common than DNA amplification, RNA amplification is nonetheless required in experiments that use small populations of cells, such as may occur in neurobiological studies using laser-captured cell populations. RNA amplification kits are available from several vendors (e.g., Ambion), or amplification can be performed as described in Protocol 4.
Labeling nucleic acids for microarrays is similar for most microarray platforms, Affymetrix microarrays being the exception. For most platforms (homemade or commercial), labeled nucleotides are incorporated into DNA by the Klenow fragment of DNA polymerase I. For labeling RNA, reverse transcriptase is used to prepare labeled cDNA. Direct labeling methods include a fluorescent dNTP in the labeling reaction (Protocols 5 and 7). This approach is rapid but expensive; a cheaper but lengthier alternative first incorporates aminoallyl nucleotides into the nucleic acid and then couples the aminoallyl groups to the fluorophore (Protocols 6 and 8). In all of the labeling protocols, the source nucleic acids can be either unamplified or amplified before labeling.
Methods for hybridization of labeled probe materials to a microarray differ significantly when using home-printed microarrays versus commercial microarrays. For home-printed microarrays, slides must first be blocked because polylysines remaining on the slide will cause significant background binding to labeled material unless they are neutralized by reaction with succinic anhydride. Protocols 9 and 10 provide methods, respectively, for blocking and hybridizing to homemade arrays. When using commercial microarrays, hybridization protocols are typically provided. Following hybridization, microarrays are scanned in a bench-top scanner. The data are collected, formatted, and stored digitally for subsequent analysis.
The tools used for analysis of microarray data will depend on the experimental question being asked: Localization studies require a different set of tools than do gene expression studies. Some of the available resources are described in Chapter 8. Here, we outline some of the basic steps in data analysis.
The first step in microarray data analysis is to remove bad data (e.g., data from spots that were flagged because they were obscured by fluorescent precipitate) and to normalize the remaining data. Working with a .gpr file, the standard output of GenePix software, we typically eliminate all flagged spots and then work exclusively with the log2 ratio data. More advanced users may consider using other columns, such as the foreground and background intensities.
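Filtering a .gpr file can be scripted. GenePix results files use the tab-delimited Axon Text File (ATF) layout: a version line, a line giving the number of optional header records and data columns, those header records, and then a row of column names. The sketch below relies on that layout and assumes the columns are named `Flags`, `ID`, and `Log Ratio (635/532)`; check the header of your own files, since column names can vary with software version and scan wavelengths. It keeps only spots whose flag is 0 (unflagged) and skips non-numeric ratios.

```python
import csv

LOG_RATIO = "Log Ratio (635/532)"  # assumed column name; verify against your file


def read_gpr(lines):
    """Parse ATF-formatted GenePix lines into a list of dicts keyed by column name."""
    it = iter(lines)
    next(it)  # "ATF 1.0" version line
    n_optional = int(next(it).split("\t")[0].strip('"'))
    for _ in range(n_optional):
        next(it)  # skip "Key=Value" header records
    return list(csv.DictReader(it, delimiter="\t"))  # csv removes the double quotes


def unflagged_log_ratios(rows):
    """Return (ID, log ratio) pairs for unflagged spots with numeric ratios."""
    pairs = []
    for row in rows:
        if int(row["Flags"]) != 0:
            continue  # any flag, e.g. GenePix's -100 ("bad") or -50 ("not found")
        try:
            pairs.append((row["ID"], float(row[LOG_RATIO])))
        except ValueError:
            continue  # ratio could not be computed for this spot
    return pairs
```

Typical use would be `with open("array.gpr", newline="") as fh: pairs = unflagged_log_ratios(read_gpr(fh))`, where `array.gpr` is a placeholder file name.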
Most two-color microarray studies are normalized to an average log2 ratio of 0. The assumption implicit in this normalization is that there was no overall change in whatever is being measured. It is therefore important to remember, when looking at such normalized data, that relative rather than absolute values are being measured. In practice, normalization can be performed by computing the mean of the unflagged log ratio values and then subtracting this mean from every entry in the column. This can be done by hand in spreadsheet programs or with simple commands in languages such as MATLAB, Perl, or R.
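In Python, the same mean-centering step takes only a few lines. This sketch assumes the unflagged data are held as (spot ID, log2 ratio) pairs, a hypothetical layout.

```python
import statistics


def center_log_ratios(pairs):
    """Subtract the mean log2 ratio from every spot so the array averages 0."""
    mean = statistics.mean(value for _, value in pairs)
    return [(spot, value - mean) for spot, value in pairs]
```

Because the data are in log space, subtracting the mean log ratio is equivalent to dividing every ratio by the geometric mean of the ratios.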
Once normalization is completed, the list can be sorted by log ratio value to identify genes that are dramatically up-regulated or down-regulated (if gene expression) or identify loci with high levels of enrichment of the protein in question. At this point, data analysis paths will diverge depending on the questions being asked. However, because many microarray studies involve multiple microarrays, it is often useful to cluster data to identify genes or loci that share similar behavior.
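Sorting and pulling out the extremes might look like this. The sketch assumes normalized (spot ID, log2 ratio) pairs; the optional `min_abs` cutoff (for example, 1.0 for a two-fold change) is an illustrative parameter, not a recommendation from the text.

```python
def extremes(pairs, n=10, min_abs=0.0):
    """Return the n most increased and n most decreased spots by log2 ratio."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    up = [p for p in ranked if p[1] >= min_abs and p[1] > 0][:n]
    down = [p for p in reversed(ranked) if p[1] <= -min_abs and p[1] < 0][:n]
    return up, down
```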
For clustering and visualization, numerous programs are available online. We use the classic Cluster and TreeView programs (Eisen et al. 1998), available online at SourceForge (http://sourceforge.net/) or via the Eisen laboratory website (http://www.eisenlab.org/). Because sample files that guide the formatting of files for clustering are also available, formatting will not be described here. Briefly, however, data will be loaded into Cluster, various thresholds will be set (fraction of data missing for a given gene, number of genes changing over some threshold, etc.), and one of several clustering algorithms will be used. The output of the clustering can be visualized with TreeView, allowing users to generate the classic heatmap view of their microarray data.
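To show what a clustering run does, here is a minimal, unoptimized sketch of agglomerative hierarchical clustering with average linkage and a distance of 1 − r, where r is the Pearson (centered) correlation between two genes' log-ratio profiles; Cluster offers this combination among others. It illustrates the algorithm only and is no substitute for Cluster: it ignores missing values and gene or array weights, and it scales poorly beyond a few hundred genes.

```python
import math


def pearson(x, y):
    """Pearson (centered) correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of {gene: [log ratios]} with d = 1 - r.

    Returns merges as (cluster_a, cluster_b, distance) in the order they occur.
    """
    names = list(profiles)
    d = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
         for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster gene pairs.
                pair_d = [d[(a, b)] for a in clusters[i] for b in clusters[j]]
                avg = sum(pair_d) / len(pair_d)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The merge list records which clusters join and at what distance, which is the information a dendrogram, such as the tree TreeView draws beside the heatmap, displays.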
Chapter 10 Protocols:
- Protocol 1: Printing Microarrays
- Protocol 2: Round A/Round B Amplification of DNA
- Protocol 3: T7 Linear Amplification of DNA (TLAD) for Nucleosomal and Other DNA < 500 bp
- Protocol 4: Amplification of RNA
- Protocol 5: Direct Cyanine-dUTP Labeling of RNA
- Protocol 6: Indirect Aminoallyl-dUTP Labeling of RNA
- Protocol 7: Cyanine-dCTP Labeling of DNA Using Klenow
- Protocol 8: Indirect Labeling of DNA
- Protocol 9: Blocking Polylysines on Homemade Microarrays
- Protocol 10: Hybridization to Homemade Microarrays