Editor's Note: Toolboxes are a new, occasional feature in the Journal designed to briefly highlight a new method or a resource of general use in neuroscience or to critically analyze existing approaches or methods. For more information, see http://www.jneurosci.org/misc/itoa.shtml.
When technical advances generated novel genetic markers in the 1980s, they were combined with linkage analysis allowing thousands of genes to be profiled simultaneously (Botstein et al., 1980; Weber and May, 1989). This capability has been used to isolate the molecular causes of most monogenetic disorders of the brain. With this success as a backdrop, the persistent difficulty in isolating the molecular defects that underlie complex brain disorders is particularly glaring. Complex disorders emerge from an interplay between genes and the environment and constitute the vast majority of diseases. It is no surprise that when microarray, a technique that simultaneously profiles the levels of thousands of mRNA transcripts, was introduced in the late 1990s (Lockhart et al., 1996; DeRisi et al., 1997), this technical advance was enthusiastically met by clinical neuroscientists. Because mRNA expression profiles are anatomically specific, and because expression levels are influenced by both genetic polymorphisms and environmental input, microarray held great promise for uncovering the pathogenic molecules underlying complex disorders.
Despite a number of interesting and potentially important findings (Mirnics et al., 2001; Geschwind, 2003), this promise has not yet been fully realized. Molecular heterogeneity can be invoked to account for this lack of success, in which defects in many separate molecular pathways produce overlapping disease phenotypes. Nevertheless, we believe that molecular parsimony should still be assumed (i.e., that each disease is driven primarily by relatively few molecular pathways) and that the difficulty in pinpointing pathogenic molecules with microarray is more likely a reflection of its analytical challenges and technical limitations.
Here we will review the analytical challenges presented by microarray and show how combining microarray with other technologies, such as in vivo brain imaging, can be used to: (1) maximize “signal amplitude” in a microarray experiment; (2) constrain sources of noise; and (3) address the high rate of false-positivity that naturally occurs with multiple comparisons. Next, we discuss the technical limitations presented by microarray and the importance of confirming microarray findings at the protein level. We conclude by considering how a microarray finding can be validated, testing whether the identified molecule is causally related to the disease under investigation.
We use our recent experience applying microarray to late-onset Alzheimer's disease (Small et al., 2005) to illustrate these points. Indeed, with both an early-onset and a late-onset form, Alzheimer's disease provides an informative example. Because the early-onset form is monogenetic, linkage analysis has successfully isolated most of the molecular defects that cause this extremely rare form of disease (St George-Hyslop, 1999). In contrast, the late-onset form, accounting for >95% of all cases, is a complex disorder, and its molecular causes remain elusive.
Maximizing signal amplitude
Comparing the expression levels of a hypothetical molecule underlying Alzheimer's disease with the expression levels of a hypothetical melanoma-causing molecule is useful in illustrating the analytical challenges inherent to microarray (Fig. 1). In melanoma, as in all neoplastic disorders, an unregulated increase in transcription is a defining feature of the disease. The expression levels of cases compared with controls (the signal amplitude in a microarray experiment) are therefore expected to differ by orders of magnitude (Fig. 1A). In contrast, low signal amplitude must be assumed for many brain disorders, including Alzheimer's disease, because of molecular mechanisms underlying neuronal dysfunction (Fig. 1B). Although protein aggregates and cell death ultimately mark Alzheimer's disease and other neurodegenerative disorders, “cell sickness” is the key feature of the earliest and most informative stage of disease (e.g., synaptic dysfunction in a relatively intact neuron) (Selkoe, 2002). Alzheimer's and other neurodegenerative diseases are therefore part of a larger category of brain disorders characterized by physiological rather than structural lesions, a category that includes most psychiatric and many developmental disorders, as well as age-related cognitive decline. Because synaptic dysfunction can occur with relatively subtle changes in molecular expression, low signal amplitude is the first challenge presented by microarray when applied to these intractable disorders of the brain.
Almost all brain disorders are regionally selective, and it would seem self-evident that microarray performed on brain tissue most affected by the disease process should maximize signal amplitude (Fig. 1C). Microarray performed on large swatches of brain, containing a mixture of affected, less affected, and unaffected tissue, is likely to dilute an already low but meaningful abnormality in expression (Fig. 1B). Although obvious from an analytical perspective, it is not always clear how to identify the one brain region most vulnerable and affected by disease. This issue is particularly problematic for physiological rather than structural disorders, where telltale histological or structural lesions cannot be used as landmarks. Even in neurodegenerative diseases, such as Alzheimer's disease (Selkoe, 2002), Parkinson's disease (Dauer and Przedborski, 2003), and Huntington's disease (Arrasate et al., 2004), it is generally agreed that the histological distribution of protein aggregates does not necessarily mark the sites of greatest physiologic dysfunction. Furthermore, although cells do eventually degenerate after many years of sublethal injury, patterns of cell death might reflect differential sensitivity to apoptosis rather than to synaptic dysfunction. Finally, because these disorders have long protracted courses, postmortem studies might bias against the earliest and most informative sites of disease. In Alzheimer's disease, for example, which begins by causing hippocampal-dependent memory loss, postmortem maps of cell death have implicated the CA1 subfield (West et al., 1994), the entorhinal cortex (Shoghi-Jadid et al., 2002), or both (Price et al., 2001). Indeed, relying on postmortem findings, many microarray studies of Alzheimer's disease have focused on the CA1 subfield as the targeted region of investigation (Ginsberg et al., 2000).
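The dilution argument can be made concrete with simple arithmetic. The Python sketch below is purely illustrative and is not drawn from any of the cited studies: the function name, the 30% reduction, and the tissue fractions are hypothetical, and the calculation assumes that affected and unaffected tissue contribute mRNA in proportion to their share of the sample and that unaffected tissue shows no case-control difference.

```python
def observed_fold_change(affected_fraction, true_fold_change):
    """Case/control ratio measured in a mixed tissue sample.

    affected_fraction: share of the sample drawn from affected tissue (0 to 1).
    true_fold_change: case/control expression ratio within affected tissue.
    Assumption: unaffected tissue has a ratio of exactly 1.0 and both tissue
    types contribute mRNA in proportion to their share of the sample.
    """
    return affected_fraction * true_fold_change + (1.0 - affected_fraction) * 1.0


# Hypothetical numbers: a 30% reduction confined to the affected region.
for fraction in (1.0, 0.5, 0.2):
    ratio = observed_fold_change(fraction, 0.70)
    print(f"affected fraction {fraction:.1f}: observed ratio {ratio:.2f}")
# affected fraction 1.0: observed ratio 0.70
# affected fraction 0.5: observed ratio 0.85
# affected fraction 0.2: observed ratio 0.94
```

Under these assumptions, a sample in which only one-fifth of the tissue is affected turns a 30% abnormality into a 6% one, which is easily lost in inter-individual variance.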
With the improvement in functional imaging techniques, the physiological integrity of small and discrete subregions of the brain can now be assessed (Small et al., 2000, 2002, 2004), and these techniques are well suited for pinpointing sites most affected by physiological disorders. By imaging the functional integrity of multiple hippocampal subregions in living subjects (Small et al., 1999, 2002), imaging studies have clearly shown that the entorhinal cortex, not the CA1 subfield, is the first and most profoundly affected site in Alzheimer's disease (Fig. 1C). This pattern of physiological dysfunction matches the anatomical distribution of intracellular neurofibrillary tangles (Braak and Braak, 1991). Guided by these functional imaging findings, we decided to focus on the entorhinal cortex rather than the CA1 subfield as the target site in our microarray analysis of Alzheimer's disease (Small et al., 2005).
Constraining sources of noise
The hypothetical example comparing Alzheimer's disease with melanoma highlights a second analytical challenge inherent to microarray studies of the brain. Compared with melanoma, an Alzheimer's-related molecule is expected to manifest greater inter-individual variance, arising from factors that influence mRNA expression independently of the disease. A main reason for the increase in “signal noise” (Fig. 1B) is that tissue samples used to investigate brain disorders are typically harvested from postmortem brains. Thus, in contrast to neoplasms in which tissue is typically harvested from biopsy material, sources of expression-variance extend beyond differences in genetic heritage and environment, to include differences that occur during the dying process, a particularly powerful and vexing source of noise (Li et al., 2004). For example, a slow agonal death, typically accompanied by hypoperfusion and blood-borne toxins or drugs, will dramatically affect mRNA levels compared with death caused by acute trauma.
Identifying a brain region relatively unaffected by a disease can, in principle, be used to constrain inter-individual sources of variance (Fig. 1D). Global sources of noise, such as vascular-related changes caused by blood-borne drugs or hypoperfusion, can be assumed to influence expression levels of a particular gene within both the affected and unaffected brain regions. Thus, expression levels of the unaffected region can be used to statistically normalize against these global sources of noise. Of course, this approach will not constrain variance caused by noise factors that are region specific. Many brain regions are typically resistant to a disease process, and so careful consideration should be given in selecting an unaffected brain region as a noise reducer in a microarray study. We advocate the use of an unaffected region that neighbors the affected region, because neighboring regions are more likely to share vascular supply (Small et al., 2005).
Identifying a neighboring unaffected region is relatively trivial when studying structural diseases, such as brain tumors, stroke, or multiple sclerosis. Isolating an unaffected region is more challenging for physiological disorders, and here again functional imaging is well suited for the task. For example, functional imaging studies have extended postmortem observations by showing that the physiological integrity of the dentate gyrus is relatively preserved even in late stages of Alzheimer's disease (Small et al., 1999).
To summarize, establishing the spatial profile of a disease can be used to perform an analytical “double subtraction,” comparing the expression levels from the affected and unaffected regions and comparing this difference between cases and controls. In statistical terms, the spatial profile can be used to convert a microarray analysis from a simple t test design to a more sophisticated factorial ANOVA, including both within- and between-group factors, a design that has proven to be effective in addressing analytical challenges of microarray (Kerr et al., 2000). Accordingly, in our Alzheimer's disease study, we applied a factorial design to microarray data generated from the entorhinal cortex and the dentate gyrus, harvested from each brain with and without disease. This analysis yielded 33 molecules that conformed to the spatial profile of Alzheimer's disease, molecules that are differentially expressed in the entorhinal cortex compared with the dentate gyrus, between cases and controls (Small et al., 2005). Nevertheless, because of reasons discussed below, we were forced to acknowledge that many of these molecules were not necessarily linked to disease pathogenesis.
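The logic of the double subtraction can be sketched in a few lines of Python. This is a simplified illustration, not the analysis pipeline used in the study: the function name and all expression values are hypothetical, and the factorial ANOVA is replaced by a pooled-variance t statistic computed on within-brain differences, which for a design with two regions and two groups tests the same region-by-group interaction.

```python
from math import sqrt
from statistics import mean, variance


def double_subtraction_t(cases, controls):
    """Pooled-variance t statistic on within-brain (affected - unaffected) differences.

    cases, controls: lists of (affected_region, unaffected_region) expression
    pairs, one pair per brain. Returns (difference_of_differences, t_statistic).
    """
    d_cases = [a - u for a, u in cases]        # first subtraction, within each brain
    d_ctrls = [a - u for a, u in controls]
    n1, n2 = len(d_cases), len(d_ctrls)
    effect = mean(d_cases) - mean(d_ctrls)     # second subtraction, between groups
    pooled = ((n1 - 1) * variance(d_cases) + (n2 - 1) * variance(d_ctrls)) / (n1 + n2 - 2)
    return effect, effect / sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical log2 expression values for one gene: (entorhinal cortex, dentate gyrus).
# Each brain carries its own global offset, standing in for agonal or vascular effects.
controls = [(8.0, 8.1), (9.0, 8.9), (7.5, 7.5), (8.6, 8.5)]
cases = [(7.4, 8.0), (8.5, 9.0), (6.9, 7.6), (8.1, 8.6)]

effect, t_stat = double_subtraction_t(cases, controls)
print(f"difference of differences: {effect:.2f}")  # -0.60
```

In this toy dataset the entorhinal values of cases and controls overlap because of the brain-wide offsets, whereas the within-brain differences separate the two groups cleanly; noise that is specific to one region would not be removed.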
Distinguishing true from false-positive findings
With the improvement in gene-chip technology, the expression levels of >20,000 genes can be assessed. A statistical analysis comparing expression profiles between disease and control tissue must therefore contend with the high false-positive rate (the type I error) that naturally occurs when thousands of comparisons are performed (Slonim, 2002). By applying a statistical cutoff, a typical microarray study yields a laundry list of molecules, and only some of these molecules are truly relevant to the disease under investigation, whereas many others are simply false-positive findings.
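The scale of the problem follows from a one-line calculation. The sketch below is illustrative only: the function name and the thresholds are hypothetical, and it assumes a well-calibrated test, so that each gene with no true effect passes an uncorrected threshold with probability equal to that threshold. The expected count does not require expression levels to be independent, although the spread around that expectation does depend on their correlation.

```python
def expected_false_positives(n_genes, alpha, n_true_effects=0):
    """Expected number of null genes passing an uncorrected per-gene threshold.

    Assumption: each gene without a true effect passes with probability alpha.
    """
    return (n_genes - n_true_effects) * alpha


# Hypothetical thresholds applied to a 20,000-gene chip.
for alpha in (0.05, 0.001):
    count = expected_false_positives(20_000, alpha)
    print(f"alpha = {alpha}: about {count:.0f} false positives expected")
# alpha = 0.05: about 1000 false positives expected
# alpha = 0.001: about 20 false positives expected
```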
How can the type I error be effectively addressed? Acquiring data from thousands of tissue samples is considered impractical, and because expression levels are not independent events, applying simple statistical corrections is considered inappropriate. Using more sophisticated statistical methods is one general approach for dealing with the problem of multiple comparisons. For example, statistical techniques such as principal components or cluster analysis can be used, essentially looking for covariate patterns of expression levels among groups of molecules. However, given the potential discordance between mRNA and protein, as discussed below, it is not clear how the covariate pattern observed among levels of different mRNAs easily translates to a meaningful relationship at the protein level. Furthermore, attempting to extract a covariate pattern among a group of molecules might, in principle, make it difficult to pinpoint a single molecule or molecular pathway that plays a primary role in disease pathogenesis.
Relying on a more sophisticated understanding of the temporal profile of the disease under investigation is an alternative approach that has proven effective for addressing the type I error. For example, in microarray studies applied to simpler organisms, such as yeast (DeRisi et al., 1997), worms (Kim et al., 2001), and flies (White et al., 1999), the most relevant molecules have been successfully isolated by requiring that expression patterns match the temporal profile of a phenotype under investigation (Fig. 2A). In principle, the same logic can be applied to brain disorders, by assuming that the most pathogenic molecules will match the temporal profile of regional dysfunction. In Alzheimer's disease, for example, functional imaging has mapped the temporal profile of dysfunction, showing that compared with controls entorhinal dysfunction is age and time independent (de Leon et al., 2001; Small et al., 2002) (Fig. 2B). The implication is that once dysfunction occurs, it does not change with time, an interpretation supported by theoretical studies of neurodegeneration (Clarke et al., 2000). In contrast, and as another example, functional imaging studies have shown that in cognitive aging, regional dysfunction progresses linearly across the lifespan (Small et al., 2002, 2004) (Fig. 2C).
As with simpler organisms, this temporal information can be used to generate an a priori model predicting how a molecule most relevant to a particular disorder should temporally behave (Fig. 2). Then this model can be forward-applied onto a specifically tailored microarray dataset, identifying only those molecules with a temporal profile that best conforms to the model. Thus, by forcing a microarray experiment to be hypothesis driven, a temporal profile of dysfunction can be used as an analytical filter against false-positive findings.
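One way such a filter might be expressed is sketched below in Python. This is a hypothetical illustration rather than the procedure applied in any cited study: the function names, the gene labels, the expression values, and both thresholds are invented for the example. It encodes the age-independent model as a case-minus-control difference that is sizeable on average but shows little drift across the sampled age range.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def matches_age_independent_model(ages, differences, min_offset=0.3, max_drift=0.2):
    """True if the case-minus-control difference is sizeable but flat across age.

    min_offset: smallest absolute mean difference counted as an effect (hypothetical).
    max_drift: largest tolerated change in the fitted difference over the
               sampled age range (hypothetical).
    """
    slope, _ = fit_line(ages, differences)
    drift = abs(slope) * (max(ages) - min(ages))
    return abs(mean(differences)) >= min_offset and drift <= max_drift


ages = [65, 70, 75, 80, 85, 90]
genes = {  # hypothetical log2 case-minus-control differences at each age
    "gene_flat": [-0.52, -0.48, -0.55, -0.50, -0.47, -0.51],
    "gene_drift": [-0.10, -0.25, -0.40, -0.60, -0.75, -0.90],
    "gene_noise": [0.05, -0.04, 0.02, -0.03, 0.04, -0.01],
}
survivors = [name for name, diffs in genes.items()
             if matches_age_independent_model(ages, diffs)]
print(survivors)  # ['gene_flat']
```

A model of linear progression, as proposed for cognitive aging, would invert the second criterion and retain molecules whose difference changes steadily with age.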
Accordingly, in our Alzheimer's disease study, in which we anticipated the problem of false-positivity, we purposefully harvested samples from brains that covered a broad age-span. This allowed us to perform a secondary analysis on the 33 molecules originally derived from the spatial profile of Alzheimer's disease and to ask which of these molecules also conformed to its temporal profile (Fig. 2B). Only five molecules survived this analytical filter, and among these, VPS35, the core of the retromer trafficking complex (Seaman, 2005), best conformed to the spatiotemporal profile of disease (Small et al., 2005). Nevertheless, despite plausible mechanisms for Alzheimer's pathogenesis suggested by the trafficking itinerary of the retromer (Seaman, 2005), microarray findings by themselves do not support mechanistic interpretations, as discussed below.
Confirming a microarray finding
Two molecular confirmations are required of any microarray finding. The first is to confirm the basic mRNA finding using reverse transcription (RT)-PCR, the gold-standard method for quantifying mRNA. In truth, rarely have RT-PCR and microarray findings diverged, and so the requirement for RT-PCR confirmation has become more relaxed.
The second confirmation, testing the protein levels of an associated mRNA finding, is the one that has emerged as more important. Protein, not mRNA, is the meaningful end product of gene expression, and most microarray studies have implicitly assumed a simple relationship between transcription and translation (i.e., that more mRNA necessarily reflects more protein). By systematically quantifying mRNA and protein levels from the same tissue sample, a number of studies (Ideker et al., 2001; Chen et al., 2002; Greenbaum et al., 2003; Lee et al., 2003; Mehra et al., 2003; Beyer et al., 2004; Tian et al., 2004) have shown that this simple relationship does not hold true (Fig. 3A). Although in many cases there is a positive correlation between levels of mRNA and protein, often there is no correlation, and frequently a negative correlation is observed.
In retrospect, based on a more realistic model of gene expression, the complex relationship between mRNA and protein should not be surprising (Fig. 3B). Not only are transcription and translation governed by independent mechanisms and separate time constants, but mRNA and protein are likewise degraded by different pathways. Moreover, feedback mechanisms exist such that a protein that undergoes accelerated degradation might lead to increased transcription, which could account for a negative correlation between mRNA and protein. Thus, at best, a disease-related abnormality in levels of mRNA can indicate a protein abnormality, but by itself does not inform on the directionality of the defect. At worst, an abnormality in mRNA levels may not be accompanied by a coexistent protein defect.
The potential discordance between mRNA and protein levels can be addressed with relative ease: Any mRNA finding detected with microarray can, and in fact must, be tested at the protein level, for example using Western blot analysis.
Validating a microarray finding and establishing causality
A microarray experiment on human tissue is fundamentally correlative. No matter how sophisticated the experimental design or how significant the statistics, it is impossible to conclude that a molecule isolated by microarray plays a primary role in disease pathogenesis. At best, a well designed microarray experiment increases the odds of filtering out false-positive findings. However, given the analytical challenges, no microarray finding can ever be free of this shadow of doubt. Even if the finding is true (namely, that the isolated molecule is linked to the disease under investigation), the possibility always exists that a shift in expression is a secondary response to a protracted illness and is not by itself an “upstream” cause of neuronal dysfunction.
In general, three experimental approaches can be used to validate that an identified molecule, or the molecular pathway to which it belongs, plays a causal role in disease pathogenesis (Fig. 4): The first is cell-culture experiments that can be used for disorders for which there is a meaningful molecular or cellular readout (Fig. 4A). In Alzheimer's disease, for example, β-amyloid (Aβ) peptide is the biochemical “smoking gun” of the disease, and so manipulating a molecule relevant to disease pathogenesis should affect the levels of this peptide. Accordingly, as validation of our microarray study, we systematically increased or decreased the expression of retromer proteins, using expression vectors or small interfering RNA, and we did indeed observe the predicted effect on Aβ (Small et al., 2005). This finding suggests that retromer dysfunction can contribute to disease pathogenesis, but also acts to validate the spatiotemporal assumptions that went into the design and analysis of the microarray study.
Transgenic or viral vector manipulations in animals are a second approach to test for causality (Fig. 4B). In our case, we are investigating a transgenic mouse with retromer dysfunction (Lee et al., 1992) and testing whether these mice phenocopy Alzheimer's disease: at the behavioral level, testing for hippocampal-dependent memory deficits; at the electrophysiological level, testing for synaptic dysfunction; and, at the biochemical level, testing for Aβ and other indicators of disease. Manipulating the expression of a molecule of interest in a living brain has, of course, many advantages. Nevertheless, many human disorders are species-specific, suggesting that in some cases humans possess unique molecular machinery for disease pathogenesis (Hill and Walsh, 2005). Thus, animal models may not always be expected to replicate the disease process.
Returning to humans, screening for polymorphisms in the identified molecular pathway is another approach for testing causality (Fig. 4C). We have recently approached our colleagues who have genotyped a large cohort of families with late-onset Alzheimer's disease, and they are testing whether polymorphisms in any of the molecules that make up the retromer trafficking pathway increase disease risk. If so, this would provide the strongest evidence for causality. Nevertheless, this approach will only work for diseases that have a strong genetic component. In a disease driven primarily by the environment, a microarray finding might still reflect a causal process even in the absence of an identified genetic polymorphism.
After a first blush of excitement, microarray must now contend with a certain degree of backlash. Frustration is a common response to the first-generation studies, in which the most meaningful findings are typically buried in a laundry list of differentially expressed molecules. Although legitimate, this frustration does not indict the fundamental utility of microarray as a technique that can pinpoint pathogenic molecules of the brain. In fact, microarray suffers from its technical precision, sensitive enough to detect subtle mRNA differences that cause a disease but also to detect disease-independent differences in genetic heritage, environment, and regional anatomy.
In an influential review on gene mapping, Terwilliger and Goring (2000) make the compelling point that despite technical breakthroughs that have generated large amounts of genetic information and the availability of powerful computational statistics, the most important development for future discovery is “study design.” Namely, the analytical challenges and limitations inherent to genetic screening are best overcome by first establishing a more sophisticated understanding of the disease phenotype and then using this understanding to guide data acquisition and data analysis.
This conclusion also applies to gene-expression profiling. We have the technology, with microarray soon being able to measure all of the mRNA expressed in a given sample, and we have the computational power. Now, by using complementary techniques such as in vivo brain imaging, we can better “phenotype” a brain disorder: anatomically, by identifying brain regions vulnerable and resistant to a disease, and temporally, by mapping regional dysfunction over time or across age groups. Just as with Alzheimer's disease, high-resolution brain imaging can now map the spatiotemporal phenotype of many other complex disorders, by imaging the subregions of the hippocampal formation or prefrontal cortex in schizophrenia or cognitive aging, the subregions of the amygdala in autism or anxiety disorders, or the subregions of the brainstem or basal ganglia in Parkinson's disease and other movement disorders. Once established, a spatiotemporal phenotype of dysfunction can be used to improve the design of a microarray experiment, harnessing its precision and allowing microarray to fully realize its promise for isolating pathogenic molecules of the brain.
This work was supported in part by National Institutes of Health Grants AG025161 and AG08702, the Beeson Faculty Scholar Award from the American Federation of Aging, the McKnight Neuroscience of Brain Disorders Award, and the James S. McDonnell Foundation. We thank Joseph Lee for his helpful comments.
Correspondence should be addressed to Scott A. Small, Center for Neurobiology and Behavior, Columbia University, 630 West 168th Street, New York, NY 10032.
Copyright © 2005 Society for Neuroscience 0270-6474/05/2510341-06$15.00/0