Introduction
The identification of structural changes in the brain on magnetic resonance imaging (MRI) scans is increasingly important in the study of neurological and psychiatric diseases. MRI can be used to identify and exclude treatable causes of cognitive impairment and it has also become important in the differential diagnosis of disease, in tracking disease progression, and for research purposes. Pathological changes in the brain resulting in cell loss manifest as loss of brain tissue, or atrophy, which can be detected by structural MRI. Characteristic patterns of atrophy are associated with specific neurodegenerative diseases. Traditional techniques of analyzing atrophy on MRI include visual assessment by experienced radiologists and manual measurements of structures of interest. However, automated techniques have been developed which allow the assessment of atrophy across large groups of subjects without the need for time-consuming manual measurements or subjective visual assessments.
Voxel-based morphometry (VBM) is one such automated technique that has grown in popularity since its introduction (Wright et al., 1995; Ashburner and Friston, 2000), largely because it is relatively easy to use and has provided biologically plausible results. It uses statistics to identify differences in brain anatomy between groups of subjects, which in turn can be used to infer the presence of atrophy or, less commonly, tissue expansion in subjects with disease. The technique typically uses T1-weighted volumetric MRI scans and performs statistical tests across all voxels in the image to identify volume differences between groups. For example, to identify differences in patterns of regional anatomy between groups of subjects, a series of t tests can be performed at every voxel in the image. Regression analyses can also be performed across voxels to assess neuroanatomical correlates of cognitive or behavioral deficits. The technique has been applied to a number of different disorders, including neurodegenerative diseases (Whitwell and Jack, 2005), movement disorders (Whitwell and Josephs, 2007), epilepsy (Keller and Roberts, 2008), multiple sclerosis (Prinster et al., 2006; Sepulcre et al., 2006), and schizophrenia (Williams, 2008), contributing to the understanding of how the brain changes in these disorders and how brain changes relate to characteristic clinical features. Although results from VBM studies are generally difficult to validate, studies have compared results of VBM analyses to manual and visual measurements of particular structures and have shown relatively good correspondence between the techniques (Good et al., 2002; Giuliani et al., 2005; Whitwell et al., 2005; Davies et al., 2009), providing some confidence in the biological validity of VBM.
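The idea of running a t test at every voxel can be illustrated with a minimal sketch. The data are synthetic, the function name `voxelwise_t` is hypothetical, and a pooled-variance two-sample t statistic is assumed; real VBM packages fit a more general model and handle masking and image input/output.

```python
import numpy as np

def voxelwise_t(group_a, group_b):
    """Two-sample t statistic (pooled variance) at every voxel.

    group_a, group_b: arrays of shape (n_subjects, n_voxels) holding, for
    example, smoothed gray matter values after spatial normalization.
    """
    n_a, n_b = group_a.shape[0], group_b.shape[0]
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    pooled_var = ((n_a - 1) * group_a.var(axis=0, ddof=1)
                  + (n_b - 1) * group_b.var(axis=0, ddof=1)) / (n_a + n_b - 2)
    return mean_diff / np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))

# Synthetic data (assumption): 20 controls and 20 patients, 1000 voxels.
rng = np.random.default_rng(0)
controls = rng.normal(0.5, 0.05, size=(20, 1000))
patients = rng.normal(0.5, 0.05, size=(20, 1000))
patients[:, :50] -= 0.1  # simulated tissue loss in the first 50 voxels

# Positive t values indicate less tissue in patients than in controls.
t_map = voxelwise_t(controls, patients)
```

In this toy example the first 50 voxels should show large positive t values, whereas the remaining voxels should scatter around zero.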
MRI processing
In order for statistical analyses to be performed across multiple MRI scans from different individuals, the MRI scans need to be matched together spatially (i.e., registered) so that a location in one subject's MRI corresponds to the same location in another subject's MRI. This process is known as spatial normalization. It is difficult because anatomy varies considerably across subjects and heads are positioned differently in the scanner. It is generally achieved by registering all images from a study onto the same template image so that they are all in the same space. Different algorithms can be used to perform this registration (Ashburner and Friston, 2000; Davatzikos et al., 2001), but they typically include a nonlinear transformation (Ashburner and Friston, 2000). The most commonly applied algorithm available in the Statistical Parametric Mapping (SPM) software involves performing a 12-parameter affine transformation followed by a nonlinear registration using a mean squared difference matching function (Ashburner and Friston, 2000). The template image used for the spatial normalization could be one specific MRI scan or could be created by averaging across a number of different MRI scans that have been put in the same space. Customized templates that are created using the study cohort or a cohort that is matched to the study cohort in terms of age, disease status, scanner field strength, and scanning parameters are recommended for registrations that use a mean squared difference matching function to improve the normalization between each subject in the study cohort and the template (Good et al., 2001; Senjem et al., 2005).
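To make the "12-parameter affine transformation" concrete, the following sketch builds a 4 x 4 matrix from three translations, three rotations, three zooms, and three shears. The function name `affine_12`, the parameter layout, and the order in which the component matrices are multiplied are assumptions chosen for illustration; they are not taken from the SPM source code.

```python
import numpy as np

def affine_12(translation, rotation, zoom, shear):
    """Build a 4x4 affine matrix from 12 parameters.

    translation: (tx, ty, tz); rotation: angles in radians about x, y, z;
    zoom: (sx, sy, sz); shear: (hxy, hxz, hyz).
    The composition order T @ Rx @ Ry @ Rz @ Z @ S is an assumption.
    """
    rx, ry, rz = rotation
    T = np.eye(4)
    T[:3, 3] = translation
    Rx = np.eye(4)
    Rx[:3, :3] = [[1, 0, 0],
                  [0, np.cos(rx), -np.sin(rx)],
                  [0, np.sin(rx), np.cos(rx)]]
    Ry = np.eye(4)
    Ry[:3, :3] = [[np.cos(ry), 0, np.sin(ry)],
                  [0, 1, 0],
                  [-np.sin(ry), 0, np.cos(ry)]]
    Rz = np.eye(4)
    Rz[:3, :3] = [[np.cos(rz), -np.sin(rz), 0],
                  [np.sin(rz), np.cos(rz), 0],
                  [0, 0, 1]]
    Z = np.diag([zoom[0], zoom[1], zoom[2], 1.0])
    S = np.eye(4)
    S[0, 1], S[0, 2], S[1, 2] = shear
    return T @ Rx @ Ry @ Rz @ Z @ S

# Rotate a point 90 degrees about z, then shift it 5 units along x.
A = affine_12((5, 0, 0), (0, 0, np.pi / 2), (1, 1, 1), (0, 0, 0))
point = np.array([10.0, 0.0, 0.0, 1.0])  # homogeneous coordinates
moved = A @ point
```

An affine matrix of this kind can only correct global differences in position, orientation, and overall size and shape; local anatomical differences are left to the subsequent nonlinear step.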
Images are segmented into different tissue compartments (gray matter, white matter, and CSF), and analysis is performed separately on either gray or white matter, depending on the question being asked. There are a number of ways to perform the segmentation, including using prior probability maps as well as voxel intensity to guide segmentation, as in SPM. Such prior probability maps may be less biased when generated from the specific population under study. In SPM, in which a low-parameter shape transformation is performed for spatial normalization, a step called modulation is then often applied which aims to correct for volume change during the spatial normalization step (Good et al., 2001). Image intensities are scaled by the amount of contraction that has occurred during spatial normalization, so that the total amount of gray matter remains the same as in the original image. The analysis will then compare volumetric differences between scans. If the spatial normalization were perfectly precise, so that all the segmented images appeared identical, no significant differences would be detected in unmodulated data, and the analysis would reflect registration error rather than volume differences. Other techniques that use different normalization procedures, such as the RAVENS (regional volumetric analysis of brain images) method, which uses high dimensional elastic transformations using point correspondence (Davatzikos, 1998; Davatzikos et al., 2001), preserve the volume of different tissues and so do not require a separate modulation step.
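The effect of modulation can be sketched in one dimension. In this toy example, the warp `phi`, the tissue map, and the grid are all hypothetical, and the local volume change is taken as the derivative of the warp (the one-dimensional analogue of the Jacobian determinant). It is meant only to show that scaling by the local volume change preserves the total amount of tissue, not to reproduce the SPM implementation.

```python
import numpy as np

# Template-space grid on [0, 1].
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]

def native_gray_matter(y):
    """Toy native-space tissue map: gray matter occupies 0.3 <= y <= 0.5."""
    return ((y >= 0.3) & (y <= 0.5)).astype(float)

# Hypothetical smooth, invertible warp from template to native coordinates.
phi = x + 0.05 * np.sin(2.0 * np.pi * x)
jacobian = np.gradient(phi, x)        # local volume change, d(phi)/dx

warped = native_gray_matter(phi)      # normalized but unmodulated image
modulated = warped * jacobian         # intensities scaled by volume change

native_volume = 0.2                   # length of the interval [0.3, 0.5]
unmodulated_volume = warped.sum() * dx
modulated_volume = modulated.sum() * dx
```

Here the structure is stretched by the warp, so the unmodulated image overstates its size, whereas the modulated image integrates to approximately the native volume.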
Finally, the images are smoothed (Ashburner and Friston, 2000; Good et al., 2001) whereby the intensity of each voxel is replaced by the weighted average of the surrounding voxels, in essence blurring the segmented image. The number of voxels averaged at each point is determined by the size of the smoothing kernel, which can vary across studies (Rosen et al., 2002; Karas et al., 2003; Whitwell et al., 2009). Smoothing makes the data conform more closely to the Gaussian field model, which is an important assumption of VBM, renders the data more normally distributed, increasing the validity of parametric tests, and reduces intersubject variability (Ashburner and Friston, 2000; Salmond et al., 2002). Smoothing increases the sensitivity to detect changes by reducing the variance across subjects, although excessive smoothing will diminish the ability to accurately localize change in the brain.
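As a rough illustration of the smoothing step, the sketch below applies a Gaussian kernel along one dimension. It assumes that the kernel size is specified as a full width at half maximum (FWHM) in millimeters, as is common practice, and converts it to a standard deviation using sigma = FWHM / (2 sqrt(2 ln 2)). The function names and the truncation of the kernel at three standard deviations are choices made for this example.

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def gaussian_kernel(fwhm_mm, voxel_mm=1.0):
    """Normalized 1D Gaussian weights, truncated at 3 standard deviations."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    radius = int(np.ceil(3.0 * sigma_vox))
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma_vox) ** 2)
    return weights / weights.sum()

def smooth_1d(signal, fwhm_mm, voxel_mm=1.0):
    """Replace each value by a weighted average of its neighbors."""
    return np.convolve(signal, gaussian_kernel(fwhm_mm, voxel_mm), mode="same")

# A single bright voxel blurred with a small and a large kernel.
profile = np.zeros(101)
profile[50] = 1.0
smoothed_narrow = smooth_1d(profile, fwhm_mm=4.0)
smoothed_wide = smooth_1d(profile, fwhm_mm=12.0)
```

The wider kernel spreads the signal over more voxels and lowers its peak, which is the loss of spatial localization described above. Because a Gaussian is separable, a three-dimensional image could be smoothed by applying the same operation along each axis in turn.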
Although these processing steps are necessary to be able to analyze data across subjects, they can also introduce errors and variability into the analysis, which can reduce sensitivity. For example, VBM cannot differentiate real changes in tissue volume from local mis-registration of images (Ashburner and Friston, 2001; Bookstein, 2001). Normalization accuracy will vary across regions and, therefore, the ability to detect change will differ across regions. The accuracy of the segmentation will also depend on the quality of the normalization. Iterative normalization and segmentation methods have been developed which aim to optimize both procedures concurrently to improve the final segmentations (Ashburner and Friston, 2005). Segmentation errors can also occur because of displacement of tissue and partial volume effects between gray matter and CSF, which are both especially likely to occur in atrophic brains. The use of customized templates can help to minimize some of these potential errors (Good et al., 2001).
Statistical analysis of VBM results
Statistical analysis of the smoothed segmented images can be performed with parametric statistics using the general linear model and the theory of Gaussian random fields to ascertain significance (Ashburner and Friston, 2000), although nonparametric testing can also be applied (Nichols and Holmes, 2002; Ziolko et al., 2006; Rorden et al., 2007). The null hypothesis is that there is no difference in tissue volume between the groups in question. These analyses generate statistical maps showing all voxels of the brain that refute the null and show significance to a certain, user-selected, p value. These maps are often shown as color maps with the scale representing the t statistic, but can also be shown as three-dimensional (3D) surface renders of the brain or on what is known as the “glass-brain” display in which all significant voxels are displayed on an essentially transparent render (Fig. 1). Although both gray and white matter volumes can be assessed using VBM, the majority of VBM studies concentrate on gray matter. Changes in white matter integrity may be assessed more accurately using imaging techniques such as diffusion tensor imaging.
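A minimal sketch of a mass-univariate general linear model is given below. The same design matrix is fitted at every voxel by ordinary least squares, and a contrast of the parameter estimates is converted to a t statistic. The data are synthetic, the function name `glm_t_map` is hypothetical, and the design (two group indicator columns plus mean-centered age) is only one plausible choice. Random field theory and the other inference steps used in practice are not shown.

```python
import numpy as np

def glm_t_map(data, design, contrast):
    """t statistic for a contrast, computed independently at every voxel.

    data: (n_subjects, n_voxels); design: (n_subjects, n_regressors);
    contrast: (n_regressors,).
    """
    beta, _, rank, _ = np.linalg.lstsq(design, data, rcond=None)
    residuals = data - design @ beta
    dof = data.shape[0] - rank                    # residual degrees of freedom
    sigma2 = (residuals ** 2).sum(axis=0) / dof   # residual variance per voxel
    c = np.asarray(contrast, dtype=float)
    var_factor = c @ np.linalg.pinv(design.T @ design) @ c
    return (c @ beta) / np.sqrt(sigma2 * var_factor)

# Synthetic cohort (assumption): 15 controls, 15 patients, 500 voxels.
rng = np.random.default_rng(1)
n_per_group, n_voxels = 15, 500
is_patient = np.repeat([0.0, 1.0], n_per_group)
age = rng.uniform(55, 80, size=2 * n_per_group)
age_centered = age - age.mean()
design = np.column_stack([1.0 - is_patient, is_patient, age_centered])

data = (0.5 - 0.002 * age_centered[:, None]
        + rng.normal(0.0, 0.03, size=(2 * n_per_group, n_voxels)))
data[is_patient == 1, :25] -= 0.08                # simulated atrophy

# Contrast "controls minus patients", adjusting for age.
t_map = glm_t_map(data, design, contrast=[1, -1, 0])
```

With only the two group columns in the design, this contrast reduces to the pooled-variance two-sample t test; adding covariates such as age is what the general linear model formulation makes straightforward.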
Because the statistical tests are performed across a very large number of voxels, it is important that studies correct for multiple comparisons to prevent the occurrence of false positives. Two methods typically used for such a correction are the family-wise error (FWE) correction (Friston et al., 1993) and the more lenient false discovery rate (FDR) correction (Genovese et al., 2002), both of which reduce the chance of false-positive results (www.fil.ion.ucl.ac.uk). The FWE correction controls the chance of any false positives (as in Bonferroni methods) across the entire volume, whereas the FDR correction controls the expected proportion of false positives among suprathreshold voxels. A number of studies have also used what is called a small volume correction to reduce the number of comparisons being performed and increase the chance of identifying significant results in particular regions of interest. This method typically involves placing regions of interest over particular structures and only performing analysis over these regions. The placement of these regions should be hypothesis driven and ideally based on previous work.
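The difference between the two kinds of correction can be sketched on a short list of p values. Here a plain Bonferroni threshold stands in for FWE control (an assumption made for simplicity; random field theory is typically used for FWE correction in VBM), and the Benjamini-Hochberg step-up procedure is used for FDR control. The p values and function names are made up for illustration.

```python
import numpy as np

def bonferroni_reject(p_values, alpha=0.05):
    """Control the chance of any false positive across all tests."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg_reject(p_values, q=0.05):
    """Control the expected proportion of false positives among rejections."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p value with its threshold q * k / m.
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank that passes
        reject[order[:k + 1]] = True      # reject everything up to that rank
    return reject

# Hypothetical p values from 10 voxels.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.06, 0.074, 0.205, 0.212, 0.216]
n_fwe = int(bonferroni_reject(p_values).sum())
n_fdr = int(benjamini_hochberg_reject(p_values).sum())
```

On these values the FDR procedure retains more voxels than the Bonferroni threshold, which reflects its more lenient nature. Restricting the analysis to a small volume reduces the number of tests, which relaxes both thresholds.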
Interpreting VBM results
Interpreting data across VBM studies is a problem because there are a large number of factors that can vary and influence the results. First, the processing steps often vary across studies (Whitwell and Jack, 2005), with studies using different degrees of smoothing and different registration and segmentation algorithms. Second, as well as having different options for correction for multiple comparisons, there are no standard conventions for what p value to apply to each statistical analysis, leading to variability across studies. It is important to understand that by changing the p value and using different corrections for multiple comparisons, the number of voxels that exceed the significance threshold will change, and this could potentially change the final conclusions of the study (Fig. 2). Studies also vary greatly in the number of subjects included in both control and disease cohorts, which in turn can have a large effect on the resulting p values. As with traditional statistical tests, the power to detect differences between groups will typically be a function of the sample size, the degree of the investigated “effect,” and the error probability. Therefore, the larger the sample size, the greater the power to detect differences, although differences can be observed with smaller cohorts if the effect size is large. Consequently, studies with larger sample sizes will typically be able to apply the harsh FWE correction for multiple comparisons, whereas smaller studies may favor the more lenient FDR correction. The resulting power will also depend on errors introduced in the image processing steps and variability across subjects. There are also many potential confounders that can influence the results of a VBM study, for example, differences in age, gender ratios, or disease severity across groups. These potential confounders need to be properly addressed in any study design to be able to make appropriate conclusions concerning the results.
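The dependence of power on sample size, effect size, and error probability can be sketched with a normal approximation for a two-sided, two-sample comparison at a single voxel. The function name `approx_power`, the use of a standardized effect size, and the example values are assumptions for illustration; the approximation ignores the far tail of the test and the additional variability introduced by image processing.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    effect_size: group difference divided by the common standard deviation.
    Uses a normal approximation and ignores the opposite tail.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    noncentrality = effect_size * sqrt(n_per_group / 2.0)
    return z.cdf(noncentrality - z_crit)

# Power at a lenient and at a much stricter threshold for several cohort sizes.
for n in (10, 20, 40, 80):
    lenient = approx_power(0.5, n, alpha=0.05)
    strict = approx_power(0.5, n, alpha=0.001)
    print(n, round(lenient, 2), round(strict, 2))
```

For a fixed effect size, power rises with the number of subjects per group and falls as the threshold becomes stricter, which is why harsher corrections for multiple comparisons are easier to apply in larger cohorts.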
Given all this potential variability, a comparison of t statistics or p values across studies does not tell us anything biologically meaningful, and only provides anecdotal evidence for differences between diseases and different cohorts of the same disease. Ideally, different patient cohorts should be analyzed in the same statistical model using the same processing techniques and analysis strategies, or at the very least, standardized reporting should be implemented. Currently, there are several sociological obstacles to such analyses, but projects such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) (The Alzheimer's Disease Neuroimaging Initiative, 2008) may pave the way toward making better use of data. Nevertheless, it is still important that studies provide adequate detail on how they performed their statistical analysis, as well as their preprocessing, in order for the reader to be able to correctly interpret the results (Ridgway et al., 2008).
Summary
In summary, VBM, if implemented correctly, is a powerful and useful tool in the study of neurological disease. It can increase understanding of disease processes, which can be useful both from a scientific point of view and also by providing anatomical information that can be helpful for differential diagnosis of disease. Similar voxel-level statistical techniques can also be applied to other imaging modalities, such as functional MRI and positron emission tomography. It should be stressed, however, that because of the statistical nature of the technique, the power of VBM lies in group analyses. Although it has been applied to single subjects, it has not been optimized or validated for such use. Hence it can provide very important information about regions of atrophy across groups but cannot provide reliable information for single-subject diagnosis. Nevertheless, it is likely to be an important biomarker in future drug trials to assess treatment effects at the group level.
Footnotes
- Editor's Note: Toolboxes are intended to briefly highlight a new method or a resource of general use in neuroscience or to critically analyze existing approaches or methods. For more information, see http://www.jneurosci.org/misc/itoa.shtml.
- I acknowledge Dr. Keith A. Josephs and Dr. Clifford R. Jack for their helpful comments and suggestions.
- Correspondence should be addressed to Dr. Jennifer L. Whitwell, Assistant Professor of Radiology, Department of Radiology, Mayo Clinic, 200 1st St SW, Rochester, MN 55905. whitwell.jennifer{at}mayo.edu