An important question for research on audiovisual integration in humans is whether multisensory information is brought together in the primary sensory or association areas of the cortex. For example, can auditory information activate primary visual cortex directly, or must it first be processed by the primary auditory cortex and higher-order association areas? Studying the information flow of audiovisual processing in the human brain is crucial for discovering the neural mechanisms of audiovisual integration.
Although many electroencephalography (EEG) studies have investigated the temporal aspects of brain processing during audiovisual integration, the limited spatial resolution of EEG cannot resolve the propagation route across brain regions in detail. Functional magnetic resonance imaging (fMRI), in contrast, offers high spatial but relatively low temporal resolution, and most previous fMRI studies have emphasized the spatial localization of brain activity during audiovisual processing. So far, only a few fMRI studies have investigated the temporal sequence of brain activations, mainly within the framework of the general linear model (GLM) (for review, see Formisano and Goebel, 2003). A recent fMRI study by Fuhrmann Alpert et al. (2008), published in The Journal of Neuroscience, focused on the temporal characteristics of audiovisual processing. The authors used mutual information to assess the relative timing of activations in different brain areas under simultaneous audiovisual (AV) stimulation as well as separate auditory and visual stimulation.
Mutual information measures the statistical interdependence of two random variables, here a particular stimulus condition (e.g., AV stimulation) and the blood oxygenation level-dependent (BOLD) response: a higher mutual information value implies that the BOLD signal is more predictable from the preceding stimuli. Compared with the conventional GLM, mutual information has two advantages: it captures nonlinear as well as linear relationships between two random variables, and it requires no prior assumption about the shape of the relationship [e.g., the hemodynamic response function (HRF)] (Fuhrmann Alpert et al., 2007). By estimating the mutual information between the preceding stimulus condition and the BOLD response for each voxel and each latency after stimulus onset, this approach can detect both brain activation and the preferred latency, that is, the latency at which the BOLD signal carries the most information about the preceding stimuli. Assuming that the preferred latency reflects brain processing time, the temporal sequence of brain activity can be revealed by comparing the preferred latencies of different brain regions.
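To make the latency-mapping idea concrete, here is a minimal Python sketch for a single voxel. It assumes a plug-in (histogram) estimator of I(S;B) = Σ p(s,b) log2[p(s,b) / (p(s)p(b))], equal-width binning of the BOLD signal, and lags counted in whole scans. The function names (`mutual_information`, `discretize`, `preferred_latency`) and the toy data are hypothetical; the estimator, binning, and sampling actually used by Fuhrmann Alpert et al. may differ.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def discretize(values, n_bins):
    """Equal-width binning of a continuous signal into integer labels 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a flat signal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def preferred_latency(onsets, conditions, bold, max_lag, n_bins=4):
    """Return (best_lag, mi_per_lag) for one voxel; lags are counted in scans."""
    labels = discretize(bold, n_bins)
    mi_per_lag = []
    for lag in range(max_lag + 1):
        # Pair each trial's condition with the binned BOLD sample `lag` scans after onset.
        pairs = [(c, labels[t + lag]) for t, c in zip(onsets, conditions)
                 if t + lag < len(labels)]
        cs, bs = zip(*pairs)
        mi_per_lag.append(mutual_information(cs, bs))
    best_lag = max(range(len(mi_per_lag)), key=mi_per_lag.__getitem__)
    return best_lag, mi_per_lag


# Toy voxel: onsets every 10 scans, alternating "AV" and "none" trials;
# only "AV" trials evoke a response, 3 scans after onset.
onsets = list(range(0, 100, 10))
conditions = ["AV" if i % 2 == 0 else "none" for i in range(len(onsets))]
bold = [0.0] * 100
for t, c in zip(onsets, conditions):
    if c == "AV":
        bold[t + 3] = 1.0

best, mi = preferred_latency(onsets, conditions, bold, max_lag=5)
print(best, [round(v, 3) for v in mi])  # → 3 [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Only at lag 3 does the binned BOLD sample separate the two conditions, so the mutual information there is 1 bit and that lag is reported as the preferred latency; a real analysis would also need a significance test for each voxel.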
Fuhrmann Alpert et al. (2008) found that AV-related activity occurs earliest in the primary auditory and visual cortices and later in the inferior frontal cortex [Fuhrmann Alpert et al. (2008), their Figs. 3 (http://www.jneurosci.org/cgi/content/short/28/20/5344/F3) and 4 (http://www.jneurosci.org/cgi/content/short/28/20/5344/F4)]. This finding suggests a bottom-up flow of information during audiovisual processing. Importantly, these findings, as well as the observation that auditory areas are activated earlier than visual areas, are consistent with previous EEG studies. This agreement supports the feasibility and validity of the information-theoretic method for extracting temporal information from fMRI data, and the study may stimulate further interest in using fMRI to investigate the temporal sequence of brain activation.
Another interesting finding of this study is that the latencies of activity in the primary auditory and visual cortices are shorter for AV stimulation than for auditory or visual stimulation alone [Fuhrmann Alpert et al. (2008), their Fig. 4 (http://www.jneurosci.org/cgi/content/short/28/20/5344/F4)]. This finding suggests that simultaneous audiovisual stimulation facilitates activity in early sensory areas. Indeed, a growing number of EEG and fMRI studies have observed cross-modal effects in unimodal or even primary cortical areas (for review, see Driver and Noesselt, 2008). Several hypotheses about the underlying neural mechanism have been proposed: (1) direct interactions between early sensory areas, (2) nonspecific thalamic inputs, and (3) feedback from higher-level multimodal areas. Combined with evidence that a unimodal primary sensory cortex can be activated by stimuli from another modality (Kayser et al., 2007), the facilitation of activity in primary sensory areas at early latencies in this study suggests direct connections between primary sensory areas or nonspecific thalamic inputs (hypothesis 1 or 2).
It should be noted, however, that the observed facilitation might not reveal the full extent of information flow during audiovisual processing, because each voxel was assigned only the latency with the highest mutual information value, and all other latencies were discarded. Bottom-up and top-down processing may coexist in some regions but occur at different latencies, in which case the voxels involved in both would show two mutual information peaks. If separate peaks reflecting top-down processing could be identified, the method might provide a fuller picture of the temporal dynamics of brain activation.
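One way to look for such separate peaks is to keep every local maximum of the mutual-information-versus-latency curve instead of only the global maximum. The sketch below is hypothetical: the function name `local_peaks`, the `min_height` threshold, and the example curve are illustrative, not values from the study, and it ignores plateaus and noise, which a real analysis would need to handle (e.g., by smoothing or a significance threshold).

```python
def local_peaks(mi_per_lag, min_height=0.0):
    """Return the lags that are strict local maxima of an MI-vs-latency curve."""
    peaks = []
    for i, v in enumerate(mi_per_lag):
        left = mi_per_lag[i - 1] if i > 0 else float("-inf")
        right = mi_per_lag[i + 1] if i + 1 < len(mi_per_lag) else float("-inf")
        if v > left and v > right and v >= min_height:
            peaks.append(i)
    return peaks


# Illustrative curve with an early and a late peak (made-up values).
curve = [0.0, 0.2, 0.6, 0.3, 0.1, 0.25, 0.4, 0.2]
argmax_only = max(range(len(curve)), key=curve.__getitem__)
peaks = local_peaks(curve, min_height=0.05)
print(argmax_only, peaks)  # → 2 [2, 6]
```

Assigning only the argmax would report lag 2 and discard the later peak at lag 6, which is exactly the information the paragraph above suggests could reflect top-down processing.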
Although the findings of this study suggest that this novel method is a promising tool for exploring the temporal dynamics of brain activity, the following methodological and analytical considerations complicate an unequivocal interpretation of the results.
First, the timing of the stimulus presentations and the time course of the BOLD response make a temporal analysis difficult. The authors have argued previously that their method could be generalized to sustained stimuli or short intertrial intervals (ITIs) because it does not require any assumption of linearity between the stimulus and the BOLD response (Fuhrmann Alpert et al., 2007). However, with short ITIs, more than one stimulus may contribute to the BOLD signal at short latencies: the BOLD response is sluggish and sustained, so the response to one stimulus has not returned to baseline when the next stimulus evokes a subsequent response. Consequently, the BOLD response to a given stimulus may be contaminated by the responses to adjacent stimuli. This contamination may temporally shift the peak BOLD response to the given stimulus, and thus the peak latency of mutual information. More importantly, the peak latency of mutual information may shift differently in different brain regions, which would bias the comparison of preferred latencies across regions. An event-related design with sufficiently long ITIs could circumvent this problem. In their experimental paradigm, however, Fuhrmann Alpert et al. (2008) used a fixed, short ITI of 2 s, which does not avoid contamination from adjacent stimuli. Their results therefore need to be confirmed by future experiments using longer ITIs.
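The overlap problem can be illustrated with a toy simulation. The sketch below assumes a canonical-style double-gamma HRF (peak shape 6, undershoot shape 16, undershoot ratio 1/6, time in seconds) and, purely for illustration, linear superposition of responses; these parameters and the function names are assumptions, not taken from the study. It compares the peak time of the response to an isolated stimulus with the peak of the measured signal when a second stimulus follows 2 s later, the ITI used by Fuhrmann Alpert et al. (2008).

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma density, zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (scale ** shape * math.gamma(shape))


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1 / 6):
    """Canonical-style double-gamma HRF (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def measured_bold(t, onsets, amplitudes):
    """Signal at time t as a linear sum of one HRF per stimulus (simplifying assumption)."""
    return sum(a * double_gamma_hrf(t - o) for o, a in zip(onsets, amplitudes))


def peak_time(onsets, amplitudes, t_max=20.0, dt=0.1):
    """Time of the largest sample on a regular grid from 0 to t_max seconds."""
    grid = [i * dt for i in range(int(round(t_max / dt)) + 1)]
    return max(grid, key=lambda t: measured_bold(t, onsets, amplitudes))


single_peak = peak_time([0.0], [1.0])             # stimulus of interest alone
overlap_peak = peak_time([0.0, 2.0], [1.0, 1.0])  # next stimulus 2 s later
print(round(single_peak, 1), round(overlap_peak, 1))
```

Under these assumptions the isolated response peaks near 5 s, while the summed signal peaks roughly a second later, so a latency read off the measured signal no longer reflects the response to the first stimulus alone. Different HRF shapes in different regions would make this shift region dependent.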
Second, the findings are based on the hemodynamic response, so caution is warranted when interpreting them in terms of the underlying neural activity. Different brain areas are known to have different HRFs (Lee et al., 1995). In their previous study (Fuhrmann Alpert et al., 2007), the authors argued that one advantage of the information-theoretic approach is that it requires no assumption about the shape of the HRF. For activation detection, this is indeed an advantage over the conventional GLM. For latency analysis, however, whether or not an HRF shape is assumed, the variability of the underlying HRF across brain areas still introduces a gap between neural activity and the corresponding hemodynamic response. As the authors pointed out in their previous study (Fuhrmann Alpert et al., 2007), the HRFs in some prefrontal areas have atypical shapes and delayed peak latencies compared with those in motor areas. This further raises the possibility that differences in HRFs confound the differences in preferred latencies between brain areas (e.g., the shorter latency in the primary auditory cortex and the longer latency in the inferior frontal cortex). Given the extensive electrophysiological support for the findings of this study, together with evidence that fMRI can adequately trace sequences of neural events (Menon and Kim, 1999), the HRF differences may be relatively small compared with the time delays between neural activities in different brain areas. Nevertheless, the latency differences observed for some brain areas in this study might be at least partly due to differences in HRFs, which complicates the interpretation of the results.
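The confound can be made explicit with a toy example. Assuming single-gamma HRFs whose peak falls at (shape - 1) * scale seconds, the hypothetical sketch below gives region A a faster HRF (shape 6, peak about 5 s) but a neural onset 0.5 s later than region B, which has a slower HRF (shape 8, peak about 7 s). The shapes, onsets, and names are illustrative assumptions, not HRF estimates from any study.

```python
import math


def gamma_hrf(t, shape, scale=1.0):
    """Single-gamma HRF; for shape > 1 its peak falls at (shape - 1) * scale seconds."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (scale ** shape * math.gamma(shape))


def bold_peak_latency(neural_onset, shape, t_max=30.0, dt=0.01):
    """Grid-search the time (s) at which the BOLD response to a brief neural event peaks."""
    grid = [i * dt for i in range(int(round(t_max / dt)) + 1)]
    return max(grid, key=lambda t: gamma_hrf(t - neural_onset, shape))


# Hypothetical regions: region A's neurons respond 0.5 s AFTER region B's,
# but region A has the faster HRF.
region_a = bold_peak_latency(neural_onset=0.5, shape=6.0)  # BOLD peak near 5.5 s
region_b = bold_peak_latency(neural_onset=0.0, shape=8.0)  # BOLD peak near 7.0 s
print(f"BOLD order: A first by {region_b - region_a:.2f} s, although B fired first")
```

In this contrived case, reading the BOLD peak order as the neural order would invert the true sequence. Whether real HRF differences are large enough to do this, rather than small relative to genuine neural delays, is precisely the open question raised above.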
In future work, simultaneous fMRI and high-density EEG recordings, or magnetoencephalography studies using the same experimental design, may provide more reliable and direct evidence of the validity of this method and might also provide complementary information about the temporal aspects of audiovisual processing. Given the importance of understanding the neural mechanisms of audiovisual integration, further effort is needed to unravel the temporal dynamics of audiovisual processing.
This work was supported by the Volkswagen-Stiftung. We thank Karl Magnus Petersson and Michael Lee for constructive comments and corrections on this manuscript.
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
Correspondence should be addressed to either of the following: Meng Liang, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK; or Tessa M. van Leeuwen, F.C. Donders Centre for Cognitive Neuroimaging, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands.