Abstract
Current models of brain organization include multisensory interactions at early processing stages and within low-level, including primary, cortices. Embracing this model with regard to auditory–visual (AV) interactions in humans remains problematic. Controversy surrounds the application of an additive model to the analysis of event-related potentials (ERPs), and conventional ERP analysis methods have yielded discordant latencies of effects and permitted limited neurophysiologic interpretability. While hemodynamic imaging and transcranial magnetic stimulation studies provide general support for the above model, the precise timing, superadditive/subadditive directionality, topographic stability, and sources remain unresolved. We recorded ERPs in humans to attended, but task-irrelevant stimuli that did not require an overt motor response, thereby circumventing paradigmatic caveats. We applied novel ERP signal analysis methods to provide details concerning the likely bases of AV interactions. First, nonlinear interactions occur at 60–95 ms after stimulus and are the consequence of topographic, rather than pure strength, modulations in the ERP. AV stimuli engage distinct configurations of intracranial generators, rather than simply modulating the amplitude of unisensory responses. Second, source estimations (and statistical analyses thereof) identified primary visual, primary auditory, and posterior superior temporal regions as mediating these effects. Finally, scalar values of current densities in all of these regions exhibited functionally coupled, subadditive nonlinear effects, a pattern increasingly consistent with the mounting evidence in nonhuman primates. In these ways, we demonstrate how neurophysiologic bases of multisensory interactions can be noninvasively identified in humans, allowing for a synthesis across imaging methods on the one hand and species on the other.
Introduction
Multisensory research has significantly revamped the conceptualization of how information from the different senses interacts (Wallace et al., 2004; Ghazanfar and Schroeder, 2006; Stein and Stanford, 2008). In the case of auditory–visual (AV) interactions, primates possess the requisite anatomy (Falchier et al., 2002, 2010; Rockland and Ojima, 2003; Cappe and Barone, 2005; Cappe et al., 2009a) and exhibit the predicted neurophysiology (Kayser et al., 2007, 2008; Wang et al., 2008) for AV convergence and interactions at early poststimulus latencies and in low-level brain regions, including primary cortices. In humans, however, embracing this new model is challenged by methodological criticisms that in turn dramatically impact the purported timing of nonlinear interactions and limit the neurophysiologic interpretability of the results.
Event-related potential (ERP) studies of AV interactions have reported nonlinear neural response interactions within the initial ∼50 ms after stimulus onset during detection (Molholm et al., 2002) and discrimination tasks (Giard and Peronnet, 1999), based on the comparison of multisensory responses to the summed responses from the constituent unisensory conditions (i.e., the additive model). These findings and more generally the use of an additive model to test for multisensory effects with ERPs have been criticized, in part due to so-called “common” activity, including anticipatory potentials and motor responses (Teder-Sälejärvi et al., 2002; Gondan and Röder, 2006) [though see Vidal et al. (2008) for early effects during passive conditions]. Efforts to control for these caveats have delayed onset of nonlinear interactions to ∼100 ms or later (Teder-Sälejärvi et al., 2002; Gondan and Röder, 2006). Consequently, the latency at which nonlinear interactions commence remains undetermined and was a primary focus here.
The likely underlying neurophysiology also remains unknown. One aspect is whether nonlinearities are superadditive or subadditive. Analyses of voltage waveforms do not unequivocally disambiguate the directionality of effects due to their dependence on the choice of the recording reference (Murray et al., 2008). In animal models, there is an increasing recognition of the preponderance of subadditive effects (Laurienti et al., 2005; Kayser et al., 2008; Angelaki et al., 2009). Another is whether nonlinearities stem from modulations in response strength and/or response topography, the latter of which would indicate the recruitment of distinct configurations of brain generators during multisensory processing (albeit potentially also within low-level cortices). Quantitative analyses of both the direction and topographic stability of nonlinear interactions remain undone. Finally, the sources contributing to nonlinear interactions are undefined. Dipolar source estimations of effects at late (>100 ms) latencies implicate either midbrain structures (Fort et al., 2002a) or inferior occipitotemporal and superior temporal cortices (Teder-Sälejärvi et al., 2002). Although fMRI and magnetoencephalography identified low-level unisensory areas as well as higher-level association cortices (Calvert et al., 1999; Beauchamp et al., 2004; Martuzzi et al., 2007; Raij et al., 2010), no consensus exists regarding the correct statistical criteria for identifying multisensory regions (Calvert, 2001; Beauchamp, 2005; Laurienti et al., 2005).
We addressed these controversies and the neurophysiologic basis of AV interactions in humans by taking advantage of recent advances in ERP signal analysis and source estimation methods that also provide statistically based neurophysiologic interpretability in terms of modulations in response strength versus generator configurations (Michel et al., 2004; Murray et al., 2008). ERPs were analyzed in response to task-irrelevant stimuli that required attention but no motor responses (Cappe et al., 2009b).
Materials and Methods
Participants and paradigm
Twelve healthy individuals (aged 18–29 years: mean = 24 years; 4 women and 8 men), who reported normal hearing and normal or corrected-to-normal vision, participated. Eleven of the twelve subjects were right handed (Oldfield, 1971). All participants provided written informed consent to the procedures, which were approved by the Ethics Committee of the Faculty of Biology and Medicine of the Centre Hospitalier Universitaire Vaudois and University of Lausanne. The data presented in this study were acquired in the context of a broader experiment involving the go/no-go detection of moving versus static stimuli that were auditory, visual, or multisensory auditory–visual (A, V, and AV, respectively). Full details about this experiment can be found in Cappe et al. (2009b). Briefly, the perceived motion was either approaching or receding. Specific conditions were generated using the full set of combinations of motion type (approaching, receding, and static), sensory modality (auditory, visual, and multisensory), and in the case of multisensory stimuli the congruence in motion type. There were thus 15 configurations of stimuli in total (6 unisensory and 9 multisensory). Go trials (i.e., those on which either or both sensory modalities contained moving stimuli) occurred 80% of the time. Each of the 15 conditions was repeated 252 times across 18 blocks of randomly intermixed trials. In the present study, we focused our analyses on the three static conditions: auditory static (AS), visual static (VS) and auditory–visual static (AVS) conditions.
The VS stimulus consisted of a centrally presented 10° diameter disc (either black on a white background or white on a black background, counterbalanced across blocks of trials) that was presented for 500 ms. The AS stimulus was a 1000 Hz complex tone (44.1 kHz sampling; 500 ms duration with 10 ms linear amplitude enveloping at sound onset and offset to avoid clicks) composed of a triangular waveform and generated with Adobe Audition software (Adobe Systems). Auditory stimuli were presented via insert earphones (Etymotic model ER4S) at 77 dB SPL in intensity. AVS stimuli were simultaneous presentations of the AS and VS stimuli. The interstimulus interval ranged from 800 to 1400 ms and varied pseudorandomly across trials and conditions. As these conditions constituted the no-go trials in our experiment, no motor response was required. However, the task required participants to attend to both sensory modalities. We consider it unlikely that the present effects can be explained in terms of attention/arousal. A frequent observation in studies of selective attention is that attending to information in one sensory modality results in a decreased response within regions attributed to other sensory modalities (Laurienti et al., 2002). In the present paradigm, however, subjects were attending (but not responding on the trials analyzed in this study) to both sensory modalities simultaneously, with the task requiring the detection of movement in either audition or vision (cf. Cappe et al., 2009b). Stimulus delivery and response collection were controlled by E-prime software (Psychology Software Tools).
EEG acquisition and analyses
Continuous EEG was acquired at 1024 Hz through a 160-channel Biosemi ActiveTwo AD-box referenced to the common mode sense (CMS; active electrode) and grounded to the driven right leg (DRL; passive electrode), which functions as a feedback loop driving the average potential across the electrode montage to the amplifier zero. Peristimulus epochs of EEG (−100 ms to 500 ms after stimulus onset) were averaged for each stimulus condition and from each subject to calculate ERPs for the AS, VS, and AVS conditions separately. In addition to a ±80 μV artifact rejection criterion, EEG epochs containing eye blinks or other noise transients were removed based on trial-by-trial inspection of the data. On average, 230 trials were accepted per condition. Before group averaging, data from artifact electrodes of each subject were interpolated (Perrin et al., 1987). Data were baseline corrected using the prestimulus period, bandpass filtered (0.18–60.0 Hz and using a notch at 50 Hz), and recalculated against the average reference.
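The core of the averaging pipeline described above (amplitude-based rejection, averaging, prestimulus baseline correction) can be sketched as follows. This is a minimal illustration and not the authors' Cartool workflow: the function names, the nested-list data layout (`epochs[trial][channel][sample]`), and the parameter `n_baseline` are assumptions, and the visual inspection, interpolation, filtering, and re-referencing steps are omitted.

```python
def is_clean(epoch, threshold_uv=80.0):
    """True if no sample on any channel exceeds +/- threshold_uv."""
    return all(abs(v) <= threshold_uv for channel in epoch for v in channel)


def erp_from_epochs(epochs, n_baseline, threshold_uv=80.0):
    """Average accepted epochs and subtract the prestimulus mean per channel.

    epochs: hypothetical layout epochs[trial][channel][sample], in microvolts.
    n_baseline: number of leading samples that form the prestimulus period.
    """
    kept = [e for e in epochs if is_clean(e, threshold_uv)]
    if not kept:
        raise ValueError("no epochs survived artifact rejection")
    n_channels, n_samples = len(kept[0]), len(kept[0][0])
    erp = []
    for c in range(n_channels):
        # Average across accepted trials at every sample.
        mean_wave = [sum(e[c][t] for e in kept) / len(kept) for t in range(n_samples)]
        # Baseline-correct using the prestimulus samples.
        baseline = sum(mean_wave[:n_baseline]) / n_baseline
        erp.append([v - baseline for v in mean_wave])
    return erp, len(kept)
```

With data sampled at 1024 Hz, a 100 ms prestimulus period would correspond to roughly 102 samples for `n_baseline`.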
General analysis strategy.
Our analyses here are based on the application of an additive model to detect nonlinear neural response interactions, wherein the ERP in response to the AVS condition is contrasted with the summed ERPs in response to the AS and VS conditions (hereafter referred to as “pair” and “sum” ERPs, respectively). Such a model has been repeatedly applied in ERP and magnetoencephalography studies in humans (Miniussi et al., 1998; Giard and Peronnet, 1999; Foxe et al., 2000; Murray et al., 2001, 2005; Molholm et al., 2002; Besle et al., 2004; Möttönen et al., 2004; Brefczynski-Lewis et al., 2009; Sperdin et al., 2009, 2010; Raij et al., 2010) as well as electrophysiological investigations in nonhuman primates (Meredith and Stein, 1986; Stein and Meredith, 1993; Wallace et al., 1996; Wallace and Stein, 2007; Kayser et al., 2008). Despite its widespread application, this model nonetheless receives some criticism (Gondan and Röder, 2006), which has been refuted on theoretical and empirical grounds (Fort et al., 2002a; Besle et al., 2004). One criticism, for example, is based on the fact that the majority of prior studies analyzed conditions that also required a motor response. Consequently, the “sum” response contains two motor responses, whereas the “pair” ERP contains only one. This difference, it has been argued, can erroneously lead to nonlinear effects, particularly at latencies that encompass (pre)motor activity. This confound would also likely manifest as a subadditive effect (due to the abovementioned difference in the number of contributing motor responses) and potentially as a shift in response latency due to the reliably faster reaction times to multisensory stimuli. This criticism is circumvented in the present study by our analysis of trials requiring attention, but no motor response. Another criticism of studies using the additive model concerned the insufficient control of attention (Besle et al., 2004). 
Importantly, participants performed the same task for the three modalities (A, V, and AV) in the present study, and the static stimuli on which we focused our analyses here required attention (regardless of sensory modality) to attain a high level of performance. Thus, an additive model is well suited here to the analysis of auditory–visual interactions.
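The additive-model contrast can be illustrated with a short sketch. The function name and the data layout (a list of time points, each holding one voltage per electrode) are assumptions made for illustration only.

```python
def additive_model_contrast(erp_a, erp_v, erp_av):
    """Return the "pair" ERP, the "sum" ERP, and their difference.

    Each argument is a hypothetical list of time points, each of which is a
    list of per-electrode voltages for one subject.
    """
    pair = erp_av
    # "sum" ERP: point-by-point sum of the two unisensory responses.
    summed = [[a + v for a, v in zip(row_a, row_v)]
              for row_a, row_v in zip(erp_a, erp_v)]
    # A nonzero difference indicates a nonlinear interaction.
    diff = [[p - s for p, s in zip(row_p, row_s)]
            for row_p, row_s in zip(pair, summed)]
    return pair, summed, diff
```

The sign of `diff` at a single electrode depends on the recording reference, so it should not by itself be read as superadditive or subadditive.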
A particular interest of the present study was the determination of whether early nonlinear interactions could be characterized as superadditive or subadditive. Prior ERP studies did not specifically address this issue and could not statistically ascertain such due to the influence of the reference electrode choice on the polarity of voltage waveforms [though some studies did display, but not analyze, topographic distributions (Giard and Peronnet, 1999; Fort et al., 2002a; Molholm et al., 2002; Vidal et al., 2008)]. The importance of resolving this issue is highlighted by the mounting evidence in animal models documenting the prevalence of subadditive interactions that result in enhanced information content of responses (Bizley et al., 2007; Angelaki et al., 2009; Kayser et al., 2009). The topic of superadditive and subadditive interactions is likewise often intertwined with discussion of the applicability of the principle of inverse effectiveness, which would stipulate that unisensory stimulus conditions presented alone that are more effective in generating a neural response would more likely result in additive or even subadditive interactions when presented simultaneously as a multisensory pair (Stanford et al., 2005; Holmes, 2009; Cappe et al., 2010). Given that most ERP and fMRI studies of AV interactions involving rudimentary stimuli (i.e., tones/noises and shapes/flashes) in humans have presented loud and high-contrast stimuli (i.e., suprathreshold stimuli), subadditive interactions would be expected to have been occurring (though as mentioned above were not statistically assessed) and are anticipated in the present study. 
Subadditive effects would also be predicted from a neurophysiologic standpoint, because the majority of multisensory neurons exhibit either additive or subadditive effects both using single-unit activity in the superior colliculus (Perrault et al., 2005) and using either single/multiunit activity or local field potentials within auditory cortices (Kayser et al., 2008).
Multisensory interactions were identified with a multistep analysis procedure, which we refer to as electrical neuroimaging. Electrical neuroimaging examines local as well as global measures of the electric field at the scalp. This procedure has been described in detail previously (Murray et al., 2008). Briefly, it entails analyses of response strength and response topography to differentiate effects due to modulation in the strength of responses of statistically indistinguishable brain generators from alterations in the configuration of these generators (viz., the topography of the electric field at the scalp), as well as latency shifts in brain processes across experimental conditions. In addition, we used the local autoregressive average distributed linear inverse solution (LAURA) (Grave de Peralta Menendez et al., 2001, 2004) to visualize and statistically contrast the likely underlying sources of effects identified in the preceding analysis steps. All analyses were performed using the freely available software Cartool.
ERP waveform modulations.
At a first level, we contrasted the pair and sum ERPs from each electrode as a function of time after stimulus onset in a series of pairwise comparisons (t tests). For this analysis, only effects with p values ≤0.05 for at least 15 contiguous data points were considered reliable [equivalent to >15 ms for data acquired at 1024 Hz (Guthrie and Buchwald, 1991)]. The results of this analysis are presented as an intensity plot representing time (after stimulus onset), electrode location, and the t test result (only effects meeting or exceeding our criteria are shown). We emphasize that while these analyses give a visual impression of specific effects within the dataset, our conclusions are principally based on reference-independent global measures of the electric field at the scalp. Nonetheless the reader should note the visual similarity between the ERP morphology, polarity, and temporal profile of nonlinear effects in our study and those in Molholm et al. (2002) (their Fig. 4a).
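The temporal criterion can be sketched as a search for runs of consecutive significant time points. The function name and return format are assumptions. The sketch takes "at least 15 contiguous data points" literally as a run length of 15 or more.

```python
def sustained_runs(p_values, alpha=0.05, min_len=15):
    """Return (first_index, last_index) for every run of p <= alpha
    that lasts at least min_len consecutive samples."""
    runs = []
    start = None
    for i, p in enumerate(p_values):
        if p <= alpha:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(p_values) - start >= min_len:
        runs.append((start, len(p_values) - 1))
    return runs
```

Applied to one electrode's time series of p values, a 14-sample run is discarded and a 20-sample run is retained.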
Global electric field analyses.
The global electric field strength was quantified using global field power (GFP) (Lehmann and Skrandies, 1980). GFP equals the root mean square across the electrode montage and is thus a reference-independent measure of the ERP amplitude. GFP was analyzed as above, using a millisecond-by-millisecond paired t test. Only p values ≤0.05 were considered reliable. As above, temporal autocorrelation was corrected through the application of a >15 contiguous data-point temporal criterion (equivalent to >15 ms for data acquired at 1024 Hz) for the persistence of differential effects.
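A minimal sketch of the GFP computation at one time point follows. It assumes that the root mean square is taken over average-referenced potentials, and the function names are illustrative.

```python
import math


def average_reference(voltages):
    """Re-express one scalp map relative to the mean of all electrodes."""
    mean = sum(voltages) / len(voltages)
    return [v - mean for v in voltages]


def global_field_power(voltages):
    """Root mean square of the average-referenced potentials of one map."""
    centered = average_reference(voltages)
    return math.sqrt(sum(v * v for v in centered) / len(centered))
```

Adding a constant to every electrode, which is the effect of changing the reference, leaves the value unchanged. This is the sense in which GFP is reference independent.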
To statistically identify periods of topographic modulation, we quantified the global dissimilarity (DISS) (Lehmann and Skrandies, 1980) between pair and sum ERPs as a function of time relative to stimulus onset. DISS is calculated as the root mean square of the difference between two strength-normalized vectors (here the voltage potential values across the electrode montage). The DISS value at each time point was then compared with an empirical distribution derived from a bootstrapping procedure based on randomly reassigning each subject's data to either the pair or sum group (5000 permutations per time point), which has been colloquially referred to as “TANOVA” (detailed in Murray et al., 2008). The neurophysiologic utility of this analysis is that topographic changes indicate differences in the configuration of the brain's underlying active generators (Lehmann, 1987). This method is independent of the reference electrode and is insensitive to pure amplitude modulations across conditions. As above, only effects where p values were <0.05 for at least 15 contiguous time points were considered reliable.
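The DISS computation and the randomization test at a single time point can be sketched as follows. This is not Cartool's implementation, and all function names are hypothetical. Two assumptions are made: DISS is computed between group-average maps, and each permutation swaps the pair and sum labels within a subject with probability 0.5. Maps are assumed to be non-flat, since a map with zero GFP cannot be normalized.

```python
import math
import random


def normalize_map(voltages):
    """Average-reference a scalp map and scale it to unit GFP."""
    mean = sum(voltages) / len(voltages)
    centered = [v - mean for v in voltages]
    gfp = math.sqrt(sum(c * c for c in centered) / len(centered))
    return [c / gfp for c in centered]


def diss(map_a, map_b):
    """Root mean square difference between two strength-normalized maps."""
    u, v = normalize_map(map_a), normalize_map(map_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def group_mean(maps):
    """Electrode-wise mean across subjects."""
    return [sum(col) / len(maps) for col in zip(*maps)]


def tanova_p(pair_maps, sum_maps, n_perm=5000, seed=0):
    """Observed DISS and its permutation p value at one time point."""
    rng = random.Random(seed)
    observed = diss(group_mean(pair_maps), group_mean(sum_maps))
    count = 0
    for _ in range(n_perm):
        perm_pair, perm_sum = [], []
        for p, s in zip(pair_maps, sum_maps):
            if rng.random() < 0.5:  # swap condition labels for this subject
                p, s = s, p
            perm_pair.append(p)
            perm_sum.append(s)
        if diss(group_mean(perm_pair), group_mean(perm_sum)) >= observed:
            count += 1
    return observed, count / n_perm
```

Because both maps are normalized, two maps that differ only by a scale factor yield a DISS of 0, and a map compared with its polarity-inverted copy yields the maximum of 2.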
Source estimations.
We estimated the electrical activity in the brain using a distributed linear inverse solution applying the LAURA regularization approach comprising biophysical laws as constraints (Grave de Peralta Menendez et al., 2001, 2004) (see also Michel et al., 2004 for a review). For the lead field calculation, the spherical model with anatomical constraints (SMAC) method was applied (Spinelli et al., 2000). This method first transforms the individual MRI to the best-fitting sphere using homogeneous transformation operators. It then determines a regular grid of 3005 solution points in the gray matter of this spherical MRI and computes the lead field matrix using the known analytical solution for a spherical head model with three shells of different conductivities as defined by Ary et al. (1981). The results of the above topographic analysis defined time periods for which intracranial sources were estimated and statistically compared between conditions (here 60–95 ms after stimulus, as will be detailed in Results). Statistical analyses of source estimations were performed by first averaging the ERP data across time to generate a single data point for each participant and condition. The inverse solution (12 participants × 2 conditions) was then estimated for each of the 3005 nodes. Paired t tests were calculated at each node using the variance across participants. Only nodes with p values ≤0.05 (t(11) ≥ 2.2) and clusters of at least 15 contiguous nodes were considered significant [see also Toepel et al. (2009) and De Lucia et al. (2009)]. This spatial criterion was determined using the AlphaSim program (available at http://afni.nimh.nih.gov). The results of the source estimations were rendered on the Montreal Neurologic Institute's average brain with the Talairach and Tournoux (1988) coordinates of the largest statistical differences indicated using nomenclature according to Brodmann's areas (BAs).
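The node-wise statistical contrast can be sketched as a paired t test applied independently at each solution point. The function names and the data layout (one list of per-participant scalar values per node) are assumptions. The 15-node cluster criterion is not implemented, because it requires the adjacency structure of the solution grid, and the sketch assumes nonzero variance of the paired differences.

```python
import math


def paired_t(x, y):
    """Paired t statistic (df = n - 1) for two equal-length samples."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


def significant_nodes(pair_values, sum_values, t_crit=2.2):
    """Indices of nodes whose |t| meets the threshold.

    pair_values[node] and sum_values[node] hold one current-density value
    per participant (hypothetical layout). With 12 participants, t_crit=2.2
    corresponds to the two-tailed p <= 0.05 threshold quoted in the text.
    """
    return [i for i, (p, s) in enumerate(zip(pair_values, sum_values))
            if abs(paired_t(p, s)) >= t_crit]
```

A negative t value at a node would indicate that the pair response is weaker than the sum response, that is, a subadditive effect in source strength.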
Results
Waveform analyses
Our analyses first focused on determining the timing, direction (i.e., superadditive vs subadditive), and general topography differences between the pair and sum ERPs. Visual inspection of exemplar parieto-occipital electrodes (PZ, OZ, PO3, and PO4) suggests there to be subadditive nonlinear interactions beginning ∼50–60 ms after stimulus onset (Fig. 1a). The voltage topography of the pair and sum ERPs at 60 ms after stimulus onset (Fig. 1b) is consistent in each case with a typical C1 distribution (e.g., Foxe et al., 2008). It is worth noting, however, that the locus of the maximal negativity differs between the pair and sum ERPs, suggestive of topographic differences and by extension differences in the configuration of the underlying sources (a detailed analysis of this feature appears below). The topographic distribution of the difference between the pair and sum ERPs was maximal over the right parieto-occipital scalp (though a second, slightly smaller maximum was present over the left parieto-occipital scalp). This distribution is consistent with that shown in prior studies at similar latencies (Giard and Peronnet, 1999; Fort et al., 2002a; Molholm et al., 2002; Vidal et al., 2008). Statistical analyses of the pair versus sum ERP waveforms as a function of time are displayed in Figure 2a and show that significant and temporally sustained (i.e., p < 0.05 for a minimum of 15 ms duration) nonlinear neural response interactions began at ∼60 ms after stimulus onset at several electrodes over the posterior scalp (when using an average reference). We would remind the reader at this stage, however, that our conclusions regarding the likely causes of nonlinear interactions are based solely on analyses of reference-independent features of the global electric field at the scalp. 
Selecting one or a few electrodes to determine the timing and directionality of neural response interactions would introduce a source of experimenter bias into the analyses, particularly given that recordings were made from 160 electrodes; we sought to avoid this bias.
Global electric field analyses
To determine whether the above nonlinear neural response interactions stemmed from a change in response strength (which would be consistent with a modulation in response gain) or alternatively from a change in response topography (which would be consistent with a modulation in the underlying brain sources), we conducted two reference-independent analyses of the global electric field. The first, an analysis of the GFP, failed to reveal statistically reliable differences in response strength. The second, TANOVA, was a millisecond-by-millisecond analysis of DISS (Fig. 2b). Significant topographic modulations were observed over the 70–90 ms poststimulus period, followed by subsequent effects over the 115–134 ms and 148–173 ms periods. These results indicate that the above nonlinear neural response interactions involve changes in the configuration of the underlying brain networks and are not simply the consequence of a gain mechanism. To more precisely delineate the timing and stability of topographic modulations, we considered the DISS time series (Fig. 2b). Examination of this waveform revealed a singular peak that onset at 60 ms, peaked at 82 ms, and reached a minimum at 95 ms. These dynamics of the DISS waveform served as the basis for the selection of the time period submitted to source estimation.
Source estimations
Finally, we estimated the sources underlying the above waveform and topographic modulations over the 60–95 ms poststimulus period using the LAURA distributed linear inverse solution. Because the above DISS waveform exhibited a singular peak over this time period, it stood to reason that a stable topographic difference accounted for the nonlinear neural response interactions over the 60–95 ms poststimulus period. Both the pair and sum conditions included prominent sources within occipital, temporal, and temporoparietal areas. Scalar values from these source estimations throughout the entire brain volume from each participant and condition were contrasted, and the loci of significant differences are shown in Figure 3. Three clusters were identified that included bilateral BA17/18, right BA21/22, and left BA39/40. The strongest difference within BA21/22 was centered at 58, 2, 4 mm using the Talairach and Tournoux (1988) coordinate system and was located in the temporal cortex, at the superior temporal gyrus level, which corresponds to the primary auditory cortex (Penhune et al., 1996; Rademacher et al., 2001), and extended inferiorly into the middle temporal gyrus. The second region lay within BA17/18 (centered at −3, −76, 12 mm), which includes the primary visual cortex. Last, BA39/40 (centered at −55, −61, 28 mm) is situated in the temporal cortex, more precisely near the posterior superior temporal sulcus (pSTS) and the lateral sulcus. Mean scalar values across the nodes in these clusters are likewise shown in Figure 3 and illustrate that subadditive effects were observed in all regions exhibiting a significant difference over the 60–95 ms period. To provide the reader with a general sense of the dynamics within different brain regions, Figure 3 displays the time course of the node within each cluster exhibiting the maximal statistical difference.
As a final step, we examined whether source estimations in these clusters were correlated under multisensory versus summed unisensory conditions. To do this, we calculated the mean scalar value from the 20 solution points surrounding the maximum within each cluster shown in Figure 3. Intercluster correlations are separately listed for the pair and sum conditions in Table 1. Source estimations to multisensory conditions resulted in significant positive correlations between all pairs of clusters. By contrast, source estimations to summed unisensory conditions resulted in a significant positive correlation only between BA17/18 and BA39/40. Multisensory stimulation appears to result in more extensive functional coupling between primary auditory and visual cortices as well as pSTS. Similar findings have been reported in studies of macaque monkeys (Ghazanfar et al., 2008; Maier et al., 2008; Kayser and Logothetis, 2009). Recordings in these studies were limited to auditory regions and STS (no electrodes were implanted in visual cortices). Nonetheless, these studies showed there to be functional coupling between auditory cortex and the STS in the theta and beta frequency bands of the local field potential (Kayser and Logothetis, 2009) as well as between auditory belt regions and the STS at higher gamma frequencies above 50 Hz (Ghazanfar et al., 2008; Maier et al., 2008).
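The intercluster correlation analysis can be sketched as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient across participants is an assumption here, as are the function names and the dictionary layout keyed by cluster label.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intercluster_correlations(clusters):
    """Correlate every pair of clusters across participants.

    clusters: hypothetical dict mapping a cluster label to the list of
    per-participant mean scalar values for one condition (pair or sum).
    """
    names = sorted(clusters)
    return {(a, b): pearson_r(clusters[a], clusters[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Running this once on the pair condition and once on the sum condition would yield two sets of three coefficients, one for each pairing of the BA17/18, BA21/22, and BA39/40 clusters.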
Discussion
We identified the timing, directionality, topographic stability, and sources of auditory–visual interactions in humans by applying electrical neuroimaging analyses to ERPs in response to attended, but otherwise task-irrelevant, rudimentary stimuli. This combination circumvented caveats of prior research and revealed that auditory–visual neural response interactions occurring over the ∼60–95 ms poststimulus interval are subadditive and the result of changes in the configuration of the intracranial sources (viz., topographic modulations). Source estimations identified these subadditive effects to be simultaneously originating within a network including primary auditory cortices, primary visual cortices, and the pSTS. These results facilitate the synthesis of findings, not only from studies using other noninvasive brain-mapping techniques in humans, but also in nonhuman primates.
The timing and scalp topography of the present effects replicate prior findings with task-relevant or passively presented stimuli where nonlinear neural response interactions measured at individual electrodes started as early as ∼40–55 ms and resulted in a parieto-occipital positivity in the difference topography (Giard and Peronnet, 1999; Fort et al., 2002a; Molholm et al., 2002; Vidal et al., 2008). Our analysis of ERPs to attended but task-irrelevant stimuli, which required no motor response, excluded common motor-related activity (Teder-Sälejärvi et al., 2002; Gondan and Röder, 2006). Moreover, the varied interstimulus interval ensured that poststimulus effects were not due to prestimulus anticipatory or state-dependent modulations (Teder-Sälejärvi et al., 2002). An additive model applied to ERPs is therefore wholly suitable to evaluate multisensory interactions (Besle et al., 2004).
The present analyses extend the interpretability of early AV interactions in humans by demonstrating that this initial nonlinearity is the consequence of topographic modulations, rather than only the result of pure strength modulations. Multisensory stimuli recruit distinct configurations of intracranial generators at early stages of sensory processing. No prior ERP study statistically contrasted topographic features, obfuscating any ability to discern such effects. Instead, prior studies intimated that early effects stem from alterations in the gain of responses (Vidal et al., 2008). The observation of topographic modulations also provides an additional level of rebuttal against an explanation in terms of common activity. Common activity would presumably have modulated response strength.
We would hasten to note that while statistically robust, the topographic differences we observed were nonetheless subtle, potentially raising doubts about their constituting a general mechanism of multisensory interactions in humans. We replicated the observation of significant topographic modulations between pair and sum ERPs starting 60 ms after stimulus onset while subjects categorized living and man-made auditory, visual, and AV environmental objects (http://imrf.mcmaster.ca/IMRF/ocs2/index.php/imrf/2010/paper/view/324). Nonlinear ERP interactions appear to be the consequence of topographic modulations regardless of the specific type of stimuli and task. Still, it will be important to ascertain the specific mechanisms whereby task demands (Fort et al., 2002b), the sensory dominance of a given subject (Giard and Peronnet, 1999), and behavioral performance speed (Sperdin et al., 2009, 2010) affect nonlinear neural response interactions. Subtle modulations in response profiles are consistent with what has recently been described as a “patchy” organization (at least in auditory association cortices) of neurons responsive to unisensory versus multisensory stimulation (Beauchamp et al., 2004; Dahl et al., 2009). Future studies could apply multivoxel pattern analysis and/or machine-learning algorithms to ERP source estimations to disentangle similar phenomena.
A second innovation in the present results is the determination that the earliest nonlinear interactions are subadditive at the level of both the ERP topography and source estimation waveforms. Despite general consensus within the extant ERP literature concerning the occurrence of early nonlinear interactions, there is little agreement (or emphasis) regarding their directionality. In prior works, differentiating superadditive and subadditive effects was obfuscated by the reference-dependent nature of the analyses performed. The polarity (and presence) of effects in voltage waveforms changes with the choice of reference electrode (cf. Murray et al., 2008). Some prior works provide visualization (but no analysis) of ERP topographies that are consistent with subadditive interactions (Giard and Peronnet, 1999; Fort et al., 2002a; Molholm et al., 2002; Vidal et al., 2008). Subadditive interactions are also concordant, at least at a qualitative level, on the one hand with intracranial electrophysiologic recordings within auditory cortices in both humans (Besle et al., 2008) and animals (Bizley et al., 2007; Kayser et al., 2009) (see also Allman et al., 2009; Angelaki et al., 2009) and on the other hand with fMRI findings in humans (Martuzzi et al., 2007; Stevenson et al., 2009) and monkeys (Kayser et al., 2009). Despite this qualitative commonality, it remains unclear how fMRI activations relate directly or otherwise to specific varieties of neural activity either in terms of postsynaptic potentials versus spiking or in terms of specific frequency bands of responses (Kayser et al., 2009).
A third advance in the present study is its analysis of distributed source estimations. This was particularly informative, as prior ERP works diverge on whether early effects emanate from nominally visual (Fort et al., 2002a; Molholm et al., 2002) or auditory (Vidal et al., 2008) cortices, or both (Teder-Sälejärvi et al., 2002; Raij et al., 2010). Sources significantly contributing to the nonlinear neural response interactions over the 60–95 ms poststimulus period were identified within primary visual cortex, primary (and surrounding) auditory cortex, and pSTS (Fig. 3). Source strength in all of these regions was significantly subadditive, and there was evidence for functional coupling between these regions following multisensory stimulation (and to a lesser degree for unisensory conditions). Early multisensory interactions are occurring simultaneously in this distributed network of regions. This synchronicity is consistent with extrapolation of timing data available from monkeys (Musacchia and Schroeder, 2009), wherein the timing of multisensory effects would be largely constrained by the arrival of visually driven inputs. That is, auditory responses in all of the regions where we observed nonlinear interactions could begin before the arrival of visual responses. The requisite anatomy for such interactions has been well documented (Falchier et al., 2002, 2010; Rockland and Ojima, 2003; Cappe and Barone, 2005) and may even include a corticothalamocortical loop (Cappe et al., 2007a, 2009c; Hackett et al., 2007). 
When considered in conjunction with functional studies showing multisensory interactions in low-level cortices and/or at early latencies (Ghazanfar and Schroeder, 2006; Bizley et al., 2007; Cappe et al., 2007b; Martuzzi et al., 2007; Romei et al., 2007, 2009; Bizley and King, 2008; Kayser et al., 2008; Wang et al., 2008; Raij et al., 2010), we consider it probable that the present phenomena are supported by direct interconnections that at a functional level are likely a combination of feedforward as well as feedback activity.
Synchronous, coupled effects across brain regions are consistent at a general level with a role for oscillatory activity in multisensory phenomena (Senkowski et al., 2008; Kayser and Logothetis, 2009). In monkeys, oscillations in primary cortices have been examined vis-à-vis the ability of nonpreferred sensory inputs to modulate the phase of ongoing activity and/or facilitate evoked neuronal responses to the preferred sensory modality (Lakatos et al., 2007, 2008, 2009). Attended, nonpreferred unisensory stimuli reset the phase of ongoing responses in primary cortices without generating a discrete evoked response (Lakatos et al., 2009). The evoked response to the preferred unisensory stimulus was then enhanced, because it was in phase with the ongoing oscillations. This type of mechanism has been postulated to explain superadditive nonlinear interactions between auditory and somatosensory stimuli (Lakatos et al., 2007). Combining source estimations with single-trial time–frequency analyses and causality analyses would allow for evaluating this model in humans, but still requires additional methodological developments (Gonzalez Andino et al., 2005; Van Zaen et al., 2010). In addition, it will be critical to reconcile how such phase-resetting mechanisms generate superadditive versus subadditive interactions and whether this is contingent on the sensory modalities tested. While superadditive interactions have been consistently observed with auditory–somatosensory stimuli (Foxe et al., 2000, 2002; Murray et al., 2005), effects have reliably been subadditive in the case of auditory–visual interactions (Giard and Peronnet, 1999; Molholm et al., 2002; Martuzzi et al., 2007). More generally, subadditive effects are consistent with predictions and recordings at the single-neuron level (Laurienti et al., 2005; Perrault et al., 2005; Kayser et al., 2008).
Here (as elsewhere) the stimuli were suprathreshold (i.e., high-contrast displays and readily audible sounds) and therefore highly effective. Stein and Meredith's (1993) principle of inverse effectiveness would predict such conditions to result in (sub)additive neural interactions. Likewise, the dynamic range of neuronal firing has been shown to influence the directionality of multisensory effects. Neurons with larger dynamic ranges, described as the most prevalent, are more likely to exhibit subadditive effects (Perrault et al., 2005) [see also Carriere et al. (2007) for a description of “subthreshold” and “modulatory” neurons]. The present findings may be detecting a homologous situation in humans.
In summary, electrical neuroimaging analyses surmounted long-standing debates on the suitability of ERPs for multisensory research and provide insights regarding neurophysiologic bases of AV interactions. Nonlinear interactions commence ∼60 ms after stimulus onset, follow from topographic (i.e., generator configuration) modulations, and result in subadditive and functionally coupled responses within primary auditory and visual cortices as well as pSTS. These findings advance the ability to synthesize conclusions across imaging methods and species.
Footnotes
- Received March 3, 2010.
- Revision received June 20, 2010.
- Accepted July 16, 2010.
- This work has been supported by the Swiss National Science Foundation (Grant #3100AO-118419 to M.M.M.) and the Leenaards Foundation (to M.M.M. and G.T.). Cartool software has been programmed by Denis Brunet, from the Functional Brain Mapping Laboratory (Geneva, Switzerland), and is supported by the EEG Brain Mapping Core of the Center for Biomedical Imaging (www.cibm.ch) of Geneva and Lausanne.
- Correspondence should be addressed to either Céline Cappe or Micah M. Murray, Neuropsychology and Neurorehabilitation Service, Department of Clinical Neurosciences, Centre Hospitalier Universitaire Vaudois and University of Lausanne, rue du Bugnon 46, 1011 Lausanne, Switzerland. celine.cappe{at}chuv.ch or micah.murray{at}chuv.ch
- Copyright © 2010 the authors 0270-6474/10/3012572-09$15.00/0