Abstract
The presentation of simple auditory stimuli can significantly impact visual processing and even induce visual illusions, such as the auditory-induced double flash illusion (DFI). These cross-modal processes have been shown to be driven by occipital oscillatory activity within the alpha band. Whether this phenomenon is network specific or can be generalized to other sensory interactions remains unknown. The aim of the current study was to test whether cross-modal interactions from somatosensory to visual areas leading to the same (but tactile-induced) DFI share similar properties with the auditory DFI. We hypothesized that if the effects are mediated by the oscillatory properties of early visual areas per se, then the two versions of the illusion should be subtended by the same neurophysiological mechanism (i.e., the speed of the alpha frequency). Alternatively, if the oscillatory activity in visual areas predicting this phenomenon is dependent on the specific neural network involved, then it should reflect network-specific oscillatory properties. In line with the latter, results recorded in humans (both sexes) show a network-specific oscillatory profile linking the auditory DFI to occipital alpha oscillations, replicating previous findings, and the tactile DFI to occipital beta oscillations, a rhythm typical of somatosensory processes. These frequency-specific effects are observed for visual (but not auditory or somatosensory) areas and account for auditory–visual connectivity in the alpha band and somatosensory–visual connectivity in the beta band. We conclude that task-dependent visual oscillations reflect network-specific oscillatory properties favoring optimal directional neural communication timing for sensory binding.
SIGNIFICANCE STATEMENT We investigated the oscillatory correlates of the auditory- and tactile-induced double flash illusion (DFI), a phenomenon in which two beeps (or taps) presented within ~100 ms of each other and paired with a single visual flash induce the sensation of a second, illusory flash. Results confirm previous evidence that the speed of individual occipital alpha oscillations predicts the temporal window of the auditory-induced illusion. Importantly, they provide novel evidence that the tactile-induced DFI is instead mediated by the speed of individual occipital beta oscillations. These task-dependent occipital oscillations are shown to be mediated by the oscillatory properties of the neural network engaged in the task, favoring optimal temporal integration between the senses.
- Multisensory Integration
- Double Flash Illusion
- Functional Connectivity
- Alpha Oscillations
- Beta Oscillations
Introduction
Our senses act as temporal gateways to our environment, allowing continuous information streams within and across senses to be coded into discrete information units (VanRullen and Koch, 2003; Chakravarthi and VanRullen, 2012; VanRullen, 2016). The temporal resolution of such mechanisms may allow the brain to temporally bind sensory input over time and across senses into meaningful objects and events (Cecere et al., 2015), reducing the complexity of our environment (Wutz et al., 2016, 2018).
This Bayesian mechanism (Beierholm et al., 2009; Barakat et al., 2013; Kayser and Shams, 2015; Cuppini et al., 2017) generally leads to prompt, efficient readouts of the experienced environment. However, when presented with incongruent sensory information, it often gives rise to illusory phenomena. One such example is the double flash illusion (DFI). Shams et al. (2000) first discovered that when two shortly interleaved beeps are paired with a single flash, participants often perceive a second illusory flash (Shams et al., 2000, 2002). Such an illusion may represent the best coherent perceptual resolution of otherwise conflicting sensory information (Cecere et al., 2015). By systematically manipulating temporal intervals between paired “beeps,” it is possible to define the temporal window of this illusion (TWI; i.e., the time interval in which the illusory flash is perceived). This TWI, first characterized by Shams et al. (2002) and detailed by Cecere et al. (2015), shows that the illusion decays when the average time between stimuli exceeds 100 ms. Cecere et al. (2015) argued that these TWIs, variable across individuals, are reminiscent of the temporal profile of posterior oscillatory activity in the alpha band (8–12 Hz). Using both correlational and causal approaches, Cecere et al. (2015) found a tight correlation between individual TWI and individual alpha frequency (IAF) peak, with faster IAFs predicting shorter TWIs and slower IAFs predicting longer TWIs.
Yet, we are unaware whether this mechanism is determined by local network rules per se (i.e., local occipital oscillatory resonance activity, typically alpha; Rosanova et al., 2009) or whether it depends on long-range communication networks (Fries, 2015; i.e., the way in which a sensory modality such as auditory, impacts visual cortex activity; Romei et al., 2012). In other words, are cross-modal visual illusions determined strictly by typically visual oscillatory constraints or do the visual oscillations mediating these effects reflect the oscillatory properties of the functional connection between sensory modalities?
An elegant way to tease apart these hypotheses is to investigate the temporal profile and neural underpinnings of a DFI induced by a sensory modality other than audition and compare it with the auditory DFI. Here, we used the tactile DFI (Violentyev et al., 2005), whereby replacing paired “beeps” with “taps” upon the index finger elicits a similar illusory experience. No previous report of a temporal profile for the tactile DFI exists. If the induced illusory flash is determined by local resonance frequency of the visual cortex (alpha), regardless of paired modality, then similar illusory phenomena should also be mediated by occipital IAF. Alternatively, if functional connections between auditory/somatosensory and visual cortices determine the fate of the illusory experience, then occipital oscillations accounting for auditory and tactile DFI may depend on communication-specific mechanisms influencing visual cortical processing at the speed of their typical resonance frequency.
According to the “Communication Through Coherence” framework (Fries, 2005, 2015), neural communication subserved by oscillatory synchronization between remote but functionally interconnected areas would be the result of the alignment of postsynaptic neural activity (visual cortex) to presynaptic input (auditory/somatosensory cortex), creating temporal windows of optimal communication.
This hypothesis would not contradict evidence that auditory-induced TWI is mediated by alpha oscillations as auditory processing (presynaptic), which is typically associated with alpha activity (Weisz et al., 2011), phase aligns alpha oscillations in visual cortex (postsynaptic; Romei et al., 2012). Crucially, this would predict somewhat faster waves to influence the tactile-TWI, since tactile processing (presynaptic) is often associated with beta frequency oscillations (Salenius and Hari, 2003; Foffani et al., 2005; Engel and Fries, 2010; Baumgarten et al., 2015).
Materials and Methods
Participants
A total of 62 participants volunteered to take part in the study, which was approved by the ethics committee of the University of Essex. Eleven participants were excluded from data analysis as their perceived illusion could not be fitted to the sigmoid function curve.
All but three participants were right handed (two were left handed and one was ambidextrous by self-report). Mean age was 25 years (range, 18–44 years); 31 participants were female.
Before taking part, participants completed a screening questionnaire ensuring that they had no psychiatric or neurological history and normal (or corrected) vision, as well as normal hearing and somatosensation by self-report.
Materials and apparatus
All visual stimuli were presented on a 17.5 inch cathode ray tube monitor via a Dell Optiplex 960 computer (resolution, 1280 × 1024; Windows XP, Microsoft) with a refresh rate of 85 Hz. Auditory stimuli were delivered via a pair of speakers placed on either side of the monitor (perceived by the participants as originating from the center of the screen, close to the visual stimuli). Volume was set so that stimuli were ∼50 dB (SPL) at the location of the participants' head. Tactile stimulation was provided via a tactile controller and mechanical solenoid stimulator (Heijo Research Electronics), which delivered a suprathreshold tap to the left index fingertip by pushing a blunt plastic tip against the participant's skin whenever a current was passed through the solenoid. During tactile stimulation, white noise (∼50 dB) was played through the speakers to mask the mechanical noise produced by the tactile stimulator and ensure it was not heard by the participants. Experimental stimuli were presented via E-prime (version 2.0; Psychology Software Tools).
We piloted the experiment in the first 15 participants, recording an electroencephalogram (EEG) from a restricted set of electrodes (Oz, O1, O2, FP1, FPz, and FP2), with the ground electrode at AFz and the reference electrode placed over the right mastoid bone.
In the remaining participants (N = 36), the EEG was recorded from 64 sintered Ag/AgCl electrodes mounted on an elastic cap (Easycap) alongside the ground electrode (position, AFz) and the reference electrode (placed upon the right mastoid bone). The EEG signals were digitized at 500 Hz and amplified using a BrainVision Professional BrainAmp amplifier through the BrainVision Recorder program (BrainProducts). Before the recording began, we ensured that all electrodes were set on the participant's scalp at an impedance not exceeding 10 kΩ.
In all trials, participants were presented with a flashing disc, displayed just below a central fixation cross (this disc always flashed once for a duration of 12 ms and had a diameter of 2 cm). During the auditory DFI task, the disc was always paired with a double beep, with each beep having a frequency of 3500 Hz and a duration of 7 ms. During the tactile DFI task, disc presentation was paired with a double tactile stimulation to the left index finger.
The two brief tones (and the two tactile stimulations) were spaced apart by varying stimulus onset asynchronies (SOAs) ranging between 36 and 204 ms with increments of 12 ms, resulting in 15 different SOAs. Each SOA was presented 10 times, resulting in 150 randomly ordered trials per task.
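The resulting trial structure can be sketched in a few lines of Python (an illustration of the design described above, not the original E-prime implementation):

```python
import random

# SOAs from 36 to 204 ms in steps of 12 ms -> 15 values
soas = list(range(36, 205, 12))

# 10 repetitions per SOA -> 150 trials, presented in random order
trials = [soa for soa in soas for _ in range(10)]
random.shuffle(trials)
```
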
The time between trials included the presentation of the stimuli (as described above) plus a varying interval. This interval began once the experimenter had entered the participant's vocal response on the keyboard and lasted between 1000 and 1800 ms (five different intertrial delays in steps of 200 ms, each occurring 30 times).
Experimental design
Upon EEG fitting completion, participants were seated 57 cm away from the screen. EEG recording was manually started before trial commencement. Participants were instructed to fixate on a cross situated at the center of the screen while 150 flashing discs were presented in a first block of trials paired with two auditory (or tactile) stimuli, followed, after a brief resting period, by a second block of 150 flashing discs paired with two tactile (or auditory) stimuli. To control for order effects (including fatigue or boredom), the order of the blocks was counterbalanced, with half of the participants performing the tactile DFI first, and the other half performing the auditory DFI first. For the tactile DFI block, participants were asked to place their left index finger immediately below the presentation of the flashing disc to maximize spatial co-occurrence of the visual and tactile stimuli processing.
In all trials, participants were required to verbally report whether they perceived one or two flashes; verbal responses avoided motor interference from participants using their resting hand to respond to the stimuli, especially in the tactile version of the experiment. Participants were instructed to provide unspeeded, accurate responses. The verbal report was then input by the examiner via the “1” and “2” keys on the keyboard, which prompted the next trial to start after a variable intertrial interval.
Statistical analysis
Behavioral data analysis
The participants' perceived illusory flashes across the different SOAs were used to calculate, separately for the auditory and tactile DFI, the temporal window in which the visual illusion was maximally perceived. We calculated the percentage of illusory trials (i.e., two flashes perceived) and plotted it as a function of SOA, separately for the auditory and tactile DFI. A psychometric sigmoid function [y = a + b/(1 + exp(−(x − c)/d)), where a is the lower asymptote, b is the distance between asymptotes (so that a + b is the upper asymptote), c is the inflection point, and d is the slope] was then fitted to each percentage distribution. The inflection point (center c) of the fitted sigmoid represents the point of decay of the illusion and was taken as an index of the TWI. If the data could not be fitted to the sigmoid function, the participant's performance was deemed unreliable and discarded; following this procedure, 11 of the 62 participants were excluded from data analysis.
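The fitting procedure can be sketched with standard least-squares tools (a Python illustration; the response percentages below are fabricated, and the decaying curve implies a negative fitted slope d):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b, c, d):
    # a: lower asymptote, a + b: upper asymptote,
    # c: inflection point (taken as the TWI), d: slope
    return a + b / (1 + np.exp(-(x - c) / d))

soas = np.arange(36, 205, 12)  # the 15 SOAs (ms)
# fabricated percentages of illusory (two-flash) reports, decaying near ~100 ms
pct_illusion = np.array(
    [90, 88, 85, 80, 72, 60, 45, 32, 22, 15, 10, 8, 6, 5, 5], dtype=float)

# the illusion decays as the SOA grows, so the fitted slope d is negative
params, _ = curve_fit(sigmoid, soas, pct_illusion, p0=[5, 85, 105, -15],
                      maxfev=10000)
twi = params[2]  # inflection point c, the index of the TWI (ms)
```

In practice, a fit failure (or an implausible parameter estimate) at this step is what leads a participant's data to be discarded.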
EEG data analysis
Sensor space analysis.
EEG activity concurrently recorded during task execution was analyzed to calculate individual alpha and beta frequency peaks for each participant performing the auditory and tactile DFI tasks.
In the first 15 participants, EEG analysis was performed on electrode Oz only. Depending on the band of interest, the data were bandpass filtered as follows: for alpha, a high-pass filter of 3 Hz and a low-pass filter of 40 Hz were used (identical to Cecere et al., 2015); for beta, given the lower power relative to alpha, a more stringent criterion was used: a high-pass filter of 12 Hz and a low-pass filter of 25 Hz. The EEG signal was segmented into equal epochs of 2000 ms. As data in this first sample of participants were not synced to stimulus presentation (no trigger was recorded for each stimulus onset and response), the 2000 ms epochs corresponded to consecutive nonoverlapping segments independent of the stimulus onset (for a total of ∼170 epochs on average). The potential confound of induced and evoked oscillatory responses was controlled for in the second group of 36 participants, for whom a 64 channel EEG was recorded at a sampling rate of 500 Hz. In this group, the EEG signal was rereferenced off-line to the average of all scalp electrodes. EEG data were subsequently segmented into 2000 ms epochs time locked to and preceding the visual stimulus onset. This resulted in 150 epochs of prestimulus oscillatory activity for each of the frequency bands assessed, both for the tactile and auditory DFI task. Each single epoch was visually inspected for artifacts (from eye blinks and muscle contractions) and manually rejected where necessary. For each participant and for all the recorded electrodes, a full power spectrum was obtained through fast Fourier transform with a zero-padded window (nominal frequency resolution, 0.125 Hz). Finally, for each participant, task, and frequency band, EEG segments were averaged to calculate the average peak frequency in the visual cortex, as measured at electrode Oz.
For each frequency band, the peak frequency was determined for each participant as the frequency showing the maximal peak within the band of interest (alpha, 7–12 Hz; beta, 12–25 Hz). Finally, for each participant, the duration (in milliseconds) of one oscillatory cycle was calculated from the peak frequency (in hertz) obtained in the alpha and beta bands, over Oz in the first 15 participants and over the full 64-channel montage in the other 36 participants.
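The peak extraction and cycle-length conversion can be illustrated on a synthetic epoch (a Python sketch; the 10 Hz and 18 Hz components and the noise level are fabricated, and real analyses would average spectra over many artifact-free epochs):

```python
import numpy as np

fs = 500                       # sampling rate (Hz), as in the recording
t = np.arange(0, 2.0, 1 / fs)  # one 2000 ms epoch
rng = np.random.default_rng(0)
# synthetic occipital signal: 10 Hz alpha + 18 Hz beta + noise (illustrative)
epoch = (np.sin(2 * np.pi * 10 * t) + 0.4 * np.sin(2 * np.pi * 18 * t)
         + 0.2 * rng.standard_normal(t.size))

# zero-padded FFT giving a nominal frequency resolution of 0.125 Hz
n_fft = fs * 8
freqs = np.fft.rfftfreq(n_fft, 1 / fs)
power = np.abs(np.fft.rfft(epoch, n=n_fft)) ** 2

def peak_frequency(lo, hi):
    # frequency of maximal power within the band [lo, hi]
    band = (freqs >= lo) & (freqs <= hi)
    return freqs[band][np.argmax(power[band])]

iaf = peak_frequency(7, 12)    # individual alpha frequency (Hz)
ibf = peak_frequency(12, 25)   # individual beta frequency (Hz)
alpha_cycle_ms = 1000 / iaf    # duration of one oscillatory cycle (ms)
beta_cycle_ms = 1000 / ibf
```

The cycle durations (e.g., ∼100 ms for a 10 Hz alpha peak) are the quantities correlated with the TWI below.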
Source space analysis.
All source space analyses were performed on the second group of 36 participants for whom the signal had been recorded from a full set of 64 EEG channels.
Frequency peak analysis in virtual electrodes.
Virtual electrodes were computed for three different cortical areas (visual cortex, auditory cortex, and somatosensory cortex) using the linearly constrained minimum variance scalar beamformer (Sekihara et al., 2004) implemented in FieldTrip. First, a 10 mm three-dimensional grid was fitted to the MNI standard brain. Then, the forward model was created using a standardized realistic head model. The spatial filters were computed for each DFI task using a 2 s prestimulus and a 0.5 s post-second-stimulus covariance window, with the regularization parameter set to 10%. Single-trial time series were projected to the cortical surface by multiplying them by the spatial filter weights. The source volume was interpolated with the MNI standard brain to define the following three regions of interest: right calcarine gyrus (visual cortex), right superior temporal gyrus (auditory cortex), and right postcentral gyrus (somatosensory cortex). For each participant, the IAF and individual beta frequency (IBF) were calculated in the voxel inside each of the three ROIs that showed a clear peak with maximal amplitude. Finally, for each participant and selected voxel, we calculated the duration (in milliseconds) of one oscillatory cycle from each peak frequency (in hertz).
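For a single voxel with fixed orientation, the scalar LCMV filter reduces to a closed-form weight computation; the numpy sketch below uses random data in place of real recordings and a fabricated leadfield (in practice FieldTrip derives the grid, head model, and leadfields):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ch = 64

# hypothetical sensor covariance from a (2 s pre + 0.5 s post) window at 500 Hz
data = rng.standard_normal((n_ch, 1250))
cov = data @ data.T / data.shape[1]

# 10% regularization relative to the average sensor power (trace / channels)
lam = 0.10 * np.trace(cov) / n_ch
cov_reg = cov + lam * np.eye(n_ch)

# fabricated leadfield of one grid voxel (one fixed orientation)
leadfield = rng.standard_normal(n_ch)

# scalar LCMV weights: w = C^-1 l / (l^T C^-1 l)
cinv_l = np.linalg.solve(cov_reg, leadfield)
weights = cinv_l / (leadfield @ cinv_l)

# the filter passes the target source with unit gain: w^T l == 1
gain = weights @ leadfield
```

Multiplying single-trial sensor data by `weights` yields the virtual-electrode time series used in the frequency-peak and PLV analyses.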
Phase-locking value analysis.
To quantify frequency-specific synchronization between the somatosensory and visual cortices in the tactile DFI condition, and between the auditory and visual cortices in the auditory DFI condition, we computed the phase-locking value (PLV) centered on each participant's IAF and IBF (Lachaux et al., 1999). The time series of each virtual electrode was filtered around a central frequency of IAF (or IBF) ±1 Hz. The instantaneous phase of the filtered signal was obtained from its complex representation, e^(iφ(t)) = s_a(t)/|s_a(t)|, where s_a(t) is the analytic representation of the signal. The phase alignment between the two virtual electrodes was then computed as PLV(t) = (1/N)|Σ_(n=1..N) e^(i[φ1,n(t) − φ2,n(t)])|, where N is the number of trials and φ1,n(t) and φ2,n(t) are the instantaneous phases of the two virtual electrodes on trial n.
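A minimal numerical sketch of this PLV computation (Python; synthetic narrowband signals, with a fabricated 20 ms lag standing in for a consistent phase relation between areas):

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(2)
n_trials, fs, n_samp = 50, 500, 500
t = np.arange(n_samp) / fs

def plv(trials_a, trials_b):
    # PLV(t) = (1/N) | sum over trials of exp(i * [phi_a(t) - phi_b(t)]) |
    phi_a = np.angle(hilbert(trials_a, axis=1))
    phi_b = np.angle(hilbert(trials_b, axis=1))
    return np.abs(np.mean(np.exp(1j * (phi_a - phi_b)), axis=0))

# "connected" case: the two areas share a fixed 20 ms lag at 10 Hz,
# while the absolute phase varies from trial to trial
phi0 = rng.uniform(0, 2 * np.pi, n_trials)[:, None]
area_1 = np.sin(2 * np.pi * 10 * t + phi0)
area_2 = np.sin(2 * np.pi * 10 * t + phi0 - 2 * np.pi * 10 * 0.020)

# control case: the second area's phase is unrelated to the first
area_3 = np.sin(2 * np.pi * 10 * t + rng.uniform(0, 2 * np.pi, n_trials)[:, None])

plv_locked = plv(area_1, area_2)   # high: consistent phase relation
plv_random = plv(area_1, area_3)   # low: no consistent phase relation
```

Note that a consistent (even nonzero) phase lag yields a high PLV, whereas trial-to-trial variability in the phase relation pulls it toward zero.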
PLVs were computed separately for trials within each participant's TWI and for trials outside each participant's TWI, and rescaled with respect to a 100 ms prestimulus window. Nonparametric statistics were used to compute significant differences between conditions (Maris and Oostenveld, 2007). First, temporal clusters of PLVs were formed from time points showing significant paired t tests. Then, Monte Carlo randomization was performed to obtain the empirical distribution of the maximum cluster statistic, computed as the sum of within-cluster t values. The observed cluster was considered significant if its cluster statistic exceeded the 95th percentile of the empirical distribution.
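A simplified, one-sided sketch of this cluster-based permutation procedure (Python; fabricated time courses, with sign-flipping of paired differences as the randomization scheme):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sub, n_time = 20, 100

# fabricated PLV time courses: condition B exceeds A between samples 40-60
cond_a = 0.1 * rng.standard_normal((n_sub, n_time))
cond_b = cond_a + 0.1 * rng.standard_normal((n_sub, n_time))
cond_b[:, 40:60] += 0.3

diff = cond_b - cond_a
t_thresh = stats.t.ppf(0.975, n_sub - 1)  # cluster-forming threshold

def max_cluster_stat(d):
    # paired t value at each time point
    tvals = d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n_sub))
    # sum t values within contiguous supra-threshold runs (positive clusters)
    sums, cur = [], 0.0
    for tv in tvals:
        if tv > t_thresh:
            cur += tv
        else:
            if cur:
                sums.append(cur)
            cur = 0.0
    if cur:
        sums.append(cur)
    return max(sums) if sums else 0.0

observed = max_cluster_stat(diff)

# Monte Carlo randomization: flip the sign of each participant's difference
null = np.array([max_cluster_stat(diff * rng.choice([-1.0, 1.0], (n_sub, 1)))
                 for _ in range(500)])
p_value = (null >= observed).mean()
```

The full procedure additionally handles negative clusters and tests every observed cluster against the same null distribution.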
Correlation analyses on behavioral data
First, we looked at the behavioral data obtained in the 51 participants for the auditory and tactile DFI, to compare performance in the two tasks and characterize for the first time the temporal profile of the tactile DFI. Second, we assessed the relationship between the known auditory DFI and the previously unexplored tactile DFI temporal profiles.
To investigate this relationship, we also used the robust skipped correlation method, as described by Pernet et al. (2013).
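To illustrate why a skipped correlation is useful, the sketch below flags outliers with simple median/MAD robust z-scores before computing Pearson's r; note this is a simplified stand-in, as the published method of Pernet et al. (2013) detects bivariate outliers via data projections onto robust estimates of the data bulk:

```python
import numpy as np

def skipped_pearson(x, y):
    # simplified skipped correlation: drop ("skip") points that are outliers
    # on either variable by a robust (median/MAD) z-score criterion, then
    # compute Pearson's r on the remaining points
    x, y = np.asarray(x, float), np.asarray(y, float)

    def robust_z(v):
        mad = np.median(np.abs(v - np.median(v)))
        return 0.6745 * (v - np.median(v)) / mad

    keep = (np.abs(robust_z(x)) < 3) & (np.abs(robust_z(y)) < 3)
    return np.corrcoef(x[keep], y[keep])[0, 1], keep

# one gross outlier wrecks the plain Pearson correlation but is skipped here
x = np.concatenate([np.linspace(0, 1, 30), [0.5]])
y = np.concatenate([np.linspace(0, 1, 30), [10.0]])
r_plain = np.corrcoef(x, y)[0, 1]
r_skipped, kept = skipped_pearson(x, y)
```
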
Correlation analyses between behavioral and electrophysiological data (sensor space)
Next, we performed correlational analyses between the individual speeds (in milliseconds) of each oscillatory cycle and the individual width (in milliseconds) of the TWI separately for the auditory and tactile DFI.
Our behavioral and electrophysiological data were used to test the following predictions. First, we aimed to replicate data from Cecere et al. (2015) providing evidence suggesting that occipital IAF is selectively predictive of TWI size. Second, we wanted to test the hypothesis that occipital IAF is predictive of both the size of the auditory and tactile TWI or alternatively that the size of TWI is differently accounted for by the occipital IAF in the specific instance of the auditory DFI and by the IBF in the specific instance of the tactile DFI. We tested these hypotheses first in the initial 15 participants over Oz (with epochs unlocked to stimulus onsets) and again in the sample of 36, this time using a full array of electrodes allowing for a topographical distribution of Pearson's r (and stimulus-locked epochs). As the preliminary analyses of both behavioral and EEG data showed comparable results between groups, notably excluding at the EEG level the potential confounds of evoked responses in the calculation of individual frequency peaks, data from both groups were pooled together for behavioral and EEG analyses at sensor Oz. Furthermore, we used the robust skipped correlation method as described by Pernet et al. (2013).
Multiple regression analyses between behavioral and electrophysiological data (source space)
To test whether any relationship between behavioral and oscillatory data was specific to the visual cortex, a multiple linear regression analysis was used to assess the relationship between: (1) the TWI in the auditory DFI and the IAF and IBF of visual and auditory virtual electrodes; and (2) the TWI in the tactile DFI and the IAF and IBF of visual and somatosensory virtual electrodes (Keil et al., 2016). A forward step procedure was adopted to fit the regression model.
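A forward step procedure of this kind can be sketched as a greedy search over R2 gains (Python; simulated data in which only the visual IAF carries signal, and an R2-gain entry criterion standing in for a formal F-to-enter test):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 36  # participants in the source-space sample

# fabricated predictors: only the visual IAF truly relates to the TWI
predictors = {
    "visual_iaf": rng.standard_normal(n),
    "auditory_iaf": rng.standard_normal(n),
    "visual_ibf": rng.standard_normal(n),
    "auditory_ibf": rng.standard_normal(n),
}
twi = 0.8 * predictors["visual_iaf"] + 0.2 * rng.standard_normal(n)

def r2(X, y):
    # OLS with intercept; returns the coefficient of determination
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_select(preds, y, min_gain=0.05):
    # greedy forward step procedure: repeatedly add the predictor giving
    # the largest R^2 gain until no candidate improves by at least min_gain
    chosen = []
    while True:
        remaining = [nme for nme in preds if nme not in chosen]
        if not remaining:
            return chosen
        base = (r2(np.column_stack([preds[m] for m in chosen]), y)
                if chosen else 0.0)
        gains = {nme: r2(np.column_stack([preds[m] for m in chosen + [nme]]),
                         y) - base
                 for nme in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            return chosen
        chosen.append(best)

selected = forward_select(predictors, twi)
```
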
Results
Auditory-induced versus tactile-induced DFI
We first determined the temporal profile for the auditory and tactile DFI. For the auditory DFI, we replicated previous reports (Cecere et al., 2015) of an average TWI of ∼100 ms. The temporal profile of the tactile-induced DFI was very similar to that of the auditory-induced DFI in the same participants, and the two did not significantly differ (auditory-induced TWI, 99.02 ms; SEM, 3.08; tactile-induced TWI, 102.80 ms; SEM, 3.23; t(50) = −1.02; p = 0.31). We then tested whether these two measures were correlated. We found a significant correlation between the two versions of the DFI (Pearson's r = 0.31, p = 0.03), which also survived the robust skipped correlation method (r = 0.31, CI = 0.02–0.55; Fig. 1).
We further compared the two sensory versions of the illusion by contrasting the goodness of fit across the two versions of the DFI. Specifically, measurements were taken for the R2 value (as an indicator of the goodness of fit) for each curve across participants and conditions. We found that the goodness of fit for the tactile illusion (R2 = 0.70) was significantly lower compared with that of the auditory illusion (R2 = 0.83, p < 0.001), suggesting that the tactile illusion is inherently noisier than the auditory illusion.
Overall, a first interpretation of these behavioral findings is that the auditory and tactile versions of the DFI might be driven by similar neurophysiological mechanisms.
EEG correlates of auditory DFI and tactile DFI
Sensor space
We found that occipital IAF (in milliseconds) positively correlates with the size of the TWI in the auditory DFI (Pearson's r = 0.52; p < 0.001), which also survives robust skipped correlations (r = 0.41, CI = 0.18–0.59), such that faster IAFs accounted for shorter TWIs, essentially replicating the results of the study by Cecere et al. (2015). Pearson's correlation topography (calculated on 36 participants) suggests that this effect is maximal over posterior regions and is frequency specific, as no significant correlation could be found for IBF (calculated on 51 participants: r = −0.06, p = 0.69; Fig. 2). Crucially, when looking at the tactile DFI, a different pattern of results emerged. IAF did not correlate with TWI when the TWI was induced by tactile stimuli (r = 0.13, p = 0.38). Instead, we found that occipital IBF positively correlated with the size of the TWI in the tactile DFI (Pearson's r = 0.54, p < 0.001), which also survives robust skipped correlations (r = 0.54, CI = 0.32–0.69), such that faster IBFs accounted for shorter TWIs (Fig. 3B).
Source space
Multiple linear regression analysis showed that, for the TWI of the auditory DFI task, the visual IAF (beta = 0.751, p < 0.01) was a significant predictor [in line with recent findings by Keil and Senkowski (2017)], while the auditory IAF (beta = 0.040, p > 0.05), the visual IBF (beta = 0.020, p > 0.05), and the auditory IBF (beta = −0.05, p > 0.05) were not. The overall model fit was R2 = 0.184.
For the TWI of the tactile DFI task, the visual IBF (beta = 0.984, p < 0.05) was a significant predictor, while the somatosensory IBF (beta = −0.141, p > 0.05), the visual IAF (beta = −0.020, p > 0.05), and the somatosensory IAF (beta = 0.104, p > 0.05) were not. The overall model fit was R2 = 0.16.
Phase-locking value
Next, we explored whether the frequency-specific effects observed at the level of the visual cortex for the auditory DFI and the tactile DFI can be best explained by a network-specific mechanism. For this purpose, we measured the PLV in alpha and beta oscillatory activity for auditory–visual and somatosensory–visual networks depending on the following: (1) the performed task (auditory and tactile DFI); and (2) the individual TWI, thus contrasting trials within and outside the TWI, respectively.
Nonparametric statistical analysis revealed significant differences between trials within and outside the TWI (Fig. 3). Specifically, IAF PLVs between the auditory and visual cortices in the auditory DFI were significantly greater for trials outside the TWI in a temporal cluster extending from 310 to 400 ms poststimulus (p = 0.046). IBF PLVs between the visual and somatosensory cortices in the tactile DFI differed between conditions in two temporal clusters, from 210 to 260 ms and from 280 to 360 ms poststimulus (p = 0.015 and p = 0.03, respectively).
Discussion
In the current study, we characterized for the first time the temporal profile of the tactile DFI by directly comparing it to the temporal profile of the auditory DFI. We found that these temporal profiles are comparable; they do not significantly differ and positively correlate, suggesting that similar mechanisms may be at play in determining these effects. We thus tested which neurophysiological mechanism might best account for the auditory and tactile DFI.
EEG results demonstrated that oscillatory processes relate to the two illusions in a frequency-specific and network-specific manner. While replicating previous findings demonstrating a relationship between IAF and auditory DFI (Cecere et al., 2015; Keil and Senkowski, 2017), we could not replicate this relationship between IAF and tactile-TWI. Instead, a positive correlation between TWI and IBF was found, such that faster IBF predicted shorter TWI. This was found both at sensor and source space, over early visual areas. Moreover, in source space we found that visual (but not auditory or somatosensory) IAF explained the auditory–visual TWI (in line with a recent report by Keil and Senkowski, 2017) and similarly only visual IBF explained the tactile–visual TWI.
To test for the specific interpretation that oscillatory correlates of the auditory DFI and tactile DFI represent not just a local occipital phenomenon but rather a reliable marker of the specific cross-modal network engendering the illusion, we have looked at an index of connectivity between nodes of the network, namely PLV. Specifically, we investigated the modulation of signal strength between auditory–visual and somatosensory–visual networks in alpha and beta bands following stimulus presentation.
We found enhanced PLV in alpha (but not beta) oscillations between auditory–visual (but not tactile–visual) nodes, while the same was found in beta (but not alpha) oscillations between tactile–visual (but not auditory–visual) nodes, confirming that oscillatory tuning to the particular version of the illusion reflects a marker of network-specific activation.
This frequency- and network-specific PLV enhancement was found for trials not inducing the illusion. This finding might reflect temporal alignment to coherent temporal and quantity information across the senses within the temporal binding unit defined by the oscillatory cycle (Romei et al., 2012). This same mechanism may be time sensitive to quantity-disparity information presented within the temporal binding unit defined by the oscillatory cycle, leading to altered integration processes across the senses, ultimately resulting in an illusory percept.
What neurophysiological mechanism might be in place to account for this set of results? A relevant model that might explain the current data is the “communication through coherence” framework (Fries, 2005, 2015). Here, neural communication is subserved by neural synchronization between remote but functionally interconnected areas. Specifically, such neural synchronization is the result of the alignment of postsynaptic neural activity to presynaptic input, creating temporal windows of optimal communication between the involved areas. In this case, the temporal profiles observed in our study in relation to the auditory and tactile DFI may be the result of top-down directed alpha and beta (7–25 Hz) influences (feedback connections) on primary sensory input (Fries, 2015), shaping the final illusory perceptual outcome.
From this perspective, if a cross-modal stimulus (auditory/tactile) phase aligns oscillatory activity (alpha/beta) in visual areas, it will define the temporal windows corresponding to such oscillatory cycle lengths (alpha/beta) within which two consecutive stimuli may give rise to the illusory percept (i.e., the TWI). The illusory phenomenon will be engendered by a second cross-modal phase alignment attempt induced by the second cross-sensory stimulus reactivating the visual trace still being processed by the ongoing phase alignment induced by the first multisensory pair. Thus, individual frequency peaks would characterize the temporal resolution of inter-regional synchronization within which the TWI phenomenon arises.
A closely related reference framework has been introduced by Klimesch et al. (2007), who propose that communication between remote, but interconnected, areas can be achieved through traveling waves; that is, neural oscillations allowing information transference as measured through propagation between electrodes via a neural network (Klimesch et al., 2007; Muller et al., 2018). According to this framework, local oscillatory activity (i.e., resonance frequency) in auditory (alpha) or somatosensory (beta) cortices will propagate toward the visual cortex accounting for the specific differential impact of alpha and beta oscillations on the auditory DFI and tactile DFI, respectively. This mechanism allows prompt rescaling of temporal sampling across the senses, optimizing cross-sensory communication efficiency.
Under these circumstances, one expects the respective size of observed TWIs to reflect the length of the oscillatory cycle determining it (i.e., ∼100 ms when alpha oscillations mediate the auditory TWI and ∼70 ms when beta oscillations mediate the tactile TWI). While this is the case for the auditory DFI, the tactile DFI instead shows a TWI comparable to the auditory DFI rather than one that is significantly shorter.
Here, several issues may combine to account for the lack of one-to-one correspondence between beta cycle length and the length of the tactile TWI. First, it simply takes longer for signals from the hand to reach the brain than for signals from the ears (von Békésy, 1959). Such conduction time differences could total 10–15 ms, which may in part account for the longer than expected tactile TWI. Second, the tactile DFI was far noisier than its auditory counterpart, with its overall goodness of fit being significantly lower. A possible caveat accounting for the noisier fitting may lie in an asymmetry in our experimental design: white noise was continuously played in the tactile DFI but not in the auditory DFI, to cancel out the spiky noise induced by the tactile stimulator. One potential solution could have been to use white noise across both versions of the illusion or, even better, to intermix both versions within the same block while continuously playing white noise. This might additionally have controlled for a potential bias in the allocation of intersensory attention (Pomper et al., 2015) across the two versions of the illusion.
However, it should be noted that pairing white noise with the auditory DFI may have led participants to rely more on visual information (Hartcher-O'Brien et al., 2014), which could in turn have weakened the auditory DFI.
Moreover, several reports have shown the DFI to be resistant to feedback training (Rosenthal et al., 2009) and that participants perceive the illusion independently of cross-modal spatial congruence (Innes-Brown and Crewther, 2009) or even with prior awareness of the illusion itself (Rosenthal et al., 2009), suggesting a minor role played by intersensory attention allocation in this particular task.
Therefore, given the comparative nature of our design, which examined possible differences in the impact of auditory and tactile stimuli on the DFI, it was imperative to control for the specific contribution of each sensory modality.
Playing white noise during the tactile DFI might thus have contributed to noisier curve fitting and, consequently, to a tactile TWI skewed toward longer durations, yielding a less precise estimate of the temporal profile of the tactile DFI. These factors may in part explain the lack of a one-to-one relationship between the TWI and the beta cycle length. Importantly, they would not alter the relationship between the TWI and the oscillatory marker, as they represent a fixed level of noise to be accounted for only in the calculation of the absolute size of the tactile TWI.
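The dependence of the TWI estimate on goodness of fit can be illustrated with a minimal sketch of the standard psychometric-fitting logic: illusion rates across stimulus onset asynchronies (SOAs) are fitted with a descending sigmoid whose inflection point estimates the TWI. The sigmoid parameterization and the data below are illustrative assumptions, not the model or measurements from this study:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(soa, twi, slope):
    """Descending sigmoid: probability of reporting two flashes falls as the SOA grows.
    The inflection point (twi) estimates the temporal window of illusion."""
    return 1.0 / (1.0 + np.exp((soa - twi) / slope))

# Synthetic illusion rates per SOA (ms), for illustration only.
soas = np.array([25, 50, 75, 100, 125, 150, 175, 200], dtype=float)
p_illusion = np.array([0.95, 0.90, 0.80, 0.55, 0.30, 0.15, 0.08, 0.05])

(twi, slope), _ = curve_fit(sigmoid, soas, p_illusion, p0=[100.0, 20.0])
residuals = p_illusion - sigmoid(soas, twi, slope)
r_squared = 1.0 - np.sum(residuals**2) / np.sum((p_illusion - p_illusion.mean())**2)
```

With noisier response rates, the residuals grow, r_squared drops, and the fitted inflection point becomes a less reliable estimate of the true window; this is the sense in which lower goodness of fit in the tactile DFI can bias the absolute TWI without changing its relationship to the oscillatory marker.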
The specific mechanism underlying this outcome may be comparable across sensory modalities while simultaneously reflecting the peculiarities of each modality, including its temporal resolution. In other words, the auditory- and tactile-induced visual illusions may each be driven by the specific oscillatory properties of the cross-sensory pairing involved. This differential oscillatory tuning could reflect the computational speed required by each cross-sensory network to integrate information efficiently, thus representing the optimal quantum for temporal binding between a given cross-sensory pair when impacting visual processing specifically. In this respect, there is ample evidence that, in isolation, visual and auditory sensory processing is governed by oscillatory activity in the alpha band (Ergenoglu et al., 2004; Hanslmayr et al., 2007; Romei et al., 2008a,b, 2010; Van Dijk et al., 2008; Dugué et al., 2011; Weisz et al., 2011; Frey et al., 2014), while somatosensory processing typically occurs within the beta band (Salenius and Hari, 2003; Foffani et al., 2005; Engel and Fries, 2010; Baumgarten et al., 2015). While the relationship of visual processing with alpha oscillations, and with the speed of the alpha frequency, is abundantly documented (Samaha and Postle, 2015; Wutz et al., 2016, 2018; Gulbinaite et al., 2017; Minami and Amano, 2017; Ronconi et al., 2018), there is little empirical evidence on the specific oscillatory nature of the interaction between multiple senses. We and other groups have shown that the impact of simple auditory stimulation on visual processing appears to be governed by the way sounds phase align alpha oscillatory activity in the occipital cortex (Teplan et al., 2003; Romei et al., 2012; Mercier et al., 2013; Frey et al., 2014; Gleiss and Kayser, 2014).
Yet it was unclear whether this was a general feature of cross-modal interactions within the visual system or whether the specific cross-sensory input determines the fate of the visual response. In the current study, we provide the first evidence highlighting the relevance of neural communication at the network level through frequency-specific oscillatory activity.
Footnotes
- Received December 19, 2018.
- Revision received April 14, 2019.
- Accepted May 13, 2019.
- V.R. is supported by the BIAL Foundation (Grant 204/18).
- The authors declare no competing financial interests.
- Correspondence should be addressed to Jason Cooke at jcookea{at}essex.ac.uk or Vincenzo Romei at vincenzo.romei{at}unibo.it
- Copyright © 2019 the authors