Abstract
Optimal behavior relies on the successful integration of complementary information from multiple senses. The neural mechanisms underlying multisensory interactions are still poorly understood. Here, we demonstrate the critical role of neural network oscillations and direct connectivity between primary sensory cortices in visual-somatosensory interactions. Extracellular recordings from all layers of the barrel field in Brown Norway rats in vivo showed that bimodal stimulation (simultaneous light flash and whisker deflection) augmented the somatosensory-evoked response and changed the power of induced network oscillations by resetting their phase. Anatomical tracing revealed sparse direct connectivity between primary visual (V1) and somatosensory (S1) cortices. Pharmacological silencing of V1 diminished but did not abolish cross-modal effects on S1 oscillatory activity, while leaving the early enhancement of the evoked response unaffected. Thus, visual stimuli seem to impact tactile processing by modulating network oscillations in S1 via corticocortical projections and subcortical feedforward interactions.
Introduction
The interplay of different senses is a very efficient strategy for amplifying behaviorally relevant stimuli (Stein, 2012). Integration of visual and tactile stimuli into a coherent percept is mandatory for day-to-day life, especially during visually guided actions. This is impressively demonstrated by the perceptual illusion of the “rubber arm”: a dummy arm aligned with one's own body and tactually stimulated together with one's own hand is perceived as belonging to one's own body (Botvinick and Cohen, 1998).
Originally, it was assumed that the integration of inputs across senses follows hierarchically organized pathways and mainly involves higher cortical areas and some subcortical nuclei (Meredith and Stein, 1983; Rodgers et al., 2008; Stein and Stanford, 2008; Deeg and Aizenman, 2011). Neurons in multisensory brain regions receive convergent inputs from multiple senses and encode perceptual information by enhancing or depressing their firing in response to cross-modal versus unimodal stimuli (Meredith, 2002; Stein and Rowland, 2011). However, the classical convergence view cannot account for all features of multisensory processing (Driver and Spence, 2000). Experimental evidence has documented cross-modal activation in the primary sensory cortices, which traditionally have been considered as sensory-specific (Ghazanfar and Schroeder, 2006; Guzman-Martinez et al., 2012). Several attempts have been undertaken to elucidate by which means cross-modal inputs reach the primary cortices. Feedback projections from higher convergence areas have been proposed as main substrate of cross-modal activation in putative unisensory cortices (Clavagnier et al., 2004). Moreover, direct but rather sparse connections between primary sensory cortices have been identified (Beer et al., 2011), yet their contribution to multisensory processing remains largely unknown (Cappe and Barone, 2005).
In the primary sensory cortices, cross-modal interactions take place at the level of individual neurons as well as at the neuronal network level (Miller and D'Esposito, 2005; Kayser et al., 2007; Iurilli et al., 2012). For example, visual and olfactory stimuli modulate neuronal firing in the auditory cortex (Kayser et al., 2008; Cohen et al., 2011). Correlated neuronal activity within cortical networks might support the multisensory integration of firing rates. Cross-modal stimulation shapes the power and phase of oscillatory activity, enabling flexible modulation of the response's strength (Lakatos et al., 2007; Arnal et al., 2011). Synchronization in γ frequency band, which allows rapid and transient enhancement of the mutual influence of neural populations (Wang, 2010), may provide another efficient mechanism of multisensory integration (Senkowski et al., 2008).
Although these findings highlight diverse mechanisms of cross-modal interplay, the link between anatomical substrates and complex dynamic interactions of neuronal populations during multisensory processing is still missing (Cappe and Barone, 2005). The present study aims at filling this gap by focusing on the poorly investigated visual-somatosensory interactions. For this, multisite extracellular recordings and pharmacological manipulation of S1 and V1 in vivo during unimodal or cross-modal stimulations were combined with tracing of axonal projections between both cortices. We provide evidence that corticocortical connectivity accounts for the visual modulation of oscillatory power and phase in somatosensory networks and, in combination with subcortical feedforward interactions, for supra-additive effects on evoked potentials.
Materials and Methods
Surgical preparation
All experiments were performed in compliance with the German laws and the guidelines of the European Community for the use of animals in research and were approved by the local ethical committee. Brown Norway rats were obtained from Charles River and housed individually in the animal facility of University Medical Center with a 12 h light/12 h dark cycle and fed ad libitum. Male and female rats weighing 32–41 g were used. The surgery was performed under ketamine/xylazine anesthesia (72/9.6 mg/kg body weight, i.p.; Ketavet, Pfizer; Rompun, Bayer). The scalp was removed, and two metal anchor bars were fixed on the nasal and occipital parts of the skull via dental cement, serving for fixation in the stereotaxic device. The bone over S1 and V1 was removed by drilling holes of <0.5 mm in diameter without causing leakage of CSF or blood. According to our unpublished observations (I.L.H.-O.), such leakage damps cortical activity and neuronal firing.
Recording protocols
Extracellular recordings were performed under light urethane anesthesia (0.5 g/kg body weight, i.p., Sigma-Aldrich). Body temperature, breathing rate, and pain reflexes were monitored. During recording (120–220 min after initial injection), additional urethane (0.25–0.5 g/kg body weight) was administered via an intramuscular catheter (n = 6 rats). One-shank 16-channel electrodes with an interrecording site spacing of 100 μm (0.5–3 MΩ, Silicon Michigan probes, NeuroNexus Technologies) were perpendicularly inserted into S1 (2.4–2.6 mm posterior to bregma and 5.5–5.8 mm from the midline) and V1 (6.9–7.1 mm posterior to bregma and 3.4–3.7 mm from the midline) of both hemispheres to a depth of 1.6 mm. The electrodes were labeled with DiI (1,1′-dioctadecyl-3,3,3′,3′-tetramethyl indocarbocyanine, Invitrogen) to enable postmortem reconstruction of the electrode tracks in histological sections of S1 and V1 (see Fig. 1C). Two silver wires were inserted into the cerebellum and served as ground and reference electrodes. Local field potentials were recorded at a sampling rate of 32 kHz using a multichannel extracellular amplifier (Digital Lynx 10S, Neuralynx) and the acquisition software Cheetah. During recording, the signal was bandpass filtered between 0.1 Hz and 5 kHz.
Sensory stimulation
Unimodal (light flash, whisker deflection) or bimodal stimulation was achieved using a custom-made stimulation device. During bimodal stimulation, whisker deflection and light flashes were presented simultaneously either in the same (congruent) hemifields or in opposite (incongruent) hemifields with respect to the tactile stimulus. Whiskers were stimulated with a precise timing (0.013 ± 0.81 ms) by deflection through a compressed-air controlled roundline cylinder (RT/57110/M/10, Norgren) gated via solenoid valves (VCA, SMC Pneumatik). To guarantee nearly noiseless and nonelectrical stimulation, the solenoid valves were placed outside the setup and isolated with foamed material. Because the strength of deflection was constant, the pattern and duration of “follow-up” whisker vibrations were also constant across trials and did not influence the cross-modal effects; 50 ms LED light flashes (300 Lx) were used for visual stimulation. A custom-made controlling device (V.115.2.09) triggered the stimuli in 4 different conditions. Unimodal visual and tactile, congruent, and incongruent simultaneous cross-modal stimulations were randomized through the controlling device and presented at interstimulus intervals of 6.5 ± 0.5 s. The nonstimulated eye was covered with an aluminum foil patch, and ears were additionally sealed with cotton. Each type of stimulus was presented 100 ± 10 times, except for stimulation under lidocaine. The reversibility of drug-induced blockade (Frostig et al., 2008) reduced the effective time for stimulation and consequently the number of stimuli to 50 ± 10.
Blockade of V1 activity with lidocaine
Blockade of action potentials in V1 was performed in rats mounted in the stereotaxic apparatus. A total volume of 100–300 nl lidocaine (4% in artificial CSF, Sigma-Aldrich) was intracortically applied at a rate of 200 nl/min via a 26 G needle (10 μl microsyringe) attached to a microsyringe pump controller (Micro4, WPI). To confine the lidocaine-induced blockade to V1, we calculated the appropriate drug volume according to the spherical volume equation (Tehovnik and Sommer, 1997) as follows:
$$V = \frac{4}{3} \pi r^{3}$$
where V is the lidocaine volume and r is the radius of tissue in which neurons are inactivated ≥90% of the time. The site of drug application was in the direct vicinity (<0.5 mm) of the Michigan electrode inserted into V1. In our previous studies, the application procedure was optimized to avoid side effects attributable to mechanical damage of tissue. Insertion of microsyringe alone or paired with infusion of solvent had no effect on network activity (Janiesch et al., 2011). After application, the needle was left in place for at least 1–3 min to allow optimal diffusion of lidocaine. Successful lidocaine-induced manipulation was confirmed by reduction of visually evoked responses in V1 (see Fig. 8B).
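As a rough illustration of the spherical volume calculation, the following sketch inverts the sphere-volume relation to estimate the radius of tissue reached by a given injection volume. The function name and the unit handling are illustrative assumptions and not part of the original analysis; the sketch also ignores tissue anisotropy and diffusion over time.

```python
import math

def inactivation_radius_um(volume_nl: float) -> float:
    """Radius (in micrometers) of a sphere that holds `volume_nl` nanoliters.

    Hypothetical helper: assumes the injected drug fills an ideal sphere.
    """
    volume_mm3 = volume_nl * 1e-3                      # 1 nl equals 1e-3 mm^3
    radius_mm = (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return radius_mm * 1e3                             # convert mm to micrometers

if __name__ == "__main__":
    # Volumes at the two ends of the 100-300 nl range used in the study
    for volume in (100.0, 300.0):
        print(f"{volume:.0f} nl -> radius of about {inactivation_radius_um(volume):.0f} um")
```

For the 100–300 nl range used here, this yields radii of roughly 0.3–0.4 mm, which is the order of magnitude expected for a blockade confined to V1.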
Retrograde tracer and histology
Anesthetized rats were immobilized into a preformed mold fixed into the stereotaxic apparatus and received unilateral injections of Fluorogold (FG, Fluorochrome) in S1 (2.4–2.6 mm posterior to bregma and 5.5–5.8 mm from the midline). A total volume of 100 nl FG (5% in PBS) was delivered via a 26 G needle attached to a pump controller. The slow injection speed (30 nl/min) and the maintenance of the syringe in place for at least 2–3 min ensured an optimal diffusion of the tracer. After a survival time of 4–8 d, the rats were deeply anesthetized with ketamine/xylazine and perfused transcardially with 4% PFA. For FG staining, the brains were removed and postfixed in the same solution for 24–72 h. Blocks of tissue containing S1 or V1 were sectioned in the coronal plane at 100 μm, air dried, and examined using ultraviolet excitation filter. For quantification, FG-stained cells were counted by eye.
For cytochrome oxidase and Nissl staining, the brains were removed and halved along the midline. The subcortical brain regions of one half of the brain were removed, and the cortex was flattened between two acrylic glass plates. Both halves were postfixed in 4% PFA for 24–72 h. The flattened cortices were sectioned in the transverse plane at 100 μm and processed for cytochrome oxidase histochemistry (see Fig. 1B). Briefly, the sections were incubated in a solution containing diaminobenzidine (0.5 mg/ml), cytochrome C (0.6 mg/ml), catalase (0.36 mg/ml), and saccharose (44.4 mg/ml). The sections were examined using light microscopy and a red (535–555 nm) excitation filter of the fluorescence microscope (SZX16, Digital camera DP72, Olympus) to reconstruct the track of the DiI-labeled Michigan probe. The nonflattened brain halves were sectioned in the coronal plane at 100 μm and air dried. Fluorescent Nissl staining was performed as previously described (Brockmann et al., 2011) using the NeuroTrace 500/525 green fluorescent Nissl stain (Invitrogen). Briefly, rehydrated slices were incubated for 20 min with NeuroTrace (dilution 1:100). Sections were washed, coverslipped with Fluoromount, and examined using the green (460–480 nm) and the red (535–555 nm) excitation filter of the fluorescence microscope. The photographs were adjusted for brightness and contrast using Adobe Photoshop CS4 (Version 11.0.2).
Data analysis and statistics
Data were imported and analyzed offline using custom-written tools in Matlab software version 7.7 (MathWorks). Data are presented as mean ± SEM. All values were tested for normal distribution with Lilliefors test (α = 0.05). Significance levels of p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***) were detected.
Calculation of evoked potentials.
Continuous recordings were epoched offline and 1-s-long time windows (300 ms before stimulus and 700 ms after stimulus) corresponding to each stimulation condition (unimodal visual/tactile, cross-modal congruent/incongruent) were averaged for each recording. Epochs with stimulation artifacts or offsets were cut out. The properties of resulting evoked potentials (EPs) were used to confirm the position of recording sites across rats (maximum amplitude of EPs in layer IV, polarity reversal at the border between layer III and IV). This laminar organization was confirmed by histology (track of DiI-labeled electrode), whereas current source density (CSD) analysis allowed functional identification of the supragranular (S), granular (G), and infragranular (I) layers at higher spatial resolution. One-dimensional CSD profiles were calculated according to a five-point formula (Nicholson and Freeman, 1975). The CSD values Im were derived from the smoothed second spatial derivative of the extracellular field potentials Φ and calculated as follows:
$$I_{m}(r) = -\frac{1}{k h^{2}} \sum_{j=-n}^{n} a_{j} \, \Phi(r + j h)$$
where h is the distance between electrodes (100 μm) and r is the coordinate perpendicular to the cortical layer (n = 2, k = 7, a0 = −2, a±1 = −1, and a±2 = 2). In the plots, blue represented current sinks and red represented current sources. The number of time windows averaged over rats for each recording site was kept constant for each stimulation condition [from recording site 1 (up) to 16 (bottom): 707, 902, 1256, 1264, 1566, 1760, 1754, 1769, 1754, 1755, 1664, 1462, 1567, 1461, 1310, 983]. The EP onset was calculated for each trial as the delay between stimulus and first EP deflection exceeding the baseline SD (−300 to 0 ms). The EP peaks were detected as local maxima/minima of the local field potential (LFP) within sliding time windows of 100 ms. The peaks were defined as positive (P) or negative (N) based on surface polarity. 
Their amplitude and delay from stimulus were averaged over all trials and tested together with the EP onsets for normal distribution as well as for significant differences between conditions with Kruskal–Wallis test corrected with the Holm–Bonferroni method. For scatter plots, the amplitude of P1 and N1 of EPs (see Fig. 2C) was randomized over all trials in all investigated rats and plotted for each layer and stimulation condition. To decide whether the amplitude distribution significantly changes between stimulation conditions, the Euclidean distances between points belonging to different conditions (intercluster Euclidean distances) and Euclidean distances in mixed condition group (intracluster Euclidean distances) were calculated and tested for significance with Kruskal–Wallis test (α = 0.05). The effect of lidocaine was assessed by calculating the area under the curve for every tactile response in S1 and visual response in V1 under control conditions and after drug application.
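The five-point CSD estimate described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab implementation: the function name is hypothetical, the tissue conductivity is omitted (so values are in arbitrary units), and the negative sign (sinks negative, sources positive) is an assumed convention. The coefficients follow the values given in the text (n = 2, k = 7, a0 = −2, a±1 = −1, a±2 = 2).

```python
def csd_five_point(phi, h_mm=0.1, k=7.0, coeffs=(2.0, -1.0, -2.0, -1.0, 2.0)):
    """Smoothed second spatial derivative of a laminar potential profile.

    phi    : potentials at equally spaced depths (one value per recording site)
    h_mm   : spacing between sites in mm (100 um in the study)
    coeffs : a_{-2}, a_{-1}, a_0, a_{+1}, a_{+2}
    Returns one CSD estimate per interior site; the two outermost sites on
    each side are dropped because the formula needs two neighbors per side.
    """
    n = len(coeffs) // 2
    csd = []
    for i in range(n, len(phi) - n):
        weighted = sum(a * phi[i + j] for a, j in zip(coeffs, range(-n, n + 1)))
        csd.append(-weighted / (k * h_mm ** 2))   # assumed sign: sinks are negative
    return csd

if __name__ == "__main__":
    # Sanity check on a parabolic profile phi(z) = z^2, whose second
    # derivative is 2 everywhere, so every CSD value should be close to -2.
    depths = [i * 0.1 for i in range(16)]
    profile = [z ** 2 for z in depths]
    print(csd_five_point(profile))
```

With a 16-site probe this returns 12 values, one for each site that has two neighbors above and below it.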
Spectral analysis of induced activity.
For each stimulation trial, continuous wavelet coefficients C were calculated for time windows of 1500 ms (500 ms before stimulus and 1000 ms after stimulus) at frequency scale a and position b by the following:
$$C(a, b) = \int_{-\infty}^{\infty} s(t) \, \frac{1}{\sqrt{a}} \, \Psi^{*}\!\left(\frac{t - b}{a}\right) dt$$
where s(t) is the LFP signal, Ψ is a Morlet wavelet, and * denotes complex conjugation. They were corrected for pink noise by normalization to the coefficients of baseline activity (100–300 ms before stimulus) at every frequency. The timing of power modulation is less precise because of the pink noise correction and the low time resolution of wavelets in low-frequency range. Baseline normalized wavelets were averaged for all rats, and their coefficients were tested for significant differences between unimodal and cross-modal stimulation condition by two-sample t test. Significant coefficients were grouped in four time windows and averaged. The amplitude of induced oscillations in different frequency bands was calculated for individual trials (300 ms before stimulus and 700 ms after stimulus) using the Hilbert transform of filtered data (third-order Butterworth bandpass filter) and averaged for all trials. Mean amplitude of oscillations during time windows before (50–150 ms) and after (75–175 ms; 300–400 ms) stimulus was tested for significant differences with Kruskal–Wallis test.
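The baseline normalization step can be illustrated with a short sketch. It assumes that normalization means dividing each frequency row of a time-frequency power matrix by its mean over the baseline window (100–300 ms before stimulus), so that values are expressed as a fraction of baseline. The function name and the data layout are hypothetical, and the wavelet transform itself is not reproduced here.

```python
def baseline_normalize(power, times_ms, baseline=(-300.0, -100.0)):
    """Express time-frequency power as a fraction of baseline power.

    power    : list of rows, one row per frequency, each row a list over time
    times_ms : time of each sample relative to stimulus onset (negative = before)
    baseline : (start, end) of the baseline window in ms, inclusive
    """
    idx = [i for i, t in enumerate(times_ms) if baseline[0] <= t <= baseline[1]]
    if not idx:
        raise ValueError("no samples fall inside the baseline window")
    normalized = []
    for row in power:
        base = sum(row[i] for i in idx) / len(idx)    # mean baseline power at this frequency
        normalized.append([value / base for value in row])
    return normalized

if __name__ == "__main__":
    times = [-400.0, -300.0, -200.0, -100.0, 0.0, 100.0]
    # Two frequencies with very different absolute power (as with pink noise)
    power = [[2.0, 2.0, 2.0, 2.0, 4.0, 1.0],
             [10.0, 10.0, 10.0, 10.0, 5.0, 20.0]]
    for row in baseline_normalize(power, times):
        print(row)
```

Normalizing per frequency removes the 1/f-like decline in absolute power, so a value of 1 means no change relative to baseline at any frequency.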
Phase analysis
Phase distribution across trials was characterized by calculating the resultant length of the mean vector. For this, LFPs during 1500-ms-long time windows (500 ms before stimulus and 1000 ms after stimulus) were filtered in three different frequency ranges [third-order Butterworth bandpass filter (4–12 Hz, 13–30 Hz, 31–100 Hz)]. The phase of oscillatory activity was extracted using the Hilbert transform, and single trial event-related phase values were analyzed by circular statistical methods (Circular Statistics Toolbox). Because of zero-phase digital filtering, the phase was not distorted but the time resolution of phase distribution was poor. The mean resultant vector length was calculated at each frequency and time point and baseline-normalized (300–500 ms before stimulus). The 99% confidence intervals were calculated with a z-value corrected for the number of all time points (n = 1628, z = 4.5214).
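The two quantities used in the phase analysis can be sketched in a few lines. The first function computes the mean resultant vector length of a set of phases (1 for identical phases, 0 for phases that cancel out). The second shows one way to obtain a z-value corrected for the number of time points; it assumes a two-sided Bonferroni correction, which is an assumption about the procedure rather than something stated in the text. Both function names are hypothetical.

```python
import cmath
import math
from statistics import NormalDist

def mean_resultant_length(phases):
    """Length of the mean unit vector for a list of phases in radians."""
    vector_sum = sum(cmath.exp(1j * p) for p in phases)
    return abs(vector_sum) / len(phases)

def bonferroni_z(confidence=0.99, n_tests=1628):
    """Two-sided z-value after dividing alpha by the number of comparisons.

    Assumed correction scheme (Bonferroni); not confirmed by the source.
    """
    alpha = (1.0 - confidence) / n_tests
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

if __name__ == "__main__":
    locked = [0.3] * 50                                 # same phase in every trial
    spread = [2.0 * math.pi * k / 8 for k in range(8)]  # phases evenly spread on the circle
    print(mean_resultant_length(locked))
    print(mean_resultant_length(spread))
    print(bonferroni_z())
```

Under this assumption, the corrected z-value for 1628 time points comes out close to the value of about 4.52 reported above.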
Results
Evoked potentials in S1 as result of unimodal versus visual-tactile stimulation
We first assessed the impact of unimodal stimulation (light flash or deflection of principal whiskers) on the LFP recorded over the entire cortical depth of the posterior medial barrel subfield (Figures 1 and 2A) in S1 of lightly urethane-anesthetized Brown Norway rats (n = 10). The good visual acuity of pigmented Brown Norway rats compared with albino rats (Prusky et al., 2002) makes them well suited for testing visual-somatosensory processing. By conducting the entire investigation under sleep-like conditions mimicked by the urethane anesthesia (Clement et al., 2008), we avoided the interference with spontaneous whisking and the impact of alert state, which modulates the cross-modal integration.
In the contralateral S1, whisker stimulation evoked responses with first fast peaks followed by slower components with positive or negative polarity (Fig. 2B,C). These mean EPs resulted from averaging the LFPs over a large number (100/rat) of whisker stimulations and had a precisely stimulus-timed onset. For given depths corresponding to the supragranular (S), granular (G) and infragranular (I) layers of the contralateral S1 (Fig. 2B), the EP onset and the delay of the first EP peak did not significantly differ across rats (S, p = 0.19; G, p = 0.19; I, p = 0.26). Whisker stimulation evoked a smaller response in the ipsilateral S1 (Fig. 2D), which resulted most likely from the activation of noncrossing projections. Multicomponent stimulus-locked EPs in the contralateral S1 were accompanied by a switch from current sinks to sources at the border between granular and supragranular layers (Fig. 2B). In the granular layer, the EPs had the shortest onset (12.8 ± 0.6 ms, n = 1769 unimodal trials) and their first peak showed the largest absolute amplitude (552.9 ± 7.1 μV, n = 1754 trials, p < 0.001) as well as positive surface polarity (P1). In line with the feedforward activation that has been previously identified in primary cortical areas upon stimulation (Schroeder et al., 1998; Fu et al., 2003), the initial activation of supragranular (onset 16.4 ± 1 ms, n = 1032 trials) and infragranular (onset 14.9 ± 0.5 ms, n = 1457 trials) layers significantly (p < 0.001, p = 0.001) lagged behind the granular response. The initial peak of EPs was followed after a long delay (∼150–400 ms) by two or three additional peaks with positive or negative polarity and lower amplitude (Tables 1 and 2). The most prominent among them had negative surface polarity (N1) and shortest peak time (154 ± 2.3 ms, n = 1754 trials) in the granular layer. 
In contrast to whisker deflection, a short light flash evoked after 73.6 ± 4.3 ms a significantly smaller response in the contralateral S1 that had the same polarity over all cortical layers (Fig. 2E). As expected, strong activation of the contralateral V1 was obtained after light-stimulation (Fig. 2F).
In a second step, whisker deflection and light flashes were presented simultaneously either in the same (congruent) hemifield or in opposite (incongruent) hemifields with respect to the tactile stimulus (Fig. 3). Compared with the unimodal P1 and N1 (552.9 ± 7.1 μV, 98.6 ± 2.7 μV) as well as their arithmetic sum (536.9 ± 8 μV, 123.5 ± 3.8 μV), congruent visual-tactile stimulation elicited stronger activation of S1 (Fig. 3Ai). In particular, the absolute amplitude of both peaks in the granular layer was enhanced (P1, 581.8 ± 6.9 μV, p < 0.001; N1, 133.1 ± 2.5 μV, p < 0.001, n = 1754 cross-modal trials). Although the EP onset (11.4 ± 0.3 ms, n = 1769 trials) and the P1 peak time (26 ± 0.2 ms, n = 1769 trials) did not differ between unimodal and cross-modal conditions, the N1 peak occurred significantly earlier (137.6 ± 1.5 ms, n = 1754, p < 0.001) after visual-tactile stimulation. By contrast, incongruent bimodal stimulation did not evoke similar supra-additive effects on EPs (Fig. 3Aii). The onset of the cross-modal enhanced first EP peak in the S1 was shorter than the onset of light-evoked response in the contralateral V1 (37.3 ± 0.8 ms) (Fig. 2F). Thus, visual stimuli augmented the evoked somatosensory response before the activation of V1 by light. The supra-additive effects after bimodal stimulation were not affected by volume conduction and were confirmed by CSD analysis (Fig. 3B). The amplitude of CSD signals for the P1 and N1 peaks was similarly augmented after bimodal stimulation compared with unimodal stimulation or arithmetic sum of visual and tactile responses. Assessment of intercluster and intracluster Euclidean distances for the amplitude distributions of the first and second peak for both unimodal and cross-modal evoked EPs confirmed a significant visual-somatosensory enhancement after congruent stimulation only (S, p = 0.06; G, p < 0.001; I, p < 0.001) and the lack of such effects after incongruent activation (Fig. 3C). 
This enhancement was significant only when tactile and visual stimuli originated in the same hemifield.
Induced oscillatory activity in S1 as result of unimodal versus visual-tactile stimulation
In addition to prominent multicomponent evoked responses, two additional patterns of network activity characterize S1: (1) spontaneous ongoing oscillations and (2) stimulus-induced oscillations (Senkowski et al., 2008) (Fig. 4A). The first type is known to correlate with various brain states (e.g., up and down states) and has been shown to modulate individual incoming inputs (Civillico and Contreras, 2012). Because they are not causally linked to sensory stimuli, the spontaneous ongoing oscillations are commonly cancelled out after averaging the LFPs corresponding to a large number of stimulation trials. The second type of activity, stimulus-induced oscillations, is causally related, but in contrast to the evoked response, not phase-locked to sensory stimulus (e.g., whisker deflection). Consequently, these oscillations are underscored by plotting individual frequency-power spectra for a large number of trials, whereas they cancel out when LFPs were averaged across trials.
Whisker stimulation induced prominent oscillations with frequencies ranging from θ (4–8 Hz) to γ band (30–100 Hz) in the supragranular, granular, and infragranular layers of S1 (Fig. 4). Analysis of individual frequency spectra corresponding to a large number of stimulation trials revealed that, shortly after the prominent stimulus-evoked response, the amplitude of stimulus-induced γ band oscillations was significantly (S, p = 0.02; G, p = 0.03; I, p = 0.03) decreased from 12.6 ± 0.8 μV (S), 15 ± 1 μV (G), and 14.1 ± 0.7 μV (I) under baseline conditions (50–150 ms before stimulus) to 8.9 ± 0.2 μV (S), 10.3 ± 0.2 μV (G), and 10.9 ± 0.3 μV (I) for the 75–175 ms time epoch after stimulus. This depression was followed (300–400 ms after stimulus) by a stimulus-induced increase of γ band network activity to 14.8 ± 0.6 μV (S), 20.7 ± 1 μV (G), and 16.9 ± 0.6 μV (I) that significantly (p < 0.001) exceeded the baseline level (Fig. 4B). Although less consistent across layers, tactile stimulation additionally induced low-frequency oscillations. The amplitude of θ band activity significantly (p < 0.001) increased from 77.9 ± 4.5 μV (G) and 89.2 ± 4.1 μV (I) before the stimulus to 89.4 ± 3.4 μV (G) and 109.4 ± 4.4 μV (I) after the evoked response. Similarly, α and β band oscillations were significantly augmented after stimulus (Fig. 4C).
To assess the effect of simultaneous visual and tactile stimulation on the induced oscillatory activity in the S1, we pooled the multiple-trial baseline normalized Morlet wavelet spectra across animals (n = 8), separately for the four different stimulation conditions: unimodal visual stimulation, unimodal tactile stimulation, simultaneous congruent cross-modal stimulation, and simultaneous incongruent cross-modal stimulation. Unimodal visual stimulation did not modify the frequency distribution over S1 layers (Fig. 5A). In contrast, whisker deflection induced prominent responses in different frequency bands in the S, G, and I layers (Fig. 5B). During cross-modal stimulation, the somatosensory activity was visually modulated. To quantify the effects of cross-modal stimulation on the oscillatory activity, we compared the baseline-normalized power change for each time-frequency point of the Morlet wavelet spectra. The time points with significant power change after cross-modal versus unimodal stimulation were clustered. Three time windows (I: 150–230 ms; II: 440–550 ms; III: 850–900 ms after stimulus) with significantly different induced γ band (30–100 Hz) activity and one time window (IV: 115–915 ms after stimulus) with significantly different induced θ band (4–8 Hz) activity resulted after clustering analysis (Fig. 5B,C). During time window I the mean power relative to baseline decreased in all cortical layers, whereas during time window II the mean power increased after bimodal stimulation. The relative augmentation in γ frequency band (defined as fraction with respect to baseline) was more prominent in the granular (from 1.3 ± 0.1 to 1.9 ± 0.2, p = 0.004) than in the infragranular and supragranular layers (from 1.1 ± 0.1 to 1.5 ± 0.1, p = 0.029; from 1.4 ± 0.1 to 1.7 ± 0.2, p = 0.134). 
Compared with unimodal stimulation, the bimodal stimulus significantly (p = 0.02) enhanced the oscillatory activity in the granular layer during time window III (1.1 ± 0.1 to 1.5 ± 0.2). Incongruent visual-tactile stimulation similarly modified the power of induced oscillations during these time windows. However, the changes were less prominent compared with the congruent condition (Fig. 5C,D). The most obvious difference was the decrease in γ power of induced oscillations during time window III from 1.5 ± 0.2 after congruent stimulation to 1.3 ± 0.3 after incongruent stimulation. Thus, visual input accompanying whisker deflection modified the power of induced network oscillations in S1.
Mechanisms and anatomical substrate of visual-tactile interplay
Next, we aimed at identifying the neural mechanisms by which the copresented visual input modulates the tactile evoked response on one hand and the induced network activity in S1 on the other hand. This will make it possible to link both major cross-modal effects into a unitary concept of visual-tactile processing. One possible mechanism is a visual stimulus-induced phase reset of the ongoing network oscillations in S1 to an optimal phase, during which the incoming tactile stimulus has the highest effectiveness. To test this hypothesis, we compared the phase synchrony of spontaneous oscillations in S1 during a large number of unimodal visual stimulation trials and calculated the mean resultant vector length of oscillatory phases for θ (4–8 Hz), α (8–12 Hz), β (13–30 Hz), and γ (30–100 Hz) frequencies. If the oscillatory phase is the same in each trial, the mean resultant vector length will be 1, whereas if the oscillatory phase is absolutely random, the value will be 0. Unimodal visual stimulation induced prominent, but similar, stimulus-related phase concentration of θ and α band oscillations in all layers of the contralateral S1. To avoid errors of narrow band filtering and improve the power of statistical testing, the data for θ and α frequency bands were pooled (Fig. 6A). A similar, but smaller, phase concentration was observed in the ipsilateral S1 (Fig. 6B). Visually driven oscillations in the contralateral S1 remained in phase for a long time, their synchronization being significant (i.e., >99% confidence interval) for up to 330 ms in S, up to 530 ms in G, and up to 410 ms in I. Similar cross-modal phase reset for oscillatory activity in θ and α band was observed in V1 after tactile stimulation (Fig. 6C).
To decide whether the visual-somatosensory interplay relies on direct communication between neuronal networks in V1 and S1, we first assessed by anatomical tracing the direct connectivity between the two primary sensory cortices. We injected small amounts of the retrograde tracer FG, which has high resistance to fading (Schmued and Fallon, 1986), into S1 of seven rats, paying special attention to the confinement of tracer within cortical layers (Fig. 7A). Confirming previous studies, bright fluorescent back-labeling of parent cell bodies sending feedforward projections to the barrel field was observed in the ventral posteromedial (VPM) and posterior nuclei of thalamus as well as in the ipsilateral secondary somatosensory cortex. Moreover, we identified labeled cells in the visual cortices and to a weaker extent in the motor cortex (Fig. 7A,B). Remarkably, FG injection revealed that V1 neurons directly project to S1. Retrogradely stained neurons were detected in V1 of all investigated rats (Fig. 7B,C), being concentrated in a small cortical volume (0.064 mm3) over the S and G layers. The maximal cell density strongly varied between animals (13–56 cells; mean 28.9 ± 5.8 cells), possibly reflecting the slightly variable position of the FG injection site in S1. No stained neurons were observed in the contralateral V1, indicating that direct interhemispheric connections between V1 and S1 are missing. We hypothesize that the connectivity directly linking the primary cortices may represent one of the anatomical substrates underlying functional communication between S1 and V1.
To test this hypothesis, we silenced the electrical activity in V1 (n = 5 rats). Small amounts (100–300 nl) of the action potential blocker lidocaine were injected at low speed (200 nl/min) into V1. Confirming our previous data (Brockmann et al., 2011) and in line with the spherical volume calculation, lidocaine acted on a small volume (radius ∼290–415 μm) confined to V1, leading here to a significant reduction (49.5 ± 11.4%, p < 0.001) of the visual responses while leaving the tactile responses in S1 unaffected (Fig. 8). The effects of transient and partial V1 silencing were investigated in the ipsilateral barrel cortex. Lidocaine-induced blockade of activity in V1 did not affect the bimodal enhancement of the first EP peak (P1) over the S, G, and I layers (Fig. 9A; Tables 3 and 4). However, lidocaine eliminated the cross-modal augmentation of the second peak (N1), the amplitude of which no longer differed between unimodal (S: 25.1 ± 20.4 μV, n = 102 trials; G: 75.8 ± 7.3 μV, n = 243 trials; I: 56.4 ± 8.2 μV, n = 248 trials) and cross-modal (S: 21.4 ± 32.1 μV; G: 89.4 ± 9.5 μV; I: 46.5 ± 9.2 μV) stimulation (Fig. 9A,B). Moreover, blockade of V1 activity eliminated almost all power differences between the network activity induced by unimodal versus cross-modal stimulation (Fig. 9C). Partial silencing of V1 also allowed testing whether phase reset of ongoing oscillatory activity mediates the visual-tactile interplay. The analysis of the mean resultant vector length showed that phase concentration of slow oscillations (4–12 Hz) by contralateral visual input was significantly decreased in all S1 layers and almost abolished in the S layer after lidocaine injection into V1 (Fig. 9D). Moreover, functional disconnection of S1 and V1 delayed the remaining phase concentration in the G and I layers (∼190 ms and ∼180 ms after stimulus, respectively).
These results indicate that V1 and S1 connections are necessary for cross-modal processing. The direct connectivity demonstrated in the present study might represent one possible pathway that mediates the visual reset of oscillatory phase in the barrel field, whereas subcortical feedforward interactions seem to account for the supra-additive early evoked responses.
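The term supra-additive, as used for the early evoked response, refers to a bimodal response that exceeds the sum of the two unimodal responses. A minimal sketch of this comparison follows; the function name, the tolerance parameter, and the amplitude values are invented for illustration and are not data from the present study.

```python
def classify_interaction(bimodal: float, unimodal_a: float, unimodal_b: float,
                         tolerance: float = 0.0) -> str:
    """Compare a bimodal response with the sum of the unimodal responses.

    All amplitudes must share one unit (e.g., microvolts). The tolerance
    is a hypothetical margin within which responses count as additive.
    """
    unimodal_sum = unimodal_a + unimodal_b
    if bimodal > unimodal_sum + tolerance:
        return "supra-additive"
    if bimodal < unimodal_sum - tolerance:
        return "subadditive"
    return "additive"


# Invented example amplitudes (microvolts), for illustration only
print(classify_interaction(bimodal=120.0, unimodal_a=80.0, unimodal_b=10.0))
print(classify_interaction(bimodal=70.0, unimodal_a=80.0, unimodal_b=10.0))
```

In practice, such a comparison would be made on trial-wise data with an appropriate statistical test rather than on single mean values.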
Discussion
The present study aimed at elucidating the functional and structural correlates of visual-somatosensory interactions by combining electrophysiological recordings and pharmacology in vivo with anatomical tracing. We demonstrate the following: (1) integration of visual and tactile information takes place in the barrel field of S1; (2) visual cross-modal supra-additive augmentation of somatosensory-evoked responses mainly originates on the subcortical sensory tract partially independent of the primary neocortices; and (3) visual stimulus-induced reset of ongoing neuronal oscillations and power modulation of the activity in S1 critically depend on the communication between primary sensory cortices (Fig. 10).
As previously reported (Roy et al., 2011), tactile information from the whisker reached the granular layer of the barrel field after a short delay (∼12 ms). From here, the activity spread to the infragranular and supragranular layers. In addition to evoking this prominent response, the unimodal somatosensory stimulus modified S1 network activity and induced neuronal oscillations in a wide range of frequency bands (4–100 Hz). Simultaneous cross-modal stimulation of the whiskers and the eye augmented the evoked responses and significantly changed the poststimulus power of induced network oscillations in the θ and γ frequency bands while leaving the temporal pattern of activation across layers unaffected.
Our results point toward two distinct mechanisms of cross-modal interaction. First, subcortical multisensory regions along the sensory tract that relay information from the periphery to the neocortex may account for cross-modal augmentation of the short-delay initial response in S1. Several lines of evidence support this conclusion. Partial silencing of V1 by lidocaine did not affect the first peak of the evoked response in the S1, suggesting that this processing pathway bypasses V1. Moreover, the first multisensory effect had a shorter latency than the first visual response in both V1 and S1. Thalamic nuclei [e.g., lateral geniculate nucleus (LGN), VPM] relay sensory information from the periphery (e.g., retina, whiskers) first to the G layer and later to the S and I layers of the corresponding primary sensory cortices (e.g., V1, S1) (Wise and Jones, 1978; Herkenham, 1980; Chapin and Lin, 1990). In addition to unisensory processing, some thalamic nuclei might mediate multisensory interactions. For this, four distinct mechanisms have been postulated (Cappe et al., 2009): (1) thalamic nuclei send sensory information to more than one sensory cortex; (2) cross-modal input is integrated at the thalamic level and subsequently sent to sensory cortices; (3) sensory cortices (and possibly single neurons) receive input from several thalamic regions processing different sensory inputs; and (4) inputs of one sensory area are transferred to another cortical area via the thalamus (corticothalamo-cortical route) (Noesselt et al., 2010). Because of the limited temporal resolution of imaging techniques (Barth et al., 1995; Noesselt et al., 2010), it was not possible to distinguish whether thalamic multisensory effects are mediated via feedback connections from cortical areas (mechanism 4) or whether they are the result of bottom-up processing (mechanisms 1–3). 
Our results clearly demonstrate the existence of thalamic feedforward mechanisms because the supra-additive effect on the first peak of the evoked response was not affected by lidocaine and its peak time was too short to allow for feedback interactions. Because the LGN was not retrogradely stained, mechanism 2 is the most probable explanation for the present experimental findings. Several thalamic regions have been reported to react to stimulation of more than one sensory modality, representing possible relay stations of cross-modal effects (Tyll et al., 2011). Simultaneous visual and tactile stimulation activated neurons in the medial geniculate body (Wepsic, 1966) and reticular nucleus (Sugitani, 1979). Similarly, cross-modal interactions have been reported for the auditory thalamus (Komura et al., 2005). Because some sensorimotor loops are formed below the cortical level (Diamond et al., 2008), early thalamic cross-modal interactions might speed up the reaction time.
The second mechanism of cross-modal processing involves the activation of neuronal networks in the primary sensory cortices. Visual stimuli reset the phase of network oscillations in the barrel field. When presented alone, light flashes induced a prominent and sustained phase concentration of low-frequency oscillations in all layers of contralateral S1, whereas the effect on the ipsilateral side was much weaker and shorter. Similarly, tactile stimulation caused phase reset in V1. As shown by their augmented power, superimposed fast oscillations (β-γ frequency band) accumulated at a specific phase of ongoing low-frequency oscillatory activity. Consequently, a co-occurring tactile stimulus arrives during the same phase of ongoing oscillations in S1. The coincidence of a stimulus with a specific (high- or low-excitability) phase of network oscillations is assumed to increase its processing efficiency (Fries et al., 2001). Similar oscillatory phase reset has been previously proposed as an underlying mechanism of cross-modal interactions in the auditory cortex (Lakatos et al., 2005; Kayser et al., 2008). Both supra-additive auditory-somatosensory and subadditive auditory-visual interactions were associated with a phase resetting of oscillatory activity in the A1 (Lakatos et al., 2007; Thorne et al., 2011). The mechanism of phase reset also shows an incongruency effect because the weaker and shorter visual phase concentration results in a weaker power modulation of oscillatory activity. Because we observed a similar cross-modal phase modulation in S1 and V1, we suggest that phase reset of ongoing oscillations might be a general mechanism of multisensory interactions used by all sensory systems.
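Phase concentration of this kind is commonly quantified by the mean resultant vector length, the measure named in the Results. The following minimal sketch shows how this quantity behaves for random versus concentrated phases. The simulated phases and the function name are assumptions for illustration; they do not reproduce the recordings or the analysis pipeline of this study.

```python
import cmath
import math
import random


def mean_resultant_vector_length(phases_rad):
    """Length of the mean unit vector of a set of phases (radians).

    Values near 0 indicate phases spread over the cycle; values near 1
    indicate phases concentrated around one angle across trials.
    """
    unit_vectors = [cmath.exp(1j * phase) for phase in phases_rad]
    return abs(sum(unit_vectors) / len(unit_vectors))


random.seed(0)
# Simulated prestimulus phases: uniformly distributed over the cycle
random_phases = [random.uniform(-math.pi, math.pi) for _ in range(200)]
# Simulated poststimulus phases: clustered around 0 rad, as after a reset
reset_phases = [random.gauss(0.0, 0.3) for _ in range(200)]

print(f"random phases: {mean_resultant_vector_length(random_phases):.2f}")
print(f"reset phases:  {mean_resultant_vector_length(reset_phases):.2f}")
```

A stimulus that resets oscillatory phase would raise this value across trials shortly after stimulus onset, whereas ongoing activity with unrelated phases keeps it near zero.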
Moreover, we elucidated the possible anatomical pathways that mediate the observed phase reset. As shown by retrograde tracing and lidocaine-induced silencing of V1 activity, direct projections from the V1 to S1 might represent the anatomical substrate of cross-modal modulation of network oscillations. These projections mainly target supragranular layers of the S1 (Paperna and Malach, 1991); and correspondingly, lidocaine had at this depth the strongest effect on the oscillatory phase concentration. Corticocortical projections terminating in layer II/III may boost neurotransmitter release in these layers. Similarly, GABAergic transmission in the supragranular layer has been identified as a prerequisite for phase concentration of ongoing activity and a key cellular mechanism that underlies auditory inhibition of V1 (Iurilli et al., 2012). Taking into account the rather low density of direct projections between V1 and S1 and the persistence, even at a lower magnitude, of visually induced phase reset in S1 granular and infragranular layers after lidocaine silencing of V1, it is very likely that additional mechanisms are meant to amplify and/or complement the cross-modal interplay. The high density of retrogradely labeled neurons in V2 (Fig. 7B) suggests that higher-order visual cortices that receive feedforward inputs from V1 may be equally involved. However, the cross-modal effects are relayed from V1 to S1 via few stations because the visually induced phase reset in S1 is only slightly delayed compared with the visually evoked response in V1 (42–68 ms vs 37–40 ms). On the other hand, projections from multisensory thalamic nuclei as well as feedback projections from higher-order convergence areas (Theyel et al., 2010) might account for the delayed lidocaine-insensitive peaks of phase concentration.
In contrast, the weaker power modulation and phase reset of S1 activity, which were induced by spatially incongruent stimulation, seem to rely on different mechanisms. In the absence of direct interhemispheric connections between V1 and S1, either ipsilateral projections from the retina to the dorsal LGN (Discenza and Reinagel, 2012) or transcallosal information transfer within V1 or S1 (Genc et al., 2011; Ragert et al., 2011) may account for the phase reset observed when visual and tactile stimuli were presented in opposite hemispheres.
The present results suggest that the visual modulation of S1 activity facilitates the processing of tactile stimuli. Even though the rat state (sleep-like conditions under urethane anesthesia) and the simple stimulation patterns used do not perfectly match the natural multisensory stimulation in behaving animals, the investigation under these controlled conditions identified general principles and mechanisms of cross-modal interactions. As recently shown (Rowland et al., 2012), interactions identified under anesthetized conditions might equally control cross-modal processing in a complex natural environment. We propose that, similar to recent findings on audiovisual detection (Gleiss and Kayser, 2012), access to visual and tactile information is beneficial for rats, supporting perceptual discrimination. Performance in whisker-based discrimination tasks has been reported to critically depend on V1 neuronal firing, even in the dark (Vasconcelos et al., 2011). Similarly, looking at the arm to which a tactile stimulus was applied reduced the two-point discrimination threshold in humans (Kennett et al., 2001). This visual enhancement of touch was found to involve visual modulation of S1 (Serino et al., 2009). We propose that both cross-modal interactions at the subcortical (thalamic) level and the oscillatory entrainment of primary sensory cortices via direct connectivity are powerful instruments of information processing, which help individuals to detect and localize the most salient and possibly relevant events in the environment.
Footnotes
This work was supported by the excellence initiative of city Hamburg (“neurodapt!” to K.S., B.R. and I.L.H.-O.), Emmy Noether-Program of German Research Foundation Grant Ha4466/3-1 to I.L.H.-O., Grant SFB 936 B05 to I.L.H.-O., and German Federal Ministry of Education and Research Grant 01GQ0809 to I.L.H.-O. We thank Drs. Shigeru Kitazawa, Andreas Engel, and Amy Wolff for valuable discussions and helpful comments on the manuscript; Achim Dahlmann for technical assistance; Fritz Kutschera and Torsten Renz for building the custom-designed stimulator; and Martin Westerberg for help with statistics.
The authors declare no competing financial interests.
Correspondence should be addressed to either Dr. Ileana L. Hanganu-Opatz or Dr. Kay Sieben, Developmental Neurophysiology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, Falkenried 94, 20251 Hamburg, Germany, hangop@zmnh.uni-hamburg.de or kay.sieben@zmnh.uni-hamburg.de