Abstract
An increasing number of studies have shown that cross-modal interaction can occur in early sensory cortices. Yet, how neurons in sensory cortices integrate multisensory cues in perceptual tasks and to what extent this influences behavior is largely unclear. To investigate, we examined visual modulation of auditory responses in the primary auditory cortex (A1) in a two-alternative forced-choice task. During the task, male rats were required to make a behavioral choice based on the pure tone frequency (low vs high) of the self-triggered stimulus to get a water reward. The results showed that the presence of a noninformative visual cue did not uniformly influence auditory responses, frequently enhancing only one of them. Closely correlated with behavioral choice, the visual cue mainly enhanced responsiveness to the auditory cue indicating a movement direction contralateral to the recorded A1. Operating in this fashion gave A1 neurons a superior capability to discriminate sound during multisensory trials. Concomitantly, behavioral data and decoding analysis revealed that the presence of a visual cue could speed the process of sound discrimination. We also observed this differential multisensory integration effect in well-trained rats when tested with passive stimulation and under anesthesia, albeit to a much lesser extent. We did not see this differentially integrative effect when recording in A1 of another, similar group of rats performing a free-choice task. These data suggest that auditory cortex can engage in meaningful audiovisual processing and that perceptual learning can modify its multisensory integration mechanism to meet task requirements.
SIGNIFICANCE STATEMENT In the natural environment, visual stimuli are frequently accompanied by auditory cues. Although multisensory integration has traditionally been seen as a feature of associational cortices, recent studies have shown that cross-modal inputs can also influence neuronal activity in primary sensory cortices. However, exactly how neurons in sensory cortices integrate multisensory cues to guide behavioral choice is still unclear. Here, we describe a novel model of multisensory integration used by A1 neurons to shape auditory representations when rats performed a cue-guided task. We found that a task-irrelevant visual cue could specifically enhance the responses of neurons to the sound guiding the contralateral choice. This differentially integrative model facilitated sound discrimination and behavioral choice. These results indicate that task engagement can modulate multisensory integration.
Introduction
Sensory cortices were long thought to be exclusively dedicated to a single sensory modality. However, an increasing number of studies show that early sensory cortices are involved in cross-modal processing (Ghazanfar and Schroeder, 2006; Driver and Noesselt, 2008; Vasconcelos et al., 2011). Electrophysiological studies have demonstrated that neuronal activity in auditory cortex can be modulated by visual (Ghazanfar et al., 2005; Bizley et al., 2007; Kayser et al., 2008; Perrodin et al., 2015; Atilgan et al., 2018) and somatosensory stimulation (Fu et al., 2003; Meredith and Allman, 2015). Intracellular recordings in mice have shown that many neurons in sensory cortices are unisensory in spiking behavior but multisensory in intracellular signals (Olcese et al., 2013; Ibrahim et al., 2016). It is an emerging notion that multisensory association learning can increase the prevalence of cross-modally responsive neurons in sensory cortices (Xu et al., 2014; Vincis and Fontanini, 2016; Knöpfel et al., 2019; Han et al., 2021). In addition, anatomic studies have disclosed reciprocal projections among the somatosensory, auditory, and visual cortices in monkeys, ferrets, and rats (Falchier et al., 2002; Cappe and Barone, 2005; Bizley et al., 2007; Campi et al., 2010; Stehberg et al., 2014).
Despite these observations, essential questions remain about the extent, origin, modulatory effect, and function of inputs from other sensory areas into regions of the brain concerned primarily with modality-specific processing. Moreover, many previous studies of neuronal multisensory integration in sensory cortices were conducted either in anesthetized animals or in awake animals in a passive state (Kayser et al., 2008; Meijer et al., 2017; Atilgan et al., 2018; Deneux et al., 2019). Relatively few studies have investigated cross-modal interaction in the sensory cortices of awake, freely moving animals, and fewer still in rodents specifically (Hirokawa et al., 2011; Raposo et al., 2014). In addition, the majority of studies examining cross-modal interaction at the neuronal level have focused on how the physical properties, spatial location, and timing of stimuli influence the combination of multisensory cues (Kayser et al., 2008; Meijer et al., 2017; Xu et al., 2018; Deneux et al., 2019; Wallace et al., 2020). To date, we know little about how sensory cortices process cross-modal interaction in freely moving animals during task engagement. It also remains an open question how cue discrimination, behavioral choice, and perceptual learning influence multisensory processing in sensory cortices. A growing number of studies report that primary sensory areas do not merely extract and encode the physical attributes of stimuli but can also reflect factors such as motor activity, arousal, and learning (Brosch et al., 2005; Niell and Stryker, 2010; Bizley et al., 2013; Zhou et al., 2014; Kuchibhotla et al., 2017; Huang et al., 2019). For instance, previous studies have shown that neurons in early sensory cortex, analogous to those in frontal cortex, can adapt rapidly to contextual demands when animals engage in cue discrimination tasks (Bagur et al., 2018; Elgueda et al., 2019). These findings inspired us to design a corresponding behavioral task to explore cross-modal interaction in sensory cortices.
In the present study, we examined the impact of a visual stimulus on auditory responses by recording single units from the A1 of rats performing a sound frequency discrimination task. During the task, rats were required to discriminate stimuli based on the auditory signal alone (two pure tones of different frequencies, sometimes paired with an invariant, uninformative visual cue) and then make a behavioral choice (left or right). The results showed that, in the presence of the visual cue, A1 neurons frequently developed an enhanced response that facilitated sound discrimination. This differentially integrative processing model was not seen in a free-choice task. We therefore conclude that the neural activity forwarded from A1 to subsequent sensory processing stages not only contains information about acoustic events but can also reflect aspects of the visual environment and task context.
Materials and Methods
Rat subjects
Animal procedures were approved by the Local Ethical Review Committee of East China Normal University and conducted in accordance with the Guide for the Care and Use of Laboratory Animals of East China Normal University. Twenty-one adult male Sprague Dawley rats, provided by the Shanghai Laboratory Animal Center, were used for the experiments. At the start of behavioral training, the rats weighed 250–300 g and were 4–6 months old. Rats were housed one per cage under constant temperature (23 ± 1°C) with a normal diurnal light cycle. All rats had access to food ad libitum. Water was restricted only on experimental days up to the behavioral session. Rats were usually trained 6 d per week, in one 50–80 min session held at approximately the same time of day. Body weight was carefully monitored and kept above 80% of that of age-matched control rats undergoing no behavioral training.
Behavioral task
Rats were required to perform a cue-guided two-alternative forced-choice task slightly modified from other published protocols (Jaramillo and Zador, 2011; Zheng et al., 2021). Automated training was controlled using a custom-built program running in MATLAB 2015b (MathWorks). Training was conducted in an open-topped, custom-built operant chamber made of opaque plastic (size, 50 × 30 × 40 cm, length × width × height) in a sound-insulated, double-walled room whose inside walls and ceiling were covered by 3 inches of sound-absorbing foam to reduce acoustic reflections. Three snout ports, each monitored by a photoelectric switch, were located on one sidewall of the operant chamber (Fig. 1A). The signals from the photoelectric switches were fed to an analog-digital multifunction card (DAQ NI USB-6363, National Instruments), digitized, and sent via USB to a PC running the training program.
Rats initiated a trial by poking their nose into the center port. Following a short variable delay (500–700 ms), a stimulus (one of four options: two auditory and two auditory-visual) was presented randomly. After the presentation of this cue, rats could immediately initiate their behavioral choice, moving to the left or right port (Fig. 1A). If rats made a correct choice (hit trial), they obtained a water reward, and a new trial could immediately follow. If rats made a wrong choice or made no behavioral choice within 3 s after cue onset, a punishment time-out of 5–6 s was applied.
The auditory cue was delivered via a speaker (FS Audio) and consisted of a 300-ms-long 3 kHz (low) or 10 kHz (high) pure tone with a 25 ms decay ramp, presented at 60 dB sound pressure level (SPL) against an ambient background of 30–35 dB SPL. SPLs were measured at the position of the central port (the starting position). The visual cue was a 300-ms-long flash of white light at 5–7 cd/m², delivered by a light-emitting diode. The auditory-visual cue (multisensory cue) was the simultaneous presentation of both auditory and visual cues. Rats were trained to a competency of >80% correct in three consecutive sessions before surgical implantation of recording electrodes.
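To make the trial structure concrete, the following is a minimal, logic-only MATLAB sketch of a single trial as described above; the helper functions (waitForPoke, deliverCue, waitForChoice, giveReward, timeOut) are hypothetical placeholders and are not part of the actual training program.

% Logic-only sketch of one trial; helper functions are hypothetical placeholders.
cues = {'Alow', 'Ahigh', 'VAlow', 'VAhigh'};    % 3 kHz, 10 kHz, and their audiovisual pairs
waitForPoke('center');                          % rat self-initiates the trial at the center port
pause(0.5 + 0.2 * rand);                        % variable 500-700 ms delay
cue = cues{randi(numel(cues))};                 % one of the four cues, chosen randomly
deliverCue(cue);                                % 300 ms tone, plus the LED flash on VA trials
choice = waitForChoice(3);                      % 'left', 'right', or 'none' within 3 s
correct = (ismember(cue, {'Ahigh', 'VAhigh'}) && strcmp(choice, 'left')) || ...
          (ismember(cue, {'Alow',  'VAlow'})  && strcmp(choice, 'right'));
if correct
    giveReward(choice);                         % water reward; the next trial can start at once
else
    timeOut(5 + rand);                          % 5-6 s punishment time-out
end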
Assembly of tetrodes
Formvar-insulated nichrome wire (bare diameter, 17.78 μm, A-M Systems) was twisted in groups of four to form tetrodes (impedance, 0.5–0.8 MΩ at 1 kHz). Two 20-cm-long wires were folded in half over a horizontal bar for twisting. The ends were clamped together and manually twisted clockwise. The insulation coating was then fused with a heat gun at the desired twist level, and the bundle was cut in the middle to produce two tetrodes. To reinforce it longitudinally, each tetrode was then inserted into polyimide tubing (inner diameter, 0.045 inches; wall, 0.005 inches; A-M Systems) and fixed in place with cyanoacrylate glue. An array of 2 × 4 tetrodes was then assembled with an inter-tetrode gap of 0.4–0.5 mm. After assembly, the insulation coating of each wire was gently removed at the tip and soldered to a connector pin. The reference electrode was a tip-exposed nichrome wire, 50.8 μm in diameter (A-M Systems), and the ground electrode was a piece of copper wire 0.1 mm in diameter. Each of these was also soldered to a corresponding pin on the connector. Tetrodes and connectors were then carefully arranged and cemented with silicone gel. The tetrodes were trimmed to an appropriate length immediately before implantation.
Electrode implantation
The animal received a subcutaneous injection of atropine sulfate (0.01 mg/kg) before surgery to reduce bronchial secretions. Immediately before surgery, the animal was anesthetized with an intraperitoneal injection of sodium pentobarbital (40–50 mg/kg) and then fixed on a stereotaxic apparatus (RWD). The skin was incised and the temporal muscle retracted. A craniotomy and a durotomy were then performed. The tetrode array was implanted in the primary auditory cortex (3.5–5.0 mm posterior to bregma and 6.4 mm lateral to the midline) by slowly advancing a micromanipulator (RWD). Tissue gel (3M) was used to seal the craniotomy. The tetrode array was then secured to the skull with stainless steel screws and dental acrylic. After surgery, the animals were given a 4 d prophylactic course of antibiotics (Baytril, 5 mg/kg body weight; Bayer). They had a recovery period of at least 7 d (usually 9–12 d) with free access to food and water.
Neural recordings in task engagement
When fully recovered from surgery, animals resumed performing the same behavioral task. Recording sessions began after the behavioral performance of the animal returned to the level attained before surgery (typically 2–3 d). Wideband neural signals (250–6000 Hz) were recorded using a head-stage amplifier (RHD2132, Intan Technologies). The amplified (×20) and digitized (20 kHz) neural signals were combined with trace signals representing the stimuli and session performance information, sent to a USB interface board (RHD2000, Intan Technologies), and then passed to a PC for online observation and data storage.
Neural recordings in anesthesia
Rats were anesthetized with an intraperitoneal injection of sodium pentobarbital (40 mg/kg body weight), and anesthesia was maintained throughout the experiment by continuous intraperitoneal infusion of sodium pentobarbital (0.004–0.008 g/kg/h) via an automatic microinfusion pump (WZ-50C6, Smiths Medical). Body temperature was maintained at 37.5°C using a heating blanket. We placed the anesthetized rats in the behavioral training chamber with the head positioned in the cue port to mimic cue triggering as during task engagement. The stimuli (auditory, visual, and multisensory) were the same as those used in task engagement and were presented randomly. We recorded 40–60 trials of responses for each cue condition.
Histology
After the last data recording session, the final tip position of the recording electrode was marked with a small DC lesion (−30 μA for 15 s). Afterward, rats were deeply anesthetized with sodium pentobarbital (100 mg/kg) and perfused transcardially with saline for several minutes, followed immediately by PBS with 4% paraformaldehyde (PFA). The brains were carefully removed and stored in 4% PFA solution overnight. After cryoprotection in PBS with 20% sucrose for at least 3 d, the fixed brain tissue was sectioned in the coronal plane on a freezing microtome (Leica) at a slice thickness of 50 μm. Sections containing the auditory cortex were stained with methyl violet to verify that the lesion sites and/or electrode tracks were located within the primary auditory cortex.
Data analysis
The reaction time was defined as the temporal gap between the cue onset and the time that an animal withdrew its nose from the infrared beam in the cue port. The correct performance rate was defined by the following:
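A standard definition consistent with the trial outcomes described above (hit vs wrong-choice trials), given here as an assumption rather than the authors' exact formula, is:

Correct performance rate (%) = number of hit trials / (number of hit trials + number of wrong-choice trials) × 100.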
Raw neural signals were recorded and stored for off-line analysis. Spike sorting was performed using Spike2 software (Cambridge Electronic Design, version 8). Recorded raw neural signals were first bandpass filtered at 300–6000 Hz to remove field potentials. A threshold of at least three SDs above background noise was then used to identify spike peaks. The detected spike waveforms were clustered by principal component analysis and a template-matching algorithm. Waveforms with interspike intervals of <2.0 ms were excluded. Relative spike timing data for a single unit were then obtained across trials of the different cue conditions and used to construct raster plots and peristimulus time histograms (PSTHs) with custom MATLAB scripts. Only neurons whose overall mean firing rate within the session was at least 2 Hz were included in the analysis. Behavioral and neuronal results were generally similar across animals for a given testing paradigm; thus, data across sessions were combined to study population effects.
To render PSTHs, all spike trains were first binned at 10 ms and convolved with a Gaussian smoothing kernel (σ = 100 ms) to minimize the impact of random spike-time jitter at the borders between bins. The mean spontaneous firing rate was calculated from a 500 ms window immediately preceding stimulus onset. Cue-evoked responses were quantified as the mean firing rate in a 400 ms window after cue onset minus the mean spontaneous firing rate.
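As a minimal sketch of this procedure for one neuron and one cue condition (the variable names and the overall analysis window are assumptions, not the authors' code), the computation in MATLAB could look like the following:

% PSTH and evoked-response sketch; spikeTimes{k} holds spike times (s) of trial k, aligned to cue onset.
binWidth = 0.010;                               % 10 ms bins
edges    = -0.5:binWidth:1.0;                   % assumed analysis window around cue onset
centers  = edges(1:end-1) + binWidth / 2;
counts   = zeros(numel(spikeTimes), numel(centers));
for k = 1:numel(spikeTimes)
    counts(k, :) = histcounts(spikeTimes{k}, edges);
end
rate = mean(counts, 1) / binWidth;              % trial-averaged firing rate (spikes/s)
sigma  = 0.100;                                 % 100 ms Gaussian smoothing kernel
kt     = -3 * sigma : binWidth : 3 * sigma;
kernel = normpdf(kt, 0, sigma);  kernel = kernel / sum(kernel);
psth   = conv(rate, kernel, 'same');            % smoothed PSTH for plotting
spontaneous = mean(rate(centers >= -0.5 & centers < 0));                % 500 ms prestimulus baseline
evoked      = mean(rate(centers >= 0 & centers <= 0.4)) - spontaneous;  % 400 ms post-cue window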
We quantified the cue selectivity between two cue conditions (e.g., low-tone trials vs high-tone trials) using an analysis based on the receiver operating characteristic (ROC; Britten et al., 1992). First, we set 12 threshold levels of activity covering the range of firing rates obtained in cue_A and cue_B trials. An ROC curve was then generated by plotting, for each threshold criterion, the proportion of cue_A trials on which the response exceeded the criterion against the proportion of cue_B trials on which the response exceeded the criterion. The cue selectivity value was defined as 2 * [(area under the ROC curve) − 0.5]. A value of zero therefore indicates no difference in the distribution of responses between cue_A and cue_B. A value of 1 or −1 represents the highest selectivity; that is, responses triggered by cue_A were always higher or always lower than those evoked by cue_B.
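A minimal MATLAB sketch of this ROC computation for one neuron, assuming rA and rB are column vectors of single-trial evoked firing rates for cue_A and cue_B trials (not the authors' code), is:

% ROC-based cue selectivity (sketch); rA and rB are column vectors of firing rates.
crit = linspace(min([rA; rB]), max([rA; rB]), 12);    % 12 threshold levels spanning both cues
pA = arrayfun(@(c) mean(rA > c), crit);               % P(response > criterion | cue_A)
pB = arrayfun(@(c) mean(rB > c), crit);               % P(response > criterion | cue_B)
[pB, order] = sort(pB);  pA = pA(order);              % order the points for integration
auc = trapz([0, pB, 1], [0, pA, 1]);                  % area under the ROC curve
selectivity = 2 * (auc - 0.5);                        % 0 = no preference; 1 or -1 = maximal selectivity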
We ran a permutation test to assess the significance of each cue selectivity value. All trials from a neuron were randomly redistributed into two groups, independent of the actual cue conditions. These groups were nominally called cue_A and cue_B trials and contained the same numbers of trials as the experimentally obtained groups. The cue selectivity value was then calculated from the redistributed data, and the procedure was repeated 5000 times, giving a null distribution from which to calculate the probability of the observed value. When the actual value fell within the top 5% of this distribution, it was defined as significant (i.e., p < 0.05).
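Continuing the sketch above, the permutation test could be implemented as follows; rocSelectivity is a hypothetical helper wrapping the ROC computation just shown, and the two-sided criterion is an assumption:

% Permutation test for the significance of the observed selectivity value (sketch).
nPerm = 5000;
allR  = [rA; rB];
nA    = numel(rA);
nullVals = zeros(nPerm, 1);
for p = 1:nPerm
    shuffled    = allR(randperm(numel(allR)));                        % ignore the true cue labels
    nullVals(p) = rocSelectivity(shuffled(1:nA), shuffled(nA+1:end)); % selectivity of the shuffled split
end
pval = mean(abs(nullVals) >= abs(selectivity));   % assumes a two-sided criterion
isSignificant = pval < 0.05;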
Support Vector Machine decoding
To compare the population decoding accuracy of auditory neurons for tone frequency with or without the presence of the visual cue, we trained a support vector machine (SVM) classifier with a linear kernel to predict the tone frequency (fitcsvm in MATLAB). Briefly, the spike counts for each neuron in correct trials were grouped by the triggered cues and binned into a 100 ms window moving at 10 ms resolution. Only neurons with >40 trials for each cue were included to minimize overfitting. These neurons were pooled as a pseudopopulation. Responses of the population were arranged in an M × N × T matrix, where M is the number of trials, N is the number of neurons, and T is the number of bins. For each resampling run, 40 trials were randomly selected for each cue from each neuron. For each cross-validation, we randomly sampled 90% of trials as the training set and used the remaining 10% of trials as the test set. The training set was used to compute the linear hyperplane that optimally separated the population response vectors corresponding to high-tone versus low-tone trials. Performance was calculated as the fraction of correct classifications of the test trials. Classification accuracy was estimated over the 10-fold cross-validation procedure. We repeated the resampling process 100 times and computed the mean and SD of decoding accuracy across the 100 resampling iterations. Decoders were trained and tested independently for each bin.
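A minimal sketch of the decoder for a single 100 ms bin and a single resampling run (assuming X is a trials × neurons matrix of spike counts and y is a numeric label vector, e.g., 1 for high-tone and 0 for low-tone trials; not the authors' exact code) is:

% Linear SVM decoding of tone frequency from pseudopopulation activity (sketch).
cv  = cvpartition(y, 'KFold', 10);                      % 10-fold cross-validation
acc = zeros(cv.NumTestSets, 1);
for f = 1:cv.NumTestSets
    mdl    = fitcsvm(X(cv.training(f), :), y(cv.training(f)), ...
                     'KernelFunction', 'linear');       % train on 90% of trials
    pred   = predict(mdl, X(cv.test(f), :));            % classify the held-out 10%
    acc(f) = mean(pred == y(cv.test(f)));
end
binAccuracy = mean(acc);   % decoding accuracy for this time bin and resample
% Repeating over time bins and over 100 trial resamples yields the decoding time course.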
Experimental design and statistical analyses
All statistical analyses were conducted in MATLAB, with statistical significance set at p < 0.05. To determine whether an A1 neuron was responsive to a sensory stimulus, we computed the baseline response during the 0.5 s before stimulus onset and the evoked firing rate within the 0.4 s window after stimulus onset for each trial. A1 neurons whose evoked responses were larger than 2 spikes/s and significantly higher than the baseline response (p < 0.05, Wilcoxon signed-rank test) were used in the analysis. Behavioral data (e.g., mean reaction time differences between auditory and multisensory trials) and the multisensory integration index between different tone conditions were compared using the Wilcoxon signed-rank test or paired t test. We used the chi-square test to analyze differences in the proportions of neurons showing tone-frequency selectivity. Permutation tests were used to assess the significance of multisensory enhancement and of the strength of sound frequency preference. The data were not collected in a blinded fashion. Unless stated otherwise, all group results are presented as mean ± SD.
Results
Unisensory and multisensory decision-making behavior in freely moving rats
We trained rats (n = 15) to perform a well-established cue-guided two-alternative forced-choice task (Fig. 1A). Rats were required to make a specific behavioral choice based on whether the auditory stimulus, or the auditory component of a multisensory cue, was a low-frequency (3 kHz) or high-frequency (10 kHz) pure tone. A trial was initiated when the rat poked its nose into the central port of a line of three ports on one wall of the training box (Fig. 1A). After a waiting period of 500–700 ms, a cue randomly chosen from a set of four (3 kHz pure tone, Alow; 10 kHz pure tone, Ahigh; 3 kHz pure tone plus a flash of light, VAlow; 10 kHz pure tone plus a flash of light, VAhigh) was presented in front of the central port. Based on the cue, rats were required to choose a port (left or right) within 3 s to obtain a water reward. If the stimulus was Ahigh or VAhigh, the rat had to move to the left port to harvest the reward (Fig. 1A); the other two cues indicated that the rat should move to the right port. After rats reached the criterion of >80% correct in three consecutive sessions, they were deemed well trained. On average, behavioral performance stabilized at 86 ± 7.4% (Fig. 1B). There was no difference in behavioral performance between auditory and multisensory-cued trials (Fig. 1C). Nevertheless, the presence of a visual cue sped up cue discrimination. The reaction time, defined as the temporal gap between cue onset and the moment when the animal withdrew its nose from the infrared beam monitoring point in the central port (Fig. 1A), was compared between auditory and multisensory trials (Fig. 1D,E). Rats responded more quickly in multisensory trials (mean reaction time across rats, 432 ± 46 ms in multisensory conditions; 456 ± 54 ms in auditory conditions; p < 0.001, Wilcoxon signed-rank test; Fig. 1E).
Frequency discrimination of A1 neurons in task engagement
To investigate visual modulation of the responses of A1 neurons in the cue discrimination task, we used tetrode recordings to characterize the activity of A1 neurons (Fig. 2A,B). We measured the characteristic frequency (CF) of each neuron directly after successful tetrode implantation. We did not see any frequency biasing of the CF sampling distribution throughout the experiments. On average, rats performed 254 ± 49 trials in a daily session. We recorded a total of 721 A1 neurons (48 ± 24 neurons per animal) while well-trained rats performed the task. Among them, 68% (489/721) of neurons exhibited clear cue-evoked responses. However, 14% (70/489) of these showed cue-evoked inhibitory responses and were excluded from further analysis. Thus, the following analysis focused on the remaining 419 neurons, 224 from the right auditory cortex and 195 from the left auditory cortex.
The majority of neurons (72%, 302/419) showed a single-peaked response during the initial period of cue presentation (Fig. 2C–G), and 19% (79/419) of neurons showed two separate response components, a quick onset response and a late offset response (Fig. 2E). A small subset of neurons showed a sustained response (Fig. 2H), and a few showed only an offset response (Fig. 2I). In addition, some neurons (9%, 38/419) showed both cue-evoked and behavioral-choice-related responses (Fig. 2C). These two distinct responses could be easily separated, however, because the behavioral-choice-related response occurred much later (frequently 400 ms or more after cue onset).
Because the vast majority of neurons showed their main cue-evoked response in the initial period of cue presentation, we focused our analysis on responses in the window of 0–150 ms after cue onset. We used an ROC analysis to generate an index of auditory preference that measures how strongly the firing rate of a neuron on low-tone trials diverged from its firing rate on high-tone trials. An index of multisensory preference was defined in the same way. We found that 47% (199/419) of neurons showed a significant frequency preference in task engagement (permutation test, p < 0.05; Fig. 2C–F, individual cases; Fig. 3A, population data). A subset of neurons (142/419) responded to only one of the two tones in task engagement. We did not see such extreme frequency selectivity in the passive and anesthetized conditions (see below), suggesting that this frequency preference is related to the perceptual task. Figure 2, E and G, illustrates two such exemplar cases. In addition, we found a significant difference in frequency preference between left and right A1 neurons. Most right A1 neurons preferred the high tone (number of neurons preferring the high tone, 88; preferring the low tone, 26; Fig. 3A). Consistent with this, all neurons shown in Figure 2 were recorded from the right auditory cortex and preferred the high tone.
In contrast, left A1 neurons showed the reverse pattern (number of neurons preferring the high tone, 25; preferring the low tone, 60). This result is in line with a previous study in mouse auditory cortex showing that a population-level tone frequency preference can be induced by a tone discrimination task (Xin et al., 2019). Considering that the low and high tones denoted the right and left behavioral choices, respectively, our result indicates that A1 neurons selectively preferred the tone associated with the contralateral behavioral choice in task engagement. We believe that such frequency-preference biasing could enable the learned transformation of sound into action, as sensorimotor association learning can create direct corticocortical pathways between sensory and motor cortices (Makino et al., 2016). Similar biasing was also reported in a previous study showing that auditory discrimination learning preferentially potentiates corticostriatal synapses from neurons representing either high or low frequencies associated with contralateral choices (Xiong et al., 2015).
Visual modulation of the response of A1 neurons in task engagement
Visual modulation of auditory cortical responses has previously been characterized in anesthetized and passively listening animals, but not in animals engaged in perceptual tasks. We therefore examined the influence of the visual cue on auditory responses recorded during task engagement. As shown in the exemplar cases (Fig. 2C–I), the simultaneous presentation of a noninformative visual cue could nonetheless significantly modulate the responsiveness to auditory cues (permutation test, p < 0.05). Multisensory enhancement appeared to be the favored processing strategy. Surprisingly, we found that the visual cue did not uniformly influence auditory responses and frequently facilitated just one of them (Fig. 2C–I). On closer examination, we found that for neurons showing both a tone preference and multisensory enhancement, the visual cue mostly enhanced the preferred auditory response and left the nonpreferred response without significant change (Fig. 2C–E). One might attribute this differential effect to a failure of the neurons to respond to the nonpreferred sound, as in the exemplar cases in Figure 2, E and G. However, this was not the case: as the neurons in Figure 2, D and F, show, many neurons responded to both tone stimuli, yet only the preferred response was boosted by the visual stimulus. In addition, one-quarter of neurons (25%, 124/419) showed no significant tone preference in auditory-alone trials but exhibited a preference in multisensory trials because of this biased multisensory enhancement (Fig. 2F). This process of integration further increased the frequency selectivity of the neurons. The intriguing question, then, was what factor(s) govern this differentially integrative model.
Previous studies have shown that perceptual discrimination and behavioral choice influence cue encoding (Fritz et al., 2003; Otazu et al., 2009; David et al., 2012; Niwa et al., 2012; Schneider et al., 2014), and our data shown above support this conclusion. We then asked whether this model of multisensory integration is also behavioral-choice dependent. Indeed, we found that the visual cue specifically boosted the response to the tone associated with the contralateral choice (Fig. 3A–D), indicating that behavioral choice influences the multisensory integration strategy of neurons. For neurons recorded from the right auditory cortex (n = 224), the visual cue primarily enhanced the response to the high tone associated with the left choice (Fig. 3A,C). Twenty-five percent (56/224) of these neurons showed a high-tone response significantly enhanced by the visual cue, and in rare cases (4%, 10/224) they showed visually induced inhibition (Fig. 3C). When the sound was the low tone, only a small proportion (13%, 28/224) of neurons showed auditory responses significantly influenced by the visual cue, and in this case cross-modal inhibition was even slightly more prevalent (inhibited, 8%, 17/224; enhanced, 5%, 11/224). This choice-related multisensory integration was the same regardless of whether the neuron showed a tone preference. We also used an ROC-based analysis to generate an index of multisensory integration (MI) that measures how strongly the response in multisensory trials diverges from that in auditory trials. The mean MI in the high-tone condition was significantly higher than in the low-tone condition (high, 0.09 ± 0.12; low, −0.01 ± 0.11; Wilcoxon signed-rank test, p < 0.001). Thus, on average, the VAlow response was very close to the Alow response (Fig. 3A), whereas the VAhigh response was somewhat greater than the Ahigh response because more neurons showed multisensory enhancement.
In contrast, neurons recorded from left A1 biased visually induced multisensory enhancement primarily toward the low-frequency sound (low tone, enhanced, 26%, 50/195; inhibited, 3%, 6/195; high tone, enhanced, 6%, 11/195; inhibited, 7%, 14/195). The mean MI in the low-tone condition was significantly higher than in the high-tone condition (low, 0.08 ± 0.12; high, −0.01 ± 0.09; paired t test, p < 0.001). This result indicates that the visual cue, albeit uninformative, could facilitate perceptual discrimination, consistent with previous studies conducted in humans (Thorne and Debener, 2008; McDonald et al., 2013; Feng et al., 2014). Again, such choice-related multisensory integration appeared to strengthen the frequency selectivity of neurons (Fig. 3E,F).
The multichannel recording method enabled us to sample a large number of neurons simultaneously, which in turn allowed us to decode sensory information on a single-trial basis from population activity in auditory cortex. We used cross-validated linear classifiers based on SVMs to decode stimulus information from the population activity by training the classifier to discriminate the pair of tone frequencies (see above, Materials and Methods). This population decoding analysis demonstrated that although decoding accuracy reached a nearly equal peak in both auditory and multisensory conditions, the presence of a visual cue sped up the decoding process, allowing decoding accuracy to reach a high level earlier (Fig. 3G,H). This decoding result is in line with the behavioral data: the population achieved a decoding accuracy of 90% almost 15 ms earlier in the multisensory condition than in the auditory-alone condition. In summary, these results indicate that multisensory integration can serve the task goal and aid successful task completion.
Choice-related cross-modal interaction is independent of visual response
One possibility is that the choice-related differential cross-modal interaction we observed simply reflects a difference in visual responses between left and right choices. To test this, we introduced a visual-only cue into the task after rats had learned the auditory-cue-dependent choice task. Without any additional training, rats (n = 8) could make a behavioral choice in visual-only trials, and they could obtain a water reward at either the left or the right water port when the triggered cue was visual only. The behavioral data showed that adding visual trials slightly decreased the performance of the rats in multisensory trials (correct rate, auditory-only trials, 86.4 ± 4.9%; multisensory trials, 80.8 ± 4.0%; two-tailed p = 0.0213), indicating that a noise effect was induced on decision-making in the multisensory condition. We examined a total of 324 auditory neurons recorded in this task. In task engagement, neurons recorded in the left and right A1 exhibited the same preference for the stimulus tone denoting the contralateral choice (Fig. 3). To examine all A1 neurons together, data from left A1 neurons were appropriately adjusted (trial identity was flipped for low vs high tone and right vs left choice, making the orientation of cuing and choice identical) and pooled with neurons recorded in right A1. The vast majority of neurons (92%, 298/324) failed to show visual responses in task engagement. Such a case is shown in Figure 4A, in which the neuron showed no visual response but exhibited choice-related multisensory enhancement. Of those that did respond to the visual cue (n = 26), most (23/26) exhibited similar visual responses between left- and right-choice trials (Fig. 4B). Figure 4, C and D, shows the mean population responses, in which we saw no difference between left and right visual trials. Thus, our results indicate that the choice-related differential multisensory integration is not attributable to differences in visual responses.
We also noticed that the noise effect caused by the addition of visual trials appeared to decrease the incidence of neurons showing visually induced enhancement of the high-tone response (Fig. 4E): 16% (52/324) versus 25% (106/419; with vs without visual trials; χ2 test, p = 0.0022). Nevertheless, more neurons showed a frequency preference in multisensory trials (57%, 186/324) than in auditory trials (43%, 140/324; χ2 test, p = 0.0003; Fig. 4F).
Visual modulation of response of A1 neurons in passive stimulation
In task engagement, rats were required to make behavioral choices for water rewards. We therefore asked whether the differentially integrative model could be seen only during task engagement. To answer this question, we recorded responses of A1 neurons in well-trained rats in a passive condition, conducted immediately after a task session on the same day or during the following 2 d. Rats were placed into a box (Fig. 5A) and experienced the same stimuli as those they triggered in task engagement but had no task requirement. We found that the proportion of neurons showing visually induced enhancement of the high-tone response declined to 11% (45/399; Fig. 5D), compared with 25% (106/419) in task engagement (χ2 test, p < 0.0001). Nevertheless, this proportion was still higher than that showing multisensory inhibition (5%, 21/399; χ2 test, p = 0.002). In low-tone trials, the proportions of neurons showing multisensory enhancement and multisensory inhibition were nearly the same (enhanced, 4.8%, 19/399; inhibited, 5.5%, 22/399). The mean MI across the population for the high-tone and low-tone conditions was 0.038 ± 0.113 and 0.001 ± 0.099, respectively (Wilcoxon signed-rank test, p < 0.001). These data indicate that the differentially integrative model was still present but weakened during passive stimulation (Fig. 5D,F,G). As shown in Figure 5B, some neurons showed multisensory enhancement to VAhigh, the cue indicating the contralateral movement, in task engagement but failed to show the same effect during passive stimulation. Other neurons, like the one shown in Figure 5C, exhibited multisensory enhancement of both the preferred and nonpreferred auditory responses during passive stimulation, which was rare in task engagement. Therefore, the differential multisensory integration in these auditory cortical neurons mainly arose from task-related modulation.
Consistently, during passive stimulation, the proportion of neurons exhibiting a frequency preference (Fig. 5E) was lower than in task engagement for both auditory and multisensory trials (auditory, passive vs task, 35% vs 47%; χ2 test, p = 0.0002; multisensory, passive vs task, 46% vs 57%; χ2 test, p = 0.0008). We believe this task-dependent enhancement of tone selectivity served the increased discrimination demand well. Still, more neurons favored the high tone in the passive condition (high vs low, 99 vs 40 for auditory trials; 149 vs 33 for multisensory trials). These residual effects (the differentially integrative fashion and the tone selectivity) suggest that perceptual training itself might leave a plastic trace in the auditory cortex. Another possibility is that, during passive stimulation, the alert brain processed sensory information in a fashion somewhat similar to task engagement despite the absence of behavioral choice.
Visual modulation of auditory cortical responses in anesthesia
To further probe whether multisensory perceptual learning induced a plastic change in A1, we examined visual modulation of auditory cortical responses when well-trained rats were under anesthesia. The vast majority of A1 neurons recorded under anesthesia failed to show multisensory enhancement (Fig. 6A, example; E,F, mean PSTHs). However, consistent with task engagement, more neurons showed multisensory enhancement in Ahigh trials (enhanced, 9%, 46/501; inhibited, 3%, 16/501; χ2 test, p = 0.00008). Figure 6B provides such a case. In Alow trials, the proportions of neurons showing multisensory enhancement versus inhibition were similar (enhanced, 6%, 23/501; inhibited, 8%, 34/501; χ2 test, p = 0.13). This result reveals that the differential multisensory integration was still present during anesthesia, suggesting that the perceptual training of associating multisensory cues with the target produced a plastic change in A1. Compared with the data in task engagement, a lower proportion of neurons displayed a frequency preference in both auditory (34%, 172/501) and multisensory (30%, 151/501) trials when rats were anesthetized, and cross-modal interaction did not increase the incidence of neurons favoring the high tone (auditory condition, 24%, 120/501; multisensory condition, 20%, 101/501). Again, this result suggests that the process of cross-modal interaction can be modified to meet task requirements during task engagement.
Visual modulation of response of A1 neurons in a free-choice task
To determine whether the behavioral choice itself could induce the differentially integrative model, we recorded the responses of A1 neurons in a free-choice task with no cue discrimination, in which rats (n = 6) poked their heads into the cue port to trigger a cue (Alow, Ahigh, V, VAlow, or VAhigh) and could then get a water reward at the water port on either side. We recorded the responses of 240 A1 neurons while rats performed this free-choice task. For consistency with the earlier analyses, responses in high-tone trials in which the rat moved to the left port were compared with responses in low-tone trials in which the rat moved to the right port, and the same comparison was made for multisensory trials. The results showed that, unlike neurons recorded in the cue-discrimination task, neurons recorded in the free-choice task showed no evidence of the differentially integrative model (Fig. 7A,B, examples; C,D, populations). For the majority of neurons (69%, 165/240), the multisensory response was similar to the corresponding auditory response regardless of tone frequency (p > 0.05, permutation test; Fig. 7A–E). Neurons whose auditory responses were influenced by the visual cue showed inhibitory or facilitatory effects in similar proportions in each tone condition (low tone, enhanced, 11%, 27/240; inhibited, 10%, 25/240, p = 0.77, χ2 test; high tone, enhanced, 9%, 21/240; inhibited, 11%, 27/240, p = 0.31, χ2 test; Fig. 7E). Thus, across the population, the mean multisensory response was equal to the auditory response (Fig. 7C,D). The mean MIs for the two tone conditions were similar (low tone vs high tone, 0.001 ± 0.184 vs −0.001 ± 0.183; paired t test, p = 0.868). These results indicate that a consequence-free behavioral choice by itself fails to induce the differentially integrative model seen earlier and that associative learning between cues and behavioral choices might be necessary to do so.
In addition, these neurons did not display tone biasing as strongly as their counterparts did in the cue-discrimination task (auditory, preferring Alow, 15%, 37/240; preferring Ahigh, 22%, 52/240, χ2 test, p = 0.08; multisensory, preferring VAlow, 16%, 38/240; preferring VAhigh, 18%, 44/240, χ2 test, p = 0.47; Fig. 7F). This result, combined with those given above, reveals that the differential auditory-visual interaction in auditory cortex likely reflects the requirements of the given task. When stimulus discrimination is not required, the cross-modal interaction exhibits no choice-related difference; when demanded by a task, an appropriate interaction develops to serve sensory discrimination.
Discussion
As mentioned in the introduction, several previous studies have reported that sensory cortices process more than one sensory modality, challenging the long-held view that they process only one. However, it is still unclear how cross-modal interactions occur in these areas and whether they contribute to perceptual judgments. This study investigated visual modulation of the activity of auditory cortical neurons while rats performed a two-alternative forced-choice task. We found that a choice-related, differentially integrative model provides a good description of how auditory cortical neurons integrate a visual cue into an auditory discrimination task to guide behavioral choice. When rats performed the task, a task-irrelevant visual cue typically enhanced the response of auditory cortical neurons specifically to the tone guiding the contralateral choice. This differentially integrative model facilitated sound discrimination and shortened reaction time. We believe this new integrative model provides a unique explanation for how cross-modal interaction helps facilitate sensory encoding and perceptual discrimination, extending our understanding of multisensory integration in the brain.
Cross-modal influences on auditory cortex have been reported by a host of imaging studies providing strong evidence that auditory activity can be modulated by visual and/or somatosensory stimulation (Calvert et al., 1997; Foxe et al., 2002; Kayser et al., 2005; Schürmann et al., 2006). For instance, an fMRI study found that checkerboards activated primary auditory cortex and noise bursts activated primary visual cortex, and when presented together, these stimuli shortened the latency of the hemodynamic BOLD response in each area, indicating multisensory facilitation (Martuzzi et al., 2007). A host of electrophysiological studies have also shown that cross-modal responses are present in auditory cortices (Bizley et al., 2007; Bizley and King, 2008; Meredith et al., 2020; Lohse et al., 2021; Opoku-Baah et al., 2021). For instance, visual stimuli and touch have been found to modulate the firing of neurons in auditory cortex (Kayser et al., 2008), and when the luminance of a visual stimulus is temporally coherent with the amplitude fluctuations of a sound, the representation of that sound is enhanced in the auditory cortex (Atilgan et al., 2018). However, most of these electrophysiological studies of cross-modal representation in sensory cortices were done in anesthetized animals or in alert animals in a passive state (Driver and Noesselt, 2008; Xu et al., 2014; Meijer et al., 2017), leaving the nature of cross-modal interaction in early cortical sensory areas during behavior unclear. The present study shows that task engagement can recruit more neurons into cross-modal interaction in a region until recently considered dedicated to a single sense. This finding is consistent with claims that presumptively unimodal areas of the neocortex are fundamentally multisensory (Ghazanfar and Schroeder, 2006).
Most previous studies showed that stimulus properties such as spatial location, intensity, direction of motion, and the temporal relationships between different stimuli influence the process of multisensory integration and the physiological salience of external events (Stein et al., 2014; Wallace et al., 2020). Our present study shows that auditory cortical neurons apply a differentially integrative model to auditory-visual interaction in support of the task goal of tone-frequency discrimination, despite all cue combinations being spatiotemporally congruent. In this experiment, the novel integrative model helped discriminate the multisensory cues. We speculate that this model might be a generalized integrative principle in perceptual tasks, as many tasks involve discrimination of multisensory cues. To our knowledge, no previous studies have reported a similar integrative model at the neuronal level. However, our result is in line with more recent experiments in awake preparations contending that the perception of objects (or events) is influenced by many factors, such as associated value, reward, memory, behavioral state, and decision-making (Miller and Cohen, 2001; Gourley and Taylor, 2016; Makino et al., 2016; Gold and Stocker, 2017; Han et al., 2021).
Our data showed that after rats had learned the sound frequency discrimination task well, A1 neurons exhibited a strong frequency preference in task engagement, consistent with previous studies (Znamenskiy and Zador, 2013; Xiong et al., 2015; Xin et al., 2019). Auditory cortex in rodents receives several top-down projections from association cortices such as posterior parietal cortex (PPC; Zingg et al., 2014; Zhong et al., 2019), orbitofrontal cortex (Winkowski et al., 2018; Sharma and Bandyopadhyay, 2020), and medial prefrontal cortex (Gao et al., 2022). These connected association cortices can both process multiple sensory inputs (Raposo et al., 2014; Sharma and Bandyopadhyay, 2020; Zheng et al., 2021) and generate strong decision-related responses (Raposo et al., 2014; Lyamzin and Benucci, 2019; Reinert et al., 2021). Top-down projections from higher-order cortices can powerfully influence sensory processing in sensory cortices. In mice, neurons of the cingulate region of the frontal cortex enhanced responses of visual cortical neurons and improved visual discrimination (Zhang et al., 2014). Similarly, PPC projections to auditory cortex were shown to play a critical role in auditory categorical decisions about new sensory stimuli (Zhong et al., 2019). In addition, associative learning enhances the relative impact of top-down processing in the visual cortex (Makino and Komiyama, 2015). In the present study, rats were required to perform an auditory discrimination task and make a behavioral choice. We believe that top-down projections similarly underlie the superior sound selectivity of A1 neurons and the choice-related cross-modal interaction that we observed here.
As is well known, the cortex projects topographically to the dorsal striatum to regulate behavior (Friedman et al., 2015; Hintiryan et al., 2016), and each part of the striatum precisely mirrors activity in its topographically associated cortical regions (Peters et al., 2021). Learning potentiates corticostriatal synapses from A1 neurons representing the frequencies denoting the contralateral choice (Xiong et al., 2015). We believe that the tone-selectivity signal observed in A1 may be forwarded to the striatum to trigger a specific choice, which might explain our finding that most neurons preferred the sound indicating the contralateral choice. It also helps explain why nonspecific activation of the auditory cortex resulted in predominantly contralateral biases after associative learning (Znamenskiy and Zador, 2013).
Compared with passive listening, cue-evoked responses in task engagement were attenuated in amplitude, consistent with previous studies (Otazu et al., 2009; Schneider et al., 2014; Zhou et al., 2014; Kuchibhotla et al., 2017). The underlying mechanism might be strengthened motor-related activation of auditory cortical inhibitory neurons (Schneider et al., 2014; Zhou et al., 2014; Kuchibhotla et al., 2017) or a decreased signal-to-noise ratio in the thalamocortical pathway (Otazu et al., 2009).
In normal anesthetized rats, a visual cue frequently induces an inhibitory effect in auditory cortex, and no sound-frequency-dependent auditory-visual interaction is observed (Wallace et al., 2004; Xu et al., 2014; Han et al., 2021). In the present study, we found that a subset of neurons retained the differentially integrative model under anesthesia, suggesting that perceptual training leaves a plastic change in the auditory cortex. This result is in line with an increasing number of studies showing that multisensory perceptual learning can effectively drive plastic change in both sensory and association cortices (Shams and Seitz, 2008; Proulx et al., 2014), as well as in subcortical areas (Yu et al., 2010; Xu et al., 2015). Previous studies have shown that multisensory associative learning can enhance cross-modal representation in sensory cortices (Vincis and Fontanini, 2016; Knöpfel et al., 2019; Han et al., 2021). In the present study, although rats experienced many audiovisual stimuli in task engagement, we did not see an increase in visual representation in auditory cortex. The reason, we believe, is that the visual cue was not necessary for rats to perform this task; for efficiency, rats could simply focus their attention on the auditory modality. An intriguing question that remains is whether auditory cortex can become deeply involved in visual representation and discrimination if visual cues (like the auditory cues here) become necessary for performing multisensory cue discrimination tasks. Our group will explore this idea in future studies.
In this study, the visual cue, although irrelevant to the task, hastened the process of frequency discrimination. This result is consistent with previous studies showing that task-irrelevant visual information can enhance perceptual outcomes on a variety of low-level auditory tasks, including, but not limited to, auditory detection (Lovelace et al., 2003), loudness perception (Odgaard et al., 2004), spatial localization (Bolognini et al., 2007), and frequency discrimination (Thorne and Debener, 2008). For example, an auxiliary light could enhance the accuracy of localizing a near-threshold auditory target (Bolognini et al., 2007). Salient sounds, regardless of task relevance, can activate the human visual cortex automatically (McDonald et al., 2013) and improve the discriminative processing of colocalized visual events (Feng et al., 2014). In the sound discrimination task used here, we believe the combined auditory-visual cue attracted more attention, which might partly explain the enhancement of perceptual discrimination. Attentional mechanisms are believed to compute approximate solutions to the multisensory binding problem in naturalistic environments, where complex signals arise from myriad causes (Noppeney, 2021).
Footnotes
This work was supported by Technology Innovation 2030 Major Projects on Brain Science and Brain-like Computing of the Ministry of Science and Technology of China Grant 2021ZD0202600, National Natural Science Foundation of China Grant 31970925, and Shanghai Natural Science Foundation Grant 20ZR1417800. We thank Qin Nan for technical assistance in preparing the paper.
The authors declare no competing financial interests.
Correspondence should be addressed to Liping Yu at lpyu@bio.ecnu.edu.cn