Abstract
The cerebral cortex is a major hub for the convergence and integration of signals from across the sensory modalities; sensory cortices, including primary regions, are no exception. Here we show that visual stimuli influence neural firing in the auditory cortex of awake male and female mice, using multisite probes to sample single units across multiple cortical layers. We demonstrate that visual stimuli influence firing in both primary and secondary auditory cortex. We then determine the laminar location of recording sites through electrode track tracing with fluorescent dye and optogenetic identification using layer-specific markers. Spiking responses to visual stimulation occur deep in auditory cortex and are particularly prominent in layer 6. Visual modulation of firing rate occurs more frequently at areas with secondary-like auditory responses than those with primary-like responses. Auditory cortical responses to drifting visual gratings are not orientation-tuned, unlike visual cortex responses. The deepest cortical layers thus appear to be an important locus for cross-modal integration in auditory cortex.
SIGNIFICANCE STATEMENT The deepest layers of the auditory cortex are often considered its most enigmatic, possessing a wide range of cell morphologies and atypical sensory responses. Here we show that, in mouse auditory cortex, these layers represent a locus of cross-modal convergence, containing many units responsive to visual stimuli. Our results suggest that this visual signal conveys the presence and timing of a stimulus rather than specifics about that stimulus, such as its orientation. These results shed light on both how and what types of cross-modal information are integrated at the earliest stages of sensory cortical processing.
Introduction
The cerebral cortex enables dynamic, flexible responses to the sensory environment. To achieve this, signals from a variety of sources must come together, often across the sensory modalities. For example, the acoustic signal (auditory domain) and the lip movements of speech (visual domain) are integrated to fundamentally influence perception of speech (McGurk and MacDonald, 1976), a process that occurs, at least in part, in the cortex (Skipper et al., 2007). Such sensory information is often viewed as traveling through a cortical hierarchy, from primary regions through secondary regions and eventually to association cortex. Traditionally, the earliest stages of this processing were thought to be exclusively unimodal, exhibiting responses to only one sensory modality (Felleman and Van Essen, 1991). Recently, this notion has been challenged by evidence that even the primary regions of visual, auditory, and somatosensory cortex all receive and integrate information from other sensory modalities (Calvert et al., 1997; Foxe et al., 2000; Ghazanfar et al., 2005; Schaefer et al., 2006; Iurilli et al., 2012).
Evidence for multisensory convergence in early sensory cortex comes from two complementary lines of research. First, anatomical tracing has revealed direct connections between cortical or thalamic regions of different sensory systems (e.g., Falchier et al., 2010; Banks et al., 2011; Henschke et al., 2015). Second, physiological studies have shown that neural activity, as measured by firing rate or field potential response, can be altered in unimodal versus multimodal stimulus conditions. For example, neural responses to sound in both core and belt regions of monkey auditory cortex (ACtx) can be modulated by the presence of concurrent visual stimulation (Ghazanfar et al., 2005; Kayser et al., 2008). Less common is the finding that a neuron responds to a different sensory modality in the absence of stimulation in the modality preferred by the surrounding cortex. In the ACtx, spiking responses to visual stimuli have been shown to develop after behavioral training; specifically, neurons in the ACtx of primates, trained on an auditory categorization task, exhibit responses to the onset of a cue light (Brosch et al., 2005). In the untrained context, ACtx responses to visual stimulation have been found in rats (Wallace et al., 2004), ferrets (Bizley et al., 2007), and gerbils (Kobayasi et al., 2013) but are generally reported to represent a small fraction of responses. Furthermore, the stimulus preferences and cortical organization of these visually responsive neurons remain poorly understood. Such information is critical for understanding the role that these responses play in auditory processing.
Here we sought to determine whether and how the ACtx of the mouse (Mus musculus) responds to visual stimulation. We performed acute, awake recordings in mouse auditory and visual cortices during auditory and visual stimulation. In a subset of recording sites, neurons in mouse ACtx responded to visual stimulation even without a concurrent auditory stimulus. These neurons reside almost exclusively in layer 5 (L5) and layer 6 (L6) of the cortex, and may signal the presence and timing of a salient visual stimulus to the local circuitry of the ACtx.
Materials and Methods
Animals.
All experiments were approved by the Institutional Animal Care and Use Committee at the University of California, San Francisco. For optogenetic identification of L6, we used the Ntsr1-Cre knock-in line (GENSAT GN220), in which Cre recombinase is expressed specifically in L6 corticothalamic cells (Gong et al., 2007; Olsen et al., 2012). To achieve targeted activation of L6, this line was crossed with the Ai32 strain (JAX stock #012569), which encodes the light-gated depolarizing cation channel channelrhodopsin-2 (ChR2) conjugated to eYFP, after a floxed stop cassette under the CAG promoter. For all other experiments characterizing the visual response in ACtx, we used mouse strains on a C57BL/6 background that were not expressing optogenetic effector proteins. Mice in all experiments were between 6 and 12 weeks old. All adult mice were housed in groups of 2–5 under a 12 h/12 h light/dark cycle. Both female (5 of 19) and male mice were included in this study.
In vivo awake recordings.
A surgery to implant a custom steel headplate over the temporal skull using dental cement was conducted 2–7 d before each recording. The headplate was positioned to allow access to a point putatively centered on primary ACtx, 2.5 mm posterior to bregma and under the squamosal ridge. On the day of the recording, the animal was anesthetized using isoflurane and a ∼2-mm-diameter opening was made in the skull over ACtx using a dental drill. This opening was promptly covered with silicone elastomer (Kwik-Cast, World Precision Instruments), and the animal was allowed to recover from anesthesia for 1–2 h. The animal was then affixed by its headplate over a free-spinning spherical treadmill and the silicone plug over the craniotomy was removed. A 16-channel linear probe (50 μm site spacing; Neuronexus) was covered in the lipophilic dye Di-I (2.5 mg/ml in EtOH) using a needle and syringe and then slowly inserted in the brain using a motorized microdrive (FHC). After reaching the desired depth, the brain was allowed to settle for ∼10 min, after which neural recording and stimulus presentation commenced. Typically, 3–5 acute penetrations were performed per animal.
The signal acquisition system consisted of an Intan RHD2000 recording board and an RHD2132 amplifier (Intan Technologies), sampling at 30 kHz. Auditory stimuli were presented with a free-field electrostatic speaker (ES1, Tucker-Davis Technologies) driven by a Quad Capture external soundcard (Roland) at a sampling rate of 192 kHz. Visual stimuli were presented on a 19 inch LCD monitor with a 60 Hz refresh rate (VW199, ASUS). Auditory and visual stimuli were both generated in MATLAB using the Psychophysics Toolbox Version 3 (Kleiner et al., 2007).
Sound stimuli consisted of blocks of click trains followed by pure tone sequences. Click trains, generated from broadband 5 ms white noise pulses, were presented at 20 Hz for 500 ms duration, and were used as a search stimulus to determine auditory responsiveness; they were not analyzed further. Pure tone stimuli consisted of 100 ms tones of varied frequencies (4–64 kHz, 0.2 octave spacing) and sound attenuation levels (30–60 dB in 5 dB linear steps), with an interstimulus interval of 500 ms to construct a frequency-response area (FRA). Between 6 and 10 trials were presented at each frequency-attenuation level.
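The tone stimulus grid described above can be enumerated in a few lines. The following is a minimal Python sketch, not the authors' MATLAB/Psychophysics Toolbox code: the function names, the shuffled trial order, and the seed are illustrative assumptions, and only the frequency range, octave spacing, and attenuation steps come from the text.

```python
import math
import random


def tone_frequencies_khz(f_min=4.0, f_max=64.0, octave_step=0.2):
    """Frequencies from f_min to f_max (inclusive), spaced octave_step octaves apart."""
    n_steps = round(math.log2(f_max / f_min) / octave_step)
    return [f_min * 2 ** (i * octave_step) for i in range(n_steps + 1)]


def attenuation_levels_db(lowest=30, highest=60, step=5):
    """Attenuation levels in linear dB steps, inclusive of both ends."""
    return list(range(lowest, highest + 1, step))


def build_trial_sequence(n_repeats=6, seed=0):
    """Repeat every frequency-attenuation pair n_repeats times.

    Shuffling the order is an assumption of this sketch; the text does not
    state how tone order was determined.
    """
    conditions = [(f, a) for f in tone_frequencies_khz() for a in attenuation_levels_db()]
    trials = conditions * n_repeats
    random.Random(seed).shuffle(trials)
    return trials
```

With these parameters the grid has 21 frequencies and 7 attenuation levels, so 6 to 10 trials per condition corresponds to between 882 and 1470 tone presentations per FRA.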
Visual stimuli consisted of either flash stimuli or drifting grating stimuli on a monitor centered in front of and 25 cm away from the mouse. Monitor luminance was calibrated to 25 cd/m² for a gray screen, measured at the approximate location of the mouse's eyes. The flash stimulus was a white square (32° horizontal × 32° vertical) on a black background, 150 ms in duration, with a peak brightness of 95 cd/m². Typically, 150 flash presentations were used per block, and the interstimulus interval was randomly varied between 650 and 2850 ms. Drifting gratings were presented full screen (79° horizontal × 50° vertical) for 1 s using parameters optimal for driving mouse visual cortex (VCtx): 4 Hz temporal frequency, 0.02 cycles/degree spatial frequency, 100% contrast (Niell and Stryker, 2008). Gratings were presented in 12 orientations from 0 to 330 degrees in a randomly varied sequence, with 50 presentations per orientation and a randomly varied interstimulus interval between 500 and 1600 ms. To verify that the monitor did not produce sound, we recorded and analyzed sounds during visual stimulus presentation using an ultrasonic microphone and recording device sampling at 250 kHz (UltraSoundGate 416H, Avisoft Bioacoustics).
For L6 optogenetic identification experiments, we activated ChR2 by illuminating the cortex with a blue 470 nm LED (Mightex) coupled to a 400-μm-diameter optical fiber, NA = 0.39 (Thorlabs). A micromanipulator was used to position the fiber tip just above the cortical surface immediately adjacent to the probe penetration site. Light powers were varied between 0.2 and 2.2 mW; trials with light powers of 1.6–2.2 mW were used for later analysis. Light duration was 500 ms, with a 50 ms linear ramp to reach full power and a recovery time randomly varied between 1600 and 2600 ms.
Data analysis.
After recordings, the raw voltage trace was bandpassed between 600 Hz and 6 kHz, and events were extracted using a moving-window 4.5 SD threshold. For single unit (SU) analysis, event waveforms were sorted using custom software in MATLAB (KFMMAutoSorter, written by Mathew Fellows). Multiunit (MU) analysis was performed on all events captured by the 4.5 SD threshold; as such, this analysis includes all units recorded on a channel, as well as events that could not be attributed to a SU neural source due to the absence of a uniquely identifiable waveform shape. Such analysis is typically thought to capture spiking activity from tens of neurons in the vicinity of the recording electrode (Buzsáki, 2004).
For all MU and SU analyses, auditory responsiveness was defined as a significant difference in firing rate between the 100 ms before stimulus onset and the 100 ms poststimulus period (paired t test, Benjamini–Hochberg corrected for false discovery rate, q = 0.001) (Benjamini and Hochberg, 1995). The Benjamini–Hochberg procedure, used here to correct for multiple comparisons in determining significance of auditory and visual responses, is a method for controlling false positives (Type I errors) that has increased power relative to more common family-wise error rate control procedures, such as the Bonferroni correction; the latter class of procedures attempts to control the probability of including one false positive, typically at the expense of a high false negative (Type II error) rate. On the other hand, false discovery rate methods set an acceptable rate of Type I errors (Benjamini and Hochberg, 1995). Here, we set our false discovery rate for both auditory and visual responses to q = 0.001. Auditory FRAs were generated using firing rate in the 100 ms after sound onset. Significant tuning to frequency, used in auditory response classification, was defined as a modulation of firing rate by frequency using a one-way ANOVA (α = 0.05).
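The Benjamini–Hochberg step-up procedure referred to above can be sketched as follows. This is a generic, stdlib-only Python illustration rather than the authors' analysis code; the function name is hypothetical, and the p values passed in are assumed to come from the per-unit paired t tests.

```python
def benjamini_hochberg(p_values, q=0.001):
    """Return one boolean per p value: True where the null is rejected at FDR level q.

    Step-up rule: sort the m p values, find the largest rank k such that
    p_(k) <= (k / m) * q, and reject the k smallest p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_passing_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p value that misses its own rank threshold is still rejected whenever a larger p value passes its threshold, which is where the gain in power over Bonferroni-style corrections comes from.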
Visual responsiveness was defined as a significant difference in firing rate between the 200 ms preceding stimulus onset and 200 ms poststimulus onset (paired t test, Benjamini–Hochberg corrected for false discovery rate; q = 0.001). MU sites and units considered in this dataset were from depths within cortex, as measured by our electrode track tracing procedure described below, and only included recordings determined to be in ACtx based on MU responses to pure tones (n = 676 MU sites; n = 223 SUs).
For both auditory and visual responses, latency to onset was defined as the time point at which the poststimulus firing rate exceeded the prestimulus firing rate by 4 SDs. Likewise, response offset was defined as the first time point after onset at which the firing rate fell back below 4 SDs above the prestimulus firing rate.
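The onset and offset definitions can be expressed as a threshold crossing on a binned firing-rate trace. The sketch below is a minimal Python illustration under several assumptions that the text does not specify: firing rates are already binned, the SD is the sample SD of the prestimulus bins, and latencies are reported as the start time of the crossing bin. The function name and bin width are hypothetical.

```python
import statistics


def response_latencies(pre_rates, post_rates, bin_ms=1.0, n_sd=4.0):
    """Return (onset_ms, offset_ms) relative to stimulus onset, or None where undefined.

    pre_rates:  binned prestimulus firing rates (needs at least two bins)
    post_rates: binned poststimulus firing rates, first bin starting at stimulus onset
    """
    threshold = statistics.mean(pre_rates) + n_sd * statistics.stdev(pre_rates)

    onset = None
    offset = None
    for i, rate in enumerate(post_rates):
        if onset is None:
            if rate > threshold:
                onset = i * bin_ms  # first bin above baseline + n_sd SDs
        elif rate < threshold:
            offset = i * bin_ms  # first bin after onset back below threshold
            break
    return onset, offset
```

A unit that never crosses the threshold returns `(None, None)`, and a unit that is still above threshold at the end of the analysis window returns an onset with no offset.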
To identify the Ntsr1-Cre-positive L6 band in our Ai32/Ntsr1-Cre optogenetic activation experiments, we analyzed recordings for a group of three or more adjacent light-activated channels. A channel was defined as light-activated if it showed a sustained increase in MU firing rate throughout the light-on period. To remove transient onset effects that were often observed throughout the cortical column, our analysis focused on the last 200 ms of the light-on period. We converted firing rate during this period to a z score using the baseline (200 ms before stimulus onset) mean and SD. Degree of firing rate modulation differed greatly between recordings, presumably because of factors such as light penetrance in cortical tissue. As such, we defined significant activation as any period that surpassed half of the peak z score firing rate observed across all light levels in each recording. A channel was considered light responsive if at least half of all 20 ms time bins showed such activation, and an Ntsr1-Cre band was defined as three or more adjacent light-responsive channels (i.e., activation spanning at least 150 μm). Using this method, the Ntsr1-Cre-positive band of L6 was readily identifiable in 5 of 8 experiments with visually responsive SUs or MUs.
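The band-identification criteria can be summarized in code. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the input is assumed to already hold z-scored firing rates for the 20 ms bins covering the last 200 ms of the light-on period, one list per channel in probe order, and the peak z score defaults to the maximum of the supplied data (in the text it is taken across all light levels in a recording). All function names are hypothetical.

```python
def light_responsive_channels(z_by_channel, peak_z=None):
    """Flag channels where at least half of the bins exceed half of the peak z score."""
    if peak_z is None:
        # Assumption: the supplied bins contain the recording-wide peak.
        peak_z = max(z for zs in z_by_channel for z in zs)
    threshold = 0.5 * peak_z
    return [
        sum(z > threshold for z in zs) >= len(zs) / 2
        for zs in z_by_channel
    ]


def find_bands(responsive, min_channels=3):
    """Return (first, last) channel indices of each run of >= min_channels responsive channels."""
    bands = []
    start = None
    # A trailing False closes any run that reaches the last channel.
    for i, flag in enumerate(list(responsive) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_channels:
                bands.append((start, i - 1))
            start = None
    return bands
```

A recording in which `find_bands` returns an empty list would correspond to one of the experiments where no Ntsr1-Cre band could be identified.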
To determine the degree of tuning to orientation of drifting grating stimuli, we calculated an orientation selectivity index (OSI) as follows: OSI = (Rpref − Rorth)/(Rpref + Rorth). In the above equation, Rpref is the mean response to the stimulus of the preferred orientation (that which elicited the response with the highest firing rate), and Rorth is the mean response to the two orientations orthogonal to the preferred orientation. All responses were baseline-corrected by subtracting the mean prestimulus firing rate averaged over all trials. For units in which Rorth was suppressed relative to baseline, OSI will be >1. Analyses were performed on the 500 ms after stimulus onset.
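As an illustration of the OSI calculation, the following Python sketch assumes the commonly used form OSI = (Rpref − Rorth)/(Rpref + Rorth), which is consistent with the definitions of Rpref and Rorth and with the statement that OSI exceeds 1 when Rorth is suppressed below baseline. The function name and the dictionary input format are hypothetical.

```python
def orientation_selectivity_index(mean_rates, baseline):
    """Compute OSI from mean evoked rates keyed by grating direction in degrees.

    mean_rates: dict such as {0: r0, 30: r30, ..., 330: r330}
    baseline:   mean prestimulus firing rate, subtracted from every response
    Note: the ratio is undefined if Rpref + Rorth is zero after correction.
    """
    corrected = {ori: rate - baseline for ori, rate in mean_rates.items()}

    preferred = max(corrected, key=corrected.get)
    r_pref = corrected[preferred]

    # The two orthogonal stimuli lie 90 degrees to either side of the preferred one.
    r_orth = (corrected[(preferred + 90) % 360] + corrected[(preferred - 90) % 360]) / 2

    return (r_pref - r_orth) / (r_pref + r_orth)
```

With 12 directions at 30° spacing, both orthogonal directions are always present in the stimulus set, so the lookup does not need interpolation.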
Histological verification of electrode track depth.
To visualize recording site locations, we used the fluorescent lipophilic dye Di-I, which has been shown to reliably mark the full extent of electrode tracks in extracellular recordings (DiCarlo et al., 1996) and has been used to visualize multisite silicon probe tracks in the mouse brain (Lee et al., 2015). After completion of physiological recordings, the animal was killed, and the brain was removed and placed into a solution of 4% PFA in PBS (0.1 M, pH 7.4) for 12 h, followed by 30% sucrose in PBS solution for several days. The brain was then frozen, and sections were cut on a sliding microtome (SM2000R, Leica Biosystems). Slices were then mounted and imaged on a fluorescence microscope with a red filter cube (Eclipse 90i, Nikon). Fluorescent marks on slices were mapped to each recording with the aid of a penetration site map drawn based on the exposed cortical surface during the experiment. Approximately half of the electrode tracks were fully imaged and mapped onto the Paxinos and Franklin (2004) mouse brain atlas for 3D reconstruction of recording site location. For the remainder, histology was used only to identify the depth of recorded electrode sites and verify that they were located in the cortex.
Classification of auditory sites.
ACtx contains subfields with characteristic neural responses to sound stimuli (Stiebler et al., 1997; Joachimsthaler et al., 2014). We classified recording sites as either primary-like in their responses (putatively primary ACtx, A1, or anterior auditory field) or secondary-like (for example, secondary ACtx, A2, or dorsal-posterior field). Primary and secondary regions are most differentiable by latency to response onset, with primary regions exhibiting shorter onset latencies than secondary ones (Carrasco and Lomber, 2011; Joachimsthaler et al., 2014). As such, the classification procedure we used was as follows. A channel-wise automatic classification was made for MU pure tone responses based on the following criteria: primary-like responses were those with significant firing rate tone responses and onset latencies of <14 ms; secondary-like responses were those with significant tone responses and onset latencies of >14 ms (Joachimsthaler et al., 2014). Channels not significantly responsive to sound were coded as nonauditory. Next, channels whose classification differed from that of their neighbors were examined in the context of all auditory responses on the probe and coded by eye. This dealt with two problematic cases resulting from automatic channel-wise classification. First, this allowed us to correct for “one-off” inconsistencies in classification within a probe. Second, this increased labeling accuracy in border cases where the probe track did not enter orthogonal to the brain surface, and both primary- and secondary-like responses were recorded from the same auditory experiment. All auditory response classification was performed blind to results of the visual response experiments conducted at the same site.
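The automatic, channel-wise stage of this classification can be sketched as below. This is an illustrative Python outline, not the authors' code: the function names are hypothetical, a latency of exactly 14 ms is assigned to the secondary-like class as an assumption (the text gives only <14 ms and >14 ms), and the review step simply flags any channel whose label differs from an adjacent channel so that it can be inspected by eye.

```python
def classify_channel(is_tone_responsive, onset_latency_ms, cutoff_ms=14.0):
    """Label one channel from its MU pure tone response."""
    if not is_tone_responsive:
        return "nonauditory"
    if onset_latency_ms < cutoff_ms:
        return "primary-like"
    return "secondary-like"  # assumption: exactly cutoff_ms falls here


def flag_for_review(labels):
    """Return indices of channels whose label differs from at least one neighbor."""
    flagged = []
    for i, label in enumerate(labels):
        neighbors = labels[max(0, i - 1):i] + labels[i + 1:i + 2]
        if any(other != label for other in neighbors):
            flagged.append(i)
    return flagged
```

Flagging on any disagreement catches both cases described above: an isolated inconsistent channel and the channels on either side of a primary/secondary border within one penetration.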
Results
Visual responses in mouse ACtx
To measure visual responses in mouse auditory cortical neurons, we performed acute extracellular recordings in awake mice using a linear probe to simultaneously measure responses in different layers of cortex (Fig. 1A). In separate blocks, mice were either presented with 100 ms pure tones of varied frequencies and attenuations or 150 ms flashes of a white square on a black background (Fig. 1B). We then analyzed MU and SU activity evoked by these auditory and visual stimuli (Fig. 1C). Recording site depth and location were determined from post hoc histological visualization of the lipophilic dye Di-I, which was applied to the probe shank before each penetration (Fig. 1D,E). Probe placement was targeted to the ACtx based on surface vasculature (Stiebler et al., 1997) and confirmed by robust responsiveness to sound stimulation at the recording sites (see Materials and Methods; Fig. 1F, left and middle). After identifying ACtx, we presented flashes and determined whether firing rate was modulated by these purely visual stimuli (Fig. 1F, right). In 28 of 48 ACtx laminar recordings from 16 mice, at least one MU showed a statistically significant increase in firing rate in response to the visual stimulus (paired t test, Benjamini–Hochberg corrected for false discovery rate, q = 0.001; see Materials and Methods). Visually evoked spiking responses were found in both MUs (Fig. 1G) and SUs (Fig. 1H). Habituation of responses to repeated stimulus presentations is a well-established feature of auditory cortical processing (Cook et al., 1968). We tested whether auditory cortical SU visual responses found here also habituated over the course of stimulus presentation blocks by checking for a systematic increase or decrease in response magnitude using Spearman's correlation analysis. We found 3 of 15 units from 9 mice whose stimulus-evoked firing rates changed (with no corresponding change in baseline firing; p < 0.05). Of these, two decreased in firing rate and one increased. 
Additionally, we found no trend in stimulus-evoked firing rate on the population level (Jonckheere–Terpstra trend test statistic = 0.84, p = 0.20). Together, these tests show that there is little evidence for systematic increase or decrease of response magnitude over time with our recording protocol.
Recordings were performed in a blocked manner, with auditory stimuli presented together followed by visual stimuli. Not all units identified as visually responsive were also identifiable in the auditory blocks, due either to sparse firing in response to auditory stimulation or electrode drift over time. There were 9 well-isolated visually responsive SUs also identified in auditory recordings (n = 7 recordings from 7 mice); of these, 7 (7 of 9) exhibited significant auditory responses (example: Fig. 1I), all of which were also tuned to sound frequency. This result shows that some neurons in mouse ACtx multiplex auditory and visual stimuli.
Deep layer bias of visual responses in ACtx
The laminar location of a neural response can be strongly suggestive of its computational role. We mapped the geometry of the probe electrode sites onto brain slice images marked by Di-I (Figs. 1E, right, 2A) to determine the cortical depths of visually responsive electrode sites (example: Fig. 2B). We then measured the distance from white matter of each recording site and normalized this by the white matter-pia distance of the corresponding brain slice to correct for any tissue distortion. This yielded a fractional cortical depth measurement for each recorded channel, which was then assigned to a cortical layer (from 48 recordings in 16 mice: L6, n = 160 sites; L5, n = 254; L4, n = 129; L2/3, n = 120; L1, n = 13; Fig. 2D). Analysis of these data reveals that the majority of MU and SU visual responses occur in L6, with the bulk of the remainder occurring in L5 (Fig. 2C,D). To provide physiological confirmation of our depth measurements, we used the Ntsr1-Cre mouse strain in which Cre recombinase is expressed specifically in L6 corticothalamic cells (Fig. 2E) (Gong et al., 2007; Olsen et al., 2012). This mouse line was crossed with mice of the Ai32 strain, which expresses ChR2 conjugated to eYFP in a Cre-dependent manner, so that ChR2 was restricted to L6 (Fig. 2E). Illumination of the cortical surface with blue light resulted in strong MU activation of a distinct band of channels deep on the probe (Fig. 2F,G), thereby providing an optogenetically induced physiological marker of L6. A band of three or more adjacent channels with sustained optogenetic activation was identified in five of eight recordings (n = 4 of 6 mice). We determined the depth of visual responses relative to this band of activation and found that all of them occurred ≤200 μm from its lower border (Fig. 2H). This “photo-tagging” approach to identify the band of L6 corticothalamic cells further confirmed the deep layer bias of visual responses.
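The depth normalization described above amounts to a simple ratio per recording site. The Python sketch below illustrates it under stated assumptions: the function names are hypothetical, and the layer boundaries passed to `assign_layer` in any example are placeholders chosen purely for illustration, since the text does not report the fractional boundaries used.

```python
def fractional_depths(dist_from_white_matter_um, white_matter_to_pia_um):
    """Normalize each site's distance from white matter by the slice's cortical thickness."""
    return [d / white_matter_to_pia_um for d in dist_from_white_matter_um]


def assign_layer(fraction, boundaries):
    """Map a fractional depth (0 = white matter, 1 = pia) to a layer name.

    boundaries: list of (upper_fraction, layer_name) pairs in ascending order.
    The values are supplied by the caller; none are taken from the paper.
    Returns None for sites outside cortex.
    """
    if not 0.0 <= fraction <= 1.0:
        return None
    for upper, name in boundaries:
        if fraction <= upper:
            return name
    return None


# Placeholder boundaries for illustration only (hypothetical, not from the source).
EXAMPLE_BOUNDARIES = [(0.25, "L6"), (0.55, "L5"), (0.70, "L4"), (0.92, "L2/3"), (1.0, "L1")]
```

Normalizing by the white matter-pia distance measured on the same slice means that shrinkage or distortion during fixation and sectioning scales both measurements together and cancels out of the ratio.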
Visual responses in primary and secondary regions of ACtx
The neural signatures of multisensory integration are more commonly observed in secondary or “higher-order” areas of sensory cortex compared with primary regions (Ghazanfar et al., 2005; Bizley et al., 2007). To test whether this finding holds for visual responses in mouse ACtx, we classified our recording sites into primary or secondary regions using temporal dynamics and frequency tuning of evoked MU responses to pure tones of varied frequencies and attenuations (see Materials and Methods). Recordings classified as primary-like typically showed robust frequency-attenuation tuning (example: Fig. 3A), whereas many sites classified as secondary did not (example: Fig. 3B). When reconstructed, most visually responsive primary-like sites were found within primary ACtx on the Paxinos and Franklin (2004) mouse brain atlas (Fig. 3C). Mean normalized FRAs from primary and secondary sites centered on best frequency (BF) show that, on average, MUs from both classifications exhibit tuning, but BF- and off-BF responses were closer in magnitude in secondary than in primary sites (Fig. 3E). MU onset and peak response latencies to BF sound stimuli in primary areas were lower than those in secondary areas (primary onset: 10 ± 4 ms, mean ± SD; secondary onset: 17 ± 11 ms; Wilcoxon rank-sum Z = 11.7, p = 1.71e-31; primary peak: 18 ± 8 ms; secondary peak: 30 ± 16 ms; rank sum Z = 9.76, p = 1.67e-22; n = 396, 167 MUs from n = 32, 23 recordings in n = 14, 13 mice for primary and secondary, respectively; Fig. 3D). While distributions of recorded auditory BFs were biased to the middle of the frequency band tested (∼8–30 kHz), visual responses were more prominent at those sites with BFs near 64 kHz (Fig. 3F). The fraction of visual responses was also slightly higher at secondary than primary recording sites in both L5 and L6, the only two layers that exhibited any substantial visual responsiveness (Fig. 3G).
Thus, our results indicate that visual MU responses are slightly biased toward secondary sites and toward sites with high-frequency tuning in primary areas, but are nevertheless present at sites with a variety of auditory BFs.
Visual response latencies compared in auditory and visual cortices
Latencies to response onset for tones at BF in the ACtx vary from 8 to 30 ms, depending on auditory field (Fig. 3D). What are the temporal dynamics of the visual response in ACtx? Visually responsive SUs in primary auditory regions exhibit onset latencies of 75 ± 10 ms (mean ± SD; n = 7 SUs from 6 recordings in 6 mice), whereas those in secondary regions have onsets of 92 ± 25 ms (n = 8 SUs from 5 recordings in 5 mice; Fig. 4A); this difference did not reach statistical significance (Wilcoxon rank-sum = 47.5, p = 0.34). MU visual response onset latencies were 85 ± 37 ms in primary sites (n = 45 MUs from 19 recordings in 12 mice) and 95 ± 25 ms (n = 28 MUs from 11 recordings in 8 mice) in secondary (Fig. 4B). MU visual response onsets in primary ACtx occurred significantly earlier than those in secondary ACtx (Wilcoxon rank-sum Z = 2.3, p = 0.021).
Anatomical tracing work has shown that mouse ACtx receives direct inputs from several visual cortical regions, and that these inputs show a preference for L1 and L6 (Banks et al., 2011). If these projections are carrying the visual information to the ACtx, visual stimulation should elicit earlier responses in VCtx than in ACtx. To test this, we recorded from awake mouse VCtx using flash stimuli with the same parameters as used to elicit responses in ACtx (Fig. 4C,D). Analysis of MU data shows that responses to visual stimuli in VCtx are significantly earlier than in ACtx in both onset (VCtx: 40 ± 11 ms, n = 78 MUs from 8 recordings in 3 mice; ACtx: 90 ± 32 ms, n = 73 MUs from 30 recordings in 15 mice; Wilcoxon rank-sum Z = 9.06, p = 1.26e-19) and peak (VCtx: 70 ± 33 ms; ACtx: 115 ± 39 ms; Wilcoxon rank-sum Z = 7.60, p = 2.95e-14). Latencies to response offset in VCtx and ACtx were not significantly different (VCtx: 160 ± 73 ms; ACtx: 131 ± 40 ms; Wilcoxon rank-sum Z = 1.4, p = 0.16; Fig. 4E,F), although VCtx sites showed a wider distribution of offset latencies (Fig. 4F). These dynamics show that VCtx begins processing the visual flash stimulus before it arrives in the ACtx.
Visual orientation tuning in the ACtx
A hallmark of visual cortical processing is tuning of neurons to edges of particular orientations. We sought to test whether visual responses in ACtx also carry specific information about the visual scene, such as edge orientation. While recording in ACtx, we presented full screen 1 s drifting gratings of 12 orientations and found strong responses in a subset of our flash-responsive SUs (example, Fig. 5A). Comparison of response peristimulus time histograms and firing rate histograms typically revealed only moderate orientation tuning in the ACtx (examples: Fig. 5B,C). For reference, we also recorded drifting grating responses from VCtx units. Side-by-side comparison of the most orientation-selective ACtx and VCtx units shows a much higher degree of orientation selectivity in the VCtx (Fig. 5D,E). We calculated the OSI (see Materials and Methods) for all ACtx and VCtx SUs. We find strongly orientation-selective units in the VCtx (41% [7 of 17] of OSIs > 0.75, n = 7 recordings in 3 mice), along with weakly tuned units, but only weakly tuned units in ACtx (0% [0 of 7] of OSIs > 0.75; n = 5 recordings in 4 mice; Fig. 5F); orientation selectivity differs significantly between these two populations (one-tailed Kolmogorov–Smirnov test statistic = 0.512; p = 0.049). These results suggest that visual responses in ACtx do not carry fundamental visual information about edge orientation but instead may represent a more general signal indicating the presence and timing of a salient visual stimulus.
Discussion
To determine whether mouse ACtx responds to visual stimulation, we presented awake mice with unimodal visual and auditory stimuli under passive conditions while performing acute recordings from auditory or visual cortices. In both primary and secondary ACtx, we found SU and MU activity that responded directly to visual flash and drifting grating stimuli in the absence of sound. These responses were almost entirely restricted to L6 and, to a lesser degree, L5. In L6 of ACtx, ∼25% of MUs were visually responsive; in L5, this value was 10%; yet, <2% of MUs in layers 2–4 were visually responsive. Visually responsive units in ACtx have longer latencies than those in VCtx and, unlike VCtx neurons, are not strongly tuned to drifting grating orientation. Together, these results suggest that the deep layers of cortex may represent a locus for cortical multisensory integration in the mouse.
These findings are supported by anatomical tracing work that shows that mouse primary ACtx receives inputs from VCtx (Banks et al., 2011). Anterograde tracers injected into secondary regions both lateral (V2L) and medial (V2M) of V1 reveal robust labeling of terminals in A1. Since this work, much has been done to further parcellate the fields of mouse secondary VCtx (e.g., Garrett et al., 2014); it remains to be tested whether projections to auditory areas vary further by visual cortical subfield. Particularly relevant here is the finding that neurons from V2 primarily send projections to L1 and L6 (Banks et al., 2011). This points to a potential anatomical pathway for visual signals to elicit spiking responses in the deep layers of mouse ACtx via monosynaptic connections from V2M and V2L. Although we did not observe visually responsive cells in L1, this layer of cortex was not well represented in this dataset (n = 13 MU sites recorded). Projections arriving at L1 do not necessarily target the sparse population of cells that reside there: the superficial visual projection to ACtx may terminate on the relatively large mass of L1 apical dendrites from pyramidal cells in L5 and L2/3 (Larkum and Zhu, 2002), and thus may produce some of the deep layer visual responses we observed. Further work involving methods such as trans-synaptic tracing must be used to resolve questions of this nature.
Beyond direct connections from secondary visual cortices, there are several other potential pathways by which visual signals may drive spiking responses in the ACtx. Other areas of association cortex may send feedback projections to modulate processing in the ACtx. The gerbil primary ACtx receives direct input from multisensory cortical regions, such as posterior parietal cortex (Budinger et al., 2006). In addition, several thalamic regions with projections to ACtx show multimodal responses; the medial aspect of the medial geniculate body exhibits multisensory responses (Wepsic, 1966) and sends a dense projection to L6 of rat ACtx (Linke and Schwegler, 2000). Furthermore, the suprageniculate nucleus, another highly multimodal thalamic region, projects to L5 and L6 of rat ACtx (Smith et al., 2010). The termination patterns of these projections are also consistent with deep layer visual spiking responses; none of these anatomical pathways can be ruled out based on our results. While neurons in many visual stations, including primary and secondary visual cortices as well as visual thalamus, exhibit tuning to orientation, the untuned responses we observe in ACtx could result from the pooling of such tuned inputs.
The layer specificity of visually evoked spiking responses in ACtx may inform the role of such responses in modulating activity within the cortical column (Douglas et al., 1989). The deep or “infragranular” layers of cortex are considered the primary subcortical output layers but also send collaterals to cortical targets, including local circuitry. Historically, the role of L6, in particular, has been considered enigmatic due in part to its high degree of morphological and physiological heterogeneity (Briggs, 2010) and atypical sensory responses (Zhou et al., 2010). Recent work has shown that L6, through synapses onto local inhibitory interneurons, may play a role in gain control of sensory responses (Bortone et al., 2014). Furthermore, L6 neurons are known to influence cortical receptive field structure (Bolz and Gilbert, 1986) and gate sensory input through corticothalamic connections (Briggs and Usrey, 2008). Cells in these layers appear to be strategically located for sculpting and modulating sensory processing. The restriction of visually evoked spiking responses to the infragranular layers suggests that such modulation of sensory activity may be controlled, in part, by cross-modal inputs.
Previous work has reported widely varying proportions of visually responsive units in the ACtx. Kobayasi et al. (2013) recorded from A1 of the Mongolian gerbil and reported that 2 of their 128 units exhibited responses to a visual stimulus alone. In the ACtx of the ferret, reported percentages of visually responsive neurons are much higher, with ∼15% of primary auditory neurons showing responses to the light flash of an LED (Bizley et al., 2007). In the rat ACtx, ∼6% of units showed responses to the visual stimulus alone (Wallace et al., 2004). Although factors such as cross-species differences likely explain some of these discrepancies in visual responsiveness, our work raises the possibility that they may also be due, in part, to differences in laminar sampling. Our work also shows that visual responses are more prominent in secondary ACtx, consistent with findings in the ferret and monkey (Bizley et al., 2007; Kayser et al., 2008).
The findings presented here extend this literature by revealing that visual stimuli evoke spiking responses in the mouse ACtx and showing that visually responsive units exhibit a strong laminar bias. Previous nonprimate work on audiovisual integration in the ACtx has largely been performed in anesthetized animals (Wallace et al., 2004; Bizley et al., 2007). Given that many anesthetics preferentially inhibit corticocortical connections (Raz et al., 2014) and that at least some visual information likely arrives at the ACtx through such connections (Banks et al., 2011), recordings in the awake animal may uncover responses otherwise obscured in anesthetized recordings. Furthermore, our examination of the orientation tuning properties of visually evoked responses in ACtx begins to answer questions about the type of information these responses convey to the local circuitry.
This work must be considered in the broader context of the literature on multisensory integration, much of which has found effects of cross-modal stimulation not in firing rate changes, but in evoked field potential responses and oscillatory changes (Ghazanfar et al., 2005; Lakatos et al., 2007). For example, Lakatos et al. (2007) found that visual stimulus onset resulted in phase reset of ongoing oscillations in the monkey ACtx with no change in MU spiking. It remains to be seen whether such a mechanism is also present in the mouse. Furthermore, there are additional aspects of the spiking signal in response to cross-modal stimulation that remain to be examined. The quenching of trial-to-trial response variability, for example, is a widespread phenomenon related to stimulus onset and may be an additional mechanism by which visual signals affect auditory cortical processing (Churchland et al., 2010).
This work is motivated partly by the utility of the mouse as a mammalian model organism for cell-type-specific microcircuit dissection. Tools such as fluorescent cell labeling, optogenetics, and chemogenetics, when applied to the problems of multisensory integration, may help elucidate the microcircuitry that integrates cross-modal signals. We hope that this study of visual responses in the ACtx of a genetic model organism will further the use of cell-type-specific tools for microcircuit dissection of multisensory phenomena.
Footnotes
This work was supported by the National Science Foundation GRFP to R.J.M., National Institutes of Health Grant R01DC014101 to A.R.H., the Klingenstein Foundation to A.R.H., Hearing Research Inc. to A.R.H., and the Coleman Memorial Fund to A.R.H.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Andrea R. Hasenstaub, Coleman Memorial Laboratory, University of California–San Francisco, San Francisco, CA 94158. andrea.hasenstaub@ucsf.edu