Abstract
Many of the sounds that we perceive are caused by our own actions, for example when speaking or moving, and must be distinguished from sounds caused by external events. Studies using macroscopic measurements of brain activity in human subjects have consistently shown that responses to self-generated sounds are attenuated in amplitude. However, the underlying manifestation of this phenomenon at the cellular level is not well understood. To address this, we recorded the activity of neurons in the auditory cortex of mice in response to sounds generated by their own behavior. We found that the responses of auditory cortical neurons to these self-generated sounds were consistently attenuated, compared with the same sounds generated independently of the animals' behavior. This effect was observed in both putative pyramidal neurons and in interneurons and was stronger in lower layers of auditory cortex. Downstream of the auditory cortex, we found that responses of hippocampal neurons to self-generated sounds were almost entirely suppressed. Responses to self-generated optogenetic stimulation of auditory thalamocortical terminals were also attenuated, suggesting a cortical contribution to this effect. Further analyses revealed that the attenuation of self-generated sounds was not simply due to the nonspecific effects of movement or behavioral state on auditory responsiveness. However, the strength of attenuation depended on the degree to which self-generated sounds were expected to occur, in a cell-type-specific manner. Together, these results reveal the cellular basis underlying attenuated responses to self-generated sounds and suggest that predictive processes contribute to this effect.
SIGNIFICANCE STATEMENT Distinguishing self-generated from externally generated sensory input poses a fundamental problem for behaving organisms. Our study in mice shows for the first time that responses of auditory cortical neurons are attenuated to sounds generated manually by the animals' own behavior. This effect is distinct from the nonspecific effect of behavioral activity on auditory responsiveness that has previously been reported and its magnitude is modulated by the probability with which self-generated sounds occur, suggesting an underlying predictive process. We also reveal how this effect varies across cell types and cortical layers. These findings lay a foundation for studying impairments in the processing of self-generated sounds, which are observed in psychiatric illness, in animal disease models.
Introduction
Distinguishing stimuli that are self-generated from those caused by external events is a fundamental problem faced by all behaving organisms (Crapse and Sommer, 2008). In the auditory domain, for example, self-generated sounds are caused by our footsteps when we walk or by our voice when we speak. Studies in human subjects suggest that neural responses to such self-generated sounds are attenuated: smaller responses are evoked by subjects' own speech sounds (Curio et al., 2000; Ford et al., 2001; Chen et al., 2011) or by manually triggered sounds (Schafer and Marcus, 1973; Martikainen et al., 2005; Baess et al., 2011) than when the same stimuli are generated independently of behavior. In addition to separating self-generated from externally generated sounds, such attenuation could help direct attention away from irrelevant stimuli and may also be important for self-monitoring during vocal production and learning (Hickok et al., 2011; Schneider and Mooney, 2015; Schröger et al., 2015). Furthermore, responses to self-generated sounds are attenuated less in individuals suffering from schizophrenia (Ford et al., 2001, 2014; Perez et al., 2012) and this deficit may contribute to the hallucinations and delusions observed in the disease (Heinks-Maldonado et al., 2007; Fletcher and Frith, 2009). Understanding how self-generated sounds are processed in the brain may therefore contribute to our understanding of the pathophysiology of psychiatric illness.
Although studies in human subjects have provided compelling evidence that neural responses to self-generated sounds are attenuated, they lack sufficient spatial resolution for revealing how this attenuation is manifest in the responses of individual neurons and where it originates in the auditory pathway. Previous studies in nonhuman primates have shown that the auditory responsiveness of cortical neurons is reduced during vocalizations (Eliades and Wang, 2003, 2008), but it is not known whether this also occurs when sounds are self-generated by nonvocal means. It has also become clear recently that the responsiveness of auditory cortical neurons to external sounds is reduced not only during vocalizations but during a variety of behaviors, including locomotion, whisking, and grooming (Schneider et al., 2014; Zhou et al., 2014; McGinley et al., 2015a). This raises the question of whether the attenuation of self-generated sounds is simply a manifestation of a nonspecific reduction in auditory responsiveness caused by movement. An alternative possibility is that responses to self-generated sounds are attenuated because they are caused, and therefore predicted, by the animal's behavior (Schröger et al., 2015). However, the extent to which nonspecific and predictive mechanisms contribute to the attenuation of self-generated sounds is not fully understood.
To address these questions, in the present study we examined how neurons in the mouse auditory cortex respond to sounds directly generated by the animals' own behavior. We found that responses to self-generated sounds were consistently smaller than responses to randomly generated sounds. This effect was primarily observed in lower layers of the auditory cortex and was also seen during self-generated optogenetic stimulation of auditory thalamocortical terminals, consistent with intracortical mechanisms. Additional analyses revealed that the attenuation of self-generated sounds could not be explained by the nonspecific effects of movement or behavioral state on sensory responsiveness. However, introducing unexpected sensory outcomes decreased the attenuation of self-generated sounds, suggesting that this effect relies on predictive processes in the brain.
Materials and Methods
Animals.
Male C57BL/6 mice, 10–12 weeks old at the beginning of the experiments, were used. A total of 17 mice were used for experiments in freely moving animals (see Figs. 2, 9A–E). A total of 25 mice were used for experiments in head-fixed animals (11 mice for experiments described in Figs. 3, 4, 6, 9F–G; three mice for experiments described in Fig. 7A–G; two mice for experiments described in Fig. 7H–J; four mice for experiments described in Fig. 8; five mice for experiments described in Fig. 10). Animals were kept in a ventilated animal container on a 12 h light/dark cycle. All experiments were performed during the light cycle. All procedures were approved by the local animal care committee (Regierungspräsidium Darmstadt).
Behavioral training.
For experiments in freely moving animals, mice were first trained to press a lever. Training took place inside a mouse operant chamber (ENV-307A-CT, Med Associates) placed inside a sound-attenuating cubicle (ENV-022V, Med Associates). The operant chamber was fitted with a lightweight lever consisting of a small piece of plastic attached to an electronic switch. Separated by 5 cm from the lever was a nose port with a spout connected to a reservoir of liquid reward (water containing 10% sucrose). To obtain a reward, animals had to press the lever and then enter the nose port within 2 s. Entries into the nose port were detected using an infrared emitter/detector circuit and reward (10 μl) was delivered by opening a solenoid valve (003-0218-900, Parker Hannifin) between the spout and the liquid reservoir. Behavioral events were detected and reward delivery was controlled using a microcontroller (Arduino Uno, Arduino). To motivate animals, their access to water was restricted to 2 h per day (excluding time spent on task). Animals were trained until they reliably pressed the lever (>100 times per session), which typically required 2 weeks of training. At this point animals underwent surgery for implantation of electrodes.
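The reward contingency described above (a lever press followed by a nose-port entry within 2 s) can be illustrated with a short Python sketch. This is not the Arduino firmware used in the study; the function name `rewarded_entries` and the rule that each press can earn at most one reward are assumptions made for illustration.

```python
def rewarded_entries(lever_times, port_times, window_s=2.0):
    """Return the nose-port entry times (s) that would earn a reward.

    Assumption (not stated in the text): each lever press can earn at most
    one reward, and an entry is matched to the most recent unrewarded press.
    """
    presses = sorted(lever_times)
    rewarded = []
    next_press = 0      # index of the next press not yet seen
    last_press = None   # most recent press that has not been rewarded
    for t in sorted(port_times):
        # Advance to the latest press that happened at or before this entry.
        while next_press < len(presses) and presses[next_press] <= t:
            last_press = presses[next_press]
            next_press += 1
        if last_press is not None and t - last_press <= window_s:
            rewarded.append(t)
            last_press = None  # this press has now been rewarded
    return rewarded
```

With hypothetical presses at 1, 10, and 20 s and port entries at 2.5, 13, 21, and 21.5 s, only the entries at 2.5 and 21 s fall within 2 s of an unrewarded press.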
Surgical implantation of recording electrodes for recordings in freely moving mice.
Animals were anesthetized using isoflurane (1–2%) and placed in a stereotaxic frame with the skull exposed. At the onset of anesthesia, all animals received subcutaneous injections of carprofen (4 mg/kg) and dexamethasone (2 mg/kg) and an intraperitoneal injection of atropine (0.1 mg/kg). The animal's temperature was maintained for the duration of the surgical procedure using a heating blanket. Anesthesia levels were monitored throughout the surgery and the concentration of isoflurane adjusted so that the breathing rate never fell below 1 Hz. For recordings from the auditory cortex, we used moveable arrays of stereotrodes made by twisting together two 0.0005 inch tungsten wires (M219350, California Fine Wire). A bundle of 5–6 stereotrodes was attached to a custom-made microdrive that made it possible to advance the electrodes along the dorsoventral axis. The electrodes were connected to an electrode interface board (EIB-16, Neuralynx) for relaying neural signals to the data acquisition system. On the day of implantation, the stereotrodes were gold-plated to reduce the impedance to 0.2–0.8 MΩ at 1 kHz. The stereotrode bundle was inserted straight down through a craniotomy on the top surface of the skull located 2.7–3.0 mm posterior to bregma and 4.1–4.3 mm to the left of the midline, to a depth of 400–600 μm below the brain surface, to target the primary auditory cortex (Paxinos and Franklin, 2001). The microdrive was secured to the skull using dental cement (Paladur, Heraeus Kulzer). In a subset of animals, we also implanted single tungsten microelectrodes (0.5 MΩ; WE30010.5F3, MicroProbes) into the auditory thalamus. The electrodes were inserted through a craniotomy 3.0 mm posterior to bregma and 2.0 mm to the left of the midline. Electrodes were lowered into the brain to a depth of ∼3.0 mm while recording multiunit activity (MUA) in response to broadband white noise stimuli.
The auditory thalamus was identified based on the occurrence of short-latency (<10 ms) auditory responses. The electrodes were then attached to the skull using dental cement. In a subset of animals, a low-impedance electrode consisting of 0.003 inch tungsten wire (CFW2011252, California Fine Wire) was implanted into the CA1 region of the hippocampus (1.9 mm posterior to bregma; 1.4 mm to the left of the midline; 1.3 mm below the brain surface). For all electrode implants, skull screws over the frontal cortex and cerebellum served as reference and ground, respectively, and provided additional anchoring support for the microdrives and electrodes. Animals were individually housed and allowed to recover for 1 week following the operation.
Recording of neural activity during self-generation of auditory stimuli in freely moving mice.
Following recovery from surgery (typically 1 week), animals were habituated to the experimental setup. A 16-channel headstage (HS-18, Neuralynx) was connected to the electrode interface board on the animal's microdrive implant and neural data were acquired using a Digital Lynx SX recording system (Neuralynx). Animals were placed inside a soundproof electrophysiological recording chamber that contained an operant chamber identical to the one used for training. During the first 1–2 d, animals became accustomed to performing the lever-pressing task while being connected to the recording system. During these sessions, the animals were also presented with the auditory stimulus that would later be used for the actual experiment, consisting of a white noise burst (150 ms duration, 72 dB SPL), presented randomly every 5–15 s through a speaker mounted on the chamber wall opposite to the lever and the nose port. On subsequent sessions, in addition to being randomly generated, the same white noise burst was also generated whenever the animal pressed the lever. This enabled us to compare responses to the same stimulus when it was self-generated and randomly generated (Fig. 1). This was repeated over several sessions between which the microdrive was lowered ≥40 μm to record from new populations of neurons.
Recording of neural activity during self-generation of auditory stimuli in head-fixed mice.
Using the same surgical techniques described above for electrode implantation, animals were anesthetized and placed in a stereotaxic frame with the skull exposed. A stainless-steel head post (Luigs and Neumann) was then cemented to the exposed skull. The area of the skull overlying the auditory cortex was left free of cement but covered with a silicone elastomer (Kwik-Sil, World Precision Instruments). Skull screws were inserted over the frontal cortex and cerebellum to serve as reference and ground, respectively, and to provide anchoring support. Following recovery from surgery, animals were handled and habituated to being head-fixed, which was achieved by inserting the head post into a matching head post holder (Luigs and Neumann). Animals were then trained to press a lever while being head-fixed to obtain liquid reward. The lever was identical to the one used in freely moving experiments and was placed just in front of and above the left paw. To obtain reward (3–6 μl of 10% sucrose solution, delivered using a solenoid valve as described above), animals had to press the lever and lick at a reward spout placed in front of their mouth. Licks were detected using an infrared emitter and detector on either side of the reward spout. Behavioral events were detected and reward delivery was controlled using a microcontroller (Arduino Uno, Arduino). To motivate animals, their access to water was restricted to 1 ml per day.
Once animals pressed the lever reliably (>250 lever presses per session, typically 7 training days), they underwent another surgery to prepare a craniotomy over the auditory cortex. Animals were anesthetized with isoflurane and placed in a stereotaxic frame. The Kwik-Sil was removed from the skull. A small craniotomy was then made in the skull overlying the left auditory cortex and sealed with Kwik-Sil. The following day, the animals were head-fixed, the Kwik-Sil removed, and a 32-channel silicon probe (A1X32-Edge-5mm-20-177-A32 or A1X32-Poly2-5mm-50s-177-A32, NeuroNexus) was lowered into the auditory cortex using a micromanipulator (SM-8, Luigs and Neumann). The dura was left intact. In the majority of experiments, electrodes were inserted perpendicular to the brain surface (30–35° relative to the horizontal plane) to align the electrode sites perpendicular to the layers of the auditory cortex (see Fig. 6D). Electrodes were advanced to a depth of 620–1100 μm below the brain surface. Because the recording sites on the two probe models we used spanned 620 and 775 μm, respectively, data were obtained from a restricted depth of the auditory cortex in any given recording session; however, across all sessions, recording sites were located between 0 and 1060 μm below the brain surface. To record from the hippocampus, electrodes (A1X32-6mm-50-177-A32, NeuroNexus) were advanced to a depth of 1850–2000 μm, resulting in recording sites being placed in the CA1 region and in the deep layers of the auditory cortex (see Fig. 8A). In some experiments, recordings were made from the auditory thalamus by inserting a silicon probe (A1X32-Edge-5mm-20-177-A32) through a craniotomy centered at 3.0 mm posterior to bregma and 2.0 mm to the left of the midline, to a depth of 3200–3500 μm below the brain surface. Following final placement of electrodes and a brief waiting period (∼15 min), neural activity was recorded while animals pressed the lever as before. 
During these recordings, each lever press triggered a white noise burst (150 ms, 70 dB) delivered from a speaker (R1904/613001, Scanspeak) situated 20 cm above and 27 cm to the right of the mouse. The same auditory stimulus was also triggered randomly every 5–10 s. After the recording sessions, the electrodes were removed from the brain, the craniotomy closed with Kwik-Sil, and the mouse returned to its home cage. This procedure was repeated for 1–4 d. On the last 1 or 2 recording days, the silicon probe was coated with a fluorescent dye (DiI or DiO, Life Technologies) to assist with the identification of the recording locations.
Optogenetic stimulation of thalamocortical terminals.
Using the same surgical techniques described above for electrode implantation, animals were anesthetized and placed in a stereotaxic frame and the skull was exposed. A small craniotomy was made in the skull overlying the auditory thalamus (3.0 mm posterior to bregma and 2.0 mm to the left of the midline) and a 35 gauge needle attached to a Hamilton syringe (Hamilton) was inserted to a depth of 2.8–3.0 mm below the brain surface. A microsyringe pump controller (UltraMicroPump III, World Precision Instruments) then injected 1 μl of viral construct (pAAV5-CaMKIIa-hChR2(H134R)-EYFP, titer ∼3 × 10¹² vg/ml, University of North Carolina Vector Core) at a constant injection speed of 30 nl/min. Following the injection, the syringe was allowed to remain in place for ≥15 min before it was carefully removed from the brain. The scalp was then sutured closed using a medical sewing kit. After ≥10 weeks to allow for surgical recovery and virus expression, animals underwent a second surgery for head post implantation and were trained to press a lever while head-fixed, after which a craniotomy was made overlying the auditory cortex, as described above. During the actual experiments, lever presses triggered blue light pulses (2 ms, 34 mW) delivered from a 473 nm laser (LuXx473, Omicron) through an optical fiber (diameter, 125 μm; numerical aperture, 0.22) attached to a silicon probe (A1X32-10mm-50-177-OA32, NeuroNexus). The optical fiber terminated 200 μm above the topmost electrode site. Light pulses identical to those triggered by lever presses were also delivered randomly every 5–10 s. High light power was used to counteract the effects of light scatter in brain tissue and maximize the chances of eliciting responses throughout the depth of the cortex. Although light stimulation at high intensities can cause heating of brain tissue, this is mostly a concern when using continuous light stimulation and not when using brief light pulses as done in this study (Stujenske et al., 2015).
Responses to expected and unexpected self-generated sounds.
In experiments examining responses to self-generated but “unexpected” sounds (see Fig. 10), we used pure tones (100 ms, 5 ms rise/fall time, 68 ± 5 dB), generated by a 24 bit digital-to-analog converter with an output sampling rate of 192 kHz (RZ6, Tucker-Davis Technologies). Recordings were performed in head-fixed mice, as described above. The tones were delivered by a speaker (R1904/613001, Scanspeak) located 20 cm above and 27 cm to the right of the mouse. Sound intensity was calibrated using a 1/8 inch pressure-field microphone (model 4939, Brüel & Kjær), a conditioning amplifier (Nexus, Brüel & Kjær), and SigCal software (Tucker-Davis Technologies). The “expected” and “unexpected” tones were chosen for each animal by delivering tones between 2 and 45.1 kHz (in 0.25 octave steps at 72, 62, and 52 dB) while they were head-fixed but not performing the lever-pressing task. Based on the responses to these tones, we chose two frequencies that elicited responses of a similar magnitude (0.45–0.70 octave separation). During the actual experiment, each lever press elicited one of the two tones with a probability of 75% (expected tone) and the other tone with a probability of 25% (unexpected tone). The two tones were also delivered randomly during the experiment every 5–10 s with the same 75/25% probability. These probabilities were chosen to ensure that a sufficient number of unexpected stimuli could be analyzed (after excluding stimuli using the criteria described below) while still being distinctly less probable than expected sounds. Although probabilities of 90/10% are frequently used to examine responses to unexpected auditory stimuli presented passively, effects can also be seen using probabilities of 70/30% (Ulanovsky et al., 2003; von der Behrens et al., 2009; Antunes et al., 2010).
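The two quantities that define this design, the octave separation between the chosen tones and the 75/25% outcome probabilities, can be sketched in Python as follows. The function names and the use of a seeded pseudorandom generator are assumptions for illustration; the text does not describe how the stimulus sequence was actually generated.

```python
import math
import random


def octave_separation(f1_hz, f2_hz):
    """Distance between two tone frequencies in octaves (log2 of their ratio)."""
    return abs(math.log2(f2_hz / f1_hz))


def draw_tone(rng, expected_hz, unexpected_hz, p_expected=0.75):
    """Pick the tone for one lever press or one random presentation.

    Assumption: each trial is drawn independently with a fixed probability.
    """
    return expected_hz if rng.random() < p_expected else unexpected_hz


if __name__ == "__main__":
    rng = random.Random(0)               # hypothetical seed, for repeatability
    expected, unexpected = 8000.0, 12000.0   # hypothetical frequency pair
    print(round(octave_separation(expected, unexpected), 3))
    trials = [draw_tone(rng, expected, unexpected) for _ in range(20)]
    print(trials.count(expected), "expected tones in 20 trials")
```

The hypothetical pair above (8 and 12 kHz) is separated by about 0.58 octaves, which falls within the 0.45–0.70 octave range stated in the text.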
Histology.
At the end of the last recording session, animals used in freely moving experiments were anesthetized with Na-pentobarbital and a small lesion for identifying recording locations was made by passing current (50 μA, 10 s) through one electrode. Animals were subsequently perfused transcardially with 4% paraformaldehyde, 15% picric acid in PBS, pH 7.4. Brains were postfixed overnight, coronal sections (80 μm) were prepared (VT1000S microtome, Leica), and lesions were identified using a light microscope. Histological analysis subsequently revealed most of the lesions to be within the primary auditory cortex as defined by Paxinos and Franklin (2001). However, because the lesions represent the final position of the electrodes, it is possible that some recordings were also made from the adjacent dorsal region of the secondary auditory cortex (see Fig. 5). For head-fixed animals, brains were prepared and sectioned as described above and were mounted with Vectashield mounting medium containing a DAPI counterstain (Vector Laboratories). Electrode tracks labeled with fluorescent dye were identified using a confocal microscope (Eclipse90i, Nikon). The majority of tracks were perpendicular to the cortical layers and located within the primary auditory cortex (see Fig. 5; Paxinos and Franklin, 2001).
Data acquisition.
For experiments in freely moving mice, neural data [putative spikes and local field potentials (LFPs)] were acquired using a 16-channel headstage and a Digital Lynx SX data acquisition system (Neuralynx). To extract putative spike waveforms, neural signals were referenced against one of the stereotrodes and bandpass-filtered between 0.6 and 6 kHz. Waveforms that exceeded a voltage threshold (typically 40–50 μV) were then digitized at 32 kHz and 32 samples of each waveform (∼250 and 750 μs before and after waveform trough, respectively) were stored for subsequent analysis. LFPs were extracted by bandpass filtering the same signals between 1 and 1000 Hz and digitizing the filtered signal at 2 kHz. An overhead video camera detected the position of the animals from a red light-emitting diode mounted on the headstage. The x and y coordinates of the animals were digitized at 30 Hz (Neuralynx) and time-stamped with the same clock used for the electrophysiological data. For experiments in head-fixed mice, electrophysiological signals were filtered between 1 and 7500 Hz, digitized at 30 kHz using a digitizing headstage (RHD2132 Amplifier Board, Intan Technologies), and acquired using a USB interface board (RHD2000, Intan Technologies). The USB interface board also registered the time stamps of behavioral events based on transistor-to-transistor logic pulses delivered by the microcontroller (Arduino Uno, Arduino).
Data analysis.
To extract LFPs from head-fixed recordings, the raw data were filtered off-line between 1 and 1000 Hz and downsampled to 2 kHz. To extract spiking activity from head-fixed recordings, common average referencing was performed on all functional electrode sites (Ludwig et al., 2009), followed by filtering between 600 and 7500 Hz. All filtering was performed in Matlab using an equiripple filter (Matlab 2014b, MathWorks). Putative spike waveforms crossing a voltage threshold (typically 45–55 μV) were extracted from the spiking activity and 32 samples of each spike waveform (∼367 and 700 μs before and after waveform trough, respectively) were extracted. Virtual tetrodes were then created from groups of four neighboring electrode sites, so that a spike waveform was acquired on all four sites in response to a threshold crossing on any of the four sites.
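Two of the steps above, common average referencing and threshold-based spike detection, can be sketched in plain Python. This is an illustrative sketch rather than the Matlab pipeline used in the study: the bandpass filtering step is omitted, the function names are hypothetical, and the negative-going threshold polarity is an assumption based on the waveforms being aligned to their trough.

```python
def common_average_reference(channels):
    """Subtract the across-channel mean from every channel, sample by sample.

    channels: list of equal-length lists of voltages (one list per site).
    """
    n_ch = len(channels)
    n_samp = len(channels[0])
    means = [sum(ch[s] for ch in channels) / n_ch for s in range(n_samp)]
    return [[ch[s] - means[s] for s in range(n_samp)] for ch in channels]


def threshold_crossings(trace_uv, threshold_uv):
    """Sample indices at which the trace first drops below -threshold_uv.

    Assumption: spikes are detected as negative-going crossings.
    """
    crossings = []
    for i, v in enumerate(trace_uv):
        below = v < -threshold_uv
        was_below = i > 0 and trace_uv[i - 1] < -threshold_uv
        if below and not was_below:
            crossings.append(i)
    return crossings
```

For a virtual tetrode, a crossing detected on any of the four sites would then trigger waveform extraction on all four sites at the same sample index.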
To identify individual single units (neurons) in both freely moving and head-fixed recordings, spike waveform features were computed across either stereotrodes or virtual tetrodes, and an automatic clustering algorithm (KlustaKwik; http://klustakwik.sourceforge.net/) was used, followed by manual refinement in SpikeSort 3D (Neuralynx). The time stamps of all waveforms were further considered for analysis as MUA. All subsequent analysis was performed using custom-written scripts in Matlab (Matlab 2014b, MathWorks). Single units recorded in head-fixed experiments were classified as putative interneurons or pyramidal neurons based on two features of their spike waveform (see Fig. 6A). First, we used the time from the negative trough to the positive peak of the waveform, based on the tendency of interneurons to have narrower action potentials. Second, we used the voltage of the waveform at the last sample point, reflecting the fact that action potentials of interneurons return more rapidly to baseline. These features were calculated after upsampling each spike waveform by a factor of 100. Note that because we extracted only ∼1 ms of the waveform of each spike, not all neurons had returned to baseline by the last waveform sample. To objectively classify neurons, the distribution of these two waveform features was fit using a two-dimensional Gaussian mixture model (Stark et al., 2013; Kim et al., 2016). Neurons with low classification confidence (p > 0.05) were excluded from analysis. For neurons recorded using silicon probes inserted perpendicular to the cortical layers (see Fig. 6D), their depth below the brain surface was estimated from the micromanipulator travel and by determining the recording site where their waveform was largest. For subsequent analysis, neurons were classified as belonging to “upper layers” or “lower layers” if their depth was 0–500 or 500–1000 μm, respectively, from the brain surface.
These two depth categories correspond approximately to layers 1–4 and 5–6, respectively (Anderson et al., 2009). Supporting this classification, one-dimensional current-source density plots (Nicholson and Freeman, 1975; Mitzdorf, 1985) generated from the depth profile of averaged tone-evoked LFPs often revealed separate short-latency current sinks ∼300–500 and 600–800 μm below the brain surface (see Fig. 6E), likely reflecting thalamocortical input to layers 3/4 and 5/6, respectively (Kimura et al., 2003; Sakata and Harris, 2009).
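The two waveform features and the depth-based layer assignment described above can be sketched in Python. The sketch makes several assumptions that go beyond the text: the waveform is used without the 100-fold upsampling, the positive peak is taken as the maximum after the trough, and a depth of exactly 500 μm is assigned to the lower layers. The Gaussian mixture model step is not shown.

```python
def waveform_features(waveform_uv, fs_hz):
    """Return (trough-to-peak time in ms, voltage at the last sample).

    Assumption: the peak is the largest value at or after the trough.
    """
    idx = range(len(waveform_uv))
    trough = min(idx, key=lambda i: waveform_uv[i])
    peak = max(range(trough, len(waveform_uv)), key=lambda i: waveform_uv[i])
    trough_to_peak_ms = (peak - trough) / fs_hz * 1e3
    return trough_to_peak_ms, waveform_uv[-1]


def layer_group(depth_um):
    """Assign a neuron to the 'upper' (0-500 um) or 'lower' (500-1000 um) group.

    Assumption: a depth of exactly 500 um counts as 'lower'; depths outside
    0-1000 um are left unassigned (None).
    """
    if 0 <= depth_um < 500:
        return "upper"
    if 500 <= depth_um <= 1000:
        return "lower"
    return None
```

A narrow trough-to-peak time together with a last-sample voltage close to baseline would point toward a putative interneuron, but the actual decision in the study was made by the mixture model, not by fixed cutoffs.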
When comparing responses to self-generated and randomly generated sounds, two exclusion criteria were applied. First, stimuli that occurred <1 s after the previous stimulus were excluded from analysis to minimize the effects of sensory adaptation; increasing this time window to 2 s did not affect the results. Second, to minimize differences in behavioral state during random and self-generated stimuli, only random stimuli that occurred when the animals were engaged in the lever-pressing task were included in the analysis (Fig. 1B). These were operationally defined as random sounds for which the preceding sound was self-generated.
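The stimulus selection rules can be expressed as a short Python sketch. The function name and data layout are hypothetical, and two details are assumptions: the "previous stimulus" is taken to be the immediately preceding stimulus of either type (whether or not it was itself excluded), and a random sound with no preceding stimulus is dropped because task engagement cannot be established.

```python
def select_stimuli(events, min_gap_s=1.0):
    """Filter stimuli for analysis.

    events: list of (time_s, kind) tuples sorted by time, where kind is
    "self" (lever-triggered) or "random".
    """
    kept = []
    for i, (t, kind) in enumerate(events):
        if i == 0:
            # Assumption: no preceding sound, so a random sound cannot be
            # confirmed as occurring during task engagement.
            if kind == "self":
                kept.append((t, kind))
            continue
        prev_t, prev_kind = events[i - 1]
        if t - prev_t < min_gap_s:
            continue  # too soon after the previous stimulus (adaptation)
        if kind == "random" and prev_kind != "self":
            continue  # random sound outside a period of lever pressing
        kept.append((t, kind))
    return kept
```

Setting `min_gap_s=2.0` corresponds to the control analysis with the longer exclusion window.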
To analyze sensory-evoked spiking activity, we first constructed peristimulus time histograms aligned to the onset of random and self-generated sounds, for both single-unit activity (SUA) and MUA. We then quantified the response amplitude by calculating the average firing rate between 10 and 50 ms following stimulus onset and subtracting the baseline firing rate (0–200 ms before stimulus onset). This analysis window was chosen to match the short latency and brief duration of the responses we observed in most of our neurons and multiunit recordings (Figs. 2–4). Responses to optogenetic stimulation of thalamocortical terminals (see Fig. 7) were quantified by calculating the firing rate 5–10 ms following stimulus onset to match the brief duration of these responses and avoid contamination by the laser artifact. Neurons or multiunit sites were classified as auditory responsive, and included in subsequent analysis, if their firing rate 10–50 ms following stimulus onset was >3 SDs from the baseline firing rate (measured in 25 ms bins 0–200 ms before stimulus onset). To examine sensory-evoked event-related potentials (ERPs), we first averaged the LFP surrounding stimulus onset. ERPs elicited by our auditory stimuli always consisted of an initial negative deflection that reached its peak within 25 ms of stimulus onset (Figs. 2, 3). The amplitude of each ERP was therefore quantified by measuring the LFP value at this negative peak and subtracting from it the LFP value at stimulus onset. ERPs were included for subsequent analysis only if their amplitude was >80 μV and eight times the SD of the prestimulus baseline (measured 0–200 ms before stimulus onset). Individual neurons, LFP recording sites, and multiunit sites were included in subsequent analysis if they were auditory responsive, as determined using the abovementioned criteria, in response to either the random or self-generated sounds, or both.
Because the stereotrodes used in freely moving experiments were closely spaced together in a bundle, they measured approximately the same LFP signals; for this reason, auditory-responsive LFP sites were averaged within each session. In head-fixed experiments using silicon probes, LFP and multiunit responses from every third site (corresponding to a 60–75 μm separation) were included in the analysis.
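The spiking response measure and the responsiveness criterion described above can be sketched in Python as follows. The function names are hypothetical, and the criterion is one possible reading of the text: the SD is computed across the eight 25 ms baseline bins of the trial-averaged histogram, and deviations in either direction count as a response.

```python
from statistics import mean, stdev


def rate_hz(spikes_ms, start_ms, stop_ms, n_trials):
    """Mean firing rate (Hz) in [start_ms, stop_ms), pooled over trials.

    spikes_ms: spike times in ms relative to stimulus onset, all trials pooled.
    """
    count = sum(1 for t in spikes_ms if start_ms <= t < stop_ms)
    return count / (n_trials * (stop_ms - start_ms) / 1000.0)


def evoked_response(spikes_ms, n_trials):
    """Rate 10-50 ms after onset minus baseline rate 0-200 ms before onset."""
    evoked = rate_hz(spikes_ms, 10, 50, n_trials)
    baseline = rate_hz(spikes_ms, -200, 0, n_trials)
    return evoked - baseline


def is_responsive(spikes_ms, n_trials, n_sd=3.0):
    """True if the evoked rate is more than n_sd SDs from the baseline mean.

    Assumption: SD is taken across eight 25 ms baseline bins, and both
    increases and decreases in rate qualify.
    """
    bins = [rate_hz(spikes_ms, -200 + 25 * k, -175 + 25 * k, n_trials)
            for k in range(8)]
    evoked = rate_hz(spikes_ms, 10, 50, n_trials)
    return abs(evoked - mean(bins)) > n_sd * stdev(bins)
```

For the optogenetic experiments, the same `rate_hz` helper would be called with a 5–10 ms window in place of the 10–50 ms window.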
To quantify the differences in responses to self-generated and random sounds, we computed a modulation index (MI) by subtracting the response to the random sound from the response to the self-generated sound and dividing this difference by the sum of the two responses: MI = (Response_{Self-generated} − Response_{Random})/(Response_{Self-generated} + Response_{Random}).
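As a worked illustration of this index, a minimal Python sketch follows. The function name and the handling of a zero denominator are assumptions; the text does not say how such cases were treated.

```python
def modulation_index(resp_self, resp_random):
    """MI = (self - random) / (self + random).

    Negative values indicate attenuation of the self-generated response.
    Assumption: return NaN when the two responses sum to zero.
    """
    denom = resp_self + resp_random
    if denom == 0:
        return float("nan")
    return (resp_self - resp_random) / denom
```

For example, a response of 10 Hz to the self-generated sound and 30 Hz to the random sound gives MI = −0.5. The index is bounded between −1 and 1 only when both responses are non-negative; because responses are baseline-subtracted, a negative response can push the index outside this range.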
For quantifying attenuation of responses to expected and unexpected self-generated sounds (see Fig. 10), MI values were calculated for expected and unexpected self-generated sounds, by comparing them to random expected and unexpected sounds, respectively: MI_{Expected} = (Response_{Self-generated expected} − Response_{Random expected})/(Response_{Self-generated expected} + Response_{Random expected}); MI_{Unexpected} = (Response_{Self-generated unexpected} − Response_{Random unexpected})/(Response_{Self-generated unexpected} + Response_{Random unexpected}).
Differences in MI_{Expected} and MI_{Unexpected} could reflect changes in responses to random or self-generated sounds or both. We therefore computed an MI quantifying differences in expected and unexpected sounds separately for self-generated and random sounds (see Fig. 10): MI_{Self-generated} = (Response_{Self-generated unexpected} − Response_{Self-generated expected})/(Response_{Self-generated unexpected} + Response_{Self-generated expected}); MI_{Random} = (Response_{Random unexpected} − Response_{Random expected})/(Response_{Random unexpected} + Response_{Random expected}).
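The four indices defined for the expected/unexpected experiment differ only in which pair of responses is contrasted, which the following Python sketch makes explicit. The dictionary layout and function names are hypothetical, and the numbers in the usage note are invented for illustration rather than taken from the data.

```python
def _mi(a, b):
    """Normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def contrast_indices(r):
    """Compute the four indices from a 2 x 2 table of responses.

    r maps (source, expectation) to a response amplitude, with source in
    {"self", "random"} and expectation in {"expected", "unexpected"}.
    """
    return {
        # Attenuation of self-generated relative to random sounds
        "MI_expected": _mi(r["self", "expected"], r["random", "expected"]),
        "MI_unexpected": _mi(r["self", "unexpected"], r["random", "unexpected"]),
        # Effect of expectation within each sound source
        "MI_self": _mi(r["self", "unexpected"], r["self", "expected"]),
        "MI_random": _mi(r["random", "unexpected"], r["random", "expected"]),
    }
```

With hypothetical responses of 10 (self, expected), 20 (self, unexpected), and 30 for both random conditions, the sketch gives MI_expected = −0.5 and MI_unexpected = −0.2, while MI_self is positive and MI_random is 0. In that invented case the weaker attenuation for unexpected sounds would be attributable to the self-generated responses alone.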
For quantifying the effect of movement on auditory responsiveness (see Fig. 9), animals' movement speed was measured based on changes in the position of the LED attached to the recording headstage and averaged in 500 ms windows before stimulus onset. Random sounds were defined as occurring during “active” or “quiescent” states if the prestimulus movement speed was above or below 1.5 cm/s, respectively (see Fig. 9A). To quantify differences in auditory responsiveness in these two states, we computed an MI (see Fig. 9D) as follows: MI = (Response_{Active} − Response_{Quiescent})/(Response_{Active} + Response_{Quiescent}).
An MI similar to the one above was also computed to compare responses to random sounds at different movement speeds, as well as self-generated sounds and random sounds selected for analysis, to responses during quiescence (see Fig. 9E).
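The classification of trials into active and quiescent states can be sketched in Python. This sketch assumes that tracked positions have already been converted to centimeters, that prestimulus speed is computed as path length divided by elapsed time within the 500 ms window, and that a speed of exactly 1.5 cm/s counts as quiescent; none of these details are specified in the text.

```python
import math


def prestim_speed_cm_s(positions, stim_time_s, window_s=0.5):
    """Average speed (cm/s) in the window preceding a stimulus.

    positions: list of (time_s, x_cm, y_cm) samples sorted by time.
    Returns None if the window holds fewer than two distinct time points.
    """
    pts = [p for p in positions
           if stim_time_s - window_s <= p[0] <= stim_time_s]
    if len(pts) < 2 or pts[-1][0] == pts[0][0]:
        return None
    path = sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(pts, pts[1:]))
    return path / (pts[-1][0] - pts[0][0])


def movement_state(speed_cm_s, threshold_cm_s=1.5):
    """Label a trial 'active' above the speed threshold, else 'quiescent'."""
    return "active" if speed_cm_s > threshold_cm_s else "quiescent"
```

Responses averaged within each state can then be contrasted with the same normalized-difference form used for the other modulation indices.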
Statistical significance of group differences was estimated using nonparametric tests. Unless stated otherwise, comparisons were paired and were assessed using the Wilcoxon signed-rank test.
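For readers unfamiliar with the paired test named above, the following Python sketch computes the signed-rank statistic and a two-sided p value. It is not the Matlab routine used in the study: it relies on the normal approximation without tie or continuity corrections, which is only reasonable for moderately large samples, and the function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, two-sided p) for paired samples x and y.

    Assumptions: zero differences are discarded, tied absolute differences
    receive average ranks, and p comes from the normal approximation.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p
```

In practice an established implementation such as `scipy.stats.wilcoxon` would be preferable, since it offers exact p values for small samples.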
Results
Responses to self-generated sounds are attenuated in the auditory cortex
To examine the neural processing of self-generated sounds, we developed an experimental paradigm for mice in which sounds were generated either by the animal's own behavior or presented randomly (Fig. 1A). To this end, mice were first trained to press a lever to obtain water reward. Once animals were lever-pressing reliably (>100 lever presses per session, typically 1–2 weeks), they underwent surgery for implantation of recording electrodes into the auditory cortex. Following recovery from surgery, the animals again performed the lever-pressing task but this time each lever press triggered the delivery of an auditory stimulus (white noise, 150 ms duration, 72 dB SPL; see Materials and Methods). While the animals performed this task, the same auditory stimulus was also delivered randomly (5–15 s interstimulus interval), allowing us to measure neural responses to the same physical stimulus when it was self-generated or randomly generated (Fig. 1A,C). To minimize the influence of the overall behavioral state on our results, random stimuli occurring outside periods of lever-pressing were excluded from analysis (Fig. 1B). Auditory stimuli (either randomly generated or self-generated) were also excluded from analysis if they occurred within 1 s following the previous stimulus to minimize the effects of sensory adaptation (see Materials and Methods).
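The 1 s exclusion rule described above can be sketched as a simple filter over stimulus times. The details here are assumptions: a stimulus is dropped if the gap to the preceding delivered stimulus is less than 1 s, regardless of whether that preceding stimulus was itself excluded, and a gap of exactly 1 s is kept. The function name and event times are hypothetical.

```python
MIN_GAP_S = 1.0  # minimum time since the previous stimulus, from the text


def filter_by_gap(events):
    """Keep (time_s, label) events at least MIN_GAP_S after the previous one."""
    kept = []
    previous_time = None
    for time_s, label in sorted(events):
        if previous_time is None or time_s - previous_time >= MIN_GAP_S:
            kept.append((time_s, label))
        # Every delivered stimulus counts as "previous", even if excluded.
        previous_time = time_s
    return kept


# Made-up stimulus times (s) mixing random and self-generated sounds.
events = [
    (0.0, "random"),
    (0.4, "self"),    # 0.4 s after the previous stimulus: excluded
    (2.0, "self"),
    (2.9, "random"),  # 0.9 s after the previous stimulus: excluded
    (9.0, "random"),
]
kept = filter_by_gap(events)
print(kept)
```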
Figure 2A shows the response of a single auditory cortical neuron to the self-generated and random sounds. As this example illustrates, the auditory stimuli used in our task typically evoked short-latency phasic responses in auditory cortical neurons that were largely confined to the first 50 ms following stimulus onset (for additional examples, see Figs. 3A and 4). We therefore focused on this time window for our analysis (see Materials and Methods). Across the population of recorded neurons, evoked responses were consistently smaller in magnitude following self-generated compared with random sounds (Fig. 2D; firing rate 10–50 ms after stimulus onset, p < 0.0001, n = 64 neurons, Wilcoxon sign-rank test). Analysis of individual neurons revealed that ∼17% (11 of 64) showed significantly smaller responses to self-generated sounds (p < 0.05, Wilcoxon rank-sum test), whereas only 3% (2 of 64) showed significantly enhanced responses to these sounds. In contrast to the differences in evoked responses, spontaneous activity preceding random or self-generated stimuli did not differ (firing rate 0–200 ms before stimulus onset; random, 5.64 ± 0.91 Hz; self-generated, 6.30 ± 1.24 Hz, p = 0.60, n = 64 neurons). Attenuated responses to self-generated sounds were also observed in measures of population activity including MUA (Fig. 2B,E; firing rate 10–50 ms following stimulus onset; p < 0.0001, n = 106 sites) and evoked LFPs (Fig. 2C,F; peak amplitude 0–25 ms following stimulus onset, n = 60 sites, p < 0.0001). To quantify responses to self-generated sounds, we computed an MI by subtracting the response to the random sound from the response to the self-generated sound and dividing this difference by the sum of the two responses: (ResponseSelf-generated − ResponseRandom)/(ResponseSelf-generated + ResponseRandom). 
Thus, the MI measure ranges from −1 (response only to the random sound) to +1 (response only to the self-generated sound) with 0 indicating equal responses to both stimuli. Notably, the MI values were significantly negative for all measures of neural activity, indicating weaker responses to self-generated sounds (Fig. 2G–I; p < 0.0001).
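As an illustration of how a single-unit MI could be obtained from spike times, the sketch below counts spikes in the 10–50 ms window after each stimulus onset, converts the counts to a mean firing rate, and then forms the MI. It assumes times in milliseconds and no baseline subtraction. The actual response definition is given in Materials and Methods and may differ. All names and spike times are made up.

```python
WINDOW_START_MS = 10.0  # analysis window start after stimulus onset
WINDOW_END_MS = 50.0    # analysis window end after stimulus onset


def evoked_rate_hz(spike_times_ms, onsets_ms):
    """Mean firing rate (Hz) in the 10-50 ms window, averaged over trials."""
    if not onsets_ms:
        raise ValueError("at least one stimulus onset is required")
    count = 0
    for onset in onsets_ms:
        count += sum(1 for t in spike_times_ms
                     if onset + WINDOW_START_MS <= t < onset + WINDOW_END_MS)
    window_s = (WINDOW_END_MS - WINDOW_START_MS) / 1000.0
    return count / (window_s * len(onsets_ms))


def modulation_index(self_generated, random_sound):
    """(self - random) / (self + random); -1 and +1 are the extremes."""
    return (self_generated - random_sound) / (self_generated + random_sound)


# Made-up spike train and stimulus onsets (ms) for one hypothetical neuron.
spikes_ms = [1015, 1030, 1045, 2020, 5012, 6100]
random_onsets_ms = [1000, 2000]
self_onsets_ms = [5000, 6000]

random_rate = evoked_rate_hz(spikes_ms, random_onsets_ms)
self_rate = evoked_rate_hz(spikes_ms, self_onsets_ms)
mi = modulation_index(self_rate, random_rate)
print(random_rate, self_rate, mi)
```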
We next examined responses to self-generated sounds in the auditory cortex of head-fixed mice using a paradigm similar to the one described above for freely moving mice (see Materials and Methods). An important advantage of head fixation is that the position of the speaker relative to the animal's head remains constant throughout the experimental session and is therefore identical for random and self-generated sounds. The results of these experiments are shown in Figure 3. Similar to the results obtained in freely moving animals, responses to self-generated sounds in head-fixed mice were consistently reduced in magnitude as measured using SUA (MI: −0.15 ± 0.02, p < 0.0001, n = 186 neurons), MUA (MI: −0.17 ± 0.03, n = 135 sites, p < 0.0001), or LFPs (MI: −0.13 ± 0.01, p < 0.0001, n = 197 sites). Approximately 30% (55 of 186) of neurons showed a significantly larger response to randomly generated stimuli (p < 0.05, Wilcoxon rank-sum test), whereas only 3% (6 of 186) showed the opposite effect (for additional individual neuron examples, see Fig. 4). Note that the larger fraction of neurons showing significantly attenuated responses in head-fixed animals likely reflects the higher statistical power in these experiments due to the larger number of trials (492.69 ± 42.67 lever presses per session vs 163.24 ± 7.23 in freely moving animals). MI values also remained significantly negative after averaging data within each animal (SUA, −0.14 ± 0.03, p = 0.003, n = 11; MUA, −0.16 ± 0.07, p = 0.02, n = 11; LFP, −0.12 ± 0.03, p = 0.002, n = 10). Unless otherwise noted, results described in subsequent sections were all obtained in head-fixed mice. Together, these results from freely moving and head-fixed animals demonstrate for the first time that responses of auditory cortical neurons to manually self-generated sounds are attenuated, revealing the cellular correlate for similar observations in humans (Schafer and Marcus, 1973; Martikainen et al., 2005; Baess et al., 2011).
Attenuation of responses to self-generated sounds across cell types and cortical layers
Although many auditory cortical neurons displayed attenuated responses to self-generated sounds, the size of this effect varied from one neuron to the next, leading us to examine possible sources of this variability. First, we asked whether the strength of attenuation depended on neuron type. To this end, we used spike waveform features (Stark et al., 2013) to separate neurons recorded in head-fixed experiments into putative pyramidal neurons and interneurons (Fig. 6A,B; see Materials and Methods). Of our auditory responsive neurons, 46% were classified as putative interneurons and 54% were classified as putative excitatory neurons. Note that the relatively high proportion of interneurons likely reflects their higher likelihood of being classified as auditory responsive; the proportions of the two types among all recorded neurons, 79.4% excitatory (281 of 354) versus 20.6% inhibitory (73 of 354), were closer to what has previously been reported. As expected, putative interneurons had higher firing rates compared with putative pyramidal neurons (interneurons, 9.74 ± 1.10 Hz; pyramidal neurons, 5.15 ± 0.47 Hz, p = 0.0002, Wilcoxon rank-sum test). Both classes of neurons showed significant attenuation in response to self-generated sounds (Fig. 6C; interneurons, −0.12 ± 0.03, n = 85, p < 0.0001; pyramidal neurons, −0.17 ± 0.04, n = 93, p < 0.0001), which did not differ in magnitude between the two populations (p = 0.12, Wilcoxon rank-sum test). The MI values of interneurons with the 25% narrowest waveforms (<0.24 ms) also did not differ from putative pyramidal neurons (MI: −0.10 ± 0.05, n = 21, p = 0.26) and, more generally, we did not observe a correlation between MI values and spike width (r = −0.10, p = 0.18).
We next examined whether processing of self-generated sounds varied across cortical layers. In the majority of our experiments in head-fixed mice, neural activity was recorded using 32-channel silicon probes inserted perpendicular to the cortical surface (Fig. 6D), allowing us to simultaneously record from neurons at different cortical depths (see Materials and Methods). Figure 6F shows the MI as a function of recording depth for all neurons of both cell types. For subsequent analysis, we compared neurons recorded 0–500 μm (“upper layers”) and 500–1000 μm (“lower layers”) below the cortical surface, corresponding approximately to layers 1–4 and 5–6, respectively (Anderson et al., 2009). We found that across all neurons, attenuation was much stronger in lower compared with upper layers (Fig. 6G; upper layers, −0.03 ± 0.04, n = 38; lower layers, −0.18 ± 0.03, n = 90, p < 0.001, Wilcoxon rank-sum test). MI values obtained from MUA corroborated this finding (upper layers, −0.06 ± 0.02, n = 73; lower layers, −0.25 ± 0.02, n = 163, p < 0.0001, Wilcoxon rank-sum test). This effect was also seen when only considering putative interneurons (Fig. 6H; upper layers, 0.00 ± 0.04, n = 23; lower layers, −0.15 ± 0.04, n = 32, p < 0.01, Wilcoxon rank-sum test) and a similar nonsignificant trend was observed for putative pyramidal neurons (Fig. 6I; upper layers, −0.08 ± 0.07, n = 15; lower layers, −0.21 ± 0.05, n = 53, p = 0.08, Wilcoxon rank-sum test). These results were furthermore confirmed by computing the correlation coefficient between each neuron's recording depth and its MI value, which was significant for all neurons combined (r = −0.28, p < 0.01, Spearman's rank correlation) as well as for putative interneurons (r = −0.38, p < 0.01), but not for putative pyramidal neurons (r = −0.16, p = 0.20). 
Notably, MI values in upper layer neurons did not differ significantly from zero (all neurons, p = 0.31; pyramidal neurons, p = 0.25; interneurons, p = 0.31; Wilcoxon sign-rank test), although significant attenuation was observed in MUA from upper layers (−0.06 ± 0.02, n = 73, p < 0.0001). These results suggest that attenuated responses to self-generated sounds are largely confined to neurons in the lower layers of the auditory cortex.
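The depth-based grouping used above can be illustrated with a short sketch that assigns each neuron to upper or lower layers and averages the MI within each group. The assignment of a neuron at exactly 500 μm to the lower group, the exclusion of depths outside 0–1000 μm, and the example values are assumptions for illustration only.

```python
def layer_group(depth_um):
    """Map recording depth (um below the surface) to 'upper' or 'lower'."""
    if 0 <= depth_um < 500:
        return "upper"   # approximately layers 1-4
    if 500 <= depth_um <= 1000:
        return "lower"   # approximately layers 5-6
    return None          # outside the analysed depth range


def mean_mi_by_layer(neurons):
    """neurons: list of (depth_um, mi). Return the mean MI per layer group."""
    grouped = {}
    for depth_um, mi in neurons:
        group = layer_group(depth_um)
        if group is not None:
            grouped.setdefault(group, []).append(mi)
    return {group: sum(v) / len(v) for group, v in grouped.items()}


# Made-up depths and MI values for five hypothetical neurons.
neurons = [(150, -0.02), (420, 0.04), (650, -0.20), (900, -0.16), (1200, -0.30)]
means = mean_mi_by_layer(neurons)
print(means)
```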
Local cortical contribution to the attenuation of self-generated sounds
Auditory information passes through several stations along the auditory pathway before reaching the cortex. It is therefore conceivable that the attenuated responses to self-generated sounds we observed in the auditory cortex reflect attenuation in upstream structures. To examine whether attenuation occurs locally within the cortex, we asked whether responses to stimuli that directly activate auditory cortical neurons are attenuated when they are generated by the animal's behavior. To address this question, we replaced the auditory stimuli in our head-fixed paradigm with optogenetic stimulation of auditory thalamocortical terminals, in effect bypassing upstream structures along the auditory pathway (Fig. 7A). To this end, neurons in the auditory thalamus were transfected with a viral construct coding for channelrhodopsin-2 (ChR2), resulting in ChR2 expression in the thalamus as well as in thalamocortical terminals in the auditory cortex (Fig. 7B,C). Laser pulses delivered to the auditory cortex elicited brief short-latency (5–10 ms) excitatory responses in auditory cortical neurons (Fig. 7D). Notably, these responses were smaller when the laser pulses were triggered by the animals' lever presses (self-generated) compared with when they were randomly generated (Fig. 7E–G; MI: −0.22 ± 0.10, p < 0.05, n = 17 neurons). Because the responses were caused by direct activation of cortical neurons, these results suggest that circuits within the auditory cortex are capable of attenuating responses to self-generated stimuli. We also examined sound-evoked responses upstream of the auditory cortex, in the auditory thalamus in head-fixed animals. Thalamic neurons did show significantly smaller responses to self-generated sounds (Fig. 7H,I; MI, −0.04 ± 0.02, p < 0.05, n = 111 cells); however, the magnitude of this effect was relatively modest and differed significantly from what we observed for single units in the auditory cortex (Fig. 7J; p < 0.001 compared with data in Fig. 3G). A small increase in baseline firing rates preceding self-generated sounds was also observed in thalamic neurons (random, 4.71 ± 0.55 Hz; self-generated, 5.09 ± 0.58 Hz, n = 111, p < 0.05). We also examined multiunit responses in the thalamus of freely behaving mice and observed comparable responses to random and self-generated sounds (MI, 0.01 ± 0.04, p = 0.84, n = 13 recording sessions from three animals).
Responses to self-generated sounds are more strongly attenuated in the hippocampus
Although responses to self-generated sounds were consistently reduced in the auditory cortex, they were not suppressed entirely. This led us to examine whether responses to self-generated sounds might be attenuated more strongly in regions downstream of the auditory cortex that are involved in cognitive processing. To this end, we used multisite silicon probes to record simultaneously from the auditory cortex and the immediately adjacent hippocampus in a subset of our head-fixed recording sessions (Fig. 8A,B). Auditory stimuli evoked robust responses in the hippocampus that had a longer latency than responses recorded in the auditory cortex (onset latencies of grand-averaged single-unit responses: 19.2 vs 8.7 ms). Strikingly, responses to self-generated stimuli were strongly attenuated in the hippocampus, often resulting in near complete suppression of auditory responses (Fig. 8C–H; SUA, −0.77 ± 0.06, n = 16 neurons, p < 0.001; LFP, −0.78 ± 0.01, n = 63 sites, p < 0.0001). The magnitude of this effect was much greater than what we found previously in the auditory cortex (Fig. 8E,H; SUA, p < 0.0001 compared with data in Fig. 3D; LFP, p < 0.0001 compared with data in Fig. 3F, Wilcoxon rank-sum test) and was also greater than what was seen in simultaneously recorded LFP responses in the auditory cortex (MI, −0.29 ± 0.01, n = 92, p < 0.0001, Wilcoxon rank-sum test). Similar results were obtained from LFP recordings in freely moving mice (hippocampus, −0.65 ± 0.09, n = 4; auditory cortex, −0.13 ± 0.01, n = 159, p < 0.001, Wilcoxon rank-sum test). Together, these results suggest that responses to self-generated sounds are more strongly attenuated in areas that support cognitive processes, such as memory and decision-making.
Attenuation of self-generated sounds is not caused simply by differences in behavioral state
What could be responsible for the attenuated responses to self-generated sounds? Recent studies have shown that neurons in the rodent auditory cortex are less responsive to sounds when animals are behaviorally active (e.g., during locomotion or grooming) compared with when they are quiescent (Schneider et al., 2014; Zhou et al., 2014; McGinley et al., 2015a). Because self-generated sounds occur during periods of behavioral activity (i.e., lever-pressing), it could be argued that responses to these stimuli are smaller for this reason alone. To minimize this possibility, we compared self-generated stimuli only to randomly generated stimuli that occurred when the animals were engaged in the lever-pressing task (Fig. 1A,B). To further investigate how the overall behavioral state of the animals might influence our results, we also examined how movement on its own affects neural responses in the auditory cortex. To do this, we examined in freely moving animals responses to randomly generated sounds that occurred during different levels of behavioral activity, which we quantified by measuring animals' movement speed 500 ms before stimulus onset (see Materials and Methods). Speed value distributions from individual sessions typically revealed a prominent peak between 0 and 1.5 cm/s, representing periods of behavioral inactivity (Fig. 9A). We therefore defined epochs where movement speed was above and below 1.5 cm/s as representing active and quiescent periods, respectively.
Consistent with previous reports (Schneider et al., 2014; Zhou et al., 2014), we found that responses to random sounds were smaller during active periods (Fig. 9B,C; LFP peak amplitude; n = 80 sites, p < 0.0001). This effect was quantified by computing an MI as follows: (ResponseActive − ResponseQuiescent)/(ResponseActive + ResponseQuiescent) (Fig. 9D). To further examine whether the magnitude of this effect depends on the level of behavioral activity, we separately examined responses to random sounds occurring during periods of low (1.5–5.6 cm/s), medium (5.6–10 cm/s), or high (>10 cm/s) movement speed and computed the MI for each speed range relative to the quiescent periods (Fig. 9E). Auditory responses were smaller even during periods of low movement speed (MI, −0.14 ± 0.01, n = 80, p < 0.0001), but they were smaller still during periods of intermediate speed (MI, −0.20 ± 0.02, n = 80, p < 0.0001 compared with low movement speed), whereas high movement speed was not associated with a further decrease in responsiveness (MI, −0.22 ± 0.02, n = 80, p = 0.08 compared with intermediate movement speed). These results suggest that the effect of behavioral activity on responsiveness to sounds is dependent on the vigor of movement. Importantly, responses to random sounds selected for comparison with self-generated stimuli (Fig. 1B) were also smaller compared with responses during quiescence (MI, −0.27 ± 0.02, n = 80, p < 0.0001), confirming that the former occurred during periods of behavioral activity. Furthermore, responses to self-generated sounds were not only smaller than to the random sounds selected for analysis, as already demonstrated (Fig. 2I), but also to the subset of random sounds occurring at higher movement speeds than self-generated sounds (MI, −0.39 ± 0.03 vs −0.22 ± 0.02, n = 80, p < 0.0001).
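The speed-range comparison above can be sketched by binning random-sound trials by prestimulus speed and computing each bin's MI relative to quiescence. The bin edges come from the text, but how values falling exactly on an edge are assigned is an assumption here (they go to the slower bin), as are the function names and example responses.

```python
def speed_bin(speed_cm_s):
    """Assign a prestimulus speed (cm/s) to one of the ranges in the text."""
    if speed_cm_s < 0:
        raise ValueError("speed must be non-negative")
    if speed_cm_s <= 1.5:
        return "quiescent"
    if speed_cm_s <= 5.6:
        return "low"
    if speed_cm_s <= 10.0:
        return "medium"
    return "high"


def mi_by_speed_bin(trials):
    """trials: list of (speed_cm_s, response). MI of each bin vs quiescence."""
    grouped = {}
    for speed, response in trials:
        grouped.setdefault(speed_bin(speed), []).append(response)
    means = {name: sum(v) / len(v) for name, v in grouped.items()}
    quiescent = means["quiescent"]  # raises KeyError if no quiescent trials
    return {name: (m - quiescent) / (m + quiescent)
            for name, m in means.items() if name != "quiescent"}


# Made-up trials: (prestimulus speed in cm/s, evoked response amplitude).
trials = [(0.5, 100.0), (1.0, 100.0), (3.0, 80.0), (7.0, 60.0), (12.0, 60.0)]
result = mi_by_speed_bin(trials)
print(result)
```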
To examine the influence of behavioral state on auditory responsiveness during head-fixed recordings, we compared single-unit responses to self-generated sounds and random sounds that by chance occurred shortly before (<500 ms) the animals pressed the lever. During this time period, animals often displayed behavioral activity, including licking in anticipation of reward delivery. Random stimuli delivered in this period therefore occurred while the animals were in a behavioral state very similar to the one in which self-generated sounds occurred, without being caused by the animal. Notably, responses to self-generated sounds were smaller than to this subset of randomly generated sounds (Fig. 9F; random before lever, 9.59 ± 0.82 Hz; self-generated, 7.05 ± 0.65 Hz, n = 83 neurons, p < 0.0001, Wilcoxon sign-rank test), and the latter were also similar to random sounds that did not occur close to a lever press (10.42 ± 0.88 Hz, n = 83, p = 0.10). The attenuation of responses to self-generated sounds was of similar magnitude whether it was calculated relative to one subset of random sounds or the other (MI relative to random sounds, −0.17 ± 0.04; MI relative to random sounds before lever press, −0.15 ± 0.04, n = 83, p = 0.19). Together, these results suggest that the attenuated responses to self-generated sounds do not simply reflect the influence of behavioral activity or behavioral state on auditory responsiveness.
Stronger attenuation of responses to expected than unexpected self-generated sounds
An alternative explanation for our results is that responses to self-generated sounds are smaller because they are caused by the animals' behavior and therefore are expected to occur. Indeed, it has long been proposed that responses to self-generated stimuli are attenuated by a “corollary discharge” signal representing the expected sensory consequences of the organism's actions (Crapse and Sommer, 2008; Schneider and Mooney, 2015; Schröger et al., 2015). If responses to self-generated sounds are attenuated because they match the sensory expectations derived from the animal's behavior, attenuation should be reduced when self-generated stimuli violate those expectations. We therefore asked what would happen if pressing the lever would trigger a different and hence “unexpected” sound on a subset of trials. To this end, we modified our head-fixed paradigm such that pressing the lever triggered a pure tone of a particular frequency (the “expected” tone) on 75% of trials and a pure tone of a different frequency (the “unexpected tone”) on the remaining 25% of trials. The same two tones were also presented randomly with the same probabilities during the experimental sessions (Fig. 10A). MIs were calculated separately for expected and unexpected self-generated sounds by comparing them to expected and unexpected random sounds, respectively (see Materials and Methods).
While animals experienced expected and unexpected self-generated sounds, we recorded the activity of auditory cortical neurons and classified them as putative pyramidal neurons or interneurons using the same criteria as before (Fig. 6A). Consistent with our previous results, we observed attenuation of responses to “expected” self-generated sounds in putative pyramidal neurons (Fig. 10B; −0.19 ± 0.03, n = 135, p < 0.0001) as well as in interneurons (Fig. 10C; −0.16 ± 0.04, n = 60, p < 0.001). Notably, however, putative pyramidal neurons showed weaker attenuation of responses to self-generated but “unexpected” sounds (Fig. 10B; −0.09 ± 0.03, n = 135, p < 0.001, compared with expected sounds), whereas putative inhibitory neurons showed comparable levels of attenuation to both (Fig. 10C; p = 0.96). In principle, the smaller attenuation of responses to self-generated unexpected sounds in pyramidal neurons could reflect larger responses to these sounds or smaller responses to unexpected random sounds, or both. To distinguish between these possibilities, we computed an MI as follows comparing responses to expected and unexpected sounds: MI = (ResponseUnexpected − ResponseExpected)/(ResponseUnexpected + ResponseExpected). This was computed separately for random and self-generated sounds (MIRandom and MISelf-generated; see Materials and Methods). In pyramidal neurons, MISelf-generated was significantly positive (Fig. 10D; 0.11 ± 0.03, n = 135 neurons, p < 0.001), indicating larger responses to self-generated unexpected sounds whereas MIRandom did not differ significantly from zero (Fig. 10D; 0.00 ± 0.03, n = 135 neurons, p = 0.92) and was smaller than MISelf-generated (p < 0.001). In contrast, interneurons displayed larger responses to both unexpected random sounds (Fig. 10E; MIRandom: 0.11 ± 0.03, n = 60 neurons, p < 0.001) and unexpected self-generated sounds (MISelf-generated: 0.11 ± 0.05, n = 60, p = 0.02).
The magnitude of this effect was the same for random and self-generated sounds (MIRandom vs MISelf-generated, p = 0.79), suggesting a nonspecific increase in responses to unexpected sounds. Importantly, the larger responses of putative pyramidal neurons to unexpected self-generated sounds are not simply due to the fact that they are less likely to occur per se, since the random expected and unexpected sounds were delivered with the same probabilities. Rather, these results suggest that the attenuation of self-generated sounds depends on the probability that they will be caused by the animal's behavior and furthermore that this attenuation is stimulus-specific. Note that these results also control for the influence of behavioral state, which should be identical preceding either expected or unexpected self-generated sounds.
Discussion
Previous studies in human subjects have consistently observed attenuated neural responses to self-generated sounds that are triggered manually (Schafer and Marcus, 1973; Martikainen et al., 2005; Baess et al., 2011). However, because these studies relied on macroscopic measurements of brain activity, it has remained unclear how this attenuation might manifest itself in the responses of individual neurons. To address this, we developed an experimental paradigm similar to what has been used in human studies, in which mice generated sounds by pressing a lever. In both freely moving and head-fixed mice, self-generated sounds evoked smaller responses in auditory cortical neurons than the same sounds presented independently of the animal's behavior. This effect was seen both in the population average as well as in the responses of individual neurons; of the neurons showing significantly different responses to self-generated and random sounds, almost all showed smaller responses to self-generated sounds.
Although our study is the first to examine cellular responses to manually self-generated sounds, previous studies in primates have found that the majority of neurons in the auditory cortex strongly decrease their spontaneous firing rates shortly before and during vocalizations and do not show an evoked response to the vocalizations themselves (Eliades and Wang, 2003, 2008). In contrast, neurons in our study displayed slightly increased baseline firing rates preceding self-generated sounds and in most cases showed clear evoked responses that were nonetheless diminished in amplitude. The more specific attenuation of evoked responses in our study may reflect differential processing of self-generated sounds, depending on whether they are generated by vocal or nonvocal means or whether they are caused by behaviors that are learned (i.e., lever-pressing) rather than innate (i.e., vocalizations); also, species differences cannot be ruled out. Additional experiments that, for example, directly compare responses to vocal and nonvocal self-generated sounds, will be needed to further examine these issues. It should also be noted that the strain of mice we used (C57BL/6) displays accelerated age-related hearing loss starting around the age at which our experiments began. Although this did not prevent us from observing robust responses to auditory stimuli (which were presented well above hearing thresholds), alterations in the cortical representation of sound have nevertheless been reported at this age (Willott et al., 1993) and may have influenced our results.
Previous studies have shown that evoked responses in the auditory thalamus are reduced during active behavior (McGinley et al., 2015b; Williamson et al., 2015; but see Otazu et al., 2009; Zhou et al., 2014). We observed in our study that auditory thalamic neurons showed mild attenuation of responses to self-generated sounds. Although the magnitude of this effect was much weaker than what we observed in cortical neurons, it is nonetheless possible that it could be amplified and lead to larger attenuation in the cortex. However, if the attenuation in cortical neurons were simply inherited from the thalamus, it should have been observed throughout the cortex, whereas we found that it was largely restricted to the lower cortical layers. Furthermore, we found that responses of cortical neurons to self-generated optogenetic stimulation of thalamocortical terminals were attenuated. Together, these findings suggest that intracortical mechanisms contribute to the attenuation of responses to self-generated sounds. We also observed slightly increased spontaneous firing rates in the thalamus before self-generated sounds. Such increases have been observed in previous studies during active behavioral states (Otazu et al., 2009; McGinley et al., 2015a) and have been suggested to cause reduced responsiveness of cortical neurons by depressing thalamocortical synapses (Otazu et al., 2009). However, if this were the case in our study, it would also have led to reduced responses to random sounds occurring immediately before a lever press, which we did not observe.
We compared responses of putative pyramidal neurons and interneurons to self-generated sounds and found that they were attenuated to a comparable extent. However, because our findings apply primarily to the subtype of interneurons with a narrow spike waveform, the responses of other interneuron classes to self-generated sounds will need to be examined. We also found that responses were robustly attenuated in lower cortical layers, whereas only weak attenuation was seen in upper layers. The lower layers (defined as 500–1000 μm below the brain surface, primarily comprising layers 5–6) receive strong top-down input from other cortical areas (Sakata and Harris, 2009), including the motor cortex (Nelson et al., 2013), which may contribute to the attenuated responses we observed (see below). The lower layers also project strongly to the striatum, brainstem, and spinal cord; the stronger attenuation in these layers may therefore limit the influence of self-generated sounds over behavioral output. Interestingly, we observed that in the hippocampus, responses were almost entirely suppressed to self-generated sounds, suggesting that these stimuli are strongly filtered once they reach areas involved in higher-order functions, such as memory and decision-making.
Recent studies have shown that the responsiveness of auditory cortical neurons to externally generated sounds is reduced during movement and behavioral activity (Schneider et al., 2014; Zhou et al., 2014; McGinley et al., 2015a,b). We were able to replicate this finding by comparing responses to the randomly generated stimuli in our experiments during periods of quiescence and movement. However, we also show that behavioral activity in itself cannot account for the attenuated responses to self-generated sounds. First, we compared these sounds to random sounds occurring while animals were actively engaged in the lever-pressing task. This suggests that our effects are distinct from the attenuation of cortical auditory-evoked responses seen during general task engagement (Otazu et al., 2009). Second, responses to self-generated sounds were weaker than to randomly generated sounds that occurred during greater levels of behavioral activity (higher movement speed; Fig. 9). Finally, responses to self-generated sounds were attenuated to the same extent when compared with random stimuli that occurred immediately preceding (<500 ms before) lever presses, when animals were in a similar behavioral state as during self-generated sounds. It is also worth noting that we did not observe a decrease in the spontaneous firing rates of auditory cortical neurons preceding self-generated sounds, which previous studies have found distinguish behaviorally active from quiescent states (Schneider et al., 2014; Zhou et al., 2014). We therefore conclude that the attenuation of responses to self-generated sounds is an effect that is distinct from the more general reduction in sensory responsiveness caused by active behavioral states.
It has long been suggested that the processing of self-generated stimuli relies on “corollary discharge” signals that represent the expected sensory consequences of the organism's behavior (Crapse and Sommer, 2008; Schneider and Mooney, 2015; Schröger et al., 2015). According to this view, responses to self-generated stimuli are attenuated because they match such sensory predictions. Consistent with this idea, we found that responses of putative pyramidal neurons were less attenuated when self-generated sounds were “unexpected” (occurred with a lower probability) than when they were “expected” (occurred with a higher probability). Numerous studies have demonstrated that auditory cortical neurons in animals and electroencephalographic signals in humans show larger responses to low-probability sounds presented passively (Ulanovsky et al., 2003; Bendixen et al., 2012). This phenomenon, however, cannot explain our results because the same expected and unexpected sounds did not elicit different responses in pyramidal neurons when presented randomly (Fig. 10). This suggests that the stronger attenuation of responses to self-generated “expected” sounds is the result of stimulus-specific sensory predictions derived from the animals' behavior. These results are in agreement with those of previous studies describing smaller attenuation of neural responses to unexpected self-generated sounds in human subjects (Knolle et al., 2013) as well as to pitch-shifted vocalizations in both human subjects (Heinks-Maldonado et al., 2005) and nonhuman primates (Eliades and Wang, 2008). Collectively, these results and ours are consistent with a larger body of research showing that the auditory system uses predictive mechanisms to process incoming sensory information (Schröger et al., 2015).
The neural circuits mediating the attenuation of self-generated sounds remain to be elucidated. It has been proposed that “efference copies” of motor signals are sent to sensory regions to modulate sensory input caused by the organism's behavior (Crapse and Sommer, 2008). Consistent with this possibility, the auditory cortex receives direct projections from the motor cortex in the mouse (Nelson et al., 2013). These projections contribute to the decrease in responsiveness of auditory cortical neurons during movement (Schneider et al., 2014) and may also underlie the more specific attenuation of self-generated sounds we observed. Local circuit mechanisms within the auditory cortex are also likely to be important, in particular for selectively attenuating responses to specific sounds, as we have shown (Fig. 10). Although synaptic inhibition is a plausible circuit mechanism for response attenuation, our results do not support a major role for narrow-spiking interneurons, which comprise a large fraction of the interneuron population. However, recent studies have revealed that different interneuron subpopulations play diverse roles in modulating auditory responsiveness, often in a stimulus-specific manner (Kato et al., 2015; Natan et al., 2015). A more detailed characterization of how these different subpopulations respond to self-generated sounds will therefore be required in future studies.
Understanding how self-generated stimuli are processed under normal conditions may also have implications for the study of neuropsychiatric disorders. Individuals suffering from schizophrenia show reduced attenuation of responses to their own speech sounds (Ford et al., 2001; Perez et al., 2012) and to manually generated sounds (Ford et al., 2014). Deficits in processing self-generated somatosensory (Shergill et al., 2005) and visual (Thakkar et al., 2015) stimuli have also been observed in schizophrenia patients. These deficits have been hypothesized to underlie the hallucinations and delusions that are characteristic of schizophrenia and may provide important insights into the pathophysiology of the disease (Heinks-Maldonado et al., 2007; Fletcher and Frith, 2009). Importantly, our results establish an experimental paradigm, closely resembling studies in human subjects, with which deficits in processing self-generated sounds could be studied in animal models of schizophrenia, thus allowing the underlying neural circuit impairments to be investigated in greater detail (Sigurdsson, 2016).
Footnotes
This work was supported by LOEWE (Landes-Offensive zur Entwicklung wissenschaftlich-ökonomischer Excellenz) Grant Neuronale Koordination Forschungsschwerpunkt Frankfurt (NeFF) and the Deutsche Forschungsgemeinschaft (Grant SI 1942/2-1). The authors thank members of the Sigurdsson and Roeper laboratories for helpful discussion; Sevil Duvarci for assistance with illustrations and comments on the manuscript; and Bea Kern, Jasmine Salmen, and Thomas Wulf for histological and technical assistance. T.S. thanks Joshua Gordon for his generous support during the initial stages of the project and Clay Lacefield for help designing the behavioral apparatus.
The authors declare no competing financial interests.
Correspondence should be addressed to Torfi Sigurdsson, Institute of Neurophysiology, Neuroscience Center, Goethe University, Frankfurt 60590, Germany. sigurdsson{at}em.uni-frankfurt.de