Abstract
According to the predictive processing framework, perception emerges from the reciprocal exchange of predictions and prediction errors (PEs) between hierarchically organized neural circuits. The nonlemniscal division of the inferior colliculus (IC) is the earliest source of auditory PE signals, but their neuronal generators, properties, and functional relevance have remained mostly undefined. We recorded single-unit mismatch responses to auditory oddball stimulation at different intensities, together with activity evoked by two sequences of alternating tones to control frequency-specific effects. Our results reveal a differential treatment of the unpredictable “many-standards” control and the predictable “cascade” control by lemniscal and nonlemniscal IC neurons that is not present in the auditory thalamus or cortex. Furthermore, we found that frequency response areas of nonlemniscal IC neurons reflect their role in subcortical predictive processing, distinguishing three hierarchical levels: (1) nonlemniscal neurons with sharply tuned receptive fields exhibit mild repetition suppression without signaling PEs, thereby constituting the input level of the local predictive processing circuitry. (2) Neurons with broadly tuned receptive fields form the main, “spectral” PE signaling system, which provides dynamic gain compensation to near-threshold unexpected sounds. This early enhancement of saliency reliant on spectral features was not observed in the auditory thalamus or cortex. (3) Untuned neurons form an accessory, “nonspectral” PE signaling system, which reports all surprising auditory deviances in a robust and consistent manner, resembling nonlemniscal neurons in the auditory cortex. These nonlemniscal IC neurons show unstructured and unstable receptive fields that could result from inhibitory input controlled by corticofugal projections conveying top-down predictions.
- deviance detection
- linear model
- predictive processing
- repetition suppression
- SSA/stimulus-specific adaptation
- single units
- subcortical
Significance Statement
Frequency response areas of nonlemniscal neurons in the inferior colliculus (IC) correlate with certain predictive processing traits, distinguishing two prediction error systems. The “spectral” system comprises neurons with broadly tuned receptive fields and is exclusively at play in the nonlemniscal IC. It generates prediction errors only to low-intensity deviant sounds and is biased toward ascending changes in frequency. Hence, it provides early gain compensation to near-threshold but informative acoustic events, thereby facilitating efficient auditory discrimination under challenging conditions. The “nonspectral” system reports on unexpected auditory events, uninfluenced by the acoustic spectral characteristics of deviant sounds. This accessory system comprises untuned neurons with disorganized and dynamic receptive fields, which conceivably receive top-down predictions from higher-order levels of the auditory hierarchy.
Introduction
Probability-biased processing of auditory input first appears at the inferior colliculus (IC; for review, see Malmierca et al., 2019). During an auditory oddball paradigm (Fig. 1A), certain IC neurons reduce discharges to standard (STD) tones while responding vigorously to deviant (DEV) conditions (Malmierca et al., 2009). The resulting neuronal mismatch response has been interpreted as early predictive activity within the auditory pathway (for reviews, see Carbajal and Malmierca, 2018, 2020; Tabas and von Kriegstein, 2021). According to the predictive processing framework (Friston, 2003, 2009, 2010; Hohwy, 2013), high-order neural networks develop predictions to inhibit input from lower levels when it fits into the current predictive model, while lower-order levels signal back prediction errors (PE) whenever expectations are not met. Although hierarchical predictive processing was originally conceived for cortical networks (Rao and Ballard, 1999; Friston, 2005; Friston and Kiebel, 2009a; Bastos et al., 2012; Keller and Mrsic-Flogel, 2018), empirical testing in the auditory system is expanding the framework into subcortical territory, as early as the IC (Parras et al., 2017; Tabas et al., 2020; Valdés-Baizabal et al., 2020; Lesicko et al., 2022; Tabas and von Kriegstein, 2024).
Figure 1. Experimental design. A, Three possible experimental conditions within an oddball paradigm for a given tone of interest.
The IC constitutes a crucial relay station of the ascending auditory pathway that gives rise to two parallel lines of processing (Cant and Oliver, 2018; Liu et al., 2024). The lemniscal pathway that emerges from the central nucleus presents a sharp tonotopy with minimal cortical input, whereas the nonlemniscal pathway that originates from the IC cortices shows a blurred tonotopy under heavy cortical input (Schreiner and Langner, 1997; Malmierca, 2015). In a previous study (Parras et al., 2017), we performed extracellular recordings in both divisions of the IC (Fig. 1C), medial geniculate body (MGB) of the thalamus, and auditory cortex (AC), while controlling frequency-specific adaptation induced during the oddball paradigm (Fig. 1A) by means of two control sequences of alternating tones without consecutive repetitions: cascade (CAS) and many-standards control (MSC; Fig. 1B; Ruhnau et al., 2012). Using control-evoked activity as a benchmark, each neuronal mismatch response was dissociated into two components: (1) repetition suppression (Garrido et al., 2009a; Auksztulewicz and Friston, 2016; Grotheer and Kovács, 2016), indexed as attenuation of the STD-evoked response that could be accounted for by frequency-specific effects (Mill et al., 2011; Taaseh et al., 2011), and (2) PE, indexed as an enhancement of the DEV-evoked response due to predictive processing activity (Fig. 1D). This study revealed the IC cortices as the earliest source of auditory PE signals, which amplified as they ascended through the nonlemniscal pathway into the AC in a hierarchical distribution (Parras et al., 2017). Combining this method (Fig. 1) with microiontophoresis unveiled that this early flow of PE signals is modulated by dopaminergic input (Valdés-Baizabal et al., 2020). A recent study using optogenetic inactivation of the auditory corticocollicular feedback in awake mice revealed the critical role of top-down input in subcortical PE signaling (Lesicko et al., 2022).
Neuroimaging data from humans also reflect predictive activity at midbrain level (Slabu et al., 2012; Cacciaglia et al., 2015; Tabas et al., 2020; Tabas and von Kriegstein, 2024). However, an in-depth analysis of neural generators and functional properties of early auditory PEs is still lacking.
The present study aims to characterize the unique traits of PE signaling in the IC, as compared with that in the MGB and AC, by correlating it with sound intensity, direction of frequency changes, and neuronal receptive fields. Our results reveal a dual nature of predictive processing in the IC cortices that is not present in higher regions of the auditory pathway. PE signaling in nonlemniscal IC neurons with broad receptive fields is fundamentally dependent on certain spectral features, whereas untuned neurons with unstructured and dynamic receptive fields disregard acoustic aspects when reporting unexpected input. For descriptive purposes, we will refer to the distinct predictive activity of these two neural populations, respectively, as the “spectral” and “nonspectral” PE systems of the IC cortices.
Materials and Methods
Ethics statement
All surgical procedures were approved by the Bioethics Committee for Animal Care of the University of Salamanca (USAL-ID-195 and USAL-ID-574) and performed in compliance with the standards of the European Convention ETS 123, the European Union Directive 2010/63/EU, and the Spanish Royal Decree 53/2013 for the use of animals in scientific research.
Surgical procedures
The surgical and recording procedures were as described previously (Nieto-Diego and Malmierca, 2016; Parras et al., 2017; Valdés-Baizabal et al., 2020). Surgical anesthesia was first induced with (1) a mixture of ketamine (100 mg/kg) and xylazine (20 mg/kg) injected intramuscularly and then maintained with urethane (0.8 g/kg) injected intraperitoneally or (2) directly induced with urethane (1.5 g/kg, i.p.). To ensure a stable deep anesthetic level, we administered supplementary doses of urethane (∼0.5 g/kg) intraperitoneally when the rat recovered pedal withdrawal reflexes. Urethane was chosen over other anesthetic agents because it preserves normal neural activity better, having a modest, balanced effect on inhibitory and excitatory synapses (Maggi and Meli, 1986; Hara and Harris, 2002; Sceniak and MacIver, 2006; Duque et al., 2016).
Normal hearing of rats was verified by recording the auditory brainstem responses subcutaneously with needle electrodes. An RZ6 Multi I/O Processor (Tucker-Davis Technologies) was used to acquire the auditory brainstem response, which was processed with BioSig software (Tucker-Davis Technologies) before beginning each experiment. Stimuli used to elicit the auditory brainstem response consisted of 0.1 ms clicks at a rate of 21 clicks/s, delivered monaurally to the right ear in 10 dB steps, from 10 to 90 decibels of sound pressure level (dB SPL), in a closed system through a Beyer DT-770 earphone (0.1–45 kHz) fitted with a custom-made cone and coupled to a small tube (12 G) sealed in the ear.
Once normal hearing had been confirmed, a cannula was installed in the trachea to provide artificial ventilation to the rat with monitored expiratory CO2, given that urethane depresses respiratory function. Likewise, to maintain body temperature at 37 ± 1°C, a rectal thermometer was inserted and rats were placed on a homeothermic blanket system (Cibertec). The head was stabilized in a stereotaxic frame with a bite bar and two hollow specula replacing the ear bars. A sound delivery system was accommodated within the right hollow speculum. Eyes were protected with a drop of ophthalmic gel. To prevent an excess of bronchial secretions, 0.1 mg/kg of atropine sulfate was administered subcutaneously. To ameliorate brain edema, we injected 0.25 mg/kg of dexamethasone intramuscularly. To prevent dehydration, we administered 5 ml of glucosaline solution subcutaneously. The scalp was shaved, and the revealed skin was disinfected with povidone-iodine. Using a scalpel, an incision was opened along the midline to expose the skull, and the periosteum covering the parietal and the most rostral part of the occipital bones was retracted. Using a dental drill, a round craniotomy was performed in the caudal part of the left parietal bone, just rostral to the lambdoid suture and lateral to the sagittal suture, thereby exposing the cerebral cortex overlying the left IC. The exposed dura was removed, and the tissue beneath it was covered with 2% agar to prevent desiccation during the recording session.
Data acquisition procedures
The dataset used in this study is a compilation of extracellular single-unit recordings performed throughout a period of 7 years following the same experimental design (Fig. 1). Most of the neurons from the MGB and the AC have featured in previous studies (Parras et al., 2017, 2021; Pérez-González et al., 2021). In the present study, they are combined and reanalyzed for the purpose of comparing their population trends with those of the IC, which is the focus of the present study. Regarding the IC, 125 neurons in the current dataset have featured in two preceding studies (Parras et al., 2017; Valdés-Baizabal et al., 2020), while the remaining 190 units of the sample (60%) were specifically recorded for this purpose. We will describe the acquisition procedures for these IC data in the following, while the methods followed for the MGB and AC data are detailed previously (Parras et al., 2017).
Tungsten microelectrodes used to record the extracellular activity of IC neurons were crafted with a tip impedance of 1.5–3.5 MΩ at 1 kHz following the protocol of Merrill and Ainsworth (1972). Experiments were performed inside a sound-insulated and electrically shielded chamber. The microelectrode was mounted on a holder over the exposed cortex, forming an angle of 20° perpendicularly rostral to the coronal plane. Using a piezoelectric micromanipulator (Sensapex), the electrode was inserted into the brain while measuring the penetration depth until strong spiking activity synchronized with the train of searching stimuli could be identified. All sound stimuli were generated using the RZ6 Multi I/O Processor (Tucker-Davis Technologies) and custom software programmed with OpenEx Suite (Tucker-Davis Technologies) and MATLAB (MathWorks). In search of evoked auditory neuronal responses from the IC, white noise bursts and sinusoidal pure tones of 75 ms duration with 5 ms rise–fall ramps were presented, varying stimulus parameters manually to prevent frequency-specific adaptation.
Once the activity of a single neuron was clearly isolated, only pure tones were used to record the experimental stimulation protocols, which ran at four stimuli per second. Stimuli were delivered monaurally to the ear contralateral to the left IC through a close-field speaker. We calibrated the speaker using a 0.25-inch condenser microphone (model 4136, Brüel & Kjær) and a dynamic signal analyzer (Photon+, Brüel & Kjær) to ensure a flat spectrum up to 76 ± 3 dB SPL between 0.5 and 45 kHz and that the second and third signal harmonics were at least 40 dB lower than the fundamental at the loudest output level.
Analog signals were digitized with an RZ6 Multi I/O Processor, an RA16PA Medusa Preamplifier, and a ZC16 headstage (Tucker-Davis Technologies) at 12 kHz sampling rate and amplified 251×. Neurophysiological signals of spiking activity were bandpass filtered between 0.5 and 3 kHz using a second-order Butterworth filter. Stimulus generation, neuronal response processing, and visualization were controlled online with custom software created with the OpenEx suite (Tucker-Davis Technologies) and MATLAB. A unilateral threshold for automatic action potential detection was manually set at ∼2–3 standard deviations of the background noise. Spike waveforms were displayed on the screen and overlapped on each other in a pile plot to facilitate isolation of single unit activity. The recorded action potentials were considered to belong to a single unit only when all spike waveforms were akin, clearly separable from other smaller units and the background noise, and the spike amplitude-to-noise ratio of the average waveform was larger than 5.
Experimental design
A map of response magnitude for each frequency–intensity combination was first computed, representing the receptive field of the single unit in a frequency response area (FRA; Figs. 4A, 5A). The stimulation protocol to obtain the FRA consisted of a sequence of sinusoidal pure tones ranging between 0.7 and 44 kHz, 75 ms of duration with 5 ms rise–fall ramps, presented at a 4 Hz rate, randomly varying frequency and intensity (3–5 repetitions of all tones).
Then, 10 frequencies separated by 0.5 octave steps at a fixed sound intensity (usually 10–20 dB above minimal response threshold) were selected so that at least two consecutive tones fell within the excitatory region of the FRA. These 10 tones were used to generate the control sequences (Fig. 1B), and adjacent pairs within the excitatory region of the FRA were used to create oddball sequences (Fig. 1A). All sequences were 400 tones in length, presented at a constant rate of 4 Hz and at a constant intensity between 10 and 70 dB SPL, in steps of 10 dB SPL.
A first complete experimental set, composed of oddball and control sequences in both their “ascending” and “descending” versions (Fig. 1A,B), was presented at a chosen intensity. Whenever it was possible to keep a neuron isolated over a long period of time, another set would be presented at a different intensity. Usually, one intensity would be <40 dB SPL (low) and the other ≥40 dB SPL (high), but this depended on the receptive field of each neuron as shown by the FRA. The presentation of a set at different intensity levels did not follow any particular order. Sometimes the stability of the signal allowed us to record multiple complete sets of sequences at several levels of intensity. However, such occurrences were relatively rare, since our recording protocol did not permit reisolation attempts. When signal quality dropped below standards, the recording was interrupted and the incomplete set discarded.
A 10% probability was used to pseudorandomly distribute tones across control sequences within chunks of 10 stimuli, as well as to scatter DEV tones across the oddball sequence. Since only the last STD tones before a DEV were considered for analysis, this method resulted in 40 trials of DEV, STD, MSC, and CAS for each given tone of interest. In the oddball paradigm, the first 10 tones were always STD, and a minimum of 3 STD preceded each DEV, which were pseudorandomly presented with a 10% probability. Oddball sequences were labeled either “ascending” or “descending”, depending on whether DEV had a higher or lower frequency than STD, respectively (Fig. 1A). DEV and STD conditions of the oddball paradigm will be used to obtain neuronal mismatch measurements.
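The oddball constraints described above (400 tones, 10% DEV, ten leading STD, at least three STD before each DEV) can be illustrated with a minimal Python sketch. It is not the stimulation software used in the study, which was written in MATLAB and OpenEx; the function names `make_oddball` and `last_std_indices` and the index-compression sampling strategy are assumptions made purely for illustration.

```python
import random


def make_oddball(n_tones=400, n_dev=40, n_lead_std=10, min_std_gap=3, seed=None):
    """Return a list of 'STD'/'DEV' labels satisfying the oddball constraints.

    Hypothetical helper: DEV positions are sampled in a compressed index space
    and then re-expanded, which guarantees at least `min_std_gap` STD tones
    between consecutive DEV tones and `n_lead_std` STD tones at the start.
    """
    rng = random.Random(seed)
    span = n_tones - n_lead_std - min_std_gap * (n_dev - 1)
    compressed = sorted(rng.sample(range(span), n_dev))
    dev_positions = [n_lead_std + q + min_std_gap * i
                     for i, q in enumerate(compressed)]
    sequence = ["STD"] * n_tones
    for position in dev_positions:
        sequence[position] = "DEV"
    return sequence


def last_std_indices(sequence):
    """Indices of the STD tone immediately preceding each DEV tone."""
    return [i - 1 for i, label in enumerate(sequence) if label == "DEV"]


if __name__ == "__main__":
    seq = make_oddball(seed=1)
    print(seq.count("DEV"), "DEV trials;", len(last_std_indices(seq)), "STD trials analyzed")
```

With the default parameters, each sequence yields 40 DEV trials and 40 matching "last STD" trials, in line with the trial counts stated in the text.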
One limitation of the neuronal mismatch measurements obtained using the oddball paradigm is that the activity related to high-order processes of PE signaling cannot be distinguished from lower-order effects such as frequency-specific adaptation (Todorovic and de Lange, 2012; Carbajal and Malmierca, 2018). The controls allow assessing the relative contribution of both higher- and lower-order processes to the overall mismatch response (Ruhnau et al., 2012). These controls of the auditory oddball paradigm are tone sequences that must meet three criteria: (1) to feature the same tone of interest with the same presentation probability as that of the DEV (i.e., 10%); (2) to induce an equivalent state of refractoriness by presenting the same rate of stimulus per second (i.e., 4 Hz); and (3) to present no recurrent repetition of any individual stimulus, especially the tone of interest (Carbajal and Malmierca, 2020).
These three presentation criteria prevent the short-term dynamics of frequency-specific adaptation in the IC while keeping the long-term dynamics at a minimum (Zhao et al., 2011). Eliminating this minimum would be impractical due to the long-lasting effects of frequency-specific adaptation, which in the rat IC have been reported at presentation rates of up to one repetition per second (Pérez-González et al., 2005; Zhao et al., 2011). Consequently, even DEV-evoked responses manifest some marginal frequency-specific adaptation (Malmierca et al., 2009), which will similarly affect tones in a control sequence when presented at equal rate and probability. Note that the objective of these criteria is not to avoid inducing frequency-specific adaptation altogether during the control sequences. Their aim is to guarantee that the response to the tone of interest is subject to similar amounts of refractoriness and adaptation in DEV and control conditions, thereby allowing comparisons to reveal the influence of higher-order processes.
Hence, we can assess the portion of the mismatch response (DEV–STD) that can be attributed to frequency-specific adaptation induced during the STD train (Ulanovsky et al., 2003; Jääskeläinen et al., 2004; Mill et al., 2011). When the auditory-evoked response is similar or higher during the control than in DEV, then the mismatch response can be fully accounted for by repetition suppression, and no higher-order process of PE signaling can be deduced (i.e., DEV ≤ control; Fig. 1D). Otherwise, a stronger response to DEV than to the control unveils a component of the mismatch response that can be better explained by PE signaling (i.e., DEV > control; Fig. 1D).
To dissociate the relative contribution of frequency-specific effects from genuine PE signaling (Fig. 1D), we generated two different controls for our oddball paradigms: MSC and CAS (Fig. 1B). MSC presents the tone of interest embedded in a random sequence of assorted tones, where each tone shares the same 10% presentation probability as DEV in the oddball paradigm (Schröger and Wolff, 1996; Jacobsen et al., 2003). However, some authors have argued that the MSC is not fully comparable with the oddball paradigm, inasmuch as the disorganized succession of tones creates a context of uncertainty that never allows high-precision predictions to be generated, whereas STD does (Ruhnau et al., 2012; Harms et al., 2014). Additionally, each tone is preceded by a different tone from trial to trial, which makes it difficult to control any possible effects of spectral processing caused by the differences in frequency steps.
CAS tries to overcome the alleged caveats of MSC by presenting tones in a regular fashion, that is, in an increasing or a decreasing frequency succession or scale. Thus, the stimulus of interest conforms to a regularity (as opposed to DEV), but not a regularity established by repetition (contrary to STD), thereby restricting possible frequency-specific effects to a minimum while making CAS a better-suited control than MSC. As an additional advantage, the tone immediately preceding the tone of interest is the same in both oddball and cascade sequences, since only versions following the same direction will be compared (i.e., DEV ascending vs CAS ascending, DEV descending vs CAS descending). This makes it possible to control for spectral sensitivity effects, which are responses to a rise or fall in frequency between two successive tones. For these reasons, CAS is usually regarded as a better control for the oddball paradigm (Ruhnau et al., 2012; Harms et al., 2014).
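The two control sequences can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' stimulus code: the helper names (`tone_set`, `cascade`, `many_standards`) are hypothetical, tones are represented by their index within the 10-tone set, and the MSC is built by shuffling chunks of 10 tones while redrawing any chunk that would repeat the previous tone.

```python
import random


def tone_set(lowest_khz, n_tones=10, step_octaves=0.5):
    """Frequencies (kHz) of n_tones separated by step_octaves, starting at lowest_khz."""
    return [lowest_khz * 2 ** (step_octaves * k) for k in range(n_tones)]


def cascade(n_tones=10, length=400, ascending=True):
    """CAS: tone indices presented as a repeating ascending or descending scale."""
    order = list(range(n_tones)) if ascending else list(range(n_tones - 1, -1, -1))
    return [order[i % n_tones] for i in range(length)]


def many_standards(n_tones=10, length=400, seed=None):
    """MSC: every chunk of n_tones is a random permutation of all tones.

    Assumption for illustration: a chunk is redrawn if its first tone equals
    the last tone of the previous chunk, so that no tone is repeated back to back.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < length:
        chunk = list(range(n_tones))
        rng.shuffle(chunk)
        if sequence and chunk[0] == sequence[-1]:
            continue  # redraw to avoid an immediate repetition
        sequence.extend(chunk)
    return sequence[:length]


if __name__ == "__main__":
    msc = many_standards(seed=0)
    cas_up = cascade(ascending=True)
    print(msc[:10], cas_up[:12])
```

In both sequences each of the 10 tones appears 40 times in 400 presentations, matching the 10% probability of DEV in the oddball sequence.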
Histology and neuroanatomical location
At the end of each experiment, electrolytic lesions were made by applying an electric current of 5 μA for 5 s through the recording electrode along the recorded tract. If still alive, animals were killed by injecting a lethal dose of pentobarbital, after which they were decapitated. Brains were immediately immersed in a mixture of 4% formaldehyde in 0.1 M PB. After fixation, the neural tissue was cryoprotected in 30% sucrose and sectioned in the coronal plane at 40 µm thickness on a freezing microtome. Slices were stained with 0.1% cresyl violet to facilitate identification of cytoarchitectural boundaries. Finally, the recorded neurons were assigned to one of the main subdivisions of the IC using the standard sections from a rat brain atlas as reference (Paxinos and Watson, 2007).
The main dataset is composed of 315 auditory neurons recorded in the IC of 51 anesthetized Long–Evans rats. Additionally, 161 neurons from the MGB of 34 rats and 181 neurons from the AC of 51 rats from our lab's database were included in this study for comparative purposes. Histological examination of the IC samples revealed that 70 neurons came from the central nucleus of the IC, the lemniscal division. The other 245 collicular neurons were in the cortical regions of the IC, the nonlemniscal division. MGB samples were composed of 58 lemniscal (ventral division) and 103 nonlemniscal neurons (dorsal and medial divisions). The AC sample included 119 neurons from the lemniscal core (primary, anterior, and ventral fields) and 62 neurons from the nonlemniscal belt (posterior and suprarhinal fields).
Data analysis and visualization
All data analyses and data visualization were performed with MATLAB software, using the built-in functions, the Statistics and Machine Learning toolbox, and custom scripts and functions developed in our laboratory.
Code accessibility
All data and code used in this study are available upon direct request to the corresponding author.
Baseline-corrected spike counts
Taking the 40 trials available for each tone (kHz) within each condition (DEV, STD, MSC, and CAS), a peristimulus time histogram was computed to represent the action potential density over time in spikes per second from −75 to 250 ms around stimulus onset. This histogram was smoothed with a 6 ms Gaussian kernel (ksdensity function in MATLAB) in 1 ms steps to estimate the spike-density function over time. The baseline spontaneous firing rate was determined as the average firing rate (in spikes/s) during the 75 ms preceding stimulus onset. The excitatory response was measured as the area below the spike-density function and above the baseline spontaneous firing rate, between 0 and 180 ms after stimulus onset. In other words, the average level of spontaneous firing within the −75 to 0 ms time window was subtracted from the activity recorded within the 0–180 ms time window, thereby obtaining a measurement of the evoked firing activity that is referred to as the baseline-corrected spike count.
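As a rough illustration of the baseline correction, the Python sketch below subtracts the spike count expected from spontaneous firing from the count observed in the response window. It is a simplified stand-in for the procedure described above: it omits the Gaussian smoothing of the spike-density function, expresses the result in spikes per trial, and the function name and data layout (one list of spike times in ms per trial) are assumptions.

```python
def baseline_corrected_spike_count(trials, baseline=(-75.0, 0.0), response=(0.0, 180.0)):
    """Mean evoked spikes per trial after removing spontaneous activity.

    trials: list of lists of spike times (ms, relative to stimulus onset).
    The spontaneous rate is estimated in the `baseline` window and the count
    it predicts for the `response` window is subtracted from the observed count.
    """
    n_trials = len(trials)
    n_base = sum(baseline[0] <= t < baseline[1] for trial in trials for t in trial)
    n_resp = sum(response[0] <= t < response[1] for trial in trials for t in trial)
    base_rate = n_base / (n_trials * (baseline[1] - baseline[0]))  # spikes/ms/trial
    expected_spontaneous = base_rate * (response[1] - response[0])
    return n_resp / n_trials - expected_spontaneous


if __name__ == "__main__":
    example = [[-50.0, 10.0, 20.0], [-10.0, 30.0]]
    print(baseline_corrected_spike_count(example))
```

Note that this simple subtraction can return negative values when the evoked activity falls below the spontaneous level, whereas an area measured strictly above baseline would not.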
After subtracting the spontaneous activity, we used a Monte Carlo method to find statistically significant responses evoked by sound within the resulting baseline-corrected spike counts. This approach consists of a probability simulation that draws numerical values from several random samplings. First, 1,000 peristimulus time histograms were simulated using a Poisson model with a constant firing rate equal to the baseline spontaneous firing rate. With this collection of histograms, we generated a null distribution of baseline-corrected spike counts. Finally, we computed the p value of the original baseline-corrected spike count as p = (g + 1) / (N + 1), where g is the count of null measures greater than or equal to the baseline-corrected spike count and N = 1,000 is the size of the null sample. Hence, the Monte Carlo method allowed us to remove any unit-frequency combinations without significant firing activity evoked in response to at least one of the conditions in each experimental set (DEV, STD, or control).
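The Monte Carlo test can be sketched in Python as follows. This is a hedged illustration rather than the laboratory's MATLAB implementation: the function names are hypothetical, spike trains are simulated as homogeneous Poisson processes over the −75 to 180 ms window, and the spike-density smoothing step is left out.

```python
import random


def corrected_count(trials, base=(-75.0, 0.0), resp=(0.0, 180.0)):
    """Baseline-corrected spikes per trial (simplified, no smoothing)."""
    n = len(trials)
    n_base = sum(base[0] <= t < base[1] for tr in trials for t in tr)
    n_resp = sum(resp[0] <= t < resp[1] for tr in trials for t in tr)
    rate = n_base / (n * (base[1] - base[0]))
    return n_resp / n - rate * (resp[1] - resp[0])


def poisson_trial(rate_per_ms, start, stop, rng):
    """Spike times of a homogeneous Poisson process between start and stop (ms)."""
    spikes, t = [], start
    if rate_per_ms <= 0:
        return spikes  # a silent neuron produces no spontaneous spikes
    while True:
        t += rng.expovariate(rate_per_ms)
        if t >= stop:
            return spikes
        spikes.append(t)


def monte_carlo_p(observed, rate_per_ms, n_trials=40, n_sim=1000, seed=None):
    """p = (g + 1) / (N + 1), with g = number of null values >= observed."""
    rng = random.Random(seed)
    null = [
        corrected_count([poisson_trial(rate_per_ms, -75.0, 180.0, rng)
                         for _ in range(n_trials)])
        for _ in range(n_sim)
    ]
    g = sum(value >= observed for value in null)
    return (g + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical neuron: 10 spikes/s spontaneous rate, 0.5 evoked spikes per trial.
    print(monte_carlo_p(observed=0.5, rate_per_ms=0.01, seed=0))
```

Because of the +1 terms, the smallest p value this estimator can return with N = 1,000 simulations is 1/1,001.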
Calculation of predictive processing indices
CAS and MSC were introduced to control for the repetition effects of the oddball paradigm, allowing a dissociation of frequency-specific adaptation into PE and repetition suppression components. To adequately compare between responses from different neurons, we normalized the spike count evoked by each tone in DEV, STD, and control as follows:
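The normalization equations are not reproduced in this text. The Python sketch below assumes the Euclidean-norm normalization used in the earlier work that this study builds on (Parras et al., 2017), in which each response is divided by the norm of the (DEV, STD, control) vector and the indices are differences of normalized responses: iMM = DEV − STD, iRS = control − STD, and iPE = DEV − control. The function name and return format are hypothetical, and the formulas should be checked against the original equations.

```python
import math


def predictive_indices(dev, std, ctr):
    """Normalize baseline-corrected spike counts and derive iMM, iRS, and iPE.

    Assumed formulas (after Parras et al., 2017):
        N   = sqrt(DEV^2 + STD^2 + CTR^2)
        iMM = DEV/N - STD/N   (neuronal mismatch)
        iRS = CTR/N - STD/N   (repetition suppression)
        iPE = DEV/N - CTR/N   (prediction error)
    """
    norm = math.sqrt(dev ** 2 + std ** 2 + ctr ** 2)
    if norm == 0:
        raise ValueError("at least one response must be nonzero")
    dev_n, std_n, ctr_n = dev / norm, std / norm, ctr / norm
    return {"iMM": dev_n - std_n, "iRS": ctr_n - std_n, "iPE": dev_n - ctr_n}


if __name__ == "__main__":
    print(predictive_indices(dev=3.0, std=0.0, ctr=4.0))
```

Under these definitions the mismatch index decomposes exactly as iMM = iRS + iPE, which is what allows the mismatch response to be split into a repetition suppression component and a PE component.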
Lastly, to analyze the emergence of predictive signals around stimulus presentation, we also calculated the average iPE in 12 time windows of 20 ms width from −50 to 190 ms relative to stimulus onset.
FRA analysis and classification
We investigated whether PE signaling in the IC was influenced by the neuronal tuning bandwidth, considering the different morphologies that the FRA of IC neurons can exhibit (Figs. 4A, 5A). First, the characteristic frequency (kHz) within each FRA was identified as the sound frequency that evoked a response at the lowest sound intensity. Then, the bandwidth at 30 dB above threshold was measured for each unit by calculating the difference between the base 2 logarithms of the upper and lower frequencies (in kHz) of the tuning curve, which yields a bandwidth in octaves. The most used measure to express the sharpness of frequency tuning is the Q factor. Qn is defined as the characteristic frequency divided by the bandwidth in kHz at “n” dB above threshold. Thus, the Q30 was calculated, and its correlation with the iPE was tested.
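The two tuning measures can be written out in a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes the lower and upper edges of the tuning curve at 30 dB above threshold have already been extracted from the FRA, and it follows the definition given above in which Qn divides the characteristic frequency by the bandwidth in kHz.

```python
import math


def bandwidth_octaves(f_low_khz, f_high_khz):
    """Tuning bandwidth as the difference of base 2 logarithms (result in octaves)."""
    return math.log2(f_high_khz) - math.log2(f_low_khz)


def q_factor(cf_khz, f_low_khz, f_high_khz):
    """Qn: characteristic frequency divided by the bandwidth in kHz at n dB above threshold."""
    return cf_khz / (f_high_khz - f_low_khz)


if __name__ == "__main__":
    # Hypothetical unit: CF of 8 kHz, tuning curve spanning 4 to 16 kHz at 30 dB above threshold.
    print(bandwidth_octaves(4.0, 16.0), q_factor(8.0, 4.0, 16.0))
```

A larger Q30 indicates sharper tuning, so V-shaped FRAs would be expected to yield higher values than U-shaped ones.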
In addition, each FRA was classified according to its shape. We established four categories following criteria adapted from Hernández et al. (2005):
V-shaped FRAs (V; Figs. 4A, 5A, first plot) must show a primary-like receptive field, similar to those observed in the auditory nerve fibers. These FRAs exhibit a very steep high-frequency slope and a shallower low-frequency slope. The low-frequency slope might comprise two segments separated by an elbow. One of the low-frequency segments has a steeper slope and extends from the tip of the FRA to ∼35 dB or more above threshold. The second segment has a very shallow slope, forming the so-called low-frequency tail, but it is not always present. For concision, in the text we will refer to nonlemniscal IC neurons showing V-shaped FRAs simply as “V-neurons” or sharply tuned neurons.
Multipeaked FRAs (W; Figs. 4A, 5A, second plot) had two or more excitatory regions separated by an area of little or no response. These FRAs had to meet three criteria: (1) the thresholds of at least two of the peaks had to differ by ≤30 dB; (2) separation between peaks had to be maintained for levels ≥20 dB above minimum thresholds; and (3) firing rate in the peaks had to be at least 1 spike per trial. For concision, in the text we will refer to nonlemniscal IC neurons showing multipeaked FRAs simply as “W-neurons.”
U-shaped FRAs (U; Figs. 4A, 5A, third plot) responded similarly to a wide range of frequencies at their threshold level and presented relatively shallow slopes on the sides. For concision, in the text we will refer to nonlemniscal IC neurons showing U-shaped FRAs simply as “U-neurons.” W-neurons and U-neurons may also be collectively referred to as broadly tuned neurons.
Mosaic FRAs (X; Figs. 4A, 5A, fourth plot) comprised several islands of response and/or inhibition, without revealing any consistent characteristic frequency or clear tuning. These islands of response did not form any particular or stable shape, showing dynamic changes between different mappings of the FRA. For concision, in the text we will refer to nonlemniscal IC neurons showing mosaic FRAs simply as “X-neurons” or untuned neurons.
Our sample did not include some non-V-shaped FRA types featured in other IC studies, such as narrow, closed, high-tilt, and low-tilt FRAs (Hernández et al., 2005; Palmer et al., 2013). This could be due to an experimental bias introduced while choosing adequate neurons to record responses to our stimulation protocol. Narrow, closed, high-tilt, and low-tilt FRAs reflect rather small receptive fields, with little responsive space to place stimuli to generate oddball paradigms and control sequences. Hence, neurons with small receptive fields were most likely deemed unfit for recording the stimulation protocol by the experimenters.
Statistical hypothesis testing
Statistical analyses were carried out using distribution-free, nonparametric tests. These included the Friedman test for baseline-corrected spike counts and normalized responses to DEV, STD, and each control condition (CAS or MSC), as well as for the iMM, iRS, and iPE. For multiple-comparisons tests, p values were corrected for false discovery rate (FDR = 0.05) using the Benjamini–Hochberg method. A two-way analysis of variance (ANOVA) with Bonferroni’s correction for multiple comparisons for factors processing region and control type was performed to directly compare between normalized responses evoked by the control sequences (CAS and MSC) within each lemniscal and nonlemniscal divisions of the IC, MGB, and AC.
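For readers unfamiliar with the false discovery rate correction, the following Python sketch computes Benjamini–Hochberg adjusted p values. It is a generic textbook implementation offered for illustration only, not the MATLAB routine used in the study; the function name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Each p value is scaled by m / rank (m = number of tests, rank = position
    in ascending order), and monotonicity is enforced from the largest p value
    downward so that adjusted values never decrease with rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
    print([round(p, 4) for p in adjusted])
```

A comparison is then declared significant when its adjusted p value falls below the chosen FDR level (0.05 in the text).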
To study the influence of the previous tone on the evoked responses of IC neurons during the control sequences (CAS and MSC), each trial on each sequence was extracted. For each individual trial (n), the frequency step in octaves was computed as the absolute value of the tone frequency of the trial (n) minus the tone frequency of the preceding trial (n−1), resulting in a “frequency step” value in octaves. This frequency step could be 0.5 or 4.5 octaves during CAS, where tones are ordered regularly in increasing or decreasing frequency (Fig. 2A). In the MSC, tone presentation is pseudorandom, so the frequency steps between them could be 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, or 4.5 octaves (Fig. 2A). Pairs of MSC and CAS sequences using the same 10 tones presented at the same intensity to the same neuron were normalized by the largest response in either sequence. Then, trials were grouped by frequency step, control type (MSC or CAS), and either recording site (lemniscal or nonlemniscal divisions of IC, MGB, or AC) or FRA category (V, W, U, or X; only for nonlemniscal IC neurons), and the normalized average response was computed for each group. A two-way ANOVA with Scheffé's procedure correction for multiple comparisons with control type (MSC or CAS) and frequency step (0.5–4.5 octaves in 0.5-octave steps) as factors was performed separately for each neuronal group. ANOVA results that are especially relevant to this study are displayed in the comparison matrices (Figs. 2, 7), where the shade of green encodes significance level. Wherever average normalized responses were significantly different between two groups, the magnitude of such difference is expressed as a percentage.
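The frequency-step computation and grouping can be sketched in Python as shown below. This is an illustrative simplification with hypothetical function names: tones are represented by their index in the 10-tone set (0.5 octaves apart), and the normalizing peak defaults to the largest response in the sequence passed in, whereas the analysis described above uses the largest response across the paired MSC and CAS sequences (which can be supplied through the `peak` argument).

```python
from collections import defaultdict


def frequency_steps(tone_indices, step_octaves=0.5):
    """Absolute frequency step (octaves) between each trial and the preceding one."""
    return [abs(b - a) * step_octaves for a, b in zip(tone_indices, tone_indices[1:])]


def mean_response_by_step(tone_indices, responses, peak=None, step_octaves=0.5):
    """Average normalized response grouped by the preceding frequency step.

    The first trial has no preceding tone and is therefore skipped.
    """
    if peak is None:
        peak = max(responses)
    groups = defaultdict(list)
    steps = frequency_steps(tone_indices, step_octaves)
    for step, response in zip(steps, responses[1:]):
        groups[step].append(response / peak if peak else 0.0)
    return {step: sum(vals) / len(vals) for step, vals in sorted(groups.items())}


if __name__ == "__main__":
    cascade = [i % 10 for i in range(30)]          # ascending CAS, three cycles
    print(sorted(set(frequency_steps(cascade))))   # only 0.5 and 4.5 octave steps occur
    print(mean_response_by_step([0, 1, 2, 0], [1.0, 2.0, 4.0, 2.0]))
```

The 4.5-octave step in CAS arises once per cycle, when the scale wraps around from one end of the tone set to the other.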
IC processing of control sequences. A, Schematic representation of frequency step comparisons. Responses to extreme step values featuring in both MSC and CAS, that is, 0.5-octave and 4.5-octave steps, are compared across control sequences (highlighted and framed in green). Responses to different frequency steps (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, or 4.5 octaves) within the MSC are also compared. B, C, Comparison matrices showing the differences in the evoked spiking responses after different frequency steps presented within the MSC or the CAS in lemniscal and nonlemniscal divisions of the IC. The normalized mean difference between the spikes per trial in the x-axis minus those in the y-axis is noted as a percentage where statistical significances were found (ANOVA; p < 0.05; highlighted in green). Significant differences found in both lemniscal and nonlemniscal divisions of the IC are marked as underlined digits, all corresponding to comparisons between 4.5-octave steps in CAS with the rest of smaller steps. Two circles in each matrix mark equal step comparisons across control sequences. B, The lemniscal IC better distinguished different frequency steps (note the many significant green squares) and responded similarly to equal frequency steps happening in different controls (black dotted circles). C, The nonlemniscal IC did not perform this spectral processing but could instead differentiate between equal frequency steps presented in different sequences (white circles).
Ten linear models were fitted using the fitlm function in MATLAB, with robust options. First, we replicated the “global model” of Parras et al. (2017) regarding the hierarchical distribution of iPE (dependent variable) along the auditory pathway by introducing region (IC, MGB, or AC) and pathway (lemniscal or nonlemniscal) as predictor variables. The reference levels chosen for these factors were “lemniscal” and “IC”, as no traces of PE signals have been observed in the central nucleus of the IC (Parras et al., 2017). Second, we expanded the original global model to account for spectral properties of the stimulation by incorporating two more predictors: sound intensity (1–7 B SPL in 1 B SPL steps; 1 B SPL = 10 dB SPL) and DEV direction (ascending or descending). Third, a post hoc “local model” using these two predictors (sound intensity and DEV direction) was fitted for each auditory region that yielded significant iPE values (nonlemniscal IC, nonlemniscal MGB, lemniscal AC, and nonlemniscal AC), as well as for each FRA-type group of nonlemniscal IC neurons (V-shaped, multipeaked, U-shaped, and mosaic). These linear models used the values of 1 B SPL and “descending” as reference to calculate the intercept. Finally, an ANOVA was performed to identify which predictors and interactions between them were significant in each of the resulting linear models: one global model (replicating Parras et al., 2017), one expanded global model, four regional models, and four FRA models.
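The dummy coding behind the global model, with “IC” and “lemniscal” as reference levels, can be sketched in Python as follows. This is an illustration under stated assumptions, not the fitting code of the study: it uses ordinary least squares through `numpy.linalg.lstsq` instead of the robust fit of MATLAB's fitlm, it includes main effects only, and the coefficients in `true_beta` are made up to generate synthetic, noise-free data.

```python
import numpy as np

REGIONS = ["IC", "MGB", "AC"]        # "IC" is the reference level
PATHWAYS = ["Lem", "NonLem"]         # "Lem" is the reference level

def design_matrix(region, pathway):
    """Columns: intercept, MGB, AC, NonLem (1 when true, 0 when false)."""
    region = np.asarray(region)
    pathway = np.asarray(pathway)
    return np.column_stack([
        np.ones(region.size),
        region == "MGB",
        region == "AC",
        pathway == "NonLem",
    ]).astype(float)

# Synthetic, noise-free iPE values built from made-up coefficients
true_beta = np.array([-0.05, 0.10, 0.25, 0.15])
region = np.repeat(REGIONS, 2)       # IC, IC, MGB, MGB, AC, AC
pathway = np.tile(PATHWAYS, 3)       # Lem, NonLem, Lem, NonLem, ...
X = design_matrix(region, pathway)
ipe = X @ true_beta

# Ordinary least squares (not the robust regression used in the study)
beta_hat, *_ = np.linalg.lstsq(X, ipe, rcond=None)
```

With this coding the intercept is the predicted iPE for the reference group (lemniscal IC), and every other coefficient is an offset relative to that group.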
Results
In the following sections, we first describe how tonal succession in CAS and MSC is encoded by means of spectral processing in the lemniscal IC and predictive processing in the nonlemniscal IC. Second, we present an updated global model of the auditory pathway that confirms the hierarchical distribution of the iPE from the nonlemniscal IC to the AC, replicating Parras et al. (2017) with much larger sample sizes (IC: 82→315 units, 384% of the original sample; MGB: 58→161 units, 278%; AC: 70→181 units, 259%), as well as expanding the linear model from 2 to 4 predictor variables: region, pathway, intensity, and DEV direction. Third, we introduce new local iPE models for the nonlemniscal IC, nonlemniscal MGB, and both lemniscal and nonlemniscal AC, which revealed the unique sensitivity of PE signaling in the nonlemniscal IC to sound intensity and direction of frequency change. Finally, we demonstrate the dependency between iPE values and the shape of the receptive field or FRA in IC neurons, which defined the “spectral” and “nonspectral” PE systems in the nonlemniscal IC.
Differential processing of MSC and CAS in each IC division
In agreement with previous observations in rodents (Parras et al., 2017, 2021; Casado-Román et al., 2020; Valdés-Baizabal et al., 2020; Lesicko et al., 2022) and humans (Wiens et al., 2019), the global responses elicited by CAS and MSC did not differ significantly within each of the hierarchical levels studied: IC, MGB, and AC [two-way ANOVA with Bonferroni’s correction for multiple comparisons for factors region (F = 250.17; p < 0.001), control (F = 0.5; p = 0.482), and their interaction (F = 0.99; p = 0.424)]. Consequently, calculations using either CAS (Table 1) or MSC (Table 2) yielded equivalent results. Hence, the models presented in the following sections use the results obtained with CAS, given its methodological advantages (see Materials and Methods, Experimental design; Ruhnau et al., 2012; Harms et al., 2014).
Median normalized spike counts and indices calculated using CAS in each auditory region and at different intensities of stimulation
Median normalized spike counts and indices calculated using MSC in each auditory region and at different intensities of stimulation
MSC and CAS present different frequency steps between successive tones (Fig. 2A). We analyzed how the previous tone in the control sequence could be influencing the normalized neuronal evoked response in the IC, as well as the MGB and AC [two-way ANOVA with Scheffé's procedure correction for multiple comparisons, using control type (MSC or CAS) and frequency step (from 0.5 to 4.5 octaves in 0.5-octave steps) as factors]. In the following, we provide a detailed account of results regarding the IC, whereas MGB and AC results will be just succinctly outlined for comparative purposes.
Regarding the IC, the frequency step (lemniscal IC: F = 307.56, p < 0.001; nonlemniscal IC: F = 107.15, p < 0.001) and the interaction between control type and frequency step (lemniscal IC: F = 28.51, p < 0.001; nonlemniscal IC: F = 107.15, p < 0.001) had an influence on the evoked response, but not the control type alone.
In the lemniscal IC, frequency change from the preceding tone exerted a noticeable influence, since a sizable proportion of responses to various octave separations were statistically different from each other (Fig. 2B). Lemniscal IC neurons discriminated 10 more frequency-step pairs than their nonlemniscal counterparts, implying finer spectral resolution. Furthermore, equal frequency step comparisons across control sequences (0.5- and 4.5-octave steps; Fig. 2A, in green) revealed that lemniscal IC neurons responded similarly to equal frequency changes presented within CAS and MSC (Fig. 2B, black dotted circles).
On the other hand, contrast between frequency steps had to be larger in the nonlemniscal IC to observe a significant difference in spike counts (Fig. 2C, in green), indicating more limited spectral resolution. However, nonlemniscal IC neurons responded differently to equal frequency steps depending on where they were presented, within CAS or within MSC (Fig. 2C, white circles), in line with the tenets of the predictive processing framework. The 0.5-octave step yielded larger spike counts when presented in MSC than in CAS (Fig. 2C, upward black arrow). During the MSC, all frequency changes are unpredictable, including the 0.5- and 4.5-octave steps (Fig. 2A, in green). In contrast, 0.5-octave steps are very predictable during the CAS and thereby susceptible to expectation suppression (Garrido et al., 2009b; Todorovic and de Lange, 2012; Stefanics et al., 2014; Auksztulewicz and Friston, 2016; Carbajal and Malmierca, 2018; Emberson et al., 2019). Yet whenever a scale starts over during CAS, instead of the expected 0.5-octave step, it produces a 4.5-octave leap in the opposite direction. Consequently, the first tone of each scale represents a DEV (Tervaniemi et al., 1994). The resulting PE signal could explain why nonlemniscal IC neurons fired more spikes to frequency changes of 4.5 octaves during CAS than during MSC (Fig. 2C, leftward black arrow). Thus, at the expense of finer spectral resolution, nonlemniscal IC neurons obtain early contextual sensitivity that can be accounted for by predictive processing.
Regarding the MGB, the frequency step (lemniscal MGB: F = 40.32, p < 0.001; nonlemniscal MGB: F = 23.97, p < 0.001) and the interaction between control type and frequency step (lemniscal MGB: F = 11.87, p < 0.001; nonlemniscal MGB: F = 26.31, p < 0.001) had an influence on the evoked response, but not the control type alone. MGB comparison matrices looked mostly like that of the nonlemniscal IC, with the critical difference that MGB neurons could not distinguish between equal frequency steps presented in different sequences (similar to Fig. 2C, but significance was lost for the circled comparisons indicated by arrows).
Regarding the AC, evoked responses of lemniscal AC neurons were influenced by frequency step (F = 28.16; p < 0.001) and the interaction between control type and frequency step (F = 46.72; p < 0.001), but not the control type alone. Similar to the MGB, the comparison matrix of the lemniscal AC looked mostly like that of the nonlemniscal IC, with the critical difference that lemniscal AC neurons could not distinguish between equal frequency steps presented in different sequences (similar to Fig. 2C, but significance was lost for the circled comparisons indicated by arrows).
On the other hand, evoked responses of nonlemniscal AC neurons were influenced only by frequency step (F = 12.72; p < 0.001), but not by control type alone or by the interaction between control type and frequency step. In contrast with the other regions, nonlemniscal AC neurons could only distinguish between the 0.5-octave step and the 4.5-octave step of the CAS (Fig. 7A).
Distribution of predictive processing indices along the auditory pathway
To find traces of predictive processing activity at neuronal level, a within-region multiple-comparisons Friedman test was performed between DEV, STD, and control responses, such that each pair of conditions within each auditory region was tested for a difference in medians. The corresponding indices of the IC, MGB, and AC, calculated using normalized spike counts evoked by CAS and MSC, are respectively summarized in Tables 1 and 2. These observations were fitted in a linear model with a robust regression for the iPE along the hierarchy, in an effort to replicate previous results with sample sizes tripling those of the original study (Parras et al., 2017). The model used region (“IC”, “MGB”, “AC”) and pathway (Lemniscal “Lem”, Nonlemniscal “NonLem”) as categorical factors, acquiring values 1 when true and 0 when false. The reference levels chosen for these factors were “Lem” and “IC,” as no traces of PE signals have been observed in the central nucleus of the IC (Parras et al., 2017). The resulting global model was as follows:
Median DEV responses were stronger than STD responses all over the IC, resulting in a median iMM = 0.16 in the lemniscal IC (p < 0.001; Friedman test) and a median iMM = 0.44 in the nonlemniscal IC (p < 0.001). In the lemniscal IC, the median normalized spike count in response to both controls was 0.65, slightly stronger than the 0.59 yielded by DEV. Both DEV and control normalized responses surpassed the 0.43 normalized STD spike count. Hence, the resulting median iRS = 0.21 could account for the whole neuronal mismatch signal in the lemniscal IC (p < 0.001), even yielding negative iPE values (Tables 1, 2). Conversely, in the nonlemniscal IC the median normalized DEV spike count of 0.66 surpassed the 0.64 registered for CAS, but not the 0.66 yielded by MSC. When calculated with CAS, this resulted in a significant iPE = 0.02 (Table 1), confirming the presence of significant PE signaling activity at sample level within the nonlemniscal IC (p < 0.001). However, this was not the case when using MSC (Table 2). In addition, the median normalized STD spike count in the nonlemniscal IC was half of that observed in the lemniscal IC (Tables 1, 2). The resulting median iRS = 0.42 was considerably higher among nonlemniscal neurons than in their lemniscal counterparts (p < 0.001).
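The arithmetic linking these spike counts to the indices can be made explicit with a small Python sketch. It assumes the index definitions of Parras et al. (2017), namely iMM = DEV − STD, iRS = control − STD, and iPE = DEV − control; the function name is hypothetical. The inputs are the lemniscal IC medians quoted above. Because a median of differences need not equal the difference of medians, the outputs only approximate the reported median indices.

```python
def mismatch_indices(dev, std, ctr):
    """Return (iMM, iRS, iPE) from normalized DEV, STD, and control responses."""
    i_mm = dev - std      # total neuronal mismatch
    i_rs = ctr - std      # repetition suppression component
    i_pe = dev - ctr      # prediction error component
    return i_mm, i_rs, i_pe

# Median normalized spike counts quoted in the text for the lemniscal IC
i_mm, i_rs, i_pe = mismatch_indices(dev=0.59, std=0.43, ctr=0.65)
```

By construction iMM = iRS + iPE, so a control response that exceeds the DEV response, as in the lemniscal IC, necessarily yields a negative iPE.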
Once previous results were replicated (Parras et al., 2017), we tested how spectral features of the stimulation affected the resulting iPE value in the whole auditory pathway. Two new predictor variables were added to the linear model: sound intensity and DEV direction. Sound intensity was measured in bels (1–7 B SPL in 1 B SPL steps). DEV direction is a dichotomous variable, so the factor ascending acquired the numeric value of 1 when true or 0 when false, while the factor descending was used as reference to calculate the intercept. Hence, this expanded model used four predictors: region (“IC”, “MGB” or “AC”), pathway (Lemniscal “Lem” or Nonlemniscal “NonLem”), sound intensity (“SPL”, in bels), and DEV direction (ascending “asc” or descending “dsc”). The reference levels chosen for these factors were “IC”, “Lem”, 1 B SPL and “dsc.” The resulting expanded global model was the following:
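One possible coding of the two added predictors is sketched below in Python. It assumes that intensity enters the model as a continuous variable shifted so that 1 B SPL maps to zero, which is one reading of “1 B SPL” being a reference level; the actual coding used with fitlm may differ, and the function name is hypothetical.

```python
import numpy as np

def expanded_predictors(db_spl, direction):
    """Intensity in bels relative to 1 B SPL, plus an 'ascending' dummy variable."""
    bels = np.asarray(db_spl, dtype=float) / 10.0        # 10 dB SPL = 1 B SPL
    spl = bels - 1.0                                     # 0 at the 1 B SPL reference
    asc = (np.asarray(direction) == "asc").astype(float) # 1 if ascending, else 0
    return np.column_stack([spl, asc])

# Three illustrative trials: 10 dB descending, 40 dB ascending, 70 dB descending
cols = expanded_predictors([10, 40, 70], ["dsc", "asc", "dsc"])
```

Under this coding the model intercept corresponds to a descending DEV at 1 B SPL in the lemniscal IC.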
Local models of predictive processing indices for the IC, MGB, and AC
To better understand how spectral features influenced PE signals at each level of the auditory hierarchy, we fitted linear models for each index—that is, iMM, iPE, and iRS—using intensity and direction of DEV (ascending or descending as categorical factors that acquired values 1 when true and 0 when false) as predictors for each auditory region (separately) where significant traces of predictive activity were found: the nonlemniscal divisions of IC, MGB, and AC, as well as the lemniscal fields of AC. The acoustic characteristics of oddball stimulation only affected indices in the nonlemniscal IC, but not in the rest of the auditory regions (Fig. 3A).
Prediction error as a function of intensity and direction of frequency change in four auditory regions that engage in predictive processing. A, Linear models fitted for the iPE (calculated with CAS or MSC) using dB SPL and direction of frequency change (ascending or descending) as predictors. Error bars denote mean and standard error of the mean for each intensity level and direction. The only models that achieved a statistically significant adjustment were in the nonlemniscal IC (first plot), revealing a clear tendency to show higher iPE values at lower intensities. B, Violin plots comparing the spike counts evoked by DEV (in red) and CAS (in green) at low intensities (<40 dB SPL, left violin) and high intensities (≥40 dB SPL, right violin).
The linear model for the iMM in the nonlemniscal IC was as follows:
The linear model for the iPE in the nonlemniscal IC was as follows:
In contrast with observations in the nonlemniscal IC, iPE values in the nonlemniscal MGB, lemniscal AC, and nonlemniscal AC could not be accounted for by any linear model reliant on intensity and direction as predictor variables (note their poor fits as opposed to the nonlemniscal IC in Fig. 3A). Therefore, PE signaling in the nonlemniscal IC encodes deviant acoustic information, whereas in higher processing levels PE signaling seems to be driven by more abstract information than the intensity or direction of the auditory input.
The iRS remained relatively stable across different intensities and pitch changes (Tables 1, 2), and therefore, it could not be predicted by linear models reliant on intensity and direction as predictors in any auditory region. In contrast, the iMM and iPE observed in nonlemniscal IC neurons were larger when intensities were softer, as well as in ascending oddball paradigms. As stimulation intensity increased, the PE component progressively decreased (Fig. 3A, first plot). At higher intensities, the reduced iMM also became equivalent to the iRS (Tables 1, 2). Intensity and direction effects disappeared as input ascended the auditory hierarchy, since no linear model of this kind could be fitted in the nonlemniscal MGB or the AC (Fig. 3A). Therefore, changes in the iMM of nonlemniscal IC neurons due to intensity and frequency shifts were caused by how those physical characteristics of the stimuli elicit PE signaling in the auditory midbrain.
Considering the tendency described by the linear models of the nonlemniscal IC, we examined population indices dividing the samples of each region into “low intensities” (<40 dB SPL) and “high intensities” (≥40 dB SPL). Once again, the relationship between the intensity of the stimulation and the magnitude of the PE component within the evoked firing response was evident only in the nonlemniscal IC neurons (Fig. 3B), further confirming the tendencies described by the linear models. The neuronal mismatch responses in the nonlemniscal IC were significantly larger (p < 0.001) at low intensities (iMM = 0.56) than at high intensities (iMM = 0.39). As mentioned in the previous section, the PE component of neuronal mismatch responses in the nonlemniscal IC was rather small and only detectable with CAS (iPE = 0.02) when all the intensities were included in the analysis. However, when the analysis only included the nonlemniscal IC neuronal responses to low intensity stimulation, the median normalized DEV spike counts tended to comfortably surpass those elicited by both controls (Fig. 3B, first plot, left violin), resulting in iPECAS = 0.12 (p < 0.001) and iPEMSC = 0.09 (p = 0.001). Conversely, at high intensities, this PE component disappears completely (Fig. 3B, first plot, right violin). Results of these analyses using CAS and MSC are summarized in Tables 1 and 2, respectively.
Hence, direction of frequency and intensity manipulation only affected the PE component of the mismatch response of nonlemniscal IC neurons, without disrupting their levels of repetition suppression. This remarkable and unique population effect implies that the main function of PE signaling activity in the IC cortices is to enhance the early representation of unexpected sounds that meet certain spectral characteristics, for example, providing dynamic gain compensation to near-threshold but informative auditory input. Nevertheless, a closer inspection of different neuronal types proved that this main population effect of “spectral” nature was not the only function of PE signaling in the IC cortices, as detailed in the following section.
PE signaling characteristics in the nonlemniscal IC depend on FRA shape
Nonlemniscal IC neurons with tuned receptive fields (V, W, and U shapes) did not exhibit a significant correlation between Q30 factors and iPE values. However, >20% of nonlemniscal IC neurons in our sample showed an untuned, “mosaic” FRA type where it was not possible to determine any Q factor, due to the lack of a clear and stable characteristic frequency in their receptive fields. The median iPECAS = 0.21 (p < 0.001) and iPEMSC = 0.15 (p < 0.001) in the group of X-neurons were significantly higher (p < 0.001) than the median iPE values observed in the rest of nonlemniscal IC neurons, which were not significantly different from 0.
Due to this stark contrast between neural populations of the nonlemniscal IC, all previous analyses were repeated for nonlemniscal IC neurons, dividing the sample into four FRA categories: V-shaped (V), multipeaked (W), U-shaped (U), and mosaic (X). First, we performed a within-category multiple-comparisons Friedman test between DEV, STD, and control responses, such that each pair of conditions within each FRA category was tested for a difference in medians. Samples within each FRA category were then further divided into “low intensities” (<40 dB SPL) and “high intensities” (≥40 dB SPL) for additional comparisons (Fig. 4C). The corresponding indices calculated using both controls are summarized in Table 3. The evolution of predictive processing index values over time was also analyzed for low and high intensities (Fig. 5). In every condition, the iRS appears immediately (Fig. 5, cyan lines), whereas the iPE takes longer to emerge, if at all (Fig. 5, orange lines).
Prediction error as a function of intensity and direction of frequency change in four types of FRA. A, Categories of FRA shapes recorded in the nonlemniscal IC. B, Linear models fitted for the iPE using dB SPL and direction (ascending or descending) as predictors. Error bars denote mean and standard error of the mean for each SPL and direction. The only models that did not achieve a statistically significant adjustment were those of the neurons that exhibited mosaic FRAs (fourth plot). Note that these neurons tended to show high iPE values, regardless of the intensity or direction of frequency change. C, Violin plots comparing the spike counts evoked by DEV (in red) and CAS (in green) at low intensities (<40 dB SPL, left violin) and high intensities (≥40 dB SPL, right violin). Note how the category of mosaic FRAs (fourth plot) completely deviates from the rest.
Time course of iPE (in orange) and iRS (in cyan) for each FRA type. A, Categories of FRA shapes recorded in the nonlemniscal IC. B, Time course of the iPE and iRS (mean ± standard error of the mean) when the intensity of stimulation was low (<40 dB SPL). Black asterisks mark a significant value (p < 0.05, Wilcoxon signed-rank test, FDR-corrected for 8 comparisons) for the corresponding time window of 20 ms. Note that all but sharply tuned neurons (V, first plot) showed significant traces of PE signaling at low-intensity stimulation. C, Time course of the average iPE and iRS when the stimulation intensity was high (≥40 dB SPL). Note that whereas the iRS prevails largely unchanged across all categories, only the untuned neurons showing mosaic FRAs preserved their iPE values at high intensity stimulation (fourth plot).
Median indices calculated using both controls for each FRA category recorded in the nonlemniscal IC
Following the general trend in the nonlemniscal IC, all FRA categories showed robust repetition suppression. However, only W- and U-neurons followed the population trend regarding PE signaling, with significant iPE values only at low intensities. Conversely, V-neurons did not reach significant iPE values in any condition, whereas X-neurons yielded robust iPE values at every level of intensity. This was further confirmed by linear models that used intensity and direction of DEV as predictors of iPE values for each FRA category in the nonlemniscal IC, achieving good fits in the three tuned groups, that is, V-, W-, and U-neurons, as opposed to untuned X-neurons (Fig. 4B).
Finally, we analyzed how the previous tone in the control sequence could be influencing the normalized evoked response in nonlemniscal IC neurons depending on their FRA type [two-way ANOVA with Scheffé's procedure correction for multiple comparisons, using control type (MSC or CAS) and frequency step (from 0.5 to 4.5 octaves in 0.5-octave steps) as factors]. Again, the three tuned groups tended to follow the same response trend, which resembled the general trend of the nonlemniscal IC (Fig. 2C). Conversely, X-neurons described a completely different pattern (Fig. 7B), more akin to that of the nonlemniscal AC neurons (Fig. 7A). All these data, in addition to the unstable and unstructured nature of their receptive fields, seem to set X-neurons apart from the rest of nonlemniscal IC neurons (Fig. 6), suggesting higher-order capabilities. In the following, detailed results are provided for each FRA category.
Summary of results. A, Median iPE (orange) and iRS (cyan) of each FRA category recorded in the nonlemniscal IC, represented with respect to the baseline set by CAS (green horizontal bar), at low- and high-intensity stimulation (left and right plots, respectively). Thereby, iPE is upward positive while iRS is downward positive (Fig. 1B). Striped regions in the bars mark negative iPE values. Asterisks denote statistical significance of the indices against zero: n.s., nonsignificant, *p < 0.05, **p < 0.01, ***p < 0.001. B, Median iPE and iRS of each FRA category represented with respect to the baseline set by MSC (green horizontal bar).
Sharply tuned receptive fields: V-neurons
The linear model for V-neurons (Fig. 4B, first plot) was as follows:
Regarding frequency step discrimination, V-neurons followed the general tendency of the nonlemniscal IC population, showing evoked responses influenced by frequency step (F = 103.72, p < 0.001) and the interaction between control type and frequency step (F = 86.76; p < 0.001). The comparison matrix of V-neurons resembled Figure 2C, except that the comparison between 0.5-octave steps in MSC and CAS lost significance.
Hence, V-neurons lack the high spectral resolution of their lemniscal homologs, but in exchange they manifest some contextual sensitivity. V-neurons do not contribute to the main PE signaling effect observed at population level in the IC cortices, while being subject to stronger repetition suppression than lemniscal IC neurons (Tables 1–3). Therefore, their properties seem somewhat transitional, which hints at the possibility of V-neurons constituting the “input level” of the predictive processing circuitry of the IC cortices (Fig. 7D).
Early emergence of predictive processing in the auditory system. A, B, Comparison matrices for, respectively, the nonlemniscal AC and the group of X-neurons in the nonlemniscal IC, setting X-neurons apart from the rest of nonlemniscal IC neurons (compare Fig. 2C) and implying a functional connection between both neural substrates. C, Schematics of connectivity between the IC, MGB, and AC. Descending projections targeting the nonlemniscal pathway extend the hierarchical exchange of predictions and prediction errors to earlier stages of auditory processing, presumably improving its overall efficiency. D, Diagram summarizing our proposed classification of neuronal and functional elements in the intrinsic circuitry of the IC. “Raw” spectral information from the central nucleus flows into the predictive processing circuitry of the IC cortices via V-neurons, which constitute the first filter of redundant input. Once at the beginning of the nonlemniscal pathway, unexpected input will be consistently reported to higher-order processing levels by the “nonspectral” PE system constituted by X-neurons with untuned receptive fields. When a deviant sound is close to the hearing threshold or is moving to a higher pitch, unexpected input additionally benefits from dynamic gain compensation provided by the “spectral” PE system formed by W- and U-neurons with broad receptive fields.
Broadly tuned receptive fields: W- and U-neurons
The linear model for W-neurons (Fig. 4B, second plot) was as follows:
The linear model for U-neurons (Fig. 4B, third plot) was as follows:
In W- and U-neurons, the effect of intensity over the magnitude of the PE component within the neuronal mismatch response became evident when comparing the distribution of normalized DEV- and control-evoked spike counts (Fig. 4C, second and third plots), when observing the evolution of the iPE over time (Fig. 5, second and third plots), as well as when comparing the median indices of each category at low- and high-intensity stimulation using both CAS and MSC (Fig. 6, Table 3). Traces of PE signaling could be found at low intensity stimulation in W-neurons, but only when using CAS (iPE = 0.11; p < 0.001; Fig. 5B, second plot). For U-neurons, a significant PE component was revealed for low intensity stimulation using both CAS and MSC (iPE = 0.11; p = 0.003; Fig. 5B, third plot).
Regarding frequency step discrimination, U-neurons followed the general tendency of the nonlemniscal IC population (displayed in Fig. 2C), showing evoked responses influenced by frequency step (F = 153.19; p < 0.001) and the interaction between control type and frequency step (F = 88.03; p < 0.001). The comparison matrix of U-neurons resembled Figure 2C, except that the comparison between 0.5-octave steps in MSC and CAS lost significance. In contrast, evoked responses of W-neurons were only influenced by frequency step (F = 26.38; p < 0.001). The comparison matrix reveals that W-neurons could only distinguish between pronouncedly different frequency steps, that is, the 4.5-octave leap in CAS against frequency steps of 2.5 octaves or smaller.
Hence, W- and U-neurons tend to follow the general trends of the nonlemniscal IC population, with rather limited frequency-step discrimination and PE signaling heavily influenced by spectral features of auditory input. In fact, as detailed in the next section, only W- and U-neurons are responsible for the main population effect observed in the IC cortices, revealing them as the true neural substrate of a “spectral” PE system (Fig. 7D).
Untuned receptive fields: X-neurons
In contrast with tuned FRA categories (V, W, and U), the linear model of X-neurons did not achieve a good fit, nor did it show any significant effects of intensity, direction, or their interaction (ANOVA; Fig. 4B, fourth plot). The size of the PE component found in X-neurons more than doubled that of the previous categories at low intensities using both CAS and MSC (iPE = 0.23; p < 0.001; Fig. 6). When only neuronal responses to high-intensity stimulation were selected, we found that only X-neurons were able to preserve their PE component intact (iPE = 0.22; p < 0.001; Fig. 5C), whereas the rest of the FRA categories ceased issuing PE signals altogether (Fig. 6).
Therefore, contrary to the general trend in the nonlemniscal IC, X-neurons yielded significantly positive iPE values across all intensities and regardless of DEV direction, more in the vein of the AC (compare fourth column of Fig. 4 with the third and fourth columns of Fig. 3). This differential behavior can be appreciated when comparing the distribution of normalized DEV- and control-evoked spike counts (Fig. 4C), when observing the evolution of the iPE over time (Fig. 5), as well as when comparing the median indices of each category at low and high intensity stimulation using both CAS and MSC (Fig. 6, Table 3). This means that X-neurons do not contribute to the main effect of PE signaling observed in the nonlemniscal IC and thereby are not part of the “spectral” PE system.
Regarding frequency step discrimination, only the interaction between control type and frequency step had an influence on the evoked response of X-neurons (F = 23.52; p < 0.001). Spectral resolution of X-neurons is the poorest in the IC cortices, only able to contrast between 0.5-octave steps and 4.5-octave leaps in the CAS (Fig. 7B), in the same vein as the nonlemniscal AC (Fig. 7A). This resemblance again suggests more abstract processing by X-neurons, related to higher-order processing regions of the auditory pathway.
Hence, in contrast with the main effect observed in nonlemniscal IC processing, X-neurons perform a distinct accessory function that essentially disregards spectral content to consistently report on more contextual information related to novelty or deviances within the auditory scene. X-neurons are thus the neural substrate of a “nonspectral” PE system (Fig. 7D).
Discussion
This study reveals the dual nature of predictive processing in the IC cortices, which are endowed with a main, “spectral” PE system, and an accessory, “nonspectral” PE system. These midbrain-level systems constitute the earliest source of PE signals in the auditory pathway, as confirmed by our expanded linear model for the hierarchical distribution of iPE values. Our results also unravel the differential processing of MSC and CAS by lemniscal and nonlemniscal IC neurons (Fig. 2). Most importantly, we expand the established lemniscal-vs-nonlemniscal dichotomy by distinguishing three levels of hierarchical processing within the IC cortices (Fig. 7D): (1) V-neurons exhibiting mild repetition suppression without signaling PEs constitute the input level of the predictive processing circuitry in the nonlemniscal IC; (2) W- and U-neurons signaling PEs that increase saliency of near-threshold unexpected sounds establish the “spectral” PE system; and (3) X-neurons consistently reporting all surprising auditory input compose the “nonspectral” PE system.
Sharply tuned receptive fields specialize in lower-order functions such as spectral processing but are not well suited for complex auditory integration (Oliver et al., 2017). In the IC cortices, however, V-neurons discriminate fewer frequency steps than their lemniscal homologs while manifesting some contextual sensitivity. Despite not signaling PEs, V-neurons undergo stronger repetition suppression than lemniscal IC neurons, although milder than other nonlemniscal types (Fig. 6). These transitional properties arguably make V-neurons the best candidates to funnel “raw” input from the central nucleus into the predictive processing circuits of the IC cortices. V-shaped receptive fields are inherited from lower auditory nuclei (Le Beau et al., 2001), implying that V-neurons are mainly driven by bottom-up auditory input. Thus, V-neurons may constitute an “input level” in the nonlemniscal IC, akin to layer IV in the canonical microcircuits of cortical predictive processing (Bastos et al., 2012).
Broadly tuned receptive fields can integrate spectral information across many frequency channels (Oliver et al., 2017) and enable the main effect of predictive processing in the IC cortices. W- and U-neurons increase PE signaling to low-intensity sounds and ascending frequency DEVs (Figs. 3, 4). Previously reported changes of neuronal mismatch responses as a function of intensity and frequency in the IC (Duque et al., 2012, 2016) must be due to changes in the PE component, given that repetition suppression tends to remain stable across varying spectral parameters (Figs. 5, 6).
Regarding sound intensity, modeling work indicates that neurons are maximally sensitive to change for stimulus ranges where firing rates are below saturation (Abbott et al., 1997), which has been empirically confirmed in the rat IC (Duque et al., 2012). As intensity increases, W- and U-neurons exhaust their discharge capacity to signal PEs. Consequently, iPE decreases with loudness in the IC cortices (Figs. 3–6), whereas the iRS remains unaffected (Figs. 5, 6).
Regarding ascending oddballs, enhanced responses compared with descending oddballs have also been reported in electroencephalographic data of rats (Harms et al., 2014) and humans (Peter et al., 2010). This bias might help define figure-ground relationships in audition (Schneider et al., 2018, 2021). High-pitched sounds do not travel well over long distances, so they usually carry relevant information from our nearby surroundings. Neural representation of surprising changes toward higher frequencies could be facilitated as “figure” DEV sounds emerging from distant, “background” STD noise.
Hence, W- and U-neurons enhance the saliency of muted unexpected sounds both indirectly, by consistently suppressing repetitive auditory input, and directly, by selectively signaling “spectral” PEs. This neural substrate constitutes the main PE system of the auditory midbrain, which provides gain to subtle but relevant spectral cues, thereby aiding auditory perception under challenging conditions (for example, when loud background noise masks softer but informative deviant sounds), as described in the visual system (Carandini and Ferster, 1997; Hupé et al., 1998; Michel et al., 2018). Most interestingly, only the IC cortices provide this kind of predictive equalization reliant on spectral features (Fig. 3), despite MGB and AC neurons presenting multipeaked and U-shaped FRAs (Polley et al., 2007; Parras et al., 2017). Therefore, the “spectral” PE system cannot be the exclusive product of the receptive field, but must rather be an emergent property of the nonlemniscal IC circuitry. Furthermore, this agrees with the tenets of predictive processing (Friston and Kiebel, 2009b), as implementing this unique mechanism of dynamic gain in the IC would relieve the MGB and AC from such a demanding discrimination task.
Finally, the lack of delimited and stable receptive fields suggests that X-neurons operate under dynamic inhibitory input (Le Beau et al., 2001) and may receive corticofugal projections (Lesicko and Geffen, 2022). Perhaps their receptive field cannot be defined in terms of frequency–intensity combinations because they are sensitive to more abstract probabilistic structures in auditory input (Tabas et al., 2020; Peng et al., 2024) or to multisensory input (Aitkin et al., 1981; Itaya and Van Hoesen, 1982; Tawil et al., 1983; Tokunaga et al., 1984; Gutfreund et al., 2002; Porter et al., 2007; Gruters and Groh, 2012; Olthof et al., 2019). Unlike in the rest of nonlemniscal IC neurons, evoked responses in X-neurons are barely influenced by sound intensity (Figs. 4A, 5A). This prevents discharge saturations caused by loudness, thereby securing enough dynamic range to always afford increasing firing rates to signal PEs. Consequently, median iPE values in X-neurons more than double those of the rest of nonlemniscal IC neurons and are insensitive to intensity variations (Figs. 4–6, fourth columns).
Contrasting and complementing the “spectral” PE system, X-neurons provide the neural substrate for a “nonspectral” PE system, whose activity bears more resemblance to that of higher auditory regions. This similarity implies a functional connection between these neural populations through corticocollicular projections (Adams, 1980; Herbert et al., 1991; Winer et al., 2002; Malmierca and Ryugo, 2011; Schofield and Beebe, 2019), which enable an accessory PE system focused on consistently reporting unexpected auditory events while disregarding their spectral nature.
Consistent with the canonical microcircuits proposed for cortical predictive processing (Bastos et al., 2012), deep pyramidal neurons in layers V and VI generate predictions, which may be imposed on X-neurons through descending projections (Fig. 7C; Games and Winer, 1988). The densest cortical input that the IC cortices receive comes from nonlemniscal AC fields (Herbert et al., 1991). Nonlemniscal AC neurons reach the highest iPE values in the auditory pathway (Parras et al., 2017, 2021) and share remarkable response similarities with X-neurons (compare fourth columns of Figs. 3 and 4 and Fig. 7A,B), further suggesting a functional connection between them.
Although pyramidal projections are glutamatergic (Feliciano and Potashner, 1995), electrical stimulation of the AC produces excitation and inhibition in different subpopulations of IC neurons (Mitani et al., 1983; Jen et al., 1998; Zhou and Jen, 2005; Vila et al., 2019), and AC deactivation yields disinhibition of some IC neurons (Popelář et al., 2003, 2016; Blackwell et al., 2020). GABAergic neurons account for up to 25% of the neurons in the rat IC cortices (Merchán et al., 2005), where the distribution of GABA receptors is denser than in the central nucleus (Milbrandt et al., 1996; Jamal et al., 2012; Choy Buentello et al., 2015). Moreover, previous studies demonstrated that AC deactivation reduces neuronal mismatch responses in the IC (Anderson and Malmierca, 2013) by decreasing the PE component without disrupting repetition suppression (Lesicko et al., 2022). Hence, deep pyramidal AC neurons could impose top-down predictions on X-neurons by exciting inhibitory connections within the IC (Oliver, 2005; Beebe et al., 2016; Ito et al., 2018; Ito and Malmierca, 2018). As the predictive model receives updates through bottom-up PE signals, top-down predictions would evolve (Fig. 7C), which could explain the unstable and dynamic nature of mosaic FRAs.
In conclusion, FRA analysis of nonlemniscal IC neurons evidences their complementary physiological roles and affords the distinction of two early PE systems. The “spectral” PE system enhances saliency of near-threshold surprising sounds, while the “nonspectral” PE system reports unexpected events in the auditory scene in a more consistent and abstract manner. Both PE systems improve overall processing efficiency in the auditory pathway by dynamically representing informative auditory events already at midbrain level, thereby sparing MGB and AC resources that can be allocated to processing more complex aspects of audition. This subcortical capacity to generate early but contextually rich neural representations of the auditory scene contravenes the traditional corticocentric understanding (Parvizi, 2009). Hierarchical predictive processing in the IC cortices may be critical to trigger attention shifts (Jane et al., 1965; Hu and Dan, 2022), set off rapid eye–head orienting movements toward sounds (Belenkov and Goreva, 1969; Thompson and Masterton, 1978; Zwiers et al., 2004; Porter et al., 2007), and assist perception and sound-guided behavior even before auditory input reaches the cortex (Ryan and Miller, 1977; Brainard and Knudsen, 1993; Metzger et al., 2006; Rinne et al., 2008; Schelinski et al., 2022; Escera, 2023; Quass et al., 2023; Lee et al., 2024). Ultimately, our data support claims that regard the IC as a functional analog to the primary visual cortex (Nelken et al., 2003; King and Nelken, 2009), thus furthering the notion proposed by Janacsek and colleagues (2022) of subcortical cognition.
Footnotes
This work was supported by the Spanish Ministry of Science and Innovation (MICIN) [grant number PID2019-104570RB-I00]; the Ramón Areces Foundation, Madrid, Spain [grant number CIVP20A6616]; the Consejería de Educación y Cultura de la Junta de Castilla y León [grant numbers SA023P17 and SA252P20]; and a MICIN PhD Fellowship held by GVC [grant number BES-2017-080030]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Drs Edward L Bartlett and Douglas L Oliver for their insightful comments on preliminary versions of this manuscript. We thank Drs Javier Nieto-Diego, Gloria G Parras, and Catalina Valdés-Baizabal for facilitating data to increase our samples. We also thank Mr Antonio Rivas Cornejo and Ms María Torres Valles for their support with histological procedures.
The authors declare no competing financial interests.
Correspondence should be addressed to Manuel S. Malmierca at msm@usal.es.