Excitatory pyramidal neurons and inhibitory interneurons constitute the main elements of cortical circuitry and have distinctive morphological and electrophysiological properties. Here, we differentiate them by analyzing the time course of their action potentials (APs) and characterizing their receptive field properties in auditory cortex. Pyramidal neurons have longer APs and discharge as regular-spiking units (RSUs), whereas basket and chandelier cells, which are inhibitory interneurons, have shorter APs and are fast-spiking units (FSUs). To compare these neuronal classes, we stimulated cat primary auditory cortex neurons with a dynamic moving ripple stimulus and constructed single-unit spectrotemporal receptive fields (STRFs) and their associated nonlinearities. FSUs had shorter latencies, broader spectral tuning, greater stimulus specificity, and higher temporal precision than RSUs. The STRF structure of FSUs was more separable, suggesting more independence between spectral and temporal processing regimes. The nonlinearities associated with the two cell classes were indicative of higher feature selectivity for FSUs. These global functional differences between RSUs and FSUs suggest fundamental distinctions between putative excitatory neurons and inhibitory interneurons that shape auditory cortical processing.
Sensory cortex contains both excitatory and inhibitory cells whose functional role in shaping local processing has not been fully determined (Fairén et al., 1984; Houser et al., 1984). These cells are fundamental components of neocortical circuits, providing both feedforward and recurrent connections in all modalities and species (Callaway, 1998, 2004; Douglas and Martin, 2004). Although both neuron types are found in the same circuit, their connection patterns, and thus their functional properties, likely differ (Thomson et al., 2002). A constraint on this connectional complexity is that excitatory neurons can have both long-range (>1 mm) and short-range connections, whereas inhibitory interneurons display more local connection patterns (Holmgren et al., 2003; Markram et al., 2004).
In the primary auditory cortex (AI), a diverse set of cell types has been identified based on morphology, neurotransmitter type, and connectivity (Winer, 1992). Excitatory cells comprise ∼75% of the neural population and are highly correlated with pyramidal cell morphology (Douglas and Martin, 2004). Inhibitory interneurons seem much more diverse because they may be classified according to different morphological, physiological, molecular, and synaptic properties (Kawaguchi, 1993a; Kawaguchi and Kubota, 1997; Markram et al., 2004). Recent studies promise to expand these classification schemes by analyzing cells based on gene expression, suggesting an even more diverse family of inhibitory interneurons (Wang et al., 2002; Toledo-Rodriguez et al., 2004; Wang et al., 2004; Sugino et al., 2006).
Inhibitory interneurons provide feedforward and feedback inhibition onto excitatory cells, although the role served by these contacts has not been fully delineated (Miller, 2003; Gabernet et al., 2005; Cruikshank et al., 2007; Silberberg and Markram, 2007). Little is known about the receptive field properties of AI inhibitory interneurons because it is challenging to anatomically identify and record from them (Mitani and Shimokouchi, 1985; Mitani et al., 1985). In primary visual cortex, inhibitory interneurons have simple or complex receptive field properties, resembling their excitatory counterparts (Azouz et al., 1997; Hirsch et al., 2003). In somatosensory cortex, inhibitory interneurons have larger receptive fields and higher firing rates than excitatory cells (Simons and Carvell, 1989; Bruno and Simons, 2002). In contrast, little progress has been made on delineating the receptive field structure of auditory cortical inhibitory interneurons and comparing it with excitatory cells (de Ribaupierre et al., 1972; Volkov et al., 1989). Inhibitory interneurons are most readily identified using intracellular labeling and reconstruction coupled with histochemical techniques. This anatomical evidence, however, is difficult to accrue in combination with extensive physiological characterization. Recent work has used the features of extracellular action potentials (APs) to distinguish two physiological types of cortical neurons, regular-spiking units (RSUs) and fast-spiking units (FSUs). RSUs, which correspond predominantly to pyramidal neurons, have longer APs, whereas FSUs typically have briefer APs both in vitro and in vivo, thus integrating anatomical and physiological results (McCormick et al., 1985; Bruno and Simons, 2002; Andermann et al., 2004; Barthó et al., 2004). 
FSUs often correlate with parvalbumin-stained cortical cells and thus correspond to basket and chandelier cells, which are inhibitory interneurons that are essential components in the AI microcircuit (Hendry and Jones, 1991; McMullen et al., 1994; Kawaguchi and Kubota, 1997).
We classified AI neurons based on their AP shape into RSUs or FSUs and computed their spectrotemporal receptive fields (STRFs) and the accompanying nonlinearities. This permitted functional comparisons between the different neuron types according to their selectivity for different spectrotemporal stimulus features.
Materials and Methods
All procedures complied with the requirements of the University of California, San Francisco Committee for Animal Research and with the guidelines of the Society for Neuroscience and the National Institutes of Health (publication 85-23). Cats (Felis catus; female, 8–18 months of age; n = 8) were first given ketamine (22 mg/kg) and acepromazine (0.11 mg/kg) and then anesthetized with pentobarbital sodium (Nembutal, 15–30 mg/kg) for the surgical procedure. The animal's core temperature was maintained with a thermostatic heating pad. Bupivacaine was infiltrated into incisions and pressure points. A tracheotomy, reflection of the soft tissues of the scalp, craniotomy over AI, and durotomy were performed. For the recording session, the animal was maintained in an areflexive state with a continuous infusion of ketamine/diazepam (2–10 mg · kg−1 · h−1 ketamine, 0.05–0.2 mg · kg−1 · h−1 diazepam in lactated Ringer's solution).
Recordings were made in a sound-shielded anechoic chamber (IAC, Bronx, NY), with stimuli delivered monaurally to the ear contralateral to the exposed cortex via a closed speaker system (diaphragms from Stax, Saitama, Japan). Simultaneous extracellular recordings were made using multichannel silicon recording probes provided by the University of Michigan Center for Neural Communication Technology (Wise, 2005). The probes contained 16 linearly spaced recording channels, each separated by 150 μm. The area of each electrode contact was 177 μm², and the impedance of each channel was 2–3 MΩ. Probes were positioned orthogonally to the cortical surface and lowered to 2300–2400 μm using a microdrive (David Kopf Instruments, Tujunga, CA).
Neural traces were bandpass filtered between 600 and 6000 Hz and recorded to disk with a Neuralynx (Bozeman, MT) Cheetah analog-to-digital system at sampling rates between 18,000 and 27,000 Hz. Stimulus-driven neural activity was recorded for ∼75 min at each location. After each experiment, the traces were sorted off-line with a Bayesian spike-sorting algorithm (Lewicki, 1994). Each probe penetration yielded 8–16 active channels, with approximately one to two well isolated single units per channel. Only these well isolated units were used in receptive field analyses.
Neurons were first probed with pure tones and then with one or two presentations of a 15 or 20 min dynamic moving ripple stimulus, followed by ∼20 min of silence, during which spontaneous activity was recorded. Each pure tone was presented five times. The level and frequency of each pure tone were chosen randomly from 15 different levels (5 dB sound pressure level spacing) and 45 different frequencies. The dynamic ripple stimulus is a temporally varying broadband sound (500–20,000 or 40,000 Hz) composed of ∼50 sinusoidal carriers per octave, each with randomized phase (Escabí and Schreiner, 2002). The carrier magnitude at any time is modulated by the spectrotemporal envelope, consisting of sinusoidal amplitude peaks (“ripples”) on a logarithmic frequency axis that changes through time. Spectral and temporal modulation parameters define the stimulus envelope. Spectral modulation rate is characterized by the number of spectral peaks per octave. Temporal modulations are defined by the speed and direction of the changes of the peaks. The spectral and temporal modulation parameters varied randomly and independently during the stimulus. Spectral modulation rate varied slowly (maximum rate of change, 1 Hz) between 0 and 4 cycles per octave; the temporal modulation rate varied between −40 Hz (upward sweep) and 40 Hz (downward sweep), with a maximum 3 Hz rate of change. Maximum modulation depth of the spectrotemporal envelope was 40 dB. Mean intensity was set 30–50 dB above the pure-tone threshold of the neuron.
Data analysis was performed in Matlab (MathWorks, Natick, MA). For each neuron, frequency response areas were computed from the pure-tone responses, whereas the reverse correlation method was used to derive the STRF, which is the average spectrotemporal stimulus envelope preceding each spike (Aertsen and Johannesma, 1980; deCharms et al., 1998; Klein et al., 2000; Theunissen et al., 2000; Escabí and Schreiner, 2002). Positive regions of the STRF indicate that stimulus energy at that frequency and time will increase the firing rate of the neuron above the average rate, and negative regions indicate the converse.
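As an illustration of the reverse correlation step, the following is a minimal Python/NumPy sketch of a spike-triggered average. It is not the Matlab code used in the study. The function name, the binning of spikes into envelope time bins, and the assumption that the envelope has already been mean-subtracted are all hypothetical choices made for the example.

```python
import numpy as np

def spike_triggered_average(envelope, spike_bins, n_lags):
    """Average the stimulus envelope that preceded each spike.

    envelope   : (n_freq, n_time) array; assumed to be mean-subtracted
    spike_bins : time-bin indices at which spikes occurred (hypothetical binning)
    n_lags     : number of time bins of stimulus history to keep
    The last column of the result is the bin immediately before the spike.
    """
    envelope = np.asarray(envelope, dtype=float)
    sta = np.zeros((envelope.shape[0], n_lags))
    n_used = 0
    for t in spike_bins:
        if t >= n_lags:  # skip spikes that lack a full stimulus history
            sta += envelope[:, t - n_lags:t]
            n_used += 1
    return sta / n_used if n_used else sta
```

Positive entries of the returned matrix then mark frequency and time-lag combinations at which stimulus energy tended to precede spikes.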
The temporal and spectral modulation properties were characterized by computing a two-dimensional fast Fourier transform (2D FFT) of each significant STRF and then folding the absolute value of the 2D FFT along the temporal modulation frequency axis to obtain the ripple transfer function (RTF). Because the Fourier transform is sensitive to periodicities in the STRF envelope, the RTF reflects the relationship of excitatory and suppressive STRF subfields.
Temporal (tMTF) and spectral (sMTF) modulation transfer functions were obtained by summing the RTF along the spectral and temporal modulation axis, respectively. MTFs were classified as bandpass if, after identifying the peak in the MTF, values at lower and higher modulation rates decreased by at least 3 dB. If there was no such decrease, the MTF was classified as low pass. High-pass MTFs were not encountered. Best modulation rate for bandpass MTFs corresponded to the peak value in the MTF, whereas for low-pass MTFs, it was the mean between the 0 modulation frequency value and the 3 dB high-side cutoff. MTF width for bandpass MTFs was the difference between the high and low modulation rate 3 dB cutoff values, whereas for low-pass MTFs, the width was the difference between the high-side 3 dB cutoff rate and the 0 modulation frequency value.
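The MTF shape criterion can be sketched as follows. This is an illustrative Python example under stated assumptions: the MTF is given as a list of amplitude values ordered from low to high modulation rate, and the 3 dB drop is computed as 20·log10 of the amplitude ratio (a power-based MTF would use 10·log10 instead). The function name and the "highpass" return value are hypothetical; high-pass MTFs were not encountered in the data.

```python
def classify_mtf(mtf, drop_db=3.0):
    """Classify a 1-D MTF (ordered from low to high modulation rate)."""
    peak = max(range(len(mtf)), key=lambda i: mtf[i])
    # Assumption: values are amplitudes, so a 3 dB drop is a factor of 10**(-3/20).
    threshold = mtf[peak] * 10 ** (-drop_db / 20)
    low_side_drops = any(v <= threshold for v in mtf[:peak])
    high_side_drops = any(v <= threshold for v in mtf[peak + 1:])
    if low_side_drops and high_side_drops:
        return "bandpass"
    if not low_side_drops:
        return "lowpass"
    return "highpass"  # not encountered in the study; kept for completeness
```

For example, an MTF that peaks at an intermediate rate and falls by more than 3 dB on both sides is labeled bandpass, whereas one that is maximal at the lowest rate is labeled low pass.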
The inseparability of the STRF was determined by performing singular value decomposition (Depireux et al., 2001). The inseparability index (ISPI) was defined as $\mathrm{ISPI} = 1 - \sigma_1^2 / \sum_i \sigma_i^2$, where $\sigma_1$ is the largest singular value and the sum runs over all singular values $\sigma_i$. The ISPI, which ranges between 0 and 1, indicates how well the STRF may be described by a pair of one-dimensional functions: one a function of time and the other a function of frequency, with values near 0 corresponding to an STRF for which time and frequency may be dissociated.
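A minimal NumPy sketch of the inseparability calculation follows. It assumes the commonly used form of the index, one minus the fraction of STRF power carried by the largest singular value; the function name is hypothetical.

```python
import numpy as np

def inseparability_index(strf):
    """Return 1 - sigma_1^2 / sum_i sigma_i^2 for an STRF matrix (assumed form)."""
    singular_values = np.linalg.svd(np.asarray(strf, dtype=float), compute_uv=False)
    power = np.sum(singular_values ** 2)
    if power == 0:
        return 0.0  # an all-zero STRF is treated as trivially separable
    return float(1.0 - singular_values[0] ** 2 / power)
```

An STRF that is the outer product of one spectral and one temporal profile gives a value near 0, whereas an STRF whose power is spread over several singular values gives a larger value.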
Using previously described methodologies, we computed a phase-locking index (PLI) for each neuron using the relation $\mathrm{PLI} = [\max(\mathrm{STRF}) - \min(\mathrm{STRF})] / (r\sqrt{8})$, where max(STRF) and min(STRF) are the maximum and minimum values in the STRF, and r is the average firing rate (Escabí and Schreiner, 2002). The factor of $\sqrt{8}$ allows the PLI to range from 0 (not phase locked) to 1 (precisely phase locked).
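The phase-locking index can be sketched in a few lines of Python. The example assumes the relation PLI = (max(STRF) − min(STRF)) / (r·√8), with the STRF expressed in the same rate units as r; the function name and the nested-list representation of the STRF are hypothetical.

```python
import math

def phase_locking_index(strf, mean_rate):
    """PLI = (max(STRF) - min(STRF)) / (mean_rate * sqrt(8)) (assumed relation)."""
    values = [v for row in strf for v in row]
    return (max(values) - min(values)) / (mean_rate * math.sqrt(8))
```

Because the denominator scales with the mean rate, a neuron that fires many poorly timed spikes yields shallow STRF peaks relative to its rate and therefore a low PLI.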
To determine the stimulus selectivity of each neuron, we calculated a feature selectivity index (FSI) (Miller et al., 2001a; Escabí and Schreiner, 2002). For each spike generated by the neuron, the ripple envelope that preceded the spike was captured and correlated with the STRF of the neuron. The similarity index (SI) is formally defined as follows: $\mathrm{SI} = \sum_{i,j} \mathrm{stim}_{ij}\,\mathrm{STRF}_{ij} \big/ \left( \sqrt{\sum_{i,j} \mathrm{stim}_{ij}^2}\, \sqrt{\sum_{i,j} \mathrm{STRF}_{ij}^2} \right)$, where stim and STRF are matrices that represent the stimulus segment preceding a spike and the receptive field of the neuron, respectively, and i and j range over the number of rows and columns in the STRF. The SI ranges between −1 and +1 and is a measure of the spectrotemporal correlation between the stimulus and the STRF.
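The similarity index can be illustrated with the short Python sketch below. It assumes the SI is the normalized inner product of the stimulus segment and the STRF, which is consistent with the stated range of −1 to +1; the function name and the nested-list inputs are hypothetical.

```python
import math

def similarity_index(stim, strf):
    """Normalized inner product of a stimulus segment and an STRF (assumed form)."""
    pairs = [(s, f) for s_row, f_row in zip(stim, strf) for s, f in zip(s_row, f_row)]
    numerator = sum(s * f for s, f in pairs)
    stim_norm = math.sqrt(sum(s * s for s, _ in pairs))
    strf_norm = math.sqrt(sum(f * f for _, f in pairs))
    denominator = stim_norm * strf_norm
    return numerator / denominator if denominator else 0.0
```

A stimulus segment identical in shape to the STRF gives an SI near +1, its sign-inverted copy gives a value near −1, and a segment with no overlap gives 0.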
A similarity index value was calculated for each action potential, forming an SI probability distribution, P(SI), of the driven activity. Using a spike train of similar length but from random spikes (Miller et al., 2001a, 2002), we calculated SIs from the STRF of the neuron and formed a probability distribution, Prand(SI), for a random selection of stimulus segments. For each SI probability distribution, the cumulative distribution function (CDF) was then calculated according to the following: $\mathrm{CDF}(x) = \int_{-1}^{x} P(\mathrm{SI})\, d\mathrm{SI}$. The difference between the random and driven spike trains was quantified by obtaining the areas, A and Arand, under each cumulative distribution function, from which we then calculated the FSI as follows: $\mathrm{FSI} = (A_{\mathrm{rand}} - A)/A_{\mathrm{rand}}$. FSI values vary between 0 and 1, where 0 corresponds to similar distributions for Prand(SI) and P(SI), i.e., a neuron that responds indiscriminately to stimulus segments, and 1 corresponds to a neuron that is responsive to a very restricted range of stimulus features.
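The FSI calculation can be sketched as below, assuming FSI = (Arand − A)/Arand, where A and Arand are the areas under the driven and random cumulative distributions on the interval from −1 to +1. The example uses the empirical CDF of the SI samples directly rather than a binned histogram; that shortcut and the function names are assumptions made for illustration.

```python
def cdf_area(si_values):
    """Area under the empirical CDF of SI samples on [-1, 1].

    Each sample adds a step of height 1/n that extends from its SI value to +1,
    so the total area is the mean of (1 - SI).
    """
    return sum(1.0 - si for si in si_values) / len(si_values)

def feature_selectivity_index(si_driven, si_random):
    """FSI = (A_rand - A) / A_rand (assumed form)."""
    area_driven = cdf_area(si_driven)
    area_random = cdf_area(si_random)
    return (area_random - area_driven) / area_random
```

When the driven SI values cluster near +1, the driven CDF rises late, its area shrinks, and the FSI approaches 1; identical driven and random distributions give 0.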
For each STRF, we computed the nonlinear function that related the stimulus to the probability of spike occurrence. Each ripple stimulus segment, s, that elicited a spike was correlated with the STRF by projecting it onto the STRF via the inner product z = s · STRF. The projection, or stimulus-filter similarity, characterizes the probability distribution P(z|spike). We then projected a large number of randomly selected stimulus segments onto the STRF and formed the prior stimulus distribution, P(z). The projection values that comprised P(z|spike) and P(z) were transformed to units of SD by normalizing relative to the mean, μ, and SD, σ, of P(z), using x = (z − μ)/σ. The nonlinearity for the STRF was then computed as $P(\mathrm{spike} \mid x) = P(\mathrm{spike})\, P(x \mid \mathrm{spike}) / P(x)$, where P(spike) is the average firing rate of the neuron (Aguera y Arcas et al., 2003). Thus, the nonlinearity describes the likelihood of a spike occurrence given the similarity between the STRF and the stimulus. The FSI and the integral of P(x|spike) are closely related measures.
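The nonlinearity estimate can be sketched with histograms, assuming the Bayes relation P(spike|x) = P(spike)·P(x|spike)/P(x). The function name, the choice of bin edges, and the handling of empty bins (returned as NaN) are hypothetical and are not taken from the study.

```python
import numpy as np

def estimate_nonlinearity(z_spike, z_prior, mean_rate, edges):
    """Estimate P(spike | x) on histogram bins of the normalized projection x."""
    z_spike = np.asarray(z_spike, dtype=float)
    z_prior = np.asarray(z_prior, dtype=float)
    mu, sigma = z_prior.mean(), z_prior.std()
    x_spike = (z_spike - mu) / sigma  # projections in units of SD of the prior
    x_prior = (z_prior - mu) / sigma
    count_spike, _ = np.histogram(x_spike, bins=edges)
    count_prior, _ = np.histogram(x_prior, bins=edges)
    # Samples that fall outside the bin edges are ignored by np.histogram.
    p_x_spike = count_spike / count_spike.sum()
    p_x = count_prior / count_prior.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_x > 0, p_x_spike / p_x, np.nan)
    return mean_rate * ratio
```

Bins in which the spike-conditional distribution exceeds the prior yield values above the mean rate, which is how the increase in firing with stimulus-filter similarity appears in the estimate.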
We recorded STRFs from 1252 single neurons across all cortical layers in cat AI and classified 1093 neurons as either RSUs or FSUs. To distinguish FSUs from RSUs, we characterized the action potential waveforms of the recorded neurons using previously proposed criteria (Bruno and Simons, 2002; Barthó et al., 2004). Two representative action potentials, with example traces from which the action potential shapes were inferred, are shown in Figure 1A. Each action potential was analyzed, and the phases of the action potential were quantified (Fig. 1B). The dotted lines indicate different temporal phases of the waveform that were used for classification purposes.
The action potential duration was calculated as the time between the waveform maximum and minimum values (Fig. 1C) (Mitchell et al., 2007). In the histogram, a mode is present for values <0.2 ms, in accord with a previously used boundary between FSUs and RSUs (Mitchell et al., 2007). FSUs and RSUs may also be classified based on the action potential temporal phases (Fig. 1D). Here, all recorded neurons are shown, with cells that also had action potential durations <0.2 ms marked in gray. Unlike reports in barrel cortex (Bruno and Simons, 2002; Andermann et al., 2004), the phase plot does not show two distinct clusters with a clear classification boundary but reveals a continuum for the relationship between the two temporal phases. This could reflect the larger sample size or it could be a consequence of the overall faster kinetics of AI neurons (Hefti and Smith, 2003).
All neurons, including those that had action potential durations <0.2 ms, had to satisfy two criteria to be classified as FSUs. First, the sum of the two temporal phases, phase 1 and phase 2, had to be <0.6 ms. Second, a classification boundary was chosen from a previous report, in which clear separation was seen between FSUs and RSUs (Bruno and Simons, 2002) and was calculated to be (1.8824 × phase 1 + phase 2 = 0.8). A total of 115 neurons satisfied these constraints and were labeled FSUs. Neurons with (phase 1 + phase 2 > 0.7 ms) were classified as RSUs (n = 978). To reduce the probability of classification error, neurons with phase 1 and phase 2 sums >0.6 ms and <0.7 ms (n = 159) were excluded from additional analysis.
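The classification rule can be written compactly as below. This is a sketch under two assumptions that the text does not state explicitly: FSUs are taken to lie on the low side of the boundary line (1.8824 × phase 1 + phase 2 < 0.8), and units whose phase sum equals exactly 0.6 or 0.7 ms are treated as excluded. The function name and return labels are hypothetical.

```python
def classify_unit(phase1_ms, phase2_ms):
    """Label a unit from the two temporal phases of its action potential (ms)."""
    total = phase1_ms + phase2_ms
    if total > 0.7:
        return "RSU"
    # Assumption: FSUs fall below the boundary line taken from the earlier report.
    below_boundary = 1.8824 * phase1_ms + phase2_ms < 0.8
    if total < 0.6 and below_boundary:
        return "FSU"
    return "excluded"
```

The intermediate band between 0.6 and 0.7 ms acts as a guard zone, so that small measurement errors in the waveform phases are less likely to move a unit between classes.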
Because multichannel probes were used, the cortical depth of each neuron could be defined. Most neurons (778 of 1252) resided between 600 and 1600 μm, although recordings included the full cortical width. We found FSUs at all depths, with population peaks at 800–1000 μm and at 1400–1600 μm, corresponding to layer 4, and deep layer 5 and superficial layer 6 (Fig. 2A), indicating that FSUs are concentrated in different AI laminae. The distribution of RSUs was more uniform across the depth of AI (Fig. 2B). The ratio of the number of FSUs to RSUs at different depths also revealed the two FSU peaks, as well as one at the most superficial depths of cortex (Fig. 2C). The FSU profile peaks are consistent with anatomical studies in cat and rabbit AI that indicated two local peaks, in layer 4 and layer 6, for parvalbumin-stained neurons (Hendry and Jones, 1991; McMullen et al., 1994). FSUs most often correspond to basket and chandelier cells, which stain positively for parvalbumin, thus lending support for the FSU/RSU classification approach used in this study (Kawaguchi, 1993a; Kawaguchi and Kondo, 2002).
The following sections compare functional properties of FSUs and RSUs. A summary of the differences or similarities for each of the analyzed parameters is given in Table 1.
Basic receptive field parameters
Basic response properties of FSUs and RSUs were first compared using standard receptive field parameters. Frequency response characteristics were assessed by first extracting the characteristic frequency (CF) of each FSU from the STRF (Fig. 3A). Because each FSU was recorded using a multichannel probe, the mean of all RSU CFs from the same probe penetration was used as a comparison. This comparison was made for each individual FSU in a probe penetration. The FSU and RSU CF distributions were not significantly different (range, ∼6–25 kHz; p = 0.26, signed-rank test), with FSU and RSU CFs highly correlated (Fig. 3B) (r = 0.88; p < 0.01, t test), thus supporting a columnar organization of CF. The average CF difference between the mean for RSUs and for individual FSUs in a penetration was 0.12 ± 0.15 octaves. The same comparison procedure was followed for the sharpness of tuning (quality factor, Q: CF/bandwidth at 90% below excitatory peak) (Schreiner and Mendelson, 1990). Q values were significantly correlated for FSU/RSU data (Fig. 3C) (r = 0.57; p < 0.01). However, the distributions were significantly different, with FSUs more broadly tuned than RSUs (FSU mean/SD, 3.48/1.30; RSU mean/SD, 3.90/1.32; p < 0.01, signed-rank test). Peak excitatory response latency of the STRF for each FSU was also compared with that of RSUs. Here, the latency of the FSU was compared with the mean of the latencies of RSUs in the same layer and in the same penetration as the FSU, with the comparison made for each FSU in a given penetration. FSU and mean RSU latencies were significantly correlated (Fig. 3D) (r = 0.76; p < 0.01, t test) within each penetration, with FSUs having shorter latencies (FSU mean/SD, 11.6/3.4 ms; RSU mean/SD, 13.2/4.3 ms; p < 0.01, signed-rank test).
A basic response characteristic of a neuron is its firing rate in response to a stimulus. To compare FSU and RSU response patterns, we plotted the distribution of firing rates over the course of the ripple stimulus.
FSUs are dominated by low firing rates for the ripple stimulus, mostly less than 3 spikes per second (sp/s) (Fig. 4A) (FSU median, 2.66 spikes per second), whereas RSUs show a clear exponential distribution of firing rates, consistent with previous experimental and theoretical work (Baddeley et al., 1997; Attwell and Laughlin, 2001; Lennie, 2003). The RSU firing rate distribution (Fig. 4B) decays smoothly from low to high rates (RSU median, 4.40 sp/s). The firing rate CDFs (Fig. 4C) separate at ∼3 sp/s and are significantly lower for FSUs [p < 0.01, Kolmogorov–Smirnov (KS) test].
Although FSU firing rates were significantly lower, this may not affect the temporal precision of FSU responses. We computed a PLI for each neuron: if spikes were poorly synchronized to a particular stimulus phase, i.e., they show considerable temporal jitter, then the STRF peaks and troughs would be shallower, scaling with the average spike rate (Escabí and Schreiner, 2002).
FSUs and RSUs had significantly different degrees of phase locking (Fig. 5A,B). FSUs had higher PLIs and thus higher temporal precision in their relationship of spike occurrence and stimulus waveform (FSU mean/SD, 0.19/0.14; RSU mean/SD, 0.11/0.09). This difference is evident in the cumulative distribution function (Fig. 5C) (p < 0.01, KS test). The temporal jitter of RSU spikes was 65% larger than for FSUs. Thus, FSUs fire fewer spikes, although the generated spikes are aligned more precisely to specific ripple stimulus segments and thus represent a more accurate and reproducible temporal response pattern than RSUs.
Spectrotemporal receptive field parameters
STRF structure reflects a more complete picture of the spectrotemporal stimulus features that elicit a neuronal response than the properties considered thus far. The RTF for each neuron constitutes an equivalent way of characterizing these features and is computed from the 2D FFT of the STRF (Fig. 6A,B). The RTF has axes that represent the spectral and temporal modulation spectra captured by a neuron (Fig. 6C).
Combining all recordings, the distributions of best temporal modulation frequency (bTMF) were similar for FSUs and RSUs (Fig. 6D,E). The distribution of bTMF values (FSU mean/SD, 14.1/6.15 Hz; RSU mean/SD, 13.0/5.25 Hz) spanned the range expected in AI, ∼4–24 Hz (Joris et al., 2004), and were not significantly different (Fig. 6F) (p = 0.059, KS test), although a trend toward lower RSU values was detected. Cumulative distributions of the encountered tMTF bandwidths were also not significantly different (data not shown; p = 0.208, KS test), nor were there differences in best spectral modulation frequency and spectral MTF width (data not shown; p = 0.230 and p = 0.109, respectively, KS test).
Best modulation frequency and MTF bandwidth are basic parameters characterizing temporal and spectral envelope filters (Joris et al., 2004). The overall shape of the MTF is another descriptor, which for auditory cortical neurons is usually in the form of low-pass or bandpass modulation filters. We found that FSU and RSU filter shapes were significantly different.
tMTFs were significantly more bandpass than low pass for FSUs, with 65% of FSUs having bandpass tMTFs compared with 47% of RSUs (Fig. 7, top). The difference was smaller but the result was the same for sMTFs, with 25% of FSUs and 15% of RSUs having bandpass sMTFs (Fig. 7, bottom). The null hypothesis was tested by pooling all FSU and RSU MTF shape data, for either temporal or spectral MTFs, and drawing 115 values from this distribution to represent simulated FSU data and 978 values for RSUs. From these groups, the bandpass percentages in the simulated FSU and RSU distributions were computed, and this process was repeated 25,000 times to form a resampled distribution. The observed percentages of bandpass MTFs were then compared with the resampled distribution, and it was found that FSU tMTFs were significantly more bandpass (p < 0.0001). In addition, more FSU sMTFs were bandpass (p = 0.016).
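The resampling test can be sketched as follows. Several details are assumptions, because the text does not specify them: values are drawn from the pooled data without replacement (a permutation-style split), the test statistic is the difference between the FSU and RSU bandpass fractions, and the p value is one-sided. The function name, argument names, and the fixed seed are hypothetical.

```python
import random

def bandpass_resampling_p(fsu_bandpass, n_fsu, rsu_bandpass, n_rsu,
                          n_iter=25000, seed=0):
    """One-sided p value for a higher bandpass fraction in FSUs than in RSUs."""
    rng = random.Random(seed)
    total_bandpass = fsu_bandpass + rsu_bandpass
    # Pooled shape labels: 1 = bandpass, 0 = low pass.
    pool = [1] * total_bandpass + [0] * (n_fsu + n_rsu - total_bandpass)
    observed = fsu_bandpass / n_fsu - rsu_bandpass / n_rsu
    exceed = 0
    for _ in range(n_iter):
        sim_fsu = sum(rng.sample(pool, n_fsu))  # simulated FSU group
        sim_rsu = total_bandpass - sim_fsu      # remainder forms the RSU group
        if sim_fsu / n_fsu - sim_rsu / n_rsu >= observed:
            exceed += 1
    return exceed / n_iter
```

With the group sizes used here (115 FSUs and 978 RSUs), the function would be called with the observed bandpass counts for each class; a small returned value indicates that the observed difference is unlikely under random assignment of MTF shapes to cell classes.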
STRF shape contains important information about the interaction between temporal and spectral processing aspects and may be quantified by the inseparability index. The index measures how well an STRF is approximated by a single pair of one-dimensional functions of frequency and time, respectively (Depireux et al., 2001). If the STRF must be approximated by multiple pairs of one-dimensional functions, the inseparability index will be ∼1, whereas values near 0 indicate that the approximation by a single pair of functions is appropriate, and thus, time and frequency may be dissociated in the STRF. Inseparability indices revealed significant differences in STRF shape for the two neuronal classes (FSU mean/SD, 0.36/0.18; RSU mean/SD, 0.44/0.18), with FSUs having more separable STRFs than RSUs (Fig. 8A,B).
Population comparisons using CDFs show the RSU distribution shifted to higher inseparability index values as well (Fig. 8B) (p < 0.01, KS test). Thus, the spiking property of a cell can predict some aspects of its spectrotemporal processing regime.
Although FSUs and RSUs differ in their spectrotemporal receptive field structure, these differences do not reveal the selectivity of each cell type for specific spectrotemporal stimulus patterns. We therefore computed FSIs, in which FSI values near 0 indicate a neuron unselective for spectrotemporal stimuli and values near 1 indicate a neuron responsive to few spectrotemporal patterns (Escabí et al., 2005). FSUs were significantly more feature selective (FSU mean/SD, 0.14/0.10; RSU mean/SD, 0.09/0.07) (Fig. 9A,B), as seen in the CDFs for both neuron types (p < 0.01, KS test) (Fig. 9C).
Only properties derived directly from the linearly accumulated STRFs have been examined. However, the relationship between stimulus, STRF, and neural response is often nonlinear. A nonlinear function applied to the linear filter can capture the relationship between the stimulus and the expected neural response. It describes the firing probability of a neuron as the similarity, or correlation, between the stimulus and the STRF changes (Fig. 10A) and forms a fundamental component in linear/nonlinear cascade models of neuronal function (Chichilnisky, 2001; Schwartz et al., 2006). From the example in Figure 10A, the firing rate increases as the similarity between segments of the dynamic moving ripple stimulus and the STRF increases. If the stimulus configuration and the filter (or STRF) properties are anticorrelated (negative abscissa values in Fig. 10A), the firing probability remains low, constituting an asymmetric nonlinearity. Asymmetric nonlinearities can be strongly monotonic (Fig. 10B), or the nonlinearity may plateau at higher similarity values (Fig. 10C). Nonmonotonic nonlinearities represent neurons selective for stimuli that are not optimally matched to the filter (Fig. 10D).
We quantitatively analyzed the skewness, monotonicity, and asymmetry of FSU and RSU nonlinearities. Skewness is the third central moment divided by the cube of the SD: $\mathrm{skewness} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^3 / \sigma^3$, where Yi are the nonlinearity values, μ is the mean of the nonlinearity, N is the number of nonlinearity points, and σ is the SD of the nonlinearity. Skewness describes the amount of information in the tail of the nonlinearity. Gradual increases in firing rate with increasing stimulus–STRF correlation lead to a skewness near 1 (Fig. 11A), whereas skewness values are higher for nonlinearities with more rapidly increasing firing rates at higher similarity values (Fig. 11B). FSU skewness distributions were generally higher than for RSUs (Fig. 11C,D) (FSU mean/SD, 1.72/0.68; RSU mean/SD, 1.52/0.71). Analysis of the cumulative distribution showed the same result (Fig. 11E) (p < 0.01, KS test), which was validated by resampling analysis (Bain and Engelhardt, 1992). This finding suggests that the nonlinearities enable a higher feature selectivity of putative inhibitory interneurons compared with excitatory neurons.
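The skewness definition, the third central moment divided by the cube of the SD, can be sketched as follows. The example assumes the population form of the SD (division by N rather than N − 1); the function name is hypothetical.

```python
def nonlinearity_skewness(values):
    """Third central moment divided by the cube of the SD (population form assumed)."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    if second_moment == 0:
        return 0.0  # a flat nonlinearity has no tail
    return third_moment / second_moment ** 1.5
```

A nonlinearity that stays near zero over most similarity values and rises sharply only at the highest values has a long upper tail and therefore a large positive skewness.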
The shape of the nonlinearities can be characterized by determining their monotonicity. If the spiking probability for at least the two highest stimulus-filter similarities was less than the maximum probability, the nonlinearity was classified as nonmonotonic. The majority of both FSU (69.6%) and RSU (64.4%) nonlinearities were monotonic (Fig. 12). The resampling analysis revealed no significant difference between the two cell groups (p > 0.01).
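The monotonicity criterion can be expressed as a short Python check. The sketch assumes that the nonlinearity is given as a list of spiking probabilities ordered by increasing stimulus-filter similarity; the function name is hypothetical.

```python
def is_nonmonotonic(nonlinearity):
    """True if the values at the two highest similarities are both below the maximum."""
    if len(nonlinearity) < 2:
        return False
    peak = max(nonlinearity)
    return all(value < peak for value in nonlinearity[-2:])
```

Requiring both of the final two points to fall below the maximum makes the label less sensitive to a single noisy estimate in the last, sparsely sampled bin.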
Last, we examined the asymmetry of the nonlinearities. An asymmetry index (ASI) quantifies the difference in spiking probability for positive and negative correlations between stimulus and STRF. The ASI was defined as (R − L)/(R + L), where R and L are the sums of all nonlinearity values that correspond to similarity values greater than or less than 0, respectively. The index ranges from −1 to 1, with 0 representing a nonlinearity that is completely symmetric for positive and negative stimulus similarity values, implying that correlated and anticorrelated stimuli equally influence the probability of a neural response. ASIs near 1 or −1 indicate neurons that have an increased probability of spiking when the stimulus is either positively or negatively correlated with the filter, respectively. FSUs had nonlinearities that were more positively asymmetric than those of RSUs (Fig. 13) (FSU mean/SD, 0.84/0.17; RSU mean/SD, 0.76/0.17; p < 0.01, KS test; also confirmed by resampling). The higher asymmetry for FSUs again suggests that putative inhibitory interneurons tend toward higher feature selectivity, i.e., they prefer more complete matches between a spectrotemporal stimulus and filter features.
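The asymmetry index, (R − L)/(R + L), can be sketched as below. The function name and the paired-list inputs are hypothetical; values at a similarity of exactly 0 are left out of both sums, following the definition of R and L as sums over similarities greater than or less than 0.

```python
def asymmetry_index(similarity, nonlinearity):
    """ASI = (R - L) / (R + L) for a sampled nonlinearity."""
    right = sum(y for x, y in zip(similarity, nonlinearity) if x > 0)
    left = sum(y for x, y in zip(similarity, nonlinearity) if x < 0)
    total = right + left
    return (right - left) / total if total else 0.0
```

A nonlinearity that responds only to positively correlated stimuli gives an index of 1, and one that responds equally to correlated and anticorrelated stimuli gives 0.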
Interdependence of STRF characteristics
We have analyzed 10 main response characterizations based on STRFs and their associated nonlinearities to reveal differences between FSUs and RSUs. To assess whether these parameters covaried, we performed a factor analysis to determine which of the different parameters (firing rate, phase-locking index, best temporal modulation frequency, bandwidth of temporal MTF, spectral best modulation frequency, bandwidth of spectral MTF, inseparability index, feature selectivity index, nonlinearity skewness, and nonlinearity asymmetry) could be treated independently. We pooled the FSU and RSU data separately and obtained the eigenvalues from the correlation matrix to estimate the number of independent factors that could best account for the variance in each dataset. Three independent factors emerged for both cell classes (eigenvalues >1.7) (Table 2). Additional factors had low eigenvalues (<0.9) and were not considered further. The three factors had the same basic composition for FSUs and RSUs. The largest factor revealed positive correlations between feature selectivity, phase-locking ability, and the asymmetry of the nonlinearity. Firing rate was negatively correlated with these parameters. This factor primarily captures response modes including response strength, timing, and stimulus specificity. The other two factors capture independently the spectral and temporal modulation preferences of the neurons, respectively. The distribution of individual neurons along these three factors was broad and unimodal, showing no unequivocal clustering that could be interpreted as functionally distinct neuronal subgroups. This finding suggests that FSU and RSU responses fall along a continuum, from low to high stimulus specificity, with accompanying changes in firing rate, timing precision, and nonlinearity asymmetry. This behavior is independent of any particular receptive field property, such as best temporal or spectral modulation frequency.
Inhibition has been implicated in multiple aspects of auditory cortical processing, including the establishment and shaping of frequency tuning. Although the frequency tuning of synaptic excitatory and inhibitory inputs is well matched (Wehr and Zador, 2003; Zhang et al., 2003), spike-based tuning curves are usually narrower than the frequency range of the synaptic inputs (Tan et al., 2004) and broaden with the application of a GABAA antagonist (Chang et al., 2005). This suggests that the shaping of spectral receptive fields may reflect specific interactions between excitatory and inhibitory synaptic components as well as biophysical mechanisms present in the spike-generating process (Wilent and Contreras, 2005).
Intensity tuning of cortical neurons, i.e., the nonmonotonic behavior of rate-level functions, can also be shaped or even created by an imbalance of cortical inhibition and excitation (Wu et al., 2006; Tan et al., 2007). In these cases, monotonic excitatory rate-level functions can be transformed into nonmonotonic membrane-potential growth, with inhibition dominating at higher sound levels (Tan et al., 2007).
The relative timing of excitatory and inhibitory inputs also plays a significant role in determining auditory cortical responses to more complex sounds. The direction selectivity for frequency modulation (FM) sweeps is correlated with the delay between excitatory and inhibitory inputs (Zhang et al., 2003). Furthermore, the relative timing between excitation and inhibition may affect spike precision (Wehr and Zador, 2003). Finally, the ability to change receptive field properties through plasticity mechanisms is enabled by inhibitory influences. Whole-cell recordings showed that short-term receptive field plasticity occurs when inhibition is reduced during the exposure to nonpreferred stimulus configurations (Froemke et al., 2007).
The current study showed that auditory cortical FSUs (putative inhibitory interneurons) differ from RSUs (putative excitatory pyramidal neurons) in nearly all investigated STRF properties and their associated nonlinearities (Table 1). Although many individual differences are relatively modest, combined they indicate a clear functional segregation between the two neuronal classes. Additionally, we found that cortical response behavior could be captured by three independent factors that equally apply to FSUs and RSUs (Table 2). The first factor encapsulates the response mode along a continuum of feature selectivity, firing rate, and spike-timing precision. The other two factors characterize the spectral and temporal modulation preferences of the neuron classes.
Classification of FSUs and RSUs
FSUs have been described in slice (McCormick et al., 1985; Rose and Metherate, 2005) and in both the anesthetized (Azouz et al., 1997; Andermann et al., 2004) and unanesthetized (de Ribaupierre et al., 1972; Simons, 1978; Volkov et al., 1989) in vivo preparations and, thus, represent categorical distinctions essentially independent from experimental procedures.
Spike-waveform-based identification of putative inhibitory interneurons, FSUs, is imperfect (Simons, 1978; Volkov et al., 1989; Andermann et al., 2004; Barthó et al., 2004) but shows excellent accord with anatomical results (McCormick et al., 1985; Kawaguchi and Kubota, 1993; Kawaguchi and Kondo, 2002; Swadlow, 2003). Intracellular recordings combined with morphological reconstructions and cytochemical analysis are required for a full identification but were not made in the present study, and thus some degree of classification error may be present. Our results apply only to FSUs and not other AI inhibitory neurons, because FSUs comprise a subset of neocortical GABAergic cells (Kawaguchi, 1993b; Kawaguchi and Kubota, 1993; Markram et al., 2004). Some pyramidal cells may also have short-duration action potentials that may create classification errors (Dykes et al., 1988; Gray and McCormick, 1996). Thus, although most FSUs are inhibitory neurons, the methodological concerns noted above have led to their classification as putative inhibitory interneurons (Swadlow, 1994, 1995).
Most cells had action potential waveforms consistent with RSUs; thus, ∼90% of the cells in this study were likely pyramidal. In rabbit AI, ∼10% of layer 4 cells stain positively for parvalbumin (McMullen et al., 1994). We found that 13% (34 of 257) of the cells in layer 4 were FSUs, a class that includes parvalbumin-positive neurons (Kawaguchi and Kondo, 2002). This suggests a similar proportion of FSUs and RSUs in AI of cat and rabbit. Lower proportions of FSUs in layers 2/3 and upper layer 5 also echo distributions obtained in somatosensory and visual cortex (Demeulemeester et al., 1989; van Brederode et al., 1991).
Receptive field features
The sharpness of excitatory frequency tuning differed between simultaneously recorded FSUs and RSUs despite the similarity of their local characteristic frequency. This is compatible with the notion that the spike-based tuning of RSUs, the potential synaptic target of FSUs, is narrower than their synaptic inputs (Tan et al., 2004).
Response latency was shorter for FSUs versus RSUs within a layer, perhaps enabling them to influence nearby cells with feedforward inhibition (Gabernet et al., 2005). This inhibition is likely to be strong, because FSUs receive stronger thalamocortical innervation than RSUs (Bruno and Simons, 2002). FSUs show greater phase locking to the spectrotemporal stimulus envelope and thus more precise spiking. Because FSUs may make somatic contact on pyramidal cells, the shorter latencies and high temporal precision of FSUs may control the temporal precision, response duration, and spectrotemporal response profile of other local neurons (Miles et al., 1996; Kawaguchi and Kubota, 1997; Zhang et al., 2003). The generation or shaping of intensity tuning (Tan et al., 2007) and FM direction selectivity (Zhang et al., 2003) may also reflect these FSU timing features.
Spectrotemporal receptive field features
Temporal and spectral modulations are prominent aspects of communication sounds. Tuning to particular modulation frequencies may help listeners to correctly identify speech (Drullman et al., 1994a,b; Drullman, 1995; Shannon et al., 1995; Smith et al., 2002). Likewise, spectral modulation information is important for communication sound processing because it is challenging for listeners to discriminate speech with degraded spectral envelopes (Leek et al., 1987; Shannon et al., 1998; Smith et al., 2002; Dreisbach et al., 2005). The global similarity of the FSU and RSU modulation properties indicates that the two classes likely perform similar transformations of some aspects of their thalamic modulation input, e.g., preferred spectral and temporal modulation frequency, and different transformations of others, such as the shape of MTFs. Connected thalamocortical neuron pairs often differ in their modulation properties (Miller et al., 2001b), and cortical modulation behavior may be shaped by local inhibitory mechanisms. However, the precise role of inhibition in determining temporal modulation behavior is still unclear (Kurt et al., 2006), and other factors, such as convergence of different modulation ranges and synaptic depression/facilitation, may play major roles in the generation of cortical modulation responses (Eggermont, 2002).
With regard to spectral modulation tuning, spectral MTF shape may remain constant as spectral contrast varies (Calhoun and Schreiner, 1998), and thus inhibition, with spectral modulation tuning matched to that of excitation, could contribute to cortical coding of vowel features, such as formants.
Separability and feature selectivity
STRF structure differs for FSUs and RSUs. FSU STRFs were more separable, thus more fully dissociating spectral and temporal processing because they can be approximated as the product of two independent functions. This implies less context dependence for spectral and temporal processing aspects than for RSUs. Whether this reflects different cortical connection patterns and/or different distributions and kinetic properties of GABAergic inputs to RSUs (Hefti and Smith, 2003) is unknown because detailed records of corticocortical inputs to inhibitory interneurons are not yet available.
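The idea that a separable STRF is the product of a spectral function and a temporal function can be made concrete with a short sketch. It uses one common formulation of inseparability, based on the singular value decomposition, which is assumed here for illustration and is not taken from this section; the toy STRFs and the function name `inseparability_index` are hypothetical.

```python
import numpy as np

def inseparability_index(strf):
    """1 - (energy in the first singular value) / (total energy).

    Returns ~0 for a fully separable (rank-1) STRF and approaches 1 as more
    spectral-temporal components are needed. This definition is an assumption.
    """
    s = np.linalg.svd(strf, compute_uv=False)
    return 1.0 - s[0] ** 2 / np.sum(s ** 2)

freq = np.linspace(-1.0, 1.0, 32)  # arbitrary frequency axis
time = np.linspace(0.0, 1.0, 40)   # arbitrary time axis

# Separable toy STRF: outer product of one spectral and one temporal profile.
spectral = np.exp(-freq ** 2 / 0.1)
temporal = np.exp(-time / 0.2) * np.sin(2 * np.pi * 4 * time)
separable_strf = np.outer(spectral, temporal)

# Inseparable toy STRF: a ridge whose preferred frequency shifts over time,
# so no single spectral-temporal product can describe it.
F, T = np.meshgrid(freq, time, indexing="ij")
sweep_strf = np.exp(-((F - (2 * T - 1)) ** 2) / 0.05)

sep_index = inseparability_index(separable_strf)
sweep_index = inseparability_index(sweep_strf)
print(sep_index, sweep_index)
```

In this sketch the outer-product STRF yields an index near zero, whereas the swept ridge needs several singular components and yields a clearly larger value.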
FSUs had much higher feature selectivity indices than RSUs. Thus, FSUs respond to a much smaller stimulus set during the ripple stimulus, resulting in a lower overall firing rate for these neurons. Consequently, FSUs will likely have higher information values per spike than RSUs, because spike rate and information per spike are inversely related, as shown for a subset of inferior colliculus neurons (Escabí et al., 2005). These higher information values may relate to FSU channel properties, because a likely contributor to greater feature selectivity is a higher spiking threshold for FSUs, similar to the high feature selectivity of inferior colliculus neurons (Escabí et al., 2005).
The lower feature selectivity of RSUs implies that they respond to a wider range of spectrotemporal patterns, suggesting increased context sensitivity (Fritz et al., 2003). FSUs transmit local information that is spectrotemporally restricted and temporally precise and, thus, are poised to play a crucial role in the dynamic properties of RSUs, including short-term and long-term spectrotemporal dynamics and plasticity (Fritz et al., 2003; Polley et al., 2006; Froemke et al., 2007).
Nonlinearities of FSUs and RSUs
The response probability of FSUs increases strongly as the correlation between the STRF and the stimulus grows, consistent with the finding that FSUs are more feature selective than RSUs. Feature selectivity in this context means that the variability of the stimuli eliciting a spike is decreased relative to that of RSUs. This decrease, combined with greater temporal precision and shorter latencies, places FSUs in a strong position to supply fast and temporally precise inhibition to neighboring cells (Gabernet et al., 2005).
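The relationship between response probability and the STRF-stimulus correlation can be sketched as follows. This is an illustrative simulation, not the procedure used in the study: projection values are drawn directly from a Gaussian rather than computed from a ripple stimulus, the two model neurons and their thresholds are invented, and `estimate_nonlinearity` is a hypothetical helper.

```python
import numpy as np

def estimate_nonlinearity(projections, spikes, n_bins=12):
    """Spike probability per time bin as a function of normalized projection.

    projections: STRF-stimulus similarity for each time bin.
    spikes: 0/1 spike indicator for each time bin.
    """
    z = (projections - projections.mean()) / projections.std()
    edges = np.linspace(-3.0, 3.0, n_bins + 1)
    occupancy, _ = np.histogram(z, bins=edges)
    spike_sum, _ = np.histogram(z, bins=edges, weights=spikes)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        prob = np.where(occupancy > 0, spike_sum / occupancy, np.nan)
    return centers, prob

def simulate_spikes(projections, threshold, rng, slope=0.3):
    """Bernoulli spiking through a sigmoid of the projection (assumed model)."""
    p = 1.0 / (1.0 + np.exp(-(projections - threshold) / slope))
    return (rng.random(projections.size) < p).astype(float)

rng = np.random.default_rng(1)
x = rng.standard_normal(20000)  # stand-in for STRF-stimulus projections

# Hypothetical neurons: a high threshold gives a selective unit that fires
# only for stimuli closely matching its STRF; a low threshold gives a broad one.
selective_spikes = simulate_spikes(x, threshold=2.0, rng=rng)
broad_spikes = simulate_spikes(x, threshold=0.0, rng=rng)

centers, selective_nl = estimate_nonlinearity(x, selective_spikes)
_, broad_nl = estimate_nonlinearity(x, broad_spikes)
print(np.round(selective_nl, 3))
print(np.round(broad_nl, 3))
```

In this toy model the high-threshold unit has a nonlinearity that stays near zero for most projection values and rises steeply only for the best-matching stimuli, together with a lower mean spike probability, which mirrors the qualitative pattern described above.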
A fundamental question about neocortex is how variations on a common excitatory template are used to accomplish the tasks native to each sensory system (Douglas and Martin, 2004). A component of this question may be addressed by examining the functional properties of inhibitory neurons in the microcircuit of different systems. In the visual cortex, inhibitory neurons have either simple or complex receptive fields (Azouz et al., 1997; Hirsch et al., 2003). There appears to be no significant difference in receptive field structure between putative inhibitory interneurons (FSUs) and putative excitatory neurons (RSUs) with respect to basket cells, a major FSU candidate (Hirsch et al., 2003). Parallels in receptive field structure of FSUs and RSUs in visual cortex contrast with auditory and somatosensory cortex, and specifically barrel cortex, in which basic receptive field properties diverge, and FSUs have larger bandwidths and different firing rates than RSUs (Simons, 1978; Simons and Carvell, 1989). The spectral integration of AI FSUs also seems broader than for RSUs, and the structure of the receptive fields and the temporal response precision differ between these neuron classes. Significantly, the biophysical properties of AI inhibitory neurons may be adapted for fast temporal processing, because the channel kinetics of AI neurons are faster than those in other sensory systems (Hefti and Smith, 2003). Thus, in total, the current study provides a preliminary description of the differences in spectrotemporal processing between FSUs and RSUs and begins to resolve the functional role of inhibition in auditory cortex.
This work was supported by National Institutes of Health Grants DC02260 and MH 077970 and by Hearing Research (San Francisco, CA). We thank Andrew Tan, Marc Heiser, Kazuo Imaizumi, and Benedicte Philibert for experimental assistance; Mark Kvale for the use of his SpikeSort1.3 Bayesian spike-sorting software; and Jeff Winer for previous comments on this manuscript.
Correspondence should be addressed to Craig A. Atencio, 513 Parnassus Avenue, HSE 834, Box 0732, San Francisco, CA 94143-0732.