Abstract
Interaural time differences (ITDs) are the dominant cue for the localization of low-frequency sounds. While much is known about the processing of ITDs in the auditory brainstem and midbrain, there have been relatively few studies of ITD processing in auditory cortex. In this study, we compared the neural representation of ITDs in the inferior colliculus (IC) and primary auditory cortex (A1) of gerbils. Our IC results were largely consistent with previous studies, with most cells responding maximally to ITDs that correspond to the contralateral edge of the physiological range. In A1, however, we found that preferred ITDs were distributed evenly throughout the physiological range without any contralateral bias. This difference in the distribution of preferred ITDs in IC and A1 had a major impact on the coding of ITDs at the population level: while a labeled-line decoder that considered the tuning of individual cells performed well on both IC and A1 responses, a two-channel decoder based on the overall activity in each hemisphere performed poorly on A1 responses relative to either labeled-line decoding of A1 responses or two-channel decoding of IC responses. These results suggest that the neural representation of ITDs in gerbils is transformed from IC to A1 and have important implications for how spatial location may be combined with other acoustic features for the analysis of complex auditory scenes.
Introduction
The ability to accurately localize sounds is critical for directing behavior, as well as for identifying and segregating individual sources within complex acoustic scenes (Cherry, 1953; Bronkhorst, 2000; Darwin, 2008). The dominant cue for the localization of a low-frequency sound such as speech is the difference in its arrival time at the two ears, referred to as the interaural time difference (ITD; Wightman and Kistler, 1992). ITD sensitivity in the mammalian brain arises in the medial superior olive (MSO) in the auditory brainstem, where cells are sensitive to microsecond differences in the arrival time of inputs from the two ears. The spike rates of cells in the MSO and subsequent subcortical processing stages are modulated by ITD, with most cells responding preferentially to sounds with ITDs corresponding to locations in the contralateral hemifield (for review, see Grothe et al., 2010).
While ITD processing in subcortical areas has been extensively studied, there have been relatively few studies of ITD processing in auditory cortex. Auditory cortex is clearly necessary for ITD processing in both animals and humans, but the effects of lesions differ: in animals, a lesion in either hemisphere causes a deficit in spatial processing for the contralateral hemifield (Jenkins and Masterton, 1982; Jenkins and Merzenich, 1984; Malhotra et al., 2004), whereas in humans the right auditory cortex appears to be both necessary and sufficient for ITD processing (Yamada et al., 1996; Tanaka et al., 1999). ITD tuning in primary auditory cortex (A1) was first reported several decades ago (Brugge et al., 1969; Brugge and Merzenich, 1973), but the few large-sample studies of A1 performed since then have produced inconsistent results: a study in cats reported results similar to those in subcortical areas, with nearly all cells responding preferentially to ITDs corresponding to locations in the contralateral hemifield (Reale and Brugge, 1990), while studies in chinchillas, rabbits, and monkeys reported a weaker contralateral bias with preferred ITDs distributed more evenly across the physiological range (Benson and Teas, 1976; Fitzpatrick et al., 2000; Scott et al., 2009). There have been no direct studies of single-cell ITD sensitivity in human cortex, but recent EEG and MEG studies suggest a strong contralateral bias (Magezi and Krumbholz, 2010; Salminen et al., 2010).
In this study, we characterize the neural representation of ITD in A1 of gerbils, one of the most widely used model species for studies of ITD processing. In gerbils, the vast majority of cells in subcortical structures have preferred ITDs corresponding to locations in the contralateral hemifield (Spitzer and Semple, 1995; Siveke et al., 2006; Pecka et al., 2008; Lesica et al., 2010), consistent with a two-channel representation in which the ITD of a sound is encoded by the difference in the overall activity of the two brain hemispheres (McAlpine et al., 2001). Here we show that the neural representation of ITDs is transformed between inferior colliculus (IC) and A1 such that the preferred ITDs of A1 cells are distributed evenly throughout the physiological range without any contralateral bias. We examine the impact of this transformation on the population coding of ITDs and assess the ability of two-channel and labeled-line codes to account for gerbil behavioral acuity.
Materials and Methods
In vivo recordings.
All procedures were approved under the UK Animals (Scientific Procedures) Act of 1986. Nineteen adult male gerbils (70–90 g, P60–P120) were anesthetized for surgery with an initial injection of either a mix of fentanyl, medetomidine, and midazolam or a mix of ketamine and xylazine, and the same solution was infused continuously during recording. A small metal rod was mounted on the skull and used to secure the head of the animal in a stereotaxic device in a sound-attenuated chamber. A craniotomy was made over the inferior colliculus or the primary auditory cortex, an incision was made in the dura mater, and a multi-tetrode array (NeuroNexus) was inserted into the brain. The array had four shanks spaced 0.2 mm apart, and each shank had two tetrodes spaced 0.15 mm apart. Recordings were made with a sampling rate of 25 kHz. Only recordings from the central nucleus of the IC and A1 were analyzed. Because the array covered a large area, recording sites in the central nucleus of the IC could be distinguished from those in other areas by comparison of their responses to tones (Aitkin et al., 1975; Syka et al., 2000), and A1 could be distinguished from other fields based on the direction of the tonotopic gradient (Thomas et al., 1993). A1 recordings were made between 1 and 1.5 mm below the cortical surface (most likely layer V; Happel et al., 2010). We chose to record in layer V because, in pilot experiments, we found the single-unit yield to be higher there than in layer IV (we did not try other layers). Although it is difficult to say exactly why this would be the case, layer V cells are, relative to layer IV cells, large and sparsely packed and fire at lower rates and less synchronously, which may allow single units to be separated more easily from the multi-unit background. In both IC and A1, recordings were targeted to areas with low preferred frequencies.
Spike sorting.
The procedure for the isolation of single-unit spikes consisted of (1) bandpass filtering each channel of the tetrode array between 500 and 5000 Hz; (2) whitening each tetrode, i.e., projecting the signals from the four channels into a space in which they are uncorrelated; (3) identifying potential spikes as snippets with energy (Choi et al., 2006) that exceeded a threshold (with a minimum of 0.7 ms between potential spikes); (4) projecting each of the snippets into the space defined by the first three principal components for each channel; (5) identifying clusters of snippets within this space using KlustaKwik (http://klustakwik.sourceforge.net) and Klusters (Hazan et al., 2006); and (6) quantifying the likelihood that each cluster represented a single unit using isolation distance (Schmitzer-Torbert et al., 2005). Isolation distance assumes that each cluster forms a multidimensional Gaussian cloud in feature space and measures, in terms of the SD of the original cluster, the increase in the size of the cluster required to double the number of snippets within it. The number of snippets in the “noise” cluster (multi-unit activity) for each tetrode was always at least as large as the number of spikes in any single-unit cluster. Only clusters with an isolation distance >20 were classified as single units and included in our analysis.
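Steps (2) and (3) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes the signal has already been bandpass filtered, uses ZCA whitening (one of several possible whitening transforms), and substitutes the instantaneous sum of squares across the four channels for the energy measure of Choi et al. (2006). The function names and the threshold value are hypothetical.

```python
import numpy as np


def whiten(x):
    """ZCA-whiten a (n_channels, n_samples) tetrode signal so that the
    channels become uncorrelated with unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(np.cov(x))  # channel covariance is symmetric
    w = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return w @ x


def detect_snippets(x, fs, threshold, min_gap_s=0.7e-3):
    """Return sample indices at which the summed energy exceeds threshold,
    keeping at least min_gap_s seconds between successive candidates.

    Energy is approximated here by the instantaneous sum of squares across
    channels (an assumed stand-in for the measure of Choi et al., 2006)."""
    energy = np.sum(x ** 2, axis=0)
    min_gap = int(round(min_gap_s * fs))
    peaks, last = [], -min_gap
    for i in np.flatnonzero(energy > threshold):
        if i - last >= min_gap:  # enforce the minimum spacing between candidates
            peaks.append(int(i))
            last = i
    return np.array(peaks, dtype=int)
```

With the 25 kHz sampling rate used here, the 0.7 ms minimum spacing corresponds to roughly 17 to 18 samples.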
Sound delivery.
Sounds were generated with a 48 kHz sampling rate, attenuated, and delivered to speakers. Speakers (Etymotic ER2) coupled to tubes were inserted into both ear canals for sound presentation along with microphones for calibration. The frequency response of these speakers measured at the entrance of the ear canal was flat (±5 dB) between 0.2 and 5 kHz. At each recording site, a sequence of tones with different frequencies and intensities with 5 ms cosine on and off ramps was presented to characterize frequency tuning. Speech and broadband noise were then presented at 60 dB SPL with nine different ITDs spanning the physiological range for gerbils (±160 μs in 40 μs steps) to characterize ITD tuning (with positive values of ITD denoting sounds leading at the ear contralateral to the recording site). These sounds were 500 ms in duration and were presented 32 or 64 times each in random order with a 500 ms pause between sounds and 2 ms cosine on and off ramps. Two different tokens of speech were used. Token 1 was presented to all cells in IC and A1 (n = 188 and 906, respectively). Token 2 was presented to all cells recorded in the left A1 under fentanyl, medetomidine, and midazolam (n = 517). Broadband noise was presented to all IC cells, all cells recorded in the right A1 (n = 100), and a subset of cells recorded in the left A1 (n = 492).
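The text does not state how the ITDs were imposed. Because one 40 μs step is less than two samples at 48 kHz (40 μs × 48 kHz = 1.92 samples), some form of sub-sample delay is needed. The sketch below assumes a frequency-domain (linear-phase) delay applied only to the lagging ear, together with 2 ms raised-cosine ramps; all names are hypothetical and this is an illustration rather than the stimulus code used in the study.

```python
import numpy as np

FS = 48_000                          # sampling rate stated in the text (Hz)
ITDS_US = np.arange(-160, 161, 40)   # nine ITDs: +/-160 us in 40 us steps


def fractional_delay(x, delay_s, fs=FS):
    """Delay a 1-D signal by a possibly non-integer number of samples using
    a linear phase shift in the frequency domain. Zero-padding keeps the
    delayed tail from wrapping around to the start."""
    pad = int(np.ceil(abs(delay_s) * fs)) + 1
    xp = np.concatenate([x, np.zeros(pad)])
    freqs = np.fft.rfftfreq(len(xp), d=1.0 / fs)
    spec = np.fft.rfft(xp) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spec, n=len(xp))[: len(x)]


def cosine_ramp(x, ramp_s, fs=FS):
    """Apply raised-cosine onset and offset ramps of duration ramp_s."""
    n = int(round(ramp_s * fs))
    if n == 0:
        return x.copy()
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    y = x.copy()
    y[:n] *= ramp
    y[-n:] *= ramp[::-1]
    return y


def binaural_stimulus(x, itd_us, ramp_s=2e-3, fs=FS):
    """Return (contra, ipsi) waveforms; a positive ITD means the
    contralateral ear leads, so the ipsilateral ear is delayed."""
    gated = cosine_ramp(x, ramp_s, fs)
    itd_s = itd_us * 1e-6
    contra = fractional_delay(gated, max(-itd_s, 0.0), fs)
    ipsi = fractional_delay(gated, max(itd_s, 0.0), fs)
    return contra, ipsi
```

Gating before delaying keeps the envelope and fine structure delayed together, so the ramps themselves do not introduce an interaural level or onset mismatch beyond the intended ITD.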
Decoding ITD from spike rates.
To decode responses based on spike rate alone, we used maximum likelihood decoding. The probability that a spike rate $r$ was evoked by an ITD $s$ is given by Bayes' rule as $p(s \mid r) = p(r \mid s)\,p(s)/p(r)$. Because all ITDs were presented with equal probability, $p(s \mid r) \propto p(r \mid s)$. Thus, the ITD that is most likely to have caused a given response is simply $\arg\max_s p(r \mid s)$. We assumed that the distribution of spike rates evoked by a given ITD was Gaussian (with truncation at zero if necessary). This assumption improved performance in cross-validated testing. We did not place any constraints on the mean spike rates at each ITD, i.e., the shape of the ITD tuning curve. The significance of ITD tuning was assessed by decoding responses after randomizing the pairing of responses and ITDs (Monte Carlo resampling). ITD tuning was considered significant if decoding performance was >4 SDs above the mean performance for 100 different sets of shuffled responses.
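A minimal sketch of the single-cell rate decoder and the shuffle test follows. It assumes leave-one-out cross-validation (the text does not specify the scheme) and, for simplicity, estimates the mean and SD of each truncated Gaussian from the sample moments rather than by fitting the truncated distribution. Function names are hypothetical, and each ITD is assumed to have at least a few trials.

```python
import math

import numpy as np


def _log_trunc_gauss(r, mu, sd):
    """Log-density at r of a Gaussian(mu, sd) truncated to r >= 0."""
    z = (r - mu) / sd
    log_pdf = -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))
    mass = 0.5 * (1 + math.erf(mu / (sd * math.sqrt(2))))  # P(r >= 0)
    return log_pdf - math.log(max(mass, 1e-300))


def decode_rates(rates, itds, min_sd=1e-3):
    """Leave-one-out maximum likelihood decoding of single-trial spike rates.

    rates, itds: 1-D arrays with one entry per trial.
    Returns the fraction of trials assigned to the correct ITD."""
    rates = np.asarray(rates, float)
    itds = np.asarray(itds)
    labels = np.unique(itds)
    correct = 0
    for t in range(len(rates)):
        keep = np.arange(len(rates)) != t  # hold out trial t
        best, best_ll = None, -np.inf
        for s in labels:
            train = rates[keep & (itds == s)]
            mu, sd = train.mean(), max(train.std(ddof=1), min_sd)
            ll = _log_trunc_gauss(rates[t], mu, sd)
            if ll > best_ll:
                best, best_ll = s, ll
        correct += best == itds[t]
    return correct / len(rates)


def is_tuned(rates, itds, n_shuffles=100, n_sd=4.0, seed=0):
    """Tuning is significant if performance exceeds the mean for shuffled
    rate/ITD pairings by more than n_sd SDs."""
    rng = np.random.default_rng(seed)
    actual = decode_rates(rates, itds)
    null = [decode_rates(rng.permutation(rates), itds) for _ in range(n_shuffles)]
    return actual > np.mean(null) + n_sd * np.std(null), actual
```

With nine ITDs, chance performance is 1/9, so the shuffled distribution is expected to sit near 0.11.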
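To decode population responses with a labeled-line decoder, the joint probability of a set of spike rates from $N$ cells, $p(r_1, r_2, \ldots, r_N \mid s)$, was computed as the product of the probabilities of the spike rate of each cell, $p(r_1, r_2, \ldots, r_N \mid s) = \prod_{i=1}^{N} p(r_i \mid s)$, i.e., the spike rates of individual cells were assumed to be conditionally independent given the ITD.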
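Under conditional independence, the product of per-cell likelihoods becomes a sum of log-likelihoods, which avoids numerical underflow for large populations. The sketch below is illustrative only: it uses untruncated Gaussians for brevity, drops constant terms that do not affect the arg max, and its names and array layout are hypothetical.

```python
import numpy as np


def fit_gaussians(train):
    """train: (n_cells, n_itds, n_trials) single-trial spike rates.
    Returns per-cell, per-ITD means and SDs, each (n_cells, n_itds)."""
    mu = train.mean(axis=2)
    sd = np.maximum(train.std(axis=2, ddof=1), 1e-3)  # floor avoids zero SDs
    return mu, sd


def labeled_line_decode(response, mu, sd):
    """Return the ITD index maximizing sum_i log p(r_i | s), which is the log
    of the product of per-cell likelihoods."""
    r = np.asarray(response, float)[:, None]            # (n_cells, 1)
    loglik = -0.5 * ((r - mu) / sd) ** 2 - np.log(sd)   # Gaussian log-density up to a constant
    return int(np.argmax(loglik.sum(axis=0)))
```

Because each cell keeps its own tuning curve, cells preferring opposite ITDs contribute evidence in opposite directions instead of cancelling, which is the property that distinguishes a labeled-line code from a summed code.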
Decoding ITD from spike times.
To decode responses based on spike timing, we used the metric introduced by Victor and Purpura (1996), which measures the distance between two spike trains as the overall cost of the set of operations required to transform one spike train into the other, with possible operations including the insertion of a spike, the deletion of a spike, and the time shift of a spike (Goldberg et al., 2009). By changing the cost of time shifting a spike relative to deleting the spike at one time and inserting it at another, the metric can be used to evaluate the distance between spike trains at different timescales. Decoding using this metric was performed as follows. (1) A single spike train was removed from the full set of all spike trains. (2) The distance between the removed spike train and each of the remaining spike trains in the set was computed across a range of timescales spaced logarithmically between 1 ms and 1 s. (3) For each timescale, the removed spike train was assigned to the sound for which its average distance to the remaining spike trains evoked by that sound was smallest. This process was repeated for all spike trains in the set to obtain a percentage correct for each timescale, and the overall percentage correct was taken as the maximum value across timescales. Decoding based on spike timing was considered significantly better than decoding based on spike rate alone if decoding performance based on spike timing was >4 SDs above the mean performance for decoding based on spike rate computed via bootstrap resampling.
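The spike-time distance and the leave-one-out classification above can be sketched as follows, using the standard dynamic-programming recursion for the Victor and Purpura (1996) metric. Assumptions: the timescale τ is mapped to the shift cost as q = 2/τ (the shift at which moving a spike costs as much as deleting and reinserting it), and the number of log-spaced timescales (10) is arbitrary. Names are hypothetical.

```python
import numpy as np

TIMESCALES = np.logspace(-3, 0, 10)  # 1 ms to 1 s, log-spaced (point count assumed)


def vp_distance(a, b, q):
    """Victor and Purpura distance between sorted spike-time arrays a and b (s).
    Inserting or deleting a spike costs 1; shifting one by dt costs q * |dt|."""
    na, nb = len(a), len(b)
    d = np.zeros((na + 1, nb + 1))
    d[:, 0] = np.arange(na + 1)
    d[0, :] = np.arange(nb + 1)
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            d[i, j] = min(d[i - 1, j] + 1,       # delete a spike from a
                          d[i, j - 1] + 1,       # insert a spike from b
                          d[i - 1, j - 1] + q * abs(a[i - 1] - b[j - 1]))  # shift
    return d[na, nb]


def decode_spike_trains(trains, labels, timescales=TIMESCALES):
    """Leave-one-out decoding: assign each held-out train to the sound with the
    smallest mean distance; return the best fraction correct over timescales."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = len(trains)
    best = 0.0
    for tau in timescales:
        q = 2.0 / tau  # assumed mapping from timescale to shift cost
        dist = np.array([[vp_distance(trains[i], trains[j], q) for j in range(n)]
                         for i in range(n)])
        correct = 0
        for i in range(n):
            others = np.arange(n) != i  # exclude the held-out train itself
            mean_d = [dist[i, others & (labels == c)].mean() for c in classes]
            correct += classes[int(np.argmin(mean_d))] == labels[i]
        best = max(best, correct / n)
    return best
```

At very long timescales q approaches zero, shifts become free, and the distance reduces to the difference in spike counts, so the timing decoder includes a rate-like decoder as a limiting case.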
Results
We made multi-tetrode recordings (Fig. 1A) from populations of single units in the IC and A1 of anesthetized gerbils. Our methods for IC recordings have been described in detail previously (Garcia-Lazaro et al., 2013). For A1 recordings (Fig. 1B), we aligned the shanks of the tetrode array along the rostrocaudal axis (approximately parallel to the tonotopic gradient in A1) and recorded from depths between 1 and 1.5 mm below the cortical surface (most likely layer V; Happel et al., 2010). The direction of the tonotopic gradient in A1 was evident in the multi-unit activity across tetrodes, as illustrated by the frequency response areas (FRAs) for an example recording site shown in Figure 1C. We used a semi-automated clustering algorithm (see Materials and Methods) to isolate single units based on the first three principal components of their spike waveforms across each of the four channels of a tetrode. Clusters corresponding to single units (colors) and multi-unit noise (gray) are shown for an example tetrode in Figure 1D (note that this is a 2D projection of a 12D space). We quantified the quality of each cluster based on its isolation distance (Schmitzer-Torbert et al., 2005) and set a threshold value of 20 for a cluster to be classified as a single unit (this value corresponded to detection of ∼90% of spikes from a target neuron with a false alarm rate of ∼1% in paired intracellular and tetrode recordings in hippocampus). The spike waveforms for two example single units with relatively low (24.5) and high (69.9) isolation distances are shown in Figure 1E. The median isolation distance across our sample of single units in A1 was 32.5 (Fig. 1F).
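The isolation-distance criterion can be sketched as follows, using the standard definition of Schmitzer-Torbert et al. (2005): for a cluster of n snippets, it is the squared Mahalanobis distance, measured with the cluster's own mean and covariance, of the n-th closest snippet outside the cluster. The function name is hypothetical; in the 12D space used here, each feature row would hold three principal components for each of the four electrodes.

```python
import numpy as np


def isolation_distance(cluster, others):
    """cluster: (n, d) features of the candidate unit's snippets.
    others:  (m, d) features of all other snippets on the same tetrode.
    Returns the squared Mahalanobis distance of the n-th closest outside snippet."""
    n = len(cluster)
    if len(others) < n:
        return np.nan  # undefined when there are fewer outside snippets than cluster spikes
    mu = cluster.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(cluster, rowvar=False))
    diff = others - mu
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared Mahalanobis distances
    return np.sort(d2)[n - 1]
```

The undefined case explains why the size of the noise cluster matters: the measure only exists when the noise cluster is at least as large as the single-unit cluster, as was always the case in these recordings.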
Figure 1. Multi-tetrode recordings in gerbil A1. A, A schematic diagram of the electrode arrangement on the multi-tetrode array. Thirty-two electrodes were grouped into eight tetrodes. B, A schematic diagram of the gerbil auditory cortex illustrating the alignment of the tetrode array with respect to the tonotopic gradient in A1 (modified from Thomas et al., 1993). C, The FRAs for the multi-unit activity on each tetrode from a typical recording site. The first column shows the FRAs for the tetrodes on the most rostral shank, while the last column shows the FRAs for the tetrodes on the most caudal shank. Multi-unit activity was summed across the four electrodes on each tetrode. For each tetrode, the center frequency (CF) estimated from the FRA is indicated. D, Spikes from single units were identified by projecting spike waveforms into principal component space (12 dimensions corresponding to 3 principal components for each electrode). An example 2D projection that illustrates the isolation of different single-unit clusters is shown, along with the isolation distance of each cluster. Single-unit clusters are shown in color, and undifferentiated multi-unit noise is shown in gray. E, Spike waveforms for two single units (overlaid on a sample of multi-unit noise waveforms). F, A histogram of the isolation distances for all of the single units in our A1 sample.
We targeted our recordings to areas with low preferred frequencies. The distributions of best frequencies (BFs) for our samples of IC and A1 cells are shown in Figure 2A. As our main goal was to compare the neural representation of ITDs with existing measures of gerbil behavioral acuity in the localization of a single broadband low-frequency sound source (Lesica et al., 2010), we restricted our analysis to responses to broadband sounds (speech and noise) with ITDs spanning only the physiological range (±160 μs in 40 μs steps; with positive values of ITD denoting sounds leading at the ear contralateral to the recording site). Note that we have chosen to use a range of ITDs that is slightly larger than that measured for gerbils by Maki and Furukawa (2005), as their measurements were made for frequencies above 1.5 kHz and the physiological range of ITDs tends to increase for lower frequencies (Rébillat et al., 2014).
Figure 2. Responses to speech at different ITDs in gerbil IC and A1. A, The distribution of BFs in our samples of IC and A1 cells. B, The responses of example cells with significant ITD tuning in IC and A1. Each column shows the FRA for one cell, along with raster plots for the responses to speech at five different ITDs spanning the physiological range and the tuning curve showing the mean spike rate in response to speech as a function of ITD. The black line and gray bands on the tuning curve plots indicate the mean ± 1 SD.
The responses to speech with different ITDs from example cells with significant ITD tuning are shown in Figure 2B. Each column shows the FRA for one cell, along with raster plots for the responses to speech at five different ITDs spanning the physiological range and the tuning curve showing the mean spike rate as a function of ITD. To assess the strength of each cell's ITD tuning, we used a decoder to measure the accuracy with which the spike rate on a single trial could be used to infer which of nine possible ITDs evoked it. We considered ITD tuning to be significant if decoding accuracy was more than 4 SDs above the mean performance for shuffled responses.
The fraction of cells with significant ITD tuning for speech was higher in IC than in A1 (IC: 117/188 cells, 62%; A1: 239/517, 46%). This difference was not due to differences in the distribution of BFs in the two populations; the fraction of cells with significant tuning in our entire A1 sample was the same as that of random subsamples of A1 cells with BFs matched to those of our sample of IC cells (45 ± 3%). The fraction of cells with significant ITD tuning for broadband noise was also higher in IC than in A1 (IC: 134/188 cells, 71%; A1: 133/203, 65%). For all subsequent analyses in this study, only cells with significant ITD tuning were included.
Best ITDs in A1 are distributed evenly across the physiological range
To compare the representation of ITDs in IC and A1, we began by measuring the spike rate tuning curve for each cell's response to speech at different ITDs. All of the example IC cells shown in Figure 2B responded most strongly to the ITD corresponding to the contralateral edge of the physiological range (+160 μs), while each example A1 cell had a different preferred ITD. These examples were representative of the IC and A1 in general; each row in the images in Figure 3A shows the ITD tuning curve for one cell (with cells sorted by best ITD), and the histograms in Figure 3B show the distributions of best ITDs across all cells.
Figure 3. Best ITDs in A1 are distributed evenly across the physiological range. A, The ITD tuning curves for speech for all significantly tuned cells in our samples of IC and A1 cells. Each row shows the ITD tuning curve for one cell. All tuning curves were normalized to have the same maximum and minimum for plotting. Cells were sorted by best ITD for plotting. B, The histograms of the best ITDs for speech for all significantly tuned cells in our samples. C, The same A1 tuning curves shown in A with cells sorted either by BF or decoding performance. Decoding performance was measured as the percentage of single trial responses that were assigned to the correct ITD by a spike rate decoder (the chance level was 1/9).
The majority of IC cells in our sample had a best ITD of +160 μs, while the best ITDs in A1 were evenly distributed across the physiological range. To quantify the degree to which the distribution of best ITDs in each area was biased toward ipsilateral or contralateral values, we measured the percentage of cells with best ITDs in the contralateral hemifield (cells with best ITD = 0 were ignored). The distribution of best ITDs in the IC was strongly biased toward the contralateral side, with 83% of cells having best ITDs in the contralateral hemifield. In contrast, the distribution of best ITDs in A1 was unbiased, with only 53% of cells having best ITDs in the contralateral hemifield.
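The bias measure amounts to the following computation, sketched under the assumption that each cell's best ITD is the ITD at which its mean-rate tuning curve peaks (the text does not define best ITD further). Names are hypothetical.

```python
import numpy as np

ITDS_US = np.arange(-160, 161, 40)  # tested ITDs; positive = contralateral ear leads


def best_itds(tuning_curves, itds=ITDS_US):
    """tuning_curves: (n_cells, n_itds) mean spike rates. Returns each cell's best ITD."""
    return itds[np.argmax(tuning_curves, axis=1)]


def percent_contralateral(best):
    """Percentage of cells with best ITDs in the contralateral hemifield,
    ignoring cells whose best ITD is exactly 0."""
    best = np.asarray(best)
    nonzero = best[best != 0]
    return 100.0 * np.mean(nonzero > 0)
```

An unbiased population would give a value near 50%, as found for A1 (53%), whereas the IC value (83%) reflects a strong contralateral bias.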
The best ITDs of A1 cells were unrelated to their BFs, as illustrated in Figure 3C, left, which shows the same ITD tuning curves for all A1 cells as in Figure 3A, but with cells sorted by BF. For any particular BF, there were cells with a range of different best ITDs, and, across the entire population, best ITD and BF were uncorrelated (r = 0.03, p = 0.61). Figure 3C, right, also shows the same tuning curves, but with the cells sorted by ITD decoding performance. For any particular level of performance, there were cells with a range of different best ITDs, though there was a weak, but significant, correlation between best ITD and decoding performance across the entire population (r = 0.18, p = 0.003), indicating that ITD tuning was slightly stronger for cells with best ITDs corresponding to contralateral locations than for cells with best ITDs corresponding to ipsilateral locations.
ITD tuning is consistent across different sounds
We next investigated whether ITD tuning was consistent across sounds with different spectrotemporal properties by comparing ITD tuning curves for speech and broadband noise in both IC and A1, as well as for two different speech tokens in A1. Figure 4A shows the raster plots for the responses of an example A1 cell to the different sounds at five different ITDs, along with the tuning curves showing the mean spike rate as a function of ITD. The ITD tuning for this example cell was consistent across all three sounds, with the strongest responses evoked by ITDs near 0, corresponding to locations near the midline.
Figure 4. ITD tuning is consistent across different sounds. A, The responses of an example cell from A1 with significant ITD tuning for two different segments of speech and broadband noise. Each column shows the raster plots for the responses to one sound at five different ITDs spanning the physiological range and the tuning curve showing the mean spike rate as a function of ITD. The black line and gray bands on the tuning curve plots indicate the mean ± 1 SD. B, The histograms of the correlation coefficients between each cell's ITD tuning curves for speech segment 1 and broadband noise in the IC and A1, and the two speech segments in A1. Only cells with significant ITD tuning for both of the sounds being compared were included. The median value across each sample of cells is noted on each histogram.
To quantify the similarity of ITD tuning across sounds for each cell, we measured the correlation coefficient between ITD tuning curves. As shown in Figure 4B, ITD tuning curves were highly similar across sounds for nearly all cells in both IC and A1; the median correlation between ITD tuning curves for speech and noise was 0.97 in IC and 0.87 in A1, and the median correlation between ITD tuning curves for two different segments of speech was 0.89 in A1. This suggests that the transformation of the representation of ITDs from IC to A1 is not specific to a particular sound and is likely to be evident for other complex broadband sounds.
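The per-cell similarity measure is an ordinary Pearson correlation between two tuning curves. A vectorized sketch (hypothetical names) that returns the per-cell values and their median:

```python
import numpy as np


def tuning_similarity(curves_a, curves_b):
    """Per-cell Pearson correlation between two sets of ITD tuning curves,
    each (n_cells, n_itds), plus the median across cells."""
    a = curves_a - curves_a.mean(axis=1, keepdims=True)
    b = curves_b - curves_b.mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return r, float(np.median(r))
```

Because the correlation ignores overall scale and offset, two sounds that evoke different mean rates but the same tuning shape still yield a value near 1.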
Spike timing carries relatively little information about ITDs
Studies of IC and A1 responses have shown that for the coding of spectral notches and interaural level differences in high-frequency sounds, spike timing contains substantial information beyond that in spike rate alone (Furukawa and Middlebrooks, 2002; Chase and Young, 2006), but the role of spike timing in coding ITDs in low-frequency sounds is not yet clear. To assess the role of spike timing in coding ITDs, we compared the performance of a decoder based on spike rate alone with that of a decoder that used a distance metric to consider the full spike train at the optimal timescale for each cell (Victor and Purpura, 1996).
Figure 5A shows the raster plots for the responses of an example A1 cell to speech at three different ITDs. For this cell, the timing of some spiking events varied with ITD (see arrows), and considering spike timing in addition to spike rate resulted in a 50% improvement in decoding performance (21% correct for the timing decoder and 14% correct for the rate decoder, for nine possible ITDs). This cell was, however, not typical of either IC or A1; as shown in Figure 5B, the improvement in decoding in both IC and A1 that resulted from considering spike timing in addition to spike rate was relatively small for both speech and noise. In IC, the improvement in the performance of the timing decoder over the rate decoder was significant for 55% of cells for speech and 60% of cells for noise, but the median improvement for those cells with significant improvement was only 11% for speech and 10% for noise. The improvement in A1 was higher than in IC for speech (50% of cells significant, median improvement 18%), and similar for noise (57% significant, median improvement 12%). These results suggest that spike timing is unlikely to play a major role in the coding of ITDs in either IC or A1.
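The improvement percentages and the significance criterion from Materials and Methods can be sketched as follows; for the example cell above, (21 − 14)/14 = 50%. Resampling the rate decoder's per-trial outcomes is an assumption about how the bootstrap was performed, and the names and number of resamples are hypothetical.

```python
import numpy as np


def timing_improvement(rate_correct, timing_pc, n_boot=1000, n_sd=4.0, seed=0):
    """rate_correct: boolean array, whether each trial was decoded correctly
    by the rate decoder. timing_pc: fraction correct for the timing decoder.
    Returns (percent improvement over the rate decoder, significant?)."""
    rate_correct = np.asarray(rate_correct, float)
    rate_pc = rate_correct.mean()
    rng = np.random.default_rng(seed)
    # Bootstrap distribution of rate-decoder performance (trials resampled with replacement).
    boot = rng.choice(rate_correct, size=(n_boot, len(rate_correct)), replace=True).mean(axis=1)
    improvement = 100.0 * (timing_pc - rate_pc) / rate_pc
    significant = timing_pc > boot.mean() + n_sd * boot.std()
    return improvement, bool(significant)
```

Note that the same percentage improvement can be significant or not depending on the number of trials, because the bootstrap spread shrinks as trials are added.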
Figure 5. Spike timing carries relatively little information about ITDs. A, The responses of an example cell from A1 with significant information about ITD in spike timing. The raster plots show the responses to speech at three different ITDs. Spiking events that are unique to a particular ITD are marked with arrows. B, Scatter plots showing the percentage improvement in decoding ITD that resulted from considering spike timing in addition to spike rate versus the percentage correct for spike rate alone for responses to speech and noise in IC and A1. The values for cells for which the improvement was statistically significant are shown as black circles. The percentage of cells for which the improvement was significant and the median improvement across those cells are noted on each plot.
ITD tuning in A1 is qualitatively similar under different anesthesia
All of the responses described above were recorded under a mix of fentanyl, medetomidine, and midazolam (FMM). As the responses of neurons in gerbil A1 are known to vary with brain state (Ter-Mikaelian et al., 2007), we also made recordings under a mix of ketamine and xylazine (KX) to determine whether our observations of ITD tuning in A1 were dependent on our choice of anesthesia. In general, ITD tuning in A1 was much weaker under KX: only 39 of 289 cells (13%) had significant ITD tuning for speech (compared with 46% under FMM), and only 37 of 289 cells (12%) had significant ITD tuning for broadband noise (compared with 65% under FMM). However, as shown in Figure 6, the qualitative nature of ITD tuning under KX was similar to that under FMM: for those cells with significant ITD tuning for speech, best ITDs were distributed across the physiological range with no bias toward ITDs corresponding to locations in the contralateral hemifield (Figs. 6A,B), and spike timing carried relatively little information about ITD (Fig. 6C). All further analyses of ITD tuning in A1 described below were performed only on responses recorded under FMM.
Figure 6. ITD tuning in A1 is qualitatively similar under different anesthesia. A, The ITD tuning curves for speech for all significantly tuned cells in our sample of A1 cells recorded under ketamine and xylazine, plotted as in Figure 3A. B, The histogram of the best ITDs for speech for all significantly tuned cells in our sample of A1 cells recorded under ketamine and xylazine. C, Scatter plots showing the percentage improvement in decoding ITD that resulted from considering spike timing in addition to spike rate versus the percentage correct for spike rate alone for responses to speech and noise in A1 recorded under ketamine and xylazine, plotted as in Figure 5B.
ITD tuning in left and right A1 is similar
All of the responses described above were recorded from the left A1. To verify that ITD tuning in A1 was similar in both brain hemispheres, we made additional recordings from the right A1. ITD tuning in the right A1 was somewhat weaker than that in the left A1: 32 of 100 cells (32%) had significant ITD tuning for speech (compared with 46% in the left A1), and 46 of 100 cells (46%) had significant ITD tuning for broadband noise (compared with 65% in the left A1). As in the left A1, the best ITDs for cells in the right A1 were distributed across the physiological range with only a weak bias toward ITDs corresponding to locations in the contralateral hemifield (Figs. 7A,B), and spike timing carried relatively little information about ITD (Fig. 7C).
Figure 7. ITD tuning in left and right A1 is similar. A, The ITD tuning curves for speech for all significantly tuned cells in our sample of cells recorded in the right A1, plotted as in Figure 3A. B, The histogram of the best ITDs for speech for all significantly tuned cells in our sample of cells recorded in the right A1. C, Scatter plots showing the percentage improvement in decoding ITD that resulted from considering spike timing in addition to spike rate versus the percentage correct for spike rate alone for responses to speech and noise in the right A1, plotted as in Figure 5B.
Two-channel decoding of population responses in A1 results in a loss of information
The difference in the distributions of best ITDs in IC and A1 suggests a fundamental difference in the coding of ITDs at the population level. We considered two different population codes for ITD: a “two-channel” code (McAlpine et al., 2001; Lüling et al., 2011; Day and Delgutte, 2013) that considers only the total spike rate in each brain hemisphere [also known as a summed code (Lesica et al., 2010) or hemispheric code (Goodman et al., 2013)], and a “labeled-line” code that considers the tuning of individual cells [also known as a distributed code (Lesica et al., 2010) or pattern code (Day and Delgutte, 2013; Goodman et al., 2013)].
We have shown previously that because the ITD tuning curves of most cells in gerbil IC are similar, a two-channel decoder performs almost as well as a labeled-line decoder at inferring the ITD of the sound that evoked a particular single trial population response (Lesica et al., 2010). However, for a population with more heterogeneous tuning curves, considering only the total spike rate in each hemisphere can impair decoding performance (Day and Delgutte, 2013; Goodman et al., 2013). We compared the performance of labeled-line and two-channel decoders on IC and A1 responses to speech and noise for populations of increasing size. Rather than constrain decoding to a particular computation (e.g., the difference in total spike rate between the two hemispheres), we used a maximum likelihood approach to infer which of nine possible ITDs evoked each single trial population response based on the joint distributions of spike rates in each hemisphere (two-channel) or in individual cells (labeled line; Miller and Recanzone, 2009; Day and Delgutte, 2013). Because the noise correlations between pairs of simultaneously recorded cells in both IC and A1 were extremely weak (IC: 0.006 ± 0.018, n = 9112; A1: 0.004 ± 0.012, n = 3606), we assumed that the spike rates of individual cells were conditionally independent (Garcia-Lazaro et al., 2013).
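A sketch of maximum likelihood two-channel decoding from summed hemispheric rates follows. Because this section does not state how the second hemisphere was represented, the sketch assumes mirror symmetry: the opposite hemisphere's response to ITD s is taken to be the recorded hemisphere's response to −s, with trial order shuffled independently so that the two channels are conditionally independent. Each ITD's joint distribution of the two sums is modeled as a 2D Gaussian. All names are hypothetical, and the ITD axis is assumed to be ordered symmetrically (e.g., −160 to +160 μs).

```python
import numpy as np


def hemispheric_sums(rates, rng):
    """rates: (n_cells, n_itds, n_trials) single-trial rates from ONE hemisphere.
    Returns (n_itds, n_trials, 2): [recorded-side sum, mirrored opposite-side sum]."""
    same = rates.sum(axis=0)
    # Assumed mirror symmetry: reversing the ITD axis maps s to -s; shuffling
    # trials independently per cell and ITD keeps the two channels independent.
    opposite = rng.permuted(rates[:, ::-1, :], axis=2).sum(axis=0)
    return np.stack([same, opposite], axis=-1)


def two_channel_decode(train, test_vec):
    """train: (n_itds, n_trials, 2) hemispheric sums; test_vec: length-2 response.
    Fits a 2-D Gaussian per ITD and returns the index of the most likely ITD."""
    best, best_ll = 0, -np.inf
    for k in range(train.shape[0]):
        x = train[k]
        mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(2)  # small ridge for stability
        diff = np.asarray(test_vec, float) - mu
        ll = -0.5 * (diff @ np.linalg.solve(cov, diff) + np.linalg.slogdet(cov)[1])
        if ll > best_ll:
            best, best_ll = k, ll
    return best
```

Summing before decoding discards which cell produced each spike, so cells whose best ITDs lie on opposite sides of the midline partly cancel in the sum; this is why a population with evenly distributed best ITDs, like A1, is expected to lose more information under this code than one dominated by contralateral preferences, like IC.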
As shown in Figure 8, A and B, the labeled-line decoder performed well on both IC and A1 responses to speech and noise, with no loss of information between the two areas (IC: median performance of 94% correct for speech for the largest populations, 93% for noise; A1: 93% for speech, 91% for noise). In contrast, the performance of the two-channel decoder on A1 responses was much worse than its performance on IC responses (IC: 74% for speech, 73% for noise; A1: 49% for speech, 54% for noise). Thus, for a labeled-line code based on the spike rates of individual cells, the information about ITDs present in IC is preserved in A1, but for a two-channel population code based on the total spike rate in each hemisphere, there is a substantial loss of information about ITDs between IC and A1.
Figure 8. Two-channel and labeled-line decoding of ITD from population responses. A, The performance of labeled-line and two-channel decoders on IC and A1 responses to speech for populations of increasing size. Performance was measured as the percentage of single trial responses that were assigned to the correct ITD by the decoder. The chance level (1/9) is indicated. The black line and colored bands indicate the mean ± 2 SDs of the performance for 100 different random subpopulations of each size drawn from the full sample of cells. B, The performance of labeled-line and two-channel decoders on IC and A1 responses to noise, plotted as in A.
It should be noted that although the performance of the two-channel decoder on A1 responses was relatively poor, it was still well above chance. Even though the ITD tuning curve peaks for A1 cells are evenly distributed throughout the physiological range, there is still a significant monotonic modulation of the total population spike rate with ITD, albeit much weaker than that in the IC (Fig. 9A). It should also be noted that the labeled-line decoder significantly outperformed the two-channel decoder not only on A1 responses, but also on IC responses. This result is consistent with recent studies suggesting that the heterogeneity of tuning curves in IC can carry significant information about ITD (Day and Delgutte, 2013; Goodman et al., 2013).
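How strongly the total population rate varies with ITD depends on how best ITDs are distributed, which can be illustrated with a toy model that is not part of the study's analysis. The sketch below sums hypothetical Gaussian tuning curves (100 μs width, peak of 10 and baseline of 1 spike per trial, and a ±160 μs grid, all assumed) and reports the signed contrast between the summed rates at the two edges of the grid, for a population whose peaks all sit at one edge, a population whose peaks tile the range uniformly, and a uniform population with a few extra edge-preferring cells.

```python
import math

def summed_tuning(best_itds, itds, width_us=100.0, peak=10.0, baseline=1.0):
    """Total mean rate at each test ITD for hypothetical Gaussian-tuned cells."""
    return [sum(baseline + peak * math.exp(-0.5 * ((x - b) / width_us) ** 2)
                for b in best_itds)
            for x in itds]

def edge_contrast(curve):
    """Signed contrast between the summed rates at the last and first grid ITDs."""
    return (curve[-1] - curve[0]) / (curve[-1] + curve[0])

itds = [-160.0 + 40 * k for k in range(9)]     # assumed physiological range
edge_biased = [160.0] * 10                      # every peak at the +160 us edge
uniform = [-160.0 + 40 * k for k in range(9)]   # peaks tile the range evenly
weakly_biased = uniform + [160.0] * 3           # uniform plus a few edge cells

for name, pop in [("edge-biased", edge_biased), ("uniform", uniform),
                  ("weakly biased", weakly_biased)]:
    print(name, round(edge_contrast(summed_tuning(pop, itds)), 2))
```

With these assumed parameters the contrasts come out at roughly 0.82, 0, and 0.24: a perfectly uniform tiling gives a summed rate that carries no lateral information, whereas even a small residual asymmetry restores a weak monotonic modulation.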
Figure 9. Comparing decoding of population responses and behavior. A, Tuning curves showing the total mean spike rate of our populations of cells in IC and A1 in response to speech and noise as a function of ITD. The black lines and colored bands on the tuning curve plots indicate the mean ± 1 SD. B, A comparison of the performance of labeled-line and two-channel decoders with gerbil behavior. The black dots show the actual performance of gerbils in lateralizing low-frequency noise bursts as a function of the difference in ITD approximated from the angle of separation between two speakers centered on the midline (median performance for 7 gerbils from Lesica et al., 2010). The colored lines show the performance of labeled-line and two-channel decoders on IC and A1 responses to pairs of noise bursts centered on the midline. The dots on the colored lines correspond to ΔITDs that were actually tested experimentally; the remaining values were obtained by interpolating ITD tuning curves as described in Materials and Methods. Performance for ΔITD = X μs was assessed by decoding responses to noise with ITD = ±(X/2) μs. C, The colored lines show the performance of labeled-line and two-channel decoders on IC and A1 responses to pairs of noise bursts centered on different ITDs. Performance for center ITD = X μs was assessed by decoding responses to noise with ITD = X ± 10 μs for the IC and ITD = X ± 20 μs for A1. The black line and colored bands indicate the mean ± 2 SDs of the performance for 100 different bootstrap samples of cells from the full populations.
Both two-channel and labeled-line decoding of population responses are sufficient to explain behavior
Our decoding results demonstrate that a labeled-line code carries substantially more information about ITDs than a two-channel code in gerbil A1. However, there is no guarantee that the code that is most informative about a particular sound feature is the one that underlies its perception, especially in the cortex where, presumably, the same neural circuitry is used to analyze many different features (Brette, 2010). One approach to rule in or rule out different candidate codes for a particular feature is to determine whether they are sufficient to account for behavioral performance (Jacobs et al., 2009).
There have only been a few behavioral studies of sound localization in gerbils (Heffner and Heffner, 1988; Maier and Klump, 2006; Maier et al., 2008; Lesica et al., 2010; Carney et al., 2011; Lingner et al., 2012). Figure 9B shows the accuracy with which gerbils lateralized low-frequency noise bursts as a function of the difference in ITD approximated from the angle of separation between two speakers across the midline (median performance for seven gerbils from Lesica et al., 2010). We used the labeled-line and two-channel decoders described above to simulate the same behavioral task and infer which of two possible ITDs centered on the midline evoked each single trial population response to noise. Surprisingly, we found that both labeled-line and two-channel decoding were sufficient to reproduce this performance, even for A1 responses (Fig. 9B).
Although the ability of gerbils to use ITDs to localize sounds has only been tested for pairs of sounds centered on the midline, it is known that behavioral acuity tends to decrease for sounds centered on more lateral locations in many mammals, for example, humans (Mossop and Culling, 1998) and rabbits (Ebert et al., 2008). We examined the ability of the labeled-line and two-channel decoders to infer which of two possible ITDs evoked each single trial population response as a function of the ITD on which the sounds were centered (different ΔITDs were used for IC and A1 so that the performance of the labeled-line decoder for pairs of sounds centered on the midline was ∼90% correct for both brain areas). As shown in Figure 9C, while decoder performance on IC responses decreased for more lateral sounds as expected, decoder performance on A1 responses was relatively consistent across the physiological range.
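Why a population dominated by contralateral-edge tuning should discriminate lateral ITDs less well, while a population with evenly distributed best ITDs should not, can be sketched with Fisher information. This is a standard bound on discrimination, swapped in here for the maximum likelihood decoders used in the study. For independent Poisson cells with tuning curves f_i, the population Fisher information at ITD x is the sum over cells of f_i'(x)^2 / f_i(x), and the smallest reliably discriminable ΔITD scales roughly as its inverse square root. The Gaussian tuning, 100 μs width, baseline rate, and ITD values below are assumptions.

```python
import math

def fisher_information(x, best_itds, width_us=100.0, peak=10.0, baseline=1.0):
    """Sum over cells of f'(x)**2 / f(x) for independent Poisson cells with
    hypothetical Gaussian ITD tuning (rates in spikes per trial)."""
    total = 0.0
    for b in best_itds:
        g = math.exp(-0.5 * ((x - b) / width_us) ** 2)
        rate = baseline + peak * g
        slope = -peak * (x - b) / width_us ** 2 * g   # derivative of rate w.r.t. x
        total += slope * slope / rate
    return total

edge_biased = [160.0] * 10                     # peaks at the contralateral edge
uniform = [-160.0 + 40 * k for k in range(9)]  # peaks spread evenly

for name, pop in [("edge-biased", edge_biased), ("uniform", uniform)]:
    # A ratio below 1 means discrimination is worse around the lateral center ITD.
    ratio = fisher_information(120.0, pop) / fisher_information(0.0, pop)
    print(name, round(ratio, 2))
```

Under these assumptions, Fisher information around a lateral center (120 μs) falls to about a quarter of its midline value for the edge-biased population but only to about two-thirds for the uniform one, qualitatively in line with the IC and A1 decoding results.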
Discussion
We have shown that the neural representation of ITDs in gerbils is transformed from IC to A1. In the IC, we found that most cells responded maximally to ITDs corresponding to the contralateral edge of the physiological range, consistent with previous studies of ITD processing in different subcortical stages in gerbils (Spitzer and Semple, 1995; Siveke et al., 2006; Pecka et al., 2008; Lesica et al., 2010). In contrast, the preferred ITDs of A1 cells were distributed evenly throughout the physiological range, with an equal number of cells preferring ITDs corresponding to ipsilateral and contralateral locations. This transformation in the distribution of preferred ITDs resulted in a loss of information between IC and A1 when using a two-channel decoder that considered only the total spike rate in each brain hemisphere, but not when using a labeled-line decoder that considered the tuning of individual cells. However, despite this loss of information, the two-channel decoder was still sufficient to reproduce gerbil behavioral performance.
Our analysis has revealed several aspects of the neural representation of ITDs in A1 that appear inconsistent with existing behavioral data. First, decoding of ITD from A1 activity results in performance that is far better than that observed behaviorally. Behavioral performance may be expected to be worse than decoder performance for activity from the early stages of peripheral processing, but differences of this degree in cortex are more surprising. Second, both left and right A1 appear to have a complete representation of azimuthal space (i.e., best ITDs span the full physiological range). Thus, it is unclear why a lesion to either the left or right A1 would cause a deficit in the localization of only contralateral sounds, as is the case in several mammals (Jenkins and Masterton, 1982; Jenkins and Merzenich, 1984; Kavanagh and Kelly, 1987; Malhotra et al., 2004). Third, while decoder performance on IC responses decreased for sounds with ITDs corresponding to more lateral locations, consistent with behavioral observations in several mammals (Mossop and Culling, 1998; Ebert et al., 2008), decoder performance on A1 responses was similar for ITDs corresponding to medial and lateral locations.
One possible explanation for the apparent mismatch between the neural representation of ITDs in A1 and the existing behavioral data in gerbils is that A1 is not actually required for or involved in the localization of single sound sources in a quiet background. While A1 seems to play a role in localization in most mammals that have been tested (King and Middlebrooks, 2011), it does not appear necessary for sound localization in rats (Kelly and Kavanagh, 1986). It may also be that if localization were tested under more difficult (e.g., reverberant) conditions, a better match between A1 activity and behavioral performance would become apparent. Another possible explanation for the disconnect between the neural representation of ITDs in A1 and behavior arises when one considers that the role of cortex is presumably to combine information about different acoustic features for the analysis of complex auditory scenes. While a given population of subcortical cells can be specialized for the processing of a particular feature, cortical populations may need to process multiple stimulus features simultaneously. Thus, the representation of ITDs in A1 (and/or the manner in which information from A1 is decoded in higher cortical areas, which may differ from the decoders we tested) may not be specialized for sound localization per se, but rather for the general processing of complex scenes, allowing, for example, sound sources from different locations to be processed by different subpopulations of cells (Middlebrooks and Bremen, 2013) and facilitating the allocation of attentional resources to enhance or suppress activity related to a given source (Lee and Middlebrooks, 2011).
How does the transformation of ITD tuning between IC and A1 in gerbils compare with that in other species?
The change in the distribution of ITD tuning curve peaks from strongly contralaterally biased in IC to unbiased in A1 makes gerbils unique among species for which ITD tuning in midbrain and cortex has been systematically studied (for review, see Vonderschen and Wagner, 2014). There is a clear transformation of the representation of ITDs in the auditory pathway of barn owls, but in the opposite direction, with narrowly tuned midbrain inputs converging to form broadly tuned channels in the forebrain (Vonderschen and Wagner, 2009, 2012). The differences in ITD tuning between IC and A1 in other mammals are not as clear as those in gerbils, and results differ across species. Studies in cats have reported a relatively strong contralateral bias in both IC and A1 (Reale and Brugge, 1990; Yin and Chan, 1990), while in rabbits, best ITDs span the physiological range with a relatively weak contralateral bias throughout the entire auditory pathway (Fitzpatrick et al., 2000), though tuning curves become sharper at more central stages (Fitzpatrick et al., 1997). Our results are perhaps most similar to those from chinchillas and primates; in chinchillas, there appears to be a strong contralateral bias in the IC (Bremen and Joris, 2013), but only a relatively weak contralateral bias in A1 (Benson and Teas, 1976). While there have been no systematic studies of ITD tuning in the primate midbrain, studies of spatial sensitivity in the IC suggest a strong contralateral bias (Groh et al., 2001, 2003; Zwiers et al., 2004), while ITD tuning in A1 exhibits a relatively weak contralateral bias (Scott et al., 2009).
Our results may also have implications for the study of ITD processing in humans. EEG and MEG studies in humans have investigated the representation of ITDs based on measurements of the change in overall cortical activity in response to a change in ITD (Magezi and Krumbholz, 2010; Salminen et al., 2010). In a labeled-line representation, a change in ITD in either direction should cause an increase in overall activity (as the sensory drive is directed toward an unadapted neuronal subpopulation), while in a two-channel representation, overall activity within a given hemisphere should increase with a change in ITD in one direction, and decrease with a change in ITD in the other direction. Both studies found that the sign of the change in overall activity depended on the direction of the change in ITD, and, thus, argued for a two-channel representation. However, our A1 data demonstrate that a relatively coarse two-channel representation can coexist with a much more sensitive labeled-line representation and suggest that more detailed studies may be required to determine the true nature of the cortical representation of ITDs in humans.
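The two predictions can be made concrete with a toy adaptation model, which is a hypothetical illustration rather than a model used in this study or in the cited EEG/MEG work. In the labeled-line case, densely tiled Gaussian channels (best ITDs from -600 to +600 μs in 20 μs steps, 100 μs width; assumed) each lose gain in proportion to how strongly the adapter drove them; in the two-channel case, a single hemispheric channel with sigmoidal ITD tuning (50 μs slope; assumed) is adapted in the same way. Each function returns the summed response to the probe minus the summed response to a repeated adapter.

```python
import math

def gaussian(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2)

def labeled_line_change(adapter, probe, best_itds, sd=100.0, strength=0.5):
    """Summed probe response minus summed adapter response after each channel's
    gain is reduced in proportion to its response to the adapter."""
    gains = [1.0 - strength * gaussian(adapter, b, sd) for b in best_itds]
    probe_resp = sum(g * gaussian(probe, b, sd) for g, b in zip(gains, best_itds))
    adapter_resp = sum(g * gaussian(adapter, b, sd) for g, b in zip(gains, best_itds))
    return probe_resp - adapter_resp

def two_channel_change(adapter, probe, slope_us=50.0, strength=0.5):
    """Same quantity for one hemispheric channel with sigmoidal ITD tuning."""
    def rate(x):
        return 1.0 / (1.0 + math.exp(-x / slope_us))
    gain = 1.0 - strength * rate(adapter)
    return gain * (rate(probe) - rate(adapter))

best_itds = [-600.0 + 20 * k for k in range(61)]   # dense, unbiased tiling
for step in (100.0, -100.0):
    print(step,
          round(labeled_line_change(0.0, step, best_itds), 3),  # positive both ways
          round(two_channel_change(0.0, step), 3))              # sign follows step
```

In this sketch the labeled-line change is positive for shifts in either direction because the probe drives less-adapted channels, while the two-channel change takes the sign of the shift; the parameter values only set the magnitudes.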
What neural mechanisms underlie the transformation between IC and A1?
The neural circuitry that facilitates the transformation of the neural representation of ITDs between IC and A1 is not yet clear. It is possible to transform a distribution of best ITDs with a strong contralateral bias into an unbiased one either through the addition of inputs with opposing preferences or through the subtraction of inputs with similar preferences but different tuning curves (Groh et al., 2003). In principle, either of these possibilities could be implemented between IC and A1, even within a single brain hemisphere, using either the small subpopulation of cells in each IC with best ITDs corresponding to locations in the ipsilateral hemifield, or the heterogeneity of tuning curves in the majority of cells with best ITDs corresponding to locations in the contralateral hemifield.
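The additive route can be illustrated with a simplified model that is an assumption, not something tested in this study: if ITD tuning at a single frequency is approximated by a cosine, then a weighted sum of an input peaking at the contralateral edge and an input peaking at the ipsilateral edge is again a cosine, w_c cos(ω(x - b_c)) + w_i cos(ω(x - b_i)) = R cos(ωx - φ), whose peak lies at an intermediate ITD set by the relative weights. Under this assumption, varying the weights across cortical cells would spread best ITDs across the range. The 500 Hz frequency, the ±160 μs input best ITDs, and the function name are hypothetical.

```python
import math

def combined_best_itd(w_contra, w_ipsi, best_contra_us, best_ipsi_us, freq_hz):
    """Best ITD (us) of w_contra*cos(w(x - b_c)) + w_ipsi*cos(w(x - b_i)).

    The sum equals R*cos(w*x - phi), where phi is the angle of the weighted
    phasor sum, so its peak within half a period of zero ITD is x = phi / w.
    """
    omega = 2.0 * math.pi * freq_hz * 1e-6  # radians per microsecond
    s = (w_contra * math.sin(omega * best_contra_us)
         + w_ipsi * math.sin(omega * best_ipsi_us))
    c = (w_contra * math.cos(omega * best_contra_us)
         + w_ipsi * math.cos(omega * best_ipsi_us))
    return math.atan2(s, c) / omega

# Hypothetical inputs: IC cells peaking at +160 us (contra) and -160 us (ipsi).
for w_ipsi in (0.0, 0.25, 0.5, 0.75, 1.0):
    best = combined_best_itd(1.0 - w_ipsi, w_ipsi, 160.0, -160.0, 500.0)
    print(w_ipsi, round(best, 1))
```

Equal weights place the peak at zero ITD, and shifting weight toward the ipsilateral-preferring input moves the peak monotonically from +160 to -160 μs.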
If the transformation between IC and A1 does involve integration across the two brain hemispheres, it is likely through callosal connections (Budinger et al., 2000), as the projections from the IC to the auditory thalamus and from the thalamus to A1 are predominantly ipsilateral (Winer and Schreiner, 2005, 2011). The possibility that callosal connections play a role in shaping ITD tuning in A1 could explain why a unilateral cortical lesion results in a behavioral deficit for only contralateral locations (Jenkins and Masterton, 1982; Jenkins and Merzenich, 1984; Malhotra et al., 2004); without callosal inputs, the residual sensitivity in the remaining A1 may be for ITDs corresponding to ipsilateral locations only. Efforts to identify the neural circuitry that underlies the transformation from IC to A1 should begin by determining the highest stage at which the distribution of best ITDs still has a strong contralateral bias. While we cannot be certain of the layer in which our A1 recordings were made, it was most likely layer V; it is possible that the distribution of best ITDs in layer IV still has a strong contralateral bias (indeed, recent studies have suggested that the responses of cells in layer IV of A1 are simply amplified versions of their thalamic inputs; Li et al., 2013), and that the transformation takes place between layer IV and layer II/III, or between layer II/III and layer V.
How does the ITD tuning in gerbil IC observed in this study compare with that observed previously?
The results of our population decoding analysis of IC responses differ somewhat from those of a similar analysis that we performed in a previous study (Lesica et al., 2010). In the previous study, the performance of the labeled-line and two-channel decoders was nearly identical, whereas in the current study, the performance of the labeled-line decoder was substantially better than that of the two-channel decoder. The difference in the performance of the two-channel decoder in the two studies is consistent with the differences in the distributions of best ITDs in the two populations of cells that were studied. In the previous study, we found that nearly all IC cells had best ITDs corresponding to the contralateral edge of the physiological range and, because of this homogeneity, very little information was lost when ignoring the tuning of individual cells and decoding only the total activity in the population. In the current study, however, 17% of the cells in our IC sample had best ITDs corresponding to locations in the ipsilateral hemifield, and this heterogeneity affected the performance of the two-channel decoder.
We believe that the difference in the distributions of best ITDs in the two studies is due to a difference in the fraction of the IC that was sampled during our recordings. In the previous study, we used a bundle of concentrically arranged electrodes that spanned a relatively small area and targeted the recordings to the rostromedial quadrant of the IC where the dominant input is provided by the MSO (Cant and Benson, 2006), while in the current study, we used a much larger electrode array and sampled a larger fraction of the IC. Thus, the results of the present study are likely a more accurate reflection of the processing of ITDs in the IC as a whole, and are consistent with other recent studies suggesting that differences in the ITD tuning of IC cells carry significant information (Day and Delgutte, 2013; Goodman et al., 2013).
Footnotes
This work was supported by the Wellcome Trust and the Munich Center for Neurosciences. We thank D. McAlpine, M. Pecka, and J. Bizley for helpful discussions.
The authors declare no competing financial interests.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
Correspondence should be addressed to Nicholas Lesica, Ear Institute, University College London, Gower Street, London WC1E 6BT, UK. n.lesica@ucl.ac.uk
This article is freely available online through the J Neurosci Author Open Choice option.