Coding of Sound-Source Location by Ensembles of Cortical Neurons

We examined the coding of sound-source location by ensembles of neurons in the auditory cortex. Broadband noise bursts were presented from loudspeakers throughout 360° in the horizontal plane. Sound levels varied from 20 to 40 dB above neural thresholds. We recorded temporal spike patterns simultaneously at 16 recording sites in area A2 of α-chloralose-anesthetized cats. Spike patterns of individual units varied in spike counts and in spike timing as a function of sound-source location. Ensembles of up to 19 units recorded simultaneously demonstrated additional location sensitivity in the form of relative spike counts and relative spike timing among neurons. We used an artificial neural network (ANN) algorithm to recognize ensemble spike patterns and, thereby, to infer the locations of sound sources. The ANN could estimate stimulus locations based on ensemble responses to single-stimulus presentations. Median errors (MEs) averaged 49.2 ± 11.9° (mean ± SD; n = 34; chance level, 90°). The ANN maintained better-than-chance performance even when input spike patterns were expressed as relative spike counts across units (i.e., no information available from absolute spike counts of individual units; ME, 63.0 ± 11.8°) or when spike latencies were represented as time relative to the first spike for each trial (i.e., no external time reference available; ME, 54.3 ± 12.4°). The ANN performance improved monotonically as the sizes of ensemble patterns were increased by combining patterns across the entire unit sample. The performance by ensembles of 128 units approached the level of localization performance of behaving cats.

Key words: sound localization; auditory cortex; neural population; artificial neural network; neural coding; spatial hearing
The necessity of intact auditory cortex for normal sound localization has been demonstrated in clinical studies in humans (Greene, 1929; Wortis and Pfeiffer, 1948; Sanchez-Longo and Forster, 1958; Klingon and Bontecou, 1966) and in ablation-behavioral studies in animals (Jenkins and Masterton, 1982; Jenkins and Merzenich, 1984; Kavanagh and Kelly, 1987). Nonetheless, the physiological mechanisms for the cortical representation of sound-source location are not well understood. In particular, physiological studies consistently have failed to demonstrate a topographical representation of auditory space in the cortex [cat (Middlebrooks and Pettigrew, 1981; Imig et al., 1990; Rajan et al., 1990; Korte and Rauschecker, 1993; Brugge et al., 1994, 1996; Middlebrooks et al., 1998); monkey (Ahissar et al., 1992)]. Our previous studies have shown that the spike patterns of single neurons in the auditory cortex can carry information about sound-source locations throughout 360° of space (Middlebrooks et al., 1994), and such "panoramic" neurons are distributed widely throughout the auditory cortex. The results have led us to recognize an alternative "distributed code" in which information about any point in auditory space is distributed across large populations of neurons. In the present study, we attempted to quantify the accuracy with which sound-source location can be coded by small populations of cortical neurons or neural ensembles. Ensemble spike patterns were obtained by recording unit activity simultaneously from 16 sites in cortical area A2.
Neurons in area A2 generally have favorable features for the study of sound-location coding, such as broad frequency tuning (Schreiner and Cynader, 1984), sensitivity to sound location in both the horizontal and vertical planes, and spatial sensitivity that parallels psychophysical responses to sounds that produce spatial illusions (Xu et al., 1999a,b). An artificial neural network (ANN) algorithm was used to identify sound-source locations by recognizing characteristic spatiotemporal spike patterns of cortical neural ensembles. The ANN could recognize high-dimensional input patterns without need for a priori specification of particular information-bearing features of the patterns, such as spike counts or first-spike latencies. We interpreted the accuracy of ANN estimations as an empirical measure of the amount of stimulus-related information carried by the spike patterns. In further analyses, we eliminated particular features in the spike patterns that might carry stimulus-related information. The degradation of ANN performance that resulted from elimination of specific features demonstrated the relative importance of those features.
The results show that the accuracy of single-trial identification of stimulus location improved with increases in the sizes of neural ensembles. We demonstrate three features of ensemble spike patterns that could account for this improvement. First, the addition of samples from multiple units increased the signal-to-noise ratio of the neural signals. Second, the combination of units with different spatial sensitivities provided independent information about differing spatial regions. Third, ensemble spike patterns carried information in the form of location-specific differences in activity across units, such as relative spike count and relative spike timing. The results demonstrated that location signaling by neural ensembles of moderate size approached the level of accuracy exhibited by behaving animals.

MATERIALS AND METHODS
Experimental apparatus and stimulus generation. The experimental apparatus and the procedure for stimulus generation were identical to those detailed previously. Briefly, experiments were controlled with an Intel-based personal computer. Acoustic stimuli were synthesized digitally, using equipment from Tucker-Davis Technologies (TDT; Gainesville, FL). The sound-attenuating experimental chamber was lined with acoustical foam (Illbruck, Minneapolis, MN) to suppress reflections of sounds at frequencies >500 Hz. Sounds were presented from multiple calibrated loudspeakers, one loudspeaker at a time, at a distance of 1.2 m from the animal's head. A circular hoop held 18 loudspeakers in the horizontal plane with an angular separation of 20°. The speaker location directly in front of the animal was labeled 0°, and positive azimuths indicated speakers on the right side of the animal, which was ipsilateral to the recorded cortical hemisphere. Noise bursts were 80 msec in duration with abrupt onsets and offsets. Tone bursts were 80 msec in duration, ramped on and off with 5 msec rise/fall times. Noise and tone bursts were presented once every ~800 msec.
Animal preparation. This report presents data from 10 purpose-bred adult cats of both sexes. The animal preparation was identical to that detailed previously. In brief, isoflurane anesthesia was used during surgery, and α-chloralose was used for unit recording. All recordings were made from the right cortical hemisphere. A skull opening was made to reveal the middle ectosylvian gyrus, and a plastic chamber was cemented around the ventral margin of the opening to contain a pool of silicone oil. The scalp was sutured closed around the plastic chamber. The animal was positioned in the center of the sound-attenuating chamber, with its body supported in a sling with a heating pad and its head supported from behind by a bar attached to a skull fixture.
Thin wire supports were used to push the external ears into a forward position (Middlebrooks and Knudsen, 1987). The position of the ears was constant throughout each experiment. Previous studies have demonstrated that large experimenter-produced changes in ear position can change the spatial location of greatest sensitivity of the ear (Middlebrooks and Knudsen, 1987) and can change the center frequency of spectral notches in the head-related transfer function (HRTF) for particular locations (Young et al., 1996). Nevertheless, such changes in ear position do not seem to alter the overall structure of HRTFs. A study by Xu and Middlebrooks (2000) has shown that changes in HRTFs that are produced by small changes in ear position are substantially smaller than intercat differences in HRTFs.
At the end of each experiment, the animal was killed. The cortex was immersed in buffered aldehydes and later inspected visually to confirm the region of cortex recorded.
Data acquisition and spike sorting. Unit activity was recorded extracellularly with silicon-substrate multichannel probes (Anderson et al., 1989) that were provided by the University of Michigan Center for Neural Communication Technology. We used probes of type 16CHAN3, which permitted simultaneous recording from as many as 16 cortical sites. Each probe had one shank along which 16 recording sites were located at 100 µm intervals. Impedances were 2-4 MΩ at 1 kHz. The activity at each site was amplified with custom hardware, digitized at a sampling rate of 25 kHz, sharply low-pass filtered below 6 kHz, resampled at 12.5 kHz, and stored on a computer disk for off-line spike sorting. For monitoring purposes, spikes on selected channels were discriminated on-line with an amplitude and time discriminator (TDT model SD1). On-line monitoring was used to estimate the units' threshold sound pressure levels (SPLs) and frequency tuning.
The results presented here were based on spikes that were discriminated off-line using custom software. The off-line spike-sorting procedure used a template-matching algorithm that consisted of three stages. First, the recorded waveforms were interpolated to permit resampling at 50 kHz, and waveform peaks that exceeded a criterion level were identified as candidate spikes. Second, candidate spike waveforms were analyzed using principal components analysis, and the weights on the first and second principal components were plotted. Candidate spikes that were likely from the same unit tended to form a cluster of points on the scatter plot. An operator selected such a cluster on a computer screen, and a template waveform and acceptance limits were determined on the basis of the selected candidate spikes. Usually, only one template was generated for each recording site. At 7% of recording sites, however, two units could be discriminated, so two templates were generated. In the third stage, the template was used to screen all the candidate spikes for each recording site, and the poststimulus times of accepted spikes were stored with 20 µsec resolution. Units were subjected to a screening for responsiveness and stability according to the following criteria: (1) the mean spike count for the best stimulus was >1 spike per trial, and (2) the spike counts for the first and second halves of trials of a recording session (summed across randomized stimulus conditions) differed by a factor of no greater than two. If fewer than five units were available for any probe penetration after this screening, the data for that penetration were excluded from the analysis.
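The two screening criteria and the five-unit minimum can be written compactly. The following Python sketch is only an illustration under stated assumptions: the function names and the data layout (a dictionary of per-trial spike counts keyed by stimulus, plus summed counts for each half of the session) are hypothetical and are not taken from the custom software used in the study.

```python
def passes_screening(counts_by_stimulus, first_half_total, second_half_total):
    """Apply the responsiveness and stability criteria to one unit.

    counts_by_stimulus: dict mapping a stimulus label to a list of
        per-trial spike counts (hypothetical layout).
    first_half_total, second_half_total: spike counts summed over the
        first and second halves of the session.
    """
    # Criterion 1: mean spike count for the best stimulus exceeds 1 spike/trial.
    best_mean = max(sum(c) / len(c) for c in counts_by_stimulus.values())
    responsive = best_mean > 1.0

    # Criterion 2: the two halves differ by a factor of no more than two.
    low, high = sorted((first_half_total, second_half_total))
    stable = low > 0 and high / low <= 2.0

    return responsive and stable


def keep_penetration(unit_flags, min_units=5):
    """Keep a penetration only if at least min_units units passed screening."""
    return sum(1 for ok in unit_flags if ok) >= min_units
```

For example, a unit whose best stimulus evokes 1.7 spikes per trial on average and whose half-session totals are 100 and 150 spikes would pass, whereas totals of 100 and 250 would fail the stability criterion.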
The final data set that passed all criteria amounted to 377 units at 350 recording sites in 34 probe penetrations. Fifty-four of the 377 units were identified as well isolated single units according to the following additional criteria: (1) the weights on the first and second principal components formed a discrete cluster, and (2) the distribution of interspike intervals formed across all trials peaked at >2 msec. In the remaining 323 cases that failed to meet one or both of these criteria, the recording probably consisted of indistinguishable spikes from two or more neurons. In Figure 1, the top and bottom sections of the spike-waveform panels (b, e) show the first and last 50 spikes collected from the whole series of trials. The site at 1200 µm (a-c) had distinct spike shapes, and the obtained unit was identified as a single unit on the basis of the above criteria. Records from the site at 200 µm (d-f) are representative of our typical recordings, in which the signal-to-noise ratio was relatively low, spike waveforms were distributed continuously in shape and size, and interspike intervals were generally short. The unit at the 200 µm site was thus classified as multiunit. For both sites, top and bottom sections of the panels for the raw record and spike waveforms indicate stable recording throughout the series, which lasted ~1.5 hr. In general, we observed no systematic difference in stimulus coding between the well isolated units and the others. We presume that contamination of the single-unit recording by additional units could only decrease the spatial specificity of spike patterns, so we regard our estimates of stimulus-coding accuracy as conservative.
Pairs of units recorded from adjacent sites sometimes showed sharp peaks at 0 msec in histograms of between-unit spike times. This implies that spikes from one unit appeared on more than one recording site, and thus our spike-sorting procedure accepted those common spikes multiple times at individual sites. We believe, however, that such common spikes had negligible effects on the present analyses. In >96% of units, common spikes (defined as spikes that occurred within ±50 µsec of spikes in other units) accounted for <10% of the total number of spikes. We also presume that contamination by common units could only decrease the efficiency of stimulus coding by multiple units, so we regard our estimates of coding accuracy as conservative.
Experimental procedure. Recordings were made from penetrations that passed dorsoventrally, oblique to the cortical surface near the crest of the middle ectosylvian gyrus, ventral to area A1. Search stimuli consisted of broadband noise bursts, presented in the region of 0° to contralateral 40° azimuth. The penetration depth was adjusted to observe unit responses at as many recording sites as possible. Typically, unit responses were observed at ~10 out of 16 recording sites in each probe penetration. Area A2 was distinguished from cortical area A1 by the absence of tonotopic organization and by response bandwidths that were one or more octaves at 40 dB above threshold.
We restricted attention to cortical area A2. We favored area A2 over adjacent area A1 because area A2 neurons show broader frequency tuning and for that reason presumably are better suited to integrate location cues across a broad frequency range. We favored area A2 over the anterior ectosylvian sulcus area (area AES) because our previous studies of single units in areas A2 and AES show quantitatively somewhat more uniform representation of auditory space by A2 units, particularly in the vertical dimension. Also, the relation of area A2 to thalamic inputs and to other auditory fields is somewhat better understood than is that for area AES. Nevertheless, in comparing recordings from areas A2, AES (Middlebrooks et al., 1998), and A1 (Middlebrooks and Pettigrew, 1981) (J. C. Middlebrooks, L. Xu, and S. Furukawa, unpublished observations), we see no obvious qualitative specialization of one area over another for spatial coding. Most units in those three areas show similarly broad spike-count tuning for sound location when sounds are presented at moderate levels. The temporal spike patterns of most units can represent sound-source locations with varying degrees of accuracy throughout 360° of auditory space. Because of the qualitatively similar responses of single units in areas A2, AES, and A1, we see no reason to expect prominent interarea differences in location coding by neural ensembles.
Study of the units in each electrode penetration began by identifying a sound-source azimuth at which units responded reliably, typically 0° or contralateral 40°, and then measuring responses to noise bursts at a range of SPLs in 5 dB steps. The units' thresholds were estimated to the nearest 5 dB by inspection of poststimulus time histograms and plots of spike count versus SPL on the on-line monitor. When the units' threshold SPLs were not the same between recording sites, we adopted the modal threshold SPL of units as the representative threshold SPL for that penetration. Usually, the units' thresholds differed by <10 dB within one probe penetration. Next, the units' frequency sensitivities were measured with a sound source fixed at a location at which a noise source produced a strong response, usually 0° or contralateral 40° azimuth. Tone frequencies were varied in one-third octave steps from 1.18 to 30 kHz. The breadth of frequency sensitivity distinguished area A2 from A1 (Reale and Imig, 1980; Schreiner and Cynader, 1984). Then, we measured the units' spatial sensitivities using a stimulus set that typically consisted of noise bursts presented from 18 azimuths in the horizontal plane (−180 to 160° in steps of 20°) at five SPLs ranging from 20 to 40 dB above the units' threshold. Stimuli were presented in pseudorandom order such that all locations were tested at all SPLs once before repeating all stimuli again in a different random order. Each combination of location and SPL was tested ≥40 times. The study at each probe placement typically lasted ~2 hr. Measurement of azimuth sensitivity normally was followed by presentation of additional stimuli needed for related studies (Xu et al., 1999a,b), so several additional hours often were spent in each electrode penetration. Experiments typically lasted 30-60 hr and yielded recordings from one to seven electrode penetrations.
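The blocked pseudorandom presentation order described above can be sketched as follows. This is a minimal Python illustration, not the stimulus-control code used in the experiments; the function name, the seed argument, and the assumption that the five levels are spaced in 5 dB steps from 20 to 40 dB above threshold are ours.

```python
import random

# 18 azimuths from -180 to 160 degrees in 20 degree steps.
AZIMUTHS_DEG = list(range(-180, 180, 20))
# Five levels in dB above threshold (5 dB spacing is an assumption here).
LEVELS_DB = [20, 25, 30, 35, 40]


def stimulus_sequence(n_repeats, seed=0):
    """Return a list of (azimuth, level) pairs in blocked pseudorandom order.

    Every azimuth/level combination is presented once, in shuffled order,
    before any combination is repeated in the next block.
    """
    rng = random.Random(seed)
    conditions = [(az, lv) for az in AZIMUTHS_DEG for lv in LEVELS_DB]
    sequence = []
    for _ in range(n_repeats):
        block = conditions[:]      # copy so each block is shuffled afresh
        rng.shuffle(block)
        sequence.extend(block)
    return sequence
```

With 18 azimuths and 5 levels there are 90 conditions per block, so 40 repeats correspond to 3600 presentations.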
Data analysis. In off-line spike sorting, spike times were stored as latencies relative to the onset of sound at a loudspeaker. The arrival of sound at the cat's head was delayed by ~3.5 msec because of the acoustical travel time. The range of spike times used for the analysis was between 10 and 60 msec after the stimulus onset. The cortical neurons' spike latencies are longer than 10 msec after the stimulus onset, and we rarely saw robust responses after 60 msec. We created a spike density function from each response by expressing spike times with 100 µsec resolution, convolving the spike times with a Gaussian impulse (σ = 1 msec), and then resampling at 2 msec time resolution. Convolution with the Gaussian impulse served to low-pass filter the spike patterns below 137 Hz, thereby attenuating aliased high frequencies, and served to smooth the otherwise-sparse spike density functions that were used as input to the ANN. The 2 msec resolution was chosen on the basis of preliminary tests with the ANN algorithm. Generally, coarser time resolution resulted in degradation in network performance, and finer resolution increased computation time without appreciable improvement of performance. We refer to spike density functions obtained as described here as "single-unit spike patterns." Note that, in the present paper, the single-unit spikes refer to spikes from a single unit or a small cluster of multiple units recorded at one recording site, as opposed to ensembles of spikes recorded at multiple recording sites, unless otherwise stated. In the analysis of responses of units at multiple cortical sites, single-unit spike patterns for each stimulus presentation were concatenated to form a long vector, referred to as an "ensemble spike pattern." We also manipulated single-unit and ensemble spike patterns to control response features that might carry stimulus-related information (detailed in each section of Results).
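As a rough illustration of the spike density function and the ensemble spike pattern, the following Python sketch smooths spike times with a Gaussian kernel (σ = 1 msec) and samples the result every 2 msec over the 10-60 msec window, which gives 25 values per unit. The function names, the choice to sample at bin centers, and the unit-area kernel normalization are assumptions made for illustration; they are not taken from the original analysis code.

```python
import math

T_START_MS, T_END_MS = 10.0, 60.0   # analysis window after stimulus onset
BIN_MS = 2.0                        # output sampling interval
SIGMA_MS = 1.0                      # Gaussian kernel standard deviation
RES_MS = 0.1                        # spike-time resolution (100 microseconds)


def spike_density(spike_times_ms):
    """Single-trial spike density function for one unit (25 samples)."""
    # Keep spikes inside the window and express them at 0.1 msec resolution.
    spikes = [round(s / RES_MS) * RES_MS
              for s in spike_times_ms if T_START_MS <= s < T_END_MS]
    n_bins = int(round((T_END_MS - T_START_MS) / BIN_MS))
    norm = 1.0 / (SIGMA_MS * math.sqrt(2.0 * math.pi))
    density = []
    for k in range(n_bins):
        t = T_START_MS + (k + 0.5) * BIN_MS   # bin center (assumed sample time)
        # Sum of Gaussian kernels centered on each spike, evaluated at t.
        density.append(sum(norm * math.exp(-0.5 * ((t - s) / SIGMA_MS) ** 2)
                           for s in spikes))
    return density


def ensemble_pattern(trial_spikes_by_unit):
    """Concatenate the single-unit patterns of N units into one 25*N vector."""
    vector = []
    for spikes in trial_spikes_by_unit:
        vector.extend(spike_density(spikes))
    return vector
```

Evaluating the summed kernels directly at the sample times is equivalent to convolving and then resampling, and it avoids building the fine-resolution intermediate signal.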
For the purpose of testing the ANN recognition of spike patterns, we sorted the spike patterns for odd- and even-numbered trials into training and test sets, respectively. Thus, 40 trials yielded 20 training trials and 20 test trials for each stimulus. The separation of training and test sets provided a cross-validation of the pattern recognition scheme. Note that each spike pattern in the present study was a spike density function from a single trial, rather than an average of density functions from multiple trials as used in our previous studies (Middlebrooks et al., 1994; Xu et al., 1998, 1999a). Artificial neural networks were constructed with the MATLAB Neural Network Toolbox (The Mathworks, Natick, MA). The network architecture used in the present study was similar to that used in our previous studies (Middlebrooks et al., 1994; Xu et al., 1998, 1999a). Figure 2 illustrates the architecture of the network. Input to the network consisted of vectors representing spike patterns. There was one hidden layer that had eight units with hyperbolic tangent transfer functions. The output layer had two units that had linear transfer functions and estimated the sine and cosine of the stimulus azimuth. By representing the azimuth by the sine and cosine, we avoided computational difficulties that resulted from the discontinuity in azimuth labels across the rear midline, where azimuths abruptly change from +179 to −180°. The network structure was feed-forward and fully connected. The network was trained with supervision to minimize the mean-squared error in estimates of the sine and cosine of the stimulus azimuth. The two outputs were combined into a single term by forming the arctangent of the two outputs. The only difference from previous studies was in the number of hidden units (eight rather than four units).
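The sine/cosine target coding and the odd/even split of trials can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than the MATLAB code used in the study: the function names are hypothetical, and the four-quadrant arctangent (atan2) is our reading of "forming the arctangent of the two outputs."

```python
import math


def encode_azimuth(azimuth_deg):
    """Target pair (sine, cosine) for a stimulus azimuth in degrees."""
    a = math.radians(azimuth_deg)
    return math.sin(a), math.cos(a)


def decode_azimuth(sin_out, cos_out):
    """Combine the two network outputs into one azimuth in degrees.

    atan2 resolves the quadrant and is insensitive to a common scaling
    of the two outputs, so there is no discontinuity at the rear midline.
    """
    return math.degrees(math.atan2(sin_out, cos_out))


def split_odd_even(trials):
    """Odd-numbered trials (1st, 3rd, ...) train; even-numbered trials test."""
    training = trials[0::2]
    test = trials[1::2]
    return training, test
```

For example, `decode_azimuth(*encode_azimuth(-160))` recovers −160°, and 40 trials split into 20 training and 20 test trials.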
Preliminary analysis of the ensemble spike patterns showed that eight hidden units were approximately optimal; a network with fewer than eight hidden units seemed incapable of recognizing stimulus-related features in ensemble spike patterns effectively. A network with more than eight hidden units often showed slightly poorer performance than that with eight hidden units.
Figure 1. Unit recordings at two recording sites of one electrode penetration, P980618. Recording sites at 1200 µm (a-c) and 200 µm (d-f), ventral to the most dorsal site of the electrode, are represented. a, d, Bandpass-filtered raw traces of records for 80 msec after the stimulus onset (passband, 0.8-5 kHz) are shown. Unit spikes accepted by the spike-sorting procedure are marked with circles. Top and bottom sections of a and d are responses to the same stimulus (35 dB SPL at −40° azimuth) but are for the 16th and 3595th trials, respectively, which were ~1.5 hr apart in time. Amplitudes are on an arbitrary scale, but consistent within each recording site. b, e, Samples of spike waveforms accepted by the spike-sorting procedure are shown. Top and bottom sections of b and e show the first and last 50 spikes, respectively, collected from the whole series of trials. Times are expressed relative to the times of the peaks. Scales of the amplitudes are the same as those of the raw record. c, f, First-order interspike-interval histograms of the spikes accepted are shown. Only intervals <9 msec are shown. The recording for the site at 1200 µm (c) is an example of recordings that had well isolated single-unit spikes. The recording for the site at 200 µm (f) is an example of our typical recordings in which we failed to isolate single-unit spikes because of relatively low signal-to-noise ratio, indistinct spike waveforms, and generally short interspike intervals.
Supervised training of the networks used the "resilient backpropagation" algorithm to adapt network weights and biases (Demuth and Beale, 1998). During training, the network was presented only with spike patterns in the training set. Overtraining with the training set would have led to increases in the error in recognition of the test set. We avoided overtraining by an "early stopping" method. In this method, recognition accuracy for the test set was checked after each epoch of training, and training was halted when the network performance on the test set failed to improve for five epochs in a row. We adopted the weights and biases that resulted in the minimum error for the test set. Because training a network with the back-propagation algorithm begins with randomized weights and biases, each training of networks using a constant set of data produces slightly varying outputs. For that reason, we repeated the network training three times for each training set and then recorded the output of the network that produced the smallest error.
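The stopping rule and the three restarts can be expressed independently of any particular network implementation. The Python sketch below is a generic illustration only: `step`, `evaluate`, and `run_once` are hypothetical callables standing in for one epoch of training, the error on the held-out set, and one complete training run from random initial weights. It does not reproduce the resilient backpropagation update itself.

```python
def train_with_early_stopping(step, evaluate, init_state,
                              patience=5, max_epochs=1000):
    """Train until the monitored error fails to improve for `patience` epochs.

    step(state) -> new state after one training epoch (must not mutate
        the old state, so that the best state can be kept).
    evaluate(state) -> error on the monitored set.
    Returns the state with the minimum monitored error and that error.
    """
    state = init_state
    best_state, best_err = state, evaluate(state)
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        state = step(state)
        err = evaluate(state)
        if err < best_err:
            best_state, best_err = state, err
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_state, best_err


def best_of_restarts(run_once, n_restarts=3):
    """Repeat training from different random starts; keep the smallest error.

    run_once(seed) -> (state, error) for one complete training run.
    """
    results = [run_once(seed) for seed in range(n_restarts)]
    return min(results, key=lambda r: r[1])
```

Returning the best state rather than the last one matches the description that the weights and biases giving the minimum error were adopted.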
The stimulus SPL often had large effects on spike patterns, typically increasing spike counts with increasing SPL. We wanted to identify codes for sound-source location that were invariant with stimulus intensity. Therefore, throughout the study, analyses were performed for responses to stimuli at five levels, ranging from 20 to 40 dB above the units' threshold in 5 dB steps.
The median value of the unsigned error (median error) was used as a summary measure of the quality of network performance. An alternative would have been to compute transmitted information in an information-theoretic sense. In the present study, we preferred the median error to transmitted information, because the median error is intuitively more comparable with psychophysical measures of sound-source localization. Also, in a pilot analysis, we computed the transmitted information of network outputs on the basis of stimulus-versus-response matrices with response locations categorized in 20° steps. A plot of the transmitted information against the median error generally lay on a smooth, monotonically decreasing curve, regardless of the configuration of input spike patterns and the data set. This indicates that median errors and transmitted information had practically a one-to-one relationship. Under a few conditions in which the transmitted information did not correspond with the median error, we have reason to believe that the transmitted information measurement overestimated the amount of useful information.
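A short Python sketch may help make the median error concrete. It assumes that the unsigned error is the smaller of the two angular distances around the circle, so that it lies between 0 and 180°; the function names and the simulated random guesser are illustrative assumptions. Under that assumption, uniformly random estimates give a median error near 90°, which matches the chance level quoted in this paper.

```python
import random
import statistics


def unsigned_error_deg(target_deg, estimate_deg):
    """Smallest angular distance between target and estimate (0 to 180)."""
    d = (estimate_deg - target_deg) % 360.0
    return min(d, 360.0 - d)


def median_error(targets_deg, estimates_deg):
    """Median of the unsigned errors over all trials."""
    return statistics.median(
        unsigned_error_deg(t, e) for t, e in zip(targets_deg, estimates_deg))


def chance_median_error(n_trials=20000, seed=0):
    """Median error of a simulated observer that guesses uniformly at random."""
    rng = random.Random(seed)
    azimuths = list(range(-180, 180, 20))
    targets = [rng.choice(azimuths) for _ in range(n_trials)]
    guesses = [rng.uniform(-180.0, 180.0) for _ in range(n_trials)]
    return median_error(targets, guesses)
```

Note that a target at 170° and an estimate at −170° differ by 20°, not 340°, because the error is measured across the rear midline.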
Two additional summary statistics that we used are the mean direction, or centroid, as a measure of the central tendency of estimates and the circular variance as a measure of the dispersion of estimates (Fisher, 1993). The centroid is the direction of the vector sum of the unit vectors for the sample directions. That is, for a given set of sample directions, $\theta_1, \theta_2, \ldots, \theta_n$, the centroid $\bar{\theta}$ of the samples is the direction of the resultant vector with components $C$ and $S$ and length $R$:
$$C = \sum_{i=1}^{n} \cos\theta_i, \qquad S = \sum_{i=1}^{n} \sin\theta_i, \qquad R = \sqrt{C^2 + S^2}, \qquad \cos\bar{\theta} = C/R, \quad \sin\bar{\theta} = S/R.$$
The circular variance $V$ is given by one minus the mean length of the resultant vector; that is, $V = 1 - R/n$. The value of $V$ ranges from 0 (perfect alignment of all responses) to 1 (highly dispersed responses); $V$ is equivalent to one minus the "vector strength" (Goldberg and Brown, 1969).
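The centroid and circular variance can be computed directly from the definition as the direction and mean length of the vector sum of unit vectors. The Python sketch below is illustrative only; the function name is hypothetical, and the use of atan2 to obtain the direction of the resultant is a standard choice rather than something stated in the text.

```python
import math


def centroid_and_circular_variance(directions_deg):
    """Return (centroid in degrees, circular variance V) for sample directions."""
    n = len(directions_deg)
    # Components of the vector sum of unit vectors.
    c = sum(math.cos(math.radians(d)) for d in directions_deg)
    s = sum(math.sin(math.radians(d)) for d in directions_deg)
    r = math.hypot(c, s)                        # length of the resultant vector
    centroid = math.degrees(math.atan2(s, c))   # direction of the resultant
    variance = 1.0 - r / n                      # V = 1 - R/n
    return centroid, variance
```

For estimates clustered at 170° and −170°, the centroid falls at the rear midline (±180°) with a small V, whereas an arithmetic mean of the labels would wrongly give 0°. When the estimates are spread evenly around the circle, R is near zero, V approaches 1, and the centroid is not meaningful.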

RESULTS
We begin by describing the spatial sensitivity of single units and of ensembles of units. Then, we apply an ANN algorithm to identify sound-source location by recognizing the spike patterns of single units and unit ensembles. Next, we examine some specific features of ensemble response patterns that might carry stimulus-related information. Finally, we test location coding by large ensembles of units and compare with sound localization by behaving cats.

Response patterns of single units and of ensembles of units
Generally, units responded to the stimuli with one or a few spikes that fell within ~60 msec after the stimulus onset. Spike counts and latencies tended to vary with sound-source location. The spatial tuning of spike counts generally was broad, and the width of spatial tuning often increased with SPL. Those general characteristics were consistent with previous studies of area A2 and other cortical areas [A1 (Middlebrooks and Pettigrew, 1981; Brugge et al., 1996); AES]. Figure 3 represents three units recorded from one electrode penetration (P980618). Units 400a, 1200a, and 1400a were recorded at 400, 1200, and 1400 µm distant from the most dorsal recording site, respectively. Raster plots (Fig. 3, top) represent spike latencies for various source locations at 20 dB above the units' thresholds. Each horizontal row of vertical bars represents one spike pattern. Each band separated by dotted lines represents 10 examples of spike patterns elicited at one source location. The bottom row of plots in Figure 3 shows average spike counts as a function of the sound-source azimuth. The thick and thin lines indicate stimulus SPLs of 20 and 40 dB above the units' thresholds, respectively. In the example, one can see that the first-spike latency varied with stimulus azimuth for unit 1200a, whereas spike latencies of unit 1400a were relatively invariant with stimulus location. Units 400a and 1400a showed some degree of contralateral tuning in spike count, but unit 1200a had flat tuning. Increasing the stimulus SPL by 20 dB generally broadened the tuning of all three units. As shown in Figure 3, units differed in the spatial sensitivity of the magnitude and timing of their spike patterns. Those differences presumably would enhance spatial coding by ensembles of units. Figure 4 shows the spatiotemporal distribution of spikes elicited by sounds at six locations. In this format, the gray scale represents spike probabilities averaged over 40 trials.
The y-axis represents cortical place, as distance relative to the most dorsal recording site, and the x-axis represents time after stimulus onset. One can see several features in the patterns that vary with the stimulus location, including general strength of response, response latency after stimulus onset, relative response strength among units, and relative response latency among units.

Recognition of ensemble spike patterns
We tested the accuracy with which stimulus locations could be identified by recognition of their corresponding cortical spike patterns. We used ANNs for pattern recognition. Figure 2 diagrams the network architecture used in the present study; the details of network architecture are described in Materials and Methods. We used various configurations of inputs to the network to explore features in single-unit and ensemble spike patterns that might carry stimulus-related information (see Materials and Methods for the definitions of single-unit and ensemble spike patterns). Figure 2 illustrates an example in which inputs were ensemble spike patterns. In this example, each input vector was constructed by representing poststimulus spike times for each unit with a vector of 25 time bins of 2 msec each and then by concatenating the vectors of N units to form a (25 × N)-dimensional vector. We emphasize that, except when stated otherwise, input vectors were based on spike patterns from single trials, unlike our previous studies that used average spike patterns of multiple trials drawn randomly with replacement (Middlebrooks et al., 1994; Xu et al., 1998, 1999a). We compared the identification of sound location based on the recognition of ensemble spike patterns with that based on the recognition of single-unit spike patterns. As described above, ensemble spike patterns were (25 × N)-element vectors based on N units recorded simultaneously in a single trial. The bar graph of Figure 5 shows the network classification performance of ensemble spike patterns. Each bar represents the median error of the network estimations for one electrode penetration. The results are ordered according to the median error for ease of visual inspection. Note that the result for each penetration was based on spike patterns using all the available units (after screening), and thus the number of units varied between 5 and 19 from one penetration to another.
Median errors ranged between 22.9 and 85.1° (49.2 ± 11.9°, mean ± SD), and the majority of ensembles performed much better than the 90° median error that is expected for chance performance. The plus signs indicate the median errors of ANN estimates based on spike patterns of the individual units that formed the unit ensemble represented by the corresponding bars. Here, each input vector was a 25-bin vector based on a single-unit spike pattern for a single trial. Not surprisingly, the network performance for single-unit patterns was considerably worse than that for ensemble patterns in nearly every instance.
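The construction of single-unit and ensemble input vectors described above can be sketched as follows. This is a minimal illustration, not the analysis code of the study: it assumes that the 25 bins of 2 msec start at stimulus onset, it uses plain bin counts without any smoothing, and the function names and spike times are hypothetical.

```python
import numpy as np

N_BINS = 25    # bins per unit, as stated in the text
BIN_MS = 2.0   # bin width in msec, as stated in the text

def unit_vector(spike_times_ms):
    """Bin one unit's poststimulus spike times (msec) into 25 bins of 2 msec.

    Assumption: the window starts at stimulus onset (0 to 50 msec).
    Spikes outside the window are ignored.
    """
    edges = np.arange(N_BINS + 1) * BIN_MS   # 0, 2, ..., 50 msec
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    return counts.astype(float)

def ensemble_vector(trial_spikes):
    """Concatenate the 25-bin vectors of N units into one 25*N vector."""
    return np.concatenate([unit_vector(st) for st in trial_spikes])

# Hypothetical single trial from a three-unit ensemble (made-up spike times).
trial = [[12.3, 14.1], [18.0], []]
x = ensemble_vector(trial)   # one input vector with 25 * 3 = 75 elements
```

A single-unit input vector is simply the output of `unit_vector` for one unit on one trial.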
One factor that could account for the poorer performance in the single-unit condition compared with the ensemble condition was large trial-by-trial variability of single-unit spike patterns. That variability could be reduced by combining multiple independent samples of spike patterns. To evaluate the effects of response variability, we presented an ANN with spike patterns compiled from responses of single units across multiple trials. For each penetration, we chose the single unit that showed the best performance (i.e., smallest median error) in the single-trial condition. Then, we concatenated spike patterns of that unit drawn from multiple trials. The number of trials used for one input vector was the same as the number of available units for the corresponding penetration, so that the size of an input vector was equal to that of an ensemble spike pattern. For each input vector, trials were selected randomly with replacement. We refer to this configuration of input vectors as the "best-unit concatenated-trials" configuration. A comparison with the ensemble configuration is shown in Figure 6. The figure shows that median errors for the best-unit concatenated-trials configuration (mean, 46.0 ± 10.3°) usually were equal to or slightly smaller than those for the ensemble (single-trial) configuration (49.2 ± 11.9°; difference, 3.3 ± 4.1°; p < 0.001, paired t test; df = 33). Note that the ensembles included individual units that showed both good and poor ANN performance, whereas the spike patterns for the best-unit concatenated-trials configuration were based on the best units. Nonetheless, the result raises the possibility that the advantage of ensemble spike patterns over single-unit patterns is nothing more than a reduction in trial-by-trial variability.
That possibility is countered by examination of an alternate measure of ANN performance. The centroid error represents the deviation of the centroid of network estimates (as defined in Materials and Methods) from the actual stimulus location. The median error is influenced both by the centroid error and by the trial-by-trial variance of network estimates (the circular variance, see Materials and Methods). We found that network estimates based on ensemble patterns generally had smaller centroid errors than did estimates based on best-unit concatenated-trials patterns. Figure 7 shows the correspondence of network estimates with actual stimulus locations for the ensemble (left) and best-unit concatenated-trials (right) configurations. The results are for penetration P990450 and for unit 12a for the best-unit concatenated-trials configuration. Each plus sign represents a network estimate based on one input spike pattern, plotted against the stimulus azimuth. The solid lines indicate the centroids of network estimates at each azimuth. For these examples, the median errors of network responses were similar for the ensemble and the best-unit concatenated-trials conditions (39.4 and 38.9°, respectively), but the deviation of the centroid from the perfect-performance line was smaller for the ensemble configuration than for the best-unit concatenated-trials configuration. The smaller centroid error for the ensemble configuration was balanced by the somewhat larger variance of responses (i.e., the large scatter of points around the centroids). The centroid error found in the best-unit concatenated-trials configuration probably reflects relatively nonuniform spatial sensitivity of the single unit. The centroid error could be reduced by combining spike patterns of multiple units, each of which had a somewhat different pattern of spatial sensitivities.
The accuracy of location recognition based on ensemble spike patterns was compared with that of the best-unit concatenated-trials configuration across the sample of 34 penetrations tested at 18 sound-source locations. On average, (unsigned) centroid errors were smaller for the ensemble configurations (difference, 5.8 ± 29.3°; p < 0.001, paired t test; df = 611), but the variance was slightly larger (difference, 0.10 ± 0.17; p < 0.001, paired t test; df = 611).
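The three performance measures used here can be illustrated with a short sketch. The exact definitions are given in Materials and Methods; the code below assumes the standard circular-statistics forms (centroid as the direction of the mean resultant vector, circular variance as one minus the resultant length), and the estimates are made-up values. It also shows why uniformly random guesses over 360° yield a median error near the 90° chance level.

```python
import numpy as np

def angular_error(est_deg, target_deg):
    """Unsigned angular difference in degrees, wrapped to [0, 180]."""
    d = (np.asarray(est_deg, float) - target_deg + 180.0) % 360.0 - 180.0
    return np.abs(d)

def centroid_and_variance(est_deg):
    """Mean direction (deg) and circular variance (1 - R) of the estimates.

    Assumption: these standard definitions match those of the study.
    """
    th = np.deg2rad(np.asarray(est_deg, float))
    c, s = np.cos(th).mean(), np.sin(th).mean()
    centroid = np.rad2deg(np.arctan2(s, c))
    return centroid, 1.0 - np.hypot(c, s)

# Hypothetical network estimates for a target at 40 deg (made-up values).
target = 40.0
estimates = np.array([20.0, 60.0, 50.0, 30.0, 100.0])
median_error = np.median(angular_error(estimates, target))
centroid, circ_var = centroid_and_variance(estimates)
centroid_error = angular_error(centroid, target)

# Chance level: random guesses spread uniformly around the circle.
rng = np.random.default_rng(0)
chance = np.median(angular_error(rng.uniform(-180, 180, 100_000), target))
```

With these definitions, a set of estimates can have a small centroid error and a large variance, or the reverse, while giving a similar median error.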
In sum, this result suggested an advantage of combining information from units that vary in spatial sensitivity. This factor probably could account for the difference in ANN performance between neural ensembles that consisted of single units with nearly equal median errors when tested individually. For example, in Figure 5, penetrations P980618 and P981120 (first and sixth bars from left, respectively) consisted of single units with similar ranges of median errors in the single-unit configuration (compare the distributions of the plus signs), but the median error for the ensemble of P980618 was substantially smaller. We noted that the spatial sensitivity varied considerably among the units in penetration P980618 (see Fig. 3), whereas the spatial sensitivity was fairly constant among the units of penetration P981120. Thus, the coding efficiency of a neural ensemble is determined not only by the sensitivities of individual member neurons but also by the diversity of response patterns across units.

Exploring codes for sound-source localization
In this section, we consider specific features of the magnitude and timing of ensemble spike patterns that might contribute to accurate stimulus-location recognition. Our basic approach is to identify a feature that might carry stimulus-related information, to eliminate that feature, and to test for a change in the accuracy of location recognition.

Count versus timing information
Spatiotemporal response patterns like those in Figure 4 appear to show stimulus-related variation in the distribution of spike latencies. We tested the hypothesis that spike latencies carry stimulus-related information by comparing the ANN recognition of locations with two configurations of input vectors. The "full-pattern" configuration used input vectors to the ANN that were full-ensemble spike patterns (identical to the ensemble configuration in the preceding section). The "count-only" configuration used vectors that consisted of only the spike count of each unit. That is, the response of an ensemble of N units was represented by N numbers; no timing information was maintained. The spike count of each unit within an ensemble was expressed as a z-score by subtracting the mean across trials and dividing by the SD. This normalization was conducted to equalize the dynamic range of spike counts between units. Although this normalization had no effect on the results of the analysis in this section, it was done for consistency with other conditions tested in the following section. Figure 8 plots median errors of ANN estimates for the count-only configuration versus those for the full-pattern configuration. Each circle represents one penetration, and the diagonal line indicates equal median errors. Median errors for the count-only condition were substantially lower than the chance-level median error of 90°. This indicates that the pattern of spike counts only, without spike-timing information, can carry a considerable amount of location-related information. The figure also shows that median errors for the count-only condition (58.8 ± 12.0°) were almost always larger than those for the full-pattern condition (49.2 ± 11.9°; difference, 9.5 ± 6.0°; p < 0.001, paired t test). Thus, spike timing carries additional location-related information.

Coding by relative spike count
We can distinguish two features of ensemble spike counts that might carry stimulus-related information. One is the strength of the total activity of multiple units. We refer to this code as the "absolute-count" code. This type of code might be effective to some degree for auditory cortical neurons for the reason that spike counts of most units are tuned to contralateral locations (Brugge et al., 1996; Middlebrooks et al., 1998) so the sum of spike counts across units also tends to show contralateral tuning. Another type of code is the profile of spike counts or relative spike counts across units. We refer to this as the "relative-count" code. We presume that the relative-count code could be effective because individual neurons have more or less unique patterns of spatial tuning. Therefore relative spike counts among units would be sensitive to sound locations. The relative-count code might be advantageous over the absolute-count code because the activity of multiple units is often modulated by common factors such as the stimulus SPL (Brugge et al., 1996; Middlebrooks et al., 1998) and by widespread fluctuations in cortical activity (Arieli et al., 1996). We evaluated the contributions of those two types of codes to stimulus representation by the patterns of ensemble spike counts.
Figure 7. Correspondence of network estimates with actual stimulus locations for the ensemble configuration (left) and the best-unit concatenated-trials configuration (right). The results are for penetration P990450 and for unit 12a for the best-unit concatenated-trials configuration. Each plus sign represents a network estimate based on one input spike pattern, plotted against the stimulus azimuth. The dashed lines indicate perfect performance. The solid lines indicate the centroids of network estimates at each azimuth. Contra and Ipsi indicate contralateral and ipsilateral to the side of the recording site.
We tested three configurations of input vectors to an ANN. In the count-only configuration, the spike counts of each unit were normalized to zero mean and unit variance, as in the previous section. That configuration preserved both the absolute spike count and the relative spike count. In the mean-spike-count condition, the mean of the normalized spike counts across all units was used on each trial, which gave a measure of the total activity of multiple units, with information from the relative spike counts among units eliminated. In the relative-spike-count condition, the mean of normalized counts across all units was subtracted on each trial, eliminating the stimulus-related change in absolute spike count and leaving only the relative spike count. The stimulus-related information represented in the mean-spike-count and relative-spike-count conditions was complementary.
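The three spike-count configurations can be written compactly. The sketch below is illustrative only: the function name and the counts are hypothetical, the population SD is used for the z-score (the text does not say which SD estimator was used), and every unit is assumed to have a count that varies across trials.

```python
import numpy as np

def count_configurations(counts):
    """counts: array of shape (n_trials, n_units) holding raw spike counts.

    Returns the count-only, mean-spike-count, and relative-spike-count
    inputs. Assumes no unit has a constant count (SD > 0).
    """
    counts = np.asarray(counts, float)
    # Count-only: z-score each unit across trials.
    z = (counts - counts.mean(axis=0)) / counts.std(axis=0)
    # Mean-spike-count: one number per trial, the mean across units.
    mean_count = z.mean(axis=1, keepdims=True)
    # Relative-spike-count: the across-unit profile with that mean removed.
    relative = z - mean_count
    return z, mean_count, relative

# Hypothetical counts for 4 trials of a 3-unit ensemble (made-up values).
counts = np.array([[4, 1, 0],
                   [2, 3, 1],
                   [6, 0, 2],
                   [1, 2, 5]])
z, mean_count, relative = count_configurations(counts)
```

Because `z` equals `mean_count + relative` on every trial, the two reduced configurations split the count-only input into complementary parts.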
ANN performance in the mean-spike-count and the relative-count conditions is compared with that of the count-only condition in the left panel of Figure 9. Circles and crosses indicate the mean-spike-count and the relative-count conditions, respectively, and each symbol indicates one penetration. The mean-count condition showed substantial degradation of ANN performance (69.2 ± 10.2°), with the median error increasing by 10.4 ± 8.3° on average (p < 0.001, paired t test). On the other hand, the relative-count condition (63.0 ± 11.8°) showed only a small, although significant, increase in the median error relative to the count-only condition (difference, 4.3 ± 2.9°; p < 0.001, paired t test). This result indicates that relative spike counts across units accounted for a substantial fraction of location-related information carried by ensemble spike-count patterns.
The right panel of Figure 9 represents ANN performance for the count-only, mean-count, and relative-count conditions, but in this case, the ANN input vectors were based on responses of a single unit sampled from multiple trials, as was done for the best-unit concatenated-trials configuration in a previous section. We presumed that for arrays of identical units, relative spike counts across elements of vectors would not provide meaningful information about the stimulus and the mean spike count would be a more efficient code. As expected, one can see substantial degradation of ANN performance for the relative-spike-count condition (77.0 ± 13.3°; difference, 18.2 ± 11.2°; p < 0.001, paired t test), whereas median errors for the count-only and the mean-count conditions (57.8 ± 11.9°) did not differ significantly (difference, 1.0 ± 3.4°; p = 0.083, paired t test). In several cases, however, there was little degradation in ANN performance for the relative-count condition. Those cases probably could be accounted for by residual information about absolute spike counts in the relative-count vectors.

Coding by relative spike timing
In the present study, as in previous reports (Middlebrooks et al., 1994), the latencies of spikes relative to the onset of the stimulus were sensitive to the stimulus location. Information derived from latencies per se would be useful to the nervous system, however, only if an external signal were available to mark the stimulus onset. As an alternative, we tested the degree to which stimulus-related temporal information could be extracted from the relative timing among spikes. We compared ANN performance for four configurations of input spike patterns. In the "absolute-timing" configuration, patterns were identical to the full-ensemble patterns considered in previous sections. All spike times were expressed relative to the stimulus onset. In the "between-unit-timing" configuration, all spike times on each trial were expressed relative to the earliest spike recorded from any unit (excluding spontaneous spikes that occurred with latencies <10 msec). That configuration eliminated absolute latency information but preserved any information carried by the relative timing among units. In the "within-unit-timing" configuration, all spike times on each trial for each unit were expressed relative to the first spike from that unit. That configuration eliminated information from absolute latencies and from relative timing among units but preserved any information carried by interspike intervals from each unit. In the three configurations that incorporated temporal information, each spike was represented by a unit Gaussian impulse, as described in Materials and Methods. Finally, the "count-only" patterns eliminated all temporal information.
Figure 9. Comparison of the median errors of the network estimates for the count-only configuration and for the mean-spike-count or the relative-spike-count configuration. Left, The ensemble configuration. Right, The best-unit concatenated-trials configuration. Each circle (mean spike count) or cross (relative spike count) represents one probe penetration.
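The two relative-timing configurations amount to a change of time reference, which the following sketch illustrates. It is a simplified illustration rather than the procedure of the study: spikes with latencies below 10 msec are simply dropped, the same exclusion is applied in the within-unit case (an assumption, because the text states it only for the between-unit case), and the function names and spike times are hypothetical.

```python
import numpy as np

MIN_LATENCY_MS = 10.0  # earlier spikes are treated as spontaneous (see text)

def _driven(spike_times_ms):
    """Keep only spikes at or after the minimum latency (assumed handling)."""
    st = np.asarray(spike_times_ms, float)
    return st[st >= MIN_LATENCY_MS]

def between_unit_timing(trial_spikes):
    """Express all spikes relative to the earliest spike from any unit."""
    driven = [_driven(st) for st in trial_spikes]
    nonempty = [st for st in driven if st.size]
    if not nonempty:
        return driven
    t0 = min(st.min() for st in nonempty)
    return [st - t0 for st in driven]

def within_unit_timing(trial_spikes):
    """Express each unit's spikes relative to that unit's own first spike."""
    out = []
    for st in map(_driven, trial_spikes):
        out.append(st - st.min() if st.size else st)
    return out

# Hypothetical trial from three units; spike times in msec after onset.
trial = [[14.0, 20.0], [12.0, 30.0], [5.0, 18.0]]
between = between_unit_timing(trial)   # reference: 12.0 msec (unit 2)
within = within_unit_timing(trial)     # reference: each unit's first spike
```

In the between-unit output the first spikes of different units keep their relative delays, whereas in the within-unit output every unit starts at zero and only interspike intervals remain.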
In Figure 10, median errors of ANN responses for various configurations are plotted for each penetration. The crosses in the top panel show median errors for the between-unit-timing patterns, and the open circles in the bottom panel show median errors for the within-unit-timing patterns. Two thin lines in each panel represent the median errors for the count-only patterns (top line) and for the absolute-timing patterns (bottom line), and they are identical in both panels. For ease of visual inspection, penetrations were sorted according to the median errors for the absolute-timing configuration. The top panel shows that, on average, the median errors for the between-unit-timing patterns (54.3 ± 12.4°) were lower than those for the count-only patterns (difference, 4.5 ± 4.8°; p < 0.001, paired t test), although they were higher than those for the absolute-timing patterns (difference, 5.1 ± 4.1°; p < 0.001, paired t test). One can see instances in which performance in the between-unit-timing condition was as good as that in the absolute-timing condition and other instances in which performance in the between-unit-timing condition was no better than that in the count-only condition. This result can be interpreted as indicating that a significant amount of stimulus-related information could be carried by relative spike timing across units without knowledge of an external reference time. The bottom panel implies that this good performance for the between-unit-timing patterns was not caused by interspike timing within the spike patterns of individual units. The points for the within-unit-timing patterns (60.0 ± 12.0°) lie closely over the line for the count-only patterns. The difference in median error between the two configurations was small, indicating that within-unit spike patterns alone can carry little more information about the source location than can count-only patterns.
Note that the median errors for the within-unit-timing patterns were slightly larger on average than those for the count-only patterns (difference, 1.3 ± 2.0°; p = 0.001, paired t test). We regard this difference as trivial, most likely caused by the stochastic nature of the ANN training procedure.

Effect of the size of the neuronal ensemble
In previous sections, the sizes of the unit ensembles were determined by the number of units recorded simultaneously at each probe placement; that number ranged from 5 to 19. We wanted to test the influence of the size of the unit ensemble on the accuracy of neural coding. We formed unit ensembles of arbitrary sizes by concatenating spike patterns of various numbers of cortical units drawn from the pool of 377 units from 10 cats. The number of cortical units N used for each input vector was 1, 2, 4, 8, 16, 32, 64, or 128. As in the previous analyses, 25 bins with 2 msec width were used for a spike pattern of one unit, and thus each input vector to the ANN had 25 × N elements. For each N (N > 1), we obtained M combinations of N units drawn randomly with replacement from the pool of 377 units. The number of combinations was chosen such that M was ≥10 and the SEM of the median errors for each N was ≤1°; thus, M ranged from 10 to 82. We refer to this as the "random-sampling" condition. In addition, we used a combination of the N units that showed the best ANN performance (i.e., the smallest median errors with the single-unit single-trial configuration). We refer to this condition as the "best-N-units" configuration. Note, however, that the combination of the best units would not necessarily be the optimum combination for coding by ensemble. A unit that had spike latencies that were very insensitive to sound-source locations, for instance, might have a very large median error by itself but would provide an accurate time marker in combination with units that showed location-sensitive latencies.
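The random-sampling procedure can be outlined as a loop that keeps drawing ensembles until the stated criteria on M and on the SEM are met. The sketch below is an assumption-laden outline: `evaluate` is a hypothetical stand-in for training and testing an ANN on the chosen units, the stopping rule is applied sequentially, and `max_draws` is a safeguard that is not taken from the study.

```python
import numpy as np

def sample_until_stable(evaluate, pool_size, n_units, rng,
                        min_draws=10, max_sem=1.0, max_draws=1000):
    """Draw ensembles of n_units (with replacement) until SEM <= max_sem.

    evaluate(units) must return the median error (deg) for one ensemble;
    here it is a hypothetical placeholder for the full ANN analysis.
    """
    errors = []
    while len(errors) < max_draws:
        units = rng.choice(pool_size, size=n_units, replace=True)
        errors.append(evaluate(units))
        if len(errors) >= min_draws:
            sem = np.std(errors, ddof=1) / np.sqrt(len(errors))
            if sem <= max_sem:
                break
    return np.array(errors)

rng = np.random.default_rng(1)
# Fake evaluation: a made-up median error with 5 deg of scatter per draw.
fake_evaluate = lambda units: 40.0 + rng.normal(0.0, 5.0)
errors = sample_until_stable(fake_evaluate, pool_size=377, n_units=16, rng=rng)
```

With larger scatter among ensembles, more draws are needed before the SEM criterion is satisfied, which is consistent with M varying across ensemble sizes.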
In Figure 11, each symbol represents the median error of the ANN performance for one combination of cortical units, plotted against the number of units N in the ensemble. Plus signs indicate the random-sampling configuration, and triangles indicate the best-N-units configuration. ANN errors decreased with an increasing number of cortical units in ensembles both for the random-sampling and the best-N-units configurations, approaching a value of ~16° in the best-N-units configuration. Also, we note that the present simulations could underestimate the performance of a large population of units, because the units were sampled from a limited population, and therefore neurons newly added to the ensemble would contribute less to performance as the ensemble grew larger.
We compared the results for the random-sampling configuration with those for ensemble spike patterns recorded simultaneously. Specifically in this section, we refer to the latter configuration as the "simultaneous-sampling" configuration. The circles in Figure 11 show the median errors of ANN responses for the simultaneous-sampling configuration, as a function of the number of units in the penetrations. The results for the simultaneous-sampling configuration showed a somewhat similar tendency to those for the random-sampling configuration; median errors tended to decrease with an increasing number of cortical units (r = −0.37; p = 0.030, t test), and the distribution of the circles somewhat overlapped the median-error-versus-unit-number distribution of plus signs (random sampling). On average, however, median errors tended to be larger in the simultaneous-sampling configuration than in the random-sampling configuration. In quite a few cases, median errors for the simultaneous-sampling configuration were substantially greater than the distribution for the random samples.
At least three factors might account for the discrepancy between the results for the simultaneous-sampling configuration and those for the random-sampling configuration. First, our database could include data from regions of the cortex that are not involved in sound localization. It is possible that the distribution of auditory space-sensitive neurons was not uniform over area A2, from which some electrode penetrations might be made. Indeed, several penetrations that showed the largest median errors for the ensemble spike patterns rarely included units that showed median errors <70° in the single-unit configuration (Fig. 5, see bars and plus signs on the right). The possibility of sampling from an insensitive region could account for the particularly large median errors of some points of the simultaneous-sampling configuration shown in Figure 11.
Second, in the simultaneous-sampling configuration, cortical units were sampled from relatively small regions on the cortex, no wider than 1.5 mm. It could be that units within a small region of the cortex tended to have similar spatial sensitivities, and thus, combining those units would not be effective in reducing the centroid error. Schreiner and Cynader (1984) observed "binaural bands" and "binaural patches" in the cat area A2. Neurons within each band or patch tended to show a similar pattern of binaural interactions, such as excitatory/excitatory, excitatory/inhibitory, and monaural excitatory. The size of a binaural band or patch was on the order of 1 mm, which is comparable with the span of the electrode (maximum of 1.5 mm) used in the present study. On the other hand, in the random-sampling configuration, units were sampled across a broad range of the cortex and even across subjects. Distant units might tend to have relatively large differences in spatial sensitivity, and therefore, combining those units would effectively reduce centroid error.
The final factor that might account for the relatively poor ANN performance for the simultaneous-sampling configuration is a correlation of responses between units. Let us assume that the efficiency of stimulus coding by a neural ensemble is determined predominantly by units that have similar stimulus sensitivities and by their neural noise added to the neural signal. If the neural noise had no correlation between units, increasing the number of units in the ensemble would increase the signal-to-noise ratio of the ensemble responses and thereby would improve the coding efficiency. If the noise were somewhat correlated between units, however, the improvement of the signal-to-noise ratio would be substantially limited (Zohary et al., 1994). It is possible that the neural noise of units in our database was correlated to some degree. If that was the case, random sampling of the units would disrupt the noise correlation and therefore would overestimate the coding efficiency of actual unit ensembles. We examined the effect of noise correlation by comparing the ANN performance between two configurations. One was the simultaneous-sampling configuration. The other was the configuration that used the same data set, but trial numbers for each unit in an ensemble were randomly shuffled to disrupt the hypothetical noise correlation across units. We refer to the latter configuration as the "shuffled-trials" configuration. In the shuffled-trials condition, any correlation in firing between units could have resulted only from entrainment to stimulus onsets. Figure 12 compares the median errors of ANN responses for the two configurations. On average, the median errors for the shuffled-trials configuration (47.6 ± 11.8°) were only slightly smaller than those for the simultaneous-sampling configuration (difference, 1.7 ± 1.9°; p < 0.001, paired t test). The difference was too small to account for the discrepancy between the simultaneous-sampling and the random-sampling configurations.
Thus, we conclude that in most cases the proximity of units is the most likely explanation for the discrepancy.
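The shuffled-trials manipulation can be illustrated with a short sketch. It assumes that shuffling is done separately for each stimulus location, so that every unit keeps its own responses to that location while the trial-by-trial pairing across units is destroyed; the function name, array layout, and simulated data are hypothetical.

```python
import numpy as np

def shuffle_trials(patterns, rng):
    """patterns: (n_trials, n_units, n_bins) responses to ONE stimulus location.

    Returns a copy in which each unit's trials are permuted independently,
    which removes trial-by-trial (noise) correlation between units.
    """
    patterns = np.asarray(patterns)
    shuffled = np.empty_like(patterns)
    n_trials = patterns.shape[0]
    for u in range(patterns.shape[1]):
        shuffled[:, u, :] = patterns[rng.permutation(n_trials), u, :]
    return shuffled

rng = np.random.default_rng(0)
# Simulated binned responses: 40 trials, 5 units, 25 bins (made-up data).
patterns = rng.poisson(1.0, size=(40, 5, 25))
shuffled = shuffle_trials(patterns, rng)
```

Each unit's set of single-trial patterns is unchanged by the shuffle, so any change in ANN performance can be attributed to the loss of between-unit pairing within trials.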

Comparison with psychophysical data
We compared the neural coding of sound-source locations, as represented by our ANN analysis, with the cat's performance in a localization task. May and Huang (1996) measured the accuracy of the cat's voluntary head orientation responses to broadband noise bursts presented from speakers in the frontal sound field. Source locations in that study were restricted within ±90° in azimuth. We trained and tested an ANN with input vectors consisting of spike patterns of the 128 units with the smallest median errors (as defined for the best-N-units configuration in the preceding section). We simulated the effects of a cat possibly basing its judgment on neurons from both cortical hemispheres by treating the responses for the even-numbered units as if they had been recorded from the contralateral (left) hemisphere. This was done by reversing the sign of the target azimuths for those units. To mimic the cat's task in the experiment by May and Huang (1996), we used neural responses to azimuths between −80° and +80° only, and we disregarded ANN estimates to other than frontal locations. The other conditions of the ANN analysis were the same as the best-N-units (nonsimultaneous) ensemble conditions. Figure 13 summarizes the responses of behaving cats (left) and the ANN (right). Means and SDs of orientation responses of psychophysical listeners are indicated by circles and error bars, respectively, for each target speaker azimuth [May and Huang (1996), average data from their Table 1]. Triangles and error bars show the means and SDs, respectively, for ANN estimates based on ensemble spike patterns for one trial. The cat behavior tended to show a systematic undershoot in responses; that is, responses were biased toward frontal locations. In contrast, the means of the neural data showed little undershoot except for the most lateral targets.
The differences in the characteristics of undershoots probably did not perfectly reflect real sensory sensitivities to sound-source locations for either the ANN or the behaving cat. The undershoot in the psychophysical data was probably attributable primarily to head movements that fell short of the target speakers, particularly at the extreme lateral locations. For the neural data, the undershoot for lateral targets probably was caused by a bias to avoid rear locations that arose from ANN training that was restricted to frontal speakers only.
Response variance was generally larger for the neural data than for the psychophysical data. Averages of the SDs across the speaker locations tested were 17.9° for the neural data (across 9 target locations) and 8.2° for the psychophysical data (across 13 locations). That difference also was reflected in the averages of unsigned errors across all locations, which were 15.9° for the neural data and 8.9° for the psychophysical data. SDs for the cat behavior tended to increase with increasing distance of the target from the midline, whereas the SD of the ANN performance was fairly constant across target locations except for the most lateral target locations. The small SDs of the cat's responses for target speakers around 0° could have reflected an artificial factor. In the psychophysical task, the cat was required to fixate its head toward 0° in azimuth and elevation before a stimulus was presented, and the stimulus was followed by head orientation. Therefore, the response to a target at 0° required no head movement to achieve a correct response.

DISCUSSION
The results demonstrate (1) that spike patterns of unit ensembles recorded in response to single-sound presentations can signal the locations of sound sources, (2) that the relative counts and relative timing of spikes within ensemble spike patterns carry information about stimulus location, and (3) that the accuracy of localization by neural ensembles of adequate size approaches the accuracy of localization by cats in behavioral trials. Here, we comment on the strengths and weaknesses of the use of ANNs for analysis of neural coding, we consider features of ensemble spike patterns that do or do not appear to carry information related to sound-source location, and we compare sound localization by behaving animals with that by unit ensembles.

Use of artificial neural networks for analysis of stimulus coding
One might argue that the results of the present study could be obscured by our particular choice of network architecture and/or the way of representing spike patterns. Although we decided on the network and spike pattern configurations on the basis of preliminary analysis, our ANN configuration might not have been optimal to represent real coding efficiency by a neural ensemble. For that reason, our results represent a conservative estimate of information carried by the spike patterns. Another disadvantage of an ANN is that it tends to conceal the specific features that it uses to recognize spike patterns. For identifying specific information-bearing features, it is necessary to use alternative pattern recognition algorithms or to study carefully the connection weights and biases of trained ANNs. Nonetheless, we were able to infer information-bearing features in ensemble spike patterns empirically, for example, by removing information about absolute spike counts or about external reference time.
Figure 13. Responses of behaving cats and the ANN to sounds in frontal locations. Left, Circles and error bars indicate the means and SDs of the cats' head orientation responses to sound in the free field [from May and Huang (1996), their Table 1]. Right, Triangles and error bars show the means and SDs of network estimates based on a single presentation of an ensemble spike pattern consisting of 128 best units. The network was trained and tested for the frontal speakers (−80° to +80°), and network estimates to rear locations were disregarded. The final computation of SDs in the right panel omitted outlying points that were defined as points that were >3.0 SDs from the means in the initial computation of the SD.

Concluding remarks
Overall, the present study demonstrated the effectiveness of neuronal ensemble codes in sound-source localization. As suggested by the significance of relative spike count and relative spike timing, coding by ensembles of neurons probably involves more than simple sums of information carried by individual units. Further studies will be necessary to identify details of information-bearing features and the neural mechanisms that can decode the features. Nonetheless, similarities in localization performance between spike patterns of neuronal ensembles and behaving animals should encourage further research on cortical roles and mechanisms for auditory space perception.