## Abstract

A common way to assess the function of sensory neurons is to measure the number of spikes produced by individual neurons while systematically varying a given dimension of the stimulus. Such measured tuning curves can then be used to quantify the accuracy of the neural representation of the stimulus dimension under study, which can in turn be related to behavioral performance. However, tuning curves often change shape when other dimensions of the stimulus are varied, reflecting the simultaneous sensitivity of neurons to multiple stimulus features. Here we illustrate how one-dimensional information analyses are misleading in this context, and propose a framework derived from Fisher information that allows the quantification of information carried by neurons in multidimensional stimulus spaces. We use this method to probe the representation of sound localization in auditory neurons of chinchillas and guinea pigs of both sexes, and show how heterogeneous tuning properties contribute to a representation of sound source position that is robust to changes in sound level.

**SIGNIFICANCE STATEMENT** Sensory neurons' responses are typically modulated simultaneously by numerous stimulus properties, which can result in an overestimation of neural acuity with existing one-dimensional neural information transmission measures. To overcome this limitation, we develop new, compact expressions of Fisher information-derived measures that bound the robust encoding of separate stimulus dimensions in the context of multidimensional stimuli. We apply this method to the problem of the representation of sound source location in the face of changes in sound source level by neurons of the auditory midbrain.

## Introduction

Sensory stimuli consist of multiple features, and thus occupy multidimensional spaces: an approaching predator has size, shape, and color; an aversive odorant has intensity and gradient; an acoustic communication signal has spectral, temporal, and spatial properties. Despite such variations, perception is remarkably invariant: that is, sensitivity to stimuli along a single stimulus dimension is robust to fluctuations in other dimensions. For example, animals can recognize the shape of an approaching predator at varied distances, in overcast or bright sun, or identify the origin of a communication signal whether loud or soft.

In many instances, populations of neurons have been identified whose responses are modulated by changes of a given stimulus dimension, and thus are said to represent that dimension. However, responses of sensory neurons are typically modulated by several stimulus features simultaneously, such as in vision (Lagae et al., 1993; Fiscella et al., 2015) or olfaction (Anderson et al., 2003; Stopfer et al., 2003). In hearing, there is a vast literature on this topic concerned with the encoding of sound frequency (Kim et al., 1990), vowel spectral shape (Sachs and Young, 1979; Blackburn and Sachs, 1990; May et al., 1998), sound source azimuth (Aitkin et al., 1984, 1985; Aitkin and Martin, 1987; Rajan et al., 1990; Brugge et al., 1996; Irvine et al., 1996; Day and Delgutte, 2015), and spatial cues (Semple and Kitzes, 1987, 1993; Irvine and Gago, 1990; Rice et al., 1995).

Sensitivity to multiple stimulus features means that single sensory neurons are not robust to changes in secondary dimensions of the stimulus, and complicates the question of which stimulus dimension a neuron or a population of neurons “represents”. Although this issue has been recognized across sensory systems, theoretical tools that robustly address the problem of neural coding of multidimensional stimuli are lacking. This article provides such tools to systematically analyze multidimensional receptive fields, e.g., to quantify how much information can be recovered about a single stimulus dimension in the face of changes in secondary dimensions. We use the sound localization system of the mammalian midbrain as a case study; however, the techniques can be applied more widely and generally to any sensory system.

Localizing sound sources relies on processing acoustical cues that vary systematically with source location. In general, low-frequency sources are localized via interaural time differences (ITD), whereas high-frequency sources are localized via interaural level differences (ILD; Rayleigh, 1907; for measurement of cues in the species considered in the present paper, see Koka et al., 2011; Greene et al., 2014). Sensitivity to ILD relies on comparison of sound level at the two ears, which is also of course directly affected by the level of the sound source (ABL; Fig. 1*c*). Any given binaural level combination is therefore the result of the source being at a particular location and a particular intensity.

Despite this potential source of ambiguity, human and animal sound source localization behavior is remarkably insensitive to the level of the source signal: sounds are accurately localized provided they are loud enough to be heard. This is true of humans' absolute localization (Macpherson and Middlebrooks, 2000; Su and Recanzone, 2001; Vliegen and Van Opstal, 2004; Sabin et al., 2005), but also in cats (Gai et al., 2013), or macaque monkeys (Recanzone and Beckerman, 2004).

Altogether, available studies suggest that an ABL-independent neural representation of location (or ILD) should be available at some level of the auditory pathway. In the present article, we seek to find this representation by estimating the information available about ILD in auditory neurons whose response in general also varies with ABL (Semple and Kitzes, 1987; Irvine and Gago, 1990; Park et al., 2004; Tsai et al., 2010; Day and Delgutte, 2015).

## Materials and Methods

#### Experimental design and statistical analysis

Data were obtained from eight adult long-tailed chinchillas (*Chinchilla lanigera*; males, 500–700 g), and six adult guinea pigs (*Cavia porcellus*; 2 females, 450–1170 g). Details of the statistical analyses for the experiments can be found in the rest of Materials and Methods, as well as in the legends of Figures 2, 3, and 5.

#### One-dimensional Fisher information and the Cramér–Rao bound

The Cramer–Rao bound (CRB) is a bound on the performance of unbiased estimators (or decoder) of a stimulus dimension *x* (for background on estimation, see the two chapters by Jansen and Claeskens, 2011; Reid, 2011). We assume that the response of the neuron is a random variable conditional on the value of the stimulus, such that it can be described by *R*|*X*, where *X* is the one-dimensional stimulus and *R* the rate. Of course, the mean value of *R*|(*X* = *x*) depends on *x*, and is generally referred to as the tuning curve, noted *r*(*x*). Letting *x̂* be an estimator of *x* given the observed rate: *x̂* is a random variable (it is a function of the spike rate, itself a random variable). An estimator *x̂* therefore has an expected value, and an unbiased estimator is one such that *E* [*x̂*] = *x*. This assumption expresses the fact that the estimator does not make any systematic errors, i.e., systematically underestimate or overestimate the value of the stimulus.

In this context, the CRB states that the best achievable precision of such an estimator is such that:
That is, the variance of the estimator can be no smaller than the inverse of the Fisher information (FI), noted *J*(*x*). The information is the second moment of the score function:
where the log-likelihood function is derived from the rate distribution conditional on the value of the stimulus *R*|*X*:
Provided that *r*(*x*) is the spike count of the neuron, and that it can be modeled as Poisson distributed (the variance of the spike count is equal to the spike count itself), the FI has a simple expression (Seung and Sompolinsky, 1993) that only depends on the tuning curve:
where ∂*r*/∂*x*(*x*) is the derivative of the mean count with respect to the stimulus dimension *x*. Note that the denominator of this expression is the variance of the spike count at *x* (equal to the mean count in a Poisson model).

If, rather, we assume that the spike rate of the neuron is normally distributed, with a constant variance across *x* values then:
where σ^{2} is the variance of the spike rate reflecting neural noise.

If the variance is considered to change with the stimulus, it positively contributes to the information, and the FI reads (Dayan and Abbott, 2001):
In this paper, we are only interested in how changes in the mean response [i.e., *r*(*x*)] influence the information in the context of multiple stimulus dimensions. To simplify later developments, we will assume that we can estimate FI by omitting the second term (still using the stimulus-dependent values of the variance). The validity of this assumption can be quantified precisely, especially in the reasonable case where each neuron's response exhibits a constant Fano factor *F*, defined as follows:
In this case, the FI can be written as follows:
Thus our assumption leads to an error proportional to the inverse squared spike count, and can therefore be considered valid provided they are large enough.

#### FI in multiple dimensions

##### Fisher information from the marginalized rate distribution.

In the one-dimensional case, FI emerges from a comparison of the spike rate distributions at neighboring stimulus parameter values. When dealing with multiple tuning curves, there is not a single rate distribution *R*|*X* available to the experimenter for a given stimulus parameter value, but rather all distributions corresponding to different values of the second dimension *y*: *R*|(*X*,*Y* = *y*) (Fig. 3*i*, colored distributions). To compute the FI when decoding *X*, we need to consider the rate distribution conditional on *X*, *R*|*X*, which is a compound distribution obtained by marginalizing *Y* out of *R*|*X*,*Y*. Note that *R*|*X*,*Y* describes neural noise: its variance corresponds to the variability of the neural response with fixed stimulus parameters, on the other hand *R*|*X* describes neural variability: it also describes apparent response variability due to the uncertainty about *Y*.

To compute the marginal distribution *R*|*X*, it is useful to assume that the random variables *X* and *Y* are independent. In the example developed in the main text, it is clear that ILD and ABL are a priori independent, because the loudness and spatial position of a sound should not be strongly correlated. If more information about the non-independence of *X* and *Y* is available, it can be useful to change the parametrization of the stimulus space into independent dimensions (e.g., by independent component analysis; Comon, 1994).

It should be noted that, when we are considering the *R*|*X* obtained by marginalizing *Y* out, the estimation problem exactly reduces to the one-dimensional case in which one is estimating the value of *X* from the conditional distribution *R*|*X*. Therefore, the CRB still holds when FI is evaluated using this likelihood derived from the marginalized conditional rate distribution. We compute the FI about *X* as the FI obtained by numerically evaluating the second moment of the score function, from a conditional rate distribution where *y* has been marginalized out.
At this point, no specific assumption as to the precise nature of the conditional rate distribution was made (the neural noise *R*|*X*,*Y*), it is thus for now impossible to provide a more compact expression.

A notable difficulty is that even if we assumed that the neural noise (*R*|*X*,*Y*) had a specific nature (e.g., is Poisson or normal), the nature of the variability distribution *R*|*X* would still not in general be the same as the neural noise. It in fact depends in a complex way on the marginal distribution of *Y*. In the next section we investigate methods to provide a more compact expression of the FI about *X* under reasonable conditions on the distributions of *R*|*X*,*Y* and *Y*.

##### Local marginal fisher information.

Let *R* be the random variable describing firing of rate, and *X* and *Y* two random variables describing the values of the stimulus. We will assume that neural noise is normally distributed with a constant variance across *X*, such that:
To put it simply, the discharge rate of the neuron is assumed to depend on the values of the stimulus parameters (*X* and *Y*) through the response map *r* which we measure experimentally (the multidimensional analog of the tuning curve). The discharge rate is variable because of neural noise, and therefore there is some stimulus-independent variance to the distribution of rates conditional on the stimulus values: σ^{2}.

The motivation for this development presented here is to reduce the rate distribution to the one-dimensional case in which mean rate only depends on the value of one stimulus parameter, i.e., evaluating the random variable *R*|*X*. We will do so by marginalizing out the contribution of *Y*. We will look at *R*|*X* as an instance of a compound distribution (or mixture), in which the mean of a random variable (*R*|*X*) depends on another random variable (*Y*; Seidel, 2011). Conveniently, when *R*|*X*,*Y* and *Y* are normally distributed, so is *R*|*X*. This result allows us to find a good expression of the FI about *X* locally in *y*.

First, we postulate that the uncertainty about *Y* is modeled with a normal distribution. That is as follows:
Now, we can see the conditional rate distribution *R*|*X* as compounding *R*|*X*,*Y* and *Y*. First, let us recall a basic result of compounded distributions. If *A* and *B* are random variables such that:
Then the marginal distribution of *B* (the compounded distribution) is also normal with μ* _{B}* mean and σ

_{A}

^{2}+ σ

_{B}

^{2}variance: Second, we assume that δ

*is small enough with respect to variations of the response map, we can linearize the values of the mean rate*

_{Y}*r*(

*x*,

*y*) about

*y*

_{0}and write: We are now ready to express

*R*|

*X*as the distribution resulting from compounding: when

*U*and

*X*are independent. Applying the above result to our case we obtain: As above, by marginalizing out the contribution of

*Y*we have reduced our problem to the one-dimensional case. Therefore, we can still compute the FI using the normal with constant variance estimate of FI, and the CRB holds.

This expression is intuitive: assuming that *Y* is distributed around *y*_{0} a good approximation of the rate distribution conditional on *X* = *x* is a distribution of mean *r*(*x*, *y*_{0}), whose variance is the neural noise σ^{2} to which is added a term proportional to (1) the uncertainty about *Y*: δ_{Y}^{2} and (2) the magnitude of variations in mean rate when *y* is varied: ∂*r*/∂*y*)^{2}.

Having found an expression for the distribution of the conditional rate distribution *R*|*X*, we can now compute the FI with a closed form expression because *R*|*X* is a normal distribution. In the end, we define the local marginal Fisher information as the FI associated with estimating *X* from *R*|*X*:
This expression is again rather intuitive: the marginal information is analogous to that obtained by computing FI with the tuning curve *r*(*x*, *y*_{0}), with an added positive term in the denominator accounting for the additional variability due to *y*.

In the case where tuning curves are perfectly robust to changes in *y*, then ∂*r*/∂*y* = 0 and the local marginal FI expression reduces to the one-dimensional FI. On the other hand, two effects can reduce the amount of local marginal FI by making the denominator larger. First, if tuning curves are more “separated” at different *y* values (the (∂*r*/∂*y*)^{2} term), and second if we assume that *Y* values can span a larger range (the δ_{Y}^{2} term).

To reach this simple expression we have assumed that neural noise is normal with a stimulus-independent variance, that the values of *Y* are normally distributed around a mean value, and that the variations of the response map in *y* are linear. This expression should be understood as a description of local (in stimulus space) information, in which case those assumptions are more likely to be met.

##### Extension to *N* dimensions.

Hereto we have treated the case of two stimulus dimensions, but note that the foregoing developments are readily extended to *N* dimensions. We assume that we are trying to decode one dimensions out of *N*, so we let **Y** be a vector of *N* − 1 dimensions, where each dimension is independent (and independent on *X*) and distributed according to normal distributions of different SDs **δ**. We can again linearize the rate locally around (*x*,*y*_{0}) and think of *R*|*X* as the compounded distribution:
Where the *U _{i}* variables are independent on each other and on

*X*. Applying

*N*− 1 times the compounding result stated above, we obtain: and in the end:

#### Fisher information matrices

The problem of decoding from multidimensional neural response maps is mathematically similar to that of decoding from one-dimensional tuning curves. In fact, it also admits a CRB analogous to that in one dimension, except it is expressed with Fisher information matrices (FIMs; Cover and Thomas, 2012). We will present this concept in two dimensions, but it extends in *N* dimensions. Consider an estimator *T* of two parameters *x* and *y*, then all unbiased estimators *T* of (*x*,*y*) are such that:
Where the order is taken to mean that the matrix Cov(T) – J^{−1}(*x*, *y*) is positive semidefinite, and *J*(*x*, *y*) is the FIM, defined using the log-likelihood ℒ(R; x, y) = log(P(R|X = x, Y = y)), where *R* is the random variable describing a neuron's rate:
The exact expression of the FIM again depends on the nature of the neural noise distribution (i.e., how is *R* distributed). If we again assume that neuronal firing is normally distributed with constant variance σ^{2}, the log-likelihood reads:
With minor manipulations, the FIM can be shown to have elements:
where the subscript *u*,*v* refers to entries in the matrix, corresponding to the dimensions of the stimulus. More explicitly, if the response map varies with two dimensions *x*,*y*:
where ∇*r* is the gradient of the response map: ∇*r* = .

Note that the FIM assuming spike counts follow a Poisson distribution of mean *r*(*x*, *y*) is very similar: ∇*r*∇*r ^{t}*/

*r*

The FIM matrix, as the outer product of the gradient, has unit rank and thus only one non-zero eigenvalue equal to its trace:
with the corresponding principal vector along the gradient ∇*r*.

#### Local marginal FI in arbitrary directions

To obtain the marginal information for an arbitrary stimulus direction, we use the definition of local marginal FI introduced above, and consider arbitrary rotations of the coordinate system. Let us compute the local marginal information in the (cos(θ), sin(θ))* ^{T}* direction by rotating the axes by an angle −θ (i.e., clockwise by an angle θ such that the direction of the gradient is along the

*x*-axis). The gradient of the rate map in this new coordinate system is equal to

*R*

_{−θ}∇

*r*, given that

*R*

_{−θ}is the rotation matrix: The local marginal information around a given stimulus value can be written as follows: We can then use trigonometry relations (see next section) to show that the local marginal information in any direction can also be written in a more compact way: Where α = arctan((∂

*r*/∂

*y*)/(∂

*r*/∂

*x*)) is the angle between the direction of best information (as provided by the FIM) and the

*x*-axis.

We can now compare the behavior of this expression with the FIM-derived values. We observe that when computing the local marginal information in the direction of the gradient (i.e., when θ = α), it reduces to *J*_{α±π}(*x*, *y*) = *J*_{λ}(*x*, *y*). Furthermore, if we compute the local marginal information in a direction orthogonal to that of the gradient, we get Therefore, it is the case that the eigendirection of the FIM provides the direction of maximal information, and that locally the information in a direction orthogonal to the gradient is equal to zero.

##### Derivation of the local marginal information in arbitrary directions.

This section provides a detailed derivation of the local marginal information in arbitrary directions presented in the previous section, which is not essential to the comprehension of the paper. We use the trigonometry relation below to reduce the linear combination of sines and cosines in the expression of the local marginal information:
where α = arctan(*b*/*a*). In our case, let be the angle between the direction of best information and the *x*-axis (i.e., the direction of the gradient). The numerator of the expression of the local marginal information is equal to:
Where the term is in fact equal to the magnitude of the gradient *J*_{λ}(*r*).

Using the same trigonometry relation, we can reduce the denominator, even though it takes a bit more work: where We can then use the relation: to show that: In the end, the denominator reduces in both cases to:

#### Numerical methods

##### Fitting the one-dimensional tuning curves.

When dealing with the one-dimensional ILD tuning curves (Fig. 1) we fitted the ILD tuning curves at each ABL using a gradient descent algorithm to fit a sigmoidal function of the form:
where ILD is the varied parameter, *r _{b}* the baseline firing rate,

*r*the maximal firing rate, ∇

_{m}_{ILD}a measure of the steepness of the sigmoid, and ILD* is the ILD of half-maximum of the sigmoidal function (inflection point). The analytic expression of the rate is then used to evaluate the slope, and thus the Fisher information.

The goodness of fit was evaluated using the *r*^{2} measure, with the usual sum of squares approach:
where *SS*_{reg} is the sum of squared differences between the data and the model, and *SS*_{tot} is the sum of squared differences of the data (i.e., proportional to the variance of the data across ILDs).

##### Estimation of local response map gradients.

The computation of Fisher information relies on a robust estimation of the gradient of the response map. Throughout this article, the gradients were estimated using a bootstrapping method. Response maps were resampled 100 times according to different trials, and the gradient estimated at each point. All gradients used for FI computation are computed by taking the median of the bootstrapped distributions of each component.

#### Electrophysiological recordings

##### Ethics statement.

All experimental procedures complied with guidelines set forth by the National Institutes of Health and were approved under a protocol submitted to the University of Colorado Anschutz Medical Campus Animal Care and Use Committee.

##### Brief description of the procedure.

All procedures for electrophysiological recordings used by the laboratory were described previously (Jones et al., 2015; Brown and Tollin, 2016) and are only briefly outlined here.

Data were obtained from adult chinchillas and guinea pigs anesthetized with intramuscular and intraperitoneal (respectively) injections of ketamine hydrochloride (22.5 mg/kg, 80 mg/kg, respectively) and xylazine hydrochloride (5 mg/kg, 8 mg/kg, respectively). Body temperature was maintained at 37°C by use of a heating pad and thermal probe, and all vital signs were monitored over the duration of the experiment.

All recordings occurred in a double-walled sound-attenuating chamber. The head was immobilized using a stereotactic apparatus, and a midline incision was made to expose the skull. The pinnae were reflected laterally to expose the bony ear canals. Custom insert earpieces were inserted in the ear canals and sealed with acoustical foam and cyanoacrylate. Microphone probe tubes for stimulus calibration were inserted to within ∼2 mm of each tympanum (chinchillas: probes were inserted via small holes drilled in each bulla ventral to the external canal; guinea pigs: probes were routed through custom insert earpieces). Finally, a craniotomy was made on the dorsal aspect of the skull exposing the cortex overlying the inferior colliculus (IC) and a tungsten microelectrode (2–3 MΩ) lowered into the central nucleus of the IC (ICC). All stimuli were presented at a nominal 100 kHz sample rate by Tucker-Davis Technologies System 3 hardware. Each earphone was calibrated for tones between 0.1 and 30 kHz in 0.1 kHz intervals using probe microphones (Type 4182, Bruel and Kjaer). Calibration data were then used to compute 256-tap digital filters (2.5 ms long finite impulse responses) providing a virtually flat acoustic response (±2 dB at frequencies ≤16 kHz). Auditory neurons were located and isolated by presenting sweeps of 50 ms tone pips increasing in frequency (0.1–16 kHz) at ∼40 dB SPL to the ear contralateral to the electrode (in general, contralateral sound presentation is excitatory). Auditory-driven responses were typically encountered ∼4–6 mm ventrally from the surface of the cortex. In addition to referencing stereotyped electrode coordinates, the success of the penetration in the ICC was assessed using electrophysiological signatures: ICC neurons exhibit strong, nonadapting responses to auditory stimuli, and are typically organized with a clear dorsoventral tonotopy [such that low-characteristic frequency (CF) neurons are encountered dorsally and high-CF neurons ventrally; Merzenich and Reid, 1974]. Neural signals were amplified, filtered, digitized, and recorded using TDT hardware before offline analysis. The CF of an isolated neuron was estimated by presenting 100 ms tones at 0–90 dB SPL across a ∼2 octave range of frequencies, forming a response area curve (Ramachandran et al., 1999).

Subsequently, ILD sensitivity of an isolated neuron was assessed by varying the sound intensity of 100 ms CF tones presented to both ears. For each unit, an ILD “map” was obtained where the sound level of the tones presented to the ipsilateral and contralateral ears were varied. In general, contralateral and ipsilateral levels were varied from ∼10 dB below the threshold of the unit (elicited via contralateral-only stimulation), up to 70–90 dB SPL in 5 dB steps. Every level combination was presented 50 times. It should be noted that because the sampling of sound levels differed across the measured neurons, when we compare spatially averaged information across neurons (Fig. 5*f–i*) we restrict the range of ipsilateral and contralateral level to a common range over which we have data for all the neurons of each species (40–90 dB for chinchilla, 40–70 dB for guinea pig; Fig. 5*c*,*d*).

## Results

### Neural information about ILD in the auditory brainstem

Neurons sensitive to differences in sound level presented to both ears (ILD) are present in the auditory brainstem as well as in other downstream sites (Semple and Kitzes, 1987; Irvine and Gago, 1990; Davis et al., 1999; Tollin, 2003; Jones et al., 2015). One such site is the ICC, a nearly obligatory synapse in the ascending auditory system at the level of the midbrain. An example ILD tuning curve for an ICC neuron is given in Figure 2*a* (in which the ABL has been kept constant at 60 dB SPL). This neuron responds strongly when the sound level at the contralateral ear (driving its excitatory inputs) is greater than the level at the ipsilateral ear (driving its inhibitory inputs), but responds much less when the ipsilateral ear intensity is strong enough. The firing rate of another example neuron on Figure 2*b* obtained under the same conditions is also modulated by ILD, yet its tuning curve is steeper.

A common approach to compare the tuning of each of the example neurons in Figure 2 is to assess the information carried by those neurons about ILD, for example using FI (Seung and Sompolinsky, 1993; Butts and Goldman, 2006). FI provides a measure of the reliability of the representation of a stimulus parameter by the response rate of a neuron. It was used in early auditory studies investigating the lower bounds on the performance of pure-tone frequency discrimination (Siebert, 1970) or binaural discrimination of ITDs (Colburn, 1973). As a result, there now exists a large body of research using FI in the context of binaural cue encoding by auditory neurons: for ITD (Dabak and Johnson, 1992; Harper and McAlpine, 2004; Gordon et al., 2008) and ILD (Johnson et al., 1990; Jones et al., 2015; Brown and Tollin, 2016). Other aspects of hearing have been analyzed using receiver operating characteristic (ROC) curves, a paradigm closely related to FI, for example in the context of ITD discrimination (Shackleton et al., 2003), tone detection in noise (Young and Barta, 1986; Relkin and Pelli, 1987), or intensity discrimination (Viemeister, 1988; Winslow and Sachs, 1988).

Consider the response of a neuron presented with a given stimulus: it follows a probability distribution (rate distributions on vertical axis of Fig. 2*c*,*d*) whose width reflects neural noise. FI quantifies how easy it is to distinguish spike rates drawn from response distributions at neighboring stimulus values (Fig. 2*c*,*d*, colored dots and distributions). If the tuning curve is steep (Fig. 2*c*, right), the overlap between rate distributions at neighboring values is small, and thus information (FI) is high. Similarly, FI increases if neural noise decreases, i.e., rate distributions conditional on the stimulus parameter are narrower (Fig. 2*d*). Both effects are captured by the classical expression of the FI (Seung and Sompolinsky, 1993; see Materials and Methods). Computed for the tuning curves in Figure 2*a*,*b* (dashed lines), FI is clearly higher for the neuron in Figure 2*b* at positive ILDs. That is, close to the steepest point in the tuning curve of the neuron in Figure 2*b*, smaller changes in ILD (around ∼20 dB ILD) can be distinguished based on its firing rate than that of the neuron on Figure 2*a*. This is because the tuning curve is much steeper and neural noise is lower (despite lower absolute firing rate modulation across ILDs). In this sense, the response of the neuron in Figure 2*b* could be said to carry more information about ILD than that of Figure 2*a*.

However, in most instances, the stimulus dimension being studied (here, the ILD) is not the only stimulus property that modulates the response of a sensory neuron, which should be reflected in the information it carries. In the next sections, we develop a framework to quantify the amount of information that the response of a neuron (or population of neurons) carries about a given stimulus dimension across variations in other stimulus dimensions.

### The problem with one-dimensional sensitivity analyses

Single neuron ILD tuning curves such as those shown in Figure 2 often depend on ABL, as previously observed at many stages in the auditory pathway: the brainstem [lateral superior olive (LSO); Benevento and Coleman, 1970; Park et al., 2004; Tollin et al., 2008; Tsai et al., 2010], the midbrain [(ICC) Geisler et al., 1969; Benevento and Coleman, 1970; Stillman, 1972; Semple and Kitzes, 1987; Irvine and Gago, 1990; Park et al., 2004; Mokri et al., 2015)], and auditory cortex (Semple and Kitzes, 1993; Brugge et al., 1996; Middlebrooks et al., 1998; Reale et al., 2003; Zhou and Wang, 2012; Kyweriga et al., 2014; Lui et al., 2015). Although the response of the neuron in Figure 2*a* changes very little with changes in ABL (Fig. 3*a*), the tuning curve of the neuron on Figure 2*b* dramatically shifts with ABL (Fig. 3*e*). Thus, although the neuron of Figure 2*b* is ILD-sensitive and has relatively high FI at each ABL (Fig. 3*h*), its firing rate cannot unambiguously be mapped to ILD at all ABLs; a firing rate of 20 Hz maps to −10 dB ILD at low ABL (47.5 dB SPL; Fig. 3*e*, red dot) but +20 dB ILD at high ABL (65 dB SPL; Fig. 3*e*, orange dot). In contrast, despite shallower modulation at each ABL, the neuron depicted in Figure 3*a–d*, has a much more robust tuning across ABLs (Fig. 3*e–g*), such that its firing rate maps to ILD similarly at all ABLs. That is, despite having smaller FI at each ABL value (Fig. 3*d*), this neuron represents ILD more faithfully than the neuron in Figure 3*e–h*.

Neurons such as the one in Figure 3*a–d* are thought to be sensitive to ILD because they are binaural: excited by stimulating the contralateral ear and inhibited by stimulating the ipsilateral ear (Tollin, 2003). By color-coding the spike rate against the value of the ipsilateral and contralateral level (Fig. 3*b*, f, darker colors are higher firing rates), it becomes apparent that the neuron in Figure 3, *e* and *f*, is in fact a monaural neuron; its firing rate is only modulated by the sound level at the contralateral, excitatory ear. Rather, this neuron only appeared to have a binaurally modulated response because when keeping ABL constant and varying ILD, the intensity at the contralateral ear changes (thus changing the excitatory drive of the neuron; Semple and Kitzes, 1987; Irvine and Gago, 1990). In contrast, the firing rate of the neuron in Figure 3*a–d* is modulated by ipsilateral level even when contralateral level is held constant (Fig. 3*b*). Although binaural physiologists have long recognized the specific pitfall highlighted here (Semple and Kitzes, 1987; Irvine and Gago, 1990), we use the case of ILD versus ABL to illustrate that blind measurement of responses along one dimension (ILD in the neuron shown in Fig. 3*e*,*f*) can lead to erroneous conclusions about the origin of response modulation and the information that it carries.

### Information in one stimulus dimension can be quantified using FI

The tuning properties of a sensory neuron to a given stimulus dimension *x* (ILD in our example), in the presence of a secondary dimension *y* (ABL in our example) can be fully captured with a set of tuning curves (Fig. 3*I*, blue curves). The key insight in this analysis is that added uncertainty due to secondary dimensions will appear to the external observer as an increase in response variability (Tsai et al., 2010, see their Appendix). Consider a single sensory neuron: when its rate changes, it is impossible to know if it was caused by a change in the first stimulus dimension (*x*, abscissa), or the second (*y*, the secondary dimension). To account for this uncertainty, all tuning curves must be analyzed collectively; the distribution of firing rates of a neuron at any given *x* value is obtained by compounding the distributions across different *y* values (Fig. 3*i*, bold lines). The additional uncertainty about *y*, thus results in an apparent increase in neural noise; i.e., a wider rate distribution corresponding to increased response variability. The magnitude of this effect depends on the spread of tuning curves across *y*: the more tuning curves vary with *y*, the wider the compound distribution becomes (Fig. 3*i*, right).

The obtained compound rate distributions can be used to derive a FI-like quantity, which we term the local marginal FI (mFI; see Materials and Methods), representing the amount of information available to decode *x* in a context where nothing is known about *y*. We applied this analysis to the responses of the two neurons presented in Figure 3, computing the compound rate distributions as a function of ILD (Fig. 3*j*, insets), and the corresponding mFI values (Fig. 3*j*). This analysis reveals that mFI for the ILD is in fact much larger for the binaural neuron (Fig. 3*j*, dashed line) than the monaural neuron (Fig. 3*j*, solid line), reflecting the fact that the ILD sensitivity of the binaural neuron is robust to changes in ABL, whereas that of the monaural neuron is not.

The application of the present analysis of course is not constrained to determining the binaurality of an auditory neuron. Rather, the information measure we propose, mFI, overcomes the general problem of erroneous overestimation of information in a single-stimulus dimension by allowing correct quantification of information about the dimension of interest across changes in secondary dimensions.

### The stimulus direction of best information can be computed using FI matrices

Although mFI enables us to quantify information within a given stimulus dimension across changes in a secondary dimension, we may instead be interested in characterizing which direction of stimulus space is best represented by a neuron. Note that this direction can in general be any linear combination of the physical dimensions that were experimentally varied. The complete treatment of this problem requires the introduction of FIMs, which describe the best achievable covariance matrix of estimators of the dimensions of a response map (the multidimensional equivalent of a tuning curve; see Materials and Methods). For a single neuron, the FIM for two stimulus dimensions *x* and *y* can be easily derived from the map *r*; letting ∇*r* be the gradient vector of the response map, the FIM is a matrix expressed as *J* = (∇*r*∇*r ^{t}*)/σ

^{2}(when neural noise is normally distributed; see Material and Methods).

Intuitively, it is impossible to unambiguously decode two stimulus dimensions from the response of only one neuron: there are two unknowns for only one equation; each rate value maps to a continuum of possible stimulus values (Fig. 3*e*). Equivalently, the single neuron FIM is singular: it only has one non-zero principal value. The combination of stimulus dimensions that is best encoded by the neuron is along the response map gradient (the FIM principal vector, ∇*r*; Fig. 4*a*, green arrow); the corresponding information magnitude equals the gradient magnitude divided by the variance of the spike rate (the FIM principal value). The orthogonal direction is 0 because locally the response map is flat (Fig. 4*a*, gray direction); consequently, estimators of the orthogonal dimensions will necessarily be biased (or suffer infinite variance; Stoica and Marzetta, 2001). The insights of the two-dimensional FIM are a natural extension of the one-dimensional FI; the information magnitude equals the magnitude of the derivative of the response map (i.e., the norm of the gradient in two dimensions) divided by the variance; the stimulus dimension of best information follows the gradient, i.e., the stimulus direction along which the tuning surface is steepest. FI matrices thus make it possible to quantify the stimulus dimension direction and magnitude of information from any sensory neuron's response map.

To illustrate the use of FIMs, Figure 4, *b* and *c*, represent the information vectors for the putative “monaural” and “binaural” neurons of Figures 2 and 3 (arrows), overlaid on the information magnitude (color-coded). As expected, the information vector magnitude is maximal in regions where the rate varies maximally (i.e., where the gradient is highest), and the best information direction follows along the direction where the tuning surface is steepest (Fig. 3*b*,*f*). Consistent with the intuition that the regions of higher slope in the response map yield higher information; the putative binaural neuron (Fig. 4*b*) carries information in the direction of ILD for a large range of stimuli, particularly at negative ILDs (Fig. 4*b*, below the first diagonal). On the other hand, the putative monaural neuron (Fig. 4*c*) carries the most information in the contralateral level direction at levels near the contralateral threshold (across a wide band of ipsilateral levels). Although the responses of both neurons are clearly modulated by ILD, and could thus be considered “ILD coders” on the basis of a single ILD tuning curve measured at fixed ABL (Fig. 2), this analysis reveals the stark difference in directionality of information carried by each neuron.

### FI in arbitrary directions of stimulus space

So far we have quantified information globally along a given dimension using mFI, and then the directions of best information using FIMs. We now describe how to evaluate information locally in arbitrary stimulus directions. For each stimulus parameter combination, we compute a local measure of mFI, which quantifies encoding of a given dimension locally invariant on the secondary dimension. We assume that the secondary dimension (*y*) values are normally distributed around a given point, and then linearize the rate map at that point (see Materials and Methods). As a result, we obtain an expression of the mFI that is local in stimulus parameters, and can be computed in arbitrary directions; we illustrate its behavior as a function of stimulus direction in Figure 4*d* (blue lines on both a polar and Cartesian panels), where we are assuming that ILD is the direction of maximal information. The local mFI expression resembles that of one-dimensional FI, and depends on both components of the gradient (see Materials and Methods for numerical details). Furthermore, this local mFI is equal to the magnitude of the FIM vector obtained when computing information in the direction of maximal information, and to 0 in the direction orthogonal to it. A *y*-scale parameter controls the range over which the secondary dimension is assumed to be distributed, therefore when it is increased, the reduction of information for directions away from the dimension of maximal information is steeper (Fig. 4*d*, polar and Cartesian panels). Indeed, the rate distribution at any given *x* (the primary dimension) is wider as the secondary stimulus parameter is allowed to span a wider range.

We computed the marginal information in the example neurons, in the four dimensions of interest, and as a function of each ipsilateral and contralateral level (Fig. 4*e*,*g*, color-coded panels). As expected, information for the putative monaural neuron was maximal about contralateral level at low contralateral levels (Fig. 4*g*). On the other hand, the putative ILD-sensitive binaural neuron contains information predominantly about ILD, for a wider domain of contralateral and ipsilateral values (Fig. 4*e*, top left). Notably, the information about contralateral level in the binaural neuron, or ILD in the monaural neuron, does not completely vanish: there is some residual information about both dimensions. When averaging local mFI over all stimulus parameter values (Fig. 4*f*,*h*, blue curves), each neuron is found to have a direction of maximal mFI, with information rapidly decreasing to almost zero in its orthogonal direction. This direction coincides with the highest valued principal direction of the averaged FIM (green lines).

We have shown that the information carried by single neurons can be characterized using the FIM framework, which relies on evaluating the gradient of the response maps. This FIM-derived information can also be linked back to the local marginal FI intuition as developed in Figures 2 and 3, providing a complete picture of the information carried by the neuron about all stimulus dimensions, across all stimulus values.

### Quantifying multidimensional information in populations of neurons

Last, we seek to characterize how multiple stimulus dimensions are represented by populations of sensory neurons. By definition, FI for a population of conditionally independent neurons is the sum of the FIs for each neuron because log likelihoods sum (Seung and Sompolinsky, 1993; Cover and Thomas, 2012); population FIM (pFIM) is thus obtained by summing the FIM for each neuron. As a result of summing, the pFIM is no longer singular provided that there are more neurons than stimulus dimensions, and there is sufficient heterogeneity in their responses. In contrast to the single-neuron case, it is generally possible to decode all dimensions of the stimulus space from a population response.

Consider the population of two neurons depicted in Figure 5*a* (box, arrows: direction of best information; solid lines: marginal information), in which both neurons carry information about the same dimension of the stimulus space. The population FIM is therefore twice that of each individual neuron, collectively summing in the direction of best information, yet remaining null in the orthogonal stimulus direction. On the other hand, if each neuron in the population carries information about different dimensions of the stimulus space (Fig. 5*b*), the FIM has two nonzero principal values (black arrows, right). Therefore, the principal vectors are no longer collinear, enabling representation of both stimulus dimensions simultaneously by the population. Correspondingly, the mFI (computed as the sum of the mFI of each neuron) is now nonzero in all directions (Fig. 5*b*), and the singular points of the mFI curve are well predicted by the pFIM. These observations suggest that tuning heterogeneity plays a key role in the context of simultaneous encoding of multiple dimensions. Note the property that the population FI equals the sum of individual FIs requires neuronal responses to be conditionally independent (given a stimulus value).

Thus far, we have shown two dramatic examples of auditory brainstem neurons: a binaural neuron for which most information is about the ILD, and a monaural neuron, for which most information is about the contralateral level. The question remains, however, whether all neurons fall into either category. To tackle this issue we computed FIMs and mFI carried by single neurons obtained in the ICC of two mammalian species: the chinchilla (*n* = 28; Fig. 5*c*) and guinea pig (*n* = 21; Fig. 5*d*). In these experiments, we generally targeted neurons in the deep, high-frequency regions of the ICC that tend to respond most strongly to sound in the contralateral hemifield (Day and Delgutte, 2013, 2015), and thus are modulated by ILD. Interestingly, we do not observe any obvious clustering of neurons into distinct categories. Rather, in both species neurons tended to carry the most information in a direction between contralateral level and ILD (Fig. 5*e*, top), whereas neurons carried little information in other directions (ipsilateral level and ABL; Fig. 5*e*, bottom). Nonetheless, roughly the same numbers of neurons were preferentially ILD coders and contralateral level coders (Fig. 5*d*). For visualization purposes we divided recordings into two groups: preferentially sensitive to ILD (binaural; Fig. 5*f*), or contralateral level (monaural; Fig. 5*g*). We observe that the direction of best information is very prominent for each neuron, and its orthogonal dimension has little to no information associated with it. In other words, all neurons in the population carry information preferentially about one stimulus dimension, and those dimensions tend to be in the ILD/contralateral level quadrant.

We compute both the population FIM and population mFI by summing across neurons, and averaging across space. In Figure 5, *h* and *i*, we represent the population mFI and pFIM-derived information vectors for either population of ICC neurons (green or blue solid lines). As expected, the highest eigenvalue of the population FIM indicates the direction of maximal m FI (Fig. 5*g*), whereas the lowest indicates the direction of poorer mFI. Collectively, the populations of ICC neurons carry information in all dimensions of the stimulus space, with a strong emphasis evenly distributed between ILD and contralateral level. Very little marginal information is carried about ABL or ipsilateral level, which reflects the fact that our population consists of clearly monaural (carrying information about contralateral level), or clearly binaural neurons (carrying information about ILD).

The IC is a bilateral structure, receiving inputs from symmetric neural pathways on each side. We represent the information carried by the ICC population on the other side of the midbrain (Fig. 5*h*,*i*, dashed lines), and the summed contribution of both ICs (Fig. 5*h*,*i*, black line). The contribution of a symmetrical ICC population is a clear increase in marginal information about ILD. Strikingly, the information remains very anisotropic, with little to no information being carried about the ABL, and the bulk of the information being carried about ILD. The observation that there is a large amount of FI in the ILD direction suggests that the heterogeneity in responses of ICC neurons (which includes monaural neurons), as well as symmetrical ICC populations allow for a robust representation of ILD in the presence of fluctuations of ABL.

### Higher-dimensional stimulus spaces and tuning heterogeneity

Of course, stimulus spaces are neither one- nor two-dimensional, but rather have many more physical dimensions. As an example, sound stimuli have frequency content in addition to level and source position, and ICC neurons respond to sounds within a given frequency band. In fact, although we have only treated an example of two-dimensional stimulus space, the key insights presented in this article readily generalize to higher-dimensional spaces.

This information carried by single neurons is maximal in one direction, and quickly decreases to vanish in the orthogonal direction (Fig. 6*a*). As a result, more heterogeneous populations of neurons will exhibit more uniformity in population information across stimulus dimensions (Fig. 6*b–d*). In *N*-dimensional stimulus spaces the response map becomes an *N*-dimensional surface, and the local marginal FI can be computed using its gradient (provided that the response variance is constant; see Materials and Methods). Any neuron still locally carries maximal information in the direction of the gradient, which quickly decreases as a function of the difference in directions (in 3 dimensions; Fig. 6*e*, top). With one neuron, marginal information is thus zero about any direction in the orthogonal plane (Fig. 6*e*, bottom, plane); with two, only one direction has zero information (Fig. 6*f*, black line). When the best information directions of at least three neurons span the full space (Fig. 3*g*), information is nonzero in all stimulus directions. A sufficiently large population with uniformly distributed best information directions (Fig. 6*h*, black arrows) represents all directions equally well (such that the population marginal information is spherical; Fig. 6*h*).

## Discussion

### Neural tuning in a multidimensional context

In this study we have emphasized the limitations of single stimulus dimension tuning curve analyses, and provided a framework to quantify information within more ecological multidimensional stimulus spaces. The main message of this article is that inferring the function of sensory neurons (or populations thereof) generally requires the simultaneous manipulation of multiple stimulus dimensions. Then, using local marginal FI and FI matrices, it becomes possible to accurately quantify information within a single dimension, or to estimate the dimension(s) of highest information. Toward rigorous characterization of sensory coding, identifying stimulus dimensions that do not modulate neural responses can be just as important as identifying ones that do. In that respect, the present study demonstrates that it is important to distinguish sensitivity and coding when it comes to analyzing single neuron responses (Brette, 2010). A counterintuitive insight is that assigning a specific dimension of the stimulus that a neural population “cares about” is more complex than merely establishing that the responses of neurons are modulated by that dimension.

The realization that neural coding must be considered in the context of multiple dimensional spaces suggests that neural tuning should be measured by varying all dimensions of the stimulus that affect the response of the neurons under study. Assuming that each stimulus dimension is discretized in *K* stimulus values, then it takes *K ^{N}* stimulus values to sample an

*N*-dimensional stimulus space. This causes an exponential increase in the number of stimulus values that are to be considered to assess the function of a single neuron, and therefore may appear experimentally infeasible. However, recent developments of electrophysiological techniques with high throughput (multichannel electrodes, calcium imaging, etc.), can mitigate that effect by increasing the number of simultaneously recorded neurons. Indeed, it is now possible to sample large stimulus spaces because many neurons are recorded for each stimulus presentation, and although such multiunit recordings do not reduce the dimensionality of the stimulus space, they do reduce the “cost” of recording per cell.

Practical issues encountered when sampling high-dimensional stimulus spaces aside, we have shown in this article that conventional sampling of single-dimensional subspaces can lead to false conclusions concerning the informational content of the neural population; this fact should be of concern to investigators interested in understanding neural information coding.

### Marginalization

Several previous studies have considered the problem of decoding a primary stimulus dimension from the responses of neurons that vary with secondary dimensions that are assumed to be irrelevant. We refer to this problem as the marginal decoding problem. In the context of the encoding of ILD and ABL, Day and Delgutte (2015) discuss maximum likelihood estimation of ILD from a population of ICC neurons. The authors show that one can recover ILD at a fixed ABL, but also regardless of variations in ABL with little added complexity. It should be noted the proposed model assumes that ABL is known to the decoder, which therefore requires an additional estimation step to recover ABL and does not directly address the marginalization decoding problem.

In a direct attempt to solve the marginal decoding problem in the context of probabilistic population codes, Ma et al. (2006) and Beck et al., (2011) provide an exact implementation of optimal marginalization in a biologically plausible context with nonlinear interactions (divisive normalization). More recently, Kim et al. (2016) showed that a simpler mechanism only using linear interactions to represent the marginalized likelihood could approximate that operation with good success. These approaches are based on maximizing a form of marginalized likelihood function reminiscent to that introduced in this article to derive the local marginal FI. As such, it is likely (although it remains to be proven) that the information measure that we derive here provides the bounds on decoding performance of the model they propose. In addition, consistent with our study, the authors stress importance of heterogeneity in tuning properties across neurons and stimuli to solve the marginal decoding problem.

### Neural and behavioral sensory thresholds

It has long been known that the response of a single sensory neuron can in some instances predict the performance limits of the complete organism; i.e., the information it carries is enough to perform as well as the complete organism. In addition, information scales with the number of neurons in a population and thus the information available at the level of populations of neurons greatly exceeds that necessary to explain behavioral performance (although in the presence of noise correlations population information may saturate [Pitkow et al., 2015]). This counterintuitive observation was made in many modalities (for review, see Parker and Newsome, 1998), e.g., in the macaque visual system (the MT; Britten et al., 1992, 1996), but also in the context of the encoding of ILD (Tollin et al., 2008; Tsai et al., 2010) or ITD (Skottun et al., 2001; Shackleton et al., 2003) by brainstem and midbrain neurons. A significant effort has thus been dedicated to explain this discrepancy: either the brain does not optimally use all information available (Gu et al., 2014; Pitkow et al., 2015), or noise correlations between neurons limit the information increase of neural populations (Moreno-Bote et al., 2014).

Here we argue that a limiting factor is the fact that sensory systems operate to process multidimensional stimuli. Consider a behavioral experiment probing discrimination abilities along a single dimension, while secondary dimensions are kept constant. One typically gathers neural recordings for the same stimulus combinations, and compares neural and behavioral thresholds. This comparison carries the implicit assumption that the values of the secondary stimulus dimensions are known to the subject, which is not the case of all experiments, especially when the secondary dimension values are roved (e.g., in psychophysics experiments). Because neural responses are not in general invariant to changes in secondary dimensions, discriminating values of the primary dimension requires either marginalization of the secondary dimensions (in which case information is measured by FI computed from the marginalized rates), or joint estimation of all dimensions (in which case information is measured by FIMs). Either way, characterizing the information requires the measure of neural responses in which secondary dimensions are also varied, and the use of the more complete information measures we introduced here. Because the information available can only decrease when secondary dimensions are allowed to vary (because of the negative effect of covariation of each dimension's estimates), we argue that this effect may cause the apparent discrepancy between behavioral and neural thresholds.

### Assumptions and limitations of the Fisher information approach

Our development relies on estimation theory and the FI framework with a number of assumptions that may limit its applicability. Notably, we only considered rate coding in this paper, although there also is information in the temporal patterns of spikes of auditory neurons (Chase and Young, 2006; Narayan et al., 2007). In addition, FI is criticized for not always agreeing with other measures of neural information based on information theory (such as stimulus-specific or mutual information; Yarrow et al., 2012), especially in small populations of neurons (<∼100 units). Our argument should, at minimum, be understood as qualitative: variations of responses with secondary stimulus dimensions impact the estimation of relevant aspects of the stimulus. In addition, the argument we make with FI is valid for larger population sizes, for which all information measures agree. Therefore, taking into consideration those secondary dimensions should also reduce the amount of information as computed with other measures because it increases the neural noise from the point of view of an observer. Finally, we have assumed for simplicity that neural noise is normal with stimulus-dependent variance, yet we have intentionally neglected the contributing of this stimulus dependence to the information (see Materials and Methods). This simplification enables us to provide simple expressions relating the variations of the response map with information loss in a multidimensional context.

The covariation of the rate among neurons in the same sensory population can impact information content both negatively (Moreno-Bote et al., 2014) and positively (Hu et al., 2014; Franke et al., 2016; Zylberberg et al., 2016). Here we have omitted consideration of the effect of the correlations, due to the specifics of the system under study (in which noise correlations are very small and do not seem to contribute to the encoding of ILD; Seshagiri and Delgutte, 2007; Belliveau et al., 2014). However, our framework, because it is based on FI, should lend itself to the inclusion of noise correlations in future developments. Further work could therefore extend our developments to information-theoretic measures, and relax assumptions we have made about the distribution of neural noise.

## Footnotes

We thank the reviewers for their helpful comments on the paper, as well as Dr. Joel Zylberberg for comments early in the project and Dr. Nathaniel T. Greene for comments and help with data collection.

This work was supported by National Institute for Deafness and Communication Disorders, National Institutes of Health Grant R01-DC011555 to D.J.T.

The authors declare no competing financial interests.

- Correspondence should be addressed to Victor Benichoux, Unité de génétique et physiologie de l'audition, Institut Pasteur, Paris, France 75015. victor.benichoux{at}pasteur.fr