## Abstract

Extracellular voltage recordings (*V*_{e}; field potentials) provide an accessible view of *in vivo* neural activity, but proper interpretation of field potentials is a long-standing challenge. Computational modeling can aid in identifying neural generators of field potentials. In the auditory brainstem of cats, spatial patterns of sound-evoked *V*_{e} can resemble, strikingly, *V*_{e} generated by current dipoles. Previously, we developed a biophysically-based model of a binaural brainstem nucleus, the medial superior olive (MSO), that accounts qualitatively for observed dipole-like *V*_{e} patterns in sustained responses to monaural tones with frequencies >∼1000 Hz (Goldwyn et al., 2014). We have observed, however, that *V*_{e} patterns in cats of both sexes appear more monopole-like for lower-frequency tones. Here, we enhance our theory to accurately reproduce dipole and non-dipole features of *V*_{e} responses to monaural tones with frequencies ranging from 600 to 1800 Hz. By applying our model to data, we estimate time courses of paired input currents to MSO neurons. We interpret these inputs as dendrite-targeting excitation and soma-targeting inhibition (the latter contributes non-dipole-like features to *V*_{e} responses). Aspects of inferred inputs are consistent with synaptic inputs to MSO neurons including the tendencies of inhibitory inputs to attenuate in response to high-frequency tones and to precede excitatory inputs. Importantly, our updated theory can be tested experimentally by blocking synaptic inputs. MSO neurons perform a critical role in sound localization and binaural hearing. By solving an inverse problem to uncover synaptic inputs from *V*_{e} patterns we provide a new perspective on MSO physiology.

**SIGNIFICANCE STATEMENT** Extracellular voltages (field potentials) are a common measure of brain activity. Ideally, one could infer from these data the activity of neurons and synapses that generate field potentials, but this “inverse problem” is not easily solved. We study brainstem field potentials in the region of the medial superior olive (MSO); a critical center in the auditory pathway. These field potentials exhibit distinctive spatial and temporal patterns in response to pure tone sounds. We use mathematical modeling in combination with physiological and anatomical knowledge of MSO neurons to plausibly explain how dendrite-targeting excitation and soma-targeting inhibition generate these field potentials. Inferring putative synaptic currents from field potentials advances our ability to study neural processing of sound in the MSO.

## Introduction

Extracellular voltage (*V*_{e}) recordings, also known as field potentials, can provide a view of neural activity across multiple locations in a brain area. The biophysics of synaptic and neural generators of field potentials are well understood (for review, see Buzsáki et al., 2012), but interpreting field potentials remains a challenge. Field potentials are shaped in complicated ways by volume conduction (Kajikawa and Schroeder, 2015), morphology of neurons (Pettersen and Einevoll, 2008), their spatial arrangement (Klee and Rall, 1977), and the statistics of synaptic inputs (Lindén et al., 2011), among other factors.

Inferring underlying neural activity from field potentials has been accomplished in studies that skillfully wove together mathematical modeling with knowledge of neurophysiology and anatomy (Rall and Shepherd, 1968: olfactory bulb; Nicholson and Llinas, 1971: cerebellum; Kuokkanen et al., 2010: nucleus laminaris; and Fernández-Ruiz et al., 2013: dentate gyrus). We follow this approach to analyze *V*_{e} recorded in the auditory brainstem of cats in response to monaural pure tones. We adopt a standard terminology and refer to these field potentials as the *auditory neurophonic* (Weinberger et al., 1970).

A motivation for studying the auditory neurophonic has been the difficulty of accessing individual medial superior olive (MSO) neurons with *in vivo* preparations. Traditional extracellular techniques in MSO have yielded small samples (Goldberg and Brown, 1969; Guinan et al., 1972a; Yin and Chan, 1990; Day and Semple, 2011) and are contaminated by the neurophonic due to its large amplitude and the small size of MSO action potentials (Scott et al., 2007). Examples of single-unit patch-clamp recordings (Franken et al., 2015) and juxtacellular recordings (van der Heijden et al., 2013; Plauška et al., 2016) have recently been reported, but the technical challenges of these methods remain. Another approach has been to record from the axons of MSO neurons at locations isolated from the neurophonic (Bremen and Joris, 2013), but with this approach there is less certainty regarding the location and identity of recorded neurons. Given its large size and ease of recording, the neurophonic provides a useful view of the MSO with possible applicability across species (including humans).

Early investigators of the neurophonic understood that postsynaptic currents in MSO neurons likely are the dominant generators of these field potentials (Galambos et al., 1959; Biedenbach and Freeman, 1964; Tsuchitani and Boudreau, 1964). A second motivation to study the neurophonic is, therefore, to observe synaptic inputs to MSO because they may be difficult to access with other techniques. MSO neurons have a simple and stereotyped bipolar morphology (Stotler, 1953; Rautenberg et al., 2009), the dendritic arbors of neighboring neurons extend approximately in parallel, and synaptic excitation primarily targets these dendrites (Cant and Casseday, 1986; Couchman et al., 2012). For sounds played to one ear, there is a flow of excitatory current into dendrites on one side of the MSO (current *sink*) and a corresponding outflow of current through the soma and opposite dendrite (current *source*). These facts suggest that MSO neurons act (qualitatively) as current dipoles. Our previous work has supported this view (Mc Laughlin et al., 2010; Goldwyn et al., 2014).

This dipole theory is a helpful but incomplete account of *V*_{e} responses. In particular, we observed that sustained *V*_{e} responses to low-frequency tones are more coherent in space (more “monopole-like”) than the “dipole-like” responses to higher-frequency tones (∼1000 Hz and above). We argue that the “dipole theory” is incomplete because it does not consider inhibitory inputs that are known to converge with excitatory inputs on MSO neurons. The role of inhibition in MSO processing is a matter of debate. Some have surmised that inhibition precedes excitation and thereby delays the effect of excitatory inputs to MSO (Brand et al., 2002; Pecka et al., 2008; Myoga et al., 2014), but juxtacellular and intracellular recordings do not support this view (van der Heijden et al., 2013; Franken et al., 2015). If inhibition could be studied via field potentials, such recordings may enable the formulation and evaluation of alternative hypotheses.

We demonstrate that it may be possible to infer essential features of synaptic inputs to MSO from brainstem field potentials. We construct a mathematical model of an MSO neuron that receives both dendrite-targeting excitation and soma-targeting inhibition and use field potentials to determine the time courses of these paired inputs. Simulations accurately reproduce the diverse and frequency-dependent spatiotemporal *V*_{e} patterns observed in experiments. Furthermore, aspects of the inferred synaptic currents are consistent with properties of excitatory and inhibitory inputs to MSO. Our model-based analysis of the neurophonic provides a route to “see” excitatory and inhibitory inputs to MSO in field potentials. Importantly, we use the model to show that soma-targeting inhibition contributes a monopole-like component to neurophonic responses that is necessary to accurately model onset responses and sustained portions of responses to low-frequency tones. This prediction can be tested in experiments. In particular, our theory predicts that blocking inhibition pharmacologically (Brand et al., 2002; Pecka et al., 2008; Jercog et al., 2010; Roberts et al., 2013; Myoga et al., 2014; Franken et al., 2015) would remove monopole-like features from neurophonic responses.

## Materials and Methods

##### Neurophonic recordings.

We analyzed extracellular voltage recordings from adult cats of both sexes anesthetized with sodium pentobarbital at doses titrated to achieve an areflexive state. These experiments were first described by Mc Laughlin et al. (2010). We summarize our methods here and refer readers to that publication for further details.

We recorded extracellular voltage signals using a planar array of five electrodes (quartz-platinum, 2–4 MΩ). We advanced the electrode array through the auditory brainstem in steps of 50 or 100 μm using a TREC microdrive. We selected an angle of approach (15° to 30° mediolateral relative to the midsagittal plane) with the intention that the electrode track would run parallel to the orientation of dendrites in the MSO (the short axis of the MSO). All signals were filtered (10 Hz to 10 kHz) by the TREC headstages. The 10 Hz lower cutoff eliminates the “DC” component of the signal, a point we discuss in more detail below when comparing onset and ongoing portions of neurophonic responses.

We presented monaural tone bursts over a range of stimulus frequencies (100–2500 Hz for contralateral inputs). We analyzed a subset of these data, focusing on the range of frequencies over which neurophonic responses were largest and most reliably evoked (600–1800 Hz, for contralateral stimulation). Tone bursts to the ipsilateral side were, in most cases, presented at a slightly higher frequency (i.e., 610 Hz instead of 600 Hz, 710 Hz instead of 700 Hz, etc.). For ease of presentation, we ignore this small frequency difference when discussing results in text and figures. The stimulus level was 70 dB SPL for all recording sessions analyzed in this study.

In two recording sessions, for two frequencies (700 and 1300 Hz), we repeated the same stimulus 100 times and averaged the responses. In these two sessions the tone duration was 50 ms. In all other recording sessions, each sound stimulus was presented once for a duration of 2000 ms.

In our exploration of these data, we identified a subset of recordings that exhibited qualitatively similar spatiotemporal patterns. Our goal for this study was to provide an account of these characteristic response patterns. As such, in this paper we present data from five recording sessions (4 animals) and exclude data from nine recording sessions (5 animals). In each recording session, we obtained five simultaneous recordings (using the 5 electrode array). We present data from one electrode per recording session and do not make comparisons across electrodes in this study. Data from these or similar recording sessions were presented previously by Mc Laughlin et al. (2010) and Goldwyn et al. (2014). In Goldwyn et al. (2014) we limited ourselves to analyzing responses to tones with frequency 1000 Hz or higher because these responses exhibited dipole-like features most strongly.

In some cases, we obtained histological information that allowed us to identify the location of the MSO along the electrode track (Table 1; Mc Laughlin et al., 2010).

##### Spatiotemporal *V*_{e} patterns.

The basic unit of analysis in this paper is the spatiotemporal pattern of *V*_{e}. To construct these patterns, we aggregated the (non-simultaneous) *V*_{e} responses that were obtained at multiple locations in the brainstem by advancing the electrode microdrive in steps of 50 or 100 μm through the brainstem. A prominent feature of neurophonic responses to pure tone stimuli is a temporal oscillation at the tone frequency. One period of oscillation (the inverse of the tone frequency; 1.67 ms for 600 Hz tone, 1.43 ms for 700 Hz tone, etc.) sets a natural time scale for analysis. The spatiotemporal *V*_{e} patterns that we constructed and analyzed have spatial extents of ∼2000 μm (smallest is 1500 μm, largest is 2500 μm) and temporal extents of one period of oscillation (time in milliseconds depends on the frequency of the pure tone stimulus).

We distinguished between “onset” and “ongoing” portions of neurophonic responses. The ongoing portion, which began ∼20 ms poststimulus onset, consisted of sustained oscillations. The oscillation frequency matched the frequency of the acoustic tone burst and the amplitude of oscillations was stable over the duration of the response. The onset portion of the *V*_{e} signal comprised the first ∼20 ms of the response. It also exhibited oscillations, but with amplitudes that grew (after ∼5 ms latency) and attenuated before settling into the sustained ongoing response.

The transition from onset to sustained response is likely due, in part, to physiological processes such as adaptation in synaptic transmission and neural firing. In addition, the filtering properties of our recording equipment affected the recorded signal in significant ways. Specifically, the TREC headstages were not DC coupled (10 Hz lower cutoff frequency) and thus removed “steady-state” and low-frequency components (<10 Hz) of neurophonic responses. High-pass filtering impedes interpretation of field potential data because the time-average of ongoing portions of *V*_{e} recordings has 0 mean (no “baseline”; Herreras, 2016). We make the observation that during a brief portion of the response near signal onset, there is transient evidence of the signal baseline, even after high-pass filtering; Figure 1 shows a demonstration. In some analyses, we use this onset portion of *V*_{e} responses to overcome ambiguities associated with interpreting the ongoing portion of responses.

In most recording sessions, we obtained a single *V*_{e} recording 2000 ms long at multiple spatial locations within the brainstem. There is variability (“noise”) in the recording, but we isolated a meaningful signal from the ongoing portion of the response by averaging with respect to the period of oscillation. Specifically, we associated recording time with a corresponding phase value (relative to some arbitrary starting phase), partitioned each period into small phase “bins” and averaged *V*_{e} values that shared similar phase values. For ease of analysis, we used linear interpolation to resolve *N* = 99 distinct phase values so that all spatiotemporal patterns were the same size, regardless of stimulus frequency. We refer to this process as “cycle-averaging.” We illustrate the process of creating a cycle-averaged spatiotemporal *V*_{e} pattern in Figure 2. Onset responses were not stationary over time, so we did not cycle-average onset responses. Instead, to study onset responses, we selected a single cycle of the response that contained the time at which *V*_{e} reached its maximum amplitude (computed over all spatial locations).
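The cycle-averaging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name `cycle_average`, the sampling rate, the number of phase bins (33), and the synthetic noisy sinusoid that stands in for a recording are all hypothetical choices made for this example.

```python
import math
import random

def cycle_average(signal, fs, f_tone, n_bins=33, n_out=99):
    """Average a steady-state recording over the stimulus period.

    signal : list of voltage samples from the ongoing portion of a response
    fs     : sampling rate in Hz (hypothetical value in the demo below)
    f_tone : stimulus frequency in Hz
    Returns n_out values spanning one period (phase 0 to just under 1 cycle).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for k, v in enumerate(signal):
        phase = (k * f_tone / fs) % 1.0            # phase in cycles, [0, 1)
        b = min(int(phase * n_bins), n_bins - 1)   # phase bin index
        sums[b] += v
        counts[b] += 1
    # Mean per bin (this sketch assumes every bin received samples).
    binned = [s / c for s, c in zip(sums, counts)]

    # Periodic linear interpolation from bin centers onto n_out phase values.
    out = []
    for m in range(n_out):
        u = (m / n_out) * n_bins - 0.5   # position in bin units, relative to bin 0 center
        i0 = math.floor(u)
        w = u - i0
        out.append((1 - w) * binned[i0 % n_bins] + w * binned[(i0 + 1) % n_bins])
    return out

# Synthetic stand-in for a recording: 700 Hz sinusoid plus Gaussian noise.
random.seed(0)
fs, f_tone = 50000.0, 700.0
synthetic = [math.sin(2 * math.pi * f_tone * k / fs) + random.gauss(0.0, 0.5)
             for k in range(50000)]
demo = cycle_average(synthetic, fs, f_tone)
```

In this synthetic case the cycle-averaged waveform is close to one period of the underlying sinusoid because the noise averages out across the many cycles that fall into each phase bin.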

In two recording sessions, we presented 100 repetitions of a single tone frequency (700 or 1300 Hz, 50 ms duration). In these cases, we could average over the repeated trials to enhance the signal-to-noise ratio in our data. After averaging responses over trials, we created spatiotemporal *V*_{e} patterns from the onset and ongoing portions of these responses, as described above.

##### Approximation of *V*_{e} spatiotemporal patterns in the Fourier domain.

To simplify our model-fitting procedure (described below), we found it helpful to represent cycle-averaged *V*_{e} responses in the Fourier domain (temporal frequency). Specifically, we computed Fourier coefficients from spatiotemporal *V*_{e} patterns by evaluating the discrete Fourier transform (Kido, 2015):

$$a_j(x) = \sum_{k=0}^{N-1} V_e^{CA}(x,k)\, e^{-2\pi i jk/N}, \qquad j = 0, 1, \ldots, N-1. \tag{1}$$

The notation *V*_{e}^{CA}(*x*, *k*) emphasizes that we used cycle-averaged data in this calculation, not the “raw” *V*_{e} recordings (except in analyses of onset responses, for which we used one cycle chosen based on the peak amplitude of *V*_{e}). In this equation, *x* identifies the spatial location (depth in the brainstem) of the recording electrode, *k* is the discrete time point within a cycle, and *N* is the period in discrete time units (*N* = 99 in our analyses). The index *j* identifies the discrete frequency associated with each Fourier coefficient (*f*_{discrete} = *jf*_{tone}, where *f*_{discrete} is the discrete frequency and *f*_{tone} is the stimulus frequency). The Fourier coefficient *a*_{0} represents the 0 frequency “DC” component of the *V*_{e} response, the Fourier coefficient *a*_{1} and its complex conjugate *a*_{N−1} represent *V*_{e} responses at the fundamental frequency (same as the stimulus frequency). Complex conjugate pairs with higher index values, such as *a*_{2} and *a*_{N−2}, are Fourier coefficients for higher harmonics (multiples of the stimulus frequency).

After computing the discrete Fourier coefficients, we reconstructed cycle-averaged *V*_{e}^{CA} responses using the inverse discrete Fourier transform (Kido, 2015):

$$V_e^{CA}(x,k) = \frac{1}{N}\sum_{j=0}^{N-1} a_j(x)\, e^{2\pi i jk/N}. \tag{2}$$

The 0 frequency coefficient *a*_{0} was only required for analysis of onset responses. The 10 Hz lower cutoff frequency in the recording equipment removes this component from ongoing portions of *V*_{e} responses as discussed above (the TREC headstages were not DC-coupled).

Importantly, we found that we could set *a*_{j} = 0 for all but a few Fourier modes, and still accurately reconstruct *V*_{e}^{CA} using the above equation. This observation was key to simplifying our model-fitting procedure (described below). Specifically, we found that three pairs of Fourier coefficients {*a*_{j}, *a*_{N−j}} for *j* = 1, 2, and 3 sufficed to accurately reconstruct ongoing spatiotemporal *V*_{e}^{CA} patterns. For onset patterns, we used these three pairs and, in addition, retained the constant term *a*_{0}. An example *V*_{e}^{CA} pattern and its Fourier-based reconstruction (using 3 pairs of Fourier coefficients) are shown in Figure 2 (bottom row).
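The effect of keeping only three pairs of Fourier coefficients can be illustrated with a small sketch. It assumes a DFT convention in which the 1/*N* factor appears in the inverse transform, uses a synthetic one-cycle waveform rather than recorded data, and the helper names `dft` and `reconstruct` are hypothetical.

```python
import cmath
import math

N = 99  # samples per cycle, as in the text

def dft(v):
    """Forward DFT: a_j = sum_k v[k] * exp(-2*pi*i*j*k/N)."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def reconstruct(a, keep, include_dc=False):
    """Inverse DFT using only the pairs {a_j, a_(N-j)} for j = 1..keep
    (plus a_0 if include_dc is True); all other coefficients are treated as 0."""
    n = len(a)
    idx = set()
    for j in range(1, keep + 1):
        idx.update((j, n - j))
    if include_dc:
        idx.add(0)
    return [sum(a[j] * cmath.exp(2j * math.pi * j * k / n) for j in idx).real / n
            for k in range(n)]

# Synthetic cycle: fundamental, two weaker harmonics, and a small 5th harmonic.
v = [math.sin(2 * math.pi * k / N)
     + 0.4 * math.cos(2 * math.pi * 2 * k / N)
     + 0.2 * math.sin(2 * math.pi * 3 * k / N + 0.5)
     + 0.02 * math.sin(2 * math.pi * 5 * k / N)
     for k in range(N)]

a = dft(v)
v3 = reconstruct(a, keep=3)
# What the three-pair reconstruction misses is the 5th harmonic (amplitude 0.02).
residual = max(abs(x - y) for x, y in zip(v, v3))
```

Because the waveform is real-valued, `a[j]` and `a[N - j]` are complex conjugates, which is why the coefficients are retained in pairs.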

We measured error in the Fourier approximation by treating cycle-average patterns as matrices (space-time) of *V*_{e} values and using the Frobenius norm to define relative error as follows:

$$\text{relative error} = \frac{\lVert V_e^{CA} - \hat{V}_e^{CA} \rVert_F}{\lVert V_e^{CA} \rVert_F}, \tag{3}$$

where $\hat{V}_e^{CA}$ is the Fourier-based reconstruction and, for an arbitrary matrix *M* with elements *m*_{ij}, the Frobenius norm is

$$\lVert M \rVert_F = \sqrt{\sum_{i}\sum_{j} \lvert m_{ij} \rvert^{2}}. \tag{4}$$

Relative errors of this approximation for varying numbers of Fourier coefficients are shown in Figure 3 (top row). We chose to use three Fourier modes (the 3 pairs {*a*_{j}, *a*_{N−j}} for *j* = 1, 2, and 3), plus the 0 frequency coefficient for onset responses, in our analyses and modeling because this approximation was satisfactorily accurate across recordings and stimulus frequencies. Figure 3, bottom row, reports relative errors across all recording sessions and stimulus frequencies for the three-mode Fourier reconstruction. The error was greater for responses to lower-frequency tones when compared with higher-frequency tones. Also, onset responses were not as accurately approximated as ongoing responses. This is expected because onset *V*_{e} patterns were computed from a single cycle of response whereas ongoing *V*_{e} patterns were computed from cycle-averaged data.
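As a small worked example of the relative error measure, the sketch below computes the Frobenius-norm ratio for two toy space-time matrices. The matrices and the function names `frobenius_norm` and `relative_error` are made up for illustration and are not data from the study.

```python
import math

def frobenius_norm(m):
    """Square root of the sum of squared magnitudes of all matrix elements."""
    return math.sqrt(sum(abs(x) ** 2 for row in m for x in row))

def relative_error(data, approx):
    """||data - approx||_F / ||data||_F for two matrices of equal shape."""
    diff = [[d - a for d, a in zip(row_d, row_a)]
            for row_d, row_a in zip(data, approx)]
    return frobenius_norm(diff) / frobenius_norm(data)

# Toy 2 x 3 "patterns" (rows: locations, columns: time points).
data = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
approx = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
err = relative_error(data, approx)  # ||diff||_F = 3 and ||data||_F = 5, so err = 0.6
```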

##### Mathematical model.

Our goal was to model extracellular voltage patterns generated by (simulated) neural activity. Moreover, we sought a model that could be fit to spatiotemporal *V*_{e} patterns recorded in experiments. As in our previous studies of the neurophonic (Mc Laughlin et al., 2010; Goldwyn et al., 2014), we assumed that extracellular voltage in the auditory brainstem varied in a direction parallel to the orientation of dendrites in the MSO and was relatively constant in other directions. In addition, we assumed the spatiotemporal pattern of membrane current in a neuron *I*_{m} generates *V*_{e} responses according to Poisson's equation (electrostatic approximation; Mitzdorf, 1985). Thus, our model for *V*_{e} included a single spatial dimension and Poisson's equation reduced to:

$$\frac{\partial^{2} V_e}{\partial x^{2}} = \begin{cases} -r_e\, I_m(x,t), & x_{C1} \le x \le x_{C2}, \\ 0, & \text{otherwise,} \end{cases} \tag{5}$$

where *r*_{e} is electrical resistivity in the extracellular medium (assumed to be isotropic and homogeneous). The terminals of the model neuron are denoted *x*_{C1} and *x*_{C2}. We abbreviate the right-hand side of the equation as *J*(*x*, *t*). The function *J*(*x*, *t*) is proportional to membrane current between the terminals of the cable, and 0 beyond the terminals of the cable. We defined boundaries in the extracellular domain to be two points, *x*_{G1} and *x*_{G2}, that are distant from the terminals of the cable: *x*_{G1} ≪ *x*_{C1} and *x*_{G2} ≫ *x*_{C2}. We imposed the boundary conditions *V*_{e}(*x*_{G1}, *t*) = *V*_{e}(*x*_{G2}, *t*) = 0. In other words, *x*_{G1} and *x*_{G2} represent the location of electric ground.

The starting point for our model of membrane current *I*_{m} was the classical description of a neuron as a cable with a passive (leaky) membrane (Rall, 1977). The membrane potential as a function of position along the neuron *x* and time *t*, denoted *V*_{m}(*x*, *t*), satisfies the linear partial differential equation (cable equation):

$$c_m \frac{\partial V_m}{\partial t} = \frac{1}{r_i}\frac{\partial^{2} V_m}{\partial x^{2}} - \frac{1}{r_m}\left(V_m - E_{lk}\right) - I_{in}(x,t), \tag{6}$$

where *c*_{m} is membrane capacitance per unit length, *r*_{m} is membrane resistance, *r*_{i} is intracellular (axial) resistance per unit length, and *E*_{lk} is the leak reversal potential. The value of *E*_{lk} has no effect on our results, so we set *E*_{lk} = 0 mV for simplicity (or equivalently, we think of *V*_{m} in the above equation as deviation from a resting potential). The term *I*_{in} represents synaptic input currents and will be discussed in more detail below. Nonlinearities such as spike-generating sodium current and voltage-gated currents that are known to be present in MSO neurons (e.g., low-threshold potassium) are not included in our model. Spikes in MSO neurons are small (Scott et al., 2007) and typically difficult to identify in extracellular recordings, so we did not expect that they contribute appreciably to neurophonic responses. In our previous work, we observed that the contribution of low-threshold potassium current to *V*_{e} responses was small relative to synaptic currents (Goldwyn et al., 2014). We would not expect, therefore, that our results would change substantially if this current were included.

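Following standard practice we introduced the membrane time constant parameter τ_{m} = *r*_{m}*c*_{m} and the space constant parameter λ = √(*r*_{m}/*r*_{i}) and reformulated Equation 6 as

$$\tau_m \frac{\partial V_m}{\partial t} = \lambda^{2}\frac{\partial^{2} V_m}{\partial x^{2}} - V_m - r_m I_{in}(x,t). \tag{7}$$

The spatial domain of the cable is a finite interval [*x*_{C1}, *x*_{C2}] and we imposed “sealed end” (no flux) boundaries at the end points: ∂*V*_{m}/∂*x* = 0 at *x* = *x*_{C1} and *x* = *x*_{C2}.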
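For a sinusoidal point input, a passive sealed-end cable has a closed-form steady-state solution, which makes the current-conservation property of the cable equation easy to check numerically. The sketch below assumes the cable equation in the form τ_{m} ∂*V*_{m}/∂*t* = λ² ∂²*V*_{m}/∂*x*² − *V*_{m} − *r*_{m}*I*_{in}, a cable occupying 0 ≤ *x* ≤ 430 μm, and normalized values *r*_{m} = 1 and unit input amplitude. These normalizations, the sign convention, and the function names are assumptions of this example, not details taken from the study.

```python
import cmath
import math

# lambda, tau, and cable length follow values quoted in the text;
# R_M is normalized to 1 (an assumption; it only rescales the solution).
LAMBDA = 200.0   # space constant, micrometers
TAU = 0.3e-3     # membrane time constant, seconds
LENGTH = 430.0   # cable length, micrometers
R_M = 1.0        # membrane resistance (normalized)

def vm_amplitude(x, x_in, freq, alpha=1.0):
    """Complex amplitude v(x) of V_m = v(x) * exp(2*pi*i*freq*t) for the point
    input alpha * delta(x - x_in) * exp(2*pi*i*freq*t) on a sealed-end cable."""
    q = cmath.sqrt(1 + 2j * math.pi * freq * TAU) / LAMBDA
    lo, hi = min(x, x_in), max(x, x_in)
    g = (cmath.cosh(q * lo) * cmath.cosh(q * (LENGTH - hi))
         / (q * cmath.sinh(q * LENGTH)))
    return -(R_M * alpha / LAMBDA ** 2) * g

def return_current(x_in, freq, n_cells=430):
    """Midpoint-rule integral over the cable of the capacitive-plus-leak
    current density (1 + i*omega*tau) * v(x) / r_m."""
    h = LENGTH / n_cells
    factor = (1 + 2j * math.pi * freq * TAU) / R_M
    return h * sum(factor * vm_amplitude((m + 0.5) * h, x_in, freq)
                   for m in range(n_cells))

# Unit input at the cable center, 1000 Hz: the distributed capacitive and leak
# current should integrate to approximately -1, balancing the injected current.
total = return_current(x_in=215.0, freq=1000.0)
```

The balance between the injected point current and the distributed return current is what makes a single excitatory input look dipole-like from the extracellular side: the net membrane current over the whole cable is zero.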

The cable equation expresses a conservation of current relationship. In particular, membrane current in this model is the sum of capacitive, leak, and input currents and can be written (after rearranging terms):

$$I_m(x,t) = c_m\frac{\partial V_m}{\partial t} + \frac{V_m}{r_m} + I_{in}(x,t) = \frac{\lambda^{2}}{r_m}\frac{\partial^{2} V_m}{\partial x^{2}}. \tag{8}$$

Our approach to simulating *V*_{e} can now be stated briefly. For given input current *I*_{in} and parameter values (τ_{m}, λ, and others), we computed *V*_{m} by solving Equation 7. We then computed *I*_{m} using Equation 8. Last, with *I*_{m} determined, we specified the right-hand side of Equation 5 [denoted as *J*(*x*, *t*)] and found *V*_{e}(*x*, *t*) using a Green's function (Tuckwell, 1988):

$$V_e(x,t) = \int_{x_{G1}}^{x_{G2}} G(x,y)\, J(y,t)\, dy, \qquad G(x,y) = \begin{cases} \dfrac{(x - x_{G1})(y - x_{G2})}{x_{G2} - x_{G1}}, & x \le y, \\[2ex] \dfrac{(y - x_{G1})(x - x_{G2})}{x_{G2} - x_{G1}}, & x > y. \end{cases} \tag{9}$$

This procedure establishes a straightforward and analytical connection between MSO neuron activity (cable equation for *V*_{m}) and the auditory neurophonic (Poisson's equation for *V*_{e}). The critical “missing link” in this sequence is the presumed knowledge of the input current *I*_{in}. Our experimental measurements were extracellular (we recorded *V*_{e}). We did not have access to the synaptic currents driving neural activity in the MSO. The challenge, therefore, was to infer *I*_{in} to produce simulated *V*_{e} patterns that accurately reproduced *in vivo* data. We describe our method for determining *I*_{in} in the Model fitting procedure section, below.
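The Green's function step can be sketched numerically. The example below assumes the one-dimensional problem ∂²*V*_{e}/∂*x*² = *J* with *V*_{e} = 0 at both ground locations, uses normalized positions (ground at 0 and 1), and evaluates the integral with a midpoint rule. The function names `green` and `solve_ve`, the grid, and the toy source profiles are hypothetical choices for illustration.

```python
def green(x, y, g1, g2):
    """Green's function of d^2/dx^2 on [g1, g2] with zero boundary values."""
    lo, hi = min(x, y), max(x, y)
    return (lo - g1) * (hi - g2) / (g2 - g1)

def solve_ve(x, source, g1=0.0, g2=1.0, n_cells=200):
    """V_e(x) = integral over [g1, g2] of G(x, y) * J(y) dy (midpoint rule).
    `source` is a function returning J(y)."""
    h = (g2 - g1) / n_cells
    mids = [g1 + (m + 0.5) * h for m in range(n_cells)]
    return h * sum(green(x, y, g1, g2) * source(y) for y in mids)

# Check against a known solution: for J = 2 everywhere, V_e(x) = x * (x - 1).
uniform = solve_ve(0.25, lambda y: 2.0)

# Toy dipole-like source: equal and opposite currents on either side of 0.5.
def dipole(y):
    if 0.40 <= y < 0.45:
        return 1.0
    if 0.55 <= y < 0.60:
        return -1.0
    return 0.0

# Toy monopole-like source: current of a single sign near the center.
def monopole(y):
    return 1.0 if 0.45 <= y < 0.55 else 0.0

dipole_left, dipole_right = solve_ve(0.2, dipole), solve_ve(0.8, dipole)
mono_left, mono_right = solve_ve(0.2, monopole), solve_ve(0.8, monopole)
```

In this toy setting the dipole-like source produces voltages of opposite sign on the two sides of the source region, whereas the monopole-like source produces voltages of the same sign on both sides, which is the qualitative distinction drawn in the text.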

Equation 9 establishes how the transmembrane current *I*_{m}(*x*, *t*) of one MSO neuron (modeled as a passive cable) generates an extracellular voltage response *V*_{e}(*x*, *t*). The auditory neurophonic, like most field potentials, represents the combined activity of many neurons that are “near” the recording electrode. We replaced *I*_{m}(*x*, *t*) in Equation 5, therefore, with membrane current summed over many neurons:

$$I_m(x,t) \rightarrow \sum_{j=1}^{n} I_m^{(j)}(x,t). \tag{10}$$

To simplify this population-level description of membrane currents, we considered an idealized group of MSO neurons. Neurons in this local subpopulation were modeled as identical cables, receiving identical inputs *I*_{in}(*x*, *t*), and oriented in parallel to one another along a common spatial dimension (*x*-axis). In other words, *I*_{m}^{(j)} ≡ *I*_{m} and *V*_{m}^{(j)} followed Equation 7 for all neurons in the subpopulation. Thus, the aggregate membrane current summed over a population of *n* neurons is as follows:

$$\sum_{j=1}^{n} I_m(x - \Delta x_j, t). \tag{11}$$

To allow for some variation in the spatial position of each model neuron, we assumed the center of each neuron Δ*x*_{j} was drawn (independently) from a Gaussian distribution with mean *x*_{c} (the center of the MSO) and variance σ^{2}.

In the limit of a large population of neurons, this sum could be replaced with the convolution integral

$$n \int_{-\infty}^{\infty} I_m(x - y, t)\, \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(y - x_c)^{2}}{2\sigma^{2}}\right) dy. \tag{12}$$

There are two additional parameters in this model of aggregate membrane current. The amount of “spatial jitter” in the population of MSO neurons is σ and the number of neurons presumed to contribute to recorded field potentials is *n*. As we explain below, however, *n* and the extracellular resistance *r*_{e} did not need to be specified when fitting simulated *V*_{e} responses to data.
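The effect of spatial jitter can be illustrated with a discrete version of the convolution. The sketch below is a simplified stand-in, not the study's implementation: it places the population center at the middle of the grid, truncates the Gaussian at four standard deviations, and uses a toy single-neuron profile with one sink and one source. The grid spacing, σ, *n*, and the function names are all hypothetical.

```python
import math

def gaussian_weights(sigma, dx, half_width=4.0):
    """Discrete Gaussian weights on offsets k*dx, normalized to sum to 1."""
    k_max = int(round(half_width * sigma / dx))
    raw = [math.exp(-((k * dx) ** 2) / (2.0 * sigma ** 2))
           for k in range(-k_max, k_max + 1)]
    total = sum(raw)
    return k_max, [r / total for r in raw]

def population_current(single, n, sigma, dx):
    """n times the discrete convolution of a single-neuron current profile
    with a Gaussian distribution of neuron positions."""
    k_max, w = gaussian_weights(sigma, dx)
    out = [0.0] * len(single)
    for i, value in enumerate(single):
        if value == 0.0:
            continue
        for k in range(-k_max, k_max + 1):
            if 0 <= i + k < len(out):
                out[i + k] += n * w[k + k_max] * value
    return out

dx = 10.0                                     # grid spacing, micrometers
xs = [-1000.0 + dx * i for i in range(201)]
single = [0.0] * len(xs)
single[xs.index(-100.0)] = -1.0               # sink
single[xs.index(100.0)] = 1.0                 # source
pop = population_current(single, n=100, sigma=50.0, dx=dx)
```

Jitter spreads each point current over a wider region and lowers its peak, but the total current summed over space is unchanged apart from the factor *n*, so a balanced sink and source remain balanced after smoothing.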

##### Model fitting procedure.

As discussed above, spatiotemporal *V*_{e} patterns are the basic unit of data that we considered in this study. Our goal, then, was to find parameter values and input currents so that solutions of Equation 9 (simulated *V*_{e}) matched recordings of the auditory neurophonic. Two simplifications made this model fitting procedure tractable. First, because spatiotemporal *V*_{e} patterns were accurately approximated in the Fourier domain, it sufficed to fit a small number of Fourier coefficients. Second, we stipulated that *I*_{in}(*x*, *t*) was the sum of two point currents localized to two distinct spots on the cable:

$$I_{in}(x,t) = I_1(t)\,\delta(x - x_1) + I_2(t)\,\delta(x - x_2). \tag{13}$$

The symbol δ(*x* − *x*_{i}), where *i* = 1 or 2, is the Dirac delta function. It is 0 everywhere except if *x* = *x*_{i}, has area equal to 1, and thus represents the assumption that synaptic inputs are point sources (localized and not spread out along the cable). Unless otherwise stated, the location of the first input (*x*_{1}) is the center of the cable, and the second input targets an off-center position (*x*_{2}, 75 μm from the cable center for onset responses and 150 μm from the cable center for ongoing responses).

In this form, *I*_{in}(*x*, *t*) has a natural interpretation based on known properties of MSO neurons. Excitatory inputs evoked by monaural sound stimuli are segregated (ventromedial side of the MSO for contralateral sounds, dorsolateral side of the MSO for ipsilateral sounds). Inhibition to MSO neurons is known to primarily target the soma (Kapfer et al., 2002) and excitation is known to primarily target dendrites (Couchman et al., 2012). The input currents can, therefore, be tentatively identified as soma-targeting inhibition (*I*_{1}) and dendrite-targeting excitation (*I*_{2}). Figure 4 illustrates the arrangement of synaptic inputs to MSO neurons and the idealized configuration of input currents in the model. Under this interpretation, fitting the model can be viewed as a test of a “working hypothesis” that the combined contributions of inhibition and excitation to MSO neurons are the dominant generators of the spatiotemporal *V*_{e} patterns observed *in vivo*.

Fitting *I*_{1}(*t*) and *I*_{2}(*t*) was tractable when we worked in the Fourier (temporal frequency) domain. Because *V*_{e} responses were accurately approximated by three frequency components (and a constant term for onset responses), we could rewrite Equation 13 in terms of Fourier coefficients and sinusoidal temporal dynamics as follows:

$$I_{in}(x,t) = \sum_{j=0}^{3}\left[\alpha_j\,\delta(x - x_1) + \beta_j\,\delta(x - x_2)\right] e^{2\pi i j f t}, \tag{14}$$

where *f* is the frequency of the sound stimulus and the Fourier coefficients α_{j} and β_{j} are complex-valued functions. The zero frequency “DC” components α_{0} and β_{0} were non-zero only when modeling onset responses, for reasons discussed above (10 Hz lower frequency cutoff in *V*_{e} recordings). With *I*_{in} expressed as this sum of complex exponentials, we could solve the preceding equations to determine *V*_{e}. This formulation admits complex-valued solutions, with coefficients that include both magnitude and phase information to completely describe the waveform of *V*_{e} (which is known to be real-valued). To make comparisons to data, therefore, we extracted the real-valued portion of the calculated *V*_{e}.

This representation of *I*_{in} substantially simplified the problem of fitting simulated *V*_{e} responses to data because *I*_{in}(*x*, *t*) is completely determined by a small number of Fourier coefficients (α_{j} and β_{j}). Additionally, all equations in the model are linear in time. As a result, orthogonality of the complex exponential function ensures that Fourier coefficient pairs can be fit “individually”, i.e., values of α_{j} and β_{j} do not depend on values of α_{k} and β_{k} for *j* ≠ *k*.

When possible, we chose the values of other parameters in the model to reflect the known physiological and anatomical properties of MSO neurons. The physical length of the cable model neuron is 430 μm (Stotler, 1953). We imagine this cable represents two dendrites of length 200 μm each and a soma of length 30 μm. The space constant of the cable is therefore set to be λ = 200 μm, or approximately the length of one dendrite (Mathews et al., 2010). The membrane time constant is τ = 0.3 ms and reflects the exceptionally fast dynamics of MSO neurons (Golding and Oertel, 2012). These parameter values were fixed for all model fits (regardless of MSO, tone frequency, and contralateral or ipsilateral side of tone presentation).

We selected the remaining parameters by hand to obtain satisfactory fits to the data. They included the location of the cable center (Eq. 14, *x*_{1}), the “spatial jitter” of the population (Eq. 12, σ), and the locations of electric ground (Eq. 9, *x*_{G1} and *x*_{G2}). We selected different parameter values for each recording session (i.e., each MSO), but the parameter values did not change with tone frequency. We found it necessary to use different values for onset and ongoing portions of responses, and for contralaterally- and ipsilaterally-presented tones. These parameter values are reported in Table 1. We consider possible reasons for these differences in parameter values in the Discussion.

When reporting results, we often speak of input currents obtained using the model. To be precise, our model fitting procedure estimates the quantity *nI*_{in}(*x*, *t*). The parameters for membrane resistance (Eq. 6, *r*_{m}), electrical resistivity of the extracellular domain (Eq. 5, *r*_{e}) and the number of neurons in the neuron population (Eq. 12, *n*) enter as multiplicative factors. All equations in the model are linear, however, so these parameters have no effect on solutions other than rescaling the amplitude of *I*_{in}. We find it convenient, therefore, to refer to this quantity as “input current” and trust this will not cause confusion. We typically report normalized values of this quantity, so there is no need to specify the values of these scaling factors.

_{in}With all parameters selected, we used a global optimization algorithm in MATLAB (*lsqnonlin*) to find Fourier coefficients α* _{j}* and β

*. We sought to minimize the error between simulated and recorded spatial-temporal patterns of*

_{j}*V*. Specifically, for each frequency component used in the Fourier representation of

_{e}*V*, we solved the minimization problems (

_{e}*j*= 0, 1, 2, 3 for onset data,

*j*= 1, 2, 3 for ongoing data): where

*a*

_{j}

^{(data)}are Fourier coefficients defined as in Equation 1 from spatiotemporal

*V*data and

_{e}*a*

_{j}

^{(model)}are Fourier coefficients extracted from simulated

*V*responses. The index

_{e}*i*on the spatial variable indicates the finite set of locations (electrode positions) at which

*V*recordings were obtained. Once α

_{e}*and β*

_{j}*were determined, we reconstructed the input currents*

_{j}*I*

_{1}(

*t*) and

*I*

_{2}(

*t*) in the time domain by extracting the real part of the complex-valued functions described by Equation 14.
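One least-squares form consistent with this description is sketched below. It is an assumed form: the squared modulus, the unweighted sum over electrode positions, and the explicit argument list of the model coefficient are illustrative choices, not necessarily the objective used in the fits.

$$
\min_{\alpha_j,\,\beta_j}\ \sum_{i} \left| a_j^{(\mathrm{data})}(x_i) - a_j^{(\mathrm{model})}(x_i;\,\alpha_j,\beta_j) \right|^2
$$

Because each frequency component *j* is fit separately, a problem of this form has only α_{j} and β_{j} as unknowns.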

Although we fit models by minimizing error in the Fourier domain, we use the relative error measure introduced above (Equation 3) when reporting the quality of model fits. In particular, we treated cycle-averaged neurophonic responses as matrices *V*_{e}^{CA(data)} and computed the relative error of the simulated response patterns *V*_{e}^{CA(model)} with respect to these data according to that measure.

We chose a soma-targeting position for *I*_{1} and dendrite-targeting input for *I*_{2} based on known patterns of synaptic inputs to MSO. We do not claim to be identifying unique and globally optimal solutions to this minimization problem. As we discuss below, similar fits can be obtained for different locations of input currents.
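The relative error used to report fit quality compares the cycle-averaged data and model matrices. The following is a minimal Python sketch, assuming the measure is the Frobenius norm of the difference divided by the Frobenius norm of the data; the norm actually used in Equation 3 may differ, and the function name and toy matrices are hypothetical.

```python
import math

def relative_error(v_data, v_model):
    """Relative error between two equally shaped matrices (lists of rows).

    Assumed form: Frobenius norm of (data - model) divided by the
    Frobenius norm of the data. The norm in Equation 3 may differ.
    """
    if len(v_data) != len(v_model):
        raise ValueError("matrices must have the same number of rows")
    num = 0.0
    den = 0.0
    for row_d, row_m in zip(v_data, v_model):
        if len(row_d) != len(row_m):
            raise ValueError("rows must have the same length")
        for d, m in zip(row_d, row_m):
            num += (d - m) ** 2
            den += d ** 2
    return math.sqrt(num / den)

# Toy matrices: rows are recording depths, columns are phases of one tone cycle.
v_data = [[1.0, -1.0], [2.0, -2.0]]
v_model = [[1.0, -1.0], [2.0, -1.0]]
err = relative_error(v_data, v_model)
```

A value of zero indicates a perfect match, and the measure is unchanged if data and model are rescaled by a common factor, which suits the normalized quantities reported here.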

All computations were performed on a laptop computer using the MATLAB scientific computing software package (RRID:SCR_001622).

## Results

### “Dipole” and “non-dipole” features in *V*_{e} responses

The standard conceptual model for neurophonic responses, what we have termed the dipole theory, posits that membrane currents in MSO neurons are the dominant generators of auditory brainstem *V*_{e}. This account caricatures MSO neurons as current dipoles because dendrite-targeting excitation generates inward (“sink”) current and the return of this current to the extracellular domain through the soma and opposite dendrite generates outward (“source”) current.

The *V*_{e} responses shown in Figure 5 exhibit features that we classify as dipole-like, i.e., consistent with the dipole theory. These data (which are, in fact, the average of 100 repeated trials) were recorded in response to a 1300 Hz pure tone presented to the ear contralateral to the side of the brain in which the data were collected. The three *V*_{e} time courses in Figure 5*A* were recorded at locations separated by increments of 300 μm. From histology we estimated that the green curve was recorded near the MSO center and the blue and red curves were recorded medial and lateral to the MSO center, respectively (negative depth indicates more ventromedial location, positive depth indicates more dorsolateral). These *V*_{e} time courses oscillate at 1300 Hz, which matches the frequency of the stimulating tone. Such sustained sound-evoked oscillations are the signature of the auditory neurophonic. The essential dipole-like feature of these responses is the anti-phase relationship between the temporal oscillations in *V*_{e} signals recorded medial to the MSO center (blue curve, negative depths) and lateral to the MSO (red curve, positive depths). This anti-phase relationship is present in both onset and ongoing responses (Fig. 5*A1*,*A2*).

We visualize the spatiotemporal dynamics of *V*_{e} responses using two-dimensional color maps. These “patterns” depict one cycle of the onset *V*_{e} response (Fig. 5*B1*) and the cycle-averaged ongoing response (Fig. 5*B2*). One can visually identify these *V*_{e} patterns as dipole-like by observing that temporal oscillations in *V*_{e} at depths lateral to the MSO (larger depths on the *y*-axis) appear to be anti-phase to oscillations at more medial positions (smaller depth values). Indeed, if one were to draw a cross-section at a fixed moment in time, one would observe a sharp transition (and a null in *V*_{e}) around a depth of 1000 μm. This position corresponds approximately (based on histological observations) to the location of the MSO center, which we mark with a black line in Figure 5*B1* and *B2*.

We will discuss modeling results in more detail below, but here we provide initial evidence that we accurately reproduced both onset (Fig. 5*C1*) and ongoing (Fig. 5*C2*) spatiotemporal patterns of *V*_{e} responses to monaural tones using our mathematical model. The currents that drove activity in the model neuron (and were obtained from fitting the model to *V*_{e} data) are shown in Figure 5*D1* and *D2*. For this dipole-like pattern, the amplitude of the dendrite-targeting current *I*_{2} was larger than the amplitude of the soma-targeting current *I*_{1}.

A primary motivation for the present work was that we observed diverse and frequency-dependent *V*_{e} responses to monaural tones that are not always “dipole-like.” In particular, sustained responses to lower-frequency tones tended to exhibit monopole-like features as illustrated with an example in Figure 6. These are *V*_{e} responses to a 700 Hz contralaterally presented tone (average of responses in 100 trials). The time courses in Figure 6, *A1* and *A2*, are from three locations in and around the MSO. The prominent temporal oscillation at the tone frequency (700 Hz in this case) is again apparent.

The spatial distribution of these *V*_{e} responses differed from the previous dipole-like example. Extracellular voltages obtained at positions medial (blue) and lateral (red) to the MSO did not exhibit an anti-phase relationship. Rather, *V*_{e} responses at all three locations oscillated nearly in phase with one another. The coherent oscillation was present in both the onset and ongoing portions of the response. Consequently, the dominant feature of the spatiotemporal *V*_{e} pattern of these responses was a temporal oscillation that was coherent across recording depths (see especially the ongoing pattern in Fig. 6*B2*). The largest amplitude of the spatially coherent temporal oscillation was at a recording depth near 1100 μm. This position was near the center of the MSO, as estimated from histology.

In Figure 6*C*, we show that simulated *V*_{e} patterns accurately reproduced the *V*_{e} patterns observed in onset and ongoing portions of this dataset. Reproducing these monopole-like *V*_{e} patterns required strong soma-targeting current *I*_{1} (Fig. 6*D*). Soma-targeting current produced spatially-coherent *V*_{e} responses in the model with input current at the center of the cable offset by return currents distributed symmetrically across the cable. The dendrite-targeting current *I*_{2} was also present and, as we will show, the combination of a soma-targeting input (that creates monopole-like *V*_{e} patterns) and a dendrite-targeting input (that creates dipole-like *V*_{e} patterns) sufficed to accurately simulate non-dipole-like neurophonic responses.

The preceding examples illustrated our parsimonious and biophysically-based update to the dipole theory. In the remainder of this work, we will demonstrate that the model flexibly reproduced diverse and frequency-dependent neurophonic responses. In addition, we will argue that it is plausible to view dendrite-targeting current in the model as a signature of excitatory inputs to MSO neurons and soma-targeting current as a signature of inhibitory inputs to MSO.
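The symmetry argument behind the dipole-like and monopole-like labels can be illustrated with a toy calculation. The sketch below does not implement the cable model of this study; it substitutes the textbook point-source approximation in an infinite homogeneous medium, and the conductivity, current magnitudes, source positions, and lateral offset `h` are all hypothetical values chosen for illustration. It shows only that an off-center sink with an opposing source gives potentials of opposite sign at mirror-image recording sites, whereas a mirror-symmetric arrangement of a central source and distributed return sinks gives equal potentials at those sites.

```python
import math

SIGMA = 0.3  # extracellular conductivity in S/m (illustrative assumption)

def point_source_potential(sources, x_rec, h=50e-6):
    """Potential (V) at axial position x_rec (m), a lateral distance h (m)
    from the cell axis, for point currents in an infinite homogeneous medium.

    sources: list of (x_position_m, current_A). Positive current is a
    source (outward membrane current), negative current is a sink (inward).
    """
    v = 0.0
    for x_src, current in sources:
        r = math.hypot(x_rec - x_src, h)
        v += current / (4.0 * math.pi * SIGMA * r)
    return v

# Dipole-like: sink on one dendrite, equal return (source) current on the other side.
dipole = [(+100e-6, -1e-9), (-100e-6, +1e-9)]
# Symmetric: source at the center, return sinks split evenly across both sides.
symmetric = [(0.0, +1e-9), (-200e-6, -0.5e-9), (+200e-6, -0.5e-9)]

for name, config in [("dipole", dipole), ("symmetric", symmetric)]:
    v_medial = point_source_potential(config, -300e-6)
    v_lateral = point_source_potential(config, +300e-6)
    print(f"{name}: medial {v_medial:.3e} V, lateral {v_lateral:.3e} V")
```

This toy does not reproduce the coherence across all recording depths seen in the data; in the model of this study that feature also depends on the one-dimensional volume conductor, the ground locations, and the spatial jitter of the population.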

### Simulated *V*_{e} patterns reproduce diverse and frequency-dependent neurophonic responses

We identified five MSOs (in 4 cats) in which neurophonic responses exhibited qualitatively similar spatiotemporal *V*_{e} patterns. These patterns varied with stimulus frequency and with side of sound presentation. Contralateral responses at three frequencies are in Figure 7 and ipsilateral responses are in Figure 8. In all cases we found that the model could be fit to data so that simulated *V*_{e} patterns accurately reproduced the diverse and frequency-dependent neurophonic responses observed *in vivo*. Spatiotemporal *V*_{e} patterns obtained from data and simulations are interleaved in alternating columns in Figures 7 and 8. All *V*_{e} patterns shown in these figures are ongoing portions of responses.

The dominant feature of low-frequency responses (600 Hz, left columns) was a monopole-like temporal oscillation that was coherent along the spatial (vertical) dimension. Responses to 1200 Hz tones (central columns) exhibited dipole-like patterns, apparent in the (nearly) anti-phase relation between temporal oscillations at small recording depths relative to larger depths. Extracellular voltage responses to high-frequency tones (1800 Hz, right columns) exhibited patterns that were dipole-like in some ways, but also had the appearance of “traveling waves”.

Model fits to contralateral responses were accurate across all recording sessions and stimulus frequencies (Fig. 9*A*,*B*). Ipsilateral responses exhibited, on occasion, more complex patterns (see MSO 1 response to 600 Hz tone in Fig. 8). As a result, model fits to data were more accurate for contralateral responses than ipsilateral responses (Fig. 9*C*,*D*). The error we are reporting in Figure 9 measured differences between model fits and cycle-averaged data, but model parameters were determined from comparisons between simulations and the Fourier domain approximation of the data (see Materials and Methods). The accuracy of the model was limited, therefore, by the accuracy of the Fourier approximation. In other words, the relative errors reported in Figure 3 are a lower bound for relative error in the model.

We distinguished dipole-like from “non-dipole-like” *V*_{e} features primarily on the basis of visually inspecting spatiotemporal patterns like those shown in Figures 7 and 8. To make this classification more objective, we fit versions of the model that included input current at only a single location. We reasoned that if a spatiotemporal *V*_{e} pattern could be accurately approximated by a model that included an input current targeting one off-center (“dendrite”) location, then the *V*_{e} response could reasonably be labeled dipole-like. Alternatively, if a spatiotemporal *V*_{e} pattern could be accurately approximated by a model that included one input current targeting the center (“soma”) of the model neuron, then the *V*_{e} response could be termed monopole-like.

By this measure, ongoing *V*_{e} responses transitioned from monopole-like for low-frequency tones to dipole-like for higher-frequency tones (Fig. 9*B*,*D*). Specifically, the “soma input only” model (dotted line) yielded more accurate fits to *V*_{e} data at low frequencies relative to the “dendrite input only” model (dashed line). This relationship reversed at frequencies >∼1000 Hz. In all cases, the “standard” model with two inputs yielded more accurate fits to data than models with one input only. This is not surprising, but it emphasizes our point that the dipole theory does not provide a comprehensive and quantitative account of neurophonic responses. Instead, these results support the “working hypothesis” that *V*_{e} patterns are shaped by the combination of dendrite-targeting excitation and soma-targeting inhibition.

We also used the model to accurately reproduce onset portions of neurophonic responses to monaural tones across the same range of frequencies. Onset *V*_{e} patterns are not displayed, but see Figure 9, *A* and *C*, for relative errors and Figures 5 and 6 (left columns) for examples. Onset responses tended to require two inputs to be accurately modeled (errors in fits using dendrite-only or soma-only models were much higher than errors in fits using both inputs for all frequencies studied).
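The single-input comparison used to label responses as monopole-like or dipole-like can be sketched as a simple decision rule. The function name, the tie handling, and the relative-error values below are hypothetical and serve only to illustrate the logic; they are not values from the study.

```python
def classify_pattern(err_soma_only, err_dendrite_only):
    """Label a V_e pattern by which single-input model fits it better.

    Both arguments are relative errors of single-input fits (smaller is
    better). The "ambiguous" label for ties is an illustrative assumption.
    """
    if err_soma_only < err_dendrite_only:
        return "monopole-like"
    if err_dendrite_only < err_soma_only:
        return "dipole-like"
    return "ambiguous"

# Hypothetical (soma-only, dendrite-only) relative errors per tone frequency in Hz.
fits = {600: (0.25, 0.60), 1200: (0.70, 0.30), 1800: (0.65, 0.35)}
labels = {freq: classify_pattern(*errs) for freq, errs in fits.items()}
print(labels)
```

A rule of this kind only ranks the two single-input models against each other; it says nothing about whether either fit is adequate, which is why the two-input model is still needed for accurate reproduction of the data.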

In sum, model fitting of *V*_{e} responses required a combination of dendrite-targeting and soma-targeting currents to accurately reproduce the diverse and frequency-dependent neurophonic responses observed *in vivo*. The contribution of soma-targeting current was most apparent for responses to lower-frequency tones, which appeared monopole-like. The contribution of dendrite-targeting current was most apparent for ongoing responses to higher-frequency tones, which appeared dipole-like.

### Onset responses identify *I*_{1} and *I*_{2} as source and sink current, respectively

As illustrated by the preceding results, we can use our idealized modeling approach to accurately reproduce extracellular voltage recordings. The key step in the model-fitting procedure was the determination of two input currents: *I*_{1}(*t*) and *I*_{2}(*t*) in Equation 13. By construction of the model, there was a natural interpretation of these input currents: *I*_{1} as soma-targeting inhibition and *I*_{2} as dendrite-targeting excitation to MSO neurons. To support this interpretation, we examined these input currents further.

To reasonably identify *I*_{1} as inhibitory current and *I*_{2} as excitatory current, we expected that *I*_{1} should represent an outward current (source) and *I*_{2} should represent an inward current (sink) in the model. In other words, following standard convention, we expected *I*_{1} to take positive values and *I*_{2} to take negative values. The input currents obtained from model fits of ongoing responses had no baseline (DC) component (as discussed in Materials and Methods). They oscillated with a time-average of zero and, consequently, could not be classified as sink or source current. We focused, therefore, on *I*_{1} and *I*_{2} obtained from fits to onset portions of responses. There was a baseline component in these currents that could possibly offset *I*_{1} and *I*_{2} to be positive- or negative-valued.

The ranges of *I*_{1} and *I*_{2} values obtained in all model fits to onset responses are displayed in Figure 10. For the parameter sets we used, input currents fit to onset responses were segregated (for the most part) into positive-valued *I*_{1} and negative-valued *I*_{2}. In other words, the soma-targeting current (*I*_{1}) was a source (outward current) and the dendrite-targeting input (*I*_{2}) was a sink (inward current). Because inhibition is primarily soma-targeting in MSO and excitation is primarily dendrite-targeting, we could plausibly conclude that *I*_{1} reflects inhibitory inputs to MSO and *I*_{2} reflects excitatory inputs to MSO.
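The sign-based reading of onset currents can be sketched as follows, using the convention stated above that positive values are outward (source) current and negative values are inward (sink) current. The function name, the 90% sign-consistency threshold, and the sample waveforms are hypothetical; the study reports ranges of values (Fig. 10) rather than a thresholded label.

```python
def classify_current(samples, tol=0.9):
    """Label a current waveform as "source", "sink", or "mixed".

    samples: current values over the fitted time window.
    tol: fraction of samples that must share a sign (hypothetical threshold).
    Convention: positive = outward (source), negative = inward (sink).
    """
    n = len(samples)
    frac_pos = sum(1 for s in samples if s > 0) / n
    frac_neg = sum(1 for s in samples if s < 0) / n
    if frac_pos >= tol:
        return "source"
    if frac_neg >= tol:
        return "sink"
    return "mixed"

# Toy onset waveforms with a baseline offset (not data from the study).
i1_onset = [0.2, 0.8, 1.0, 0.6, 0.3]
i2_onset = [-0.1, -0.9, -1.2, -0.4, -0.2]
# Toy ongoing waveform: zero-mean oscillation, so neither label applies.
i_ongoing = [1.0, -1.0, 1.0, -1.0]

print(classify_current(i1_onset), classify_current(i2_onset), classify_current(i_ongoing))
```

The zero-mean case returns "mixed", which mirrors the reason ongoing responses could not be used for this classification.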

### Characteristics of putative inhibition and putative excitation

#### Putative inhibition attenuates with tone frequency and putative excitation exhibits a “best frequency”

We have argued that the inputs to the model can plausibly be viewed as inhibition (*I*_{1}, targets center of model neuron) and excitation (*I*_{2}, targets away from center of model neuron). To assess this interpretation, we measured the amplitude of these putative currents obtained from model fits to each *V*_{e} response (5 MSOs, 13 frequencies in increments of 100 Hz from 600 to 1800 Hz, ongoing responses). We defined amplitude as peak-to-trough difference in the waveforms *I*_{1}(*t*) and *I*_{2}(*t*). To facilitate comparisons across different MSOs, we normalized the amplitude measure in each MSO and for each side of sound input (contralateral or ipsilateral ear) to the maximum amplitude value obtained for *I*_{1} over all frequencies.

For the parameter sets we used to fit *V*_{e}, we found consistent trends across recording sessions. The amplitude of *I*_{1} decreased as the frequency of the sound stimulus increased, as shown in Figure 11*A* and *B*. There were a few exceptions at the very lowest frequencies for contralateral responses, but overall the amplitude profile of *I*_{1} was “low-pass”. In contrast, the amplitude profiles of *I*_{2} were distinctly non-monotonic. We defined a “best frequency” for each MSO and each side of stimulation as the frequency at which the *I*_{2} amplitude profile reached its maximum. Note that this best frequency refers only to the tone frequency at which the amplitude of putative excitatory inputs to MSO neurons was largest; it is not a measure of firing rate (spiking) in MSO. The median best frequency for contralateral and ipsilateral responses was 1300 Hz.
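The amplitude measure, its normalization, and the best-frequency definition can be sketched in a few lines. This is a minimal illustration assuming each fitted waveform is available as a list of samples over one tone cycle; the function names and the toy waveforms are hypothetical.

```python
def peak_to_trough(waveform):
    """Amplitude defined as the peak-to-trough difference of a waveform."""
    return max(waveform) - min(waveform)

def normalized_profiles(i1_by_freq, i2_by_freq):
    """Amplitude-frequency profiles of I1 and I2 for one MSO and one ear,
    both normalized to the largest I1 amplitude over all frequencies.

    Each argument maps tone frequency (Hz) to a list of current samples.
    """
    amp1 = {f: peak_to_trough(w) for f, w in i1_by_freq.items()}
    amp2 = {f: peak_to_trough(w) for f, w in i2_by_freq.items()}
    ref = max(amp1.values())
    return ({f: a / ref for f, a in amp1.items()},
            {f: a / ref for f, a in amp2.items()})

def best_frequency(profile):
    """Frequency at which an amplitude profile is maximal."""
    return max(profile, key=profile.get)

# Toy waveforms (two samples each), not data from the study.
i1 = {600: [-2.0, 2.0], 1200: [-1.0, 1.0], 1800: [-0.5, 0.5]}
i2 = {600: [-1.0, 1.0], 1200: [-3.0, 3.0], 1800: [-2.0, 2.0]}
prof1, prof2 = normalized_profiles(i1, i2)
bf = best_frequency(prof2)
```

Because both profiles share the *I*_{1} reference, normalized *I*_{2} amplitudes can exceed 1, which keeps the relative sizes of the two currents comparable within an MSO.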

We cannot directly relate the amplitude-frequency profiles in Figure 11*A–D* to known synaptic currents because we only measured extracellular voltage in these experiments. Nonetheless, these amplitude-frequency profiles are consistent with known properties of synaptic inputs to MSO. For instance, Couchman et al. (2010) found the average decay time constant for evoked EPSCs *in vitro* to be 270 μs and for IPSCs to be 1.76 ms (gerbil MSO). Some have suggested that IPSCs could have fast (submillisecond time scale) kinetics (Brand et al., 2002), but *in vitro* evidence consistently identifies postsynaptic inhibitory currents and potentials with millisecond-scale dynamics (Grothe and Sanes, 1994; Jercog et al., 2010; Fischl et al., 2012; Roberts et al., 2013). Direct measurements of synaptic currents *in vivo* have not been reported, but EPSPs in gerbil are fast relative to IPSPs (Franken et al., 2015). Because dendrite-targeting excitation in MSO is fast and temporally precise (Joris et al., 1994a), it is plausible that EPSCs can contribute to the neurophonic on a cycle-by-cycle basis in response to relatively high tone frequencies. Soma-targeting inhibition is slower, thus we expect the contribution of inhibition to the neurophonic to have low-pass properties. IPSPs temporally summate at frequencies >200–300 Hz (Grothe and Sanes, 1994; Roberts et al., 2013), although some cycle-by-cycle oscillatory component is visible at 800 Hz in the study by Roberts et al. (2013) (*in vitro* recordings in gerbil).
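The expectation that slower kinetics produce a low-pass amplitude profile can be made concrete with a rough calculation. The sketch below assumes each postsynaptic current is a single-exponential kernel, so that the oscillatory component of a periodic input train is attenuated like a first-order low-pass filter. This is a simplifying assumption that ignores rise times, temporal jitter, and short-term plasticity; only the two decay time constants come from the text.

```python
import math

def lowpass_gain(freq_hz, tau_s):
    """Gain of a first-order low-pass filter (single-exponential kernel with
    decay time constant tau_s) for a sinusoid at freq_hz, relative to DC."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)

TAU_EPSC = 270e-6   # s, EPSC decay time constant (Couchman et al., 2010)
TAU_IPSC = 1.76e-3  # s, IPSC decay time constant (Couchman et al., 2010)

for f in (600, 1200, 1800):
    g_exc = lowpass_gain(f, TAU_EPSC)
    g_inh = lowpass_gain(f, TAU_IPSC)
    print(f"{f} Hz: EPSC gain {g_exc:.3f}, IPSC gain {g_inh:.3f}")
```

Under this assumption the cycle-by-cycle component at 1200 Hz retains about 0.44 of its low-frequency amplitude for the EPSC time constant but only about 0.075 for the IPSC time constant, in line with the stronger attenuation of putative inhibition at higher tone frequencies.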

Many properties of synaptic transmission shape the dependence of current amplitude on stimulus frequency (stochastic vesicle release, temporal jitter, adaptation, etc.). The tonotopic organization of MSO and its inputs likely contributes to the non-monotonic profile of *I*_{2}. The best frequency of 1300 Hz may reflect the frequency-tuning of excitatory inputs to MSO neurons in the region of the brainstem from which we recorded. In additional recordings (data not shown), we sampled neurophonic responses from several locations in the dorsoventral plane of the MSO and found that the frequency at which *V*_{e} amplitudes were maximal varied in a manner consistent with the presumed tonotopic axis of MSO.

#### Putative inhibition precedes putative excitation

Last, we examined the temporal dynamics of *I*_{1} and *I*_{2} waveforms for ongoing responses. Recent *in vitro* work (in gerbils) has shown that inhibitory inputs to MSO neurons can precede excitatory events by several hundred microseconds (Roberts et al., 2013), as proposed by Brand et al. (2002). We were interested to know, therefore, if a similar temporal ordering of putative inhibition (*I*_{1}) and putative excitation (*I*_{2}) emerged from fitting spatiotemporal neurophonic responses.

In response to low-frequency tones, we observed that *I*_{2} time courses appeared to include brief events reminiscent of EPSCs. The timing of these events coincided with the appearance of negative-going regions of *V*_{e} in recordings and simulations. The correspondence between negative-going *I*_{2} and *V*_{e} is expected because we associate a transient dip in *I*_{2} with a dendrite-targeting current sink. Figure 12*A–C* provides an example of a response to a low-frequency tone in which this temporal ordering of excitation and inhibition is apparent. Arrows in Figure 12, *A* and *B*, mark the location of the brief negative-going events in the *V*_{e} patterns.

Because inward currents are negative-valued by convention, the minimum value of *I*_{2} represents the maximum value of the putative excitatory current. We refer to this as “peak excitation”. Similarly, we defined “peak inhibition” as the maximum value of *I*_{1} (outward currents are positive, by convention). We then measured the difference Δ*t* between the times at which peak inhibition and peak excitation occurred, using the convention that positive Δ*t* indicates that peak inhibition precedes peak excitation in the model. In the example shown in Figure 12*C*, peak inhibition preceded peak excitation by Δ*t* = 135 μs.
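The Δ*t* measure can be sketched for waveforms sampled uniformly over one tone cycle. The function name and the toy waveforms are hypothetical, and wrapping the difference into the interval (−T/2, T/2] for periodic waveforms is an assumption that the text does not specify.

```python
def peak_timing_difference(i1, i2, period_s):
    """Delta t (s) between peak inhibition and peak excitation in one cycle.

    i1, i2: equally long lists sampling one tone cycle uniformly.
    Peak inhibition is the maximum of i1 (outward current is positive);
    peak excitation is the minimum of i2 (inward current is negative).
    A positive result means peak inhibition precedes peak excitation.
    The wrap into (-period/2, period/2] is an assumption for periodic data.
    """
    n = len(i1)
    if n != len(i2):
        raise ValueError("waveforms must have the same number of samples")
    dt = period_s / n
    k_inh = max(range(n), key=lambda k: i1[k])
    k_exc = min(range(n), key=lambda k: i2[k])
    delta = (k_exc - k_inh) * dt
    half = period_s / 2.0
    if delta > half:
        delta -= period_s
    elif delta <= -half:
        delta += period_s
    return delta

# Toy example: 700 Hz tone, 20 samples per cycle (not data from the study).
period = 1.0 / 700.0
i1 = [0.0] * 20
i2 = [0.0] * 20
i1[3] = 1.0    # peak inhibition at sample 3
i2[5] = -1.0   # peak excitation at sample 5, two samples later
delta_t = peak_timing_difference(i1, i2, period)
```

With a finite number of samples per cycle the resolution of Δ*t* is one sample interval, so differences of a few hundred microseconds require sufficiently fine sampling of the reconstructed currents.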

We found the temporal ordering of putative inhibition and putative excitation to be consistent across tone frequency and side of sound stimulus for four of the five MSOs we analyzed. Results for tones presented to the contralateral ear are shown in Figure 12*D* and results for tones presented to the ipsilateral ear are shown in Figure 12*E*. MSO number 4 is excluded from this figure because timing results were not consistent across frequency.

Although Roberts et al. (2013) used a different experimental preparation and methodology, these findings concur (circumstantially) with their work and provide evidence that inhibition precedes excitation in MSO neurons. Roberts et al. (2013) used a thick slice preparation to stimulate afferent inputs to gerbil MSO neurons and measured onset times of synaptic events, which were defined in their study as the 20% rise time of PSPs. They observed that the onset of inhibitory PSPs preceded the onset of excitatory PSPs by 380 μs, on average, for contralateral stimuli and 320 μs for ipsilateral stimuli.

### Consideration of alternate input configurations

We have found that a model with two input locations accurately replicated diverse and frequency-dependent neurophonic responses observed *in vivo* in response to monaural pure tones. The model is minimal in the sense that models with a single input could not accurately fit recorded *V*_{e} patterns (recall the high relative errors for the “soma only” and “dendrite only” models in Fig. 9). It is also biophysically based in the sense that we can plausibly associate the two inputs with dendrite-targeting excitation and soma-targeting inhibition in MSO neurons.

We have not, however, ruled out other possible input configurations. In addition to the “standard” model (dendrite-targeting and soma-targeting inputs), we constructed two alternative models. In the “same dendrite” input configuration, two inputs targeted the same dendrite (75 and 150 μm from the center of the model neuron). In the “opposite dendrite” input configuration, two inputs targeted positions on the model neuron ±75 μm relative to its center for onset responses (±150 μm for ongoing responses). Schematics of these model configurations are in the left column of Figure 13.

We found that these models replicated *V*_{e} responses with the same accuracy as our standard model. The similarity of model fitting results for all three input configurations highlights the difficulty of unambiguously inferring synaptic currents and neural dynamics from extracellular voltage data. It also, in our view, reinforces the importance of using available anatomical and physiological knowledge to construct and interpret a model of *V*_{e} responses.

Consider the onset response to the 700 Hz contralateral tone first depicted in Figure 6. The spatiotemporal *V*_{e} pattern is reproduced in the central column of Figure 13 along with simulated *V*_{e} patterns produced by three different input configurations. The simulated *V*_{e} patterns are nearly identical.

The input currents obtained from fitting each model are shown in the right column of Figure 13. We labeled positive values as source current and negative values as sink currents, following standard convention. Sink current is an inward current that depolarizes the model neuron so it is natural, as we suggested previously, to associate sinks in the model with synaptic excitation to MSO neurons. Similarly, we associate sources (outward currents) with synaptic inhibition. Following this interpretation, we characterized the same dendrite model as having excitation and inhibition targeting the same dendrite (Fig. 13*C*). We characterized the opposite dendrite model as having inhibitory inputs to both dendrites with similar time courses (Fig. 13*D*). Although we cannot state with certainty that these configurations are unrealistic, we view the standard model (dendrite-targeting excitation and soma-targeting inhibition) as most consistent with known properties of MSO neurons and their synaptic inputs.

## Discussion

### A minimal model of the MSO reproduces diverse and frequency-dependent neurophonic responses

Acoustic stimulation evokes prominent and sustained *V*_{e} responses in the auditory brainstem (Tsuchitani and Boudreau, 1964; Mc Laughlin et al., 2010). These field potentials, known as the auditory neurophonic, have characteristic spatial profiles that can resemble patterns of *V*_{e} created by current dipoles (Galambos et al., 1959; Biedenbach and Freeman, 1964; Tsuchitani and Boudreau, 1964; Guinan et al., 1972b; Mc Laughlin et al., 2010; Goldwyn et al., 2014).

We constructed an idealized, but physiologically plausible, model of the MSO to simulate these *V*_{e} responses. There were two input currents that drove activity in the model. One input targeted the center of the model neuron and the other targeted an off-center location. Based on known morphology of MSO neurons and their synaptic inputs, we associated these inputs with soma-targeting inhibition and dendrite-targeting excitation, respectively. We determined time courses of these putative synaptic currents to accurately reproduce onset and ongoing portions of neurophonic responses to monaural pure tones presented over a range of frequencies. This is an advance over our previous modeling work which, for the most part, made qualitative comparisons between simulated and recorded *V*_{e} responses and focused on dipole-like responses to tones with frequencies ≥1000 Hz (Goldwyn et al., 2014).

### A testable working hypothesis: inhibition explains non-dipole-like neurophonic responses

Due to the dipole-like spatial patterning of *V*_{e} responses and bipolar morphology of MSO neurons, early investigators hypothesized that dendrite-targeting excitation is the dominant generator of the neurophonic (Galambos et al., 1959; Biedenbach and Freeman, 1964). We drew attention here to the fact that auditory neurophonic responses can exhibit “non-dipole” features, particularly in response to low-frequency tones (<1000 Hz, approximately; Fig. 6). Simulations using dendrite-targeting excitation alone did not produce non-dipole *V*_{e} patterns, so we adopted the working hypothesis (informed by the physiology and anatomy of MSO neurons) that inhibition is soma-targeting (Kapfer et al., 2002) and that inhibition combines with dendrite-targeting excitation to shape neurophonic responses.

When we fit this parsimonious model to field potential data, we obtained time courses of soma-targeting and dendrite-targeting currents. Several observations supported our interpretation of these inputs as putative inhibition and excitation, respectively. First, for onset responses, dendrite-targeting inputs were negative (Fig. 10). These are sinks in the model (i.e., inward currents), and can plausibly be associated with excitation. Soma-targeting inputs, in contrast, were positive in most cases. These are sources (i.e., outward current), and can be associated with inhibition. Second, putative inhibition tended to attenuate with increasing tone frequency, whereas the putative excitatory current exhibited a maximum amplitude in response to ∼1300 Hz tones (Fig. 11). These results accord with our expectations because inhibitory currents undergo temporal summation in response to high-frequency tones due to their relatively slow (millisecond-scale) kinetics (Grothe and Sanes, 1994; Franken et al., 2015), whereas the amplitude profile of excitation may be shaped by the tonotopic organization of MSO and its inputs. Indeed, in unpublished recordings (to be the subject of a future report) in which we oriented the five-electrode array in a plane that sampled different dorsoventral levels, we saw evidence of tonotopic tuning in neurophonic responses. Third, we found that inhibition preceded excitation (Fig. 12). A similar temporal ordering of MSO inhibition and excitation has been observed *in vitro* (Grothe, 1994; Roberts et al., 2013). Importantly, our observation that inhibition may underlie monopole-like features in the neurophonic could be tested in future experiments. We would expect that blocking inhibition (for instance, using pharmacological methods as done by Brand et al., 2002; Pecka et al., 2008; Franken et al., 2015) would transform *V*_{e} responses to appear more dipole-like.

### Evaluation of modeling approach

Our plausible and quantitative model demonstrates how excitatory and IPSCs, as distributed across bipolar MSO neurons, can generate the neurophonic. That being said, we have not ruled out alternative hypotheses. For example, we know that different subpopulations of MSO neurons are activated depending on tone frequency (tonotopy of the MSO). It is possible, therefore, that frequency-dependent changes in *V _{e}* responses reflect the spatial distribution of MSO activity and/or the particular ways our electrodes sampled these activity patterns throughout the brainstem (i.e., our choice of insertion angle). Another possibility is that nearby brain regions contribute to auditory neurophonic responses. The lateral superior olive is one candidate (Biedenbach and Freeman, 1964; Clark and Dunlop, 1968). Others have suggested that the dipole-like spatial profile of the neurophonic reflects delay lines as postulated by the Jeffress model of sound localization (Jeffress, 1948), see Bojanowski et al. (1989) for cat, and related work in the auditory brainstem of birds (Sullivan and Konishi, 1986; Köppl and Carr, 2008; Carr et al., 2015). In preliminary analyses of neurophonic recordings at different rostrocaudal positions in the MSO, we have not observed evidence of delay lines in

*V*responses along that axis.

There are also cellular and biophysical details we excluded from our model, including morphological complexity of dendrites and axons, heterogeneity of MSO response dynamics (Baumann et al., 2013; Remme et al., 2014), and voltage-gated currents, such as low-threshold potassium current (Svirskis et al., 2003) and spike-generating sodium current (Scott et al., 2010). These details shape *V _{e}* responses (see Reimann et al., 2013 and Ness et al., 2016 for contributions of voltage-gated currents to field potentials) and they underlie more accurate models of MSO dynamics, but we have not observed that they change the qualitative aspects of neurophonic responses that we explored in this study. To support this statement, we performed simulations of a more biophysically detailed model that includes voltage-gated low-threshold potassium current and some morphological structure (soma region has larger diameter than dendrite region). Details can be found in the studies by Mathews et al. (2010) and Goldwyn et al. (2014). Simulations of this model, with a sinusoidal excitatory input current targeting one dendrite, are shown in Figure 14, left column. The *V _{e}* response in Figure 14*B1* exhibits a dipole-like spatial pattern. The nonsynaptic membrane currents evoked in response to the dendrite-targeting input are composed primarily of source currents near the site of input and in the soma region (Fig. 14*C1*). We repeated this simulation, but using a homogeneous and passive version of the model. We fixed the conductance variable associated with the low-threshold potassium current to its value for *V _{m}* equal to the resting membrane potential, and we reduced the diameter of the soma so that it matched the diameter of the dendrite regions. With these changes, we converted the biophysical model to a passive cable model similar to what we considered in this study. Nonsynaptic membrane currents are distributed differently for this model compared with the biophysical model (Fig. 14*C2*). In particular, the homogeneous model lacks the prominent source currents in the soma region of the cell. Nevertheless, simulated *V _{e}* responses are qualitatively similar and exhibit the characteristic dipole-like pattern (Fig. 14*B2*). We suggest the following interpretation of our work: the time courses of the putative synaptic currents that we obtained with the passive cable model may not be quantitatively precise reflections of excitatory and inhibitory currents in MSO neurons, but they do reveal essential properties of the spatial arrangement and relative timing of sinks and sources that generate the neurophonic.
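The passive-cable comparison described above can be illustrated with a minimal numerical sketch. It is not the model used in this study: the compartment count, conductances, capacitance, input amplitude, electrode geometry, and the function names `passive_cable_phasors` and `point_source_ve` are all assumptions chosen for illustration, and the extracellular potential is computed with the standard point-source formula for a homogeneous medium rather than with the volume-conductor approach of Goldwyn et al. (2014). The sketch solves for the sinusoidal steady state of a sealed-end cable driven at one dendritic compartment and reports the resulting distribution of membrane currents.

```python
import numpy as np


def passive_cable_phasors(n_comp=101, inj=20, freq_hz=1000.0,
                          g_leak=1e-9, c_mem=1e-12, g_axial=5e-8, i_amp=1e-10):
    """Sinusoidal steady state of a sealed-end passive cable (phasor form).

    Parameter values are illustrative placeholders (S, F, A), not fitted values.
    Returns complex amplitudes per compartment:
      v      membrane potential relative to rest
      i_mem  nonsynaptic membrane current (leak + capacitive), outward positive
      i_net  total transmembrane current, i_mem minus the injected input
    """
    omega = 2.0 * np.pi * freq_hz
    y_mem = g_leak + 1j * omega * c_mem  # membrane admittance of one compartment

    # Chain Laplacian with sealed (zero axial current) ends.
    lap = 2.0 * np.eye(n_comp) - np.eye(n_comp, k=1) - np.eye(n_comp, k=-1)
    lap[0, 0] = lap[-1, -1] = 1.0

    i_inj = np.zeros(n_comp, dtype=complex)
    i_inj[inj] = i_amp  # inward (depolarizing) current at one dendritic compartment

    # Current balance in every compartment: (y_mem * I + g_axial * L) v = i_inj
    v = np.linalg.solve(y_mem * np.eye(n_comp) + g_axial * lap, i_inj)
    i_mem = y_mem * v
    return v, i_mem, i_mem - i_inj


def point_source_ve(i_net, x_comp, x_elec, lateral=50e-6, sigma=0.3):
    """V_e phasor at electrode positions from point current sources.

    Assumes a homogeneous, infinite medium with conductivity sigma (S/m) and an
    electrode track parallel to the cable at distance `lateral` (m).
    """
    r = np.sqrt((x_elec[:, None] - x_comp[None, :]) ** 2 + lateral ** 2)
    return (i_net[None, :] / (4.0 * np.pi * sigma * r)).sum(axis=1)


v, i_mem, i_net = passive_cable_phasors()
x = np.arange(i_net.size) * 5e-6  # compartment centers, 5 um spacing (assumed)
ve = point_source_ve(i_net, x, x)
print("net current imbalance:", abs(i_net.sum()))
print("Re(V_e) at the two cable ends:", ve[0].real, ve[-1].real)
```

In this sketch the transmembrane currents sum to zero, so the input site acts as a net sink balanced by return (source) currents distributed along the rest of the cable. Plotting `ve.real` against `x` is one way to inspect whether the spatial profile reverses polarity along the cell axis under these assumed parameters.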

In fitting our model to data, we chose parameter values, when possible, based on known properties of MSO neurons. Some parameters, however, were chosen based on numerical experimentation and took different values for contralateral and ipsilateral responses, and for onset and ongoing responses, for the same recording session (Table 1). There are several known differences between contralateral and ipsilateral inputs that may explain the need for different parameter sets. Excitatory inputs to MSO are derived from the same cell types regardless of the ear of stimulation (spherical bushy cells; Cant and Casseday, 1986), but the projections of axons from the ipsilateral side are more complex and irregular than those from the contralateral side (Karino et al., 2011). Inhibition evoked by ipsilateral stimuli [which arrives via the lateral nucleus of the trapezoid body (LNTB)] may be less temporally precise than inhibition evoked by contralateral stimuli [which arrives via the medial nucleus of the trapezoid body (MNTB)]. MNTB neurons in cat can show enhanced phase-locking (relative to spike-timing of auditory nerve fibers) as they relay spikes to MSO (Smith et al., 1998). LNTB is a more diverse nucleus than the MNTB (Spirou et al., 1998), but recent evidence in gerbil shows that LNTB neurons can also show enhanced phase-locking, although this was not the case for all LNTB neurons studied (Franken et al., 2015). Although the matter is certainly not settled, overall it seems unlikely that the LNTB provides the degree of temporally precise inhibition to MSO that is conveyed by MNTB. The likelihood that contralateral inputs to MSO are more orderly arranged and more precisely timed may also explain why, in most cases, simulation results were more accurate for contralateral responses than for ipsilateral responses (Fig. 9).
The need for different parameter sets to model ipsilateral and contralateral responses may also reflect possible mismatches in the frequency tuning of ipsilateral and contralateral inputs to MSO neurons (Shamma et al., 1989); see, for instance, Day and Semple (2011) and Benichoux et al. (2015) for consideration of this “stereausis” phenomenon. Differences in onset and ongoing responses may reflect a surge of well-timed excitation at stimulus onset provided by spherical bushy cells (Cant and Casseday, 1986). Spherical bushy cell responses to tones are marked by strong onset responses followed by adaptation, both in terms of spike probability and spike-time precision (Smith et al., 1993; Joris et al., 1994a,b).

We presented data from five recording sessions (of 14 total) that we identified as sharing similar spatiotemporal *V _{e}* response patterns. We excluded the other datasets for a number of reasons. In two MSOs, the electrode track seemed to miss the MSO so that spatial patterns of *V _{e}* responses were only partially recorded. In a third case, the data could have been modeled with our method, but responses were not collected for tone frequencies >1500 Hz. Among the six remaining datasets, most were excluded because *V _{e}* responses exhibited idiosyncratic changes with tone frequency or side of stimulation. For example, in three cases, the spatial patterns of *V _{e}* responses to low-frequency tones differed from *V _{e}* responses to high-frequency tones suggesting (possibly) the existence of multiple populations of neurons that generate distinct neurophonic responses in a frequency-dependent manner. Additional work is required to account for idiosyncratic *V _{e}* responses that do not match the “typical” responses studied here (Figs. 7, 8).

### Field potentials as a route for further study of MSO

Field potentials (*V _{e}*) are an important source of data for surveying neural activity *in vivo*. These measures resist simple interpretation because there is no direct way to identify the neural generators of *V _{e}*. Our approach, inspired by pioneering work of Rall and Shepherd (1968) and others, illustrates that we can make plausible connections between synaptic inputs to MSO and sound-evoked brainstem field potentials using a physiologically-informed mathematical model.

We highlighted how soma-targeting inhibition in MSO can create non-dipole features of auditory neurophonic responses. The role of inhibition in MSO processing of binaural inputs is a topic of considerable interest (for reviews, see Grothe, 2003; Joris and Yin, 2007). In single-unit *in vivo* preparations, IPSCs cannot be directly recorded (but IPSPs can be detected; Franken et al., 2015). Recordings of *V _{e}* may provide a useful alternative perspective. For example, our model-based analysis of *V _{e}* data suggests that inhibition may be particularly prominent in low-frequency sustained responses (Fig. 9) and may precede excitation (Fig. 12). *In vitro* recordings in principal cells of the MSO show that inhibitory events precede excitatory events (Roberts et al., 2013). There is indirect evidence that inhibition acts to delay the effect of incoming excitatory inputs and thereby shift the tuning of MSO neurons to interaural time differences *in vivo* (Brand et al., 2002; Myoga et al., 2014; but see van der Heijden et al., 2013; Franken et al., 2015).

The interaural time difference at which an MSO neuron fires maximally (“best delay”) has implications for how sound source location is represented across the population of MSO neurons. There is debate (Goodman et al., 2013; Harper et al., 2014) whether the distribution of best delays covers the range of interaural delays imposed by an animal's head size, as envisioned by Jeffress (1948), or even reflects acoustical regularities in the animal's environment (Benichoux et al., 2015), or rather whether MSO neurons are grouped into subpopulations clustered around a distinct set of best delays (Harper and McAlpine, 2004). It has been proposed that inhibitory inputs that precede excitatory inputs alter the tuning of best delays, so understanding the relative dynamics of inhibitory and excitatory inputs to MSO is of great importance to current theories of spatial hearing. Our work shows that model-assisted interpretation of *V _{e}* responses may aid in determining the timing of inhibitory inputs.

Further understanding of the neurophonic will also clarify to what extent, if any, the neurophonic can operate as a mechanism for non-synaptic coupling between nearby MSO neurons. In simulations, endogenous *V _{e}* generated by MSO neurons can modulate spike timing and thresholds for spike generation (Goldwyn and Rinzel, 2016). Accurate models of *V _{e}* generation refine our understanding of how endogenous field potentials influence neural activity.

We studied brainstem field potentials that were recorded intracranially (an invasive procedure). There are common electrophysiological measures of auditory brainstem activity that can be obtained non-invasively. The auditory brainstem response is an important tool for diagnosing auditory function in normal and impaired human hearing. Moreover, phase-locked neural potentials can be measured near the cochlea and on the scalp and have been proposed as tools to investigate human hearing (Snyder and Schreiner, 1984; Kuwada et al., 1986; Shaheen et al., 2015; Verschooten et al., 2015). Presumably, these signals and the auditory neurophonic share neural generators (Caird et al., 1985; Sontheimer et al., 1985). Our observations regarding possible connections between neurophonic responses and MSO neural activity may, therefore, aid interpretation of these non-invasive diagnostic measures of neural activity in the human auditory system.

## Footnotes

This work was supported by grants from KU Leuven BOF (OT-14-118) and Research Foundation Flanders (G0A1113N and G091214N) to P.X.J.

The authors declare no competing financial interests.

- Correspondence should be addressed to Dr. Joshua H. Goldwyn, Department of Mathematics and Statistics, Swarthmore College, 500 College Avenue, Swarthmore, PA 19081. jhgoldwyn{at}gmail.com