## Abstract

Sparse coding schemes are employed by many sensory systems and implement efficient coding principles. Yet, the computations yielding sparse representations are often only partly understood. The early auditory system of the grasshopper produces a temporally and population-sparse representation of natural communication signals. To reveal the computations generating such a code, we estimated 1D and 2D linear-nonlinear models. We then used these models to examine the contribution of different model components to response sparseness.

2D models were better able to reproduce the sparseness measured in the system: while 1D models only captured 55% of the population sparseness at the network's output, 2D models accounted for 88% of it. Looking at the model structure, we identified two types of computation that increase sparseness: first, a sensitivity to the derivative of the stimulus and, second, the combination of a fast, excitatory and a slow, suppressive feature. Both were implemented in different classes of cells and increased the specificity and diversity of responses. The two types produced more transient responses and thereby amplified temporal sparseness. Additionally, the second type of computation contributed to population sparseness by increasing the diversity of feature selectivity through a wide range of delays between an excitatory and a suppressive feature.

Both kinds of computation can be implemented through spike-frequency adaptation or slow inhibition—mechanisms found in many systems. Our results from the auditory system of the grasshopper are thus likely to reflect general principles underlying the emergence of sparse representations.

## Introduction

Successful behavior is tied to the formation of specific representations of the environment to enable the discrimination of friend and foe. Typically, complex feature selectivity arises from more generic representations as one ascends a sensory pathway. This increase of specificity can lead to temporal and population sparseness (Chacron et al., 2011; but see Willmore et al., 2011). Temporally sparse responses are characterized by well-isolated firing events interleaved with periods of neuronal quiescence. Additionally, higher-order neurons are more idiosyncratic, each responding to “its own” feature. Consequently, neurons in a population are less prone to fire together, yielding population-sparse activity. Besides being an outcome of the creation of more specific representations, sparseness also has intrinsic advantages like the efficient use of energy and neuronal bandwidth as well as the facilitation of subsequent computations involving learning and memory (Barlow, 2001; Olshausen and Field, 2004).

Here we ask how temporal and population sparseness arise in the auditory system of the grasshopper, which generates a sparse and specific representation of courtship signals in a small, three-layer feed-forward network (Fig. 1*a*) (Clemens et al., 2011).

Primary auditory receptors in the grasshopper yield a temporally dense, relatively unspecific and faithful representation of a sound's envelope (Fig. 1*b*,*c*; Machens et al., 2001; Gollisch et al., 2002; Rokem et al., 2006). We attempt to characterize the transformations underlying neural encoding in second- and third-order neurons—the local and ascending neurons. Local neurons create a temporally sparse representation of song from the dense inputs provided by the receptors (Fig. 1*d*). However, population sparseness is still low as different local neurons respond to very similar features. The ascending neurons in turn establish a population-sparse code with each ascending neuron responding to a more specific stimulus pattern (Fig. 1*e*).

We aimed to gain insight into the mechanisms contributing to temporal and population sparseness by fitting low-dimensional models of the stimulus-response relationship to recordings of second- and third-order neurons in the grasshopper using the framework of linear-nonlinear (LN) models. These models provide intuitive phenomenological descriptions of the neural computations implemented by a neuron and its inputs. In their simplest form, LN models consist of a single linear filter and a static nonlinearity. As the kind of representation found at the level of ascending neurons suggests more complex transformations, we used an extension of the one-filter LN models that allowed us to describe multidimensional computations: spike-triggered covariance analysis (STC; Rust et al., 2005; Fairhall et al., 2006; Petersen et al., 2008; Fox et al., 2010).

We find two different classes of nonlinear computations contributing to sparseness in the auditory system of the grasshopper: sensitivity to the derivative of a stimulus and an AND-NOT like transformation. These abstract computations can be implemented by mechanisms ubiquitous in many neural systems and are thus likely to constitute general principles providing sparse and specific representations.

## Materials and Methods

##### Animals, electrophysiology, and acoustic stimulation.

Recordings were performed in adult locusts (*Locusta migratoria*) obtained from a local supplier and held at room temperature (22 ± 5°C). We recorded intracellularly from identified auditory neurons in the locust's metathoracic ganglion. Auditory neurons are organized in a three-layer feedforward network with receptors as an input layer, an intermediate layer of local neurons, and an output layer of ascending neurons. Intracellular electrophysiological recording methods are described in detail in Vogel et al. (2005). After completion of the stimulation protocol, neurons were stained with Lucifer yellow and identified by their characteristic morphology (Römer and Marquart, 1984; Stumpner and Ronacher, 1991). The dataset consists of seven types of auditory neurons from the intermediate (second-order or local neurons: TN1 *N* = 10, SN1 *N* = 2, SN3 *N* = 1, BSN1 *N* = 9) and the output layer (third-order or ascending neurons AN1 *N* = 5, AN2 *N* = 1, AN3 *N* = 2). Previous studies have shown that the local neuron BSN1 comes in two subtypes, one responding with a short burst to the onset of pulses and one firing more persistently during a pulse, most likely due to different strengths of inhibitory inputs (Stumpner, 1989). Accordingly, we refer to them as “phasic” (*N* = 6) or “tonic” (*N* = 3) subtypes of BSN1.

Natural songs of grasshoppers consist of a broadband carrier whose amplitude is modulated by a species-specific envelope. As the decisive cues for song recognition lie in this envelope, we were interested in how single neurons represent the pattern of amplitude modulation of a sound. We therefore modulated the amplitude of broadband noise (5–40 kHz) with lowpass Gaussian noise (cutoff frequency 140 Hz). The mean of this amplitude modulation was set to ≈10–15 dB above each cell's threshold (thresholds ranged between 45 and 65 dB SPL). The standard deviation (SD) of the random amplitude modulations was 6 dB. We presented these noise stimuli in two variants to estimate and verify the models: one long segment lasting between 5 and 14 min for estimating the models and a shorter 6 s segment, which was repeated at least 18 times and was used for estimating the time-varying firing rate for model testing. For all further analysis we only used steady-state responses by omitting the first 400 ms of each spike train.

##### Constructing lower dimensional models to characterize neuronal responses.

Responses to the long noise stimulus formed the basis for spike-triggered analysis (Schwartz et al., 2006). In essence, spike-triggered analysis consists of finding stimulus features influencing a neuron's spiking by comparing the distribution of stimuli *s→* preceding a spike *r*, *p*(*s→*|*r*), to the distribution of all stimuli, *p*(*s→*), and finding directions in stimulus space for which both distributions differ most. This yields LN cascade models of neural computation: a high-dimensional stimulus is reduced by linear projection to one or two feature values; then, a nonlinearity transforms the feature value(s) to the cell's firing rate.

We defined the stimulus *s→* as a vector corresponding to the envelope of the sound in the 64, 1 ms wide bins preceding each point in time. The 64-dimensional distribution of stimuli *p*(*s→*) was by construction Gaussian. *p*(*s→*|*r*) was sampled by the spike-triggered ensemble (STE), i.e., the set of stimulus segments preceding each spike collected in response to the long noise segment.

In its simplest form, spike-triggered analysis results in calculating the difference of the mean of both distributions, yielding the spike-triggered average (STA) as a single feature: *f→*_{STA} = ∑_{s→} *p*(*s→*|*r*) · *s→* − ∑_{s→} *p*(*s→*) · *s→* (the last term is the mean of all stimuli and a constant for our noise stimuli).
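The STA definition above can be illustrated with a minimal NumPy sketch. The function names, the toy cell, and the simulated stimulus are hypothetical and serve only to show the computation; they are not the analysis code used for the recordings.

```python
import numpy as np

def stimulus_segments(stimulus, time_bins, n_lags=64):
    """Return one row per time bin holding the n_lags samples that precede it."""
    time_bins = np.asarray(time_bins)
    time_bins = time_bins[time_bins >= n_lags]       # need a full history window
    offsets = np.arange(n_lags, 0, -1)               # n_lags, ..., 1 bins back
    return stimulus[time_bins[:, None] - offsets[None, :]]

def spike_triggered_average(stimulus, spike_bins, n_lags=64):
    """Mean spike-triggered segment minus the mean of all segments."""
    ste = stimulus_segments(stimulus, spike_bins, n_lags)
    prior = stimulus_segments(stimulus, np.arange(len(stimulus)), n_lags)
    return ste.mean(axis=0) - prior.mean(axis=0)

# Toy check (assumption): a cell that fires whenever the envelope 5 ms ago was large.
rng = np.random.default_rng(0)
stim = rng.normal(0.0, 6.0, size=20000)              # Gaussian envelope, SD 6 dB, 1 ms bins
spikes = np.flatnonzero(np.roll(stim, 5) > 9.0)      # spike at t if stim[t - 5] > 9
sta = spike_triggered_average(stim, spikes)
peak_lag_ms = 64 - int(np.argmax(sta))               # last column is 1 ms before the spike
```

In this toy case the STA peaks at the lag that drives the simulated cell, which is the behavior one expects from the difference of the two means.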

To characterize more complex, multidimensional feature selectivity, STC was performed. To that end, we computed the covariance matrix of the STE, *C*_{s→|r}, and subtracted the covariance matrix of all stimuli, *C*_{s→}, from it: Δ*C* = *C*_{s→|r} − *C*_{s→}. The covariance matrix *C* of an arbitrary distribution *p*(*x→*) is given by *C* = ∑_{x→} *p*(*x→*)(*x→* − 〈*x→*〉)(*x→* − 〈*x→*〉)^{T}, where the angled brackets denote the average. An eigenvalue decomposition of Δ*C* yields stimulus directions in which the variance (and not the mean, as for the STA) of the spike-triggered and the raw stimulus ensemble differ most. These directions are indicated by eigenvectors associated with non-zero eigenvalues. However, due to the finite sample size (number of spikes) most eigenvalues are non-zero. We checked the significance of the deviation of each eigenvalue from zero by computing 1 and 99% confidence intervals for the maximal/minimal eigenvalues of each recording. To that end, we generated randomized responses by shuffling the spike times and used the distribution of the largest/smallest eigenvalue from 1000 such randomized responses to derive confidence intervals. All cells in our dataset exhibited at least two significant eigenvalues at this significance level.
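The shuffle test for eigenvalue significance can be sketched as follows. This is an illustration under stated assumptions: spike-time shuffling is approximated by drawing the same number of spikes at random time bins, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def delta_cov(segments, spike_rows):
    """Covariance of the spike-triggered segments minus covariance of all segments."""
    return np.cov(segments[spike_rows], rowvar=False) - np.cov(segments, rowvar=False)

def null_eigenvalue_bounds(segments, n_spikes, n_shuffles=1000, seed=0):
    """1st percentile of the smallest and 99th percentile of the largest
    eigenvalue of Delta C when spikes are unrelated to the stimulus."""
    rng = np.random.default_rng(seed)
    smallest, largest = [], []
    for _ in range(n_shuffles):
        rows = rng.choice(len(segments), size=n_spikes, replace=False)
        evals = np.linalg.eigvalsh(delta_cov(segments, rows))   # ascending order
        smallest.append(evals[0])
        largest.append(evals[-1])
    return np.percentile(smallest, 1), np.percentile(largest, 99)

# Toy data (assumption): 8-dimensional white-noise segments and a cell that
# fires when the first stimulus dimension has a large absolute value.
rng = np.random.default_rng(1)
segments = rng.normal(size=(2000, 8))
spike_rows = np.flatnonzero(np.abs(segments[:, 0]) > 1.5)
evals = np.linalg.eigvalsh(delta_cov(segments, spike_rows))
lower, upper = null_eigenvalue_bounds(segments, len(spike_rows), n_shuffles=50)
significant = (evals < lower) | (evals > upper)
```

Eigenvalues outside the interval spanned by `lower` and `upper` would be treated as significant; the demo uses only 50 shuffles to keep the run time short.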

We performed the STC analysis in a subspace orthogonal to the STA, by projecting the STA out of each stimulus vector: *s→*_{⊥} = *s→* − (*s→*^{T}*f→*_{STA}) *f→*_{STA}/|*f→*_{STA}|^{2}. This rendered the STC eigenvectors orthogonal to the STA and greatly facilitated the comparison of models derived from STA and STC analysis, as a model including STC filters is a direct extension of the lower-dimensional STA-only model. As eigenvectors yielding the filters recovered by STC analysis are only defined up to an arbitrary sign, we chose the sign such that the STC filter is most similar to the negative derivative of the STA filter (Fairhall et al., 2006). Furthermore, all filters were normalized to unit norm.
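A compact sketch of STC analysis in the subspace orthogonal to the STA is given below. The function name `stc_filter`, the 16-dimensional toy stimulus, and the simulated cell are assumptions made for illustration only.

```python
import numpy as np

def stc_filter(segments, spike_rows):
    """Unit-norm STA and the STC filter with the largest absolute eigenvalue,
    computed after projecting the STA out of every stimulus segment."""
    sta = segments[spike_rows].mean(axis=0) - segments.mean(axis=0)
    sta /= np.linalg.norm(sta)
    ortho = segments - np.outer(segments @ sta, sta)       # s - (s^T f) f with |f| = 1
    d_cov = np.cov(ortho[spike_rows], rowvar=False) - np.cov(ortho, rowvar=False)
    evals, evecs = np.linalg.eigh(d_cov)
    k = int(np.argmax(np.abs(evals)))
    stc = evecs[:, k]
    # Resolve the sign ambiguity: align with the negative derivative of the STA.
    if stc @ (-np.gradient(sta)) < 0:
        stc = -stc
    return sta, stc, evals[k]

# Toy cell (assumption): excited by feature a, silenced when |feature b| is large.
rng = np.random.default_rng(2)
segments = rng.normal(size=(20000, 16))
a, b = np.eye(16)[3], np.eye(16)[7]
spike_rows = np.flatnonzero((segments @ a > 1.0) & (np.abs(segments @ b) < 0.5))
sta, stc, eigenvalue = stc_filter(segments, spike_rows)
```

In this toy example the STA recovers the excitatory direction, while the leading STC eigenvector recovers the second direction with a negative eigenvalue, reflecting reduced variance in the spike-triggered ensemble.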

The nonlinearity is given by Bayes' rule as the ratio of the spike-triggered to the raw stimulus distribution in the stimulus subspace defined by the filter(s), scaled by the mean firing rate: 〈*r*〉 · *p*(*s→′*|*r*)/*p*(*s→′*). Here, 〈*r*〉 is the average firing rate in the response set used for estimating the model. *s→′* is the stimulus projected onto a subspace defined by the STA, or the STA and the STC filter with the largest absolute non-zero eigenvalue; it can thus be either 1D or 2D. *p*(*s→′*) is the distribution of projection values of all stimuli and is by definition Gaussian with SD 6 dB. *p*(*s→′*|*r*) is the distribution of projection values of the STE and was computed by kernel-density estimation.
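For the 1D case, the Bayes-rule estimate of the nonlinearity can be sketched as below. The Gaussian kernel, its bandwidth, the zero-mean prior, and the simulated cell are assumptions; the text does not specify the kernel or bandwidth that were used.

```python
import numpy as np

def kde_1d(samples, grid, bandwidth):
    """Gaussian kernel-density estimate of the samples, evaluated on the grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def nonlinearity_1d(proj_spike, prior_sd, mean_rate, grid, bandwidth=1.0):
    """Firing rate as a function of filter output: <r> * p(x | spike) / p(x).
    The prior p(x) is taken to be a zero-mean Gaussian with SD prior_sd."""
    p_prior = np.exp(-0.5 * (grid / prior_sd) ** 2) / (prior_sd * np.sqrt(2.0 * np.pi))
    return mean_rate * kde_1d(proj_spike, grid, bandwidth) / p_prior

# Toy cell (assumption): spike probability rises sigmoidally with the filter output.
rng = np.random.default_rng(3)
proj_all = rng.normal(0.0, 6.0, size=50000)          # projections of all stimuli, SD 6 dB
p_spike = 0.2 / (1.0 + np.exp(-(proj_all - 3.0) / 2.0))
spiked = rng.random(proj_all.size) < p_spike
mean_rate = 1000.0 * spiked.mean()                   # Hz, assuming 1 ms bins
grid = np.linspace(-10.0, 10.0, 21)
rate = nonlinearity_1d(proj_all[spiked], 6.0, mean_rate, grid)
```

The 2D case follows the same logic with a two-dimensional density estimate over pairs of filter outputs.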

Two kinds of model were constructed for each recording: one model consisted only of the STA and a 1D nonlinearity. We refer to it as the “STA model.” The other model contained the STA filter and the filter with the largest absolute eigenvalue from the STC analysis—here called the “STC filter”—plus a 2D nonlinearity. We called it the “STC model.”
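How a fitted STC model turns a stimulus into a predicted firing rate can be sketched as a table lookup. Representing the 2D nonlinearity as a binned table, the bin edges, and the toy filters are all assumptions for illustration; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_outputs(stimulus, filt):
    """Project the len(filt) samples preceding each time bin onto a filter.
    Output k belongs to time bin k + len(filt)."""
    windows = sliding_window_view(stimulus, len(filt))[:-1]
    return windows @ filt

def predict_rate_2d(stimulus, sta, stc, nonlin, edges_sta, edges_stc):
    """Look up the rate in a binned 2D nonlinearity; rows index the STC output,
    columns index the STA output."""
    x = filter_outputs(stimulus, sta)
    y = filter_outputs(stimulus, stc)
    col = np.clip(np.digitize(x, edges_sta) - 1, 0, nonlin.shape[1] - 1)
    row = np.clip(np.digitize(y, edges_stc) - 1, 0, nonlin.shape[0] - 1)
    return nonlin[row, col]

# Hypothetical filters: a narrow recent bump (STA) and a broader, earlier bump (STC).
t = np.arange(64)
sta = np.exp(-0.5 * ((t - 54) / 3.0) ** 2)
sta /= np.linalg.norm(sta)
stc = np.exp(-0.5 * ((t - 44) / 6.0) ** 2)
stc /= np.linalg.norm(stc)

# Coarse AND-NOT table: fire only for non-negative STA and negative STC output.
edges = np.array([-np.inf, 0.0, np.inf])
nonlin = np.array([[0.0, 100.0],    # STC < 0
                   [0.0, 0.0]])     # STC >= 0
rng = np.random.default_rng(4)
stim = rng.normal(0.0, 6.0, size=2000)
rate = predict_rate_2d(stim, sta, stc, nonlin, edges, edges)
```

The STA model corresponds to the same procedure with a single filter and a 1D table.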

##### Quantification of model performance.

A bias-corrected version of Pearson's coefficient of correlation ρ was used to quantify how well each model predicted the neuronal response to a novel stimulus (Petersen et al., 2008). To that end, the time-varying firing rate *r*(*t*) of the neuron was estimated from responses to several repetitions of a stimulus not used for model estimation by binning time at 1 ms and smoothing with a box kernel spanning two bins. A predicted response *r̂*(*t*) to the same stimulus was obtained from the STA and STC models.

As the neuronal response is noisy, a model of the neuron can never perform better than that noise level. Thus, the naive estimator of the correlation is downwardly biased by that noise. To correct for this bias, we estimated the noise in the response by calculating *r*(*t*) from two equal-sized, exclusive subsets of the stimulus repetitions, yielding two independent estimates of the firing rate *r*_{1}(*t*) and *r*_{2}(*t*). The coefficient of correlation between these two estimates was then used to normalize the raw correlation: ρ = ρ(*r*, *r̂*)/ρ(*r*_{1},*r*_{2}).
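The normalization described above can be sketched as follows. The random split into halves, the simulated spike trains, and the function names are assumptions; only the ratio ρ(*r*, *r̂*)/ρ(*r*_{1}, *r*_{2}) and the two-bin box kernel are taken from the text.

```python
import numpy as np

def firing_rate(trials, kernel_bins=2):
    """Trial-averaged rate (spikes per bin) smoothed with a box kernel."""
    rate = np.asarray(trials, dtype=float).mean(axis=0)
    kernel = np.ones(kernel_bins) / kernel_bins
    return np.convolve(rate, kernel, mode="same")

def corrected_correlation(trials, predicted, seed=0):
    """Raw correlation between measured and predicted rate, divided by the
    correlation between rates from two disjoint halves of the trials."""
    trials = np.asarray(trials, dtype=float)
    order = np.random.default_rng(seed).permutation(len(trials))
    half = len(trials) // 2
    r1 = firing_rate(trials[order[:half]])
    r2 = firing_rate(trials[order[half:2 * half]])
    raw = np.corrcoef(firing_rate(trials), predicted)[0, 1]
    noise = np.corrcoef(r1, r2)[0, 1]
    return raw / noise

# Toy data (assumption): 18 repetitions of a 6 s stimulus in 1 ms bins.
rng = np.random.default_rng(5)
t = np.arange(6000)
true_rate = 0.1 + 0.09 * np.sin(2.0 * np.pi * t / 200.0)     # spike probability per bin
trials = rng.random((18, t.size)) < true_rate
raw_rho = np.corrcoef(firing_rate(trials), true_rate)[0, 1]
rho = corrected_correlation(trials, true_rate)
```

Because the split-half correlation is computed from only half of the repetitions, the normalized value in this sketch can exceed 1 when the prediction is very good.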

##### Characterization of model structure.

To characterize the shapes of the filters we used two metrics: the first was defined by the coefficient of correlation between the derivative of the STA filter and the STC filter, and the second was given by the delay between the peak of the STA filter and the peak of the STC filter with the STA filter being the reference.

We characterized to what extent each 2D nonlinearity in the STC models corresponded to a truly nonlinear combination of the STA and STC filter. This was done by fitting minimal linear and quadratic models incorporating only linear or quadratic interactions between the two filters as described by Fitzgerald et al. (2011). As the 1D STA model is a lower bound for the performance of the minimal 2D models, we quantified how well a minimal linear or quadratic model could explain the performance gain of the 2D STC model relative to the STA model: (ρ_{min} − ρ_{STA})/(ρ_{STC} − ρ_{STA}), where ρ_{STA} and ρ_{STC} are the bias-corrected coefficients of correlation between the empirical and the modeled firing rate and ρ_{min} is the performance of the minimal linear or quadratic model. Values close to 100% indicate that a minimal model of a given order fully explains the performance gain; values close to 0% indicate a failure to explain it and hence the importance of higher-order interactions.

To better characterize the types of computation described by the 2D nonlinearity, we quantified to what extent the nonlinearities implemented one of two canonical logical operations on the output of the STA and STC filter. Logical operations were defined as follows: an (STA AND STC)-like operation corresponds to most large values of the 2D nonlinearity being concentrated in the upper right quadrant (positive values of both filters drive the cell), and an (STA AND-NOT STC)-like operation is implemented by a nonlinearity with most weight in the lower right quadrant (positive STA and negative STC outputs drive the cell). The relative weight each quadrant had in driving the cell was computed by summing over all values in a given quadrant and normalizing by the sum over all quadrants. We then summed the relative weights over the “active” quadrant(s) as defined by each canonical operator and normalized by the number of active quadrants (see Fig. 4*h*, orange). This yielded values between 0 (no weight in the “active” quadrants) and 1 (all of the weight in the “active” quadrants).
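The quadrant-based match to the two logical operations can be sketched as below. The array layout (rows index the STC output, columns the STA output), the grid, and the toy nonlinearity are assumptions made for illustration.

```python
import numpy as np

def quadrant_weights(nonlin, sta_axis, stc_axis):
    """Relative weight of a 2D nonlinearity in each quadrant.
    nonlin[i, j] is the rate for STC output stc_axis[i] and STA output sta_axis[j]."""
    sta_pos = sta_axis > 0
    stc_pos = stc_axis > 0
    total = nonlin.sum()
    return {
        "upper_right": nonlin[np.ix_(stc_pos, sta_pos)].sum() / total,    # STA+, STC+
        "lower_right": nonlin[np.ix_(~stc_pos, sta_pos)].sum() / total,   # STA+, STC-
        "upper_left": nonlin[np.ix_(stc_pos, ~sta_pos)].sum() / total,
        "lower_left": nonlin[np.ix_(~stc_pos, ~sta_pos)].sum() / total,
    }

def operation_match(weights, active):
    """Summed weight in the active quadrants, normalized by their number."""
    return sum(weights[q] for q in active) / len(active)

# Toy nonlinearity (assumption): a bump centred on positive STA, negative STC output.
axis = np.linspace(-3.0, 3.0, 30)
X, Y = np.meshgrid(axis, axis)                      # X: STA output, Y: STC output
nonlin = np.exp(-((X - 1.5) ** 2 + (Y + 1.5) ** 2) / 0.5)
w = quadrant_weights(nonlin, axis, axis)
and_match = operation_match(w, ["upper_right"])     # (STA AND STC)
and_not_match = operation_match(w, ["lower_right"]) # (STA AND-NOT STC)
```

For this toy nonlinearity almost all weight lies in the lower right quadrant, so the AND-NOT match is close to 1 and the AND match is close to 0.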

##### Simulation of model responses to natural songs.

To study the responses of the models to natural signals, we used a set of songs from eight different male grasshoppers of the species *Chorthippus biguttulus* (Clemens et al., 2011). It is well justified to use models fit to neurons recorded in one species of grasshopper, *L. migratoria*, to study the responses to signals of another species, *C. biguttulus*, as the morphological and physiological properties of neurons at the early stages of processing we are interested in are highly similar (Ronacher and Stumpner, 1988; Neuhofer et al., 2008; Creutzig et al., 2009). As the song's amplitude increases over its duration, we used the last 400 ms where the amplitude plateaued. We transformed the amplitude to a dB scale. The natural songs had an SD of 6 ± 1 dB, close to the SD of the noise stimuli used for the estimation and evaluation of the models. To cover the range of firing rates between 20 and 50 Hz observed for natural stimuli (Clemens et al., 2011), we set the average amplitude to +6 dB.

To quantify the transience of model responses, we calculated the percentage of time the firing rate was below its half-maximal value. A highly transient response will reach its maximal firing rate and then quickly return to smaller firing rates, spending little time above the half-maximal firing rate. In contrast, highly persistent responses will spend most of the time near the maximal firing rate and hence above the half-maximal firing rate. In addition, peak firing rates were estimated as the 99th percentile of the predicted firing rate to natural songs.
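Both response measures are simple to compute from a predicted firing rate. The sketch below uses hypothetical function names and two artificial rate traces.

```python
import numpy as np

def transience(rate):
    """Percentage of time bins in which the rate is below half its maximum."""
    rate = np.asarray(rate, dtype=float)
    return 100.0 * np.mean(rate < 0.5 * rate.max())

def peak_rate(rate):
    """Peak firing rate, estimated as the 99th percentile of the rate."""
    return np.percentile(rate, 99)

# Artificial examples (assumption): brief isolated events vs. sustained firing.
transient = np.zeros(1000)
transient[::100] = 200.0          # ten 1 ms events at 200 Hz
persistent = np.full(1000, 100.0)
persistent[:100] = 0.0            # silent for the first 10% of the time

t_transient = transience(transient)
t_persistent = transience(persistent)
```

The transient trace spends nearly all of its time below the half-maximal rate, whereas the persistent trace does so only during its initial silent stretch.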

##### Quantification of temporal and population sparseness.

Sparseness of the modeled responses was quantified using the measure in Willmore and Tolhurst (2001) as in the following:

For temporal sparseness, the average in Equation 1 was taken over time and then *S* was averaged over the eight songs. For population sparseness, we constructed 500 populations for each model class by randomly combining the responses of four cells of the same model class. Then, the average in Equation 1 was taken over the four cells in a population in each individual time bin, and *S* was averaged over all time bins and songs. For comparison, we included empirical sparseness values of the responses of local and ascending neurons to the same set of songs (Clemens et al., 2011).
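Equation 1 is not reproduced in this text. The sketch below therefore assumes a widely used form of this kind of sparseness measure, *S* = 1 − 〈*r*〉^{2}/〈*r*^{2}〉, which may differ from the exact definition used here (for example, by a normalization factor). Skipping time bins in which all cells of a population are silent is likewise an assumption.

```python
import numpy as np

def sparseness(r, axis=None):
    """Assumed form S = 1 - <r>^2 / <r^2>; the average is taken along `axis`."""
    r = np.asarray(r, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return 1.0 - np.mean(r, axis=axis) ** 2 / np.mean(r ** 2, axis=axis)

def temporal_sparseness(rates_per_song):
    """Average over time within each song, then average S over songs."""
    return float(np.mean([sparseness(r) for r in rates_per_song]))

def population_sparseness(rates, n_populations=500, pop_size=4, seed=0):
    """rates: array (n_cells, n_time_bins) for one model class.
    Returns one value per randomly assembled population."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_populations):
        cells = rng.choice(rates.shape[0], size=pop_size, replace=False)
        per_bin = sparseness(rates[cells], axis=0)   # average over the four cells
        values.append(np.nanmean(per_bin))           # then over time bins
    return np.array(values)

# Toy rates (assumption): six model cells, 200 time bins.
rng = np.random.default_rng(6)
rates = rng.random((6, 200)) * 100.0
pop = population_sparseness(rates)
dense = sparseness([1.0, 1.0, 1.0, 1.0])     # all cells equally active
sparse = sparseness([0.0, 0.0, 0.0, 4.0])    # a single active cell
```

With this form, four equally active cells give *S* = 0, and a single active cell among four gives the maximum of 0.75.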

## Results

We will start by describing the models of one representative cell in detail. We will then show the two classes of computation we found in our dataset and relate their properties to the generation of a sparse representation of natural communication signals.

### 2D models capture additional aspects of computation

To provide an intuition for the model structure obtained by STC, we will describe the model obtained for one ascending neuron AN1 in depth. The 1D STA model (Fig. 2*a*,*b*) consisted of a single STA filter, which describes the temporal feature the cell is responsive to, and a 1D nonlinearity, which transforms the output of the filter to the cell's firing rate and depicts the neuron's tuning for that feature. The cell's STA was largely unimodal, exhibiting one prominent positive lobe at 20 ms preceding the spike and a weak negative lobe between 30 and 50 ms preceding the spike (Fig. 2*a*). Thus, the cell was sensitive to a lowpass-filtered version of the amplitude of the stimulus. The nonlinearity was skewed toward positive filter values, indicating that the cell preferred stimuli that were similar to the STA (Fig. 2*b*). Firing was reduced for very large values—the cell exhibited thus a bandpass-like tuning for the STA.

The STC model (Fig. 2*c*,*d*) consisted of the STA and a second filter recovered by STC analysis. This STC filter was broader than the STA filter, mostly positive, and led the STA (Fig. 2*c*). Thus, the neuron was influenced not only by the STA but also by the envelope in the 20 ms preceding the STA. In the 2D STC model the stimulus is filtered by both filters in parallel. The output values of both filters are then combined to yield the cell's firing rate. This transformation from pairs of filter values to firing rate is implemented by a 2D nonlinearity, which depicts for each pair of filter outputs the resulting firing rate of the neuron (Fig. 2*d*). This 2D nonlinearity shows that the cell was best driven by stimuli in the lower right quadrant, i.e., when the STA filter produced large positive output values and the STC filter yielded negative outputs. Such a combination corresponds to an AND-NOT like logical operation on the output of the filters. Thus, the STC model yielded a much richer description of neuronal feature selectivity of this cell: addition of a second filter revealed a nonlinear computation performed on the stimulus which was not obvious from the STA model alone.

Generally, the stimulus transformations of auditory neurons in the grasshopper were well described by STA and STC models: model performance ranged between 0.5 and 0.8 (0.57 and 0.63 for the STA and STC models of the example presented above; Fig. 2*e*). The 2D STC models were able to capture additional aspects of the stimulus-response relation as they performed significantly better than the 1D STA models, increasing model performance on average by 9% (ρ_{STA} = 0.59 ± 0.11, ρ_{STC} = 0.65 ± 0.11, mean ± SD, *p* = 6 · 10^{−6}, sign rank). While the STC models thus predicted responses 9% better than the STA models, this gain was relatively small considering the increased complexity of the model (one additional filter and a 2D nonlinearity). However, we will show below that there existed systematic differences in the structure of the predicted responses that enabled the 2D STC models to better explain the level of sparseness observed in the auditory system of the grasshopper.

### Analysis of the model structure reveals two types of computation

Looking at the model structure of all cells in our dataset, we found two principal classes of model (Fig. 3). Notably, this dichotomy was not obvious by looking at the STA filters alone: the STA filter and its nonlinearity were similar to that shown in the example (Fig. 2*a*,*b*)—the STA filter of all cells was thus mainly integrating and drove the cells for positive projection values (Figs 3, upper row, 4*i*). Only the incorporation of the STC filter revealed fundamental, qualitative differences between models, justifying the discrimination of two principal classes of neurons: the STC filters on the left side of the figure were all biphasic whereas those on the right were unimodal (compare Fig. 3*a*,*b*, lower row). The fact that we found only two classes of models does not exclude the existence of additional computational classes in the auditory system of the grasshopper—local or ascending neurons not recorded might implement different kinds of transformations.

Analyzing the filters and nonlinearities allowed us to interpret the computations performed by both model classes.

#### “Derivative-like” cells

We found specimens of this first group of cells only among the second-order, local neurons: TN1, SN1, SN3, and BSN1 (the tonic subtype, termed BSN1t) (Fig. 3*a*). As for all cells, the STA filter was excitatory for positive projections of the stimulus (Fig. 4*i*, red). The STC filter of this class of models was highly similar to the negative derivative of the STA (Fig. 4*a*,*e*; correlation coefficient between the derivative of the STA and the STC filter 0.83 ± 0.11). This high correlation means that the shape of the STC filter was largely determined by that of the STA filter. Given that the STA filters were relatively uniform, this made different cells of this class respond to very similar features. In addition, both filters exhibited great overlap in time, making these cells respond to the stimulus on a short time scale of the order of the STA filter's width (Fig. 4*f*; delay between peaks of the STA and STC filter 3.0 ± 0.6 ms).

The nonlinearity of the STC filter was quadratic-like, rendering these cells weakly phase invariant (Fig. 4*j*, red). A minimal linear model of the interactions of both filters accounted for only 23 ± 8% of the performance gain of the STC model (Fig. 4*g*, top). The incorporation of quadratic interactions doubled this (51 ± 19%; Fig. 4*g*, bottom), but was not able to *fully* explain the interaction of the filters. This indicates that both filters interacted in a highly nonlinear fashion in this class of cells.

To better describe how derivative-like cells integrate the output of the STA and STC filter, we fit two different canonical logical operators to the 2D nonlinearity. A cell firing only to positive projection values of both filters performs an AND operation on the two filters. A cell that responds only to positive outputs of the STA and negative outputs of the STC filter performs an AND-NOT operation. We quantified to what degree the two operations were implemented by determining the match between the empirical 2D nonlinearity (Fig. 4*b,d*) and a template corresponding to either logical operation (Fig. 4*h*, bottom; see Materials and Methods). This revealed that the nonlinearity was best explained by an AND-like computation on the STA and STC filter. As the STA filter primarily integrated the stimulus and as the STC filter resembled an upstroke of the stimulus (Figs. 3*a*, lower row, 4*a*), derivative-like cells thus encoded a combination of the intensity (by means of the STA filter) and the derivative (by means of the STC filter) of a sound's envelope.

#### “Leading-suppressive” cells

The phasic subtype of the local neuron BSN1 (BSN1p) and the three ascending neurons, AN1 (Fig. 2), AN2, and AN3, formed the second class of models, which was thus dominated by ascending neurons (Fig. 3*b*). In contrast to the derivative-like cells, where the STC filter strongly resembled the derivative of the STA filter, here, both filters were largely independent and covered a longer segment of the stimulus: the STC filter was mostly integrating and led the STA; both filters spanned between 30 and 40 ms of the stimulus (Figs 3, 4*c*). This class of cells thus integrated the stimulus on a much longer time scale than the derivative-like cells. Along this line, the great range of delays between both filters (−3 to −12 ms; Fig. 4*f*) equipped these cells with a more diverse temporal selectivity than the comparatively uniform derivative-like cells.

The nonlinearity of the STA filter resembled that of derivative-like cells in that it also made the cells fire for positive output values (Fig. 4*i*, green). The nonlinearity of the STC filter was unimodal, with a maximum at 0 and a bias toward negative projection values (Fig. 4*j*, green). This means that positive projection values of the STC filter generally suppressed firing. While a minimal linear model of the nonlinearity only accounted for 34 ± 10% of the performance gain, a minimal quadratic model almost fully explained the performance gain of the STC models (78 ± 23%; compare Fig. 4*g*, top and bottom). This indicates that the operation of these cells is well described by a quadratic interaction of both filters. The logical operation implemented by the nonlinearity was best approximated by an AND-NOT-like logical operation on the output of both filters (Fig. 4*h*), meaning that these cells fired strongly only for positive projection values of the STA and negative projection values of the STC (Fig. 4*d*). As the peak of the STC filter preceded that of the STA, we termed this model class “leading-suppressive cells.”

### Contribution of model components to sparse and decorrelated coding of natural stimuli

The analysis of the encoding properties of local and ascending neurons in the grasshopper revealed two different computational classes: the local neurons in our dataset could mostly be termed derivative-like cells, while all ascending neurons were leading-suppressive cells (Figs. 3, 4). To relate the computational properties of both classes of cells to sparse coding we used the STA and STC models to predict responses to a set of natural songs and quantified the contribution of both filters to temporal and population sparseness. The range of response patterns produced by the models matched those found in actual recordings indicating that our models generalized well to natural stimuli (compare model responses in Fig. 5*a*,*b* with recordings in Fig. 1).

### Temporal sparseness

Temporal sparseness describes the tendency to fire in well-defined events interleaved by long stretches of relative quiescence. Generally, STC models responded more transiently to natural signals than STA models (Fig. 5*a*,*b*, compare red and orange traces). For derivative-like cells, this effect was often subtle, leading to a shortening and downscaling of persistent responses after onsets in the stimulus (Fig. 5*a*, black arrows). In contrast, the responses of STC models of leading-suppressive cells deviated more strongly from those of STA models. Here, the tonic responses of STA models became often purely transient in STC models (Fig. 5*b*, black arrows). To gain an intuition on how the properties of individual firing events—defined as isolated packets or bursts of spikes and evident as segregated peaks in a firing-rate profile—affect temporal sparseness, we constructed artificial firing patterns with varying degrees of response transience by either changing the duration or the magnitude of these events (Fig. 6*a*). This showed that the shortness of firing events strongly correlates with temporal sparseness (Fig. 6*a*, top). We quantified the transience of the modeled responses to natural song as the fraction of time the firing rate was below 50% of its maximum. The firing rate of a persistently firing cell will spend most of the time near the maximum, while that of a transiently firing cell will quickly fall below the half-maximal firing rate. Indeed, STC models of both classes responded much more transiently than STA models when quantified by this measure (STA vs STC: derivative-like 69 ± 25 vs 90 ± 11%, leading-suppressive 60 ± 30 vs 94 ± 10%; STA vs STC model: *p* < 6.2 · 10^{−4}, sign rank). In addition to event shortness, the height of individual firing events also contributed to temporal sparseness (Fig. 6*a*, bottom). 
We quantified this response property by the 99th percentile of each cell's firing-rate distribution and found that STC models exhibited higher peak firing rates than STA models (STA vs STC: derivative-like 238 ± 65 Hz vs 329 ± 140 Hz, leading-suppressive 102 ± 49 Hz vs 160 ± 92 Hz; STA vs STC model *p* < 6.7 · 10^{−3}, sign rank).

In accord with their higher transience and peak firing rates, STC models of both classes exhibited significantly higher temporal sparseness than the respective STA models (Fig. 5*c*). This effect was most prominent for the leading-suppressive cells. This suggests that the higher temporal sparseness of STC models was due to an amplification and shortening of firing events.

To determine how well the models could reproduce the temporal sparseness of the auditory system of grasshoppers for natural signals, we compared the values obtained from the model responses to those measured empirically in a previous study (Clemens et al., 2011). As the class of derivative-like cells was formed by local neurons, we compared the sparseness in these models to that found in local neurons. The sparseness values of leading-suppressive models, being dominated by ascending neurons, were compared with those of ascending neurons. STA and STC models of derivative-like cells reproduced well the range of temporal sparseness values found in recorded responses to natural songs (Fig. 5*c*; STA 0.44 ± 0.17, STC 0.54 ± 0.19, data 0.52 ± 0.19). In contrast, temporal sparseness of leading-suppressive cells depended on model type. While STA models underestimated the temporal sparseness found in data, the STC models exhibited no significant difference in temporal sparseness when compared with experimental data (Fig. 5*c*, right; STA 0.34 ± 0.18, STC 0.63 ± 0.23, data 0.57 ± 0.12). Note that the STC models tended to exhibit slightly higher temporal sparseness than observed in the data.

Which properties of the STC models of both classes contributed to temporal sparseness? The STC filter of derivative-like cells endows these cells with a sensitivity to the derivative of the stimulus and presumably accentuates responses to onsets. Distorting the derivative-like STC filter by scaling down either its positive or negative lobe prolonged firing events (Fig. 6*b*, gray arrowheads, *c*, top) and amplified small firing events (Fig. 6*b*, black arrowheads, *c*, middle). Models with distorted filters thus exhibited longer firing events, less transient responses, and hence reduced temporal sparseness (Fig. 6*c*, bottom). In contrast, changing the delay between the STA and the STC filter had a negligible impact on temporal sparseness (Fig. 7*a–c*). This indicates that the differentiating shape of the STC filter in conjunction with the AND-like integration of both filters led to the increase in temporal sparseness in derivative-like cells.

For the leading-suppressive cells, it was the AND-NOT like interaction of the STA and the STC filter that increased the transience of responses. The fact that the STC filter led the STA filter and suppressed firing made these cells respond strongly only to onsets (Fig. 6*d*). The “inhibitory” STC filter responded tonically to stimuli with constant amplitude due to its integrating shape (Figs. 3*b*, bottom for the shape of STC filters, 6*d*, middle). This tonic suppression effectively reduced strong persistent activity, increased the height of firing events (Fig. 6*d*, bottom) and thereby enhanced temporal sparseness in this class of cells (Fig. 6*d*, inset). The delay between both filters also influenced temporal sparseness (Fig. 7*g*). While making the delay in the model more negative increased temporal sparseness by 10–15%, making the delay less negative tended to decrease sparseness.

### Population sparseness

We calculated population sparseness by constructing four-cell populations of random combinations of models belonging to the same class. Population sparseness is high if different cells in a population respond with different patterns to the same signal, e.g., by being selective for different stimulus features.
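The procedure of drawing random four-cell populations can be sketched as follows. The sparseness index used here is a common Treves-Rolls-type measure (0 for uniform activity, 1 when all activity sits in a single entry); it is an assumption of this sketch and may differ from the measure defined in the Methods. The function names, the number of draws, and the toy firing-rate traces are hypothetical.

```python
import random

def sparseness(values):
    """Treves-Rolls-type index for non-negative values (needs len(values) >= 2)."""
    n = len(values)
    mean_sq = sum(v * v for v in values) / n
    if mean_sq == 0.0:
        return 0.0  # convention chosen here for all-silent input
    mean = sum(values) / n
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)

def population_sparseness(rates):
    """rates: one firing-rate trace per cell, all of equal length.
    The index is computed across cells in each time bin and averaged
    over the bins in which at least one cell is active."""
    per_bin = [sparseness(b) for b in zip(*rates) if any(b)]
    return sum(per_bin) / len(per_bin) if per_bin else 0.0

def mean_sparseness_of_random_populations(model_rates, size=4, draws=200, seed=0):
    """Average population sparseness over random `size`-cell populations."""
    rng = random.Random(seed)
    scores = [population_sparseness(rng.sample(model_rates, size))
              for _ in range(draws)]
    return sum(scores) / draws

# Toy data: six identical cells versus six cells that each fire in their own bin.
uniform_cells = [[1.0, 2.0, 0.0, 3.0, 1.0, 2.0] for _ in range(6)]
diverse_cells = [[1.0 if t == i else 0.0 for t in range(6)] for i in range(6)]
print(mean_sparseness_of_random_populations(uniform_cells))
print(mean_sparseness_of_random_populations(diverse_cells))
```

Applying `sparseness` to a single cell's trace across time, rather than across cells within a bin, gives the analogous temporal index under the same assumption.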

The model responses reveal a fundamental difference in the impact of the STC filter on response diversity (Fig. 5*a*,*b*). Derivative-like cells (Fig. 5*a*) exhibited relatively uniform responses with high firing rates during onsets. The STC filter did little to add to response diversity. This comes as no surprise as both filters were relatively uniform in members of this class (Figs. 3*a*, 4*e*,*f*). In contrast, while STA models of leading-suppressive cells also responded relatively uniformly to song, different STC models of this class exhibited a greater diversity of response patterns to the same stimulus (Fig. 5*b*), comparable to that found in actual recordings of ascending neurons (compare Fig. 1).

Consistent with the similarity of the STA filters, populations of STA models of both classes displayed comparable levels of population sparseness (Fig. 5*d*; derivative-like STA 0.28 ± 0.06, leading-suppressive STA 0.26 ± 0.06). However, while the second filter increased population sparseness only marginally in derivative-like cells, leading-suppressive cells profited greatly from the inclusion of the second filter (STA vs STC: derivative-like 0.28 ± 0.06 vs 0.32 ± 0.06, leading-suppressive 0.26 ± 0.06 vs 0.42 ± 0.08). While STA models of both classes and STC models of the derivative-like cells exhibited population sparseness comparable to that obtained empirically for local neurons (0.35 ± 0.05), only the 2D STC models of leading-suppressive cells approached the high values reported previously for the output of the network (0.47 ± 0.03).

Why were only leading-suppressive cells but not derivative-like cells able to significantly increase population sparseness? We have shown above that leading-suppressive cells exhibited a wide range of delays between the STA and STC filter, while those of the derivative-like cells had relatively uniform delays (Fig. 4*f*). If the small range of delays limits response diversity of derivative-like cells, then artificially increasing this range should increase response diversity. Interestingly, this did not truly increase response diversity but only shifted a uniform firing event in time (Fig. 7*a*,*b*,*d*). We thus hypothesized that the derivative-like shape of the filter is the factor limiting response diversity. Indeed, distorting the shape of the STC filter by reducing its positive or negative lobe appeared to increase response diversity while reducing temporal sparseness (Fig. 6*b*). Hence, a derivative-like filter in combination with a small range of delays seems to be ineffective in increasing population sparseness.

In contrast to the small impact of the delay between the STA and the STC filter on response diversity in derivative-like cells, the delay had a strong impact on response diversity in leading-suppressive cells. Systematically varying the delay between the STA and the STC filter altered the response patterns for these types of cells (Fig. 7*e*,*f*). The response to the onset of the stimulus became longer with increasingly negative delays (Fig. 7*f*, gray arrows). Positive delays fully abolished any onset response. The firing-rate modulations after the onset also changed with delay (Fig. 7*f*, black arrows). Hence, the range of delays between the STA and the STC filter decorrelated responses and increased population sparseness in leading-suppressive cells.
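The dependence of the onset response on the delay can be reproduced qualitatively in a toy AND-NOT model. Everything in the sketch below is an assumption made for illustration: the Gaussian filter shapes and widths, the suppression gain, the rectified-product nonlinearity, the step stimulus, and the sign convention that a negative delay places the suppressive filter at a longer lag (further back in the stimulus) than the excitatory one. The names `gaussian_kernel`, `and_not_rate`, and `onset_duration` are hypothetical.

```python
import math

def gaussian_kernel(peak_lag, width, length):
    """Unit-area Gaussian bump over lags 0..length-1 (lag = bins into the past)."""
    raw = [math.exp(-0.5 * ((k - peak_lag) / width) ** 2) for k in range(length)]
    total = sum(raw)
    return [v / total for v in raw]

def causal_filter(stimulus, kernel):
    """out[t] = sum_k kernel[k] * stimulus[t - k]."""
    return [sum(w * stimulus[t - k] for k, w in enumerate(kernel) if t - k >= 0)
            for t in range(len(stimulus))]

def and_not_rate(stimulus, delay, sta_lag=20, gain=2.0, length=80):
    """Fast excitation gated by slow suppression.
    delay < 0: the suppressive filter leads, i.e. peaks at lag sta_lag - delay."""
    excite = causal_filter(stimulus, gaussian_kernel(sta_lag, 2.0, length))
    suppress = causal_filter(stimulus, gaussian_kernel(sta_lag - delay, 6.0, length))
    # Assumed AND-NOT-like rule: fire if excited AND NOT (yet) suppressed.
    return [max(e, 0.0) * max(1.0 - gain * s, 0.0)
            for e, s in zip(excite, suppress)]

def onset_duration(delay, threshold=0.05):
    """Bins with supra-threshold rate for a step that stays on until the end."""
    step = [0.0] * 20 + [1.0] * 180
    return sum(r > threshold for r in and_not_rate(step, delay))

durations = {delay: onset_duration(delay) for delay in (10, -5, -20, -40)}
for delay, bins in durations.items():
    print(f"delay {delay:+4d} bins -> onset response lasts {bins} bins")
```

In this sketch the onset response grows longer as the delay becomes more negative and vanishes for the positive delay, because suppression then arrives before excitation. Cells with different delays therefore respond with different patterns to the same stimulus.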

## Discussion

Employing the framework of LN models, we found two classes of cells in the auditory system of grasshoppers. While the STA filter was similar for both classes, cells differed in their second filter: models with a derivative-like STC filter and an AND-like nonlinearity were found at the level of local neurons and models with a leading-suppressive STC filter and an AND-NOT like nonlinearity were found mainly among ascending neurons (Figs. 3, 4).
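As a compact reading aid, the two model classes can be written in an idealized form. The symbols ($s$ for the stimulus, $h_{\mathrm{STA}}$ and $h_{\mathrm{STC}}$ for the filters, $x_1$ and $x_2$ for their outputs, $\theta$ for a threshold) and the product form of the nonlinearities are assumptions made for illustration; the estimated 2D nonlinearities need not factorize in this way.

$$
x_1(t) = \int h_{\mathrm{STA}}(\tau)\, s(t-\tau)\, d\tau, \qquad
x_2(t) = \int h_{\mathrm{STC}}(\tau)\, s(t-\tau)\, d\tau, \qquad
r(t) = F\big(x_1(t), x_2(t)\big)
$$

$$
F_{\mathrm{AND}}(x_1, x_2) \propto [x_1]_+ \, [x_2]_+, \qquad
F_{\mathrm{AND\text{-}NOT}}(x_1, x_2) \propto [x_1]_+ \, [\theta - x_2]_+, \qquad
[u]_+ = \max(u, 0)
$$

In this reading, an AND-like cell fires only when both filter outputs are large, whereas an AND-NOT-like cell fires only when the STA output is large and the STC output is small.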

Our simulations have shown that only 2D models produce the degree of temporal and population sparseness found in the auditory system of the grasshopper (Fig. 5). While both derivative-like and leading-suppressive cells increased temporal sparseness, though to different degrees, only the latter class of cells substantially increased population sparseness.

In the following, we discuss how the structure of both classes of models increases sparseness. In addition, we will use prior knowledge about the grasshopper to speculate on likely biophysical mechanisms that could implement these computations.

Note that not all cells in the early auditory system of grasshoppers are included in our dataset. Other cells in the network probably perform different computations, adding to the response diversity (Römer and Marquart, 1984; Stumpner and Ronacher, 1991); e.g., the ascending neuron AN4 receives fast inhibitory inputs, which leads to a suppression of responses at onsets (Fig. 7*f*). Another ascending neuron not included in the dataset is AN14, which fires only during silent parts of a stimulus.

### Temporal sparseness

Temporal sparseness increases if transient firing is accentuated and persistent firing is attenuated, leading to responses with short firing events interleaved by long silent epochs (Figs. 5*a*,*b*, 6*a*). The two model classes achieve this transformation by two different computations. The derivative-like STC filter of most local neurons leads to a differentiation of the stimulus (Figs. 6*b*, 7*b*). The leading-suppressive STC filter of the ascending neurons quenches prolonged responses (Fig. 6*d*).

Both operations can be subsumed under the phenomenon of spike-frequency adaptation, the decrease of neuronal firing in response to prolonged stimulation (Benda et al., 2001; Lundstrom et al., 2008; Tripp and Eliasmith, 2010). The relation between spike-frequency adaptation and temporal sparseness has been reported previously (Farkhooi et al., 2009; Houghton, 2009; Nawrot, 2012). Adaptation has been described in terms of differentiation (Lundstrom et al., 2008) or as a highpass filter (Benda et al., 2001), transformations that reduce temporal correlations and thereby increase temporal sparseness (Wang et al., 2003; Tripp and Eliasmith, 2010). The STC filter in derivative-like cells thus likely implements adaptation by determining the derivative of the stimulus (Figs. 3*a*, 6*b*). Derivative-like, 2D models as described in our study have been found in many sensory systems (Brenner et al., 2000; Slee et al., 2005; Fairhall et al., 2006; Atencio et al., 2008; Fox et al., 2010; Kim et al., 2011; Sharpee et al., 2011), suggesting that this model structure instantiates—in addition to its contribution to temporal sparseness—beneficial properties, like adaptation to stimulus statistics and robust and efficient encoding of time-varying stimuli (Fairhall et al., 2001; Sharpee et al., 2011).

Adaptation can be implemented by cell-intrinsic mechanisms via adaptation currents (Wang et al., 2003) or in a network via synaptic depression, feedback inhibition (Papadopoulou et al., 2011), and slow feedforward inhibition (Assisi et al., 2007; Creutzig et al., 2009). In our system, the derivative-like STC filter is likely to be implemented by cell-intrinsic adaptation currents, especially for those cells (TN1, SN1), which receive only excitatory inputs from receptors (Römer and Marquart, 1984). In two cells (BSN1t and SN3), the derivative-like filter could be shaped by additional inhibitory inputs.

Note that a derivative-like STC filter can also be the result of imprecise spike timing (Dimitrov et al., 2006). However, this is more likely if such jitter is greater than the time scale of the filter (Fairhall et al., 2006; Sharpee et al., 2011). We calculated spike-time jitter as the SD of the timing of individual firing events across trials as in Desbordes et al. (2008). The jitter was always much smaller than the filter width, roughly 10 times smaller for derivative-like cells and 5 times smaller for leading-suppressive cells (jitter: derivative-like 0.6 ± 0.4 ms, leading-suppressive 1.3 ± 0.5 ms; filter width: derivative-like 5.3 ± 0.9 ms, leading-suppressive 6.8 ± 1.7 ms) (Figs. 2*e*, 3). This renders jitter an unlikely source of the derivative-like filter.
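The jitter comparison can be sketched in a few lines. How firing events are detected and matched across trials is not specified in this section, so the input format below (one time per event per trial, with events already aligned by index) is an assumption, as are the function name `event_jitter` and the toy spike times. The second part only restates the mean values reported above.

```python
import statistics

def event_jitter(event_times_by_trial):
    """Mean over events of the across-trial SD of event timing (ms).
    event_times_by_trial: list of trials; entry i of every trial is
    assumed to refer to the same firing event."""
    per_event_sd = [statistics.stdev(times)
                    for times in zip(*event_times_by_trial)]
    return statistics.mean(per_event_sd)

# Toy example: two firing events observed in three trials (times in ms).
trials = [[10.0, 50.0], [11.0, 50.5], [12.0, 51.0]]
print("toy jitter (ms):", event_jitter(trials))

# Reported class means (jitter in ms, filter width in ms).
reported = {
    "derivative-like": (0.6, 5.3),
    "leading-suppressive": (1.3, 6.8),
}
ratios = {name: width / jitter for name, (jitter, width) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: filter width is about {ratio:.1f} times the jitter")
```

The ratios of the class means come out at about 8.8 and 5.2. Per-cell ratios averaged across cells need not coincide with these values, which may account for the rounder factors quoted in the text.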

Although the filters of derivative-like cells are relatively similar across cells, the temporal sparseness values cover a relatively broad range (Fig. 5*c*). This reflects a great diversity in the fine structure of the nonlinearity, which enables different degrees of selectivity and hence temporal sparseness. While all nonlinearities implemented an AND-like integration of the STA and STC filter (Fig. 4*h*), the thresholds of individual cells of this class ranged between 45 and 65 dB. This great diversity of thresholds increases the dynamic range of the network and justifies the existence of many different types of local neurons with highly similar filters (Fig. 3*a*).

In the leading-suppressive cells, temporal sparseness is increased by shutting off persistent responses via slow suppression. The two properties of the leading-suppressive models contributing to this transformation are the delay between the STA and the STC filter (Figs. 4*e*, 7*e–f*) and the AND-NOT like nonlinearity (Fig. 4*h*), which are possibly implemented in a network: the STA filter corresponds to excitatory inputs and drives the cell; as the STC filter leads the STA and is suppressive, the cell will only fire strongly if the stimulus preceding the STA is relatively soft. Such an implementation is highly likely for the phasic BSN1, AN1, and AN3, for which strong, slow inhibitory inputs have been shown in dendritic recordings (Römer and Marquart, 1984; Hildebrandt et al., 2009). For another leading-suppressive cell type in our dataset—AN2—a strong afterhyperpolarization has been shown to underlie adaptation (Hildebrandt et al., 2009); yet, this cell also receives contralateral inhibition that can be slower than the excitation. The STC filter of this cell type is thus likely to be a combination of both cell-intrinsic adaptive currents and inhibitory inputs.

That the most likely mechanisms contributing to temporal sparseness are adaptation currents in derivative-like cells and inhibitory and excitatory inputs in leading-suppressive cells suggests a change in the factors dominating response properties of neurons in subsequent stages. The derivative-like cells were local neurons, which primarily pool the responses of receptors and probably gain their sensitivity to the derivative via a cell-intrinsic mechanism. In contrast, most leading-suppressive cells were ascending neurons; their computations are most likely shaped by the connectivity with local neurons and thus by network properties.

### Population sparseness

Our results show that differentiation of the stimulus and slow inhibition increase temporal sparseness by reducing persistent firing (Figs. 5, 6). This property in itself does not necessarily lead to population sparseness. For population sparseness to be high, cells in a population need to exhibit little tendency to fire together by being selective for different features of a stimulus.

The ability of derivative-like filters to increase population sparseness was relatively small (Fig. 5*d*). As the STA filters were similar and the STC filter was heavily constrained by the shape of the STA filter—on average 83% of the STC filter's shape of each cell was explained by the STA filter—this second filter added little response diversity across cells (Figs. 3*a*, 4*e*). Accordingly, the derivative-like cells exhibited very similar feature selectivity and responded uniformly to a stimulus.

In contrast, the STC filter of leading-suppressive cells strongly increased population sparseness—up to the values observed in the auditory system of the grasshopper (Fig. 5*d*). A highly diverse feature selectivity in these cells is established through a large range of delays between the excitatory STA filter and the suppressive STC filter (Fig. 4*f*). As argued above, the model structure of leading-suppressive cells is probably generated by slow feedforward inhibition (Luo et al., 2010). The role of excitation and inhibition in shaping temporal filters and in decorrelating responses between cells in a population has been appreciated previously (Schmuker and Schneider, 2007; Wiechert et al., 2010; George et al., 2011). In addition to the filters of leading-suppressive cells being diverse, the AND-NOT like joint nonlinearity equips these models with a highly nonlinear operation to select a small set of stimuli suitable for firing (Fig. 4*d*,*h*). This narrows the tuning of leading-suppressive cells and reduces the overlap between responses of different cells of this type. The AND-NOT like computation also leads to a “delayed anti-coincidence detection”—the cells fire strongly only if the stimulus at different delays is not loud (Borst et al., 2005). This can yield a combinatorial and synergistic code (Osborne et al., 2008; Schneidman et al., 2011).

### Conclusion

It is not diversity and changes in the linear STA filter but the nonlinear integration of two filters that governs the transformation of the code from a dense and uniform one to a temporally and population-sparse one in the grasshopper (Pitkow and Meister, 2012). The shape of the second filter equips neurons with transformations that decorrelate responses in time and across cells in a population. Additionally, a 2D nonlinearity allows neurons to specifically select a small subset of the feature space spanned by the STA and the STC filter. Mechanisms implementing these abstract computations are ubiquitous in many nervous systems; the transformations found in the grasshopper are thus likely to constitute general principles underlying the transformation of neural representations.

## Footnotes

This work was supported by grants from the Federal Ministry of Education and Research, Germany (01GQ1001A) and the Deutsche Forschungsgemeinschaft (SFB618, GK1589/1). We thank Susanne Schreiber for valuable comments on the manuscript and Henning Sprekeler for discussions. We also gratefully acknowledge the careful reading and constructive criticism of the manuscript by two anonymous reviewers.

The authors declare no competing financial interests.

- Correspondence should be addressed to Jan Clemens, Behavioral Physiology Group, Department of Biology, Humboldt-Universität zu Berlin, Invalidenstrasse 43, 10115 Berlin, Germany. clemensjan{at}googlemail.com