## Abstract

Motion estimation is crucial for aerial animals such as the fly, which perform fast and complex maneuvers while flying through a 3-D environment. Motion-sensitive neurons in the lobula plate, a part of the visual brain, of the fly have been studied extensively for their specialized role in motion encoding. However, the visual stimuli used in such studies are typically highly simplified, often move in restricted ways, and do not represent the complexities of optic flow generated during actual flight. Here, we use combined rotations about different axes to study how H1, a wide-field motion-sensitive neuron, encodes preferred yaw motion in the presence of stimuli not aligned with its preferred direction. Our approach is an extension of “white noise” methods, providing a framework that is readily adaptable to quantitative studies into the coding of mixed dynamic stimuli in other systems. We find that the presence of a roll or pitch (“distractor”) stimulus reduces information transmitted by H1 about yaw, with the amount of this reduction depending on the variance of the distractor. Spike generation is influenced by features of both yaw and the distractor, where the degree of influence is determined by their relative strengths. Certain distractor features may induce bidirectional responses, which are indicative of an imbalance between global excitation and inhibition resulting from complex optic flow. Further, the response is shaped by the dynamics of the combined stimulus. Our results provide intuition for plausible strategies involved in efficient coding of preferred motion from complex stimuli having multiple motion components.

## Introduction

Flies often perform complex maneuvers while navigating a 3-D environment (Wagner, 1986; Fry et al., 2003), which results in a wide range of angular velocities. Behavioral studies (Collett and Land, 1975; Budick et al., 2007) have shown that such maneuvers also lead to complex optic flow with rotational components mixed in a way that is hard to disambiguate. Wide-field motion-sensitive neurons in the lobula plate region of the fly's brain have been known to respond selectively to motion along different directions (Hausen, 1981, 1982a,b; Hengstenberg et al., 1982). Anatomical studies have identified ∼60 different motion-sensitive neurons in the lobula plate (Hausen, 1981, 1982a; Hengstenberg et al., 1982; Borst and Haag, 2002), from which only a small subset of neurons, namely H1, V1, HS, and VS, have been extensively studied, owing to their ease of detection and robustness of response.

Much of the classic work on visual motion detection in insects and vertebrates employed simple 1-D spatiotemporal stimuli such as bar, grating, and sinusoidal patterns moving along a particular direction, usually along the preferred direction of the neuron (Hassenstein and Reichardt, 1956; Hausen, 1982b; van Santen and Sperling, 1984). Despite their practical advantages, these simplified stimuli fail to capture essential elements of the complex image flow generated during flight, such as rotations about the three principal body axes, and translation. These motion elements have been demonstrated in the flights of not only blowflies, but also of bees and wasps (Srinivasan et al., 1991; Hateren and Schilstra, 1999; Schilstra and van Hateren, 1999; Zeil et al., 2008). The receptive field structures of lobula plate tangential cells (LPTCs) further suggest sensitivity to movement along other directions besides the preferred direction (Krapp et al., 2001). It is thus conceivable that motion encoding is affected by the presence of additional rotational degrees of freedom. The geometry of spatial patterns (Borst et al., 1993; Saleem et al., 2012) and velocity components from other rotation types (Kern et al., 2005, 2006) also affect the response of motion-sensitive neurons. However, a quantitative study of preferred motion encoding from a complex multistimulus input, in which a stimulus component is not aligned with the preferred direction of the neuron, has so far been lacking.

Here, we investigate the effect of additional wide-field components on the encoding of yaw by H1, a motion-sensitive neuron in the lobula plate of the fly. The stimulus consists of a 2-D pattern subjected to combined rotation about two axes. Combining techniques of information theory and dimensional reduction, we quantify the reduction in H1's yaw encoding efficiency in the presence of a distractor and show that H1 also encodes certain distractor features; the latter depends on the global excitation and inhibition from the optic flow. We further show that the neural response matches the collective dynamic range of the input, which indicates that response scaling is preserved for these multiple motion components.

## Materials and Methods

##### Fly preparation.

Wild-caught blowflies, *Calliphora vicina*, were housed in an enclosed cabinet that was maintained at a temperature of 21°C and a relative humidity of 55%, and was illuminated with a 12 h light/dark cycle. Experiments were conducted primarily in adult female flies, and consistent results were obtained from six flies. We also checked for consistency of results against male blowflies. The fly was placed in a cylindrical tube, its wings and legs restrained using dental wax. Its head was immobilized by wax bridges from the genae (“cheeks”) to the rim of the tube, leaving the proboscis free so that the fly could be fed in between experiments. The spiracles were kept free to ensure proper respiratory intake during the experiment. After opening the rear of the head capsule, dorsolateral muscles and air sacs covering the lobula plate were removed. The fly holder was then transferred to a goniometer platform near the display screen. Room temperature was maintained between 19°C and 20°C and was monitored throughout the experiment.

##### Stimuli.

A spatial pattern was created by generating a 2-D random matrix on a square 0.1° × 0.1° grid, filtering these data with a 2-D spatial Gaussian of SD σ = 6°, and thresholding the resulting numbers around their mean to produce a set of binary intensity values. Subsequently, the image was linearly filtered to prevent spatial aliasing during stimulus presentation. The temporal stimuli consist of velocity waveforms corresponding to yaw, pitch, and roll, each of which is drawn from independent Gaussian distributions. Because H1 is primarily sensitive to horizontal motion, roll and pitch motion will generically be referred to as “distractor” motion. A wide range of SD values for roll, σ_{R} = {100, 300, 1000, 3000, 5000, 7000, 10000}°/s, and pitch, σ_{P} = {10, 30, 100, 300, 500, 700, 1000}°/s, were chosen. The SD of yaw motion (σ_{Y}) was kept unchanged at 100°/s for all yaw–distractor SD pairs. The relative strength of roll and pitch to yaw is defined as the ratio of their SDs (i.e., ξ_{R} = σ_{R}/σ_{Y} and ξ_{P} = σ_{P}/σ_{Y}, respectively). For each value of ξ_{R} and ξ_{P}, six different motion variants were presented sequentially in a trial (Fig. 1*C*), with each segment lasting 3 s. Segments 1, 2, and 6 consist of repeated yaw with nonrepeated distractor motion, while segments 3, 4, and 5 consist of yaw and distractor motions both of which were nonrepeated across trials. A total of 210 trials were used for each experiment.

We choose different ranges for ξ_{R} and ξ_{P} because roll motion, which is generated about a point of origin, introduces local velocity components that are different from those generated by pitch motion, which acts in the vertical direction. This serves two purposes. First, we obtain a reasonably wide range of distractor noise over which we can test the efficiency of H1. With the given size of the visual field (Fig. 1*B*), the tangential component induced by a 10,000°/s roll has a velocity of ∼3000°/s at the edge of the field. Tangential components at points closer to the center are smaller. Thus, the tangential velocity components of roll can effectively reach up to an order of 10 times the yaw velocity (σ_{Y} = 100°/s). Since the maximum value of σ_{P} is 1000°/s, pitch velocities also can reach up to an order of 10 times the yaw velocity. Similarly, velocity components of roll and pitch at the lower limits of σ_{R} and σ_{P} are ∼10 times lower than that for yaw. Second, the velocities lie in the neighborhood of those generated during actual flight, though one must recognize the tradeoff between choosing a wide range of parameter values and the ecological relevance of the values. Behavioral studies suggest that the angular velocity of the head of the fly reaches up to ∼2500°/s for yaw, ∼1200°/s for roll, and <1000°/s for pitch during saccades. The velocities during nonsaccades are typically ≤100°/s (Wagner, 1986; Hateren and Schilstra, 1999; Kern et al., 2005). Therefore, the velocities induced by roll and pitch used in our experiments approximately overlap with the range of angular velocities accessed during saccadic flights. Since the retinal image speed is on the order of 100°/s for nonsaccadic flights, which account for ∼60% of the total flight time (Hateren and Schilstra, 1999), we choose a value of σ_{Y} that is of the same order. This allows us to study yaw encoding in the presence of fluctuations ranging from nonsaccadic to saccadic limits.
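The ∼3000°/s figure quoted for roll can be checked with a short calculation. The sketch below is only illustrative: it assumes that the roll axis passes through the screen center, that the image speed at angular eccentricity θ from that axis is ω·sin θ, and that the half-extents of the 38° × 44° field (19° and 22°) are the relevant edge eccentricities. The function name is hypothetical.

```python
import math


def roll_tangential_speed(omega_deg_s, eccentricity_deg):
    """Approximate image speed (deg/s) at a given angular distance from the roll axis.

    Assumes pure rotation about the line of sight, so the speed scales with
    the sine of the eccentricity (an assumption of this sketch).
    """
    return omega_deg_s * math.sin(math.radians(eccentricity_deg))


# Half-extents of the 38 deg x 44 deg display, used as edge eccentricities (assumption).
horizontal_edge = roll_tangential_speed(10_000, 19.0)
vertical_edge = roll_tangential_speed(10_000, 22.0)
print(f"horizontal edge: {horizontal_edge:.0f} deg/s, vertical edge: {vertical_edge:.0f} deg/s")
```

Under these assumptions both edge values come out at a few thousand degrees per second, the same order as the value stated in the text.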

Euler rotations in a 3-D coordinate system do not commute. In simple terms, this means that the collective rotation of two successive rotations about two different axes depends on the order in which the rotations are performed. However, in our stimuli, yaw, roll, and pitch rotations are small from one frame to the next, such that the rotations commute to a good approximation. Therefore, the order in which rotations are combined does not matter. Also, the rotations are displayed on a 2-D screen, such that in a flattened approximation, homogeneous movements along the horizontal and vertical axes of the screen are equivalent to yaw and pitch motion, respectively. Roll motion, resulting from rotation about the longitudinal axis, is always produced about the center of the screen.
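The small-angle argument can be illustrated numerically. The sketch below is not the stimulus code; it builds standard rotation matrices and measures how far the two orderings differ. The axis assignment (yaw about z, roll about x) and the example per-frame angles (0.2° and 2°, corresponding to 100°/s and 1000°/s at 500 Hz) are assumptions chosen for illustration.

```python
import numpy as np


def rot_x(angle_rad):
    """Rotation about the x axis (taken here as the roll axis)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_z(angle_rad):
    """Rotation about the z axis (taken here as the yaw axis)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def order_mismatch(yaw_deg, roll_deg):
    """Frobenius norm of (yaw-then-roll minus roll-then-yaw); zero if they commute."""
    a = rot_z(np.radians(yaw_deg))
    b = rot_x(np.radians(roll_deg))
    return float(np.linalg.norm(a @ b - b @ a))


small = order_mismatch(0.2, 2.0)    # per-frame angles at 500 Hz (illustrative values)
large = order_mismatch(90.0, 90.0)  # large rotations clearly do not commute
print(small, large)
```

The mismatch scales with the product of the two angles, so it is the per-frame step sizes, not the velocities themselves, that determine how good the approximation is.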

The stimulus was presented on a Tektronix 608 monitor at a 500 Hz frame rate and a radiance of 150 mW · sr^{−1} · m^{−2} (sr: steradian), which corresponds to an estimated 5 × 10^{4} transduced photons per second per photoreceptor. The spatial pattern was displayed on a hexagonal raster of 833 pixels with 38° horizontal and 44° vertical extents (Fig. 1*B*), which subtends ∼8% of the solid angle accessed by the eye of the fly (Krapp and Hengstenberg, 1997; Lewen et al., 2001). Because the pixels were arranged in a hexagonal array, the intensity of each pixel was calculated as the linear weighted average of intensities of four nearest-neighbor pixels. The fly was positioned at 8.4 cm from the screen, such that the projected interpixel separation approximately matches the interommatidial distance of 1.5° in the blowfly (Land, 1997). The match is, however, approximate because pixels were spatially blurred to prevent aliasing. This makes the experiment insensitive to the misalignment between the raster of the eye of the fly and the pixel-sampling raster. The fly watched the center of the display screen at an azimuth angle of 30°. The stimulus files were precomputed using MathWorks Matlab version 7.4. A custom-written C program was used to control a National Instruments PCI-6259 data acquisition card that processed the conversion of stimulus data to the analog voltage displayed on the screen.

##### Electrophysiology.

Tungsten microelectrodes (1 MΩ; tip diameter, ∼5 μm; FHC) were used to make differential single-unit recordings from H1 in the contralateral lobula plate in the right half of the head. H1 was identified by its characteristic excitatory response to horizontal back-to-front motion and its suppressive response to horizontal front-to-back motion. Control experiments using bar patterns of a 7° wavelength were conducted to adjust elevation and twist, such that the response of H1 is the maximum for horizontal motion and the minimum for vertical motion. Care was taken to achieve distinct isolated spikes with a high signal-to-noise ratio of at least 6. The voltage difference between the recording and reference electrodes was filtered, amplified, and thresholded by a window discriminator to generate discrete pulses. The pulses were time stamped with precision at 10 μs intervals by the National Instruments data acquisition card that generates the analog output. This guaranteed that the displayed stimuli and recorded spikes were synchronized to the same board clock. During the course of the experiment, the drift in spike amplitude was <30% and the amplitude remained well above the spike-discriminating threshold set before the start of the experiment.

##### Spike train information.

Noise, both internal and external to the system (de Ruyter van Steveninck and Bialek, 1995; Manwani and Koch, 1999; White et al., 2000), limits the information transmitted by a spike train. The presence of a distractor effectively adds another “noise” component, which further limits the transmitted information. We quantify this by estimating the information transmitted about a repeated yaw motion in the presence of a nonrepeated distractor motion across multiple trials (Fig. 1*C*,*D*, segments 1,2; de Ruyter van Steveninck et al., 1997). To remove the effect of initial transients, we discard data from the first 10 trials of each experiment.

To estimate entropy of the spike train, spikes and spike-absent events within a time bin Δ*t* = 1 ms were assigned values 1 and 0, respectively. This guaranteed a maximum of one spike in each time bin. A distribution of binary values or “words,” *P*(*W*), was obtained from the sequence of 1s and 0s lying within each time window, *T* ≥ Δ*t* (de Ruyter van Steveninck et al., 1997). The entropy rate for infinite word length, or *T* → ∞, was obtained from linear extrapolation of the flattened part of the *R* versus 1/*T* curve. For a time window *T*, the total entropy rate of the spike train is obtained from *P*(*W*) as follows:

$$R_{\mathrm{Tot}}(T) = -\frac{1}{T}\sum_{W} P(W)\,\log_2 P(W). \tag{1}$$

The noise entropy rate is obtained by averaging the distribution of words obtained at each time instance *P*(*W*|*t*), over the time instances in a trial, as follows:

$$R_{\mathrm{Noise}}(T) = -\frac{1}{T}\left\langle \sum_{W} P(W|t)\,\log_2 P(W|t) \right\rangle_{t}. \tag{2}$$

The rate of yaw information transmitted by the spike train in the presence of the distractor is given by the difference between the total entropy rate and the average noise entropy rate, as follows:

$$R_{\mathrm{Info}}(T) = R_{\mathrm{Tot}}(T) - R_{\mathrm{Noise}}(T). \tag{3}$$

Information per spike can be obtained either by setting *T* = Δ*t* and dividing by 〈*r*〉 in Equation 3, or by using the time-varying firing rate *r*(*t*) (Brenner et al., 2000b), as follows:

$$I(\mathrm{spk}) = \frac{1}{T_{\mathrm{trial}}}\int_{0}^{T_{\mathrm{trial}}} dt\,\frac{r(t)}{\langle r\rangle}\,\log_2\frac{r(t)}{\langle r\rangle}. \tag{4}$$

Here, *T*_{trial} is the time period of repeated stimulus. To estimate the information carried by the spike train about repeated distractor motion in the presence of nonrepeated yaw, we use the same method of analysis as above, but with data from the sixth segment of the trial (Fig. 1*C*,*D*).
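A minimal sketch of the word-counting estimate described above may help make the procedure concrete. It is not the analysis code used in the study: the function names are hypothetical, words are taken from sliding windows at every start bin, and no extrapolation or bias correction is applied. The input is assumed to be a binary array of shape (trials, time bins) recorded under a repeated stimulus.

```python
from collections import Counter

import numpy as np


def _entropy_bits(words):
    """Plug-in entropy (bits) of a list of hashable words."""
    counts = np.fromiter(Counter(words).values(), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def entropy_rates(spikes, word_len, dt=1e-3):
    """Return (total, noise, information) entropy rates in bits/s.

    spikes   : (n_trials, n_bins) array of 0/1 values, one row per repeat
    word_len : number of bins per word, so T = word_len * dt
    """
    spikes = np.ascontiguousarray(spikes, dtype=np.uint8)
    n_trials, n_bins = spikes.shape
    window = word_len * dt
    pooled, noise_entropies = [], []
    for t in range(n_bins - word_len + 1):
        # Words observed at start time t, one per trial.
        words_t = [spikes[k, t:t + word_len].tobytes() for k in range(n_trials)]
        pooled.extend(words_t)                          # feeds P(W)
        noise_entropies.append(_entropy_bits(words_t))  # entropy of P(W|t)
    r_tot = _entropy_bits(pooled) / window
    r_noise = float(np.mean(noise_entropies)) / window
    return r_tot, r_noise, r_tot - r_noise
```

For a perfectly reproducible response, every trial yields the same word at each time, so the noise entropy vanishes and all of the total entropy counts as information.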

##### Error estimation.

It is well known that direct estimates of the entropy rate (*R*_{Tot}) and the noise entropy rate (*R*_{Noise}) are biased due to undersampling, and that this bias becomes more prominent at larger values of *T* (Treves and Panzeri, 1995; Panzeri et al., 2007). With the assumption that spike correlations are finite, the true entropy rate can be estimated by taking the limiting conditions that data size → ∞ and *T* → ∞ (Treves and Panzeri, 1995; Strong et al., 1998), as follows:

$$R = \lim_{T\to\infty}\;\lim_{\mathrm{size}\to\infty} R(T, \mathrm{size}).$$

Linear extrapolation is performed separately for the curves *R*_{Noise} and *R*_{Tot}, using only the time windows in which each curve satisfies linearity. The information rate (*R*_{Info}) is obtained by subtracting the extrapolated *R*_{Noise} from the extrapolated *R*_{Tot}.

To obtain statistical error bounds for entropy estimates, we used bootstrap analysis (Efron and Tibshirani, 1993). From the 200 trials, an ensemble of 100 datasets (bootstrap samples), each containing 200 trials, was obtained by resampling. The SD of entropy for different values of *T* was obtained from the bootstrap samples and was used to minimize χ^{2} (Bevington and Robinson, 2003; i.e., the least-square error between estimates from the bootstrap and the parent distribution). The slope of the linear extrapolation is determined from the best-fit curve with a 95% confidence interval.
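The resampling and extrapolation steps can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: `bootstrap_sd` and `extrapolate_rate` are hypothetical names, trials are resampled with replacement, and the extrapolation is an ordinary weighted straight-line fit of the rate against 1/*T*, with the intercept taken as the *T* → ∞ estimate.

```python
import numpy as np


def bootstrap_sd(trials, estimator, n_boot=100, seed=0):
    """SD of an estimator over bootstrap datasets drawn by resampling trials.

    trials    : array whose first axis indexes trials
    estimator : callable mapping a resampled trial array to a number
    """
    rng = np.random.default_rng(seed)
    n_trials = trials.shape[0]
    values = [
        estimator(trials[rng.integers(0, n_trials, size=n_trials)])
        for _ in range(n_boot)
    ]
    return float(np.std(values, ddof=1))


def extrapolate_rate(window_s, rate, sd):
    """Weighted linear fit of rate versus 1/T; returns (intercept, slope).

    np.polyfit expects weights of 1/sigma for Gaussian uncertainties.
    """
    inv_t = 1.0 / np.asarray(window_s, dtype=float)
    weights = 1.0 / np.asarray(sd, dtype=float)
    slope, intercept = np.polyfit(inv_t, np.asarray(rate, dtype=float), deg=1, w=weights)
    return float(intercept), float(slope)
```

In use, `estimator` would be an entropy-rate function evaluated at one word length, and the resulting SDs at each *T* would supply the weights for the fit.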

##### Reverse correlation methodology.

To extract the set of the most significant stimulus features associated with spikes, we use reverse correlation analysis (de Boer and Kuyper, 1968), which was extended to second order by sampling spike-triggered covariances (STCs; de Ruyter van Steveninck and Bialek, 1988). This allows the extraction of a low-dimensional stimulus space, which is spanned by a set of orthogonal eigenvectors associated with leading eigenvalues of the covariance matrix. Since the stimuli consist of combined yaw–distractor motion, both yaw and distractor stimuli are used in the analysis.

The spike time resolution in our analysis is 1 ms, but the stimulus velocity is sampled at 2 ms, corresponding to the frame rate of 500 Hz. In all results presented in this work, the yaw (*Y*) and distractor (*D*) velocities are scaled to their respective SDs, σ_{Y} and σ_{D}. The spike-triggered average (STA) is obtained by averaging 100 ms time windows (τ) of stimulus fluctuations (*s⃗*_{0}) preceding each spike, over all the *N* spikes, as follows:

$$\mathrm{STA}(\tau) = \frac{1}{N}\sum_{n=1}^{N}\vec{s}_{0}(t_n - \tau),$$

where *t*_{n} represents the time of the *n*th spike. The STC is calculated by using the concatenated spike-triggered yaw and distractor stimuli with their STAs subtracted (de Ruyter van Steveninck and Bialek, 1988), as follows:

$$C_{\mathrm{spk}}(i,j) = \frac{1}{N}\sum_{n=1}^{N}\left[\vec{s}_{Y},\vec{s}_{D}\right]_{i}(t_n)\,\left[\vec{s}_{Y},\vec{s}_{D}\right]_{j}(t_n),$$

where *i*, *j* ϵ [1, 100] and *s⃗*_{Y}, *s⃗*_{D} represent stimuli with the mean subtracted and normalized to SD. The prior covariance is obtained by averaging over all time bins in the experiment, as follows:

$$C_{\mathrm{prior}}(i,j) = \left\langle\left[\vec{s}_{Y},\vec{s}_{D}\right]_{i}(t)\,\left[\vec{s}_{Y},\vec{s}_{D}\right]_{j}(t)\right\rangle_{t}.$$

Eigenvalues and eigenvectors are then calculated from the difference between the spike-triggered and prior covariance matrices, as follows:

$$\Delta C = C_{\mathrm{spk}} - C_{\mathrm{prior}} = \begin{pmatrix} YY & YD \\ DY & DD \end{pmatrix}. \tag{12}$$

The off-diagonal blocks depicted in Equation 12 represent yaw–distractor cross-covariances *YD* and *DY*, while the diagonal blocks represent the yaw and distractor autocovariances *YY* and *DD*.
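The reverse correlation steps above can be sketched in a few lines. This is a simplified illustration, not the study's code: the function name is hypothetical, both velocity traces are assumed to be sampled on the stimulus frame grid, and spike times are assumed to have been converted to frame indices beforehand. With 50 lags per stimulus (100 ms at 2 ms sampling) the concatenated vectors have 100 dimensions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spike_triggered_analysis(yaw, distractor, spike_idx, n_lags=50):
    """Return (STA, delta_C, eigenvalues, eigenvectors) for concatenated stimuli.

    yaw, distractor : 1-D velocity traces on the same sample grid
    spike_idx       : sample indices of spikes; the window for a spike at
                      index t covers samples t - n_lags .. t - 1
    """
    def zscore(v):
        v = np.asarray(v, dtype=float)
        return (v - v.mean()) / v.std()

    win_y = sliding_window_view(zscore(yaw), n_lags)
    win_d = sliding_window_view(zscore(distractor), n_lags)
    windows = np.hstack([win_y, win_d])  # row r holds samples r .. r + n_lags - 1

    spike_idx = np.asarray(spike_idx)
    valid = spike_idx[(spike_idx >= n_lags) & (spike_idx <= len(yaw))]
    triggered = windows[valid - n_lags]

    sta = triggered.mean(axis=0)
    centered = triggered - sta                       # subtract the STA
    c_spk = centered.T @ centered / len(valid)       # spike-triggered covariance
    c_prior = windows.T @ windows / len(windows)     # prior covariance
    delta_c = c_spk - c_prior
    evals, evecs = np.linalg.eigh(delta_c)           # ascending eigenvalues
    return sta, delta_c, evals, evecs
```

Because `eigh` returns eigenvalues in ascending order, the most negative (variance-reducing) directions come first and any positive (variance-increasing) directions come last.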

##### Significance testing.

Although the symmetry of Δ*C* guarantees that the eigenvalues (λ) are real, they can be positive or negative, depending on an increase or decrease in variance along the eigenvector directions. To extract the set of most significant eigenvalues, we used a threshold radius determined from the Wigner distribution (Wigner, 1958). The eigenvalue spectrum is parsed into a relevant set and an irrelevant set that embody the correlations and random noise in Δ*C*, respectively, such that the eigenvalue distribution of the irrelevant set resembles the distribution of eigenvalues of a symmetric random matrix (*C*_{Rnd}) with no element-wise correlations. In the limit of infinite data, this distribution converges to the Wigner semicircle, as follows:

$$\rho(\lambda) = \frac{1}{2\pi M\sigma_{M}^{2}}\sqrt{4M\sigma_{M}^{2}-\lambda^{2}} = \frac{2}{\pi R^{2}}\sqrt{R^{2}-\lambda^{2}}, \qquad |\lambda|\le R,$$

where *M* is the dimensionality of the covariance matrix, σ_{M}^{2} is the variance of matrix elements of *C*_{Rnd}, and *R* = 2σ_{M}√*M* is the radius of the Wigner semicircle. The cumulative Wigner distribution is given by the following:

$$F(\lambda) = \frac{1}{2} + \frac{\lambda\sqrt{R^{2}-\lambda^{2}}}{\pi R^{2}} + \frac{1}{\pi}\arcsin\!\left(\frac{\lambda}{R}\right). \tag{15}$$

We implement this method by first generating *C*_{Rnd} from *N* randomly chosen prior stimulus time windows, where *N* represents the total number of spikes. We then determine the radius (*R*) of the semicircle from a nonlinear least-squares fit of the irrelevant eigenvalues of Δ*C* to the theoretical cumulative Wigner distribution (Eq. 15). We can also formulate *R* analytically. Each off-diagonal element of *C*_{Rnd} is the product of two independent random variables with an SD of 1, averaged over *N* spikes. Therefore, the SD of the elements of *C*_{Rnd} is 1/√*N*, and *R* can be written as follows:

$$R = 2\sigma_{M}\sqrt{M} = 2\sqrt{M/N}.$$

Eigenvalues with a magnitude larger than *R* are considered relevant. A bootstrap resampling procedure accomplished by randomly shifting the spike train relative to the stimulus (Schwartz et al., 2006) was also used to cross-check the set of relevant eigenvalues.
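The analytic radius and the cumulative semicircle can be checked against a simulated random covariance matrix. The sketch below is an illustration only: it substitutes white Gaussian vectors for the randomly chosen prior stimulus windows (an assumption), and the function names are hypothetical. The values *N* = 28,236 and *M* = 100 are those reported in the Results.

```python
import numpy as np


def wigner_radius(n_spikes, n_dims):
    """Analytic semicircle radius, R = 2 * sqrt(M / N)."""
    return 2.0 * np.sqrt(n_dims / n_spikes)


def wigner_cdf(lam, radius):
    """Cumulative Wigner semicircle distribution on [-R, R]."""
    x = np.clip(np.asarray(lam, dtype=float) / radius, -1.0, 1.0)
    return 0.5 + x * np.sqrt(1.0 - x**2) / np.pi + np.arcsin(x) / np.pi


def random_covariance_eigenvalues(n_spikes, n_dims, seed=0):
    """Eigenvalues of (sample covariance - identity) for white Gaussian windows."""
    rng = np.random.default_rng(seed)
    windows = rng.standard_normal((n_spikes, n_dims))
    c_rnd = windows.T @ windows / n_spikes - np.eye(n_dims)
    return np.linalg.eigvalsh(c_rnd)


radius = wigner_radius(28_236, 100)
eigs = random_covariance_eigenvalues(28_236, 100)
print(f"analytic R = {radius:.3f}, largest |eigenvalue| = {np.abs(eigs).max():.3f}")
```

In this simulation the extreme eigenvalues fall close to the analytic radius, which is what makes *R* usable as a cutoff for the irrelevant part of the spectrum.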

##### Input–output function.

Bayes' theorem allows us to map our prior stimulus knowledge to the spiking response of H1. The functional form of this map is represented by the input–output function (Brenner et al., 2000a), as follows:

$$\frac{r(\tilde{s}_{1},\ldots,\tilde{s}_{K})}{\bar{r}} = \frac{P(\tilde{s}_{1},\ldots,\tilde{s}_{K}\,|\,\mathrm{spk})}{P(\tilde{s}_{1},\ldots,\tilde{s}_{K})}, \tag{17}$$

where *P*(*s̃*_{1},…,*s̃*_{K}) and *P*(*s̃*_{1},…,*s̃*_{K}|spk) represent the prior and the posterior distribution of stimulus projections (*s̃*_{k}) on the relevant eigenvectors (*ê*_{k}). The normalized firing rate shown in Equation 17 describes the input–output relationship.
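For a single relevant dimension, the ratio of posterior to prior distributions can be estimated from histograms. The sketch below is a one-dimensional illustration with a hypothetical function name; the binning scheme is an assumption, and bins with no prior samples are returned as NaN rather than guessed.

```python
import numpy as np


def io_function(prior_proj, spike_proj, bin_edges):
    """Normalized firing rate r(s)/r_mean as P(s|spk) / P(s), per histogram bin.

    prior_proj : projections of all stimulus windows onto one eigenvector
    spike_proj : projections of the spike-triggered windows onto the same vector
    """
    p_prior, edges = np.histogram(prior_proj, bins=bin_edges, density=True)
    p_spk, _ = np.histogram(spike_proj, bins=edges, density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_prior > 0, p_spk / p_prior, np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, ratio
```

A ratio above 1 in a bin means that spikes occur there more often than the prior alone would predict; a ratio of 0 means that no spikes were observed for those stimulus values.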

##### Yaw–distractor subspace information.

The information per spike associated with a relevant dimension can be found by calculating the entropy difference between distributions of spike conditional stimulus projections and prior stimulus projections. For a *K*-dimensional subspace, this is given by the following:

$$I(\mathrm{spk}) = -\int d^{K}\tilde{s}\;P(\tilde{s})\,\log_2 P(\tilde{s}) + \int d^{K}\tilde{s}\;P(\tilde{s}\,|\,\mathrm{spk})\,\log_2 P(\tilde{s}\,|\,\mathrm{spk}). \tag{18}$$

Note that this formulation (Eq. 18) is the same as that used by DeWeese and Meister (1999), who define it as “stimulus-specific” information. According to this definition, the information gained from the observation of a symbol can be positive or negative. Recall that positive eigenvalues are associated with the broadening of distribution, which leads to a posterior entropy that is higher than the prior and, consequently, to negative symbol information. Similarly, for negative eigenvalues, the symbol information is positive. Therefore, choosing this formulation instead of the alternative “surprise” (DeWeese and Meister, 1999) allows us to make a connection between symbol information and the expansion or compression of the stimulus space.

The information associated with nonoccurrence of spikes can be formulated as follows:

$$I(\overline{\mathrm{spk}}) = -\int d^{K}\tilde{s}\;P(\tilde{s})\,\log_2 P(\tilde{s}) + \int d^{K}\tilde{s}\;P(\tilde{s}\,|\,\overline{\mathrm{spk}})\,\log_2 P(\tilde{s}\,|\,\overline{\mathrm{spk}}).$$

Though a symbol can carry negative information, the information averaged over the symbols, or mutual information (MI), is still non-negative, as follows:

$$\mathrm{MI} = P(\mathrm{spk})\,I(\mathrm{spk}) + P(\overline{\mathrm{spk}})\,I(\overline{\mathrm{spk}}) \ge 0.$$

We perform this information-theoretic analysis on the yaw subspace, the distractor subspace, and the combined yaw–distractor subspace.
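The link between the sign of an eigenvalue and the sign of the spike information can be made explicit in a simple special case. The sketch below assumes, purely for illustration, that the projections onto one eigenvector are Gaussian with unit prior variance and spike-conditional variance 1 + λ; the entropy difference is then −½ log₂(1 + λ). The Gaussian assumption and both function names are not taken from the text. A histogram-based entropy estimate is included as a numerical cross-check.

```python
import numpy as np


def gaussian_info_per_spike(eigenvalue):
    """Prior-minus-posterior entropy (bits) for a 1-D Gaussian projection.

    Prior variance is 1 and spike-conditional variance is 1 + eigenvalue
    (Gaussian shape is an assumption of this sketch).
    """
    return -0.5 * np.log2(1.0 + eigenvalue)


def histogram_entropy_bits(samples, bin_edges):
    """Plug-in differential entropy estimate (bits) from a histogram."""
    counts, edges = np.histogram(samples, bins=bin_edges)
    p = counts / counts.sum()
    widths = np.diff(edges)
    keep = p > 0
    return float(-(p[keep] * np.log2(p[keep] / widths[keep])).sum())


compressed = gaussian_info_per_spike(-0.5)  # variance halves: positive information
expanded = gaussian_info_per_spike(1.0)     # variance doubles: negative information
print(compressed, expanded)
```

Under this assumption a negative eigenvalue (compression of the stimulus distribution) gives positive information per spike, and a positive eigenvalue (expansion) gives negative information, in line with the argument above.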

## Results

### Firing pattern modulation

H1 is responsive to yaw motion, and the peristimulus time histogram (PSTH) for repeated presentations of the same pseudorandom yaw waveform therefore shows strong variations in firing rate, often with sharp onsets and offsets. The addition of random nonrepeated distractor motion to the original yaw stimulus will tend to wash out the structure of such yaw-induced rate fluctuations, lowering the information that H1 carries about yaw. This is illustrated in Figure 2 for both roll and pitch distractors at ξ_{R} = σ_{R}/σ_{Y} = 10 and ξ_{P} = σ_{P}/σ_{Y} = 3, respectively. For the pure yaw case, the spike raster locks on to preferred yaw motion but shows suppression to nonpreferred yaw motion (Fig. 2*A*). From the modulation in firing rate, we can compute the information per spike according to Equation 4, and for the conditions of the experiment shown in Figure 2, we get *I*(spk) = 1.82 bits. The spike rasters for yaw–roll and yaw–pitch combinations show similar locking behavior but have increased variability compared with the pure yaw case. This increase in variability results in broadening of the PSTH peaks (Fig. 2*B*) and a decrease of single-spike information. Conversely, regions with very low activity experience an increase in firing rate (Fig. 2*B*, insets). As an example of the spread in PSTH peaks, we fit a Gaussian to an isolated peak in the PSTH (Fig. 2*C*) over the region where its shape is preserved. The SD of the Gaussian increases from 4 ms with pure yaw, to 14 ms with combined yaw and roll, and to 8 ms with combined yaw and pitch. The effect of the distractor on firing rate is relatively small at intermediate values of ξ_{R} and ξ_{P}. For example, the mean firing rate (*r̄*) changes from 35 spikes/s without the distractor to 31 spikes/s at ξ_{R} = 10 and 34 spikes/s at ξ_{P} = 3. The effect is much larger at higher values of ξ, especially for roll, when *r̄* decreases to ∼25 spikes/s at ξ_{R} = 100.
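The single-spike information referred to above depends only on how strongly the firing rate is modulated over the repeated stimulus. A minimal discretized sketch is given below; the function name is hypothetical, the PSTH is assumed to be sampled in equal time bins, and bins with zero rate are taken to contribute nothing (the limit of *x* log₂ *x* as *x* → 0).

```python
import numpy as np


def info_per_spike(rate):
    """Time average of (r / r_mean) * log2(r / r_mean), in bits per spike.

    rate : PSTH sampled in equal time bins over the repeated stimulus
    """
    rate = np.asarray(rate, dtype=float)
    ratio = rate / rate.mean()
    terms = np.zeros_like(ratio)
    positive = ratio > 0
    terms[positive] = ratio[positive] * np.log2(ratio[positive])
    return float(terms.mean())


flat = info_per_spike(np.full(100, 35.0))                     # unmodulated rate
sparse = info_per_spike(np.tile([4.0, 0.0, 0.0, 0.0], 25))    # firing in 1/4 of bins
print(flat, sparse)
```

An unmodulated rate carries no information per spike, while firing confined to a quarter of the time bins carries log₂ 4 = 2 bits; broadening of PSTH peaks moves a response toward the former case.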

### Effect of distractor on yaw information rate

A more detailed analysis of spike train information takes into account how spike sequences are reproduced across trials. For pure yaw (σ_{Y} = 100°/s), the *R*_{Tot} is 175 bits/s and the *R*_{Noise} is 118 bits/s (Fig. 3*A*), resulting in an *R*_{Info} of 57 bits/s. With a combined roll (ξ_{R} = 10), *R*_{Info} drops to 38 bits/s, but *R*_{Tot} and *R*_{Noise} both increase. Notably, the increase in *R*_{Noise} is higher by almost a factor of 2 than the increase in *R*_{Tot}. This suggests that the interspike correlation decreases and the variability in spike sequences across trials increases. Because limited sampling affects *R*_{Noise} more than *R*_{Tot}, the *R*_{Noise} curve collapses earlier than the *R*_{Tot} curve (Strong et al., 1998).

Spike train information rates at different distractor strengths parametrized by ξ_{R} and ξ_{P} are shown in Figure 4*A*. In the region where ξ_{R} increases from 1 to 10, *R*_{Tot} remains virtually unchanged at 181 bits/s, while *R*_{Noise} increases from 122 to 143 bits/s. For ξ_{R} > 10, both *R*_{Tot} and *R*_{Noise} steadily decrease. The onset of the decrease of *R*_{Tot} occurs earlier than the decrease of *R*_{Noise}. This suggests that when the strength of the distractor is low, *R*_{Tot} increases due to a decrease in interspike correlations. When the strength of the distractor becomes very large, the mean firing rate drops. This leads to a lower number of spikes and therefore a smaller repertoire of binary words (*W*), which reduces both *R*_{Noise} and *R*_{Tot}.

Over the range of ξ_{R} from 1 to 100, *R*_{Info} decreases monotonically from 57 to 8 bits/s (i.e., by a factor of 7; Fig. 4*A*). Over the range of ξ_{P} from 0.1 to 10, *R*_{Info} decreases from 58 to 22 bits/s (i.e., by a factor of 2.6; Fig. 4*B*). The decrease in *R*_{Info} for the two cases, however, is subtly different. While the decrease in *R*_{Info} for pitch occurs largely due to an increase in *R*_{Noise}, for roll it occurs due to a decrease in both *R*_{Tot} and *R*_{Noise}. Given the geometry of the display used in our experiments (Fig. 1*B*), the local tangential velocity component of roll at the edge of the display is ∼3000°/s for a roll velocity of 10,000°/s (see Materials and Methods). Conceivably, the spatial contrast reduces at these high velocities, which could very well decrease both *R*_{Tot} and *R*_{Noise}. The information rate shows approximately a linear dependence on the logarithm of ξ_{R} and ξ_{P}. The largest error in entropy estimates stems from the uncertainty in determining the slope of the extrapolated curves (Fig. 3). Although the absolute value of *R*_{Info} changes from one fly to the other, we found the characteristic dependence of *R*_{Info} on ξ to be consistent across all six tested flies.

### Reverse correlation

The previous subsection quantified the effect of distractor motion on the information encoded about yaw. Here we sketch a more detailed picture of the interactions between yaw and distractor motion associated with spiking response. In principle, this is a problem of describing interactions in a high-dimensional space of stimulus waveforms, but in practice it is often the case that only a few dimensions are relevant. The analysis is based on a generalization of the reverse correlation method (de Boer and Kuyper, 1968), which starts by extracting a few stimulus dimensions that define the relevant subspace. Given that we use a symmetric Gaussian stimulus, this method is guaranteed to provide unbiased estimates of the linear filters, namely, the STA and the eigenvectors of the STC matrix (Chichilnisky, 2001; Paninski, 2003). To estimate the filters, we need many independent samples of the stimulus waveform. For this, we use data from the fifth segment (Fig. 1*C*,*D*) of the trial, where yaw and distractor waveforms are nonrepeated across trials. The results are illustrated using the following subset of parameter values: ξ_{R} = {0, 3, 10, 30} and ξ_{P} = {0, 1, 3, 10}.

### Spike-triggered average

The first moment of the stimulus leading up to a spike is given by the STA. For calculating the STA, we use the 100 ms history of yaw and distractor motion preceding each spike (de Ruyter van Steveninck and Bialek, 1988).

As expected, the yaw STAs from both yaw–roll and yaw–pitch stimulus combinations are positive and unimodal, with a peak preceding the spike by ∼20 ± 2 ms. As ξ_{R} increases from 0 to 30, the yaw STA amplitude decreases to 45% of its amplitude at ξ_{R} = 0 (Fig. 5*A*). The roll STA (Fig. 5*A*, dotted line) is positive and relatively small, but shows signs of increase with an increase in ξ_{R}, which suggests that H1 is excited more by positive roll than by negative roll. Over the range of ξ_{R} ϵ [0, 30], the mean firing rate, *r̄*, decreases by <10%. The pitch STA (Fig. 5*B*, dotted line) has a downward peak, the amplitude of which increases with an increase in ξ_{P}. This could be due to horizontal motion components arising from the vertical movement of slanted pattern edges (Marr and Ullman, 1981). Additionally, projections from the VS1 neuron could play a role (Haag and Borst, 2003). To test the dependence on pattern structures, we used a symmetric checkerboard pattern of square size 6° × 6° and contrast 1 (not shown). The resulting reduction in pitch STA (Fig. 5*B*, gray curve) indicates that the structure of patterns does play a significant role in modulating the response of H1 to vertical motion.

To study the correspondence between the time-lagged mean stimulus and a spike, we fit an exponential to the tail of the yaw STA (Fig. 5*A*,*B*, insets). The decay constant of the exponential decreases from 40 to 24 ms as ξ_{R} increases from 3 to 30, while it decreases from 40 to 28 ms as ξ_{P} increases from 1 to 10. This suggests that as the distractor variance increases, spikes are more likely to be generated by stimulus features from the immediate past.
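One simple way to extract such a decay constant is a straight-line fit to the logarithm of the STA tail. The fitting procedure is not specified in the text, so the sketch below is an assumption: it uses an unweighted log-linear least-squares fit, requires the tail values to be strictly positive, and uses a hypothetical function name.

```python
import numpy as np


def fit_decay_constant(lags_ms, sta_tail):
    """Decay constant (ms) from a log-linear fit of A * exp(-lag / tau).

    lags_ms  : time before the spike for each sample of the tail
    sta_tail : STA values on the tail; must all be positive
    """
    lags_ms = np.asarray(lags_ms, dtype=float)
    sta_tail = np.asarray(sta_tail, dtype=float)
    if np.any(sta_tail <= 0):
        raise ValueError("log-linear fit needs strictly positive tail values")
    slope, _ = np.polyfit(lags_ms, np.log(sta_tail), deg=1)
    return float(-1.0 / slope)


lags = np.arange(30.0, 100.0, 2.0)         # 2 ms sampling, tail region (illustrative)
synthetic_tail = 0.5 * np.exp(-lags / 40.0)
print(fit_decay_constant(lags, synthetic_tail))
```

On noisy data a weighted or nonlinear fit may be preferable, since taking the logarithm amplifies noise where the tail approaches zero.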

### Spike-triggered covariance

We compute the joint yaw–distractor spike-triggered covariance around the respective yaw and distractor STAs (de Ruyter van Steveninck and Bialek, 1988). To illustrate the covariance structure, we choose parameter values ξ_{R} = 10 and ξ_{P} = 3, at which *R*_{Info} for yaw–roll and yaw–pitch stimulus combinations are comparable (Fig. 4, gray shaded boxes).

The structures of the temporal autocovariance and cross-covariance between stimuli are displayed in the diagonal blocks (*YY*, *RR*, *PP*) and off-diagonal blocks (*YR*, *YP*), respectively, in Figure 6, *A* and *B*. The matrix values represent covariance calculated around the mean, or the STA. For example, negative covariance in the *YR* block indicates that spikes are associated with interactions between an increasing yaw and a decreasing roll around their respective means, or vice versa. The covariance structures indicate that spikes are associated with yaw, distractor, and yaw–distractor correlations. The extent of the structures indicates the existence of short- and long-range temporal correlations, with the latter lasting for at least 50 ms.

Using the cutoff set by the radius of the Wigner semicircle, we found one positive and two negative significant eigenvalues for the case ξ*_{R}* = 10, and one positive and three negative significant eigenvalues for the case ξ*_{P}* = 3 (Fig. 6*C*,*D*, red circles). For the other values of distractor variance, we found at the most one positive and four negative significant eigenvalues. This confirms that, even for a complex stimulus with multiple motion components, the relevant subspace has only a few dimensions. An interesting offshoot of this result, the significance of which will be discussed later in the section Relevant subspace information, is that spikes carry negative information about features defined by *ê*_{+}.
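A hedged Python sketch of this kind of eigenvalue selection follows. It fits the cumulative Wigner semicircle distribution to the central eigenvalues and flags those outside the fitted radius. The closed form of the semicircle cumulative distribution is standard, but the grid search, the central fraction of 0.8, and the toy matrix with one planted eigenvalue are assumptions made for illustration. They are not the fitting procedure or the data of the study.

```python
import numpy as np

def wigner_cdf(x, r):
    """Cumulative distribution of the Wigner semicircle with radius r."""
    z = np.clip(np.asarray(x, dtype=float) / r, -1.0, 1.0)
    return 0.5 + (z * np.sqrt(1.0 - z**2) + np.arcsin(z)) / np.pi

def fit_wigner_radius(eigvals, central_fraction=0.8):
    """Least-squares fit of the semicircle radius to the central eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))
    m = lam.size
    ecdf = (np.arange(m) + 0.5) / m                       # empirical CDF at each eigenvalue
    trim = int(round(m * (1.0 - central_fraction) / 2))   # extremes excluded from the fit
    core = slice(trim, m - trim)
    grid = np.linspace(0.01, 1.0, 1000) * 2 * np.abs(lam).max()
    errors = [np.sum((wigner_cdf(lam[core], r) - ecdf[core]) ** 2) for r in grid]
    return float(grid[int(np.argmin(errors))])

def significant_eigenvalues(eigvals, radius):
    """Eigenvalues lying outside the semicircle are treated as relevant."""
    eigvals = np.asarray(eigvals)
    return eigvals[np.abs(eigvals) > radius]

# Toy matrix (hypothetical): symmetric noise with semicircle radius near 0.1,
# plus one planted negative eigenvalue standing in for a relevant dimension.
rng = np.random.default_rng(2)
m = 100
g = rng.standard_normal((m, m))
noise = 0.05 * (g + g.T) / np.sqrt(2 * m)
v = np.zeros(m)
v[0] = 1.0
toy_delta_c = noise - 0.5 * np.outer(v, v)

eigvals = np.linalg.eigvalsh(toy_delta_c)
radius = fit_wigner_radius(eigvals)
relevant = significant_eigenvalues(eigvals, radius)
```

Because finite-size fluctuations can push a bulk eigenvalue slightly past the fitted edge, a cross-check against a second criterion, such as the bootstrap bounds mentioned in the text, remains useful.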

The radius of the Wigner semicircle provides a hard cutoff for determining the relevant eigenvalues, unlike the bootstrap bounds, the accuracy of which depends on the number of bootstrap samples. To obtain the radius, we fit the set of irrelevant eigenvalues of Δ*C* from the central portion of the eigenvalue spectra (Fig. 6*E*, gray) to the theoretical cumulative Wigner distribution (Eq. 15). For the case ξ*_{R}* = 10 illustrated in Figure 6, *E* and *F*, the radius of the Wigner semicircle is *r* = 0.112. The three eigenvalues (Fig. 6*E*,*F*, red markers) lying outside the perimeter of the Wigner semicircle are identified as relevant. With *N* = 28,236 spikes and *M* = 100 dimensions, the analytic estimate of the radius turns out to be *R* = 2 […] ξ*_{R}* and ξ*_{P}*, the set of significant eigenvalues determined from the bootstrap analysis matches the corresponding set from the Wigner analysis (data not shown), thus confirming that all of the leading eigenvalues are correctly identified.

### Leading dimensions of yaw–distractor subspace

The eigenvectors corresponding to the leading eigenvalues of Δ*C* define the stimulus subspace most relevant to H1. To obtain an interpretation of the actual motion features that might trigger spiking, we examine the shapes of the three eigenvectors *ê*_{−1}, *ê*_{−2}, and *ê*_{+1} corresponding to the two leading negative and the leading positive eigenvalues (Fig. 7*A–C*). The pure yaw stimulus is an exception, since we did not find any positive eigenvalues for it.

The eigenvector *ê*_{+1} of the yaw–distractor space represents the direction of largest expansion of the stimulus space. For both yaw–roll and yaw–pitch combinations, we obtain one such direction. Since positive yaw velocity excites H1, the sign of each eigenvector is set by imposing the constraint that the yaw part of the curve has a positive rising peak (Fig. 7). The profile of distractor and yaw halves of *ê*_{+1} clearly shows that the effect of the distractor is much larger than the effect of yaw. The presence of a heavy tail suggests relatively long-lasting temporal effects. In addition, the yaw and distractor halves are monophasic, with their relative phases opposite to each other. This could point to a competition for generating spikes by preferential (or positive) roll, and downward pitch motion. We will discuss this in further detail in the next subsection.

The temporal structure of the eigenvector *ê*_{−1} suggests that it is dominated by yaw when the distractor variance is low, and by distractor when the distractor variance is high. The negative eigenvalue associated with *ê*_{−1} indicates a compression of stimulus space along the yaw STA and along a distractor direction that varies with ξ. This can be understood from the change in the temporal profile of the distractor half of *ê*_{−1}, from a monophasic one to a triphasic one (Fig. 7*B*,*C*, blue curves). The triphasic profile demonstrates the presence of positive and negative distractor correlations with different time lags, thus representing the feature “jerk.” Since the eigenvector *ê*_{−2} resembles the derivative of the STA, it represents the feature “acceleration.”

To estimate the change in yaw variance induced solely by distractor, eigenvalue decomposition was performed on the *YY* block (Eq. 12) of the Δ*C* matrix, which contains only yaw correlations. Table 1 shows that the magnitude of the largest negative eigenvalue of yaw stimulus space decreases rapidly with an increase in distractor variance. Over the full range of ξ*_{R}* and ξ*_{P}*, the corresponding eigenvalues decrease by 86% and 80%, respectively. Thus, spike coupling to yaw stimulus is markedly weakened by an increase in distractor fluctuations.

### Response as a function of relevant stimulus

The input–output relationship (Eq. 17) bypasses the complexities of the intermediate stages of sensory processing and gives the instantaneous firing rate as a function of the stimulus. To obtain the input–output mapping, we project the spike-triggered and prior stimuli on each of the yaw and distractor halves of the eigenvectors. The 1-D and 2-D distributions of projections are then used to compute the corresponding response curves (Brenner et al., 2000a).
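As a minimal sketch of the 1-D case, assume the input–output relationship takes the usual Bayesian form, in which the firing rate relative to its mean equals P(*s*|spike)/P(*s*). The response curve is then the ratio of two histograms of projections. The function name `response_curve`, the feature vector, and the sigmoidal toy neuron are hypothetical.

```python
import numpy as np

def response_curve(prior_proj, spike_proj, bins):
    """Firing rate relative to the mean, estimated as P(s | spike) / P(s)."""
    p_prior, edges = np.histogram(prior_proj, bins=bins, density=True)
    p_spike, _ = np.histogram(spike_proj, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_prior > 0, p_spike / p_prior, np.nan)
    return centers, ratio

# Toy data (hypothetical): Gaussian stimulus histories projected on a
# unit-norm feature, with a sigmoidal spiking probability.
rng = np.random.default_rng(1)
stim = rng.standard_normal((200_000, 20))
feature = np.zeros(20)
feature[-3:] = 1.0
feature /= np.linalg.norm(feature)

proj = stim @ feature                                   # prior projections
p_fire = 1.0 / (1.0 + np.exp(-4.0 * (proj - 0.5)))      # toy nonlinearity
spiked = rng.random(proj.size) < p_fire

centers, ratio = response_curve(proj, proj[spiked], bins=np.linspace(-3, 3, 25))
```

Using the same bin edges for both histograms keeps the ratio well defined. Bins that the prior never visits are returned as NaN rather than as a spurious rate.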

First, we focus on how the stimulus feature associated with *ê*_{−1} impacts the firing rate. The response curve obtained using the yaw half of *ê*_{−1} shows a characteristic sigmoidal increase, which eventually plateaus at approximately three times the mean firing rate (Fig. 8*A*,*C*, top). This is a typical rectifier-type response indicating the direction selectivity of H1. Unlike negative yaw, negative distractor does not completely suppress the response of H1, as indicated by the firing rate, which remains at the mean firing level (Fig. 8*A*,*C*, left). This suggests excitation arising from a relatively weak distractor in an otherwise yaw-dominated direction. For positive distractor projections, the firing rate initially increases but declines when ξ becomes large, with the latter clearly evident at ξ*_{R}* = 30 (Fig. 8*A*,*C*, left, blue curve). Considering that the actual SD of roll in this case is 30 times that of yaw, it is reasonable to expect that such velocities would reduce image contrast and wash out otherwise detectable features, which in turn would diminish the response. An overall assessment of the response curves indicates a broadening of yaw posterior and a narrowing of distractor posterior as ξ*_{R}* or ξ*_{P}* increases. If we define stimulus sensitivity as the maximum slope of the stimulus–response curve, then this indicates a decrease in yaw sensitivity and an increase in distractor sensitivity.

It is essential to set a reference for deriving the sign of the physical velocity from the sign of the velocity projection. As the STA shape is a good indicator of the direction of the mean velocity that is preferred by the neuron, we shall use it as our reference. It is clear that the spike-triggered roll and yaw distributions peak respectively at negative and positive projections on *ê*_{+1} (Fig. 8*B*). This, however, fails to indicate the direction of the physical roll or yaw velocity that excites or suppresses the neuron. Therefore, we compared the roll posterior distribution obtained from the eigenvector *ê*_{+1R} with that obtained from the STA of a pure roll stimulus, at the same roll variance (Fig. 8*B*, inset). As shown, the two curves are asymmetric with a large peak and a prominent shoulder. Because the sign of the roll STA is positive and the peak of the roll posterior obtained from projecting the spike-triggered roll on the positive designated roll half of *ê*_{+1} lies on the negative side of the projection axis, the negative roll projection value for *ê*_{+1} corresponds to physical positive roll velocities. The peak indicates strong excitation by positive [counterclockwise (CCW)] roll, and the shoulder indicates weak excitation from negative [clockwise (CW)] roll. The asymmetric U-shaped nonlinearity (Fig. 8*B*, left) demonstrates the bidirectional response to roll.

For the yaw–pitch case (Fig. 8*D*), the firing rate mainly increases with an increase in negative pitch projection values. Since the pitch STA is negative, using the same line of argument as above, we find that downward-directed pitch motion corresponds to negative projection values for *ê*_{+1}. Thus, the feature represented by *ê*_{+1}, associated with an excitatory response, is largely a reflection of the pitch STA.

A topographical map of the 2-D input–output relationship allows us to examine the firing rate dependence on the interaction of the relevant features. The response profiles obtained from projections on *ê*_{−1Y}, *ê*_{−1R} and *ê*_{−1Y}, *ê*_{−1P} display a bean-like shape, whose center is located in the positive half of the yaw projection space, with the two ends extending out into the positive and negative halves of the distractor projection space (Fig. 8*A*,*C*). As the distractor variance increases, the two ends gradually fade away, leaving only the center. This indicates the dominance of yaw stimulus along *ê*_{−1}, the effect of which lingers even at high distractor variance, unlike the distractor response, which is weak and ultimately vanishes. Interestingly, for *ê*_{+1} the bean extends farther into the two diagonally opposite quadrants (Fig. 8*B*,*D*). This implies that a coupling between strong preferred distractor and weak positive or negative yaw, and between weak nonpreferred distractor and strong positive yaw, increases the firing rate. Overall, these results demonstrate the presence of excitatory components in distractor motion and a competitive inhibition of the spiking response to yaw caused by strong preferred distractor motion components.

### Stimulus energy shapes input–output response

Direct visual inspection of the 2-D input–output relations of Figure 8 shows that they are not separable as the product of two marginal functions. Moreover, their shape changes with the value of ξ. Here we examine those relations in more detail by drawing various 1-D sections. Sections parallel to the horizontal axis describe how yaw encoding depends on the amount of distractor motion, as measured by its projection on the eigenvector. Conversely, vertical sections describe the encoding of the distractor at different values of yaw projection.

First, we focus on the yaw response when the distractor projection is zero (i.e., *s_{R,P}* = 0). This means that the instantaneous distractor energy is zero, but the global distractor energy is not. The input–output curves exhibit two important features. First, with increasing values of ξ*_{R,P}* the yaw response curves flatten (Fig. 9*A*,*D*, left). Second, to a good approximation the input–output curves all cross at a common point, *s_{Y}* = *s*_{cross} = 0.5, where the slope of the curves is maximal (Fig. 9*A*,*D*, right). Based on previous findings that the gain of H1 scales with the SD of the yaw stimulus (Brenner et al., 2000a), one might expect similar scaling to exist for a combined stimulus. To test this hypothesis, we scaled the input–output curves with the “total SD,” but allowing different weights for each stimulus. Analytically, the scale factor is written as *SF* = […] ξ*_{P}* values collapse onto a single function for α*_{P}* = 0.1 (Fig. 9*D*, right). However, the curves for different ξ*_{R}* values collapse for different values of α*_{R}* ranging from 0.03 to 0.3 (Fig. 9*A*, right). This suggests that a universal scaling exists for pitch but not for roll. Since the fitting parameter effectively weights the contribution of distractor variance to yaw variance, this discrepancy might be a consequence of the inhomogeneity of local motion components of roll, something that is absent in pitch.

We can also look at the yaw input–output curves that are conditional on nonzero values of distractor projection. Representative examples are shown in Figure 9, *B* and *E* (dominant negative eigenvalue). Here the red, green, and blue curves are horizontal slices through the 2-D input–output maps at projection values of −2, 0, and 2 SDs of the distractor. These conditions measure instantaneous projections on the distractor modes. Compared with the yaw response curves that are conditional on zero distractor projection (Fig. 9*A*,*D*), the response curves for nonzero distractor projections become more flattened and shift along the horizontal axis (Fig. 9*B*,*E*). The direction of the shift reverses with the sign of the projection. An earlier result showing that H1 is weakly sensitive to mean distractor velocity (Fig. 5*A*,*B*) provides a clue to this behavior. A rightward or leftward shift in the threshold indicates the presence of large instantaneous distractor energy associated with motion components that primarily compete or act in unison with excitatory yaw components, such that in one case higher than normal yaw excitation is required to elicit a response, and vice versa for the other case. The flattening occurs due to an overall reduction in the sensitivity to yaw in the presence of strong instantaneous distractor energy, thus diminishing the gain to yaw.
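The slicing described above can be sketched in Python as follows. The 2-D map is estimated as a ratio of 2-D histograms, under the same assumed Bayesian form of the input–output relation, and a horizontal section is read out at a fixed distractor projection. The function names and the toy neuron, whose yaw threshold shifts linearly with the distractor projection, are hypothetical and chosen only to reproduce a qualitative shift.

```python
import numpy as np

def response_map_2d(prior_xy, spike_xy, edges):
    """2-D rate ratio P(s_Y, s_D | spike) / P(s_Y, s_D) on a common grid."""
    p_prior, _, _ = np.histogram2d(prior_xy[:, 0], prior_xy[:, 1],
                                   bins=[edges, edges], density=True)
    p_spike, _, _ = np.histogram2d(spike_xy[:, 0], spike_xy[:, 1],
                                   bins=[edges, edges], density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p_prior > 0, p_spike / p_prior, np.nan)

def yaw_slice(rate_map, edges, distractor_value):
    """Horizontal section: yaw response at a fixed distractor projection."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    col = int(np.argmin(np.abs(centers - distractor_value)))
    return centers, rate_map[:, col]       # first axis of the map is yaw

# Toy neuron (hypothetical): positive distractor projections raise the yaw threshold.
rng = np.random.default_rng(3)
n = 400_000
s_yaw = rng.standard_normal(n)
s_dist = rng.standard_normal(n)
p_fire = 1.0 / (1.0 + np.exp(-4.0 * (s_yaw - 0.5 - 0.3 * s_dist)))
spiked = rng.random(n) < p_fire

edges = np.linspace(-3.25, 3.25, 14)       # bin centers fall on multiples of 0.5 SD
prior_xy = np.column_stack([s_yaw, s_dist])
rate_map = response_map_2d(prior_xy, prior_xy[spiked], edges)

centers, slice_neg = yaw_slice(rate_map, edges, -2.0)
_, slice_zero = yaw_slice(rate_map, edges, 0.0)
_, slice_pos = yaw_slice(rate_map, edges, 2.0)
```

In this toy model the three slices are shifted copies of one sigmoid. The flattening reported in the text would require a gain that also depends on the distractor projection, which this sketch does not attempt to model.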

Finally, Figure 9, *C* and *F*, shows distractor input–output curves conditional on different yaw projections for the eigenvector *ê*_{+1}, as well as the pure distractor input–output behavior. One interesting feature for the roll input–output curve is that it has an asymmetric “U” shape, which means that spikes are generated preferentially for large negative or large positive projections on the distractor half of *ê*_{+1}. This behavior is related to a broadening of the posterior distribution compared with the prior, resulting in an increased variance, which is also related to the positive sign of the eigenvalue. The asymmetry in the U shape of the roll response curves suggests unequal excitation from CCW and CW roll. A small signature of the U shape is also found in the pitch response curves, and we shall return to a possible explanation for this behavior in the Discussion. The vertical shift of the input–output curves here may be expected simply from the overall effect of yaw on spiking probability.

### Relevant subspace information

We begin with the caveat that eigenvalues provide an incomplete description of the posterior distribution because they fail to capture statistical features beyond the second order. Stimulus-specific information per spike (Eq. 18) is a more appropriate choice because it is independent of assumptions about the statistics of the distribution and allows us to probe the existence of a synergistic relationship (Brenner et al., 2000b; Schneidman et al., 2003) between yaw and distractor features. Data limitations make it difficult to sample a high-dimensional stimulus space (Rieke et al., 1997; Dayan and Abbott, 2001), so we limit the analysis to a 2-D subspace. The leading stimulus dimensions corresponding to *ê*_{−1} and *ê*_{+1} are used for the analysis.

Figure 10*A* shows that spike information about yaw decreases asymptotically from 0.53 bits to zero as ξ*_{R}* increases, while spike information about roll and pitch increases from zero to 0.21 and 0.18 bits, respectively (Fig. 10*B*). Thus, maximum roll and pitch information are respectively 40% and 35% of the maximum yaw information. Since the curves do not plateau, it is conceivable that saturation occurs at yet larger distractor variance. Overall, this characterizes a transition from a yaw-encoding regime to a distractor-encoding regime. Although yaw dominates *ê*_{−1}, nonzero distractor information indicates that when distractor variance is high, distractor motion overrides yaw motion in driving spike generation. In contrast, spikes carry negative information, up to −0.22 bits about roll and up to −0.14 bits about pitch, corresponding to the *ê*_{+1} dimension (Fig. 10*C*,*D*). Note that these measures correspond to the difference between posterior and prior entropies, also termed stimulus-specific information (see DeWeese and Meister, 1999; see Materials and Methods). Recalling the shape of nonlinearity (Fig. 8*B*,*D*, left), the negative sign results from higher entropy of a widened posterior distribution. We found no evidence of negative yaw information carried by spikes. The mutual information (Eq. 21) is necessarily positive, and is respectively 0.034 and 0.017 bits for *ê*_{−1Y} and *ê*_{−1R}. We also found that the information estimate from the 2-D stimulus subspace spanned by *ê*_{−1Y} and *ê*_{−1R}, or *ê*_{−1Y} and *ê*_{−1P}, was equal to the sum of information estimates from the individual subspaces, within statistical error. This rules out a synergistic effect between yaw and distractor features.
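To make the sign convention concrete, the following Python sketch computes the information as prior entropy minus posterior entropy on a common binning, as described in the text. The function names, bin width, and Gaussian toy distributions are assumptions for illustration and do not reproduce Eq. 18 or the data.

```python
import numpy as np

def entropy_bits(samples, edges):
    """Plug-in entropy (bits) of samples discretized on the given bin edges."""
    counts, _ = np.histogram(samples, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def stimulus_specific_info(prior_proj, spike_proj, edges):
    """Prior entropy minus posterior entropy; negative if the posterior is wider."""
    return entropy_bits(prior_proj, edges) - entropy_bits(spike_proj, edges)

# Toy projections (hypothetical): a unit-variance prior, a posterior narrowed
# to half the prior SD, and a posterior widened to twice the prior SD.
rng = np.random.default_rng(4)
n = 200_000
edges = np.linspace(-10.0, 10.0, 201)
prior = rng.standard_normal(n)
narrow_posterior = 0.5 * rng.standard_normal(n)
wide_posterior = 2.0 * rng.standard_normal(n)

info_narrow = stimulus_specific_info(prior, narrow_posterior, edges)
info_wide = stimulus_specific_info(prior, wide_posterior, edges)
```

For Gaussian distributions the difference reduces to the base-2 logarithm of the ratio of prior to posterior SD, so halving the SD gives about +1 bit and doubling it gives about −1 bit. The same difference evaluated on a 2-D histogram, compared with the sum of the two 1-D values, would give a synergy check of the kind reported in the text.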

### A comparison of information estimates

Stimulus information is conveyed both by individual spikes and by spike patterns (de Ruyter van Steveninck and Bialek, 1988; Schneidman et al., 2011). The excess information that spike patterns carry over single spikes (Brenner et al., 2000b) can be obtained by comparing the estimates from Equations 3 and 4 (see Materials and Methods). For example, the spike train conveys 1.8 bits per spike, while single spikes convey 1.25 bits about the repeated yaw stimulus at ξ*_{R}* = 1. Thus, single spikes account for 70% of the spike train information at ξ*_{R}* = 1, which eventually decreases to 47% at large roll variance (Fig. 11*A*). For pitch, the decrease is from 79% to 45% (Fig. 11*B*). A decrease in spike-timing precision to repeated yaw, caused by a distractor competing with yaw for available spikes, may play a dominant role here. The overall dependence of information rate on ξ is approximately logarithmic (Fig. 11*A*,*B*).

We can assess the completeness of the feature description by comparing the information associated with features (Eq. 18) with single-spike information (Eq. 4). However, such a comparison requires caution because specific distractor features contribute negative symbol information, unlike the estimate from Equation 4, which is always positive. Therefore, for comparison, we choose only those features that contribute positive information. The leading yaw dimension (*ê*_{−1Y}) of the yaw–roll and yaw–pitch subspaces contributes a maximum of 42% (0.52 bits) and 36% (0.45 bits), respectively, of the single-spike yaw information (Fig. 11*A*,*B*). The joint yaw space constructed from the leading yaw dimensions *ê*_{−1Y} and *ê*_{−2Y} contributes up to 53% of single-spike yaw information. The information obtained from other relevant dimensions, such as *ê*_{−2} and *ê*_{−3}, was lower and decreased with decreasing significance of the eigenvalues.

Since distractor features contribute to spike generation (Fig. 10), it is conceivable that spike patterns carry significant distractor information. We test this by using a presentation of repeated distractor with nonrepeated yaw across trials (Fig. 1*C*,*D*, segment 6). Figure 11*C* shows that spike train information rises to 1 bit/spike, while single-spike information rises to 0.8 bits for roll. For pitch, the spike train information rises to 0.63 bits/spike, and single-spike information rises to 0.60 bits/spike (Fig. 11*D*). As a comparison, the joint feature space of *ê*_{−1R} and *ê*_{−2R} contributes 0.35 bits, and the joint feature space of *ê*_{−1P} and *ê*_{−2P} contributes 0.37 bits (Fig. 11*A*,*B*). Even if we add the information associated with each of the relevant dimensions, the sum is still less than what we obtain for single spikes from the spike train (data not shown). This indicates that the spike train carries information about stimulus features beyond those identified from the relevant subspace, and it is possible that those features are associated with particular spike combinations.

## Discussion

In this article, we provide a quantitative analysis of how preferred motion is coded in the presence of stimuli not aligned with the preferred direction of the neuron. The presence of a distractor reduces the efficiency with which H1 codes yaw, but increases the efficiency of coding the distractor or, specifically, the features of the distractor that excite H1. Geometry of the spatial pattern, disparity in the strength of local excitation and suppression, and blurring are some of the factors that may impact the efficiency of coding preferred motion. Our results provide insight into how the response is shaped, not only by the stimulus statistics, but also by the complexity of the optic flow. The method used here can be generalized to studies in other systems, including, but not limited to, the coding of complex, multidimensional stimuli.

It has been shown that H1 can operate with a reliability close to the statistical limits set by noise in the visual input (de Ruyter van Steveninck and Bialek, 1995). Other, internal, noise sources (Schneidman et al., 1998; Manwani and Koch, 1999; White et al., 2000) may also limit the information communicated about the stimulus. How large is the distractor noise in comparison with noise from these other sources? We found that distractor motion acts as the primary source of noise for yaw (“signal”) encoding, dominating noise from other sources (Fig. 3). If this were not the case, then we would have observed a minimal decrease in spike train information with an increase in distractor variance. The most direct effect of this distractor noise is a decrease in information rate, which is caused by a reduction in firing precision to repeated yaw (Bialek et al., 1991) at moderate distractor variances, and a reduction in both firing precision and firing rate at very large distractor variances (Optican and Richmond, 1987; Bialek et al., 1991; Victor and Purpura, 1996; Warland et al., 1997).

There are two possible explanations for the above findings. First, at low-to-moderate distractor variance, local variation of yaw components is induced by the distractor due to a combination of the aperture effect (Marr and Ullman, 1981) and varying local flow directions, the latter in the case of roll. The resulting ambiguity between components of pure yaw and those induced by distractor cannot be resolved by a local measurement. Rather, the percept of a wide-field motion is built out of many such local ambiguous measurements. This effectively induces noise, which becomes stronger as distractor variance increases. Second, at very high distractor variance, rapid fluctuations from the distractor cause blurring, leading to diminished image contrast. Since the incoming visual signal is low-pass filtered by photoreceptors (Howard et al., 1987), only low spatial frequencies survive at those high-velocity amplitudes. With a typical spatial correlation length of 6° (Fig. 1*A*) and a pitch SD of 1000°/s, the correlation time is 6°/(1000°/s) = 6 ms, which is of the same order as the photoreceptor integration time (Laughlin and Weckstrom, 1993). The decrease in *R*_{Tot} may be related to these effects.

Previous work suggests that optic flow from natural flight (Kern et al., 2001; Boeddeker et al., 2005; van Hateren et al., 2005; Karmeier et al., 2006) and higher-order motion stimuli (Quenzer and Zanker, 1991; Lee and Nordström, 2012) contain motion components that can evoke a neural response. A strong enough motion along a nonoptimal axis can also induce neural excitation (Karmeier et al., 2003). Our results provide evidence that the structure of the optic flow field shaped by the combined motion plays a crucial role in preferred motion encoding. Since roll generates local motion, with yaw and pitch components speeding up as the distance increases from the center of rotation, it induces both excitation and suppression in H1. But a net positive roll STA (Fig. 5*A*) indicates that excitation is stronger in the ventral half than in the dorsal half, corroborating earlier observations (Eckert, 1980; Hausen, 1981). This means that effective excitation from counterclockwise roll (ventral excitation with dorsal inhibition) overrides effective inhibition from clockwise roll (dorsal excitation with ventral inhibition). Excitation and inhibition could also have different levels of saturation, depending on the stimulus strength (Borst and Egelhaaf, 1990), such that with a strong enough clockwise roll, inhibition in the ventral half cannot overcome excitation in the dorsal half. This may qualitatively describe the asymmetric U shape of the roll response curves (Fig. 9*C*).

The geometry and size of the spatial pattern have also been reported to impact motion response, both in insects (Borst et al., 1993; Meyer et al., 2011; O'Carroll et al., 2011; Saleem et al., 2012) and in vertebrates (Rodman and Albright, 1989). The response of H1 to pitch motion in our case can perhaps be attributed to two factors. First, the apparent horizontal motion induced by vertical movement of slanted pattern edges (i.e., the aperture effect; Marr and Ullman, 1981) can induce excitation of H1, and an excitation/inhibition asymmetry, as noted above, may result in residual net effects. Second, excitatory input from ipsilateral VS1 to H1 also can increase the response sensitivity to downward pitch (Haag and Borst, 2003).

What motion features does H1 encode? The eigenvector profiles demonstrate that H1 preferentially encodes stimulus features such as velocity, acceleration, and jerk, among others (Fig. 7), which was also reported in the study by Brenner et al. (2000a). These types of features are also coded by motion-sensitive neurons in pigeons (Cao et al., 2004) and parieto-insular vestibular cortex neurons in macaques (Chen et al., 2010). Our results further establish that these features carry a large portion of stimulus information. For example, the 2-D subspace spanned by *ê*_{−1Y} and *ê*_{−2Y} contributes 40% of the spike train yaw information. This raises the question of whether natural stimuli are better alternatives to Gaussian stimuli. Studies on fly motion-sensitive neurons under natural conditions (Lewen et al., 2001) and using natural stimulus (Karmeier et al., 2006) suggest that neurons encode features beyond velocity. Further, natural stimuli have been linked to a higher coding efficiency in auditory systems (Rieke et al., 1995; Escabí et al., 2003) and visual systems (Vinje and Gallant, 2002; Felsen et al., 2005), indicating the presence of a richer repertoire of features in such stimuli (Dong and Atick, 1995). However, the statistical complexity of natural stimuli and the strong response nonlinearities make the analyses nontrivial (Egelhaaf and Borst, 1989; van Hateren, 1997; O'Carroll et al., 2011). Nonparametric methods such as Maximally Informative Dimensions (Sharpee et al., 2004) and generalizations of it (Rajan and Bialek, 2013) may help identify which features are relevant and how informative those features are.

It is well known that the visual system adapts to stimulus properties, such as contrast and stimulus variance (Maddess and Laughlin, 1985; Smirnakis et al., 1997; Brenner et al., 2000a; Harris et al., 2000; Fairhall et al., 2001; Baccus and Meister, 2002). This has been linked to efficient signal coding (Barlow, 1961) and an increase in neural information throughput (Laughlin, 1981; Brenner et al., 2000a; Safran et al., 2007). We find that H1 adapts its response to yaw in a way that depends on the global distractor variance even when the instantaneous distractor energy is zero. The collapse of scaled yaw response curves onto a single curve (Fig. 9*A*,*D*) may indicate a universal strategy for scaling. In contrast, a change in the fitting parameter for roll (α*_{R}*) suggests that scaling depends crucially on the structure of the optic flow, which determines the effective wide-field excitation, and not on the stimulus variance alone. Overall, these empirical findings provide insight into how a system might adapt to the combined dynamics of different stimulus components. Whether, and to what extent, this can be understood in a framework of optimal coding (Laughlin, 1981) remains an open question. Since different directions of motion are relevant here, a complete answer would require an analysis of coding across multiple neurons, which would necessitate experiments that simultaneously record from motion-sensitive cells with different directional selectivities (see also van Hateren, 1990).

In summary, we have characterized the features encoded by H1 from a stimulus with competing motion components and have shown how such features shape the neural response. Although scaling of the output to the collective dynamic range of the input might be viewed as a strategy for optimizing information transmission about the preferred motion buried in a complex stimulus, further studies are needed, with multiple neurons with different directional selectivity, to validate this argument. The Gaussian white noise stimulus imposes an obvious limitation in our study, because it lacks the statistical variability and functional relevance of a natural stimulus. But its merit lies in the mathematical tractability (Marmarelis and Marmarelis, 1978), which allows complete characterization of the stimulus feature space. A natural next step is to incorporate stimuli with higher-order correlations and, ultimately, natural stimuli into this information-theoretic framework. Modeling (Prenger et al., 2004; Butts et al., 2011) and nonparametric approaches (Sharpee et al., 2004) will prove particularly useful in such studies for characterizing the physiologically relevant features and their neural representation.

## Footnotes

We thank Anne C. Mennen for participating in experiments [supported by National Science Foundation Grant 1156540 for the Research Experiences for Undergraduates (REU) Program]; and Philip L. Childress for technical support.

The authors declare no competing financial interests.

- Correspondence should be addressed to Rob de Ruyter van Steveninck, Department of Physics, Indiana University Bloomington, Bloomington, IN 47405. deruyter{at}indiana.edu