Abstract
Sensory systems encode the environment in egocentric (e.g., eye, head, or body) reference frames, creating inherently unstable representations that shift and rotate as we move. However, it is widely speculated that the brain transforms these signals into an allocentric, gravity-centered representation of the world that is stable and independent of the observer's spatial pose. Where and how this representation may be achieved is currently unknown. Here we demonstrate that a subpopulation of neurons in the macaque caudal intraparietal area (CIP) visually encodes object tilt in nonegocentric coordinates defined relative to the gravitational vector. Neuronal responses to the tilt of a visually presented planar surface were measured with the monkey in different spatial orientations (upright and rolled left/right ear down) and then compared. This revealed a continuum of representations in which planar tilt was encoded in a gravity-centered reference frame in approximately one-tenth of the comparisons, in intermediate reference frames ranging between gravity-centered and egocentric in approximately two-tenths of the comparisons, and in an egocentric reference frame in less than half of the comparisons. Altogether, almost half of the comparisons revealed a shift in the preferred tilt and/or a gain change consistent with encoding object orientation in nonegocentric coordinates. Through neural network modeling, we further show that a purely gravity-centered representation of object tilt can be achieved directly from the population activity of CIP-like units. These results suggest that area CIP may play a key role in creating a stable, allocentric representation of the environment defined relative to an “earth-vertical” direction.
Introduction
We first encode our environment relative to egocentric reference frames defined by our sensory organs (e.g., the eyes). These representations are consequently unstable, shifting and rotating as we move. In contrast, we perceive the world as stable, leading to the suggestion that the brain relies on efference copies of internal motor commands to stabilize sensory signals (Sperry, 1950; von Holst and Mittelstaedt, 1950; Cullen, 2004; Crapse and Sommer, 2008; Klier and Angelaki, 2008). However, efference copies cannot fully explain how sensory information is stabilized. For instance, the visual scene is perceived relative to the gravitational vector, an “earth-vertical” direction, regardless of our spatial orientation (buildings are seen as vertically oriented even if we are not upright). This reflects that external gravitational signals detected by the vestibular and proprioceptive systems are used to reinterpret egocentrically encoded retinal images in gravity-centered coordinates (De Vrijer et al., 2008). Deficits in the vestibular system or in the ability to combine gravitational and visual signals due to brain injury thus compromise visual stability (Brandt et al., 1994; Funk et al., 2010, 2011; Baier et al., 2012). Similarly, the absence of gravitational signals in space causes astronauts to experience disorienting jumps in perceived orientation (e.g., “upward” suddenly becomes “rightward”; Oman et al., 1986).
Where and how the brain combines gravitational and visual signals is currently unknown. Human psychophysical work suggests that an earth-vertical representation arises late in the visual hierarchy (Mitchell and Blakemore, 1972), and clinical studies implicate parietal cortex (Brandt et al., 1994; Funk et al., 2010). The suggestion that a gravity-centered visual representation originates in parietal cortex is consistent with the region's role in multisensory processing and reference frame transformations (Buneo et al., 2002; Avillac et al., 2005; Chang and Snyder, 2010; Seilheimer et al., 2014), but it is unclear which area(s) may be involved. One possibility is the macaque caudal intraparietal area (CIP), which encodes visual object orientation (Taira et al., 2000; Rosenberg et al., 2013). Here we examine the potential role of CIP in creating a gravity-centered representation of object tilt.
To investigate whether gravity influences the visual responses of CIP neurons, we measured tuning curves for the tilt of a visually presented planar surface with the monkey upright and rolled ear down. Across the population, a heterogeneous but systematic representation was found in which planar tilt was encoded in a range of reference frames continuously distributed between egocentric and gravity-centered. Through neural network modeling, we further show that a purely gravity-centered visual representation can be created directly from a population of units whose response properties quantitatively match those of CIP neurons. Whereas heterogeneous reference frame representations were previously implicated in the transformation of sensory signals between different egocentric coordinates (Buneo et al., 2002; Mullette-Gillman et al., 2009; Chang and Snyder, 2010; McGuire and Sabes, 2011), this result suggests that they also bridge egocentric and allocentric, gravity-centered representations. Together, the present findings demonstrate that an earth-vertical representation of object tilt can be achieved from CIP responses reflecting the combination of visual and gravitational signals.
Materials and Methods
Animal preparation.
Surgeries and procedures were approved by the Institutional Animal Care and Use Committee, and were in accordance with National Institutes of Health guidelines. Three male rhesus monkeys (Macaca mulatta) weighing between 5.0 and 7.5 kg were surgically implanted with a Delrin ring for head restraint and a removable recording grid for guiding electrode penetrations. In separate surgeries, scleral search coils for monitoring three-dimensional (3D) eye position were implanted. Standard operant conditioning procedures were then used to train the monkeys to fixate a visual target within 2° version and 1° vergence windows. Ocular counter-roll (torsion) was measured using previously described methods (Klier et al., 2011).
Data acquisition.
Recording locations were targeted using MRI atlases (Rosenberg et al., 2013) and confirmed physiologically based on the prevalence of 3D orientation tuning for planar surfaces defined by binocular disparity and/or texture cues (Tsutsui et al., 2001). Extracellular action potentials were recorded with epoxy-coated tungsten microelectrodes (FHC) inserted through a transdural guide tube using a hydraulic microdrive. Neural voltage signals were amplified, filtered (1–10 kHz), and displayed on an oscilloscope to isolate single units using a window discriminator (BAK Electronics). Signals were digitized at 25 kHz using a CED Power1401 data acquisition interface (Cambridge Electronic Design) and stored for off-line analysis. Approximately half of CIP neurons are tuned for planar surface orientation (Taira et al., 2000; Rosenberg et al., 2013). We isolated 78 such neurons and maintained stable isolation long enough to complete the protocol for 47 (23 in monkey X, 16 in monkey P, and 8 in monkey U). The others were lost due to the protocol's length and frequent rotations of the animal. The weight the system could support at rolled head–body orientations restricted our ability to perform the experiment as the animals grew, limiting the sample size from individual monkeys. Custom Spike2 scripts were used for behavioral control. During an experiment, a monkey sat in a primate chair 30 cm from an LCD screen on which the planar stimuli were displayed. An aperture constructed from black nonreflective material was centered on the monitor such that the viewable region was a disc with a 30 cm diameter directly in front of the monkey. To prevent visual cues from influencing estimates of earth-vertical (Funk et al., 2011), the same material was used to encase the setup such that only the stimulus was visible.
Planar stimuli were programmed using OpenGL and rendered with a checkerboard texture pattern and binocular disparity cues. They were displayed as red–green anaglyphs, filling the aperture and centered on the screen where fixation was maintained. Surface orientation tuning curves were first measured with the monkey upright. Tilt was sampled in 45° steps over the range 0° ≤ t < 360°. Slant was sampled in 15° steps over the range 0° ≤ s ≤ 60°. See Rosenberg et al. (2013) for a detailed analysis of CIP surface orientation tuning properties (Fig. 1). Tilt tuning curves were then measured at the preferred slant with the monkey at three static head–body orientations: upright and rolled left/right ear down (LED/RED; Fig. 2). The head–body roll amplitude was always 30° in monkeys X and U, and either 20 or 30° in monkey P. The experimental protocol was as follows. First, a head–body orientation was randomly selected and the animal rolled into that orientation (the screen and animal rotated together about the line of sight). Second, to ensure that the vestibulo-ocular reflex had ended before presenting the visual stimuli, there was a 20 s delay between the end of the movement and the start of stimulus presentation. During that time, the monkey could fixate a point at the center of the black screen for fluid reward. Third, each planar stimulus (plus a black screen for some cells) was presented once in random order. The sampling of planar tilt was matched to the head–body roll amplitude, with 12 tilts for 30° rolls and 18 tilts for 20° rolls (equally spaced over 0° ≤ t < 360°). Each trial lasted 1350 ms, during which the monkey fixated a yellow dot at the center of the screen. The screen was black for the first 300 ms, a planar stimulus was then presented for 1 s, and the screen was black for the last 50 ms. The monkey was rewarded if fixation was maintained for the entire duration. The trial was aborted and data discarded if fixation was broken prematurely. This process was then repeated for another randomly selected head–body orientation. The median number of stimulus repetitions at each head–body orientation was seven, the interquartile range was three, and a minimum of three repetitions (the minimum for N = 5 cells) was required for inclusion.
Analysis.
Stimulus-driven firing rates were calculated from the onset of the visual response to the end of the 1 s stimulus presentation. Response latency was defined as the time after stimulus onset at which the spike density function exceeded the average value over the 250 ms preceding the stimulus onset by three SDs for ≥30 ms (Rosenberg et al., 2013). Example spike density functions smoothed using a Gaussian function with a 20 ms SD are shown in Figure 3. Planar tilt tuning curves were analyzed relative to a head reference frame (HRF). Finding no shift between tuning curves measured in upright and rolled head–body orientations therefore implies a HRF. An eye reference frame (ERF) differs from a HRF because of ocular counter-roll, averaging ∼10% of the head roll amplitude (Haslwanter et al., 1992; Klier et al., 2011). If tilt is encoded in an ERF, tilt tuning curves measured in rolled head–body orientations will shift toward a gravity-centered representation by the degree of ocular counter-roll (here ∼2–3°). If tilt is encoded in a gravity-centered reference frame (GRF), tilt tuning curves measured in upright and rolled head–body orientations will shift by the head–body roll amplitude.
The strength of planar tilt tuning was assessed by calculating a discrimination index (DI), which compares the difference in preferred and least-preferred planar tilt responses to the within-stimulus variation in neuronal firing rate. The DI is calculated as follows:
DI = (Rmax − Rmin) / (Rmax − Rmin + 2·RSE),
where Rmax corresponds to the maximum response on the tuning curve (i.e., to the preferred tilt), Rmin corresponds to the minimum response on the tuning curve (i.e., to the least-preferred tilt), and RSE is the square root of the residual variance around the mean responses. This calculation was performed on the square root of the measured firing rates (Prince et al., 2002).
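For concreteness, the following minimal Python sketch (not the authors' code; the data layout and the N − M degrees-of-freedom choice for the residual variance are illustrative assumptions) computes the DI from per-tilt trial firing rates:

```python
import numpy as np

def discrimination_index(trial_rates):
    """DI from per-tilt trial firing rates (dict: tilt -> list of rates, spikes/s).

    Rates are square-root transformed (Prince et al., 2002). Rmax and Rmin are the
    largest and smallest mean responses across tilts; RSE is the square root of the
    residual variance around the per-tilt means (N - M degrees of freedom assumed).
    """
    sqrt_rates = {t: np.sqrt(np.asarray(r, dtype=float)) for t, r in trial_rates.items()}
    means = {t: r.mean() for t, r in sqrt_rates.items()}
    r_max, r_min = max(means.values()), min(means.values())

    # Residual sum of squares around the mean response to each tilt
    sse = sum(((r - means[t]) ** 2).sum() for t, r in sqrt_rates.items())
    n_trials = sum(r.size for r in sqrt_rates.values())
    n_conditions = len(sqrt_rates)
    rse = np.sqrt(sse / (n_trials - n_conditions))

    return (r_max - r_min) / (r_max - r_min + 2.0 * rse)
```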
The effects of head–body orientation on planar tilt tuning were quantified by performing the linear transformation analysis illustrated in Figure 4A–C. The relationship between an upright tilt tuning curve FU(t) and a rolled tilt tuning curve FR(t) was modeled as a change in DC offset (DC), multiplicative gain (G), and preferred tilt (t → t + Δ): FU(t) = DC + G · FR (t + Δ). All tuning curves were first linearly interpolated with 0.1° resolution. Each tuning curve measured in a rolled head–body orientation was then circularly shifted (i.e., rotated) to find the Δ term maximizing its correlation with the upright tuning curve. The DC offset and multiplicative gain terms were then determined simultaneously by minimizing the sum squared error between the upright and circularly shifted version of the rolled tuning curve. The transformation order (shift then scale) was used because a correlation-based method for determining the shift between two tuning curves is insensitive to response scale, whereas the scaling depends on the alignment. Changes in each parameter (e.g., shift vs gain) could be reliably distinguished because the complete 360° tuning curves were measured (Mullette-Gillman et al., 2009; Chang and Snyder, 2010).
To quantify the tuning curve shifts, we calculated a shift index by dividing each measured shift (Δ) by the head–body roll amplitude (±20 or 30°). Positive shift indices correspond to shifts toward a gravity-centered representation, whereas negative shift indices correspond to shifts away from a gravity-centered representation. A shift index of 0 indicates that tilt tuning curves measured in upright and rolled head–body orientations are aligned relative to the head (i.e., tilt is encoded in a HRF). A value of 0.1 indicates the tuning curves are aligned relative to the eyes (i.e., tilt is encoded in an ERF). A value of 1 indicates they are aligned relative to earth-vertical (i.e., tilt is encoded in a GRF). Other positive values between 0.1 and 1 correspond to intermediate reference frames (IRFs) between egocentric and gravity-centered representations.
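The sketch below illustrates the two-step fit and the shift index in Python. It is not the authors' implementation; the handling of the sign convention via a signed roll amplitude is an assumption for illustration.

```python
import numpy as np

def linear_transform_analysis(upright, rolled, roll_amplitude_deg, step_deg=0.1):
    """Fit upright(t) ~ DC + G * rolled(t + shift) for tuning curves interpolated on a
    step_deg grid covering [0, 360).

    Returns the shift (deg), multiplicative gain, DC offset, and shift index.
    roll_amplitude_deg is signed so that positive shift indices correspond to shifts
    toward a gravity-centered representation.
    """
    upright = np.asarray(upright, dtype=float)
    rolled = np.asarray(rolled, dtype=float)
    n = len(upright)

    # Step 1: circularly shift the rolled curve to maximize its correlation with the
    # upright curve (correlation is insensitive to gain and DC offset).
    corrs = [np.corrcoef(upright, np.roll(rolled, s))[0, 1] for s in range(n)]
    best = int(np.argmax(corrs))
    aligned = np.roll(rolled, best)

    # Step 2: with the curves aligned, solve upright = DC + G * aligned by least squares.
    design = np.column_stack([np.ones(n), aligned])
    dc, gain = np.linalg.lstsq(design, upright, rcond=None)[0]

    # Express the shift in degrees on [-180, 180) and normalize by the roll amplitude.
    shift_deg = (best * step_deg + 180.0) % 360.0 - 180.0
    return shift_deg, gain, dc, shift_deg / roll_amplitude_deg
```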
The statistical significance of changes in planar tilt tuning curves with head–body orientation was assessed using permutation tests. Null distributions were defined by comparing bootstrapped tuning curves created under the assumption that head–body orientation has no effect on tilt tuning. Specifically, two bootstrapped tuning curves were created by drawing samples with replacement from all head–body orientations for each tilt in head coordinates, and differences in the two tuning curves were calculated as before. This was repeated 1000 times to define a null distribution against which the actual value from the data was compared. Similarly, for each upright–rolled tilt tuning curve pair, the 95% confidence interval of the shift index was calculated using a bootstrap with 1000 resamplings. Shift indices of 0.1 and 1 were used as boundaries for classifying responses as “egocentric,” “intermediate,” or “gravity-centered.” A comparison of an upright–rolled tilt tuning curve pair was classified as egocentric if the confidence interval included 0 and/or 0.1 (since head and eye reference frames could not be reliably separated; see Results) but not 1, intermediate if the shift index fell between 0.1 and 1 and the confidence interval did not include either 0.1 or 1, and gravity-centered if the confidence interval included 1 but not 0.1. If none of these conditions were met, the comparison was left unclassified.
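A minimal sketch of the classification rule, assuming a precomputed shift index and its bootstrap distribution (illustrative only, not the authors' code):

```python
import numpy as np

def classify_comparison(shift_index, shift_index_boot, erf=0.1, grf=1.0, alpha=0.05):
    """Classify one upright-rolled comparison from its shift index and a bootstrap
    distribution of shift indices (e.g., 1000 resamples), using the rules above."""
    lo, hi = np.percentile(shift_index_boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    in_ci = lambda x: lo <= x <= hi

    if (in_ci(0.0) or in_ci(erf)) and not in_ci(grf):
        return "egocentric"        # CI includes HRF and/or ERF values but not GRF
    if erf < shift_index < grf and not in_ci(erf) and not in_ci(grf):
        return "intermediate"      # point estimate between ERF and GRF, CI excludes both
    if in_ci(grf) and not in_ci(erf):
        return "gravity-centered"
    return "unclassified"
```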
To test whether tuning bandwidth was affected by head–body orientation, a circular variance measure for 2π-periodic data was calculated (Rosenberg and Issa, 2011; Fig. 5) as follows:
V2π = |Σj Rj·e^(i·tj)| / Σj Rj
Here, tj is the jth of N planar tilts, Rj is the average neural response (spikes/s) to the jth tilt, i is the imaginary number, and the vertical bars denote the modulus. A value of 0 indicates the neuron responded equally well to all tilts and 1 indicates it only responded to a single tilt.
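A one-function Python sketch of this measure, assuming the vector-strength form implied by the description above:

```python
import numpy as np

def v2pi(tilts_deg, mean_rates):
    """Tuning concentration for 2*pi-periodic data: 0 = equal responses to all tilts,
    1 = response to a single tilt only."""
    t = np.deg2rad(np.asarray(tilts_deg, dtype=float))
    r = np.asarray(mean_rates, dtype=float)
    return np.abs(np.sum(r * np.exp(1j * t))) / np.sum(r)
```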
Neural network.
A neural network model was used to test whether CIP-like response properties are sufficient to achieve a purely gravity-centered representation of object tilt. Because the network's foundation was previously described (Deneve et al., 2001; Avillac et al., 2005), we summarize its construction and how it was modified. The model's architecture consisted of a gravitational input layer encoding head–body orientation relative to gravity, a visual input layer encoding object tilt in a HRF, an intermediate layer (putatively CIP), and a gravity-centered visual layer that computes object tilt relative to gravity (see Fig. 8). The decoding methods and equations governing the evolution of the intermediate layer units are described in Deneve et al. (2001). Following work extending the model to account for physiological data (Avillac et al., 2005; Fetsch et al., 2007), we initialized the input layers assuming the underlying tuning curves were 2π-periodic von Mises functions V(t) = DC + G·e^(k[cos(t − t0) − 1]) with Poisson noise. Here, t is planar tilt, DC is the DC offset, G is the gain, k sets the tuning bandwidth, and t0 is the preferred tilt. The −1 makes the response amplitude independent of k. To determine the parameters, a von Mises function was fit to each CIP tilt tuning curve (Fig. 3, 6A–C). The median fit values were used in the model: DC, 3.4; gain (G), 31; bandwidth (k), 1.52. The gravity-centered visual layer was initialized with zeros (inactive) since at first the brain has no estimate of object tilt in a GRF (it must be computed). The equation describing the evolution of the gravity-centered visual layer units was as follows:
RGj(t + 1) = [λ·RGj(t) + (1 − λ)·Σl,m ωl,mj·Al,m(t)]² / (S + μ·Σk [λ·RGk(t) + (1 − λ)·Σl,m ωl,mk·Al,m(t)]²)
where RGj(t + 1) is the activity of the jth unit in the gravity-centered visual layer at time t + 1, λ sets the relative weight of the unit's previous state RGj(t) and its input at time t + 1, ωl,mj is the reciprocal connection weight between the jth unit in the gravity-centered visual layer and the intermediate layer unit Al,m, and S and μ are divisive normalization terms. The equations describing the evolution of the input layer units were analogous, and the parameter values were the same as in Deneve et al. (2001).
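As an illustration of the input-layer initialization described above, one might write the following sketch (not the authors' implementation; the number of units and the uniform tiling of preferred tilts are assumptions):

```python
import numpy as np

def von_mises_tuning(t_deg, t0_deg, dc=3.4, gain=31.0, k=1.52):
    """V(t) = DC + G * exp(k * [cos(t - t0) - 1]), with the median CIP fit values
    reported above as default parameters."""
    t, t0 = np.deg2rad(t_deg), np.deg2rad(np.asarray(t0_deg, dtype=float))
    return dc + gain * np.exp(k * (np.cos(t - t0) - 1.0))

def init_visual_input_layer(stimulus_tilt_deg, n_units=100, rng=None):
    """Noisy head-referenced visual input: preferred tilts tile [0, 360) uniformly,
    mean rates follow the von Mises curve, and responses are Poisson draws."""
    rng = np.random.default_rng() if rng is None else rng
    preferred = np.linspace(0.0, 360.0, n_units, endpoint=False)
    return rng.poisson(von_mises_tuning(stimulus_tilt_deg, preferred))
```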
The weights between the input and intermediate layers are important determinants of the intermediate layer units' behavior. A parameter was therefore introduced to modify the weights between the gravitational and intermediate layer units (Avillac et al., 2005; Fetsch et al., 2007). Specifically, each ωl,m term between the gravitational and intermediate layer units was multiplied by a random number drawn from a uniform distribution (one value for each intermediate layer unit). Since some CIP tilt tuning curves had no shift or gain change with changes in head–body orientation, the lower bound of the distribution was fixed to 0 (i.e., some intermediate layer units had no gravitational drive). To find the distribution of weights resulting in an intermediate layer that most closely resembled CIP, the upper bound was varied between 0.55 and 1.05 in steps of 0.05. The distributions of intermediate layer shifts and gains most closely matched those of CIP (minimizing the total root mean squared error) when the upper bound was 0.75. The network was analyzed after it converged to a stable state (40 iterations).
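The sketch below illustrates this weight-scaling search; simulate_network and rmse_to_cip are hypothetical placeholders standing in for the full model simulation and the comparison with the CIP shift/gain distributions.

```python
import numpy as np

def scaled_gravity_weights(base_weights, upper, rng):
    """Multiply each intermediate-layer unit's gravitational input weights by a factor
    drawn from Uniform(0, upper); one factor per intermediate-layer unit.
    base_weights is assumed to have shape (n_intermediate_units, n_gravity_units)."""
    factors = rng.uniform(0.0, upper, size=base_weights.shape[0])
    return base_weights * factors[:, None]

def find_best_upper_bound(base_weights, simulate_network, rmse_to_cip, rng):
    """Sweep the upper bound (0.55-1.05 in 0.05 steps) and keep the value whose
    intermediate-layer shift/gain distributions best match CIP (minimum total RMSE)."""
    bounds = np.arange(0.55, 1.051, 0.05)
    errors = []
    for ub in bounds:
        weights = scaled_gravity_weights(base_weights, ub, rng)
        shifts, gains = simulate_network(weights)   # hypothetical: runs the full model
        errors.append(rmse_to_cip(shifts, gains))   # hypothetical: compares with CIP data
    return float(bounds[int(np.argmin(errors))])
```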
Results
The spatial orientation of a fixated planar object can be described by two angular variables called tilt and slant (Stevens, 1983). Tilt is a rotation about the line of sight (0° ≤ t < 360°) and slant is a rotation about an axis perpendicular to the line of sight (0° ≤ s < 90°). These variables define a polar coordinate system for surface orientation (Fig. 1A). Approximately half of CIP neurons are tuned for planar slant–tilt (Rosenberg et al., 2013), as illustrated for two cells in Figure 1B (left). An experimental protocol consisting of eight tilts and five slants allowed for the identification of each cell's preferred surface orientation (with the animal upright), such that tilt tuning could be examined at the preferred slant (Fig. 1B, right).
Visual encoding of planar surface orientation. A, Tilt (angular variable; blue) and slant (radial variable; red) are polar coordinates describing the orientation of a planar object. Tilt specifies the direction the plane leans in depth (e.g., left-to-right or front-to-back) and slant specifies the magnitude of the depth gradient (how much it leans). Stimuli were rendered with texture and disparity cues, and viewed through red–green stereoglasses (screenshots shown). They were always presented directly in front of the monkey and centered on the fixation point (small yellow dot). B, Surface orientation tuning curves of two CIP neurons measured with the monkey upright are shown in the left column. Firing rate is color coded and responses are baseline subtracted. The top neuron prefers a slant of ∼30° and a tilt of ∼60° (a plane with the upper right side closest to the monkey). The bottom neuron prefers a slant of ∼60° and a tilt of ∼255° (a plane with the lower left side closest to the monkey). Black circles in the surface orientation plots correspond to tilt tuning curves at constant (preferred) slants, which are plotted in the right column. Error bars show SEM.
To assess the reference frame in which individual CIP neurons encode planar tilt, tilt tuning curves were then measured at the preferred slant with the monkey in different head–body orientations (Fig. 2): upright and rolled LED/RED by 20 or 30°. The monitor was fixed to the setup such that it rolled with the monkey about the line of sight. Thus, the animal's head–body orientation changed the planar tilt in egocentric coordinates but left the slant angle unaffected, dissociating egocentric from gravity-centered representations of planar tilt (Fig. 2B). Only the planar stimuli were visible.
Reference frames for encoding planar tilt. A, In an upright head–body orientation, head (yellow), eye (red), and gravity-centered (cyan) reference frames align. Eight planar tilts with the same slant are illustrated. Stimuli were always presented directly in front of the monkey and centered on the fixation point (small yellow dot). B, In a rolled head–body orientation, the reference frames dissociate. Head and gravity-centered reference frames differ by the head–body roll. Eye and head reference frames differ by ∼10% of the head–body roll because of ocular counter-roll. Planar tilts are labeled in head (yellow) and gravity-centered (cyan) coordinates for an illustrated 45° roll (LED). Rolling the monkey ear down does not affect the plane's slant.
There are several ways a neural population can encode object tilt relative to gravity. One possibility is that tilt is encoded egocentrically, but with gain fields that modulate response amplitude with head–body orientation (Zipser and Andersen, 1988). Potential egocentric representations include an ERF or a HRF. The two differ because of ocular counter-roll, a reflexive eye movement that occurs when the head rolls, rotating the eyes in the opposite direction (Haslwanter et al., 1992). To assess the difference between these representations, counter-roll was measured in two of the monkeys. For 20° head–body rolls, the counter-roll averaged over LED and RED rotations was 2.06° in monkey P. For 30° head–body rolls, the average counter-roll was 2.32° in monkey P and 3.19° in monkey X. This indicates that ocular counter-roll rotates the ERF toward a GRF by ∼10% of the head–body roll amplitude (Fig. 2B). A second possibility is that tilt is directly encoded relative to earth-vertical (i.e., in a GRF). Last, tilt may be encoded in a range of intermediate reference frames (IRFs) distributed between egocentric and gravity-centered, potentially in conjunction with gain fields (Buneo et al., 2002; Avillac et al., 2005; Fetsch et al., 2007; Mullette-Gillman et al., 2009; Chang and Snyder, 2010; McGuire and Sabes, 2011). These possibilities can be differentiated by examining the effects of head–body orientation on the preferred tilt and gain of visual responses.
Dependence of planar tilt tuning on head–body orientation: linear transformation analysis
Planar tilt tuning curves and spike density functions measured with the monkey upright and rolled 30° LED/RED are shown for a single cell in Figure 3. The tuning curves show average firing rates calculated from the onset of the visual response to the end of the 1 s stimulus presentation. Because the tuning curves are plotted in head coordinates, no shift between upright and rolled tuning curves implies a HRF, 3° shifts in tilt preference leftward for LED rolls and rightward for RED rolls imply an ERF, and 30° shifts in the same directions imply a GRF. This cell shows shifts consistent with an IRF representation (i.e., shifting toward but not reaching a GRF) and relatively little change in response gain.
Planar tilt tuning curves and spike density functions measured at three head–body orientations. Central plot shows tilt tuning curves and von Mises fits for a single cell with the monkey LED 30° (blue), upright (green), and RED 30° (magenta). Data are plotted in head coordinates using a polar representation. The angular variable is planar tilt and the radial variable is firing rate. Spike density functions for each tilt and head–body orientation are plotted in the same colors as the tuning curves. Time courses show the 1 s duration of stimulus presentation. Lower left inset shows the tuning curves plotted on a line as in Figure 1B and the rest of the paper. The shifts in the tuning curves with head–body orientation are consistent with an intermediate reference frame representation of planar tilt.
Figure 4A shows planar tilt tuning curves of a cell that had shifts consistent with an IRF representation as well as large gain changes. To quantify the shifts and gain changes without confounding them (see Materials and Methods), each tuning curve measured in a rolled head–body orientation was first circularly shifted (i.e., rotated) to maximize its correlation with the upright tuning curve (Fig. 4B). Multiplicative gain and DC offset terms were then determined simultaneously by minimizing the sum squared error between the upright and circularly shifted version of the rolled tuning curve (Fig. 4C). A shift index was calculated by dividing the measured shift by the head–body roll amplitude. Positive values correspond to shifts toward a gravity-centered representation, whereas negative values correspond to shifts away from a gravity-centered representation. A shift index of 0 indicates that tilt tuning curves measured in upright and rolled head–body orientations are aligned relative to the head (i.e., tilt is encoded in a HRF), a value of 0.1 indicates alignment relative to the eyes (an ERF), and 1 indicates alignment relative to earth-vertical (a GRF). Positive values between 0.1 and 1 correspond to IRFs between egocentric and gravity-centered.
Dependence of planar tilt tuning on head–body orientation (linear transformation analysis). A, Tilt tuning curves of a single cell measured with the monkey LED 30°, upright (UP), and RED 30°. Shading shows the 95% confidence interval. Data are plotted in head coordinates. B, LED and RED tuning curves circularly shifted to maximize their correlation with the UP tuning curve. The LED shift was 17° (shift index, 0.57) and the RED shift was 18° (shift index, 0.60), indicating that the cell encoded planar tilt in an IRF between egocentric and gravity-centered. C, Shifted LED and RED tuning curves with DC offset and multiplicative gain terms applied. The LED gain was 1.5 and the RED gain was 0.8. The LED DC offset was 0.9 and the RED DC offset was −2.0. D, E, Each plot summarizes 92 upright–rolled tuning curve pairs from 47 cells. Data from each monkey are plotted with a different symbol. The green data point in each plot is the cell in A–C. D, Scatter plot and marginal distributions of LED (N = 47) and RED (N = 45) shift indices. A shift index of 0 corresponds to a HRF (yellow plus), 0.1 to an ERF (red plus), and 1 to a GRF (cyan plus). The green plus marks the population average. The distributions are colored according to response classifications based on 95% confidence intervals. E, Scatter plot and marginal distributions of LED and RED gains. Black shading indicates significant gain changes.
The vast majority of shift indices (79 of 92 upright–rolled tuning curve pairs from 47 cells) were positive, indicating that tuning generally shifted in the direction of a gravity-centered representation. Figure 4D shows a scatter plot of shift indices measured for LED and RED head–body orientations along with marginal distributions. The median difference between matched LED and RED shift indices was not significantly different from 0 (sign test, p > 0.9), but the shift indices were also not correlated (Pearson r = 0.01, p > 0.9). This implies that for individual cells, LED and RED tuning curve shifts were often asymmetrical, consistent with results from egocentric reference frame investigations in other areas (Galletti et al., 1993; Duhamel et al., 1997; Chen et al., 2013a,b). Across the 92 tilt tuning curve pairs (pooled, since the LED and RED shift indices were uncorrelated but not different in magnitude), the average shift index was 0.30 and significantly different from the values corresponding to eye, head, and gravity-centered reference frames (Wilcoxon signed rank test, p < 0.001). This was also true for each monkey separately (p ≤ 0.02), with an average shift index of 0.21 (monkey P, 31 pairs), 0.31 (monkey X, 45 pairs), and 0.39 (monkey U, 16 pairs).
To help interpret the measured shifts, each was classified as egocentric, intermediate, or gravity-centered using the bootstrapped 95% confidence interval of the shift index (Materials and Methods). The representation of planar tilt was classified as egocentric for 45 of 92 (∼49%), intermediate for 26 of 92 (∼28%), and gravity-centered for 6 of 92 (∼7%). The remaining 15 of 92 (∼16%) were not statistically classifiable (Fig. 4D). It is important to note that these classifications do not reflect discrete cell types since a continuum of representations was observed across the population. This is consistent with heterogeneous sensory representations in other areas, which also vary continuously (Buneo et al., 2002; Avillac et al., 2005; Mullette-Gillman et al., 2009; Chang and Snyder, 2010; McGuire and Sabes, 2011). These findings demonstrate that most CIP neurons encode planar tilt in a range of reference frames distributed between egocentric and gravity-centered.
To test whether the shifts were related to the strength of tilt tuning, we calculated a DI for each neuron assessing the difference in preferred and least-preferred tilt responses (Materials and Methods). Values closer to 0 indicate weaker tuning whereas values closer to 1 indicate stronger tuning. No significant differences were found in the average DIs between upright (0.69 ± 0.09 SD, N = 47), egocentric (0.70 ± 0.08 SD, N = 45), intermediate (0.73 ± 0.08 SD, N = 26), and gravity-centered (0.64 ± 0.10 SD, N = 6) responses (ANOVA, p ≥ 0.21). However, the tuning strength for the unclassified responses (0.59 ± 0.07 SD, N = 15) was significantly lower than the upright, egocentric, and intermediate responses (p ≤ 0.003). Thus, differences in tuning strength cannot account for the existence of egocentric, intermediate, and gravity-centered classifications, but may explain why some responses were not statistically classifiable. We further examined whether weaker tuning could account for shifts away from (shift index, <0) or beyond (shift index, >1) a gravity-centered representation (Chang and Snyder, 2010). Although the average DI was higher for shifts within the egocentric to gravity-centered bounds (0 ≤ shift index ≤ 1; 0.70 ± 0.09 SD, N = 77) than outside of these bounds (0.63 ± 0.09 SD; N = 15), the difference was not significant (p = 0.06). Theoretical work suggests that cells with “out-of-bound” shifts occur naturally in neural implementations of reference frame transformations (Blohm et al., 2009).
Significant effects were also observed on the response gain of some CIP neurons. A scatter plot of LED and RED gains is shown in Figure 4E along with marginal distributions. In 17 of 92 comparisons (∼18%), there was a significant gain change (permutation test, p < 0.05), and the median difference between matched LED and RED gains was not significantly different from 0 (sign test, p = 0.66). Altogether, 41 of 92 comparisons (∼45%) had a significant tuning curve shift and/or a gain change. In 12 of 92 comparisons (∼13%), there was a significant change in the DC offset (permutation test, p < 0.05), and the median difference between matched LED and RED DC offsets was not significantly different from 0 (sign test, p = 0.28).
The sufficiency of the linear transformation analysis to capture the effects of head–body orientation on the visual responses of CIP neurons was supported by two findings. First, the proportion of explained variance between upright and transformed rolled tuning curves was on average 0.93 ± 0.06 SD (N = 92). Second, head–body orientation did not have a significant effect on tuning bandwidth, which otherwise would have implied a nonlinear transformation. Tilt tuning curve bandwidths at each head–body orientation were compared using a circular variance measure, V2π (Rosenberg and Issa, 2011). The difference in V2π measured when the monkey was upright versus either LED or RED was not significantly different from 0 (sign test: LED, p = 0.14, N = 47; RED, p = 0.23, N = 45), and the upright and rolled values were highly correlated (Fig. 5). This suggests that gravitational signals resulted predominantly, if not entirely, in linear transformations of the planar tilt tuning curves.
Tuning bandwidth does not depend on head–body orientation. Scatter plot of tilt tuning curve bandwidths (V2π) measured in upright versus rolled LED (N = 47) and RED (N = 45) head–body orientations. A value of 0 indicates the cell responded equally well to all planar tilts and 1 indicates it responded to only one tilt. The average V2π upright was 0.34 ± 0.14 SD (N = 47 cells). Upright and rolled tuning bandwidths were highly correlated (LED: r = 0.95, p < 0.0001; RED: r = 0.96, p < 0.0001). The unity line is plotted in black.
Dependence of planar tilt tuning on head–body orientation: von Mises fit analysis
To confirm the findings of the linear transformation analysis, we fit each tilt tuning curve with a von Mises function and assessed the effects of head–body orientation by comparing the fitted parameters. For example, based on the linear transformation analysis, the cell shown in Figure 3 had a LED shift index of 0.67 and gain of 0.70, and a RED shift index of 0.77 and gain of 1.06. Based on the von Mises fits, the cell had a LED shift index of 0.69 and gain of 0.63, and a RED shift index of 0.82 and gain of 0.97. Tuning curves and von Mises fits are shown for three additional cells in Figure 6A–C. Figure 6D shows a scatter plot of shift indices measured for LED and RED head–body orientations along with marginal distributions. On average, the shift index was 0.33, and the representation of planar tilt was classified as egocentric in 42 of 92 comparisons (46%), intermediate in 11 of 92 (12%), and gravity-centered in 11 of 92 (12%). The remaining 28 of 92 (30%) were not statistically classifiable. The larger number of gravity-centered and unclassified responses for the von Mises analysis than the linear transformation analysis reflects that the 95% confidence intervals of the shift indices were on average 1.67 times larger based on the von Mises fits.
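A hedged sketch of this fitting approach using standard least-squares optimization (not the authors' code; the initialization values and the reading of the gain term as a ratio of fitted amplitudes are assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

def von_mises(t_deg, dc, gain, k, t0_deg):
    # 2*pi-periodic von Mises tuning curve, as defined in Materials and Methods
    t, t0 = np.deg2rad(t_deg), np.deg2rad(t0_deg)
    return dc + gain * np.exp(k * (np.cos(t - t0) - 1.0))

def fit_von_mises(tilts_deg, rates):
    """Fit DC offset, gain, bandwidth (k), and preferred tilt to a tilt tuning curve."""
    tilts_deg = np.asarray(tilts_deg, dtype=float)
    rates = np.asarray(rates, dtype=float)
    p0 = [rates.min(), np.ptp(rates), 1.5, tilts_deg[int(np.argmax(rates))]]
    params, _ = curve_fit(von_mises, tilts_deg, rates, p0=p0, maxfev=10000)
    return dict(zip(["dc", "gain", "k", "t0_deg"], params))

def compare_fits(fit_upright, fit_rolled, roll_amplitude_deg):
    """Shift index (preferred-tilt difference / signed roll amplitude) and gain ratio
    for an upright-rolled pair; the sign convention follows the roll direction."""
    dt = (fit_rolled["t0_deg"] - fit_upright["t0_deg"] + 180.0) % 360.0 - 180.0
    return dt / roll_amplitude_deg, fit_rolled["gain"] / fit_upright["gain"]
```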
Dependence of planar tilt tuning on head–body orientation (von Mises fits). A–C, Tilt tuning curves and von Mises fits for three additional cells tested at three head–body orientations: LED (blue), upright (UP; green), and RED (magenta). A, Roll amplitude, 30°. LED: shift index, −0.13 (a small shift away from a GRF); gain, 2.1. RED: shift index, 0; gain, 1.32. This cell had gain changes but no clear shifts in tilt preference. B, Roll amplitude, 20°. LED: shift index, 0.28; gain, 0.98. RED: shift index, 0.26; gain, 1.0. This cell encoded planar tilt in an IRF, but had no gain changes. C, Roll amplitude, 30°. LED: shift index, 0.61; gain, 0.77. RED: shift index, 1.42; gain, 1.01. This cell had both a shift and a gain change LED, and a shift RED. D, E, Each plot summarizes 92 upright–rolled tuning curve pairs from 47 cells. Data from each monkey are plotted with a different symbol. D, Scatter plot and marginal distributions of LED (N = 47) and RED (N = 45) shift indices. A shift index of 0 corresponds to a HRF (yellow plus), 0.1 to an ERF (red plus), and 1 to a GRF (cyan plus). The green plus marks the population average. The distributions are colored according to response classifications based on 95% confidence intervals. E, Scatter plot and marginal distributions of LED and RED gains. Black shading indicates significant gain changes.
To test whether the shifts were related to the quality of the von Mises fits, we compared the fit correlations for tuning curves measured in rolled head–body orientations as a function of classification. The average fits were as follows: r = 0.94 ± 0.05 SD (N = 42) for egocentric, 0.95 ± 0.04 SD (N = 11) for intermediate, 0.94 ± 0.03 SD (N = 11) for gravity-centered, and 0.86 ± 0.07 SD (N = 28) for unclassified responses. For comparison, the average fit was 0.92 ± 0.05 SD upright and 0.92 ± 0.06 SD across all 139 tuning curves (N = 47 cells). There were no significant differences between upright, egocentric, intermediate, and gravity-centered responses (ANOVA, p ≥ 0.86), but the fits for the unclassified responses were significantly lower than all others (p ≤ 0.002). Thus, differences in the quality of the fits cannot account for the existence of egocentric, intermediate, and gravity-centered classifications, but may explain why some responses were not statistically classifiable. In addition, the fit correlations were not significantly different for shift indices within the egocentric to gravity-centered bounds (0.92 ± 0.06 SD, N = 66) versus outside of these bounds (0.91 ± 0.08 SD, N = 26), indicating that fit quality cannot explain the out-of-bound shifts (Chang and Snyder, 2010).
The von Mises analysis also revealed a significant gain change in 20 of 92 (∼22%) upright–rolled tilt tuning curve comparisons (permutation test, p < 0.05; Fig. 6E). The median difference between matched LED and RED gains was not significantly different from 0 (sign test, p > 0.9). In 13 of 92 comparisons (∼14%), there was a significant change in the DC offset (permutation test, p < 0.05), and the median difference between matched LED and RED DC offsets was not significantly different from 0 (sign test, p = 0.74). Additionally, head–body orientation rarely had a significant effect on tuning bandwidth (5 of 92 comparisons, ∼5%), further supporting the suggestion that gravitational signals resulted predominantly, if not entirely, in linear transformations of the planar tilt tuning curves.
Importantly, the tuning curve shifts could not be fully explained by ocular counter-roll. To determine the largest possible shift index that can be attributed to counter-roll, each of the measured counter-rolls (reported above) was divided by the head–body roll amplitude and then averaged. The average normalized measurement was equal to 0.1 (10% of the head–body roll). Because the average shift index was ≥3× larger than this and the distribution of shift indices was significantly different from 0.1, ocular counter-roll cannot fully explain the shifts. Moreover, although the visual input to CIP from V3A may include both eye-centered and head-centered representations (Galletti and Battaglini, 1989; Nakamura et al., 2001), it was not possible to reliably differentiate an ERF (a shift index of 0.1) from a HRF (a shift index of 0). This is because the angular difference between eye and head reference frames (here ∼2–3°) was substantially smaller than the width of the planar tilt tuning curves. In the upright head–body orientation, the average full-width at half-height of the tilt tuning curves calculated from the von Mises fits was 138 ± 53° SD (N = 43 cells; not defined for four cells).
To examine whether the shifts were related to the anatomical locations of the cells, we averaged the LED and RED shift indices for each cell and correlated this with the anterior–posterior, medial–lateral, and dorsal–ventral locations. There were no significant correlations for individual animals or with the data combined across animals (aligned to the average location in each animal) for either analysis method. We additionally found no significant correlation between the shift index and response latency (linear transformation analysis: r = 0.01, p = 0.92; von Mises fits: r = −0.07, p = 0.54). Last, to determine whether the shifts were related to the preferred planar tilt, upright tilt preferences estimated from the von Mises fits were expressed relative to horizontal (wrapped between 0 and 90°) and correlated with the shift index. Both analysis methods revealed weak correlations such that cells preferring more vertical (90 or 270°) than horizontal (0 or 180°) tilts tended to encode planar tilt closer to a GRF (linear transformation analysis: r = 0.10, p = 0.33; von Mises fits: r = 0.18, p = 0.08). Although neither correlation reached statistical significance, both were positive in sign, consistent with psychophysical results showing that humans are slightly more accurate in their judgments of subjective visual vertical than horizontal (Betts and Curthoys, 1998).
Asymmetrical gravitational drive explains differences in tuning shifts
We found that LED and RED tilt tuning curve shifts in CIP are often asymmetrical. Similar asymmetries are found in egocentric reference frame transformations (Galletti et al., 1993; Duhamel et al., 1997; Chen et al., 2013a,b), but it is unclear why they exist. One possibility is measurement error, but alternatively the asymmetry of tuning curve shifts in CIP may be due to differences in the “gravitational drive” individual neurons receive in LED and RED head–body orientations. To test this, we measured activity during fixation of a black screen (no visual stimulation) for 38 cells in upright and rolled head–body orientations, and took the absolute difference between the upright and rolled responses as a measure of gravitational drive. For the RED head–body orientation, the gravitational drive was significantly correlated with the magnitude of the tuning curve shift (Spearman r = 0.46, p = 0.005), and for LED it approached significance (r = 0.30, p = 0.06). This indicates that cells with greater gravitational drive had larger tilt tuning curve shifts (Fig. 7A). It also explains individual differences in tuning shifts across monkeys: they had the same rank order (P, X, then U; least to greatest) whether they were ranked by the average gravitational drive or average shift index. More importantly, the absolute difference in gravitational drive measured in LED and RED head–body orientations was significantly correlated with the absolute difference in the magnitude of LED and RED tuning curve shifts (Spearman r = 0.39, p = 0.01; Fig. 7B). Thus, when there was a larger asymmetry in the gravitational drive between LED and RED head–body orientations, there was a larger difference in the magnitude of the LED and RED tuning curve shifts. This suggests that CIP receives gravitational signals reflecting head–body orientation, and that these signals influence the visual representation of object tilt.
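A minimal sketch of this analysis in Python (illustrative only; per-cell arrays of baseline firing rates and shift magnitudes are assumed as inputs):

```python
import numpy as np
from scipy import stats

def gravitational_drive(baseline_upright, baseline_rolled):
    """Gravitational drive: absolute difference in baseline firing (fixation of a black
    screen) between rolled and upright head-body orientations, per cell."""
    return np.abs(np.asarray(baseline_rolled, float) - np.asarray(baseline_upright, float))

def drive_vs_shift(baseline_upright, baseline_rolled, shift_magnitudes_deg):
    """Spearman correlation between gravitational drive and the magnitude of the tilt
    tuning curve shift across cells."""
    drive = gravitational_drive(baseline_upright, baseline_rolled)
    rho, p = stats.spearmanr(drive, shift_magnitudes_deg)
    return rho, p
```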
Asymmetrical gravitational drive explains differences in LED and RED planar tilt tuning curve shifts. A, The magnitude of the tuning curve shift is plotted against the gravitational drive (the absolute difference in responses measured during fixation of a black screen in rolled and upright head–body orientations; LED: N = 38; RED: N = 37). Greater gravitational drive predicts larger tuning curve shifts. B, The absolute difference in the magnitude of LED and RED tuning curve shifts is plotted against the absolute difference in LED and RED gravitational drives (N = 37). Greater differences in gravitational drive predict larger asymmetries in the LED and RED shifts. For each data set, a type-II regression line is shown in the same color.
Achieving a gravity-centered visual representation of object tilt
Computational studies suggest there are several ways neural populations can implement reference frame transformations. For example, given a population of units that encode the position of a visual target in retinal coordinates but which have gain changes with eye position, a back-propagation network can learn a set of weights for combining the population activity to compute the target location in head coordinates (Zipser and Andersen, 1988). Such transformations can also be achieved by a feed-forward network that combines the activity of a population of units encoding sensory information in heterogeneous reference frames (Blohm et al., 2009), or similarly by a network with bidirectional connections that performs probabilistic inference (Deneve et al., 2001; Beck et al., 2011; Seilheimer et al., 2014).
To examine whether the response properties of neurons in area CIP are sufficient to create a purely gravity-centered representation of object tilt, we implemented a neural network that has biologically realistic (bidirectional) connectivity between layers, performs probabilistic inference near optimally, and exhibits diverse reference frames (Deneve et al., 2001; Beck et al., 2011). The network architecture is illustrated in Figure 8A. It includes a gravitational input layer encoding head–body orientation relative to gravity (putatively the source of gravitational drive) and a visual input layer encoding object tilt in a HRF. The multisensory combination of these signals is performed by an intermediate layer (putatively CIP), where heterogeneous reference frames arise. Although the model does not separate eye-centered and head-centered visual signals, which may both project to CIP (Galletti and Battaglini, 1989; Nakamura et al., 2001), the simplification is justified since the two representations are too similar here to distinguish. A purely gravity-centered visual representation of object tilt can then be achieved through a weighted combination of the intermediate layer units' activities.
Achieving a gravity-centered visual representation of object tilt. A, A neural network with input layers encoding head–body orientation relative to gravity (pink units) and object tilt in a HRF (yellow units), an intermediate layer (putatively CIP; gray units), and a layer that computes object tilt in a GRF (cyan units). The input layers are initialized with noisy estimates of head–body orientation and object tilt, and the gravity-centered layer is initially inactive since at first the brain has no estimate of object tilt relative to gravity (it must be computed). Network dynamics transform initial states (open black circles) into smooth hills of population activity (filled black circles) whose peaks provide orientation estimates. B, Shift and gain distributions for CIP (LED and RED measurements pooled; black) and the intermediate layer (average ±SD of 100 simulations; gray). C, Tilt tuning curves of an intermediate layer unit in upright (UP) and rolled 30° LED/RED head–body orientations. For this unit, the LED shift was 15° (shift index, 0.51) and the RED shift was 14° (shift index, 0.45). The LED gain was 0.90 and the RED gain was 1.32. D, Object tilt estimates from the visual input layer versus the gravity-centered visual layer in three head–body orientations. The estimates are the same upright (green; lying along the identity line) but displaced vertically by the roll amplitude in rolled head–body orientations (blue and magenta), indicating that a gravity-centered visual representation was achieved. Data points show the average ±SD of 100 simulations.
To conclude that a purely gravity-centered representation of object tilt can be computed from CIP population activity, the response properties of the intermediate layer units and CIP neurons must match quantitatively. To compare their responses, we calculated shifts and gains for the intermediate layer units using the linear transformation analysis described above. A key determinant of the intermediate layer response properties is the input layer weights (Avillac et al., 2005; Fetsch et al., 2007). When the input layers were equally weighted, the tilt tuning curves shifted more with head–body orientation (average shift index, 0.5) than those of CIP neurons, and the distributions of shift indices were significantly different (Kolmogorov–Smirnov test, p = 0.005). A parameter controlling the relative weight of the visual and gravitational signals was therefore introduced and optimized to match the intermediate layer response properties to those of CIP (see Materials and Methods). We found that if the gravitational weights varied uniformly between 0 and 75% as strong as the visual weights, then the intermediate layer units behaved quantitatively like CIP neurons (Fig. 8B,C). The distributions of CIP and intermediate layer shift indices and gains were not significantly different (Kolmogorov–Smirnov test; shifts, p = 0.89; gains, p = 0.80), and the root mean squared errors between the distributions were small (shifts, 0.014; gains, 0.044). The stronger weighting of visual than gravitational input suggests that the responses of surface orientation-selective CIP neurons are visually dominated. Importantly, the network achieved a purely gravity-centered visual representation of object tilt with these weights (Fig. 8D). This demonstrates that CIP-like population activity is sufficient to bridge egocentric and allocentric, gravity-centered representations of visual orientation.
Discussion
Gravity plays a critical role in shaping our experience of the world, influencing both sensory perception and motor planning at fundamental levels (Zago and Lacquaniti, 2005; MacNeilage et al., 2007; Gaveau et al., 2011; Senot et al., 2012). Yet, the question of how gravitational signals affect visual neural responses has remained largely unexplored. In this study, we found that gravity influences the visual responses of neurons in macaque area CIP, resulting in a heterogeneous but systematic representation in which planar tilt is encoded in a range of reference frames continuously distributed between egocentric and gravity-centered. A unique form of multisensory processing thus occurs at the level of CIP, implementing a reference frame transformation using an estimate of the external gravitational vector rather than internal efference copies. Importantly, a sizeable number of CIP neurons encoded an allocentric, gravity-centered representation of visual orientation that was independent of the monkey's spatial pose. Neural network modeling additionally showed that a purely gravity-centered visual representation can be created directly from a population of units with CIP-like response properties. These results together reveal how the brain may achieve an earth-vertical representation of object orientation through the combination of visual and gravitational signals.
Gravitational signals were previously suggested to affect the visual responses of a minority of cells in cat V1 (Denney and Adorjani, 1972; Horn et al., 1972; Tomko et al., 1981), but the results were not compelling. Similar effects were observed both before and after eliminating gravitational signals through high cervical transection of the spinal cord or bilateral labyrinthectomy (Horn et al., 1972), suggesting an alternative explanation based on fluctuations in arousal (Schwartzkroin, 1972; Tomko et al., 1981). The shifts were also not systematic: they were as likely to occur away from a gravity-centered representation as toward such a representation. In contrast, we observed a systematic shift at the population level (Figs. 4D, 6D). Two previous studies examined the effects of gravity on the visual responses of neurons in the early visual cortex of primates. The first reported findings similar to those in the cat (Sauvan and Peterhans, 1999), but had multiple methodological issues and was contradicted by the second, which found no indication of a gravity-centered representation in V1 (Daddaoua et al., 2014). Consistent with the present results, clinical studies suggest gravity-centered representations arise in parietal cortex (Brandt et al., 1994; Funk et al., 2010; Guardia et al., 2012), an important locus of multisensory processing and reference frame transformations (Buneo et al., 2002; Avillac et al., 2005; Mullette-Gillman et al., 2009; Chang and Snyder, 2010; Seilheimer et al., 2014).
But where do the underlying visual and gravitational signals originate? Between V1 and CIP lies V3A (Nakamura et al., 2001), a likely source of egocentric visual representations (the visual input layer in our model) since it can relay both eye-centered and head-centered visual signals (Galletti and Battaglini, 1989). However, the creation of a gravity-centered visual representation may also begin earlier than CIP, perhaps in V3A or another earlier area, though there is no clear evidence supporting this possibility (Sauvan and Peterhans, 1999; Daddaoua et al., 2014). It is also possible that the combination of gravitational and visual signals occurs after CIP, and that the effects observed here reflect feedback. Both possibilities leave room for future investigations, but the correlation between gravitational drive and tuning curve shift (Fig. 7) suggests that the computation may be occurring, at least in part, in CIP. A potential origin of the gravitational input is the caudal cerebellar vermis, which contains a neural estimate of the orientation of the self relative to gravity (Laurens et al., 2013). In future work, it will be important to examine whether other cortical areas, such as the visual posterior sylvian (Dicke et al., 2008; Chen et al., 2011) or the parietoinsular vestibular cortex (Brandt et al., 1994; Chen et al., 2010), contribute to the gravity-centered encoding of visual signals.
We found that differences in LED and RED gravitational drives predict asymmetries in tuning curve shifts at the level of single cells. Analogous differences in efference copy signals may potentially explain asymmetrical tuning curve shifts observed in egocentric reference frame transformations (Galletti et al., 1993; Duhamel et al., 1997; Chen et al., 2013a,b). Whereas heterogeneous reference frame representations are often implicated in the transformation of sensory signals between different egocentric coordinates (Buneo et al., 2002; Mullette-Gillman et al., 2009; Chang and Snyder, 2010; McGuire and Sabes, 2011), we found that they may also bridge egocentric and allocentric, gravity-centered representations. A previous study varying the animal's yaw reported allocentrically referenced positional gain fields in macaque area 7a, which were interpreted as “world-centered” (Snyder et al., 1998). However, because head–body orientation was not varied relative to gravity, it is unknown whether gain fields in area 7a are truly world-centered in the sense that this implies “referenced to gravity.”
In addition to encoding a gravity-centered representation of visual orientation, what other advantages may the heterogeneous reference frame representation we found in CIP confer? It has been suggested that such representations increase the flexibility of neural coding (Chang and Snyder, 2010). In the case of CIP, this may allow the brain to represent an object's orientation in multiple behaviorally relevant reference frames. For example, while a GRF is necessary for determining how an object is posed within the environment, an ERF is more effective for discriminating relative orientations between two objects because the transformation to a gravity-centered representation is both unnecessary for the task and detrimental since it introduces noise (Sober and Sabes, 2005; De Vrijer et al., 2008; Burns and Blohm, 2010). This increase in noise is also evident in our modeling results, which show larger error bars for gravity-centered than egocentric tilt estimates (Fig. 8D). By reweighting the activity of individual CIP neurons, it may be possible to decode object tilt in the most effective reference frame for performing a task (Deneve et al., 2001; Pesaran et al., 2006). Several experimental findings support this possibility. Human psychophysical data show that the same sensory signals can be reweighted to perform different computations (Sober and Sabes, 2005), and fMRI studies suggest that some visual areas switch from encoding retinotopic to spatiotopic representations of a stimulus if it is attended (Burr and Morrone, 2011). Similarly, reach-coding areas can switch from encoding a gaze-centered representation of a motor goal when the target is visible to a body-centered representation when the target is defined by unseen proprioceptive cues (Bernier and Grafton, 2010). The present results are consistent with these findings and suggest that CIP may be important for achieving an allocentric, gravity-centered visual representation as well as for dynamically switching the reference frame in which visual orientation is represented. An important next step is to determine how areas downstream of CIP, such as the anterior intraparietal area which is involved in grasping (Nakamura et al., 2001), encode object orientation. One intriguing possibility is that object orientation is flexibly represented in the most effective reference frame for performing the task at hand.
Footnotes
This work was supported by National Institutes of Health Grants DC014305 (A.R.) and EY022538 (D.E.A.) and by the Koetser Foundation for Brain Research. We thank Mandy Turner for help with monkey care and training; Jing Lin for help with the stimulus software; Eliana Klier for help with the torsion analysis; Adhira Sunkara for the monkey sketch used in Figure 2 and for suggesting the gravitational drive analysis; and Noah Cowan, Christopher Dakin, Greg DeAngelis, and Wei Ji Ma for comments on the paper.
The authors declare no competing financial interests.
Correspondence should be addressed to Ari Rosenberg, One Baylor Plaza, MS:BCM 295, Houston, TX 77030. rosenberg@cns.bcm.edu