Abstract
Convergence of visual motion information (optic flow) and vestibular signals is important for self-motion perception, and such convergence has been observed in the dorsal medial superior temporal (MSTd) and ventral intraparietal areas. In contrast, the parieto-insular vestibular cortex (PIVC), a cortical vestibular area in the sylvian fissure, is not responsive to optic flow. Here, we explore optic flow and vestibular convergence in the visual posterior sylvian area (VPS) of macaque monkeys. This area is located at the posterior end of the sylvian fissure, is strongly interconnected with PIVC, and receives projections from MSTd. We found robust optic flow and vestibular tuning in more than one-third of VPS cells, with all motion directions being represented uniformly. However, visual and vestibular direction preferences for translation were mostly opposite, unlike in area MSTd where roughly equal proportions of neurons have visual/vestibular heading preferences that are congruent or opposite. Overall, optic flow responses in VPS were weaker than those in MSTd, whereas vestibular responses were stronger in VPS than in MSTd. When visual and vestibular stimuli were presented together, VPS responses were dominated by vestibular signals, in contrast to MSTd, where optic flow tuning typically dominates. These findings suggest that VPS is proximal to MSTd in terms of vestibular processing, but distal to MSTd in terms of optic flow processing. Given the preponderance of neurons with opposite visual/vestibular heading preferences in VPS, this area may not play a major role in multisensory heading perception.
Introduction
The continuously changing image motion on the retina (“optic flow”) during navigation provides information about one's direction of heading (Gibson, 1950, 1986; Warren, 2004; Britten, 2008). Optic flow responses have been described in multiple cortical areas, including the dorsal medial superior temporal (MSTd) (Saito et al., 1986; Tanaka et al., 1986, 1989; Tanaka and Saito, 1989; Duffy and Wurtz, 1991a,b, 1995; Graziano et al., 1994; Lagae et al., 1994; Orban et al., 1995; Lappe et al., 1996) and ventral intraparietal (VIP) areas (Schaafsma and Duysens, 1996; Bremmer et al., 2002a; Zhang et al., 2004; Zhang and Britten, 2010). MSTd and VIP neurons are also tuned to inertial vestibular stimulation (Duffy, 1998; Schlack et al., 2002; Klam and Graf, 2003; Gu et al., 2006; Chen et al., 2007; Takahashi et al., 2007).
Neither MSTd nor VIP receives direct vestibular projections through the thalamus (Meng et al., 2007; Meng and Angelaki, 2010). However, short-latency vestibular inputs do reach the parieto-insular vestibular cortex (PIVC) and the visual posterior sylvian area (VPS), among other areas (Akbarian et al., 1992). Indeed, the spatiotemporal dynamics of vestibular responses to translation revealed that VIP and MSTd have significantly longer latencies than PIVC (Chen et al., 2007, 2011). However, using random-dot stimuli that evoke robust optic flow responses in MSTd and VIP (Gu et al., 2006, 2008; Chen et al., 2007; Takahashi et al., 2007), we found that PIVC neurons are not responsive to optic flow (Chen et al., 2010). This is also true for vestibular-driven neurons in the ventral posterior thalamus (Meng and Angelaki, 2010) and brainstem/cerebellar nuclei (S. Liu and D. E. Angelaki, unpublished observations). Thus, convergence of optic flow and vestibular signals may be limited to extrastriate visual cortical areas and their targets in parietal cortex, regions that appear to be far removed from the vestibular periphery.
Before accepting such a conclusion, which has important consequences for the neural basis of self-motion perception, it is important to explore whether other cortical areas that receive short-latency vestibular input may show selective responses to optic flow. One such candidate area is VPS, also known as “parieto-temporal association area T3” (Jones and Burton, 1976; Guldin et al., 1992; Guldin and Grüsser, 1998; Dicke et al., 2008). VPS is strongly interconnected with PIVC and receives thalamic input from the pulvinar and ventral posterior nuclei (Akbarian et al., 1992), as well as inputs from a portion of the superior temporal sulcus thought to be area MST (Guldin and Grüsser, 1998). Thus, we have explored regions around the posterior tip of the lateral (sylvian) sulcus and have characterized VPS responses to three-dimensional (3D) inertial motion and optic flow stimuli. We find that some VPS cells are tuned for heading defined by both visual and vestibular cues, but nearly all VPS cells have opposite visual and vestibular heading preferences, whereas roughly equal proportions of MSTd neurons have congruent and opposite preferences for the two cues.
Materials and Methods
Subjects and setup.
Extracellular recordings were obtained from three hemispheres in two male rhesus monkeys (Macaca mulatta) weighing between 6 and 10 kg. The surgical preparation, experimental apparatus, and methods of data acquisition have been described in detail previously (Gu et al., 2006; Fetsch et al., 2007; Takahashi et al., 2007; Chen et al., 2011). Briefly, each animal was chronically implanted with a circular molded, lightweight plastic ring for head restraint and a scleral coil for monitoring eye movements inside a magnetic field (CNC Engineering). Behavioral training was accomplished using standard operant conditioning with liquid rewards. All animal surgeries and experimental procedures were approved by the Institutional Animal Care and Use Committee at Washington University and were in accordance with NIH guidelines.
During experiments, animals were seated comfortably in a primate chair, which was secured to a 6 df motion platform (Moog; 6DOF2000E). Translational and rotational movements along or around any arbitrary axis in 3D space were delivered by this platform. In all experiments, the head was positioned such that the horizontal stereotaxic plane was earth-horizontal, with the axis of rotation always passing through the center of the head (i.e., the midline point along the interaural axis). Computer-generated visual stimuli were rear-projected (Christie Digital Mirage 2000) onto a tangent screen placed ∼30 cm in front of the monkey, subtending 90 × 90° of visual angle. Visual stimuli simulated self-motion through a 3D cloud of random dots (100 cm wide, 100 cm tall, and 40 cm deep), were programmed using the OpenGL graphics library, and were generated using an OpenGL accelerator board (Quadro FX 3000G; PNY Technologies) (for details, see Gu et al., 2006). The projector, screen, and magnetic field coil frame were mounted on the platform and moved together with the animal.
Tungsten microelectrodes (Frederick Haer Company; tip diameter, 3 μm; impedance, 1–2 MΩ at 1 kHz) were inserted into the cortex through a transdural guide tube, using a hydraulic microdrive (Frederick Haer Company). Behavioral control and data acquisition were accomplished by custom scripts written for use with the TEMPO system (Reflective Computing). Neural voltage signals were amplified, filtered (400–5000 Hz), discriminated (Bak Electronics), and displayed on an oscilloscope. The times of occurrence of action potentials and all behavioral events were recorded with 1 ms resolution. Raw neural signals were also digitized at a rate of 25 kHz using a CED Power 1401 (Cambridge Electronic Design) for off-line spike sorting.
Anatomical localization.
The relevant areas in the lateral sulcus were first identified using MRI scans. An initial (“baseline”) scan was performed on each monkey, before any surgeries, using a high-resolution sagittal MPRAGE sequence (0.75 × 0.75 × 0.75 mm voxels). SUREFIT software (Van Essen et al., 2001) was used to segment gray matter from white matter. A second scan was performed after the head holder and recording grid had been surgically implanted. Small cannulae filled with a contrast agent (gadoversetamide) were inserted into the recording grid during the second scan to register electrode penetrations with the MRI volume. The MRI data were converted to a flat map using CARET software and the flat map was morphed to match a standard macaque atlas (Van Essen et al., 2001). The data were then refolded and transferred onto the original MRI volume.
With the MRI scans and functional boundaries as a guide, we performed electrode penetrations to map the posterior extent of the lateral sulcus. We identified VPS based on the presence of multiunit responses to visual motion and its location posterior to PIVC. This region exhibiting visual responses extended ∼4–5 mm anterior to posterior. At each location along the anterior/posterior axis, we first identified the location of the medial tip of the lateral sulcus and then moved laterally until we no longer encountered directionally selective visual responses in the multiunit activity. At the anterior end of this region, VPS merged into PIVC. At the posterior end, the gray matter of the lateral sulcus became shallow and responses less clear. Within the region identified as VPS, we recorded from any neuron that either responded to a large-field flickering random-dot stimulus or was spontaneously active (even if it did not respond to the random-dot stimulus). Thus, there was no preselection of cells based on particular response properties—we recorded from all well isolated neurons. The location of recorded neurons was then reconstructed based on MRI scans and plotted on coronal sections through the monkey's brain (see Fig. 1) or a flat map of the cortex (see Fig. 12).
Experimental protocol.
Once action potentials from a single VPS neuron were satisfactorily isolated, responses were measured during a 3D translation protocol (Gu et al., 2006). Stimuli were presented along 26 heading directions corresponding to different azimuth and elevation angles in increments of 45°. This included all combinations of movement vectors having eight different azimuth angles (0, 45, 90, 135, 180, 225, 270, and 315°, where 0° corresponds to rightward translation) and three different elevation angles: 0° (the horizontal plane) and ±45° (8 × 3 = 24 directions). In addition, elevation angles of −90 and 90° were included to generate upward and downward movement directions, respectively. The motion stimulus lasted 2 s and had a Gaussian velocity profile with a corresponding biphasic acceleration profile. The motion amplitude was 13 cm (total displacement), with a peak acceleration of ∼0.1 g (∼0.98 m/s²) and a peak velocity of ∼30 cm/s.
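As an illustration of the stimulus set described above, the following minimal Python sketch enumerates the 26 directions and builds a Gaussian velocity profile. The function names, the placement of the velocity peak at 1 s, and the derivation of the Gaussian width from the stated displacement and peak velocity are assumptions made here for illustration; they are not taken from the original stimulus code.

```python
import numpy as np

def heading_directions():
    """Return the 26 (azimuth, elevation) pairs in degrees."""
    # 8 azimuths x 3 elevations = 24 directions
    directions = [(az, el) for el in (-45, 0, 45) for az in range(0, 360, 45)]
    # Two poles: elevation -90 (upward) and +90 (downward); azimuth is arbitrary there
    directions += [(0, -90), (0, 90)]
    return directions

def gaussian_velocity(t, displacement_cm=13.0, peak_velocity_cm_s=30.0, t_peak_s=1.0):
    """Gaussian velocity profile (cm/s) whose time integral equals the displacement.

    Assumption: the width sigma is chosen so that the area under the curve
    matches the stated total displacement for the stated peak velocity.
    """
    sigma = displacement_cm / (peak_velocity_cm_s * np.sqrt(2.0 * np.pi))
    velocity = peak_velocity_cm_s * np.exp(-((t - t_peak_s) ** 2) / (2.0 * sigma ** 2))
    return velocity, sigma

t = np.linspace(0.0, 2.0, 2001)              # 2 s stimulus, 1 ms steps
velocity, sigma = gaussian_velocity(t)
acceleration = np.gradient(velocity, t)      # biphasic acceleration, cm/s^2
dt = t[1] - t[0]
displacement = float(np.sum(velocity[:-1] + velocity[1:]) * 0.5 * dt)  # trapezoid rule
peak_accel_g = float(np.max(np.abs(acceleration)) / 980.665)
```

Under these assumptions the integrated displacement is 13 cm and the peak acceleration comes out at roughly 0.1 g, in line with the stated stimulus parameters.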
Within a single block of trials, two or three distinct stimulus types were interleaved. In the “vestibular” condition, the monkey was translated by the motion platform along each of the 26 directions in the absence of optic flow. The screen was blank, except for a head-centered fixation target. In the “visual” condition, the motion platform remained stationary while optic flow stimuli were presented on the display screen. Optic flow simulated translation along the same set of directions tested in the vestibular condition. Note that all stimulus directions are referenced to body motion (real or simulated), such that neurons with the same direction preference in the visual and vestibular conditions have congruent tuning. In the “combined” condition, the animal was moved by the motion platform while a spatially and temporally matched optic flow stimulus was simultaneously presented. In all stimulus conditions, the animal was required to establish visual fixation on a central target (0.2° in diameter) for 200 ms before stimulus onset, and to maintain fixation throughout the stimulus presentation (fixation windows spanned 2 × 2° of visual angle). A liquid reward was administered at the end of each trial if fixation was successfully maintained. If fixation was broken at any time during the stimulus, the trial was aborted and data were discarded. Neurons were included in the sample if each distinct stimulus was successfully repeated at least three times. Across our sample of VPS neurons, 90% of cells were isolated long enough to complete at least five stimulus repetitions.
For most neurons, the visual and vestibular stimulus conditions were randomly interleaved in a single block of trials, along with a (null) condition in which the motion platform remained stationary and no star field was shown (to assess spontaneous activity). To complete five repetitions of all 26 directions of motion for each stimulus condition, plus five repetitions of the null condition, the monkey was required to successfully complete 26 × 2 × 5 + 5 = 265 trials. For a subset of neurons, the combined condition was also interleaved in the same block of trials (requiring a total of 395 trials for five repetitions). These stimulation protocols are identical with those used previously to characterize MSTd (Gu et al., 2006; Takahashi et al., 2007), PIVC (Chen et al., 2010), and VIP neurons (Chen et al., 2007). For a subpopulation of VPS neurons, neural responses were also collected for the vestibular condition in complete darkness (with the projector turned off). In these controls, there was no behavioral requirement to fixate and rewards were delivered manually to keep the animal motivated.
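The trial counts quoted above follow from simple bookkeeping, sketched here in Python. The function and parameter names are hypothetical, and the sketch assumes one null trial per repetition, as implied by the stated totals.

```python
def trials_required(n_directions=26, n_conditions=2, n_repeats=5, include_null=True):
    """Number of successfully completed trials needed for one block."""
    total = n_directions * n_conditions * n_repeats
    if include_null:
        total += n_repeats  # one null (no-motion) trial per repetition
    return total

two_condition_block = trials_required(n_conditions=2)    # vestibular + visual
three_condition_block = trials_required(n_conditions=3)  # plus combined
```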
If the 3D translation protocol was completed and good cell isolation was maintained, VPS neurons were also tested with a 3D rotation protocol in complete darkness, to assess vestibular rotation sensitivity. Stimulus direction was defined by the same set of 26 vectors, which now represent the corresponding axes of rotation according to the right-hand rule (Takahashi et al., 2007). For example, azimuth angles of 0 and 180° (elevation, 0°) correspond to pitch-up and pitch-down rotations, respectively. Azimuths of 90 and 270° (elevation, 0°) correspond to roll rotations (right-ear-down and left-ear-down, respectively). Finally, elevation angles of −90 or 90° correspond to leftward and rightward yaw rotation, respectively. The rotational motion trajectory followed a Gaussian velocity profile and rotation amplitude was 9° (peak angular velocity, ∼20°/s).
Data analysis.
To allow direct comparisons across brain areas, the data analyses performed here are similar to those used previously to characterize vestibular and visual responses to self-motion in areas MSTd (Gu et al., 2006; Takahashi et al., 2007), PIVC (Chen et al., 2010), and VIP (Chen et al., 2007). Data analyses and statistical tests were performed using MATLAB (MathWorks). Peristimulus time histograms (PSTHs) were constructed for each direction of translation/rotation using 25 ms time bins smoothed with a 400 ms boxcar filter. We calculated the maximum response of the neuron across stimulus directions for each 25 ms time bin between 0.5 and 2 s after motion onset. We then used ANOVA to assess the statistical significance of direction tuning as a function of time and to evaluate whether there are multiple time periods in which a neuron shows directional tuning (for details, see Chen et al., 2010). “Peak times” were then defined as the times of local response maxima corresponding to distinct epochs of directional tuning.
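The PSTH construction can be sketched as follows. This is a minimal Python illustration (the original analyses were run in MATLAB), and the function name, argument layout, and handling of the edges of the smoothing window are assumptions rather than the authors' implementation.

```python
import numpy as np

def smoothed_psth(spike_times, n_trials, t_start=0.0, t_stop=2.0,
                  bin_width=0.025, window=0.4):
    """Return (bin_centers, firing rate in spikes/s) for one stimulus direction.

    spike_times: spike times in seconds relative to motion onset,
                 pooled across all repetitions of that direction.
    """
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = np.linspace(t_start, t_stop, n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    rate = counts / (n_trials * bin_width)           # spikes/s per 25 ms bin
    n_taps = int(round(window / bin_width))          # 400 ms boxcar = 16 bins
    kernel = np.ones(n_taps) / n_taps
    # mode="same" zero-pads, so bins near the ends are biased toward zero
    smoothed = np.convolve(rate, kernel, mode="same")
    centers = edges[:-1] + bin_width / 2.0
    return centers, smoothed
```

The smoothed rate in each 25 ms bin between 0.5 and 2 s can then be compared across the 26 directions to look for epochs of significant tuning.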
Based on the number of distinct peak times, VPS cells were divided into four groups: cells with a single time period of directional selectivity (“single-peaked”), cells with two temporal peaks of direction tuning (“double-peaked”), cells with three temporal peaks of direction tuning (“triple-peaked”), and cells that were not significantly direction selective in any time period (“not tuned”). To illustrate 3D directional tuning, mean responses are plotted as a function of azimuth and elevation in the form of color contour maps. The spherical data are plotted on Cartesian coordinates using the Lambert cylindrical equal-area projection (for details, see Gu et al., 2006). This produces a flattened representation in which the abscissa represents azimuth angle and the ordinate corresponds to a sinusoidally transformed version of the elevation angle. The color scale in each contour map was determined from the range of responses exhibited by each neuron, rounded to the nearest multiple of 10 spikes/s. Note that the azimuth axis in the contour plots is circular, such that the tuning for all example cells shown is unimodal (see Figs. 2B,F, 3B,F).
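For readers unfamiliar with the projection, a minimal Python sketch is given below. The function name is hypothetical; the mapping itself (azimuth unchanged, ordinate equal to the sine of elevation) is the standard Lambert cylindrical equal-area form.

```python
import numpy as np

def lambert_cylindrical(azimuth_deg, elevation_deg):
    """Map spherical directions to the flattened plotting coordinates.

    The abscissa is azimuth (degrees); the ordinate is sin(elevation),
    so that equal areas on the sphere occupy equal areas on the plot.
    """
    x = np.asarray(azimuth_deg, dtype=float)
    y = np.sin(np.radians(np.asarray(elevation_deg, dtype=float)))
    return x, y

x, y = lambert_cylindrical([0, 90, 180], [-90, 0, 45])
```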
The strength of directional tuning at each peak time was quantified using a direction discrimination index (DDI; Takahashi et al., 2007), given by:
$$\mathrm{DDI} = \frac{R_{\max} - R_{\min}}{R_{\max} - R_{\min} + 2\sqrt{\mathrm{SSE}/(N - M)}}$$
where Rmax and Rmin are the maximum and minimum responses from the 3D tuning function, respectively. SSE is the sum squared error around the mean response, N is the total number of observations (trials), and M is the number of stimulus directions (M = 26). The DDI compares the difference in firing between the preferred and null directions against response variability, and quantifies the reliability of a neuron for distinguishing between preferred and null motion directions. Neurons with large response modulations relative to the noise level will have DDI values closer to 1, whereas neurons with weak response modulation will have DDI values closer to 0. DDI is conceptually similar to a d′ metric in that it quantifies signal-to-noise ratio but it has the advantage of being bounded between 0 and 1, similar to other conventional metrics of response modulation.
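A minimal Python sketch of the DDI computation follows, assuming the form used by Takahashi et al. (2007), DDI = (Rmax − Rmin) / (Rmax − Rmin + 2√(SSE/(N − M))). The function name and the input layout (an M × repeats array with equal repeats per direction) are illustrative assumptions, as is the reading of SSE as the squared error around each direction's own mean.

```python
import numpy as np

def ddi(responses):
    """Direction discrimination index from single-trial firing rates.

    responses: array of shape (M directions, repeats), spikes/s.
    """
    responses = np.asarray(responses, dtype=float)
    m = responses.shape[0]                     # number of directions (M)
    n = responses.size                         # total number of trials (N)
    means = responses.mean(axis=1)             # tuning function
    modulation = means.max() - means.min()     # Rmax - Rmin
    # Assumption: SSE is taken around each direction's own mean response
    sse = np.sum((responses - means[:, None]) ** 2)
    noise = 2.0 * np.sqrt(sse / (n - m))
    denom = modulation + noise
    return float(modulation / denom) if denom > 0 else 0.0
```

With noiseless, strongly tuned responses the index approaches 1; with trial-to-trial variability but no difference between direction means it is 0.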
The preferred direction of a neuron for each stimulus condition was described by the azimuth and elevation of the vector sum of the individual responses (after subtracting spontaneous activity). In such a representation, the mean firing rate in each trial was considered to represent the magnitude of a 3D vector whose direction was defined by the azimuth and elevation angles of the particular stimulus. Preferred directions have been plotted on Cartesian axes using the Lambert projection (see above). This transformation was also used to calculate the distributions of the difference in 3D direction preferences (|Δ preferred direction|).
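The vector-sum computation can be sketched in Python as below. The Cartesian axis convention (x toward azimuth 0°, y toward azimuth 90°, z along positive elevation) and the function names are assumptions chosen for illustration; the sketch also works from mean responses per direction, which gives the same direction as summing single-trial vectors when every direction has the same number of repetitions.

```python
import numpy as np

def unit_vectors(az_deg, el_deg):
    """Unit vectors for (azimuth, elevation) in degrees (assumed axis convention)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)

def preferred_direction(az_deg, el_deg, rates, spontaneous=0.0):
    """Azimuth and elevation (degrees) of the response-weighted vector sum."""
    weights = np.asarray(rates, dtype=float) - spontaneous
    v = (weights[:, None] * unit_vectors(az_deg, el_deg)).sum(axis=0)
    norm = np.linalg.norm(v)
    if norm == 0:
        return float("nan"), float("nan")      # no net direction preference
    az = float(np.degrees(np.arctan2(v[1], v[0])) % 360.0)
    el = float(np.degrees(np.arcsin(v[2] / norm)))
    return az, el

# The 26 stimulus directions, then a synthetic cell tuned to (90, 0)
az = np.array([a for e in (-45, 0, 45) for a in range(0, 360, 45)] + [0, 0], dtype=float)
el = np.array([e for e in (-45, 0, 45) for _ in range(8)] + [-90, 90], dtype=float)
rates = 10.0 + 5.0 * unit_vectors(az, el) @ unit_vectors(90.0, 0.0)
pref_az, pref_el = preferred_direction(az, el, rates, spontaneous=10.0)
```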
The vector sum can reliably reflect the tuning preference of the cell only when the directional tuning profile is unimodal at a particular peak time. However, we found that this was not always the case for VPS neurons. Thus, we first classified the directional tuning at each peak time as “unimodal” versus “bimodal,” with the latter group also potentially including multimodal cells (for details, see Chen et al., 2010). Distributions of 3D direction preferences (and |Δ preferred direction| between conditions) only contain data from peak times for which directional tuning was characterized as unimodal.
To assess whether particular distributions of response parameters were significantly different from uniform, a resampling analysis was used (Takahashi et al., 2007). We computed the sum squared error (across bins) between the measured distribution and an ideal uniform distribution containing the same number of observations. Then we generated a random distribution by drawing the same number of data points from a uniform distribution using the “unifrnd” function in MATLAB. The sum squared error was again calculated between this random distribution and the ideal uniform distribution. This second step was repeated 1000 times to generate a distribution of sum squared error values that represent random deviations from an ideal uniform distribution. If the sum squared error for the experimentally measured distribution lay outside the 95% confidence interval of values from the randomized distributions, then the measured distribution was considered to be significantly different from uniform (p < 0.05).
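The resampling procedure can be sketched as follows. This Python version stands in for the MATLAB analysis; the bin count, the fixed random seed, and the use of a one-sided p value (fraction of random distributions with error at least as large as the observed one) are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def uniformity_test(values, lo, hi, n_bins=8, n_resamples=1000, seed=0):
    """Resampling test of whether `values` on [lo, hi) deviate from uniform.

    Returns (observed sum squared error, p value).
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    edges = np.linspace(lo, hi, n_bins + 1)
    ideal = np.full(n_bins, n / n_bins)        # ideal uniform bin counts

    def sse(sample):
        counts, _ = np.histogram(sample, bins=edges)
        return np.sum((counts - ideal) ** 2)

    observed = sse(values)
    # Null distribution: SSE of random draws from a uniform distribution
    null = np.array([sse(rng.uniform(lo, hi, n)) for _ in range(n_resamples)])
    p_value = float(np.mean(null >= observed))
    return float(observed), p_value
```

For example, 100 azimuth preferences that all fall in a single bin give a p value far below 0.05, whereas evenly spaced preferences do not.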
For nonuniform distributions, the number of modes was further assessed using a multimodality test based on the kernel density estimate method (for details, see Takahashi et al., 2007). A von Mises function (the circular analog of the normal distribution) was used as the kernel for circular data and a normal distribution for noncircular data. Watson's U² statistic, corrected for grouping, was computed as a goodness-of-fit test statistic to obtain a p value through a bootstrapping procedure. This test generated two p values, with the first one (puni) for the test of unimodality and the second one (pbi) for the test of bimodality. A distribution was classified as significantly bimodal if puni < 0.05 and pbi > 0.05.
Last, we quantified visual and vestibular contributions to the combined response by measuring a “vestibular gain” and a “visual gain.” This was achieved by fitting the combined response data with the following equation:
$$R_{\mathrm{combined}} = a_1 R_{\mathrm{vestibular}} + a_2 R_{\mathrm{visual}} + a_3$$
where Rx are matrices of mean firing rates for all heading directions; a1 and a2 are the vestibular and visual gains, respectively; and a3 is a constant that accounts for direction-independent differences between the three conditions. A “gain ratio” was defined as a1/a2: the higher the gain ratio, the higher the vestibular contribution (relative to visual) to the combined response (Takahashi et al., 2007). Only cells for which the linear model provided a good fit to the data (R² > 0.5) have been included in this analysis.
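A minimal Python sketch of this fit is given below, assuming the linear model Rcombined = a1·Rvestibular + a2·Rvisual + a3 and an ordinary least-squares solution; the fitting routine actually used is not stated in the text, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def fit_gains(r_vestibular, r_visual, r_combined):
    """Least-squares fit of combined = a1*vestibular + a2*visual + a3.

    Inputs are mean firing rates over the same set of heading directions.
    Returns (a1, a2, a3, R^2).
    """
    x = np.column_stack([np.ravel(r_vestibular),
                         np.ravel(r_visual),
                         np.ones(np.size(r_vestibular))])
    y = np.ravel(np.asarray(r_combined, dtype=float))
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    residual = y - x @ coef
    r_squared = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    a1, a2, a3 = (float(c) for c in coef)
    return a1, a2, a3, float(r_squared)

# Synthetic, vestibular-dominated cell (values are made up for illustration)
rng = np.random.default_rng(1)
r_vest = rng.uniform(0.0, 50.0, 26)
r_vis = rng.uniform(0.0, 50.0, 26)
r_comb = 0.9 * r_vest + 0.1 * r_vis + 2.0
a1, a2, a3, r2 = fit_gains(r_vest, r_vis, r_comb)
gain_ratio = a1 / a2   # larger values indicate a stronger vestibular contribution
```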
Results
We recorded neuronal activity from the most posterior portion of the lateral (sylvian) sulcus in three hemispheres of two awake, behaving rhesus monkeys. The majority of neurons were recorded from the left hemisphere of animal E (n = 92) and the right hemisphere of animal A (n = 50), as illustrated in Figure 1A–G. A small sample of cells (n = 24) was obtained from the right hemisphere of monkey E. As shown in Figure 1B–G, cells with significant responses to only vestibular translation (black symbols) were encountered in the upper bank, tip, and lower bank of the lateral sulcus. Cells with significant responses to both vestibular and visual translation (pink symbols), as well as cells with only visual responses (yellow symbols), were mostly encountered in the upper bank and tip of the lateral sulcus, and only in the most posterior sections. Using the parcellation scheme of Lewis and Van Essen (2000a,b), most cells with visual responses were located in opercular area 7 (area 7op) and the posterior portion of retroinsular cortex (Ri) bordering 7op.
We recorded from every well isolated neuron in VPS, without prescreening, such that we can make direct comparisons with area MSTd, where a similar approach to cell selection was taken (Gu et al., 2006). Once isolated, every VPS neuron was first tested with physical translation (vestibular condition) and simulated translation (visual condition) along 26 motion directions uniformly distributed in 3D space (see Materials and Methods). For a subset of cells (58 of 166), this block of trials also included a third stimulus condition: congruent combinations of inertial and visual translation (combined condition). If satisfactory isolation was maintained throughout this block of translational stimuli, cells were also tested with physical rotation about the same 26 axes, with each axis defining a direction of rotation according to the right-hand rule. Note that heading direction in all conditions is referenced to physical body motion (i.e., heading direction for optic flow refers to the direction of simulated body motion).
Visual and vestibular responses to translation
Figure 2 shows an example single-peaked VPS cell tested with translational motion in the vestibular (top), visual (middle), and combined (bottom) conditions. The left panels show average PSTHs for all 26 directions of motion (Fig. 2A,C,E). The red dashed lines mark the peak response time for each stimulus condition, which is defined as the time window that produces the largest departure in firing rate from the baseline response (see Materials and Methods). At the corresponding peak times, we computed the 3D directional tuning of the neuron, which is illustrated by the color contour maps in Figure 2, B, D, and F. Responses of this cell were significantly tuned (ANOVA, p < 0.01) in both the vestibular and visual conditions and it was classified as “multisensory.”
The direction preference for each stimulus condition was defined as the azimuth and elevation of the vector sum of the neural responses (see Materials and Methods). In the vestibular condition, this cell exhibited strong spatial tuning with a heading preference of (azimuth, elevation) = (104, 55°), corresponding to a forward/downward translation. When the same set of translational movements was simulated by optic flow (visual condition), the direction preference was nearly opposite [(azimuth, elevation) = (−94°, −50°)], corresponding to backward/upward translation. Combining optic flow and inertial motion (combined condition) resulted in a response pattern that was very similar to the vestibular tuning of the cell [(azimuth, elevation) = (121, 61°)] (Fig. 2, compare E, F, with A, B). Note that the robust activation observed for backward/upward directions during the visual stimulus condition (Fig. 2C,D) is strikingly absent during combined stimulation (Fig. 2E,F). Thus, the combined response of this neuron is dominated by the vestibular input, even at 100% visual motion coherence, a phenomenon that was not observed in area MSTd (Gu et al., 2006).
Figure 3 shows responses from another multisensory VPS neuron, which has two peak times in response to vestibular stimulation, resulting in two distinct directional tuning patterns (Fig. 3A,B). The red (0.88 s) and green (1.58 s) lines in Figure 3A mark the two peak times. The corresponding direction tuning profiles reveal two nearly opposite direction preferences at (azimuth, elevation) = (48, −61°) and (−100, 45°), respectively (Fig. 3B). Note that the visual response of this same neuron is single-peaked (Fig. 3C,D). Like the example cell of Figure 2, the combined response is dominated by the vestibular tuning, with two distinct peaks of directional tuning at (azimuth, elevation) = (42, −55°) (0.81 s) and (−126, 39°) (1.48 s), respectively.
Responses of a few VPS cells were inhibited by visual and/or vestibular stimulation, as illustrated by the example neuron in Figure 4. Responses of this cell were suppressed for most vestibular and visual stimulus directions, and there was no significant directional tuning at any time during the 2 s motion profile (not tuned). Across the population, only a small portion of VPS neurons had inhibitory responses to all stimulus directions: vestibular, 10% (16 of 166); visual, 18% (29 of 166) (Table 1, not-tuned cells).
Overall, more VPS cells were tuned to vestibular than visual stimuli (Table 1). More than one-third of the cells showed significant tuning for both vestibular and visual stimulation (multisensory neurons), another one-third showed significant tuning for vestibular translation only (vestibular only neurons), and only a small minority (3%) showed significant tuning for optic flow only (visual only neurons). Among the significantly tuned cells, vestibular responses were almost equally likely to be single-peaked or double-peaked, with only a handful of triple-peaked cells (Table 2, vestibular condition). In contrast, the majority of visual responses were single-peaked (Table 2, visual condition). Because most combined responses were dominated by the vestibular tuning, as seen for the examples in Figures 2 and 3, they were either single-peaked, double-peaked, or triple-peaked in proportions similar to those seen for the vestibular condition (Table 2, combined condition).
To summarize the strength of heading tuning across the population of neurons, we computed a DDI that ranges from 0 (poor tuning) to 1 (strong tuning) (see Materials and Methods). DDI values for visual and vestibular translation responses are compared in Figure 5A. In this scatter plot, neurons are separated into multisensory, vestibular only, visual only, and not-tuned classes based on the significance of heading tuning in each stimulus condition (ANOVA, p < 0.01). Considering all cells together (n = 166), the vestibular DDI (0.69 ± 0.01, SE) was significantly greater than the visual DDI (0.60 ± 0.01, SE) (paired t test, p < 0.001), indicating that vestibular heading tuning is generally stronger in VPS than visual heading tuning.
Because our experimental protocols were identical with those used previously to characterize optic flow and vestibular tuning in area MSTd, a direct comparison of tuning strength between areas is possible. As illustrated by the cumulative distributions of DDI in Figure 5B (black vs gray), the vestibular DDI for VPS (mean ± SE, 0.69 ± 0.01) was significantly greater than that for MSTd (0.59 ± 0.01) (Gu et al., 2006; Takahashi et al., 2007) (p < 0.001, Wilcoxon's rank test). In contrast, the visual DDI for VPS (0.60 ± 0.01) was significantly less than that for MSTd (0.76 ± 0.01) (p < 0.001, Wilcoxon's rank test) (Fig. 5C). Thus, vestibular signals dominate heading tuning in VPS, whereas visual signals dominate in MSTd.
Considering only neurons with significant tuning for translation, we further examined the timing of directional responses across areas by plotting cumulative distributions of peak times (the earliest peak time was used for double-peaked and triple-peaked cells; findings were similar when single-peaked and double-peaked cells were plotted separately). Figure 6A shows that vestibular peak times were significantly earlier on average in VPS (0.95 ± 0.02 s; n = 120) than in MSTd (1.04 ± 0.01 s; n = 277) (p < 0.001, Wilcoxon's rank test). For the visual condition, there was no significant difference in peak times between VPS (1.02 ± 0.01 s; n = 66) and MSTd (1.00 ± 0.01 s; n = 331) (p = 0.13, Wilcoxon's rank test).
Cells with significant directional tuning were further subdivided based on whether their spatial tuning (at a particular peak time) was unimodal or multimodal (see Materials and Methods) (Chen et al., 2010). The vast majority of VPS neurons showed unimodal directional tuning at the first peak time [vestibular: 86% (103 of 120); visual: 76% (50 of 66)], at the second peak time [vestibular: 70% (40 of 57); visual: 100% (10 of 10)], and at the third peak time [vestibular: 83% (5 of 6)]. For these VPS cells, direction preferences (first peak time) were distributed throughout the spherical stimulus space, as illustrated in Figure 7, A (vestibular responses) and B (visual responses). Each data point in these scatter plots specifies the preferred 3D direction of a single neuron, while histograms along the axes show the marginal distributions of azimuth and elevation preferences. None of these distributions was significantly different from uniform given the sample size (p > 0.30, uniformity test). Hence, vestibular and visual heading preferences are distributed fairly uniformly in VPS.
As shown by the examples of Figures 2–4, most VPS cells have opposite direction preferences for visual and vestibular stimuli. This is summarized in Figure 7C, which shows the distribution of the absolute difference in 3D direction preference (|Δ preferred direction|) between visual and vestibular responses for all cells with significant unimodal tuning in both stimulus conditions. Although the distribution of |Δ preferred direction| was significantly bimodal (p < 0.001, uniformity test; puni = 0.007, pbi = 0.075, modality test), the majority of VPS neurons (27 of 39; 69%) had |Δ preferred direction| > 120°. Based on the definition of “congruent” and “opposite” neurons used in previous studies (Gu et al., 2006, 2008; Takahashi et al., 2007; Chen et al., 2011), less than one-quarter (9 of 39; 23%) of VPS cells with unimodal tuning were characterized as congruent (i.e., |Δ preferred direction| < 90°), whereas 77% (30 of 39) were opposite (i.e., |Δ preferred direction| > 90°). This result provides a clear contrast with area MSTd, in which congruent and opposite cells are found in roughly equal proportions (Gu et al., 2006). In fact, the distribution of congruent (55%) and opposite cells (45%) in MSTd is significantly different from that seen in VPS (p < 0.001, χ² test).
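To make the congruent/opposite classification concrete, the sketch below computes |Δ preferred direction| for the example cell of Figure 2, using the vestibular and visual preferences reported in the text. It assumes that the difference is the angle between the two 3D unit vectors and that axes follow an illustrative convention (x toward azimuth 0°, y toward azimuth 90°, z along positive elevation); the function names are hypothetical.

```python
import numpy as np

def unit_vector(az_deg, el_deg):
    """Unit vector for (azimuth, elevation) in degrees (assumed axis convention)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def delta_preferred_direction(az1, el1, az2, el2):
    """Absolute 3D angle (degrees, 0 to 180) between two direction preferences."""
    cos_angle = np.clip(np.dot(unit_vector(az1, el1), unit_vector(az2, el2)), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

def classify(delta_deg):
    """Congruent if the preferences differ by less than 90 degrees, else opposite."""
    return "congruent" if delta_deg < 90.0 else "opposite"

# Figure 2 example cell: vestibular (104, 55), visual (-94, -50)
delta = delta_preferred_direction(104.0, 55.0, -94.0, -50.0)
label = classify(delta)
```

For this cell the difference works out to roughly 168°, which places it well within the opposite group.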
Combined visual/vestibular responses to translation
To characterize the interaction between visual and vestibular inputs and to compare with previous studies, a subset of VPS neurons (n = 58) was also tested with translational stimuli under a combined visual/vestibular stimulus condition (see Materials and Methods). Figure 8, A and B, illustrates how tuning strength of the combined response, as quantified by the DDI, compares with tuning strength of the single-cue responses. Note that this comparison includes all cells, whether they are significantly tuned for both visual and vestibular stimuli (black filled symbols), visual stimuli only (green symbols), vestibular stimuli only (red symbols), or neither (open black symbols). Unlike in MSTd (Takahashi et al., 2007), the combined DDI in VPS (0.71 ± 0.01) was not significantly different from the vestibular DDI (0.70 ± 0.01) (p = 0.59, Wilcoxon's rank test) (Fig. 8A). In contrast, the combined DDI was significantly greater than the visual DDI (0.64 ± 0.01) (p < 0.001) (Fig. 8B).
Consistent with a dominance of vestibular input to the combined response, the cumulative distributions of the earliest peak times for the combined and vestibular responses were nearly overlapping (Fig. 8C, filled circles vs upward triangles). Peak times in the visual condition were longer on average, although the difference was not significant [mean ± SE, 0.92 ± 0.05 s (combined); 0.92 ± 0.03 s (vestibular); 1.02 ± 0.05 s (visual); p > 0.14, Wilcoxon's rank test]. To compare the timing of VPS responses with stimulus velocity/acceleration, the vertical lines denote peak stimulus velocity (solid line) and peak acceleration/deceleration (dashed lines). For both the vestibular and combined stimulus conditions, the mean peak time in VPS occurred significantly earlier than the time of peak velocity (p < 0.004, Wilcoxon's signed rank test), but significantly later than the time of peak acceleration (p < 0.003, Wilcoxon's signed rank test). For the visual stimulus condition, the mean peak time was significantly later than the time of peak acceleration (p < 0.001, Wilcoxon's signed rank test), but not significantly different from the time of peak velocity (p = 0.78, Wilcoxon's signed rank test). These findings suggest that, whereas visual responses likely follow stimulus velocity, linear acceleration components are likely a strong contributor to vestibular responses in VPS.
To further illustrate the vestibular dominance of combined responses, Figure 8D shows the distribution of |Δ preferred direction| between the combined and vestibular conditions for 17 multisensory cells with opposite direction preferences in the visual and vestibular conditions (by definition, congruent cells have combined responses that are aligned with both visual and vestibular tuning; thus, they are not shown here). Among these opposite cells, 12 (70%) had |Δ preferred direction| < 60°, indicating that the heading preference in the combined condition was similar to that of the vestibular condition. This finding contrasts with previous results from area MSTd, for which combined responses under identical stimulus conditions (100% motion coherence) were generally dominated by the visual tuning (Gu et al., 2006; Takahashi et al., 2007; Morgan et al., 2008).
Finally, to explore further the relative contributions of vestibular and visual inputs to the combined tuning of multisensory neurons, we computed vestibular and visual gains, as well as the gain ratio. These gains describe the weighting of the visual and vestibular responses that provides the best linear fit to the combined response (Eq. 2). As illustrated in Figure 9A, the linear model generally provided very good fits to the combined responses, with median R² values of 0.82 for VPS and 0.92 for MSTd. Note that this is consistent with findings of a previous study of MSTd neurons (Morgan et al., 2008), for which inclusion of a larger range of stimuli allowed comparison of linear and nonlinear models, with little explanatory power gained by nonlinear models. The corresponding distributions of the ratio of vestibular/visual gains are shown in Figure 9B. A gain ratio of 1 indicates that vestibular and visual inputs are equally weighted in the combined response, whereas gain ratios >1 indicate that vestibular inputs contribute more than visual inputs to the combined response. For 28 VPS neurons with significant tuning (p < 0.01, ANOVA) under both single-cue conditions, the overall gain ratio was 2.18 ± 1.42 (geometric mean ± SE), with 68% of VPS neurons having gain ratios >1 (Fig. 9B, top). By comparison, the corresponding geometric mean value of the gain ratio was 0.33 ± 0.32 for MSTd (Fig. 9B, bottom). Across the population, the gain ratio for VPS was significantly greater than that for MSTd (p < 0.001, Wilcoxon's rank test). Thus, it is clear that visual-vestibular integration by VPS neurons is dominated by vestibular input, whereas that in MSTd is dominated by visual input.
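The gain analysis can be illustrated with a short least-squares sketch. Eq. 2 itself appears in the Materials and Methods and is not reproduced here, so the model form used below (combined = a × vestibular + b × visual + c, with a constant offset c) is an assumption of this example, as are all function names and the sample values. The sketch fits the two gains across stimulus directions, reports R², and computes a geometric mean of gain ratios across cells.

```python
import math


def fit_gains(vestibular, visual, combined):
    """Least-squares fit of combined ~ a*vestibular + b*visual + c.

    Each argument is a list of mean responses, one entry per stimulus
    direction. Returns (a, b, c, r_squared). The constant offset c is an
    assumption of this sketch.
    """
    n = len(combined)
    m1 = sum(vestibular) / n
    m2 = sum(visual) / n
    my = sum(combined) / n
    # Centering the data removes the offset, leaving a 2x2 normal system.
    x1 = [v - m1 for v in vestibular]
    x2 = [v - m2 for v in visual]
    y = [v - my for v in combined]
    s11 = sum(p * p for p in x1)
    s22 = sum(q * q for q in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s1y = sum(p * r for p, r in zip(x1, y))
    s2y = sum(q * r for q, r in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("single-cue tuning curves are collinear or constant")
    a = (s1y * s22 - s2y * s12) / det  # vestibular gain
    b = (s2y * s11 - s1y * s12) / det  # visual gain
    c = my - a * m1 - b * m2
    ss_res = sum(
        (obs - (a * p + b * q + c)) ** 2
        for obs, p, q in zip(combined, vestibular, visual)
    )
    ss_tot = sum(r * r for r in y)
    if ss_tot == 0:
        raise ValueError("combined response is constant; R^2 is undefined")
    return a, b, c, 1.0 - ss_res / ss_tot


def geometric_mean(ratios):
    """Geometric mean of strictly positive gain ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical tuning curves (spikes/s), for illustration only.
    vest = [5.0, 12.0, 30.0, 18.0, 7.0, 3.0]
    vis = [20.0, 9.0, 4.0, 6.0, 15.0, 22.0]
    comb = [0.9 * p + 0.3 * q + 2.0 for p, q in zip(vest, vis)]
    a, b, c, r2 = fit_gains(vest, vis, comb)
    print("gain ratio (vestibular/visual):", a / b, "R^2:", r2)
```

A ratio a/b greater than 1 corresponds to vestibular-dominated combined tuning in the sense used above. The geometric mean is only defined for positive ratios, so cells with a negative or zero fitted gain would need separate handling; how such cells were treated is not stated in this section.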
Fixation versus darkness
A subpopulation of VPS cells (n = 74) was also tested during vestibular translation in complete darkness (with the video projector turned off) (see Materials and Methods). Response selectivity and spatial tuning of VPS neurons were similar whether the animal fixated a head-fixed target on the screen or was translated in darkness with no requirement to maintain visual fixation (Fig. 10). As shown in Figure 10A, tuning strength, as measured with the DDI, was not significantly different between the fixation and darkness conditions (paired t test, p = 0.19), and DDI values for the two conditions were robustly correlated (Fig. 10B) (r = 0.66; p < 0.001). In addition, for neurons with significant unimodal spatial tuning under both conditions (42 of 59 cells), the distribution of the absolute difference in 3D direction preference between fixation and darkness was narrow and strongly biased toward zero (median, 14°) (Fig. 10B). These data suggest that responses in the vestibular condition are likely to be driven by sensory input from the otolith system, rather than by either retinal slip or efferent eye movement signals that might be involved in canceling a translational vestibulo-ocular reflex.
Vestibular responses to rotation
A subset of VPS cells (n = 58) was also tested with rotational stimuli in complete darkness. Each neuron was tested with rotations around the same 26 axes used for the translational stimuli. The majority of VPS neurons (44; 76%) were significantly tuned for direction of rotation (p < 0.01, ANOVA). Since all cells tested with the rotation protocol were also tested with the translation protocol first, a direct comparison between rotation and translation tuning is possible. Tuning strength, as measured with the DDI, was significantly weaker for rotation (mean ± SE, 0.63 ± 0.01) than translation (0.70 ± 0.01) (p < 0.001, Wilcoxon's rank test).
Nearly all (38 of 44) vestibular rotation responses in VPS were classified as single-peaked (Table 2). The direction preferences (at the first peak time) for cells with unimodal directional tuning were distributed throughout the spherical stimulus space, as illustrated in Figure 11B. The distribution of azimuth preferences was significantly nonuniform (p < 0.01, uniformity test) with some bias toward pitch preferences, although the sample size is too small to gain a clear impression of this bias. The distribution of elevation preferences was marginally different from uniform (p = 0.08, uniformity test), with relatively few cells preferring yaw rotations (elevation, ±90°). For 26 cells with unimodal directional tuning for both translation and rotation, the absolute difference in 3D direction preference (|Δ preferred direction|) between rotation and translation was unimodal (p = 0.03, uniformity test; puni = 0.79, modality test), with a clear tendency for translation and rotation preferences to differ by ∼90° (Fig. 11C). For example, cells that prefer lateral translation (0, 180°) also tend to prefer roll rotation (±90°). This tendency for rotation and translation preferences to be orthogonal has also been reported for neurons in MSTd (Takahashi et al., 2007) and PIVC (Chen et al., 2010).
Discussion
Using identical stimulation protocols to those previously used to study neurons in areas MSTd (Gu et al., 2006; Takahashi et al., 2007), PIVC (Chen et al., 2010), and VIP (Chen et al., 2007), we have explored convergence of visual and vestibular cues to self-motion in area VPS. Unlike PIVC neurons (Chen et al., 2010), a substantial proportion (∼40%) of VPS cells are tuned for heading defined by optic flow, and roughly one-third of VPS neurons are multisensory. The two most notable findings of the present study are (1) that responses of VPS neurons to combined vestibular/visual stimulation are dominated by their vestibular tuning, and (2) that more than three-quarters of multisensory VPS neurons have opposite direction preferences in response to visual and vestibular stimulation. Both of these properties lie in clear contrast to what is known about neurons in other visual/vestibular multisensory areas, most notably MSTd and VIP. In addition, optic flow responses in VPS are weaker than those in MSTd, whereas vestibular responses are stronger in VPS than in MSTd. We conclude that, within the network of interconnected areas involved in vestibular/visual integration, VPS emphasizes the contribution of vestibular inputs, whereas MSTd emphasizes optic flow processing.
Anatomical location of VPS
Based on the parcellation scheme of Lewis and Van Essen (2000a,b), most of the visual/vestibular multisensory neurons that we recorded lie within area 7op and the posterior portion of Ri bordering on 7op (Figs. 1, 12). Our recording locations appear to overlap considerably with those of Dicke et al. (2008), also from rhesus macaques. To compare the locations of the present recordings with those from PIVC, Figure 12 shows flattened cortical maps of regions within and around the lateral sulcus of four animals, two from the present study (Fig. 12A,B) and two from the previous PIVC study by Chen et al. (2010) (Fig. 12C,D). Note that visual only (yellow symbols) and multisensory neurons (magenta) are largely confined to the most posterior region we have defined as VPS, whereas most neurons from the PIVC study are located substantially more anterior (Fig. 12C,D, black symbols). Some of the more posterior neurons in the PIVC study, including a few multisensory neurons (Fig. 12C, magenta), were likely recorded from VPS, as noted previously (Chen et al., 2010).
In squirrel monkeys, Guldin et al. (1992) reported optokinetic and vestibular responses in a region of the posterior temporal bank of the sylvian fissure, with most cells located in the lower bank of the sulcus. This area was initially described as area T3 (Guldin et al., 1992; Guldin and Grüsser, 1998) but was later referred to as the “visual temporal sylvian area (VTS)” (Grüsser and Guldin, 1995; Guldin and Grüsser, 1996) and the “visual posterior sylvian area (VPS)” (Guldin and Grüsser, 1998). We have used the name VPS to refer to what we believe is the functionally corresponding area in rhesus monkeys, although the locations of this area within the sulcus in the two species may be slightly different. Note also that VPS in rhesus monkeys might correspond to the human motion-responsive area PIC (posterior insular cortex) (Sunaert et al., 1999; Claeys et al., 2003).
Most anatomical studies characterizing the afferent and efferent connections of area VPS were done in squirrel monkeys (Akbarian et al., 1992, 1993; Guldin et al., 1992) and Java monkeys (Akbarian et al., 1994). Area VPS is extensively and bidirectionally connected with insular and retroinsular cortex (including area PIVC), as well as other cortical areas, including the anterior cingulate, the anterior ventral part of area 6, and parts of areas 7b, 7a, and 3a (Guldin et al., 1992; Guldin and Grüsser, 1998). Unlike PIVC, however, area VPS is also interconnected with visual areas of the parieto-occipital and parieto-temporal regions (area 19) and with a sector of the upper bank of the temporal sulcus (Guldin et al., 1992). VPS also receives thalamic inputs, predominantly from the visual and visuomotor regions of the pulvinar, the intralaminar nuclei, and the posterior thalamic nuclei (Akbarian et al., 1992). Vestibular signals in VPS could arise directly from the intralaminar and ventral posterior nuclei of the thalamus (Lang et al., 1979; Meng et al., 2007). Notably, area VPS in squirrel monkeys has also been reported to project directly to the vestibular brainstem (Akbarian et al., 1993). Overall, the extensive connections of VPS with other nodes of vestibular circuitry, as well as its more limited connectivity with visual pathways, are consistent with our findings of visual–vestibular convergence in VPS, but with a dominance of vestibular input.
Physiological properties of VPS neurons
Few studies have characterized neural response properties in area VPS. Responses to optokinetic stimulation have been reported previously in both PIVC and VPS (Grüsser et al., 1990a,b; Guldin et al., 1992; Guldin and Grüsser, 1998). In contrast, we did not find significant optic flow tuning in PIVC (Fig. 12C,D) (Chen et al., 2010). Although ∼40% of VPS neurons were tuned for heading defined by optic flow, the dominant stimulus that activated most VPS cells was inertial motion, not optic flow. In fact, the percentage of cells tuned to vestibular stimulation in the present study (72% during translation and 76% during rotation) (Table 1) is considerably higher than that (30%) reported by Guldin and Grüsser (1998) in squirrel monkeys. One possible explanation for this difference is that we have used 3D motion stimuli, whereas rotation about a single axis (yaw) was used previously. Importantly, we found that vestibular responses were similar during fixation and free viewing in darkness, suggesting that they are likely driven by vestibular sensory inputs, rather than retinal slip or efferent eye movement signals. Whether these responses are purely labyrinthine in origin or also have a somatosensory component cannot be determined from the present experiments. However, Guldin et al. (1992) reported that VPS neurons were not sensitive to somatosensory stimulation.
Comparison of VPS with other cortical visual/vestibular multisensory areas
We found considerable differences between visual/vestibular response properties in area VPS and those of other multisensory areas such as MSTd. Strikingly, during combined visual/vestibular stimulation, VPS responses strongly resemble those in the vestibular condition (Figs. 2, 3). This finding is captured by the vestibular/visual gain ratio of Figure 9, which was nearly 10-fold greater in VPS than in MSTd (Gu et al., 2006). The functional significance of this finding is presently unclear. In MSTd, visual tuning dominates combined responses at high motion coherence (Gu et al., 2006; Morgan et al., 2008), but a more balanced contribution of the two signals is seen when coherence is reduced to roughly match the behavioral reliability of the vestibular cue (Gu et al., 2008; Morgan et al., 2008; Fetsch et al., 2009). Visual/vestibular interactions in area VPS may also be flexible and dynamically adjust with cue reliability and behavioral demands. It is not clear why one multisensory area (MSTd) should favor visual inputs when representing self-motion while another area (VPS) would emphasize vestibular inputs. One can speculate perhaps that the balance of activity within and across areas might be useful for estimating the relative reliabilities of visual and vestibular cues, but the different weighting of visual/vestibular signals might also simply arise from gradual transformations of unisensory to multisensory representations.
Another salient difference between multisensory response properties of areas VPS and MSTd/VIP involves the incidence of opposite cells. Whereas roughly equal numbers of congruent and opposite cells were observed in areas MSTd (Gu et al., 2006, 2008, 2010) and VIP (Bremmer et al., 2002b; Chen et al., 2007), the vast majority of VPS neurons show opposite visual/vestibular tuning. In MSTd, the subpopulation of congruent cells showed enhancements in heading sensitivity during cue combination that paralleled behavioral improvements in heading discrimination (Gu et al., 2008). In contrast, opposite cells in MSTd became less sensitive during cue combination. Moreover, responses of congruent MSTd cells correlated with perceptual decisions about heading, whereas those of opposite cells did not (Gu et al., 2008). The finding that VPS contains mostly opposite cells suggests that it is not a major contributor to cue integration for heading perception.
Rather, the fact that opposite cells may be maximally activated when visual and vestibular cues are not consistent with motion through a stationary environment suggests that these neurons may play a role in identifying components of optic flow that are inconsistent with self-motion. VPS might contribute to generating a representation of self-motion that is robust to movement of objects in the scene. Such a role would be compatible with previous suggestions that VPS is involved in generating perceptual stability during pursuit eye movements (Thier et al., 2001; Lindner et al., 2006; Dicke et al., 2008; Trenner et al., 2008). Other data from humans suggest a general role of area VPS in the suppression of optokinetic nystagmus (Dieterich et al., 1998, 2003; Haarmeier and Kammer, 2010) and in the percept of vection (Brandt et al., 1998). Future experiments should directly address the role of VPS in perceptual stability during eye, head, and body motion.
Footnotes
This work was supported by NIH Grants EY017866 (D.E.A.) and EY016178 (G.C.D.). We thank Babatunde Adeyemo for the MRI analysis, as well as Yong Gu, Chris Fetsch, Syed Chowdhury, and Yun Yang for contributing neural recordings to this analysis.
Correspondence should be addressed to Dora E. Angelaki, Department of Anatomy and Neurobiology, Washington University Medical School, 660 South Euclid Avenue, St. Louis, MO 63110. angelaki{at}pcg.wustl.edu