Abstract
Information pertaining to visual motion is used in the brain not only for conscious perception but also for various kinds of motor control. In contrast to the increasing amount of evidence supporting the dissociation of visual processing for action versus perception, it is less clear whether the analysis of visual input is shared in characterizing various motor outputs, which require different kinds of interaction with the environment. Here we show that, in human visuomotor control, motion analysis for quick hand control is distinct from that for quick eye control in terms of spatiotemporal analysis and spatial integration. The amplitudes of the implicit and quick hand and eye responses induced by visual motion stimuli varied differently with stimulus size and pattern smoothness (e.g., spatial frequency). Surprisingly, the hand response did not decrease even when coarse-pattern visual motion was mostly occluded over the visual center, whereas the eye response decreased markedly. Since these contrasts cannot be ascribed to any difference in motor dynamics, they clearly indicate different spatial integration of visual motion for the individual motor systems. Going against the overly unified hierarchical view of visual analysis, our data suggest that visual motion analyses are separately tailored, from early levels, to individual motor modalities. Namely, the hand and the eye see the external world differently.
Introduction
In exploring how the brain analyzes visual motion, numerous psychophysical and physiological studies (Burr and Ross, 1982; Johnston and Wright, 1983; Maunsell and Van Essen, 1983; Newsome and Pare, 1988; Anderson and Burr, 1991; Kawano et al., 1994; Duffy and Wurtz, 1997; Whitney and Cavanagh, 2000; Priebe and Lisberger, 2004) have focused on the specificities of perceptual performance and of motion-sensitive neural activities. Prevailing views on how external visual motion is encoded assume that a unified visual motion analysis serves the brain's various computations for object recognition, scene perception, and motor control (Van Essen et al., 1992). In fact, many studies have reported that oculomotor performance driven by visual motion is closely related to the perceived motion (Yasui and Young, 1975; Beutter and Stone, 2000; Stone and Krauzlis, 2003) and that similar forms of visual feature detection subserve different motor systems, such as the eyes, arms, and legs (Engel et al., 2000; Glover and Dixon, 2004). Such a parsimonious representation is attractive for examining brain computation because the brain may require coherent visual information representing the external physical world for its different downstream functions.
On the other hand, an alternative idea of dissociation in visual processing between action and perception has been developed on the basis of a series of experimental studies (Goodale et al., 1986; Pélisson et al., 1986; Churchland et al., 2003). The dissociation observed in a patient (Goodale et al., 1991), however, has recently been better described by a dichotomy of egocentric and allocentric processing (Schenk, 2006), and several studies have carefully questioned the dissociation between action and perception (Franz et al., 2000; Smeets and Brenner, 2006). It therefore remains unclear at what level visual processing is dissociated. According to some of these discussions, the dissociation could occur in high-level processes for brain functions rather than in low-level visual feature analysis. Indeed, for quick eye responses induced by several visual motion attributes, some dissociations between the response specificities and the perceptual specificities can be explained by different levels of motion processing, such as the detection of first-order versus non-first-order motion (Guo and Benson, 1999; Masson and Castet, 2002; Hayashi et al., 2008; Hayashi et al., 2010). However, whether a low-level visual analysis extracting a particular visual attribute is processed distinctly for different brain functions has not yet been well examined.
Here we focus on the visual motion analysis for hand and eye control, in which surrounding visual motion elicits ultrashort latency responses in the direction of the visual motion (Miles et al., 1986; Kawano et al., 1994; Brenner and Smeets, 1997; Whitney et al., 2003; Saijo et al., 2005). These quick reflexive responses of the hand and eye, termed the manual following response (MFR) and ocular following response (OFR), respectively, are considered to function in reducing the corresponding movement error during dynamic interactions with the environment (Miles et al., 1986; Gomi, 2008). Interestingly, the hand response has gain-tuning specificities similar to those of the eye response for changes in image luminance contrast and in the spatiotemporal frequencies of large-field visual motion (Gomi et al., 2006), whereas these specificities differ from those of the perceptual effects caused by visual motion (Burr and Ross, 1982; Levi and Schor, 1984; De Valois and De Valois, 1991). Because of the similar specificities of the quick hand and eye responses, it was inferred that the visual motion analysis is shared by these quick motor responses. Surprisingly, however, by varying the size and location of the stimuli, we found that the visual analyses for hand and eye control are dramatically distinct from each other with respect to their spatiotemporal analysis and spatial integration of visual motion.
Materials and Methods
Experimental setup.
Twenty-eight subjects [8 for Experiment 1 (Exp. 1), 10 for Experiment 2 (Exp. 2), and 10 for Experiment 3 (Exp. 3); 21 males; all right-handed; 21–39 years of age] participated in this study. None of them reported having any motor or visual disorders. The subjects had normal or corrected-to-normal visual acuity. All gave informed consent to participate in the study, which was approved by the local ethics committee.
The experimental setup for the visuomotor task was almost identical to that used in our previous experiment (Kadota and Gomi, 2010). A back-projection screen was placed vertically in front of the subject, 48 cm from the eyes. Mean luminance around the screen center was 46.7 cd/m². Right-hand position (around the base of the index finger) was measured by a motion-capture system (VICON MX13; Peak Performance Technologies) at 250 Hz, and right-eye angular position was measured by an eye tracker (EyeLink II; SR Research) at 500 Hz. Eye positions were calibrated before every eye-movement recording block. The subject's head was supported by a chin rest in all experiments. In the simultaneous recording experiment (Exp. 1), we additionally used a silicone bite bar, molded for each subject, to immobilize the head, on which the eye tracker was mounted; this was required to minimize noise caused by arm movement. In Exp. 2 and Exp. 3 (described below), which examined the responses under many stimulus conditions and therefore took much longer than Exp. 1, a bite bar was not used, to reduce the subjects' fatigue and frustration. The hand and eye positions were therefore recorded in separate sessions to avoid eye-camera fluctuation caused by hand movements. The actual time of visual motion onset was detected by a photodiode signal, as in our previous studies (Saijo et al., 2005; Gomi et al., 2006; Kadota and Gomi, 2010).
Protocol and visual stimuli.
Figure 1 shows the stimulus sequence and behavioral task used in all experiments in this study. Subjects were asked to keep pushing a button switch placed on a table (40 cm horizontally from the screen) with the right index finger. They were also asked to keep their eyes fixated on the screen center (target position) and to avoid blinking until the end of each trial. While they kept pushing the button, a red target marker (6 mm) was flashed at the center of the screen, and one of the static grating patterns (Michelson contrast, 50%), randomly selected from the stimuli prepared for each experiment (explained below), was shown at the center of the screen 0.2 s after the target flash, as shown in Figure 1. Subsequently, 0.6 s after the grating pattern had appeared, a beep was presented as the cue to initiate hand reaching with button release (Exp. 1 and the hand-reaching recording sessions in Exp. 2 and Exp. 3) or to initiate button release only (the eye-movement recording sessions in Exp. 2 and Exp. 3).
Figure 1. Temporal sequence of the task and stimulus. The visual stimulus (contrast grating pattern) appeared 0.6 s after the target flash (small red marker), started to move leftward or rightward 0.1 s after the button release, and lasted 0.4 s. In the hand-movement blocks, subjects were instructed to make a smooth reaching movement after releasing the button and to touch the location of the flashed target on the screen (see Materials and Methods for details). Note that in the hand-movement blocks, stimulus motion started just after reaching onset (button release).
In the hand-reaching recording sessions, the subjects were instructed to make a smooth reaching movement at moderate speed (∼0.6 s for the 51 cm distance from the button to the target on the screen), touch the location of the flashed target on the screen with the index finger, and then return the finger to the button on the table. The subjects were asked not to vary reaching speed or touch position across trials, but no explicit feedback on reaching duration or touch accuracy was given during the experiment.
In visual motion trials, the grating pattern started to move either rightward or leftward at a constant speed with a temporal frequency of 10 Hz at 0.1 s (±6 ms SD) after the button release, as illustrated in Figure 1. The stimulus motion continued for 0.4 s during the hand-reaching movement. In no-visual-motion trials (the number of trials in each experiment is given below), the grating pattern remained static until it vanished.
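For reference, since every grating drifted at a fixed temporal frequency, the drift speed follows directly from the spatial frequency (speed = temporal frequency / spatial frequency). The following is a minimal worked sketch of this arithmetic for the spatial frequencies used in these experiments:

```python
# Drift speed of a grating at constant temporal frequency (TF):
#   speed [deg/s] = TF [cycles/s] / SF [cycles/deg]
# Illustrative values for the 10 Hz stimuli used here.
TF = 10.0  # temporal frequency [Hz]
for sf in (0.02, 0.05, 0.2, 0.8):  # spatial frequencies [cpd]
    print(f"SF = {sf:4.2f} cpd -> drift speed = {TF / sf:6.1f} deg/s")
```

Thus, the coarser the pattern, the faster the drift at a matched temporal frequency (e.g., 200 deg/s at 0.05 cpd vs 50 deg/s at 0.2 cpd).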
In Exp. 1, we used coarse (0.05 cpd) and fine (0.2 cpd) contrast grating patterns, 50° in diameter, as visual stimuli (Fig. 2). The area outside the stimulus pattern was filled with uniform gray at the mean (50%) luminance. In this experiment, hand and eye movements were recorded simultaneously for each subject (360 trials in total: 80 trials for each direction of each pattern and 20 trials for each of the coarse and fine static patterns). The order of stimuli was randomized, and an intermission was taken every 90 trials to avoid fatigue.
In Exp. 2 (center-stimulus experiment), to examine the response modulations of the hand and eye for various stimuli, we used 16 contrast grating patterns with four spatial frequencies (0.02, 0.05, 0.2, and 0.8 cpd) and four stimulus sizes (visual angle diameters of 10, 20, 40, and 50°; Fig. 3). The area outside the stimulus pattern was filled with uniform gray at the mean (50%) luminance, as in Exp. 1. This experiment consisted of two hand-reaching recording sessions and two eye-movement recording sessions. In the eye-movement recording sessions, the subjects performed the same task without the reaching movement, to prevent noise in the eye position measurement caused by eye-tracker fluctuation accompanying arm movement in the absence of a bite bar; instead, they simply released the button when they heard the beep. All trials in each of the hand-reaching and eye-movement recording sessions were divided into five blocks (144 trials per block). An intermission of over 5 min was inserted between blocks to avoid fatigue, and the order of the hand and eye sessions was counterbalanced across subjects. In total, a grating pattern (randomly selected in each trial) moved in 1280 of the 1440 trials in the hand-reaching recording sessions (40 trials for each pattern and each direction). In the remaining trials, each of the 16 grating patterns (four stimulus sizes × four spatial frequencies) was kept static in 10 trials (160 static-pattern trials in total).
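For concreteness, the trial bookkeeping of Exp. 2 can be sketched as follows. The plain shuffle and the block assignment shown are assumptions; the text above specifies only the trial totals, the randomized order, and the block size:

```python
import random

sizes = [10, 20, 40, 50]      # stimulus diameters [deg]
sfs = [0.02, 0.05, 0.2, 0.8]  # spatial frequencies [cpd]

# 16 patterns x 2 directions x 40 repeats = 1280 motion trials,
# plus 16 patterns x 10 static repeats = 160, giving 1440 in total.
trials = [(sz, sf, d) for sz in sizes for sf in sfs
          for d in ("left", "right") for _ in range(40)]
trials += [(sz, sf, "static") for sz in sizes for sf in sfs
           for _ in range(10)]
assert len(trials) == 1440

random.shuffle(trials)  # stimulus order was randomized across trials
blocks = [trials[i:i + 144] for i in range(0, len(trials), 144)]  # 144/block
```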
In Exp. 3 (center-mask experiment), we used full-screen contrast grating patterns [82 cm (horizontal) × 62 cm (vertical); 81 × 65.7°; 0.02, 0.05, 0.2, and 0.8 cpd] with circular gray masks at the mean (50%) luminance (diameters of 0, 10, 20, 40, and 50°; Fig. 4). The procedure of this experiment was identical to that of Exp. 2 except for the stimulus patterns and the number of trials. Each block of the hand and eye sessions consisted of 180 trials. In total, a grating pattern moved in 1600 of the 1800 trials in each of the hand and eye sessions (40 trials for each pattern and each direction). In the remaining trials, each of the 20 grating patterns (five mask sizes × four spatial frequencies) was kept static in 10 trials (200 static-pattern trials in total).
Data analysis.
All captured hand and eye positions were low-pass filtered (third-order Butterworth filter with a cutoff of 20 Hz), and the velocity and acceleration profiles were obtained by three- and five-point numerical time differentiation (without delay), respectively.
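A minimal sketch of this preprocessing is given below. The use of zero-phase filtering (filtfilt) and the exact difference stencils are assumptions consistent with, but not specified by, the description above:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_and_differentiate(pos, fs):
    """pos: 1-D position trace; fs: sampling rate [Hz]
    (250 Hz for the hand, 500 Hz for the eye)."""
    b, a = butter(3, 20.0 / (fs / 2.0))  # third-order Butterworth, 20 Hz cutoff
    pos_f = filtfilt(b, a, pos)          # zero-phase (assumed) low-pass filter
    dt = 1.0 / fs
    # Three-point central difference for velocity (no phase delay).
    vel = np.gradient(pos_f, dt)
    # Five-point central-difference second derivative for acceleration;
    # edge samples are unreliable and would be discarded in practice.
    kernel = np.array([-1.0, 16.0, -30.0, 16.0, -1.0]) / (12.0 * dt**2)
    acc = np.convolve(pos_f, kernel, mode="same")
    return pos_f, vel, acc
```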
All data were aligned with respect to the visual motion onset. The hand and (right) eye responses caused by the visual motion were characterized by taking the difference between the corresponding time courses of mean acceleration in the horizontal direction parallel to the screen for the rightward and leftward motion conditions. Note that the mean was taken after failed trials had been excluded (no or delayed reaching, eye closure, saccades, and missed recordings), as in our previous studies (Kadota and Gomi, 2010). Response amplitudes were quantified by temporal averaging over a period of 120–160 ms after visual motion onset for the hand acceleration profiles and 80–120 ms for the eye acceleration profiles. Statistical examination of the hand and eye data in each experiment was conducted by a two-way repeated-measures ANOVA with stimulus (or mask) size and spatial frequency as factors. Post hoc comparisons (Tukey's wholly significant difference test) were applied when a factor or the interaction was significant in the ANOVA.
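A sketch of this response quantification, under the assumption that trials are stored as trials × time arrays already aligned to motion onset and cleaned of failed trials:

```python
import numpy as np

def response_amplitude(acc_right, acc_left, t, window):
    """acc_right, acc_left: trials x time horizontal acceleration arrays.
    t: time [s] relative to visual motion onset.
    window: (0.120, 0.160) for the hand, (0.080, 0.120) for the eye."""
    # Difference of mean time courses for rightward vs leftward motion.
    diff = acc_right.mean(axis=0) - acc_left.mean(axis=0)
    mask = (t >= window[0]) & (t <= window[1])
    return diff[mask].mean()  # temporal average within the window
```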
Results
Mean hand movement durations across subjects (detected with a velocity threshold of 0.05 m/s) for the simultaneous recording experiment (Exp. 1), the center-stimulus experiment (Exp. 2), and the center-mask experiment (Exp. 3) were 0.60 ± 0.08, 0.58 ± 0.08, and 0.56 ± 0.08 s (mean ± SD), respectively, and hand-reaching duration did not differ significantly among the three experiments (one-way ANOVA, p > 0.6).
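A hedged sketch of this duration measurement follows; the text specifies only the 0.05 m/s threshold, so the first-to-last suprathreshold rule below is an assumption:

```python
import numpy as np

def movement_duration(speed, fs, thresh=0.05):
    """speed: 1-D hand speed trace [m/s]; fs: sampling rate [Hz] (250 here)."""
    above = np.flatnonzero(speed > thresh)   # suprathreshold samples
    if above.size == 0:
        return 0.0
    return (above[-1] - above[0] + 1) / fs   # duration above threshold [s]
```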
In the first experiment, the subjects (n = 8) were asked to repetitively produce arm-reaching movements to a remembered target position, and the pattern shown just before reaching onset suddenly began to move leftward or rightward during the reach (see Materials and Methods for details). In response to the visual motion, the hand path is briefly deflected (Saijo et al., 2005; Gomi et al., 2006) and the eye moves quickly (Miles et al., 1986; Kawano et al., 1994) in the direction of the visual motion, without any intention to act.
Figure 2a shows the mean temporal patterns of the hand and eye responses in the direction parallel to the visual motion, recorded simultaneously from a representative subject. Note that these patterns were characterized by the difference between the acceleration patterns for the rightward and leftward stimuli, as in our previous studies (Saijo et al., 2005; Gomi et al., 2006; Kadota and Gomi, 2010). Interestingly, the peak of the hand response appeared to be greater for the 0.05 cpd stimulus than for the 0.2 cpd stimulus, whereas the initial peak of the eye response was slightly greater for the 0.2 cpd stimulus. Figure 2b shows the amplitudes of the corresponding responses, quantified by temporal averaging around each peak (see Materials and Methods). The differences between the response amplitudes for the two stimuli were statistically significant across subjects for both the hand and the eye, suggesting distinct visual motion analyses for hand and eye control.
Figure 2. Difference in hand and eye responses to the two spatial frequency stimuli recorded simultaneously. a, Temporal patterns of the hand (left) and eye (right) responses of a representative subject to the 50° stimuli with spatial frequencies of 0.05 cpd (solid curve) and 0.2 cpd (dashed curve) in Exp. 1. Each response was characterized by taking the difference between the corresponding accelerations for rightward and leftward stimuli. Time 0 denotes the visual motion onset, and the thick black horizontal bar denotes the temporal averaging window (hand, 120–160 ms; eye, 80–120 ms) used for quantifying the response amplitude. b, Mean response amplitudes of the hand (left) and eye (right) for visual motion with spatial frequencies of 0.05 and 0.2 cpd (top diagrams). Each error bar denotes the SE. Asterisks indicate statistical significance (*p < 0.05; ***p < 0.005) in the paired t test.
To investigate in more detail the differences in visual motion processing for hand and eye control, in the next experiment (Exp. 2) we applied grating pattern stimuli with four sizes and four spatial frequencies (stimulus diameters of 10, 20, 40, and 50°; 0.02, 0.05, 0.2, and 0.8 cpd; 10 Hz). As expected from the spatial summation principle of visual motion perception (Anderson and Burr, 1991), the hand and eye responses gradually increased with stimulus size for the lower spatial frequency stimuli (size-tuning functions: blue, 0.02 cpd; green, 0.05 cpd; Fig. 3a, left). For the higher spatial frequency stimuli (red, 0.2 cpd; cyan, 0.8 cpd), however, the hand response did not increase as stimulus size increased, whereas the eye response increased continuously. For the hand, the interaction between the size and spatial frequency factors, as well as both main effects, was statistically significant (p < 0.001 for all three; see Materials and Methods), whereas for the eye the interaction was not (two main effects, p < 0.001; interaction, p > 0.9). The opposite tendencies of the hand and eye response amplitudes for the 0.05 and 0.2 cpd stimuli in the large stimulus conditions (stimulus sizes of 40 and 50°; Fig. 3a) were thus similar to those observed in Exp. 1 (Fig. 2b), indicating that the eye response tuning is independent of hand movement execution. This independence is consistent with a previous observation that the eye response induced by visual motion was not influenced by hand movement direction (Abekawa and Gomi, 2010).
Figure 3. Hand and eye response changes for various center stimuli. a, Stimulus-size tunings of hand (left) and eye (right) response amplitudes for the different spatial frequencies (0.02, 0.05, 0.2, and 0.8 cpd). Amplitude was quantified by temporal averaging over the windows indicated by the solid black bars in Figure 2a. Each error bar denotes the corresponding SE across subjects. b, Temporal development of the spatial frequency tunings of the hand (top) and eye (bottom) responses for center stimuli of 10, 20, 40, and 50°. The spatial frequencies of the tuning peaks for the hand differed between the smaller stimuli (0.2 cpd for 10 and 20°) and the larger stimuli (0.05 cpd for 40 and 50°), whereas the peak for the eye was the same for all stimulus sizes (0.2 cpd for 10, 20, 40, and 50°).
To examine whether this tuning difference was caused by adaptation over trials in this experiment, we compared the stimulus tuning specificities between the data from the first and second halves of the sessions. The interaction between the size and spatial frequency factors was consistently found in both halves for the hand response (p < 0.001 for both), whereas no interaction was found for the eye response. It is therefore unlikely that the tuning differences between the hand and eye responses were formed by adaptation over the many trials of the task.
Figure 3b shows the temporal development (abscissa) of the spatial frequency tunings (ordinate) of the hand and eye responses for stimulus sizes of 10–50°. As shown in Figure 3b (top), the peaks (dark red) of the spatial frequency tuning of the hand response were located at 0.2 cpd for the smaller stimuli (10 and 20°) but at 0.05 cpd for the larger stimuli (40 and 50°), with no change in the timing of these peaks. In contrast, no such difference between the spatial frequency tuning peaks for the smaller and larger stimuli was observed in the eye responses (Fig. 3b, bottom), because the eye response was greatest at 0.2 cpd for every stimulus size (Fig. 3a, right). This tuning dissociation between the hand and eye responses suggests different spatial integration of visual motion for the quick responses of the two motor systems.
What sort of difference in the spatial integration mechanism produces these distinct visual motion analyses for the different motor systems? One plausible explanation of the above phenomena is that, for the hand motor system, spatial integration of visual motion at higher spatial frequencies is restricted to the central visual region. To examine this simple area-integration model, we conducted another experiment (Exp. 3) in which contrast grating motion was applied with visual center masks of different sizes.
Figure 4a (bottom) shows the hand (left) and eye (right) responses averaged across subjects for the two stimulus patterns depicted at the top: grating motion of 0.05 cpd with visual center masks of 0° and 40°, depicted by dotted and solid curves, respectively. Surprisingly, the hand acceleration peak did not decrease but, rather, appeared to increase slightly when the central 40° of the stimulus was masked, whereas the initial increases of eye acceleration and the resultant eye velocity were markedly delayed and reduced.
Figure 4. Hand and eye response changes for various center-masked stimuli. a, Acceleration and velocity temporal patterns of hand (left) and eye (right) responses (rightward stimulus − leftward stimulus) for 0.05 cpd grating stimuli with 0° (dotted curve) and 40° (solid curve) masks (illustrated at the top middle), averaged across subjects. b, Mask-size tunings of hand (left) and eye (right) response amplitudes for the different spatial frequencies. Statistical analyses indicate that the eye response decreased significantly with mask size for all spatial frequencies, whereas the hand response did not, especially for the 0.05 cpd stimuli (see Results). c, Temporal developments of the spatial frequency tunings of the hand and eye responses for the mask stimuli with sizes of 0, 10, 20, 40, and 50°.
To examine the tendencies in these response modulations, we plotted the mask-size tuning functions (mask size, 0–50°) of the hand and eye responses to the visual motion at each spatial frequency, as shown in Figure 4b. Although the eye response amplitude (Fig. 4b, right) decreased significantly with mask size (p < 0.0001), the hand response amplitude (Fig. 4b, left) showed no such monotonic decrease [significant interaction (p < 0.05) between the two main factors; significant changes (p < 0.05) in the hand response only for 0.2 and 0.8 cpd in post hoc analyses]. As demonstrated in Figure 4a, even with the 40° visual center mask, a clear hand response was observed for the 0.05 cpd stimulus (Fig. 4b, left, green curve), and, surprisingly, it was slightly greater (one-sided paired t test, p < 0.05) than with no (0°) visual center mask.
To examine the effect of trial adaptation on the stimulus tuning specificities, we also analyzed the data from the first and second halves of all sessions of the center-mask experiment. As in the above analyses, the eye response decreased with mask size in both halves (size main effect, p < 0.05 for both) whereas the hand response did not, especially for the 0.05 cpd spatial frequency, suggesting that, as for the data of Exp. 2, trial adaptation in this experiment is of little relevance in characterizing the difference between the hand and eye responses.
Additionally, for the higher spatial frequencies (0.2 and 0.8 cpd), the hand response was clearly induced (t test, p < 0.005) with the 10 and 20° masks (Fig. 4b, left, red and cyan curves), even though in the center-stimulus experiment described above (Fig. 3a, left) the response to the 0.2 and 0.8 cpd stimuli (red and cyan lines) did not increase significantly (ANOVA, p > 0.5) when the stimulus area was expanded beyond 10°. Furthermore, the small masks (10 and 20°) appeared to magnify the hand responses to the higher spatial frequency stimuli (0.2 and 0.8 cpd) despite the reduction in stimulus area.
These response increases caused by removing the visual center stimulus suggest that center motion coding normally inhibits peripheral motion coding and that masking the center releases this inhibition. As a result of these spatial interactions, the spatial frequency tunings varied with mask size, as shown in Figure 4c. In the tunings of the hand (Fig. 4c, top), a relatively low spatial frequency peak was observed at around 140 ms in the 0, 40, and 50° mask conditions, whereas no such strong specificity was observed in the 10 and 20° mask conditions. These observations rule out the simple area-integration model of hand control mentioned above. Instead, the results suggest (inhibitory) spatial interactions between the visual center and periphery in integrating spatially distributed visual motion signals. In the eye response, by contrast (Fig. 4c, bottom), the peak of the spatial frequency tuning appeared to shift monotonically to lower spatial frequencies with increasing mask size, accompanied by a decrease in the peak amplitude.
Discussion
Distinct visual motion analyses for the hand and eye
The current results shed light on the spatial frequency- and stimulus size-dependent modulation of the visual motion analyses involved in quick visuomotor control of the hand and eye. Although similar spatial frequency tunings of the quick hand and eye responses were found for the small stimuli (10 and 20°; Fig. 4b) in the current study, and previously for large, low-contrast stimuli (Gomi et al., 2006), different specificities of the stimulus-size and mask-size tunings of the hand and eye responses were newly revealed by examining the various types of stimuli. To dissociate these tunings, subsets of the visual motion signals over the visual field must be integrated differently for the hand and eye. This suggests distinct processing of visual motion, although its neural implementation cannot be ascertained from the current results alone. Note that the spatiotemporal frequency tunings of the hand and eye responses [peaks around 15–20 Hz at ∼0.05 cpd for relatively large stimuli (Gomi et al., 2006)] are completely different from those of the known visual interaction effects of induced position shift [peak around 4–8 Hz at low spatial frequency (De Valois and De Valois, 1991)] and induced motion [peak induction ratio at <1 Hz at ∼1–5 cpd (Levi and Schor, 1984)]. It is therefore unlikely that the present dissociation can be explained by different visual attributes, with one motor response driven by an illusory position shift of the grating stimulus and the other by the direct effect of visual motion.
In the second experiment, the most prominent difference of the hand response tunings from those of the eye was the absence of a significant response increase with stimulus size at the higher spatial frequencies (Fig. 3a). One could speculate that this absence of an increase in the hand responses is attributable to a ceiling effect on the motor response. This is unlikely, however, because the response amplitudes for the higher spatial frequency stimuli remained smaller than the response to the 50° stimulus at 0.05 cpd (Fig. 2b). Another possible reason could be sparse spatial integration of higher spatial frequency visual motion in the visual periphery, because neuron density is low in peripheral vision (Rolls and Cowey, 1970; Rovamo et al., 1978; Johnston and Wright, 1983). If we assume a unified visual motion analysis, however, this idea cannot explain our observation that the amplitude of the eye response increased with stimulus size similarly for all spatial frequencies (Fig. 3a, right).
One could also speculate that the stimulus-tuning difference between the hand and eye responses is caused by the temporally delayed sampling of the hand response relative to the eye response, as can be seen in Figure 2a. However, as shown in the temporal development of the spatial frequency tunings of the hand response (Fig. 3b), the timing of the tuning peaks did not appear to differ across stimulus sizes. The longer delay of the hand response (by 30–40 ms) relative to the eye would therefore simply be attributable to the longer neural transmission/processing time and the large inertial dynamics of the arm (Saijo et al., 2005). In addition, considering that the change in retinal slip produced by the initial eye response affects the subsequent eye and hand responses only after ∼160 ms (since the latency of the eye response was ∼80 ms), the different stimulus tuning specificities of the hand and eye responses shown in the Results cannot be ascribed to any difference in visual motion inputs. It is therefore suggested that the visual motion processing producing the stimulus size-dependent change in the spatial frequency tuning of the hand response is distinct from that producing the stimulus size-independent spatial frequency tuning of the eye.
Furthermore, the results of Exp. 3 clearly indicate spatial frequency-dependent interactions between the visual motion signals coded in the visual center and periphery for hand control. Since simple accumulation or local interactions of motion signals over the visual field cannot produce such a long-range spatial interaction (Angelucci et al., 2002), higher-level visual motion processing stages would be required to generate the remote interaction. Importantly, since the mask-size tunings of the eye response differed greatly from those of the hand response, spatial interactions across visual areas would characterize the hand and eye responses differently.
A critical aspect of the above findings is that the dissociations cannot be explained by any difference in motor dynamics or motor coordinates combined with a single (or completely shared) visual motion analysis. In contrast, previously reported dissociations depending on the motor system (Masson et al., 1995) or on task context (Abekawa and Gomi, 2010; Tramper and Gielen, 2011) can be ascribed to different motor coordination processes for eye and hand control (i.e., motor planning, motor command generation, or eye-hand coupling) rather than to distinct visual processes. The dissociation found in our study therefore provides essential evidence for multiple streams of visual motion processing serving different motor functions.
Functional significance of motor-dependent visual motion analysis
The two sharp contrasts in the stimulus-size and mask-size tunings of the hand and eye responses provide new functional insight into visual motion analysis. Since vestibular information contributes greatly to adjusting hand reaching against body movements but is still insufficient on its own (Blouin et al., 1995a,b; Whitney et al., 2003), this peripherally weighted visual motion analysis would be useful for improving the dynamic performance of hand-reaching control during body motion, which frequently accompanies large-field visual motion (Whitney et al., 2003; Gomi, 2008). In other words, the motion of a coarse pattern detected in the visual periphery would usually reflect body motion. This agrees with body posture control, which is also sensitive to stimuli in the visual periphery (Brandt et al., 1973; Lestienne et al., 1977; Straube et al., 1994). In contrast, the relatively higher spatial frequency and visual center-weighted tuning of the eye response would be functional in stabilizing retinal images on the foveal and parafoveal regions, which are important in capturing gazed objects. Even though the observed ultraquick hand and eye responses were small and transient in the current experimental setup, which was designed to isolate the straightforward effect of visual motion, they might be used continuously in dynamic control in daily life. Distinct tuning of visual motion integration for each motor system would therefore be quite meaningful and functional for improving each motor performance.
Traditional visual neuroscience studies have tackled the question of how visual attributes are decomposed and represented in different neural substrates (Zeki, 1978; Van Essen et al., 1992), and it has been widely believed that each decomposed feature is used in common by various brain functions, such as perception and the control of different motor systems. Indeed, as mentioned in the Introduction, similar forms of visual motion analysis appear to be involved in a particular type of oculomotor control and in motion perception (Yasui and Young, 1975; Beutter and Stone, 2000; Stone and Krauzlis, 2003). On the other hand, several studies have suggested multiple visual processing streams serving each downstream function (Goodale et al., 1986; Pélisson et al., 1986; Churchland et al., 2003), and recent studies have revealed that visual motion is coded in several brain areas, which, as described below, could contribute to different output functions. Our results provide quantitative evidence that visual motion coding is formed differently according to particular downstream motor functions, i.e., quick hand and eye control. For a deeper understanding of hierarchical and parallel brain processing, we need to reconsider the overly simplified representation of visual attributes in the brain.
Possible neural substrates contributing to distinct visual motion analyses
Together, the current results predict that different neural substrates are involved in the visual motion analyses for hand and eye control. As mentioned in the Introduction, physiological and functional brain imaging studies of eye movements and visual motion perception have examined the contribution of motion-specific areas of the extrastriate cortex [middle temporal area (MT/V5) and medial superior temporal area (MST)] (Maunsell and Van Essen, 1983; Komatsu and Wurtz, 1988; Newsome and Pare, 1988; Graziano et al., 1994; Kawano et al., 1994; Duffy and Wurtz, 1997; Priebe and Lisberger, 2004) to visual motion analysis. These studies successfully demonstrated many features of neural activity in MT/MST for various stimuli (speed preference, receptive field size/location, spatiotemporal frequency tunings, and latency) but did not suggest that spatial integration of visual motion is tailored distinctly to each motor system. Even though the visual motion signals coded in area MST drive the quick OFR (Kawano et al., 1994; Takemura et al., 2007), a different population of MT/MST neurons could be involved in the visual motion analysis for the MFR. Indeed, we found a significant correlation between the blood oxygenation level-dependent signal around hMT+ and MFR amplitude in our fMRI experiment (Gomi et al., 2011) and found that inactivation of monkey MST by muscimol injection reduces the MFR (Takemura et al., 2008). In addition, the dorsal part of MST is connected to area 7b (Andersen et al., 1990; Boussaoud et al., 1990), which could send signals to the primary motor cortex via the ventral premotor area (Shipp et al., 1998). These facts support the possibility that MT/MST contributes to MFR generation.
On the other hand, occipito-parietal areas have also been found to be involved in global motion analysis (Galletti et al., 1990; Watson et al., 1993; Dupont et al., 1994; Cheng et al., 1995; Gegenfurtner et al., 1997; Tootell et al., 1997; Smith et al., 1998; Sunaert et al., 1999; Braddick et al., 2001; Vanduffel et al., 2002; Orban et al., 2003; Fischer et al., 2012). These areas could therefore provide different or additional spatial integration characteristics suitable for quick hand control.
Area V6 receives visual signals from the early visual cortices (V1, V2, V3, V3A) (Galletti et al., 2001), and many of its neurons are activated by motion stimuli at various speeds (Galletti et al., 2001) and by peripheral visual field stimuli (Galletti et al., 1999; Pitzalis et al., 2006), in conjunction with the other motion-sensitive areas, MT/MST (Galletti et al., 2001). In addition, V6, like area MST, has been suggested to be involved in self-motion analysis (Cardin and Smith, 2010). Furthermore, V6 is strongly connected to V6A, which is reciprocally connected to the dorsal premotor cortex (Matelli et al., 1998; Shipp et al., 1998; Fattori et al., 2005). Considering these facts, V6 could be involved in the MFR generation process. As far as we know, however, the spatiotemporal frequency tunings of V6 neurons for motion and the stimulus-size dependency of those tunings are not yet clearly understood.
As for the other motion-sensitive area, V3A, its receptive field sizes [e.g., 6° in monkey at an eccentricity of 14° (Galletti et al., 1990); ∼10° in the human peripheral visual field at an eccentricity of ∼12° (Smith et al., 2001; Amano et al., 2009)] are smaller than those of neurons in V6 (Galletti et al., 1999) and MST (Komatsu and Wurtz, 1988), but fMRI experiments have shown that V3A is vigorously activated by coherent motion (Braddick et al., 2001) and even by large stimuli (Cardin and Smith, 2010). Since, in the current study, the MFR was clearly induced by the relatively small (10°) stimuli with higher spatial frequencies (0.2 and 0.8 cpd), area V3A could be involved in the MFR generation process for these small stimuli.
Additional evidence is needed to clarify whether V6, MT/MST, or both contribute to the MFR. To clarify the neural substrates of the MFR, it would at least be necessary to examine (1) the details of the specificities of the neural activities for various stimuli (spatiotemporal frequency tunings, stimulus size-dependent spatial frequency tunings, and latencies of neural activity) and (2) the effects of focal inactivation or damage of V6 and MST on the MFR. Our findings of motor-dependent visual motion analyses provide new insights for precisely investigating the neural substrates that process visual motion for each motor function.
Footnotes
- Received October 6, 2012.
- Revision received August 21, 2013.
- Accepted September 9, 2013.
This work was supported by the ERATO Shimojo Implicit Brain Function Project, Japan Science and Technology Agency. We thank S. Nishida, T. Kimura, and D. Whitney for constructive discussions and N. Ueda, E. Maeda, and M. Kashino for support and encouragement. We also thank anonymous reviewers for valuable comments and suggestions.
- Correspondence should be addressed to Hiroaki Gomi, NTT Communication Science Laboratories, Wakamiya 3-1, Morinosato, Atsugi, Kanagawa 243-0198, Japan. gomi.hiroaki{at}lab.ntt.co.jp
- Copyright © 2013 the authors 0270-6474/13/3316502-08$15.00/0