Abstract
Humans can readily and effortlessly perceive a rich, stable, and unified visual world from a complex visual scene. Yet our internal representation of a visual object appears to be sparse and fragmented. How and where in the brain are such fragmented representations organized into a whole percept? Recent studies have accumulated evidence that some global feature integration is mediated at the early stage of visual processing. However, the spatial operating range of the integration still remains unclear. The present human functional magnetic resonance imaging study provides support that the global integration process in early visual areas, including even the primary visual area V1, is mediated beyond the separated projection of visual hemifields from right and left sides of the fixation to the visual cortex of the contralateral cerebral hemisphere. Retinotopic neural responses corresponding to a visual target were significantly enhanced when another target was simultaneously presented at the point-symmetrical position in the nonassociated visual field quadrant. The result makes a convincing case that the contextual effects involve feedback from higher areas, because there are no direct callosal connections that allow such interhemispheric contextual modulation. This enhancement from the ipsilateral hemifield may help rapid position-and-size-invariant detection of a circular pattern, which may be special among visual structures because of its ubiquity in natural scenes. Early visual areas as well as higher ones may play a more essential role in perceiving the unity of the real world than previously thought.
Introduction
A primary challenge facing the visual system in the representation of visual scenes is that of integrating piecemeal features into information that signals unified visual structures. Recent physiological and neuroimaging studies have accumulated evidence that some integration of local information into a global pattern may take place in early as well as higher visual areas, a phenomenon known as neural “contextual modulation” or “collinear contour integration” (Lamme, 1995; Zipser et al., 1996; Super et al., 2001; Altmann et al., 2003; Kourtzi et al., 2003). However, there are still many unresolved questions about contextual modulation, such as its spatial operating range. Recent experimental paradigms have addressed contextual modulations only within spatially limited regions (beyond classical receptive fields, but from relatively neighboring regions or within one of the visual field quadrants). It remains unclear whether previously reported neural contextual modulations in early visual cortex operate across the right and left cerebral hemispheres, despite the fact that their visual field representations are organized contralaterally.
Contralateral dominance in sensory representation is a fundamental property of the primate sensory system: ascending sensory inputs to each cerebral hemisphere project from the contralateral (right or left) side and are processed separately at early stages of sensory processing (Gazzaniga, 2000). In human vision, retinal inputs from the right or left visual hemifield are separately projected to the primary visual cortex of the contralateral hemisphere, without any evidence of continuous representation across the junction between the two hemifields (Tootell et al., 1998b; Lavidor and Walsh, 2004). According to this internal split representation, imprints formed by visual objects on the early visual cortex differ remarkably from the way that they appear, depending on subtle changes in spatial position in the visual field, as argued by Schiller (1997) and illustrated in Figure 1. It is thus assumed that, in bottom-up visual hierarchy, dissociated representations of an object located across the visual vertical meridian should be combined only in higher visual areas in which the receptive fields of neurons are large enough to cover ipsilateral as well as contralateral visual hemifields.
However, given the existence of massive feedback connections from higher areas targeting early visual cortex and recent reports of their contribution to some contextual modulation (for review, see Lamme and Roelfsema, 2000; Super et al., 2001), we may well expect that early visual areas contribute to more global feature integration than previously thought, even beyond dissociated representations of visual hemifields.
To test this intriguing hypothesis, we performed block-design human functional magnetic resonance imaging (fMRI) experiments in which circular visual patterns aligned with various configurations were used. Each pattern consisted of centrally foveated quarter arcs, each of which was located in one quadrant of the visual field, which allowed for a precise localization and anatomical separation of an early cortical representation of an arc from others (see Figs. 3 A, 4 A). We investigated whether the retinotopic neural responses corresponding to a target arc were modulated by the other arcs presented in nonassociated visual fields when such elements were perceptually linked into a whole structure. If early retinotopic visual areas contribute to the interfield integration process and are sensitive to global arrangements of visual elements, they may systematically change activity depending on overall structure even when the elements are represented on disparate portions of the cerebral cortex.
Materials and Methods
Participants.
Seven participants (one female, one left-handed, 24–61 years old) took part in experiment 1. The data of one participant were discarded because of considerable head movements during fMRI scanning. In the gap condition of experiment 1, five participants took part. Seven participants (one female, one left-handed, 22–61 years old) took part in experiment 2. All participants had normal or corrected-to-normal vision and provided written and oral informed consent. They had experience in psychophysics and were well trained to sustain eye fixation. The Ethics Committee of the Graduate School of Human and Environmental Studies, Kyoto University, and that of the Department of Neurosurgery, Meiji University of Oriental Medicine, approved the experimental procedures in advance.
Imaging data acquisition.
Functional MR measurements were performed using a standard clinical 1.5 Tesla scanner (Signa Horizon; General Electric, Milwaukee, WI) equipped with a surface coil that covered the occipital lobe, with echo-planar capability (1.56 × 1.56 × 3 mm voxels; repetition time, 2000 ms; echo time, 50 ms; flip angle, 55°; 16 slices perpendicular to the calcarine sulcus). High-contrast T1-weighted structural MR images of the whole brain (0.93 × 0.93 × 1.4 mm voxels; 124 axial slices) were obtained once for each participant to reconstruct the cortical surface (Teo et al., 1997).
ROI localization.
First, retinotopic visual areas V1, V2, and V3 were identified using standard “rotating wedge” and “expanding ring” stimuli for each participant (Engel et al., 1997; Tootell et al., 1998a). Then, for each retinotopic area, time series of blood oxygenation level-dependent (BOLD) signals were analyzed on the basis of the eccentricity representation of the visual field as follows. First, a set of “bands” was defined on the reconstructed cortical surface based on the geodesic distance (the distance of the shortest path along the cortical surface) from the peripheral limit of the eccentricity map (16°) to the foveal region in each visual area. The geodesic distance was computed using Dijkstra’s algorithm (Dijkstra, 1959). The bands were delineated so that their centers differed in 1.5 mm steps in cortical distance and the width of each band was 3 mm, giving 50% overlap between adjacent bands. Because each band represented similar retinal eccentricity, we called these bands “isoeccentricity bands.” Next, the voxel time series within each isoeccentricity band were averaged to increase the signal-to-noise ratio, yielding the mean time series for each eccentricity. Thereafter, the mean time series from repeated scans were averaged. The time series for each eccentricity were further averaged across stimulus cycles to obtain event-related responses for a stimulus cycle. The event-related responses for different eccentricities were displayed as an image with an interpolated pseudocolor format. Using this technique, the spatiotemporal response to each stimulus can be easily determined.
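The band construction described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names, the mesh representation (a vertex list plus an edge list), and the choice to start the first band at zero distance are assumptions made for illustration. Note also that running Dijkstra's algorithm along mesh edges only approximates the true geodesic distance on the surface.

```python
import heapq
import math


def geodesic_distances(vertices, edges, sources):
    """Shortest-path distance (mm) along mesh edges from a set of source vertices."""
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = math.dist(vertices[a], vertices[b])  # Euclidean edge length
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))
    dist = [math.inf] * len(vertices)
    heap = []
    for s in sources:  # e.g. vertices on the peripheral limit of the map
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def isoeccentricity_bands(dist, step_mm=1.5, width_mm=3.0):
    """Group vertex indices into overlapping bands of width_mm, shifted by step_mm."""
    max_d = max(d for d in dist if math.isfinite(d))
    n_bands = max(1, math.ceil((max_d - width_mm) / step_mm) + 1)
    bands = []
    for k in range(n_bands):
        lo = k * step_mm
        hi = lo + width_mm
        bands.append([i for i, d in enumerate(dist) if lo <= d <= hi])
    return bands


def band_mean_timeseries(band, timeseries):
    """Average the time series of all members of one band (timeseries[i] is a list)."""
    n_t = len(timeseries[band[0]])
    return [sum(timeseries[i][t] for i in band) / len(band) for t in range(n_t)]
```

With a 1.5 mm step and a 3 mm width, each band shares half of its extent with its neighbor, which corresponds to the 50% overlap stated in the text.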
Imaging data analysis.
After motion correction (Woods et al., 1998) and slice time correction, the functional data in the main experiments were coregistered with the anatomical scan. Data were analyzed in predefined regions of interest (ROIs) in right V1d, V2d, and V3. To visualize spatiotemporal response profiles in these ROIs to the circular visual patterns, we applied the same isoeccentricity band averaging procedure as used in the ROI localization. Time courses from each ROI were extracted and imported into our in-house Matlab software for additional analysis. This included removing the linear trend by classical decomposition, event-related averaging of each stimulus condition, and estimation of each BOLD signal magnitude by fitting a hemodynamic response model (Boynton et al., 1996). Paired t tests were used to evaluate differences in the response for each of the ROIs.
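The analysis steps listed above (linear detrending, fitting a hemodynamic response model, and a paired comparison) can be sketched in Python as follows. This is an illustrative reconstruction and not the in-house Matlab software. The gamma-function impulse response follows the general form proposed by Boynton et al. (1996), but the parameter values (n, tau, delay) are assumptions, since they are not reported here; the function names are likewise hypothetical. The block timing (16 s on, 16 s off) and the repetition time (2 s) are taken from the text.

```python
import math

TR = 2.0  # repetition time in seconds (from the acquisition parameters)


def detrend_linear(y):
    """Subtract the least-squares straight line (including the mean) from y."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]


def gamma_hrf(t, n=3, tau=1.25, delay=2.5):
    """Gamma impulse response; n, tau, and delay are illustrative assumptions."""
    s = t - delay
    if s < 0:
        return 0.0
    return (s / tau) ** (n - 1) * math.exp(-s / tau) / (tau * math.factorial(n - 1))


def predicted_response(n_vols, on_s=16.0, off_s=16.0, dt=0.1):
    """On/off boxcar convolved with the impulse response, sampled once per TR."""
    on_steps = int(round(on_s / dt))
    cycle_steps = int(round((on_s + off_s) / dt))
    tr_steps = int(round(TR / dt))
    kernel = [gamma_hrf(k * dt) * dt for k in range(int(round(30.0 / dt)))]
    box = [1.0 if i % cycle_steps < on_steps else 0.0
           for i in range(n_vols * tr_steps)]
    pred = []
    for v in range(n_vols):
        i = v * tr_steps
        pred.append(sum(kernel[k] * box[i - k]
                        for k in range(min(i + 1, len(kernel)))))
    return pred


def fit_amplitude(y, pred):
    """Least-squares scale factor (with a free baseline) that maps pred onto y."""
    n = len(y)
    p_mean = sum(pred) / n
    y_mean = sum(y) / n
    num = sum((p - p_mean) * (v - y_mean) for p, v in zip(pred, y))
    den = sum((p - p_mean) ** 2 for p in pred)
    return num / den


def paired_t(a, b):
    """Paired t statistic for matched samples a and b (len(a) - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)
    return mean / math.sqrt(var / n)
```

A typical use would be to detrend each ROI time course, call `fit_amplitude` against `predicted_response(96)` for a 192 s scan, and then pass the per-participant amplitudes of two conditions to `paired_t`.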
Visual stimuli and experimental procedures.
Target stimuli consisted of white arcs (CIE 1931, x = 0.32, y = 0.36) with 14% contrast from the gray background. The visual stimuli were programmed in C++ with Visualization Toolkit (Kitware, Clifton Park, NY) and manipulated using a Windows PC equipped with an optimized graphics card. The stimuli were projected onto a translucent screen (20 × 15 cm) placed in front of the participant’s chin using a color-and-luminance-calibrated UP-1100 digital light processing projector (PLUS Vision, Tokyo, Japan). Participants were supine and viewed the stimuli by looking directly into a front surfaced mirror placed at a 45° angle to both the screen and the participant’s line of sight. For stimulus presentation, a block-design was used for greater signal-to-noise ratio than the recent event-related paradigm. Each main experiment lasted 192 s and consisted of six visual epochs of 16 s stimulus presentation followed by a 16 s blank screen. To prevent neural adaptation and to sample voxels from wider cortical regions during the stimulus presentation epoch, we changed the size of the stimulus from 8 to 24° in diameter successively and repeatedly (scaling based on the cortical magnification factor) in four steps (500 ms duration each), alternating with gray background (500 ms duration) to block the perception of apparent motion and expansion of the circular elements from one frame to the next. A central fixation point was presented throughout each scan. All participants were scanned at least three (at most six) times for one stimulus condition. Control of participant’s attention and blocking of the effects of eye movement were crucial for the present study. We therefore imposed a task requiring attention on the central fixation point as follows. The color of the fixation point changed pseudorandomly (0.5–5 s intervals) to red, green, blue, or yellow, was sustained for 300 ms, and then changed to pale blue (default color). 
Participants were instructed to push an optical switch when the color of the fixation point turned red. Image data obtained with <80% correct responses to the fixation task were discarded. Our success in localizing the ROIs very clearly, as shown in Figure 2, indicates that eye movement was correctly minimized by this fixation task, although we did not record eye movement in real time during the scanning session.
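The stimulus timing and the behavioral exclusion rule described above can be summarized in a short Python sketch. It is not the C++ presentation program. The epoch durations, frame durations, diameter range, number of size steps, and the 80% criterion come from the text; the geometric spacing of the intermediate diameters and the ascending presentation order are assumptions, because the text states only that scaling followed the cortical magnification factor. All function names are hypothetical.

```python
def size_steps(d_min=8.0, d_max=24.0, n=4):
    """Stimulus diameters in degrees; geometric spacing is an assumption."""
    ratio = (d_max / d_min) ** (1.0 / (n - 1))
    return [d_min * ratio ** k for k in range(n)]


def build_schedule(n_epochs=6, on_s=16.0, off_s=16.0, frame_s=0.5):
    """Return events as (onset_s, duration_s, label, diameter_deg or None)."""
    sizes = size_steps()
    events = []
    t = 0.0
    for _ in range(n_epochs):
        n_frames = int(on_s / (2 * frame_s))  # one arc frame plus one gray frame
        for k in range(n_frames):
            # Assumed order: sizes cycle from smallest to largest, then repeat.
            events.append((t, frame_s, "arc", sizes[k % len(sizes)]))
            t += frame_s
            events.append((t, frame_s, "gray", None))
            t += frame_s
        events.append((t, off_s, "blank", None))
        t += off_s
    return events


def keep_scan(n_hits, n_targets, criterion=0.8):
    """Scans with fewer than 80% correct fixation-task responses are discarded."""
    return n_targets > 0 and n_hits / n_targets >= criterion
```

Six epochs of 16 s stimulation plus 16 s blank give the stated scan length of 192 s, which corresponds to 96 volumes at a repetition time of 2 s.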
Results
ROI localization
As a first step, we localized the cortical subregions representing a portion of the visual field where the target stimulus was presented in the following main experiments (4–12° in eccentricity in the lower-left visual quadrant). This localization was done for V1d, V2d, and V3 on the right cerebral hemisphere of each participant by using a checkerboard pattern spanning the target zone (Fig. 2 A,B). Figure 2 C shows cortical activity of one representative participant to the localizer. These contralateral retinotopic representations were clearly demonstrated for all participants. The localized regions were used as ROIs for the analysis of the following main experiments.
Contextual modulation from the nonassociated visual field quadrants
Next, we explored the possibility that contextual modulation was observed across separated field representation in early visual areas. To this end, we investigated whether neural activity within the localized regions is modulated when the other visual target was simultaneously presented in the nonassociated visual field quadrant(s).
In experiment 1, neural activity evoked by a quarter arc presented alone within the lower-left visual field (no context) was compared with that evoked by four quarter arcs contextually arranged into a complete annulus (context) (Fig. 3). Note that although these annular stimuli were different in global configurations, their lower-left portions were completely identical, and were represented retinotopically within the predefined ROIs on the early visual cortex. BOLD signals within these ROIs were extracted and compared. Here, if the early visual areas were not involved in the split visual field integration, no modulation would be observed within the predefined ROIs even if the other target were simultaneously presented.
However, the result showed interesting response enhancements in the ROIs when the lower-left arc was contextually completed into a whole annulus over the visual meridians (Fig. 3). Figure 3 A shows cortical spatiotemporal responses of one representative participant in right V1d for the no context, the context, and the localizer. Figure 3 B shows the estimated BOLD amplitudes. The ROI in V1d showed stronger responses to the complete annulus than to the quarter alone, despite the fact that retinotopic representations of the stimulus within the ROI were identical.
Next, to ensure that the present response enhancements were not caused by local stimulus differences such as smoothness or end-stopping edges just over the visual meridians, we measured neural activities to the complete and quarter arcs interposed with gaps over the meridians (40° polar angle; 2.6–8.2° visual angle from the meridians). Again, we observed response facilitation with the complete annulus (Fig. 3 C). We therefore conclude that visual context even in the nonassociated visual field is reflected in the retinotopic neural activity of early visual areas.
Interhemispheric contextual modulation in early visual areas
The question then arises, which visual component of a complete annulus contributes to the response enhancement in early visual areas. More specifically, does the observed enhancement derive from only intrafield (within left visual hemifield, from the upper to lower visual quadrant) or interfield (across left and right visual hemifields) contextual modulation?
To address this question, we measured and compared the neural activity evoked by two quarter arcs aligned with four different configurations in experiment 2: continuous and axisymmetrical over the horizontal meridian (longitudinal), continuous and axisymmetrical over the vertical meridian (transverse), symmetrical and diagonal about a fixation point (diagonal), and asymmetrical (control) (Fig. 4 A). Note that the lower left portions of these stimuli were also identical, as in experiment 1.
Surprisingly, the strongest responses were evoked in the diagonal arrangement, despite the fact that it presents the most distant spatial arrangement between two arcs in the visual field and on the cortical surface (Fig. 4 B,C). The enhancement disappeared when the two arcs were placed asymmetrically (control). Slight enhancements were observed for the longitudinal case. The transverse stimulus did not enhance activity in the early visual areas. These findings indicate that interfield contextual modulation occurred in the contralaterally dominated early visual areas when the visual system is confronted with a point-symmetrical configuration of visual elements. Compared with a semicircle, the diagonally located fragments may perceptually appear more like a complete circle. The surprising response enhancement observed is, thus, not a result of the distal setting between image fragments per se, but of perceptual context modulation (i.e., feature integration).
Discussion
In summary, the present study provides clear evidence that neural contextual modulation in early visual areas occurs beyond dissociated representations of visual hemifields: contralateral dominance of human primary visual sensory organization is overturned when the visual system is confronted with a point-symmetrical circular pattern. This suggests that early visual areas implement an interhemispheric integration mechanism (EIIM) that realizes more global integration of local features into a whole structure than previously considered. Importantly, the result indicates that early visual areas, including even primary visual area V1, may play a more essential role in stable and unified percepts of the real world than simply extracting local features of a visual object.
The underlying neural mechanisms of interhemispheric contextual modulation are not yet known. One possible candidate may be direct interhemispheric callosal connections between early visual areas. However, although interhemispheric callosal connections would play an important role, the observed facilitation cannot be explained in terms of the callosal bridge alone because response facilitation occurred even when the visual stimuli were interposed with gaps (2.6–8.2° visual angle from the meridians; experiment 1), whereas direct interhemispheric callosal connections across early visual areas in primates, if any, are limited to cells with receptive fields near the vertical meridians up to 2° (Clarke and Miklossy, 1990; Gazzaniga, 2000). In addition, the callosal time delay is greater than the delay caused by feedback projections to the early visual cortex, and hence entails a “time penalty” (Wilson et al., 2001). From these facts, the response modulation by the most distant visual element even in V1, whose neurons have small receptive fields, strongly indicates the contribution of a feedback mechanism. Cortical activity selective for symmetrical structures in V3A, V4, V7, and lateral occipital regions in human neuroimaging (Sasaki et al., 2005; Tyler et al., 2005), or for circular patterns in V4 in physiological studies (Gallant et al., 1993) suggests that our finding may be ascribed to the interactions of V1, V2, and V3 with such higher areas. Animal single-unit recordings also revealed a delayed response in V1 selective to symmetrical patterns, for which feedback modulation was indicated as a likely explanation (Lee et al., 1998). In recent monkey studies, late signal enhancement in V1 was reported for the contextual modulation on figure-ground segregation (Super et al., 2001).
Furthermore, consistent with our argument, human visual evoked potential studies investigating the timing of interaction across the visual field reported late (after 120 ms) signals in lower-order visual areas probably derived from descending inputs (Vanni et al., 2004). Our result extends these studies by clearly demonstrating that early global integration processes operate beyond split-field representations and that neural activity of early visual areas is modulated. Moreover, recent human behavioral studies propose the existence of a specialized mechanism to integrate separate field representations at the early stage of visual processing (Pillow and Rubin, 2002). The present study provides a neural substrate to their report.
The circular pattern used here belongs to a special class of visual structure in the following three respects. First, the structure is ecologically important for its ubiquity in natural scenes (Sigman et al., 2001). Second, a circular pattern is theoretically and behaviorally significant for object processing (Kovacs and Julesz, 1993, 1994; Wilson et al., 1997). Third, physiological studies have found neurons selective for circular patterns in relatively early visual areas (Gallant et al., 1993; Hegde and Van Essen, 2000). Thus, a circular symmetrical pattern allowed us to probe interhemispheric integration in a manner not explored in previous studies in which stimuli consisted of asymmetrical patterns confined within one of the visual hemifields. Follow-up studies will reveal to what extent our finding with circular visual elements applies to other visual configurations.
Our result challenges the current view of a step-wise (bottom-up) hierarchical model in which local visual elements are gradually integrated into a whole structure. How does the EIIM benefit the visual system? In our view, the EIIM might be attributable to ecological demand because cocircular contours comprise a special class of visual structure as described above. Detecting such ecologically significant structures readily, even when they are presented across the right and left visual hemifields, necessitates the EIIM. By implementing the EIIM, the visual system may be able (1) to enhance sensitivity to potential candidates, (2) to bundle them toward a common circle, and finally (3) to detect the whole structure readily, independent of its size and position. Such a mechanism will optimally enhance outputs from the early areas and will help higher cortical processing of global shapes (Sigman et al., 2001).
It should be noted that the EIIM is functionally distinct from recently reported early integration mechanisms for within-hemifield stimuli in V1; the recently reported mechanism accomplishes partial or local integrations based on collinearity (Altmann et al., 2003; Kourtzi et al., 2003), whereas the EIIM reported here accomplishes holistic integration based on figurality. This contrast, given the ubiquity of circular structures, raises the possibility that similar holistic integration as reported here operates also for within-hemifield visual stimuli. This hypothesis is intriguing given the current limited knowledge about the nature of the integration mechanism of V1. Although this hypothesis can be assessed by an analogous experiment, where within-hemifield stimuli are used, such an exploration is currently challenged by the limited spatial resolution of fMRI, and awaits future methodological advances.
Finally, EIIM in the visual system might be extended to the other primary sensory areas, such as the primary auditory cortex, where contralateral organization prevails; EIIM is hypothesized to play a similar role in organizing divided sensory inputs into a whole and coherent percept. As demonstrated here, the investigations of contextual modulation by ipsilateral stimuli in early sensory areas may offer a promising way to reveal the nature of top-down signals that contribute to perceptual coherency.
Footnotes
- This work was supported by the 21st Century Center of Excellence Program (D-2 to Kyoto University), the Ministry of Education, Culture, Sports, Science and Technology, Japan (H.Y., Y.E.), and Young Scientist Research Fellowships from the Japan Society for the Promotion of Science (17-2088) (H.B.). We are grateful to G. van Tonder, K. Maeda, and J. Saiki for fruitful discussions on this work and to S. Takahashi, N. Goda, and T. Azukawa for cortical surface reconstructions and programming some analysis tools.
- Correspondence should be addressed to Hiroshi Ban, Department of Cognitive and Behavioral Science, Graduate School of Human and Environmental Studies, Kyoto University, Yoshida-Nihonmatsu-Cho, Sakyo-Ku, Kyoto City, Kyoto 606-8501, Japan. ban{at}cv.jinkan.kyoto-u.ac.jp