V1 mechanisms and some figure–ground and border effects
Introduction
Segmenting figure from ground is one of the most important visual tasks, since it is seen as a prerequisite for object recognition. While this topic has been studied extensively in computer vision and human psychophysics, physiological studies probing the neural correlates of figure–ground segmentation in early visual cortex started only in recent years. In this paper, I review the relevant physiological observations on the “figure–ground” effects triggered by Lamme's finding that neural responses in V1 are higher to figures than to background [20]. I will then relate them to physiological data on contextual and surround influences on cell responses in cortex. A V1 model is then used to demonstrate a proposal that V1 mechanisms, in particular the intra-cortical interactions, cause the physiological “figure–ground effects”. Additional model predictions will be presented, and subsequent physiological data confirming model predictions will be reviewed. I will use the insights gained from the model to account for some figure–ground and segmentation effects observed psychophysically.
V1 is usually considered a low level visual area, and its classical receptive fields (CRFs) are usually much smaller than most figure surfaces. It is therefore exciting to find that neural responses in V1 are higher to figures than to background [20], [21], [40]: the figure–ground effect. Further experiments revealed that the medial axis of a figure can sometimes induce even higher responses than the figure surface nearby [23]: the medial axis effect (see Fig. 1 for illustration). This effect is worth noting since, computationally, the medial axis transform [1] has been suggested as a convenient skeleton representation of a figure surface. It is formally defined as the locus of the centers of the largest circles inside the figure region, and forms a set of connected lines that reduce the shape of a surface to its skeleton (think of a stick figure for a man). The response differentiation between figure and ground becomes significant 80 ms after stimulus onset, or 30–40 ms after the initial responses [20], [22], [23], [40], late enough to allow contributions from higher visual areas. Furthermore, the figure–ground effects can be reduced by anesthesia or by lesions in higher visual areas [21]. Hence, there was a common assumption that they mainly result from feedback from higher visual areas [20], [23], [40].
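The definition of the medial axis transform given above (the locus of the centers of the largest circles inside the figure region) can be made concrete with a small numerical sketch. The code below is an illustration only and is not taken from [1] or from the V1 model: the function names, the pixel-grid discretization, and the brute-force distance computation are all assumptions chosen for brevity. A figure pixel is kept if the largest circle centered on it is not contained in the largest circle centered on any of its eight neighbours.

```python
import numpy as np

def distance_to_background(mask):
    """Radius of the largest inscribed circle at each figure pixel:
    Euclidean distance to the nearest background pixel (brute force)."""
    background = np.argwhere(~mask)
    dist = np.zeros(mask.shape)
    for r, c in np.argwhere(mask):
        dist[r, c] = np.sqrt(((background - (r, c)) ** 2).sum(axis=1)).min()
    return dist

def medial_axis_sketch(mask):
    """Hypothetical helper: mark centers of maximal inscribed circles.

    A pixel p is dropped when some neighbour q has a circle that contains
    p's circle, i.e. dist[q] >= dist[p] + |p - q|.
    """
    dist = distance_to_background(mask)
    rows, cols = mask.shape
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    axis = np.zeros(mask.shape, dtype=bool)
    for r, c in np.argwhere(mask):
        maximal = True
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                # Small tolerance guards against floating-point round-off.
                if dist[rr, cc] >= dist[r, c] + np.hypot(dr, dc) - 1e-9:
                    maximal = False
                    break
        axis[r, c] = maximal
    return axis

# A 7 x 15 rectangular "figure" surrounded by a one-pixel background frame.
figure = np.zeros((9, 17), dtype=bool)
figure[1:8, 1:16] = True
skeleton = medial_axis_sketch(figure)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

For the rectangle, the sketch marks the central horizontal line together with four diagonal branches running to the corners, which is the stick-figure-like reduction of the shape described above.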
However, it is important to consider how the figure–ground effects may result from boundary processing, a computational task more closely associated with V1. Indeed, another experiment [6] found that V1 cells robustly give higher responses to global borders between two texture regions, even under anesthesia. Furthermore, in the experiments showing the figure–ground effect [21], [23], [40], the response to the figure surface is usually highest near the figure boundary rather than anywhere further inside the boundary, including the medial axis. The differentiation between response levels to figure and ground appears earlier near figure boundaries and is significant at 10–15 ms [6] or 10–20 ms [22], [23] after the initial responses, whereas it takes 30–40 ms after the initial responses to differentiate responses to the figure interior from those to the ground [20], [22], [23], [40]. The figure–ground effect thus consists of the border effect (Fig. 1), which is the response highlight to the part of the figure near the boundary, and the interior effects (including the medial axis effect), which are the response highlights further inside the boundary.
In 1999, Li proposed [28] that V1 mechanisms are mainly responsible for these figure–ground effects observed physiologically, and that the interior effects, in particular the medial axis effect, are by-products of the border effect. This proposal was inspired by the observations of Gallant et al. [6], as well as by the following anatomical and physiological findings. Finite range intra-cortical interactions [5], [7], [11], [34] cause the responses of a cell to be modulated by stimuli that are nearby but outside its CRF. They are manifested in the contextual influences seen experimentally, which are mainly suppressive, though sometimes facilitatory. For instance, Knierim and van Essen [17] observed that a cell's response to an optimally oriented bar can be reduced by 80% when the bar is surrounded by similarly oriented bars near but outside the CRF. This is termed iso-orientation suppression. The surround suppression is weaker if the surround bars are oriented randomly, and is weakest when the surround bars are oriented orthogonally to the central bar. A related observation is “cross-orientation facilitation”, observed by Sillito et al. [37]: a V1 cell's response to a grating patch can be enhanced when the grating is surrounded by an orthogonally oriented grating. This facilitation effect was elusive, as some subsequent attempts by other researchers failed to find it. Kapadia et al. [14] found that a V1 cell's response to a bar can be enhanced when contextual bars are aligned with the central bar to form a smooth line or contour, an effect termed colinear facilitation. All these contextual influences, if caused by V1 mechanisms only, should be accounted for by the same V1 neural circuit of intra-cortical interactions.
The finite range interactions are mediated by axons extending a few millimeters laterally [7], [34], which link CRFs separated from each other by up to a few CRF widths. These interactions could propagate to make V1 cells sensitive to long range image features despite the locality of their CRFs.
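A toy calculation may help to show why iso-orientation suppression by itself already favors texture borders. The sketch below is not the V1 model reviewed in this paper: it is a single feedforward pass on a one-dimensional row of bars, with a hypothetical function name and arbitrary values for the interaction radius and suppression strength. Each bar receives the same unit input and is suppressed in proportion to the number of bars within the radius that share its orientation. The row is treated as periodic so that the ends of the array do not act as spurious borders.

```python
import numpy as np

def toy_iso_suppression(orientations, radius=2, strength=0.8):
    """Toy, non-recurrent iso-orientation suppression (illustrative only).

    orientations: 1-D sequence of bar orientations in degrees.
    radius:       assumed interaction range, in units of bar spacing.
    strength:     assumed maximal suppression for a fully iso-oriented surround.
    """
    ori = np.asarray(orientations)
    n_same = np.zeros(ori.shape, dtype=float)
    for shift in range(1, radius + 1):
        for direction in (-1, 1):
            # Count neighbours at this offset that share the bar's orientation.
            n_same += (np.roll(ori, direction * shift) == ori)
    # Unit input minus suppression proportional to the iso-oriented surround.
    return 1.0 - strength * n_same / (2 * radius)

# Ground of 0-degree bars with a six-bar "figure" of 90-degree bars.
row = [0] * 10 + [90] * 6 + [0] * 10
responses = toy_iso_suppression(row)
print(np.round(responses, 2))
```

With these assumed parameters, bars immediately on either side of a texture border respond at 0.6, their next neighbours at 0.4, and bars deeper inside either region at 0.2, simply because bars near a border have fewer iso-oriented neighbours. The sketch omits the recurrent propagation of the interactions described above, so it cannot by itself produce interior or medial axis effects.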
Li's proposal was validated [28], [30] by using a model of V1 whose parameters are chosen such that the model's responses to stimuli are consistent with the experimental data summarized above on intra-cortical interactions and contextual influences [25], [26], [27]. Model cells with nearby but not necessarily overlapping CRFs interact via intra-cortical connections. The model exhibited the border and interior effects, in particular the medial axis effect, and made it possible to probe the dependence of these effects on the size, shape, and texture features of the figures. It showed that whereas the border effect is robust, the interior effects, and in particular the medial axis effect, are by-products of the border effect. Furthermore, the interior effect is predicted to diminish as the figure size increases, and the medial axis effect is predicted to be significant only for certain figure sizes. Figure size specificity of the medial axis effect was indeed evident in the original data [23]. Subsequently, new physiological data [35] confirmed the predicted diminishing response enhancement to figure interiors of increasingly large figures. Meanwhile, it was shown that the surround modulations of V1 responses do not depend on feedback from V2 [12].
The insights provided by the model allowed an understanding of the elusive “cross-orientation facilitation” as dis-inhibition of the response to the center of the figure grating by the background grating. The model reveals that this effect can only be manifested within a small range of sizes of the figure grating, thus explaining its elusiveness in experimental investigations. Other related surround modulations, such as the extent of the surround summation and suppression as manifested in a V1 cell's responses to a grating [36], can also be accounted for.
Psychophysically, contrast detection tasks have been observed to be easier inside a closed contour, presumed to be figure, than in the background image regions [18]. A familiar dependence of this effect on the size of the contour region was also observed [19]. Recently, a “shine-through” phenomenon, in which a very briefly presented vernier target can be perceived as superposed on a subsequently presented grating, was also shown to depend on the size of the grating. I will show how the V1 model can also provide insights into these psychophysical phenomena.
In the rest of the paper, I will first describe the V1 model. The model is then used as an organizing guide to understand the neural mechanisms behind, and to provide a link between, the physiological and the psychophysical data outlined in this section.
Methods
The model contains arrays of model neuron units tuned to orientation and spatial location (see below). A unit (i,θ) has CRF center at location i and prefers orientation θ. An image is processed through the corresponding receptive fields to provide input to individual model units. The units interact with each other via lateral connections, using both monosynaptic facilitation and disynaptic inhibition through interneurons [7], [11], [34], [38]. Fig. 2 shows the elements of the model and their
Results
Fig. 3 shows that the model exhibits the figure–ground (border and interior), medial axis, and the border effects [28]. The highest responses are to the figure borders or the whole of a small figure against background. The responses to the medial axis are enhanced, but not so greatly as at the borders. These different response levels are to input bars of the same contrast, and are therefore solely due to contextual influences. The border effect is highly significant within a distance of about 2
Discussion
Our model suggested and predicted that (1) V1 mechanisms can account for the particular kinds of figure–ground effects observed in the physiological experiments by Lamme [20], Zipser et al. [40], Lee et al. [23], and Lamme et al. [22], including interior effects, in particular, the medial axis effect, and the border effect, observed physiologically, (2) the interior effects, including the medial axis effect, are weaker than the border effect, and, most importantly, (3) the interior effects are
Acknowledgements
I am very grateful to Peter Dayan, Michael Herzog, and two anonymous reviewers for careful readings of various versions of the manuscript and for their very helpful comments. This work is supported by the Gatsby Foundation.
References (41)
Biological shape and visual science. J. Theor. Biol. (1973)
Contour integration by the human visual system: evidence for a local ‘association field’. Vision Res. (1993)
The role of “contrast enhancement” in the detection and appearance of visual contours. Vision Res. (1998)
Improvement in visual sensitivity by changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron (1995)
The role of the primary visual cortex in higher level vision. Vision Res. (1998)
A saliency map in primary visual cortex. Trends Cogn. Sci. (2002)
Discrimination of orientation-defined texture edges. Vision Res. (1995)
Dynamic properties of recurrent inhibition in primary visual cortex: contrast and orientation dependence of contextual effects. J. Neurophysiol. (2000)
Local interactions in neural networks explain global effects in the masking of visual stimuli. Neural Comput. (2003)
The functional organization of local circuits in visual cortex: insights from the study of tree shrew striate cortex. Cereb. Cortex (1996)
Two-dimensional and three-dimensional texture processing in visual cortex of the macaque monkey
Clustered intrinsic connections in cat visual cortex. J. Neurosci.
Normalization of cell responses in cat striate cortex. Visual Neurosci.
Seeing properties of an invisible element: feature inheritance and shine-through. Proc. Natl. Acad. Sci. USA
Synaptic physiology of horizontal connections in the cat's visual cortex. J. Neurosci.
Response modulations by static texture surround in area V1 of the macaque monkey do not depend on feedback connections from V2. J. Neurophysiol.
Spatial organization and magnitude of orientation contrast interactions in primate V1. J. Neurophysiol.
Relationship between lateral inhibitory connections and the topography of the orientation map in cat visual cortex. Euro. J. Neurosci.
Orientation-specific relationship between populations of excitatory and inhibitory lateral connections in the visual cortex of the cat. Cereb. Cortex
Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. J. Neurophysiol.