V1 mechanisms and some figure–ground and border effects

doi:10.1016/j.jphysparis.2004.01.008

Journal of Physiology-Paris

Volume 97, Issues 4–6, July–November 2003, Pages 503-515

Abstract

V1 neurons have been observed to respond more strongly to figure regions than to background regions. Within a figure region, the responses are usually stronger near the figure boundaries (the border effect) than further inside. Sometimes the medial axes of the figures (e.g., the vertical midline of a vertical figure strip) induce secondary, intermediate response peaks (the medial axis effect). Related is the physiologically elusive “cross-orientation facilitation”: the observation that a cell's response to a grating patch can be facilitated by an orthogonally oriented grating in the surround. Feedback from higher visual centers has been suggested to cause these figure–ground effects. It has been shown, using a V1 model, that the causes could be intra-cortical interactions within V1 that serve pre-attentive visual segmentation, particularly object boundary detection. Furthermore, whereas the border effect is robust, the figure–ground effects in the interior of a figure, in particular the medial axis effect, are by-products of the border effect and are predicted to diminish to zero for larger figures. This model prediction (of the figure size dependence) was subsequently confirmed physiologically, and is supported by findings that the response modulations by texture surround do not depend on feedback from V2. In addition, the model explains the “cross-orientation facilitation” as a dis-inhibition, by the background grating, of the cell responding to the center of the central grating. Furthermore, the elusiveness of this phenomenon is accounted for by the insight that it depends critically on the size of the figure grating.
The model is also applied to understand some figure–ground effects and segmentation in psychophysics: in particular, that the contrast discrimination threshold is lower within, and at the center of, a closed contour than in the background, and that a very briefly presented vernier target can perceptually shine through a subsequently presented large grating centered at the same location.

Introduction

Segmenting figure from ground is one of the most important visual tasks, since it is widely considered a prerequisite for object recognition. While this topic has been studied extensively in computer vision and human psychophysics, physiological studies probing the neural correlates of figure–ground segmentation in early visual cortex began only recently. In this paper, I review the relevant physiological observations on the “figure–ground” effects, a line of research triggered by Lamme's finding that neural responses in V1 are higher to figures than to background [20]. I then relate them to physiological data on contextual and surround influences on cell responses in cortex. A V1 model is then used to demonstrate the proposal that V1 mechanisms, in particular intra-cortical interactions, are the causes of the physiological “figure–ground effects”. Additional model predictions will be presented, and subsequent physiological data confirming them will be reviewed. Finally, I will use the insights gained from the model to account for some figure–ground and segmentation effects observed psychophysically.

V1 is usually considered a low-level visual area, and its classical receptive fields (CRFs) are usually much smaller than most figure surfaces. It is therefore exciting to find that neural responses in V1 are higher to figures than to background [20], [21], [40]: the figure–ground effect. Further experiments revealed that the medial axis of a figure can sometimes induce even higher responses than the nearby figure surface [23]: the medial axis effect (see Fig. 1 for illustration). This effect is worth noting since, computationally, the medial axis transform [1] has been suggested as a convenient skeleton representation of a figure surface. It is formally defined as the locus of the centers of the maximal disks inscribed in the figure region, i.e., disks that fit inside the region but are not contained in any other such disk. It is a set of connected lines that form a formal reduction of the shape of a surface (think of a stick figure of a person). The response difference between figure and ground becomes significant 80 ms after stimulus onset, or 30–40 ms after the initial responses [20], [22], [23], [40], late enough to allow contributions from higher visual areas. Furthermore, the figure–ground effects can be reduced by anesthesia or by lesions in higher visual areas [21]. Hence, it was commonly assumed that they mainly result from feedback from higher visual areas [20], [23], [40].
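To make the maximal-disk definition concrete, the following brute-force sketch approximates the medial axis on a pixel grid. It assumes a binary figure mask, uses the Euclidean distance from each figure pixel to the nearest background pixel as the radius of that pixel's inscribed disk, and keeps a pixel only when no 8-neighbor's disk contains its own (a discrete version of the maximal-disk criterion). The helper names are hypothetical, and this is an illustration of the definition, not part of the V1 model discussed in this paper.

```python
import math


def distance_to_background(mask):
    """Euclidean distance from each figure pixel to the nearest background pixel (brute force)."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def medial_axis(mask):
    """Figure pixels whose inscribed disk is not contained in any 8-neighbor's disk."""
    dist = distance_to_background(mask)
    rows, cols = len(mask), len(mask[0])
    axis = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            maximal = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    # The disk at (r, c) lies inside the neighbor's disk if
                    # D(neighbor) >= D(r, c) + distance between the two centers.
                    if dist[rr][cc] >= dist[r][c] + math.hypot(dr, dc):
                        maximal = False
            if maximal:
                axis.add((r, c))
    return axis


# A vertical figure strip (columns 2..6) on a 7 x 9 grid. It spans the full
# height, so only its left and right borders constrain the inscribed disks.
strip = [[2 <= c <= 6 for c in range(9)] for _ in range(7)]
print(sorted({c for _, c in medial_axis(strip)}))  # [4]
```

For this strip only the midline column survives, matching the example of the vertical midline of a vertical figure strip. A practical implementation would replace the quadratic nearest-background search with a fast distance transform.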

However, it is also important to consider how the figure–ground effects may result from boundary processing, a computational task more closely associated with V1. Indeed, another experiment [6] found that V1 cells robustly give higher responses to global borders between two texture regions, even under anesthesia. Furthermore, in the experiments showing the figure–ground effect [21], [23], [40], the response to the figure surface is usually highest near the figure boundary rather than anywhere further inside, including at the medial axis. The difference between response levels to figure and ground also appears earlier near figure boundaries, becoming significant 10–15 ms [6] or 10–20 ms [22], [23] after the initial responses, whereas it takes 30–40 ms after the initial responses for the responses to the figure interior to differ from those to the ground [20], [22], [23], [40]. The figure–ground effect thus consists of the border effect (Fig. 1), i.e., the response enhancement to the part of the figure near its boundary, and the interior effects (including the medial axis effect), i.e., the response enhancements further inside the boundary.

In 1999, Li proposed [28] that V1 mechanisms are mainly responsible for these physiologically observed figure–ground effects, and that the interior effects, in particular the medial axis effect, are by-products of the border effect. This proposal was inspired by the observations of Gallant et al. [6], as well as by the following anatomical and physiological findings. Finite-range intra-cortical interactions [5], [7], [11], [34] cause the responses of a cell to be modulated by stimuli that are nearby but outside its CRF. They manifest as the contextual influences seen experimentally, which are mainly suppressive, though sometimes facilitatory. For instance, Knierim and van Essen [17] observed that a cell's response to an optimally oriented bar can be reduced by 80% when the bar is surrounded by similarly oriented bars near but outside the CRF. This is termed iso-orientation suppression. The surround suppression is weaker if the surround bars are randomly oriented, and weakest when they are orthogonal to the central bar. A related observation is “cross-orientation facilitation”, reported by Sillito et al. [37]: a V1 cell's response to a grating patch can be enhanced when the patch is surrounded by an orthogonally oriented grating. This facilitation proved elusive, as some subsequent attempts by other researchers failed to find it. Kapadia et al. [14] found that a V1 cell's response to a bar can be enhanced when contextual bars are aligned with it to form a smooth line or contour (colinear facilitation). All these contextual influences, if caused by V1 mechanisms alone, should be accounted for by the same intra-cortical V1 circuit.
These finite-range interactions are mediated by axons extending a few millimeters laterally [7], [34], i.e., they link cells whose CRFs are separated by up to a few CRF sizes. Their effects could propagate across the cortex, making V1 cells sensitive to long-range image features despite the locality of their CRFs.
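The propagation argument can be made concrete with a deliberately minimal toy, which is an assumption for illustration and not the V1 model: each bar along a contour receives unit input plus linear colinear facilitation, with weight w (a hypothetical value), from its immediate neighbors only. Bar 0 never interacts directly with a bar three positions away, yet adding that bar raises bar 0's response because the facilitation is relayed through the intermediate bars.

```python
def steady_state(present, w=0.3, iters=500):
    """Toy linear recurrent chain: r_i = 1 + w * (r_{i-1} + r_{i+1}) for each present bar.

    Each bar interacts only with its immediate neighbors (range 1); absent
    positions have zero response and contribute nothing. Iterating from zero
    converges to the fixed point because 2 * w < 1.
    """
    n = len(present)
    r = [0.0] * n
    for _ in range(iters):
        new = [0.0] * n
        for i in range(n):
            if not present[i]:
                continue
            facilitation = sum(r[j] for j in (i - 1, i + 1) if 0 <= j < n and present[j])
            new[i] = 1.0 + w * facilitation
        r = new
    return r


short = steady_state([True, True, True, False])  # 3-bar contour
long_ = steady_state([True, True, True, True])   # extra bar 3 positions from bar 0
print(round(short[0], 3), round(long_[0], 3))    # 1.585 1.639
```

The direct interaction range here is a single position, so any influence of bar 3 on bar 0 must travel through bars 1 and 2; in a cortical network the same relay would let finite-range connections convey information about image features several CRFs away.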

Li's proposal was validated [28], [30] using a model of V1 whose parameters were chosen such that the model's responses are consistent with the experimental data summarized above on intra-cortical interactions and contextual influences [25], [26], [27]. Model cells with nearby, but not necessarily overlapping, CRFs interact via intra-cortical connections. The model exhibited the border and interior effects, in particular the medial axis effect, and made it possible to probe how these effects depend on the size, shape, and texture features of the figures. It showed that whereas the border effect is robust, the interior effects, and in particular the medial axis effect, are by-products of the border effect. Furthermore, the interior effect is predicted to diminish as the figure size increases, and the medial axis effect is predicted to be significant only for certain figure sizes. The figure size specificity of the medial axis effect was indeed evident in the original data [23]. Subsequently, new physiological data [35] confirmed the predicted decrease of the response enhancement to figure interiors as figures become larger. In addition, it was shown that the surround modulations of V1 responses do not depend on feedback from V2 [12].
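The logic behind the border effect and its by-products can be illustrated with a one-dimensional toy, which is not Li's model: a row of units, each driven by a bar of the same contrast, in which every unit suppresses other units of the same orientation within a fixed range. The weight w = 0.2, the range of 2 units, the wrap-around boundary, and the purely iso-orientation interaction are all illustrative assumptions. Units at a figure border have fewer iso-oriented neighbors and so are less suppressed; their strong activity then suppresses the units just inside, which in turn dis-inhibits units further in.

```python
def steady_state(orientations, w=0.2, radius=2, iters=300):
    """Toy rectified recurrent model with iso-orientation suppression only.

    r_i = max(0, 1 - w * sum of r_j over units j within `radius` of i that
    share i's orientation). The row wraps around (periodic) so that the image
    edges create no extra borders. Converges because 2 * radius * w < 1.
    """
    n = len(orientations)
    r = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            inhibition = 0.0
            for k in range(1, radius + 1):
                for j in ((i - k) % n, (i + k) % n):
                    if orientations[j] == orientations[i]:
                        inhibition += w * r[j]
            new.append(max(0.0, 1.0 - inhibition))
        r = new
    return r


def strip_stimulus(width, background):
    """A figure strip of 90-degree bars flanked by orthogonal 0-degree background bars."""
    return [0] * background + [90] * width + [0] * background


narrow = steady_state(strip_stimulus(7, 7))
print([round(x, 3) for x in narrow[7:14]])
# [0.771, 0.636, 0.508, 0.542, 0.508, 0.636, 0.771]

wide = steady_state(strip_stimulus(31, 16))
print(round(wide[16], 3), round(wide[31], 3))  # border unit vs. center unit of a wide strip
```

With these parameters the 7-unit strip responds most strongly at its two borders and shows a secondary peak at its center, while the center of the 31-unit strip settles to approximately the homogeneous-texture value 1/(1 + 4w) ≈ 0.556. Whether a central peak appears depends on the strip width relative to the interaction range, in the spirit of the size dependence described above.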

The model's insights also explain the elusive “cross-orientation facilitation” as a dis-inhibition, by the background grating, of the response to the center of the figure grating. The model reveals that this effect can only manifest within a small range of sizes of the figure grating, which explains why it was elusive in experimental investigations. Other related surround modulations, such as the extent of the surround summation and suppression manifested in a V1 cell's responses to a grating [36], can also be accounted for.
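The dis-inhibition account can be sketched by extending a toy recurrent row of units with a weaker, orientation-unspecific suppression w_cross between units of different orientation. That extra term, its value, and all other parameters are assumptions chosen for illustration, not the model's actual connectivity. The orthogonal surround weakly suppresses the units at the patch border, and because those border units are the ones that most strongly suppress the patch center, the center is released.

```python
def steady_state(orientations, w_iso=0.2, w_cross=0.1, radius=2, iters=300):
    """Toy rectified recurrent model: strong iso-orientation suppression plus a
    weaker orientation-unspecific (cross-orientation) suppression.

    `None` marks a blank location that receives no input and has zero response.
    Converges because 2 * radius * max(w_iso, w_cross) < 1.
    """
    n = len(orientations)
    r = [0.0] * n
    for _ in range(iters):
        new = []
        for i, theta in enumerate(orientations):
            if theta is None:
                new.append(0.0)
                continue
            inhibition = 0.0
            for k in range(1, radius + 1):
                for j in (i - k, i + k):
                    if 0 <= j < n and orientations[j] is not None:
                        weight = w_iso if orientations[j] == theta else w_cross
                        inhibition += weight * r[j]
            new.append(max(0.0, 1.0 - inhibition))
        r = new
    return r


def center_response(width, surround, margin=20):
    """Response of the unit at the center of a 0-degree grating patch of `width` units.

    surround=None gives a blank surround; otherwise it is the surround orientation.
    """
    stimulus = [surround] * margin + [0] * width + [surround] * margin
    return steady_state(stimulus)[margin + width // 2]


for width in (5, 25):
    blank, ortho = center_response(width, None), center_response(width, 90)
    print(width, round(blank, 3), round(ortho, 3))
```

For the 5-unit patch, the center unit lies within direct reach of the border units, so the orthogonal surround raises its response above the blank-surround value. For the 25-unit patch the center is too far from the border for the change to propagate appreciably, and the two responses become nearly identical, which illustrates how such a facilitation could be found only for certain patch sizes.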

Psychophysically, contrast detection has been observed to be easier inside a closed contour, presumably perceived as a figure, than in the background image regions [18]. A dependence of this effect on the size of the contour region, similar to the size dependence discussed above, was also observed [19]. Recently, a “shine-through” phenomenon, in which a very briefly presented vernier target can be perceived as superposed on a subsequently presented grating, was also shown to depend on the size of the grating. I will show how the V1 model can also provide insights into these psychophysical phenomena.

In the rest of the paper, I will first describe the V1 model. The model is then used as an organizing guide to understand the neural mechanisms behind the physiological and psychophysical data outlined in this section, and to link the two.

Section snippets

Methods

The model contains arrays of model neuron units tuned to orientation and spatial location (see below). A unit (i,θ) has its CRF centered at location i and prefers orientation θ. An image is filtered through the corresponding receptive fields to provide input to individual model units. The units interact with each other via lateral connections, using both monosynaptic facilitation and disynaptic inhibition through interneurons [7], [11], [34], [38]. Fig. 2 shows the elements of the model and their …
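The two kinds of lateral interaction named above can be sketched with a hypothetical rate model in which each location has one excitatory (pyramidal) unit x and one inhibitory interneuron y. The equations, weights, time step, and names below are illustrative assumptions, not the paper's model equations or parameters, and orientation tuning is omitted. Links J act monosynaptically between excitatory units and facilitate; links W drive the target's interneuron, so their net effect on the target pyramidal unit is disynaptic inhibition.

```python
def relu(v):
    """Half-wave rectification used as the firing-rate nonlinearity."""
    return max(0.0, v)


def simulate(inputs, J, W, dt=0.1, steps=2000):
    """Euler integration of toy excitatory/inhibitory unit pairs (hypothetical form).

    dx_i/dt = -x_i - g(y_i) + sum_j J[i][j] g(x_j) + I_i   (pyramidal unit)
    dy_i/dt = -y_i + g(x_i) + sum_j W[i][j] g(x_j)         (interneuron)
    Returns the final pyramidal states x.
    """
    n = len(inputs)
    x = [0.0] * n  # pyramidal (excitatory) states
    y = [0.0] * n  # interneuron (inhibitory) states
    for _ in range(steps):
        gx = [relu(v) for v in x]
        dx = [
            -x[i] - relu(y[i]) + sum(J[i][j] * gx[j] for j in range(n)) + inputs[i]
            for i in range(n)
        ]
        dy = [-y[i] + gx[i] + sum(W[i][j] * gx[j] for j in range(n)) for i in range(n)]
        x = [xi + dt * d for xi, d in zip(x, dx)]
        y = [yi + dt * d for yi, d in zip(y, dy)]
    return x


zero = [[0.0, 0.0], [0.0, 0.0]]
link = [[0.0, 0.2], [0.2, 0.0]]
alone = simulate([1.0, 0.0], zero, zero)[0]        # unit 0 driven, neighbor silent
facilitated = simulate([1.0, 1.0], link, zero)[0]  # monosynaptic facilitation via J
inhibited = simulate([1.0, 1.0], zero, link)[0]    # disynaptic inhibition via W
print(round(alone, 3), round(facilitated, 3), round(inhibited, 3))  # 0.5 0.556 0.455
```

At steady state y = x for an isolated pair, giving x = I/2; a facilitating neighbor raises this to 1/1.8, while the same weight routed through the interneuron lowers it to 1/2.2.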

Results

Fig. 3 shows that the model exhibits the figure–ground effect, with its border and interior components, including the medial axis effect [28]. The highest responses are to the figure borders, or to the whole of a small figure against the background. The responses to the medial axis are enhanced, but less strongly than at the borders. These different response levels are evoked by input bars of the same contrast, and are therefore due solely to contextual influences. The border effect is highly significant within a distance of about 2 …

Discussion

Our model suggested and predicted that (1) V1 mechanisms can account for the particular kinds of figure–ground effects observed in the physiological experiments by Lamme [20], Zipser et al. [40], Lee et al. [23], and Lamme et al. [22], including the border effect and the interior effects, in particular the medial axis effect, (2) the interior effects, including the medial axis effect, are weaker than the border effect, and, most importantly, (3) the interior effects are …

Acknowledgements

I am very grateful to Peter Dayan, Michael Herzog, and two anonymous reviewers for careful readings of various versions of the manuscript and for their very helpful comments. This work is supported by the Gatsby Foundation.

References (41)

H. Blum
Biological shape and visual science
J. Theor. Biol.
(1973)
D.J. Field et al.
Contour integration by the human visual system: evidence for a local “association field”
Vision Res.
(1993)
R.F. Hess et al.
The role of “contrast enhancement” in the detection and appearance of visual contours
Vision Res.
(1998)
M.K. Kapadia et al.
Improvement in visual sensitivity by changes in local context: parallel studies in human observers and in V1 of alert monkeys
Neuron
(1995)
T.S. Lee et al.
The role of the primary visual cortex in higher level vision
Vision Res.
(1998)
Z. Li
A saliency map in primary visual cortex
TRENDS Cogn. Sci.
(2002)
S. Wolfson et al.
Discrimination of orientation-defined texture edges
Vision Res.
(1995)
V. Dragoi et al.
Dynamic properties of recurrent inhibition in primary visual cortex: contrast and orientation dependence of contextual effects
J. Neurophysiol.
(2000)
M.H. Herzog et al.
Local interactions in neural networks explain global effects in the masking of visual stimuli
Neural Comput.
(2003)
D. Fitzpatrick
The functional organization of local circuits in visual cortex: insights from the study of tree shrew striate cortex
Cereb. Cortex
(1996)
J.L. Gallant et al.
Two-dimensional and three-dimensional texture processing in visual cortex of the macaque monkey
C.D. Gilbert et al.
Clustered intrinsic connections in cat visual cortex
J. Neurosci.
(1983)
D.J. Heeger
Normalization of cell responses in cat striate cortex
Visual Neurosci.
(1992)
M.H. Herzog et al.
Seeing properties of an invisible element: feature inheritance and shine-through
Proc. Natl. Acad. Sci. USA
(2001)
J.A. Hirsch et al.
Synaptic physiology of horizontal connections in the cat's visual cortex
J. Neurosci.
(1991)
J.-M. Hupé et al.
Response modulations by static texture surround in area V1 of the macaque monkey do not depend on feedback connections from V2
J. Neurophysiol.
(2001)
H.E. Jones et al.
Spatial organization and magnitude of orientation contrast interactions in primate V1
J. Neurophysiol.
(2002)
Z.F. Kisvarday et al.
Relationship between lateral inhibitory connections and the topography of the orientation map in cat visual cortex
Eur. J. Neurosci.
(1994)
Z.F. Kisvarday et al.
Orientation-specific relationship between populations of excitatory and inhibitory lateral connections in the visual cortex of the cat
Cereb. Cortex
(1997)
J.J. Knierim et al.
Neuronal responses to static texture patterns in area V1 of the alert macaque monkey
J. Neurophysiol.
(1992)