2004 Special Issue: Integration of form and motion within a generative model of visual cortex
Introduction
The classical view of information processing in visual cortex is that of a bottom-up process in a feed-forward hierarchy (Hubel & Wiesel, 1977). However, bottom-up information that encodes physical properties of the sensory input is often insufficient, uncertain, and even ambiguous; consider, for example, the classic demonstrations of the Dalmatian dog (Thurston & Carraher, 1966) and Rubin's vase (Rubin, 1915). Psychophysical (Adelson, 1992, Driver and Spence, 1998), anatomical (Budd, 1998, Callaway, 1998) and physiological (Bullier et al., 1996, Martinez-Conde et al., 1999) evidence suggests that integration of bottom-up and top-down processes plays a crucial role in the processing of sensory input. For example, top-down factors such as attention can result in strong modulation of neural responses as early as primary visual cortex (V1) (McAdams & Read, 2003). Information also flows laterally between populations of cortical neurons within the same level of the processing hierarchy. This lateral flow is necessitated by the 'generalized aperture problem' with which the visual system is confronted: an individual neuron or local population of neurons 'sees' only a limited patch of the visual field. To form coherent representations of objects, non-local informational dependencies, and their uncertainties, must be integrated across space and time, as well as across other stimulus and representational dimensions.
One particularly striking example illustrating the nature of the visual integration problem is that of inferring object motion in a scene. The motion of a homogeneous contour (or edge) is perceptually ambiguous because of the 'aperture problem': a single local measurement along an object's bounding contour cannot be used to reliably infer the object's motion. However, this ambiguity can potentially be overcome by measuring locally unambiguous motion signals, tied to specific visual features, and then integrating these to form a global motion percept. An early study by Adelson and Movshon (1982) suggested that these two stages, local motion measurement and integration, are indeed involved in visual motion perception.
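To make the aperture problem and the two-stage view concrete, the following toy calculation (our own illustration, not code from the paper) shows that a single edge measurement constrains only the component of velocity normal to the edge, whereas two measurements at different orientations jointly determine the full velocity; the edge normals and velocity are hypothetical.

```python
import numpy as np

# Toy illustration of the aperture problem (our own example, not code from the
# paper). A single edge measurement only constrains the component of velocity
# normal to the edge, i.e. it defines a line of candidate velocities. Two
# measurements at different orientations jointly determine the full velocity
# ("intersection of constraints"). The edge normals and the "true" velocity
# below are hypothetical.

true_v = np.array([1.0, 0.5])          # hypothetical object velocity (vx, vy)

# Unit normals of two differently oriented edges and the normal speeds a local
# detector would measure along each of them.
n1 = np.array([1.0, 0.0]); s1 = n1 @ true_v   # vertical edge
n2 = np.array([0.0, 1.0]); s2 = n2 @ true_v   # horizontal edge

# One constraint (n . v = s) is a line in velocity space: ambiguous on its own.
# Solving the two constraints together recovers the full velocity.
A = np.vstack([n1, n2])
b = np.array([s1, s2])
v_hat = np.linalg.solve(A, b)
print(v_hat)                            # -> [1.0, 0.5]
```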
Several visual features have been identified as unambiguous local motion cues, for example line terminators and junctions. Line terminators have traditionally been classified into two types: intrinsic terminators, which are created by the natural end of a line, and extrinsic terminators, which are not created by the end of the line itself but rather result from occlusion by another surface (Nakayama, Shimojo, & Silverman, 1989). Intrinsic terminators are claimed to provide an unambiguous signal for the true velocity of the line, while extrinsic terminators generate a locally ambiguous signal which presumably should have little influence on accurate motion estimation (Shimojo, Silverman, & Nakayama, 1989). One problem is that all local measurements carry some ambiguity; the issue is therefore not simply to decide which motion cues are ambiguous and which are unambiguous, but to assess the degree of ambiguity, i.e. the degree of certainty, of each cue.
In this paper we describe a generative network model for integrating form and motion cues which directly exploits the uncertainty in these cues. Generative models are probabilistic models which directly model the distribution of some set of observations and underlying hidden variables or states (Jebara, 2004). The advantages of a generative model are that (1) it uses probabilities to represent the 'state of the world' and therefore directly captures the uncertainties associated with noise and ambiguity, (2) through the use of Bayesian machinery, one can infer the underlying state of hidden variables, (3) it naturally enables integration of multiple sources of information, including bottom-up, top-down, and lateral inputs as well as inputs arising from other processing streams, and (4) it results in a system capable of performing a variety of analysis functions, including segmentation, classification, synthesis, and compression.
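As a concrete illustration of point (2), the minimal sketch below applies Bayes' rule to infer a discrete hidden variable from an observation; the states, prior, and likelihood values are invented for the example and are not parameters of our model.

```python
import numpy as np

# Minimal sketch of the Bayesian machinery in point (2), using a toy example
# of our own (states, prior, and likelihood values are invented): infer a
# discrete hidden variable h from an observation o via
# p(h | o) ∝ p(o | h) p(h).

states = ["left", "right"]              # hypothetical hidden states
prior = np.array([0.5, 0.5])            # p(h): top-down expectation
likelihood = np.array([0.8, 0.3])       # p(o | h): bottom-up evidence

posterior = likelihood * prior
posterior /= posterior.sum()            # normalize into a belief over h

for state, p in zip(states, posterior):
    print(f"p({state} | o) = {p:.2f}")
```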
We describe a generative network model that accounts for the interaction between form and motion in a relatively simple way, focusing on the influence of the 'direction of figure' on local motion at junctions where line terminators are defined. A visual surface can be defined by associating an object's boundary with a region representing the object's surface. The basic problem in this surface assignment is to determine contour ownership (Nakayama et al., 1989). We represent ownership using a local representation which we call the 'direction of figure' (DOF) (Sajda & Finkel, 1995). In our model the DOF at each point of the contour is represented by a hidden variable whose probability is inferred via integration of bottom-up, top-down and lateral inputs. The 'belief' in the DOF is used to estimate occlusion boundaries between surfaces, which are defined by where the DOF changes: the ownership junction (Finkel & Sajda, 1994). In the model, the probability of extrinsic line terminators is a function of the probability of these ownership junctions. Thus, rather than completely suppressing the motion signals at extrinsic terminators, the model treats the degree of certainty (i.e. belief) in the DOF as the strength of the evidence for surface occlusion and uses it to determine the strength of local motion suppression.
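The following sketch illustrates this graded-suppression idea in simplified form; the linear weighting and the numerical values are assumptions made for illustration and are not the model's actual equations.

```python
import numpy as np

# Sketch of the graded-suppression idea described above, in our own notation.
# The linear weighting and all numbers are assumptions made for illustration;
# they are not the paper's actual equations. The belief that a terminator is
# extrinsic (derived from the DOF belief at an ownership junction) scales
# down that terminator's motion evidence instead of gating it on or off.

# Hypothetical local velocity estimates (vx, vy) at three line terminators.
velocities = np.array([[1.0, 0.0],
                       [0.9, 0.1],
                       [0.0, 1.0]])

# Belief, from the form stream, that each terminator is extrinsic (occluded).
p_extrinsic = np.array([0.05, 0.10, 0.95])

weights = 1.0 - p_extrinsic             # near-certain occlusion -> near-zero weight
pooled = (weights[:, None] * velocities).sum(axis=0) / weights.sum()
print(pooled)                           # the occluded terminator barely contributes
```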
The remainder of the paper is organized as follows. Section 2 describes our generative network model, putting it in the context of the organizational structure of visual cortex. Though the model we describe does not use biologically realistic units (e.g. conductance-based integrate-and-fire neurons), it is instructive to consider how the generative model maps onto the cortical architecture. We next describe the details of the integration process between the form and motion streams, a process that exploits informational uncertainty, and how this uncertainty is propagated through the network using Bayesian machinery. We then present two sets of simulation results illustrating the interaction of the form and motion streams. The first set shows that the model can generate results for motion coherence stimuli consistent with the psychophysical experiments of McDermott, Weiss, and Adelson (2001); in particular, form cues change the model's inference of perceived motion, producing a gradual transition from incoherent to coherent motion. We then demonstrate results for the classic barber-pole stimulus (Wallach, 1935), showing how occlusion influences the certainty in the perceived object motion through form (DOF) cues.
Hypercolumns in the visual cortex
Since the term 'hypercolumn' was coined by Hubel and Wiesel (1977), it has been used to describe the neural machinery necessary to process a discrete region of the visual field. Typically, a hypercolumn occupies a cortical area of ∼1 mm² and contains tens of thousands of neurons. Current experimental and physiological studies have revealed substantial complexity in neuronal response to multiple, simultaneous inputs, including contextual influence, as early as V1 (Gilbert, 1992, Kapadia et al.,
Simulation results
All simulations were done with the same network architecture and parameter values, except for the covariance of the motion prior. See the Appendix for the parameter values used in the simulations.
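As an illustration of the role of this parameter, the sketch below assumes a zero-mean Gaussian prior over velocity with Gaussian measurement noise (an assumed form, in the spirit of slow-motion priors, rather than the model's published equations) and shows how the prior covariance controls how strongly an ambiguous local measurement is regularized.

```python
import numpy as np

# Hedged sketch of the single parameter varied across simulations: the
# covariance of the motion prior. The Gaussian form below (zero-mean prior,
# Gaussian measurement noise) is our assumption, not the model's published
# equations; see the Appendix for the actual parameter values.

def posterior_velocity(v_meas, meas_cov, prior_cov):
    """Posterior mean of velocity given a noisy measurement and a zero-mean prior."""
    meas_prec = np.linalg.inv(meas_cov)
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(meas_prec + prior_prec)
    return post_cov @ (meas_prec @ v_meas)

v_meas = np.array([1.0, 0.0])                     # hypothetical local measurement
meas_cov = np.diag([0.1, 0.1])                    # measurement uncertainty

broad = posterior_velocity(v_meas, meas_cov, np.diag([10.0, 10.0]))
tight = posterior_velocity(v_meas, meas_cov, np.diag([0.1, 0.1]))
print(broad)   # broad prior: estimate stays close to the measurement
print(tight)   # tight prior: estimate is pulled strongly toward zero
```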
Discussion
In this paper we describe a generative network model for integrating form and motion cues. The model can account for a number of perceptual phenomena related to how form information is used to distinguish between intrinsic and extrinsic terminators in the motion integration process. Previous neural network models of the segmentation and integration of motion signals have studied the influence of motion signals at terminators and of occlusion cues (Grossberg et al., 2001, Lidén and Pack, 1999). The
Acknowledgements
This work was supported by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research under grant N00014-01-1-0625, and a grant from the National Imagery and Mapping Agency, NMA201-02-C0012.
References (59)
- Bullier et al. (1996). Functional interactions between areas V1 and V2 in the monkey. Journal of Physiology (Paris).
- Gilbert (1992). Horizontal integration and cortical dynamics. Neuron.
- Grossberg et al. (2001). Neural dynamics of motion integration and segmentation within and across apertures. Vision Research.
- Kapadia et al. (1995). Improvement in visual sensitivity by changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron.
- Lidén and Mingolla (1998). Monocular occlusion cues alter the influence of terminator motion in the barber pole phenomenon. Vision Research.
- Lidén and Pack (1999). The role of terminators and occlusion cues in motion integration and segmentation: a neural network model. Vision Research.
- Nakayama and Silverman (1988). The aperture problem II: spatial integration of velocity information along contours. Vision Research.
- Polat and Norcia (1996). Neurophysiological evidence for contrast dependent long-range facilitation and suppression in the human visual cortex. Vision Research.
- Shimojo, Silverman, and Nakayama (1989). Occlusion and the solution to the aperture problem for motion. Vision Research.
- Stettler et al. (2002). Lateral connectivity and contextual interactions in macaque primary visual cortex. Neuron.