Neural Networks

Volume 17, Issues 5–6, June–July 2004, Pages 809-821

2004 Special Issue
Integration of form and motion within a generative model of visual cortex

https://doi.org/10.1016/j.neunet.2004.03.013

Abstract

One of the challenges faced by the visual system is integrating cues within and across processing streams in order to infer scene properties and structure. This is particularly apparent in the inference of object motion, where psychophysical experiments have shown that motion signals distributed across space must be integrated not only with one another but also with form cues. This has led several investigators to conclude that there exist mechanisms which enable form cues to ‘veto’ or completely suppress ambiguous motion signals. We describe a probabilistic approach which uses a generative network model for integrating form and motion cues using the machinery of belief propagation and Bayesian inference. We show, using computer simulations, that motion integration can be mediated via a local, probabilistic representation of contour ownership, which we have previously termed ‘direction of figure’. The uncertainty of this inferred form cue is used to modulate the covariance matrix of network nodes representing local motion estimates in the motion stream. We show, with results for two sets of stimuli, that the model does not completely suppress ambiguous cues, but instead integrates them as a function of their underlying uncertainty. The result is that the model can account for the continuum of bias seen for motion coherence and perceived object motion in psychophysical experiments.

Introduction

The classical view of information processing in visual cortex is that of a bottom-up process in a feed-forward hierarchy (Hubel & Wiesel, 1977). However, bottom-up information that encodes physical properties of the sensory input is often insufficient, uncertain, and even ambiguous—consider, for example, the classic demonstrations of the Dalmatian dog (Thurston & Carraher, 1966) and Rubin's vase (Rubin, 1915). Psychophysical (Adelson, 1992, Driver and Spence, 1998), anatomical (Budd, 1998, Callaway, 1998) and physiological (Bullier et al., 1996, Martinez-Conde et al., 1999) evidence suggests that integration of bottom-up and top-down processes plays a crucial role in the processing of the sensory input. For example, top-down factors, such as attention, can result in strong modulation of neural responses as early as primary visual cortex (V1) (McAdams & Read, 2003). Information also flows laterally between populations of cortical neurons within the same level of the processing hierarchy. This is due to the ‘generalized aperture problem’ with which the visual system is confronted: an individual neuron or local population of neurons ‘sees’ only a limited patch of the visual field. To form coherent representations of objects, non-local informational dependencies, and their uncertainties, must be integrated across space and time, as well as across other stimulus and representational dimensions.

One particularly striking example illustrating the nature of the visual integration problem is that of inferring object motion in a scene. The motion of a homogeneous contour (or edge) is perceptually ambiguous because of the ‘aperture problem’—a single local measurement along an object's bounding contour cannot be used to reliably infer the object's motion. This ambiguity can potentially be overcome by measuring locally unambiguous motion signals, tied to specific visual features, and then integrating these to form a global motion percept. An early study (Adelson & Movshon, 1982) suggested that these two stages—local motion measurement and integration—are indeed involved in visual motion perception.
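To make the aperture problem and the benefit of integration concrete, the following sketch (not taken from this paper; the function names, noise parameters, and the simple precision-weighted fusion rule are illustrative assumptions) represents each local edge measurement as a Gaussian velocity likelihood that is tight along the edge normal and nearly flat along the tangent, then fuses measurements from two differently oriented edges to recover the full two-dimensional velocity.

```python
import numpy as np

def edge_likelihood(normal, observed_speed, sigma_normal=0.1, sigma_tangent=10.0):
    """Gaussian velocity likelihood implied by one local edge measurement.

    The aperture problem: only the velocity component along the edge normal
    is constrained (small variance); the tangential component is essentially
    unconstrained (large variance). Returns (mean, covariance).
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)                    # unit normal to the edge
    t = np.array([-n[1], n[0]])                  # unit tangent
    mean = observed_speed * n                    # measured normal velocity
    R = np.column_stack([n, t])                  # rotate axis-aligned covariance
    cov = R @ np.diag([sigma_normal**2, sigma_tangent**2]) @ R.T
    return mean, cov

def fuse_gaussians(estimates):
    """Precision-weighted combination (product) of Gaussian likelihoods."""
    precision = np.zeros((2, 2))
    info = np.zeros(2)
    for mean, cov in estimates:
        P = np.linalg.inv(cov)
        precision += P
        info += P @ mean
    cov = np.linalg.inv(precision)
    return cov @ info, cov

# Two edges of a rigid figure translating with velocity (1, 0): a vertical
# edge (normal along x) and a horizontal edge (normal along y).
e1 = edge_likelihood([1.0, 0.0], observed_speed=1.0)
e2 = edge_likelihood([0.0, 1.0], observed_speed=0.0)
v, cov = fuse_gaussians([e1, e2])
print(v)  # close to [1, 0]: the ambiguity is resolved by integration
```

Because the fusion is precision-weighted, confident measurements dominate while highly uncertain components contribute little—the same qualitative behavior the model relies on when weighting local motion cues by their certainty.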

Several visual features have been identified as unambiguous local motion cues, for example line terminators and junctions. Line terminators have traditionally been classified into two types: intrinsic terminators, which are due to the natural end of a line, and extrinsic terminators, which are not created by the end of the line itself but rather result from occlusion by another surface (Nakayama, Shimojo, & Silverman, 1989). Intrinsic terminators are claimed to provide an unambiguous signal for the true velocity of the line, while extrinsic terminators generate a locally ambiguous signal which presumably should have little influence on accurate motion estimation (Shimojo, Silverman, & Nakayama, 1989). One problem is that all local measurements carry some ambiguity; the issue is therefore not simply deciding which motion signals are ambiguous and which are not, but assessing the degree of ambiguity—i.e. the degree of certainty of each cue.

In this paper we describe a generative network model for integrating form and motion cues which directly exploits the uncertainty in these cues. Generative models are probabilistic models which directly model the distribution of a set of observations and underlying hidden variables or states (Jebara, 2004). The advantages of a generative model are that (1) it uses probabilities to represent the ‘state of the world’ and therefore directly exploits uncertainties associated with noise and ambiguity, (2) through the use of Bayesian machinery, one can infer the underlying state of hidden variables, (3) it naturally enables integration of multiple sources of evidence, including bottom-up, top-down, and lateral inputs as well as inputs arising from other processing streams, and (4) it yields a system capable of performing a variety of analysis functions, including segmentation, classification, synthesis, and compression.
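As a minimal illustration of points (1)–(3)—and only an illustration: the binary states, evidence values, and smoothness prior below are hypothetical and are not the network described in this paper—the following sketch runs sum-product belief propagation along a chain of hidden variables, combining local bottom-up evidence with lateral messages so that a locally ambiguous node is disambiguated by its neighbors.

```python
import numpy as np

def normalize(p):
    return p / p.sum()

n_nodes = 5
# Bottom-up evidence: likelihood of each of two states given the local observation.
evidence = np.array([[0.2, 0.8],   # favors state 1
                     [0.3, 0.7],
                     [0.5, 0.5],   # locally ambiguous
                     [0.3, 0.7],
                     [0.1, 0.9]])  # strongly state 1
# Pairwise compatibility: neighboring nodes prefer to agree (a smoothness prior).
psi = np.array([[0.8, 0.2],
                [0.2, 0.8]])

# Sum-product messages along the chain (forward and backward passes).
fwd = [np.ones(2) for _ in range(n_nodes)]
bwd = [np.ones(2) for _ in range(n_nodes)]
for i in range(1, n_nodes):
    fwd[i] = normalize(psi.T @ (fwd[i - 1] * evidence[i - 1]))
for i in range(n_nodes - 2, -1, -1):
    bwd[i] = normalize(psi @ (bwd[i + 1] * evidence[i + 1]))

# Belief at each node: local evidence times the incoming lateral messages.
beliefs = np.array([normalize(evidence[i] * fwd[i] * bwd[i]) for i in range(n_nodes)])
print(beliefs)  # the ambiguous middle node is pulled toward state 1 by its context
```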

We describe a generative network model that accounts for the interaction between form and motion in a relatively simple way, focusing on the influence of ‘direction of figure’ on local motion at junctions where line terminators are defined. A visual surface can be defined by associating an object's boundary with a region representing the object's surface. The basic problem in this surface assignment is to determine contour ownership (Nakayama et al., 1989). We represent ownership using a local representation which we call the ‘direction of figure’ (DOF) (Sajda & Finkel, 1995). In our model the DOF at each point of the contour is represented by a hidden variable whose probability is inferred via integration of bottom-up, top-down and lateral inputs. The ‘belief’ in the DOF is used to estimate occlusion boundaries between surfaces, which are defined by where the DOF changes—the ownership junction (Finkel & Sajda, 1994). In the model, the probability of extrinsic line terminators is a function of the probability of these ownership junctions. Thus, rather than completely suppressing the motion signals at extrinsic terminators, the degree of certainty (i.e. belief) in the DOF is treated as the strength of the evidence for surface occlusion and used to determine the strength of local motion suppression.
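The paper states only that the uncertainty of the inferred form cue modulates the covariance of nodes representing local motion estimates; the sketch below shows one hypothetical way such a modulation could behave (the inflation rule, parameter names, and numerical values are assumptions for illustration, not the model's actual equations). The covariance of a terminator's velocity likelihood is inflated as the belief that the terminator is extrinsic grows, so the cue is down-weighted gradually rather than vetoed outright.

```python
import numpy as np

def terminator_likelihood(v_obs, base_cov, p_extrinsic):
    """Velocity likelihood at a line terminator, softened by occlusion belief.

    Hypothetical rule: the covariance is inflated as the belief p_extrinsic
    that the terminator is extrinsic (owned by an occluding surface) grows,
    so its weight in a precision-weighted motion integration falls off
    gradually instead of being vetoed.
    """
    scale = 1.0 / max(1.0 - p_extrinsic, 1e-3)   # p -> 1 gives a near-flat likelihood
    return np.asarray(v_obs, dtype=float), np.asarray(base_cov, dtype=float) * scale

# A terminator signalling rightward motion, under increasing belief
# (e.g. from the DOF/ownership-junction inference) that it is extrinsic.
v_obs = [1.0, 0.0]
base_cov = np.eye(2) * 0.01
for p in (0.0, 0.5, 0.99):
    mean, cov = terminator_likelihood(v_obs, base_cov, p)
    print(p, np.diag(cov))   # covariance inflates smoothly with occlusion belief
```

Combined with a precision-weighted integration of all local motion likelihoods (as in the earlier sketch), this kind of soft weighting produces a continuum of bias rather than an all-or-none veto, consistent with the behavior described in the abstract.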

The remainder of the paper is organized as follows. Section 2 describes our generative network model, placing it in the context of the organizational structure of visual cortex. Though the model does not use biologically realistic units (e.g. conductance-based integrate-and-fire neurons), it is instructive to consider how the generative model maps onto the cortical architecture. We next describe the details of the integration process between the form and motion streams, a process that exploits informational uncertainty, and how this uncertainty is propagated through the network using Bayesian machinery. We then present two sets of simulation results illustrating the interaction of the form and motion streams. The first set shows how the model generates results for motion coherence stimuli that are consistent with the psychophysical experiments of McDermott, Weiss, and Adelson (2001); in particular, form cues shift the model's inference of perceived motion, producing a gradual transition from incoherent to coherent motion. We then present results for the classic barber-pole stimulus (Wallach, 1935), showing how occlusion influences the certainty in the perceived object motion through form (DOF) cues.

Section snippets

Hypercolumns in the visual cortex

Since the term ‘hypercolumn’ was coined by Hubel and Wiesel (1977), it has been used to describe the neural machinery necessary to process a discrete region of the visual field. Typically, a hypercolumn occupies a cortical area of ∼1 mm² and contains tens of thousands of neurons. Current experimental and physiological studies have revealed substantial complexity in neuronal response to multiple, simultaneous inputs, including contextual influence, as early as V1 (Gilbert, 1992, Kapadia et al.,

Simulation results

All simulations were done with the same network architecture and parameter values, except for the covariance of the motion prior. See Appendix for the parameter values used in the simulations.

Discussion

In this paper we describe a generative network model for integrating form and motion cues. The model can account for a number of perceptual phenomena related to how form information is used to distinguish between intrinsic and extrinsic terminators in the motion integration process. Previous neural network models of segmentation and integration of motion signals have studied the influence of motion signals at terminators and occlusion cues (Grossberg et al., 2001, Lidén and Pack, 1999). The

Acknowledgements

This work was supported by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research under grant N00014-01-1-0625, and a grant from the National Imagery and Mapping Agency, NMA201-02-C0012.

References (59)

  • E.H. Adelson, Perceptual organization and the judgment of brightness, Science (1992)
  • E.H. Adelson et al., Phenomenal coherence of moving visual patterns, Nature (1982)
  • C.H. Anderson et al., Neurobiological computational systems
  • K. Baek et al., A probabilistic network model for integrating visual cues and inferring intermediate-level representations (2003)
  • W.H. Bosking et al., Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex, Journal of Neuroscience (1997)
  • J.M.L. Budd, Extrastriate feedback to primary visual cortex in primates: a quantitative analysis of connectivity, Proceedings of the Royal Society of London B (1998)
  • E.M. Callaway, Local circuits in primary visual cortex of the macaque monkey, Annual Review of Neuroscience (1998)
  • J.C. Crowley et al., Development of ocular dominance columns in the absence of retinal input, Nature Neuroscience (1999)
  • S. Deneve et al., Reading population codes: a neural implementation of ideal observers, Nature Neuroscience (1999)
  • J. Driver et al., Cross-modal links in spatial attention, Philosophical Transactions of the Royal Society of London B: Biological Sciences (1998)
  • R.O. Duncan et al., Occlusion and the interpretation of visual motion: perceptual and neuronal effects of context, The Journal of Neuroscience (2000)
  • L.H. Finkel et al., Constructing visual perception, American Scientist (1994)
  • W.T. Freeman et al., Learning low-level vision, International Journal of Computer Vision (2000)
  • W.S. Geisler et al., Bayesian natural selection and the evolution of perceptual systems, Philosophical Transactions of the Royal Society of London B: Biological Sciences (2002)
  • S. Grossberg et al., A neural model of how horizontal and interlaminar connections of visual cortex develop into adult circuits that carry out perceptual grouping and learning, Cerebral Cortex (2001)
  • D.J. Heeger et al., Computational models of cortical visual processing, Proceedings of the National Academy of Sciences (1996)
  • G.E. Hinton et al.
  • J.C. Horton et al., Intrinsic variability of ocular dominance column periodicity in normal macaque monkeys, Journal of Neuroscience (1996)
  • D.H. Hubel et al., Functional architecture of macaque monkey visual cortex, Proceedings of the Royal Society of London B (1977)