Elsevier

Neurocomputing

Volume 56, January 2004, Pages 329-343

Predictions of a model of spatial attention using sum- and max-pooling functions

https://doi.org/10.1016/j.neucom.2003.09.006

Abstract

Assuming a convergent projection within a hierarchy of processing stages, stimuli from different areas of the receptive field project onto the same population of cells. Pooling over space affects the representation of individual stimuli, and thus its understanding is crucial for attention and ultimately for object recognition. Since attention, in turn, is likely to modify such spatial pooling by changing the competitive weight of individual stimuli, we compare the predictions of sum- and max-pooling methods using a model of attention. Both pooling functions can account for data on the competition between a pair of stimuli within a V4 receptive field; however, our model using sum-pooling predicts a different tuning curve. If we present an additional probe stimulus with the pair, sum-pooling predicts a bottom-up bias of attention, whereas the competition for attention using max-pooling is robust against the additional stimulus.

Introduction

Increases in receptive field size along the ventral pathway of the visual system are assumed to facilitate location-invariant object recognition [17], [10], [28], [40], [32]. Many feature detectors with smaller receptive field sizes at different spatial positions must converge onto cells with large receptive fields. Typically, a sum over all afferents is assumed. Hubel and Wiesel [17] suggested that the response of a “complex cell” can be generated by a pooled response from several “simple cells”. This idea of location invariance was first exploited by the Neocognitron of Fukushima [10]. Recently, a max-like pooling has been proposed to establish more robust shift-invariant feature detectors [32]. The idea is that a max-function results in a sharp tuning curve around the encoded stimulus, whereas a summation smears the information from different locations.
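The contrast between the two pooling functions can be illustrated with a minimal sketch. It is not the model of this paper: the Gaussian tuning curves, the tuning width, the two feature values, and the names `gaussian_tuning` and `pool` are all assumptions chosen for illustration. Two stimuli with slightly different feature values occupy two locations of one receptive field, and the afferent population responses are pooled over location.

```python
import numpy as np

def gaussian_tuning(preferred, stimulus, width=0.5):
    """Response of units with preferred feature values `preferred` to one stimulus.

    Gaussian tuning and the width of 0.5 are illustrative assumptions.
    """
    return np.exp(-0.5 * ((preferred - stimulus) / width) ** 2)

def pool(afferents, mode):
    """Pool afferent responses over locations (axis 0), separately per feature unit."""
    if mode == "sum":
        return afferents.sum(axis=0)
    if mode == "max":
        return afferents.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

# Feature axis covered by the population (arbitrary units).
preferred = np.linspace(-3.0, 3.0, 61)

# One stimulus per location; the feature values -0.4 and 0.4 are hypothetical.
afferents = np.stack([
    gaussian_tuning(preferred, -0.4),
    gaussian_tuning(preferred, 0.4),
])

pooled_sum = pool(afferents, "sum")
pooled_max = pool(afferents, "max")

# Feature value at which each pooled population profile is largest.
peak_sum = preferred[np.argmax(pooled_sum)]
peak_max = preferred[np.argmax(pooled_max)]
```

Under these assumptions the sum-pooled profile peaks at a feature value lying between the two stimuli, which was not presented at either location, whereas the max-pooled profile keeps its peak at a presented value and never exceeds the largest single afferent response.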

Regarding the competition for attention, we have also used a max-pooling function [14], [15]. It has previously been suggested within the framework of biased competition that competitive interactions in the receptive field are responsible for attentional phenomena [7]. Given a target stimulus among several identical distractors, a sum-pooling would enhance cells encoding the distractors, which would result in a preference for distractors. As an emergent consequence of using sum-pooling, Humphreys and Müller suggested a SEarch via Recursive Rejection model that tends to repeatedly reject strongly represented distractors until the target is detected [18]. Such a strategy has not yet been confirmed by experimental findings. A max-pooling would prevent multiple distractors placed within the receptive field from adding up their feature weight and dominating the competition.
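A toy calculation, offered as a sketch rather than as the model used here, shows how identical distractors accumulate weight under sum-pooling. The drive values (1.0 for the target, 0.6 for each distractor) and the function name `pooled_feature_weights` are hypothetical.

```python
import numpy as np

def pooled_feature_weights(n_distractors, target_drive=1.0,
                           distractor_drive=0.6, mode="sum"):
    """Return (target_weight, distractor_weight) after pooling over RF locations.

    One target and `n_distractors` identical distractors lie in the same
    receptive field. Each feature is pooled separately over the locations
    that contain it. All drive values are illustrative assumptions.
    """
    if n_distractors < 1:
        raise ValueError("need at least one distractor")
    if mode not in ("sum", "max"):
        raise ValueError(f"unknown pooling mode: {mode}")

    target_inputs = np.array([target_drive])
    distractor_inputs = np.full(n_distractors, distractor_drive)

    reduce = np.sum if mode == "sum" else np.max
    return float(reduce(target_inputs)), float(reduce(distractor_inputs))

# Compare both pooling functions for a growing number of distractors.
for n in (1, 2, 3):
    t_sum, d_sum = pooled_feature_weights(n, mode="sum")
    t_max, d_max = pooled_feature_weights(n, mode="max")
    print(f"n={n}: sum-pooling target={t_sum:.1f} distractor={d_sum:.1f}; "
          f"max-pooling target={t_max:.1f} distractor={d_max:.1f}")
```

With these assumed values the summed distractor weight overtakes the target weight once two distractors are present, while the max-pooled distractor weight stays at the level of a single distractor regardless of their number.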

It seems that different constraints, one for robust object recognition and the other for efficient attentional processing, lead to a max-like pooling mechanism. To shed light on this convergence, we discuss the predictions of a model of attention, which is consistent with the above-mentioned biased competition framework of attention [7], using a max-pooling and a ∑-pooling function.


Competition and attention

Findings in attention experiments indicate that attention and spatial pooling are related to each other. For example, V4 neurons that encode a target stimulus show a strong increase in activity compared to neurons that encode a distractor stimulus, but only when the target and the distractor are both within the cell's receptive field [5]. Why is the receptive field (RF) a central processing resource? When more than one object is located within the classical receptive field, ambiguities emerge. A

Model of attention in V4

We present a model that describes how spatial attention assigns a processing priority to optimize object recognition. The interactions in V4 are modeled using a population code approach. One influential and often-used approach for modeling cooperative and competitive interactions is based on an additive activation function [1], [16], [2], [27]. However, a comparison with neural data is difficult, since unlike real cells, units with less input are very quickly suppressed. These models typically

Results

The finding that attention implements a multiplicative gain increase if only one stimulus is presented [24], [38] is replicated by both the ∑- and max-pooling functions (Fig. 2). To work out the predictions of the alternative pooling functions, we now observe the effect of different attentional conditions, when a reference, a probe or both stimuli are presented. In order to determine the effect of adding a probe on the whole population, we computed a selectivity value and interaction index for

Discussion

We used a model of attention to explore the possible effects of pooling on the response of cells under different attentional conditions. Alternately using spatial max-pooling and ∑-pooling, our model accounts for the finding that, if the receptive field contains just one stimulus, attention results in a multiplicative gain increase, as observed in MT, MST and V4 [38], [24]. If two stimuli are presented within the receptive field, the model using each pooling function can reproduce the data of

Acknowledgements

This research was performed at Caltech. I am grateful to John Reynolds for providing the data showing the attention effects on V4 cells (Fig. 3A). I appreciated extensive discussions with Jamie Mazer and Rufin VanRullen. I thank Christof Koch, Dirk Walther, Brad Motter, Max Riesenhuber and Tomaso Poggio for helpful comments on an earlier manuscript. This work was supported by DFG HA2630/2-1 and in part by the ERC Program of the NSF (EEC-9402726).

References (42)

  • L. Chelazzi et al.

    Responses of neurons in macaque area V4 during memory-guided visual search

    Cereb. Cortex

    (2001)
  • S. Corchs et al.

    Large-scale neural model for visual attention: integration of experimental single-cell and fMRI data

    Cereb. Cortex

    (2002)
  • R. Desimone et al.

    Neural mechanisms of selective attention

    Annu. Rev. Neurosci.

    (1995)
  • R. Desimone et al.

    Visual properties of neurons in area V4 of the macaque: sensitivity to stimulus form

    J. Neurophysiol.

    (1987)
  • P. Fries et al.

    Modulation of oscillatory neuronal synchronization by selective visual attention

    Science

    (2001)
  • K. Fukushima

    Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position

    Biol. Cybern.

    (1980)
  • T.J. Gawne et al.

    Responses of primate visual cortical V4 neurons to simultaneously presented stimuli

    J. Neurophysiol.

    (2002)
  • S. Grossberg

    Contour enhancement, short term memory, and constancies in reverberating neural networks

    Stud. Appl. Math.

    (1973)
  • F.H. Hamker, Visuelle Aufmerksamkeit und lebenslanges Lernen im Wahrnehmungs-Handlungs-Zyklus, Ph.D. Thesis, Technische...
  • F.H. Hamker

    The role of feedback connections in task-driven visual search

  • J.J. Hopfield

    Neurons with graded response have collective computational properties like those of two-state neurons

    Proc. Natl. Acad. Sci. USA

    (1984)