Predictions of a model of spatial attention using sum- and max-pooling functions
Introduction
Increases in receptive field size along the ventral pathway of the visual system are assumed to facilitate location-invariant object recognition [17], [10], [28], [40], [32]. Many feature detectors with smaller receptive fields at different spatial positions must converge onto cells with large receptive fields. Typically, a sum over all afferents is assumed. Hubel and Wiesel [17] suggested that the response of a “complex cell” can be generated by pooling the responses of several “simple cells”. This idea of location invariance was first exploited by Fukushima's Neocognitron [10]. More recently, a max-like pooling has been proposed to establish more robust shift-invariant feature detectors [32]. The idea is that a max-function results in a sharp tuning curve around the encoded stimulus, whereas a summation smears the information from different locations.
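To make the contrast concrete, the two pooling rules can be written as follows. This is only a sketch in notation assumed here, not the model's own equations: $x_i$ denotes the non-negative response of the afferent feature detector at spatial position $i$ within the receptive field, and $w_i \ge 0$ is a hypothetical connection weight.

```latex
r^{\Sigma} = \sum_{i} w_i \, x_i ,
\qquad
r^{\max} = \max_{i} \left( w_i \, x_i \right)
```

If only a single afferent $k$ is active, both rules return the same value $w_k x_k$. They differ only when several afferents are active at once: the sum grows with the number of active afferents, whereas the max reports only the strongest one.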
Regarding the competition for attention, we have also used a max-pooling function [14], [15]. It has previously been suggested within the framework of biased competition that competitive interactions in the receptive field are responsible for attentional phenomena [7]. Given a target stimulus among several identical distractors, a sum-pooling would enhance cells encoding the distractors, resulting in a preference for the distractors. As an emergent consequence of using sum-pooling, Humphreys and Müller suggested a SEarch via Recursive Rejection (SERR) model that tends to repeatedly reject strongly represented distractors until the target is detected [18]. Such a strategy has not yet been confirmed experimentally. A max-pooling would prevent multiple distractors placed within the receptive field from adding up their feature weight and dominating the competition.
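The argument about a target among identical distractors can be illustrated with a toy calculation. The sketch below is not the model described in this paper. It only assumes that each stimulus in the receptive field drives its matching afferent with a unit response, and it compares the pooled input reaching a target-tuned cell and a distractor-tuned cell. The function names and the response values are hypothetical.

```python
import numpy as np


def pooled_drive(afferents, mode):
    """Pool non-negative afferent responses from one receptive field.

    mode is "sum" for sum-pooling or "max" for max-pooling.
    """
    afferents = np.asarray(afferents, dtype=float)
    if mode == "sum":
        return float(afferents.sum())
    if mode == "max":
        return float(afferents.max())
    raise ValueError("mode must be 'sum' or 'max'")


def competition_inputs(n_distractors, mode, target_resp=1.0, distractor_resp=1.0):
    """Return (target_drive, distractor_drive) for one target and
    n_distractors identical distractors inside the same receptive field.

    Assumption: every stimulus activates exactly one afferent of the cell
    tuned to its feature, with the given response strength.
    """
    target_drive = pooled_drive([target_resp], mode)
    distractor_drive = pooled_drive([distractor_resp] * n_distractors, mode)
    return target_drive, distractor_drive


if __name__ == "__main__":
    for mode in ("sum", "max"):
        t, d = competition_inputs(4, mode)
        print(f"{mode}-pooling: target drive = {t}, distractor drive = {d}")
```

Under these assumptions, four distractors give the distractor-tuned cell four times the input of the target-tuned cell with sum-pooling, while with max-pooling both cells receive the same input, so the number of distractors alone does not decide the competition.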
It seems that two different constraints, one for robust object recognition and the other for efficient attentional processing, both lead to a max-like pooling mechanism. To shed light on this convergence, we discuss the predictions of a model of attention, which is consistent with the above-mentioned biased competition framework of attention [7], using a max-pooling and a ∑-pooling function.
Competition and attention
Findings in attention experiments indicate that attention and spatial pooling are related to each other. For example, V4 neurons that encode a target stimulus show a strong increase in activity compared to neurons that encode a distractor stimulus, but only when the target and the distractor are both within the cell's receptive field [5]. Why is the receptive field (RF) a central processing resource? When more than one object is located within the classical receptive field, ambiguities emerge. A
Model of attention in V4
We present a model that describes how spatial attention assigns a processing priority to optimize object recognition. The interactions in V4 are modeled using a population code approach. One influential and often-used approach for modeling cooperative and competitive interactions is based on an additive activation function [1], [16], [2], [27]. However, a comparison with neural data is difficult, since unlike real cells, units with less input are very quickly suppressed. These models typically
Results
The finding that attention implements a multiplicative gain increase if only one stimulus is presented [24], [38] is replicated by both the ∑- and max-pooling functions (Fig. 2). To work out the predictions of the alternative pooling functions, we now observe the effect of different attentional conditions, when a reference, a probe or both stimuli are presented. In order to determine the effect of adding a probe on the whole population, we computed a selectivity value and interaction index for
Discussion
We used a model of attention to explore the possible effects of pooling on the response of cells under different attentional conditions. Using either spatial max-pooling or ∑-pooling, our model accounts for the finding that, if the receptive field contains just one stimulus, attention results in a multiplicative gain increase, as observed in MT, MST and V4 [38], [24]. If two stimuli are presented within the receptive field, the model using each pooling function can reproduce the data of
Acknowledgements
This research was performed at Caltech. I am grateful to John Reynolds for providing the data showing the attention effects on V4 cells (Fig. 3A). I also appreciate the extensive discussions with Jamie Mazer and Rufin VanRullen. I thank Christof Koch, Dirk Walther, Brad Motter, Max Riesenhuber and Tomaso Poggio for helpful comments on an earlier manuscript. This work was supported by DFG HA2630/2-1 and in part by the ERC Program of the NSF (EEC-9402726).
References (42)
- et al. Competition and cooperation in neural nets
- et al. Contrast-sensitive perceptual grouping and object-based attention in the laminar circuits of primary visual cortex. Vis. Res. (2000)
- et al. Search via recursive rejection (SERR): a connectionist model of visual search. Cogn. Psychol. (1993)
- et al. Suppression of visual responses of neurons in inferior temporal cortex of the awake macaque by addition of a second stimulus. Brain Res. (1993)
- et al. The neurophysiology of shape processing. Image and Vision Comput. (1993)
- et al. Taking the MAX from neuronal responses. Trends Cogn. Sci. (2003)
- et al. Invariant face and object recognition in the visual system. Prog. Neurobiol. (1997)
- Modelling Brain Functions (1989)
- et al. Linearity and normalization in simple cells of the macaque primary visual cortex. J. Neurosci. (1997)
- et al. Synaptic depression and the temporal response characteristics of V1 cells. J. Neurosci. (1998)
- Responses of neurons in macaque area V4 during memory-guided visual search. Cereb. Cortex
- Large-scale neural model for visual attention: integration of experimental single-cell and fMRI data. Cereb. Cortex
- Neural mechanisms of selective attention. Annu. Rev. Neurosci.
- Visual properties of neurons in area V4 of the macaque: sensitivity to stimulus form. J. Neurophysiol.
- Modulation of oscillatory neuronal synchronization by selective visual attention. Science
- Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern.
- Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. J. Neurophysiol.
- Contour enhancement, short term memory, and constancies in reverberating neural networks. Stud. Appl. Math.
- The role of feedback connections in task-driven visual search
- Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA