Research report
An electrophysiological study of scene effects on object identification
Introduction
Most empirical work on visual object identification has focused on isolated objects (see for review Refs. [41], [47], [78]). Yet, in our everyday visual environment, objects are embedded in meaningful visual scenes. In this work, we focus on the processing effects of one specific high-level regularity in visual scenes, namely the long-term co-occurrence of object classes in certain scene contexts, on visual object identification. Specifically, we compare the identification of an object in a setting in which it is often seen (congruous context) versus another in which it is rarely seen (incongruous context) such as a personal computer in an office versus a bathroom scene. In these studies, we do not distinguish between ‘associative’ and ‘semantic’ regularities: objects that are semantically related not only tend to be seen in similar scenes but also typically tend to co-occur in the same scene.
Recent studies by Chun and collaborators (see for review Ref. [16]) using visual search paradigms have shown that the cognitive system can indeed incidentally acquire information about the co-occurrence of visual shapes. These studies have demonstrated that seeing a target shape in the context of an array of other shapes facilitates later search for the target shape when it appears in the same context array, relative to a novel one. Furthermore, the results of electrophysiological studies in non-human primates have provided direct evidence that neurons in the anterior temporal cortex can encode such co-occurrence patterns by means of associations between elaborate representations of visual stimuli [55], [56]. Outside the laboratory, temporal associations of this type may occur routinely via systematic patterns of eye movements (e.g., Ref. [22]); after all, visual objects that tend to co-occur in a visual scene are likely to be fixated in close temporal proximity.
Cognitive psychological accounts have hypothesized the existence of specific memory representations for such co-occurrence patterns. Within a number of psychological accounts, cumulative interactions of the organism with the environment are presumed to lead to the formation of scene-specific schemata, or frames that represent the “likelihoods, ranges and distributions of things and events” (Ref. [25], p. 321). Some accounts also include memory representations that are built of a network of dynamic associations among neuron-like nodes and have the advantage of providing a potential link with the neurobiology (e.g., Ref. [64]).
Once activated, scene-specific memory representations are assumed to influence processing of incoming visual information, although both the mode and time course of this influence are debated. Different accounts of object identification in scenes postulate different loci for context effects. Among the processes into which object identification has been decomposed, perceptual processes are believed to analyze the visual input and to transform it into ‘structural descriptions’ (i.e., high-level visual representations of the shape and structure of visual objects [63]). Subsequent processes are presumed to match the structural descriptions of the object to be identified with those of object models stored in the structural description system [63]. If a good match is found, relevant semantic knowledge is activated. From a computational vision perspective, object identification terminates with a successful match; thus, the processes involved in the activation of semantic knowledge (and beyond) are typically referred to as ‘post-identification’. In our view, the activation of semantic knowledge is instead an integral part of object identification because the purpose of object vision is precisely that of ascertaining semantic knowledge about visual stimuli.
The level of processing at which scene information can affect object analysis varies between accounts, among which there are three main subdivisions.
According to the first class of accounts, an activated schema can affect the early perceptual analysis of objects within the scene. Schema activation is thought to occur rapidly on the basis of global, low-resolution contextual information [43], [51]. Among the possible candidates for such low-resolution information are scene-emergent features, such as geon clusters typically associated with specific classes of scenes [9] or configurations of oriented ‘blobs’ [71]. An activated schema is assumed to facilitate the detection of perceptual features (color, texture, size, motion, etc.) that are associated with objects specified within the schema itself [4], [25], by means of feature-selective attention. Objects consistent with a scene would thus be identified more quickly on the basis of such features than objects that are inconsistent with a scene.
According to a second class of accounts, scene schemata are assumed to have their effect on the processes involved in matching the structural description with those of object models stored in memory, beyond early perceptual analysis. If we conceptualize this matching process as a selection process wherein the visual system must scan multitudes of representations in the structural description system to find a good match, scene constraints could serve to reduce the size of the search space by priming/biasing classes of the most likely object model representations (e.g., Ref. [78]).
A third class of accounts relegates scene effects to even later processes, such as semantic knowledge activation or beyond (e.g., Refs. [21], [33], [37], [47]). According to these accounts, bottom-up visual analysis is sufficient to discriminate between entry-level object categories, after which context may have its effects.
It has proven difficult to infer the time course of context effects from behavioral measures alone because they reflect the ‘downstream’ effects of an experimental manipulation (from the earliest perceptual stages to the motor response). We, therefore, chose to examine the time course of scene effects on object identification via a brain measure with greater temporal precision, namely, recordings of event-related brain potentials (ERPs). Scalp ERPs enable continuous monitoring of the modulations of synchronous neural activity elicited by experimental manipulations in a relatively direct manner. As ERPs have provided crucial information regarding the time course of semantic context effects in language processing [42], [80], we aimed to use a similar approach to investigate the timing of context effects on nonlinguistic, visual processing.
To assess the time course of scene effects on object identification, we compared ERPs elicited by objects appearing in congruous versus incongruous scene contexts. In such an analysis, the timing of ERP differences between these two conditions provides an estimate of the time by when neural representations of scene information must have begun to interact with identification processes for the target object. Furthermore, the spatial distribution of these ERP congruity effects across the head can provide some clues about the nature of the processes involved. To our knowledge, this is the first ERP study that directly addresses scene effects on object identification.
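The logic of this comparison can be illustrated with a minimal sketch, assuming averaged waveforms stored as plain lists of voltages. The function names, the toy data, and the onset criterion (a fixed amplitude threshold sustained over a number of consecutive samples) are illustrative assumptions, not the analysis used in the experiments reported here.

```python
from statistics import mean


def average_erp(trials):
    """Average single-trial epochs (equal-length lists of voltages) point by point."""
    return [mean(samples) for samples in zip(*trials)]


def difference_wave(incongruous_erp, congruous_erp):
    """Incongruous minus congruous ERP, sample by sample."""
    return [i - c for i, c in zip(incongruous_erp, congruous_erp)]


def onset_latency_ms(diff, times_ms, threshold_uv, min_consecutive):
    """Return the first time at which |diff| reaches the threshold and stays
    there for min_consecutive samples; None if that never happens.
    The threshold criterion is a hypothetical choice made for this sketch."""
    run = 0
    for idx, value in enumerate(diff):
        run = run + 1 if abs(value) >= threshold_uv else 0
        if run == min_consecutive:
            return times_ms[idx - min_consecutive + 1]
    return None


# Toy data (hypothetical): 12 samples, one every 50 ms from 0 to 550 ms.
times_ms = list(range(0, 600, 50))
congruous_trials = [[0.0] * 12, [0.0] * 12]
incongruous_trials = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -3.0, -5.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -3.0, -5.0, -3.0, -3.0, 0.0, 0.0],
]

diff_wave = difference_wave(
    average_erp(incongruous_trials), average_erp(congruous_trials)
)
onset = onset_latency_ms(diff_wave, times_ms, threshold_uv=1.0, min_consecutive=2)
print(onset)  # 300
```

In this toy example the two conditions are identical until 300 ms, so the estimated onset marks the latest point by which context information must have begun to interact with processing of the target.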
Experiment 1 is a behavioral study to demonstrate the scene congruity effect with our stimulus set. Experiment 2 is an ERP study to determine the time course and spatial signature of the scene congruity effect. Experiment 3 is a replication of Experiment 2 with a modified paradigm not requiring explicit congruity judgments in order to allow a more direct comparison of the congruity effects obtained using scenes with those found in earlier studies with sentential contexts.
Note that throughout the paper we will use the traditional electrophysiological nomenclature to refer to most ERP components (i.e., voltage deflections): the first letter indicates whether the component is negative (‘N’) or positive (‘P’) relative to the chosen reference, whereas the subsequent number(s) indicates either the average latency of the component in ms (e.g., N200) or the ordinal position of the component (e.g., ‘P1’ refers to the first positive ERP component in the visual ERP).
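As an illustration of this naming convention, the following sketch decodes a component label into its polarity and either a latency or an ordinal position. The rule used to tell the two apart (numbers up to 9 are treated as ordinal positions, larger numbers as latencies in ms) is an assumption made for this example; labels that do not follow the convention, such as ‘Nsd’ or ‘SN’, are rejected.

```python
import re
from typing import NamedTuple, Optional


class ErpComponent(NamedTuple):
    polarity: str               # "negative" or "positive"
    latency_ms: Optional[int]   # set when the number denotes a latency
    ordinal: Optional[int]      # set when the number denotes an ordinal position


_LABEL = re.compile(r"^([NP])(\d+)$")


def parse_component(label: str, ordinal_max: int = 9) -> ErpComponent:
    """Decode labels such as 'N200' or 'P1'.

    ordinal_max is a hypothetical cutoff: numbers at or below it are read as
    ordinal positions, larger numbers as average latencies in ms.
    """
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"label does not follow the N/P + number convention: {label!r}")
    polarity = "negative" if match.group(1) == "N" else "positive"
    number = int(match.group(2))
    if number <= ordinal_max:
        return ErpComponent(polarity, None, number)
    return ErpComponent(polarity, number, None)


print(parse_component("N200"))
print(parse_component("P1"))
```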
Based on numerous behavioral studies that have reported scene effects on object identification (e.g., Refs. [10], [14]), we expected to obtain a reaction time advantage for congruous relative to incongruous objects. The timing of ERP congruity effects is thus the main focus of the present study. We also used ERP congruity effects to evaluate the three main accounts of scene effects on object cognition. On the one hand, we reasoned that if scene schemata affect the early perceptual analysis of objects, then ERP components indexing early perceptual processes should be modulated by congruity. Short-latency ERPs (onsetting during the first 200 ms poststimulus) are believed to index early perceptual processes because (a) they are modulated by variations in the perceptual attributes of a stimulus, such as spatial location (e.g., Ref. [17]), luminance [46], color [15], [46], spatial frequency [40], coherent motion [59], and shape [68], [77], and (b) they are modulated by task manipulations that affect the perceptual encoding of a stimulus: the P1 and the N1 components (which onset between 80 and 130 ms poststimulus) are modulated by spatial attention (e.g., Ref. [32]); furthermore, the selection negativity (SN, which onsets between 140 and 180 ms poststimulus) is elicited by selective attention to nonspatial features of a visual stimulus such as color [3], shape [72], direction of motion [2], spatial frequency [31], and orientation [39], [62]. The time course and estimated neuroanatomical location of these ERPs (early extrastriate cortex) suggest that they occur prior to stimulus identification.
On the other hand, if scene schemata act only later to affect the activation of semantic knowledge, then congruity should modulate only later components, such as the N400 [42]. N400 amplitude is typically reduced when the eliciting stimulus is preceded by an associatively or semantically related one (e.g., Refs. [5], [6], [7], [8], [12], [36], [50], [69], [70]). Indeed, one view of the N400 is that it reflects neural processes involved in the activation of semantic knowledge [61], [80] probably stored in the anterior temporal lobes.
Predictions for the ERP correlates of scene schemata effects on structural description matching processes can be estimated from prior ERP investigations of the time course of object identification. A class of negativities (‘N300/N350’, ‘Ncl’) peaking around 350 ms with a frontal scalp distribution (with mastoid reference site) has been proposed to reflect activation of structural description matching processes [24], [26], [36], [50], [67]; we refer to these as the structural description negativity, Nsd. By 200 to 300 [230/250] ms, the Nsd is greater for unidentified real objects and non-objects (i.e., images of objects that do not exist) than for identified or real objects. These effects can last for several hundred milliseconds, consistent with the idea that the Nsd reflects repeated but ultimately failed attempts to find a good match for unidentified objects and non-objects; in contrast, Nsd is rapidly reduced within 300 ms when structural description matching succeeds with identified real objects. Thus, if scene schemata affect object identification at the level of structural description matching processes, we might expect to find an ERP congruity effect that onsets around 200 ms with an anterior scalp distribution similar to Nsd effects.
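The three sets of predictions can be summarized as a simple decision rule over the onset latency of the congruity effect. The sketch below is only a schematic restatement: the cutoffs of 200 and 300 ms are illustrative assumptions loosely based on the latencies cited above, the function name is hypothetical, and the rule ignores scalp distribution, which the predictions also rely on.

```python
# Hypothetical cutoffs (ms) chosen for illustration only.
EARLY_PERCEPTUAL_LIMIT_MS = 200   # short-latency ERPs (P1, N1, SN) onset before this
STRUCTURAL_MATCH_LIMIT_MS = 300   # Nsd-like effects are expected to onset before this


def account_consistent_with_onset(onset_ms: float) -> str:
    """Map the onset latency of an ERP congruity effect to the class of
    accounts whose predictions it most closely matches."""
    if onset_ms < 0:
        raise ValueError("onset latency must be poststimulus (>= 0 ms)")
    if onset_ms < EARLY_PERCEPTUAL_LIMIT_MS:
        return "early perceptual analysis"
    if onset_ms < STRUCTURAL_MATCH_LIMIT_MS:
        return "structural description matching"
    return "semantic knowledge activation"


for latency in (150, 230, 390):
    print(latency, account_consistent_with_onset(latency))
```

In practice, latency alone would not be decisive: an effect onsetting near 200 ms would also need an anterior scalp distribution to be attributed to structural description matching.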
Participants
All 42 participants were UCSD students and native speakers of English. They received course credit, or were paid $5.00/h for participating. Ten participants took part in Experiment 1 (four men, six women between 18 and 25 years of age, mean 20; nine right-handed). Seventeen participants, different from those employed in Experiment 1, took part in Experiment 2 (nine men, eight women between 18 and 25 years of age, mean 20.3; 14 right-handed); data from two participants were discarded due to
Behavioral data
In Experiment 1 (Table 1) participants correctly identified 93% of the target objects. Congruity affected neither the object identification rates (F(2,18)=1.14, P>0.1) nor the object confidence ratings (F(2,18)=0.02, P>0.1). In contrast, mean confidence ratings for the scenes were higher in the congruous (3.93) than incongruous (3.86) condition (F(1,9)=5.62, P<0.05). Finally, congruous stimuli were rated significantly higher in congruity than incongruous ones (F(1,9)=3156.84, P<0.00001). Median
N390 congruity effect
The first reliable effect of congruity, modulation of an N400-like component, begins at ∼300 ms, peaks at ∼390 ms, and lasts until ∼500 ms. Throughout this interval, congruous targets show more positivity than incongruous ones. The time course of this congruity effect is roughly similar to that reported for written words in sentences, suggesting that the neural processes underlying the interaction between a visual stimulus and the context in which it appears may operate under similar time constraints
Conclusions
The current experiments, together with prior findings, suggest the following picture of object identification in briefly flashed scenes. Within the first 250–300 ms, visual inputs are processed without any notable impact of top-down influences from the semantic content of a scene. The effects of a scene or a sentence context on object identification processes occur somewhat later, ∼300 ms, as reflected in the N390 scene congruity effect; here, information in memory accessed by the scene context
Acknowledgements
Work reported herein was supported by grants MH52893, HD22614, and AG08313 to M. Kutas. During the revision process of this article the first author was supported by a McDonnell-Pew Program in Cognitive Neuroscience Award and by grants NMA 201-01-C-0032 and 5R01 MH 60734-03 to Stephen M. Kosslyn. We would like to thank Haline E. Schendan and Stephen M. Kosslyn for helpful discussion.
References (80)
- et al. Asymmetries in event-related potentials during rhyme-matching: confirmation of the null effects of handedness. Neuropsychologia (1989)
- et al. Event-related potentials and the phonological matching of picture names. Brain Lang. (1990)
- et al. Event-related potentials and the semantic matching of pictures. Brain Cogn. (1990)
- et al. Event-related potentials and the matching of familiar and unfamiliar faces. Neuropsychologia (1988)
- et al. Scene perception: detecting and judging objects undergoing relational violations. Cognit. Psychol. (1982)
- et al. High level object recognition without an anterior inferior temporal lobe. Neuropsychologia (1997)
- et al. An ERP study of expectancy violation in face perception. Brain Cogn. (1994)
- Contextual cueing of visual attention. Trends Cogn. Sci. (2000)
- Time-locked multiregional retroactivation: a systems-level proposal for the neural substrates of recall and recognition. Cognition (1989)
- et al. Cue-invariant activation in object-related areas of the human occipital lobe. Neuron (1998)
- Event-related brain potentials reflect semantic priming in an object decision task. Brain Cogn.
- Object identification is isolated from scene semantic constraint: evidence from object type and token discrimination. Acta Psychol.
- ‘P300’ and memory: Individual differences in the von Restorff effect. Cogn. Psychol.
- Event-related potentials to conjunctions of spatial frequency and orientation as a function of stimulus parameters and response requirements. Electroencephalogr. Clin. Neurophysiol.
- On the processing of spatial frequencies as revealed by evoked-potential source modeling. Clin. Neurophysiol.
- Differences in human visual evoked potentials during the perception of colour as revealed by a bootstrap method to compare cortical activity. A prospective study. Neurosci. Lett.
- Scalp distributions of event-related potentials: An ambiguity associated with analysis of variance models. Electroencephalogr. Clin. Neurophysiol.
- Visual recognition and recall after right temporal-lobe excision in man. Neuropsychologia
- Feedback signal from medial temporal lobe mediates visual associative mnemonic codes of inferotemporal neurons. Brain Res. Cogn. Brain Res.
- Perceptual-mnemonic functions of the perirhinal cortex. Trends Cogn. Sci.
- Characteristics of visual evoked potentials generated by motion coherence onset. Brain Res. Cogn. Brain Res.
- Schemata and sequential thought processes in PDP models
- A bootstrap method to compare the shapes of two scalp fields. Electroencephalogr. Clin. Neurophysiol.
- Anatomy of the medial temporal lobe. Magn. Reson. Imaging
- Uniqueness of the generators of brain evoked potential maps. IEEE Trans. Biomed. Eng.
- Selective attention to the color and direction of moving stimuli: electrophysiological correlates of hierarchical feature selection. Percept. Psychophys.
- Spatio-temporal dynamics of attention to color: evidence from human electrophysiology. Hum. Brain Mapp.
- Processing global information in briefly presented pictures. Psychol. Res.
- Aspects and extensions of a theory of human image understanding
- Identification of objects in scenes: the role of scene background in object naming. J. Exp. Psychol. Learn. Mem. Cogn.
- Effect of background information on object identification. J. Exp. Psychol. Hum. Percept. Perform.
- The timing of visual evoked potential activity in human area V4. Proc. R. Soc. Lond. B Biol. Sci.
- Identification of early visual evoked potential generators by retinotopic analyses. Hum. Brain Mapp.
- Identification of early visual evoked potential generators by retinotopic and topographic analyses. Hum. Brain Mapp.
- The brain binds entities and events by multiregional activation from convergence zones. Neural Comput.
- Scene-context effects and models of real-world perception
- Local and global contextual constraints on the identification of objects in scenes. Special Issue: Object perception and scene analysis. Can. J. Psychol.
- Is the P300 component a manifestation of context updating? Behav. Brain Sci.
- Activation timecourse of ventral visual stream object-recognition areas: high density electrical mapping of perceptual closure processes. J. Cogn. Neurosci.
- Framing pictures: the role of knowledge in automatized encoding and memory for gist. J. Exp. Psychol. Gen.