Research Report
Preattentive representation of feature conjunctions for concurrent spatially distributed auditory objects
Introduction
Neurophysiological studies have provided evidence that spatially distinct sets of neurons are involved in analyzing different stimulus features (in vision: [2], [18], [35]; in audition: [1], [11], [25]). If features are processed by distinct neuronal groups, then they have to be combined to form unitary (holistic) object representations. The process of conjoining the features of a perceptual object has been termed “feature integration” or “feature binding”. A long-standing question in the literature is whether focal attention is required for correctly integrating the features of an object.
Treisman's feature integration theory [44], [45], [46], a leading theory in this field, holds that (1) sensory features are preattentively extracted in parallel from various objects and mapped separately onto feature maps; (2) in parallel, the spatial location of each extracted feature is encoded onto a master map; and (3) guided by the master map, features belonging to the same object are integrated by an attentional process that moves through a complex scene serially, processing one location at a time [44]. The theory was largely based on two types of experimental results. (1) “Conjunction cost”: reaction times are longer when subjects search an array of items for targets specified by the combination of two features (e.g., red circles) than when they search for targets specified by a single feature (e.g., red items or circle-shaped items). (2) “Illusory feature conjunctions”: when large stimulus arrays are presented only briefly, subjects often report objects combining two features that actually appeared on separate objects in the array (e.g., perceiving a red O or a green X in an array consisting of red Xs and green Os; [46]). The perception of objects with incorrect feature combinations suggests that feature integration does not occur automatically. Thus, both the conjunction cost and the illusory conjunction phenomena can be regarded as evidence for an attention-requiring feature integration stage of perception that takes place after the analysis of individual stimulus features.
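Under the feature integration account summarized above, the conjunction cost is usually attributed to serial, self-terminating search through item locations, whereas single-feature targets are detected in parallel. The sketch below illustrates the idealized reaction-time predictions of that account. The baseline and per-item costs (`base_ms`, `per_item_ms`) are hypothetical illustration values, not data from the cited studies.

```python
def predicted_rt_ms(set_size: int, search: str, target_present: bool,
                    base_ms: float = 450.0, per_item_ms: float = 40.0) -> float:
    """Idealized mean RT under a feature-integration-style search model.

    search="feature": parallel detection, so RT does not grow with set size.
    search="conjunction": serial self-terminating search; with the target in a
    random position, on average (N + 1) / 2 items are inspected when it is
    present and all N items when it is absent.
    base_ms and per_item_ms are hypothetical illustration values.
    """
    if set_size < 1:
        raise ValueError("set_size must be >= 1")
    if search == "feature":
        return base_ms
    if search == "conjunction":
        inspected = (set_size + 1) / 2 if target_present else set_size
        return base_ms + per_item_ms * inspected
    raise ValueError(f"unknown search type: {search!r}")


for n in (1, 5, 9):
    print(n,
          predicted_rt_ms(n, "feature", True),
          predicted_rt_ms(n, "conjunction", True),
          predicted_rt_ms(n, "conjunction", False))
```

In this idealization the target-present conjunction slope is half the target-absent slope, while feature search stays flat across set sizes.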
However, the separation of the feature-analysis and feature-binding stages proposed by Treisman's model has been questioned by subsequent studies showing that (1) attention may affect feature analysis as well as the feature-binding process [29]; (2) feature conjunctions can, in some cases, be formed in the absence of focused attention [15], [16]; (3) directing attention to one feature of an object may result in the selection of its other features as well [6]; and (4) parallel processing can occur during conjunction search [30], [52].
Although Treisman's feature integration theory and the alternative views mentioned above were originally developed on the basis of experiments conducted in the visual modality, the issue of feature binding is equally relevant for theories of auditory perception. Unfortunately, only a few studies have directly tested feature integration in the auditory modality, and their results are conflicting. Illusory feature conjunctions were reported in experiments using a target-search task [14] and a matching task [41], [42], as well as in the octave and scale illusions [4]. On the other hand, a conjunction cost was not always found in target-search tasks. Measuring reaction times to conjunction and feature targets and, in parallel, an event-related potential (ERP) correlate of selective attention (the Nd wave, a negative difference elicited by attended sounds compared with ignored or unattended ones), Woods and his colleagues [53], [54] showed that a representation of conjoined features can be formed even before the processing of the constituent features is complete.
The mismatch negativity (MMN) ERP component [22] can be used as a tool to investigate the preattentive stage of central auditory processing (for a review, see [26]). MMN is a fronto-centrally negative potential elicited by “deviant” sounds that violate some regular aspect of the preceding sound sequence (the sounds conforming to the regularity are termed “standards”). Occasional deviations from simple as well as complex and abstract regularities elicit MMN [24]. It has been established that the elicitation of an MMN indicates that the specific regular characteristic of the standard-stimulus sequence violated by the deviant sound has been encoded in auditory memory [21]. In general, MMN is elicited whether the subject attends to the test sounds or performs a task unrelated to the auditory stimulation. Sussman and her colleagues' results [38] confirmed Näätänen's view [20], [22] that MMN elicitation does not require attention, although the elicitation of MMN per se does not imply that all processes leading to the detection of deviants are also attention independent [37]. Therefore, MMN is an attractive tool for studying whether or not auditory feature binding requires attention.
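MMN paradigms embed occasional deviants in a pseudo-random sequence of standards, and a minimum number of standards between deviants is often imposed so that each deviant follows an established regularity. The sketch below builds such a sequence with a fixed number of deviants. The function name, the spacing rule, and the example parameters are illustrative assumptions, not the stimulation protocol of this or any cited study.

```python
import random
from typing import List, Optional


def oddball_sequence(n_trials: int, n_deviants: int, min_standards_after: int,
                     seed: Optional[int] = None) -> List[str]:
    """Pseudo-random standard/deviant sequence with a fixed deviant count.

    Each deviant is followed by at least `min_standards_after` standards, so two
    deviants are never closer than that. The remaining standards are shuffled
    in as single items. (Simple construction; not uniform over all valid orders.)
    """
    if n_deviants < 0 or min_standards_after < 0:
        raise ValueError("counts must be non-negative")
    spare = n_trials - n_deviants * (1 + min_standards_after)
    if spare < 0:
        raise ValueError("too many deviants for the requested spacing")
    tokens = ["D"] * n_deviants + ["S"] * spare
    random.Random(seed).shuffle(tokens)
    sequence: List[str] = []
    for token in tokens:
        if token == "D":
            sequence.append("deviant")
            sequence.extend(["standard"] * min_standards_after)
        else:
            sequence.append("standard")
    return sequence


# Hypothetical parameters: 200 trials, 20 deviants (10%), >= 2 standards after each.
seq = oddball_sequence(n_trials=200, n_deviants=20, min_standards_after=2, seed=0)
print(seq.count("deviant") / len(seq))
```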
Several studies have reported that MMN is elicited by deviants differing from the standards in the combination of two acoustic features, such as frequency and intensity [12], [39] or frequency and location [36], [40]. For example, Gomes et al. [12] presented a sequence containing three frequent tones (standards: 30% probability each). Each standard tone differed from the other two in both frequency and intensity. Occasional deviant tones (10% probability) with a frequency–intensity combination that differed from all of those used in the standards elicited the MMN. Because the frequency and the intensity of the deviant tone each also appeared in one of the standard tones, the deviant tones did not violate any of the possible single-feature regularities of Gomes et al.'s tone sequences (i.e., the features appearing in the deviants appeared frequently within these sequences). Therefore, the deviant tones could only be detected as deviants if the frequency and intensity features were conjoined for both the standard and deviant tones prior to MMN generation.
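The logic of such a design can be checked by computing feature-level (marginal) probabilities alongside the probability of each feature combination. In the sketch below, the labels f1 to f3 and i1 to i3 are hypothetical placeholders, and the particular recombination chosen as the deviant is an assumption; only the 30% and 10% probabilities come from the description above. Every single frequency or intensity value occurs with at least 30% probability, whereas the deviant combination occurs with only 10%, so only a conjunction-based representation marks the deviant as rare.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical labels: f = frequency value, i = intensity value.
# Three standards (30% each) differ from one another in both features;
# the deviant (10%) recombines values that occur in two different standards.
STIMULI = {
    ("f1", "i1"): Fraction(3, 10),  # standard A
    ("f2", "i2"): Fraction(3, 10),  # standard B
    ("f3", "i3"): Fraction(3, 10),  # standard C
    ("f1", "i2"): Fraction(1, 10),  # conjunction deviant (assumed recombination)
}


def marginal(stimuli, feature_index):
    """Probability of each value of one feature (0 = frequency, 1 = intensity)."""
    totals = Counter()
    for combination, p in stimuli.items():
        totals[combination[feature_index]] += p
    return dict(totals)


freq_p = marginal(STIMULI, 0)
intensity_p = marginal(STIMULI, 1)
deviant = ("f1", "i2")
print("rarest frequency value:", min(freq_p.values()))
print("rarest intensity value:", min(intensity_p.values()))
print("deviant combination:", STIMULI[deviant])
```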
Because the subjects of Gomes et al.'s study, as well as those of the other cited studies, were reading a book or watching a movie while ignoring the auditory stimuli, the elicitation of the MMN by conjunction deviants is compatible with the notion of preattentive feature binding. It is also possible, however, that subjects divided their attention between the uncontrolled primary task and the to-be-ignored test sounds. Winkler et al. [50] addressed this possibility: infrequent stimuli with deviant combinations of auditory (and, separately, visual) features elicited the MMN (and, for visual features, its visual counterpart, the vMMN) while subjects performed a difficult within-modality primary task. Furthermore, the difficulty of the primary task did not affect the MMN (vMMN) amplitude.
Thus, findings from MMN studies challenge the attentive feature integration theory, at least for the auditory modality. However, because the MMN studies mentioned above delivered sounds sequentially, the discrepancy between their results and those regarded as supporting Treisman's feature integration theory might be explained by the difference between sequential and simultaneous (spatially distributed) stimulation. That is, sequential presentation does not allow illusory conjunctions to emerge, because only the features of one sound emerge from the feature-analysis processes at any given time. Supporting this notion, neuropsychological studies [10], [31], [43] reported a brain-damaged patient who had severe problems correctly binding the features of multiple simultaneously presented objects but no difficulty binding the features of a single object. These observations suggest that the brain mechanisms conjoining the features of multiple simultaneously presented objects may partly differ from those involved in conjoining the features of a single object.
Hall et al. [14] demonstrated the occurrence of illusory feature conjunctions for multiple, simultaneously presented sounds. They presented a target sound (e.g., violin-509 Hz) followed by an array of simultaneously presented sounds at different locations (e.g., violin-262 Hz and trombone-509 Hz). Subjects were required to judge whether one designated feature (timbre or pitch), or the combination of the two features, appeared in the array. In line with studies conducted in the visual modality, the results indicated the emergence of illusory feature conjunctions.
Thus, on the one hand, previous MMN studies suggested that auditory features are conjoined without focal attention when only one sound is presented at a time. On the other hand, the emergence of illusory conjunctions in Hall et al.'s [14] experiments suggests that, when multiple sounds reach the ears concurrently, correct integration of their features may require focused attention. To resolve this contradiction, the present study tested whether the features of two concurrently presented sounds are correctly conjoined without focal attention. Similarly to Hall et al. [14], we presented two concurrent, spatially distributed sounds that differed from each other in both timbre (piano and violin) and pitch (e.g., C5 and E5 on the musical scale). The combination of timbre and pitch (e.g., piano-C5 and violin-E5 most of the time) was occasionally reversed (piano-E5 and violin-C5), thus forming conjunction-deviant sound-pairs (parallel condition; Fig. 1a). During the presentation of the test sounds, subjects performed a visual 1-back or 3-back working-memory task and were instructed to ignore the sounds. Because no individual feature appears infrequently within the sound sequences, MMN elicitation by infrequent conjunction deviants requires that features are conjoined (whether correctly or incorrectly) for the sound-pairs. If feature integration does not require attention to the sounds, then the MMN amplitude should be independent of the load of the primary task. If, on the other hand, correct binding of the features of two concurrent sounds requires focused attention, then one of two outcomes can be expected: (1) no MMN is elicited, or (2) the MMN amplitude depends on the load of the primary task, because less attention can be allocated to the sounds during the high-load than during the low-load primary task.
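The primary task is an n-back working-memory task: a target is any item identical to the one presented n positions earlier, so raising n from 1 to 3 increases the memory load. The sketch below marks targets and scores yes/no responses. The item labels and the proportion-correct score are assumptions for illustration, since the excerpt does not describe the visual stimuli or the scoring used.

```python
from typing import List, Sequence


def nback_targets(items: Sequence[str], n: int) -> List[bool]:
    """True at each position whose item equals the item n positions earlier."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [i >= n and items[i] == items[i - n] for i in range(len(items))]


def proportion_correct(targets: Sequence[bool], responses: Sequence[bool]) -> float:
    """Share of trials where the yes/no response matches the target status."""
    if len(targets) != len(responses) or not targets:
        raise ValueError("targets and responses must be non-empty and equal length")
    return sum(t == r for t, r in zip(targets, responses)) / len(targets)


letters = ["K", "K", "M", "K", "M", "M"]  # hypothetical stimulus stream
print(nback_targets(letters, 1))  # [False, True, False, False, False, True]
print(nback_targets(letters, 3))  # [False, False, False, True, False, True]
```

The same stream yields different targets at each load level, which is why the 3-back condition demands that more items be held in memory.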
Therefore, if attention is required to correctly conjoin the features of the test sound-pairs, then, on average, more illusory conjunctions will occur with the high-load than with the low-load primary task. When an illusory conjunction emerges, a standard pair is encoded in memory as a correctly conjoined deviant pair, and a deviant pair as a correctly conjoined standard pair (because the standard and deviant pairs differ only in their combination of pitch and timbre). As the number of test sound-pairs whose features are incorrectly conjoined increases, the measured MMN amplitude decreases for two reasons. (1) Because the stimulus sequences contain more standards than deviants, the number of standards (incorrectly) encoded as deviants should exceed the number of deviants (incorrectly) encoded as standards (assuming an equal probability of misconjoining the features of deviant and standard pairs). This raises the effective deviant-stimulus probability, and increasing the deviant-stimulus probability decreases the MMN amplitude [23], [32]. (2) Because the responses to all deviant sound-pairs are averaged together (see Methods), if fewer of them elicit the MMN (because more are encoded as standards), the measured average MMN amplitude decreases. Consequently, comparing the MMN amplitude between the two task-load conditions indicates whether attention is required for correctly conjoining the features of two concurrently delivered sounds.
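The two reasons given above can be made concrete with a simple bookkeeping model. Assume that each sound-pair, standard or deviant, is misbound with the same probability e, and let p be the deviant probability. The share of pairs encoded in the deviant configuration is then p(1 - e) + (1 - p)e, which grows with e (reason 1), while only a fraction 1 - e of the true deviants is still encoded as deviant (reason 2). The value p = 0.1 used below is hypothetical; the actual deviant probability is given in the Methods, which are not part of this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MisbindingOutcome:
    encoded_deviant_share: float  # share of all pairs encoded in the deviant configuration
    deviants_kept: float          # share of true deviants still encoded as deviants


def misbinding_outcome(p_deviant: float, e: float) -> MisbindingOutcome:
    """Bookkeeping for an equal misbinding probability e in standards and deviants."""
    if not (0.0 <= p_deviant <= 1.0 and 0.0 <= e <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    share = p_deviant * (1.0 - e) + (1.0 - p_deviant) * e
    return MisbindingOutcome(share, 1.0 - e)


P_DEVIANT = 0.1  # hypothetical deviant probability, for illustration only
for e in (0.0, 0.1, 0.2):
    out = misbinding_outcome(P_DEVIANT, e)
    print(f"e={e:.1f}  encoded-deviant share={out.encoded_deviant_share:.2f}  "
          f"deviants kept={out.deviants_kept:.2f}")
```

At e = 0.5 both configurations become equally frequent in the encoded sequence, so no conjunction regularity remains to be violated.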
For comparison with previous MMN results, we also presented the same stimuli sequentially (serial condition; Fig. 1b). Finally, following the EEG recordings, an auditory target-search task was conducted with the same sounds. In this task, subjects judged whether a target sound appeared within a subsequently presented array of two sounds.
Subjects
Twelve young adults (22–36 years, mean 27.0 years, 8 females) with no history of hearing impairment or neurological disorder participated in the experiment. Two of the subjects had received musical-instrument training for 14 years (piano or cello) and another two for 6 years (both piano), but none of them were involved in musical training at the time of the experiment.
Auditory stimuli and procedures for EEG experiment
Auditory stimuli of piano- and violin-like timbres (AKAI sample library) were created and recorded with a MIDI keyboard
MMN responses
Fig. 2 shows the ERP responses elicited by the standard and deviant stimuli together with the corresponding deviant-minus-standard difference waveforms. In all four conditions, between 100 and 200 ms from stimulus onset, the ERP elicited by the deviant stimuli was negatively displaced over the fronto-central scalp (FCz) compared with the standard-stimulus response. Further, in the same period, the deviant-stimulus response was more positive than the standard-stimulus response over the mastoid
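The deviant-minus-standard difference waveform and its mean amplitude in a latency window can be computed for a single channel (e.g., FCz) as sketched below. The sampling grid, the synthetic epochs (in microvolts), and the function names are hypothetical; baseline correction, artifact rejection, and filtering, which a real pipeline would need, are omitted.

```python
from statistics import mean
from typing import List, Sequence


def average_erp(epochs: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point average across single-trial epochs of equal length."""
    return [mean(samples) for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs) -> List[float]:
    """Deviant-minus-standard difference waveform."""
    deviant_erp = average_erp(deviant_epochs)
    standard_erp = average_erp(standard_epochs)
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def mean_amplitude(wave, times_ms, start_ms, end_ms) -> float:
    """Mean of the waveform samples whose latency lies in [start_ms, end_ms]."""
    window = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples in the requested window")
    return mean(window)


# Hypothetical sampling grid: -100 to 400 ms in 4 ms steps (250 Hz).
times_ms = list(range(-100, 401, 4))


def synthetic_epoch(deflection_uv: float) -> List[float]:
    """Flat synthetic epoch with a constant deflection between 100 and 200 ms."""
    return [deflection_uv if 100 <= t <= 200 else 0.0 for t in times_ms]


standard_epochs = [synthetic_epoch(0.0), synthetic_epoch(0.0)]
deviant_epochs = [synthetic_epoch(-1.5), synthetic_epoch(-2.5)]
diff = difference_wave(deviant_epochs, standard_epochs)
print(mean_amplitude(diff, times_ms, 100, 200))
```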
The role of attention in conjoining features
Rare combinations of timbre and pitch elicited significant MMNs both when the two sounds were presented simultaneously and when they were delivered sequentially. Performance was significantly lower in the 3-back than in the 1-back primary task, suggesting that the 3-back task was more demanding and thus required more of the subjects' attention than the 1-back task. Nevertheless, task load did not significantly affect the parameters of the MMN response (amplitude, peak
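One way to compare MMN amplitudes between the 1-back and 3-back conditions within the same subjects is a paired t statistic, sketched below with the Python standard library only. This is an illustrative choice: the statistical procedure actually used by the authors is not visible in this excerpt, and a p-value would still require the t distribution with n - 1 degrees of freedom (e.g., from a statistics package).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(cond_a: Sequence[float], cond_b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-subject values.

    cond_a[i] and cond_b[i] must come from the same subject (e.g., MMN mean
    amplitude in the 1-back and 3-back conditions).
    """
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 subjects")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With twelve subjects, as in this study, the comparison would have 11 degrees of freedom.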
Acknowledgments
This study was supported by the Academy of Finland (Grant 80819), the Hungarian National Science Fund (OTKA T048383), the Centre of International Mobility (CIMO), and Pythagoras Graduate School for Sound and Music Research (Ministry of Education, Finland). We thank Dr. János Horváth for technical assistance.
References (54)
- et al. Crossmodal attention. Curr. Opin. Neurobiol. (1998)
- et al. Competitive brain activity in visual attention. Curr. Opin. Neurobiol. (1997)
- et al. Auditory and visual objects. Cognition (2001)
- et al. Scalp distributions of event-related potentials: an ambiguity associated with analysis of variance models. Electroencephalogr. Clin. Neurophysiol. (1985)
- et al. Early selective-attention effect on evoked potential reinterpreted. Acta Psychol. (1978)
- et al. “Primitive intelligence” in the auditory cortex. Trends Neurosci. (2001)
- et al. Neuromagnetic evidence of an amplitopic organization of the human auditory cortex. Electroencephalogr. Clin. Neurophysiol. (1989)
- et al. Cognitive and linguistic factors affect visual feature integration. Cogn. Psychol. (1984)
- et al. Effects of sequential and temporal probability of deviant occurrence on mismatch negativity. Cogn. Brain Res. (2001)
- Objects and attention: the state of the art. Cognition (2001)