Elsevier

Cognitive Brain Research

Volume 25, Issue 1, September 2005, Pages 169-179

Research Report
Preattentive representation of feature conjunctions for concurrent spatially distributed auditory objects

https://doi.org/10.1016/j.cogbrainres.2005.05.006

Abstract

The role of attention in conjoining features of an object has been a topic of much debate. Studies using the mismatch negativity (MMN), an index of detecting acoustic deviance, suggested that the conjunctions of auditory features are preattentively represented in the brain. These studies, however, used sequentially presented sounds and thus are not directly comparable with visual studies of feature integration. Therefore, the current study presented an array of spatially distributed sounds to determine whether the auditory features of concurrent sounds are correctly conjoined without focal attention directed to the sounds. Two types of sounds differing from each other in timbre and pitch were repeatedly presented together while subjects were engaged in a visual n-back working-memory task and ignored the sounds. Occasional reversals of the frequent pitch–timbre combinations elicited MMNs of a very similar amplitude and latency irrespective of the task load. This result suggested preattentive integration of auditory features. However, performance in a subsequent target-search task with the same stimuli indicated the occurrence of illusory conjunctions. The discrepancy between the results obtained with and without focal attention suggests that illusory conjunctions may occur during voluntary access to the preattentively encoded object representations.

Introduction

Neurophysiological studies have provided evidence that spatially distinct sets of neurons are involved in analyzing different stimulus features (in vision, [2], [18], [35], in audition, [1], [11], [25]). If features are processed by distinct neuronal groups, then they have to be joined in order to form unitary (holistic) object representations. The process of conjoining the features of a perceptual object has been termed “feature integration” or “feature binding”. One long-standing question in the literature is whether focal attention is required for correctly integrating features of an object.

Treisman's feature integration theory [44], [45], [46], a leading theory in this field, holds that (1) sensory features are preattentively extracted in parallel from various objects and mapped separately onto feature maps; (2) in parallel, the spatial location of each extracted feature is encoded onto a master map; and (3) guided by the master map, features belonging to the same object are integrated by an attentional process that moves through a complex scene in a serial manner, processing one location at a time [44]. This theory was largely based on two types of experimental results: (1) “Conjunction cost”: reaction times are longer when subjects search for targets specified by the combination of two features (e.g., red circles) within an array of items than when they search for targets specified by one or the other feature (e.g., red or circle-shaped items); (2) “Illusory feature conjunctions”: when large stimulus arrays are presented only for a short period of time, subjects often report the presence of objects with a combination of two features that only appeared on separate objects within the stimulus array (e.g., perceiving a red O or a green X in an array consisting of red Xs and green Os; [46]). The perception of objects with incorrect feature combinations suggests that feature integration does not occur automatically. Thus, both the conjunction cost and the illusory conjunction phenomena can be regarded as evidence supporting the notion of an attention-requiring feature integration stage of perception that would take place after the analysis of individual stimulus features.

However, the separation of the feature-analysis and feature-binding stages proposed by Treisman's model has been questioned by subsequent studies showing that (1) attention may affect feature analysis as well as the feature-binding process [29]; (2) feature conjunctions can, in some cases, be formed in the absence of focused attention [15], [16]; (3) directing attention to one feature of an object may result in the selection of other features as well [6]; and (4) parallel processing can occur during conjunction search [30], [52].

Although Treisman's feature integration theory as well as the abovementioned alternative views were originally developed on the basis of experiments conducted in the visual modality, the issue of feature binding is similarly relevant for theories of auditory perception. Unfortunately, only a few studies have directly tested feature integration in the auditory modality and their results are controversial. Illusory feature conjunctions were reported in experiments using a target-search task [14] and a matching task [41], [42], as well as during the octave and scale illusion [4]. On the other hand, a conjunction cost was not always found in target-search tasks. Measuring reaction times to conjunction and feature targets and, in parallel, event-related potential (ERP) correlates of selective attention (the Nd wave, a negative difference elicited by attended sounds compared with ignored or unattended ones), Woods and his colleagues [53], [54] showed that a representation of conjoined features can be formed even before the processing of the constituent features is complete.

The mismatch negativity (MMN) ERP component [22] can be used as a tool to investigate the preattentive stage of central auditory processing (for a review, see [26]). MMN is a fronto-centrally negative potential that is elicited by “deviant” sounds violating some regular (termed “standard”) aspect of the preceding sound sequence. Occasional deviations from simple as well as complex and abstract regularities result in MMN elicitation [24]. It has been established that the elicitation of an MMN indicates that the specific regular characteristic of the standard-stimulus sequence, which is violated by the deviant sound, has been encoded in auditory memory [21]. In general, MMN is elicited whether the subject attends to the test sounds or performs a task that is unrelated to the auditory stimulation. Sussman and her colleagues' results [38] confirmed Näätänen's view [20], [22] that MMN elicitation does not require attention, although the elicitation of MMN per se does not imply that all processes leading to the detection of deviants are also attention independent [37]. Therefore, MMN is an attractive tool for studying whether or not auditory feature binding requires attention.

Several studies have reported that MMN is elicited by deviants differing from the standards with regard to the combination of two acoustic features, such as frequency and intensity [12], [39] or frequency and location [36], [40]. For example, Gomes et al. [12] presented a sequence containing three frequent tones (standards: 30% probability each). Each standard tone differed from the other two in both frequency and intensity. Occasional deviant tones (10% probability), whose frequency–intensity combination differed from all of those used in the standards, elicited the MMN. Because both the frequency and the intensity of the deviant tone also appeared in one of the standard tones, the deviant tones did not violate any of the possible single-feature regularities of Gomes et al.'s tone sequences (i.e., the features appearing in the deviants appeared frequently within these sequences). Therefore, the deviant tones could only be detected as deviants if the frequency and intensity features were conjoined for both the standard and deviant tones prior to MMN generation.
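
The logic of this design can be checked with a little arithmetic. The following is a minimal sketch, not Gomes et al.'s actual stimulus set: the feature labels (`f1`..`f3`, `i1`..`i3`) and the choice of a single deviant combination are assumptions made for illustration, and only the 30% and 10% probabilities come from the description above. It shows that every individual feature value remains frequent while the deviant combination itself is rare.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical labels: the actual frequencies and intensities are not given here.
STANDARDS = [("f1", "i1"), ("f2", "i2"), ("f3", "i3")]  # 30% probability each
DEVIANT = ("f1", "i2")  # assumed: one recombination of standard features, 10%


def stimulus_probabilities():
    """Map each (frequency, intensity) combination to its probability."""
    probs = {standard: Fraction(3, 10) for standard in STANDARDS}
    probs[DEVIANT] = Fraction(1, 10)
    return probs


def marginal(probs, index):
    """Probability of each single feature value (0 = frequency, 1 = intensity)."""
    totals = defaultdict(Fraction)
    for stimulus, p in probs.items():
        totals[stimulus[index]] += p
    return dict(totals)


if __name__ == "__main__":
    probs = stimulus_probabilities()
    print("frequency marginals:", marginal(probs, 0))
    print("intensity marginals:", marginal(probs, 1))
    print("deviant conjunction:", probs[DEVIANT])
```

Under these assumptions no single frequency or intensity value occurs with a probability below 30%, so only the 10% conjunction can mark the deviant as rare.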

Because the subjects of Gomes et al.'s as well as of the other cited studies were reading a book or watching a movie and ignoring the auditory stimuli, the elicitation of the MMN by conjunction deviants is compatible with the notion of preattentive feature binding, although it is also possible that subjects divided their attention between the uncontrolled primary task and the to-be-ignored test sounds. However, Winkler et al. [50] found that infrequent stimuli with deviant combinations of auditory (and, separately, visual) features elicited the MMN (vMMN, the visual counterpart of MMN) when subjects performed a difficult within-modality primary task. Furthermore, the difficulty of the primary task did not affect the MMN (vMMN) amplitude.

Thus, findings from the MMN studies challenge the attentive feature integration theory, at least for the auditory modality. However, since the abovementioned MMN studies delivered sounds sequentially, the contrast between the results from previous MMN studies and those regarded as supporting Treisman's feature-integration theory might be explained by the difference between sequential and simultaneous (spatially distributed) stimulation. That is, sequential presentation of the sounds does not allow the emergence of illusory conjunctions, because only the features of one sound emerge from the feature-analysis processes at any given time. Supporting this notion, neuropsychological studies [10], [31], [43] reported a patient with brain damage, who had severe problems in correctly binding the features of multiple simultaneously presented objects, whereas he had no difficulty in binding the features of a single object. These observations suggest that the brain mechanisms conjoining the features of multiple simultaneously presented objects may be partly different from those involved in conjoining the features of a single object.

Hall et al. [14] demonstrated the occurrence of illusory feature conjunctions for multiple, simultaneously presented sounds. These authors presented a target sound (e.g., violin-509 Hz) followed by an array of simultaneously presented sounds of different localization (e.g., violin-262 Hz and trombone-509 Hz). Subjects were required to judge whether one designated feature (timbre or pitch) or the combination of the two features appeared in the array. In line with studies conducted in the visual modality, the results indicated the emergence of illusory feature conjunctions.

Thus, on one hand, previous MMN studies suggested that auditory features are conjoined without focal attention when only one sound is presented at a time. On the other hand, the emergence of illusory conjunctions in Hall et al.'s [14] experiments suggests that, when multiple sounds reach one's ears concurrently, correct integration of the features may require focused attention. To resolve this contradiction, in the present study, we tested whether the features of two concurrently presented sounds are correctly conjoined without focal attention. Similarly to Hall et al. [14], we presented two concurrent, spatially distributed sounds that differed from each other both in timbre (piano and violin) and pitch (e.g., C5 and E5 in the music scale). The combinations of timbre and pitch (e.g., most of the time, piano-C5 and violin-E5) were occasionally reversed (piano-E5 and violin-C5), thus forming conjunction-deviant sound-pairs (parallel condition; Fig. 1a). During the presentation of the test sounds, subjects were engaged in a visual 1-back or 3-back working-memory task and were instructed to ignore the sounds. The elicitation of the MMN response by infrequent conjunction deviants requires that features are conjoined (whether correctly or incorrectly) for the sound-pairs, since no individual feature appears infrequently within the sound sequences. If feature integration does not require attention to the sounds, then the MMN amplitude will be independent of the load of the primary task. If, on the other hand, correct binding of the features of two concurrent sounds requires focused attention, then one of two possible outcomes can be expected: (1) no MMN is elicited or (2) the MMN amplitude will depend on the load of the primary task. This is because less attention can be allocated to the sounds with the high-load than with the low-load primary task.
Therefore, if attention is required to correctly conjoin the features of the test sound-pairs, then, on average, more illusory conjunctions will occur with the high-load than with the low-load primary task. When illusory conjunctions emerge, a standard pair is encoded in memory as the correctly conjoined deviant pair, whereas a deviant pair is encoded as the correctly conjoined standard (because the only difference between the standard and deviant pairs is their combination of pitch and timbre). With an increasing number of test sound-pairs whose features are incorrectly conjoined, the measured MMN amplitude decreases for two reasons. (1) Because there are more standards than deviants in the stimulus sequences, the number of standard stimuli that are (incorrectly) encoded as deviants should be higher than the number of deviants (incorrectly) encoded as standards (assuming equal probability for incorrectly conjoining the features of deviant and standard pairs). Increasing the deviant-stimulus probability, in turn, decreases the MMN amplitude [23], [32]. (2) Because the responses to all deviant sound-pairs are averaged together (see Methods), if fewer of them elicited the MMN (because more were encoded as standards), the measured averaged MMN amplitude decreases. Consequently, comparing the MMN amplitude between the two task-load conditions will indicate whether attention is required for correctly conjoining the features of two concurrently delivered sounds.
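
The two reasons above can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the analysis used in the study: `p_deviant` and `q_misbind` are hypothetical parameters (the presented deviant probability and a misbinding probability taken to be equal for standard and deviant pairs, as the argument assumes), and the example values are chosen only for illustration.

```python
from fractions import Fraction


def encoding_outcomes(p_deviant, q_misbind):
    """Proportions of sound-pairs as encoded in memory when a share
    q_misbind of all pairs has its pitch and timbre incorrectly conjoined.

    Both arguments are hypothetical illustration parameters.
    """
    p_standard = 1 - p_deviant
    standards_as_deviants = p_standard * q_misbind  # misbound standards
    deviants_as_standards = p_deviant * q_misbind   # misbound deviants
    return {
        "standards_as_deviants": standards_as_deviants,
        "deviants_as_standards": deviants_as_standards,
        # Reason (1): share of all pairs that end up encoded as deviants.
        "effective_deviant_probability": (
            p_deviant - deviants_as_standards + standards_as_deviants
        ),
        # Reason (2): share of presented deviants still encoded as deviants,
        # i.e. the share of averaged deviant trials that can elicit the MMN.
        "deviants_encoded_correctly": 1 - q_misbind,
    }


if __name__ == "__main__":
    # Assumed values: 10% deviants, one pair in five misbound.
    for name, value in encoding_outcomes(Fraction(1, 10), Fraction(1, 5)).items():
        print(name, value)
```

With these assumed values, more standards are encoded as deviants (9/50 of all pairs) than deviants as standards (1/50), the effective deviant probability rises from 1/10 to 13/50, and only 4/5 of the presented deviants remain encoded as deviants. Both effects push the averaged MMN amplitude downward.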

For comparison with the previous MMN results, we also presented the same stimuli in a sequential manner (serial condition; Fig. 1b). Finally, an auditory target-search task was conducted using the same sounds (following the EEG recordings). In this task, subjects judged whether a target sound appeared within a subsequently presented array composed of two sounds.

Subjects

Twelve young adults (22–36 years, mean 27.0 years, 8 females) with no history of hearing impairment or neurological disorder participated in the experiment. Two of the subjects had received 14 years of training on a musical instrument (piano or cello) and another two had received 6 years (piano for both), but none of them were involved in musical training at the time of the experiment.

Auditory stimuli and procedures for EEG experiment

Auditory stimuli of piano- and violin-like timbres (AKAI sample library) were created and recorded with a MIDI keyboard

MMN responses

Fig. 2 shows the ERP responses elicited by the standard and deviant stimuli together with the corresponding deviant-minus-standard difference waveforms. In all four conditions, between 100 and 200 ms from stimulus onset, the ERP elicited by the deviant stimuli was negatively displaced over the fronto-central scalp (FCz) compared with the standard-stimulus response. Further, in the same period, the deviant-stimulus response was more positive than the standard-stimulus response over the mastoid

The role of attention in conjoining features

Rare combinations of timbre and pitch elicited significant MMNs when two sounds were presented simultaneously as well as when they were delivered sequentially. Performance level was significantly lower in the 3-back compared with the 1-back primary task. This result suggests that the 3-back task was more demanding and thus required more of the subjects' attention than the 1-back task. Nevertheless, task load did not significantly affect the parameters of the MMN response (amplitude, peak

Acknowledgments

This study was supported by the Academy of Finland (Grant 80819), the Hungarian National Science Fund (OTKA T048383), the Centre of International Mobility (CIMO), and Pythagoras Graduate School for Sound and Music Research (Ministry of Education, Finland). We thank Dr. János Horváth for technical assistance.

References (54)

  • E. Sussman et al.

    Feature conjunctions and auditory sensory memory

    Brain Res.

    (1998)
  • E. Sussman et al.

    Top–down effects can modify the initially stimulus-driven auditory organization

    Brain Res. Cogn. Brain Res.

    (2002)
  • R. Takegata et al.

    Independent processing of changes in auditory single features and feature conjunctions in humans as indexed by the mismatch negativity

    Neurosci. Lett.

    (1999)
  • A.M. Treisman et al.

    A feature-integration theory of attention

    Cogn. Psychol.

    (1980)
  • J.M. Wolfe et al.

    The psychophysical evidence for a binding problem in human vision

    Neuron

    (1999)
  • C. Alain et al.

    “What” and “where” in the human auditory system

    Proc. Natl. Acad. Sci. U. S. A.

    (2001)
  • A. Bartels et al.

    The theory of multistage integration in the visual brain

    Philos. Trans. R. Soc. Lond., Ser. B Biol. Sci.

    (1998)
  • S. Berti et al.

    Working memory controls involuntary attention switching: evidence from an auditory distraction paradigm

    Eur. J. Neurosci.

    (2003)
  • D. Deutsch

    Grouping mechanisms in music

  • J. Duncan et al.

    Restricted attentional capacity within but not between sensory modalities

    Nature

    (1997)
  • M. Eimer et al.

    ERP effects of intermodal attention and cross-modal links in spatial attention

    Psychophysiology

    (1998)
  • C. Escera et al.

    Involuntary attention and distractibility as evaluated with event-related brain potentials

    Audiol. Neuro-Otol.

    (2000)
  • S.R. Friedman-Hill et al.

    Parietal contributions to visual feature binding: evidence from a patient with bilateral lesions

    Science

    (1995)
  • M.H. Giard et al.

    Separate representation of stimulus frequency, intensity, and duration in auditory sensory memory: an event-related potential and dipole-model analysis

    J. Cogn. Neurosci.

    (1995)
  • H. Gomes et al.

    Storage of feature conjunctions in transient auditory memory

    Psychophysiology

    (1997)
  • J.B. Grier

    Nonparametric indexes for sensitivity and bias: computing formulas

    Psychol. Bull.

    (1971)
  • M.D. Hall et al.

    Evidence for auditory feature integration with spatially distributed items

    Percept. Psychophys.

    (2000)