Abstract
Anecdotal reports and empirical observations suggest a preferential processing of personally significant sounds. The utterance of one's own name, the ringing of one's own telephone, and the like appear to be especially effective at capturing attention. However, little is known about the time course and functional neuroanatomy of the voluntary and the involuntary detection of personally significant sounds. To address this issue, we applied an active and a passive listening paradigm, in which male and female human participants were presented with the SMS ringtone of their own mobile phone and the ringtones of others. Enhanced evoked oscillatory activity in the 35–75 Hz band for one's own ringtone shows that the brain distinguishes complex personally significant and nonsignificant sounds, starting as early as 40 ms after sound onset. While in animals it has been reported that the primary auditory cortex accounts for acoustic experience-based memory matching processes, results from the present study suggest that in humans these processes are not confined to sensory processing areas. In particular, we found a coactivation of left auditory areas and left frontal gyri during passive listening. Active listening evoked additional involvement of sensory processing areas in the right hemisphere. This supports the idea that top-down mechanisms affect stimulus representations even at the level of sensory cortices. Furthermore, active detection of sounds additionally activated the superior parietal lobe, supporting the existence of a frontoparietal network of selective attention.
Introduction
The human auditory system contains highly specialized neural pathways that are optimally tuned to categorize biologically relevant stimuli (Belin et al., 2000; Lewis et al., 2005). Initial evidence suggests that this categorization is achieved notably fast, within the first 100 ms after sound onset (Murray et al., 2008). Nevertheless, semantic categorical boundaries for these signals can be based on physical stimulus characteristics, and it is thus difficult to distinguish a categorical effect from an effect due to mere physical differences between the categories. Furthermore, the stimulus categories used in these previous studies, such as vocalizations and tool sounds, are acquired over the life span or even phylogenetically. Thus, they may have prompted specialized pathways during evolution, which potentially provide the basis for their remarkable role in auditory object processing.
Whether temporally limited experience and training with behaviorally relevant sounds also lead to plastic changes in cortical representations has been investigated in several animal studies (Ohl et al., 2001; Weinberger, 2004; Rutkowski and Weinberger, 2005; Blake et al., 2006), but this work remains to be extended to human processing. For that purpose, we measured oscillatory brain activity, which has been suggested to be a correlate of activated stimulus representations (von der Malsburg and Schneider, 1986), focusing on oscillations in the gamma-band range (∼40 Hz), since these are assumed to reflect stimulus-specific processing (Kaiser et al., 2007). To overcome previous limitations when investigating auditory experience-based processing characteristics in humans, we used arbitrary signals, namely personal SMS ringtones, which became associated with emotional and motivational relevance over a relatively short natural training period and which were not characterized by fixed physical stimulus features that could account for different processing characteristics. Since all participants received identical stimulation while only the personal significance of the stimuli changed, responses to physically identical stimuli could be compared over all participants and conditions.
Materials and Methods
Participants.
Twelve volunteers (mean age 23 years, range 18–31; 7 females) participated in the study. They reported normal auditory and normal or corrected-to-normal visual acuity and no neurological, psychiatric, or other medical problems and were treated according to the ethical principles of the World Medical Association (1996; Declaration of Helsinki). They gave informed consent, and received course credit or monetary compensation.
Stimuli, design and procedure.
In an individual approach, we varied the personal significance of an acoustic event. To this end, participants' personal ringtones for incoming text messages were used as stimuli. On an individual level, physical differences between sounds could account for different effects in the analyzed oscillatory activity. Therefore, we applied a so-called yoked design: for every participant, his or her own ringtone served as the personally significant sound and the ringtone of another participant as the nonsignificant sound. This way, each sound served both as a significant and as a nonsignificant sound and, overall, contributed equally to these sound categories. For analysis purposes, participants were randomly yoked into pairs. That is, electrophysiological responses to a participant's own ringtone were compared with responses to a nonsignificant sound, i.e., the ringtone of the paired participant, and vice versa. Due to this yoked design, responses to physically identical stimuli could be compared in the personally significant versus nonsignificant sound condition over all participants. All physical differences should thus cancel out, leaving personal significance as the only variable that accounts for differences in the dependent measures.
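The random yoking described above can be illustrated with a minimal sketch. The function name `yoke_into_pairs`, the participant labels, and the fixed random seed are hypothetical and serve only the illustration; the source does not describe how the randomization was implemented.

```python
import random


def yoke_into_pairs(participant_ids, seed=0):
    """Randomly yoke participants into pairs.

    Returns a dict mapping each participant to the partner whose
    ringtone serves as that participant's nonsignificant sound.
    The seed is an assumption made for reproducibility of the sketch.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("pairing requires an even number of participants")
    rng = random.Random(seed)
    rng.shuffle(ids)
    partner = {}
    # Consecutive entries of the shuffled list form one yoked pair.
    for a, b in zip(ids[0::2], ids[1::2]):
        partner[a] = b
        partner[b] = a
    return partner


# Hypothetical labels for the twelve volunteers.
participants = [f"P{i:02d}" for i in range(1, 13)]
partner = yoke_into_pairs(participants)

# For each participant: (id, significant sound, nonsignificant sound).
design = [(p, f"ringtone_{p}", f"ringtone_{partner[p]}") for p in participants]
```

Because the pairing is symmetric, every ringtone appears exactly once in the significant column and once in the nonsignificant column of `design`, which is the property the yoked design relies on.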
All ringtones had been recorded before the experimental session (HEAD acoustics HMS III.0), were root mean square normalized in intensity, and had an average duration of 799 ms, ranging from 84 to 1689 ms. The participants were comfortably seated in an electrically shielded and sound-attenuated experimental chamber (International Acoustics Company) and were presented via headphones with an auditory sequence of all 12 randomly played ringtones (set constant at ∼70 dB SPL). Hence, every stimulus, including one's own ringtone, occurred with a probability of ∼8%.
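Root mean square normalization, as mentioned above, scales every sound so that all sounds share the same RMS amplitude. The following is a minimal sketch; the target level `target_rms=0.1` and the sample values are arbitrary assumptions, and the source does not state which tool or target level was used.

```python
import math


def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_normalize(samples, target_rms=0.1):
    """Scale a sound so that its RMS equals target_rms (assumed value)."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot normalize a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples]


# Two made-up waveforms with different levels end up at the same RMS.
quiet = rms_normalize([0.01, -0.02, 0.015, -0.005])
loud = rms_normalize([0.5, -0.7, 0.6, -0.4])
```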
During the first condition, participants were instructed to watch a self-chosen muted and subtitled film (passive condition, 150 trials per stimulus). In two additional, separately blocked conditions, they had to listen actively to the auditory stimulation and respond either to their own ringtone (personally significant target/nonsignificant non-target condition, 840 trials, i.e., 70 for each ringtone) or to the ringtone of the paired participant (nonsignificant target/personally significant non-target condition, 840 trials). The stimulus-onset asynchrony was held constant at 1800 ms.
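The trial counts and stimulus probability reported above follow from simple arithmetic, sketched here. The block durations are our own derivation under the assumption of uninterrupted presentation at the stated stimulus-onset asynchrony; they are not reported in the source, and the helper name `block_summary` is hypothetical.

```python
N_RINGTONES = 12                   # ringtones in the sequence
SOA_MS = 1800                      # stimulus-onset asynchrony in ms
PASSIVE_TRIALS_PER_STIMULUS = 150  # passive condition
ACTIVE_TRIALS_PER_BLOCK = 840      # each active block


def block_summary(total_trials, n_stimuli=N_RINGTONES, soa_ms=SOA_MS):
    """Per-stimulus trial count, stimulus probability, and net duration."""
    return {
        "trials_per_stimulus": total_trials / n_stimuli,
        "stimulus_probability": 1 / n_stimuli,
        # Assumes no pauses between trials (not stated in the source).
        "minutes": total_trials * soa_ms / 60000,
    }


passive = block_summary(PASSIVE_TRIALS_PER_STIMULUS * N_RINGTONES)
active = block_summary(ACTIVE_TRIALS_PER_BLOCK)
```

With equiprobable ringtones, the probability of any single stimulus is 1/12, i.e., the ∼8% stated above, and 840 trials correspond to 70 presentations of each ringtone.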
Electrophysiological recordings.
The electroencephalogram (EEG) (Ag/AgCl electrodes; BioSemi) was recorded continuously from 64 standard scalp locations according to the extended 10–20 system with a sampling rate of 512 Hz. Additional electrodes were placed on the tip of the nose, two electrodes at the left and right mastoid positions, as well as two facial bipolar electrode pairs to record electroocular activity (EOG). The vertical EOG was recorded from the right eye by one supratemporal and one infraorbital electrode, and the horizontal EOG from two electrodes at positions lateral to the outer canthi of the two eyes.
Data analysis.
The EEG was segmented into discrete epochs of 1500 ms length, including a 500 ms baseline interval, time-locked to the onset of the stimulus. Epochs were averaged offline, separately for the six different sound types (own versus other's ringtone in passive listening, and own versus other's ringtone in the personally significant target condition and in the nonsignificant target condition); epochs containing signal changes exceeding 100 μV on any channel were excluded from further analyses. On this criterion, one participant was excluded due to excessive artifacts.
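The segmentation, rejection, and averaging steps can be sketched as follows. Two points are assumptions rather than statements from the source: the 500 ms baseline is taken to precede stimulus onset, and "signal changes exceeding 100 μV" is read as the peak-to-peak range within an epoch. All function names are hypothetical.

```python
SFREQ = 512         # sampling rate in Hz
EPOCH_MS = 1500     # total epoch length
BASELINE_MS = 500   # assumed to precede stimulus onset
THRESH_UV = 100.0   # rejection threshold in microvolts


def ms_to_samples(ms, sfreq=SFREQ):
    """Convert a duration in ms to a number of samples."""
    return int(round(ms * sfreq / 1000))


def cut_epoch(data, onset_sample):
    """Cut one epoch from continuous data (list of channels, values in uV).

    Returns None when the epoch would run past the recording edges.
    """
    start = onset_sample - ms_to_samples(BASELINE_MS)
    stop = start + ms_to_samples(EPOCH_MS)
    if start < 0 or stop > len(data[0]):
        return None
    return [channel[start:stop] for channel in data]


def is_clean(epoch, thresh=THRESH_UV):
    """True if no channel's peak-to-peak range exceeds the threshold."""
    return all(max(ch) - min(ch) <= thresh for ch in epoch)


def average_epochs(epochs):
    """Average epochs sample by sample, separately for each channel."""
    if not epochs:
        raise ValueError("no epochs to average")
    n = len(epochs)
    n_channels = len(epochs[0])
    n_samples = len(epochs[0][0])
    return [
        [sum(ep[c][i] for ep in epochs) / n for i in range(n_samples)]
        for c in range(n_channels)
    ]
```

At 512 Hz, a 1500 ms epoch spans 768 samples, of which 256 fall into the baseline interval. In use, one would keep only epochs for which `is_clean` returns true and pass them, grouped by sound type, to `average_epochs`.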
Spectral changes in oscillatory activity were analyzed by means of Morlet wavelets with a width of seven cycles per wavelet (Bertrand and Pantev, 1994). In brief, the method provides a time-varying magnitude of the signal in each frequency band, leading to a time-by-frequency representation of the data (Fig. 1). Effects of the experimental manipulation were further analyzed using individually determined gamma-band peaks (within the gamma-band range of 35–75 Hz, between 40 and 100 ms) by means of repeated-measures ANOVAs. For the passive condition (mean gamma-band peak 46.7 Hz, SD 23.1 Hz), the factors significance (personally significant versus nonsignificant) and electrode (14 selected 10–20 sites: AFz, F3, F4, FCz, T7, C3, C4, T8, Pz, P7, P3, P4, P8, Oz) were used. For the active condition (mean gamma-band peak 56.6 Hz, SD 22.8 Hz), the factor target (target versus non-target) was added. For effects with more than one degree of freedom, the original degrees of freedom are reported along with the corrected probability (Greenhouse-Geisser).
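As an illustration of the wavelet approach, the sketch below builds a complex Morlet wavelet and searches for a gamma-band peak in the 35–75 Hz, 40–100 ms window. It assumes the common convention in which the "width" is the ratio of center frequency to spectral standard deviation (f/σ_f = 7), truncates the wavelet at ±3 temporal standard deviations, and treats the peak as the frequency of maximal magnitude on a 1 Hz grid. These choices, and all function names, are assumptions and not necessarily the procedure used in the study.

```python
import cmath
import math


def morlet_wavelet(freq, sfreq, n_cycles=7.0, n_sd=3.0):
    """Complex Morlet wavelet with freq / sigma_f = n_cycles (assumed)."""
    sigma_f = freq / n_cycles
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)
    half = int(math.ceil(n_sd * sigma_t * sfreq))
    norm = 1.0 / math.sqrt(sigma_t * math.sqrt(math.pi))
    return [
        norm
        * math.exp(-((k / sfreq) ** 2) / (2.0 * sigma_t ** 2))
        * cmath.exp(2j * math.pi * freq * k / sfreq)
        for k in range(-half, half + 1)
    ]


def magnitude_at(signal, wavelet, sample, sfreq):
    """Magnitude of the signal-wavelet convolution at one sample."""
    half = len(wavelet) // 2
    acc = 0j
    for k in range(-half, half + 1):
        idx = sample - k
        if 0 <= idx < len(signal):
            acc += signal[idx] * wavelet[k + half]
    return abs(acc) / sfreq


def find_gamma_peak(evoked, onset, sfreq, band=(35, 75), window_ms=(40, 100)):
    """Frequency (1 Hz grid) and magnitude of the largest evoked response."""
    first = onset + int(round(window_ms[0] * sfreq / 1000))
    last = onset + int(round(window_ms[1] * sfreq / 1000))
    best_mag, best_freq = 0.0, None
    for freq in range(band[0], band[1] + 1):
        wavelet = morlet_wavelet(freq, sfreq)
        for sample in range(first, last + 1):
            mag = magnitude_at(evoked, wavelet, sample, sfreq)
            if mag > best_mag:
                best_mag, best_freq = mag, freq
    return best_freq, best_mag
```

Applying `find_gamma_peak` to an averaged epoch (one channel) yields an evoked measure, because averaging across trials before the transform retains only activity that is phase-locked to stimulus onset.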
Results
The analyzed evoked gamma-band responses (eGBRs) were significantly increased for the personally significant sound compared with the nonsignificant sound of the paired participant in the passive (personally significant 0.38 ± 0.10 μV SEM, nonsignificant 0.14 ± 0.05 μV SEM) as well as in the active listening condition (personally significant 0.23 ± 0.05 μV SEM, nonsignificant 0.04 ± 0.06 μV SEM; ANOVA main effect significance, active condition: F(1,10) = 16.96, p = 0.002; passive condition: F(1,10) = 8.72, p = 0.015). These main effects of significance were complemented by enhanced eGBRs for target sounds compared with non-target sounds in the active target detection task (target 0.20 ± 0.04 μV SEM, non-target 0.06 ± 0.07 μV SEM; ANOVA main effect target: F(1,10) = 6.78, p = 0.026; see line plot in Fig. 1). Statistical analysis revealed no significant interactions including the factors electrode and significance or target suggesting a broad distribution of the effects over the scalp surface.
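The degrees of freedom in F(1,10) reflect the 11 participants who remained after exclusion. For a within-subject factor with two levels, the main-effect F value equals the squared paired t statistic computed on each participant's values averaged over the remaining factors. The sketch below shows this relationship. The helper name is hypothetical, and the numbers are toy values chosen for illustration only, not data from the study.

```python
import math


def paired_f_statistic(cond_a, cond_b):
    """F(1, n - 1) for a two-level within-subject factor (squared paired t)."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t * t, (1, n - 1)


# Toy amplitudes in uV for 11 hypothetical participants (NOT study data).
toy_significant = [0.5, 0.2, 0.9, 0.1, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1, 0.3]
toy_nonsignificant = [0.2, 0.1, 0.3, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2, 0.0, 0.1]
f_value, dof = paired_f_statistic(toy_significant, toy_nonsignificant)
```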
Figure 1. Time-frequency plot of the average stimulus-locked activity. Clear evoked gamma-band responses (40–100 ms; ∼35–75 Hz) were elicited in the passive and the active listening conditions (averaged across all 64 electrodes). The lowest panel shows a line plot that illustrates the differences in eGBRs for the different conditions: passive (P) and active (A), target (T) versus non-target (NT) for the personally significant sound (S) versus nonsignificant sound (NS).
VARETA (variable resolution electromagnetic tomography) was used to infer the underlying sources of all eGBR effects (Bosch-Bayard et al., 2001; Gruber et al., 2006). For the passive condition, the source reconstruction revealed sources in the left superior temporal gyrus (STG) and the left middle frontal gyrus. Importantly, the active condition showed the same sources as the passive condition, plus additional activity in the right STG. These areas were also active during target detection, with additional involvement of the left superior parietal lobe (targets versus non-targets; see Fig. 2).
Figure 2. VARETA source solutions for the eGBR effects. Coronal, axial and sagittal transparent outline views are presented. Statistically significant eGBR differences are depicted in light gray (p < 0.01) for the personally significant (S) versus nonsignificant (NS) sounds during the passive and active listening conditions and target (T) versus non-target (NT) sounds in the active condition (latency range: 40–100 ms).
Discussion
The present study was designed to investigate the earliest memory match and the functional neuroanatomy that underlie the apparently preferential processing of personally significant sounds (Moray, 1959; Formby, 1967; Perrin et al., 1999, 2006; Roye et al., 2007). Extensive research in recent years has supported the idea of the brain as a highly adaptive system that changes its responsiveness depending on past experiences (Ohl et al., 2001; Weinberger, 2004; Dahmen and King, 2007; Feldman, 2009). Our results show that the eGBR is enhanced for one's own ringtone compared with the ringtone of others within the first 100 ms after stimulus onset, despite the large physical variation in stimulation. This enhancement in oscillatory activity for personally significant sounds was localized in the left STG as well as in the left middle frontal gyrus. It seems very likely that this effect mirrors a match of incoming sensory information with long-term memory templates (Herrmann et al., 2004). A positive outcome of that match, especially with elaborated memory templates as for the personally significant sound, results in enhanced eGBRs. This holds for the active target detection task as well as for the passive listening condition, in which participants were watching a muted movie, and thus regardless of whether the auditory stimulation was relevant for the current task. This finding furthermore concurs with the outcome of a previous study in which eGBR effects were reported when a currently perceived tone matched an expected tone in pitch (Widmann et al., 2007). It seems possible that the system is prepared for encountering a personally significant sound by holding its representation preactivated to a certain degree. This preactivation may be implemented on a neuronal level as an established assembly of neurons that oscillate in the observed 40 Hz frequency range, measurable as early as 40 to 100 ms after sound onset.
This interpretation is in further concordance with animal studies suggesting a memory code for behaviorally relevant sounds in primary auditory cortices (Weinberger, 2004). There, it has been shown that neurons' responses in the primary auditory cortex are tuned to the motivational value of a stimulus: the more important the sound, the more specifically tuned cells become involved. However, in extension to the findings from animals, the present activation patterns were not confined to early sensory areas. Our results revealed a higher coactivation in left middle frontal areas evoked by the own ringtone as opposed to other tones. This coactivation might mirror a general mechanism related to the detection of personally significant information. This interpretation is in line with imaging studies which have associated left frontal areas with familiarity detection (Plailly et al., 2007) and self-referential processing (e.g., hearing one's own name; Carmody and Lewis, 2006).
Usually, a left-hemispheric dominance is reported for the processing of auditory speech input at the phonological, lexical, and sentence levels compared with nonlinguistic acoustic stimuli (Indefrey and Cutler, 2004). To speculate, in this very early time window of the reported effect, the processing of these complex and rapidly changing non-speech signals may still be at a stage that runs in parallel with the acoustic analysis of prelexical stimulus features. Because it occurs so early, our differential effect may be interpreted as one component of a memory recognition mechanism that is based on fast processing of local sound properties known to be predominantly analyzed in the left hemisphere (Wetzel et al., 2008). This is also in line with the assumption that speech and non-speech sounds partly share structural resources of the anteriorly directed processing stream in the left hemisphere (Giraud and Price, 2001; Scott and Wise, 2004).
Given that the early coactivation of the left STG and left frontal areas was found during passive listening, we propose that it primarily reflects a feedforward mechanism within the earliest stages of the auditory what-processing pathway (Alain et al., 2001).
Furthermore, we identified an activation pattern within a cortical “gamma network,” which spatially expanded with increasing top-down demands. When participants were instructed to attend to the auditory sequence and to detect a target sound, the right STG was involved in addition to the regions described above. In particular, eGBR amplitudes in the right STG were enhanced for one's own ringtone relative to the ringtone of others. Thus, the earliest processing of personally significant sounds is not solely a feedforward mechanism based on a left-hemispheric network isolated from top-down influences. Since the acoustic input was the same in the active and passive listening conditions, this shows that stimulus features themselves are insufficient to explain hemispheric differences in a bottom-up manner. This result further supports the notion that hemispheric specializations, especially within auditory cortices, are at least partially influenced by task demands and that top-down involvement leads to additional right-hemisphere activation in memory-dependent representational processes (Brechmann and Scheich, 2005; Scheich et al., 2007; König et al., 2008).
Additionally, regardless of the significance status of the currently occurring sound, the eGBR in the left superior parietal lobe was enhanced for targets as opposed to non-targets during the active listening task. The superior parietal lobe is regarded as part of a frontoparietal network that seems to play a central role in goal-directed selection of stimuli as well as in motor preparation (Corbetta and Shulman, 2002; Bidet-Caulet and Bertrand, 2005; Salmi et al., 2009). When a stimulus matches a sound template currently held in short-term memory, i.e., the target sound, it may activate a network of distinct cortical areas that gives rise to a fast allocation of attention and triggers the associated motor response.
In sum, this study provides evidence that incoming acoustic information is matched with existing memory templates representing the specific, personally significant, and behaviorally relevant stimulus. This matching mechanism starts with sound onset and involves experience-based neural connections that extend beyond sensory processing areas. Importantly, voluntary attention is not a prerequisite for this mechanism. Nonetheless, an increased involvement of attention-related top-down demands leads to the activation of a more widespread neural network and increased synchronous neural activity.
Footnotes
- A.R.'s work was supported with a Ph.D. fellowship from the Evangelisches Studienwerk e.V. Villigst. E.S.'s work was supported by a Deutsche Forschungsgemeinschaft-Reinhart-Koselleck project. We thank the Research Academy Leipzig for financial support to publish the study.
- Correspondence should be addressed to Anja Roye, BioCog-Cognitive and Biological Psychology, Institute of Psychology I, University of Leipzig, Seeburgstraße 14-20, D-04103 Leipzig, Germany. anja.roye{at}uni-leipzig.de