Abstract
Emotionally significant objects and events in our environment attract attention based on their motivational relevance for survival. This kind of emotional attention is thought to lead to affect-specific amplified processing that closely resembles the effects of directed attention. Although there has been extensive research on prioritized processing of visual emotional stimuli, the spatio-temporal dynamics of motivated attention mechanisms in auditory processing are less clearly understood. We investigated modulatory effects of emotional attention at early auditory processing stages using time-sensitive whole-head magnetoencephalography. A novel associative learning procedure involving multiple conditioned stimuli (CSs) per affective category was introduced to specifically test whether affect-specific modulation can proceed in a rapid and highly differentiating fashion in humans. Auditory evoked fields (AEFs) were recorded in response to 42 different ultrashort, click-like sounds before and after affective conditioning with pleasant, unpleasant, or neutral auditory scenes. As hypothesized, emotional attention affected neural click tone processing in the time intervals of the P20-50m (20–50 ms) and the N1m (100–130 ms), two early AEF components sensitive to directed selective attention (Woldorff et al., 1993). Distributed source localization revealed amplified processing of tones associated with aversive or pleasant compared with neutral auditory scenes in auditory sensory, frontal, and parietal cortex regions. Behavioral tests did not indicate any awareness of the contingent CS–UCS (unconditioned stimulus) relationships in the participants, suggesting affective associative learning in the absence of contingency awareness. Our findings imply early and highly differentiating affect-specific modulation of auditory stimulus processing supported by neural mechanisms and circuitry comparable with those reported for directed auditory attention.
Introduction
Emotionally significant stimuli receive prioritized perceptual processing. This survival-promoting mechanism has been attributed to the automatic engagement of selective attention by emotionally salient objects or events in our environment (Vuilleumier, 2005). Just as directed attention—driven by current goals, task relevance, or inherent physical stimulus salience—prioritizes behaviorally relevant stimuli in the competition for limited processing resources by means of sensory gain control (Hillyard and Anllo-Vento, 1998), motivated attention leads to affect-specific amplified processing of stimuli with intrinsic significance for basic motive systems (Lang et al., 1998). Hemodynamic neuroimaging studies have revealed increased activation within a distributed neural network of amygdala, prefrontal, parietal, and modality-specific sensory cortex regions for emotionally arousing auditory, visual, and olfactory stimuli (Royet et al., 2000; Sander and Scheich, 2001; Bradley et al., 2003; Sabatinelli et al., 2005). The high temporal resolution offered by electroencephalography and magnetoencephalography (EEG/MEG) has been used to investigate the temporal dynamics of motivated attention in the visual system. Event-related potentials/magnetic fields (ERPs/ERMFs) measured in response to various types of affective pictorial stimuli were found to be modulated by motivated attention at distinct processing stages (Junghöfer et al., 2001; Schupp et al., 2004; Stolarova et al., 2006; Keil et al., 2007; Kissler et al., 2007), converging with effects of directed attention (Anllo-Vento et al., 1998; Hillyard and Anllo-Vento, 1998; Martínez et al., 2001; Moratti et al., 2004; Ferrari et al., 2008).
In contrast to the rather extensive research on the temporal dynamics of human visual affective processing, no corresponding auditory studies have been reported to date. This is mainly attributable to the dynamic nature of affective sounds, such as crying or laughing, which results in highly complex spatio-temporal convolutions of evoked neurophysiological signals (Bradley and Lang, 2000). Addressing this methodological constraint, we here used multiple ultrashort tones that reveal their identity almost instantaneously—as in vision—and assigned emotional meaning to them by means of associative learning. More specifically, we compared auditory evoked fields (AEFs), acquired with a 275-sensor whole-head MEG system, in response to 42 different click-like tones [conditioned stimuli (CSs)] before and after contingent pairing with 42 different emotionally arousing pleasant, unpleasant, or neutral auditory scenes [unconditioned stimuli (UCSs)].
We hypothesized that early auditory processing would be modulated by the learned emotional meaning of multiple CSs and that the spatio-temporal characteristics of these motivated attention effects would converge with AEF modulations reported for directed auditory attention. In particular, we expected a modulation of the N1m (100–130 ms), the first major auditory sensory component, and of the even earlier P20-50m (20–50 ms) component, both sensitive to directed attention (Hillyard et al., 1973; Woldorff et al., 1993; Ozaki et al., 2004; Fritz et al., 2007; Poghosyan and Ioannides, 2008). Tones associated with appetitive and aversive as opposed to neutral scenes were expected to evoke enhanced activity in a distributed fronto-temporo-parietal network (Knight et al., 1999; Corbetta and Shulman, 2002; Fritz et al., 2007). Because of the large number of CS–UCS associations to be acquired within a few learning instances, neurophysiological effects were expected to occur in the absence of contingency awareness.
Materials and Methods
Participants
Twenty-four healthy, right-handed volunteers (12 females; mean age, 24.4 years; SD, 2.47 years) with normal hearing, as assessed by individual hearing threshold determination, participated in the study. All participants gave informed consent to the protocol, which was approved by the ethics committee of the Medical Faculty, University of Muenster, in accordance with the Declaration of Helsinki. Participants showed unremarkable levels of trait anxiety and depression as determined by administration of the State–Trait Anxiety Inventory (Laux et al., 1981) and Beck's Depression Inventory (Beck et al., 1961).
Stimulus materials and experimental procedure
In affective neuroscience research, the time-critical investigation of auditory stimulus processing with ERP or ERMF measures is complicated by the fact that the emotional meaning of an acoustic stimulus, such as sadness in crying or happiness in laughing, typically emerges only over the course of at least a few hundred milliseconds (cf. Bradley and Lang, 2000). The dynamic nature of emotional sounds leads to temporal interactions between and convolutions of different neurophysiological responses, explaining the complete lack of studies on the temporal dynamics of acoustic emotion processing in humans. One way to address this issue is to use short, quasistatic stimuli that convey their emotional meaning almost instantaneously, as in vision, and to assign differential affective significance to these tones by means of classical conditioning. In this form of associative learning, after the CS and UCS have been paired in a contingent manner, a formerly neutral stimulus (CS) comes to elicit a conditioned response (CR) previously associated with the presentation of a UCS alone. Although studies in both animal and human research investigating differential affective conditioning typically use one conditioned stimulus per affective category (Quirk et al., 1995; Dolan et al., 2006; Stolarova et al., 2006; Keil et al., 2007), we here used a multitude of different CSs, conditioned with positive, negative, or neutral UCSs. We used this novel MultiCS-conditioning paradigm (cf. Steinberg et al., 2011) for several reasons: (1) to show that the neural system underlying emotion processing is rapid and highly resolving in terms of its capacity to differentiate a multitude of emotional stimuli; (2) to investigate motivated attention effects on early cortical processing stages in the absence of participants' explicit knowledge about CS–UCS contingencies; a pilot study with a different set of subjects (N = 8) had shown that participants were already unable to recall three CS–UCS pairs per emotion condition (data not shown), so the participants of the present study were expected to be completely unaware of the stimulus associations for even larger numbers of pairs; and (3) to allow for a sufficiently high number of trials within each condition, ensuring a good signal-to-noise ratio for MEG data analysis, while at the same time each individual stimulus is repeated only a few times, reducing extinction of the acquired emotional meaning through repeated nonreinforced CS presentations (Rogan et al., 1997).
Conditioned stimuli.
We recorded a set of 60 natural sounds with a click-like character generated by different hard materials (metal, wood, glass, stone, etc.) bouncing against each other. The tones were trimmed to a length of 20 ms after CS onset, which was defined as the earliest point at which the amplitude of the signal equaled 10% of the maximum amplitude difference of the overall signal. These ultrashort and spectrally complex natural sounds (1) do not require accrual of information over a significant time span in order for their identity to be revealed, (2) do not systematically carry physical features associated with emotional relevance, and (3) do not inherently differ in evoked emotional arousal and perceived hedonic valence. Thus, these CSs appeared ideally suited to investigate early (P20-50m; N1m) emotional processing. We term this processing "highly resolving" because the semantic/affective differentiation has to be based on differences in highly complex frequency characteristics and a multitude of subjectively rather similar sounds have to be differentiated. The stimuli were normalized with regard to loudness by applying the Group Waveform Normalization algorithm of Adobe Audition (Adobe), which uniformly matches loudness based on root-mean-square (RMS) levels. All stimuli were characterized by a very short rise time. Figure 1 shows the time course as well as the Fourier power spectrum of two exemplary CSs and illustrates the very distinct physical properties of the tones despite their shortness. In a pilot study with a different group of subjects than in the study at hand, the 42 most distinct sounds were selected from this set of 60 by means of a multidimensional scaling (MDS) procedure to serve as conditioned stimuli in the associative learning paradigm.
MDS required subjects to rate the similarity of CS pairs and was applied to eliminate maximally similar pairs from the CS set, while preserving an overall homogeneity of the perceived stimulus similarity in the remaining set.
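For illustration, the onset criterion and the RMS loudness matching described above can be sketched as follows (a minimal Python sketch; the study used Adobe Audition's Group Waveform Normalization, and the target RMS level here is an arbitrary illustrative value):

```python
import numpy as np

def cs_onset(signal):
    """First sample at which the absolute amplitude reaches 10% of the
    maximum amplitude difference of the overall signal (trimming criterion)."""
    thresh = 0.1 * (np.max(signal) - np.min(signal))
    return int(np.argmax(np.abs(signal) >= thresh))

def rms_normalize(signal, target_rms=0.1):
    """Scale a signal so that its root-mean-square level matches a common
    target, uniformly matching loudness across the CS set."""
    rms = np.sqrt(np.mean(signal ** 2))
    return signal * (target_rms / rms)
```

Applying `rms_normalize` with the same target to all 60 recordings equates their RMS levels while leaving their spectral composition untouched.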
Amplitude time series (left) and amplitude power spectra (right) for two exemplary sounds (A, B) used as conditioned stimuli. The figure illustrates the very distinct physical properties of the CSs despite their shortness.
Unconditioned stimuli.
As unconditioned stimuli, 14 high-arousing unpleasant (UCSneg), 14 high-arousing pleasant (UCSpos), and 14 low-arousing neutral (UCSneut) stimuli were selected from the International Affective Digitized Sounds system (IADS) (Bradley and Lang, 2000). The selection was based on normative ratings of hedonic valence (means: UCSneg = 2.3 < UCSneut = 4.9 < UCSpos = 7.0) and arousal (means: UCSneg = 7.2 > UCSneut = 4.3 < UCSpos = 7.0) on nine-point affective Self-Assessment Manikin rating scales (SAM rating) (Lang, 1980). All UCS sounds were trimmed to a length of 6 s. Participants in the present study evaluated the perceived degree of hedonic valence and emotional arousal of the auditory scenes used as UCSs using the same SAM rating scales (Fig. 2Bi) as used by Lang (1980) and Bradley and Lang (2000) in the normative study. Results of this rating corresponded to the normative data (valence means, UCSneg = 2.6 < UCSneut = 4.8 < UCSpos = 6.3; arousal means, UCSneg = 6.4 > UCSneut = 3.6 < UCSpos = 5.9).
Experimental setup. A, MultiCS-conditioning paradigm in MEG: The preconditioning and postconditioning runs, during which the magnetoencephalogram was recorded, consisted of five pseudorandomized isolated presentations of all 42 CSs (i, iii). Each CS had a duration of 20 ms. The click-like tones were presented with an ITI jittered between 1500 and 2500 ms. ii, During conditioning, each CS–UCS pair was presented three times. In a single associative learning trial, each CS was presented four times, once before UCS onset and three times during UCS presentation. The ITI was jittered between 1500 and 2500 ms. B, Behavioral tasks. i, Nine-point SAM rating scales for UCS valence and arousal rating and CS valence rating. ii, In the affective category recall task, subjects had to choose the affective category with which the presented CS had become associated during conditioning. iii, In the CS–UCS pair recognition task, pairings of the same CS with six different UCSs (two from each affective category) were presented.
Experimental procedure.
The affective associative learning procedure in the MEG comprised one preconditioning MEG measurement, two interspersed conditioning sessions, and one postconditioning MEG measurement (Fig. 2A), as well as three additional behavioral tasks (Fig. 2B) administered partly before and after MEG data acquisition.
For the MEG recordings, we chose a pretreatment–posttreatment within-subject design, in which the measurement before conditioning served as a baseline to correct for possible preexisting differences in click tone processing between the three conditions. This served to assure that differential neural CS responses could be interpreted exclusively as attributable to affective conditioning. During this phase, subjects listened to a total of five blocks with random presentation of all 42 neutral click-like sounds. To account for disturbing effects of ambience and stimulus novelty, stimulus repetition, and mere exposure, the first two blocks were used for habituation and stabilization. The three following presentations were used as preconditioning recordings to assess the emotion effect. The CS presentation was pseudorandomized so that no more than two stimuli of the same to-be-assigned affective category were presented in a row. The intertrial interval (ITI) was jittered between 500 and 1500 ms (Fig. 2Ai). During conditioning, each of the 42 neutral CSs was uniquely paired with one of the 42 IADS sounds (Fig. 2Aii) (the three CS categories resulting from affective associative learning will be referred to as CSpos, CSneg, and CSneut in the following). The assignment of CS and UCS was completely randomized across subjects (i.e., no two subjects received the same sets of CS–UCSpos, CS–UCSneg, and CS–UCSneut pairs). The pairing scheme for the affective associative learning was a combination of delay and trace conditioning: In a single associative learning trial, a conditioned stimulus was presented once, either 500 or 450 ms before UCS onset, and three times at random onsets during the 6 s UCS presentation. For each of the 42 unique CS–UCS pairs, three such learning trials were presented in random order so that the CS would become associated with the UCS occurrence.
The pairing of a specific CS and UCS was 100% contingent (i.e., CS–UCS pairs did not vary across repetitions). The postconditioning measurement (Fig. 2Aiii) was identical to the preconditioning session, comprising five blocks of differently pseudorandomized presentations of all 42 CSs. During all phases of the experiment, subjects were instructed to listen passively to the presented sounds and, to prevent eye movements that would disturb the MEG signal, to fixate a small cross presented at the center of a screen in front of them.
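The per-subject randomized one-to-one assignment of CSs to UCSs can be sketched as follows (an illustrative sketch with hypothetical stimulus identifiers, not the original stimulation script; each seed stands for one subject):

```python
import random

def assign_cs_ucs(cs_ids, ucs_pos, ucs_neg, ucs_neut, seed=None):
    """Randomly pair each CS uniquely with one UCS (14 per affective
    category), so that no two subjects (seeds) receive the same mapping."""
    rng = random.Random(seed)
    cs = list(cs_ids)
    rng.shuffle(cs)                    # per-subject random order of the CSs
    ucs = list(ucs_pos) + list(ucs_neg) + list(ucs_neut)
    return dict(zip(cs, ucs))          # 100% contingent, fixed across repetitions
```

Because the mapping is fixed once drawn, every repetition of a CS during conditioning reinstates the same CS–UCS pair.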
Three behavioral tasks were administered outside the MEG scanner as follows.
(1) To assess effects of emotional learning on subjective CS perception, subjects rated the valence of all 42 click-like sounds on a nine-point affective SAM rating scale in advance and subsequent to the MEG measurement (Fig. 2Bi, CS rating task).
(2) In an affective category recall task, we tested participants' awareness of the predictive CS–UCS relationship: All 42 CSs were randomly presented one by one, and participants were instructed to indicate for each CS with which one of three possible affective categories (positive, negative, or neutral) the formerly neutral click tone had become associated during conditioning (Fig. 2Bii). We evaluated the capability to detect the correct hedonic valence or the correct emotional arousal category by calculating the sensitivity measure d′ (Green and Swets, 1966) and testing it against 0 with a one-sample t test.
(3) Finally, we tested subjects' ability to explicitly recognize specific CS–UCS pairs (Fig. 2Biii, CS–UCS pair recognition task). Along with one specific CS, six different UCSs—two from each affective category, chosen from the overall 42 UCS sounds—were sequentially and randomly presented as CS–UCS pairs in the same fashion as during conditioning. For 15 of the 20 tested CSs, which were randomly chosen from the overall set of 42, one of the six presented CS–UCS combinations had been presented during conditioning. For the five remaining CSs, all presented pairings were new and not learned before, although the CSs and UCSs themselves had occurred during conditioning. At the end of each sequence, subjects were asked either to indicate that all trials in the sequence presented new pairings or to identify the familiar pair. A d′ sensitivity measure was calculated for recognizing a pair belonging to the same valence and for a pair belonging to the same arousal category as the correct one and tested against 0 with a one-sample t test for statistical evaluation of subjects' performance.
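The sensitivity measure used in both awareness tests can be computed as shown below (a standard d′ sketch from hit and false-alarm rates; the correction needed for extreme rates of 0 or 1 is omitted here):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)
    (Green and Swets, 1966); chance performance yields d' = 0."""
    z = NormalDist().inv_cdf   # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)
```

Testing the resulting per-subject d′ values against 0 with a one-sample t test then asks whether performance exceeds chance at the group level.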
Stimulation was delivered by means of Presentation software (version 12.1; Neurobehavioral Systems).
ERMF recordings and analysis
During MEG measurement, subjects were seated in a magnetically shielded and sound-attenuated room. Head coordinates were determined with three landmark coils fixed to the auditory canals and the nasion to match MEG data with anatomical information from MR scans. Air-conducted sounds were delivered at 60 dB above individual hearing threshold through silicon tubes and individually fitted silicon earpieces. Subjects were instructed to avoid head or eye movements and to fixate a central cross that was projected onto a screen in front of them. MEG data were acquired with a 275-sensor whole-head MEG system (Omega 275; CTF Systems; VSM MedTech) equipped with first-order axial SQUID gradiometers. The magnetoencephalogram was recorded continuously at a sampling rate of 1200 Hz and filtered on-line with a hardware low-pass filter of 300 Hz. For preprocessing and statistical analysis of MEG data, the Matlab-based (The MathWorks) EMEGS software (Junghöfer and Peyk, 2004) (freely available at http://www.emegs.org/) was used. Off-line, data were down-sampled to 600 Hz and filtered with a 0.2–48 Hz bandpass filter. The continuously recorded signal was segmented into averaging epochs ranging from −200 to +600 ms relative to onset of the conditioned stimulus. The prestimulus baseline interval ranged from 150 ms before stimulus onset until stimulus onset. For single-trial data editing and artifact rejection, a method for statistical control of artifacts in dense-array MEG studies was applied (Junghöfer et al., 2000). After data editing and artifact rejection, separate averages were calculated for the three resulting affective conditions (CSpos, CSneg, and CSneut) in the preconditioning and postconditioning runs for each sensor and participant. For the preconditioning measurement, data were averaged across the three CS blocks after habituation.
Analogously, for the postconditioning measurement, only the first three CS repetition blocks were considered, further restricting the impact of rapid neural extinction processes (Rogan et al., 1997).
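The epoching and baseline correction described above can be sketched as follows (a minimal numpy sketch, assuming a continuous recording array and CS-onset sample indices; the actual analysis used the EMEGS pipeline):

```python
import numpy as np

def epoch_and_baseline(raw, onsets, sfreq=600.0):
    """Cut epochs from -200 to +600 ms around each CS onset and subtract
    the mean of the -150 ms to 0 ms prestimulus baseline interval.

    raw    : (n_sensors, n_samples) continuous recording
    onsets : iterable of CS-onset sample indices
    """
    n0, n1, nb = int(-0.2 * sfreq), int(0.6 * sfreq), int(-0.15 * sfreq)
    epochs = []
    for ev in onsets:
        ep = raw[:, ev + n0:ev + n1].astype(float)
        # baseline: -150 ms up to stimulus onset (onset sits at index -n0)
        baseline = ep[:, nb - n0:-n0].mean(axis=1, keepdims=True)
        epochs.append(ep - baseline)
    return np.stack(epochs)            # (n_trials, n_sensors, n_times)
```

Averaging the resulting epochs within each condition and session then yields the per-subject AEFs entering the statistics.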
Inverse modeling.
After averaging, the cortical sources of the event-related magnetic fields were estimated using a distributed source model, suitable for detecting the distributed network activity covered by modern whole-head MEG scanners, as well as a more traditional dual source model, analogous to the approach used in previous MEG studies on directed attention (Woldorff et al., 1993) with limited MEG head coverage.
As the distributed inverse source modeling method, the L2-minimum-norm pseudoinverse (L2-MNP) was applied. This technique allows the estimation of distributed neural network activity without a priori assumptions regarding the location and/or number of current sources (Hämäläinen and Ilmoniemi, 1994). In addition, from all possible generator sources, only those exclusively determined by the measured magnetic fields are considered by the method (Hauk, 2004). A spherical shell with 360 evenly distributed positions, each carrying two tangential dipoles (azimuthal and polar direction; radial dipoles do not generate magnetic fields outside a sphere), was used as the source model. A source shell radius of 87% of the individually fitted head radius was chosen, approximately corresponding to the gray matter volume. Across all participants and conditions, a Tikhonov regularization parameter k of 0.1 was applied. Topographies of source-direction-independent neural activities (i.e., the vector length of the estimated source activities at each position) were calculated for each individual subject, condition, and time point based on the averaged magnetic field distributions and the individual sensor positions for each subject and run.
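The core of a Tikhonov-regularized minimum-norm estimate can be sketched as follows (a minimal numpy sketch; the scaling of the regularization parameter relative to the leadfield is one common convention and an assumption here, not necessarily the EMEGS implementation):

```python
import numpy as np

def l2_minimum_norm(leadfield, field, k=0.1):
    """L2-minimum-norm pseudoinverse with Tikhonov regularization:
    j = L^T (L L^T + lambda I)^-1 b, with lambda scaled by k^2 relative
    to the mean sensor-space power of the leadfield.

    leadfield : (n_sensors, n_sources) forward model L
    field     : (n_sensors,) measured magnetic field b
    """
    gram = leadfield @ leadfield.T
    lam = (k ** 2) * np.trace(gram) / gram.shape[0]
    inv = np.linalg.inv(gram + lam * np.eye(gram.shape[0]))
    return leadfield.T @ inv @ field   # minimum-norm source estimate j
```

Taking the vector length of the two tangential components estimated at each shell position yields the source-direction-independent activity used in the topographies.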
In addition to the L2-MNP distributed source modeling approach, we applied conventional single equivalent current dipole reconstruction of the N1m component to make results more comparable with previous findings of directed attention effects (Woldorff et al., 1993) and to further strengthen the distributed source modeling results. The individual N1m peak response was identified as the maximal RMS value between 70 and 130 ms after CS onset. As the fit interval for the source estimation, the rising slope from 20 ms before until the N1m peak was chosen. Source locations and orientations (one dipole for each hemisphere) were estimated individually for each subject and each run (preconditioning/postconditioning). The estimated sources were fixed in location and orientation and served as a spatial filter during the calculation of the source strength for each affective condition (Okamoto et al., 2011). We calculated a repeated-measures ANOVA on the source strengths for each condition, using the factors session, valence, and hemisphere in the N1m time interval. A "fixed dipole orientation" may constrain estimates to a nonoptimal solution, potentially biasing one condition relative to the others. To exclude such a potential bias, an additional "rotating dipole" model was also tested.
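Using the fitted dipoles as a spatial filter amounts to a least-squares projection of the measured fields onto each dipole's fixed field pattern, sketched here under the simplifying assumption of a single dipole with a known gain vector:

```python
import numpy as np

def fixed_dipole_strength(gain, field):
    """Least-squares source amplitude time course q(t) for a dipole with
    fixed location and orientation: minimizes ||field(:, t) - gain * q(t)||.

    gain  : (n_sensors,) forward field pattern of the fixed dipole
    field : (n_sensors, n_times) measured data
    """
    return gain @ field / (gain @ gain)
```

Because location and orientation stay fixed across conditions, any difference in the resulting source strengths reflects the data, not the fit.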
A priori definition of time intervals of interest and statistical analysis of the emotion effect.
The effect of selective directed attention is usually characterized by the difference between stimulus-evoked responses under attended and unattended conditions. Electrophysiological studies on auditory stimulus processing report attention effects during the N1 time window between 70 and 130 ms after stimulus onset (Hillyard et al., 1973; Woods et al., 1991; Woldorff et al., 1993; Ozaki et al., 2004) and at even earlier cortical processing stages during the P20-50 time interval (Woldorff et al., 1993; Poghosyan and Ioannides, 2008). In the visual domain, effects of directed and motivated attention are comparable regarding latency and topography (Moratti et al., 2004; Ferrari et al., 2008). We expected a similar correspondence between motivated and directed attention effects in the auditory system. Consequently, we a priori defined the N1m—the first component known to be modulated by directed attention to frequency-based stimulus characteristics—as a time window of interest for our analysis. Although to date the P20-50m component has been shown to be modulated only by directed spatial attention, we wanted to test whether subtle differences in frequency patterns signaling the emotional relevance of the stimuli might modulate auditory processing even at this early stage. Thus, the P20-50m complex was defined as the second a priori time window of interest.
Statistically, the predicted emotion effects were reflected by the two-way session (preconditioning vs postconditioning) by valence (CSpos, CSneut, CSneg) interaction. For the analysis of ERMF data and estimated underlying neural activity (L2-MNP), we focused on this interaction within the a priori defined N1m and P20-50m time intervals of interest. To this end, we first calculated a repeated-measures ANOVA including the two experimental factors (session by valence) at all time points and all sensors or estimated neural generators, respectively. This single sensor/source waveform analysis served to identify sensor/source regions within the time windows of interest. To avoid false-positive decisions during the selection process, only significant effects (p < 0.05) in regions consisting of at least eight neighboring sensors/sources and within time intervals comprising at least 15 consecutive time points (25 ms) were considered meaningful (Schupp et al., 2003, 2007). In a second step, we performed conventional two-way repeated-measures ANOVAs (session by valence) for each sensor/source region and both time intervals. In addition, quadratic trend analyses contrasting both the CSpos and the CSneg condition with CSneut were calculated, since we expected that the high-arousing pleasant and unpleasant CS conditions would show similar relative differences compared with the neutral CS category. The Greenhouse–Geisser procedure was used to correct for violations of sphericity.
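Two elements of this analysis can be made concrete with small sketches: the temporal-extent criterion (at least 15 consecutive significant time points, i.e., 25 ms at 600 Hz) and the quadratic trend contrast over the ordered valence levels (weights +1, −2, +1 for CSpos, CSneut, CSneg):

```python
def longest_significant_run(pvals, alpha=0.05):
    """Length of the longest run of consecutive significant time points;
    the selection criterion required a run of at least 15 (25 ms at 600 Hz)."""
    best = cur = 0
    for p in pvals:
        cur = cur + 1 if p < alpha else 0
        best = max(best, cur)
    return best

def quadratic_contrast(cs_pos, cs_neut, cs_neg):
    """Quadratic trend contrast (+1, -2, +1): positive values indicate that
    both emotional conditions exceed the neutral one."""
    return cs_pos - 2.0 * cs_neut + cs_neg
```

These are illustrative reimplementations of the stated criteria, not the original EMEGS routines.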
For the graphical display of the relevant session by valence interaction topographies in sensor and source space (see Figs. 3–5), the preconditioning run was subtracted from the postconditioning run for each of the three CS conditions separately (Δpost-pre CS). The difference wave for neutral CS was then subtracted from the respective differences for the affectively conditioned stimuli (CSpos, CSneg), yielding separate difference waveforms for pleasant (Δpost-pre CSpos − Δpost-pre CSneut) and unpleasant valence (Δpost-pre CSneg − Δpost-pre CSneut).
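The subtraction logic behind these difference topographies can be sketched as follows (illustrative numpy code on dummy evoked responses, where the "emotional" conditions carry a built-in post-pre shift of 0.5):

```python
import numpy as np

# Dummy evoked responses: condition -> (n_sensors, n_times) array
rng = np.random.default_rng(0)
pre = {c: rng.normal(size=(4, 10)) for c in ("pos", "neg", "neut")}
post = {c: pre[c] + (0.5 if c != "neut" else 0.0) for c in pre}

# Session difference per condition (postconditioning minus preconditioning)
delta = {c: post[c] - pre[c] for c in pre}

# Emotion difference waves relative to neutral
diff_pos = delta["pos"] - delta["neut"]   # corresponds to CSpos minus CSneut
diff_neg = delta["neg"] - delta["neut"]   # corresponds to CSneg minus CSneut
```

Subtracting the neutral difference wave removes unspecific session effects (habituation, repetition), isolating the conditioning-induced change.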
Results
MEG data
Affective modulation of neural processing in the N1m time window
Evoked magnetic fields (sensor space).
Between 100 and 130 ms after CS onset, session by valence interactions were significant within two widespread posterior sensor clusters over the left and right hemisphere (Fig. 3A) (left: F(2,46) = 5.25, p = 0.017; right: F(2,46) = 10.14, p < 0.001). Statistical analysis of quadratic trends revealed identical relative difference polarities for processing of both positive and negative compared with neutral CS (left: F(2,46) = 10.19, p = 0.004; right: F(2,46) = 18.5, p < 0.001). Post hoc comparisons indicated that effects for the CSpos condition were comparably strong in both hemispheres (left: F(1,23) = 17.9, p < 0.001; right: F(1,23) = 10.97, p = 0.003), whereas differential processing of CSneg appeared stronger in the right hemisphere (left: F(1,23) = 4.35, p = 0.048; right: F(1,23) = 16.86, p < 0.001). Stronger right lateralization for negative than for positive stimuli was supported by a three-way session by valence by hemisphere interaction (F(2,46) = 8.48; p = 0.001). There were no main effects of session at the right (F(1,23) = 1.63; p = 0.21) or the left (F(1,23) = 0.54; p = 0.47) sensor cluster.
The N1m emotion effect (time interval 100–130 ms after stimulus onset) for event-related magnetic fields (A) as measured in the present study using emotionally relevant conditioned tones. The grand-averaged magnetic field data (black box) and difference plots illustrating the emotion effect separately for positive (Δpost-pre CSpos − Δpost-pre CSneut; blue box) and negative valence (Δpost-pre CSneg − Δpost-pre CSneut; red box) are shown on the left side of the panel. The bar plots on the right side show the post–pre differences for the three valence conditions in each selected sensor group (*p < 0.05; **p < 0.01). B displays corresponding N1m effects of directed attention from Woldorff et al. (1993), who compared AEFs to sounds presented to one ear when spatial attention was directed toward (attended; left) or away (unattended; center) from this ear. The attentional-difference wave (i.e., the subtracted difference between the attended and unattended condition; right) nicely corresponds to the N1m emotion effect presented in A with respect to latency, topography, and polarity.
Estimated neural activity (source space).
In the selected N1m time interval of interest between 100 and 130 ms after CS onset, the analysis of L2-MNP estimates revealed significant session by valence interactions within three main regions (Fig. 4A). Neural generators in left and right frontal cortex (F(2,46) = 7.98, p = 0.001; and F(2,46) = 11.64, p < 0.001, respectively) showed significantly stronger activity for both CSpos and CSneg compared with CSneut. Quadratic trends of the effect were significant in both hemispheres (left: F(2,46) = 10.53, p = 0.004; right: F(2,46) = 17.44, p < 0.001). In left parieto-temporal cortex, the session by valence interaction also revealed a significant modulation (F(2,46) = 5.6; p = 0.007; quadratic trend: F(2,46) = 7.64; p = 0.011), whereas effects in the mirror-symmetric right parieto-temporal group failed to reach significance (session by valence: F(2,46) = 0.42; p = 0.658; quadratic trend: F(2,46) = 0.91; p = 0.351). Thus, corresponding to the findings in sensor space during the N1m time interval, there was significantly amplified activity for both positively and negatively relative to neutrally conditioned tones. Again, a trend toward right lateralization of effects for unpleasant stimuli was evident (left parieto-temporal: F(1,23) = 5.03, p = 0.035; left frontal: F(1,23) = 5.15, p = 0.033; right frontal: F(1,23) = 16.04, p = 0.001), whereas the effects for the pleasant condition appeared equally strong in both hemispheres (left parieto-temporal: F(1,23) = 9.77, p = 0.005; left frontal: F(1,23) = 13.98, p = 0.001; right frontal: F(1,23) = 8.73, p = 0.007). Convergent with the sensor space results, there was no main effect of session in any region (left posterior: F(1,23) = 0.06, p = 0.81; left frontal: F(1,23) = 0.99, p = 0.33; right frontal: F(1,23) = 0.37, p = 0.55).
A, L2-minimum-norm estimates for the emotion effect in the N1m time window between 100 and 130 ms after stimulus. Analogously to Figure 3, the black box shows the grand-averaged neural generator activity, the blue box displays the emotion effect for positive (Δpost-pre CSpos − Δpost-pre CSneut), and the red box for negative valence (Δpost-pre CSneg − Δpost-pre CSneut). The bar plots illustrate regional amplitude differences for postconditioning minus preconditioning measurement separately for each affective condition and selected dipole groups (*p < 0.05; **p < 0.01). B, Results of the corresponding dual dipole reconstruction of the N1m component. Dipoles are located at left and right auditory cortex regions. The bar plots are analogous to those in A.
Single dipole source reconstruction of the N1m
In a reanalysis of the data with single dipole source reconstruction, neural activation at two dipoles with seed points located in left and right auditory cortex regions was analyzed in the N1m time window between 100 and 130 ms. There were no significant differences of dipole locations and orientations or latencies of the N1m between preconditioning and postconditioning measurement (all values of p > 0.05; mean peak latency of the N1m = 94 ms for both sessions). The residual variance of the dipolar source model underlying the derived field distribution amounted to 12 ± 6.3% for preconditioning and 12.7 ± 7% for postconditioning (no significant differences between sessions). Source estimation succeeded for 18 of 24 subjects, whereas for the remaining 6 subjects ECD modeling with two dipoles resulted in anatomically implausible locations far outside auditory cortex and unrealistically large dipole moments for the fits (>2 SD above mean amplitude). We found an emotion effect comparable with that reported for sensor space data and L2-MNP results: The session by valence interaction averaged across both dipoles was significant (F(2,34) = 5.46; p = 0.009), whereas there was no three-way interaction with hemisphere (F(2,34) = 1.39; p = 0.264). The emotion effect was in the expected direction, as was revealed by a significant quadratic trend for the session by valence interaction (F(1,17) = 7.51; p = 0.014). The additionally performed single dipole analyses with “rotating dipole orientation” assumption resulted in qualitatively identical effects. In sum, results for the dual dipole reconstruction were comparable with those reported for sensor space data and L2-MNP results in the N1m time window.
Affective modulation of neural processing in the P20/50 time window
Evoked magnetic fields (sensor space).
In the earlier P20-50m time window, an emotion effect was present between 15 and 45 ms after stimulus onset within three sensor regions of interest (ROIs) over left and right superior, as well as right inferior sites (Fig. 5A). The session by valence interaction was significant within all three ROIs (left superior: F(2,46) = 3.55, p = 0.037; right superior: F(2,46) = 3.52, p = 0.038; right inferior: F(2,46) = 5.14, p = 0.01). Quadratic trends, with the same relative difference polarity for pleasant and for unpleasant compared with neutral CS processing, were present (left superior: F(2,46) = 4.47, p = 0.046; right superior: F(2,46) = 5.08, p = 0.034; right inferior: F(2,46) = 6.66, p = 0.017). Post hoc tests showed differential topographies of the emotion effect as a function of CS valence: the two superior sensor clusters over the left and right hemisphere showed stronger modulation by pleasant compared with neutral CSs (left: F(1,23) = 8.02, p = 0.009; right: F(1,23) = 6.6, p = 0.017), whereas the corresponding interaction for unpleasant versus neutral stimuli did not reach significance (left: F(1,23) = 2.3, p = 0.143; right: F(1,23) = 1.18, p = 0.288). A statistically significant effect for the unpleasant condition was lateralized to the right hemisphere over lateral-inferior prefrontal regions (F(1,23) = 13.06; p = 0.001). There was no main effect of session at any of the three sensor groups (left superior: F(1,23) = 0.052, p = 0.82; right superior: F(1,23) = 1.35, p = 0.26; right inferior: F(1,23) = 0.03, p = 0.86).
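The quadratic trend tests above exploit the fact that, over the ordered valence levels negative/neutral/positive, the contrast weights (+1, −2, +1) isolate a common deviation of both affective conditions from the neutral one. A minimal sketch of such a per-subject contrast test, using hypothetical post-minus-pre amplitude values and a one-sample t test (the trend F value corresponds to t²):

```python
import numpy as np
from scipy import stats

def quadratic_trend(neg, neut, pos):
    """Per-subject quadratic contrast (+1, -2, +1) over the ordered
    valence levels negative/neutral/positive, tested against zero."""
    contrast = np.asarray(neg) - 2.0 * np.asarray(neut) + np.asarray(pos)
    return stats.ttest_1samp(contrast, 0.0)

# Hypothetical post-minus-pre amplitude changes (arbitrary units):
# neutral CSs decrease after conditioning, affective CSs do not.
neg = [0.1, 0.3, 0.2, 0.0, 0.25]
neut = [-1.0, -0.9, -1.1, -0.8, -1.2]
pos = [0.2, 0.1, 0.3, 0.15, 0.05]
result = quadratic_trend(neg, neut, pos)  # positive t: affective > neutral
```

A positive contrast for both affective conditions relative to neutral produces the quadratic pattern described above, regardless of which affective valence drives the effect.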
The P20-50m component and corresponding emotion effect displayed for event-related magnetic field data (averaged across the time interval between 15 and 45 ms) (A) and L2-minimum-norm estimates (averaged across the time interval between 25 and 65 ms) (B). Grand-averaged magnetic fields and neural generator activity (black boxes), and differences for positive (Δpost-pre CSpos − Δpost-pre CSneut; blue boxes) and negative valence (Δpost-pre CSneg − Δpost-pre CSneut; red boxes) displayed on the left side of A and B, respectively. The bar plots display regional amplitude differences for postconditioning minus preconditioning measurement, separately for each affective condition and selected sensor or dipole groups (*p < 0.05; **p < 0.01).
Estimated neural activity (source space).
To further elucidate the contributions of different brain areas to rapid emotion processing in the auditory system, we analyzed the L2-MNP estimates of the underlying neural generators of the event-related magnetic fields. Within the P20-50m time range, a significant session by valence interaction was present between 25 and 65 ms after CS onset in a region located in right temporo-parietal cortex (F(2,46) = 7.528; p = 0.001) (Fig. 5B). A quadratic trend analysis within this region revealed a significant effect of enhanced processing for both affective compared with the neutral CS conditions (F(2,46) = 10.53; p = 0.004). Post hoc tests calculated separately for pre-post differences of CSpos versus CSneut and CSneg versus CSneut were significant (F(1,23) = 11.76, p = 0.002; and F(1,23) = 4.87, p = 0.038, respectively). The main effect of session was not significant (F(1,23) = 0.76; p = 0.39).
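The L2-MNP approach referred to above can be sketched in its generic Tikhonov-regularized form. The toy leadfield below is hypothetical, and this is only a sketch of the general method, not the exact implementation used in this study:

```python
import numpy as np

def l2_minimum_norm(G: np.ndarray, b: np.ndarray, lam: float) -> np.ndarray:
    """Regularized L2-minimum-norm source estimate:
    s_hat = G.T @ (G @ G.T + lam * I)^-1 @ b,
    i.e., the smallest-norm source pattern reproducing the field b."""
    gram = G @ G.T + lam * np.eye(G.shape[0])
    return G.T @ np.linalg.solve(gram, b)

# Hypothetical toy problem: 4 sensors, 10 candidate sources
rng = np.random.default_rng(1)
G = rng.normal(size=(4, 10))   # toy leadfield (sensors x sources)
s_true = np.zeros(10)
s_true[3] = 1.0                # one active source
b = G @ s_true                 # simulated measured field
s_hat = l2_minimum_norm(G, b, lam=1e-6)
```

Because there are far more candidate sources than sensors, the inverse problem is underdetermined; the regularized minimum-norm solution selects the distributed source pattern with the smallest overall strength that is still consistent with the measured field.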
An attempt to reconstruct the P20-50m sources with a dual dipole model, as applied for the N1m time interval, did not succeed because model fits were insufficient in the majority of subjects.
Control analysis in an earlier time interval (sensor and source space).
Because of its overall smaller amplitudes and therefore lower signal-to-noise ratio, statistical effects in the P20-50m interval were less robust than in the N1m time interval. To further substantiate the reported effects, we tested the selected regions and sensor groups of interest within earlier control intervals preceding the P20-50m (i.e., 0–15 ms in sensor space and 0–25 ms in source space). The analysis in source space (F(2,46) = 0.18; p = 0.84) and the analyses for two of the three regions in sensor space (right superior: F(2,46) = 1.52, p = 0.23; inferior: F(2,46) = 1.85, p = 0.17) supported the predicted temporal specificity of the results, showing no significant session by valence interactions in the preceding intervals. The left hemispheric sensor group did show a significant interaction (F(2,46) = 3.77; p = 0.03), but this effect was driven by field differences evoked by the negative CS after learning that peaked slightly earlier (15 ms). We conclude that these additional tests overall reinforced the interpretability of the P20-50m effects, although even earlier mid-latency auditory evoked fields with cortical generators might be modulated by tones with an aversive connotation.
In sum, we obtained clear evidence for a rapid differentiation of emotionally significant appetitive and aversive from nonsignificant neutral stimuli, as reflected by a modulation of cortical processing early after stimulus onset. With regard to timing and topography, the resultant modulatory pattern for the N1m was in close correspondence to findings for directed attention reported previously (Woldorff et al., 1993). Additionally, we found a motivated attention effect even in the earlier P20-50m time interval. Inverse distributed source estimations indicated not only a modulation of sensory processing regions but also an involvement of frontal and parietal cortex regions in early emotion processing in both time windows.
Behavioral data
The preconditioning versus postconditioning comparison of CS ratings served to assess changes in subjectively rated CS hedonic valence as a function of emotional learning. A two-way repeated-measures ANOVA (session by valence) did not reveal any significant effect of conditioning on CS valence ratings (F(2,46) = 0.83; not significant). The affective category recall task and the CS–UCS pair recognition task were both administered to evaluate subjects' awareness of the contingent relationship between CS and UCS. We calculated d′ as a measure of sensitivity for detecting the correct valence and the correct arousal category in the affective category recall task. Neither the d′ values calculated separately for CSpos, CSneut, and CSneg (d′ = −0.11, −0.05, and −0.15, respectively) nor the d′ for choosing the correct arousal category (emotional CS: d′ = −0.06) differed significantly from 0. The CS–UCS pair recognition task yielded similar results: the d′ values for choosing the correct valence or the correct arousal category did not differ significantly from 0 in any CS condition (positive CS: d′ = 0.09; negative CS: d′ = −0.14; neutral CS: d′ = 0.02; emotional CS: d′ = −0.03).
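The sensitivity index d′ used above is the difference of the z-transformed hit and false-alarm rates. A minimal sketch with hypothetical response counts (the log-linear correction guards against rates of exactly 0 or 1):

```python
from scipy.stats import norm

def d_prime(hits: int, misses: int, false_alarms: int,
            correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate),
    with a log-linear correction to avoid infinite z scores."""
    hr = (hits + 0.5) / (hits + misses + 1.0)
    far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hr) - norm.ppf(far)

# Hypothetical counts: chance-level responding yields d' = 0,
# as observed for the contingency-awareness tasks above.
chance = d_prime(10, 10, 10, 10)   # → 0.0 (no sensitivity)
good = d_prime(18, 2, 3, 17)       # clearly positive sensitivity
```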
Discussion
In the present study, we obtained evidence for an early and highly differentiating modulation of auditory processing for emotionally significant appetitive and aversive tones by motivated attention. Recording magnetoencephalographic responses to a multitude of ultrashort tones that were assigned affective meaning through MultiCS conditioning (cf. Steinberg et al., 2011), this study was the first to provide insights into the spatio-temporal dynamics of auditory emotion processing in humans. Consistent with our hypotheses, motivated attention (1) led to differential processing of emotion-associated compared with neutral auditory stimuli; (2) modulated early AEF components, suggesting common neural mechanisms underlying both motivated and directed attention effects; and (3) recruited neural circuitry overlapping with a domain-independent fronto-parieto-temporal attention network.
We analyzed two a priori defined AEF components, the N1m and the earlier P20-50m. Both components reflect initial cortical processing stages and are known to be sensitive to directed attention: larger ERP/ERMF amplitudes for attended relative to nonattended stimuli have been reported as early as 20–50 ms after stimulus onset for spatially directed attention (P20-50 component) (Woldorff and Hillyard, 1991; Woldorff et al., 1993) and from ∼70–100 ms up to 130 ms after stimulus onset for behaviorally relevant spatial and nonspatial stimulus attributes (Hillyard et al., 1973; Woldorff et al., 1993; Ozaki et al., 2004; Fritz et al., 2007; Poghosyan and Ioannides, 2008). Focusing on attention evoked by specific frequency patterns associated with the emotional relevance of a stimulus, we primarily hypothesized modulatory effects for the N1m, but included the earlier time interval to test whether motivated attention would affect processing already at these latencies.
In the N1m time range (100–130 ms after stimulus onset), the emotion effect was expressed as stronger negative-deflecting evoked magnetic fields over the right and stronger positive-deflecting evoked magnetic fields over the left hemisphere for both appetitive and aversive relative to neutral stimuli. Timing and topography of this N1m motivated attention effect closely corresponded to findings reported for directed attention tasks (Woldorff et al., 1993). We interpret this convergence between studies that used different material and tasks as a strong indicator of a common underlying neural system that prioritizes all kinds of behaviorally relevant stimuli. Distributed L2-minimum-norm source estimations localized the affect-specific increase in cortical activation to regions in temporal, as well as parietal and frontal cortex. Such a distributed temporo-fronto-parietal network has been implicated in neuroimaging studies on selective directed attention as a domain-independent neural circuitry underlying the control of auditory and visual attention, and is thought to modulate processing driven by current goals, task relevance, or inherent stimulus salience (Corbetta and Shulman, 2002; Bidet-Caulet and Bertrand, 2005; Fritz et al., 2007). We propose that affectively conditioned auditory stimuli likewise engage this multisensory attention network by virtue of their greater emotional salience (Vuilleumier, 2005). Motivated attention in the auditory system seems to recruit similar neural sensory gain control mechanisms as reported for directed attention to behaviorally relevant stimuli in the auditory and the visual domain (Hillyard and Anllo-Vento, 1998; Moratti et al., 2004; Ferrari et al., 2008). Both sensor and distributed source space data converge to suggest a common attention system that provides prioritized processing of emotional stimuli in the auditory system in a rapid and highly resolving fashion, as early as 100 ms after stimulus onset.
Most neuroscientific affective conditioning studies have investigated effects of associative learning during conditioning and/or postconditioning sessions only (Büchel et al., 1998; Pizzagalli et al., 2003; Phelps et al., 2004; Dolan et al., 2006; Stolarova et al., 2006; Keil et al., 2007). Relatively stronger processing of affective compared with neutral CSs has consistently been interpreted as amplified processing of emotional stimuli. However, such relative differences after conditioning could equally be generated by reduced neutral CS processing or a combination of both effects. In the present study, the preconditioning session was additionally taken into account (cf. Steinberg et al., 2011). This more conservative presession/postsession design controls for potential preconditioning variance in CS processing and thus improves the signal-to-noise ratio. Although amplitudes elicited by emotional CSs were larger than those for neutral CSs in the postconditioning session only, differences between postsession and presession AEF amplitudes for each affective condition revealed highly significant decreases for neutral CSs but only nonsignificant amplitude increases for click tones associated with affective UCSs. At first, this pattern of results seems at odds with our "attentional modulation hypothesis," which predicts prioritized processing of emotional rather than attenuated processing of neutral CSs. Yet the findings do fit our prediction of affect-specific attentional modulation of CS processing, resulting in relatively enhanced activity for CSemot compared with CSneut in the presession/postsession design. On the neural level, the resultant pattern could well be explained by auditory sensory gating, a presumed inhibitory prefrontal-thalamic sensory filter (Yingling and Skinner, 1977) observed as habituating neocortical responses to repeated click sounds peaking around 50 ms (Boutros and Belger, 1999).
Sensory gating is thought to protect humans from being flooded by irrelevant stimuli (Waldo and Freedman, 1986; White and Yee, 1997) through attenuation or habituation of incoming irrelevant sensory input (Grunwald et al., 2003). Patients with dorsolateral prefrontal cortex lesions showed enhanced P50 amplitudes to task-irrelevant distracters (Alho et al., 1994; Knight and Grabowecky, 1995; Knight et al., 1999), which is consistent with the idea of a prefrontal-thalamic sensory gating system. Importantly, directed attention eliminates suppression of the auditory component to repeatedly presented stimuli (Guterman et al., 1992). We propose that CS tones associated with behaviorally irrelevant (neutral) sounds were subject to sensory filtering and thus reduced in amplitude, whereas emotion-associated tones passed the thalamic gate because of motivated attention.
Within the earlier P20-50m time interval, emotion-associated stimuli were also processed differentially. Since the topography of the emotion effect corresponded less clearly to the topography of the grand-averaged auditory CS-evoked response, we suggest that it is not auditory sensory processing itself but rather higher cognitive areas, as part of a distributed attention network, that are modulated in the presence of emotional stimuli this rapidly after stimulus onset. Inverse source modeling revealed a significant amplification of affective compared with neutral CS processing in a temporo-parietal region of the right hemisphere. This region might mediate attentional reorienting toward behaviorally significant stimuli and belongs to the neural attention network also activated in the N1m time interval (Shomstein and Yantis, 2006). Although such remarkably rapid modulation of auditory processing was not expected based on findings for directed attention to nonspatial stimulus features, it appears highly adaptive in terms of selective attention toward potentially survival-relevant stimuli.
How does the brain accomplish this rapid and highly resolving differentiation of multiple complex stimuli after such sparse affective associative learning? We consider short-term plasticity in the auditory cortex, in conjunction with top-down modulation by higher cognitive cortex structures, as essential for this capacity. Associative learning is thought to induce short-term plasticity in auditory cortex (Edeline et al., 1993; Weinberger, 2004; Ohl and Scheich, 2005; Stolarova et al., 2006; Keil et al., 2007). The primary auditory cortex not only analyzes stimulus features but has been directly implicated in the storage of specific information about auditory experiences, among others the behavioral relevance of auditory input (Weinberger, 2004). Fritz et al. (2007) suggested that receptive fields in primary auditory cortex might be dynamically reshaped in accordance with salient target features and task demands by means of top-down signals adjusting attentional filters. The amplified activity within the prefrontal-parietal network shown here may in fact reflect such attentional top-down filter functions.
Results of the CS–UCS matching and the affective category recall task suggested that participants had no awareness of the predictive CS–UCS relationship [contingency awareness (CA)]. Yet CA was assessed only after repeated unreinforced CS presentations and might therefore have decayed by the time of assessment (cf. Lovibond and Shanks, 2002). However, a preceding behavioral pilot test with a different subject group, assessing CA directly after conditioning, showed that subjects were unaware even for six different CS–UCS pairs. Claiming that CA is a necessary precondition for affective conditioning to occur, Lovibond and Shanks (2002) sparked a controversial scientific debate (Wiens and Ohman, 2002). Recent studies are inconclusive, showing that aversive Pavlovian conditioning may (Baeyens et al., 1990; Öhman and Mineka, 2001; Walther and Nagengast, 2006) or may not (Purkis and Lipp, 2001; Dawson et al., 2007; Pleyers et al., 2007) occur in the absence of CA. The parallel investigation of brain activation measured with fMRI and skin conductance responses (Tabbert et al., 2005) or valence ratings (Klucken et al., 2009) provided evidence for a differential effect of CA on neural activity versus more cognitive response levels, which might explain the lack of consistent findings across studies. In line with these observations, we found a dissociation of affective learning effects, present in CS-evoked MEG responses but absent in preconditioning/postconditioning CS ratings.
In conclusion, the present results demonstrate that motivated attention (1) is engaged very rapidly after onset of emotional auditory stimuli modulating neural activity in the N1m and even in the P20-50m time interval, (2) differentiates multiple ultrashort click-like tones as a function of their associated affective significance for both appetitive and aversive stimuli, and (3) recruits the same neural mechanisms and circuitry as selective directed attention.
Footnotes
- This work was supported by Deutsche Forschungsgemeinschaft Grant SFB TRR-58 C01. We thank A. Wollbrink, Karin Berning, Ute Trompeter, Hildegard Deitermann, and Janna von Beschwitz for technical assistance.
- The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Markus Junghöfer, Institute for Biomagnetism and Biosignalanalysis, University Hospital Münster, Malmedyweg 15, D-48149 Münster, Germany. markus.junghoefer@uni-muenster.de