Abstract
Although advances have been made regarding how the brain perceives emotional prosody, the neural bases involved in the generation of affective prosody remain unclear and debated. Two models have been forged on the basis of clinical observations: a first model proposes that the right hemisphere sustains production and comprehension of emotional prosody, while a second model proposes that emotional prosody relies heavily on basal ganglia. Here, we tested their predictions in two functional magnetic resonance imaging experiments that used a cue-target paradigm, which allows distinguishing affective from sensorimotor aspects of emotional prosody generation. Both experiments show that when participants prepare for emotional prosody, bilateral ventral striatum is specifically activated and connected to temporal poles and anterior insula, regions in which lesions frequently cause dysprosody. The bilateral dorsal striatum is more sensitive to cognitive and motor aspects of emotional prosody preparation and production and is more strongly connected to the sensorimotor speech network compared with the ventral striatum. Right lateralization during increased prosodic processing is confined to the posterior superior temporal sulcus, a region previously associated with perception of emotional prosody. Our data thus provide physiological evidence supporting both models and suggest that bilateral basal ganglia are involved in modulating motor behavior as a function of affective state. Right lateralization of cortical regions mobilized for prosody control could point to efficient processing of slowly changing acoustic speech parameters in the ventral stream and thus identify sensorimotor processing as an important factor contributing to right lateralization of prosody.
Introduction
Prosody, the speech melody, corresponds to slow modulations of pitch, rhythm, stress, or loudness, and conveys linguistic information but also most of the information regarding speakers' intentions or emotional state. Although recent advances have been made regarding how the brain perceives emotional prosody, the neural bases involved in its generation remain unclear and debated. Two models have been proposed on the basis of clinical observations: a first model proposes that the right hemisphere sustains production and comprehension of emotional prosody, while a second model proposes that emotional prosody relies heavily on basal ganglia (BG) function. Here, we aimed at testing predictions arising from these models using functional magnetic resonance imaging (fMRI).
The lateralization hypothesis considers that affective prosody is a lateralized and dominant function of the right hemisphere (Ross, 1981; Shapiro and Danly, 1985; Borod et al., 2002). Regarding comprehension, several fMRI studies have emphasized the role of a right-lateralized network involving the superior temporal sulcus (STS; BA22/42) and inferior frontal cortex (BA45/47), with no clear influence of emotional valence or category of emotion (for review, see Wildgruber et al., 2006; Kotz et al., 2006). On the production side, prosodic control during speech production has either been related to a right-lateralized (Riecker et al., 2002) or to a bilateral perisylvian network (Aziz-Zadeh et al., 2010), and right hemisphere damage often induces a monotonous speech pattern evoking “flat affect.” Despite these observations, dysprosody is also found after left hemisphere damage and the lateralization theory remains debated (Baum and Pell, 1999; Kotz et al., 2003; Van Lancker et al., 2006; Ross and Monnot, 2008).
An alternative model proposes that BG play an essential role in the generation of prosody and also mediate emotional expression and comprehension (Cancelliere and Kertesz, 1990; Pell and Leonard, 2003; Van Lancker et al., 2006; Paulmann et al., 2011). Cancelliere and Kertesz (1990) found that stroke patients who presented clinically with dysprosody suffered most frequently from left or right BG lesions, followed by anterior temporal lobe and insula lesions. Moreover, dysprosody is frequently associated with the disruption of striatal dopaminergic innervation in Parkinson's disease (PD) patients (Caekebeke et al., 1991; Benke et al., 1998). Although PD patients speak monotonously, they are able to imitate emotional prosody reasonably well (Möbes et al., 2008). This suggests that BG play a prominent role in modulating the speech motor plan as a function of the affective state.
Anatomy indicates that the affective and motor components of prosody should be handled by two adjacent BG territories: dorsal striatum exchanges information with sensorimotor and associative cortices while ventral striatum is linked to the limbic system (Yelnik, 2008). We thus used fMRI in healthy participants to test which processing steps during emotional prosody generation recruit BG and the right hemisphere. We used a cue-target paradigm that allows distinguishing the affective from the sensorimotor aspects of emotional prosody generation (see Materials and Methods). In a first experiment, participants were scanned while reading aloud sentences adopting either an emotional or a neutral prosody. In a second experiment, we tested the specificity of our findings and dissociated generation of emotional from generation of linguistic prosody by comparing happy with interrogative intonation.
Materials and Methods
Procedure of Experiment 1.
During the first experiment (Fig. 1), subjects performed two sessions of a reading task adapted from Kell et al. (2011). In each trial, they read sentences aloud and modulated prosody according to a brief auditory word cue (fearful, sad, angry, happy or neutral, lasting ∼500 ms), which preceded sentences by a variable delay (discrete uniform distribution) ranging from 2 to 4 s. The use of a jittered instruction delay allowed us to temporally dissociate speech preparation from speech execution (factor trial phase) and assess the influence of emotional versus neutral prosody (factor emotion) within each phase. Because specific speech planning was impossible during the preparation phase, this phase reflected only intention to speak and induction of emotional state. All sensorimotor aspects of speech processing thus fall into the execution phase. The affect associated with the cue was randomized between trials. To avoid frequent trial-to-trial switching between the five emotional states, fearful and sad trials were pooled in the same session, separately from angry and happy trials that appeared together in another session. A neutral condition was included in each counterbalanced session. Manipulation of emotion yielded a total of six experimental conditions, each containing 30 trials (total duration per session 17 min 40 s). For each condition, we modeled the preparation and the execution phase separately, yielding a total of 12 experimental conditions. Thirty written semantically neutral German declarative sentences with identical syntactical structure were presented for 3 s each (e.g., “Grüne Marken kleben vorne auf dem Umschlag,” translated “Green stamps are attached to the front of the envelope”). Intertrial intervals were pseudorandomly generated using a discrete uniform distribution with a mean of 5.6 s, lower and upper bounds of 2 and 10 s, respectively.
Before scanning, participants were familiarized with the experimental setting and practiced by reading aloud a training set of sentences that were not used in the main experiment. Training was repeated until participants' performances were deemed satisfactory by the experimenter, and the training utterances were recorded for later acoustic and perceptual analysis.
Participants.
Twenty right-handed volunteers (nine females, mean age ± SD: 26.5 years ±3.4; and nine males: 27.3 years ± 2.5) without neurological or psychiatric history participated in a 3 T fMRI study. All provided written informed consent according to guidelines of the local research ethics committee and were paid for their participation.
fMRI data acquisition.
Gradient-echo T2*-weighted transverse echo-planar images (EPI) with blood oxygenation level-dependent (BOLD) contrast were acquired with a 3 T Magnetom TIM Trio scanner (Siemens). Each volume contained 33 axial slices acquired in a sequential manner (TR/TE/flip angle = 2000 ms/30 ms/90°, FOV = 192 mm, resolution = 64 × 64, isotropic voxel size of (3 mm)³, distance factor 25%). We collected a total of 1060 functional volumes for each subject, as well as a high-resolution T1-weighted anatomical image (TR/TI/TE/flip angle = 2250 ms/900 ms/2.6 ms/9°, FOV = 256 mm, resolution = 256 × 256, slice thickness = 1.1 mm, 144 sagittal slices). We administered the behavioral protocol using Presentation software (Neurobehavioral Systems).
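The reported volume count can be cross-checked against the session length given in the procedure. The following minimal sketch assumes that scanning ran for exactly the stated 17 min 40 s in each of the two sessions; the variable names are illustrative and not taken from the acquisition protocol.

```python
# Cross-check of the reported number of functional volumes.
# Assumption: each of the two sessions was scanned for exactly 17 min 40 s.
TR_S = 2.0                        # repetition time in seconds (2000 ms)
SESSION_DURATION_S = 17 * 60 + 40 # 17 min 40 s per session
N_SESSIONS = 2

volumes_per_session = int(SESSION_DURATION_S / TR_S)
total_volumes = volumes_per_session * N_SESSIONS

print(volumes_per_session, total_volumes)
```

Under these assumptions the two sessions together account for the 1060 volumes reported per subject.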
Acoustic and perceptual analysis of speech data from the training phase.
The initial training phase was audio-recorded with a standard microphone connected to a computer equipped with Adobe Audition. Pauses between utterances were cut out and sentences grouped according to the category (fearful, sad, angry, happy, neutral). Using PRAAT (Boersma, 2001; http://www.fon.hum.uva.nl/praat), we extracted the intensity contours of each sentence and obtained mean values by converting the sound pressure level measured in decibels (dB) into linear sound pressure values. After averaging across all sentences of one condition, values were reconverted to dB. To analyze how much participants modulated their voice in terms of frequency, we calculated the SD of each sentence's fundamental frequency. The SDs from all sentences belonging to one condition were averaged. Statistics were performed using paired t tests in SPSS (SPSS Inc.). Averaged acoustic parameters were then used for contrast weighting with functional imaging group data (see below), solely for the analysis rendered in Figure 2. Finally, the audio speech samples were presented in randomized order to seven lay raters who judged the prosody of the sentences as neutral, fearful, sad, angry, or happy. Percentage correct answers for each prosody type were compared against each other using paired t tests in SPSS.
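The averaging of intensity values described above can be illustrated as follows. This is a minimal sketch, assuming the standard amplitude relation between sound pressure level and linear pressure (20·log10 of a pressure ratio); the function name and the example values are hypothetical and do not reproduce the PRAAT workflow itself.

```python
import math

def mean_level_db(levels_db):
    """Average sound pressure levels by way of linear pressure values.

    Assumption: levels are amplitude-based dB values (20 * log10 of a
    pressure ratio), so dB -> pressure ratio uses a divisor of 20.
    """
    # Convert each dB value to a linear pressure ratio.
    pressures = [10 ** (level / 20.0) for level in levels_db]
    # Average in the linear domain.
    mean_pressure = sum(pressures) / len(pressures)
    # Reconvert the mean to dB.
    return 20.0 * math.log10(mean_pressure)

# Hypothetical sentence-level intensities for one condition (dB).
example_levels = [60.0, 80.0]
print(round(mean_level_db(example_levels), 2))
```

Averaging in the linear domain weights louder utterances more strongly than a plain arithmetic mean of the dB values would (here roughly 74.8 dB rather than 70 dB).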
fMRI image processing.
Image processing and statistical analyses were performed using SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK; www.fil.ion.ucl.ac.uk/spm). Functional images were spatially realigned to the first volume by rigid body transformation, corrected for time differences in slice acquisition using the middle slice in time as reference, spatially normalized to the standard Montreal Neurological Institute EPI template, resampled to an isotropic voxel size of 2 mm, and spatially smoothed with an isotropic 8 mm full-width at half-maximum (FWHM) Gaussian kernel (Friston et al., 1995).
fMRI whole-brain analysis.
We performed standard analyses using the general linear model (GLM) as implemented in SPM8, where event-related signal changes were modeled separately for each subject. For each session, we specified a linear model including seven event types. A first condition modeled transient effects related to the auditory cue (all cues were pooled in a single regressor). Three other regressors (two emotions, one neutral condition) modeled the preparatory period as a function of the primed affect, while three additional regressors modeled the presentation of the sentence and its articulation by the subject. We decided to keep both neutral conditions apart because of possible influence by session-specific emotions. For each condition, a covariate was calculated by convolving delta functions with a canonical hemodynamic response function (HRF), over the duration of the phase (preparation or execution). As explained earlier, the use of a jittered cue allowed us to decorrelate preparation from execution and thus temporally dissociate them. Twelve additional covariates were included in the GLM to model movement-related artifacts (realignment parameters and their temporal derivatives). As each session encompassed a neutral condition, this yielded a total of 12 images of interest per subject.
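How a phase-specific covariate of this kind can be built is sketched below. The code is an illustration only, not the SPM8 implementation: it assumes a double-gamma response with commonly used default parameters (peak near 6 s, undershoot near 16 s, ratio 1/6), and the function names, onsets, and durations are hypothetical.

```python
import math
import numpy as np

def double_gamma_hrf(t):
    """Double-gamma haemodynamic response (assumed default-like parameters)."""
    t = np.asarray(t, dtype=float)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)          # gamma pdf, shape 6
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)  # gamma pdf, shape 16
    return peak - undershoot / 6.0

def make_regressor(onsets_s, durations_s, n_scans, tr_s=2.0, dt=0.1):
    """Convolve event boxcars with the HRF and sample once per scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    boxcar = np.zeros(n_fine)
    for onset, duration in zip(onsets_s, durations_s):
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        boxcar[start:stop] = 1.0  # event lasts for the whole phase
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    convolved = np.convolve(boxcar, hrf)[:n_fine] * dt
    step = int(round(tr_s / dt))
    return convolved[::step]

# Hypothetical trial: cue-to-sentence delay of 3 s, sentence shown for 3 s.
preparation = make_regressor([4.0], [3.0], n_scans=20)
execution = make_regressor([7.0], [3.0], n_scans=20)
print(preparation.shape, execution.shape)
```

Because the delay between cue and sentence varies from trial to trial, the preparation and execution regressors built this way are not simply shifted copies of each other across the session, which is what permits their separate estimation.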
For the whole-brain group level analysis, images of parameter estimates were entered in a repeated-measures ANOVA crossing the factors trial phase (preparation, execution) and emotion (fearful, sad, neutral1, angry, happy, neutral2). In this way, the variance estimates incorporated appropriately weighted within-subject and between-subject variance effects. Given the absence of emotion-specific effects, we averaged emotions together at the second level by weighting the contrasts preparation (emotion > neutral) and execution (emotion > neutral) accordingly (Table 1, Table 2, respectively). We preferred to keep the neutral baseline apart to estimate session-specific contrasts although no significant difference was detected when comparing both conditions against each other. For informative purposes, we also indicate in Table 1 and Table 2 which regions show a phase × emotion interaction pattern specific to preparation or execution of emotional prosody. For Table 1, this corresponds to a greater differential response between emotional and neutral trials during the preparation phase (execution phase for Table 2) as compared with the same difference in the execution phase (preparation phase for Table 2). A nonsphericity correction was applied to account for inhomogeneity in variance of differences among pairs of conditions. Anatomical labeling was performed over the average T1-weighted brain of the group and all parametric maps were rendered on the single-subject T1-weighted brain available in SPM8 using MRIcron (Rorden et al., 2007). To identify brain regions that generally contribute to acoustic modulations underlying prosody, the emotion-specific regressors (execution phase) were weighted (in the group analysis) with the averaged acoustic parameters from prescanning recordings (Fig. 2). This contrast corresponds to a simple correlation between condition-specific first level β images and the pattern of acoustic parameters extracted from the training set of sentences.
It is important to note that the correlation is not with the actual prosody of the sentences spoken during scanning, but with the prescanning recordings, which were different sentences, spoken in different conditions. All whole-brain statistical maps were corrected for multiple comparisons using standard Bonferroni correction (p < 0.01 familywise error rate; FWE).
This analysis was completed by region of interest (ROI) analyses targeting the amygdala and the striatum. The amygdala is well known for its involvement in processing emotional visual and vocal stimuli (Phillips et al., 1998; Frühholz et al., 2012), notably fearful, angry, and happy expressions (Vuilleumier, 2005; Sergerie et al., 2008; Pichon et al., 2012a,b). Together with the hippocampus, it appears to be a good candidate for mood induction as both structures are known to be engaged during fear and happiness induction (Damasio et al., 2000). We first extracted β values using maximum probability maps of the left and right amygdala provided in the anatomy toolbox (Eickhoff et al., 2005). We then examined the degree of activation (extracted β values averaged across all voxels of each mask) evoked by each trial type using a repeated-measures ANOVA crossing the factors laterality, trial phase and emotion (α threshold at p < 0.01). The ROI analysis in the striatum aimed at studying whether the previously described dissociable loops through the BG (Alexander et al., 1990; Yelnik, 2008) manifested in functional striatal subdivisions or rather in a continuous dorsoventral gradient in limbic processing (see Fig. 6). We extracted, in each hemisphere and for each trial phase, contrast values for six voxels located between the dorsal (±22 8 14) and the ventral striatum (±22 8 −6), each separated by 4 mm on the z-axis. These values were submitted to an ANOVA crossing the factors coordinate (1, 2, 3, 4, 5, 6), trial phase (preparation, execution), and laterality (left, right). We performed a specific contrast on the factor coordinate to test for the presence of a significant linear term (increase) along the z-axis (from dorsal to ventral).
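The sampling scheme along the dorsoventral axis and the linear term can be made concrete with a short sketch. The coordinates follow directly from the description above; the contrast weights are the standard orthogonal linear weights for six equally spaced levels, which we assume here for illustration because the exact weights are not stated. All names and the example values are hypothetical.

```python
# Six sampling points from dorsal (z = 14) to ventral (z = -6), 4 mm apart.
z_values = list(range(14, -7, -4))

coordinates = {
    side: [(x, 8, z) for z in z_values]
    for side, x in (("left", -22), ("right", 22))
}

# Assumed linear trend weights for six equally spaced levels.
n_levels = len(z_values)
linear_weights = [2 * i - (n_levels - 1) for i in range(n_levels)]

def linear_trend(values):
    """Weighted sum; positive if values increase from dorsal to ventral."""
    return sum(w * v for w, v in zip(linear_weights, values))

# Hypothetical contrast values rising toward the ventral striatum.
example_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(z_values, linear_weights, round(linear_trend(example_values), 2))
```

A flat profile yields a trend of zero, whereas a steady dorsal-to-ventral increase yields a positive value.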
Finally, to visualize the time courses of the ventral striatum (that we found markedly recruited during preparation for emotional prosody) as compared with time courses of the STS (that was recruited during both phases) we computed the group average time courses in both regions over a period of 14 s after the trial onset (cue presentation). For each subject, adjusted time courses were extracted using a 2 mm radius sphere centered at the group activation coordinate peaks in the right hemisphere. Given the absence of emotion-specific effects in our results, we averaged the time courses of all emotions together and also plotted the average time course during generation of neutral prosody.
fMRI lateralization analysis.
We tested for lateralization of brain activity during emotional prosody preparation and production by comparing individual contrast images with their flipped counterparts. To account for minor spatial differences in exact localization of corresponding brain regions, we additionally smoothed images with an isotropic 6 mm FWHM Gaussian kernel (Kell et al., 2011). Results of the paired t tests are reported at p < 0.01, corrected for multiple comparisons at the voxel level (FWE) over the entire brain.
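The logic of comparing contrast images with their flipped counterparts is sketched below. This is an illustration only: it assumes contrast maps stored as arrays whose first spatial axis runs left to right and is symmetric about the midline, it omits the additional smoothing and the FWE correction, and the function name and random data are hypothetical.

```python
import numpy as np

def lateralization_t_map(contrast_maps):
    """Voxelwise paired t statistic of each map against its left-right flip.

    contrast_maps: array of shape (n_subjects, x, y, z); the x axis is
    assumed to be symmetric about the midline.
    """
    flipped = contrast_maps[:, ::-1, :, :]      # mirror along the x axis
    difference = contrast_maps - flipped        # original minus flipped
    n_subjects = difference.shape[0]
    mean = difference.mean(axis=0)
    sd = difference.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return mean / (sd / np.sqrt(n_subjects))

# Hypothetical data: 20 subjects, a tiny 4 x 3 x 3 volume.
rng = np.random.default_rng(0)
maps = rng.normal(size=(20, 4, 3, 3))
t_map = lateralization_t_map(maps)
print(t_map.shape)
```

By construction the resulting map is antisymmetric about the midline, so a positive value at a voxel indicates stronger activity there than at its mirror position in the other hemisphere.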
Psychophysiological interactions analysis.
We conducted psychophysiological interaction (PPI) analyses (Friston et al., 1997) to compare the connectivity of ventral and dorsal striatum with limbic and motor regions, during preparation and execution of emotional relative to neutral prosody. PPIs estimate effective connectivity via changes in inter-regional covariance as a function of different experimental manipulations or tasks. First eigenvariate values were extracted for each participant from the filtered BOLD signal of the right and left ventral and dorsal striatum, separately. Regions were identified at the level of the group by contrasting the preparation for emotional (vs neutral) prosody (see Table 1). Signal was extracted for each region from a 2 mm radius sphere centered at the group activation coordinate peak. Each time series was adjusted to exclude effects of no interest (movement parameters) and mean corrected. The time series were then deconvolved to obtain an estimate of the neural response, multiplied by the psychological context of interest (i.e., emotion vs neutral, either in the preparation or in the execution phase), and reconvolved using the canonical HRF to obtain a PPI regressor. Following standard practice, we set up for each subject 8 GLMs (2 laterality × 2 regions × 2 phases). Each contained the PPI regressor as well as 15 other regressors of no interest, including the time series of the seed region, the psychological factor convolved with the canonical HRF, a regressor modeling transient responses induced by the auditory cues, and 12 covariates modeling the movement parameters and their temporal derivatives. The model was then fitted to each brain voxel. At the group level, one-sample t tests over the PPI regressor of each model were used to reveal brain regions that showed significant increases in functional coupling with the seed region during preparation or production of emotional relative to neutral prosody.
Consistent with the previous analysis, results were reported at p < 0.01 (FWE).
Using paired t tests, we also performed laterality analyses to compare the intrahemispheric and interhemispheric connectivity differences between (PPI maps of) the striatal seed regions in each hemisphere. These analyses were performed separately for the ventral or dorsal striatum, and separately for each trial phase. This required comparing the PPI map of a given seed region with the flipped version of the homolog region's PPI map in the other hemisphere for the same phase. As in the fMRI lateralization analysis, PPI contrast images were smoothed before performing statistical comparisons, using an isotropic 6 mm FWHM Gaussian kernel to account for minor spatial differences in exact localization of corresponding brain regions. We found no laterality differences, even at a more lenient threshold (p < 0.05 FWE).
Procedure of Experiment 2.
We ascertained whether our findings in ventral and dorsal striatum from Experiment 1 were specific for emotional prosody by comparing emotional (happy) and nonemotional linguistic (interrogative) prosody. We explored an existing dataset that had been acquired before Experiment 1 and that has partly been used for publication previously (Kell et al., 2011). While emotional and linguistic tasks are both known to engage superior temporal lobe, perceptual tasks requiring judgments on linguistic prosody engage other brain regions than affective judgments (Wildgruber et al., 2006). This experiment was initially designed to test if such differences could be detected between generation of emotional and nonemotional linguistic prosody. Twenty-six participants were scanned (13 females, mean age 29, range 19–44). Similar to Experiment 1, they were asked to prepare according to four different cues (factor 1): “happy,” “question,” “neutral,” and “covert” (i.e., neutral silent reading) before reading the sentence that was displayed after the jittered interval. fMRI data were analyzed separately for the preparation and execution phases (factor 2), producing eight conditions of interest. We applied the same methodological procedure as the one described in Experiment 1. At the group level, we performed a whole-brain analysis (thresholded at p < 0.01, FWE corrected) and ROI analyses using the coordinates of the ventral and dorsal striatum subpeaks isolated in Experiment 1. We extracted β values for each of the eight conditions of interest and each of the four (left and right, ventral and dorsal) striatal ROIs using a 2 mm radius sphere centered on each coordinate as estimated above (β values were averaged across all voxels of the sphere). We then performed post hoc tests between the emotional and linguistic conditions within each trial phase. 
Similarly to Experiment 1, we applied a Bonferroni correction for the number of ROIs (n = 4) used within each contrast and report adjusted p values thresholded at p < 0.01 corrected (corresponding to an α threshold of p < 0.0025). We also performed laterality analyses on these ROI data for the comparison of emotional and linguistic versus neutral prosody.
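The relation between the adjusted p values and the per-test α threshold can be written out explicitly. This is a minimal sketch of a standard Bonferroni adjustment; the function name and the example p value are hypothetical and do not correspond to values from the dataset.

```python
import math

N_ROIS = 4            # left/right ventral and dorsal striatum
ALPHA_CORRECTED = 0.01

# Per-test threshold equivalent to p < 0.01 after correction.
alpha_per_test = ALPHA_CORRECTED / N_ROIS

def bonferroni_adjust(p_value, n_tests=N_ROIS):
    """Multiply by the number of tests and cap the result at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical uncorrected p value.
example_p = 0.0005
print(alpha_per_test, bonferroni_adjust(example_p))
```

Testing an uncorrected p value against 0.0025 is equivalent to testing its adjusted counterpart against 0.01, and adjusted values are capped at 1.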
Results
Behavioral results
Participants in Experiment 1 produced perceptually distinguishable prosody: seven independent raters attributed speech samples from the training phase correctly to the neutral condition or one of the emotions in 67% of trials, which is more than three times the chance level of 20% for five response categories and congruent with previous literature (Scherer et al., 2003). Angry (76%) and happy (74%) prosody were most easily identified, while recognition scores were lower for sad (66%), neutral (64%), and fearful (54%) prosody (the latter significantly different from the others at p < 0.01), again replicating previous studies (Banse and Scherer, 1996; Wildgruber et al., 2005). Acoustic analysis of participants' speech production during the training phase confirmed that they adequately modulated speech acoustics to express emotions. Relative to the neutral condition, mean speech amplitude (intensity, Fig. 2A) was stronger when producing fearful, happy, or angry prosody (t(19) = 2.98/8.22/12.7, respectively, adjusted p < 0.01). No intensity modulation was observed for sad prosody (adjusted p = 0.48). This is consistent with previous reports indicating that the speech amplitude is less pronounced for sad, fearful, and neutral prosody than for happy and angry prosody (Pfitzinger and Kaernbach, 2008). We observed no difference in the variability of fundamental frequency (SD of F0) for fearful, sad, or angry prosody (t(19) = 0.95/0.72/1.17, adjusted p = 1). Only happy prosody differed from neutral (t(19) = 7.07, adjusted p < 0.001; Fig. 2A), again reproducing previous results (Pell, 1999).
fMRI results of Experiment 1
Effects of emotion induction and prosody preparation (preparation phase)
During the preparation for emotional (vs neutral) prosody, we observed a strong activation of the bilateral dorsal and ventral striatum, consistent with the prediction that BG are important in the production of prosody (Fig. 3A; Cancelliere and Kertesz, 1990; Van Lancker Sidtis et al., 2006). As part of the limbic system, the bilateral amygdala (as confirmed by the ROI analysis below) and posterior insula increased their activity during emotion induction. Notably, the strongest activation was located medially in the retrosplenial and the posterior cingulate cortices. The retrosplenial cortex is densely connected to the hippocampal formation (Vann et al., 2009), which was also activated bilaterally. The involvement of this region during emotion induction is consistent with the role of the retrosplenial cortex in retrieving autobiographical memory and its involvement in processing self-related information (Maddock, 1999; Vann et al., 2009; Qin and Northoff, 2011).
In addition to emotion induction, preparation for emotional prosody also involved speech-related preparatory activity. Stronger activity for preparing for emotional than for neutral prosody was found in frontal and prefrontal motor-related cortices including left inferior frontal gyrus (dorsal BA44) and bilaterally in supplementary motor area (SMA), pre-SMA, and primary motor cortex. Auditory regions were also pre-activated more strongly in anticipation of auditory feedback, including the bilateral STS, left planum temporale, and right primary auditory cortex (Heschl's gyrus). Except for a cluster in the right primary visual cortex, no significant lateralization was found, even at a more lenient threshold (p < 0.05 FWE). We also checked whether there were emotion-specific patterns of responses by contrasting each emotion to all other emotions during the preparation phase but found none. Details of all activations are presented in Table 1.
In bilateral amygdala, the ROI analysis revealed no clear sign of lateralization. The ANOVA confirmed a main effect of emotion (F(2.74,52) = 10.32, p < 0.001) with no other significant main effects or interactions [main effect of laterality (p = 0.94), main effect of phase (p = 0.16), laterality-by-emotion interaction (p = 0.35)], indicating that emotional prosody did not lateralize amygdala activity.
Striatal connectivity during emotion induction (preparation phase)
We compared changes in effective connectivity between the ventral and the dorsal striatum and other brain regions while subjects prepared for emotional (vs neutral) prosody (see Fig. 5; see Materials and Methods). The right ventral and dorsal striatum showed stronger ipsilateral connectivity between each other and with the anterior superior temporal gyrus, temporal pole, and right anterior insula. The right dorsal striatum additionally connected to the right Broca homolog (BA44) and to the contralateral orbitofrontal cortex and inferior temporal gyrus. Given that patients who suffer from lesions in these brain regions are dysprosodic, insular and anterior temporal integrity appears to be necessary to generate or at least mediate information necessary for generation of emotional prosody (Cancelliere and Kertesz, 1990). Connectivity results are detailed in Table 3.
Effects of emotional prosody production (execution phase)
Production of emotional prosody compared with neutral speech production increased BOLD responses bilaterally in inferior frontal gyrus (Fig. 3B; BA44, BA45, and BA47, extending into anterior insula), superior cerebellum, thalamus and globus pallidus, substantia nigra, and STS with a much larger extent from anterior to posterior in the right hemisphere. Lateralization analyses revealed stronger activation of the posterior right (vs left) STS (p < 0.01 FWE). We observed no other lateralization, even at lower threshold (p < 0.05 FWE). Note that these results are consistent with the nonlateralized bilateral inferior prefrontal network observed in perception studies (Wildgruber et al., 2006; Ethofer et al., 2009). Interestingly, the left sylvian parietotemporal area (SPT/TPJ), as an important component of the dorsal stream, was more strongly activated during production of emotional than of neutral prosody, but no other regions belonging to this fast sensorimotor translating system (i.e., dorsal premotor cortex, dorsal Broca's region; Hickok and Poeppel, 2007) showed this activity pattern. Details of activations are presented in Table 2.
During speech production, activity in right STS (xyz = [48 −32 0], t = 5.01) and cerebellar vermis (xyz = [2 −64 −34], t = 5.44) reflected modulation of fundamental frequency (Fig. 2B). This suggests that right lateralization of prosody-related activity is directly related to the acoustic features of the speech signal. Condition-specific changes in the other studied acoustic parameter, speech intensity, also affected activity in right STS (xyz = [48 −32 0], t = 5.95) and cerebellar vermis (xyz = [2 −62 −34], t = 6.88), as well as in the bilateral inferior frontal gyrus (BA47; xyz = [±48 32 −6], t = 5.15/5.45) and superior temporal gyrus (xyz = [±64 −12 2], t = 5.04/6.28). The time courses of BOLD responses in ventral striatum and STS (Fig. 4) confirmed an involvement of the limbic region during preparation only, whereas STS was activated during both preparation and actual production.
Striatal connectivity during production of emotional prosody (execution phase)
The right ventral striatum mainly connected to the dorsal striatum (Fig. 5). The dorsal striatum in turn was strongly connected to the bilateral articulatory motor and right auditory cortices but was also connected with limbic structures such as the right anterior hippocampus and bilateral amygdala. Connectivity results are fully detailed in Table 3.
Striatal gradient in affective processing
We tested whether there were distinguishable motor-related dorsal and limbic ventral striatal subdivisions. We hypothesized that if BG loops were spatially segregated (Alexander et al., 1990), we should observe separable activations in the dorsal versus ventral striatum during emotion induction only. Nevertheless, the ANOVA revealed a significant main effect of coordinate (p < 0.001) characterized by a significant linear increase (p < 0.001) of bilateral striatal preparatory activity from dorsal to ventral striatum (Fig. 6). This linear increase disappeared during prosody production (p > 0.2). There was also a laterality-by-coordinate interaction (p = 0.008), driven by a marginally stronger response of the left (vs right) ventral striatum (p = 0.13) while no difference was present in dorsal striatum (p = 0.4).
fMRI results of Experiment 2
Specificity of effects related to emotional prosody
We tested in 26 additional subjects whether the activity pattern observed in Experiment 1 was specific to emotional prosody or could also be detected for linguistic prosody. We used an existing dataset that had been acquired before Experiment 1 and that has partly been used for publication previously (Kell et al., 2011). The whole-brain analysis showed that the ventral striatum was again more active during preparation for happy than neutral prosody (Kell et al., 2011), and dissociated preparation for emotional from preparation for linguistic prosody (p < 0.01 FWE), while no difference was detected during the execution of emotional versus linguistic prosody. We also reproduced our results for the contrast comparing the execution of happy versus neutral prosody except for the activation in dorsal ventral premotor cortex (which activated at p < 0.001, uncorrected).
The ROI analyses (Fig. 7) revealed that although the bilateral ventral striatum was also slightly pre-activated for linguistic and neutral prosody (with no significant difference between neutral and linguistic prosody, adjusted p > 0.01) it was mostly involved in the preparation for happy prosody (preparation for happy > linguistic prosody: left: t(25) = 3.99, adjusted p = 0.002; right: t(25) = 3.63, adjusted p = 0.005). The bilateral dorsal striatum was equally pre-activated for emotional and linguistic prosody (adjusted p > 0.16), suggesting a more cognitive role of the dorsal compared with the ventral striatum. During production of happy prosody, the bilateral ventral and dorsal striatum did not increase activity compared with linguistic baseline (p > 0.01). Again, we observed no striatal lateralization (p > 0.15).
Discussion
The present study investigated the neural substrates involved in the generation of emotional prosody. We dissociated affective from sensorimotor aspects of this speech behavior to test the assumptions of two existing, mostly lesion-based models of emotional prosody generation: a first model that emphasizes the role of BG, together with anterior temporal regions and insula (Cancelliere and Kertesz, 1990), and a second model proposing that emotional prosody is a dominant function of the right hemisphere (Ross, 1981; Shapiro and Danly, 1985; Borod et al., 2002). We found support for both models, as we show a strong bilateral involvement of BG, limbic regions, temporal pole, and anterior insula during preparation for emotional prosody and right-lateralized auditory feedback-related processing during actual production of prosodic speech.
Our results support the BG model, as these structures appear particularly involved in the early phase of emotional prosody generation preceding linguistic or sensorimotor processing. Although we did not find support for completely segregated BG loops, we attribute different roles to the ventral and dorsal parts of the striatum: while the ventral striatum is specifically related to emotion induction and not to linguistic or motor aspects of emotional prosody generation, the dorsal striatum is particularly sensitive to the latter. The limbic activity observed during the preparation phase in ventral striatum and amygdala is accompanied by processing in bilateral retrosplenial cortex, hippocampi, and anterior temporal poles, which in turn could be interpreted as self-referential semantic processing subserving emotion induction. This network has been consistently found in tasks requiring retrieval of autobiographical experience, processing of emotional stimuli, and episodic memory (Maddock, 1999; Frith and Frith, 2003; Vann et al., 2009). Lesions of this network (temporal pole and anterior insula) have been identified as the second most frequent cause of dysprosody after BG lesions (Cancelliere and Kertesz, 1990). Note also that we cannot exclude the alternative explanation that the ventral striatum activation may reflect the mismatch of reading a neutral sentence aloud with a strong emotion rather than mood induction per se. Participants may have experienced the emotion condition as motivating and funny, which may explain why we failed to observe significant emotion-specific effects. We also show that the ventral striatum does not lateralize when preparing for emotional prosody and is functionally connected via the dorsal striatum to a bilateral fronto-insular network that likely codes task rules (Dosenbach et al., 2006).
Importantly, we did not find increased connectivity with attentional networks, which would have been expected if the emotional tasks were more difficult than the neutral condition. This does not entirely exclude the possibility that difficulty modulated responses in the observed networks. We assume nevertheless that training the participants considerably attenuated difficulty effects.
The ventral striatum receives information from a considerable array of limbic structures (Cardinal et al., 2002). It is tightly linked to motivation, reward processes, and pleasure (Kringelbach and Berridge, 2009). It has also been related to recognition and expression of affects. Patients with lesions in the ventral striatum (but not dorsal parts of BG) show deficits in emotion recognition (whether expressed through the face or voice) as well as increased or decreased emotional experiences (Calder et al., 2004). Note that although we lack sufficient spatial resolution to identify precisely which BG nuclei are involved during emotion induction, the cluster's maximum appears centered on the ventral pallidum rather than the nucleus accumbens. Ventral pallidum, which is part of the limbic BG territory (Parent and Hazrati, 1995; Yelnik, 2008; Smith et al., 2009), receives inputs from other limbic regions including nucleus accumbens, orbitofrontal cortex, amygdala, and lateral hypothalamus. It connects to the brainstem and to the prefrontal cortex indirectly via the thalamus (Smith et al., 2009). The ventral pallidum thus likely plays a considerable role in translating limbic signals into motor output (Mogenson and Yang, 1991).
Furthermore, alteration of BG dopaminergic transmission in PD patients has often been associated with changes in emotional experience (both in terms of subjective experience and physiological arousal) and impaired recognition of emotions conveyed by faces or voices (for review, see Péron et al., 2012). Note that deficits in emotion recognition are not specific to PD and have also been reported in other pathologies involving disturbed dopaminergic functioning such as autism, Huntington's disease, or schizophrenia (Edwards et al., 2002; Johnson et al., 2007; Harms et al., 2010). In PD, deficits in spontaneous expression of emotional prosody arise independently from affective state and without alteration of semantic understanding or ability to imitate emotional prosody when an external model is available (Blonder et al., 1989; Smith et al., 1996; Simons et al., 2003; Möbes et al., 2008). This indicates that the disease cannot be regarded as purely motor, but involves additional deficits at the interface between the limbic and the motor system. Our data indicate that a dysfunctional interplay between ventral and dorsal striatum could underlie PD patients' dysprosody.
Once emotion is induced and speech material available for articulation (execution phase), subcortical integration of affective with sensorimotor speech processing seems to center around the dorsal striatum. The cortical network engaged in sensorimotor processing underlying the slow prosodic speech modulations involves bilateral inferior frontal gyri and superior temporal gyri and sulci, as well as superior cerebellum. This network, together with BG, has previously been associated with control of speech rhythm and speed during syllable repetitions (Ackermann and Riecker, 2010). Interestingly, it is only in this sensorimotor network that we found support for the right-lateralization model of prosody. Right lateralization in the entire system was confined to right posterior STS when participants produced emotionally charged speech. The right STS receives direct input from auditory cortex, is integrated in the ventral speech processing stream (Hickok and Poeppel, 2007), and is sensitive to prosodic speech features during speech perception (Wildgruber et al., 2006). Strikingly, it is this brain region (together with the cerebellum) that is consistently sensitive to acoustic speech features in our study and a previous perceptual study (Wiethoff et al., 2008). It is thus conceivable that this brain region monitors slow (prosodic) modulations of auditory feedback during ongoing speech production. This suggests that previously reported right lateralization of prosody processing (be it frontal or temporal) may relate more to sensory features of the acoustic signal than to emotional valence. Note that in addition to intensity and modulation of fundamental frequency, other correlated acoustic speech parameters could contribute to this right lateralization as hidden variables.
One attractive model that could explain the right-hemisphere advantage of prosody control is the double filtering by frequency hypothesis (Ivry and Robertson, 1999) and its expansion in the speech domain, the asymmetric sampling in time hypothesis (Poeppel, 2003): both models propose an efficient processing of relatively lower frequencies in the right hemisphere (suprasegmental processing) while the left hemisphere would have an advantage for the processing of relatively higher frequencies (segmental processing). The slow modulations in fundamental frequency and intensity underlying prosody could thus be efficiently decoded in the right hemisphere while the left hemisphere succeeds better in phonemic processing that in turn relies more heavily on relatively higher frequencies of the speech signal. This would suggest that during speech production, the right hemisphere could provide auditory feedback regarding the relatively slower modulations of the auditory stream (i.e., prosody) while relatively faster modulations of the speech signal are fed back from the left auditory cortex. Consistent with this model, we observed activation of the right-lateralized posterior STS during prosody production. Yet, in addition to this activity in the right ventral speech processing stream, prosody also recruited left hemispheric ventral speech regions including the superior temporal gyrus and sulcus extending into the insula and ventral portions of the inferior frontal gyrus. The bilateral ventral speech stream is primarily thought to map auditory speech representations onto lexical conceptual representations for speech comprehension (Hickok and Poeppel, 2007; Rauschecker and Scott, 2009). Our data indicate that during higher prosodic demands, the bilateral ventral streams become engaged in feedback processing (Hickok, 2012). We therefore hypothesize that both ventral streams are able to integrate relatively slower modulations of the speech signal that characterize prosody.
Nevertheless, right lateralization of the ventral streams is observed during increased processing due to prosodic control. This could result from the fact that during speech production the left hemisphere is primarily engaged in fast sensorimotor processing in the left-lateralized dorsal stream, which could map rapidly changing (high-frequency) properties of the acoustic speech signals to frontal lobe articulatory networks (Hickok and Poeppel, 2007). Supposing parallel processing in the two hemispheres, the right ventral stream could have consequently become specialized in analyzing the slow modulations of the acoustic speech spectrum that underlie prosody and in mapping these sensory representations to motor representations. It is interesting to note that increased prosodic processing does not further mobilize the left-lateralized dorsal stream except for its proposed input region, area SPT/TPJ. This region could potentially contribute to sensorimotor processing in the ventral stream in addition to its known involvement in sensorimotor mapping within the dorsal stream. Yet, given that right SPT/TPJ is not engaged during prosody generation, its activity does not seem to constitute a prerequisite for ventral stream processing.
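The distinction between relatively slow and relatively fast modulations of a signal, which underlies the frequency-based accounts discussed above, can be made concrete with a minimal sketch. The snippet below separates a synthetic signal into a slow component (moving-average low-pass) and a fast residual. The sampling rate, the 3 Hz and 40 Hz components, the window length, and the function names are arbitrary choices for illustration; this is not a model of hemispheric processing and not an analysis performed in this study.

```python
import math

def moving_average(signal, window):
    """Centered moving average; edges use only the samples that are available."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def split_slow_fast(signal, window):
    """Return (slow, fast) such that slow[i] + fast[i] equals signal[i]."""
    slow = moving_average(signal, window)
    fast = [s - m for s, m in zip(signal, slow)]
    return slow, fast

rate = 1000  # samples per second (hypothetical)
t = [i / rate for i in range(rate)]
# Synthetic signal: a slow 3 Hz modulation plus a weaker, fast 40 Hz modulation
signal = [math.sin(2 * math.pi * 3 * x) + 0.3 * math.sin(2 * math.pi * 40 * x) for x in t]

# A 25-sample window spans one full 40 Hz cycle at this rate, so the fast
# component averages out while the 3 Hz component is largely preserved.
slow, fast = split_slow_fast(signal, window=25)
```

Away from the edges, the slow output closely follows the 3 Hz component, and the residual carries the 40 Hz component.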
Conclusion
Our findings support both the BG and the lateralization hypothesis, yet for different aspects of prosody processing. Emotional prosody generation involves affective components that are linked to bilateral ventral striatum and other limbic and paralimbic structures. On the other hand, dorsal striatum is more sensitive to cognitive and motor aspects of emotional prosody preparation and production and is more strongly connected to the sensorimotor speech network compared with the ventral striatum. Right lateralization is limited to the right STS, in which activity reflects acoustic speech parameters, suggesting that it is sensorimotor and not affective processing that right lateralizes prosodic computations. Our results also suggest that dysprosodic syndromes could arise from two different types of lesion. First, a lesion perturbing a limbic-semantic network, which comprises anterior insular and temporal cortices connected to limbic (ventral) portions of the BG, would yield an emotional prosodic production deficit on the basis of disturbed integration of affect into motor routines. Second, a lesion involving a right-lateralized sensorimotor network more strongly connected to dorsal striatum would lead to dysprosody on the basis of perturbed sensorimotor processing of relatively slower modulations of the speech signal.
Footnotes
S.P. is supported by the Academic Society of Geneva (Fund Foremane) and by the Swiss Centre for Affective Sciences. S.P. would also like to thank Patrik Vuilleumier for his support. C.A.K. is supported by the Medical Faculty of Goethe University, Frankfurt. We thank Christiane Arnold, Marion Behrens, Sebastian Hoffer, Christian Keller, Maritza Darquea, and Frederic von Wegner for help.
- Correspondence should be addressed to Dr. Swann Pichon, Laboratory for Behavioral Neurology and Imaging of Cognition, Department of Neuroscience, Medical School, University of Geneva, 1 rue Michel-Servet, 1211 Geneva 4, Switzerland. swann.pichon{at}gmail.com