Understanding the emotional expressions of others is an essential capacity of the human brain. Not only is it a prerequisite for social interaction and communication, but it can also be critical for survival: emotional expressions may signal immediate danger, either from an external source (conveyed by others' fear) or directly from an opponent (conveyed by anger). Because emotions are conveyed via multiple channels, including face, voice, and body posture, it has been assumed that this information is integrated into a coherent percept of emotional expression. Nevertheless, most previous research has dealt with only one stimulus modality at a time. Although this yields information about the neural pathways specific to processing emotional cues from each single modality, two questions remain: how is information from different modalities integrated (multimodally), and are emotions also coded at a more abstract level, beyond modality-specific stimulus characteristics (supramodally)?
The latter question was recently addressed in an article by Peelen et al. (2010). The authors presented video clips of either faces or body postures, or audio clips of vocalizations, expressing anger, disgust, fear, happiness, or sadness. Although these stimuli were presented separately, as in previous studies, they were all shown to the same participants in a single experiment, allowing direct comparison of the different modalities. The use of dynamic visual stimuli offers two advantages over static displays. First, and most important for the study, it establishes compatibility between the visual and auditory stimuli, as acoustic signals necessarily evolve over time. Second, it strongly increases ecological validity, as real-world interactions rarely involve static stimulation. The use of five different emotions provides a more comprehensive view of affective processing than the usual positive–negative dichotomy.
Participants' ratings of the stimuli indicated equivalent emotional intensity in faces, body postures, and voices across emotional conditions. To identify brain regions that exhibit emotion-specific activation independent of the modality of presentation, Peelen et al. (2010) performed an fMRI-based multivoxel pattern analysis (Kriegeskorte et al., 2006). In contrast to conventional fMRI analysis, which searches for regions in which mean activity, averaged across a number of voxels, increases in an experimental condition, this approach searches for regions in which the pattern of activity (i.e., the pattern of increases and decreases in activity across voxels) contains information about an experimental condition. Here, Peelen et al. (2010) applied a variant of this analysis in which they first correlated the activation patterns related to each experimental condition with one another (for the resulting correlation matrices, see their supplemental Fig. 1, available at www.jneurosci.org as supplemental material). In the next step, the authors searched for regions showing high correlations between the different modalities for each specific emotion, e.g., regions displaying a high correlation between activity for fear in face, voice, and body posture. Finally, those regions that showed high correlations across modalities for all of the different emotional categories were identified (i.e., fear correlated highly across modalities, and so did anger, disgust, happiness, and sadness). This analysis revealed two regions that contained supramodal information about all the presented emotions, independent of the modality of presentation: the medial prefrontal cortex (MPFC) and the left posterior superior temporal sulcus (STS). Although these regions have been implicated in affective processing before, these data now show that they contain supramodal, abstract representations of emotion. These regions may, therefore, play a key role in understanding others' emotions.
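To make the logic of this correlation-based analysis concrete, the following minimal Python sketch illustrates one way a single searchlight could be scored for supramodal emotion information. It is not the authors' actual pipeline; the function names, the within-minus-between correlation criterion, and the toy data are our own assumptions for illustration.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness"]
MODALITIES = ["face", "body", "voice"]

def cross_modal_emotion_scores(patterns):
    """patterns: dict mapping (emotion, modality) -> 1-D voxel pattern
    extracted from one searchlight sphere.

    For each emotion, returns the mean within-emotion, cross-modal
    correlation minus the mean between-emotion, cross-modal correlation.
    A positive score for every emotion would mark this searchlight as
    carrying supramodal emotion information under this simple criterion.
    """
    scores = {}
    for emo in EMOTIONS:
        within, between = [], []
        for i, m1 in enumerate(MODALITIES):
            for m2 in MODALITIES[i + 1:]:
                # same emotion, different modalities
                within.append(np.corrcoef(patterns[(emo, m1)],
                                          patterns[(emo, m2)])[0, 1])
                # different emotion, different modalities
                for other in EMOTIONS:
                    if other != emo:
                        between.append(np.corrcoef(patterns[(emo, m1)],
                                                   patterns[(other, m2)])[0, 1])
        scores[emo] = float(np.mean(within) - np.mean(between))
    return scores

# Toy usage: random patterns for a 100-voxel searchlight (no real structure,
# so the scores will hover around zero).
rng = np.random.default_rng(0)
toy = {(e, m): rng.standard_normal(100) for e in EMOTIONS for m in MODALITIES}
print(cross_modal_emotion_scores(toy))
```

In the published analysis, a region is reported only if the criterion holds for all five emotions; the sketch mirrors that logic by returning one score per emotion.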
However, real-life social encounters almost always involve multiple modalities and can thus benefit from the combined use of sensory cues from face, voice, and body posture. These channels supply information simultaneously, which requires multimodal integration to form a coherent emotional percept. Because Peelen et al. (2010) presented their stimuli in only one modality at a time, a critical question is how their data relate to integrative processing. A more detailed functional differentiation of the MPFC and STS based on the existing literature may help to elucidate this issue.
Superior and middle temporal regions have been repeatedly linked to audiovisual integration in speech perception and also in face–voice integration for person identification (for review, see Campanella and Belin, 2007). One of the few studies that investigated multimodal integration of affective signals (Kreifelts et al., 2007) presented emotional faces and emotional voices separately and simultaneously. In addition to modality-specific activations, a supra-additive effect of the combined face–voice presentation was detected in a posterior superior temporal region close to the one reported by Peelen et al. (2010). Even though the process of multimodal integration is distinct from a supramodal representation of emotion, it is conceivable that an abstract representation of emotion may be the product of the integration process. The overlap of activations related to face–voice integration and to the abstract representation drawn from faces, voices, and body postures suggests that the observed activation in posterior STS reflects this late processing stage of multimodal integration.
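For readers unfamiliar with the term, the supra-additivity criterion commonly used in multisensory fMRI can be stated in one line: the response to the combined audiovisual stimulus must exceed the sum of the unimodal responses. The toy sketch below is our own illustration with hypothetical beta estimates; Kreifelts et al. (2007) describe the details of their actual contrast.

```python
def is_supra_additive(beta_av: float, beta_a: float, beta_v: float) -> bool:
    """A region is called supra-additive when the audiovisual response
    exceeds the sum of the auditory-only and visual-only responses."""
    return beta_av > beta_a + beta_v

# Hypothetical beta estimates (arbitrary units):
print(is_supra_additive(beta_av=2.5, beta_a=1.0, beta_v=1.2))  # True
print(is_supra_additive(beta_av=1.8, beta_a=1.0, beta_v=1.2))  # False
```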
The functional role of the observed MPFC activation, in contrast, may be approached by asking what abstract representations of emotion are required for. Here, we suggest that they make emotion accessible to higher cognitive functions that abstract from stimulus specifics to evaluate how others feel. For example, to evaluate a person's emotional state, it is necessary to label it “angry”, abstracting from the specific information carried by face, voice, and body posture. Interestingly, processing emotional words (including emotion labels) activates regions in the MPFC and anterior cingulate cortex (ACC) (Nakic et al., 2006), similar to the MPFC activation described by Peelen et al. (2010), which also extends into the ACC (see their Fig. 3). The explicit task used in the study (an intensity rating with given emotion labels), rather than a more implicit task, might have promoted the activation of abstract conceptual knowledge about emotional categories and, thereby, activity in the MPFC/ACC. These results are compatible with the notion of the MPFC/ACC as an “interface between emotion and cognition” (Allman et al., 2001), which is also suggested by its connectivity with prefrontal cortex and limbic regions, including the amygdala, and its involvement both in regulating emotion and in mediating the influence of emotion on cognitive control (Kanske and Kotz, 2010). Measuring MPFC/ACC activation under different task demands could elucidate this issue: if the role of the MPFC is to provide an abstract representation of emotion for use in higher cognitive functions such as emotion labeling (Peelen et al., 2010), then implicit tasks such as passive viewing or listening should make this function dispensable and abolish MPFC activation.
One peculiarity of the study by Peelen et al. (2010) is the lack of significant activations in several regions that have previously been implicated in emotion processing across different modalities, foremost the amygdala. One may ask whether the multivariate pattern analysis as performed by the authors was biased toward larger brain structures, both because of the large radius of the pattern searchlight [8 mm, in contrast to the 4 mm recommended by Kriegeskorte et al. (2006)] and because of smoothing, which removes the fine structure of activation patterns. The authors suggest that the amygdala was not activated because the intensity of the different emotions was equivalent. To substantiate this speculation, it would have been useful to compare activation by the emotional categories with a low-intensity, emotionally neutral condition. This condition was part of the experiment but, unfortunately, was not included in the analysis.
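To illustrate why an 8 mm searchlight can work against small structures such as the amygdala, a back-of-the-envelope calculation compares the number of voxels pooled by 4 mm and 8 mm spheres. The 3 mm isotropic voxel size assumed here is chosen purely for illustration and need not match the acquisition parameters of the study.

```python
import numpy as np

def voxels_in_sphere(radius_mm, voxel_mm=3.0):
    """Count voxel centers that fall within radius_mm of a central voxel,
    assuming isotropic voxels of size voxel_mm."""
    r = int(np.ceil(radius_mm / voxel_mm))
    offsets = np.arange(-r, r + 1) * voxel_mm
    x, y, z = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    return int(np.sum(x**2 + y**2 + z**2 <= radius_mm**2))

for radius in (4, 8):
    print(f"{radius} mm searchlight: {voxels_in_sphere(radius)} voxels")
# With 3 mm voxels this gives roughly 7 vs. 81 voxels: the larger sphere
# pools signal over a much wider neighborhood, which can dilute the
# contribution of small structures such as the amygdala.
```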
Finally, it should be noted that the methodological approach taken in the study also has the potential to identify regions that encode individual emotions across modalities. Peelen et al.'s (2010) particular research question demanded a search for regions that were activated by all of the presented emotions; potential differences between the representations of different emotions were therefore not taken into account. However, certain basic emotions have been shown to engage quite specific neural circuits. Disgust, for example, consistently activates the insula across modalities (Sambataro et al., 2006). Whether part of this activation is truly supramodal could be elegantly tested with the approach presented by Peelen et al. (2010).
In conclusion, Peelen et al. (2010) have taken a necessary step in the study of how emotional information from different channels is processed and integrated. The finding of supramodal representations of emotion in the brain is encouraging and will fuel further investigation of multimodal stimulus processing. The particular approach taken by the authors will surely support this important endeavor toward a more ecologically valid neuroscience.
Footnotes
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
This work was supported by Deutsche Forschungsgemeinschaft Grant We3638/3-1 (P.K.) and Deutsche Forschungsgemeinschaft Grant FOR 499-Ko2268/1-3 (A.S.H.).
Correspondence should be addressed to Philipp Kanske, Department of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Square J5, 68159 Mannheim, Germany. philipp.kanske@zi-mannheim.de