Abstract
The visual cortex is sensitive to emotional stimuli. This sensitivity is typically assumed to arise when amygdala modulates visual cortex via backwards connections. Using human fMRI, we compared dynamic causal models of the connectivity underlying sensitivity to fearful faces. This model comparison tested whether amygdala modulates distinct cortical areas, depending on dynamic or static face presentation. The ventral temporal fusiform face area showed sensitivity to fearful expressions in static faces. However, for dynamic faces, we found fear sensitivity in dorsal motion-sensitive areas within hMT+/V5 and superior temporal sulcus. The model with the greatest evidence included connections modulated by dynamic and static fear from amygdala to dorsal and ventral temporal areas, respectively. According to this functional architecture, amygdala could enhance encoding of fearful expression movements from video and of the form of fearful expressions from static images. The amygdala may therefore optimize visual encoding of socially charged and salient information.
Introduction
Emotional images enhance responses in visual areas, an effect typically observed in the fusiform gyrus for static fearful faces and ascribed to backwards connections from amygdala (Morris et al., 1998; Vuilleumier and Pourtois, 2007). Although support for amygdala influence comes from structural connectivity (Amaral and Price, 1984; Catani et al., 2003), functional connectivity (Morris et al., 1998; Foley et al., 2012), and path analysis (Lim et al., 2009), directed connectivity measures and formal model comparison are still needed to show that backwards connections from amygdala are more likely than other architectures to generate cortical emotion sensitivity.
Moreover, it is surprising that the putative amygdala feedback would enhance fusiform cortex responses. According to the prevailing view, a face-selective area in fusiform cortex, the fusiform face area (FFA), is associated with processing facial identity, whereas dorsal temporal regions, particularly in the superior temporal sulcus (STS), are associated with processing facial expression (Haxby et al., 2000). An alternative position is that fusiform and STS areas both contribute to facial expression processing but contribute to encoding structural forms and dynamic features, respectively (Calder and Young, 2005; Calder, 2011). In this case, static fearful expressions may enhance FFA encoding of structural cues associated with emotional expression. We therefore characterized the conditions under which amygdala mediates fear sensitivity in fusiform cortex, compared with dorsal temporal areas (Sabatinelli et al., 2011).
We asked whether dynamic and static fearful expressions enhance responses in dorsal temporal and ventral fusiform areas, respectively. One dorsal temporal area, hMT+/V5, is sensitive to both low-level and facial motion and may be homologous to the middle temporal (MT), medial superior temporal (MST), and fundus of the superior temporal (FST) areas in the macaque (Kolster et al., 2010). Another dorsal area, the posterior STS, is responsive generally to biological motion (Giese and Poggio, 2003). Compared with dorsal areas, the fusiform gyrus shows less sensitivity to facial motion (Schultz and Pilz, 2009; Trautmann et al., 2009; Pitcher et al., 2011; Foley et al., 2012; Schultz et al., 2012). Despite its association with facial identity processing, many studies have shown that FFA contributes to processing facial expressions (Ganel et al., 2005; Fox et al., 2009b; Cohen Kadosh et al., 2010; Harris et al., 2012) and may have a general role in processing facial form (O'Toole et al., 2002; Calder, 2011). Sensitivity to static fearful expressions in the FFA may reflect this role in processing static form. If so, then dynamic fearful expressions may evoke fear sensitivity in dorsal temporal areas instead, reflecting the role of these areas in processing motion.
Our fMRI results confirmed our hypothesis that dorsal motion-sensitive areas showed fear sensitivity for dynamic facial expressions, whereas the FFA showed fear sensitivity for static expressions. To identify connectivity mechanisms that mediate fear sensitivity, we used dynamic causal modeling (DCM) to compare 508 plausible connectivity architectures. Our Bayesian model comparison identified the most likely model, in which dynamic and static fear modulated connections from amygdala to dorsal and ventral areas, respectively. The amygdala may therefore control how behaviorally relevant information is visually coded in a context-sensitive fashion.
Materials and Methods
Participants.
fMRI data were collected from 18 healthy, right-handed participants (>18 years, 13 female) with normal or corrected-to-normal vision. Experimental procedures were approved by the Cambridge Psychology Research Ethics Committee.
Imaging acquisition.
fMRI data were collected using a 3T Siemens Tim Trio MRI scanner and a 32-channel coil. We collected whole-brain T2*-weighted echo-planar imaging volumes with 32 oblique axial slices (3.5 mm thick, 64 × 64 in-plane matrix with 3 × 3 mm resolution, TR 2 s, TE 30 ms, flip angle 78°). T1-weighted MP-RAGE structural images were acquired with 1 mm³ voxels. The first five "dummy" volumes were discarded to allow for magnetic equilibration.
Experimental design.
The experiment used a block design, with 18 "main experiment" runs and two "localizer" runs. We chose a block design because it is statistically the most efficient design for convolution models, such as DCM (Mechelli et al., 2003). All blocks were 11 s, comprised eight 1375 ms presentations of grayscale stimuli, and were followed by a 1 s interblock fixation interval. Participants fixated on a gray dot in the center of the display, overlaying the image, and pressed a key when the dot turned red, which occurred on a random one-third of stimulus presentations. In each localizer run, participants viewed six types of blocks, each presented six times. Face blocks contained dynamic facial expressions taken from the Amsterdam Dynamic Facial Expression Set (van der Schalk et al., 2011) or the final static frames from the dynamic facial videos, capturing the expression apexes. Eight different identities (four male, four female) changed from neutral to disgust, fearful, happy, or sad expressions. The eight identities and four expressions appeared in a pseudo-random order, with each of the four expressions appearing twice. Object blocks included eight dynamic objects used in a previous study (Fox et al., 2009a) or the final static frames from the dynamic object videos, shown in a pseudo-random order. The low-level motion blocks consisted of dynamic random-dot pattern videos containing motion-defined oriented gratings. The stimuli depicted 50% randomly luminous pixels, which could move at one frame per second horizontally, vertically, or diagonally left or right. Oriented gratings were defined by moving the dots within four strips of pixels in the opposite direction to the rest of the display, but at the same rate (Van Oostende et al., 1997). Each motion direction was shown twice per block in a pseudo-random order. There were also corresponding low-level static blocks composed of the final static frames from the low-level motion videos.
The remaining runs comprised the main experiment and allowed us to measure expression-specific responses in the ROIs defined by the localizer data. Each main experiment run had 12 blocks, each containing a distinct type of stimulus, presented in a pseudo-random order. Six of the blocks contained faces, using the same four female and four male identities as in the localizer runs. In each block, all faces were either dynamic or static and showed just one of three expressions: disgust, happy, or fearful. The remaining six blocks were Fourier phase-scrambled versions of each of the six face blocks (dynamic videos were phase-scrambled in three dimensions). There was some overlap between face stimuli used in the localizer and main experiment runs; thus, the face-selective voxels we observed in the localizer may show a preference in favor of the faces we used in the main experiment.
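For concreteness, the timing of a main experiment run can be summarized in a few lines of MATLAB. This is a minimal sketch of the timing arithmetic only, not the actual stimulus presentation code; variable names are illustrative.

```matlab
% Minimal sketch of the block timing arithmetic for one main experiment
% run: each block is 8 x 1.375 s = 11 s, followed by 1 s of fixation.
nBlocks  = 12;                                 % blocks per main experiment run
stimDur  = 1.375;                              % s per stimulus presentation
blockDur = 8 * stimDur;                        % 11 s per block
isi      = 1;                                  % interblock fixation interval, s
onsets   = (0:nBlocks-1) * (blockDur + isi);   % block onsets, s
runDur   = nBlocks * (blockDur + isi);         % 144 s per run
```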
After scanning, participants made speeded categorizations of the emotion expressed in the dynamic and static faces as disgust, happy, or fearful, and rated its emotional intensity on a 1–9 scale. They also rated on a 1–9 scale the intensity of the motion they perceived in each of the dynamic stimuli. Stimuli were presented for the same duration as in the fMRI experiment, and the next stimulus appeared once the participant completed a rating.
Preprocessing and analysis.
We performed preprocessing and analysis using SPM8, DCM10 (Wellcome Trust Centre for Neuroimaging, London; http://www.fil.ion.ucl.ac.uk/spm/) and MATLAB (MathWorks). Data were motion- and slice-time corrected, spatially normalized to an EPI template in MNI space, smoothed with an 8 mm full-width at half-maximum kernel, and analyzed using the general linear model. At the first (within-subject) level, regressors were constructed by convolving the onset times and durations of the different experimental blocks with a canonical hemodynamic response function. Contrasts of interest were computed for each participant and tested at the random effects (between-subject) level, using one-sample t tests.
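As an illustration of this convolution model, the following sketch builds one block regressor from its onsets using SPM's canonical HRF (spm_hrf). The run length and variable names are hypothetical; the actual analysis used SPM8's standard first-level machinery.

```matlab
% Illustrative sketch: a boxcar over block onsets convolved with the
% canonical HRF, then resampled to one value per TR (assumes SPM8 on
% the MATLAB path).
TR     = 2;                                % s, as acquired
onsets = (0:11) * 12;                      % block onsets, s (11 s block + 1 s gap)
nScans = 72;                               % 144 s run / 2 s TR (hypothetical)
dt     = TR / 16;                          % microtime resolution, s
hrf    = spm_hrf(dt);                      % canonical HRF sampled every dt s
boxcar = zeros(nScans * 16, 1);
for k = 1:numel(onsets)
    idx = round(onsets(k) / dt) + (1:round(11 / dt));
    boxcar(idx) = 1;                       % "on" for the 11 s block duration
end
reg = conv(boxcar, hrf);
reg = reg(1:16:nScans * 16);               % one value per TR
```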
At the first level, we identified face-selective ROIs from the localizer runs in the right occipital and fusiform face areas (OFA, FFA) and in the right posterior STS, by contrasting the average response to dynamic and static faces versus the average response to dynamic and static objects and random-dot patterns. We also identified an ROI showing motion sensitivity to faces in the vicinity of area hMT+/V5 (here labeled V5f) by contrasting dynamic versus static faces in the localizer runs. Although the posterior STS was also apparent in this contrast, we defined our STS ROI for further analysis using the peak face selectivity in posterior STS (see Fig. 1). Thirteen of the 18 participants showed all four ROIs in the right hemisphere, and further analyses focused on right hemisphere ROIs. This maximized the available data, as right hemisphere ROIs were most commonly identified across participants, a finding consistent with the well-known right hemisphere dominance in face perception (Kanwisher et al., 1997). These ROIs were defined as 9 mm spheres surrounding the peak coordinates. The amygdala ROI was anatomically defined, using an 8 mm sphere centered on MNI 23, −1, −22. We summarized these findings by analyzing localizer runs at the group level. Statistical parametric maps from this group analysis (see Fig. 1) were thresholded at p < 0.001 (for display purposes) and reported if significant at p < 0.05 familywise error corrected at the cluster level (Brett et al., 2003). We further illustrate the localizer findings by showing results computed from individually defined ROIs (see Fig. 2). We ascertained sensitivity to fearful facial expressions by performing general linear model analyses, including ANOVAs, on main experiment run data extracted from the individually defined ROIs.
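The spherical ROI definition can be sketched as follows. This example builds a 9 mm sphere around an example peak on a hypothetical 3 mm grid in MNI coordinates; in practice, the sphere is constructed in each image's own voxel space via its MNI-to-voxel transform.

```matlab
% Sketch of a 9 mm spherical ROI mask around an individual peak
% (illustrative grid and peak, not the actual ROI code).
peak   = [50 -62 6];                       % example peak (right V5f), MNI mm
radius = 9;                                % mm
[x, y, z] = ndgrid(-78:3:78, -112:3:76, -50:3:84);
mask = (x - peak(1)).^2 + (y - peak(2)).^2 + (z - peak(3)).^2 <= radius^2;
```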
We used DCM (Friston et al., 2003) to characterize the influence of the amygdala on visual cortical responses. We tested a large model space, in which all models included dynamic and static face inputs to OFA and a dynamic face input to V5f. We also explored two supplementary model spaces: one adding a static face input to V5f and the other using all 18 participants with ROI locations based on the group localizer results. Comparisons using both of these model spaces verified our main findings with respect to the amygdala (see Results). We systematically varied which connections were bilinearly modulated by dynamic or static fear. To test which connections give rise to fear-sensitive responses to dynamic faces in V5f and STS, we tested 15 model variants with dynamic fear modulation (Table 1, left column). These included seven models with modulation of every combination of connections from OFA, V5f, and amygdala, plus a model in which only the connections between V5f and STS were modulated, plus seven more models with modulation of every combination of connections from OFA, V5f, and amygdala together with further modulation of the connections between V5f and STS. To test how static fear modulation could give rise to fear sensitivity to static faces in FFA, we tested 15 variants with static fear modulation on every combination of connections from OFA, V5f, STS, and amygdala to FFA (Table 1, right column). The dynamic fear and static fear variants were crossed to create 225 combinations. To accommodate a putative subcortical pathway to the amygdala processing facial expressions of emotion (Morris et al., 1999), we tested two model variants in which there was either no exogenous input to the amygdala or faces served as an exogenous input to the amygdala. We also implemented two model variants that had either "full connectivity," with all possible endogenous connections (450 models), or sparse connectivity (58 models). These sparse models were based on a previous study (Furl et al., 2013) showing no endogenous connectivity between FFA and STS and only feedforward connections. For purposes of this model space, a feedforward connection was defined as one that propagates signals from an exogenous input, and so sparse models with an exogenous input to amygdala were equipped with additional endogenous connections from amygdala to cortex, which could also be bilinearly modulated by dynamic or static fear (Table 1). We combined all the aforementioned model variants in one model space, yielding 508 total models.
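The bookkeeping of this model space, and how one variant's modulations might be encoded using SPM's DCM conventions (the a/b/c matrices, indexed [target, source]), can be sketched as follows. The indices and names are illustrative, not the actual specification code; the variant shown has dynamic fear modulating amygdala connections to the dorsal areas and static fear modulating the connections into FFA.

```matlab
% Sketch of the model space count and of one variant's a/b/c encoding.
% Regions: 1 OFA, 2 FFA, 3 V5f, 4 STS, 5 AMY.
% Inputs:  1 dynamic faces, 2 static faces, 3 dynamic fear, 4 static fear.
nModels = 15 * 15 * 2 + 58;                % 450 full + 58 sparse = 508

a = ones(5);                               % full endogenous connectivity
b = zeros(5, 5, 4);                        % bilinear modulations, per input
b(3, 5, 3) = 1;  b(4, 5, 3) = 1;           % dynamic fear: AMY -> V5f, AMY -> STS
b(2, 5, 4) = 1;  b(2, 3, 4) = 1;           % static fear:  AMY -> FFA, V5f -> FFA
c = zeros(5, 4);                           % exogenous (driving) inputs
c(1, 1) = 1;  c(1, 2) = 1;                 % OFA: dynamic and static faces
c(3, 1) = 1;                               % V5f: dynamic faces
```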
We compared models at the level of individual models and as model families according to their relative log-evidences and posterior probabilities, under the assumption that our healthy participants shared the same connectivity architecture (Stephan et al., 2010). We estimated the log-evidence for each participant and each model by computing the free energy, a lower bound on the log-evidence that balances model fit against model complexity (Friston et al., 2003; Penny, 2012). The log-evidences, summed over participants, were compared as a generalization of the Bayes factor (Kass and Raftery, 1995) and expressed as posterior probabilities (under uniform priors over models, a model's posterior probability is proportional to its evidence).
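This fixed-effects comparison reduces to simple arithmetic on the free energies. A minimal sketch, with a placeholder free-energy matrix standing in for the estimated values (e.g., DCM.F as returned by spm_dcm_estimate for each fitted model):

```matlab
% Fixed-effects group comparison: sum log-evidences over participants,
% then normalize to posterior probabilities under uniform model priors.
F = randn(13, 508);                        % placeholder for real free energies
groupLogEv = sum(F, 1);                    % summed log-evidence (group Bayes factor)
r    = groupLogEv - max(groupLogEv);       % relative log-evidences
post = exp(r) / sum(exp(r));               % posterior probabilities
```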
Results
ROI specification
Using the localizer runs, we identified conventional face-selective areas (Kanwisher et al., 1997), including the right OFA, FFA, and an area in the posterior STS, by contrasting faces versus objects and random-dot patterns. We also used the localizer runs to identify motion-sensitive areas that showed greater responses to dynamic relative to static facial expressions. Figure 1a illustrates face selectivity and facial motion effects in the right hemisphere using a conventional (statistical parametric mapping) group-level, whole-brain analysis. Consistent with previous studies of facial motion (Schultz and Pilz, 2009; Trautmann et al., 2009; Pitcher et al., 2011; Foley et al., 2012; Schultz et al., 2012), we found bilateral areas sensitive to facial motion (V5f) in the vicinity of hMT+/V5 (right MNI: 50, −62, 6; left MNI: −50, −70, 6). We also detected motion sensitivity to faces in the right posterior STS (MNI: 48, −34, 0), an area whose biological motion sensitivity has been well documented (Giese and Poggio, 2003) and that has been shown to respond more to dynamic than static faces (Schultz and Pilz, 2009; Trautmann et al., 2009; Pitcher et al., 2011; Foley et al., 2012; Schultz et al., 2012). No motion sensitivity was observed in the vicinity of OFA or FFA, and no face selectivity was observed in hMT+/V5, even at liberal uncorrected thresholds (p < 0.05). In contrast, we observed overlapping face-selective and motion-sensitive voxels in the right STS (Fig. 1a,b). The amygdala also showed face selectivity in the group-level analyses (Fig. 1c).
The localizer results for individual participants were used to define functional ROIs in the right hemisphere for OFA, FFA, STS, and V5f, given the well-known right hemisphere dominance in face perception (Kanwisher et al., 1997). We also defined the right amygdala anatomically. We defined our STS ROIs using the peak face selectivity to maintain continuity with previous research, which has largely defined this region of the STS based on face-selective voxels (Haxby et al., 2000). All four ROIs could be defined in 13 of the participants. All of these participants showed overlapping facial motion and face-selective areas in the posterior STS.
Group-level ROI analyses
Using ANOVA on our localizer run data, we tested for motion (dynamic, static) × category (face, object, random-dot pattern) interactions and, when the interaction was not significant, we report main effects. OFA was motion sensitive and face selective (Fig. 2a), showing main effects of motion (F(1,60) = 21.27; p < 0.001) and category (F(2,60) = 81.98; p < 0.001). Significant motion × category interactions were observed in FFA (Fig. 2b; F(2,60) = 4.98; p = 0.009), V5f (Fig. 2c; F(2,60) = 6.48; p = 0.003), and STS (Fig. 2d; F(2,62) = 6.92; p = 0.002). Post hoc tests (Tukey honest significant difference corrected, p < 0.05) showed that these interactions arose because FFA and V5f showed greater motion sensitivity to nonface categories than to faces, whereas STS showed greater motion sensitivity to faces than to nonfaces. STS and V5f showed a significant pairwise difference between dynamic and static faces, whereas FFA did not. In addition, the amygdala (Figs. 1c and 2e) showed face selectivity (F(1,60) = 18.53, p < 0.001).
To summarize the localizer results, dorsal areas showed robust sensitivity to motion in faces, whereas ventral areas, such as FFA, did not. Even though the STS did not show significant motion sensitivity to the low-level motion of random-dot patterns, it was nevertheless sensitive to the more complex forms of motion in faces, possibly reflecting coding of the higher-order motion features present in faces but not in random-dot patterns. Importantly, differences in STS responses between face expressions, such as enhanced responses to fear, are therefore not easily explained by low-level motion differences among dynamic expressions.
We analyzed data from the main experiment runs by extracting responses to faces, compared with Fourier-scrambled patterns, from the independent ROIs described above. Planned comparisons (one-tailed) were used to test whether our ROIs showed increased sensitivity to either dynamic or static fearful faces. Consistent with previous research (Vuilleumier and Pourtois, 2007), the FFA (Fig. 3b) showed greater responses to static fearful faces compared with static nonfearful (disgust + happy) faces (t(12) = 1.82, p = 0.047), with no significant sensitivity to dynamic fearful expressions (p = 0.600). In contrast, and as predicted, areas sensitive to facial motion showed increased sensitivity to dynamic fearful, compared with nonfearful, faces (V5f: t(12) = 2.78, p = 0.008; STS: t(12) = 1.78, p = 0.049) and no significant sensitivity to static expressions (p > 0.403) (Fig. 3c,d). The OFA (Fig. 3a) did not show any significant fear sensitivity (p > 0.144). The average response to faces in the amygdala during the main experiment (compared with Fourier-scrambled faces) is shown in Figure 1d. Prior functional imaging and electrophysiological studies in the human and macaque monkey have established that the amygdala is responsive to fearful as well as multiple other emotional expressions when presented as static images (Krolak-Salmon et al., 2004; Sergerie et al., 2008; Hadj-Bouziane et al., 2012), and we replicate this finding (static faces > Fourier-scrambled faces, t(12) = 2.3, p = 0.02, one-tailed) (Fig. 3e), with no significant differences between emotions (p = 0.37). Responses in the amygdala to dynamic faces likewise showed no main effect of expression (p = 0.37; Fig. 3e). Responses to Fourier phase-scrambled patterns were near zero in all conditions for all ROIs and showed no significant effects of expression (p > 0.22).
Connectivity models of fear sensitivity
Our ROI analysis established fear-sensitive responses in separate visual areas, depending on whether facial expressions were dynamic or static. We used DCM to address the mechanisms underlying these distinct fear-sensitive responses. DCM explains ROI time series by estimating "coupling" parameters that relate the activity (a hidden state variable) in a source area to the rate of change of activity in a target area. Parameters can be added or removed to specify hypothetical connectivity architectures. These parameters include the following: (1) coupling to exogenous stimulus inputs, which enables perturbation of hidden neuronal states by stimulus presentation (faces); (2) endogenous connections, which reflect directed coupling among areas, averaged over experimental conditions; and (3) bilinear modulations, which reflect changes in coupling induced by an experimental factor, in our case, dynamic or static fear (Friston et al., 2003). Our model space considered different combinations of these bilinear parameters to identify connections that were modulated by either dynamic or static fear (see below). Once a model is specified, its parameters are optimized to best explain the data using standard variational (Bayesian) techniques. The model evidence summarizes this ability to accurately predict the data in a way that accounts for model complexity. Model evidence allows different models to be compared in terms of their posterior probability (Penny et al., 2004). We also performed "model family" comparisons to test for evidence for specific architectural features of interest by aggregating the posterior probabilities of models that share that feature (Penny et al., 2010).
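For reference, these three parameter sets correspond to the three terms of the bilinear neuronal state equation that DCM fits (Friston et al., 2003), written here in standard notation:

```latex
\dot{x} = \Big( A + \sum_{j} u_j B^{(j)} \Big) x + C u
```

where x holds the hidden neuronal states of the areas, u the experimental inputs, A the endogenous connections, B^(j) the changes in coupling induced by input j (here, dynamic or static fear), and C the exogenous input coupling.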
We specified a comprehensive model space covering plausible alternatives for explaining increased fear-sensitive cortical responses to dynamic and static fearful faces. Our models systematically varied which connections, projecting to FFA, were modulated by static fear and which connections, projecting to V5f and STS, were modulated by dynamic fear. To address the possibility of a subcortical input to the amygdala, such as from the superior colliculus via the pulvinar nucleus (Morris et al., 1999), we also considered models with or without exogenous (stimulus bound) inputs to the amygdala. Finally, to address a previous study showing a feedforward organization among face-selective areas and no connectivity between STS and FFA (Furl et al., 2013), we included models with this “sparse” connectivity structure, as well as models with full connectivity. All 508 combinations of the aforementioned model variants were included in Bayesian model comparison to find the model features that best explained the data.
When all 508 models were individually compared, we found one highly likely model whose posterior probability was 0.93 (Fig. 4a); all competing models had posterior probabilities <0.068. This optimal model possessed full endogenous connectivity, exogenous inputs only to OFA and V5f (with no extra input to the amygdala), modulation of the connections from the amygdala to V5f and STS by dynamic fear, and modulation of the connections from the amygdala to FFA and from V5f to FFA by static fear. Thus, this model confirmed our hypotheses that the amygdala mediates fear sensitivity in visual areas via backwards connections and that the mode of presentation (i.e., dynamic vs static faces) determines the regional selectivity of this top-down effect. Although the contribution of V5f, in addition to the amygdala, was not predicted, it is consistent with findings that motion-sensitive areas contribute to perception of static expressions (Furl et al., 2012).
We also confirmed key attributes of this connectivity pattern using model family comparisons. Posterior probabilities were close to 1 in favor of (1) the 234 models with no extra exogenous amygdala input versus models that had this input (Fig. 4b), (2) the 450 models with full endogenous connectivity compared with sparse models (Fig. 4b), and (3) the 127 models that satisfied our a priori hypothesis, with fear modulation on all connections from amygdala to FFA, V5f, and STS, compared with models without these features. Figure 4c shows four more model family comparisons, testing models in which a region's connections to other regions were modulated by dynamic fear versus models in which they were not. There was little evidence for families with dynamic fear modulation on connections originating in OFA (244 models), FFA (240 models), or mutual modulation between V5f and STS (274 models). However, the posterior probability was close to 1 for connections originating in amygdala (238 models). Figure 4d shows that there was little evidence for families possessing static fear modulation on connections originating in OFA (290 models) or STS (198 models). In contrast, there was a 0.99 posterior probability of static fear modulation on connections originating in amygdala (268 models) and a 0.92 posterior probability of static fear modulation on connections originating in V5f (270 models).
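The logic of a family comparison can be sketched as follows, under fixed effects. The family index and group log-evidences are placeholders; as in Penny et al. (2010), model priors are reweighted so that the two families are a priori equally probable before aggregating.

```matlab
% Sketch of one family comparison over the 508-model space.
groupLogEv = randn(1, 508);                % placeholder summed free energies
inFamily = false(1, 508);
inFamily(1:238) = true;                    % placeholder family membership
prior = zeros(1, 508);
prior(inFamily)  = 0.5 / nnz(inFamily);    % equal prior mass per family
prior(~inFamily) = 0.5 / nnz(~inFamily);
w = exp(groupLogEv - max(groupLogEv)) .* prior;
postFamily = sum(w(inFamily)) / sum(w);    % family posterior probability
```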
Overall, our model comparisons provided very strong evidence for a model where dynamic fear modulated the connections from amygdala to V5f and STS, the dorsal temporal facial motion areas that showed enhanced responses to dynamic fearful expressions. In the same model, static fear modulated the connections from amygdala to FFA, the face-selective ventral area that showed enhanced responses to static fearful expressions.
Postscanning behavioral measures
We assessed the validity of the facial and control stimuli presented during the fMRI experiment by obtaining behavioral data after scanning. To maintain continuity across results, we describe behavioral data for the 12 participants who were also included in the ROI and DCM analyses (one participant lacked behavioral data). All findings are reported using repeated-measures ANOVAs and post hoc pairwise Tukey honest significant difference range tests at p < 0.05.
For localizer run stimuli, participants rated the "motion intensity" (from 1 to 9) of dynamic faces (mean ± SE, 5.28 ± 0.25), objects (6.54 ± 0.36), and patterns (7.25 ± 0.34), with a main effect of category (F(2,55) = 13.64, p < 0.0001) reflecting a significantly lower rating for dynamic faces than for dynamic random-dot patterns. For main experiment run faces (Fig. 5a), participants reported more motion intensity for veridical than for Fourier phase-scrambled dynamic faces, yielding a main effect (F(1,22) = 29.39, p < 0.0002), but there were no expression differences or interaction. Participants also performed speeded classifications of the faces from the main experiment runs, followed immediately by a 1–9 rating of emotional intensity. For correct classifications, neither motion, nor expression, nor their interaction affected emotional intensity ratings (Fig. 5b). For both dynamic and static faces, participants showed a higher hit rate (F(2,55) = 7.59, p = 0.002) and d′ (F(2,55) = 37.70, p < 0.001) for happy expressions than for disgust or fearful expressions. Fearful expressions were classified least accurately, with a lower d′ than both disgust and happy expressions (Fig. 5c). Happy expressions were also classified faster (F(2,55) = 5.82, p < 0.005; Fig. 5d) than fearful or disgust expressions.
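The sensitivity measure here is the standard d′; a one-line sketch with placeholder rates (not the observed data; norminv requires the MATLAB Statistics Toolbox):

```matlab
% d-prime for one expression: z(hit rate) minus z(false alarm rate).
hitRate = 0.90;  faRate = 0.15;            % hypothetical example values
dprime  = norminv(hitRate) - norminv(faRate);
```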
Overall, our behavioral findings suggest that, at least at the level of conscious report, the different facial expressions did not differ much in perceived motion or emotional intensity. Thus, the emotion-enhanced responses we observed in FFA, V5f, and STS are unlikely to result from heightened emotional intensity in fearful expressions relative to happy and disgust expressions. We also replicated numerous previous studies showing accuracy and reaction time advantages for static happy expressions over static fearful expressions (Sweeny et al., 2013), and we show here that these results extend to dynamic expressions.
Discussion
We have shown that which temporal visual areas exhibit enhanced responses to emotional facial expressions depends on whether or not those expressions contain motion. For dynamic facial expressions, we found increased sensitivity to fearful faces in dorsal temporal lobe areas sensitive to facial motion (V5f, STS). In contrast, for static expressions, there was increased sensitivity to fearful faces in a more ventral area, the face-selective FFA. These data were better explained by connectivity models in which dynamic and static facial fear modulated the backwards connections from amygdala to dorsal and ventral temporal areas, respectively, than by a large number of alternative connectivity architectures. For many years, it has been speculated that emotion-sensitive responses in occipitotemporal visual areas might arise from the influence of backwards connections from the amygdala (Morris et al., 1998). We have used DCM to explicitly confirm this hypothesis in healthy human participants. We further demonstrate that the amygdala influences specific visual areas in a context-sensitive fashion. Indeed, the areas targeted by the amygdala appear to be those best suited for processing the information in the stimulus (i.e., motion or static form).
Our model selection compared models in which amygdala connections influenced cortical areas against numerous models with alternative forms of connectivity. We found clear evidence for one model, in which the amygdala influenced ventral and dorsal temporal areas via backwards connections. There has been little causal evidence for such an amygdala influence in the healthy human brain, although there are suggestive reductions in cortical emotion sensitivity in amygdala-lesioned macaques (Hadj-Bouziane et al., 2012) and in human epilepsy patients with amygdala and hippocampal sclerosis (Vuilleumier et al., 2004). Other indirect conclusions have been based on response timing (Krolak-Salmon et al., 2004; Sabatinelli et al., 2009). In the current study, we measured effective connectivity, which does not preclude polysynaptic mediation via "relay areas," such as frontal cortex (Lim et al., 2009) or the pulvinar (Pessoa and Adolphs, 2010). However, human dissection (ffytche et al., 2005; Martino et al., 2011) and DTI (Catani et al., 2003) show profuse connectivity between amygdala and visual areas, and tract-tracing results in the monkey (Amaral and Price, 1984) show prevalent backwards connections. In the human, amygdala feedback could thus be propagated by direct white matter tracts to occipitotemporal cortex.
Our results suggest that the amygdala does more than feed back to visual cortex: it may have a contextual role in visual coding. Specifically, amygdala feedback may target brain areas so as to enhance encoding of the visual elements of a stimulus that best predict fear. When faces were dynamic, the amygdala selectively targeted V5f and STS, areas thought to encode motion information (Schultz and Pilz, 2009; Trautmann et al., 2009; Pitcher et al., 2011; Foley et al., 2012; Schultz et al., 2012). The posterior STS may contribute to perceiving changeable facial attributes, such as dynamic facial expression and eye gaze (Haxby et al., 2000), and may rely on motion representations via connections with motion-sensitive area hMT+/V5 (O'Toole et al., 2002; Calder, 2011). These proposals, and our results, dovetail with a sizeable literature on biological motion responses in the posterior STS (Giese and Poggio, 2003) and with research in the monkey implicating motion-sensitive areas in expression representation (Furl et al., 2012). Thus, when faces were dynamic, the amygdala enhanced fear responses specifically in visual areas well suited to representing the dynamics of fearful expressions. Motion-sensitive areas would provide the definitive visual information about fearful expressions presented as videos.
In contrast, when faces were static, the amygdala produced fear sensitivity in the FFA, an area that shows relatively little motion sensitivity. Indeed, the FFA is thought to be more specialized for representing "static-based" (O'Toole et al., 2002) or "invariant" (Haxby et al., 2000) facial information, which may include representations of facial form, shape, or structure (Calder and Young, 2005; Calder, 2011). Although it is sometimes supposed that representations in the FFA are limited to facial identity perception (Haxby et al., 2000), our findings, together with a considerable body of literature (Ganel et al., 2005; Fox et al., 2009b; Cohen Kadosh et al., 2010; Harris et al., 2012), suggest that FFA can also contribute to expression perception. In this case, the FFA (rather than dorsal areas) would provide the definitive visual information about fearful expressions from static images. In sum, our results are best explained if the amygdala optimized encoding of behaviorally relevant information by selectively targeting the areas best suited for representing the information available in the stimuli (motion or static form).
The enhanced visual responses we observed presumably reflect elaborated visual coding of information, such as form or motion, and this elaborated coding may have behavioral consequences. The fearful expressions of others contain visual information that indicates danger. The amygdala may guide the visual system to prioritize encoding of visual information (form or motion) that best predicts such aversive events. Indeed, there is a vast literature on animal fear conditioning, suggesting that the amygdala plays a role in learning which preceding sensory cues predict adverse events (Dolan, 2000; Maren and Quirk, 2004). Similar amygdala-based learning and conditioning mechanisms operate in the human when learning from others' fear (Olsson and Phelps, 2007). Moreover, fear-related events can also lead to emotion-related memory enhancements that are amygdala-dependent (Phelps, 2004). Our data, together with this literature, suggest that the amygdala controls encoding and prediction of aversive events based on the individual visual elements of a stimulus (form and motion).
Because we used a comprehensive DCM space, we can also report two further findings. First, our model space explored the manner in which the amygdala initially receives visual information, which is currently a topic of active debate (Vuilleumier and Pourtois, 2007; Pessoa and Adolphs, 2010). We considered models with an exogenous input to the amygdala, which was intended to account for a subcortical route to the amygdala (Morris et al., 1999; Rudrauf et al., 2008). Models with this amygdala input were suboptimal compared with models in which the amygdala received inputs from OFA, FFA, V5f, and STS, without any other exogenous influences. Several studies suggest, however, that a subcortical input might be more apparent when faces are low in spatial frequency, presented peripherally or subliminally, or unattended (Morris et al., 1999; Anderson et al., 2003; Winston et al., 2003; Williams et al., 2004). Second, we observed static fear modulation on the connection from V5f to FFA, a finding that might relate to perception of implied motion in static images (Kourtzi and Kanwisher, 2000; Senior et al., 2000). In a previous study (Furl et al., 2012), we found that static facial expressions could be decoded from motion-sensitive areas in the macaque, despite limited mean responses to static expressions. Similarly, here we found weak responses to static faces in V5f, yet responses in this area influenced responses to static faces in FFA.
This new perspective on amygdala function also introduces new research questions concerning the mechanisms that the amygdala uses to specify its cortical targets. For example, cortical areas may be targeted within the amygdala presynaptically or, alternatively, postsynaptic neurons within the cortical areas may render themselves more receptive to amygdala influence. There may be short-term synaptic plasticity that alters the connection strengths between cortical areas and the amygdala, depending on activity in both areas. Also, oscillatory phase synchrony has been hypothesized as a mechanism for gating or routing information transmission in the brain (Salinas and Sejnowski, 2001). These possibilities are difficult to test using hemodynamic measures, however, and so suggest new research avenues for electrophysiology in monkeys.
In conclusion, we have shown that dynamic facial expressions evoke fear-sensitive responses in dorsal temporal areas sensitive to visual motion, whereas static expressions evoke fear-sensitive responses in a ventral temporal area, the FFA. Fear-sensitive responses in both dorsal and ventral areas were best explained by a connectivity model in which top-down influences from the amygdala were modulated by fear. This model provides strong evidence, from the healthy human brain, for the long-standing speculation that augmented visual responses to emotional stimuli are caused by amygdala feedback (Morris et al., 1998). Our model further elaborates our understanding of amygdala function by showing that the amygdala can flexibly enhance fear responses in the specific brain areas best suited for representing definitive stimulus information. Our study, the first to apply extensive connectivity modeling to fMRI responses to dynamic and static faces, yields a new perspective on how the amygdala controls the visual system and opens novel research avenues.
Footnotes
This work was supported by the United Kingdom Economic and Social Research Council Grant RES-062-23-2925 to N.F. and the Medical Research Council Grant MC_US_A060_5PQ50 to A.J.C. and Grant MC_US_A060_0046 to R.N.H. We thank Christopher Fox for supplying the dynamic object stimuli and James Rowe and Francesca Carota for contributing useful comments.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Nicholas Furl, Medical Research Council Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF, United Kingdom. nick.furl@mrc-cbu.cam.ac.uk