Previous functional neuroimaging studies have characterized brain systems mediating associative learning using classical delay conditioning paradigms. In the present study, we used event-related functional magnetic resonance imaging to characterize neuronal responses mediating aversive trace conditioning. During conditioning, neutral auditory tones were paired with an aversive sound [unconditioned stimulus (US)]. We compared neuronal responses evoked by conditioned (CS+) and nonconditioned (CS−) stimuli in which a 50% pairing of CS+ and the US enabled us to limit our analysis to responses evoked by the CS+ alone. Differential responses (CS+ vs CS−), related to conditioning, were observed in anterior cingulate and anterior insula, regions previously implicated in delay fear conditioning. Differential responses were also observed in the amygdala and hippocampus that were best characterized with a time × stimulus interaction, indicating rapid adaptation of CS+-specific responses in medial temporal lobe. These results are strikingly similar to those obtained with a previous delay conditioning experiment and are in accord with a preferential role for medial temporal lobe structures during the early phase of conditioning. However, an additional activation of anterior hippocampus in the present experiment supports a view that its role in trace conditioning is to maintain a memory trace between the offset of the CS+ and the delayed onset of the US, thereby enabling associative learning.
In classical conditioning paradigms, a previously neutral stimulus [conditioned stimulus (CS)] comes to elicit a behavioral response through temporal pairing with an unconditioned stimulus (US). In fear conditioning, the US is aversive, and the behavioral response can be measured in terms of changes in skin conductance responses (SCR) (Esteves et al., 1994). Hence, classical conditioning is a form of associative learning involving linkage between a neutral stimulus and a stimulus with innate behavioral significance.
The two most commonly used types of classical conditioning paradigms, namely trace and delay conditioning, differ in the temporal relationship between the CS and the US. In delay conditioning, the US is presented at the end of the CS so that they overlap temporally. In trace conditioning, there is a gap between the offset of a CS and onset of a US. The term trace conditioning stems from the idea that a “memory trace” is necessary to “bridge the gap” between CS and US so that associative learning can take place (Pavlov, 1927).
Lesion studies suggest a critical role for medial temporal lobe structures, especially the amygdala, in the acquisition of conditioned fear responses (Bechara et al., 1995; LaBar et al., 1995; LeDoux, 1996). However, the hippocampus also seems to play a key role in classical conditioning if the CS is not immediately followed by the US, as exemplified in trace conditioning paradigms. In eyeblink conditioning, midbrain and cerebellar structures are sufficient for associative learning to occur if CS and US are presented together (i.e., delay conditioning). If a delay between CS and US is introduced (i.e., trace conditioning), then data from animal studies (Solomon et al., 1986; Moyer et al., 1990; Kim et al., 1995; McEchron et al., 1998) and studies of amnesic patients (McGlinchey-Berroth et al., 1997; Clark and Squire, 1998) indicate a crucial role for the hippocampus.
Event-related (mixed–single trial) functional magnetic resonance imaging (fMRI) provides the optimal context for studying the neurobiology of classical conditioning in humans, using functional neuroimaging (Buckner et al., 1996; Josephs et al., 1997; Büchel et al., 1998). In simple terms, this technique resembles those used to record event-related potentials in electrophysiology in which different stimuli are presented and sampled repeatedly over time. This approach enabled us to investigate the neuronal basis of trace fear conditioning in humans using a partial reinforcement discrimination–conditioning paradigm.
To identify the neuronal correlates of rapidly learned association, we tested for differential responses between nonconditioned (CS−) and conditioned (CS+) stimuli over the whole experiment. Responses of this type were expected in cortical regions. On the basis of previous studies, we also anticipated responses in medial temporal lobe structures implicated in emotional learning that would only be expressed during acquisition. These were tested for with time × stimulus interactions to identify regions that responded more to the CS+ in the early phases of the experiment. Acquisition-specific responses to CS+ were expected in (1) the amygdala, given its central role in all forms of emotional learning and (2) the hippocampus, which could be specifically associated with trace conditioning.
MATERIALS AND METHODS
Paradigm. We studied 11 healthy volunteers (six male and five female). Written informed consent was obtained before the experiment. Two neutral tones (400 and 1600 Hz) with a duration of 3 sec were used as CS. During conditioning, one of the two CS was paired with a loud unpleasant tone (1 kHz) and consequently became the CS+. The amplitude of the US was set to 10% above each subject's aversive threshold [∼100 dB(A); estimated by self-report during MR scanning]. The 500 msec aversive tone (US) followed the offset of the CS after a trace period of 1000 msec (Fig. 1). The assignment of the low (400 Hz) or high (1600 Hz) frequency tone to either CS+ or CS− was randomized across subjects. The volume of both tones was adjusted to produce identical subjective loudness levels. We used a 50% partial reinforcement strategy (i.e., only half of the presentations of the CS+ were paired with the US [CS+paired]) to allow us to assess evoked hemodynamic responses to the CS+ in the absence of the US. In total, we presented 104 CS over 21 min. Fifty-two were CS−, 26 were CS+ paired with noise (CS+paired), and 26 were unpaired CS+ (CS+unpaired). Figure 1 A shows an example of the scanning protocol. Computer-generated auditory stimuli were delivered through plastic tubes, sealed by foam ear inserts. To further decrease the influence of the gradient switching noise from the scanner, the sound delivery system was shielded by plastic ear defenders. Subjects were familiarized with the US and CS during a 10 min preconditioning scanning session.
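The trial composition described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the presentation software used in the study. The counts and durations are taken from the text; the function names, the seed, and the fully random ordering of trials are assumptions made for the example.

```python
import random

CS_DURATION_S = 3.0   # CS tone duration (from the text)
TRACE_S = 1.0         # gap between CS offset and US onset (from the text)
US_DURATION_S = 0.5   # aversive tone duration (from the text)


def build_trial_list(n_cs_minus=52, n_paired=26, n_unpaired=26, seed=0):
    """Return shuffled trial labels with 50% partial reinforcement of the CS+.

    The unconstrained shuffle is an assumption; the text does not state
    how trial order was determined.
    """
    trials = (["CS-"] * n_cs_minus
              + ["CS+paired"] * n_paired
              + ["CS+unpaired"] * n_unpaired)
    random.Random(seed).shuffle(trials)
    return trials


def us_onset_after_cs_onset():
    """US onset relative to CS onset on a CS+paired trial, in seconds."""
    return CS_DURATION_S + TRACE_S


trials = build_trial_list()
```

With these values, the US starts 4 sec after CS onset on paired trials, and half of all CS+ presentations are unreinforced.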
Functional imaging. Data were acquired with a 2 Tesla Magnetom VISION whole-body MRI system (Siemens, Erlangen, Germany) equipped with a head volume coil. Contiguous multi-slice T2*-weighted echoplanar images [echo time, 40 msec; 80.7 msec/image; 64 × 64 pixels (19.2 × 19.2 cm)] were obtained in an axial orientation. This sequence enhances blood oxygenation level-dependent contrast (Kwong et al., 1992). The volume acquired covered the whole brain (32 slices; slice thickness of 3 mm, giving a 14.4 cm vertical field of view). The effective repetition time (TR) was 3.2 sec/vol. To minimize head motion, subjects were restrained with bitemporal pressure pads. Four hundred image volumes were acquired for each subject over 21 min.
Skin conductance responses. On-line SCRs were successfully measured in all subjects during fMRI scanning. SCR Ag/AgCl electrodes were placed on the proximal phalanx of the index and middle finger of the left hand. The signal was amplified and sampled at 100 Hz. Further off-line processing was performed with MATLAB (The Mathworks, Natick, MA). Data were detrended and temporally smoothed (gaussian kernel with full-width at half maximum of 2000 msec) to remove MRI scanning artifacts. Finally, the time-series were resampled at 10 Hz. For quantitative analysis of SCRs, evoked SCRs were characterized by the maximum of the SCR signal in the 8 sec interval after stimulus onset. Extending this window beyond the onset of the US (after 4 sec) was possible because only CS− and CS+unpaired were analyzed (Büchel et al., 1998). A baseline, the mean of the SCR in the second before the onset of the CS, was then subtracted from this value to account for residual baseline fluctuations. For statistical analysis, the differences were normalized to zero mean and SD of unity. The significance of SCR differences for CS− and CS+unpaired was assessed separately for the first and second half of the experiment. This allowed us to test for time × conditioning interactions.
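The SCR scoring rule described above (peak within 8 sec of CS onset, minus the mean of the preceding second, followed by normalization) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the original MATLAB code: the function names, the synthetic trace, and the use of the sample (n − 1) standard deviation are all assumptions. The helper `fwhm_to_sigma` shows the standard conversion from the full-width at half maximum of a gaussian kernel to its standard deviation.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a gaussian full-width at half maximum to a standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def score_scr(scr, fs_hz, onset_s, window_s=8.0, baseline_s=1.0):
    """Peak SCR in the window after CS onset minus the mean pre-onset baseline.

    Assumes the onset lies at least `baseline_s` after the start of `scr`.
    """
    onset = int(round(onset_s * fs_hz))
    n_base = int(round(baseline_s * fs_hz))
    n_win = int(round(window_s * fs_hz))
    baseline = scr[onset - n_base:onset]
    window = scr[onset:onset + n_win]
    return max(window) - sum(baseline) / len(baseline)


def normalize(values):
    """Scale values to zero mean and unit standard deviation (n - 1 is assumed)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical 10 Hz trace: flat at 1.0 with a bump to 3.0 starting 2 sec after onset.
FS_HZ = 10
trace = [1.0] * 300
for i in range(120, 140):
    trace[i] = 3.0

amplitude = score_scr(trace, FS_HZ, onset_s=10.0)
```

For the 2000 msec kernel used here, `fwhm_to_sigma(2.0)` gives a standard deviation of roughly 0.85 sec.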
Imaging data analysis. Image processing and statistical analysis were performed using SPM97 (Friston et al., 1995b; Worsley and Friston, 1995). All volumes were realigned to the first volume (Friston et al., 1995a). Residual motion effects were eliminated by regressing the time course of each voxel on a periodic function of the estimated movement parameters. To account for the different sampling times of different slices, voxel time-series were interpolated using sinc interpolation and resampled using the slice at the anterior–posterior commissural line as the reference. A mean image was created using the realigned volumes. An anatomical MRI, acquired using an MPRAGE three-dimensional (3D) T1-weighted sequence (1 × 1 × 1.5 mm voxel size), was coregistered to this mean (T2*) image. This ensured that the functional and structural images were spatially aligned. Finally, the functional images were spatially normalized (Friston et al., 1995a) to a standard T2* template (Talairach and Tournoux, 1988; Evans et al., 1994), using nonlinear basis functions. This transformation was also applied to the structural T1 volume. Functional data were smoothed using a 6 mm (full-width at half maximum) isotropic gaussian kernel to compensate for residual variability after spatial normalization and to permit application of gaussian random field theory to provide for corrected statistical inference (Friston et al., 1995b).
In contrast to evoked responses in electrophysiology, the sampling rate (i.e., TR) in event-related fMRI is restricted. To characterize hemodynamic responses after an indexed stimulus, it is necessary to sample data points after the onset of many stimuli at different peristimulus time points. We achieved this through a random jitter between the intertrial interval (ITI) and TR, leading to an equal distribution of data points after each event type. The intertrial interval was randomized in the range of 4 ± 0.5 TR, leading to ITIs between 11.2 and 14.4 sec. The example in Figure 1 A (second row) illustrates the relationship between fMRI volume acquisition and trials. For example, scans took place 3, 6.2, and 9.4 sec after the first CS+unpaired stimulus onset. After the second CS+unpaired trial, volumes were acquired 2, 5.2, and 8.4 sec after the onset of the stimulus. The data were analyzed by modeling the evoked hemodynamic responses for different stimuli as delta functions convolved with a synthetic hemodynamic response function (HRF) and its temporal and dispersion derivatives, in the context of the general linear model as used by SPM97 (Josephs et al., 1997). Using both derivatives in addition to the canonical HRF allowed us to characterize HRFs with late onset or longer duration. Because of our interest in the time period after the offset of the CS, all events were time-locked to the offset of the CS. An additional analysis in which events were modeled earlier (i.e., at the onset of the CS) did not reveal additional activations. We defined three different event types: (1) CS−, (2) CS+ paired (CS+paired), and (3) CS+ unpaired (CS+unpaired) (Fig. 1). Differential effects were tested by applying appropriate linear contrasts to the parameter estimates for the canonical HRF regressor of each event, resulting in a t statistic for every voxel. These t statistics (transformed to Z statistics) constitute a statistical parametric map (SPM).
The contrast used in the main analysis tested for greater responses evoked by CS+unpaired stimuli relative to CS−.
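The event-related model described above can be sketched in Python as follows. This is a hedged illustration, not the SPM97 implementation. The double-gamma HRF with shape parameters 6 and 16 and an undershoot ratio of 1/6 is a commonly used approximation and is an assumption here, as are the uniform draw for the ITI jitter, the sampling of each volume at exact multiples of the TR (slice timing is ignored), and all function names. The temporal and dispersion derivative regressors are omitted for brevity.

```python
import math
import random

TR_S = 3.2           # effective repetition time (from the text)
CS_DURATION_S = 3.0  # events are time-locked to CS offset (from the text)


def jittered_onsets(n_trials, tr_s=TR_S, seed=0):
    """CS onsets with ITIs of 4 +/- 0.5 TR (the uniform draw is an assumption)."""
    rng = random.Random(seed)
    onsets, t = [], 0.0
    for _ in range(n_trials):
        onsets.append(t)
        t += rng.uniform(3.5, 4.5) * tr_s
    return onsets


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for non-positive t."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1.0) * math.exp(-t) / math.gamma(shape)


def hrf(t):
    """Illustrative double-gamma HRF (parameters are assumptions, not SPM97's)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def event_regressor(event_times_s, n_scans, tr_s=TR_S):
    """Delta functions at the event times convolved with the HRF, one value per scan."""
    return [sum(hrf(k * tr_s - e) for e in event_times_s) for k in range(n_scans)]


def contrast_estimate(betas, weights):
    """Linear contrast of parameter estimates (numerator of the t statistic)."""
    return sum(b * w for b, w in zip(betas, weights))


onsets = jittered_onsets(104)
offsets = [t + CS_DURATION_S for t in onsets]           # time-lock to CS offset
regressor = event_regressor(offsets[:10], n_scans=50)   # first 10 events only

# Weights over [CS-, CS+paired, CS+unpaired]: CS+unpaired greater than CS-.
MAIN_CONTRAST = [-1.0, 0.0, 1.0]
```

Because the ITI is not a whole multiple of the TR, scans fall at different peristimulus times on successive trials, which is what allows the shape of the response to be sampled more densely than once per TR.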
In addition to the main analysis, we defined three new regressors representing time × event interactions. These three additional regressors were created by multiplying the amplitude regressors for CS−, CS+paired, and CS+unpaired with a zero mean exponential function with a time constant of one-fourth of the session length (320 sec). Based on our previous study, we hypothesized an overall negative difference for regions showing decreased activation over time. We therefore tested for a mean negative difference masked with a contrast coding the differential decrease between the CS− and CS+unpaired. The masking threshold was set to p < 0.1. We analyzed the group as a whole and all subjects individually.
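The construction of a time × event interaction regressor can be sketched as follows. This is a minimal illustration under stated assumptions: the session length is computed from the 400 volumes and the 3.2 sec TR given in the text, while the function names and the choice to remove the mean over scans (rather than over events) are assumptions.

```python
import math

TR_S = 3.2
N_SCANS = 400
SESSION_S = N_SCANS * TR_S   # 1280 sec, about 21 min
TAU_S = SESSION_S / 4.0      # time constant of 320 sec (from the text)


def zero_mean_exponential(n_scans=N_SCANS, tr_s=TR_S, tau_s=TAU_S):
    """Exponential decay over the session with its mean removed."""
    decay = [math.exp(-(k * tr_s) / tau_s) for k in range(n_scans)]
    mean = sum(decay) / n_scans
    return [d - mean for d in decay]


def interaction_regressor(amplitude_regressor, modulator):
    """Element-wise product of an event regressor and the time modulator."""
    return [a * m for a, m in zip(amplitude_regressor, modulator)]


modulator = zero_mean_exponential()
```

The modulator is positive early in the session and negative late, so a positive parameter estimate for the interaction regressor indicates responses that decrease over time.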
Ensuing SPMs were interpreted by referring to the probabilistic behavior of gaussian random fields. Functional imaging data were only analyzed for those seven subjects who showed successful conditioning. Evoked responses were modeled in a subject-specific way. In the case of the amygdala, the anterior cingulate, and anterior insula, for which we had a regionally specific hypothesis, correction for multiple comparisons was based on the volume of interest (Filipek et al., 1994) and the smoothness of the underlying SPM (Worsley et al., 1996). For other brain regions, the correction was for the entire volume analyzed (i.e., whole brain). Thus, in all cases, the threshold was set to p < 0.05 (corrected).
Skin conductance responses
The SCR time series showed that 7 of 11 subjects acquired conditioned autonomic responses to the CS+ (Fig. 2 A). Figure 2 B shows an example of the time course of the SCR signal for one subject during MR scanning over the first 7 min of the experiment. Statistical analysis revealed significant differences (p < 0.05) between SCR for CS− compared with CS+ trials for all seven of these subjects during the first half of the conditioning phase, indicating that these subjects were successfully conditioned during fMRI. During the second half of the conditioning phase, the difference between CS+ and CS− evoked SCRs was much smaller (only subject 6 showed a significant difference) (Fig. 2 A). As with the fMRI analysis, we only analyzed unpaired CS+ trials.
In a CS+paired trial, the peak of the response evoked by the CS+ could occur after delivery of the US (Fig. 1 A, first event in the third row). This can give rise to interaction effects in regions that respond to both the US and the CS+ (Friston et al., 1998). To fully disambiguate the effects of CS+ and US, we only compared responses evoked by the conditioned tone when it was not followed by the US (CS+unpaired). Consequently, the comparison of interest in this experiment was that between the CS+unpaired evoked responses and those evoked by the CS−. This comparison revealed differential activation of bilateral anterior cingulate gyri and anterior insulae (Fig. 3). Further differential responses were detected in the right medial thalamus, bilateral dorsolateral prefrontal cortex, bilateral ventral putamen, and posterior peri-auditory cortex. The coordinates and significance of these activations are summarized in Table 1.
Figure 3 shows the location of the anterior cingulate activation for two subjects overlaid on their individual T1-weighted MRI. The coronal slice also shows activation in bilateral insulae. To best characterize the data, we plotted the responses evoked by the CS− (green ± SE), the response evoked by the CS+paired (red ± SE), and the response evoked by the CS+unpaired (blue ± SE). As expected, the curves for CS+paired and CS+unpaired are almost identical, highlighting the fact that the evoked responses in these regions are not related to processing of the US. Note that the statistical comparison is based on the difference between CS+unpaired and the CS− alone.
Decreases of amygdala responses over time in the context of classical conditioning have been demonstrated in animal (Quirk et al., 1997) and human studies of classical conditioning (Büchel et al., 1998; LaBar et al., 1998). Furthermore, hippocampal activations have been shown to follow a similar temporal pattern during learning (Strange et al., 1999). Such temporal effects would be disguised in a simple categorical comparison of the CS+unpaired and CS−. We therefore explicitly tested for the presence of a time × event type interaction in an additional analysis. This analysis tests for areas in which neuronal responses evoked by CS+unpaired decrease over time, and at the same time this pattern is significantly different from the pattern for the responses evoked by the CS−. In effect, this analysis shows voxels with a differential adaptation for the CS+unpaired relative to the CS−. In this analysis, amygdala and anterior hippocampal activations were significant bilaterally at p < 0.05. The exact coordinates and statistics of the activations are given in Table 1.
Figure 4 shows significant voxels for the group analysis in bilateral amygdalas and hippocampi on a coronal section of the mean T1 MRI for the group. To unequivocally attribute the activations to the amygdala and the anterior hippocampus, we exploited the spatial and temporal resolution of fMRI and analyzed single-subject data separately. Figure 5 shows medial temporal lobe activations revealed by the same analysis as in the group for subjects 3 and 5. Each subplot shows the activation overlaid on the subject's T1-weighted MRI and the time course of responses evoked by CS+unpaired. The 3D graph shows how evoked responses change over time as learning progresses (see figure legend for details). Both amygdala and hippocampus exhibited rapid decreases of their responses. This is in accord with our SCR data showing a significant time × condition interaction.
In summary, our results showed enduring conditioning-related activations in cortical regions reflecting the acquired association. In contrast, other regions showed differential responses that were time-limited. Specifically, the medial temporal lobe regions (amygdala and hippocampus) showed rapid decreases in differential response, implying effects related to acquisition per se. Activation of the hippocampus in trace, but not in delay, conditioning suggests a specific involvement in this form of associative learning, i.e., by bridging the temporal gap between CS and US.
Differential cortical responses
This analysis highlights cortical regions showing differential activity associated with CS+ and CS− trials. Note that the analysis did not explicitly model time-dependent effects but represents a sensitive approximation of rapidly evolving learning-related responses. Such responses were expected in cortical regions highlighted previously in the context of a classical delay conditioning paradigm (Büchel et al., 1998).
The anterior cingulate plays a crucial role in assessing the motivational content of internal and external stimuli and in regulating context-dependent behaviors (Devinsky et al., 1995), such as approach and avoidance learning (Freeman et al., 1996). Direct evidence for the participation of the anterior cingulate in classical conditioning comes from animal (Powell et al., 1996; Everitt and Robbins, 1997) and human functional imaging studies (Büchel et al., 1998; LaBar et al., 1998). Cingulo-thalamic neuronal plasticity may be crucial for the acquisition of avoidance responses in the context of conditioning, and it has been suggested that amygdala projections play an important role in the modulation of these plastic changes (Poremba and Gabriel, 1997a,b). This is especially interesting because we also found conditioning-related signal changes in the medial thalamus, in accord with this hypothesis.
Several studies suggest a role for the anterior insula in processing emotionally relevant contexts, such as disgust (Phillips et al., 1997) and pain (Casey et al., 1995). This functional characterization fits with insular projections to anterior cingulate, perirhinal, entorhinal, and periamygdaloid cortices, and the amygdala (Mesulam and Mufson, 1982). This pattern of connectivity, together with neurophysiological data, has led to a view of the insula as an area functionally associated with emotional processing (Casey et al., 1995).
This profile of significant differential responses in the amygdala is consistent with its proposed modulatory effect on cortical processing. It has been shown that one role of the amygdala during conditioning is in the early modulation of cingulo-thalamic connectivity (Poremba and Gabriel, 1997a,b). Furthermore, differential posterior secondary auditory cortex responses are in accord with the modulatory influence the amygdala exerts over auditory cortex (Armony et al., 1998), leading to learning-related plastic changes (Weinberger et al., 1993; Morris et al., 1998). These observations suggest that, with time, mnemonic representations of behaviorally salient contexts are expressed in cortical regions other than medial temporal lobe structures (McGaugh et al., 1996; Cahill and McGaugh, 1998). Proposed candidate structures, including the anterior cingulate and insular cortices, were all activated in our study (Everitt and Robbins, 1997).
Differential medial temporal lobe responses
Amygdala responses to CS+unpaired stimuli decreased rapidly over time. This accords with results from other neuroimaging studies showing similar decreases of amygdala responses in the context of viewing emotionally expressive faces (Breiter et al., 1996; Whalen et al., 1998) and classical delay conditioning (Büchel et al., 1998; LaBar et al., 1998). This is also supported by electrophysiological data showing decreasing amygdala single cell responses during conditioning in the rat (Quirk et al., 1997) and suggests a preferential role for the amygdala during the early phase of aversive conditioning.
Another component of the decreases in amygdala responses might be linked to negative feedback in classical conditioning (Fanselow, 1998). In our experiment in which an aversive tone serves as US, engagement of the stapedius reflex might decrease sound transmission at the level of the middle ear and mediate negative feedback, which reduces the impact of the US (Cacace et al., 1992). However, this is unlikely because the SCR evoked by the US (i.e., CS+paired) did not show a systematic decrease over time, as one would expect in the presence of negative feedback. In fact, four of seven subjects, including subjects 3 and 5 (Fig. 5), showed a slight increase in SC responses over time. Furthermore, the functional imaging data indicate that amygdala responses exhibited significantly smaller decreases to the CS+paired (i.e., US-associated) responses than to the CS+unpaired. The amygdala voxels showing the maximum difference in the time × condition interaction (i.e., decreases for CS+unpaired but no decrease for CS+paired) were found in close vicinity to the location reported in Table 1 (relevant coordinates: x = 24, y = −3, z = −21; x = −18, y = 0, z = −24). A constant amygdala response to the US but decrease with respect to the CS+ indicates that the observed time × condition interaction in the amygdala is linked to learning rather than to negative feedback.
Two previously conducted single-trial fMRI studies of aversive conditioning showed activation of the amygdala but did not report hippocampal activation (Büchel et al., 1998; LaBar et al., 1998). Interestingly, both studies used a delay conditioning protocol. This is consistent with animal studies showing that lesions of the hippocampus do not interfere with delay fear conditioning (Schmaltz and Theios, 1972; Phillips and LeDoux, 1992; Clark and Squire, 1998). The hippocampus is implicated in classical conditioning principally on the basis of studies using nictitating membrane response conditioning. In these studies, an intact hippocampus is necessary for learning when CS and US presentations are separated in time (i.e., trace conditioning), in line with a view that the hippocampus acts as an associator of discontiguous events (Wallenstein et al., 1998). We note that it has been reported recently that trace conditioning and other hippocampus-dependent learning tasks have a trophic effect on adult-generated hippocampal neurons (Gould et al., 1999).
Although the hippocampus is not essential for nictitating membrane response conditioning, neurophysiological recordings in rabbits (Disterhoft et al., 1986) and human positron emission tomography (PET) studies (Molchan et al., 1994; Logan and Grafton, 1995; Blaxton et al., 1996; Schreurs et al., 1997) have also reported hippocampal involvement for delay eyeblink conditioning. The hippocampal signal changes as seen in PET studies during delay conditioning might reflect a tonic change in hippocampal activity. Such effects are not time-locked to presentation of individual stimuli and were therefore beyond the scope of single-trial designs as implemented in the present study. These design differences may account for apparent discrepancies in findings from studies using different experimental modalities.
In aversive conditioning, the involvement of the hippocampus has often been attributed to context (Penick and Solomon, 1991; Phillips and LeDoux, 1992; Rudy and Pugh, 1996). Context is referred to as the ensemble of cues that coexist with a specific CS. Similar to the idea that different cues need to be temporally and spatially integrated to form a context, the introduction of a trace period leads to the formation of a temporal context. This hypothesis explains the necessity of an intact hippocampus for both contextual fear and trace conditioning.
The similarity of hippocampal and amygdala temporal activation patterns points to an important interaction during learning between these two neighboring regions (McGaugh et al., 1996; Cahill and McGaugh, 1998; Packard and Teather, 1998). In delay conditioning, the amygdala alone is sufficient for the formation of associations (e.g., the modulation of thalamo-cortical plasticity). In trace conditioning, with the additional temporal discontiguity, the hippocampus might be involved in bridging the temporal gap, allowing an association to occur. Assuming that the modulatory role of the amygdala is time-limited, this would also imply a time-limited response for the hippocampus. This accords with our data highlighting similar decreases of amygdala and hippocampal responses.
Different time courses of cortical (anterior cingulate and insula) and medial temporal lobe structures (amygdala and hippocampus) raise the question of how these different regions and their time courses are related. The fact that responses of medial temporal lobe structures continued even after cortical structures have reached their maximal response could reflect an early saturation of cortical responses, implying a nonlinear temporal relationship between medial temporal lobe and other cortical areas in associative learning.
The rapid decreases of amygdala responses fit well with theoretical accounts of reinforcement and emotional learning, which suggest a role for the amygdala in modulating or enabling associative changes in synaptic efficacy through vicarious neuromodulatory mechanisms (Friston et al., 1994; Schultz et al., 1997). The hypothesis is that brain systems that mediate learning in which the amygdala plays a central role do so by enabling or permitting associative plasticity that encodes sensory contingencies being acquired. After acquisition, the learned association will be expressed at a cortical level reflecting changes in synaptic connection strengths (Alvarez and Squire, 1994; McGaugh et al., 1996). However, once the association has been learned, there is no need for further permissive modulation of plasticity, and systems mediating it (e.g., the amygdala) disengage (Quirk et al., 1997). Although this provides a sufficient account of delay conditioning, trace conditioning-specific activation of anterior hippocampus suggests an involvement in bridging the temporal gap between CS+ and US and accords with its suggested role as an associator of discontiguous events (Clark and Squire, 1998; Wallenstein et al., 1998).
This work was supported by the Wellcome Trust. We thank R. Frackowiak for helpful comments. We would also like to thank the FIL physics group and the radiographers for their support and help.
Correspondence should be addressed to Christian Büchel, Leopold Müller Functional Imaging Laboratory, The Wellcome Department of Cognitive Neurology, Institute of Neurology, 12 Queen Square, London WC1N 3BG, UK.