Abstract
Midbrain dopaminergic neurons (DANs) typically increase their discharge rate in response to appetitive predictive cues and outcomes, whereas striatal cholinergic tonically active interneurons (TANs) decrease their rate. This may indicate that the activity of TANs and DANs is negatively correlated and that TANs can broaden the basal ganglia reinforcement teaching signal, for instance by encoding worse than predicted events. We studied the activity of 106 DANs and 180 TANs of two monkeys recorded during the performance of a classical conditioning task with cues predicting the probability of food, neutral, and air puff outcomes. DANs responded to all cues with elevations of discharge rate, whereas TANs depressed their discharge rate. Nevertheless, although dopaminergic responses to appetitive cues were larger than their responses to neutral or aversive cues, the TAN responses were more similar. Both TANs and DANs responded faster to an air puff than to a food outcome; however, DANs responded with a discharge elevation, whereas the TAN responses included major negative and positive deflections. Finally, food versus air puff omission was better encoded by TANs. In terms of the activity of single neurons with distinct responses to the different behavioral events, both DANs and TANs were more strongly modulated by reward than by aversive related events and better reflected the probability of reward than aversive outcome. Thus, TANs and DANs encode the task episodes differentially. The DANs encode mainly the cue and outcome delivery, whereas the TANs mainly encode outcome delivery and omission at termination of the behavioral trial episode.
Introduction
The neural network of the basal ganglia (Bar-Gad and Bergman, 2001; Gurney et al., 2004) is commonly viewed as two functionally related subsystems. The main axis includes fast neurotransmissions (glutamate and GABA) between the cortex, striatum, and the basal ganglia output structures. The second subsystem is composed of neuromodulators that adjust the activity along the main axis by regulation of plasticity at the corticostriatal synapse (Calabresi et al., 2000; Reynolds et al., 2001). The primary basal ganglia neuromodulators are dopamine [from midbrain dopaminergic neurons (DANs) (Arbuthnott and Wickens, 2007)] and acetylcholine [from striatal cholinergic tonically active interneurons (TANs) (Calabresi et al., 2000)].
Previous studies have shown that DANs encode the prediction error in the positive domain (i.e., they respond when conditions are better than expected) (Schultz et al., 1997). Consistent with the classical concept of dopamine–acetylcholine balance (Barbeau, 1962), the DANs and the TANs have opposite responses. DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes, whereas TANs suppress their tonic discharge (Graybiel et al., 1994). Thus, it has been suggested that some of the dopamine influence on striatal projection neurons is mediated through inhibition of the TANs (Wang et al., 2006).
In contrast to the extensive research on reward-related activity, only a few studies have explored whether basal ganglia neurons encode the negative domain (e.g., aversive outcome or omission of rewards, which might not be identically encoded by the nervous system). Dopamine neurons decrease their firing rate in response to reward omission (Schultz et al., 1997). However, this suppression is limited because firing rate is truncated at zero. Other groups (Morris et al., 2004; Bayer and Glimcher, 2005) have reported that the discharge rate of dopaminergic neurons does not demonstrate instantaneous incremental encoding of reward omission, and an alternative encoding scheme, based on response duration, has been proposed (Bayer et al., 2007). There are even fewer studies and less agreement on basal ganglia responses to aversive events. Classical and instrumental conditioning studies suggest that some of the dopamine neurons increase their firing rate after a cue that predicts aversive outcomes (Mirenowicz and Schultz, 1996; Guarraci and Kapp, 1999). However, that increase in firing rate may be a result of reward generalization. Studies on anesthetized rats have shown that DANs mainly decrease their discharge rate after aversive stimuli (Ungless et al., 2004; Coizet et al., 2006). There are reports that TAN activity differentiates appetitive and aversive stimuli (Ravel et al., 2003; Yamada et al., 2004), but it remains unclear whether and how TANs respond to expectation of aversion.
Here, we designed a classical conditioning paradigm with aversive and rewarding probabilistic outcomes. Symmetric manipulations of expectation of food (rewarding event) or an air puff (aversive event) enable the comparison of neural responses to expectation of positive and negative outcomes. To provide additional controls for sensory, arousal-related, and generalization responses, our behavioral task included neutral trials, which had the same structure as the rewarding and the aversive trials but never yielded positive or negative outcomes.
Materials and Methods
All experimental protocols were performed in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and with Hebrew University guidelines for the use and care of laboratory animals in research, supervised by the institutional animal care and use committee.
Behavioral task.
Two monkeys (L and S; Macaca fascicularis; female, 4 kg; male, 5 kg) were engaged in a probabilistic delay classical-conditioning task (see Fig. 1b). The monkeys were seated in a primate chair facing a 17 inch computer screen placed at a distance of 50 cm. Seven different fractal cues (Chaos Pro 3.2 program; www.chaospro.de), stretched across the entire screen, were introduced to the monkey, each predicting the outcome in a probabilistic manner. Three cues (reward cues) predicted a liquid food outcome (L, 0.4 ml, 100 ms duration; S, 0.6 ml, 150 ms) with a delivery probability of 1/3, 2/3, and 1. Three other cues (aversive cues) predicted an air puff outcome (L, 100 ms duration; S, 150 ms; 50–70 psi; split and directed 2 cm from each eye; Airstim; San Diego Instruments) with a delivery probability of 1/3, 2/3, and 1. The seventh cue (the neutral cue) was never followed by a food or air puff outcome. Cues were presented for 2 s and were immediately followed by a result epoch, which could include an outcome (food, air puff) or no outcome according to the probabilities associated with the cue. The beginning of the result epoch was signaled by one of three sounds that discriminated the three possible events: a drop of food, an air puff, or no outcome (see Fig. 1b). Sounds were normalized to the same intensity and duration. These sounds were additional to the background device sounds (air puff solenoid and food pump). All trials were followed by a variable intertrial interval (ITI) (monkey S, 3–7 s; monkey L, 4–8 s). Because of the probabilistic structure of the behavioral task, and to equalize the average occurrence of each outcome, the nondeterministic cues (p ≠ 1 for reward or aversive outcome) were presented three times more often than the deterministic ones. With this occurrence ratio, all trials were randomly interleaved.
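The occurrence ratio described above can be illustrated with a minimal Python sketch that builds one randomly interleaved block of trials. The cue labels, the function names, the block structure, and the treatment of the neutral cue as a deterministic cue (weight 1) are illustrative assumptions and are not taken from the task-control software used in the experiment.

```python
import random
from fractions import Fraction

# Hypothetical cue table: (label, outcome type, delivery probability).
CUES = [
    ("reward_1/3", "food", Fraction(1, 3)),
    ("reward_2/3", "food", Fraction(2, 3)),
    ("reward_1", "food", Fraction(1)),
    ("aversive_1/3", "air_puff", Fraction(1, 3)),
    ("aversive_2/3", "air_puff", Fraction(2, 3)),
    ("aversive_1", "air_puff", Fraction(1)),
    ("neutral", None, Fraction(0)),  # assumed to count as a deterministic cue
]


def cue_weight(outcome, p):
    """Nondeterministic reward/aversive cues (p != 1) get weight 3, others 1."""
    return 3 if outcome is not None and p != 1 else 1


def build_block(rng):
    """Return one randomly interleaved block of cue labels with the 3:1 ratio."""
    block = [label for label, outcome, p in CUES
             for _ in range(cue_weight(outcome, p))]
    rng.shuffle(block)
    return block


def expected_counts(outcome):
    """Expected (deliveries, omissions) of `outcome` per block."""
    delivered = sum(cue_weight(o, p) * p for _, o, p in CUES if o == outcome)
    omitted = sum(cue_weight(o, p) * (1 - p) for _, o, p in CUES if o == outcome)
    return delivered, omitted


block = build_block(random.Random(0))
```

Under these assumptions a block contains 15 trials and yields, on average, four deliveries and three omissions for each of the food and air puff outcomes.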
During a behavioral session (usually five sessions per week), the monkeys performed 900–1300 trials/d before losing their motivation for food. Over the weekend, the monkeys were given ad libitum access to food. Water was available ad libitum during all training and recording periods. After the training period (L, 6; S, 2 months), we recorded the behavior and the basal ganglia neural activity while the monkeys were engaged in the behavioral task. The same images and sounds were used both for training and for the recording periods (L, 6; S, 5 months); however, the visual and the auditory stimuli were shuffled between monkeys.
Surgery, magnetic resonance imaging, and rehabilitation.
After the training period, a magnetic resonance imaging (MRI)-compatible Cilux head holder and a square Cilux recording chamber with a 27 mm (inner) side were attached to the monkey's head. The head holder enabled the immobilization of the head during recording. The recording chamber was attached to the skull tilted 40° laterally in the coronal plane with its center targeted at the stereotaxic coordinates of the GPe (A15, L7, H1) (Szabo and Cowan, 1984; Martin and Bowden, 2000). Analgesia and antibiotics were administered during surgery and continued for 2 d postoperatively. Recording began after a postoperative recovery period of 5 d.
We estimated the stereotaxic coordinates of the physiological recordings within the basal ganglia nuclei with MRI scans (see Fig. 1a). The MRI scan (General Electric 1.5 Tesla system; fast spin echo inversion recovery sequence; dual surface coil; repetition time, 3 s; echo time, 0.044 s; inversion time, 0.3 s; echo train length, 8; coronal slices, 2 mm wide) (Matsui et al., 2007) was performed with five tungsten electrodes at accurate coordinates of the chamber [Y,X = (6,0), (0,−6), (0,0), (0,6), and (−6,0) in mm from the chamber center]. We then aligned the two-dimensional MRI images with the sections of the atlas of Macaca fascicularis (Martin and Bowden, 2000). We performed an additional MRI scan at the final stage of the recording period of monkey L to verify our coordinate system. At the end of the experiment, the chamber and head holder of both monkeys were removed, the skin was sutured, and after a recovery period the monkeys were sent to a primate sanctuary (http://monkeypark.co.il).
All surgical procedures were performed under aseptic conditions and deep general anesthesia with isoflurane and N2O. The MRI procedure was performed under light anesthesia with Dormitor and ketamine.
Recording and data acquisition.
During recording sessions, the monkey's head was immobilized and eight glass-coated tungsten microelectrodes (impedance, 0.2–0.8 MΩ at 1000 Hz), confined within a cylindrical guide (1.65 mm inner diameter), were advanced separately (EPS; Alpha Omega Engineering) into the targets in the basal ganglia. The electrical activity was amplified with a gain of 5K, bandpass filtered with a 1–6000 Hz four-pole Butterworth filter, and continuously sampled at 25 kHz by a 12 bit ± 5 V analog-to-digital (A/D) converter. Spike activity was sorted and classified on-line using a template-matching algorithm (ASD; Alpha Omega Engineering). Spike detection pulses and behavioral events were sampled at 25 kHz (AlphaLab; Alpha Omega Engineering).
Mouth movements were monitored by an infrared reflection detector (see Fig. 2a) (Dr. Bouis Devices). The infrared signal was filtered between 1 and 100 Hz by a bandpass four-pole Butterworth filter, and sampled at 1.56 kHz. In addition, three computerized digital video cameras recorded the monkey's face and upper limbs at 50 Hz. Video analysis was performed with custom software to identify periods when the monkeys closed their eyes (see Fig. 2b). Briefly, monkey eye location was identified by a human observer (once for a daily recording session in which the monkey's head was immobilized by connecting the head holder to an external metal frame), and a classification of eye states (open or closed) was made based on the number of dark pixels in the eye area. The algorithm was tested on random samples from several recording days and found to be consistent with the judgments of a human observer for >99% of the images. In representative recording sessions, we recorded the monkeys' spontaneous vocalizations, their arm movements with an accelerometer, eye position using infrared reflection, and heartbeat by electrocardiogram (ECG) (veterinary ECG 5 leads system; Palco Laboratories).
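The eye-state classification can be sketched as a dark-pixel count over the manually marked eye area. This is a minimal illustration only: the function names, both threshold values, and the assumption that an open eye shows more dark (pupil and iris) pixels than a closed eye are hypothetical, because the text does not specify them.

```python
def count_dark_pixels(eye_region, dark_threshold):
    """Count pixels in the eye region whose gray level is below dark_threshold."""
    return sum(1 for row in eye_region for value in row if value < dark_threshold)


def classify_eye_state(eye_region, dark_threshold=60, min_dark_pixels=4):
    """Return 'open' if enough dark pixels are visible, otherwise 'closed'.

    Both thresholds are hypothetical placeholders; in practice they would be
    tuned per session against the judgments of a human observer.
    """
    n_dark = count_dark_pixels(eye_region, dark_threshold)
    return "open" if n_dark >= min_dark_pixels else "closed"


# Toy 3 x 4 gray-level patches (0 = black, 255 = white), invented for illustration.
open_eye = [[200, 40, 35, 210],
            [190, 30, 25, 205],
            [195, 45, 50, 200]]
closed_eye = [[200, 180, 185, 210],
              [190, 170, 55, 205],
              [195, 185, 190, 200]]
```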
During the acquisition of the neuronal data, two experimenters (M.J., A.A.) controlled the position (2–50 μm steps) of the eight electrodes and the on-line spike sorting (ASD; Alpha Omega). Quality of detection and spike sorting was estimated and graded on-line every 3 min. The on-line quality estimation was based on the superimposed analog traces of the recently (20–100) sorted spikes, the waveforms of events that crossed an amplitude threshold set by the experimenter above the noise level of each electrode, the cumulative distribution of the distances from the detected events to the detection template, and the stability of the discharge rate.
The first step in the neuronal data analysis targeted verification of the real-time isolation quality (Joshua et al., 2007) and stability of the discharge rate (Gourévitch and Eggermont, 2007). Recorded units were subjected to off-line quality analysis that included tests for rate stability, refractory period, waveform isolation, and recording time. First, firing rate as a function of time during the recording session was graphically displayed, and the largest continuous segment of stable data was selected for additional analysis. Second, cells in which a fraction >0.02 of all interspike intervals was <2 ms were excluded from the database. Third, only TANs with an isolation score (Joshua et al., 2007) >0.8 and DANs with an isolation score >0.5 were included in the database. The lower threshold used for the DANs is attributable to the highly dense cellular structure of the substantia nigra pars compacta (SNc), which makes single-cell isolation difficult. We tested the subgroup of DANs with an isolation score >0.8 (N = 52) and found the same qualitative population results as reported for the larger DAN population. Finally, only cells that met the above inclusion criteria for >20 min during performance of the behavioral task were included in the neural database (average, 59 min and 346 trials). Table 1 provides the statistical details of the cells that were included in the analysis database.
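The numerical inclusion criteria listed above can be summarized in a short Python sketch. The function names and the example spike trains are hypothetical; the isolation score itself (Joshua et al., 2007) is taken as a precomputed input and is not reimplemented here, and the manual selection of a stable segment is assumed to have been done already.

```python
def short_isi_fraction(spike_times_s, limit_s=0.002):
    """Fraction of interspike intervals shorter than limit_s (2 ms by default)."""
    isis = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    if not isis:
        return 0.0
    return sum(1 for isi in isis if isi < limit_s) / len(isis)


def passes_inclusion(cell_type, spike_times_s, isolation_score, stable_minutes):
    """Apply the refractory-period, isolation-score, and duration criteria."""
    min_score = {"TAN": 0.8, "DAN": 0.5}[cell_type]
    return (short_isi_fraction(spike_times_s) <= 0.02
            and isolation_score > min_score
            and stable_minutes > 20)


# Invented examples: a regular 5 Hz train and a train with many 1 ms intervals.
regular_train = [0.2 * i for i in range(100)]
doublet_train = [0.0, 0.001, 0.5, 0.501, 1.0]
```

For example, `passes_inclusion("DAN", regular_train, 0.6, 59)` accepts the unit, whereas the same isolation score would reject a TAN because of its stricter threshold.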
During recording, units were classified according to anatomical location, extracellular waveform, firing rate and pattern, background activity, and in some cases response to free reward and to injection of dopamine agonists. To validate classification, we performed off-line analyses of the extracellular waveform shape, firing rate, and firing pattern of the neurons (see Figs. 3b, 4b). Waveform shape was quantified as the duration from the first negative peak to the next positive peak; rate was defined as the average of the overall firing rate; firing pattern was quantified by the coefficient of variation (SD/mean) of the interspike intervals. To further validate the DAN population response, we repeated the population analysis on a subset of DANs with a firing rate <8 Hz and peak-to-peak duration of >0.5 ms. The results of this analysis were similar to those of the whole recorded population (data not shown). Finally, apomorphine (0.1 mg/kg) was injected in a few cases (see Fig. 4c) to test for suppression of DAN activity (Aebischer and Schultz, 1984). We quantified this suppression as the root mean square (RMS) of the high-pass-filtered signal (300–6000 Hz). We used the RMS and not the spike rate to avoid possible errors and biases induced by spike detection and sorting (Moran et al., 2006), which are enhanced after apomorphine intramuscular injection because of monkey movements.
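Two of the quantities used above, the coefficient of variation of the interspike intervals and the RMS of a signal, are simple enough to sketch directly. The function names are hypothetical, the population form of the SD is an assumption (the text specifies only SD/mean), and the 300–6000 Hz high-pass filtering step is assumed to have been applied before `rms` is called.

```python
import math
import statistics


def isi_cv(spike_times_s):
    """Coefficient of variation (SD/mean) of the interspike intervals."""
    isis = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    # Population SD is an assumption; the sample SD would be equally plausible.
    return statistics.pstdev(isis) / statistics.mean(isis)


def rms(signal):
    """Root mean square of an already filtered signal segment."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


# Invented examples: a perfectly regular train and an irregular one.
regular_cv = isi_cv([0.1 * i for i in range(50)])
irregular_cv = isi_cv([0.0, 0.1, 0.4])
```

A perfectly regular train gives a CV close to 0, and larger values indicate more irregular firing.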
Statistical analysis.
Neuronal responses to behavioral events were first characterized by their poststimulus time histogram (PSTH). The histograms were calculated in 1 ms bins and smoothed with a Gaussian window with a SD of 20 ms. The baseline firing rate was calculated by averaging the firing rate in the last 3 s of the variable (4–8, 3–7 s; monkey L and S, respectively) ITI and was denoted as baselineFR. To determine significant responses in the single PSTH analysis, we calculated the SD of the PSTH of the last 3 s of the ITI using the same number of trials as in the studied PSTH and identified time segments in which the deviation from baselineFR exceeded three times the ITI-SD (3 σ rule). A response was considered significant only if the duration of the deviant segment was >60 ms (three times the SD of the smoothing filter). A cell was considered to have a significant response in a trial epoch if at least one of its PSTHs (e.g., one of the three PSTHs after the reward, aversive, or neutral cue) had a significant response. To check that this analysis was not biased by multiple comparison confounds, we performed the same analysis on the ITI epoch and found that none of the cells was significantly modulated in this epoch.
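The smoothing and the 3 σ rule with the 60 ms duration criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the original MATLAB analysis: the function names are hypothetical, the Gaussian kernel is truncated at ±3 SD, edge bins are renormalized over the available samples, and positive and negative deviations are not separated when deviant segments are detected.

```python
import math


def gaussian_kernel(sd_bins, n_sd=3):
    """Normalized Gaussian weights truncated at +/- n_sd standard deviations."""
    half = int(n_sd * sd_bins)
    weights = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(rate, sd_bins=20):
    """Smooth a 1 ms binned rate vector with a Gaussian window (SD in bins)."""
    kernel = gaussian_kernel(sd_bins)
    half = len(kernel) // 2
    out = []
    for i in range(len(rate)):
        acc = norm = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(rate):  # renormalize at the edges
                acc += w * rate[j]
                norm += w
        out.append(acc / norm)
    return out


def significant_segments(psth, baseline_fr, iti_sd, n_sigma=3, min_ms=60):
    """Return (start, end) bin ranges, end exclusive, where the deviation from
    baseline exceeds n_sigma * iti_sd for more than min_ms consecutive bins."""
    segments, start = [], None
    # A trailing baseline value acts as a sentinel that closes the last run.
    for i, value in enumerate(list(psth) + [baseline_fr]):
        deviant = abs(value - baseline_fr) > n_sigma * iti_sd
        if deviant and start is None:
            start = i
        elif not deviant and start is not None:
            if i - start > min_ms:
                segments.append((start, i))
            start = None
    return segments


# Invented PSTH (spikes/s): an 80 ms deviation followed by a 30 ms deviation.
toy_psth = [5.0] * 100 + [12.0] * 80 + [5.0] * 100 + [12.0] * 30 + [5.0] * 50
toy_segments = significant_segments(toy_psth, baseline_fr=5.0, iti_sd=1.0)
```

In this toy example only the 80 ms deviation satisfies the duration criterion; the 30 ms deviation exceeds the 3 σ threshold but is too short to count as a response.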
We defined the difference index between two events as the mean absolute difference between their PSTHs. This index is a mean difference between rate functions and hence has units of spikes per second. To test the significance of this index, we used resampling (bootstrap) methods. Single-trial responses were shuffled and resampled repeatedly into two groups, and the difference index was then calculated between them. This process was repeated 500 times. A difference index was considered significant if it was larger than the difference indices of a given fraction of these surrogates (1 − p, where p is the significance level of the test). To cross-check the difference index results, we performed MANOVAs (p < 0.05) using 50 and 100 ms time bins. We also bootstrapped the MANOVA statistics and found that all these analyses yielded similar results. In this manuscript, we elected to show the difference index because it gives an intuitive range of difference [i.e., the average difference (in spike rate) between the responses to two events].
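The difference index and its resampling test can be sketched in Python as follows. This is a hedged illustration rather than the original analysis code: the function names, the seeds, and the toy data are hypothetical, single-trial responses are assumed to be already binned rate vectors of equal length, and smoothing is omitted for brevity.

```python
import random


def mean_psth(trials):
    """Average a list of single-trial rate vectors bin by bin."""
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]


def difference_index(psth_a, psth_b):
    """Mean absolute difference between two PSTHs (spikes/s)."""
    return sum(abs(a - b) for a, b in zip(psth_a, psth_b)) / len(psth_a)


def shuffle_test(trials_a, trials_b, n_surrogates=500, p=0.05, seed=0):
    """Return (observed index, significant?) using shuffled surrogate groups."""
    rng = random.Random(seed)
    observed = difference_index(mean_psth(trials_a), mean_psth(trials_b))
    pooled = list(trials_a) + list(trials_b)
    n_a = len(trials_a)
    exceeded = 0
    for _ in range(n_surrogates):
        rng.shuffle(pooled)  # reassign trials to two surrogate groups
        surrogate = difference_index(mean_psth(pooled[:n_a]),
                                     mean_psth(pooled[n_a:]))
        if observed > surrogate:
            exceeded += 1
    # Significant if larger than at least a fraction (1 - p) of the surrogates.
    return observed, exceeded / n_surrogates >= 1 - p


# Invented data: 20 trials x 10 bins per event, rates near 5 and 15 spikes/s.
data_rng = random.Random(1)
trials_a = [[data_rng.gauss(5, 1) for _ in range(10)] for _ in range(20)]
trials_b = [[data_rng.gauss(15, 1) for _ in range(10)] for _ in range(20)]
observed, significant = shuffle_test(trials_a, trials_b)
```

Because the index is an absolute difference, it is always non-negative, which is why significance is judged against shuffled surrogates rather than against zero.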
We derived two indices from the difference index. The first was the response index, which was defined as the difference index when one of the events was the neutral event (i.e., the difference index between the response to an event and the response to the neutral event). The second was the probability coding index. This index was defined as the difference index between the events with a high probability (p = 2/3 and 1) of receiving an outcome and the event with a low probability (p = 1/3) of receiving the same outcome. The clustering of the events into high and low probability followed the behavioral responses of the monkey (see Results) and allowed us to generate a simple graphic representation of our results. A MANOVA of the responses to all the three different probabilities yielded similar results.
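For reference, the verbal definitions of the three indices can be written compactly. The notation below is an assumption introduced for illustration: T denotes the number of time bins in the analyzed epoch, DI, RI, and PCI abbreviate the difference, response, and probability coding indices, and the "high" PSTH is assumed to pool the p = 2/3 and p = 1 trials of the same outcome type.

```latex
\begin{align}
\mathrm{DI}(A,B) &= \frac{1}{T}\sum_{t=1}^{T}\left|\mathrm{PSTH}_{A}(t)-\mathrm{PSTH}_{B}(t)\right| \\
\mathrm{RI}(A) &= \mathrm{DI}(A,\mathrm{neutral}) \\
\mathrm{PCI} &= \mathrm{DI}(\mathrm{high},\mathrm{low})
\end{align}
```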
In addition to the single-cell analysis, we performed population analyses. The responses of striatal TANs and DANs are very stereotypic (Graybiel et al., 1994; Schultz, 1998). Hence, the average population response was estimated by averaging the PSTH deviation from baselineFR across the whole population. To determine whether the population response was significant, we first constructed the single-cell PSTH at bins of 20 ms, and then averaged across the population to obtain the population PSTH. Finally, we performed a t test to check bin by bin whether the population response was significantly different from zero (p < 0.01). If the population PSTH was significant for more than three consecutive bins, it was considered a significant population response.
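The bin-by-bin population test can be sketched as a one-sample t statistic per 20 ms bin followed by a run-length check. The function names and the toy data are hypothetical. Because the Python standard library has no t distribution, the two-tailed critical value for p < 0.01 is passed in as a parameter; the value 4.604 used below corresponds to 4 degrees of freedom (five cells), and a population of about 100 cells would need a value near 2.6.

```python
import math
import statistics


def t_statistic(values):
    """One-sample t statistic for the hypothesis that the mean is zero."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(values)))


def significant_bins(deviations, t_critical):
    """deviations: one list per cell of PSTH deviation from baselineFR per bin.
    Returns one flag per bin: True if the population mean differs from zero."""
    return [abs(t_statistic(column)) > t_critical for column in zip(*deviations)]


def has_significant_response(flags, min_run=4):
    """True if more than three consecutive bins are significant."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False


# Invented data: 5 cells x 8 bins, with a 10 spikes/s response in bins 2-6.
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
toy_deviations = [[(10.0 if 2 <= b <= 6 else 0.0) + off for b in range(8)]
                  for off in offsets]
toy_flags = significant_bins(toy_deviations, t_critical=4.604)
toy_response = has_significant_response(toy_flags)
```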
The data of the two monkeys were grouped unless a significant difference between the individual monkeys was detected. Data analysis was performed on custom software using MATLAB V7 (Mathworks).
Results
We recorded the neuronal activity of TANs and DANs (Fig. 1a, Table 1) in parallel with the monitoring of the monkeys' behavior (Fig. 2). During recordings, the monkeys performed a probabilistic classical conditioning task (Fig. 1b) with food or air puff as the rewarding and aversive outcomes, respectively. This task design provides a symmetric expectation of a rewarding or aversive event after cue presentation and therefore served to test the following three hypotheses.
First, DAN and TAN activity reflects expectation, delivery, and omission of reward and of aversive events. The alternative is that only reward-related events are represented by the activity of one or both basal ganglia neuromodulators.
Second, DAN and TAN activity encodes a temporal difference (TD) prediction error for reward and aversive events (Sutton and Barto, 1998). The TD hypothesis suggests opposite modulations for positive errors (i.e., delivery and expectation of reward and omission of aversive events) and negative errors (i.e., expectation and delivery of aversive events and omission of a predicted rewarding event).
Third, TAN activity mirrors DAN activity. In a previous study, it was shown that the pause response of TANs was coincident with the increase in DAN activity; however, DANs but not TANs incrementally encoded reward probability (Morris et al., 2004). Here, we test whether this simultaneous opposite response also appears in a task that includes expectation and delivery of aversive events. We also examine whether TANs and DANs discriminate between reward and aversive events during the same parts of the task episode.
Monkey behavior reflects expectation of rewarding and aversive events
We recorded the monkeys' behavior during performance of a probabilistic classical conditioning task (Fig. 1b) with food or air puff as the rewarding and aversive outcomes, respectively. We tested how extensive (several months; 5 d/week; ∼1000 trials/d) conditioning affected the monkeys' behavior by monitoring licking and blinking responses during neural recordings (Fig. 2a,b).
The monkeys increased their licking in response to cues predicting food but only slightly to the aversive and neutral cues (Fig. 2c, top row). Similarly, the monkeys' frequency of blinking increased to cues predicting air puff but only slightly to reward and neutral cues (Fig. 2c, bottom row). The increase in blinking and licking during the cue epoch was maximal in trials in which the probability of outcome was 2/3 or 1 and smaller in trials in which the probability was 1/3 (0–500 ms before cue ending; p < 0.01, Tukey's HSD post hoc).
The behavioral responses to food or air puff delivery (and their corresponding sounds) were not dependent on their previous predictions (Fig. 2c, outcome column). Food and air puff omission, as well as the final (no outcome) event of the neutral trials were indicated to the monkeys by an additional “no outcome” sound. When expected food or air puff were not delivered (no outcome on the p = 1/3 or p = 2/3 trials), licking and blinking increased, respectively; this increase was in accordance with the previously instructed probability. The increase in the licking and blinking behavior was smaller and shorter than the increase after food or air puff outcomes (Fig. 2c, no outcome). Licking and blinking increased slightly to the neutral trials (Fig. 2c, no outcome, green line).
Normalization of the behavioral responses (Fig. 2d) reflects the opposite trends of the response to aversive versus rewarding events. It suggests that the monkeys mainly categorized the high-probability (p = 2/3 and 1) versus the low-probability (p = 1/3) cues. Heart rate analysis can discriminate between high- and low-arousal states (Berntson et al., 1997). However, analysis of the heart rate and its variability did not reveal significant differences between the epochs after aversive versus reward predicting cues, suggesting a symmetric effect on monkey arousal.
In sum, the analysis of the behavioral responses indicates the monkeys could distinguish between aversive, reward, and neutral cues and between the cues with high- and low-outcome probabilities. According to these behavioral findings, we grouped the events with high probability (p = 2/3 and p = 1) for the neural activity analysis.
The neuronal database
We recorded 191 DANs from the SNc and 313 TANs from the putamen; of these, 106 DANs and 180 TANs passed the quality criteria (see Materials and Methods) and their response was further analyzed (Table 1). Figures 3a and 4a show examples of the activity of a TAN and a DAN, respectively, recorded during the performance of the behavioral task. The TANs and DANs were identified on-line (see Materials and Methods) and identity was verified by off-line clustering of the spike waveforms, spike train pattern (Figs. 3b, 4b), and occasionally by analysis of the responses to apomorphine injection (Fig. 4c).
We found that, in each trial epoch (cue, outcome, and no outcome), most of the cells had a significant response to at least one event (Fig. 5). Below, we provide additional analysis both of the population and the single-cell responses at each epoch and compare the responses to the aversive, neutral, and reward-related events and between the DANs and TANs.
TAN and DAN activity is asymmetrically modulated by expectation of aversive events and reward in the cue epoch
Population analysis of the neuronal activity in the cue epoch shows that, whereas TAN average responses to the aversive, neutral, and reward predicting cues tended to overlap (Fig. 6a, top), the population average activity of the DANs was highly discriminative both between reward and aversive events and between cues with high (p = 2/3 and p = 1) and low (p = 1/3) prediction probability of reward delivery. The DANs' positive response to aversive cues was smaller than the response to the reward cues (Fig. 6a, bottom). The suppression of TAN activity after high-probability reward cues tended to be longer than after low-probability cues (Fig. 6a, compare blue with light blue lines) (for similar trends, see Shimo and Hikosaka, 2001; Ravel et al., 2003). As previously reported (Morris et al., 2004), comparison of the time of reward cue modulation showed that the DAN increase and TAN decrease of discharge rate were coincident (Fig. 6b, top) with slightly shorter lags for the DAN responses. Finally, unlike the responses to the reward cue, the TAN and DAN responses to aversive cues had a significant second phase in which the TANs increased their activity and DANs decreased their activity (Fig. 6b, bottom).
The population average PSTH can be biased by a few neurons with an extreme response or opposite effects may be averaged out. We therefore formulated the difference index (see Materials and Methods) as a measure of the modulations of a single neuron to different events. We grouped responses across probabilities and tested whether the single-cell responses to reward and aversive cues were different from the response to the neutral cue. We found that, in both TAN and DAN populations, the response index (absolute deviation from the neutral response) for the reward trials was larger than the response index for aversive trials (Fig. 7a). A substantial fraction of TANs and DANs showed a significant response index to reward cues, whereas only a small number of cells had a significant response index to aversive cues (Fig. 7a, inset).
When separating the DAN and the TAN responses into high-probability (p = 2/3 and 1) and low-probability (p = 1/3) cues, coding of the reward probability was larger and more frequent than coding of the aversive probability (Fig. 7b). A multivariate ANOVA in which we did not group the high probability (p = 2/3 and 1) cues yielded similar results (data not shown). The difference between TAN single-cell responses and the TAN population results suggests that single-cell responses had opposite trends and were averaged out in the population analysis.
To summarize, we found larger and more frequent single-cell modulation of TAN and DAN discharge after reward than after aversive predicting cues (hypothesis 1). As expected from the TD hypothesis, DANs code reward probability at the level of both the population and the single-unit responses; however, the TAN population did not robustly encode the outcome probability. Disconfirming the TD hypothesis, DANs also increased their activity in response to aversive cues and there was a small (but significant) difference between the DAN responses to aversive cues with different predictions of a future aversive event (hypothesis 2). Finally, the first phase of the responses of the TANs to the visual cues generally mirrored the response of the DANs. The DAN population, but not the TAN population, discriminated robustly between reward and aversive cues (hypothesis 3).
Both TANs and DANs respond to aversive and reward outcome
Population analysis of the neural activity at the time of outcome delivery and coincident sounds showed that both TANs and DANs responded to food and air puff delivery, but with a faster response to the air puff (Fig. 8a). Whereas the cumulative responses of the TANs to aversive and reward outcome were similar in magnitude, the response of the DANs was larger for the reward events. However, the DANs also responded to the aversive outcome with an excitation, although one smaller in magnitude (Fig. 8a). TAN and DAN activity at reward delivery was larger for the low-probability trials than for the high-probability trials (Fig. 8a). Comparison of the modulation time showed that the TAN and the DAN responses at the outcome epoch did not mirror each other. The large significant increase in the second phase of the TAN response to the reward outcome was coincident with only a small nonsignificant decrease in DAN activity (Fig. 8b, top). Furthermore, in the aversive outcome, the second phase of the TAN response overlapped the end of the first phase of the DAN response (i.e., the increase in the discharge rate of the TAN overlapped the increase in DAN activity) (Fig. 8b, bottom).
Single-cell analysis shows that a large fraction of the cells responded to reward or aversive events (Fig. 9a). However, as in the cue epoch, reward probability was better encoded than air puff probability in the outcome epoch as well (Fig. 9b), further showing that TAN and DAN activity was more strongly modulated by expectation of reward.
To summarize, we found a large modulation of TAN and DAN discharge rate after delivery of food or air puff, but with probability coding only for reward (hypothesis 1). We found larger TAN and DAN activity at reward delivery for the low-probability trials than for the high-probability trials. This trend was opposite to the trend found in the cue epoch (larger response to the higher probability), as expected according to the TD hypothesis (hypothesis 2). However, contrary to the naive TD hypothesis (which predicts that activity moves from outcome to cue during conditioning), we found that many TANs and DANs encoded the aversive outcome, whereas only very few encoded the aversive cue (compare Figs. 8, 9 with Figs. 6, 7). Furthermore, as opposed to the TD hypothesis, which predicts a decrease, DANs increased activity in response to the aversive outcome. In addition, the multiphasic response of the TANs, and the major changes observed in the second excitatory phase of the TAN responses, do not enable a straightforward comparison with the TD predictions. Finally, the TAN and DAN activity were not completely coincident, and both populations discriminated between reward and aversive trials (hypothesis 3).
TAN but not DAN population robustly differentiates between omission of rewards and omission of aversive events
In contrast to our previous study (Morris et al., 2004), outcome omission was explicitly notified to the monkeys by a typical sound (Fig. 1b). In the 2004 study, the responses of both DANs and TANs to the reward omission were small. Population analysis of the no-outcome events in the current study showed that TANs, but not DANs, had large modulations of their discharge rates. During this epoch, the TAN population response (Fig. 10a) differentiated between the reward trials (food omission), the aversive trials (air puff omission), and the neutral trials (expected no outcome). Furthermore, the suppression of TAN activity was slightly longer after omission of the high-probability reward (Fig. 10a). Analysis of the population PSTHs shows that only the average response of the TANs, but not the DANs to the outcome omission events was significant (Fig. 10b). The multiphase TAN response was not coincident with the phases of the insignificant DAN modulation (Fig. 10b). Finally, we did not find a significant difference between the duration of the DAN responses (Bayer et al., 2007) to the outcome omission after high- and low-probability cues (data not shown).
Single-cell analysis shows that, as in the cue epoch, the response index of both TANs and DANs for the reward trials was larger than the response index for aversive trials (Fig. 11a). A substantial fraction of TANs and DANs showed a significant response index to reward omission, whereas only a smaller number of cells had a significant response index to aversive omission (Fig. 11a, inset). Coding of reward probability was larger and more frequent than coding of the probability of the aversive events (Fig. 11b). The difference between DAN single-cell responses and the population results suggests that some single-cell responses had opposite trends and were averaged out in the population analysis.
To summarize, TAN and DAN single-cell modulations were larger for reward than for aversive omission, with probability coding only for reward omission (hypothesis 1). As shown previously, after outcome omission (no outcome) the DANs encode the TD error only weakly (hypothesis 2). TAN and DAN activity was not coincident, and the TAN, but not the DAN, population robustly coded the difference between reward, aversive, and neutral trials (hypothesis 3).
Discussion
In this manuscript, we have shown that DAN and TAN encoding is not limited to reward prediction error but also reflects other psychological and behavioral processes. We found that rate modulations of TANs and DANs to expectation of reward were larger than the modulations that followed predictions of aversive events. Furthermore, these neurons encode the expectation level (or the previous probability) of reward better than the expectation of aversive events. Finally, TAN responses were not coincident with DAN responses in all trial epochs. DANs encode the difference between reward and aversive trials in the cue and outcome epochs, whereas the TAN population encodes this difference in the outcome and no-outcome epochs. Therefore, complementary coding by TANs and DANs expands the encoding scope of the basal ganglia neuromodulators.
TANs and DANs strongly encode aversive outcome but not aversive expectations
There is no consensus regarding the responses of DANs to aversive events. Some studies suggest that at least some of the DANs increase their firing rate after aversive outcome (for review, see Horvitz, 2000), whereas others have found evidence of a decrease (Ungless et al., 2004; Coizet et al., 2006). The reported increase in the firing rate of the DANs and in striatal dopamine levels in response to negative events might be attributable to reward generalization (Mirenowicz and Schultz, 1996; Kakade and Dayan, 2002; Day et al., 2007). However, three observations argue against generalization in the present data. First, the blinking and licking behavior observed here indicates that the monkeys were able to reliably discriminate between reward, neutral, and aversive cues. Second, the calculation of the response index as the difference between the responses to appetitive/aversive and the neutral events overcomes the confounding effects of generalization. Finally, we found that the neuronal response to the aversive outcome was faster than the responses in reward trials.
Thus, DAN responses to aversive events may reflect multiple sources of modulation (see below for error prediction encoding). This may explain some of the inconsistencies between previous experiments. Whereas in the behaving animal there can be positive modulations of DAN discharge in response to aversive events because of attention/arousal processes, when the animal is anesthetized, only discomfort-related activity can be demonstrated (Ungless et al., 2004; Coizet et al., 2006).
As for the TANs, our study confirms the pioneering works showing fast and robust TAN responses to an aversive outcome (Ravel et al., 1999, 2003). Our study also extends our previous work (Morris et al., 2004), which showed minimal differences between the TAN population responses to cues predicting future rewards, cues predicting future aversive events, and neutral cues. However, we found that, unlike the population results, which probably represent the average of opposing effects, many single TANs differentiate between reward and neutral trials and encode reward probability (Fig. 7). We further found that the TAN population encodes the difference in reward probability at outcome delivery (Fig. 8a) and that these cells have a large response to outcome omission (Fig. 10a). In the previous work (Morris et al., 2004), we found no discriminative response at outcome delivery and only a low response to reward omission. Future studies should explore whether this lack of concordance is attributable to differences in behavioral paradigms (for example, explicit vs implicit notification of trial termination, operant vs classical conditioning, and only rewarding outcomes vs rewarding and aversive outcomes) or to the animals' behavioral strategy and confidence in the prediction of future outcome.
DANs encode more than reward prediction errors
Recent studies have shown that DAN activity encodes the mismatch between prediction and reality. Most of these studies have focused on the mismatch in the positive domain (i.e., when conditions are better than expected) (Schultz et al., 1997). DANs typically increase their discharge rate in response to appetitive predictive cues and outcomes. In line with the predictions of reinforcement learning theories, the DAN discharge decreases with omission of predicted rewards (Schultz et al., 1993; Fiorillo et al., 2003; Matsumoto and Hikosaka, 2007). However, this discharge suppression is limited because the neuronal firing rate is truncated at zero. Indeed, several groups (Morris et al., 2004; Bayer and Glimcher, 2005) have reported that the instantaneous firing of DANs does not demonstrate incremental encoding of reward omission, and it was suggested that omission is encoded by duration of the discharge decrease (Bayer et al., 2007). In this experiment, however, we failed to find any significant coding of reward omission by response amplitude or duration.
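The limit imposed by truncation of the firing rate at zero can be illustrated with a minimal sketch. It assumes, purely for illustration, that the firing rate is a baseline plus a gain times the prediction error, rectified at zero; the function name `firing_rate` and the baseline (5 spikes/s) and gain (10 spikes/s per unit error) are hypothetical values, not estimates from the recorded neurons.

```python
def firing_rate(prediction_error, baseline_hz=5.0, gain_hz=10.0):
    """Rectified linear mapping from prediction error to firing rate.

    baseline_hz and gain_hz are hypothetical values chosen for illustration.
    """
    return max(0.0, baseline_hz + gain_hz * prediction_error)


if __name__ == "__main__":
    for error in (-0.75, -0.5, -0.25, 0.25, 0.5, 0.75):
        print(f"error {error:+.2f} -> {firing_rate(error):.1f} spikes/s")
```

With these values, negative errors of -0.5 and -0.75 both map to zero spikes per second and can no longer be distinguished by rate, whereas positive errors of the same magnitude remain graded. This floor effect is why a low-baseline neuron can encode positive errors incrementally by rate but not negative errors.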
Naive reinforcement learning models categorize events as having positive or negative errors and would suggest opposite-sign modulations in reward and aversive trials (Schultz et al., 1997). However, we found similar trends for DAN responses to predictions, outcomes, and omission of reward and aversive-related events (Figs. 6a, 8a, 10a). In particular, we found a substantial increase in response to both reward and aversive outcome. Furthermore, responses of the DANs to reward omission and aversive outcome (Figs. 10a, 8a, respectively) were very different (decrease vs increase), although in both cases there was a negative reinforcement error.
To summarize, our results reveal that the encoding of value by the DANs is more complex than a pure prediction error signal. This does not rule out their role in the temporal difference hypothesis. Rather, our working hypothesis holds that the discharge rate of DANs and TANs reflects changes in reward prediction as well as changes in attention/arousal levels (Horvitz, 2000; Ravel and Richmond, 2006; Redgrave and Gurney, 2006).
Asymmetric encoding of positive and negative expectations by the basal ganglia
Previous primate instrumental conditioning experiments in which DANs and TANs were recorded did not include expectation of aversive outcome because the air puff could be avoided by a correct response (Mirenowicz and Schultz, 1996; Yamada et al., 2004). The symmetric classical conditioning paradigm of this study, which included reward-predicting, aversive-predicting, and neutral cues, enabled us to explore whether there was symmetry in the encoding of expectation of rewarding versus aversive events by the TANs and the DANs.
Single-cell analysis revealed that TAN and DAN encoding of reward expectation and omission was larger and more frequent than encoding of expectation and omission of aversive events (Figs. 7a, 11a). Furthermore, we found that TAN and DAN encoding of the reward probability was larger and more frequent than their encoding of the probability of the air puff-related events (Figs. 7b, 9b, 11b). The preferential activation to reward was also apparent in the population response of the DANs in the cue and outcome epochs, in which the activity of these cells was larger and coded the probabilities better (Figs. 6a, 8a). Thus, in line with previous studies (Mirenowicz and Schultz, 1996; Yamada et al., 2007), we show that even in a classical conditioning task in which the air puff is unavoidable, expectation of aversive events is weakly represented in the basal ganglia activity.
TANs do not mirror the DAN responses
The anatomical demonstration of dopaminergic innervation of striatal cholinergic interneurons (Lehmann and Langer, 1983) and the suppression of acetylcholine efflux from striatal slices by dopamine (Stoof et al., 1992) suggest that DANs directly inhibit the TANs (Wang et al., 2006). TANs might mediate the dopaminergic message to the D1 and D2 dopamine receptor-containing striatal projection neurons.
The opposite and coincident responses of the TANs and DANs to predictive cues (Fig. 6) support direct inhibition. However, TAN responses at the terminal stage of the trial (Figs. 8b, 10b) include major positive deflections that do not mirror any phase of the dopaminergic response. Notably, after outcome omission, DANs respond similarly to the neutral outcome, reward omission, and air puff omission, whereas the TANs robustly discriminate between the three events (Fig. 10). Thus, DANs may better encode the cue-predicting events, and the TANs may provide more information at the completion of the trial. This is consistent with the findings of subpopulations of striatal projection neurons with selective evaluative encoding of trial results (Lau and Glimcher, 2007, 2008). In any case, these differential responses indicate that the TAN discharge is not totally governed by its dopaminergic inputs; nor are the TANs and DANs driven by a common source (Matsumoto et al., 2001) with opposite effects on the two systems.
Concluding remarks
In this study, we showed that the dopaminergic and the cholinergic neuromodulators of the basal ganglia encode the positive domain of behavior in a nonredundant manner. This asymmetric encoding of behavior suggests that the basal ganglia collaborate with other neuronal systems to shape the animal's response to diverse environmental events. The characteristics and interactions of these different neuronal systems may provide the basis for asymmetric, irrational human attitudes toward rewarding and aversive events (Tversky and Kahneman, 1981). Finally, the stronger involvement of the basal ganglia in positive reinforcement learning is congruent with the findings that parkinsonian patients are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes (Frank et al., 2004).
Footnotes
- This work was partly supported by the “Fighting against Parkinson” grant from the Hebrew University Netherlands Association. We thank Dr. Bryon Gomberg for MRI; Michael Levi and Michal Rivlin for help in preparing the experimental setup; Yael Renernt and Inna Finkes for monkey training and general assistance; and Geoffrey Schoenbaum, Yavin Shaham, and Genela Morris for critical reading of early versions of this manuscript.
- Correspondence should be addressed to Mati Joshua, Department of Physiology, The Hebrew University–Hadassah Medical School, P.O. Box 12272, Jerusalem 91120, Israel. mati{at}alice.nc.huji.ac.il