Abstract
We recorded neural activity in the ventral pallidum (VP) while rats learned a pavlovian reward association. Rats learned to distinguish a tone that predicted sucrose pellets (CS+) from a different tone that predicted nothing (CS–). Many VP units became responsive to CS+, but few units responded to CS–. When two CS+ were encountered sequentially, the earliest predictor of reward became most potent. Many VP units were also activated when the sucrose reward was received [unconditioned stimulus (UCS)]. These VP units for UCS remained responsive to sucrose reward after learning, even when sucrose was already predicted by CS+. Neural representation of reward learning and reward itself was characterized by population codes. The population of units that responded to CS+ increased with learning, whereas the population that responded to UCS did not change. A relative firing rate code also represented the identities of conditioned stimuli and UCS. Firing rate differences among stimuli were acquired early and remained stable during subsequent training, whereas population codes and behavioral conditioned responses continued to develop during subsequent training. Thus, the VP makes use of dynamic CS population and rate codes to encode pavlovian reward cues in reward learning and uses stable UCS population and firing codes to encode sucrose reward itself.
- ventral pallidum
- reward
- pavlovian conditioning
- population code
- firing rate code
- mesolimbic
- dopamine
- accumbens
- incentive salience
- motivation
- learning
- electrophysiology
Introduction
The ventral pallidum (VP) is critical in mesocorticolimbic reward circuits (Kalivas and Nakamura, 1999). It integrates GABA, glutamate, opioid, and dopamine signals from nucleus accumbens, striatum, amygdala, and prefrontal cortex (Johnson and Napier, 1997; Olive and Maidment, 1998; Kelley, 1999; Zahm, 2000). In addition, the VP transmits reciprocal information with nucleus accumbens (Groenewegen et al., 1993) and drives ascending signals to prefrontal cortex via dorsomedial thalamus and descending signals to ventral tegmentum and brainstem targets (McAlonan et al., 1993; Panagis et al., 1997).
Behavioral studies have demonstrated that the VP mediates animals' responses to drug and natural rewards, and their hedonic impact. VP electrical stimulation, or microinjection of various drugs, elicits eating, locomotion, and reward-related behaviors like instrumental responding and conditioned place preference (Johnson et al., 1993; Panagis et al., 1995; Gong et al., 1996, 1999; Fletcher et al., 1998; Stratford et al., 1999). VP lesions conversely prevent sucrose or psychostimulant reward from establishing conditioned place preferences and alter food reward thresholds necessary for instrumental responding (Hiroi and White, 1993; McAlonan et al., 1993; Johnson et al., 1996; Gong et al., 1997). Lesions of the VP also cause reduced hedonic “liking” for sucrose taste, as indicated by loss of positive orofacial reactions and replacement with aversive reactions in rats (Cromwell and Berridge, 1993). VP mediation of the hedonic impact of taste is consistent with its input from accumbens shell regions, in which opioid agonists modulate taste affective reactions and palatability-related aspects of food intake (Kelley et al., 2002; Peciña and Berridge, 2000).
VP neurons have received little attention in neuronal recording studies of reward learning in behaving animals. However, neural recording studies of other structures, such as ascending dopamine projections from midbrain and targets in striatum and nucleus accumbens, have shown that after reward learning, many of those neurons show anticipatory responses to predictive stimuli, especially the first of a series of predictors. Those anticipatory responses to reward cues sometimes predominate over responses to instrumental movements and even over responses to the predicted rewards themselves (Aosaki et al., 1994; Mirenowicz and Schultz, 1994; Tremblay et al., 1998; Schultz, 1998b, 2001; Carelli et al., 2000; Cromwell and Schultz, 2003; Ghitza et al., 2003; Setlow et al., 2003). A crucial question is how brain reward systems continue to encode the receipt of a reward unconditioned stimulus (UCS) after its prediction has been learned. The VP seems viable as a candidate component to mediate both conditioned stimulus (CS) effects (e.g., associative prediction, cue-triggered incentive motivation, conditioned reinforcement) and reward UCS effects (e.g., associative teaching or prediction error signals, hedonic impact, incentive salience boosting, response reinforcement). If so, the VP could be pivotal to a multiplex reward system for processing rewards and their predictors.
To test whether the VP encodes both learned reward cues and unconditioned rewards, we recorded from VP units while rats learned a pavlovian association between auditory conditioned stimuli and sucrose reward UCS (Fig. 1). We found that VP neuronal responses used two types of code to represent learned conditioned stimuli and their reward UCS: a dynamic population code and a firing rate code. The population code for CS+ developed slowly and progressively with learning trials, whereas the CS+ firing rate code developed relatively early in learning. In contrast, the UCS population and firing rate codes both were stable over trials, remaining constant before and after learning. These results provide a first demonstration that the VP firing rate and population codes might be neural candidates for representing both UCS reward and CS incentive cues.
Materials and Methods
Animals. Male Sprague Dawley rats (290–340 gm body weight; n = 5 rats with seven recording electrodes per rat) were housed individually on a 10:00 A.M. to 10:00 P.M. reversed light/dark schedule. Food was limited to 20–25 gm per day, provided after recording sessions, to keep the rats motivated to retrieve sucrose pellets. Water was supplied ad libitum. Weight was monitored, and no rats lost weight during the study. The University Committee on the Use and Care of Animals approved all experimental methods.
Recording. All pavlovian training, testing, and recordings were conducted in the same 28 × 35 cm plastic test chambers. A rat was placed near the center of the chamber at the beginning of a trial and was free to explore throughout the session. The chamber top was open to allow connections to the electrode headstage by a commutator. A mirror was placed at an angle under the glass bottom of the chamber to allow a video camera to view and record the behavior of rats during recording sessions. The chamber contained a red house light, a pellet dispenser, a pellet dish, and two buzzers that produced the CS tones. Computer programs controlled the stimulus presentations and data recording. We used a program written in this laboratory (Mtask) to manage the presentation of conditioned and unconditioned stimuli and to record the presentation times in a database record. Neuronal activity was recorded on a computer with Workbench (DataWave Technologies, Longmont, CO). The timestamp clocks for behavioral task control and neural and video recordings were synchronized to enable later evaluation of neural activity related to stimulus presentations and to behavioral movements, which were identified offline in a frame-by-frame video analysis (Aldridge and Berridge, 1998).
Preparatory magazine training. Each rat was handled initially and allowed to explore the recording chamber freely in three 20 min sessions before training to gentle them and accommodate them to experimental procedures. The accommodation sessions were followed by three 20 min magazine training sessions, wherein rats received one sucrose pellet (the UCS) on a fixed-interval 1 reinforcement schedule (one sucrose pellet per minute). In these magazine training sessions, all rats learned to move from the chamber perimeter or center toward the sucrose bowl and take the sucrose pellet within a few seconds after the feederclick (the short clicking sound made by the pellet dispenser when it releases a pellet). Once the magazine training task was learned, recording electrodes were implanted stereotaxically into the left VP of each rat.
Electrode implantation surgery. Rats were anesthetized with a mixture of ketamine (100 mg/kg) and xylazine (10 mg/kg) for the stereotaxic placement of electrodes (lateral, 2.6; anterior, –0.46; D, 7.3). The caudal half of the VP was targeted based on prior studies indicating possible reward roles for this region (Cromwell and Berridge, 1993; Johnson et al., 1993; Panagis et al., 1997). Electrodes consisted of twisted bundles of eight wires (15 μm tungsten or platinum iridium) (Jaeger et al., 1990). Each wire served as an independent recording electrode, providing up to seven recording sites per rat tightly clustered within a region equal to or smaller than 1 mm in diameter. One wire with no spike activity was selected during recording sessions to serve as a reference channel for differential recording. Bone screws were inserted into the skull and served as a ground reference.
To facilitate accurate placement within the VP, neuronal activity was recorded during surgery (differential reference to ground) as the electrode cluster was lowered into the brain. Neurons in regions dorsal to the VP have slower spike train patterns and smaller amplitudes than the pallidum. A transition to faster spike rates and larger amplitudes while the electrodes were lowered during surgery was interpreted as a sign of electrode movement from dorsal structures into the VP, indicating successful VP placement of the electrode. Another useful indicator of successful placement in some instances was transition from the signature pattern of axons in the internal capsule (e.g., short duration, initially positive). Electrode placements were confirmed histologically after the experiment. Once placed, electrodes and the mounting adapter were fixed in place with dental acrylic.
Pavlovian training. Rats recovered for at least 3 d before the pavlovian training and VP recording sessions began. VP neural activity was recorded on a daily basis during all pavlovian training sessions. During 45 min pavlovian training sessions, two different tones were arbitrarily assigned as either the rewarded conditioned stimulus on which sucrose presentation was contingent (CS+) or the unrewarded conditioned stimulus that predicted nothing (CS–) [one tone was continuous and relatively high in frequency (3800 Hz; 10 sec duration); the other tone was pulsed and lower in frequency (400 Hz; 10 sec total of ∼0.75 sec on/off pulses)]. The assignment of tones as CS+/CS– was counterbalanced for different rats (low-frequency pulsed tone CS+, two rats; high-frequency continuous tone CS+, three rats).
Tones were presented on a variable interval, 2 min schedule in pseudorandom order, so rats could never predict when a tone would be presented or which tone would be next (Fig. 1). Rats were free to move about the entire chamber and typically were near the sucrose bowl or chamber perimeter at the start of a trial. All rats received 5 d of pavlovian training. In addition, three rats were further tested on one to four additional pavlovian sessions to ensure that a sufficient number of units were recorded to permit statistical analysis for every rat.
Serial order of CS+1, CS+2, and UCS. The CS+ tone was always the first predictor in a rewarded trial. On termination of the 10 sec CS+ tone, the feeder made a clicking sound, and ∼1 sec later the sugar pellet reward (UCS) reached the food cup. Thus, two serial CS+ events, a tone and a feederclick, predicted reward (UCS). For that reason, we will refer to the CS+ tone as the CS+1 and to the feederclick as the CS+2. The onset time of UCS was defined as the moment a rat's mouth or tongue made contact with the sucrose pellet as determined from a freeze frame video analysis.
Early/late training phases for analysis. To evaluate the temporal development of reward learning, pavlovian training was broken into early and late training phases. We divided the unit sample across these phases in a way that balanced the number of units in each phase. For rats that received an odd number of training days (5 or 9), the early/late split was assigned at the middle day that best equalized the number of units recorded in each phase (e.g., early = days 1–2 and late = days 3–5; early = days 1–4 and late = days 5–9) (Table 1). However, because early/late division by units produces discrepancies across rats in the number of days contained in each phase, we also reanalyzed data after defining the early phase simply as days 1 and 2 and defining late training as all subsequent days. In most cases, statistical results and conclusions were the same for both analyses, so we report only results from the equal division of units, except where the two analyses produced different results, in which case we report both.
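The unit-balancing rule can be illustrated with a minimal sketch. It assumes the split is chosen by minimizing the absolute difference in unit counts between phases and, unlike the procedure described above, it searches every possible split day rather than only the middle days. The function name balanced_split and the daily unit counts are hypothetical and are not taken from Table 1.

```python
def balanced_split(units_per_day):
    """Return the last day (1-indexed) of the early phase that best
    equalizes the number of units in the early and late phases."""
    total = sum(units_per_day)
    best_day, best_gap = None, None
    early = 0
    # Try every split that leaves at least one day in each phase.
    for day, n_units in enumerate(units_per_day[:-1], start=1):
        early += n_units
        gap = abs(early - (total - early))
        if best_gap is None or gap < best_gap:
            best_day, best_gap = day, gap
    return best_day


# Hypothetical unit counts for five training days (not data from the study).
counts = [6, 7, 4, 5, 4]
last_early_day = balanced_split(counts)
print("early = days 1-%d, late = days %d-%d"
      % (last_early_day, last_early_day + 1, len(counts)))
```

With these illustrative counts the early phase ends on day 2, which has the same form as the "early = days 1–2 and late = days 3–5" example in the text.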
Histology. After final recordings were finished, rats were anesthetized before marking electrode placements. Electrolytic lesions were made by passing sufficient electrical currents through one wire to cause visible damage around the wire bundle. Rats were given an overdose of pentobarbital and perfused transcardially with saline and formaldehyde. Brains were removed, sliced in 40 μm coronal sections, and stained with cresyl violet. Slices were examined under a microscope to verify lesion site and electrode placement in the VP.
Behavioral analyses. The behavioral response to reward was quantified by the number of trials on which a rat's nose crossed the food dish boundary. We compared the total number of nose crossings during each CS+ tone and CS– tone (10 sec periods) to a matched baseline period of 10 sec before each tone. ANOVA and Bonferroni-corrected post hoc tests were used for statistical comparisons of nose crossings. Other behavioral responses were scored in a frame-by-frame video analysis, including: orientation to the dish (sudden shift of the head toward the food dish), approach to dish (scored as steps toward the food dish and as nose crosses into the food dish), orofacial movements (licking and jaw movements made outside of pellet acquisition), and UCS acquisition (scored as the moment the sucrose pellet was first touched by the rat's mouth or tongue).
Neural analyses. We used an interactive computer program, Offline Sorter (Plexon, Inc., Dallas, TX), to discriminate neural unit spike waveforms from noise and other units. Single units (45% of total sample) were identified by distinct spike waveforms with a clear refractory period in an autocorrelation histogram (NeuroExplorer; Nex Technologies, Littleton, MA). Units that could not be separated from each other or from noise were discarded. Clear waveform clusters that appeared to represent single units, but had small amounts of noise evident in the refractory period on the autocorrelogram, were classified and tested separately (55% of total sample). A comparison of units with noise components to single units without noise showed no difference in the number of responses to conditioned stimuli or UCS (F = 1.469; p < 0.232) and no difference with training (F = 0.260; p < 0.610). In all cases below, unless noted specifically, we, therefore, pooled single units and units with noise and referred to them as “units.” As a final discrimination procedure, we performed a cross-correlation analysis (NeuroExplorer; Nex Technologies) on all units to ensure that any unit recorded simultaneously on two electrode sites was counted only once in the analyses.
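The duplicate-unit check can be approximated by a simple coincidence measure, sketched below. This is not the NeuroExplorer cross-correlation procedure itself. It assumes that two sorted units are the same neuron when most spikes in each train have a partner in the other train within a short tolerance. The function names, the 0.5 msec tolerance, and the 0.8 threshold are illustrative assumptions.

```python
from bisect import bisect_left


def coincidence_fraction(train_a, train_b, tolerance=0.0005):
    """Fraction of spikes in train_a (sec) that have at least one spike
    in train_b within +/- tolerance sec."""
    if not train_a:
        return 0.0
    b = sorted(train_b)
    hits = 0
    for t in train_a:
        # First spike in b at or after the start of the tolerance window.
        i = bisect_left(b, t - tolerance)
        if i < len(b) and b[i] <= t + tolerance:
            hits += 1
    return hits / len(train_a)


def likely_same_unit(train_a, train_b, tolerance=0.0005, threshold=0.8):
    """Assumed criterion: near-synchronous spikes in both directions."""
    return (coincidence_fraction(train_a, train_b, tolerance) >= threshold
            and coincidence_fraction(train_b, train_a, tolerance) >= threshold)


# Hypothetical spike times (sec) on two wires.
wire_1 = [0.100, 0.250, 0.400, 0.730]
wire_2 = [0.1001, 0.2502, 0.3999, 0.7301]   # near-copy of wire_1
wire_3 = [0.150, 0.300, 0.500, 0.900]       # unrelated train
print(likely_same_unit(wire_1, wire_2), likely_same_unit(wire_1, wire_3))
```

A pair flagged by such a test would be counted once in the unit tallies.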
Generally, electrodes did not have the same units on consecutive days. For example, units that appeared on one day often did not appear with identical waveforms on a consecutive day, and new units sometimes appeared after several days on an electrode that previously had no units. Overall, the waveforms and other characteristics of neuronal activity suggested that different units were being recorded over different training sessions.
Population code analyses. The goal of the neuronal analysis was to compare and evaluate unit population activity changes during the conditioned and unconditioned stimuli. Perievent time histograms and rasters for each unit were analyzed for the behavioral events of interest (CS+1 tone, CS+2 feederclick, CS– tone, contact with UCS sucrose pellet, emission of spontaneous movements). The 10 sec preceding each tone served as baseline for conditioned responses in VP firing. A unit was considered responsive to a behavioral event when three of five of its consecutive 50 msec bins on the histogram crossed a 99% confidence interval at any time during the epoch being assessed (Fig. 2). The rewarded tone (CS+1) and unrewarded tone (CS–) data periods were each 10 sec long, identical to duration of the tones themselves. The feederclick (CS+2) and reward stimulus (UCS) periods were 500 msec in duration. To analyze changes in unit population coding of particular stimuli over training, responsive units were tallied for each stimulus on each day of training. These tallies were translated into a binomial analysis comparing percentage of responsive units across stimuli and across training stages. ANOVA and Bonferroni-corrected post hoc tests were implemented with the program Systat (SPSS, Inc., Chicago, IL). For the analysis of reward learning effects on VP responses, neural data were divided into early and late training sessions and evaluated statistically with Systat.
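The responsiveness criterion can be sketched as follows. The sketch makes two assumptions that are not stated in the text: the 99% confidence interval is taken as the baseline mean ± 2.576 SD of the 50 msec bin counts (a normal approximation), and "three of five consecutive bins" is read as at least three bins outside the interval within any window of five consecutive bins. The function names and spike counts are hypothetical.

```python
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% quantile of the standard normal distribution


def confidence_bounds(baseline_counts):
    """Assumed 99% interval: baseline mean +/- 2.576 SD of bin counts."""
    m = mean(baseline_counts)
    s = stdev(baseline_counts)
    return m - Z_99 * s, m + Z_99 * s


def is_responsive(baseline_counts, epoch_counts, window=5, needed=3):
    """True if any run of `window` consecutive epoch bins contains at
    least `needed` bins outside the baseline confidence interval."""
    low, high = confidence_bounds(baseline_counts)
    outside = [c < low or c > high for c in epoch_counts]
    for start in range(len(outside) - window + 1):
        if sum(outside[start:start + window]) >= needed:
            return True
    return False


# Hypothetical spike counts per 50 msec bin.
baseline = [1, 0, 1, 2, 1, 0, 1, 1, 2, 1]
clustered = [1, 4, 1, 5, 6, 1, 0, 1, 1, 2]   # three deviant bins close together
scattered = [1, 4, 1, 1, 1, 1, 5, 1, 1, 1]   # two isolated deviant bins
print(is_responsive(baseline, clustered), is_responsive(baseline, scattered))
```

Requiring several deviant bins within a short window guards against counting a single noisy bin as a response; a 500 msec epoch contains 10 such bins and a 10 sec tone epoch contains 200.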
Movement-related analysis. To determine whether neural activity of responsive VP units during the tones was related to movements, we examined movement-related activation of 80% (45 of 56) of the units responsive to CS+ and 78% (21 of 27) of units responsive to CS– from all rats and all training phases during several types of limb movements, head movements, and oral movements. At least 10 repetitions of right or left head turns and right and left stepping movements were scored for perievent histograms in the same manner as conditioned stimuli. To determine whether neural activity of responsive units during UCS was related to jaw or tongue movements, we compared responses during oral movements made while grooming for 77% (37 of 48) of units responsive to UCS. Mouth and tongue movements that occurred during paw licking or flank licking were scored and identified by video analysis. Onset times of at least 10 licking or jaw movements made during grooming were scored for each unit, and the resulting histograms were analyzed as described above.
Rate code analysis. Relative changes in firing rates evoked by conditioned and unconditioned stimuli were computed in a rate code analysis of 500 msec epochs at the onset of all conditioned stimuli and unconditioned stimuli, and also of 10 sec epochs for CS tones. Our goal was to identify possible rate-coding mechanisms across the entire sample of responsive VP units. Because absolute pallidal firing rates vary widely from neuron to neuron, we compared the relative changes of each unit from its individualized baseline rather than absolute firing rates. For each unit, the rate of firing was determined during the 10 sec baseline periods before CS+ and CS–, during the 10 sec CS+1 tone and CS– tone, and during the first 500 msec periods of CS+1, CS+2, CS–, and UCS. Only 500 msec periods were used for the CS+2 click to prevent overlap with the immediately subsequent UCS (Fig. 1). Similarly, only a short 500 msec period was used for the UCS rate analysis to avoid the potentially confounding movement-related orofacial responses related to subsequent consummatory behavior. The pretone period served as baseline for each unit. With these data for each unit and each stimulus, relative firing rates were computed. Rates were normalized by dividing the firing rate in the stimulus period by its associated pretone baseline.
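The normalization step amounts to a ratio of two firing rates, sketched here under the assumption that spike times are available in seconds. The function names and the spike train are hypothetical. A unit with a silent baseline gives an undefined ratio, and the sketch returns None in that case; how such units were handled is not stated in the text.

```python
def firing_rate(spike_times, start, end):
    """Spikes per second in the half-open interval [start, end)."""
    n_spikes = sum(1 for t in spike_times if start <= t < end)
    return n_spikes / (end - start)


def normalized_rate(spike_times, baseline_epoch, stimulus_epoch):
    """Firing rate in the stimulus epoch divided by the pretone baseline rate.
    Each epoch is a (start, end) pair in seconds."""
    baseline = firing_rate(spike_times, *baseline_epoch)
    if baseline == 0:
        return None  # ratio undefined for a silent baseline (assumption)
    return firing_rate(spike_times, *stimulus_epoch) / baseline


# Hypothetical unit: 9 spikes/sec during a 10 sec pretone baseline,
# then 9 spikes in the first 500 msec after tone onset at t = 10 sec.
spikes = [i / 9 for i in range(90)] + [10 + i * 0.05 for i in range(9)]
ratio = normalized_rate(spikes, (0.0, 10.0), (10.0, 10.5))
print(ratio)
```

A value of 1 means no change from baseline, values above 1 indicate excitation, and values below 1 indicate inhibition, so units with very different absolute firing rates can be compared on one scale.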
All comparisons between CS and UCS differences in firing rates used these normalized firing rates, and all comparisons assessed the entire population of VP units responsive to any stimulus. That is, analyses of relative firing rates of a particular stimulus were not restricted to units that exceeded the 99% confidence interval during that stimulus but included all VP units that responded to any stimuli at all. Only units that had no responses to any behavioral event were excluded. Thus, the relative rate changes to each stimulus represent a conservative estimate of rate coding, because some units may not have been responsive to that particular stimulus. ANOVA and Bonferroni-corrected post hoc tests were used for statistical evaluation of rate changes.
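The inclusion rule for the rate analysis can be written as a simple filter, shown below as a minimal sketch. The data layout (a dictionary of per-stimulus responsiveness flags for each unit) and the unit labels are hypothetical.

```python
def units_for_rate_analysis(responses):
    """Keep every unit that was responsive to at least one stimulus.
    `responses` maps a unit label to {stimulus name: responsive?}."""
    return [unit for unit, flags in responses.items() if any(flags.values())]


# Hypothetical responsiveness flags for three units.
responses = {
    "u1": {"CS+1": True, "CS+2": False, "CS-": False, "UCS": False},
    "u2": {"CS+1": False, "CS+2": False, "CS-": False, "UCS": False},
    "u3": {"CS+1": False, "CS+2": True, "CS-": False, "UCS": True},
}
included = units_for_rate_analysis(responses)
print(included)
```

In this example unit u1 would still contribute its normalized rate for CS+2, CS–, and UCS even though it was responsive only to CS+1, which is why the resulting averages are conservative.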
Results
Histology
All electrodes were in the VP within 0.25 mm of each other [anteroposterior (AP), –0.26 to –0.51 bregma; mediolateral (ML), 1.8–2.2; dorsoventral, –7.1 to –7.8]. The sites were in a midcaudal region of the VP, at an AP level in which the VP is immediately dorsolateral to the lateral preoptic area of the hypothalamus. Most electrodes were in the medial half of the VP, except one caudal electrode that was slightly more lateral than any of the others (ML, 2.9) (Fig. 3).
Behavior
Conditioned approach responses to the sucrose bowl, defined by nose crossing of the sucrose bowl boundary, confirmed that rats learned the discriminative pavlovian association by the late training trials (days 4–6) (Fig. 4). The CS+1 tone evoked more approaches in late training trials than in early training (early, 6.9 ± 0.7; late, 9.3 ± 0.4; t = –2.377; p = 0.012), and the CS+1 also evoked more nose crosses than the CS– tone by late training trials (early: t = 1.047, p < 0.306; late: t = 5.236, p < 0.0001). Approaches during CS– remained unchanged from early to late training (early, 6.0 ± 0.5; late, 5.8 ± 0.5; t = 0.318, p = 0.752). Approaches during the baseline pre-CS period also did not change with learning (early, 2.3 ± 0.75; late, 2.0 ± 0.265; t = 0.785, p = 0.438).
Slight generalization between the tones was indicated by the ability of both CS+ and CS– tones to evoke more approaches than during prestimulus baselines (F = 80.497; p < 0.0001), especially during early training before learning was complete (training × stimulus interaction: F = 5.203; p = 0.007). In early training, both CS+1 and CS– evoked weak approaches, but only CS+1 approaches rose 150% further on later trials (t = –2.377; p = 0.012).
A series of two separate auditory CS+ existed on reward trials, which provided opportunity to investigate the role of CS+ temporal order. The reward-predicting tone (CS+1) was always followed immediately by the audible feederclick mechanism (CS+2) that delivered the sucrose pellet. Rats had previously learned the relationship between this click and sucrose during several days of magazine training conducted before any pavlovian training with tones. Even on early tone-training trials, the CS+2 click caused rats to approach the sucrose bowl and eat the sucrose pellet within 10 sec on 69% of trials. The ability of CS+2 to evoke approach early in training confirmed it had already gained some reward significance in magazine training before conditioning of tones.
Beyond evoking an initial nosepoke into the sucrose bowl, the CS tones could also elicit multiple additional pokes during their 10 sec duration. These were reflected in an additional analysis of the absolute number of nose crossings per duration. The CS+ was considerably stronger than CS– in evoking this type of persistent behavioral approach. Even in early training, CS+1 crossings were 164% of those during CS– (CS+1 mean, 17.47; CS– mean, 10.6), and in late training this grew so that CS+1 nose crossings were 278% of those during CS– (CS+1 mean, 31.67; CS– mean, 11.39). So, the CS+ elicited much more sustained or persistent investigation of the sucrose bowl after training, even if the CS– tone elicited a brief check of the bowl, indicating the CS+1 became a stronger conditioned incentive stimulus. In summary, rats learned to distinguish between rewarded (CS+1) and unrewarded (CS–) tones, and the CS+1 was a much more potent elicitor of conditioned behavioral approach.
Neural activation
We recorded 69 units in the early training phase and 69 units in late training, for a total of 138 units from 35 electrode wires (see Table 1 for contributions of each rat). Overall, 96 units (70%) were responsive to one or more stimuli in the pavlovian reward task (CS+1, CS+2, CS–, or UCS). Subsequent analyses focus on this sample of 96 responsive units. UCS-responsive units were plentiful at all electrode sites, and CS-responsive units were especially numerous at caudal electrode sites (all except the most rostral site) (Fig. 3). The average firing rate of responsive units during the pretone period was 9.0 ± 0.5 spikes/sec, which is similar to rates observed in the VP by other workers (Heidenreich et al., 1995). Most ventral pallidal units had characteristic fast firing rates, although some clearly discriminable units had slower rates as observed in other pallidal structures and species (Aldridge and Gilman, 1991; Sachdev et al., 1991; Meyer-Luehmann et al., 2002).
Exclusion of sensory VP coding
Sensory features of the tones (e.g., high-frequency continuous vs low-frequency pulsed) appeared less important than associative learning or conditioned reward features (i.e., CS+ vs CS– prediction or incentive value) in determining VP responses. Sensory comparison was conducted on day 1, when the pulsed tone and continuous tone did not yet elicit different conditioned behavioral responses (F = 0.484; p = 0.513) or interact with their CS assignments in behavioral responses (F = 0.744; p = 0.422), indicating that the tones were still merely sensory stimuli that had not yet gained associative or conditioned incentive properties on the first day of training. We found no differences in VP unit populations responding to the low-frequency pulsed tone versus the high-frequency continuous tone on day 1 (F = 0.922; p = 0.343), no group differences between rats that had pulsed tone assigned as CS+1 and rats that had the continuous tone as CS+1 (F = 0.003; p = 0.957), and no interaction (F = 2.484; p = 0.124). Similarly, we found no differences in overall VP firing rates to the two sensory stimuli on day 1 (F = 0.437; p = 0.509), no group differences (F = 2.046; p = 0.154), and no interaction (F = 0.047; p = 0.829). These results indicate that there was no intrinsic auditory sensory coding in VP unit responses to the two tones before learning.
Exclusion of movement VP coding
Neural activity of VP units responsive to the task did not correlate strongly to any scored type of movement. For our sample of units responsive to CS+1, 98% were completely unresponsive to movement. Only 1 unit of the 45 CS+1 units examined (2%) responded at all to a head turn, and no units responded to a stepping movement. Thus, most CS+ responses seem to reflect the predictive cue stimulus, not concurrent orientation movements. Over 90% of examined units responsive to CS– showed no movement-related activity. Only two units responsive to CS– (10%; 2 of 21) were minimally responsive to any movement: one to head turn and the other to stepping movements. For UCS-responsive units, neural activity rarely coincided in time with the jaw movements and was never rhythmic in character. The first 500 msec of UCS contact, which is the period evaluated for UCS, usually ended before chewing began. Furthermore, over 89% of units responsive to UCS did not respond at all to mouth and tongue movements such as licking or jaw movements made during grooming of body flanks or paws (i.e., during the “movement control” condition). We found only 4 of 37 sampled units (10.8%) responsive to UCS also responded modestly to licking or jaw movements made outside of pellet acquisition. In summary, VP units did not generally seem to code the motor properties of body or oral movements (Fig. 2).
Population coding of reward task
Some units (41%; n = 39) were responsive uniquely to one stimulus, but most units responded to a combination of stimuli as described below (59%; n = 57); for example, units responsive to CS+1 may also be responsive to CS+2, CS–, or UCS (Fig. 5). Of 11 possible combinations of stimuli, only the combination of CS– and UCS had no responsive units.
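The count of 11 possible combinations follows from choosing two or more of the four task stimuli (6 pairs, 4 triples, and 1 set of all four). A short enumeration, using plain-text stimulus labels chosen for this sketch, makes the count explicit.

```python
from itertools import combinations

stimuli = ["CS+1", "CS+2", "CS-", "UCS"]

# A unit responsive to exactly one stimulus has four possible profiles.
single = list(combinations(stimuli, 1))

# A unit responsive to a combination responds to two or more stimuli.
multi = [combo
         for size in range(2, len(stimuli) + 1)
         for combo in combinations(stimuli, size)]

print(len(single), len(multi))  # 4 11
for combo in multi:
    print(" + ".join(combo))
```

The pair ("CS-", "UCS") appears in this list; it is the one combination for which no responsive units were found.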
Populations responding in early training
On trials early in training, rats had learned behavioral conditioned responses only to the CS+2 feederclick sound of sucrose delivery. This CS+2 click was also the most potent stimulus for eliciting VP neural activation in early training (days 1–3). Of 51 VP units examined in the early training period, 34 units (67%) responded to the CS+2 click. Twenty percent of these 51 units (n = 10) responded only to the CS+2. Twenty-six percent (n = 13) responded to the CS+2 click and were also activated by other conditioned stimuli (CS+1, CS–), and the remaining CS+2-responsive units were also responsive to UCS. In contrast to the large VP population that responded to the CS+2 click, only 4% (n = 2) of responsive VP units fired uniquely to the predictive reward tone (CS+1) that was still being learned and was relatively ineffective behaviorally. Approximately 12% responded to both the CS+1 tone and the CS+2 click. Similarly, small proportions of VP units responded to both CS+ and CS– tones (2%; n = 1) or uniquely to the CS– tone (2%; n = 1).
After the CS+2, the next most potent neural activator during early training was the unconditioned reward itself (the sucrose pellet, UCS). Sucrose UCS activated 47% (n = 24) of VP units, including 12% (n = 6) that responded to it alone. Other UCS-responsive units (14%; n = 7) were activated as well by one or both of the CS tones, and 12% (n = 6) responded to the CS+2 feederclick in addition to sucrose pellet rewards. Five units (10%) responded to the UCS in combination with the feederclick and one or both of the tones. In the reanalysis that defined early training as only the first 2 d, the UCS again evoked a robust population of early units, and the pattern among stimuli was also the same as for the equal unit analysis just described. However, the elevations of CS+2 over CS+1 and UCS were seen as trends rather than as significant differences in the more restricted analysis. This slight difference in outcome might reflect that learned differences among conditioned stimuli were still weak in the first 2 d or be attributable to the smaller number of units included when early training is defined as only the first 2 d rather than as 3–4 d.
In summary, during early training, the two most effective elicitors of VP unit activation were the click (CS+2) that signaled immediate sucrose availability and the sucrose reward UCS itself. At this early stage of learning, the novel CS+ and CS– tones were relatively ineffective elicitors of VP neuronal activity.
Late training
The pattern of neural responses reversed across CS+1 and CS+2 in late training sessions. At this stage, more VP units responded to the CS+1 tone than to either the CS+2 click or to the CS– tone (stimulus main effect: F = 7.339; p < 0.0001). This change in the neuronal population reflected discriminative pavlovian learning about CS+ versus CS– (Fig. 6) (interaction between stimulus and training phase: F = 4.386; p < 0.005). Again, in the 2 d reanalysis, the reversed elevation of CS+1 over CS+2 in late phase was seen only as a trend, but the pattern of results was otherwise the same, supporting the analysis already described that included additional units. In summary, the development of CS+1 population responses in late learning generally paralleled the development of behavioral conditioned anticipatory approach responses to the sucrose dish.
Most VP units (71%; n = 32 of 45 units) showed an acquired response to the rewarded CS+1 tone by late trials (Fig. 7a). This change represented a significant increase in CS+1 responses compared with early training (t = –2.859; p < 0.005). Units responsive to CS+1 included 20% (n = 9) of VP units that responded to the CS+1 tone alone. An additional 10% (n = 5) of VP units responded to the CS+2 click as well as the CS+1 tone. An additional 9% (n = 4) of VP units responded both to the CS+1 tone and to the receipt of the sucrose UCS, and 7% (n = 3) of units responded to all three in the series of reward-related stimuli, CS+1, CS+2, and UCS. Thus, in contrast to early training, the majority of VP units in late training were activated by presentation of the CS+1 tone, which was the first predictor during a trial to signal that sucrose reward was impending.
The increased population of responses to the first predictor in late training was accompanied by a fall in responses to the CS+2 feederclick (Fig. 7c) (t = 2.216; p < 0.029). Most units responsive to CS+2 also responded to CS+1 (10%; n = 5), UCS (4%; n = 2), or both (7%; n = 3). Only 7% (n = 3) of VP units responded to the CS+2 feederclick alone on late trials, <1/5 of the population size that had uniquely responded to CS+2 in early training.
The associative specificity of CS+ responses was indicated further by continuing low responses to the CS– tone that predicted no reward (Fig. 7b). Only 2% (n = 1) of the units responded to the CS– alone in late training trials. A larger minority responded to the CS– (27%; n = 12) in addition to CS+ or UCS stimuli. Overall, the proportion of VP units that responded to CS– on late trials remained unchanged from previous training (t = 0.296; p = 0.768).
In contrast to the drop in CS+2 responses in late training, the UCS reward itself remained a potent stimulus even after the CS associations were well learned. Nearly half (48%; n = 22) of the responsive units in the VP still altered firing in response to the actual receipt of sucrose reward in late training, a population size unchanged from early training (t = –0.177; p = 0.860). On late training trials, ∼16% (n = 7) of VP units responded solely to sucrose UCS (oral contact), and to no other stimulus. This population of “unique UCS responders” was undiminished in size from early training trials (t = –0.750; p = 0.458). An additional 33% (n = 15) of VP units responded both to the sucrose UCS and to one or more conditioned stimuli, usually the CS+1 tone (29%; n = 13). Approximately 9% (n = 4) of the units responded only to CS+1/UCS stimuli, and an additional 7% responded to those and to the intervening CS+2 feederclick. Another 9% (n = 4) responded to CS+1/UCS stimuli plus the CS–, and 4% (n = 2) responded to all conditioned stimuli as well as the UCS. In summary, the sucrose UCS continued to elicit robust responses from VP units, even after rats had learned the pavlovian CS associations that predicted it.
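The population percentages reported for late training are simple proportions of the 45 units sampled in that phase. As a minimal sketch of that arithmetic, the following Python snippet recomputes several of them from counts stated in the text. The category labels and the helper name `percent_of_population` are illustrative and are not taken from the original analysis.

```python
# Counts of late-training VP units taken from the text (denominator: 45 units).
# Category labels and the helper name are illustrative, not the authors' code.
N_UNITS_LATE = 45

late_counts = {
    "CS+1 (any response)": 32,
    "CS+1 only": 9,
    "CS+2 only": 3,
    "CS- only": 1,
    "UCS only": 7,
}


def percent_of_population(count, total=N_UNITS_LATE):
    """Return the share of units as a whole-number percentage."""
    return round(100 * count / total)


late_percent = {label: percent_of_population(n) for label, n in late_counts.items()}

for label, pct in late_percent.items():
    print(f"{label}: {pct}% (n = {late_counts[label]} of {N_UNITS_LATE})")
```

Because every category shares one denominator, the subpopulation sizes can be compared directly across stimuli.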
Excitatory/inhibitory activation patterns
Of all responsive VP units, 69% (n = 66 of 96) exhibited excitations alone, 4% (n = 4) were inhibited, and 27% (n = 26) had a mixture of excitation and inhibition. All responses evoked by conditioned stimuli (CS+1, CS+2, CS–; n = 83) were excitatory. UCS responses were sometimes excitatory but more often exhibited decreased activation patterns. The incidence of decreasing patterns predominated over excitation (p < 0.01; χ2 test; early: 33% excitation, 67% inhibition; late: 36% excitation, 63% inhibition), and the incidence did not change significantly across learning trials (t = –0.761; p = 0.453).
The “inhibition” associated with UCS was relative and not absolute. That is, firing rates decreased relative to preceding CS+ excitatory peaks but usually did not decrease below pre-CS+ baseline firing rates (Fig. 8). Therefore, this drop in firing with UCS might be more precisely characterized as “loss of CS+ excitation” rather than true inhibition, because it required prior CS-elicited excitation, and the drop existed mainly in relation to that excitation. This sequential interaction between CS+ and UCS may reflect a UCS “shutting off” of the activation triggered by CS+ (Fig. 8). UCS termination of CS+ activation is consistent with our observation that of 29 units that responded to both CS and UCS, 24 units had decreasing UCS responses, whereas units responding solely to UCS had more excitation than decreased activity (eight excitations vs four decreasing responses).
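The contrast between the two groups of UCS-responsive units can be made concrete with a small calculation. The Python sketch below arranges the counts given above into a 2 × 2 table and computes the fraction of decreasing responses in each group, together with an uncorrected Pearson χ2 statistic. This comparison is an illustration only and is not a test reported in this study. It assumes that the five CS-and-UCS units without a decreasing response were excitatory, and with cell counts this small an exact test would be more appropriate in practice.

```python
# 2 x 2 table built from counts in the text.
# Assumption: the 5 CS-and-UCS units without a decreasing UCS response
# (29 - 24) are treated as excitatory.
table = {
    "CS and UCS responders": {"excitation": 29 - 24, "decrease": 24},
    "UCS-only responders": {"excitation": 8, "decrease": 4},
}


def fraction_decreasing(row):
    """Fraction of units in a group whose UCS response was a decrease."""
    return row["decrease"] / (row["excitation"] + row["decrease"])


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


both = table["CS and UCS responders"]
ucs_only = table["UCS-only responders"]

frac_both = fraction_decreasing(both)
frac_ucs_only = fraction_decreasing(ucs_only)
chi2 = chi_square_2x2(
    both["excitation"], both["decrease"],
    ucs_only["excitation"], ucs_only["decrease"],
)

print(f"Decreasing, CS and UCS responders: {frac_both:.2f}")
print(f"Decreasing, UCS-only responders:   {frac_ucs_only:.2f}")
print(f"Pearson chi-square (df = 1):       {chi2:.2f}")
```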
Firing rate coding of reward task
To evaluate possible rate-coding mechanisms, and to better characterize the population of responsive units, we analyzed normalized firing rates during the 10 sec duration of CS+1 and CS–, and the 500 msec duration of all stimuli (CS+1, CS+2, CS–, and UCS). The entire responsive VP population was used for this analysis to ensure that any rate differences found reflected true firing rate changes shared by the active VP population as a whole and not just the subpopulations responsive to particular stimuli. Rate codes distributed across large numbers of VP units could be a channel of information processing that is at least potentially independent from the population codes based on smaller groups of units.
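To illustrate the kind of population-wide rate comparison described above, the following Python sketch averages normalized firing rates across all units for each stimulus. It rests on loud assumptions: the spike counts are synthetic and are not data from this study, and normalization is taken to mean dividing the rate in the 500 msec window by each unit's baseline rate, which may differ from the normalization actually used. All names are hypothetical.

```python
from statistics import mean

# Synthetic values for three hypothetical units (NOT data from this study).
# Each unit has a baseline rate (spikes/sec) and a spike count in the
# 500 msec window after each stimulus.
WINDOW_SEC = 0.5

units = [
    {"baseline_hz": 10.0, "spikes": {"CS+1": 7, "CS+2": 7, "CS-": 5, "UCS": 9}},
    {"baseline_hz": 4.0, "spikes": {"CS+1": 3, "CS+2": 3, "CS-": 2, "UCS": 4}},
    {"baseline_hz": 20.0, "spikes": {"CS+1": 12, "CS+2": 13, "CS-": 10, "UCS": 15}},
]


def normalized_rate(spike_count, window_sec, baseline_hz):
    """Rate in the window divided by the unit's baseline rate (assumed normalization)."""
    return (spike_count / window_sec) / baseline_hz


def population_mean_rates(units, window_sec=WINDOW_SEC):
    """Average the normalized rate over every unit, separately for each stimulus."""
    stimuli = list(units[0]["spikes"])
    return {
        s: mean(
            normalized_rate(u["spikes"][s], window_sec, u["baseline_hz"])
            for u in units
        )
        for s in stimuli
    }


mean_rates = population_mean_rates(units)

for stimulus, rate in mean_rates.items():
    print(f"{stimulus}: {rate:.2f} x baseline")
```

Averaging over every unit, rather than only over units classified as responsive to a given stimulus, is what lets a rate measure of this kind carry information separately from the population counts.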
We found that firing rates differed consistently between the pavlovian and reward stimuli. The UCS sugar pellet elicited the fastest overall firing from VP populations, CS+1 and CS+2 elicited intermediate rates of firing, and CS– had the slowest firing rates (Fig. 9a) (ANOVA, stimulus main effect, 10 sec period: F = 28.226, p < 0.0001; 500 msec period: F = 24.382, p < 0.0001).
Although some units decreased firing to UCS after previous activation to a CS+, as described above, sucrose UCS still elicited the fastest firing overall from the entire population of responsive VP units that responded to any CS or UCS reward stimulus at all. This suggests that the large majority of VP units must contribute to stimulus differences in the rate codes, even if some of those units do not meet our statistical criteria for the smaller subpopulation that “responds” to a particular stimulus. In other words, responsive VP units as a whole increased their firing rates most to UCS. This includes two-thirds of the UCS-responsive subpopulation that responded with decreasing activity after CS+1 and one-third that, like the VP population overall, increased firing further to UCS although they may have previously increased firing to CS+1.
Interestingly, the rate differences between stimuli across the entire population of responsive units existed in both early (i.e., first 3–4 d) and late training (absence of training phase main effect, 10 sec period: F = 0.584, p = 0.445; 500 msec period: F = 0.230, p = 0.631). The UCS had significantly higher firing rates than conditioned stimuli in both early and late training (early: t = 5.409, p < 0.0001; late: t = 3.316, p = 0.001). Thus, the training phase did not seem to modulate the relationship in which UCS had higher rates than CS stimuli (Fig. 9b) (absence of interaction between stimulus and training phase, 10 sec period: F = 2.377, p = 0.068; 500 msec period: F = 2.238, p = 0.082). This persistent relationship indicated that rate codes for UCS and conditioned stimuli were relatively stable.
CS rate codes also were acquired in very early training (earlier than detectable population codes), and CS rates thereafter remained relatively stable (early: t = 2.960, p = 0.003; late: t = 3.193, p = 0.002). The only exception to rate constancy was that the CS+1 rate during its full 10 sec presentation rose from the earliest training to late training (p < 0.05) in the analysis that defined early training as only the first 2 d (rather than 3 or 4 d). That rise in CS+1 rate when “early” is restricted to the earliest day or two shows that rate codes for the CS+1 tone are initially low (similar to the CS– tone). The CS+1 rates began to rise above CS– rates by days 3 and 4 (compared with population codes, which as described above develop more slowly and do not become fully established until the subsequent training phase that includes days 4–9). A significant interaction between stimulus and training phase supported this conclusion that rate coding of CS+1 versus CS– changes after the first 2 d of learning (p < 0.008). Relatively early establishment of CS+1 rate codes was also indicated by a day-by-day comparison of rates. For example, on day 1, CS+1 did not trigger higher firing rates than CS– (NS) but did so on days 2 and 3 (p < 0.05). An implication of these observations is that firing rate codes were established in the earliest few days of training and then remained constant, although the contributing subpopulations of VP units that code particular conditioned stimuli continued to change further with more training days (in conjunction with behavioral conditioned responses that also continued to change).
Discussion
VP units responded to sucrose rewards (UCS) and to learned incentive cues (CS+) that predicted those rewards (Fig. 10). Reward learning was coded by a change in the population of responsive units and by relatively stable firing rate patterns. Tone and click conditioned stimuli elicited neuronal responses that changed dynamically from early to late training. The proportion of responsive units that was recruited for the CS+1 tone, the earliest predictor of reward, increased whereas responses to CS+2 (feederclick) decreased (Fig. 10). This mirrored the behavioral pattern. In contrast, sucrose UCS elicited similar neuronal population responses before and after learning (Fig. 10). A relatively stable firing rate code for conditioned stimuli and UCS coexisted with the changing population of responsive units: UCS elicited the highest rates, followed by CS+2 and CS+1.
A dynamic population code was the dominant neural representation of behavioral learning about CS incentive stimuli, because that population code corresponded most closely to the growth of conditioned behavior. More VP units responded to CS+1 as its reward association strengthened over training (Fig. 10), and this population response was reflected in the rats' increase in behavioral approaches to the sucrose bowl during CS+1 on later trials. In contrast, the unpredictive CS– never recruited a responsive population beyond its small initial level and, correspondingly, never elicited increased bowl approaches. Thus, the VP seems to encode pavlovian reward conditioned stimuli using neural populations, the size of which correlates with the strength of behavioral learning. Neuronal population coding of CS+ might plausibly reflect their learning value (e.g., associative prediction), their motivational value (e.g., conditioned incentive salience), or both (Schultz, 1998a; Berridge and Robinson, 1998, 2003; Ghitza et al., 2003; McClure et al., 2003; Dayan and Balleine, 2002).
VP populations were sensitive to the serial order of CS+ predictors, when two well-learned CS+ were encountered in a series. The first CS+ was more effective than the second CS+ at eliciting VP populations in late training sessions (Fig. 10). This represented a reversal from previous trials when units preferred the second stimulus (CS+2) that was temporally closer to reward. Preferential activation to the first predictor of reward has been demonstrated by other investigators for mesolimbic dopamine neurons and for neurons in the nucleus accumbens and striatum (Ljungberg et al., 1992; Aosaki et al., 1994; Mirenowicz and Schultz, 1994; Hollerman and Schultz, 1998).
The neural population responsive to UCS was more stable in size than CS populations and unchanged by learning (Fig. 10). This stability differs from dopamine neurons, which eventually cease to fire to already predicted unconditioned rewards (Ljungberg et al., 1992; Mirenowicz and Schultz, 1994; Hollerman and Schultz, 1998). In our study, neither UCS population size nor UCS rate code was modified by associative prediction. Failure of CS+ learning to modify UCS response suggests that the VP may not code prediction error or teaching-related properties that vary depending on whether UCS is predicted or unpredicted. Instead, it seems possible that VP neurons may encode more stable properties of UCS, such as hedonic (liking) or gustatory sweetness. Hedonic coding would be consistent with previous behavioral studies of the role of the VP in reward (Cromwell and Berridge, 1993; McAlonan et al., 1993; Hiroi and White, 1993; Johnson et al., 1996; Gong et al., 1997), but future VP studies are needed to test between hedonic versus sensory hypotheses of UCS coding.
However, at least for the CS+ and CS– auditory tones, sensory properties like frequency and continuity of the tones seemed less important than associative/incentive properties for VP activation. For example, the low-frequency pulsed and high-frequency continuous tones did not elicit different VP responses before learning. Regardless of which tone was assigned as CS+1, VP unit responses increased over training to CS+1 and not to CS–.
Similarly, motor responses appeared relatively unimportant for VP activation. Our sampling analysis revealed that VP units responsive to CS+ were unresponsive to head or limb orientation movements. Additionally, the large majority of UCS-responsive units showed no neural activity to tongue or jaw movements made during grooming outside of pellet acquisition. Such observations suggest that sensory or motor features per se are not primary triggers for this VP population and instead that reward learning or motivational features of pavlovian stimuli may be more important for VP coding.
Integration was a prominent characteristic of VP population coding. Units typically combined processing of several types of stimuli. More than half of the VP units responded to more than one stimulus, three times more than the largest subpopulation responsive to any single stimulus (Fig. 5). Other studies have found that neurons in the nucleus accumbens, lateral hypothalamus, and ventral tegmental area (VTA) also respond to multiple components of a task, including cues, operant behaviors, and reward (Ono et al., 1985; Nishino et al., 1987; Chang et al., 1996; Cromwell and Schultz, 2003; Ghitza et al., 2003). Thus, most VP units combine multiple threads of information related to pavlovian incentive cues and sucrose reward. Neurons code information about stimulus status and reward learning, forming dynamically flexible assemblies of neurons.
In addition to a population code, the VP simultaneously processed stimulus identity information by a rate code in responsive units. Firing rate changes seem to run parallel to, and somewhat independently of, population codes. This is supported by the observation that firing rate codes did not change markedly once established in early training, despite the subsequent changes in the CS neural population code and behavioral conditioned responses during additional learning (Fig. 10).
Firing rate codes were signatures for UCS and CS identities. In the responsive population as a whole, UCS consistently evoked the fastest firing rates in VP units in early and late training. Conversely, the CS– that predicted no reward evoked the lowest firing rates, and CS+ elicited intermediate firing rates. That these signatures appeared early in training indicates that firing rate codes require minimal learning. Thus, firing rates seem to stabilize quickly, even as subpopulations of units that respond to particular stimuli continue to change in membership throughout training.
It is interesting that only sucrose reward was associated with relative decreases in the firing rate for two-thirds of the subpopulation that responded to UCS and that such decreases occurred only for units that first were excited by CS+ (Fig. 9). Most other UCS responses were excitatory. Decreases in neural activity associated with reward UCS have been reported for limbic structures such as nucleus accumbens, striatum, lateral hypothalamus, and VTA (Lee et al., 1998; Cromwell and Schultz, 2003), although some of those decreases may be absolute as well as relative (Ono et al., 1985; Nishino et al., 1987). The UCS firing dip found here may reflect the “turning off” of an anticipatory excitation process at the end of a rewarded trial (Jog et al., 1999), rather than absolute inhibition. The anticipatory process terminated might involve CS+-triggered associative prediction or incentive motivation processes (Schultz, 1998a; Dayan and Balleine, 2002; Berridge and Robinson, 2003; McClure et al., 2003). The idea of anticipation termination, which suggests a disfacilitation mechanism, seems likely because all inhibitory UCS responses were restricted to units that were already excited by an earlier CS+.
Strikingly, however, UCS elicited the highest increase in firing rate for the entire responsive VP population (Fig. 9a). There are two reasons why this general UCS rate elevation could persist despite the subpopulation decrease discussed above. First, the decrease associated with UCS was mild, rarely falling below baseline firing, and involved relatively few units. Second, many units from the remaining VP population increased their rates to UCS, although those units did not reach our criterion to be classified as UCS responsive. This elevation of activity in the VP population as a whole during UCS may encode sucrose reward even if it is not detectable in individual units by our statistical techniques for identifying functional subpopulations. Thus, on average, the firing rate of the total VP population was highest during UCS.
Our electrode sites were in mid-posterior VP, and mostly medial within it. It would be of interest for future studies to compare other VP sites, too. Several studies indicate that different VP subregions may play different roles in reward (Johnson et al., 1993; McAlonan et al., 1993; Johnson et al., 1996). Even in our data sample, it seems possible that rostral electrode sites did not code reward stimuli completely identically to caudal sites. Future work on regional variation clearly is needed to ascertain whether VP subregions differ in reward coding. The caudal VP receives anatomical projections from the shell and core of the nucleus accumbens (Zahm and Heimer, 1990), and medial caudal VP projects to reward-related structures such as the VTA and mediodorsal thalamus that projects to the prefrontal cortex (Zahm and Heimer, 1990; Zahm et al., 1996). Additionally, medial VP is intermediary in a circuit from the accumbens shell to lateral hypothalamus involved in feeding behavior (Stratford et al., 1999), and caudal VP is needed to generate positive affective liking reactions to the taste of sucrose (Cromwell and Berridge, 1993). These considerations support the possibility that neurons in different VP sites may code reward information somewhat differently.
In conclusion, our results demonstrate for the first time that VP units in freely moving animals use both firing rate codes and population codes to represent sweet rewards and the conditioned incentive cues that predict them. These VP firing rates and populations may code learning, motivational, and affective features of pavlovian stimuli and sucrose reward. The population code identifies both reward conditioned stimuli and UCS, and its acquisition to learned stimuli mirrors the behavioral acquisition of conditioned approaches to reward CS. The rate code, by comparison, reflects overall activation of local VP units and may encode both reward UCS and predictor stimulus identities in relatively stable ways. Additional research is needed for better understanding of VP coding of predictive and motivational information, but our results so far support dynamic, multiplex roles for VP units in mesocorticolimbic circuits of learning, motivation, and reward.
Footnotes
This work was supported by National Institutes of Health Grants NS31650, DA015188, and MH63649 to J.W.A. and K.C.B. and by a National Science Foundation Fellowship to A.J.T. We acknowledge the technical assistance of Nadia Siddiqui, computing assistance of Rick Kindt, helpful suggestions from Dr. Matt Matell, and videotape scoring by Sarah Dematio.
Correspondence should be addressed to Dr. J. Wayne Aldridge, Department of Neurology, Neurology Research Office, 300 North Ingalls Building 3D03, University of Michigan, Ann Arbor, MI 48109-0489. E-mail: jwaynea{at}umich.edu.
DOI:10.1523/JNEUROSCI.1437-03.2004
Copyright © 2004 Society for Neuroscience 0270-6474/04/241058-12$15.00/0