Abstract
There is ample evidence that blockade of CB1 receptors reduces reward seeking. However, the reported effects of CB1 blockade on performance for rewarding electrical brain stimulation stand out as an exception. By applying a novel method for conceptualizing and measuring reward seeking, we show that AM-251, a CB1 receptor antagonist, does indeed decrease performance for rewarding electrical stimulation of the medial forebrain bundle in rats. Reward seeking depends on multiple sets of variables, including the intensity of the reward, its cost, and the value of competing rewards. In turn, reward intensity depends both on the sensitivity and gain of brain reward circuitry. We show that drug-induced changes in sensitivity cannot account for the suppressive effect of AM-251 on reward seeking. Therefore, the role of CB1 receptors must be sought among the remaining determinants of performance. Our analysis provides an explanation of the inconsistencies between prior reports, which likely arose from the following: (1) the averaging of data across subjects showing heterogeneous effects and (2) the use of methods that cannot distinguish between the different determinants of reward pursuit. By means of microdialysis, we demonstrate that blockade of CB1 receptors attenuates nucleus accumbens dopamine release in response to rewarding medial forebrain bundle stimulation, and we propose that this action is responsible for the ability of the drug to decrease performance for the electrical reward.
Introduction
Rats work vigorously for electrical stimulation of the medial forebrain bundle (MFB) (Olds and Milner, 1954), a phenomenon known as intracranial self-stimulation (ICSS). The effect that the rat seeks to reinitiate is called “brain stimulation reward” (BSR) (Table 1). Like pharmacological and natural rewards, rewarding MFB stimulation causes dopamine (DA) release in the nucleus accumbens (NAc) (Hernandez and Hoebel, 1988; You et al., 2001; Hernandez et al., 2006).
Curve-shift scaling (Edmonds and Gallistel, 1974, 1977; Miliaressis et al., 1986) is used widely to infer effects of drugs on BSR from displacement of psychometric curves linking stimulation strength to instrumental performance. CB1 receptor (CB1R) ligands have produced inconsistent effects in the curve-shift paradigm (Arnold et al., 2001; Deroche-Gamonet et al., 2001; Vlachou et al., 2003; De Vry et al., 2004; Vlachou et al., 2005; Xi et al., 2008). This contrasts sharply with the consistent effects of CB1R ligands on performance for food and drugs (Solinas et al., 2008).
Hernandez et al. (2010) have extended curve-shift scaling by measuring and modeling performance for BSR as a joint function of the strength of the stimulation, as determined by the pulse frequency, and its opportunity cost (“price”): the cumulative time required to earn a reward (Fig. 1). The proportion of a subject's time devoted to reward seeking [time allocation (TA)] increases as a function of pulse frequency and decreases as a function of price (Fig. 1B,C). The resulting three-dimensional (3D) structure (Eq. 1) is dubbed the “reward mountain.”
The pulse frequency that produces a half-maximal reward, Fhm, sets the position of the mountain along the pulse-frequency axis, whereas the price at which the rat spends half its time working for a maximal BSR, Pe, sets the position along the price axis. These location parameters reflect different stages in the translation of stimulation-induced firings into reward-seeking behavior (Fig. 1A). Drug action before the output of the “intensity-growth” function that translates the firing rate of the directly activated neurons into a subjective reward intensity (Gallistel and Leon, 1991; Leon and Gallistel, 1992; Simmons and Gallistel, 1994) alters Fhm (Fig. 1B), whereas drug action at later stages alters Pe (Fig. 1A,C) (Hernandez et al., 2010). The sensitivity of the reward substrate determines Fhm (Fig. 1B) and is analogous to the affinity of a ligand for a receptor. The gain of the substrate determines the maximal reward intensity attainable; it is analogous to receptor density and is reflected in Pe, as are alterations in perceived costs or in the value of competing activities (Fig. 1C).
Arvanitogiannis and Shizgal (2008) and Hernandez et al. (2010) have shown that displacements of the 3D reward mountain along the axes representing the strength or cost of reward cannot be distinguished on the basis of conventional two-dimensional (2D) measurements, such as curve shifts or progressive-ratio break points (Hodos, 1961; Keesey and Goldstein, 1968). Thus, we used the novel 3D measurement method to determine whether CB1Rs modulate BSR, and, if so, to constrain the stage(s) of processing to which these receptors contribute. We also show that CB1R blockade attenuates the ability of rewarding MFB stimulation to boost extracellular DA concentrations in the NAc, which could explain the decrease produced by this treatment in the opportunity cost at which rats maintain performance for BSR.
Materials and Methods
Subjects.
Subjects were 19 male Long–Evans rats from Charles River Breeding Farms. Thirteen of these animals took part in the intracranial self-stimulation (ICSS) experiment, and the rest took part in the microdialysis experiment. The rats were housed in Plexiglas cages in a vivarium with controlled temperature and reversed 12 h dark/light cycle. Food and water were available ad libitum. The behavioral procedures were conducted during the dark phase of the cycle, between 7:30 A.M. and 2:00 P.M. All procedures complied with the principles of the Canadian Council on Animal Care.
Implantation of electrodes and cannulas.
Rats weighed 400–550 g at the time of surgery. We administered atropine sulfate (0.05 mg/kg, s.c.) to reduce bronchial secretions. Anesthesia was induced with ketamine–xylazine (10–100 mg/kg, i.p.) and maintained with isoflurane vapor. Penicillin (0.3 ml/kg, i.m.) was administered to prevent infections. Before the rat was mounted in the stereotaxic frame, xylocaine jelly was applied to the external auditory meatus to reduce discomfort from the ear bars. Monopolar stainless-steel electrodes were constructed from 000 insect pins and insulated with Formvar to within 0.5 mm of the tip. The electrodes were aimed bilaterally at the lateral hypothalamic level of the MFB [anteroposterior (AP): −2.8, mediolateral (ML): ±1.7, dorsoventral (DV): 8.7–8.9 from the skull]. Four stainless-steel jeweler screws were threaded into pilot holes drilled in the skull; the electrodes were anchored to these screws with dental acrylic. A length of wire wrapped around two of the screws served as the current return. Gold-plated Amphenol connectors, attached via a short length of wire to each of the electrodes and the skull-screw return, were inserted into a McIntyre Miniature Connector (Scientific Technology Centre, Carleton University, Ottawa, ON, Canada), which was attached to the skull screws with dental acrylic to form a head cap. In the rats destined for the microdialysis experiment, 20 gauge guide cannulas were aimed bilaterally at the NAc (1.5 AP, 2.8 ML, and −5.4 DV from skull at a 10° angle), in addition to the MFB stimulation electrodes. Buprenorphine (0.05 mg/kg, s.c.) was administered immediately following surgery to reduce subsequent pain. Rats were allowed 5–7 d of recovery before behavioral training began.
Apparatus.
Behavioral testing was performed in four plastic operant boxes (30 × 21 × 51 cm) with a mesh floor and a clear Plexiglas front. Each box was equipped with a flashing light, located 10 cm above the floor mesh, and a retractable lever (ENV–112B, MED Associates) mounted on the right side wall. A 1 cm light was located 2 cm above the lever and was activated when the rat depressed the lever.
The temporal parameters of the electrical stimulation were set by a computer-controlled digital pulse generator, and pulse amplitude was determined by a computer-controlled constant-current amplifier. Stimulation consisted of 0.5 s trains of cathodal pulses, 0.1 ms in duration. The stimulation current was routed to the rat through a multichannel slip ring that allowed the rat to circle without tangling the leads. Experimental control and data acquisition were handled by a personal computer running a custom-written program (“PREF”) developed by Steve Cabilio (Concordia University, Montreal, QC, Canada). The stimulation was monitored on an oscilloscope by displaying the potential drop across a 1% precision resistor in series with the rat.
The behavioral phase of the microdialysis experiment was conducted in the previously described setup. In the neurochemical-sampling phase, the rats were transferred to similar operant chambers from which the levers had been removed, and dialysate samples were collected. Stimulation trains were programmed by a Master-8 pulse generator (A.M.P.I.), controlled by LabView software (National Instruments), and delivered by a constant current amplifier (Mundl, 1980). An infusion pump (Harvard Instruments) was connected by polyethylene tubing (PE-20) to a fluid swivel located at the top of the chamber. The second port of the swivel was connected to one end of a 50 cm length of polyethylene tubing, and a microdialysis probe was connected to the other end. A small diameter silica tube, extending into the tip of the microdialysis probe, completed the fluid circuit. The probes were described in detail previously (Hernandez et al., 2006, 2007).
Self-stimulation training.
For each rat, we determined the stimulating electrode and the current-frequency combination that supported vigorous lever pressing with minimum aversive side effects. From that point onwards, the current and stimulating electrode were held constant. Rats were then trained to keep the lever depressed for a cumulative time of 4 s to receive the stimulation. Once this task had been mastered, training commenced on the “frequency-sweep” procedure. Each sweep consisted of a set of trials during which the stimulation parameters were held constant, and the rat had the opportunity to harvest as many as 20 rewards. Following delivery of each reward, the lever was disarmed and retracted for 2 or 3 s. The pulse frequency during the first three trials was set to the highest value the rat could tolerate without signs of aversion or forced movement. Over the subsequent eight trials, the pulse frequency was decreased systematically from trial to trial in equal proportional steps. The dependent variable was a corrected measure of the proportion of trial time that the lever was depressed (time allocation) (Breton et al., 2009). The range of pulse frequencies was selected to drive time allocation from its maximal to its minimal values, in sigmoidal fashion. Every trial was preceded by a 10 s intertrial interval signaled by a flashing light. During the last 2 s of this period, rats received priming stimulation consisting of two stimulation trains at the maximum pulse frequency that the rat could tolerate, delivered at 1 train/s.
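The "equal proportional steps" of a frequency sweep correspond to a constant ratio between successive pulse frequencies, that is, equal spacing on a logarithmic axis. The following is a minimal sketch of how such a sweep could be generated. The function name, its arguments, and the frequency limits are hypothetical and are not taken from the PREF software.

```python
def frequency_sweep(f_max, f_min, n_top=3, n_steps=8):
    """Build one frequency sweep: n_top trials at f_max, then n_steps
    trials stepping down to f_min by a constant ratio (equal log steps)."""
    ratio = (f_min / f_max) ** (1.0 / n_steps)
    top = [f_max] * n_top
    descending = [f_max * ratio ** k for k in range(1, n_steps + 1)]
    return top + descending


# Hypothetical limits in pulses per second; real values were set per rat.
sweep = frequency_sweep(f_max=200.0, f_min=20.0)
```

The sketch assumes that the three leading trials share the highest frequency and that the eight following trials each step down by the same ratio, giving 11 trials per sweep.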
After the subject showed consistently high asymptotic values of time allocation (not lower than 0.8) in at least the first two trials and low asymptotic values (<0.2) in at least the last two trials of each determination, we introduced two new types of sweeps. During “price sweeps,” the rats had to hold down the lever for increasing cumulative periods (i.e., prices) to obtain a stimulation train of maximal strength. The duration of each trial was adjusted to allow the rat to harvest a maximum of 20 rewards. After consistently high and low asymptotic time-allocation values (≥0.8 and ≤0.2, respectively) were observed in price-sweep data, a new “radial” sweep was added. In a radial sweep, the required price increased, and the stimulation strength decreased simultaneously across sequential trials. The stimulation-price combinations and the spacing between the trials were calculated so that the vector described by the radial sweep in the parameter space [log10(P) vs log10(F)] passed through, or very near, the point defined by the fitted values of the location parameters [log10(Pe), log10(Fhm)] (see Fig. 3A,B). This was achieved using the data from the frequency and price sweeps and a simulator developed by Yannick Breton and implemented in MATLAB (The MathWorks).
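The geometry of a radial sweep can be sketched as a straight line in the [log10(P), log10(F)] plane that starts at a low price and high pulse frequency and passes through [log10(Pe), log10(Fhm)]. The code below is only an illustration of that geometry. It is not the MATLAB simulator mentioned above, and the starting point, trial count, and parameter values are hypothetical.

```python
import math


def radial_sweep(p_start, f_start, pe, fhm, n_trials, overshoot=1.0):
    """Return (price, frequency) pairs equally spaced along a straight line
    in log-log space that passes through (Pe, Fhm).

    t = 0 is the starting trial, t = 1 is the point (Pe, Fhm), and the line
    continues to t = 1 + overshoot beyond it.
    """
    lp0, lf0 = math.log10(p_start), math.log10(f_start)
    lpe, lfhm = math.log10(pe), math.log10(fhm)
    points = []
    for i in range(n_trials):
        t = (1.0 + overshoot) * i / (n_trials - 1)
        lp = lp0 + t * (lpe - lp0)
        lf = lf0 + t * (lfhm - lf0)
        points.append((10 ** lp, 10 ** lf))
    return points


# Hypothetical values: prices in seconds, frequencies in pulses per second.
points = radial_sweep(p_start=1.0, f_start=250.0, pe=10.0, fhm=60.0, n_trials=9)
```

With nine trials and an overshoot of 1, the middle trial falls on (Pe, Fhm), price rises across trials, and pulse frequency falls, as the text describes.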
Two sweeps of each type were run during every session. We use the term “survey” to refer to the combination of a frequency, a price, and a radial sweep; these provide the minimal dataset required to fit the mountain model. The sequence of sweeps was random within session for subjects C8–C14 and random within survey for C17–C20. In the latter case, the rats had to complete a full survey before any of the sweeps were repeated; this adjustment was made to increase the power of the resampling-based surface-fitting approach (see below). Each rat performed under these conditions for four sessions, and then the model was fitted (see below, Self-stimulation data: model fitting and comparisons). If the radial sweep deviated excessively from the fitted values of [log10(Pe), log10(Fhm)] or if the upper or lower asymptotic time-allocation values were insufficiently well defined, the sequence of prices and pulse frequencies was readjusted. Each rat was considered ready for behavioral or in vivo microdialysis drug testing when its responding was consistent throughout sessions and the trajectory of the radial sweep passed sufficiently close to [log10(Pe), log10(Fhm)]. Rats required 5 weeks of training, on average, to reach the drug-testing phase. Rats that failed to meet the criteria described above were excluded from the experiment.
Self-stimulation testing under the influence of AM-251 and its vehicle.
Each session consisted of a warm-up frequency sweep, followed by two price, two radial, and two frequency sweeps, either randomized within sessions (rats C8–C14) or randomized within surveys (rats C17–C20).
AM-251 (3 mg/kg; Tocris Bioscience) was diluted in 90% ethanol (90 μl/mg), cremophor (90 μl/mg), and 0.9% saline (900 μl/mg). The drug or its vehicle was administered at a volume of 3 ml/kg, i.p., 30 min before each behavioral test. This dose was chosen in accordance with previous studies (Xi et al., 2006, 2008). The stimulation frequencies and prices in the vehicle sessions were the same as those determined in the training phase. During the drug sessions, the price values tested in the price sweep were decreased by 0.1–0.2 log10 units on the basis of leftward shifts along the price axis observed during pilot testing (data not shown).
At least one “washout” day followed each drug session to allow for elimination of the drug. Rats received vehicle injections on Mondays and Thursdays, drug injections on Tuesdays and Fridays; Wednesdays, Saturdays, and Sundays were washout days. Eight to 12 test sessions, 5–6 h in duration, were run in both the drug and vehicle conditions. Approximately 3 months were required, following the initial surgery, to complete testing of each subject.
Following behavioral testing, rats were overdosed with ketamine–xylazine. As described previously (Hernandez et al., 2007), stimulation sites were marked by means of the Prussian Blue method and located by microscopic inspection of formol–thionine stained brain sections, with reference to an atlas of the rat brain (Paxinos and Watson, 2007).
Self-stimulation data: model fitting and comparisons.
Equation 1 describes the mountain model, as follows:

$$TA = TA_{min} + (TA_{max} - TA_{min}) \times \frac{\left(\frac{F^{g}}{F^{g} + F_{hm}^{g}}\right)^{a}}{\left(\frac{F^{g}}{F^{g} + F_{hm}^{g}}\right)^{a} + \left(\frac{P}{P_{e}}\right)^{a}} \quad (1)$$

where TA is time allocation; TAmax and TAmin are the upper and lower asymptotes of time allocation; a is the constant determining the abruptness with which TA grows as the payoff from BSR increases; F is the pulse frequency; Fhm is the pulse frequency that produces a half-maximal reward; g is the constant determining the abruptness with which reward intensity grows as F is increased; P is the price (opportunity cost) of a stimulation train, the cumulative time the lever must be depressed in order for delivery of a stimulation train to be triggered; and Pe is the price at which the payoff from a maximally intense BSR equals the payoff from competing activities.
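To make the roles of the parameters concrete, the mountain surface can be sketched in a few lines of code. The sketch assumes the six-parameter form of Hernandez et al. (2010): a logistic intensity-growth function of pulse frequency, combined with price and scaled between a lower and an upper asymptote. The function names, the asymptote arguments `ta_min` and `ta_max`, and the numeric values are illustrative assumptions, not the authors' MATLAB code.

```python
def reward_intensity(f, fhm, g):
    """Normalized reward intensity (0 to 1) as a logistic function of pulse
    frequency; equals 0.5 when f == fhm."""
    fg = f ** g
    return fg / (fg + fhm ** g)


def time_allocation(f, p, fhm, pe, a, g, ta_min=0.0, ta_max=1.0):
    """Time allocation predicted by the assumed mountain model for pulse
    frequency f and price p."""
    r_a = reward_intensity(f, fhm, g) ** a
    return ta_min + (ta_max - ta_min) * r_a / (r_a + (p / pe) ** a)


# Hypothetical parameter values, chosen only for illustration.
params = dict(fhm=60.0, pe=10.0, a=3.0, g=5.0, ta_min=0.1, ta_max=0.9)
ta_at_pe = time_allocation(f=1e6, p=10.0, **params)  # near-maximal reward, P = Pe
```

In this form, a change in Fhm slides the surface along the pulse-frequency axis and a change in Pe slides it along the price axis, without changing its shape in logarithmic coordinates.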
Among the objectives of the model-fitting approach were unbiased estimates of location-parameter (Fhm, Pe) values and their dispersions for each subject. This was accomplished by means of a MATLAB (The MathWorks) procedure developed by Kent Conover, based on the nonlinear least-squares routine in the MATLAB Optimization Toolbox and resampling methods (Efron and Tibshirani, 1994). A primary fit of the six-parameter model presented in Equation 1 was performed independently on the data from each session (subjects C8–C14) or survey (subjects C17–C20) in each condition; this was done using the “location-specific approach” (Hernandez et al., 2010). This approach entails fitting individual values of the two location parameters to the data for each session or survey while using common values of the four remaining parameters. The purpose of this procedure is to protect the values of the two slope parameters, a and g (Eq. 1), from the degradation that would ensue from fitting common values of all parameters to datasets that shift in the parameter space from session to session (Hernandez et al., 2010); such shifts would be expected to arise from unavoidable variation in drug administration, absorption, etc. Following the primary fit, the data were resampled with replacement by session or survey, 1000 times; the model was fitted to each resampled dataset as described above. Estimates of the mean value of each parameter and the corresponding 95% confidence interval were computed over the 1000 fits; in the case of the location parameters, the session-specific values were averaged within each set of fits to a given resampled dataset. The 95% confidence intervals were percentile-based: they exclude the lowest and highest 25 of the 1000 values (see Fig. 5).
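The resampling logic can be illustrated with a simplified sketch. It resamples sessions with replacement and forms a percentile-based 95% interval by dropping the lowest and highest 2.5% of the resampled estimates (25 of 1000). As a loudly stated simplification, the nonlinear refit of the model to each resampled dataset is replaced here by averaging session-specific estimates. The session values and function names are hypothetical.

```python
import random
import statistics


def bootstrap_location(session_estimates, n_resamples=1000, seed=0):
    """Resample sessions with replacement and average the session-specific
    estimates in each resample (stand-in for refitting the model)."""
    rng = random.Random(seed)
    n = len(session_estimates)
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(session_estimates) for _ in range(n)]
        means.append(statistics.mean(resample))
    return means


def percentile_ci(estimates):
    """Mean and percentile-based 95% interval: the lowest and highest 2.5%
    of the sorted estimates are excluded."""
    ordered = sorted(estimates)
    cut = round(0.025 * len(ordered))
    kept = ordered[cut:len(ordered) - cut]
    return statistics.mean(ordered), kept[0], kept[-1]


# Hypothetical session-specific log10(Pe) estimates for one rat.
sessions = [0.95, 1.02, 0.98, 1.05, 0.99, 1.01, 0.97, 1.03]
boot = bootstrap_location(sessions)
mean_est, ci_low, ci_high = percentile_ci(boot)
```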
The seven-parameter model described previously (Hernandez et al., 2010) allowed us to account for the exceptionally high time allocation observed at the lower pulse frequencies during the frequency sweeps for subject C17; according to the Akaike information criterion (Akaike, 1974), this model provided a better fit (data not shown) than the standard model, but only in the case of this one rat.
A difference vector was constructed for each location parameter in each subject by subtracting, element by element, the 1000 estimates for the vehicle condition from the 1000 estimates for the AM-251 condition, so that drug-induced decreases are negative. The mean changes in parameter values reported here represent the mean of this difference vector, whereas the 95% confidence intervals are simply its 2.5th and 97.5th percentiles (see Fig. 5). If the confidence interval did not include zero, the difference between conditions was considered statistically reliable, with an α level of 0.05.
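The reliability criterion can be sketched as follows. The sign convention is an assumption made for this example: differences are taken as drug minus vehicle, so that a drug-induced decrease is negative, matching the negative changes in Pe reported in the Results. The function name and the test vectors are hypothetical.

```python
def difference_ci(drug, vehicle):
    """Element-by-element difference (drug minus vehicle) between two
    equal-length vectors of resampled estimates, with its mean, an
    approximate 2.5th/97.5th percentile interval, and a reliability flag."""
    diffs = sorted(d - v for d, v in zip(drug, vehicle))
    n = len(diffs)
    cut = round(0.025 * n)
    mean_diff = sum(diffs) / n
    low, high = diffs[cut], diffs[n - cut - 1]
    reliable = low > 0 or high < 0  # interval excludes zero
    return mean_diff, low, high, reliable


# Hypothetical resampled log10(Pe) estimates: a clear decrease under the drug.
vehicle_pe = [1.0] * 1000
drug_pe = [0.8 + 0.0001 * i for i in range(1000)]
result_decrease = difference_ci(drug_pe, vehicle_pe)

# Hypothetical estimates whose differences straddle zero.
result_null = difference_ci([i - 500 for i in range(1000)], [0] * 1000)
```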
Quantification of NAc DA release produced by rewarding MFB stimulation: in vivo microdialysis.
The rats in this phase of the study also underwent ICSS training, and the 3D model was fitted to each rat's data, as described above. The obtained parameters were used to estimate the pulse frequency that drove reward intensity to 95% of its maximum value (Eq. 2):

$$F_{95} = F_{hm} \times 19^{1/g} \quad (2)$$

where F95 is the pulse frequency that produces a subjective reward intensity equal to 95% of the maximal attainable value, Fhm is the pulse frequency that produces half-maximal reward, and g is the parameter that determines the rate at which subjective reward intensity grows as a function of pulse frequency.
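The calculation of F95 can be checked numerically. The sketch below assumes a logistic intensity-growth function, F^g / (F^g + Fhm^g). Setting it equal to 0.95 and solving for F gives F95 = Fhm × (0.95/0.05)^(1/g) = Fhm × 19^(1/g). The function names and parameter values are hypothetical.

```python
def reward_intensity(f, fhm, g):
    """Assumed logistic intensity-growth function, normalized to a maximum of 1."""
    fg = f ** g
    return fg / (fg + fhm ** g)


def f_at_fraction(fhm, g, fraction=0.95):
    """Pulse frequency at which reward intensity reaches `fraction` of its
    maximum, obtained by inverting the logistic growth function."""
    return fhm * (fraction / (1.0 - fraction)) ** (1.0 / g)


# Hypothetical fitted values for one rat.
f95 = f_at_fraction(fhm=60.0, g=5.0)
check = reward_intensity(f95, fhm=60.0, g=5.0)  # should be very close to 0.95
```

Because g sits in the exponent as 1/g, a steeper growth function (larger g) places F95 closer to Fhm.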
Rats were transferred to the microdialysis testing room 14 h before dialysate collection commenced. They were lightly anesthetized with isoflurane, and the microdialysis probes were inserted bilaterally into the NAc through the guide cannulas. Once the probes were in place, artificial CSF (145 mM Na⁺, 2.7 mM K⁺, 1.22 mM Ca²⁺, 1.0 mM Mg²⁺, 150 mM Cl⁻, 0.2 mM ascorbate, 2 mM Na₂HPO₄, pH = 7.4 ± 0.1) was pumped through them continuously, at a rate of 0.3 μl/h, to prevent the membrane from occluding. Food and water were available ad libitum. Two hours before sampling began, food was removed from the chamber and the flow was increased to 1.0 μl/h. Samples were then collected every 20 min. Baseline values for the DA concentration in the dialysate were obtained over the first 60 min of sampling (three samples). Animals then received either an injection of AM-251 (3 mg/kg) or its vehicle. Three dialysate samples were collected following the injection. This provided sufficient time for absorption and distribution of the drug and sufficient information to measure the effect of the drug on basal levels of NAc DA. Following collection of these samples, electrical stimulation was delivered for 2 h (six samples) at unpredictable intervals, according to a VT12 schedule. The stimulation pulse frequency was set to F95 for each rat. Six additional samples were collected after delivery of the stimulation ceased.
All animals received both AM-251 and its vehicle, in counterbalanced order, on different days. Drug administration sessions were always followed by a washout day during which the flow rate was reduced to 0.3 μl/h, and no samples were collected.
DA and its metabolites were quantified by means of high-performance liquid chromatography with electrochemical detection, as described in detail previously (Hernandez et al., 2006, 2007). Neurochemical data were analyzed by means of a two-way, repeated-measures ANOVA, with “treatment” (drug/vehicle) and “time” (time of sampling, 18 samples per treatment per rat) as factors. The effects of the drug on basal DA levels, the effects of stimulation on DA tone, and the differences between drug and vehicle during stimulation were then assessed by means of planned comparisons.
Simulation of “2D curve-shifts.”
On the basis of the mountain model and the fitted parameter values for each rat, we estimated the frequency required to support half-maximal performance (Fm50), the value that would have been obtained in a conventional curve-shift experiment (Eq. 3):

$$F_{m50} = F_{hm} \times \left(\frac{P}{P_{e} - P}\right)^{1/g} \quad (3)$$

where Fm50 is the pulse frequency that produces half-maximal time allocation, Fhm is the pulse frequency that produces half-maximal reward intensity, g is the exponent (growth constant) of the intensity-growth function, P is the price (opportunity cost) of the stimulation train, and Pe is the price at which the rat devotes half of its time to harvesting a reward of maximal intensity. To account for the low price paid for reward when the commonly used, continuous-reinforcement schedule is in force, we set the price to 0.1 s. In accordance with the practice in most prior studies linking CB1Rs with BSR (Arnold et al., 2001; Deroche-Gamonet et al., 2001; Vlachou et al., 2003; De Vry et al., 2004; Vlachou et al., 2005; Xi et al., 2008), the simulated Fm50 values were averaged within condition (drug or vehicle) for each subject. The paired means for all subjects were then compared across conditions using a paired-sample t test.
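The derivation of Fm50 can be sketched numerically under the same assumed form of the mountain model. Time allocation lies halfway between its asymptotes when normalized reward intensity equals P/Pe, which gives Fm50 = Fhm × (P / (Pe − P))^(1/g) for P < Pe. The function names and parameter values below are hypothetical.

```python
def time_allocation(f, p, fhm, pe, a, g, ta_min=0.0, ta_max=1.0):
    """Assumed mountain model: logistic intensity growth combined with price."""
    fg = f ** g
    r_a = (fg / (fg + fhm ** g)) ** a
    return ta_min + (ta_max - ta_min) * r_a / (r_a + (p / pe) ** a)


def fm50(fhm, g, p, pe):
    """Pulse frequency giving half-maximal time allocation at price p.
    Defined only when 0 < p < pe."""
    if not 0 < p < pe:
        raise ValueError("price must lie between 0 and Pe")
    return fhm * (p / (pe - p)) ** (1.0 / g)


# Hypothetical fitted values; the price of 0.1 s follows the text.
f_half = fm50(fhm=60.0, g=5.0, p=0.1, pe=10.0)
ta_half = time_allocation(f_half, p=0.1, fhm=60.0, pe=10.0, a=3.0, g=5.0)
```

When P is much smaller than Pe, log10(Fm50) is approximately log10(Fhm) + [log10(P) − log10(Pe)]/g, so a shift in log10(Pe) appears in log10(Fm50) divided by g. This suggests one reason why a change in Pe can be hard to detect in a 2D curve-shift measurement.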
Results
The tip of the stimulating electrode in all eight subjects was within the MFB, at the level of the lateral hypothalamus (Fig. 2, top). The probes for the microdialysis subjects were located within the NAc (Fig. 2, bottom).
The dependent measure was the proportion of trial time that the lever was depressed as a function of the pulse frequency and the price. The mountain model was fitted to these data to determine the Fhm and Pe parameter values and their associated confidence intervals, for each rat under each condition. As an example, Figure 3 shows the fit to the drug and vehicle data from subject C19, the location-parameter estimates, and their confidence intervals for each condition. Two-dimensional representations of the fitted sweeps from subject C19 are shown in Figure 4.
Changes in the values of the location parameters produced by AM-251 were assessed independently for each rat. Figure 5 shows contour-graph representations of the fits to the data from subject C19 along with the drug-induced changes in the location parameters. The contour graph for the drug condition (Fig. 5C) is displaced leftward with respect to the contour graph for the vehicle condition (Fig. 5A), whereas the vertical positions of the two contour graphs (Fig. 5C,D) are similar. Thus, AM-251 failed to alter the Fhm parameter but produced a substantial (nearly 0.2 log10 unit) decrease in the value of the Pe parameter (Fig. 5B).
The rows of blue diamonds in Figure 5, A, C, and D, denote the prices tested in rat C19 along the price sweeps. Note that the orientation of the contour lines is almost vertical at their intersection with the price-sweep vectors. Each contour line plots the combinations of price and pulse frequency that support a given level of behavior (time allocation). Thus, the contour lines trace out the intensity-growth function for BSR (Fig. 1A, red curve in the 3D graph on the left). The diagonal portions of the contour lines span ranges of pulse frequency over which reward intensity rises; as a result, the effect of a price increase can be offset by an increase in pulse frequency. In contrast, where the contour lines run vertically, reward intensity has leveled off at its maximal value, and increases in price can no longer be offset by further increases in pulse frequency. An estimate of the Pe parameter can be obtained by visual inspection of price-sweep data that intersect the vertically oriented portions of the contour lines: it is the price at which time allocation for the maximally intense reward lies halfway between the lower and upper asymptotes of the sigmoid psychometric curve. For example, the prices corresponding to the vertical midpoints of the two price-sweep curves in Figure 4B, which were obtained at near-maximal reward intensities, provide rough estimates of the Pe parameter, and the decrease in the value of this parameter produced by AM-251 is approximated by the leftward displacement of the solid, dark-blue curve from the dashed, light-blue curve.
Figure 6 shows the drug-induced changes in location-parameter estimates for all subjects. The changes in the value of the Fhm parameter met the criterion for statistical reliability in the data from only three of eight rats and ranged from −0.119 to 0.0194 common logarithmic units. The direction of these changes was inconsistent; in the case of Rat C8, Fhm decreased in the drug condition, whereas in the cases of Rats C11 and C14, the same treatment increased it (Fig. 6). In contrast, we found a reliable decrease in the value of Pe following drug administration in seven of the eight rats. Figure 6 shows that the size of these changes ranged from −0.084 to −0.242 common logarithmic units (17.6–42.7% decreases in Pe).
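The percentage decreases quoted for Pe follow directly from the changes in common logarithmic units: a change of Δ log10 units multiplies the parameter by 10^Δ. The short sketch below reproduces the conversion; the function name is illustrative.

```python
def percent_change_from_log10(delta_log10):
    """Convert a change expressed in log10 units into a percent change."""
    return (10.0 ** delta_log10 - 1.0) * 100.0


smallest = percent_change_from_log10(-0.084)  # about -17.6 %
largest = percent_change_from_log10(-0.242)   # about -42.7 %
```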
We quantified the levels of DA in the NAc at various time points before, during, and after electrical stimulation following an injection of AM-251 or its vehicle (Fig. 7). We found significant main effects of time of sampling (F(17,119) = 8.8032, p < 0.01) and treatment (F(1,119) = 9.1776, p < 0.05), as well as a significant interaction between them (F(17,119) = 4.0021, p < 0.01). Planned comparisons showed that electrical stimulation produced a significant increase in NAc DA levels (Fig. 7B). CB1R blockade significantly attenuated this increase without affecting basal levels (Fig. 7).
We used the mountain model to derive a widely used location parameter for psychometric curves obtained in 2D curve-shift experiments: Fm50, the pulse frequency that supports a half-maximal level of performance (Table 1, Fig. 8). In accord with conventional practice (Arnold et al., 2001; Deroche-Gamonet et al., 2001; Vlachou et al., 2003; De Vry et al., 2004; Vlachou et al., 2005; Xi et al., 2008), we compared the Fm50 estimates for the drug and vehicle conditions by means of a paired-sample t test. Whereas the 3D methodology allowed us to detect reliable drug-induced changes in the Pe parameter in 7/8 rats, the effects of AM-251 on the derived Fm50 values failed to cross the statistical threshold (t(7) = 1.885, p > 0.05) (Figs. 8, 9).
Discussion
CB1Rs modulate the behavioral impact of rewards. Rodents pretreated with CB1R antagonists show decreased break points in progressive-ratio tests of performance for food (Rasmussen and Huskinson, 2008), blunted appetitive responses in the taste-reactivity test (Jarrett et al., 2007), impaired acquisition of conditioned place preferences to drugs (Singh et al., 2004; Forget et al., 2005; Yu et al., 2009), and reduced drug self-administration (Filip et al., 2006; Shoaib, 2008; Xi et al., 2008). Conversely, CB1 receptor agonists increase operant responding for food (Solinas and Goldberg, 2005) and induce place preference (Valjent and Maldonado, 2000).
Dopamine release in the NAc has been implicated in reward and motivation (Wise, 2008). Mice lacking CB1Rs show decreased DA release in the NAc in response to drug rewards (Mascia et al., 1999; Hungund et al., 2003; Li et al., 2009). The release of dopamine in the NAc by rewarding drugs is inhibited by CB1R blockade (Cheer et al., 2007) and enhanced by pharmacological activation of these receptors (Cheer et al., 2004; Solinas et al., 2006). Given the vast and consistent evidence linking CB1Rs with reward modulation, it is striking that the effects of CB1R blockade on ICSS, one of the most widely used procedures for the quantitative study of reward, have heretofore yielded contradictory results (Solinas et al., 2008). As discussed below, our results offer an explanation for this inconsistency and provide a way to reconcile the effects of CB1R blockade on ICSS with the rest of the literature implicating CB1Rs in reward modulation.
We found consistent effects of CB1R blockade on the pursuit of BSR by manipulating both the strength and cost of rewarding stimulation and by applying a 3D analysis appropriate for testing the influence of drugs on the performance of individual subjects. Application of the 3D model distinguishes between changes induced by CB1R blockade in the sensitivity of brain reward circuitry and changes induced by the multiple factors that alter the price at which rats maintain a given level of performance for stimulation of a given strength (Fig. 1A). Changes in sensitivity alter the stimulation strength required to produce a half-maximal reward (Fhm), which governs the position of the reward mountain along the pulse-frequency axis. Changes in the value of the Fhm parameter met the criterion for statistical reliability in the data from only three of eight rats, and the direction of these changes was inconsistent. In contrast, we found a reliable decrease in the value of Pe following drug administration in seven of the eight rats. Thus, CB1Rs play their principal role at or beyond the output of the neural circuitry that determines reward sensitivity. Actions at these later stages could include downward rescaling of integrator output (i.e., decreased gain), increases in subjective costs (i.e., subjective valuation of the time or effort required to earn a reward), or increases in the value of competing activities such as grooming, resting, and exploring (Herrnstein, 1970, 1974; Killeen, 1972; Heyman, 1988).
The decrease in the prices at which a given level of performance is sustained (↓Pe) under the influence of AM-251 may reflect an interaction of CB1R blockade with neurotransmitter systems implicated in reward pursuit. The fact that boosting DA tone in the NAc is accompanied by an increase in the prices at which performance for BSR is sustained (Hernandez et al., 2010), an effect opposite in sign to the one reported here, suggests that the present effect could be due to a decrease of DA signaling in the NAc.
As in prior studies (Hernandez et al., 2006), rewarding MFB stimulation produced a significant increase in NAc DA levels. CB1R blockade significantly attenuated this increase without affecting basal levels. Thus, AM-251 may decrease the prices at which a given level of performance is sustained (↓Pe) by blunting the ability of MFB stimulation to boost DA tone in the ventral striatum. The observed behavioral and neurochemical effects are likely due to attenuated endocannabinoid-mediated disinhibition of DA neurons (Sperlágh et al., 2009). That AM-251 failed to alter basal levels of DA but did reduce the stimulation-induced enhancement of DA tone suggests that endocannabinoids are released in response to rewarding MFB stimulation and that their disinhibitory influence on DA neurons is reduced by AM-251.
CB1 receptors are the target of at least two endogenous ligands: anandamide and 2-arachidonoylglycerol. It has been suggested that these two lipids play different behavioral roles (Long et al., 2009). Given that blockade of the CB1R interferes with the binding of both endocannabinoids, we cannot, at present, partition the observed effects between them. This might be achieved in future work through the use of novel pharmacological tools that selectively and differentially prevent the degradation of these compounds (Fegley et al., 2005; King et al., 2007).
Our behavioral results illustrate an important methodological point: restricting the collection and analysis of ICSS data to two dimensions and averaging results across subjects can obscure effects that are clearly discernible when performance is measured as a function of both the strength and cost of BSR and in a manner that supports single-subject analysis. This point is illustrated by deriving from our data a measure analogous to the 2D group curve-shifts that have typically been measured. We used the mountain model to derive a widely used location parameter for psychometric curves obtained in curve-shift experiments, Fm50. Despite the reliable decreases in the value of the Pe parameter in seven of the eight rats, the difference in Fm50 values failed to cross the statistical threshold. This shows that the 3D methodology permits the detection of differences that may not be readily distinguished with the usual BSR methodology (Figs. 8, 9).
A decrease in Pe can arise in multiple ways (Fig. 1). Although the decrease in the prices at which a given level of performance was sustained could reflect increased subjective costs, it may also be explained otherwise, e.g., by a decrease in reward-system gain (Hernandez et al., 2010). Further methodological progress will be required to distinguish between the currently tenable explanations. In a manner analogous to the method used here, this task can be pursued profitably by taking advantage of nonlinearities in psychophysical functions that translate objective variables (e.g., physical work required to earn a reward) into their psychological equivalents (e.g., subjective effort costs).
Depression has been linked to dopaminergic dysfunction and to a blunted reaction to rewards (Martin-Soelch, 2009). The latter symptom is consistent with a reduction in the gain of brain reward circuitry. Reduced gain in the BSR substrate is a tenable explanation of the results reported here. In this regard, it is noteworthy that an increase in the incidence of depressed mood has been noted in clinical trials of rimonabant (Van Gaal et al., 2008; Moreira et al., 2009), a CB1R antagonist.
The present findings offer an explanation for the inconsistency of prior reports. The traditional rate-frequency curves can be portrayed as 2D projections of a 3D structure. The face of the structure is diagonally oriented. Thus, when the mountain is displaced along an axis representing either pulse frequency or price, the 2D silhouette is displaced along the orthogonal axis (see Notes). If the data are 2D, this can produce the illusion of motion in the plane in which the data are acquired when the actual movement was orthogonal to that plane. In other words, a shift along the price axis (ΔPe) can create the illusion of a shift along the pulse-frequency axis (ΔFhm). However, this relationship is asymmetrical. The low slope of the diagonal portion of the contour lines in Figures 5 and 8A implies that a given change in Pe will produce a substantially smaller displacement in the silhouette of the mountain along the pulse-frequency axis, which is the sole independent-variable axis considered in traditional curve-shift experiments. Such shifts may not be discernible. Thus, it is not surprising that significant effects of CB1R blockade on ICSS have not been found in several prior studies (Vlachou et al., 2003, 2005; Xi et al., 2008). The detection problem is compounded by small changes in Fhm, which can counteract the displacement of the 2D silhouette due to the shift of the 3D structure along the price axis. Moreover, the three reliable Fhm changes observed here were inconsistent in sign. This reduces the likelihood of finding a significant effect when changes in Fm50 are averaged across subjects and group comparisons are carried out. In contrast, the 3D representation of single-subject results (Fig. 5) renders the changes in the location parameters and their statistical reliability unambiguously and clearly.
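The asymmetry described above can be illustrated numerically. What follows is a minimal sketch and not the fitting procedure used in this study. It assumes a simplified form of the reward-mountain model in which reward intensity grows as a logistic function of pulse frequency (exponent g) and time allocation depends on the ratio of price to Pe (exponent a). These functional forms, the parameter values (g = 4, a = 3, Fhm = 50, Pe = 10, price = 1), and all function names are assumptions chosen for illustration only.

```python
import math


def reward_intensity(freq, f_hm, g):
    """Normalized reward intensity in [0, 1]; equals 0.5 when freq == f_hm.

    Assumed logistic growth function; g sets its steepness.
    """
    return freq**g / (freq**g + f_hm**g)


def time_allocation(freq, price, f_hm, p_e, g=4.0, a=3.0, t_min=0.1, t_max=0.9):
    """Assumed mountain surface: time allocation as a function of
    pulse frequency and price."""
    r = reward_intensity(freq, f_hm, g)
    return t_min + (t_max - t_min) * r**a / (r**a + (price / p_e) ** a)


def midpoint_frequency(price, f_hm, p_e, g=4.0):
    """Pulse frequency at which time allocation lies midway between
    t_min and t_max at a fixed price (an Fm50-like location parameter).

    Setting reward intensity equal to price / p_e and solving for
    frequency gives the closed form below; it requires price < p_e.
    """
    if price >= p_e:
        raise ValueError("no midpoint: price must be below p_e")
    return f_hm * (price / (p_e - price)) ** (1.0 / g)


def log_shift_in_midpoint(price, f_hm, p_e, delta_log_pe, g=4.0):
    """log10 displacement of the 2D midpoint along the frequency axis
    produced by shifting p_e by delta_log_pe log10 units."""
    p_e_drug = p_e * 10.0**delta_log_pe
    before = midpoint_frequency(price, f_hm, p_e, g)
    after = midpoint_frequency(price, f_hm, p_e_drug, g)
    return math.log10(after / before)


if __name__ == "__main__":
    # Hypothetical drug effect: Pe falls by 0.2 log10 units, Fhm unchanged.
    shift = log_shift_in_midpoint(price=1.0, f_hm=50.0, p_e=10.0, delta_log_pe=-0.2)
    print(f"shift in Pe: -0.200 log10 units; shift in midpoint frequency: {shift:+.3f}")
```

Under these assumptions, the midpoint satisfies log Fm50 = log Fhm + (1/g)[log P − log(Pe − P)], so when the price is well below Pe, a displacement of the mountain along the price axis appears in the rate-frequency curve scaled down by roughly 1/g. A steep reward-growth function therefore makes a substantial change in Pe hard to detect in 2D data.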
The allocation of behavior to the pursuit of reward necessarily depends on multiple variables, including reward strength, cost, probability, delay, and risk (Shizgal, 1997). Methods that can distinguish and quantify the contributions of these different variables will be required to determine the roles in reward seeking played by different neural systems. The findings reported here constitute one step toward understanding the contribution(s) of the endogenous cannabinoid system in the evaluation, selection, and pursuit of appetitive goals. The combination of quantitative modeling and multidimensional measurement of behavior promises future advances toward this goal.
Notes
Supplemental material for this article can be found at http://spectrum.library.concordia.ca/7084/. This material has not been peer reviewed.
Footnotes
- This research was supported by a grant to P.S. from the Canadian Institutes of Health Research (#MOP-74577), by a group grant from the Fonds de la recherche en santé du Québec to the Groupe de recherche en neurobiologie comportementale/Center for Studies in Behavioral Neurobiology (Barbara Woodside, P.I.), and by scholarships to I.T.-P from Consejo Nacional de Ciencia y Tecnologia (CONACYT, #209314) and from le Ministère de l'Éducation, du Loisir et du Sport du Québec (PBEEE-1M, #140498). David Munro built and maintained the computer-controlled equipment for experimental control and data acquisition. Software for experimental control and data acquisition was written and maintained by Steve Cabilio.
- Correspondence should be addressed to Peter Shizgal at the above address. peter.shizgal{at}concordia.ca