INTRODUCTION

The gaming industry has recently experienced a period of rapid growth; opportunities to gamble have increased and gambling is becoming more socially acceptable (Shaffer and Korn, 2002). For most individuals, gambling is enjoyable and harmless, but for others, it can become a compulsive and maladaptive activity comparable with drug addiction (Grant et al, 2005). Such pathological gambling (PG) is associated with significant impairments in quality of life (Grant and Potenza, 2007). Despite growing concern over the impact of gambling on public health, PG has been relatively understudied and treatment options are limited. Development of animal models of gambling behavior would facilitate research into the neurobiological basis of PG as well as other disorders in which gambling-related decision making is compromised.

One neuropsychological test that has been widely adopted in the study of gambling behavior is the ‘Iowa’ gambling task (IGT), during which participants choose cards from four decks, and either win or lose money (Bechara et al, 1994). The optimal strategy is to pick from the decks associated with small gains and also smaller penalties, whereas cards from the disadvantageous decks generate larger wins per trial but also incur heavy long-term losses. Persistent choice of the latter decks is indicative of risky decision making, and is observed in pathological gamblers (Cavedini et al, 2002), substance abusers (Bechara et al, 2001), and those with frontal damage (Bechara et al, 1999; Fellows and Farah, 2005). Impaired judgment on this task has also been observed in other psychiatric populations, including those with schizophrenia, personality disorders, obsessive–compulsive disorder, and Asperger's disorder (Johnson et al, 2006; Lawrence et al, 2006; Maurex et al, 2009; Shurman et al, 2005). Critical to both naturalistic gambling and the IGT is the risk of losing, that is, the resources staked on a favorable outcome are lost when a wager is unsuccessful. This is distinct from failing to win, that is, the absence of any additional gain. However, most animal models of risky decision-making deal exclusively with the latter, for example, probability discounting paradigms, in which subjects choose between smaller certain vs larger uncertain rewards (Adriani and Laviola, 2006; Cardinal and Howes, 2005; Mobini et al, 2000).

In a previous attempt to model gambling in rodents, loss was theoretically signaled by adding quinine to reward pellets, rendering them edible but less palatable (van den Bos et al, 2006). Although this is an interesting approach, animals are effectively choosing between rewards on the basis of the probability of their appetitive quality; there is still no risk of finishing the trial at a disadvantage compared with the start. In the rat gambling task used here (rGT), subjects have a limited amount of time to maximize the number of pellets obtained, and loss is signaled by punishing timeouts during which reward cannot be earned. On each trial, animals choose from four options, each associated with a different number of sugar pellets. The animal then receives either the associated reward or a punishing timeout. Larger reward options are associated with a higher chance of longer timeouts, resulting in less reward earned overall per session. To maximize their earnings, rats must learn to avoid these risky options, similar to the optimal strategy in the IGT.

Serotonin (5-HT) and dopamine (DA) play important roles in impulsivity and addiction (Pattij and Vanderschuren, 2008), and current data suggest that they also contribute to PG; patients with Parkinson's disease (PD) treated with DA agonists can develop symptoms of PG (Weintraub et al, 2006), and peripheral measures of DA are elevated when both healthy and problem gamblers gamble (Meyer et al, 2004; Shinohara et al, 1999). In contrast, decreases in peripheral measures of 5-HT have been observed in pathological gamblers (Marazziti et al, 2008; Pallanti et al, 2006). Regarding treatment of PG, both positive and negative effects have been reported with dopaminergic antagonists (Seedat et al, 2000; Zack and Poulos, 2007), and although serotonin-specific reuptake inhibitors are often prescribed for PG, placebo-controlled studies have yielded equivocal results (Grant and Potenza, 2007). Improved understanding of the mechanisms through which DA and 5-HT regulate gambling could, therefore, contribute to better treatments for PG.

Here, we investigated the effects of agonists and antagonists at the D1, D2, and 5-HT1A receptors, as well as of d-amphetamine, on rGT performance. We predicted that drugs that enhanced DA function would impair choice behavior, whereas DA antagonists might improve it. Given that the 5-HT1A receptor agonist, 8-OH-DPAT, decreases 5-HT release, acutely replicating the low 5-HT function suspected in PG, we hypothesized that this drug would also impair rGT performance. Additional groups of rats were included as controls for both the probability of punishment (group CProb) and the duration of the punishing timeouts (group CPun) to determine how important these punishment signals were in determining choice.

MATERIALS AND METHODS

Subjects

The subjects were 32 male Long–Evans rats (Charles River Laboratories, St. Constant, Canada). The rats weighed 275–300 g at the start of the experiments, and were food-restricted to 85% of their free-feeding weight and maintained on 14 g rat chow per day. Water was available ad libitum. All animals were pair-housed in a colony room under a reverse 12 h light–dark cycle (lights off at 8:00 am) maintained at a temperature of 21 °C. The testing and housing were in accordance with the Canadian Council of Animal Care, and all experimental protocols were approved by the Animal Care Committee of the University of British Columbia.

Behavioral Apparatus

Behavioral testing took place in eight standard five-hole operant chambers, each enclosed within a ventilated sound-attenuating cabinet (Med Associates Inc, Vermont). Each chamber was fitted with an array of five response holes positioned 2 cm above a bar floor. A stimulus light was set at the back of each hole. Nose-poke responses into these apertures were detected by a horizontal infrared beam. A food magazine, also equipped with an infrared beam and a tray light, was located in the middle of the opposite wall, and sucrose pellets (45 mg; Bioserv, New Jersey) could be delivered into it from an external pellet dispenser. Chambers could be illuminated using a house light, and were controlled by software written in Med PC by CAW running on an IBM-compatible computer.

Behavioral Testing

Habituation and training

Animals were first habituated to the operant chambers over two daily 30-min sessions, during which sucrose pellets were placed in the response holes and in the food magazine. Animals were then trained to make a nose-poke response into an illuminated response hole within 10 s to earn reward, similar to the training for the five-choice serial reaction time task (5CSRT) described in previous reports (Winstanley et al, 2003a). The spatial location of the stimulus light varied between trials across holes 1, 2, 4, and 5. Each session consisted of 100 trials and lasted approximately 30 min. After five sessions, animals were consistently completing 100 trials with 80% trials correct and 20% trials omitted. Animals were then trained on a forced-choice version of the rGT (or variant thereof in the case of the control groups) for seven sessions before moving on to the full free choice task. This ensured all animals had equal experience with all of the four reinforcement contingencies, and aimed to prevent simple biases toward a particular hole from developing.

The rGT

A task schematic is provided in Figure 1. Each session lasted for 30 min. Subjects initiated each trial by making a nose-poke response in the illuminated food magazine. This response extinguished the tray light and triggered the start of a 5-s inter-trial interval (ITI). At the end of the ITI, holes 1, 2, 4, and 5 were illuminated for 10 s (in the forced-choice version of the task used in training, only one hole was illuminated). The trial was scored as an omission if animals failed to respond within 10 s, at which point the tray-light was re-illuminated and animals could start a new trial. A response in any illuminated hole turned off all stimulus lights, and led to either onset of the tray-light and delivery of reward or the start of a time-out ‘punishment’ period. The reinforcement schedules were designed so that the two-pellet choice (P2) was optimal in terms of reward earned per unit time. Consistent choice of either the smaller or larger amounts resulted in more frequent rewards, in the case of the former, or larger amounts of reward per response in the case of the latter, but ultimately fewer food pellets per unit time because of the associated punishing timeouts. If the trial was punished, no reward was delivered and the stimulus light within the chosen hole flashed at 0.5 Hz until the punishing timeout had elapsed, at which point the tray light was illuminated. A response in the food magazine started the next trial after both reward and punishment. In parallel to the 5CSRT, premature responses made at the array during the ITI were punished by a 5-s time-out period, signaled by illumination of the house light, after which the tray light was re-illuminated and animals could start a new trial. Perseverative responses made at the array, both after reward and during punishing timeouts, were monitored but not punished.
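The trial logic described above can be summarized in a short sketch. This is an illustration only, not the Med PC program used in the study: the function name `run_trial`, the tuple layout of `schedule`, and the convention that response times are measured from trial initiation are assumptions introduced here. The 5-s ITI, 10-s response window, and 5-s premature time-out are taken from the text.

```python
import random

ITI_S = 5.0                # inter-trial interval (from the text)
RESPONSE_WINDOW_S = 10.0   # time the holes stay lit (from the text)
PREMATURE_TIMEOUT_S = 5.0  # house-light time-out (from the text)


def run_trial(response_time_s, option, schedule, rng):
    """Classify one trial; return (outcome, pellets, timeout_s).

    response_time_s: time of the first array response, measured from
        trial initiation (hypothetical convention), or None if the
        animal never responded.
    schedule: maps option -> (pellets, p_punish, punish_timeout_s).
    rng: a random.Random instance used to draw the outcome.
    """
    # A response at the array before the ITI has elapsed is premature.
    if response_time_s is not None and response_time_s < ITI_S:
        return ("premature", 0, PREMATURE_TIMEOUT_S)
    # No response while the holes are lit counts as an omission.
    if response_time_s is None or response_time_s > ITI_S + RESPONSE_WINDOW_S:
        return ("omission", 0, 0.0)
    pellets, p_punish, timeout_s = schedule[option]
    # A valid choice is either punished (no reward) or rewarded.
    if rng.random() < p_punish:
        return ("punished", 0, timeout_s)
    return ("rewarded", pellets, 0.0)


# P2 parameters as stated in the text: 2 pellets, p(punish) = 0.2, 10 s.
schedule = {"P2": (2, 0.2, 10.0)}
outcome = run_trial(7.5, "P2", schedule, random.Random(1))
```

Perseverative responses, which were recorded but not punished, are deliberately left out of this sketch.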

Figure 1

Schematic diagram showing the trial structure of the rGT. The task began with illumination of the tray light. A nose-poke response in the food tray extinguished the tray light and initiated a new trial. After an inter-trial-interval (ITI) of 5 s, four stimulus lights were turned on in holes 1, 2, 4, and 5, and the animal was required to respond in one of these holes within 10 s. This response was then rewarded or punished depending on the reinforcement schedule for that option (indicated by the probability of a win or loss in brackets for each option). If the animal was rewarded, the stimulus lights were extinguished and the animal received the corresponding number of pellets in the now-illuminated food tray. A response at the food tray then started a new trial. If the animal was punished, the stimulus light in the corresponding hole flashed at a frequency of 0.5 Hz for the duration of the punishing timeout and all other lights were extinguished. At the end of the punishment period, the tray light was turned on and the animal could initiate a new trial. Failure to respond at the illuminated holes resulted in an omission, whereas a response during the ITI was classified as a premature response and punished by a 5-s timeout during which the house light was turned on.

The location of the pellet choice options (P1–4) was counterbalanced across animals such that half the animals were tested on version A (n=8) and half on version B (n=8). According to the hole order in the 5-hole operant chamber (left to right: 1, 2, 4, and 5), the order of pellet options in version A was P1, P4, P2, and P3, and that in version B was P4, P1, P3, and P2. Animals received five daily testing sessions per week until statistically stable patterns of choice behavior were observed over three sessions (29 sessions in total). Two additional groups of rats (n=8) were trained on variants of the rGT in which either the probability of punishment (group CProb) or the punishment duration (group CPun) was kept at 0.2 and 10 s, respectively, for all four options, mimicking the parameters for the best option (P2). Apart from these differences, the reinforcement schedules and parameters were identical for these control groups as those used in the rGT, and the order of the choice options was likewise counterbalanced within each group (order A: n=4, order B: n=4, see Figure 2 for more information concerning reinforcement schedules).

Figure 2

Baseline choice behavior: (a) Animals in the rGT group consistently showed a large preference for the two-pellet option, associated with not only a modest gain but also smaller and less frequent punishments. (b) When the duration of the punishing timeouts was kept constant (group CPun), animals favored the two-pellet option over the larger rewards, but this preference was not statistically significant. (c) When the probability of the punishing timeouts was equal across all options (group CProb), animals strongly favored the four-pellet option above all others, as selection of this large reward is no longer deterred by more frequent punishments. The punishment duration (in seconds) and the probability of the punishment occurring are located below each corresponding pellet option on the horizontal x-axis. Numbers located inside or above each bar represent the total number of pellets that could be theoretically obtained if the option was chosen exclusively in a 30-min session, hence providing an objective value for each option. This variable was calculated using the absolute minimum trial length (5 s); hence, individual variation in the time penalties incurred because of levels of premature responding, trials omitted, and response latencies would alter the absolute values for each rat, but not the net value of the different options relative to each other. A breakdown of choice behavior across different session quartiles (that is, each 25% of trials) in the rGT, CPun and CProb groups is also provided in panels d–f respectively. Data are shown as the mean percent choice for each option (±SEM).
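One plausible reading of the objective-value calculation described in this caption is sketched below. The formula is an assumption: it treats the expected duration of a trial as the 5-s minimum trial length plus the punishment duration weighted by its probability, and the function name `max_pellets` is hypothetical. The P2 parameters (two pellets, punishment probability 0.2, 10-s timeout) come from the text; the "risky" option is a made-up example included only to show the direction of the effect.

```python
SESSION_S = 30 * 60   # 30-min session (from the text)
MIN_TRIAL_S = 5.0     # absolute minimum trial length (from the caption)


def max_pellets(pellets, p_punish, timeout_s,
                session_s=SESSION_S, trial_s=MIN_TRIAL_S):
    """Expected pellets if one option is chosen exclusively.

    Assumes every trial takes the minimum trial length, plus the
    punishing timeout on the fraction of trials that are punished.
    """
    expected_trial_s = trial_s + p_punish * timeout_s
    n_trials = session_s / expected_trial_s
    # Only unpunished trials deliver the reward.
    return n_trials * (1.0 - p_punish) * pellets


p2_value = max_pellets(pellets=2, p_punish=0.2, timeout_s=10.0)
# Hypothetical larger-reward option with frequent, long timeouts.
risky_value = max_pellets(pellets=4, p_punish=0.6, timeout_s=40.0)
print(round(p2_value, 1), round(risky_value, 1))
```

Under these assumptions, the larger reward per trial does not compensate for the time lost to timeouts, which is the property the reinforcement schedules were designed to have.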

Behavioral Measurements

The percentage of trials on which an animal chose a particular option was calculated according to the following formula: number of choices of a particular option/number of total choices made × 100. The percentage of choices, rather than a raw count of responses, was used to determine preferences so that either individual variation or drug-induced changes in the number of trials completed (which could itself be influenced by changes in response latencies or premature responding) would not be interpreted as genuine differences in choice preference, that is, animals had to choose an option proportionally more or less relative to the other options, regardless of the absolute number of responses made. As with analysis of data from the 5CSRT, the percent of premature responses made was calculated as the number of premature responses made/total number of trials initiated × 100. Perseverative responses made during the punishment period were analyzed as a fraction of the total punishment duration experienced. Likewise, perseverative responses made after a reward was received were analyzed as a fraction of the total number of trials rewarded. The total number of trials completed and the number of omissions made were also analyzed, in addition to the latency to respond at the array and to collect reward for each choice option.
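The two percentage measures defined above can be written out as follows. This is a minimal sketch: the function names and the example counts are hypothetical and are not data from the study.

```python
def percent_choice(choice_counts):
    """Percent of all choices made that went to each option."""
    total = sum(choice_counts.values())
    if total == 0:
        # No choices made: report zero for every option.
        return {option: 0.0 for option in choice_counts}
    return {option: 100.0 * n / total for option, n in choice_counts.items()}


def percent_premature(n_premature, n_trials_initiated):
    """Premature responses as a percentage of trials initiated."""
    if n_trials_initiated == 0:
        return 0.0
    return 100.0 * n_premature / n_trials_initiated


# Hypothetical session: 100 choices spread over the four options.
counts = {"P1": 10, "P2": 60, "P3": 10, "P4": 20}
print(percent_choice(counts))
print(percent_premature(n_premature=15, n_trials_initiated=120))
```

Because each option is expressed relative to the total number of choices, a drug that halves the number of trials completed without changing relative preference leaves these percentages unchanged.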

Drugs

Drug doses are provided in Table 1. Doses were calculated as the salt, with the exception of WAY 100635 for which doses were calculated as the free base. 8-OH-DPAT, WAY 100635, SCH23390 hydrochloride, and quinpirole hydrochloride were purchased from Sigma-Aldrich (Oakville, Canada). SKF 81297 hydrobromide and bromocriptine mesylate were purchased from Tocris Bioscience (Ellisville, MO). d-amphetamine sulfate was a gift from Dr Stan B Floresco. Drugs were dissolved in 0.9% sterile saline and administered through the intraperitoneal route in a volume of 1 ml/kg, with the exception of WAY 100635, which was dissolved in phosphate-buffered saline and administered subcutaneously, and bromocriptine, which was dissolved in 15% DMSO and 2% (v/v) EtOH in 0.9% sterile saline and injected at a volume of 1.5 ml/kg. The drugs were administered in the following order: quinpirole, 8-OH-DPAT, WAY 100635, SCH23390, eticlopride, SKF 81297, d-amphetamine, and bromocriptine. Animals were tested drug-free for a minimum of 1 week between compounds to prevent carryover effects.

Table 1 Doses of Dopaminergic and Serotonergic Drugs

Pharmacological challenges began once stable baseline behavior had been established. All drugs were prepared fresh daily, and different doses were administered according to a digram-balanced Latin Square design (for doses A–D: ABCD, BDAC, CADB, DCBA; Cardinal and Aitken, 2006, p. 329). Drug injections were given on a 3-day cycle, starting initially with a baseline session. The following day, rats received a drug or saline injection before testing. On the third day, animals were not tested. Injections were given 10 min before the behavioral testing commenced, with the exception of bromocriptine, which was given 40 min before testing in accordance with previous reports (St Onge and Floresco, 2009), and WAY 100635, which was injected 10 min before either saline or 0.3 mg/kg 8-OH-DPAT.
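The defining properties of a digram-balanced Latin square can be checked mechanically: each dose appears once in every row and every column, and each dose is immediately followed by each other dose exactly once across the rows. The sketch below is offered as an illustration of that definition; the function names are hypothetical, and the square ABCD, BDAC, CADB, DCBA is used as the worked example.

```python
from itertools import permutations


def is_latin_square(rows):
    """True if every symbol occurs once in each row and each column."""
    symbols = set(rows[0])
    n = len(rows[0])
    if len(rows) != n or len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in rows)
    cols_ok = all(set(col) == symbols for col in zip(*rows))
    return rows_ok and cols_ok


def is_digram_balanced(rows):
    """True if every ordered pair of distinct symbols occurs as an
    adjacent pair exactly once across all rows."""
    symbols = sorted(set(rows[0]))
    counts = {pair: 0 for pair in permutations(symbols, 2)}
    for row in rows:
        for first, second in zip(row, row[1:]):
            if (first, second) not in counts:
                return False  # repeated or unknown symbol
            counts[(first, second)] += 1
    return all(count == 1 for count in counts.values())


square = ["ABCD", "BDAC", "CADB", "DCBA"]
print(is_latin_square(square), is_digram_balanced(square))
```

Balancing first-order sequences in this way means that any carryover from one dose to the next is distributed evenly across doses rather than confounded with a particular dose.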

Data Analysis

All statistical analyses were conducted using SYSTAT for Windows (version number 12.00.08; SSI, Chicago, IL). Data from the pharmacological challenges were analyzed using a two-way, repeated-measures analysis of variance (ANOVA) with choice (four levels, P1–4) and drug dose (four levels, vehicle plus three doses of compound) as within-subject factors. Session or dose was used as the only within-subjects factor if a measurement was not separated by choice, for example, for trials omitted or completed. An arcsine transformation was performed before analysis of variables expressed as a percentage or proportion to limit the effect of an artificially imposed ceiling. If analyses produced significant main effects of dose or dose × choice at the p<0.05 level, further ANOVA comparing individual drug doses with vehicle were performed, and values for individual choice options were compared post-hoc with saline values using paired sample t-tests.
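For illustration, a common form of the arcsine transformation for percentage data is the arcsine of the square root of the proportion. The text does not state which variant was applied in SYSTAT, so the form below (and the function name `arcsine_transform`) is an assumption rather than a description of the actual analysis.

```python
import math


def arcsine_transform(percent):
    """Arcsine square-root transform of a percentage (0-100).

    Returns a value in radians between 0 and pi/2. This stretches the
    scale near 0% and 100%, where percentage data are compressed.
    """
    proportion = percent / 100.0
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.asin(math.sqrt(proportion))


# Hypothetical percent-choice values for the four options.
transformed = [arcsine_transform(p) for p in (10.0, 60.0, 10.0, 20.0)]
```

Some texts use twice this value; the scaling does not affect the outcome of an ANOVA.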

Analysis of baseline behavior (an average of the last three sessions before injections commenced) indicated that there was no significant difference between the choice behavior of animals performing versions A and B (for example, for the rGT: Choice × Version: F3,42=0.738, not significant (NS)). Thus, animals were not separated on the basis of version A or B for subsequent statistical analyses. Measurements for each drug were analyzed according to its individual saline or vehicle dose within the Latin Square design. Although there was some variation in the response to saline, particularly in the CPun group, this effect was not statistically significant (for example, sal Dose, F7,49=1.026, NS). To assess whether administration of WAY 100635 blocked the effects of 8-OH-DPAT or affected the behavior independently, a repeated-measures ANOVA with three factors was performed: antagonist (two levels: present, absent), dose (two levels: saline, DPAT), and choice (four levels, P1–4).

RESULTS

Baseline Behavior

Choice behavior

Animals performing the rGT significantly favored the best option, P2, followed by P4, P1, and then P3 (Figure 2a, Choice: F3,45=13.658, p<0.0001; P2 vs P1, t(15)=−4.234, p<0.0007; P2 vs P3, t(15)=4.789, p<0.0002; P2 vs P4, t(15)=3.670, p<0.0023). This pattern was established relatively early, but became more pronounced as training continued, until a stable baseline was established (Figure 3). Subjectively, the animals ranked the options in the following order: P2>P4>P1>P3. These preferences generally remained constant across the duration of the session (Figure 2d; Quartile × Choice: F9,135=0.72, NS). Objectively, ranking the pellet options by calculating the maximum number of pellets that could be earned in a 30-min session if an option was chosen exclusively (shown in Figure 2) indicated that P2 was the best option, followed by P1, P3, then P4. The observation that animals ranked P4 higher than would be expected on the basis of the objective ranking suggests that rats, just like humans, find larger reward options tempting despite the associated heavier punishments.

Figure 3

Acquisition of stable choice patterns in the rGT and control groups. After seven days of forced-choice testing, animals in the rGT group (a) showed a modest preference for the two-pellet option (P2), which became more pronounced with increased training. (b) When the punishment time was held constant (group CPun), animals consistently chose the two-pellet option throughout the training sessions. Preference for P3 increased after the first few sessions, whereas the choice of P1 decreased as training continued. (c) When the probability of punishment was held constant (group CProb), rats' preference for P4 dramatically increased throughout training.

A distinct pattern of choice behavior was observed in the two control groups as compared with performance of the rGT (Figure 2b and c; Choice × Group: F3,87=9.935, p<0.0001), indicating that choice in the rGT does not depend solely on either the probability of punishment or the punishment duration, but rather on an integration of both variables with reward magnitude. When the size of the punishments associated with every option was set at 10 s, so that animals were solving the discrimination based solely on probability and magnitude of reward (Figure 2b, group CPun), preference largely reflected the objective ranking of the options, although animals again suboptimally preferred P4 to P1, and choice remained constant across the session (Figure 2e; Quartile × Choice: F9,63=0.301, NS). However, importantly, there was no main overall effect of choice option in the statistical analysis (Choice: F3,21=1.388, NS), indicating that animals failed to show a significant bias away from the more disadvantageous options linked to the larger rewards. This is unlikely to be because of lower statistical power arising from the smaller sample size in this group compared with the rGT cohort (n=8 vs n=16), as a main effect of choice is still observed in the rGT when performance was analyzed separately in the eight rats performing version A and the eight performing version B (Choice, rGT-A: F3,21=2.972, p<0.05; rGT-B: F3,21=18.42, p<0.0001; all rats: Choice × Version: F3,42=0.738, NS). This shows that the longer time penalties associated with these large reward options in the rGT are important in determining choice and are sufficiently aversive to suppress their selection.

Conversely, if the probability of reward delivery was equated across all four options (Figure 2c, group CProb), such that animals were discriminating on the basis of reward magnitude and the size of the punishment associated with the different rewards, animals chose strictly on the basis of reward value (P4>P3>P2>P1) even though P3 was objectively worse than P2 (Choice: F3,21=28.736, p<0.0001; P4 vs P3, t(7)=−4.342, p<0.003; P3 vs P2, t(7)=−0.933, NS; P2 vs P1, t(7)=−2.247, p<0.06). Again, choice behavior remained constant across the session (Figure 2f; Quartile × Choice: F9,63=0.493, NS). Clearly, these animals are willing to experience longer punishments to receive larger rewards when the probability of being punished is equal for all choices. This indicates that the increased probability of being punished also has a considerable impact on choice in the rGT, wherein the probability of punishment for the large rewards is significantly higher and choice of these options is correspondingly lower.

Other behavioral measurements

All data are provided in Table 2. Animals in all groups completed a similar number of trials per session, while omissions remained very low. Although the latency to choose a particular option did not differ by choice in any group (rGT: F3,45=0.140, NS; CPun: F3,21=0.296, NS; CProb: F3,21=0.730, NS), animals were quicker to collect larger rewards, and slower to collect smaller rewards in both the rGT and CPun, but not in CProb, groups (Choice—rGT: F3,45=55.828, p<0.0001; CPun: F3,21=11.553, p<0.0001; CProb: F3,21=0.766, NS). This lack of effect in group CProb could reflect a decrease in anticipatory excitement accompanying delivery of the large rewards due to the high incidence with which this occurred. Interestingly, this group also made significantly fewer premature responses overall compared with the rGT and CPun groups (CProb vs rGT: F1,22=9.632, p<0.005; CProb vs CPun: F1,14=11.127, p<0.005; Table 2). Finally, animals in the CPun group made significantly more perseverative responses, both during the punishing time-out periods (rGT vs CPun: F1,22=5.108, p<0.03) and after the reward was received (CPun vs rGT: F1,22=8.728, p<0.007), although the reason for this is not clear.

Table 2 Baseline Behavioral Measurements for rGT, CPun, and CProb Groups

d-Amphetamine

Choice behavior

Amphetamine significantly increased non-optimal choice in the rGT, decreasing choice of P2 and increasing choice of the P1 option, which is associated not only with the least punishment but also with less reward (Figure 4a–c; Dose × Choice: F9,135=5.581, p<0.0001; sal vs 0.3 mg/kg: F3,45=2.546, p<0.07; sal vs 1.0 mg/kg: F3,45=6.077, p<0.001; P1: t(15)=−3.454, p<0.003; P2: t(15)=3.205, p<0.006; P3: t(15)=−1.459, NS; P4: t(15)=−0.735, NS; sal vs 1.5 mg/kg: F3,45=11.325, p<0.0001; P1: t(15)=−4.413, p<0.0005; P2: t(15)=4.047, p<0.001). A small increase in choice of P4 was also observed after the highest dose only (P4: t(15)=−2.372, p<0.03). When the probability of punishment was held constant (group CProb), animals still shifted their preference toward smaller rewards associated with shorter punishments (Figure 4g–i; Dose: F3,21=4.149, p<0.02; Dose × Choice: F9,63=2.576, p<0.01; sal vs 1.0 mg/kg: F3,21=3.813, p<0.02; P2: t(7)=−1.931, p<0.09; P4: t(7)=2.832, p<0.02; sal vs 1.5 mg/kg: F3,21=3.119, p<0.04; P2: t(7)=−2.853, p<0.02). Likewise, when the duration of punishment was held constant (group CPun), animals again seemed to shift their preference toward the smaller reward with the smallest probability of punishment, although the dose × choice effect was not significant in this group (Figure 4d–f; Dose: F3,21=7.239, p<0.001; Dose × Choice: F9,63=1.510, NS). It would, therefore, seem that amphetamine increased choice of P1 in the rGT by reducing rats’ tolerance for both increased probability and duration of the punishing timeouts.

Figure 4

d-amphetamine administration shifts choice preference toward smaller rewards with smaller punishments: (a–c) After amphetamine administration (1.0 mg/kg and 1.5 mg/kg), animals performing the rGT shifted their choice preference toward the one-pellet option associated with shorter and less frequent punishments. (d–f) Although a significant dose × choice effect was not observed in the CPun group after amphetamine, visual inspection of the data suggests that animals chose the one-pellet option more, an option again associated with the smallest rate of punishment. (g–i) Animals in the CProb group shifted their preference toward the two-pellet option associated with shorter punishment duration. Data are shown as the mean percent choice for each option (±SEM). *Indicates a significant difference (p<0.05) as determined by paired samples t-test comparing choice of a particular option after drug or vehicle.

Other behavioral measurements

Data values and details of all statistical analyses are provided in Supplementary information, Table S1. In keeping with previous reports (for example, Cole and Robbins, 1987; Harrison et al, 1997; Pattij et al, 2007), all doses of amphetamine significantly increased premature responding in the rGT (Dose: F3,45=12.791, p<0.0001; sal vs 0.3 mg/kg: F1,15=21.203, p<0.0003; sal vs 1.0 mg/kg: F1,15=20.941, p<0.0004; sal vs 1.5 mg/kg: F1,15=19.879, p<0.0005) and a similar pattern was observed in the other groups (Dose—CPun: F3,21=7.043, p<0.001; CProb: F3,21=2.711, p<0.07). Omissions remained low across all three groups (Dose—rGT: F3,45=0.652, NS; CPun: F3,21=0, NS; CProb: F3,21=0.636, NS). A slight decrease in the number of trials completed observed in the rGT and CPun groups can probably be attributed to the loss of ‘playing time’ caused by high levels of premature responding (Trials: Dose—rGT: F3,45=6.984, p<0.0006; CPun: F3,21=6.091, p<0.004). Amphetamine also led to a slight decrease in reward collection latency in the rGT (Dose: F3,45=4.503, p<0.008) and an increase in perseverative responding (Dose—Reward perseveratives: F3,45=5.863, p<0.002; F3,45=10.651, p<0.0001), which was mimicked to some extent in the control groups.

D2/D3 Agonist: Quinpirole

Choice behavior

Data values for the percentage choice of different options after administration of quinpirole are provided in Supplementary Table S2, and values for the other behavioral measurements plus details of all statistical analyses are provided in Supplementary Table S3. In contrast to amphetamine, quinpirole did not affect choice behavior in the rGT at any dose (Dose × Choice: F9,135=1.355, NS). However, when the duration of the punishing timeouts was equalized across options (group CPun), it seemed as though there was a small increase in choice of P1 and a decrease in choice of P2 at the highest dose used. Although this effect was significant at the dose level, there was again no significant dose × choice interaction owing to high inter-individual variation (Dose: F1,7=6.665, p<0.04; Dose × Choice: F3,21=0.093, NS). The highest dose of quinpirole also significantly altered choice behavior when the probability of reward delivery was controlled for (group CProb), thereby decreasing choice of P4 and increasing choice of P3 and P2, which are associated with smaller rewards and smaller punishment durations (Dose × Choice: sal vs 0.125 mg/kg: F3,21=14.119, p<0.0001). Thus, although quinpirole did not affect preference for the different options in the rGT, it did affect choice in simpler versions of the task.

Other behavioral measurements

Although quinpirole did not affect choice behavior on the rGT, the drug did induce a general motor slowing and a decrease in motor output. At all doses, significant increases in both choice and reward collection latencies were observed (Dose—Choice latency: F3,45=27.576, p<0.0001; Collect latency: F3,45=14.130, p<0.0001), as well as a decrease in premature and perseverative responding (Dose—Punishment perseveratives: F3,45=3.496, p<0.02; Reward perseveratives: F3,45=3.996, p<0.01; Prematures: F3,45=11.271, p<0.0001), and an increase in omissions at the highest dose tested (Dose—F1,15=19.306, p<0.0005; sal vs 0.125 mg/kg: F1,15=19.305, p<0.0005). A similar pattern of behavior was also observed in the control groups.

D2/D3 Agonist: Bromocriptine

Bromocriptine, which can be considered a D2-preferring compound because of its greater affinity for D2 vs D3 receptors (Seeman and Van Tol, 1994), did not alter choice behavior in any of the three groups (Table S2; Dose × Choice: rGT: F9,135=1.186, NS; CPun: F9,63=1.183, NS; CProb: F9,63=0.859, NS). Other behavioral measurements were similarly unaffected with the exception of reward collection latency, which was increased at the highest dose in animals performing the rGT (Dose: F3,45=3.244, p<0.03; vehicle vs 5.0 mg/kg: F1,15=5.494, p<0.03) and some minor variations in perseverative responding in the control groups (see Supplementary Table S4 for further details and statistics).

D1 Receptor Agonist: SKF 81297

Although SKF 81297 did not affect choice in the rGT, the highest dose of the drug increased choice of P4 and decreased choice of P2 when punishment duration was controlled for (Supplementary Table S2; Dose × Choice—CPun: F9,63=2.045, p<0.05; sal vs 0.3 mg/kg: F3,21=3.216, p<0.04; rGT: F9,135=2.532, NS; CProb: F9,63=1.468, NS). This effect of SKF on choice behavior—increasing the animals’ choice of the higher reward with the largest probability of punishment—is in direct contrast to the increase in choice of the smallest reward with the lowest probability of punishment observed with quinpirole. However, both effects are only seen when animals are effectively performing a simpler probability discounting task compared with the more complex rGT. Some small, but inconsistent, changes in premature and perseverative responding were also observed in some groups, but these are unlikely to be of behavioral significance (see Supplementary Table S5 for details).

D2 Receptor Antagonist: Eticlopride

Choice behavior

The DA-D2 receptor antagonist, eticlopride, significantly improved optimal choice in the rGT group, increasing choice of P2, and decreasing choice of P3 and P4 at the lowest dose tested (Figure 5a–c; Dose × Choice: F9,135=2.699, p<0.006; sal vs 0.01 mg/kg: F3,45=5.364, p<0.003; P2: t(15)=−2.597, p<0.02; P3: t(15)=2.136, p<0.05; P4: t(15)=2.734, p<0.01; sal vs 0.03 mg/kg: F3,45=2.826, p<0.05; P2: t(15)=−1.765, p<0.09; P3: t(15)=1.618, p<0.1; P4: t(15)=2.343, p<0.03). However, there was no change in choice behavior in the two control groups (Figure 5d–f, Dose × Choice—CPun: F9,63=0.681, NS; Figure 5g–i, CProb: F9,63=0.826, NS). Hence, it would seem that D2 receptor antagonism only enhances gambling-related decision making when both probability and duration of punishment vary between options, which may place greater demands on systems which track reward value over time, recruit cognitive effort, or resolve conflict.

Figure 5

Eticlopride administration improves performance of the rGT: (a–c) Eticlopride (0.01 mg/kg and 0.03 mg/kg) resulted in a significant improvement in choice behavior in the rGT. Choice of the best option, P2, increased, whereas choice of the disadvantageous P3 and P4 options decreased. This effect was most significant at the lowest dose. Choice patterns of both the CPun (d–f) and CProb (g–i) groups were not affected by eticlopride. Data are shown as the mean percent choice for each option (±SEM). *Indicates a significant difference (p<0.05) as determined by paired samples t-test comparing choice of a particular option after drug or vehicle.
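The per-option comparisons marked in the figure rely on paired-samples t-tests of choice after drug versus vehicle in the same animals. A minimal sketch of that calculation follows, using only the Python standard library. The function name `paired_t`, the vehicle-minus-drug ordering of the differences, and the four data points are assumptions made for illustration; they are not the study's analysis code or data.

```python
import math
import statistics


def paired_t(vehicle, drug):
    """Return (t, df) for a paired-samples t-test on vehicle minus drug."""
    if len(vehicle) != len(drug):
        raise ValueError("paired samples must have equal length")
    # One difference score per animal (assumed order: vehicle minus drug).
    diffs = [v - d for v, d in zip(vehicle, drug)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical percent choice of one option for four animals (not study data).
vehicle = [50.0, 55.0, 60.0, 65.0]
drug = [60.0, 63.0, 72.0, 71.0]
t_stat, df = paired_t(vehicle, drug)
print(f"t({df}) = {t_stat:.3f}")
```

Under this assumed ordering, an increase in choice after drug gives a negative t value and a decrease gives a positive one.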

Other behavioral measurements

Data and statistical analysis are provided in Supplementary Table S6. All doses of eticlopride decreased the latency to collect reward in animals performing the rGT (Dose: F3,45=3.048, p<0.04; sal vs 0.01 mg/kg: F1,15=6.550, p<0.02; sal vs 0.03 mg/kg: F1,15=6.435, p<0.02; sal vs 0.06 mg/kg: F1,15=5.250, p<0.04), whereas this measure was unaffected in the control groups (Dose—CPun: F3,21=0.595, NS; CProb: F3,21=1.972, NS). Animals performing the rGT also completed slightly more trials after the lowest dose, which likely reflects an increase in the most advantageous choice over options delivering longer and more frequent punishing timeouts (Trials: rGT: F3,45=3.066, p<0.04; sal vs 0.01 mg/kg: F1,15=9.042, p<0.009, CPun: F3,21=0.5333, NS; CProb: F3,21=2.767, NS).

D1 Receptor Antagonist: SCH23390

Choice behavior

In contrast to the D2 receptor antagonist, the DA-D1 receptor antagonist, SCH23390, did not affect choice behavior in the rGT (Table S2; Dose × choice: F9,135=0.881, NS). However, when the probability of the punishments was held constant, there was a small decrease in choice of the largest reward option at one dose (Table S2; Dose × Choice—CProb: F9,63=3.236, p<0.003; sal vs 0.01 mg/kg: F3,21=49.510, p<0.0001). Although highly statistically significant, this effect is marginal in size and, therefore, unlikely to be important behaviorally.

Other behavioral measurements

Data values and statistical details are provided in Supplementary Table S7. The highest dose of SCH23390 decreased the number of trials and increased omissions, indicating a general decrease in motor output (Dose—rGT: F3,45=33.680, p<0.001; CPun: F3,21=24.674, p<0.001; CProb: F3,21=26.634, p<0.001). Likewise, both premature and perseverative responding decreased, particularly in the rGT and CPun groups, in which these responses are more frequent (Punishment perseveratives: Dose—rGT: F3,45=5.572, p<0.002; CPun: F3,21=10.172, p<0.0002; CProb: F3,21=0.118, p<0.02; Prematures—rGT: F3,45=9.156, p<0.0001; CPun: F3,21=11.471, p<0.0001; CProb: F3,21=0.701, NS).

5-HT1A Receptor Agonist and Antagonist: 8-OH-DPAT and WAY100635

Choice behavior

8-OH-DPAT significantly impaired performance of the rGT, decreasing choice of the best option and increasing selection of the non-optimal options, P1 and P3, an effect which was most pronounced at the middle dose (Figure 6a–c; Dose × Choice: F9,135=2.151, p<0.02; sal vs 0.3 mg/kg: F3,45=3.016, p<0.04; P1: t(15)=−2.626, p<0.02; P3: t(15)=−2.494, p<0.02). When the probability of punishment was equalized across all options, this dose of 8-OH-DPAT again decreased choice of the best option, which in this case is associated with the longest duration of punishment, and increased selection of the smaller reward options, P1 and P3, associated with shorter punishments (Figure 6g–i; Dose × Choice: F9,63=4.540, p<0.0001; sal vs 0.3 mg/kg: F3,21=9.445, p<0.0004; P4: t(7)=3.634, p<0.008; P3: t(7)=−3.15, p<0.01, P1: t(7)=−2.507, p<0.04). However, when the duration of the punishing timeouts was held constant, 8-OH-DPAT no longer affected performance (Figure 6d–f; group CPun: F9,63=0.965, NS). All effects of 8-OH-DPAT were effectively blocked by co-administration of the selective 5-HT1A antagonist, WAY100635 (Supplementary information, Figure S1; Antagonist × Dose × Choice: rGT: F3,45=3.5057, p<0.02; CPun: F3,21=0.943, NS; CProb: F3,21=7.004, p<0.0019), although WAY100635 in isolation did not significantly alter choice behavior in any group (comparing WAY100635 plus saline with saline alone—Dose × Choice: rGT: F3,45=11.490, NS; CPun: F3,21=0.493, NS; CProb: F3,21=0.690, NS).

Figure 6

8-OH-DPAT impaired rGT performance: (a–c) 8-OH-DPAT significantly decreased choice of the optimal option (P2), while increasing choice of P1 (0.3 mg/kg) and P3 (0.3 and 0.6 mg/kg). (d–f) Choice in the CPun group remained relatively unchanged under the influence of 8-OH-DPAT. (g–i) 8-OH-DPAT shifted preference away from the optimal choice (P4) and increased selection of both P1 (0.3 mg/kg) and P3 (0.3 and 0.6 mg/kg), both of which are associated with shorter punishment durations. Data are shown as the mean percent choice for each option (±SEM). *Indicates a significant difference (p<0.05) as determined by paired samples t-test comparing choice of a particular option after drug or vehicle.

In summary, the 8-OH-DPAT-induced increase in choice of P1 on the rGT may reflect increased sensitivity to punishment magnitude rather than punishment probability, as a comparable shift was observed in the CProb but not in the CPun group. However, the additional increase in choice of P3 on the rGT, which is linked not only to larger rewards but also to larger punishments, indicates a broader drug-induced deficit in the ability to integrate numerous factors (magnitude and probability of punishment vs reward) to accurately assess an option's objective value (see Discussion).

Other behavioral measurements

Data values and statistical information are provided in supplementary information (Supplementary Table S8). In the rGT, all doses of 8-OH-DPAT decreased the number of trials completed (Dose: F3,45=17.148, p<0.001), whereas the latency to respond at the array and to collect food reward increased (Dose—Response latency: F3,45=59.641, p<0.0001; Collection latency: F3,45=13.446, p<0.0001). In keeping with this evidence of a general reduction in motor output, the number of premature responses made likewise decreased (Dose—Prematures: F3,45=21.066, p<0.0001), yet perseverative responding was unaltered (Dose—Punishment perseveratives: F3,45=1.407, NS; Reward perseveratives: F3,45=0.365, NS). Although these findings clearly indicate that 8-OH-DPAT impaired motor function, a similar pattern was observed in both control groups even though animals in group CPun did not shift their choice preferences. Hence, this altered motor function is unlikely to have directly contributed to altered choice behavior in the rGT.

DISCUSSION

Here, we show that rats are capable of ‘playing the odds’ when choosing between multiple options differing in the probability and magnitude of gain and loss; they learn to avoid options associated with larger rewards but heavier long-term losses, and prefer more advantageous options associated with smaller rewards but greater net gain. This pattern of behavior is similar to that observed when people perform laboratory-based gambling tasks, such as the IGT. Furthermore, rats' ability to perform the rGT is sensitive to drugs that modulate serotonin and DA levels, and clinical studies have implicated these neurotransmitter systems in the regulation of gambling behavior. Hence, the rGT may prove to be a useful tool for investigating the neurochemical regulation of gambling. Compared with data from the rGT, different patterns of choice preference were observed when either the probability of punishment (group CProb) or the duration of the punishing timeouts (group CPun) was held constant. This suggests that choice in the rGT is guided by an integration of the size of the expected reward with both the probability and the magnitude of expected punishment rather than being dominated by one of these factors, thus supporting the validity of the task design.
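One way to make this integration concrete is to compute the expected reward earned per unit time for each option, counting punishing timeouts as lost time. The sketch below does this in Python. All pellet counts, win probabilities, timeout durations, and the fixed per-trial time are illustrative placeholder values chosen only to follow the pattern described here (larger rewards paired with more frequent and longer timeouts); they are assumptions, not parameters reported in this section, and the names `Option` and `pellets_per_second` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    pellets: int       # reward size on a win
    p_win: float       # probability of reward; 1 - p_win is punishment probability
    timeout_s: float   # duration of the punishing timeout on a loss


def pellets_per_second(opt, trial_s=5.0):
    """Expected pellets per second, treating timeouts as lost time.

    trial_s is an assumed fixed time cost of every trial.
    """
    expected_pellets = opt.p_win * opt.pellets
    expected_time = trial_s + (1.0 - opt.p_win) * opt.timeout_s
    return expected_pellets / expected_time


# Placeholder contingencies (assumed for illustration only).
OPTIONS = [
    Option("P1", 1, 0.9, 5.0),
    Option("P2", 2, 0.8, 10.0),
    Option("P3", 3, 0.5, 30.0),
    Option("P4", 4, 0.4, 40.0),
]

rates = {o.name: pellets_per_second(o) for o in OPTIONS}
best = max(rates, key=rates.get)
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} pellets/s")
print("best option:", best)
```

With these placeholder values, neither the largest reward nor the lowest punishment probability identifies the best option; only the combination of reward size, punishment probability, and punishment duration does.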

Dopaminergic Modulation of rGT Performance

Acute treatment with bromocriptine has been reported to increase choice of larger uncertain rewards in a probability-discounting experiment (St Onge and Floresco, 2009). In contrast, none of the DA agonists used here led to changes in rGT performance, although quinpirole and SKF 81297 did lead to some behavioral changes when only the probability or magnitude of punishment was varied. These data highlight the difference between decision-making processes based solely on differences in reward probability and those incorporating more complex punishment signals. Although chronic treatment with DA agonists increases risky decision making in some Parkinsonian patients (Voon et al, 2007; Weintraub et al, 2006), the acute effects of DA agonists in healthy volunteers have been less well studied. However, acute pramipexole does not alter the overall number of risky decisions made in a simple betting paradigm, although participants failed to show an increase in conservative choice after an unexpectedly large gain, that is, the outcome of the previous trial did not influence subsequent betting strategies (Riba et al, 2008). Although our demonstration that rodents can solve discriminations on the basis of the probability of reward and loss is a significant advance, this only simulates part of the gambling process; other factors, such as the propensity to chase losses, sensitivity to previous trial outcome, and the amount wagered, are critically important when considering the motivation to gamble. Different gambling behaviors could, hence, be dissociable in terms of their neurobiological basis (Raylu and Oei, 2002), and experiments explicitly designed to explore these issues further are currently underway.

Such dissociations could be important when considering that the DA-D2 receptor antagonist, eticlopride, improved rGT performance, enhancing the choice of the best option (P2). This improvement was not observed in the simpler control tasks, suggesting that D2-mediated signaling is particularly important when the task requires greater cognitive effort or conflict resolution. The D1 antagonist, SCH23390, had no effects on choice behavior, indicating a dissociation between the two receptor subtypes. Clinically, there have been numerous reports linking allelic variation in the D2 receptor gene with PG and reward deficiency syndrome (Cohen et al, 2005; Comings and Blum, 2000), and the mixed D2–5-HT2A antagonist, risperidone, improved the symptoms of PG observed in a Parkinsonian patient (Seedat et al, 2000). It could therefore be argued that, whereas too much DA facilitates aspects of gambling behavior, inhibition of dopaminergic neurotransmission suppresses the drive to gamble.

However, this view is clearly overly simplistic, and is not universally supported. For example, in contrast to the effects of risperidone and the current data on eticlopride, the D2 antagonist haloperidol increased the drive to gamble in pathological gamblers, but not in healthy controls (Zack and Poulos, 2007). Eticlopride and haloperidol have similar pharmacological properties and comparable affinities for the D2 receptor (Assie et al, 2006; Seeman and Ulpian, 1988); the discrepancy in the data is therefore unlikely to reflect the difference in the drug used. Furthermore, acutely decreasing central DA levels in healthy volunteers impaired IGT performance (Sevy et al, 2006), contradicting the view that too much DA enhances risky decision making.

One theoretical explanation is that the relationship between DA levels and the regulation of gambling behavior may follow an inverted U-shaped curve, as has been repeatedly shown for other cognitive processes (Arnsten, 1997; Granon et al, 2000). Hence, the effect of dopaminergic drugs could depend on both basal levels of DA and gambling-induced changes in DA release that may vary between gambling paradigms, such that individual differences in DA function modulate the motivation to gamble. Baseline levels of risky decision making could, therefore, influence drug effects, and such baseline dependency has been observed on the IGT after administration of the stimulant drug, modafinil (Zack and Poulos, 2008). It is, therefore, perhaps unsurprising that inconsistencies have arisen when extrapolating between data from healthy volunteers and those with PD and PG, in which DA dysregulation is either confirmed or suspected. Furthermore, D2 receptor antagonists can block inhibitory autoreceptors, thereby stimulating the firing of dopaminergic neurons as well as suppressing DA-mediated neurotransmission at post-synaptic sites. The mechanism by which these drugs affect gambling behavior, and how this is altered after chronic up- or downregulation of the DA system, are currently unknown. In sum, the conditions under which D2 receptor antagonism ameliorates or exacerbates gambling behavior require further exploration, and could relate to the type of gambling patterns individuals are engaged in or other comorbid pathology.
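The baseline dependency implied by an inverted U-shaped relationship can be illustrated with a toy calculation: the same drug-induced reduction in DA signaling improves performance when baseline DA is above the optimum and impairs it when baseline DA is below the optimum. In the sketch below, the quadratic curve, the arbitrary units, and every numeric value are assumptions made purely for illustration; none is a fitted or measured quantity, and the function names are hypothetical.

```python
def performance(da_level, optimum=1.0, width=0.5):
    """Hypothetical inverted-U: peaks at `optimum`, falls off on both sides.

    Units are arbitrary; the quadratic shape is an illustrative assumption.
    """
    return max(0.0, 1.0 - ((da_level - optimum) / width) ** 2)


def drug_effect(baseline, shift):
    """Change in performance when a drug shifts DA signaling by `shift`."""
    return performance(baseline + shift) - performance(baseline)


antagonist_shift = -0.2  # assumed net reduction in DA signaling

high_baseline = drug_effect(1.3, antagonist_shift)  # baseline above the optimum
low_baseline = drug_effect(0.8, antagonist_shift)   # baseline below the optimum

print(f"high baseline: {high_baseline:+.2f}")
print(f"low baseline:  {low_baseline:+.2f}")
```

The sketch treats the drug as a simple downward shift, so it ignores the autoreceptor effects noted above, which could reverse the sign of the shift.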

Despite the fact that acute administration of DA agonists did not affect performance of the rGT, amphetamine shifted preference toward the option associated with the smallest reward, and the lowest frequency and duration of punishment. In some ways, selection of P1 may be analogous to the making of risk-averse mistakes, wherein over-weighting potential losses leads to maladaptive choice (Kuhnen and Knutson, 2005). This effect persisted even when the probability of punishment was equated across options (group CProb), indicating that amphetamine is not simply increasing the preference for the highest rate of reinforcement. Amphetamine can affect rats' perception of time (Al-Ruwaitea et al, 1999; Maricq and Church, 1983; Maricq et al, 1981), theoretically, by speeding up an internal clock or pacemaker (Gibbon et al, 1997; Meck, 1983). Such an impairment in temporal judgment would have increased the perceived duration of the punishing timeouts, thereby biasing preference toward P1. However, amphetamine has been shown to increase choice of larger, but more delayed, rewards in delay-discounting paradigms (for example, van Gaalen et al, 2006b; Wade et al, 2000; Winstanley et al, 2003b), suggesting that the drug's effects on temporal perception do not have a primary role in mediating its impact on reward-related decision making. Furthermore, the observation that amphetamine increased choice of P1 even when the duration of the punishment signal was held constant (group CPun) argues against this interpretation.

This CPun group was effectively performing a probability-discounting task. Our observation that amphetamine increased choice of the smaller, more certain option therefore contrasts with recent data using a probability-discounting paradigm, wherein amphetamine increased choice of a larger uncertain reward (St Onge and Floresco, 2009). One key difference between the tasks used was that a failure to win was explicitly punished in the rGT by a signaled timeout, designed to convey ‘loss’, whereas no such punishment signal was present in the St Onge and Floresco study. Hence, amphetamine may have increased the choice of the small reward–small punishment option by making animals hypersensitive to the punishment signal. The ability of amphetamine to enhance the influence that reward-related cues have on behavior is well known (Hill, 1970; Robbins, 1976), but amphetamine also potentiates the conditioned suppression of responding caused by presentation of a CS previously paired with footshock (Killcross et al, 1997). Such data support the suggestion that amphetamine could be enhancing the aversive nature of the signaled punishments in the rGT. The importance of punishment and punishment-related signals in models of gambling clearly warrants further investigation.

Serotonergic Modulation of rGT Performance

The 5-HT1A receptor agonist, 8-OH-DPAT, caused similar changes in behavior to amphetamine, decreasing choice of the best option (P2) and increasing choice of P1, but also consistently increasing choice of P3. Selection of both P3 and P1 is maladaptive: choice of P3 leads to larger rewards associated with disproportionately larger and more frequent punishments, whereas, although P1 generates smaller and more frequent rewards, the losses are still overly large when compared with the net gain possible from P2. 8-OH-DPAT therefore generally impaired the animals' ability to judge between expected outcomes based on the relative likelihood and size of rewards and punishments, that is, they were less able to collate numerous factors together to ‘play the odds’ and accurately assess an option's objective value, opting for either overly conservative or risky strategies. Similar to amphetamine, 8-OH-DPAT has also been shown to speed up the perception of time; however, the drug increased the choice of options associated with both shorter and longer punishing timeouts, indicating that altered temporal judgment is unlikely to underlie 8-OH-DPAT's effects.

The finding that 8-OH-DPAT impaired rGT performance is consistent with the data indicating that dysfunction within the 5-HT system contributes to PG (Grant and Potenza, 2007; Marazziti et al, 2008). Increased choice of the disadvantageous options in the IGT has also been observed in healthy volunteers carrying the short allele of the serotonin transporter-linked polymorphic region (5-HTTLPR(s); Homberg et al, 2008). Although it is currently unclear what inheritance of 5-HTTLPR(s) means for the functioning of the 5-HT system (Lesch et al, 1996; van Dyck et al, 2004), rats which are homo- or heterozygous SERT knockouts, and therefore have constitutively higher levels of extracellular 5-HT, were better at performing a rodent gambling task in which loss was signaled by the addition of quinine to the reward pellets (Homberg et al, 2008). On the basis of these data, decreasing 5-HT efflux would be expected to increase risky choice. 8-OH-DPAT may therefore be leading to maladaptive choice in the rGT by activating presynaptic 5-HT1A autoreceptors and decreasing global 5-HT release. However, 8-OH-DPAT could also be acting by mimicking the effects of 5-HT in serotonergic projection regions, such as areas of frontal cortex, where activation of post-synaptic 5-HT1A receptors inhibits pyramidal cell firing (Araneda and Andrade, 1991). Future experiments using intra-cranial drug infusions will aim to determine 8-OH-DPAT's mechanism of action, and may provide useful data regarding the pathway by which 5-HT modulates gambling behavior.

From a theoretical perspective, the 5-HT system may contribute to gambling behavior because of its role in the emotional response to aversive events (Cools et al, 2008). As such, compromising the 5-HT system may impair subjects' ability to select the best option when relatively complex information about expected losses is critical to the decision-making process. Such a hypothesis, although speculative, fits with clinical observations (Murphy et al, 2008). Alternatively, the impairment observed after 8-OH-DPAT may relate to the anxiogenic properties of the drug (File et al, 1996). Certainly, stress and anxiety can contribute to maladaptive gambling behavior (Black and Moyer, 1998; Meyer et al, 2000; Meyer et al, 2004), although whether anxiolytic agents would improve rGT performance remains to be determined.

5-HT also plays a well-established role in regulating impulse control (Linnoila et al, 1983; Soubrie, 1986), and as such it is perhaps unsurprising that it has been implicated in gambling. However, it has been suggested that the drive to gamble may relate more strongly to indices of risk seeking and compulsivity than to impulsivity, and these character traits have been dissociated in clinical studies (Kreek et al, 2005). In the current study, 8-OH-DPAT and amphetamine had opposite effects on premature responding, a measure of impulsive action based on that obtained from the 5CSRT (Carli et al, 1983), yet both drugs led to comparable changes in choice behavior. Likewise, although eticlopride increased choice of the best option, it did not decrease motor impulsivity. Although reports concerning the effects of 8-OH-DPAT on premature responding in the 5CSRT have been mixed (Carli and Samanin, 2000; Winstanley et al, 2003a), the effects of amphetamine and eticlopride on motor impulsivity presented here are consistent with other data (Cole and Robbins, 1987; Harrison et al, 1997; van Gaalen et al, 2006a), suggesting that the rGT can concurrently measure motor impulsivity and gambling-related decision making, and that these concepts are independent.

Behavioral Considerations: Comparison of the rGT to other Gambling Paradigms

It should be noted that, unlike in human gambling tasks in which money is the incentive, using food as a primary reinforcer in rodent models of risky decision making means that animals cannot finish a session worse off than at the start. Whereas humans will work to restore a negative balance to zero (Campbell-Meiklejohn et al, 2007), this is fundamentally difficult in rats unless primary punishers, such as electric shocks, are used. This latter approach lacks validity, as the losses incurred when gambling are essentially the decline of a secondary positive reinforcer (money) rather than the presentation of a primary negative reinforcer (pain). Using a decline in time-to-earn reward to represent loss seems to result in choice behavior comparable with that observed in tasks such as the IGT, and eliminating this factor prevents a significant aversion developing toward the disadvantageous, larger reward options.

Although the rGT aims to model gambling-related decision making in a manner comparable with clinically used tasks such as the IGT, there are clear differences between such approaches. In the rGT, rats are systematically exposed to the different contingencies associated with all four options through forced-choice training sessions, after which a general preference for the different options is quickly established and stabilizes with repeated testing. Preference for the different options also remains fairly constant throughout each session. In tasks such as the IGT, performance is measured across a single session. The classic finding is that participants initially prefer the larger risky decks and then switch to the more advantageous decks over time (Bechara et al, 1999). However, this shift in preference is only observed in the IGT when the decks are stacked so that the larger losses associated with the disadvantageous decks occur later in the session; if wins and losses occur randomly throughout the session, control participants favor the advantageous decks from the first block of trials onward, similar to data from the rGT (Fellows and Farah, 2005).

It could be argued that rats are relying on memory for the position of the different options to solve the task, rather than basing their preference on the different reinforcement contingencies. Indeed, the same confound is present in the IGT, and gambling paradigms which have a lower memory load have been developed for clinical use (for example, Rogers et al, 1999). However, systemic administration of D2 receptor antagonists does not alter either working or reference memory (Bushnell and Levin, 1993; Kobayashi et al, 1995), yet eticlopride improves rGT performance. In contrast, peripheral administration of D2 agonists can impair working memory, yet does not affect short-term reference memory or choice in the rGT (Bushnell and Levin, 1993). Theoretically, the contribution of reference memory to rGT performance could be completely abolished if the location of the different options was altered randomly between sessions. However, this would make the task exceedingly difficult and it is doubtful that rats would reliably perform such a complex paradigm.

In summary, the data presented here show that rats can effectively ‘play the odds’ and make decisions between multiple outcomes based on both the size of the expected reward, and also the probability and magnitude of expected punishment. This cognitive process shares key features with the decision making involved in gambling. The rGT may therefore provide novel, important, and timely data regarding the neurobiological basis of gambling that can be used to identify therapeutically relevant drug targets for such gambling-related disorders.