Abstract
To achieve a goal, animals procure immediately available rewards, escape from aversive events, or endure the absence of rewards. The neuronal substrates for these goal-directed actions include the limbic system and the basal ganglia. In the striatum, tonically active neurons (TANs), presumed cholinergic interneurons, were originally shown to respond to reward-associated stimuli and to evolve their activity through learning. Subsequent studies revealed that they also respond to aversive event-associated stimuli such as an airpuff on the face and that they are less selective to whether the stimuli instruct reward or no reward. To address this paradox, we designed a set of experiments in which macaque monkeys performed a set of visual reaction time tasks while expecting a reward, during escape from an aversive event, and in the absence of a reward. We found that TANs respond to instruction stimuli associated with motivational outcomes (312 of 390; 80%) but not to unassociated ones (51 of 390; 13%), and that they mostly differentiate associated instructions (217 of 312; 70%). We also found that a higher percentage of TANs in the caudate nucleus respond to stimuli associated with motivational outcomes (118 of 128; 92%) than in the putamen (194 of 262; 74%), whereas a higher percentage of TANs in the putamen respond to go signals for the lever release (112 of 262; 43%) than in the caudate nucleus (27 of 128; 21%), especially for an action expecting a reward. These findings suggest a distinct, pivotal role of TANs in the caudate nucleus and putamen in encoding instructed motivational contexts for goal-directed action planning and learning.
Introduction
Animals initiate actions to obtain immediately available rewards, to escape from aversive events, or to endure the absence of rewards for the sake of future rewards. These goal-directed behaviors unfold as a series of processes: detecting and discriminating stimuli that are associated with outcomes and therefore allow animals to expect those outcomes before they actually occur, maintaining the expected outcome information, and performing actions toward the outcomes. The neuronal substrate for these processes includes the limbic system and the basal ganglia (Mogenson et al., 1980; Robbins and Everitt, 1996; Rolls, 1999).
In the dorsal and ventral striatum, neuronal activity reflects motivational as well as sensorimotor properties. Projection neurons code expectation of rewards (Hikosaka et al., 1989; Schultz et al., 1992; Tremblay et al., 1998; Lauwereyns et al., 2002), kinds of rewards (Hassani et al., 2001), magnitude of rewards (Cromwell and Schultz, 2003), and proximity of rewards (Shidara et al., 1998; Jog et al., 1999). Tonically active neurons (TANs), presumed cholinergic interneurons in the striatum (Wilson et al., 1990; Aosaki et al., 1995; Kawaguchi et al., 1995), were initially characterized by their responses to reward-associated stimuli (Kimura et al., 1984; Apicella et al., 1991; Kimura, 1992; Raz et al., 1996), by the evolution of these responses via behavioral learning (Aosaki et al., 1994b), and by the involvement of the nigrostriatal dopaminergic system in the responses (Aosaki et al., 1994a). It was subsequently shown that TANs respond not only to reward-associated stimuli but also to aversive stimuli such as an airpuff on the face (Ravel et al., 1999; Blazquez et al., 2002; Ravel et al., 2003). Furthermore, the responses of TANs in the caudate nucleus to visual cues for eye movement tasks were selective to the contralateral visual field but much less selective to whether the cue was associated with reward or no-reward outcome (Shimo and Hikosaka, 2001). This observation led Shimo and Hikosaka (2001) to propose that TANs, distinct from the projection neurons, would contribute to the detection of the schedule to obtain rewards while not discriminating cues in relation to rewards.
The functional roles played by TANs in the striatal circuitry (a center for reward-based decision and learning) are still not thoroughly understood. Three critical issues remain to be clarified. First, are rewards special for the activity of TANs? Second, what are the roles of TANs in detection, discrimination, maintaining motivational contexts, and initiating behavioral response based on the contexts? Third, what are the differences and similarities in the activation properties of TANs between the caudate nucleus and putamen? To investigate these issues, we designed this study in which monkeys performed a set of visual reaction time (RT) tasks while expecting reward, during escape from an airpuff on the face, and during a beep sound while enduring the absence of reward. We recorded the activity of TANs in response to outcome-associated and nonassociated stimuli in the caudate nucleus and putamen. Our results supported the notion that TANs in the caudate nucleus and putamen differentially detect and discriminate instructed motivational contexts for goal-directed action.
Materials and Methods
Experimental animals. We used two Japanese monkeys (Macaca fuscata): monkey DA (male, 5.6 kg) and monkey AI (female, 6 kg). All surgical and experimental procedures were approved by the Animal Care and Use Committee of Kyoto Prefectural University of Medicine and were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals. Monkeys participated in the experiment on a restricted-water diet for 5–6 d per week but received ad libitum access to food and water on the weekend.
Behavioral tasks. The monkeys were trained to sit on a primate chair facing a wood panel 44 cm in front of them in a soundproof, electrically shielded room. The behavioral task (Fig. 1) was composed of seven events: hold lever (HL) press, fixating on a central spot, first instruction light-emitting diode (LED) (IN1), second instruction LED (IN2), go signals for the lever release (GO), hold lever release, and outcome. We used three different motivational outcomes, as follows: water reward [rewarded-condition (REW)], airpuff to escape [unrewarded-aversion condition (AVE)], and beep sound [unrewarded-sound condition (SOU)]. The behavioral tasks used with the two monkeys were slightly different. When monkey DA (Fig. 1 A) depressed the hold lever (OFL-V-S5; Osiden, Osaka, Japan) for 0.9–1 sec with his right hand and fixated his eyes on the fixation point (FP; a small red LED; diameter, 3 mm; 25 cd/m2), the first instruction stimulus (green LED) appeared on the right side of the FP (15° apart, contralateral side to the neuronal recording). The color of the IN1 then changed to one of three colors as IN2. Red (luminance, 69 cd/m2), blue (82 cd/m2), and yellow (36 cd/m2) colors indicated REW, AVE, and SOU conditions, respectively. The monkey had to release the lever as soon as the IN2 disappeared (GO). Water reward (0.2 ml) was delivered by a microtube pump (MP-N; Tokyo Rikakiki, Tokyo, Japan) from a spout in front of the monkey's mouth in the REW condition at a delay of ∼600 msec after the lever release when the lever was released with RTs of <600 msec after GO. In the AVE condition, an airpuff (26 psi, 30 msec) was delivered from a pipette tip placed 10 cm away from the monkey's face (at eye level on the right side) when the lever was released later than 600 msec. A solenoid valve to control the airpuff was placed outside of the soundproof room. In the SOU condition, a beep sound (1500 Hz, 200 msec, 70 dB) occurred after the lever was released within 600 msec. 
Thus, three types of IN2 served as “associative” stimuli instructing the monkeys about reward, aversive, and sound outcomes, whereas the IN1 served as a “nonassociative” stimulus. In monkey AI (Fig. 1 B), however, IN1 served as an associative stimulus, whereas IN2 served as a nonassociative stimulus. The order of the associative stimuli and nonassociative stimulus was opposite to that of monkey DA. The relationships between colors and conditions were different from those of IN2 in monkey DA to ensure that color did not have an effect on the neuronal activity, if indeed the stimulus color had any effect. IN2 (green LED) served as a nonassociative instruction. The location of the instruction LED was closer to the FP (5° from FP) for monkey AI than for monkey DA, because it was hard for monkey AI to maintain her gaze on the FP with the instruction LED at a greater distance from the FP. In a block of ∼120 trials, the trials of the three conditions occurred in a semirandom order. Trials with RTs of <100 msec and >600 msec were considered to be early release errors and late release errors, respectively. Trials in which the eye position deviated >4° (monkey DA) or 5° (monkey AI, who was allowed to see instruction LEDs because she could not maintain her gaze on the FP in many trials) from the FP were considered to be fixation errors. In these error trials, the trial was aborted and the same condition was repeated. The FP was on throughout the block of trials. Thus, monkeys could start trials at their own pace. The next trial began shortly after the appearance of the outcomes from the previous trial.
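The trial-outcome rules described above can be summarized in a short sketch. This is an illustration only, not the task-control software used in the study: the function name `classify_trial`, its parameters, and the order in which the checks are applied are assumptions, whereas the 100 and 600 msec RT limits and the 4° and 5° fixation limits come from the text.

```python
# Hypothetical helper; the thresholds are from the text, everything else is assumed.
FIXATION_LIMIT_DEG = {"DA": 4.0, "AI": 5.0}  # allowed eye deviation from the FP


def classify_trial(monkey: str, rt_msec: float, max_eye_dev_deg: float) -> str:
    """Label one trial as correct or as one of the three error types."""
    # Assumption: a fixation break is checked before the reaction time.
    if max_eye_dev_deg > FIXATION_LIMIT_DEG[monkey]:
        return "fixation_error"
    if rt_msec < 100:
        return "early_release_error"
    if rt_msec > 600:
        return "late_release_error"
    return "correct"


if __name__ == "__main__":
    for trial in [("DA", 250, 1.0), ("DA", 80, 1.0), ("AI", 650, 2.0), ("AI", 300, 5.5)]:
        print(trial, classify_trial(*trial))
```

Error trials were aborted and the same condition was repeated, so a block-level loop built on such a helper would re-queue the condition whenever the label is not "correct".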
In the original reaction time task (Fig. 1 A,B), three associative instructions and one nonassociative instruction were used. Thus, the uncertainty about the color instructions was different between the associative and nonassociative instructions, which might influence the responsiveness of striatal neurons to the two types of instructions. A control task was used in monkey AI (Fig. 1C) in which two associative (blue REW and yellow SOU) and two nonassociative (green and red) stimuli appeared as IN1 and IN2, respectively. Each of the REW and SOU conditions instructed by IN1 had an equal number of green and red nonassociative stimuli.
Monkeys learned the task first in the REW condition, then in the SOU condition, and finally in the AVE condition. Monkey AI learned the control task in the REW and SOU conditions, and the AVE condition was then introduced.
Surgery. All surgeries were performed under sterile conditions with the monkeys under deep sodium pentobarbital anesthesia. Anesthesia was induced with ketamine hydrochloride (10 mg/kg, i.m.) and sodium pentobarbital (Nembutal; 27.5 mg/kg, i.p.), and supplemental Nembutal (6 mg/kg, i.m., for 2 hr) was given as needed. Four head-restraining bolts and one stainless-steel recording chamber were implanted under stereotaxic guidance on the skull of each monkey. The chamber, for recording neuronal activity in the striatum, was placed laterally at a 45° angle. The center of the chamber was adjusted according to Horsley–Clarke stereotaxic coordinates: lateral, 10 mm; anterior, 18 mm; and height, 9 mm (in the left hemisphere).
Recordings. We recorded the action potentials of single neurons in the striatum (caudate nucleus and putamen) from the left hemisphere of the two monkeys by using epoxy-coated tungsten microelectrodes (FHC, Bowdoinham, ME) with an exposed tip of 15–60 μm and with an impedance of 2–4 MΩ. The electrodes were inserted through the implanted recording chamber and advanced into the striatum by means of an oil-driven micromanipulator (MO-95; Narishige, Tokyo, Japan). The neuronal activity was amplified and displayed on an oscilloscope using conventional electrophysiological techniques. Bandpass filters (50 Hz to 1 kHz) were used to tune the amplifier system to sample neural action potentials with low noise levels. The action potentials of single neurons were isolated by using a spike sorter with a template-matching algorithm (multi-spike detector; Alpha Omega Technologies, Nazareth, Israel), and the onset times of the action potentials were recorded on a laboratory computer (9801BX4; NEC, Tokyo, Japan) together with the onset and offset times of stimuli and the behavioral events that occurred in association with the tasks. We identified TANs on the basis of their tonic firing (2–8 Hz) and their broad action potentials (Kimura et al., 1984; Apicella et al., 1991; Aosaki et al., 1994). Electromyographic (EMG) activity was recorded from the extensor, flexor, and biceps brachii muscles of the right arm as well as from the digastric muscle through chronically implanted multithreaded Teflon-coated stainless-steel wire electrodes (AS631; Cooner Wire, Chatsworth, CA) with leads that led subcutaneously to the head implant. Eye movements were also monitored by measuring the corneal reflections of an infrared light beam using a video camera with a time resolution of 4 msec. The computer system (R-22C-I; Iseyo-Denshi, Tokyo, Japan) determined horizontal and vertical signals of the center of the reflected infrared light beam in the cornea.
The spatial resolution of this system was approximately ±0.15°. The EMG signals and eye-position data were recorded on a laboratory computer through an analog-to-digital converter interface at a sampling rate of 100 Hz. The recordings started when the monkeys had mastered the behavioral task at a high correct performance rate (>80%). This required 1 month for monkey DA after the AVE condition was last introduced, whereas 3 months were required for monkey AI.
Data analysis. Differences between correct performance rates and error rates were compared among the three task conditions using the Bonferroni test to control the family-wise significance level. The RTs were compared using two-way ANOVA among the three task conditions and among the IN2–GO intervals. Peristimulus time histograms (PSTHs) of the impulse discharges of the TANs were constructed to show increases or decreases in the discharge rates before and after a behavioral event. We analyzed neuronal activity and behavior in correct trials. Significant increases and decreases in neuronal activity from the background discharge rate were determined by comparing the discharge rate during the 50 msec (5 bins) test window with that during the 250 msec (25 bins) baseline window just before the occurrence of IN1. The test window was compared with the baseline window by shifting the test window, one PSTH bin (10 msec) at a time, up to 400 msec from the onset of an event. The activity was considered to be significant if more than three consecutive comparisons between the test window and two of the three baseline windows (baseline activity was obtained from the REW, AVE, and SOU conditions) resulted in statistical significance (Wilcoxon two-sample test; p < 0.05) (Kimura, 1986). The onset and offset of the response were taken to be the beginning and end of significant changes in activity, respectively. The latency, offset, and duration of the significant activity were compared using ANOVA.
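The sliding-window criterion described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' analysis code: the per-bin spike counts of the 5-bin test window are compared with those of a single 25-bin baseline (the two-of-three baseline rule is omitted), the Wilcoxon two-sample test is computed with a tie-corrected normal approximation, and the names `rank_sum_p` and `significant_windows` are hypothetical.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation with tie correction)."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of a group of tied values
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w = sum(ranks[:n1])  # rank sum of the test-window sample
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values identical: no evidence of a difference
        return 1.0
    z = (w - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def significant_windows(psth, baseline, window=5, min_run=4, alpha=0.05):
    """Start bins of test windows that lie in a run of >= min_run significant comparisons.

    min_run=4 encodes "more than three consecutive comparisons".
    """
    flags = [
        rank_sum_p(psth[s:s + window], baseline) < alpha
        for s in range(len(psth) - window + 1)
    ]
    keep = [False] * len(flags)
    run = 0
    for i, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run >= min_run:
            for k in range(i - run + 1, i + 1):
                keep[k] = True
    return [i for i, kept in enumerate(keep) if kept]


if __name__ == "__main__":
    baseline = [4, 5, 6, 5, 4] * 5                       # 25 baseline bins (toy counts)
    psth = [4, 5, 6, 5, 4] * 2 + [0] * 10 + [4, 5, 6, 5, 4] * 2  # toy suppression
    hits = significant_windows(psth, baseline)
    if hits:
        print("response onset bin:", hits[0], "offset bin:", hits[-1])
```

With 10 msec bins, the first and last returned bin indices would give rough estimates of response onset and offset; with only 5 and 25 samples the normal approximation is coarse, so an exact test may be preferable in practice.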
Histology. At the end of all recording experiments, small electrolytic lesions were made at 17 locations along nine selected electrode tracks, both in the caudate nucleus and in the putamen. Direct anodal current (20 μA) was passed for 30 sec through tungsten microelectrodes. The monkeys were deeply anesthetized with Nembutal (60 mg/kg, i.p.) and were perfused transcardially with 10% formalin in 0.9% NaCl solution. Coronal sections of the striatum, 50 μm in thickness, were stained with cresyl violet. Electrode tracks through the striatum were reconstructed on the histology sections using the electrolytic lesion marks as reference points, and the recording sites of TANs were identified (see Fig. 6).
Results
We recorded the activity of 461 TANs in the caudate nucleus (n = 131), in the putamen (n = 312), and in the rostral part of the striatum located in the internal capsule between the caudate nucleus and the putamen (caudate–putamen bridge; n = 18) in two monkeys (Table 1). In 390 TANs (317 in monkey DA and 73 in monkey AI), neuronal activity was examined using the original reaction time task with three instructed motivational outcomes. The activity of a separate group of TANs (n = 71) in monkey AI was examined during the control task.
The expectation of motivational outcomes influences task performance
Expectation of reward, airpuff, and beep sound as outcomes significantly affected the task performance of the monkeys. The correct performance rate was consistently higher in the REW condition than in the AVE and SOU conditions (Fig. 2A) (Bonferroni test; p < 0.01). The RTs for lever release after GO were also dependent on the conditions. Because GO occurred after the instruction stimuli at variable time intervals, the RTs in monkey DA were longer at shorter (unpredictable) IN2–GO intervals and shortest at the longest (predictable) interval (Fig. 2B, left) (ANOVA; p < 0.01; F(2,40930) = 4521.8). The RTs in the AVE condition were shorter than those in the SOU condition at longer IN2–GO intervals [ANOVA (condition × interval); p < 0.01; F(4,40930) = 57.4]. However, monkey AI performed the task within a range of RTs that were shorter than those of monkey DA in all three possible IN2–GO intervals (Fig. 2B, right). Nevertheless, the RTs were significantly shorter at the longest IN2–GO interval than at the shortest interval [ANOVA, p < 0.01, F(2,24749) = 95.1; condition × interval, p < 0.01, F(4,24749) = 10.8]. Also, the RTs in the AVE condition were shorter than those in the SOU condition (ANOVA; p < 0.01; F(2,24749) = 13.4). In the control task, in which only REW and SOU conditions occurred, monkey AI performed the task at longer RTs. The RTs changed depending on the IN2–GO intervals (average of REW and SOU conditions was 365, 318, and 285 msec at short, middle, and long intervals, respectively), similarly to the case of the original task in monkey DA (Fig. 2B, left). This indicated that monkey AI changed her strategy to react to GO as quickly as possible in the original task in which the AVE condition was newly introduced in addition to the REW and SOU conditions.
The early release error rate was highest in the AVE condition in both monkeys. Error rates in the AVE condition were significantly higher than those in the REW condition in the two monkeys (Fig. 2A) (Bonferroni test; p < 0.01), although the difference between the AVE and SOU conditions was significant in monkey DA but not in monkey AI.
The different task performances under different motivational contexts were reflected in the activity of prime mover muscles in both monkeys. Figure 2C shows the activation of the wrist flexor muscle of monkey DA before and after he released the hold lever. The muscle activity for lever release was smaller in the REW condition than in the AVE and SOU conditions, although RTs were shorter in the REW condition (Fig. 2B, left). This suggested that lever release in the REW condition was the first conditioned movement coupled to the second orofacial movement involved in consuming the reward, whereas in the AVE and SOU conditions, the lever release was made as a single movement. Thus, the small muscle activity in the REW condition was probably attributable to fast, efficient combination movements. These data indicated that monkeys learned the contingency of the visual instructions with motivational outcomes and performed the reaction time tasks in different ways while expecting distinct motivational outcomes.
TANs selectively and differentially respond to instructions for motivational outcomes of an action
Most of the TANs responded specifically to visual instructions associated with motivational outcomes by characteristic suppression and facilitation of tonic discharges. The visual instruction was presented on the contralateral side to the neuronal recording, because the responses of TANs are preferentially contralateral to the visual field (Shimo and Hikosaka, 2001). Figure 3, A and B, shows representative activity of two TANs. They showed a suppression of discharges at a short latency after all three kinds of instructions associated with reward, airpuff, and sound outcomes. The suppression of discharges was often followed by facilitation (late facilitation). In a small number of TANs, a facilitation of discharge occurred at a short latency after the associative instructions (initial facilitation) and was then followed by suppression. Most of the suppression occurred at a latency of <280 msec after the associative instructions, whereas the late facilitation occurred at latencies of >280 msec, except in the AVE condition in monkey DA, in which the late facilitation occurred at 200 msec (Fig. 3C,D). The suppression responses occurred more frequently than the facilitation (initial and late) responses (monkey DA, p < 0.01, χ2 = 14.8; monkey AI, p < 0.05, χ2 = 4.3). A smaller number of neurons showed the initial or late facilitation alone. Therefore, we evaluated neuronal responsiveness to the associative instructions in terms of the suppression responses. Figure 3, E and F, summarizes the percentage of TANs showing significant suppression of discharges to the three kinds of associative and nonassociative instructions. A large number of TANs responded to at least one of the associative instructions (85% in monkey DA; 58% in monkey AI), whereas a very small percentage of TANs responded to either one of the nonassociative instructions (14% in monkey DA; 7% in monkey AI). 
The average responsiveness to the three associative instructions was 56% in monkey DA and 35% in monkey AI, whereas average responsiveness to nonassociative instructions was 6 and 2% in monkeys DA and AI, respectively. The difference was significant (monkey DA, p < 0.01, χ2 = 536.4; monkey AI, p < 0.01, χ2 = 76.4). What makes the responses of the TANs remarkably selective to instructions associated with motivational outcomes? We studied whether the difference in the number of colors used for the associative (three colors) and nonassociative (one color) instructions might have influenced the neuronal responsiveness. Neuronal activity was examined in monkey AI during the control task, in which two associative (blue REW and yellow SOU) and two nonassociative (green and red) stimuli appeared. TANs responded selectively to the associative instructions (Fig. 4). These data indicate that TANs specifically respond to the stimuli associated with motivational outcomes.
The next critical issue was whether TANs, as single neurons and as a population, discriminate instructed motivational outcomes. Most TANs discriminated the three kinds of associative instructions (Fig. 5). In both monkeys, the differential type of TANs responding to either single or double associative instructions (RA, AS, RS, R, A, and S) was more common than the nondifferential type (RAS) responding to all three kinds of instructions [monkey DA, 68% (184 of 269), p < 0.01, χ2 = 18.9; monkey AI, 77% (33 of 43), p < 0.01, χ2 = 6.7]. One-fourth of the TANs responded only to a single kind of instruction (30% in monkey DA; 27% in monkey AI). Figure 5, C and D, shows ensemble averages of activity of differential (RA, AS, and A) and nondifferential (RAS) types of TANs. These data indicate that individual TANs discriminate instructed motivational outcomes rather than respond generally to associated instructions. TANs, as a population, also seem to be able to discriminate the instructions in terms of the different types of neurons. Although the percentages of responsive neurons changed considerably over time (Fig. 3C,D), TANs as a population responded to the three kinds of instructions for ∼400 msec after appearance of the instructions. This was true in the facilitation responses. If the neuronal responses were evaluated on the basis of the facilitation responses, the differential type was more common than the nondifferential type (monkey DA, 75%, p < 0.01, χ2 = 32.0; monkey AI, 89%, p < 0.01, χ2 = 13.5). One-fourth of the TANs responded only to a single kind of instruction (30% in monkey DA; 28% in monkey AI).
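The response-type labels used above (RAS, RA, AS, RS, R, A, and S) follow mechanically from whether a neuron responds in each of the three conditions. A minimal sketch, assuming each TAN has already been judged responsive or not in the REW, AVE, and SOU conditions; the helper names `response_type` and `is_differential` are hypothetical, and the counts in the usage lines are those reported in the text.

```python
def response_type(rew: bool, ave: bool, sou: bool) -> str:
    """Build the type label from the conditions in which a TAN responded."""
    label = ("R" if rew else "") + ("A" if ave else "") + ("S" if sou else "")
    return label or "none"  # "none" marks a neuron responsive in no condition


def is_differential(label: str) -> bool:
    """Differential types respond to one or two, but not all three, instructions."""
    return label in {"RA", "AS", "RS", "R", "A", "S"}


if __name__ == "__main__":
    print(response_type(True, True, True), is_differential("RAS"))
    print(response_type(True, False, True), is_differential("RS"))
    # Percentages of differential TANs from the reported counts.
    for monkey, (n_diff, n_resp) in {"DA": (184, 269), "AI": (33, 43)}.items():
        print(f"monkey {monkey}: {100 * n_diff / n_resp:.0f}% differential")
```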
Figure 6 plots the locations of all 317 TANs in the putamen and caudate nucleus of monkey DA. Although our recordings did not cover the rostral end of the striatum, TANs responding to associative instructions were found throughout the striatum. We found that TANs in the caudate nucleus had contrasting properties in their responses to instructed motivational outcomes compared with those in the putamen (Fig. 7). First, the percentage of TANs responding to associative instructions was higher in the caudate nucleus than in the putamen (Fig. 7A–C) (monkey DA, 94 vs 80%, p < 0.01, χ2 = 11.7; monkey AI, 85 vs 50%, p < 0.05, χ2 = 5.1; control, 86 vs 50%, p < 0.01, χ2 = 7.9). The higher responsiveness of TANs in the caudate nucleus was observed even in the responses to nonassociative instructions in monkey DA and in the control task in monkey AI (monkey DA, p < 0.01, χ2 = 18.1; monkey AI, p = 0.52; control, p < 0.05, χ2 = 6.2), although the percentage of the responsiveness was low (<12%). Second, the percentage of nondifferential TANs was higher in the caudate nucleus than in the putamen (Fig. 7D–F) (monkey DA, 43 vs 18%, p < 0.01, χ2 = 23.3; monkey AI, 20 vs 11%, p = 0.34, χ2 = 0.93). Although this difference was not significant in monkey AI, the percentage of nondifferential type (RS) was higher in the caudate nucleus than in the putamen in the control task (57 vs 20%; p < 0.01; χ2 = 9.5). Third, the caudate nucleus and putamen were different in terms of the latency of responses to the associative instructions. As summarized in Table 2, the latency of the suppression was significantly shorter in the caudate nucleus than in the putamen in both monkeys (ANOVA; monkey DA, p < 0.01, F(1,486) = 32.4; monkey AI, p < 0.01, F(1,65) = 9.8).
Responses to instructions associated with motivational outcomes during correct and error performances
In a small number (5–12%) of trials, monkeys made errors, as described above (Fig. 2A). We studied whether the neuronal responses to instructions associated with motivational outcomes differ between correct trials and error trials. Superimposed average traces of the activity of TANs in the caudate nucleus and putamen during correct and error performance in monkey DA overlapped almost completely in every outcome condition, whereas the magnitudes and time courses of responses differed among the three conditions (Fig. 8A). This indicated that the monkeys made errors by releasing the hold lever too early or by breaking fixation, although they had recognized the motivational contexts via the instruction stimuli. Once a trial was aborted because of an error, the same instruction was repeated in the subsequent trial (force correct trial). In these force correct trials, monkeys could have known the kind of instruction (REW, AVE, or SOU) before the instruction occurred, if they were aware of the associative cue in the previous aborted trial. We compared neuronal activity between the correct and force correct trials and found no apparent difference, except in the REW condition, in which neuronal activity in the force correct trials appeared to be smaller (Fig. 8B). This suggested two possibilities. First, the instruction stimuli induced different motivational drives in the monkeys regardless of whether the kind of instruction was predictable. Second, the monkeys were not aware of the associative cue in the previous aborted trial.
TANs respond to GO for an action expecting different motivational outcomes
A very high percentage of TANs, as a population, responded not only to associative instructions but also to GO for the lever release. Figure 9, A and B, shows ensemble averages of the activity of all TANs recorded after the occurrence of four events in the REW condition. As shown in the time courses of facilitation and suppression responses after GO in Figure 9, C and D, initial and late facilitations were more common than suppression (monkey DA, p = 0.093, χ2 = 2.8; monkey AI, p < 0.01, χ2 = 8.3). A small number of neurons showed suppression alone. Therefore, we evaluated the responsiveness to GO based on the occurrence of facilitation. This was distinct from the responses to associative instructions, in which suppression was the common response. The GO responses were much stronger in the putamen than in the caudate nucleus in both monkeys (Figs. 9A,B, 10A,B). The percentages of occurrence of GO responses were higher in the putamen than in the caudate nucleus (monkey DA, 35 vs 21%, p < 0.01, χ2 = 7.6; monkey AI, 72 vs 25%, p < 0.01, χ2 = 13.1). This relationship is in contrast to the responses to associative instructions, which were much stronger in the caudate nucleus than in the putamen (Figs. 7A–C, 9E,F).
We next examined whether responses to GO are contingent on motivational outcomes. Figure 10 shows the GO responses in three conditions. The percentage of differential response types (RA, AS, RS, R, A, and S) was much higher than that of the nondifferential type (RAS) in the putamen (Fig. 10E,F) (monkey DA, 68 of 74, p < 0.01, χ2 = 31.5; monkey AI, 29 of 38, p < 0.05, χ2 = 5.7). The GO responses were characteristic in that they occurred more frequently in the REW condition than in the other two conditions (Fig. 10A–D) (monkey DA, p < 0.01, χ2 = 19.9; monkey AI, p < 0.05, χ2 = 7.5). The higher responsiveness in the REW condition was also observed when the selectivity was estimated on the basis of suppression responses (data not shown). The GO responses were observed more often in the caudal part of the putamen (posterior to the anterior commissure) than in the rostral putamen [monkey DA, 42% (53 of 126) vs 25% (21 of 83), p < 0.05, χ2 = 6.1; monkey AI, 80% (28 of 35) vs 56% (10 of 18), p = 0.061, χ2 = 3.5].
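The caudal versus rostral comparison above can be checked directly from the reported counts. The following is a minimal sketch, assuming the statistic is a Pearson χ2 test on a 2 × 2 table without continuity correction (the text does not state the exact variant used); the function name `chi_square_2x2` is illustrative. Under that assumption the counts give χ2 of about 6.1 for monkey DA and about 3.5 for monkey AI, in line with the reported values.

```python
import math


def chi_square_2x2(resp_a: int, total_a: int, resp_b: int, total_b: int):
    """Pearson chi-square (1 df, no continuity correction) for two proportions."""
    a, b = resp_a, total_a - resp_a  # group A: responsive, unresponsive
    c, d = resp_b, total_b - resp_b  # group B: responsive, unresponsive
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # GO-responsive TANs in caudal vs rostral putamen (counts from the text).
    counts = {"monkey DA": (53, 126, 21, 83), "monkey AI": (28, 35, 10, 18)}
    for name, table in counts.items():
        chi2, p = chi_square_2x2(*table)
        print(f"{name}: chi2 = {chi2:.1f}, p = {p:.3f}")
```

The same helper applies to any of the two-group percentage comparisons in this section when the underlying counts are available.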
Because GO triggered lever release movements at a short reaction time (average, 240–340 msec) as well as neuronal responses (Table 2), it is possible that the GO responses of the TANs are involved in processing GO for an action expecting different motivational outcomes, eliciting lever release movements, or both. To address this issue, we examined the temporal relationship between GO and the onset of initial facilitation of the TANs and the onset of activation of the prime mover muscle (wrist flexor). Figure 11 illustrates simultaneously recorded activity of a TAN located in the putamen and muscle activity in monkey AI. It was found that the onset of initial facilitation of the TAN was better time-locked to GO than to the onset of muscle activation. This supported previous findings that activity of TANs is time-locked to conditioned stimuli but not to conditioned responses (Kimura, 1992; Aosaki et al., 1995).
Another important issue was whether individual TANs respond to both associative instructions and GO, or whether they respond to only one of the two events. We evaluated the responsiveness of individual TANs in each condition to associative instructions and to GO in terms of suppression and facilitation responses, respectively. Most TANs (>80%) responded to either the associative instruction or GO in each condition, but not to both (Fig. 12A,B). In other words, different populations of TANs responded to the associative instruction and GO. In Figures 7, 8, 9, 10 and 12, a small number of TANs (n = 16 in monkey DA; n = 2 in monkey AI) located in the caudate–putamen bridge were included in the neuron group of the caudate nucleus, because the responsiveness of the two groups of TANs was not different (p = 0.10 in monkey DA).
Discussion
The present study reveals four properties inherent to tonically active neurons, the presumably cholinergic interneurons in the striatum. First, TANs specifically respond to instruction stimuli associated with motivational outcomes but not to unassociated stimuli. Second, TANs discriminate between different kinds of associated instructions. Third, TANs encode the onset of GO for actions to be performed while expecting motivational outcomes, especially a reward. Fourth, TANs in the caudate nucleus and putamen have contrasting properties in encoding instructed motivational outcomes of actions. These findings suggested a distinct and crucial role for TANs in the caudate nucleus and putamen in encoding instructed motivational contexts for goal-directed action planning and learning in the striatum.
Distinct involvement of TANs in the caudate nucleus and putamen in encoding instructed motivational contexts for goal-directed processing
Although the responses of TANs to instruction stimuli and GO were found in both the caudate nucleus and putamen, there were contrasting properties between the activity of TANs in these two striatal nuclei. The responses to associative instructions were more abundant in the caudate nucleus than in the putamen, whereas responses to GO were more common in the putamen. Interestingly, the different responsiveness of TANs in the two monkeys to associative instructions and to GO was tightly coupled with each monkey's strategy for performing the task. In monkey DA, a very high percentage of TANs, especially in the caudate nucleus, responded to instructions that differentiated between the associated outcomes, whereas a small group of TANs responded to GO. In contrast, in monkey AI, a higher percentage of TANs, especially in the putamen, showed GO responses, whereas a lower percentage of TANs responded to associative instructions. These results suggest that the differential responses of TANs to instructions associated with three kinds of motivational outcomes might enable monkey DA to perform the task in a condition-dependent manner, whereas very strong GO responses in the putamen might enable monkey AI to perform the task at very short RTs.
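The regional contrast can be checked arithmetically from the counts given in the Abstract (instruction responses: 118 of 128 caudate TANs vs 194 of 262 putamen TANs; GO responses: 27 of 128 vs 112 of 262). The sketch below applies a Pearson chi-square test to each 2 × 2 table. The choice of test, the absence of a continuity correction, and the function name are assumptions made for illustration; this section does not state which test was applied to these proportions.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: caudate nucleus, putamen. Columns: responsive, non-responsive.
# Counts are taken from the Abstract.
instr_chi2, instr_p = chi_square_2x2(118, 128 - 118, 194, 262 - 194)
go_chi2, go_p = chi_square_2x2(27, 128 - 27, 112, 262 - 112)

print(f"Instruction responses: chi2 = {instr_chi2:.2f}, p = {instr_p:.2g}")
print(f"GO responses:          chi2 = {go_chi2:.2f}, p = {go_p:.2g}")
```

Under these assumptions both differences exceed the 1-df critical value for p < 0.001 (10.83), in opposite directions: instruction responses favor the caudate nucleus and GO responses favor the putamen.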
This is the first experimental demonstration of distinctive activity profiles of TANs in the caudate nucleus and putamen. The observed differences were consistent with the functional connectivity of the two striatal nuclei. The predominance of the GO responses in the putamen, especially in the caudal region, appears to play a major role in goal-directed planning and learning of limb movements in terms of the corticobasal ganglia loop circuits through the sensorimotor cortices (Alexander et al., 1986; Takada et al., 1998; Nambu et al., 2002). Furthermore, the thalamostriatal projections may provide TANs with their major inputs, because TANs almost lose responsiveness to reward-associated stimuli after inactivation of the centre median (CM)–parafascicular (Pf) complex of the thalamus (Matsumoto et al., 2001). Pf neurons project mostly to the caudate nucleus and putamen situated rostral to the anterior commissure, whereas CM neurons project to the posterior part of the putamen (Sadikot et al., 1992). Neurons in the CM–Pf complex respond to multimodal stimuli, especially those presented on the contralateral side to which monkeys paid selective attention (Minamimoto and Kimura, 2002). This evidence could explain the contralateral preference of TAN responses in the caudate nucleus to visual instructions for a saccade task (Shimo and Hikosaka, 2001).
Is reward special for activity of TANs?
Although a beep sound at moderate intensity seemed to have no apparent motivational impact as an outcome compared with the reward or airpuff, the percentage of TANs responsive in the SOU condition was not lower than in the other two conditions. This indicated that the beep sound had acquired behavioral connotations of the absence of reward in the current trial and of waiting for a future reward. Thus, the instruction in the SOU condition must have had sufficient motivational salience for the monkeys, and this was probably the reason why a large number of TANs responded to the stimulus. Interestingly, the relative preferences of the three motivational outcomes differed between the instruction responses and the GO responses. For the instruction responses, the percentage of TANs responding exclusively in the REW condition (R type) was much smaller than that responding not only in the REW but also in the other conditions (RAS, RA, and RS type). A considerable percentage of TANs were A type and AS type (Figs. 5, 7). In contrast, the percentage of R type remarkably increased in the GO responses (Fig. 10).
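The response-type labels used above (R, A, S, RA, RS, AS, RAS) follow directly from whether a neuron responds in each of the three conditions. As a minimal sketch, the mapping can be written as a small function and tallied over a population. The function name, the argument name for the aversive condition, the "none" label, and the example neurons are hypothetical and do not reproduce the recorded data.

```python
from collections import Counter

def response_type(rew, avs, sou):
    """Map responsiveness in the reward (R), aversive (A), and sound (S) conditions
    to a type label such as "R", "AS", or "RAS"; "none" if unresponsive in all three."""
    label = ("R" if rew else "") + ("A" if avs else "") + ("S" if sou else "")
    return label or "none"

# Hypothetical population: one (reward, aversive, sound) responsiveness triple per neuron.
neurons = [
    (True, True, True),     # RAS: responsive in all three conditions
    (True, False, False),   # R: responsive exclusively in the reward condition
    (False, True, True),    # AS
    (True, True, True),     # RAS
    (False, True, False),   # A
]

counts = Counter(response_type(*n) for n in neurons)
exclusive_reward_fraction = counts["R"] / len(neurons)
print(dict(counts), exclusive_reward_fraction)
```

Computing the same tally separately for instruction responses and for GO responses gives the two type distributions that are contrasted in the text.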
The present study revealed that TANs in the caudate nucleus are highly responsive, although less selective to the reward-associated and no-reward-associated instructions (Fig. 7), and that GO responses are dominant in the reward condition, especially in the putamen (Fig. 10). This explains why most TANs recorded in the caudate nucleus of monkeys performing memory-guided saccade tasks were similarly responsive to reward-associated and no-reward-associated visual instructions preceding GO (Shimo and Hikosaka, 2001). This could also be the reason that, in previous studies, TANs responded selectively to reward-associated stimuli such as the click noise of a solenoid valve to deliver liquid reward, or selectively to reward itself when immediately followed by conditioned orofacial movement (Kimura et al., 1984; Apicella et al., 1991; Aosaki et al., 1994b, 1995; Ravel et al., 1999).
Ravel et al. (2003) recently reported that TANs discriminate between reward and no-reward (airpuff and loud sound) stimuli in terms of the temporal pattern of responses. The present study did not show a differential temporal pattern in the responses to outcome-associated instructions but did show a difference in the temporal response patterns between the instruction responses and GO responses (Figs. 3, 9).
Functional significance of characteristic activity of TANs
The present study suggests that TANs encode instructed motivational contexts for actions while expecting rewards, escaping aversive events, and enduring the absence of rewards. Although lever release movements followed GO with a short reaction time, the initial facilitation of neuronal activity was better time-locked to GO than to the activation of the prime mover muscle for lever release (Fig. 11). Thus, TANs are not homogeneous in their activity profiles but are composed of three classes: those encoding instructed motivational contexts for actions, those encoding GO for actions expecting different motivational outcomes, and those encoding both of them (Fig. 12). Different classes of TANs appear to participate in distinct, serial processes of goal-directed action planning. This functional subdivision may constitute a neuronal substrate for a notion of “incentive motivational learning” by which environmental stimuli become signals that allow animals to effectively expect various rewards and aversive events and to elicit goal-directed behavior (Rescorla and Solomon, 1967; Bolles, 1972; Bindra, 1978; Dickinson and Balleine, 1994).
As presumed cholinergic interneurons located mostly around the border between the striosomes and matrix in the striatum (Graybiel et al., 1986; Aosaki et al., 1995), TANs, although small in number, may play an essential role in modifying the activity of surrounding projection neurons directly (Calabresi et al., 2000; Partridge et al., 2002) and indirectly by way of fast-spiking interneurons (Koos and Tepper, 2002). Synchronized firing of nearby TANs during a conditioning task would contribute to locally organized processing of corticostriatal inputs conveying sensorimotor and cognitive information to the projection neurons, which are major constituents of the striatal neuron circuits (Raz et al., 1996; Kimura et al., 2003). The activity of projection neurons is profoundly modulated by the expectation of reward (Hikosaka et al., 1989; Schultz et al., 1992; Tremblay et al., 1998), the magnitude of reward (Cromwell and Schultz, 2003), the kinds of reward (Hassani et al., 2001), and the absence of reward (Watanabe et al., 2003). In the present study, projection neurons showed not only the responses to outcome-associated instructions but also a tonic increase in activity in a period between the instruction and outcome delivery, suggesting involvement in maintaining outcome (a goal) information until actually acquiring it (Matsumoto et al., 2003). Most projection neurons differentiated reward–no-reward conditions, but very few neurons discriminated conditions within the no-reward category (aversive and sound). Thus TANs could contribute to goal-directed planning in the striatum by providing projection neurons with signals for both rewarding and aversive contexts for actions. 
The present study also suggests that the representation of instructed motivational contexts in the activity of TANs might play an indispensable neurobiological role in reward-based learning by modifying dopamine-dependent plasticity of corticostriatal signal transmissions in the striatum (Calabresi et al., 2000; Partridge et al., 2002; Kitabatake et al., 2003) and by adaptively setting a learning rate by a computational means (Doya, 2002).
We are aware that an attentional process allocated to instructions associated with reward, airpuff, sound, and GO might also be involved in the responsiveness of TANs, because attention can contribute to shaping new forms of behaviors toward the direction of goals (i.e., approaching the reward and avoiding aversive events) (Boussaoud and Kermadi, 1997; Dayan et al., 2000; Zink et al., 2003). Thus, it is important to examine the involvement of attention in the activity of TANs as separate from the motivation.
Footnotes
This work was supported by a grant-in-aid for Scientific Research on Priority Areas (C)–Advanced Brain Science Project and a grant-in-aid for Scientific Research (B) (M.K.), as well as by a grant-in-aid for Scientific Research on Priority Areas (A)–Research for Comprehensive Promotion of Study of Brain and a grant-in-aid for Young Scientists (B) (N.M.) from the Ministry of Education, Culture, Sports, Science and Technology of Japan. We thank Harue Matsuda and Ryoko Sakane for technical assistance and Takafumi Minamimoto for his comments.
Correspondence should be addressed to Minoru Kimura, Department of Physiology, Kyoto Prefectural University of Medicine, Kawaramachi-Hirokoji, Kamigyo-ku, Kyoto 602-8566, Japan. E-mail: mkimura@koto.kpu-m.ac.jp.
DOI:10.1523/JNEUROSCI.0068-04.2004
Copyright © 2004 Society for Neuroscience 0270-6474/04/243500-11$15.00/0