Recent studies have suggested that the basal ganglia are essential for reward-oriented behavior. A popular proposal is that the interaction between sensorimotor and reward-related signals occurs in the striatal projection neurons. However, the role of interneurons remains unclear. Using the one-direction-rewarded version of the memory-guided saccade task (1DR), we examined the activity of tonically active neurons (TANs), presumed cholinergic interneurons, in the caudate. Many TANs (73/155, 47.1%) responded, usually with a pause, to a visual cue that indicated both the saccade goal and the presence or absence of reward. For most TANs (44/73, 60.3%), the response was spatially selective (contralateral dominant), but was not modulated by the reward significance. TANs are thus distinct from caudate projection neurons, which have responses to the cue that are both spatially selective and reward contingent, and from midbrain dopamine neurons, which have cue responses that are spatially nonselective and reward contingent. TANs were nonetheless sensitive to the reward schedule: in the all-directions-rewarded version (ADR) compared with 1DR, the cue responses of TANs were smaller, less frequent, and less spatially selective. In 1DR, it would first be detected that reward is not given regularly, and this process would then promote discrimination of individual stimuli in relation to reward. We propose that TANs would contribute to the detection of the context that requires discrimination, whereas dopamine neurons would contribute to the stimulus discrimination. These features of TANs might be explained by their cytoarchitecture, namely, as large aspiny neurons.
The striatum contains a large number of projection neurons (Preston et al., 1980) and a small number of interneurons (Phelps et al., 1985). The projection neurons are inhibitory and GABAergic (Feltz, 1971; Fonnum et al., 1978; Fisher et al., 1986), anatomically characterized as “medium-spiny neurons” (Kitai et al., 1979; Preston et al., 1980). Inputs from heterogeneous origins (cortical, thalamic, dopaminergic, etc.) converge onto single projection neurons (Parent, 1990; Wilson, 1990; Smith and Bolam, 1990; Kincaid et al., 1998) and even onto single spines (Bouyer et al., 1984; Freund et al., 1984). Physiological data suggest the integration of sensorimotor, cognitive, and motivational information in individual neurons (Rolls et al., 1983; Nishino et al., 1984; Alexander and DeLong, 1985; Kimura, 1986; Schultz and Romo, 1988; Hikosaka et al., 1989; Kermadi and Joseph, 1995; Kawagoe et al., 1998). These data suggest that multiple kinds of information are integrated within striatal projection neurons, and the results are sent out to the output regions of the basal ganglia, namely, the substantia nigra and the globus pallidus.
Something seems to be missing in this story. Although smaller in number, there are interneurons in the striatum (DiFiglia et al., 1976; Kawaguchi et al., 1995). Among several types of interneurons, only one type has been the subject of behavioral studies. This type is called “tonically active neurons (TANs)” because they fire tonically and irregularly, unlike the projection neurons (Kimura, 1986, 1992). They are presumed to be cholinergic and are characterized anatomically as “large aspiny neurons” (Lehmann and Langer, 1983; Bolam et al., 1984; Phelps et al., 1985). In a classical conditioning paradigm in which reward delivery [unconditioned stimulus (US)] was conditioned by a preceding sensory stimulus [conditioned stimulus (CS)], TANs responded to CS only when CS was followed by US (Apicella et al., 1991, 1997; Graybiel et al., 1994; Aosaki et al., 1995).
These results suggested that the information integration in the striatum may require the contribution of TANs, in addition to the direct convergence onto projection neurons. A modified version of the memory-guided saccade task devised in our laboratory (Kawagoe et al., 1998) would be suitable to test this hypothesis because it requires the subject to fully use cognitive resources to perform the task, but at the same time the motivational state was manipulated by giving reward only for one particular direction of four (thus called “one-direction-rewarded task” or “1DR”). The comparison between the classical conditioning task and our 1DR task was interesting because 1DR required the subject's voluntary action based on cognitive information. A simple prediction was that TANs would respond to the cue stimulus that specifically indicated future reward. We found, however, that this hypothesis was incorrect. We found instead that TANs may carry spatial information, but it was only weakly and nonselectively modulated by the reward contingency.
MATERIALS AND METHODS
We used three adult male Japanese monkeys (Macaca fuscata): monkey D (9.4 kg), monkey G (10.1 kg), and monkey M (9.5 kg). The monkeys were kept in individual primate cages in an air-conditioned room where food was always available. At the beginning of each experimental session, they were moved to the experimental room in a primate chair. The monkeys were given a restricted amount of water during the periods of training and recording. Their body weight and appetite were checked daily. Supplementary water and fruit were provided daily. Throughout the experiment, the monkeys were treated in accordance with the Guiding Principles for Research Involving Animals and Human Beings by the American Physiological Society. All surgical and experimental protocols were approved by the Juntendo University Animal Care and Use Committee and are in accordance with the National Institutes of Health Guide for the Care and Use of Animals.
Before the recording experiments started, we implanted a head holder, a chamber for unit recording, and an eye coil under the following surgical procedures. The monkey was sedated with ketamine (4.6–6.0 mg/kg) and xylazine (0.4–0.6 mg/kg) given intramuscularly, and then general anesthesia was induced by intravenous injection of pentobarbital (5 mg · kg−1 · hr−1). Surgical procedures were performed under aseptic conditions in an operating room. After the skull was exposed, 15–20 acrylic screws were bolted into it and fixed with dental acrylic resin. The screws served as anchors by which a head holder and chambers, both made of delrin, were fixed to the skull. A scleral eye coil was implanted in one eye for monitoring eye position (Robinson, 1963; Judge et al., 1980). The recording chamber was tilted laterally by 35° from vertical, and its center was aimed at the caudate nucleus according to the atlas of Kusama and Mabuchi (1970). The monkey received antibiotics (sodium ampicillin, 25–40 mg/kg, i.m., each day for 10 d) after the operation.
Memory-guided saccade. The monkeys were first trained to perform memory-guided saccades (Hikosaka and Wurtz, 1983) (see Fig. 1). A task trial started with the onset of a central fixation point on which the monkeys had to fixate. A cue stimulus (spot of light) came on 1 sec after onset of the fixation point (duration 100 msec), and the monkeys had to remember its location. If the monkey broke fixation, the trial was aborted, and a new trial started after an inter-trial interval. After 1–1.5 sec, the fixation point turned off, and the monkeys were required to make a saccade to the previously cued location. The target came on 400 msec later for 150 msec at the cued location. The saccade was judged to be correct if the eye position was within a window around the target (usually within ±3°) when the target turned off. The correct saccade was indicated by a tone stimulus and, in some trials, reward (drop of water). The next trial started after an inter-trial interval of 3–4 sec. The monkeys could wait for the target to appear and make a saccade to it, but the eyes would then rarely reach the target window (and rarely obtain the reward) because the duration of the target was set short; they were encouraged to make a saccade before the target onset.
Position-reward association. The monkeys were then trained to perform the memory-guided saccade task in two different reward conditions: all-directions-rewarded (ADR) and one-direction-rewarded (1DR) (Kawagoe et al., 1998) (see Fig. 1). In ADR, every correct saccade was rewarded with the liquid reward together with the tone stimulus. In 1DR, an asymmetric reward schedule was used in which only one of the four directions was rewarded, whereas the other directions were not rewarded. The rewarded direction was fixed in a block of experiments that included 60 successful trials. Even for the nonrewarded direction, the monkeys had to make a correct saccade. The correct saccade was indicated by the tone stimulus with no reward, which was followed by the next trial; if the saccade was incorrect, the same trial was repeated. The amount of reward per block was set approximately the same between 1DR and ADR by setting the amount of reward per trial approximately four times larger for 1DR than for ADR. Other than the actual reward, no indication was given to the monkeys as to which direction was currently rewarded.
Classical conditioning. In addition to the operant conditioning paradigm (i.e., 1DR and ADR), we examined the reward predictability of the neuron by using classical-conditioning paradigms: free reward (FRW) and free reward with cue (FRW-C). In FRW, a reward (i.e., drop of water) together with a tone was given at random intervals (6–10 sec). FRW-C was the same as FRW, except that a visual stimulus preceded the reward by 500 msec. For the visual stimulus, a spot of light (duration 100 msec) was presented at the center of the screen. The monkey was not required to fixate or make eye movements in FRW or FRW-C. Note that there was a time delay of ∼150 msec from the electronic signal for reward to the actual water delivery because of the relatively long plastic tube (∼3 m) for water delivery. This applies to ADR and 1DR as well.
Behavioral testing. The monkey sat in a primate chair in a dimly lit and sound-attenuated room with its head fixed. In front of the monkey was a tangent screen (30 cm from his face) onto which small red spots of light (diameter 0.2°) were backprojected using two LED projectors. The first projector was used for a fixation point, and the second was used for an instruction cue stimulus and a target. The position of the cue stimulus was controlled by reflecting the light via two orthogonal (horizontal and vertical) galvanomirrors.
The cue stimulus was presented at one of four positions with the same eccentricity: left-up (LU), left-down (LD), right-down (RD), right-up (RU) (see Fig. 1). The target eccentricity was usually set at either 10 or 20°.
Once a TAN was isolated, its activity was examined with FRW and FRW-C. We then asked the monkey to perform ADR and 1DR. ADR was performed in one block. 1DR was performed in four blocks, each with a different rewarded direction. The order of blocks was randomized for different neurons. We sometimes repeated the 1DR blocks to confirm the reproducibility of the behavior of the neuron.
In one block of ADR or 1DR, the target cue was chosen pseudorandomly for each trial such that every subblock of four trials contained an equal number of all four positions. One block of ADR or 1DR contained 60 successful trials for the four-target set (i.e., 15 trials for each cue position).
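The pseudorandom cue schedule described above can be sketched as follows. This is a minimal illustration, not the original task-control program: the function name `make_block` and the use of Python's `random` module are assumptions, and the sketch ignores error trials (which were repeated in the actual task), so it only generates the sequence of successful trials.

```python
import random

# Cue position labels taken from the text.
CUE_POSITIONS = ("LU", "LD", "RD", "RU")

def make_block(n_trials=60, rng=None):
    """Return a cue sequence in which every sub-block of four trials
    contains each of the four positions exactly once (hypothetical helper)."""
    if n_trials % len(CUE_POSITIONS) != 0:
        raise ValueError("n_trials must be a multiple of 4")
    rng = rng or random.Random()
    sequence = []
    for _ in range(n_trials // len(CUE_POSITIONS)):
        subblock = list(CUE_POSITIONS)
        rng.shuffle(subblock)  # random order within the sub-block
        sequence.extend(subblock)
    return sequence

# One block of 60 trials, seeded only to make the example reproducible.
block = make_block(60, random.Random(0))
```

With 60 trials, each cue position appears 15 times, which matches the trial counts stated above.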
Before the single-unit recording experiment, we obtained magnetic resonance images (0.3 T; AIRIS, Hitachi, Tokyo) such that they were perpendicular to the recording chamber. We then determined the recording sites in the caudate on the basis of the chamber-based coordinates.
Single-unit recordings were performed using tungsten electrodes (0.5–2 MΩ measured at 1 kHz) (Frederick Haer). The electrode was inserted into the brain through a stainless steel guide tube (diameter 0.8 mm) that was used to penetrate the dura. A hydraulic microdrive (MO 95-S, Narishige, Tokyo) was then used to advance the electrode into the brain. TANs were identified by their characteristic spike waveform (broad and often initial positive) and irregular-tonic firing (3–10 Hz), which was dissimilar to the very low frequency firing of presumed projection neurons (Aosaki et al., 1994b). A later histological analysis revealed electrode tracks inside the caudate.
Eye movements were recorded using the search coil method (MEL-20U; Enzanshi Kogyo, Tokyo, Japan) (Robinson, 1963; Judge et al., 1980; Matsumura et al., 1992). Eye positions were digitized at 500 Hz and stored continuously in an analog file during each block of trials. The behavioral tasks as well as storage and display of data were controlled by a computer (PC 9801RA; NEC, Tokyo). The unitary action potentials were passed through a window discriminator (DDIS-1; BAK Electronics), and the times of their occurrences were stored with a resolution of 1 msec.
Analysis of eye movements
We first determined the time of saccade. We judged that an eye movement (candidate of a saccade) occurred if velocity and acceleration exceeded threshold values (30°/sec and 90°/sec², respectively). The eye movement was accepted as a saccade on the basis of its velocity and duration. After the onset, the velocity must exceed 45°/sec, and this suprathreshold velocity must be maintained for at least 10 msec. The total duration must be >25 msec. The end of the eye movement was determined if the velocity became lower than 40°/sec. These threshold values were determined empirically by applying them to sample saccades. For each saccade thus determined, we obtained several parameters: latency, amplitude, peak velocity, duration, and eye position at the beginning and end of the saccade.
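The acceptance criteria above can be sketched in code. This is an illustrative reading of the criteria, not the original analysis program. It assumes a precomputed eye-speed trace sampled every 2 msec (500 Hz), estimates acceleration as the difference between successive samples, and counts the "maintained for at least 10 msec" criterion as the number of consecutive suprathreshold samples multiplied by the sampling interval. The function name `detect_saccade` and its parameter names are hypothetical.

```python
def detect_saccade(velocity, dt_ms=2.0, onset_vel=30.0, onset_acc=90.0,
                   sustain_vel=45.0, sustain_ms=10.0, end_vel=40.0,
                   min_duration_ms=25.0):
    """Return (onset_index, end_index) of the first accepted saccade, or None.

    velocity: eye speed in deg/sec, one sample every dt_ms milliseconds.
    """
    n = len(velocity)
    i = 1
    while i < n:
        acc = (velocity[i] - velocity[i - 1]) * 1000.0 / dt_ms  # deg/sec^2
        if not (velocity[i] > onset_vel and acc > onset_acc):
            i += 1
            continue
        # Candidate onset at sample i: follow the movement to its end.
        j, run, best_run, reached = i, 0, 0, False
        while j < n:
            v = velocity[j]
            if v > sustain_vel:
                reached = True
                run += 1
                best_run = max(best_run, run)
            else:
                run = 0
                if reached and v < end_vel:
                    break  # end of the eye movement
                if not reached and v <= onset_vel:
                    break  # never became fast enough
            j += 1
        if j == n:
            return None  # movement did not end within the record
        if (reached and best_run * dt_ms >= sustain_ms
                and (j - i) * dt_ms > min_duration_ms):
            return i, j
        i = j  # rejected candidate: resume the search after it
    return None

# Synthetic traces (made-up values, not recorded data).
trace = [0.0] * 10 + [100.0, 200.0, 300.0] + [300.0] * 12 + [200.0, 100.0, 20.0] + [0.0] * 10
blip = [0.0] * 10 + [100.0, 100.0] + [0.0] * 10
saccade = detect_saccade(trace)
rejected = detect_saccade(blip)
```

The onset and end indices would then give the duration directly, and latency, amplitude, and peak velocity could be derived from the same indices together with the position trace.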
To examine whether the characteristics of memory-guided saccades depended on the reward condition, we statistically compared the saccade parameters (mainly velocity, latency, and amplitude) between the rewarded and nonrewarded conditions of 1DR. For each neuron recorded, we obtained the mean values of saccade parameters for each saccade direction, separately for the rewarded condition (∼15 trials) and the nonrewarded condition (∼45 trials) of 1DR. We then performed a statistical comparison (paired t test) between the rewarded and nonrewarded conditions for each parameter.
Analysis of neuronal activity
Determination of response period. To statistically evaluate the post-cue response, we set the same test window for all recorded TANs, and for each TAN we counted the number of spikes within the window for each trial. The test window was determined on the basis of the population histogram aligned on cue onset averaged across all recorded TANs, using the following procedure. A time window with a duration of 100 msec was moved in 10 msec steps starting at the onset of the cue stimulus. This was done until the averaged firing rate within the window was significantly different from the baseline firing rate (within a 100 msec window starting from 300 msec before the fixation point onset) for five consecutive steps (t test, p < 0.01). The onset of the test window was taken to be the beginning of the window that was the earliest among the five consecutive steps. The 100 msec window was further moved until the averaged firing rate within the window was not significantly different from the baseline firing rate for five consecutive steps. The offset of the test window was taken to be the beginning of the window that was the earliest among the five consecutive steps. This procedure was done separately for FRW, FRW-C, ADR, and 1DR.
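The run-finding part of this procedure can be sketched as below. The sketch assumes that the t test for each 100 msec window has already been carried out, so that the input is simply one Boolean per 10 msec step (True if the window starting at that step differed from baseline at p < 0.01). The function names `find_run` and `find_test_window` are hypothetical, and the example flags are invented for illustration rather than taken from the data.

```python
def find_run(flags, value, length, start=0):
    """Index of the first element of the first run of `length` consecutive
    entries equal to `value`, searching from `start`; None if there is none."""
    run = 0
    for k in range(start, len(flags)):
        run = run + 1 if flags[k] == value else 0
        if run == length:
            return k - length + 1
    return None

def find_test_window(significant, step_ms=10, n_consecutive=5):
    """Return (onset_ms, offset_ms) relative to cue onset, or None."""
    onset_step = find_run(significant, True, n_consecutive)
    if onset_step is None:
        return None
    # The offset search continues from the onset; the first five
    # consecutive non-significant steps mark the end of the window.
    offset_step = find_run(significant, False, n_consecutive, start=onset_step)
    if offset_step is None:
        return None
    return onset_step * step_ms, offset_step * step_ms

# Invented example: 14 non-significant steps, 13 significant, 10 non-significant.
flags = [False] * 14 + [True] * 13 + [False] * 10
window = find_test_window(flags)

# A short interruption in significance delays the onset to the next full run.
noisy = [False] * 14 + [True] * 3 + [False] + [True] * 9 + [False] * 10
noisy_window = find_test_window(noisy)
```

Requiring five consecutive steps in both directions keeps a single spurious window from starting or ending the test window prematurely.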
Presence or absence of cue response. We then determined whether each TAN showed a response. For each trial we calculated the response firing rate in the test window (converted from the spike count within the window) and the baseline firing rate in the control window (1 sec period before onset of the fixation point). If the difference between the response and baseline firing rates was statistically significant (Wilcoxon signed rank test, p < 0.05), it was judged that the TAN showed a cue response.
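The per-neuron decision can be sketched as a paired comparison of firing rates across trials. The sketch below does not reproduce the statistical software that was actually used; it computes the Wilcoxon signed rank test with the large-sample normal approximation, drops zero differences, assigns average ranks to ties, and applies no tie or continuity correction. The function names and the example firing rates are hypothetical.

```python
import math

def wilcoxon_signed_rank_p(x, y):
    """Two-sided p value for paired samples x and y (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # average rank for ties
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def has_cue_response(response_rates, baseline_rates, alpha=0.05):
    """True if response and baseline rates differ significantly across trials."""
    return wilcoxon_signed_rank_p(response_rates, baseline_rates) < alpha

# Invented per-trial firing rates (spikes/sec), one pair per trial.
baseline = [6, 5, 7, 6, 5, 6, 7, 5, 6, 6, 5, 7, 6, 5, 6]
response = [2, 1, 3, 2, 0, 1, 3, 2, 1, 2, 0, 3, 2, 1, 2]
responsive = has_cue_response(response, baseline)
```

For small numbers of trials an exact version of the test would be preferable to the normal approximation used here.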
Spatial and reward selectivity. To examine the spatial selectivity, all trials in ADR and 1DR were divided into two groups, one with the contralateral cues and the other with the ipsilateral cues. If the difference between the contralateral and ipsilateral groups was statistically significant (Mann–Whitney U test, p < 0.05), it was judged that the activity of the TAN had a directional preference. This analysis was done separately for ADR and 1DR. To examine the reward selectivity, all trials in 1DR were divided into rewarded and nonrewarded trials. If the difference between these groups was statistically significant (Mann–Whitney U test, p < 0.05), it was judged that the activity of the TAN was modulated by the upcoming reward. These analyses were done separately for ADR and 1DR.
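The group comparison can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the original analysis: it uses the normal approximation to the Mann–Whitney U test with average ranks for ties and no tie or continuity correction, and it assumes that the response is a pause, so that the side with the lower mean firing rate is taken as the preferred side. All names and the example rates are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def mann_whitney_p(a, b):
    """Two-sided p value for two independent samples (normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(u1 - mean) / sd / math.sqrt(2.0))

def spatial_preference(contra_rates, ipsi_rates, alpha=0.05):
    """Classify a neuron as 'contralateral', 'ipsilateral', or 'none'."""
    if mann_whitney_p(contra_rates, ipsi_rates) >= alpha:
        return "none"
    mean_contra = sum(contra_rates) / len(contra_rates)
    mean_ipsi = sum(ipsi_rates) / len(ipsi_rates)
    # Assumption: the response is a pause, so a lower rate is a stronger response.
    return "contralateral" if mean_contra < mean_ipsi else "ipsilateral"

# Invented per-trial firing rates (spikes/sec) in the test window.
contra = [1, 2, 1, 0, 2, 1, 1, 2, 0, 1] * 3
ipsi = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6] * 3
preference = spatial_preference(contra, ipsi)
```

The same two-group comparison applies to reward selectivity if the trials of 1DR are split into rewarded and nonrewarded groups instead of contralateral and ipsilateral groups.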
We trained three monkeys on a memory-guided saccade task in two rewarded conditions: ADR and 1DR (Fig. 1). As shown in a previous study (Kawagoe et al., 1998), saccade parameters changed depending on the reward condition (Table 1). In 1DR, the saccade velocities were higher in the rewarded trials than in the nonrewarded trials; this was true for all three monkeys (paired t test, p < 0.0001) (Table 1A). The saccade velocities in the 1DR rewarded trials were also higher than in the ADR trials, again for the three monkeys, but less obviously (Table 1A). The saccade latencies in the rewarded trials of 1DR were shorter than those in the nonrewarded trials of 1DR [in two monkeys (Table 1B)] and were shorter than those in ADR [in one monkey (Table 1B)]. The saccade amplitudes were not different among the three conditions, although the saccades were more accurate in the rewarded trials than in the nonrewarded trials (data not shown).
Response of TANs to reward and its predictor
We recorded from 169 TANs in three monkeys. TANs showed irregular tonic firing during inter-trial intervals, as reported previously (Aosaki et al., 1994a,b). The firing rate was 5.8 ± 1.1 spikes/sec (mean ± SD; n = 169), ranging from 3 to 10 spikes/sec. The firing pattern was distinctly different from that of presumed projection neurons having baseline firing rates that were almost always <1 spike/sec (Hikosaka et al., 1989).
Most TANs responded to reward, in this case, a drop of water, when reward was delivered while the monkey performed no task (the condition called FRW; see Materials and Methods). The neuron shown in Figure 2 responded to the delivery of reward with a pause followed by excitation (Fig. 2A). Of 169 TANs, 112 (66.2%) showed a statistically significant response (Wilcoxon signed rank test, p < 0.05). Their responses were phasic, usually pure inhibition or inhibition followed by excitation. The population histogram based on all recorded TANs indicates that the inhibition started ∼200 msec after reward onset (Fig. 2B). However, taking into account the time delay from the electronic signal for reward (Fig. 2, reward) to the actual water delivery (see Materials and Methods), the latency of response of TANs to reward delivery would be ∼50 msec.
When a visual stimulus was presented consistently before the reward delivery (the condition called FRW-C), the TAN shown in Figure 2A now responded to the visual stimulus, not the reward (Fig. 2C). This shift of activity was consistent among TANs, as indicated by the population histogram (Fig. 2D). Moreover, the visual response was similar in shape to the reward response. On the basis of the population histogram (see Analysis of neuronal activity in Materials and Methods), we determined the reliable response period for FRW-C to be 140–270 msec, although the latency of the response appears to be ∼100 msec. Using this response period as the test window, we performed statistical analyses (see Materials and Methods). Of 111 TANs tested on FRW-C, 70 (63.1%) showed a statistically significant response (Wilcoxon signed rank test, p < 0.05).
That TANs respond to a sensory event preceding reward (as revealed by FRW-C) is consistent with previous findings by Kimura and colleagues (Kimura, 1986; Aosaki et al., 1994b). The results suggest that TANs respond to a reward predictor. However, our experiments using 1DR disagreed with this suggestion, as shown below.
Response of TANs to an instruction stimulus in 1DR
We examined 155 TANs in the caudate (left, 113; right, 42) using 1DR and ADR. Many of them responded to the onset of the fixation point and the onset of the cue stimulus, whereas the response to reward itself was usually absent. These responses were usually phasic and inhibitory, followed or preceded by a weak excitatory component. The response to the onset of the fixation point showed no apparent relation to task performance, and we will not describe it further in this paper. In the following we will focus on the response to the cue stimulus (hereafter simply called “cue response”).
A first example of a TAN with a cue response is shown in Figure 3. In ADR, the neuron responded to the cue with a phasic decrease in the firing rate when it was presented in the right-down (RD) or right-up (RU) direction. These directions were contralateral to the side of the recording site (left caudate). To test whether the response was modulated by the upcoming reward, we used 1DR in which only one of four directions was rewarded consistently within a block of 60 trials. The response pattern was qualitatively unchanged in any block of 1DR, but the responses to RD and RU cues became more robust as a pause of activity. For example, the responses to RU cue (shown in the bottom row) were nondifferential whether it indicated reward (fourth from left, with a bull's eye mark) or no reward (left three). The neuron showed no response to reward itself in 1DR or ADR (Fig. 3B). In short, this TAN carried spatial information, not reward information.
A second example of a TAN, which was less typical, showed some reward-contingent modulation (Fig. 4). This TAN, recorded in the right caudate, responded to a contralateral (LU or LD) cue only when the cue indicated an upcoming reward (see the top two rows). The TAN showed no response to RD or RU cue, even when the cue indicated reward. In short, this TAN carried a combination of spatial information and reward information. In ADR, however, this neuron showed no response even to LU or LD cue.
Contralateral preference of TANs
The contralateral preference of the cue response is visualized in population histograms (Fig. 5). TANs in the left caudate preferred the right cues, whereas TANs in the right caudate preferred the left cues, in a mirror-symmetric manner. Moreover, the cue responses were not obviously different depending on which direction was rewarded. Careful inspection, however, reveals that the cue response tended to be prolonged when the cue indicated an upcoming reward; for example, the response to LD direction in the right caudate was more prolonged when LD direction was rewarded (black line) than when it was not rewarded (gray line). We did not examine whether TANs have clear response fields or respond to nonvisual spatial inputs.
On the basis of the population histogram as shown in Figure 5 (see Analysis of neuronal activity in Materials and Methods), we determined the reliable response period for ADR and 1DR to be 140–260 msec, although the latency of the response appears to be ∼100 msec. Using this response period as the test window, we performed statistical analyses (see Materials and Methods). Of 155 TANs examined on ADR and 1DR, 73 (47.1%) showed a statistically significant cue response in 1DR, whereas 39 (25.2%) showed a statistically significant cue response in ADR (Wilcoxon signed rank test, p < 0.05) (Fig. 6).
Figure 7 shows that the two features described so far, the robust spatial preference and the weak reward modulation in 1DR, were fairly common among cue-responsive TANs (n = 73). Figure 7A indicates that all TANs but two showed inhibitory responses to the contralateral cues (expressed as positive values on the horizontal axis), whereas the same TANs increased or decreased their activity in response to the ipsilateral cues (expressed as negative and positive values in the vertical direction). Consequently, most TANs showed stronger inhibitory responses to the contralateral cues than to the ipsilateral cues (i.e., circles below the 45° line). The contralateral preference was statistically significant for 44 TANs (60.3%) (indicated by open circles). Only four TANs (5.5%) showed ipsilateral preference (gray circles) (Mann–Whitney U test, p < 0.05; also see Fig. 6). The contralateral preference was also present in the population measure of TANs: a paired comparison based on the mean responses of individual neurons indicates that the response of TANs was significantly stronger to the contralateral cues than to the ipsilateral cues (paired t test, p < 0.0001).
In contrast, Figure 7B shows that the cue responses of the same TANs were similar in magnitude between the rewarded and nonrewarded conditions. Only four TANs (5.5%) showed stronger responses in the rewarded condition than in the nonrewarded condition (open circles) (Mann–Whitney U test, p < 0.05). In the population measure of TANs, a paired comparison indicates that the responses of TANs were stronger only marginally to the rewarded cues than to the nonrewarded cues (paired t test, p = 0.04).
Reward schedule dependency of TANs
Examples of TANs shown in Figures 3 and 4 suggested that the cue response may be smaller in ADR than in 1DR. This was supported by the statistical analysis (Fig. 6) indicating that the statistically significant cue response was present less commonly in ADR (n = 39; 25.2%) than in 1DR (n = 73; 47.1%) (Wilcoxon signed rank test, p < 0.05). Furthermore, the cue responses in ADR tended to be spatially nonselective (n = 32; 82%) (Mann–Whitney U test, p < 0.05) (Fig. 6). Figure 8 illustrates these tendencies for individual TANs by comparing the cue responses in 1DR (abscissa) and in ADR (ordinate) separately for the contralateral (Fig. 8A) and ipsilateral (Fig. 8B) cues. For the contralateral cues (Fig. 8A), the responses tended to be stronger in 1DR than in ADR (circles below the 45° line). This tendency was statistically significant in 22 TANs (open circles) (Mann–Whitney U test, p < 0.05); no TAN showed the opposite effect. As a population of TANs, a paired comparison indicates that the responses of TANs were significantly stronger in 1DR than in ADR (paired t test, p < 0.0001). In contrast, the difference between 1DR and ADR for the ipsilateral cues (Fig. 8B) was not clear; no TAN showed a statistically significant difference. As a population of TANs, a paired comparison indicates that the responses of TANs were stronger only marginally in 1DR than in ADR (paired t test, p = 0.047).
Weak reward responses of TANs
As represented in Figure 3, TANs showed no or only weak responses to the reward itself in 1DR or ADR. The responses could occur in the nonrewarded trials as well. We could not determine the response period for this trial-end activity change, probably because the response was too small.
TANs do not predict reward
Our initial experiment was basically a classical conditioning task in which reward delivery (US) was preceded by a spot of light (CS). TANs in the caudate responded to US when it was presented alone (task, FRW), but responded to CS only when both CS and US were presented (task, FRW-C). The results were virtually the same as those reported previously (Apicella et al., 1991, 1996, 1997; Graybiel et al., 1994; Aosaki et al., 1995), indicating that we recorded the same group of neurons. The response was usually a transient pause of firing but was sometimes followed by phasic firing and occasionally preceded by a short burst. These response patterns are also consistent with previous reports (Apicella et al., 1997).
In the 1DR version of the ordinary memory-guided saccade task, the cue stimulus provided both the instruction for action (where to saccade) and the predictive information on reward (whether rewarded). Thus, the cue stimulus in 1DR includes the same function as CS in the classical conditioning task. We expected therefore that TANs would respond to the cue stimulus, and they did.
An important question was on the reward-predicting nature of the response. We expected that the TANs would respond to the cue only when it indicated the upcoming reward. This expectation proved to be wrong. Recent studies from other laboratories have also shown that TANs are not specialized for predicting reward; they may respond to or predict aversive stimuli (Ravel et al., 1999). We found that the activities of TANs were hardly modulated by the upcoming reward but did show some preference to the locations of the cue stimulus (usually preferring the contralateral side).
TANs are sensitive to reward schedule
Although the post-cue response of TANs was not clearly dependent on the outcome of the immediate reward, it showed a different type of reward contingency: the response was weaker and less spatially selective in ADR than in 1DR. What might be the functional meaning of the dependency on reward schedule?
A hint may be given by the comparison with dopamine (DA) neurons. According to a preliminary observation from our laboratory using 1DR (Kawagoe et al., 1999), DA neurons respond to the cue by emitting a short burst if the cue indicates reward and by pausing firing if the cue indicates no reward. This is consistent with the idea that DA neurons encode a reward prediction error (Barto, 1994; Houk et al., 1995; Schultz, 1998; Schultz and Dickinson, 2000). Although the probability of reward before the cue is 25% in 1DR, the cue changes the reward probability to either 100% (rewarded trials) or 0% (nonrewarded trials). Hence, there is a reward prediction error of either +75% or −25%, and the responses of DA neurons (i.e., burst or pause) appear to correspond to these values. On the other hand, DA neurons showed no response in ADR (Kawagoe et al., 1999), because there is no prediction error in that the probability of reward is 100% before and after the cue.
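The arithmetic behind these values can be written out explicitly. The sketch below simply takes the reward prediction error at cue onset to be the reward probability after the cue minus the reward probability before the cue, as in the paragraph above; the function name `prediction_error` is hypothetical, and the sketch makes no claim about how DA neurons compute this quantity.

```python
def prediction_error(p_after_cue, p_before_cue):
    """Change in reward probability produced by the cue."""
    return p_after_cue - p_before_cue

# 1DR: one of four directions is rewarded, so the probability before the cue is 1/4.
p_before_1dr = 1 / 4
error_rewarded = prediction_error(1.0, p_before_1dr)     # +0.75
error_nonrewarded = prediction_error(0.0, p_before_1dr)  # -0.25

# ADR: reward is certain before and after the cue, so the cue carries no error.
error_adr = prediction_error(1.0, 1.0)                   # 0.0
```

The two 1DR values have opposite signs, which corresponds to the burst and the pause described above, whereas the ADR value is zero.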
Compared with the selective activation of DA neurons, TANs were much less selective. Nonetheless, TANs responded to the cue stimulus and did so better in 1DR than in ADR, similarly to DA neurons. According to the above argument, the activity of TANs was stronger when the reward prediction error was present (in 1DR) than when it was absent (in ADR). However, TANs do not report the reward prediction error itself, because they do not discriminate between the rewarded and nonrewarded trials in which the reward prediction error has opposite signs. To summarize, TANs would signify that the reward prediction error is present, whereas DA neurons encode the error itself.
One might think, then, that the function of TANs is trivial compared with that of DA neurons. This may not be true. In the framework of classical conditioning theory, the change from ADR to 1DR could be regarded as a “discrimination” process (Rescorla and Solomon, 1967) because in ADR all cue stimuli are followed by reward, whereas in 1DR one stimulus is selectively followed by reward. TANs would thus be related to the detection of the context that requires the discrimination, whereas DA neurons would be related to the discrimination of stimuli. Such a two-step process, context detection followed by stimulus discrimination, would be an efficient way of learning stimulus–reward associations in the complex environment.
Possible mechanism of reward schedule dependency
TANs are presumed to be cholinergic interneurons that are anatomically characterized as large aspiny neurons (Bolam et al., 1984). Although the tonic firing of TANs is caused by their intrinsic properties (Bennett and Wilson, 1999; Bennett et al., 2000), their sensory responses may be caused or triggered by extrinsic synaptic inputs (Bennett and Wilson, 1998). In fact, TANs receive glutamatergic excitatory inputs from the cerebral cortex and the thalamus (DiFiglia, 1987; Wilson et al., 1990; Lapper and Bolam, 1992; Sidibé and Smith, 1999). Thalamic inputs may be more important for the sensory responses of TANs (Matsumoto et al., 2001). DA inputs to TANs (Lehmann and Langer, 1983; Kubota et al., 1987; Calabresi et al., 2000) are crucial for the ability of TANs to respond to sensory stimuli (Aosaki et al., 1994a). This raises the possibility that the cue responses of TANs in 1DR or ADR are caused directly by DA inputs, which might be supported by studies on D2 and D5 receptors on cholinergic interneurons (Yan et al., 1997). However, our data cannot be explained solely by this mechanism, because TANs respond to the cue even in nonrewarded trials in 1DR and ADR, at which DA neurons show a pause or no response (Kawagoe et al., 1999). The effect of DA inputs would thus be less direct, perhaps in addition to the direct effect.
A better idea may be provided by the comparison of TANs and striatal projection neurons. An obvious difference is that the synapses for these inputs are present on cell somata and proximal dendritic shafts in TANs (Kubota et al., 1987) and frequently on dendritic spines in projection neurons (Bouyer et al., 1984; Freund et al., 1984; Kötter, 1994; Smith et al., 1994; Smith and Kieval, 2000). We speculate that the anatomical difference may underlie the characteristic behavior of TANs, as illustrated in Figure 10.
Studies from our laboratory have shown that the post-cue activities of caudate projection neurons were usually spatially selective but strongly and consistently modulated by reward outcome, as illustrated in Figure 9 (Kawagoe et al., 1998). Our interpretation of this phenomenon was that cortical inputs, which are spatially selective, are enhanced or depressed by the concurrent dopaminergic inputs, which predict reward. This mechanism would require that the active cortical input be identified accurately and that the coincidence of DA inputs be detected accurately (Wickens and Kötter, 1995). Such a spatiotemporal coincidence detection might be made possible by the convergent cortical and DA synapses onto single spines (Bouyer et al., 1984; Freund et al., 1984; Smith and Bolam, 1990; Smith et al., 1994) (Fig. 10, left).
In contrast, the structure of TANs may not allow such fine tuning of information (Fig. 10, right). We now propose a hypothesis based on the assumption that DA causes diffuse effects along dendrites. In 1DR, DA neurons respond to the cue only when it indicates the upcoming reward, but this signal would cause diffuse and persisting effects in TANs. It follows that any inhibitory (or excitatory) inputs, which are capable of causing the cue responses in TANs, would be modulated by the diffuse DA effects. This might account for the general enhancement of the post-cue responses of TANs in 1DR compared with those in ADR in which DA neurons show no response to the cue.
This hypothesis is still speculative and requires further investigation. Alternatively or additionally, the larger responses in 1DR may already be present presynaptically, for example, among thalamic neurons projecting to TANs (Lapper and Bolam, 1992; Matsumoto et al., 2001).
Relation to projection neurons
Because TANs are interneurons, their signals must be transmitted to projection neurons to be functionally effective (Kawaguchi et al., 1995). The behavior of TANs suggested that they may signify the context that contains stimuli that are potentially more meaningful, that is, 1DR as opposed to ADR. The connection from TANs to projection neurons is usually made by synapses outside spines (Izzo and Bolam, 1988) and is mediated by muscarinic receptors (Hersch et al., 1994; Contant et al., 1996). Many studies showed that the direct muscarinic effect on projection neurons is facilitatory (Dodt and Misgeld, 1986; Harsing and Zigmond, 1998; Galarraga et al., 1999), and this effect is state dependent (Akins et al., 1990). On the other hand, the muscarinic input may suppress excitatory inputs to projection neurons presynaptically (Dodt and Misgeld, 1986; Akins et al., 1990; Barral et al., 1999). Thus the net effect of the pause of TANs could be either disfacilitation or disinhibition.
In any case, this functional connection, together with cortical and DA inputs, would create the situation that fits the double-step hypothesis that we proposed (see above and Fig. 10). TANs respond to the cue stimulus with a pause, thereby leading to a modulation in projection neurons, more strongly in 1DR than in ADR (context detection). If the cue indicates the upcoming reward, DA neurons burst so that the cortical signal signifying the location of the cue will be enhanced (stimulus discrimination).
Although TANs and DA neurons are presumed to be involved in context detection and stimulus discrimination, respectively, they would convey no information about what the detected context is or what the discriminated stimulus is. In contrast, the cortical or thalamic inputs contain the information on the context and stimulus, but not the reason why they are selected.
A question still remains: why should TANs ever be spatially selective if their function is so general as to select a potentially rewarding state? The answer may be found in their boosting action on projection neurons by disinhibition. If TANs had no spatial selectivity, the spatial selectivity of projection neurons would be reduced by the TAN-induced boosting, which is obviously undesirable. Therefore, the TAN-induced boosting effect should also be spatially selective, and this is what we have observed. Furthermore, the fact that the spatial selectivity of TANs was higher in 1DR would further promote the spatial selectivity of projection neurons.
This work was supported by a Grant-in-Aid for Scientific Research on Priority Areas (C) of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Corporation (JST), and Japan Society for the Promotion of Science (JSPS) Research for the Future program. Y.S. was supported by Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists. We thank Johan Lauwereyns, Yoriko Takikawa, and Hiro Nakahara for helpful comments, Makoto Kato for designing the computer programs, and Masashi Koizumi for technical support.
Correspondence should be addressed to Okihide Hikosaka, Department of Physiology, Juntendo University, School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.