The regulation of cognitive activity relies on the flexibility of prefrontal cortex functions. To study this mechanism, we compared monkey dorsolateral prefrontal activity in two spatial cognitive tasks: a delayed response task and a self-organized problem-solving task. The latter included two periods: a trial-and-error search for the correct response, and repetition of the response once it was discovered. We show that (1) delay activity involved in the delayed task also participates in self-generated responses during the problem-solving task and keeps the same location preference, and (2) the amplitude of firing and the strength of spatial selectivity vary with task requirements, even within search periods as the animal approaches the correct response. This variation is dissociated from pure reward probability but may be linked to uncertainty, because selectivity dropped when reward predictability was maximal. Overall, we show that spatially tuned delay activity of prefrontal neurons reflects the varying level of engagement in control between different spatial cognitive tasks and during self-organized behavior.
Neurons in the dorsolateral prefrontal cortex (DLPFC) of nonhuman primates are distinguished by two basic properties: their capacity for sustained activation during delay periods and, in spatial tasks, their directional specificity or “memory field” (Funahashi et al., 1989). These two properties are well suited to short-term visuospatial memory, goal maintenance, and planning, functions that have been described as working memory (Baddeley, 1986; Goldman-Rakic, 1995), cognitive control, or “active memory in the service of control” (Miller and Cohen, 2001). These functions operate across a wide variety of behaviors and have been implicated in diverse cognitive tasks (Goldman-Rakic, 1987; Petrides, 1995; Wise et al., 1996; Fuster, 1997).
However, prefrontal function is often narrowly associated with variants of the delayed-response tasks widely used with monkeys (Barone and Joseph, 1989; Funahashi et al., 1989, 1993; Sakagami and Niki, 1994; Rao et al., 1997; Leon and Shadlen, 1999; Wallis et al., 2001; Takeda and Funahashi, 2002). Although these tasks involve more or less complex functions, they all depend on arbitrary associations between stimuli and responses. Few experiments have tested prefrontal functions in self-organized cognitive tasks (Passingham, 1985; Coe et al., 2002), even though human studies have reported DLPFC activations during free-selection tasks (Frith et al., 1991; Jueptner et al., 1997a). Thus, the question remains as to how the “on-line” processes identified in sensory-cued tasks relate to behaviors in which goal-directed responses are self-generated.
The question of the ubiquity of PFC function is related to the issue of flexibility. A number of brain-imaging studies have shown varying degrees of DLPFC activation depending on task requirements (Frith et al., 1991; Carter et al., 1998; Petersen et al., 1998; Leung et al., 2002). DLPFC activity changes are often associated with modulation of the anterior cingulate cortex (ACC) activity and can be explained by several computational models that define the lateral PFC, ACC, and parietal cortex as the core components involved in executive control (Dehaene et al., 1998; Cohen et al., 2004). The neurophysiological correlates of these phenomena have not been fully explored, in particular because traditional delayed-response tasks have used fixed task requirements.
We addressed these issues by comparing the properties of DLPFC neurons in two tasks: a spatial delayed-response (DR) task, in which the mnemonic process is driven by sensory stimuli, and a spatial problem-solving (PS) task designed to elicit self-generated behaviors (Procyk and Joseph, 1996). The PS task consists of a trial-and-error search period, during which the monkey determines its own response to find the reward, and a repetition period, in which the correct response is repeated three times.
We predicted that prefrontal neurons that express directional preferences and sustained activation in the delay period of DR tasks should also exhibit sustained activity with similar spatial properties during the PS task. In light of our previous data from the ACC (Procyk et al., 2000), we further predicted that the activity of DLPFC neurons would be modulated during the PS task, which alternates routine (repetition) with nonroutine (search) periods.
Materials and Methods
Data from three monkeys (Macaca mulatta) are reported here. Two monkeys were trained in the following two tasks (see Fig. 1).
Delayed response task.
The monkey sat in a primate chair in front of a vertical touch screen (Elo-Touch; 19 inches, 48 cm) positioned at arm reach distance. Eye movements were monitored and digitized at 100 Hz using an Iscan (Burlington, MA) infrared system.
The animal touched a central target (lever) to trigger the appearance of a fixation point (FP). Fixating the FP for 700 ms triggered the appearance of a 500 ms light cue at one of four possible locations (targets were positioned at the corners of a virtual square, 10 cm from the FP; see Fig. 1). After an ensuing delay period of 2–2.5 s (during which the monkey was required to maintain fixation on the FP), all four possible targets were illuminated and, 100 ms later, the FP was extinguished. The monkey then had to make a saccade toward the remembered target. After the monkey had fixated the remembered target for 390 ms, all the targets turned white (go signal), indicating that the monkey was required to touch the target. A juice reward was delivered 600 ms after a correct touch. The trial was aborted in the case of an incorrect or premature touch, or a break in eye fixation.
In the first sessions of the experiment, cues were delivered in blocks of three consecutive trials, so that in the second and third trials of each block the animals could predict the location of the next cue. This design was then abandoned; five cells included in the data set were recorded with it.
Problem-solving task.
Task events were similar to the DR task, except that the correct target location was not specified by a cue. Monkeys had to find it by trial and error. A problem was composed of two periods: a “search period” that included all incorrect trials up to the first correct touch, and a “repetition period” in which the animal was required to repeat the correct touch three times.
In the case of an incorrect touch, all targets disappeared, and in the next trial the animal was required to continue its search for the correct target. A juice reward was delivered 600 ms after a correct touch. After the third repetition, a red flashing signal (circle of 8 cm in diameter centered on the FP position) indicated the start of a new problem (i.e., a search for a new correct target). Two consecutive problems never had the same solution (see Fig. 1c).
A third monkey was trained in the PS task with a variation concerning reward size. For each problem, the size of the reward was randomly selected between two sizes (small, 0.25 ml; large, 0.5 ml). At the beginning of each trial, the color of the lever indicated whether the reward would be small (red lever) or large (green lever).
Monkeys were implanted with a head-restraining device, and a magnetic resonance imaging-guided craniotomy was done to expose a circular aperture over the prefrontal cortex (see Fig. 1b, outer circle). A recording chamber was implanted with its center placed at stereotaxic anterior level +30. Neuronal activity was recorded using varnish-coated tungsten electrodes (1–4 MΩ at 1 kHz). One or two electrodes were placed in stainless-steel guide tubes and independently advanced into the cortex through a set of micromotors (Alpha-Omega, Nazareth, Israel). Neuronal activity was sampled with 30 μs resolution and recorded waveforms were sorted into separate units using a template-matching algorithm (CED, Cambridge, UK). All animal training, surgeries, and experimental procedures were done in accordance with National Institutes of Health guidelines, and approved by the Yale Animal Care and Use Committee. The third animal was recorded using an AlphaLab system (Alpha-Omega).
Data analyses and statistics
Performance was measured using previously published methods (Procyk and Joseph, 1996; Procyk et al., 2000). Performance in the search and repetition periods was measured, respectively, as the average number of trials performed until discovery of the correct target (including the first correct trial) and the number of trials performed to repeat the correct response three times. Different types of trials are defined within a problem (see Fig. 1): during search, successive trials were labeled by their order of occurrence (indices 1, 2, 3, … up to the first correct trial). The four correct trials were labeled Cor1, Cor2, Cor3, and Cor4. Search indices do not indicate whether a trial was correct: each index includes a certain proportion of Cor1 trials.
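The labeling scheme above can be sketched as follows. This is a minimal Python illustration (the original analyses used Spike2 and MatLab), assuming each problem is coded as one boolean per completed trial (True = rewarded touch); the functions `label_problem` and `problem_performance` are hypothetical names.

```python
from typing import List, Optional, Tuple

# (period, search index or None, correct-trial label or None)
Label = Tuple[str, Optional[int], Optional[str]]


def label_problem(outcomes: List[bool]) -> List[Label]:
    """Label every trial of one PS problem (hypothetical helper)."""
    labels: List[Label] = []
    n_correct = 0
    for i, rewarded in enumerate(outcomes):
        if n_correct == 0:
            # Search: every trial up to and including the first correct one
            # gets an index 1, 2, 3...; the first correct trial is also Cor1.
            labels.append(("search", i + 1, "Cor1" if rewarded else None))
        else:
            # Repetition: correct trials are numbered Cor2, Cor3, Cor4.
            labels.append(("repetition", None,
                           f"Cor{n_correct + 1}" if rewarded else None))
        if rewarded:
            n_correct += 1
    return labels


def problem_performance(outcomes: List[bool]) -> Tuple[int, int]:
    """Return (trials to solution, trials needed for three repetitions)."""
    to_solution = outcomes.index(True) + 1  # includes the first correct trial
    repeats = 0
    for j, rewarded in enumerate(outcomes[to_solution:], start=1):
        if rewarded:
            repeats += 1
            if repeats == 3:
                return to_solution, j
    raise ValueError("fewer than three repetitions in this problem")
```

For example, `problem_performance([False, False, True, True, True, True])` returns `(3, 3)`: the problem was solved at search index 3 and the three repetitions took three trials.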
Reaction times, movement times, and saccade reaction times were measured on each trial. Saccade reaction times were estimated by detecting the earliest peak velocity of the horizontal and vertical eye traces after FP offset. Each trial was delimited by start and end event codes, and the timing of fixation-break errors was computed with the start code as time 0 (Amiez et al., 2005).
The probability of being correct (i.e., of obtaining the reward) was calculated from the number of remaining choices at each trial index. When the monkey begins a problem, it has a one-in-four chance of being rewarded on the first trial (p = 0.25); the probability then increases after each error (index 2, p = 0.33; index 3, p = 0.5). During repetition, p = 1. However, if the monkey takes into account the rule that two consecutive problems never have the same solution, the probabilities are as follows: index 1, p = 0.33; index 2, p = 0.5; index 3, p = 1. In that case, search performance should average about two trials, which was the case for two of our monkeys.
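The arithmetic above, and the expected search length it implies, can be checked with a short sketch. It assumes an idealized searcher that never repeats an incorrect choice and picks uniformly among the remaining candidates; `reward_probabilities` and `expected_trials_to_solution` are hypothetical names.

```python
from fractions import Fraction
from typing import List


def reward_probabilities(n_targets: int = 4,
                         no_repeat_rule: bool = False) -> List[Fraction]:
    """P(reward) at search index 1, 2, ... for an idealized searcher.

    With the no-repeat rule, the previous problem's solution is excluded,
    leaving n_targets - 1 candidates at the start of a new problem.
    """
    n = n_targets - 1 if no_repeat_rule else n_targets
    # At 0-based index k, n - k candidates remain untested.
    return [Fraction(1, n - k) for k in range(n)]


def expected_trials_to_solution(probs: List[Fraction]) -> Fraction:
    """Mean search length (first correct trial included)."""
    expected = Fraction(0)
    p_reach = Fraction(1)  # probability of still searching at this index
    for k, p in enumerate(probs, start=1):
        expected += k * p_reach * p
        p_reach *= 1 - p
    return expected
```

Without the rule, the probabilities are 1/4, 1/3, 1/2, and 1, and the expected search length is 5/2 trials; with the rule, they are 1/3, 1/2, and 1, and the expectation drops to 2 trials, the benchmark referred to above.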
Custom programs [Spike2 (CED); MatLab (MathWorks, Natick, MA)] were used to extract the average activity in predefined epochs of each trial: three epochs covering the end of the trial and the delay (see below), the 400 ms epochs before and after the saccade, and the 200 ms epochs before and after the touch. Task-related cells were selected by visual inspection of peristimulus time histograms (PSTHs) and by ANOVA over these epochs.
In the PS task, the key neuronal events were expected to occur between the touch in one trial and the saccade of the next trial; we therefore focused on this interval and excluded activity recorded during target fixation and arm movement. Peristimulus histograms thus juxtapose the end of one trial and the beginning of the next (see Figs. 3, 5–8).
The average firing rates studied here were calculated in three epochs: the intertrial interval (ITI), the last 800 ms of the trial; the early delay, the first 1 s of the delay; and the late delay, the last 1 s of the delay period (Fig. 1d). In the DR task, only delay activity was analyzed. Delay activity was considered tonic if it exceeded by more than 2 SD the mean activity measured in the 600 ms preceding the feedback and in the 1000 ms preceding the next touch (in the DR task, search, or repetition), and remained above this threshold for at least 20 consecutive 50 ms bins (i.e., 1 s).
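A minimal sketch of the tonic-activity criterion follows. It assumes firing rates already averaged across trials into 50 ms bins, and it assumes the SD is taken across the bins of the pooled reference windows (the text does not specify whether the SD is computed across bins or across trials). `is_tonic` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import Sequence


def is_tonic(delay_bins: Sequence[float],
             baseline_bins: Sequence[float],
             n_sd: float = 2.0,
             min_run: int = 20) -> bool:
    """True if delay activity exceeds baseline mean + n_sd * SD for at
    least `min_run` consecutive bins (20 x 50 ms = 1 s by default).

    baseline_bins: pooled 50 ms bins from the reference windows
    (assumed here: 600 ms before feedback + 1000 ms before the next touch).
    """
    threshold = mean(baseline_bins) + n_sd * stdev(baseline_bins)
    run = 0
    for rate in delay_bins:
        run = run + 1 if rate > threshold else 0  # any dip resets the run
        if run >= min_run:
            return True
    return False
```

Requiring a run of consecutive bins, rather than a mean over the delay, keeps brief transients from qualifying as tonic activity.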
The activity of a neuron was classified as spatially selective when this activity was significantly modulated by the location of the target chosen by the animal (one-way ANOVA, p < 0.05).
The target preference of a neuron was determined by ranking the average delay activity in the DR task when this activity was significantly modulated by target location (a, b, c, and d, where a is the preferred and d the least-preferred target). When target location had no significant effect in the DR task, or when the neuron was recorded only in the PS task, we used the average delay activity recorded during the repetition period of the PS task. This ranking was then used to pool population data. For each cell, activity was normalized to the minimum and maximum activity measured in the repetition period of the PS task.
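The ranking and normalization steps might look like this in a minimal sketch. Location codes such as `"UL"` (upper left) and the function names are hypothetical; which data feed the ranking (DR delay or PS repetition delay) is decided upstream, as described above.

```python
from statistics import mean
from typing import Dict, List, Sequence


def rank_targets(delay_rates: Dict[str, Sequence[float]]) -> List[str]:
    """Order locations from preferred (a) to least preferred (d) by mean rate."""
    return sorted(delay_rates, key=lambda loc: mean(delay_rates[loc]),
                  reverse=True)


def normalize(values: Sequence[float], ref_min: float,
              ref_max: float) -> List[float]:
    """Min-max scaling to the range measured in the repetition period."""
    span = ref_max - ref_min
    if span == 0:
        raise ValueError("reference range is zero")
    return [(v - ref_min) / span for v in values]
```

For instance, `rank_targets({"UL": [10, 12], "UR": [4, 6], "LL": [20, 22], "LR": [1, 1]})` returns `["LL", "UL", "UR", "LR"]`, so LL would be target a for that cell.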
To investigate the consistency of a cell's target preference across search, repetition, and the DR task, we defined a consistent cell as one that had either (1) the same preferred target location across all significant periods, or (2) in the case of broad tuning, a consistent side preference (top, bottom, left, right) across all significant periods. To compare periods, we also calculated a preference index, (a − d)/(a + d), which expresses the difference in activity between the most (a) and least (d) preferred targets.
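The two comparison tools above can be sketched as follows. The side-sharing rule is an assumed simplification of “consistent side preference in case of broad tuning” (the text gives no operational rule), and the location codes and function names are hypothetical.

```python
from typing import Sequence

# Hypothetical location codes on the 2 x 2 target display.
SIDES = {
    "UL": ("top", "left"), "UR": ("top", "right"),
    "LL": ("bottom", "left"), "LR": ("bottom", "right"),
}


def preference_index(a: float, d: float) -> float:
    """(a - d) / (a + d): contrast between most and least preferred targets."""
    return (a - d) / (a + d)


def is_consistent(preferred: Sequence[str]) -> bool:
    """preferred: the preferred location in each significant period.

    Consistent if the same location is preferred everywhere or, as an
    assumed proxy for broad tuning, all preferred locations share a side.
    """
    if len(set(preferred)) == 1:
        return True
    shared = set(SIDES[preferred[0]])
    for loc in preferred[1:]:
        shared &= set(SIDES[loc])
    return bool(shared)
```

Here `preference_index(30.0, 10.0)` is 0.5, `is_consistent(["UL", "UR", "UL"])` is True (shared top side), and `is_consistent(["UL", "LR"])` is False.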
To study changes in spatial selectivity (tuning) across trials of the PS task, we used, for each cell, the average activity ranked according to preference (a, b, c, d) and calculated the norm of a preference vector: $H = (a + c) - (b + d)$, $V = (a + b) - (c + d)$, and $\text{norm} = \sqrt{H^2 + V^2}$.
The value of the preference vector norm was taken to reflect the strength of spatial coding of the cell. A norm of zero reflects equal activity for the four target locations (for example, see supplemental Fig. 2, available at www.jneurosci.org as supplemental material). This objective measure yields a single value for each trial type and each cell, which can be averaged across cells. We used an arbitrary spatial arrangement to calculate the vector: targets a to d were positioned on a square, with a at the top left corner and d at the opposite corner (supplemental Fig. 2, available at www.jneurosci.org as supplemental material); thus, H denotes the horizontal axis (from a and c to b and d) and V the vertical axis.
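The vector norm can be computed in a few lines. This sketch follows the arbitrary layout stated above (a top left, b top right, c bottom left, d bottom right); `preference_vector_norm` and `mean_norm` are hypothetical names, and the plain mean is only a descriptive summary, not the mixed-effects analysis used for statistics.

```python
import math
from statistics import mean
from typing import Iterable, Tuple


def preference_vector_norm(a: float, b: float, c: float, d: float) -> float:
    """Norm of the preference vector for ranked activities a, b, c, d."""
    h = (a + c) - (b + d)  # left column minus right column
    v = (a + b) - (c + d)  # top row minus bottom row
    return math.hypot(h, v)  # sqrt(h**2 + v**2)


def mean_norm(cells: Iterable[Tuple[float, float, float, float]]) -> float:
    """Average norm across cells for one trial type."""
    return mean(preference_vector_norm(*cell) for cell in cells)
```

Equal activity at the four targets gives a norm of 0, and ranked activities (4, 3, 2, 1) give H = 2, V = 4, and a norm of √20 ≈ 4.47.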
Most statistical analyses were performed with Statistica software (Statsoft, Tulsa, OK). Vector norms (base 2) were analyzed with linear mixed-effects models. To measure the global effect of trial type on the norm, cells were entered as a random effect. These analyses were done in R (R: A language and environment for statistical computing; R Foundation for Statistical Computing, Vienna, Austria; ISBN 3-900051-07-0; http://www.R-project.org).
Two animals were trained in the two cognitive tasks with eye and arm response requirements (Fig. 1). During recording sessions, the animals performed the DR task at 95% correct on average. In the PS task, monkeys rarely repeated an “incorrect” choice before finding the correct target and rarely failed to repeat the correct response after discovering it. Analysis of 239, 235, and 538 problems randomly chosen from the recording sessions revealed that the average number of trials to solution was 2.13 ± 1.06, 2.62 ± 1.19, and 2.29 ± 1.16 in the three monkeys, respectively (Procyk and Joseph, 1996) (see Materials and Methods for details on behavioral performance in PS tasks). The animals repeated the correct response almost perfectly (average number of trials to complete the three repetitions: 3.66 ± 1.46, 3.46 ± 1.29, and 3.07 ± 0.42, respectively). Although the animals' strategy for determining the correct target was highly efficient, the pattern of successive choices was not systematic. Analyses of series of choices during search periods revealed that monkeys used clockwise (e.g., choosing target up-left then up-right), counterclockwise, or crossing (going from one target to the opposite target in the display, e.g., from up-left to low-right) strategies, with a higher incidence of clockwise and counterclockwise strategies (percentages of clockwise, counterclockwise, crossing, and repeat transitions were 42, 33, 34, and 1% and 37, 40, 21, and 2% for the first two monkeys, and 36, 39, 24, and 1% for the third monkey; measured over 1420, 920, and 1233 transitions between two targets for the three monkeys, respectively). The animals could perform up to 120 problems per recording session.
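Transition classification of this kind can be sketched as follows, assuming the four targets are coded in clockwise screen order (`"UL"`, `"UR"`, `"LR"`, `"LL"`, hypothetical codes) and that `choices` holds consecutive choices within search periods; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical codes listed in clockwise order around the square.
CLOCKWISE = ["UL", "UR", "LR", "LL"]
STEP_NAMES = {0: "repeat", 1: "clockwise", 2: "crossing", 3: "counterclockwise"}


def classify_transition(prev: str, nxt: str) -> str:
    """Name the move from one chosen target to the next."""
    step = (CLOCKWISE.index(nxt) - CLOCKWISE.index(prev)) % 4
    return STEP_NAMES[step]


def strategy_percentages(choices: Sequence[str]) -> Dict[str, float]:
    """Percent of each transition type across consecutive choices."""
    counts = Counter(classify_transition(p, n)
                     for p, n in zip(choices, choices[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * k / total for name, k in counts.items()}
```

For the examples in the text, `classify_transition("UL", "UR")` returns `"clockwise"` and `classify_transition("UL", "LR")` returns `"crossing"`.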
As observed in previous studies, behavioral parameters changed markedly between search and repetition periods (Procyk and Joseph, 1996; Procyk et al., 2000). Both arm-movement and saccade reaction times differed between the two periods (Fig. 2a,b; supplemental Fig. 1, available at www.jneurosci.org as supplemental material). The transition occurred after the first reward of a problem.
A third animal was trained in a PS task in which the size of the reward varied randomly between large and small from problem to problem. The color of the starting lever indicated whether the reward would be small or large (see Materials and Methods). Reaction times did not differ significantly between the two conditions. However, reward size had a clear effect on fixation breaks (execution errors). Execution error rate is a powerful measure of reward expectancy (Shidara and Richmond, 2002; Amiez et al., 2005; Amemori and Sawaguchi, 2006). On average, the animal made more fixation-break errors late in trials when the expected reward was small (Fig. 2c). This counterintuitive result is clarified when the timing of errors is taken into account: when the large reward was expected, the animal was more likely to break fixation early in the trial than at the crucial time of the saccade toward the selected target. This reveals a tendency of the animal to concentrate more on completing the trial, and thus ensuring reward delivery, when the reward is large (Fig. 2d). Reward size also had a significant effect on the latency between lever onset and lever touch (longer for small reward; ANOVA, reward size for search trials: F(1,1761) = 6.6032; p = 0.01026). Thus, in these conditions, the animal discriminated between large and small rewards.
Recordings in DR and PS tasks
The findings presented here are based on 205 neurons recorded in the caudal DLPFC of two monkeys (Fig. 1) (see Materials and Methods). Of these cells, 82 expressed significant delay activity (i.e., tonic firing during the delay periods of the PS task, the DR task, or both, depending on the tasks in which each cell was recorded). As already reported in the literature, tonic activity showed various patterns: stable, decreasing, or increasing activity from the beginning to the end of the delay (Chafee and Goldman-Rakic, 1998; Rainer and Miller, 2002; Brody et al., 2003). We do not distinguish these types of activity in the present study.
In the PS task, our main focus was on the interval after a response, when the animal received feedback about the validity of its response and had to decide which target to choose next. This interval between two choices was divided into the last 800 ms of a trial (before a new trial began; ITI), the first 1000 ms of the next trial (early delay), and the last 1000 ms of the delay period before a new choice was made (late delay) (Fig. 1e). In the DR task, only the early and late delays, between cue offset and saccade, were used. We analyzed spatial selectivity during the repetition period: 64.3% of the 82 cells were spatially tuned (i.e., fired at significantly higher rates for preferred locations during the ITI or delay intervals; one-way ANOVA, p < 0.05).
A major issue addressed in this study is whether spatially selective delay-period responses revealed in a DR task were engaged during self-generated responses and were predictive of the animal's voluntary choices. Fifty-eight task-related cells were examined in both the DR and PS tasks, of which 36 delay neurons recorded for a sufficient number of trials were used to compare activity in the two tasks. Figure 3, a and b, shows two examples of neurons that exhibited spatially selective activity both during the delay period in the DR task and during late delay in the PS task. Importantly, although neurons were recorded under different task demands and at different times within a session, they displayed identical spatial selectivity.
We compared neural activity recorded in three periods: the DR task, and the search and repetition periods of the PS task. The analysis (two-way ANOVA, touch by period) showed that the activity of 72% (26 of 36) of cells was influenced by period. Figure 4a shows the activity of six cells for the four target locations and the three periods. It illustrates that, although activity can change between periods, the cells retain their spatial selectivity. A case-by-case examination (see Materials and Methods) revealed that the majority of cells maintained their target preferences. Eighty percent of cells (28 of 36) retained their preferred target a between repetition and search, and between repetition and the DR task (Fig. 4b). Twenty cells (77%) had the same preferred target in search, repetition, and the DR task. In a few cases, cells lost the statistically significant differences between locations in at least one period (for instance, during repetition).
The overall behavior of DLPFC delay cells was evaluated from normalized population activity. The target preference of each cell was defined by ranking, from (a) to (d), the average delay activity for the four target locations chosen by the monkey (see Materials and Methods), with (a) representing the highest activity (Figs. 3, 5, the population average). Figure 5a clearly illustrates the reliability of target preference across tasks: on average, target rankings did not change, and the preferred and least-preferred targets remained in the first and last ranks at the population level. The population average also reveals that delay activity in this part of the DLPFC stopped when the target was selected by gaze; very little activity was found before arm movements. We evaluated spatial selectivity with a preference index based on the average activity for the most (a) and least (d) preferred targets (see Materials and Methods). Individual values changed little between repetition and DR and were more variable in search than in the other periods (Fig. 5b).
Neural coding during the PS task
These data raise another issue concerning the information coded by delay activity during the PS task. As reported in the literature, delay activity can reflect sensory information and/or motor plans and can contain information about previous choices (Quintana et al., 1988; Funahashi et al., 1993; Romo et al., 1999; Constantinidis et al., 2001; Takeda and Funahashi, 2002; Barraclough et al., 2004). During the PS task, activity could represent either the location of the previously selected target (memory of the previous choice) or the location of the forthcoming choice. To distinguish between these possibilities, we analyzed activity in search trials in which the target chosen in the previous trial (n − 1, an incorrect choice) differed from the target chosen in the current trial (n) (Fig. 6a). The top diagram shows averages calculated by sorting trials according to the position of the target (a, b, c, or d) selected in the previous trial (n − 1) (downward arrow). In the bottom diagram, trials were sorted according to the upcoming choice, in trial n. If delay activity contained information about the previously chosen target, we should find spatial selectivity during the delay when sorting trials by previous target location. However, targets were discriminated only when trials were sorted according to the forthcoming choice, reflecting the predominance of prospective coding. ANOVA on early and late delay activity revealed that the upcoming choice had an effect on the majority of cells (46 cells; 56% of 82 cells). Note that these statistical results are not directly comparable with those obtained for spatial selectivity (see Recordings in DR and PS tasks), because the present analysis considers only activity during search periods. Figure 6b shows the p values (ordinate) obtained with these ANOVAs; all points below the horizontal line represent statistically significant effects (p < 0.05).
The inset compares p values for both tests (upcoming vs previous choice) and shows that, for a large majority of tests, p values were lower for the upcoming choice. In only seven cells did the previous choice have a greater effect than the upcoming choice. Effects were even weaker for ITI activity. Although we did not analyze in depth the effects on the different types of delay activity (decreasing, increasing, stable), tests on early and late delay differed slightly: search versus repetition effects were most frequent in the early delay, and significant prospective coding was slightly more frequent in the late delay (ANOVA, p < 0.05; next choice: early delay, 32.9%, late delay, 43.4% of cells; previous choice: early delay, 9.8%, late delay, 11% of cells).
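The logic of the previous-versus-upcoming comparison can be illustrated with a descriptive sketch: group delay rates by either label and compare how strongly the group means differ. The study itself used ANOVAs; here the max-minus-min spread of group means is a simpler stand-in, and the trial tuple layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# (target chosen in trial n - 1, target chosen in trial n, delay rate in n)
Trial = Tuple[str, str, float]
PREVIOUS, UPCOMING = 0, 1


def group_means(trials: Sequence[Trial], key: int) -> Dict[str, float]:
    """Mean delay rate per target, sorted by previous or upcoming choice."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for t in trials:
        if t[PREVIOUS] != t[UPCOMING]:  # keep only trials with a change of target
            groups[t[key]].append(t[2])
    return {loc: mean(rates) for loc, rates in groups.items()}


def spread(means: Dict[str, float]) -> float:
    """Max minus min of the per-target means (0 = no discrimination)."""
    return max(means.values()) - min(means.values())
```

For a cell that codes the upcoming choice, `spread(group_means(trials, UPCOMING))` is large, while `spread(group_means(trials, PREVIOUS))` is smaller, because sorting by previous choice mixes trials with different upcoming targets.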
Our data reveal the contribution of delay activity to a prospective function during self-organized behavior and therefore support the hypothesis that the DLPFC plays a key role in maintaining information for prospective use (Passingham and Sakai, 2004).
Search versus repetition
We assessed whether the activity of delay cells for the different choices made by the animal changed between the search and repetition periods of the PS task (two-way ANOVA, target location by period, for three epochs) (see Materials and Methods). Period, alone or in interaction with target location, significantly affected neural activity in 58% of cells (n = 48). Activity was often modulated between the two periods but rarely disappeared completely. Figure 3, c and d, illustrates two cases: one in which the sustained activity observed from the end of the trial until the saccade in search diminished in repetition (Fig. 3c), and another in which activity was modulated between search and repetition, particularly at the end of trials (i.e., during the 3 s after a touch) (Fig. 3d). For another example, see Figure 8.
A change in the level of neural activity between search and repetition clearly appears at the population level (Fig. 5a). Although target preference did not change, spatial selectivity (defined by the differences between the average activities for the four targets) did (Fig. 5b). The main difference between search and repetition activity was a higher firing rate during search at the end of and between trials (i.e., after a response, when the animal had to decide on the next move) (Fig. 5c) (ANOVA, p < 0.05; ITI, early delay, and late delay: 39, 35.5, and 25.5% of cells, respectively). Note that during search, the onset of delay activity occurred shortly after the time at which a reward would have been delivered had the trial been correct (∼200 ms after reward time).
Evolution during the PS task
At the population level, changes in amplitude and spatial selectivity were maximal before the first reward delivery and decreased during repetition. The population PSTH for successive trials (82 neurons) revealed (1) increased discrimination of the four targets just before discovery of the correct target, and (2) lower delay activity in the repetition than in the search period (Fig. 7a). Note that these overall changes across successive trials were significant for targets a and b only (one-way ANOVA). To better understand this phenomenon, we need to examine what happens in the course of search periods (i.e., before the first correct trial). This requires analyzing the activity of each cell in the different types of search trials (successive indices; see Materials and Methods) and for each target. This was possible for 17 cells, for which enough trials were recorded to include search trials with indices 1–3 for all four targets (supplemental Fig. 2a,b, available at www.jneurosci.org as supplemental material). The average activity during the early delay is represented for each trial type and each chosen target to reveal changes in spatial selectivity during a problem. To illustrate and quantify these changes, we calculated for each cell a preference vector, whose norm reflects the strength of spatial selectivity (see Materials and Methods) (supplemental Fig. 2a,b, available at www.jneurosci.org as supplemental material). The norm was calculated for each trial type. The average norm for the 17 cells increased significantly during the search period, with a maximum at index 3 (i.e., just before discovery of the correct target) (linear mixed-effects model; effect of trial type F = 3.81, p = 0.0038) (Fig. 7b).
Compared with the first trial (index 1), only the second and third trials showed significantly increased spatial selectivity. A clear transition was observed between index 3 and the second correct trial (supplemental Fig. 2c, available at www.jneurosci.org as supplemental material). Thus, spatial selectivity increased during the trial-and-error process, and these changes ceased as soon as the monkey obtained the first reward.
Activity patterns could either remain stable during repetition (denoting a “routine” state of activity) or change across successive repetitions. To address this, we recorded 14 cells during a PS task with five repetition trials. Thirteen cells showed a significant search versus repetition effect. A repeated-measures ANOVA was significant for five cells, but only two showed a consistent increase or decrease in activity across repetitions. Although the repetition period was short (n = 5), these data support the idea of two levels of activation of prefrontal cells, during nonroutine (search) and routine (repetition) behaviors, as if there were a clear functional switch between the two periods.
Reward expectations during PS task
Reward expectation is a key aspect of our experiments and we used two strategies to further investigate this issue.
First, changes in firing rate and target selectivity during search could reflect an increase in pure reward expectation (i.e., they would then be proportional to the probability of reward delivery). For the first two monkeys, we estimated the probability of obtaining a reward for each trial type from the behavioral records of the neural recording sessions (395 and 188 problems, respectively). The theoretical probabilities for optimal performance are 0.33, 0.5, 1, and 1 for trial indices 1, 2, 3, and repetition, respectively (see Materials and Methods). Average probabilities estimated from the behavioral data were 0.24, 0.46, and 0.6 for trial indices 1, 2, and 3, respectively. Spatial selectivity did not follow reward probability: plotting average vector norms against the probability of reward shows a maximum at p = 0.6 and a decrease during repetition (Fig. 7c). Thus, changes in spatial selectivity during the PS task are not purely dependent on reward probability.
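Empirical reward probabilities of this kind can be estimated from the search length of each problem, as in this minimal sketch: the probability at index k is the number of problems solved at k divided by the number still unsolved when index k began. The function name is hypothetical, and the input is assumed to be one trials-to-solution value per problem.

```python
from collections import Counter
from typing import Dict, Sequence


def empirical_reward_probability(trials_to_solution: Sequence[int],
                                 max_index: int = 3) -> Dict[int, float]:
    """P(reward | search index k) = solved at k / problems reaching k."""
    solved_at = Counter(trials_to_solution)
    probs: Dict[int, float] = {}
    for k in range(1, max_index + 1):
        reached = sum(1 for n in trials_to_solution if n >= k)
        if reached:  # skip indices that no problem reached
            probs[k] = solved_at[k] / reached
    return probs
```

With search lengths `[1, 2, 2, 3, 3, 3, 4, 4]`, the estimates are 1/8, 2/7, and 3/5 for indices 1 to 3.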
Second, in a third monkey, 14 delay cells among 66 task-related prefrontal neurons were recorded during the PS task with reward size variation (see Materials and Methods). One example is shown in Figure 8, a and b. Its delay activity was strongly modulated between search and repetition (Fig. 8a) and also within the search as the animal approached the solution (Fig. 8a, right), whereas reward size had little effect on the modulations observed between search and repetition (Fig. 8b). Statistics on the single and combined effects of reward size, target position, and period (search vs repetition) show that the main parameters influencing firing rates were target position and period. The statistics for all cells, performed on ITI, early delay, and late delay, confirm that spatial selectivity and search versus repetition differences influence firing rates more than reward size does (Fig. 8c). These data argue against reward expectation as a confounding factor for the modulation of delay activity during the PS task.
We show that (1) prefrontal neurons exhibiting delay activity in a traditional delayed response task are similarly engaged when an animal makes voluntary choices in a problem-solving task, and the spatial selectivity of these neurons remains constant across tasks; (2) prefrontal neurons exhibit rapid plasticity in their tuning widths (not in their tuning preference per se) that correlates with changes in behavioral control. Thus, the characteristics of spatially selective prefrontal activity are task independent, but the depth of their spatial tuning can be modulated over the course of flexible self-organized behavior; (3) these modulations do not depend on pure expectation or prediction of reward but could relate to outcome uncertainty.
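The distinction between tuning preference and tuning depth can be made concrete with a vector-sum measure. The sketch assumes that the "vector norm" is a normalized vector sum of firing rates over target directions, a common circular-statistics measure; the exact definition used for Fig. 7c is not restated here, and the target angles and rates are hypothetical.

```python
import math


def tuning_vector(rates, angles_deg):
    """Return (preferred direction in degrees, normalized vector norm).

    The norm is |sum_i r_i * u_i| / sum_i r_i, where u_i is the unit vector
    pointing to target i. It is 1 when only one target evokes firing and 0
    when the weighted vectors cancel out (e.g., uniform firing).
    """
    total = sum(rates)
    if total == 0:
        return float("nan"), 0.0
    x = sum(r * math.cos(math.radians(a)) for r, a in zip(rates, angles_deg))
    y = sum(r * math.sin(math.radians(a)) for r, a in zip(rates, angles_deg))
    preferred = math.degrees(math.atan2(y, x)) % 360
    return preferred, math.hypot(x, y) / total


angles = [0, 90, 180, 270]       # hypothetical target layout
tuned = [10, 20, 10, 5]          # toy rates (spikes/s)
broad = [r + 10 for r in tuned]  # same profile plus untuned firing

print(tuning_vector(tuned, angles))  # preferred ~90 deg, norm ~0.33
print(tuning_vector(broad, angles))  # preferred ~90 deg, norm ~0.18
```

Because the norm is divided by total firing, a purely multiplicative gain change leaves it unchanged, whereas adding untuned activity lowers the tuning depth without shifting the preferred direction.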
Previous experiments demonstrated the capacity of prefrontal unit activity to encode sensory attributes, the direction of impending movements, and sequential information (Barone and Joseph, 1989; Funahashi et al., 1993; Constantinidis et al., 2001; Brody et al., 2003). Prefrontal activity has also been found to reflect prospective coding in delayed paired-association tasks (Rainer et al., 1999; Brody et al., 2003).
In contrast to the DR task, the PS task does not require active memory of past events, but rather requires facilitation (or activation) of a freely selected action plan and, as a consequence of a competitive process, inhibition of past choices. The present data indicate that delay activity in the PS task primarily encodes the decision and/or plan for the next response, not the location of the previous target. The rapid increase of spatially tuned activity observed after an incorrect choice during search periods presumably relates to the monkey's decision about its impending choice. Similar phenomena have been described by Fukushima et al. (2004), who showed that most prefrontal delay activity represented saccade directions while monkeys performed a sequential target-shift task, in which they had to update the target position internally and sequentially whenever a nonspatial target-shift cue was presented. These authors showed a flexible update of target position encoding after nonspatial target-shift cues. They reported that the directional preference of each delay activity remained constant and that delay activity appeared whenever, after target position updating, the direction of the saccade matched the preferred direction of the neuron.
By analogy with the delay period of a DR task, the epoch after a response in the PS search period is an information “black-out”: it contains no explicit external information about the correct target or about the target to choose on the following trial, which must be supplied by the animal's own volition. The abrupt increase in spatially tuned activity is presumably the equivalent of the mnemonic response in the DR task, except that here it is triggered by a self-generated internal “cue.” That delay activity essentially represents a prospective movement or movement goal is supported by the fact that the tuning of the average delay activity predicts the animal's response.
It could be argued that the response modulations we observed are caused by changes in rule encoding, in line with the proposal that prefrontal neurons encode behavioral rules (Miller and Cohen, 2001; Wallis et al., 2001; Genovesio et al., 2005). The DR and PS tasks in the present study involve different rules. In the DR task, the animal has to hold in mind an externally given stimulus position that differs on every trial. The PS task, in contrast, engages a “win-stay/lose-shift” strategy. Nevertheless, despite the change in task rules, the spatial preference and tonic firing of the cells remained constant. Preservation of spatial selectivity between a visual conditional task and a spatial task has been reported by White and Wise (1999), although in that study a rule effect was found in a majority of the recorded cells. In the present case, rule encoding also fails to explain modulations within the PS task, because (1) search and repetition are driven by the same rule, “reward/stay, no-reward/shift,” and (2) the same rule applies throughout the search period. Together with findings on abstract rule encoding in the prefrontal cortex, our data open the possibility that generic rules are implemented by specific neurons as required by particular task domains (e.g., spatial tasks vs visual tasks).
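The argument that search and repetition share one rule can be illustrated by simulating a PS problem with a single policy. The sketch below is a hypothetical model, not the authors' task code: it assumes four targets, excludes the previous problem's solution at the start of a new problem (consistent with three candidates on the first search trial), and uses win-stay/lose-shift with memory of rejected targets (a slight extension of strict win-stay/lose-shift, which only avoids the last choice).

```python
import random

N_TARGETS = 4  # assumption: four spatial targets


def run_problem(correct, previous_solution, n_repetitions, rng):
    """Play one problem with a single win-stay/lose-shift policy.

    Returns a list of (period, choice, rewarded) tuples. The same rule is
    applied on every trial; "search" and "repetition" are only labels
    derived from whether the solution has already been found.
    """
    if correct == previous_solution or n_repetitions < 1:
        raise ValueError("need correct != previous_solution and n_repetitions >= 1")
    rejected = set() if previous_solution is None else {previous_solution}
    last_choice, last_rewarded = None, False
    history = []
    n_done = 0
    while n_done < n_repetitions:
        if last_rewarded:
            choice = last_choice  # win -> stay
        else:
            remaining = [t for t in range(N_TARGETS) if t not in rejected]
            choice = rng.choice(remaining)  # lose -> shift to a non-rejected target
        rewarded = choice == correct
        period = "repetition" if last_rewarded else "search"
        if period == "repetition":
            n_done += 1
        if not rewarded:
            rejected.add(choice)
        history.append((period, choice, rewarded))
        last_choice, last_rewarded = choice, rewarded
    return history


rng = random.Random(1)
for trial in run_problem(correct=2, previous_solution=0, n_repetitions=3, rng=rng):
    print(trial)
```

Under these assumptions the search never exceeds three trials and every repetition trial is rewarded, even though the code contains no separate branch for either period.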
Although the spatial specificity of cells is maintained across tasks, the spatial discrimination power of the response varies, in particular within the search period of the PS task. Reward expectation or reward consumption could cause these variations, because several studies have shown that reward quality or quantity can influence delay activity in the prefrontal cortex (Watanabe, 1996; Leon and Shadlen, 1999; Amemori and Sawaguchi, 2006). The difference between search and repetition measured during the ITI may well result from reward consumption: after correct responses, attention would presumably be focused on the reward rather than on the next response. More generally, the firing of prefrontal delay cells can relate to predicted behavioral outcomes or to the predictability of forthcoming movement selections (Quintana and Fuster, 1992; Watanabe, 1996), as well as to previous choice outcomes, as shown in a competitive game-like task (Barraclough et al., 2004).
Reward expectation obviously plays some role in the increase of selectivity before the first reward; reward is a major element of the task, especially in the search period. However, information purely related to reward probability does not explain all the variations in DLPFC activity. At the population level, reward probability can explain neither the similarities between the DR task and the search period nor the differences between the DR task and repetition: the level of expectation is similar in the DR task and in repetition, and differs in the search period. Variation in the size of the reward expected by the monkey does not induce delay activity changes comparable with those observed across trial types, a result that has also been observed for planning-related activity (Kobayashi et al., 2002; Amemori and Sawaguchi, 2006). Finally, pure expectation (directly related to reward probability) does not account for activity changes between the last trials of the search period (index 3) and the other correct trials: if the amplitude of activity were related to reward probability, activity should remain elevated on the following repetition trials, during which reward expectation is maximal (i.e., the probability of obtaining a reward is p = 1). Thus, modulation of prefrontal activity during the PS task does not simply reflect an anticipatory reward-related bias such as that reported in the caudate nucleus (Lauwereyns et al., 2002). However, the DLPFC may integrate some dimensions of reward expectation (Kobayashi et al., 2002; Amemori and Sawaguchi, 2006) with other factors involved in learning and behavioral control. One candidate is uncertainty.
Dopamine modulates delay activity in the prefrontal cortex (Williams and Goldman-Rakic, 1995; Williams and Castner, 2006). Putative dopaminergic neurons of the midbrain can show sustained activity that varies with the uncertainty of obtaining a particular reward (Fiorillo et al., 2003). This signal might contribute to modulating PFC delay activity in the PS task and could explain the higher activity during search. The modulations we report here do not completely fit uncertainty, which should be maximal at p = 0.5 (Fig. 8), but note that on trial index 3 the animal often faces a choice between the two remaining targets (i.e., an alternative with maximal uncertainty about the outcome). Moreover, this choice is critical because an outcome at p = 0.5 carries the maximal amount of information (one bit) about the next choice. In this context, our findings are compatible with the hypothesis that the demands on dopamine for prefrontal modulation change with the engagement of working memory and executive functions (Williams and Castner, 2006).
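The claim that uncertainty peaks at p = 0.5, where a binary outcome carries one bit, can be checked with two standard measures for a reward/no-reward outcome: its variance p(1 - p) and its Shannon entropy in bits. The sketch treats each trial outcome as binary and plugs in the behavioral probability estimates reported above; which measure best matches the dopaminergic signal is an assumption left open here.

```python
import math


def outcome_variance(p):
    """Variance of a Bernoulli (reward / no reward) outcome."""
    return p * (1 - p)


def outcome_entropy_bits(p):
    """Shannon entropy of a binary outcome in bits; 0 when the outcome is certain."""
    if p in (0, 1):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Behavioral estimates from the text; repetition uses the theoretical p = 1.
for label, p in [("index 1", 0.24), ("index 2", 0.46), ("index 3", 0.6), ("repetition", 1.0)]:
    print(f"{label:>10}: p={p:.2f} variance={outcome_variance(p):.3f} "
          f"entropy={outcome_entropy_bits(p):.3f} bits")
```

Both measures are maximal at p = 0.5 and vanish when the outcome is certain, which is why a signal tracking uncertainty alone would predict low activity during repetition.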
Different theories invoke sustained activity and its variations as a major element of DLPFC function (Goldman-Rakic, 1998; Miller and Cohen, 2001). Increased working-memory load, higher cognitive control, and attentional selection are concepts widely used to interpret task-dependent modulations of prefrontal activity (Miller and Cohen, 2001; Leung et al., 2002; Kerns et al., 2004). Note that these concepts are closely related. In fact, an executive function (including working memory) can be thought of as a special case of attentional phenomena (Barkley, 2001) and, conversely, attention can be seen as “a varying reflection, in behavior, of the operation of a single underlying mechanism of cognitive control” (Miller and Cohen, 2001). Various dimensions of attentional control and selection have been put forward as the main functions of the PFC (Rowe and Passingham, 2001; Lebedev et al., 2004).
Our descriptions of delay activity modulation can be related to increases in PFC activity seen in human brain-imaging studies during free random selection, learning, highly demanding working-memory tasks, and conflict situations, as opposed to lower activation for well-known or overpracticed motor responses (Jueptner et al., 1997b; Carter et al., 1998; Petersen et al., 1998; Leung et al., 2002). Similar tasks have also revealed the collaborative role of the ACC during behavioral adaptation, and we have described changes in ACC neuronal activity during the PS task with dynamics similar to those observed here in the DLPFC (Procyk et al., 2000).
Changes in delay activity observed in the three conditions (DR, search, and repetition) might relate to the different task demands. It is likely that the DR task and the search period of the PS task are the situations in which working memory is most heavily engaged. Increased involvement of this process is particularly important for dealing with unfamiliar or challenging situations, in contrast to routine ones. This interpretation would explain why the strongest and most selective activity is observed at the end of the search (Figs. 7, 8), when inhibiting previous incorrect choices and keeping the next potentially correct one in mind is critical. High control of behavior through spatially selective delay activity would facilitate the selection and expression of appropriate responses, suppress (or win the competition against) inappropriate or prepotent ones, and contribute to resisting interference (Goldman-Rakic, 1998; Miller and Cohen, 2001; Sakai et al., 2002). This phenomenon might also take place during visual attention tasks in which filtering of visual information has been described (Everling et al., 2002). The adaptive properties of prefrontal delay activity are expected to reflect the basic mechanisms by which these neurons influence connected structures. In this framework, the present data reflect variations in top-down signals devoted to maintaining and biasing internal representations to guide goal-oriented behavior (Goldman-Rakic, 1998; Sakai et al., 2002; Cohen et al., 2004).
This work was supported by Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Fyssen foundation, and the NRJ foundation (E.P.). This paper is dedicated to the memory of P. S. Goldman-Rakic. We thank H. Kennedy for his help and comments on this manuscript and R. Quilodran, L. Sartorius, and B. Haider for their help during experiments. Previous editions of this manuscript greatly benefited from the help of C. Constantinidis, B. Haider, J. P. Joseph and R. E. Passingham. This manuscript, including discussion, has been modified from the previous versions written by both P.S.G.-R. and E.P.
Correspondence should be addressed to Emmanuel Procyk, Institut National de la Santé et de la Recherche Médicale, U371, Stem Cell and Brain Research, Department of Integrative Neuroscience, 18 Avenue Doyen Lépine, 69500 Bron, France.