Abstract
The frontal eye field (FEF) plays a well-established role in the control of visual attention. The strength of an FEF neuron's response to a visual stimulus presented in its receptive field is enhanced if the stimulus captures spatial attention by virtue of its salience. A stimulus can be rendered salient by cognitive factors as well as by physical attributes. These include surprise. The aim of the present experiment was to determine whether surprise-induced salience would result in enhanced visual-response strength in the FEF. Toward this end, we monitored neuronal activity in two male monkeys while presenting first a visual cue predicting with high probability that the reward delivered at the end of the trial would be good or bad (large or small) and then a visual cue announcing the size of the impending reward with certainty. The second cue usually confirmed but occasionally violated the expectation set up by the first cue. Neurons responded more strongly to the second cue when it violated than when it confirmed expectation. The increase in the firing rate was accompanied by a decrease in spike-count correlation as expected from capture of attention. Although both good surprise and bad surprise induced enhanced firing, the effects appeared to arise from distinct mechanisms as indicated by the fact that the bad-surprise signal appeared at a longer latency than the good-surprise signal and by the fact that the strength of the two signals varied independently across neurons.
Significance Statement
Neurons in the macaque frontal eye field (FEF) respond strongly to cues predicting rewards. Their firing might be related to the representation of value or to the capture of attention. We set out to resolve this issue by monitoring neuronal responses to cues conveying surprising information about upcoming rewards. We find that surprising cues elicit enhanced responses regardless of whether the surprise is good (more reward than expected) or bad (less reward than expected). These findings, which constitute the first evidence for surprise-driven activity in the FEF, are most harmonious with an interpretation based on capture of attention.
Introduction
In seminal studies carried out during the 1990s, Schultz and colleagues discovered that midbrain dopaminergic neurons encode events that violate reward expectation in a signed manner, responding with excitation and suppression to positive and negative errors, respectively (Schultz, 1998). They proposed, in line with classical learning theory, that the resulting variation in release of dopamine acted in the striatum and other target structures to mediate operant and pavlovian conditioning (Schultz et al., 1997). In other subcortical nuclei, neurons carry unsigned reward–surprise signals, responding strongly to cues that announce surprising outcomes whether good or bad (Lin and Nicolelis, 2008; Matsumoto and Hikosaka, 2009; Bromberg-Martin et al., 2010; Roesch et al., 2010; Li et al., 2011; Esber et al., 2012; Roesch et al., 2012; Monosov et al., 2015). Unsigned signals carried by these neurons might contribute to learning by enhancing the associability of unexpected events (Mackintosh, 1975; Pearce and Hall, 1980; Esber and Haselgrove, 2011) or by mediating capture of attention through the orienting reflex (Nieuwenhuis et al., 2011).
Numerous subsequent studies have been carried out to determine whether neurons in the cerebral cortex also respond to surprising reward-related events. Areas subjected to scrutiny include the supplementary eye field, anterior and posterior cingulate cortex, dorsolateral prefrontal cortex, and orbitofrontal cortex (Tremblay and Schultz, 2000; Ito et al., 2003; McCoy et al., 2003; Amiez et al., 2005; Matsumoto et al., 2007; Seo and Lee, 2007; Kennerley et al., 2009, 2011; Kennerley and Wallis, 2009; Takahashi et al., 2009; Sul et al., 2010; Asaad and Eskandar, 2011; Bryden et al., 2011; Hayden et al., 2011; So and Stuphorn, 2012; Klavir et al., 2013; Kawaguchi et al., 2015; Khamassi et al., 2015; Asaad et al., 2017; Monosov, 2017; Stalnaker et al., 2018; Oemisch et al., 2019). The resulting body of findings is complex and inconsistent with regard to both the occurrence and the sign of reward prediction error signals.
The aim of the present study was to characterize responses elicited by surprising reward-related events in the macaque frontal eye field (FEF). Neurons of the FEF respond to visual stimuli in restricted receptive fields and are active in conjunction with planning and executing saccades to the corresponding locations (Bruce and Goldberg, 1985; Thompson et al., 1996; Mayo et al., 2016). The strength of the visual response is enhanced by attention directed to the stimulus (Wurtz and Mohler, 1976; Thompson et al., 2005). Consequently, the FEF is considered to embody a salience map in which neurons representing a given location fire at a rate proportional to its current salience (Moore et al., 2003; Thompson and Bichot, 2005; Fecteau and Munoz, 2006; Katsuki and Constantinidis, 2012; Fernandes et al., 2014). We tailored the design of the present experiment to these known properties of FEF neurons. On each trial, we presented, within the neuronal receptive field (RF), first a visual cue predicting with high probability that the reward delivered at the end of the trial would be large or small and then a visual cue announcing its size with certainty. The second cue usually confirmed but occasionally violated the expectation set up by the first cue. The violation induced surprise, which could be either good or bad (more or less reward than expected). On the assumption that a cue violating expectation captures attention, we hypothesized that FEF neurons would respond with an increase in the firing rate to cues conveying both good and bad surprise. The results confirmed this hypothesis. Numerous neurons responded to surprising cues with elevated firing, whereas instances of reduced firing were no more common than expected by chance. The increase in the firing rate was accompanied by a decrease in spike-count correlation as expected from capture of attention (Cohen and Maunsell, 2011). The effects of good and bad surprise appeared to arise from distinct mechanisms insofar as they appeared at different latencies and varied in strength across the neuronal population with a considerable degree of independence.
Materials and Methods
Subjects
Two adult male rhesus macaques (Macaca mulatta) were the subjects of this study: M1 (12 kg, laboratory designation R) and M2 (6 kg, laboratory designation S). All procedures were in accordance with guidelines set forth by the United States Public Health Service Guide for the Care and Use of Laboratory Animals and were approved by the Carnegie Mellon University IACUC (Institutional Animal Care and Use Committee).
Data collection procedure
In each monkey, before the period of data collection, we implanted a cylindrical Cilux recording chamber with an inner diameter of ∼2 cm (Crist Instrument) over the right frontal lobe. On each recording day, we inserted a cylindrical plug containing guide holes in a square grid at 1 mm spacing into the chamber. We then inserted a dura-penetrating guide tube through a selected guide hole. We advanced an electrode through the guide tube into the cortex under control of a hydraulic micromanipulator (MO-10; Narishige International). During different sessions, the electrode was either a varnish-coated tungsten microelectrode (FHC) with an impedance of 0.2–5.0 MΩ at 1 kHz or a 16-channel linear microelectrode array (U-Probe, Plexon) with contacts spaced 150 μm apart on one side of the shank. Following insertion of the multichannel array, we allowed an hour to pass before we commenced recording.
Stimulus presentation, eye-position monitoring, and reward delivery were controlled by a PC running NIMH Cortex. All stimuli were presented on a 21″ LCD monitor with a resolution of 1,024 × 768 pixels and a refresh rate of 60 Hz at a viewing distance of 23 cm. Eye position and pupil diameter were monitored by means of an ISCAN ETL-200 video oculographic system (ISCAN). Continuous electrophysiological signals, eye-position and pupil-diameter signals, and time-stamped event markers generated by NIMH Cortex were stored on a Plexon OmniPlex system (Plexon). In calibration sessions, we collected oculographic data while monkeys maintained fixation on targets displaced from the center of the screen in cardinal directions at known eccentricity.
Reward–surprise task
This task required the monkeys to maintain fixation on a central achromatic spot with a diameter of 0.3° of visual angle, while two reward-predicting visual cues (natural images subtending 4° of visual angle and centered 11° from fixation) were presented in sequence peripherally followed by delivery of either a large (M1, 1.3 ml; M2, 0.8 ml) or a small (M1, 0.06 ml; M2, 0.04 ml) juice reward (Fig. 1A). The ∼20:1 ratio between large and small rewards ensured that the difference was meaningful to both subjects. The monkey initiated each trial by attaining fixation. After 500 ms, Cue 1 appeared for 300 ms. The identity of this cue predicted with high probability (p = 0.83) the size of the reward to be delivered at the end of the trial. A 600 ms interstimulus interval ensued. Then Cue 2 appeared for 300 ms at the same location. The identity of this cue predicted with certainty (p = 1) whether the reward delivered at the end of the trial would be large or small. We refer to this cue as “announcing” the size of the reward so as to indicate that the prediction was certain. After a delay of 600 ms, the fixation spot was extinguished, and the juice reward was delivered. Reward delivery was contingent on the monkey's having maintained fixation throughout the trial.
Figure 1. Reward–surprise task. On each trial, the monkey maintained central fixation while two visual cues were presented in sequence in the neuronal RF. Cue 1 indicated that a large or small reward was likely on this trial (p = 0.83). Cue 2 announced with certainty that the reward would be large or small. Reward was delivered after a further delay. The information conveyed by Cue 2 might be good and expected (A), good and surprising (B), bad and expected (C), or bad and surprising (D). Two-drop and one-drop icons are provided solely for the reader. On occasional trials involving a good outcome (E) or a bad outcome (F), Cue 2 announced the outcome predicted by Cue 1 but had the form of an image that did not usually follow Cue 1. During a full session, Cue 1 could be any of four images and likewise for Cue 2. The degree to which Cue 2 elicited surprise was controlled by the frequency with which particular images in leading and trailing position were paired. In the matrix of pairings (G), the entry in each cell indicates the number of trials during a standard session in which a given Cue 1 (column) was followed by a given Cue 2 (row).
A full run consisted of six to eight successive blocks each consisting of 24 successfully completed trials. Within a block, there were four trials conforming to each condition identified as “expected” in Figure 1G and one trial conforming to each condition identified as “surprising.” The counts in Figure 1G indicate the number of trials conforming to each condition completed in a run of six blocks. The within-block sequence of conditions was random. Two leading images were associated with large reward and two with small reward and likewise for trailing images. This feature allowed dissociating the reward associations of the images from their physical properties. It also allowed measuring responses to image-based surprise in the absence of reward-based surprise (“surprising image” conditions in Fig. 1G). To further rule out any effect of the cues’ physical properties, we counterbalanced their associations across monkeys so that cues predicting or announcing a large reward in M1 predicted or announced a small reward in M2 and vice versa.
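The block structure described above can be sketched in a few lines of Python. This is an illustration only: the image labels and the specific pairings in `PAIRINGS` are hypothetical placeholders (the actual pairings are those of the matrix in Fig. 1G), and only the trial counts per condition are taken from the text.

```python
import random
from collections import Counter

# Hypothetical image indices: images 1-2 signal a large reward, 3-4 a small one.
REWARD = {1: "large", 2: "large", 3: "small", 4: "small"}

# Assumed pairings (placeholders, not the published matrix). For each Cue 1:
# (usual Cue 2, same-reward "surprising image", opposite-reward "surprising reward").
PAIRINGS = {1: (1, 2, 3), 2: (2, 1, 4), 3: (3, 4, 1), 4: (4, 3, 2)}

def make_block(rng):
    """Return one randomly ordered block of 24 (cue1, cue2, condition) trials."""
    trials = []
    for cue1, (usual, odd_image, odd_reward) in PAIRINGS.items():
        trials += [(cue1, usual, "expected")] * 4          # four expected trials
        trials.append((cue1, odd_image, "surprising image"))
        trials.append((cue1, odd_reward, "surprising reward"))
    rng.shuffle(trials)                                     # random within-block order
    return trials

block = make_block(random.Random(0))
counts = Counter(condition for _, _, condition in block)
# Fraction of trials on which Cue 2 confirmed the reward predicted by Cue 1.
confirmed = sum(REWARD[c1] == REWARD[c2] for c1, c2, _ in block)
p_confirm = confirmed / len(block)
```

Under these assumptions, 20 of the 24 trials in a block carry the reward predicted by Cue 1, which corresponds to the stated probability of 0.83.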
The central question of this experiment was whether neurons would respond differently to a given Cue 2 depending on whether it confirmed or violated expectation regarding the reward size. To answer this question, we compared neuronal activity in “expected-good” (or “expected-bad”) trials to neuronal activity in “surprising-good” (or “surprising-bad”) trials (Fig. 1A–D). It would seem reasonable to ascribe any measured difference to reward-based surprise. However, one cannot absolutely rule out a contribution from the fact that the identity of the trailing image, independently of its reward association, was surprising. To control for such an effect, we included conditions in which the identity of Cue 2 was surprising, whereas the reward it announced was expected (“surprising image” conditions in Fig. 1G). These conditions were excluded from all analyses except those focused specifically on assessment of the effects of image-based surprise.
During a training phase before the period of neurophysiological data collection, each animal was exposed to a variant of the task that included only conditions in which Cue 1 was followed by a confirmatory Cue 2 (conditions on the diagonal in Fig. 1G). Training involved ∼3,000 presentations of each cue for Monkey M1 and 2,000 presentations for Monkey M2. During the subsequent phase of data collection, each animal was exposed only to the task containing the full set of conditions shown in Figure 1G.
Memory-guided saccade task
At the outset of each experimental session, we mapped the receptive fields of the recorded FEF neurons using a memory-guided saccade task. The monkey initiated a trial by fixating a central spot for 150 ms. Then a small target appeared for 50 ms. On each trial, the target appeared at one of six possible locations. These were 11° eccentric relative to fixation and were distributed at 60° intervals around the clock with two of the locations straight above and straight below fixation, two in the contralateral hemifield and two in the ipsilateral hemifield. After a 300 ms delay, the fixation spot was extinguished, instructing the monkey to execute a saccade to the remembered location. We inspected online poststimulus-time histograms to determine whether neurons responded to the target onset. Only if visual responses were evident did we proceed to collect data during performance of the reward–surprise task. When recording through a single electrode, we centered cues at the location yielding the strongest responses in the memory-guided saccade task. When recording through a multicontact linear probe, we centered cues at the location yielding strong responses on the largest number of channels.
Identification of FEF
In each animal, we implanted a recording chamber over the right frontal lobe under guidance of structural magnetic resonance images, so as to gain access to the cortex forming the anterior bank of the genu of the arcuate sulcus (CMU-Pitt BRIDGE Center RRID:SCR_023356). We subsequently placed electrodes in the anterior bank of the sulcus under the guidance of depth readings and grid coordinates. In M1, at the end of the period of data collection, we confirmed that recording sites were in the FEF by demonstrating that saccades could be elicited at low current with electrical microstimulation. While the monkey performed a task requiring only central fixation, we delivered a bipolar pulse train (250 μs pulse width, 350 Hz, 100 ms duration) at currents up to 50 μA. Starting at 10 μA, we gradually increased the current until we observed consistent contraversive saccades time-locked to the stimulation onset. Throughout the region from which data had been collected, we encountered sites at which saccades could be elicited in this manner. The threshold for eliciting saccades on >50% of trials was typically 20–30 μA. In M2, the period of data collection ended when guide-tube insertion induced a hematoma accompanied by symptoms of contralesional neglect. The resulting tissue damage prevented microstimulation mapping.
Database
We collected data in the context of the reward–surprise task during 68 sessions. In 57 of these sessions, we recorded at one site with a single microelectrode. In the remaining 11 sessions, we recorded at multiple sites with a linear array consisting of 16 contacts at 150 μm spacing. The location at which cues were presented (determined during a preliminary run of the memory-guided saccade task) was always contralateral to the recording chamber or on the vertical meridian. It was straight up in 20 sessions, straight down in 6 sessions, up and to the left in 30 sessions, and down and to the left in 12 sessions. Threshold-crossing events were sampled by the OmniPlex system at 40 kHz and digitally stored in .pl2 format. Waveforms stored from each recording session were sorted off-line with a PCA-based approach implemented by Plexon Offline Sorter (Plexon). Single units were defined on the basis that a group of waveforms composed an independent point cloud in principal component space and that no more than 1.5% of all waveforms assigned to one unit occurred within a 1 ms time bin. The voltage threshold was occasionally adjusted upward before cutting clusters in the off-line sorting procedure in order to satisfy these criteria. We selected for subsequent analysis only data from neurons meeting an inclusion criterion based on visual responsiveness in the reward–surprise task. The criterion for inclusion was that the average firing rate within a 300 ms window immediately following the leading-cue onset should significantly exceed the average firing rate in a 100 ms window immediately preceding the leading-cue onset (p < 0.01, t test). Out of a total of 533 neurons tested, 303 met this criterion and were included in subsequent analyses (83 in Monkey M1; 220 in M2). Neurons excluded at this step, although different from the selected population with regard to visual-response significance, did not differ in any obvious way with regard to the overall pattern of task-related activity.
Experimental design and statistical analysis
Testing the significance of differences between conditions
To determine whether the time-varying population–mean firing rates measured under two conditions were significantly different at any time within the 600 ms period following the cue onset, we employed a cluster-based permutation test (Maris and Oostenveld, 2007). This nonparametric approach avoided pitfalls inherent in basing analysis on an arbitrary time window or alternatively carrying out comparisons in multiple windows. We first smoothed data from each trial by convolving the firing rate in 1 ms bins with a Gaussian kernel with a standard deviation of 25 ms. We then converted the value in each bin to a z-score based on the mean and standard deviation of all bins in all trials under all conditions for the neuron in question. All subsequent steps of analysis were based on data represented in this form. To identify the “best” cluster, we carried out a two-tailed paired t test comparing the distribution of firing rates under one condition to the distribution under the other condition in each 1 ms bin. If the test yielded p < 0.05, we tagged the bin as positive or negative according to which condition was associated with the higher firing rate. If the test yielded p ≥ 0.05, we tagged the bin as zero. The t test was used as a means for imposing an arbitrary threshold and not to assess statistical significance. For each cluster of contiguous bins of uniform sign, either positive or negative, we computed cluster weight as the sum of the t statistics of its bins. The cluster with the greatest weight was classified as best. To assess the statistical significance of the best cluster, we generated a permutation-based distribution of weights by applying the above-described procedure 1,000 times to shuffled data. Shuffling involved randomly redistributing the condition labels across the trials for each neuron. We computed p as the fraction of the 1,000 iterations in which the weight of the best cluster was greater than the weight of the originally observed best cluster. If p < 0.05, then we classified the originally observed best cluster as significant.
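The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the input is assumed to be already smoothed and z-scored, the critical t value is supplied by the caller rather than computed, and cluster weight is compared by absolute value so that positive and negative clusters compete on equal terms.

```python
import numpy as np

def best_cluster_weight(rates_a, rates_b, t_crit):
    """Weight of the heaviest cluster of contiguous same-sign supra-threshold bins.

    rates_a, rates_b: (n_neurons, n_bins) mean z-scored rate per neuron.
    t_crit: two-tailed 5% cutoff for a paired t test with n_neurons - 1 df.
    """
    diff = rates_a - rates_b
    t = diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(len(diff)))
    sign = np.sign(t) * (np.abs(t) > t_crit)      # +1, -1, or 0 for each bin
    best = run = 0.0
    prev = 0
    for s, t_bin in zip(sign, t):
        if s == 0:
            run = 0.0                              # bin below threshold: no cluster
        elif s == prev:
            run += t_bin                           # extend the current cluster
        else:
            run = t_bin                            # start a new cluster
        prev = s
        best = max(best, abs(run))
    return best

def cluster_permutation_test(trials_a, trials_b, t_crit, n_perm=1000, seed=0):
    """trials_a[i], trials_b[i]: (n_trials, n_bins) arrays for neuron i."""
    rng = np.random.default_rng(seed)

    def weight(a_list, b_list):
        means_a = np.array([x.mean(axis=0) for x in a_list])
        means_b = np.array([x.mean(axis=0) for x in b_list])
        return best_cluster_weight(means_a, means_b, t_crit)

    observed = weight(trials_a, trials_b)
    n_exceed = 0
    for _ in range(n_perm):
        shuf_a, shuf_b = [], []
        for a, b in zip(trials_a, trials_b):
            pooled = rng.permutation(np.vstack([a, b]))   # reassign condition labels
            shuf_a.append(pooled[:len(a)])
            shuf_b.append(pooled[len(a):])
        n_exceed += int(weight(shuf_a, shuf_b) > observed)
    return observed, n_exceed / n_perm

# Synthetic demonstration: 20 neurons, 8 trials per condition, effect in bins 20-39.
demo_rng = np.random.default_rng(1)
trials_b = [demo_rng.normal(size=(8, 60)) for _ in range(20)]
trials_a = [demo_rng.normal(size=(8, 60)) for _ in range(20)]
for x in trials_a:
    x[:, 20:40] += 1.0
observed, p = cluster_permutation_test(trials_a, trials_b, t_crit=2.093, n_perm=200)
```

The value 2.093 is the two-tailed 5% cutoff for 19 degrees of freedom, matching the 20 synthetic neurons; for a population of about 300 neurons the corresponding cutoff would be close to 1.97.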
Spike-count correlation
We measured the spike-count correlation between a pair of neurons as the Pearson's correlation coefficient between the two firing rates measured across multiple trials conforming to the same condition (Cohen and Kohn, 2011). Because this measure is sensitive to outliers, we excluded trials in which either neuron had a firing rate more than three standard deviations from its mean (Zohary et al., 1994; Smith and Kohn, 2008). We based the reported measurements on a time window 228–395 ms following the onset of the trailing image. This window was chosen as a period during which both good-surprise and bad-surprise effects were prominent, as indicated by the overlap of the significant clusters identified by the cluster-based permutation test. Using a markedly larger window (100–600 ms) or the good-surprise window alone (100–463 ms; permutation test) yielded qualitatively similar results. Analysis was based on 12–16 trials for surprising conditions and 48–64 trials for expected conditions. The difference in trial number would be expected to affect the standard deviation systematically but, critically, not the mean of the measures made over multiple pairs. Within each category, we combined trials across the two identities of Cue 2. This ensured that the number of trials was sufficient for robust measurement of cross-trial correlation. Cue identity did not affect the comparison between surprising and expected conditions because the cues were the same.
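As an illustration of the measure described above, the following sketch computes the Pearson correlation between the trial-by-trial rates of two neurons after discarding outlier trials. The function name and the use of the sample standard deviation are assumptions made for the example.

```python
import numpy as np

def spike_count_correlation(rates_a, rates_b, max_sd=3.0):
    """Pearson correlation across trials, excluding outlier trials.

    rates_a, rates_b: firing rates of two simultaneously recorded neurons,
    one value per trial, measured in the same window and condition.
    A trial is dropped if either neuron lies more than max_sd standard
    deviations from its own mean.
    """
    a = np.asarray(rates_a, dtype=float)
    b = np.asarray(rates_b, dtype=float)
    keep_a = np.abs(a - a.mean()) <= max_sd * a.std(ddof=1)
    keep_b = np.abs(b - b.mean()) <= max_sd * b.std(ddof=1)
    keep = keep_a & keep_b
    return np.corrcoef(a[keep], b[keep])[0, 1]

# Synthetic example: perfectly correlated trials plus one aberrant trial.
neuron_a = list(range(19)) + [1000]
neuron_b = list(range(19)) + [0]
r = spike_count_correlation(neuron_a, neuron_b)
```

In this synthetic example the single aberrant trial lies more than three standard deviations from the mean of the first neuron, so it is dropped and the remaining 19 trials are perfectly correlated.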
Signal timing
To characterize signal timing, both in population-averaged data and in individual-neuron data, we computed the time following the cue onset at which the signal attained half its peak height. Signal strength was expressed as a function of time in 1 ms bins and smoothed by convolution with a Gaussian kernel with a standard deviation of 25 ms. The peak was defined as the maximum of signal strength within a window 50–400 ms following cue onset. We then computed time to half-peak as the earliest bin in which the signal exceeded (peak − baseline) / 2 where baseline was the firing rate at the time of the cue onset. If time to half-peak had already been attained at the first bin in the analysis window, then the neuron was excluded from the dataset.
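A minimal sketch of this latency measure follows. It rests on two assumptions that the text leaves open: the half-peak criterion is applied to the signal measured relative to its value at cue onset, and the search for the first crossing is restricted to the 50–400 ms window used to define the peak. The function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(signal, sd_ms=25):
    """Smooth a 1 ms binned signal with a unit-area Gaussian kernel."""
    half_width = 4 * sd_ms
    x = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (x / sd_ms) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

def time_to_half_peak(signal, window=(50, 400)):
    """Return latency in ms, or None if the neuron should be excluded.

    signal[t] is the (already smoothed) signal strength t ms after cue onset.
    """
    signal = np.asarray(signal, dtype=float)
    start, stop = window
    rel = signal[start:stop + 1] - signal[0]     # signal relative to baseline
    peak = rel.max()
    if peak <= 0:
        return None                              # no rise above baseline
    above = np.nonzero(rel > peak / 2.0)[0]
    if above[0] == 0:
        return None                              # half-peak already attained
    return start + int(above[0])

# Synthetic example: linear ramp from 0 at 100 ms to 1 at 300 ms.
t = np.arange(600)
ramp = np.clip((t - 100) / 200.0, 0.0, 1.0)
latency = time_to_half_peak(ramp)
```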
Analysis of signal independence
To determine whether two signals varied across the neuronal population with a significant degree of independence required estimating the reliability of each signal. We based our estimate of reliability on a split-halves analysis. For each neuron, under each of the 12 conditions imposed in a run, we numbered trials in temporal order and divided the data into odd-trial and even-trial subsets. We computed the signal of interest for each subset. We then computed the Pearson's correlation (r) across neurons between signal strength in the odd subset and signal strength in the even subset. By applying a Spearman–Brown correction, r′ = 2r / (1 + r), we estimated the maximal correlation plausibly achievable between this signal and any other signal in an analysis based on the full dataset. To determine whether the correlation between two signals was significantly less than the ceiling imposed by reliability, we applied a z-test to the Fisher-transformed values of the observed correlation coefficient and the lower of the two r′ values based on split-halves analysis.
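The two computations in this analysis can be illustrated briefly. The Spearman–Brown step follows the formula given above. The form of the z-test is an assumption: the sketch uses the standard comparison of two Fisher-transformed correlations, each based on n neurons, and the numerical inputs are hypothetical.

```python
import math

def spearman_brown(r_split):
    """Full-dataset reliability estimated from a split-halves correlation."""
    return 2.0 * r_split / (1.0 + r_split)

def fisher_z_test(r_observed, r_ceiling, n):
    """One-tailed test of whether r_observed falls below r_ceiling.

    Assumes the usual standard error sqrt(1 / (n - 3)) for each
    Fisher-transformed coefficient, with both based on n neurons.
    """
    z_diff = math.atanh(r_ceiling) - math.atanh(r_observed)
    z = z_diff / math.sqrt(2.0 / (n - 3))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))      # upper tail of standard normal
    return z, p

# Hypothetical values: split-halves r = 0.5, observed cross-signal r = 0.1.
ceiling = spearman_brown(0.5)
z, p = fisher_z_test(r_observed=0.1, r_ceiling=ceiling, n=303)
```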
Principal component analysis
Neurons may have varied with respect to the time-course and condition dependence of firing elicited by Cue 2. To investigate this possibility, we carried out a principal component analysis (PCA) in the following manner. For each neuron, under each of four conditions (good surprise, good expected, bad surprise, and bad expected), we computed the firing rate in 60 nonoverlapping bins spanning a 600 ms window beginning with the cue onset. We normalized each 60-element vector by subtracting from it the mean of all four vectors. We concatenated the four normalized vectors in a standard order to produce a 240-element vector which we then z-score normalized. We based PCA on the full 303 neuron × 240 timepoint matrix. To assess the significance of the variance explained by each principal component, we performed a shuffle control. The shuffle procedure involved all of the steps described above with the sole exception that, on each iteration, for each neuron, the four 60-element vectors were concatenated in a random order. The variance explained by each principal component was averaged over 1,000 shuffles. We then compared the variance explained by each principal component with the average of variance explained across all shuffles to determine how many principal components explained more variance than would be expected by chance.
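The steps above can be sketched with NumPy as follows. This is an illustration under assumptions rather than the original analysis code: function names are hypothetical, PCA is carried out by singular value decomposition of the column-centered matrix, and the number of components exceeding chance is counted as the run of leading components whose explained variance is greater than the shuffle average.

```python
import numpy as np

def condition_matrix(rates, rng=None):
    """rates: (n_neurons, 4, 60) firing rates. Returns (n_neurons, 240).

    If rng is given, the four 60-element vectors of each neuron are
    concatenated in random order (shuffle control).
    """
    centered = rates - rates.mean(axis=1, keepdims=True)   # subtract mean of four vectors
    if rng is not None:
        centered = np.stack([unit[rng.permutation(4)] for unit in centered])
    flat = centered.reshape(len(centered), -1)
    return (flat - flat.mean(axis=1, keepdims=True)) / flat.std(axis=1, keepdims=True)

def variance_explained(matrix):
    """Fraction of variance carried by each principal component."""
    centered = matrix - matrix.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

def compare_with_shuffle(rates, n_shuffles=1000, seed=0):
    rng = np.random.default_rng(seed)
    real = variance_explained(condition_matrix(rates))
    chance = np.mean([variance_explained(condition_matrix(rates, rng))
                      for _ in range(n_shuffles)], axis=0)
    above = real > chance
    n_leading = len(above) if above.all() else int(np.argmin(above))
    return real, chance, n_leading

# Synthetic data: a shared response in condition 0 with neuron-specific gain.
demo_rng = np.random.default_rng(2)
bump = np.exp(-0.5 * ((np.arange(60) - 25) / 8.0) ** 2)
rates = demo_rng.normal(size=(100, 4, 60))
rates[:, 0, :] += 3.0 * demo_rng.normal(size=(100, 1)) * bump
real, chance, n_leading = compare_with_shuffle(rates, n_shuffles=20)
```

Randomizing the order of concatenation leaves the within-condition time courses of each neuron intact while destroying the alignment of conditions across neurons, so any component that depends on a consistent condition structure loses variance in the shuffled data.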
Results
We monitored the spiking activity of 303 visually responsive FEF neurons (83 and 220 in M1 and M2, respectively) as monkeys performed a task requiring that they maintain central fixation while two visual cues were presented in sequence inside the RF followed by delivery of a large or small juice reward (Fig. 1A). The cues conveyed information about the size of the reward to be delivered at the end of the trial. Their associations were fixed throughout an initial training period and the entire ensuing data collection period. Cue 1 predicted with high probability (p = 0.83) the delivery of either a large or small reward. Cue 2 announced with certainty (p = 1) that the reward would be large or small. Cue 2 usually confirmed the prediction conveyed by Cue 1 (expected outcome, p = 0.83) but occasionally violated it (surprising outcome, p = 0.17; Fig. 1B). The monkeys were sensitive to the cue–reward associations as indicated by the fact that good leading cues elicited markedly stronger visual responses than bad leading cues (Fig. 2A,D; p = 1.51 × 10⁻²¹; Wilcoxon signed-rank test). This pattern inverted late in the delay period, as if the monkeys allocated attention particularly strongly to the location where Cue 2 would appear when there was a possibility of its reversing an unpleasant message conveyed by Cue 1. Disentangling the many possible explanations for this effect, which include attention, information anticipation, arousal, and preparation to make or suppress eye movements (Gottlieb et al., 2013; Zhang et al., 2019), would require manipulations outside the scope of the present study. All effects described above were present in both monkeys.
Figure 2. Neurons responded more strongly to Cue 2 when the reward it announced was of a surprising size than when it was of an expected size. This was true both (A–C) when the reward was large and the surprise good and (D–F) when the reward was small and the surprise bad. A, The mean population firing rate as a function of time on trials when Cue 2 announced a surprising (red) or expected (blue) large reward. Curves are aligned to the Cue 2 onset. Black bar indicates the period following the Cue 2 onset during which firing on surprising trials significantly exceeded firing on expected trials (p ≤ 0.001, cluster-based permutation test). B, Mean population good-surprise signal (the firing rate on surprising-large minus the firing rate on expected-large trials) ±standard error of the mean (SEM). Curves are aligned to Cue 2 onset. C, The average firing rate on surprising-large versus expected-large trials for every recorded neuron. Based on epoch indicated by black bar in A. Filled circles represent instances in which the effect was statistically significant (paired t test, p < 0.05). Counts (n) give the total number of points (and parenthetically the number of filled points) above and below the identity line. The p value indicates the level of significance at which the median firing rate on good-surprising trials exceeded the median firing rate on good-expected trials (Wilcoxon signed-rank test). D, The mean population firing rate as a function of time on trials when Cue 2 announced a surprising (red) or expected (blue) small reward. Underlying bars indicate periods following the Cue 2 onset during which firing on surprising trials was significantly lower (gray) or higher (black) than firing on expected trials. E, Mean population bad-surprise signal. Conventions as for B. F, The average firing rate on surprising-small versus expected-small trials for every recorded neuron. Firing rates in C and F were measured during the epochs indicated by black bars in A and D.
Reward-based surprise
The key question of this study was whether neuronal activity elicited by Cue 2, when it announced a given reward, either large or small, depended on its confirming or violating expectation regarding reward size set up by Cue 1. To answer this question without introducing bias through the use of a predefined temporal window, we employed a cluster-based permutation test (see Materials and Methods). This allowed identifying epochs following the onset of Cue 2 during which the mean population firing rate differed significantly between expected and surprising conditions. Following the onset of Cue 2 when it announced a good outcome, the population firing rate was significantly elevated under conditions of surprise (Fig. 2A, black bar, 0–463 ms, p < 0.001). The same was true when it announced a bad outcome (Fig. 2D, black bar, 228–395 ms, p < 0.001). We proceeded to measure surprise-based modulation in individual neurons within the windows so identified. We found that a significant majority of neurons exhibited enhancement for good surprise (Fig. 2C, 258/303 neurons, p = 2.9 × 10⁻⁴⁰, Wilcoxon signed-rank test) and likewise for bad surprise (Fig. 2F, 185/303 neurons, p = 3.5 × 10⁻⁶). Despite the existence of a clear population trend toward enhancement, it might still be the case that, for some neurons, firing was depressed under conditions of surprise. To explore this possibility, we assessed, neuron by neuron, whether the difference in firing rate between expected and surprising conditions achieved statistical significance (paired t test, α = 0.05). We found that the fraction of neurons exhibiting good-surprise enhancement significantly exceeded the expected false-positive rate (Fig. 2C, 122/303, p < 1.0 × 10⁻¹⁵, χ² test with Yates correction), whereas the fraction exhibiting surprise suppression did not (2/303, p = 0.062, χ² test with Yates correction). Likewise, for bad surprise, the count of enhancing neurons significantly exceeded the expected count of false positives (Fig. 2F, 38/303, p < 1.0 × 10⁻¹⁵, χ² test with Yates correction), whereas the count of suppressing neurons did not (9/303, p = 0.73, χ² test with Yates correction). Effects indicated as significant in the composite dataset (Fig. 2) also achieved significance in each monkey considered individually.
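The comparison of observed counts with the expected false-positive count can be illustrated with a short calculation. The sketch assumes a one-degree-of-freedom goodness-of-fit test in which the expected false-positive rate in each direction is α/2 = 0.025; this value is an assumption, not stated in the text, although it yields p values close to those reported for the suppression counts. The function name is hypothetical.

```python
import math

def yates_chi2(observed, n_total, rate):
    """Chi-square goodness-of-fit test (1 df) with Yates continuity correction.

    observed: number of neurons showing a significant effect in one direction.
    rate: assumed false-positive rate for that direction.
    """
    expected_hit = n_total * rate
    expected_miss = n_total - expected_hit
    deviation = max(abs(observed - expected_hit) - 0.5, 0.0)
    chi2 = deviation ** 2 * (1.0 / expected_hit + 1.0 / expected_miss)
    p = math.erfc(math.sqrt(chi2 / 2.0))     # upper tail of chi-square with 1 df
    return chi2, p

N_NEURONS = 303
ASSUMED_RATE = 0.025                          # alpha / 2, an assumption
results = {count: yates_chi2(count, N_NEURONS, ASSUMED_RATE)
           for count in (122, 2, 38, 9)}
```

With these inputs, the counts of 2 and 9 suppressed neurons do not differ significantly from the expected count of about 7.6, whereas the counts of 122 and 38 enhanced neurons exceed it by a wide margin.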
If cue–reward associations were fully ingrained at the end of the training period, then the strength of surprise-induced activity should have been stable across subsequent data-collection sessions. To determine whether this was so, we computed z-score-normalized measures of good surprise and bad surprise for each session and fitted lines to surprise indices expressed as a function of session number. The tendency for the slopes to differ from zero did not approach significance for either good or bad surprise in either M1 or M2 (Wald test, p > 0.5). We conclude that cue–reward associations were fully ingrained at the beginning of the period of data collection. It still might be the case that learning occurred from the beginning to end of a given session, driven by within-session sequential statistics, only to vanish before the next day. To test this idea, we took advantage of the fact that each run contained at least six 24-trial blocks within each of which the full set of conditions was represented. On fitting a line to the mean z-score-normalized surprise signal expressed as a function of the ordinal position of the block within the run, we found that the tendency for the slopes to differ from zero did not approach significance for either good or bad surprise in either M1 or M2 (Wald test, p > 0.4). We conclude that surprise-related activity depended primarily or exclusively on the long-term associations of the cues.
Correction for the baseline offset
During the delay period leading up to display of Cue 2, firing was markedly higher when monkeys expected a bad outcome than when they expected a good outcome (Fig. 2A,D). If this effect were to carry over into the period following onset of Cue 2, it alone, without any contribution from firing elicited by Cue 2, would be manifested as good-surprise enhancement and bad-surprise suppression. This presumably explains why periods of good-surprise enhancement and bad-surprise suppression, as identified by cluster-based permutation in the preceding analysis, were present in the earliest bin to which the test was applied (Fig. 2A, black bar, and Fig. 2D, gray bar). To separate genuine surprise-related firing from firing dependent on the baseline offset, we repeated the preceding analysis on a population of neurons selected to give zero-baseline offset on average. To create this subpopulation, we ranked neurons by the normalized difference in the firing rate, within a 100 ms window preceding onset of Cue 2, between good-expected and bad-expected conditions. Neurons spanned a range from Rank 1 (with firing markedly greater when a large reward was expected) to Rank 303 (with firing markedly less when a large reward was expected). In ascending order by rank, we added neurons to the dataset until the mean firing rate difference between good-expected and bad-expected conditions was as close to zero as possible within the precue window. The zero-baseline subpopulation resembled the subpopulation containing the remaining excluded neurons in most patterns of task-related activity; however, firing rate and signal strength were moderately stronger in the excluded population. On applying all analyses described above to this reduced dataset, we found that statistically significant surprise-induced enhancement occurred at a plausible latency following Cue 2 onset (Fig. 3A–F), whereas anomalous early effects present in the full dataset had vanished.
The epoch of significant good-surprise enhancement began ∼100 ms after onset of Cue 2 (Fig. 3A, black bar, 102–436 ms, p < 0.001). The epoch of significant bad-surprise enhancement began even later at a delay of >200 ms (Fig. 3D, black bar, 225–371 ms, p < 0.001). Effects indicated as significant in the composite dataset (Fig. 3) also achieved significance in each monkey considered individually with the following exception. In M2, although the cluster-based permutation test revealed a significant bad-surprise effect (as in Fig. 3D), the difference in the firing rate between “bad surprising” and “bad expected” conditions (as in Fig. 3F) approached but did not achieve statistical significance (p = 0.061).
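The selection of the zero-baseline subpopulation described above can be sketched as a rank-ordered accumulation. The sketch assumes a single precue difference value per neuron (good expected minus bad expected) and takes the rank-ordered prefix whose mean is closest to zero. The example offsets are hypothetical and the function name is illustrative.

```python
def select_zero_baseline(diffs):
    """Pick the rank-ordered prefix of neurons whose mean offset is nearest zero.

    diffs[i] is neuron i's precue difference (good expected minus
    bad expected). Returns the indices of the selected neurons.
    """
    # Rank 1 = largest positive difference; last rank = most negative.
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    best_k, best_abs_mean, running = 0, float("inf"), 0.0
    for k, idx in enumerate(order, start=1):
        running += diffs[idx]
        abs_mean = abs(running / k)
        if abs_mean < best_abs_mean:
            best_k, best_abs_mean = k, abs_mean
    return order[:best_k]

# Hypothetical offsets for eight neurons (arbitrary units).
example = [0.30, -0.10, 0.05, -0.40, 0.15, -0.25, -0.05, -0.60]
chosen = select_zero_baseline(example)
mean_offset = sum(example[i] for i in chosen) / len(chosen)
print(sorted(chosen), round(mean_offset, 3))
```

In this toy case the two neurons with the most negative offsets are excluded, which mirrors the way the procedure trims one tail of the ranked distribution when the full population carries a net offset.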
The response to Cue 2 was elevated on trials involving both good surprise and bad surprise in a subpopulation of 166 neurons selected to eliminate any time-zero offset in the firing rate between surprising and expected conditions. All conventions as in Figure 2.
Although surprise signals were present in the zero-baseline–offset subpopulation, it might still be the case that surprise-signal magnitude covaried with baseline-offset magnitude across neurons. To determine whether this was so, we analyzed the correlation across neurons between z-score-normalized measures of baseline-offset magnitude (bad expected minus good expected within a 100 ms window immediately preceding Cue 2 onset) and post-Cue 2 surprise-signal magnitude (surprising minus expected 228–395 ms following the Cue 2 onset). We found that the strength of the good-surprise signal was positively (r = 0.17) and significantly (p = 0.003) correlated with the baseline offset, whereas the correlation in the case of the bad-surprise signal was negative (r = −0.090) and insignificant (p = 0.12). These observations are compatible either with covariation of genuinely distinct signals or with carryover of baseline activity into the post-Cue 2 period.
The absence in the selected neuronal subset of a baseline offset between conditions in which expectation was good or bad facilitated measuring the latency of key signals relative to the onset of Cue 2. The signals were staggered in time in a manner such that the value signal (large vs small reward) and the good-surprise signal (surprising-large reward vs expected-large reward) followed onset of the visual response at a short delay, while the bad-surprise signal (surprising-small reward vs expected-small reward) followed at a long delay (Fig. 4A). To assess the significance of differences in timing between signals, we measured time-to-half-height for individual neurons. This was possible only for particularly robust signals (the visual response, good surprise, and value). We then constructed a cumulative time-to-half-height plot for each signal and compared the plots by means of a Kolmogorov–Smirnov test. The median time-to-half-height of the visual response was 83 ms, while the median times-to-half-height for good surprise, value of Cue 1, and value of Cue 2 lay within a narrow later range of 114–132 ms (Fig. 4B). Time-to-half-height did not differ significantly among these signals, whereas all three began significantly later than the onset of the visual response (p < 1.0 × 10−15, Kolmogorov–Smirnov test).
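The latency comparison above rests on two operations: finding the time at which a signal first reaches half of its peak and comparing the resulting distributions across neurons. The sketch below illustrates both under simplifying assumptions (10 ms bins, the first bin at or above half of the maximum, and only the Kolmogorov–Smirnov statistic rather than its p value). The exact criteria used for the reported analysis are not specified here, so the details are illustrative.

```python
def time_to_half_height(signal, bin_ms=10):
    """Time (ms) of the first bin at which the signal reaches half its peak."""
    peak = max(signal)
    if peak <= 0:
        return None  # signal too weak for a dependable estimate
    for i, value in enumerate(signal):
        if value >= peak / 2.0:
            return i * bin_ms
    return None

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical binned signal for one neuron (arbitrary units).
print(time_to_half_height([0, 0, 1, 3, 6, 8, 7, 5]))
# Hypothetical latencies (ms) for two signals across a few neurons.
print(ks_statistic([80, 85, 90], [110, 120, 130]))
```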
Timing of key signals. A, Population mean signal as a function of time following the cue onset. B, Cumulative plots of time to half-peak-height for all signals sufficiently strong to allow dependable estimation in individual neurons. Inset times represent median of each distribution. Each signal was computed on the basis of z-score-normalized firing rates in the subset of neurons selected to eliminate any time-zero offset in the firing rate between surprising and expected conditions (Fig. 3). All signals except visual response were computed as firing rate differences. Good surprise: cue announcing surprising-large reward minus cue announcing expected-large reward. Bad surprise: cue announcing surprising-small reward minus cue announcing expected-small reward. Value Cue 1: Cue 1 predicting large reward minus Cue 1 predicting small reward. Value Cue 2: Cue 2 announcing expected-large reward minus Cue 2 announcing expected-small reward. Image-based surprise: Cue 2 as a surprising image announcing an expected-large reward minus Cue 2 as an expected image announcing an expected-large reward.
Spike-count correlations
Surprise might conceivably affect not only the mean neuronal firing rate but also the cross-neuronal spike-count correlation. We investigated this possibility in a subset of sessions, by monitoring the activity of multiple neurons simultaneously through a 16-contact linear array electrode. Altogether, we recorded the simultaneous activity of 4,511 neuron pairs involving 276 neurons. For each of the four trial types obtained by crossing reward size (large or small) with prediction status (expected or surprising), we measured the cross-trial spike-count correlation in a temporal window selected on the basis of the cluster-based permutation tests to yield both good-surprise and bad-surprise modulation (228–395 ms following the onset of Cue 2). The median spike-count correlation was positive for all four trial types (Fig. 5A,B). However, the degree of correlation was significantly reduced when the outcome was surprising versus expected both for large reward (p = 1.7 × 10−38, Wilcoxon signed-rank test) and, to a lesser degree, for small reward (p = 9.4 × 10−3). The surprise-induced reduction in spike-count correlation is unlikely to have been an artifact of the surprise-induced increase in the firing rate because, all other things being equal, an increase in firing rate is associated with an increase, not a decrease, in measured correlation (Smith and Kohn, 2008; Cohen and Maunsell, 2009; Cohen and Kohn, 2011). Nevertheless, to be sure that the effect was not an artifact of firing rate differences, we binned neuron pairs by geometric mean firing rate (with bins 12 spikes/sec wide centered at 6, 18, 30, and 42 spikes/sec) and measured the mean spike-count correlation of each binned group (Fig. 5C). Surprise reduced the spike-count correlation in all firing rate categories. The surprise-induced reduction of the spike-count correlation remained significant when we repeated the analysis using a variety of windows including the entire cue period.
It was statistically significant in both individual-monkey datasets although moderately more pronounced in M2 than in M1. It was also significant in the subpopulation of 166 neurons (Fig. 3) selected to eliminate any time-zero offset in the firing rate between surprising and expected conditions.
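The pairwise measure and the firing-rate control described above can be illustrated as follows. The sketch assumes that the spike-count correlation is the Pearson correlation of two neurons' spike counts across trials of one condition, that the counting window is 167 ms long (228–395 ms after Cue 2 onset), and that bins are 12 spikes/sec wide with centers at 6, 18, 30, and 42 spikes/sec. The spike counts are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def geometric_mean_rate(counts_a, counts_b, window_s):
    """Geometric mean of two neurons' mean firing rates (spikes/sec)."""
    rate_a = sum(counts_a) / len(counts_a) / window_s
    rate_b = sum(counts_b) / len(counts_b) / window_s
    return math.sqrt(rate_a * rate_b)

def rate_bin(gm_rate, width=12.0, n_bins=4):
    """Index of the firing-rate bin (0..3), or None if out of range."""
    idx = int(gm_rate // width)
    return idx if 0 <= idx < n_bins else None

WINDOW_S = 0.167  # 228-395 ms after Cue 2 onset
counts_a = [3, 5, 4, 6, 2]  # hypothetical spike counts, neuron A, five trials
counts_b = [2, 6, 3, 7, 1]  # hypothetical spike counts, neuron B, same trials
r_sc = pearson(counts_a, counts_b)
gm = geometric_mean_rate(counts_a, counts_b, WINDOW_S)
print(round(r_sc, 3), round(gm, 1), rate_bin(gm))
```

Comparing expected and surprising conditions within each bin, rather than across the whole population, is what prevents firing-rate differences from masquerading as correlation differences.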
Following a cue announcing a surprising as compared with an expected outcome, the median pairwise spike-count correlation was reduced. A, Distribution across all neuron pairs of the spike-count correlation measured following the onset of a cue signaling an expected (blue) or surprising (red) large reward. B, Distribution across all neuron pairs of the spike-count correlation measured following the onset of a cue signaling an expected (blue) or surprising (red) small reward. C, The differences between expected and surprising conditions survived binning pairs by a geometric mean firing rate and thus could not be construed as secondary to firing rate differences. Error bars represent ±SEM.
PCA-based analysis of functional heterogeneity among neurons
We wondered whether neurons differed systematically with regard to the timing and condition dependence of firing following Cue 2 onset. To address this issue, we measured, for each neuron, under each of four conditions obtained by crossing reward size (large or small) with prediction status (expected or surprising) the mean-normalized firing rate in 60 contiguous 10 ms bins beginning with the onset of Cue 2 (see Materials and Methods for details). We then carried out PCA on 303 neurons represented as points in the resulting 240-dimensional response space. Only the first two PCs captured significantly more variance when the procedure was applied to the true dataset than when it was applied to a series of pseudodatasets generated by randomly shuffling the condition labels for each neuron (PC1, 31.8% of variance, p < 0.001; PC2, 9.5% of variance, p = 0.008, cluster-based permutation test). Accordingly we will describe only PC1 and PC2. Condition- and time-dependent patterns of activation captured by PC1 and PC2 are depicted in Fig. 6A. PC1 embodied the tendency to fire more strongly for large reward than for small reward, to fire more strongly for surprising than for expected-good reward, and to fire more strongly (after a delay) for surprising than for expected bad reward. Nearly all neurons had positive loadings on PC1 (Fig. 6B, mean, 0.54; p = 3.7 × 10−106; t test). It follows that the major source of variance among neurons was simply the strength with which they gave shared reward-based and surprise-based responses. PC2 resembled PC1 early in the postcue onset period but underwent an inversion at ∼200 ms. Loadings on PC2 were both positive and negative (Fig. 6B, mean, −0.04; p = 0.068; t test). When added to PC1, PC2, if the loading was positive, enhanced the tendency for condition-dependent signals to take a phasic form (Fig. 6C) and, if the loading was negative, enhanced the tendency for them to take a tonic form (Fig. 6D). 
To determine whether neurons with positive and negative loadings on PC2 responded differently to surprise, we examined population firing rate plots. Each group of neurons exhibited a significant firing rate enhancement under conditions of good and bad surprise (Fig. 6C,D). However, signal dynamics clearly differed between the two groups. In particular, the good-surprise signal was of very early onset among neurons with a positive loading on PC2 (Fig. 6C), whereas, among neurons with a negative loading, the good-surprise signal developed late (Fig. 6D)—although still sooner than the bad-surprise signal. The good-surprise signal appeared earliest in neurons with a positive loading on PC2, and the bad-surprise signal appeared earliest in neurons with a negative loading on PC2. Each offset in timing was statistically significant (a cluster-based permutation test identified an early postcue period when, at a significance level of p ≤ 0.001, the signal was present in one group of neurons and not the other). These observations remained true in individual monkeys and in a subset of neurons selected to equate pre-Cue 2 firing rate on good-expected and good-surprising trials. We conclude that FEF neurons are distributed along a functional continuum, with neurons at the two extremes differentiated in particular by the latency following Cue 2 onset at which they signal good surprise.
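The cluster-based permutation test invoked here and in earlier sections is specified in Materials and Methods. As a generic illustration only, the sketch below implements one common variant: a paired (sign-flip) test across neurons on the surprising-minus-expected difference in each time bin, with clusters defined as contiguous bins whose t statistic exceeds a threshold and significance assessed from the maximum cluster mass under permutation. The threshold, the number of permutations, and the synthetic data are assumptions and need not match the procedure actually used.

```python
import math
import random

def t_values(diffs):
    """One-sample t statistic per time bin; diffs[n][t] = neuron n, bin t."""
    n = len(diffs)
    out = []
    for t in range(len(diffs[0])):
        col = [row[t] for row in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def clusters(tvals, thresh):
    """Contiguous runs of bins with t > thresh, as (start, end, mass)."""
    found, start, mass = [], None, 0.0
    for i, t in enumerate(tvals + [float("-inf")]):  # sentinel closes last run
        if t > thresh:
            if start is None:
                start, mass = i, 0.0
            mass += t
        elif start is not None:
            found.append((start, i - 1, mass))
            start = None
    return found

def cluster_permutation_test(diffs, thresh=2.0, n_perm=500, seed=0):
    """Sign-flip permutation test on cluster mass (positive effects only)."""
    rng = random.Random(seed)
    observed = clusters(t_values(diffs), thresh)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))  # flip the whole neuron at once
            flipped.append([sign * v for v in row])
        perm = clusters(t_values(flipped), thresh)
        null_max.append(max((c[2] for c in perm), default=0.0))
    results = []
    for start, end, mass in observed:
        p = (1 + sum(m >= mass for m in null_max)) / (n_perm + 1)
        results.append((start, end, p))
    return results

# Synthetic data: 30 neurons, 20 bins, true effect confined to bins 8-12.
rng = random.Random(1)
data = [[rng.gauss(0, 1) + (1.5 if 8 <= t <= 12 else 0.0) for t in range(20)]
        for _ in range(30)]
for start, end, p in cluster_permutation_test(data):
    print(f"bins {start}-{end}: p = {p:.3f}")
```

Because significance is attached to whole clusters rather than to individual bins, the test identifies an epoch of modulation without a predefined analysis window, which is the property exploited in the analyses above.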
The primary sources of variance across neurons, as revealed by PCA, were their loadings on tonic (PC1) and phasic (PC2) signal components. A, PC1 (top panel) embodies tonic condition-dependent firing, whereas PC2 (bottom panel) embodies phasic condition-dependent firing. B, Nearly all neurons had positive loadings on PC1, whereas they varied with respect to the sign of the loading on PC2. C, Mean population firing rate as a function of time following the onset of Cue 2 for neurons with a positive loading on PC2. D, The mean population firing rate as a function of time following the onset of Cue 2 for neurons with a negative loading on PC2. Black bars in C and D indicate the period following the Cue 2 onset during which firing on surprising trials significantly exceeded firing on expected trials (p ≤ 0.001, cluster-based permutation test applied within a 600 ms window beginning at the Cue 2 onset).
Correlation-based analysis of functional heterogeneity among neurons
To explore further the possibility that neurons differed with respect to good-surprise and bad-surprise signals, we analyzed the degree to which the strength of the good-surprise and bad-surprise signals covaried across the population. Any significant deviation from covariation would indicate that neurons differ with regard to the relative strengths of the signals. We defined the good-surprise signal as (Gs − Ge) / (Gs + Ge) and the bad-surprise signal as (Bs − Be) / (Bs + Be), where G and B indicate good and bad outcomes and e and s indicate expected and surprising outcomes. In designing the analysis, we took into account the fact that signals might peak at different times. To allow for this possibility, we divided the 600 ms following the onset of Cue 2 into twelve 50 ms epochs and carried out a correlation analysis on data from each of the resulting 144 pairs of epochs. We represented the results of each comparison in the form of a heat map in which color temperature indicated the correlation coefficient in each time bin. The cross-neuronal correlation between good-surprise and bad-surprise signals was quite weak as indicated by the roughly equal frequency of warm and cool colors in the correlation map (Fig. 7C).
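The signal definitions and the epoch-by-epoch correlation map described above can be sketched as follows. The contrast index follows the formula given in the text. The map assumes a Pearson correlation across neurons for every pairing of a good-surprise epoch with a bad-surprise epoch. The input values are random placeholders, and the function names are illustrative.

```python
import math
import random

def contrast_index(surprising, expected):
    """(s - e) / (s + e); None when the denominator is zero."""
    total = surprising + expected
    return (surprising - expected) / total if total != 0 else None

def pearson(x, y):
    """Pearson correlation; returns None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def epoch_correlation_map(good_idx, bad_idx):
    """Cross-neuron correlation for every pair of epochs.

    good_idx[n][i] is the good-surprise index of neuron n in epoch i;
    bad_idx[n][j] is the bad-surprise index of neuron n in epoch j.
    Returns a matrix r[i][j].
    """
    n_epochs = len(good_idx[0])
    return [[pearson([row[i] for row in good_idx],
                     [row[j] for row in bad_idx])
             for j in range(n_epochs)]
            for i in range(n_epochs)]

# Example index for one neuron: 30 spikes/sec surprising, 20 expected.
print(contrast_index(30.0, 20.0))

# Placeholder indices: 20 neurons, twelve 50 ms epochs each.
rng = random.Random(0)
n_neurons, n_epochs = 20, 12
good = [[rng.uniform(-0.3, 0.3) for _ in range(n_epochs)]
        for _ in range(n_neurons)]
bad = [[rng.uniform(-0.3, 0.3) for _ in range(n_epochs)]
       for _ in range(n_neurons)]
corr_map = epoch_correlation_map(good, bad)
print(len(corr_map) * len(corr_map[0]))  # number of epoch pairs
```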
Good-surprise and bad-surprise signals varied in strength across neurons with a significant degree of independence. In each panel, color temperature indicates correlation coefficient over the range −1 (deep blue) to +1 (deep red). A, Bad-surprise reliability. For all combinations of 50 ms epochs in a 600 ms window beginning with the onset of Cue 2, color indicates the Spearman–Brown-corrected correlation across neurons between signal strength on odd trials and signal strength on even trials. B, Good-surprise reliability. The same conventions as for bad surprise. C, Correlation across neurons between the good-surprise signal and the bad-surprise signal as based on all trials. Correlation analysis applied to data from a single more prolonged epoch, demarcated by the black outline in A–C, yielded the results indicated in D. Explainable R2 is the fraction of variance in one signal that could in principle be explained by variance in the other signal, a limit imposed by the less reliable of the two signals. Explained R2 is the variance actually explained. P(equal) is the probability that explained variance was equal to explainable variance. P(explained = 0) is the probability that explained variance was equal to zero. All p values were computed by applying a z-test to Fisher-transformed correlation coefficients. By an identical approach, we characterized the cross-neuronal correlation between the good-surprise signal and the value signal (E–H) and between the bad-surprise signal and the value signal (I–L). In all analyses, the good-surprise signal was defined as (Gs − Ge) / (Gs + Ge), the bad-surprise signal was defined as (Bs − Be) / (Bs + Be), and the value signal was defined as (Ge − Be) / (Ge + Be), where G and B indicate the mean firing rate on trials when Cue 2 announced a good or a bad outcome and e and s indicate whether the announcement was expected or surprising.
To interpret this result requires taking into account the reliability of the signals being compared. To estimate reliability, we divided the full dataset into two subsets based on odd-numbered and even-numbered trials. We then measured the Pearson's correlation across neurons between each signal as measured on odd trials and the same signal as measured on even trials and followed up with a Spearman–Brown correction to obtain reliability. In the resulting maps (Fig. 7A,B), warm colors clearly predominate, indicating substantial reliability. Knowing the reliability of each signal allows posing the critical question: is the correlation between a given pair of signals as strong as the reliability of the less reliable member of the pair being compared? The cross-neuronal correlation between good-surprise and bad-surprise signals (Fig. 7C) was obviously lower on average than the reliability of the two signals considered individually (Fig. 7A,B). This suggests that there was genuine variability across neurons with regard to the relative strength of good-surprise and bad-surprise signals.
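The reliability estimate described above combines a split-half correlation with the Spearman–Brown correction, r_full = 2r / (1 + r). A minimal sketch, assuming that odd-numbered and even-numbered trials are separated by position in each trial list and that the signal is the contrast index defined earlier. The data structure and names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation across paired values (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Full-length reliability predicted from a split-half correlation."""
    return 2.0 * r_half / (1.0 + r_half)

def mean(values):
    return sum(values) / len(values)

def contrast(s, e):
    return (s - e) / (s + e)

def split_half_reliability(neurons):
    """neurons: list of (surprising_trials, expected_trials) firing rates.

    Computes the signal separately on odd-numbered and even-numbered
    trials, correlates the two across neurons, and applies the correction.
    """
    odd, even = [], []
    for s_trials, e_trials in neurons:
        # Position 0 holds trial 1, so [0::2] selects odd-numbered trials.
        odd.append(contrast(mean(s_trials[0::2]), mean(e_trials[0::2])))
        even.append(contrast(mean(s_trials[1::2]), mean(e_trials[1::2])))
    return spearman_brown(pearson(odd, even))

print(round(spearman_brown(0.5), 3))  # a split-half r of 0.5 corrects to 0.667
```

The correction is needed because each half contains only half of the trials, so the raw split-half correlation understates the reliability of the signal measured on all trials.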
To test this conclusion statistically, we considered data from a temporal window selected on the basis of the cluster-based permutation test (228–395 ms following the Cue 2 onset) in which both signals were strong. This window is represented by a black square superimposed on each heat map in Figure 7A–C. The results are presented in Figure 7D. The correlation between the two signals (r = 0.097) was significantly lower (p = 7.4 × 10−9 as based on a z-test applied to the Fisher-transformed correlation coefficients) than the reliability ceiling imposed by bad-surprise self-correlation (r = 0.51). The small residual positive correlation did not itself achieve statistical significance (p = 0.089). We conclude that the good-surprise and bad-surprise signals were not manifestations of a single latent variable. Instead, there was significant variation from neuron to neuron with regard to their relative strengths. Analyses based on individual-monkey datasets and on the neuronal subset selected to eliminate any time-zero offset in firing rate between surprising and expected conditions (Fig. 3) yielded qualitatively similar results. However, there were small interindividual differences. We note in particular that the trend toward a positive correlation between the good-surprise and bad-surprise signals (Fig. 7D) depended solely on M1.
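The z-tests on Fisher-transformed correlation coefficients mentioned above can be sketched as follows. The sketch assumes n = 303 neurons, two-sided tests, and, for the comparison of two coefficients, independent samples. These are simplifying assumptions, so the output should not be expected to reproduce the reported p values exactly.

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation coefficient."""
    return math.atanh(r)

def p_correlation_is_zero(r, n):
    """Two-sided p for the null hypothesis that the true correlation is 0."""
    z = fisher_z(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_correlations_equal(r1, r2, n1, n2):
    """Two-sided p for the null hypothesis that two correlations are equal.

    Assumption: the two coefficients come from independent samples.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

N = 303  # assumed number of neurons entering the analysis
print(f"{p_correlation_is_zero(0.097, N):.3f}")
print(f"{p_correlations_equal(0.51, 0.097, N, N):.1e}")
```

With the rounded coefficients quoted above, the first test yields p of roughly 0.09 and the second a p on the order of 10−8, the same order as the reported values. Exact agreement would require the unrounded coefficients and the authors' handling of sidedness and of dependence between the two correlations.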
Because neurons were sensitive not only to the surprise induced by Cue 2 but also to the value of the reward it announced, we thought it worthwhile to determine whether there was any correlation across neurons between the surprise signals and the value signal. For this purpose, we defined the value signal as (Ge − Be) / (Ge + Be), where Ge and Be represent the mean firing rate on trials when Cue 2 announced an expected-good or an expected-bad outcome. The good-surprise signal was not significantly correlated with the value signal (Fig. 7E–H). In contrast, the bad-surprise signal was strongly (r = 0.39) and significantly (p = 3.9 × 10−12) correlated with it (Fig. 7I–L). We have no ready explanation for these surprising findings. They may bear some relation to the observation that firing elicited by a surprising visual stimulus and firing dependent on the reward size are uncorrelated across cortical areas (Zhang et al., 2022).
Image-based surprise
The analyses described in preceding sections were all based on conditions in which, when surprise occurred, it was induced by an unexpected image announcing an unexpected outcome. The resulting elevation of firing rate might in principle have depended either on the unexpectedness of the image or the unexpectedness of the outcome. To distinguish between these possibilities, we compared trials in which the image alone was surprising (trials excluded from all preceding analyses) to trials in which neither the image nor the outcome was surprising. The results differed as a function of the reward size. Under the large-reward condition, firing was enhanced when the image was surprising (Fig. 8A–C, p = 1.2 × 10−10, Wilcoxon signed-rank test), with the effect occurring at a latency intermediate between the latencies of good-surprise and bad-surprise effects (Fig. 4A). Under the small-reward condition, image-based surprise induced no observable change in the population firing rate (Fig. 8D–F, p = 0.50, Wilcoxon signed-rank test). We proceeded to compare the elevation of the firing rate induced by image-based surprise to the elevation of the firing rate induced by reward-based surprise. We found, for conditions with large reward, that the population firing rate was significantly greater when both the reward size and image identity were surprising than when image identity alone was surprising (Fig. 9A, p = 7.5 × 10−28, Wilcoxon signed-rank test). This was true as well for conditions with small reward (Fig. 9B, p = 0.00025). We conclude that reward-based surprise induced an elevation of the population firing rate independent of image-based surprise. Effects significant in the full dataset, as indicated here, also achieved statistical significance in individual-monkey datasets.
Neurons exhibited image-based surprise only when the predicted reward was large. Results presented in this figure are based on trials in which Cue 1 set up an expectation of large reward (A–C) or small reward (D–F) and Cue 2 confirmed the expectation. The identity of the confirming Cue 2 image was itself either expected (p = 0.67) or surprising (p = 0.17). A and D show the mean population firing rate as a function of time on trials in which Cue 2 was surprising (orange) or expected (cyan). All other conventions as in Figure 2.
The elevation of the neuronal firing rate associated with reward-based surprise could not be explained merely by the surprising identity of the image announcing the unexpected reward. This is indicated by the fact that the firing rate was significantly greater on trials when image identity and outcome were both surprising than on trials when image identity alone was surprising. A, Analysis for images announcing a large reward, based on firing in a 138–310 ms post-Cue 2 window during which the reward-based good-surprise signal and the image-based surprise signal overlapped. B, Analysis for images announcing a small reward, based on firing in a 228–395 ms post-Cue 2 window during which the bad-surprise signal was present. Each point represents data from one neuron. Filled circles represent instances in which the difference in the firing rate between the two conditions was statistically significant (paired t test, p < 0.05). Counts (n) give the total number of points (and parenthetically the number of filled points) above and below the identity line. The total of counts is one less in B than in A because one point fell on the identity line. The p value indicates the level of significance at which the median firing rate on trials involving reward-based surprise exceeded the median firing rate on trials involving image-based surprise alone (Wilcoxon signed-rank test).
Relation to behavioral measures
Pupillary diameter
Reward prediction errors have been reported to induce arousal as indexed by pupillary dilation (Preuschoff et al., 2011; O’Reilly et al., 2013; Braem et al., 2015; Browning et al., 2015; de Gee et al., 2021; Rothenhoefer et al., 2021). We wondered whether prediction errors in our study would have the same effect. Resolving this issue was complicated by the fact that pupillary diameter at the time of the onset of Cue 2 was greater when Cue 1 had predicted a large reward than when it had predicted a small reward (Fig. 10A). For this reason alone, the difference in pupillary diameter between surprising and expected conditions would have been negative on trials with a good outcome and positive on trials with a bad outcome. To cancel out this effect, we combined data across trials with good and bad outcomes before computing the impact of surprise on pupillary diameter. We found that pupillary diameter was indeed greater when the announced outcome was surprising than when it was expected (Fig. 10B). The effect was superadded at a latency of ∼240 ms to the constrictive light reflex, which itself began ∼180 ms after the Cue 2 onset (Pong and Fuchs, 2000). In contrast to reward-based surprise, image-based surprise exerted no consistent effect on pupillary diameter (Fig. 10C). The results shown in Figure 10 were driven primarily by M1 both because M1 contributed data from more sessions (56 in M1 vs 8 in M2) and because the effects were stronger. Nevertheless the pattern in which pupillary diameter depended on task condition was similar in the two monkeys. We conclude that reward-based surprise was arousing, whereas image-based surprise was not.
Pupillary diameter (PD) increased in response to cues inducing reward-based surprise. A, Difference in PD between trials in which Cue 1 predicted a large versus small reward. This effect complicated analysis of subsequent surprise-induced effects as explained in the text. B, Difference in PD between trials in which Cue 2 announced a surprising versus expected outcome. C, Difference in PD between trials in which Cue 2 announced a reward of the expected size with an unexpected image. Each bar represents the mean across 64 sessions. The ribbon is ±SEM.
Fixation breaks
If reward-based surprise induced arousal, then one might expect it to enhance task engagement. Because the task in our study required only central fixation, the sole measure of task engagement available to us was the rate at which monkeys terminated a trial prematurely by breaking fixation. Upon analyzing this behavior as a function of task condition, we found that reward-based surprise did indeed reduce the fixation-break rate (Fig. 11). The effect was statistically significant both for good surprise (p = 0.037, Wilcoxon signed-rank test) and for bad surprise (p = 1.6 × 10−7). In contrast, image-based surprise did not significantly reduce the fixation-break rate. The results shown in Figure 11 were driven primarily by M1 because M1 contributed data from more sessions (58 in M1 vs 8 in M2). However, the tendency for bad surprise to reduce the fixation-break rate was present and statistically significant (p < 0.01) in both monkeys. We conclude that reward-based surprise significantly enhanced task engagement, whereas the impact of image-based surprise was weak if present at all.
The rate at which monkeys broke fixation decreased in response to cues inducing reward-based surprise. Left pair of bars: The fixation-break rate during the Cue 1 presentation period and during Delay 1 on trials in which Cue 1 predicted a large or a small reward. Middle triad of bars: The fixation-break rate during the Cue 2 presentation period and during Delay 2 on trials in which Cue 2 announced an expected-large reward, announced an expected-large reward with an unexpected image, or announced a surprising-large reward. Right triad of bars: The fixation-break rate during the Cue 2 presentation period and during Delay 2 on trials in which Cue 2 announced an expected-small reward, announced an expected-small reward with an unexpected image, or announced a surprising-small reward. Each bar represents the mean across 66 sessions. p values are based on Wilcoxon signed-rank test. Only the indicated comparisons were carried out.
Gaze angle
Monkeys were required to maintain fixation within a central window throughout each trial. This did not, however, prevent microsaccades, which commonly occur during fixation (Hafed and Ignashchenkova, 2013; Lowet et al., 2018b; Yu et al., 2022; Willett and Mayo, 2023). Microsaccades can excite visually responsive neurons by inducing motion of the image within the RF (Leopold and Logothetis, 1998; Lowet et al., 2018a,b; Yu et al., 2022). If microsaccades were more frequent under surprising than under expected conditions, then reafferent visual stimulation arising from the eye movements might underlie surprise-associated firing. To address this issue, we measured within-trial gaze-angle variance in a 500 ms window beginning with the onset of Cue 2. Variance was <200 arc minutes on average, from which it follows that gaze was within half a degree of the within-trial mean >95% of the time. Variance, both horizontal and vertical, was significantly lower under the surprising-good than under the expected-good condition (p < 0.001, Wilcoxon rank-sum test). This effect was present and statistically significant in both M1 and M2. It is the opposite of the pattern expected if surprise-associated firing were an artifact of eye movements. We conclude that the surprise signal cannot be explained as a product of eye movement-induced reafferent visual stimulation.
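The step from gaze variance to the statement that gaze stayed within half a degree of the mean more than 95% of the time can be checked with a short calculation. The sketch assumes that the reported variance is in squared arc minutes along one axis and that gaze position within a trial is approximately Gaussian. Neither assumption is stated explicitly in the text.

```python
import math

def fraction_within(limit_arcmin, variance_arcmin2):
    """Fraction of samples within +/- limit of the mean for a Gaussian
    with the given variance (single axis)."""
    sd = math.sqrt(variance_arcmin2)
    return math.erf(limit_arcmin / (sd * math.sqrt(2.0)))

# Half a degree = 30 arc minutes; variance at the stated upper bound of 200.
frac = fraction_within(30.0, 200.0)
print(f"{frac:.3f}")  # 0.966
```

A variance of 200 corresponds to a standard deviation of about 14 arc minutes, so 30 arc minutes lies a little more than two standard deviations from the mean. Any smaller variance gives a larger fraction.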
Discussion
The key finding of this study is that FEF neurons give enhanced responses to a visual cue presented inside the RF when it announces a reward of surprising size, regardless of whether the surprise is good or bad. The surprise-induced elevation of the firing rate is accompanied by a reduction of pairwise spike-count correlations. This observation is consistent with the interpretation that surprising cues capture attention. Attention reduces the average spike-count correlation in the visual cortex (Cohen and Maunsell, 2009; Ghosh and Maunsell, 2021), while, in the FEF, measures closely related to attention including detection accuracy (Astrand et al., 2016; Amengual et al., 2022) and saccadic reaction time (Khanna et al., 2019) are associated with reduced spike-count correlation. Good-surprise and bad-surprise signals are not, however, strictly equivalent. Signaling of bad surprise is delayed by ∼150 ms relative to signaling of good surprise. The existence of a substantial temporal offset rules out generation of good-surprise and bad-surprise signals by simple rectification of a signed surprise signal because rectification should require the same computational time regardless of whether the input is negative or positive (Barry and Gerstner, 2024). In particular, it rules out generation of the unsigned surprise signal in FEF by rectification of the dopaminergic signed reward-prediction-error signal (Schultz et al., 1997). On a related note, good-surprise and bad-surprise signals vary in strength across neurons with considerable independence. This observation is in harmony with a recent meta-analysis of human fMRI-based literature indicating that good-surprise and bad-surprise signals are not distributed uniformly across the cerebral cortex (Fouragnan et al., 2018).
Previous studies in cortical areas other than the FEF
Numerous previous studies have examined the impact of reward-based surprise on neuronal activity in the cerebral cortex. However, the reported results generally differ from ours in that surprise did not elicit consistent excitation. In the supplementary eye field, neurons are sensitive to cues eliciting both good and bad surprise, but the impact of surprise is sometimes an increase and sometimes a decrease of firing rate and does not necessarily take an unsigned form (So and Stuphorn, 2012; Kawaguchi et al., 2015). Descriptions of anterior cingulate cortex are inconclusive either because positive and negative errors were not characterized in a parallel manner or because the results were mixed (Ito et al., 2003; McCoy et al., 2003; Amiez et al., 2005; Matsumoto et al., 2007; Seo and Lee, 2007; Kennerley and Wallis, 2009; Sul et al., 2010; Kennerley et al., 2011; Klavir et al., 2013; Khamassi et al., 2015; Monosov, 2017; Oemisch et al., 2019). The same is true of the majority of studies of the dorsolateral prefrontal cortex (Kennerley and Wallis, 2009; Kennerley et al., 2011; Khamassi et al., 2015; Asaad et al., 2017; Oemisch et al., 2019). Finally, in orbitofrontal cortex, most reports indicate a lack of robust reward-based surprise signals (Tremblay and Schultz, 2000; Kennerley et al., 2009, 2011; Takahashi et al., 2009; Stalnaker et al., 2018), while one describes signaling of signed reward prediction error (Sul et al., 2010).
The discrepancy between our results and the results described above might reflect genuine variation across cortical areas or, alternatively, might arise from differences in experimental design. The current approach is unique in combining three design features. (1) We induced surprise with visual cues. Many previous studies used primary reinforcers, for example, delivering reward to induce positive surprise and omitting reward to induce negative surprise. This complicates analysis by producing a confound between the sign of surprise and the modality of the eliciting event. The complication is unnecessary. Cues of one modality, announcing worse-than-expected and better-than-expected outcomes, are sufficient, within the framework of standard learning theory, to elicit prediction errors (Schultz et al., 1997; Ludvig et al., 2012). Moreover, in humans, visual cues inducing reward prediction errors elicit BOLD activation in the orbitofrontal cortex and ventral striatum (O'Doherty et al., 2003; Delgado et al., 2011) and mediate learning (Nassar et al., 2012; Rouhani and Niv, 2021). (2) We presented visual stimuli within the mapped neuronal RF. Previous studies using visual or other sensory cues presented them at an arbitrary location. This complicates interpretation in the case of any neuron with a spatially restricted RF. (3) We used cues with fixed, highly overlearned reward associations. Many previous studies required animals to learn action–reward associations as they changed across trials. That approach allows analyzing the relation of neuronal activity to learning but is suboptimal for characterizing surprise-induced neuronal signals because prediction errors cannot be precisely controlled or explicitly measured and are often small.
Two prior reports do describe effects similar to those we report. Bryden and colleagues recorded from the anterior cingulate cortex of rats performing a task in which, on each trial, an odor instructed the subject which of two fluid wells to select (Bryden et al., 2011). One well delivered a large or rapid reward, whereas the other delivered a small or delayed reward. The contingencies were subject to unannounced reversal. Responses to reward were especially strong immediately after reversal. Asaad and Eskandar recorded from the periprincipalis prefrontal cortex in monkeys performing a four-target forced-choice saccadic task in which the target associated with reward changed intermittently. Delivery or nondelivery of reward was announced by a postsaccadic foveal cue (Asaad and Eskandar, 2011). In a majority of neurons, the response to the cue was enhanced when it announced a surprising outcome regardless of whether the violation of expectation was good or bad. Particularly relevant to our own findings is the observation that bad-surprise signals appeared at longer latency than good-surprise signals (Asaad and Eskandar, 2011). Unfortunately, the latencies in their study cannot be compared directly with ours because the authors report normalized rather than absolute durations.
No previous study has characterized the impact of reward-based surprise on neurons in the three regions most commonly considered to embody salience maps: the FEF, the lateral intraparietal area, and the superior colliculus (Fecteau and Munoz, 2006). Even studies of surprise involving modalities other than reward have been very limited in this system. It is well established that a visual stimulus clearly different from antecedent or surrounding stimuli elicits enhanced neuronal responses if placed in the RF (Constantinidis and Steinmetz, 2005; Mayo and Sommer, 2008; Boehnke et al., 2011; Dutta and Gutfreund, 2014; Joiner et al., 2017; White et al., 2019). At a computational level, such effects might be construed as surprise-based, given the expectation that elements in a scene should be continuous in time and uniform in space (Baldi and Itti, 2010). However, at the level of implementation, the elevation of the firing rate depends on hard-wired mechanisms (adaptation and surround suppression) rather than on active expectation and surprise. To our knowledge, the only previous evidence for genuine surprise-based firing within this system is the observation that neurons of the superior colliculus give enhanced responses to a dim oddball stimulus preceded by a string of several bright standard stimuli (Boehnke et al., 2011).
Reward-based surprise versus reward prediction error and image-based surprise
Our results might be construed to indicate that FEF neurons signal either unsigned reward prediction error or reward-based surprise. Unsigned reward prediction error, which typically figures in discussions of learning, is the magnitude of the difference in subjective value between expected and delivered reward (Pearce and Hall, 1980; Schultz, 2017). Surprise, which typically figures in discussions of attention, is defined as arising from violation of a strong expectation (Barry and Gerstner, 2024) and is typically characterized with information-based measures such as Kullback–Leibler divergence (Itti and Baldi, 2009; Baldi and Itti, 2010). It is possible with appropriate experimental design to orthogonalize reward prediction error and reward-based surprise (Chumbley et al., 2014; Grohn et al., 2020; Rothenhoefer et al., 2021). This is not achievable under the limited set of conditions in the present experiment. However, given the well-established role of the FEF in attention, we lean toward an interpretation in terms of reward-based surprise as distinct from unsigned reward prediction error.
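The difficulty of separating the two quantities in a two-cue, two-outcome design can be illustrated with a short numerical sketch. It computes unsigned reward prediction error as the absolute difference between announced and expected value, and surprise as the Kullback–Leibler divergence between the belief after Cue 2 and the belief after Cue 1. All numbers are assumptions made for the example: the probability 0.8 stands in for "high probability," reward magnitudes in arbitrary units stand in for subjective value, and the function names are hypothetical.

```python
import math


def kl_divergence_bits(posterior, prior):
    """KL(posterior || prior) in bits for two discrete distributions."""
    total = 0.0
    for q, p in zip(posterior, prior):
        if q > 0:
            if p == 0:
                return math.inf  # posterior mass where the prior had none
            total += q * math.log2(q / p)
    return total


def unsigned_rpe(values, prior, announced_index):
    """Absolute difference between the announced value and the expected value."""
    expected = sum(v * p for v, p in zip(values, prior))
    return abs(values[announced_index] - expected)


# Assumed values, not taken from the experiment.
values = (1.0, 0.0)   # (large reward, small reward), arbitrary units
prior = (0.8, 0.2)    # belief after a "good" Cue 1: P(large), P(small)

results = {}
for label, idx in (("confirmed (large)", 0), ("violated (small)", 1)):
    # Cue 2 announces the outcome with certainty.
    posterior = tuple(1.0 if i == idx else 0.0 for i in range(len(values)))
    results[label] = (
        unsigned_rpe(values, prior, idx),
        kl_divergence_bits(posterior, prior),
    )
    print(f"{label}: |RPE| = {results[label][0]:.2f}, "
          f"surprise = {results[label][1]:.2f} bits")
```

Under these assumptions, the confirming cue yields an unsigned prediction error of about 0.2 and a surprise of about 0.32 bits, whereas the violating cue yields about 0.8 and 2.32 bits. Because Cue 2 resolves the outcome with certainty, both measures rise together as the announced outcome becomes less expected, which is why a design of this kind cannot dissociate them.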
Any image announcing an unexpected outcome necessarily possesses unexpected visual properties. No previous study has controlled for the possibility that the response to a cue announcing an unexpected reward might be driven by the unexpectedness of the cue rather than of the reward. As a control for the contribution of purely image-based surprise, we included conditions in which Cue 2 was surprising although the outcome it announced was expected. We found that image-based surprise did elicit enhanced firing. However, it did so only when the image predicted a large reward. This contingency could explain why, in a previous study, neurons of Area 8a, in or near the FEF, did not exhibit image-based surprise (Zhang et al., 2019). The FEF thus appears to differ from numerous other cortical areas where visual surprise does elicit enhanced firing even when the images convey no reward predictions (Meyer and Olson, 2011; Meyer et al., 2014; Ramachandran et al., 2016, 2017; Kumar et al., 2017; Schwiedrzik and Freiwald, 2017; Kaposvari et al., 2018; Zhang et al., 2019; Vergnieux and Vogels, 2020; Feuerriegel et al., 2021; Esmailpour et al., 2023).
Footnotes
We thank Karen McCracken for her technical assistance. This work was supported by National Institutes of Health (NIH) R01 EY030226 and technically supported by NIH P30 EY08098 and T32 EY017271 and by the CMU-Pitt BRIDGE Center (RRID:SCR_023356).
The authors declare no competing financial interests.
Correspondence should be addressed to Carl R. Olson at colson@cnbc.cmu.edu or Michael R. Shteyn at mshteyn@andrew.cmu.edu.

















