## Abstract

Risk is a ubiquitous feature of life. It plays an important role in economic decisions by affecting subjective reward value. Informed decisions require accurate risk information for each choice option. However, risk is often not constant but changes dynamically in the environment. Therefore, risk information should be updated to the current risk level. Potential mechanisms involve error-driven updating, whereby differences between current and predicted risk levels (risk prediction errors) are used to obtain currently accurate risk predictions. As a major reward structure, the orbitofrontal cortex is involved in coding key reward parameters such as reward value and risk. In this study, monkeys viewed different visual stimuli indicating specific levels of risk that deviated from the overall risk predicted by a common earlier stimulus. A group of orbitofrontal neurons displayed a risk signal that tracked the discrepancy between current and predicted risk. Such neuronal signals may be involved in the updating of risk information.

## Introduction

Very few things in life are certain. Most of our decisions involve some degree of uncertainty. Even if we know the values of desired objects, we often don't know for sure whether we will obtain them. Our decisions benefit greatly from knowing how certain these objects are. To make reasonable decisions, we need two vital pieces of information, namely the value of each desired object and the uncertainty of obtaining it. More formally, the two main parameters influencing our decisions are the first two statistical moments of reward probability distributions, namely the expected value (anticipated mean) and the variance. The variance and its square root, standard deviation (SD), are commonly referred to as risk. Importantly, these parameters are not constants but vary dynamically in the environment. Therefore, it is important to update our knowledge to the currently valid levels of expected value and risk.

Error-driven mechanisms provide common methods for updating knowledge about important decision variables. For example, reward prediction errors, which capture the discrepancy between current and predicted reward values, are thought to critically serve the learning about future reward values (Rescorla and Wagner, 1972; Sutton and Barto, 1981). A similar mechanism may operate for risk. Specifically, a risk prediction error that captures the discrepancy between current and predicted risk may be involved in updating our knowledge about risk (Preuschoff et al., 2008).

Reward value and risk are encoded by neurons in the orbitofrontal cortex (Thorpe et al., 1983; Tremblay and Schultz, 1999; Hikosaka and Watanabe, 2000; Wallis and Miller, 2003; Roesch and Olson, 2004; Padoa-Schioppa and Assad, 2006; Kepecs et al., 2008; Kennerley et al., 2009, 2011; O'Neill and Schultz, 2010). Moreover, human imaging studies suggest coding of reward value prediction errors in orbitofrontal cortex (O'Doherty et al., 2003; Dreher et al., 2006). These data suggest the presence of error-driven mechanisms in the orbitofrontal cortex. Given the involvement of orbitofrontal cortex in risk processing and the possibility of error-driven mechanisms, we investigated whether single neurons in the orbitofrontal cortex code risk prediction errors. In a simple task, monkeys viewed visual cues that indicated transitions in risk. We identified a population of orbitofrontal neurons that tracked the discrepancies between current and predicted risk, thus displaying a neurophysiological risk prediction error signal.

## Materials and Methods

##### Subjects.

We used two adult male rhesus monkeys (*Macaca mulatta*), weighing 10–14 kg. The monkeys were implanted, under general anesthesia, with a head holder and a stainless steel chamber on the skull to enable daily electrophysiological recordings from single neurons. All surgical and experimental procedures were performed under a Home Office License according to the United Kingdom Animals (Scientific Procedures) Act 1986.

##### Behavioral task.

During training and testing, the monkeys were on a restricted water schedule 6 d of the week and 24 h water *ad libitum*. The monkeys were trained to sit in a restraining chair in front of a computer monitor with the head fixed and to perform a memory-guided saccade task. An aperture in the front of the chair provided access to a touch-sensitive key. To commence a trial, the monkey fixated on a red spot in the center of the monitor and contacted the key. After 1.5 s, a visual cue appeared in pseudorandom alternation to either the left or right of the fixation spot for 0.5 s (Fig. 1*A*). The animal maintained fixation for an additional 2 s before the center spot was extinguished, which was the signal for the monkey to saccade to the left or right cue location. A successful saccade led to the appearance of a red fixation spot at the peripheral location. After fixation for 1 s, the spot turned green, and the animal released the key. Juice reward was delivered 1 s later. The next trial started with the appearance of the central fixation spot at 3.5 s after the reward. Thus, the intertrial interval was 3.5 s, and the total cycle time (trial duration plus intertrial interval) was 10.5 s. For neuronal recordings, only one cue was shown per trial. We assessed the monkey's risk preferences in a subset of behavioral sessions in which the animal chose between two cues (one safe and one risky).

##### Stimuli and independent variables.

As risk cues, we used black bars on framed, rectangular white backgrounds. The vertical position of the bar indicated juice volume. Two bars within the rectangle indicated that one of two possible juice volumes would be delivered with equal probability (*p* = 0.5 each), thus explicitly indicating the risk of the outcomes (Fig. 1*B*). We used three pseudorandomly alternating gambles whose reward volumes were 0.27 and 0.33 ml for gamble 1, 0.24 and 0.36 ml for gamble 2, and 0.18 and 0.42 ml for gamble 3, resulting in the same mathematical expected value (EV; Eq. 1) of 0.3 ml for each gamble (and thus a common EV of 0.3 ml). Only one reward volume was delivered at trial end. The gambles had three different levels of risk (Fig. 1*B*). We defined risk as SD of a probability distribution with the following:
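$$\mathrm{EV} = \sum_{i=1}^{n} p_i \, x_i \tag{1}$$

$$\mathrm{SD} = \sqrt{\sum_{i=1}^{n} p_i \, (x_i - \mathrm{EV})^2} \tag{2}$$

where *n* is the number of possible juice volumes (two volumes for current risk at each cue, six volumes for overall risk at the fixation spot), *x*_{i} is the *i*th juice volume, and *p*_{i} is its probability (equal for all volumes, i.e., 1/*n*).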

Using Equation 2, the current risk is defined as the SD of each risk cue, and the predicted risk is defined as the SD of all possible reward outcomes (Fig. 1*B*,*C*). We defined the risk prediction error (RiPE) as the current risk minus the predicted risk, that is, the SD indicated by each risk cue minus the predicted SD indicated by the preceding fixation spot:

$$\mathrm{RiPE} = \mathrm{SD}_{\text{current}} - \mathrm{SD}_{\text{predicted}} \tag{3}$$

This measure provided three different levels of signed RiPE, as indicated by the red arrows above and below the maroon horizontal dashed line in Figure 1*B*. The signed RiPEs were correlated with risk because the overall risk prediction, which is subtracted from current risk, was constant for all three gambles. Therefore, we considered unsigned, absolute RiPEs, which were 0.05, 0.02, and 0.04 ml for the low-, medium-, and high-risk cues, respectively (Fig. 1*B*,*C*).
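The risk values and prediction errors described above can be checked with a short calculation. The following Python sketch assumes that every listed juice volume is equally likely (*p* = 0.5 within a gamble, 1/6 across the six volumes at the fixation spot) and that risk is the probability-weighted SD of Equation 2. The function and variable names are illustrative and are not taken from the original analysis.

```python
import math

# Juice volumes (ml) of the three gambles as given in the text.
GAMBLES = {
    "low": (0.27, 0.33),
    "medium": (0.24, 0.36),
    "high": (0.18, 0.42),
}


def expected_value(volumes):
    """Expected value, assuming all volumes are equally likely."""
    return sum(volumes) / len(volumes)


def risk_sd(volumes):
    """Risk as the SD of an equiprobable distribution over the volumes."""
    ev = expected_value(volumes)
    return math.sqrt(sum((x - ev) ** 2 for x in volumes) / len(volumes))


# Current risk: SD of the two volumes indicated by each cue.
current_risk = {name: risk_sd(v) for name, v in GAMBLES.items()}

# Predicted risk at the fixation spot: SD over all six possible volumes.
all_volumes = [x for v in GAMBLES.values() for x in v]
predicted_risk = risk_sd(all_volumes)

# Signed and unsigned risk prediction errors (current minus predicted risk).
signed_ripe = {name: sd - predicted_risk for name, sd in current_risk.items()}
abs_ripe = {name: abs(err) for name, err in signed_ripe.items()}

for name in GAMBLES:
    print(f"{name}: risk={current_risk[name]:.2f} ml, "
          f"RiPE={signed_ripe[name]:+.2f} ml, |RiPE|={abs_ripe[name]:.2f} ml")
```

Under these assumptions the predicted risk is about 0.08 ml, and the absolute RiPEs round to 0.05, 0.02, and 0.04 ml for the low-, medium-, and high-risk cues, matching the values given in the text.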

The red fixation spot at trial onset was identical in all trial types and predicted that one of the three possible risk cues would follow. Therefore, the global, predicted risk at the time of the fixation spot was identical in all trials (Fig. 1*B*, maroon arrow; Eq. 2). The subsequent appearance of the explicit risk cue indicated the specific risk in the current trial (Fig. 1*B*, blue arrows; Eq. 2). In addition to the explicit risk information, with each risk cue there was a transition from the global, predicted risk at the time of the fixation spot to the specific risk signaled by the explicit risk cue in the current trial (Fig. 1*B*, red arrows; Eq. 3). Therefore, appearance of a specific cue indicated the risk per se in any given trial and also elicited a RiPE between the risk indicated by that cue and the global risk predicted by the fixation spot. Note that the expected value was constant for all cues. A reward value prediction error is calculated as the difference in reward value from expected value and is thus zero at the time of cue presentation.

However, monkeys displayed subjective preferences for risk, as described before (McCoy and Platt, 2005; O'Neill and Schultz, 2010). Thus, although all outcomes had the same expected objective value, their subjective values varied with risk level. Hence, there was a subjective value prediction error at the time of cue presentation. To assess the subjective values derived from the influence of risk on value, we investigated behavioral choices between a safe cue and each of the risky cues. The safe cue consisted of a single bar that indicated a safe juice volume equal to the expected value of the risky cues (*p* = 1.0). The identical expected objective values of juice volumes of the safe and the risky cues allowed us to assess the subjective value derived from risk sensitivity without confounding differences in expected objective value. We used two-way ANOVA to assess the monkeys' preferences in choices between safe and risky cues (percentage of choice of the risky over the safe option), with level of risk and monkey as factors. The percentage preference for each of the risky cues compared with the safe cue was taken as a numerical measure of subjective value. Thus, the subjective value prediction error (SVPE) was calculated as the subjective value (SV) of each risk cue minus the predicted subjective value indicated by the preceding fixation spot:

$$\mathrm{SVPE} = \mathrm{SV}_{\text{current}} - \mathrm{SV}_{\text{predicted}} \tag{4}$$

The predicted SV at the fixation spot was the numerical average of the subjective values of the three risk cues as assessed by the respective behavioral preferences. The current SV was derived from the numerical preference measure (percentage) for each risk cue. Because we were interested in comparing the neuronal responses to the unsigned, absolute RiPEs, we considered the absolute SVPEs, which were 10, 0, and 18% for the low-, medium-, and high-risk cues, respectively.

##### Neuronal recording and data analysis.

We isolated and recorded the activity of single neurons in the orbitofrontal cortex while monkeys performed the task, according to procedures described previously (O'Neill and Schultz, 2010). In the first step of analysis, we defined the presence of cue-related neuronal responses by the Wilcoxon test, which compared neuronal activity during a period of 0.1–0.6 s following cue onset against a control period of 1.0 s before the fixation spot. In the second step, we performed a multiple linear regression analysis on the cue responses identified by the Wilcoxon test.
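Because the cue window (0.5 s) and the control window (1.0 s) differ in length, spike counts have to be converted to rates before they are compared. The sketch below illustrates this step for a single trial. It is a minimal illustration under stated assumptions, not the authors' code: the spike times, event times, and function names are hypothetical, and spike times are assumed to be in seconds on a common clock.

```python
def rate_in_window(spike_times, start, end):
    """Mean firing rate (spikes/s) for spikes with start <= t < end."""
    count = sum(1 for t in spike_times if start <= t < end)
    return count / (end - start)


def cue_and_control_rates(spike_times, fixation_onset, cue_onset):
    """Return (cue_rate, control_rate) for one trial.

    Cue window: 0.1-0.6 s after cue onset.
    Control window: the 1.0 s before fixation spot onset.
    """
    cue_rate = rate_in_window(spike_times, cue_onset + 0.1, cue_onset + 0.6)
    control_rate = rate_in_window(spike_times, fixation_onset - 1.0, fixation_onset)
    return cue_rate, control_rate


# Hypothetical trial: fixation spot at t = 0 s, cue at t = 1.5 s.
example_spikes = [-0.8, -0.3, 0.4, 1.65, 1.7, 1.9, 2.05, 2.4]
cue_rate, control_rate = cue_and_control_rates(example_spikes, 0.0, 1.5)
print(f"cue: {cue_rate:.1f} spikes/s, control: {control_rate:.1f} spikes/s")
```

Collecting these two rates across trials gives the samples that a Wilcoxon test would compare, for example with `scipy.stats.wilcoxon` if the comparison is treated as paired within trials (an assumption here, since the text does not state the variant used).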

The cues indicated both the RiPE and the risk per se on each trial. Therefore, both these terms were included in a multiple regression model used to assess the relationship of the cue responses to each of these variables:
$$Y = \beta_0 + \beta_1 \, |\mathrm{RiPE}| + \beta_2 \, \mathrm{Risk} + e \tag{5}$$

where *Y* is the neuronal firing rate; |RiPE| is the unsigned, absolute risk prediction error; Risk is the SD indicated by the cue; β_{1} and β_{2} are corresponding regression coefficients; β_{0} is the intercept; and *e* is error. Note that because of the risk-seeking attitude of our animals (Fig. 1*D*), risk covaried with subjective value, neither of which was of primary interest for this study. However, the prediction errors in these variables might correlate. Therefore, we disambiguated RiPE from SVPE as described below (Eqs. 7, 8).

To confirm the capacity of RiPE for explaining variance of neuronal activity in addition to that accounted for by the risk regressor of Equation 5, we used a hierarchical approach in which we compared the full regression model against a reduced model with the *F* test (Snedecor and Cochran, 1989). Thus, we compared Equation 5 with the reduced model, which contains risk as the only regressor:

$$Y = \beta_0 + \beta_2 \, \mathrm{Risk} + e \tag{6}$$

For comparisons between different regressors, we normalized their slopes (β) and calculated the standardized regression coefficient (SRC) for the *i*th regressor *x*_{i} as *a*_{i} × *s*_{i}/*s*_{y}, with *a*_{i} as the original slope regression coefficient (β) and *s*_{i} and *s*_{y} as the SDs of *x*_{i} and the dependent variable *y*. To quantify the extent to which the regressors accounted for the variance of the neuronal data, we used the coefficient of partial determination (CPD).
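The hierarchical comparison and the two normalized measures can be sketched in a few lines. The example below fits the full model (|RiPE| and risk as regressors) and the reduced model (risk only) to synthetic firing rates and then computes the *F* statistic, the SRCs, and the CPD for |RiPE|. Several things are assumptions: the firing rates are simulated, the number of trials per cue is arbitrary, and the CPD is taken in its common form, (SSE_reduced − SSE_full)/SSE_reduced, which the text does not spell out. NumPy is assumed to be available.

```python
import numpy as np


def fit_ols(regressors, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE)."""
    design = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, float(residuals @ residuals)


# Regressor levels per cue (ml), taken from the text.
abs_ripe_levels = np.array([0.05, 0.02, 0.04])
risk_levels = np.array([0.03, 0.06, 0.12])

# Hypothetical session: 30 trials per cue, synthetic firing rates (spikes/s).
rng = np.random.default_rng(0)
trial_type = np.repeat(np.arange(3), 30)
abs_ripe = abs_ripe_levels[trial_type]
risk = risk_levels[trial_type]
y = 10.0 + 200.0 * abs_ripe + 20.0 * risk + rng.normal(0.0, 1.0, trial_type.size)

# Full model (|RiPE| and risk) versus reduced model (risk only).
beta_full, sse_full = fit_ols(np.column_stack([abs_ripe, risk]), y)
beta_reduced, sse_reduced = fit_ols(risk[:, None], y)

# F test for the single extra regressor in the full model.
df_full = y.size - 3  # intercept + two slopes
f_stat = (sse_reduced - sse_full) / (sse_full / df_full)

# Standardized regression coefficients: slope * SD(regressor) / SD(y).
src_ripe = beta_full[1] * abs_ripe.std(ddof=1) / y.std(ddof=1)
src_risk = beta_full[2] * risk.std(ddof=1) / y.std(ddof=1)

# Coefficient of partial determination for |RiPE| (assumed definition).
cpd_ripe = (sse_reduced - sse_full) / sse_reduced

print(f"F(1,{df_full}) = {f_stat:.1f}, SRC |RiPE| = {src_ripe:.2f}, "
      f"SRC risk = {src_risk:.2f}, CPD |RiPE| = {cpd_ripe:.2f}")
```

With three cue types, the design supports at most an intercept and two slopes, which is why this sketch stops at two regressors. A *p* value for the *F* statistic would come from the *F* distribution with 1 and `df_full` degrees of freedom.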

To assess the potential influence of SVPE, we performed separate single linear regressions using SVPE and RiPE as regressors, as adding SVPE to Equation 5 would exceed the number of regressors that our three trial types allow in a multiple regression model:

$$Y = \beta_0 + \beta_1 \, |\mathrm{RiPE}| + e \tag{7}$$

$$Y = \beta_0 + \beta_1 \, |\mathrm{SVPE}| + e \tag{8}$$

where |SVPE| indicates the unsigned, absolute SVPE. Comparisons of the *r*^{2} between these two regressions served to assess whether the |RiPE| captured more of the variance in the data than the |SVPE|.
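For a single neuron, the comparison of the two single regressions reduces to comparing two *r*^{2} values. The sketch below uses the |RiPE| and |SVPE| levels given in the text but synthetic firing rates that, by construction, depend on |RiPE|; the trial counts, noise level, and names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def r_squared(x, y):
    """r^2 of a single linear regression of y on x (squared Pearson r)."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


# Regressor levels for the low-, medium-, and high-risk cues, from the text.
abs_ripe_levels = np.array([0.05, 0.02, 0.04])  # ml
abs_svpe_levels = np.array([10.0, 0.0, 18.0])   # percentage points

# Hypothetical neuron: 30 trials per cue, rate driven by |RiPE| plus noise.
rng = np.random.default_rng(1)
trial_type = np.repeat(np.arange(3), 30)
abs_ripe = abs_ripe_levels[trial_type]
abs_svpe = abs_svpe_levels[trial_type]
rates = 10.0 + 200.0 * abs_ripe + rng.normal(0.0, 1.0, trial_type.size)

r2_ripe = r_squared(abs_ripe, rates)
r2_svpe = r_squared(abs_svpe, rates)
print(f"r2 (|RiPE|) = {r2_ripe:.2f}, r2 (|SVPE|) = {r2_svpe:.2f}")
```

Because the two regressors are partly correlated across the three cues, the |SVPE| regression still captures some variance in this simulated neuron, which is why the comparison is made on paired *r*^{2} values across the population rather than on significance in one regression alone.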

## Results

We recorded the extracellular activity of 242 single neurons in the orbitofrontal cortex during task performance. Of these, 180 neurons (74%) responded significantly to the cues (*p* < 0.05, Wilcoxon test).

The multiple regression analysis (Eq. 5) revealed that the cue responses of 33 of 180 neurons (18%) coded the unsigned, absolute RiPE (all *p* < 0.05), with 15 of 33 showing significant positive and 18 of 33 showing significant negative correlation coefficients (Fig. 2*A–C*, left and right, respectively). The hierarchical regression analysis revealed that RiPE explained additional variance after risk was taken into account in 30 of the 33 neurons identified by Equation 5 (*p* ≤ 0.05; *F* test on Eq. 5 vs Eq. 6; remaining three neurons, *p* ≤ 0.06). The standardized regression coefficients were significantly higher for RiPE than for risk in the 15 neurons with positive coefficients (Fig. 2*D*, left; *F*_{(1,28)} = 5.31, *p* = 0.03, one-way ANOVA) and lower in the 18 neurons with negative coefficients (Fig. 2*D*, right; *F*_{(1,34)} = 48.36, *p* < 0.001, one-way ANOVA).

Of the 180 neurons with cue responses, 23 coded only RiPE, 42 coded only risk per se, and 10 coded both RiPE and risk (*p* < 0.05). A χ^{2} test failed to detect a significant difference between the likelihood of a RiPE-coding neuron to code risk compared with any task-related neuron coding risk (χ^{2} = 0.127, *p* = 0.721), suggesting that risk coding did not occur preferentially in RiPE-coding neurons. In addition, the amount of variance explained was not correlated between RiPE and risk (Fig. 2*E*, left). Thus, RiPE coding was mostly distinct from risk coding.

To control for subjective value prediction error (SVPE) coding, we derived a measure of subjective value for the three risky cues from the monkeys' behavioral preferences (Fig. 1*D*). The monkeys preferred the risky option more as the risk increased (main effect of risk: *F*_{(2,51)} = 28.02, *p* < 0.001, two-way ANOVA), and this effect was not statistically different between the two monkeys (main effect of monkey: *F*_{(1,51)} = 2.87, *p* = 0.1, n.s.; risk × monkey interaction: *F*_{(2,51)} = 0.2, n.s.). Therefore, we averaged the monkeys' preference ratings (Fig. 1*D*, black triangles) to calculate the SVPE (Eq. 4). The variance in the neuronal data was better accounted for by a RiPE than an SVPE in the regressions (Fig. 2*E*, right; *p* = 0.02, Wilcoxon signed-rank test between *r*^{2} derived from Eqs. 7 and 8). Thus, RiPE coding was not explained by SVPE coding in this neuronal population.

As an additional test on the suitability of a linear regression analysis on our data of neuronal firing rates, we log transformed the data from all neurons and re-ran the full analysis. This resulted in 31 of 180 neurons (17%) with significant regression coefficients for RiPE at cue presentation, comparable with the results from the analysis on the raw data.

The distribution of the 33 orbitofrontal neurons with cue-related RiPE responses was not significantly different between orbitofrontal areas 11 (14 of 83 neurons), 12 (2 of 5 neurons), 13 (15 of 88 neurons), and 14 (2 of 4 neurons) (χ^{2} = 4.5, *p* = 0.216, χ^{2} test).
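The area comparison can be reproduced from the counts given above. The sketch assumes a Pearson χ^{2} test of independence on a 4 × 2 table (area × RiPE-coding versus not RiPE-coding) without continuity correction; this is one reading of the reported test, not a statement of the exact procedure used. The *p* value uses the closed-form upper tail of the χ^{2} distribution for 3 degrees of freedom, so only the standard library is needed.

```python
import math

# (RiPE-coding neurons, cue-responsive neurons) per area, from the text.
COUNTS = {
    "area 11": (14, 83),
    "area 12": (2, 5),
    "area 13": (15, 88),
    "area 14": (2, 4),
}


def pearson_chi2(counts):
    """Pearson chi-square and degrees of freedom for a k x 2 table."""
    total_coding = sum(coding for coding, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    p_coding = total_coding / total_n
    chi2 = 0.0
    for coding, n in counts.values():
        cells = ((coding, n * p_coding), (n - coding, n * (1.0 - p_coding)))
        for observed, expected in cells:
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(counts) - 1


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 df."""
    z = x / 2.0
    return math.erfc(math.sqrt(z)) + 2.0 * math.sqrt(z / math.pi) * math.exp(-z)


chi2, df = pearson_chi2(COUNTS)
p_value = chi2_sf_3df(chi2)
print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is about 4.46 with 3 degrees of freedom and *p* is about 0.216, in line with the reported values. Several expected cell counts are below 5, so the χ^{2} approximation is rough for these data.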

## Discussion

This study investigated the neurophysiological coding of RiPEs, as defined by the difference between the predicted risk and the current risk. A group of orbitofrontal neurons coded the unsigned, absolute RiPE with a positive or negative slope. This error coding was mostly distinct from risk coding per se and subjective value coding.

Prediction error is a general term that can be derived from any predictable variable. It is defined as the difference between the current measure and the predicted measure. For example, reward value prediction error is defined as current reward value minus predicted reward value and constitutes a crucial component in reinforcement learning (Rescorla and Wagner, 1972). In analogy, RiPE is defined as current risk minus predicted risk, as used here and in previous studies (Preuschoff et al., 2008; d'Acremont et al., 2009). This definition allows calculation of RiPE at the cue and the outcome, as both events are preceded by well defined levels of risk prediction. This definition follows that of previous studies that also used binary gambles (Preuschoff et al., 2008; d'Acremont et al., 2009). The previous studies derived risk from value prediction error, whereas we calculated the SD directly. Although the calculations for deriving risk differ between the previous studies and ours, in binary gambles these two measures are numerically identical (Preuschoff et al., 2008, their Table S1). Therefore, with either approach, the difference between the current and the predicted risk, the RiPE, is equivalent. In our experiment, RiPEs calculated in this way amount to zero at reward outcome. Therefore, we were only able to test for RiPEs at cue presentation.

Investigations of neuronal risk processing have revealed the involvement of several brain structures. Studies defining risk as statistical variance or SD have identified risk processing in frontal cortex, parietal cortex, cingulate cortex, striatum, amygdala, and insula (Sanfey et al., 2003; Hsu et al., 2005; McCoy and Platt, 2005; Huettel et al., 2006; Kepecs et al., 2008; Preuschoff et al., 2008; Christopoulos et al., 2009; Tobler et al., 2009; O'Neill and Schultz, 2010). In addition, activity in the human insula correlates with RiPEs (Preuschoff et al., 2008; d'Acremont et al., 2009). Together with our findings, these data suggest that the orbitofrontal cortex and the insula are involved in processing both risk per se and deviations from predicted risk. In addition, the orbitofrontal cortex is involved in encoding deviations from expected reward value (value prediction errors; O'Doherty et al., 2003; Dreher et al., 2006). These findings suggest an important role of orbitofrontal cortex in updating the key variables of reward probability distributions, namely expected value and risk.

Prediction errors can be signed or unsigned. Signed prediction errors are positive for greater-than-predicted outcomes and negative for less-than-predicted outcomes. In reinforcement learning about reward value, signed prediction errors serve for updating the value function. In contrast, unsigned, absolute value prediction errors serve for adjusting the value learning coefficient in the associability learning rules (Mackintosh, 1975; Pearce and Hall, 1980). These distinct roles are a direct consequence of the prediction errors being signed or unsigned; signed value prediction errors signal that values are less than or greater than predicted, whereas unsigned prediction errors simply track the difference from prediction regardless of whether it is greater than or less than predicted. Similar roles may hold for risk; signed RiPEs could be involved in the main process of risk updating (Preuschoff et al., 2008; d'Acremont et al., 2009). In contrast, the currently reported unsigned RiPE signal in orbitofrontal neurons could serve to set the coefficient of error-driven risk learning to adjust or modulate the main process of updating risk information, analogous to unsigned value prediction errors. Thus, an organism conceivably would learn most efficiently and flexibly about risks by using different forms of RiPE signals. These signals may be coded within the same brain area or in different areas, such as the insular (Preuschoff et al., 2008; d'Acremont et al., 2009) and orbitofrontal cortex.

## Footnotes

This work was supported by the Wellcome Trust, the European Research Council, the Behavioural and Clinical Neuroscience Institute Cambridge, and the Human Frontier Science Program. We thank Peter Bossaerts, Armin Lak, and William Stauffer for discussions and Mercedes Arroyo for technical support.

The authors declare no competing financial interests.

Correspondence should be addressed to Martin O'Neill, Department of Physiology, Development, and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK. oneillmartin007@gmail.com