Making choices between goods appears intuitive but difficulties arise from a computational perspective. In particular, what metric can be used to compare goods that vary along dimensions lacking common units, such as time and amount? A predominant idea is that we integrate value across those dimensions and choose the option with the highest value on the integrated common currency scale. Indeed, when individuals make economically consistent choices, they behave as though they choose the good with the highest integrated value. A fundamental challenge for neuroscience is mapping neuronal activity onto different aspects of this choice process, thereby illuminating underlying biological mechanisms. A plausible neuronal model for the choice process has neurons at the input stage that encode integrated subjective value signals, and downstream neurons that facilitate decisions by comparing those subjective value input signals (Padoa-Schioppa, 2011). It is therefore of great interest to understand neuronal activity that could provide the integrated subjective value input signals to a comparison process.
A previous study provided evidence that economic value is encoded in the orbitofrontal cortex (OFC) (Padoa-Schioppa and Assad, 2006). By offering monkeys two different juices in variable quantities, the study found that most sampled neurons in the OFC represented one of three important economic variables: (1) the values of individual choice options (offer-value cells), (2) the value of the chosen option (chosen-value cells), and (3) the identity of the chosen reward (chosen-juice cells). The data indicated that chosen-value cells integrated juice taste and quantity into a subjective value signal. However, the activity of offer-value cells strongly correlated with juice quantity, making it difficult to determine whether these neurons encoded subjective value or objective properties. Importantly, whereas chosen-value neurons appear to represent a post-decision variable, offer-value cells can, in principle, act as input to a decision. Thus, offer-value cells might represent the subjective value input signals necessary for a neuronal choice computation. A recent study by Raghuraman and Padoa-Schioppa (2014) sought to investigate this by examining whether these neurons encode objective or subjective values.
Raghuraman and Padoa-Schioppa (2014) recorded single-neuron activity in the OFC while monkeys performed saccade-guided binary choices. The choice options were indicated by visual cues that predicted different juice tastes that varied in size as well as the risk associated with probabilistic delivery. The principal advance in this study was achieved through the latter manipulation: reward risk. In decision theory, risk is defined as known uncertainty in the outcome distribution (Knight, 1921), and it has a central role in influential economic theories such as expected utility theory (von Neumann and Morgenstern, 1944). Importantly for neuroscientists attempting to isolate subjective value signals in the brain, the value of a risky option is purely subjective: risk enhances subjective value in risk seekers and reduces subjective value in risk avoiders. In contrast, other reward attributes (such as taste) differ objectively as well as subjectively: differential neuronal responses to different tastes might represent different chemical properties of the tastes rather than different subjective values. Manipulating reward risk allows researchers to present choices that have different subjective values but identical expected values, thus isolating subjective from objective value.
Raghuraman and Padoa-Schioppa (2014) found that monkeys were risk-seeking the majority of the time. That is, when faced with a choice between a gamble and a safe option with identical expected values, the animals tended to choose the gamble (αbehavioral <1, as quantified in the paper). Although risk aversion is normal among human decision makers, most studies on monkeys have uncovered risk-seeking behavior, at least for small rewards (McCoy and Platt, 2005; O'Neill and Schultz, 2010; Lak et al., 2014). Therefore, from a behavioral perspective the current study is consistent with the risk preferences previously reported.
To determine whether OFC neurons incorporated risk into value signals in a manner consistent with the behavioral choices, Raghuraman and Padoa-Schioppa (2014) constructed a neuronal index for risk attitude, αneuronal. The logic behind this measurement is straightforward. Neurons that encode objective value alone (i.e., a gamble's expected value) would respond identically to a 25% chance of winning eight drops of juice, a 50% chance of winning four drops of juice, and a certain offer of two drops of juice, because all three offers have an expected value of two drops. Such a neuron would display αneuronal = 1. Greater modulation of neuronal activity in response to the risky options compared with certain rewards corresponds to αneuronal <1, indicating neuronal value responses compatible with risk seeking. Likewise, smaller changes in the neuronal response to the risky options compared with the safe options would translate to αneuronal >1 and reflect risk aversion. Using this framework, the authors showed that the majority of offer- and chosen-value neurons incorporated risk into value in a manner consistent with risk seeking (αneuronal <1). Because this result mirrored the monkeys' overall risk-seeking behavior, the OFC neurons appeared to integrate reward attributes and reflect risk attitude.
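The arithmetic behind this logic can be made concrete with a short sketch. The three offers are taken from the example above. The function `illustrative_subjective_value` and its power-law form are assumptions introduced here purely for illustration and are not the index defined by Raghuraman and Padoa-Schioppa (2014); the form was chosen only because it reproduces the sign convention described in the text (an exponent below 1 inflates the value of a gamble, an exponent above 1 deflates it).

```python
# Offers from the text, written as (probability, drops of juice).
OFFERS = [(0.25, 8.0), (0.5, 4.0), (1.0, 2.0)]


def expected_value(probability, drops):
    """Objective value of an offer: probability times magnitude."""
    return probability * drops


def illustrative_subjective_value(probability, drops, alpha):
    """Hypothetical subjective value, NOT the paper's definition.

    Assumes the probability is raised to a risk-attitude exponent `alpha`:
    alpha == 1 gives the expected value, alpha < 1 makes gambles worth more
    than their expected value (risk seeking), and alpha > 1 makes them
    worth less (risk aversion). Certain offers (probability 1) are unchanged.
    """
    return drops * probability ** alpha


for probability, drops in OFFERS:
    print(
        f"p={probability:.2f}, drops={drops:.0f}: "
        f"EV={expected_value(probability, drops):.2f}, "
        f"alpha=0.8 -> {illustrative_subjective_value(probability, drops, 0.8):.2f}, "
        f"alpha=1.2 -> {illustrative_subjective_value(probability, drops, 1.2):.2f}"
    )
```

Under these assumptions, a unit that tracks expected value alone cannot tell the three offers apart, whereas any exponent other than 1 separates the gambles from the certain offer in the direction of the corresponding risk attitude.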
Risk attitude is not a stable phenomenon. For instance, current wealth state is well known to influence risk attitude (Bernoulli, 1954). The poorer a person is, the more risk averse their choices will be. It was recently discovered that this phenomenon generalizes to thirst: thirsty monkeys were more risk averse for juice rewards than sated monkeys (Yamada et al., 2013). Therefore, if a neuron is involved in trial-by-trial choice behavior, its activity should reflect the animal's moment-by-moment risk attitude, rather than a global metric of risk seeking or avoiding. To address this issue, the authors compared the session-by-session measures of behavioral and neuronal risk modulation (αbehavioral and αneuronal). This subtle yet important analysis revealed a positive relationship between daily risk attitude and the activity of offer- and chosen-value neurons. Although the correlation coefficient was small, the result was statistically significant and provided evidence that OFC neurons are dynamically engaged in economic valuation. It is possible that taking the reward magnitude more directly into account might substantially increase the measured correlation. Human risk attitudes are highly dependent on the magnitude of the rewards at stake (Markowitz, 1952). People are more likely to gamble when they are “playing for peanuts” (Prelec and Loewenstein, 1991) and become more risk averse as real stakes increase (Holt and Laury, 2002). The monkeys in this task gambled for differently sized rewards. Therefore, taking the outcome magnitudes into consideration might reveal a nonlinear relationship between reward size and risk attitude, and increase the correlation between subjective value and OFC neuronal responses.
In addition to encoding variables relevant immediately before and after a decision, Raghuraman and Padoa-Schioppa (2014) demonstrated that OFC neurons encoded events that occurred after the decision, such as the delivery or omission of juice or the value of the received juice. These post-decision signals could be useful during reward learning, and may be related to other reward learning signals, including dopamine prediction error responses (Schultz et al., 1997). Indeed, similar to OFC value-coding neurons, dopamine prediction error responses encode integrated subjective value (Lak et al., 2014). These prediction error signals encode reward properties only when they are valuable to individual decision makers. For instance, risk enhances dopamine responses during risk seeking but fails to modulate dopamine responses during risk-neutral behavior. Therefore, together with the current findings and studies in other brain regions (Matsumoto et al., 2007; So and Stuphorn, 2012), it is apparent that learning-related subjective value signals are found in the activity of single neurons in several brain structures. Quantification of neuronal reinforcement-learning signals using behavioral economic theories would thus provide a coherent framework for studying neuronal correlates of reward learning during economic decision-making.
Raghuraman and Padoa-Schioppa (2014) concluded that OFC neurons integrated reward attributes and encoded the subjective value of choice alternatives and the chosen option. This finding reinforces the idea that the OFC has the capacity to perform a major part of the computations necessary for economic choice. This conclusion does not preclude the possibility that some OFC neurons reflect variables other than value. Indeed, Raghuraman and Padoa-Schioppa (2014) reported neurons that encode the riskiness of the option. Such OFC neurons were found previously and their activity was shown to be distinct from reward value (O'Neill and Schultz, 2010). This risk signal could be a precursor used in the computation of subjective value signals. Thus, the OFC encodes subjective value and precursors of subjective value.
Despite this clear evidence that OFC neurons encode the subjective value of goods, the nature or source of the subjective value signal remains a mystery. Nearly every influential theory of decision-making involves a combination of reward value and the probability of getting that reward. It has long been hypothesized that subjective value can arise from the nonlinear relationship between reward utility and physical reward value (Bernoulli, 1954). More recently, it has been recognized that a second source of nonlinearity can be found in reward probability processing. Decision makers often overweight low-probability rewards and underweight high-probability rewards, leading to an inverted S-shaped probability distortion function (Fig. 1A). Modern decision theories posit that decision makers combine distorted probabilities with a nonlinear utility function (Fig. 1B) to calculate the ultimate decision variable (Kahneman and Tversky, 1979). In this scheme, risk attitudes can be modified in different ways. For instance, apparent risk seeking would increase when the elevation of the probability weighting function increases (gambles become more attractive, especially with the low probabilities used here) or when utility functions become more convex (increasing marginal utility for rewards) (Fig. 1, blue arrows). Conversely, apparent risk avoidance would increase when the elevation of the probability weighting function decreases (gambles are less attractive) or when utility functions become more concave (decreasing marginal utility for rewards) (Fig. 1, red arrows) (Gonzalez and Wu, 1999). Accordingly, Raghuraman and Padoa-Schioppa (2014) attempted to distinguish different forms of nonlinear subjective value in the OFC signals (dP and dU, as they were named in their paper).
Although the current data could not distinguish between the two sources of subjective value, disambiguating these concepts in future studies could provide mechanistic insights into the neurobiological computation of values and choices.
Figure 1. A, Classical probability weighting function indicates distorted probabilities that individuals assign to nominal reward probabilities. B, Utility functions represent the relationship between reward magnitude and utility and can account for risk seeking (convex), risk avoiding (concave), or risk neutral (identity line) behavior.
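The two sources of nonlinearity described above can be illustrated with a minimal numerical sketch. It assumes a two-parameter "linear in log odds" weighting function, with an elevation parameter and a curvature parameter, of the kind discussed by Gonzalez and Wu (1999), together with a power utility function. The parameter names (`elevation`, `curvature`, `exponent`), the specific values, and the example offers are all assumptions chosen for illustration; they are not estimates from Raghuraman and Padoa-Schioppa (2014) and do not correspond to their dP and dU terms.

```python
def probability_weight(p, elevation=1.0, curvature=1.0):
    """Linear-in-log-odds weighting: w(p) = d*p^g / (d*p^g + (1-p)^g).

    `elevation` (d) shifts the whole curve up or down; `curvature` (g) below 1
    produces the inverted S shape (overweighting of low probabilities and
    underweighting of high probabilities). Both names are illustrative.
    """
    numerator = elevation * p ** curvature
    return numerator / (numerator + (1.0 - p) ** curvature)


def utility(magnitude, exponent=1.0):
    """Power utility: convex if exponent > 1, concave if exponent < 1."""
    return magnitude ** exponent


def subjective_value(p, magnitude, elevation=1.0, curvature=1.0, exponent=1.0):
    """Decision variable: distorted probability times nonlinear utility."""
    return probability_weight(p, elevation, curvature) * utility(magnitude, exponent)


# Hypothetical offers with equal expected value: a 25% gamble for 8 drops
# versus 2 drops for certain.
GAMBLE = (0.25, 8.0)
SAFE = (1.0, 2.0)

settings = {
    "neutral baseline": {},
    "higher elevation": {"elevation": 1.5},
    "lower elevation": {"elevation": 0.7},
    "convex utility": {"exponent": 1.2},
    "concave utility": {"exponent": 0.8},
}

for label, params in settings.items():
    gamble_value = subjective_value(*GAMBLE, **params)
    safe_value = subjective_value(*SAFE, **params)
    print(f"{label}: gamble={gamble_value:.3f}, safe={safe_value:.3f}")
```

In this sketch, raising the elevation or making the utility function convex both tip the comparison toward the gamble, and lowering the elevation or making the utility function concave both tip it toward the safe option. Either manipulation alone can therefore produce the same apparent risk attitude in a single pairwise choice, which is one way to see why the two sources are hard to tell apart.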
By introducing economic risk into their experiment, Raghuraman and Padoa-Schioppa (2014) demonstrated that neurons in the OFC could provide the necessary integrated subjective value input to the choice computation. Additionally, they identified neuronal correlates for several other variables that might have direct roles in a variety of behavioral contexts, including reward learning. Future studies can build on these insights in multiple ways, including identifying the neuronal building blocks of economic values (namely neurons encoding utility functions and distorted probability), as well as investigating single neuron correlates of reward learning during economic choice.
Footnotes
Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
We thank Fabian Grabenhorst for helpful comments, and Wellcome Trust and European Research Council for financial support.
Correspondence should be addressed to either Armin Lak or William R. Stauffer, Department of Physiology, Development, and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK. arminlak{at}gmail.com or william.stauffer{at}gmail.com