Neural Signatures of Value Comparison in Human Cingulate Cortex during Decisions Requiring an Effort-Reward Trade-off

Integrating costs and benefits is crucial for optimal decision-making. Although much is known about decisions that involve outcome-related costs (e.g., delay, risk), many of our choices are attached to actions and require an evaluation of the associated motor costs. Yet how the brain incorporates motor costs into choices remains largely unclear. We used human fMRI during choices involving monetary reward and physical effort to identify brain regions that serve as a choice comparator for effort-reward trade-offs. By independently varying both options' effort and reward levels, we were able to identify the neural signature of a comparator mechanism. A network involving supplementary motor area and the caudal portion of dorsal anterior cingulate cortex encoded the difference in reward (positively) and effort levels (negatively) between chosen and unchosen choice options. We next modeled effort-discounted subjective values using a novel behavioral model. This revealed that the same network of regions involving dorsal anterior cingulate cortex and supplementary motor area encoded the difference between the chosen and unchosen options' subjective values, and that activity was best described using a concave model of effort-discounting. In addition, this signal reflected how precisely value determined participants' choices. By contrast, separate signals in supplementary motor area and ventromedial prefrontal cortex correlated with participants' tendency to avoid effort and seek reward, respectively. This suggests that the critical neural signature of decision-making for choices involving motor costs is found in human cingulate cortex and not ventromedial prefrontal cortex as typically reported for outcome-based choice. Furthermore, distinct frontal circuits seem to drive behavior toward reward maximization and effort minimization.

Significance Statement
The neural processes that govern the trade-off between expected benefits and motor costs remain largely unknown. This is striking because energetic requirements play an integral role in our day-to-day choices and instrumental behavior, and a diminished willingness to exert effort is a characteristic feature of a range of neurological disorders. We use a new behavioral characterization of how humans trade off reward maximization with effort minimization to examine the neural signatures that underpin such choices, using BOLD MRI neuroimaging data. We find the critical neural signature of decision-making, a signal that reflects the comparison of value between choice options, in human cingulate cortex, whereas two distinct brain circuits drive behavior toward reward maximization or effort minimization.


Introduction
Cost-benefit decisions are a central aspect of flexible goal-directed behavior. One particularly well-studied neural system concerns choices where costs are tied to the reward outcomes (e.g., risk, delay) (Kable and Glimcher, 2007; Boorman et al., 2009; Philiastides et al., 2010). Much less is known about choices tied to physical effort costs, despite their ubiquitous presence in human and animal behavior. The intrinsic relationship between effort and action may engage neural circuits distinct from those involved in other value-based choice computations.
There is growing consensus that different types of value-guided decisions are underpinned by distinct neural systems, depending on the type of information that needs to be processed (e.g., Rudebeck et al., 2008; Camille et al., 2011b; Kennerley et al., 2011; Pastor-Bernier and Cisek, 2011). For example, activity in the ventromedial prefrontal cortex (vmPFC) carries a signature of choice comparison (chosen-unchosen value) for decisions between abstract goods or when costs are tied to the outcome (Kable and Glimcher, 2007; Boorman et al., 2009; FitzGerald et al., 2009; Philiastides et al., 2010; Hunt et al., 2012; Kolling et al., 2012; Clithero and Rangel, 2014; Strait et al., 2014). By contrast, such value difference signals are found more dorsally in medial frontal cortex when deciding between exploration versus exploitation (Kolling et al., 2012).
Choices requiring the evaluation of physical effort rest on representations of the required actions and their energetic costs, and thus likely require an evaluation of the internal state of the agent. This is distinct from choices based solely on reward outcomes (Rangel and Hare, 2010). Indeed, the proposed network for evaluating motor costs comprises brain regions involved in action planning and execution, including the cingulate cortex, putamen, and supplementary motor area (SMA) (Croxson et al., 2009; Kurniawan et al., 2010; Prévost et al., 2010; Burke et al., 2013; Kurniawan et al., 2013; Bonnelle et al., 2016). Neurons in anterior cingulate cortex (ACC) encode information about rewards, effort costs, and actions (Matsumoto et al., 2003; Kennerley and Wallis, 2009; Luk and Wallis, 2009; Hayden and Platt, 2010), and integrate this information into an economic value signal (Hillman and Bilkey, 2010; Hosokawa et al., 2013). Moreover, lesions to ACC profoundly impair choices of effortful options and between action values (Schweimer and Hauber, 2005; Kennerley et al., 2006; Rudebeck et al., 2006, 2008; Camille et al., 2011b).
While these studies highlight the importance of motor-related structures in representing effort information, it remains unclear whether computations in these regions are indeed related to comparing effort values (or effort-discounted net values), the essential neural signature that would implicate these areas in decision-making. Indeed, these regions could simply represent effort, which is then passed on to other regions for value comparison processes. A number of questions thus arise. First, is information about reward and effort compared in separate neural structures, or is this information fed to a region that compares options based on their integrated value? Second, do regions that preferentially encode reward or effort have a direct influence on determining choice? Finally, assuming separate neural systems are present for influencing choices based on reward versus effort, how does the brain arbitrate between these signals when reward and effort information support opposing choices?
Here we used a task designed to identify signatures of a choice comparison for effort-based decisions in humans using fMRI and to test whether different neural circuits "drive" choices toward reward maximization versus energy minimization. We show that the neural substrates of effort-based choice are distinct from those computing outcome-related choices: well-known reward and effort circuits centered on vmPFC and SMA bias choices to be more driven by benefits or motor costs, respectively, with a region in cingulate cortex integrating cost and benefit information and comparing options based on these integrated subjective values.

Materials and Methods
Participants. Twenty-four participants with no history of psychiatric or neurological disease, and with normal or corrected-to-normal vision took part in this study (mean age 28 ± 1 years, age range 19-38 years, 11 females). All participants gave written informed consent and consent to publish before the start of the experiment; the study was approved by the local research ethics committee at University College London (1825/003) and conducted in accordance with the Declaration of Helsinki. Participants were reimbursed with £15 for their time; in addition, they accumulated average winnings of £7.16 ± 0.11 during each of the two blocks of the task (the maximum winnings per block were scaled to £8; the resulting average total pay was £29.32). Three participants were excluded from the analysis: one for failing to stay awake during scanning and two due to excessive head movements (summed movement in any direction and run >40 mm). All analyses were performed on the remaining 21 participants.
Behavioral task. Participants received both written and oral task instructions. They were asked to make a series of choices between two options, which independently varied in required grip force (effort) and reward magnitude (see Fig. 1A). The reward magnitude was shown as a number (range: 10-40 points; approximately corresponding to pence) and required force levels were indicated as the height of a horizontal bar (range: 20%-80% of the participant's maximum grip force).
Each trial comprised an offer, response, and outcome phase; a subset of 30% of trials also contained an effort production phase. During the offer phase, participants decided which option to choose but they were not yet able to indicate their response. There were two trial types (50% each): ACT (action) and ABS (abstract). In ACT trials, the two choice options were presented to the left and right of fixation, and thus in a horizontal or action space configuration in which the side of presentation directly related to the hand with which to choose that option. In ABS trials, choice options were shown above and below fixation, and thus in a vertical or goods space arrangement that did not reveal the required action. In both conditions, stimuli were presented close to the center of the screen and participants did not need to move their eyes to inspect them. To maximally distinguish the hemodynamic response from the offer and response phase, the duration of the offer phase varied between 4 and 11 s (Poisson distributed; mean 6 s).
The response phase started when the fixation cross turned red. In ACT trials, the arrangement of the two choice options remained the same; in ABS trials, the two options at the top and bottom were switched to the left and right of fixation (each mapping with 50% probability), thus revealing the required action mapping. Choices were indicated by a brief squeeze of a grip device (see below for details) on the corresponding side (maximum response time: 3 s; required force level: 35% of maximum voluntary contraction [MVC]). ACT and ABS trials were merged for all analyses because no significant differences were found for the tests reported in this manuscript.
On 70% of trials, no effort was required: as soon as participants indicated their choice, the unchosen option disappeared, and the message "no force" was displayed for 500 ms. The next trial commenced after a variable delay (intertrial interval: 2-13 s; Poisson distributed; mean: 5 s). On the remaining 30% of trials, a power grip of 12 s was required (effort). Again, the unchosen option disappeared, but now a thermometer appeared centrally and displayed the target force level of the chosen option. Participants were given online visual feedback about the applied force level using changing fluid levels in the thermometer. On successful application of the required force for at least 80% of the 12 s period, a green tick appeared (500 ms; outcome phase; delay preceding outcome: 0.5-1.5 s uniform) and the reward magnitude of the chosen option was added to the total winnings. Otherwise, the total winnings remained unchanged (red cross: 500 ms). Because participants were almost always successful in applying the required force on effort trials (accuracy: 99.30 ± 0.004%; only 4 participants made any mistakes), there was no confound between effort level and risk/reward expectation.
The sensitivity of the grip device was manipulated between trials (high or low). A high gain meant that the grippers were twice as sensitive as for a low gain, and thus the same force deviation doubled the rate of change in the thermometer's fluid level. While this manipulation was introduced to study interactions between mental and physical effort, none of our behavioral or fMRI analyses revealed any significant effects of gain during the choice phase, which is the focus of the present paper.
To summarize, our task involved several important features: (1) as our aim was to specifically examine value comparison mechanisms during effort-based choice, we manipulated both options' values and thus the expected values of the two offers had to be computed and compared online in each trial, unlike in previous experiments (Croxson et al., 2009; Kurniawan et al., 2010, 2013; Prévost et al., 2010; Burke et al., 2013; Bonnelle et al., 2016); (2) the decision process and the resulting motor response were separated in time (see Fig. 1A). This enabled us to examine the value comparison in the absence of, and not confounded with, processes related to action execution; (3) both reward and effort levels were varied parametrically rather than in discrete steps, and orthogonally to each other, thereby granting high sensitivity for the identification of effort and reward signals, respectively; (4) efforts were only realized on a subset of trials, ensuring that decisions were not influenced by fatigue (Klein-Flügge et al., 2015). Importantly, however, at the time of choice, participants did not know whether a given trial was real or hypothetical; therefore, the optimal strategy was to treat each trial as potentially real; and (5) the duration of the grip on effort trials (12 s) had been determined in pilot experiments and ensured that force levels were factored into the choice process. Moreover, the fixed duration of grip force also meant that effort costs were not confounded with temporal costs.
Figure 1. Task and behavior. A, Human participants chose between two options associated with varying reward magnitude (numbers) and physical effort (bar height translates into force, Offer). Once the fixation cross turned red (Response), participants were allowed to indicate their choice. Thus, the time of choice computation was separable in time from the motor response. Following a response, the effort had to be realized on an unpredictable 30% of trials (top). On these trials, participants had to produce a 12 s power grip at a strength proportional to the bar height of the chosen option. Force levels were adjusted to individuals' maximum force at the start of the experiment. Participants received feedback about successful performance of the grip (99% accuracy), and the rewards collected on successful trials were added to the total winnings. On 70% of trials (bottom), no effort was required and the next trial commenced (intertrial interval [ITI]). B, Participants' choices were driven by both options' reward magnitude and effort level showing that all dimensions of the outcome were taken into account for computing a choice. Benefits and costs had opposite effects: larger efforts discouraged, and larger reward magnitudes encouraged, the choice of an option (standard errors: ± SEM). C, Correlations between left (L), right (R), chosen (C), and unchosen (U) effort levels (e) and reward magnitudes (r) show that the regressors of interest were sufficiently decorrelated in our design. D, Effort has a strong effect on choice in trials with small reward differences, but no effect when the reward difference is large (green panels; median split on reward difference; effort binned for visualization). Similarly, reward has a stronger effect in trials with small effort differences compared with trials with large effort differences (blue panels). This shows that participants indeed trade off effort against reward, and confirms that reward has a stronger and opposite effect compared with effort (red slope), as shown in B. Black lines indicate individual participants and suggest that reward and effort were treated as continuous variables.
Scanning procedure. Before scanning, force levels were adjusted to each individual's grip strength using a grip calibration. Participants were seated in front of a computer monitor and held a custom-made grip device in both hands. Each participant's baseline (no grip) and MVC were measured over a period of 3 s, separately for both hands. The measured values were used to define individual force ranges (0%-100%) for each hand, which were then used in the behavioral task, both prescanning and during scanning.
Before entering the scanner, participants completed a training session consisting of one block of the behavioral task (112 trials, ~30 min). This gave them the opportunity to experience different force levels and to become familiar with the task. Importantly, it also ensured that decisions made subsequently in the scanner would not be influenced by uncertainty about the difficulty of the displayed force levels. In the scanner, participants completed two blocks of the task (overall task duration ~60 min; 224 choices).
Generation of choice stimuli. Because our main question related to the encoding of value difference signals during effort-based choices, the generation of suitable choice stimuli was a key part of the experimental design. Choice options were identical for every individual and were chosen such that they would minimize the correlation between the fMRI regressors for chosen and unchosen effort, reward magnitude, and value (obtained mean correlations after scanning: effort: −0.23; reward magnitude: 0.11; value: 0.43; see Fig. 1C). We also ensured that left and right efforts, reward magnitudes, and values were decorrelated to be able to identify action value signals (effort: 0.28; reward magnitude: 0.05; value: 0.07). We simulated several individuals using a previously suggested value function for effort-based choice (Prévost et al., 2010). Stimuli were optimized with the following additional constraints: either the efforts or the reward magnitudes had to differ by at least 0.1 on each trial, the range of efforts and reward magnitudes was [0.2 to 0.8] × MVC or 0-50 points, respectively, and the overall expected value for both hands was comparable. Furthermore, in 85% of trials, the larger reward was paired with the larger effort level, and the smaller reward with the smaller effort level, making the choice hard, but on 15% of trials, the larger reward was associated with the smaller effort level ("no-brainer"). The two choice sets that minimized the correlations between our regressors of interest were used for the fMRI experiment. A third stimulus set was saved for the behavioral training before scanning.
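The selection of a decorrelated stimulus set can be illustrated with a minimal random-search sketch. It is not the optimization used in the study: it scores only left/right correlations of effort and reward, omits the simulated participants, the chosen/unchosen regressors, and the 85%/15% pairing constraint, and the function names, number of candidate sets, and seed are all hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sample_trial(rng):
    """Draw one trial; resample until efforts or rewards differ by >= 0.1."""
    while True:
        e_l, e_r = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)  # fraction of MVC
        r_l, r_r = rng.random(), rng.random()                    # fraction of 50 points
        if abs(e_l - e_r) >= 0.1 or abs(r_l - r_r) >= 0.1:
            return e_l, e_r, r_l, r_r

def worst_correlation(trials):
    """Largest absolute left/right correlation across effort and reward."""
    e_l, e_r, r_l, r_r = zip(*trials)
    return max(abs(pearson(e_l, e_r)), abs(pearson(r_l, r_r)))

def best_stimulus_set(n_trials=112, n_candidates=200, seed=0):
    """Generate candidate sets and keep the one with the smallest worst correlation."""
    rng = random.Random(seed)
    candidates = [[sample_trial(rng) for _ in range(n_trials)]
                  for _ in range(n_candidates)]
    return min(candidates, key=worst_correlation)

stimuli = best_stimulus_set()
print(round(worst_correlation(stimuli), 3))
```

Scoring each candidate set by its worst correlation, rather than the average, keeps any single pair of regressors from remaining strongly correlated.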
Preliminary fMRI analyses revealed that we had overlooked a bias in our stimuli. In the last third of trials of the second block, the overall offer value ((magnitude1/effort1 + magnitude2/effort2)/2) decreased steadily, leading to skewed contrast estimates. Therefore, the last 40 trials were discarded from all analyses.
We refer to choices in this study as "effort-based" to highlight the distinction from purely outcome/reward-based choices or choices involving other types of costs (e.g., delay-based). But of course, in our task, all choices were effort- as well as reward-based.
Recordings of grip strength. The grippers were custom-made and consisted of two force transducers (FSG15N1A, Honeywell) placed between two molded plastic bars (see also Ward and Frackowiak, 2003). A continuous recording of the differential voltage signal, proportional to the exerted force, was acquired, fed into a signal conditioner (CED 1902, Cambridge Electronic Design), digitized (CED 1401, Cambridge Electronic Design), and fed into the computer running the stimulus presentation. This enabled us, during effort trials, to give online feedback about the exerted force using the thermometer display.
Behavioral analysis. To examine which task variables affected participants' choice behavior, a logistic regression was fitted to participants' choices (1 = RH; 0 = LH) using the following nine regressors: a RH-LH bias (constant term); condition (ABS or ACT); gain (high or low); LH-effort on previous trial; RH-effort on previous trial; reward magnitude left; reward magnitude right; effort left; effort right. t tests performed across participants on the obtained regression coefficients were adjusted for multiple comparisons using Bonferroni correction. Because only reward magnitudes and efforts influenced behavior significantly (see Results), the logistic regression models performed for the analysis of the neural data below (Eqs. 2, 3) only contained these variables (or their amalgamation into combined value).
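As an illustration of this type of analysis, the following sketch fits a logistic regression of right-hand choices on a reduced set of five regressors (constant, reward left/right, effort left/right) by gradient ascent on simulated data. The generating weights, trial count, learning rate, and function names are hypothetical, and the fitting routine is a simple stand-in rather than the procedure used in the study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, n_iter=1000, lr=1.0):
    """Maximum-likelihood logistic regression fitted by batch gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Simulated chooser (hypothetical weights): reward attracts, effort repels.
rng = random.Random(1)
true_w = [0.0, -6.0, 6.0, 5.0, -5.0]  # bias, reward L, reward R, effort L, effort R
X, y = [], []
for _ in range(300):
    row = [1.0, rng.random(), rng.random(),
           rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)]
    p_right = sigmoid(sum(w * x for w, x in zip(true_w, row)))
    X.append(row)
    y.append(1 if rng.random() < p_right else 0)  # 1 = RH, 0 = LH

weights = fit_logistic(X, y)
print([round(w, 2) for w in weights])
```

With choices coded as 1 = RH, a positive weight on the right option's reward and a negative weight on its effort indicate that reward encourages and effort discourages choosing that option.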
To examine the influence of reward and effort on participants' choice behavior in more depth, we tested whether participants indeed weighed up effort against reward, and whether they treated reward and effort as continuous variables. If reward and effort compete for their influence on choice, then the influence of effort should become larger as the reward difference becomes smaller, and vice versa. Thus, we performed a median split of our trials according to the absolute difference in reward (or effort) between the two choice options. We then calculated the likelihood of choosing an option as a function of its effort (reward) level, separately for the two sets of trials. Effort (reward) values were distributed across 10 bins with equal spacing; this binning was independent of the effort (reward) level of the alternative option. For statistical comparisons, we fitted a slope for each participant to the mean of all bins. t tests were performed on the resulting four slopes testing for the influence of (1) effort in trials with small reward difference, (2) effort in trials with large reward difference, (3) reward in trials with small effort difference, and (4) reward in trials with large effort difference. We report uncorrected p values, but all conclusions hold when correcting for six comparisons (1-4 against zero; 1 vs 2; 3 vs 4).
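The median split and the binned slope estimate described above can be sketched as follows. The helper names and the toy data are hypothetical; the slope is an ordinary least-squares fit to the mean choice probability of each non-empty bin.

```python
def median_split(trials, key):
    """Split trials into lower and upper halves according to key(trial)."""
    ordered = sorted(trials, key=key)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]

def choice_slope(levels, chosen, n_bins=10, lo=0.0, hi=1.0):
    """Least-squares slope of P(option chosen) across equally spaced bins."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    picks = [0] * n_bins
    for level, c in zip(levels, chosen):
        b = min(int((level - lo) / width), n_bins - 1)
        counts[b] += 1
        picks[b] += c
    xs = [lo + (b + 0.5) * width for b in range(n_bins) if counts[b]]
    ys = [picks[b] / counts[b] for b in range(n_bins) if counts[b]]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Toy data: an option is chosen only when its effort level is below 0.5.
effort = [0.05 + 0.1 * i for i in range(10)] * 2
chosen = [1 if e < 0.5 else 0 for e in effort]
slope = choice_slope(effort, chosen)
print(round(slope, 3))

# Toy trials as (absolute reward difference, effort level, chosen) tuples.
trials = [(abs(0.5 - e), e, c) for e, c in zip(effort, chosen)]
small_diff, large_diff = median_split(trials, key=lambda t: t[0])
```

In the analysis described in the text, one such slope would be computed per participant for each half of the median split, and the resulting slopes would then be compared across participants.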
We also tested for effects of fatigue: the above logistic regression suggested that choices were not affected by whether or not the previous trial required the production of effort, as shown previously in this task (Klein-Flügge et al., 2015). More detailed analyses examined the percentage of trials in which the higher effort option was chosen (running average across 20 trials), and participants' performance in reaching and maintaining the required force. The latter was measured as the time point when 10 consecutive samples were above force criterion (the shorter, the sooner), and as the percentage of time out of 12 s that participants were at criterion, respectively. For all measures, we compared the first and last third of trials. Here we report the comparison between the first and last third across the entire experiment. However, separate analyses, using the first and last third of just the first or the second block, revealed identical results. There were no effects of fatigue: in all cases, participants either improved or stayed unchanged (percentage higher effort chosen: first third, 60.56 ± 1.94%; last third, 60.93 ± 2.79%, p = 0.69, t(20) = −0.40; reaching the force threshold: first third, 0.83 ± 0.04 s; last third, 0.76 ± 0.03 s; p = 0.01, t(20) = 2.82; maintaining the force above threshold: first third, 92.49 ± 0.47%; last third, 93.51 ± 0.28%; p = 0.01, t(20) = −2.84).
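The two force-performance measures can be sketched as below. The sampling rate, the choice to report the onset of the first run of 10 supra-criterion samples (rather than its end), and the function names are assumptions made for illustration.

```python
def time_to_criterion(force, criterion, sample_rate_hz, n_consecutive=10):
    """Onset (s) of the first run of n_consecutive samples above criterion.

    Returns None if the criterion is never held for that many samples.
    """
    run = 0
    for i, f in enumerate(force):
        run = run + 1 if f > criterion else 0
        if run == n_consecutive:
            return (i - n_consecutive + 1) / sample_rate_hz
    return None

def percent_at_criterion(force, criterion):
    """Percentage of samples in the grip period that exceed the criterion."""
    return 100.0 * sum(f > criterion for f in force) / len(force)

# Toy 12 s grip sampled at an assumed 10 Hz: below criterion for 5 s, then above.
trace = [0.0] * 50 + [1.0] * 70
print(time_to_criterion(trace, 0.5, sample_rate_hz=10))
print(round(percent_at_criterion(trace, 0.5), 2))
```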
To derive participants' subjective values for the offers presented on each trial, we developed an effort discounting model (Klein-Flügge et al., 2015). This model has been shown to provide better fits than the hyperbolic model previously suggested for effort discounting (Prévost et al., 2010) both here and in our previously published work (Klein-Flügge et al., 2015). Crucially, its shape is initially concave, unlike a hyperbolic function, allowing for smaller devaluations of value for effort increases at weak force levels, and steeper devaluations at higher force levels, which is intuitive for effort discounting and biologically plausible. In our model (Eq. 1), V is subjective value, C is the effort cost, M the reward magnitude, and k and p are free parameters. C and M are scaled between 0 and 1, corresponding to 0% MVC and 100% MVC, and 0 points and 50 points, respectively. A simple logistic regression on the difference in subjective values between choice options was then used to fit participants' choices; in other words, the following function (softmax rule) was used to transform the subjective values V_1 and V_2 of the two options offered on each trial into the probability of choosing option 1 as follows:
$$P(\text{choose option 1}) = \frac{1}{1 + e^{-\beta_V (V_1 - V_2)}} \qquad (2)$$
The free parameters (slope k, turning point p, softmax precision parameter β_V) were fitted using the Variational Laplace algorithm (Penny et al., 2003; Friston et al., 2007). This is a Bayesian estimation method, which incorporates Gaussian priors over model parameters and uses a Gaussian approximation to the posterior density. The parameters of the posterior are iteratively updated using an adaptive step size, gradient ascent approach. Importantly, the algorithm also provides the free energy F, which is an approximation to the model evidence. The model evidence is the probability of obtaining the observed choice data, given the model.
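The behavior of a sigmoidal discount function and of the softmax rule can be illustrated with a short sketch. The discount function below is an assumed form chosen only to match the verbal description (a sigmoid in effort with slope k and turning point p, normalized so that zero effort leaves the reward undiscounted); it should not be read as the published Eq. 1. The parameter values in the example are likewise hypothetical.

```python
import math

def _sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def subjective_value(M, C, k, p):
    """Assumed sigmoidal effort discounting (illustrative form only).

    V equals M at C = 0 and decreases toward 0 as C grows; the decrease is
    shallow for C well below the turning point p and steepest around p.
    """
    discount = (_sig(k * (C - p)) - _sig(-k * p)) * (1.0 + math.exp(-k * p))
    return M * (1.0 - discount)

def p_choose_option1(V1, V2, beta):
    """Softmax rule: probability of choosing option 1 given both values."""
    return 1.0 / (1.0 + math.exp(-beta * (V1 - V2)))

# Hypothetical parameters: slope k = 10, turning point p = 0.5, precision beta = 5.
v_easy = subjective_value(M=0.6, C=0.2, k=10.0, p=0.5)
v_hard = subjective_value(M=0.8, C=0.8, k=10.0, p=0.5)
print(round(v_easy, 3), round(v_hard, 3))
print(round(p_choose_option1(v_easy, v_hard, beta=5.0), 3))
```

A larger precision parameter makes choices follow the value difference more deterministically, whereas a value near zero makes them approach chance.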
To maximize our chances of finding global rather than local maxima with this gradient ascent algorithm, parameter estimation was repeated over a grid of initialization values, with eight initializations per parameter. The optimal set of parameters (i.e., that obtained from the initialization that resulted in the maximal free energy) was used for modeling subjective values in the fMRI data. For our BOLD analyses, the most relevant parameter was β_V. It reflects the weight (i.e., strength) with which participants' choices are driven by subjective value, rather than noise; it is also often referred to as precision or inverse softmax temperature. Fitting ACT and ABS, or high and low gain trials separately did not lead to any significant differences between conditions (paired t tests on parameter estimates between conditions all p > 0.3) and did not improve the model evidence (paired t test on the model evidence; fitting conditions separately or not: p = 0.82; fitting gain separately or not: p = 0.63). Trials were therefore pooled for model fitting. Once fitted, the performance of our new model was compared with that of the hyperbolic model and two parameter-free models (difference: reward − effort; quotient: reward/effort) as described by Klein-Flügge et al. (2015) using a formal model comparison.
fMRI data acquisition and preprocessing. The fMRI methods followed standard procedures (e.g., Klein-Flügge et al., 2013): T2*-weighted EPI images with BOLD contrast were acquired using a 12-channel head coil on a 3 tesla Trio MRI scanner (Siemens). A special sequence was used to minimize signal drop out in the OFC region (Weiskopf et al., 2006) and included a TE of 30 ms, a tilt of 30° relative to the rostrocaudal axis and a local z-shim with a moment of −0.4 mT/m·ms applied to the OFC region. To achieve whole-brain coverage, we used 45 transverse slices of 2 mm thickness, with an interslice gap of 1 mm and in-plane resolution of 3 × 3 mm, and collected slices in an ascending order. This led to a TR of 3.15 s. In each session, a maximum of 630 volumes were collected (~33 min) and the first five volumes of each block were discarded to allow for T1 equilibration effects. A single T1-weighted structural image with 1 mm³ voxel resolution was acquired and coregistered with the EPI images to permit anatomical localization. A fieldmap with dual echo-time images (TE1 = 10 ms, TE2 = 14.76 ms, whole brain coverage, voxel size 3 × 3 × 3 mm) was obtained for each subject to allow for corrections in geometric distortions induced in the EPIs at high field strength (Andersson et al., 2001).
During the EPI acquisition, we also obtained several physiological measures. The cardiac pulse was recorded using an MRI-compatible pulse oximeter (model 8600 F0, Nonin Medical), and thoracic movement was monitored using a custom-made pneumatic belt positioned around the abdomen. The pneumatic pressure changes were converted into an analog voltage using a pressure transducer (Honeywell International) before digitization, as reported previously (Hutton et al., 2011).
Preprocessing and statistical analyses were performed using SPM8 (Wellcome Trust Centre for Neuroimaging, London; www.fil.ion.ucl.ac.uk/spm). Image preprocessing consisted of realignment of images to the first volume, distortion correction using fieldmaps, slice time correction, conservative independent component analysis to identify and remove obvious artifacts (using MELODIC in FMRIB's Software Library; http://fsl.fmrib.ox.ac.uk/), coregistration with the structural scan, normalization to a standard MNI template, and smoothing using an 8 mm FWHM Gaussian kernel.
Data analysis: GLM. The first GLM (GLM1) included 12 main event regressors. The offer phase was described using onsets for (1) ACT trials preparing a left response, (2) ACT trials preparing a right response, and (3) ABS trials. All three events were modeled using durations of 2 s and were each associated with four parametric modulators: the reward magnitude and effort of the chosen and unchosen option. Crucially, these four parametric modulators competed to explain common variance during the estimation, rather than being serially orthogonalized (in other words, we implicitly tested for effects that were unique to each parametric explanatory variable). The response phase was described using four regressors for "no force" trials (1 s duration) and four regressors for effort production trials (12 s duration): (4-7) no force ACT_left, ACT_right, ABS_left, ABS_right; (8-11) effort production left (low gain), left (high gain), right (low gain), right (high gain). Finally, the outcome was modeled as a single regressor because the proportion of trials in which efforts were not produced successfully was negligible (median: 0; mean: 0.43 ± 0.22 trials; only 4 of 21 participants had any unsuccessful trials).
In addition to event regressors, a total of 23 nuisance regressors were included to control for motion and physiological effects of no interest. First, to account for motion-related artifacts that had not been eliminated in rigid-body motion correction, the six motion regressors obtained during realignment were included. Second, to remove variance accounted for by cardiac and respiratory responses, a physiological noise model was constructed using an in-house MATLAB toolbox (The MathWorks) (Hutton et al., 2011). Models for cardiac and respiratory phase and their aliased harmonics were based on RETROICOR (Glover et al., 2000). The model for changes in respiratory volume was based on Birn et al. (2006). This resulted in 17 physiological regressors in total: 10 for cardiac phase, 6 for respiratory phase, and 1 for respiratory volume.
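The regressor counts above are consistent with a RETROICOR-style Fourier expansion in which each harmonic contributes one sine and one cosine term. The sketch below assumes five cardiac and three respiratory harmonics, inferred only from the reported counts of 10 and 6; the phase values are placeholders and the function name is hypothetical.

```python
import math

def fourier_regressors(phases, order):
    """Sine and cosine regressors of a phase time course up to `order` harmonics."""
    columns = []
    for m in range(1, order + 1):
        columns.append([math.sin(m * ph) for ph in phases])
        columns.append([math.cos(m * ph) for ph in phases])
    return columns

# Placeholder phases (radians) at each volume acquisition; real phases would be
# derived from the pulse oximeter and breathing belt recordings.
n_volumes = 8
cardiac_phase = [(0.9 * i) % (2 * math.pi) for i in range(n_volumes)]
resp_phase = [(0.3 * i) % (2 * math.pi) for i in range(n_volumes)]

cardiac = fourier_regressors(cardiac_phase, order=5)   # 10 regressors
respiratory = fourier_regressors(resp_phase, order=3)  # 6 regressors
n_physio = len(cardiac) + len(respiratory) + 1         # + respiratory volume
n_nuisance = n_physio + 6                              # + six motion regressors
print(n_physio, n_nuisance)
```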
The parameters of the hemodynamic response function were modified to obtain a double-gamma hemodynamic response function, with the standard settings in FMRIB's Software Library (http://fsl.fmrib.ox.ac.uk/): delay to response 6 s, delay to undershoot 16 s, dispersion of response 2.5 s, dispersion of undershoot 4 s, ratio of response to undershoot 6, length of kernel 32 s.
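A double-gamma response with these settings can be sketched as the difference of two gamma densities. The parameterization below (shape = delay/dispersion, rate = 1/dispersion, undershoot scaled by the inverse ratio) is an assumption modeled on the common SPM-style convention and omits any normalization of the sampled kernel; the function names are hypothetical.

```python
import math

def gamma_pdf(t, shape, rate):
    """Gamma probability density at time t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return rate ** shape * t ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)

def double_gamma_hrf(t, delay=6.0, undershoot_delay=16.0,
                     disp=2.5, undershoot_disp=4.0, ratio=6.0):
    """Positive response minus a scaled, later undershoot."""
    peak = gamma_pdf(t, delay / disp, 1.0 / disp)
    under = gamma_pdf(t, undershoot_delay / undershoot_disp, 1.0 / undershoot_disp)
    return peak - under / ratio

TR = 3.15             # repetition time in seconds
KERNEL_LENGTH = 32.0  # kernel length in seconds
kernel = [double_gamma_hrf(i * TR) for i in range(int(KERNEL_LENGTH / TR) + 1)]
print([round(v, 4) for v in kernel])
```

Under these settings the response is positive within the first few seconds and dips below baseline later in the kernel, which is the qualitative shape expected of a double-gamma function.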
The second GLM (GLM2) was identical to the first, except that the four parametric regressors (reward magnitude and effort of the chosen and unchosen option) were replaced by the subjective model-derived values of the chosen and unchosen option. This allowed us to identify regions encoding the difference in subjective value between the offers.
Three further GLMs were fitted to the data to test whether the values derived from the sigmoidal model provide the best explanation of the measured BOLD signals. These GLMs were identical to GLM2, except that the parametric regressors for the values of the chosen and unchosen option derived from the sigmoidal model were replaced by (1) the values derived from a hyperbolic model (GLM3), (2) the values derived from a parameter-free difference "reward − effort" (GLM4), or (3) the values derived from a parameter-free quotient "reward/effort" (GLM5).
Identifying signatures of choice computation. Our first aim was to identify brain regions with BOLD signatures of choice computation (see Fig. 2A). Thus, we first identified brain regions that fulfilled the following two criteria (GLM1): (1) the BOLD signal correlated negatively with the difference in effort between chosen and unchosen options; and (2) the BOLD signal correlated positively with the difference in reward magnitude between chosen and unchosen options. Collectively, these two signals form the basis of a value difference signal because effort contributes negatively and reward magnitude contributes positively to overall value. Previous work has demonstrated, using predictions derived from a biophysical cortical attractor network, that at the level of large neural populations, as measured using human neuroimaging techniques, such as fMRI or MEG, the characteristic signature of a choice comparison process is a value difference signal (Hunt et al., 2012). The responses predicted for harder and easier choices differ because the speed of the network computations varies as a function of choice difficulty (e.g., faster for high value difference). Thus, an area at the formal conjunction of the two contrasts described by criteria 1 and 2 would carry the relevant signatures for computing a subjective value difference signal, a cardinal requirement for guiding choice. Importantly, while we reasoned that the choice computations in our specific task should follow similar principles as in Hunt et al. (2012), we expected this computation to occur in different regions because it would be based on the integration of a different type of decision cost. In an additional analysis (see Fig. 5), for completeness, we also identified brain regions significant in the inverse contrast (i.e., a conjunction of positive effort and negative reward magnitude difference) (Wunderlich et al., 2009; Hare et al., 2011).
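The logic of the conjunction can be sketched with a minimum-statistic test, in which a voxel survives only if both contrasts exceed the threshold. This is an illustrative assumption about how a formal conjunction may be computed; the t values and the threshold below are invented for the example.

```python
def conjunction(t_reward_diff, t_neg_effort_diff, threshold):
    """Voxels where both contrasts exceed threshold (minimum-statistic test)."""
    return [min(a, b) > threshold
            for a, b in zip(t_reward_diff, t_neg_effort_diff)]

# Hypothetical voxelwise t values for the two contrasts:
# positive reward difference, and negative (sign-flipped) effort difference.
t_reward = [4.1, 3.9, 0.5, 5.2]
t_neg_effort = [3.8, 1.2, 4.4, 4.9]
print(conjunction(t_reward, t_neg_effort, threshold=3.0))
```

Only the first and last voxels pass here: the second fails the effort criterion and the third fails the reward criterion, so neither would count as carrying a value difference signal.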
Regions of interest (ROIs) and extraction of time courses. For whole-brain analyses, we used a FWE cluster-corrected threshold of p < 0.05 (using a cluster-defining threshold of p < 0.01 and a cluster threshold of 10 voxels). For a priori ROI analyses, we used a small-volume corrected threshold of p < 0.05. BOLD time series were extracted from the preprocessed data of the identified regions by averaging the time series of all voxels that were significant at p < 0.001 (uncorrected). Time series were up-sampled with a resolution of 315 ms (1/10 × TR) and split into trials for visual illustration of the described effects (e.g., see Fig. 2B).
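The up-sampling and trial-splitting step can be sketched as below. The interpolation method is not stated in the text, so linear interpolation is an assumption; the function names, the window length, and the onset times are hypothetical. A TR of 3.15 s follows from the stated resolution of 315 ms being 1/10 of the TR.

```python
def upsample_linear(series, factor=10):
    """Linearly interpolate a regularly sampled series to `factor` times the rate."""
    out = []
    for i in range(len(series) - 1):
        for j in range(factor):
            frac = j / factor
            out.append(series[i] * (1 - frac) + series[i + 1] * frac)
    out.append(series[-1])  # keep the final original sample
    return out


def split_into_trials(upsampled, onsets_s, dt_s, window_s):
    """Cut a fixed-length window starting at each trial onset (times in seconds)."""
    n_samples = int(round(window_s / dt_s))
    epochs = []
    for onset in onsets_s:
        start = int(round(onset / dt_s))
        if start + n_samples <= len(upsampled):  # skip windows that run off the end
            epochs.append(upsampled[start:start + n_samples])
    return epochs


TR_S = 3.15          # implied by 315 ms = TR / 10
DT_S = TR_S / 10     # up-sampled resolution
roi_series = [0.0, 1.0, 2.0, 3.0, 4.0]                # hypothetical ROI-averaged signal
fine = upsample_linear(roi_series, factor=10)
trials = split_into_trials(fine, onsets_s=[0.0, 3.15], dt_s=DT_S, window_s=6.3)
```

Averaging the resulting epochs across trials would give time courses of the kind shown for illustration in the figures.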
At the suggestion of one reviewer, the two main analyses (conjunction of reward and inverse effort difference described above, and value difference contrast described below) were repeated in FSL using Flame1 because of differences between SPM and FSL in controlling for false positives when using cluster-level corrections (Eklund et al., 2015). For this control analysis, we imported the preprocessed (unsmoothed) images to FSL. We then used FSL's default smoothing kernel of 5 mm and a cluster-forming threshold of z > 2.3 (corresponding to p < 0.01; default in FSL). The obtained results are overlaid in Figure 2A, D.
Encoding of subjective value. We next asked whether BOLD signal changes in the regions identified using the abovementioned conjunction could indeed be described by the subjective values derived from our custom-made behavioral model. We thus performed a whole-brain contrast, identifying regions encoding the difference in subjective value between the chosen and unchosen option (GLM2; see Fig. 2D). To test whether the BOLD signal was better explained by subjective value as modeled using the sigmoidal function or three alternative models (hyperbolic; "difference": reward − effort; "quotient": reward/effort; GLM3-GLM5; see Fig. 3B), we calculated the difference between the value difference maps obtained on the first level for each participant (sigmoid vs hyperbolic; sigmoid vs difference; sigmoid vs quotient; see Fig. 3C). A standard second-level t test was performed on the three resulting difference images and statistical significance evaluated as usual.
Relating neural and behavioral effects of value difference. If it was indeed the case that the regions identified to encode value difference are involved in choice computation and, as a result, inform behavior, the BOLD value signal should systematically relate to behavioral measures of choice performance (Jocham et al., 2012; Kolling et al., 2012). To test this, we used the behavioral measure of the effect of value difference, β_V, as derived from the logistic regression analysis above (Eq. 2). Importantly, before fitting β_V, model-derived subjective values were scaled to the range [0, 1] for all participants so that any difference in the fitted regression coefficient β_V indicated how strongly value difference influenced behavioral choices in a given participant. β_V reflects how consistently participants choose the subjectively more valuable option. In other words, this parameter captures how strongly value rather than noise determines choice behavior. To examine whether the size of the neural value difference signal carried behavioral relevance, the behavioral weights β_V were then used as a covariate for the value difference contrast in a second-level group analysis. At the whole-brain level, we thus identified regions where the encoding of value difference was significantly modulated by how strongly participants' choices were driven by subjective value (see Fig. 2F). This analysis was restricted to the regions that encoded value difference at the first level. For illustration of the effect, the neural signature of value difference (regression coefficients for chosen vs unchosen value at the peak time of 6 s) was plotted against β_V (see Fig. 2F).
Reward maximization versus effort minimization. In our task, reward maximization is in conflict with effort minimization in almost all trials because the option that has a higher reward value is also associated with a higher effort level. To capture the separate behavioral influences of reward and effort for each participant, another logistic regression analysis was conducted, but now both the difference in offer magnitudes and the difference in efforts were entered into the design matrix, rather than just their combination into value as in Equation 2 (Eq. 3). Here, β_M is the weight or precision with which reward magnitude difference (M1 − M2) influences choice, and β_E is the weight (precision) with which effort difference (E1 − E2) influences choice. Next, to identify which brain regions might bias the choice computation either toward reward or away from physical effort, we performed two independent tests. First, we used the behaviorally defined weights for effort, −β_E, as a covariate on the second level, to identify regions where the encoding of effort difference scales with how "effort averse" participants were. In such regions, a larger difference between chosen and unchosen effort signals would indicate that participants avoid efforts more strongly (see Fig. 4B). Based on prior work, we had a priori hypotheses about effort preferences being guided by SMA and putamen (e.g., Croxson et al., 2009; Kurniawan et al., 2010, 2013; Burke et al., 2013). Therefore, we used a small-volume correction (p < 0.05) around previously established coordinates (putamen [±26, −8, −2]; SMA [4, −6, 58]) (Croxson et al., 2009). Second, in an analogous fashion, we used the behavioral weights for reward magnitude, β_M, as a covariate on the second level to identify regions where the encoding of reward magnitude difference scales with how reward-seeking participants are.
In brain regions thus identified, a larger BOLD signal difference between chosen and unchosen reward signals would imply that participants place a stronger weight on reward maximization in their choices (see Fig. 4A). Based on prior work, we expected reward magnitude comparisons to occur in vmPFC (e.g., Kable and Glimcher, 2007; Boorman et al., 2009; Philiastides et al., 2010). Therefore, we used a small-volume correction (p < 0.05) around previously established coordinates [−6, 48, −8] (Boorman et al., 2009).
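The two-regressor logistic regression that yields the reward and effort weights can be sketched as follows. This is a minimal illustration, not the authors' fitting procedure: the absence of an intercept, the plain gradient-ascent optimizer, the function name, and the toy data are all assumptions. The model treats the probability of choosing option 1 as a logistic function of β_M (M1 − M2) + β_E (E1 − E2).

```python
import math


def fit_choice_weights(d_reward, d_effort, chose_first, lr=0.1, n_iter=5000):
    """Fit P(choose option 1) = sigmoid(beta_m * dM + beta_e * dE).

    d_reward:    M1 - M2 on each trial
    d_effort:    E1 - E2 on each trial
    chose_first: 1 if option 1 was chosen, else 0
    Uses gradient ascent on the mean log-likelihood (assumed optimizer).
    """
    beta_m, beta_e = 0.0, 0.0
    n_trials = len(chose_first)
    for _ in range(n_iter):
        grad_m = grad_e = 0.0
        for dm, de, y in zip(d_reward, d_effort, chose_first):
            p = 1.0 / (1.0 + math.exp(-(beta_m * dm + beta_e * de)))
            grad_m += (y - p) * dm
            grad_e += (y - p) * de
        beta_m += lr * grad_m / n_trials
        beta_e += lr * grad_e / n_trials
    return beta_m, beta_e


# Hypothetical choices from a reward-seeking, effort-averse participant.
d_reward = [2, 1, 1, 1, 1, 1, 2, 1]
d_effort = [1, 2, -1, -1, -1, -1, 2, 1]
chose_first = [1, 0, 1, 1, 1, 0, 1, 0]
beta_m, beta_e = fit_choice_weights(d_reward, d_effort, chose_first)
```

For a participant of this kind, the fitted reward weight is positive and the fitted effort weight is negative, so −β_E indexes effort aversion in the way the second-level covariate is described above.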
We further characterized the relationship between participants' effort sensitivity and BOLD signal changes by asking whether the neural encoding of effort difference relates to the individual distortions captured in the parameters k and p of the effort discounting function. For each trial, we compared the true effort difference between the chosen and unchosen option with the modeled subjective effort difference between the chosen and unchosen option. We took the sum of the absolute error from the best linear fit between these two variables as an index of how well our initial GLM captured subjective distortions in the evaluation of effort. We used this measure as an additional regressor for our second-level analysis, in addition to β_E (these two regressors are uncorrelated: r = −0.27, p = 0.24). This approach had the advantage that it combined subjective effort distortions driven by both p and k into a single parameter relevant for the effort comparison (correlation of the summed errors with k: r = 0.9646, p < 0.001; with p: r = 0.60, p = 0.0043).
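The distortion index described here can be sketched as the summed absolute residuals of a straight-line fit. Several details are assumptions: the text does not say which variable was regressed on which (subjective on true effort difference is used here), ordinary least squares is assumed for the "best linear fit", and the function name and example values are hypothetical.

```python
def distortion_index(true_diff, subjective_diff):
    """Sum of absolute residuals from the least-squares line
    predicting subjective effort difference from true effort difference."""
    n = len(true_diff)
    mean_x = sum(true_diff) / n
    mean_y = sum(subjective_diff) / n
    sxx = sum((x - mean_x) ** 2 for x in true_diff)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_diff, subjective_diff))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum(abs(y - (intercept + slope * x))
               for x, y in zip(true_diff, subjective_diff))


# A participant whose subjective effort is a linear function of true effort
# shows no distortion; a nonlinear mapping gives a positive index.
linear_case = distortion_index([0, 1, 2, 3], [1, 3, 5, 7])
nonlinear_case = distortion_index([0, 1, 2, 3], [0, 1, 4, 9])
```

A single scalar per participant is convenient because it can be entered directly as a second-level covariate alongside β_E.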

Results
Human participants performed choices between options with varying rewards and physical efforts (force grips; Fig. 1A). Our main aim was to identify areas carrying neural signatures of value comparison, which are sometimes absent on choices when all decision variables favor the same choice (Hunt et al., 2012). Therefore, for the majority of decisions, larger rewards were paired with larger efforts so that reward maximization competed with energy minimization, and the reward and effort of each option had to be combined into an integrated subjective value to derive a choice. We first tested whether both the size of reward and the associated effort of each choice option had an impact on participants' choice behavior. A logistic regression showed that participants' choices were indeed guided by the reward magnitude and effort of both options (left reward: t(20) = −9.71, Cohen's d = −4.34, p = 4.28e-08; right reward: t(20) = 8.89, Cohen's d = 3.98, p = 1.44e-07; left effort: t(20) = 7.56, Cohen's d = 3.38, p = 2.79e-06; right effort: t(20) = −8.37, Cohen's d = −3.74, p = 2.79e-06; Fig. 1B). As expected, larger rewards and smaller effort costs attracted choices. Overall, participants chose the higher effort option on 48 ± 2% of trials.
Given that behavior was guided by the costs as well as the benefits associated with the two choice options, we next asked whether any brain region encoded both effort and reward in a reference frame consistent with choice. Our main aim was to identify neural signatures of the choice computation: any brain region comparing the values of the two choice options should be sensitive to information about both costs and benefits. Recent work using a biophysically realistic attractor network model (Wang, 2002) suggests that the mass activity of a region computing a choice should reflect the difference of the values of both choice options (Hunt et al., 2012). In our task, a region comparing the options should hence encode (1) the inverse difference between chosen and unchosen efforts and (2) the (positive) difference between chosen and unchosen rewards. We therefore computed the formal conjunction of these two contrasts, which is a conservative test, asking whether any region is significant in both comparisons. This test focused on the decision phase, which was separated in time from the motor response (Fig. 1A). We identified a cluster of activation in the SMA and in the caudal portion of dorsal ACC (dACC), on the border of the anterior and posterior rostral cingulate zones (RCZa, RCZp) and area 24 (Neubert et al., 2015) (Fig. 2A; p < 0.05 cluster-level FWE-corrected; peak coordinate: [−6, 11, 34], t(1,40) = 4.02; SMA peak coordinate: [−9, −7, 58], t(1,40) = 5.29). No other regions reached FWE cluster-corrected significance (p < 0.05). Notably, we did not identify any activations in the vmPFC, a region commonly identified in reward-related value computations, even at lenient statistical thresholds (p < 0.01, uncorrected). Replication of this conjunction analysis in FSL, performed at the suggestion of one reviewer, obtained comparable results, with only dACC and SMA reaching cluster-level corrected significance (Fig. 2A, green overlays).
The two difference signals for effort and reward are illustrated for the BOLD time series extracted from the dACC cluster in Figure 2B.
These results raise the question of whether and how effort and reward are combined into an integrated value for each option, a prerequisite for testing whether any brain region encodes the comparison between subjective option values. Although established models exist to examine how participants compute compound values for uncertain/risky rewards (prospect theory) (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992) and delayed rewards (hyperbolic) (Mazur, 1987; Laibson, 1997; Frederick et al., 2002), it remains unclear how efforts and rewards are combined into a subjective measure of value. We performed several behavioral experiments to develop a behavioral model that can formally describe the range of effort discounting behaviors observed in healthy populations (Klein-Flügge et al., 2015).
One key feature of this model is that it can accommodate cases when increases in effort at lower effort levels have a comparatively small effect on value, compared with increases in effort at higher effort levels (i.e., concave discounting).
When we fitted this model to the choices recorded during the scanning session, participants' behavior was indeed best captured by an initially concave discounting shape (initially concave in 16 of 21 participants; Fig. 2C), consistent with previous work (Klein-Flügge et al., 2015) and the intuition that effort increases are less noticeable at lower levels of effort compared with higher levels of effort.
Using the individual model fits, we then directly tested for neural signatures consistent with a value comparison between the subjective values of the two choice options. This is a slightly less conservative test than the formal conjunction of effort and reward magnitude difference described above, but we note that this test revealed a highly consistent pattern of results. We found strong evidence for a network consisting of the SMA (peak: [−9,  2D; all cluster-level FWE-corrected; p < 0.05). Again, comparable results were obtained using FSL (Fig. 2D, green overlays). This network resembled regions previously described for the evaluation of physical effort but was clearly distinct from the neural system associated with decisions about goods involving the vmPFC (Kable and Glimcher, 2007; Boorman et al., 2009; FitzGerald et al., 2009; Philiastides et al., 2010; Hunt et al., 2012; Kolling et al., 2012; Clithero and Rangel, 2014; Strait et al., 2014).
To validate our choice of behavioral discounting model, we performed a formal model comparison and found that the sigmoidal model provided a better explanation of choice behavior than (convex) hyperbolic discounting, previously proposed for effort discounting (Prévost et al., 2010), and two parameter-free descriptions of value "reward minus effort" and "reward divided by effort" (model exceedance probability: xp = 1; mean of posterior distribution: mp_sigm = 0.75; mp_hyp = 0.05; mp_diff = 0.16; mp_div = 0.04; Fig. 3A). On average, the sigmoidal model correctly predicted 88 ± 1% of choices. To examine whether our measure of value derived from the sigmoidal model also best predicted the BOLD signal, we recalculated the value difference contrasts in an analogous way, this time modeling value using a hyperbolic or one of the two parameter-free models. The resulting whole-brain maps similarly highlighted SMA and dACC (surviving cluster-level FWE-corrected, p < 0.05 for the hyperbolic and difference models, not significant for the quotient model; Fig. 3B [...] 4.77; SMA peak [−6, −7, 61], t(1,19) = 6.72; Fig. 3C). This suggests that the BOLD signal aligns with the subjective experience of effort-discounted value, which was best captured using the sigmoidal model.

Figure 2. [...] (2) the inverse difference between the chosen and unchosen effort levels. The conjunction of both contrasts in SPM (shown at p < 0.001 uncorrected) revealed the SMA and a region in the caudal portion of dACC (both FWE-corrected p < 0.05). Cluster-level corrected results obtained from FSL's Flame 1 (z > 2.3, p < 0.05) are overlaid in green to confirm this finding. B, For illustration purposes, the two opposing difference signals are shown for the dACC cluster on the right (standard errors: ± SEM). C, A custom-built sigmoidal model was fitted to participants' choices to obtain individual effort discounting curves (gray; red represents group mean). In the model, the subjective value of an option's reward (y-axis, represented in %) is discounted with increasing effort levels (x-axis). This allowed inferring the subjective values ascribed to choice options and modeling of subjective value in the BOLD data. D, The difference in subjective value between the chosen and unchosen option, as derived from the behavioral effort discounting model in C, was encoded in a similar network of regions as the combined difference in reward magnitude and effort shown in A, including caudal dACC, SMA, bilateral putamen, and insula (shown at p < 0.001 uncorrected as obtained with SPM; cluster-level corrected FSL results overlaid in green for z > 2.7, p < 0.05). E, The subjective value difference signal extracted from the dACC is shown for illustration (standard errors: ± SEM). F, Left, Regions encoding subjective value as in D but where the strength of this signal additionally correlated with the extent to which value difference guided behavior (inverse softmax temperature β_V; shown at p < 0.01 uncorrected; only the dACC survives cluster-level FWE-correction at p < 0.05). Right, Illustration of the correlation in dACC for visual display purposes only. The stronger the BOLD difference between the chosen and unchosen option in this region, the more precisely participants' choices are guided by value (β_V). This suggests that the dACC's value signal computed at the time of choice is relevant for guiding choices. G, Regions where the encoding of effort difference correlates, across subjects, with a marker for the individual level of effort distortion as captured by the parameters k and p of the modeled discount function. The better an individual's subjectively experienced effort was captured in the GLM (i.e., the less distorted their discount function), the stronger the inverse effort difference signal in caudal dACC and SMA (light blue represents p < 0.001 uncorrected; dark blue represents p < 0.005 uncorrected; dACC/SMA survive cluster-level FWE-correction at p < 0.05). This suggests that dACC and SMA encode effort difference in the way it subjectively influences the choice.
A crucial question is whether the observed value difference signal bears any behavioral relevance for choice, rather than potentially being a mere byproduct of a choice computation elsewhere. In the former case, one would expect that the encoding of subjective value difference relates to the strength, or "weight," with which subjective value difference influenced behavior across participants (Jocham et al., 2012; Kolling et al., 2012; Khamassi et al., 2015). Such a behavioral weight was derived for each participant using a logistic regression on the normalized model-derived subjective values. The resulting parameter estimate is the same as the inverse softmax temperature or precision and reflects how consistently participants choose the subjectively more valuable option (see β_V in Eq. 2). The only region that was significant in this second-level test and also encoded value difference at the first level was the dACC (Fig. 2F; cluster-level FWE-corrected, p < 0.05; peak [−3, 11, 31], t(1,19) = 3.71). In other words, dACC encoded value difference on average across the group, and participants who exhibited a larger BOLD value difference signal in the dACC were also more consistent in choosing the subjectively better option (larger β_V); this relationship is illustrated in Figure 2F.
To further probe whether the identified network of regions evaluates the choice options in a subjective manner, we examined the relationship between the subjective "distortion" of effort described by the parameters k and p of the individual effort discount function, and the BOLD signal related to the effort difference across participants. We calculated a measure to describe how much the true effort difference deviated from the subjectively experienced effort difference overall across trials. This "distortion" regressor correlated with k (r = 0.9646, p < 0.001) and p (r = 0.60, p = 0.0043), but not β_E (r = −0.27, p = 0.24), and was used as a second-level covariate for the effort difference contrast. GLM1 contained the efforts shown on the screen and thus should have captured the subjectively experienced effort better in participants who showed smaller effort distortions (i.e., with discounting closer to linear). Thus, in regions related to the comparison of subjective effort or effort-integrated value, we expected participants with less effort distortions to show a stronger negative effort difference signal. Indeed, we found such a positive second-level correlation with the BOLD signal in dACC and SMA, supporting the notion that effort difference is encoded in these regions in the way it subjectively influences the choice (Fig. 2G [...]

Figure 3. [...] Fig. 2D), the hyperbolic model (purple), and the parameter-free descriptions of value (blue: reward − effort; green: reward/effort; all shown at p < 0.001 uncorrected). The three alternative contrasts reveal a similar network, albeit less strongly. C, Crucially, the sigmoidal model provides a significantly better description of the BOLD signal in SMA, extending into caudal dACC, compared with all other models. Purple represents sigmoidal versus hyperbolic. Blue represents sigmoidal versus parameter-free subtraction. Green represents sigmoidal versus parameter-free division (shown at p < 0.001 uncorrected).

[...] ACC has access to information from motor structures (Selemon and Goldman-Rakic, 1985; Dum and Strick, 1991; Morecraft and Van Hoesen, 1992, 1998 [...] 2013), and prefrontal regions known to be involved in reward processing, such as the vmPFC and OFC (Padoa-Schioppa and Assad, 2006; Kennerley and Wallis, 2009; Levy and Glimcher, 2011; Rudebeck and Murray, 2011; Klein-Flügge et al., 2013; Chau et al., 2015; Stalnaker et al., 2015). We thus reasoned that the ACC may be a key node for the type of effort-based choice assessed in the present task. To further test this hypothesis, we sought to identify regions that mediate between reward maximization versus effort minimization in our task.
To this end, we first extracted two separate behavioral weights reflecting participants' tendency to seek reward and avoid effort. These behavioral parameters were derived from a logistic regression with two regressors explaining how much choices were guided by the difference in reward magnitude and the difference in effort level between options (β_M and β_E in Eq. 3). This is distinct from using just one regressor for the combined subjective value difference as done above (β_V). Across participants, we then first identified brain regions where the encoding of chosen versus unchosen reward magnitude correlated with the weight, β_M, with which choices were influenced by the difference in reward between the chosen and unchosen option. Second, we performed the equivalent test for effort (i.e., we identified regions where the neural encoding of chosen vs unchosen effort correlated with the weight, −β_E, with which behavior was guided by the difference in effort between the chosen and unchosen option). The two tests revealed two distinct networks of regions. First, the vmPFC encoded reward magnitude difference across subjects as a function of how much participants' choices were driven by the difference in reward between the options (SVC FWE-corrected cluster-level p = 0.037; peak [−6, 44, −8], t(1,19) = 2.87; Fig. 4A). Unlike in many other tasks (Kable and Glimcher, 2007; Boorman et al., 2009; FitzGerald et al., 2009; Philiastides et al., 2010; Hunt et al., 2012; Kolling et al., 2012; Clithero and Rangel, 2014; Strait et al., 2014), the vmPFC BOLD signal did not correlate with chosen reward or reward difference on average in the group. However, reward difference signals were on average positive for participants whose choices were more strongly driven by reward magnitudes (median split; Fig. 4A).
At the whole-brain level, the correlation of behavioral reward-weight, β_M, and BOLD reward difference encoding did not reveal any activations using our FWE cluster-level corrected criterion of p < 0.05. Using a lenient exploratory threshold (p = 0.01, uncorrected), we identified a small number of other regions including the posterior cingulate cortex bilaterally and visual cortex (Fig. 4A), but crucially no clusters in motor, supplementary motor, or striatal regions.
By contrast, a network of motor regions, including SMA and putamen, encoded effort difference as a function of the individual behavioral effort weight −β_E (Fig. 4B; SVC FWE-corrected cluster-level SMA: p = 0.048, peak [3, −7, 58], t(1,19) = 2.59; left putamen: p = 0.035, peak [−27, −4, −5], t(1,19) = 3.39; right putamen no suprathreshold voxels). In other words, these regions encoded the difference in effort between the chosen and unchosen options more strongly in participants whose choices were negatively influenced by large effort differences (i.e., participants who were more sensitive to effort costs). Using a whole-brain FWE cluster-level-corrected threshold (p < 0.05), no regions were detected in this contrast. At an exploratory threshold (p = 0.01, uncorrected), this contrast also highlighted regions in the brainstem, primary motor cortex, thalamus, and dorsal striatum (Fig. 4B), and thus regions previously implicated in evaluating motor costs and in recruiting resources in anticipation of effort (Croxson et al., 2009; Burke et al., 2013; Kurniawan et al., 2013), but clearly distinct from the vmPFC/posterior cingulate cortex network identified in the equivalent test for reward above.

Figure 4. [...] This showed that the BOLD signal in vmPFC (SVC FWE-corrected, p < 0.05) reflected the difference between chosen and unchosen reward more strongly in participants who also placed a stronger weight on maximizing reward (top, bottom left). Although we could not identify an average reward difference coding in vmPFC across the group, the subset of participants who placed a stronger weight on reward (larger β_M; median split, ellipse) did encode the difference between the chosen and unchosen reward magnitudes (bottom right). This suggests that vmPFC might bias choices toward reward maximization (standard errors: ± SEM). B, A very distinct network of regions, including the SMA and putamen (both SVC FWE-corrected, p < 0.05), encoded effort difference as a function of participants' behavioral effort weight (β_E; shown at p < 0.01 uncorrected). This system was active more strongly in participants who tried to more actively avoid higher efforts and has often been associated with effort evaluation. It might counteract the vmPFC circuit shown in A to achieve effort minimization, which is in constant conflict with reward maximization in our task. Correlation plots (bottom) are only shown for visual illustration of the effects for a priori ROIs; no statistical analyses were performed on these data.
Together, our data thus show that two distinct networks centered on vmPFC versus SMA/putamen encode the reward versus effort difference as a function of how much these variables influence the final choice. Yet only the caudal portion of dACC encodes the difference in overall subjective value as a function of how much overall value influences choice. This region in dACC could therefore be a potential mediator between reward maximization and effort minimization, which appear to occur in separate neural circuits.

Functionally distinct subregions of medial PFC
For completeness, we also tested whether any areas encode an opposite value difference signal (i.e., the inverse of the conjunction analysis and of the subjective value difference contrast performed above), reflecting the evidence against the chosen option and thus one notion of decision difficulty. This did not reveal any regions at our conservative (cluster-level FWE-corrected) threshold in either test. At a more lenient exploratory threshold (p = 0.01 uncorrected), a single common cluster in medial PFC (pre-SMA/area 9) was identified (Fig. 5), in agreement with previous reports of negative value difference signals in this region (Wunderlich et al., 2009; Hare et al., 2011). Importantly, the location of this activation was clearly distinct from the caudal dACC region found to encode a positive value difference (Fig. 2). Here, by contrast, value difference signals did not correlate with the strength with which subjective value difference influenced behavior across participants (β_V; no suprathreshold voxels at p = 0.01 uncorrected), suggesting that this region's functions during choice are separate from those that bias behavior.

Discussion
Choices requiring the consideration of motor costs are ubiquitous in everyday life. Unlike other types of choices, they require knowledge of the current state of the body and its available energy resources, to weigh physical costs against potential benefits. How this trade-off might be implemented neurally remains largely unknown.
Here, we identified a region in the caudal part of dACC as the key brain region that carried the requisite signatures for effort-based choice: dACC represented the costs and benefits of the chosen relative to the alternative option, integrated effort and reward into a combined subjective value signal, computed the subjective value difference between the chosen relative to the alternative option, and activity here correlated with the degree to which participants' choices were driven by value.

ACC integrates effort and reward information
Work from several lines of research suggests that ACC may be a key region for performing cost-benefit integration for effort-based choice. For example, lesions to ACC (but not OFC) result in fewer choices of a high effort/high reward compared with a low effort/low reward option; yet such animals still choose larger reward options when effort costs for both options are equated, implying that ACC is not essential when decisions can be solved only by reward (Schweimer and Hauber, 2005; Rudebeck et al., 2006; Floresco and Ghods-Sharifi, 2007). BOLD responses in human ACC reflect the integrated value of effort-based options in the absence of choice (Croxson et al., 2009). Further, single neurons recorded from ACC encode information about both effort and reward (Shidara and Richmond, 2002; Kennerley et al., 2009, 2011) and integrate costs and benefits into a value signal (Hillman and Bilkey, 2010; Hosokawa et al., 2013; Hunt et al., 2015). ACC thus appears to have a critical role in integrating effort and reward information to derive the subjective value of performing a particular action.

ACC encodes a choice comparison signal
However, from the aforementioned work, it remained unclear whether cost-benefit values of different choice options are actually compared in ACC, or whether reward and effort may be compared in separate neural structures and the competition resolved between areas. When one choice option is kept constant, the value of the changing option correlates perfectly with the value difference between the options (Kurniawan et al., 2010;Prévost et al., 2010;Bonnelle et al., 2016), which precludes distinguishing between valuation and value comparison processes. This is similarly true when only one option is offered and accepted/rejected (Bonnelle et al., 2016). We here varied both options' values from trial to trial, which allowed us to identify a choice comparison signal in the ACC, and thus the essential neural signature implicating this area in decision making. First, we the BOLD signal encodes an inverse rather than a positive difference between chosen and unchosen reward magnitudes and a positive rather than an inverse difference between chosen and unchosen effort (i.e., the exact inverse of the conjunction shown in Fig. 2A). The only region detected at a lenient threshold ( p ϭ 0.01 uncorrected; no regions survive FWE correction) is a nearby but anatomically distinct region in medial prefrontal cortex previously suggested to serve as a choice comparator (Wunderlich et al., 2009;Hare et al., 2011). B, However, in this region, the BOLD signal does not relate to behavior, as was the case for the caudal portion of dACC (see Fig. 2F ).
show that a region in the caudal portion of dACC encodes separate difference signals for effort and reward. The direction of these difference signals aligns with their respective effect on value, with effort decreasing and reward increasing an option's overall value. Second, we demonstrate a comparison signal between integrated option values. We used a novel behavioral model (Klein-Flügge et al., 2015) to characterize participants' individual tendency to discount reward given the level of motor costs. Using the resultant model-derived subjective values, we identified the dACC as a region encoding a combined value difference signal. Indeed, our model provided a better characterization of the BOLD signal than other models of effort discounting, and dACC activity was related to individuals' "distortions" of effort. This resolves an important question showing that effort and reward information are indeed brought together within a single region to inform choice.
Finally, this value comparison signal also varied as a function of how much value influenced choices across participants. This result further strengthens the idea that the dACC plays a crucial role in guiding choice, rather than merely representing effort or reward information. In our task, no other region exhibited similar dynamics, even at lenient thresholds.
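The design confound noted above (a constant alternative makes an option's value and the value difference perfectly correlated) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis used in the study: the values are arbitrary uniform random numbers, the difference is taken as option A minus option B rather than chosen minus unchosen, and all names (`pearson`, `value_a`, `fixed_b`) are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_trials = 200

# Hypothetical subjective values of option A on each trial.
value_a = [rng.uniform(0.0, 1.0) for _ in range(n_trials)]

# Design 1: option B is held constant (assumed value 0.5).
# The difference is value_a shifted by a constant, so the correlation is 1.
fixed_b = 0.5
diff_fixed = [a - fixed_b for a in value_a]
r_fixed = pearson(value_a, diff_fixed)

# Design 2: option B varies independently from trial to trial.
# Value and value difference are now only partially correlated.
value_b = [rng.uniform(0.0, 1.0) for _ in range(n_trials)]
diff_both = [a - b for a, b in zip(value_a, value_b)]
r_both = pearson(value_a, diff_both)

print(f"fixed alternative:   r = {r_fixed:.3f}")
print(f"varying alternative: r = {r_both:.3f}")
```

With a fixed alternative, a regressor for option value and a regressor for value difference are indistinguishable. Once both options vary, the two regressors decorrelate (for independent values of equal variance, the expected correlation is about 0.71), which is what allows valuation and comparison signals to be separated in principle.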

Influences from "effort" and "reward" circuits
Nevertheless, an important question remains: do the regions that preferentially encode reward or effort have any influence on choice? To examine this question, we looked for regions that explain participants' tendency to avoid effort, or to seek reward. This analysis revealed two distinct circuits. Whereas signals in vmPFC reflected the relative benefits as a function of how reward-driven participants' choices were, a network more commonly linked to action selection and effort evaluation (Croxson et al., 2009; Kurniawan et al., 2010, 2013; Prévost et al., 2010; Burke et al., 2013; Bonnelle et al., 2016), including SMA and putamen, encoded relative effort as a function of how much participants tried to avoid energy expenditure. It will be of interest to examine in future work how these circuits interact, and how different modulatory systems contribute to this interplay (see, e.g., Varazzani et al., 2015). This question should be extended to situations when different costs coincide or different strategies compete (for one recent example, see Burke et al., 2013), or when information about effort and reward has to be learned (Skvortsova et al., 2014; Scholl et al., 2015).
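To make the notions of reward-driven and effort-avoiding choice concrete, the following sketch expresses choice probability as a logistic function of the reward and effort differences between two options. It is an illustration under assumptions, not the behavioral model fitted in this study: the logistic form, the function name `p_choose_a`, and the weights `beta_r` and `beta_e` (standing in for reward and effort sensitivity) are hypothetical, as are all numerical values.

```python
import math

def p_choose_a(d_reward, d_effort, beta_r, beta_e):
    """Probability of choosing option A over option B.

    d_reward and d_effort are A-minus-B differences; beta_r and beta_e
    are assumed (hypothetical) sensitivities to reward and effort.
    Reward differences raise, and effort differences lower, the
    probability of choosing A.
    """
    decision_variable = beta_r * d_reward - beta_e * d_effort
    return 1.0 / (1.0 + math.exp(-decision_variable))

# A trial on which option A offers more reward but also demands more effort.
d_reward, d_effort = 1.0, 1.0

# Two hypothetical participants facing the same trial.
reward_seeker = p_choose_a(d_reward, d_effort, beta_r=3.0, beta_e=1.0)
effort_avoider = p_choose_a(d_reward, d_effort, beta_r=1.0, beta_e=3.0)

print(f"reward-driven participant chooses A with p = {reward_seeker:.2f}")
print(f"effort-averse participant chooses A with p = {effort_avoider:.2f}")
```

Under these assumed weights, the same trial is resolved in opposite directions by the two participants: the reward-driven participant mostly accepts the high-effort, high-reward option, whereas the effort-averse participant mostly declines it. Between-participant variation of this kind is what the analysis described above relates to the neural signals.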

Converging evidence for multiple decision circuits
Our results contribute to an emerging literature demonstrating the existence of multiple decision systems in the brain, which are flexibly recruited based on the type of decision. One well-studied system concerns choices where costs are directly tied to outcomes (e.g., risk, delay). During this type of choice, vmPFC encodes the difference between the chosen and unchosen options' cost-benefit value (Kable and Glimcher, 2007; Boorman et al., 2009; Philiastides et al., 2010; Hunt et al., 2012; Kolling et al., 2012), consistent with the decision impairments observed after vmPFC lesions (Noonan et al., 2010; Camille et al., 2011a, c). Other types of choices, however, rely on other networks (Kolling et al., 2012; Hunt et al., 2014; Wan et al., 2015). In the present study, decisions required the integration of motor costs, and we show that for this type of choice, dACC, rather than vmPFC, plays the more central role. vmPFC did not encode overall value or the difference in value between the options in our task; in our hands, vmPFC evidenced no information about effort costs, consistent with previous proposals (

Functionally dissociable anatomical subregions of mPFC
The location in the dACC identified here is distinct from a more anterior and dorsal region in medial frontal cortex (in or near pre-SMA) where BOLD encodes the opposite signal: a negative value difference (Wunderlich et al., 2009; Hare et al., 2011). It is also more posterior than a dACC region involved in foraging choices (Kolling et al., 2012). The cluster of activation identified here extends from the cingulate gyrus dorsally into the lower bank of the cingulate sulcus, and it is sometimes also referred to as midcingulate cortex (MCC) (Procyk et al., 2016) or rostral cingulate zone (Ridderinkhof et al., 2004). According to a recent connectivity-based parcellation, our activation is on the border of areas RCZa (34%), RCZp (33%) and area 24 (48%) (Neubert et al., 2015). While it shares some voxels with the motor cingulate regions in humans, most parts of our cluster are more ventral and located in the gyral portion of ACC (for a discussion of functionally dissociable activations in ACC, see also Kolling et al., 2016).

Relevance for disorders of motivation
Our findings in the dACC speak to an important line of research showing deficits in effort-based decision making in a number of disorders, including depression, negative symptom schizophrenia, and apathy (Levy and Dubois, 2006; Treadway et al., 2012, 2015; Fervaha et al., 2013; Gold et al., 2013; Hartmann et al., 2014; Pizzagalli, 2014; Yang et al., 2014; Bonnelle et al., 2015). Patients with these disorders often show a reduced ability to initiate effortful actions to obtain reward. Crucially, they also exhibit abnormalities in ACC and basal ganglia circuits, as well as other regions processing information about the autonomic state, including the amygdala and some brainstem structures (Drevets et al., 1997; Botteron et al., 2002; Levy and Dubois, 2006). Furthermore, individuals with greater behavioral apathy scores show enhanced recruitment of precisely the circuits implicated in the present study, including SMA and cingulate cortex, when deciding to initiate effortful behavior (Bonnelle et al., 2016). This is interesting because apathy correlates with increased effort sensitivity (β_E) (Bonnelle et al., 2016), and we found that individuals with increased effort sensitivity showed enhanced recruitment of SMA and brainstem regions for encoding the effort difference (Fig. 3B). In other words, when committing to a larger (relative) effort, these circuits were more active in people who were more sensitive to effort. As discussed by Bonnelle et al. (2016), we cannot infer cause and effect, but it is possible that the neural balance between activations in reward and effort systems might be different in individuals with greater sensitivity to effort (such as apathetic individuals). This may be why these people avoid choosing effortful options more often than others. It also provides a possible connection between the network's specific role in effort-based choice and its functional contribution to everyday life behaviors.