Abstract
Humans routinely learn the value of actions by updating their expectations based on past outcomes – a process driven by reward prediction errors (RPEs). Importantly, however, implementing a course of action also requires the investment of effort. Recent work has revealed a close link between the neural signals involved in effort exertion and those underpinning reward-based learning, but the behavioral relationship between these two functions remains unclear. Across two experiments, we tested healthy male and female human participants (N = 140) on a reinforcement learning task in which they registered their responses by applying physical force to a pair of hand-held dynamometers. We examined the effect of effort on learning by systematically manipulating the amount of force required to register a response during the task. Our key finding, replicated across both experiments, was that greater effort increased learning rates following positive outcomes and decreased them following negative outcomes, which corresponded to a differential effect of effort in boosting positive RPEs and blunting negative RPEs. Interestingly, this effect was most pronounced in individuals who were more averse to effort in the first place, raising the possibility that the investment of effort may have an adaptive effect on learning in those less motivated to exert it. By integrating principles of reinforcement learning with neuroeconomic approaches to value-based decision-making, we show that the very act of investing effort modulates one's capacity to learn, and demonstrate how these functions may operate within a common computational framework.
SIGNIFICANCE STATEMENT Recent work suggests that learning and effort may share common neurophysiological substrates. This raises the possibility that the very act of investing effort influences learning. Here, we tested whether effort modulates teaching signals in a reinforcement learning paradigm. Our results showed that effort resulted in more efficient learning from positive outcomes and less efficient learning from negative outcomes. Interestingly, this effect varied across individuals, and was more pronounced in those who were more averse to investing effort in the first place. These data highlight the importance of motivational factors in a common framework of reward-based learning, which integrates the computational principles of reinforcement learning with those of value-based decision-making.
Introduction
Choosing what to do next involves weighing up the value of possible actions based on their expected outcomes (Schall, 2001; Samejima et al., 2005; Rangel et al., 2008; Bartra et al., 2013). According to reinforcement learning theory, the value of an action increases following outcomes that are better than expected, and decreases following those that are worse than expected (Sutton and Barto, 1998; Niv, 2009). Critically, acting on our choices also requires the exertion of effort. However, despite substantial evidence that effort modulates the value of reward, current theories of reinforcement learning do not account for the role of effort in learning.
Recent data suggest that dopamine is fundamental to driving both effort-based decisions and reward-based learning (Berke, 2018). Striatal dopamine plays a critical role in decisions to invest effort; the exertion of effort itself; and the evaluation of choice outcomes (Wise, 2004; Salamone and Correa, 2012). Notably, dopamine also encodes reward prediction errors (RPEs), teaching signals that represent the difference between predicted and actual reward (Montague et al., 1996; Schultz et al., 1997; Watabe-Uchida et al., 2017). Emerging evidence suggests that the dopaminergic signals involved in learning and effort overlap during choice behavior (Berke, 2018). For example, transient fluctuations in dopamine activity, classically recognized as RPEs, have been found to play a key role in shaping the vigor of movement (Hamid et al., 2016; Howe and Dombeck, 2016; Coddington and Dudman, 2018; da Silva et al., 2018; Hughes et al., 2020). Moreover, a recent behavioral finding in humans suggests RPEs themselves may invigorate responses (Jarvis, 2019; Sedaghat-Nejad et al., 2019).
Behavioral data indicate that effort modulates reward valuation both before and after choices are made. When prospectively evaluating a course of action, humans and other animals typically choose the least effortful option to achieve an outcome (Hull, 1943; Bitgood and Dukes, 2006; Kool et al., 2010; Cos et al., 2011; Ranganathan et al., 2013; Shadmehr et al., 2016). For example, suppose we wished to appreciate sweeping views from atop a mountain – most of us would prefer to take the chairlift rather than physically climb to the peak. The aversiveness of effort has been quantified by a large body of experimental work across species showing that effort reduces the subjective value of a prospective reward, the phenomenon known as “effort discounting” (Aberman and Salamone, 1999; Walton et al., 2006; Botvinick et al., 2009; Kurniawan et al., 2010; Prévost et al., 2010; Chong et al., 2015, 2017; McGuigan et al., 2019; Atkins et al., 2020).
In contrast, when an action is evaluated retrospectively, effort tends to inflate the value of a realized reward (Aronson and Mills, 1959; Alessandri et al., 2008; L. Wang et al., 2017). For example, animals tend to prefer outcomes that have previously been associated with more effortful behavior (Klein et al., 2005; Singer et al., 2007; Tsukamoto et al., 2017). Returning to our earlier example, these frameworks predict that the views atop the mountain would be more rewarding if we chose to climb to the peak rather than take the chairlift. In humans, this phenomenon is often interpreted in the context of “cognitive dissonance,” whereby the application of effort is thought to be justified retrospectively by inflating the value of its outcome (Festinger, 1957).
Taken together, these separate lines of research have shown that effort can modulate both the predicted value of a future reward, as well as the observed value of a realized reward, the very two signals that define the RPE in classical reinforcement learning models. This suggests that effort has the potential to modulate learning itself (Tanaka et al., 2021). Importantly, however, this prediction has not been empirically tested in humans. Here, across two experiments, we investigated how effort modulates learning by requiring participants to exert predefined levels of physical force in a reinforcement learning paradigm. Given the close relationship between effort and reward signals in the brain (Salamone and Correa, 2002; Guitart-Masip et al., 2014; Berke, 2018; Tanaka et al., 2021), we predicted that learning would be contingent on RPEs shaped by both reward feedback and the amount of effort exerted. Indeed, in light of recent suggestions that effort and reward signals operate within a common computational framework (Berke, 2018; Sedaghat-Nejad et al., 2019; Jenkins and Walton, 2020), we hypothesized that related computations of effort costs might discount value before choice, and modulate learning after choice.
Materials and Methods
Participants
Participants were young, healthy male and female adults recruited and tested at Monash University in Melbourne, Australia. They reported normal or corrected-to-normal vision, no history of neurologic or psychiatric disorder, and no musculoskeletal injuries to the upper limbs. We tested 94 participants in experiment 1 (77 female; mean age 20 years), and 46 in experiment 2 (28 female; mean age 21.9 years). The experimental protocols were approved by Monash University's Human Research Ethics Committee, and informed consent was obtained from all participants before testing.
Experimental design and statistical analyses
At the core of both experiments was a standard reinforcement learning paradigm. The critical difference between this task and previous reinforcement learning studies is that participants were required to apply prespecified levels of physical force on a pair of hand-held dynamometers to register their responses. We examined the effect of effort on learning by systematically manipulating the amount of force they needed to exert. The primary distinction between experiments 1 and 2 was in whether these prespecified levels of force were manipulated across separate experimental blocks (experiment 1), or within individual blocks (experiment 2).
In both experiments, participants on each trial were presented with a pair of abstract stimuli on the left and right of the screen, and were required to select which was more rewarding based on probabilistic feedback after each trial. The probability of obtaining a reward on selecting a stimulus was 0.7 for a highly rewarded stimulus and 0.3 for a poorly rewarded stimulus. These contingencies periodically reversed over the course of the experiments, and these reversals were not signaled. We randomized the left/right location of the two stimuli on each trial. On rewarded trials, a “smiley face” appeared accompanied by a positively-valenced auditory tone. On nonrewarded trials, a “sad face” appeared with a negatively-valenced tone. Participants had a maximum of 2 s to register a response on each trial, otherwise a “Too slow!” message was displayed for 1 s and then the next trial began. Participants were incentivized by the opportunity to increase their remuneration based on their performance.
Participants registered their choices by applying a prespecified level of physical force to the corresponding left/right dynamometer (SS25LA, BIOPAC Systems). Force levels were standardized for each participant before testing by measuring their maximum voluntary contraction (MVC), which was defined for each hand as the maximum force generated from three ballistic contractions. Both experiments were run in Psychtoolbox (Brainard, 1997) implemented in MATLAB (R2018a, MathWorks), and presented on a monitor at a viewing distance of ∼60 cm.
Experiment 1
Participants completed two counterbalanced blocks of 180 trials. In a "control" block, participants only needed to apply a negligible amount of force (>5% MVC) to register their choices. In a separate "effort" block, choices required a greater amount of force. The precise amount of force required in this "effort" block was systematically varied across three experimental groups (>18%, >31%, or >44% MVC in separate groups of n = 32, 31, and 31, respectively). Block order ("control" vs "effort") was counterbalanced across participants, and each block was preceded by 15 practice trials to familiarize participants with the force requirements. On any given trial, one of the two stimuli had a 0.7 probability of being rewarded, and the other a 0.3 probability of reward. These reward contingencies reversed when participants reached a cumulative accuracy of 70% (after a minimum of 10 trials) or else after every 16 trials (Park et al., 2010; Schlagenhauf et al., 2013).
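For illustration, the reversal rule can be expressed as follows (a minimal sketch in R; we assume cumulative accuracy is tracked from the most recent reversal, and all variable names are ours):

```r
# Sketch of the experiment 1 reversal rule (illustrative values): reverse the
# reward contingencies once cumulative accuracy reaches 70% (after at least
# 10 trials), or automatically after 16 trials
trials_since_reversal <- 12
cumulative_accuracy <- 0.75   # assumed to be tracked from the last reversal

reverse_now <- (trials_since_reversal >= 10 && cumulative_accuracy >= 0.70) ||
  trials_since_reversal >= 16
```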
Experiment 2
Experiment 2 was similar to experiment 1 except that effort requirements were manipulated within an experimental block, rather than across blocks. The same two stimuli were presented across all trials in experiment 2. One of these stimuli was designated the “low effort” stimulus, and required only a negligible amount of force to be selected (>5% MVC). The other was designated the “high effort” stimulus, which required a higher amount of force (>44% MVC). These stimulus-effort mappings remained constant for the duration of the experiment, and participants were explicitly informed about the identity of the low and high effort stimuli before the experimental task began.
To reinforce these stimulus-effort mappings, participants performed a preliminary block of 50 trials in which they were cued to generate the force corresponding to either the low or high effort stimulus (randomly determined). Participants then received binary feedback (correct vs incorrect) about whether they generated the correct amount of force (5–44% of MVC for the low effort stimulus, or >44% MVC for the high effort stimulus). To proceed to the experimental block, participants had to apply the correct force on at least 20 of the final 25 trials. All participants achieved this on their first attempt, with the exception of one participant, who achieved this on their second attempt.
The experimental task comprised two blocks of 150 trials, preceded by a practice block of 50 trials. As in experiment 1, participants were required to choose between the two stimuli on offer; the key difference was that, here, every choice also required them to weigh the effort needed to select each stimulus. Although the effort required to select a given stimulus was fixed, the value of each stimulus varied periodically. On any given trial, both stimuli could have a high probability of being rewarded (0.7); both could have a low probability of reward (0.3); or one could be superior to the other (0.7 vs 0.3). These contingencies changed after every 12 trials according to a pseudorandomized sequence that ensured the number of transitions and trials involving each combination of reward contingencies was the same for all participants (Manohar et al., 2021).
Data analysis
In both experiments, we first tested whether effort affected learning by examining its effect on win-stay and lose-switch behavior. In experiment 1, we fitted a generalized linear model (GLM) to test whether the probability of choosing the same stimulus after a positive outcome varied as a function of the amount of Effort, E, exerted on trial t:

logit[P(stay_(t+1))] = β_0 + β_1·E_t + β_2·Block + β_3·(E_t × Block)
Effort in these models was defined as the peak force amplitude on each trial (as a proportion of the participant's MVC). We included Block as a dummy variable, as well as the Block × Effort interaction. Participants were modeled as a random effect. Separately, we examined the effect of the exerted Effort on the probability of choosing the alternative stimulus after a negative outcome:

logit[P(switch_(t+1))] = β_0 + β_1·E_t + β_2·Block + β_3·(E_t × Block)
We analyzed the effect of Effort on behavior in experiment 2 using similar GLMs, but without the effect of Block (given that the effort manipulation was within blocks):

logit[P(stay_(t+1))] = β_0 + β_1·E_t
logit[P(switch_(t+1))] = β_0 + β_1·E_t
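For illustration, these models might be specified as follows in lme4 (a sketch, not the authors' analysis code; the data frame and column names are ours):

```r
# Minimal sketch of the win-stay and lose-switch GLMs, assuming a trial-level
# data frame `d` with illustrative columns: subject, outcome (1 = rewarded,
# 0 = unrewarded), stay/switch (0/1, coded from the next choice), effort
# (peak force as a proportion of MVC), and block ("control"/"effort")
library(lme4)

# Experiment 1: win-stay as a function of Effort, Block, and their
# interaction, with participant as a random effect
m_winstay <- glmer(stay ~ effort * block + (1 | subject),
                   data = subset(d, outcome == 1), family = binomial)

# Lose-switch following negative outcomes
m_loseswitch <- glmer(switch ~ effort * block + (1 | subject),
                      data = subset(d, outcome == 0), family = binomial)

# Experiment 2: the same models without the Block terms, e.g.,
# glmer(stay ~ effort + (1 | subject), ...)
summary(m_winstay)
```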
We excluded one participant from experiment 1 whose relative accuracy was >3 SDs above the group mean. Statistical analyses were performed in R using RStudio (version 1.1.447; RStudio Team, 2016). GLMs were fitted using the lme4 package (Bates et al., 2015), and plots were created using the ggplot2 package (Wickham, 2016).
Computational modelling
To formally test how effort modulates learning on a trial-by-trial basis, we considered a family of reinforcement learning models based on the traditional Rescorla–Wagner model of reinforcement learning (Rescorla and Wagner, 1972). At the core of the Rescorla–Wagner model is the RPE, δ, which updates the expected value v of a stimulus on trial t. δ is the difference between the reward acquired r and the reward expected based on current stimulus value v, and is scaled by a learning rate parameter α:

δ_t = r_t − v_t
v_(t+1) = v_t + α·δ_t
Our primary goal was to examine the effect of effort on modulating RPEs. To do so, we first constrained α between 0 and 1 by defining it as a sigmoidal function of a subject-specific signal gain parameter γ:

α = 1/(1 + e^(−γ))
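As a concrete sketch of the baseline model (M1) in R, with all variable names ours:

```r
# Baseline Rescorla–Wagner update (M1), with the learning rate alpha defined
# as a sigmoid of the subject-specific signal gain gamma
gamma <- 0.5                # signal gain parameter
v <- c(0.5, 0.5)            # current values of the two stimuli
chosen <- 1                 # index of the selected stimulus
r <- 1                      # observed reward (1 = rewarded, 0 = not)

alpha <- 1 / (1 + exp(-gamma))           # learning rate, constrained to (0, 1)
delta <- r - v[chosen]                   # reward prediction error (RPE)
v[chosen] <- v[chosen] + alpha * delta   # value update
```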
We then compared this baseline model (M1) against six alternative models that hypothesized distinct effects of effort on γ (Fig. 1).
Signal shift model (M2)
Previous studies suggest that the mere act of investing effort increases the overall value of acquired rewards (Klein et al., 2005; Alessandri et al., 2008; Syed et al., 2016; L. Wang et al., 2017). This work suggests that effort should have a directional (positive) effect on RPEs, such that it should increase the amplitude of RPEs that are positive, and blunt those that are negative. We tested this prediction in a signal shift model (M2), which examined the effect of effort E on learning rate α. This model assumed a linear effect of effort on the signal gain parameter γ, applied in the direction of the RPE:

α_t = 1/(1 + e^(−(γ + k·E_t))) if δ_t ≥ 0
α_t = 1/(1 + e^(−(γ − k·E_t))) if δ_t < 0
E is the peak amplitude of force applied on each trial (as a proportion of MVC), and k is a subject-specific parameter that captures individuals' sensitivity to effort. We let k take positive or negative values, such that effort could shift the net effect of RPEs in either a positive or a negative direction.
Signal enhancement model (M3)
An alternative possibility is that effort amplifies RPEs in response to both positive and negative rewards (i.e., regardless of the valence of the outcome). Normative accounts of choice behavior describe effort as the cost of investing limited energy into one action at the expense of other candidate actions: an "opportunity cost" (Niv et al., 2007; Shadmehr et al., 2019). Effort exertion would potentially offset such costs if it enhanced learning rates independently of the outcome. We tested this in a signal enhancement model (M3) by again estimating learning rate α as a function of both subject-specific signal gain γ and the amount of effort exerted E. However, in this model, effort modulates signal gain symmetrically, irrespective of the sign of the RPE:

α_t = 1/(1 + e^(−(γ + k·E_t)))
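For illustration, the learning rate computations for M2 and M3 might be implemented as follows (a sketch in R; the exact parameterization is our reconstruction from the model descriptions above):

```r
# Effort-modulated learning rates. E is trial-wise peak force (proportion of
# MVC); k is the subject-specific effort sensitivity; delta is the RPE.

# M2 (signal shift): effort modulates gain in the direction of the RPE sign,
# so positive k boosts learning from positive RPEs and blunts it for negative
alpha_shift <- function(gamma, k, E, delta) {
  s <- ifelse(delta >= 0, 1, -1)
  1 / (1 + exp(-(gamma + s * k * E)))
}

# M3 (signal enhancement): effort modulates gain symmetrically,
# regardless of outcome valence
alpha_enhance <- function(gamma, k, E) {
  1 / (1 + exp(-(gamma + k * E)))
}
```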
Models incorporating separate learning rates for positive and negative RPEs (M4–M7)
M2 and M3 both assume that the magnitude of any effort-related effect on signal gain γ is equivalent for positive and negative RPEs. However, several studies have suggested that the neural processes underpinning positive and negative RPEs may be at least partially dissociable (Bayer and Glimcher, 2005; Cools et al., 2008; Matsumoto and Hikosaka, 2009; Collins and Frank, 2014; Westbrook et al., 2020). Consequently, we fit a further family of models that aimed to decouple the effect of effort on different reward outcomes.
First, we fit two additional models to test whether effort modulates the learning rate only in response to positive RPEs (positive RPE model, M4), or only in response to negative RPEs (negative RPE model, M5):

M4: α_t = 1/(1 + e^(−(γ + k·E_t))) if δ_t ≥ 0, and α_t = 1/(1 + e^(−γ)) otherwise
M5: α_t = 1/(1 + e^(−(γ − k·E_t))) if δ_t < 0, and α_t = 1/(1 + e^(−γ)) otherwise
In addition, to examine whether effort had a weighted effect on learning as a function of RPE sign, we also fit a more complex model with separate effort parameters for positive (k⁺) and negative (k⁻) RPEs (dual effort parameter model, M6):

α_t = 1/(1 + e^(−(γ + k⁺·E_t))) if δ_t ≥ 0
α_t = 1/(1 + e^(−(γ − k⁻·E_t))) if δ_t < 0
Finally, we included a model that assumes asymmetrical learning from positive and negative RPEs that is entirely independent of effort exertion [dual learning rate (no effort) model, M7]. This model includes separate signal gain parameters for positive (γ⁺) and negative (γ⁻) RPEs:

α_t = 1/(1 + e^(−γ⁺)) if δ_t ≥ 0
α_t = 1/(1 + e^(−γ⁻)) if δ_t < 0
For all models, we used a softmax function to calculate choice probabilities, whereby the probability P of choosing a given stimulus i on trial t depends on the values of the stimuli on offer, scaled by an inverse temperature parameter β:

P_(i,t) = e^(β·v_(i,t)) / Σ_j e^(β·v_(j,t))
We constrained the signal gain parameter γ to the range [−5, 5], which ensured we allowed for the full range of learning rates (effectively 0.01 ≤ α ≤ 0.99) while also preventing extreme values close to the asymptotes. We used flat priors for all parameters. The best-fitting model parameters for each participant were found using maximum likelihood estimation, and we compared overall model fits based on the Akaike Information Criterion (AIC; Akaike, 1974). We also quantified the relative likelihood that the winning model best accounted for choice behavior compared with the others in the model space (i.e., Akaike weights):

w_i = e^(−0.5·Δ_i(AIC)) / Σ_j e^(−0.5·Δ_j(AIC))

where Δ_i(AIC) is the difference between the AIC of model i and the lowest AIC in the model space.
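Putting these components together, the per-participant fitting procedure might be sketched as follows (R; the parameter bounds follow the constraints above, all names are ours, and this is a reconstruction rather than the authors' code):

```r
set.seed(1)
# Stand-in data for one participant: choice (1/2), reward (0/1), and effort
# (peak force as a proportion of MVC)
d <- data.frame(choice = sample(1:2, 180, replace = TRUE),
                reward = rbinom(180, 1, 0.5),
                effort = runif(180, 0.05, 0.44))

# Negative log-likelihood of the signal shift model (M2)
neg_loglik <- function(par, d) {
  gamma <- par[1]; beta <- par[2]; k <- par[3]
  v <- c(0.5, 0.5); nll <- 0
  for (t in seq_len(nrow(d))) {
    p <- exp(beta * v) / sum(exp(beta * v))   # softmax choice probabilities
    nll <- nll - log(p[d$choice[t]])
    delta <- d$reward[t] - v[d$choice[t]]     # RPE
    s <- if (delta >= 0) 1 else -1            # direction of effort modulation
    alpha <- 1 / (1 + exp(-(gamma + s * k * d$effort[t])))
    v[d$choice[t]] <- v[d$choice[t]] + alpha * delta
  }
  nll
}

# Maximum likelihood estimation, with gamma constrained to [-5, 5]
fit <- optim(c(0, 1, 0), neg_loglik, d = d, method = "L-BFGS-B",
             lower = c(-5, 0.01, -10), upper = c(5, 20, 10))
aic <- 2 * length(fit$par) + 2 * fit$value

# Akaike weights for a vector of AICs across the model space
akaike_weights <- function(aics) {
  w <- exp(-0.5 * (aics - min(aics)))
  w / sum(w)
}
```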
To ensure that each of these models was uniquely identifiable, we performed a model recovery analysis based on synthetic data. We simulated behavior on a reinforcement learning task in which agents registered their choices by exerting effort. On each simulation, we randomly sampled subject-specific parameter values from a plausible range, as well as each agent's effort on each trial. We performed 50 simulations, each of which generated data from 100 learning agents making 200 choices each. This analysis revealed that our procedure was able to correctly identify the true generative model with an accuracy of ≥0.88 for each model (Fig. 1C).
Finally, we tested the reliability of the parameter estimates from the winning model in both experiments by performing a parameter recovery analysis. We generated data from the best-fitting parameter estimates for each participant, and sampled trial-by-trial effort exertion from a Gamma distribution fit using maximum likelihood estimation to the observed distribution of peak amplitudes for each subject. We then repeated our model-fitting procedure on these synthetic data and quantified the reliability of the parameter estimates as the rank-order correlation between the generative “true” value and the recovered value.
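For illustration, the effort-sampling and reliability steps might look as follows (a sketch in R, with illustrative variable names and stand-in data):

```r
# Fit a Gamma distribution to a subject's observed peak-force amplitudes by
# maximum likelihood (MASS::fitdistr), then draw synthetic trial-by-trial
# effort for the parameter recovery simulations
library(MASS)

effort_observed <- rgamma(180, shape = 5, rate = 20)  # stand-in for real data
g <- fitdistr(effort_observed, "gamma")
effort_synth <- rgamma(200, shape = g$estimate["shape"],
                       rate = g$estimate["rate"])

# Reliability of each parameter: rank-order correlation between generative
# ("true") values and those recovered from the synthetic data, e.g.,
# cor(true_params, recovered_params, method = "spearman")
```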
Results
Experiment 1
Effort increased the tendency to repeat rewarded choices
The aim of experiment 1 was to examine the effect of effort on learning, independent of any individual preferences to exert effort itself. First, we compared the effect of effort on choice accuracy across the three (low, medium and high) effort groups. For each participant, this effect was computed as the proportion of trials in which the more rewarded stimulus was selected in the effort block relative to the control block. A one-way ANOVA revealed a significant effect of effort (F(1,91) = 6.22, p = 0.014; Fig. 2C), which was primarily driven by greater relative accuracy in the high vs the low effort group (p = 0.045, Bonferroni-corrected).
We investigated whether this effect on accuracy could be explained by differences in choice strategy. A GLM examining win-stay behavior found that trial-by-trial Effort was associated with a higher probability of choosing the same stimulus following a positive reward outcome (β = 1.08, SE = 0.47, p = 0.021; Fig. 2D). Neither the simple effect of Block nor the Block × Effort interaction was a significant predictor (both p > 0.22). An analogous GLM on lose-switch behavior showed that effort was associated with a lower probability of switching following a negative reward outcome (β = −0.84, SE = 0.34, p = 0.014; Fig. 2D). Again, neither Block nor the Block × Effort interaction was a significant predictor (both p > 0.11).
Together, these analyses suggest that effort may have a directional effect on reward signals during learning, boosting positive outcomes (promoting win-stay behavior), and blunting negative outcomes (reducing lose-switch behavior).
Computational models demonstrated that effort boosted positive and blunted negative RPEs
To test our key prediction that effort modulates learning, we compared a baseline Rescorla–Wagner model against alternative models that hypothesized distinct effects of effort on RPEs (Fig. 1). Our model comparisons revealed that the signal shift model provided the best fit to the observed data, with an AIC 223.32 units lower than that of the baseline model (Fig. 3A), and an Akaike weight in excess of 0.99 across the model space. Critically, k values derived from the model were significantly greater than zero (0.76 ± 0.28; Wilcoxon signed-rank test, W = 2964, p = 0.006; Fig. 3B), demonstrating that the overall effect of effort was to boost positive RPEs, and blunt negative RPEs. These results provide a computational explanation for our earlier finding that greater effort was associated with more prominent win-stay behavior and less prominent lose-switch behavior. Post hoc comparisons revealed that this effect was driven by the medium effort group (1.48 ± 0.44, W = 408, p = 0.005, Bonferroni-corrected), rather than the low or high effort groups (both p > 0.16). A parameter recovery analysis confirmed that all parameters were reliably recoverable (parameter estimation reliability: γ = 0.86, β = 0.94, k = 0.67; p < 0.001 for all parameters).
Notably, the winning signal shift model provided a superior fit compared with the dual learning rate (no effort) model (ΔAIC = 25.39), indicating that effort was critical in increasing learning rates for positive RPEs, and reducing learning rates for negative RPEs. To further confirm that this result was not simply due to a differential effect of positive and negative RPEs on value-updating independent of effort, we ran a post hoc permutation test in which we randomly shuffled the effort exerted across trials for each participant. On each permutation, we compared model fits for the signal shift model based on the empirical data against the permuted data. Across 100 permutations, the empirical data resulted in superior model fits relative to the permuted data in every case (100/100 permutations; Z = 57.14, p < 0.001), confirming that effort played a critical role in modulating learning rate in response to positive and negative RPEs.
In summary, experiment 1 revealed that effort reinforced learning by shifting the RPEs in a positive direction. Notably, on every trial of this experiment, the effort required to select either stimulus was identical. This allowed us to focus solely on how effort affects the capacity to learn from choice outcomes, independent of an individual's aversion toward effort. An important question that remains is how learning is modulated by the willingness of an individual to invest effort in the first place. We addressed this question in a second experiment that was similar to the first, but in which the two stimuli presented on every trial were associated with different effort requirements. This allowed us to test the capacity of the winning signal shift model to account for effort on both the prospective and retrospective valuation of reward.
Experiment 2
Experiment 2 was similar to experiment 1, with the key difference that the two stimuli presented on each trial required different amounts of effort to select (“low,” >5% MVC, vs “high,” >44% MVC; Fig. 4A,B). For each participant, these mappings remained constant for the duration of the task, but their reward contingencies systematically changed over time. Thus, decisions on each trial required individuals to consider both the effort costs of each stimulus (known in advance) and its potential rewards (learned during the task).
Choices between low and high effort stimuli demonstrated effort aversion
Participants displayed an overall aversion to effort, shown by a higher proportion of choices for the low compared with the high effort stimulus (0.53 ± 0.01, d = 0.45, t(45) = 3.1, p = 0.003; Fig. 4C). This did not affect overall choice accuracy (mean difference = 0.01 ± 0.01, d = 0.13, t(45) = 0.97, p = 0.34, Fig. 4D). Greater effort was associated with reduced win-stay behavior (β = −0.88, SE = 0.17, p < 0.001; Fig. 4E), and increased lose-switch behavior (β = 0.46, SE = 0.1, p < 0.001; Fig. 4E). Note that this contrasts with experiment 1, in which effort increased win-stay behavior and reduced lose-switch behavior. This is unsurprising, given that the aversiveness of the high effort stimulus (Fig. 4C) was likely to have obscured any subtler effects of effort on the RPE in a GLM. Therefore, to test whether these two possible effects of effort could be dissociated, we used a similar computational approach to experiment 1.
Computational models indicated that effort discounted value and shifted RPEs in a positive direction
The key difference between experiments 1 and 2 was that participants now had to balance an aversion to the high effort stimulus against their desire to maximize reward. To capture individuals' aversion to effort, we paired models from experiment 1 (M2–M7) with an effort discounting function, such that the discounted value of each stimulus (v′) was computed as its learned value (v), discounted by the amount of effort required to select it (E), whereby E = 0.05 for the low effort stimulus and E = 0.44 for the high effort stimulus. Effort was scaled by a separate subject-specific effort discounting parameter (k_ed), such that

v′ = v − k_ed·E

These discounted values, rather than the learned values, entered the softmax choice rule.
As in experiment 1, learning rate varied as a function of trial-by-trial effort exertion, scaled by a free parameter capturing the effect of effort on RPEs (k_rpe).
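For illustration, the resulting choice step might be sketched as follows (R; the linear form of the discount follows the equation above, and all values are illustrative):

```r
# Effort discounting before choice: learned values v are discounted by the
# known effort cost of each stimulus, scaled by the subject-specific
# parameter k_ed, before entering the softmax
v <- c(0.6, 0.4)      # learned values of the low/high effort stimuli
E <- c(0.05, 0.44)    # known effort costs (proportion of MVC)
k_ed <- 1.0           # subject-specific effort discounting parameter
beta <- 5             # inverse temperature

v_disc <- v - k_ed * E                              # discounted values v'
p <- exp(beta * v_disc) / sum(exp(beta * v_disc))   # choice probabilities
```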
Model comparisons replicated our earlier findings: the best-fitting model comprised the same signal shift model as in experiment 1, now paired with an effort discounting function to account for choices between the low and high effort stimulus on each trial. This model had an AIC 156.02 units lower than that of the baseline model, and an Akaike weight in excess of 0.99 across the model space (Fig. 5A). Across the group, k_ed values were significantly greater than zero, consistent with participants' overall aversion to the high effort stimulus, and k_rpe values were likewise significantly greater than zero, replicating the finding of experiment 1 that effort boosted positive RPEs and blunted negative RPEs.
A parameter recovery analysis confirmed that all parameter estimates from the winning signal shift model were reliably recoverable (parameter estimation reliability: γ = 0.85, β = 0.91, k_ed = 0.62, k_rpe = 0.60; p < 0.001 for all parameters).
Again, we note that the winning signal shift model provided a superior fit to the dual learning rate (no effort) model (ΔAIC = 12.53), confirming that effort played a significant role in modulating RPEs. To verify that the signal shift model was not merely approximating dual learning rates independent of effort requirements, we ran the same post hoc permutation test as in experiment 1, involving 100 permutations with effort data shuffled across trials for each participant. On every one of these permutations, the empirical data provided superior fits to the permuted data (100/100 permutations; Z = 26.3, p < 0.001), again pointing to a critical role for effort in increasing learning rates for positive RPEs, and reducing learning rates for negative RPEs.
The effect of effort on RPEs was greater in individuals who were more averse to effort
These results demonstrate that an individual's sensitivity to effort results in both a greater aversion to effort (i.e., greater effort discounting, k_ed) and a greater effect of effort on learning (i.e., a more positive shift in RPEs, k_rpe). To test whether these two effects were related, we examined the association between the two parameters across individuals. The effect of effort on RPEs (k_rpe) was significantly greater in participants who were more averse to effort (i.e., those with higher k_ed), indicating that the degree to which effort reinforced learning was proportional to an individual's aversion to investing that effort in the first place.
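For illustration, this between-subjects relationship could be quantified with a rank-order correlation between the two parameter estimates (a sketch; the specific test and values shown are illustrative):

```r
# k_ed_hat and k_rpe_hat: per-participant maximum likelihood estimates
# (illustrative values shown)
k_ed_hat  <- c(0.2, 0.8, 1.5, 0.5, 1.1)
k_rpe_hat <- c(0.1, 0.9, 1.2, 0.3, 1.0)
cor.test(k_ed_hat, k_rpe_hat, method = "spearman")
```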
Discussion
Recent data suggest that the neural signals mediating effort exertion overlap with those that convey reward value (Hamid et al., 2016; Syed et al., 2016; Sedaghat-Nejad et al., 2019; Hughes et al., 2020). To date, however, it has remained unclear how effort and reward-based learning could co-exist within a common computational framework (Berke, 2018; Jenkins and Walton, 2020; Tanaka et al., 2021). Here, we tested how effort modulates human reinforcement learning. Our key finding was that the exertion of physical effort resulted in a unidirectional increase in the subjective value of a learned outcome. Furthermore, we showed that the extent to which effort reinforced learning was directly proportional to the degree to which an individual was averse to investing that effort. Together, these data demonstrate that learning is shaped not only by rewards, but also by the effort required to obtain them.
Our analysis aimed to reconcile two broad frameworks of value-based decision-making. Reinforcement learning theory stipulates that choices are driven by estimates of expected future reward (Sutton and Barto, 1998), whereas neuroeconomic theories frame decisions as cost-benefit trade-offs (Rangel et al., 2008). By integrating these frameworks, we show that effort influences both the prospective and retrospective valuation of reward, and in turn modulates the RPEs that drive learning. Across two different experiments, we found that effort shifted RPEs in a positive direction on a trial-by-trial basis. This positive shift was independent of the valence of the reward outcome, showing that the overall effect of effort was to boost positive RPEs, and blunt negative RPEs.
These results are in keeping with previous findings that effort exertion influences subsequent choice preferences. For example, humans prefer to view stimuli that have previously been paired with more aversive levels of effort (Klein et al., 2005; Alessandri et al., 2008). Similarly, nonhuman animals show a preference for food rewards that previously required greater effort to obtain (Clement et al., 2000; Friedrich and Zentall, 2004; Johnson and Gallagher, 2011). Here, we present a possible computational basis for these past findings by demonstrating that effort shapes choice preferences by modulating the efficiency with which action values are updated, increasing that efficiency for outcomes that are positive, and reducing it for those that are negative.
This result has interesting implications. Specifically, the net effect of effort exertion in our study gave rise to a pattern of results similar to that described in previous work on reinforcement learning. Several studies have now shown that learning rates are typically higher for positive relative to negative reward outcomes (Frank et al., 2007; den Ouden et al., 2013; Lefebvre et al., 2017; Palminteri et al., 2017). In addition, this learning rate asymmetry seems to be more pronounced following actions in which individuals are more invested (e.g., self-determined vs forced choices; Chambon et al., 2020), which accords with our effort results. Taken together, these results suggest that effort may at least in part contribute to differences in positive and negative learning rates that have been observed across a wide range of tasks, although this remains to be confirmed in future studies.
An important finding in experiment 2 is the link between the computation of prospective effort costs, which discounted value before choice, and realized effort costs (i.e., exertion itself), which reinforced learning after choice. This suggests that, for a given individual, the extent to which effort reinforces learning depends on their sensitivity to effort. One interpretation of this result is that the aversiveness of effort lowers the value of an individual's current state, which in turn increases the relative value of rewards obtained in that state (Zentall and Singer, 2007). Such state-based valuation effects have been inferred previously (Clement et al., 2000; Kacelnik and Marsh, 2002; Zentall and Singer, 2007), including in aversive contexts other than the exertion of effort, such as temporal delay (DiGian et al., 2004; Pompilio and Kacelnik, 2005), reward omission (Friedrich et al., 2005), and hunger (Aw et al., 2009), which points to a more general link between the strength of reinforcement and the motivational state of the individual (Berridge, 2004; McNamara et al., 2012).
We found that the individuals whose learning was most affected by effort were those more averse to investing effort in the first place. This provides an interesting juxtaposition to the psychological concept of effort justification, which is the finding that individuals who are more averse to effort tend to overvalue the outcomes of any such investment (Festinger and Carlsmith, 1959). Theories of effort justification argue that this is driven by the cognitive dissonance that follows the experience of unpleasant levels of effort (Festinger, 1957). In contrast, our data suggest that the augmentation of value by effort may not merely be a consequence of cognitive dissonance – rather, it may be an adaptive mechanism that offsets the potential disadvantage of being less motivated (McNamara et al., 2012).
An extensive body of work has established the importance of dopamine signals in both motivation and learning (Salamone and Correa, 2002; Wise, 2004). The classical account is that these signals operate over different timescales (Schultz, 2007), with motor vigor linked to slow fluctuations in striatal dopamine activity (Niv et al., 2007; Howe et al., 2013; Y. Wang et al., 2021), and learning driven by more rapid, phasic changes in dopamine firing rates (Montague et al., 1996; Schultz et al., 1997). However, recent data have challenged these views by showing that effort exertion may also increase the activity of dopaminergic neurons in phasic bursts (Hamid et al., 2016; Syed et al., 2016; da Silva et al., 2018; Hughes et al., 2020). These neurophysiological data raise the possibility that transient, effort-induced increases in dopamine activity could augment the reward signals that drive learning (Tanaka et al., 2021), a speculation that accords with our computational findings, and with previous studies showing that effort boosts dopamine signals for positive outcomes (Syed et al., 2016) and blunts the reductions in dopamine activity that accompany negative outcomes (Stelly et al., 2019).
A topical issue has been to reconcile the role of dopamine in signaling reward and effort (Gan et al., 2010; Hollon et al., 2014; Hamid et al., 2016; Syed et al., 2016; Skvortsova et al., 2017; Westbrook et al., 2020). In particular, several studies have probed the effect of reward on subsequent effort exertion (Nakamura and Hikosaka, 2006; Beierholm et al., 2013; Chong et al., 2015). An important unanswered question, however, is whether the reverse relationship holds, that is, whether effort can systematically affect learning about reward outcomes. Our study fills this gap by confirming the existence of a robust link between effort and the RPEs that lie at the core of reinforcement learning. Note that this also differs from previous work that has characterized the computational architecture underpinning learning about effort costs (Skvortsova et al., 2014, 2017). Here, we deliberately trained participants on the required levels of effort before testing to minimize learning about effort costs during our experiments, allowing us to focus our analysis on the effect of effort on reward signals. Taken together with earlier studies showing that reward increases the speed of subsequent movements (Milstein and Dorris, 2007; Summerside et al., 2018) and the willingness to exert effort (Chong et al., 2015), our results suggest a strong bidirectional relationship between effort and reward.
In summary, this study contributes to a growing body of work highlighting the importance of motivational factors, such as the willingness to exert effort, in models of reward-based learning (Berridge, 2007; Zhang et al., 2009; McNamara et al., 2012; Collins and Frank, 2014; Berke, 2018; Juechems and Summerfield, 2019; van Swieten and Bogacz, 2020; Tanaka et al., 2021). From a clinical perspective, learning impairments are common across a range of neurologic and psychiatric diseases, including Parkinson's disease (Peterson et al., 2009; Schapira et al., 2017; Chong, 2018), schizophrenia (Waltz et al., 2007; Schlagenhauf et al., 2014), and ADHD (Seidman et al., 2001; Luman et al., 2010). This study thus lays the foundation for future work to test the role of striatal dopamine signals in effort-based learning, and to examine whether effort-based interventions could be applied therapeutically in clinical populations with learning impairments.
Footnotes
H.J. is supported by an Australian Government Research Training Program Scholarship. J.C. is supported by the Australian Research Council Grant DP190100772. T.T.-J.C. is supported by Australian Research Council Grants DP 180102383 and DE 180100389, the Judith Jane Mason and Harold Stannett Williams Memorial Foundation, the Brain Foundation, the Society for Mental Health Research, and the Office of Naval Research (Global). We thank Virginia Klink and Veronica Mazur for assisting with data collection and Julian Matthews and Adam Morris for helpful discussions. H.J. and T.T.-J.C. were supported by the Rebecca L. Cooper Medical Research Foundation.
The authors declare no competing financial interests.
Correspondence should be addressed to Huw Jarvis at huw.jarvis@monash.edu