Abstract
In reinforcement learning (RL), animals choose by assigning values to options and learn by updating these values from reward outcomes. This framework has been instrumental in identifying fundamental learning variables and their neuronal implementations. However, canonical RL models do not explain how reward values are constructed from biologically critical intrinsic reward components, such as nutrients. From an ecological perspective, animals should adapt their foraging choices in dynamic environments to acquire nutrients that are essential for survival. Here, to advance the biological and ecological validity of RL models, we investigated how (male) monkeys adapted their choices to obtain preferred nutrient rewards under varying reward probabilities. We found that the rewards’ nutrient composition strongly influenced learning and choices. The animals’ preferences for specific nutrients (sugar, fat) affected how they adapted to changing reward probabilities: the history of recent rewards influenced the monkeys’ choices more strongly if these rewards contained the animal’s preferred nutrients (‘nutrient-specific reward history’). The monkeys also chose preferred nutrients even when they were associated with lower reward probability. A nutrient-sensitive RL model captured these processes: it updated the values of individual sugar and fat components of expected rewards based on experience and integrated them into subjective values that explained the monkeys’ choices. Nutrient-specific reward prediction errors guided this value-updating process. Our results identify nutrients as important reward components that guide learning and choice by influencing the subjective value of choice options. Extending RL models with nutrient-value functions may enhance their biological validity and uncover nutrient-specific learning and decision variables.
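To make the abstract’s model description concrete, the following is a minimal sketch of a nutrient-sensitive RL scheme of the kind described above: a Rescorla-Wagner-style update applied separately to sugar and fat value components, driven by nutrient-specific reward prediction errors, with the components integrated by subjective nutrient weights and choices generated by a softmax rule. All parameter names and values (alpha, beta, the weights w) are illustrative assumptions, not the paper’s fitted model.

```python
import numpy as np

def softmax(values, beta):
    """Convert option values into choice probabilities (inverse temperature beta)."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Illustrative parameters (assumed for this sketch, not fitted values)
alpha = 0.2                      # learning rate for nutrient-value updates
beta = 3.0                       # softmax inverse temperature
w = {"sugar": 0.7, "fat": 0.3}   # subjective nutrient weights (animal-specific)

# One value estimate per choice option and per nutrient component
n_options = 2
V = {nutrient: np.zeros(n_options) for nutrient in w}

def subjective_values():
    """Integrate nutrient-specific values into one subjective value per option."""
    return sum(w[nutrient] * V[nutrient] for nutrient in w)

def update(chosen, reward_nutrients):
    """Nutrient-specific reward prediction errors drive the value update.

    reward_nutrients: dict mapping nutrient -> obtained amount (0 if unrewarded).
    """
    for nutrient, r in reward_nutrients.items():
        delta = r - V[nutrient][chosen]        # nutrient-specific RPE
        V[nutrient][chosen] += alpha * delta   # Rescorla-Wagner-style update

# Example trial: choose by softmax over subjective values, then learn
# from a high-fat, low-sugar reward on the chosen option.
choice_probs = softmax(subjective_values(), beta)
chosen = np.random.choice(n_options, p=choice_probs)
update(chosen, {"sugar": 0.1, "fat": 0.9})
```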
SIGNIFICANCE STATEMENT
Reinforcement learning (RL) is an influential framework that formalizes how animals learn from experienced rewards. Although ‘reward’ is a foundational concept in RL theory, canonical RL models cannot explain how learning depends on specific reward properties, such as nutrients. Intuitively, learning should be sensitive to a reward’s nutrient components, to benefit health and survival. Here we show that the nutrient (fat, sugar) composition of rewards affects monkeys’ choices and learning in an RL paradigm, and that key learning variables, including ‘reward history’ and ‘reward prediction error’, should be modified with nutrient-specific components to account for monkeys’ behavior in our task. By incorporating biologically critical nutrient rewards into the RL framework, our findings help advance the ecological validity of RL models.
Footnotes
The authors declare that they have no competing interests.
We thank Wolfram Schultz and his group for support and discussions; Putu Khorisantono for discussions; Christina Thompson and Aled David for animal care; Polly Taylor for anesthesia; Henri Bertrand for veterinary care. This work was funded by the Wellcome Trust and the Royal Society (Sir Henry Dale Fellowships 206207/Z/17/Z and 206207/Z/17/A to F.G.). F.-Y.H. was supported by a Fellowship from the Taiwan Ministry of Education. This research was funded in whole, or in part, by the Wellcome Trust. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.